[2025-01-03 20:13:15,252][122130] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json...
[2025-01-03 20:13:15,253][122130] Rollout worker 0 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 1 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 2 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 3 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 4 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 5 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 6 uses device cpu
[2025-01-03 20:13:15,253][122130] Rollout worker 7 uses device cpu
[2025-01-03 20:13:15,310][122130] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:15,311][122130] InferenceWorker_p0-w0: min num requests: 2
[2025-01-03 20:13:15,334][122130] Starting all processes...
[2025-01-03 20:13:15,334][122130] Starting process learner_proc0
[2025-01-03 20:13:16,937][122130] Starting all processes...
[2025-01-03 20:13:16,945][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:13:16,945][122176] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 20:13:16,949][122130] Starting process inference_proc0-0 [2025-01-03 20:13:16,949][122130] Starting process rollout_proc0 [2025-01-03 20:13:16,949][122130] Starting process rollout_proc1 [2025-01-03 20:13:16,949][122130] Starting process rollout_proc2 [2025-01-03 20:13:16,949][122130] Starting process rollout_proc3 [2025-01-03 20:13:16,950][122130] Starting process rollout_proc4 [2025-01-03 20:13:16,959][122176] Num visible devices: 1 [2025-01-03 20:13:16,950][122130] Starting process rollout_proc5 [2025-01-03 20:13:16,964][122176] Starting seed is not provided [2025-01-03 20:13:16,965][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:13:16,965][122176] Initializing actor-critic model on device cuda:0 [2025-01-03 20:13:16,950][122130] Starting process rollout_proc6 [2025-01-03 20:13:16,965][122176] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:13:16,952][122130] Starting process rollout_proc7 [2025-01-03 20:13:16,966][122176] RunningMeanStd input shape: (1,) [2025-01-03 20:13:16,988][122176] ConvEncoder: input_channels=3 [2025-01-03 20:13:17,165][122176] Conv encoder output size: 512 [2025-01-03 20:13:17,166][122176] Policy head output size: 512 [2025-01-03 20:13:17,212][122176] Created Actor Critic model with architecture: [2025-01-03 20:13:17,213][122176] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): 
RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 20:13:17,448][122176] Using optimizer [2025-01-03 20:13:19,073][122176] No checkpoints found [2025-01-03 20:13:19,073][122176] Did not load from checkpoint, starting from scratch! [2025-01-03 20:13:19,074][122176] Initialized policy 0 weights for model version 0 [2025-01-03 20:13:19,077][122176] LearnerWorker_p0 finished initialization! 
[2025-01-03 20:13:19,078][122176] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:19,617][122217] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,657][122201] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,678][122218] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,699][122200] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,714][122216] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,738][122202] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-01-03 20:13:19,739][122202] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-01-03 20:13:19,751][122202] Num visible devices: 1
[2025-01-03 20:13:19,780][122219] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,832][122220] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,839][122202] RunningMeanStd input shape: (3, 72, 128)
[2025-01-03 20:13:19,840][122202] RunningMeanStd input shape: (1,)
[2025-01-03 20:13:19,848][122202] ConvEncoder: input_channels=3
[2025-01-03 20:13:19,862][122215] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-01-03 20:13:19,940][122202] Conv encoder output size: 512
[2025-01-03 20:13:19,940][122202] Policy head output size: 512
[2025-01-03 20:13:19,966][122130] Inference worker 0-0 is ready!
[2025-01-03 20:13:19,966][122130] All inference workers are ready! Signal rollout workers to start!
[2025-01-03 20:13:19,996][122217] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:19,996][122218] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:19,997][122220] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,010][122219] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122216] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122200] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122201] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,015][122215] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-01-03 20:13:20,342][122200] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122218] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122220] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122219] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,342][122215] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,563][122200] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,566][122220] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,627][122219] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,627][122215] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,676][122218] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,692][122216] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,867][122217] Decorrelating experience for 0 frames...
[2025-01-03 20:13:20,874][122200] Decorrelating experience for 64 frames...
[2025-01-03 20:13:20,914][122216] Decorrelating experience for 32 frames...
[2025-01-03 20:13:20,915][122219] Decorrelating experience for 64 frames...
[2025-01-03 20:13:20,975][122218] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,120][122220] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,151][122217] Decorrelating experience for 32 frames...
[2025-01-03 20:13:21,152][122201] Decorrelating experience for 0 frames...
[2025-01-03 20:13:21,203][122215] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,318][122216] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,341][122218] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,382][122220] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,391][122201] Decorrelating experience for 32 frames...
[2025-01-03 20:13:21,476][122215] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,500][122217] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,559][122130] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-01-03 20:13:21,589][122216] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,648][122219] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,765][122217] Decorrelating experience for 96 frames...
[2025-01-03 20:13:21,842][122201] Decorrelating experience for 64 frames...
[2025-01-03 20:13:21,883][122200] Decorrelating experience for 96 frames...
[2025-01-03 20:13:22,134][122201] Decorrelating experience for 96 frames...
[2025-01-03 20:13:22,632][122176] Signal inference workers to stop experience collection...
[2025-01-03 20:13:22,666][122202] InferenceWorker_p0-w0: stopping experience collection
[2025-01-03 20:13:25,122][122176] Signal inference workers to resume experience collection...
[2025-01-03 20:13:25,122][122202] InferenceWorker_p0-w0: resuming experience collection
[2025-01-03 20:13:26,559][122130] Fps is (10 sec: 5734.3, 60 sec: 5734.3, 300 sec: 5734.3). Total num frames: 28672. Throughput: 0: 1226.4. Samples: 6132. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-01-03 20:13:26,560][122130] Avg episode reward: [(0, '3.767')]
[2025-01-03 20:13:27,168][122202] Updated weights for policy 0, policy_version 10 (0.0083)
[2025-01-03 20:13:29,549][122202] Updated weights for policy 0, policy_version 20 (0.0013)
[2025-01-03 20:13:31,559][122130] Fps is (10 sec: 11468.8, 60 sec: 11468.8, 300 sec: 11468.8). Total num frames: 114688. Throughput: 0: 1908.8. Samples: 19088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-01-03 20:13:31,560][122130] Avg episode reward: [(0, '4.497')]
[2025-01-03 20:13:31,622][122176] Saving new best policy, reward=4.497!
[2025-01-03 20:13:31,871][122202] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-01-03 20:13:34,166][122202] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-01-03 20:13:35,304][122130] Heartbeat connected on Batcher_0
[2025-01-03 20:13:35,307][122130] Heartbeat connected on LearnerWorker_p0
[2025-01-03 20:13:35,314][122130] Heartbeat connected on RolloutWorker_w0
[2025-01-03 20:13:35,316][122130] Heartbeat connected on InferenceWorker_p0-w0
[2025-01-03 20:13:35,317][122130] Heartbeat connected on RolloutWorker_w1
[2025-01-03 20:13:35,321][122130] Heartbeat connected on RolloutWorker_w2
[2025-01-03 20:13:35,324][122130] Heartbeat connected on RolloutWorker_w3
[2025-01-03 20:13:35,325][122130] Heartbeat connected on RolloutWorker_w4
[2025-01-03 20:13:35,327][122130] Heartbeat connected on RolloutWorker_w5
[2025-01-03 20:13:35,330][122130] Heartbeat connected on RolloutWorker_w6
[2025-01-03 20:13:35,333][122130] Heartbeat connected on RolloutWorker_w7
[2025-01-03 20:13:36,434][122202] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-01-03 20:13:36,559][122130] Fps is (10 sec: 17613.0, 60 sec: 13653.4, 300 sec: 13653.4). Total num frames: 204800. Throughput: 0: 3055.1. Samples: 45826.
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:13:36,559][122130] Avg episode reward: [(0, '4.481')] [2025-01-03 20:13:38,693][122202] Updated weights for policy 0, policy_version 60 (0.0012) [2025-01-03 20:13:40,330][122130] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 122130], exiting... [2025-01-03 20:13:40,332][122130] Runner profile tree view: main_loop: 24.9987 [2025-01-03 20:13:40,332][122176] Stopping Batcher_0... [2025-01-03 20:13:40,333][122176] Loop batcher_evt_loop terminating... [2025-01-03 20:13:40,332][122130] Collected {0: 270336}, FPS: 10814.0 [2025-01-03 20:13:40,335][122176] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth... [2025-01-03 20:13:40,359][122218] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-03 20:13:40,361][122217] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 20:13:40,389][122218] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc6_evt_loop [2025-01-03 20:13:40,386][122200] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-03 20:13:40,381][122201] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 20:13:40,391][122200] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop [2025-01-03 20:13:40,392][122201] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop [2025-01-03 20:13:40,389][122217] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop [2025-01-03 20:13:40,395][122176] Stopping LearnerWorker_p0... 
[2025-01-03 20:13:40,368][122216] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-03 20:13:40,391][122219] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-03 20:13:40,373][122215] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-03 20:13:40,381][122220] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 20:13:40,396][122176] Loop learner_proc0_evt_loop terminating... [2025-01-03 20:13:40,396][122219] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop [2025-01-03 20:13:40,401][122202] Weights refcount: 2 0 [2025-01-03 20:13:40,396][122216] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc2_evt_loop [2025-01-03 20:13:40,396][122215] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop [2025-01-03 20:13:40,397][122220] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc7_evt_loop [2025-01-03 20:13:40,404][122202] Stopping InferenceWorker_p0-w0... [2025-01-03 20:13:40,404][122202] Loop inference_proc0-0_evt_loop terminating... [2025-01-03 20:14:02,830][123391] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-03 20:14:02,831][123391] Rollout worker 0 uses device cpu [2025-01-03 20:14:02,831][123391] Rollout worker 1 uses device cpu [2025-01-03 20:14:02,831][123391] Rollout worker 2 uses device cpu [2025-01-03 20:14:02,831][123391] Rollout worker 3 uses device cpu [2025-01-03 20:14:02,831][123391] Rollout worker 4 uses device cpu [2025-01-03 20:14:02,832][123391] Rollout worker 5 uses device cpu [2025-01-03 20:14:02,832][123391] Rollout worker 6 uses device cpu [2025-01-03 20:14:02,832][123391] Rollout worker 7 uses device cpu [2025-01-03 20:14:02,880][123391] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:14:02,881][123391] InferenceWorker_p0-w0: min num requests: 2 [2025-01-03 20:14:02,906][123391] Starting all processes... [2025-01-03 20:14:02,907][123391] Starting process learner_proc0 [2025-01-03 20:14:04,543][123391] Starting all processes... 
[2025-01-03 20:14:04,547][123391] Starting process inference_proc0-0 [2025-01-03 20:14:04,547][123391] Starting process rollout_proc0 [2025-01-03 20:14:04,547][123391] Starting process rollout_proc1 [2025-01-03 20:14:04,552][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:14:04,547][123391] Starting process rollout_proc2 [2025-01-03 20:14:04,552][123451] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 20:14:04,547][123391] Starting process rollout_proc3 [2025-01-03 20:14:04,547][123391] Starting process rollout_proc4 [2025-01-03 20:14:04,547][123391] Starting process rollout_proc5 [2025-01-03 20:14:04,548][123391] Starting process rollout_proc6 [2025-01-03 20:14:04,548][123391] Starting process rollout_proc7 [2025-01-03 20:14:04,566][123451] Num visible devices: 1 [2025-01-03 20:14:04,574][123451] Starting seed is not provided [2025-01-03 20:14:04,575][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:14:04,575][123451] Initializing actor-critic model on device cuda:0 [2025-01-03 20:14:04,575][123451] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:14:04,576][123451] RunningMeanStd input shape: (1,) [2025-01-03 20:14:04,593][123451] ConvEncoder: input_channels=3 [2025-01-03 20:14:04,756][123451] Conv encoder output size: 512 [2025-01-03 20:14:04,757][123451] Policy head output size: 512 [2025-01-03 20:14:04,780][123451] Created Actor Critic model with architecture: [2025-01-03 20:14:04,781][123451] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): 
RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 20:14:04,947][123451] Using optimizer [2025-01-03 20:14:06,623][123451] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth... [2025-01-03 20:14:06,671][123451] Loading model from checkpoint [2025-01-03 20:14:06,673][123451] Loaded experiment state at self.train_step=67, self.env_steps=274432 [2025-01-03 20:14:06,678][123451] Initialized policy 0 weights for model version 67 [2025-01-03 20:14:06,684][123451] LearnerWorker_p0 finished initialization! 
[2025-01-03 20:14:06,686][123451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:14:07,189][123481] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,217][123483] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,241][123477] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,363][123479] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,365][123478] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:14:07,366][123478] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-03 20:14:07,381][123480] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,381][123478] Num visible devices: 1 [2025-01-03 20:14:07,393][123496] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,438][123495] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,496][123478] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:14:07,497][123478] RunningMeanStd input shape: (1,) [2025-01-03 20:14:07,507][123478] ConvEncoder: input_channels=3 [2025-01-03 20:14:07,517][123482] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:14:07,599][123478] Conv encoder output size: 512 [2025-01-03 20:14:07,599][123478] Policy head output size: 512 [2025-01-03 20:14:07,627][123391] Inference worker 0-0 is ready! [2025-01-03 20:14:07,627][123391] All inference workers are ready! Signal rollout workers to start! 
[2025-01-03 20:14:07,658][123496] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,658][123477] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,660][123481] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,669][123479] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,674][123483] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,674][123480] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,675][123482] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,675][123495] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:14:07,964][123495] Decorrelating experience for 0 frames... [2025-01-03 20:14:07,991][123482] Decorrelating experience for 0 frames... [2025-01-03 20:14:07,992][123481] Decorrelating experience for 0 frames... [2025-01-03 20:14:07,994][123479] Decorrelating experience for 0 frames... [2025-01-03 20:14:07,994][123496] Decorrelating experience for 0 frames... [2025-01-03 20:14:08,046][123477] Decorrelating experience for 0 frames... [2025-01-03 20:14:08,201][123495] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,234][123482] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,234][123479] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,279][123480] Decorrelating experience for 0 frames... [2025-01-03 20:14:08,326][123481] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,375][123477] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,507][123496] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,528][123480] Decorrelating experience for 32 frames... [2025-01-03 20:14:08,633][123495] Decorrelating experience for 64 frames... [2025-01-03 20:14:08,663][123481] Decorrelating experience for 64 frames... [2025-01-03 20:14:08,677][123477] Decorrelating experience for 64 frames... 
[2025-01-03 20:14:08,696][123479] Decorrelating experience for 64 frames... [2025-01-03 20:14:08,792][123483] Decorrelating experience for 0 frames... [2025-01-03 20:14:08,839][123482] Decorrelating experience for 64 frames... [2025-01-03 20:14:08,850][123480] Decorrelating experience for 64 frames... [2025-01-03 20:14:08,926][123495] Decorrelating experience for 96 frames... [2025-01-03 20:14:08,966][123481] Decorrelating experience for 96 frames... [2025-01-03 20:14:08,977][123477] Decorrelating experience for 96 frames... [2025-01-03 20:14:09,075][123479] Decorrelating experience for 96 frames... [2025-01-03 20:14:09,128][123480] Decorrelating experience for 96 frames... [2025-01-03 20:14:09,138][123391] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 274432. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 20:14:09,188][123496] Decorrelating experience for 64 frames... [2025-01-03 20:14:09,330][123482] Decorrelating experience for 96 frames... [2025-01-03 20:14:09,466][123496] Decorrelating experience for 96 frames... [2025-01-03 20:14:09,642][123483] Decorrelating experience for 32 frames... [2025-01-03 20:14:09,950][123451] Signal inference workers to stop experience collection... [2025-01-03 20:14:09,956][123478] InferenceWorker_p0-w0: stopping experience collection [2025-01-03 20:14:10,026][123483] Decorrelating experience for 64 frames... [2025-01-03 20:14:10,261][123483] Decorrelating experience for 96 frames... [2025-01-03 20:14:12,350][123451] Signal inference workers to resume experience collection... [2025-01-03 20:14:12,350][123478] InferenceWorker_p0-w0: resuming experience collection [2025-01-03 20:14:14,137][123391] Fps is (10 sec: 6554.0, 60 sec: 6554.0, 300 sec: 6554.0). Total num frames: 307200. Throughput: 0: 1517.3. Samples: 7586. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-03 20:14:14,138][123391] Avg episode reward: [(0, '3.922')] [2025-01-03 20:14:14,400][123478] Updated weights for policy 0, policy_version 77 (0.0073) [2025-01-03 20:14:16,717][123478] Updated weights for policy 0, policy_version 87 (0.0012) [2025-01-03 20:14:18,977][123478] Updated weights for policy 0, policy_version 97 (0.0011) [2025-01-03 20:14:19,138][123391] Fps is (10 sec: 12288.2, 60 sec: 12288.2, 300 sec: 12288.2). Total num frames: 397312. Throughput: 0: 2098.6. Samples: 20986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:14:19,138][123391] Avg episode reward: [(0, '4.210')] [2025-01-03 20:14:21,279][123478] Updated weights for policy 0, policy_version 107 (0.0012) [2025-01-03 20:14:22,874][123391] Heartbeat connected on Batcher_0 [2025-01-03 20:14:22,877][123391] Heartbeat connected on LearnerWorker_p0 [2025-01-03 20:14:22,886][123391] Heartbeat connected on RolloutWorker_w0 [2025-01-03 20:14:22,887][123391] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-03 20:14:22,889][123391] Heartbeat connected on RolloutWorker_w1 [2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w3 [2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w2 [2025-01-03 20:14:22,895][123391] Heartbeat connected on RolloutWorker_w4 [2025-01-03 20:14:22,899][123391] Heartbeat connected on RolloutWorker_w5 [2025-01-03 20:14:22,904][123391] Heartbeat connected on RolloutWorker_w6 [2025-01-03 20:14:22,907][123391] Heartbeat connected on RolloutWorker_w7 [2025-01-03 20:14:23,627][123478] Updated weights for policy 0, policy_version 117 (0.0012) [2025-01-03 20:14:24,137][123391] Fps is (10 sec: 18022.4, 60 sec: 14199.7, 300 sec: 14199.7). Total num frames: 487424. Throughput: 0: 3190.3. Samples: 47854. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:14:24,138][123391] Avg episode reward: [(0, '4.456')] [2025-01-03 20:14:26,023][123478] Updated weights for policy 0, policy_version 127 (0.0012) [2025-01-03 20:14:28,363][123478] Updated weights for policy 0, policy_version 137 (0.0012) [2025-01-03 20:14:29,137][123391] Fps is (10 sec: 17613.0, 60 sec: 14950.6, 300 sec: 14950.6). Total num frames: 573440. Throughput: 0: 3672.1. Samples: 73442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:14:29,138][123391] Avg episode reward: [(0, '4.283')] [2025-01-03 20:14:30,815][123478] Updated weights for policy 0, policy_version 147 (0.0012) [2025-01-03 20:14:33,155][123478] Updated weights for policy 0, policy_version 157 (0.0012) [2025-01-03 20:14:34,137][123391] Fps is (10 sec: 17203.2, 60 sec: 15401.1, 300 sec: 15401.1). Total num frames: 659456. Throughput: 0: 3449.0. Samples: 86224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:14:34,138][123391] Avg episode reward: [(0, '4.495')] [2025-01-03 20:14:35,511][123478] Updated weights for policy 0, policy_version 167 (0.0011) [2025-01-03 20:14:37,942][123478] Updated weights for policy 0, policy_version 177 (0.0012) [2025-01-03 20:14:39,137][123391] Fps is (10 sec: 16793.6, 60 sec: 15564.9, 300 sec: 15564.9). Total num frames: 741376. Throughput: 0: 3746.0. Samples: 112380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:14:39,138][123391] Avg episode reward: [(0, '4.625')] [2025-01-03 20:14:39,156][123451] Saving new best policy, reward=4.625! [2025-01-03 20:14:40,469][123478] Updated weights for policy 0, policy_version 187 (0.0014) [2025-01-03 20:14:42,837][123478] Updated weights for policy 0, policy_version 197 (0.0012) [2025-01-03 20:14:44,137][123391] Fps is (10 sec: 16793.7, 60 sec: 15799.0, 300 sec: 15799.0). Total num frames: 827392. Throughput: 0: 3925.6. Samples: 137394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:14:44,138][123391] Avg episode reward: [(0, '4.720')] [2025-01-03 20:14:44,138][123451] Saving new best policy, reward=4.720! [2025-01-03 20:14:45,375][123478] Updated weights for policy 0, policy_version 207 (0.0012) [2025-01-03 20:14:47,712][123478] Updated weights for policy 0, policy_version 217 (0.0011) [2025-01-03 20:14:49,137][123391] Fps is (10 sec: 17203.2, 60 sec: 15974.5, 300 sec: 15974.5). Total num frames: 913408. Throughput: 0: 3743.6. Samples: 149744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:14:49,138][123391] Avg episode reward: [(0, '5.010')] [2025-01-03 20:14:49,142][123451] Saving new best policy, reward=5.010! [2025-01-03 20:14:50,110][123478] Updated weights for policy 0, policy_version 227 (0.0012) [2025-01-03 20:14:52,522][123478] Updated weights for policy 0, policy_version 237 (0.0011) [2025-01-03 20:14:54,137][123391] Fps is (10 sec: 16793.5, 60 sec: 16020.0, 300 sec: 16020.0). Total num frames: 995328. Throughput: 0: 3898.7. Samples: 175440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:14:54,138][123391] Avg episode reward: [(0, '4.543')] [2025-01-03 20:14:54,913][123478] Updated weights for policy 0, policy_version 247 (0.0012) [2025-01-03 20:14:57,287][123478] Updated weights for policy 0, policy_version 257 (0.0012) [2025-01-03 20:14:59,137][123391] Fps is (10 sec: 16793.8, 60 sec: 16138.4, 300 sec: 16138.4). Total num frames: 1081344. Throughput: 0: 4301.7. Samples: 201162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:14:59,138][123391] Avg episode reward: [(0, '4.875')] [2025-01-03 20:14:59,691][123478] Updated weights for policy 0, policy_version 267 (0.0011) [2025-01-03 20:15:02,017][123478] Updated weights for policy 0, policy_version 277 (0.0011) [2025-01-03 20:15:04,137][123391] Fps is (10 sec: 17203.3, 60 sec: 16235.1, 300 sec: 16235.1). Total num frames: 1167360. Throughput: 0: 4294.1. Samples: 214220. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:04,138][123391] Avg episode reward: [(0, '4.662')] [2025-01-03 20:15:04,495][123478] Updated weights for policy 0, policy_version 287 (0.0011) [2025-01-03 20:15:07,076][123478] Updated weights for policy 0, policy_version 297 (0.0012) [2025-01-03 20:15:09,137][123391] Fps is (10 sec: 16384.0, 60 sec: 16179.3, 300 sec: 16179.3). Total num frames: 1245184. Throughput: 0: 4236.5. Samples: 238494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:09,138][123391] Avg episode reward: [(0, '4.642')] [2025-01-03 20:15:09,749][123478] Updated weights for policy 0, policy_version 307 (0.0013) [2025-01-03 20:15:12,292][123478] Updated weights for policy 0, policy_version 317 (0.0012) [2025-01-03 20:15:14,137][123391] Fps is (10 sec: 15974.4, 60 sec: 16998.4, 300 sec: 16195.0). Total num frames: 1327104. Throughput: 0: 4197.5. Samples: 262328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:15:14,138][123391] Avg episode reward: [(0, '4.805')] [2025-01-03 20:15:14,850][123478] Updated weights for policy 0, policy_version 327 (0.0013) [2025-01-03 20:15:17,316][123478] Updated weights for policy 0, policy_version 337 (0.0013) [2025-01-03 20:15:19,137][123391] Fps is (10 sec: 16383.8, 60 sec: 16861.9, 300 sec: 16208.5). Total num frames: 1409024. Throughput: 0: 4182.4. Samples: 274432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:15:19,138][123391] Avg episode reward: [(0, '4.847')] [2025-01-03 20:15:19,645][123478] Updated weights for policy 0, policy_version 347 (0.0012) [2025-01-03 20:15:21,945][123478] Updated weights for policy 0, policy_version 357 (0.0011) [2025-01-03 20:15:24,137][123391] Fps is (10 sec: 17203.2, 60 sec: 16861.9, 300 sec: 16329.4). Total num frames: 1499136. Throughput: 0: 4191.3. Samples: 300988. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:15:24,138][123391] Avg episode reward: [(0, '4.795')] [2025-01-03 20:15:24,234][123478] Updated weights for policy 0, policy_version 367 (0.0011) [2025-01-03 20:15:26,560][123478] Updated weights for policy 0, policy_version 377 (0.0012) [2025-01-03 20:15:28,869][123478] Updated weights for policy 0, policy_version 387 (0.0011) [2025-01-03 20:15:29,137][123391] Fps is (10 sec: 18022.6, 60 sec: 16930.2, 300 sec: 16435.3). Total num frames: 1589248. Throughput: 0: 4224.0. Samples: 327472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:29,138][123391] Avg episode reward: [(0, '4.798')] [2025-01-03 20:15:31,239][123478] Updated weights for policy 0, policy_version 397 (0.0012) [2025-01-03 20:15:33,572][123478] Updated weights for policy 0, policy_version 407 (0.0011) [2025-01-03 20:15:34,137][123391] Fps is (10 sec: 17612.8, 60 sec: 16930.1, 300 sec: 16480.4). Total num frames: 1675264. Throughput: 0: 4239.5. Samples: 340520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:34,138][123391] Avg episode reward: [(0, '4.693')] [2025-01-03 20:15:35,909][123478] Updated weights for policy 0, policy_version 417 (0.0011) [2025-01-03 20:15:38,226][123478] Updated weights for policy 0, policy_version 427 (0.0011) [2025-01-03 20:15:39,137][123391] Fps is (10 sec: 17203.3, 60 sec: 16998.4, 300 sec: 16520.6). Total num frames: 1761280. Throughput: 0: 4256.2. Samples: 366970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:15:39,138][123391] Avg episode reward: [(0, '4.856')] [2025-01-03 20:15:40,590][123478] Updated weights for policy 0, policy_version 437 (0.0011) [2025-01-03 20:15:42,941][123478] Updated weights for policy 0, policy_version 447 (0.0011) [2025-01-03 20:15:44,137][123391] Fps is (10 sec: 17203.1, 60 sec: 16998.4, 300 sec: 16556.5). Total num frames: 1847296. Throughput: 0: 4262.8. Samples: 392988. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:44,138][123391] Avg episode reward: [(0, '4.676')] [2025-01-03 20:15:45,299][123478] Updated weights for policy 0, policy_version 457 (0.0011) [2025-01-03 20:15:48,061][123478] Updated weights for policy 0, policy_version 467 (0.0017) [2025-01-03 20:15:49,138][123391] Fps is (10 sec: 15973.6, 60 sec: 16793.5, 300 sec: 16465.9). Total num frames: 1921024. Throughput: 0: 4265.5. Samples: 406168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:15:49,139][123391] Avg episode reward: [(0, '4.536')] [2025-01-03 20:15:52,516][123478] Updated weights for policy 0, policy_version 477 (0.0027) [2025-01-03 20:15:54,138][123391] Fps is (10 sec: 11878.0, 60 sec: 16179.1, 300 sec: 16110.9). Total num frames: 1966080. Throughput: 0: 4052.7. Samples: 420866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:15:54,138][123391] Avg episode reward: [(0, '4.844')] [2025-01-03 20:15:56,650][123478] Updated weights for policy 0, policy_version 487 (0.0025) [2025-01-03 20:15:59,138][123391] Fps is (10 sec: 9830.6, 60 sec: 15633.0, 300 sec: 15862.7). Total num frames: 2019328. Throughput: 0: 3855.9. Samples: 435846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:15:59,138][123391] Avg episode reward: [(0, '4.772')] [2025-01-03 20:15:59,148][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth... [2025-01-03 20:16:00,858][123478] Updated weights for policy 0, policy_version 497 (0.0025) [2025-01-03 20:16:04,138][123391] Fps is (10 sec: 10239.9, 60 sec: 15018.6, 300 sec: 15600.4). Total num frames: 2068480. Throughput: 0: 3751.9. Samples: 443270. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:04,139][123391] Avg episode reward: [(0, '4.749')] [2025-01-03 20:16:04,827][123478] Updated weights for policy 0, policy_version 507 (0.0026) [2025-01-03 20:16:08,674][123478] Updated weights for policy 0, policy_version 517 (0.0023) [2025-01-03 20:16:09,138][123391] Fps is (10 sec: 10239.8, 60 sec: 14609.0, 300 sec: 15394.1). Total num frames: 2121728. Throughput: 0: 3511.4. Samples: 459002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:09,138][123391] Avg episode reward: [(0, '4.824')] [2025-01-03 20:16:12,938][123478] Updated weights for policy 0, policy_version 527 (0.0024) [2025-01-03 20:16:14,137][123391] Fps is (10 sec: 10240.4, 60 sec: 14062.9, 300 sec: 15171.6). Total num frames: 2170880. Throughput: 0: 3250.3. Samples: 473734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:14,138][123391] Avg episode reward: [(0, '4.571')] [2025-01-03 20:16:16,715][123478] Updated weights for policy 0, policy_version 537 (0.0023) [2025-01-03 20:16:19,138][123391] Fps is (10 sec: 10240.0, 60 sec: 13585.0, 300 sec: 14997.7). Total num frames: 2224128. Throughput: 0: 3143.6. Samples: 481984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:19,138][123391] Avg episode reward: [(0, '4.625')] [2025-01-03 20:16:20,795][123478] Updated weights for policy 0, policy_version 547 (0.0024) [2025-01-03 20:16:24,138][123391] Fps is (10 sec: 10239.8, 60 sec: 12902.3, 300 sec: 14806.3). Total num frames: 2273280. Throughput: 0: 2891.5. Samples: 497090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:24,138][123391] Avg episode reward: [(0, '4.503')] [2025-01-03 20:16:24,861][123478] Updated weights for policy 0, policy_version 557 (0.0024) [2025-01-03 20:16:28,703][123478] Updated weights for policy 0, policy_version 567 (0.0024) [2025-01-03 20:16:29,137][123391] Fps is (10 sec: 10240.2, 60 sec: 12288.0, 300 sec: 14657.8). Total num frames: 2326528. 
Throughput: 0: 2660.9. Samples: 512730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:29,138][123391] Avg episode reward: [(0, '4.680')] [2025-01-03 20:16:32,488][123478] Updated weights for policy 0, policy_version 577 (0.0022) [2025-01-03 20:16:34,138][123391] Fps is (10 sec: 10649.7, 60 sec: 11741.8, 300 sec: 14519.6). Total num frames: 2379776. Throughput: 0: 2550.3. Samples: 520930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:34,138][123391] Avg episode reward: [(0, '4.610')] [2025-01-03 20:16:36,211][123478] Updated weights for policy 0, policy_version 587 (0.0024) [2025-01-03 20:16:39,138][123391] Fps is (10 sec: 10649.5, 60 sec: 11195.7, 300 sec: 14390.6). Total num frames: 2433024. Throughput: 0: 2579.0. Samples: 536922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 20:16:39,138][123391] Avg episode reward: [(0, '4.749')] [2025-01-03 20:16:40,321][123478] Updated weights for policy 0, policy_version 597 (0.0024) [2025-01-03 20:16:44,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10581.3, 300 sec: 14243.5). Total num frames: 2482176. Throughput: 0: 2590.7. Samples: 552430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:16:44,138][123391] Avg episode reward: [(0, '4.684')] [2025-01-03 20:16:44,172][123478] Updated weights for policy 0, policy_version 607 (0.0023) [2025-01-03 20:16:48,209][123478] Updated weights for policy 0, policy_version 617 (0.0025) [2025-01-03 20:16:49,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10240.0, 300 sec: 14131.2). Total num frames: 2535424. Throughput: 0: 2596.8. Samples: 560124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:16:49,139][123391] Avg episode reward: [(0, '4.515')] [2025-01-03 20:16:52,159][123478] Updated weights for policy 0, policy_version 627 (0.0023) [2025-01-03 20:16:54,138][123391] Fps is (10 sec: 10649.8, 60 sec: 10376.6, 300 sec: 14025.7). Total num frames: 2588672. Throughput: 0: 2587.8. Samples: 575452. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:16:54,138][123391] Avg episode reward: [(0, '4.768')] [2025-01-03 20:16:56,079][123478] Updated weights for policy 0, policy_version 637 (0.0024) [2025-01-03 20:16:59,138][123391] Fps is (10 sec: 10240.0, 60 sec: 10308.2, 300 sec: 13902.3). Total num frames: 2637824. Throughput: 0: 2610.7. Samples: 591214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:16:59,138][123391] Avg episode reward: [(0, '4.684')] [2025-01-03 20:17:00,099][123478] Updated weights for policy 0, policy_version 647 (0.0025) [2025-01-03 20:17:04,122][123478] Updated weights for policy 0, policy_version 657 (0.0024) [2025-01-03 20:17:04,138][123391] Fps is (10 sec: 10239.9, 60 sec: 10376.6, 300 sec: 13809.4). Total num frames: 2691072. Throughput: 0: 2594.9. Samples: 598754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:04,138][123391] Avg episode reward: [(0, '4.632')] [2025-01-03 20:17:07,892][123478] Updated weights for policy 0, policy_version 667 (0.0023) [2025-01-03 20:17:09,138][123391] Fps is (10 sec: 10240.0, 60 sec: 10308.3, 300 sec: 13698.8). Total num frames: 2740224. Throughput: 0: 2609.0. Samples: 614494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:09,138][123391] Avg episode reward: [(0, '4.702')] [2025-01-03 20:17:12,102][123478] Updated weights for policy 0, policy_version 677 (0.0024) [2025-01-03 20:17:14,137][123391] Fps is (10 sec: 11059.5, 60 sec: 10513.1, 300 sec: 13660.7). Total num frames: 2801664. Throughput: 0: 2618.0. Samples: 630540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:14,138][123391] Avg episode reward: [(0, '4.415')] [2025-01-03 20:17:14,820][123478] Updated weights for policy 0, policy_version 687 (0.0014) [2025-01-03 20:17:17,364][123478] Updated weights for policy 0, policy_version 697 (0.0013) [2025-01-03 20:17:19,137][123391] Fps is (10 sec: 13926.8, 60 sec: 10922.7, 300 sec: 13710.8). Total num frames: 2879488. 
Throughput: 0: 2716.2. Samples: 643160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:17:19,138][123391] Avg episode reward: [(0, '4.649')] [2025-01-03 20:17:20,477][123478] Updated weights for policy 0, policy_version 707 (0.0019) [2025-01-03 20:17:24,138][123391] Fps is (10 sec: 13107.0, 60 sec: 10991.0, 300 sec: 13632.3). Total num frames: 2932736. Throughput: 0: 2785.0. Samples: 662248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:24,138][123391] Avg episode reward: [(0, '4.683')] [2025-01-03 20:17:24,300][123478] Updated weights for policy 0, policy_version 717 (0.0024) [2025-01-03 20:17:28,473][123478] Updated weights for policy 0, policy_version 727 (0.0024) [2025-01-03 20:17:29,138][123391] Fps is (10 sec: 10239.8, 60 sec: 10922.6, 300 sec: 13537.3). Total num frames: 2981888. Throughput: 0: 2769.2. Samples: 677044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:29,138][123391] Avg episode reward: [(0, '4.800')] [2025-01-03 20:17:31,355][123478] Updated weights for policy 0, policy_version 737 (0.0016) [2025-01-03 20:17:33,599][123478] Updated weights for policy 0, policy_version 747 (0.0012) [2025-01-03 20:17:34,138][123391] Fps is (10 sec: 13516.8, 60 sec: 11468.8, 300 sec: 13626.7). Total num frames: 3067904. Throughput: 0: 2845.3. Samples: 688162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:17:34,138][123391] Avg episode reward: [(0, '4.541')] [2025-01-03 20:17:36,901][123478] Updated weights for policy 0, policy_version 757 (0.0021) [2025-01-03 20:17:39,138][123391] Fps is (10 sec: 13926.3, 60 sec: 11468.8, 300 sec: 13555.8). Total num frames: 3121152. Throughput: 0: 2971.0. Samples: 709146. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:39,138][123391] Avg episode reward: [(0, '4.735')] [2025-01-03 20:17:40,906][123478] Updated weights for policy 0, policy_version 767 (0.0025) [2025-01-03 20:17:43,752][123478] Updated weights for policy 0, policy_version 777 (0.0016) [2025-01-03 20:17:44,137][123391] Fps is (10 sec: 11878.7, 60 sec: 11742.0, 300 sec: 13545.4). Total num frames: 3186688. Throughput: 0: 3026.3. Samples: 727394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:44,138][123391] Avg episode reward: [(0, '4.487')] [2025-01-03 20:17:45,945][123478] Updated weights for policy 0, policy_version 787 (0.0012) [2025-01-03 20:17:49,138][123391] Fps is (10 sec: 13926.4, 60 sec: 12083.2, 300 sec: 13572.7). Total num frames: 3260416. Throughput: 0: 3154.9. Samples: 740726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:17:49,138][123391] Avg episode reward: [(0, '4.586')] [2025-01-03 20:17:49,351][123478] Updated weights for policy 0, policy_version 797 (0.0021) [2025-01-03 20:17:53,110][123478] Updated weights for policy 0, policy_version 807 (0.0023) [2025-01-03 20:17:54,138][123391] Fps is (10 sec: 12697.2, 60 sec: 12083.2, 300 sec: 13507.7). Total num frames: 3313664. Throughput: 0: 3174.2. Samples: 757334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:17:54,138][123391] Avg episode reward: [(0, '4.911')] [2025-01-03 20:17:56,800][123478] Updated weights for policy 0, policy_version 817 (0.0023) [2025-01-03 20:17:59,138][123391] Fps is (10 sec: 11059.2, 60 sec: 12219.7, 300 sec: 13463.4). Total num frames: 3371008. Throughput: 0: 3184.1. Samples: 773826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:17:59,138][123391] Avg episode reward: [(0, '4.832')] [2025-01-03 20:17:59,146][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth... 
[2025-01-03 20:17:59,216][123451] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_274432.pth [2025-01-03 20:18:00,633][123478] Updated weights for policy 0, policy_version 827 (0.0023) [2025-01-03 20:18:04,138][123391] Fps is (10 sec: 11059.2, 60 sec: 12219.8, 300 sec: 13403.5). Total num frames: 3424256. Throughput: 0: 3083.8. Samples: 781930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:18:04,138][123391] Avg episode reward: [(0, '4.782')] [2025-01-03 20:18:04,337][123478] Updated weights for policy 0, policy_version 837 (0.0024) [2025-01-03 20:18:08,141][123478] Updated weights for policy 0, policy_version 847 (0.0023) [2025-01-03 20:18:09,137][123391] Fps is (10 sec: 11469.2, 60 sec: 12424.6, 300 sec: 13380.3). Total num frames: 3485696. Throughput: 0: 3015.2. Samples: 797932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:18:09,138][123391] Avg episode reward: [(0, '4.528')] [2025-01-03 20:18:10,675][123478] Updated weights for policy 0, policy_version 857 (0.0015) [2025-01-03 20:18:14,137][123391] Fps is (10 sec: 12288.1, 60 sec: 12424.5, 300 sec: 13358.0). Total num frames: 3547136. Throughput: 0: 3142.4. Samples: 818452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:18:14,138][123391] Avg episode reward: [(0, '4.461')] [2025-01-03 20:18:14,195][123478] Updated weights for policy 0, policy_version 867 (0.0022) [2025-01-03 20:18:17,796][123478] Updated weights for policy 0, policy_version 877 (0.0023) [2025-01-03 20:18:19,138][123391] Fps is (10 sec: 11878.1, 60 sec: 12083.2, 300 sec: 13320.2). Total num frames: 3604480. Throughput: 0: 3085.5. Samples: 827008. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:18:19,138][123391] Avg episode reward: [(0, '4.579')] [2025-01-03 20:18:21,491][123478] Updated weights for policy 0, policy_version 887 (0.0022) [2025-01-03 20:18:24,138][123391] Fps is (10 sec: 11058.9, 60 sec: 12083.2, 300 sec: 13267.8). Total num frames: 3657728. Throughput: 0: 2987.6. Samples: 843590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:18:24,139][123391] Avg episode reward: [(0, '4.715')] [2025-01-03 20:18:25,485][123478] Updated weights for policy 0, policy_version 897 (0.0024) [2025-01-03 20:18:28,544][123478] Updated weights for policy 0, policy_version 907 (0.0016) [2025-01-03 20:18:29,137][123391] Fps is (10 sec: 11878.7, 60 sec: 12356.3, 300 sec: 13264.8). Total num frames: 3723264. Throughput: 0: 2980.1. Samples: 861498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:18:29,138][123391] Avg episode reward: [(0, '4.847')] [2025-01-03 20:18:30,846][123478] Updated weights for policy 0, policy_version 917 (0.0012) [2025-01-03 20:18:33,146][123478] Updated weights for policy 0, policy_version 927 (0.0012) [2025-01-03 20:18:34,137][123391] Fps is (10 sec: 15565.4, 60 sec: 12424.6, 300 sec: 13354.5). Total num frames: 3813376. Throughput: 0: 2986.6. Samples: 875122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:18:34,138][123391] Avg episode reward: [(0, '4.613')] [2025-01-03 20:18:35,449][123478] Updated weights for policy 0, policy_version 937 (0.0012) [2025-01-03 20:18:37,765][123478] Updated weights for policy 0, policy_version 947 (0.0012) [2025-01-03 20:18:39,138][123391] Fps is (10 sec: 16793.1, 60 sec: 12834.1, 300 sec: 13395.4). Total num frames: 3891200. Throughput: 0: 3205.3. Samples: 901574. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:18:39,138][123391] Avg episode reward: [(0, '4.682')] [2025-01-03 20:18:41,974][123478] Updated weights for policy 0, policy_version 957 (0.0027) [2025-01-03 20:18:44,138][123391] Fps is (10 sec: 12287.6, 60 sec: 12492.7, 300 sec: 13315.7). Total num frames: 3936256. Throughput: 0: 3158.8. Samples: 915972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:18:44,138][123391] Avg episode reward: [(0, '4.496')] [2025-01-03 20:18:46,251][123478] Updated weights for policy 0, policy_version 967 (0.0027) [2025-01-03 20:18:49,138][123391] Fps is (10 sec: 9830.5, 60 sec: 12151.5, 300 sec: 13268.1). Total num frames: 3989504. Throughput: 0: 3144.5. Samples: 923432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:18:49,138][123391] Avg episode reward: [(0, '4.729')] [2025-01-03 20:18:50,000][123478] Updated weights for policy 0, policy_version 977 (0.0022) [2025-01-03 20:18:50,382][123451] Stopping Batcher_0... [2025-01-03 20:18:50,382][123391] Component Batcher_0 stopped! [2025-01-03 20:18:50,383][123451] Loop batcher_evt_loop terminating... [2025-01-03 20:18:50,384][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-03 20:18:50,418][123478] Weights refcount: 2 0 [2025-01-03 20:18:50,421][123478] Stopping InferenceWorker_p0-w0... [2025-01-03 20:18:50,422][123478] Loop inference_proc0-0_evt_loop terminating... [2025-01-03 20:18:50,424][123391] Component InferenceWorker_p0-w0 stopped! [2025-01-03 20:18:50,467][123479] Stopping RolloutWorker_w1... [2025-01-03 20:18:50,467][123391] Component RolloutWorker_w1 stopped! [2025-01-03 20:18:50,467][123482] Stopping RolloutWorker_w5... [2025-01-03 20:18:50,468][123391] Component RolloutWorker_w5 stopped! [2025-01-03 20:18:50,468][123482] Loop rollout_proc5_evt_loop terminating... 
[2025-01-03 20:18:50,468][123479] Loop rollout_proc1_evt_loop terminating... [2025-01-03 20:18:50,469][123451] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000493_2019328.pth [2025-01-03 20:18:50,470][123480] Stopping RolloutWorker_w2... [2025-01-03 20:18:50,470][123391] Component RolloutWorker_w2 stopped! [2025-01-03 20:18:50,471][123483] Stopping RolloutWorker_w4... [2025-01-03 20:18:50,471][123451] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-03 20:18:50,471][123480] Loop rollout_proc2_evt_loop terminating... [2025-01-03 20:18:50,471][123495] Stopping RolloutWorker_w7... [2025-01-03 20:18:50,472][123391] Component RolloutWorker_w4 stopped! [2025-01-03 20:18:50,472][123483] Loop rollout_proc4_evt_loop terminating... [2025-01-03 20:18:50,472][123495] Loop rollout_proc7_evt_loop terminating... [2025-01-03 20:18:50,472][123391] Component RolloutWorker_w7 stopped! [2025-01-03 20:18:50,473][123496] Stopping RolloutWorker_w6... [2025-01-03 20:18:50,473][123391] Component RolloutWorker_w6 stopped! [2025-01-03 20:18:50,474][123496] Loop rollout_proc6_evt_loop terminating... [2025-01-03 20:18:50,476][123391] Component RolloutWorker_w0 stopped! [2025-01-03 20:18:50,475][123477] Stopping RolloutWorker_w0... [2025-01-03 20:18:50,477][123477] Loop rollout_proc0_evt_loop terminating... [2025-01-03 20:18:50,478][123481] Stopping RolloutWorker_w3... [2025-01-03 20:18:50,478][123391] Component RolloutWorker_w3 stopped! [2025-01-03 20:18:50,478][123481] Loop rollout_proc3_evt_loop terminating... [2025-01-03 20:18:50,560][123451] Stopping LearnerWorker_p0... [2025-01-03 20:18:50,560][123391] Component LearnerWorker_p0 stopped! [2025-01-03 20:18:50,560][123451] Loop learner_proc0_evt_loop terminating... [2025-01-03 20:18:50,566][123391] Waiting for process learner_proc0 to stop... 
[2025-01-03 20:18:51,932][123391] Waiting for process inference_proc0-0 to join... [2025-01-03 20:18:51,932][123391] Waiting for process rollout_proc0 to join... [2025-01-03 20:18:51,932][123391] Waiting for process rollout_proc1 to join... [2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc2 to join... [2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc3 to join... [2025-01-03 20:18:51,933][123391] Waiting for process rollout_proc4 to join... [2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc5 to join... [2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc6 to join... [2025-01-03 20:18:51,934][123391] Waiting for process rollout_proc7 to join... [2025-01-03 20:18:51,935][123391] Batcher 0 profile tree view: batching: 12.4530, releasing_batches: 0.0290 [2025-01-03 20:18:51,935][123391] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 5.2467 update_model: 4.3349 weight_update: 0.0023 one_step: 0.0058 handle_policy_step: 257.5488 deserialize: 9.9140, stack: 1.5193, obs_to_device_normalize: 62.8038, forward: 117.5268, send_messages: 18.9620 prepare_outputs: 36.3900 to_cpu: 24.3378 [2025-01-03 20:18:51,935][123391] Learner 0 profile tree view: misc: 0.0043, prepare_batch: 12.7592 train: 63.0072 epoch_init: 0.0058, minibatch_init: 0.0066, losses_postprocess: 0.3273, kl_divergence: 0.3661, after_optimizer: 1.2819 calculate_losses: 21.5594 losses_init: 0.0036, forward_head: 1.1328, bptt_initial: 15.3543, tail: 0.6870, advantages_returns: 0.1975, losses: 2.7943 bptt: 1.1874 bptt_forward_core: 1.1200 update: 39.0545 clip: 0.8884 [2025-01-03 20:18:51,936][123391] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.1788, enqueue_policy_requests: 12.9224, env_step: 155.4093, overhead: 8.7577, complete_rollouts: 0.2908 save_policy_outputs: 14.6290 split_output_tensors: 4.7553 [2025-01-03 20:18:51,936][123391] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.1781, 
enqueue_policy_requests: 13.0374, env_step: 155.4683, overhead: 8.8047, complete_rollouts: 0.2857 save_policy_outputs: 14.5489 split_output_tensors: 4.7563 [2025-01-03 20:18:51,936][123391] Loop Runner_EvtLoop terminating... [2025-01-03 20:18:51,937][123391] Runner profile tree view: main_loop: 289.0310 [2025-01-03 20:18:51,937][123391] Collected {0: 4005888}, FPS: 12910.2 [2025-01-03 20:18:52,336][123391] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-03 20:18:52,336][123391] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-03 20:18:52,336][123391] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-03 20:18:52,336][123391] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-03 20:18:52,337][123391] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-03 20:18:52,338][123391] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-03 20:18:52,338][123391] Adding new argument 'train_script'=None that is not in the saved config file! 
[2025-01-03 20:18:52,338][123391] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-03 20:18:52,338][123391] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-03 20:18:52,372][123391] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:18:52,374][123391] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:18:52,375][123391] RunningMeanStd input shape: (1,) [2025-01-03 20:18:52,389][123391] ConvEncoder: input_channels=3 [2025-01-03 20:18:52,527][123391] Conv encoder output size: 512 [2025-01-03 20:18:52,527][123391] Policy head output size: 512 [2025-01-03 20:18:52,687][123391] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-03 20:18:53,452][123391] Num frames 100... [2025-01-03 20:18:53,571][123391] Num frames 200... [2025-01-03 20:18:53,697][123391] Num frames 300... [2025-01-03 20:18:53,852][123391] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2025-01-03 20:18:53,853][123391] Avg episode reward: 3.840, avg true_objective: 3.840 [2025-01-03 20:18:53,903][123391] Num frames 400... [2025-01-03 20:18:54,017][123391] Num frames 500... [2025-01-03 20:18:54,127][123391] Num frames 600... [2025-01-03 20:18:54,240][123391] Num frames 700... [2025-01-03 20:18:54,350][123391] Num frames 800... [2025-01-03 20:18:54,478][123391] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 [2025-01-03 20:18:54,478][123391] Avg episode reward: 5.320, avg true_objective: 4.320 [2025-01-03 20:18:54,518][123391] Num frames 900... [2025-01-03 20:18:54,620][123391] Num frames 1000... [2025-01-03 20:18:54,725][123391] Num frames 1100... [2025-01-03 20:18:54,823][123391] Num frames 1200... [2025-01-03 20:18:54,924][123391] Num frames 1300... [2025-01-03 20:18:55,022][123391] Num frames 1400... 
[2025-01-03 20:18:55,084][123391] Avg episode rewards: #0: 6.027, true rewards: #0: 4.693 [2025-01-03 20:18:55,084][123391] Avg episode reward: 6.027, avg true_objective: 4.693 [2025-01-03 20:18:55,182][123391] Num frames 1500... [2025-01-03 20:18:55,286][123391] Num frames 1600... [2025-01-03 20:18:55,389][123391] Num frames 1700... [2025-01-03 20:18:55,531][123391] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2025-01-03 20:18:55,531][123391] Avg episode reward: 5.480, avg true_objective: 4.480 [2025-01-03 20:18:55,545][123391] Num frames 1800... [2025-01-03 20:18:55,649][123391] Num frames 1900... [2025-01-03 20:18:55,748][123391] Num frames 2000... [2025-01-03 20:18:55,843][123391] Num frames 2100... [2025-01-03 20:18:55,967][123391] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352 [2025-01-03 20:18:55,968][123391] Avg episode reward: 5.152, avg true_objective: 4.352 [2025-01-03 20:18:55,998][123391] Num frames 2200... [2025-01-03 20:18:56,105][123391] Num frames 2300... [2025-01-03 20:18:56,203][123391] Num frames 2400... [2025-01-03 20:18:56,305][123391] Num frames 2500... [2025-01-03 20:18:56,407][123391] Num frames 2600... [2025-01-03 20:18:56,485][123391] Avg episode rewards: #0: 5.373, true rewards: #0: 4.373 [2025-01-03 20:18:56,485][123391] Avg episode reward: 5.373, avg true_objective: 4.373 [2025-01-03 20:18:56,560][123391] Num frames 2700... [2025-01-03 20:18:56,659][123391] Num frames 2800... [2025-01-03 20:18:56,760][123391] Num frames 2900... [2025-01-03 20:18:56,860][123391] Num frames 3000... [2025-01-03 20:18:56,923][123391] Avg episode rewards: #0: 5.154, true rewards: #0: 4.297 [2025-01-03 20:18:56,923][123391] Avg episode reward: 5.154, avg true_objective: 4.297 [2025-01-03 20:18:57,018][123391] Num frames 3100... [2025-01-03 20:18:57,114][123391] Num frames 3200... [2025-01-03 20:18:57,211][123391] Num frames 3300... 
[2025-01-03 20:18:57,359][123391] Avg episode rewards: #0: 4.990, true rewards: #0: 4.240 [2025-01-03 20:18:57,359][123391] Avg episode reward: 4.990, avg true_objective: 4.240 [2025-01-03 20:18:57,370][123391] Num frames 3400... [2025-01-03 20:18:57,473][123391] Num frames 3500... [2025-01-03 20:18:57,571][123391] Num frames 3600... [2025-01-03 20:18:57,669][123391] Num frames 3700... [2025-01-03 20:18:57,771][123391] Num frames 3800... [2025-01-03 20:18:57,864][123391] Avg episode rewards: #0: 5.044, true rewards: #0: 4.267 [2025-01-03 20:18:57,865][123391] Avg episode reward: 5.044, avg true_objective: 4.267 [2025-01-03 20:18:57,959][123391] Num frames 3900... [2025-01-03 20:18:58,060][123391] Num frames 4000... [2025-01-03 20:18:58,165][123391] Num frames 4100... [2025-01-03 20:18:58,267][123391] Num frames 4200... [2025-01-03 20:18:58,346][123391] Avg episode rewards: #0: 4.924, true rewards: #0: 4.224 [2025-01-03 20:18:58,346][123391] Avg episode reward: 4.924, avg true_objective: 4.224 [2025-01-03 20:19:05,243][123391] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-03 20:24:09,504][124806] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-03 20:24:09,504][124806] Rollout worker 0 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 1 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 2 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 3 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 4 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 5 uses device cpu [2025-01-03 20:24:09,505][124806] Rollout worker 6 uses device cpu [2025-01-03 20:24:09,506][124806] Rollout worker 7 uses device cpu [2025-01-03 20:24:09,546][124806] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:24:09,547][124806] InferenceWorker_p0-w0: min num requests: 2 [2025-01-03 20:24:09,578][124806] Starting all processes... [2025-01-03 20:24:09,578][124806] Starting process learner_proc0 [2025-01-03 20:24:10,976][124806] Starting all processes... [2025-01-03 20:24:10,980][124806] Starting process inference_proc0-0 [2025-01-03 20:24:10,980][124806] Starting process rollout_proc0 [2025-01-03 20:24:10,984][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:24:10,980][124806] Starting process rollout_proc1 [2025-01-03 20:24:10,984][124851] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 20:24:10,980][124806] Starting process rollout_proc2 [2025-01-03 20:24:10,984][124806] Starting process rollout_proc3 [2025-01-03 20:24:10,984][124806] Starting process rollout_proc4 [2025-01-03 20:24:11,000][124851] Num visible devices: 1 [2025-01-03 20:24:10,984][124806] Starting process rollout_proc5 [2025-01-03 20:24:10,984][124806] Starting process rollout_proc6 [2025-01-03 20:24:11,010][124851] Starting seed is not provided [2025-01-03 20:24:11,010][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:24:11,010][124851] Initializing actor-critic model on device cuda:0 [2025-01-03 20:24:11,011][124851] RunningMeanStd input shape: (3, 72, 128) 
[2025-01-03 20:24:11,011][124851] RunningMeanStd input shape: (1,) [2025-01-03 20:24:10,984][124806] Starting process rollout_proc7 [2025-01-03 20:24:11,022][124851] ConvEncoder: input_channels=3 [2025-01-03 20:24:11,129][124851] Conv encoder output size: 512 [2025-01-03 20:24:11,129][124851] Policy head output size: 512 [2025-01-03 20:24:11,152][124851] Created Actor Critic model with architecture: [2025-01-03 20:24:11,153][124851] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 20:24:11,314][124851] Using optimizer [2025-01-03 20:24:12,887][124851] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
[2025-01-03 20:24:12,971][124851] Loading model from checkpoint [2025-01-03 20:24:12,973][124851] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2025-01-03 20:24:12,974][124851] Initialized policy 0 weights for model version 978 [2025-01-03 20:24:12,976][124851] LearnerWorker_p0 finished initialization! [2025-01-03 20:24:12,977][124851] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:24:13,260][124876] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,269][124894] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,306][124878] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,403][124886] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,487][124895] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,524][124889] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,548][124893] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,643][124875] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:24:13,643][124875] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-03 20:24:13,655][124875] Num visible devices: 1 [2025-01-03 20:24:13,672][124806] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 20:24:13,677][124877] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:24:13,737][124875] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:24:13,738][124875] RunningMeanStd input shape: (1,) [2025-01-03 20:24:13,746][124875] ConvEncoder: input_channels=3 [2025-01-03 20:24:13,832][124875] Conv encoder output size: 512 [2025-01-03 20:24:13,832][124875] Policy head output size: 512 [2025-01-03 20:24:13,856][124806] Inference worker 0-0 is ready! [2025-01-03 20:24:13,857][124806] All inference workers are ready! Signal rollout workers to start! [2025-01-03 20:24:13,888][124886] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,889][124877] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124876] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124889] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124895] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124894] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124878] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:13,905][124893] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:14,164][124886] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,168][124877] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,181][124889] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,181][124878] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,182][124893] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,281][124876] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,283][124895] Decorrelating experience for 0 frames... [2025-01-03 20:24:14,402][124889] Decorrelating experience for 32 frames... 
[2025-01-03 20:24:14,405][124893] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,408][124878] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,521][124876] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,637][124886] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,641][124877] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,654][124895] Decorrelating experience for 32 frames... [2025-01-03 20:24:14,872][124878] Decorrelating experience for 64 frames... [2025-01-03 20:24:14,883][124889] Decorrelating experience for 64 frames... [2025-01-03 20:24:14,916][124877] Decorrelating experience for 64 frames... [2025-01-03 20:24:14,919][124886] Decorrelating experience for 64 frames... [2025-01-03 20:24:15,129][124894] Decorrelating experience for 0 frames... [2025-01-03 20:24:15,143][124878] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,152][124889] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,160][124895] Decorrelating experience for 64 frames... [2025-01-03 20:24:15,192][124886] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,367][124876] Decorrelating experience for 64 frames... [2025-01-03 20:24:15,372][124894] Decorrelating experience for 32 frames... [2025-01-03 20:24:15,430][124895] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,456][124877] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,634][124893] Decorrelating experience for 64 frames... [2025-01-03 20:24:15,669][124876] Decorrelating experience for 96 frames... [2025-01-03 20:24:15,720][124894] Decorrelating experience for 64 frames... [2025-01-03 20:24:15,958][124893] Decorrelating experience for 96 frames... [2025-01-03 20:24:16,049][124894] Decorrelating experience for 96 frames... [2025-01-03 20:24:16,121][124851] Signal inference workers to stop experience collection... 
[2025-01-03 20:24:16,147][124875] InferenceWorker_p0-w0: stopping experience collection [2025-01-03 20:24:16,246][124806] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 20:24:16,247][124806] Avg episode reward: [(0, '2.533')] [2025-01-03 20:24:18,392][124851] Signal inference workers to resume experience collection... [2025-01-03 20:24:18,393][124851] Stopping Batcher_0... [2025-01-03 20:24:18,393][124851] Loop batcher_evt_loop terminating... [2025-01-03 20:24:18,399][124806] Component Batcher_0 stopped! [2025-01-03 20:24:18,411][124875] Weights refcount: 2 0 [2025-01-03 20:24:18,413][124875] Stopping InferenceWorker_p0-w0... [2025-01-03 20:24:18,413][124806] Component InferenceWorker_p0-w0 stopped! [2025-01-03 20:24:18,413][124875] Loop inference_proc0-0_evt_loop terminating... [2025-01-03 20:24:18,446][124895] Stopping RolloutWorker_w7... [2025-01-03 20:24:18,446][124806] Component RolloutWorker_w7 stopped! [2025-01-03 20:24:18,447][124895] Loop rollout_proc7_evt_loop terminating... [2025-01-03 20:24:18,448][124889] Stopping RolloutWorker_w4... [2025-01-03 20:24:18,449][124889] Loop rollout_proc4_evt_loop terminating... [2025-01-03 20:24:18,449][124806] Component RolloutWorker_w4 stopped! [2025-01-03 20:24:18,450][124877] Stopping RolloutWorker_w1... [2025-01-03 20:24:18,450][124806] Component RolloutWorker_w1 stopped! [2025-01-03 20:24:18,450][124877] Loop rollout_proc1_evt_loop terminating... [2025-01-03 20:24:18,451][124806] Component RolloutWorker_w6 stopped! [2025-01-03 20:24:18,451][124893] Stopping RolloutWorker_w6... [2025-01-03 20:24:18,452][124893] Loop rollout_proc6_evt_loop terminating... [2025-01-03 20:24:18,452][124876] Stopping RolloutWorker_w0... [2025-01-03 20:24:18,452][124806] Component RolloutWorker_w0 stopped! [2025-01-03 20:24:18,453][124876] Loop rollout_proc0_evt_loop terminating... 
[2025-01-03 20:24:18,453][124806] Component RolloutWorker_w5 stopped! [2025-01-03 20:24:18,454][124878] Stopping RolloutWorker_w2... [2025-01-03 20:24:18,454][124806] Component RolloutWorker_w2 stopped! [2025-01-03 20:24:18,454][124878] Loop rollout_proc2_evt_loop terminating... [2025-01-03 20:24:18,453][124894] Stopping RolloutWorker_w5... [2025-01-03 20:24:18,457][124894] Loop rollout_proc5_evt_loop terminating... [2025-01-03 20:24:18,458][124886] Stopping RolloutWorker_w3... [2025-01-03 20:24:18,459][124806] Component RolloutWorker_w3 stopped! [2025-01-03 20:24:18,459][124886] Loop rollout_proc3_evt_loop terminating... [2025-01-03 20:24:18,841][124851] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-03 20:24:18,925][124851] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000823_3371008.pth [2025-01-03 20:24:18,927][124851] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-03 20:24:18,992][124851] Stopping LearnerWorker_p0... [2025-01-03 20:24:18,996][124851] Loop learner_proc0_evt_loop terminating... [2025-01-03 20:24:18,993][124806] Component LearnerWorker_p0 stopped! [2025-01-03 20:24:18,999][124806] Waiting for process learner_proc0 to stop... [2025-01-03 20:24:19,664][124806] Waiting for process inference_proc0-0 to join... [2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc0 to join... [2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc1 to join... [2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc2 to join... [2025-01-03 20:24:19,664][124806] Waiting for process rollout_proc3 to join... [2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc4 to join... [2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc5 to join... 
[2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc6 to join... [2025-01-03 20:24:19,665][124806] Waiting for process rollout_proc7 to join... [2025-01-03 20:24:19,665][124806] Batcher 0 profile tree view: batching: 0.0192, releasing_batches: 0.0005 [2025-01-03 20:24:19,665][124806] InferenceWorker_p0-w0 profile tree view: update_model: 0.0062 wait_policy: 0.0001 wait_policy_total: 1.3388 one_step: 0.0029 handle_policy_step: 0.9018 deserialize: 0.0248, stack: 0.0034, obs_to_device_normalize: 0.1673, forward: 0.5922, send_messages: 0.0335 prepare_outputs: 0.0583 to_cpu: 0.0322 [2025-01-03 20:24:19,666][124806] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 0.7957 train: 2.1472 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0058, after_optimizer: 0.0174 calculate_losses: 0.4651 losses_init: 0.0000, forward_head: 0.3805, bptt_initial: 0.0572, tail: 0.0086, advantages_returns: 0.0007, losses: 0.0139 bptt: 0.0037 bptt_forward_core: 0.0036 update: 1.6574 clip: 0.0233 [2025-01-03 20:24:19,666][124806] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0243, env_step: 0.2195, overhead: 0.0131, complete_rollouts: 0.0005 save_policy_outputs: 0.0223 split_output_tensors: 0.0074 [2025-01-03 20:24:19,666][124806] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0367, env_step: 0.3178, overhead: 0.0197, complete_rollouts: 0.0007 save_policy_outputs: 0.0324 split_output_tensors: 0.0107 [2025-01-03 20:24:19,666][124806] Loop Runner_EvtLoop terminating... 
[2025-01-03 20:24:19,667][124806] Runner profile tree view: main_loop: 10.0887 [2025-01-03 20:24:19,667][124806] Collected {0: 4014080}, FPS: 812.0 [2025-01-03 20:24:19,872][124806] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-03 20:24:19,873][124806] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-03 20:24:19,873][124806] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-03 20:24:19,873][124806] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-03 20:24:19,874][124806] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-03 20:24:19,874][124806] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-03 20:24:19,896][124806] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:24:19,897][124806] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:24:19,897][124806] RunningMeanStd input shape: (1,) [2025-01-03 20:24:19,906][124806] ConvEncoder: input_channels=3 [2025-01-03 20:24:19,993][124806] Conv encoder output size: 512 [2025-01-03 20:24:19,993][124806] Policy head output size: 512 [2025-01-03 20:24:20,107][124806] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-03 20:24:20,628][124806] Num frames 100... [2025-01-03 20:24:20,721][124806] Num frames 200... [2025-01-03 20:24:20,813][124806] Num frames 300... [2025-01-03 20:24:20,887][124806] Avg episode rewards: #0: 4.200, true rewards: #0: 3.200 [2025-01-03 20:24:20,888][124806] Avg episode reward: 4.200, avg true_objective: 3.200 [2025-01-03 20:24:20,977][124806] Num frames 400... [2025-01-03 20:24:21,078][124806] Num frames 500... [2025-01-03 20:24:21,175][124806] Num frames 600... [2025-01-03 20:24:21,270][124806] Num frames 700... [2025-01-03 20:24:21,368][124806] Num frames 800... [2025-01-03 20:24:21,484][124806] Avg episode rewards: #0: 6.320, true rewards: #0: 4.320 [2025-01-03 20:24:21,485][124806] Avg episode reward: 6.320, avg true_objective: 4.320 [2025-01-03 20:24:21,558][124806] Num frames 900... [2025-01-03 20:24:21,654][124806] Num frames 1000... [2025-01-03 20:24:21,746][124806] Num frames 1100... [2025-01-03 20:24:21,839][124806] Num frames 1200... [2025-01-03 20:24:21,932][124806] Num frames 1300... [2025-01-03 20:24:21,997][124806] Avg episode rewards: #0: 6.040, true rewards: #0: 4.373 [2025-01-03 20:24:21,998][124806] Avg episode reward: 6.040, avg true_objective: 4.373 [2025-01-03 20:24:22,094][124806] Num frames 1400... 
[2025-01-03 20:24:22,191][124806] Num frames 1500... [2025-01-03 20:24:22,287][124806] Num frames 1600... [2025-01-03 20:24:22,383][124806] Num frames 1700... [2025-01-03 20:24:22,477][124806] Num frames 1800... [2025-01-03 20:24:22,572][124806] Num frames 1900... [2025-01-03 20:24:22,648][124806] Avg episode rewards: #0: 6.800, true rewards: #0: 4.800 [2025-01-03 20:24:22,648][124806] Avg episode reward: 6.800, avg true_objective: 4.800 [2025-01-03 20:24:22,735][124806] Num frames 2000... [2025-01-03 20:24:22,831][124806] Num frames 2100... [2025-01-03 20:24:22,931][124806] Num frames 2200... [2025-01-03 20:24:23,031][124806] Num frames 2300... [2025-01-03 20:24:23,152][124806] Avg episode rewards: #0: 6.536, true rewards: #0: 4.736 [2025-01-03 20:24:23,152][124806] Avg episode reward: 6.536, avg true_objective: 4.736 [2025-01-03 20:24:23,197][124806] Num frames 2400... [2025-01-03 20:24:23,295][124806] Num frames 2500... [2025-01-03 20:24:23,389][124806] Num frames 2600... [2025-01-03 20:24:23,467][124806] Avg episode rewards: #0: 5.873, true rewards: #0: 4.373 [2025-01-03 20:24:23,468][124806] Avg episode reward: 5.873, avg true_objective: 4.373 [2025-01-03 20:24:23,580][124806] Num frames 2700... [2025-01-03 20:24:23,676][124806] Num frames 2800... [2025-01-03 20:24:23,775][124806] Num frames 2900... [2025-01-03 20:24:23,872][124806] Num frames 3000... [2025-01-03 20:24:23,935][124806] Avg episode rewards: #0: 5.583, true rewards: #0: 4.297 [2025-01-03 20:24:23,935][124806] Avg episode reward: 5.583, avg true_objective: 4.297 [2025-01-03 20:24:24,039][124806] Num frames 3100... [2025-01-03 20:24:24,136][124806] Num frames 3200... [2025-01-03 20:24:24,233][124806] Num frames 3300... [2025-01-03 20:24:24,374][124806] Avg episode rewards: #0: 5.365, true rewards: #0: 4.240 [2025-01-03 20:24:24,374][124806] Avg episode reward: 5.365, avg true_objective: 4.240 [2025-01-03 20:24:24,389][124806] Num frames 3400... [2025-01-03 20:24:24,492][124806] Num frames 3500... 
[2025-01-03 20:24:24,592][124806] Num frames 3600... [2025-01-03 20:24:24,687][124806] Num frames 3700... [2025-01-03 20:24:24,780][124806] Num frames 3800... [2025-01-03 20:24:24,871][124806] Avg episode rewards: #0: 5.378, true rewards: #0: 4.267 [2025-01-03 20:24:24,872][124806] Avg episode reward: 5.378, avg true_objective: 4.267 [2025-01-03 20:24:24,942][124806] Num frames 3900... [2025-01-03 20:24:25,045][124806] Num frames 4000... [2025-01-03 20:24:25,153][124806] Num frames 4100... [2025-01-03 20:24:25,271][124806] Num frames 4200... [2025-01-03 20:24:25,354][124806] Avg episode rewards: #0: 5.224, true rewards: #0: 4.224 [2025-01-03 20:24:25,354][124806] Avg episode reward: 5.224, avg true_objective: 4.224 [2025-01-03 20:24:31,929][124806] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-03 20:24:31,941][124806] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-03 20:24:31,941][124806] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-03 20:24:31,941][124806] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-03 20:24:31,941][124806] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
[2025-01-03 20:24:31,942][124806] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-03 20:24:31,942][124806] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-03 20:24:31,959][124806] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:24:31,959][124806] RunningMeanStd input shape: (1,) [2025-01-03 20:24:31,969][124806] ConvEncoder: input_channels=3 [2025-01-03 20:24:31,995][124806] Conv encoder output size: 512 [2025-01-03 20:24:31,995][124806] Policy head output size: 512 [2025-01-03 20:24:32,011][124806] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-03 20:24:32,364][124806] Num frames 100... [2025-01-03 20:24:32,461][124806] Num frames 200... [2025-01-03 20:24:32,554][124806] Num frames 300... [2025-01-03 20:24:32,646][124806] Num frames 400... [2025-01-03 20:24:32,717][124806] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 [2025-01-03 20:24:32,717][124806] Avg episode reward: 5.160, avg true_objective: 4.160 [2025-01-03 20:24:32,797][124806] Num frames 500... [2025-01-03 20:24:32,891][124806] Num frames 600... [2025-01-03 20:24:33,010][124806] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360 [2025-01-03 20:24:33,011][124806] Avg episode reward: 3.860, avg true_objective: 3.360 [2025-01-03 20:24:33,042][124806] Num frames 700... 
[2025-01-03 20:24:33,139][124806] Num frames 800... [2025-01-03 20:24:33,233][124806] Num frames 900... [2025-01-03 20:24:33,327][124806] Num frames 1000... [2025-01-03 20:24:33,423][124806] Num frames 1100... [2025-01-03 20:24:33,504][124806] Avg episode rewards: #0: 4.400, true rewards: #0: 3.733 [2025-01-03 20:24:33,504][124806] Avg episode reward: 4.400, avg true_objective: 3.733 [2025-01-03 20:24:33,586][124806] Num frames 1200... [2025-01-03 20:24:33,678][124806] Num frames 1300... [2025-01-03 20:24:33,777][124806] Num frames 1400... [2025-01-03 20:24:33,873][124806] Num frames 1500... [2025-01-03 20:24:33,967][124806] Num frames 1600... [2025-01-03 20:24:34,064][124806] Num frames 1700... [2025-01-03 20:24:34,145][124806] Avg episode rewards: #0: 5.570, true rewards: #0: 4.320 [2025-01-03 20:24:34,145][124806] Avg episode reward: 5.570, avg true_objective: 4.320 [2025-01-03 20:24:34,245][124806] Num frames 1800... [2025-01-03 20:24:34,342][124806] Num frames 1900... [2025-01-03 20:24:34,438][124806] Num frames 2000... [2025-01-03 20:24:34,536][124806] Num frames 2100... [2025-01-03 20:24:34,602][124806] Avg episode rewards: #0: 5.224, true rewards: #0: 4.224 [2025-01-03 20:24:34,603][124806] Avg episode reward: 5.224, avg true_objective: 4.224 [2025-01-03 20:24:34,704][124806] Num frames 2200... [2025-01-03 20:24:34,798][124806] Num frames 2300... [2025-01-03 20:24:34,890][124806] Num frames 2400... [2025-01-03 20:24:35,031][124806] Avg episode rewards: #0: 4.993, true rewards: #0: 4.160 [2025-01-03 20:24:35,032][124806] Avg episode reward: 4.993, avg true_objective: 4.160 [2025-01-03 20:24:35,038][124806] Num frames 2500... [2025-01-03 20:24:35,142][124806] Num frames 2600... [2025-01-03 20:24:35,238][124806] Num frames 2700... [2025-01-03 20:24:35,335][124806] Num frames 2800... [2025-01-03 20:24:35,432][124806] Num frames 2900... [2025-01-03 20:24:35,526][124806] Num frames 3000... 
[2025-01-03 20:24:35,619][124806] Avg episode rewards: #0: 5.629, true rewards: #0: 4.343 [2025-01-03 20:24:35,619][124806] Avg episode reward: 5.629, avg true_objective: 4.343 [2025-01-03 20:24:35,688][124806] Num frames 3100... [2025-01-03 20:24:35,783][124806] Num frames 3200... [2025-01-03 20:24:35,878][124806] Num frames 3300... [2025-01-03 20:24:35,974][124806] Num frames 3400... [2025-01-03 20:24:36,054][124806] Avg episode rewards: #0: 5.405, true rewards: #0: 4.280 [2025-01-03 20:24:36,055][124806] Avg episode reward: 5.405, avg true_objective: 4.280 [2025-01-03 20:24:36,142][124806] Num frames 3500... [2025-01-03 20:24:36,244][124806] Num frames 3600... [2025-01-03 20:24:36,343][124806] Num frames 3700... [2025-01-03 20:24:36,440][124806] Num frames 3800... [2025-01-03 20:24:36,502][124806] Avg episode rewards: #0: 5.231, true rewards: #0: 4.231 [2025-01-03 20:24:36,503][124806] Avg episode reward: 5.231, avg true_objective: 4.231 [2025-01-03 20:24:36,626][124806] Num frames 3900... [2025-01-03 20:24:36,724][124806] Num frames 4000... [2025-01-03 20:24:36,819][124806] Num frames 4100... [2025-01-03 20:24:36,961][124806] Avg episode rewards: #0: 5.092, true rewards: #0: 4.192 [2025-01-03 20:24:36,961][124806] Avg episode reward: 5.092, avg true_objective: 4.192 [2025-01-03 20:24:43,480][124806] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-03 20:26:09,968][124806] The model has been pushed to https://huggingface.co/spenning/rl_course_vizdoom_health_gathering_supreme [2025-01-03 20:28:49,320][126169] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-03 20:28:49,321][126169] Rollout worker 0 uses device cpu [2025-01-03 20:28:49,321][126169] Rollout worker 1 uses device cpu [2025-01-03 20:28:49,321][126169] Rollout worker 2 uses device cpu [2025-01-03 20:28:49,321][126169] Rollout worker 3 uses device cpu [2025-01-03 20:28:49,321][126169] Rollout worker 4 uses device cpu [2025-01-03 20:28:49,321][126169] Rollout worker 5 uses device cpu [2025-01-03 20:28:49,322][126169] Rollout worker 6 uses device cpu [2025-01-03 20:28:49,322][126169] Rollout worker 7 uses device cpu [2025-01-03 20:28:49,365][126169] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:28:49,365][126169] InferenceWorker_p0-w0: min num requests: 2 [2025-01-03 20:28:49,389][126169] Starting all processes... [2025-01-03 20:28:49,389][126169] Starting process learner_proc0 [2025-01-03 20:28:50,735][126169] Starting all processes... [2025-01-03 20:28:50,739][126169] Starting process inference_proc0-0 [2025-01-03 20:28:50,739][126169] Starting process rollout_proc0 [2025-01-03 20:28:50,740][126169] Starting process rollout_proc1 [2025-01-03 20:28:50,741][126169] Starting process rollout_proc2 [2025-01-03 20:28:50,742][126222] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:28:50,742][126222] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 20:28:50,741][126169] Starting process rollout_proc3 [2025-01-03 20:28:50,741][126169] Starting process rollout_proc4 [2025-01-03 20:28:50,758][126222] Num visible devices: 1 [2025-01-03 20:28:50,741][126169] Starting process rollout_proc5 [2025-01-03 20:28:50,750][126169] Starting process rollout_proc6 [2025-01-03 20:28:50,753][126169] Starting process rollout_proc7 [2025-01-03 20:28:50,771][126222] Starting seed is not provided [2025-01-03 20:28:50,771][126222] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:28:50,771][126222] Initializing actor-critic model on device cuda:0 [2025-01-03 
20:28:50,772][126222] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:28:50,773][126222] RunningMeanStd input shape: (1,) [2025-01-03 20:28:50,783][126222] ConvEncoder: input_channels=3 [2025-01-03 20:28:50,917][126222] Conv encoder output size: 512 [2025-01-03 20:28:50,917][126222] Policy head output size: 512 [2025-01-03 20:28:50,939][126222] Created Actor Critic model with architecture: [2025-01-03 20:28:50,939][126222] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 20:28:51,100][126222] Using optimizer [2025-01-03 20:28:52,424][126222] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... 
[2025-01-03 20:28:52,470][126222] Loading model from checkpoint [2025-01-03 20:28:52,472][126222] Loaded experiment state at self.train_step=980, self.env_steps=4014080 [2025-01-03 20:28:52,473][126222] Initialized policy 0 weights for model version 980 [2025-01-03 20:28:52,475][126222] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:28:52,476][126222] LearnerWorker_p0 finished initialization! [2025-01-03 20:28:52,881][126265] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:52,914][126248] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 20:28:52,914][126248] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-03 20:28:52,927][126248] Num visible devices: 1 [2025-01-03 20:28:53,009][126248] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 20:28:53,010][126248] RunningMeanStd input shape: (1,) [2025-01-03 20:28:53,019][126248] ConvEncoder: input_channels=3 [2025-01-03 20:28:53,091][126251] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,120][126248] Conv encoder output size: 512 [2025-01-03 20:28:53,120][126248] Policy head output size: 512 [2025-01-03 20:28:53,308][126266] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,383][126249] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,405][126267] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,432][126247] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,461][126250] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,473][126169] Inference worker 0-0 is ready! [2025-01-03 20:28:53,474][126169] All inference workers are ready! Signal rollout workers to start! [2025-01-03 20:28:53,474][126169] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. 
Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 20:28:53,481][126263] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 20:28:53,504][126251] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,512][126263] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,517][126267] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,518][126247] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,518][126250] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,518][126265] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,519][126249] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,519][126266] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 20:28:53,776][126247] Decorrelating experience for 0 frames... [2025-01-03 20:28:53,776][126251] Decorrelating experience for 0 frames... [2025-01-03 20:28:53,777][126263] Decorrelating experience for 0 frames... [2025-01-03 20:28:53,803][126249] Decorrelating experience for 0 frames... [2025-01-03 20:28:53,803][126266] Decorrelating experience for 0 frames... [2025-01-03 20:28:53,998][126247] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,006][126251] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,023][126263] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,036][126267] Decorrelating experience for 0 frames... [2025-01-03 20:28:54,277][126267] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,298][126266] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,309][126265] Decorrelating experience for 0 frames... [2025-01-03 20:28:54,352][126263] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,358][126250] Decorrelating experience for 0 frames... 
[2025-01-03 20:28:54,562][126247] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,566][126251] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,586][126265] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,587][126267] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,591][126250] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,822][126247] Decorrelating experience for 96 frames... [2025-01-03 20:28:54,832][126251] Decorrelating experience for 96 frames... [2025-01-03 20:28:54,837][126249] Decorrelating experience for 32 frames... [2025-01-03 20:28:54,840][126263] Decorrelating experience for 96 frames... [2025-01-03 20:28:54,886][126250] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,942][126266] Decorrelating experience for 64 frames... [2025-01-03 20:28:54,985][126265] Decorrelating experience for 64 frames... [2025-01-03 20:28:55,125][126267] Decorrelating experience for 96 frames... [2025-01-03 20:28:55,148][126249] Decorrelating experience for 64 frames... [2025-01-03 20:28:55,153][126250] Decorrelating experience for 96 frames... [2025-01-03 20:28:55,434][126265] Decorrelating experience for 96 frames... [2025-01-03 20:28:55,455][126249] Decorrelating experience for 96 frames... [2025-01-03 20:28:55,728][126266] Decorrelating experience for 96 frames... [2025-01-03 20:28:55,799][126222] Signal inference workers to stop experience collection... [2025-01-03 20:28:55,817][126248] InferenceWorker_p0-w0: stopping experience collection [2025-01-03 20:28:56,068][126169] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 20:28:56,069][126169] Avg episode reward: [(0, '2.616')] [2025-01-03 20:28:58,088][126222] Signal inference workers to resume experience collection... 
[2025-01-03 20:28:58,088][126248] InferenceWorker_p0-w0: resuming experience collection [2025-01-03 20:28:59,975][126248] Updated weights for policy 0, policy_version 990 (0.0073) [2025-01-03 20:29:01,068][126169] Fps is (10 sec: 8090.5, 60 sec: 8090.5, 300 sec: 8090.5). Total num frames: 4075520. Throughput: 0: 896.2. Samples: 6806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-03 20:29:01,069][126169] Avg episode reward: [(0, '4.645')] [2025-01-03 20:29:02,214][126248] Updated weights for policy 0, policy_version 1000 (0.0011) [2025-01-03 20:29:04,416][126248] Updated weights for policy 0, policy_version 1010 (0.0011) [2025-01-03 20:29:06,068][126169] Fps is (10 sec: 15155.1, 60 sec: 12033.5, 300 sec: 12033.5). Total num frames: 4165632. Throughput: 0: 2751.4. Samples: 34652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:29:06,069][126169] Avg episode reward: [(0, '4.457')] [2025-01-03 20:29:06,693][126248] Updated weights for policy 0, policy_version 1020 (0.0011) [2025-01-03 20:29:08,817][126248] Updated weights for policy 0, policy_version 1030 (0.0010) [2025-01-03 20:29:09,359][126169] Heartbeat connected on Batcher_0 [2025-01-03 20:29:09,362][126169] Heartbeat connected on LearnerWorker_p0 [2025-01-03 20:29:09,369][126169] Heartbeat connected on RolloutWorker_w0 [2025-01-03 20:29:09,370][126169] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-03 20:29:09,373][126169] Heartbeat connected on RolloutWorker_w1 [2025-01-03 20:29:09,379][126169] Heartbeat connected on RolloutWorker_w3 [2025-01-03 20:29:09,379][126169] Heartbeat connected on RolloutWorker_w2 [2025-01-03 20:29:09,384][126169] Heartbeat connected on RolloutWorker_w5 [2025-01-03 20:29:09,384][126169] Heartbeat connected on RolloutWorker_w4 [2025-01-03 20:29:09,388][126169] Heartbeat connected on RolloutWorker_w6 [2025-01-03 20:29:09,389][126169] Heartbeat connected on RolloutWorker_w7 [2025-01-03 20:29:10,961][126248] Updated weights for policy 0, policy_version 1040 
(0.0011) [2025-01-03 20:29:11,069][126169] Fps is (10 sec: 18430.1, 60 sec: 13967.5, 300 sec: 13967.5). Total num frames: 4259840. Throughput: 0: 2763.0. Samples: 48616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:29:11,070][126169] Avg episode reward: [(0, '4.649')] [2025-01-03 20:29:13,302][126248] Updated weights for policy 0, policy_version 1050 (0.0011) [2025-01-03 20:29:15,636][126248] Updated weights for policy 0, policy_version 1060 (0.0011) [2025-01-03 20:29:16,068][126169] Fps is (10 sec: 18022.5, 60 sec: 14684.1, 300 sec: 14684.1). Total num frames: 4345856. Throughput: 0: 3360.1. Samples: 75918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-03 20:29:16,069][126169] Avg episode reward: [(0, '4.665')] [2025-01-03 20:29:17,902][126248] Updated weights for policy 0, policy_version 1070 (0.0011) [2025-01-03 20:29:20,095][126248] Updated weights for policy 0, policy_version 1080 (0.0011) [2025-01-03 20:29:21,068][126169] Fps is (10 sec: 18024.1, 60 sec: 15437.4, 300 sec: 15437.4). Total num frames: 4440064. Throughput: 0: 3744.0. Samples: 103314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:29:21,069][126169] Avg episode reward: [(0, '4.623')] [2025-01-03 20:29:22,312][126248] Updated weights for policy 0, policy_version 1090 (0.0010) [2025-01-03 20:29:24,505][126248] Updated weights for policy 0, policy_version 1100 (0.0010) [2025-01-03 20:29:26,068][126169] Fps is (10 sec: 18841.6, 60 sec: 15959.7, 300 sec: 15959.7). Total num frames: 4534272. Throughput: 0: 3598.1. Samples: 117276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:29:26,069][126169] Avg episode reward: [(0, '4.602')] [2025-01-03 20:29:26,737][126248] Updated weights for policy 0, policy_version 1110 (0.0011) [2025-01-03 20:29:28,943][126248] Updated weights for policy 0, policy_version 1120 (0.0011) [2025-01-03 20:29:31,068][126169] Fps is (10 sec: 18432.0, 60 sec: 16234.0, 300 sec: 16234.0). Total num frames: 4624384. 
Throughput: 0: 3857.9. Samples: 145036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:29:31,069][126169] Avg episode reward: [(0, '4.376')] [2025-01-03 20:29:31,163][126248] Updated weights for policy 0, policy_version 1130 (0.0011) [2025-01-03 20:29:33,411][126248] Updated weights for policy 0, policy_version 1140 (0.0010) [2025-01-03 20:29:35,666][126248] Updated weights for policy 0, policy_version 1150 (0.0011) [2025-01-03 20:29:36,068][126169] Fps is (10 sec: 18022.3, 60 sec: 16443.9, 300 sec: 16443.9). Total num frames: 4714496. Throughput: 0: 4051.8. Samples: 172582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:29:36,069][126169] Avg episode reward: [(0, '4.633')] [2025-01-03 20:29:38,000][126248] Updated weights for policy 0, policy_version 1160 (0.0012) [2025-01-03 20:29:40,302][126248] Updated weights for policy 0, policy_version 1170 (0.0011) [2025-01-03 20:29:41,068][126169] Fps is (10 sec: 18022.7, 60 sec: 16609.8, 300 sec: 16609.8). Total num frames: 4804608. Throughput: 0: 4129.4. Samples: 185822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:29:41,069][126169] Avg episode reward: [(0, '4.655')] [2025-01-03 20:29:42,541][126248] Updated weights for policy 0, policy_version 1180 (0.0011) [2025-01-03 20:29:44,824][126248] Updated weights for policy 0, policy_version 1190 (0.0010) [2025-01-03 20:29:46,068][126169] Fps is (10 sec: 18022.5, 60 sec: 16744.1, 300 sec: 16744.1). Total num frames: 4894720. Throughput: 0: 4578.7. Samples: 212848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:29:46,069][126169] Avg episode reward: [(0, '4.479')] [2025-01-03 20:29:47,063][126248] Updated weights for policy 0, policy_version 1200 (0.0011) [2025-01-03 20:29:49,280][126248] Updated weights for policy 0, policy_version 1210 (0.0011) [2025-01-03 20:29:51,068][126169] Fps is (10 sec: 18022.4, 60 sec: 16855.1, 300 sec: 16855.1). Total num frames: 4984832. Throughput: 0: 4570.3. Samples: 240314. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:29:51,069][126169] Avg episode reward: [(0, '4.499')] [2025-01-03 20:29:51,567][126248] Updated weights for policy 0, policy_version 1220 (0.0011) [2025-01-03 20:29:53,748][126248] Updated weights for policy 0, policy_version 1230 (0.0011) [2025-01-03 20:29:56,015][126248] Updated weights for policy 0, policy_version 1240 (0.0010) [2025-01-03 20:29:56,069][126169] Fps is (10 sec: 18431.8, 60 sec: 17749.3, 300 sec: 17013.7). Total num frames: 5079040. Throughput: 0: 4567.4. Samples: 254146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:29:56,069][126169] Avg episode reward: [(0, '4.833')] [2025-01-03 20:29:58,213][126248] Updated weights for policy 0, policy_version 1250 (0.0010) [2025-01-03 20:30:00,397][126248] Updated weights for policy 0, policy_version 1260 (0.0010) [2025-01-03 20:30:01,068][126169] Fps is (10 sec: 18432.0, 60 sec: 18227.2, 300 sec: 17088.4). Total num frames: 5169152. Throughput: 0: 4575.0. Samples: 281792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:01,069][126169] Avg episode reward: [(0, '4.870')] [2025-01-03 20:30:02,636][126248] Updated weights for policy 0, policy_version 1270 (0.0010) [2025-01-03 20:30:04,825][126248] Updated weights for policy 0, policy_version 1280 (0.0011) [2025-01-03 20:30:06,068][126169] Fps is (10 sec: 18432.2, 60 sec: 18295.5, 300 sec: 17209.1). Total num frames: 5263360. Throughput: 0: 4583.5. Samples: 309572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:06,069][126169] Avg episode reward: [(0, '4.722')] [2025-01-03 20:30:07,051][126248] Updated weights for policy 0, policy_version 1290 (0.0010) [2025-01-03 20:30:09,275][126248] Updated weights for policy 0, policy_version 1300 (0.0011) [2025-01-03 20:30:11,068][126169] Fps is (10 sec: 18841.4, 60 sec: 18295.8, 300 sec: 17314.3). Total num frames: 5357568. Throughput: 0: 4581.1. Samples: 323424. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:30:11,069][126169] Avg episode reward: [(0, '4.735')] [2025-01-03 20:30:11,495][126248] Updated weights for policy 0, policy_version 1310 (0.0010) [2025-01-03 20:30:13,705][126248] Updated weights for policy 0, policy_version 1320 (0.0010) [2025-01-03 20:30:15,900][126248] Updated weights for policy 0, policy_version 1330 (0.0010) [2025-01-03 20:30:16,068][126169] Fps is (10 sec: 18432.0, 60 sec: 18363.7, 300 sec: 17357.2). Total num frames: 5447680. Throughput: 0: 4580.4. Samples: 351152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:30:16,069][126169] Avg episode reward: [(0, '4.479')] [2025-01-03 20:30:18,119][126248] Updated weights for policy 0, policy_version 1340 (0.0010) [2025-01-03 20:30:20,318][126248] Updated weights for policy 0, policy_version 1350 (0.0010) [2025-01-03 20:30:21,068][126169] Fps is (10 sec: 18432.0, 60 sec: 18363.7, 300 sec: 17441.9). Total num frames: 5541888. Throughput: 0: 4588.8. Samples: 379080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:21,069][126169] Avg episode reward: [(0, '4.492')] [2025-01-03 20:30:22,535][126248] Updated weights for policy 0, policy_version 1360 (0.0010) [2025-01-03 20:30:24,719][126248] Updated weights for policy 0, policy_version 1370 (0.0010) [2025-01-03 20:30:26,068][126169] Fps is (10 sec: 18841.5, 60 sec: 18363.7, 300 sec: 17517.5). Total num frames: 5636096. Throughput: 0: 4603.3. Samples: 392972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:26,069][126169] Avg episode reward: [(0, '4.449')] [2025-01-03 20:30:26,939][126248] Updated weights for policy 0, policy_version 1380 (0.0010) [2025-01-03 20:30:29,145][126248] Updated weights for policy 0, policy_version 1390 (0.0011) [2025-01-03 20:30:31,068][126169] Fps is (10 sec: 18432.2, 60 sec: 18363.8, 300 sec: 17543.4). Total num frames: 5726208. Throughput: 0: 4619.1. Samples: 420708. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:31,069][126169] Avg episode reward: [(0, '4.770')] [2025-01-03 20:30:31,360][126248] Updated weights for policy 0, policy_version 1400 (0.0010) [2025-01-03 20:30:33,557][126248] Updated weights for policy 0, policy_version 1410 (0.0010) [2025-01-03 20:30:35,778][126248] Updated weights for policy 0, policy_version 1420 (0.0010) [2025-01-03 20:30:36,068][126169] Fps is (10 sec: 18432.0, 60 sec: 18432.0, 300 sec: 17606.6). Total num frames: 5820416. Throughput: 0: 4630.2. Samples: 448672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:36,069][126169] Avg episode reward: [(0, '4.776')] [2025-01-03 20:30:38,053][126248] Updated weights for policy 0, policy_version 1430 (0.0011) [2025-01-03 20:30:40,299][126248] Updated weights for policy 0, policy_version 1440 (0.0010) [2025-01-03 20:30:41,068][126169] Fps is (10 sec: 18431.8, 60 sec: 18432.0, 300 sec: 17625.9). Total num frames: 5910528. Throughput: 0: 4623.5. Samples: 462202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:30:41,069][126169] Avg episode reward: [(0, '4.547')] [2025-01-03 20:30:42,512][126248] Updated weights for policy 0, policy_version 1450 (0.0011) [2025-01-03 20:30:44,739][126248] Updated weights for policy 0, policy_version 1460 (0.0011) [2025-01-03 20:30:46,068][126169] Fps is (10 sec: 18022.6, 60 sec: 18432.0, 300 sec: 17643.5). Total num frames: 6000640. Throughput: 0: 4621.3. Samples: 489752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:46,069][126169] Avg episode reward: [(0, '4.563')] [2025-01-03 20:30:46,118][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth... 
[2025-01-03 20:30:46,161][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2025-01-03 20:30:47,060][126248] Updated weights for policy 0, policy_version 1470 (0.0011) [2025-01-03 20:30:49,281][126248] Updated weights for policy 0, policy_version 1480 (0.0010) [2025-01-03 20:30:51,068][126169] Fps is (10 sec: 18432.2, 60 sec: 18500.3, 300 sec: 17694.5). Total num frames: 6094848. Throughput: 0: 4607.6. Samples: 516912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:51,069][126169] Avg episode reward: [(0, '4.873')] [2025-01-03 20:30:51,500][126248] Updated weights for policy 0, policy_version 1490 (0.0010) [2025-01-03 20:30:53,749][126248] Updated weights for policy 0, policy_version 1500 (0.0010) [2025-01-03 20:30:56,013][126248] Updated weights for policy 0, policy_version 1510 (0.0011) [2025-01-03 20:30:56,069][126169] Fps is (10 sec: 18431.4, 60 sec: 18431.9, 300 sec: 17707.8). Total num frames: 6184960. Throughput: 0: 4603.0. Samples: 530562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:30:56,069][126169] Avg episode reward: [(0, '4.613')] [2025-01-03 20:30:58,941][126248] Updated weights for policy 0, policy_version 1520 (0.0017) [2025-01-03 20:31:01,069][126169] Fps is (10 sec: 14335.5, 60 sec: 17817.5, 300 sec: 17431.2). Total num frames: 6238208. Throughput: 0: 4510.7. Samples: 554134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:31:01,070][126169] Avg episode reward: [(0, '4.570')] [2025-01-03 20:31:04,235][126248] Updated weights for policy 0, policy_version 1530 (0.0032) [2025-01-03 20:31:06,069][126169] Fps is (10 sec: 9420.6, 60 sec: 16930.0, 300 sec: 17082.8). Total num frames: 6279168. Throughput: 0: 4147.6. Samples: 565724. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:31:06,070][126169] Avg episode reward: [(0, '4.621')] [2025-01-03 20:31:09,353][126248] Updated weights for policy 0, policy_version 1540 (0.0034) [2025-01-03 20:31:11,069][126169] Fps is (10 sec: 8191.9, 60 sec: 16042.6, 300 sec: 16759.7). Total num frames: 6320128. Throughput: 0: 3974.4. Samples: 571820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:31:11,070][126169] Avg episode reward: [(0, '5.081')] [2025-01-03 20:31:11,071][126222] Saving new best policy, reward=5.081! [2025-01-03 20:31:14,402][126248] Updated weights for policy 0, policy_version 1550 (0.0031) [2025-01-03 20:31:16,069][126169] Fps is (10 sec: 7782.4, 60 sec: 15155.1, 300 sec: 16430.6). Total num frames: 6356992. Throughput: 0: 3627.5. Samples: 583946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:31:16,070][126169] Avg episode reward: [(0, '4.837')] [2025-01-03 20:31:19,889][126248] Updated weights for policy 0, policy_version 1560 (0.0031) [2025-01-03 20:31:21,069][126169] Fps is (10 sec: 7782.6, 60 sec: 14267.7, 300 sec: 16151.5). Total num frames: 6397952. Throughput: 0: 3256.2. Samples: 595202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:31:21,070][126169] Avg episode reward: [(0, '4.604')] [2025-01-03 20:31:24,535][126248] Updated weights for policy 0, policy_version 1570 (0.0026) [2025-01-03 20:31:26,069][126169] Fps is (10 sec: 8601.7, 60 sec: 13448.5, 300 sec: 15917.5). Total num frames: 6443008. Throughput: 0: 3103.8. Samples: 601874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:31:26,069][126169] Avg episode reward: [(0, '4.671')] [2025-01-03 20:31:28,493][126248] Updated weights for policy 0, policy_version 1580 (0.0024) [2025-01-03 20:31:31,069][126169] Fps is (10 sec: 9830.3, 60 sec: 12834.1, 300 sec: 15750.4). Total num frames: 6496256. Throughput: 0: 2825.2. Samples: 616888. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:31:31,069][126169] Avg episode reward: [(0, '4.654')] [2025-01-03 20:31:32,511][126248] Updated weights for policy 0, policy_version 1590 (0.0023) [2025-01-03 20:31:36,055][126248] Updated weights for policy 0, policy_version 1600 (0.0022) [2025-01-03 20:31:36,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12219.7, 300 sec: 15618.7). Total num frames: 6553600. Throughput: 0: 2587.5. Samples: 633352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:31:36,069][126169] Avg episode reward: [(0, '4.587')] [2025-01-03 20:31:39,579][126248] Updated weights for policy 0, policy_version 1610 (0.0021) [2025-01-03 20:31:41,069][126169] Fps is (10 sec: 11469.0, 60 sec: 11673.6, 300 sec: 15494.9). Total num frames: 6610944. Throughput: 0: 2476.3. Samples: 641996. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:31:41,069][126169] Avg episode reward: [(0, '4.826')] [2025-01-03 20:31:43,115][126248] Updated weights for policy 0, policy_version 1620 (0.0022) [2025-01-03 20:31:46,069][126169] Fps is (10 sec: 11469.0, 60 sec: 11127.4, 300 sec: 15378.3). Total num frames: 6668288. Throughput: 0: 2335.6. Samples: 659238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:31:46,069][126169] Avg episode reward: [(0, '4.791')] [2025-01-03 20:31:46,747][126248] Updated weights for policy 0, policy_version 1630 (0.0022) [2025-01-03 20:31:50,261][126248] Updated weights for policy 0, policy_version 1640 (0.0020) [2025-01-03 20:31:51,069][126169] Fps is (10 sec: 11468.7, 60 sec: 10513.0, 300 sec: 15268.2). Total num frames: 6725632. Throughput: 0: 2462.4. Samples: 676530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:31:51,069][126169] Avg episode reward: [(0, '4.952')] [2025-01-03 20:31:53,818][126248] Updated weights for policy 0, policy_version 1650 (0.0022) [2025-01-03 20:31:56,069][126169] Fps is (10 sec: 11468.7, 60 sec: 9966.9, 300 sec: 15164.2). Total num frames: 6782976. 
Throughput: 0: 2517.0. Samples: 685084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:31:56,069][126169] Avg episode reward: [(0, '4.558')] [2025-01-03 20:31:57,464][126248] Updated weights for policy 0, policy_version 1660 (0.0023) [2025-01-03 20:32:01,069][126169] Fps is (10 sec: 11059.2, 60 sec: 9966.9, 300 sec: 15043.9). Total num frames: 6836224. Throughput: 0: 2623.3. Samples: 701996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:32:01,069][126169] Avg episode reward: [(0, '5.082')] [2025-01-03 20:32:01,116][126248] Updated weights for policy 0, policy_version 1670 (0.0022) [2025-01-03 20:32:04,726][126248] Updated weights for policy 0, policy_version 1680 (0.0022) [2025-01-03 20:32:06,069][126169] Fps is (10 sec: 11059.3, 60 sec: 10240.1, 300 sec: 14951.1). Total num frames: 6893568. Throughput: 0: 2746.5. Samples: 718794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:06,069][126169] Avg episode reward: [(0, '4.675')] [2025-01-03 20:32:08,389][126248] Updated weights for policy 0, policy_version 1690 (0.0023) [2025-01-03 20:32:11,069][126169] Fps is (10 sec: 11059.3, 60 sec: 10444.8, 300 sec: 14842.2). Total num frames: 6946816. Throughput: 0: 2788.3. Samples: 727346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:11,069][126169] Avg episode reward: [(0, '4.535')] [2025-01-03 20:32:12,208][126248] Updated weights for policy 0, policy_version 1700 (0.0024) [2025-01-03 20:32:15,752][126248] Updated weights for policy 0, policy_version 1710 (0.0022) [2025-01-03 20:32:16,069][126169] Fps is (10 sec: 11058.5, 60 sec: 10786.1, 300 sec: 14758.9). Total num frames: 7004160. Throughput: 0: 2820.1. Samples: 743794. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:32:16,070][126169] Avg episode reward: [(0, '4.486')] [2025-01-03 20:32:19,273][126248] Updated weights for policy 0, policy_version 1720 (0.0022) [2025-01-03 20:32:21,069][126169] Fps is (10 sec: 11878.2, 60 sec: 11127.5, 300 sec: 14699.4). Total num frames: 7065600. Throughput: 0: 2840.4. Samples: 761172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:21,069][126169] Avg episode reward: [(0, '4.777')] [2025-01-03 20:32:22,736][126248] Updated weights for policy 0, policy_version 1730 (0.0021) [2025-01-03 20:32:26,069][126169] Fps is (10 sec: 11469.4, 60 sec: 11264.0, 300 sec: 14604.2). Total num frames: 7118848. Throughput: 0: 2847.6. Samples: 770138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:26,069][126169] Avg episode reward: [(0, '4.522')] [2025-01-03 20:32:27,023][126248] Updated weights for policy 0, policy_version 1740 (0.0025) [2025-01-03 20:32:30,803][126248] Updated weights for policy 0, policy_version 1750 (0.0023) [2025-01-03 20:32:31,069][126169] Fps is (10 sec: 10240.1, 60 sec: 11195.7, 300 sec: 14494.5). Total num frames: 7168000. Throughput: 0: 2791.6. Samples: 784862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:31,069][126169] Avg episode reward: [(0, '4.525')] [2025-01-03 20:32:34,293][126248] Updated weights for policy 0, policy_version 1760 (0.0021) [2025-01-03 20:32:36,069][126169] Fps is (10 sec: 11059.3, 60 sec: 11264.0, 300 sec: 14444.9). Total num frames: 7229440. Throughput: 0: 2790.9. Samples: 802120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:36,069][126169] Avg episode reward: [(0, '4.856')] [2025-01-03 20:32:37,761][126248] Updated weights for policy 0, policy_version 1770 (0.0022) [2025-01-03 20:32:41,069][126169] Fps is (10 sec: 11878.3, 60 sec: 11264.0, 300 sec: 14379.5). Total num frames: 7286784. Throughput: 0: 2800.0. Samples: 811082. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:41,069][126169] Avg episode reward: [(0, '4.590')] [2025-01-03 20:32:41,256][126248] Updated weights for policy 0, policy_version 1780 (0.0021) [2025-01-03 20:32:44,679][126248] Updated weights for policy 0, policy_version 1790 (0.0021) [2025-01-03 20:32:46,069][126169] Fps is (10 sec: 11468.7, 60 sec: 11264.0, 300 sec: 14317.0). Total num frames: 7344128. Throughput: 0: 2820.0. Samples: 828898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:46,069][126169] Avg episode reward: [(0, '4.646')] [2025-01-03 20:32:46,079][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001793_7344128.pth... [2025-01-03 20:32:46,156][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth [2025-01-03 20:32:48,287][126248] Updated weights for policy 0, policy_version 1800 (0.0021) [2025-01-03 20:32:51,068][126169] Fps is (10 sec: 12288.3, 60 sec: 11400.6, 300 sec: 14291.5). Total num frames: 7409664. Throughput: 0: 2833.7. Samples: 846310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:32:51,069][126169] Avg episode reward: [(0, '4.773')] [2025-01-03 20:32:51,249][126248] Updated weights for policy 0, policy_version 1810 (0.0017) [2025-01-03 20:32:53,918][126248] Updated weights for policy 0, policy_version 1820 (0.0016) [2025-01-03 20:32:56,069][126169] Fps is (10 sec: 13516.9, 60 sec: 11605.3, 300 sec: 14284.0). Total num frames: 7479296. Throughput: 0: 2919.6. Samples: 858728. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:32:56,069][126169] Avg episode reward: [(0, '4.765')] [2025-01-03 20:32:57,547][126248] Updated weights for policy 0, policy_version 1830 (0.0022) [2025-01-03 20:33:01,016][126248] Updated weights for policy 0, policy_version 1840 (0.0021) [2025-01-03 20:33:01,069][126169] Fps is (10 sec: 12697.3, 60 sec: 11673.6, 300 sec: 14227.1). Total num frames: 7536640. Throughput: 0: 2942.7. Samples: 876214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:33:01,069][126169] Avg episode reward: [(0, '4.822')] [2025-01-03 20:33:04,877][126248] Updated weights for policy 0, policy_version 1850 (0.0023) [2025-01-03 20:33:06,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11605.3, 300 sec: 14156.3). Total num frames: 7589888. Throughput: 0: 2921.1. Samples: 892620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:06,069][126169] Avg episode reward: [(0, '4.668')] [2025-01-03 20:33:08,310][126248] Updated weights for policy 0, policy_version 1860 (0.0020) [2025-01-03 20:33:11,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11673.6, 300 sec: 14104.2). Total num frames: 7647232. Throughput: 0: 2922.0. Samples: 901630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:11,069][126169] Avg episode reward: [(0, '4.521')] [2025-01-03 20:33:11,842][126248] Updated weights for policy 0, policy_version 1870 (0.0021) [2025-01-03 20:33:14,417][126248] Updated weights for policy 0, policy_version 1880 (0.0013) [2025-01-03 20:33:16,069][126169] Fps is (10 sec: 13517.1, 60 sec: 12015.1, 300 sec: 14132.0). Total num frames: 7725056. Throughput: 0: 3035.2. Samples: 921444. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:33:16,069][126169] Avg episode reward: [(0, '4.424')] [2025-01-03 20:33:16,896][126248] Updated weights for policy 0, policy_version 1890 (0.0013) [2025-01-03 20:33:19,358][126248] Updated weights for policy 0, policy_version 1900 (0.0014) [2025-01-03 20:33:21,069][126169] Fps is (10 sec: 15155.2, 60 sec: 12219.7, 300 sec: 14143.4). Total num frames: 7798784. Throughput: 0: 3179.8. Samples: 945212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:21,069][126169] Avg episode reward: [(0, '4.542')] [2025-01-03 20:33:22,828][126248] Updated weights for policy 0, policy_version 1910 (0.0022) [2025-01-03 20:33:26,069][126169] Fps is (10 sec: 13516.8, 60 sec: 12356.3, 300 sec: 14109.4). Total num frames: 7860224. Throughput: 0: 3178.6. Samples: 954120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:26,069][126169] Avg episode reward: [(0, '4.561')] [2025-01-03 20:33:26,347][126248] Updated weights for policy 0, policy_version 1920 (0.0021) [2025-01-03 20:33:29,762][126248] Updated weights for policy 0, policy_version 1930 (0.0021) [2025-01-03 20:33:31,068][126169] Fps is (10 sec: 11878.6, 60 sec: 12492.8, 300 sec: 14061.8). Total num frames: 7917568. Throughput: 0: 3179.2. Samples: 971960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:31,069][126169] Avg episode reward: [(0, '4.506')] [2025-01-03 20:33:33,165][126248] Updated weights for policy 0, policy_version 1940 (0.0020) [2025-01-03 20:33:36,075][126169] Fps is (10 sec: 11461.5, 60 sec: 12423.2, 300 sec: 14015.6). Total num frames: 7974912. Throughput: 0: 3179.6. Samples: 989412. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:36,076][126169] Avg episode reward: [(0, '4.820')] [2025-01-03 20:33:36,817][126248] Updated weights for policy 0, policy_version 1950 (0.0022) [2025-01-03 20:33:40,309][126248] Updated weights for policy 0, policy_version 1960 (0.0020) [2025-01-03 20:33:41,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12492.8, 300 sec: 13985.9). Total num frames: 8036352. Throughput: 0: 3099.5. Samples: 998204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:41,069][126169] Avg episode reward: [(0, '4.709')] [2025-01-03 20:33:43,805][126248] Updated weights for policy 0, policy_version 1970 (0.0022) [2025-01-03 20:33:46,069][126169] Fps is (10 sec: 11885.7, 60 sec: 12492.8, 300 sec: 13942.9). Total num frames: 8093696. Throughput: 0: 3097.4. Samples: 1015596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:46,070][126169] Avg episode reward: [(0, '4.478')] [2025-01-03 20:33:47,652][126248] Updated weights for policy 0, policy_version 1980 (0.0024) [2025-01-03 20:33:51,069][126169] Fps is (10 sec: 11059.0, 60 sec: 12287.9, 300 sec: 14009.7). Total num frames: 8146944. Throughput: 0: 3099.2. Samples: 1032086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:51,069][126169] Avg episode reward: [(0, '4.388')] [2025-01-03 20:33:51,283][126248] Updated weights for policy 0, policy_version 1990 (0.0022) [2025-01-03 20:33:54,829][126248] Updated weights for policy 0, policy_version 2000 (0.0021) [2025-01-03 20:33:56,069][126169] Fps is (10 sec: 11059.5, 60 sec: 12083.2, 300 sec: 13995.8). Total num frames: 8204288. Throughput: 0: 3083.3. Samples: 1040380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:33:56,069][126169] Avg episode reward: [(0, '4.569')] [2025-01-03 20:33:58,184][126248] Updated weights for policy 0, policy_version 2010 (0.0020) [2025-01-03 20:34:01,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12083.2, 300 sec: 13884.7). Total num frames: 8261632. 
Throughput: 0: 3041.0. Samples: 1058288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:01,069][126169] Avg episode reward: [(0, '4.677')] [2025-01-03 20:34:01,863][126248] Updated weights for policy 0, policy_version 2020 (0.0022) [2025-01-03 20:34:05,896][126248] Updated weights for policy 0, policy_version 2030 (0.0022) [2025-01-03 20:34:06,068][126169] Fps is (10 sec: 11059.3, 60 sec: 12083.2, 300 sec: 13745.9). Total num frames: 8314880. Throughput: 0: 2863.2. Samples: 1074054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:06,069][126169] Avg episode reward: [(0, '4.647')] [2025-01-03 20:34:08,107][126248] Updated weights for policy 0, policy_version 2040 (0.0011) [2025-01-03 20:34:10,285][126248] Updated weights for policy 0, policy_version 2050 (0.0011) [2025-01-03 20:34:11,068][126169] Fps is (10 sec: 14746.0, 60 sec: 12697.7, 300 sec: 13773.7). Total num frames: 8409088. Throughput: 0: 2944.1. Samples: 1086602. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:34:11,069][126169] Avg episode reward: [(0, '4.496')] [2025-01-03 20:34:12,821][126248] Updated weights for policy 0, policy_version 2060 (0.0014) [2025-01-03 20:34:16,069][126169] Fps is (10 sec: 15564.5, 60 sec: 12424.5, 300 sec: 13662.6). Total num frames: 8470528. Throughput: 0: 3076.7. Samples: 1110414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:16,069][126169] Avg episode reward: [(0, '4.815')] [2025-01-03 20:34:16,717][126248] Updated weights for policy 0, policy_version 2070 (0.0024) [2025-01-03 20:34:20,307][126248] Updated weights for policy 0, policy_version 2080 (0.0023) [2025-01-03 20:34:21,069][126169] Fps is (10 sec: 11878.1, 60 sec: 12151.5, 300 sec: 13537.6). Total num frames: 8527872. Throughput: 0: 3054.7. Samples: 1126856. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:21,069][126169] Avg episode reward: [(0, '4.784')] [2025-01-03 20:34:23,796][126248] Updated weights for policy 0, policy_version 2090 (0.0021) [2025-01-03 20:34:26,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12083.2, 300 sec: 13426.5). Total num frames: 8585216. Throughput: 0: 3054.4. Samples: 1135654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:26,069][126169] Avg episode reward: [(0, '4.841')] [2025-01-03 20:34:27,355][126248] Updated weights for policy 0, policy_version 2100 (0.0022) [2025-01-03 20:34:31,025][126248] Updated weights for policy 0, policy_version 2110 (0.0021) [2025-01-03 20:34:31,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12083.2, 300 sec: 13315.5). Total num frames: 8642560. Throughput: 0: 3044.3. Samples: 1152588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:31,069][126169] Avg episode reward: [(0, '4.826')] [2025-01-03 20:34:34,437][126248] Updated weights for policy 0, policy_version 2120 (0.0022) [2025-01-03 20:34:36,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12084.5, 300 sec: 13204.4). Total num frames: 8699904. Throughput: 0: 3068.0. Samples: 1170144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:34:36,069][126169] Avg episode reward: [(0, '4.691')] [2025-01-03 20:34:38,346][126248] Updated weights for policy 0, policy_version 2130 (0.0023) [2025-01-03 20:34:41,069][126169] Fps is (10 sec: 10649.2, 60 sec: 11878.3, 300 sec: 13065.5). Total num frames: 8749056. Throughput: 0: 3055.3. Samples: 1177868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-03 20:34:41,070][126169] Avg episode reward: [(0, '4.580')] [2025-01-03 20:34:42,565][126248] Updated weights for policy 0, policy_version 2140 (0.0025) [2025-01-03 20:34:46,069][126169] Fps is (10 sec: 10240.2, 60 sec: 11810.2, 300 sec: 12940.6). Total num frames: 8802304. Throughput: 0: 2991.8. Samples: 1192920. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:34:46,069][126169] Avg episode reward: [(0, '4.960')] [2025-01-03 20:34:46,134][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002150_8806400.pth... [2025-01-03 20:34:46,137][126248] Updated weights for policy 0, policy_version 2150 (0.0019) [2025-01-03 20:34:46,176][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001466_6004736.pth [2025-01-03 20:34:48,445][126248] Updated weights for policy 0, policy_version 2160 (0.0012) [2025-01-03 20:34:50,700][126248] Updated weights for policy 0, policy_version 2170 (0.0011) [2025-01-03 20:34:51,068][126169] Fps is (10 sec: 14336.7, 60 sec: 12424.6, 300 sec: 12926.7). Total num frames: 8892416. Throughput: 0: 3176.2. Samples: 1216982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:34:51,069][126169] Avg episode reward: [(0, '4.577')] [2025-01-03 20:34:54,107][126248] Updated weights for policy 0, policy_version 2180 (0.0022) [2025-01-03 20:34:56,069][126169] Fps is (10 sec: 14336.0, 60 sec: 12356.3, 300 sec: 12801.7). Total num frames: 8945664. Throughput: 0: 3127.1. Samples: 1227324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:34:56,069][126169] Avg episode reward: [(0, '4.658')] [2025-01-03 20:34:57,837][126248] Updated weights for policy 0, policy_version 2190 (0.0022) [2025-01-03 20:35:01,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12424.5, 300 sec: 12690.7). Total num frames: 9007104. Throughput: 0: 2963.0. Samples: 1243748. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:01,069][126169] Avg episode reward: [(0, '4.808')] [2025-01-03 20:35:01,334][126248] Updated weights for policy 0, policy_version 2200 (0.0021) [2025-01-03 20:35:04,931][126248] Updated weights for policy 0, policy_version 2210 (0.0021) [2025-01-03 20:35:06,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12492.7, 300 sec: 12565.7). Total num frames: 9064448. Throughput: 0: 2982.3. Samples: 1261058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:35:06,069][126169] Avg episode reward: [(0, '4.804')] [2025-01-03 20:35:08,730][126248] Updated weights for policy 0, policy_version 2220 (0.0023) [2025-01-03 20:35:11,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11810.1, 300 sec: 12440.7). Total num frames: 9117696. Throughput: 0: 2968.7. Samples: 1269246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:35:11,069][126169] Avg episode reward: [(0, '4.896')] [2025-01-03 20:35:12,646][126248] Updated weights for policy 0, policy_version 2230 (0.0022) [2025-01-03 20:35:16,069][126169] Fps is (10 sec: 10239.4, 60 sec: 11605.2, 300 sec: 12288.0). Total num frames: 9166848. Throughput: 0: 2931.6. Samples: 1284512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:16,070][126169] Avg episode reward: [(0, '4.549')] [2025-01-03 20:35:16,827][126248] Updated weights for policy 0, policy_version 2240 (0.0024) [2025-01-03 20:35:19,255][126248] Updated weights for policy 0, policy_version 2250 (0.0012) [2025-01-03 20:35:21,068][126169] Fps is (10 sec: 13107.4, 60 sec: 12015.0, 300 sec: 12246.3). Total num frames: 9248768. Throughput: 0: 3005.9. Samples: 1305410. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:21,069][126169] Avg episode reward: [(0, '4.642')] [2025-01-03 20:35:21,431][126248] Updated weights for policy 0, policy_version 2260 (0.0011) [2025-01-03 20:35:23,529][126248] Updated weights for policy 0, policy_version 2270 (0.0010) [2025-01-03 20:35:26,069][126169] Fps is (10 sec: 16385.1, 60 sec: 12424.5, 300 sec: 12218.6). Total num frames: 9330688. Throughput: 0: 3157.6. Samples: 1319958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:26,069][126169] Avg episode reward: [(0, '4.533')] [2025-01-03 20:35:26,644][126248] Updated weights for policy 0, policy_version 2280 (0.0020) [2025-01-03 20:35:30,224][126248] Updated weights for policy 0, policy_version 2290 (0.0022) [2025-01-03 20:35:31,069][126169] Fps is (10 sec: 13926.3, 60 sec: 12424.6, 300 sec: 12093.6). Total num frames: 9388032. Throughput: 0: 3251.2. Samples: 1339222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:31,069][126169] Avg episode reward: [(0, '4.846')] [2025-01-03 20:35:33,695][126248] Updated weights for policy 0, policy_version 2300 (0.0021) [2025-01-03 20:35:36,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12424.5, 300 sec: 11982.5). Total num frames: 9445376. Throughput: 0: 3104.2. Samples: 1356672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:36,069][126169] Avg episode reward: [(0, '4.631')] [2025-01-03 20:35:37,224][126248] Updated weights for policy 0, policy_version 2310 (0.0021) [2025-01-03 20:35:40,860][126248] Updated weights for policy 0, policy_version 2320 (0.0022) [2025-01-03 20:35:41,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12561.1, 300 sec: 11871.4). Total num frames: 9502720. Throughput: 0: 3069.1. Samples: 1365436. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:41,069][126169] Avg episode reward: [(0, '4.904')] [2025-01-03 20:35:44,396][126248] Updated weights for policy 0, policy_version 2330 (0.0022) [2025-01-03 20:35:46,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12629.3, 300 sec: 11746.5). Total num frames: 9560064. Throughput: 0: 3084.1. Samples: 1382532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:46,069][126169] Avg episode reward: [(0, '4.577')] [2025-01-03 20:35:47,719][126248] Updated weights for policy 0, policy_version 2340 (0.0020) [2025-01-03 20:35:51,056][126248] Updated weights for policy 0, policy_version 2350 (0.0021) [2025-01-03 20:35:51,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12219.7, 300 sec: 11663.2). Total num frames: 9625600. Throughput: 0: 3111.0. Samples: 1401054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:35:51,070][126169] Avg episode reward: [(0, '4.802')] [2025-01-03 20:35:54,451][126248] Updated weights for policy 0, policy_version 2360 (0.0021) [2025-01-03 20:35:56,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 11677.1). Total num frames: 9682944. Throughput: 0: 3127.1. Samples: 1409966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:35:56,069][126169] Avg episode reward: [(0, '4.838')] [2025-01-03 20:35:57,858][126248] Updated weights for policy 0, policy_version 2370 (0.0021) [2025-01-03 20:36:01,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12288.0, 300 sec: 11746.5). Total num frames: 9744384. Throughput: 0: 3194.9. Samples: 1428280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:36:01,069][126169] Avg episode reward: [(0, '4.690')] [2025-01-03 20:36:01,224][126248] Updated weights for policy 0, policy_version 2380 (0.0021) [2025-01-03 20:36:04,594][126248] Updated weights for policy 0, policy_version 2390 (0.0020) [2025-01-03 20:36:06,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12356.3, 300 sec: 11815.9). Total num frames: 9805824. 
Throughput: 0: 3129.1. Samples: 1446222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:06,069][126169] Avg episode reward: [(0, '4.360')] [2025-01-03 20:36:07,955][126248] Updated weights for policy 0, policy_version 2400 (0.0020) [2025-01-03 20:36:11,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12492.8, 300 sec: 11899.2). Total num frames: 9867264. Throughput: 0: 3012.8. Samples: 1455534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:36:11,069][126169] Avg episode reward: [(0, '4.729')] [2025-01-03 20:36:11,292][126248] Updated weights for policy 0, policy_version 2410 (0.0020) [2025-01-03 20:36:14,708][126248] Updated weights for policy 0, policy_version 2420 (0.0020) [2025-01-03 20:36:16,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12629.5, 300 sec: 11954.8). Total num frames: 9924608. Throughput: 0: 2991.5. Samples: 1473840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:16,069][126169] Avg episode reward: [(0, '5.153')] [2025-01-03 20:36:16,103][126222] Saving new best policy, reward=5.153! [2025-01-03 20:36:18,161][126248] Updated weights for policy 0, policy_version 2430 (0.0021) [2025-01-03 20:36:21,069][126169] Fps is (10 sec: 11877.9, 60 sec: 12287.9, 300 sec: 12010.3). Total num frames: 9986048. Throughput: 0: 2995.1. Samples: 1491452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:21,070][126169] Avg episode reward: [(0, '4.769')] [2025-01-03 20:36:21,652][126248] Updated weights for policy 0, policy_version 2440 (0.0021) [2025-01-03 20:36:25,052][126248] Updated weights for policy 0, policy_version 2450 (0.0020) [2025-01-03 20:36:26,069][126169] Fps is (10 sec: 11878.3, 60 sec: 11878.4, 300 sec: 12024.2). Total num frames: 10043392. Throughput: 0: 2998.8. Samples: 1500382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:26,069][126169] Avg episode reward: [(0, '5.448')] [2025-01-03 20:36:26,098][126222] Saving new best policy, reward=5.448! 
[2025-01-03 20:36:28,433][126248] Updated weights for policy 0, policy_version 2460 (0.0021) [2025-01-03 20:36:31,069][126169] Fps is (10 sec: 11878.9, 60 sec: 11946.6, 300 sec: 12038.1). Total num frames: 10104832. Throughput: 0: 3021.5. Samples: 1518500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:31,069][126169] Avg episode reward: [(0, '5.128')] [2025-01-03 20:36:31,863][126248] Updated weights for policy 0, policy_version 2470 (0.0021) [2025-01-03 20:36:35,422][126248] Updated weights for policy 0, policy_version 2480 (0.0021) [2025-01-03 20:36:36,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12015.0, 300 sec: 12052.0). Total num frames: 10166272. Throughput: 0: 2996.4. Samples: 1535890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:36:36,069][126169] Avg episode reward: [(0, '4.883')] [2025-01-03 20:36:38,673][126248] Updated weights for policy 0, policy_version 2490 (0.0021) [2025-01-03 20:36:41,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12083.2, 300 sec: 12065.8). Total num frames: 10227712. Throughput: 0: 3010.4. Samples: 1545436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:36:41,069][126169] Avg episode reward: [(0, '4.599')] [2025-01-03 20:36:42,133][126248] Updated weights for policy 0, policy_version 2500 (0.0021) [2025-01-03 20:36:44,556][126248] Updated weights for policy 0, policy_version 2510 (0.0012) [2025-01-03 20:36:46,068][126169] Fps is (10 sec: 14336.2, 60 sec: 12492.9, 300 sec: 12149.2). Total num frames: 10309632. Throughput: 0: 3052.1. Samples: 1565622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:36:46,069][126169] Avg episode reward: [(0, '4.755')] [2025-01-03 20:36:46,073][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002517_10309632.pth... 
[2025-01-03 20:36:46,110][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001793_7344128.pth [2025-01-03 20:36:46,724][126248] Updated weights for policy 0, policy_version 2520 (0.0012) [2025-01-03 20:36:49,842][126248] Updated weights for policy 0, policy_version 2530 (0.0019) [2025-01-03 20:36:51,069][126169] Fps is (10 sec: 14745.6, 60 sec: 12492.8, 300 sec: 12176.9). Total num frames: 10375168. Throughput: 0: 3171.7. Samples: 1588950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:36:51,069][126169] Avg episode reward: [(0, '4.699')] [2025-01-03 20:36:53,254][126248] Updated weights for policy 0, policy_version 2540 (0.0022) [2025-01-03 20:36:56,069][126169] Fps is (10 sec: 12697.3, 60 sec: 12561.1, 300 sec: 12204.7). Total num frames: 10436608. Throughput: 0: 3165.4. Samples: 1597978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:36:56,069][126169] Avg episode reward: [(0, '4.692')] [2025-01-03 20:36:56,784][126248] Updated weights for policy 0, policy_version 2550 (0.0021) [2025-01-03 20:37:00,157][126248] Updated weights for policy 0, policy_version 2560 (0.0022) [2025-01-03 20:37:01,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12204.7). Total num frames: 10493952. Throughput: 0: 3154.5. Samples: 1615794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:01,069][126169] Avg episode reward: [(0, '4.885')] [2025-01-03 20:37:03,511][126248] Updated weights for policy 0, policy_version 2570 (0.0020) [2025-01-03 20:37:06,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12492.8, 300 sec: 12232.5). Total num frames: 10555392. Throughput: 0: 3163.2. Samples: 1633796. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:06,069][126169] Avg episode reward: [(0, '4.593')] [2025-01-03 20:37:06,963][126248] Updated weights for policy 0, policy_version 2580 (0.0021) [2025-01-03 20:37:10,293][126248] Updated weights for policy 0, policy_version 2590 (0.0020) [2025-01-03 20:37:11,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12492.8, 300 sec: 12246.4). Total num frames: 10616832. Throughput: 0: 3164.0. Samples: 1642760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:11,069][126169] Avg episode reward: [(0, '4.815')] [2025-01-03 20:37:13,646][126248] Updated weights for policy 0, policy_version 2600 (0.0021) [2025-01-03 20:37:16,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12232.5). Total num frames: 10674176. Throughput: 0: 3170.1. Samples: 1661154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:16,069][126169] Avg episode reward: [(0, '4.416')] [2025-01-03 20:37:17,381][126248] Updated weights for policy 0, policy_version 2610 (0.0022) [2025-01-03 20:37:21,068][126169] Fps is (10 sec: 11059.4, 60 sec: 12356.4, 300 sec: 12232.5). Total num frames: 10727424. Throughput: 0: 3148.0. Samples: 1677552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:21,069][126169] Avg episode reward: [(0, '4.648')] [2025-01-03 20:37:21,095][126248] Updated weights for policy 0, policy_version 2620 (0.0023) [2025-01-03 20:37:24,314][126248] Updated weights for policy 0, policy_version 2630 (0.0018) [2025-01-03 20:37:26,068][126169] Fps is (10 sec: 13107.4, 60 sec: 12697.7, 300 sec: 12329.7). Total num frames: 10805248. Throughput: 0: 3122.6. Samples: 1685954. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:37:26,069][126169] Avg episode reward: [(0, '5.308')] [2025-01-03 20:37:26,416][126248] Updated weights for policy 0, policy_version 2640 (0.0011) [2025-01-03 20:37:28,938][126248] Updated weights for policy 0, policy_version 2650 (0.0015) [2025-01-03 20:37:31,069][126169] Fps is (10 sec: 15154.9, 60 sec: 12902.4, 300 sec: 12371.3). Total num frames: 10878976. Throughput: 0: 3241.9. Samples: 1711508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:37:31,069][126169] Avg episode reward: [(0, '4.538')] [2025-01-03 20:37:32,499][126248] Updated weights for policy 0, policy_version 2660 (0.0021) [2025-01-03 20:37:35,910][126248] Updated weights for policy 0, policy_version 2670 (0.0020) [2025-01-03 20:37:36,069][126169] Fps is (10 sec: 13106.8, 60 sec: 12834.1, 300 sec: 12371.3). Total num frames: 10936320. Throughput: 0: 3116.1. Samples: 1729174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:36,069][126169] Avg episode reward: [(0, '4.660')] [2025-01-03 20:37:39,289][126248] Updated weights for policy 0, policy_version 2680 (0.0021) [2025-01-03 20:37:41,069][126169] Fps is (10 sec: 11878.6, 60 sec: 12834.1, 300 sec: 12385.2). Total num frames: 10997760. Throughput: 0: 3115.8. Samples: 1738188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:41,069][126169] Avg episode reward: [(0, '4.757')] [2025-01-03 20:37:42,601][126248] Updated weights for policy 0, policy_version 2690 (0.0020) [2025-01-03 20:37:45,887][126248] Updated weights for policy 0, policy_version 2700 (0.0020) [2025-01-03 20:37:46,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12492.7, 300 sec: 12371.3). Total num frames: 11059200. Throughput: 0: 3131.8. Samples: 1756726. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:46,069][126169] Avg episode reward: [(0, '4.785')] [2025-01-03 20:37:49,148][126248] Updated weights for policy 0, policy_version 2710 (0.0020) [2025-01-03 20:37:51,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12343.5). Total num frames: 11120640. Throughput: 0: 3145.1. Samples: 1775326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:51,069][126169] Avg episode reward: [(0, '4.764')] [2025-01-03 20:37:52,425][126248] Updated weights for policy 0, policy_version 2720 (0.0020) [2025-01-03 20:37:55,768][126248] Updated weights for policy 0, policy_version 2730 (0.0020) [2025-01-03 20:37:56,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12424.5, 300 sec: 12357.4). Total num frames: 11182080. Throughput: 0: 3156.9. Samples: 1784822. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:37:56,069][126169] Avg episode reward: [(0, '4.551')] [2025-01-03 20:37:59,337][126248] Updated weights for policy 0, policy_version 2740 (0.0021) [2025-01-03 20:38:01,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12424.5, 300 sec: 12371.3). Total num frames: 11239424. Throughput: 0: 3140.1. Samples: 1802458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 20:38:01,069][126169] Avg episode reward: [(0, '4.711')] [2025-01-03 20:38:02,796][126248] Updated weights for policy 0, policy_version 2750 (0.0020) [2025-01-03 20:38:06,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12424.5, 300 sec: 12385.2). Total num frames: 11300864. Throughput: 0: 3180.3. Samples: 1820664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:06,069][126169] Avg episode reward: [(0, '4.773')] [2025-01-03 20:38:06,118][126248] Updated weights for policy 0, policy_version 2760 (0.0020) [2025-01-03 20:38:09,399][126248] Updated weights for policy 0, policy_version 2770 (0.0020) [2025-01-03 20:38:11,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12424.6, 300 sec: 12329.7). 
Total num frames: 11362304. Throughput: 0: 3196.4. Samples: 1829794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:38:11,069][126169] Avg episode reward: [(0, '4.670')] [2025-01-03 20:38:12,693][126248] Updated weights for policy 0, policy_version 2780 (0.0020) [2025-01-03 20:38:15,931][126248] Updated weights for policy 0, policy_version 2790 (0.0020) [2025-01-03 20:38:16,069][126169] Fps is (10 sec: 12697.8, 60 sec: 12561.1, 300 sec: 12301.9). Total num frames: 11427840. Throughput: 0: 3050.3. Samples: 1848772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:16,069][126169] Avg episode reward: [(0, '5.095')] [2025-01-03 20:38:19,178][126248] Updated weights for policy 0, policy_version 2800 (0.0019) [2025-01-03 20:38:21,069][126169] Fps is (10 sec: 12697.4, 60 sec: 12697.6, 300 sec: 12301.9). Total num frames: 11489280. Throughput: 0: 3071.9. Samples: 1867408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:21,069][126169] Avg episode reward: [(0, '4.760')] [2025-01-03 20:38:22,438][126248] Updated weights for policy 0, policy_version 2810 (0.0020) [2025-01-03 20:38:25,403][126248] Updated weights for policy 0, policy_version 2820 (0.0017) [2025-01-03 20:38:26,068][126169] Fps is (10 sec: 13516.9, 60 sec: 12629.3, 300 sec: 12357.4). Total num frames: 11563008. Throughput: 0: 3084.2. Samples: 1876976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:26,069][126169] Avg episode reward: [(0, '4.903')] [2025-01-03 20:38:27,775][126248] Updated weights for policy 0, policy_version 2830 (0.0014) [2025-01-03 20:38:31,069][126169] Fps is (10 sec: 13926.4, 60 sec: 12492.8, 300 sec: 12385.5). Total num frames: 11628544. Throughput: 0: 3177.0. Samples: 1899692. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:31,069][126169] Avg episode reward: [(0, '4.652')] [2025-01-03 20:38:31,089][126248] Updated weights for policy 0, policy_version 2840 (0.0020) [2025-01-03 20:38:34,711][126248] Updated weights for policy 0, policy_version 2850 (0.0022) [2025-01-03 20:38:36,069][126169] Fps is (10 sec: 12287.7, 60 sec: 12492.8, 300 sec: 12371.3). Total num frames: 11685888. Throughput: 0: 3143.9. Samples: 1916800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:38:36,069][126169] Avg episode reward: [(0, '4.872')] [2025-01-03 20:38:39,042][126248] Updated weights for policy 0, policy_version 2860 (0.0024) [2025-01-03 20:38:41,068][126169] Fps is (10 sec: 11878.6, 60 sec: 12492.8, 300 sec: 12385.2). Total num frames: 11747328. Throughput: 0: 3092.3. Samples: 1923976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:38:41,069][126169] Avg episode reward: [(0, '4.630')] [2025-01-03 20:38:41,366][126248] Updated weights for policy 0, policy_version 2870 (0.0011) [2025-01-03 20:38:43,561][126248] Updated weights for policy 0, policy_version 2880 (0.0011) [2025-01-03 20:38:46,069][126169] Fps is (10 sec: 14745.5, 60 sec: 12902.4, 300 sec: 12496.3). Total num frames: 11833344. Throughput: 0: 3265.2. Samples: 1949394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:38:46,069][126169] Avg episode reward: [(0, '4.841')] [2025-01-03 20:38:46,082][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002889_11833344.pth... 
[2025-01-03 20:38:46,159][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002150_8806400.pth [2025-01-03 20:38:46,244][126248] Updated weights for policy 0, policy_version 2890 (0.0016) [2025-01-03 20:38:50,189][126248] Updated weights for policy 0, policy_version 2900 (0.0024) [2025-01-03 20:38:51,069][126169] Fps is (10 sec: 13926.1, 60 sec: 12765.9, 300 sec: 12482.4). Total num frames: 11886592. Throughput: 0: 3247.7. Samples: 1966812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:38:51,069][126169] Avg episode reward: [(0, '4.663')] [2025-01-03 20:38:53,749][126248] Updated weights for policy 0, policy_version 2910 (0.0022) [2025-01-03 20:38:56,069][126169] Fps is (10 sec: 11059.5, 60 sec: 12697.6, 300 sec: 12482.4). Total num frames: 11943936. Throughput: 0: 3237.8. Samples: 1975496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:38:56,069][126169] Avg episode reward: [(0, '4.812')] [2025-01-03 20:38:57,404][126248] Updated weights for policy 0, policy_version 2920 (0.0023) [2025-01-03 20:39:01,069][126169] Fps is (10 sec: 11059.1, 60 sec: 12629.3, 300 sec: 12482.4). Total num frames: 11997184. Throughput: 0: 3187.3. Samples: 1992200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:01,069][126169] Avg episode reward: [(0, '4.595')] [2025-01-03 20:39:01,078][126248] Updated weights for policy 0, policy_version 2930 (0.0023) [2025-01-03 20:39:04,913][126248] Updated weights for policy 0, policy_version 2940 (0.0022) [2025-01-03 20:39:06,069][126169] Fps is (10 sec: 10649.4, 60 sec: 12492.8, 300 sec: 12343.5). Total num frames: 12050432. Throughput: 0: 3134.0. Samples: 2008440. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:06,069][126169] Avg episode reward: [(0, '4.643')] [2025-01-03 20:39:08,599][126248] Updated weights for policy 0, policy_version 2950 (0.0023) [2025-01-03 20:39:11,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12424.5, 300 sec: 12329.7). Total num frames: 12107776. Throughput: 0: 3108.5. Samples: 2016858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:39:11,069][126169] Avg episode reward: [(0, '4.639')] [2025-01-03 20:39:12,196][126248] Updated weights for policy 0, policy_version 2960 (0.0022) [2025-01-03 20:39:15,821][126248] Updated weights for policy 0, policy_version 2970 (0.0022) [2025-01-03 20:39:16,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12288.0, 300 sec: 12329.6). Total num frames: 12165120. Throughput: 0: 2981.6. Samples: 2033864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:39:16,069][126169] Avg episode reward: [(0, '4.525')] [2025-01-03 20:39:19,531][126248] Updated weights for policy 0, policy_version 2980 (0.0021) [2025-01-03 20:39:21,068][126169] Fps is (10 sec: 12288.3, 60 sec: 12356.3, 300 sec: 12357.4). Total num frames: 12230656. Throughput: 0: 2997.0. Samples: 2051662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:39:21,069][126169] Avg episode reward: [(0, '4.752')] [2025-01-03 20:39:21,801][126248] Updated weights for policy 0, policy_version 2990 (0.0012) [2025-01-03 20:39:23,950][126248] Updated weights for policy 0, policy_version 3000 (0.0011) [2025-01-03 20:39:26,069][126169] Fps is (10 sec: 15565.1, 60 sec: 12629.3, 300 sec: 12468.5). Total num frames: 12320768. Throughput: 0: 3147.2. Samples: 2065600. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:39:26,069][126169] Avg episode reward: [(0, '4.884')] [2025-01-03 20:39:26,441][126248] Updated weights for policy 0, policy_version 3010 (0.0014) [2025-01-03 20:39:30,195][126248] Updated weights for policy 0, policy_version 3020 (0.0024) [2025-01-03 20:39:31,069][126169] Fps is (10 sec: 14745.1, 60 sec: 12492.8, 300 sec: 12468.5). Total num frames: 12378112. Throughput: 0: 3057.5. Samples: 2086982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:31,069][126169] Avg episode reward: [(0, '4.799')] [2025-01-03 20:39:33,828][126248] Updated weights for policy 0, policy_version 3030 (0.0023) [2025-01-03 20:39:36,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12492.8, 300 sec: 12496.3). Total num frames: 12435456. Throughput: 0: 3040.5. Samples: 2103636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:36,069][126169] Avg episode reward: [(0, '4.946')] [2025-01-03 20:39:37,500][126248] Updated weights for policy 0, policy_version 3040 (0.0023) [2025-01-03 20:39:41,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12356.2, 300 sec: 12496.3). Total num frames: 12488704. Throughput: 0: 3033.8. Samples: 2112018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:41,069][126169] Avg episode reward: [(0, '4.583')] [2025-01-03 20:39:41,215][126248] Updated weights for policy 0, policy_version 3050 (0.0023) [2025-01-03 20:39:44,691][126248] Updated weights for policy 0, policy_version 3060 (0.0021) [2025-01-03 20:39:46,069][126169] Fps is (10 sec: 11059.1, 60 sec: 11878.4, 300 sec: 12385.2). Total num frames: 12546048. Throughput: 0: 3045.6. Samples: 2129252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:39:46,069][126169] Avg episode reward: [(0, '4.916')] [2025-01-03 20:39:48,127][126248] Updated weights for policy 0, policy_version 3070 (0.0021) [2025-01-03 20:39:51,069][126169] Fps is (10 sec: 11468.9, 60 sec: 11946.7, 300 sec: 12399.1). 
Total num frames: 12603392. Throughput: 0: 3068.5. Samples: 2146522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:39:51,069][126169] Avg episode reward: [(0, '4.991')] [2025-01-03 20:39:51,881][126248] Updated weights for policy 0, policy_version 3080 (0.0022) [2025-01-03 20:39:55,816][126248] Updated weights for policy 0, policy_version 3090 (0.0024) [2025-01-03 20:39:56,069][126169] Fps is (10 sec: 11059.4, 60 sec: 11878.4, 300 sec: 12371.3). Total num frames: 12656640. Throughput: 0: 3059.8. Samples: 2154550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-03 20:39:56,069][126169] Avg episode reward: [(0, '4.801')] [2025-01-03 20:39:59,328][126248] Updated weights for policy 0, policy_version 3100 (0.0021) [2025-01-03 20:40:01,068][126169] Fps is (10 sec: 11059.2, 60 sec: 11946.7, 300 sec: 12371.3). Total num frames: 12713984. Throughput: 0: 3052.4. Samples: 2171222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:01,069][126169] Avg episode reward: [(0, '5.095')] [2025-01-03 20:40:02,897][126248] Updated weights for policy 0, policy_version 3110 (0.0020) [2025-01-03 20:40:06,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12385.2). Total num frames: 12771328. Throughput: 0: 3031.4. Samples: 2188076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:06,069][126169] Avg episode reward: [(0, '4.523')] [2025-01-03 20:40:06,584][126248] Updated weights for policy 0, policy_version 3120 (0.0024) [2025-01-03 20:40:09,309][126248] Updated weights for policy 0, policy_version 3130 (0.0014) [2025-01-03 20:40:11,068][126169] Fps is (10 sec: 13107.2, 60 sec: 12288.0, 300 sec: 12468.5). Total num frames: 12845056. Throughput: 0: 2928.4. Samples: 2197378. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:40:11,069][126169] Avg episode reward: [(0, '4.686')] [2025-01-03 20:40:12,259][126248] Updated weights for policy 0, policy_version 3140 (0.0018) [2025-01-03 20:40:15,630][126248] Updated weights for policy 0, policy_version 3150 (0.0021) [2025-01-03 20:40:16,069][126169] Fps is (10 sec: 13516.9, 60 sec: 12356.3, 300 sec: 12399.1). Total num frames: 12906496. Throughput: 0: 2927.9. Samples: 2218736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:16,069][126169] Avg episode reward: [(0, '4.797')] [2025-01-03 20:40:18,951][126248] Updated weights for policy 0, policy_version 3160 (0.0020) [2025-01-03 20:40:21,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12287.9, 300 sec: 12329.7). Total num frames: 12967936. Throughput: 0: 2955.3. Samples: 2236626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:21,069][126169] Avg episode reward: [(0, '4.661')] [2025-01-03 20:40:22,480][126248] Updated weights for policy 0, policy_version 3170 (0.0020) [2025-01-03 20:40:25,843][126248] Updated weights for policy 0, policy_version 3180 (0.0021) [2025-01-03 20:40:26,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11741.9, 300 sec: 12329.6). Total num frames: 13025280. Throughput: 0: 2965.4. Samples: 2245462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:40:26,069][126169] Avg episode reward: [(0, '4.479')] [2025-01-03 20:40:29,203][126248] Updated weights for policy 0, policy_version 3190 (0.0021) [2025-01-03 20:40:31,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11810.1, 300 sec: 12343.5). Total num frames: 13086720. Throughput: 0: 2990.5. Samples: 2263822. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:40:31,069][126169] Avg episode reward: [(0, '4.716')] [2025-01-03 20:40:32,518][126248] Updated weights for policy 0, policy_version 3200 (0.0021) [2025-01-03 20:40:35,905][126248] Updated weights for policy 0, policy_version 3210 (0.0021) [2025-01-03 20:40:36,069][126169] Fps is (10 sec: 12287.9, 60 sec: 11878.4, 300 sec: 12357.4). Total num frames: 13148160. Throughput: 0: 3018.7. Samples: 2282362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:40:36,069][126169] Avg episode reward: [(0, '4.563')] [2025-01-03 20:40:39,430][126248] Updated weights for policy 0, policy_version 3220 (0.0021) [2025-01-03 20:40:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11946.7, 300 sec: 12357.4). Total num frames: 13205504. Throughput: 0: 3034.3. Samples: 2291094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:40:41,069][126169] Avg episode reward: [(0, '4.673')] [2025-01-03 20:40:42,430][126248] Updated weights for policy 0, policy_version 3230 (0.0016) [2025-01-03 20:40:45,113][126248] Updated weights for policy 0, policy_version 3240 (0.0015) [2025-01-03 20:40:46,069][126169] Fps is (10 sec: 13107.1, 60 sec: 12219.7, 300 sec: 12385.2). Total num frames: 13279232. Throughput: 0: 3132.3. Samples: 2312178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:40:46,069][126169] Avg episode reward: [(0, '4.727')] [2025-01-03 20:40:46,081][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003242_13279232.pth... [2025-01-03 20:40:46,151][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002517_10309632.pth [2025-01-03 20:40:48,760][126248] Updated weights for policy 0, policy_version 3250 (0.0022) [2025-01-03 20:40:51,069][126169] Fps is (10 sec: 13107.2, 60 sec: 12219.7, 300 sec: 12385.2). Total num frames: 13336576. Throughput: 0: 3143.6. 
Samples: 2329540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:51,069][126169] Avg episode reward: [(0, '4.772')] [2025-01-03 20:40:52,200][126248] Updated weights for policy 0, policy_version 3260 (0.0021) [2025-01-03 20:40:55,602][126248] Updated weights for policy 0, policy_version 3270 (0.0021) [2025-01-03 20:40:56,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12356.2, 300 sec: 12385.2). Total num frames: 13398016. Throughput: 0: 3135.7. Samples: 2338484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:40:56,070][126169] Avg episode reward: [(0, '4.837')] [2025-01-03 20:40:59,440][126248] Updated weights for policy 0, policy_version 3280 (0.0024) [2025-01-03 20:41:01,068][126169] Fps is (10 sec: 11469.1, 60 sec: 12288.0, 300 sec: 12357.4). Total num frames: 13451264. Throughput: 0: 3035.0. Samples: 2355310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:01,069][126169] Avg episode reward: [(0, '4.612')] [2025-01-03 20:41:02,406][126248] Updated weights for policy 0, policy_version 3290 (0.0015) [2025-01-03 20:41:04,537][126248] Updated weights for policy 0, policy_version 3300 (0.0011) [2025-01-03 20:41:06,068][126169] Fps is (10 sec: 14336.6, 60 sec: 12834.2, 300 sec: 12454.6). Total num frames: 13541376. Throughput: 0: 3175.9. Samples: 2379542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:41:06,069][126169] Avg episode reward: [(0, '4.836')] [2025-01-03 20:41:06,820][126248] Updated weights for policy 0, policy_version 3310 (0.0011) [2025-01-03 20:41:09,013][126248] Updated weights for policy 0, policy_version 3320 (0.0011) [2025-01-03 20:41:11,069][126169] Fps is (10 sec: 18431.5, 60 sec: 13175.4, 300 sec: 12579.6). Total num frames: 13635584. Throughput: 0: 3284.3. Samples: 2393254. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:11,069][126169] Avg episode reward: [(0, '4.826')] [2025-01-03 20:41:11,332][126248] Updated weights for policy 0, policy_version 3330 (0.0012) [2025-01-03 20:41:15,527][126248] Updated weights for policy 0, policy_version 3340 (0.0026) [2025-01-03 20:41:16,069][126169] Fps is (10 sec: 14335.6, 60 sec: 12970.7, 300 sec: 12537.9). Total num frames: 13684736. Throughput: 0: 3345.8. Samples: 2414384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:16,069][126169] Avg episode reward: [(0, '4.803')] [2025-01-03 20:41:19,903][126248] Updated weights for policy 0, policy_version 3350 (0.0027) [2025-01-03 20:41:21,069][126169] Fps is (10 sec: 9420.7, 60 sec: 12697.6, 300 sec: 12496.3). Total num frames: 13729792. Throughput: 0: 3243.6. Samples: 2428324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:21,070][126169] Avg episode reward: [(0, '4.848')] [2025-01-03 20:41:23,988][126248] Updated weights for policy 0, policy_version 3360 (0.0024) [2025-01-03 20:41:26,069][126169] Fps is (10 sec: 9830.2, 60 sec: 12629.3, 300 sec: 12468.5). Total num frames: 13783040. Throughput: 0: 3219.3. Samples: 2435964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:41:26,070][126169] Avg episode reward: [(0, '4.978')] [2025-01-03 20:41:28,237][126248] Updated weights for policy 0, policy_version 3370 (0.0025) [2025-01-03 20:41:31,069][126169] Fps is (10 sec: 10240.1, 60 sec: 12424.5, 300 sec: 12426.8). Total num frames: 13832192. Throughput: 0: 3069.0. Samples: 2450282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 20:41:31,069][126169] Avg episode reward: [(0, '4.568')] [2025-01-03 20:41:32,185][126248] Updated weights for policy 0, policy_version 3380 (0.0023) [2025-01-03 20:41:35,814][126248] Updated weights for policy 0, policy_version 3390 (0.0023) [2025-01-03 20:41:36,069][126169] Fps is (10 sec: 10240.2, 60 sec: 12288.0, 300 sec: 12399.1). 
Total num frames: 13885440. Throughput: 0: 3051.1. Samples: 2466838. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:36,069][126169] Avg episode reward: [(0, '4.462')] [2025-01-03 20:41:39,412][126248] Updated weights for policy 0, policy_version 3400 (0.0022) [2025-01-03 20:41:41,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12288.0, 300 sec: 12315.8). Total num frames: 13942784. Throughput: 0: 3039.7. Samples: 2475270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:41,069][126169] Avg episode reward: [(0, '4.689')] [2025-01-03 20:41:42,986][126248] Updated weights for policy 0, policy_version 3410 (0.0021) [2025-01-03 20:41:46,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12015.0, 300 sec: 12288.0). Total num frames: 14000128. Throughput: 0: 3055.0. Samples: 2492788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:41:46,069][126169] Avg episode reward: [(0, '4.475')] [2025-01-03 20:41:46,493][126248] Updated weights for policy 0, policy_version 3420 (0.0022) [2025-01-03 20:41:49,941][126248] Updated weights for policy 0, policy_version 3430 (0.0020) [2025-01-03 20:41:51,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12083.2, 300 sec: 12288.0). Total num frames: 14061568. Throughput: 0: 2905.5. Samples: 2510292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:41:51,069][126169] Avg episode reward: [(0, '4.385')] [2025-01-03 20:41:53,446][126248] Updated weights for policy 0, policy_version 3440 (0.0021) [2025-01-03 20:41:56,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12015.0, 300 sec: 12288.0). Total num frames: 14118912. Throughput: 0: 2794.4. Samples: 2519002. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:41:56,069][126169] Avg episode reward: [(0, '4.641')] [2025-01-03 20:41:56,992][126248] Updated weights for policy 0, policy_version 3450 (0.0022) [2025-01-03 20:42:00,395][126248] Updated weights for policy 0, policy_version 3460 (0.0021) [2025-01-03 20:42:01,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12083.1, 300 sec: 12274.1). Total num frames: 14176256. Throughput: 0: 2718.6. Samples: 2536722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:01,069][126169] Avg episode reward: [(0, '4.783')] [2025-01-03 20:42:03,912][126248] Updated weights for policy 0, policy_version 3470 (0.0021) [2025-01-03 20:42:06,068][126169] Fps is (10 sec: 13107.5, 60 sec: 11810.1, 300 sec: 12315.8). Total num frames: 14249984. Throughput: 0: 2829.0. Samples: 2555628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:06,069][126169] Avg episode reward: [(0, '4.636')] [2025-01-03 20:42:06,257][126248] Updated weights for policy 0, policy_version 3480 (0.0012) [2025-01-03 20:42:08,436][126248] Updated weights for policy 0, policy_version 3490 (0.0011) [2025-01-03 20:42:10,774][126248] Updated weights for policy 0, policy_version 3500 (0.0012) [2025-01-03 20:42:11,068][126169] Fps is (10 sec: 16384.4, 60 sec: 11741.9, 300 sec: 12426.9). Total num frames: 14340096. Throughput: 0: 2974.7. Samples: 2569826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:11,069][126169] Avg episode reward: [(0, '4.557')] [2025-01-03 20:42:13,085][126248] Updated weights for policy 0, policy_version 3510 (0.0012) [2025-01-03 20:42:16,069][126169] Fps is (10 sec: 15974.0, 60 sec: 12083.2, 300 sec: 12482.4). Total num frames: 14409728. Throughput: 0: 3244.4. Samples: 2596282. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:16,069][126169] Avg episode reward: [(0, '4.663')] [2025-01-03 20:42:16,728][126248] Updated weights for policy 0, policy_version 3520 (0.0020) [2025-01-03 20:42:20,789][126248] Updated weights for policy 0, policy_version 3530 (0.0025) [2025-01-03 20:42:21,069][126169] Fps is (10 sec: 11878.1, 60 sec: 12151.5, 300 sec: 12385.2). Total num frames: 14458880. Throughput: 0: 3187.5. Samples: 2610276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:21,069][126169] Avg episode reward: [(0, '4.869')] [2025-01-03 20:42:24,402][126248] Updated weights for policy 0, policy_version 3540 (0.0022) [2025-01-03 20:42:26,069][126169] Fps is (10 sec: 10649.5, 60 sec: 12219.8, 300 sec: 12329.7). Total num frames: 14516224. Throughput: 0: 3185.5. Samples: 2618616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:26,069][126169] Avg episode reward: [(0, '4.645')] [2025-01-03 20:42:28,190][126248] Updated weights for policy 0, policy_version 3550 (0.0022) [2025-01-03 20:42:31,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12288.0, 300 sec: 12315.8). Total num frames: 14569472. Throughput: 0: 3159.7. Samples: 2634974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:31,069][126169] Avg episode reward: [(0, '5.158')] [2025-01-03 20:42:32,057][126248] Updated weights for policy 0, policy_version 3560 (0.0023) [2025-01-03 20:42:35,461][126248] Updated weights for policy 0, policy_version 3570 (0.0021) [2025-01-03 20:42:36,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12356.3, 300 sec: 12301.9). Total num frames: 14626816. Throughput: 0: 3151.3. Samples: 2652102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:36,069][126169] Avg episode reward: [(0, '4.805')] [2025-01-03 20:42:38,945][126248] Updated weights for policy 0, policy_version 3580 (0.0021) [2025-01-03 20:42:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12424.5, 300 sec: 12301.9). 
Total num frames: 14688256. Throughput: 0: 3155.5. Samples: 2660998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:41,069][126169] Avg episode reward: [(0, '4.653')] [2025-01-03 20:42:42,453][126248] Updated weights for policy 0, policy_version 3590 (0.0021) [2025-01-03 20:42:45,880][126248] Updated weights for policy 0, policy_version 3600 (0.0020) [2025-01-03 20:42:46,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12424.5, 300 sec: 12288.0). Total num frames: 14745600. Throughput: 0: 3154.0. Samples: 2678654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:42:46,069][126169] Avg episode reward: [(0, '4.868')] [2025-01-03 20:42:46,077][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003600_14745600.pth... [2025-01-03 20:42:46,147][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002889_11833344.pth [2025-01-03 20:42:49,417][126248] Updated weights for policy 0, policy_version 3610 (0.0022) [2025-01-03 20:42:51,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12356.3, 300 sec: 12274.1). Total num frames: 14802944. Throughput: 0: 3118.9. Samples: 2695978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:51,069][126169] Avg episode reward: [(0, '4.922')] [2025-01-03 20:42:52,833][126248] Updated weights for policy 0, policy_version 3620 (0.0020) [2025-01-03 20:42:56,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12424.5, 300 sec: 12288.0). Total num frames: 14864384. Throughput: 0: 3006.6. Samples: 2705122. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:42:56,069][126169] Avg episode reward: [(0, '4.936')] [2025-01-03 20:42:56,188][126248] Updated weights for policy 0, policy_version 3630 (0.0021) [2025-01-03 20:42:59,503][126248] Updated weights for policy 0, policy_version 3640 (0.0019) [2025-01-03 20:43:01,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12492.8, 300 sec: 12288.0). Total num frames: 14925824. Throughput: 0: 2829.4. Samples: 2723604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:43:01,069][126169] Avg episode reward: [(0, '5.046')] [2025-01-03 20:43:02,960][126248] Updated weights for policy 0, policy_version 3650 (0.0020) [2025-01-03 20:43:06,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12287.9, 300 sec: 12288.0). Total num frames: 14987264. Throughput: 0: 2921.3. Samples: 2741734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:43:06,069][126169] Avg episode reward: [(0, '4.881')] [2025-01-03 20:43:06,305][126248] Updated weights for policy 0, policy_version 3660 (0.0021) [2025-01-03 20:43:09,674][126248] Updated weights for policy 0, policy_version 3670 (0.0020) [2025-01-03 20:43:11,069][126169] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 12260.2). Total num frames: 15044608. Throughput: 0: 2939.3. Samples: 2750884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:43:11,069][126169] Avg episode reward: [(0, '5.186')] [2025-01-03 20:43:13,351][126248] Updated weights for policy 0, policy_version 3680 (0.0022) [2025-01-03 20:43:16,069][126169] Fps is (10 sec: 11468.9, 60 sec: 11537.1, 300 sec: 12246.3). Total num frames: 15101952. Throughput: 0: 2950.2. Samples: 2767734. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 20:43:16,069][126169] Avg episode reward: [(0, '4.921')] [2025-01-03 20:43:17,145][126248] Updated weights for policy 0, policy_version 3690 (0.0022) [2025-01-03 20:43:20,428][126248] Updated weights for policy 0, policy_version 3700 (0.0018) [2025-01-03 20:43:21,068][126169] Fps is (10 sec: 11878.6, 60 sec: 11741.9, 300 sec: 12204.7). Total num frames: 15163392. Throughput: 0: 2953.2. Samples: 2784996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:21,069][126169] Avg episode reward: [(0, '5.478')] [2025-01-03 20:43:21,114][126222] Saving new best policy, reward=5.478! [2025-01-03 20:43:23,290][126248] Updated weights for policy 0, policy_version 3710 (0.0017) [2025-01-03 20:43:26,069][126169] Fps is (10 sec: 12287.3, 60 sec: 11810.0, 300 sec: 12190.8). Total num frames: 15224832. Throughput: 0: 2999.3. Samples: 2795968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:26,070][126169] Avg episode reward: [(0, '5.184')] [2025-01-03 20:43:26,835][126248] Updated weights for policy 0, policy_version 3720 (0.0021) [2025-01-03 20:43:30,216][126248] Updated weights for policy 0, policy_version 3730 (0.0022) [2025-01-03 20:43:31,069][126169] Fps is (10 sec: 12287.8, 60 sec: 11946.7, 300 sec: 12204.7). Total num frames: 15286272. Throughput: 0: 3003.6. Samples: 2813816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:31,069][126169] Avg episode reward: [(0, '5.260')] [2025-01-03 20:43:33,504][126248] Updated weights for policy 0, policy_version 3740 (0.0021) [2025-01-03 20:43:36,069][126169] Fps is (10 sec: 12288.8, 60 sec: 12014.9, 300 sec: 12204.7). Total num frames: 15347712. Throughput: 0: 3015.4. Samples: 2831672. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:36,069][126169] Avg episode reward: [(0, '4.955')] [2025-01-03 20:43:37,110][126248] Updated weights for policy 0, policy_version 3750 (0.0021) [2025-01-03 20:43:40,461][126248] Updated weights for policy 0, policy_version 3760 (0.0020) [2025-01-03 20:43:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11946.7, 300 sec: 12107.5). Total num frames: 15405056. Throughput: 0: 3011.7. Samples: 2840646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:41,069][126169] Avg episode reward: [(0, '5.269')] [2025-01-03 20:43:43,748][126248] Updated weights for policy 0, policy_version 3770 (0.0020) [2025-01-03 20:43:46,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12015.0, 300 sec: 12135.3). Total num frames: 15466496. Throughput: 0: 3010.1. Samples: 2859058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:43:46,069][126169] Avg episode reward: [(0, '4.816')] [2025-01-03 20:43:47,136][126248] Updated weights for policy 0, policy_version 3780 (0.0020) [2025-01-03 20:43:50,348][126248] Updated weights for policy 0, policy_version 3790 (0.0018) [2025-01-03 20:43:51,068][126169] Fps is (10 sec: 13107.4, 60 sec: 12219.8, 300 sec: 12176.9). Total num frames: 15536128. Throughput: 0: 3018.6. Samples: 2877572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:43:51,069][126169] Avg episode reward: [(0, '4.908')] [2025-01-03 20:43:52,433][126248] Updated weights for policy 0, policy_version 3800 (0.0010) [2025-01-03 20:43:54,514][126248] Updated weights for policy 0, policy_version 3810 (0.0010) [2025-01-03 20:43:56,068][126169] Fps is (10 sec: 16794.0, 60 sec: 12834.2, 300 sec: 12329.7). Total num frames: 15634432. Throughput: 0: 3140.9. Samples: 2892224. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:43:56,069][126169] Avg episode reward: [(0, '4.759')] [2025-01-03 20:43:56,653][126248] Updated weights for policy 0, policy_version 3820 (0.0011) [2025-01-03 20:43:59,558][126248] Updated weights for policy 0, policy_version 3830 (0.0018) [2025-01-03 20:44:01,069][126169] Fps is (10 sec: 16793.3, 60 sec: 12970.7, 300 sec: 12385.2). Total num frames: 15704064. Throughput: 0: 3341.2. Samples: 2918088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:01,069][126169] Avg episode reward: [(0, '4.805')] [2025-01-03 20:44:03,058][126248] Updated weights for policy 0, policy_version 3840 (0.0021) [2025-01-03 20:44:06,069][126169] Fps is (10 sec: 12697.1, 60 sec: 12902.4, 300 sec: 12385.2). Total num frames: 15761408. Throughput: 0: 3351.8. Samples: 2935826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:06,069][126169] Avg episode reward: [(0, '5.255')] [2025-01-03 20:44:06,450][126248] Updated weights for policy 0, policy_version 3850 (0.0020) [2025-01-03 20:44:09,819][126248] Updated weights for policy 0, policy_version 3860 (0.0020) [2025-01-03 20:44:11,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12970.7, 300 sec: 12399.1). Total num frames: 15822848. Throughput: 0: 3309.6. Samples: 2944896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:11,069][126169] Avg episode reward: [(0, '5.264')] [2025-01-03 20:44:13,134][126248] Updated weights for policy 0, policy_version 3870 (0.0019) [2025-01-03 20:44:16,069][126169] Fps is (10 sec: 12288.1, 60 sec: 13038.9, 300 sec: 12385.2). Total num frames: 15884288. Throughput: 0: 3316.6. Samples: 2963062. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:16,069][126169] Avg episode reward: [(0, '5.341')] [2025-01-03 20:44:16,657][126248] Updated weights for policy 0, policy_version 3880 (0.0021) [2025-01-03 20:44:20,002][126248] Updated weights for policy 0, policy_version 3890 (0.0021) [2025-01-03 20:44:21,069][126169] Fps is (10 sec: 12288.0, 60 sec: 13038.9, 300 sec: 12288.0). Total num frames: 15945728. Throughput: 0: 3319.8. Samples: 2981064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:21,069][126169] Avg episode reward: [(0, '4.968')] [2025-01-03 20:44:23,338][126248] Updated weights for policy 0, policy_version 3900 (0.0020) [2025-01-03 20:44:26,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12970.8, 300 sec: 12288.0). Total num frames: 16003072. Throughput: 0: 3327.3. Samples: 2990374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:26,069][126169] Avg episode reward: [(0, '4.964')] [2025-01-03 20:44:26,800][126248] Updated weights for policy 0, policy_version 3910 (0.0021) [2025-01-03 20:44:30,110][126248] Updated weights for policy 0, policy_version 3920 (0.0020) [2025-01-03 20:44:31,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12970.7, 300 sec: 12301.9). Total num frames: 16064512. Throughput: 0: 3320.8. Samples: 3008492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:31,069][126169] Avg episode reward: [(0, '4.729')] [2025-01-03 20:44:33,391][126248] Updated weights for policy 0, policy_version 3930 (0.0019) [2025-01-03 20:44:36,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12970.6, 300 sec: 12329.6). Total num frames: 16125952. Throughput: 0: 3313.2. Samples: 3026666. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:44:36,069][126169] Avg episode reward: [(0, '4.560')] [2025-01-03 20:44:36,989][126248] Updated weights for policy 0, policy_version 3940 (0.0021) [2025-01-03 20:44:40,277][126248] Updated weights for policy 0, policy_version 3950 (0.0020) [2025-01-03 20:44:41,069][126169] Fps is (10 sec: 12288.0, 60 sec: 13038.9, 300 sec: 12343.5). Total num frames: 16187392. Throughput: 0: 3184.2. Samples: 3035512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:44:41,069][126169] Avg episode reward: [(0, '4.681')] [2025-01-03 20:44:43,677][126248] Updated weights for policy 0, policy_version 3960 (0.0021) [2025-01-03 20:44:46,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12970.6, 300 sec: 12343.5). Total num frames: 16244736. Throughput: 0: 3015.0. Samples: 3053764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:44:46,069][126169] Avg episode reward: [(0, '4.642')] [2025-01-03 20:44:46,118][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003967_16248832.pth... [2025-01-03 20:44:46,183][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003242_13279232.pth [2025-01-03 20:44:47,229][126248] Updated weights for policy 0, policy_version 3970 (0.0022) [2025-01-03 20:44:50,687][126248] Updated weights for policy 0, policy_version 3980 (0.0020) [2025-01-03 20:44:51,069][126169] Fps is (10 sec: 11878.1, 60 sec: 12834.0, 300 sec: 12371.3). Total num frames: 16306176. Throughput: 0: 3010.1. Samples: 3071282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:44:51,069][126169] Avg episode reward: [(0, '4.715')] [2025-01-03 20:44:54,046][126248] Updated weights for policy 0, policy_version 3990 (0.0020) [2025-01-03 20:44:56,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12151.4, 300 sec: 12371.3). Total num frames: 16363520. Throughput: 0: 3009.7. 
Samples: 3080334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:44:56,069][126169] Avg episode reward: [(0, '4.576')] [2025-01-03 20:44:57,461][126248] Updated weights for policy 0, policy_version 4000 (0.0020) [2025-01-03 20:45:00,773][126248] Updated weights for policy 0, policy_version 4010 (0.0021) [2025-01-03 20:45:01,068][126169] Fps is (10 sec: 11878.8, 60 sec: 12015.0, 300 sec: 12385.2). Total num frames: 16424960. Throughput: 0: 3012.4. Samples: 3098620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:45:01,069][126169] Avg episode reward: [(0, '4.660')] [2025-01-03 20:45:04,157][126248] Updated weights for policy 0, policy_version 4020 (0.0021) [2025-01-03 20:45:06,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 16486400. Throughput: 0: 3014.4. Samples: 3116712. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:45:06,069][126169] Avg episode reward: [(0, '5.038')] [2025-01-03 20:45:07,526][126248] Updated weights for policy 0, policy_version 4030 (0.0020) [2025-01-03 20:45:10,815][126248] Updated weights for policy 0, policy_version 4040 (0.0019) [2025-01-03 20:45:11,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 16547840. Throughput: 0: 3016.6. Samples: 3126120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:45:11,069][126169] Avg episode reward: [(0, '4.694')] [2025-01-03 20:45:14,099][126248] Updated weights for policy 0, policy_version 4050 (0.0020) [2025-01-03 20:45:16,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 16609280. Throughput: 0: 3027.4. Samples: 3144724. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:16,069][126169] Avg episode reward: [(0, '4.581')] [2025-01-03 20:45:17,763][126248] Updated weights for policy 0, policy_version 4060 (0.0021) [2025-01-03 20:45:20,903][126248] Updated weights for policy 0, policy_version 4070 (0.0017) [2025-01-03 20:45:21,068][126169] Fps is (10 sec: 12288.3, 60 sec: 12083.3, 300 sec: 12357.4). Total num frames: 16670720. Throughput: 0: 3005.9. Samples: 3161932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:21,069][126169] Avg episode reward: [(0, '4.472')] [2025-01-03 20:45:23,116][126248] Updated weights for policy 0, policy_version 4080 (0.0011) [2025-01-03 20:45:26,012][126248] Updated weights for policy 0, policy_version 4090 (0.0019) [2025-01-03 20:45:26,069][126169] Fps is (10 sec: 14336.2, 60 sec: 12492.8, 300 sec: 12426.9). Total num frames: 16752640. Throughput: 0: 3112.5. Samples: 3175576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:26,069][126169] Avg episode reward: [(0, '4.701')] [2025-01-03 20:45:29,483][126248] Updated weights for policy 0, policy_version 4100 (0.0021) [2025-01-03 20:45:31,069][126169] Fps is (10 sec: 13926.0, 60 sec: 12424.5, 300 sec: 12413.0). Total num frames: 16809984. Throughput: 0: 3135.8. Samples: 3194876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:31,069][126169] Avg episode reward: [(0, '4.715')] [2025-01-03 20:45:33,109][126248] Updated weights for policy 0, policy_version 4110 (0.0022) [2025-01-03 20:45:36,069][126169] Fps is (10 sec: 11468.6, 60 sec: 12356.3, 300 sec: 12413.0). Total num frames: 16867328. Throughput: 0: 3125.8. Samples: 3211944. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:36,069][126169] Avg episode reward: [(0, '4.631')] [2025-01-03 20:45:36,734][126248] Updated weights for policy 0, policy_version 4120 (0.0022) [2025-01-03 20:45:40,103][126248] Updated weights for policy 0, policy_version 4130 (0.0021) [2025-01-03 20:45:41,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12288.0, 300 sec: 12357.4). Total num frames: 16924672. Throughput: 0: 3123.4. Samples: 3220886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:45:41,069][126169] Avg episode reward: [(0, '4.597')] [2025-01-03 20:45:43,649][126248] Updated weights for policy 0, policy_version 4140 (0.0021) [2025-01-03 20:45:46,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12356.3, 300 sec: 12371.3). Total num frames: 16986112. Throughput: 0: 3106.2. Samples: 3238400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:45:46,069][126169] Avg episode reward: [(0, '4.927')] [2025-01-03 20:45:47,040][126248] Updated weights for policy 0, policy_version 4150 (0.0020) [2025-01-03 20:45:50,355][126248] Updated weights for policy 0, policy_version 4160 (0.0020) [2025-01-03 20:45:51,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12356.3, 300 sec: 12371.3). Total num frames: 17047552. Throughput: 0: 3111.6. Samples: 3256736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:51,069][126169] Avg episode reward: [(0, '4.792')] [2025-01-03 20:45:53,814][126248] Updated weights for policy 0, policy_version 4170 (0.0020) [2025-01-03 20:45:56,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12356.3, 300 sec: 12385.2). Total num frames: 17104896. Throughput: 0: 3099.1. Samples: 3265582. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:45:56,069][126169] Avg episode reward: [(0, '4.927')] [2025-01-03 20:45:57,362][126248] Updated weights for policy 0, policy_version 4180 (0.0021) [2025-01-03 20:45:59,996][126248] Updated weights for policy 0, policy_version 4190 (0.0014) [2025-01-03 20:46:01,068][126169] Fps is (10 sec: 13107.6, 60 sec: 12561.1, 300 sec: 12329.7). Total num frames: 17178624. Throughput: 0: 3094.3. Samples: 3283966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:46:01,069][126169] Avg episode reward: [(0, '4.590')] [2025-01-03 20:46:02,167][126248] Updated weights for policy 0, policy_version 4200 (0.0011) [2025-01-03 20:46:04,292][126248] Updated weights for policy 0, policy_version 4210 (0.0011) [2025-01-03 20:46:06,068][126169] Fps is (10 sec: 16794.0, 60 sec: 13107.2, 300 sec: 12329.7). Total num frames: 17272832. Throughput: 0: 3347.5. Samples: 3312568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:46:06,069][126169] Avg episode reward: [(0, '4.565')] [2025-01-03 20:46:06,821][126248] Updated weights for policy 0, policy_version 4220 (0.0014) [2025-01-03 20:46:10,413][126248] Updated weights for policy 0, policy_version 4230 (0.0022) [2025-01-03 20:46:11,068][126169] Fps is (10 sec: 15155.0, 60 sec: 13039.0, 300 sec: 12357.4). Total num frames: 17330176. Throughput: 0: 3269.4. Samples: 3322698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:46:11,069][126169] Avg episode reward: [(0, '4.473')] [2025-01-03 20:46:13,932][126248] Updated weights for policy 0, policy_version 4240 (0.0022) [2025-01-03 20:46:16,069][126169] Fps is (10 sec: 11878.2, 60 sec: 13038.9, 300 sec: 12413.0). Total num frames: 17391616. Throughput: 0: 3222.9. Samples: 3339908. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:46:16,069][126169] Avg episode reward: [(0, '4.804')] [2025-01-03 20:46:17,437][126248] Updated weights for policy 0, policy_version 4250 (0.0021) [2025-01-03 20:46:20,849][126248] Updated weights for policy 0, policy_version 4260 (0.0021) [2025-01-03 20:46:21,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12970.6, 300 sec: 12426.9). Total num frames: 17448960. Throughput: 0: 3238.9. Samples: 3357694. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:46:21,069][126169] Avg episode reward: [(0, '4.883')] [2025-01-03 20:46:24,333][126248] Updated weights for policy 0, policy_version 4270 (0.0021) [2025-01-03 20:46:26,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12561.0, 300 sec: 12454.6). Total num frames: 17506304. Throughput: 0: 3232.6. Samples: 3366354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:46:26,069][126169] Avg episode reward: [(0, '4.665')] [2025-01-03 20:46:27,711][126248] Updated weights for policy 0, policy_version 4280 (0.0021) [2025-01-03 20:46:31,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12629.4, 300 sec: 12482.4). Total num frames: 17567744. Throughput: 0: 3249.5. Samples: 3384628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:46:31,069][126169] Avg episode reward: [(0, '4.477')] [2025-01-03 20:46:31,078][126248] Updated weights for policy 0, policy_version 4290 (0.0020) [2025-01-03 20:46:34,639][126248] Updated weights for policy 0, policy_version 4300 (0.0022) [2025-01-03 20:46:36,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12629.3, 300 sec: 12482.4). Total num frames: 17625088. Throughput: 0: 3227.9. Samples: 3401990. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:46:36,069][126169] Avg episode reward: [(0, '4.829')] [2025-01-03 20:46:38,210][126248] Updated weights for policy 0, policy_version 4310 (0.0021) [2025-01-03 20:46:41,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12629.3, 300 sec: 12482.4). 
Total num frames: 17682432. Throughput: 0: 3226.0. Samples: 3410752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:46:41,069][126169] Avg episode reward: [(0, '4.895')] [2025-01-03 20:46:41,899][126248] Updated weights for policy 0, policy_version 4320 (0.0022) [2025-01-03 20:46:45,233][126248] Updated weights for policy 0, policy_version 4330 (0.0020) [2025-01-03 20:46:46,069][126169] Fps is (10 sec: 11878.6, 60 sec: 12629.4, 300 sec: 12482.4). Total num frames: 17743872. Throughput: 0: 3205.5. Samples: 3428214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:46:46,069][126169] Avg episode reward: [(0, '4.846')] [2025-01-03 20:46:46,076][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004332_17743872.pth... [2025-01-03 20:46:46,136][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003600_14745600.pth [2025-01-03 20:46:48,785][126248] Updated weights for policy 0, policy_version 4340 (0.0021) [2025-01-03 20:46:51,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12561.1, 300 sec: 12482.4). Total num frames: 17801216. Throughput: 0: 2964.2. Samples: 3445956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:46:51,069][126169] Avg episode reward: [(0, '4.744')] [2025-01-03 20:46:52,142][126248] Updated weights for policy 0, policy_version 4350 (0.0021) [2025-01-03 20:46:55,478][126248] Updated weights for policy 0, policy_version 4360 (0.0021) [2025-01-03 20:46:56,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12629.3, 300 sec: 12496.3). Total num frames: 17862656. Throughput: 0: 2939.5. Samples: 3454976. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:46:56,069][126169] Avg episode reward: [(0, '4.916')] [2025-01-03 20:46:58,866][126248] Updated weights for policy 0, policy_version 4370 (0.0020) [2025-01-03 20:47:01,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12454.6). Total num frames: 17924096. Throughput: 0: 2962.9. Samples: 3473238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:01,069][126169] Avg episode reward: [(0, '4.796')] [2025-01-03 20:47:02,566][126248] Updated weights for policy 0, policy_version 4380 (0.0022) [2025-01-03 20:47:05,943][126248] Updated weights for policy 0, policy_version 4390 (0.0019) [2025-01-03 20:47:06,069][126169] Fps is (10 sec: 11878.5, 60 sec: 11810.1, 300 sec: 12343.5). Total num frames: 17981440. Throughput: 0: 2952.8. Samples: 3490572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:06,069][126169] Avg episode reward: [(0, '4.873')] [2025-01-03 20:47:09,240][126248] Updated weights for policy 0, policy_version 4400 (0.0020) [2025-01-03 20:47:11,068][126169] Fps is (10 sec: 11878.6, 60 sec: 11878.4, 300 sec: 12315.8). Total num frames: 18042880. Throughput: 0: 2966.2. Samples: 3499834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:47:11,069][126169] Avg episode reward: [(0, '5.116')] [2025-01-03 20:47:12,542][126248] Updated weights for policy 0, policy_version 4410 (0.0020) [2025-01-03 20:47:15,802][126248] Updated weights for policy 0, policy_version 4420 (0.0020) [2025-01-03 20:47:16,069][126169] Fps is (10 sec: 12287.9, 60 sec: 11878.4, 300 sec: 12357.4). Total num frames: 18104320. Throughput: 0: 2976.0. Samples: 3518550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:47:16,069][126169] Avg episode reward: [(0, '4.731')] [2025-01-03 20:47:19,030][126248] Updated weights for policy 0, policy_version 4430 (0.0018) [2025-01-03 20:47:21,069][126169] Fps is (10 sec: 13107.1, 60 sec: 12083.2, 300 sec: 12399.1). 
Total num frames: 18173952. Throughput: 0: 3033.7. Samples: 3538504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:21,069][126169] Avg episode reward: [(0, '4.802')] [2025-01-03 20:47:22,007][126248] Updated weights for policy 0, policy_version 4440 (0.0018) [2025-01-03 20:47:25,446][126248] Updated weights for policy 0, policy_version 4450 (0.0020) [2025-01-03 20:47:26,068][126169] Fps is (10 sec: 13107.6, 60 sec: 12151.5, 300 sec: 12426.9). Total num frames: 18235392. Throughput: 0: 3032.5. Samples: 3547214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:47:26,069][126169] Avg episode reward: [(0, '4.945')] [2025-01-03 20:47:27,675][126248] Updated weights for policy 0, policy_version 4460 (0.0012) [2025-01-03 20:47:30,825][126248] Updated weights for policy 0, policy_version 4470 (0.0020) [2025-01-03 20:47:31,069][126169] Fps is (10 sec: 13516.9, 60 sec: 12356.3, 300 sec: 12482.4). Total num frames: 18309120. Throughput: 0: 3142.1. Samples: 3569610. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:47:31,069][126169] Avg episode reward: [(0, '5.009')] [2025-01-03 20:47:34,106][126248] Updated weights for policy 0, policy_version 4480 (0.0020) [2025-01-03 20:47:36,069][126169] Fps is (10 sec: 13516.6, 60 sec: 12424.6, 300 sec: 12482.4). Total num frames: 18370560. Throughput: 0: 3157.3. Samples: 3588034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:36,069][126169] Avg episode reward: [(0, '4.950')] [2025-01-03 20:47:37,377][126248] Updated weights for policy 0, policy_version 4490 (0.0020) [2025-01-03 20:47:40,671][126248] Updated weights for policy 0, policy_version 4500 (0.0021) [2025-01-03 20:47:41,069][126169] Fps is (10 sec: 12697.5, 60 sec: 12561.0, 300 sec: 12510.2). Total num frames: 18436096. Throughput: 0: 3171.1. Samples: 3597674. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:41,069][126169] Avg episode reward: [(0, '5.154')] [2025-01-03 20:47:43,947][126248] Updated weights for policy 0, policy_version 4510 (0.0021) [2025-01-03 20:47:46,069][126169] Fps is (10 sec: 12697.5, 60 sec: 12561.0, 300 sec: 12524.0). Total num frames: 18497536. Throughput: 0: 3180.3. Samples: 3616352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:46,069][126169] Avg episode reward: [(0, '5.218')] [2025-01-03 20:47:47,384][126248] Updated weights for policy 0, policy_version 4520 (0.0021) [2025-01-03 20:47:50,653][126248] Updated weights for policy 0, policy_version 4530 (0.0020) [2025-01-03 20:47:51,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12629.4, 300 sec: 12524.1). Total num frames: 18558976. Throughput: 0: 3204.3. Samples: 3634766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:47:51,069][126169] Avg episode reward: [(0, '5.245')] [2025-01-03 20:47:53,907][126248] Updated weights for policy 0, policy_version 4540 (0.0019) [2025-01-03 20:47:56,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12629.4, 300 sec: 12524.0). Total num frames: 18620416. Throughput: 0: 3207.7. Samples: 3644180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:47:56,069][126169] Avg episode reward: [(0, '5.141')] [2025-01-03 20:47:57,454][126248] Updated weights for policy 0, policy_version 4550 (0.0021) [2025-01-03 20:48:00,781][126248] Updated weights for policy 0, policy_version 4560 (0.0020) [2025-01-03 20:48:01,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12561.1, 300 sec: 12510.2). Total num frames: 18677760. Throughput: 0: 3189.4. Samples: 3662074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:01,069][126169] Avg episode reward: [(0, '4.601')] [2025-01-03 20:48:04,145][126248] Updated weights for policy 0, policy_version 4570 (0.0019) [2025-01-03 20:48:06,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12629.3, 300 sec: 12524.0). 
Total num frames: 18739200. Throughput: 0: 3148.3. Samples: 3680178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:06,069][126169] Avg episode reward: [(0, '4.706')] [2025-01-03 20:48:07,479][126248] Updated weights for policy 0, policy_version 4580 (0.0020) [2025-01-03 20:48:10,728][126248] Updated weights for policy 0, policy_version 4590 (0.0020) [2025-01-03 20:48:11,068][126169] Fps is (10 sec: 12697.7, 60 sec: 12697.6, 300 sec: 12551.8). Total num frames: 18804736. Throughput: 0: 3166.5. Samples: 3689706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:48:11,069][126169] Avg episode reward: [(0, '4.860')] [2025-01-03 20:48:14,004][126248] Updated weights for policy 0, policy_version 4600 (0.0019) [2025-01-03 20:48:16,069][126169] Fps is (10 sec: 12697.7, 60 sec: 12697.6, 300 sec: 12551.8). Total num frames: 18866176. Throughput: 0: 3085.6. Samples: 3708462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:16,069][126169] Avg episode reward: [(0, '4.771')] [2025-01-03 20:48:17,430][126248] Updated weights for policy 0, policy_version 4610 (0.0020) [2025-01-03 20:48:20,719][126248] Updated weights for policy 0, policy_version 4620 (0.0021) [2025-01-03 20:48:21,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12538.0). Total num frames: 18923520. Throughput: 0: 3082.2. Samples: 3726734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:48:21,069][126169] Avg episode reward: [(0, '4.678')] [2025-01-03 20:48:24,040][126248] Updated weights for policy 0, policy_version 4630 (0.0021) [2025-01-03 20:48:26,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12561.1, 300 sec: 12551.8). Total num frames: 18989056. Throughput: 0: 3073.1. Samples: 3735962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:26,069][126169] Avg episode reward: [(0, '5.002')] [2025-01-03 20:48:26,765][126248] Updated weights for policy 0, policy_version 4640 (0.0015) [2025-01-03 20:48:28,878][126248] Updated weights for policy 0, policy_version 4650 (0.0010) [2025-01-03 20:48:31,069][126169] Fps is (10 sec: 15155.1, 60 sec: 12765.8, 300 sec: 12635.1). Total num frames: 19075072. Throughput: 0: 3203.6. Samples: 3760512. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:48:31,069][126169] Avg episode reward: [(0, '4.714')] [2025-01-03 20:48:31,820][126248] Updated weights for policy 0, policy_version 4660 (0.0018) [2025-01-03 20:48:35,305][126248] Updated weights for policy 0, policy_version 4670 (0.0021) [2025-01-03 20:48:36,069][126169] Fps is (10 sec: 14745.2, 60 sec: 12765.8, 300 sec: 12649.0). Total num frames: 19136512. Throughput: 0: 3211.4. Samples: 3779278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:48:36,069][126169] Avg episode reward: [(0, '4.699')] [2025-01-03 20:48:38,691][126248] Updated weights for policy 0, policy_version 4680 (0.0021) [2025-01-03 20:48:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12629.3, 300 sec: 12635.1). Total num frames: 19193856. Throughput: 0: 3205.5. Samples: 3788426. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:48:41,069][126169] Avg episode reward: [(0, '4.734')] [2025-01-03 20:48:42,181][126248] Updated weights for policy 0, policy_version 4690 (0.0021) [2025-01-03 20:48:45,494][126248] Updated weights for policy 0, policy_version 4700 (0.0020) [2025-01-03 20:48:46,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12629.3, 300 sec: 12607.3). Total num frames: 19255296. Throughput: 0: 3202.5. Samples: 3806186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:46,069][126169] Avg episode reward: [(0, '4.727')] [2025-01-03 20:48:46,078][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004701_19255296.pth... [2025-01-03 20:48:46,144][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003967_16248832.pth [2025-01-03 20:48:49,033][126248] Updated weights for policy 0, policy_version 4710 (0.0021) [2025-01-03 20:48:51,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12561.0, 300 sec: 12468.5). Total num frames: 19312640. Throughput: 0: 3193.9. Samples: 3823904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:51,069][126169] Avg episode reward: [(0, '4.545')] [2025-01-03 20:48:52,460][126248] Updated weights for policy 0, policy_version 4720 (0.0021) [2025-01-03 20:48:55,733][126248] Updated weights for policy 0, policy_version 4730 (0.0020) [2025-01-03 20:48:56,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12561.0, 300 sec: 12440.7). Total num frames: 19374080. Throughput: 0: 3181.7. Samples: 3832884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:48:56,069][126169] Avg episode reward: [(0, '5.384')] [2025-01-03 20:48:59,010][126248] Updated weights for policy 0, policy_version 4740 (0.0020) [2025-01-03 20:49:01,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12697.6, 300 sec: 12468.5). Total num frames: 19439616. Throughput: 0: 3184.4. Samples: 3851762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:49:01,069][126169] Avg episode reward: [(0, '5.621')] [2025-01-03 20:49:01,070][126222] Saving new best policy, reward=5.621! 
[2025-01-03 20:49:02,532][126248] Updated weights for policy 0, policy_version 4750 (0.0021) [2025-01-03 20:49:05,907][126248] Updated weights for policy 0, policy_version 4760 (0.0020) [2025-01-03 20:49:06,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12629.4, 300 sec: 12454.6). Total num frames: 19496960. Throughput: 0: 3172.8. Samples: 3869512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:49:06,069][126169] Avg episode reward: [(0, '5.590')] [2025-01-03 20:49:09,185][126248] Updated weights for policy 0, policy_version 4770 (0.0021) [2025-01-03 20:49:11,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12561.1, 300 sec: 12454.6). Total num frames: 19558400. Throughput: 0: 3174.0. Samples: 3878792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:49:11,069][126169] Avg episode reward: [(0, '5.638')] [2025-01-03 20:49:11,070][126222] Saving new best policy, reward=5.638! [2025-01-03 20:49:12,557][126248] Updated weights for policy 0, policy_version 4780 (0.0020) [2025-01-03 20:49:15,791][126248] Updated weights for policy 0, policy_version 4790 (0.0020) [2025-01-03 20:49:16,068][126169] Fps is (10 sec: 12288.0, 60 sec: 12561.1, 300 sec: 12454.6). Total num frames: 19619840. Throughput: 0: 3042.2. Samples: 3897410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:49:16,069][126169] Avg episode reward: [(0, '5.368')] [2025-01-03 20:49:19,361][126248] Updated weights for policy 0, policy_version 4800 (0.0022) [2025-01-03 20:49:21,068][126169] Fps is (10 sec: 12288.1, 60 sec: 12629.4, 300 sec: 12468.5). Total num frames: 19681280. Throughput: 0: 3010.4. Samples: 3914744. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:49:21,069][126169] Avg episode reward: [(0, '5.212')] [2025-01-03 20:49:22,165][126248] Updated weights for policy 0, policy_version 4810 (0.0015) [2025-01-03 20:49:25,352][126248] Updated weights for policy 0, policy_version 4820 (0.0019) [2025-01-03 20:49:26,069][126169] Fps is (10 sec: 13107.0, 60 sec: 12697.5, 300 sec: 12496.3). Total num frames: 19750912. Throughput: 0: 3070.4. Samples: 3926592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:49:26,069][126169] Avg episode reward: [(0, '5.790')] [2025-01-03 20:49:26,076][126222] Saving new best policy, reward=5.790! [2025-01-03 20:49:28,709][126248] Updated weights for policy 0, policy_version 4830 (0.0020) [2025-01-03 20:49:31,069][126169] Fps is (10 sec: 13107.1, 60 sec: 12288.0, 300 sec: 12496.3). Total num frames: 19812352. Throughput: 0: 3079.7. Samples: 3944772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:49:31,069][126169] Avg episode reward: [(0, '5.624')] [2025-01-03 20:49:31,939][126248] Updated weights for policy 0, policy_version 4840 (0.0020) [2025-01-03 20:49:35,318][126248] Updated weights for policy 0, policy_version 4850 (0.0020) [2025-01-03 20:49:36,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12496.3). Total num frames: 19873792. Throughput: 0: 3099.4. Samples: 3963376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:49:36,069][126169] Avg episode reward: [(0, '5.730')] [2025-01-03 20:49:38,711][126248] Updated weights for policy 0, policy_version 4860 (0.0021) [2025-01-03 20:49:41,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12356.3, 300 sec: 12510.2). Total num frames: 19935232. Throughput: 0: 3100.9. Samples: 3972424. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:49:41,069][126169] Avg episode reward: [(0, '5.355')] [2025-01-03 20:49:42,016][126248] Updated weights for policy 0, policy_version 4870 (0.0020) [2025-01-03 20:49:45,286][126248] Updated weights for policy 0, policy_version 4880 (0.0021) [2025-01-03 20:49:46,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12356.3, 300 sec: 12510.2). Total num frames: 19996672. Throughput: 0: 3092.2. Samples: 3990912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:49:46,069][126169] Avg episode reward: [(0, '5.769')] [2025-01-03 20:49:48,613][126248] Updated weights for policy 0, policy_version 4890 (0.0020) [2025-01-03 20:49:51,068][126169] Fps is (10 sec: 12697.8, 60 sec: 12492.8, 300 sec: 12537.9). Total num frames: 20062208. Throughput: 0: 3113.4. Samples: 4009616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:49:51,069][126169] Avg episode reward: [(0, '5.358')] [2025-01-03 20:49:51,321][126248] Updated weights for policy 0, policy_version 4900 (0.0015) [2025-01-03 20:49:53,440][126248] Updated weights for policy 0, policy_version 4910 (0.0010) [2025-01-03 20:49:55,550][126248] Updated weights for policy 0, policy_version 4920 (0.0010) [2025-01-03 20:49:56,068][126169] Fps is (10 sec: 16384.3, 60 sec: 13107.3, 300 sec: 12662.9). Total num frames: 20160512. Throughput: 0: 3226.5. Samples: 4023982. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:49:56,069][126169] Avg episode reward: [(0, '5.415')] [2025-01-03 20:49:58,181][126248] Updated weights for policy 0, policy_version 4930 (0.0016) [2025-01-03 20:50:01,069][126169] Fps is (10 sec: 15974.0, 60 sec: 13038.9, 300 sec: 12662.9). Total num frames: 20221952. Throughput: 0: 3356.3. Samples: 4048444. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:50:01,069][126169] Avg episode reward: [(0, '5.360')] [2025-01-03 20:50:01,840][126248] Updated weights for policy 0, policy_version 4940 (0.0023) [2025-01-03 20:50:05,423][126248] Updated weights for policy 0, policy_version 4950 (0.0021) [2025-01-03 20:50:06,069][126169] Fps is (10 sec: 11878.1, 60 sec: 13038.9, 300 sec: 12649.0). Total num frames: 20279296. Throughput: 0: 3347.9. Samples: 4065400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:06,069][126169] Avg episode reward: [(0, '5.690')] [2025-01-03 20:50:08,837][126248] Updated weights for policy 0, policy_version 4960 (0.0021) [2025-01-03 20:50:11,069][126169] Fps is (10 sec: 11878.5, 60 sec: 13038.9, 300 sec: 12649.0). Total num frames: 20340736. Throughput: 0: 3284.3. Samples: 4074386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:11,069][126169] Avg episode reward: [(0, '5.173')] [2025-01-03 20:50:12,337][126248] Updated weights for policy 0, policy_version 4970 (0.0021) [2025-01-03 20:50:15,743][126248] Updated weights for policy 0, policy_version 4980 (0.0021) [2025-01-03 20:50:16,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12970.6, 300 sec: 12635.1). Total num frames: 20398080. Throughput: 0: 3276.6. Samples: 4092220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:16,069][126169] Avg episode reward: [(0, '5.111')] [2025-01-03 20:50:19,095][126248] Updated weights for policy 0, policy_version 4990 (0.0020) [2025-01-03 20:50:21,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12970.6, 300 sec: 12565.7). Total num frames: 20459520. Throughput: 0: 3262.2. Samples: 4110176. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:21,069][126169] Avg episode reward: [(0, '5.113')] [2025-01-03 20:50:22,447][126248] Updated weights for policy 0, policy_version 5000 (0.0020) [2025-01-03 20:50:25,704][126248] Updated weights for policy 0, policy_version 5010 (0.0020) [2025-01-03 20:50:26,069][126169] Fps is (10 sec: 12697.7, 60 sec: 12902.4, 300 sec: 12593.5). Total num frames: 20525056. Throughput: 0: 3272.2. Samples: 4119674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:26,069][126169] Avg episode reward: [(0, '5.077')] [2025-01-03 20:50:29,030][126248] Updated weights for policy 0, policy_version 5020 (0.0020) [2025-01-03 20:50:31,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12902.4, 300 sec: 12607.4). Total num frames: 20586496. Throughput: 0: 3274.0. Samples: 4138244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:31,069][126169] Avg episode reward: [(0, '5.305')] [2025-01-03 20:50:32,373][126248] Updated weights for policy 0, policy_version 5030 (0.0021) [2025-01-03 20:50:35,667][126248] Updated weights for policy 0, policy_version 5040 (0.0020) [2025-01-03 20:50:36,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12902.4, 300 sec: 12621.2). Total num frames: 20647936. Throughput: 0: 3270.9. Samples: 4156806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:50:36,069][126169] Avg episode reward: [(0, '5.963')] [2025-01-03 20:50:36,075][126222] Saving new best policy, reward=5.963! [2025-01-03 20:50:39,029][126248] Updated weights for policy 0, policy_version 5050 (0.0021) [2025-01-03 20:50:41,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12902.4, 300 sec: 12621.2). Total num frames: 20709376. Throughput: 0: 3154.5. Samples: 4165936. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:50:41,069][126169] Avg episode reward: [(0, '4.851')] [2025-01-03 20:50:42,482][126248] Updated weights for policy 0, policy_version 5060 (0.0020) [2025-01-03 20:50:46,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12765.9, 300 sec: 12593.5). Total num frames: 20762624. Throughput: 0: 3006.0. Samples: 4183714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:50:46,069][126169] Avg episode reward: [(0, '5.219')] [2025-01-03 20:50:46,089][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005070_20766720.pth... [2025-01-03 20:50:46,094][126248] Updated weights for policy 0, policy_version 5070 (0.0021) [2025-01-03 20:50:46,155][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004332_17743872.pth [2025-01-03 20:50:49,572][126248] Updated weights for policy 0, policy_version 5080 (0.0021) [2025-01-03 20:50:51,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12697.6, 300 sec: 12607.4). Total num frames: 20824064. Throughput: 0: 3016.8. Samples: 4201156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:50:51,069][126169] Avg episode reward: [(0, '4.860')] [2025-01-03 20:50:52,897][126248] Updated weights for policy 0, policy_version 5090 (0.0020) [2025-01-03 20:50:56,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12083.2, 300 sec: 12565.7). Total num frames: 20885504. Throughput: 0: 3025.8. Samples: 4210548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:50:56,069][126169] Avg episode reward: [(0, '4.953')] [2025-01-03 20:50:56,123][126248] Updated weights for policy 0, policy_version 5100 (0.0019) [2025-01-03 20:50:59,363][126248] Updated weights for policy 0, policy_version 5110 (0.0020) [2025-01-03 20:51:01,068][126169] Fps is (10 sec: 12697.7, 60 sec: 12151.5, 300 sec: 12468.5). Total num frames: 20951040. Throughput: 0: 3050.6. 
Samples: 4229496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:51:01,069][126169] Avg episode reward: [(0, '4.781')] [2025-01-03 20:51:02,657][126248] Updated weights for policy 0, policy_version 5120 (0.0020) [2025-01-03 20:51:05,908][126248] Updated weights for policy 0, policy_version 5130 (0.0020) [2025-01-03 20:51:06,069][126169] Fps is (10 sec: 12697.5, 60 sec: 12219.7, 300 sec: 12482.4). Total num frames: 21012480. Throughput: 0: 3070.7. Samples: 4248356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:06,069][126169] Avg episode reward: [(0, '4.881')] [2025-01-03 20:51:09,201][126248] Updated weights for policy 0, policy_version 5140 (0.0020) [2025-01-03 20:51:11,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12219.7, 300 sec: 12482.4). Total num frames: 21073920. Throughput: 0: 3063.9. Samples: 4257548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:11,069][126169] Avg episode reward: [(0, '4.888')] [2025-01-03 20:51:12,503][126248] Updated weights for policy 0, policy_version 5150 (0.0020) [2025-01-03 20:51:15,754][126248] Updated weights for policy 0, policy_version 5160 (0.0018) [2025-01-03 20:51:16,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12496.3). Total num frames: 21135360. Throughput: 0: 3068.7. Samples: 4276338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:16,069][126169] Avg episode reward: [(0, '4.907')] [2025-01-03 20:51:19,121][126248] Updated weights for policy 0, policy_version 5170 (0.0021) [2025-01-03 20:51:21,068][126169] Fps is (10 sec: 13107.4, 60 sec: 12424.6, 300 sec: 12537.9). Total num frames: 21204992. Throughput: 0: 3062.3. Samples: 4294610. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:21,069][126169] Avg episode reward: [(0, '4.985')] [2025-01-03 20:51:21,829][126248] Updated weights for policy 0, policy_version 5180 (0.0014) [2025-01-03 20:51:24,258][126248] Updated weights for policy 0, policy_version 5190 (0.0014) [2025-01-03 20:51:26,069][126169] Fps is (10 sec: 14336.2, 60 sec: 12561.1, 300 sec: 12579.6). Total num frames: 21278720. Throughput: 0: 3152.4. Samples: 4307794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:26,069][126169] Avg episode reward: [(0, '4.617')] [2025-01-03 20:51:27,656][126248] Updated weights for policy 0, policy_version 5200 (0.0021) [2025-01-03 20:51:31,009][126248] Updated weights for policy 0, policy_version 5210 (0.0021) [2025-01-03 20:51:31,069][126169] Fps is (10 sec: 13516.5, 60 sec: 12561.1, 300 sec: 12593.5). Total num frames: 21340160. Throughput: 0: 3185.0. Samples: 4327038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:31,069][126169] Avg episode reward: [(0, '4.916')] [2025-01-03 20:51:34,487][126248] Updated weights for policy 0, policy_version 5220 (0.0021) [2025-01-03 20:51:36,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12492.8, 300 sec: 12593.5). Total num frames: 21397504. Throughput: 0: 3188.3. Samples: 4344630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:36,069][126169] Avg episode reward: [(0, '4.881')] [2025-01-03 20:51:38,014][126248] Updated weights for policy 0, policy_version 5230 (0.0022) [2025-01-03 20:51:41,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12424.5, 300 sec: 12579.6). Total num frames: 21454848. Throughput: 0: 3174.3. Samples: 4353390. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:41,069][126169] Avg episode reward: [(0, '5.073')] [2025-01-03 20:51:41,714][126248] Updated weights for policy 0, policy_version 5240 (0.0023) [2025-01-03 20:51:45,083][126248] Updated weights for policy 0, policy_version 5250 (0.0020) [2025-01-03 20:51:46,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12492.8, 300 sec: 12579.6). Total num frames: 21512192. Throughput: 0: 3139.3. Samples: 4370764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:51:46,069][126169] Avg episode reward: [(0, '4.767')] [2025-01-03 20:51:48,379][126248] Updated weights for policy 0, policy_version 5260 (0.0020) [2025-01-03 20:51:51,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12579.6). Total num frames: 21573632. Throughput: 0: 3121.8. Samples: 4388838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:51:51,069][126169] Avg episode reward: [(0, '4.787')] [2025-01-03 20:51:52,011][126248] Updated weights for policy 0, policy_version 5270 (0.0022) [2025-01-03 20:51:55,529][126248] Updated weights for policy 0, policy_version 5280 (0.0021) [2025-01-03 20:51:56,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12424.5, 300 sec: 12565.7). Total num frames: 21630976. Throughput: 0: 3106.6. Samples: 4397346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:51:56,069][126169] Avg episode reward: [(0, '4.779')] [2025-01-03 20:51:58,602][126248] Updated weights for policy 0, policy_version 5290 (0.0016) [2025-01-03 20:52:00,762][126248] Updated weights for policy 0, policy_version 5300 (0.0010) [2025-01-03 20:52:01,068][126169] Fps is (10 sec: 13926.7, 60 sec: 12697.6, 300 sec: 12649.0). Total num frames: 21712896. Throughput: 0: 3126.4. Samples: 4417024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:52:01,069][126169] Avg episode reward: [(0, '4.622')] [2025-01-03 20:52:02,890][126248] Updated weights for policy 0, policy_version 5310 (0.0010) [2025-01-03 20:52:04,995][126248] Updated weights for policy 0, policy_version 5320 (0.0010) [2025-01-03 20:52:06,069][126169] Fps is (10 sec: 17613.1, 60 sec: 13243.8, 300 sec: 12760.1). Total num frames: 21807104. Throughput: 0: 3363.7. Samples: 4445976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:52:06,069][126169] Avg episode reward: [(0, '4.846')] [2025-01-03 20:52:07,554][126248] Updated weights for policy 0, policy_version 5330 (0.0015) [2025-01-03 20:52:11,069][126169] Fps is (10 sec: 15564.4, 60 sec: 13243.7, 300 sec: 12760.1). Total num frames: 21868544. Throughput: 0: 3318.8. Samples: 4457140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:52:11,069][126169] Avg episode reward: [(0, '4.754')] [2025-01-03 20:52:11,332][126248] Updated weights for policy 0, policy_version 5340 (0.0023) [2025-01-03 20:52:14,850][126248] Updated weights for policy 0, policy_version 5350 (0.0022) [2025-01-03 20:52:16,069][126169] Fps is (10 sec: 11878.1, 60 sec: 13175.5, 300 sec: 12718.4). Total num frames: 21925888. Throughput: 0: 3261.8. Samples: 4473820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:52:16,069][126169] Avg episode reward: [(0, '5.054')] [2025-01-03 20:52:18,322][126248] Updated weights for policy 0, policy_version 5360 (0.0021) [2025-01-03 20:52:21,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12970.6, 300 sec: 12704.5). Total num frames: 21983232. Throughput: 0: 3255.0. Samples: 4491106. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:52:21,069][126169] Avg episode reward: [(0, '4.433')] [2025-01-03 20:52:21,949][126248] Updated weights for policy 0, policy_version 5370 (0.0021) [2025-01-03 20:52:25,258][126248] Updated weights for policy 0, policy_version 5380 (0.0020) [2025-01-03 20:52:26,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12765.8, 300 sec: 12662.9). Total num frames: 22044672. Throughput: 0: 3256.2. Samples: 4499918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:52:26,069][126169] Avg episode reward: [(0, '5.032')] [2025-01-03 20:52:28,673][126248] Updated weights for policy 0, policy_version 5390 (0.0021) [2025-01-03 20:52:31,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12697.6, 300 sec: 12649.0). Total num frames: 22102016. Throughput: 0: 3268.7. Samples: 4517856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:52:31,069][126169] Avg episode reward: [(0, '5.548')] [2025-01-03 20:52:32,213][126248] Updated weights for policy 0, policy_version 5400 (0.0021) [2025-01-03 20:52:35,442][126248] Updated weights for policy 0, policy_version 5410 (0.0019) [2025-01-03 20:52:36,069][126169] Fps is (10 sec: 11878.6, 60 sec: 12765.9, 300 sec: 12635.1). Total num frames: 22163456. Throughput: 0: 3274.9. Samples: 4536208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:52:36,069][126169] Avg episode reward: [(0, '5.385')] [2025-01-03 20:52:38,756][126248] Updated weights for policy 0, policy_version 5420 (0.0019) [2025-01-03 20:52:41,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 12635.1). Total num frames: 22224896. Throughput: 0: 3294.0. Samples: 4545576. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:52:41,069][126169] Avg episode reward: [(0, '5.424')] [2025-01-03 20:52:42,193][126248] Updated weights for policy 0, policy_version 5430 (0.0021) [2025-01-03 20:52:45,452][126248] Updated weights for policy 0, policy_version 5440 (0.0020) [2025-01-03 20:52:46,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12902.4, 300 sec: 12635.1). Total num frames: 22286336. Throughput: 0: 3264.5. Samples: 4563928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:52:46,069][126169] Avg episode reward: [(0, '5.596')] [2025-01-03 20:52:46,131][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005442_22290432.pth... [2025-01-03 20:52:46,190][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004701_19255296.pth [2025-01-03 20:52:48,806][126248] Updated weights for policy 0, policy_version 5450 (0.0020) [2025-01-03 20:52:51,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12970.7, 300 sec: 12649.0). Total num frames: 22351872. Throughput: 0: 3036.7. Samples: 4582630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:52:51,069][126169] Avg episode reward: [(0, '5.505')] [2025-01-03 20:52:52,105][126248] Updated weights for policy 0, policy_version 5460 (0.0021) [2025-01-03 20:52:55,454][126248] Updated weights for policy 0, policy_version 5470 (0.0021) [2025-01-03 20:52:56,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12649.0). Total num frames: 22409216. Throughput: 0: 2988.4. Samples: 4591618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:52:56,069][126169] Avg episode reward: [(0, '4.984')] [2025-01-03 20:52:58,783][126248] Updated weights for policy 0, policy_version 5480 (0.0019) [2025-01-03 20:53:01,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12629.3, 300 sec: 12649.0). Total num frames: 22470656. Throughput: 0: 3026.9. 
Samples: 4610030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:01,069][126169] Avg episode reward: [(0, '5.425')] [2025-01-03 20:53:02,211][126248] Updated weights for policy 0, policy_version 5490 (0.0021) [2025-01-03 20:53:05,502][126248] Updated weights for policy 0, policy_version 5500 (0.0021) [2025-01-03 20:53:06,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12083.2, 300 sec: 12635.1). Total num frames: 22532096. Throughput: 0: 3051.1. Samples: 4628406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:06,069][126169] Avg episode reward: [(0, '5.898')] [2025-01-03 20:53:08,849][126248] Updated weights for policy 0, policy_version 5510 (0.0020) [2025-01-03 20:53:11,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12083.2, 300 sec: 12635.1). Total num frames: 22593536. Throughput: 0: 3059.7. Samples: 4637604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:11,069][126169] Avg episode reward: [(0, '5.663')] [2025-01-03 20:53:12,260][126248] Updated weights for policy 0, policy_version 5520 (0.0020) [2025-01-03 20:53:15,581][126248] Updated weights for policy 0, policy_version 5530 (0.0020) [2025-01-03 20:53:16,069][126169] Fps is (10 sec: 12287.4, 60 sec: 12151.4, 300 sec: 12649.0). Total num frames: 22654976. Throughput: 0: 3068.2. Samples: 4655928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:16,070][126169] Avg episode reward: [(0, '5.319')] [2025-01-03 20:53:18,932][126248] Updated weights for policy 0, policy_version 5540 (0.0021) [2025-01-03 20:53:21,069][126169] Fps is (10 sec: 11878.6, 60 sec: 12151.5, 300 sec: 12621.2). Total num frames: 22712320. Throughput: 0: 3061.1. Samples: 4673956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:21,069][126169] Avg episode reward: [(0, '5.292')] [2025-01-03 20:53:22,733][126248] Updated weights for policy 0, policy_version 5550 (0.0023) [2025-01-03 20:53:26,069][126169] Fps is (10 sec: 11059.5, 60 sec: 12014.9, 300 sec: 12510.2). 
Total num frames: 22765568. Throughput: 0: 3026.4. Samples: 4681762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:26,069][126169] Avg episode reward: [(0, '4.902')] [2025-01-03 20:53:26,497][126248] Updated weights for policy 0, policy_version 5560 (0.0023) [2025-01-03 20:53:29,391][126248] Updated weights for policy 0, policy_version 5570 (0.0015) [2025-01-03 20:53:31,068][126169] Fps is (10 sec: 13107.5, 60 sec: 12356.3, 300 sec: 12565.7). Total num frames: 22843392. Throughput: 0: 3026.5. Samples: 4700120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:31,069][126169] Avg episode reward: [(0, '4.879')] [2025-01-03 20:53:31,519][126248] Updated weights for policy 0, policy_version 5580 (0.0011) [2025-01-03 20:53:34,298][126248] Updated weights for policy 0, policy_version 5590 (0.0017) [2025-01-03 20:53:36,068][126169] Fps is (10 sec: 14745.9, 60 sec: 12492.8, 300 sec: 12607.4). Total num frames: 22913024. Throughput: 0: 3143.3. Samples: 4724076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:36,069][126169] Avg episode reward: [(0, '4.721')] [2025-01-03 20:53:37,718][126248] Updated weights for policy 0, policy_version 5600 (0.0021) [2025-01-03 20:53:41,069][126169] Fps is (10 sec: 13106.8, 60 sec: 12492.8, 300 sec: 12607.4). Total num frames: 22974464. Throughput: 0: 3145.8. Samples: 4733180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:41,069][126169] Avg episode reward: [(0, '4.754')] [2025-01-03 20:53:41,134][126248] Updated weights for policy 0, policy_version 5610 (0.0021) [2025-01-03 20:53:44,436][126248] Updated weights for policy 0, policy_version 5620 (0.0020) [2025-01-03 20:53:46,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12492.8, 300 sec: 12621.2). Total num frames: 23035904. Throughput: 0: 3143.1. Samples: 4751472. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:46,069][126169] Avg episode reward: [(0, '4.673')] [2025-01-03 20:53:47,774][126248] Updated weights for policy 0, policy_version 5630 (0.0020) [2025-01-03 20:53:51,050][126248] Updated weights for policy 0, policy_version 5640 (0.0020) [2025-01-03 20:53:51,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12492.8, 300 sec: 12635.1). Total num frames: 23101440. Throughput: 0: 3150.8. Samples: 4770194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:53:51,069][126169] Avg episode reward: [(0, '5.040')] [2025-01-03 20:53:54,316][126248] Updated weights for policy 0, policy_version 5650 (0.0020) [2025-01-03 20:53:56,069][126169] Fps is (10 sec: 12697.7, 60 sec: 12561.1, 300 sec: 12621.2). Total num frames: 23162880. Throughput: 0: 3153.0. Samples: 4779490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:53:56,069][126169] Avg episode reward: [(0, '5.749')] [2025-01-03 20:53:57,620][126248] Updated weights for policy 0, policy_version 5660 (0.0020) [2025-01-03 20:54:00,923][126248] Updated weights for policy 0, policy_version 5670 (0.0019) [2025-01-03 20:54:01,069][126169] Fps is (10 sec: 12287.6, 60 sec: 12561.0, 300 sec: 12635.1). Total num frames: 23224320. Throughput: 0: 3164.3. Samples: 4798320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:01,070][126169] Avg episode reward: [(0, '5.315')] [2025-01-03 20:54:04,295][126248] Updated weights for policy 0, policy_version 5680 (0.0021) [2025-01-03 20:54:06,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12621.2). Total num frames: 23281664. Throughput: 0: 3159.1. Samples: 4816116. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:54:06,069][126169] Avg episode reward: [(0, '5.355')] [2025-01-03 20:54:07,761][126248] Updated weights for policy 0, policy_version 5690 (0.0021) [2025-01-03 20:54:10,987][126248] Updated weights for policy 0, policy_version 5700 (0.0020) [2025-01-03 20:54:11,069][126169] Fps is (10 sec: 12288.3, 60 sec: 12561.1, 300 sec: 12635.1). Total num frames: 23347200. Throughput: 0: 3189.8. Samples: 4825304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:11,069][126169] Avg episode reward: [(0, '5.737')] [2025-01-03 20:54:14,251][126248] Updated weights for policy 0, policy_version 5710 (0.0020) [2025-01-03 20:54:16,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12561.2, 300 sec: 12635.1). Total num frames: 23408640. Throughput: 0: 3205.0. Samples: 4844344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:16,069][126169] Avg episode reward: [(0, '5.556')] [2025-01-03 20:54:17,567][126248] Updated weights for policy 0, policy_version 5720 (0.0021) [2025-01-03 20:54:20,837][126248] Updated weights for policy 0, policy_version 5730 (0.0021) [2025-01-03 20:54:21,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12629.3, 300 sec: 12607.4). Total num frames: 23470080. Throughput: 0: 3086.4. Samples: 4862966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:21,069][126169] Avg episode reward: [(0, '5.691')] [2025-01-03 20:54:24,089][126248] Updated weights for policy 0, policy_version 5740 (0.0020) [2025-01-03 20:54:26,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12765.9, 300 sec: 12607.3). Total num frames: 23531520. Throughput: 0: 3092.8. Samples: 4872356. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:26,069][126169] Avg episode reward: [(0, '5.255')] [2025-01-03 20:54:27,386][126248] Updated weights for policy 0, policy_version 5750 (0.0020) [2025-01-03 20:54:30,653][126248] Updated weights for policy 0, policy_version 5760 (0.0019) [2025-01-03 20:54:31,069][126169] Fps is (10 sec: 12697.4, 60 sec: 12561.0, 300 sec: 12621.2). Total num frames: 23597056. Throughput: 0: 3105.4. Samples: 4891216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:31,069][126169] Avg episode reward: [(0, '4.849')] [2025-01-03 20:54:33,935][126248] Updated weights for policy 0, policy_version 5770 (0.0020) [2025-01-03 20:54:36,068][126169] Fps is (10 sec: 13107.3, 60 sec: 12492.8, 300 sec: 12635.1). Total num frames: 23662592. Throughput: 0: 3097.7. Samples: 4909588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:36,069][126169] Avg episode reward: [(0, '4.980')] [2025-01-03 20:54:36,696][126248] Updated weights for policy 0, policy_version 5780 (0.0015) [2025-01-03 20:54:38,830][126248] Updated weights for policy 0, policy_version 5790 (0.0011) [2025-01-03 20:54:40,943][126248] Updated weights for policy 0, policy_version 5800 (0.0010) [2025-01-03 20:54:41,068][126169] Fps is (10 sec: 15974.7, 60 sec: 13039.0, 300 sec: 12746.2). Total num frames: 23756800. Throughput: 0: 3197.4. Samples: 4923372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 20:54:41,069][126169] Avg episode reward: [(0, '4.971')] [2025-01-03 20:54:43,197][126248] Updated weights for policy 0, policy_version 5810 (0.0012) [2025-01-03 20:54:46,069][126169] Fps is (10 sec: 16793.4, 60 sec: 13243.7, 300 sec: 12774.0). Total num frames: 23830528. Throughput: 0: 3371.9. Samples: 4950054. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:46,069][126169] Avg episode reward: [(0, '4.783')] [2025-01-03 20:54:46,078][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005818_23830528.pth... [2025-01-03 20:54:46,145][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005070_20766720.pth [2025-01-03 20:54:46,704][126248] Updated weights for policy 0, policy_version 5820 (0.0021) [2025-01-03 20:54:50,301][126248] Updated weights for policy 0, policy_version 5830 (0.0022) [2025-01-03 20:54:51,069][126169] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 12635.1). Total num frames: 23887872. Throughput: 0: 3354.0. Samples: 4967048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:51,069][126169] Avg episode reward: [(0, '4.860')] [2025-01-03 20:54:53,810][126248] Updated weights for policy 0, policy_version 5840 (0.0021) [2025-01-03 20:54:56,069][126169] Fps is (10 sec: 11468.7, 60 sec: 13038.9, 300 sec: 12621.2). Total num frames: 23945216. Throughput: 0: 3344.9. Samples: 4975824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:54:56,069][126169] Avg episode reward: [(0, '4.835')] [2025-01-03 20:54:57,352][126248] Updated weights for policy 0, policy_version 5850 (0.0021) [2025-01-03 20:55:00,755][126248] Updated weights for policy 0, policy_version 5860 (0.0021) [2025-01-03 20:55:01,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12970.7, 300 sec: 12621.2). Total num frames: 24002560. Throughput: 0: 3314.7. Samples: 4993504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:55:01,069][126169] Avg episode reward: [(0, '4.472')] [2025-01-03 20:55:04,105][126248] Updated weights for policy 0, policy_version 5870 (0.0021) [2025-01-03 20:55:06,069][126169] Fps is (10 sec: 11878.5, 60 sec: 13038.9, 300 sec: 12621.2). Total num frames: 24064000. Throughput: 0: 3297.5. 
Samples: 5011354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:55:06,069][126169] Avg episode reward: [(0, '4.841')] [2025-01-03 20:55:07,480][126248] Updated weights for policy 0, policy_version 5880 (0.0020) [2025-01-03 20:55:10,749][126248] Updated weights for policy 0, policy_version 5890 (0.0020) [2025-01-03 20:55:11,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12635.1). Total num frames: 24125440. Throughput: 0: 3299.0. Samples: 5020810. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:55:11,069][126169] Avg episode reward: [(0, '4.745')] [2025-01-03 20:55:14,061][126248] Updated weights for policy 0, policy_version 5900 (0.0020) [2025-01-03 20:55:16,069][126169] Fps is (10 sec: 12697.5, 60 sec: 13038.9, 300 sec: 12649.0). Total num frames: 24190976. Throughput: 0: 3293.4. Samples: 5039420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:55:16,069][126169] Avg episode reward: [(0, '4.754')] [2025-01-03 20:55:17,381][126248] Updated weights for policy 0, policy_version 5910 (0.0020) [2025-01-03 20:55:20,806][126248] Updated weights for policy 0, policy_version 5920 (0.0021) [2025-01-03 20:55:21,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 12621.2). Total num frames: 24248320. Throughput: 0: 3295.8. Samples: 5057900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:55:21,069][126169] Avg episode reward: [(0, '4.783')] [2025-01-03 20:55:24,579][126248] Updated weights for policy 0, policy_version 5930 (0.0023) [2025-01-03 20:55:26,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12834.1, 300 sec: 12593.5). Total num frames: 24301568. Throughput: 0: 3169.0. Samples: 5065978. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:55:26,069][126169] Avg episode reward: [(0, '4.612')] [2025-01-03 20:55:28,183][126248] Updated weights for policy 0, policy_version 5940 (0.0022) [2025-01-03 20:55:31,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12765.9, 300 sec: 12593.5). 
Total num frames: 24363008. Throughput: 0: 2946.3. Samples: 5082638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:55:31,069][126169] Avg episode reward: [(0, '4.784')] [2025-01-03 20:55:31,705][126248] Updated weights for policy 0, policy_version 5950 (0.0022) [2025-01-03 20:55:34,985][126248] Updated weights for policy 0, policy_version 5960 (0.0020) [2025-01-03 20:55:36,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12697.5, 300 sec: 12593.5). Total num frames: 24424448. Throughput: 0: 2972.0. Samples: 5100788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:55:36,069][126169] Avg episode reward: [(0, '4.517')] [2025-01-03 20:55:38,222][126248] Updated weights for policy 0, policy_version 5970 (0.0020) [2025-01-03 20:55:41,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12151.4, 300 sec: 12621.2). Total num frames: 24485888. Throughput: 0: 2988.0. Samples: 5110282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:55:41,069][126169] Avg episode reward: [(0, '4.537')] [2025-01-03 20:55:41,686][126248] Updated weights for policy 0, policy_version 5980 (0.0021) [2025-01-03 20:55:44,551][126248] Updated weights for policy 0, policy_version 5990 (0.0016) [2025-01-03 20:55:46,068][126169] Fps is (10 sec: 13926.7, 60 sec: 12219.8, 300 sec: 12676.8). Total num frames: 24563712. Throughput: 0: 3016.2. Samples: 5129232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:55:46,069][126169] Avg episode reward: [(0, '4.833')] [2025-01-03 20:55:46,662][126248] Updated weights for policy 0, policy_version 6000 (0.0010) [2025-01-03 20:55:48,769][126248] Updated weights for policy 0, policy_version 6010 (0.0011) [2025-01-03 20:55:50,887][126248] Updated weights for policy 0, policy_version 6020 (0.0011) [2025-01-03 20:55:51,068][126169] Fps is (10 sec: 17203.4, 60 sec: 12834.1, 300 sec: 12787.9). Total num frames: 24657920. Throughput: 0: 3266.3. Samples: 5158338. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:55:51,069][126169] Avg episode reward: [(0, '4.854')] [2025-01-03 20:55:54,173][126248] Updated weights for policy 0, policy_version 6030 (0.0021) [2025-01-03 20:55:56,069][126169] Fps is (10 sec: 15564.5, 60 sec: 12902.4, 300 sec: 12774.0). Total num frames: 24719360. Throughput: 0: 3293.9. Samples: 5169036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:55:56,069][126169] Avg episode reward: [(0, '4.612')] [2025-01-03 20:55:57,717][126248] Updated weights for policy 0, policy_version 6040 (0.0021) [2025-01-03 20:56:01,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12902.4, 300 sec: 12760.1). Total num frames: 24776704. Throughput: 0: 3271.9. Samples: 5186654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:01,069][126169] Avg episode reward: [(0, '4.824')] [2025-01-03 20:56:01,179][126248] Updated weights for policy 0, policy_version 6050 (0.0021) [2025-01-03 20:56:04,745][126248] Updated weights for policy 0, policy_version 6060 (0.0022) [2025-01-03 20:56:06,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12834.1, 300 sec: 12746.2). Total num frames: 24834048. Throughput: 0: 3241.0. Samples: 5203744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:06,069][126169] Avg episode reward: [(0, '4.619')] [2025-01-03 20:56:08,057][126248] Updated weights for policy 0, policy_version 6070 (0.0020) [2025-01-03 20:56:11,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12902.4, 300 sec: 12760.1). Total num frames: 24899584. Throughput: 0: 3271.6. Samples: 5213200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:11,069][126169] Avg episode reward: [(0, '4.695')] [2025-01-03 20:56:11,370][126248] Updated weights for policy 0, policy_version 6080 (0.0020) [2025-01-03 20:56:14,648][126248] Updated weights for policy 0, policy_version 6090 (0.0020) [2025-01-03 20:56:16,069][126169] Fps is (10 sec: 12697.5, 60 sec: 12834.1, 300 sec: 12732.3). 
Total num frames: 24961024. Throughput: 0: 3316.9. Samples: 5231898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:16,069][126169] Avg episode reward: [(0, '5.061')] [2025-01-03 20:56:17,975][126248] Updated weights for policy 0, policy_version 6100 (0.0021) [2025-01-03 20:56:21,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12902.4, 300 sec: 12690.7). Total num frames: 25022464. Throughput: 0: 3329.3. Samples: 5250608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:21,069][126169] Avg episode reward: [(0, '4.628')] [2025-01-03 20:56:21,230][126248] Updated weights for policy 0, policy_version 6110 (0.0020) [2025-01-03 20:56:24,506][126248] Updated weights for policy 0, policy_version 6120 (0.0020) [2025-01-03 20:56:26,069][126169] Fps is (10 sec: 12288.0, 60 sec: 13038.9, 300 sec: 12690.7). Total num frames: 25083904. Throughput: 0: 3323.1. Samples: 5259824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:26,069][126169] Avg episode reward: [(0, '4.826')] [2025-01-03 20:56:27,956][126248] Updated weights for policy 0, policy_version 6130 (0.0020) [2025-01-03 20:56:31,069][126169] Fps is (10 sec: 12288.1, 60 sec: 13038.9, 300 sec: 12704.5). Total num frames: 25145344. Throughput: 0: 3312.3. Samples: 5278284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:31,069][126169] Avg episode reward: [(0, '4.837')] [2025-01-03 20:56:31,210][126248] Updated weights for policy 0, policy_version 6140 (0.0019) [2025-01-03 20:56:34,477][126248] Updated weights for policy 0, policy_version 6150 (0.0020) [2025-01-03 20:56:36,069][126169] Fps is (10 sec: 12287.9, 60 sec: 13038.9, 300 sec: 12718.4). Total num frames: 25206784. Throughput: 0: 3075.6. Samples: 5296742. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:56:36,069][126169] Avg episode reward: [(0, '4.864')] [2025-01-03 20:56:37,868][126248] Updated weights for policy 0, policy_version 6160 (0.0020) [2025-01-03 20:56:41,069][126169] Fps is (10 sec: 12287.8, 60 sec: 13038.9, 300 sec: 12732.3). Total num frames: 25268224. Throughput: 0: 3043.4. Samples: 5305988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:56:41,069][126169] Avg episode reward: [(0, '4.703')] [2025-01-03 20:56:41,176][126248] Updated weights for policy 0, policy_version 6170 (0.0019) [2025-01-03 20:56:44,466][126248] Updated weights for policy 0, policy_version 6180 (0.0020) [2025-01-03 20:56:46,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12765.9, 300 sec: 12732.3). Total num frames: 25329664. Throughput: 0: 3068.3. Samples: 5324726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:56:46,069][126169] Avg episode reward: [(0, '4.636')] [2025-01-03 20:56:46,140][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006185_25333760.pth... [2025-01-03 20:56:46,200][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005442_22290432.pth [2025-01-03 20:56:47,838][126248] Updated weights for policy 0, policy_version 6190 (0.0020) [2025-01-03 20:56:51,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12219.7, 300 sec: 12746.2). Total num frames: 25391104. Throughput: 0: 3102.0. Samples: 5343332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:56:51,069][126169] Avg episode reward: [(0, '4.939')] [2025-01-03 20:56:51,131][126248] Updated weights for policy 0, policy_version 6200 (0.0021) [2025-01-03 20:56:54,515][126248] Updated weights for policy 0, policy_version 6210 (0.0021) [2025-01-03 20:56:56,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12219.7, 300 sec: 12676.8). Total num frames: 25452544. Throughput: 0: 3090.3. 
Samples: 5352264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:56:56,069][126169] Avg episode reward: [(0, '4.727')] [2025-01-03 20:56:57,851][126248] Updated weights for policy 0, policy_version 6220 (0.0020) [2025-01-03 20:57:01,068][126169] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12565.7). Total num frames: 25513984. Throughput: 0: 3089.2. Samples: 5370912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:57:01,069][126169] Avg episode reward: [(0, '4.647')] [2025-01-03 20:57:01,104][126248] Updated weights for policy 0, policy_version 6230 (0.0020) [2025-01-03 20:57:04,403][126248] Updated weights for policy 0, policy_version 6240 (0.0020) [2025-01-03 20:57:06,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12356.3, 300 sec: 12565.7). Total num frames: 25575424. Throughput: 0: 3085.3. Samples: 5389448. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:57:06,069][126169] Avg episode reward: [(0, '4.525')] [2025-01-03 20:57:07,774][126248] Updated weights for policy 0, policy_version 6250 (0.0020) [2025-01-03 20:57:10,162][126248] Updated weights for policy 0, policy_version 6260 (0.0012) [2025-01-03 20:57:11,068][126169] Fps is (10 sec: 13926.4, 60 sec: 12561.1, 300 sec: 12635.1). Total num frames: 25653248. Throughput: 0: 3085.0. Samples: 5398650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:57:11,069][126169] Avg episode reward: [(0, '4.742')] [2025-01-03 20:57:13,195][126248] Updated weights for policy 0, policy_version 6270 (0.0019) [2025-01-03 20:57:16,069][126169] Fps is (10 sec: 13926.2, 60 sec: 12561.0, 300 sec: 12649.0). Total num frames: 25714688. Throughput: 0: 3168.8. Samples: 5420882. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 20:57:16,069][126169] Avg episode reward: [(0, '4.774')] [2025-01-03 20:57:16,607][126248] Updated weights for policy 0, policy_version 6280 (0.0021) [2025-01-03 20:57:20,026][126248] Updated weights for policy 0, policy_version 6290 (0.0021) [2025-01-03 20:57:21,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12492.8, 300 sec: 12635.1). Total num frames: 25772032. Throughput: 0: 3152.2. Samples: 5438590. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:57:21,069][126169] Avg episode reward: [(0, '4.708')] [2025-01-03 20:57:23,815][126248] Updated weights for policy 0, policy_version 6300 (0.0024) [2025-01-03 20:57:26,068][126169] Fps is (10 sec: 11469.2, 60 sec: 12424.6, 300 sec: 12635.1). Total num frames: 25829376. Throughput: 0: 3130.9. Samples: 5446876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:26,069][126169] Avg episode reward: [(0, '4.672')] [2025-01-03 20:57:26,922][126248] Updated weights for policy 0, policy_version 6310 (0.0017) [2025-01-03 20:57:29,061][126248] Updated weights for policy 0, policy_version 6320 (0.0011) [2025-01-03 20:57:31,069][126169] Fps is (10 sec: 14336.0, 60 sec: 12834.1, 300 sec: 12718.4). Total num frames: 25915392. Throughput: 0: 3218.8. Samples: 5469570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:31,069][126169] Avg episode reward: [(0, '4.864')] [2025-01-03 20:57:31,891][126248] Updated weights for policy 0, policy_version 6330 (0.0018) [2025-01-03 20:57:35,299][126248] Updated weights for policy 0, policy_version 6340 (0.0020) [2025-01-03 20:57:36,069][126169] Fps is (10 sec: 14745.3, 60 sec: 12834.1, 300 sec: 12718.4). Total num frames: 25976832. Throughput: 0: 3237.6. Samples: 5489026. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:36,069][126169] Avg episode reward: [(0, '4.558')] [2025-01-03 20:57:38,746][126248] Updated weights for policy 0, policy_version 6350 (0.0020) [2025-01-03 20:57:41,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12765.9, 300 sec: 12704.5). Total num frames: 26034176. Throughput: 0: 3242.1. Samples: 5498158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:41,069][126169] Avg episode reward: [(0, '4.705')] [2025-01-03 20:57:42,431][126248] Updated weights for policy 0, policy_version 6360 (0.0022) [2025-01-03 20:57:45,745][126248] Updated weights for policy 0, policy_version 6370 (0.0020) [2025-01-03 20:57:46,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12697.6, 300 sec: 12676.8). Total num frames: 26091520. Throughput: 0: 3210.1. Samples: 5515368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:46,069][126169] Avg episode reward: [(0, '4.665')] [2025-01-03 20:57:49,225][126248] Updated weights for policy 0, policy_version 6380 (0.0021) [2025-01-03 20:57:51,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12697.6, 300 sec: 12690.7). Total num frames: 26152960. Throughput: 0: 3189.5. Samples: 5532976. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:57:51,069][126169] Avg episode reward: [(0, '4.733')] [2025-01-03 20:57:52,770][126248] Updated weights for policy 0, policy_version 6390 (0.0021) [2025-01-03 20:57:56,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12629.3, 300 sec: 12676.8). Total num frames: 26210304. Throughput: 0: 3179.7. Samples: 5541736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:57:56,069][126169] Avg episode reward: [(0, '4.942')] [2025-01-03 20:57:56,291][126248] Updated weights for policy 0, policy_version 6400 (0.0021) [2025-01-03 20:57:59,629][126248] Updated weights for policy 0, policy_version 6410 (0.0021) [2025-01-03 20:58:01,068][126169] Fps is (10 sec: 11878.6, 60 sec: 12629.3, 300 sec: 12676.8). 
Total num frames: 26271744. Throughput: 0: 3088.4. Samples: 5559860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:58:01,069][126169] Avg episode reward: [(0, '4.696')] [2025-01-03 20:58:03,054][126248] Updated weights for policy 0, policy_version 6420 (0.0021) [2025-01-03 20:58:06,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12629.3, 300 sec: 12676.8). Total num frames: 26333184. Throughput: 0: 3098.6. Samples: 5578028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:58:06,069][126169] Avg episode reward: [(0, '4.931')] [2025-01-03 20:58:06,369][126248] Updated weights for policy 0, policy_version 6430 (0.0021) [2025-01-03 20:58:08,950][126248] Updated weights for policy 0, policy_version 6440 (0.0014) [2025-01-03 20:58:11,068][126169] Fps is (10 sec: 14336.1, 60 sec: 12697.6, 300 sec: 12746.2). Total num frames: 26415104. Throughput: 0: 3139.4. Samples: 5588150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:11,069][126169] Avg episode reward: [(0, '4.869')] [2025-01-03 20:58:11,121][126248] Updated weights for policy 0, policy_version 6450 (0.0011) [2025-01-03 20:58:13,361][126248] Updated weights for policy 0, policy_version 6460 (0.0011) [2025-01-03 20:58:15,985][126248] Updated weights for policy 0, policy_version 6470 (0.0017) [2025-01-03 20:58:16,069][126169] Fps is (10 sec: 16793.6, 60 sec: 13107.2, 300 sec: 12843.4). Total num frames: 26501120. Throughput: 0: 3259.6. Samples: 5616254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:16,069][126169] Avg episode reward: [(0, '4.897')] [2025-01-03 20:58:19,683][126248] Updated weights for policy 0, policy_version 6480 (0.0023) [2025-01-03 20:58:21,069][126169] Fps is (10 sec: 13926.1, 60 sec: 13038.9, 300 sec: 12843.4). Total num frames: 26554368. Throughput: 0: 3223.9. Samples: 5634100. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:21,069][126169] Avg episode reward: [(0, '4.991')] [2025-01-03 20:58:23,155][126248] Updated weights for policy 0, policy_version 6490 (0.0020) [2025-01-03 20:58:26,069][126169] Fps is (10 sec: 11468.8, 60 sec: 13107.2, 300 sec: 12787.8). Total num frames: 26615808. Throughput: 0: 3220.3. Samples: 5643070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:26,069][126169] Avg episode reward: [(0, '5.014')] [2025-01-03 20:58:26,739][126248] Updated weights for policy 0, policy_version 6500 (0.0022) [2025-01-03 20:58:30,137][126248] Updated weights for policy 0, policy_version 6510 (0.0021) [2025-01-03 20:58:31,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12629.3, 300 sec: 12746.2). Total num frames: 26673152. Throughput: 0: 3231.1. Samples: 5660768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:31,069][126169] Avg episode reward: [(0, '4.678')] [2025-01-03 20:58:33,401][126248] Updated weights for policy 0, policy_version 6520 (0.0020) [2025-01-03 20:58:36,069][126169] Fps is (10 sec: 11878.0, 60 sec: 12629.3, 300 sec: 12746.2). Total num frames: 26734592. Throughput: 0: 3241.5. Samples: 5678846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:36,070][126169] Avg episode reward: [(0, '5.031')] [2025-01-03 20:58:36,878][126248] Updated weights for policy 0, policy_version 6530 (0.0021) [2025-01-03 20:58:40,295][126248] Updated weights for policy 0, policy_version 6540 (0.0021) [2025-01-03 20:58:41,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12697.6, 300 sec: 12746.2). Total num frames: 26796032. Throughput: 0: 3247.9. Samples: 5687892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:58:41,069][126169] Avg episode reward: [(0, '5.194')] [2025-01-03 20:58:43,671][126248] Updated weights for policy 0, policy_version 6550 (0.0020) [2025-01-03 20:58:46,069][126169] Fps is (10 sec: 11878.7, 60 sec: 12697.6, 300 sec: 12718.4). 
Total num frames: 26853376. Throughput: 0: 3243.7. Samples: 5705828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:58:46,069][126169] Avg episode reward: [(0, '4.907')] [2025-01-03 20:58:46,144][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006557_26857472.pth... [2025-01-03 20:58:46,209][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005818_23830528.pth [2025-01-03 20:58:47,247][126248] Updated weights for policy 0, policy_version 6560 (0.0021) [2025-01-03 20:58:50,578][126248] Updated weights for policy 0, policy_version 6570 (0.0019) [2025-01-03 20:58:51,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12697.6, 300 sec: 12718.4). Total num frames: 26914816. Throughput: 0: 3238.1. Samples: 5723742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:51,069][126169] Avg episode reward: [(0, '5.354')] [2025-01-03 20:58:54,005][126248] Updated weights for policy 0, policy_version 6580 (0.0021) [2025-01-03 20:58:56,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12765.9, 300 sec: 12718.4). Total num frames: 26976256. Throughput: 0: 3213.4. Samples: 5732754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:58:56,069][126169] Avg episode reward: [(0, '5.337')] [2025-01-03 20:58:57,451][126248] Updated weights for policy 0, policy_version 6590 (0.0020) [2025-01-03 20:59:00,760][126248] Updated weights for policy 0, policy_version 6600 (0.0020) [2025-01-03 20:59:01,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12697.6, 300 sec: 12718.4). Total num frames: 27033600. Throughput: 0: 2990.9. Samples: 5750846. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:59:01,069][126169] Avg episode reward: [(0, '5.265')] [2025-01-03 20:59:04,120][126248] Updated weights for policy 0, policy_version 6610 (0.0020) [2025-01-03 20:59:06,068][126169] Fps is (10 sec: 11878.6, 60 sec: 12697.6, 300 sec: 12704.5). Total num frames: 27095040. Throughput: 0: 3000.2. Samples: 5769110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:59:06,069][126169] Avg episode reward: [(0, '5.171')] [2025-01-03 20:59:07,388][126248] Updated weights for policy 0, policy_version 6620 (0.0020) [2025-01-03 20:59:10,630][126248] Updated weights for policy 0, policy_version 6630 (0.0019) [2025-01-03 20:59:11,069][126169] Fps is (10 sec: 12697.5, 60 sec: 12424.5, 300 sec: 12718.4). Total num frames: 27160576. Throughput: 0: 3016.4. Samples: 5778808. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:59:11,069][126169] Avg episode reward: [(0, '5.535')] [2025-01-03 20:59:13,921][126248] Updated weights for policy 0, policy_version 6640 (0.0020) [2025-01-03 20:59:16,069][126169] Fps is (10 sec: 12697.4, 60 sec: 12014.9, 300 sec: 12718.4). Total num frames: 27222016. Throughput: 0: 3037.6. Samples: 5797460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:59:16,069][126169] Avg episode reward: [(0, '5.666')] [2025-01-03 20:59:17,363][126248] Updated weights for policy 0, policy_version 6650 (0.0020) [2025-01-03 20:59:20,669][126248] Updated weights for policy 0, policy_version 6660 (0.0020) [2025-01-03 20:59:21,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12151.5, 300 sec: 12718.4). Total num frames: 27283456. Throughput: 0: 3042.4. Samples: 5815754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 20:59:21,069][126169] Avg episode reward: [(0, '5.019')] [2025-01-03 20:59:24,335][126248] Updated weights for policy 0, policy_version 6670 (0.0022) [2025-01-03 20:59:26,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12015.0, 300 sec: 12676.8). 
Total num frames: 27336704. Throughput: 0: 3027.6. Samples: 5824136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:59:26,069][126169] Avg episode reward: [(0, '4.917')] [2025-01-03 20:59:27,681][126248] Updated weights for policy 0, policy_version 6680 (0.0019) [2025-01-03 20:59:30,147][126248] Updated weights for policy 0, policy_version 6690 (0.0014) [2025-01-03 20:59:31,069][126169] Fps is (10 sec: 12697.7, 60 sec: 12288.0, 300 sec: 12704.5). Total num frames: 27410432. Throughput: 0: 3078.5. Samples: 5844362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:59:31,069][126169] Avg episode reward: [(0, '5.150')] [2025-01-03 20:59:33,553][126248] Updated weights for policy 0, policy_version 6700 (0.0021) [2025-01-03 20:59:36,069][126169] Fps is (10 sec: 13516.8, 60 sec: 12288.1, 300 sec: 12593.5). Total num frames: 27471872. Throughput: 0: 3095.0. Samples: 5863018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:59:36,069][126169] Avg episode reward: [(0, '6.117')] [2025-01-03 20:59:36,076][126222] Saving new best policy, reward=6.117! [2025-01-03 20:59:36,959][126248] Updated weights for policy 0, policy_version 6710 (0.0020) [2025-01-03 20:59:40,308][126248] Updated weights for policy 0, policy_version 6720 (0.0020) [2025-01-03 20:59:41,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12288.0, 300 sec: 12551.8). Total num frames: 27533312. Throughput: 0: 3093.5. Samples: 5871960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:59:41,069][126169] Avg episode reward: [(0, '5.792')] [2025-01-03 20:59:43,642][126248] Updated weights for policy 0, policy_version 6730 (0.0020) [2025-01-03 20:59:46,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12356.3, 300 sec: 12565.7). Total num frames: 27594752. Throughput: 0: 3100.9. Samples: 5890386. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 20:59:46,069][126169] Avg episode reward: [(0, '5.867')] [2025-01-03 20:59:47,022][126248] Updated weights for policy 0, policy_version 6740 (0.0021) [2025-01-03 20:59:50,322][126248] Updated weights for policy 0, policy_version 6750 (0.0020) [2025-01-03 20:59:51,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12356.3, 300 sec: 12579.6). Total num frames: 27656192. Throughput: 0: 3105.5. Samples: 5908856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:59:51,069][126169] Avg episode reward: [(0, '5.613')] [2025-01-03 20:59:53,594][126248] Updated weights for policy 0, policy_version 6760 (0.0020) [2025-01-03 20:59:56,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12356.3, 300 sec: 12593.5). Total num frames: 27717632. Throughput: 0: 3099.6. Samples: 5918288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 20:59:56,069][126169] Avg episode reward: [(0, '5.375')] [2025-01-03 20:59:56,981][126248] Updated weights for policy 0, policy_version 6770 (0.0020) [2025-01-03 21:00:00,309][126248] Updated weights for policy 0, policy_version 6780 (0.0020) [2025-01-03 21:00:01,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12424.5, 300 sec: 12593.5). Total num frames: 27779072. Throughput: 0: 3092.4. Samples: 5936618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:00:01,069][126169] Avg episode reward: [(0, '5.406')] [2025-01-03 21:00:03,458][126248] Updated weights for policy 0, policy_version 6790 (0.0017) [2025-01-03 21:00:05,641][126248] Updated weights for policy 0, policy_version 6800 (0.0011) [2025-01-03 21:00:06,068][126169] Fps is (10 sec: 13926.8, 60 sec: 12697.6, 300 sec: 12649.0). Total num frames: 27856896. Throughput: 0: 3166.4. Samples: 5958240. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:00:06,069][126169] Avg episode reward: [(0, '4.797')] [2025-01-03 21:00:07,801][126248] Updated weights for policy 0, policy_version 6810 (0.0011) [2025-01-03 21:00:09,915][126248] Updated weights for policy 0, policy_version 6820 (0.0010) [2025-01-03 21:00:11,068][126169] Fps is (10 sec: 17613.2, 60 sec: 13243.8, 300 sec: 12760.1). Total num frames: 27955200. Throughput: 0: 3296.7. Samples: 5972486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:00:11,069][126169] Avg episode reward: [(0, '4.826')] [2025-01-03 21:00:12,499][126248] Updated weights for policy 0, policy_version 6830 (0.0015) [2025-01-03 21:00:16,069][126169] Fps is (10 sec: 15564.2, 60 sec: 13175.4, 300 sec: 12760.1). Total num frames: 28012544. Throughput: 0: 3363.9. Samples: 5995736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-03 21:00:16,069][126169] Avg episode reward: [(0, '4.762')] [2025-01-03 21:00:16,369][126248] Updated weights for policy 0, policy_version 6840 (0.0024) [2025-01-03 21:00:20,551][126248] Updated weights for policy 0, policy_version 6850 (0.0024) [2025-01-03 21:00:21,069][126169] Fps is (10 sec: 10649.3, 60 sec: 12970.7, 300 sec: 12746.2). Total num frames: 28061696. Throughput: 0: 3282.7. Samples: 6010740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 21:00:21,069][126169] Avg episode reward: [(0, '4.817')] [2025-01-03 21:00:24,403][126248] Updated weights for policy 0, policy_version 6860 (0.0023) [2025-01-03 21:00:26,069][126169] Fps is (10 sec: 10239.9, 60 sec: 12970.6, 300 sec: 12718.4). Total num frames: 28114944. Throughput: 0: 3260.9. Samples: 6018700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 21:00:26,070][126169] Avg episode reward: [(0, '4.769')] [2025-01-03 21:00:28,217][126248] Updated weights for policy 0, policy_version 6870 (0.0023) [2025-01-03 21:00:31,069][126169] Fps is (10 sec: 10649.6, 60 sec: 12629.3, 300 sec: 12690.7). 
Total num frames: 28168192. Throughput: 0: 3204.2. Samples: 6034576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:00:31,069][126169] Avg episode reward: [(0, '4.530')] [2025-01-03 21:00:32,170][126248] Updated weights for policy 0, policy_version 6880 (0.0023) [2025-01-03 21:00:35,968][126248] Updated weights for policy 0, policy_version 6890 (0.0023) [2025-01-03 21:00:36,069][126169] Fps is (10 sec: 10649.8, 60 sec: 12492.8, 300 sec: 12662.9). Total num frames: 28221440. Throughput: 0: 3149.8. Samples: 6050600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:00:36,069][126169] Avg episode reward: [(0, '4.824')] [2025-01-03 21:00:38,496][126248] Updated weights for policy 0, policy_version 6900 (0.0013) [2025-01-03 21:00:40,770][126248] Updated weights for policy 0, policy_version 6910 (0.0012) [2025-01-03 21:00:41,068][126169] Fps is (10 sec: 13926.7, 60 sec: 12902.4, 300 sec: 12690.7). Total num frames: 28307456. Throughput: 0: 3182.5. Samples: 6061498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:00:41,069][126169] Avg episode reward: [(0, '4.608')] [2025-01-03 21:00:43,704][126248] Updated weights for policy 0, policy_version 6920 (0.0018) [2025-01-03 21:00:46,069][126169] Fps is (10 sec: 14336.0, 60 sec: 12834.1, 300 sec: 12565.7). Total num frames: 28364800. Throughput: 0: 3284.7. Samples: 6084432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 21:00:46,069][126169] Avg episode reward: [(0, '4.898')] [2025-01-03 21:00:46,081][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006925_28364800.pth... 
[2025-01-03 21:00:46,164][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006185_25333760.pth [2025-01-03 21:00:48,031][126248] Updated weights for policy 0, policy_version 6930 (0.0026) [2025-01-03 21:00:51,069][126169] Fps is (10 sec: 10649.4, 60 sec: 12629.3, 300 sec: 12524.0). Total num frames: 28413952. Throughput: 0: 3121.8. Samples: 6098722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-03 21:00:51,069][126169] Avg episode reward: [(0, '5.066')] [2025-01-03 21:00:52,141][126248] Updated weights for policy 0, policy_version 6940 (0.0022) [2025-01-03 21:00:55,942][126248] Updated weights for policy 0, policy_version 6950 (0.0023) [2025-01-03 21:00:56,069][126169] Fps is (10 sec: 10240.0, 60 sec: 12492.8, 300 sec: 12510.2). Total num frames: 28467200. Throughput: 0: 2979.2. Samples: 6106552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:00:56,069][126169] Avg episode reward: [(0, '5.392')] [2025-01-03 21:00:59,833][126248] Updated weights for policy 0, policy_version 6960 (0.0022) [2025-01-03 21:01:01,068][126169] Fps is (10 sec: 11469.0, 60 sec: 12492.8, 300 sec: 12524.0). Total num frames: 28528640. Throughput: 0: 2817.0. Samples: 6122498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-03 21:01:01,069][126169] Avg episode reward: [(0, '5.369')] [2025-01-03 21:01:02,196][126248] Updated weights for policy 0, policy_version 6970 (0.0012) [2025-01-03 21:01:04,469][126248] Updated weights for policy 0, policy_version 6980 (0.0012) [2025-01-03 21:01:06,069][126169] Fps is (10 sec: 14745.9, 60 sec: 12629.3, 300 sec: 12593.5). Total num frames: 28614656. Throughput: 0: 3047.1. Samples: 6147858. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:06,069][126169] Avg episode reward: [(0, '5.600')] [2025-01-03 21:01:07,105][126248] Updated weights for policy 0, policy_version 6990 (0.0015) [2025-01-03 21:01:11,069][126169] Fps is (10 sec: 13926.0, 60 sec: 11878.3, 300 sec: 12565.7). Total num frames: 28667904. Throughput: 0: 3094.3. Samples: 6157942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:11,070][126169] Avg episode reward: [(0, '5.470')] [2025-01-03 21:01:11,150][126248] Updated weights for policy 0, policy_version 7000 (0.0023) [2025-01-03 21:01:14,999][126248] Updated weights for policy 0, policy_version 7010 (0.0023) [2025-01-03 21:01:16,069][126169] Fps is (10 sec: 10649.5, 60 sec: 11810.2, 300 sec: 12537.9). Total num frames: 28721152. Throughput: 0: 3085.6. Samples: 6173430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:16,069][126169] Avg episode reward: [(0, '5.454')] [2025-01-03 21:01:18,860][126248] Updated weights for policy 0, policy_version 7020 (0.0022) [2025-01-03 21:01:21,069][126169] Fps is (10 sec: 10649.7, 60 sec: 11878.4, 300 sec: 12510.2). Total num frames: 28774400. Throughput: 0: 3077.4. Samples: 6189082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:21,069][126169] Avg episode reward: [(0, '5.623')] [2025-01-03 21:01:22,856][126248] Updated weights for policy 0, policy_version 7030 (0.0023) [2025-01-03 21:01:26,069][126169] Fps is (10 sec: 10649.6, 60 sec: 11878.4, 300 sec: 12482.4). Total num frames: 28827648. Throughput: 0: 3010.4. Samples: 6196966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:26,069][126169] Avg episode reward: [(0, '5.712')] [2025-01-03 21:01:26,677][126248] Updated weights for policy 0, policy_version 7040 (0.0022) [2025-01-03 21:01:30,518][126248] Updated weights for policy 0, policy_version 7050 (0.0023) [2025-01-03 21:01:31,069][126169] Fps is (10 sec: 10649.5, 60 sec: 11878.4, 300 sec: 12454.6). 
Total num frames: 28880896. Throughput: 0: 2861.9. Samples: 6213216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:01:31,069][126169] Avg episode reward: [(0, '5.559')] [2025-01-03 21:01:34,196][126248] Updated weights for policy 0, policy_version 7060 (0.0022) [2025-01-03 21:01:36,069][126169] Fps is (10 sec: 10649.6, 60 sec: 11878.4, 300 sec: 12426.8). Total num frames: 28934144. Throughput: 0: 2902.1. Samples: 6229316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:36,069][126169] Avg episode reward: [(0, '5.893')] [2025-01-03 21:01:37,909][126248] Updated weights for policy 0, policy_version 7070 (0.0022) [2025-01-03 21:01:41,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11400.5, 300 sec: 12413.0). Total num frames: 28991488. Throughput: 0: 2917.1. Samples: 6237820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:01:41,069][126169] Avg episode reward: [(0, '5.588')] [2025-01-03 21:01:41,631][126248] Updated weights for policy 0, policy_version 7080 (0.0022) [2025-01-03 21:01:44,373][126248] Updated weights for policy 0, policy_version 7090 (0.0014) [2025-01-03 21:01:46,068][126169] Fps is (10 sec: 13517.1, 60 sec: 11741.9, 300 sec: 12468.5). Total num frames: 29069312. Throughput: 0: 2981.3. Samples: 6256658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:01:46,069][126169] Avg episode reward: [(0, '5.271')] [2025-01-03 21:01:46,619][126248] Updated weights for policy 0, policy_version 7100 (0.0011) [2025-01-03 21:01:48,845][126248] Updated weights for policy 0, policy_version 7110 (0.0011) [2025-01-03 21:01:51,069][126169] Fps is (10 sec: 16793.9, 60 sec: 12424.6, 300 sec: 12565.7). Total num frames: 29159424. Throughput: 0: 3030.8. Samples: 6284244. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:01:51,069][126169] Avg episode reward: [(0, '4.808')] [2025-01-03 21:01:51,127][126248] Updated weights for policy 0, policy_version 7120 (0.0012) [2025-01-03 21:01:55,075][126248] Updated weights for policy 0, policy_version 7130 (0.0024) [2025-01-03 21:01:56,069][126169] Fps is (10 sec: 14335.8, 60 sec: 12424.6, 300 sec: 12537.9). Total num frames: 29212672. Throughput: 0: 3022.2. Samples: 6293942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:01:56,069][126169] Avg episode reward: [(0, '5.098')] [2025-01-03 21:01:58,709][126248] Updated weights for policy 0, policy_version 7140 (0.0021) [2025-01-03 21:02:01,069][126169] Fps is (10 sec: 11059.1, 60 sec: 12356.2, 300 sec: 12524.0). Total num frames: 29270016. Throughput: 0: 3038.0. Samples: 6310142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:02:01,069][126169] Avg episode reward: [(0, '4.728')] [2025-01-03 21:02:02,335][126248] Updated weights for policy 0, policy_version 7150 (0.0021) [2025-01-03 21:02:05,696][126248] Updated weights for policy 0, policy_version 7160 (0.0020) [2025-01-03 21:02:06,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11946.6, 300 sec: 12468.5). Total num frames: 29331456. Throughput: 0: 3080.4. Samples: 6327700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:02:06,069][126169] Avg episode reward: [(0, '4.724')] [2025-01-03 21:02:09,107][126248] Updated weights for policy 0, policy_version 7170 (0.0021) [2025-01-03 21:02:11,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12015.0, 300 sec: 12454.6). Total num frames: 29388800. Throughput: 0: 3107.7. Samples: 6336810. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:02:11,069][126169] Avg episode reward: [(0, '4.565')] [2025-01-03 21:02:12,647][126248] Updated weights for policy 0, policy_version 7180 (0.0021) [2025-01-03 21:02:15,942][126248] Updated weights for policy 0, policy_version 7190 (0.0020) [2025-01-03 21:02:16,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12151.5, 300 sec: 12468.5). Total num frames: 29450240. Throughput: 0: 3139.0. Samples: 6354470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:02:16,069][126169] Avg episode reward: [(0, '4.691')] [2025-01-03 21:02:19,272][126248] Updated weights for policy 0, policy_version 7200 (0.0020) [2025-01-03 21:02:21,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12288.0, 300 sec: 12482.4). Total num frames: 29511680. Throughput: 0: 3189.2. Samples: 6372830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:02:21,069][126169] Avg episode reward: [(0, '4.970')] [2025-01-03 21:02:22,719][126248] Updated weights for policy 0, policy_version 7210 (0.0021) [2025-01-03 21:02:26,069][126169] Fps is (10 sec: 11468.6, 60 sec: 12288.0, 300 sec: 12371.3). Total num frames: 29564928. Throughput: 0: 3195.7. Samples: 6381628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:02:26,070][126169] Avg episode reward: [(0, '4.642')] [2025-01-03 21:02:26,634][126248] Updated weights for policy 0, policy_version 7220 (0.0024) [2025-01-03 21:02:30,356][126248] Updated weights for policy 0, policy_version 7230 (0.0023) [2025-01-03 21:02:31,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12356.3, 300 sec: 12357.4). Total num frames: 29622272. Throughput: 0: 3134.8. Samples: 6397726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:02:31,069][126169] Avg episode reward: [(0, '4.923')] [2025-01-03 21:02:33,725][126248] Updated weights for policy 0, policy_version 7240 (0.0020) [2025-01-03 21:02:36,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12371.3). 
Total num frames: 29683712. Throughput: 0: 2915.3. Samples: 6415434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:02:36,069][126169] Avg episode reward: [(0, '4.666')] [2025-01-03 21:02:37,096][126248] Updated weights for policy 0, policy_version 7250 (0.0020) [2025-01-03 21:02:40,424][126248] Updated weights for policy 0, policy_version 7260 (0.0020) [2025-01-03 21:02:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12371.3). Total num frames: 29741056. Throughput: 0: 2902.8. Samples: 6424566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:02:41,069][126169] Avg episode reward: [(0, '4.817')] [2025-01-03 21:02:43,718][126248] Updated weights for policy 0, policy_version 7270 (0.0020) [2025-01-03 21:02:46,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12385.2). Total num frames: 29806592. Throughput: 0: 2958.6. Samples: 6443280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:02:46,069][126169] Avg episode reward: [(0, '4.966')] [2025-01-03 21:02:46,077][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007277_29806592.pth... [2025-01-03 21:02:46,138][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006557_26857472.pth [2025-01-03 21:02:47,140][126248] Updated weights for policy 0, policy_version 7280 (0.0020) [2025-01-03 21:02:50,399][126248] Updated weights for policy 0, policy_version 7290 (0.0021) [2025-01-03 21:02:51,069][126169] Fps is (10 sec: 12288.0, 60 sec: 11741.9, 300 sec: 12385.2). Total num frames: 29863936. Throughput: 0: 2979.2. Samples: 6461762. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:02:51,069][126169] Avg episode reward: [(0, '4.852')] [2025-01-03 21:02:53,703][126248] Updated weights for policy 0, policy_version 7300 (0.0020) [2025-01-03 21:02:56,069][126169] Fps is (10 sec: 12287.8, 60 sec: 11946.7, 300 sec: 12399.1). Total num frames: 29929472. Throughput: 0: 2982.4. Samples: 6471020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:02:56,069][126169] Avg episode reward: [(0, '4.441')] [2025-01-03 21:02:57,103][126248] Updated weights for policy 0, policy_version 7310 (0.0020) [2025-01-03 21:03:00,369][126248] Updated weights for policy 0, policy_version 7320 (0.0019) [2025-01-03 21:03:01,069][126169] Fps is (10 sec: 12697.7, 60 sec: 12015.0, 300 sec: 12399.1). Total num frames: 29990912. Throughput: 0: 2998.9. Samples: 6489420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:03:01,069][126169] Avg episode reward: [(0, '4.859')] [2025-01-03 21:03:03,707][126248] Updated weights for policy 0, policy_version 7330 (0.0020) [2025-01-03 21:03:06,069][126169] Fps is (10 sec: 11878.5, 60 sec: 11946.7, 300 sec: 12315.8). Total num frames: 30048256. Throughput: 0: 2997.2. Samples: 6507702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:03:06,069][126169] Avg episode reward: [(0, '4.802')] [2025-01-03 21:03:07,219][126248] Updated weights for policy 0, policy_version 7340 (0.0021) [2025-01-03 21:03:09,778][126248] Updated weights for policy 0, policy_version 7350 (0.0013) [2025-01-03 21:03:11,068][126169] Fps is (10 sec: 13926.5, 60 sec: 12356.3, 300 sec: 12301.9). Total num frames: 30130176. Throughput: 0: 2999.2. Samples: 6516592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:03:11,069][126169] Avg episode reward: [(0, '4.622')] [2025-01-03 21:03:11,884][126248] Updated weights for policy 0, policy_version 7360 (0.0010) [2025-01-03 21:03:13,966][126248] Updated weights for policy 0, policy_version 7370 (0.0010) [2025-01-03 21:03:16,069][126169] Fps is (10 sec: 17613.0, 60 sec: 12902.4, 300 sec: 12440.7). Total num frames: 30224384. Throughput: 0: 3278.7. Samples: 6545266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:03:16,069][126169] Avg episode reward: [(0, '4.948')] [2025-01-03 21:03:16,396][126248] Updated weights for policy 0, policy_version 7380 (0.0014) [2025-01-03 21:03:20,071][126248] Updated weights for policy 0, policy_version 7390 (0.0022) [2025-01-03 21:03:21,069][126169] Fps is (10 sec: 14745.3, 60 sec: 12765.9, 300 sec: 12413.0). Total num frames: 30277632. Throughput: 0: 3320.7. Samples: 6564866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:03:21,069][126169] Avg episode reward: [(0, '4.729')] [2025-01-03 21:03:23,560][126248] Updated weights for policy 0, policy_version 7400 (0.0020) [2025-01-03 21:03:26,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12902.4, 300 sec: 12426.8). Total num frames: 30339072. Throughput: 0: 3315.2. Samples: 6573752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:26,069][126169] Avg episode reward: [(0, '4.813')] [2025-01-03 21:03:27,115][126248] Updated weights for policy 0, policy_version 7410 (0.0021) [2025-01-03 21:03:30,494][126248] Updated weights for policy 0, policy_version 7420 (0.0020) [2025-01-03 21:03:31,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12902.4, 300 sec: 12413.0). Total num frames: 30396416. Throughput: 0: 3291.4. Samples: 6591392. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:31,069][126169] Avg episode reward: [(0, '4.962')] [2025-01-03 21:03:33,894][126248] Updated weights for policy 0, policy_version 7430 (0.0020) [2025-01-03 21:03:36,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12902.4, 300 sec: 12413.0). Total num frames: 30457856. Throughput: 0: 3278.1. Samples: 6609278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:36,069][126169] Avg episode reward: [(0, '4.863')] [2025-01-03 21:03:37,378][126248] Updated weights for policy 0, policy_version 7440 (0.0021) [2025-01-03 21:03:40,801][126248] Updated weights for policy 0, policy_version 7450 (0.0021) [2025-01-03 21:03:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12902.4, 300 sec: 12413.0). Total num frames: 30515200. Throughput: 0: 3271.5. Samples: 6618236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:41,069][126169] Avg episode reward: [(0, '5.122')] [2025-01-03 21:03:44,278][126248] Updated weights for policy 0, policy_version 7460 (0.0021) [2025-01-03 21:03:46,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12834.1, 300 sec: 12413.0). Total num frames: 30576640. Throughput: 0: 3259.1. Samples: 6636078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:46,069][126169] Avg episode reward: [(0, '4.949')] [2025-01-03 21:03:47,614][126248] Updated weights for policy 0, policy_version 7470 (0.0020) [2025-01-03 21:03:50,887][126248] Updated weights for policy 0, policy_version 7480 (0.0020) [2025-01-03 21:03:51,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12902.4, 300 sec: 12413.0). Total num frames: 30638080. Throughput: 0: 3267.0. Samples: 6654718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:51,069][126169] Avg episode reward: [(0, '5.227')] [2025-01-03 21:03:54,242][126248] Updated weights for policy 0, policy_version 7490 (0.0020) [2025-01-03 21:03:56,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 12426.8). 
Total num frames: 30699520. Throughput: 0: 3272.7. Samples: 6663866. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:03:56,069][126169] Avg episode reward: [(0, '5.179')] [2025-01-03 21:03:57,602][126248] Updated weights for policy 0, policy_version 7500 (0.0021) [2025-01-03 21:04:00,889][126248] Updated weights for policy 0, policy_version 7510 (0.0021) [2025-01-03 21:04:01,068][126169] Fps is (10 sec: 12288.2, 60 sec: 12834.1, 300 sec: 12426.8). Total num frames: 30760960. Throughput: 0: 3046.3. Samples: 6682350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:04:01,069][126169] Avg episode reward: [(0, '5.587')] [2025-01-03 21:04:04,194][126248] Updated weights for policy 0, policy_version 7520 (0.0020) [2025-01-03 21:04:06,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12902.4, 300 sec: 12413.0). Total num frames: 30822400. Throughput: 0: 3018.7. Samples: 6700706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:04:06,069][126169] Avg episode reward: [(0, '5.705')] [2025-01-03 21:04:07,491][126248] Updated weights for policy 0, policy_version 7530 (0.0020) [2025-01-03 21:04:10,760][126248] Updated weights for policy 0, policy_version 7540 (0.0019) [2025-01-03 21:04:11,068][126169] Fps is (10 sec: 12288.0, 60 sec: 12561.1, 300 sec: 12413.0). Total num frames: 30883840. Throughput: 0: 3031.9. Samples: 6710188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:04:11,069][126169] Avg episode reward: [(0, '5.432')] [2025-01-03 21:04:14,118][126248] Updated weights for policy 0, policy_version 7550 (0.0020) [2025-01-03 21:04:16,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12014.9, 300 sec: 12413.0). Total num frames: 30945280. Throughput: 0: 3052.8. Samples: 6728768. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:04:16,069][126169] Avg episode reward: [(0, '5.595')] [2025-01-03 21:04:17,478][126248] Updated weights for policy 0, policy_version 7560 (0.0020) [2025-01-03 21:04:20,966][126248] Updated weights for policy 0, policy_version 7570 (0.0022) [2025-01-03 21:04:21,069][126169] Fps is (10 sec: 12287.8, 60 sec: 12151.5, 300 sec: 12440.7). Total num frames: 31006720. Throughput: 0: 3057.9. Samples: 6746884. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:04:21,069][126169] Avg episode reward: [(0, '5.730')] [2025-01-03 21:04:25,558][126248] Updated weights for policy 0, policy_version 7580 (0.0024) [2025-01-03 21:04:26,069][126169] Fps is (10 sec: 10649.7, 60 sec: 11878.4, 300 sec: 12343.5). Total num frames: 31051776. Throughput: 0: 3040.2. Samples: 6755044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:04:26,069][126169] Avg episode reward: [(0, '5.531')] [2025-01-03 21:04:28,473][126248] Updated weights for policy 0, policy_version 7590 (0.0013) [2025-01-03 21:04:30,788][126248] Updated weights for policy 0, policy_version 7600 (0.0012) [2025-01-03 21:04:31,068][126169] Fps is (10 sec: 12697.9, 60 sec: 12288.0, 300 sec: 12413.0). Total num frames: 31133696. Throughput: 0: 3030.0. Samples: 6772428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:31,069][126169] Avg episode reward: [(0, '5.662')] [2025-01-03 21:04:32,989][126248] Updated weights for policy 0, policy_version 7610 (0.0011) [2025-01-03 21:04:35,139][126248] Updated weights for policy 0, policy_version 7620 (0.0010) [2025-01-03 21:04:36,068][126169] Fps is (10 sec: 17612.9, 60 sec: 12834.2, 300 sec: 12524.0). Total num frames: 31227904. Throughput: 0: 3238.3. Samples: 6800442. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:36,069][126169] Avg episode reward: [(0, '5.466')] [2025-01-03 21:04:37,383][126248] Updated weights for policy 0, policy_version 7630 (0.0011) [2025-01-03 21:04:41,069][126169] Fps is (10 sec: 15973.9, 60 sec: 12970.6, 300 sec: 12537.9). Total num frames: 31293440. Throughput: 0: 3321.0. Samples: 6813312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:41,070][126169] Avg episode reward: [(0, '5.807')] [2025-01-03 21:04:41,071][126248] Updated weights for policy 0, policy_version 7640 (0.0023) [2025-01-03 21:04:45,076][126248] Updated weights for policy 0, policy_version 7650 (0.0024) [2025-01-03 21:04:46,069][126169] Fps is (10 sec: 11468.4, 60 sec: 12765.8, 300 sec: 12496.3). Total num frames: 31342592. Throughput: 0: 3255.8. Samples: 6828860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:46,070][126169] Avg episode reward: [(0, '5.646')] [2025-01-03 21:04:46,079][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007652_31342592.pth... [2025-01-03 21:04:46,151][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006925_28364800.pth [2025-01-03 21:04:48,975][126248] Updated weights for policy 0, policy_version 7660 (0.0024) [2025-01-03 21:04:51,069][126169] Fps is (10 sec: 10240.1, 60 sec: 12629.3, 300 sec: 12468.5). Total num frames: 31395840. Throughput: 0: 3198.0. Samples: 6844616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:51,069][126169] Avg episode reward: [(0, '5.464')] [2025-01-03 21:04:52,848][126248] Updated weights for policy 0, policy_version 7670 (0.0023) [2025-01-03 21:04:56,069][126169] Fps is (10 sec: 10649.7, 60 sec: 12492.8, 300 sec: 12440.7). Total num frames: 31449088. Throughput: 0: 3161.3. Samples: 6852448. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:04:56,069][126169] Avg episode reward: [(0, '4.999')] [2025-01-03 21:04:56,464][126248] Updated weights for policy 0, policy_version 7680 (0.0022) [2025-01-03 21:04:59,860][126248] Updated weights for policy 0, policy_version 7690 (0.0020) [2025-01-03 21:05:01,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12492.8, 300 sec: 12385.2). Total num frames: 31510528. Throughput: 0: 3142.9. Samples: 6870200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:01,069][126169] Avg episode reward: [(0, '4.836')] [2025-01-03 21:05:03,639][126248] Updated weights for policy 0, policy_version 7700 (0.0024) [2025-01-03 21:05:06,069][126169] Fps is (10 sec: 11468.6, 60 sec: 12356.2, 300 sec: 12232.4). Total num frames: 31563776. Throughput: 0: 3091.9. Samples: 6886022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:06,070][126169] Avg episode reward: [(0, '4.775')] [2025-01-03 21:05:07,628][126248] Updated weights for policy 0, policy_version 7710 (0.0023) [2025-01-03 21:05:11,069][126169] Fps is (10 sec: 10649.6, 60 sec: 12219.7, 300 sec: 12218.6). Total num frames: 31617024. Throughput: 0: 3088.4. Samples: 6894022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:11,069][126169] Avg episode reward: [(0, '4.804')] [2025-01-03 21:05:11,392][126248] Updated weights for policy 0, policy_version 7720 (0.0024) [2025-01-03 21:05:15,092][126248] Updated weights for policy 0, policy_version 7730 (0.0023) [2025-01-03 21:05:16,069][126169] Fps is (10 sec: 10649.8, 60 sec: 12083.2, 300 sec: 12232.5). Total num frames: 31670272. Throughput: 0: 3072.6. Samples: 6910696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:16,069][126169] Avg episode reward: [(0, '4.670')] [2025-01-03 21:05:19,163][126248] Updated weights for policy 0, policy_version 7740 (0.0024) [2025-01-03 21:05:21,069][126169] Fps is (10 sec: 10239.9, 60 sec: 11878.4, 300 sec: 12218.6). 
Total num frames: 31719424. Throughput: 0: 2775.5. Samples: 6925342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:21,069][126169] Avg episode reward: [(0, '4.442')] [2025-01-03 21:05:23,294][126248] Updated weights for policy 0, policy_version 7750 (0.0023) [2025-01-03 21:05:25,678][126248] Updated weights for policy 0, policy_version 7760 (0.0012) [2025-01-03 21:05:26,068][126169] Fps is (10 sec: 11878.7, 60 sec: 12288.0, 300 sec: 12274.1). Total num frames: 31789056. Throughput: 0: 2661.4. Samples: 6933074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:26,069][126169] Avg episode reward: [(0, '4.730')] [2025-01-03 21:05:28,918][126248] Updated weights for policy 0, policy_version 7770 (0.0020) [2025-01-03 21:05:31,069][126169] Fps is (10 sec: 12697.9, 60 sec: 11878.4, 300 sec: 12288.0). Total num frames: 31846400. Throughput: 0: 2793.5. Samples: 6954568. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:05:31,069][126169] Avg episode reward: [(0, '4.788')] [2025-01-03 21:05:33,152][126248] Updated weights for policy 0, policy_version 7780 (0.0024) [2025-01-03 21:05:35,876][126248] Updated weights for policy 0, policy_version 7790 (0.0014) [2025-01-03 21:05:36,069][126169] Fps is (10 sec: 11878.3, 60 sec: 11332.3, 300 sec: 12204.7). Total num frames: 31907840. Throughput: 0: 2815.8. Samples: 6971328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:05:36,069][126169] Avg episode reward: [(0, '4.688')] [2025-01-03 21:05:39,245][126248] Updated weights for policy 0, policy_version 7800 (0.0021) [2025-01-03 21:05:41,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11195.8, 300 sec: 12204.7). Total num frames: 31965184. Throughput: 0: 2873.6. Samples: 6981760. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:41,069][126169] Avg episode reward: [(0, '4.631')] [2025-01-03 21:05:43,051][126248] Updated weights for policy 0, policy_version 7810 (0.0022) [2025-01-03 21:05:46,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11264.0, 300 sec: 12218.6). Total num frames: 32018432. Throughput: 0: 2841.5. Samples: 6998066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:05:46,069][126169] Avg episode reward: [(0, '4.710')] [2025-01-03 21:05:46,868][126248] Updated weights for policy 0, policy_version 7820 (0.0023) [2025-01-03 21:05:50,805][126248] Updated weights for policy 0, policy_version 7830 (0.0023) [2025-01-03 21:05:51,068][126169] Fps is (10 sec: 11059.5, 60 sec: 11332.3, 300 sec: 12232.5). Total num frames: 32075776. Throughput: 0: 2835.2. Samples: 7013606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:05:51,069][126169] Avg episode reward: [(0, '4.766')] [2025-01-03 21:05:53,057][126248] Updated weights for policy 0, policy_version 7840 (0.0012) [2025-01-03 21:05:55,310][126248] Updated weights for policy 0, policy_version 7850 (0.0011) [2025-01-03 21:05:56,068][126169] Fps is (10 sec: 14745.8, 60 sec: 11946.7, 300 sec: 12329.7). Total num frames: 32165888. Throughput: 0: 2937.8. Samples: 7026224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:05:56,069][126169] Avg episode reward: [(0, '4.941')] [2025-01-03 21:05:57,577][126248] Updated weights for policy 0, policy_version 7860 (0.0011) [2025-01-03 21:05:59,792][126248] Updated weights for policy 0, policy_version 7870 (0.0011) [2025-01-03 21:06:01,069][126169] Fps is (10 sec: 18022.0, 60 sec: 12424.5, 300 sec: 12343.5). Total num frames: 32256000. Throughput: 0: 3175.6. Samples: 7053596. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:06:01,069][126169] Avg episode reward: [(0, '4.852')] [2025-01-03 21:06:02,689][126248] Updated weights for policy 0, policy_version 7880 (0.0018) [2025-01-03 21:06:06,069][126169] Fps is (10 sec: 13926.0, 60 sec: 12356.3, 300 sec: 12329.7). Total num frames: 32305152. Throughput: 0: 3272.1. Samples: 7072586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:06:06,069][126169] Avg episode reward: [(0, '5.006')] [2025-01-03 21:06:07,044][126248] Updated weights for policy 0, policy_version 7890 (0.0027) [2025-01-03 21:06:11,004][126248] Updated weights for policy 0, policy_version 7900 (0.0025) [2025-01-03 21:06:11,069][126169] Fps is (10 sec: 10239.8, 60 sec: 12356.3, 300 sec: 12329.7). Total num frames: 32358400. Throughput: 0: 3259.3. Samples: 7079744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:06:11,069][126169] Avg episode reward: [(0, '4.780')] [2025-01-03 21:06:14,912][126248] Updated weights for policy 0, policy_version 7910 (0.0023) [2025-01-03 21:06:16,069][126169] Fps is (10 sec: 10240.0, 60 sec: 12288.0, 300 sec: 12315.8). Total num frames: 32407552. Throughput: 0: 3129.1. Samples: 7095378. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:06:16,069][126169] Avg episode reward: [(0, '5.229')] [2025-01-03 21:06:18,710][126248] Updated weights for policy 0, policy_version 7920 (0.0023) [2025-01-03 21:06:21,069][126169] Fps is (10 sec: 10649.7, 60 sec: 12424.6, 300 sec: 12329.7). Total num frames: 32464896. Throughput: 0: 3112.0. Samples: 7111370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:06:21,069][126169] Avg episode reward: [(0, '5.025')] [2025-01-03 21:06:22,572][126248] Updated weights for policy 0, policy_version 7930 (0.0024) [2025-01-03 21:06:26,069][126169] Fps is (10 sec: 10649.6, 60 sec: 12083.2, 300 sec: 12315.8). Total num frames: 32514048. Throughput: 0: 3056.5. Samples: 7119304. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:06:26,069][126169] Avg episode reward: [(0, '4.960')] [2025-01-03 21:06:27,033][126248] Updated weights for policy 0, policy_version 7940 (0.0026) [2025-01-03 21:06:31,069][126169] Fps is (10 sec: 9420.8, 60 sec: 11878.4, 300 sec: 12288.0). Total num frames: 32559104. Throughput: 0: 3001.1. Samples: 7133114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:06:31,069][126169] Avg episode reward: [(0, '5.078')] [2025-01-03 21:06:31,254][126248] Updated weights for policy 0, policy_version 7950 (0.0024) [2025-01-03 21:06:34,116][126248] Updated weights for policy 0, policy_version 7960 (0.0015) [2025-01-03 21:06:36,068][126169] Fps is (10 sec: 12288.3, 60 sec: 12151.5, 300 sec: 12357.4). Total num frames: 32636928. Throughput: 0: 3105.5. Samples: 7153354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:06:36,069][126169] Avg episode reward: [(0, '5.158')] [2025-01-03 21:06:36,385][126248] Updated weights for policy 0, policy_version 7970 (0.0011) [2025-01-03 21:06:38,655][126248] Updated weights for policy 0, policy_version 7980 (0.0012) [2025-01-03 21:06:40,902][126248] Updated weights for policy 0, policy_version 7990 (0.0011) [2025-01-03 21:06:41,068][126169] Fps is (10 sec: 16794.0, 60 sec: 12697.6, 300 sec: 12399.1). Total num frames: 32727040. Throughput: 0: 3124.1. Samples: 7166810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:06:41,069][126169] Avg episode reward: [(0, '4.894')] [2025-01-03 21:06:44,536][126248] Updated weights for policy 0, policy_version 8000 (0.0023) [2025-01-03 21:06:46,069][126169] Fps is (10 sec: 14335.5, 60 sec: 12697.5, 300 sec: 12274.1). Total num frames: 32780288. Throughput: 0: 2998.6. Samples: 7188532. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:06:46,070][126169] Avg episode reward: [(0, '5.029')] [2025-01-03 21:06:46,080][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008003_32780288.pth... [2025-01-03 21:06:46,161][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007277_29806592.pth [2025-01-03 21:06:49,103][126248] Updated weights for policy 0, policy_version 8010 (0.0028) [2025-01-03 21:06:51,069][126169] Fps is (10 sec: 9830.3, 60 sec: 12492.7, 300 sec: 12246.3). Total num frames: 32825344. Throughput: 0: 2874.2. Samples: 7201926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:06:51,069][126169] Avg episode reward: [(0, '5.100')] [2025-01-03 21:06:53,252][126248] Updated weights for policy 0, policy_version 8020 (0.0024) [2025-01-03 21:06:56,069][126169] Fps is (10 sec: 9830.6, 60 sec: 11878.4, 300 sec: 12232.5). Total num frames: 32878592. Throughput: 0: 2889.4. Samples: 7209768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:06:56,069][126169] Avg episode reward: [(0, '5.300')] [2025-01-03 21:06:57,275][126248] Updated weights for policy 0, policy_version 8030 (0.0024) [2025-01-03 21:07:01,002][126248] Updated weights for policy 0, policy_version 8040 (0.0023) [2025-01-03 21:07:01,069][126169] Fps is (10 sec: 10649.7, 60 sec: 11264.0, 300 sec: 12204.7). Total num frames: 32931840. Throughput: 0: 2892.3. Samples: 7225532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:01,069][126169] Avg episode reward: [(0, '4.752')] [2025-01-03 21:07:04,792][126248] Updated weights for policy 0, policy_version 8050 (0.0022) [2025-01-03 21:07:06,069][126169] Fps is (10 sec: 10649.6, 60 sec: 11332.3, 300 sec: 12190.8). Total num frames: 32985088. Throughput: 0: 2893.8. Samples: 7241592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:06,069][126169] Avg episode reward: [(0, '4.850')] [2025-01-03 21:07:08,649][126248] Updated weights for policy 0, policy_version 8060 (0.0024) [2025-01-03 21:07:11,069][126169] Fps is (10 sec: 10239.9, 60 sec: 11264.0, 300 sec: 12149.1). Total num frames: 33034240. Throughput: 0: 2897.5. Samples: 7249690. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:07:11,070][126169] Avg episode reward: [(0, '4.776')] [2025-01-03 21:07:12,181][126248] Updated weights for policy 0, policy_version 8070 (0.0019) [2025-01-03 21:07:14,428][126248] Updated weights for policy 0, policy_version 8080 (0.0012) [2025-01-03 21:07:16,068][126169] Fps is (10 sec: 13926.7, 60 sec: 11946.7, 300 sec: 12246.4). Total num frames: 33124352. Throughput: 0: 3048.6. Samples: 7270300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:16,069][126169] Avg episode reward: [(0, '4.942')] [2025-01-03 21:07:16,629][126248] Updated weights for policy 0, policy_version 8090 (0.0011) [2025-01-03 21:07:18,929][126248] Updated weights for policy 0, policy_version 8100 (0.0011) [2025-01-03 21:07:21,068][126169] Fps is (10 sec: 18023.0, 60 sec: 12492.9, 300 sec: 12371.3). Total num frames: 33214464. Throughput: 0: 3205.2. Samples: 7297586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:21,069][126169] Avg episode reward: [(0, '5.199')] [2025-01-03 21:07:21,138][126248] Updated weights for policy 0, policy_version 8110 (0.0011) [2025-01-03 21:07:24,609][126248] Updated weights for policy 0, policy_version 8120 (0.0020) [2025-01-03 21:07:26,069][126169] Fps is (10 sec: 14745.1, 60 sec: 12629.3, 300 sec: 12371.3). Total num frames: 33271808. Throughput: 0: 3145.6. Samples: 7308362. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:26,069][126169] Avg episode reward: [(0, '5.121')] [2025-01-03 21:07:28,261][126248] Updated weights for policy 0, policy_version 8130 (0.0023) [2025-01-03 21:07:31,069][126169] Fps is (10 sec: 11468.5, 60 sec: 12834.2, 300 sec: 12357.4). Total num frames: 33329152. Throughput: 0: 3030.0. Samples: 7324882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:07:31,069][126169] Avg episode reward: [(0, '5.309')] [2025-01-03 21:07:32,131][126248] Updated weights for policy 0, policy_version 8140 (0.0024) [2025-01-03 21:07:35,763][126248] Updated weights for policy 0, policy_version 8150 (0.0022) [2025-01-03 21:07:36,069][126169] Fps is (10 sec: 11059.4, 60 sec: 12424.5, 300 sec: 12343.5). Total num frames: 33382400. Throughput: 0: 3098.9. Samples: 7341376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:07:36,069][126169] Avg episode reward: [(0, '5.148')] [2025-01-03 21:07:39,301][126248] Updated weights for policy 0, policy_version 8160 (0.0022) [2025-01-03 21:07:41,069][126169] Fps is (10 sec: 11468.9, 60 sec: 11946.7, 300 sec: 12329.7). Total num frames: 33443840. Throughput: 0: 3115.1. Samples: 7349946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:07:41,069][126169] Avg episode reward: [(0, '5.515')] [2025-01-03 21:07:42,703][126248] Updated weights for policy 0, policy_version 8170 (0.0020) [2025-01-03 21:07:46,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12015.0, 300 sec: 12329.6). Total num frames: 33501184. Throughput: 0: 3165.9. Samples: 7367996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:07:46,069][126169] Avg episode reward: [(0, '5.572')] [2025-01-03 21:07:46,101][126248] Updated weights for policy 0, policy_version 8180 (0.0020) [2025-01-03 21:07:49,753][126248] Updated weights for policy 0, policy_version 8190 (0.0022) [2025-01-03 21:07:51,069][126169] Fps is (10 sec: 11468.6, 60 sec: 12219.7, 300 sec: 12301.9). 
Total num frames: 33558528. Throughput: 0: 3185.2. Samples: 7384924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:07:51,069][126169] Avg episode reward: [(0, '5.445')] [2025-01-03 21:07:53,559][126248] Updated weights for policy 0, policy_version 8200 (0.0023) [2025-01-03 21:07:56,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12219.7, 300 sec: 12274.1). Total num frames: 33611776. Throughput: 0: 3187.8. Samples: 7393140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:07:56,069][126169] Avg episode reward: [(0, '5.608')] [2025-01-03 21:07:57,556][126248] Updated weights for policy 0, policy_version 8210 (0.0023) [2025-01-03 21:08:01,056][126248] Updated weights for policy 0, policy_version 8220 (0.0021) [2025-01-03 21:08:01,069][126169] Fps is (10 sec: 11059.0, 60 sec: 12287.9, 300 sec: 12274.1). Total num frames: 33669120. Throughput: 0: 3088.1. Samples: 7409264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:08:01,070][126169] Avg episode reward: [(0, '5.935')] [2025-01-03 21:08:04,921][126248] Updated weights for policy 0, policy_version 8230 (0.0023) [2025-01-03 21:08:06,069][126169] Fps is (10 sec: 11059.2, 60 sec: 12288.0, 300 sec: 12176.9). Total num frames: 33722368. Throughput: 0: 2842.3. Samples: 7425492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:08:06,069][126169] Avg episode reward: [(0, '5.505')] [2025-01-03 21:08:08,445][126248] Updated weights for policy 0, policy_version 8240 (0.0022) [2025-01-03 21:08:11,069][126169] Fps is (10 sec: 11059.5, 60 sec: 12424.5, 300 sec: 12052.0). Total num frames: 33779712. Throughput: 0: 2798.5. Samples: 7434292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:08:11,069][126169] Avg episode reward: [(0, '5.684')] [2025-01-03 21:08:12,365][126248] Updated weights for policy 0, policy_version 8250 (0.0023) [2025-01-03 21:08:15,866][126248] Updated weights for policy 0, policy_version 8260 (0.0022) [2025-01-03 21:08:16,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11810.1, 300 sec: 12052.0). Total num frames: 33832960. Throughput: 0: 2793.7. Samples: 7450598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:08:16,069][126169] Avg episode reward: [(0, '4.957')] [2025-01-03 21:08:18,673][126248] Updated weights for policy 0, policy_version 8270 (0.0016) [2025-01-03 21:08:20,793][126248] Updated weights for policy 0, policy_version 8280 (0.0010) [2025-01-03 21:08:21,068][126169] Fps is (10 sec: 13926.6, 60 sec: 11741.8, 300 sec: 12135.3). Total num frames: 33918976. Throughput: 0: 2929.0. Samples: 7473180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:08:21,069][126169] Avg episode reward: [(0, '4.951')] [2025-01-03 21:08:22,960][126248] Updated weights for policy 0, policy_version 8290 (0.0011) [2025-01-03 21:08:25,600][126248] Updated weights for policy 0, policy_version 8300 (0.0016) [2025-01-03 21:08:26,069][126169] Fps is (10 sec: 16793.4, 60 sec: 12151.5, 300 sec: 12218.6). Total num frames: 34000896. Throughput: 0: 3056.2. Samples: 7487474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:08:26,070][126169] Avg episode reward: [(0, '4.957')] [2025-01-03 21:08:30,416][126248] Updated weights for policy 0, policy_version 8310 (0.0030) [2025-01-03 21:08:31,069][126169] Fps is (10 sec: 12287.7, 60 sec: 11878.4, 300 sec: 12149.2). Total num frames: 34041856. Throughput: 0: 3028.6. Samples: 7504284. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:08:31,070][126169] Avg episode reward: [(0, '4.659')] [2025-01-03 21:08:35,115][126248] Updated weights for policy 0, policy_version 8320 (0.0027) [2025-01-03 21:08:36,068][126169] Fps is (10 sec: 8601.9, 60 sec: 11741.9, 300 sec: 12107.5). Total num frames: 34086912. Throughput: 0: 2940.5. Samples: 7517246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:08:36,069][126169] Avg episode reward: [(0, '4.751')] [2025-01-03 21:08:37,719][126248] Updated weights for policy 0, policy_version 8330 (0.0013) [2025-01-03 21:08:39,946][126248] Updated weights for policy 0, policy_version 8340 (0.0011) [2025-01-03 21:08:41,068][126169] Fps is (10 sec: 13926.8, 60 sec: 12288.0, 300 sec: 12218.6). Total num frames: 34181120. Throughput: 0: 3028.5. Samples: 7529420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:08:41,069][126169] Avg episode reward: [(0, '4.850')] [2025-01-03 21:08:42,113][126248] Updated weights for policy 0, policy_version 8350 (0.0011) [2025-01-03 21:08:44,249][126248] Updated weights for policy 0, policy_version 8360 (0.0011) [2025-01-03 21:08:46,069][126169] Fps is (10 sec: 18841.4, 60 sec: 12902.4, 300 sec: 12329.7). Total num frames: 34275328. Throughput: 0: 3301.9. Samples: 7557850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:08:46,069][126169] Avg episode reward: [(0, '4.428')] [2025-01-03 21:08:46,074][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008368_34275328.pth... 
[2025-01-03 21:08:46,120][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007652_31342592.pth [2025-01-03 21:08:46,535][126248] Updated weights for policy 0, policy_version 8370 (0.0011) [2025-01-03 21:08:50,215][126248] Updated weights for policy 0, policy_version 8380 (0.0022) [2025-01-03 21:08:51,069][126169] Fps is (10 sec: 15153.7, 60 sec: 12902.2, 300 sec: 12315.7). Total num frames: 34332672. Throughput: 0: 3400.4. Samples: 7578514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:08:51,070][126169] Avg episode reward: [(0, '4.533')] [2025-01-03 21:08:54,353][126248] Updated weights for policy 0, policy_version 8390 (0.0024) [2025-01-03 21:08:56,069][126169] Fps is (10 sec: 10649.5, 60 sec: 12834.1, 300 sec: 12274.1). Total num frames: 34381824. Throughput: 0: 3365.7. Samples: 7585748. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:08:56,069][126169] Avg episode reward: [(0, '4.689')] [2025-01-03 21:08:57,899][126248] Updated weights for policy 0, policy_version 8400 (0.0021) [2025-01-03 21:09:01,069][126169] Fps is (10 sec: 9831.1, 60 sec: 12697.6, 300 sec: 12232.5). Total num frames: 34430976. Throughput: 0: 3374.7. Samples: 7602458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:09:01,069][126169] Avg episode reward: [(0, '4.659')] [2025-01-03 21:09:02,648][126248] Updated weights for policy 0, policy_version 8410 (0.0026) [2025-01-03 21:09:06,069][126169] Fps is (10 sec: 9830.1, 60 sec: 12629.3, 300 sec: 12190.8). Total num frames: 34480128. Throughput: 0: 3176.6. Samples: 7616130. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:09:06,070][126169] Avg episode reward: [(0, '4.891')] [2025-01-03 21:09:06,518][126248] Updated weights for policy 0, policy_version 8420 (0.0022) [2025-01-03 21:09:10,254][126248] Updated weights for policy 0, policy_version 8430 (0.0023) [2025-01-03 21:09:11,069][126169] Fps is (10 sec: 10649.7, 60 sec: 12629.3, 300 sec: 12176.9). Total num frames: 34537472. Throughput: 0: 3040.1. Samples: 7624278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:09:11,069][126169] Avg episode reward: [(0, '4.917')] [2025-01-03 21:09:13,858][126248] Updated weights for policy 0, policy_version 8440 (0.0022) [2025-01-03 21:09:16,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12629.3, 300 sec: 12149.1). Total num frames: 34590720. Throughput: 0: 3033.2. Samples: 7640780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:09:16,070][126169] Avg episode reward: [(0, '4.943')] [2025-01-03 21:09:17,597][126248] Updated weights for policy 0, policy_version 8450 (0.0022) [2025-01-03 21:09:21,069][126169] Fps is (10 sec: 11059.3, 60 sec: 12151.4, 300 sec: 12190.8). Total num frames: 34648064. Throughput: 0: 3116.7. Samples: 7657498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:09:21,069][126169] Avg episode reward: [(0, '5.189')] [2025-01-03 21:09:21,145][126248] Updated weights for policy 0, policy_version 8460 (0.0021) [2025-01-03 21:09:24,585][126248] Updated weights for policy 0, policy_version 8470 (0.0020) [2025-01-03 21:09:26,069][126169] Fps is (10 sec: 11878.6, 60 sec: 11810.2, 300 sec: 12121.4). Total num frames: 34709504. Throughput: 0: 3043.7. Samples: 7666386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:09:26,069][126169] Avg episode reward: [(0, '5.645')] [2025-01-03 21:09:27,966][126248] Updated weights for policy 0, policy_version 8480 (0.0020) [2025-01-03 21:09:31,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12083.2, 300 sec: 11996.4). 
Total num frames: 34766848. Throughput: 0: 2804.3. Samples: 7684044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:09:31,069][126169] Avg episode reward: [(0, '4.897')] [2025-01-03 21:09:31,662][126248] Updated weights for policy 0, policy_version 8490 (0.0023) [2025-01-03 21:09:35,025][126248] Updated weights for policy 0, policy_version 8500 (0.0021) [2025-01-03 21:09:36,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12288.0, 300 sec: 11968.7). Total num frames: 34824192. Throughput: 0: 2733.7. Samples: 7701530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:09:36,069][126169] Avg episode reward: [(0, '4.822')] [2025-01-03 21:09:38,483][126248] Updated weights for policy 0, policy_version 8510 (0.0021) [2025-01-03 21:09:41,069][126169] Fps is (10 sec: 11878.3, 60 sec: 11741.8, 300 sec: 12010.3). Total num frames: 34885632. Throughput: 0: 2771.3. Samples: 7710456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:09:41,069][126169] Avg episode reward: [(0, '4.736')] [2025-01-03 21:09:42,070][126248] Updated weights for policy 0, policy_version 8520 (0.0022) [2025-01-03 21:09:45,508][126248] Updated weights for policy 0, policy_version 8530 (0.0021) [2025-01-03 21:09:46,069][126169] Fps is (10 sec: 11878.4, 60 sec: 11127.4, 300 sec: 12024.2). Total num frames: 34942976. Throughput: 0: 2785.4. Samples: 7727800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 21:09:46,069][126169] Avg episode reward: [(0, '5.002')] [2025-01-03 21:09:48,725][126248] Updated weights for policy 0, policy_version 8540 (0.0019) [2025-01-03 21:09:50,875][126248] Updated weights for policy 0, policy_version 8550 (0.0010) [2025-01-03 21:09:51,068][126169] Fps is (10 sec: 13517.1, 60 sec: 11469.0, 300 sec: 12107.5). Total num frames: 35020800. Throughput: 0: 2953.0. Samples: 7749012. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:09:51,069][126169] Avg episode reward: [(0, '5.334')] [2025-01-03 21:09:53,540][126248] Updated weights for policy 0, policy_version 8560 (0.0015) [2025-01-03 21:09:56,069][126169] Fps is (10 sec: 13926.2, 60 sec: 11673.6, 300 sec: 12107.5). Total num frames: 35082240. Throughput: 0: 3050.4. Samples: 7761548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 21:09:56,070][126169] Avg episode reward: [(0, '5.442')] [2025-01-03 21:09:57,969][126248] Updated weights for policy 0, policy_version 8570 (0.0025) [2025-01-03 21:10:01,069][126169] Fps is (10 sec: 11468.7, 60 sec: 11741.9, 300 sec: 12107.5). Total num frames: 35135488. Throughput: 0: 3010.5. Samples: 7776250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-03 21:10:01,069][126169] Avg episode reward: [(0, '4.987')] [2025-01-03 21:10:01,459][126248] Updated weights for policy 0, policy_version 8580 (0.0019) [2025-01-03 21:10:04,409][126248] Updated weights for policy 0, policy_version 8590 (0.0017) [2025-01-03 21:10:06,069][126169] Fps is (10 sec: 11878.8, 60 sec: 12015.0, 300 sec: 12149.2). Total num frames: 35201024. Throughput: 0: 3065.3. Samples: 7795436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:10:06,069][126169] Avg episode reward: [(0, '5.684')] [2025-01-03 21:10:08,103][126248] Updated weights for policy 0, policy_version 8600 (0.0023) [2025-01-03 21:10:11,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12163.0). Total num frames: 35258368. Throughput: 0: 3055.5. Samples: 7803882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:10:11,069][126169] Avg episode reward: [(0, '5.558')] [2025-01-03 21:10:11,769][126248] Updated weights for policy 0, policy_version 8610 (0.0022) [2025-01-03 21:10:15,186][126248] Updated weights for policy 0, policy_version 8620 (0.0019) [2025-01-03 21:10:16,068][126169] Fps is (10 sec: 12288.1, 60 sec: 12219.8, 300 sec: 12218.6). 
Total num frames: 35323904. Throughput: 0: 3038.1. Samples: 7820756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:10:16,069][126169] Avg episode reward: [(0, '5.406')] [2025-01-03 21:10:17,341][126248] Updated weights for policy 0, policy_version 8630 (0.0011) [2025-01-03 21:10:19,561][126248] Updated weights for policy 0, policy_version 8640 (0.0011) [2025-01-03 21:10:21,068][126169] Fps is (10 sec: 15565.1, 60 sec: 12765.9, 300 sec: 12288.0). Total num frames: 35414016. Throughput: 0: 3239.7. Samples: 7847316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:10:21,069][126169] Avg episode reward: [(0, '5.978')] [2025-01-03 21:10:21,969][126248] Updated weights for policy 0, policy_version 8650 (0.0012) [2025-01-03 21:10:26,069][126169] Fps is (10 sec: 14335.8, 60 sec: 12629.3, 300 sec: 12274.1). Total num frames: 35467264. Throughput: 0: 3291.8. Samples: 7858588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:10:26,069][126169] Avg episode reward: [(0, '5.550')] [2025-01-03 21:10:26,331][126248] Updated weights for policy 0, policy_version 8660 (0.0026) [2025-01-03 21:10:31,069][126169] Fps is (10 sec: 9010.5, 60 sec: 12287.9, 300 sec: 12190.8). Total num frames: 35504128. Throughput: 0: 3187.0. Samples: 7871218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:10:31,070][126169] Avg episode reward: [(0, '6.070')] [2025-01-03 21:10:31,844][126248] Updated weights for policy 0, policy_version 8670 (0.0030) [2025-01-03 21:10:34,685][126248] Updated weights for policy 0, policy_version 8680 (0.0012) [2025-01-03 21:10:36,068][126169] Fps is (10 sec: 11059.3, 60 sec: 12561.1, 300 sec: 12246.4). Total num frames: 35577856. Throughput: 0: 3086.4. Samples: 7887900. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:10:36,069][126169] Avg episode reward: [(0, '5.010')] [2025-01-03 21:10:36,897][126248] Updated weights for policy 0, policy_version 8690 (0.0012) [2025-01-03 21:10:39,075][126248] Updated weights for policy 0, policy_version 8700 (0.0011) [2025-01-03 21:10:41,068][126169] Fps is (10 sec: 16385.5, 60 sec: 13039.0, 300 sec: 12371.3). Total num frames: 35667968. Throughput: 0: 3120.6. Samples: 7901974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:10:41,069][126169] Avg episode reward: [(0, '5.629')] [2025-01-03 21:10:41,317][126248] Updated weights for policy 0, policy_version 8710 (0.0011) [2025-01-03 21:10:43,970][126248] Updated weights for policy 0, policy_version 8720 (0.0016) [2025-01-03 21:10:46,069][126169] Fps is (10 sec: 15563.5, 60 sec: 13175.3, 300 sec: 12399.0). Total num frames: 35733504. Throughput: 0: 3351.8. Samples: 7927082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:10:46,070][126169] Avg episode reward: [(0, '5.011')] [2025-01-03 21:10:46,111][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008725_35737600.pth... [2025-01-03 21:10:46,191][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008003_32780288.pth [2025-01-03 21:10:48,367][126248] Updated weights for policy 0, policy_version 8730 (0.0026) [2025-01-03 21:10:51,069][126169] Fps is (10 sec: 11878.0, 60 sec: 12765.8, 300 sec: 12274.1). Total num frames: 35786752. Throughput: 0: 3249.5. Samples: 7941662. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:10:51,069][126169] Avg episode reward: [(0, '4.955')] [2025-01-03 21:10:52,176][126248] Updated weights for policy 0, policy_version 8740 (0.0023) [2025-01-03 21:10:55,709][126248] Updated weights for policy 0, policy_version 8750 (0.0021) [2025-01-03 21:10:56,069][126169] Fps is (10 sec: 10650.2, 60 sec: 12629.4, 300 sec: 12149.1). Total num frames: 35840000. Throughput: 0: 3245.5. Samples: 7949930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:10:56,069][126169] Avg episode reward: [(0, '4.789')] [2025-01-03 21:10:59,352][126248] Updated weights for policy 0, policy_version 8760 (0.0023) [2025-01-03 21:11:01,069][126169] Fps is (10 sec: 11059.4, 60 sec: 12697.6, 300 sec: 12176.9). Total num frames: 35897344. Throughput: 0: 3249.8. Samples: 7966998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:01,069][126169] Avg episode reward: [(0, '5.047')] [2025-01-03 21:11:03,057][126248] Updated weights for policy 0, policy_version 8770 (0.0022) [2025-01-03 21:11:06,069][126169] Fps is (10 sec: 11468.5, 60 sec: 12561.0, 300 sec: 12190.8). Total num frames: 35954688. Throughput: 0: 3036.2. Samples: 7983946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:06,070][126169] Avg episode reward: [(0, '5.171')] [2025-01-03 21:11:06,794][126248] Updated weights for policy 0, policy_version 8780 (0.0023) [2025-01-03 21:11:10,912][126248] Updated weights for policy 0, policy_version 8790 (0.0024) [2025-01-03 21:11:11,069][126169] Fps is (10 sec: 10649.4, 60 sec: 12424.5, 300 sec: 12190.8). Total num frames: 36003840. Throughput: 0: 2954.4. Samples: 7991536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:11,069][126169] Avg episode reward: [(0, '4.867')] [2025-01-03 21:11:14,773][126248] Updated weights for policy 0, policy_version 8800 (0.0023) [2025-01-03 21:11:16,069][126169] Fps is (10 sec: 10240.3, 60 sec: 12219.7, 300 sec: 12176.9). 
Total num frames: 36057088. Throughput: 0: 3021.8. Samples: 8007198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:16,070][126169] Avg episode reward: [(0, '4.705')] [2025-01-03 21:11:18,925][126248] Updated weights for policy 0, policy_version 8810 (0.0024) [2025-01-03 21:11:21,069][126169] Fps is (10 sec: 10649.6, 60 sec: 11605.3, 300 sec: 12190.8). Total num frames: 36110336. Throughput: 0: 2987.9. Samples: 8022354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:21,069][126169] Avg episode reward: [(0, '4.983')] [2025-01-03 21:11:22,463][126248] Updated weights for policy 0, policy_version 8820 (0.0021) [2025-01-03 21:11:26,032][126248] Updated weights for policy 0, policy_version 8830 (0.0023) [2025-01-03 21:11:26,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11673.6, 300 sec: 12232.5). Total num frames: 36167680. Throughput: 0: 2868.2. Samples: 8031046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:26,069][126169] Avg episode reward: [(0, '5.092')] [2025-01-03 21:11:29,654][126248] Updated weights for policy 0, policy_version 8840 (0.0022) [2025-01-03 21:11:31,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11946.8, 300 sec: 12149.1). Total num frames: 36220928. Throughput: 0: 2693.4. Samples: 8048284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:31,069][126169] Avg episode reward: [(0, '5.062')] [2025-01-03 21:11:33,154][126248] Updated weights for policy 0, policy_version 8850 (0.0022) [2025-01-03 21:11:36,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11673.5, 300 sec: 12038.1). Total num frames: 36278272. Throughput: 0: 2735.4. Samples: 8064756. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:36,069][126169] Avg episode reward: [(0, '5.350')] [2025-01-03 21:11:37,042][126248] Updated weights for policy 0, policy_version 8860 (0.0020) [2025-01-03 21:11:39,359][126248] Updated weights for policy 0, policy_version 8870 (0.0012) [2025-01-03 21:11:41,068][126169] Fps is (10 sec: 13926.7, 60 sec: 11537.0, 300 sec: 12135.3). Total num frames: 36360192. Throughput: 0: 2771.8. Samples: 8074660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:11:41,069][126169] Avg episode reward: [(0, '5.448')] [2025-01-03 21:11:41,546][126248] Updated weights for policy 0, policy_version 8880 (0.0011) [2025-01-03 21:11:43,843][126248] Updated weights for policy 0, policy_version 8890 (0.0012) [2025-01-03 21:11:46,068][126169] Fps is (10 sec: 16794.2, 60 sec: 11878.6, 300 sec: 12274.1). Total num frames: 36446208. Throughput: 0: 3004.5. Samples: 8102200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:11:46,069][126169] Avg episode reward: [(0, '5.437')] [2025-01-03 21:11:46,351][126248] Updated weights for policy 0, policy_version 8900 (0.0013) [2025-01-03 21:11:49,706][126248] Updated weights for policy 0, policy_version 8910 (0.0021) [2025-01-03 21:11:51,069][126169] Fps is (10 sec: 14335.7, 60 sec: 11946.7, 300 sec: 12288.0). Total num frames: 36503552. Throughput: 0: 3071.1. Samples: 8122146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:11:51,069][126169] Avg episode reward: [(0, '5.188')] [2025-01-03 21:11:53,995][126248] Updated weights for policy 0, policy_version 8920 (0.0026) [2025-01-03 21:11:56,069][126169] Fps is (10 sec: 10649.0, 60 sec: 11878.4, 300 sec: 12274.1). Total num frames: 36552704. Throughput: 0: 3062.4. Samples: 8129346. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:11:56,070][126169] Avg episode reward: [(0, '5.329')] [2025-01-03 21:11:58,048][126248] Updated weights for policy 0, policy_version 8930 (0.0025) [2025-01-03 21:12:01,069][126169] Fps is (10 sec: 10649.6, 60 sec: 11878.4, 300 sec: 12288.0). Total num frames: 36610048. Throughput: 0: 3055.7. Samples: 8144702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:01,069][126169] Avg episode reward: [(0, '5.879')] [2025-01-03 21:12:01,793][126248] Updated weights for policy 0, policy_version 8940 (0.0022) [2025-01-03 21:12:05,463][126248] Updated weights for policy 0, policy_version 8950 (0.0022) [2025-01-03 21:12:06,069][126169] Fps is (10 sec: 11059.6, 60 sec: 11810.2, 300 sec: 12301.9). Total num frames: 36663296. Throughput: 0: 3088.6. Samples: 8161340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:06,069][126169] Avg episode reward: [(0, '5.512')] [2025-01-03 21:12:09,176][126248] Updated weights for policy 0, policy_version 8960 (0.0022) [2025-01-03 21:12:11,069][126169] Fps is (10 sec: 11059.2, 60 sec: 11946.7, 300 sec: 12190.8). Total num frames: 36720640. Throughput: 0: 3082.1. Samples: 8169740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:11,069][126169] Avg episode reward: [(0, '5.298')] [2025-01-03 21:12:12,722][126248] Updated weights for policy 0, policy_version 8970 (0.0022) [2025-01-03 21:12:16,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12079.7). Total num frames: 36777984. Throughput: 0: 3074.2. Samples: 8186622. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:16,069][126169] Avg episode reward: [(0, '5.633')] [2025-01-03 21:12:16,371][126248] Updated weights for policy 0, policy_version 8980 (0.0022) [2025-01-03 21:12:19,935][126248] Updated weights for policy 0, policy_version 8990 (0.0022) [2025-01-03 21:12:21,069][126169] Fps is (10 sec: 11059.4, 60 sec: 12015.0, 300 sec: 12065.9). 
Total num frames: 36831232. Throughput: 0: 3087.7. Samples: 8203700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:21,069][126169] Avg episode reward: [(0, '5.928')] [2025-01-03 21:12:23,568][126248] Updated weights for policy 0, policy_version 9000 (0.0024) [2025-01-03 21:12:26,069][126169] Fps is (10 sec: 10649.4, 60 sec: 11946.6, 300 sec: 12051.9). Total num frames: 36884480. Throughput: 0: 3057.5. Samples: 8212250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:26,070][126169] Avg episode reward: [(0, '5.165')] [2025-01-03 21:12:27,900][126248] Updated weights for policy 0, policy_version 9010 (0.0023) [2025-01-03 21:12:30,305][126248] Updated weights for policy 0, policy_version 9020 (0.0012) [2025-01-03 21:12:31,068][126169] Fps is (10 sec: 12697.7, 60 sec: 12288.0, 300 sec: 12121.4). Total num frames: 36958208. Throughput: 0: 2816.0. Samples: 8228920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:31,069][126169] Avg episode reward: [(0, '5.566')] [2025-01-03 21:12:32,528][126248] Updated weights for policy 0, policy_version 9030 (0.0011) [2025-01-03 21:12:36,069][126169] Fps is (10 sec: 13926.7, 60 sec: 12424.5, 300 sec: 12135.3). Total num frames: 37023744. Throughput: 0: 2872.4. Samples: 8251404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:36,069][126169] Avg episode reward: [(0, '5.051')] [2025-01-03 21:12:36,304][126248] Updated weights for policy 0, policy_version 9040 (0.0023) [2025-01-03 21:12:39,980][126248] Updated weights for policy 0, policy_version 9050 (0.0023) [2025-01-03 21:12:41,069][126169] Fps is (10 sec: 11878.1, 60 sec: 11946.6, 300 sec: 12121.4). Total num frames: 37076992. Throughput: 0: 2894.1. Samples: 8259580. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:12:41,069][126169] Avg episode reward: [(0, '5.298')] [2025-01-03 21:12:43,746][126248] Updated weights for policy 0, policy_version 9060 (0.0023) [2025-01-03 21:12:46,069][126169] Fps is (10 sec: 11059.3, 60 sec: 11468.7, 300 sec: 12121.4). Total num frames: 37134336. Throughput: 0: 2916.0. Samples: 8275922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:46,069][126169] Avg episode reward: [(0, '5.157')] [2025-01-03 21:12:46,078][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009066_37134336.pth... [2025-01-03 21:12:46,146][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008368_34275328.pth [2025-01-03 21:12:47,426][126248] Updated weights for policy 0, policy_version 9070 (0.0023) [2025-01-03 21:12:49,745][126248] Updated weights for policy 0, policy_version 9080 (0.0012) [2025-01-03 21:12:51,069][126169] Fps is (10 sec: 13106.5, 60 sec: 11741.7, 300 sec: 12190.8). Total num frames: 37208064. Throughput: 0: 3012.6. Samples: 8296908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:51,070][126169] Avg episode reward: [(0, '5.245')] [2025-01-03 21:12:52,969][126248] Updated weights for policy 0, policy_version 9090 (0.0020) [2025-01-03 21:12:56,069][126169] Fps is (10 sec: 13516.9, 60 sec: 11946.7, 300 sec: 12204.7). Total num frames: 37269504. Throughput: 0: 3031.7. Samples: 8306166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:12:56,069][126169] Avg episode reward: [(0, '5.937')] [2025-01-03 21:12:56,387][126248] Updated weights for policy 0, policy_version 9100 (0.0021) [2025-01-03 21:12:59,906][126248] Updated weights for policy 0, policy_version 9110 (0.0021) [2025-01-03 21:13:01,069][126169] Fps is (10 sec: 11879.1, 60 sec: 11946.7, 300 sec: 12218.6). Total num frames: 37326848. Throughput: 0: 3051.2. 
Samples: 8323926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:01,069][126169] Avg episode reward: [(0, '5.787')] [2025-01-03 21:13:03,472][126248] Updated weights for policy 0, policy_version 9120 (0.0021) [2025-01-03 21:13:06,069][126169] Fps is (10 sec: 11468.7, 60 sec: 12014.9, 300 sec: 12218.6). Total num frames: 37384192. Throughput: 0: 3052.1. Samples: 8341046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:06,069][126169] Avg episode reward: [(0, '5.334')] [2025-01-03 21:13:07,081][126248] Updated weights for policy 0, policy_version 9130 (0.0022) [2025-01-03 21:13:10,520][126248] Updated weights for policy 0, policy_version 9140 (0.0021) [2025-01-03 21:13:11,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12014.9, 300 sec: 12232.5). Total num frames: 37441536. Throughput: 0: 3055.2. Samples: 8349734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:13:11,069][126169] Avg episode reward: [(0, '5.291')] [2025-01-03 21:13:13,939][126248] Updated weights for policy 0, policy_version 9150 (0.0021) [2025-01-03 21:13:16,069][126169] Fps is (10 sec: 11468.6, 60 sec: 12014.9, 300 sec: 12135.3). Total num frames: 37498880. Throughput: 0: 3077.5. Samples: 8367410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:13:16,070][126169] Avg episode reward: [(0, '5.578')] [2025-01-03 21:13:17,670][126248] Updated weights for policy 0, policy_version 9160 (0.0021) [2025-01-03 21:13:21,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12083.2, 300 sec: 12052.0). Total num frames: 37556224. Throughput: 0: 2961.5. Samples: 8384670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:13:21,069][126169] Avg episode reward: [(0, '5.878')] [2025-01-03 21:13:21,165][126248] Updated weights for policy 0, policy_version 9170 (0.0021) [2025-01-03 21:13:24,723][126248] Updated weights for policy 0, policy_version 9180 (0.0022) [2025-01-03 21:13:26,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12151.5, 300 sec: 12107.5). 
Total num frames: 37613568. Throughput: 0: 2965.5. Samples: 8393028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:13:26,069][126169] Avg episode reward: [(0, '6.096')] [2025-01-03 21:13:28,038][126248] Updated weights for policy 0, policy_version 9190 (0.0019) [2025-01-03 21:13:30,222][126248] Updated weights for policy 0, policy_version 9200 (0.0012) [2025-01-03 21:13:31,068][126169] Fps is (10 sec: 13926.7, 60 sec: 12288.0, 300 sec: 12232.5). Total num frames: 37695488. Throughput: 0: 3052.8. Samples: 8413296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:31,069][126169] Avg episode reward: [(0, '5.620')] [2025-01-03 21:13:32,650][126248] Updated weights for policy 0, policy_version 9210 (0.0014) [2025-01-03 21:13:36,069][126169] Fps is (10 sec: 14745.6, 60 sec: 12288.0, 300 sec: 12135.3). Total num frames: 37761024. Throughput: 0: 3085.3. Samples: 8435746. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:36,069][126169] Avg episode reward: [(0, '5.612')] [2025-01-03 21:13:36,312][126248] Updated weights for policy 0, policy_version 9220 (0.0023) [2025-01-03 21:13:39,871][126248] Updated weights for policy 0, policy_version 9230 (0.0021) [2025-01-03 21:13:41,069][126169] Fps is (10 sec: 12287.7, 60 sec: 12356.3, 300 sec: 12010.3). Total num frames: 37818368. Throughput: 0: 3060.8. Samples: 8443900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:13:41,069][126169] Avg episode reward: [(0, '5.699')] [2025-01-03 21:13:43,250][126248] Updated weights for policy 0, policy_version 9240 (0.0020) [2025-01-03 21:13:46,069][126169] Fps is (10 sec: 11468.9, 60 sec: 12356.3, 300 sec: 12010.3). Total num frames: 37875712. Throughput: 0: 3057.2. Samples: 8461500. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:46,069][126169] Avg episode reward: [(0, '5.176')] [2025-01-03 21:13:46,939][126248] Updated weights for policy 0, policy_version 9250 (0.0022) [2025-01-03 21:13:50,293][126248] Updated weights for policy 0, policy_version 9260 (0.0021) [2025-01-03 21:13:51,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12151.6, 300 sec: 12052.0). Total num frames: 37937152. Throughput: 0: 3072.0. Samples: 8479284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:51,069][126169] Avg episode reward: [(0, '6.069')] [2025-01-03 21:13:53,684][126248] Updated weights for policy 0, policy_version 9270 (0.0021) [2025-01-03 21:13:56,069][126169] Fps is (10 sec: 12287.9, 60 sec: 12151.4, 300 sec: 12093.6). Total num frames: 37998592. Throughput: 0: 3080.8. Samples: 8488372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:13:56,069][126169] Avg episode reward: [(0, '5.615')] [2025-01-03 21:13:57,162][126248] Updated weights for policy 0, policy_version 9280 (0.0021) [2025-01-03 21:14:00,663][126248] Updated weights for policy 0, policy_version 9290 (0.0021) [2025-01-03 21:14:01,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12151.5, 300 sec: 12121.4). Total num frames: 38055936. Throughput: 0: 3080.5. Samples: 8506030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:01,069][126169] Avg episode reward: [(0, '5.651')] [2025-01-03 21:14:03,869][126248] Updated weights for policy 0, policy_version 9300 (0.0018) [2025-01-03 21:14:06,069][126169] Fps is (10 sec: 12697.6, 60 sec: 12356.3, 300 sec: 12163.0). Total num frames: 38125568. Throughput: 0: 3153.1. Samples: 8526560. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:06,069][126169] Avg episode reward: [(0, '5.547')] [2025-01-03 21:14:06,429][126248] Updated weights for policy 0, policy_version 9310 (0.0015) [2025-01-03 21:14:09,912][126248] Updated weights for policy 0, policy_version 9320 (0.0021) [2025-01-03 21:14:11,069][126169] Fps is (10 sec: 13107.3, 60 sec: 12424.5, 300 sec: 12190.8). Total num frames: 38187008. Throughput: 0: 3175.9. Samples: 8535942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:11,069][126169] Avg episode reward: [(0, '5.453')] [2025-01-03 21:14:13,315][126248] Updated weights for policy 0, policy_version 9330 (0.0021) [2025-01-03 21:14:16,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12424.6, 300 sec: 12190.8). Total num frames: 38244352. Throughput: 0: 3121.0. Samples: 8553744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:16,069][126169] Avg episode reward: [(0, '6.136')] [2025-01-03 21:14:16,096][126222] Saving new best policy, reward=6.136! [2025-01-03 21:14:16,847][126248] Updated weights for policy 0, policy_version 9340 (0.0021) [2025-01-03 21:14:20,360][126248] Updated weights for policy 0, policy_version 9350 (0.0022) [2025-01-03 21:14:21,068][126169] Fps is (10 sec: 11878.6, 60 sec: 12492.8, 300 sec: 12190.8). Total num frames: 38305792. Throughput: 0: 3012.2. Samples: 8571292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:21,069][126169] Avg episode reward: [(0, '5.900')] [2025-01-03 21:14:23,821][126248] Updated weights for policy 0, policy_version 9360 (0.0021) [2025-01-03 21:14:26,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 12190.8). Total num frames: 38363136. Throughput: 0: 3029.2. Samples: 8580212. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:26,069][126169] Avg episode reward: [(0, '5.377')] [2025-01-03 21:14:27,432][126248] Updated weights for policy 0, policy_version 9370 (0.0022) [2025-01-03 21:14:31,069][126169] Fps is (10 sec: 11058.1, 60 sec: 12014.7, 300 sec: 12176.9). Total num frames: 38416384. Throughput: 0: 3010.7. Samples: 8596986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:14:31,070][126169] Avg episode reward: [(0, '5.394')] [2025-01-03 21:14:31,379][126248] Updated weights for policy 0, policy_version 9380 (0.0023) [2025-01-03 21:14:33,897][126248] Updated weights for policy 0, policy_version 9390 (0.0013) [2025-01-03 21:14:36,068][126169] Fps is (10 sec: 13517.1, 60 sec: 12288.1, 300 sec: 12246.4). Total num frames: 38498304. Throughput: 0: 3093.8. Samples: 8618506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:14:36,069][126169] Avg episode reward: [(0, '5.312')] [2025-01-03 21:14:36,196][126248] Updated weights for policy 0, policy_version 9400 (0.0012) [2025-01-03 21:14:38,410][126248] Updated weights for policy 0, policy_version 9410 (0.0011) [2025-01-03 21:14:40,548][126248] Updated weights for policy 0, policy_version 9420 (0.0011) [2025-01-03 21:14:41,068][126169] Fps is (10 sec: 17614.5, 60 sec: 12902.4, 300 sec: 12371.3). Total num frames: 38592512. Throughput: 0: 3198.6. Samples: 8632308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:14:41,069][126169] Avg episode reward: [(0, '5.735')] [2025-01-03 21:14:42,704][126248] Updated weights for policy 0, policy_version 9430 (0.0010) [2025-01-03 21:14:45,540][126248] Updated weights for policy 0, policy_version 9440 (0.0017) [2025-01-03 21:14:46,069][126169] Fps is (10 sec: 17202.6, 60 sec: 13243.7, 300 sec: 12371.3). Total num frames: 38670336. Throughput: 0: 3422.5. Samples: 8660042. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:14:46,069][126169] Avg episode reward: [(0, '5.412')] [2025-01-03 21:14:46,079][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009441_38670336.pth... [2025-01-03 21:14:46,158][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008725_35737600.pth [2025-01-03 21:14:49,569][126248] Updated weights for policy 0, policy_version 9450 (0.0025) [2025-01-03 21:14:51,069][126169] Fps is (10 sec: 12697.2, 60 sec: 13038.9, 300 sec: 12329.7). Total num frames: 38719488. Throughput: 0: 3319.0. Samples: 8675916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:14:51,069][126169] Avg episode reward: [(0, '5.387')] [2025-01-03 21:14:53,136][126248] Updated weights for policy 0, policy_version 9460 (0.0022) [2025-01-03 21:14:56,069][126169] Fps is (10 sec: 11059.4, 60 sec: 13038.9, 300 sec: 12357.4). Total num frames: 38780928. Throughput: 0: 3304.4. Samples: 8684638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:14:56,069][126169] Avg episode reward: [(0, '5.591')] [2025-01-03 21:14:56,776][126248] Updated weights for policy 0, policy_version 9470 (0.0021) [2025-01-03 21:15:00,180][126248] Updated weights for policy 0, policy_version 9480 (0.0019) [2025-01-03 21:15:01,069][126169] Fps is (10 sec: 11878.5, 60 sec: 13038.9, 300 sec: 12329.6). Total num frames: 38838272. Throughput: 0: 3298.6. Samples: 8702180. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:15:01,069][126169] Avg episode reward: [(0, '5.277')] [2025-01-03 21:15:03,646][126248] Updated weights for policy 0, policy_version 9490 (0.0020) [2025-01-03 21:15:06,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12834.1, 300 sec: 12329.7). Total num frames: 38895616. Throughput: 0: 3294.0. Samples: 8719522. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:15:06,069][126169] Avg episode reward: [(0, '5.818')] [2025-01-03 21:15:07,265][126248] Updated weights for policy 0, policy_version 9500 (0.0021) [2025-01-03 21:15:10,693][126248] Updated weights for policy 0, policy_version 9510 (0.0021) [2025-01-03 21:15:11,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12834.1, 300 sec: 12315.8). Total num frames: 38957056. Throughput: 0: 3287.8. Samples: 8728164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:15:11,069][126169] Avg episode reward: [(0, '6.106')] [2025-01-03 21:15:14,043][126248] Updated weights for policy 0, policy_version 9520 (0.0021) [2025-01-03 21:15:16,069][126169] Fps is (10 sec: 11878.5, 60 sec: 12834.2, 300 sec: 12204.7). Total num frames: 39014400. Throughput: 0: 3320.4. Samples: 8746402. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:15:16,069][126169] Avg episode reward: [(0, '5.647')] [2025-01-03 21:15:17,506][126248] Updated weights for policy 0, policy_version 9530 (0.0022) [2025-01-03 21:15:20,875][126248] Updated weights for policy 0, policy_version 9540 (0.0021) [2025-01-03 21:15:21,069][126169] Fps is (10 sec: 11878.3, 60 sec: 12834.1, 300 sec: 12232.5). Total num frames: 39075840. Throughput: 0: 3242.2. Samples: 8764404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:15:21,069][126169] Avg episode reward: [(0, '5.752')] [2025-01-03 21:15:24,160][126248] Updated weights for policy 0, policy_version 9550 (0.0021) [2025-01-03 21:15:26,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12902.4, 300 sec: 12315.8). Total num frames: 39137280. Throughput: 0: 3140.4. Samples: 8773628. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:15:26,069][126169] Avg episode reward: [(0, '5.378')] [2025-01-03 21:15:27,509][126248] Updated weights for policy 0, policy_version 9560 (0.0021) [2025-01-03 21:15:30,794][126248] Updated weights for policy 0, policy_version 9570 (0.0021) [2025-01-03 21:15:31,069][126169] Fps is (10 sec: 12287.9, 60 sec: 13039.1, 300 sec: 12274.1). Total num frames: 39198720. Throughput: 0: 2936.9. Samples: 8792204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:15:31,069][126169] Avg episode reward: [(0, '5.468')] [2025-01-03 21:15:34,119][126248] Updated weights for policy 0, policy_version 9580 (0.0020) [2025-01-03 21:15:36,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12697.6, 300 sec: 12176.9). Total num frames: 39260160. Throughput: 0: 2988.2. Samples: 8810384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:15:36,069][126169] Avg episode reward: [(0, '5.863')] [2025-01-03 21:15:37,437][126248] Updated weights for policy 0, policy_version 9590 (0.0021) [2025-01-03 21:15:40,844][126248] Updated weights for policy 0, policy_version 9600 (0.0020) [2025-01-03 21:15:41,069][126169] Fps is (10 sec: 12288.1, 60 sec: 12151.4, 300 sec: 12163.1). Total num frames: 39321600. Throughput: 0: 3004.0. Samples: 8819820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:15:41,069][126169] Avg episode reward: [(0, '5.968')] [2025-01-03 21:15:44,152][126248] Updated weights for policy 0, policy_version 9610 (0.0021) [2025-01-03 21:15:46,068][126169] Fps is (10 sec: 12288.0, 60 sec: 11878.5, 300 sec: 12190.8). Total num frames: 39383040. Throughput: 0: 3018.5. Samples: 8838010. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 1.0) [2025-01-03 21:15:46,069][126169] Avg episode reward: [(0, '6.060')] [2025-01-03 21:15:47,471][126248] Updated weights for policy 0, policy_version 9620 (0.0020) [2025-01-03 21:15:50,999][126248] Updated weights for policy 0, policy_version 9630 (0.0021) [2025-01-03 21:15:51,069][126169] Fps is (10 sec: 12288.0, 60 sec: 12083.2, 300 sec: 12218.6). Total num frames: 39444480. Throughput: 0: 3034.9. Samples: 8856094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 21:15:51,069][126169] Avg episode reward: [(0, '5.771')] [2025-01-03 21:15:54,429][126248] Updated weights for policy 0, policy_version 9640 (0.0021) [2025-01-03 21:15:56,069][126169] Fps is (10 sec: 11878.2, 60 sec: 12014.9, 300 sec: 12218.6). Total num frames: 39501824. Throughput: 0: 3035.8. Samples: 8864774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 21:15:56,069][126169] Avg episode reward: [(0, '6.002')] [2025-01-03 21:15:57,850][126248] Updated weights for policy 0, policy_version 9650 (0.0021) [2025-01-03 21:16:01,069][126169] Fps is (10 sec: 11878.4, 60 sec: 12083.2, 300 sec: 12232.5). Total num frames: 39563264. Throughput: 0: 3035.2. Samples: 8882988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 21:16:01,069][126169] Avg episode reward: [(0, '6.103')] [2025-01-03 21:16:01,268][126248] Updated weights for policy 0, policy_version 9660 (0.0021) [2025-01-03 21:16:04,682][126248] Updated weights for policy 0, policy_version 9670 (0.0021) [2025-01-03 21:16:06,069][126169] Fps is (10 sec: 12288.2, 60 sec: 12151.5, 300 sec: 12274.1). Total num frames: 39624704. Throughput: 0: 3030.2. Samples: 8900764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 21:16:06,069][126169] Avg episode reward: [(0, '5.381')] [2025-01-03 21:16:08,016][126248] Updated weights for policy 0, policy_version 9680 (0.0020) [2025-01-03 21:16:11,069][126169] Fps is (10 sec: 11877.7, 60 sec: 12083.1, 300 sec: 12288.0). 
Total num frames: 39682048. Throughput: 0: 3033.7. Samples: 8910146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) [2025-01-03 21:16:11,070][126169] Avg episode reward: [(0, '5.743')] [2025-01-03 21:16:11,667][126248] Updated weights for policy 0, policy_version 9690 (0.0022) [2025-01-03 21:16:15,077][126248] Updated weights for policy 0, policy_version 9700 (0.0022) [2025-01-03 21:16:16,069][126169] Fps is (10 sec: 11468.8, 60 sec: 12083.2, 300 sec: 12301.9). Total num frames: 39739392. Throughput: 0: 3008.2. Samples: 8927574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:16:16,069][126169] Avg episode reward: [(0, '5.501')] [2025-01-03 21:16:18,436][126248] Updated weights for policy 0, policy_version 9710 (0.0021) [2025-01-03 21:16:21,069][126169] Fps is (10 sec: 11879.1, 60 sec: 12083.2, 300 sec: 12315.8). Total num frames: 39800832. Throughput: 0: 3001.9. Samples: 8945472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:16:21,069][126169] Avg episode reward: [(0, '5.475')] [2025-01-03 21:16:21,943][126248] Updated weights for policy 0, policy_version 9720 (0.0021) [2025-01-03 21:16:25,252][126248] Updated weights for policy 0, policy_version 9730 (0.0020) [2025-01-03 21:16:26,069][126169] Fps is (10 sec: 12287.7, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 39862272. Throughput: 0: 2990.3. Samples: 8954382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:16:26,069][126169] Avg episode reward: [(0, '5.891')] [2025-01-03 21:16:29,279][126248] Updated weights for policy 0, policy_version 9740 (0.0023) [2025-01-03 21:16:31,069][126169] Fps is (10 sec: 11059.3, 60 sec: 11878.4, 300 sec: 12315.8). Total num frames: 39911424. Throughput: 0: 2952.0. Samples: 8970850. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:16:31,069][126169] Avg episode reward: [(0, '5.853')] [2025-01-03 21:16:33,056][126248] Updated weights for policy 0, policy_version 9750 (0.0022) [2025-01-03 21:16:36,069][126169] Fps is (10 sec: 10649.7, 60 sec: 11810.1, 300 sec: 12232.5). Total num frames: 39968768. Throughput: 0: 2919.0. Samples: 8987450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:16:36,069][126169] Avg episode reward: [(0, '5.897')] [2025-01-03 21:16:36,679][126248] Updated weights for policy 0, policy_version 9760 (0.0023) [2025-01-03 21:16:38,775][126169] Component Batcher_0 stopped! [2025-01-03 21:16:38,775][126222] Stopping Batcher_0... [2025-01-03 21:16:38,776][126222] Loop batcher_evt_loop terminating... [2025-01-03 21:16:38,775][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... [2025-01-03 21:16:38,802][126248] Weights refcount: 2 0 [2025-01-03 21:16:38,804][126248] Stopping InferenceWorker_p0-w0... [2025-01-03 21:16:38,804][126169] Component InferenceWorker_p0-w0 stopped! [2025-01-03 21:16:38,804][126248] Loop inference_proc0-0_evt_loop terminating... [2025-01-03 21:16:38,826][126222] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009066_37134336.pth [2025-01-03 21:16:38,828][126222] Saving new best policy, reward=6.292! [2025-01-03 21:16:38,839][126247] Stopping RolloutWorker_w0... [2025-01-03 21:16:38,840][126169] Component RolloutWorker_w0 stopped! [2025-01-03 21:16:38,840][126247] Loop rollout_proc0_evt_loop terminating... [2025-01-03 21:16:38,841][126250] Stopping RolloutWorker_w1... [2025-01-03 21:16:38,841][126251] Stopping RolloutWorker_w3... [2025-01-03 21:16:38,841][126169] Component RolloutWorker_w3 stopped! [2025-01-03 21:16:38,842][126169] Component RolloutWorker_w1 stopped! 
[2025-01-03 21:16:38,842][126250] Loop rollout_proc1_evt_loop terminating... [2025-01-03 21:16:38,842][126251] Loop rollout_proc3_evt_loop terminating... [2025-01-03 21:16:38,842][126265] Stopping RolloutWorker_w5... [2025-01-03 21:16:38,842][126169] Component RolloutWorker_w5 stopped! [2025-01-03 21:16:38,842][126263] Stopping RolloutWorker_w4... [2025-01-03 21:16:38,842][126169] Component RolloutWorker_w4 stopped! [2025-01-03 21:16:38,842][126265] Loop rollout_proc5_evt_loop terminating... [2025-01-03 21:16:38,843][126263] Loop rollout_proc4_evt_loop terminating... [2025-01-03 21:16:38,843][126169] Component RolloutWorker_w6 stopped! [2025-01-03 21:16:38,843][126267] Stopping RolloutWorker_w6... [2025-01-03 21:16:38,843][126249] Stopping RolloutWorker_w2... [2025-01-03 21:16:38,844][126169] Component RolloutWorker_w2 stopped! [2025-01-03 21:16:38,844][126267] Loop rollout_proc6_evt_loop terminating... [2025-01-03 21:16:38,844][126249] Loop rollout_proc2_evt_loop terminating... [2025-01-03 21:16:38,846][126266] Stopping RolloutWorker_w7... [2025-01-03 21:16:38,846][126169] Component RolloutWorker_w7 stopped! [2025-01-03 21:16:38,847][126266] Loop rollout_proc7_evt_loop terminating... [2025-01-03 21:16:38,884][126222] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... [2025-01-03 21:16:38,946][126222] Stopping LearnerWorker_p0... [2025-01-03 21:16:38,947][126169] Component LearnerWorker_p0 stopped! [2025-01-03 21:16:38,949][126169] Waiting for process learner_proc0 to stop... [2025-01-03 21:16:38,947][126222] Loop learner_proc0_evt_loop terminating... [2025-01-03 21:16:39,875][126169] Waiting for process inference_proc0-0 to join... [2025-01-03 21:16:39,876][126169] Waiting for process rollout_proc0 to join... [2025-01-03 21:16:39,876][126169] Waiting for process rollout_proc1 to join... [2025-01-03 21:16:39,876][126169] Waiting for process rollout_proc2 to join... 
[2025-01-03 21:16:39,876][126169] Waiting for process rollout_proc3 to join... [2025-01-03 21:16:39,876][126169] Waiting for process rollout_proc4 to join... [2025-01-03 21:16:39,877][126169] Waiting for process rollout_proc5 to join... [2025-01-03 21:16:39,877][126169] Waiting for process rollout_proc6 to join... [2025-01-03 21:16:39,877][126169] Waiting for process rollout_proc7 to join... [2025-01-03 21:16:39,877][126169] Batcher 0 profile tree view: batching: 108.7508, releasing_batches: 0.2879 [2025-01-03 21:16:39,877][126169] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 43.6774 update_model: 44.8860 weight_update: 0.0019 one_step: 0.0034 handle_policy_step: 2632.8056 deserialize: 102.8084, stack: 15.5661, obs_to_device_normalize: 647.1216, forward: 1221.4513, send_messages: 202.7633 prepare_outputs: 333.8814 to_cpu: 209.9327 [2025-01-03 21:16:39,877][126169] Learner 0 profile tree view: misc: 0.0404, prepare_batch: 113.3803 train: 555.5426 epoch_init: 0.0549, minibatch_init: 0.0705, losses_postprocess: 3.2698, kl_divergence: 3.7108, after_optimizer: 8.0285 calculate_losses: 194.0003 losses_init: 0.0376, forward_head: 8.1008, bptt_initial: 140.3676, tail: 7.0268, advantages_returns: 2.0398, losses: 21.8073 bptt: 12.5111 bptt_forward_core: 11.7939 update: 342.2052 clip: 9.1548 [2025-01-03 21:16:39,878][126169] RolloutWorker_w0 profile tree view: wait_for_trajectories: 1.9079, enqueue_policy_requests: 137.1412, env_step: 1622.7626, overhead: 91.7580, complete_rollouts: 2.9448 save_policy_outputs: 155.2106 split_output_tensors: 50.5434 [2025-01-03 21:16:39,878][126169] RolloutWorker_w7 profile tree view: wait_for_trajectories: 1.8618, enqueue_policy_requests: 137.5712, env_step: 1622.2618, overhead: 90.6083, complete_rollouts: 2.9497 save_policy_outputs: 155.0441 split_output_tensors: 50.3737 [2025-01-03 21:16:39,878][126169] Loop Runner_EvtLoop terminating... 
[2025-01-03 21:16:39,878][126169] Runner profile tree view: main_loop: 2870.4893 [2025-01-03 21:16:39,878][126169] Collected {0: 40005632}, FPS: 12538.5 [2025-01-03 21:16:40,080][126169] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-03 21:16:40,080][126169] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-03 21:16:40,080][126169] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-03 21:16:40,080][126169] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-03 21:16:40,081][126169] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-03 21:16:40,081][126169] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-03 21:16:40,081][126169] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-03 21:16:40,081][126169] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-03 21:16:40,081][126169] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-03 21:16:40,081][126169] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-03 21:16:40,103][126169] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:16:40,104][126169] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:16:40,105][126169] RunningMeanStd input shape: (1,) [2025-01-03 21:16:40,113][126169] ConvEncoder: input_channels=3 [2025-01-03 21:16:40,200][126169] Conv encoder output size: 512 [2025-01-03 21:16:40,200][126169] Policy head output size: 512 [2025-01-03 21:16:40,306][126169] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... [2025-01-03 21:16:40,850][126169] Num frames 100... [2025-01-03 21:16:40,945][126169] Num frames 200... [2025-01-03 21:16:41,040][126169] Num frames 300... [2025-01-03 21:16:41,131][126169] Num frames 400... [2025-01-03 21:16:41,231][126169] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2025-01-03 21:16:41,231][126169] Avg episode reward: 5.480, avg true_objective: 4.480 [2025-01-03 21:16:41,319][126169] Num frames 500... [2025-01-03 21:16:41,413][126169] Num frames 600... [2025-01-03 21:16:41,508][126169] Num frames 700... [2025-01-03 21:16:41,602][126169] Num frames 800... [2025-01-03 21:16:41,687][126169] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 [2025-01-03 21:16:41,687][126169] Avg episode reward: 4.660, avg true_objective: 4.160 [2025-01-03 21:16:41,786][126169] Num frames 900... [2025-01-03 21:16:41,881][126169] Num frames 1000... [2025-01-03 21:16:41,976][126169] Num frames 1100... [2025-01-03 21:16:42,071][126169] Num frames 1200... [2025-01-03 21:16:42,141][126169] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 [2025-01-03 21:16:42,142][126169] Avg episode reward: 4.387, avg true_objective: 4.053 [2025-01-03 21:16:42,262][126169] Num frames 1300... [2025-01-03 21:16:42,359][126169] Num frames 1400... 
[2025-01-03 21:16:42,453][126169] Num frames 1500... [2025-01-03 21:16:42,548][126169] Num frames 1600... [2025-01-03 21:16:42,599][126169] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 [2025-01-03 21:16:42,600][126169] Avg episode reward: 4.250, avg true_objective: 4.000 [2025-01-03 21:16:42,725][126169] Num frames 1700... [2025-01-03 21:16:42,821][126169] Num frames 1800... [2025-01-03 21:16:42,915][126169] Num frames 1900... [2025-01-03 21:16:43,021][126169] Avg episode rewards: #0: 4.304, true rewards: #0: 3.904 [2025-01-03 21:16:43,022][126169] Avg episode reward: 4.304, avg true_objective: 3.904 [2025-01-03 21:16:43,084][126169] Num frames 2000... [2025-01-03 21:16:43,181][126169] Num frames 2100... [2025-01-03 21:16:43,277][126169] Num frames 2200... [2025-01-03 21:16:43,371][126169] Num frames 2300... [2025-01-03 21:16:43,464][126169] Num frames 2400... [2025-01-03 21:16:43,548][126169] Avg episode rewards: #0: 4.720, true rewards: #0: 4.053 [2025-01-03 21:16:43,549][126169] Avg episode reward: 4.720, avg true_objective: 4.053 [2025-01-03 21:16:43,644][126169] Num frames 2500... [2025-01-03 21:16:43,738][126169] Num frames 2600... [2025-01-03 21:16:43,833][126169] Num frames 2700... [2025-01-03 21:16:43,928][126169] Num frames 2800... [2025-01-03 21:16:44,025][126169] Num frames 2900... [2025-01-03 21:16:44,092][126169] Avg episode rewards: #0: 4.874, true rewards: #0: 4.160 [2025-01-03 21:16:44,092][126169] Avg episode reward: 4.874, avg true_objective: 4.160 [2025-01-03 21:16:44,212][126169] Num frames 3000... [2025-01-03 21:16:44,306][126169] Num frames 3100... [2025-01-03 21:16:44,400][126169] Num frames 3200... [2025-01-03 21:16:44,515][126169] Avg episode rewards: #0: 4.830, true rewards: #0: 4.080 [2025-01-03 21:16:44,516][126169] Avg episode reward: 4.830, avg true_objective: 4.080 [2025-01-03 21:16:44,589][126169] Num frames 3300... [2025-01-03 21:16:44,684][126169] Num frames 3400... [2025-01-03 21:16:44,780][126169] Num frames 3500... 
[2025-01-03 21:16:44,853][126169] Avg episode rewards: #0: 4.578, true rewards: #0: 3.911 [2025-01-03 21:16:44,854][126169] Avg episode reward: 4.578, avg true_objective: 3.911 [2025-01-03 21:16:44,953][126169] Num frames 3600... [2025-01-03 21:16:45,049][126169] Num frames 3700... [2025-01-03 21:16:45,143][126169] Num frames 3800... [2025-01-03 21:16:45,238][126169] Num frames 3900... [2025-01-03 21:16:45,327][126169] Avg episode rewards: #0: 4.636, true rewards: #0: 3.936 [2025-01-03 21:16:45,328][126169] Avg episode reward: 4.636, avg true_objective: 3.936 [2025-01-03 21:16:52,114][126169] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-03 21:16:52,124][126169] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-03 21:16:52,124][126169] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-03 21:16:52,125][126169] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-03 21:16:52,125][126169] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
[2025-01-03 21:16:52,126][126169] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-03 21:16:52,126][126169] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-03 21:16:52,126][126169] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-03 21:16:52,126][126169] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-03 21:16:52,126][126169] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-03 21:16:52,145][126169] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:16:52,146][126169] RunningMeanStd input shape: (1,) [2025-01-03 21:16:52,154][126169] ConvEncoder: input_channels=3 [2025-01-03 21:16:52,182][126169] Conv encoder output size: 512 [2025-01-03 21:16:52,182][126169] Policy head output size: 512 [2025-01-03 21:16:52,197][126169] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... [2025-01-03 21:16:52,548][126169] Num frames 100... [2025-01-03 21:16:52,638][126169] Num frames 200... [2025-01-03 21:16:52,729][126169] Num frames 300... [2025-01-03 21:16:52,819][126169] Num frames 400... [2025-01-03 21:16:52,898][126169] Num frames 500... [2025-01-03 21:16:52,993][126169] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440 [2025-01-03 21:16:52,993][126169] Avg episode reward: 7.440, avg true_objective: 5.440 [2025-01-03 21:16:53,083][126169] Num frames 600... [2025-01-03 21:16:53,174][126169] Num frames 700... [2025-01-03 21:16:53,263][126169] Num frames 800... [2025-01-03 21:16:53,352][126169] Num frames 900... [2025-01-03 21:16:53,444][126169] Num frames 1000... [2025-01-03 21:16:53,536][126169] Num frames 1100... [2025-01-03 21:16:53,626][126169] Num frames 1200... 
[2025-01-03 21:16:53,725][126169] Avg episode rewards: #0: 9.240, true rewards: #0: 6.240 [2025-01-03 21:16:53,726][126169] Avg episode reward: 9.240, avg true_objective: 6.240 [2025-01-03 21:16:53,784][126169] Num frames 1300... [2025-01-03 21:16:53,875][126169] Num frames 1400... [2025-01-03 21:16:53,965][126169] Num frames 1500... [2025-01-03 21:16:54,057][126169] Num frames 1600... [2025-01-03 21:16:54,142][126169] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440 [2025-01-03 21:16:54,142][126169] Avg episode reward: 7.440, avg true_objective: 5.440 [2025-01-03 21:16:54,246][126169] Num frames 1700... [2025-01-03 21:16:54,338][126169] Num frames 1800... [2025-01-03 21:16:54,430][126169] Num frames 1900... [2025-01-03 21:16:54,520][126169] Num frames 2000... [2025-01-03 21:16:54,647][126169] Avg episode rewards: #0: 6.950, true rewards: #0: 5.200 [2025-01-03 21:16:54,648][126169] Avg episode reward: 6.950, avg true_objective: 5.200 [2025-01-03 21:16:54,706][126169] Num frames 2100... [2025-01-03 21:16:54,801][126169] Num frames 2200... [2025-01-03 21:16:54,892][126169] Num frames 2300... [2025-01-03 21:16:54,985][126169] Num frames 2400... [2025-01-03 21:16:55,101][126169] Avg episode rewards: #0: 6.328, true rewards: #0: 4.928 [2025-01-03 21:16:55,101][126169] Avg episode reward: 6.328, avg true_objective: 4.928 [2025-01-03 21:16:55,172][126169] Num frames 2500... [2025-01-03 21:16:55,264][126169] Num frames 2600... [2025-01-03 21:16:55,357][126169] Num frames 2700... [2025-01-03 21:16:55,450][126169] Num frames 2800... [2025-01-03 21:16:55,541][126169] Num frames 2900... [2025-01-03 21:16:55,628][126169] Num frames 3000... [2025-01-03 21:16:55,716][126169] Num frames 3100... [2025-01-03 21:16:55,806][126169] Num frames 3200... [2025-01-03 21:16:55,857][126169] Avg episode rewards: #0: 7.333, true rewards: #0: 5.333 [2025-01-03 21:16:55,857][126169] Avg episode reward: 7.333, avg true_objective: 5.333 [2025-01-03 21:16:55,981][126169] Num frames 3300... 
[2025-01-03 21:16:56,074][126169] Num frames 3400... [2025-01-03 21:16:56,165][126169] Num frames 3500... [2025-01-03 21:16:56,297][126169] Avg episode rewards: #0: 6.834, true rewards: #0: 5.120 [2025-01-03 21:16:56,298][126169] Avg episode reward: 6.834, avg true_objective: 5.120 [2025-01-03 21:16:56,352][126169] Num frames 3600... [2025-01-03 21:16:56,449][126169] Num frames 3700... [2025-01-03 21:16:56,542][126169] Num frames 3800... [2025-01-03 21:16:56,631][126169] Num frames 3900... [2025-01-03 21:16:56,725][126169] Num frames 4000... [2025-01-03 21:16:56,776][126169] Avg episode rewards: #0: 6.625, true rewards: #0: 5.000 [2025-01-03 21:16:56,776][126169] Avg episode reward: 6.625, avg true_objective: 5.000 [2025-01-03 21:16:56,911][126169] Num frames 4100... [2025-01-03 21:16:57,007][126169] Num frames 4200... [2025-01-03 21:16:57,104][126169] Num frames 4300... [2025-01-03 21:16:57,206][126169] Avg episode rewards: #0: 6.391, true rewards: #0: 4.836 [2025-01-03 21:16:57,206][126169] Avg episode reward: 6.391, avg true_objective: 4.836 [2025-01-03 21:16:57,284][126169] Num frames 4400... [2025-01-03 21:16:57,377][126169] Num frames 4500... [2025-01-03 21:16:57,472][126169] Num frames 4600... [2025-01-03 21:16:57,564][126169] Num frames 4700... [2025-01-03 21:16:57,681][126169] Avg episode rewards: #0: 6.268, true rewards: #0: 4.768 [2025-01-03 21:16:57,681][126169] Avg episode reward: 6.268, avg true_objective: 4.768 [2025-01-03 21:17:05,673][126169] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-03 21:18:31,102][126169] The model has been pushed to https://huggingface.co/spenning/rl_course_vizdoom_health_gathering_supreme [2025-01-03 21:47:21,455][133311] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-03 21:47:21,456][133311] Rollout worker 0 uses device cpu [2025-01-03 21:47:21,456][133311] Rollout worker 1 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 2 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 3 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 4 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 5 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 6 uses device cpu [2025-01-03 21:47:21,457][133311] Rollout worker 7 uses device cpu [2025-01-03 21:47:21,501][133311] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:47:21,501][133311] InferenceWorker_p0-w0: min num requests: 2 [2025-01-03 21:47:21,525][133311] Starting all processes... [2025-01-03 21:47:21,525][133311] Starting process learner_proc0 [2025-01-03 21:47:22,873][133311] Starting all processes... [2025-01-03 21:47:22,876][133311] Starting process inference_proc0-0 [2025-01-03 21:47:22,877][133311] Starting process rollout_proc0 [2025-01-03 21:47:22,877][133311] Starting process rollout_proc1 [2025-01-03 21:47:22,879][133362] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:47:22,880][133362] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 21:47:22,877][133311] Starting process rollout_proc2 [2025-01-03 21:47:22,877][133311] Starting process rollout_proc3 [2025-01-03 21:47:22,877][133311] Starting process rollout_proc4 [2025-01-03 21:47:22,883][133311] Starting process rollout_proc5 [2025-01-03 21:47:22,885][133311] Starting process rollout_proc6 [2025-01-03 21:47:22,893][133362] Num visible devices: 1 [2025-01-03 21:47:22,887][133311] Starting process rollout_proc7 [2025-01-03 21:47:22,902][133362] Starting seed is not provided [2025-01-03 21:47:22,903][133362] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:47:22,903][133362] Initializing actor-critic model on device cuda:0 [2025-01-03 
21:47:22,904][133362] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:47:22,906][133362] RunningMeanStd input shape: (1,) [2025-01-03 21:47:22,923][133362] ConvEncoder: input_channels=3 [2025-01-03 21:47:23,080][133362] Conv encoder output size: 512 [2025-01-03 21:47:23,081][133362] Policy head output size: 512 [2025-01-03 21:47:23,104][133362] Created Actor Critic model with architecture: [2025-01-03 21:47:23,104][133362] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 21:47:23,246][133362] Using optimizer [2025-01-03 21:47:24,813][133362] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth... 
[2025-01-03 21:47:24,861][133362] Loading model from checkpoint [2025-01-03 21:47:24,863][133362] Loaded experiment state at self.train_step=9767, self.env_steps=40005632 [2025-01-03 21:47:24,864][133362] Initialized policy 0 weights for model version 9767 [2025-01-03 21:47:24,866][133362] LearnerWorker_p0 finished initialization! [2025-01-03 21:47:24,867][133362] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:47:24,999][133387] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:47:24,999][133387] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-03 21:47:25,014][133387] Num visible devices: 1 [2025-01-03 21:47:25,091][133390] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,125][133387] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:47:25,126][133387] RunningMeanStd input shape: (1,) [2025-01-03 21:47:25,136][133387] ConvEncoder: input_channels=3 [2025-01-03 21:47:25,211][133389] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,238][133387] Conv encoder output size: 512 [2025-01-03 21:47:25,238][133387] Policy head output size: 512 [2025-01-03 21:47:25,310][133386] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,380][133388] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,452][133391] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,501][133406] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,501][133403] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,564][133311] Inference worker 0-0 is ready! [2025-01-03 21:47:25,564][133311] All inference workers are ready! Signal rollout workers to start! [2025-01-03 21:47:25,565][133311] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 40005632. 
Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 21:47:25,574][133404] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] [2025-01-03 21:47:25,595][133403] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,597][133388] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,597][133389] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,606][133404] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,608][133391] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,608][133406] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,609][133386] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,610][133390] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:47:25,869][133403] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,869][133388] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,872][133404] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,873][133389] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,873][133406] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,946][133386] Decorrelating experience for 0 frames... [2025-01-03 21:47:25,954][133391] Decorrelating experience for 0 frames... [2025-01-03 21:47:26,095][133406] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,100][133388] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,100][133404] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,131][133390] Decorrelating experience for 0 frames... [2025-01-03 21:47:26,180][133391] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,204][133389] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,354][133390] Decorrelating experience for 32 frames... 
[2025-01-03 21:47:26,386][133404] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,389][133388] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,395][133403] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,492][133389] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,590][133406] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,618][133391] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,658][133388] Decorrelating experience for 96 frames... [2025-01-03 21:47:26,662][133404] Decorrelating experience for 96 frames... [2025-01-03 21:47:26,684][133403] Decorrelating experience for 64 frames... [2025-01-03 21:47:26,768][133389] Decorrelating experience for 96 frames... [2025-01-03 21:47:26,826][133386] Decorrelating experience for 32 frames... [2025-01-03 21:47:26,874][133406] Decorrelating experience for 96 frames... [2025-01-03 21:47:26,941][133403] Decorrelating experience for 96 frames... [2025-01-03 21:47:26,941][133391] Decorrelating experience for 96 frames... [2025-01-03 21:47:27,045][133390] Decorrelating experience for 64 frames... [2025-01-03 21:47:27,131][133386] Decorrelating experience for 64 frames... [2025-01-03 21:47:27,351][133390] Decorrelating experience for 96 frames... [2025-01-03 21:47:27,442][133386] Decorrelating experience for 96 frames... [2025-01-03 21:47:27,682][133362] Signal inference workers to stop experience collection... [2025-01-03 21:47:27,688][133387] InferenceWorker_p0-w0: stopping experience collection [2025-01-03 21:47:27,941][133311] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 40005632. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 21:47:27,941][133311] Avg episode reward: [(0, '2.499')] [2025-01-03 21:47:29,946][133362] Signal inference workers to resume experience collection... 
[2025-01-03 21:47:29,946][133387] InferenceWorker_p0-w0: resuming experience collection [2025-01-03 21:47:31,813][133387] Updated weights for policy 0, policy_version 9777 (0.0069) [2025-01-03 21:47:32,941][133311] Fps is (10 sec: 7774.1, 60 sec: 7774.1, 300 sec: 7774.1). Total num frames: 40062976. Throughput: 0: 1145.8. Samples: 8452. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-03 21:47:32,941][133311] Avg episode reward: [(0, '6.052')] [2025-01-03 21:47:34,150][133387] Updated weights for policy 0, policy_version 9787 (0.0012) [2025-01-03 21:47:36,303][133387] Updated weights for policy 0, policy_version 9797 (0.0011) [2025-01-03 21:47:37,941][133311] Fps is (10 sec: 15155.1, 60 sec: 12245.3, 300 sec: 12245.3). Total num frames: 40157184. Throughput: 0: 2909.9. Samples: 36014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:47:37,941][133311] Avg episode reward: [(0, '5.471')] [2025-01-03 21:47:38,536][133387] Updated weights for policy 0, policy_version 9807 (0.0012) [2025-01-03 21:47:40,759][133387] Updated weights for policy 0, policy_version 9817 (0.0011) [2025-01-03 21:47:41,495][133311] Heartbeat connected on Batcher_0 [2025-01-03 21:47:41,498][133311] Heartbeat connected on LearnerWorker_p0 [2025-01-03 21:47:41,505][133311] Heartbeat connected on RolloutWorker_w0 [2025-01-03 21:47:41,506][133311] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-03 21:47:41,509][133311] Heartbeat connected on RolloutWorker_w1 [2025-01-03 21:47:41,513][133311] Heartbeat connected on RolloutWorker_w2 [2025-01-03 21:47:41,517][133311] Heartbeat connected on RolloutWorker_w3 [2025-01-03 21:47:41,517][133311] Heartbeat connected on RolloutWorker_w4 [2025-01-03 21:47:41,520][133311] Heartbeat connected on RolloutWorker_w5 [2025-01-03 21:47:41,525][133311] Heartbeat connected on RolloutWorker_w7 [2025-01-03 21:47:41,526][133311] Heartbeat connected on RolloutWorker_w6 [2025-01-03 21:47:42,941][133311] Fps is (10 sec: 18432.1, 60 sec: 13907.8, 300 sec: 
13907.8). Total num frames: 40247296. Throughput: 0: 2863.8. Samples: 49762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 21:47:42,941][133311] Avg episode reward: [(0, '5.737')] [2025-01-03 21:47:43,135][133387] Updated weights for policy 0, policy_version 9827 (0.0011) [2025-01-03 21:47:45,360][133387] Updated weights for policy 0, policy_version 9837 (0.0011) [2025-01-03 21:47:47,573][133387] Updated weights for policy 0, policy_version 9847 (0.0011) [2025-01-03 21:47:47,941][133311] Fps is (10 sec: 18021.9, 60 sec: 14826.9, 300 sec: 14826.9). Total num frames: 40337408. Throughput: 0: 3418.2. Samples: 76488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:47:47,942][133311] Avg episode reward: [(0, '5.114')] [2025-01-03 21:47:49,776][133387] Updated weights for policy 0, policy_version 9857 (0.0011) [2025-01-03 21:47:51,988][133387] Updated weights for policy 0, policy_version 9867 (0.0011) [2025-01-03 21:47:52,941][133311] Fps is (10 sec: 18431.8, 60 sec: 15560.3, 300 sec: 15560.3). Total num frames: 40431616. Throughput: 0: 3813.2. Samples: 104392. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:47:52,941][133311] Avg episode reward: [(0, '5.019')] [2025-01-03 21:47:54,216][133387] Updated weights for policy 0, policy_version 9877 (0.0011) [2025-01-03 21:47:56,494][133387] Updated weights for policy 0, policy_version 9887 (0.0011) [2025-01-03 21:47:57,941][133311] Fps is (10 sec: 18432.8, 60 sec: 15940.6, 300 sec: 15940.6). Total num frames: 40521728. Throughput: 0: 3657.8. Samples: 118424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:47:57,941][133311] Avg episode reward: [(0, '4.808')] [2025-01-03 21:47:58,698][133387] Updated weights for policy 0, policy_version 9897 (0.0010) [2025-01-03 21:48:00,884][133387] Updated weights for policy 0, policy_version 9907 (0.0010) [2025-01-03 21:48:02,941][133311] Fps is (10 sec: 18022.6, 60 sec: 16219.1, 300 sec: 16219.1). Total num frames: 40611840. 
Throughput: 0: 3902.4. Samples: 145858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:48:02,941][133311] Avg episode reward: [(0, '5.046')] [2025-01-03 21:48:03,221][133387] Updated weights for policy 0, policy_version 9917 (0.0011) [2025-01-03 21:48:05,494][133387] Updated weights for policy 0, policy_version 9927 (0.0011) [2025-01-03 21:48:07,682][133387] Updated weights for policy 0, policy_version 9937 (0.0011) [2025-01-03 21:48:07,941][133311] Fps is (10 sec: 18431.9, 60 sec: 16528.5, 300 sec: 16528.5). Total num frames: 40706048. Throughput: 0: 4083.2. Samples: 173032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:48:07,941][133311] Avg episode reward: [(0, '5.053')] [2025-01-03 21:48:09,880][133387] Updated weights for policy 0, policy_version 9947 (0.0010) [2025-01-03 21:48:12,063][133387] Updated weights for policy 0, policy_version 9957 (0.0010) [2025-01-03 21:48:12,941][133311] Fps is (10 sec: 18432.0, 60 sec: 16686.2, 300 sec: 16686.2). Total num frames: 40796160. Throughput: 0: 4157.7. Samples: 187094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:48:12,941][133311] Avg episode reward: [(0, '5.253')] [2025-01-03 21:48:14,318][133387] Updated weights for policy 0, policy_version 9967 (0.0011) [2025-01-03 21:48:16,612][133387] Updated weights for policy 0, policy_version 9977 (0.0011) [2025-01-03 21:48:17,941][133311] Fps is (10 sec: 18022.1, 60 sec: 16813.7, 300 sec: 16813.7). Total num frames: 40886272. Throughput: 0: 4580.3. Samples: 214568. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:17,942][133311] Avg episode reward: [(0, '5.613')] [2025-01-03 21:48:18,884][133387] Updated weights for policy 0, policy_version 9987 (0.0011) [2025-01-03 21:48:21,109][133387] Updated weights for policy 0, policy_version 9997 (0.0011) [2025-01-03 21:48:22,941][133311] Fps is (10 sec: 18022.4, 60 sec: 16919.1, 300 sec: 16919.1). Total num frames: 40976384. Throughput: 0: 4570.0. Samples: 241662. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:22,941][133311] Avg episode reward: [(0, '5.329')] [2025-01-03 21:48:23,462][133387] Updated weights for policy 0, policy_version 10007 (0.0012) [2025-01-03 21:48:25,638][133387] Updated weights for policy 0, policy_version 10017 (0.0010) [2025-01-03 21:48:27,841][133387] Updated weights for policy 0, policy_version 10027 (0.0010) [2025-01-03 21:48:27,941][133311] Fps is (10 sec: 18432.3, 60 sec: 17749.3, 300 sec: 17073.2). Total num frames: 41070592. Throughput: 0: 4566.7. Samples: 255264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:27,941][133311] Avg episode reward: [(0, '5.216')] [2025-01-03 21:48:30,087][133387] Updated weights for policy 0, policy_version 10037 (0.0010) [2025-01-03 21:48:32,373][133387] Updated weights for policy 0, policy_version 10047 (0.0011) [2025-01-03 21:48:32,941][133311] Fps is (10 sec: 18431.9, 60 sec: 18295.5, 300 sec: 17143.6). Total num frames: 41160704. Throughput: 0: 4583.2. Samples: 282730. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:32,941][133311] Avg episode reward: [(0, '5.425')] [2025-01-03 21:48:34,650][133387] Updated weights for policy 0, policy_version 10057 (0.0011) [2025-01-03 21:48:36,912][133387] Updated weights for policy 0, policy_version 10067 (0.0011) [2025-01-03 21:48:37,941][133311] Fps is (10 sec: 18021.7, 60 sec: 18227.1, 300 sec: 17204.2). Total num frames: 41250816. Throughput: 0: 4561.8. Samples: 309674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:37,942][133311] Avg episode reward: [(0, '5.825')] [2025-01-03 21:48:39,400][133387] Updated weights for policy 0, policy_version 10077 (0.0012) [2025-01-03 21:48:41,742][133387] Updated weights for policy 0, policy_version 10087 (0.0012) [2025-01-03 21:48:42,941][133311] Fps is (10 sec: 17612.6, 60 sec: 18158.9, 300 sec: 17204.2). Total num frames: 41336832. Throughput: 0: 4527.0. Samples: 322140. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:42,941][133311] Avg episode reward: [(0, '5.988')] [2025-01-03 21:48:44,037][133387] Updated weights for policy 0, policy_version 10097 (0.0011) [2025-01-03 21:48:46,280][133387] Updated weights for policy 0, policy_version 10107 (0.0011) [2025-01-03 21:48:47,941][133311] Fps is (10 sec: 17613.5, 60 sec: 18159.0, 300 sec: 17253.9). Total num frames: 41426944. Throughput: 0: 4515.3. Samples: 349046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:47,941][133311] Avg episode reward: [(0, '5.588')] [2025-01-03 21:48:48,577][133387] Updated weights for policy 0, policy_version 10117 (0.0012) [2025-01-03 21:48:50,846][133387] Updated weights for policy 0, policy_version 10127 (0.0011) [2025-01-03 21:48:52,941][133311] Fps is (10 sec: 18022.3, 60 sec: 18090.6, 300 sec: 17297.9). Total num frames: 41517056. Throughput: 0: 4506.6. Samples: 375828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:48:52,942][133311] Avg episode reward: [(0, '5.833')] [2025-01-03 21:48:53,166][133387] Updated weights for policy 0, policy_version 10137 (0.0012) [2025-01-03 21:48:55,406][133387] Updated weights for policy 0, policy_version 10147 (0.0010) [2025-01-03 21:48:57,698][133387] Updated weights for policy 0, policy_version 10157 (0.0011) [2025-01-03 21:48:57,941][133311] Fps is (10 sec: 18022.6, 60 sec: 18090.7, 300 sec: 17337.1). Total num frames: 41607168. Throughput: 0: 4498.0. Samples: 389506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:48:57,941][133311] Avg episode reward: [(0, '6.136')] [2025-01-03 21:48:59,956][133387] Updated weights for policy 0, policy_version 10167 (0.0011) [2025-01-03 21:49:02,173][133387] Updated weights for policy 0, policy_version 10177 (0.0010) [2025-01-03 21:49:02,941][133311] Fps is (10 sec: 18022.7, 60 sec: 18090.6, 300 sec: 17372.3). Total num frames: 41697280. Throughput: 0: 4490.6. Samples: 416642. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:02,941][133311] Avg episode reward: [(0, '5.546')] [2025-01-03 21:49:04,450][133387] Updated weights for policy 0, policy_version 10187 (0.0011) [2025-01-03 21:49:06,675][133387] Updated weights for policy 0, policy_version 10197 (0.0011) [2025-01-03 21:49:07,941][133311] Fps is (10 sec: 18022.3, 60 sec: 18022.4, 300 sec: 17404.0). Total num frames: 41787392. Throughput: 0: 4496.8. Samples: 444020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:07,941][133311] Avg episode reward: [(0, '6.122')] [2025-01-03 21:49:08,935][133387] Updated weights for policy 0, policy_version 10207 (0.0010) [2025-01-03 21:49:11,171][133387] Updated weights for policy 0, policy_version 10217 (0.0010) [2025-01-03 21:49:12,941][133311] Fps is (10 sec: 18022.1, 60 sec: 18022.3, 300 sec: 17432.8). Total num frames: 41877504. Throughput: 0: 4498.2. Samples: 457684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:12,942][133311] Avg episode reward: [(0, '6.491')] [2025-01-03 21:49:12,942][133362] Saving new best policy, reward=6.491! [2025-01-03 21:49:13,540][133387] Updated weights for policy 0, policy_version 10227 (0.0011) [2025-01-03 21:49:15,738][133387] Updated weights for policy 0, policy_version 10237 (0.0010) [2025-01-03 21:49:17,941][133311] Fps is (10 sec: 18022.5, 60 sec: 18022.5, 300 sec: 17459.1). Total num frames: 41967616. Throughput: 0: 4491.2. Samples: 484836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:49:17,941][133311] Avg episode reward: [(0, '5.907')] [2025-01-03 21:49:17,973][133362] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010247_41971712.pth... 
[2025-01-03 21:49:17,974][133387] Updated weights for policy 0, policy_version 10247 (0.0011) [2025-01-03 21:49:18,021][133362] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009441_38670336.pth [2025-01-03 21:49:20,236][133387] Updated weights for policy 0, policy_version 10257 (0.0010) [2025-01-03 21:49:22,446][133387] Updated weights for policy 0, policy_version 10267 (0.0011) [2025-01-03 21:49:22,941][133311] Fps is (10 sec: 18432.4, 60 sec: 18090.6, 300 sec: 17518.0). Total num frames: 42061824. Throughput: 0: 4501.9. Samples: 512258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:49:22,941][133311] Avg episode reward: [(0, '5.582')] [2025-01-03 21:49:24,702][133387] Updated weights for policy 0, policy_version 10277 (0.0011) [2025-01-03 21:49:26,969][133387] Updated weights for policy 0, policy_version 10287 (0.0011) [2025-01-03 21:49:27,941][133311] Fps is (10 sec: 18432.0, 60 sec: 18022.4, 300 sec: 17538.6). Total num frames: 42151936. Throughput: 0: 4529.7. Samples: 525974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:49:27,941][133311] Avg episode reward: [(0, '5.608')] [2025-01-03 21:49:29,258][133387] Updated weights for policy 0, policy_version 10297 (0.0011) [2025-01-03 21:49:31,505][133387] Updated weights for policy 0, policy_version 10307 (0.0011) [2025-01-03 21:49:32,941][133311] Fps is (10 sec: 18022.4, 60 sec: 18022.4, 300 sec: 17557.6). Total num frames: 42242048. Throughput: 0: 4530.4. Samples: 552916. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:32,941][133311] Avg episode reward: [(0, '5.427')] [2025-01-03 21:49:33,736][133387] Updated weights for policy 0, policy_version 10317 (0.0010) [2025-01-03 21:49:35,942][133387] Updated weights for policy 0, policy_version 10327 (0.0010) [2025-01-03 21:49:37,941][133311] Fps is (10 sec: 18022.4, 60 sec: 18022.6, 300 sec: 17575.1). Total num frames: 42332160. Throughput: 0: 4548.8. 
Samples: 580524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:37,941][133311] Avg episode reward: [(0, '6.002')] [2025-01-03 21:49:38,212][133387] Updated weights for policy 0, policy_version 10337 (0.0011) [2025-01-03 21:49:40,444][133387] Updated weights for policy 0, policy_version 10347 (0.0011) [2025-01-03 21:49:42,735][133387] Updated weights for policy 0, policy_version 10357 (0.0011) [2025-01-03 21:49:42,941][133311] Fps is (10 sec: 18022.3, 60 sec: 18090.7, 300 sec: 17591.4). Total num frames: 42422272. Throughput: 0: 4548.6. Samples: 594194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:49:42,941][133311] Avg episode reward: [(0, '6.317')] [2025-01-03 21:49:44,981][133387] Updated weights for policy 0, policy_version 10367 (0.0011) [2025-01-03 21:49:47,389][133387] Updated weights for policy 0, policy_version 10377 (0.0013) [2025-01-03 21:49:47,941][133311] Fps is (10 sec: 17612.1, 60 sec: 18022.3, 300 sec: 17577.7). Total num frames: 42508288. Throughput: 0: 4548.7. Samples: 621336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:47,942][133311] Avg episode reward: [(0, '6.116')] [2025-01-03 21:49:52,143][133387] Updated weights for policy 0, policy_version 10387 (0.0031) [2025-01-03 21:49:52,941][133311] Fps is (10 sec: 12697.3, 60 sec: 17203.2, 300 sec: 17259.3). Total num frames: 42549248. Throughput: 0: 4267.4. Samples: 636054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:52,942][133311] Avg episode reward: [(0, '5.446')] [2025-01-03 21:49:56,767][133387] Updated weights for policy 0, policy_version 10397 (0.0028) [2025-01-03 21:49:57,941][133311] Fps is (10 sec: 8601.5, 60 sec: 16452.1, 300 sec: 16988.6). Total num frames: 42594304. Throughput: 0: 4113.1. Samples: 642776. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:49:57,942][133311] Avg episode reward: [(0, '5.471')] [2025-01-03 21:50:00,997][133387] Updated weights for policy 0, policy_version 10407 (0.0027) [2025-01-03 21:50:02,941][133311] Fps is (10 sec: 9421.0, 60 sec: 15769.6, 300 sec: 16761.2). Total num frames: 42643456. Throughput: 0: 3826.2. Samples: 657016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:02,942][133311] Avg episode reward: [(0, '5.696')] [2025-01-03 21:50:05,155][133387] Updated weights for policy 0, policy_version 10417 (0.0026) [2025-01-03 21:50:07,941][133311] Fps is (10 sec: 9830.5, 60 sec: 15086.9, 300 sec: 16547.8). Total num frames: 42692608. Throughput: 0: 3542.8. Samples: 671684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:07,942][133311] Avg episode reward: [(0, '6.099')] [2025-01-03 21:50:09,303][133387] Updated weights for policy 0, policy_version 10427 (0.0025) [2025-01-03 21:50:12,909][133387] Updated weights for policy 0, policy_version 10437 (0.0022) [2025-01-03 21:50:12,941][133311] Fps is (10 sec: 10649.5, 60 sec: 14540.8, 300 sec: 16396.1). Total num frames: 42749952. Throughput: 0: 3418.0. Samples: 679784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:50:12,942][133311] Avg episode reward: [(0, '5.967')] [2025-01-03 21:50:16,349][133387] Updated weights for policy 0, policy_version 10447 (0.0020) [2025-01-03 21:50:17,941][133311] Fps is (10 sec: 11468.9, 60 sec: 13994.6, 300 sec: 16253.2). Total num frames: 42807296. Throughput: 0: 3204.1. Samples: 697102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:50:17,942][133311] Avg episode reward: [(0, '6.367')] [2025-01-03 21:50:19,851][133387] Updated weights for policy 0, policy_version 10457 (0.0022) [2025-01-03 21:50:22,941][133311] Fps is (10 sec: 11468.9, 60 sec: 13380.2, 300 sec: 16118.3). Total num frames: 42864640. Throughput: 0: 2978.3. Samples: 714550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:22,942][133311] Avg episode reward: [(0, '6.324')] [2025-01-03 21:50:23,404][133387] Updated weights for policy 0, policy_version 10467 (0.0021) [2025-01-03 21:50:26,875][133387] Updated weights for policy 0, policy_version 10477 (0.0021) [2025-01-03 21:50:27,941][133311] Fps is (10 sec: 11878.5, 60 sec: 12902.4, 300 sec: 16013.3). Total num frames: 42926080. Throughput: 0: 2870.8. Samples: 723382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:27,942][133311] Avg episode reward: [(0, '6.906')] [2025-01-03 21:50:27,949][133362] Saving new best policy, reward=6.906! [2025-01-03 21:50:30,372][133387] Updated weights for policy 0, policy_version 10487 (0.0022) [2025-01-03 21:50:32,941][133311] Fps is (10 sec: 11468.9, 60 sec: 12288.0, 300 sec: 15870.2). Total num frames: 42979328. Throughput: 0: 2656.3. Samples: 740868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:32,942][133311] Avg episode reward: [(0, '5.905')] [2025-01-03 21:50:34,111][133387] Updated weights for policy 0, policy_version 10497 (0.0023) [2025-01-03 21:50:37,691][133387] Updated weights for policy 0, policy_version 10507 (0.0022) [2025-01-03 21:50:37,941][133311] Fps is (10 sec: 11059.0, 60 sec: 11741.8, 300 sec: 15755.8). Total num frames: 43036672. Throughput: 0: 2700.5. Samples: 757578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:37,942][133311] Avg episode reward: [(0, '5.459')] [2025-01-03 21:50:41,641][133387] Updated weights for policy 0, policy_version 10517 (0.0024) [2025-01-03 21:50:42,941][133311] Fps is (10 sec: 10649.5, 60 sec: 11059.2, 300 sec: 15605.7). Total num frames: 43085824. Throughput: 0: 2725.7. Samples: 765432. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:42,942][133311] Avg episode reward: [(0, '5.663')] [2025-01-03 21:50:45,851][133387] Updated weights for policy 0, policy_version 10527 (0.0025) [2025-01-03 21:50:47,941][133311] Fps is (10 sec: 10239.9, 60 sec: 10513.1, 300 sec: 15483.2). Total num frames: 43139072. Throughput: 0: 2733.9. Samples: 780040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:50:47,942][133311] Avg episode reward: [(0, '5.592')] [2025-01-03 21:50:49,808][133387] Updated weights for policy 0, policy_version 10537 (0.0023) [2025-01-03 21:50:52,941][133311] Fps is (10 sec: 11059.2, 60 sec: 10786.2, 300 sec: 15386.4). Total num frames: 43196416. Throughput: 0: 2782.2. Samples: 796884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:50:52,942][133311] Avg episode reward: [(0, '5.709')] [2025-01-03 21:50:53,283][133387] Updated weights for policy 0, policy_version 10547 (0.0021) [2025-01-03 21:50:56,790][133387] Updated weights for policy 0, policy_version 10557 (0.0022) [2025-01-03 21:50:57,941][133311] Fps is (10 sec: 11468.9, 60 sec: 10991.0, 300 sec: 15294.2). Total num frames: 43253760. Throughput: 0: 2792.4. Samples: 805440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:50:57,942][133311] Avg episode reward: [(0, '5.213')] [2025-01-03 21:51:00,464][133387] Updated weights for policy 0, policy_version 10567 (0.0022) [2025-01-03 21:51:02,941][133311] Fps is (10 sec: 11059.2, 60 sec: 11059.2, 300 sec: 15187.4). Total num frames: 43307008. Throughput: 0: 2779.8. Samples: 822192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:51:02,942][133311] Avg episode reward: [(0, '4.821')] [2025-01-03 21:51:03,475][133311] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 133311], exiting... 
[2025-01-03 21:51:03,477][133311] Runner profile tree view: main_loop: 221.9517 [2025-01-03 21:51:03,477][133311] Collected {0: 43311104}, FPS: 14892.8 [2025-01-03 21:51:03,479][133362] Stopping Batcher_0... [2025-01-03 21:51:03,480][133362] Loop batcher_evt_loop terminating... [2025-01-03 21:51:03,501][133406] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,520][133362] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010574_43311104.pth... [2025-01-03 21:51:03,520][133406] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc7_evt_loop [2025-01-03 21:51:03,520][133390] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,526][133390] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc4_evt_loop [2025-01-03 21:51:03,513][133386] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,533][133386] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop [2025-01-03 21:51:03,539][133387] Weights refcount: 2 0 [2025-01-03 21:51:03,544][133387] Stopping InferenceWorker_p0-w0... [2025-01-03 21:51:03,544][133387] Loop inference_proc0-0_evt_loop terminating... 
[2025-01-03 21:51:03,543][133403] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,550][133403] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc5_evt_loop [2025-01-03 21:51:03,569][133389] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,582][133389] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc2_evt_loop [2025-01-03 21:51:03,575][133391] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,583][133391] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc3_evt_loop [2025-01-03 21:51:03,578][133388] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,594][133388] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc1_evt_loop [2025-01-03 21:51:03,579][133404] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-03 21:51:03,597][133404] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop [2025-01-03 21:51:03,601][133362] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009767_40005632.pth [2025-01-03 21:51:03,602][133362] Stopping LearnerWorker_p0... [2025-01-03 21:51:03,603][133362] Loop learner_proc0_evt_loop terminating... [2025-01-03 21:51:17,246][134211] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-03 21:51:17,246][134211] Rollout worker 0 uses device cpu [2025-01-03 21:51:17,246][134211] Rollout worker 1 uses device cpu [2025-01-03 21:51:17,246][134211] Rollout worker 2 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 3 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 4 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 5 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 6 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 7 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 8 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 9 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 10 uses device cpu [2025-01-03 21:51:17,247][134211] Rollout worker 11 uses device cpu [2025-01-03 21:51:17,303][134211] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:51:17,303][134211] InferenceWorker_p0-w0: min num requests: 4 [2025-01-03 21:51:17,338][134211] Starting all processes... [2025-01-03 21:51:17,338][134211] Starting process learner_proc0 [2025-01-03 21:51:18,736][134211] Starting all processes... 
[2025-01-03 21:51:18,739][134211] Starting process inference_proc0-0 [2025-01-03 21:51:18,739][134211] Starting process rollout_proc0 [2025-01-03 21:51:18,741][134264] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:51:18,742][134264] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-03 21:51:18,739][134211] Starting process rollout_proc1 [2025-01-03 21:51:18,739][134211] Starting process rollout_proc2 [2025-01-03 21:51:18,740][134211] Starting process rollout_proc3 [2025-01-03 21:51:18,740][134211] Starting process rollout_proc4 [2025-01-03 21:51:18,740][134211] Starting process rollout_proc5 [2025-01-03 21:51:18,754][134264] Num visible devices: 1 [2025-01-03 21:51:18,742][134211] Starting process rollout_proc6 [2025-01-03 21:51:18,763][134264] Starting seed is not provided [2025-01-03 21:51:18,763][134264] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:51:18,763][134264] Initializing actor-critic model on device cuda:0 [2025-01-03 21:51:18,764][134264] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:51:18,764][134264] RunningMeanStd input shape: (1,) [2025-01-03 21:51:18,742][134211] Starting process rollout_proc7 [2025-01-03 21:51:18,744][134211] Starting process rollout_proc8 [2025-01-03 21:51:18,748][134211] Starting process rollout_proc9 [2025-01-03 21:51:18,749][134211] Starting process rollout_proc10 [2025-01-03 21:51:18,749][134211] Starting process rollout_proc11 [2025-01-03 21:51:18,781][134264] ConvEncoder: input_channels=3 [2025-01-03 21:51:18,954][134264] Conv encoder output size: 512 [2025-01-03 21:51:18,955][134264] Policy head output size: 512 [2025-01-03 21:51:18,975][134264] Created Actor Critic model with architecture: [2025-01-03 21:51:18,976][134264] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) 
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-03 21:51:19,240][134264] Using optimizer [2025-01-03 21:51:21,031][134264] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010574_43311104.pth... [2025-01-03 21:51:21,094][134264] Loading model from checkpoint [2025-01-03 21:51:21,101][134264] Loaded experiment state at self.train_step=10574, self.env_steps=43311104 [2025-01-03 21:51:21,101][134264] Initialized policy 0 weights for model version 10574 [2025-01-03 21:51:21,105][134264] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:51:21,112][134264] LearnerWorker_p0 finished initialization! 
[2025-01-03 21:51:21,813][134312] Worker 4 uses CPU cores [4] [2025-01-03 21:51:21,846][134310] Worker 5 uses CPU cores [5] [2025-01-03 21:51:21,864][134296] Worker 0 uses CPU cores [0] [2025-01-03 21:51:21,887][134314] Worker 7 uses CPU cores [7] [2025-01-03 21:51:21,899][134294] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-03 21:51:21,900][134294] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-03 21:51:21,914][134294] Num visible devices: 1 [2025-01-03 21:51:22,003][134294] RunningMeanStd input shape: (3, 72, 128) [2025-01-03 21:51:22,004][134294] RunningMeanStd input shape: (1,) [2025-01-03 21:51:22,016][134294] ConvEncoder: input_channels=3 [2025-01-03 21:51:22,028][134308] Worker 3 uses CPU cores [3] [2025-01-03 21:51:22,118][134294] Conv encoder output size: 512 [2025-01-03 21:51:22,118][134294] Policy head output size: 512 [2025-01-03 21:51:22,121][134311] Worker 6 uses CPU cores [6] [2025-01-03 21:51:22,147][134295] Worker 2 uses CPU cores [2] [2025-01-03 21:51:22,150][134317] Worker 10 uses CPU cores [10] [2025-01-03 21:51:22,155][134315] Worker 11 uses CPU cores [11] [2025-01-03 21:51:22,161][134293] Worker 1 uses CPU cores [1] [2025-01-03 21:51:22,178][134313] Worker 8 uses CPU cores [8] [2025-01-03 21:51:22,365][134211] Inference worker 0-0 is ready! [2025-01-03 21:51:22,366][134211] All inference workers are ready! Signal rollout workers to start! [2025-01-03 21:51:22,366][134211] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 43311104. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 21:51:22,382][134316] Worker 9 uses CPU cores [9] [2025-01-03 21:51:22,418][134315] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,418][134308] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,418][134317] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,418][134311] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,420][134313] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,420][134293] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,420][134295] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,420][134314] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,443][134296] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,460][134312] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,471][134316] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,493][134310] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-03 21:51:22,755][134308] Decorrelating experience for 0 frames... [2025-01-03 21:51:22,774][134295] Decorrelating experience for 0 frames... [2025-01-03 21:51:22,822][134315] Decorrelating experience for 0 frames... [2025-01-03 21:51:22,827][134317] Decorrelating experience for 0 frames... [2025-01-03 21:51:22,842][134312] Decorrelating experience for 0 frames... [2025-01-03 21:51:22,882][134310] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,021][134311] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,096][134293] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,168][134315] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,173][134312] Decorrelating experience for 32 frames... 
[2025-01-03 21:51:23,228][134317] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,228][134310] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,278][134311] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,288][134314] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,359][134316] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,439][134295] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,562][134312] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,589][134293] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,602][134316] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,634][134314] Decorrelating experience for 32 frames... [2025-01-03 21:51:23,692][134315] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,705][134317] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,752][134295] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,877][134312] Decorrelating experience for 96 frames... [2025-01-03 21:51:23,911][134293] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,919][134311] Decorrelating experience for 64 frames... [2025-01-03 21:51:23,964][134313] Decorrelating experience for 0 frames... [2025-01-03 21:51:23,967][134211] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 43311104. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-03 21:51:24,005][134296] Decorrelating experience for 0 frames... [2025-01-03 21:51:24,009][134310] Decorrelating experience for 64 frames... [2025-01-03 21:51:24,192][134315] Decorrelating experience for 96 frames... [2025-01-03 21:51:24,220][134311] Decorrelating experience for 96 frames... [2025-01-03 21:51:24,293][134314] Decorrelating experience for 64 frames... [2025-01-03 21:51:24,377][134293] Decorrelating experience for 96 frames... 
[2025-01-03 21:51:24,489][134310] Decorrelating experience for 96 frames... [2025-01-03 21:51:24,608][134296] Decorrelating experience for 32 frames... [2025-01-03 21:51:24,709][134314] Decorrelating experience for 96 frames... [2025-01-03 21:51:24,770][134313] Decorrelating experience for 32 frames... [2025-01-03 21:51:24,930][134316] Decorrelating experience for 64 frames... [2025-01-03 21:51:25,000][134296] Decorrelating experience for 64 frames... [2025-01-03 21:51:25,097][134295] Decorrelating experience for 96 frames... [2025-01-03 21:51:25,160][134264] Signal inference workers to stop experience collection... [2025-01-03 21:51:25,172][134294] InferenceWorker_p0-w0: stopping experience collection [2025-01-03 21:51:25,250][134313] Decorrelating experience for 64 frames... [2025-01-03 21:51:25,380][134296] Decorrelating experience for 96 frames... [2025-01-03 21:51:25,393][134308] Decorrelating experience for 32 frames... [2025-01-03 21:51:25,428][134316] Decorrelating experience for 96 frames... [2025-01-03 21:51:25,454][134317] Decorrelating experience for 96 frames... [2025-01-03 21:51:25,575][134313] Decorrelating experience for 96 frames... [2025-01-03 21:51:25,772][134308] Decorrelating experience for 64 frames... [2025-01-03 21:51:26,110][134308] Decorrelating experience for 96 frames... [2025-01-03 21:51:27,482][134264] Signal inference workers to resume experience collection... [2025-01-03 21:51:27,483][134294] InferenceWorker_p0-w0: resuming experience collection [2025-01-03 21:51:28,970][134211] Fps is (10 sec: 4341.8, 60 sec: 4341.8, 300 sec: 4341.8). Total num frames: 43339776. Throughput: 0: 739.6. Samples: 4884. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-03 21:51:28,970][134211] Avg episode reward: [(0, '3.956')] [2025-01-03 21:51:29,447][134294] Updated weights for policy 0, policy_version 10584 (0.0074) [2025-01-03 21:51:31,577][134294] Updated weights for policy 0, policy_version 10594 (0.0017) [2025-01-03 21:51:33,836][134294] Updated weights for policy 0, policy_version 10604 (0.0017) [2025-01-03 21:51:33,968][134211] Fps is (10 sec: 12287.8, 60 sec: 10591.8, 300 sec: 10591.8). Total num frames: 43433984. Throughput: 0: 1688.8. Samples: 19592. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-03 21:51:33,968][134211] Avg episode reward: [(0, '5.230')] [2025-01-03 21:51:36,301][134294] Updated weights for policy 0, policy_version 10614 (0.0017) [2025-01-03 21:51:37,297][134211] Heartbeat connected on Batcher_0 [2025-01-03 21:51:37,300][134211] Heartbeat connected on LearnerWorker_p0 [2025-01-03 21:51:37,313][134211] Heartbeat connected on RolloutWorker_w0 [2025-01-03 21:51:37,313][134211] Heartbeat connected on RolloutWorker_w1 [2025-01-03 21:51:37,313][134211] Heartbeat connected on RolloutWorker_w2 [2025-01-03 21:51:37,315][134211] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-03 21:51:37,322][134211] Heartbeat connected on RolloutWorker_w4 [2025-01-03 21:51:37,325][134211] Heartbeat connected on RolloutWorker_w6 [2025-01-03 21:51:37,325][134211] Heartbeat connected on RolloutWorker_w5 [2025-01-03 21:51:37,329][134211] Heartbeat connected on RolloutWorker_w7 [2025-01-03 21:51:37,330][134211] Heartbeat connected on RolloutWorker_w8 [2025-01-03 21:51:37,333][134211] Heartbeat connected on RolloutWorker_w3 [2025-01-03 21:51:37,337][134211] Heartbeat connected on RolloutWorker_w10 [2025-01-03 21:51:37,340][134211] Heartbeat connected on RolloutWorker_w11 [2025-01-03 21:51:37,341][134211] Heartbeat connected on RolloutWorker_w9 [2025-01-03 21:51:38,679][134294] Updated weights for policy 0, policy_version 10624 (0.0020) [2025-01-03 
21:51:38,968][134211] Fps is (10 sec: 18026.6, 60 sec: 12583.0, 300 sec: 12583.0). Total num frames: 43520000. Throughput: 0: 2754.1. Samples: 45722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:51:38,968][134211] Avg episode reward: [(0, '5.408')] [2025-01-03 21:51:40,976][134294] Updated weights for policy 0, policy_version 10634 (0.0017) [2025-01-03 21:51:43,427][134294] Updated weights for policy 0, policy_version 10644 (0.0016) [2025-01-03 21:51:43,968][134211] Fps is (10 sec: 17203.4, 60 sec: 13652.5, 300 sec: 13652.5). Total num frames: 43606016. Throughput: 0: 3322.5. Samples: 71770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:51:43,968][134211] Avg episode reward: [(0, '5.209')] [2025-01-03 21:51:45,848][134294] Updated weights for policy 0, policy_version 10654 (0.0020) [2025-01-03 21:51:48,545][134294] Updated weights for policy 0, policy_version 10664 (0.0021) [2025-01-03 21:51:48,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14011.8, 300 sec: 14011.8). Total num frames: 43683840. Throughput: 0: 3161.6. Samples: 84104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:51:48,968][134211] Avg episode reward: [(0, '6.176')] [2025-01-03 21:51:51,496][134294] Updated weights for policy 0, policy_version 10674 (0.0023) [2025-01-03 21:51:53,968][134211] Fps is (10 sec: 14334.8, 60 sec: 13868.4, 300 sec: 13868.4). Total num frames: 43749376. Throughput: 0: 3336.2. Samples: 105430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:51:53,969][134211] Avg episode reward: [(0, '5.905')] [2025-01-03 21:51:54,766][134294] Updated weights for policy 0, policy_version 10684 (0.0020) [2025-01-03 21:51:57,142][134294] Updated weights for policy 0, policy_version 10694 (0.0013) [2025-01-03 21:51:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13988.5, 300 sec: 13988.5). Total num frames: 43823104. Throughput: 0: 3486.7. Samples: 127620. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:51:58,968][134211] Avg episode reward: [(0, '6.325')] [2025-01-03 21:52:00,292][134294] Updated weights for policy 0, policy_version 10704 (0.0028) [2025-01-03 21:52:03,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13784.1, 300 sec: 13784.1). Total num frames: 43884544. Throughput: 0: 3283.0. Samples: 136578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:52:03,968][134211] Avg episode reward: [(0, '5.790')] [2025-01-03 21:52:03,969][134294] Updated weights for policy 0, policy_version 10714 (0.0025) [2025-01-03 21:52:07,470][134294] Updated weights for policy 0, policy_version 10724 (0.0026) [2025-01-03 21:52:08,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13447.8, 300 sec: 13447.8). Total num frames: 43937792. Throughput: 0: 3417.3. Samples: 153778. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:52:08,969][134211] Avg episode reward: [(0, '5.809')] [2025-01-03 21:52:10,953][134294] Updated weights for policy 0, policy_version 10734 (0.0028) [2025-01-03 21:52:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13414.8, 300 sec: 13414.8). Total num frames: 44003328. Throughput: 0: 3716.5. Samples: 172118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:52:13,968][134211] Avg episode reward: [(0, '5.614')] [2025-01-03 21:52:14,144][134294] Updated weights for policy 0, policy_version 10744 (0.0026) [2025-01-03 21:52:17,239][134294] Updated weights for policy 0, policy_version 10754 (0.0025) [2025-01-03 21:52:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13387.6, 300 sec: 13387.6). Total num frames: 44068864. Throughput: 0: 3602.1. Samples: 181688. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:52:18,968][134211] Avg episode reward: [(0, '5.929')] [2025-01-03 21:52:20,377][134294] Updated weights for policy 0, policy_version 10764 (0.0030) [2025-01-03 21:52:23,478][134294] Updated weights for policy 0, policy_version 10774 (0.0027) [2025-01-03 21:52:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13721.5, 300 sec: 13364.9). Total num frames: 44134400. Throughput: 0: 3464.8. Samples: 201638. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:52:23,968][134211] Avg episode reward: [(0, '6.018')] [2025-01-03 21:52:26,634][134294] Updated weights for policy 0, policy_version 10784 (0.0027) [2025-01-03 21:52:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.5, 300 sec: 13345.5). Total num frames: 44199936. Throughput: 0: 3309.3. Samples: 220690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:52:28,969][134211] Avg episode reward: [(0, '5.437')] [2025-01-03 21:52:29,958][134294] Updated weights for policy 0, policy_version 10794 (0.0024) [2025-01-03 21:52:33,003][134294] Updated weights for policy 0, policy_version 10804 (0.0026) [2025-01-03 21:52:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 13328.9). Total num frames: 44265472. Throughput: 0: 3249.3. Samples: 230322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:52:33,968][134211] Avg episode reward: [(0, '5.346')] [2025-01-03 21:52:36,168][134294] Updated weights for policy 0, policy_version 10814 (0.0025) [2025-01-03 21:52:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13448.5, 300 sec: 13260.9). Total num frames: 44326912. Throughput: 0: 3213.4. Samples: 250030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:52:38,969][134211] Avg episode reward: [(0, '5.448')] [2025-01-03 21:52:39,709][134294] Updated weights for policy 0, policy_version 10824 (0.0030) [2025-01-03 21:52:43,936][134294] Updated weights for policy 0, policy_version 10834 (0.0032) [2025-01-03 21:52:43,970][134211] Fps is (10 sec: 11056.9, 60 sec: 12833.6, 300 sec: 13050.4). Total num frames: 44376064. Throughput: 0: 3061.1. Samples: 265376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:52:43,971][134211] Avg episode reward: [(0, '5.370')] [2025-01-03 21:52:47,849][134294] Updated weights for policy 0, policy_version 10844 (0.0031) [2025-01-03 21:52:48,968][134211] Fps is (10 sec: 9830.6, 60 sec: 12356.3, 300 sec: 12864.8). Total num frames: 44425216. Throughput: 0: 3016.1. Samples: 272302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:52:48,968][134211] Avg episode reward: [(0, '5.751')] [2025-01-03 21:52:51,759][134294] Updated weights for policy 0, policy_version 10854 (0.0028) [2025-01-03 21:52:53,968][134211] Fps is (10 sec: 11061.5, 60 sec: 12288.1, 300 sec: 12833.3). Total num frames: 44486656. Throughput: 0: 3009.7. Samples: 289214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:52:53,968][134211] Avg episode reward: [(0, '5.514')] [2025-01-03 21:52:54,946][134294] Updated weights for policy 0, policy_version 10864 (0.0029) [2025-01-03 21:52:58,146][134294] Updated weights for policy 0, policy_version 10874 (0.0025) [2025-01-03 21:52:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12083.2, 300 sec: 12805.1). Total num frames: 44548096. Throughput: 0: 3024.1. Samples: 308202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:52:58,968][134211] Avg episode reward: [(0, '5.181')] [2025-01-03 21:53:01,403][134294] Updated weights for policy 0, policy_version 10884 (0.0028) [2025-01-03 21:53:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12083.2, 300 sec: 12779.7). 
Total num frames: 44609536. Throughput: 0: 3024.4. Samples: 317784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:03,968][134211] Avg episode reward: [(0, '5.150')] [2025-01-03 21:53:04,698][134294] Updated weights for policy 0, policy_version 10894 (0.0024) [2025-01-03 21:53:08,118][134294] Updated weights for policy 0, policy_version 10904 (0.0029) [2025-01-03 21:53:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12219.7, 300 sec: 12756.6). Total num frames: 44670976. Throughput: 0: 2990.3. Samples: 336200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:08,968][134211] Avg episode reward: [(0, '5.335')] [2025-01-03 21:53:11,557][134294] Updated weights for policy 0, policy_version 10914 (0.0030) [2025-01-03 21:53:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12083.2, 300 sec: 12698.9). Total num frames: 44728320. Throughput: 0: 2954.1. Samples: 353624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:53:13,968][134211] Avg episode reward: [(0, '5.111')] [2025-01-03 21:53:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010920_44728320.pth... [2025-01-03 21:53:14,074][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010247_41971712.pth [2025-01-03 21:53:15,173][134294] Updated weights for policy 0, policy_version 10924 (0.0027) [2025-01-03 21:53:18,437][134294] Updated weights for policy 0, policy_version 10934 (0.0030) [2025-01-03 21:53:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12014.9, 300 sec: 12681.3). Total num frames: 44789760. Throughput: 0: 2934.3. Samples: 362366. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:18,968][134211] Avg episode reward: [(0, '5.310')] [2025-01-03 21:53:21,598][134294] Updated weights for policy 0, policy_version 10944 (0.0023) [2025-01-03 21:53:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 12014.9, 300 sec: 12698.8). Total num frames: 44855296. Throughput: 0: 2927.4. Samples: 381762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:23,969][134211] Avg episode reward: [(0, '5.377')] [2025-01-03 21:53:24,851][134294] Updated weights for policy 0, policy_version 10954 (0.0027) [2025-01-03 21:53:27,998][134294] Updated weights for policy 0, policy_version 10964 (0.0025) [2025-01-03 21:53:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 11946.7, 300 sec: 12682.6). Total num frames: 44916736. Throughput: 0: 3006.0. Samples: 400638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:53:28,968][134211] Avg episode reward: [(0, '5.196')] [2025-01-03 21:53:31,076][134294] Updated weights for policy 0, policy_version 10974 (0.0025) [2025-01-03 21:53:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12014.9, 300 sec: 12729.8). Total num frames: 44986368. Throughput: 0: 3076.4. Samples: 410742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:53:33,969][134211] Avg episode reward: [(0, '5.591')] [2025-01-03 21:53:34,297][134294] Updated weights for policy 0, policy_version 10984 (0.0025) [2025-01-03 21:53:37,427][134294] Updated weights for policy 0, policy_version 10994 (0.0026) [2025-01-03 21:53:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12014.9, 300 sec: 12713.6). Total num frames: 45047808. Throughput: 0: 3133.2. Samples: 430206. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:38,969][134211] Avg episode reward: [(0, '5.677')] [2025-01-03 21:53:40,834][134294] Updated weights for policy 0, policy_version 11004 (0.0027) [2025-01-03 21:53:43,920][134294] Updated weights for policy 0, policy_version 11014 (0.0028) [2025-01-03 21:53:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12288.5, 300 sec: 12727.5). Total num frames: 45113344. Throughput: 0: 3132.5. Samples: 449166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:43,968][134211] Avg episode reward: [(0, '5.882')] [2025-01-03 21:53:47,022][134294] Updated weights for policy 0, policy_version 11024 (0.0026) [2025-01-03 21:53:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12561.0, 300 sec: 12740.5). Total num frames: 45178880. Throughput: 0: 3138.7. Samples: 459024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:53:48,969][134211] Avg episode reward: [(0, '5.757')] [2025-01-03 21:53:50,108][134294] Updated weights for policy 0, policy_version 11034 (0.0025) [2025-01-03 21:53:53,055][134294] Updated weights for policy 0, policy_version 11044 (0.0023) [2025-01-03 21:53:53,971][134211] Fps is (10 sec: 13103.4, 60 sec: 12628.7, 300 sec: 12752.3). Total num frames: 45244416. Throughput: 0: 3180.1. Samples: 479314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:53:53,971][134211] Avg episode reward: [(0, '6.106')] [2025-01-03 21:53:56,094][134294] Updated weights for policy 0, policy_version 11054 (0.0023) [2025-01-03 21:53:58,968][134211] Fps is (10 sec: 13107.5, 60 sec: 12697.6, 300 sec: 12763.9). Total num frames: 45309952. Throughput: 0: 3231.6. Samples: 499044. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:53:58,968][134211] Avg episode reward: [(0, '5.938')] [2025-01-03 21:53:59,391][134294] Updated weights for policy 0, policy_version 11064 (0.0024) [2025-01-03 21:54:02,365][134294] Updated weights for policy 0, policy_version 11074 (0.0025) [2025-01-03 21:54:03,968][134211] Fps is (10 sec: 13110.9, 60 sec: 12765.8, 300 sec: 12774.5). Total num frames: 45375488. Throughput: 0: 3256.0. Samples: 508886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:54:03,968][134211] Avg episode reward: [(0, '5.745')] [2025-01-03 21:54:05,577][134294] Updated weights for policy 0, policy_version 11084 (0.0024) [2025-01-03 21:54:08,500][134294] Updated weights for policy 0, policy_version 11094 (0.0026) [2025-01-03 21:54:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12902.4, 300 sec: 12809.1). Total num frames: 45445120. Throughput: 0: 3271.3. Samples: 528970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:54:08,968][134211] Avg episode reward: [(0, '5.715')] [2025-01-03 21:54:11,698][134294] Updated weights for policy 0, policy_version 11104 (0.0027) [2025-01-03 21:54:13,970][134211] Fps is (10 sec: 13513.5, 60 sec: 13038.4, 300 sec: 12817.6). Total num frames: 45510656. Throughput: 0: 3283.1. Samples: 548386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:54:13,971][134211] Avg episode reward: [(0, '5.074')] [2025-01-03 21:54:14,953][134294] Updated weights for policy 0, policy_version 11114 (0.0028) [2025-01-03 21:54:18,287][134294] Updated weights for policy 0, policy_version 11124 (0.0026) [2025-01-03 21:54:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13038.9, 300 sec: 12802.8). Total num frames: 45572096. Throughput: 0: 3260.6. Samples: 557468. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:54:18,968][134211] Avg episode reward: [(0, '5.118')] [2025-01-03 21:54:21,419][134294] Updated weights for policy 0, policy_version 11134 (0.0026) [2025-01-03 21:54:23,968][134211] Fps is (10 sec: 12699.9, 60 sec: 13038.8, 300 sec: 12811.1). Total num frames: 45637632. Throughput: 0: 3263.8. Samples: 577080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:54:23,969][134211] Avg episode reward: [(0, '4.942')] [2025-01-03 21:54:24,497][134294] Updated weights for policy 0, policy_version 11144 (0.0024) [2025-01-03 21:54:27,613][134294] Updated weights for policy 0, policy_version 11154 (0.0027) [2025-01-03 21:54:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12819.1). Total num frames: 45703168. Throughput: 0: 3276.6. Samples: 596612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:54:28,969][134211] Avg episode reward: [(0, '5.081')] [2025-01-03 21:54:30,620][134294] Updated weights for policy 0, policy_version 11164 (0.0027) [2025-01-03 21:54:33,674][134294] Updated weights for policy 0, policy_version 11174 (0.0026) [2025-01-03 21:54:33,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13038.9, 300 sec: 12826.6). Total num frames: 45768704. Throughput: 0: 3287.1. Samples: 606944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:54:33,969][134211] Avg episode reward: [(0, '4.945')] [2025-01-03 21:54:36,786][134294] Updated weights for policy 0, policy_version 11184 (0.0026) [2025-01-03 21:54:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12833.8). Total num frames: 45834240. Throughput: 0: 3278.3. Samples: 626830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:54:38,968][134211] Avg episode reward: [(0, '5.257')] [2025-01-03 21:54:40,390][134294] Updated weights for policy 0, policy_version 11194 (0.0025) [2025-01-03 21:54:43,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12902.4, 300 sec: 12779.6). 
Total num frames: 45887488. Throughput: 0: 3216.8. Samples: 643798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:54:43,968][134211] Avg episode reward: [(0, '4.911')] [2025-01-03 21:54:44,090][134294] Updated weights for policy 0, policy_version 11204 (0.0029) [2025-01-03 21:54:48,169][134294] Updated weights for policy 0, policy_version 11214 (0.0032) [2025-01-03 21:54:48,968][134211] Fps is (10 sec: 10649.6, 60 sec: 12697.6, 300 sec: 12728.0). Total num frames: 45940736. Throughput: 0: 3160.7. Samples: 651118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:54:48,968][134211] Avg episode reward: [(0, '5.226')] [2025-01-03 21:54:51,726][134294] Updated weights for policy 0, policy_version 11224 (0.0026) [2025-01-03 21:54:53,968][134211] Fps is (10 sec: 11059.1, 60 sec: 12561.7, 300 sec: 12698.3). Total num frames: 45998080. Throughput: 0: 3082.5. Samples: 667684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:54:53,968][134211] Avg episode reward: [(0, '5.173')] [2025-01-03 21:54:55,187][134294] Updated weights for policy 0, policy_version 11234 (0.0031) [2025-01-03 21:54:58,414][134294] Updated weights for policy 0, policy_version 11244 (0.0028) [2025-01-03 21:54:58,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12492.8, 300 sec: 12688.8). Total num frames: 46059520. Throughput: 0: 3063.0. Samples: 686212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:54:58,968][134211] Avg episode reward: [(0, '5.547')] [2025-01-03 21:55:01,780][134294] Updated weights for policy 0, policy_version 11254 (0.0027) [2025-01-03 21:55:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12424.5, 300 sec: 12679.8). Total num frames: 46120960. Throughput: 0: 3063.8. Samples: 695338. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:55:03,969][134211] Avg episode reward: [(0, '6.671')] [2025-01-03 21:55:05,248][134294] Updated weights for policy 0, policy_version 11264 (0.0028) [2025-01-03 21:55:08,701][134294] Updated weights for policy 0, policy_version 11274 (0.0028) [2025-01-03 21:55:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12288.0, 300 sec: 12671.1). Total num frames: 46182400. Throughput: 0: 3026.6. Samples: 713274. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:55:08,968][134211] Avg episode reward: [(0, '5.596')] [2025-01-03 21:55:12,145][134294] Updated weights for policy 0, policy_version 11284 (0.0027) [2025-01-03 21:55:13,968][134211] Fps is (10 sec: 11877.9, 60 sec: 12151.9, 300 sec: 12645.1). Total num frames: 46239744. Throughput: 0: 2989.2. Samples: 731130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:55:13,969][134211] Avg episode reward: [(0, '5.894')] [2025-01-03 21:55:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011289_46239744.pth... [2025-01-03 21:55:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010574_43311104.pth [2025-01-03 21:55:15,599][134294] Updated weights for policy 0, policy_version 11294 (0.0026) [2025-01-03 21:55:18,743][134294] Updated weights for policy 0, policy_version 11304 (0.0029) [2025-01-03 21:55:18,968][134211] Fps is (10 sec: 11877.7, 60 sec: 12151.4, 300 sec: 12637.6). Total num frames: 46301184. Throughput: 0: 2966.2. Samples: 740422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:55:18,969][134211] Avg episode reward: [(0, '5.484')] [2025-01-03 21:55:22,530][134294] Updated weights for policy 0, policy_version 11314 (0.0034) [2025-01-03 21:55:23,970][134211] Fps is (10 sec: 11876.6, 60 sec: 12014.7, 300 sec: 12613.3). Total num frames: 46358528. Throughput: 0: 2908.7. 
Samples: 757728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:55:23,971][134211] Avg episode reward: [(0, '5.428')] [2025-01-03 21:55:26,004][134294] Updated weights for policy 0, policy_version 11324 (0.0028) [2025-01-03 21:55:28,968][134211] Fps is (10 sec: 11469.3, 60 sec: 11878.4, 300 sec: 12590.2). Total num frames: 46415872. Throughput: 0: 2919.1. Samples: 775156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:28,968][134211] Avg episode reward: [(0, '5.949')] [2025-01-03 21:55:29,512][134294] Updated weights for policy 0, policy_version 11334 (0.0026) [2025-01-03 21:55:32,813][134294] Updated weights for policy 0, policy_version 11344 (0.0027) [2025-01-03 21:55:33,968][134211] Fps is (10 sec: 11880.4, 60 sec: 11810.1, 300 sec: 12584.2). Total num frames: 46477312. Throughput: 0: 2957.6. Samples: 784212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:33,969][134211] Avg episode reward: [(0, '6.239')] [2025-01-03 21:55:35,838][134294] Updated weights for policy 0, policy_version 11354 (0.0024) [2025-01-03 21:55:38,890][134294] Updated weights for policy 0, policy_version 11364 (0.0025) [2025-01-03 21:55:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 11878.4, 300 sec: 12610.4). Total num frames: 46546944. Throughput: 0: 3037.6. Samples: 804374. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:38,968][134211] Avg episode reward: [(0, '5.742')] [2025-01-03 21:55:41,958][134294] Updated weights for policy 0, policy_version 11374 (0.0023) [2025-01-03 21:55:43,968][134211] Fps is (10 sec: 13107.7, 60 sec: 12014.9, 300 sec: 12604.2). Total num frames: 46608384. Throughput: 0: 3057.9. Samples: 823818. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:43,969][134211] Avg episode reward: [(0, '5.321')] [2025-01-03 21:55:45,265][134294] Updated weights for policy 0, policy_version 11384 (0.0027) [2025-01-03 21:55:48,239][134294] Updated weights for policy 0, policy_version 11394 (0.0027) [2025-01-03 21:55:48,969][134211] Fps is (10 sec: 13105.3, 60 sec: 12287.7, 300 sec: 12628.9). Total num frames: 46678016. Throughput: 0: 3073.4. Samples: 833644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:48,970][134211] Avg episode reward: [(0, '4.959')] [2025-01-03 21:55:51,382][134294] Updated weights for policy 0, policy_version 11404 (0.0027) [2025-01-03 21:55:53,968][134211] Fps is (10 sec: 13515.9, 60 sec: 12424.4, 300 sec: 12637.8). Total num frames: 46743552. Throughput: 0: 3117.5. Samples: 853566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:55:53,969][134211] Avg episode reward: [(0, '4.784')] [2025-01-03 21:55:54,506][134294] Updated weights for policy 0, policy_version 11414 (0.0026) [2025-01-03 21:55:57,642][134294] Updated weights for policy 0, policy_version 11424 (0.0026) [2025-01-03 21:55:58,968][134211] Fps is (10 sec: 13109.0, 60 sec: 12492.8, 300 sec: 12646.3). Total num frames: 46809088. Throughput: 0: 3153.5. Samples: 873036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:55:58,969][134211] Avg episode reward: [(0, '5.505')] [2025-01-03 21:56:00,791][134294] Updated weights for policy 0, policy_version 11434 (0.0025) [2025-01-03 21:56:03,912][134294] Updated weights for policy 0, policy_version 11444 (0.0028) [2025-01-03 21:56:03,968][134211] Fps is (10 sec: 13107.7, 60 sec: 12561.0, 300 sec: 12654.5). Total num frames: 46874624. Throughput: 0: 3169.1. Samples: 883030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:56:03,969][134211] Avg episode reward: [(0, '5.376')] [2025-01-03 21:56:06,870][134294] Updated weights for policy 0, policy_version 11454 (0.0026) [2025-01-03 21:56:08,970][134211] Fps is (10 sec: 13104.8, 60 sec: 12628.9, 300 sec: 12662.3). Total num frames: 46940160. Throughput: 0: 3229.1. Samples: 903038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:56:08,970][134211] Avg episode reward: [(0, '5.143')] [2025-01-03 21:56:10,070][134294] Updated weights for policy 0, policy_version 11464 (0.0029) [2025-01-03 21:56:13,236][134294] Updated weights for policy 0, policy_version 11474 (0.0025) [2025-01-03 21:56:13,968][134211] Fps is (10 sec: 13107.6, 60 sec: 12766.0, 300 sec: 12670.0). Total num frames: 47005696. Throughput: 0: 3267.1. Samples: 922176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:56:13,968][134211] Avg episode reward: [(0, '5.115')] [2025-01-03 21:56:16,455][134294] Updated weights for policy 0, policy_version 11484 (0.0029) [2025-01-03 21:56:18,968][134211] Fps is (10 sec: 12700.0, 60 sec: 12766.0, 300 sec: 12732.3). Total num frames: 47067136. Throughput: 0: 3279.3. Samples: 931778. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 21:56:18,968][134211] Avg episode reward: [(0, '5.133')] [2025-01-03 21:56:19,758][134294] Updated weights for policy 0, policy_version 11494 (0.0024) [2025-01-03 21:56:22,865][134294] Updated weights for policy 0, policy_version 11504 (0.0026) [2025-01-03 21:56:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12902.8, 300 sec: 12857.4). Total num frames: 47132672. Throughput: 0: 3260.1. Samples: 951078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:56:23,968][134211] Avg episode reward: [(0, '5.221')] [2025-01-03 21:56:26,023][134294] Updated weights for policy 0, policy_version 11514 (0.0025) [2025-01-03 21:56:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 12760.1). 
Total num frames: 47198208. Throughput: 0: 3254.0. Samples: 970246. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:56:28,968][134211] Avg episode reward: [(0, '5.293')] [2025-01-03 21:56:29,268][134294] Updated weights for policy 0, policy_version 11524 (0.0027) [2025-01-03 21:56:32,217][134294] Updated weights for policy 0, policy_version 11534 (0.0026) [2025-01-03 21:56:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13107.3, 300 sec: 12690.7). Total num frames: 47263744. Throughput: 0: 3260.2. Samples: 980348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:56:33,968][134211] Avg episode reward: [(0, '5.107')] [2025-01-03 21:56:35,329][134294] Updated weights for policy 0, policy_version 11544 (0.0026) [2025-01-03 21:56:38,535][134294] Updated weights for policy 0, policy_version 11554 (0.0027) [2025-01-03 21:56:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13039.0, 300 sec: 12621.2). Total num frames: 47329280. Throughput: 0: 3256.9. Samples: 1000122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 21:56:38,968][134211] Avg episode reward: [(0, '4.994')] [2025-01-03 21:56:41,881][134294] Updated weights for policy 0, policy_version 11564 (0.0025) [2025-01-03 21:56:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 12551.8). Total num frames: 47386624. Throughput: 0: 3226.8. Samples: 1018242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:56:43,968][134211] Avg episode reward: [(0, '4.659')] [2025-01-03 21:56:45,435][134294] Updated weights for policy 0, policy_version 11574 (0.0025) [2025-01-03 21:56:48,823][134294] Updated weights for policy 0, policy_version 11584 (0.0029) [2025-01-03 21:56:48,968][134211] Fps is (10 sec: 11877.8, 60 sec: 12834.4, 300 sec: 12537.9). Total num frames: 47448064. Throughput: 0: 3199.1. Samples: 1026988. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:56:48,969][134211] Avg episode reward: [(0, '4.986')] [2025-01-03 21:56:52,017][134294] Updated weights for policy 0, policy_version 11594 (0.0026) [2025-01-03 21:56:53,968][134211] Fps is (10 sec: 12696.6, 60 sec: 12834.1, 300 sec: 12510.1). Total num frames: 47513600. Throughput: 0: 3177.2. Samples: 1046008. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:56:53,969][134211] Avg episode reward: [(0, '5.264')] [2025-01-03 21:56:55,027][134294] Updated weights for policy 0, policy_version 11604 (0.0026) [2025-01-03 21:56:57,951][134294] Updated weights for policy 0, policy_version 11614 (0.0026) [2025-01-03 21:56:58,968][134211] Fps is (10 sec: 13517.4, 60 sec: 12902.4, 300 sec: 12537.9). Total num frames: 47583232. Throughput: 0: 3210.0. Samples: 1066624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:56:58,968][134211] Avg episode reward: [(0, '5.020')] [2025-01-03 21:57:00,919][134294] Updated weights for policy 0, policy_version 11624 (0.0023) [2025-01-03 21:57:03,859][134294] Updated weights for policy 0, policy_version 11634 (0.0024) [2025-01-03 21:57:03,968][134211] Fps is (10 sec: 13927.4, 60 sec: 12970.7, 300 sec: 12593.5). Total num frames: 47652864. Throughput: 0: 3229.5. Samples: 1077104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:03,969][134211] Avg episode reward: [(0, '6.112')] [2025-01-03 21:57:06,865][134294] Updated weights for policy 0, policy_version 11644 (0.0024) [2025-01-03 21:57:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12971.1, 300 sec: 12593.5). Total num frames: 47718400. Throughput: 0: 3257.7. Samples: 1097672. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:57:08,968][134211] Avg episode reward: [(0, '6.008')] [2025-01-03 21:57:10,087][134294] Updated weights for policy 0, policy_version 11654 (0.0025) [2025-01-03 21:57:13,105][134294] Updated weights for policy 0, policy_version 11664 (0.0028) [2025-01-03 21:57:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 12593.5). Total num frames: 47783936. Throughput: 0: 3265.6. Samples: 1117198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:57:13,968][134211] Avg episode reward: [(0, '5.355')] [2025-01-03 21:57:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011666_47783936.pth... [2025-01-03 21:57:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010920_44728320.pth [2025-01-03 21:57:16,210][134294] Updated weights for policy 0, policy_version 11674 (0.0027) [2025-01-03 21:57:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13107.2, 300 sec: 12607.3). Total num frames: 47853568. Throughput: 0: 3265.2. Samples: 1127282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:18,968][134211] Avg episode reward: [(0, '5.548')] [2025-01-03 21:57:19,172][134294] Updated weights for policy 0, policy_version 11684 (0.0026) [2025-01-03 21:57:22,126][134294] Updated weights for policy 0, policy_version 11694 (0.0026) [2025-01-03 21:57:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13107.2, 300 sec: 12607.4). Total num frames: 47919104. Throughput: 0: 3285.5. Samples: 1147972. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:23,968][134211] Avg episode reward: [(0, '6.222')] [2025-01-03 21:57:25,229][134294] Updated weights for policy 0, policy_version 11704 (0.0026) [2025-01-03 21:57:28,226][134294] Updated weights for policy 0, policy_version 11714 (0.0026) [2025-01-03 21:57:28,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13175.3, 300 sec: 12621.2). Total num frames: 47988736. Throughput: 0: 3335.0. Samples: 1168318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:57:28,970][134211] Avg episode reward: [(0, '5.517')] [2025-01-03 21:57:31,062][134294] Updated weights for policy 0, policy_version 11724 (0.0023) [2025-01-03 21:57:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13243.7, 300 sec: 12649.0). Total num frames: 48058368. Throughput: 0: 3373.2. Samples: 1178780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:57:33,968][134211] Avg episode reward: [(0, '5.795')] [2025-01-03 21:57:34,261][134294] Updated weights for policy 0, policy_version 11734 (0.0028) [2025-01-03 21:57:37,162][134294] Updated weights for policy 0, policy_version 11744 (0.0023) [2025-01-03 21:57:38,968][134211] Fps is (10 sec: 13517.7, 60 sec: 13243.7, 300 sec: 12704.6). Total num frames: 48123904. Throughput: 0: 3402.7. Samples: 1199128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:57:38,968][134211] Avg episode reward: [(0, '6.098')] [2025-01-03 21:57:40,262][134294] Updated weights for policy 0, policy_version 11754 (0.0024) [2025-01-03 21:57:43,155][134294] Updated weights for policy 0, policy_version 11764 (0.0023) [2025-01-03 21:57:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13448.5, 300 sec: 12774.0). Total num frames: 48193536. Throughput: 0: 3394.3. Samples: 1219366. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:43,968][134211] Avg episode reward: [(0, '6.087')] [2025-01-03 21:57:46,354][134294] Updated weights for policy 0, policy_version 11774 (0.0023) [2025-01-03 21:57:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.9, 300 sec: 12787.9). Total num frames: 48259072. Throughput: 0: 3383.1. Samples: 1229342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:48,968][134211] Avg episode reward: [(0, '5.687')] [2025-01-03 21:57:49,321][134294] Updated weights for policy 0, policy_version 11784 (0.0024) [2025-01-03 21:57:52,399][134294] Updated weights for policy 0, policy_version 11794 (0.0025) [2025-01-03 21:57:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.2, 300 sec: 12815.6). Total num frames: 48328704. Throughput: 0: 3375.4. Samples: 1249566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:53,968][134211] Avg episode reward: [(0, '5.814')] [2025-01-03 21:57:55,393][134294] Updated weights for policy 0, policy_version 11804 (0.0024) [2025-01-03 21:57:58,375][134294] Updated weights for policy 0, policy_version 11814 (0.0026) [2025-01-03 21:57:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13585.0, 300 sec: 12843.4). Total num frames: 48398336. Throughput: 0: 3402.9. Samples: 1270328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:57:58,968][134211] Avg episode reward: [(0, '5.812')] [2025-01-03 21:58:01,245][134294] Updated weights for policy 0, policy_version 11824 (0.0025) [2025-01-03 21:58:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.8, 300 sec: 12857.3). Total num frames: 48463872. Throughput: 0: 3405.7. Samples: 1280538. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:58:03,968][134211] Avg episode reward: [(0, '5.892')] [2025-01-03 21:58:04,577][134294] Updated weights for policy 0, policy_version 11834 (0.0027) [2025-01-03 21:58:07,546][134294] Updated weights for policy 0, policy_version 11844 (0.0026) [2025-01-03 21:58:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13516.8, 300 sec: 12885.1). Total num frames: 48529408. Throughput: 0: 3384.5. Samples: 1300276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:58:08,968][134211] Avg episode reward: [(0, '5.820')] [2025-01-03 21:58:10,566][134294] Updated weights for policy 0, policy_version 11854 (0.0026) [2025-01-03 21:58:13,599][134294] Updated weights for policy 0, policy_version 11864 (0.0024) [2025-01-03 21:58:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13516.8, 300 sec: 12898.9). Total num frames: 48594944. Throughput: 0: 3381.6. Samples: 1320486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:13,968][134211] Avg episode reward: [(0, '5.801')] [2025-01-03 21:58:16,682][134294] Updated weights for policy 0, policy_version 11874 (0.0024) [2025-01-03 21:58:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.8, 300 sec: 12912.8). Total num frames: 48664576. Throughput: 0: 3372.0. Samples: 1330518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:58:18,968][134211] Avg episode reward: [(0, '5.731')] [2025-01-03 21:58:19,914][134294] Updated weights for policy 0, policy_version 11884 (0.0027) [2025-01-03 21:58:23,027][134294] Updated weights for policy 0, policy_version 11894 (0.0023) [2025-01-03 21:58:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13448.5, 300 sec: 12912.8). Total num frames: 48726016. Throughput: 0: 3352.3. Samples: 1349980. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 21:58:23,968][134211] Avg episode reward: [(0, '5.619')] [2025-01-03 21:58:26,104][134294] Updated weights for policy 0, policy_version 11904 (0.0023) [2025-01-03 21:58:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.7, 300 sec: 12912.8). Total num frames: 48795648. Throughput: 0: 3344.5. Samples: 1369870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:28,968][134211] Avg episode reward: [(0, '5.646')] [2025-01-03 21:58:29,123][134294] Updated weights for policy 0, policy_version 11914 (0.0026) [2025-01-03 21:58:32,110][134294] Updated weights for policy 0, policy_version 11924 (0.0029) [2025-01-03 21:58:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13448.6, 300 sec: 12940.6). Total num frames: 48865280. Throughput: 0: 3349.1. Samples: 1380050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:33,968][134211] Avg episode reward: [(0, '5.770')] [2025-01-03 21:58:35,218][134294] Updated weights for policy 0, policy_version 11934 (0.0026) [2025-01-03 21:58:38,221][134294] Updated weights for policy 0, policy_version 11944 (0.0026) [2025-01-03 21:58:38,969][134211] Fps is (10 sec: 13515.6, 60 sec: 13448.3, 300 sec: 12940.5). Total num frames: 48930816. Throughput: 0: 3355.4. Samples: 1400560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:38,969][134211] Avg episode reward: [(0, '5.847')] [2025-01-03 21:58:41,300][134294] Updated weights for policy 0, policy_version 11954 (0.0026) [2025-01-03 21:58:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13312.0, 300 sec: 12926.7). Total num frames: 48992256. Throughput: 0: 3313.1. Samples: 1419416. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:43,968][134211] Avg episode reward: [(0, '5.800')] [2025-01-03 21:58:44,969][134294] Updated weights for policy 0, policy_version 11964 (0.0033) [2025-01-03 21:58:48,285][134294] Updated weights for policy 0, policy_version 11974 (0.0029) [2025-01-03 21:58:48,968][134211] Fps is (10 sec: 12289.0, 60 sec: 13243.7, 300 sec: 12912.9). Total num frames: 49053696. Throughput: 0: 3280.0. Samples: 1428138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:48,968][134211] Avg episode reward: [(0, '5.397')] [2025-01-03 21:58:51,540][134294] Updated weights for policy 0, policy_version 11984 (0.0028) [2025-01-03 21:58:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13107.2, 300 sec: 12898.9). Total num frames: 49115136. Throughput: 0: 3255.5. Samples: 1446772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:53,968][134211] Avg episode reward: [(0, '5.745')] [2025-01-03 21:58:54,633][134294] Updated weights for policy 0, policy_version 11994 (0.0025) [2025-01-03 21:58:57,657][134294] Updated weights for policy 0, policy_version 12004 (0.0027) [2025-01-03 21:58:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12912.8). Total num frames: 49184768. Throughput: 0: 3257.5. Samples: 1467074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:58:58,968][134211] Avg episode reward: [(0, '5.589')] [2025-01-03 21:59:00,598][134294] Updated weights for policy 0, policy_version 12014 (0.0024) [2025-01-03 21:59:03,533][134294] Updated weights for policy 0, policy_version 12024 (0.0026) [2025-01-03 21:59:03,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13175.5, 300 sec: 12912.8). Total num frames: 49254400. Throughput: 0: 3270.3. Samples: 1477680. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:59:03,968][134211] Avg episode reward: [(0, '5.285')] [2025-01-03 21:59:06,599][134294] Updated weights for policy 0, policy_version 12034 (0.0024) [2025-01-03 21:59:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.4, 300 sec: 12912.9). Total num frames: 49319936. Throughput: 0: 3290.8. Samples: 1498068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:59:08,968][134211] Avg episode reward: [(0, '5.947')] [2025-01-03 21:59:09,586][134294] Updated weights for policy 0, policy_version 12044 (0.0025) [2025-01-03 21:59:12,622][134294] Updated weights for policy 0, policy_version 12054 (0.0028) [2025-01-03 21:59:13,969][134211] Fps is (10 sec: 13515.7, 60 sec: 13243.5, 300 sec: 12940.5). Total num frames: 49389568. Throughput: 0: 3298.0. Samples: 1518284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:13,969][134211] Avg episode reward: [(0, '5.770')] [2025-01-03 21:59:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012058_49389568.pth... [2025-01-03 21:59:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011289_46239744.pth [2025-01-03 21:59:15,669][134294] Updated weights for policy 0, policy_version 12064 (0.0025) [2025-01-03 21:59:18,523][134294] Updated weights for policy 0, policy_version 12074 (0.0023) [2025-01-03 21:59:18,969][134211] Fps is (10 sec: 13925.3, 60 sec: 13243.5, 300 sec: 12954.5). Total num frames: 49459200. Throughput: 0: 3299.9. Samples: 1528550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:18,969][134211] Avg episode reward: [(0, '5.168')] [2025-01-03 21:59:21,558][134294] Updated weights for policy 0, policy_version 12084 (0.0026) [2025-01-03 21:59:23,968][134211] Fps is (10 sec: 13517.9, 60 sec: 13312.0, 300 sec: 12954.5). Total num frames: 49524736. 
Throughput: 0: 3304.3. Samples: 1549250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:23,968][134211] Avg episode reward: [(0, '6.163')] [2025-01-03 21:59:24,784][134294] Updated weights for policy 0, policy_version 12094 (0.0027) [2025-01-03 21:59:27,867][134294] Updated weights for policy 0, policy_version 12104 (0.0026) [2025-01-03 21:59:28,968][134211] Fps is (10 sec: 13108.2, 60 sec: 13243.7, 300 sec: 12954.5). Total num frames: 49590272. Throughput: 0: 3322.1. Samples: 1568912. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:28,968][134211] Avg episode reward: [(0, '6.286')] [2025-01-03 21:59:30,815][134294] Updated weights for policy 0, policy_version 12114 (0.0025) [2025-01-03 21:59:33,894][134294] Updated weights for policy 0, policy_version 12124 (0.0027) [2025-01-03 21:59:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 12968.3). Total num frames: 49659904. Throughput: 0: 3357.8. Samples: 1579238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:59:33,968][134211] Avg episode reward: [(0, '6.077')] [2025-01-03 21:59:37,116][134294] Updated weights for policy 0, policy_version 12134 (0.0025) [2025-01-03 21:59:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13175.6, 300 sec: 12996.1). Total num frames: 49721344. Throughput: 0: 3376.3. Samples: 1598704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 21:59:38,969][134211] Avg episode reward: [(0, '5.902')] [2025-01-03 21:59:40,164][134294] Updated weights for policy 0, policy_version 12144 (0.0022) [2025-01-03 21:59:43,120][134294] Updated weights for policy 0, policy_version 12154 (0.0027) [2025-01-03 21:59:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13312.0, 300 sec: 13051.7). Total num frames: 49790976. Throughput: 0: 3373.2. Samples: 1618870. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:43,968][134211] Avg episode reward: [(0, '6.073')] [2025-01-03 21:59:46,332][134294] Updated weights for policy 0, policy_version 12164 (0.0024) [2025-01-03 21:59:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13448.5, 300 sec: 13093.3). Total num frames: 49860608. Throughput: 0: 3358.4. Samples: 1628806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:48,968][134211] Avg episode reward: [(0, '5.489')] [2025-01-03 21:59:49,201][134294] Updated weights for policy 0, policy_version 12174 (0.0025) [2025-01-03 21:59:52,293][134294] Updated weights for policy 0, policy_version 12184 (0.0027) [2025-01-03 21:59:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 13107.2). Total num frames: 49926144. Throughput: 0: 3361.5. Samples: 1649336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:53,968][134211] Avg episode reward: [(0, '5.514')] [2025-01-03 21:59:55,224][134294] Updated weights for policy 0, policy_version 12194 (0.0025) [2025-01-03 21:59:58,243][134294] Updated weights for policy 0, policy_version 12204 (0.0025) [2025-01-03 21:59:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13516.8, 300 sec: 13135.0). Total num frames: 49995776. Throughput: 0: 3369.9. Samples: 1669926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 21:59:58,968][134211] Avg episode reward: [(0, '5.922')] [2025-01-03 22:00:01,250][134294] Updated weights for policy 0, policy_version 12214 (0.0025) [2025-01-03 22:00:03,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13448.4, 300 sec: 13148.8). Total num frames: 50061312. Throughput: 0: 3371.3. Samples: 1680258. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:00:03,969][134211] Avg episode reward: [(0, '5.666')] [2025-01-03 22:00:04,558][134294] Updated weights for policy 0, policy_version 12224 (0.0028) [2025-01-03 22:00:07,573][134294] Updated weights for policy 0, policy_version 12234 (0.0024) [2025-01-03 22:00:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13448.5, 300 sec: 13176.7). Total num frames: 50126848. Throughput: 0: 3336.9. Samples: 1699410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:08,968][134211] Avg episode reward: [(0, '5.927')] [2025-01-03 22:00:10,580][134294] Updated weights for policy 0, policy_version 12244 (0.0022) [2025-01-03 22:00:13,798][134294] Updated weights for policy 0, policy_version 12254 (0.0025) [2025-01-03 22:00:13,968][134211] Fps is (10 sec: 13107.9, 60 sec: 13380.5, 300 sec: 13190.5). Total num frames: 50192384. Throughput: 0: 3342.1. Samples: 1719304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:13,968][134211] Avg episode reward: [(0, '6.243')] [2025-01-03 22:00:17,006][134294] Updated weights for policy 0, policy_version 12264 (0.0025) [2025-01-03 22:00:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13312.2, 300 sec: 13218.4). Total num frames: 50257920. Throughput: 0: 3323.7. Samples: 1728802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:18,968][134211] Avg episode reward: [(0, '6.460')] [2025-01-03 22:00:20,008][134294] Updated weights for policy 0, policy_version 12274 (0.0025) [2025-01-03 22:00:22,963][134294] Updated weights for policy 0, policy_version 12284 (0.0021) [2025-01-03 22:00:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 13259.9). Total num frames: 50327552. Throughput: 0: 3351.2. Samples: 1749506. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:23,968][134211] Avg episode reward: [(0, '5.948')] [2025-01-03 22:00:25,884][134294] Updated weights for policy 0, policy_version 12294 (0.0023) [2025-01-03 22:00:28,960][134294] Updated weights for policy 0, policy_version 12304 (0.0025) [2025-01-03 22:00:28,969][134211] Fps is (10 sec: 13925.0, 60 sec: 13448.4, 300 sec: 13287.7). Total num frames: 50397184. Throughput: 0: 3357.2. Samples: 1769946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:00:28,969][134211] Avg episode reward: [(0, '5.710')] [2025-01-03 22:00:32,238][134294] Updated weights for policy 0, policy_version 12314 (0.0028) [2025-01-03 22:00:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13312.0, 300 sec: 13259.9). Total num frames: 50458624. Throughput: 0: 3344.3. Samples: 1779298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:00:33,969][134211] Avg episode reward: [(0, '5.463')] [2025-01-03 22:00:35,428][134294] Updated weights for policy 0, policy_version 12324 (0.0025) [2025-01-03 22:00:38,547][134294] Updated weights for policy 0, policy_version 12334 (0.0027) [2025-01-03 22:00:38,968][134211] Fps is (10 sec: 12698.8, 60 sec: 13380.3, 300 sec: 13273.8). Total num frames: 50524160. Throughput: 0: 3322.3. Samples: 1798838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:38,968][134211] Avg episode reward: [(0, '5.224')] [2025-01-03 22:00:41,517][134294] Updated weights for policy 0, policy_version 12344 (0.0024) [2025-01-03 22:00:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13243.7, 300 sec: 13246.1). Total num frames: 50585600. Throughput: 0: 3293.1. Samples: 1818118. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:43,969][134211] Avg episode reward: [(0, '5.104')] [2025-01-03 22:00:45,190][134294] Updated weights for policy 0, policy_version 12354 (0.0026) [2025-01-03 22:00:48,742][134294] Updated weights for policy 0, policy_version 12364 (0.0027) [2025-01-03 22:00:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 50647040. Throughput: 0: 3253.2. Samples: 1826650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:48,968][134211] Avg episode reward: [(0, '5.637')] [2025-01-03 22:00:51,903][134294] Updated weights for policy 0, policy_version 12374 (0.0025) [2025-01-03 22:00:53,971][134211] Fps is (10 sec: 12284.3, 60 sec: 13038.2, 300 sec: 13218.1). Total num frames: 50708480. Throughput: 0: 3234.3. Samples: 1844962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:00:53,972][134211] Avg episode reward: [(0, '4.813')] [2025-01-03 22:00:55,071][134294] Updated weights for policy 0, policy_version 12384 (0.0025) [2025-01-03 22:00:58,139][134294] Updated weights for policy 0, policy_version 12394 (0.0025) [2025-01-03 22:00:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12970.6, 300 sec: 13218.3). Total num frames: 50774016. Throughput: 0: 3230.3. Samples: 1864666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:00:58,968][134211] Avg episode reward: [(0, '4.773')] [2025-01-03 22:01:01,191][134294] Updated weights for policy 0, policy_version 12404 (0.0023) [2025-01-03 22:01:03,968][134211] Fps is (10 sec: 13521.1, 60 sec: 13039.1, 300 sec: 13232.3). Total num frames: 50843648. Throughput: 0: 3247.2. Samples: 1874926. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:01:03,968][134211] Avg episode reward: [(0, '5.003')] [2025-01-03 22:01:04,344][134294] Updated weights for policy 0, policy_version 12414 (0.0025) [2025-01-03 22:01:07,376][134294] Updated weights for policy 0, policy_version 12424 (0.0022) [2025-01-03 22:01:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13038.9, 300 sec: 13232.2). Total num frames: 50909184. Throughput: 0: 3226.3. Samples: 1894690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:08,968][134211] Avg episode reward: [(0, '5.318')] [2025-01-03 22:01:10,434][134294] Updated weights for policy 0, policy_version 12434 (0.0027) [2025-01-03 22:01:13,286][134294] Updated weights for policy 0, policy_version 12444 (0.0025) [2025-01-03 22:01:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13038.9, 300 sec: 13246.0). Total num frames: 50974720. Throughput: 0: 3231.3. Samples: 1915350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:13,968][134211] Avg episode reward: [(0, '5.599')] [2025-01-03 22:01:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012446_50978816.pth... [2025-01-03 22:01:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011666_47783936.pth [2025-01-03 22:01:16,430][134294] Updated weights for policy 0, policy_version 12454 (0.0026) [2025-01-03 22:01:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13107.2, 300 sec: 13259.9). Total num frames: 51044352. Throughput: 0: 3244.5. Samples: 1925300. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:01:18,968][134211] Avg episode reward: [(0, '5.942')] [2025-01-03 22:01:19,493][134294] Updated weights for policy 0, policy_version 12464 (0.0026) [2025-01-03 22:01:22,532][134294] Updated weights for policy 0, policy_version 12474 (0.0025) [2025-01-03 22:01:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13038.9, 300 sec: 13259.9). Total num frames: 51109888. Throughput: 0: 3260.0. Samples: 1945538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:01:23,968][134211] Avg episode reward: [(0, '5.576')] [2025-01-03 22:01:25,524][134294] Updated weights for policy 0, policy_version 12484 (0.0024) [2025-01-03 22:01:28,628][134294] Updated weights for policy 0, policy_version 12494 (0.0024) [2025-01-03 22:01:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13039.1, 300 sec: 13273.8). Total num frames: 51179520. Throughput: 0: 3278.3. Samples: 1965640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:01:28,968][134211] Avg episode reward: [(0, '5.892')] [2025-01-03 22:01:31,552][134294] Updated weights for policy 0, policy_version 12504 (0.0026) [2025-01-03 22:01:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13107.2, 300 sec: 13273.8). Total num frames: 51245056. Throughput: 0: 3317.0. Samples: 1975914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:33,968][134211] Avg episode reward: [(0, '6.013')] [2025-01-03 22:01:34,662][134294] Updated weights for policy 0, policy_version 12514 (0.0024) [2025-01-03 22:01:37,624][134294] Updated weights for policy 0, policy_version 12524 (0.0028) [2025-01-03 22:01:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13301.6). Total num frames: 51310592. Throughput: 0: 3358.5. Samples: 1996082. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:38,968][134211] Avg episode reward: [(0, '5.634')] [2025-01-03 22:01:40,890][134294] Updated weights for policy 0, policy_version 12534 (0.0024) [2025-01-03 22:01:43,840][134294] Updated weights for policy 0, policy_version 12544 (0.0025) [2025-01-03 22:01:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.8, 300 sec: 13329.4). Total num frames: 51380224. Throughput: 0: 3365.8. Samples: 2016128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:01:43,969][134211] Avg episode reward: [(0, '5.217')] [2025-01-03 22:01:46,806][134294] Updated weights for policy 0, policy_version 12554 (0.0024) [2025-01-03 22:01:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 13329.4). Total num frames: 51445760. Throughput: 0: 3362.0. Samples: 2026216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:01:48,968][134211] Avg episode reward: [(0, '4.998')] [2025-01-03 22:01:49,984][134294] Updated weights for policy 0, policy_version 12564 (0.0026) [2025-01-03 22:01:53,023][134294] Updated weights for policy 0, policy_version 12574 (0.0029) [2025-01-03 22:01:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13381.0, 300 sec: 13315.5). Total num frames: 51511296. Throughput: 0: 3365.6. Samples: 2046140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:53,968][134211] Avg episode reward: [(0, '4.872')] [2025-01-03 22:01:56,389][134294] Updated weights for policy 0, policy_version 12584 (0.0027) [2025-01-03 22:01:58,968][134211] Fps is (10 sec: 12697.0, 60 sec: 13311.9, 300 sec: 13287.7). Total num frames: 51572736. Throughput: 0: 3312.0. Samples: 2064390. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:01:58,969][134211] Avg episode reward: [(0, '4.950')] [2025-01-03 22:01:59,899][134294] Updated weights for policy 0, policy_version 12594 (0.0025) [2025-01-03 22:02:03,473][134294] Updated weights for policy 0, policy_version 12604 (0.0028) [2025-01-03 22:02:03,969][134211] Fps is (10 sec: 11877.1, 60 sec: 13107.0, 300 sec: 13259.9). Total num frames: 51630080. Throughput: 0: 3290.3. Samples: 2073366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:02:03,969][134211] Avg episode reward: [(0, '5.546')] [2025-01-03 22:02:06,696][134294] Updated weights for policy 0, policy_version 12614 (0.0025) [2025-01-03 22:02:08,968][134211] Fps is (10 sec: 11879.0, 60 sec: 13038.9, 300 sec: 13246.0). Total num frames: 51691520. Throughput: 0: 3244.3. Samples: 2091532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:02:08,968][134211] Avg episode reward: [(0, '5.371')] [2025-01-03 22:02:10,025][134294] Updated weights for policy 0, policy_version 12624 (0.0027) [2025-01-03 22:02:13,160][134294] Updated weights for policy 0, policy_version 12634 (0.0023) [2025-01-03 22:02:13,968][134211] Fps is (10 sec: 12698.9, 60 sec: 13039.0, 300 sec: 13232.2). Total num frames: 51757056. Throughput: 0: 3218.5. Samples: 2110474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:02:13,968][134211] Avg episode reward: [(0, '5.643')] [2025-01-03 22:02:16,177][134294] Updated weights for policy 0, policy_version 12644 (0.0025) [2025-01-03 22:02:18,968][134211] Fps is (10 sec: 13106.8, 60 sec: 12970.6, 300 sec: 13232.1). Total num frames: 51822592. Throughput: 0: 3213.9. Samples: 2120540. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:02:18,969][134211] Avg episode reward: [(0, '5.576')] [2025-01-03 22:02:19,247][134294] Updated weights for policy 0, policy_version 12654 (0.0026) [2025-01-03 22:02:22,321][134294] Updated weights for policy 0, policy_version 12664 (0.0023) [2025-01-03 22:02:23,968][134211] Fps is (10 sec: 13516.2, 60 sec: 13038.9, 300 sec: 13232.2). Total num frames: 51892224. Throughput: 0: 3214.1. Samples: 2140716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:02:23,969][134211] Avg episode reward: [(0, '5.425')] [2025-01-03 22:02:25,533][134294] Updated weights for policy 0, policy_version 12674 (0.0028) [2025-01-03 22:02:28,561][134294] Updated weights for policy 0, policy_version 12684 (0.0024) [2025-01-03 22:02:28,968][134211] Fps is (10 sec: 13517.2, 60 sec: 12970.6, 300 sec: 13218.3). Total num frames: 51957760. Throughput: 0: 3208.8. Samples: 2160524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:02:28,968][134211] Avg episode reward: [(0, '5.495')] [2025-01-03 22:02:31,557][134294] Updated weights for policy 0, policy_version 12694 (0.0025) [2025-01-03 22:02:33,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13038.9, 300 sec: 13232.2). Total num frames: 52027392. Throughput: 0: 3212.6. Samples: 2170784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:02:33,969][134211] Avg episode reward: [(0, '5.688')] [2025-01-03 22:02:34,550][134294] Updated weights for policy 0, policy_version 12704 (0.0025) [2025-01-03 22:02:37,599][134294] Updated weights for policy 0, policy_version 12714 (0.0026) [2025-01-03 22:02:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13038.9, 300 sec: 13218.3). Total num frames: 52092928. Throughput: 0: 3220.5. Samples: 2191064. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:02:38,968][134211] Avg episode reward: [(0, '5.758')] [2025-01-03 22:02:40,868][134294] Updated weights for policy 0, policy_version 12724 (0.0025) [2025-01-03 22:02:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12834.2, 300 sec: 13190.5). Total num frames: 52150272. Throughput: 0: 3214.2. Samples: 2209028. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:02:43,969][134211] Avg episode reward: [(0, '6.188')] [2025-01-03 22:02:44,575][134294] Updated weights for policy 0, policy_version 12734 (0.0028) [2025-01-03 22:02:47,912][134294] Updated weights for policy 0, policy_version 12744 (0.0026) [2025-01-03 22:02:48,969][134211] Fps is (10 sec: 11467.2, 60 sec: 12697.3, 300 sec: 13148.8). Total num frames: 52207616. Throughput: 0: 3205.0. Samples: 2217590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:02:48,970][134211] Avg episode reward: [(0, '5.809')] [2025-01-03 22:02:51,333][134294] Updated weights for policy 0, policy_version 12754 (0.0027) [2025-01-03 22:02:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12697.6, 300 sec: 13135.0). Total num frames: 52273152. Throughput: 0: 3209.1. Samples: 2235940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:02:53,968][134211] Avg episode reward: [(0, '6.023')] [2025-01-03 22:02:54,429][134294] Updated weights for policy 0, policy_version 12764 (0.0024) [2025-01-03 22:02:57,400][134294] Updated weights for policy 0, policy_version 12774 (0.0025) [2025-01-03 22:02:58,968][134211] Fps is (10 sec: 13109.0, 60 sec: 12766.0, 300 sec: 13135.0). Total num frames: 52338688. Throughput: 0: 3238.8. Samples: 2256218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:02:58,968][134211] Avg episode reward: [(0, '5.705')] [2025-01-03 22:03:00,349][134294] Updated weights for policy 0, policy_version 12784 (0.0023) [2025-01-03 22:03:03,325][134294] Updated weights for policy 0, policy_version 12794 (0.0024) [2025-01-03 22:03:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13039.1, 300 sec: 13162.7). Total num frames: 52412416. Throughput: 0: 3252.6. Samples: 2266904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:03:03,968][134211] Avg episode reward: [(0, '5.591')] [2025-01-03 22:03:06,311][134294] Updated weights for policy 0, policy_version 12804 (0.0024) [2025-01-03 22:03:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13107.2, 300 sec: 13162.7). Total num frames: 52477952. Throughput: 0: 3259.8. Samples: 2287406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:03:08,968][134211] Avg episode reward: [(0, '6.108')] [2025-01-03 22:03:09,490][134294] Updated weights for policy 0, policy_version 12814 (0.0025) [2025-01-03 22:03:12,427][134294] Updated weights for policy 0, policy_version 12824 (0.0025) [2025-01-03 22:03:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13175.4, 300 sec: 13162.7). Total num frames: 52547584. Throughput: 0: 3265.4. Samples: 2307466. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:03:13,969][134211] Avg episode reward: [(0, '5.845')] [2025-01-03 22:03:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012829_52547584.pth... 
[2025-01-03 22:03:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012058_49389568.pth [2025-01-03 22:03:15,430][134294] Updated weights for policy 0, policy_version 12834 (0.0026) [2025-01-03 22:03:18,361][134294] Updated weights for policy 0, policy_version 12844 (0.0026) [2025-01-03 22:03:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13175.5, 300 sec: 13176.6). Total num frames: 52613120. Throughput: 0: 3264.5. Samples: 2317688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:03:18,968][134211] Avg episode reward: [(0, '6.322')] [2025-01-03 22:03:21,327][134294] Updated weights for policy 0, policy_version 12854 (0.0027) [2025-01-03 22:03:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13175.5, 300 sec: 13176.6). Total num frames: 52682752. Throughput: 0: 3273.8. Samples: 2338388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:03:23,969][134211] Avg episode reward: [(0, '5.939')] [2025-01-03 22:03:24,629][134294] Updated weights for policy 0, policy_version 12864 (0.0027) [2025-01-03 22:03:27,571][134294] Updated weights for policy 0, policy_version 12874 (0.0024) [2025-01-03 22:03:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.5, 300 sec: 13162.7). Total num frames: 52748288. Throughput: 0: 3309.2. Samples: 2357944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:03:28,968][134211] Avg episode reward: [(0, '5.963')] [2025-01-03 22:03:30,629][134294] Updated weights for policy 0, policy_version 12884 (0.0024) [2025-01-03 22:03:33,568][134294] Updated weights for policy 0, policy_version 12894 (0.0027) [2025-01-03 22:03:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.4, 300 sec: 13176.6). Total num frames: 52817920. Throughput: 0: 3350.4. Samples: 2368356. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:03:33,969][134211] Avg episode reward: [(0, '5.774')] [2025-01-03 22:03:36,553][134294] Updated weights for policy 0, policy_version 12904 (0.0025) [2025-01-03 22:03:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13175.5, 300 sec: 13190.5). Total num frames: 52883456. Throughput: 0: 3402.2. Samples: 2389038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:03:38,968][134211] Avg episode reward: [(0, '5.204')] [2025-01-03 22:03:39,692][134294] Updated weights for policy 0, policy_version 12914 (0.0026) [2025-01-03 22:03:42,896][134294] Updated weights for policy 0, policy_version 12924 (0.0025) [2025-01-03 22:03:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13312.0, 300 sec: 13204.4). Total num frames: 52948992. Throughput: 0: 3379.6. Samples: 2408300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:03:43,969][134211] Avg episode reward: [(0, '5.358')] [2025-01-03 22:03:45,811][134294] Updated weights for policy 0, policy_version 12934 (0.0025) [2025-01-03 22:03:48,819][134294] Updated weights for policy 0, policy_version 12944 (0.0028) [2025-01-03 22:03:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13517.1, 300 sec: 13232.2). Total num frames: 53018624. Throughput: 0: 3376.0. Samples: 2418822. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:03:48,968][134211] Avg episode reward: [(0, '5.363')] [2025-01-03 22:03:51,764][134294] Updated weights for policy 0, policy_version 12954 (0.0025) [2025-01-03 22:03:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13585.1, 300 sec: 13232.2). Total num frames: 53088256. Throughput: 0: 3376.3. Samples: 2439340. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:03:53,968][134211] Avg episode reward: [(0, '5.481')] [2025-01-03 22:03:54,926][134294] Updated weights for policy 0, policy_version 12964 (0.0025) [2025-01-03 22:03:57,876][134294] Updated weights for policy 0, policy_version 12974 (0.0026) [2025-01-03 22:03:58,968][134211] Fps is (10 sec: 13515.7, 60 sec: 13584.9, 300 sec: 13218.3). Total num frames: 53153792. Throughput: 0: 3382.3. Samples: 2459672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:03:58,969][134211] Avg episode reward: [(0, '6.008')] [2025-01-03 22:04:00,893][134294] Updated weights for policy 0, policy_version 12984 (0.0025) [2025-01-03 22:04:03,918][134294] Updated weights for policy 0, policy_version 12994 (0.0031) [2025-01-03 22:04:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13516.8, 300 sec: 13232.2). Total num frames: 53223424. Throughput: 0: 3380.4. Samples: 2469808. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:04:03,968][134211] Avg episode reward: [(0, '5.896')] [2025-01-03 22:04:06,866][134294] Updated weights for policy 0, policy_version 13004 (0.0024) [2025-01-03 22:04:08,968][134211] Fps is (10 sec: 13517.8, 60 sec: 13516.8, 300 sec: 13218.3). Total num frames: 53288960. Throughput: 0: 3374.8. Samples: 2490252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:08,968][134211] Avg episode reward: [(0, '6.181')] [2025-01-03 22:04:09,990][134294] Updated weights for policy 0, policy_version 13014 (0.0023) [2025-01-03 22:04:13,030][134294] Updated weights for policy 0, policy_version 13024 (0.0026) [2025-01-03 22:04:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13448.6, 300 sec: 13204.4). Total num frames: 53354496. Throughput: 0: 3383.5. Samples: 2510202. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:13,969][134211] Avg episode reward: [(0, '5.654')] [2025-01-03 22:04:16,102][134294] Updated weights for policy 0, policy_version 13034 (0.0025) [2025-01-03 22:04:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.8, 300 sec: 13218.3). Total num frames: 53424128. Throughput: 0: 3378.9. Samples: 2520406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:18,968][134211] Avg episode reward: [(0, '5.616')] [2025-01-03 22:04:19,139][134294] Updated weights for policy 0, policy_version 13044 (0.0023) [2025-01-03 22:04:22,189][134294] Updated weights for policy 0, policy_version 13054 (0.0029) [2025-01-03 22:04:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.6, 300 sec: 13218.3). Total num frames: 53489664. Throughput: 0: 3364.7. Samples: 2540448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:23,968][134211] Avg episode reward: [(0, '5.695')] [2025-01-03 22:04:25,213][134294] Updated weights for policy 0, policy_version 13064 (0.0025) [2025-01-03 22:04:28,127][134294] Updated weights for policy 0, policy_version 13074 (0.0022) [2025-01-03 22:04:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 13218.3). Total num frames: 53559296. Throughput: 0: 3395.7. Samples: 2561106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:28,969][134211] Avg episode reward: [(0, '6.120')] [2025-01-03 22:04:31,146][134294] Updated weights for policy 0, policy_version 13084 (0.0024) [2025-01-03 22:04:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13516.9, 300 sec: 13246.1). Total num frames: 53628928. Throughput: 0: 3389.6. Samples: 2571356. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:33,968][134211] Avg episode reward: [(0, '5.997')] [2025-01-03 22:04:34,274][134294] Updated weights for policy 0, policy_version 13094 (0.0026) [2025-01-03 22:04:37,256][134294] Updated weights for policy 0, policy_version 13104 (0.0025) [2025-01-03 22:04:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13516.8, 300 sec: 13232.2). Total num frames: 53694464. Throughput: 0: 3383.7. Samples: 2591608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:38,968][134211] Avg episode reward: [(0, '5.618')] [2025-01-03 22:04:40,285][134294] Updated weights for policy 0, policy_version 13114 (0.0029) [2025-01-03 22:04:43,660][134294] Updated weights for policy 0, policy_version 13124 (0.0028) [2025-01-03 22:04:43,968][134211] Fps is (10 sec: 13106.3, 60 sec: 13516.7, 300 sec: 13218.3). Total num frames: 53760000. Throughput: 0: 3361.5. Samples: 2610938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:04:43,969][134211] Avg episode reward: [(0, '6.040')] [2025-01-03 22:04:47,041][134294] Updated weights for policy 0, policy_version 13134 (0.0025) [2025-01-03 22:04:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13190.5). Total num frames: 53817344. Throughput: 0: 3331.0. Samples: 2619702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:04:48,968][134211] Avg episode reward: [(0, '6.082')] [2025-01-03 22:04:50,362][134294] Updated weights for policy 0, policy_version 13144 (0.0025) [2025-01-03 22:04:53,487][134294] Updated weights for policy 0, policy_version 13154 (0.0027) [2025-01-03 22:04:53,968][134211] Fps is (10 sec: 12288.8, 60 sec: 13243.8, 300 sec: 13176.6). Total num frames: 53882880. Throughput: 0: 3297.4. Samples: 2638634. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:04:53,968][134211] Avg episode reward: [(0, '5.869')] [2025-01-03 22:04:56,501][134294] Updated weights for policy 0, policy_version 13164 (0.0024) [2025-01-03 22:04:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13312.2, 300 sec: 13190.5). Total num frames: 53952512. Throughput: 0: 3299.4. Samples: 2658676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:04:58,968][134211] Avg episode reward: [(0, '5.113')] [2025-01-03 22:04:59,572][134294] Updated weights for policy 0, policy_version 13174 (0.0024) [2025-01-03 22:05:02,654][134294] Updated weights for policy 0, policy_version 13184 (0.0023) [2025-01-03 22:05:03,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13243.6, 300 sec: 13190.5). Total num frames: 54018048. Throughput: 0: 3297.7. Samples: 2668806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:05:03,969][134211] Avg episode reward: [(0, '5.145')] [2025-01-03 22:05:05,641][134294] Updated weights for policy 0, policy_version 13194 (0.0025) [2025-01-03 22:05:08,522][134294] Updated weights for policy 0, policy_version 13204 (0.0024) [2025-01-03 22:05:08,968][134211] Fps is (10 sec: 13515.8, 60 sec: 13311.9, 300 sec: 13204.4). Total num frames: 54087680. Throughput: 0: 3307.0. Samples: 2689266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:08,969][134211] Avg episode reward: [(0, '5.201')] [2025-01-03 22:05:11,683][134294] Updated weights for policy 0, policy_version 13214 (0.0025) [2025-01-03 22:05:13,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13243.7, 300 sec: 13190.5). Total num frames: 54149120. Throughput: 0: 3285.2. Samples: 2708942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:13,969][134211] Avg episode reward: [(0, '4.985')] [2025-01-03 22:05:13,987][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013220_54149120.pth... 
[2025-01-03 22:05:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012446_50978816.pth [2025-01-03 22:05:15,153][134294] Updated weights for policy 0, policy_version 13224 (0.0025) [2025-01-03 22:05:18,757][134294] Updated weights for policy 0, policy_version 13234 (0.0026) [2025-01-03 22:05:18,968][134211] Fps is (10 sec: 11878.9, 60 sec: 13038.9, 300 sec: 13148.8). Total num frames: 54206464. Throughput: 0: 3257.5. Samples: 2717942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:18,969][134211] Avg episode reward: [(0, '4.902')] [2025-01-03 22:05:22,482][134294] Updated weights for policy 0, policy_version 13244 (0.0033) [2025-01-03 22:05:23,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12834.1, 300 sec: 13093.4). Total num frames: 54259712. Throughput: 0: 3174.1. Samples: 2734444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:23,968][134211] Avg episode reward: [(0, '4.701')] [2025-01-03 22:05:26,342][134294] Updated weights for policy 0, policy_version 13254 (0.0031) [2025-01-03 22:05:28,968][134211] Fps is (10 sec: 10649.8, 60 sec: 12561.1, 300 sec: 13065.6). Total num frames: 54312960. Throughput: 0: 3093.0. Samples: 2750122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:05:28,968][134211] Avg episode reward: [(0, '4.987')] [2025-01-03 22:05:30,102][134294] Updated weights for policy 0, policy_version 13264 (0.0027) [2025-01-03 22:05:33,800][134294] Updated weights for policy 0, policy_version 13274 (0.0029) [2025-01-03 22:05:33,968][134211] Fps is (10 sec: 11059.1, 60 sec: 12356.2, 300 sec: 13037.8). Total num frames: 54370304. Throughput: 0: 3075.5. Samples: 2758102. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:05:33,969][134211] Avg episode reward: [(0, '5.114')] [2025-01-03 22:05:37,250][134294] Updated weights for policy 0, policy_version 13284 (0.0024) [2025-01-03 22:05:38,968][134211] Fps is (10 sec: 11468.6, 60 sec: 12219.7, 300 sec: 13023.9). Total num frames: 54427648. Throughput: 0: 3051.1. Samples: 2775936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:38,969][134211] Avg episode reward: [(0, '4.631')] [2025-01-03 22:05:41,555][134294] Updated weights for policy 0, policy_version 13294 (0.0035) [2025-01-03 22:05:43,968][134211] Fps is (10 sec: 10649.7, 60 sec: 11946.8, 300 sec: 12982.2). Total num frames: 54476800. Throughput: 0: 2923.7. Samples: 2790242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:43,968][134211] Avg episode reward: [(0, '4.661')] [2025-01-03 22:05:45,287][134294] Updated weights for policy 0, policy_version 13304 (0.0032) [2025-01-03 22:05:48,464][134294] Updated weights for policy 0, policy_version 13314 (0.0027) [2025-01-03 22:05:48,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12014.9, 300 sec: 12982.4). Total num frames: 54538240. Throughput: 0: 2896.7. Samples: 2799154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:05:48,968][134211] Avg episode reward: [(0, '4.830')] [2025-01-03 22:05:51,628][134294] Updated weights for policy 0, policy_version 13324 (0.0028) [2025-01-03 22:05:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12014.9, 300 sec: 12982.2). Total num frames: 54603776. Throughput: 0: 2877.3. Samples: 2818744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:05:53,969][134211] Avg episode reward: [(0, '4.931')] [2025-01-03 22:05:54,805][134294] Updated weights for policy 0, policy_version 13334 (0.0026) [2025-01-03 22:05:58,029][134294] Updated weights for policy 0, policy_version 13344 (0.0027) [2025-01-03 22:05:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 11878.3, 300 sec: 12954.5). 
Total num frames: 54665216. Throughput: 0: 2865.7. Samples: 2837898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:05:58,968][134211] Avg episode reward: [(0, '5.000')] [2025-01-03 22:06:01,115][134294] Updated weights for policy 0, policy_version 13354 (0.0028) [2025-01-03 22:06:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 11878.5, 300 sec: 12954.5). Total num frames: 54730752. Throughput: 0: 2884.8. Samples: 2847756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:03,968][134211] Avg episode reward: [(0, '5.177')] [2025-01-03 22:06:04,733][134294] Updated weights for policy 0, policy_version 13364 (0.0027) [2025-01-03 22:06:08,341][134294] Updated weights for policy 0, policy_version 13374 (0.0027) [2025-01-03 22:06:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 11605.4, 300 sec: 12912.8). Total num frames: 54784000. Throughput: 0: 2903.4. Samples: 2865098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:08,968][134211] Avg episode reward: [(0, '5.483')] [2025-01-03 22:06:11,407][134294] Updated weights for policy 0, policy_version 13384 (0.0026) [2025-01-03 22:06:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 11673.6, 300 sec: 12898.9). Total num frames: 54849536. Throughput: 0: 2965.8. Samples: 2883584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:13,969][134211] Avg episode reward: [(0, '5.467')] [2025-01-03 22:06:14,903][134294] Updated weights for policy 0, policy_version 13394 (0.0020) [2025-01-03 22:06:18,002][134294] Updated weights for policy 0, policy_version 13404 (0.0026) [2025-01-03 22:06:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 11741.9, 300 sec: 12885.1). Total num frames: 54910976. Throughput: 0: 2991.6. Samples: 2892722. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:18,968][134211] Avg episode reward: [(0, '5.909')] [2025-01-03 22:06:21,165][134294] Updated weights for policy 0, policy_version 13414 (0.0027) [2025-01-03 22:06:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 11946.7, 300 sec: 12871.1). Total num frames: 54976512. Throughput: 0: 3034.9. Samples: 2912508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:23,968][134211] Avg episode reward: [(0, '5.587')] [2025-01-03 22:06:24,387][134294] Updated weights for policy 0, policy_version 13424 (0.0028) [2025-01-03 22:06:27,584][134294] Updated weights for policy 0, policy_version 13434 (0.0026) [2025-01-03 22:06:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12151.5, 300 sec: 12871.2). Total num frames: 55042048. Throughput: 0: 3136.9. Samples: 2931404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:28,968][134211] Avg episode reward: [(0, '5.477')] [2025-01-03 22:06:30,663][134294] Updated weights for policy 0, policy_version 13444 (0.0027) [2025-01-03 22:06:33,689][134294] Updated weights for policy 0, policy_version 13454 (0.0023) [2025-01-03 22:06:33,968][134211] Fps is (10 sec: 13106.8, 60 sec: 12287.9, 300 sec: 12871.1). Total num frames: 55107584. Throughput: 0: 3164.8. Samples: 2941572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:33,969][134211] Avg episode reward: [(0, '5.751')] [2025-01-03 22:06:36,798][134294] Updated weights for policy 0, policy_version 13464 (0.0025) [2025-01-03 22:06:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12424.5, 300 sec: 12857.3). Total num frames: 55173120. Throughput: 0: 3176.2. Samples: 2961672. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:38,968][134211] Avg episode reward: [(0, '5.550')] [2025-01-03 22:06:40,092][134294] Updated weights for policy 0, policy_version 13474 (0.0029) [2025-01-03 22:06:43,702][134294] Updated weights for policy 0, policy_version 13484 (0.0029) [2025-01-03 22:06:43,968][134211] Fps is (10 sec: 12288.4, 60 sec: 12561.0, 300 sec: 12829.5). Total num frames: 55230464. Throughput: 0: 3143.8. Samples: 2979370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:43,969][134211] Avg episode reward: [(0, '5.681')] [2025-01-03 22:06:47,291][134294] Updated weights for policy 0, policy_version 13494 (0.0029) [2025-01-03 22:06:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 12492.8, 300 sec: 12801.7). Total num frames: 55287808. Throughput: 0: 3114.4. Samples: 2987904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:06:48,968][134211] Avg episode reward: [(0, '6.140')] [2025-01-03 22:06:50,594][134294] Updated weights for policy 0, policy_version 13504 (0.0028) [2025-01-03 22:06:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12424.5, 300 sec: 12801.8). Total num frames: 55349248. Throughput: 0: 3131.3. Samples: 3006006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:06:53,969][134211] Avg episode reward: [(0, '5.436')] [2025-01-03 22:06:53,981][134294] Updated weights for policy 0, policy_version 13514 (0.0026) [2025-01-03 22:06:56,911][134294] Updated weights for policy 0, policy_version 13524 (0.0027) [2025-01-03 22:06:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12561.1, 300 sec: 12843.4). Total num frames: 55418880. Throughput: 0: 3160.4. Samples: 3025802. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:06:58,968][134211] Avg episode reward: [(0, '5.778')] [2025-01-03 22:07:00,079][134294] Updated weights for policy 0, policy_version 13534 (0.0026) [2025-01-03 22:07:03,270][134294] Updated weights for policy 0, policy_version 13544 (0.0027) [2025-01-03 22:07:03,968][134211] Fps is (10 sec: 13516.1, 60 sec: 12561.0, 300 sec: 12857.3). Total num frames: 55484416. Throughput: 0: 3172.9. Samples: 3035504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:03,969][134211] Avg episode reward: [(0, '5.692')] [2025-01-03 22:07:06,566][134294] Updated weights for policy 0, policy_version 13554 (0.0028) [2025-01-03 22:07:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12697.6, 300 sec: 12843.4). Total num frames: 55545856. Throughput: 0: 3152.9. Samples: 3054390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:08,968][134211] Avg episode reward: [(0, '5.852')] [2025-01-03 22:07:09,762][134294] Updated weights for policy 0, policy_version 13564 (0.0026) [2025-01-03 22:07:12,864][134294] Updated weights for policy 0, policy_version 13574 (0.0025) [2025-01-03 22:07:13,968][134211] Fps is (10 sec: 12698.2, 60 sec: 12697.6, 300 sec: 12843.4). Total num frames: 55611392. Throughput: 0: 3169.9. Samples: 3074052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:13,968][134211] Avg episode reward: [(0, '5.328')] [2025-01-03 22:07:14,041][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013578_55615488.pth... 
[2025-01-03 22:07:14,106][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012829_52547584.pth [2025-01-03 22:07:15,901][134294] Updated weights for policy 0, policy_version 13584 (0.0024) [2025-01-03 22:07:18,795][134294] Updated weights for policy 0, policy_version 13594 (0.0023) [2025-01-03 22:07:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12834.1, 300 sec: 12843.4). Total num frames: 55681024. Throughput: 0: 3171.1. Samples: 3084268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:18,968][134211] Avg episode reward: [(0, '5.242')] [2025-01-03 22:07:21,820][134294] Updated weights for policy 0, policy_version 13604 (0.0025) [2025-01-03 22:07:23,968][134211] Fps is (10 sec: 13516.0, 60 sec: 12834.0, 300 sec: 12843.4). Total num frames: 55746560. Throughput: 0: 3181.8. Samples: 3104856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:23,969][134211] Avg episode reward: [(0, '5.240')] [2025-01-03 22:07:25,092][134294] Updated weights for policy 0, policy_version 13614 (0.0023) [2025-01-03 22:07:28,129][134294] Updated weights for policy 0, policy_version 13624 (0.0024) [2025-01-03 22:07:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 12829.5). Total num frames: 55812096. Throughput: 0: 3224.4. Samples: 3124468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:28,968][134211] Avg episode reward: [(0, '4.918')] [2025-01-03 22:07:31,102][134294] Updated weights for policy 0, policy_version 13634 (0.0024) [2025-01-03 22:07:33,968][134211] Fps is (10 sec: 13517.9, 60 sec: 12902.5, 300 sec: 12843.4). Total num frames: 55881728. Throughput: 0: 3262.2. Samples: 3134702. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:33,968][134211] Avg episode reward: [(0, '4.809')] [2025-01-03 22:07:34,289][134294] Updated weights for policy 0, policy_version 13644 (0.0027) [2025-01-03 22:07:37,264][134294] Updated weights for policy 0, policy_version 13654 (0.0027) [2025-01-03 22:07:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12902.4, 300 sec: 12871.2). Total num frames: 55947264. Throughput: 0: 3303.1. Samples: 3154646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:38,968][134211] Avg episode reward: [(0, '4.756')] [2025-01-03 22:07:40,239][134294] Updated weights for policy 0, policy_version 13664 (0.0026) [2025-01-03 22:07:43,267][134294] Updated weights for policy 0, policy_version 13674 (0.0024) [2025-01-03 22:07:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13107.2, 300 sec: 12912.9). Total num frames: 56016896. Throughput: 0: 3317.4. Samples: 3175084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:43,968][134211] Avg episode reward: [(0, '4.874')] [2025-01-03 22:07:46,292][134294] Updated weights for policy 0, policy_version 13684 (0.0024) [2025-01-03 22:07:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13243.8, 300 sec: 12912.8). Total num frames: 56082432. Throughput: 0: 3329.4. Samples: 3185326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:48,968][134211] Avg episode reward: [(0, '5.036')] [2025-01-03 22:07:49,401][134294] Updated weights for policy 0, policy_version 13694 (0.0023) [2025-01-03 22:07:52,316][134294] Updated weights for policy 0, policy_version 13704 (0.0026) [2025-01-03 22:07:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13380.3, 300 sec: 12926.7). Total num frames: 56152064. Throughput: 0: 3360.8. Samples: 3205624. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:53,968][134211] Avg episode reward: [(0, '5.463')] [2025-01-03 22:07:55,391][134294] Updated weights for policy 0, policy_version 13714 (0.0025) [2025-01-03 22:07:58,311][134294] Updated weights for policy 0, policy_version 13724 (0.0024) [2025-01-03 22:07:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13380.3, 300 sec: 12912.8). Total num frames: 56221696. Throughput: 0: 3380.8. Samples: 3226186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:07:58,968][134211] Avg episode reward: [(0, '5.488')] [2025-01-03 22:08:01,301][134294] Updated weights for policy 0, policy_version 13734 (0.0024) [2025-01-03 22:08:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13380.4, 300 sec: 12912.8). Total num frames: 56287232. Throughput: 0: 3378.7. Samples: 3236308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:03,968][134211] Avg episode reward: [(0, '5.197')] [2025-01-03 22:08:04,575][134294] Updated weights for policy 0, policy_version 13744 (0.0027) [2025-01-03 22:08:07,511][134294] Updated weights for policy 0, policy_version 13754 (0.0024) [2025-01-03 22:08:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 12898.9). Total num frames: 56352768. Throughput: 0: 3362.7. Samples: 3256176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:08,968][134211] Avg episode reward: [(0, '5.619')] [2025-01-03 22:08:10,522][134294] Updated weights for policy 0, policy_version 13764 (0.0026) [2025-01-03 22:08:13,635][134294] Updated weights for policy 0, policy_version 13774 (0.0025) [2025-01-03 22:08:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 12912.8). Total num frames: 56422400. Throughput: 0: 3376.5. Samples: 3276410. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:13,968][134211] Avg episode reward: [(0, '5.531')] [2025-01-03 22:08:16,686][134294] Updated weights for policy 0, policy_version 13784 (0.0026) [2025-01-03 22:08:18,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13448.4, 300 sec: 12898.9). Total num frames: 56487936. Throughput: 0: 3371.7. Samples: 3286428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:18,969][134211] Avg episode reward: [(0, '5.469')] [2025-01-03 22:08:19,801][134294] Updated weights for policy 0, policy_version 13794 (0.0024) [2025-01-03 22:08:22,668][134294] Updated weights for policy 0, policy_version 13804 (0.0027) [2025-01-03 22:08:23,971][134211] Fps is (10 sec: 13512.6, 60 sec: 13516.3, 300 sec: 12912.7). Total num frames: 56557568. Throughput: 0: 3376.5. Samples: 3306598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:23,972][134211] Avg episode reward: [(0, '5.255')] [2025-01-03 22:08:25,703][134294] Updated weights for policy 0, policy_version 13814 (0.0027) [2025-01-03 22:08:28,654][134294] Updated weights for policy 0, policy_version 13824 (0.0025) [2025-01-03 22:08:28,969][134211] Fps is (10 sec: 13925.5, 60 sec: 13584.8, 300 sec: 12912.8). Total num frames: 56627200. Throughput: 0: 3384.6. Samples: 3327394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:28,969][134211] Avg episode reward: [(0, '5.441')] [2025-01-03 22:08:31,595][134294] Updated weights for policy 0, policy_version 13834 (0.0026) [2025-01-03 22:08:33,968][134211] Fps is (10 sec: 13520.9, 60 sec: 13516.8, 300 sec: 12912.8). Total num frames: 56692736. Throughput: 0: 3381.9. Samples: 3337512. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:33,968][134211] Avg episode reward: [(0, '5.736')] [2025-01-03 22:08:34,791][134294] Updated weights for policy 0, policy_version 13844 (0.0027) [2025-01-03 22:08:37,765][134294] Updated weights for policy 0, policy_version 13854 (0.0025) [2025-01-03 22:08:38,968][134211] Fps is (10 sec: 13108.6, 60 sec: 13516.8, 300 sec: 12912.8). Total num frames: 56758272. Throughput: 0: 3376.5. Samples: 3357566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:38,968][134211] Avg episode reward: [(0, '5.988')] [2025-01-03 22:08:40,852][134294] Updated weights for policy 0, policy_version 13864 (0.0026) [2025-01-03 22:08:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13448.6, 300 sec: 12898.9). Total num frames: 56823808. Throughput: 0: 3354.3. Samples: 3377128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:43,968][134211] Avg episode reward: [(0, '6.616')] [2025-01-03 22:08:44,237][134294] Updated weights for policy 0, policy_version 13874 (0.0026) [2025-01-03 22:08:47,669][134294] Updated weights for policy 0, policy_version 13884 (0.0025) [2025-01-03 22:08:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 12857.3). Total num frames: 56881152. Throughput: 0: 3323.5. Samples: 3385866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:08:48,968][134211] Avg episode reward: [(0, '6.177')] [2025-01-03 22:08:51,009][134294] Updated weights for policy 0, policy_version 13894 (0.0025) [2025-01-03 22:08:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13243.7, 300 sec: 12857.3). Total num frames: 56946688. Throughput: 0: 3291.4. Samples: 3404288. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:08:53,968][134211] Avg episode reward: [(0, '6.103')] [2025-01-03 22:08:54,328][134294] Updated weights for policy 0, policy_version 13904 (0.0025) [2025-01-03 22:08:57,185][134294] Updated weights for policy 0, policy_version 13914 (0.0024) [2025-01-03 22:08:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 57012224. Throughput: 0: 3283.9. Samples: 3424184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:08:58,968][134211] Avg episode reward: [(0, '6.414')] [2025-01-03 22:09:00,286][134294] Updated weights for policy 0, policy_version 13924 (0.0028) [2025-01-03 22:09:03,420][134294] Updated weights for policy 0, policy_version 13934 (0.0024) [2025-01-03 22:09:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 57077760. Throughput: 0: 3288.7. Samples: 3434416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:09:03,968][134211] Avg episode reward: [(0, '5.284')] [2025-01-03 22:09:06,332][134294] Updated weights for policy 0, policy_version 13944 (0.0024) [2025-01-03 22:09:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13243.8, 300 sec: 12857.3). Total num frames: 57147392. Throughput: 0: 3287.2. Samples: 3454512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:08,968][134211] Avg episode reward: [(0, '5.738')] [2025-01-03 22:09:09,454][134294] Updated weights for policy 0, policy_version 13954 (0.0021) [2025-01-03 22:09:12,450][134294] Updated weights for policy 0, policy_version 13964 (0.0025) [2025-01-03 22:09:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 57212928. Throughput: 0: 3271.5. Samples: 3474610. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:13,968][134211] Avg episode reward: [(0, '5.601')] [2025-01-03 22:09:14,029][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013969_57217024.pth... [2025-01-03 22:09:14,096][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013220_54149120.pth [2025-01-03 22:09:15,537][134294] Updated weights for policy 0, policy_version 13974 (0.0023) [2025-01-03 22:09:18,523][134294] Updated weights for policy 0, policy_version 13984 (0.0027) [2025-01-03 22:09:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.8, 300 sec: 12857.3). Total num frames: 57282560. Throughput: 0: 3274.6. Samples: 3484870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:18,969][134211] Avg episode reward: [(0, '6.044')] [2025-01-03 22:09:21,420][134294] Updated weights for policy 0, policy_version 13994 (0.0022) [2025-01-03 22:09:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13244.4, 300 sec: 12857.3). Total num frames: 57352192. Throughput: 0: 3286.8. Samples: 3505470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:23,968][134211] Avg episode reward: [(0, '6.093')] [2025-01-03 22:09:24,662][134294] Updated weights for policy 0, policy_version 14004 (0.0022) [2025-01-03 22:09:27,711][134294] Updated weights for policy 0, policy_version 14014 (0.0027) [2025-01-03 22:09:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13107.4, 300 sec: 12829.5). Total num frames: 57413632. Throughput: 0: 3285.0. Samples: 3524952. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:28,969][134211] Avg episode reward: [(0, '5.651')] [2025-01-03 22:09:30,766][134294] Updated weights for policy 0, policy_version 14024 (0.0023) [2025-01-03 22:09:33,649][134294] Updated weights for policy 0, policy_version 14034 (0.0025) [2025-01-03 22:09:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 57483264. Throughput: 0: 3321.0. Samples: 3535312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:33,968][134211] Avg episode reward: [(0, '6.074')] [2025-01-03 22:09:36,640][134294] Updated weights for policy 0, policy_version 14044 (0.0023) [2025-01-03 22:09:38,968][134211] Fps is (10 sec: 13925.7, 60 sec: 13243.6, 300 sec: 12857.3). Total num frames: 57552896. Throughput: 0: 3368.1. Samples: 3555856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:38,969][134211] Avg episode reward: [(0, '5.860')] [2025-01-03 22:09:39,821][134294] Updated weights for policy 0, policy_version 14054 (0.0028) [2025-01-03 22:09:42,813][134294] Updated weights for policy 0, policy_version 14064 (0.0025) [2025-01-03 22:09:43,968][134211] Fps is (10 sec: 13516.2, 60 sec: 13243.6, 300 sec: 12885.0). Total num frames: 57618432. Throughput: 0: 3372.4. Samples: 3575944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:43,969][134211] Avg episode reward: [(0, '5.109')] [2025-01-03 22:09:45,799][134294] Updated weights for policy 0, policy_version 14074 (0.0029) [2025-01-03 22:09:48,724][134294] Updated weights for policy 0, policy_version 14084 (0.0025) [2025-01-03 22:09:48,968][134211] Fps is (10 sec: 13517.8, 60 sec: 13448.5, 300 sec: 12898.9). Total num frames: 57688064. Throughput: 0: 3374.5. Samples: 3586268. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:09:48,968][134211] Avg episode reward: [(0, '6.104')] [2025-01-03 22:09:51,704][134294] Updated weights for policy 0, policy_version 14094 (0.0024) [2025-01-03 22:09:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.7, 300 sec: 12898.9). Total num frames: 57757696. Throughput: 0: 3386.2. Samples: 3606894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:09:53,969][134211] Avg episode reward: [(0, '5.970')] [2025-01-03 22:09:54,882][134294] Updated weights for policy 0, policy_version 14104 (0.0028) [2025-01-03 22:09:57,815][134294] Updated weights for policy 0, policy_version 14114 (0.0026) [2025-01-03 22:09:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13516.8, 300 sec: 12899.0). Total num frames: 57823232. Throughput: 0: 3384.4. Samples: 3626910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:09:58,968][134211] Avg episode reward: [(0, '5.987')] [2025-01-03 22:10:01,120][134294] Updated weights for policy 0, policy_version 14124 (0.0025) [2025-01-03 22:10:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13448.5, 300 sec: 12871.2). Total num frames: 57884672. Throughput: 0: 3369.8. Samples: 3636512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:10:03,968][134211] Avg episode reward: [(0, '6.105')] [2025-01-03 22:10:04,317][134294] Updated weights for policy 0, policy_version 14134 (0.0027) [2025-01-03 22:10:07,247][134294] Updated weights for policy 0, policy_version 14144 (0.0023) [2025-01-03 22:10:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13448.5, 300 sec: 12898.9). Total num frames: 57954304. Throughput: 0: 3351.9. Samples: 3656304. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:10:08,968][134211] Avg episode reward: [(0, '5.974')] [2025-01-03 22:10:10,292][134294] Updated weights for policy 0, policy_version 14154 (0.0026) [2025-01-03 22:10:13,289][134294] Updated weights for policy 0, policy_version 14164 (0.0024) [2025-01-03 22:10:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.8, 300 sec: 12940.6). Total num frames: 58023936. Throughput: 0: 3375.1. Samples: 3676832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:10:13,969][134211] Avg episode reward: [(0, '5.809')] [2025-01-03 22:10:16,345][134294] Updated weights for policy 0, policy_version 14174 (0.0026) [2025-01-03 22:10:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.6, 300 sec: 12982.2). Total num frames: 58089472. Throughput: 0: 3367.5. Samples: 3686850. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:10:18,968][134211] Avg episode reward: [(0, '6.380')] [2025-01-03 22:10:19,421][134294] Updated weights for policy 0, policy_version 14184 (0.0026) [2025-01-03 22:10:22,434][134294] Updated weights for policy 0, policy_version 14194 (0.0030) [2025-01-03 22:10:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.2, 300 sec: 13023.9). Total num frames: 58155008. Throughput: 0: 3358.8. Samples: 3707000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:10:23,968][134211] Avg episode reward: [(0, '6.235')] [2025-01-03 22:10:25,403][134294] Updated weights for policy 0, policy_version 14204 (0.0025) [2025-01-03 22:10:28,358][134294] Updated weights for policy 0, policy_version 14214 (0.0024) [2025-01-03 22:10:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13516.8, 300 sec: 13065.6). Total num frames: 58224640. Throughput: 0: 3372.7. Samples: 3727716. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:10:28,969][134211] Avg episode reward: [(0, '5.667')] [2025-01-03 22:10:31,324][134294] Updated weights for policy 0, policy_version 14224 (0.0024) [2025-01-03 22:10:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13516.8, 300 sec: 13107.2). Total num frames: 58294272. Throughput: 0: 3374.2. Samples: 3738106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:33,968][134211] Avg episode reward: [(0, '6.263')] [2025-01-03 22:10:34,409][134294] Updated weights for policy 0, policy_version 14234 (0.0025) [2025-01-03 22:10:37,434][134294] Updated weights for policy 0, policy_version 14244 (0.0026) [2025-01-03 22:10:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.7, 300 sec: 13162.7). Total num frames: 58359808. Throughput: 0: 3360.7. Samples: 3758124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:38,968][134211] Avg episode reward: [(0, '6.021')] [2025-01-03 22:10:40,535][134294] Updated weights for policy 0, policy_version 14254 (0.0026) [2025-01-03 22:10:43,799][134294] Updated weights for policy 0, policy_version 14264 (0.0024) [2025-01-03 22:10:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13448.6, 300 sec: 13176.6). Total num frames: 58425344. Throughput: 0: 3356.7. Samples: 3777962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:43,968][134211] Avg episode reward: [(0, '6.132')] [2025-01-03 22:10:47,127][134294] Updated weights for policy 0, policy_version 14274 (0.0021) [2025-01-03 22:10:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13312.0, 300 sec: 13162.7). Total num frames: 58486784. Throughput: 0: 3340.8. Samples: 3786848. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:48,968][134211] Avg episode reward: [(0, '5.819')] [2025-01-03 22:10:50,488][134294] Updated weights for policy 0, policy_version 14284 (0.0025) [2025-01-03 22:10:53,642][134294] Updated weights for policy 0, policy_version 14294 (0.0023) [2025-01-03 22:10:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.8, 300 sec: 13176.6). Total num frames: 58552320. Throughput: 0: 3315.7. Samples: 3805510. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:53,968][134211] Avg episode reward: [(0, '6.214')] [2025-01-03 22:10:56,580][134294] Updated weights for policy 0, policy_version 14304 (0.0025) [2025-01-03 22:10:58,971][134211] Fps is (10 sec: 13103.2, 60 sec: 13243.1, 300 sec: 13176.5). Total num frames: 58617856. Throughput: 0: 3306.7. Samples: 3825644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:10:58,971][134211] Avg episode reward: [(0, '5.999')] [2025-01-03 22:10:59,690][134294] Updated weights for policy 0, policy_version 14314 (0.0024) [2025-01-03 22:11:02,768][134294] Updated weights for policy 0, policy_version 14324 (0.0026) [2025-01-03 22:11:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 13218.3). Total num frames: 58683392. Throughput: 0: 3308.2. Samples: 3835720. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:03,968][134211] Avg episode reward: [(0, '6.007')] [2025-01-03 22:11:05,795][134294] Updated weights for policy 0, policy_version 14334 (0.0025) [2025-01-03 22:11:08,665][134294] Updated weights for policy 0, policy_version 14344 (0.0024) [2025-01-03 22:11:08,968][134211] Fps is (10 sec: 13520.8, 60 sec: 13312.0, 300 sec: 13232.2). Total num frames: 58753024. Throughput: 0: 3317.9. Samples: 3856304. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:08,968][134211] Avg episode reward: [(0, '5.575')] [2025-01-03 22:11:11,695][134294] Updated weights for policy 0, policy_version 14354 (0.0023) [2025-01-03 22:11:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13312.0, 300 sec: 13259.9). Total num frames: 58822656. Throughput: 0: 3310.0. Samples: 3876668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:13,969][134211] Avg episode reward: [(0, '5.781')] [2025-01-03 22:11:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014361_58822656.pth... [2025-01-03 22:11:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013578_55615488.pth [2025-01-03 22:11:14,777][134294] Updated weights for policy 0, policy_version 14364 (0.0024) [2025-01-03 22:11:17,828][134294] Updated weights for policy 0, policy_version 14374 (0.0028) [2025-01-03 22:11:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13312.0, 300 sec: 13259.9). Total num frames: 58888192. Throughput: 0: 3297.5. Samples: 3886494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:18,968][134211] Avg episode reward: [(0, '5.490')] [2025-01-03 22:11:20,789][134294] Updated weights for policy 0, policy_version 14384 (0.0027) [2025-01-03 22:11:23,805][134294] Updated weights for policy 0, policy_version 14394 (0.0025) [2025-01-03 22:11:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13380.3, 300 sec: 13273.8). Total num frames: 58957824. Throughput: 0: 3311.6. Samples: 3907144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:23,968][134211] Avg episode reward: [(0, '6.438')] [2025-01-03 22:11:26,865][134294] Updated weights for policy 0, policy_version 14404 (0.0025) [2025-01-03 22:11:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13312.0, 300 sec: 13273.8). Total num frames: 59023360. 
Throughput: 0: 3311.0. Samples: 3926954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:11:28,968][134211] Avg episode reward: [(0, '5.644')] [2025-01-03 22:11:30,051][134294] Updated weights for policy 0, policy_version 14414 (0.0025) [2025-01-03 22:11:33,028][134294] Updated weights for policy 0, policy_version 14424 (0.0024) [2025-01-03 22:11:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 13273.8). Total num frames: 59088896. Throughput: 0: 3336.2. Samples: 3936978. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:11:33,968][134211] Avg episode reward: [(0, '6.308')] [2025-01-03 22:11:36,081][134294] Updated weights for policy 0, policy_version 14434 (0.0024) [2025-01-03 22:11:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13312.0, 300 sec: 13315.5). Total num frames: 59158528. Throughput: 0: 3372.4. Samples: 3957268. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:11:38,968][134211] Avg episode reward: [(0, '5.807')] [2025-01-03 22:11:39,250][134294] Updated weights for policy 0, policy_version 14444 (0.0023) [2025-01-03 22:11:42,166][134294] Updated weights for policy 0, policy_version 14454 (0.0024) [2025-01-03 22:11:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13312.0, 300 sec: 13343.2). Total num frames: 59224064. Throughput: 0: 3369.7. Samples: 3977272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:11:43,969][134211] Avg episode reward: [(0, '5.566')] [2025-01-03 22:11:45,473][134294] Updated weights for policy 0, policy_version 14464 (0.0028) [2025-01-03 22:11:48,922][134294] Updated weights for policy 0, policy_version 14474 (0.0025) [2025-01-03 22:11:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13312.0, 300 sec: 13343.2). Total num frames: 59285504. Throughput: 0: 3349.7. Samples: 3986454. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:11:48,968][134211] Avg episode reward: [(0, '6.151')] [2025-01-03 22:11:52,071][134294] Updated weights for policy 0, policy_version 14484 (0.0026) [2025-01-03 22:11:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13243.7, 300 sec: 13315.5). Total num frames: 59346944. Throughput: 0: 3302.2. Samples: 4004904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:11:53,969][134211] Avg episode reward: [(0, '6.560')] [2025-01-03 22:11:55,636][134294] Updated weights for policy 0, policy_version 14494 (0.0028) [2025-01-03 22:11:58,726][134294] Updated weights for policy 0, policy_version 14504 (0.0024) [2025-01-03 22:11:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13176.2, 300 sec: 13301.6). Total num frames: 59408384. Throughput: 0: 3258.3. Samples: 4023290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:11:58,968][134211] Avg episode reward: [(0, '6.147')] [2025-01-03 22:12:01,831][134294] Updated weights for policy 0, policy_version 14514 (0.0025) [2025-01-03 22:12:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13175.5, 300 sec: 13315.5). Total num frames: 59473920. Throughput: 0: 3263.4. Samples: 4033346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:12:03,968][134211] Avg episode reward: [(0, '6.082')] [2025-01-03 22:12:04,962][134294] Updated weights for policy 0, policy_version 14524 (0.0032) [2025-01-03 22:12:08,192][134294] Updated weights for policy 0, policy_version 14534 (0.0028) [2025-01-03 22:12:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 13315.5). Total num frames: 59539456. Throughput: 0: 3240.3. Samples: 4052958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:12:08,968][134211] Avg episode reward: [(0, '5.754')] [2025-01-03 22:12:11,691][134294] Updated weights for policy 0, policy_version 14544 (0.0027) [2025-01-03 22:12:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12902.4, 300 sec: 13273.8). 
Total num frames: 59596800. Throughput: 0: 3184.5. Samples: 4070256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:12:13,968][134211] Avg episode reward: [(0, '5.517')] [2025-01-03 22:12:15,217][134294] Updated weights for policy 0, policy_version 14554 (0.0025) [2025-01-03 22:12:18,264][134294] Updated weights for policy 0, policy_version 14564 (0.0025) [2025-01-03 22:12:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12902.4, 300 sec: 13273.8). Total num frames: 59662336. Throughput: 0: 3168.3. Samples: 4079550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:12:18,968][134211] Avg episode reward: [(0, '6.455')] [2025-01-03 22:12:21,353][134294] Updated weights for policy 0, policy_version 14574 (0.0026) [2025-01-03 22:12:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 13273.8). Total num frames: 59727872. Throughput: 0: 3164.7. Samples: 4099680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:12:23,968][134211] Avg episode reward: [(0, '6.376')] [2025-01-03 22:12:24,435][134294] Updated weights for policy 0, policy_version 14584 (0.0026) [2025-01-03 22:12:27,391][134294] Updated weights for policy 0, policy_version 14594 (0.0026) [2025-01-03 22:12:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 13259.9). Total num frames: 59793408. Throughput: 0: 3170.8. Samples: 4119956. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:12:28,968][134211] Avg episode reward: [(0, '6.062')] [2025-01-03 22:12:30,378][134294] Updated weights for policy 0, policy_version 14604 (0.0026) [2025-01-03 22:12:33,294][134294] Updated weights for policy 0, policy_version 14614 (0.0025) [2025-01-03 22:12:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 12970.7, 300 sec: 13287.7). Total num frames: 59867136. Throughput: 0: 3200.1. Samples: 4130458. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:12:33,968][134211] Avg episode reward: [(0, '5.773')] [2025-01-03 22:12:36,285][134294] Updated weights for policy 0, policy_version 14624 (0.0025) [2025-01-03 22:12:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 12902.4, 300 sec: 13273.8). Total num frames: 59932672. Throughput: 0: 3250.4. Samples: 4151170. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:12:38,968][134211] Avg episode reward: [(0, '5.302')] [2025-01-03 22:12:39,408][134294] Updated weights for policy 0, policy_version 14634 (0.0024) [2025-01-03 22:12:42,552][134294] Updated weights for policy 0, policy_version 14644 (0.0024) [2025-01-03 22:12:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12902.4, 300 sec: 13273.8). Total num frames: 59998208. Throughput: 0: 3271.1. Samples: 4170490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:12:43,968][134211] Avg episode reward: [(0, '5.659')] [2025-01-03 22:12:46,004][134294] Updated weights for policy 0, policy_version 14654 (0.0027) [2025-01-03 22:12:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12834.1, 300 sec: 13232.2). Total num frames: 60055552. Throughput: 0: 3243.8. Samples: 4179316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:12:48,968][134211] Avg episode reward: [(0, '6.476')] [2025-01-03 22:12:49,582][134294] Updated weights for policy 0, policy_version 14664 (0.0030) [2025-01-03 22:12:52,798][134294] Updated weights for policy 0, policy_version 14674 (0.0024) [2025-01-03 22:12:53,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12834.1, 300 sec: 13204.4). Total num frames: 60116992. Throughput: 0: 3208.2. Samples: 4197326. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:12:53,968][134211] Avg episode reward: [(0, '5.733')] [2025-01-03 22:12:55,764][134294] Updated weights for policy 0, policy_version 14684 (0.0024) [2025-01-03 22:12:58,711][134294] Updated weights for policy 0, policy_version 14694 (0.0024) [2025-01-03 22:12:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12970.7, 300 sec: 13218.3). Total num frames: 60186624. Throughput: 0: 3284.5. Samples: 4218060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:12:58,968][134211] Avg episode reward: [(0, '5.470')] [2025-01-03 22:13:01,704][134294] Updated weights for policy 0, policy_version 14704 (0.0029) [2025-01-03 22:13:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13038.9, 300 sec: 13232.2). Total num frames: 60256256. Throughput: 0: 3307.1. Samples: 4228370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:13:03,968][134211] Avg episode reward: [(0, '5.918')] [2025-01-03 22:13:04,777][134294] Updated weights for policy 0, policy_version 14714 (0.0025) [2025-01-03 22:13:07,893][134294] Updated weights for policy 0, policy_version 14724 (0.0024) [2025-01-03 22:13:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13038.9, 300 sec: 13218.3). Total num frames: 60321792. Throughput: 0: 3303.6. Samples: 4248340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:13:08,968][134211] Avg episode reward: [(0, '5.954')] [2025-01-03 22:13:10,930][134294] Updated weights for policy 0, policy_version 14734 (0.0027) [2025-01-03 22:13:13,845][134294] Updated weights for policy 0, policy_version 14744 (0.0024) [2025-01-03 22:13:13,969][134211] Fps is (10 sec: 13515.7, 60 sec: 13243.5, 300 sec: 13232.1). Total num frames: 60391424. Throughput: 0: 3305.2. Samples: 4268694. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:13,969][134211] Avg episode reward: [(0, '5.546')] [2025-01-03 22:13:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014744_60391424.pth... [2025-01-03 22:13:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013969_57217024.pth [2025-01-03 22:13:16,909][134294] Updated weights for policy 0, policy_version 14754 (0.0027) [2025-01-03 22:13:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 13218.4). Total num frames: 60456960. Throughput: 0: 3293.6. Samples: 4278668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:18,968][134211] Avg episode reward: [(0, '6.040')] [2025-01-03 22:13:19,983][134294] Updated weights for policy 0, policy_version 14764 (0.0025) [2025-01-03 22:13:23,275][134294] Updated weights for policy 0, policy_version 14774 (0.0026) [2025-01-03 22:13:23,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13243.7, 300 sec: 13204.4). Total num frames: 60522496. Throughput: 0: 3273.1. Samples: 4298460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:23,969][134211] Avg episode reward: [(0, '5.955')] [2025-01-03 22:13:26,479][134294] Updated weights for policy 0, policy_version 14784 (0.0025) [2025-01-03 22:13:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13175.4, 300 sec: 13190.5). Total num frames: 60583936. Throughput: 0: 3257.8. Samples: 4317090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:28,968][134211] Avg episode reward: [(0, '6.314')] [2025-01-03 22:13:29,789][134294] Updated weights for policy 0, policy_version 14794 (0.0028) [2025-01-03 22:13:32,650][134294] Updated weights for policy 0, policy_version 14804 (0.0023) [2025-01-03 22:13:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13204.4). Total num frames: 60653568. 
Throughput: 0: 3285.9. Samples: 4327182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:13:33,969][134211] Avg episode reward: [(0, '6.085')] [2025-01-03 22:13:35,684][134294] Updated weights for policy 0, policy_version 14814 (0.0025) [2025-01-03 22:13:38,602][134294] Updated weights for policy 0, policy_version 14824 (0.0025) [2025-01-03 22:13:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13175.5, 300 sec: 13218.3). Total num frames: 60723200. Throughput: 0: 3349.3. Samples: 4348046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:13:38,968][134211] Avg episode reward: [(0, '5.632')] [2025-01-03 22:13:41,546][134294] Updated weights for policy 0, policy_version 14834 (0.0028) [2025-01-03 22:13:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.5, 300 sec: 13246.0). Total num frames: 60788736. Throughput: 0: 3336.9. Samples: 4368222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:43,968][134211] Avg episode reward: [(0, '6.036')] [2025-01-03 22:13:44,763][134294] Updated weights for policy 0, policy_version 14844 (0.0027) [2025-01-03 22:13:47,745][134294] Updated weights for policy 0, policy_version 14854 (0.0025) [2025-01-03 22:13:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13380.3, 300 sec: 13259.9). Total num frames: 60858368. Throughput: 0: 3334.9. Samples: 4378442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:48,968][134211] Avg episode reward: [(0, '5.532')] [2025-01-03 22:13:50,717][134294] Updated weights for policy 0, policy_version 14864 (0.0025) [2025-01-03 22:13:53,540][134294] Updated weights for policy 0, policy_version 14874 (0.0024) [2025-01-03 22:13:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13516.8, 300 sec: 13273.8). Total num frames: 60928000. Throughput: 0: 3354.6. Samples: 4399296. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:53,968][134211] Avg episode reward: [(0, '5.304')] [2025-01-03 22:13:56,558][134294] Updated weights for policy 0, policy_version 14884 (0.0025) [2025-01-03 22:13:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.5, 300 sec: 13273.8). Total num frames: 60993536. Throughput: 0: 3358.8. Samples: 4419838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:13:58,968][134211] Avg episode reward: [(0, '5.359')] [2025-01-03 22:13:59,579][134294] Updated weights for policy 0, policy_version 14894 (0.0028) [2025-01-03 22:14:02,732][134294] Updated weights for policy 0, policy_version 14904 (0.0026) [2025-01-03 22:14:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.5, 300 sec: 13273.8). Total num frames: 61063168. Throughput: 0: 3361.8. Samples: 4429948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:14:03,968][134211] Avg episode reward: [(0, '4.974')] [2025-01-03 22:14:05,618][134294] Updated weights for policy 0, policy_version 14914 (0.0026) [2025-01-03 22:14:08,615][134294] Updated weights for policy 0, policy_version 14924 (0.0024) [2025-01-03 22:14:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13516.8, 300 sec: 13287.7). Total num frames: 61132800. Throughput: 0: 3379.7. Samples: 4450544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:14:08,968][134211] Avg episode reward: [(0, '4.783')] [2025-01-03 22:14:11,499][134294] Updated weights for policy 0, policy_version 14934 (0.0026) [2025-01-03 22:14:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13448.7, 300 sec: 13273.8). Total num frames: 61198336. Throughput: 0: 3420.6. Samples: 4471016. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:14:13,968][134211] Avg episode reward: [(0, '4.695')] [2025-01-03 22:14:14,727][134294] Updated weights for policy 0, policy_version 14944 (0.0027) [2025-01-03 22:14:17,756][134294] Updated weights for policy 0, policy_version 14954 (0.0021) [2025-01-03 22:14:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13259.9). Total num frames: 61263872. Throughput: 0: 3413.8. Samples: 4480804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:14:18,968][134211] Avg episode reward: [(0, '5.158')] [2025-01-03 22:14:20,810][134294] Updated weights for policy 0, policy_version 14964 (0.0025) [2025-01-03 22:14:23,659][134294] Updated weights for policy 0, policy_version 14974 (0.0025) [2025-01-03 22:14:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.9, 300 sec: 13287.7). Total num frames: 61333504. Throughput: 0: 3407.2. Samples: 4501372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:14:23,968][134211] Avg episode reward: [(0, '4.925')] [2025-01-03 22:14:26,670][134294] Updated weights for policy 0, policy_version 14984 (0.0026) [2025-01-03 22:14:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13653.3, 300 sec: 13287.7). Total num frames: 61403136. Throughput: 0: 3410.5. Samples: 4521694. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:14:28,968][134211] Avg episode reward: [(0, '5.053')] [2025-01-03 22:14:29,771][134294] Updated weights for policy 0, policy_version 14994 (0.0025) [2025-01-03 22:14:32,834][134294] Updated weights for policy 0, policy_version 15004 (0.0024) [2025-01-03 22:14:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 13273.8). Total num frames: 61468672. Throughput: 0: 3406.1. Samples: 4531718. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:14:33,968][134211] Avg episode reward: [(0, '4.893')] [2025-01-03 22:14:35,723][134294] Updated weights for policy 0, policy_version 15014 (0.0021) [2025-01-03 22:14:38,744][134294] Updated weights for policy 0, policy_version 15024 (0.0024) [2025-01-03 22:14:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.0, 300 sec: 13287.7). Total num frames: 61538304. Throughput: 0: 3404.1. Samples: 4552480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:14:38,969][134211] Avg episode reward: [(0, '4.961')] [2025-01-03 22:14:41,724][134294] Updated weights for policy 0, policy_version 15034 (0.0025) [2025-01-03 22:14:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.1, 300 sec: 13273.8). Total num frames: 61603840. Throughput: 0: 3387.8. Samples: 4572290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:14:43,968][134211] Avg episode reward: [(0, '5.056')] [2025-01-03 22:14:45,236][134294] Updated weights for policy 0, policy_version 15044 (0.0025) [2025-01-03 22:14:48,567][134294] Updated weights for policy 0, policy_version 15054 (0.0026) [2025-01-03 22:14:48,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13380.3, 300 sec: 13232.2). Total num frames: 61661184. Throughput: 0: 3356.1. Samples: 4580974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:14:48,968][134211] Avg episode reward: [(0, '5.070')] [2025-01-03 22:14:52,109][134294] Updated weights for policy 0, policy_version 15064 (0.0027) [2025-01-03 22:14:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13243.7, 300 sec: 13218.3). Total num frames: 61722624. Throughput: 0: 3296.8. Samples: 4598898. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:14:53,968][134211] Avg episode reward: [(0, '5.412')] [2025-01-03 22:14:55,258][134294] Updated weights for policy 0, policy_version 15074 (0.0025) [2025-01-03 22:14:58,205][134294] Updated weights for policy 0, policy_version 15084 (0.0026) [2025-01-03 22:14:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13312.0, 300 sec: 13246.1). Total num frames: 61792256. Throughput: 0: 3292.1. Samples: 4619162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:14:58,968][134211] Avg episode reward: [(0, '5.616')] [2025-01-03 22:15:01,129][134294] Updated weights for policy 0, policy_version 15094 (0.0022) [2025-01-03 22:15:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 13232.2). Total num frames: 61857792. Throughput: 0: 3303.3. Samples: 4629454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:15:03,968][134211] Avg episode reward: [(0, '5.461')] [2025-01-03 22:15:04,375][134294] Updated weights for policy 0, policy_version 15104 (0.0025) [2025-01-03 22:15:07,236][134294] Updated weights for policy 0, policy_version 15114 (0.0022) [2025-01-03 22:15:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13243.7, 300 sec: 13232.2). Total num frames: 61927424. Throughput: 0: 3294.8. Samples: 4649636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:15:08,968][134211] Avg episode reward: [(0, '5.801')] [2025-01-03 22:15:10,198][134294] Updated weights for policy 0, policy_version 15124 (0.0024) [2025-01-03 22:15:13,116][134294] Updated weights for policy 0, policy_version 15134 (0.0025) [2025-01-03 22:15:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13312.0, 300 sec: 13246.0). Total num frames: 61997056. Throughput: 0: 3305.8. Samples: 4670456. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:15:13,969][134211] Avg episode reward: [(0, '5.847')] [2025-01-03 22:15:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015136_61997056.pth... [2025-01-03 22:15:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014361_58822656.pth [2025-01-03 22:15:16,325][134294] Updated weights for policy 0, policy_version 15144 (0.0026) [2025-01-03 22:15:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13380.3, 300 sec: 13259.9). Total num frames: 62066688. Throughput: 0: 3298.8. Samples: 4680164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:15:18,968][134211] Avg episode reward: [(0, '5.688')] [2025-01-03 22:15:19,254][134294] Updated weights for policy 0, policy_version 15154 (0.0026) [2025-01-03 22:15:22,160][134294] Updated weights for policy 0, policy_version 15164 (0.0025) [2025-01-03 22:15:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13312.0, 300 sec: 13246.1). Total num frames: 62132224. Throughput: 0: 3302.0. Samples: 4701068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:15:23,968][134211] Avg episode reward: [(0, '6.014')] [2025-01-03 22:15:25,231][134294] Updated weights for policy 0, policy_version 15174 (0.0025) [2025-01-03 22:15:28,176][134294] Updated weights for policy 0, policy_version 15184 (0.0028) [2025-01-03 22:15:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13312.0, 300 sec: 13246.1). Total num frames: 62201856. Throughput: 0: 3316.4. Samples: 4721528. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:15:28,968][134211] Avg episode reward: [(0, '5.760')] [2025-01-03 22:15:31,169][134294] Updated weights for policy 0, policy_version 15194 (0.0024) [2025-01-03 22:15:33,223][134294] Updated weights for policy 0, policy_version 15204 (0.0012) [2025-01-03 22:15:33,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13721.6, 300 sec: 13329.4). Total num frames: 62291968. Throughput: 0: 3351.7. Samples: 4731800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:15:33,968][134211] Avg episode reward: [(0, '5.921')] [2025-01-03 22:15:35,128][134294] Updated weights for policy 0, policy_version 15214 (0.0013) [2025-01-03 22:15:36,971][134294] Updated weights for policy 0, policy_version 15224 (0.0015) [2025-01-03 22:15:38,900][134294] Updated weights for policy 0, policy_version 15234 (0.0015) [2025-01-03 22:15:38,968][134211] Fps is (10 sec: 19661.0, 60 sec: 14336.1, 300 sec: 13468.2). Total num frames: 62398464. Throughput: 0: 3660.5. Samples: 4763618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:15:38,968][134211] Avg episode reward: [(0, '7.039')] [2025-01-03 22:15:38,969][134264] Saving new best policy, reward=7.039! [2025-01-03 22:15:40,767][134294] Updated weights for policy 0, policy_version 15244 (0.0013) [2025-01-03 22:15:43,753][134294] Updated weights for policy 0, policy_version 15254 (0.0025) [2025-01-03 22:15:43,968][134211] Fps is (10 sec: 18841.5, 60 sec: 14609.1, 300 sec: 13537.6). Total num frames: 62480384. Throughput: 0: 3833.7. Samples: 4791678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:15:43,968][134211] Avg episode reward: [(0, '6.097')] [2025-01-03 22:15:46,901][134294] Updated weights for policy 0, policy_version 15264 (0.0028) [2025-01-03 22:15:48,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.6, 300 sec: 13537.6). Total num frames: 62545920. Throughput: 0: 3812.9. Samples: 4801032. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:15:48,968][134211] Avg episode reward: [(0, '5.934')] [2025-01-03 22:15:50,120][134294] Updated weights for policy 0, policy_version 15274 (0.0028) [2025-01-03 22:15:53,228][134294] Updated weights for policy 0, policy_version 15284 (0.0027) [2025-01-03 22:15:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.9, 300 sec: 13537.8). Total num frames: 62611456. Throughput: 0: 3796.5. Samples: 4820480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:15:53,968][134211] Avg episode reward: [(0, '6.149')] [2025-01-03 22:15:56,212][134294] Updated weights for policy 0, policy_version 15294 (0.0025) [2025-01-03 22:15:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14745.6, 300 sec: 13537.6). Total num frames: 62676992. Throughput: 0: 3779.3. Samples: 4840526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:15:58,968][134211] Avg episode reward: [(0, '5.789')] [2025-01-03 22:15:59,405][134294] Updated weights for policy 0, policy_version 15304 (0.0025) [2025-01-03 22:16:02,445][134294] Updated weights for policy 0, policy_version 15314 (0.0025) [2025-01-03 22:16:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 13523.8). Total num frames: 62742528. Throughput: 0: 3784.2. Samples: 4850452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:03,968][134211] Avg episode reward: [(0, '5.659')] [2025-01-03 22:16:05,443][134294] Updated weights for policy 0, policy_version 15324 (0.0023) [2025-01-03 22:16:08,290][134294] Updated weights for policy 0, policy_version 15334 (0.0022) [2025-01-03 22:16:08,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14813.6, 300 sec: 13537.6). Total num frames: 62816256. Throughput: 0: 3780.2. Samples: 4871182. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:16:08,969][134211] Avg episode reward: [(0, '5.852')] [2025-01-03 22:16:11,773][134294] Updated weights for policy 0, policy_version 15344 (0.0028) [2025-01-03 22:16:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14609.1, 300 sec: 13509.9). Total num frames: 62873600. Throughput: 0: 3741.7. Samples: 4889904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:16:13,968][134211] Avg episode reward: [(0, '5.826')] [2025-01-03 22:16:15,060][134294] Updated weights for policy 0, policy_version 15354 (0.0026) [2025-01-03 22:16:17,970][134294] Updated weights for policy 0, policy_version 15364 (0.0026) [2025-01-03 22:16:18,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14609.0, 300 sec: 13509.9). Total num frames: 62943232. Throughput: 0: 3727.5. Samples: 4899538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:16:18,968][134211] Avg episode reward: [(0, '5.879')] [2025-01-03 22:16:21,055][134294] Updated weights for policy 0, policy_version 15374 (0.0027) [2025-01-03 22:16:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.0, 300 sec: 13509.8). Total num frames: 63008768. Throughput: 0: 3469.8. Samples: 4919760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:16:23,968][134211] Avg episode reward: [(0, '6.111')] [2025-01-03 22:16:24,255][134294] Updated weights for policy 0, policy_version 15384 (0.0025) [2025-01-03 22:16:27,195][134294] Updated weights for policy 0, policy_version 15394 (0.0024) [2025-01-03 22:16:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 13509.9). Total num frames: 63074304. Throughput: 0: 3292.5. Samples: 4939840. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:28,968][134211] Avg episode reward: [(0, '5.796')] [2025-01-03 22:16:30,230][134294] Updated weights for policy 0, policy_version 15404 (0.0024) [2025-01-03 22:16:33,115][134294] Updated weights for policy 0, policy_version 15414 (0.0025) [2025-01-03 22:16:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 13509.9). Total num frames: 63143936. Throughput: 0: 3317.2. Samples: 4950306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:33,968][134211] Avg episode reward: [(0, '5.462')] [2025-01-03 22:16:36,071][134294] Updated weights for policy 0, policy_version 15424 (0.0025) [2025-01-03 22:16:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13516.8, 300 sec: 13509.9). Total num frames: 63209472. Throughput: 0: 3332.9. Samples: 4970460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:38,968][134211] Avg episode reward: [(0, '5.062')] [2025-01-03 22:16:39,423][134294] Updated weights for policy 0, policy_version 15434 (0.0025) [2025-01-03 22:16:42,315][134294] Updated weights for policy 0, policy_version 15444 (0.0023) [2025-01-03 22:16:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13243.7, 300 sec: 13523.7). Total num frames: 63275008. Throughput: 0: 3331.3. Samples: 4990436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:43,968][134211] Avg episode reward: [(0, '5.626')] [2025-01-03 22:16:45,645][134294] Updated weights for policy 0, policy_version 15454 (0.0025) [2025-01-03 22:16:48,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13175.4, 300 sec: 13523.7). Total num frames: 63336448. Throughput: 0: 3312.2. Samples: 4999502. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:16:48,968][134211] Avg episode reward: [(0, '5.709')] [2025-01-03 22:16:49,122][134294] Updated weights for policy 0, policy_version 15464 (0.0025) [2025-01-03 22:16:52,403][134294] Updated weights for policy 0, policy_version 15474 (0.0026) [2025-01-03 22:16:53,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13243.7, 300 sec: 13551.5). Total num frames: 63406080. Throughput: 0: 3261.4. Samples: 5017942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:16:53,968][134211] Avg episode reward: [(0, '6.238')] [2025-01-03 22:16:54,658][134294] Updated weights for policy 0, policy_version 15484 (0.0016) [2025-01-03 22:16:56,520][134294] Updated weights for policy 0, policy_version 15494 (0.0014) [2025-01-03 22:16:58,415][134294] Updated weights for policy 0, policy_version 15504 (0.0012) [2025-01-03 22:16:58,968][134211] Fps is (10 sec: 18022.7, 60 sec: 13994.7, 300 sec: 13704.2). Total num frames: 63516672. Throughput: 0: 3518.3. Samples: 5048228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:16:58,968][134211] Avg episode reward: [(0, '5.994')] [2025-01-03 22:17:00,328][134294] Updated weights for policy 0, policy_version 15514 (0.0013) [2025-01-03 22:17:03,296][134294] Updated weights for policy 0, policy_version 15524 (0.0026) [2025-01-03 22:17:03,968][134211] Fps is (10 sec: 18431.3, 60 sec: 14131.1, 300 sec: 13732.0). Total num frames: 63590400. Throughput: 0: 3629.5. Samples: 5062868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:17:03,969][134211] Avg episode reward: [(0, '6.159')] [2025-01-03 22:17:06,556][134294] Updated weights for policy 0, policy_version 15534 (0.0025) [2025-01-03 22:17:08,969][134211] Fps is (10 sec: 13925.1, 60 sec: 13994.7, 300 sec: 13759.7). Total num frames: 63655936. Throughput: 0: 3602.5. Samples: 5081874. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:17:08,969][134211] Avg episode reward: [(0, '5.823')] [2025-01-03 22:17:09,862][134294] Updated weights for policy 0, policy_version 15544 (0.0026) [2025-01-03 22:17:12,907][134294] Updated weights for policy 0, policy_version 15554 (0.0024) [2025-01-03 22:17:13,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 63721472. Throughput: 0: 3588.1. Samples: 5101306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:13,968][134211] Avg episode reward: [(0, '5.691')] [2025-01-03 22:17:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015557_63721472.pth... [2025-01-03 22:17:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014744_60391424.pth [2025-01-03 22:17:16,068][134294] Updated weights for policy 0, policy_version 15564 (0.0025) [2025-01-03 22:17:18,923][134294] Updated weights for policy 0, policy_version 15574 (0.0024) [2025-01-03 22:17:18,968][134211] Fps is (10 sec: 13517.8, 60 sec: 14131.2, 300 sec: 13773.7). Total num frames: 63791104. Throughput: 0: 3575.7. Samples: 5111212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:18,968][134211] Avg episode reward: [(0, '5.707')] [2025-01-03 22:17:21,994][134294] Updated weights for policy 0, policy_version 15584 (0.0022) [2025-01-03 22:17:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 13773.7). Total num frames: 63856640. Throughput: 0: 3587.9. Samples: 5131916. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:23,968][134211] Avg episode reward: [(0, '5.934')] [2025-01-03 22:17:24,975][134294] Updated weights for policy 0, policy_version 15594 (0.0027) [2025-01-03 22:17:27,979][134294] Updated weights for policy 0, policy_version 15604 (0.0026) [2025-01-03 22:17:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 13759.8). Total num frames: 63926272. Throughput: 0: 3595.8. Samples: 5152248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:28,968][134211] Avg episode reward: [(0, '5.790')] [2025-01-03 22:17:31,040][134294] Updated weights for policy 0, policy_version 15614 (0.0027) [2025-01-03 22:17:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 63991808. Throughput: 0: 3624.1. Samples: 5162588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:33,968][134211] Avg episode reward: [(0, '5.187')] [2025-01-03 22:17:34,003][134294] Updated weights for policy 0, policy_version 15624 (0.0025) [2025-01-03 22:17:37,050][134294] Updated weights for policy 0, policy_version 15634 (0.0026) [2025-01-03 22:17:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 64057344. Throughput: 0: 3665.3. Samples: 5182882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:38,968][134211] Avg episode reward: [(0, '5.265')] [2025-01-03 22:17:40,096][134294] Updated weights for policy 0, policy_version 15644 (0.0025) [2025-01-03 22:17:42,938][134294] Updated weights for policy 0, policy_version 15654 (0.0023) [2025-01-03 22:17:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14267.7, 300 sec: 13815.3). Total num frames: 64131072. Throughput: 0: 3449.6. Samples: 5203462. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:43,968][134211] Avg episode reward: [(0, '5.407')] [2025-01-03 22:17:45,956][134294] Updated weights for policy 0, policy_version 15664 (0.0025) [2025-01-03 22:17:48,868][134294] Updated weights for policy 0, policy_version 15674 (0.0023) [2025-01-03 22:17:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.3, 300 sec: 13843.1). Total num frames: 64200704. Throughput: 0: 3356.5. Samples: 5213910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:17:48,968][134211] Avg episode reward: [(0, '5.231')] [2025-01-03 22:17:51,856][134294] Updated weights for policy 0, policy_version 15684 (0.0025) [2025-01-03 22:17:53,971][134211] Fps is (10 sec: 13922.1, 60 sec: 14403.5, 300 sec: 13842.9). Total num frames: 64270336. Throughput: 0: 3396.8. Samples: 5234740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:17:53,972][134211] Avg episode reward: [(0, '5.285')] [2025-01-03 22:17:54,862][134294] Updated weights for policy 0, policy_version 15694 (0.0022) [2025-01-03 22:17:57,807][134294] Updated weights for policy 0, policy_version 15704 (0.0027) [2025-01-03 22:17:58,968][134211] Fps is (10 sec: 13516.1, 60 sec: 13653.2, 300 sec: 13829.2). Total num frames: 64335872. Throughput: 0: 3419.7. Samples: 5255196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:17:58,969][134211] Avg episode reward: [(0, '5.387')] [2025-01-03 22:18:00,994][134294] Updated weights for policy 0, policy_version 15714 (0.0027) [2025-01-03 22:18:03,968][134211] Fps is (10 sec: 13111.2, 60 sec: 13516.8, 300 sec: 13829.2). Total num frames: 64401408. Throughput: 0: 3412.6. Samples: 5264780. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:03,969][134211] Avg episode reward: [(0, '5.857')] [2025-01-03 22:18:04,288][134294] Updated weights for policy 0, policy_version 15724 (0.0026) [2025-01-03 22:18:07,179][134294] Updated weights for policy 0, policy_version 15734 (0.0026) [2025-01-03 22:18:08,968][134211] Fps is (10 sec: 13108.1, 60 sec: 13517.0, 300 sec: 13815.4). Total num frames: 64466944. Throughput: 0: 3396.8. Samples: 5284770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:08,968][134211] Avg episode reward: [(0, '5.708')] [2025-01-03 22:18:10,182][134294] Updated weights for policy 0, policy_version 15744 (0.0021) [2025-01-03 22:18:13,074][134294] Updated weights for policy 0, policy_version 15754 (0.0026) [2025-01-03 22:18:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.0, 300 sec: 13829.2). Total num frames: 64536576. Throughput: 0: 3406.2. Samples: 5305528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:13,969][134211] Avg episode reward: [(0, '5.534')] [2025-01-03 22:18:16,260][134294] Updated weights for policy 0, policy_version 15764 (0.0025) [2025-01-03 22:18:18,837][134294] Updated weights for policy 0, policy_version 15774 (0.0019) [2025-01-03 22:18:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13653.4, 300 sec: 13857.0). Total num frames: 64610304. Throughput: 0: 3394.9. Samples: 5315358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:18,968][134211] Avg episode reward: [(0, '5.940')] [2025-01-03 22:18:20,730][134294] Updated weights for policy 0, policy_version 15784 (0.0013) [2025-01-03 22:18:22,899][134294] Updated weights for policy 0, policy_version 15794 (0.0016) [2025-01-03 22:18:23,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14131.2, 300 sec: 13968.1). Total num frames: 64704512. Throughput: 0: 3571.8. Samples: 5343612. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:23,968][134211] Avg episode reward: [(0, '5.674')] [2025-01-03 22:18:25,944][134294] Updated weights for policy 0, policy_version 15804 (0.0026) [2025-01-03 22:18:28,969][134211] Fps is (10 sec: 15972.5, 60 sec: 14062.7, 300 sec: 13954.1). Total num frames: 64770048. Throughput: 0: 3561.2. Samples: 5363718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:28,969][134211] Avg episode reward: [(0, '5.864')] [2025-01-03 22:18:29,147][134294] Updated weights for policy 0, policy_version 15814 (0.0028) [2025-01-03 22:18:32,461][134294] Updated weights for policy 0, policy_version 15824 (0.0027) [2025-01-03 22:18:33,968][134211] Fps is (10 sec: 12697.0, 60 sec: 13994.6, 300 sec: 13926.4). Total num frames: 64831488. Throughput: 0: 3535.7. Samples: 5373020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:33,969][134211] Avg episode reward: [(0, '5.723')] [2025-01-03 22:18:35,687][134294] Updated weights for policy 0, policy_version 15834 (0.0027) [2025-01-03 22:18:38,843][134294] Updated weights for policy 0, policy_version 15844 (0.0027) [2025-01-03 22:18:38,968][134211] Fps is (10 sec: 12698.9, 60 sec: 13994.7, 300 sec: 13926.4). Total num frames: 64897024. Throughput: 0: 3496.0. Samples: 5392048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:18:38,968][134211] Avg episode reward: [(0, '6.296')] [2025-01-03 22:18:41,835][134294] Updated weights for policy 0, policy_version 15854 (0.0026) [2025-01-03 22:18:43,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13858.1, 300 sec: 13912.5). Total num frames: 64962560. Throughput: 0: 3484.8. Samples: 5412010. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:18:43,969][134211] Avg episode reward: [(0, '5.968')] [2025-01-03 22:18:45,259][134294] Updated weights for policy 0, policy_version 15864 (0.0026) [2025-01-03 22:18:48,574][134294] Updated weights for policy 0, policy_version 15874 (0.0027) [2025-01-03 22:18:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 65019904. Throughput: 0: 3464.9. Samples: 5420700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:18:48,969][134211] Avg episode reward: [(0, '5.600')] [2025-01-03 22:18:51,477][134294] Updated weights for policy 0, policy_version 15884 (0.0019) [2025-01-03 22:18:53,381][134294] Updated weights for policy 0, policy_version 15894 (0.0013) [2025-01-03 22:18:53,967][134211] Fps is (10 sec: 14746.1, 60 sec: 13995.5, 300 sec: 13954.2). Total num frames: 65110016. Throughput: 0: 3492.8. Samples: 5441944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:18:53,968][134211] Avg episode reward: [(0, '5.091')] [2025-01-03 22:18:55,330][134294] Updated weights for policy 0, policy_version 15904 (0.0015) [2025-01-03 22:18:57,231][134294] Updated weights for policy 0, policy_version 15914 (0.0013) [2025-01-03 22:18:58,968][134211] Fps is (10 sec: 20071.0, 60 sec: 14745.8, 300 sec: 14093.0). Total num frames: 65220608. Throughput: 0: 3751.9. Samples: 5474364. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:18:58,968][134211] Avg episode reward: [(0, '5.659')] [2025-01-03 22:18:59,154][134294] Updated weights for policy 0, policy_version 15924 (0.0013) [2025-01-03 22:19:01,014][134294] Updated weights for policy 0, policy_version 15934 (0.0013) [2025-01-03 22:19:03,604][134294] Updated weights for policy 0, policy_version 15944 (0.0022) [2025-01-03 22:19:03,968][134211] Fps is (10 sec: 19660.0, 60 sec: 15086.9, 300 sec: 14148.5). Total num frames: 65306624. Throughput: 0: 3895.0. Samples: 5490634. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:19:03,969][134211] Avg episode reward: [(0, '5.775')] [2025-01-03 22:19:06,991][134294] Updated weights for policy 0, policy_version 15954 (0.0029) [2025-01-03 22:19:08,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15086.9, 300 sec: 14148.6). Total num frames: 65372160. Throughput: 0: 3710.0. Samples: 5510564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:19:08,968][134211] Avg episode reward: [(0, '5.736')] [2025-01-03 22:19:10,178][134294] Updated weights for policy 0, policy_version 15964 (0.0025) [2025-01-03 22:19:13,405][134294] Updated weights for policy 0, policy_version 15974 (0.0029) [2025-01-03 22:19:13,969][134211] Fps is (10 sec: 12696.3, 60 sec: 14950.1, 300 sec: 14134.6). Total num frames: 65433600. Throughput: 0: 3694.0. Samples: 5529948. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:19:13,970][134211] Avg episode reward: [(0, '5.783')] [2025-01-03 22:19:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015975_65433600.pth... [2025-01-03 22:19:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015136_61997056.pth [2025-01-03 22:19:16,741][134294] Updated weights for policy 0, policy_version 15984 (0.0028) [2025-01-03 22:19:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.5, 300 sec: 14106.9). Total num frames: 65495040. Throughput: 0: 3688.3. Samples: 5538990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:19:18,968][134211] Avg episode reward: [(0, '5.617')] [2025-01-03 22:19:20,071][134294] Updated weights for policy 0, policy_version 15994 (0.0026) [2025-01-03 22:19:23,048][134294] Updated weights for policy 0, policy_version 16004 (0.0025) [2025-01-03 22:19:23,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 65560576. 
Throughput: 0: 3694.4. Samples: 5558298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:19:23,968][134211] Avg episode reward: [(0, '5.775')] [2025-01-03 22:19:26,124][134294] Updated weights for policy 0, policy_version 16014 (0.0027) [2025-01-03 22:19:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14268.0, 300 sec: 14093.0). Total num frames: 65626112. Throughput: 0: 3688.1. Samples: 5577972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:19:28,968][134211] Avg episode reward: [(0, '6.217')] [2025-01-03 22:19:29,318][134294] Updated weights for policy 0, policy_version 16024 (0.0027) [2025-01-03 22:19:32,195][134294] Updated weights for policy 0, policy_version 16034 (0.0025) [2025-01-03 22:19:33,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14404.4, 300 sec: 14093.0). Total num frames: 65695744. Throughput: 0: 3720.9. Samples: 5588138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:19:33,968][134211] Avg episode reward: [(0, '5.806')] [2025-01-03 22:19:35,233][134294] Updated weights for policy 0, policy_version 16044 (0.0024) [2025-01-03 22:19:38,117][134294] Updated weights for policy 0, policy_version 16054 (0.0026) [2025-01-03 22:19:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14106.9). Total num frames: 65765376. Throughput: 0: 3715.2. Samples: 5609128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:19:38,968][134211] Avg episode reward: [(0, '5.906')] [2025-01-03 22:19:41,191][134294] Updated weights for policy 0, policy_version 16064 (0.0025) [2025-01-03 22:19:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14540.8, 300 sec: 14148.5). Total num frames: 65835008. Throughput: 0: 3443.8. Samples: 5629334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:19:43,968][134211] Avg episode reward: [(0, '5.950')] [2025-01-03 22:19:44,282][134294] Updated weights for policy 0, policy_version 16074 (0.0026) [2025-01-03 22:19:47,252][134294] Updated weights for policy 0, policy_version 16084 (0.0025) [2025-01-03 22:19:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.4, 300 sec: 14162.4). Total num frames: 65900544. Throughput: 0: 3310.8. Samples: 5639618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:19:48,968][134211] Avg episode reward: [(0, '5.548')] [2025-01-03 22:19:50,338][134294] Updated weights for policy 0, policy_version 16094 (0.0021) [2025-01-03 22:19:53,154][134294] Updated weights for policy 0, policy_version 16104 (0.0027) [2025-01-03 22:19:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14335.9, 300 sec: 14162.4). Total num frames: 65970176. Throughput: 0: 3322.7. Samples: 5660084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:19:53,968][134211] Avg episode reward: [(0, '6.051')] [2025-01-03 22:19:56,132][134294] Updated weights for policy 0, policy_version 16114 (0.0024) [2025-01-03 22:19:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13653.3, 300 sec: 14176.3). Total num frames: 66039808. Throughput: 0: 3351.0. Samples: 5680738. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:19:58,968][134211] Avg episode reward: [(0, '6.447')] [2025-01-03 22:19:59,233][134294] Updated weights for policy 0, policy_version 16124 (0.0025) [2025-01-03 22:20:02,190][134294] Updated weights for policy 0, policy_version 16134 (0.0023) [2025-01-03 22:20:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13312.0, 300 sec: 14162.4). Total num frames: 66105344. Throughput: 0: 3377.5. Samples: 5690978. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:20:03,968][134211] Avg episode reward: [(0, '6.082')] [2025-01-03 22:20:05,247][134294] Updated weights for policy 0, policy_version 16144 (0.0025) [2025-01-03 22:20:08,105][134294] Updated weights for policy 0, policy_version 16154 (0.0025) [2025-01-03 22:20:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 14162.4). Total num frames: 66174976. Throughput: 0: 3406.8. Samples: 5711602. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:20:08,968][134211] Avg episode reward: [(0, '6.236')] [2025-01-03 22:20:11,045][134294] Updated weights for policy 0, policy_version 16164 (0.0025) [2025-01-03 22:20:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13517.1, 300 sec: 14162.4). Total num frames: 66244608. Throughput: 0: 3428.8. Samples: 5732270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:20:13,968][134211] Avg episode reward: [(0, '5.818')] [2025-01-03 22:20:14,232][134294] Updated weights for policy 0, policy_version 16174 (0.0027) [2025-01-03 22:20:17,273][134294] Updated weights for policy 0, policy_version 16184 (0.0024) [2025-01-03 22:20:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13585.1, 300 sec: 14162.4). Total num frames: 66310144. Throughput: 0: 3421.7. Samples: 5742116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:20:18,968][134211] Avg episode reward: [(0, '5.752')] [2025-01-03 22:20:20,297][134294] Updated weights for policy 0, policy_version 16194 (0.0024) [2025-01-03 22:20:23,208][134294] Updated weights for policy 0, policy_version 16204 (0.0026) [2025-01-03 22:20:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13653.3, 300 sec: 14162.4). Total num frames: 66379776. Throughput: 0: 3408.9. Samples: 5762528. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:20:23,968][134211] Avg episode reward: [(0, '6.256')] [2025-01-03 22:20:26,165][134294] Updated weights for policy 0, policy_version 16214 (0.0024) [2025-01-03 22:20:28,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13721.6, 300 sec: 14093.0). Total num frames: 66449408. Throughput: 0: 3415.8. Samples: 5783044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:20:28,969][134211] Avg episode reward: [(0, '5.783')] [2025-01-03 22:20:29,315][134294] Updated weights for policy 0, policy_version 16224 (0.0028) [2025-01-03 22:20:32,221][134294] Updated weights for policy 0, policy_version 16234 (0.0025) [2025-01-03 22:20:33,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13653.3, 300 sec: 13954.2). Total num frames: 66514944. Throughput: 0: 3416.8. Samples: 5793374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:20:33,968][134211] Avg episode reward: [(0, '5.707')] [2025-01-03 22:20:35,102][134294] Updated weights for policy 0, policy_version 16244 (0.0024) [2025-01-03 22:20:37,172][134294] Updated weights for policy 0, policy_version 16254 (0.0014) [2025-01-03 22:20:38,968][134211] Fps is (10 sec: 15565.1, 60 sec: 13994.6, 300 sec: 13981.9). Total num frames: 66605056. Throughput: 0: 3494.7. Samples: 5817346. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:20:38,968][134211] Avg episode reward: [(0, '5.783')] [2025-01-03 22:20:39,781][134294] Updated weights for policy 0, policy_version 16264 (0.0020) [2025-01-03 22:20:42,775][134294] Updated weights for policy 0, policy_version 16274 (0.0022) [2025-01-03 22:20:43,969][134211] Fps is (10 sec: 15563.4, 60 sec: 13926.2, 300 sec: 13981.9). Total num frames: 66670592. Throughput: 0: 3519.3. Samples: 5839108. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:20:43,969][134211] Avg episode reward: [(0, '5.891')] [2025-01-03 22:20:46,125][134294] Updated weights for policy 0, policy_version 16284 (0.0028) [2025-01-03 22:20:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.9, 300 sec: 13954.2). Total num frames: 66727936. Throughput: 0: 3492.0. Samples: 5848118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:20:48,968][134211] Avg episode reward: [(0, '5.886')] [2025-01-03 22:20:49,705][134294] Updated weights for policy 0, policy_version 16294 (0.0024) [2025-01-03 22:20:51,822][134294] Updated weights for policy 0, policy_version 16304 (0.0013) [2025-01-03 22:20:53,735][134294] Updated weights for policy 0, policy_version 16314 (0.0014) [2025-01-03 22:20:53,968][134211] Fps is (10 sec: 15566.3, 60 sec: 14267.8, 300 sec: 14065.3). Total num frames: 66826240. Throughput: 0: 3527.3. Samples: 5870332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:20:53,969][134211] Avg episode reward: [(0, '5.451')] [2025-01-03 22:20:55,644][134294] Updated weights for policy 0, policy_version 16324 (0.0015) [2025-01-03 22:20:58,319][134294] Updated weights for policy 0, policy_version 16334 (0.0022) [2025-01-03 22:20:58,968][134211] Fps is (10 sec: 18022.3, 60 sec: 14472.5, 300 sec: 14120.8). Total num frames: 66908160. Throughput: 0: 3696.9. Samples: 5898632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:20:58,968][134211] Avg episode reward: [(0, '5.742')] [2025-01-03 22:21:01,523][134294] Updated weights for policy 0, policy_version 16344 (0.0028) [2025-01-03 22:21:03,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14472.5, 300 sec: 14093.1). Total num frames: 66973696. Throughput: 0: 3694.3. Samples: 5908360. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:03,969][134211] Avg episode reward: [(0, '5.738')] [2025-01-03 22:21:04,778][134294] Updated weights for policy 0, policy_version 16354 (0.0027) [2025-01-03 22:21:07,950][134294] Updated weights for policy 0, policy_version 16364 (0.0029) [2025-01-03 22:21:08,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14404.2, 300 sec: 14120.8). Total num frames: 67039232. Throughput: 0: 3667.0. Samples: 5927542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:08,969][134211] Avg episode reward: [(0, '5.778')] [2025-01-03 22:21:11,048][134294] Updated weights for policy 0, policy_version 16374 (0.0023) [2025-01-03 22:21:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 67100672. Throughput: 0: 3636.8. Samples: 5946700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:13,969][134211] Avg episode reward: [(0, '5.554')] [2025-01-03 22:21:14,037][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016383_67104768.pth... [2025-01-03 22:21:14,116][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015557_63721472.pth [2025-01-03 22:21:14,455][134294] Updated weights for policy 0, policy_version 16384 (0.0025) [2025-01-03 22:21:17,547][134294] Updated weights for policy 0, policy_version 16394 (0.0026) [2025-01-03 22:21:18,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 67166208. Throughput: 0: 3616.0. Samples: 5956094. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:21:18,969][134211] Avg episode reward: [(0, '5.806')] [2025-01-03 22:21:20,549][134294] Updated weights for policy 0, policy_version 16404 (0.0025) [2025-01-03 22:21:23,470][134294] Updated weights for policy 0, policy_version 16414 (0.0024) [2025-01-03 22:21:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.8, 300 sec: 14106.9). Total num frames: 67235840. Throughput: 0: 3544.8. Samples: 5976864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:21:23,969][134211] Avg episode reward: [(0, '5.245')] [2025-01-03 22:21:26,437][134294] Updated weights for policy 0, policy_version 16424 (0.0025) [2025-01-03 22:21:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14199.5, 300 sec: 14093.0). Total num frames: 67301376. Throughput: 0: 3502.9. Samples: 5996736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:28,968][134211] Avg episode reward: [(0, '6.042')] [2025-01-03 22:21:29,613][134294] Updated weights for policy 0, policy_version 16434 (0.0026) [2025-01-03 22:21:32,618][134294] Updated weights for policy 0, policy_version 16444 (0.0027) [2025-01-03 22:21:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.7, 300 sec: 14106.9). Total num frames: 67371008. Throughput: 0: 3531.6. Samples: 6007042. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:33,968][134211] Avg episode reward: [(0, '5.464')] [2025-01-03 22:21:35,509][134294] Updated weights for policy 0, policy_version 16454 (0.0026) [2025-01-03 22:21:38,404][134294] Updated weights for policy 0, policy_version 16464 (0.0026) [2025-01-03 22:21:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.4, 300 sec: 14120.8). Total num frames: 67440640. Throughput: 0: 3504.9. Samples: 6028052. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:21:38,968][134211] Avg episode reward: [(0, '5.807')] [2025-01-03 22:21:41,447][134294] Updated weights for policy 0, policy_version 16474 (0.0024) [2025-01-03 22:21:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.9, 300 sec: 14148.6). Total num frames: 67510272. Throughput: 0: 3327.5. Samples: 6048370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:21:43,968][134211] Avg episode reward: [(0, '5.603')] [2025-01-03 22:21:44,562][134294] Updated weights for policy 0, policy_version 16484 (0.0025) [2025-01-03 22:21:47,579][134294] Updated weights for policy 0, policy_version 16494 (0.0023) [2025-01-03 22:21:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14134.7). Total num frames: 67575808. Throughput: 0: 3333.9. Samples: 6058384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:21:48,968][134211] Avg episode reward: [(0, '5.175')] [2025-01-03 22:21:50,644][134294] Updated weights for policy 0, policy_version 16504 (0.0024) [2025-01-03 22:21:53,171][134294] Updated weights for policy 0, policy_version 16514 (0.0018) [2025-01-03 22:21:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13789.9, 300 sec: 14023.6). Total num frames: 67653632. Throughput: 0: 3355.0. Samples: 6078516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:53,968][134211] Avg episode reward: [(0, '6.042')] [2025-01-03 22:21:55,636][134294] Updated weights for policy 0, policy_version 16524 (0.0019) [2025-01-03 22:21:58,600][134294] Updated weights for policy 0, policy_version 16534 (0.0024) [2025-01-03 22:21:58,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13653.3, 300 sec: 14023.6). Total num frames: 67727360. Throughput: 0: 3466.5. Samples: 6102690. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:21:58,968][134211] Avg episode reward: [(0, '5.708')] [2025-01-03 22:22:01,557][134294] Updated weights for policy 0, policy_version 16544 (0.0027) [2025-01-03 22:22:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13653.3, 300 sec: 14023.6). Total num frames: 67792896. Throughput: 0: 3483.7. Samples: 6112862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:03,968][134211] Avg episode reward: [(0, '5.596')] [2025-01-03 22:22:04,694][134294] Updated weights for policy 0, policy_version 16554 (0.0027) [2025-01-03 22:22:07,706][134294] Updated weights for policy 0, policy_version 16564 (0.0023) [2025-01-03 22:22:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.4, 300 sec: 14023.6). Total num frames: 67858432. Throughput: 0: 3466.9. Samples: 6132876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:08,968][134211] Avg episode reward: [(0, '6.271')] [2025-01-03 22:22:11,000][134294] Updated weights for policy 0, policy_version 16574 (0.0026) [2025-01-03 22:22:13,316][134294] Updated weights for policy 0, policy_version 16584 (0.0014) [2025-01-03 22:22:13,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13994.7, 300 sec: 14065.3). Total num frames: 67940352. Throughput: 0: 3508.5. Samples: 6154618. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:22:13,968][134211] Avg episode reward: [(0, '5.826')] [2025-01-03 22:22:15,392][134294] Updated weights for policy 0, policy_version 16594 (0.0012) [2025-01-03 22:22:17,340][134294] Updated weights for policy 0, policy_version 16604 (0.0014) [2025-01-03 22:22:18,967][134211] Fps is (10 sec: 18432.5, 60 sec: 14609.2, 300 sec: 14190.2). Total num frames: 68042752. Throughput: 0: 3611.9. Samples: 6169578. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:22:18,968][134211] Avg episode reward: [(0, '5.831')] [2025-01-03 22:22:19,223][134294] Updated weights for policy 0, policy_version 16614 (0.0013) [2025-01-03 22:22:21,292][134294] Updated weights for policy 0, policy_version 16624 (0.0018) [2025-01-03 22:22:23,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14813.9, 300 sec: 14231.9). Total num frames: 68124672. Throughput: 0: 3804.1. Samples: 6199236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:22:23,969][134211] Avg episode reward: [(0, '6.200')] [2025-01-03 22:22:24,436][134294] Updated weights for policy 0, policy_version 16634 (0.0026) [2025-01-03 22:22:27,636][134294] Updated weights for policy 0, policy_version 16644 (0.0030) [2025-01-03 22:22:28,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14813.9, 300 sec: 14231.9). Total num frames: 68190208. Throughput: 0: 3778.4. Samples: 6218396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:28,968][134211] Avg episode reward: [(0, '5.826')] [2025-01-03 22:22:30,811][134294] Updated weights for policy 0, policy_version 16654 (0.0026) [2025-01-03 22:22:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14677.3, 300 sec: 14218.0). Total num frames: 68251648. Throughput: 0: 3770.1. Samples: 6228040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:33,968][134211] Avg episode reward: [(0, '6.085')] [2025-01-03 22:22:34,077][134294] Updated weights for policy 0, policy_version 16664 (0.0025) [2025-01-03 22:22:37,019][134294] Updated weights for policy 0, policy_version 16674 (0.0028) [2025-01-03 22:22:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14609.1, 300 sec: 14190.2). Total num frames: 68317184. Throughput: 0: 3759.2. Samples: 6247680. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:22:38,968][134211] Avg episode reward: [(0, '6.320')] [2025-01-03 22:22:40,179][134294] Updated weights for policy 0, policy_version 16684 (0.0025) [2025-01-03 22:22:43,050][134294] Updated weights for policy 0, policy_version 16694 (0.0026) [2025-01-03 22:22:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.0, 300 sec: 14190.2). Total num frames: 68386816. Throughput: 0: 3674.0. Samples: 6268022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:22:43,968][134211] Avg episode reward: [(0, '5.988')] [2025-01-03 22:22:46,397][134294] Updated weights for policy 0, policy_version 16704 (0.0025) [2025-01-03 22:22:48,971][134211] Fps is (10 sec: 13103.0, 60 sec: 14540.1, 300 sec: 14162.4). Total num frames: 68448256. Throughput: 0: 3659.7. Samples: 6277558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:48,972][134211] Avg episode reward: [(0, '5.806')] [2025-01-03 22:22:49,939][134294] Updated weights for policy 0, policy_version 16714 (0.0026) [2025-01-03 22:22:53,139][134294] Updated weights for policy 0, policy_version 16724 (0.0029) [2025-01-03 22:22:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14267.7, 300 sec: 14148.6). Total num frames: 68509696. Throughput: 0: 3610.2. Samples: 6295334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:53,968][134211] Avg episode reward: [(0, '5.832')] [2025-01-03 22:22:56,416][134294] Updated weights for policy 0, policy_version 16734 (0.0027) [2025-01-03 22:22:58,968][134211] Fps is (10 sec: 12701.7, 60 sec: 14131.2, 300 sec: 14148.6). Total num frames: 68575232. Throughput: 0: 3557.0. Samples: 6314682. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:22:58,968][134211] Avg episode reward: [(0, '6.299')] [2025-01-03 22:22:59,438][134294] Updated weights for policy 0, policy_version 16744 (0.0024) [2025-01-03 22:23:02,480][134294] Updated weights for policy 0, policy_version 16754 (0.0027) [2025-01-03 22:23:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14148.5). Total num frames: 68640768. Throughput: 0: 3450.6. Samples: 6324858. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:23:03,968][134211] Avg episode reward: [(0, '5.996')] [2025-01-03 22:23:05,497][134294] Updated weights for policy 0, policy_version 16764 (0.0029) [2025-01-03 22:23:08,466][134294] Updated weights for policy 0, policy_version 16774 (0.0021) [2025-01-03 22:23:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14199.5, 300 sec: 14148.6). Total num frames: 68710400. Throughput: 0: 3249.4. Samples: 6345458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:23:08,968][134211] Avg episode reward: [(0, '5.621')] [2025-01-03 22:23:11,316][134294] Updated weights for policy 0, policy_version 16784 (0.0024) [2025-01-03 22:23:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.6, 300 sec: 14134.7). Total num frames: 68780032. Throughput: 0: 3284.1. Samples: 6366182. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:23:13,968][134211] Avg episode reward: [(0, '5.860')] [2025-01-03 22:23:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016792_68780032.pth... 
[2025-01-03 22:23:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015975_65433600.pth [2025-01-03 22:23:14,451][134294] Updated weights for policy 0, policy_version 16794 (0.0025) [2025-01-03 22:23:17,446][134294] Updated weights for policy 0, policy_version 16804 (0.0025) [2025-01-03 22:23:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13448.5, 300 sec: 14051.4). Total num frames: 68849664. Throughput: 0: 3289.5. Samples: 6376066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:23:18,968][134211] Avg episode reward: [(0, '5.773')] [2025-01-03 22:23:20,432][134294] Updated weights for policy 0, policy_version 16814 (0.0024) [2025-01-03 22:23:23,613][134294] Updated weights for policy 0, policy_version 16824 (0.0023) [2025-01-03 22:23:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13175.5, 300 sec: 14051.4). Total num frames: 68915200. Throughput: 0: 3309.5. Samples: 6396608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:23:23,968][134211] Avg episode reward: [(0, '5.208')] [2025-01-03 22:23:26,493][134294] Updated weights for policy 0, policy_version 16834 (0.0021) [2025-01-03 22:23:28,798][134294] Updated weights for policy 0, policy_version 16844 (0.0016) [2025-01-03 22:23:28,968][134211] Fps is (10 sec: 14335.3, 60 sec: 13380.2, 300 sec: 14106.9). Total num frames: 68993024. Throughput: 0: 3361.8. Samples: 6419306. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:23:28,969][134211] Avg episode reward: [(0, '5.587')] [2025-01-03 22:23:31,212][134294] Updated weights for policy 0, policy_version 16854 (0.0018) [2025-01-03 22:23:33,656][134294] Updated weights for policy 0, policy_version 16864 (0.0020) [2025-01-03 22:23:33,968][134211] Fps is (10 sec: 16383.9, 60 sec: 13789.9, 300 sec: 14176.3). Total num frames: 69079040. Throughput: 0: 3419.1. Samples: 6431408. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:23:33,969][134211] Avg episode reward: [(0, '5.905')] [2025-01-03 22:23:36,780][134294] Updated weights for policy 0, policy_version 16874 (0.0027) [2025-01-03 22:23:38,968][134211] Fps is (10 sec: 14746.4, 60 sec: 13721.6, 300 sec: 14162.5). Total num frames: 69140480. Throughput: 0: 3509.7. Samples: 6453272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:23:38,968][134211] Avg episode reward: [(0, '5.491')] [2025-01-03 22:23:40,004][134294] Updated weights for policy 0, policy_version 16884 (0.0024) [2025-01-03 22:23:43,008][134294] Updated weights for policy 0, policy_version 16894 (0.0026) [2025-01-03 22:23:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13653.4, 300 sec: 14190.2). Total num frames: 69206016. Throughput: 0: 3519.1. Samples: 6473040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:23:43,968][134211] Avg episode reward: [(0, '5.407')] [2025-01-03 22:23:46,399][134294] Updated weights for policy 0, policy_version 16904 (0.0028) [2025-01-03 22:23:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13654.1, 300 sec: 14093.0). Total num frames: 69267456. Throughput: 0: 3493.9. Samples: 6482082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:23:48,968][134211] Avg episode reward: [(0, '5.201')] [2025-01-03 22:23:49,752][134294] Updated weights for policy 0, policy_version 16914 (0.0030) [2025-01-03 22:23:52,768][134294] Updated weights for policy 0, policy_version 16924 (0.0024) [2025-01-03 22:23:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.6, 300 sec: 13940.3). Total num frames: 69332992. Throughput: 0: 3463.6. Samples: 6501322. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:23:53,968][134211] Avg episode reward: [(0, '5.103')] [2025-01-03 22:23:55,790][134294] Updated weights for policy 0, policy_version 16934 (0.0027) [2025-01-03 22:23:58,650][134294] Updated weights for policy 0, policy_version 16944 (0.0023) [2025-01-03 22:23:58,968][134211] Fps is (10 sec: 13925.4, 60 sec: 13858.0, 300 sec: 13898.6). Total num frames: 69406720. Throughput: 0: 3467.2. Samples: 6522208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:23:58,969][134211] Avg episode reward: [(0, '5.737')] [2025-01-03 22:24:00,547][134294] Updated weights for policy 0, policy_version 16954 (0.0014) [2025-01-03 22:24:02,491][134294] Updated weights for policy 0, policy_version 16964 (0.0012) [2025-01-03 22:24:03,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14404.3, 300 sec: 14009.7). Total num frames: 69505024. Throughput: 0: 3584.7. Samples: 6537378. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:24:03,968][134211] Avg episode reward: [(0, '5.655')] [2025-01-03 22:24:05,214][134294] Updated weights for policy 0, policy_version 16974 (0.0024) [2025-01-03 22:24:08,186][134294] Updated weights for policy 0, policy_version 16984 (0.0026) [2025-01-03 22:24:08,968][134211] Fps is (10 sec: 16794.6, 60 sec: 14404.3, 300 sec: 14037.5). Total num frames: 69574656. Throughput: 0: 3663.1. Samples: 6561446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:24:08,968][134211] Avg episode reward: [(0, '5.304')] [2025-01-03 22:24:11,200][134294] Updated weights for policy 0, policy_version 16994 (0.0029) [2025-01-03 22:24:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14051.4). Total num frames: 69640192. Throughput: 0: 3598.0. Samples: 6581214. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:24:13,968][134211] Avg episode reward: [(0, '5.784')] [2025-01-03 22:24:14,463][134294] Updated weights for policy 0, policy_version 17004 (0.0026) [2025-01-03 22:24:17,506][134294] Updated weights for policy 0, policy_version 17014 (0.0026) [2025-01-03 22:24:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.7, 300 sec: 14051.4). Total num frames: 69705728. Throughput: 0: 3552.7. Samples: 6591280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:24:18,968][134211] Avg episode reward: [(0, '5.528')] [2025-01-03 22:24:20,563][134294] Updated weights for policy 0, policy_version 17024 (0.0025) [2025-01-03 22:24:23,541][134294] Updated weights for policy 0, policy_version 17034 (0.0025) [2025-01-03 22:24:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14336.0, 300 sec: 14065.2). Total num frames: 69775360. Throughput: 0: 3513.6. Samples: 6611386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:24:23,968][134211] Avg episode reward: [(0, '5.579')] [2025-01-03 22:24:26,488][134294] Updated weights for policy 0, policy_version 17044 (0.0023) [2025-01-03 22:24:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14065.2). Total num frames: 69844992. Throughput: 0: 3531.4. Samples: 6631954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:24:28,968][134211] Avg episode reward: [(0, '6.226')] [2025-01-03 22:24:29,537][134294] Updated weights for policy 0, policy_version 17054 (0.0025) [2025-01-03 22:24:32,516][134294] Updated weights for policy 0, policy_version 17064 (0.0024) [2025-01-03 22:24:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.2, 300 sec: 14051.4). Total num frames: 69910528. Throughput: 0: 3558.1. Samples: 6642196. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:24:33,968][134211] Avg episode reward: [(0, '5.740')] [2025-01-03 22:24:35,575][134294] Updated weights for policy 0, policy_version 17074 (0.0025) [2025-01-03 22:24:38,778][134294] Updated weights for policy 0, policy_version 17084 (0.0025) [2025-01-03 22:24:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.4, 300 sec: 14037.5). Total num frames: 69976064. Throughput: 0: 3573.4. Samples: 6662124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:24:38,968][134211] Avg episode reward: [(0, '5.911')] [2025-01-03 22:24:42,015][134294] Updated weights for policy 0, policy_version 17094 (0.0026) [2025-01-03 22:24:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14037.5). Total num frames: 70041600. Throughput: 0: 3526.9. Samples: 6680918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:24:43,968][134211] Avg episode reward: [(0, '6.240')] [2025-01-03 22:24:45,250][134294] Updated weights for policy 0, policy_version 17104 (0.0026) [2025-01-03 22:24:48,659][134294] Updated weights for policy 0, policy_version 17114 (0.0026) [2025-01-03 22:24:48,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13926.4, 300 sec: 14009.7). Total num frames: 70103040. Throughput: 0: 3408.2. Samples: 6690746. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:24:48,968][134211] Avg episode reward: [(0, '5.997')] [2025-01-03 22:24:50,786][134294] Updated weights for policy 0, policy_version 17124 (0.0014) [2025-01-03 22:24:52,851][134294] Updated weights for policy 0, policy_version 17134 (0.0013) [2025-01-03 22:24:53,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14472.6, 300 sec: 14106.9). Total num frames: 70201344. Throughput: 0: 3415.4. Samples: 6715136. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:24:53,968][134211] Avg episode reward: [(0, '6.305')] [2025-01-03 22:24:54,713][134294] Updated weights for policy 0, policy_version 17144 (0.0014) [2025-01-03 22:24:57,649][134294] Updated weights for policy 0, policy_version 17154 (0.0026) [2025-01-03 22:24:58,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14540.9, 300 sec: 14148.6). Total num frames: 70279168. Throughput: 0: 3543.7. Samples: 6740680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:24:58,968][134211] Avg episode reward: [(0, '5.957')] [2025-01-03 22:25:00,691][134294] Updated weights for policy 0, policy_version 17164 (0.0028) [2025-01-03 22:25:03,855][134294] Updated weights for policy 0, policy_version 17174 (0.0026) [2025-01-03 22:25:03,968][134211] Fps is (10 sec: 14335.5, 60 sec: 13994.6, 300 sec: 14134.7). Total num frames: 70344704. Throughput: 0: 3544.9. Samples: 6750800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:25:03,968][134211] Avg episode reward: [(0, '5.573')] [2025-01-03 22:25:06,769][134294] Updated weights for policy 0, policy_version 17184 (0.0024) [2025-01-03 22:25:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14134.7). Total num frames: 70414336. Throughput: 0: 3545.0. Samples: 6770912. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:25:08,968][134211] Avg episode reward: [(0, '5.538')] [2025-01-03 22:25:09,895][134294] Updated weights for policy 0, policy_version 17194 (0.0026) [2025-01-03 22:25:12,992][134294] Updated weights for policy 0, policy_version 17204 (0.0029) [2025-01-03 22:25:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.6, 300 sec: 14134.7). Total num frames: 70479872. Throughput: 0: 3528.7. Samples: 6790746. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:13,968][134211] Avg episode reward: [(0, '5.802')] [2025-01-03 22:25:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017207_70479872.pth... [2025-01-03 22:25:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016383_67104768.pth [2025-01-03 22:25:16,026][134294] Updated weights for policy 0, policy_version 17214 (0.0026) [2025-01-03 22:25:18,933][134294] Updated weights for policy 0, policy_version 17224 (0.0024) [2025-01-03 22:25:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14063.0, 300 sec: 14134.7). Total num frames: 70549504. Throughput: 0: 3526.9. Samples: 6800908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:18,968][134211] Avg episode reward: [(0, '5.972')] [2025-01-03 22:25:21,964][134294] Updated weights for policy 0, policy_version 17234 (0.0026) [2025-01-03 22:25:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 70615040. Throughput: 0: 3545.6. Samples: 6821678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:23,968][134211] Avg episode reward: [(0, '5.594')] [2025-01-03 22:25:24,937][134294] Updated weights for policy 0, policy_version 17244 (0.0024) [2025-01-03 22:25:27,967][134294] Updated weights for policy 0, policy_version 17254 (0.0026) [2025-01-03 22:25:28,968][134211] Fps is (10 sec: 13515.7, 60 sec: 13994.5, 300 sec: 14134.6). Total num frames: 70684672. Throughput: 0: 3578.3. Samples: 6841944. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:28,969][134211] Avg episode reward: [(0, '6.361')] [2025-01-03 22:25:30,937][134294] Updated weights for policy 0, policy_version 17264 (0.0025) [2025-01-03 22:25:33,831][134294] Updated weights for policy 0, policy_version 17274 (0.0023) [2025-01-03 22:25:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14065.2). Total num frames: 70754304. Throughput: 0: 3592.6. Samples: 6852414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:33,968][134211] Avg episode reward: [(0, '5.392')] [2025-01-03 22:25:36,846][134294] Updated weights for policy 0, policy_version 17284 (0.0025) [2025-01-03 22:25:38,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14131.2, 300 sec: 14079.2). Total num frames: 70823936. Throughput: 0: 3514.9. Samples: 6873306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:25:38,969][134211] Avg episode reward: [(0, '6.441')] [2025-01-03 22:25:39,840][134294] Updated weights for policy 0, policy_version 17294 (0.0026) [2025-01-03 22:25:42,834][134294] Updated weights for policy 0, policy_version 17304 (0.0025) [2025-01-03 22:25:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 70889472. Throughput: 0: 3396.6. Samples: 6893526. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:25:43,968][134211] Avg episode reward: [(0, '6.466')] [2025-01-03 22:25:45,854][134294] Updated weights for policy 0, policy_version 17314 (0.0024) [2025-01-03 22:25:48,786][134294] Updated weights for policy 0, policy_version 17324 (0.0028) [2025-01-03 22:25:48,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14267.5, 300 sec: 14009.7). Total num frames: 70959104. Throughput: 0: 3399.6. Samples: 6903784. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:25:48,969][134211] Avg episode reward: [(0, '5.895')] [2025-01-03 22:25:51,784][134294] Updated weights for policy 0, policy_version 17334 (0.0024) [2025-01-03 22:25:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13789.8, 300 sec: 13968.1). Total num frames: 71028736. Throughput: 0: 3419.6. Samples: 6924794. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:53,968][134211] Avg episode reward: [(0, '6.116')] [2025-01-03 22:25:54,701][134294] Updated weights for policy 0, policy_version 17344 (0.0026) [2025-01-03 22:25:57,769][134294] Updated weights for policy 0, policy_version 17354 (0.0023) [2025-01-03 22:25:58,968][134211] Fps is (10 sec: 13517.7, 60 sec: 13585.1, 300 sec: 13968.1). Total num frames: 71094272. Throughput: 0: 3426.8. Samples: 6944950. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:25:58,968][134211] Avg episode reward: [(0, '6.442')] [2025-01-03 22:26:00,722][134294] Updated weights for policy 0, policy_version 17364 (0.0025) [2025-01-03 22:26:03,176][134294] Updated weights for policy 0, policy_version 17374 (0.0017) [2025-01-03 22:26:03,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13926.4, 300 sec: 14037.5). Total num frames: 71180288. Throughput: 0: 3436.6. Samples: 6955556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:26:03,968][134211] Avg episode reward: [(0, '5.893')] [2025-01-03 22:26:05,502][134294] Updated weights for policy 0, policy_version 17384 (0.0020) [2025-01-03 22:26:08,433][134294] Updated weights for policy 0, policy_version 17394 (0.0026) [2025-01-03 22:26:08,968][134211] Fps is (10 sec: 15564.3, 60 sec: 13926.3, 300 sec: 14065.2). Total num frames: 71249920. Throughput: 0: 3523.3. Samples: 6980228. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:26:08,969][134211] Avg episode reward: [(0, '6.081')] [2025-01-03 22:26:11,407][134294] Updated weights for policy 0, policy_version 17404 (0.0024) [2025-01-03 22:26:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14079.1). Total num frames: 71319552. Throughput: 0: 3527.2. Samples: 7000664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:26:13,968][134211] Avg episode reward: [(0, '5.966')] [2025-01-03 22:26:14,531][134294] Updated weights for policy 0, policy_version 17414 (0.0030) [2025-01-03 22:26:17,684][134294] Updated weights for policy 0, policy_version 17424 (0.0026) [2025-01-03 22:26:18,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13926.4, 300 sec: 14065.2). Total num frames: 71385088. Throughput: 0: 3515.3. Samples: 7010602. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:26:18,968][134211] Avg episode reward: [(0, '6.344')] [2025-01-03 22:26:20,644][134294] Updated weights for policy 0, policy_version 17434 (0.0022) [2025-01-03 22:26:22,557][134294] Updated weights for policy 0, policy_version 17444 (0.0012) [2025-01-03 22:26:23,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14404.3, 300 sec: 14162.5). Total num frames: 71479296. Throughput: 0: 3569.8. Samples: 7033948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:26:23,968][134211] Avg episode reward: [(0, '5.886')] [2025-01-03 22:26:24,388][134294] Updated weights for policy 0, policy_version 17454 (0.0013) [2025-01-03 22:26:26,297][134294] Updated weights for policy 0, policy_version 17464 (0.0013) [2025-01-03 22:26:28,189][134294] Updated weights for policy 0, policy_version 17474 (0.0013) [2025-01-03 22:26:28,967][134211] Fps is (10 sec: 20480.6, 60 sec: 15087.2, 300 sec: 14301.3). Total num frames: 71589888. Throughput: 0: 3846.6. Samples: 7066620. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:26:28,968][134211] Avg episode reward: [(0, '6.030')] [2025-01-03 22:26:30,028][134294] Updated weights for policy 0, policy_version 17484 (0.0015) [2025-01-03 22:26:32,863][134294] Updated weights for policy 0, policy_version 17494 (0.0025) [2025-01-03 22:26:33,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15223.5, 300 sec: 14329.1). Total num frames: 71667712. Throughput: 0: 3950.7. Samples: 7081562. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:26:33,968][134211] Avg episode reward: [(0, '5.531')] [2025-01-03 22:26:36,305][134294] Updated weights for policy 0, policy_version 17504 (0.0028) [2025-01-03 22:26:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15086.9, 300 sec: 14301.3). Total num frames: 71729152. Throughput: 0: 3899.0. Samples: 7100250. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:26:38,968][134211] Avg episode reward: [(0, '5.674')] [2025-01-03 22:26:39,575][134294] Updated weights for policy 0, policy_version 17514 (0.0028) [2025-01-03 22:26:42,794][134294] Updated weights for policy 0, policy_version 17524 (0.0027) [2025-01-03 22:26:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15086.9, 300 sec: 14301.3). Total num frames: 71794688. Throughput: 0: 3875.2. Samples: 7119336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:26:43,968][134211] Avg episode reward: [(0, '5.662')] [2025-01-03 22:26:45,711][134294] Updated weights for policy 0, policy_version 17534 (0.0025) [2025-01-03 22:26:48,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14950.4, 300 sec: 14245.7). Total num frames: 71856128. Throughput: 0: 3865.6. Samples: 7129512. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:26:48,969][134211] Avg episode reward: [(0, '5.586')] [2025-01-03 22:26:49,113][134294] Updated weights for policy 0, policy_version 17544 (0.0026) [2025-01-03 22:26:52,681][134294] Updated weights for policy 0, policy_version 17554 (0.0026) [2025-01-03 22:26:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14745.6, 300 sec: 14190.2). Total num frames: 71913472. Throughput: 0: 3708.2. Samples: 7147094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:26:53,968][134211] Avg episode reward: [(0, '5.809')] [2025-01-03 22:26:56,049][134294] Updated weights for policy 0, policy_version 17564 (0.0026) [2025-01-03 22:26:58,968][134211] Fps is (10 sec: 11879.1, 60 sec: 14677.3, 300 sec: 14176.3). Total num frames: 71974912. Throughput: 0: 3658.9. Samples: 7165314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:26:58,968][134211] Avg episode reward: [(0, '6.046')] [2025-01-03 22:26:59,371][134294] Updated weights for policy 0, policy_version 17574 (0.0027) [2025-01-03 22:27:02,337][134294] Updated weights for policy 0, policy_version 17584 (0.0026) [2025-01-03 22:27:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 72044544. Throughput: 0: 3659.4. Samples: 7175276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:27:03,968][134211] Avg episode reward: [(0, '5.598')] [2025-01-03 22:27:05,410][134294] Updated weights for policy 0, policy_version 17594 (0.0025) [2025-01-03 22:27:08,322][134294] Updated weights for policy 0, policy_version 17604 (0.0022) [2025-01-03 22:27:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.4, 300 sec: 14148.6). Total num frames: 72114176. Throughput: 0: 3595.9. Samples: 7195764. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:27:08,968][134211] Avg episode reward: [(0, '5.625')] [2025-01-03 22:27:11,282][134294] Updated weights for policy 0, policy_version 17614 (0.0024) [2025-01-03 22:27:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14336.0, 300 sec: 14023.6). Total num frames: 72179712. Throughput: 0: 3321.9. Samples: 7216106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:27:13,969][134211] Avg episode reward: [(0, '6.477')] [2025-01-03 22:27:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017622_72179712.pth... [2025-01-03 22:27:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016792_68780032.pth [2025-01-03 22:27:14,437][134294] Updated weights for policy 0, policy_version 17624 (0.0025) [2025-01-03 22:27:17,494][134294] Updated weights for policy 0, policy_version 17634 (0.0024) [2025-01-03 22:27:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 13968.1). Total num frames: 72245248. Throughput: 0: 3209.6. Samples: 7225994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:27:18,968][134211] Avg episode reward: [(0, '5.425')] [2025-01-03 22:27:20,401][134294] Updated weights for policy 0, policy_version 17644 (0.0025) [2025-01-03 22:27:23,379][134294] Updated weights for policy 0, policy_version 17654 (0.0024) [2025-01-03 22:27:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13926.3, 300 sec: 13981.9). Total num frames: 72314880. Throughput: 0: 3257.4. Samples: 7246832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:27:23,969][134211] Avg episode reward: [(0, '5.723')] [2025-01-03 22:27:26,335][134294] Updated weights for policy 0, policy_version 17664 (0.0024) [2025-01-03 22:27:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13243.7, 300 sec: 14009.7). Total num frames: 72384512. 
Throughput: 0: 3287.1. Samples: 7267254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:27:28,968][134211] Avg episode reward: [(0, '5.346')] [2025-01-03 22:27:29,414][134294] Updated weights for policy 0, policy_version 17674 (0.0027) [2025-01-03 22:27:32,545][134294] Updated weights for policy 0, policy_version 17684 (0.0023) [2025-01-03 22:27:33,968][134211] Fps is (10 sec: 13517.3, 60 sec: 13038.9, 300 sec: 14009.7). Total num frames: 72450048. Throughput: 0: 3275.4. Samples: 7276904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:27:33,968][134211] Avg episode reward: [(0, '5.888')] [2025-01-03 22:27:35,556][134294] Updated weights for policy 0, policy_version 17694 (0.0024) [2025-01-03 22:27:38,494][134294] Updated weights for policy 0, policy_version 17704 (0.0023) [2025-01-03 22:27:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13175.5, 300 sec: 14009.7). Total num frames: 72519680. Throughput: 0: 3343.5. Samples: 7297550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:27:38,968][134211] Avg episode reward: [(0, '6.057')] [2025-01-03 22:27:41,531][134294] Updated weights for policy 0, policy_version 17714 (0.0025) [2025-01-03 22:27:43,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13243.6, 300 sec: 14037.6). Total num frames: 72589312. Throughput: 0: 3392.5. Samples: 7317980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:27:43,969][134211] Avg episode reward: [(0, '6.083')] [2025-01-03 22:27:44,543][134294] Updated weights for policy 0, policy_version 17724 (0.0025) [2025-01-03 22:27:47,543][134294] Updated weights for policy 0, policy_version 17734 (0.0023) [2025-01-03 22:27:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13312.1, 300 sec: 14051.4). Total num frames: 72654848. Throughput: 0: 3394.1. Samples: 7328010. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:27:48,968][134211] Avg episode reward: [(0, '5.926')] [2025-01-03 22:27:50,561][134294] Updated weights for policy 0, policy_version 17744 (0.0027) [2025-01-03 22:27:53,514][134294] Updated weights for policy 0, policy_version 17754 (0.0023) [2025-01-03 22:27:53,969][134211] Fps is (10 sec: 13515.9, 60 sec: 13516.5, 300 sec: 14065.2). Total num frames: 72724480. Throughput: 0: 3400.4. Samples: 7348786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:27:53,970][134211] Avg episode reward: [(0, '5.960')] [2025-01-03 22:27:56,499][134294] Updated weights for policy 0, policy_version 17764 (0.0023) [2025-01-03 22:27:58,389][134294] Updated weights for policy 0, policy_version 17774 (0.0015) [2025-01-03 22:27:58,968][134211] Fps is (10 sec: 15974.6, 60 sec: 13994.7, 300 sec: 14148.6). Total num frames: 72814592. Throughput: 0: 3478.0. Samples: 7372616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:27:58,968][134211] Avg episode reward: [(0, '5.230')] [2025-01-03 22:28:00,318][134294] Updated weights for policy 0, policy_version 17784 (0.0014) [2025-01-03 22:28:02,164][134294] Updated weights for policy 0, policy_version 17794 (0.0012) [2025-01-03 22:28:03,968][134211] Fps is (10 sec: 19663.3, 60 sec: 14609.1, 300 sec: 14273.5). Total num frames: 72921088. Throughput: 0: 3620.5. Samples: 7388916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:03,968][134211] Avg episode reward: [(0, '5.725')] [2025-01-03 22:28:04,114][134294] Updated weights for policy 0, policy_version 17804 (0.0013) [2025-01-03 22:28:06,015][134294] Updated weights for policy 0, policy_version 17814 (0.0013) [2025-01-03 22:28:08,877][134294] Updated weights for policy 0, policy_version 17824 (0.0023) [2025-01-03 22:28:08,968][134211] Fps is (10 sec: 19249.9, 60 sec: 14882.0, 300 sec: 14329.0). Total num frames: 73007104. Throughput: 0: 3836.8. Samples: 7419490. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:08,969][134211] Avg episode reward: [(0, '5.729')] [2025-01-03 22:28:12,130][134294] Updated weights for policy 0, policy_version 17834 (0.0028) [2025-01-03 22:28:13,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14813.9, 300 sec: 14301.3). Total num frames: 73068544. Throughput: 0: 3801.6. Samples: 7438328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:13,968][134211] Avg episode reward: [(0, '6.095')] [2025-01-03 22:28:15,248][134294] Updated weights for policy 0, policy_version 17844 (0.0025) [2025-01-03 22:28:18,391][134294] Updated weights for policy 0, policy_version 17854 (0.0024) [2025-01-03 22:28:18,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14813.8, 300 sec: 14301.3). Total num frames: 73134080. Throughput: 0: 3812.7. Samples: 7448478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:18,968][134211] Avg episode reward: [(0, '5.702')] [2025-01-03 22:28:21,468][134294] Updated weights for policy 0, policy_version 17864 (0.0028) [2025-01-03 22:28:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14745.6, 300 sec: 14259.6). Total num frames: 73199616. Throughput: 0: 3789.0. Samples: 7468056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:23,969][134211] Avg episode reward: [(0, '5.460')] [2025-01-03 22:28:24,778][134294] Updated weights for policy 0, policy_version 17874 (0.0026) [2025-01-03 22:28:27,972][134294] Updated weights for policy 0, policy_version 17884 (0.0027) [2025-01-03 22:28:28,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14677.2, 300 sec: 14190.2). Total num frames: 73265152. Throughput: 0: 3759.7. Samples: 7487166. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:28,969][134211] Avg episode reward: [(0, '5.655')] [2025-01-03 22:28:30,963][134294] Updated weights for policy 0, policy_version 17894 (0.0026) [2025-01-03 22:28:33,869][134294] Updated weights for policy 0, policy_version 17904 (0.0024) [2025-01-03 22:28:33,969][134211] Fps is (10 sec: 13515.4, 60 sec: 14745.3, 300 sec: 14217.9). Total num frames: 73334784. Throughput: 0: 3768.4. Samples: 7497594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:33,970][134211] Avg episode reward: [(0, '5.725')] [2025-01-03 22:28:36,781][134294] Updated weights for policy 0, policy_version 17914 (0.0025) [2025-01-03 22:28:38,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14677.4, 300 sec: 14218.0). Total num frames: 73400320. Throughput: 0: 3768.9. Samples: 7518380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:38,968][134211] Avg episode reward: [(0, '6.171')] [2025-01-03 22:28:39,940][134294] Updated weights for policy 0, policy_version 17924 (0.0024) [2025-01-03 22:28:42,905][134294] Updated weights for policy 0, policy_version 17934 (0.0024) [2025-01-03 22:28:43,968][134211] Fps is (10 sec: 13518.5, 60 sec: 14677.5, 300 sec: 14245.7). Total num frames: 73469952. Throughput: 0: 3684.8. Samples: 7538434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:28:43,968][134211] Avg episode reward: [(0, '6.216')] [2025-01-03 22:28:46,084][134294] Updated weights for policy 0, policy_version 17944 (0.0027) [2025-01-03 22:28:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 14231.9). Total num frames: 73531392. Throughput: 0: 3535.4. Samples: 7548010. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:28:48,968][134211] Avg episode reward: [(0, '6.309')] [2025-01-03 22:28:49,393][134294] Updated weights for policy 0, policy_version 17954 (0.0026) [2025-01-03 22:28:52,810][134294] Updated weights for policy 0, policy_version 17964 (0.0024) [2025-01-03 22:28:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14472.8, 300 sec: 14190.2). Total num frames: 73592832. Throughput: 0: 3264.6. Samples: 7566396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:28:53,968][134211] Avg episode reward: [(0, '6.189')] [2025-01-03 22:28:56,158][134294] Updated weights for policy 0, policy_version 17974 (0.0024) [2025-01-03 22:28:58,971][134211] Fps is (10 sec: 12284.2, 60 sec: 13993.9, 300 sec: 14065.1). Total num frames: 73654272. Throughput: 0: 3250.4. Samples: 7584606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:28:58,971][134211] Avg episode reward: [(0, '6.465')] [2025-01-03 22:28:59,513][134294] Updated weights for policy 0, policy_version 17984 (0.0026) [2025-01-03 22:29:02,662][134294] Updated weights for policy 0, policy_version 17994 (0.0026) [2025-01-03 22:29:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13243.7, 300 sec: 14037.5). Total num frames: 73715712. Throughput: 0: 3245.0. Samples: 7594502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:03,968][134211] Avg episode reward: [(0, '6.158')] [2025-01-03 22:29:05,874][134294] Updated weights for policy 0, policy_version 18004 (0.0028) [2025-01-03 22:29:08,212][134294] Updated weights for policy 0, policy_version 18014 (0.0016) [2025-01-03 22:29:08,968][134211] Fps is (10 sec: 14340.7, 60 sec: 13175.6, 300 sec: 14093.0). Total num frames: 73797632. Throughput: 0: 3242.9. Samples: 7613986. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:08,968][134211] Avg episode reward: [(0, '6.188')] [2025-01-03 22:29:10,157][134294] Updated weights for policy 0, policy_version 18024 (0.0012) [2025-01-03 22:29:12,022][134294] Updated weights for policy 0, policy_version 18034 (0.0013) [2025-01-03 22:29:13,968][134211] Fps is (10 sec: 18841.5, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 73904128. Throughput: 0: 3524.9. Samples: 7645784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:13,968][134211] Avg episode reward: [(0, '5.746')] [2025-01-03 22:29:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018043_73904128.pth... [2025-01-03 22:29:14,025][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017207_70479872.pth [2025-01-03 22:29:14,216][134294] Updated weights for policy 0, policy_version 18044 (0.0015) [2025-01-03 22:29:18,064][134294] Updated weights for policy 0, policy_version 18054 (0.0033) [2025-01-03 22:29:18,970][134211] Fps is (10 sec: 15971.0, 60 sec: 13721.2, 300 sec: 14176.2). Total num frames: 73957376. Throughput: 0: 3508.9. Samples: 7655498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:18,970][134211] Avg episode reward: [(0, '5.945')] [2025-01-03 22:29:21,665][134294] Updated weights for policy 0, policy_version 18064 (0.0031) [2025-01-03 22:29:23,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13653.3, 300 sec: 14148.6). Total num frames: 74018816. Throughput: 0: 3420.5. Samples: 7672302. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:29:23,969][134211] Avg episode reward: [(0, '6.173')] [2025-01-03 22:29:24,975][134294] Updated weights for policy 0, policy_version 18074 (0.0028) [2025-01-03 22:29:28,072][134294] Updated weights for policy 0, policy_version 18084 (0.0025) [2025-01-03 22:29:28,968][134211] Fps is (10 sec: 12290.4, 60 sec: 13585.2, 300 sec: 14134.7). Total num frames: 74080256. Throughput: 0: 3399.3. Samples: 7691404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:29:28,968][134211] Avg episode reward: [(0, '6.109')] [2025-01-03 22:29:31,366][134294] Updated weights for policy 0, policy_version 18094 (0.0027) [2025-01-03 22:29:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13517.0, 300 sec: 14134.7). Total num frames: 74145792. Throughput: 0: 3392.9. Samples: 7700690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:29:33,968][134211] Avg episode reward: [(0, '6.705')] [2025-01-03 22:29:34,720][134294] Updated weights for policy 0, policy_version 18104 (0.0027) [2025-01-03 22:29:37,811][134294] Updated weights for policy 0, policy_version 18114 (0.0026) [2025-01-03 22:29:38,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13448.3, 300 sec: 14120.7). Total num frames: 74207232. Throughput: 0: 3411.9. Samples: 7719936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:38,969][134211] Avg episode reward: [(0, '6.692')] [2025-01-03 22:29:40,825][134294] Updated weights for policy 0, policy_version 18124 (0.0028) [2025-01-03 22:29:43,939][134294] Updated weights for policy 0, policy_version 18134 (0.0027) [2025-01-03 22:29:43,970][134211] Fps is (10 sec: 13104.5, 60 sec: 13448.0, 300 sec: 14148.4). Total num frames: 74276864. Throughput: 0: 3455.8. Samples: 7740112. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:43,971][134211] Avg episode reward: [(0, '5.920')] [2025-01-03 22:29:46,914][134294] Updated weights for policy 0, policy_version 18144 (0.0025) [2025-01-03 22:29:48,968][134211] Fps is (10 sec: 13518.0, 60 sec: 13516.8, 300 sec: 14037.5). Total num frames: 74342400. Throughput: 0: 3456.1. Samples: 7750026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:48,969][134211] Avg episode reward: [(0, '6.376')] [2025-01-03 22:29:50,096][134294] Updated weights for policy 0, policy_version 18154 (0.0024) [2025-01-03 22:29:53,184][134294] Updated weights for policy 0, policy_version 18164 (0.0027) [2025-01-03 22:29:53,968][134211] Fps is (10 sec: 13109.8, 60 sec: 13585.0, 300 sec: 13995.8). Total num frames: 74407936. Throughput: 0: 3463.1. Samples: 7769828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:53,969][134211] Avg episode reward: [(0, '5.809')] [2025-01-03 22:29:56,399][134294] Updated weights for policy 0, policy_version 18174 (0.0025) [2025-01-03 22:29:58,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13585.8, 300 sec: 13981.9). Total num frames: 74469376. Throughput: 0: 3180.1. Samples: 7788890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:29:58,968][134211] Avg episode reward: [(0, '5.428')] [2025-01-03 22:29:59,448][134294] Updated weights for policy 0, policy_version 18184 (0.0025) [2025-01-03 22:30:01,394][134294] Updated weights for policy 0, policy_version 18194 (0.0016) [2025-01-03 22:30:03,335][134294] Updated weights for policy 0, policy_version 18204 (0.0013) [2025-01-03 22:30:03,968][134211] Fps is (10 sec: 16794.1, 60 sec: 14336.0, 300 sec: 14106.9). Total num frames: 74575872. Throughput: 0: 3273.4. Samples: 7802792. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:30:03,968][134211] Avg episode reward: [(0, '5.926')] [2025-01-03 22:30:05,225][134294] Updated weights for policy 0, policy_version 18214 (0.0014) [2025-01-03 22:30:07,313][134294] Updated weights for policy 0, policy_version 18224 (0.0017) [2025-01-03 22:30:08,968][134211] Fps is (10 sec: 19250.5, 60 sec: 14404.2, 300 sec: 14176.3). Total num frames: 74661888. Throughput: 0: 3591.5. Samples: 7833922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:30:08,970][134211] Avg episode reward: [(0, '5.898')] [2025-01-03 22:30:10,952][134294] Updated weights for policy 0, policy_version 18234 (0.0029) [2025-01-03 22:30:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13653.3, 300 sec: 14148.6). Total num frames: 74723328. Throughput: 0: 3567.6. Samples: 7851946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:13,968][134211] Avg episode reward: [(0, '5.574')] [2025-01-03 22:30:14,250][134294] Updated weights for policy 0, policy_version 18244 (0.0025) [2025-01-03 22:30:17,449][134294] Updated weights for policy 0, policy_version 18254 (0.0027) [2025-01-03 22:30:18,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13790.3, 300 sec: 14134.7). Total num frames: 74784768. Throughput: 0: 3569.0. Samples: 7861294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:18,968][134211] Avg episode reward: [(0, '6.037')] [2025-01-03 22:30:20,551][134294] Updated weights for policy 0, policy_version 18264 (0.0026) [2025-01-03 22:30:23,889][134294] Updated weights for policy 0, policy_version 18274 (0.0025) [2025-01-03 22:30:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 14120.8). Total num frames: 74850304. Throughput: 0: 3570.1. Samples: 7880586. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:23,968][134211] Avg episode reward: [(0, '5.846')] [2025-01-03 22:30:27,042][134294] Updated weights for policy 0, policy_version 18284 (0.0026) [2025-01-03 22:30:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13858.1, 300 sec: 14093.0). Total num frames: 74911744. Throughput: 0: 3539.6. Samples: 7899384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:30:28,968][134211] Avg episode reward: [(0, '5.830')] [2025-01-03 22:30:30,285][134294] Updated weights for policy 0, policy_version 18294 (0.0026) [2025-01-03 22:30:33,209][134294] Updated weights for policy 0, policy_version 18304 (0.0027) [2025-01-03 22:30:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.4, 300 sec: 14093.0). Total num frames: 74981376. Throughput: 0: 3544.1. Samples: 7909508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:30:33,968][134211] Avg episode reward: [(0, '6.111')] [2025-01-03 22:30:36,326][134294] Updated weights for policy 0, policy_version 18314 (0.0026) [2025-01-03 22:30:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.9, 300 sec: 14093.0). Total num frames: 75046912. Throughput: 0: 3547.1. Samples: 7929446. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:30:38,968][134211] Avg episode reward: [(0, '5.870')] [2025-01-03 22:30:39,523][134294] Updated weights for policy 0, policy_version 18324 (0.0027) [2025-01-03 22:30:42,592][134294] Updated weights for policy 0, policy_version 18334 (0.0026) [2025-01-03 22:30:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.9, 300 sec: 14079.2). Total num frames: 75112448. Throughput: 0: 3557.5. Samples: 7948980. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:43,969][134211] Avg episode reward: [(0, '6.549')] [2025-01-03 22:30:45,661][134294] Updated weights for policy 0, policy_version 18344 (0.0025) [2025-01-03 22:30:48,675][134294] Updated weights for policy 0, policy_version 18354 (0.0026) [2025-01-03 22:30:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14065.2). Total num frames: 75177984. Throughput: 0: 3476.3. Samples: 7959228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:48,968][134211] Avg episode reward: [(0, '6.175')] [2025-01-03 22:30:52,055][134294] Updated weights for policy 0, policy_version 18364 (0.0027) [2025-01-03 22:30:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.9, 300 sec: 14037.5). Total num frames: 75235328. Throughput: 0: 3203.8. Samples: 7978094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:53,968][134211] Avg episode reward: [(0, '5.712')] [2025-01-03 22:30:55,653][134294] Updated weights for policy 0, policy_version 18374 (0.0024) [2025-01-03 22:30:58,353][134294] Updated weights for policy 0, policy_version 18384 (0.0019) [2025-01-03 22:30:58,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.7, 300 sec: 13995.8). Total num frames: 75309056. Throughput: 0: 3237.5. Samples: 7997632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:30:58,968][134211] Avg episode reward: [(0, '5.516')] [2025-01-03 22:31:00,942][134294] Updated weights for policy 0, policy_version 18394 (0.0019) [2025-01-03 22:31:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13380.2, 300 sec: 13995.8). Total num frames: 75378688. Throughput: 0: 3295.6. Samples: 8009594. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:31:03,968][134211] Avg episode reward: [(0, '5.622')] [2025-01-03 22:31:04,025][134294] Updated weights for policy 0, policy_version 18404 (0.0026) [2025-01-03 22:31:07,089][134294] Updated weights for policy 0, policy_version 18414 (0.0024) [2025-01-03 22:31:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13039.0, 300 sec: 13981.9). Total num frames: 75444224. Throughput: 0: 3307.0. Samples: 8029400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:31:08,968][134211] Avg episode reward: [(0, '6.297')] [2025-01-03 22:31:10,254][134294] Updated weights for policy 0, policy_version 18424 (0.0027) [2025-01-03 22:31:13,200][134294] Updated weights for policy 0, policy_version 18434 (0.0025) [2025-01-03 22:31:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.5, 300 sec: 13995.8). Total num frames: 75513856. Throughput: 0: 3337.3. Samples: 8049564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:31:13,968][134211] Avg episode reward: [(0, '6.082')] [2025-01-03 22:31:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018436_75513856.pth... [2025-01-03 22:31:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017622_72179712.pth [2025-01-03 22:31:16,456][134294] Updated weights for policy 0, policy_version 18444 (0.0027) [2025-01-03 22:31:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13243.8, 300 sec: 13898.6). Total num frames: 75579392. Throughput: 0: 3322.9. Samples: 8059036. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:31:18,968][134211] Avg episode reward: [(0, '5.929')] [2025-01-03 22:31:19,434][134294] Updated weights for policy 0, policy_version 18454 (0.0027) [2025-01-03 22:31:22,611][134294] Updated weights for policy 0, policy_version 18464 (0.0026) [2025-01-03 22:31:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13380.3, 300 sec: 13773.7). Total num frames: 75653120. Throughput: 0: 3322.1. Samples: 8078940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:31:23,968][134211] Avg episode reward: [(0, '5.625')] [2025-01-03 22:31:24,623][134294] Updated weights for policy 0, policy_version 18474 (0.0013) [2025-01-03 22:31:26,564][134294] Updated weights for policy 0, policy_version 18484 (0.0014) [2025-01-03 22:31:28,443][134294] Updated weights for policy 0, policy_version 18494 (0.0013) [2025-01-03 22:31:28,967][134211] Fps is (10 sec: 18022.6, 60 sec: 14131.2, 300 sec: 13870.9). Total num frames: 75759616. Throughput: 0: 3573.8. Samples: 8109800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:31:28,968][134211] Avg episode reward: [(0, '6.023')] [2025-01-03 22:31:30,398][134294] Updated weights for policy 0, policy_version 18504 (0.0013) [2025-01-03 22:31:32,282][134294] Updated weights for policy 0, policy_version 18514 (0.0014) [2025-01-03 22:31:33,968][134211] Fps is (10 sec: 20070.2, 60 sec: 14540.8, 300 sec: 13981.9). Total num frames: 75853824. Throughput: 0: 3701.5. Samples: 8125796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 22:31:33,968][134211] Avg episode reward: [(0, '6.135')] [2025-01-03 22:31:35,444][134294] Updated weights for policy 0, policy_version 18524 (0.0027) [2025-01-03 22:31:38,757][134294] Updated weights for policy 0, policy_version 18534 (0.0025) [2025-01-03 22:31:38,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14472.5, 300 sec: 13968.1). Total num frames: 75915264. Throughput: 0: 3759.7. Samples: 8147280. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 22:31:38,968][134211] Avg episode reward: [(0, '5.693')] [2025-01-03 22:31:41,967][134294] Updated weights for policy 0, policy_version 18544 (0.0028) [2025-01-03 22:31:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14472.5, 300 sec: 13982.0). Total num frames: 75980800. Throughput: 0: 3739.8. Samples: 8165924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:31:43,968][134211] Avg episode reward: [(0, '5.491')] [2025-01-03 22:31:45,214][134294] Updated weights for policy 0, policy_version 18554 (0.0027) [2025-01-03 22:31:48,234][134294] Updated weights for policy 0, policy_version 18564 (0.0027) [2025-01-03 22:31:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14009.7). Total num frames: 76046336. Throughput: 0: 3694.2. Samples: 8175832. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:31:48,968][134211] Avg episode reward: [(0, '5.817')] [2025-01-03 22:31:51,507][134294] Updated weights for policy 0, policy_version 18574 (0.0023) [2025-01-03 22:31:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.5, 300 sec: 13995.8). Total num frames: 76103680. Throughput: 0: 3677.1. Samples: 8194870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:31:53,969][134211] Avg episode reward: [(0, '5.871')] [2025-01-03 22:31:55,173][134294] Updated weights for policy 0, policy_version 18584 (0.0030) [2025-01-03 22:31:58,333][134294] Updated weights for policy 0, policy_version 18594 (0.0022) [2025-01-03 22:31:58,970][134211] Fps is (10 sec: 12285.4, 60 sec: 14335.5, 300 sec: 13981.8). Total num frames: 76169216. Throughput: 0: 3631.7. Samples: 8213000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:31:58,970][134211] Avg episode reward: [(0, '6.080')] [2025-01-03 22:32:01,660][134294] Updated weights for policy 0, policy_version 18604 (0.0026) [2025-01-03 22:32:03,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14131.2, 300 sec: 13940.3). 
Total num frames: 76226560. Throughput: 0: 3625.4. Samples: 8222180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:32:03,968][134211] Avg episode reward: [(0, '5.844')] [2025-01-03 22:32:05,313][134294] Updated weights for policy 0, policy_version 18614 (0.0028) [2025-01-03 22:32:08,407][134294] Updated weights for policy 0, policy_version 18624 (0.0024) [2025-01-03 22:32:08,968][134211] Fps is (10 sec: 11880.8, 60 sec: 14062.9, 300 sec: 13926.4). Total num frames: 76288000. Throughput: 0: 3582.5. Samples: 8240152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:32:08,968][134211] Avg episode reward: [(0, '5.554')] [2025-01-03 22:32:11,677][134294] Updated weights for policy 0, policy_version 18634 (0.0026) [2025-01-03 22:32:13,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13926.4, 300 sec: 13912.5). Total num frames: 76349440. Throughput: 0: 3319.7. Samples: 8259188. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:32:13,968][134211] Avg episode reward: [(0, '5.628')] [2025-01-03 22:32:15,042][134294] Updated weights for policy 0, policy_version 18644 (0.0028) [2025-01-03 22:32:18,299][134294] Updated weights for policy 0, policy_version 18654 (0.0026) [2025-01-03 22:32:18,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13926.4, 300 sec: 13898.7). Total num frames: 76414976. Throughput: 0: 3166.3. Samples: 8268280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:32:18,968][134211] Avg episode reward: [(0, '6.049')] [2025-01-03 22:32:20,584][134294] Updated weights for policy 0, policy_version 18664 (0.0013) [2025-01-03 22:32:23,361][134294] Updated weights for policy 0, policy_version 18674 (0.0025) [2025-01-03 22:32:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13994.7, 300 sec: 13926.4). Total num frames: 76492800. Throughput: 0: 3200.1. Samples: 8291284. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:32:23,968][134211] Avg episode reward: [(0, '6.344')] [2025-01-03 22:32:26,721][134294] Updated weights for policy 0, policy_version 18684 (0.0029) [2025-01-03 22:32:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13311.9, 300 sec: 13926.4). Total num frames: 76558336. Throughput: 0: 3210.3. Samples: 8310386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:32:28,969][134211] Avg episode reward: [(0, '5.363')] [2025-01-03 22:32:29,811][134294] Updated weights for policy 0, policy_version 18694 (0.0027) [2025-01-03 22:32:32,918][134294] Updated weights for policy 0, policy_version 18704 (0.0025) [2025-01-03 22:32:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 13912.5). Total num frames: 76623872. Throughput: 0: 3207.4. Samples: 8320166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:32:33,968][134211] Avg episode reward: [(0, '5.863')] [2025-01-03 22:32:36,014][134294] Updated weights for policy 0, policy_version 18714 (0.0026) [2025-01-03 22:32:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12834.1, 300 sec: 13884.8). Total num frames: 76685312. Throughput: 0: 3225.8. Samples: 8340028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:32:38,968][134211] Avg episode reward: [(0, '6.107')] [2025-01-03 22:32:39,236][134294] Updated weights for policy 0, policy_version 18724 (0.0023) [2025-01-03 22:32:42,263][134294] Updated weights for policy 0, policy_version 18734 (0.0026) [2025-01-03 22:32:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12902.4, 300 sec: 13898.6). Total num frames: 76754944. Throughput: 0: 3256.9. Samples: 8359552. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:32:43,968][134211] Avg episode reward: [(0, '6.719')] [2025-01-03 22:32:45,412][134294] Updated weights for policy 0, policy_version 18744 (0.0027) [2025-01-03 22:32:47,380][134294] Updated weights for policy 0, policy_version 18754 (0.0015) [2025-01-03 22:32:48,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13243.8, 300 sec: 13954.2). Total num frames: 76840960. Throughput: 0: 3311.6. Samples: 8371204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:32:48,968][134211] Avg episode reward: [(0, '6.331')] [2025-01-03 22:32:50,105][134294] Updated weights for policy 0, policy_version 18764 (0.0021) [2025-01-03 22:32:53,661][134294] Updated weights for policy 0, policy_version 18774 (0.0027) [2025-01-03 22:32:53,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13243.7, 300 sec: 13843.1). Total num frames: 76898304. Throughput: 0: 3412.5. Samples: 8393714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:32:53,969][134211] Avg episode reward: [(0, '5.851')] [2025-01-03 22:32:56,988][134294] Updated weights for policy 0, policy_version 18784 (0.0025) [2025-01-03 22:32:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13380.8, 300 sec: 13732.0). Total num frames: 76972032. Throughput: 0: 3434.1. Samples: 8413722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:32:58,968][134211] Avg episode reward: [(0, '5.891')] [2025-01-03 22:32:59,438][134294] Updated weights for policy 0, policy_version 18794 (0.0019) [2025-01-03 22:33:02,485][134294] Updated weights for policy 0, policy_version 18804 (0.0025) [2025-01-03 22:33:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.8, 300 sec: 13662.6). Total num frames: 77037568. Throughput: 0: 3466.5. Samples: 8424274. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:03,969][134211] Avg episode reward: [(0, '6.344')] [2025-01-03 22:33:05,660][134294] Updated weights for policy 0, policy_version 18814 (0.0025) [2025-01-03 22:33:08,658][134294] Updated weights for policy 0, policy_version 18824 (0.0026) [2025-01-03 22:33:08,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13653.3, 300 sec: 13690.3). Total num frames: 77107200. Throughput: 0: 3394.6. Samples: 8444042. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:08,969][134211] Avg episode reward: [(0, '6.346')] [2025-01-03 22:33:11,740][134294] Updated weights for policy 0, policy_version 18834 (0.0025) [2025-01-03 22:33:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13653.3, 300 sec: 13676.5). Total num frames: 77168640. Throughput: 0: 3409.1. Samples: 8463794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:13,969][134211] Avg episode reward: [(0, '5.982')] [2025-01-03 22:33:14,015][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018841_77172736.pth... [2025-01-03 22:33:14,087][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018043_73904128.pth [2025-01-03 22:33:15,024][134294] Updated weights for policy 0, policy_version 18844 (0.0029) [2025-01-03 22:33:18,175][134294] Updated weights for policy 0, policy_version 18854 (0.0024) [2025-01-03 22:33:18,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13653.3, 300 sec: 13676.5). Total num frames: 77234176. Throughput: 0: 3396.4. Samples: 8473004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:18,968][134211] Avg episode reward: [(0, '5.713')] [2025-01-03 22:33:21,255][134294] Updated weights for policy 0, policy_version 18864 (0.0025) [2025-01-03 22:33:23,969][134211] Fps is (10 sec: 13106.3, 60 sec: 13448.3, 300 sec: 13676.5). Total num frames: 77299712. 
Throughput: 0: 3400.1. Samples: 8493036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:23,969][134211] Avg episode reward: [(0, '6.288')] [2025-01-03 22:33:24,399][134294] Updated weights for policy 0, policy_version 18874 (0.0027) [2025-01-03 22:33:27,420][134294] Updated weights for policy 0, policy_version 18884 (0.0024) [2025-01-03 22:33:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13662.6). Total num frames: 77365248. Throughput: 0: 3407.2. Samples: 8512876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:33:28,968][134211] Avg episode reward: [(0, '5.045')] [2025-01-03 22:33:30,448][134294] Updated weights for policy 0, policy_version 18894 (0.0028) [2025-01-03 22:33:33,585][134294] Updated weights for policy 0, policy_version 18904 (0.0025) [2025-01-03 22:33:33,968][134211] Fps is (10 sec: 13517.9, 60 sec: 13516.8, 300 sec: 13676.5). Total num frames: 77434880. Throughput: 0: 3374.6. Samples: 8523060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:33:33,969][134211] Avg episode reward: [(0, '5.242')] [2025-01-03 22:33:35,870][134294] Updated weights for policy 0, policy_version 18914 (0.0015) [2025-01-03 22:33:38,362][134294] Updated weights for policy 0, policy_version 18924 (0.0021) [2025-01-03 22:33:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13858.1, 300 sec: 13718.1). Total num frames: 77516800. Throughput: 0: 3410.5. Samples: 8547184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:33:38,968][134211] Avg episode reward: [(0, '5.519')] [2025-01-03 22:33:41,552][134294] Updated weights for policy 0, policy_version 18934 (0.0030) [2025-01-03 22:33:43,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13789.9, 300 sec: 13732.0). Total num frames: 77582336. Throughput: 0: 3403.9. Samples: 8566900. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:33:43,968][134211] Avg episode reward: [(0, '6.211')] [2025-01-03 22:33:44,673][134294] Updated weights for policy 0, policy_version 18944 (0.0025) [2025-01-03 22:33:47,737][134294] Updated weights for policy 0, policy_version 18954 (0.0027) [2025-01-03 22:33:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13448.5, 300 sec: 13745.9). Total num frames: 77647872. Throughput: 0: 3389.4. Samples: 8576796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:33:48,968][134211] Avg episode reward: [(0, '5.512')] [2025-01-03 22:33:50,850][134294] Updated weights for policy 0, policy_version 18964 (0.0027) [2025-01-03 22:33:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13585.1, 300 sec: 13759.9). Total num frames: 77713408. Throughput: 0: 3391.3. Samples: 8596648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:33:53,968][134211] Avg episode reward: [(0, '5.952')] [2025-01-03 22:33:53,997][134294] Updated weights for policy 0, policy_version 18974 (0.0028) [2025-01-03 22:33:56,964][134294] Updated weights for policy 0, policy_version 18984 (0.0029) [2025-01-03 22:33:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 13787.6). Total num frames: 77783040. Throughput: 0: 3399.5. Samples: 8616770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:33:58,968][134211] Avg episode reward: [(0, '5.705')] [2025-01-03 22:34:00,087][134294] Updated weights for policy 0, policy_version 18994 (0.0026) [2025-01-03 22:34:03,134][134294] Updated weights for policy 0, policy_version 19004 (0.0022) [2025-01-03 22:34:03,971][134211] Fps is (10 sec: 13512.7, 60 sec: 13516.1, 300 sec: 13731.9). Total num frames: 77848576. Throughput: 0: 3423.7. Samples: 8627080. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:34:03,972][134211] Avg episode reward: [(0, '5.868')] [2025-01-03 22:34:06,084][134294] Updated weights for policy 0, policy_version 19014 (0.0027) [2025-01-03 22:34:08,034][134294] Updated weights for policy 0, policy_version 19024 (0.0014) [2025-01-03 22:34:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13790.0, 300 sec: 13662.6). Total num frames: 77934592. Throughput: 0: 3469.7. Samples: 8649170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:34:08,968][134211] Avg episode reward: [(0, '5.823')] [2025-01-03 22:34:10,897][134294] Updated weights for policy 0, policy_version 19034 (0.0022) [2025-01-03 22:34:13,886][134294] Updated weights for policy 0, policy_version 19044 (0.0022) [2025-01-03 22:34:13,968][134211] Fps is (10 sec: 15569.6, 60 sec: 13926.4, 300 sec: 13718.2). Total num frames: 78004224. Throughput: 0: 3531.4. Samples: 8671788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:34:13,968][134211] Avg episode reward: [(0, '5.897')] [2025-01-03 22:34:17,108][134294] Updated weights for policy 0, policy_version 19054 (0.0029) [2025-01-03 22:34:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.2, 300 sec: 13718.1). Total num frames: 78065664. Throughput: 0: 3520.9. Samples: 8681500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:34:18,968][134211] Avg episode reward: [(0, '6.753')] [2025-01-03 22:34:20,236][134294] Updated weights for policy 0, policy_version 19064 (0.0024) [2025-01-03 22:34:23,412][134294] Updated weights for policy 0, policy_version 19074 (0.0027) [2025-01-03 22:34:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.3, 300 sec: 13732.0). Total num frames: 78131200. Throughput: 0: 3420.3. Samples: 8701098. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:23,969][134211] Avg episode reward: [(0, '6.066')] [2025-01-03 22:34:26,593][134294] Updated weights for policy 0, policy_version 19084 (0.0028) [2025-01-03 22:34:28,635][134294] Updated weights for policy 0, policy_version 19094 (0.0014) [2025-01-03 22:34:28,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14131.2, 300 sec: 13787.6). Total num frames: 78213120. Throughput: 0: 3477.1. Samples: 8723368. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:28,968][134211] Avg episode reward: [(0, '5.698')] [2025-01-03 22:34:30,584][134294] Updated weights for policy 0, policy_version 19104 (0.0014) [2025-01-03 22:34:32,511][134294] Updated weights for policy 0, policy_version 19114 (0.0013) [2025-01-03 22:34:33,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14745.6, 300 sec: 13940.3). Total num frames: 78319616. Throughput: 0: 3601.5. Samples: 8738862. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:33,968][134211] Avg episode reward: [(0, '5.621')] [2025-01-03 22:34:34,371][134294] Updated weights for policy 0, policy_version 19124 (0.0013) [2025-01-03 22:34:36,512][134294] Updated weights for policy 0, policy_version 19134 (0.0016) [2025-01-03 22:34:38,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14745.6, 300 sec: 13982.0). Total num frames: 78401536. Throughput: 0: 3823.0. Samples: 8768682. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:38,968][134211] Avg episode reward: [(0, '5.773')] [2025-01-03 22:34:39,864][134294] Updated weights for policy 0, policy_version 19144 (0.0029) [2025-01-03 22:34:43,125][134294] Updated weights for policy 0, policy_version 19154 (0.0028) [2025-01-03 22:34:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14677.3, 300 sec: 13968.1). Total num frames: 78462976. Throughput: 0: 3783.9. Samples: 8787048. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:43,968][134211] Avg episode reward: [(0, '6.419')] [2025-01-03 22:34:46,407][134294] Updated weights for policy 0, policy_version 19164 (0.0025) [2025-01-03 22:34:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.1, 300 sec: 13954.2). Total num frames: 78524416. Throughput: 0: 3765.9. Samples: 8796534. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:48,968][134211] Avg episode reward: [(0, '5.648')] [2025-01-03 22:34:49,577][134294] Updated weights for policy 0, policy_version 19174 (0.0029) [2025-01-03 22:34:53,067][134294] Updated weights for policy 0, policy_version 19184 (0.0029) [2025-01-03 22:34:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 13954.2). Total num frames: 78585856. Throughput: 0: 3686.0. Samples: 8815040. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:53,968][134211] Avg episode reward: [(0, '5.633')] [2025-01-03 22:34:56,616][134294] Updated weights for policy 0, policy_version 19194 (0.0028) [2025-01-03 22:34:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14336.0, 300 sec: 13787.5). Total num frames: 78643200. Throughput: 0: 3563.6. Samples: 8832148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:34:58,968][134211] Avg episode reward: [(0, '6.228')] [2025-01-03 22:35:00,201][134294] Updated weights for policy 0, policy_version 19204 (0.0027) [2025-01-03 22:35:03,524][134294] Updated weights for policy 0, policy_version 19214 (0.0024) [2025-01-03 22:35:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14268.5, 300 sec: 13704.3). Total num frames: 78704640. Throughput: 0: 3542.3. Samples: 8840906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:35:03,968][134211] Avg episode reward: [(0, '5.403')] [2025-01-03 22:35:06,788][134294] Updated weights for policy 0, policy_version 19224 (0.0028) [2025-01-03 22:35:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13858.1, 300 sec: 13704.2). 
Total num frames: 78766080. Throughput: 0: 3526.3. Samples: 8859780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:35:08,968][134211] Avg episode reward: [(0, '5.808')] [2025-01-03 22:35:09,992][134294] Updated weights for policy 0, policy_version 19234 (0.0028) [2025-01-03 22:35:13,089][134294] Updated weights for policy 0, policy_version 19244 (0.0026) [2025-01-03 22:35:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 13718.1). Total num frames: 78831616. Throughput: 0: 3460.7. Samples: 8879102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:35:13,969][134211] Avg episode reward: [(0, '5.909')] [2025-01-03 22:35:14,033][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019247_78835712.pth... [2025-01-03 22:35:14,104][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018436_75513856.pth [2025-01-03 22:35:16,292][134294] Updated weights for policy 0, policy_version 19254 (0.0024) [2025-01-03 22:35:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13858.1, 300 sec: 13718.1). Total num frames: 78897152. Throughput: 0: 3335.1. Samples: 8888940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:35:18,969][134211] Avg episode reward: [(0, '5.903')] [2025-01-03 22:35:19,358][134294] Updated weights for policy 0, policy_version 19264 (0.0027) [2025-01-03 22:35:22,812][134294] Updated weights for policy 0, policy_version 19274 (0.0032) [2025-01-03 22:35:23,969][134211] Fps is (10 sec: 12696.5, 60 sec: 13789.7, 300 sec: 13718.1). Total num frames: 78958592. Throughput: 0: 3094.2. Samples: 8907926. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:35:23,969][134211] Avg episode reward: [(0, '5.914')] [2025-01-03 22:35:25,919][134294] Updated weights for policy 0, policy_version 19284 (0.0025) [2025-01-03 22:35:28,784][134294] Updated weights for policy 0, policy_version 19294 (0.0023) [2025-01-03 22:35:28,968][134211] Fps is (10 sec: 13106.4, 60 sec: 13584.9, 300 sec: 13718.1). Total num frames: 79028224. Throughput: 0: 3130.4. Samples: 8927918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:35:28,969][134211] Avg episode reward: [(0, '6.313')] [2025-01-03 22:35:31,835][134294] Updated weights for policy 0, policy_version 19304 (0.0026) [2025-01-03 22:35:33,968][134211] Fps is (10 sec: 13518.0, 60 sec: 12902.4, 300 sec: 13718.1). Total num frames: 79093760. Throughput: 0: 3147.1. Samples: 8938152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:35:33,968][134211] Avg episode reward: [(0, '5.911')] [2025-01-03 22:35:35,033][134294] Updated weights for policy 0, policy_version 19314 (0.0024) [2025-01-03 22:35:37,304][134294] Updated weights for policy 0, policy_version 19324 (0.0015) [2025-01-03 22:35:38,967][134211] Fps is (10 sec: 15566.1, 60 sec: 13039.0, 300 sec: 13801.5). Total num frames: 79183872. Throughput: 0: 3231.9. Samples: 8960476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:35:38,968][134211] Avg episode reward: [(0, '6.058')] [2025-01-03 22:35:39,202][134294] Updated weights for policy 0, policy_version 19334 (0.0014) [2025-01-03 22:35:41,083][134294] Updated weights for policy 0, policy_version 19344 (0.0013) [2025-01-03 22:35:42,954][134294] Updated weights for policy 0, policy_version 19354 (0.0013) [2025-01-03 22:35:43,968][134211] Fps is (10 sec: 20070.5, 60 sec: 13858.1, 300 sec: 13954.2). Total num frames: 79294464. Throughput: 0: 3572.4. Samples: 8992908. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:35:43,968][134211] Avg episode reward: [(0, '5.495')] [2025-01-03 22:35:45,425][134294] Updated weights for policy 0, policy_version 19364 (0.0020) [2025-01-03 22:35:48,658][134294] Updated weights for policy 0, policy_version 19374 (0.0029) [2025-01-03 22:35:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 13858.1, 300 sec: 13968.1). Total num frames: 79355904. Throughput: 0: 3634.5. Samples: 9004460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:35:48,968][134211] Avg episode reward: [(0, '5.726')] [2025-01-03 22:35:51,775][134294] Updated weights for policy 0, policy_version 19384 (0.0031) [2025-01-03 22:35:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 13940.3). Total num frames: 79421440. Throughput: 0: 3639.5. Samples: 9023560. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:35:53,968][134211] Avg episode reward: [(0, '5.799')] [2025-01-03 22:35:55,052][134294] Updated weights for policy 0, policy_version 19394 (0.0027) [2025-01-03 22:35:58,018][134294] Updated weights for policy 0, policy_version 19404 (0.0025) [2025-01-03 22:35:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14062.9, 300 sec: 13926.4). Total num frames: 79486976. Throughput: 0: 3644.4. Samples: 9043102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:35:58,968][134211] Avg episode reward: [(0, '5.647')] [2025-01-03 22:36:01,147][134294] Updated weights for policy 0, policy_version 19414 (0.0025) [2025-01-03 22:36:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 13926.4). Total num frames: 79552512. Throughput: 0: 3650.6. Samples: 9053218. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:03,968][134211] Avg episode reward: [(0, '5.789')] [2025-01-03 22:36:04,340][134294] Updated weights for policy 0, policy_version 19424 (0.0030) [2025-01-03 22:36:07,388][134294] Updated weights for policy 0, policy_version 19434 (0.0024) [2025-01-03 22:36:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14267.7, 300 sec: 13926.4). Total num frames: 79622144. Throughput: 0: 3667.6. Samples: 9072964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:08,969][134211] Avg episode reward: [(0, '6.345')] [2025-01-03 22:36:10,370][134294] Updated weights for policy 0, policy_version 19444 (0.0027) [2025-01-03 22:36:13,330][134294] Updated weights for policy 0, policy_version 19454 (0.0024) [2025-01-03 22:36:13,969][134211] Fps is (10 sec: 13925.3, 60 sec: 14335.8, 300 sec: 13940.2). Total num frames: 79691776. Throughput: 0: 3685.0. Samples: 9093742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:13,969][134211] Avg episode reward: [(0, '6.293')] [2025-01-03 22:36:16,188][134294] Updated weights for policy 0, policy_version 19464 (0.0024) [2025-01-03 22:36:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14336.0, 300 sec: 13912.5). Total num frames: 79757312. Throughput: 0: 3687.8. Samples: 9104104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:18,969][134211] Avg episode reward: [(0, '6.455')] [2025-01-03 22:36:19,533][134294] Updated weights for policy 0, policy_version 19474 (0.0025) [2025-01-03 22:36:22,672][134294] Updated weights for policy 0, policy_version 19484 (0.0025) [2025-01-03 22:36:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.1, 300 sec: 13759.7). Total num frames: 79818752. Throughput: 0: 3615.3. Samples: 9123168. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:36:23,969][134211] Avg episode reward: [(0, '5.831')] [2025-01-03 22:36:25,948][134294] Updated weights for policy 0, policy_version 19494 (0.0024) [2025-01-03 22:36:28,739][134294] Updated weights for policy 0, policy_version 19504 (0.0025) [2025-01-03 22:36:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14336.2, 300 sec: 13676.5). Total num frames: 79888384. Throughput: 0: 3339.3. Samples: 9143178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:36:28,968][134211] Avg episode reward: [(0, '6.198')] [2025-01-03 22:36:31,728][134294] Updated weights for policy 0, policy_version 19514 (0.0025) [2025-01-03 22:36:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.2, 300 sec: 13704.2). Total num frames: 79958016. Throughput: 0: 3313.0. Samples: 9153546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:36:33,969][134211] Avg episode reward: [(0, '6.110')] [2025-01-03 22:36:34,760][134294] Updated weights for policy 0, policy_version 19524 (0.0026) [2025-01-03 22:36:37,820][134294] Updated weights for policy 0, policy_version 19534 (0.0026) [2025-01-03 22:36:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 13704.2). Total num frames: 80023552. Throughput: 0: 3341.7. Samples: 9173938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:36:38,968][134211] Avg episode reward: [(0, '6.272')] [2025-01-03 22:36:40,806][134294] Updated weights for policy 0, policy_version 19544 (0.0024) [2025-01-03 22:36:43,665][134294] Updated weights for policy 0, policy_version 19554 (0.0025) [2025-01-03 22:36:43,968][134211] Fps is (10 sec: 13927.2, 60 sec: 13380.3, 300 sec: 13732.0). Total num frames: 80097280. Throughput: 0: 3370.5. Samples: 9194774. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:36:43,968][134211] Avg episode reward: [(0, '6.190')] [2025-01-03 22:36:46,636][134294] Updated weights for policy 0, policy_version 19564 (0.0026) [2025-01-03 22:36:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 13759.8). Total num frames: 80162816. Throughput: 0: 3375.1. Samples: 9205096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:36:48,968][134211] Avg episode reward: [(0, '6.516')] [2025-01-03 22:36:49,690][134294] Updated weights for policy 0, policy_version 19574 (0.0027) [2025-01-03 22:36:52,736][134294] Updated weights for policy 0, policy_version 19584 (0.0024) [2025-01-03 22:36:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13448.6, 300 sec: 13759.9). Total num frames: 80228352. Throughput: 0: 3388.0. Samples: 9225424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:53,968][134211] Avg episode reward: [(0, '5.984')] [2025-01-03 22:36:55,966][134294] Updated weights for policy 0, policy_version 19594 (0.0021) [2025-01-03 22:36:57,978][134294] Updated weights for policy 0, policy_version 19604 (0.0013) [2025-01-03 22:36:58,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13789.9, 300 sec: 13857.0). Total num frames: 80314368. Throughput: 0: 3440.3. Samples: 9248550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:36:58,968][134211] Avg episode reward: [(0, '6.597')] [2025-01-03 22:37:00,019][134294] Updated weights for policy 0, policy_version 19614 (0.0014) [2025-01-03 22:37:01,937][134294] Updated weights for policy 0, policy_version 19624 (0.0013) [2025-01-03 22:37:03,968][134211] Fps is (10 sec: 18431.8, 60 sec: 14336.0, 300 sec: 13981.9). Total num frames: 80412672. Throughput: 0: 3558.0. Samples: 9264216. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:37:03,968][134211] Avg episode reward: [(0, '6.274')] [2025-01-03 22:37:04,305][134294] Updated weights for policy 0, policy_version 19634 (0.0022) [2025-01-03 22:37:07,670][134294] Updated weights for policy 0, policy_version 19644 (0.0028) [2025-01-03 22:37:08,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14199.5, 300 sec: 13981.9). Total num frames: 80474112. Throughput: 0: 3643.3. Samples: 9287114. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:37:08,968][134211] Avg episode reward: [(0, '5.934')] [2025-01-03 22:37:10,845][134294] Updated weights for policy 0, policy_version 19654 (0.0027) [2025-01-03 22:37:13,819][134294] Updated weights for policy 0, policy_version 19664 (0.0022) [2025-01-03 22:37:13,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14199.5, 300 sec: 13995.8). Total num frames: 80543744. Throughput: 0: 3640.7. Samples: 9307012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:37:13,969][134211] Avg episode reward: [(0, '6.293')] [2025-01-03 22:37:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019664_80543744.pth... [2025-01-03 22:37:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018841_77172736.pth [2025-01-03 22:37:16,954][134294] Updated weights for policy 0, policy_version 19674 (0.0027) [2025-01-03 22:37:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 13954.2). Total num frames: 80609280. Throughput: 0: 3626.3. Samples: 9316728. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:37:18,968][134211] Avg episode reward: [(0, '5.984')] [2025-01-03 22:37:20,005][134294] Updated weights for policy 0, policy_version 19684 (0.0026) [2025-01-03 22:37:22,985][134294] Updated weights for policy 0, policy_version 19694 (0.0026) [2025-01-03 22:37:23,968][134211] Fps is (10 sec: 13518.2, 60 sec: 14336.2, 300 sec: 13968.1). Total num frames: 80678912. Throughput: 0: 3629.9. Samples: 9337282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:37:23,968][134211] Avg episode reward: [(0, '6.070')] [2025-01-03 22:37:25,968][134294] Updated weights for policy 0, policy_version 19704 (0.0025) [2025-01-03 22:37:28,798][134294] Updated weights for policy 0, policy_version 19714 (0.0025) [2025-01-03 22:37:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14335.9, 300 sec: 13981.9). Total num frames: 80748544. Throughput: 0: 3630.4. Samples: 9358144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:37:28,968][134211] Avg episode reward: [(0, '5.777')] [2025-01-03 22:37:31,895][134294] Updated weights for policy 0, policy_version 19724 (0.0024) [2025-01-03 22:37:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14267.8, 300 sec: 13995.8). Total num frames: 80814080. Throughput: 0: 3625.2. Samples: 9368230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:37:33,968][134211] Avg episode reward: [(0, '6.331')] [2025-01-03 22:37:35,012][134294] Updated weights for policy 0, policy_version 19734 (0.0024) [2025-01-03 22:37:37,943][134294] Updated weights for policy 0, policy_version 19744 (0.0025) [2025-01-03 22:37:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 13995.8). Total num frames: 80883712. Throughput: 0: 3618.3. Samples: 9388246. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:37:38,968][134211] Avg episode reward: [(0, '6.146')] [2025-01-03 22:37:41,011][134294] Updated weights for policy 0, policy_version 19754 (0.0026) [2025-01-03 22:37:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 13926.4). Total num frames: 80949248. Throughput: 0: 3565.6. Samples: 9409004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:37:43,968][134211] Avg episode reward: [(0, '5.869')] [2025-01-03 22:37:43,983][134294] Updated weights for policy 0, policy_version 19764 (0.0023) [2025-01-03 22:37:46,893][134294] Updated weights for policy 0, policy_version 19774 (0.0024) [2025-01-03 22:37:48,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14267.8, 300 sec: 13968.1). Total num frames: 81018880. Throughput: 0: 3446.3. Samples: 9419298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:37:48,968][134211] Avg episode reward: [(0, '6.347')] [2025-01-03 22:37:49,928][134294] Updated weights for policy 0, policy_version 19784 (0.0023) [2025-01-03 22:37:52,870][134294] Updated weights for policy 0, policy_version 19794 (0.0024) [2025-01-03 22:37:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 13954.2). Total num frames: 81088512. Throughput: 0: 3391.1. Samples: 9439714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:37:53,969][134211] Avg episode reward: [(0, '5.749')] [2025-01-03 22:37:55,932][134294] Updated weights for policy 0, policy_version 19804 (0.0026) [2025-01-03 22:37:58,842][134294] Updated weights for policy 0, policy_version 19814 (0.0024) [2025-01-03 22:37:58,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14062.9, 300 sec: 13968.1). Total num frames: 81158144. Throughput: 0: 3411.7. Samples: 9460534. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:37:58,969][134211] Avg episode reward: [(0, '6.197')] [2025-01-03 22:38:01,790][134294] Updated weights for policy 0, policy_version 19824 (0.0027) [2025-01-03 22:38:03,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13585.0, 300 sec: 13968.1). Total num frames: 81227776. Throughput: 0: 3423.6. Samples: 9470790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:38:03,969][134211] Avg episode reward: [(0, '5.854')] [2025-01-03 22:38:04,864][134294] Updated weights for policy 0, policy_version 19834 (0.0023) [2025-01-03 22:38:07,955][134294] Updated weights for policy 0, policy_version 19844 (0.0026) [2025-01-03 22:38:08,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13653.4, 300 sec: 13982.0). Total num frames: 81293312. Throughput: 0: 3414.3. Samples: 9490924. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:38:08,968][134211] Avg episode reward: [(0, '5.517')] [2025-01-03 22:38:10,844][134294] Updated weights for policy 0, policy_version 19854 (0.0024) [2025-01-03 22:38:13,824][134294] Updated weights for policy 0, policy_version 19864 (0.0026) [2025-01-03 22:38:13,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13653.5, 300 sec: 13995.8). Total num frames: 81362944. Throughput: 0: 3417.3. Samples: 9511924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:13,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-03 22:38:16,640][134294] Updated weights for policy 0, policy_version 19874 (0.0025) [2025-01-03 22:38:18,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13789.9, 300 sec: 14023.6). Total num frames: 81436672. Throughput: 0: 3424.1. Samples: 9522312. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:18,969][134211] Avg episode reward: [(0, '5.898')] [2025-01-03 22:38:19,610][134294] Updated weights for policy 0, policy_version 19884 (0.0024) [2025-01-03 22:38:22,100][134294] Updated weights for policy 0, policy_version 19894 (0.0018) [2025-01-03 22:38:23,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14062.9, 300 sec: 14093.0). Total num frames: 81522688. Throughput: 0: 3484.1. Samples: 9545030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:23,968][134211] Avg episode reward: [(0, '5.897')] [2025-01-03 22:38:24,066][134294] Updated weights for policy 0, policy_version 19904 (0.0014) [2025-01-03 22:38:26,169][134294] Updated weights for policy 0, policy_version 19914 (0.0017) [2025-01-03 22:38:28,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 81604608. Throughput: 0: 3624.8. Samples: 9572120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:28,968][134211] Avg episode reward: [(0, '5.797')] [2025-01-03 22:38:29,213][134294] Updated weights for policy 0, policy_version 19924 (0.0023) [2025-01-03 22:38:32,191][134294] Updated weights for policy 0, policy_version 19934 (0.0025) [2025-01-03 22:38:33,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 81670144. Throughput: 0: 3617.6. Samples: 9582090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:33,968][134211] Avg episode reward: [(0, '5.767')] [2025-01-03 22:38:35,268][134294] Updated weights for policy 0, policy_version 19944 (0.0026) [2025-01-03 22:38:38,203][134294] Updated weights for policy 0, policy_version 19954 (0.0025) [2025-01-03 22:38:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 81739776. Throughput: 0: 3622.7. Samples: 9602734. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:38,968][134211] Avg episode reward: [(0, '6.392')] [2025-01-03 22:38:41,287][134294] Updated weights for policy 0, policy_version 19964 (0.0025) [2025-01-03 22:38:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 81805312. Throughput: 0: 3594.3. Samples: 9622278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:43,968][134211] Avg episode reward: [(0, '6.450')] [2025-01-03 22:38:44,537][134294] Updated weights for policy 0, policy_version 19974 (0.0025) [2025-01-03 22:38:47,434][134294] Updated weights for policy 0, policy_version 19984 (0.0027) [2025-01-03 22:38:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14106.9). Total num frames: 81874944. Throughput: 0: 3594.9. Samples: 9632560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:38:48,968][134211] Avg episode reward: [(0, '5.521')] [2025-01-03 22:38:50,383][134294] Updated weights for policy 0, policy_version 19994 (0.0025) [2025-01-03 22:38:53,381][134294] Updated weights for policy 0, policy_version 20004 (0.0026) [2025-01-03 22:38:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14093.0). Total num frames: 81940480. Throughput: 0: 3607.4. Samples: 9653256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:38:53,968][134211] Avg episode reward: [(0, '6.104')] [2025-01-03 22:38:56,860][134294] Updated weights for policy 0, policy_version 20014 (0.0026) [2025-01-03 22:38:58,967][134211] Fps is (10 sec: 13107.5, 60 sec: 14131.3, 300 sec: 14093.2). Total num frames: 82006016. Throughput: 0: 3548.6. Samples: 9671608. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:38:58,968][134211] Avg episode reward: [(0, '6.037')] [2025-01-03 22:38:59,491][134294] Updated weights for policy 0, policy_version 20024 (0.0018) [2025-01-03 22:39:01,479][134294] Updated weights for policy 0, policy_version 20034 (0.0012) [2025-01-03 22:39:03,381][134294] Updated weights for policy 0, policy_version 20044 (0.0014) [2025-01-03 22:39:03,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14745.7, 300 sec: 14162.4). Total num frames: 82112512. Throughput: 0: 3647.3. Samples: 9686440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:39:03,968][134211] Avg episode reward: [(0, '5.982')] [2025-01-03 22:39:05,267][134294] Updated weights for policy 0, policy_version 20054 (0.0014) [2025-01-03 22:39:07,140][134294] Updated weights for policy 0, policy_version 20064 (0.0014) [2025-01-03 22:39:08,968][134211] Fps is (10 sec: 21298.8, 60 sec: 15428.3, 300 sec: 14287.4). Total num frames: 82219008. Throughput: 0: 3863.7. Samples: 9718898. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:39:08,968][134211] Avg episode reward: [(0, '5.230')] [2025-01-03 22:39:09,054][134294] Updated weights for policy 0, policy_version 20074 (0.0015) [2025-01-03 22:39:12,017][134294] Updated weights for policy 0, policy_version 20084 (0.0027) [2025-01-03 22:39:13,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15428.3, 300 sec: 14315.2). Total num frames: 82288640. Throughput: 0: 3794.6. Samples: 9742878. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:39:13,969][134211] Avg episode reward: [(0, '5.691')] [2025-01-03 22:39:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020090_82288640.pth... 
[2025-01-03 22:39:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019247_78835712.pth [2025-01-03 22:39:15,429][134294] Updated weights for policy 0, policy_version 20094 (0.0026) [2025-01-03 22:39:18,452][134294] Updated weights for policy 0, policy_version 20104 (0.0027) [2025-01-03 22:39:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15223.5, 300 sec: 14301.3). Total num frames: 82350080. Throughput: 0: 3779.4. Samples: 9752164. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:39:18,968][134211] Avg episode reward: [(0, '5.354')] [2025-01-03 22:39:21,472][134294] Updated weights for policy 0, policy_version 20114 (0.0025) [2025-01-03 22:39:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.1, 300 sec: 14245.7). Total num frames: 82415616. Throughput: 0: 3763.5. Samples: 9772092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:39:23,968][134211] Avg episode reward: [(0, '5.530')] [2025-01-03 22:39:24,697][134294] Updated weights for policy 0, policy_version 20124 (0.0024) [2025-01-03 22:39:27,782][134294] Updated weights for policy 0, policy_version 20134 (0.0028) [2025-01-03 22:39:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14609.1, 300 sec: 14106.9). Total num frames: 82481152. Throughput: 0: 3766.6. Samples: 9791774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:39:28,968][134211] Avg episode reward: [(0, '5.626')] [2025-01-03 22:39:30,731][134294] Updated weights for policy 0, policy_version 20144 (0.0026) [2025-01-03 22:39:33,759][134294] Updated weights for policy 0, policy_version 20154 (0.0027) [2025-01-03 22:39:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14677.3, 300 sec: 14065.2). Total num frames: 82550784. Throughput: 0: 3770.5. Samples: 9802234. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:39:33,968][134211] Avg episode reward: [(0, '5.764')] [2025-01-03 22:39:36,816][134294] Updated weights for policy 0, policy_version 20164 (0.0025) [2025-01-03 22:39:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 14079.1). Total num frames: 82616320. Throughput: 0: 3756.9. Samples: 9822318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:39:38,968][134211] Avg episode reward: [(0, '5.413')] [2025-01-03 22:39:39,974][134294] Updated weights for policy 0, policy_version 20174 (0.0025) [2025-01-03 22:39:42,913][134294] Updated weights for policy 0, policy_version 20184 (0.0026) [2025-01-03 22:39:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.4, 300 sec: 14106.9). Total num frames: 82685952. Throughput: 0: 3796.3. Samples: 9842442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:39:43,968][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 22:39:45,939][134294] Updated weights for policy 0, policy_version 20194 (0.0025) [2025-01-03 22:39:48,855][134294] Updated weights for policy 0, policy_version 20204 (0.0023) [2025-01-03 22:39:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14134.7). Total num frames: 82755584. Throughput: 0: 3696.0. Samples: 9852762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:39:48,968][134211] Avg episode reward: [(0, '6.324')] [2025-01-03 22:39:51,781][134294] Updated weights for policy 0, policy_version 20214 (0.0025) [2025-01-03 22:39:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.7, 300 sec: 14176.3). Total num frames: 82825216. Throughput: 0: 3440.9. Samples: 9873740. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:39:53,968][134211] Avg episode reward: [(0, '6.300')] [2025-01-03 22:39:54,909][134294] Updated weights for policy 0, policy_version 20224 (0.0027) [2025-01-03 22:39:57,879][134294] Updated weights for policy 0, policy_version 20234 (0.0026) [2025-01-03 22:39:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.5, 300 sec: 14190.2). Total num frames: 82890752. Throughput: 0: 3357.5. Samples: 9893964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:39:58,968][134211] Avg episode reward: [(0, '6.131')] [2025-01-03 22:40:00,847][134294] Updated weights for policy 0, policy_version 20244 (0.0024) [2025-01-03 22:40:03,670][134294] Updated weights for policy 0, policy_version 20254 (0.0022) [2025-01-03 22:40:03,971][134211] Fps is (10 sec: 13512.4, 60 sec: 14130.4, 300 sec: 14217.8). Total num frames: 82960384. Throughput: 0: 3382.1. Samples: 9904370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:40:03,971][134211] Avg episode reward: [(0, '6.738')] [2025-01-03 22:40:06,705][134294] Updated weights for policy 0, policy_version 20264 (0.0023) [2025-01-03 22:40:08,970][134211] Fps is (10 sec: 13923.6, 60 sec: 13516.3, 300 sec: 14231.8). Total num frames: 83030016. Throughput: 0: 3401.6. Samples: 9925170. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:40:08,970][134211] Avg episode reward: [(0, '6.284')] [2025-01-03 22:40:09,778][134294] Updated weights for policy 0, policy_version 20274 (0.0025) [2025-01-03 22:40:12,670][134294] Updated weights for policy 0, policy_version 20284 (0.0025) [2025-01-03 22:40:13,968][134211] Fps is (10 sec: 13930.6, 60 sec: 13516.8, 300 sec: 14245.7). Total num frames: 83099648. Throughput: 0: 3420.9. Samples: 9945716. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:40:13,969][134211] Avg episode reward: [(0, '6.109')] [2025-01-03 22:40:15,714][134294] Updated weights for policy 0, policy_version 20294 (0.0025) [2025-01-03 22:40:18,544][134294] Updated weights for policy 0, policy_version 20304 (0.0023) [2025-01-03 22:40:18,969][134211] Fps is (10 sec: 13927.8, 60 sec: 13653.1, 300 sec: 14273.5). Total num frames: 83169280. Throughput: 0: 3416.6. Samples: 9955984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:40:18,969][134211] Avg episode reward: [(0, '6.719')] [2025-01-03 22:40:21,542][134294] Updated weights for policy 0, policy_version 20314 (0.0028) [2025-01-03 22:40:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13721.6, 300 sec: 14273.5). Total num frames: 83238912. Throughput: 0: 3434.2. Samples: 9976856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:40:23,968][134211] Avg episode reward: [(0, '6.027')] [2025-01-03 22:40:24,617][134294] Updated weights for policy 0, policy_version 20324 (0.0026) [2025-01-03 22:40:27,702][134294] Updated weights for policy 0, policy_version 20334 (0.0027) [2025-01-03 22:40:28,969][134211] Fps is (10 sec: 13107.5, 60 sec: 13653.1, 300 sec: 14259.6). Total num frames: 83300352. Throughput: 0: 3429.2. Samples: 9996758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:40:28,969][134211] Avg episode reward: [(0, '5.736')] [2025-01-03 22:40:30,736][134294] Updated weights for policy 0, policy_version 20344 (0.0025) [2025-01-03 22:40:33,640][134294] Updated weights for policy 0, policy_version 20354 (0.0024) [2025-01-03 22:40:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13721.6, 300 sec: 14204.1). Total num frames: 83374080. Throughput: 0: 3429.3. Samples: 10007080. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:40:33,968][134211] Avg episode reward: [(0, '5.828')] [2025-01-03 22:40:36,640][134294] Updated weights for policy 0, policy_version 20364 (0.0023) [2025-01-03 22:40:38,968][134211] Fps is (10 sec: 13927.6, 60 sec: 13721.6, 300 sec: 14051.4). Total num frames: 83439616. Throughput: 0: 3426.3. Samples: 10027922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:40:38,968][134211] Avg episode reward: [(0, '6.532')] [2025-01-03 22:40:39,445][134294] Updated weights for policy 0, policy_version 20374 (0.0024) [2025-01-03 22:40:41,484][134294] Updated weights for policy 0, policy_version 20384 (0.0014) [2025-01-03 22:40:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13994.6, 300 sec: 14134.7). Total num frames: 83525632. Throughput: 0: 3526.4. Samples: 10052652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:40:43,968][134211] Avg episode reward: [(0, '6.535')] [2025-01-03 22:40:44,289][134294] Updated weights for policy 0, policy_version 20394 (0.0024) [2025-01-03 22:40:47,425][134294] Updated weights for policy 0, policy_version 20404 (0.0023) [2025-01-03 22:40:48,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13994.6, 300 sec: 14148.6). Total num frames: 83595264. Throughput: 0: 3520.6. Samples: 10062788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:40:48,968][134211] Avg episode reward: [(0, '6.235')] [2025-01-03 22:40:50,465][134294] Updated weights for policy 0, policy_version 20414 (0.0023) [2025-01-03 22:40:53,440][134294] Updated weights for policy 0, policy_version 20424 (0.0023) [2025-01-03 22:40:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13994.7, 300 sec: 14162.5). Total num frames: 83664896. Throughput: 0: 3509.7. Samples: 10083100. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:40:53,968][134211] Avg episode reward: [(0, '6.427')] [2025-01-03 22:40:55,327][134294] Updated weights for policy 0, policy_version 20434 (0.0015) [2025-01-03 22:40:58,638][134294] Updated weights for policy 0, policy_version 20444 (0.0024) [2025-01-03 22:40:58,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14199.4, 300 sec: 14204.1). Total num frames: 83742720. Throughput: 0: 3574.4. Samples: 10106564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:40:58,968][134211] Avg episode reward: [(0, '6.365')] [2025-01-03 22:41:02,061][134294] Updated weights for policy 0, policy_version 20454 (0.0025) [2025-01-03 22:41:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13995.4, 300 sec: 14162.5). Total num frames: 83800064. Throughput: 0: 3540.8. Samples: 10115316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:03,968][134211] Avg episode reward: [(0, '6.715')] [2025-01-03 22:41:05,716][134294] Updated weights for policy 0, policy_version 20464 (0.0024) [2025-01-03 22:41:08,034][134294] Updated weights for policy 0, policy_version 20474 (0.0016) [2025-01-03 22:41:08,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14131.7, 300 sec: 14190.3). Total num frames: 83877888. Throughput: 0: 3499.3. Samples: 10134326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:08,968][134211] Avg episode reward: [(0, '5.939')] [2025-01-03 22:41:10,063][134294] Updated weights for policy 0, policy_version 20484 (0.0012) [2025-01-03 22:41:12,083][134294] Updated weights for policy 0, policy_version 20494 (0.0014) [2025-01-03 22:41:13,968][134211] Fps is (10 sec: 18022.4, 60 sec: 14677.4, 300 sec: 14315.2). Total num frames: 83980288. Throughput: 0: 3727.8. Samples: 10164506. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:13,968][134211] Avg episode reward: [(0, '6.111')] [2025-01-03 22:41:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020503_83980288.pth... [2025-01-03 22:41:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019664_80543744.pth [2025-01-03 22:41:14,124][134294] Updated weights for policy 0, policy_version 20504 (0.0014) [2025-01-03 22:41:16,664][134294] Updated weights for policy 0, policy_version 20514 (0.0023) [2025-01-03 22:41:18,968][134211] Fps is (10 sec: 17612.5, 60 sec: 14745.8, 300 sec: 14356.9). Total num frames: 84054016. Throughput: 0: 3801.1. Samples: 10178128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:18,968][134211] Avg episode reward: [(0, '6.348')] [2025-01-03 22:41:19,821][134294] Updated weights for policy 0, policy_version 20524 (0.0027) [2025-01-03 22:41:23,178][134294] Updated weights for policy 0, policy_version 20534 (0.0025) [2025-01-03 22:41:23,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14608.9, 300 sec: 14329.0). Total num frames: 84115456. Throughput: 0: 3753.7. Samples: 10196842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:23,969][134211] Avg episode reward: [(0, '6.762')] [2025-01-03 22:41:26,303][134294] Updated weights for policy 0, policy_version 20544 (0.0027) [2025-01-03 22:41:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14677.5, 300 sec: 14315.2). Total num frames: 84180992. Throughput: 0: 3643.8. Samples: 10216624. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:28,968][134211] Avg episode reward: [(0, '6.244')] [2025-01-03 22:41:29,341][134294] Updated weights for policy 0, policy_version 20554 (0.0026) [2025-01-03 22:41:32,402][134294] Updated weights for policy 0, policy_version 20564 (0.0026) [2025-01-03 22:41:33,968][134211] Fps is (10 sec: 13108.0, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 84246528. Throughput: 0: 3639.3. Samples: 10226556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:33,968][134211] Avg episode reward: [(0, '6.803')] [2025-01-03 22:41:35,423][134294] Updated weights for policy 0, policy_version 20574 (0.0023) [2025-01-03 22:41:38,463][134294] Updated weights for policy 0, policy_version 20584 (0.0025) [2025-01-03 22:41:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.0, 300 sec: 14301.3). Total num frames: 84316160. Throughput: 0: 3647.0. Samples: 10247214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:38,968][134211] Avg episode reward: [(0, '6.685')] [2025-01-03 22:41:41,564][134294] Updated weights for policy 0, policy_version 20594 (0.0026) [2025-01-03 22:41:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.5, 300 sec: 14287.4). Total num frames: 84377600. Throughput: 0: 3545.4. Samples: 10266108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:43,968][134211] Avg episode reward: [(0, '6.631')] [2025-01-03 22:41:45,003][134294] Updated weights for policy 0, policy_version 20604 (0.0025) [2025-01-03 22:41:48,009][134294] Updated weights for policy 0, policy_version 20614 (0.0026) [2025-01-03 22:41:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.5, 300 sec: 14301.3). Total num frames: 84447232. Throughput: 0: 3559.6. Samples: 10275496. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:48,968][134211] Avg episode reward: [(0, '6.318')] [2025-01-03 22:41:50,988][134294] Updated weights for policy 0, policy_version 20624 (0.0023) [2025-01-03 22:41:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.1, 300 sec: 14231.9). Total num frames: 84512768. Throughput: 0: 3589.6. Samples: 10295860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:41:53,968][134211] Avg episode reward: [(0, '6.648')] [2025-01-03 22:41:54,418][134294] Updated weights for policy 0, policy_version 20634 (0.0026) [2025-01-03 22:41:57,807][134294] Updated weights for policy 0, policy_version 20644 (0.0022) [2025-01-03 22:41:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13789.9, 300 sec: 14093.0). Total num frames: 84570112. Throughput: 0: 3320.1. Samples: 10313910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:41:58,968][134211] Avg episode reward: [(0, '5.895')] [2025-01-03 22:42:00,868][134294] Updated weights for policy 0, policy_version 20654 (0.0028) [2025-01-03 22:42:03,825][134294] Updated weights for policy 0, policy_version 20664 (0.0026) [2025-01-03 22:42:03,969][134211] Fps is (10 sec: 12696.0, 60 sec: 13994.4, 300 sec: 14120.7). Total num frames: 84639744. Throughput: 0: 3242.3. Samples: 10324034. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:03,969][134211] Avg episode reward: [(0, '6.643')] [2025-01-03 22:42:06,737][134294] Updated weights for policy 0, policy_version 20674 (0.0023) [2025-01-03 22:42:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.1, 300 sec: 14120.8). Total num frames: 84709376. Throughput: 0: 3284.7. Samples: 10344650. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:08,968][134211] Avg episode reward: [(0, '5.948')] [2025-01-03 22:42:10,019][134294] Updated weights for policy 0, policy_version 20684 (0.0027) [2025-01-03 22:42:12,998][134294] Updated weights for policy 0, policy_version 20694 (0.0025) [2025-01-03 22:42:13,968][134211] Fps is (10 sec: 13518.6, 60 sec: 13243.7, 300 sec: 14120.8). Total num frames: 84774912. Throughput: 0: 3282.0. Samples: 10364312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:42:13,968][134211] Avg episode reward: [(0, '6.394')] [2025-01-03 22:42:15,944][134294] Updated weights for policy 0, policy_version 20704 (0.0023) [2025-01-03 22:42:17,823][134294] Updated weights for policy 0, policy_version 20714 (0.0013) [2025-01-03 22:42:18,968][134211] Fps is (10 sec: 15565.1, 60 sec: 13516.9, 300 sec: 14190.2). Total num frames: 84865024. Throughput: 0: 3306.4. Samples: 10375344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:42:18,968][134211] Avg episode reward: [(0, '6.094')] [2025-01-03 22:42:19,804][134294] Updated weights for policy 0, policy_version 20724 (0.0012) [2025-01-03 22:42:21,788][134294] Updated weights for policy 0, policy_version 20734 (0.0014) [2025-01-03 22:42:23,827][134294] Updated weights for policy 0, policy_version 20744 (0.0014) [2025-01-03 22:42:23,968][134211] Fps is (10 sec: 19251.3, 60 sec: 14199.6, 300 sec: 14301.3). Total num frames: 84967424. Throughput: 0: 3547.0. Samples: 10406830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:23,968][134211] Avg episode reward: [(0, '6.314')] [2025-01-03 22:42:27,381][134294] Updated weights for policy 0, policy_version 20754 (0.0030) [2025-01-03 22:42:28,968][134211] Fps is (10 sec: 15564.4, 60 sec: 13994.6, 300 sec: 14259.6). Total num frames: 85020672. Throughput: 0: 3578.8. Samples: 10427156. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:28,968][134211] Avg episode reward: [(0, '6.196')] [2025-01-03 22:42:30,999][134294] Updated weights for policy 0, policy_version 20764 (0.0028) [2025-01-03 22:42:33,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13994.6, 300 sec: 14245.7). Total num frames: 85086208. Throughput: 0: 3572.6. Samples: 10436264. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:33,968][134211] Avg episode reward: [(0, '6.571')] [2025-01-03 22:42:34,272][134294] Updated weights for policy 0, policy_version 20774 (0.0030) [2025-01-03 22:42:37,328][134294] Updated weights for policy 0, policy_version 20784 (0.0025) [2025-01-03 22:42:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 14245.8). Total num frames: 85151744. Throughput: 0: 3549.1. Samples: 10455570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:42:38,968][134211] Avg episode reward: [(0, '6.317')] [2025-01-03 22:42:40,505][134294] Updated weights for policy 0, policy_version 20794 (0.0027) [2025-01-03 22:42:43,334][134294] Updated weights for policy 0, policy_version 20804 (0.0024) [2025-01-03 22:42:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.6, 300 sec: 14231.9). Total num frames: 85217280. Throughput: 0: 3598.3. Samples: 10475834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:42:43,968][134211] Avg episode reward: [(0, '6.223')] [2025-01-03 22:42:46,328][134294] Updated weights for policy 0, policy_version 20814 (0.0026) [2025-01-03 22:42:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14231.9). Total num frames: 85286912. Throughput: 0: 3602.0. Samples: 10486120. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:42:48,968][134211] Avg episode reward: [(0, '5.984')] [2025-01-03 22:42:49,454][134294] Updated weights for policy 0, policy_version 20824 (0.0025) [2025-01-03 22:42:52,417][134294] Updated weights for policy 0, policy_version 20834 (0.0023) [2025-01-03 22:42:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14218.0). Total num frames: 85352448. Throughput: 0: 3592.5. Samples: 10506312. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:53,968][134211] Avg episode reward: [(0, '6.460')] [2025-01-03 22:42:55,445][134294] Updated weights for policy 0, policy_version 20844 (0.0028) [2025-01-03 22:42:58,821][134294] Updated weights for policy 0, policy_version 20854 (0.0025) [2025-01-03 22:42:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 85417984. Throughput: 0: 3589.8. Samples: 10525854. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:42:58,968][134211] Avg episode reward: [(0, '6.071')] [2025-01-03 22:43:02,328][134294] Updated weights for policy 0, policy_version 20864 (0.0027) [2025-01-03 22:43:03,968][134211] Fps is (10 sec: 12287.6, 60 sec: 13926.6, 300 sec: 14176.3). Total num frames: 85475328. Throughput: 0: 3540.8. Samples: 10534682. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:43:03,969][134211] Avg episode reward: [(0, '6.034')] [2025-01-03 22:43:05,787][134294] Updated weights for policy 0, policy_version 20874 (0.0025) [2025-01-03 22:43:07,884][134294] Updated weights for policy 0, policy_version 20884 (0.0015) [2025-01-03 22:43:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14199.5, 300 sec: 14231.9). Total num frames: 85561344. Throughput: 0: 3290.2. Samples: 10554888. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:43:08,968][134211] Avg episode reward: [(0, '5.690')] [2025-01-03 22:43:09,917][134294] Updated weights for policy 0, policy_version 20894 (0.0013) [2025-01-03 22:43:11,762][134294] Updated weights for policy 0, policy_version 20904 (0.0012) [2025-01-03 22:43:13,968][134211] Fps is (10 sec: 17613.2, 60 sec: 14609.0, 300 sec: 14287.4). Total num frames: 85651456. Throughput: 0: 3491.6. Samples: 10584278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:43:13,969][134211] Avg episode reward: [(0, '5.366')] [2025-01-03 22:43:14,057][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020912_85655552.pth... [2025-01-03 22:43:14,134][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020090_82288640.pth [2025-01-03 22:43:14,696][134294] Updated weights for policy 0, policy_version 20914 (0.0025) [2025-01-03 22:43:18,006][134294] Updated weights for policy 0, policy_version 20924 (0.0028) [2025-01-03 22:43:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14131.1, 300 sec: 14204.1). Total num frames: 85712896. Throughput: 0: 3496.5. Samples: 10593608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:43:18,968][134211] Avg episode reward: [(0, '5.768')] [2025-01-03 22:43:21,276][134294] Updated weights for policy 0, policy_version 20934 (0.0024) [2025-01-03 22:43:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13516.7, 300 sec: 14148.5). Total num frames: 85778432. Throughput: 0: 3491.6. Samples: 10612692. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:23,969][134211] Avg episode reward: [(0, '6.141')] [2025-01-03 22:43:24,433][134294] Updated weights for policy 0, policy_version 20944 (0.0028) [2025-01-03 22:43:27,869][134294] Updated weights for policy 0, policy_version 20954 (0.0024) [2025-01-03 22:43:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13653.3, 300 sec: 14134.7). Total num frames: 85839872. Throughput: 0: 3450.8. Samples: 10631120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:28,969][134211] Avg episode reward: [(0, '5.329')] [2025-01-03 22:43:31,203][134294] Updated weights for policy 0, policy_version 20964 (0.0026) [2025-01-03 22:43:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13653.3, 300 sec: 14120.8). Total num frames: 85905408. Throughput: 0: 3426.7. Samples: 10640322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:33,968][134211] Avg episode reward: [(0, '6.032')] [2025-01-03 22:43:34,248][134294] Updated weights for policy 0, policy_version 20974 (0.0026) [2025-01-03 22:43:37,240][134294] Updated weights for policy 0, policy_version 20984 (0.0025) [2025-01-03 22:43:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13653.3, 300 sec: 14120.8). Total num frames: 85970944. Throughput: 0: 3430.0. Samples: 10660660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:38,968][134211] Avg episode reward: [(0, '5.843')] [2025-01-03 22:43:40,545][134294] Updated weights for policy 0, policy_version 20994 (0.0026) [2025-01-03 22:43:43,643][134294] Updated weights for policy 0, policy_version 21004 (0.0025) [2025-01-03 22:43:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.3, 300 sec: 14106.9). Total num frames: 86036480. Throughput: 0: 3424.1. Samples: 10679938. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:43,968][134211] Avg episode reward: [(0, '5.625')] [2025-01-03 22:43:46,630][134294] Updated weights for policy 0, policy_version 21014 (0.0022) [2025-01-03 22:43:48,656][134294] Updated weights for policy 0, policy_version 21024 (0.0012) [2025-01-03 22:43:48,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13858.1, 300 sec: 14162.5). Total num frames: 86118400. Throughput: 0: 3450.2. Samples: 10689938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:48,968][134211] Avg episode reward: [(0, '5.844')] [2025-01-03 22:43:51,480][134294] Updated weights for policy 0, policy_version 21034 (0.0023) [2025-01-03 22:43:53,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 86183936. Throughput: 0: 3540.8. Samples: 10714224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:53,968][134211] Avg episode reward: [(0, '5.828')] [2025-01-03 22:43:54,634][134294] Updated weights for policy 0, policy_version 21044 (0.0024) [2025-01-03 22:43:57,829][134294] Updated weights for policy 0, policy_version 21054 (0.0027) [2025-01-03 22:43:58,969][134211] Fps is (10 sec: 13106.0, 60 sec: 13858.0, 300 sec: 14023.6). Total num frames: 86249472. Throughput: 0: 3318.4. Samples: 10733610. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:43:58,969][134211] Avg episode reward: [(0, '5.752')] [2025-01-03 22:44:00,832][134294] Updated weights for policy 0, policy_version 21064 (0.0025) [2025-01-03 22:44:03,811][134294] Updated weights for policy 0, policy_version 21074 (0.0027) [2025-01-03 22:44:03,969][134211] Fps is (10 sec: 13516.5, 60 sec: 14063.0, 300 sec: 13898.6). Total num frames: 86319104. Throughput: 0: 3341.9. Samples: 10743992. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:03,970][134211] Avg episode reward: [(0, '5.581')] [2025-01-03 22:44:06,756][134294] Updated weights for policy 0, policy_version 21084 (0.0025) [2025-01-03 22:44:08,968][134211] Fps is (10 sec: 13927.6, 60 sec: 13789.8, 300 sec: 13898.6). Total num frames: 86388736. Throughput: 0: 3374.1. Samples: 10764526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:08,968][134211] Avg episode reward: [(0, '6.022')] [2025-01-03 22:44:09,893][134294] Updated weights for policy 0, policy_version 21094 (0.0023) [2025-01-03 22:44:12,325][134294] Updated weights for policy 0, policy_version 21104 (0.0019) [2025-01-03 22:44:13,968][134211] Fps is (10 sec: 15564.7, 60 sec: 13721.6, 300 sec: 13981.9). Total num frames: 86474752. Throughput: 0: 3497.9. Samples: 10788524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:13,969][134211] Avg episode reward: [(0, '5.924')] [2025-01-03 22:44:14,518][134294] Updated weights for policy 0, policy_version 21114 (0.0017) [2025-01-03 22:44:17,479][134294] Updated weights for policy 0, policy_version 21124 (0.0024) [2025-01-03 22:44:18,968][134211] Fps is (10 sec: 15154.7, 60 sec: 13789.8, 300 sec: 13981.9). Total num frames: 86540288. Throughput: 0: 3547.3. Samples: 10799952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:44:18,968][134211] Avg episode reward: [(0, '6.225')] [2025-01-03 22:44:20,538][134294] Updated weights for policy 0, policy_version 21134 (0.0023) [2025-01-03 22:44:23,480][134294] Updated weights for policy 0, policy_version 21144 (0.0028) [2025-01-03 22:44:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13858.2, 300 sec: 13995.8). Total num frames: 86609920. Throughput: 0: 3545.0. Samples: 10820186. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:44:23,968][134211] Avg episode reward: [(0, '6.370')] [2025-01-03 22:44:26,447][134294] Updated weights for policy 0, policy_version 21154 (0.0025) [2025-01-03 22:44:28,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13926.4, 300 sec: 13981.9). Total num frames: 86675456. Throughput: 0: 3571.3. Samples: 10840646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:44:28,968][134211] Avg episode reward: [(0, '6.678')] [2025-01-03 22:44:29,585][134294] Updated weights for policy 0, policy_version 21164 (0.0025) [2025-01-03 22:44:32,313][134294] Updated weights for policy 0, policy_version 21174 (0.0019) [2025-01-03 22:44:33,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14199.5, 300 sec: 14037.5). Total num frames: 86757376. Throughput: 0: 3565.9. Samples: 10850402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:44:33,968][134211] Avg episode reward: [(0, '6.288')] [2025-01-03 22:44:34,555][134294] Updated weights for policy 0, policy_version 21184 (0.0018) [2025-01-03 22:44:37,516][134294] Updated weights for policy 0, policy_version 21194 (0.0023) [2025-01-03 22:44:38,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 86827008. Throughput: 0: 3567.5. Samples: 10874760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:44:38,968][134211] Avg episode reward: [(0, '6.218')] [2025-01-03 22:44:40,585][134294] Updated weights for policy 0, policy_version 21204 (0.0025) [2025-01-03 22:44:43,510][134294] Updated weights for policy 0, policy_version 21214 (0.0024) [2025-01-03 22:44:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14037.5). Total num frames: 86896640. Throughput: 0: 3592.7. Samples: 10895280. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:44:43,968][134211] Avg episode reward: [(0, '5.977')] [2025-01-03 22:44:46,517][134294] Updated weights for policy 0, policy_version 21224 (0.0026) [2025-01-03 22:44:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 86966272. Throughput: 0: 3592.0. Samples: 10905632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:48,968][134211] Avg episode reward: [(0, '6.485')] [2025-01-03 22:44:49,489][134294] Updated weights for policy 0, policy_version 21234 (0.0028) [2025-01-03 22:44:52,511][134294] Updated weights for policy 0, policy_version 21244 (0.0024) [2025-01-03 22:44:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 87031808. Throughput: 0: 3587.7. Samples: 10925972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:53,968][134211] Avg episode reward: [(0, '6.140')] [2025-01-03 22:44:55,430][134294] Updated weights for policy 0, policy_version 21254 (0.0026) [2025-01-03 22:44:58,559][134294] Updated weights for policy 0, policy_version 21264 (0.0023) [2025-01-03 22:44:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14199.7, 300 sec: 14037.6). Total num frames: 87101440. Throughput: 0: 3497.5. Samples: 10945908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:44:58,968][134211] Avg episode reward: [(0, '6.625')] [2025-01-03 22:45:00,655][134294] Updated weights for policy 0, policy_version 21274 (0.0015) [2025-01-03 22:45:02,721][134294] Updated weights for policy 0, policy_version 21284 (0.0013) [2025-01-03 22:45:03,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14745.7, 300 sec: 14148.7). Total num frames: 87203840. Throughput: 0: 3566.9. Samples: 10960462. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:03,968][134211] Avg episode reward: [(0, '5.999')] [2025-01-03 22:45:04,662][134294] Updated weights for policy 0, policy_version 21294 (0.0013) [2025-01-03 22:45:06,522][134294] Updated weights for policy 0, policy_version 21304 (0.0012) [2025-01-03 22:45:08,434][134294] Updated weights for policy 0, policy_version 21314 (0.0014) [2025-01-03 22:45:08,967][134211] Fps is (10 sec: 20889.7, 60 sec: 15360.0, 300 sec: 14273.5). Total num frames: 87310336. Throughput: 0: 3819.1. Samples: 10992046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:08,968][134211] Avg episode reward: [(0, '6.159')] [2025-01-03 22:45:10,920][134294] Updated weights for policy 0, policy_version 21324 (0.0019) [2025-01-03 22:45:13,969][134211] Fps is (10 sec: 17611.1, 60 sec: 15086.8, 300 sec: 14273.5). Total num frames: 87379968. Throughput: 0: 3903.9. Samples: 11016326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:13,970][134211] Avg episode reward: [(0, '5.873')] [2025-01-03 22:45:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021333_87379968.pth... [2025-01-03 22:45:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020503_83980288.pth [2025-01-03 22:45:14,138][134294] Updated weights for policy 0, policy_version 21334 (0.0027) [2025-01-03 22:45:17,308][134294] Updated weights for policy 0, policy_version 21344 (0.0024) [2025-01-03 22:45:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15087.0, 300 sec: 14259.6). Total num frames: 87445504. Throughput: 0: 3892.2. Samples: 11025550. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:18,968][134211] Avg episode reward: [(0, '6.311')] [2025-01-03 22:45:20,492][134294] Updated weights for policy 0, policy_version 21354 (0.0029) [2025-01-03 22:45:23,501][134294] Updated weights for policy 0, policy_version 21364 (0.0026) [2025-01-03 22:45:23,968][134211] Fps is (10 sec: 13108.1, 60 sec: 15018.7, 300 sec: 14273.6). Total num frames: 87511040. Throughput: 0: 3799.5. Samples: 11045738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:23,968][134211] Avg episode reward: [(0, '6.565')] [2025-01-03 22:45:26,483][134294] Updated weights for policy 0, policy_version 21374 (0.0025) [2025-01-03 22:45:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.7, 300 sec: 14245.8). Total num frames: 87576576. Throughput: 0: 3787.3. Samples: 11065710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:28,968][134211] Avg episode reward: [(0, '6.656')] [2025-01-03 22:45:29,564][134294] Updated weights for policy 0, policy_version 21384 (0.0028) [2025-01-03 22:45:32,906][134294] Updated weights for policy 0, policy_version 21394 (0.0026) [2025-01-03 22:45:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.6, 300 sec: 14245.7). Total num frames: 87642112. Throughput: 0: 3769.4. Samples: 11075254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:33,968][134211] Avg episode reward: [(0, '6.141')] [2025-01-03 22:45:35,908][134294] Updated weights for policy 0, policy_version 21404 (0.0026) [2025-01-03 22:45:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14677.3, 300 sec: 14176.3). Total num frames: 87707648. Throughput: 0: 3761.0. Samples: 11095218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:38,969][134211] Avg episode reward: [(0, '6.406')] [2025-01-03 22:45:39,001][134294] Updated weights for policy 0, policy_version 21414 (0.0025) [2025-01-03 22:45:42,000][134294] Updated weights for policy 0, policy_version 21424 (0.0023) [2025-01-03 22:45:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.3, 300 sec: 14176.3). Total num frames: 87777280. Throughput: 0: 3765.7. Samples: 11115364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:43,968][134211] Avg episode reward: [(0, '6.450')] [2025-01-03 22:45:44,977][134294] Updated weights for policy 0, policy_version 21434 (0.0026) [2025-01-03 22:45:47,816][134294] Updated weights for policy 0, policy_version 21444 (0.0026) [2025-01-03 22:45:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.3, 300 sec: 14176.3). Total num frames: 87846912. Throughput: 0: 3679.8. Samples: 11126052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:48,968][134211] Avg episode reward: [(0, '5.917')] [2025-01-03 22:45:50,913][134294] Updated weights for policy 0, policy_version 21454 (0.0026) [2025-01-03 22:45:53,855][134294] Updated weights for policy 0, policy_version 21464 (0.0022) [2025-01-03 22:45:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14148.6). Total num frames: 87916544. Throughput: 0: 3427.8. Samples: 11146298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:53,968][134211] Avg episode reward: [(0, '5.972')] [2025-01-03 22:45:56,972][134294] Updated weights for policy 0, policy_version 21474 (0.0025) [2025-01-03 22:45:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14176.3). Total num frames: 87982080. Throughput: 0: 3339.9. Samples: 11166618. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:45:58,968][134211] Avg episode reward: [(0, '5.830')] [2025-01-03 22:45:59,896][134294] Updated weights for policy 0, policy_version 21484 (0.0025) [2025-01-03 22:46:02,936][134294] Updated weights for policy 0, policy_version 21494 (0.0023) [2025-01-03 22:46:03,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14131.1, 300 sec: 14148.5). Total num frames: 88051712. Throughput: 0: 3366.6. Samples: 11177048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:46:03,969][134211] Avg episode reward: [(0, '6.265')] [2025-01-03 22:46:05,919][134294] Updated weights for policy 0, policy_version 21504 (0.0025) [2025-01-03 22:46:08,782][134294] Updated weights for policy 0, policy_version 21514 (0.0025) [2025-01-03 22:46:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13516.8, 300 sec: 14037.5). Total num frames: 88121344. Throughput: 0: 3379.7. Samples: 11197826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:46:08,968][134211] Avg episode reward: [(0, '5.897')] [2025-01-03 22:46:11,793][134294] Updated weights for policy 0, policy_version 21524 (0.0022) [2025-01-03 22:46:13,968][134211] Fps is (10 sec: 13927.0, 60 sec: 13517.0, 300 sec: 14023.6). Total num frames: 88190976. Throughput: 0: 3391.9. Samples: 11218346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:46:13,968][134211] Avg episode reward: [(0, '6.548')] [2025-01-03 22:46:14,838][134294] Updated weights for policy 0, policy_version 21534 (0.0026) [2025-01-03 22:46:17,139][134294] Updated weights for policy 0, policy_version 21544 (0.0016) [2025-01-03 22:46:18,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13789.8, 300 sec: 14093.0). Total num frames: 88272896. Throughput: 0: 3421.2. Samples: 11229208. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:18,968][134211] Avg episode reward: [(0, '6.230')] [2025-01-03 22:46:19,502][134294] Updated weights for policy 0, policy_version 21554 (0.0019) [2025-01-03 22:46:22,571][134294] Updated weights for policy 0, policy_version 21564 (0.0026) [2025-01-03 22:46:23,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13858.2, 300 sec: 14106.9). Total num frames: 88342528. Throughput: 0: 3519.1. Samples: 11253576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:23,968][134211] Avg episode reward: [(0, '6.172')] [2025-01-03 22:46:25,551][134294] Updated weights for policy 0, policy_version 21574 (0.0026) [2025-01-03 22:46:28,586][134294] Updated weights for policy 0, policy_version 21584 (0.0025) [2025-01-03 22:46:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13926.4, 300 sec: 14120.8). Total num frames: 88412160. Throughput: 0: 3527.2. Samples: 11274088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:28,968][134211] Avg episode reward: [(0, '6.116')] [2025-01-03 22:46:31,017][134294] Updated weights for policy 0, policy_version 21594 (0.0020) [2025-01-03 22:46:32,963][134294] Updated weights for policy 0, policy_version 21604 (0.0013) [2025-01-03 22:46:33,967][134211] Fps is (10 sec: 16793.8, 60 sec: 14472.6, 300 sec: 14218.0). Total num frames: 88510464. Throughput: 0: 3562.1. Samples: 11286344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:33,968][134211] Avg episode reward: [(0, '6.512')] [2025-01-03 22:46:34,799][134294] Updated weights for policy 0, policy_version 21614 (0.0012) [2025-01-03 22:46:36,760][134294] Updated weights for policy 0, policy_version 21624 (0.0012) [2025-01-03 22:46:38,968][134211] Fps is (10 sec: 19660.7, 60 sec: 15018.7, 300 sec: 14342.9). Total num frames: 88608768. Throughput: 0: 3833.7. Samples: 11318812. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:38,968][134211] Avg episode reward: [(0, '6.144')] [2025-01-03 22:46:38,980][134294] Updated weights for policy 0, policy_version 21634 (0.0020) [2025-01-03 22:46:42,357][134294] Updated weights for policy 0, policy_version 21644 (0.0030) [2025-01-03 22:46:43,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14882.1, 300 sec: 14315.2). Total num frames: 88670208. Throughput: 0: 3837.0. Samples: 11339282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:43,968][134211] Avg episode reward: [(0, '6.296')] [2025-01-03 22:46:45,476][134294] Updated weights for policy 0, policy_version 21654 (0.0030) [2025-01-03 22:46:48,549][134294] Updated weights for policy 0, policy_version 21664 (0.0022) [2025-01-03 22:46:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 14329.1). Total num frames: 88739840. Throughput: 0: 3824.6. Samples: 11349154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:48,968][134211] Avg episode reward: [(0, '6.085')] [2025-01-03 22:46:51,568][134294] Updated weights for policy 0, policy_version 21674 (0.0026) [2025-01-03 22:46:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14356.8). Total num frames: 88805376. Throughput: 0: 3809.0. Samples: 11369230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:53,968][134211] Avg episode reward: [(0, '6.545')] [2025-01-03 22:46:54,789][134294] Updated weights for policy 0, policy_version 21684 (0.0025) [2025-01-03 22:46:57,851][134294] Updated weights for policy 0, policy_version 21694 (0.0025) [2025-01-03 22:46:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 14343.0). Total num frames: 88870912. Throughput: 0: 3783.9. Samples: 11388622. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:46:58,968][134211] Avg episode reward: [(0, '6.422')] [2025-01-03 22:47:01,409][134294] Updated weights for policy 0, policy_version 21704 (0.0027) [2025-01-03 22:47:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14540.9, 300 sec: 14287.4). Total num frames: 88924160. Throughput: 0: 3734.0. Samples: 11397240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:47:03,968][134211] Avg episode reward: [(0, '6.576')] [2025-01-03 22:47:05,139][134294] Updated weights for policy 0, policy_version 21714 (0.0024) [2025-01-03 22:47:08,485][134294] Updated weights for policy 0, policy_version 21724 (0.0025) [2025-01-03 22:47:08,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14404.2, 300 sec: 14273.5). Total num frames: 88985600. Throughput: 0: 3580.2. Samples: 11414686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:47:08,968][134211] Avg episode reward: [(0, '6.274')] [2025-01-03 22:47:10,942][134294] Updated weights for policy 0, policy_version 21734 (0.0018) [2025-01-03 22:47:12,836][134294] Updated weights for policy 0, policy_version 21744 (0.0012) [2025-01-03 22:47:13,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14950.4, 300 sec: 14315.2). Total num frames: 89088000. Throughput: 0: 3711.9. Samples: 11441124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:47:13,968][134211] Avg episode reward: [(0, '6.185')] [2025-01-03 22:47:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021750_89088000.pth... 
[2025-01-03 22:47:14,015][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020912_85655552.pth [2025-01-03 22:47:14,690][134294] Updated weights for policy 0, policy_version 21754 (0.0012) [2025-01-03 22:47:16,607][134294] Updated weights for policy 0, policy_version 21764 (0.0014) [2025-01-03 22:47:18,856][134294] Updated weights for policy 0, policy_version 21774 (0.0019) [2025-01-03 22:47:18,971][134211] Fps is (10 sec: 20064.3, 60 sec: 15222.7, 300 sec: 14301.1). Total num frames: 89186304. Throughput: 0: 3797.8. Samples: 11457256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:47:18,971][134211] Avg episode reward: [(0, '6.423')] [2025-01-03 22:47:22,001][134294] Updated weights for policy 0, policy_version 21784 (0.0028) [2025-01-03 22:47:23,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15086.9, 300 sec: 14329.1). Total num frames: 89247744. Throughput: 0: 3595.4. Samples: 11480606. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:47:23,969][134211] Avg episode reward: [(0, '6.404')] [2025-01-03 22:47:25,670][134294] Updated weights for policy 0, policy_version 21794 (0.0028) [2025-01-03 22:47:28,968][134211] Fps is (10 sec: 11881.3, 60 sec: 14882.0, 300 sec: 14301.3). Total num frames: 89305088. Throughput: 0: 3529.1. Samples: 11498092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:47:28,969][134211] Avg episode reward: [(0, '6.695')] [2025-01-03 22:47:29,061][134294] Updated weights for policy 0, policy_version 21804 (0.0026) [2025-01-03 22:47:32,165][134294] Updated weights for policy 0, policy_version 21814 (0.0025) [2025-01-03 22:47:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14335.9, 300 sec: 14301.3). Total num frames: 89370624. Throughput: 0: 3510.4. Samples: 11507124. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:47:33,968][134211] Avg episode reward: [(0, '6.743')] [2025-01-03 22:47:35,328][134294] Updated weights for policy 0, policy_version 21824 (0.0026) [2025-01-03 22:47:38,296][134294] Updated weights for policy 0, policy_version 21834 (0.0025) [2025-01-03 22:47:38,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13858.1, 300 sec: 14315.2). Total num frames: 89440256. Throughput: 0: 3518.6. Samples: 11527568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:47:38,968][134211] Avg episode reward: [(0, '6.046')] [2025-01-03 22:47:41,462][134294] Updated weights for policy 0, policy_version 21844 (0.0025) [2025-01-03 22:47:43,971][134211] Fps is (10 sec: 13512.6, 60 sec: 13925.7, 300 sec: 14301.1). Total num frames: 89505792. Throughput: 0: 3524.8. Samples: 11547248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:47:43,972][134211] Avg episode reward: [(0, '6.532')] [2025-01-03 22:47:44,578][134294] Updated weights for policy 0, policy_version 21854 (0.0022) [2025-01-03 22:47:47,575][134294] Updated weights for policy 0, policy_version 21864 (0.0025) [2025-01-03 22:47:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 14301.3). Total num frames: 89571328. Throughput: 0: 3553.8. Samples: 11557160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:47:48,968][134211] Avg episode reward: [(0, '6.954')] [2025-01-03 22:47:50,513][134294] Updated weights for policy 0, policy_version 21874 (0.0024) [2025-01-03 22:47:53,457][134294] Updated weights for policy 0, policy_version 21884 (0.0024) [2025-01-03 22:47:53,968][134211] Fps is (10 sec: 13521.0, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 89640960. Throughput: 0: 3628.2. Samples: 11577954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:47:53,968][134211] Avg episode reward: [(0, '5.733')] [2025-01-03 22:47:56,420][134294] Updated weights for policy 0, policy_version 21894 (0.0023) [2025-01-03 22:47:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.4, 300 sec: 14343.0). Total num frames: 89706496. Throughput: 0: 3494.0. Samples: 11598356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:47:58,968][134211] Avg episode reward: [(0, '5.599')] [2025-01-03 22:47:59,687][134294] Updated weights for policy 0, policy_version 21904 (0.0028) [2025-01-03 22:48:02,861][134294] Updated weights for policy 0, policy_version 21914 (0.0027) [2025-01-03 22:48:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14273.5). Total num frames: 89772032. Throughput: 0: 3339.3. Samples: 11607516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:48:03,968][134211] Avg episode reward: [(0, '5.423')] [2025-01-03 22:48:05,788][134294] Updated weights for policy 0, policy_version 21924 (0.0027) [2025-01-03 22:48:08,753][134294] Updated weights for policy 0, policy_version 21934 (0.0024) [2025-01-03 22:48:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 89841664. Throughput: 0: 3280.1. Samples: 11628212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:48:08,968][134211] Avg episode reward: [(0, '5.502')] [2025-01-03 22:48:11,445][134294] Updated weights for policy 0, policy_version 21944 (0.0021) [2025-01-03 22:48:13,347][134294] Updated weights for policy 0, policy_version 21954 (0.0012) [2025-01-03 22:48:13,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 89935872. Throughput: 0: 3447.1. Samples: 11653210. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:48:13,968][134211] Avg episode reward: [(0, '5.762')] [2025-01-03 22:48:15,230][134294] Updated weights for policy 0, policy_version 21964 (0.0013) [2025-01-03 22:48:17,129][134294] Updated weights for policy 0, policy_version 21974 (0.0013) [2025-01-03 22:48:18,968][134211] Fps is (10 sec: 19251.4, 60 sec: 14131.9, 300 sec: 14426.3). Total num frames: 90034176. Throughput: 0: 3608.5. Samples: 11669508. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:48:18,968][134211] Avg episode reward: [(0, '6.624')] [2025-01-03 22:48:19,645][134294] Updated weights for policy 0, policy_version 21984 (0.0023) [2025-01-03 22:48:22,846][134294] Updated weights for policy 0, policy_version 21994 (0.0026) [2025-01-03 22:48:23,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14199.5, 300 sec: 14440.1). Total num frames: 90099712. Throughput: 0: 3676.5. Samples: 11693012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:48:23,968][134211] Avg episode reward: [(0, '6.372')] [2025-01-03 22:48:25,983][134294] Updated weights for policy 0, policy_version 22004 (0.0027) [2025-01-03 22:48:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.1, 300 sec: 14440.1). Total num frames: 90165248. Throughput: 0: 3679.5. Samples: 11712812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:28,968][134211] Avg episode reward: [(0, '5.611')] [2025-01-03 22:48:29,102][134294] Updated weights for policy 0, policy_version 22014 (0.0023) [2025-01-03 22:48:32,184][134294] Updated weights for policy 0, policy_version 22024 (0.0028) [2025-01-03 22:48:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 90230784. Throughput: 0: 3672.8. Samples: 11722436. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:33,968][134211] Avg episode reward: [(0, '6.064')] [2025-01-03 22:48:35,272][134294] Updated weights for policy 0, policy_version 22034 (0.0025) [2025-01-03 22:48:38,210][134294] Updated weights for policy 0, policy_version 22044 (0.0026) [2025-01-03 22:48:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 90300416. Throughput: 0: 3662.8. Samples: 11742782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:38,968][134211] Avg episode reward: [(0, '6.013')] [2025-01-03 22:48:41,321][134294] Updated weights for policy 0, policy_version 22054 (0.0026) [2025-01-03 22:48:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.7, 300 sec: 14398.5). Total num frames: 90365952. Throughput: 0: 3655.8. Samples: 11762866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:43,968][134211] Avg episode reward: [(0, '5.985')] [2025-01-03 22:48:44,286][134294] Updated weights for policy 0, policy_version 22064 (0.0027) [2025-01-03 22:48:47,409][134294] Updated weights for policy 0, policy_version 22074 (0.0025) [2025-01-03 22:48:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 90435584. Throughput: 0: 3672.9. Samples: 11772794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:48,968][134211] Avg episode reward: [(0, '5.798')] [2025-01-03 22:48:50,332][134294] Updated weights for policy 0, policy_version 22084 (0.0023) [2025-01-03 22:48:53,233][134294] Updated weights for policy 0, policy_version 22094 (0.0026) [2025-01-03 22:48:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.3, 300 sec: 14426.3). Total num frames: 90505216. Throughput: 0: 3676.9. Samples: 11793672. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:53,968][134211] Avg episode reward: [(0, '6.110')] [2025-01-03 22:48:56,253][134294] Updated weights for policy 0, policy_version 22104 (0.0026) [2025-01-03 22:48:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14398.5). Total num frames: 90566656. Throughput: 0: 3564.9. Samples: 11813632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:48:58,968][134211] Avg episode reward: [(0, '5.896')] [2025-01-03 22:48:59,703][134294] Updated weights for policy 0, policy_version 22114 (0.0025) [2025-01-03 22:49:02,737][134294] Updated weights for policy 0, policy_version 22124 (0.0023) [2025-01-03 22:49:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14412.4). Total num frames: 90640384. Throughput: 0: 3395.3. Samples: 11822298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:49:03,968][134211] Avg episode reward: [(0, '5.509')] [2025-01-03 22:49:04,816][134294] Updated weights for policy 0, policy_version 22134 (0.0013) [2025-01-03 22:49:06,713][134294] Updated weights for policy 0, policy_version 22144 (0.0014) [2025-01-03 22:49:08,581][134294] Updated weights for policy 0, policy_version 22154 (0.0013) [2025-01-03 22:49:08,968][134211] Fps is (10 sec: 18022.6, 60 sec: 15087.0, 300 sec: 14481.8). Total num frames: 90746880. Throughput: 0: 3514.6. Samples: 11851166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:49:08,968][134211] Avg episode reward: [(0, '5.250')] [2025-01-03 22:49:11,059][134294] Updated weights for policy 0, policy_version 22164 (0.0018) [2025-01-03 22:49:13,968][134211] Fps is (10 sec: 18020.9, 60 sec: 14745.4, 300 sec: 14509.5). Total num frames: 90820608. Throughput: 0: 3632.3. Samples: 11876270. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:13,969][134211] Avg episode reward: [(0, '5.501')] [2025-01-03 22:49:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022173_90820608.pth... [2025-01-03 22:49:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021333_87379968.pth [2025-01-03 22:49:14,196][134294] Updated weights for policy 0, policy_version 22174 (0.0027) [2025-01-03 22:49:17,463][134294] Updated weights for policy 0, policy_version 22184 (0.0028) [2025-01-03 22:49:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.4, 300 sec: 14495.7). Total num frames: 90886144. Throughput: 0: 3625.8. Samples: 11885596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:18,968][134211] Avg episode reward: [(0, '5.250')] [2025-01-03 22:49:20,453][134294] Updated weights for policy 0, policy_version 22194 (0.0029) [2025-01-03 22:49:23,548][134294] Updated weights for policy 0, policy_version 22204 (0.0025) [2025-01-03 22:49:23,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14199.4, 300 sec: 14495.7). Total num frames: 90951680. Throughput: 0: 3619.5. Samples: 11905658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:23,969][134211] Avg episode reward: [(0, '5.534')] [2025-01-03 22:49:26,508][134294] Updated weights for policy 0, policy_version 22214 (0.0025) [2025-01-03 22:49:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.5, 300 sec: 14440.1). Total num frames: 91017216. Throughput: 0: 3617.2. Samples: 11925638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:28,968][134211] Avg episode reward: [(0, '4.990')] [2025-01-03 22:49:29,719][134294] Updated weights for policy 0, policy_version 22224 (0.0024) [2025-01-03 22:49:32,710][134294] Updated weights for policy 0, policy_version 22234 (0.0028) [2025-01-03 22:49:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.7, 300 sec: 14440.1). Total num frames: 91086848. Throughput: 0: 3618.6. Samples: 11935632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:33,968][134211] Avg episode reward: [(0, '5.416')] [2025-01-03 22:49:35,673][134294] Updated weights for policy 0, policy_version 22244 (0.0022) [2025-01-03 22:49:38,696][134294] Updated weights for policy 0, policy_version 22254 (0.0025) [2025-01-03 22:49:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.5, 300 sec: 14426.2). Total num frames: 91152384. Throughput: 0: 3610.3. Samples: 11956138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:38,969][134211] Avg episode reward: [(0, '5.447')] [2025-01-03 22:49:41,728][134294] Updated weights for policy 0, policy_version 22264 (0.0025) [2025-01-03 22:49:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14426.2). Total num frames: 91222016. Throughput: 0: 3614.1. Samples: 11976266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:43,968][134211] Avg episode reward: [(0, '5.425')] [2025-01-03 22:49:44,850][134294] Updated weights for policy 0, policy_version 22274 (0.0026) [2025-01-03 22:49:47,881][134294] Updated weights for policy 0, policy_version 22284 (0.0022) [2025-01-03 22:49:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 14426.3). Total num frames: 91287552. Throughput: 0: 3642.1. Samples: 11986194. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:48,968][134211] Avg episode reward: [(0, '4.837')] [2025-01-03 22:49:50,847][134294] Updated weights for policy 0, policy_version 22294 (0.0024) [2025-01-03 22:49:53,745][134294] Updated weights for policy 0, policy_version 22304 (0.0026) [2025-01-03 22:49:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 14426.2). Total num frames: 91357184. Throughput: 0: 3464.6. Samples: 12007076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:53,968][134211] Avg episode reward: [(0, '5.976')] [2025-01-03 22:49:56,764][134294] Updated weights for policy 0, policy_version 22314 (0.0024) [2025-01-03 22:49:58,848][134294] Updated weights for policy 0, policy_version 22324 (0.0014) [2025-01-03 22:49:58,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 91439104. Throughput: 0: 3405.1. Samples: 12029496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:49:58,968][134211] Avg episode reward: [(0, '5.493')] [2025-01-03 22:50:01,603][134294] Updated weights for policy 0, policy_version 22334 (0.0023) [2025-01-03 22:50:03,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 91508736. Throughput: 0: 3467.8. Samples: 12041646. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:50:03,968][134211] Avg episode reward: [(0, '5.636')] [2025-01-03 22:50:04,783][134294] Updated weights for policy 0, policy_version 22344 (0.0023) [2025-01-03 22:50:07,647][134294] Updated weights for policy 0, policy_version 22354 (0.0025) [2025-01-03 22:50:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.1, 300 sec: 14231.9). Total num frames: 91578368. Throughput: 0: 3470.9. Samples: 12061848. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:50:08,968][134211] Avg episode reward: [(0, '5.839')] [2025-01-03 22:50:10,749][134294] Updated weights for policy 0, policy_version 22364 (0.0025) [2025-01-03 22:50:13,578][134294] Updated weights for policy 0, policy_version 22374 (0.0023) [2025-01-03 22:50:13,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13790.1, 300 sec: 14245.8). Total num frames: 91648000. Throughput: 0: 3488.2. Samples: 12082606. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 22:50:13,968][134211] Avg episode reward: [(0, '6.352')] [2025-01-03 22:50:15,558][134294] Updated weights for policy 0, policy_version 22384 (0.0014) [2025-01-03 22:50:18,403][134294] Updated weights for policy 0, policy_version 22394 (0.0025) [2025-01-03 22:50:18,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 91734016. Throughput: 0: 3584.7. Samples: 12096942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:50:18,968][134211] Avg episode reward: [(0, '6.249')] [2025-01-03 22:50:21,314][134294] Updated weights for policy 0, policy_version 22404 (0.0024) [2025-01-03 22:50:23,968][134211] Fps is (10 sec: 15154.2, 60 sec: 14131.1, 300 sec: 14315.1). Total num frames: 91799552. Throughput: 0: 3591.3. Samples: 12117746. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:50:23,969][134211] Avg episode reward: [(0, '6.396')] [2025-01-03 22:50:24,450][134294] Updated weights for policy 0, policy_version 22414 (0.0025) [2025-01-03 22:50:27,429][134294] Updated weights for policy 0, policy_version 22424 (0.0025) [2025-01-03 22:50:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 91869184. Throughput: 0: 3589.5. Samples: 12137794. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:50:28,968][134211] Avg episode reward: [(0, '6.549')] [2025-01-03 22:50:30,545][134294] Updated weights for policy 0, policy_version 22434 (0.0026) [2025-01-03 22:50:33,493][134294] Updated weights for policy 0, policy_version 22444 (0.0023) [2025-01-03 22:50:33,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14131.2, 300 sec: 14329.1). Total num frames: 91934720. Throughput: 0: 3594.7. Samples: 12147954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:33,968][134211] Avg episode reward: [(0, '7.002')] [2025-01-03 22:50:36,405][134294] Updated weights for policy 0, policy_version 22454 (0.0027) [2025-01-03 22:50:38,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14267.8, 300 sec: 14343.0). Total num frames: 92008448. Throughput: 0: 3588.5. Samples: 12168558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:38,968][134211] Avg episode reward: [(0, '6.260')] [2025-01-03 22:50:38,999][134294] Updated weights for policy 0, policy_version 22464 (0.0018) [2025-01-03 22:50:40,886][134294] Updated weights for policy 0, policy_version 22474 (0.0013) [2025-01-03 22:50:42,783][134294] Updated weights for policy 0, policy_version 22484 (0.0013) [2025-01-03 22:50:43,968][134211] Fps is (10 sec: 18432.2, 60 sec: 14950.5, 300 sec: 14481.8). Total num frames: 92119040. Throughput: 0: 3768.8. Samples: 12199094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:43,968][134211] Avg episode reward: [(0, '6.345')] [2025-01-03 22:50:44,642][134294] Updated weights for policy 0, policy_version 22494 (0.0015) [2025-01-03 22:50:46,501][134294] Updated weights for policy 0, policy_version 22504 (0.0013) [2025-01-03 22:50:48,804][134294] Updated weights for policy 0, policy_version 22514 (0.0017) [2025-01-03 22:50:48,968][134211] Fps is (10 sec: 20889.0, 60 sec: 15496.5, 300 sec: 14579.0). Total num frames: 92217344. Throughput: 0: 3861.2. Samples: 12215398. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:48,972][134211] Avg episode reward: [(0, '6.949')] [2025-01-03 22:50:51,862][134294] Updated weights for policy 0, policy_version 22524 (0.0029) [2025-01-03 22:50:53,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15428.3, 300 sec: 14579.0). Total num frames: 92282880. Throughput: 0: 3924.6. Samples: 12238454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:53,968][134211] Avg episode reward: [(0, '7.089')] [2025-01-03 22:50:53,980][134264] Saving new best policy, reward=7.089! [2025-01-03 22:50:55,364][134294] Updated weights for policy 0, policy_version 22534 (0.0030) [2025-01-03 22:50:58,521][134294] Updated weights for policy 0, policy_version 22544 (0.0026) [2025-01-03 22:50:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15018.6, 300 sec: 14537.3). Total num frames: 92340224. Throughput: 0: 3878.8. Samples: 12257154. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:50:58,968][134211] Avg episode reward: [(0, '6.564')] [2025-01-03 22:51:02,248][134294] Updated weights for policy 0, policy_version 22554 (0.0024) [2025-01-03 22:51:03,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14813.8, 300 sec: 14495.7). Total num frames: 92397568. Throughput: 0: 3750.9. Samples: 12265732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:51:03,969][134211] Avg episode reward: [(0, '6.205')] [2025-01-03 22:51:05,844][134294] Updated weights for policy 0, policy_version 22564 (0.0023) [2025-01-03 22:51:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14677.3, 300 sec: 14467.9). Total num frames: 92459008. Throughput: 0: 3671.2. Samples: 12282946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:51:08,968][134211] Avg episode reward: [(0, '6.130')] [2025-01-03 22:51:09,205][134294] Updated weights for policy 0, policy_version 22574 (0.0035) [2025-01-03 22:51:12,436][134294] Updated weights for policy 0, policy_version 22584 (0.0028) [2025-01-03 22:51:13,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14540.8, 300 sec: 14398.5). Total num frames: 92520448. Throughput: 0: 3638.1. Samples: 12301508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:51:13,968][134211] Avg episode reward: [(0, '6.749')] [2025-01-03 22:51:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022588_92520448.pth... [2025-01-03 22:51:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021750_89088000.pth [2025-01-03 22:51:15,590][134294] Updated weights for policy 0, policy_version 22594 (0.0028) [2025-01-03 22:51:17,900][134294] Updated weights for policy 0, policy_version 22604 (0.0017) [2025-01-03 22:51:18,968][134211] Fps is (10 sec: 14335.4, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 92602368. Throughput: 0: 3634.0. Samples: 12311484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:51:18,969][134211] Avg episode reward: [(0, '6.207')] [2025-01-03 22:51:20,341][134294] Updated weights for policy 0, policy_version 22614 (0.0022) [2025-01-03 22:51:23,387][134294] Updated weights for policy 0, policy_version 22624 (0.0024) [2025-01-03 22:51:23,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14540.9, 300 sec: 14440.1). Total num frames: 92672000. Throughput: 0: 3729.5. Samples: 12336388. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:51:23,968][134211] Avg episode reward: [(0, '6.665')] [2025-01-03 22:51:26,623][134294] Updated weights for policy 0, policy_version 22634 (0.0025) [2025-01-03 22:51:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14472.5, 300 sec: 14329.0). Total num frames: 92737536. Throughput: 0: 3475.2. Samples: 12355478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:51:28,969][134211] Avg episode reward: [(0, '6.213')] [2025-01-03 22:51:29,737][134294] Updated weights for policy 0, policy_version 22644 (0.0025) [2025-01-03 22:51:32,730][134294] Updated weights for policy 0, policy_version 22654 (0.0022) [2025-01-03 22:51:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14231.9). Total num frames: 92807168. Throughput: 0: 3338.4. Samples: 12365628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:51:33,968][134211] Avg episode reward: [(0, '6.524')] [2025-01-03 22:51:35,698][134294] Updated weights for policy 0, policy_version 22664 (0.0025) [2025-01-03 22:51:38,667][134294] Updated weights for policy 0, policy_version 22674 (0.0024) [2025-01-03 22:51:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14259.6). Total num frames: 92876800. Throughput: 0: 3288.0. Samples: 12386412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:51:38,968][134211] Avg episode reward: [(0, '5.803')] [2025-01-03 22:51:41,707][134294] Updated weights for policy 0, policy_version 22684 (0.0025) [2025-01-03 22:51:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13789.9, 300 sec: 14259.6). Total num frames: 92946432. Throughput: 0: 3321.7. Samples: 12406632. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 22:51:43,968][134211] Avg episode reward: [(0, '6.784')] [2025-01-03 22:51:44,261][134294] Updated weights for policy 0, policy_version 22694 (0.0020) [2025-01-03 22:51:46,103][134294] Updated weights for policy 0, policy_version 22704 (0.0012) [2025-01-03 22:51:48,379][134294] Updated weights for policy 0, policy_version 22714 (0.0016) [2025-01-03 22:51:48,968][134211] Fps is (10 sec: 16793.7, 60 sec: 13789.9, 300 sec: 14370.7). Total num frames: 93044736. Throughput: 0: 3481.7. Samples: 12422406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:51:48,968][134211] Avg episode reward: [(0, '6.555')] [2025-01-03 22:51:51,610][134294] Updated weights for policy 0, policy_version 22724 (0.0030) [2025-01-03 22:51:53,968][134211] Fps is (10 sec: 15564.4, 60 sec: 13653.3, 300 sec: 14342.9). Total num frames: 93102080. Throughput: 0: 3579.1. Samples: 12444006. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:51:53,969][134211] Avg episode reward: [(0, '5.961')] [2025-01-03 22:51:55,172][134294] Updated weights for policy 0, policy_version 22734 (0.0027) [2025-01-03 22:51:57,408][134294] Updated weights for policy 0, policy_version 22744 (0.0015) [2025-01-03 22:51:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 14412.4). Total num frames: 93175808. Throughput: 0: 3650.0. Samples: 12465756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 22:51:58,968][134211] Avg episode reward: [(0, '5.961')] [2025-01-03 22:52:00,554][134294] Updated weights for policy 0, policy_version 22754 (0.0025) [2025-01-03 22:52:03,730][134294] Updated weights for policy 0, policy_version 22764 (0.0027) [2025-01-03 22:52:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 93241344. Throughput: 0: 3640.3. Samples: 12475296. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:52:03,968][134211] Avg episode reward: [(0, '6.221')] [2025-01-03 22:52:06,883][134294] Updated weights for policy 0, policy_version 22774 (0.0028) [2025-01-03 22:52:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14301.3). Total num frames: 93306880. Throughput: 0: 3517.1. Samples: 12494658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:52:08,968][134211] Avg episode reward: [(0, '6.334')] [2025-01-03 22:52:10,193][134294] Updated weights for policy 0, policy_version 22784 (0.0026) [2025-01-03 22:52:12,833][134294] Updated weights for policy 0, policy_version 22794 (0.0018) [2025-01-03 22:52:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14404.3, 300 sec: 14232.0). Total num frames: 93384704. Throughput: 0: 3570.6. Samples: 12516156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:52:13,968][134211] Avg episode reward: [(0, '5.699')] [2025-01-03 22:52:14,766][134294] Updated weights for policy 0, policy_version 22804 (0.0013) [2025-01-03 22:52:16,957][134294] Updated weights for policy 0, policy_version 22814 (0.0016) [2025-01-03 22:52:18,968][134211] Fps is (10 sec: 16383.2, 60 sec: 14472.5, 300 sec: 14315.2). Total num frames: 93470720. Throughput: 0: 3690.3. Samples: 12531694. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:52:18,969][134211] Avg episode reward: [(0, '6.021')] [2025-01-03 22:52:19,983][134294] Updated weights for policy 0, policy_version 22824 (0.0026) [2025-01-03 22:52:23,294][134294] Updated weights for policy 0, policy_version 22834 (0.0026) [2025-01-03 22:52:23,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 93532160. Throughput: 0: 3681.2. Samples: 12552066. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:52:23,968][134211] Avg episode reward: [(0, '5.985')] [2025-01-03 22:52:26,715][134294] Updated weights for policy 0, policy_version 22844 (0.0023) [2025-01-03 22:52:28,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14336.1, 300 sec: 14329.1). Total num frames: 93597696. Throughput: 0: 3637.2. Samples: 12570306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:52:28,968][134211] Avg episode reward: [(0, '6.107')] [2025-01-03 22:52:29,784][134294] Updated weights for policy 0, policy_version 22854 (0.0024) [2025-01-03 22:52:32,952][134294] Updated weights for policy 0, policy_version 22864 (0.0025) [2025-01-03 22:52:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14315.2). Total num frames: 93663232. Throughput: 0: 3498.6. Samples: 12579842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:33,969][134211] Avg episode reward: [(0, '6.251')] [2025-01-03 22:52:36,012][134294] Updated weights for policy 0, policy_version 22874 (0.0023) [2025-01-03 22:52:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14199.5, 300 sec: 14315.3). Total num frames: 93728768. Throughput: 0: 3469.8. Samples: 12600148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:38,968][134211] Avg episode reward: [(0, '5.543')] [2025-01-03 22:52:39,154][134294] Updated weights for policy 0, policy_version 22884 (0.0027) [2025-01-03 22:52:42,206][134294] Updated weights for policy 0, policy_version 22894 (0.0025) [2025-01-03 22:52:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 93794304. Throughput: 0: 3428.4. Samples: 12620034. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:43,968][134211] Avg episode reward: [(0, '6.441')] [2025-01-03 22:52:45,124][134294] Updated weights for policy 0, policy_version 22904 (0.0025) [2025-01-03 22:52:47,181][134294] Updated weights for policy 0, policy_version 22914 (0.0014) [2025-01-03 22:52:48,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 93892608. Throughput: 0: 3478.8. Samples: 12631842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:48,968][134211] Avg episode reward: [(0, '5.751')] [2025-01-03 22:52:49,090][134294] Updated weights for policy 0, policy_version 22924 (0.0015) [2025-01-03 22:52:50,963][134294] Updated weights for policy 0, policy_version 22934 (0.0014) [2025-01-03 22:52:52,881][134294] Updated weights for policy 0, policy_version 22944 (0.0013) [2025-01-03 22:52:53,967][134211] Fps is (10 sec: 20480.5, 60 sec: 14950.5, 300 sec: 14551.2). Total num frames: 93999104. Throughput: 0: 3766.9. Samples: 12664168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:53,968][134211] Avg episode reward: [(0, '5.631')] [2025-01-03 22:52:54,702][134294] Updated weights for policy 0, policy_version 22954 (0.0013) [2025-01-03 22:52:57,499][134294] Updated weights for policy 0, policy_version 22964 (0.0025) [2025-01-03 22:52:58,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 94076928. Throughput: 0: 3870.8. Samples: 12690340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:52:58,968][134211] Avg episode reward: [(0, '5.716')] [2025-01-03 22:53:01,144][134294] Updated weights for policy 0, policy_version 22974 (0.0031) [2025-01-03 22:53:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14813.9, 300 sec: 14537.3). Total num frames: 94130176. Throughput: 0: 3710.2. Samples: 12698650. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:53:03,968][134211] Avg episode reward: [(0, '5.771')] [2025-01-03 22:53:04,968][134294] Updated weights for policy 0, policy_version 22984 (0.0029) [2025-01-03 22:53:08,373][134294] Updated weights for policy 0, policy_version 22994 (0.0028) [2025-01-03 22:53:08,968][134211] Fps is (10 sec: 11059.1, 60 sec: 14677.3, 300 sec: 14412.4). Total num frames: 94187520. Throughput: 0: 3629.0. Samples: 12715372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:53:08,969][134211] Avg episode reward: [(0, '5.701')] [2025-01-03 22:53:11,630][134294] Updated weights for policy 0, policy_version 23004 (0.0025) [2025-01-03 22:53:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14472.5, 300 sec: 14301.3). Total num frames: 94253056. Throughput: 0: 3644.4. Samples: 12734304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:53:13,968][134211] Avg episode reward: [(0, '6.340')] [2025-01-03 22:53:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023011_94253056.pth... [2025-01-03 22:53:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022173_90820608.pth [2025-01-03 22:53:14,811][134294] Updated weights for policy 0, policy_version 23014 (0.0026) [2025-01-03 22:53:17,854][134294] Updated weights for policy 0, policy_version 23024 (0.0024) [2025-01-03 22:53:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.3, 300 sec: 14301.3). Total num frames: 94318592. Throughput: 0: 3649.3. Samples: 12744060. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:18,968][134211] Avg episode reward: [(0, '6.341')] [2025-01-03 22:53:20,845][134294] Updated weights for policy 0, policy_version 23034 (0.0026) [2025-01-03 22:53:23,798][134294] Updated weights for policy 0, policy_version 23044 (0.0025) [2025-01-03 22:53:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14315.2). Total num frames: 94388224. Throughput: 0: 3656.9. Samples: 12764708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:23,968][134211] Avg episode reward: [(0, '6.009')] [2025-01-03 22:53:26,827][134294] Updated weights for policy 0, policy_version 23054 (0.0026) [2025-01-03 22:53:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14315.2). Total num frames: 94453760. Throughput: 0: 3661.9. Samples: 12784818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:28,968][134211] Avg episode reward: [(0, '5.934')] [2025-01-03 22:53:29,966][134294] Updated weights for policy 0, policy_version 23064 (0.0025) [2025-01-03 22:53:32,914][134294] Updated weights for policy 0, policy_version 23074 (0.0024) [2025-01-03 22:53:33,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 94523392. Throughput: 0: 3623.6. Samples: 12794904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:33,968][134211] Avg episode reward: [(0, '5.975')] [2025-01-03 22:53:35,952][134294] Updated weights for policy 0, policy_version 23084 (0.0025) [2025-01-03 22:53:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 94593024. Throughput: 0: 3369.2. Samples: 12815782. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:53:38,968][134211] Avg episode reward: [(0, '5.398')] [2025-01-03 22:53:38,971][134294] Updated weights for policy 0, policy_version 23094 (0.0025) [2025-01-03 22:53:42,000][134294] Updated weights for policy 0, policy_version 23104 (0.0028) [2025-01-03 22:53:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14315.2). Total num frames: 94658560. Throughput: 0: 3227.7. Samples: 12835586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:53:43,968][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 22:53:45,007][134294] Updated weights for policy 0, policy_version 23114 (0.0026) [2025-01-03 22:53:48,007][134294] Updated weights for policy 0, policy_version 23124 (0.0027) [2025-01-03 22:53:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13926.3, 300 sec: 14315.2). Total num frames: 94728192. Throughput: 0: 3274.0. Samples: 12845980. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:53:48,968][134211] Avg episode reward: [(0, '6.246')] [2025-01-03 22:53:50,925][134294] Updated weights for policy 0, policy_version 23134 (0.0024) [2025-01-03 22:53:53,861][134294] Updated weights for policy 0, policy_version 23144 (0.0024) [2025-01-03 22:53:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13311.9, 300 sec: 14342.9). Total num frames: 94797824. Throughput: 0: 3366.0. Samples: 12866840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:53,968][134211] Avg episode reward: [(0, '5.458')] [2025-01-03 22:53:56,144][134294] Updated weights for policy 0, policy_version 23154 (0.0017) [2025-01-03 22:53:58,014][134294] Updated weights for policy 0, policy_version 23164 (0.0012) [2025-01-03 22:53:58,968][134211] Fps is (10 sec: 16794.1, 60 sec: 13653.4, 300 sec: 14426.3). Total num frames: 94896128. Throughput: 0: 3550.4. Samples: 12894072. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:53:58,968][134211] Avg episode reward: [(0, '5.610')] [2025-01-03 22:53:59,935][134294] Updated weights for policy 0, policy_version 23174 (0.0014) [2025-01-03 22:54:01,790][134294] Updated weights for policy 0, policy_version 23184 (0.0012) [2025-01-03 22:54:03,968][134211] Fps is (10 sec: 20070.4, 60 sec: 14472.5, 300 sec: 14412.4). Total num frames: 94998528. Throughput: 0: 3692.1. Samples: 12910206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:54:03,968][134211] Avg episode reward: [(0, '5.824')] [2025-01-03 22:54:04,230][134294] Updated weights for policy 0, policy_version 23194 (0.0019) [2025-01-03 22:54:07,413][134294] Updated weights for policy 0, policy_version 23204 (0.0028) [2025-01-03 22:54:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14540.8, 300 sec: 14370.7). Total num frames: 95059968. Throughput: 0: 3745.5. Samples: 12933258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:54:08,968][134211] Avg episode reward: [(0, '6.201')] [2025-01-03 22:54:10,619][134294] Updated weights for policy 0, policy_version 23214 (0.0028) [2025-01-03 22:54:13,711][134294] Updated weights for policy 0, policy_version 23224 (0.0025) [2025-01-03 22:54:13,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14540.8, 300 sec: 14370.7). Total num frames: 95125504. Throughput: 0: 3736.6. Samples: 12952964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:54:13,968][134211] Avg episode reward: [(0, '6.332')] [2025-01-03 22:54:16,718][134294] Updated weights for policy 0, policy_version 23234 (0.0022) [2025-01-03 22:54:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 95195136. Throughput: 0: 3736.2. Samples: 12963032. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:54:18,968][134211] Avg episode reward: [(0, '6.301')] [2025-01-03 22:54:19,884][134294] Updated weights for policy 0, policy_version 23244 (0.0029) [2025-01-03 22:54:22,795][134294] Updated weights for policy 0, policy_version 23254 (0.0026) [2025-01-03 22:54:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 95260672. Throughput: 0: 3716.7. Samples: 12983036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:54:23,969][134211] Avg episode reward: [(0, '6.703')] [2025-01-03 22:54:25,818][134294] Updated weights for policy 0, policy_version 23264 (0.0023) [2025-01-03 22:54:28,716][134294] Updated weights for policy 0, policy_version 23274 (0.0025) [2025-01-03 22:54:28,968][134211] Fps is (10 sec: 13516.1, 60 sec: 14608.9, 300 sec: 14384.6). Total num frames: 95330304. Throughput: 0: 3739.9. Samples: 13003884. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:54:28,969][134211] Avg episode reward: [(0, '6.728')] [2025-01-03 22:54:31,831][134294] Updated weights for policy 0, policy_version 23284 (0.0023) [2025-01-03 22:54:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 95395840. Throughput: 0: 3731.4. Samples: 13013892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:54:33,968][134211] Avg episode reward: [(0, '6.615')] [2025-01-03 22:54:34,976][134294] Updated weights for policy 0, policy_version 23294 (0.0024) [2025-01-03 22:54:37,840][134294] Updated weights for policy 0, policy_version 23304 (0.0022) [2025-01-03 22:54:38,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 95465472. Throughput: 0: 3715.5. Samples: 13034038. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:54:38,968][134211] Avg episode reward: [(0, '6.054')] [2025-01-03 22:54:40,946][134294] Updated weights for policy 0, policy_version 23314 (0.0026) [2025-01-03 22:54:43,850][134294] Updated weights for policy 0, policy_version 23324 (0.0023) [2025-01-03 22:54:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 95535104. Throughput: 0: 3569.4. Samples: 13054698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:54:43,968][134211] Avg episode reward: [(0, '6.494')] [2025-01-03 22:54:46,739][134294] Updated weights for policy 0, policy_version 23334 (0.0025) [2025-01-03 22:54:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 95604736. Throughput: 0: 3441.9. Samples: 13065092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 22:54:48,968][134211] Avg episode reward: [(0, '5.902')] [2025-01-03 22:54:49,836][134294] Updated weights for policy 0, policy_version 23344 (0.0028) [2025-01-03 22:54:51,793][134294] Updated weights for policy 0, policy_version 23354 (0.0013) [2025-01-03 22:54:53,659][134294] Updated weights for policy 0, policy_version 23364 (0.0013) [2025-01-03 22:54:53,967][134211] Fps is (10 sec: 16794.1, 60 sec: 15087.0, 300 sec: 14454.0). Total num frames: 95703040. Throughput: 0: 3478.5. Samples: 13089788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:54:53,968][134211] Avg episode reward: [(0, '6.450')] [2025-01-03 22:54:55,552][134294] Updated weights for policy 0, policy_version 23374 (0.0012) [2025-01-03 22:54:57,408][134294] Updated weights for policy 0, policy_version 23384 (0.0012) [2025-01-03 22:54:58,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15223.5, 300 sec: 14579.0). Total num frames: 95809536. Throughput: 0: 3760.5. Samples: 13122184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:54:58,968][134211] Avg episode reward: [(0, '6.176')] [2025-01-03 22:54:59,559][134294] Updated weights for policy 0, policy_version 23394 (0.0015) [2025-01-03 22:55:02,945][134294] Updated weights for policy 0, policy_version 23404 (0.0030) [2025-01-03 22:55:03,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14540.8, 300 sec: 14551.2). Total num frames: 95870976. Throughput: 0: 3800.1. Samples: 13134034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 22:55:03,968][134211] Avg episode reward: [(0, '5.677')] [2025-01-03 22:55:06,501][134294] Updated weights for policy 0, policy_version 23414 (0.0027) [2025-01-03 22:55:08,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14540.8, 300 sec: 14523.4). Total num frames: 95932416. Throughput: 0: 3728.8. Samples: 13150832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:08,968][134211] Avg episode reward: [(0, '5.782')] [2025-01-03 22:55:09,894][134294] Updated weights for policy 0, policy_version 23424 (0.0027) [2025-01-03 22:55:13,001][134294] Updated weights for policy 0, policy_version 23434 (0.0026) [2025-01-03 22:55:13,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14540.7, 300 sec: 14454.0). Total num frames: 95997952. Throughput: 0: 3695.2. Samples: 13170170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:13,969][134211] Avg episode reward: [(0, '5.403')] [2025-01-03 22:55:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023437_95997952.pth... [2025-01-03 22:55:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022588_92520448.pth [2025-01-03 22:55:16,105][134294] Updated weights for policy 0, policy_version 23444 (0.0025) [2025-01-03 22:55:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 96063488. 
Throughput: 0: 3695.3. Samples: 13180178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:18,968][134211] Avg episode reward: [(0, '5.296')] [2025-01-03 22:55:19,014][134294] Updated weights for policy 0, policy_version 23454 (0.0026) [2025-01-03 22:55:22,045][134294] Updated weights for policy 0, policy_version 23464 (0.0022) [2025-01-03 22:55:23,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 96133120. Throughput: 0: 3706.5. Samples: 13200832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:23,968][134211] Avg episode reward: [(0, '5.557')] [2025-01-03 22:55:25,010][134294] Updated weights for policy 0, policy_version 23474 (0.0023) [2025-01-03 22:55:28,009][134294] Updated weights for policy 0, policy_version 23484 (0.0026) [2025-01-03 22:55:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.9, 300 sec: 14467.9). Total num frames: 96202752. Throughput: 0: 3706.4. Samples: 13221486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:28,968][134211] Avg episode reward: [(0, '5.867')] [2025-01-03 22:55:30,905][134294] Updated weights for policy 0, policy_version 23494 (0.0026) [2025-01-03 22:55:33,835][134294] Updated weights for policy 0, policy_version 23504 (0.0025) [2025-01-03 22:55:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14454.0). Total num frames: 96272384. Throughput: 0: 3705.0. Samples: 13231818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:33,968][134211] Avg episode reward: [(0, '5.814')] [2025-01-03 22:55:36,802][134294] Updated weights for policy 0, policy_version 23514 (0.0022) [2025-01-03 22:55:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14301.3). Total num frames: 96337920. Throughput: 0: 3620.0. Samples: 13252688. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:38,973][134211] Avg episode reward: [(0, '5.707')] [2025-01-03 22:55:40,040][134294] Updated weights for policy 0, policy_version 23524 (0.0026) [2025-01-03 22:55:43,068][134294] Updated weights for policy 0, policy_version 23534 (0.0026) [2025-01-03 22:55:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.6, 300 sec: 14190.2). Total num frames: 96403456. Throughput: 0: 3335.3. Samples: 13272272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:43,968][134211] Avg episode reward: [(0, '6.330')] [2025-01-03 22:55:45,387][134294] Updated weights for policy 0, policy_version 23544 (0.0015) [2025-01-03 22:55:47,210][134294] Updated weights for policy 0, policy_version 23554 (0.0013) [2025-01-03 22:55:48,968][134211] Fps is (10 sec: 17613.1, 60 sec: 15155.2, 300 sec: 14343.0). Total num frames: 96514048. Throughput: 0: 3395.0. Samples: 13286808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:48,968][134211] Avg episode reward: [(0, '5.823')] [2025-01-03 22:55:49,124][134294] Updated weights for policy 0, policy_version 23564 (0.0012) [2025-01-03 22:55:51,004][134294] Updated weights for policy 0, policy_version 23574 (0.0013) [2025-01-03 22:55:53,358][134294] Updated weights for policy 0, policy_version 23584 (0.0019) [2025-01-03 22:55:53,968][134211] Fps is (10 sec: 20070.3, 60 sec: 15018.6, 300 sec: 14454.0). Total num frames: 96604160. Throughput: 0: 3734.6. Samples: 13318888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:55:53,968][134211] Avg episode reward: [(0, '5.963')] [2025-01-03 22:55:56,707][134294] Updated weights for policy 0, policy_version 23594 (0.0029) [2025-01-03 22:55:58,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 96669696. Throughput: 0: 3729.7. Samples: 13338004. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:55:58,968][134211] Avg episode reward: [(0, '6.147')] [2025-01-03 22:55:59,917][134294] Updated weights for policy 0, policy_version 23604 (0.0025) [2025-01-03 22:56:03,105][134294] Updated weights for policy 0, policy_version 23614 (0.0028) [2025-01-03 22:56:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 96731136. Throughput: 0: 3720.9. Samples: 13347620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:56:03,968][134211] Avg episode reward: [(0, '6.143')] [2025-01-03 22:56:06,232][134294] Updated weights for policy 0, policy_version 23624 (0.0026) [2025-01-03 22:56:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 14495.7). Total num frames: 96796672. Throughput: 0: 3700.7. Samples: 13367364. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:56:08,968][134211] Avg episode reward: [(0, '5.938')] [2025-01-03 22:56:09,353][134294] Updated weights for policy 0, policy_version 23634 (0.0026) [2025-01-03 22:56:12,462][134294] Updated weights for policy 0, policy_version 23644 (0.0031) [2025-01-03 22:56:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.7, 300 sec: 14454.0). Total num frames: 96866304. Throughput: 0: 3683.3. Samples: 13387236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:13,969][134211] Avg episode reward: [(0, '5.538')] [2025-01-03 22:56:15,374][134294] Updated weights for policy 0, policy_version 23654 (0.0023) [2025-01-03 22:56:18,356][134294] Updated weights for policy 0, policy_version 23664 (0.0024) [2025-01-03 22:56:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 96935936. Throughput: 0: 3686.2. Samples: 13397698. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:18,968][134211] Avg episode reward: [(0, '5.212')] [2025-01-03 22:56:21,334][134294] Updated weights for policy 0, policy_version 23674 (0.0025) [2025-01-03 22:56:23,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 97001472. Throughput: 0: 3678.9. Samples: 13418238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:23,969][134211] Avg episode reward: [(0, '5.833')] [2025-01-03 22:56:24,410][134294] Updated weights for policy 0, policy_version 23684 (0.0025) [2025-01-03 22:56:27,380][134294] Updated weights for policy 0, policy_version 23694 (0.0026) [2025-01-03 22:56:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14454.0). Total num frames: 97071104. Throughput: 0: 3696.0. Samples: 13438592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:56:28,968][134211] Avg episode reward: [(0, '5.490')] [2025-01-03 22:56:30,384][134294] Updated weights for policy 0, policy_version 23704 (0.0028) [2025-01-03 22:56:33,381][134294] Updated weights for policy 0, policy_version 23714 (0.0025) [2025-01-03 22:56:33,969][134211] Fps is (10 sec: 13515.4, 60 sec: 14404.0, 300 sec: 14440.1). Total num frames: 97136640. Throughput: 0: 3599.8. Samples: 13448804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:56:33,970][134211] Avg episode reward: [(0, '5.976')] [2025-01-03 22:56:36,428][134294] Updated weights for policy 0, policy_version 23724 (0.0026) [2025-01-03 22:56:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.3, 300 sec: 14426.2). Total num frames: 97202176. Throughput: 0: 3332.7. Samples: 13468858. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:56:38,968][134211] Avg episode reward: [(0, '6.768')] [2025-01-03 22:56:39,585][134294] Updated weights for policy 0, policy_version 23734 (0.0026) [2025-01-03 22:56:41,734][134294] Updated weights for policy 0, policy_version 23744 (0.0015) [2025-01-03 22:56:43,620][134294] Updated weights for policy 0, policy_version 23754 (0.0014) [2025-01-03 22:56:43,968][134211] Fps is (10 sec: 16386.1, 60 sec: 14950.4, 300 sec: 14426.3). Total num frames: 97300480. Throughput: 0: 3489.9. Samples: 13495048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:56:43,968][134211] Avg episode reward: [(0, '6.084')] [2025-01-03 22:56:45,972][134294] Updated weights for policy 0, policy_version 23764 (0.0021) [2025-01-03 22:56:48,968][134211] Fps is (10 sec: 17202.1, 60 sec: 14335.8, 300 sec: 14481.8). Total num frames: 97374208. Throughput: 0: 3571.6. Samples: 13508342. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:48,969][134211] Avg episode reward: [(0, '6.528')] [2025-01-03 22:56:48,991][134294] Updated weights for policy 0, policy_version 23774 (0.0026) [2025-01-03 22:56:52,071][134294] Updated weights for policy 0, policy_version 23784 (0.0023) [2025-01-03 22:56:53,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13994.6, 300 sec: 14467.9). Total num frames: 97443840. Throughput: 0: 3575.2. Samples: 13528248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:53,968][134211] Avg episode reward: [(0, '5.674')] [2025-01-03 22:56:55,177][134294] Updated weights for policy 0, policy_version 23794 (0.0028) [2025-01-03 22:56:58,131][134294] Updated weights for policy 0, policy_version 23804 (0.0027) [2025-01-03 22:56:58,968][134211] Fps is (10 sec: 13517.8, 60 sec: 13994.7, 300 sec: 14467.9). Total num frames: 97509376. Throughput: 0: 3584.8. Samples: 13548550. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:56:58,968][134211] Avg episode reward: [(0, '5.866')] [2025-01-03 22:57:01,668][134294] Updated weights for policy 0, policy_version 23814 (0.0028) [2025-01-03 22:57:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14063.0, 300 sec: 14467.9). Total num frames: 97574912. Throughput: 0: 3548.4. Samples: 13557376. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:57:03,968][134211] Avg episode reward: [(0, '5.949')] [2025-01-03 22:57:04,215][134294] Updated weights for policy 0, policy_version 23824 (0.0013) [2025-01-03 22:57:06,317][134294] Updated weights for policy 0, policy_version 23834 (0.0013) [2025-01-03 22:57:08,190][134294] Updated weights for policy 0, policy_version 23844 (0.0014) [2025-01-03 22:57:08,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 97681408. Throughput: 0: 3684.2. Samples: 13584024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:57:08,968][134211] Avg episode reward: [(0, '5.643')] [2025-01-03 22:57:10,097][134294] Updated weights for policy 0, policy_version 23854 (0.0016) [2025-01-03 22:57:12,066][134294] Updated weights for policy 0, policy_version 23864 (0.0016) [2025-01-03 22:57:13,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15086.9, 300 sec: 14579.0). Total num frames: 97771520. Throughput: 0: 3898.9. Samples: 13614044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 22:57:13,969][134211] Avg episode reward: [(0, '5.752')] [2025-01-03 22:57:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023870_97771520.pth... 
[2025-01-03 22:57:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023011_94253056.pth [2025-01-03 22:57:15,198][134294] Updated weights for policy 0, policy_version 23874 (0.0028) [2025-01-03 22:57:18,277][134294] Updated weights for policy 0, policy_version 23884 (0.0029) [2025-01-03 22:57:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 97837056. Throughput: 0: 3883.0. Samples: 13623534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:57:18,968][134211] Avg episode reward: [(0, '5.546')] [2025-01-03 22:57:21,373][134294] Updated weights for policy 0, policy_version 23894 (0.0027) [2025-01-03 22:57:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 97902592. Throughput: 0: 3872.9. Samples: 13643138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:57:23,969][134211] Avg episode reward: [(0, '5.520')] [2025-01-03 22:57:24,586][134294] Updated weights for policy 0, policy_version 23904 (0.0027) [2025-01-03 22:57:27,651][134294] Updated weights for policy 0, policy_version 23914 (0.0027) [2025-01-03 22:57:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.1, 300 sec: 14579.0). Total num frames: 97964032. Throughput: 0: 3727.9. Samples: 13662802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:57:28,968][134211] Avg episode reward: [(0, '5.049')] [2025-01-03 22:57:30,715][134294] Updated weights for policy 0, policy_version 23924 (0.0025) [2025-01-03 22:57:33,581][134294] Updated weights for policy 0, policy_version 23934 (0.0023) [2025-01-03 22:57:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15019.0, 300 sec: 14606.8). Total num frames: 98037760. Throughput: 0: 3663.2. Samples: 13673182. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:33,968][134211] Avg episode reward: [(0, '5.715')] [2025-01-03 22:57:36,667][134294] Updated weights for policy 0, policy_version 23944 (0.0028) [2025-01-03 22:57:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15018.7, 300 sec: 14606.8). Total num frames: 98103296. Throughput: 0: 3674.7. Samples: 13693608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:38,968][134211] Avg episode reward: [(0, '5.964')] [2025-01-03 22:57:39,887][134294] Updated weights for policy 0, policy_version 23954 (0.0024) [2025-01-03 22:57:42,946][134294] Updated weights for policy 0, policy_version 23964 (0.0026) [2025-01-03 22:57:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14495.7). Total num frames: 98168832. Throughput: 0: 3657.8. Samples: 13713152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:43,968][134211] Avg episode reward: [(0, '5.687')] [2025-01-03 22:57:45,968][134294] Updated weights for policy 0, policy_version 23974 (0.0025) [2025-01-03 22:57:48,865][134294] Updated weights for policy 0, policy_version 23984 (0.0025) [2025-01-03 22:57:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.4, 300 sec: 14370.7). Total num frames: 98238464. Throughput: 0: 3690.4. Samples: 13723442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:48,968][134211] Avg episode reward: [(0, '5.591')] [2025-01-03 22:57:51,849][134294] Updated weights for policy 0, policy_version 23994 (0.0026) [2025-01-03 22:57:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 98304000. Throughput: 0: 3557.5. Samples: 13744112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:53,968][134211] Avg episode reward: [(0, '5.744')] [2025-01-03 22:57:54,990][134294] Updated weights for policy 0, policy_version 24004 (0.0025) [2025-01-03 22:57:57,897][134294] Updated weights for policy 0, policy_version 24014 (0.0024) [2025-01-03 22:57:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14384.6). Total num frames: 98373632. Throughput: 0: 3340.3. Samples: 13764358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:57:58,968][134211] Avg episode reward: [(0, '5.799')] [2025-01-03 22:58:00,921][134294] Updated weights for policy 0, policy_version 24024 (0.0026) [2025-01-03 22:58:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.2, 300 sec: 14412.4). Total num frames: 98439168. Throughput: 0: 3356.3. Samples: 13774566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:58:03,968][134211] Avg episode reward: [(0, '5.571')] [2025-01-03 22:58:04,161][134294] Updated weights for policy 0, policy_version 24034 (0.0028) [2025-01-03 22:58:07,218][134294] Updated weights for policy 0, policy_version 24044 (0.0023) [2025-01-03 22:58:08,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.4, 300 sec: 14454.0). Total num frames: 98516992. Throughput: 0: 3358.5. Samples: 13794270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:08,968][134211] Avg episode reward: [(0, '5.886')] [2025-01-03 22:58:09,239][134294] Updated weights for policy 0, policy_version 24054 (0.0014) [2025-01-03 22:58:11,227][134294] Updated weights for policy 0, policy_version 24064 (0.0014) [2025-01-03 22:58:13,129][134294] Updated weights for policy 0, policy_version 24074 (0.0013) [2025-01-03 22:58:13,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14199.5, 300 sec: 14592.9). Total num frames: 98623488. Throughput: 0: 3611.1. Samples: 13825300. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:13,968][134211] Avg episode reward: [(0, '5.614')] [2025-01-03 22:58:15,133][134294] Updated weights for policy 0, policy_version 24084 (0.0013) [2025-01-03 22:58:17,572][134294] Updated weights for policy 0, policy_version 24094 (0.0020) [2025-01-03 22:58:18,968][134211] Fps is (10 sec: 18430.7, 60 sec: 14404.1, 300 sec: 14620.6). Total num frames: 98701312. Throughput: 0: 3715.0. Samples: 13840360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:18,970][134211] Avg episode reward: [(0, '5.629')] [2025-01-03 22:58:20,923][134294] Updated weights for policy 0, policy_version 24104 (0.0029) [2025-01-03 22:58:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.3, 300 sec: 14620.6). Total num frames: 98766848. Throughput: 0: 3697.1. Samples: 13859978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:23,968][134211] Avg episode reward: [(0, '5.908')] [2025-01-03 22:58:24,042][134294] Updated weights for policy 0, policy_version 24114 (0.0029) [2025-01-03 22:58:27,096][134294] Updated weights for policy 0, policy_version 24124 (0.0027) [2025-01-03 22:58:28,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14472.5, 300 sec: 14606.8). Total num frames: 98832384. Throughput: 0: 3701.8. Samples: 13879732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:28,968][134211] Avg episode reward: [(0, '6.109')] [2025-01-03 22:58:30,352][134294] Updated weights for policy 0, policy_version 24134 (0.0026) [2025-01-03 22:58:33,523][134294] Updated weights for policy 0, policy_version 24144 (0.0024) [2025-01-03 22:58:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.0, 300 sec: 14592.9). Total num frames: 98897920. Throughput: 0: 3678.6. Samples: 13888980. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:58:33,969][134211] Avg episode reward: [(0, '6.018')] [2025-01-03 22:58:36,455][134294] Updated weights for policy 0, policy_version 24154 (0.0026) [2025-01-03 22:58:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14592.9). Total num frames: 98963456. Throughput: 0: 3666.4. Samples: 13909100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:58:38,968][134211] Avg episode reward: [(0, '6.338')] [2025-01-03 22:58:39,721][134294] Updated weights for policy 0, policy_version 24164 (0.0028) [2025-01-03 22:58:42,685][134294] Updated weights for policy 0, policy_version 24174 (0.0026) [2025-01-03 22:58:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.2, 300 sec: 14592.9). Total num frames: 99033088. Throughput: 0: 3659.4. Samples: 13929030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:58:43,968][134211] Avg episode reward: [(0, '6.796')] [2025-01-03 22:58:45,604][134294] Updated weights for policy 0, policy_version 24184 (0.0026) [2025-01-03 22:58:48,611][134294] Updated weights for policy 0, policy_version 24194 (0.0023) [2025-01-03 22:58:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14404.2, 300 sec: 14592.9). Total num frames: 99102720. Throughput: 0: 3664.4. Samples: 13939464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:58:48,968][134211] Avg episode reward: [(0, '5.958')] [2025-01-03 22:58:51,596][134294] Updated weights for policy 0, policy_version 24204 (0.0025) [2025-01-03 22:58:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14495.7). Total num frames: 99172352. Throughput: 0: 3686.1. Samples: 13960144. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:58:53,968][134211] Avg episode reward: [(0, '5.749')] [2025-01-03 22:58:54,646][134294] Updated weights for policy 0, policy_version 24214 (0.0024) [2025-01-03 22:58:57,551][134294] Updated weights for policy 0, policy_version 24224 (0.0027) [2025-01-03 22:58:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14370.7). Total num frames: 99237888. Throughput: 0: 3444.3. Samples: 13980294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:58:58,968][134211] Avg episode reward: [(0, '6.415')] [2025-01-03 22:59:00,963][134294] Updated weights for policy 0, policy_version 24234 (0.0028) [2025-01-03 22:59:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14267.8, 300 sec: 14356.8). Total num frames: 99295232. Throughput: 0: 3309.8. Samples: 13989298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:59:03,968][134211] Avg episode reward: [(0, '5.601')] [2025-01-03 22:59:04,271][134294] Updated weights for policy 0, policy_version 24244 (0.0024) [2025-01-03 22:59:06,323][134294] Updated weights for policy 0, policy_version 24254 (0.0012) [2025-01-03 22:59:08,221][134294] Updated weights for policy 0, policy_version 24264 (0.0012) [2025-01-03 22:59:08,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14677.4, 300 sec: 14481.8). Total num frames: 99397632. Throughput: 0: 3417.3. Samples: 14013758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 22:59:08,968][134211] Avg episode reward: [(0, '5.953')] [2025-01-03 22:59:10,159][134294] Updated weights for policy 0, policy_version 24274 (0.0013) [2025-01-03 22:59:12,314][134294] Updated weights for policy 0, policy_version 24284 (0.0015) [2025-01-03 22:59:13,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14404.2, 300 sec: 14551.2). Total num frames: 99487744. Throughput: 0: 3623.5. Samples: 14042792. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:59:13,969][134211] Avg episode reward: [(0, '6.289')] [2025-01-03 22:59:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024289_99487744.pth... [2025-01-03 22:59:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023437_95997952.pth [2025-01-03 22:59:15,666][134294] Updated weights for policy 0, policy_version 24294 (0.0030) [2025-01-03 22:59:18,606][134294] Updated weights for policy 0, policy_version 24304 (0.0027) [2025-01-03 22:59:18,968][134211] Fps is (10 sec: 15154.5, 60 sec: 14131.3, 300 sec: 14537.3). Total num frames: 99549184. Throughput: 0: 3625.5. Samples: 14052130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:59:18,969][134211] Avg episode reward: [(0, '6.522')] [2025-01-03 22:59:21,811][134294] Updated weights for policy 0, policy_version 24314 (0.0027) [2025-01-03 22:59:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.2, 300 sec: 14523.5). Total num frames: 99614720. Throughput: 0: 3617.3. Samples: 14071878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:59:23,968][134211] Avg episode reward: [(0, '6.618')] [2025-01-03 22:59:24,929][134294] Updated weights for policy 0, policy_version 24324 (0.0026) [2025-01-03 22:59:27,976][134294] Updated weights for policy 0, policy_version 24334 (0.0026) [2025-01-03 22:59:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.4, 300 sec: 14537.3). Total num frames: 99684352. Throughput: 0: 3620.9. Samples: 14091968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:28,969][134211] Avg episode reward: [(0, '5.601')] [2025-01-03 22:59:30,985][134294] Updated weights for policy 0, policy_version 24344 (0.0024) [2025-01-03 22:59:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 14523.4). Total num frames: 99749888. 
Throughput: 0: 3615.4. Samples: 14102158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:33,968][134211] Avg episode reward: [(0, '5.771')] [2025-01-03 22:59:34,029][134294] Updated weights for policy 0, policy_version 24354 (0.0027) [2025-01-03 22:59:36,882][134294] Updated weights for policy 0, policy_version 24364 (0.0024) [2025-01-03 22:59:38,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14267.8, 300 sec: 14523.5). Total num frames: 99819520. Throughput: 0: 3616.4. Samples: 14122882. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:38,968][134211] Avg episode reward: [(0, '6.191')] [2025-01-03 22:59:40,087][134294] Updated weights for policy 0, policy_version 24374 (0.0025) [2025-01-03 22:59:43,102][134294] Updated weights for policy 0, policy_version 24384 (0.0026) [2025-01-03 22:59:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14509.6). Total num frames: 99885056. Throughput: 0: 3606.8. Samples: 14142598. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:43,968][134211] Avg episode reward: [(0, '5.675')] [2025-01-03 22:59:46,121][134294] Updated weights for policy 0, policy_version 24394 (0.0026) [2025-01-03 22:59:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.5, 300 sec: 14412.4). Total num frames: 99954688. Throughput: 0: 3632.5. Samples: 14152760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:48,968][134211] Avg episode reward: [(0, '6.237')] [2025-01-03 22:59:49,264][134294] Updated weights for policy 0, policy_version 24404 (0.0026) [2025-01-03 22:59:52,232][134294] Updated weights for policy 0, policy_version 24414 (0.0026) [2025-01-03 22:59:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14131.2, 300 sec: 14273.5). Total num frames: 100020224. Throughput: 0: 3537.6. Samples: 14172952. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 22:59:53,968][134211] Avg episode reward: [(0, '5.371')] [2025-01-03 22:59:54,900][134294] Updated weights for policy 0, policy_version 24424 (0.0021) [2025-01-03 22:59:56,718][134294] Updated weights for policy 0, policy_version 24434 (0.0013) [2025-01-03 22:59:58,698][134294] Updated weights for policy 0, policy_version 24444 (0.0016) [2025-01-03 22:59:58,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 100122624. Throughput: 0: 3526.3. Samples: 14201474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 22:59:58,968][134211] Avg episode reward: [(0, '4.976')] [2025-01-03 23:00:01,680][134294] Updated weights for policy 0, policy_version 24454 (0.0028) [2025-01-03 23:00:03,969][134211] Fps is (10 sec: 16792.0, 60 sec: 14881.9, 300 sec: 14426.2). Total num frames: 100188160. Throughput: 0: 3561.6. Samples: 14212406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:03,969][134211] Avg episode reward: [(0, '4.827')] [2025-01-03 23:00:04,945][134294] Updated weights for policy 0, policy_version 24464 (0.0029) [2025-01-03 23:00:08,072][134294] Updated weights for policy 0, policy_version 24474 (0.0028) [2025-01-03 23:00:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.7, 300 sec: 14426.3). Total num frames: 100253696. Throughput: 0: 3553.7. Samples: 14231796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:08,968][134211] Avg episode reward: [(0, '4.805')] [2025-01-03 23:00:10,991][134294] Updated weights for policy 0, policy_version 24484 (0.0023) [2025-01-03 23:00:13,968][134211] Fps is (10 sec: 13518.0, 60 sec: 13926.4, 300 sec: 14440.1). Total num frames: 100323328. Throughput: 0: 3550.9. Samples: 14251760. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:13,969][134211] Avg episode reward: [(0, '4.972')] [2025-01-03 23:00:14,182][134294] Updated weights for policy 0, policy_version 24494 (0.0027) [2025-01-03 23:00:17,244][134294] Updated weights for policy 0, policy_version 24504 (0.0025) [2025-01-03 23:00:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 14426.2). Total num frames: 100388864. Throughput: 0: 3547.9. Samples: 14261816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:18,968][134211] Avg episode reward: [(0, '5.041')] [2025-01-03 23:00:20,246][134294] Updated weights for policy 0, policy_version 24514 (0.0027) [2025-01-03 23:00:22,241][134294] Updated weights for policy 0, policy_version 24524 (0.0013) [2025-01-03 23:00:23,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14540.9, 300 sec: 14523.5). Total num frames: 100487168. Throughput: 0: 3612.1. Samples: 14285424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:23,968][134211] Avg episode reward: [(0, '5.973')] [2025-01-03 23:00:24,129][134294] Updated weights for policy 0, policy_version 24534 (0.0014) [2025-01-03 23:00:26,637][134294] Updated weights for policy 0, policy_version 24544 (0.0021) [2025-01-03 23:00:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14609.1, 300 sec: 14537.3). Total num frames: 100560896. Throughput: 0: 3753.0. Samples: 14311482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:28,968][134211] Avg episode reward: [(0, '5.885')] [2025-01-03 23:00:29,817][134294] Updated weights for policy 0, policy_version 24554 (0.0026) [2025-01-03 23:00:32,959][134294] Updated weights for policy 0, policy_version 24564 (0.0027) [2025-01-03 23:00:33,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14609.0, 300 sec: 14537.3). Total num frames: 100626432. Throughput: 0: 3737.8. Samples: 14320962. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:00:33,969][134211] Avg episode reward: [(0, '6.135')] [2025-01-03 23:00:35,947][134294] Updated weights for policy 0, policy_version 24574 (0.0019) [2025-01-03 23:00:38,936][134294] Updated weights for policy 0, policy_version 24584 (0.0024) [2025-01-03 23:00:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14551.2). Total num frames: 100696064. Throughput: 0: 3742.8. Samples: 14341376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:00:38,968][134211] Avg episode reward: [(0, '6.165')] [2025-01-03 23:00:41,890][134294] Updated weights for policy 0, policy_version 24594 (0.0025) [2025-01-03 23:00:43,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 100761600. Throughput: 0: 3558.9. Samples: 14361624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:00:43,968][134211] Avg episode reward: [(0, '5.841')] [2025-01-03 23:00:44,958][134294] Updated weights for policy 0, policy_version 24604 (0.0027) [2025-01-03 23:00:47,976][134294] Updated weights for policy 0, policy_version 24614 (0.0027) [2025-01-03 23:00:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14329.1). Total num frames: 100831232. Throughput: 0: 3538.9. Samples: 14371652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:48,968][134211] Avg episode reward: [(0, '6.158')] [2025-01-03 23:00:51,006][134294] Updated weights for policy 0, policy_version 24624 (0.0025) [2025-01-03 23:00:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14609.0, 300 sec: 14329.1). Total num frames: 100896768. Throughput: 0: 3552.3. Samples: 14391648. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:53,969][134211] Avg episode reward: [(0, '5.902')] [2025-01-03 23:00:54,240][134294] Updated weights for policy 0, policy_version 24634 (0.0029) [2025-01-03 23:00:56,524][134294] Updated weights for policy 0, policy_version 24644 (0.0015) [2025-01-03 23:00:58,418][134294] Updated weights for policy 0, policy_version 24654 (0.0013) [2025-01-03 23:00:58,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 100990976. Throughput: 0: 3686.5. Samples: 14417650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:00:58,968][134211] Avg episode reward: [(0, '5.727')] [2025-01-03 23:01:00,590][134294] Updated weights for policy 0, policy_version 24664 (0.0015) [2025-01-03 23:01:02,636][134294] Updated weights for policy 0, policy_version 24674 (0.0016) [2025-01-03 23:01:03,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14882.4, 300 sec: 14523.4). Total num frames: 101081088. Throughput: 0: 3788.8. Samples: 14432310. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:03,968][134211] Avg episode reward: [(0, '6.249')] [2025-01-03 23:01:06,035][134294] Updated weights for policy 0, policy_version 24684 (0.0028) [2025-01-03 23:01:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14745.6, 300 sec: 14481.8). Total num frames: 101138432. Throughput: 0: 3728.8. Samples: 14453220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:01:08,968][134211] Avg episode reward: [(0, '5.925')] [2025-01-03 23:01:09,441][134294] Updated weights for policy 0, policy_version 24694 (0.0030) [2025-01-03 23:01:12,473][134294] Updated weights for policy 0, policy_version 24704 (0.0031) [2025-01-03 23:01:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14677.3, 300 sec: 14467.9). Total num frames: 101203968. Throughput: 0: 3572.0. Samples: 14472220. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:01:13,968][134211] Avg episode reward: [(0, '5.956')] [2025-01-03 23:01:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024708_101203968.pth... [2025-01-03 23:01:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023870_97771520.pth [2025-01-03 23:01:15,644][134294] Updated weights for policy 0, policy_version 24714 (0.0028) [2025-01-03 23:01:18,622][134294] Updated weights for policy 0, policy_version 24724 (0.0026) [2025-01-03 23:01:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14481.8). Total num frames: 101273600. Throughput: 0: 3583.3. Samples: 14482210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:01:18,968][134211] Avg episode reward: [(0, '5.819')] [2025-01-03 23:01:21,652][134294] Updated weights for policy 0, policy_version 24734 (0.0029) [2025-01-03 23:01:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 14467.9). Total num frames: 101339136. Throughput: 0: 3582.7. Samples: 14502598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:23,968][134211] Avg episode reward: [(0, '5.301')] [2025-01-03 23:01:24,765][134294] Updated weights for policy 0, policy_version 24744 (0.0026) [2025-01-03 23:01:27,872][134294] Updated weights for policy 0, policy_version 24754 (0.0027) [2025-01-03 23:01:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 14468.0). Total num frames: 101404672. Throughput: 0: 3568.7. Samples: 14522218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:28,968][134211] Avg episode reward: [(0, '5.532')] [2025-01-03 23:01:30,912][134294] Updated weights for policy 0, policy_version 24764 (0.0026) [2025-01-03 23:01:33,840][134294] Updated weights for policy 0, policy_version 24774 (0.0024) [2025-01-03 23:01:33,970][134211] Fps is (10 sec: 13513.8, 60 sec: 14130.7, 300 sec: 14481.7). Total num frames: 101474304. Throughput: 0: 3576.9. Samples: 14532622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:33,971][134211] Avg episode reward: [(0, '5.596')] [2025-01-03 23:01:37,027][134294] Updated weights for policy 0, policy_version 24784 (0.0027) [2025-01-03 23:01:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14370.7). Total num frames: 101539840. Throughput: 0: 3580.3. Samples: 14552760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:38,968][134211] Avg episode reward: [(0, '6.257')] [2025-01-03 23:01:40,146][134294] Updated weights for policy 0, policy_version 24794 (0.0027) [2025-01-03 23:01:41,984][134294] Updated weights for policy 0, policy_version 24804 (0.0011) [2025-01-03 23:01:43,930][134294] Updated weights for policy 0, policy_version 24814 (0.0013) [2025-01-03 23:01:43,968][134211] Fps is (10 sec: 16388.0, 60 sec: 14609.1, 300 sec: 14454.1). Total num frames: 101638144. Throughput: 0: 3579.4. Samples: 14578724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:43,968][134211] Avg episode reward: [(0, '5.703')] [2025-01-03 23:01:45,802][134294] Updated weights for policy 0, policy_version 24824 (0.0016) [2025-01-03 23:01:47,724][134294] Updated weights for policy 0, policy_version 24834 (0.0012) [2025-01-03 23:01:48,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15086.9, 300 sec: 14551.2). Total num frames: 101736448. Throughput: 0: 3615.1. Samples: 14594992. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:48,968][134211] Avg episode reward: [(0, '5.620')] [2025-01-03 23:01:50,557][134294] Updated weights for policy 0, policy_version 24844 (0.0025) [2025-01-03 23:01:53,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15018.7, 300 sec: 14537.3). Total num frames: 101797888. Throughput: 0: 3673.7. Samples: 14618536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:01:53,968][134211] Avg episode reward: [(0, '5.711')] [2025-01-03 23:01:54,016][134294] Updated weights for policy 0, policy_version 24854 (0.0029) [2025-01-03 23:01:57,449][134294] Updated weights for policy 0, policy_version 24864 (0.0030) [2025-01-03 23:01:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14472.5, 300 sec: 14523.4). Total num frames: 101859328. Throughput: 0: 3642.1. Samples: 14636114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:01:58,968][134211] Avg episode reward: [(0, '6.459')] [2025-01-03 23:02:00,630][134294] Updated weights for policy 0, policy_version 24874 (0.0028) [2025-01-03 23:02:03,819][134294] Updated weights for policy 0, policy_version 24884 (0.0025) [2025-01-03 23:02:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14062.9, 300 sec: 14384.6). Total num frames: 101924864. Throughput: 0: 3636.4. Samples: 14645850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:02:03,969][134211] Avg episode reward: [(0, '6.443')] [2025-01-03 23:02:07,194][134294] Updated weights for policy 0, policy_version 24894 (0.0026) [2025-01-03 23:02:08,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14273.5). Total num frames: 101982208. Throughput: 0: 3599.2. Samples: 14664560. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:02:08,968][134211] Avg episode reward: [(0, '6.484')] [2025-01-03 23:02:10,600][134294] Updated weights for policy 0, policy_version 24904 (0.0025) [2025-01-03 23:02:13,538][134294] Updated weights for policy 0, policy_version 24914 (0.0023) [2025-01-03 23:02:13,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14131.2, 300 sec: 14287.4). Total num frames: 102051840. Throughput: 0: 3594.1. Samples: 14683950. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:02:13,968][134211] Avg episode reward: [(0, '6.880')] [2025-01-03 23:02:15,642][134294] Updated weights for policy 0, policy_version 24924 (0.0013) [2025-01-03 23:02:18,262][134294] Updated weights for policy 0, policy_version 24934 (0.0022) [2025-01-03 23:02:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14404.3, 300 sec: 14356.8). Total num frames: 102137856. Throughput: 0: 3678.1. Samples: 14698128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:02:18,968][134211] Avg episode reward: [(0, '6.666')] [2025-01-03 23:02:21,260][134294] Updated weights for policy 0, policy_version 24944 (0.0023) [2025-01-03 23:02:23,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14404.2, 300 sec: 14370.7). Total num frames: 102203392. Throughput: 0: 3691.7. Samples: 14718886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:02:23,968][134211] Avg episode reward: [(0, '6.181')] [2025-01-03 23:02:24,317][134294] Updated weights for policy 0, policy_version 24954 (0.0028) [2025-01-03 23:02:27,473][134294] Updated weights for policy 0, policy_version 24964 (0.0024) [2025-01-03 23:02:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14342.9). Total num frames: 102268928. Throughput: 0: 3549.1. Samples: 14738432. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:02:28,968][134211] Avg episode reward: [(0, '6.432')] [2025-01-03 23:02:30,799][134294] Updated weights for policy 0, policy_version 24974 (0.0026) [2025-01-03 23:02:32,869][134294] Updated weights for policy 0, policy_version 24984 (0.0016) [2025-01-03 23:02:33,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14677.9, 300 sec: 14412.4). Total num frames: 102354944. Throughput: 0: 3396.2. Samples: 14747820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:02:33,968][134211] Avg episode reward: [(0, '5.943')] [2025-01-03 23:02:34,778][134294] Updated weights for policy 0, policy_version 24994 (0.0013) [2025-01-03 23:02:36,711][134294] Updated weights for policy 0, policy_version 25004 (0.0012) [2025-01-03 23:02:38,569][134294] Updated weights for policy 0, policy_version 25014 (0.0013) [2025-01-03 23:02:38,967][134211] Fps is (10 sec: 19251.8, 60 sec: 15360.1, 300 sec: 14551.2). Total num frames: 102461440. Throughput: 0: 3589.7. Samples: 14780070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:02:38,968][134211] Avg episode reward: [(0, '6.025')] [2025-01-03 23:02:40,542][134294] Updated weights for policy 0, policy_version 25024 (0.0015) [2025-01-03 23:02:43,284][134294] Updated weights for policy 0, policy_version 25034 (0.0025) [2025-01-03 23:02:43,968][134211] Fps is (10 sec: 19250.9, 60 sec: 15155.2, 300 sec: 14606.7). Total num frames: 102547456. Throughput: 0: 3808.9. Samples: 14807514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:02:43,968][134211] Avg episode reward: [(0, '6.520')] [2025-01-03 23:02:46,501][134294] Updated weights for policy 0, policy_version 25044 (0.0027) [2025-01-03 23:02:48,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14540.8, 300 sec: 14592.9). Total num frames: 102608896. Throughput: 0: 3812.2. Samples: 14817400. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:02:48,969][134211] Avg episode reward: [(0, '6.070')] [2025-01-03 23:02:49,684][134294] Updated weights for policy 0, policy_version 25054 (0.0025) [2025-01-03 23:02:52,850][134294] Updated weights for policy 0, policy_version 25064 (0.0027) [2025-01-03 23:02:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 102674432. Throughput: 0: 3825.3. Samples: 14836698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:02:53,968][134211] Avg episode reward: [(0, '6.923')] [2025-01-03 23:02:55,851][134294] Updated weights for policy 0, policy_version 25074 (0.0025) [2025-01-03 23:02:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14677.3, 300 sec: 14579.0). Total num frames: 102739968. Throughput: 0: 3841.4. Samples: 14856812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:02:58,968][134211] Avg episode reward: [(0, '6.700')] [2025-01-03 23:02:59,090][134294] Updated weights for policy 0, policy_version 25084 (0.0024) [2025-01-03 23:03:02,590][134294] Updated weights for policy 0, policy_version 25094 (0.0024) [2025-01-03 23:03:03,970][134211] Fps is (10 sec: 12285.4, 60 sec: 14540.3, 300 sec: 14509.4). Total num frames: 102797312. Throughput: 0: 3718.0. Samples: 14865444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:03,971][134211] Avg episode reward: [(0, '5.704')] [2025-01-03 23:03:05,963][134294] Updated weights for policy 0, policy_version 25104 (0.0030) [2025-01-03 23:03:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.3, 300 sec: 14370.7). Total num frames: 102862848. Throughput: 0: 3659.4. Samples: 14883560. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:08,968][134211] Avg episode reward: [(0, '5.935')] [2025-01-03 23:03:09,180][134294] Updated weights for policy 0, policy_version 25114 (0.0024) [2025-01-03 23:03:12,175][134294] Updated weights for policy 0, policy_version 25124 (0.0022) [2025-01-03 23:03:13,968][134211] Fps is (10 sec: 13519.5, 60 sec: 14677.3, 300 sec: 14343.0). Total num frames: 102932480. Throughput: 0: 3669.1. Samples: 14903540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:13,968][134211] Avg episode reward: [(0, '6.053')] [2025-01-03 23:03:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025130_102932480.pth... [2025-01-03 23:03:14,043][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024289_99487744.pth [2025-01-03 23:03:15,248][134294] Updated weights for policy 0, policy_version 25134 (0.0028) [2025-01-03 23:03:18,084][134294] Updated weights for policy 0, policy_version 25144 (0.0027) [2025-01-03 23:03:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14342.9). Total num frames: 102998016. Throughput: 0: 3687.5. Samples: 14913758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:03:18,968][134211] Avg episode reward: [(0, '5.840')] [2025-01-03 23:03:21,085][134294] Updated weights for policy 0, policy_version 25154 (0.0023) [2025-01-03 23:03:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14356.8). Total num frames: 103067648. Throughput: 0: 3433.1. Samples: 14934560. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:03:23,968][134211] Avg episode reward: [(0, '6.551')] [2025-01-03 23:03:24,091][134294] Updated weights for policy 0, policy_version 25164 (0.0026) [2025-01-03 23:03:27,094][134294] Updated weights for policy 0, policy_version 25174 (0.0025) [2025-01-03 23:03:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14370.7). Total num frames: 103137280. Throughput: 0: 3274.0. Samples: 14954844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:03:28,968][134211] Avg episode reward: [(0, '6.315')] [2025-01-03 23:03:30,109][134294] Updated weights for policy 0, policy_version 25184 (0.0025) [2025-01-03 23:03:32,069][134294] Updated weights for policy 0, policy_version 25194 (0.0013) [2025-01-03 23:03:33,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 103223296. Throughput: 0: 3322.8. Samples: 14966924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:03:33,968][134211] Avg episode reward: [(0, '6.327')] [2025-01-03 23:03:34,659][134294] Updated weights for policy 0, policy_version 25204 (0.0020) [2025-01-03 23:03:37,703][134294] Updated weights for policy 0, policy_version 25214 (0.0024) [2025-01-03 23:03:38,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13789.8, 300 sec: 14426.3). Total num frames: 103288832. Throughput: 0: 3419.6. Samples: 14990582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:38,968][134211] Avg episode reward: [(0, '6.184')] [2025-01-03 23:03:41,132][134294] Updated weights for policy 0, policy_version 25224 (0.0026) [2025-01-03 23:03:43,967][134211] Fps is (10 sec: 12697.9, 60 sec: 13380.3, 300 sec: 14398.5). Total num frames: 103350272. Throughput: 0: 3375.5. Samples: 15008708. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:43,968][134211] Avg episode reward: [(0, '6.767')] [2025-01-03 23:03:44,248][134294] Updated weights for policy 0, policy_version 25234 (0.0021) [2025-01-03 23:03:46,234][134294] Updated weights for policy 0, policy_version 25244 (0.0013) [2025-01-03 23:03:48,145][134294] Updated weights for policy 0, policy_version 25254 (0.0013) [2025-01-03 23:03:48,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14131.2, 300 sec: 14523.5). Total num frames: 103456768. Throughput: 0: 3500.1. Samples: 15022940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:03:48,968][134211] Avg episode reward: [(0, '5.732')] [2025-01-03 23:03:50,014][134294] Updated weights for policy 0, policy_version 25264 (0.0013) [2025-01-03 23:03:51,921][134294] Updated weights for policy 0, policy_version 25274 (0.0015) [2025-01-03 23:03:53,968][134211] Fps is (10 sec: 20888.9, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 103559168. Throughput: 0: 3815.6. Samples: 15055260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:03:53,968][134211] Avg episode reward: [(0, '7.190')] [2025-01-03 23:03:53,979][134264] Saving new best policy, reward=7.190! [2025-01-03 23:03:54,193][134294] Updated weights for policy 0, policy_version 25284 (0.0019) [2025-01-03 23:03:57,545][134294] Updated weights for policy 0, policy_version 25294 (0.0029) [2025-01-03 23:03:58,968][134211] Fps is (10 sec: 16383.1, 60 sec: 14677.2, 300 sec: 14662.3). Total num frames: 103620608. Throughput: 0: 3832.1. Samples: 15075984. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:03:58,969][134211] Avg episode reward: [(0, '6.438')] [2025-01-03 23:04:00,759][134294] Updated weights for policy 0, policy_version 25304 (0.0028) [2025-01-03 23:04:03,880][134294] Updated weights for policy 0, policy_version 25314 (0.0028) [2025-01-03 23:04:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14814.4, 300 sec: 14537.3). Total num frames: 103686144. 
Throughput: 0: 3823.7. Samples: 15085826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:04:03,968][134211] Avg episode reward: [(0, '6.823')] [2025-01-03 23:04:06,949][134294] Updated weights for policy 0, policy_version 25324 (0.0025) [2025-01-03 23:04:08,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14813.9, 300 sec: 14454.0). Total num frames: 103751680. Throughput: 0: 3806.8. Samples: 15105864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:04:08,968][134211] Avg episode reward: [(0, '6.197')] [2025-01-03 23:04:09,984][134294] Updated weights for policy 0, policy_version 25334 (0.0025) [2025-01-03 23:04:13,115][134294] Updated weights for policy 0, policy_version 25344 (0.0024) [2025-01-03 23:04:13,969][134211] Fps is (10 sec: 13105.2, 60 sec: 14745.3, 300 sec: 14467.8). Total num frames: 103817216. Throughput: 0: 3800.7. Samples: 15125880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:04:13,970][134211] Avg episode reward: [(0, '7.308')] [2025-01-03 23:04:13,978][134264] Saving new best policy, reward=7.308! [2025-01-03 23:04:16,099][134294] Updated weights for policy 0, policy_version 25354 (0.0024) [2025-01-03 23:04:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 14481.8). Total num frames: 103886848. Throughput: 0: 3754.3. Samples: 15135866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:04:18,968][134211] Avg episode reward: [(0, '6.411')] [2025-01-03 23:04:19,020][134294] Updated weights for policy 0, policy_version 25364 (0.0024) [2025-01-03 23:04:22,046][134294] Updated weights for policy 0, policy_version 25374 (0.0025) [2025-01-03 23:04:23,968][134211] Fps is (10 sec: 13928.5, 60 sec: 14813.9, 300 sec: 14481.8). Total num frames: 103956480. Throughput: 0: 3690.4. Samples: 15156648. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:04:23,968][134211] Avg episode reward: [(0, '7.063')] [2025-01-03 23:04:25,010][134294] Updated weights for policy 0, policy_version 25384 (0.0022) [2025-01-03 23:04:27,976][134294] Updated weights for policy 0, policy_version 25394 (0.0023) [2025-01-03 23:04:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14495.7). Total num frames: 104026112. Throughput: 0: 3749.1. Samples: 15177418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:04:28,968][134211] Avg episode reward: [(0, '6.603')] [2025-01-03 23:04:30,999][134294] Updated weights for policy 0, policy_version 25404 (0.0022) [2025-01-03 23:04:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.5, 300 sec: 14481.8). Total num frames: 104091648. Throughput: 0: 3656.4. Samples: 15187480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:04:33,968][134211] Avg episode reward: [(0, '6.398')] [2025-01-03 23:04:34,236][134294] Updated weights for policy 0, policy_version 25414 (0.0027) [2025-01-03 23:04:37,352][134294] Updated weights for policy 0, policy_version 25424 (0.0027) [2025-01-03 23:04:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 14467.9). Total num frames: 104153088. Throughput: 0: 3370.9. Samples: 15206952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:04:38,969][134211] Avg episode reward: [(0, '6.144')] [2025-01-03 23:04:40,498][134294] Updated weights for policy 0, policy_version 25434 (0.0024) [2025-01-03 23:04:42,458][134294] Updated weights for policy 0, policy_version 25444 (0.0013) [2025-01-03 23:04:43,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14950.3, 300 sec: 14551.2). Total num frames: 104247296. Throughput: 0: 3463.8. Samples: 15231852. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 23:04:43,968][134211] Avg episode reward: [(0, '6.937')] [2025-01-03 23:04:44,335][134294] Updated weights for policy 0, policy_version 25454 (0.0015) [2025-01-03 23:04:46,235][134294] Updated weights for policy 0, policy_version 25464 (0.0013) [2025-01-03 23:04:48,096][134294] Updated weights for policy 0, policy_version 25474 (0.0013) [2025-01-03 23:04:48,967][134211] Fps is (10 sec: 20480.4, 60 sec: 15018.7, 300 sec: 14704.0). Total num frames: 104357888. Throughput: 0: 3610.1. Samples: 15248280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 23:04:48,968][134211] Avg episode reward: [(0, '6.289')] [2025-01-03 23:04:49,952][134294] Updated weights for policy 0, policy_version 25484 (0.0014) [2025-01-03 23:04:52,622][134294] Updated weights for policy 0, policy_version 25494 (0.0022) [2025-01-03 23:04:53,968][134211] Fps is (10 sec: 19251.2, 60 sec: 14677.3, 300 sec: 14634.5). Total num frames: 104439808. Throughput: 0: 3818.1. Samples: 15277680. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 23:04:53,969][134211] Avg episode reward: [(0, '5.966')] [2025-01-03 23:04:55,922][134294] Updated weights for policy 0, policy_version 25504 (0.0029) [2025-01-03 23:04:58,969][134211] Fps is (10 sec: 14333.9, 60 sec: 14677.1, 300 sec: 14620.6). Total num frames: 104501248. Throughput: 0: 3795.6. Samples: 15296682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-03 23:04:58,970][134211] Avg episode reward: [(0, '5.872')] [2025-01-03 23:04:59,207][134294] Updated weights for policy 0, policy_version 25514 (0.0031) [2025-01-03 23:05:02,865][134294] Updated weights for policy 0, policy_version 25524 (0.0030) [2025-01-03 23:05:03,968][134211] Fps is (10 sec: 11469.0, 60 sec: 14472.5, 300 sec: 14579.0). Total num frames: 104554496. Throughput: 0: 3760.9. Samples: 15305106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:03,968][134211] Avg episode reward: [(0, '6.493')] [2025-01-03 23:05:06,320][134294] Updated weights for policy 0, policy_version 25534 (0.0025) [2025-01-03 23:05:08,970][134211] Fps is (10 sec: 11467.2, 60 sec: 14403.6, 300 sec: 14551.1). Total num frames: 104615936. Throughput: 0: 3684.6. Samples: 15322464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:08,971][134211] Avg episode reward: [(0, '6.522')] [2025-01-03 23:05:09,831][134294] Updated weights for policy 0, policy_version 25544 (0.0028) [2025-01-03 23:05:12,750][134294] Updated weights for policy 0, policy_version 25554 (0.0026) [2025-01-03 23:05:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.9, 300 sec: 14565.1). Total num frames: 104685568. Throughput: 0: 3653.8. Samples: 15341840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:13,969][134211] Avg episode reward: [(0, '6.434')] [2025-01-03 23:05:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025558_104685568.pth... [2025-01-03 23:05:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024708_101203968.pth [2025-01-03 23:05:15,795][134294] Updated weights for policy 0, policy_version 25564 (0.0026) [2025-01-03 23:05:18,722][134294] Updated weights for policy 0, policy_version 25574 (0.0025) [2025-01-03 23:05:18,968][134211] Fps is (10 sec: 13520.3, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 104751104. Throughput: 0: 3652.9. Samples: 15351862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:05:18,968][134211] Avg episode reward: [(0, '6.538')] [2025-01-03 23:05:21,672][134294] Updated weights for policy 0, policy_version 25584 (0.0027) [2025-01-03 23:05:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14440.1). Total num frames: 104820736. 
Throughput: 0: 3684.6. Samples: 15372760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:05:23,968][134211] Avg episode reward: [(0, '6.361')] [2025-01-03 23:05:24,776][134294] Updated weights for policy 0, policy_version 25594 (0.0024) [2025-01-03 23:05:26,731][134294] Updated weights for policy 0, policy_version 25604 (0.0017) [2025-01-03 23:05:28,970][134211] Fps is (10 sec: 15560.7, 60 sec: 14676.7, 300 sec: 14509.4). Total num frames: 104906752. Throughput: 0: 3681.6. Samples: 15397534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:05:28,971][134211] Avg episode reward: [(0, '6.910')] [2025-01-03 23:05:29,492][134294] Updated weights for policy 0, policy_version 25614 (0.0023) [2025-01-03 23:05:32,506][134294] Updated weights for policy 0, policy_version 25624 (0.0025) [2025-01-03 23:05:33,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 104972288. Throughput: 0: 3541.2. Samples: 15407636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:05:33,969][134211] Avg episode reward: [(0, '6.098')] [2025-01-03 23:05:35,543][134294] Updated weights for policy 0, policy_version 25634 (0.0023) [2025-01-03 23:05:38,421][134294] Updated weights for policy 0, policy_version 25644 (0.0024) [2025-01-03 23:05:38,968][134211] Fps is (10 sec: 13520.4, 60 sec: 14813.9, 300 sec: 14509.6). Total num frames: 105041920. Throughput: 0: 3347.7. Samples: 15428326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:38,968][134211] Avg episode reward: [(0, '7.184')] [2025-01-03 23:05:41,502][134294] Updated weights for policy 0, policy_version 25654 (0.0023) [2025-01-03 23:05:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.0, 300 sec: 14495.7). Total num frames: 105107456. Throughput: 0: 3357.3. Samples: 15447758. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:43,968][134211] Avg episode reward: [(0, '7.020')] [2025-01-03 23:05:44,569][134294] Updated weights for policy 0, policy_version 25664 (0.0023) [2025-01-03 23:05:46,467][134294] Updated weights for policy 0, policy_version 25674 (0.0014) [2025-01-03 23:05:48,365][134294] Updated weights for policy 0, policy_version 25684 (0.0014) [2025-01-03 23:05:48,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14267.7, 300 sec: 14634.5). Total num frames: 105213952. Throughput: 0: 3484.1. Samples: 15461890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:05:48,968][134211] Avg episode reward: [(0, '6.441')] [2025-01-03 23:05:50,240][134294] Updated weights for policy 0, policy_version 25694 (0.0013) [2025-01-03 23:05:52,136][134294] Updated weights for policy 0, policy_version 25704 (0.0015) [2025-01-03 23:05:53,968][134211] Fps is (10 sec: 21299.5, 60 sec: 14677.4, 300 sec: 14676.2). Total num frames: 105320448. Throughput: 0: 3822.7. Samples: 15494476. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-03 23:05:53,968][134211] Avg episode reward: [(0, '6.501')] [2025-01-03 23:05:54,035][134294] Updated weights for policy 0, policy_version 25714 (0.0013) [2025-01-03 23:05:56,252][134294] Updated weights for policy 0, policy_version 25724 (0.0017) [2025-01-03 23:05:58,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14950.7, 300 sec: 14634.5). Total num frames: 105398272. Throughput: 0: 3981.0. Samples: 15520984. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-03 23:05:58,968][134211] Avg episode reward: [(0, '6.371')] [2025-01-03 23:05:59,379][134294] Updated weights for policy 0, policy_version 25734 (0.0029) [2025-01-03 23:06:02,726][134294] Updated weights for policy 0, policy_version 25744 (0.0026) [2025-01-03 23:06:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14648.4). Total num frames: 105459712. Throughput: 0: 3962.4. Samples: 15530168. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-03 23:06:03,968][134211] Avg episode reward: [(0, '6.672')] [2025-01-03 23:06:05,822][134294] Updated weights for policy 0, policy_version 25754 (0.0027) [2025-01-03 23:06:08,849][134294] Updated weights for policy 0, policy_version 25764 (0.0027) [2025-01-03 23:06:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15224.1, 300 sec: 14662.3). Total num frames: 105529344. Throughput: 0: 3940.2. Samples: 15550070. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:08,968][134211] Avg episode reward: [(0, '6.360')] [2025-01-03 23:06:11,879][134294] Updated weights for policy 0, policy_version 25774 (0.0022) [2025-01-03 23:06:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15155.2, 300 sec: 14648.4). Total num frames: 105594880. Throughput: 0: 3835.7. Samples: 15570130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:13,968][134211] Avg episode reward: [(0, '6.660')] [2025-01-03 23:06:14,979][134294] Updated weights for policy 0, policy_version 25784 (0.0025) [2025-01-03 23:06:18,001][134294] Updated weights for policy 0, policy_version 25794 (0.0024) [2025-01-03 23:06:18,969][134211] Fps is (10 sec: 13515.4, 60 sec: 15223.2, 300 sec: 14662.2). Total num frames: 105664512. Throughput: 0: 3830.1. Samples: 15579994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:18,969][134211] Avg episode reward: [(0, '6.321')] [2025-01-03 23:06:20,932][134294] Updated weights for policy 0, policy_version 25804 (0.0025) [2025-01-03 23:06:23,874][134294] Updated weights for policy 0, policy_version 25814 (0.0025) [2025-01-03 23:06:23,969][134211] Fps is (10 sec: 13924.6, 60 sec: 15223.1, 300 sec: 14676.1). Total num frames: 105734144. Throughput: 0: 3832.7. Samples: 15600804. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:23,970][134211] Avg episode reward: [(0, '6.607')] [2025-01-03 23:06:26,838][134294] Updated weights for policy 0, policy_version 25824 (0.0023) [2025-01-03 23:06:28,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14882.7, 300 sec: 14662.4). Total num frames: 105799680. Throughput: 0: 3856.5. Samples: 15621300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:06:28,969][134211] Avg episode reward: [(0, '6.305')] [2025-01-03 23:06:30,227][134294] Updated weights for policy 0, policy_version 25834 (0.0028) [2025-01-03 23:06:33,475][134294] Updated weights for policy 0, policy_version 25844 (0.0024) [2025-01-03 23:06:33,970][134211] Fps is (10 sec: 12696.5, 60 sec: 14813.3, 300 sec: 14648.3). Total num frames: 105861120. Throughput: 0: 3740.0. Samples: 15630200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:06:33,971][134211] Avg episode reward: [(0, '6.205')] [2025-01-03 23:06:36,523][134294] Updated weights for policy 0, policy_version 25854 (0.0024) [2025-01-03 23:06:38,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14813.9, 300 sec: 14551.2). Total num frames: 105930752. Throughput: 0: 3459.1. Samples: 15650136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:06:38,968][134211] Avg episode reward: [(0, '6.509')] [2025-01-03 23:06:39,515][134294] Updated weights for policy 0, policy_version 25864 (0.0025) [2025-01-03 23:06:42,498][134294] Updated weights for policy 0, policy_version 25874 (0.0026) [2025-01-03 23:06:43,968][134211] Fps is (10 sec: 13519.9, 60 sec: 14813.9, 300 sec: 14440.1). Total num frames: 105996288. Throughput: 0: 3318.9. Samples: 15670334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:43,968][134211] Avg episode reward: [(0, '5.706')] [2025-01-03 23:06:45,509][134294] Updated weights for policy 0, policy_version 25884 (0.0022) [2025-01-03 23:06:48,412][134294] Updated weights for policy 0, policy_version 25894 (0.0025) [2025-01-03 23:06:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.4, 300 sec: 14467.9). Total num frames: 106065920. Throughput: 0: 3346.5. Samples: 15680762. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:48,968][134211] Avg episode reward: [(0, '5.415')] [2025-01-03 23:06:51,395][134294] Updated weights for policy 0, policy_version 25904 (0.0026) [2025-01-03 23:06:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13585.1, 300 sec: 14495.7). Total num frames: 106135552. Throughput: 0: 3364.5. Samples: 15701470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:53,968][134211] Avg episode reward: [(0, '5.035')] [2025-01-03 23:06:54,513][134294] Updated weights for policy 0, policy_version 25914 (0.0026) [2025-01-03 23:06:56,768][134294] Updated weights for policy 0, policy_version 25924 (0.0015) [2025-01-03 23:06:58,752][134294] Updated weights for policy 0, policy_version 25934 (0.0013) [2025-01-03 23:06:58,968][134211] Fps is (10 sec: 16384.3, 60 sec: 13858.2, 300 sec: 14592.9). Total num frames: 106229760. Throughput: 0: 3486.3. Samples: 15727014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:06:58,968][134211] Avg episode reward: [(0, '5.224')] [2025-01-03 23:07:01,606][134294] Updated weights for policy 0, policy_version 25944 (0.0020) [2025-01-03 23:07:03,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13858.1, 300 sec: 14606.8). Total num frames: 106291200. Throughput: 0: 3525.8. Samples: 15738652. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:07:03,969][134211] Avg episode reward: [(0, '5.705')] [2025-01-03 23:07:05,284][134294] Updated weights for policy 0, policy_version 25954 (0.0027) [2025-01-03 23:07:07,942][134294] Updated weights for policy 0, policy_version 25964 (0.0019) [2025-01-03 23:07:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14620.6). Total num frames: 106364928. Throughput: 0: 3463.9. Samples: 15756672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:07:08,968][134211] Avg episode reward: [(0, '5.679')] [2025-01-03 23:07:09,894][134294] Updated weights for policy 0, policy_version 25974 (0.0013) [2025-01-03 23:07:11,780][134294] Updated weights for policy 0, policy_version 25984 (0.0014) [2025-01-03 23:07:13,665][134294] Updated weights for policy 0, policy_version 25994 (0.0015) [2025-01-03 23:07:13,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14677.4, 300 sec: 14704.0). Total num frames: 106475520. Throughput: 0: 3716.3. Samples: 15788530. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:07:13,968][134211] Avg episode reward: [(0, '5.332')] [2025-01-03 23:07:14,006][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025996_106479616.pth... [2025-01-03 23:07:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025130_102932480.pth [2025-01-03 23:07:15,597][134294] Updated weights for policy 0, policy_version 26004 (0.0012) [2025-01-03 23:07:17,427][134294] Updated weights for policy 0, policy_version 26014 (0.0012) [2025-01-03 23:07:18,968][134211] Fps is (10 sec: 22118.3, 60 sec: 15360.3, 300 sec: 14856.7). Total num frames: 106586112. Throughput: 0: 3877.7. Samples: 15804686. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:07:18,968][134211] Avg episode reward: [(0, '6.089')] [2025-01-03 23:07:19,338][134294] Updated weights for policy 0, policy_version 26024 (0.0013) [2025-01-03 23:07:22,271][134294] Updated weights for policy 0, policy_version 26034 (0.0026) [2025-01-03 23:07:23,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15292.0, 300 sec: 14856.7). Total num frames: 106651648. Throughput: 0: 4032.4. Samples: 15831594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:07:23,969][134211] Avg episode reward: [(0, '6.066')] [2025-01-03 23:07:25,597][134294] Updated weights for policy 0, policy_version 26044 (0.0031) [2025-01-03 23:07:28,618][134294] Updated weights for policy 0, policy_version 26054 (0.0026) [2025-01-03 23:07:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15360.1, 300 sec: 14801.1). Total num frames: 106721280. Throughput: 0: 4015.6. Samples: 15851034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:07:28,968][134211] Avg episode reward: [(0, '5.890')] [2025-01-03 23:07:31,728][134294] Updated weights for policy 0, policy_version 26064 (0.0026) [2025-01-03 23:07:33,969][134211] Fps is (10 sec: 13515.5, 60 sec: 15428.6, 300 sec: 14662.2). Total num frames: 106786816. Throughput: 0: 4005.1. Samples: 15860998. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:07:33,969][134211] Avg episode reward: [(0, '6.365')] [2025-01-03 23:07:34,856][134294] Updated weights for policy 0, policy_version 26074 (0.0024) [2025-01-03 23:07:38,016][134294] Updated weights for policy 0, policy_version 26084 (0.0025) [2025-01-03 23:07:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.7, 300 sec: 14579.0). Total num frames: 106848256. Throughput: 0: 3977.8. Samples: 15880470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:07:38,968][134211] Avg episode reward: [(0, '6.395')] [2025-01-03 23:07:41,174][134294] Updated weights for policy 0, policy_version 26094 (0.0025) [2025-01-03 23:07:43,968][134211] Fps is (10 sec: 12698.9, 60 sec: 15291.7, 300 sec: 14592.9). Total num frames: 106913792. Throughput: 0: 3840.7. Samples: 15899848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:07:43,968][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 23:07:44,356][134294] Updated weights for policy 0, policy_version 26104 (0.0027) [2025-01-03 23:07:47,249][134294] Updated weights for policy 0, policy_version 26114 (0.0023) [2025-01-03 23:07:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.7, 300 sec: 14606.8). Total num frames: 106983424. Throughput: 0: 3808.1. Samples: 15910016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:07:48,968][134211] Avg episode reward: [(0, '6.213')] [2025-01-03 23:07:50,359][134294] Updated weights for policy 0, policy_version 26124 (0.0027) [2025-01-03 23:07:53,185][134294] Updated weights for policy 0, policy_version 26134 (0.0026) [2025-01-03 23:07:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15291.7, 300 sec: 14620.6). Total num frames: 107053056. Throughput: 0: 3869.6. Samples: 15930806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:07:53,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-03 23:07:56,198][134294] Updated weights for policy 0, policy_version 26144 (0.0025) [2025-01-03 23:07:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14648.5). Total num frames: 107118592. Throughput: 0: 3612.3. Samples: 15951086. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:07:58,969][134211] Avg episode reward: [(0, '7.132')] [2025-01-03 23:07:59,316][134294] Updated weights for policy 0, policy_version 26154 (0.0027) [2025-01-03 23:08:02,392][134294] Updated weights for policy 0, policy_version 26164 (0.0023) [2025-01-03 23:08:03,969][134211] Fps is (10 sec: 13514.8, 60 sec: 14950.1, 300 sec: 14662.2). Total num frames: 107188224. Throughput: 0: 3477.6. Samples: 15961184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:08:03,970][134211] Avg episode reward: [(0, '6.263')] [2025-01-03 23:08:05,442][134294] Updated weights for policy 0, policy_version 26174 (0.0027) [2025-01-03 23:08:08,328][134294] Updated weights for policy 0, policy_version 26184 (0.0021) [2025-01-03 23:08:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 107257856. Throughput: 0: 3335.4. Samples: 15981686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:08:08,968][134211] Avg episode reward: [(0, '6.578')] [2025-01-03 23:08:11,230][134294] Updated weights for policy 0, policy_version 26194 (0.0025) [2025-01-03 23:08:13,968][134211] Fps is (10 sec: 13518.6, 60 sec: 14131.1, 300 sec: 14662.3). Total num frames: 107323392. Throughput: 0: 3357.0. Samples: 16002098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:08:13,968][134211] Avg episode reward: [(0, '5.925')] [2025-01-03 23:08:14,390][134294] Updated weights for policy 0, policy_version 26204 (0.0026) [2025-01-03 23:08:17,333][134294] Updated weights for policy 0, policy_version 26214 (0.0026) [2025-01-03 23:08:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13448.5, 300 sec: 14662.3). Total num frames: 107393024. Throughput: 0: 3362.0. Samples: 16012286. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:08:18,968][134211] Avg episode reward: [(0, '6.592')] [2025-01-03 23:08:20,356][134294] Updated weights for policy 0, policy_version 26224 (0.0022) [2025-01-03 23:08:23,224][134294] Updated weights for policy 0, policy_version 26234 (0.0025) [2025-01-03 23:08:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.8, 300 sec: 14662.3). Total num frames: 107462656. Throughput: 0: 3396.0. Samples: 16033288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:08:23,969][134211] Avg episode reward: [(0, '6.391')] [2025-01-03 23:08:26,149][134294] Updated weights for policy 0, policy_version 26244 (0.0026) [2025-01-03 23:08:28,464][134294] Updated weights for policy 0, policy_version 26254 (0.0018) [2025-01-03 23:08:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13721.6, 300 sec: 14648.4). Total num frames: 107544576. Throughput: 0: 3466.9. Samples: 16055856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:08:28,968][134211] Avg episode reward: [(0, '7.402')] [2025-01-03 23:08:28,969][134264] Saving new best policy, reward=7.402! [2025-01-03 23:08:31,020][134294] Updated weights for policy 0, policy_version 26264 (0.0021) [2025-01-03 23:08:33,968][134211] Fps is (10 sec: 15155.4, 60 sec: 13790.1, 300 sec: 14662.3). Total num frames: 107614208. Throughput: 0: 3505.1. Samples: 16067746. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:08:33,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-03 23:08:34,129][134294] Updated weights for policy 0, policy_version 26274 (0.0024) [2025-01-03 23:08:37,089][134294] Updated weights for policy 0, policy_version 26284 (0.0026) [2025-01-03 23:08:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.5, 300 sec: 14690.1). Total num frames: 107683840. Throughput: 0: 3499.5. Samples: 16088284. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:08:38,968][134211] Avg episode reward: [(0, '5.747')] [2025-01-03 23:08:39,534][134294] Updated weights for policy 0, policy_version 26294 (0.0015) [2025-01-03 23:08:41,385][134294] Updated weights for policy 0, policy_version 26304 (0.0013) [2025-01-03 23:08:43,308][134294] Updated weights for policy 0, policy_version 26314 (0.0013) [2025-01-03 23:08:43,967][134211] Fps is (10 sec: 18022.8, 60 sec: 14677.4, 300 sec: 14704.0). Total num frames: 107794432. Throughput: 0: 3713.6. Samples: 16118198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:08:43,968][134211] Avg episode reward: [(0, '6.157')] [2025-01-03 23:08:45,152][134294] Updated weights for policy 0, policy_version 26324 (0.0015) [2025-01-03 23:08:47,412][134294] Updated weights for policy 0, policy_version 26334 (0.0018) [2025-01-03 23:08:48,968][134211] Fps is (10 sec: 19660.2, 60 sec: 14950.4, 300 sec: 14648.4). Total num frames: 107880448. Throughput: 0: 3850.4. Samples: 16134448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:08:48,969][134211] Avg episode reward: [(0, '6.885')] [2025-01-03 23:08:50,668][134294] Updated weights for policy 0, policy_version 26344 (0.0029) [2025-01-03 23:08:53,822][134294] Updated weights for policy 0, policy_version 26354 (0.0026) [2025-01-03 23:08:53,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 107945984. Throughput: 0: 3842.0. Samples: 16154578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:08:53,968][134211] Avg episode reward: [(0, '6.477')] [2025-01-03 23:08:56,822][134294] Updated weights for policy 0, policy_version 26364 (0.0024) [2025-01-03 23:08:58,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14882.0, 300 sec: 14662.3). Total num frames: 108011520. Throughput: 0: 3822.4. Samples: 16174106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:08:58,969][134211] Avg episode reward: [(0, '6.378')] [2025-01-03 23:09:00,261][134294] Updated weights for policy 0, policy_version 26374 (0.0026) [2025-01-03 23:09:03,883][134294] Updated weights for policy 0, policy_version 26384 (0.0028) [2025-01-03 23:09:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.7, 300 sec: 14634.5). Total num frames: 108068864. Throughput: 0: 3788.2. Samples: 16182758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:09:03,968][134211] Avg episode reward: [(0, '7.028')] [2025-01-03 23:09:07,341][134294] Updated weights for policy 0, policy_version 26394 (0.0029) [2025-01-03 23:09:08,968][134211] Fps is (10 sec: 11469.3, 60 sec: 14472.5, 300 sec: 14606.8). Total num frames: 108126208. Throughput: 0: 3707.9. Samples: 16200142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:09:08,969][134211] Avg episode reward: [(0, '5.615')] [2025-01-03 23:09:10,635][134294] Updated weights for policy 0, policy_version 26404 (0.0027) [2025-01-03 23:09:13,693][134294] Updated weights for policy 0, policy_version 26414 (0.0028) [2025-01-03 23:09:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14606.7). Total num frames: 108195840. Throughput: 0: 3635.0. Samples: 16219430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:09:13,968][134211] Avg episode reward: [(0, '6.928')] [2025-01-03 23:09:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026415_108195840.pth... 
[2025-01-03 23:09:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025558_104685568.pth [2025-01-03 23:09:16,240][134294] Updated weights for policy 0, policy_version 26424 (0.0018) [2025-01-03 23:09:18,104][134294] Updated weights for policy 0, policy_version 26434 (0.0012) [2025-01-03 23:09:18,967][134211] Fps is (10 sec: 16384.4, 60 sec: 14950.4, 300 sec: 14690.1). Total num frames: 108290048. Throughput: 0: 3630.0. Samples: 16231094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:09:18,968][134211] Avg episode reward: [(0, '6.227')] [2025-01-03 23:09:20,005][134294] Updated weights for policy 0, policy_version 26444 (0.0014) [2025-01-03 23:09:21,896][134294] Updated weights for policy 0, policy_version 26454 (0.0012) [2025-01-03 23:09:23,969][134211] Fps is (10 sec: 19249.9, 60 sec: 15428.1, 300 sec: 14787.2). Total num frames: 108388352. Throughput: 0: 3897.1. Samples: 16263658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:23,969][134211] Avg episode reward: [(0, '6.192')] [2025-01-03 23:09:24,531][134294] Updated weights for policy 0, policy_version 26464 (0.0023) [2025-01-03 23:09:27,862][134294] Updated weights for policy 0, policy_version 26474 (0.0028) [2025-01-03 23:09:28,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 108449792. Throughput: 0: 3672.7. Samples: 16283472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:28,968][134211] Avg episode reward: [(0, '7.006')] [2025-01-03 23:09:31,028][134294] Updated weights for policy 0, policy_version 26484 (0.0028) [2025-01-03 23:09:33,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15018.5, 300 sec: 14787.2). Total num frames: 108515328. Throughput: 0: 3528.8. Samples: 16293246. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:33,969][134211] Avg episode reward: [(0, '7.221')] [2025-01-03 23:09:34,183][134294] Updated weights for policy 0, policy_version 26494 (0.0025) [2025-01-03 23:09:37,320][134294] Updated weights for policy 0, policy_version 26504 (0.0027) [2025-01-03 23:09:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14950.4, 300 sec: 14690.1). Total num frames: 108580864. Throughput: 0: 3518.2. Samples: 16312896. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:38,968][134211] Avg episode reward: [(0, '6.613')] [2025-01-03 23:09:40,270][134294] Updated weights for policy 0, policy_version 26514 (0.0024) [2025-01-03 23:09:43,328][134294] Updated weights for policy 0, policy_version 26524 (0.0027) [2025-01-03 23:09:43,968][134211] Fps is (10 sec: 13108.0, 60 sec: 14199.4, 300 sec: 14537.3). Total num frames: 108646400. Throughput: 0: 3537.7. Samples: 16333300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:43,968][134211] Avg episode reward: [(0, '5.931')] [2025-01-03 23:09:46,394][134294] Updated weights for policy 0, policy_version 26534 (0.0026) [2025-01-03 23:09:48,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13926.3, 300 sec: 14495.7). Total num frames: 108716032. Throughput: 0: 3566.8. Samples: 16343264. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:48,969][134211] Avg episode reward: [(0, '6.667')] [2025-01-03 23:09:49,576][134294] Updated weights for policy 0, policy_version 26544 (0.0026) [2025-01-03 23:09:52,531][134294] Updated weights for policy 0, policy_version 26554 (0.0025) [2025-01-03 23:09:53,968][134211] Fps is (10 sec: 13516.2, 60 sec: 13926.3, 300 sec: 14509.6). Total num frames: 108781568. Throughput: 0: 3625.5. Samples: 16363290. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:09:53,969][134211] Avg episode reward: [(0, '6.328')] [2025-01-03 23:09:55,533][134294] Updated weights for policy 0, policy_version 26564 (0.0027) [2025-01-03 23:09:58,346][134294] Updated weights for policy 0, policy_version 26574 (0.0021) [2025-01-03 23:09:58,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13994.8, 300 sec: 14565.1). Total num frames: 108851200. Throughput: 0: 3662.9. Samples: 16384260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:09:58,968][134211] Avg episode reward: [(0, '6.513')] [2025-01-03 23:10:01,879][134294] Updated weights for policy 0, policy_version 26584 (0.0025) [2025-01-03 23:10:03,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13994.7, 300 sec: 14551.3). Total num frames: 108908544. Throughput: 0: 3605.9. Samples: 16393362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:10:03,968][134211] Avg episode reward: [(0, '6.755')] [2025-01-03 23:10:04,909][134294] Updated weights for policy 0, policy_version 26594 (0.0018) [2025-01-03 23:10:06,888][134294] Updated weights for policy 0, policy_version 26604 (0.0014) [2025-01-03 23:10:08,776][134294] Updated weights for policy 0, policy_version 26614 (0.0014) [2025-01-03 23:10:08,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14745.7, 300 sec: 14662.3). Total num frames: 109010944. Throughput: 0: 3416.2. Samples: 16417384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:10:08,968][134211] Avg episode reward: [(0, '7.073')] [2025-01-03 23:10:10,696][134294] Updated weights for policy 0, policy_version 26624 (0.0012) [2025-01-03 23:10:12,539][134294] Updated weights for policy 0, policy_version 26634 (0.0012) [2025-01-03 23:10:13,968][134211] Fps is (10 sec: 21299.6, 60 sec: 15428.3, 300 sec: 14815.0). Total num frames: 109121536. Throughput: 0: 3699.7. Samples: 16449956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:13,968][134211] Avg episode reward: [(0, '6.026')] [2025-01-03 23:10:14,466][134294] Updated weights for policy 0, policy_version 26644 (0.0013) [2025-01-03 23:10:16,275][134294] Updated weights for policy 0, policy_version 26654 (0.0013) [2025-01-03 23:10:18,784][134294] Updated weights for policy 0, policy_version 26664 (0.0021) [2025-01-03 23:10:18,968][134211] Fps is (10 sec: 20479.3, 60 sec: 15428.2, 300 sec: 14898.3). Total num frames: 109215744. Throughput: 0: 3846.4. Samples: 16466334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:18,969][134211] Avg episode reward: [(0, '6.032')] [2025-01-03 23:10:22,024][134294] Updated weights for policy 0, policy_version 26674 (0.0028) [2025-01-03 23:10:23,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14814.1, 300 sec: 14815.2). Total num frames: 109277184. Throughput: 0: 3885.2. Samples: 16487732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:23,968][134211] Avg episode reward: [(0, '5.793')] [2025-01-03 23:10:25,228][134294] Updated weights for policy 0, policy_version 26684 (0.0027) [2025-01-03 23:10:28,284][134294] Updated weights for policy 0, policy_version 26694 (0.0024) [2025-01-03 23:10:28,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14950.3, 300 sec: 14828.9). Total num frames: 109346816. Throughput: 0: 3871.7. Samples: 16507528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:28,969][134211] Avg episode reward: [(0, '5.844')] [2025-01-03 23:10:31,306][134294] Updated weights for policy 0, policy_version 26704 (0.0025) [2025-01-03 23:10:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.5, 300 sec: 14815.0). Total num frames: 109412352. Throughput: 0: 3876.0. Samples: 16517682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:10:33,968][134211] Avg episode reward: [(0, '5.969')] [2025-01-03 23:10:34,533][134294] Updated weights for policy 0, policy_version 26714 (0.0027) [2025-01-03 23:10:37,606][134294] Updated weights for policy 0, policy_version 26724 (0.0026) [2025-01-03 23:10:38,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14950.2, 300 sec: 14815.0). Total num frames: 109477888. Throughput: 0: 3862.2. Samples: 16537088. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:10:38,969][134211] Avg episode reward: [(0, '6.157')] [2025-01-03 23:10:40,627][134294] Updated weights for policy 0, policy_version 26734 (0.0027) [2025-01-03 23:10:43,630][134294] Updated weights for policy 0, policy_version 26744 (0.0025) [2025-01-03 23:10:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15018.6, 300 sec: 14690.0). Total num frames: 109547520. Throughput: 0: 3850.7. Samples: 16557542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:10:43,969][134211] Avg episode reward: [(0, '6.099')] [2025-01-03 23:10:46,592][134294] Updated weights for policy 0, policy_version 26754 (0.0023) [2025-01-03 23:10:48,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14950.2, 300 sec: 14551.1). Total num frames: 109613056. Throughput: 0: 3879.4. Samples: 16567940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:48,970][134211] Avg episode reward: [(0, '6.217')] [2025-01-03 23:10:49,631][134294] Updated weights for policy 0, policy_version 26764 (0.0026) [2025-01-03 23:10:52,708][134294] Updated weights for policy 0, policy_version 26774 (0.0024) [2025-01-03 23:10:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15018.8, 300 sec: 14523.5). Total num frames: 109682688. Throughput: 0: 3794.7. Samples: 16588146. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:53,968][134211] Avg episode reward: [(0, '6.295')] [2025-01-03 23:10:55,668][134294] Updated weights for policy 0, policy_version 26784 (0.0026) [2025-01-03 23:10:58,547][134294] Updated weights for policy 0, policy_version 26794 (0.0025) [2025-01-03 23:10:58,968][134211] Fps is (10 sec: 13928.2, 60 sec: 15018.7, 300 sec: 14551.2). Total num frames: 109752320. Throughput: 0: 3533.0. Samples: 16608942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:10:58,968][134211] Avg episode reward: [(0, '6.448')] [2025-01-03 23:11:01,970][134294] Updated weights for policy 0, policy_version 26804 (0.0025) [2025-01-03 23:11:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15018.7, 300 sec: 14509.6). Total num frames: 109809664. Throughput: 0: 3375.7. Samples: 16618240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:11:03,968][134211] Avg episode reward: [(0, '6.438')] [2025-01-03 23:11:05,434][134294] Updated weights for policy 0, policy_version 26814 (0.0029) [2025-01-03 23:11:08,678][134294] Updated weights for policy 0, policy_version 26824 (0.0024) [2025-01-03 23:11:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14404.3, 300 sec: 14509.6). Total num frames: 109875200. Throughput: 0: 3299.0. Samples: 16636188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:11:08,968][134211] Avg episode reward: [(0, '6.924')] [2025-01-03 23:11:10,637][134294] Updated weights for policy 0, policy_version 26834 (0.0014) [2025-01-03 23:11:12,520][134294] Updated weights for policy 0, policy_version 26844 (0.0013) [2025-01-03 23:11:13,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14336.0, 300 sec: 14634.6). Total num frames: 109981696. Throughput: 0: 3502.1. Samples: 16665122. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:11:13,968][134211] Avg episode reward: [(0, '6.326')] [2025-01-03 23:11:14,023][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026852_109985792.pth... [2025-01-03 23:11:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025996_106479616.pth [2025-01-03 23:11:14,477][134294] Updated weights for policy 0, policy_version 26854 (0.0013) [2025-01-03 23:11:17,146][134294] Updated weights for policy 0, policy_version 26864 (0.0023) [2025-01-03 23:11:18,968][134211] Fps is (10 sec: 18021.9, 60 sec: 13994.7, 300 sec: 14648.5). Total num frames: 110055424. Throughput: 0: 3576.7. Samples: 16678632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:11:18,969][134211] Avg episode reward: [(0, '6.965')] [2025-01-03 23:11:20,434][134294] Updated weights for policy 0, policy_version 26874 (0.0030) [2025-01-03 23:11:23,530][134294] Updated weights for policy 0, policy_version 26884 (0.0026) [2025-01-03 23:11:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14062.9, 300 sec: 14648.4). Total num frames: 110120960. Throughput: 0: 3578.0. Samples: 16698096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:11:23,968][134211] Avg episode reward: [(0, '6.563')] [2025-01-03 23:11:26,546][134294] Updated weights for policy 0, policy_version 26894 (0.0026) [2025-01-03 23:11:28,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.8, 300 sec: 14662.4). Total num frames: 110186496. Throughput: 0: 3562.8. Samples: 16717866. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:11:28,968][134211] Avg episode reward: [(0, '6.062')] [2025-01-03 23:11:29,702][134294] Updated weights for policy 0, policy_version 26904 (0.0025) [2025-01-03 23:11:32,747][134294] Updated weights for policy 0, policy_version 26914 (0.0027) [2025-01-03 23:11:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14648.4). Total num frames: 110252032. Throughput: 0: 3553.0. Samples: 16727820. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:11:33,968][134211] Avg episode reward: [(0, '5.851')] [2025-01-03 23:11:35,797][134294] Updated weights for policy 0, policy_version 26924 (0.0028) [2025-01-03 23:11:38,714][134294] Updated weights for policy 0, policy_version 26934 (0.0026) [2025-01-03 23:11:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14063.1, 300 sec: 14662.3). Total num frames: 110321664. Throughput: 0: 3561.5. Samples: 16748412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:11:38,968][134211] Avg episode reward: [(0, '5.866')] [2025-01-03 23:11:41,735][134294] Updated weights for policy 0, policy_version 26944 (0.0025) [2025-01-03 23:11:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.7, 300 sec: 14648.4). Total num frames: 110387200. Throughput: 0: 3537.7. Samples: 16768140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:11:43,968][134211] Avg episode reward: [(0, '6.125')] [2025-01-03 23:11:45,044][134294] Updated weights for policy 0, policy_version 26954 (0.0029) [2025-01-03 23:11:48,004][134294] Updated weights for policy 0, policy_version 26964 (0.0028) [2025-01-03 23:11:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14063.3, 300 sec: 14648.4). Total num frames: 110456832. Throughput: 0: 3550.8. Samples: 16778028. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:11:48,968][134211] Avg episode reward: [(0, '5.785')] [2025-01-03 23:11:50,977][134294] Updated weights for policy 0, policy_version 26974 (0.0026) [2025-01-03 23:11:53,086][134294] Updated weights for policy 0, policy_version 26984 (0.0013) [2025-01-03 23:11:53,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14335.9, 300 sec: 14620.6). Total num frames: 110542848. Throughput: 0: 3647.6. Samples: 16800334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:11:53,968][134211] Avg episode reward: [(0, '5.725')] [2025-01-03 23:11:55,032][134294] Updated weights for policy 0, policy_version 26994 (0.0013) [2025-01-03 23:11:57,151][134294] Updated weights for policy 0, policy_version 27004 (0.0015) [2025-01-03 23:11:58,968][134211] Fps is (10 sec: 18432.1, 60 sec: 14813.9, 300 sec: 14745.6). Total num frames: 110641152. Throughput: 0: 3673.5. Samples: 16830430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:11:58,969][134211] Avg episode reward: [(0, '6.222')] [2025-01-03 23:11:59,573][134294] Updated weights for policy 0, policy_version 27014 (0.0019) [2025-01-03 23:12:02,967][134294] Updated weights for policy 0, policy_version 27024 (0.0028) [2025-01-03 23:12:03,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14813.8, 300 sec: 14690.0). Total num frames: 110698496. Throughput: 0: 3600.6. Samples: 16840660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:03,969][134211] Avg episode reward: [(0, '6.274')] [2025-01-03 23:12:06,518][134294] Updated weights for policy 0, policy_version 27034 (0.0029) [2025-01-03 23:12:08,968][134211] Fps is (10 sec: 11878.1, 60 sec: 14745.5, 300 sec: 14523.4). Total num frames: 110759936. Throughput: 0: 3554.5. Samples: 16858048. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:08,969][134211] Avg episode reward: [(0, '6.123')] [2025-01-03 23:12:09,971][134294] Updated weights for policy 0, policy_version 27044 (0.0027) [2025-01-03 23:12:13,239][134294] Updated weights for policy 0, policy_version 27054 (0.0027) [2025-01-03 23:12:13,968][134211] Fps is (10 sec: 12288.5, 60 sec: 13994.6, 300 sec: 14356.8). Total num frames: 110821376. Throughput: 0: 3522.8. Samples: 16876394. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:13,968][134211] Avg episode reward: [(0, '5.912')] [2025-01-03 23:12:16,153][134294] Updated weights for policy 0, policy_version 27064 (0.0026) [2025-01-03 23:12:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.1, 300 sec: 14356.8). Total num frames: 110886912. Throughput: 0: 3528.1. Samples: 16886586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:12:18,968][134211] Avg episode reward: [(0, '6.398')] [2025-01-03 23:12:19,338][134294] Updated weights for policy 0, policy_version 27074 (0.0028) [2025-01-03 23:12:22,305][134294] Updated weights for policy 0, policy_version 27084 (0.0027) [2025-01-03 23:12:23,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13926.4, 300 sec: 14356.8). Total num frames: 110956544. Throughput: 0: 3515.6. Samples: 16906614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:12:23,968][134211] Avg episode reward: [(0, '6.173')] [2025-01-03 23:12:25,347][134294] Updated weights for policy 0, policy_version 27094 (0.0029) [2025-01-03 23:12:28,136][134294] Updated weights for policy 0, policy_version 27104 (0.0024) [2025-01-03 23:12:28,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14131.2, 300 sec: 14398.5). Total num frames: 111034368. Throughput: 0: 3548.6. Samples: 16927826. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:12:28,968][134211] Avg episode reward: [(0, '5.946')] [2025-01-03 23:12:30,026][134294] Updated weights for policy 0, policy_version 27114 (0.0013) [2025-01-03 23:12:31,957][134294] Updated weights for policy 0, policy_version 27124 (0.0013) [2025-01-03 23:12:33,949][134294] Updated weights for policy 0, policy_version 27134 (0.0012) [2025-01-03 23:12:33,968][134211] Fps is (10 sec: 18432.5, 60 sec: 14813.9, 300 sec: 14551.2). Total num frames: 111140864. Throughput: 0: 3686.0. Samples: 16943898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:33,968][134211] Avg episode reward: [(0, '5.858')] [2025-01-03 23:12:36,323][134294] Updated weights for policy 0, policy_version 27144 (0.0016) [2025-01-03 23:12:38,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 111210496. Throughput: 0: 3802.2. Samples: 16971432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:38,968][134211] Avg episode reward: [(0, '5.993')] [2025-01-03 23:12:39,636][134294] Updated weights for policy 0, policy_version 27154 (0.0031) [2025-01-03 23:12:42,808][134294] Updated weights for policy 0, policy_version 27164 (0.0030) [2025-01-03 23:12:43,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14813.8, 300 sec: 14551.2). Total num frames: 111276032. Throughput: 0: 3548.4. Samples: 16990110. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:43,969][134211] Avg episode reward: [(0, '6.001')] [2025-01-03 23:12:45,956][134294] Updated weights for policy 0, policy_version 27174 (0.0028) [2025-01-03 23:12:48,936][134294] Updated weights for policy 0, policy_version 27184 (0.0024) [2025-01-03 23:12:48,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14813.7, 300 sec: 14551.2). Total num frames: 111345664. Throughput: 0: 3545.0. Samples: 17000188. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:12:48,969][134211] Avg episode reward: [(0, '6.323')] [2025-01-03 23:12:52,012][134294] Updated weights for policy 0, policy_version 27194 (0.0025) [2025-01-03 23:12:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 14551.2). Total num frames: 111411200. Throughput: 0: 3609.6. Samples: 17020482. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:12:53,969][134211] Avg episode reward: [(0, '6.321')] [2025-01-03 23:12:55,136][134294] Updated weights for policy 0, policy_version 27204 (0.0022) [2025-01-03 23:12:58,028][134294] Updated weights for policy 0, policy_version 27214 (0.0021) [2025-01-03 23:12:58,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13926.4, 300 sec: 14537.4). Total num frames: 111476736. Throughput: 0: 3652.8. Samples: 17040772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:12:58,968][134211] Avg episode reward: [(0, '6.004')] [2025-01-03 23:13:01,130][134294] Updated weights for policy 0, policy_version 27224 (0.0022) [2025-01-03 23:13:03,970][134211] Fps is (10 sec: 12694.8, 60 sec: 13994.1, 300 sec: 14509.4). Total num frames: 111538176. Throughput: 0: 3643.3. Samples: 17050546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:13:03,971][134211] Avg episode reward: [(0, '6.056')] [2025-01-03 23:13:04,881][134294] Updated weights for policy 0, policy_version 27234 (0.0028) [2025-01-03 23:13:08,099][134294] Updated weights for policy 0, policy_version 27244 (0.0024) [2025-01-03 23:13:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.3, 300 sec: 14523.5). Total num frames: 111607808. Throughput: 0: 3580.5. Samples: 17067736. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:13:08,968][134211] Avg episode reward: [(0, '6.383')] [2025-01-03 23:13:10,265][134294] Updated weights for policy 0, policy_version 27254 (0.0012) [2025-01-03 23:13:12,254][134294] Updated weights for policy 0, policy_version 27264 (0.0014) [2025-01-03 23:13:13,968][134211] Fps is (10 sec: 16798.3, 60 sec: 14745.6, 300 sec: 14620.6). Total num frames: 111706112. Throughput: 0: 3744.1. Samples: 17096312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:13:13,968][134211] Avg episode reward: [(0, '6.412')] [2025-01-03 23:13:14,042][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027273_111710208.pth... [2025-01-03 23:13:14,082][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026415_108195840.pth [2025-01-03 23:13:14,235][134294] Updated weights for policy 0, policy_version 27274 (0.0013) [2025-01-03 23:13:16,986][134294] Updated weights for policy 0, policy_version 27284 (0.0023) [2025-01-03 23:13:18,968][134211] Fps is (10 sec: 17203.2, 60 sec: 14882.2, 300 sec: 14634.5). Total num frames: 111779840. Throughput: 0: 3677.6. Samples: 17109388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:13:18,968][134211] Avg episode reward: [(0, '5.884')] [2025-01-03 23:13:20,196][134294] Updated weights for policy 0, policy_version 27294 (0.0029) [2025-01-03 23:13:23,212][134294] Updated weights for policy 0, policy_version 27304 (0.0025) [2025-01-03 23:13:23,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 111845376. Throughput: 0: 3503.0. Samples: 17129066. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:13:23,968][134211] Avg episode reward: [(0, '6.004')] [2025-01-03 23:13:26,280][134294] Updated weights for policy 0, policy_version 27314 (0.0028) [2025-01-03 23:13:28,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14609.0, 300 sec: 14565.1). Total num frames: 111910912. Throughput: 0: 3526.8. Samples: 17148816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:13:28,969][134211] Avg episode reward: [(0, '6.418')] [2025-01-03 23:13:29,474][134294] Updated weights for policy 0, policy_version 27324 (0.0026) [2025-01-03 23:13:32,573][134294] Updated weights for policy 0, policy_version 27334 (0.0023) [2025-01-03 23:13:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13858.1, 300 sec: 14537.3). Total num frames: 111972352. Throughput: 0: 3523.1. Samples: 17158724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:13:33,968][134211] Avg episode reward: [(0, '5.910')] [2025-01-03 23:13:35,925][134294] Updated weights for policy 0, policy_version 27344 (0.0026) [2025-01-03 23:13:38,880][134294] Updated weights for policy 0, policy_version 27354 (0.0026) [2025-01-03 23:13:38,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13858.1, 300 sec: 14398.5). Total num frames: 112041984. Throughput: 0: 3501.5. Samples: 17178048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:13:38,968][134211] Avg episode reward: [(0, '5.829')] [2025-01-03 23:13:42,112][134294] Updated weights for policy 0, policy_version 27364 (0.0025) [2025-01-03 23:13:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13790.0, 300 sec: 14315.2). Total num frames: 112103424. Throughput: 0: 3479.4. Samples: 17197344. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:13:43,968][134211] Avg episode reward: [(0, '5.370')] [2025-01-03 23:13:45,248][134294] Updated weights for policy 0, policy_version 27374 (0.0025) [2025-01-03 23:13:47,913][134294] Updated weights for policy 0, policy_version 27384 (0.0020) [2025-01-03 23:13:48,967][134211] Fps is (10 sec: 14336.5, 60 sec: 13994.9, 300 sec: 14370.7). Total num frames: 112185344. Throughput: 0: 3490.7. Samples: 17207618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:13:48,968][134211] Avg episode reward: [(0, '6.351')] [2025-01-03 23:13:49,828][134294] Updated weights for policy 0, policy_version 27394 (0.0013) [2025-01-03 23:13:51,712][134294] Updated weights for policy 0, policy_version 27404 (0.0013) [2025-01-03 23:13:53,575][134294] Updated weights for policy 0, policy_version 27414 (0.0013) [2025-01-03 23:13:53,968][134211] Fps is (10 sec: 19250.2, 60 sec: 14745.6, 300 sec: 14523.4). Total num frames: 112295936. Throughput: 0: 3775.8. Samples: 17237650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:13:53,968][134211] Avg episode reward: [(0, '6.271')] [2025-01-03 23:13:55,476][134294] Updated weights for policy 0, policy_version 27424 (0.0014) [2025-01-03 23:13:58,468][134294] Updated weights for policy 0, policy_version 27434 (0.0027) [2025-01-03 23:13:58,970][134211] Fps is (10 sec: 18837.1, 60 sec: 14949.9, 300 sec: 14592.8). Total num frames: 112373760. Throughput: 0: 3739.5. Samples: 17264598. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:13:58,971][134211] Avg episode reward: [(0, '6.213')] [2025-01-03 23:14:01,602][134294] Updated weights for policy 0, policy_version 27444 (0.0028) [2025-01-03 23:14:03,968][134211] Fps is (10 sec: 14336.7, 60 sec: 15019.4, 300 sec: 14620.6). Total num frames: 112439296. Throughput: 0: 3670.0. Samples: 17274540. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:14:03,968][134211] Avg episode reward: [(0, '6.684')] [2025-01-03 23:14:04,918][134294] Updated weights for policy 0, policy_version 27454 (0.0024) [2025-01-03 23:14:07,927][134294] Updated weights for policy 0, policy_version 27464 (0.0026) [2025-01-03 23:14:08,968][134211] Fps is (10 sec: 13109.5, 60 sec: 14950.3, 300 sec: 14606.7). Total num frames: 112504832. Throughput: 0: 3657.9. Samples: 17293672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:08,969][134211] Avg episode reward: [(0, '6.567')] [2025-01-03 23:14:10,957][134294] Updated weights for policy 0, policy_version 27474 (0.0025) [2025-01-03 23:14:13,941][134294] Updated weights for policy 0, policy_version 27484 (0.0026) [2025-01-03 23:14:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 14523.4). Total num frames: 112574464. Throughput: 0: 3675.5. Samples: 17314214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:13,968][134211] Avg episode reward: [(0, '6.374')] [2025-01-03 23:14:16,900][134294] Updated weights for policy 0, policy_version 27494 (0.0025) [2025-01-03 23:14:18,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14336.0, 300 sec: 14412.4). Total num frames: 112640000. Throughput: 0: 3678.6. Samples: 17324260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:18,968][134211] Avg episode reward: [(0, '6.685')] [2025-01-03 23:14:20,071][134294] Updated weights for policy 0, policy_version 27504 (0.0024) [2025-01-03 23:14:23,030][134294] Updated weights for policy 0, policy_version 27514 (0.0024) [2025-01-03 23:14:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14440.1). Total num frames: 112709632. Throughput: 0: 3699.0. Samples: 17344502. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:23,968][134211] Avg episode reward: [(0, '6.913')] [2025-01-03 23:14:26,083][134294] Updated weights for policy 0, policy_version 27524 (0.0024) [2025-01-03 23:14:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.4, 300 sec: 14440.2). Total num frames: 112775168. Throughput: 0: 3725.2. Samples: 17364978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:28,968][134211] Avg episode reward: [(0, '6.658')] [2025-01-03 23:14:29,012][134294] Updated weights for policy 0, policy_version 27534 (0.0024) [2025-01-03 23:14:32,100][134294] Updated weights for policy 0, policy_version 27544 (0.0026) [2025-01-03 23:14:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 112844800. Throughput: 0: 3722.4. Samples: 17375126. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:33,968][134211] Avg episode reward: [(0, '6.692')] [2025-01-03 23:14:35,178][134294] Updated weights for policy 0, policy_version 27554 (0.0026) [2025-01-03 23:14:38,163][134294] Updated weights for policy 0, policy_version 27564 (0.0024) [2025-01-03 23:14:38,967][134211] Fps is (10 sec: 13926.7, 60 sec: 14540.9, 300 sec: 14467.9). Total num frames: 112914432. Throughput: 0: 3510.5. Samples: 17395622. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:14:38,968][134211] Avg episode reward: [(0, '7.054')] [2025-01-03 23:14:40,231][134294] Updated weights for policy 0, policy_version 27574 (0.0014) [2025-01-03 23:14:42,682][134294] Updated weights for policy 0, policy_version 27584 (0.0020) [2025-01-03 23:14:43,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14882.1, 300 sec: 14509.6). Total num frames: 112996352. Throughput: 0: 3466.8. Samples: 17420598. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:14:43,968][134211] Avg episode reward: [(0, '7.053')] [2025-01-03 23:14:45,798][134294] Updated weights for policy 0, policy_version 27594 (0.0025) [2025-01-03 23:14:48,742][134294] Updated weights for policy 0, policy_version 27604 (0.0024) [2025-01-03 23:14:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14677.3, 300 sec: 14523.5). Total num frames: 113065984. Throughput: 0: 3473.8. Samples: 17430860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:14:48,968][134211] Avg episode reward: [(0, '5.910')] [2025-01-03 23:14:51,752][134294] Updated weights for policy 0, policy_version 27614 (0.0026) [2025-01-03 23:14:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13994.7, 300 sec: 14523.4). Total num frames: 113135616. Throughput: 0: 3505.9. Samples: 17451436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:14:53,968][134211] Avg episode reward: [(0, '6.361')] [2025-01-03 23:14:54,848][134294] Updated weights for policy 0, policy_version 27624 (0.0025) [2025-01-03 23:14:56,766][134294] Updated weights for policy 0, policy_version 27634 (0.0012) [2025-01-03 23:14:58,697][134294] Updated weights for policy 0, policy_version 27644 (0.0013) [2025-01-03 23:14:58,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14336.6, 300 sec: 14662.3). Total num frames: 113233920. Throughput: 0: 3643.0. Samples: 17478150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:14:58,968][134211] Avg episode reward: [(0, '6.972')] [2025-01-03 23:15:00,633][134294] Updated weights for policy 0, policy_version 27654 (0.0012) [2025-01-03 23:15:02,807][134294] Updated weights for policy 0, policy_version 27664 (0.0013) [2025-01-03 23:15:03,968][134211] Fps is (10 sec: 19660.6, 60 sec: 14882.1, 300 sec: 14648.4). Total num frames: 113332224. Throughput: 0: 3760.5. Samples: 17493482. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:15:03,968][134211] Avg episode reward: [(0, '6.561')] [2025-01-03 23:15:05,408][134294] Updated weights for policy 0, policy_version 27674 (0.0020) [2025-01-03 23:15:08,688][134294] Updated weights for policy 0, policy_version 27684 (0.0029) [2025-01-03 23:15:08,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14814.0, 300 sec: 14481.8). Total num frames: 113393664. Throughput: 0: 3831.1. Samples: 17516902. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:15:08,968][134211] Avg episode reward: [(0, '5.823')] [2025-01-03 23:15:11,862][134294] Updated weights for policy 0, policy_version 27694 (0.0028) [2025-01-03 23:15:13,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14745.6, 300 sec: 14384.6). Total num frames: 113459200. Throughput: 0: 3800.8. Samples: 17536014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:15:13,968][134211] Avg episode reward: [(0, '5.782')] [2025-01-03 23:15:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027700_113459200.pth... [2025-01-03 23:15:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026852_109985792.pth [2025-01-03 23:15:15,098][134294] Updated weights for policy 0, policy_version 27704 (0.0025) [2025-01-03 23:15:18,267][134294] Updated weights for policy 0, policy_version 27714 (0.0027) [2025-01-03 23:15:18,970][134211] Fps is (10 sec: 13104.4, 60 sec: 14745.0, 300 sec: 14398.4). Total num frames: 113524736. Throughput: 0: 3785.4. Samples: 17545476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:15:18,971][134211] Avg episode reward: [(0, '7.487')] [2025-01-03 23:15:18,972][134264] Saving new best policy, reward=7.487! 
[2025-01-03 23:15:21,610][134294] Updated weights for policy 0, policy_version 27724 (0.0025) [2025-01-03 23:15:23,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 113582080. Throughput: 0: 3747.8. Samples: 17564276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:15:23,968][134211] Avg episode reward: [(0, '5.819')] [2025-01-03 23:15:24,951][134294] Updated weights for policy 0, policy_version 27734 (0.0026) [2025-01-03 23:15:27,998][134294] Updated weights for policy 0, policy_version 27744 (0.0028) [2025-01-03 23:15:28,968][134211] Fps is (10 sec: 12700.3, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 113651712. Throughput: 0: 3625.0. Samples: 17583722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:15:28,968][134211] Avg episode reward: [(0, '6.558')] [2025-01-03 23:15:30,932][134294] Updated weights for policy 0, policy_version 27754 (0.0026) [2025-01-03 23:15:33,905][134294] Updated weights for policy 0, policy_version 27764 (0.0023) [2025-01-03 23:15:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 113721344. Throughput: 0: 3626.3. Samples: 17594044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:15:33,969][134211] Avg episode reward: [(0, '6.872')] [2025-01-03 23:15:36,896][134294] Updated weights for policy 0, policy_version 27774 (0.0027) [2025-01-03 23:15:38,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14540.6, 300 sec: 14370.7). Total num frames: 113786880. Throughput: 0: 3632.2. Samples: 17614886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:15:38,969][134211] Avg episode reward: [(0, '6.104')] [2025-01-03 23:15:40,105][134294] Updated weights for policy 0, policy_version 27784 (0.0028) [2025-01-03 23:15:43,165][134294] Updated weights for policy 0, policy_version 27794 (0.0024) [2025-01-03 23:15:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14267.6, 300 sec: 14370.8). Total num frames: 113852416. 
Throughput: 0: 3474.4. Samples: 17634500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:15:43,969][134211] Avg episode reward: [(0, '6.929')] [2025-01-03 23:15:46,055][134294] Updated weights for policy 0, policy_version 27804 (0.0024) [2025-01-03 23:15:48,968][134211] Fps is (10 sec: 13518.0, 60 sec: 14267.8, 300 sec: 14370.7). Total num frames: 113922048. Throughput: 0: 3360.7. Samples: 17644714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:15:48,968][134211] Avg episode reward: [(0, '6.676')] [2025-01-03 23:15:49,078][134294] Updated weights for policy 0, policy_version 27814 (0.0024) [2025-01-03 23:15:51,036][134294] Updated weights for policy 0, policy_version 27824 (0.0015) [2025-01-03 23:15:53,896][134294] Updated weights for policy 0, policy_version 27834 (0.0024) [2025-01-03 23:15:53,969][134211] Fps is (10 sec: 15564.2, 60 sec: 14540.6, 300 sec: 14426.2). Total num frames: 114008064. Throughput: 0: 3390.1. Samples: 17669460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:15:53,969][134211] Avg episode reward: [(0, '6.773')] [2025-01-03 23:15:56,822][134294] Updated weights for policy 0, policy_version 27844 (0.0027) [2025-01-03 23:15:58,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13994.6, 300 sec: 14454.0). Total num frames: 114073600. Throughput: 0: 3418.2. Samples: 17689832. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:15:58,968][134211] Avg episode reward: [(0, '6.904')] [2025-01-03 23:15:59,966][134294] Updated weights for policy 0, policy_version 27854 (0.0025) [2025-01-03 23:16:02,391][134294] Updated weights for policy 0, policy_version 27864 (0.0018) [2025-01-03 23:16:03,968][134211] Fps is (10 sec: 15566.2, 60 sec: 13858.2, 300 sec: 14537.3). Total num frames: 114163712. Throughput: 0: 3422.3. Samples: 17699470. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:16:03,968][134211] Avg episode reward: [(0, '6.818')] [2025-01-03 23:16:04,278][134294] Updated weights for policy 0, policy_version 27874 (0.0013) [2025-01-03 23:16:06,476][134294] Updated weights for policy 0, policy_version 27884 (0.0018) [2025-01-03 23:16:08,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14199.5, 300 sec: 14454.0). Total num frames: 114245632. Throughput: 0: 3659.7. Samples: 17728960. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:16:08,968][134211] Avg episode reward: [(0, '6.518')] [2025-01-03 23:16:09,584][134294] Updated weights for policy 0, policy_version 27894 (0.0026) [2025-01-03 23:16:13,033][134294] Updated weights for policy 0, policy_version 27904 (0.0025) [2025-01-03 23:16:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14062.9, 300 sec: 14398.5). Total num frames: 114302976. Throughput: 0: 3638.9. Samples: 17747472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:13,968][134211] Avg episode reward: [(0, '6.862')] [2025-01-03 23:16:16,124][134294] Updated weights for policy 0, policy_version 27914 (0.0024) [2025-01-03 23:16:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.7, 300 sec: 14412.4). Total num frames: 114372608. Throughput: 0: 3626.5. Samples: 17757238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:18,968][134211] Avg episode reward: [(0, '6.711')] [2025-01-03 23:16:19,300][134294] Updated weights for policy 0, policy_version 27924 (0.0024) [2025-01-03 23:16:22,346][134294] Updated weights for policy 0, policy_version 27934 (0.0026) [2025-01-03 23:16:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14412.4). Total num frames: 114438144. Throughput: 0: 3603.4. Samples: 17777036. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:23,968][134211] Avg episode reward: [(0, '6.373')] [2025-01-03 23:16:25,167][134294] Updated weights for policy 0, policy_version 27944 (0.0021) [2025-01-03 23:16:27,153][134294] Updated weights for policy 0, policy_version 27954 (0.0015) [2025-01-03 23:16:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14540.8, 300 sec: 14481.8). Total num frames: 114524160. Throughput: 0: 3721.6. Samples: 17801972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:28,968][134211] Avg episode reward: [(0, '5.714')] [2025-01-03 23:16:29,950][134294] Updated weights for policy 0, policy_version 27964 (0.0023) [2025-01-03 23:16:33,076][134294] Updated weights for policy 0, policy_version 27974 (0.0024) [2025-01-03 23:16:33,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14467.9). Total num frames: 114589696. Throughput: 0: 3727.6. Samples: 17812458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:33,968][134211] Avg episode reward: [(0, '6.772')] [2025-01-03 23:16:36,127][134294] Updated weights for policy 0, policy_version 27984 (0.0025) [2025-01-03 23:16:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14541.0, 300 sec: 14481.8). Total num frames: 114659328. Throughput: 0: 3616.7. Samples: 17832210. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:38,968][134211] Avg episode reward: [(0, '6.298')] [2025-01-03 23:16:39,351][134294] Updated weights for policy 0, policy_version 27994 (0.0028) [2025-01-03 23:16:41,460][134294] Updated weights for policy 0, policy_version 28004 (0.0015) [2025-01-03 23:16:43,305][134294] Updated weights for policy 0, policy_version 28014 (0.0013) [2025-01-03 23:16:43,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15087.0, 300 sec: 14579.0). Total num frames: 114757632. Throughput: 0: 3754.1. Samples: 17858766. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:43,968][134211] Avg episode reward: [(0, '6.553')] [2025-01-03 23:16:45,220][134294] Updated weights for policy 0, policy_version 28024 (0.0013) [2025-01-03 23:16:47,131][134294] Updated weights for policy 0, policy_version 28034 (0.0014) [2025-01-03 23:16:48,968][134211] Fps is (10 sec: 20070.7, 60 sec: 15633.1, 300 sec: 14634.5). Total num frames: 114860032. Throughput: 0: 3900.7. Samples: 17875002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:16:48,969][134211] Avg episode reward: [(0, '6.320')] [2025-01-03 23:16:49,632][134294] Updated weights for policy 0, policy_version 28044 (0.0018) [2025-01-03 23:16:52,808][134294] Updated weights for policy 0, policy_version 28054 (0.0026) [2025-01-03 23:16:53,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15223.6, 300 sec: 14509.5). Total num frames: 114921472. Throughput: 0: 3768.3. Samples: 17898536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:16:53,968][134211] Avg episode reward: [(0, '6.330')] [2025-01-03 23:16:56,023][134294] Updated weights for policy 0, policy_version 28064 (0.0028) [2025-01-03 23:16:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15223.4, 300 sec: 14537.3). Total num frames: 114987008. Throughput: 0: 3781.6. Samples: 17917646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:16:58,969][134211] Avg episode reward: [(0, '6.044')] [2025-01-03 23:16:59,241][134294] Updated weights for policy 0, policy_version 28074 (0.0024) [2025-01-03 23:17:02,754][134294] Updated weights for policy 0, policy_version 28084 (0.0028) [2025-01-03 23:17:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.3, 300 sec: 14523.4). Total num frames: 115044352. Throughput: 0: 3768.8. Samples: 17926834. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:17:03,969][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 23:17:06,231][134294] Updated weights for policy 0, policy_version 28094 (0.0026) [2025-01-03 23:17:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14267.7, 300 sec: 14509.6). Total num frames: 115101696. Throughput: 0: 3714.1. Samples: 17944172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:17:08,968][134211] Avg episode reward: [(0, '6.368')] [2025-01-03 23:17:09,871][134294] Updated weights for policy 0, policy_version 28104 (0.0026) [2025-01-03 23:17:13,438][134294] Updated weights for policy 0, policy_version 28114 (0.0022) [2025-01-03 23:17:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 115159040. Throughput: 0: 3537.8. Samples: 17961174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:17:13,969][134211] Avg episode reward: [(0, '6.129')] [2025-01-03 23:17:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028115_115159040.pth... [2025-01-03 23:17:14,075][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027273_111710208.pth [2025-01-03 23:17:15,912][134294] Updated weights for policy 0, policy_version 28124 (0.0018) [2025-01-03 23:17:17,761][134294] Updated weights for policy 0, policy_version 28134 (0.0013) [2025-01-03 23:17:18,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14813.8, 300 sec: 14592.9). Total num frames: 115261440. Throughput: 0: 3583.6. Samples: 17973722. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:17:18,968][134211] Avg episode reward: [(0, '6.410')] [2025-01-03 23:17:19,919][134294] Updated weights for policy 0, policy_version 28144 (0.0017) [2025-01-03 23:17:23,032][134294] Updated weights for policy 0, policy_version 28154 (0.0022) [2025-01-03 23:17:23,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14813.9, 300 sec: 14551.2). Total num frames: 115326976. Throughput: 0: 3737.7. Samples: 18000408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:17:23,968][134211] Avg episode reward: [(0, '6.542')] [2025-01-03 23:17:25,924][134294] Updated weights for policy 0, policy_version 28164 (0.0028) [2025-01-03 23:17:28,956][134294] Updated weights for policy 0, policy_version 28174 (0.0029) [2025-01-03 23:17:28,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14609.1, 300 sec: 14440.1). Total num frames: 115400704. Throughput: 0: 3601.0. Samples: 18020812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:17:28,968][134211] Avg episode reward: [(0, '6.582')] [2025-01-03 23:17:31,989][134294] Updated weights for policy 0, policy_version 28184 (0.0025) [2025-01-03 23:17:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14426.2). Total num frames: 115466240. Throughput: 0: 3461.9. Samples: 18030790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:17:33,968][134211] Avg episode reward: [(0, '7.101')] [2025-01-03 23:17:35,082][134294] Updated weights for policy 0, policy_version 28194 (0.0026) [2025-01-03 23:17:38,058][134294] Updated weights for policy 0, policy_version 28204 (0.0025) [2025-01-03 23:17:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14426.3). Total num frames: 115531776. Throughput: 0: 3399.0. Samples: 18051490. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:17:38,968][134211] Avg episode reward: [(0, '6.585')] [2025-01-03 23:17:40,988][134294] Updated weights for policy 0, policy_version 28214 (0.0025) [2025-01-03 23:17:43,969][134211] Fps is (10 sec: 13515.4, 60 sec: 14062.7, 300 sec: 14426.2). Total num frames: 115601408. Throughput: 0: 3417.0. Samples: 18071414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:17:43,969][134211] Avg episode reward: [(0, '6.483')] [2025-01-03 23:17:44,155][134294] Updated weights for policy 0, policy_version 28224 (0.0030) [2025-01-03 23:17:46,866][134294] Updated weights for policy 0, policy_version 28234 (0.0021) [2025-01-03 23:17:48,786][134294] Updated weights for policy 0, policy_version 28244 (0.0012) [2025-01-03 23:17:48,967][134211] Fps is (10 sec: 15565.0, 60 sec: 13789.9, 300 sec: 14495.7). Total num frames: 115687424. Throughput: 0: 3434.9. Samples: 18081402. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:17:48,968][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 23:17:50,679][134294] Updated weights for policy 0, policy_version 28254 (0.0014) [2025-01-03 23:17:52,575][134294] Updated weights for policy 0, policy_version 28264 (0.0013) [2025-01-03 23:17:53,968][134211] Fps is (10 sec: 18843.8, 60 sec: 14472.6, 300 sec: 14620.6). Total num frames: 115789824. Throughput: 0: 3765.2. Samples: 18113604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:17:53,968][134211] Avg episode reward: [(0, '6.253')] [2025-01-03 23:17:55,357][134294] Updated weights for policy 0, policy_version 28274 (0.0024) [2025-01-03 23:17:58,501][134294] Updated weights for policy 0, policy_version 28284 (0.0025) [2025-01-03 23:17:58,969][134211] Fps is (10 sec: 16791.1, 60 sec: 14472.3, 300 sec: 14634.6). Total num frames: 115855360. Throughput: 0: 3862.0. Samples: 18134966. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:17:58,970][134211] Avg episode reward: [(0, '6.739')] [2025-01-03 23:18:01,690][134294] Updated weights for policy 0, policy_version 28294 (0.0029) [2025-01-03 23:18:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.1, 300 sec: 14620.6). Total num frames: 115920896. Throughput: 0: 3799.4. Samples: 18144694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:03,968][134211] Avg episode reward: [(0, '5.812')] [2025-01-03 23:18:04,880][134294] Updated weights for policy 0, policy_version 28304 (0.0029) [2025-01-03 23:18:07,975][134294] Updated weights for policy 0, policy_version 28314 (0.0026) [2025-01-03 23:18:08,968][134211] Fps is (10 sec: 13108.9, 60 sec: 14745.6, 300 sec: 14509.6). Total num frames: 115986432. Throughput: 0: 3635.5. Samples: 18164006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:08,968][134211] Avg episode reward: [(0, '6.750')] [2025-01-03 23:18:11,014][134294] Updated weights for policy 0, policy_version 28324 (0.0024) [2025-01-03 23:18:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.2, 300 sec: 14481.8). Total num frames: 116051968. Throughput: 0: 3625.6. Samples: 18183966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:13,968][134211] Avg episode reward: [(0, '5.981')] [2025-01-03 23:18:14,246][134294] Updated weights for policy 0, policy_version 28334 (0.0026) [2025-01-03 23:18:17,231][134294] Updated weights for policy 0, policy_version 28344 (0.0024) [2025-01-03 23:18:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.8, 300 sec: 14481.8). Total num frames: 116117504. Throughput: 0: 3631.1. Samples: 18194190. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:18,968][134211] Avg episode reward: [(0, '6.507')] [2025-01-03 23:18:20,382][134294] Updated weights for policy 0, policy_version 28354 (0.0027) [2025-01-03 23:18:23,263][134294] Updated weights for policy 0, policy_version 28364 (0.0026) [2025-01-03 23:18:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14267.8, 300 sec: 14481.8). Total num frames: 116183040. Throughput: 0: 3611.9. Samples: 18214026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:23,968][134211] Avg episode reward: [(0, '6.451')] [2025-01-03 23:18:25,460][134294] Updated weights for policy 0, policy_version 28374 (0.0014) [2025-01-03 23:18:27,956][134294] Updated weights for policy 0, policy_version 28384 (0.0021) [2025-01-03 23:18:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14540.8, 300 sec: 14579.0). Total num frames: 116273152. Throughput: 0: 3729.4. Samples: 18239234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:28,968][134211] Avg episode reward: [(0, '6.100')] [2025-01-03 23:18:30,966][134294] Updated weights for policy 0, policy_version 28394 (0.0024) [2025-01-03 23:18:33,971][134211] Fps is (10 sec: 15559.7, 60 sec: 14540.0, 300 sec: 14564.9). Total num frames: 116338688. Throughput: 0: 3730.5. Samples: 18249286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:33,972][134211] Avg episode reward: [(0, '6.235')] [2025-01-03 23:18:34,112][134294] Updated weights for policy 0, policy_version 28404 (0.0022) [2025-01-03 23:18:36,976][134294] Updated weights for policy 0, policy_version 28414 (0.0024) [2025-01-03 23:18:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14592.9). Total num frames: 116408320. Throughput: 0: 3464.7. Samples: 18269516. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:18:38,968][134211] Avg episode reward: [(0, '6.399')] [2025-01-03 23:18:40,159][134294] Updated weights for policy 0, policy_version 28424 (0.0025) [2025-01-03 23:18:42,823][134294] Updated weights for policy 0, policy_version 28434 (0.0023) [2025-01-03 23:18:43,968][134211] Fps is (10 sec: 14750.4, 60 sec: 14745.9, 300 sec: 14579.0). Total num frames: 116486144. Throughput: 0: 3476.7. Samples: 18291412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:18:43,968][134211] Avg episode reward: [(0, '6.117')] [2025-01-03 23:18:44,751][134294] Updated weights for policy 0, policy_version 28444 (0.0015) [2025-01-03 23:18:46,670][134294] Updated weights for policy 0, policy_version 28454 (0.0014) [2025-01-03 23:18:48,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14882.1, 300 sec: 14523.5). Total num frames: 116580352. Throughput: 0: 3623.5. Samples: 18307750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:18:48,968][134211] Avg episode reward: [(0, '5.943')] [2025-01-03 23:18:49,550][134294] Updated weights for policy 0, policy_version 28464 (0.0027) [2025-01-03 23:18:52,698][134294] Updated weights for policy 0, policy_version 28474 (0.0027) [2025-01-03 23:18:53,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14267.7, 300 sec: 14481.9). Total num frames: 116645888. Throughput: 0: 3664.6. Samples: 18328914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:18:53,968][134211] Avg episode reward: [(0, '6.547')] [2025-01-03 23:18:55,699][134294] Updated weights for policy 0, policy_version 28484 (0.0023) [2025-01-03 23:18:58,713][134294] Updated weights for policy 0, policy_version 28494 (0.0025) [2025-01-03 23:18:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.3, 300 sec: 14495.7). Total num frames: 116715520. Throughput: 0: 3679.1. Samples: 18349526. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:18:58,968][134211] Avg episode reward: [(0, '5.851')] [2025-01-03 23:19:01,904][134294] Updated weights for policy 0, policy_version 28504 (0.0025) [2025-01-03 23:19:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14467.9). Total num frames: 116772864. Throughput: 0: 3668.5. Samples: 18359272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:19:03,968][134211] Avg episode reward: [(0, '6.284')] [2025-01-03 23:19:05,344][134294] Updated weights for policy 0, policy_version 28514 (0.0024) [2025-01-03 23:19:07,360][134294] Updated weights for policy 0, policy_version 28524 (0.0014) [2025-01-03 23:19:08,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14609.1, 300 sec: 14537.3). Total num frames: 116862976. Throughput: 0: 3706.2. Samples: 18380804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:19:08,968][134211] Avg episode reward: [(0, '6.697')] [2025-01-03 23:19:09,336][134294] Updated weights for policy 0, policy_version 28534 (0.0014) [2025-01-03 23:19:11,234][134294] Updated weights for policy 0, policy_version 28544 (0.0014) [2025-01-03 23:19:13,085][134294] Updated weights for policy 0, policy_version 28554 (0.0014) [2025-01-03 23:19:13,968][134211] Fps is (10 sec: 20071.0, 60 sec: 15360.1, 300 sec: 14690.1). Total num frames: 116973568. Throughput: 0: 3864.8. Samples: 18413148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:19:13,968][134211] Avg episode reward: [(0, '5.782')] [2025-01-03 23:19:13,994][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028559_116977664.pth... 
[2025-01-03 23:19:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027700_113459200.pth [2025-01-03 23:19:15,484][134294] Updated weights for policy 0, policy_version 28564 (0.0019) [2025-01-03 23:19:18,486][134294] Updated weights for policy 0, policy_version 28574 (0.0028) [2025-01-03 23:19:18,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15428.3, 300 sec: 14690.1). Total num frames: 117043200. Throughput: 0: 3912.3. Samples: 18425328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:18,968][134211] Avg episode reward: [(0, '6.286')] [2025-01-03 23:19:21,791][134294] Updated weights for policy 0, policy_version 28584 (0.0026) [2025-01-03 23:19:23,969][134211] Fps is (10 sec: 13105.6, 60 sec: 15359.7, 300 sec: 14676.1). Total num frames: 117104640. Throughput: 0: 3886.4. Samples: 18444408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:23,969][134211] Avg episode reward: [(0, '6.025')] [2025-01-03 23:19:24,965][134294] Updated weights for policy 0, policy_version 28594 (0.0025) [2025-01-03 23:19:28,134][134294] Updated weights for policy 0, policy_version 28604 (0.0022) [2025-01-03 23:19:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 117170176. Throughput: 0: 3830.4. Samples: 18463780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:28,968][134211] Avg episode reward: [(0, '5.668')] [2025-01-03 23:19:31,151][134294] Updated weights for policy 0, policy_version 28614 (0.0026) [2025-01-03 23:19:33,968][134211] Fps is (10 sec: 13518.3, 60 sec: 15019.5, 300 sec: 14662.3). Total num frames: 117239808. Throughput: 0: 3695.0. Samples: 18474026. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:33,968][134211] Avg episode reward: [(0, '5.747')] [2025-01-03 23:19:34,294][134294] Updated weights for policy 0, policy_version 28624 (0.0027) [2025-01-03 23:19:37,218][134294] Updated weights for policy 0, policy_version 28634 (0.0024) [2025-01-03 23:19:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14606.7). Total num frames: 117305344. Throughput: 0: 3672.1. Samples: 18494156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:38,968][134211] Avg episode reward: [(0, '6.032')] [2025-01-03 23:19:40,404][134294] Updated weights for policy 0, policy_version 28644 (0.0028) [2025-01-03 23:19:43,317][134294] Updated weights for policy 0, policy_version 28654 (0.0025) [2025-01-03 23:19:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14745.5, 300 sec: 14592.9). Total num frames: 117370880. Throughput: 0: 3660.9. Samples: 18514268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:43,969][134211] Avg episode reward: [(0, '6.162')] [2025-01-03 23:19:46,366][134294] Updated weights for policy 0, policy_version 28664 (0.0028) [2025-01-03 23:19:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 14592.9). Total num frames: 117440512. Throughput: 0: 3671.3. Samples: 18524478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:48,968][134211] Avg episode reward: [(0, '6.349')] [2025-01-03 23:19:49,496][134294] Updated weights for policy 0, policy_version 28674 (0.0022) [2025-01-03 23:19:52,447][134294] Updated weights for policy 0, policy_version 28684 (0.0024) [2025-01-03 23:19:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 117506048. Throughput: 0: 3639.9. Samples: 18544602. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:19:53,968][134211] Avg episode reward: [(0, '6.339')] [2025-01-03 23:19:55,454][134294] Updated weights for policy 0, policy_version 28694 (0.0022) [2025-01-03 23:19:58,417][134294] Updated weights for policy 0, policy_version 28704 (0.0027) [2025-01-03 23:19:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14398.5). Total num frames: 117579776. Throughput: 0: 3383.0. Samples: 18565382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:19:58,968][134211] Avg episode reward: [(0, '6.201')] [2025-01-03 23:20:00,498][134294] Updated weights for policy 0, policy_version 28714 (0.0016) [2025-01-03 23:20:03,137][134294] Updated weights for policy 0, policy_version 28724 (0.0022) [2025-01-03 23:20:03,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14813.9, 300 sec: 14467.9). Total num frames: 117661696. Throughput: 0: 3425.1. Samples: 18579458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:03,968][134211] Avg episode reward: [(0, '5.609')] [2025-01-03 23:20:06,233][134294] Updated weights for policy 0, policy_version 28734 (0.0025) [2025-01-03 23:20:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14404.2, 300 sec: 14467.9). Total num frames: 117727232. Throughput: 0: 3455.5. Samples: 18599900. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:08,968][134211] Avg episode reward: [(0, '5.920')] [2025-01-03 23:20:09,302][134294] Updated weights for policy 0, policy_version 28744 (0.0027) [2025-01-03 23:20:12,346][134294] Updated weights for policy 0, policy_version 28754 (0.0025) [2025-01-03 23:20:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13789.9, 300 sec: 14495.8). Total num frames: 117800960. Throughput: 0: 3476.0. Samples: 18620198. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:13,968][134211] Avg episode reward: [(0, '5.364')] [2025-01-03 23:20:14,578][134294] Updated weights for policy 0, policy_version 28764 (0.0017) [2025-01-03 23:20:16,435][134294] Updated weights for policy 0, policy_version 28774 (0.0013) [2025-01-03 23:20:18,310][134294] Updated weights for policy 0, policy_version 28784 (0.0014) [2025-01-03 23:20:18,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14472.5, 300 sec: 14676.2). Total num frames: 117911552. Throughput: 0: 3606.3. Samples: 18636310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:18,968][134211] Avg episode reward: [(0, '5.342')] [2025-01-03 23:20:20,214][134294] Updated weights for policy 0, policy_version 28794 (0.0013) [2025-01-03 23:20:22,779][134294] Updated weights for policy 0, policy_version 28804 (0.0020) [2025-01-03 23:20:23,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14814.1, 300 sec: 14717.8). Total num frames: 117993472. Throughput: 0: 3834.3. Samples: 18666700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:23,968][134211] Avg episode reward: [(0, '5.666')] [2025-01-03 23:20:26,023][134294] Updated weights for policy 0, policy_version 28814 (0.0028) [2025-01-03 23:20:28,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 118054912. Throughput: 0: 3804.1. Samples: 18685452. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:28,968][134211] Avg episode reward: [(0, '5.800')] [2025-01-03 23:20:29,322][134294] Updated weights for policy 0, policy_version 28824 (0.0025) [2025-01-03 23:20:32,415][134294] Updated weights for policy 0, policy_version 28834 (0.0028) [2025-01-03 23:20:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14677.3, 300 sec: 14690.1). Total num frames: 118120448. Throughput: 0: 3790.6. Samples: 18695056. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:33,969][134211] Avg episode reward: [(0, '6.097')] [2025-01-03 23:20:35,660][134294] Updated weights for policy 0, policy_version 28844 (0.0029) [2025-01-03 23:20:38,655][134294] Updated weights for policy 0, policy_version 28854 (0.0026) [2025-01-03 23:20:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14704.0). Total num frames: 118190080. Throughput: 0: 3784.0. Samples: 18714882. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:38,968][134211] Avg episode reward: [(0, '6.633')] [2025-01-03 23:20:41,713][134294] Updated weights for policy 0, policy_version 28864 (0.0024) [2025-01-03 23:20:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14690.0). Total num frames: 118255616. Throughput: 0: 3766.9. Samples: 18734894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:43,968][134211] Avg episode reward: [(0, '6.250')] [2025-01-03 23:20:44,735][134294] Updated weights for policy 0, policy_version 28874 (0.0023) [2025-01-03 23:20:47,781][134294] Updated weights for policy 0, policy_version 28884 (0.0023) [2025-01-03 23:20:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14620.7). Total num frames: 118321152. Throughput: 0: 3678.9. Samples: 18745008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:20:48,968][134211] Avg episode reward: [(0, '5.783')] [2025-01-03 23:20:50,698][134294] Updated weights for policy 0, policy_version 28894 (0.0024) [2025-01-03 23:20:53,699][134294] Updated weights for policy 0, policy_version 28904 (0.0025) [2025-01-03 23:20:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 118390784. Throughput: 0: 3685.7. Samples: 18765756. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:20:53,968][134211] Avg episode reward: [(0, '5.400')] [2025-01-03 23:20:56,774][134294] Updated weights for policy 0, policy_version 28914 (0.0024) [2025-01-03 23:20:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 118460416. Throughput: 0: 3682.8. Samples: 18785926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:20:58,968][134211] Avg episode reward: [(0, '5.787')] [2025-01-03 23:20:59,860][134294] Updated weights for policy 0, policy_version 28924 (0.0028) [2025-01-03 23:21:02,053][134294] Updated weights for policy 0, policy_version 28934 (0.0016) [2025-01-03 23:21:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14609.0, 300 sec: 14551.2). Total num frames: 118538240. Throughput: 0: 3589.1. Samples: 18797820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:21:03,969][134211] Avg episode reward: [(0, '6.121')] [2025-01-03 23:21:05,201][134294] Updated weights for policy 0, policy_version 28944 (0.0028) [2025-01-03 23:21:07,283][134294] Updated weights for policy 0, policy_version 28954 (0.0014) [2025-01-03 23:21:08,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14950.4, 300 sec: 14648.4). Total num frames: 118624256. Throughput: 0: 3436.3. Samples: 18821334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:21:08,968][134211] Avg episode reward: [(0, '6.373')] [2025-01-03 23:21:09,402][134294] Updated weights for policy 0, policy_version 28964 (0.0015) [2025-01-03 23:21:12,227][134294] Updated weights for policy 0, policy_version 28974 (0.0024) [2025-01-03 23:21:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14950.3, 300 sec: 14662.3). Total num frames: 118697984. Throughput: 0: 3555.9. Samples: 18845468. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:13,969][134211] Avg episode reward: [(0, '5.730')] [2025-01-03 23:21:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028979_118697984.pth... [2025-01-03 23:21:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028115_115159040.pth [2025-01-03 23:21:15,413][134294] Updated weights for policy 0, policy_version 28984 (0.0022) [2025-01-03 23:21:18,389][134294] Updated weights for policy 0, policy_version 28994 (0.0026) [2025-01-03 23:21:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.4, 300 sec: 14662.3). Total num frames: 118763520. Throughput: 0: 3567.6. Samples: 18855600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:18,968][134211] Avg episode reward: [(0, '6.097')] [2025-01-03 23:21:21,448][134294] Updated weights for policy 0, policy_version 29004 (0.0021) [2025-01-03 23:21:23,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13994.7, 300 sec: 14606.8). Total num frames: 118833152. Throughput: 0: 3577.0. Samples: 18875846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:23,968][134211] Avg episode reward: [(0, '5.864')] [2025-01-03 23:21:24,540][134294] Updated weights for policy 0, policy_version 29014 (0.0028) [2025-01-03 23:21:27,586][134294] Updated weights for policy 0, policy_version 29024 (0.0025) [2025-01-03 23:21:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14606.8). Total num frames: 118898688. Throughput: 0: 3573.5. Samples: 18895700. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:28,968][134211] Avg episode reward: [(0, '5.716')] [2025-01-03 23:21:30,618][134294] Updated weights for policy 0, policy_version 29034 (0.0026) [2025-01-03 23:21:33,515][134294] Updated weights for policy 0, policy_version 29044 (0.0025) [2025-01-03 23:21:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14131.2, 300 sec: 14606.8). Total num frames: 118968320. Throughput: 0: 3579.7. Samples: 18906094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:21:33,968][134211] Avg episode reward: [(0, '5.277')] [2025-01-03 23:21:36,720][134294] Updated weights for policy 0, policy_version 29054 (0.0024) [2025-01-03 23:21:38,779][134294] Updated weights for policy 0, policy_version 29064 (0.0014) [2025-01-03 23:21:38,967][134211] Fps is (10 sec: 15155.5, 60 sec: 14336.1, 300 sec: 14551.2). Total num frames: 119050240. Throughput: 0: 3568.7. Samples: 18926348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:21:38,968][134211] Avg episode reward: [(0, '5.559')] [2025-01-03 23:21:40,736][134294] Updated weights for policy 0, policy_version 29074 (0.0012) [2025-01-03 23:21:42,676][134294] Updated weights for policy 0, policy_version 29084 (0.0013) [2025-01-03 23:21:43,967][134211] Fps is (10 sec: 18432.6, 60 sec: 14950.5, 300 sec: 14551.2). Total num frames: 119152640. Throughput: 0: 3825.1. Samples: 18958054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:21:43,968][134211] Avg episode reward: [(0, '5.484')] [2025-01-03 23:21:44,588][134294] Updated weights for policy 0, policy_version 29094 (0.0014) [2025-01-03 23:21:46,398][134294] Updated weights for policy 0, policy_version 29104 (0.0013) [2025-01-03 23:21:48,517][134294] Updated weights for policy 0, policy_version 29114 (0.0016) [2025-01-03 23:21:48,968][134211] Fps is (10 sec: 20479.8, 60 sec: 15564.8, 300 sec: 14690.1). Total num frames: 119255040. Throughput: 0: 3923.6. Samples: 18974380. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:21:48,968][134211] Avg episode reward: [(0, '4.973')] [2025-01-03 23:21:51,775][134294] Updated weights for policy 0, policy_version 29124 (0.0027) [2025-01-03 23:21:53,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15360.0, 300 sec: 14662.3). Total num frames: 119312384. Throughput: 0: 3906.5. Samples: 18997126. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:53,969][134211] Avg episode reward: [(0, '4.889')] [2025-01-03 23:21:55,399][134294] Updated weights for policy 0, policy_version 29134 (0.0030) [2025-01-03 23:21:58,468][134294] Updated weights for policy 0, policy_version 29144 (0.0024) [2025-01-03 23:21:58,968][134211] Fps is (10 sec: 12287.1, 60 sec: 15291.6, 300 sec: 14690.0). Total num frames: 119377920. Throughput: 0: 3781.2. Samples: 19015626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:21:58,969][134211] Avg episode reward: [(0, '4.679')] [2025-01-03 23:22:01,579][134294] Updated weights for policy 0, policy_version 29154 (0.0024) [2025-01-03 23:22:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15086.9, 300 sec: 14717.8). Total num frames: 119443456. Throughput: 0: 3776.5. Samples: 19025544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:22:03,968][134211] Avg episode reward: [(0, '4.939')] [2025-01-03 23:22:04,854][134294] Updated weights for policy 0, policy_version 29164 (0.0026) [2025-01-03 23:22:08,019][134294] Updated weights for policy 0, policy_version 29174 (0.0027) [2025-01-03 23:22:08,968][134211] Fps is (10 sec: 12698.5, 60 sec: 14677.3, 300 sec: 14731.7). Total num frames: 119504896. Throughput: 0: 3755.7. Samples: 19044854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:22:08,968][134211] Avg episode reward: [(0, '5.284')] [2025-01-03 23:22:11,027][134294] Updated weights for policy 0, policy_version 29184 (0.0025) [2025-01-03 23:22:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 14620.7). 
Total num frames: 119574528. Throughput: 0: 3753.5. Samples: 19064608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:22:13,968][134211] Avg episode reward: [(0, '5.404')] [2025-01-03 23:22:14,155][134294] Updated weights for policy 0, policy_version 29194 (0.0023) [2025-01-03 23:22:17,165][134294] Updated weights for policy 0, policy_version 29204 (0.0027) [2025-01-03 23:22:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 14620.6). Total num frames: 119640064. Throughput: 0: 3741.6. Samples: 19074466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:22:18,968][134211] Avg episode reward: [(0, '5.304')] [2025-01-03 23:22:20,253][134294] Updated weights for policy 0, policy_version 29214 (0.0024) [2025-01-03 23:22:23,185][134294] Updated weights for policy 0, policy_version 29224 (0.0024) [2025-01-03 23:22:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14606.7). Total num frames: 119709696. Throughput: 0: 3751.7. Samples: 19095176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:22:23,968][134211] Avg episode reward: [(0, '5.929')] [2025-01-03 23:22:26,088][134294] Updated weights for policy 0, policy_version 29234 (0.0024) [2025-01-03 23:22:28,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14677.2, 300 sec: 14620.6). Total num frames: 119779328. Throughput: 0: 3503.0. Samples: 19115692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:22:28,969][134211] Avg episode reward: [(0, '5.747')] [2025-01-03 23:22:29,226][134294] Updated weights for policy 0, policy_version 29244 (0.0025) [2025-01-03 23:22:32,217][134294] Updated weights for policy 0, policy_version 29254 (0.0026) [2025-01-03 23:22:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14620.6). Total num frames: 119844864. Throughput: 0: 3366.9. Samples: 19125892. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:22:33,968][134211] Avg episode reward: [(0, '5.784')] [2025-01-03 23:22:35,274][134294] Updated weights for policy 0, policy_version 29264 (0.0024) [2025-01-03 23:22:38,555][134294] Updated weights for policy 0, policy_version 29274 (0.0024) [2025-01-03 23:22:38,968][134211] Fps is (10 sec: 13108.1, 60 sec: 14336.0, 300 sec: 14606.8). Total num frames: 119910400. Throughput: 0: 3300.5. Samples: 19145648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:22:38,968][134211] Avg episode reward: [(0, '5.944')] [2025-01-03 23:22:40,783][134294] Updated weights for policy 0, policy_version 29284 (0.0017) [2025-01-03 23:22:42,781][134294] Updated weights for policy 0, policy_version 29294 (0.0013) [2025-01-03 23:22:43,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14335.9, 300 sec: 14662.3). Total num frames: 120012800. Throughput: 0: 3489.8. Samples: 19172664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:22:43,968][134211] Avg episode reward: [(0, '5.932')] [2025-01-03 23:22:44,624][134294] Updated weights for policy 0, policy_version 29304 (0.0014) [2025-01-03 23:22:46,517][134294] Updated weights for policy 0, policy_version 29314 (0.0013) [2025-01-03 23:22:48,568][134294] Updated weights for policy 0, policy_version 29324 (0.0016) [2025-01-03 23:22:48,968][134211] Fps is (10 sec: 20480.0, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 120115200. Throughput: 0: 3630.5. Samples: 19188914. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:22:48,968][134211] Avg episode reward: [(0, '6.532')] [2025-01-03 23:22:51,803][134294] Updated weights for policy 0, policy_version 29334 (0.0028) [2025-01-03 23:22:53,968][134211] Fps is (10 sec: 16383.0, 60 sec: 14404.1, 300 sec: 14648.4). Total num frames: 120176640. Throughput: 0: 3720.7. Samples: 19212288. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:22:53,969][134211] Avg episode reward: [(0, '6.336')] [2025-01-03 23:22:55,184][134294] Updated weights for policy 0, policy_version 29344 (0.0028) [2025-01-03 23:22:58,189][134294] Updated weights for policy 0, policy_version 29354 (0.0025) [2025-01-03 23:22:58,969][134211] Fps is (10 sec: 12695.8, 60 sec: 14404.1, 300 sec: 14648.3). Total num frames: 120242176. Throughput: 0: 3707.2. Samples: 19231436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:22:58,970][134211] Avg episode reward: [(0, '5.997')] [2025-01-03 23:23:01,376][134294] Updated weights for policy 0, policy_version 29364 (0.0026) [2025-01-03 23:23:03,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 120303616. Throughput: 0: 3707.2. Samples: 19241292. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:23:03,969][134211] Avg episode reward: [(0, '5.861')] [2025-01-03 23:23:04,944][134294] Updated weights for policy 0, policy_version 29374 (0.0025) [2025-01-03 23:23:08,466][134294] Updated weights for policy 0, policy_version 29384 (0.0024) [2025-01-03 23:23:08,968][134211] Fps is (10 sec: 11880.1, 60 sec: 14267.7, 300 sec: 14606.8). Total num frames: 120360960. Throughput: 0: 3631.9. Samples: 19258610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:08,968][134211] Avg episode reward: [(0, '6.549')] [2025-01-03 23:23:11,786][134294] Updated weights for policy 0, policy_version 29394 (0.0027) [2025-01-03 23:23:13,968][134211] Fps is (10 sec: 11878.9, 60 sec: 14131.3, 300 sec: 14592.9). Total num frames: 120422400. Throughput: 0: 3580.2. Samples: 19276800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:13,968][134211] Avg episode reward: [(0, '5.516')] [2025-01-03 23:23:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029401_120426496.pth... 
[2025-01-03 23:23:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028559_116977664.pth [2025-01-03 23:23:14,621][134294] Updated weights for policy 0, policy_version 29404 (0.0021) [2025-01-03 23:23:16,924][134294] Updated weights for policy 0, policy_version 29414 (0.0021) [2025-01-03 23:23:18,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 120504320. Throughput: 0: 3653.8. Samples: 19290312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:18,968][134211] Avg episode reward: [(0, '6.273')] [2025-01-03 23:23:20,097][134294] Updated weights for policy 0, policy_version 29424 (0.0026) [2025-01-03 23:23:22,941][134294] Updated weights for policy 0, policy_version 29434 (0.0026) [2025-01-03 23:23:23,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14404.3, 300 sec: 14579.0). Total num frames: 120573952. Throughput: 0: 3668.8. Samples: 19310744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:23,968][134211] Avg episode reward: [(0, '6.349')] [2025-01-03 23:23:25,717][134294] Updated weights for policy 0, policy_version 29444 (0.0020) [2025-01-03 23:23:27,574][134294] Updated weights for policy 0, policy_version 29454 (0.0014) [2025-01-03 23:23:28,967][134211] Fps is (10 sec: 16793.7, 60 sec: 14882.3, 300 sec: 14690.2). Total num frames: 120672256. Throughput: 0: 3663.3. Samples: 19337510. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:23:28,968][134211] Avg episode reward: [(0, '5.889')] [2025-01-03 23:23:29,515][134294] Updated weights for policy 0, policy_version 29464 (0.0015) [2025-01-03 23:23:32,391][134294] Updated weights for policy 0, policy_version 29474 (0.0027) [2025-01-03 23:23:33,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15018.7, 300 sec: 14703.9). Total num frames: 120745984. Throughput: 0: 3590.2. Samples: 19350474. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:23:33,968][134211] Avg episode reward: [(0, '6.365')] [2025-01-03 23:23:35,512][134294] Updated weights for policy 0, policy_version 29484 (0.0023) [2025-01-03 23:23:38,440][134294] Updated weights for policy 0, policy_version 29494 (0.0027) [2025-01-03 23:23:38,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15018.7, 300 sec: 14662.3). Total num frames: 120811520. Throughput: 0: 3523.7. Samples: 19370850. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:23:38,968][134211] Avg episode reward: [(0, '6.228')] [2025-01-03 23:23:41,516][134294] Updated weights for policy 0, policy_version 29504 (0.0028) [2025-01-03 23:23:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14579.0). Total num frames: 120881152. Throughput: 0: 3539.2. Samples: 19390696. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:23:43,968][134211] Avg episode reward: [(0, '6.511')] [2025-01-03 23:23:44,618][134294] Updated weights for policy 0, policy_version 29514 (0.0026) [2025-01-03 23:23:47,844][134294] Updated weights for policy 0, policy_version 29524 (0.0023) [2025-01-03 23:23:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.9, 300 sec: 14565.1). Total num frames: 120942592. Throughput: 0: 3534.1. Samples: 19400326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:48,968][134211] Avg episode reward: [(0, '5.943')] [2025-01-03 23:23:50,761][134294] Updated weights for policy 0, policy_version 29534 (0.0023) [2025-01-03 23:23:53,697][134294] Updated weights for policy 0, policy_version 29544 (0.0024) [2025-01-03 23:23:53,968][134211] Fps is (10 sec: 13106.1, 60 sec: 13926.4, 300 sec: 14565.1). Total num frames: 121012224. Throughput: 0: 3606.7. Samples: 19420912. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:53,969][134211] Avg episode reward: [(0, '5.657')] [2025-01-03 23:23:56,676][134294] Updated weights for policy 0, policy_version 29554 (0.0024) [2025-01-03 23:23:58,662][134294] Updated weights for policy 0, policy_version 29564 (0.0017) [2025-01-03 23:23:58,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14268.1, 300 sec: 14662.3). Total num frames: 121098240. Throughput: 0: 3724.6. Samples: 19444406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:23:58,968][134211] Avg episode reward: [(0, '6.096')] [2025-01-03 23:24:01,460][134294] Updated weights for policy 0, policy_version 29574 (0.0024) [2025-01-03 23:24:03,968][134211] Fps is (10 sec: 15565.9, 60 sec: 14404.3, 300 sec: 14592.9). Total num frames: 121167872. Throughput: 0: 3685.8. Samples: 19456174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:24:03,968][134211] Avg episode reward: [(0, '6.041')] [2025-01-03 23:24:04,400][134294] Updated weights for policy 0, policy_version 29584 (0.0024) [2025-01-03 23:24:07,408][134294] Updated weights for policy 0, policy_version 29594 (0.0027) [2025-01-03 23:24:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 121233408. Throughput: 0: 3684.5. Samples: 19476548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:24:08,968][134211] Avg episode reward: [(0, '6.172')] [2025-01-03 23:24:10,368][134294] Updated weights for policy 0, policy_version 29604 (0.0025) [2025-01-03 23:24:13,284][134294] Updated weights for policy 0, policy_version 29614 (0.0025) [2025-01-03 23:24:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.5, 300 sec: 14454.0). Total num frames: 121307136. Throughput: 0: 3554.2. Samples: 19497448. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:24:13,968][134211] Avg episode reward: [(0, '6.299')] [2025-01-03 23:24:15,861][134294] Updated weights for policy 0, policy_version 29624 (0.0020) [2025-01-03 23:24:17,754][134294] Updated weights for policy 0, policy_version 29634 (0.0015) [2025-01-03 23:24:18,968][134211] Fps is (10 sec: 17202.5, 60 sec: 15018.5, 300 sec: 14579.0). Total num frames: 121405440. Throughput: 0: 3543.8. Samples: 19509946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:24:18,968][134211] Avg episode reward: [(0, '6.609')] [2025-01-03 23:24:19,640][134294] Updated weights for policy 0, policy_version 29644 (0.0014) [2025-01-03 23:24:21,485][134294] Updated weights for policy 0, policy_version 29654 (0.0015) [2025-01-03 23:24:23,384][134294] Updated weights for policy 0, policy_version 29664 (0.0014) [2025-01-03 23:24:23,968][134211] Fps is (10 sec: 20480.4, 60 sec: 15633.1, 300 sec: 14717.8). Total num frames: 121511936. Throughput: 0: 3816.3. Samples: 19542584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:24:23,968][134211] Avg episode reward: [(0, '6.266')] [2025-01-03 23:24:25,541][134294] Updated weights for policy 0, policy_version 29674 (0.0016) [2025-01-03 23:24:28,630][134294] Updated weights for policy 0, policy_version 29684 (0.0029) [2025-01-03 23:24:28,970][134211] Fps is (10 sec: 18428.5, 60 sec: 15291.1, 300 sec: 14745.5). Total num frames: 121589760. Throughput: 0: 3946.3. Samples: 19568288. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:24:28,971][134211] Avg episode reward: [(0, '6.352')] [2025-01-03 23:24:31,721][134294] Updated weights for policy 0, policy_version 29694 (0.0025) [2025-01-03 23:24:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15086.9, 300 sec: 14731.7). Total num frames: 121651200. Throughput: 0: 3945.5. Samples: 19577872. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:24:33,968][134211] Avg episode reward: [(0, '6.167')] [2025-01-03 23:24:35,073][134294] Updated weights for policy 0, policy_version 29704 (0.0025) [2025-01-03 23:24:38,277][134294] Updated weights for policy 0, policy_version 29714 (0.0022) [2025-01-03 23:24:38,968][134211] Fps is (10 sec: 12700.7, 60 sec: 15087.0, 300 sec: 14731.7). Total num frames: 121716736. Throughput: 0: 3911.1. Samples: 19596910. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:24:38,968][134211] Avg episode reward: [(0, '7.092')] [2025-01-03 23:24:41,288][134294] Updated weights for policy 0, policy_version 29724 (0.0024) [2025-01-03 23:24:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.6, 300 sec: 14717.8). Total num frames: 121782272. Throughput: 0: 3827.4. Samples: 19616640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:24:43,969][134211] Avg episode reward: [(0, '6.423')] [2025-01-03 23:24:44,656][134294] Updated weights for policy 0, policy_version 29734 (0.0024) [2025-01-03 23:24:47,909][134294] Updated weights for policy 0, policy_version 29744 (0.0026) [2025-01-03 23:24:48,968][134211] Fps is (10 sec: 12696.9, 60 sec: 15018.5, 300 sec: 14703.9). Total num frames: 121843712. Throughput: 0: 3767.6. Samples: 19625716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:24:48,969][134211] Avg episode reward: [(0, '6.732')] [2025-01-03 23:24:51,149][134294] Updated weights for policy 0, policy_version 29754 (0.0029) [2025-01-03 23:24:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14882.3, 300 sec: 14662.3). Total num frames: 121905152. Throughput: 0: 3728.0. Samples: 19644308. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:24:53,968][134211] Avg episode reward: [(0, '6.875')] [2025-01-03 23:24:54,378][134294] Updated weights for policy 0, policy_version 29764 (0.0028) [2025-01-03 23:24:57,393][134294] Updated weights for policy 0, policy_version 29774 (0.0025) [2025-01-03 23:24:58,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14540.8, 300 sec: 14606.8). Total num frames: 121970688. Throughput: 0: 3712.9. Samples: 19664528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:24:58,968][134211] Avg episode reward: [(0, '6.435')] [2025-01-03 23:25:00,430][134294] Updated weights for policy 0, policy_version 29784 (0.0026) [2025-01-03 23:25:03,466][134294] Updated weights for policy 0, policy_version 29794 (0.0026) [2025-01-03 23:25:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 122040320. Throughput: 0: 3658.6. Samples: 19674582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:25:03,968][134211] Avg episode reward: [(0, '6.781')] [2025-01-03 23:25:06,594][134294] Updated weights for policy 0, policy_version 29804 (0.0022) [2025-01-03 23:25:08,727][134294] Updated weights for policy 0, policy_version 29814 (0.0013) [2025-01-03 23:25:08,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14813.9, 300 sec: 14648.4). Total num frames: 122122240. Throughput: 0: 3392.5. Samples: 19695248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:25:08,968][134211] Avg episode reward: [(0, '6.661')] [2025-01-03 23:25:10,706][134294] Updated weights for policy 0, policy_version 29824 (0.0012) [2025-01-03 23:25:12,594][134294] Updated weights for policy 0, policy_version 29834 (0.0012) [2025-01-03 23:25:13,967][134211] Fps is (10 sec: 18842.2, 60 sec: 15360.1, 300 sec: 14634.5). Total num frames: 122228736. Throughput: 0: 3516.9. Samples: 19726540. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:25:13,968][134211] Avg episode reward: [(0, '6.148')] [2025-01-03 23:25:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029841_122228736.pth... [2025-01-03 23:25:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028979_118697984.pth [2025-01-03 23:25:14,466][134294] Updated weights for policy 0, policy_version 29844 (0.0012) [2025-01-03 23:25:16,357][134294] Updated weights for policy 0, policy_version 29854 (0.0013) [2025-01-03 23:25:18,969][134211] Fps is (10 sec: 19658.8, 60 sec: 15223.4, 300 sec: 14662.3). Total num frames: 122318848. Throughput: 0: 3664.0. Samples: 19742756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:25:18,970][134211] Avg episode reward: [(0, '5.525')] [2025-01-03 23:25:18,996][134294] Updated weights for policy 0, policy_version 29864 (0.0024) [2025-01-03 23:25:22,349][134294] Updated weights for policy 0, policy_version 29874 (0.0029) [2025-01-03 23:25:23,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14540.7, 300 sec: 14676.2). Total num frames: 122384384. Throughput: 0: 3705.6. Samples: 19763662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:25:23,968][134211] Avg episode reward: [(0, '6.217')] [2025-01-03 23:25:25,499][134294] Updated weights for policy 0, policy_version 29884 (0.0028) [2025-01-03 23:25:28,553][134294] Updated weights for policy 0, policy_version 29894 (0.0024) [2025-01-03 23:25:28,968][134211] Fps is (10 sec: 13108.3, 60 sec: 14336.6, 300 sec: 14676.2). Total num frames: 122449920. Throughput: 0: 3704.8. Samples: 19783354. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:25:28,968][134211] Avg episode reward: [(0, '6.478')] [2025-01-03 23:25:31,542][134294] Updated weights for policy 0, policy_version 29904 (0.0022) [2025-01-03 23:25:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.2, 300 sec: 14662.3). Total num frames: 122515456. Throughput: 0: 3729.8. Samples: 19793554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:25:33,968][134211] Avg episode reward: [(0, '6.233')] [2025-01-03 23:25:34,742][134294] Updated weights for policy 0, policy_version 29914 (0.0029) [2025-01-03 23:25:37,809][134294] Updated weights for policy 0, policy_version 29924 (0.0025) [2025-01-03 23:25:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14662.3). Total num frames: 122580992. Throughput: 0: 3758.6. Samples: 19813444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:25:38,968][134211] Avg episode reward: [(0, '7.109')] [2025-01-03 23:25:40,865][134294] Updated weights for policy 0, policy_version 29934 (0.0025) [2025-01-03 23:25:43,778][134294] Updated weights for policy 0, policy_version 29944 (0.0028) [2025-01-03 23:25:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 122650624. Throughput: 0: 3761.0. Samples: 19833774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:25:43,968][134211] Avg episode reward: [(0, '6.592')] [2025-01-03 23:25:46,857][134294] Updated weights for policy 0, policy_version 29954 (0.0027) [2025-01-03 23:25:48,969][134211] Fps is (10 sec: 13514.5, 60 sec: 14540.5, 300 sec: 14662.2). Total num frames: 122716160. Throughput: 0: 3758.5. Samples: 19843722. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:25:48,970][134211] Avg episode reward: [(0, '7.009')] [2025-01-03 23:25:49,993][134294] Updated weights for policy 0, policy_version 29964 (0.0024) [2025-01-03 23:25:53,045][134294] Updated weights for policy 0, policy_version 29974 (0.0025) [2025-01-03 23:25:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 14648.4). Total num frames: 122781696. Throughput: 0: 3748.9. Samples: 19863950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:25:53,968][134211] Avg episode reward: [(0, '5.619')] [2025-01-03 23:25:55,954][134294] Updated weights for policy 0, policy_version 29984 (0.0023) [2025-01-03 23:25:58,809][134294] Updated weights for policy 0, policy_version 29994 (0.0024) [2025-01-03 23:25:58,968][134211] Fps is (10 sec: 13928.6, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 122855424. Throughput: 0: 3513.8. Samples: 19884664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:25:58,968][134211] Avg episode reward: [(0, '6.662')] [2025-01-03 23:26:01,852][134294] Updated weights for policy 0, policy_version 30004 (0.0024) [2025-01-03 23:26:03,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14677.2, 300 sec: 14565.1). Total num frames: 122920960. Throughput: 0: 3381.7. Samples: 19894930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:03,969][134211] Avg episode reward: [(0, '6.025')] [2025-01-03 23:26:05,057][134294] Updated weights for policy 0, policy_version 30014 (0.0028) [2025-01-03 23:26:07,754][134294] Updated weights for policy 0, policy_version 30024 (0.0021) [2025-01-03 23:26:08,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14677.3, 300 sec: 14592.9). Total num frames: 123002880. Throughput: 0: 3359.6. Samples: 19914842. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:08,968][134211] Avg episode reward: [(0, '6.469')] [2025-01-03 23:26:09,646][134294] Updated weights for policy 0, policy_version 30034 (0.0012) [2025-01-03 23:26:11,524][134294] Updated weights for policy 0, policy_version 30044 (0.0016) [2025-01-03 23:26:13,430][134294] Updated weights for policy 0, policy_version 30054 (0.0013) [2025-01-03 23:26:13,968][134211] Fps is (10 sec: 18843.0, 60 sec: 14677.3, 300 sec: 14731.7). Total num frames: 123109376. Throughput: 0: 3642.3. Samples: 19947258. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:13,968][134211] Avg episode reward: [(0, '6.374')] [2025-01-03 23:26:15,421][134294] Updated weights for policy 0, policy_version 30064 (0.0017) [2025-01-03 23:26:18,448][134294] Updated weights for policy 0, policy_version 30074 (0.0025) [2025-01-03 23:26:18,968][134211] Fps is (10 sec: 18431.7, 60 sec: 14472.7, 300 sec: 14759.5). Total num frames: 123187200. Throughput: 0: 3730.4. Samples: 19961422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:18,968][134211] Avg episode reward: [(0, '6.476')] [2025-01-03 23:26:21,507][134294] Updated weights for policy 0, policy_version 30084 (0.0027) [2025-01-03 23:26:23,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14472.5, 300 sec: 14759.5). Total num frames: 123252736. Throughput: 0: 3727.7. Samples: 19981190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:23,968][134211] Avg episode reward: [(0, '6.723')] [2025-01-03 23:26:24,732][134294] Updated weights for policy 0, policy_version 30094 (0.0026) [2025-01-03 23:26:27,950][134294] Updated weights for policy 0, policy_version 30104 (0.0025) [2025-01-03 23:26:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 123314176. Throughput: 0: 3702.3. Samples: 20000376. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:28,968][134211] Avg episode reward: [(0, '6.347')] [2025-01-03 23:26:31,118][134294] Updated weights for policy 0, policy_version 30114 (0.0024) [2025-01-03 23:26:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 123383808. Throughput: 0: 3694.3. Samples: 20009958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:33,968][134211] Avg episode reward: [(0, '6.359')] [2025-01-03 23:26:34,333][134294] Updated weights for policy 0, policy_version 30124 (0.0027) [2025-01-03 23:26:37,279][134294] Updated weights for policy 0, policy_version 30134 (0.0025) [2025-01-03 23:26:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14551.2). Total num frames: 123445248. Throughput: 0: 3686.3. Samples: 20029834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:38,968][134211] Avg episode reward: [(0, '6.875')] [2025-01-03 23:26:40,474][134294] Updated weights for policy 0, policy_version 30144 (0.0026) [2025-01-03 23:26:43,343][134294] Updated weights for policy 0, policy_version 30154 (0.0025) [2025-01-03 23:26:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 14454.0). Total num frames: 123518976. Throughput: 0: 3680.0. Samples: 20050264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:43,968][134211] Avg episode reward: [(0, '6.961')] [2025-01-03 23:26:46,375][134294] Updated weights for policy 0, policy_version 30164 (0.0026) [2025-01-03 23:26:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.9, 300 sec: 14481.8). Total num frames: 123584512. Throughput: 0: 3681.0. Samples: 20060574. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:48,968][134211] Avg episode reward: [(0, '6.603')] [2025-01-03 23:26:49,388][134294] Updated weights for policy 0, policy_version 30174 (0.0026) [2025-01-03 23:26:52,308][134294] Updated weights for policy 0, policy_version 30184 (0.0025) [2025-01-03 23:26:53,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14537.4). Total num frames: 123666432. Throughput: 0: 3695.1. Samples: 20081120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:53,968][134211] Avg episode reward: [(0, '7.060')] [2025-01-03 23:26:54,195][134294] Updated weights for policy 0, policy_version 30194 (0.0016) [2025-01-03 23:26:56,071][134294] Updated weights for policy 0, policy_version 30204 (0.0014) [2025-01-03 23:26:57,936][134294] Updated weights for policy 0, policy_version 30214 (0.0013) [2025-01-03 23:26:58,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15360.0, 300 sec: 14690.1). Total num frames: 123777024. Throughput: 0: 3697.2. Samples: 20113634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:26:58,968][134211] Avg episode reward: [(0, '6.036')] [2025-01-03 23:26:59,862][134294] Updated weights for policy 0, policy_version 30224 (0.0013) [2025-01-03 23:27:03,087][134294] Updated weights for policy 0, policy_version 30234 (0.0023) [2025-01-03 23:27:03,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15360.2, 300 sec: 14703.9). Total num frames: 123842560. Throughput: 0: 3692.7. Samples: 20127592. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:03,968][134211] Avg episode reward: [(0, '6.633')] [2025-01-03 23:27:06,950][134294] Updated weights for policy 0, policy_version 30244 (0.0030) [2025-01-03 23:27:08,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 123899904. Throughput: 0: 3609.7. Samples: 20143626. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:08,969][134211] Avg episode reward: [(0, '6.442')] [2025-01-03 23:27:10,937][134294] Updated weights for policy 0, policy_version 30254 (0.0029) [2025-01-03 23:27:13,229][134294] Updated weights for policy 0, policy_version 30264 (0.0017) [2025-01-03 23:27:13,967][134211] Fps is (10 sec: 13107.5, 60 sec: 14404.3, 300 sec: 14690.1). Total num frames: 123973632. Throughput: 0: 3621.6. Samples: 20163346. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:13,968][134211] Avg episode reward: [(0, '6.848')] [2025-01-03 23:27:13,996][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030268_123977728.pth... [2025-01-03 23:27:14,041][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029401_120426496.pth [2025-01-03 23:27:15,225][134294] Updated weights for policy 0, policy_version 30274 (0.0014) [2025-01-03 23:27:17,137][134294] Updated weights for policy 0, policy_version 30284 (0.0014) [2025-01-03 23:27:18,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14882.2, 300 sec: 14815.0). Total num frames: 124080128. Throughput: 0: 3755.8. Samples: 20178970. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:18,968][134211] Avg episode reward: [(0, '6.368')] [2025-01-03 23:27:19,116][134294] Updated weights for policy 0, policy_version 30294 (0.0016) [2025-01-03 23:27:22,197][134294] Updated weights for policy 0, policy_version 30304 (0.0028) [2025-01-03 23:27:23,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14882.2, 300 sec: 14801.2). Total num frames: 124145664. Throughput: 0: 3880.9. Samples: 20204476. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:23,968][134211] Avg episode reward: [(0, '6.894')] [2025-01-03 23:27:25,398][134294] Updated weights for policy 0, policy_version 30314 (0.0025) [2025-01-03 23:27:28,453][134294] Updated weights for policy 0, policy_version 30324 (0.0025) [2025-01-03 23:27:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 124211200. Throughput: 0: 3862.2. Samples: 20224062. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:28,968][134211] Avg episode reward: [(0, '6.817')] [2025-01-03 23:27:31,526][134294] Updated weights for policy 0, policy_version 30334 (0.0027) [2025-01-03 23:27:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 14801.1). Total num frames: 124276736. Throughput: 0: 3856.0. Samples: 20234096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:33,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-03 23:27:34,698][134294] Updated weights for policy 0, policy_version 30344 (0.0023) [2025-01-03 23:27:37,761][134294] Updated weights for policy 0, policy_version 30354 (0.0025) [2025-01-03 23:27:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14950.4, 300 sec: 14676.2). Total num frames: 124342272. Throughput: 0: 3836.9. Samples: 20253782. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:27:38,968][134211] Avg episode reward: [(0, '6.145')] [2025-01-03 23:27:40,713][134294] Updated weights for policy 0, policy_version 30364 (0.0027) [2025-01-03 23:27:43,720][134294] Updated weights for policy 0, policy_version 30374 (0.0025) [2025-01-03 23:27:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.1, 300 sec: 14565.1). Total num frames: 124411904. Throughput: 0: 3575.1. Samples: 20274516. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:27:43,968][134211] Avg episode reward: [(0, '5.905')] [2025-01-03 23:27:46,748][134294] Updated weights for policy 0, policy_version 30384 (0.0021) [2025-01-03 23:27:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14592.9). Total num frames: 124481536. Throughput: 0: 3484.8. Samples: 20284406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:27:48,968][134211] Avg episode reward: [(0, '5.857')] [2025-01-03 23:27:49,916][134294] Updated weights for policy 0, policy_version 30394 (0.0026) [2025-01-03 23:27:52,840][134294] Updated weights for policy 0, policy_version 30404 (0.0026) [2025-01-03 23:27:53,971][134211] Fps is (10 sec: 13512.6, 60 sec: 14676.5, 300 sec: 14592.8). Total num frames: 124547072. Throughput: 0: 3575.9. Samples: 20304552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:27:53,972][134211] Avg episode reward: [(0, '5.993')] [2025-01-03 23:27:55,938][134294] Updated weights for policy 0, policy_version 30414 (0.0023) [2025-01-03 23:27:58,764][134294] Updated weights for policy 0, policy_version 30424 (0.0022) [2025-01-03 23:27:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14620.7). Total num frames: 124616704. Throughput: 0: 3598.1. Samples: 20325262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:27:58,968][134211] Avg episode reward: [(0, '6.432')] [2025-01-03 23:28:01,816][134294] Updated weights for policy 0, policy_version 30434 (0.0026) [2025-01-03 23:28:03,968][134211] Fps is (10 sec: 13930.9, 60 sec: 14063.0, 300 sec: 14662.3). Total num frames: 124686336. Throughput: 0: 3478.5. Samples: 20335504. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:03,968][134211] Avg episode reward: [(0, '6.380')] [2025-01-03 23:28:04,792][134294] Updated weights for policy 0, policy_version 30444 (0.0024) [2025-01-03 23:28:06,693][134294] Updated weights for policy 0, policy_version 30454 (0.0014) [2025-01-03 23:28:08,553][134294] Updated weights for policy 0, policy_version 30464 (0.0012) [2025-01-03 23:28:08,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14813.9, 300 sec: 14801.1). Total num frames: 124788736. Throughput: 0: 3474.8. Samples: 20360840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:08,968][134211] Avg episode reward: [(0, '6.181')] [2025-01-03 23:28:10,458][134294] Updated weights for policy 0, policy_version 30474 (0.0014) [2025-01-03 23:28:13,007][134294] Updated weights for policy 0, policy_version 30484 (0.0022) [2025-01-03 23:28:13,968][134211] Fps is (10 sec: 18431.7, 60 sec: 14950.3, 300 sec: 14801.1). Total num frames: 124870656. Throughput: 0: 3670.5. Samples: 20389236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:13,969][134211] Avg episode reward: [(0, '6.006')] [2025-01-03 23:28:16,287][134294] Updated weights for policy 0, policy_version 30494 (0.0031) [2025-01-03 23:28:18,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14267.7, 300 sec: 14787.2). Total num frames: 124936192. Throughput: 0: 3657.1. Samples: 20398664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:18,969][134211] Avg episode reward: [(0, '5.947')] [2025-01-03 23:28:19,728][134294] Updated weights for policy 0, policy_version 30504 (0.0028) [2025-01-03 23:28:22,818][134294] Updated weights for policy 0, policy_version 30514 (0.0025) [2025-01-03 23:28:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14267.7, 300 sec: 14676.2). Total num frames: 125001728. Throughput: 0: 3640.0. Samples: 20417580. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:23,968][134211] Avg episode reward: [(0, '6.626')] [2025-01-03 23:28:25,735][134294] Updated weights for policy 0, policy_version 30524 (0.0024) [2025-01-03 23:28:28,745][134294] Updated weights for policy 0, policy_version 30534 (0.0025) [2025-01-03 23:28:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14267.7, 300 sec: 14648.4). Total num frames: 125067264. Throughput: 0: 3637.3. Samples: 20438194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:28,968][134211] Avg episode reward: [(0, '7.094')] [2025-01-03 23:28:31,717][134294] Updated weights for policy 0, policy_version 30544 (0.0023) [2025-01-03 23:28:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 125136896. Throughput: 0: 3642.7. Samples: 20448328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:33,970][134211] Avg episode reward: [(0, '6.237')] [2025-01-03 23:28:34,934][134294] Updated weights for policy 0, policy_version 30554 (0.0025) [2025-01-03 23:28:37,864][134294] Updated weights for policy 0, policy_version 30564 (0.0021) [2025-01-03 23:28:38,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 125210624. Throughput: 0: 3621.0. Samples: 20467484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:38,968][134211] Avg episode reward: [(0, '6.536')] [2025-01-03 23:28:39,748][134294] Updated weights for policy 0, policy_version 30574 (0.0013) [2025-01-03 23:28:41,650][134294] Updated weights for policy 0, policy_version 30584 (0.0011) [2025-01-03 23:28:43,535][134294] Updated weights for policy 0, policy_version 30594 (0.0013) [2025-01-03 23:28:43,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15087.0, 300 sec: 14828.9). Total num frames: 125317120. Throughput: 0: 3874.5. Samples: 20499614. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:43,969][134211] Avg episode reward: [(0, '6.766')] [2025-01-03 23:28:46,400][134294] Updated weights for policy 0, policy_version 30604 (0.0027) [2025-01-03 23:28:48,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15018.7, 300 sec: 14815.1). Total num frames: 125382656. Throughput: 0: 3897.9. Samples: 20510908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:48,968][134211] Avg episode reward: [(0, '6.065')] [2025-01-03 23:28:49,805][134294] Updated weights for policy 0, policy_version 30614 (0.0027) [2025-01-03 23:28:52,807][134294] Updated weights for policy 0, policy_version 30624 (0.0025) [2025-01-03 23:28:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15019.4, 300 sec: 14745.6). Total num frames: 125448192. Throughput: 0: 3759.8. Samples: 20530032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:53,968][134211] Avg episode reward: [(0, '6.602')] [2025-01-03 23:28:55,973][134294] Updated weights for policy 0, policy_version 30634 (0.0023) [2025-01-03 23:28:58,919][134294] Updated weights for policy 0, policy_version 30644 (0.0026) [2025-01-03 23:28:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.6, 300 sec: 14745.6). Total num frames: 125517824. Throughput: 0: 3576.9. Samples: 20550196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:28:58,968][134211] Avg episode reward: [(0, '6.670')] [2025-01-03 23:29:01,895][134294] Updated weights for policy 0, policy_version 30654 (0.0023) [2025-01-03 23:29:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.1, 300 sec: 14731.7). Total num frames: 125579264. Throughput: 0: 3590.6. Samples: 20560242. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:03,968][134211] Avg episode reward: [(0, '6.226')] [2025-01-03 23:29:05,549][134294] Updated weights for policy 0, policy_version 30664 (0.0027) [2025-01-03 23:29:08,185][134294] Updated weights for policy 0, policy_version 30674 (0.0017) [2025-01-03 23:29:08,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 125652992. Throughput: 0: 3579.3. Samples: 20578646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:08,968][134211] Avg episode reward: [(0, '6.095')] [2025-01-03 23:29:10,215][134294] Updated weights for policy 0, policy_version 30684 (0.0014) [2025-01-03 23:29:12,120][134294] Updated weights for policy 0, policy_version 30694 (0.0015) [2025-01-03 23:29:13,954][134294] Updated weights for policy 0, policy_version 30704 (0.0014) [2025-01-03 23:29:13,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14882.2, 300 sec: 14773.4). Total num frames: 125763584. Throughput: 0: 3811.6. Samples: 20609714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:13,968][134211] Avg episode reward: [(0, '6.698')] [2025-01-03 23:29:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030704_125763584.pth... [2025-01-03 23:29:14,026][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029841_122228736.pth [2025-01-03 23:29:15,892][134294] Updated weights for policy 0, policy_version 30714 (0.0015) [2025-01-03 23:29:17,826][134294] Updated weights for policy 0, policy_version 30724 (0.0013) [2025-01-03 23:29:18,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15428.3, 300 sec: 14745.6). Total num frames: 125861888. Throughput: 0: 3945.6. Samples: 20625878. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:18,968][134211] Avg episode reward: [(0, '6.253')] [2025-01-03 23:29:20,681][134294] Updated weights for policy 0, policy_version 30734 (0.0025) [2025-01-03 23:29:23,832][134294] Updated weights for policy 0, policy_version 30744 (0.0026) [2025-01-03 23:29:23,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15428.3, 300 sec: 14704.1). Total num frames: 125927424. Throughput: 0: 4051.3. Samples: 20649794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:23,968][134211] Avg episode reward: [(0, '6.279')] [2025-01-03 23:29:26,978][134294] Updated weights for policy 0, policy_version 30754 (0.0027) [2025-01-03 23:29:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15360.0, 300 sec: 14703.9). Total num frames: 125988864. Throughput: 0: 3763.5. Samples: 20668970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:28,968][134211] Avg episode reward: [(0, '6.374')] [2025-01-03 23:29:30,240][134294] Updated weights for policy 0, policy_version 30764 (0.0025) [2025-01-03 23:29:33,254][134294] Updated weights for policy 0, policy_version 30774 (0.0025) [2025-01-03 23:29:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15360.0, 300 sec: 14717.8). Total num frames: 126058496. Throughput: 0: 3733.2. Samples: 20678900. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:33,968][134211] Avg episode reward: [(0, '6.865')] [2025-01-03 23:29:36,224][134294] Updated weights for policy 0, policy_version 30784 (0.0024) [2025-01-03 23:29:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.4, 300 sec: 14717.8). Total num frames: 126124032. Throughput: 0: 3759.9. Samples: 20699228. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:38,968][134211] Avg episode reward: [(0, '6.484')] [2025-01-03 23:29:39,333][134294] Updated weights for policy 0, policy_version 30794 (0.0026) [2025-01-03 23:29:42,462][134294] Updated weights for policy 0, policy_version 30804 (0.0025) [2025-01-03 23:29:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14731.7). Total num frames: 126189568. Throughput: 0: 3745.8. Samples: 20718756. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:43,968][134211] Avg episode reward: [(0, '6.057')] [2025-01-03 23:29:45,529][134294] Updated weights for policy 0, policy_version 30814 (0.0026) [2025-01-03 23:29:48,530][134294] Updated weights for policy 0, policy_version 30824 (0.0024) [2025-01-03 23:29:48,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14609.0, 300 sec: 14759.5). Total num frames: 126259200. Throughput: 0: 3748.2. Samples: 20728912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:48,970][134211] Avg episode reward: [(0, '6.206')] [2025-01-03 23:29:51,470][134294] Updated weights for policy 0, policy_version 30834 (0.0023) [2025-01-03 23:29:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14773.4). Total num frames: 126328832. Throughput: 0: 3799.7. Samples: 20749636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:53,969][134211] Avg episode reward: [(0, '6.532')] [2025-01-03 23:29:54,487][134294] Updated weights for policy 0, policy_version 30844 (0.0028) [2025-01-03 23:29:57,557][134294] Updated weights for policy 0, policy_version 30854 (0.0027) [2025-01-03 23:29:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 126394368. Throughput: 0: 3556.2. Samples: 20769742. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:29:58,968][134211] Avg episode reward: [(0, '6.481')] [2025-01-03 23:30:00,542][134294] Updated weights for policy 0, policy_version 30864 (0.0025) [2025-01-03 23:30:03,515][134294] Updated weights for policy 0, policy_version 30874 (0.0024) [2025-01-03 23:30:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 126464000. Throughput: 0: 3428.7. Samples: 20780170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:30:03,968][134211] Avg episode reward: [(0, '7.018')] [2025-01-03 23:30:06,484][134294] Updated weights for policy 0, policy_version 30884 (0.0023) [2025-01-03 23:30:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14579.0). Total num frames: 126529536. Throughput: 0: 3353.1. Samples: 20800684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:30:08,968][134211] Avg episode reward: [(0, '6.588')] [2025-01-03 23:30:09,601][134294] Updated weights for policy 0, policy_version 30894 (0.0023) [2025-01-03 23:30:12,398][134294] Updated weights for policy 0, policy_version 30904 (0.0022) [2025-01-03 23:30:13,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14199.5, 300 sec: 14565.1). Total num frames: 126615552. Throughput: 0: 3422.5. Samples: 20822980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:30:13,968][134211] Avg episode reward: [(0, '6.349')] [2025-01-03 23:30:14,311][134294] Updated weights for policy 0, policy_version 30914 (0.0013) [2025-01-03 23:30:16,183][134294] Updated weights for policy 0, policy_version 30924 (0.0012) [2025-01-03 23:30:18,039][134294] Updated weights for policy 0, policy_version 30934 (0.0014) [2025-01-03 23:30:18,968][134211] Fps is (10 sec: 19251.6, 60 sec: 14336.0, 300 sec: 14704.0). Total num frames: 126722048. Throughput: 0: 3563.7. Samples: 20839268. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:30:18,968][134211] Avg episode reward: [(0, '6.626')] [2025-01-03 23:30:19,951][134294] Updated weights for policy 0, policy_version 30944 (0.0012) [2025-01-03 23:30:21,913][134294] Updated weights for policy 0, policy_version 30954 (0.0015) [2025-01-03 23:30:23,968][134211] Fps is (10 sec: 19660.3, 60 sec: 14745.6, 300 sec: 14787.3). Total num frames: 126812160. Throughput: 0: 3821.2. Samples: 20871182. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:30:23,969][134211] Avg episode reward: [(0, '6.268')] [2025-01-03 23:30:24,963][134294] Updated weights for policy 0, policy_version 30964 (0.0025) [2025-01-03 23:30:28,334][134294] Updated weights for policy 0, policy_version 30974 (0.0030) [2025-01-03 23:30:28,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14813.9, 300 sec: 14787.3). Total num frames: 126877696. Throughput: 0: 3811.9. Samples: 20890294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:30:28,968][134211] Avg episode reward: [(0, '6.316')] [2025-01-03 23:30:31,405][134294] Updated weights for policy 0, policy_version 30984 (0.0028) [2025-01-03 23:30:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14677.3, 300 sec: 14773.4). Total num frames: 126939136. Throughput: 0: 3801.9. Samples: 20899998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:30:33,968][134211] Avg episode reward: [(0, '5.746')] [2025-01-03 23:30:34,867][134294] Updated weights for policy 0, policy_version 30994 (0.0030) [2025-01-03 23:30:37,845][134294] Updated weights for policy 0, policy_version 31004 (0.0029) [2025-01-03 23:30:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14677.3, 300 sec: 14759.5). Total num frames: 127004672. Throughput: 0: 3760.8. Samples: 20918870. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:30:38,968][134211] Avg episode reward: [(0, '6.713')] [2025-01-03 23:30:41,054][134294] Updated weights for policy 0, policy_version 31014 (0.0026) [2025-01-03 23:30:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14677.3, 300 sec: 14759.6). Total num frames: 127070208. Throughput: 0: 3751.6. Samples: 20938564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:30:43,968][134211] Avg episode reward: [(0, '6.700')] [2025-01-03 23:30:44,117][134294] Updated weights for policy 0, policy_version 31024 (0.0025) [2025-01-03 23:30:47,200][134294] Updated weights for policy 0, policy_version 31034 (0.0027) [2025-01-03 23:30:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 127135744. Throughput: 0: 3739.5. Samples: 20948448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:30:48,968][134211] Avg episode reward: [(0, '6.010')] [2025-01-03 23:30:50,160][134294] Updated weights for policy 0, policy_version 31044 (0.0023) [2025-01-03 23:30:53,134][134294] Updated weights for policy 0, policy_version 31054 (0.0025) [2025-01-03 23:30:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.1, 300 sec: 14745.6). Total num frames: 127205376. Throughput: 0: 3744.8. Samples: 20969200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:30:53,968][134211] Avg episode reward: [(0, '6.323')] [2025-01-03 23:30:56,181][134294] Updated weights for policy 0, policy_version 31064 (0.0025) [2025-01-03 23:30:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14759.5). Total num frames: 127275008. Throughput: 0: 3697.8. Samples: 20989382. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:30:58,968][134211] Avg episode reward: [(0, '5.711')] [2025-01-03 23:30:59,193][134294] Updated weights for policy 0, policy_version 31074 (0.0026) [2025-01-03 23:31:01,149][134294] Updated weights for policy 0, policy_version 31084 (0.0016) [2025-01-03 23:31:03,094][134294] Updated weights for policy 0, policy_version 31094 (0.0015) [2025-01-03 23:31:03,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15086.9, 300 sec: 14801.1). Total num frames: 127369216. Throughput: 0: 3653.1. Samples: 21003656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:31:03,968][134211] Avg episode reward: [(0, '6.308')] [2025-01-03 23:31:06,404][134294] Updated weights for policy 0, policy_version 31104 (0.0028) [2025-01-03 23:31:08,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15018.7, 300 sec: 14648.4). Total num frames: 127430656. Throughput: 0: 3433.6. Samples: 21025694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:31:08,968][134211] Avg episode reward: [(0, '6.128')] [2025-01-03 23:31:09,987][134294] Updated weights for policy 0, policy_version 31114 (0.0022) [2025-01-03 23:31:12,530][134294] Updated weights for policy 0, policy_version 31124 (0.0016) [2025-01-03 23:31:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 127512576. Throughput: 0: 3496.8. Samples: 21047650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:31:13,968][134211] Avg episode reward: [(0, '6.195')] [2025-01-03 23:31:14,040][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031132_127516672.pth... 
[2025-01-03 23:31:14,082][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030268_123977728.pth [2025-01-03 23:31:14,492][134294] Updated weights for policy 0, policy_version 31134 (0.0012) [2025-01-03 23:31:16,339][134294] Updated weights for policy 0, policy_version 31144 (0.0014) [2025-01-03 23:31:18,200][134294] Updated weights for policy 0, policy_version 31154 (0.0013) [2025-01-03 23:31:18,967][134211] Fps is (10 sec: 19251.8, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 127623168. Throughput: 0: 3641.9. Samples: 21063884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:31:18,968][134211] Avg episode reward: [(0, '6.600')] [2025-01-03 23:31:20,087][134294] Updated weights for policy 0, policy_version 31164 (0.0014) [2025-01-03 23:31:22,606][134294] Updated weights for policy 0, policy_version 31174 (0.0023) [2025-01-03 23:31:23,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14882.1, 300 sec: 14884.4). Total num frames: 127705088. Throughput: 0: 3892.1. Samples: 21094014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:23,968][134211] Avg episode reward: [(0, '6.496')] [2025-01-03 23:31:25,863][134294] Updated weights for policy 0, policy_version 31184 (0.0026) [2025-01-03 23:31:28,941][134294] Updated weights for policy 0, policy_version 31194 (0.0025) [2025-01-03 23:31:28,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14882.1, 300 sec: 14870.6). Total num frames: 127770624. Throughput: 0: 3882.8. Samples: 21113290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:28,968][134211] Avg episode reward: [(0, '6.292')] [2025-01-03 23:31:32,054][134294] Updated weights for policy 0, policy_version 31204 (0.0025) [2025-01-03 23:31:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14884.4). Total num frames: 127836160. Throughput: 0: 3879.9. Samples: 21123046. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:33,968][134211] Avg episode reward: [(0, '6.887')] [2025-01-03 23:31:35,119][134294] Updated weights for policy 0, policy_version 31214 (0.0024) [2025-01-03 23:31:38,297][134294] Updated weights for policy 0, policy_version 31224 (0.0026) [2025-01-03 23:31:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.2, 300 sec: 14842.8). Total num frames: 127897600. Throughput: 0: 3862.4. Samples: 21143008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:38,968][134211] Avg episode reward: [(0, '5.461')] [2025-01-03 23:31:41,352][134294] Updated weights for policy 0, policy_version 31234 (0.0026) [2025-01-03 23:31:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 127967232. Throughput: 0: 3850.7. Samples: 21162664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:43,968][134211] Avg episode reward: [(0, '6.189')] [2025-01-03 23:31:44,611][134294] Updated weights for policy 0, policy_version 31244 (0.0027) [2025-01-03 23:31:47,702][134294] Updated weights for policy 0, policy_version 31254 (0.0024) [2025-01-03 23:31:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 14787.3). Total num frames: 128028672. Throughput: 0: 3747.2. Samples: 21172282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:48,968][134211] Avg episode reward: [(0, '6.109')] [2025-01-03 23:31:50,664][134294] Updated weights for policy 0, policy_version 31264 (0.0024) [2025-01-03 23:31:53,759][134294] Updated weights for policy 0, policy_version 31274 (0.0025) [2025-01-03 23:31:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14648.4). Total num frames: 128098304. Throughput: 0: 3709.6. Samples: 21192628. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:53,968][134211] Avg episode reward: [(0, '6.494')] [2025-01-03 23:31:56,809][134294] Updated weights for policy 0, policy_version 31284 (0.0026) [2025-01-03 23:31:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 128167936. Throughput: 0: 3666.3. Samples: 21212632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:31:58,968][134211] Avg episode reward: [(0, '5.489')] [2025-01-03 23:31:59,912][134294] Updated weights for policy 0, policy_version 31294 (0.0023) [2025-01-03 23:32:02,976][134294] Updated weights for policy 0, policy_version 31304 (0.0025) [2025-01-03 23:32:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.2, 300 sec: 14690.1). Total num frames: 128233472. Throughput: 0: 3528.1. Samples: 21222648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:03,969][134211] Avg episode reward: [(0, '6.244')] [2025-01-03 23:32:05,777][134294] Updated weights for policy 0, policy_version 31314 (0.0020) [2025-01-03 23:32:07,782][134294] Updated weights for policy 0, policy_version 31324 (0.0014) [2025-01-03 23:32:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14813.9, 300 sec: 14731.7). Total num frames: 128319488. Throughput: 0: 3384.3. Samples: 21246306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:08,968][134211] Avg episode reward: [(0, '5.727')] [2025-01-03 23:32:10,659][134294] Updated weights for policy 0, policy_version 31334 (0.0023) [2025-01-03 23:32:13,587][134294] Updated weights for policy 0, policy_version 31344 (0.0024) [2025-01-03 23:32:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14609.0, 300 sec: 14606.7). Total num frames: 128389120. Throughput: 0: 3440.0. Samples: 21268090. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:13,968][134211] Avg episode reward: [(0, '5.625')] [2025-01-03 23:32:16,631][134294] Updated weights for policy 0, policy_version 31354 (0.0025) [2025-01-03 23:32:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14606.8). Total num frames: 128454656. Throughput: 0: 3447.9. Samples: 21278202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:18,968][134211] Avg episode reward: [(0, '5.780')] [2025-01-03 23:32:19,841][134294] Updated weights for policy 0, policy_version 31364 (0.0025) [2025-01-03 23:32:22,040][134294] Updated weights for policy 0, policy_version 31374 (0.0016) [2025-01-03 23:32:23,949][134294] Updated weights for policy 0, policy_version 31384 (0.0012) [2025-01-03 23:32:23,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14063.0, 300 sec: 14704.0). Total num frames: 128548864. Throughput: 0: 3512.0. Samples: 21301046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:23,968][134211] Avg episode reward: [(0, '5.447')] [2025-01-03 23:32:25,847][134294] Updated weights for policy 0, policy_version 31394 (0.0013) [2025-01-03 23:32:27,665][134294] Updated weights for policy 0, policy_version 31404 (0.0012) [2025-01-03 23:32:28,968][134211] Fps is (10 sec: 20070.4, 60 sec: 14745.6, 300 sec: 14842.8). Total num frames: 128655360. Throughput: 0: 3799.5. Samples: 21333640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:28,968][134211] Avg episode reward: [(0, '6.593')] [2025-01-03 23:32:30,085][134294] Updated weights for policy 0, policy_version 31414 (0.0020) [2025-01-03 23:32:33,309][134294] Updated weights for policy 0, policy_version 31424 (0.0029) [2025-01-03 23:32:33,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14745.6, 300 sec: 14842.8). Total num frames: 128720896. Throughput: 0: 3829.0. Samples: 21344586. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:33,969][134211] Avg episode reward: [(0, '6.088')] [2025-01-03 23:32:36,635][134294] Updated weights for policy 0, policy_version 31434 (0.0029) [2025-01-03 23:32:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 128782336. Throughput: 0: 3795.1. Samples: 21363408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:32:38,968][134211] Avg episode reward: [(0, '6.304')] [2025-01-03 23:32:39,969][134294] Updated weights for policy 0, policy_version 31444 (0.0026) [2025-01-03 23:32:43,309][134294] Updated weights for policy 0, policy_version 31454 (0.0029) [2025-01-03 23:32:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.1, 300 sec: 14787.2). Total num frames: 128843776. Throughput: 0: 3759.3. Samples: 21381802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:32:43,968][134211] Avg episode reward: [(0, '6.249')] [2025-01-03 23:32:46,535][134294] Updated weights for policy 0, policy_version 31464 (0.0030) [2025-01-03 23:32:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14540.8, 300 sec: 14759.6). Total num frames: 128901120. Throughput: 0: 3741.9. Samples: 21391032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:32:48,968][134211] Avg episode reward: [(0, '6.605')] [2025-01-03 23:32:49,922][134294] Updated weights for policy 0, policy_version 31474 (0.0027) [2025-01-03 23:32:52,810][134294] Updated weights for policy 0, policy_version 31484 (0.0025) [2025-01-03 23:32:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14540.8, 300 sec: 14759.5). Total num frames: 128970752. Throughput: 0: 3648.4. Samples: 21410484. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:32:53,968][134211] Avg episode reward: [(0, '6.107')] [2025-01-03 23:32:55,922][134294] Updated weights for policy 0, policy_version 31494 (0.0025) [2025-01-03 23:32:57,988][134294] Updated weights for policy 0, policy_version 31504 (0.0014) [2025-01-03 23:32:58,967][134211] Fps is (10 sec: 15974.6, 60 sec: 14882.2, 300 sec: 14828.9). Total num frames: 129060864. Throughput: 0: 3696.0. Samples: 21434410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:32:58,968][134211] Avg episode reward: [(0, '5.803')] [2025-01-03 23:32:59,873][134294] Updated weights for policy 0, policy_version 31514 (0.0014) [2025-01-03 23:33:01,766][134294] Updated weights for policy 0, policy_version 31524 (0.0014) [2025-01-03 23:33:03,730][134294] Updated weights for policy 0, policy_version 31534 (0.0012) [2025-01-03 23:33:03,967][134211] Fps is (10 sec: 19251.5, 60 sec: 15496.6, 300 sec: 14828.9). Total num frames: 129163264. Throughput: 0: 3830.5. Samples: 21450574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:33:03,968][134211] Avg episode reward: [(0, '5.890')] [2025-01-03 23:33:06,045][134294] Updated weights for policy 0, policy_version 31544 (0.0016) [2025-01-03 23:33:08,971][134211] Fps is (10 sec: 17197.4, 60 sec: 15222.7, 300 sec: 14787.1). Total num frames: 129232896. Throughput: 0: 3925.5. Samples: 21477706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:33:08,972][134211] Avg episode reward: [(0, '6.001')] [2025-01-03 23:33:09,603][134294] Updated weights for policy 0, policy_version 31554 (0.0026) [2025-01-03 23:33:12,999][134294] Updated weights for policy 0, policy_version 31564 (0.0027) [2025-01-03 23:33:13,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 129294336. Throughput: 0: 3589.3. Samples: 21495158. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:33:13,969][134211] Avg episode reward: [(0, '6.333')] [2025-01-03 23:33:14,027][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031567_129298432.pth... [2025-01-03 23:33:14,097][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030704_125763584.pth [2025-01-03 23:33:16,247][134294] Updated weights for policy 0, policy_version 31574 (0.0025) [2025-01-03 23:33:18,969][134211] Fps is (10 sec: 13109.9, 60 sec: 15154.9, 300 sec: 14787.2). Total num frames: 129363968. Throughput: 0: 3561.0. Samples: 21504834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:33:18,970][134211] Avg episode reward: [(0, '6.364')] [2025-01-03 23:33:19,209][134294] Updated weights for policy 0, policy_version 31584 (0.0024) [2025-01-03 23:33:22,252][134294] Updated weights for policy 0, policy_version 31594 (0.0023) [2025-01-03 23:33:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 129429504. Throughput: 0: 3593.8. Samples: 21525130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:33:23,968][134211] Avg episode reward: [(0, '6.376')] [2025-01-03 23:33:25,273][134294] Updated weights for policy 0, policy_version 31604 (0.0023) [2025-01-03 23:33:28,295][134294] Updated weights for policy 0, policy_version 31614 (0.0022) [2025-01-03 23:33:28,968][134211] Fps is (10 sec: 13518.5, 60 sec: 14063.0, 300 sec: 14787.3). Total num frames: 129499136. Throughput: 0: 3642.8. Samples: 21545726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:33:28,968][134211] Avg episode reward: [(0, '6.718')] [2025-01-03 23:33:31,199][134294] Updated weights for policy 0, policy_version 31624 (0.0025) [2025-01-03 23:33:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.3, 300 sec: 14773.4). Total num frames: 129568768. 
Throughput: 0: 3667.4. Samples: 21556066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:33:33,968][134211] Avg episode reward: [(0, '6.888')] [2025-01-03 23:33:34,275][134294] Updated weights for policy 0, policy_version 31634 (0.0026) [2025-01-03 23:33:37,271][134294] Updated weights for policy 0, policy_version 31644 (0.0026) [2025-01-03 23:33:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14634.5). Total num frames: 129634304. Throughput: 0: 3683.0. Samples: 21576220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:33:38,968][134211] Avg episode reward: [(0, '6.018')] [2025-01-03 23:33:40,417][134294] Updated weights for policy 0, policy_version 31654 (0.0026) [2025-01-03 23:33:43,367][134294] Updated weights for policy 0, policy_version 31664 (0.0027) [2025-01-03 23:33:43,969][134211] Fps is (10 sec: 13105.5, 60 sec: 14267.5, 300 sec: 14634.5). Total num frames: 129699840. Throughput: 0: 3603.8. Samples: 21596584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:33:43,970][134211] Avg episode reward: [(0, '6.576')] [2025-01-03 23:33:46,291][134294] Updated weights for policy 0, policy_version 31674 (0.0025) [2025-01-03 23:33:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 129769472. Throughput: 0: 3471.1. Samples: 21606772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:33:48,968][134211] Avg episode reward: [(0, '5.561')] [2025-01-03 23:33:49,424][134294] Updated weights for policy 0, policy_version 31684 (0.0027) [2025-01-03 23:33:52,103][134294] Updated weights for policy 0, policy_version 31694 (0.0021) [2025-01-03 23:33:53,968][134211] Fps is (10 sec: 15566.9, 60 sec: 14745.6, 300 sec: 14704.0). Total num frames: 129855488. Throughput: 0: 3347.8. Samples: 21628348. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:33:53,968][134211] Avg episode reward: [(0, '6.202')] [2025-01-03 23:33:54,001][134294] Updated weights for policy 0, policy_version 31704 (0.0012) [2025-01-03 23:33:55,864][134294] Updated weights for policy 0, policy_version 31714 (0.0014) [2025-01-03 23:33:57,785][134294] Updated weights for policy 0, policy_version 31724 (0.0015) [2025-01-03 23:33:58,968][134211] Fps is (10 sec: 19661.1, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 129966080. Throughput: 0: 3680.6. Samples: 21660782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:33:58,968][134211] Avg episode reward: [(0, '6.625')] [2025-01-03 23:33:59,679][134294] Updated weights for policy 0, policy_version 31734 (0.0013) [2025-01-03 23:34:01,541][134294] Updated weights for policy 0, policy_version 31744 (0.0012) [2025-01-03 23:34:03,810][134294] Updated weights for policy 0, policy_version 31754 (0.0018) [2025-01-03 23:34:03,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15018.6, 300 sec: 14953.9). Total num frames: 130064384. Throughput: 0: 3830.0. Samples: 21677180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:34:03,968][134211] Avg episode reward: [(0, '6.479')] [2025-01-03 23:34:07,045][134294] Updated weights for policy 0, policy_version 31764 (0.0027) [2025-01-03 23:34:08,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14882.9, 300 sec: 14787.2). Total num frames: 130125824. Throughput: 0: 3886.0. Samples: 21700002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:34:08,968][134211] Avg episode reward: [(0, '6.513')] [2025-01-03 23:34:10,224][134294] Updated weights for policy 0, policy_version 31774 (0.0028) [2025-01-03 23:34:13,311][134294] Updated weights for policy 0, policy_version 31784 (0.0025) [2025-01-03 23:34:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 130195456. Throughput: 0: 3865.1. Samples: 21719656. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:34:13,968][134211] Avg episode reward: [(0, '6.582')] [2025-01-03 23:34:16,333][134294] Updated weights for policy 0, policy_version 31794 (0.0026) [2025-01-03 23:34:18,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14950.6, 300 sec: 14690.0). Total num frames: 130260992. Throughput: 0: 3860.5. Samples: 21729792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:34:18,969][134211] Avg episode reward: [(0, '6.169')] [2025-01-03 23:34:19,535][134294] Updated weights for policy 0, policy_version 31804 (0.0025) [2025-01-03 23:34:22,406][134294] Updated weights for policy 0, policy_version 31814 (0.0024) [2025-01-03 23:34:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.3, 300 sec: 14703.9). Total num frames: 130326528. Throughput: 0: 3854.7. Samples: 21749680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:34:23,968][134211] Avg episode reward: [(0, '5.936')] [2025-01-03 23:34:25,447][134294] Updated weights for policy 0, policy_version 31824 (0.0026) [2025-01-03 23:34:28,393][134294] Updated weights for policy 0, policy_version 31834 (0.0025) [2025-01-03 23:34:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14950.2, 300 sec: 14703.9). Total num frames: 130396160. Throughput: 0: 3862.0. Samples: 21770370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:34:28,969][134211] Avg episode reward: [(0, '6.401')] [2025-01-03 23:34:31,586][134294] Updated weights for policy 0, policy_version 31844 (0.0024) [2025-01-03 23:34:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.1, 300 sec: 14703.9). Total num frames: 130461696. Throughput: 0: 3854.5. Samples: 21780226. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:34:33,968][134211] Avg episode reward: [(0, '5.583')] [2025-01-03 23:34:34,662][134294] Updated weights for policy 0, policy_version 31854 (0.0025) [2025-01-03 23:34:37,789][134294] Updated weights for policy 0, policy_version 31864 (0.0024) [2025-01-03 23:34:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.0, 300 sec: 14703.9). Total num frames: 130527232. Throughput: 0: 3813.7. Samples: 21799968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:34:38,969][134211] Avg episode reward: [(0, '5.922')] [2025-01-03 23:34:40,754][134294] Updated weights for policy 0, policy_version 31874 (0.0024) [2025-01-03 23:34:43,661][134294] Updated weights for policy 0, policy_version 31884 (0.0026) [2025-01-03 23:34:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.7, 300 sec: 14704.0). Total num frames: 130596864. Throughput: 0: 3549.8. Samples: 21820522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:34:43,968][134211] Avg episode reward: [(0, '5.605')] [2025-01-03 23:34:46,721][134294] Updated weights for policy 0, policy_version 31894 (0.0026) [2025-01-03 23:34:48,968][134211] Fps is (10 sec: 13927.2, 60 sec: 14950.4, 300 sec: 14704.0). Total num frames: 130666496. Throughput: 0: 3414.9. Samples: 21830850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:34:48,968][134211] Avg episode reward: [(0, '6.019')] [2025-01-03 23:34:49,797][134294] Updated weights for policy 0, policy_version 31904 (0.0026) [2025-01-03 23:34:52,010][134294] Updated weights for policy 0, policy_version 31914 (0.0016) [2025-01-03 23:34:53,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 130752512. Throughput: 0: 3426.7. Samples: 21854202. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:34:53,968][134211] Avg episode reward: [(0, '5.925')] [2025-01-03 23:34:54,422][134294] Updated weights for policy 0, policy_version 31924 (0.0020) [2025-01-03 23:34:57,328][134294] Updated weights for policy 0, policy_version 31934 (0.0026) [2025-01-03 23:34:58,968][134211] Fps is (10 sec: 15564.2, 60 sec: 14267.6, 300 sec: 14773.4). Total num frames: 130822144. Throughput: 0: 3481.5. Samples: 21876326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:34:58,969][134211] Avg episode reward: [(0, '6.488')] [2025-01-03 23:35:00,321][134294] Updated weights for policy 0, policy_version 31944 (0.0024) [2025-01-03 23:35:03,407][134294] Updated weights for policy 0, policy_version 31954 (0.0022) [2025-01-03 23:35:03,969][134211] Fps is (10 sec: 13925.2, 60 sec: 13789.7, 300 sec: 14787.2). Total num frames: 130891776. Throughput: 0: 3487.3. Samples: 21886724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:35:03,969][134211] Avg episode reward: [(0, '6.850')] [2025-01-03 23:35:06,309][134294] Updated weights for policy 0, policy_version 31964 (0.0021) [2025-01-03 23:35:08,459][134294] Updated weights for policy 0, policy_version 31974 (0.0013) [2025-01-03 23:35:08,968][134211] Fps is (10 sec: 15155.8, 60 sec: 14131.2, 300 sec: 14773.4). Total num frames: 130973696. Throughput: 0: 3523.2. Samples: 21908222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:08,968][134211] Avg episode reward: [(0, '6.718')] [2025-01-03 23:35:10,510][134294] Updated weights for policy 0, policy_version 31984 (0.0012) [2025-01-03 23:35:12,409][134294] Updated weights for policy 0, policy_version 31994 (0.0014) [2025-01-03 23:35:13,968][134211] Fps is (10 sec: 18023.7, 60 sec: 14609.0, 300 sec: 14745.6). Total num frames: 131072000. Throughput: 0: 3735.9. Samples: 21938482. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:13,968][134211] Avg episode reward: [(0, '6.463')] [2025-01-03 23:35:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032000_131072000.pth... [2025-01-03 23:35:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031132_127516672.pth [2025-01-03 23:35:15,174][134294] Updated weights for policy 0, policy_version 32004 (0.0023) [2025-01-03 23:35:18,240][134294] Updated weights for policy 0, policy_version 32014 (0.0025) [2025-01-03 23:35:18,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14609.2, 300 sec: 14662.3). Total num frames: 131137536. Throughput: 0: 3742.0. Samples: 21948616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:18,968][134211] Avg episode reward: [(0, '5.947')] [2025-01-03 23:35:21,486][134294] Updated weights for policy 0, policy_version 32024 (0.0026) [2025-01-03 23:35:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14609.1, 300 sec: 14662.3). Total num frames: 131203072. Throughput: 0: 3737.2. Samples: 21968142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:23,968][134211] Avg episode reward: [(0, '6.001')] [2025-01-03 23:35:24,562][134294] Updated weights for policy 0, policy_version 32034 (0.0026) [2025-01-03 23:35:27,594][134294] Updated weights for policy 0, policy_version 32044 (0.0026) [2025-01-03 23:35:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.9, 300 sec: 14676.2). Total num frames: 131268608. Throughput: 0: 3722.6. Samples: 21988038. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:28,968][134211] Avg episode reward: [(0, '5.836')] [2025-01-03 23:35:30,636][134294] Updated weights for policy 0, policy_version 32054 (0.0029) [2025-01-03 23:35:33,596][134294] Updated weights for policy 0, policy_version 32064 (0.0024) [2025-01-03 23:35:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 131338240. Throughput: 0: 3722.5. Samples: 21998362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:33,968][134211] Avg episode reward: [(0, '6.467')] [2025-01-03 23:35:36,571][134294] Updated weights for policy 0, policy_version 32074 (0.0023) [2025-01-03 23:35:38,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14609.1, 300 sec: 14690.0). Total num frames: 131403776. Throughput: 0: 3663.0. Samples: 22019038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:38,969][134211] Avg episode reward: [(0, '6.543')] [2025-01-03 23:35:39,719][134294] Updated weights for policy 0, policy_version 32084 (0.0026) [2025-01-03 23:35:42,756][134294] Updated weights for policy 0, policy_version 32094 (0.0028) [2025-01-03 23:35:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 131469312. Throughput: 0: 3609.9. Samples: 22038770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:43,969][134211] Avg episode reward: [(0, '6.386')] [2025-01-03 23:35:45,403][134294] Updated weights for policy 0, policy_version 32104 (0.0019) [2025-01-03 23:35:47,258][134294] Updated weights for policy 0, policy_version 32114 (0.0011) [2025-01-03 23:35:48,968][134211] Fps is (10 sec: 16794.5, 60 sec: 15086.9, 300 sec: 14801.2). Total num frames: 131571712. Throughput: 0: 3672.9. Samples: 22052002. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:35:48,968][134211] Avg episode reward: [(0, '5.639')] [2025-01-03 23:35:49,150][134294] Updated weights for policy 0, policy_version 32124 (0.0012) [2025-01-03 23:35:51,015][134294] Updated weights for policy 0, policy_version 32134 (0.0013) [2025-01-03 23:35:52,922][134294] Updated weights for policy 0, policy_version 32144 (0.0013) [2025-01-03 23:35:53,968][134211] Fps is (10 sec: 21299.7, 60 sec: 15496.6, 300 sec: 14940.0). Total num frames: 131682304. Throughput: 0: 3918.3. Samples: 22084546. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:35:53,968][134211] Avg episode reward: [(0, '5.940')] [2025-01-03 23:35:54,818][134294] Updated weights for policy 0, policy_version 32154 (0.0013) [2025-01-03 23:35:57,062][134294] Updated weights for policy 0, policy_version 32164 (0.0017) [2025-01-03 23:35:58,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15701.4, 300 sec: 14898.3). Total num frames: 131764224. Throughput: 0: 3869.8. Samples: 22112620. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:35:58,968][134211] Avg episode reward: [(0, '5.810')] [2025-01-03 23:36:00,343][134294] Updated weights for policy 0, policy_version 32174 (0.0031) [2025-01-03 23:36:03,520][134294] Updated weights for policy 0, policy_version 32184 (0.0024) [2025-01-03 23:36:03,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15633.3, 300 sec: 14912.2). Total num frames: 131829760. Throughput: 0: 3854.4. Samples: 22122064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:36:03,968][134211] Avg episode reward: [(0, '6.048')] [2025-01-03 23:36:06,745][134294] Updated weights for policy 0, policy_version 32194 (0.0025) [2025-01-03 23:36:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.7, 300 sec: 14842.8). Total num frames: 131891200. Throughput: 0: 3845.6. Samples: 22141192. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:36:08,968][134211] Avg episode reward: [(0, '6.387')] [2025-01-03 23:36:09,920][134294] Updated weights for policy 0, policy_version 32204 (0.0030) [2025-01-03 23:36:12,934][134294] Updated weights for policy 0, policy_version 32214 (0.0026) [2025-01-03 23:36:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 14690.0). Total num frames: 131956736. Throughput: 0: 3842.1. Samples: 22160932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:36:13,968][134211] Avg episode reward: [(0, '6.352')] [2025-01-03 23:36:16,139][134294] Updated weights for policy 0, policy_version 32224 (0.0027) [2025-01-03 23:36:18,970][134211] Fps is (10 sec: 13513.9, 60 sec: 14813.3, 300 sec: 14648.3). Total num frames: 132026368. Throughput: 0: 3828.8. Samples: 22170664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:36:18,970][134211] Avg episode reward: [(0, '6.373')] [2025-01-03 23:36:19,290][134294] Updated weights for policy 0, policy_version 32234 (0.0025) [2025-01-03 23:36:22,232][134294] Updated weights for policy 0, policy_version 32244 (0.0026) [2025-01-03 23:36:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.8, 300 sec: 14648.4). Total num frames: 132091904. Throughput: 0: 3814.1. Samples: 22190670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:36:23,968][134211] Avg episode reward: [(0, '6.502')] [2025-01-03 23:36:25,403][134294] Updated weights for policy 0, policy_version 32254 (0.0027) [2025-01-03 23:36:28,325][134294] Updated weights for policy 0, policy_version 32264 (0.0023) [2025-01-03 23:36:28,968][134211] Fps is (10 sec: 13110.0, 60 sec: 14813.9, 300 sec: 14648.4). Total num frames: 132157440. Throughput: 0: 3824.5. Samples: 22210874. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:36:28,968][134211] Avg episode reward: [(0, '6.507')] [2025-01-03 23:36:31,364][134294] Updated weights for policy 0, policy_version 32274 (0.0025) [2025-01-03 23:36:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14676.2). Total num frames: 132227072. Throughput: 0: 3759.7. Samples: 22221190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:33,968][134211] Avg episode reward: [(0, '6.896')] [2025-01-03 23:36:34,364][134294] Updated weights for policy 0, policy_version 32284 (0.0025) [2025-01-03 23:36:37,429][134294] Updated weights for policy 0, policy_version 32294 (0.0024) [2025-01-03 23:36:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14814.0, 300 sec: 14662.3). Total num frames: 132292608. Throughput: 0: 3490.2. Samples: 22241606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:38,968][134211] Avg episode reward: [(0, '6.951')] [2025-01-03 23:36:40,615][134294] Updated weights for policy 0, policy_version 32304 (0.0026) [2025-01-03 23:36:43,047][134294] Updated weights for policy 0, policy_version 32314 (0.0015) [2025-01-03 23:36:43,967][134211] Fps is (10 sec: 14746.1, 60 sec: 15087.0, 300 sec: 14731.7). Total num frames: 132374528. Throughput: 0: 3353.4. Samples: 22263524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:43,968][134211] Avg episode reward: [(0, '6.402')] [2025-01-03 23:36:44,949][134294] Updated weights for policy 0, policy_version 32324 (0.0012) [2025-01-03 23:36:46,854][134294] Updated weights for policy 0, policy_version 32334 (0.0014) [2025-01-03 23:36:48,671][134294] Updated weights for policy 0, policy_version 32344 (0.0013) [2025-01-03 23:36:48,967][134211] Fps is (10 sec: 19251.6, 60 sec: 15223.5, 300 sec: 14870.6). Total num frames: 132485120. Throughput: 0: 3505.5. Samples: 22279808. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:48,968][134211] Avg episode reward: [(0, '6.531')] [2025-01-03 23:36:50,576][134294] Updated weights for policy 0, policy_version 32354 (0.0012) [2025-01-03 23:36:53,028][134294] Updated weights for policy 0, policy_version 32364 (0.0021) [2025-01-03 23:36:53,968][134211] Fps is (10 sec: 19660.4, 60 sec: 14813.8, 300 sec: 14926.1). Total num frames: 132571136. Throughput: 0: 3774.0. Samples: 22311024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:53,968][134211] Avg episode reward: [(0, '6.178')] [2025-01-03 23:36:56,196][134294] Updated weights for policy 0, policy_version 32374 (0.0028) [2025-01-03 23:36:58,968][134211] Fps is (10 sec: 15153.9, 60 sec: 14540.6, 300 sec: 14926.1). Total num frames: 132636672. Throughput: 0: 3766.4. Samples: 22330424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:36:58,969][134211] Avg episode reward: [(0, '6.577')] [2025-01-03 23:36:59,584][134294] Updated weights for policy 0, policy_version 32384 (0.0029) [2025-01-03 23:37:02,540][134294] Updated weights for policy 0, policy_version 32394 (0.0026) [2025-01-03 23:37:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14856.7). Total num frames: 132702208. Throughput: 0: 3761.2. Samples: 22339910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:37:03,968][134211] Avg episode reward: [(0, '6.728')] [2025-01-03 23:37:05,685][134294] Updated weights for policy 0, policy_version 32404 (0.0022) [2025-01-03 23:37:08,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14540.8, 300 sec: 14828.9). Total num frames: 132763648. Throughput: 0: 3747.5. Samples: 22359306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:37:08,968][134211] Avg episode reward: [(0, '6.903')] [2025-01-03 23:37:09,348][134294] Updated weights for policy 0, policy_version 32414 (0.0027) [2025-01-03 23:37:12,842][134294] Updated weights for policy 0, policy_version 32424 (0.0024) [2025-01-03 23:37:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14404.3, 300 sec: 14801.1). Total num frames: 132820992. Throughput: 0: 3676.1. Samples: 22376298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:13,968][134211] Avg episode reward: [(0, '6.950')] [2025-01-03 23:37:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032427_132820992.pth... [2025-01-03 23:37:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031567_129298432.pth [2025-01-03 23:37:16,215][134294] Updated weights for policy 0, policy_version 32434 (0.0023) [2025-01-03 23:37:18,701][134294] Updated weights for policy 0, policy_version 32444 (0.0018) [2025-01-03 23:37:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14473.1, 300 sec: 14731.7). Total num frames: 132894720. Throughput: 0: 3654.1. Samples: 22385622. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:18,968][134211] Avg episode reward: [(0, '6.676')] [2025-01-03 23:37:20,678][134294] Updated weights for policy 0, policy_version 32454 (0.0015) [2025-01-03 23:37:22,910][134294] Updated weights for policy 0, policy_version 32464 (0.0017) [2025-01-03 23:37:23,968][134211] Fps is (10 sec: 16382.8, 60 sec: 14882.0, 300 sec: 14676.1). Total num frames: 132984832. Throughput: 0: 3821.4. Samples: 22413570. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:23,969][134211] Avg episode reward: [(0, '6.699')] [2025-01-03 23:37:25,915][134294] Updated weights for policy 0, policy_version 32474 (0.0025) [2025-01-03 23:37:28,923][134294] Updated weights for policy 0, policy_version 32484 (0.0028) [2025-01-03 23:37:28,968][134211] Fps is (10 sec: 15973.8, 60 sec: 14950.3, 300 sec: 14690.1). Total num frames: 133054464. Throughput: 0: 3795.3. Samples: 22434314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:28,969][134211] Avg episode reward: [(0, '6.872')] [2025-01-03 23:37:31,938][134294] Updated weights for policy 0, policy_version 32494 (0.0026) [2025-01-03 23:37:33,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14882.1, 300 sec: 14703.9). Total num frames: 133120000. Throughput: 0: 3656.1. Samples: 22444332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:33,970][134211] Avg episode reward: [(0, '6.738')] [2025-01-03 23:37:35,064][134294] Updated weights for policy 0, policy_version 32504 (0.0025) [2025-01-03 23:37:38,211][134294] Updated weights for policy 0, policy_version 32514 (0.0027) [2025-01-03 23:37:38,967][134211] Fps is (10 sec: 13107.8, 60 sec: 14882.2, 300 sec: 14717.8). Total num frames: 133185536. Throughput: 0: 3406.9. Samples: 22464332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:38,968][134211] Avg episode reward: [(0, '6.495')] [2025-01-03 23:37:40,350][134294] Updated weights for policy 0, policy_version 32524 (0.0015) [2025-01-03 23:37:42,257][134294] Updated weights for policy 0, policy_version 32534 (0.0014) [2025-01-03 23:37:43,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 133296128. Throughput: 0: 3616.9. Samples: 22493180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:43,968][134211] Avg episode reward: [(0, '6.620')] [2025-01-03 23:37:44,104][134294] Updated weights for policy 0, policy_version 32544 (0.0013) [2025-01-03 23:37:46,022][134294] Updated weights for policy 0, policy_version 32554 (0.0013) [2025-01-03 23:37:48,462][134294] Updated weights for policy 0, policy_version 32564 (0.0023) [2025-01-03 23:37:48,968][134211] Fps is (10 sec: 20069.9, 60 sec: 15018.6, 300 sec: 14967.8). Total num frames: 133386240. Throughput: 0: 3769.2. Samples: 22509522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:48,968][134211] Avg episode reward: [(0, '6.596')] [2025-01-03 23:37:52,007][134294] Updated weights for policy 0, policy_version 32574 (0.0028) [2025-01-03 23:37:53,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14609.0, 300 sec: 14870.5). Total num frames: 133447680. Throughput: 0: 3782.6. Samples: 22529524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:37:53,968][134211] Avg episode reward: [(0, '7.348')] [2025-01-03 23:37:55,180][134294] Updated weights for policy 0, policy_version 32584 (0.0029) [2025-01-03 23:37:58,248][134294] Updated weights for policy 0, policy_version 32594 (0.0022) [2025-01-03 23:37:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.2, 300 sec: 14745.6). Total num frames: 133513216. Throughput: 0: 3845.6. Samples: 22549350. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:37:58,968][134211] Avg episode reward: [(0, '6.170')] [2025-01-03 23:38:01,294][134294] Updated weights for policy 0, policy_version 32604 (0.0026) [2025-01-03 23:38:03,970][134211] Fps is (10 sec: 13104.0, 60 sec: 14608.4, 300 sec: 14731.7). Total num frames: 133578752. Throughput: 0: 3859.2. Samples: 22559296. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:03,971][134211] Avg episode reward: [(0, '5.917')] [2025-01-03 23:38:04,516][134294] Updated weights for policy 0, policy_version 32614 (0.0025) [2025-01-03 23:38:07,602][134294] Updated weights for policy 0, policy_version 32624 (0.0024) [2025-01-03 23:38:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14677.4, 300 sec: 14745.6). Total num frames: 133644288. Throughput: 0: 3670.2. Samples: 22578728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:08,968][134211] Avg episode reward: [(0, '6.081')] [2025-01-03 23:38:10,607][134294] Updated weights for policy 0, policy_version 32634 (0.0024) [2025-01-03 23:38:13,547][134294] Updated weights for policy 0, policy_version 32644 (0.0025) [2025-01-03 23:38:13,968][134211] Fps is (10 sec: 13520.3, 60 sec: 14882.1, 300 sec: 14745.7). Total num frames: 133713920. Throughput: 0: 3670.8. Samples: 22599498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:13,968][134211] Avg episode reward: [(0, '7.127')] [2025-01-03 23:38:16,521][134294] Updated weights for policy 0, policy_version 32654 (0.0024) [2025-01-03 23:38:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.8, 300 sec: 14759.5). Total num frames: 133783552. Throughput: 0: 3676.0. Samples: 22609750. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:18,968][134211] Avg episode reward: [(0, '5.880')] [2025-01-03 23:38:19,567][134294] Updated weights for policy 0, policy_version 32664 (0.0024) [2025-01-03 23:38:22,513][134294] Updated weights for policy 0, policy_version 32674 (0.0024) [2025-01-03 23:38:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.5, 300 sec: 14745.6). Total num frames: 133849088. Throughput: 0: 3684.0. Samples: 22630112. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:23,968][134211] Avg episode reward: [(0, '7.264')] [2025-01-03 23:38:25,566][134294] Updated weights for policy 0, policy_version 32684 (0.0026) [2025-01-03 23:38:27,856][134294] Updated weights for policy 0, policy_version 32694 (0.0019) [2025-01-03 23:38:28,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14677.4, 300 sec: 14801.1). Total num frames: 133935104. Throughput: 0: 3566.0. Samples: 22653652. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:28,968][134211] Avg episode reward: [(0, '7.292')] [2025-01-03 23:38:29,754][134294] Updated weights for policy 0, policy_version 32704 (0.0014) [2025-01-03 23:38:31,620][134294] Updated weights for policy 0, policy_version 32714 (0.0014) [2025-01-03 23:38:33,541][134294] Updated weights for policy 0, policy_version 32724 (0.0014) [2025-01-03 23:38:33,967][134211] Fps is (10 sec: 19661.1, 60 sec: 15428.3, 300 sec: 14953.9). Total num frames: 134045696. Throughput: 0: 3562.1. Samples: 22669814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:38:33,968][134211] Avg episode reward: [(0, '6.718')] [2025-01-03 23:38:35,789][134294] Updated weights for policy 0, policy_version 32734 (0.0016) [2025-01-03 23:38:38,968][134211] Fps is (10 sec: 18021.8, 60 sec: 15496.4, 300 sec: 14967.8). Total num frames: 134115328. Throughput: 0: 3733.1. Samples: 22697514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:38:38,969][134211] Avg episode reward: [(0, '6.727')] [2025-01-03 23:38:38,984][134294] Updated weights for policy 0, policy_version 32744 (0.0028) [2025-01-03 23:38:42,202][134294] Updated weights for policy 0, policy_version 32754 (0.0030) [2025-01-03 23:38:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14953.9). Total num frames: 134180864. Throughput: 0: 3715.4. Samples: 22716544. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:38:43,968][134211] Avg episode reward: [(0, '6.118')] [2025-01-03 23:38:45,301][134294] Updated weights for policy 0, policy_version 32764 (0.0027) [2025-01-03 23:38:48,220][134294] Updated weights for policy 0, policy_version 32774 (0.0028) [2025-01-03 23:38:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14898.3). Total num frames: 134250496. Throughput: 0: 3719.8. Samples: 22726678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:38:48,968][134211] Avg episode reward: [(0, '6.736')] [2025-01-03 23:38:51,309][134294] Updated weights for policy 0, policy_version 32784 (0.0025) [2025-01-03 23:38:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 14745.6). Total num frames: 134316032. Throughput: 0: 3736.3. Samples: 22746860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:38:53,968][134211] Avg episode reward: [(0, '6.152')] [2025-01-03 23:38:54,458][134294] Updated weights for policy 0, policy_version 32794 (0.0025) [2025-01-03 23:38:57,473][134294] Updated weights for policy 0, policy_version 32804 (0.0025) [2025-01-03 23:38:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 134385664. Throughput: 0: 3721.1. Samples: 22766946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:38:58,968][134211] Avg episode reward: [(0, '6.276')] [2025-01-03 23:39:00,380][134294] Updated weights for policy 0, policy_version 32814 (0.0024) [2025-01-03 23:39:03,405][134294] Updated weights for policy 0, policy_version 32824 (0.0027) [2025-01-03 23:39:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14609.7, 300 sec: 14676.2). Total num frames: 134455296. Throughput: 0: 3725.5. Samples: 22777398. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:39:03,968][134211] Avg episode reward: [(0, '6.674')] [2025-01-03 23:39:06,399][134294] Updated weights for policy 0, policy_version 32834 (0.0022) [2025-01-03 23:39:08,970][134211] Fps is (10 sec: 13107.0, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 134516736. Throughput: 0: 3719.9. Samples: 22797510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:39:08,970][134211] Avg episode reward: [(0, '6.468')] [2025-01-03 23:39:09,879][134294] Updated weights for policy 0, policy_version 32844 (0.0025) [2025-01-03 23:39:11,967][134294] Updated weights for policy 0, policy_version 32854 (0.0015) [2025-01-03 23:39:13,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14882.1, 300 sec: 14731.7). Total num frames: 134606848. Throughput: 0: 3733.7. Samples: 22821668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:39:13,968][134211] Avg episode reward: [(0, '6.421')] [2025-01-03 23:39:13,994][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032864_134610944.pth... [2025-01-03 23:39:13,995][134294] Updated weights for policy 0, policy_version 32864 (0.0014) [2025-01-03 23:39:14,037][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032000_131072000.pth [2025-01-03 23:39:15,888][134294] Updated weights for policy 0, policy_version 32874 (0.0016) [2025-01-03 23:39:17,803][134294] Updated weights for policy 0, policy_version 32884 (0.0013) [2025-01-03 23:39:18,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15564.9, 300 sec: 14884.5). Total num frames: 134717440. Throughput: 0: 3731.1. Samples: 22837714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:39:18,968][134211] Avg episode reward: [(0, '6.282')] [2025-01-03 23:39:19,979][134294] Updated weights for policy 0, policy_version 32894 (0.0018) [2025-01-03 23:39:23,163][134294] Updated weights for policy 0, policy_version 32904 (0.0029) [2025-01-03 23:39:23,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15564.7, 300 sec: 14870.6). Total num frames: 134782976. Throughput: 0: 3695.7. Samples: 22863820. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:39:23,969][134211] Avg episode reward: [(0, '6.579')] [2025-01-03 23:39:26,294][134294] Updated weights for policy 0, policy_version 32914 (0.0026) [2025-01-03 23:39:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15223.4, 300 sec: 14870.6). Total num frames: 134848512. Throughput: 0: 3700.0. Samples: 22883042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:39:28,968][134211] Avg episode reward: [(0, '7.055')] [2025-01-03 23:39:29,527][134294] Updated weights for policy 0, policy_version 32924 (0.0028) [2025-01-03 23:39:32,604][134294] Updated weights for policy 0, policy_version 32934 (0.0024) [2025-01-03 23:39:33,969][134211] Fps is (10 sec: 13106.3, 60 sec: 14472.3, 300 sec: 14870.5). Total num frames: 134914048. Throughput: 0: 3692.1. Samples: 22892826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:39:33,969][134211] Avg episode reward: [(0, '6.060')] [2025-01-03 23:39:35,646][134294] Updated weights for policy 0, policy_version 32944 (0.0028) [2025-01-03 23:39:38,690][134294] Updated weights for policy 0, policy_version 32954 (0.0028) [2025-01-03 23:39:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 14870.6). Total num frames: 134983680. Throughput: 0: 3699.3. Samples: 22913330. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:39:38,968][134211] Avg episode reward: [(0, '6.180')] [2025-01-03 23:39:41,629][134294] Updated weights for policy 0, policy_version 32964 (0.0027) [2025-01-03 23:39:43,968][134211] Fps is (10 sec: 13517.8, 60 sec: 14472.5, 300 sec: 14856.7). Total num frames: 135049216. Throughput: 0: 3698.8. Samples: 22933394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:39:43,969][134211] Avg episode reward: [(0, '6.614')] [2025-01-03 23:39:44,710][134294] Updated weights for policy 0, policy_version 32974 (0.0028) [2025-01-03 23:39:47,840][134294] Updated weights for policy 0, policy_version 32984 (0.0027) [2025-01-03 23:39:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14787.3). Total num frames: 135114752. Throughput: 0: 3682.7. Samples: 22943120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:39:48,968][134211] Avg episode reward: [(0, '7.042')] [2025-01-03 23:39:50,851][134294] Updated weights for policy 0, policy_version 32994 (0.0025) [2025-01-03 23:39:53,670][134294] Updated weights for policy 0, policy_version 33004 (0.0024) [2025-01-03 23:39:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14540.9, 300 sec: 14801.2). Total num frames: 135188480. Throughput: 0: 3695.5. Samples: 22963806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:39:53,968][134211] Avg episode reward: [(0, '7.223')] [2025-01-03 23:39:55,601][134294] Updated weights for policy 0, policy_version 33014 (0.0014) [2025-01-03 23:39:57,475][134294] Updated weights for policy 0, policy_version 33024 (0.0013) [2025-01-03 23:39:58,967][134211] Fps is (10 sec: 18432.2, 60 sec: 15223.5, 300 sec: 14940.0). Total num frames: 135299072. Throughput: 0: 3823.9. Samples: 22993744. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:39:58,968][134211] Avg episode reward: [(0, '6.526')] [2025-01-03 23:39:59,351][134294] Updated weights for policy 0, policy_version 33034 (0.0013) [2025-01-03 23:40:01,235][134294] Updated weights for policy 0, policy_version 33044 (0.0012) [2025-01-03 23:40:03,156][134294] Updated weights for policy 0, policy_version 33054 (0.0015) [2025-01-03 23:40:03,968][134211] Fps is (10 sec: 21299.2, 60 sec: 15769.6, 300 sec: 15009.4). Total num frames: 135401472. Throughput: 0: 3829.4. Samples: 23010036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:40:03,968][134211] Avg episode reward: [(0, '6.405')] [2025-01-03 23:40:06,012][134294] Updated weights for policy 0, policy_version 33064 (0.0025) [2025-01-03 23:40:08,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15769.6, 300 sec: 14884.5). Total num frames: 135462912. Throughput: 0: 3787.1. Samples: 23034238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:40:08,968][134211] Avg episode reward: [(0, '6.660')] [2025-01-03 23:40:09,416][134294] Updated weights for policy 0, policy_version 33074 (0.0029) [2025-01-03 23:40:12,520][134294] Updated weights for policy 0, policy_version 33084 (0.0026) [2025-01-03 23:40:13,968][134211] Fps is (10 sec: 12697.2, 60 sec: 15359.9, 300 sec: 14884.4). Total num frames: 135528448. Throughput: 0: 3783.2. Samples: 23053288. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:40:13,969][134211] Avg episode reward: [(0, '7.202')] [2025-01-03 23:40:15,614][134294] Updated weights for policy 0, policy_version 33094 (0.0029) [2025-01-03 23:40:18,603][134294] Updated weights for policy 0, policy_version 33104 (0.0023) [2025-01-03 23:40:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14898.3). Total num frames: 135598080. Throughput: 0: 3793.1. Samples: 23063514. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:40:18,968][134211] Avg episode reward: [(0, '7.167')] [2025-01-03 23:40:21,531][134294] Updated weights for policy 0, policy_version 33114 (0.0025) [2025-01-03 23:40:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.3, 300 sec: 14898.3). Total num frames: 135663616. Throughput: 0: 3790.4. Samples: 23083898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:40:23,969][134211] Avg episode reward: [(0, '6.602')] [2025-01-03 23:40:24,766][134294] Updated weights for policy 0, policy_version 33124 (0.0028) [2025-01-03 23:40:27,659][134294] Updated weights for policy 0, policy_version 33134 (0.0025) [2025-01-03 23:40:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14898.3). Total num frames: 135733248. Throughput: 0: 3789.0. Samples: 23103900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:40:28,968][134211] Avg episode reward: [(0, '6.420')] [2025-01-03 23:40:30,720][134294] Updated weights for policy 0, policy_version 33144 (0.0026) [2025-01-03 23:40:33,939][134294] Updated weights for policy 0, policy_version 33154 (0.0026) [2025-01-03 23:40:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.8, 300 sec: 14898.3). Total num frames: 135798784. Throughput: 0: 3794.6. Samples: 23113876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:40:33,968][134211] Avg episode reward: [(0, '7.003')] [2025-01-03 23:40:36,940][134294] Updated weights for policy 0, policy_version 33164 (0.0024) [2025-01-03 23:40:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14677.3, 300 sec: 14898.3). Total num frames: 135864320. Throughput: 0: 3779.1. Samples: 23133864. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:40:38,968][134211] Avg episode reward: [(0, '6.072')] [2025-01-03 23:40:40,173][134294] Updated weights for policy 0, policy_version 33174 (0.0026) [2025-01-03 23:40:43,179][134294] Updated weights for policy 0, policy_version 33184 (0.0023) [2025-01-03 23:40:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14677.4, 300 sec: 14773.4). Total num frames: 135929856. Throughput: 0: 3555.2. Samples: 23153730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:40:43,968][134211] Avg episode reward: [(0, '5.724')] [2025-01-03 23:40:46,203][134294] Updated weights for policy 0, policy_version 33194 (0.0024) [2025-01-03 23:40:48,246][134294] Updated weights for policy 0, policy_version 33204 (0.0014) [2025-01-03 23:40:48,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 136015872. Throughput: 0: 3419.0. Samples: 23163890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:40:48,968][134211] Avg episode reward: [(0, '5.482')] [2025-01-03 23:40:50,166][134294] Updated weights for policy 0, policy_version 33214 (0.0013) [2025-01-03 23:40:52,073][134294] Updated weights for policy 0, policy_version 33224 (0.0014) [2025-01-03 23:40:53,943][134294] Updated weights for policy 0, policy_version 33234 (0.0013) [2025-01-03 23:40:53,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15633.1, 300 sec: 14787.3). Total num frames: 136126464. Throughput: 0: 3577.2. Samples: 23195212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:40:53,968][134211] Avg episode reward: [(0, '5.698')] [2025-01-03 23:40:55,825][134294] Updated weights for policy 0, policy_version 33244 (0.0013) [2025-01-03 23:40:58,725][134294] Updated weights for policy 0, policy_version 33254 (0.0024) [2025-01-03 23:40:58,968][134211] Fps is (10 sec: 19660.4, 60 sec: 15223.4, 300 sec: 14856.7). Total num frames: 136212480. Throughput: 0: 3787.3. Samples: 23223714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:40:58,969][134211] Avg episode reward: [(0, '6.064')] [2025-01-03 23:41:02,081][134294] Updated weights for policy 0, policy_version 33264 (0.0028) [2025-01-03 23:41:03,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14472.5, 300 sec: 14842.8). Total num frames: 136269824. Throughput: 0: 3758.4. Samples: 23232640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:03,968][134211] Avg episode reward: [(0, '6.514')] [2025-01-03 23:41:05,378][134294] Updated weights for policy 0, policy_version 33274 (0.0024) [2025-01-03 23:41:08,533][134294] Updated weights for policy 0, policy_version 33284 (0.0026) [2025-01-03 23:41:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 136331264. Throughput: 0: 3733.9. Samples: 23251922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:41:08,968][134211] Avg episode reward: [(0, '5.319')] [2025-01-03 23:41:12,164][134294] Updated weights for policy 0, policy_version 33294 (0.0027) [2025-01-03 23:41:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14336.0, 300 sec: 14787.3). Total num frames: 136388608. Throughput: 0: 3670.6. Samples: 23269078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:41:13,969][134211] Avg episode reward: [(0, '5.764')] [2025-01-03 23:41:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033299_136392704.pth... [2025-01-03 23:41:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032427_132820992.pth [2025-01-03 23:41:15,657][134294] Updated weights for policy 0, policy_version 33304 (0.0026) [2025-01-03 23:41:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14199.5, 300 sec: 14773.4). Total num frames: 136450048. Throughput: 0: 3646.8. Samples: 23277982. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:41:18,968][134211] Avg episode reward: [(0, '6.285')] [2025-01-03 23:41:18,986][134294] Updated weights for policy 0, policy_version 33314 (0.0023) [2025-01-03 23:41:20,948][134294] Updated weights for policy 0, policy_version 33324 (0.0012) [2025-01-03 23:41:22,920][134294] Updated weights for policy 0, policy_version 33334 (0.0013) [2025-01-03 23:41:23,967][134211] Fps is (10 sec: 16794.3, 60 sec: 14882.2, 300 sec: 14912.2). Total num frames: 136556544. Throughput: 0: 3766.1. Samples: 23303338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:41:23,968][134211] Avg episode reward: [(0, '6.328')] [2025-01-03 23:41:24,767][134294] Updated weights for policy 0, policy_version 33344 (0.0013) [2025-01-03 23:41:26,737][134294] Updated weights for policy 0, policy_version 33354 (0.0016) [2025-01-03 23:41:28,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15291.7, 300 sec: 14995.5). Total num frames: 136650752. Throughput: 0: 3992.3. Samples: 23333386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:28,968][134211] Avg episode reward: [(0, '6.174')] [2025-01-03 23:41:29,588][134294] Updated weights for policy 0, policy_version 33364 (0.0026) [2025-01-03 23:41:32,701][134294] Updated weights for policy 0, policy_version 33374 (0.0023) [2025-01-03 23:41:33,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15223.5, 300 sec: 14981.6). Total num frames: 136712192. Throughput: 0: 3982.6. Samples: 23343106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:33,968][134211] Avg episode reward: [(0, '6.059')] [2025-01-03 23:41:35,851][134294] Updated weights for policy 0, policy_version 33384 (0.0026) [2025-01-03 23:41:38,965][134294] Updated weights for policy 0, policy_version 33394 (0.0026) [2025-01-03 23:41:38,970][134211] Fps is (10 sec: 13104.5, 60 sec: 15291.2, 300 sec: 14939.9). Total num frames: 136781824. Throughput: 0: 3724.2. Samples: 23362810. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:38,970][134211] Avg episode reward: [(0, '6.781')] [2025-01-03 23:41:42,065][134294] Updated weights for policy 0, policy_version 33404 (0.0025) [2025-01-03 23:41:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 14787.2). Total num frames: 136847360. Throughput: 0: 3530.9. Samples: 23382604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:43,968][134211] Avg episode reward: [(0, '5.628')] [2025-01-03 23:41:45,142][134294] Updated weights for policy 0, policy_version 33414 (0.0026) [2025-01-03 23:41:47,996][134294] Updated weights for policy 0, policy_version 33424 (0.0028) [2025-01-03 23:41:48,968][134211] Fps is (10 sec: 13519.8, 60 sec: 15018.6, 300 sec: 14731.7). Total num frames: 136916992. Throughput: 0: 3563.2. Samples: 23392984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:41:48,968][134211] Avg episode reward: [(0, '5.938')] [2025-01-03 23:41:51,129][134294] Updated weights for policy 0, policy_version 33434 (0.0027) [2025-01-03 23:41:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.4, 300 sec: 14717.9). Total num frames: 136978432. Throughput: 0: 3572.6. Samples: 23412690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:41:53,968][134211] Avg episode reward: [(0, '5.288')] [2025-01-03 23:41:54,517][134294] Updated weights for policy 0, policy_version 33444 (0.0027) [2025-01-03 23:41:57,692][134294] Updated weights for policy 0, policy_version 33454 (0.0024) [2025-01-03 23:41:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.2, 300 sec: 14717.8). Total num frames: 137043968. Throughput: 0: 3617.9. Samples: 23431884. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:41:58,968][134211] Avg episode reward: [(0, '5.655')] [2025-01-03 23:42:00,540][134294] Updated weights for policy 0, policy_version 33464 (0.0024) [2025-01-03 23:42:03,043][134294] Updated weights for policy 0, policy_version 33474 (0.0018) [2025-01-03 23:42:03,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14267.8, 300 sec: 14787.3). Total num frames: 137125888. Throughput: 0: 3650.9. Samples: 23442274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:42:03,968][134211] Avg episode reward: [(0, '6.021')] [2025-01-03 23:42:05,421][134294] Updated weights for policy 0, policy_version 33484 (0.0019) [2025-01-03 23:42:08,344][134294] Updated weights for policy 0, policy_version 33494 (0.0026) [2025-01-03 23:42:08,968][134211] Fps is (10 sec: 15154.1, 60 sec: 14404.1, 300 sec: 14828.9). Total num frames: 137195520. Throughput: 0: 3636.7. Samples: 23466994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:42:08,969][134211] Avg episode reward: [(0, '5.866')] [2025-01-03 23:42:11,438][134294] Updated weights for policy 0, policy_version 33504 (0.0025) [2025-01-03 23:42:13,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14609.0, 300 sec: 14815.0). Total num frames: 137265152. Throughput: 0: 3414.1. Samples: 23487020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:13,969][134211] Avg episode reward: [(0, '5.976')] [2025-01-03 23:42:14,477][134294] Updated weights for policy 0, policy_version 33514 (0.0025) [2025-01-03 23:42:16,737][134294] Updated weights for policy 0, policy_version 33524 (0.0016) [2025-01-03 23:42:18,694][134294] Updated weights for policy 0, policy_version 33534 (0.0014) [2025-01-03 23:42:18,968][134211] Fps is (10 sec: 16385.2, 60 sec: 15155.2, 300 sec: 14829.0). Total num frames: 137359360. Throughput: 0: 3458.4. Samples: 23498732. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:18,968][134211] Avg episode reward: [(0, '5.701')] [2025-01-03 23:42:20,574][134294] Updated weights for policy 0, policy_version 33544 (0.0013) [2025-01-03 23:42:22,463][134294] Updated weights for policy 0, policy_version 33554 (0.0014) [2025-01-03 23:42:23,968][134211] Fps is (10 sec: 20071.5, 60 sec: 15155.2, 300 sec: 14953.9). Total num frames: 137465856. Throughput: 0: 3736.0. Samples: 23530922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:23,968][134211] Avg episode reward: [(0, '6.496')] [2025-01-03 23:42:24,334][134294] Updated weights for policy 0, policy_version 33564 (0.0016) [2025-01-03 23:42:26,993][134294] Updated weights for policy 0, policy_version 33574 (0.0023) [2025-01-03 23:42:28,968][134211] Fps is (10 sec: 17611.3, 60 sec: 14745.4, 300 sec: 14967.7). Total num frames: 137535488. Throughput: 0: 3851.7. Samples: 23555934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:28,969][134211] Avg episode reward: [(0, '6.164')] [2025-01-03 23:42:30,830][134294] Updated weights for policy 0, policy_version 33584 (0.0026) [2025-01-03 23:42:33,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 137592832. Throughput: 0: 3803.5. Samples: 23564144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:33,969][134211] Avg episode reward: [(0, '6.276')] [2025-01-03 23:42:34,513][134294] Updated weights for policy 0, policy_version 33594 (0.0033) [2025-01-03 23:42:38,049][134294] Updated weights for policy 0, policy_version 33604 (0.0025) [2025-01-03 23:42:38,968][134211] Fps is (10 sec: 11469.6, 60 sec: 14473.0, 300 sec: 14759.5). Total num frames: 137650176. Throughput: 0: 3742.9. Samples: 23581118. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:38,968][134211] Avg episode reward: [(0, '5.438')] [2025-01-03 23:42:41,522][134294] Updated weights for policy 0, policy_version 33614 (0.0024) [2025-01-03 23:42:43,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14336.0, 300 sec: 14648.4). Total num frames: 137707520. Throughput: 0: 3698.3. Samples: 23598310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:43,969][134211] Avg episode reward: [(0, '5.740')] [2025-01-03 23:42:45,116][134294] Updated weights for policy 0, policy_version 33624 (0.0025) [2025-01-03 23:42:47,223][134294] Updated weights for policy 0, policy_version 33634 (0.0012) [2025-01-03 23:42:48,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14677.4, 300 sec: 14745.6). Total num frames: 137797632. Throughput: 0: 3712.4. Samples: 23609330. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:48,968][134211] Avg episode reward: [(0, '5.448')] [2025-01-03 23:42:49,407][134294] Updated weights for policy 0, policy_version 33644 (0.0013) [2025-01-03 23:42:51,295][134294] Updated weights for policy 0, policy_version 33654 (0.0013) [2025-01-03 23:42:53,956][134294] Updated weights for policy 0, policy_version 33664 (0.0022) [2025-01-03 23:42:53,968][134211] Fps is (10 sec: 18022.6, 60 sec: 15155.2, 300 sec: 14828.9). Total num frames: 137887744. Throughput: 0: 3823.4. Samples: 23639046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:42:53,968][134211] Avg episode reward: [(0, '5.269')] [2025-01-03 23:42:57,277][134294] Updated weights for policy 0, policy_version 33674 (0.0027) [2025-01-03 23:42:58,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15086.9, 300 sec: 14815.1). Total num frames: 137949184. Throughput: 0: 3805.1. Samples: 23658246. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:42:58,968][134211] Avg episode reward: [(0, '4.902')] [2025-01-03 23:43:00,473][134294] Updated weights for policy 0, policy_version 33684 (0.0026) [2025-01-03 23:43:03,491][134294] Updated weights for policy 0, policy_version 33694 (0.0026) [2025-01-03 23:43:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14813.8, 300 sec: 14815.0). Total num frames: 138014720. Throughput: 0: 3767.2. Samples: 23668256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:43:03,968][134211] Avg episode reward: [(0, '5.379')] [2025-01-03 23:43:06,488][134294] Updated weights for policy 0, policy_version 33704 (0.0027) [2025-01-03 23:43:08,969][134211] Fps is (10 sec: 12696.4, 60 sec: 14677.2, 300 sec: 14787.2). Total num frames: 138076160. Throughput: 0: 3490.5. Samples: 23688000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:43:08,969][134211] Avg episode reward: [(0, '5.678')] [2025-01-03 23:43:10,126][134294] Updated weights for policy 0, policy_version 33714 (0.0028) [2025-01-03 23:43:13,544][134294] Updated weights for policy 0, policy_version 33724 (0.0027) [2025-01-03 23:43:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 14759.5). Total num frames: 138137600. Throughput: 0: 3320.5. Samples: 23705356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:43:13,969][134211] Avg episode reward: [(0, '6.582')] [2025-01-03 23:43:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033725_138137600.pth... [2025-01-03 23:43:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032864_134610944.pth [2025-01-03 23:43:17,050][134294] Updated weights for policy 0, policy_version 33734 (0.0029) [2025-01-03 23:43:18,968][134211] Fps is (10 sec: 12289.5, 60 sec: 13994.7, 300 sec: 14745.6). Total num frames: 138199040. 
Throughput: 0: 3334.4. Samples: 23714192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:43:18,968][134211] Avg episode reward: [(0, '5.183')] [2025-01-03 23:43:19,734][134294] Updated weights for policy 0, policy_version 33744 (0.0017) [2025-01-03 23:43:21,659][134294] Updated weights for policy 0, policy_version 33754 (0.0014) [2025-01-03 23:43:23,525][134294] Updated weights for policy 0, policy_version 33764 (0.0014) [2025-01-03 23:43:23,967][134211] Fps is (10 sec: 16794.3, 60 sec: 13994.7, 300 sec: 14815.0). Total num frames: 138305536. Throughput: 0: 3534.9. Samples: 23740190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:43:23,968][134211] Avg episode reward: [(0, '5.314')] [2025-01-03 23:43:25,434][134294] Updated weights for policy 0, policy_version 33774 (0.0014) [2025-01-03 23:43:27,337][134294] Updated weights for policy 0, policy_version 33784 (0.0014) [2025-01-03 23:43:28,967][134211] Fps is (10 sec: 21299.3, 60 sec: 14609.3, 300 sec: 14801.1). Total num frames: 138412032. Throughput: 0: 3877.1. Samples: 23772780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:43:28,968][134211] Avg episode reward: [(0, '5.520')] [2025-01-03 23:43:29,160][134294] Updated weights for policy 0, policy_version 33794 (0.0014) [2025-01-03 23:43:31,596][134294] Updated weights for policy 0, policy_version 33804 (0.0020) [2025-01-03 23:43:33,968][134211] Fps is (10 sec: 18431.4, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 138489856. Throughput: 0: 3948.8. Samples: 23787026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:43:33,969][134211] Avg episode reward: [(0, '6.028')] [2025-01-03 23:43:34,851][134294] Updated weights for policy 0, policy_version 33814 (0.0032) [2025-01-03 23:43:38,119][134294] Updated weights for policy 0, policy_version 33824 (0.0029) [2025-01-03 23:43:38,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 138551296. Throughput: 0: 3715.4. 
Samples: 23806240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 23:43:38,968][134211] Avg episode reward: [(0, '5.822')] [2025-01-03 23:43:41,376][134294] Updated weights for policy 0, policy_version 33834 (0.0025) [2025-01-03 23:43:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15087.0, 300 sec: 14787.2). Total num frames: 138612736. Throughput: 0: 3689.6. Samples: 23824280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 23:43:43,968][134211] Avg episode reward: [(0, '6.524')] [2025-01-03 23:43:44,808][134294] Updated weights for policy 0, policy_version 33844 (0.0030) [2025-01-03 23:43:47,904][134294] Updated weights for policy 0, policy_version 33854 (0.0024) [2025-01-03 23:43:48,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14677.2, 300 sec: 14787.2). Total num frames: 138678272. Throughput: 0: 3682.2. Samples: 23833958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 23:43:48,969][134211] Avg episode reward: [(0, '6.884')] [2025-01-03 23:43:51,135][134294] Updated weights for policy 0, policy_version 33864 (0.0026) [2025-01-03 23:43:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14267.8, 300 sec: 14773.4). Total num frames: 138743808. Throughput: 0: 3673.6. Samples: 23853308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 23:43:53,968][134211] Avg episode reward: [(0, '7.080')] [2025-01-03 23:43:54,313][134294] Updated weights for policy 0, policy_version 33874 (0.0024) [2025-01-03 23:43:57,260][134294] Updated weights for policy 0, policy_version 33884 (0.0025) [2025-01-03 23:43:58,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14336.0, 300 sec: 14759.5). Total num frames: 138809344. Throughput: 0: 3732.6. Samples: 23873320. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-03 23:43:58,968][134211] Avg episode reward: [(0, '5.877')] [2025-01-03 23:44:00,203][134294] Updated weights for policy 0, policy_version 33894 (0.0024) [2025-01-03 23:44:03,251][134294] Updated weights for policy 0, policy_version 33904 (0.0027) [2025-01-03 23:44:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14787.3). Total num frames: 138878976. Throughput: 0: 3772.7. Samples: 23883964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:44:03,968][134211] Avg episode reward: [(0, '6.651')] [2025-01-03 23:44:06,250][134294] Updated weights for policy 0, policy_version 33914 (0.0025) [2025-01-03 23:44:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.8, 300 sec: 14703.9). Total num frames: 138944512. Throughput: 0: 3644.2. Samples: 23904180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:44:08,968][134211] Avg episode reward: [(0, '6.636')] [2025-01-03 23:44:09,303][134294] Updated weights for policy 0, policy_version 33924 (0.0024) [2025-01-03 23:44:12,252][134294] Updated weights for policy 0, policy_version 33934 (0.0026) [2025-01-03 23:44:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 14565.1). Total num frames: 139014144. Throughput: 0: 3374.9. Samples: 23924650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:44:13,968][134211] Avg episode reward: [(0, '6.709')] [2025-01-03 23:44:15,264][134294] Updated weights for policy 0, policy_version 33944 (0.0026) [2025-01-03 23:44:18,179][134294] Updated weights for policy 0, policy_version 33954 (0.0026) [2025-01-03 23:44:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 139083776. Throughput: 0: 3290.7. Samples: 23935106. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:44:18,968][134211] Avg episode reward: [(0, '6.690')] [2025-01-03 23:44:20,338][134294] Updated weights for policy 0, policy_version 33964 (0.0014) [2025-01-03 23:44:22,149][134294] Updated weights for policy 0, policy_version 33974 (0.0013) [2025-01-03 23:44:23,968][134211] Fps is (10 sec: 18022.8, 60 sec: 14813.8, 300 sec: 14731.7). Total num frames: 139194368. Throughput: 0: 3475.1. Samples: 23962620. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:44:23,968][134211] Avg episode reward: [(0, '6.346')] [2025-01-03 23:44:23,986][134294] Updated weights for policy 0, policy_version 33984 (0.0013) [2025-01-03 23:44:26,147][134294] Updated weights for policy 0, policy_version 33994 (0.0015) [2025-01-03 23:44:28,969][134211] Fps is (10 sec: 19248.9, 60 sec: 14404.0, 300 sec: 14787.2). Total num frames: 139276288. Throughput: 0: 3683.8. Samples: 23990054. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:44:28,969][134211] Avg episode reward: [(0, '6.165')] [2025-01-03 23:44:29,197][134294] Updated weights for policy 0, policy_version 34004 (0.0028) [2025-01-03 23:44:32,455][134294] Updated weights for policy 0, policy_version 34014 (0.0026) [2025-01-03 23:44:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14131.2, 300 sec: 14759.5). Total num frames: 139337728. Throughput: 0: 3675.6. Samples: 23999360. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:44:33,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-03 23:44:35,585][134294] Updated weights for policy 0, policy_version 34024 (0.0023) [2025-01-03 23:44:38,844][134294] Updated weights for policy 0, policy_version 34034 (0.0024) [2025-01-03 23:44:38,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14199.5, 300 sec: 14759.5). Total num frames: 139403264. Throughput: 0: 3676.8. Samples: 24018766. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:44:38,968][134211] Avg episode reward: [(0, '7.170')] [2025-01-03 23:44:42,187][134294] Updated weights for policy 0, policy_version 34044 (0.0025) [2025-01-03 23:44:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14745.6). Total num frames: 139464704. Throughput: 0: 3644.6. Samples: 24037326. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-03 23:44:43,968][134211] Avg episode reward: [(0, '6.479')] [2025-01-03 23:44:45,429][134294] Updated weights for policy 0, policy_version 34054 (0.0024) [2025-01-03 23:44:47,453][134294] Updated weights for policy 0, policy_version 34064 (0.0013) [2025-01-03 23:44:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.2, 300 sec: 14801.1). Total num frames: 139554816. Throughput: 0: 3648.0. Samples: 24048122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:44:48,968][134211] Avg episode reward: [(0, '6.412')] [2025-01-03 23:44:49,367][134294] Updated weights for policy 0, policy_version 34074 (0.0016) [2025-01-03 23:44:51,240][134294] Updated weights for policy 0, policy_version 34084 (0.0014) [2025-01-03 23:44:53,250][134294] Updated weights for policy 0, policy_version 34094 (0.0015) [2025-01-03 23:44:53,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15223.4, 300 sec: 14773.4). Total num frames: 139657216. Throughput: 0: 3915.3. Samples: 24080368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:44:53,968][134211] Avg episode reward: [(0, '6.522')] [2025-01-03 23:44:56,152][134294] Updated weights for policy 0, policy_version 34104 (0.0027) [2025-01-03 23:44:58,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15155.2, 300 sec: 14634.5). Total num frames: 139718656. Throughput: 0: 3927.6. Samples: 24101392. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:44:58,968][134211] Avg episode reward: [(0, '6.674')] [2025-01-03 23:44:59,859][134294] Updated weights for policy 0, policy_version 34114 (0.0026) [2025-01-03 23:45:02,929][134294] Updated weights for policy 0, policy_version 34124 (0.0027) [2025-01-03 23:45:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15086.9, 300 sec: 14648.4). Total num frames: 139784192. Throughput: 0: 3888.7. Samples: 24110100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:03,968][134211] Avg episode reward: [(0, '6.316')] [2025-01-03 23:45:06,226][134294] Updated weights for policy 0, policy_version 34134 (0.0024) [2025-01-03 23:45:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15018.6, 300 sec: 14634.5). Total num frames: 139845632. Throughput: 0: 3707.0. Samples: 24129434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:08,968][134211] Avg episode reward: [(0, '6.753')] [2025-01-03 23:45:09,773][134294] Updated weights for policy 0, policy_version 34144 (0.0028) [2025-01-03 23:45:13,359][134294] Updated weights for policy 0, policy_version 34154 (0.0028) [2025-01-03 23:45:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 139898880. Throughput: 0: 3474.9. Samples: 24146420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:13,968][134211] Avg episode reward: [(0, '6.290')] [2025-01-03 23:45:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034156_139902976.pth... 
[2025-01-03 23:45:14,095][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033299_136392704.pth [2025-01-03 23:45:16,509][134294] Updated weights for policy 0, policy_version 34164 (0.0022) [2025-01-03 23:45:18,699][134294] Updated weights for policy 0, policy_version 34174 (0.0017) [2025-01-03 23:45:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14620.6). Total num frames: 139976704. Throughput: 0: 3472.6. Samples: 24155626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:18,968][134211] Avg episode reward: [(0, '6.161')] [2025-01-03 23:45:21,732][134294] Updated weights for policy 0, policy_version 34184 (0.0023) [2025-01-03 23:45:23,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14199.4, 300 sec: 14620.6). Total num frames: 140046336. Throughput: 0: 3564.1. Samples: 24179150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:23,968][134211] Avg episode reward: [(0, '6.109')] [2025-01-03 23:45:24,804][134294] Updated weights for policy 0, policy_version 34194 (0.0024) [2025-01-03 23:45:26,974][134294] Updated weights for policy 0, policy_version 34204 (0.0014) [2025-01-03 23:45:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14199.7, 300 sec: 14676.2). Total num frames: 140128256. Throughput: 0: 3680.4. Samples: 24202946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:28,969][134211] Avg episode reward: [(0, '6.182')] [2025-01-03 23:45:29,872][134294] Updated weights for policy 0, policy_version 34214 (0.0023) [2025-01-03 23:45:33,327][134294] Updated weights for policy 0, policy_version 34224 (0.0025) [2025-01-03 23:45:33,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14131.0, 300 sec: 14648.4). Total num frames: 140185600. Throughput: 0: 3647.1. Samples: 24212246. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:33,969][134211] Avg episode reward: [(0, '6.947')] [2025-01-03 23:45:35,746][134294] Updated weights for policy 0, policy_version 34234 (0.0012) [2025-01-03 23:45:37,693][134294] Updated weights for policy 0, policy_version 34244 (0.0013) [2025-01-03 23:45:38,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 140288000. Throughput: 0: 3470.6. Samples: 24236544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:38,968][134211] Avg episode reward: [(0, '6.847')] [2025-01-03 23:45:39,572][134294] Updated weights for policy 0, policy_version 34254 (0.0013) [2025-01-03 23:45:41,493][134294] Updated weights for policy 0, policy_version 34264 (0.0014) [2025-01-03 23:45:43,968][134211] Fps is (10 sec: 19252.6, 60 sec: 15223.5, 300 sec: 14787.2). Total num frames: 140378112. Throughput: 0: 3657.5. Samples: 24265978. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:43,968][134211] Avg episode reward: [(0, '6.102')] [2025-01-03 23:45:44,347][134294] Updated weights for policy 0, policy_version 34274 (0.0023) [2025-01-03 23:45:47,803][134294] Updated weights for policy 0, policy_version 34284 (0.0026) [2025-01-03 23:45:48,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14745.6, 300 sec: 14620.6). Total num frames: 140439552. Throughput: 0: 3664.1. Samples: 24274984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:45:48,968][134211] Avg episode reward: [(0, '6.596')] [2025-01-03 23:45:51,176][134294] Updated weights for policy 0, policy_version 34294 (0.0026) [2025-01-03 23:45:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14537.3). Total num frames: 140500992. Throughput: 0: 3637.4. Samples: 24293118. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:45:53,969][134211] Avg episode reward: [(0, '6.573')] [2025-01-03 23:45:54,600][134294] Updated weights for policy 0, policy_version 34304 (0.0023) [2025-01-03 23:45:58,005][134294] Updated weights for policy 0, policy_version 34314 (0.0029) [2025-01-03 23:45:58,968][134211] Fps is (10 sec: 12287.3, 60 sec: 14062.8, 300 sec: 14551.2). Total num frames: 140562432. Throughput: 0: 3669.0. Samples: 24311528. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:45:58,969][134211] Avg episode reward: [(0, '6.812')] [2025-01-03 23:46:00,970][134294] Updated weights for policy 0, policy_version 34324 (0.0027) [2025-01-03 23:46:03,316][134294] Updated weights for policy 0, policy_version 34334 (0.0016) [2025-01-03 23:46:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14267.7, 300 sec: 14606.8). Total num frames: 140640256. Throughput: 0: 3687.9. Samples: 24321582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:46:03,968][134211] Avg episode reward: [(0, '7.182')] [2025-01-03 23:46:05,953][134294] Updated weights for policy 0, policy_version 34344 (0.0022) [2025-01-03 23:46:08,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14404.2, 300 sec: 14648.4). Total num frames: 140709888. Throughput: 0: 3693.8. Samples: 24345374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:46:08,969][134211] Avg episode reward: [(0, '6.326')] [2025-01-03 23:46:09,259][134294] Updated weights for policy 0, policy_version 34354 (0.0027) [2025-01-03 23:46:12,342][134294] Updated weights for policy 0, policy_version 34364 (0.0026) [2025-01-03 23:46:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 140771328. Throughput: 0: 3584.2. Samples: 24364236. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:13,969][134211] Avg episode reward: [(0, '6.614')] [2025-01-03 23:46:15,352][134294] Updated weights for policy 0, policy_version 34374 (0.0023) [2025-01-03 23:46:17,343][134294] Updated weights for policy 0, policy_version 34384 (0.0012) [2025-01-03 23:46:18,968][134211] Fps is (10 sec: 15975.0, 60 sec: 14882.2, 300 sec: 14620.6). Total num frames: 140869632. Throughput: 0: 3646.2. Samples: 24376320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:18,968][134211] Avg episode reward: [(0, '6.179')] [2025-01-03 23:46:19,233][134294] Updated weights for policy 0, policy_version 34394 (0.0012) [2025-01-03 23:46:21,172][134294] Updated weights for policy 0, policy_version 34404 (0.0013) [2025-01-03 23:46:23,333][134294] Updated weights for policy 0, policy_version 34414 (0.0018) [2025-01-03 23:46:23,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15360.0, 300 sec: 14634.5). Total num frames: 140967936. Throughput: 0: 3814.3. Samples: 24408186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:23,968][134211] Avg episode reward: [(0, '5.795')] [2025-01-03 23:46:26,693][134294] Updated weights for policy 0, policy_version 34424 (0.0030) [2025-01-03 23:46:28,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 141025280. Throughput: 0: 3588.7. Samples: 24427470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:28,968][134211] Avg episode reward: [(0, '5.836')] [2025-01-03 23:46:30,243][134294] Updated weights for policy 0, policy_version 34434 (0.0026) [2025-01-03 23:46:33,587][134294] Updated weights for policy 0, policy_version 34444 (0.0028) [2025-01-03 23:46:33,970][134211] Fps is (10 sec: 11466.3, 60 sec: 14950.0, 300 sec: 14579.0). Total num frames: 141082624. Throughput: 0: 3589.9. Samples: 24436536. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:33,970][134211] Avg episode reward: [(0, '6.285')] [2025-01-03 23:46:36,805][134294] Updated weights for policy 0, policy_version 34454 (0.0027) [2025-01-03 23:46:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14335.9, 300 sec: 14579.0). Total num frames: 141148160. Throughput: 0: 3604.6. Samples: 24455324. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:38,968][134211] Avg episode reward: [(0, '6.445')] [2025-01-03 23:46:40,064][134294] Updated weights for policy 0, policy_version 34464 (0.0030) [2025-01-03 23:46:43,492][134294] Updated weights for policy 0, policy_version 34474 (0.0026) [2025-01-03 23:46:43,968][134211] Fps is (10 sec: 12700.0, 60 sec: 13858.1, 300 sec: 14551.2). Total num frames: 141209600. Throughput: 0: 3599.9. Samples: 24473524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:43,969][134211] Avg episode reward: [(0, '6.607')] [2025-01-03 23:46:46,359][134294] Updated weights for policy 0, policy_version 34484 (0.0026) [2025-01-03 23:46:48,384][134294] Updated weights for policy 0, policy_version 34494 (0.0013) [2025-01-03 23:46:48,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14267.8, 300 sec: 14634.5). Total num frames: 141295616. Throughput: 0: 3609.8. Samples: 24484022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:48,968][134211] Avg episode reward: [(0, '6.305')] [2025-01-03 23:46:50,363][134294] Updated weights for policy 0, policy_version 34504 (0.0014) [2025-01-03 23:46:52,325][134294] Updated weights for policy 0, policy_version 34514 (0.0015) [2025-01-03 23:46:53,968][134211] Fps is (10 sec: 19251.9, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 141402112. Throughput: 0: 3767.6. Samples: 24514916. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:53,968][134211] Avg episode reward: [(0, '6.870')] [2025-01-03 23:46:54,289][134294] Updated weights for policy 0, policy_version 34524 (0.0014) [2025-01-03 23:46:56,958][134294] Updated weights for policy 0, policy_version 34534 (0.0023) [2025-01-03 23:46:58,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15155.3, 300 sec: 14731.7). Total num frames: 141471744. Throughput: 0: 3891.2. Samples: 24539338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:46:58,969][134211] Avg episode reward: [(0, '6.207')] [2025-01-03 23:47:00,411][134294] Updated weights for policy 0, policy_version 34544 (0.0029) [2025-01-03 23:47:03,611][134294] Updated weights for policy 0, policy_version 34554 (0.0026) [2025-01-03 23:47:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14950.4, 300 sec: 14717.9). Total num frames: 141537280. Throughput: 0: 3832.3. Samples: 24548774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:47:03,968][134211] Avg episode reward: [(0, '6.440')] [2025-01-03 23:47:06,773][134294] Updated weights for policy 0, policy_version 34564 (0.0026) [2025-01-03 23:47:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.7, 300 sec: 14676.2). Total num frames: 141594624. Throughput: 0: 3549.5. Samples: 24567912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:47:08,968][134211] Avg episode reward: [(0, '6.503')] [2025-01-03 23:47:10,643][134294] Updated weights for policy 0, policy_version 34574 (0.0028) [2025-01-03 23:47:13,968][134211] Fps is (10 sec: 11468.6, 60 sec: 14677.3, 300 sec: 14551.2). Total num frames: 141651968. Throughput: 0: 3486.3. Samples: 24584354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:47:13,969][134211] Avg episode reward: [(0, '6.510')] [2025-01-03 23:47:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034583_141651968.pth... 
[2025-01-03 23:47:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033725_138137600.pth [2025-01-03 23:47:14,163][134294] Updated weights for policy 0, policy_version 34584 (0.0029) [2025-01-03 23:47:17,758][134294] Updated weights for policy 0, policy_version 34594 (0.0027) [2025-01-03 23:47:18,967][134211] Fps is (10 sec: 11878.7, 60 sec: 14063.0, 300 sec: 14398.5). Total num frames: 141713408. Throughput: 0: 3476.4. Samples: 24592968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:47:18,968][134211] Avg episode reward: [(0, '6.323')] [2025-01-03 23:47:20,115][134294] Updated weights for policy 0, policy_version 34604 (0.0016) [2025-01-03 23:47:21,900][134294] Updated weights for policy 0, policy_version 34614 (0.0016) [2025-01-03 23:47:23,804][134294] Updated weights for policy 0, policy_version 34624 (0.0014) [2025-01-03 23:47:23,968][134211] Fps is (10 sec: 16794.1, 60 sec: 14199.5, 300 sec: 14523.5). Total num frames: 141819904. Throughput: 0: 3641.7. Samples: 24619202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:47:23,968][134211] Avg episode reward: [(0, '6.790')] [2025-01-03 23:47:25,705][134294] Updated weights for policy 0, policy_version 34634 (0.0013) [2025-01-03 23:47:28,581][134294] Updated weights for policy 0, policy_version 34644 (0.0025) [2025-01-03 23:47:28,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14677.3, 300 sec: 14620.6). Total num frames: 141905920. Throughput: 0: 3866.1. Samples: 24647498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:47:28,968][134211] Avg episode reward: [(0, '5.766')] [2025-01-03 23:47:31,632][134294] Updated weights for policy 0, policy_version 34654 (0.0028) [2025-01-03 23:47:33,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14814.4, 300 sec: 14648.4). Total num frames: 141971456. Throughput: 0: 3848.7. Samples: 24657216. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:47:33,969][134211] Avg episode reward: [(0, '7.265')] [2025-01-03 23:47:34,944][134294] Updated weights for policy 0, policy_version 34664 (0.0028) [2025-01-03 23:47:38,164][134294] Updated weights for policy 0, policy_version 34674 (0.0025) [2025-01-03 23:47:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 142032896. Throughput: 0: 3583.0. Samples: 24676152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:47:38,968][134211] Avg episode reward: [(0, '6.190')] [2025-01-03 23:47:41,276][134294] Updated weights for policy 0, policy_version 34684 (0.0026) [2025-01-03 23:47:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 142098432. Throughput: 0: 3477.3. Samples: 24695818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:47:43,969][134211] Avg episode reward: [(0, '6.702')] [2025-01-03 23:47:44,465][134294] Updated weights for policy 0, policy_version 34694 (0.0025) [2025-01-03 23:47:47,499][134294] Updated weights for policy 0, policy_version 34704 (0.0024) [2025-01-03 23:47:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14509.6). Total num frames: 142168064. Throughput: 0: 3492.2. Samples: 24705924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:47:48,968][134211] Avg episode reward: [(0, '6.746')] [2025-01-03 23:47:50,456][134294] Updated weights for policy 0, policy_version 34714 (0.0024) [2025-01-03 23:47:53,717][134294] Updated weights for policy 0, policy_version 34724 (0.0025) [2025-01-03 23:47:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13789.8, 300 sec: 14509.6). Total num frames: 142229504. Throughput: 0: 3507.9. Samples: 24725768. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:47:53,968][134211] Avg episode reward: [(0, '6.832')] [2025-01-03 23:47:56,936][134294] Updated weights for policy 0, policy_version 34734 (0.0025) [2025-01-03 23:47:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13721.6, 300 sec: 14509.6). Total num frames: 142295040. Throughput: 0: 3561.4. Samples: 24744614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:47:58,968][134211] Avg episode reward: [(0, '6.677')] [2025-01-03 23:48:00,283][134294] Updated weights for policy 0, policy_version 34744 (0.0024) [2025-01-03 23:48:02,209][134294] Updated weights for policy 0, policy_version 34754 (0.0015) [2025-01-03 23:48:03,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14199.5, 300 sec: 14620.7). Total num frames: 142389248. Throughput: 0: 3622.7. Samples: 24755992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:48:03,968][134211] Avg episode reward: [(0, '6.419')] [2025-01-03 23:48:04,125][134294] Updated weights for policy 0, policy_version 34764 (0.0014) [2025-01-03 23:48:06,737][134294] Updated weights for policy 0, policy_version 34774 (0.0023) [2025-01-03 23:48:08,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14472.6, 300 sec: 14662.3). Total num frames: 142462976. Throughput: 0: 3643.5. Samples: 24783158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:48:08,968][134211] Avg episode reward: [(0, '5.902')] [2025-01-03 23:48:09,864][134294] Updated weights for policy 0, policy_version 34784 (0.0026) [2025-01-03 23:48:12,799][134294] Updated weights for policy 0, policy_version 34794 (0.0028) [2025-01-03 23:48:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14609.1, 300 sec: 14676.2). Total num frames: 142528512. Throughput: 0: 3461.3. Samples: 24803256. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:48:13,968][134211] Avg episode reward: [(0, '6.503')] [2025-01-03 23:48:15,966][134294] Updated weights for policy 0, policy_version 34804 (0.0027) [2025-01-03 23:48:18,866][134294] Updated weights for policy 0, policy_version 34814 (0.0025) [2025-01-03 23:48:18,969][134211] Fps is (10 sec: 13515.0, 60 sec: 14745.2, 300 sec: 14551.1). Total num frames: 142598144. Throughput: 0: 3468.2. Samples: 24813288. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:48:18,970][134211] Avg episode reward: [(0, '6.886')] [2025-01-03 23:48:21,919][134294] Updated weights for policy 0, policy_version 34824 (0.0024) [2025-01-03 23:48:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14131.2, 300 sec: 14426.2). Total num frames: 142667776. Throughput: 0: 3504.3. Samples: 24833844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:48:23,968][134211] Avg episode reward: [(0, '6.016')] [2025-01-03 23:48:24,534][134294] Updated weights for policy 0, policy_version 34834 (0.0020) [2025-01-03 23:48:26,392][134294] Updated weights for policy 0, policy_version 34844 (0.0013) [2025-01-03 23:48:28,601][134294] Updated weights for policy 0, policy_version 34854 (0.0019) [2025-01-03 23:48:28,968][134211] Fps is (10 sec: 16795.6, 60 sec: 14336.0, 300 sec: 14495.7). Total num frames: 142766080. Throughput: 0: 3697.9. Samples: 24862224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:48:28,968][134211] Avg episode reward: [(0, '6.853')] [2025-01-03 23:48:31,585][134294] Updated weights for policy 0, policy_version 34864 (0.0024) [2025-01-03 23:48:33,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14336.0, 300 sec: 14509.6). Total num frames: 142831616. Throughput: 0: 3698.9. Samples: 24872374. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:48:33,968][134211] Avg episode reward: [(0, '6.454')] [2025-01-03 23:48:34,812][134294] Updated weights for policy 0, policy_version 34874 (0.0027) [2025-01-03 23:48:37,878][134294] Updated weights for policy 0, policy_version 34884 (0.0028) [2025-01-03 23:48:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14404.2, 300 sec: 14523.4). Total num frames: 142897152. Throughput: 0: 3695.7. Samples: 24892074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:48:38,969][134211] Avg episode reward: [(0, '6.557')] [2025-01-03 23:48:41,061][134294] Updated weights for policy 0, policy_version 34894 (0.0023) [2025-01-03 23:48:43,719][134294] Updated weights for policy 0, policy_version 34904 (0.0021) [2025-01-03 23:48:43,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14540.9, 300 sec: 14551.2). Total num frames: 142970880. Throughput: 0: 3717.7. Samples: 24911912. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:48:43,968][134211] Avg episode reward: [(0, '6.476')] [2025-01-03 23:48:45,653][134294] Updated weights for policy 0, policy_version 34914 (0.0014) [2025-01-03 23:48:47,537][134294] Updated weights for policy 0, policy_version 34924 (0.0015) [2025-01-03 23:48:48,968][134211] Fps is (10 sec: 18023.1, 60 sec: 15155.2, 300 sec: 14690.1). Total num frames: 143077376. Throughput: 0: 3823.0. Samples: 24928028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:48:48,968][134211] Avg episode reward: [(0, '6.953')] [2025-01-03 23:48:49,405][134294] Updated weights for policy 0, policy_version 34934 (0.0014) [2025-01-03 23:48:51,534][134294] Updated weights for policy 0, policy_version 34944 (0.0018) [2025-01-03 23:48:53,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15496.5, 300 sec: 14745.6). Total num frames: 143159296. Throughput: 0: 3886.4. Samples: 24958046. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:48:53,969][134211] Avg episode reward: [(0, '7.244')] [2025-01-03 23:48:54,655][134294] Updated weights for policy 0, policy_version 34954 (0.0025) [2025-01-03 23:48:57,761][134294] Updated weights for policy 0, policy_version 34964 (0.0027) [2025-01-03 23:48:58,968][134211] Fps is (10 sec: 14744.7, 60 sec: 15496.4, 300 sec: 14731.7). Total num frames: 143224832. Throughput: 0: 3873.9. Samples: 24977584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:48:58,969][134211] Avg episode reward: [(0, '7.099')] [2025-01-03 23:49:00,881][134294] Updated weights for policy 0, policy_version 34974 (0.0027) [2025-01-03 23:49:03,918][134294] Updated weights for policy 0, policy_version 34984 (0.0027) [2025-01-03 23:49:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15086.9, 300 sec: 14745.6). Total num frames: 143294464. Throughput: 0: 3872.6. Samples: 24987552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:49:03,968][134211] Avg episode reward: [(0, '6.486')] [2025-01-03 23:49:06,903][134294] Updated weights for policy 0, policy_version 34994 (0.0025) [2025-01-03 23:49:08,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 143355904. Throughput: 0: 3864.2. Samples: 25007734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:49:08,968][134211] Avg episode reward: [(0, '5.893')] [2025-01-03 23:49:10,503][134294] Updated weights for policy 0, policy_version 35004 (0.0023) [2025-01-03 23:49:13,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 143413248. Throughput: 0: 3620.9. Samples: 25025164. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:13,968][134211] Avg episode reward: [(0, '7.203')] [2025-01-03 23:49:13,975][134294] Updated weights for policy 0, policy_version 35014 (0.0028) [2025-01-03 23:49:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035014_143417344.pth... [2025-01-03 23:49:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034156_139902976.pth [2025-01-03 23:49:17,299][134294] Updated weights for policy 0, policy_version 35024 (0.0026) [2025-01-03 23:49:18,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14609.4, 300 sec: 14509.6). Total num frames: 143474688. Throughput: 0: 3592.1. Samples: 25034016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:18,968][134211] Avg episode reward: [(0, '6.266')] [2025-01-03 23:49:20,250][134294] Updated weights for policy 0, policy_version 35034 (0.0022) [2025-01-03 23:49:22,179][134294] Updated weights for policy 0, policy_version 35044 (0.0014) [2025-01-03 23:49:23,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15155.2, 300 sec: 14579.0). Total num frames: 143577088. Throughput: 0: 3697.3. Samples: 25058452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:23,968][134211] Avg episode reward: [(0, '6.189')] [2025-01-03 23:49:24,019][134294] Updated weights for policy 0, policy_version 35054 (0.0015) [2025-01-03 23:49:25,931][134294] Updated weights for policy 0, policy_version 35064 (0.0013) [2025-01-03 23:49:28,540][134294] Updated weights for policy 0, policy_version 35074 (0.0022) [2025-01-03 23:49:28,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15018.7, 300 sec: 14676.2). Total num frames: 143667200. Throughput: 0: 3910.8. Samples: 25087900. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:28,968][134211] Avg episode reward: [(0, '6.380')] [2025-01-03 23:49:31,770][134294] Updated weights for policy 0, policy_version 35084 (0.0025) [2025-01-03 23:49:33,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 143728640. Throughput: 0: 3756.3. Samples: 25097064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:33,968][134211] Avg episode reward: [(0, '6.520')] [2025-01-03 23:49:35,025][134294] Updated weights for policy 0, policy_version 35094 (0.0028) [2025-01-03 23:49:38,181][134294] Updated weights for policy 0, policy_version 35104 (0.0027) [2025-01-03 23:49:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14950.5, 300 sec: 14676.2). Total num frames: 143794176. Throughput: 0: 3522.3. Samples: 25116548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:49:38,968][134211] Avg episode reward: [(0, '6.465')] [2025-01-03 23:49:41,215][134294] Updated weights for policy 0, policy_version 35114 (0.0026) [2025-01-03 23:49:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.8, 300 sec: 14592.9). Total num frames: 143859712. Throughput: 0: 3524.3. Samples: 25136178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:49:43,969][134211] Avg episode reward: [(0, '6.376')] [2025-01-03 23:49:44,391][134294] Updated weights for policy 0, policy_version 35124 (0.0027) [2025-01-03 23:49:47,398][134294] Updated weights for policy 0, policy_version 35134 (0.0024) [2025-01-03 23:49:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.2, 300 sec: 14467.9). Total num frames: 143925248. Throughput: 0: 3526.1. Samples: 25146224. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:49:48,968][134211] Avg episode reward: [(0, '7.320')] [2025-01-03 23:49:50,419][134294] Updated weights for policy 0, policy_version 35144 (0.0026) [2025-01-03 23:49:53,644][134294] Updated weights for policy 0, policy_version 35154 (0.0026) [2025-01-03 23:49:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14495.7). Total num frames: 143994880. Throughput: 0: 3525.8. Samples: 25166394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:49:53,969][134211] Avg episode reward: [(0, '6.373')] [2025-01-03 23:49:56,148][134294] Updated weights for policy 0, policy_version 35164 (0.0021) [2025-01-03 23:49:58,051][134294] Updated weights for policy 0, policy_version 35174 (0.0013) [2025-01-03 23:49:58,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14404.4, 300 sec: 14592.9). Total num frames: 144089088. Throughput: 0: 3710.6. Samples: 25192140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:49:58,968][134211] Avg episode reward: [(0, '7.064')] [2025-01-03 23:49:59,970][134294] Updated weights for policy 0, policy_version 35184 (0.0015) [2025-01-03 23:50:03,010][134294] Updated weights for policy 0, policy_version 35194 (0.0026) [2025-01-03 23:50:03,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 144162816. Throughput: 0: 3819.5. Samples: 25205894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:03,968][134211] Avg episode reward: [(0, '6.648')] [2025-01-03 23:50:06,101][134294] Updated weights for policy 0, policy_version 35204 (0.0025) [2025-01-03 23:50:08,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 144232448. Throughput: 0: 3714.2. Samples: 25225590. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:08,968][134211] Avg episode reward: [(0, '6.988')] [2025-01-03 23:50:09,345][134294] Updated weights for policy 0, policy_version 35214 (0.0027) [2025-01-03 23:50:12,292][134294] Updated weights for policy 0, policy_version 35224 (0.0026) [2025-01-03 23:50:13,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14745.4, 300 sec: 14648.4). Total num frames: 144297984. Throughput: 0: 3499.6. Samples: 25245386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:13,969][134211] Avg episode reward: [(0, '7.420')] [2025-01-03 23:50:15,475][134294] Updated weights for policy 0, policy_version 35234 (0.0025) [2025-01-03 23:50:17,895][134294] Updated weights for policy 0, policy_version 35244 (0.0017) [2025-01-03 23:50:18,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15087.0, 300 sec: 14690.1). Total num frames: 144379904. Throughput: 0: 3516.4. Samples: 25255302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:18,968][134211] Avg episode reward: [(0, '7.007')] [2025-01-03 23:50:20,183][134294] Updated weights for policy 0, policy_version 35254 (0.0018) [2025-01-03 23:50:23,114][134294] Updated weights for policy 0, policy_version 35264 (0.0026) [2025-01-03 23:50:23,968][134211] Fps is (10 sec: 15156.2, 60 sec: 14540.7, 300 sec: 14648.4). Total num frames: 144449536. Throughput: 0: 3643.4. Samples: 25280500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:23,968][134211] Avg episode reward: [(0, '6.577')] [2025-01-03 23:50:26,062][134294] Updated weights for policy 0, policy_version 35274 (0.0024) [2025-01-03 23:50:28,968][134211] Fps is (10 sec: 13925.3, 60 sec: 14199.3, 300 sec: 14690.1). Total num frames: 144519168. Throughput: 0: 3656.0. Samples: 25300700. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:28,969][134211] Avg episode reward: [(0, '6.616')] [2025-01-03 23:50:29,208][134294] Updated weights for policy 0, policy_version 35284 (0.0024) [2025-01-03 23:50:31,462][134294] Updated weights for policy 0, policy_version 35294 (0.0015) [2025-01-03 23:50:33,370][134294] Updated weights for policy 0, policy_version 35304 (0.0013) [2025-01-03 23:50:33,967][134211] Fps is (10 sec: 16794.0, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 144617472. Throughput: 0: 3699.5. Samples: 25312702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:33,968][134211] Avg episode reward: [(0, '6.276')] [2025-01-03 23:50:35,609][134294] Updated weights for policy 0, policy_version 35314 (0.0018) [2025-01-03 23:50:38,679][134294] Updated weights for policy 0, policy_version 35324 (0.0024) [2025-01-03 23:50:38,968][134211] Fps is (10 sec: 17204.5, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 144691200. Throughput: 0: 3864.5. Samples: 25340296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:38,968][134211] Avg episode reward: [(0, '6.281')] [2025-01-03 23:50:41,724][134294] Updated weights for policy 0, policy_version 35334 (0.0025) [2025-01-03 23:50:43,968][134211] Fps is (10 sec: 13515.7, 60 sec: 14882.0, 300 sec: 14620.6). Total num frames: 144752640. Throughput: 0: 3720.7. Samples: 25359574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:43,969][134211] Avg episode reward: [(0, '6.607')] [2025-01-03 23:50:44,947][134294] Updated weights for policy 0, policy_version 35344 (0.0028) [2025-01-03 23:50:47,953][134294] Updated weights for policy 0, policy_version 35354 (0.0026) [2025-01-03 23:50:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.4, 300 sec: 14648.4). Total num frames: 144822272. Throughput: 0: 3635.7. Samples: 25369498. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:48,969][134211] Avg episode reward: [(0, '6.384')] [2025-01-03 23:50:50,855][134294] Updated weights for policy 0, policy_version 35364 (0.0022) [2025-01-03 23:50:52,848][134294] Updated weights for policy 0, policy_version 35374 (0.0012) [2025-01-03 23:50:53,968][134211] Fps is (10 sec: 15975.6, 60 sec: 15291.8, 300 sec: 14745.6). Total num frames: 144912384. Throughput: 0: 3718.0. Samples: 25392900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:53,968][134211] Avg episode reward: [(0, '6.642')] [2025-01-03 23:50:54,766][134294] Updated weights for policy 0, policy_version 35384 (0.0014) [2025-01-03 23:50:56,700][134294] Updated weights for policy 0, policy_version 35394 (0.0013) [2025-01-03 23:50:58,846][134294] Updated weights for policy 0, policy_version 35404 (0.0016) [2025-01-03 23:50:58,968][134211] Fps is (10 sec: 19250.6, 60 sec: 15428.1, 300 sec: 14828.9). Total num frames: 145014784. Throughput: 0: 3976.3. Samples: 25424316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:50:58,969][134211] Avg episode reward: [(0, '6.782')] [2025-01-03 23:51:01,915][134294] Updated weights for policy 0, policy_version 35414 (0.0026) [2025-01-03 23:51:03,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15223.5, 300 sec: 14801.2). Total num frames: 145076224. Throughput: 0: 3983.9. Samples: 25434578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:51:03,968][134211] Avg episode reward: [(0, '6.629')] [2025-01-03 23:51:05,364][134294] Updated weights for policy 0, policy_version 35424 (0.0029) [2025-01-03 23:51:08,648][134294] Updated weights for policy 0, policy_version 35434 (0.0022) [2025-01-03 23:51:08,968][134211] Fps is (10 sec: 12288.4, 60 sec: 15086.9, 300 sec: 14801.1). Total num frames: 145137664. Throughput: 0: 3841.5. Samples: 25453368. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:08,968][134211] Avg episode reward: [(0, '6.294')] [2025-01-03 23:51:12,209][134294] Updated weights for policy 0, policy_version 35444 (0.0027) [2025-01-03 23:51:13,968][134211] Fps is (10 sec: 11878.1, 60 sec: 14950.5, 300 sec: 14662.3). Total num frames: 145195008. Throughput: 0: 3771.7. Samples: 25470426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:13,969][134211] Avg episode reward: [(0, '6.621')] [2025-01-03 23:51:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035448_145195008.pth... [2025-01-03 23:51:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034583_141651968.pth [2025-01-03 23:51:15,932][134294] Updated weights for policy 0, policy_version 35454 (0.0029) [2025-01-03 23:51:18,396][134294] Updated weights for policy 0, policy_version 35464 (0.0017) [2025-01-03 23:51:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 145268736. Throughput: 0: 3691.3. Samples: 25478810. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:18,968][134211] Avg episode reward: [(0, '6.855')] [2025-01-03 23:51:20,341][134294] Updated weights for policy 0, policy_version 35474 (0.0012) [2025-01-03 23:51:23,004][134294] Updated weights for policy 0, policy_version 35484 (0.0023) [2025-01-03 23:51:23,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14662.3). Total num frames: 145350656. Throughput: 0: 3685.2. Samples: 25506132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:23,968][134211] Avg episode reward: [(0, '6.944')] [2025-01-03 23:51:26,075][134294] Updated weights for policy 0, policy_version 35494 (0.0027) [2025-01-03 23:51:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15018.8, 300 sec: 14704.1). Total num frames: 145420288. 
Throughput: 0: 3694.1. Samples: 25525806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:28,968][134211] Avg episode reward: [(0, '6.625')] [2025-01-03 23:51:29,261][134294] Updated weights for policy 0, policy_version 35504 (0.0028) [2025-01-03 23:51:32,270][134294] Updated weights for policy 0, policy_version 35514 (0.0026) [2025-01-03 23:51:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14703.9). Total num frames: 145485824. Throughput: 0: 3693.2. Samples: 25535692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:33,968][134211] Avg episode reward: [(0, '6.658')] [2025-01-03 23:51:35,292][134294] Updated weights for policy 0, policy_version 35524 (0.0027) [2025-01-03 23:51:38,255][134294] Updated weights for policy 0, policy_version 35534 (0.0026) [2025-01-03 23:51:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 145555456. Throughput: 0: 3630.8. Samples: 25556284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:38,968][134211] Avg episode reward: [(0, '6.948')] [2025-01-03 23:51:41,356][134294] Updated weights for policy 0, policy_version 35544 (0.0022) [2025-01-03 23:51:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14541.0, 300 sec: 14676.2). Total num frames: 145625088. Throughput: 0: 3378.5. Samples: 25576346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:43,968][134211] Avg episode reward: [(0, '5.694')] [2025-01-03 23:51:44,216][134294] Updated weights for policy 0, policy_version 35554 (0.0023) [2025-01-03 23:51:46,144][134294] Updated weights for policy 0, policy_version 35564 (0.0014) [2025-01-03 23:51:48,019][134294] Updated weights for policy 0, policy_version 35574 (0.0013) [2025-01-03 23:51:48,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15087.0, 300 sec: 14662.3). Total num frames: 145727488. Throughput: 0: 3474.0. Samples: 25590908. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:51:48,968][134211] Avg episode reward: [(0, '6.872')] [2025-01-03 23:51:50,143][134294] Updated weights for policy 0, policy_version 35584 (0.0016) [2025-01-03 23:51:53,548][134294] Updated weights for policy 0, policy_version 35594 (0.0029) [2025-01-03 23:51:53,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 145797120. Throughput: 0: 3648.2. Samples: 25617538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:51:53,968][134211] Avg episode reward: [(0, '6.663')] [2025-01-03 23:51:56,974][134294] Updated weights for policy 0, policy_version 35604 (0.0027) [2025-01-03 23:51:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13994.7, 300 sec: 14634.5). Total num frames: 145854464. Throughput: 0: 3661.6. Samples: 25635196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:51:58,968][134211] Avg episode reward: [(0, '7.098')] [2025-01-03 23:52:00,346][134294] Updated weights for policy 0, policy_version 35614 (0.0027) [2025-01-03 23:52:03,380][134294] Updated weights for policy 0, policy_version 35624 (0.0027) [2025-01-03 23:52:03,968][134211] Fps is (10 sec: 12287.1, 60 sec: 14062.8, 300 sec: 14662.3). Total num frames: 145920000. Throughput: 0: 3692.2. Samples: 25644964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:03,969][134211] Avg episode reward: [(0, '6.440')] [2025-01-03 23:52:06,452][134294] Updated weights for policy 0, policy_version 35634 (0.0028) [2025-01-03 23:52:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14690.1). Total num frames: 145985536. Throughput: 0: 3523.5. Samples: 25664688. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:08,968][134211] Avg episode reward: [(0, '6.572')] [2025-01-03 23:52:09,466][134294] Updated weights for policy 0, policy_version 35644 (0.0023) [2025-01-03 23:52:11,481][134294] Updated weights for policy 0, policy_version 35654 (0.0013) [2025-01-03 23:52:13,968][134211] Fps is (10 sec: 15156.4, 60 sec: 14609.2, 300 sec: 14773.4). Total num frames: 146071552. Throughput: 0: 3634.0. Samples: 25689338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:13,968][134211] Avg episode reward: [(0, '6.130')] [2025-01-03 23:52:14,251][134294] Updated weights for policy 0, policy_version 35664 (0.0024) [2025-01-03 23:52:17,353][134294] Updated weights for policy 0, policy_version 35674 (0.0027) [2025-01-03 23:52:18,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 146141184. Throughput: 0: 3635.7. Samples: 25699296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:52:18,968][134211] Avg episode reward: [(0, '7.438')] [2025-01-03 23:52:19,984][134294] Updated weights for policy 0, policy_version 35684 (0.0022) [2025-01-03 23:52:22,170][134294] Updated weights for policy 0, policy_version 35694 (0.0017) [2025-01-03 23:52:23,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14540.8, 300 sec: 14634.5). Total num frames: 146223104. Throughput: 0: 3723.5. Samples: 25723844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:52:23,968][134211] Avg episode reward: [(0, '6.950')] [2025-01-03 23:52:25,211][134294] Updated weights for policy 0, policy_version 35704 (0.0029) [2025-01-03 23:52:28,170][134294] Updated weights for policy 0, policy_version 35714 (0.0026) [2025-01-03 23:52:28,968][134211] Fps is (10 sec: 15154.3, 60 sec: 14540.7, 300 sec: 14648.4). Total num frames: 146292736. Throughput: 0: 3733.5. Samples: 25744354. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:52:28,969][134211] Avg episode reward: [(0, '6.623')] [2025-01-03 23:52:31,154][134294] Updated weights for policy 0, policy_version 35724 (0.0023) [2025-01-03 23:52:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.1, 300 sec: 14676.2). Total num frames: 146362368. Throughput: 0: 3638.4. Samples: 25754636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:52:33,968][134211] Avg episode reward: [(0, '6.093')] [2025-01-03 23:52:34,283][134294] Updated weights for policy 0, policy_version 35734 (0.0025) [2025-01-03 23:52:36,290][134294] Updated weights for policy 0, policy_version 35744 (0.0013) [2025-01-03 23:52:38,192][134294] Updated weights for policy 0, policy_version 35754 (0.0012) [2025-01-03 23:52:38,968][134211] Fps is (10 sec: 16794.4, 60 sec: 15086.9, 300 sec: 14787.3). Total num frames: 146460672. Throughput: 0: 3609.1. Samples: 25779948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:38,968][134211] Avg episode reward: [(0, '6.493')] [2025-01-03 23:52:40,098][134294] Updated weights for policy 0, policy_version 35764 (0.0014) [2025-01-03 23:52:42,029][134294] Updated weights for policy 0, policy_version 35774 (0.0015) [2025-01-03 23:52:43,968][134211] Fps is (10 sec: 19660.7, 60 sec: 15564.8, 300 sec: 14884.4). Total num frames: 146558976. Throughput: 0: 3899.6. Samples: 25810680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:43,968][134211] Avg episode reward: [(0, '6.675')] [2025-01-03 23:52:44,844][134294] Updated weights for policy 0, policy_version 35784 (0.0023) [2025-01-03 23:52:48,004][134294] Updated weights for policy 0, policy_version 35794 (0.0028) [2025-01-03 23:52:48,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14950.3, 300 sec: 14898.3). Total num frames: 146624512. Throughput: 0: 3904.5. Samples: 25820664. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:48,968][134211] Avg episode reward: [(0, '6.453')] [2025-01-03 23:52:51,261][134294] Updated weights for policy 0, policy_version 35804 (0.0026) [2025-01-03 23:52:53,968][134211] Fps is (10 sec: 12287.4, 60 sec: 14745.5, 300 sec: 14870.5). Total num frames: 146681856. Throughput: 0: 3875.8. Samples: 25839102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:53,969][134211] Avg episode reward: [(0, '6.330')] [2025-01-03 23:52:54,786][134294] Updated weights for policy 0, policy_version 35814 (0.0029) [2025-01-03 23:52:57,969][134294] Updated weights for policy 0, policy_version 35824 (0.0026) [2025-01-03 23:52:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 146747392. Throughput: 0: 3738.8. Samples: 25857586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:52:58,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-03 23:53:00,985][134294] Updated weights for policy 0, policy_version 35834 (0.0027) [2025-01-03 23:53:03,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14882.3, 300 sec: 14745.6). Total num frames: 146812928. Throughput: 0: 3744.3. Samples: 25867788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:53:03,969][134211] Avg episode reward: [(0, '6.105')] [2025-01-03 23:53:04,196][134294] Updated weights for policy 0, policy_version 35844 (0.0026) [2025-01-03 23:53:07,135][134294] Updated weights for policy 0, policy_version 35854 (0.0025) [2025-01-03 23:53:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14882.2, 300 sec: 14745.6). Total num frames: 146878464. Throughput: 0: 3642.0. Samples: 25887732. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:53:08,968][134211] Avg episode reward: [(0, '6.841')] [2025-01-03 23:53:10,445][134294] Updated weights for policy 0, policy_version 35864 (0.0027) [2025-01-03 23:53:12,526][134294] Updated weights for policy 0, policy_version 35874 (0.0013) [2025-01-03 23:53:13,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14882.2, 300 sec: 14801.2). Total num frames: 146964480. Throughput: 0: 3706.3. Samples: 25911134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:53:13,968][134211] Avg episode reward: [(0, '6.687')] [2025-01-03 23:53:14,010][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035881_146968576.pth... [2025-01-03 23:53:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035014_143417344.pth [2025-01-03 23:53:14,653][134294] Updated weights for policy 0, policy_version 35884 (0.0013) [2025-01-03 23:53:16,484][134294] Updated weights for policy 0, policy_version 35894 (0.0014) [2025-01-03 23:53:18,386][134294] Updated weights for policy 0, policy_version 35904 (0.0014) [2025-01-03 23:53:18,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15564.8, 300 sec: 14940.0). Total num frames: 147075072. Throughput: 0: 3828.9. Samples: 25926934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:53:18,968][134211] Avg episode reward: [(0, '6.842')] [2025-01-03 23:53:20,675][134294] Updated weights for policy 0, policy_version 35914 (0.0018) [2025-01-03 23:53:23,853][134294] Updated weights for policy 0, policy_version 35924 (0.0026) [2025-01-03 23:53:23,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15360.0, 300 sec: 14842.8). Total num frames: 147144704. Throughput: 0: 3869.9. Samples: 25954094. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:53:23,968][134211] Avg episode reward: [(0, '6.761')] [2025-01-03 23:53:26,831][134294] Updated weights for policy 0, policy_version 35934 (0.0026) [2025-01-03 23:53:28,969][134211] Fps is (10 sec: 13515.3, 60 sec: 15291.6, 300 sec: 14842.8). Total num frames: 147210240. Throughput: 0: 3625.2. Samples: 25973818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:53:28,969][134211] Avg episode reward: [(0, '6.744')] [2025-01-03 23:53:30,164][134294] Updated weights for policy 0, policy_version 35944 (0.0026) [2025-01-03 23:53:33,376][134294] Updated weights for policy 0, policy_version 35954 (0.0026) [2025-01-03 23:53:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15155.2, 300 sec: 14828.9). Total num frames: 147271680. Throughput: 0: 3606.5. Samples: 25982956. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:53:33,968][134211] Avg episode reward: [(0, '6.110')] [2025-01-03 23:53:36,441][134294] Updated weights for policy 0, policy_version 35964 (0.0027) [2025-01-03 23:53:38,968][134211] Fps is (10 sec: 13108.4, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 147341312. Throughput: 0: 3633.9. Samples: 26002628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:53:38,968][134211] Avg episode reward: [(0, '6.384')] [2025-01-03 23:53:39,631][134294] Updated weights for policy 0, policy_version 35974 (0.0026) [2025-01-03 23:53:42,647][134294] Updated weights for policy 0, policy_version 35984 (0.0025) [2025-01-03 23:53:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14676.2). Total num frames: 147406848. Throughput: 0: 3666.4. Samples: 26022576. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:53:43,968][134211] Avg episode reward: [(0, '6.390')] [2025-01-03 23:53:45,555][134294] Updated weights for policy 0, policy_version 35994 (0.0022) [2025-01-03 23:53:48,647][134294] Updated weights for policy 0, policy_version 36004 (0.0026) [2025-01-03 23:53:48,968][134211] Fps is (10 sec: 13106.4, 60 sec: 14131.1, 300 sec: 14620.6). Total num frames: 147472384. Throughput: 0: 3669.2. Samples: 26032902. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:53:48,969][134211] Avg episode reward: [(0, '6.516')] [2025-01-03 23:53:51,724][134294] Updated weights for policy 0, policy_version 36014 (0.0025) [2025-01-03 23:53:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.1, 300 sec: 14634.5). Total num frames: 147542016. Throughput: 0: 3671.8. Samples: 26052966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:53:53,968][134211] Avg episode reward: [(0, '6.511')] [2025-01-03 23:53:54,934][134294] Updated weights for policy 0, policy_version 36024 (0.0026) [2025-01-03 23:53:56,864][134294] Updated weights for policy 0, policy_version 36034 (0.0013) [2025-01-03 23:53:58,735][134294] Updated weights for policy 0, policy_version 36044 (0.0013) [2025-01-03 23:53:58,968][134211] Fps is (10 sec: 16794.3, 60 sec: 14882.1, 300 sec: 14731.7). Total num frames: 147640320. Throughput: 0: 3732.8. Samples: 26079110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:53:58,968][134211] Avg episode reward: [(0, '5.763')] [2025-01-03 23:54:00,631][134294] Updated weights for policy 0, policy_version 36054 (0.0015) [2025-01-03 23:54:02,579][134294] Updated weights for policy 0, policy_version 36064 (0.0013) [2025-01-03 23:54:03,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15428.3, 300 sec: 14856.7). Total num frames: 147738624. Throughput: 0: 3745.7. Samples: 26095490. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:54:03,968][134211] Avg episode reward: [(0, '6.124')] [2025-01-03 23:54:05,422][134294] Updated weights for policy 0, policy_version 36074 (0.0022) [2025-01-03 23:54:08,600][134294] Updated weights for policy 0, policy_version 36084 (0.0028) [2025-01-03 23:54:08,968][134211] Fps is (10 sec: 16384.5, 60 sec: 15428.3, 300 sec: 14884.4). Total num frames: 147804160. Throughput: 0: 3662.6. Samples: 26118912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:54:08,968][134211] Avg episode reward: [(0, '6.523')] [2025-01-03 23:54:11,785][134294] Updated weights for policy 0, policy_version 36094 (0.0024) [2025-01-03 23:54:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15086.9, 300 sec: 14898.3). Total num frames: 147869696. Throughput: 0: 3650.7. Samples: 26138098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:13,969][134211] Avg episode reward: [(0, '6.134')] [2025-01-03 23:54:14,892][134294] Updated weights for policy 0, policy_version 36104 (0.0024) [2025-01-03 23:54:18,274][134294] Updated weights for policy 0, policy_version 36114 (0.0026) [2025-01-03 23:54:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14759.5). Total num frames: 147931136. Throughput: 0: 3656.5. Samples: 26147496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:18,968][134211] Avg episode reward: [(0, '6.654')] [2025-01-03 23:54:21,383][134294] Updated weights for policy 0, policy_version 36124 (0.0028) [2025-01-03 23:54:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14131.2, 300 sec: 14662.3). Total num frames: 147992576. Throughput: 0: 3638.4. Samples: 26166354. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:23,968][134211] Avg episode reward: [(0, '6.113')] [2025-01-03 23:54:24,624][134294] Updated weights for policy 0, policy_version 36134 (0.0025) [2025-01-03 23:54:26,599][134294] Updated weights for policy 0, policy_version 36144 (0.0014) [2025-01-03 23:54:28,490][134294] Updated weights for policy 0, policy_version 36154 (0.0014) [2025-01-03 23:54:28,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14745.9, 300 sec: 14801.2). Total num frames: 148094976. Throughput: 0: 3790.4. Samples: 26193144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:28,968][134211] Avg episode reward: [(0, '6.429')] [2025-01-03 23:54:31,016][134294] Updated weights for policy 0, policy_version 36164 (0.0021) [2025-01-03 23:54:33,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14882.1, 300 sec: 14815.0). Total num frames: 148164608. Throughput: 0: 3838.9. Samples: 26205652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:33,968][134211] Avg episode reward: [(0, '6.765')] [2025-01-03 23:54:34,053][134294] Updated weights for policy 0, policy_version 36174 (0.0025) [2025-01-03 23:54:37,061][134294] Updated weights for policy 0, policy_version 36184 (0.0027) [2025-01-03 23:54:38,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14813.7, 300 sec: 14815.0). Total num frames: 148230144. Throughput: 0: 3848.5. Samples: 26226148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:38,969][134211] Avg episode reward: [(0, '6.599')] [2025-01-03 23:54:40,252][134294] Updated weights for policy 0, policy_version 36194 (0.0026) [2025-01-03 23:54:43,200][134294] Updated weights for policy 0, policy_version 36204 (0.0024) [2025-01-03 23:54:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 148299776. Throughput: 0: 3708.2. Samples: 26245978. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:43,968][134211] Avg episode reward: [(0, '6.841')] [2025-01-03 23:54:46,214][134294] Updated weights for policy 0, policy_version 36214 (0.0028) [2025-01-03 23:54:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 148369408. Throughput: 0: 3573.5. Samples: 26256298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:48,969][134211] Avg episode reward: [(0, '7.105')] [2025-01-03 23:54:49,243][134294] Updated weights for policy 0, policy_version 36224 (0.0026) [2025-01-03 23:54:51,902][134294] Updated weights for policy 0, policy_version 36234 (0.0020) [2025-01-03 23:54:53,753][134294] Updated weights for policy 0, policy_version 36244 (0.0013) [2025-01-03 23:54:53,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15291.8, 300 sec: 14815.0). Total num frames: 148459520. Throughput: 0: 3547.3. Samples: 26278540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:54:53,968][134211] Avg episode reward: [(0, '6.766')] [2025-01-03 23:54:55,614][134294] Updated weights for policy 0, policy_version 36254 (0.0013) [2025-01-03 23:54:57,531][134294] Updated weights for policy 0, policy_version 36264 (0.0014) [2025-01-03 23:54:58,968][134211] Fps is (10 sec: 19662.2, 60 sec: 15428.4, 300 sec: 14926.1). Total num frames: 148566016. Throughput: 0: 3845.8. Samples: 26311158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:54:58,968][134211] Avg episode reward: [(0, '6.756')] [2025-01-03 23:54:59,619][134294] Updated weights for policy 0, policy_version 36274 (0.0014) [2025-01-03 23:55:02,702][134294] Updated weights for policy 0, policy_version 36284 (0.0027) [2025-01-03 23:55:03,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14882.2, 300 sec: 14912.2). Total num frames: 148631552. Throughput: 0: 3906.2. Samples: 26323274. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:03,968][134211] Avg episode reward: [(0, '6.633')] [2025-01-03 23:55:05,857][134294] Updated weights for policy 0, policy_version 36294 (0.0023) [2025-01-03 23:55:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14882.1, 300 sec: 14912.2). Total num frames: 148697088. Throughput: 0: 3922.0. Samples: 26342846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:08,969][134211] Avg episode reward: [(0, '6.464')] [2025-01-03 23:55:09,021][134294] Updated weights for policy 0, policy_version 36304 (0.0027) [2025-01-03 23:55:12,695][134294] Updated weights for policy 0, policy_version 36314 (0.0026) [2025-01-03 23:55:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 148754432. Throughput: 0: 3713.7. Samples: 26360260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:13,968][134211] Avg episode reward: [(0, '6.688')] [2025-01-03 23:55:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036317_148754432.pth... [2025-01-03 23:55:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035448_145195008.pth [2025-01-03 23:55:16,133][134294] Updated weights for policy 0, policy_version 36324 (0.0025) [2025-01-03 23:55:18,750][134294] Updated weights for policy 0, policy_version 36334 (0.0017) [2025-01-03 23:55:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14950.4, 300 sec: 14842.8). Total num frames: 148828160. Throughput: 0: 3631.2. Samples: 26369054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:55:18,968][134211] Avg episode reward: [(0, '6.015')] [2025-01-03 23:55:21,111][134294] Updated weights for policy 0, policy_version 36344 (0.0019) [2025-01-03 23:55:23,971][134211] Fps is (10 sec: 14741.0, 60 sec: 15154.4, 300 sec: 14856.6). Total num frames: 148901888. 
Throughput: 0: 3716.6. Samples: 26393406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:55:23,972][134211] Avg episode reward: [(0, '6.324')] [2025-01-03 23:55:24,121][134294] Updated weights for policy 0, policy_version 36354 (0.0022) [2025-01-03 23:55:26,994][134294] Updated weights for policy 0, policy_version 36364 (0.0026) [2025-01-03 23:55:28,969][134211] Fps is (10 sec: 14334.4, 60 sec: 14608.8, 300 sec: 14759.4). Total num frames: 148971520. Throughput: 0: 3729.4. Samples: 26413804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:55:28,970][134211] Avg episode reward: [(0, '7.174')] [2025-01-03 23:55:29,950][134294] Updated weights for policy 0, policy_version 36374 (0.0023) [2025-01-03 23:55:32,938][134294] Updated weights for policy 0, policy_version 36384 (0.0026) [2025-01-03 23:55:33,968][134211] Fps is (10 sec: 13521.1, 60 sec: 14540.8, 300 sec: 14731.7). Total num frames: 149037056. Throughput: 0: 3734.1. Samples: 26424330. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:55:33,968][134211] Avg episode reward: [(0, '6.663')] [2025-01-03 23:55:35,790][134294] Updated weights for policy 0, policy_version 36394 (0.0021) [2025-01-03 23:55:37,686][134294] Updated weights for policy 0, policy_version 36404 (0.0012) [2025-01-03 23:55:38,968][134211] Fps is (10 sec: 16385.9, 60 sec: 15087.1, 300 sec: 14856.7). Total num frames: 149135360. Throughput: 0: 3768.8. Samples: 26448138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:55:38,968][134211] Avg episode reward: [(0, '6.751')] [2025-01-03 23:55:39,613][134294] Updated weights for policy 0, policy_version 36414 (0.0014) [2025-01-03 23:55:41,453][134294] Updated weights for policy 0, policy_version 36424 (0.0013) [2025-01-03 23:55:43,291][134294] Updated weights for policy 0, policy_version 36434 (0.0012) [2025-01-03 23:55:43,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15769.6, 300 sec: 14995.5). Total num frames: 149245952. Throughput: 0: 3770.1. 
Samples: 26480814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:43,969][134211] Avg episode reward: [(0, '6.954')] [2025-01-03 23:55:45,667][134294] Updated weights for policy 0, policy_version 36444 (0.0019) [2025-01-03 23:55:48,865][134294] Updated weights for policy 0, policy_version 36454 (0.0026) [2025-01-03 23:55:48,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15769.7, 300 sec: 14926.1). Total num frames: 149315584. Throughput: 0: 3783.5. Samples: 26493530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:48,968][134211] Avg episode reward: [(0, '6.375')] [2025-01-03 23:55:52,017][134294] Updated weights for policy 0, policy_version 36464 (0.0026) [2025-01-03 23:55:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15291.7, 300 sec: 14787.3). Total num frames: 149377024. Throughput: 0: 3772.0. Samples: 26512588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:53,968][134211] Avg episode reward: [(0, '6.560')] [2025-01-03 23:55:55,333][134294] Updated weights for policy 0, policy_version 36474 (0.0027) [2025-01-03 23:55:58,243][134294] Updated weights for policy 0, policy_version 36484 (0.0025) [2025-01-03 23:55:58,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14608.9, 300 sec: 14801.1). Total num frames: 149442560. Throughput: 0: 3823.0. Samples: 26532296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:55:58,969][134211] Avg episode reward: [(0, '6.816')] [2025-01-03 23:56:01,419][134294] Updated weights for policy 0, policy_version 36494 (0.0022) [2025-01-03 23:56:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 149512192. Throughput: 0: 3854.0. Samples: 26542486. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-03 23:56:03,968][134211] Avg episode reward: [(0, '6.341')] [2025-01-03 23:56:04,484][134294] Updated weights for policy 0, policy_version 36504 (0.0025) [2025-01-03 23:56:07,533][134294] Updated weights for policy 0, policy_version 36514 (0.0023) [2025-01-03 23:56:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.3, 300 sec: 14856.7). Total num frames: 149577728. Throughput: 0: 3755.8. Samples: 26562404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:56:08,969][134211] Avg episode reward: [(0, '6.302')] [2025-01-03 23:56:10,613][134294] Updated weights for policy 0, policy_version 36524 (0.0023) [2025-01-03 23:56:13,521][134294] Updated weights for policy 0, policy_version 36534 (0.0026) [2025-01-03 23:56:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.2, 300 sec: 14842.8). Total num frames: 149647360. Throughput: 0: 3756.2. Samples: 26582828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:56:13,969][134211] Avg episode reward: [(0, '7.005')] [2025-01-03 23:56:16,457][134294] Updated weights for policy 0, policy_version 36544 (0.0025) [2025-01-03 23:56:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14813.9, 300 sec: 14801.1). Total num frames: 149716992. Throughput: 0: 3752.5. Samples: 26593192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:56:18,968][134211] Avg episode reward: [(0, '6.463')] [2025-01-03 23:56:19,575][134294] Updated weights for policy 0, policy_version 36554 (0.0025) [2025-01-03 23:56:22,452][134294] Updated weights for policy 0, policy_version 36564 (0.0026) [2025-01-03 23:56:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14678.1, 300 sec: 14787.3). Total num frames: 149782528. Throughput: 0: 3676.6. Samples: 26613586. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-03 23:56:23,968][134211] Avg episode reward: [(0, '6.679')] [2025-01-03 23:56:24,951][134294] Updated weights for policy 0, policy_version 36574 (0.0019) [2025-01-03 23:56:26,840][134294] Updated weights for policy 0, policy_version 36584 (0.0014) [2025-01-03 23:56:28,727][134294] Updated weights for policy 0, policy_version 36594 (0.0014) [2025-01-03 23:56:28,967][134211] Fps is (10 sec: 17612.9, 60 sec: 15360.3, 300 sec: 14940.0). Total num frames: 149893120. Throughput: 0: 3592.4. Samples: 26642470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:56:28,968][134211] Avg episode reward: [(0, '6.447')] [2025-01-03 23:56:30,618][134294] Updated weights for policy 0, policy_version 36604 (0.0012) [2025-01-03 23:56:32,763][134294] Updated weights for policy 0, policy_version 36614 (0.0018) [2025-01-03 23:56:33,968][134211] Fps is (10 sec: 20479.8, 60 sec: 15837.8, 300 sec: 15023.3). Total num frames: 149987328. Throughput: 0: 3671.8. Samples: 26658762. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:56:33,968][134211] Avg episode reward: [(0, '6.179')] [2025-01-03 23:56:35,931][134294] Updated weights for policy 0, policy_version 36624 (0.0029) [2025-01-03 23:56:38,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15223.4, 300 sec: 14995.5). Total num frames: 150048768. Throughput: 0: 3725.1. Samples: 26680218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:56:38,968][134211] Avg episode reward: [(0, '5.779')] [2025-01-03 23:56:39,177][134294] Updated weights for policy 0, policy_version 36634 (0.0030) [2025-01-03 23:56:42,373][134294] Updated weights for policy 0, policy_version 36644 (0.0026) [2025-01-03 23:56:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.6, 300 sec: 14870.6). Total num frames: 150114304. Throughput: 0: 3707.5. Samples: 26699132. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:56:43,968][134211] Avg episode reward: [(0, '5.873')] [2025-01-03 23:56:45,499][134294] Updated weights for policy 0, policy_version 36654 (0.0026) [2025-01-03 23:56:48,540][134294] Updated weights for policy 0, policy_version 36664 (0.0026) [2025-01-03 23:56:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14856.7). Total num frames: 150179840. Throughput: 0: 3706.0. Samples: 26709256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-03 23:56:48,968][134211] Avg episode reward: [(0, '6.693')] [2025-01-03 23:56:51,621][134294] Updated weights for policy 0, policy_version 36674 (0.0027) [2025-01-03 23:56:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14404.2, 300 sec: 14870.6). Total num frames: 150241280. Throughput: 0: 3698.2. Samples: 26728824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:56:53,968][134211] Avg episode reward: [(0, '5.990')] [2025-01-03 23:56:55,322][134294] Updated weights for policy 0, policy_version 36684 (0.0026) [2025-01-03 23:56:57,999][134294] Updated weights for policy 0, policy_version 36694 (0.0019) [2025-01-03 23:56:58,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14609.2, 300 sec: 14912.3). Total num frames: 150319104. Throughput: 0: 3690.4. Samples: 26748896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:56:58,968][134211] Avg episode reward: [(0, '6.282')] [2025-01-03 23:56:59,914][134294] Updated weights for policy 0, policy_version 36704 (0.0012) [2025-01-03 23:57:01,761][134294] Updated weights for policy 0, policy_version 36714 (0.0013) [2025-01-03 23:57:03,770][134294] Updated weights for policy 0, policy_version 36724 (0.0014) [2025-01-03 23:57:03,968][134211] Fps is (10 sec: 18022.8, 60 sec: 15155.2, 300 sec: 15037.2). Total num frames: 150421504. Throughput: 0: 3820.1. Samples: 26765096. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:57:03,969][134211] Avg episode reward: [(0, '6.447')] [2025-01-03 23:57:06,695][134294] Updated weights for policy 0, policy_version 36734 (0.0020) [2025-01-03 23:57:08,968][134211] Fps is (10 sec: 16383.4, 60 sec: 15087.0, 300 sec: 14953.9). Total num frames: 150482944. Throughput: 0: 3925.4. Samples: 26790228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:57:08,969][134211] Avg episode reward: [(0, '6.964')] [2025-01-03 23:57:10,772][134294] Updated weights for policy 0, policy_version 36744 (0.0027) [2025-01-03 23:57:13,968][134211] Fps is (10 sec: 11468.5, 60 sec: 14813.8, 300 sec: 14898.3). Total num frames: 150536192. Throughput: 0: 3631.8. Samples: 26805904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-03 23:57:13,969][134211] Avg episode reward: [(0, '6.349')] [2025-01-03 23:57:14,019][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036753_150540288.pth... [2025-01-03 23:57:14,101][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035881_146968576.pth [2025-01-03 23:57:14,470][134294] Updated weights for policy 0, policy_version 36754 (0.0029) [2025-01-03 23:57:17,202][134294] Updated weights for policy 0, policy_version 36764 (0.0017) [2025-01-03 23:57:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 15086.9, 300 sec: 14912.2). Total num frames: 150622208. Throughput: 0: 3462.3. Samples: 26814564. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:18,968][134211] Avg episode reward: [(0, '6.454')] [2025-01-03 23:57:19,208][134294] Updated weights for policy 0, policy_version 36774 (0.0014) [2025-01-03 23:57:21,043][134294] Updated weights for policy 0, policy_version 36784 (0.0014) [2025-01-03 23:57:22,937][134294] Updated weights for policy 0, policy_version 36794 (0.0014) [2025-01-03 23:57:23,968][134211] Fps is (10 sec: 18842.2, 60 sec: 15701.4, 300 sec: 15023.3). Total num frames: 150724608. Throughput: 0: 3684.0. Samples: 26845996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:23,968][134211] Avg episode reward: [(0, '6.889')] [2025-01-03 23:57:24,900][134294] Updated weights for policy 0, policy_version 36804 (0.0013) [2025-01-03 23:57:27,898][134294] Updated weights for policy 0, policy_version 36814 (0.0027) [2025-01-03 23:57:28,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15086.9, 300 sec: 15037.2). Total num frames: 150798336. Throughput: 0: 3826.9. Samples: 26871342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:28,969][134211] Avg episode reward: [(0, '6.368')] [2025-01-03 23:57:31,525][134294] Updated weights for policy 0, policy_version 36824 (0.0031) [2025-01-03 23:57:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14912.2). Total num frames: 150859776. Throughput: 0: 3795.3. Samples: 26880046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:33,968][134211] Avg episode reward: [(0, '7.443')] [2025-01-03 23:57:35,047][134294] Updated weights for policy 0, policy_version 36834 (0.0028) [2025-01-03 23:57:38,364][134294] Updated weights for policy 0, policy_version 36844 (0.0026) [2025-01-03 23:57:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14472.5, 300 sec: 14773.4). Total num frames: 150917120. Throughput: 0: 3752.9. Samples: 26897704. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:38,968][134211] Avg episode reward: [(0, '6.501')] [2025-01-03 23:57:41,515][134294] Updated weights for policy 0, policy_version 36854 (0.0024) [2025-01-03 23:57:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14540.8, 300 sec: 14787.3). Total num frames: 150986752. Throughput: 0: 3739.6. Samples: 26917178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:43,969][134211] Avg episode reward: [(0, '7.317')] [2025-01-03 23:57:44,643][134294] Updated weights for policy 0, policy_version 36864 (0.0027) [2025-01-03 23:57:47,611][134294] Updated weights for policy 0, policy_version 36874 (0.0025) [2025-01-03 23:57:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14815.0). Total num frames: 151052288. Throughput: 0: 3603.2. Samples: 26927240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:48,968][134211] Avg episode reward: [(0, '6.878')] [2025-01-03 23:57:50,681][134294] Updated weights for policy 0, policy_version 36884 (0.0027) [2025-01-03 23:57:53,683][134294] Updated weights for policy 0, policy_version 36894 (0.0025) [2025-01-03 23:57:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 151121920. Throughput: 0: 3497.7. Samples: 26947626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:53,968][134211] Avg episode reward: [(0, '5.935')] [2025-01-03 23:57:56,660][134294] Updated weights for policy 0, policy_version 36904 (0.0026) [2025-01-03 23:57:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 151187456. Throughput: 0: 3595.7. Samples: 26967712. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-03 23:57:58,969][134211] Avg episode reward: [(0, '6.438')] [2025-01-03 23:57:59,846][134294] Updated weights for policy 0, policy_version 36914 (0.0023) [2025-01-03 23:58:02,755][134294] Updated weights for policy 0, policy_version 36924 (0.0025) [2025-01-03 23:58:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13858.1, 300 sec: 14828.9). Total num frames: 151252992. Throughput: 0: 3630.8. Samples: 26977952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 23:58:03,968][134211] Avg episode reward: [(0, '6.958')] [2025-01-03 23:58:05,552][134294] Updated weights for policy 0, policy_version 36934 (0.0021) [2025-01-03 23:58:07,401][134294] Updated weights for policy 0, policy_version 36944 (0.0014) [2025-01-03 23:58:08,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14540.8, 300 sec: 14884.4). Total num frames: 151355392. Throughput: 0: 3485.0. Samples: 27002822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 23:58:08,968][134211] Avg episode reward: [(0, '6.280')] [2025-01-03 23:58:09,308][134294] Updated weights for policy 0, policy_version 36954 (0.0013) [2025-01-03 23:58:11,149][134294] Updated weights for policy 0, policy_version 36964 (0.0015) [2025-01-03 23:58:13,101][134294] Updated weights for policy 0, policy_version 36974 (0.0014) [2025-01-03 23:58:13,968][134211] Fps is (10 sec: 20479.7, 60 sec: 15360.0, 300 sec: 14856.7). Total num frames: 151457792. Throughput: 0: 3631.1. Samples: 27034740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 23:58:13,968][134211] Avg episode reward: [(0, '7.357')] [2025-01-03 23:58:16,211][134294] Updated weights for policy 0, policy_version 36984 (0.0026) [2025-01-03 23:58:18,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14882.1, 300 sec: 14815.0). Total num frames: 151515136. Throughput: 0: 3654.5. Samples: 27044498. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 23:58:18,968][134211] Avg episode reward: [(0, '6.361')] [2025-01-03 23:58:19,618][134294] Updated weights for policy 0, policy_version 36994 (0.0030) [2025-01-03 23:58:22,837][134294] Updated weights for policy 0, policy_version 37004 (0.0026) [2025-01-03 23:58:23,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14267.7, 300 sec: 14815.1). Total num frames: 151580672. Throughput: 0: 3677.3. Samples: 27063182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-03 23:58:23,968][134211] Avg episode reward: [(0, '6.837')] [2025-01-03 23:58:25,908][134294] Updated weights for policy 0, policy_version 37014 (0.0025) [2025-01-03 23:58:28,883][134294] Updated weights for policy 0, policy_version 37024 (0.0026) [2025-01-03 23:58:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 14842.8). Total num frames: 151650304. Throughput: 0: 3696.8. Samples: 27083532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:58:28,968][134211] Avg episode reward: [(0, '6.358')] [2025-01-03 23:58:31,848][134294] Updated weights for policy 0, policy_version 37034 (0.0026) [2025-01-03 23:58:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14828.9). Total num frames: 151715840. Throughput: 0: 3695.8. Samples: 27093550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:58:33,968][134211] Avg episode reward: [(0, '6.544')] [2025-01-03 23:58:34,990][134294] Updated weights for policy 0, policy_version 37044 (0.0026) [2025-01-03 23:58:37,860][134294] Updated weights for policy 0, policy_version 37054 (0.0024) [2025-01-03 23:58:38,967][134211] Fps is (10 sec: 14336.2, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 151793664. Throughput: 0: 3690.9. Samples: 27113716. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:58:38,968][134211] Avg episode reward: [(0, '6.257')] [2025-01-03 23:58:39,991][134294] Updated weights for policy 0, policy_version 37064 (0.0016) [2025-01-03 23:58:43,071][134294] Updated weights for policy 0, policy_version 37074 (0.0023) [2025-01-03 23:58:43,969][134211] Fps is (10 sec: 14743.9, 60 sec: 14608.8, 300 sec: 14884.4). Total num frames: 151863296. Throughput: 0: 3765.0. Samples: 27137142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:58:43,969][134211] Avg episode reward: [(0, '6.083')] [2025-01-03 23:58:46,121][134294] Updated weights for policy 0, policy_version 37084 (0.0024) [2025-01-03 23:58:48,774][134294] Updated weights for policy 0, policy_version 37094 (0.0018) [2025-01-03 23:58:48,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14813.9, 300 sec: 14912.2). Total num frames: 151941120. Throughput: 0: 3764.2. Samples: 27147342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:58:48,968][134211] Avg episode reward: [(0, '5.929')] [2025-01-03 23:58:50,651][134294] Updated weights for policy 0, policy_version 37104 (0.0013) [2025-01-03 23:58:52,532][134294] Updated weights for policy 0, policy_version 37114 (0.0013) [2025-01-03 23:58:53,968][134211] Fps is (10 sec: 18434.1, 60 sec: 15428.3, 300 sec: 14940.0). Total num frames: 152047616. Throughput: 0: 3851.1. Samples: 27176122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:58:53,968][134211] Avg episode reward: [(0, '6.883')] [2025-01-03 23:58:54,392][134294] Updated weights for policy 0, policy_version 37124 (0.0013) [2025-01-03 23:58:56,282][134294] Updated weights for policy 0, policy_version 37134 (0.0012) [2025-01-03 23:58:58,570][134294] Updated weights for policy 0, policy_version 37144 (0.0020) [2025-01-03 23:58:58,969][134211] Fps is (10 sec: 20477.8, 60 sec: 15974.1, 300 sec: 14939.9). Total num frames: 152145920. Throughput: 0: 3838.1. Samples: 27207458. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:58:58,969][134211] Avg episode reward: [(0, '6.518')] [2025-01-03 23:59:02,115][134294] Updated weights for policy 0, policy_version 37154 (0.0031) [2025-01-03 23:59:03,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15837.8, 300 sec: 14912.2). Total num frames: 152203264. Throughput: 0: 3813.3. Samples: 27216098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:59:03,969][134211] Avg episode reward: [(0, '6.648')] [2025-01-03 23:59:05,339][134294] Updated weights for policy 0, policy_version 37164 (0.0027) [2025-01-03 23:59:08,396][134294] Updated weights for policy 0, policy_version 37174 (0.0024) [2025-01-03 23:59:08,968][134211] Fps is (10 sec: 12289.2, 60 sec: 15223.5, 300 sec: 14912.2). Total num frames: 152268800. Throughput: 0: 3829.0. Samples: 27235486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:59:08,968][134211] Avg episode reward: [(0, '6.677')] [2025-01-03 23:59:11,915][134294] Updated weights for policy 0, policy_version 37184 (0.0025) [2025-01-03 23:59:13,971][134211] Fps is (10 sec: 12284.2, 60 sec: 14471.8, 300 sec: 14898.2). Total num frames: 152326144. Throughput: 0: 3766.2. Samples: 27253024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-03 23:59:13,972][134211] Avg episode reward: [(0, '6.504')] [2025-01-03 23:59:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037189_152326144.pth... [2025-01-03 23:59:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036317_148754432.pth [2025-01-03 23:59:15,522][134294] Updated weights for policy 0, policy_version 37194 (0.0025) [2025-01-03 23:59:18,894][134294] Updated weights for policy 0, policy_version 37204 (0.0028) [2025-01-03 23:59:18,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14540.7, 300 sec: 14898.3). Total num frames: 152387584. 
Throughput: 0: 3734.9. Samples: 27261620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:18,969][134211] Avg episode reward: [(0, '6.509')] [2025-01-03 23:59:22,180][134294] Updated weights for policy 0, policy_version 37214 (0.0023) [2025-01-03 23:59:23,968][134211] Fps is (10 sec: 12292.0, 60 sec: 14472.5, 300 sec: 14759.5). Total num frames: 152449024. Throughput: 0: 3704.0. Samples: 27280398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:23,968][134211] Avg episode reward: [(0, '6.797')] [2025-01-03 23:59:24,998][134294] Updated weights for policy 0, policy_version 37224 (0.0023) [2025-01-03 23:59:26,950][134294] Updated weights for policy 0, policy_version 37234 (0.0015) [2025-01-03 23:59:28,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14813.9, 300 sec: 14828.9). Total num frames: 152539136. Throughput: 0: 3742.1. Samples: 27305534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:28,968][134211] Avg episode reward: [(0, '6.639')] [2025-01-03 23:59:29,754][134294] Updated weights for policy 0, policy_version 37244 (0.0022) [2025-01-03 23:59:32,877][134294] Updated weights for policy 0, policy_version 37254 (0.0027) [2025-01-03 23:59:33,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14813.8, 300 sec: 14828.9). Total num frames: 152604672. Throughput: 0: 3737.3. Samples: 27315522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:33,968][134211] Avg episode reward: [(0, '6.643')] [2025-01-03 23:59:35,867][134294] Updated weights for policy 0, policy_version 37264 (0.0025) [2025-01-03 23:59:38,557][134294] Updated weights for policy 0, policy_version 37274 (0.0019) [2025-01-03 23:59:38,967][134211] Fps is (10 sec: 14336.4, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 152682496. Throughput: 0: 3551.8. Samples: 27335954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:38,968][134211] Avg episode reward: [(0, '6.535')] [2025-01-03 23:59:40,457][134294] Updated weights for policy 0, policy_version 37284 (0.0013) [2025-01-03 23:59:42,363][134294] Updated weights for policy 0, policy_version 37294 (0.0013) [2025-01-03 23:59:43,967][134211] Fps is (10 sec: 18432.5, 60 sec: 15428.6, 300 sec: 14981.7). Total num frames: 152788992. Throughput: 0: 3542.8. Samples: 27366878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:43,968][134211] Avg episode reward: [(0, '6.572')] [2025-01-03 23:59:44,195][134294] Updated weights for policy 0, policy_version 37304 (0.0015) [2025-01-03 23:59:46,331][134294] Updated weights for policy 0, policy_version 37314 (0.0019) [2025-01-03 23:59:48,968][134211] Fps is (10 sec: 18430.2, 60 sec: 15428.1, 300 sec: 14939.9). Total num frames: 152866816. Throughput: 0: 3689.1. Samples: 27382110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:48,969][134211] Avg episode reward: [(0, '6.072')] [2025-01-03 23:59:49,811][134294] Updated weights for policy 0, policy_version 37324 (0.0029) [2025-01-03 23:59:53,446][134294] Updated weights for policy 0, policy_version 37334 (0.0030) [2025-01-03 23:59:53,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14609.0, 300 sec: 14773.4). Total num frames: 152924160. Throughput: 0: 3645.9. Samples: 27399554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:53,969][134211] Avg episode reward: [(0, '6.944')] [2025-01-03 23:59:56,729][134294] Updated weights for policy 0, policy_version 37344 (0.0028) [2025-01-03 23:59:58,970][134211] Fps is (10 sec: 11876.8, 60 sec: 13994.4, 300 sec: 14759.4). Total num frames: 152985600. Throughput: 0: 3665.2. Samples: 27417954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-03 23:59:58,971][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 00:00:00,025][134294] Updated weights for policy 0, policy_version 37354 (0.0025) [2025-01-04 00:00:03,341][134294] Updated weights for policy 0, policy_version 37364 (0.0025) [2025-01-04 00:00:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14062.9, 300 sec: 14745.6). Total num frames: 153047040. Throughput: 0: 3676.8. Samples: 27427076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:00:03,968][134211] Avg episode reward: [(0, '6.973')] [2025-01-04 00:00:06,465][134294] Updated weights for policy 0, policy_version 37374 (0.0024) [2025-01-04 00:00:08,539][134294] Updated weights for policy 0, policy_version 37384 (0.0012) [2025-01-04 00:00:08,968][134211] Fps is (10 sec: 14749.0, 60 sec: 14404.3, 300 sec: 14842.8). Total num frames: 153133056. Throughput: 0: 3711.9. Samples: 27447434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:00:08,968][134211] Avg episode reward: [(0, '7.320')] [2025-01-04 00:00:10,461][134294] Updated weights for policy 0, policy_version 37394 (0.0012) [2025-01-04 00:00:12,337][134294] Updated weights for policy 0, policy_version 37404 (0.0015) [2025-01-04 00:00:13,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15156.0, 300 sec: 14940.0). Total num frames: 153235456. Throughput: 0: 3855.2. Samples: 27479020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:00:13,968][134211] Avg episode reward: [(0, '6.786')] [2025-01-04 00:00:14,373][134294] Updated weights for policy 0, policy_version 37414 (0.0012) [2025-01-04 00:00:16,394][134294] Updated weights for policy 0, policy_version 37424 (0.0012) [2025-01-04 00:00:18,506][134294] Updated weights for policy 0, policy_version 37434 (0.0015) [2025-01-04 00:00:18,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15769.7, 300 sec: 15023.5). Total num frames: 153333760. Throughput: 0: 3973.9. Samples: 27494348. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:00:18,968][134211] Avg episode reward: [(0, '6.365')] [2025-01-04 00:00:22,099][134294] Updated weights for policy 0, policy_version 37444 (0.0028) [2025-01-04 00:00:23,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15701.3, 300 sec: 14981.7). Total num frames: 153391104. Throughput: 0: 4003.1. Samples: 27516096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:00:23,969][134211] Avg episode reward: [(0, '6.832')] [2025-01-04 00:00:25,781][134294] Updated weights for policy 0, policy_version 37454 (0.0024) [2025-01-04 00:00:28,968][134211] Fps is (10 sec: 11058.3, 60 sec: 15086.8, 300 sec: 14939.9). Total num frames: 153444352. Throughput: 0: 3689.4. Samples: 27532904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:00:28,969][134211] Avg episode reward: [(0, '6.380')] [2025-01-04 00:00:29,454][134294] Updated weights for policy 0, policy_version 37464 (0.0028) [2025-01-04 00:00:33,004][134294] Updated weights for policy 0, policy_version 37474 (0.0028) [2025-01-04 00:00:33,968][134211] Fps is (10 sec: 11059.4, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 153501696. Throughput: 0: 3539.9. Samples: 27541404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:00:33,968][134211] Avg episode reward: [(0, '6.385')] [2025-01-04 00:00:36,735][134294] Updated weights for policy 0, policy_version 37484 (0.0026) [2025-01-04 00:00:38,968][134211] Fps is (10 sec: 11469.6, 60 sec: 14609.0, 300 sec: 14620.6). Total num frames: 153559040. Throughput: 0: 3524.4. Samples: 27558150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:00:38,968][134211] Avg episode reward: [(0, '7.465')] [2025-01-04 00:00:40,217][134294] Updated weights for policy 0, policy_version 37494 (0.0026) [2025-01-04 00:00:43,824][134294] Updated weights for policy 0, policy_version 37504 (0.0027) [2025-01-04 00:00:43,968][134211] Fps is (10 sec: 11468.5, 60 sec: 13789.8, 300 sec: 14579.0). 
Total num frames: 153616384. Throughput: 0: 3498.5. Samples: 27575378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:00:43,969][134211] Avg episode reward: [(0, '7.154')] [2025-01-04 00:00:46,514][134294] Updated weights for policy 0, policy_version 37514 (0.0019) [2025-01-04 00:00:48,684][134294] Updated weights for policy 0, policy_version 37524 (0.0014) [2025-01-04 00:00:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.6, 300 sec: 14662.3). Total num frames: 153702400. Throughput: 0: 3531.7. Samples: 27586002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:00:48,968][134211] Avg episode reward: [(0, '6.432')] [2025-01-04 00:00:50,843][134294] Updated weights for policy 0, policy_version 37534 (0.0013) [2025-01-04 00:00:53,637][134294] Updated weights for policy 0, policy_version 37544 (0.0019) [2025-01-04 00:00:53,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14267.7, 300 sec: 14704.0). Total num frames: 153780224. Throughput: 0: 3698.7. Samples: 27613876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:00:53,969][134211] Avg episode reward: [(0, '6.105')] [2025-01-04 00:00:57,619][134294] Updated weights for policy 0, policy_version 37554 (0.0028) [2025-01-04 00:00:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14131.7, 300 sec: 14648.4). Total num frames: 153833472. Throughput: 0: 3355.2. Samples: 27630002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:00:58,969][134211] Avg episode reward: [(0, '6.388')] [2025-01-04 00:01:00,929][134294] Updated weights for policy 0, policy_version 37564 (0.0026) [2025-01-04 00:01:03,585][134294] Updated weights for policy 0, policy_version 37574 (0.0018) [2025-01-04 00:01:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14336.0, 300 sec: 14676.2). Total num frames: 153907200. Throughput: 0: 3234.0. Samples: 27639876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:03,969][134211] Avg episode reward: [(0, '6.789')] [2025-01-04 00:01:07,240][134294] Updated weights for policy 0, policy_version 37584 (0.0026) [2025-01-04 00:01:08,969][134211] Fps is (10 sec: 12696.7, 60 sec: 13789.6, 300 sec: 14620.6). Total num frames: 153960448. Throughput: 0: 3177.0. Samples: 27659064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:08,969][134211] Avg episode reward: [(0, '6.178')] [2025-01-04 00:01:10,380][134294] Updated weights for policy 0, policy_version 37594 (0.0020) [2025-01-04 00:01:12,465][134294] Updated weights for policy 0, policy_version 37604 (0.0014) [2025-01-04 00:01:13,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13585.1, 300 sec: 14690.1). Total num frames: 154050560. Throughput: 0: 3327.9. Samples: 27682656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:13,968][134211] Avg episode reward: [(0, '6.521')] [2025-01-04 00:01:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037610_154050560.pth... [2025-01-04 00:01:14,031][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036753_150540288.pth [2025-01-04 00:01:14,766][134294] Updated weights for policy 0, policy_version 37614 (0.0013) [2025-01-04 00:01:17,129][134294] Updated weights for policy 0, policy_version 37624 (0.0018) [2025-01-04 00:01:18,968][134211] Fps is (10 sec: 16385.2, 60 sec: 13175.4, 300 sec: 14717.8). Total num frames: 154124288. Throughput: 0: 3454.5. Samples: 27696856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:18,968][134211] Avg episode reward: [(0, '6.268')] [2025-01-04 00:01:20,983][134294] Updated weights for policy 0, policy_version 37634 (0.0029) [2025-01-04 00:01:23,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13175.5, 300 sec: 14537.3). Total num frames: 154181632. 
Throughput: 0: 3461.8. Samples: 27713932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:23,968][134211] Avg episode reward: [(0, '6.599')] [2025-01-04 00:01:24,510][134294] Updated weights for policy 0, policy_version 37644 (0.0024) [2025-01-04 00:01:27,859][134294] Updated weights for policy 0, policy_version 37654 (0.0028) [2025-01-04 00:01:28,971][134211] Fps is (10 sec: 11874.8, 60 sec: 13311.5, 300 sec: 14426.1). Total num frames: 154243072. Throughput: 0: 3474.5. Samples: 27731742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:28,971][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 00:01:31,170][134294] Updated weights for policy 0, policy_version 37664 (0.0027) [2025-01-04 00:01:33,968][134211] Fps is (10 sec: 11877.9, 60 sec: 13311.9, 300 sec: 14412.3). Total num frames: 154300416. Throughput: 0: 3444.5. Samples: 27741008. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:33,969][134211] Avg episode reward: [(0, '7.000')] [2025-01-04 00:01:34,641][134294] Updated weights for policy 0, policy_version 37674 (0.0024) [2025-01-04 00:01:37,670][134294] Updated weights for policy 0, policy_version 37684 (0.0023) [2025-01-04 00:01:38,967][134211] Fps is (10 sec: 13521.3, 60 sec: 13653.4, 300 sec: 14454.0). Total num frames: 154378240. Throughput: 0: 3229.8. Samples: 27759214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:38,968][134211] Avg episode reward: [(0, '5.975')] [2025-01-04 00:01:39,733][134294] Updated weights for policy 0, policy_version 37694 (0.0015) [2025-01-04 00:01:41,761][134294] Updated weights for policy 0, policy_version 37704 (0.0014) [2025-01-04 00:01:43,728][134294] Updated weights for policy 0, policy_version 37714 (0.0014) [2025-01-04 00:01:43,968][134211] Fps is (10 sec: 18023.5, 60 sec: 14404.4, 300 sec: 14579.0). Total num frames: 154480640. Throughput: 0: 3540.0. Samples: 27789300. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:43,968][134211] Avg episode reward: [(0, '6.638')] [2025-01-04 00:01:46,809][134294] Updated weights for policy 0, policy_version 37724 (0.0024) [2025-01-04 00:01:48,970][134211] Fps is (10 sec: 15971.0, 60 sec: 13925.9, 300 sec: 14565.0). Total num frames: 154537984. Throughput: 0: 3575.0. Samples: 27800758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:48,970][134211] Avg episode reward: [(0, '6.484')] [2025-01-04 00:01:50,535][134294] Updated weights for policy 0, policy_version 37734 (0.0028) [2025-01-04 00:01:53,968][134211] Fps is (10 sec: 11468.5, 60 sec: 13585.1, 300 sec: 14495.7). Total num frames: 154595328. Throughput: 0: 3523.5. Samples: 27817618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:53,968][134211] Avg episode reward: [(0, '6.824')] [2025-01-04 00:01:54,103][134294] Updated weights for policy 0, policy_version 37744 (0.0025) [2025-01-04 00:01:57,354][134294] Updated weights for policy 0, policy_version 37754 (0.0025) [2025-01-04 00:01:58,968][134211] Fps is (10 sec: 11880.8, 60 sec: 13721.6, 300 sec: 14356.8). Total num frames: 154656768. Throughput: 0: 3399.2. Samples: 27835620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:01:58,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-04 00:02:00,751][134294] Updated weights for policy 0, policy_version 37764 (0.0024) [2025-01-04 00:02:03,771][134294] Updated weights for policy 0, policy_version 37774 (0.0025) [2025-01-04 00:02:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13585.1, 300 sec: 14370.7). Total num frames: 154722304. Throughput: 0: 3294.6. Samples: 27845112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:02:03,968][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 00:02:06,431][134294] Updated weights for policy 0, policy_version 37784 (0.0020) [2025-01-04 00:02:08,382][134294] Updated weights for policy 0, policy_version 37794 (0.0014) [2025-01-04 00:02:08,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.7, 300 sec: 14495.7). Total num frames: 154812416. Throughput: 0: 3426.0. Samples: 27868100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:02:08,968][134211] Avg episode reward: [(0, '6.397')] [2025-01-04 00:02:10,487][134294] Updated weights for policy 0, policy_version 37804 (0.0014) [2025-01-04 00:02:12,581][134294] Updated weights for policy 0, policy_version 37814 (0.0013) [2025-01-04 00:02:13,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14199.4, 300 sec: 14509.5). Total num frames: 154902528. Throughput: 0: 3673.8. Samples: 27897054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:02:13,969][134211] Avg episode reward: [(0, '7.188')] [2025-01-04 00:02:15,766][134294] Updated weights for policy 0, policy_version 37824 (0.0028) [2025-01-04 00:02:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 154963968. Throughput: 0: 3670.5. Samples: 27906180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:02:18,968][134211] Avg episode reward: [(0, '6.601')] [2025-01-04 00:02:19,002][134294] Updated weights for policy 0, policy_version 37834 (0.0025) [2025-01-04 00:02:22,087][134294] Updated weights for policy 0, policy_version 37844 (0.0027) [2025-01-04 00:02:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.2, 300 sec: 14342.9). Total num frames: 155029504. Throughput: 0: 3697.1. Samples: 27925584. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:02:23,968][134211] Avg episode reward: [(0, '6.180')] [2025-01-04 00:02:25,385][134294] Updated weights for policy 0, policy_version 37854 (0.0025) [2025-01-04 00:02:28,257][134294] Updated weights for policy 0, policy_version 37864 (0.0026) [2025-01-04 00:02:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14200.2, 300 sec: 14356.8). Total num frames: 155095040. Throughput: 0: 3470.2. Samples: 27945460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:02:28,968][134211] Avg episode reward: [(0, '7.189')] [2025-01-04 00:02:31,382][134294] Updated weights for policy 0, policy_version 37874 (0.0027) [2025-01-04 00:02:33,969][134211] Fps is (10 sec: 13515.6, 60 sec: 14404.1, 300 sec: 14398.4). Total num frames: 155164672. Throughput: 0: 3441.5. Samples: 27955620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:02:33,969][134211] Avg episode reward: [(0, '6.542')] [2025-01-04 00:02:34,559][134294] Updated weights for policy 0, policy_version 37884 (0.0024) [2025-01-04 00:02:37,535][134294] Updated weights for policy 0, policy_version 37894 (0.0026) [2025-01-04 00:02:38,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14199.2, 300 sec: 14384.6). Total num frames: 155230208. Throughput: 0: 3507.9. Samples: 27975476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:02:38,969][134211] Avg episode reward: [(0, '6.699')] [2025-01-04 00:02:40,599][134294] Updated weights for policy 0, policy_version 37904 (0.0028) [2025-01-04 00:02:43,447][134294] Updated weights for policy 0, policy_version 37914 (0.0023) [2025-01-04 00:02:43,968][134211] Fps is (10 sec: 13518.0, 60 sec: 13653.3, 300 sec: 14398.5). Total num frames: 155299840. Throughput: 0: 3566.4. Samples: 27996108. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:02:43,968][134211] Avg episode reward: [(0, '6.666')] [2025-01-04 00:02:46,608][134294] Updated weights for policy 0, policy_version 37924 (0.0025) [2025-01-04 00:02:48,915][134294] Updated weights for policy 0, policy_version 37934 (0.0016) [2025-01-04 00:02:48,968][134211] Fps is (10 sec: 14747.2, 60 sec: 13995.2, 300 sec: 14426.3). Total num frames: 155377664. Throughput: 0: 3578.9. Samples: 28006164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:02:48,968][134211] Avg episode reward: [(0, '6.466')] [2025-01-04 00:02:50,810][134294] Updated weights for policy 0, policy_version 37944 (0.0012) [2025-01-04 00:02:52,692][134294] Updated weights for policy 0, policy_version 37954 (0.0012) [2025-01-04 00:02:53,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 155484160. Throughput: 0: 3711.1. Samples: 28035100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:02:53,968][134211] Avg episode reward: [(0, '6.698')] [2025-01-04 00:02:54,660][134294] Updated weights for policy 0, policy_version 37964 (0.0014) [2025-01-04 00:02:56,667][134294] Updated weights for policy 0, policy_version 37974 (0.0014) [2025-01-04 00:02:58,737][134294] Updated weights for policy 0, policy_version 37984 (0.0013) [2025-01-04 00:02:58,968][134211] Fps is (10 sec: 20479.7, 60 sec: 15428.3, 300 sec: 14676.2). Total num frames: 155582464. Throughput: 0: 3759.5. Samples: 28066230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:02:58,968][134211] Avg episode reward: [(0, '6.862')] [2025-01-04 00:03:01,702][134294] Updated weights for policy 0, policy_version 37994 (0.0029) [2025-01-04 00:03:03,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15428.2, 300 sec: 14551.2). Total num frames: 155648000. Throughput: 0: 3797.3. Samples: 28077060. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:03:03,968][134211] Avg episode reward: [(0, '6.713')] [2025-01-04 00:03:05,161][134294] Updated weights for policy 0, policy_version 38004 (0.0031) [2025-01-04 00:03:08,353][134294] Updated weights for policy 0, policy_version 38014 (0.0027) [2025-01-04 00:03:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14950.3, 300 sec: 14412.4). Total num frames: 155709440. Throughput: 0: 3778.5. Samples: 28095616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:03:08,968][134211] Avg episode reward: [(0, '6.227')] [2025-01-04 00:03:11,717][134294] Updated weights for policy 0, policy_version 38024 (0.0026) [2025-01-04 00:03:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 155774976. Throughput: 0: 3756.8. Samples: 28114518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:03:13,969][134211] Avg episode reward: [(0, '6.243')] [2025-01-04 00:03:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038031_155774976.pth... [2025-01-04 00:03:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037189_152326144.pth [2025-01-04 00:03:14,867][134294] Updated weights for policy 0, policy_version 38034 (0.0029) [2025-01-04 00:03:17,977][134294] Updated weights for policy 0, policy_version 38044 (0.0026) [2025-01-04 00:03:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 14440.1). Total num frames: 155840512. Throughput: 0: 3745.5. Samples: 28124164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:18,968][134211] Avg episode reward: [(0, '6.132')] [2025-01-04 00:03:21,000][134294] Updated weights for policy 0, policy_version 38054 (0.0025) [2025-01-04 00:03:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14426.2). Total num frames: 155906048. 
Throughput: 0: 3752.3. Samples: 28144326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:23,968][134211] Avg episode reward: [(0, '5.991')] [2025-01-04 00:03:24,146][134294] Updated weights for policy 0, policy_version 38064 (0.0026) [2025-01-04 00:03:27,288][134294] Updated weights for policy 0, policy_version 38074 (0.0025) [2025-01-04 00:03:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.0, 300 sec: 14426.2). Total num frames: 155971584. Throughput: 0: 3734.1. Samples: 28164144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:28,968][134211] Avg episode reward: [(0, '6.635')] [2025-01-04 00:03:30,210][134294] Updated weights for policy 0, policy_version 38084 (0.0027) [2025-01-04 00:03:33,054][134294] Updated weights for policy 0, policy_version 38094 (0.0023) [2025-01-04 00:03:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.3, 300 sec: 14398.5). Total num frames: 156041216. Throughput: 0: 3745.3. Samples: 28174702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:33,968][134211] Avg episode reward: [(0, '6.654')] [2025-01-04 00:03:36,058][134294] Updated weights for policy 0, policy_version 38104 (0.0023) [2025-01-04 00:03:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.6, 300 sec: 14398.5). Total num frames: 156110848. Throughput: 0: 3563.7. Samples: 28195468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:38,968][134211] Avg episode reward: [(0, '6.971')] [2025-01-04 00:03:39,116][134294] Updated weights for policy 0, policy_version 38114 (0.0024) [2025-01-04 00:03:41,728][134294] Updated weights for policy 0, policy_version 38124 (0.0017) [2025-01-04 00:03:43,610][134294] Updated weights for policy 0, policy_version 38134 (0.0014) [2025-01-04 00:03:43,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15087.0, 300 sec: 14454.0). Total num frames: 156205056. Throughput: 0: 3421.2. Samples: 28220182. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:43,968][134211] Avg episode reward: [(0, '6.454')] [2025-01-04 00:03:45,569][134294] Updated weights for policy 0, policy_version 38144 (0.0013) [2025-01-04 00:03:47,442][134294] Updated weights for policy 0, policy_version 38154 (0.0014) [2025-01-04 00:03:48,967][134211] Fps is (10 sec: 20070.9, 60 sec: 15564.8, 300 sec: 14454.0). Total num frames: 156311552. Throughput: 0: 3530.8. Samples: 28235946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:48,968][134211] Avg episode reward: [(0, '6.593')] [2025-01-04 00:03:49,346][134294] Updated weights for policy 0, policy_version 38164 (0.0013) [2025-01-04 00:03:51,587][134294] Updated weights for policy 0, policy_version 38174 (0.0020) [2025-01-04 00:03:53,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15086.9, 300 sec: 14384.7). Total num frames: 156389376. Throughput: 0: 3764.1. Samples: 28265002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:53,968][134211] Avg episode reward: [(0, '6.659')] [2025-01-04 00:03:54,926][134294] Updated weights for policy 0, policy_version 38184 (0.0027) [2025-01-04 00:03:58,120][134294] Updated weights for policy 0, policy_version 38194 (0.0026) [2025-01-04 00:03:58,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14472.5, 300 sec: 14398.5). Total num frames: 156450816. Throughput: 0: 3766.1. Samples: 28283992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:03:58,968][134211] Avg episode reward: [(0, '6.023')] [2025-01-04 00:04:01,262][134294] Updated weights for policy 0, policy_version 38204 (0.0028) [2025-01-04 00:04:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14404.3, 300 sec: 14384.6). Total num frames: 156512256. Throughput: 0: 3767.9. Samples: 28293718. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:03,968][134211] Avg episode reward: [(0, '6.427')] [2025-01-04 00:04:04,646][134294] Updated weights for policy 0, policy_version 38214 (0.0025) [2025-01-04 00:04:07,794][134294] Updated weights for policy 0, policy_version 38224 (0.0027) [2025-01-04 00:04:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.5, 300 sec: 14412.5). Total num frames: 156577792. Throughput: 0: 3739.6. Samples: 28312608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:08,968][134211] Avg episode reward: [(0, '6.191')] [2025-01-04 00:04:10,908][134294] Updated weights for policy 0, policy_version 38234 (0.0024) [2025-01-04 00:04:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 156639232. Throughput: 0: 3719.6. Samples: 28331526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:13,968][134211] Avg episode reward: [(0, '5.954')] [2025-01-04 00:04:14,418][134294] Updated weights for policy 0, policy_version 38244 (0.0025) [2025-01-04 00:04:16,955][134294] Updated weights for policy 0, policy_version 38254 (0.0017) [2025-01-04 00:04:18,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14745.7, 300 sec: 14495.7). Total num frames: 156725248. Throughput: 0: 3708.0. Samples: 28341560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:18,968][134211] Avg episode reward: [(0, '6.687')] [2025-01-04 00:04:18,999][134294] Updated weights for policy 0, policy_version 38264 (0.0013) [2025-01-04 00:04:20,904][134294] Updated weights for policy 0, policy_version 38274 (0.0014) [2025-01-04 00:04:22,830][134294] Updated weights for policy 0, policy_version 38284 (0.0014) [2025-01-04 00:04:23,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15496.6, 300 sec: 14565.1). Total num frames: 156835840. Throughput: 0: 3937.0. Samples: 28372634. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:23,968][134211] Avg episode reward: [(0, '6.675')] [2025-01-04 00:04:24,702][134294] Updated weights for policy 0, policy_version 38294 (0.0013) [2025-01-04 00:04:26,792][134294] Updated weights for policy 0, policy_version 38304 (0.0015) [2025-01-04 00:04:28,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15769.6, 300 sec: 14620.6). Total num frames: 156917760. Throughput: 0: 4022.8. Samples: 28401210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:28,968][134211] Avg episode reward: [(0, '6.434')] [2025-01-04 00:04:30,004][134294] Updated weights for policy 0, policy_version 38314 (0.0030) [2025-01-04 00:04:33,197][134294] Updated weights for policy 0, policy_version 38324 (0.0028) [2025-01-04 00:04:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15701.3, 300 sec: 14579.0). Total num frames: 156983296. Throughput: 0: 3874.2. Samples: 28410286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:33,968][134211] Avg episode reward: [(0, '6.246')] [2025-01-04 00:04:36,417][134294] Updated weights for policy 0, policy_version 38334 (0.0029) [2025-01-04 00:04:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.1, 300 sec: 14440.1). Total num frames: 157048832. Throughput: 0: 3657.2. Samples: 28429574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:38,968][134211] Avg episode reward: [(0, '6.800')] [2025-01-04 00:04:39,632][134294] Updated weights for policy 0, policy_version 38344 (0.0026) [2025-01-04 00:04:42,742][134294] Updated weights for policy 0, policy_version 38354 (0.0027) [2025-01-04 00:04:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.2, 300 sec: 14398.5). Total num frames: 157114368. Throughput: 0: 3672.3. Samples: 28449244. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:43,968][134211] Avg episode reward: [(0, '5.954')] [2025-01-04 00:04:45,681][134294] Updated weights for policy 0, policy_version 38364 (0.0028) [2025-01-04 00:04:48,649][134294] Updated weights for policy 0, policy_version 38374 (0.0024) [2025-01-04 00:04:48,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14540.7, 300 sec: 14440.1). Total num frames: 157184000. Throughput: 0: 3688.0. Samples: 28459680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:04:48,969][134211] Avg episode reward: [(0, '6.274')] [2025-01-04 00:04:51,743][134294] Updated weights for policy 0, policy_version 38384 (0.0026) [2025-01-04 00:04:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14267.7, 300 sec: 14440.2). Total num frames: 157245440. Throughput: 0: 3710.6. Samples: 28479586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:04:53,968][134211] Avg episode reward: [(0, '5.601')] [2025-01-04 00:04:55,060][134294] Updated weights for policy 0, policy_version 38394 (0.0022) [2025-01-04 00:04:57,975][134294] Updated weights for policy 0, policy_version 38404 (0.0027) [2025-01-04 00:04:58,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14404.3, 300 sec: 14467.9). Total num frames: 157315072. Throughput: 0: 3731.8. Samples: 28499458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:04:58,968][134211] Avg episode reward: [(0, '6.170')] [2025-01-04 00:05:00,919][134294] Updated weights for policy 0, policy_version 38414 (0.0025) [2025-01-04 00:05:03,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14472.4, 300 sec: 14398.4). Total num frames: 157380608. Throughput: 0: 3739.0. Samples: 28509816. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:03,969][134211] Avg episode reward: [(0, '5.929')] [2025-01-04 00:05:03,976][134294] Updated weights for policy 0, policy_version 38424 (0.0025) [2025-01-04 00:05:06,884][134294] Updated weights for policy 0, policy_version 38434 (0.0024) [2025-01-04 00:05:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14287.4). Total num frames: 157450240. Throughput: 0: 3507.5. Samples: 28530470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:08,968][134211] Avg episode reward: [(0, '6.724')] [2025-01-04 00:05:09,983][134294] Updated weights for policy 0, policy_version 38444 (0.0027) [2025-01-04 00:05:12,069][134294] Updated weights for policy 0, policy_version 38454 (0.0013) [2025-01-04 00:05:13,956][134294] Updated weights for policy 0, policy_version 38464 (0.0014) [2025-01-04 00:05:13,968][134211] Fps is (10 sec: 16794.7, 60 sec: 15155.2, 300 sec: 14287.4). Total num frames: 157548544. Throughput: 0: 3442.9. Samples: 28556140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:13,968][134211] Avg episode reward: [(0, '6.280')] [2025-01-04 00:05:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038464_157548544.pth... [2025-01-04 00:05:14,015][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037610_154050560.pth [2025-01-04 00:05:15,889][134294] Updated weights for policy 0, policy_version 38474 (0.0014) [2025-01-04 00:05:17,746][134294] Updated weights for policy 0, policy_version 38484 (0.0013) [2025-01-04 00:05:18,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15428.2, 300 sec: 14440.1). Total num frames: 157650944. Throughput: 0: 3602.1. Samples: 28572380. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:18,968][134211] Avg episode reward: [(0, '6.147')] [2025-01-04 00:05:20,344][134294] Updated weights for policy 0, policy_version 38494 (0.0024) [2025-01-04 00:05:23,643][134294] Updated weights for policy 0, policy_version 38504 (0.0026) [2025-01-04 00:05:23,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14609.1, 300 sec: 14467.9). Total num frames: 157712384. Throughput: 0: 3712.3. Samples: 28596626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:23,968][134211] Avg episode reward: [(0, '6.068')] [2025-01-04 00:05:26,809][134294] Updated weights for policy 0, policy_version 38514 (0.0029) [2025-01-04 00:05:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14495.7). Total num frames: 157777920. Throughput: 0: 3705.2. Samples: 28615978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:28,968][134211] Avg episode reward: [(0, '6.678')] [2025-01-04 00:05:29,976][134294] Updated weights for policy 0, policy_version 38524 (0.0023) [2025-01-04 00:05:33,070][134294] Updated weights for policy 0, policy_version 38534 (0.0026) [2025-01-04 00:05:33,969][134211] Fps is (10 sec: 13105.3, 60 sec: 14335.7, 300 sec: 14523.4). Total num frames: 157843456. Throughput: 0: 3690.0. Samples: 28625734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:33,970][134211] Avg episode reward: [(0, '6.557')] [2025-01-04 00:05:36,097][134294] Updated weights for policy 0, policy_version 38544 (0.0024) [2025-01-04 00:05:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14565.1). Total num frames: 157913088. Throughput: 0: 3696.1. Samples: 28645912. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:05:38,968][134211] Avg episode reward: [(0, '5.959')] [2025-01-04 00:05:39,203][134294] Updated weights for policy 0, policy_version 38554 (0.0028) [2025-01-04 00:05:42,252][134294] Updated weights for policy 0, policy_version 38564 (0.0026) [2025-01-04 00:05:43,968][134211] Fps is (10 sec: 13518.7, 60 sec: 14404.3, 300 sec: 14495.7). Total num frames: 157978624. Throughput: 0: 3694.8. Samples: 28665724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:05:43,968][134211] Avg episode reward: [(0, '6.372')] [2025-01-04 00:05:45,501][134294] Updated weights for policy 0, policy_version 38574 (0.0026) [2025-01-04 00:05:47,631][134294] Updated weights for policy 0, policy_version 38584 (0.0014) [2025-01-04 00:05:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14677.4, 300 sec: 14523.5). Total num frames: 158064640. Throughput: 0: 3682.6. Samples: 28675532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:05:48,968][134211] Avg episode reward: [(0, '6.583')] [2025-01-04 00:05:49,562][134294] Updated weights for policy 0, policy_version 38594 (0.0013) [2025-01-04 00:05:51,386][134294] Updated weights for policy 0, policy_version 38604 (0.0014) [2025-01-04 00:05:53,316][134294] Updated weights for policy 0, policy_version 38614 (0.0013) [2025-01-04 00:05:53,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15496.6, 300 sec: 14717.8). Total num frames: 158175232. Throughput: 0: 3942.2. Samples: 28707870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:05:53,968][134211] Avg episode reward: [(0, '5.869')] [2025-01-04 00:05:55,256][134294] Updated weights for policy 0, policy_version 38624 (0.0016) [2025-01-04 00:05:58,309][134294] Updated weights for policy 0, policy_version 38634 (0.0026) [2025-01-04 00:05:58,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15564.8, 300 sec: 14717.8). Total num frames: 158248960. Throughput: 0: 3953.4. Samples: 28734042. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:05:58,968][134211] Avg episode reward: [(0, '5.855')] [2025-01-04 00:06:01,661][134294] Updated weights for policy 0, policy_version 38644 (0.0025) [2025-01-04 00:06:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15496.7, 300 sec: 14745.6). Total num frames: 158310400. Throughput: 0: 3796.8. Samples: 28743234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:03,968][134211] Avg episode reward: [(0, '6.585')] [2025-01-04 00:06:05,183][134294] Updated weights for policy 0, policy_version 38654 (0.0026) [2025-01-04 00:06:08,341][134294] Updated weights for policy 0, policy_version 38664 (0.0025) [2025-01-04 00:06:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15428.3, 300 sec: 14662.3). Total num frames: 158375936. Throughput: 0: 3664.2. Samples: 28761516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:08,968][134211] Avg episode reward: [(0, '5.722')] [2025-01-04 00:06:11,456][134294] Updated weights for policy 0, policy_version 38674 (0.0027) [2025-01-04 00:06:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 158433280. Throughput: 0: 3648.1. Samples: 28780142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:13,969][134211] Avg episode reward: [(0, '6.109')] [2025-01-04 00:06:15,131][134294] Updated weights for policy 0, policy_version 38684 (0.0027) [2025-01-04 00:06:18,219][134294] Updated weights for policy 0, policy_version 38694 (0.0021) [2025-01-04 00:06:18,967][134211] Fps is (10 sec: 12697.8, 60 sec: 14199.5, 300 sec: 14648.4). Total num frames: 158502912. Throughput: 0: 3619.2. Samples: 28788590. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:18,968][134211] Avg episode reward: [(0, '6.046')] [2025-01-04 00:06:20,280][134294] Updated weights for policy 0, policy_version 38704 (0.0012) [2025-01-04 00:06:22,115][134294] Updated weights for policy 0, policy_version 38714 (0.0015) [2025-01-04 00:06:23,968][134211] Fps is (10 sec: 17613.1, 60 sec: 14950.4, 300 sec: 14801.3). Total num frames: 158609408. Throughput: 0: 3793.8. Samples: 28816632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:23,968][134211] Avg episode reward: [(0, '5.970')] [2025-01-04 00:06:24,020][134294] Updated weights for policy 0, policy_version 38724 (0.0012) [2025-01-04 00:06:25,856][134294] Updated weights for policy 0, policy_version 38734 (0.0014) [2025-01-04 00:06:28,046][134294] Updated weights for policy 0, policy_version 38744 (0.0018) [2025-01-04 00:06:28,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15428.3, 300 sec: 14926.1). Total num frames: 158703616. Throughput: 0: 4041.9. Samples: 28847608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:28,968][134211] Avg episode reward: [(0, '6.561')] [2025-01-04 00:06:31,167][134294] Updated weights for policy 0, policy_version 38754 (0.0026) [2025-01-04 00:06:33,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15428.6, 300 sec: 14884.4). Total num frames: 158769152. Throughput: 0: 4042.9. Samples: 28857462. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:06:33,968][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 00:06:34,476][134294] Updated weights for policy 0, policy_version 38764 (0.0028) [2025-01-04 00:06:37,555][134294] Updated weights for policy 0, policy_version 38774 (0.0024) [2025-01-04 00:06:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.0, 300 sec: 14759.5). Total num frames: 158834688. Throughput: 0: 3746.2. Samples: 28876450. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:06:38,968][134211] Avg episode reward: [(0, '5.845')] [2025-01-04 00:06:40,688][134294] Updated weights for policy 0, policy_version 38784 (0.0029) [2025-01-04 00:06:43,734][134294] Updated weights for policy 0, policy_version 38794 (0.0024) [2025-01-04 00:06:43,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15359.9, 300 sec: 14787.3). Total num frames: 158900224. Throughput: 0: 3611.8. Samples: 28896576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:06:43,969][134211] Avg episode reward: [(0, '6.341')] [2025-01-04 00:06:46,794][134294] Updated weights for policy 0, policy_version 38804 (0.0028) [2025-01-04 00:06:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 158969856. Throughput: 0: 3626.8. Samples: 28906442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:06:48,968][134211] Avg episode reward: [(0, '5.908')] [2025-01-04 00:06:49,990][134294] Updated weights for policy 0, policy_version 38814 (0.0026) [2025-01-04 00:06:53,316][134294] Updated weights for policy 0, policy_version 38824 (0.0026) [2025-01-04 00:06:53,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14199.4, 300 sec: 14815.0). Total num frames: 159027200. Throughput: 0: 3637.9. Samples: 28925220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:06:53,968][134211] Avg episode reward: [(0, '6.167')] [2025-01-04 00:06:56,364][134294] Updated weights for policy 0, policy_version 38834 (0.0027) [2025-01-04 00:06:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 14828.9). Total num frames: 159096832. Throughput: 0: 3664.1. Samples: 28945028. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:06:58,968][134211] Avg episode reward: [(0, '5.637')] [2025-01-04 00:06:59,574][134294] Updated weights for policy 0, policy_version 38844 (0.0024) [2025-01-04 00:07:02,597][134294] Updated weights for policy 0, policy_version 38854 (0.0024) [2025-01-04 00:07:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14267.8, 300 sec: 14759.5). Total num frames: 159166464. Throughput: 0: 3698.1. Samples: 28955004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:03,968][134211] Avg episode reward: [(0, '5.739')] [2025-01-04 00:07:04,993][134294] Updated weights for policy 0, policy_version 38864 (0.0017) [2025-01-04 00:07:07,307][134294] Updated weights for policy 0, policy_version 38874 (0.0021) [2025-01-04 00:07:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14540.8, 300 sec: 14731.7). Total num frames: 159248384. Throughput: 0: 3622.9. Samples: 28979664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:08,969][134211] Avg episode reward: [(0, '5.726')] [2025-01-04 00:07:10,545][134294] Updated weights for policy 0, policy_version 38884 (0.0025) [2025-01-04 00:07:12,920][134294] Updated weights for policy 0, policy_version 38894 (0.0019) [2025-01-04 00:07:13,967][134211] Fps is (10 sec: 16384.1, 60 sec: 14950.5, 300 sec: 14801.1). Total num frames: 159330304. Throughput: 0: 3429.5. Samples: 29001936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:13,968][134211] Avg episode reward: [(0, '6.170')] [2025-01-04 00:07:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038899_159330304.pth... 
[2025-01-04 00:07:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038031_155774976.pth [2025-01-04 00:07:14,882][134294] Updated weights for policy 0, policy_version 38904 (0.0014) [2025-01-04 00:07:17,210][134294] Updated weights for policy 0, policy_version 38914 (0.0019) [2025-01-04 00:07:18,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15223.4, 300 sec: 14870.6). Total num frames: 159416320. Throughput: 0: 3553.5. Samples: 29017370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:18,968][134211] Avg episode reward: [(0, '6.557')] [2025-01-04 00:07:20,148][134294] Updated weights for policy 0, policy_version 38924 (0.0024) [2025-01-04 00:07:23,128][134294] Updated weights for policy 0, policy_version 38934 (0.0023) [2025-01-04 00:07:23,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14540.7, 300 sec: 14870.5). Total num frames: 159481856. Throughput: 0: 3600.4. Samples: 29038468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:23,969][134211] Avg episode reward: [(0, '6.062')] [2025-01-04 00:07:26,323][134294] Updated weights for policy 0, policy_version 38944 (0.0028) [2025-01-04 00:07:28,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14062.9, 300 sec: 14856.7). Total num frames: 159547392. Throughput: 0: 3581.8. Samples: 29057756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:28,969][134211] Avg episode reward: [(0, '6.470')] [2025-01-04 00:07:29,511][134294] Updated weights for policy 0, policy_version 38954 (0.0025) [2025-01-04 00:07:32,430][134294] Updated weights for policy 0, policy_version 38964 (0.0026) [2025-01-04 00:07:33,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14336.0, 300 sec: 14912.3). Total num frames: 159629312. Throughput: 0: 3585.5. Samples: 29067790. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:33,968][134211] Avg episode reward: [(0, '6.527')] [2025-01-04 00:07:34,359][134294] Updated weights for policy 0, policy_version 38974 (0.0014) [2025-01-04 00:07:37,235][134294] Updated weights for policy 0, policy_version 38984 (0.0024) [2025-01-04 00:07:38,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14404.3, 300 sec: 14912.2). Total num frames: 159698944. Throughput: 0: 3721.9. Samples: 29092704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:38,968][134211] Avg episode reward: [(0, '6.312')] [2025-01-04 00:07:40,376][134294] Updated weights for policy 0, policy_version 38994 (0.0028) [2025-01-04 00:07:43,065][134294] Updated weights for policy 0, policy_version 39004 (0.0022) [2025-01-04 00:07:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14677.5, 300 sec: 14926.1). Total num frames: 159780864. Throughput: 0: 3757.8. Samples: 29114128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:07:43,968][134211] Avg episode reward: [(0, '6.108')] [2025-01-04 00:07:44,896][134294] Updated weights for policy 0, policy_version 39014 (0.0013) [2025-01-04 00:07:46,832][134294] Updated weights for policy 0, policy_version 39024 (0.0012) [2025-01-04 00:07:48,709][134294] Updated weights for policy 0, policy_version 39034 (0.0013) [2025-01-04 00:07:48,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15291.8, 300 sec: 14926.1). Total num frames: 159887360. Throughput: 0: 3896.4. Samples: 29130344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:07:48,968][134211] Avg episode reward: [(0, '6.466')] [2025-01-04 00:07:50,581][134294] Updated weights for policy 0, policy_version 39044 (0.0014) [2025-01-04 00:07:52,765][134294] Updated weights for policy 0, policy_version 39054 (0.0018) [2025-01-04 00:07:53,968][134211] Fps is (10 sec: 20069.9, 60 sec: 15906.1, 300 sec: 14912.2). Total num frames: 159981568. Throughput: 0: 4061.2. Samples: 29162420. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:07:53,969][134211] Avg episode reward: [(0, '6.866')] [2025-01-04 00:07:55,884][134294] Updated weights for policy 0, policy_version 39064 (0.0026) [2025-01-04 00:07:58,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15769.6, 300 sec: 14898.3). Total num frames: 160043008. Throughput: 0: 4008.3. Samples: 29182308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:07:58,968][134211] Avg episode reward: [(0, '6.551')] [2025-01-04 00:07:59,040][134294] Updated weights for policy 0, policy_version 39074 (0.0023) [2025-01-04 00:08:02,233][134294] Updated weights for policy 0, policy_version 39084 (0.0025) [2025-01-04 00:08:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15701.3, 300 sec: 14912.2). Total num frames: 160108544. Throughput: 0: 3882.4. Samples: 29192078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:08:03,969][134211] Avg episode reward: [(0, '6.740')] [2025-01-04 00:08:05,239][134294] Updated weights for policy 0, policy_version 39094 (0.0027) [2025-01-04 00:08:08,274][134294] Updated weights for policy 0, policy_version 39104 (0.0026) [2025-01-04 00:08:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15496.5, 300 sec: 14926.1). Total num frames: 160178176. Throughput: 0: 3858.7. Samples: 29212108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:08:08,968][134211] Avg episode reward: [(0, '6.818')] [2025-01-04 00:08:11,506][134294] Updated weights for policy 0, policy_version 39114 (0.0025) [2025-01-04 00:08:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15086.8, 300 sec: 14898.3). Total num frames: 160235520. Throughput: 0: 3841.4. Samples: 29230618. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:08:13,969][134211] Avg episode reward: [(0, '6.531')] [2025-01-04 00:08:15,122][134294] Updated weights for policy 0, policy_version 39124 (0.0024) [2025-01-04 00:08:18,510][134294] Updated weights for policy 0, policy_version 39134 (0.0030) [2025-01-04 00:08:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14677.3, 300 sec: 14884.5). Total num frames: 160296960. Throughput: 0: 3808.6. Samples: 29239178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:08:18,968][134211] Avg episode reward: [(0, '6.545')] [2025-01-04 00:08:21,745][134294] Updated weights for policy 0, policy_version 39144 (0.0025) [2025-01-04 00:08:23,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14677.4, 300 sec: 14884.5). Total num frames: 160362496. Throughput: 0: 3671.7. Samples: 29257932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:08:23,968][134211] Avg episode reward: [(0, '5.617')] [2025-01-04 00:08:24,880][134294] Updated weights for policy 0, policy_version 39154 (0.0026) [2025-01-04 00:08:26,942][134294] Updated weights for policy 0, policy_version 39164 (0.0014) [2025-01-04 00:08:28,823][134294] Updated weights for policy 0, policy_version 39174 (0.0014) [2025-01-04 00:08:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15155.3, 300 sec: 14967.8). Total num frames: 160456704. Throughput: 0: 3774.6. Samples: 29283984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:08:28,968][134211] Avg episode reward: [(0, '6.034')] [2025-01-04 00:08:30,699][134294] Updated weights for policy 0, policy_version 39184 (0.0012) [2025-01-04 00:08:32,569][134294] Updated weights for policy 0, policy_version 39194 (0.0013) [2025-01-04 00:08:33,968][134211] Fps is (10 sec: 20479.0, 60 sec: 15633.0, 300 sec: 15106.6). Total num frames: 160567296. Throughput: 0: 3776.8. Samples: 29300302. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:08:33,968][134211] Avg episode reward: [(0, '6.500')] [2025-01-04 00:08:34,484][134294] Updated weights for policy 0, policy_version 39204 (0.0012) [2025-01-04 00:08:36,316][134294] Updated weights for policy 0, policy_version 39214 (0.0012) [2025-01-04 00:08:38,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15974.4, 300 sec: 15092.7). Total num frames: 160657408. Throughput: 0: 3774.4. Samples: 29332268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:08:38,968][134211] Avg episode reward: [(0, '7.038')] [2025-01-04 00:08:39,196][134294] Updated weights for policy 0, policy_version 39224 (0.0025) [2025-01-04 00:08:42,380][134294] Updated weights for policy 0, policy_version 39234 (0.0027) [2025-01-04 00:08:43,968][134211] Fps is (10 sec: 15155.8, 60 sec: 15633.0, 300 sec: 14940.0). Total num frames: 160718848. Throughput: 0: 3752.2. Samples: 29351158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:08:43,968][134211] Avg episode reward: [(0, '6.699')] [2025-01-04 00:08:45,561][134294] Updated weights for policy 0, policy_version 39244 (0.0025) [2025-01-04 00:08:48,470][134294] Updated weights for policy 0, policy_version 39254 (0.0027) [2025-01-04 00:08:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.6, 300 sec: 14912.2). Total num frames: 160788480. Throughput: 0: 3757.2. Samples: 29361152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:08:48,969][134211] Avg episode reward: [(0, '6.541')] [2025-01-04 00:08:51,553][134294] Updated weights for policy 0, policy_version 39264 (0.0023) [2025-01-04 00:08:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.8, 300 sec: 14926.1). Total num frames: 160854016. Throughput: 0: 3764.2. Samples: 29381496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:08:53,969][134211] Avg episode reward: [(0, '6.400')] [2025-01-04 00:08:54,816][134294] Updated weights for policy 0, policy_version 39274 (0.0026) [2025-01-04 00:08:57,876][134294] Updated weights for policy 0, policy_version 39284 (0.0027) [2025-01-04 00:08:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14940.0). Total num frames: 160919552. Throughput: 0: 3784.9. Samples: 29400936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:08:58,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-04 00:09:00,946][134294] Updated weights for policy 0, policy_version 39294 (0.0024) [2025-01-04 00:09:03,904][134294] Updated weights for policy 0, policy_version 39304 (0.0025) [2025-01-04 00:09:03,969][134211] Fps is (10 sec: 13515.9, 60 sec: 14677.2, 300 sec: 14953.8). Total num frames: 160989184. Throughput: 0: 3824.6. Samples: 29411286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:09:03,969][134211] Avg episode reward: [(0, '6.573')] [2025-01-04 00:09:06,874][134294] Updated weights for policy 0, policy_version 39314 (0.0025) [2025-01-04 00:09:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14967.8). Total num frames: 161054720. Throughput: 0: 3864.3. Samples: 29431824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:09:08,969][134211] Avg episode reward: [(0, '6.284')] [2025-01-04 00:09:09,913][134294] Updated weights for policy 0, policy_version 39324 (0.0025) [2025-01-04 00:09:12,876][134294] Updated weights for policy 0, policy_version 39334 (0.0024) [2025-01-04 00:09:13,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14813.9, 300 sec: 14912.2). Total num frames: 161124352. Throughput: 0: 3733.4. Samples: 29451986. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:09:13,968][134211] Avg episode reward: [(0, '5.793')] [2025-01-04 00:09:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039337_161124352.pth... [2025-01-04 00:09:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038464_157548544.pth [2025-01-04 00:09:15,983][134294] Updated weights for policy 0, policy_version 39344 (0.0023) [2025-01-04 00:09:18,852][134294] Updated weights for policy 0, policy_version 39354 (0.0023) [2025-01-04 00:09:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 161193984. Throughput: 0: 3597.9. Samples: 29462206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:09:18,969][134211] Avg episode reward: [(0, '5.599')] [2025-01-04 00:09:21,849][134294] Updated weights for policy 0, policy_version 39364 (0.0023) [2025-01-04 00:09:23,901][134294] Updated weights for policy 0, policy_version 39374 (0.0014) [2025-01-04 00:09:23,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15223.5, 300 sec: 14773.4). Total num frames: 161275904. Throughput: 0: 3352.9. Samples: 29483150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:09:23,969][134211] Avg episode reward: [(0, '6.189')] [2025-01-04 00:09:25,848][134294] Updated weights for policy 0, policy_version 39384 (0.0014) [2025-01-04 00:09:27,726][134294] Updated weights for policy 0, policy_version 39394 (0.0014) [2025-01-04 00:09:28,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15428.2, 300 sec: 14912.2). Total num frames: 161382400. Throughput: 0: 3644.3. Samples: 29515152. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:28,968][134211] Avg episode reward: [(0, '6.083')] [2025-01-04 00:09:29,616][134294] Updated weights for policy 0, policy_version 39404 (0.0013) [2025-01-04 00:09:31,445][134294] Updated weights for policy 0, policy_version 39414 (0.0013) [2025-01-04 00:09:33,968][134211] Fps is (10 sec: 20069.9, 60 sec: 15155.3, 300 sec: 15009.4). Total num frames: 161476608. Throughput: 0: 3785.8. Samples: 29531514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:33,969][134211] Avg episode reward: [(0, '6.999')] [2025-01-04 00:09:34,270][134294] Updated weights for policy 0, policy_version 39424 (0.0025) [2025-01-04 00:09:37,487][134294] Updated weights for policy 0, policy_version 39434 (0.0027) [2025-01-04 00:09:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14677.3, 300 sec: 14995.5). Total num frames: 161538048. Throughput: 0: 3803.6. Samples: 29552656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:38,968][134211] Avg episode reward: [(0, '6.861')] [2025-01-04 00:09:40,644][134294] Updated weights for policy 0, policy_version 39444 (0.0026) [2025-01-04 00:09:43,821][134294] Updated weights for policy 0, policy_version 39454 (0.0024) [2025-01-04 00:09:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14981.6). Total num frames: 161603584. Throughput: 0: 3804.9. Samples: 29572158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:43,968][134211] Avg episode reward: [(0, '5.842')] [2025-01-04 00:09:46,842][134294] Updated weights for policy 0, policy_version 39464 (0.0027) [2025-01-04 00:09:48,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14677.3, 300 sec: 14995.5). Total num frames: 161669120. Throughput: 0: 3792.8. Samples: 29581962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:48,969][134211] Avg episode reward: [(0, '6.516')] [2025-01-04 00:09:50,023][134294] Updated weights for policy 0, policy_version 39474 (0.0026) [2025-01-04 00:09:52,988][134294] Updated weights for policy 0, policy_version 39484 (0.0026) [2025-01-04 00:09:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14995.5). Total num frames: 161738752. Throughput: 0: 3785.1. Samples: 29602152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:53,968][134211] Avg episode reward: [(0, '6.470')] [2025-01-04 00:09:55,947][134294] Updated weights for policy 0, policy_version 39494 (0.0022) [2025-01-04 00:09:58,835][134294] Updated weights for policy 0, policy_version 39504 (0.0025) [2025-01-04 00:09:58,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14813.9, 300 sec: 15009.4). Total num frames: 161808384. Throughput: 0: 3800.5. Samples: 29623006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:09:58,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 00:10:02,253][134294] Updated weights for policy 0, policy_version 39514 (0.0024) [2025-01-04 00:10:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14882.4, 300 sec: 15023.3). Total num frames: 161882112. Throughput: 0: 3779.9. Samples: 29632300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:10:03,968][134211] Avg episode reward: [(0, '6.787')] [2025-01-04 00:10:04,183][134294] Updated weights for policy 0, policy_version 39524 (0.0013) [2025-01-04 00:10:06,039][134294] Updated weights for policy 0, policy_version 39534 (0.0012) [2025-01-04 00:10:08,794][134294] Updated weights for policy 0, policy_version 39544 (0.0024) [2025-01-04 00:10:08,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15291.7, 300 sec: 14995.5). Total num frames: 161972224. Throughput: 0: 3944.4. Samples: 29660650. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:10:08,968][134211] Avg episode reward: [(0, '7.015')] [2025-01-04 00:10:12,112][134294] Updated weights for policy 0, policy_version 39554 (0.0027) [2025-01-04 00:10:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15087.0, 300 sec: 14842.8). Total num frames: 162029568. Throughput: 0: 3647.7. Samples: 29679296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:10:13,968][134211] Avg episode reward: [(0, '6.451')] [2025-01-04 00:10:15,586][134294] Updated weights for policy 0, policy_version 39564 (0.0023) [2025-01-04 00:10:17,589][134294] Updated weights for policy 0, policy_version 39574 (0.0016) [2025-01-04 00:10:18,967][134211] Fps is (10 sec: 15155.7, 60 sec: 15496.6, 300 sec: 14953.9). Total num frames: 162123776. Throughput: 0: 3517.8. Samples: 29689812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:10:18,968][134211] Avg episode reward: [(0, '6.512')] [2025-01-04 00:10:19,522][134294] Updated weights for policy 0, policy_version 39584 (0.0011) [2025-01-04 00:10:21,370][134294] Updated weights for policy 0, policy_version 39594 (0.0015) [2025-01-04 00:10:23,468][134294] Updated weights for policy 0, policy_version 39604 (0.0017) [2025-01-04 00:10:23,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15769.5, 300 sec: 15064.9). Total num frames: 162222080. Throughput: 0: 3758.1. Samples: 29721772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:10:23,968][134211] Avg episode reward: [(0, '7.011')] [2025-01-04 00:10:26,680][134294] Updated weights for policy 0, policy_version 39614 (0.0027) [2025-01-04 00:10:28,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15018.7, 300 sec: 15051.1). Total num frames: 162283520. Throughput: 0: 3784.2. Samples: 29742446. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:10:28,968][134211] Avg episode reward: [(0, '7.239')] [2025-01-04 00:10:29,983][134294] Updated weights for policy 0, policy_version 39624 (0.0026) [2025-01-04 00:10:33,055][134294] Updated weights for policy 0, policy_version 39634 (0.0025) [2025-01-04 00:10:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.8, 300 sec: 15037.2). Total num frames: 162349056. Throughput: 0: 3779.3. Samples: 29752030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:10:33,968][134211] Avg episode reward: [(0, '6.033')] [2025-01-04 00:10:36,187][134294] Updated weights for policy 0, policy_version 39644 (0.0027) [2025-01-04 00:10:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 15037.2). Total num frames: 162414592. Throughput: 0: 3774.1. Samples: 29771986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:10:38,968][134211] Avg episode reward: [(0, '7.128')] [2025-01-04 00:10:39,347][134294] Updated weights for policy 0, policy_version 39654 (0.0028) [2025-01-04 00:10:42,325][134294] Updated weights for policy 0, policy_version 39664 (0.0026) [2025-01-04 00:10:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.4, 300 sec: 14981.6). Total num frames: 162484224. Throughput: 0: 3749.4. Samples: 29791730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:10:43,968][134211] Avg episode reward: [(0, '7.287')] [2025-01-04 00:10:45,348][134294] Updated weights for policy 0, policy_version 39674 (0.0025) [2025-01-04 00:10:48,289][134294] Updated weights for policy 0, policy_version 39684 (0.0026) [2025-01-04 00:10:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.7, 300 sec: 14842.8). Total num frames: 162553856. Throughput: 0: 3774.8. Samples: 29802166. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:10:48,968][134211] Avg episode reward: [(0, '6.616')] [2025-01-04 00:10:51,379][134294] Updated weights for policy 0, policy_version 39694 (0.0025) [2025-01-04 00:10:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 162619392. Throughput: 0: 3594.1. Samples: 29822384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:10:53,968][134211] Avg episode reward: [(0, '6.864')] [2025-01-04 00:10:54,558][134294] Updated weights for policy 0, policy_version 39704 (0.0027) [2025-01-04 00:10:57,410][134294] Updated weights for policy 0, policy_version 39714 (0.0020) [2025-01-04 00:10:58,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14882.2, 300 sec: 14884.5). Total num frames: 162701312. Throughput: 0: 3670.7. Samples: 29844476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:10:58,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 00:10:59,360][134294] Updated weights for policy 0, policy_version 39724 (0.0014) [2025-01-04 00:11:01,262][134294] Updated weights for policy 0, policy_version 39734 (0.0015) [2025-01-04 00:11:03,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15086.9, 300 sec: 14953.9). Total num frames: 162787328. Throughput: 0: 3791.4. Samples: 29860424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:11:03,968][134211] Avg episode reward: [(0, '6.663')] [2025-01-04 00:11:04,057][134294] Updated weights for policy 0, policy_version 39744 (0.0022) [2025-01-04 00:11:07,070][134294] Updated weights for policy 0, policy_version 39754 (0.0024) [2025-01-04 00:11:08,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14745.6, 300 sec: 14995.5). Total num frames: 162856960. Throughput: 0: 3550.1. Samples: 29881526. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:11:08,968][134211] Avg episode reward: [(0, '6.461')] [2025-01-04 00:11:10,152][134294] Updated weights for policy 0, policy_version 39764 (0.0024) [2025-01-04 00:11:13,124][134294] Updated weights for policy 0, policy_version 39774 (0.0024) [2025-01-04 00:11:13,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14882.1, 300 sec: 14981.6). Total num frames: 162922496. Throughput: 0: 3542.8. Samples: 29901874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:11:13,969][134211] Avg episode reward: [(0, '7.125')] [2025-01-04 00:11:14,053][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039777_162926592.pth... [2025-01-04 00:11:14,123][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038899_159330304.pth [2025-01-04 00:11:15,656][134294] Updated weights for policy 0, policy_version 39784 (0.0020) [2025-01-04 00:11:17,892][134294] Updated weights for policy 0, policy_version 39794 (0.0017) [2025-01-04 00:11:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14745.6, 300 sec: 14912.2). Total num frames: 163008512. Throughput: 0: 3626.1. Samples: 29915206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:11:18,968][134211] Avg episode reward: [(0, '6.548')] [2025-01-04 00:11:20,889][134294] Updated weights for policy 0, policy_version 39804 (0.0024) [2025-01-04 00:11:23,869][134294] Updated weights for policy 0, policy_version 39814 (0.0024) [2025-01-04 00:11:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14267.7, 300 sec: 14828.9). Total num frames: 163078144. Throughput: 0: 3665.2. Samples: 29936920. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:11:23,968][134211] Avg episode reward: [(0, '6.517')] [2025-01-04 00:11:26,846][134294] Updated weights for policy 0, policy_version 39824 (0.0025) [2025-01-04 00:11:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14404.3, 300 sec: 14842.8). Total num frames: 163147776. Throughput: 0: 3679.0. Samples: 29957284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:11:28,968][134211] Avg episode reward: [(0, '7.216')] [2025-01-04 00:11:29,554][134294] Updated weights for policy 0, policy_version 39834 (0.0019) [2025-01-04 00:11:31,432][134294] Updated weights for policy 0, policy_version 39844 (0.0014) [2025-01-04 00:11:33,280][134294] Updated weights for policy 0, policy_version 39854 (0.0014) [2025-01-04 00:11:33,967][134211] Fps is (10 sec: 17613.4, 60 sec: 15087.0, 300 sec: 14981.6). Total num frames: 163254272. Throughput: 0: 3777.1. Samples: 29972136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:11:33,968][134211] Avg episode reward: [(0, '6.266')] [2025-01-04 00:11:35,216][134294] Updated weights for policy 0, policy_version 39864 (0.0012) [2025-01-04 00:11:37,293][134294] Updated weights for policy 0, policy_version 39874 (0.0018) [2025-01-04 00:11:38,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15428.3, 300 sec: 15051.1). Total num frames: 163340288. Throughput: 0: 4027.8. Samples: 30003634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:11:38,968][134211] Avg episode reward: [(0, '7.013')] [2025-01-04 00:11:40,642][134294] Updated weights for policy 0, policy_version 39884 (0.0027) [2025-01-04 00:11:43,856][134294] Updated weights for policy 0, policy_version 39894 (0.0026) [2025-01-04 00:11:43,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15360.0, 300 sec: 15037.2). Total num frames: 163405824. Throughput: 0: 3957.5. Samples: 30022564. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:11:43,968][134211] Avg episode reward: [(0, '6.790')] [2025-01-04 00:11:47,049][134294] Updated weights for policy 0, policy_version 39904 (0.0030) [2025-01-04 00:11:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15223.5, 300 sec: 15051.1). Total num frames: 163467264. Throughput: 0: 3814.8. Samples: 30032088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:11:48,968][134211] Avg episode reward: [(0, '5.833')] [2025-01-04 00:11:50,079][134294] Updated weights for policy 0, policy_version 39914 (0.0025) [2025-01-04 00:11:53,289][134294] Updated weights for policy 0, policy_version 39924 (0.0026) [2025-01-04 00:11:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 15051.1). Total num frames: 163536896. Throughput: 0: 3789.2. Samples: 30052042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:11:53,968][134211] Avg episode reward: [(0, '6.711')] [2025-01-04 00:11:56,500][134294] Updated weights for policy 0, policy_version 39934 (0.0026) [2025-01-04 00:11:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 15023.3). Total num frames: 163598336. Throughput: 0: 3759.8. Samples: 30071062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:11:58,968][134211] Avg episode reward: [(0, '6.492')] [2025-01-04 00:11:59,675][134294] Updated weights for policy 0, policy_version 39944 (0.0024) [2025-01-04 00:12:01,848][134294] Updated weights for policy 0, policy_version 39954 (0.0017) [2025-01-04 00:12:03,854][134294] Updated weights for policy 0, policy_version 39964 (0.0015) [2025-01-04 00:12:03,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15087.0, 300 sec: 15065.0). Total num frames: 163692544. Throughput: 0: 3718.5. Samples: 30082538. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:12:03,968][134211] Avg episode reward: [(0, '6.314')] [2025-01-04 00:12:05,797][134294] Updated weights for policy 0, policy_version 39974 (0.0014) [2025-01-04 00:12:07,978][134294] Updated weights for policy 0, policy_version 39984 (0.0015) [2025-01-04 00:12:08,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15496.5, 300 sec: 15106.6). Total num frames: 163786752. Throughput: 0: 3913.3. Samples: 30113018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:12:08,968][134211] Avg episode reward: [(0, '6.676')] [2025-01-04 00:12:11,267][134294] Updated weights for policy 0, policy_version 39994 (0.0028) [2025-01-04 00:12:13,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15360.0, 300 sec: 15009.4). Total num frames: 163844096. Throughput: 0: 3888.6. Samples: 30132274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:12:13,969][134211] Avg episode reward: [(0, '5.989')] [2025-01-04 00:12:14,961][134294] Updated weights for policy 0, policy_version 40004 (0.0028) [2025-01-04 00:12:18,762][134294] Updated weights for policy 0, policy_version 40014 (0.0028) [2025-01-04 00:12:18,970][134211] Fps is (10 sec: 11057.1, 60 sec: 14813.4, 300 sec: 14967.7). Total num frames: 163897344. Throughput: 0: 3737.1. Samples: 30140312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:12:18,970][134211] Avg episode reward: [(0, '6.574')] [2025-01-04 00:12:22,201][134294] Updated weights for policy 0, policy_version 40024 (0.0026) [2025-01-04 00:12:23,968][134211] Fps is (10 sec: 11878.7, 60 sec: 14745.7, 300 sec: 14967.8). Total num frames: 163962880. Throughput: 0: 3425.2. Samples: 30157768. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:23,968][134211] Avg episode reward: [(0, '6.309')] [2025-01-04 00:12:24,602][134294] Updated weights for policy 0, policy_version 40034 (0.0015) [2025-01-04 00:12:26,544][134294] Updated weights for policy 0, policy_version 40044 (0.0014) [2025-01-04 00:12:28,385][134294] Updated weights for policy 0, policy_version 40054 (0.0013) [2025-01-04 00:12:28,968][134211] Fps is (10 sec: 17616.4, 60 sec: 15428.3, 300 sec: 15065.0). Total num frames: 164073472. Throughput: 0: 3661.8. Samples: 30187344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:28,968][134211] Avg episode reward: [(0, '5.884')] [2025-01-04 00:12:30,305][134294] Updated weights for policy 0, policy_version 40064 (0.0015) [2025-01-04 00:12:32,194][134294] Updated weights for policy 0, policy_version 40074 (0.0014) [2025-01-04 00:12:33,968][134211] Fps is (10 sec: 21708.8, 60 sec: 15428.2, 300 sec: 15189.9). Total num frames: 164179968. Throughput: 0: 3812.7. Samples: 30203658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:33,968][134211] Avg episode reward: [(0, '5.519')] [2025-01-04 00:12:34,105][134294] Updated weights for policy 0, policy_version 40084 (0.0015) [2025-01-04 00:12:37,101][134294] Updated weights for policy 0, policy_version 40094 (0.0025) [2025-01-04 00:12:38,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15086.9, 300 sec: 15134.4). Total num frames: 164245504. Throughput: 0: 3945.1. Samples: 30229574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:38,969][134211] Avg episode reward: [(0, '6.188')] [2025-01-04 00:12:40,497][134294] Updated weights for policy 0, policy_version 40104 (0.0027) [2025-01-04 00:12:43,585][134294] Updated weights for policy 0, policy_version 40114 (0.0024) [2025-01-04 00:12:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15087.0, 300 sec: 14995.5). Total num frames: 164311040. Throughput: 0: 3946.5. Samples: 30248654. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:43,968][134211] Avg episode reward: [(0, '5.952')] [2025-01-04 00:12:46,597][134294] Updated weights for policy 0, policy_version 40124 (0.0027) [2025-01-04 00:12:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.1, 300 sec: 14898.3). Total num frames: 164376576. Throughput: 0: 3912.4. Samples: 30258598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:48,969][134211] Avg episode reward: [(0, '6.140')] [2025-01-04 00:12:49,790][134294] Updated weights for policy 0, policy_version 40134 (0.0028) [2025-01-04 00:12:52,854][134294] Updated weights for policy 0, policy_version 40144 (0.0026) [2025-01-04 00:12:53,968][134211] Fps is (10 sec: 13106.2, 60 sec: 15086.8, 300 sec: 14912.2). Total num frames: 164442112. Throughput: 0: 3675.1. Samples: 30278402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:53,969][134211] Avg episode reward: [(0, '6.288')] [2025-01-04 00:12:55,940][134294] Updated weights for policy 0, policy_version 40154 (0.0025) [2025-01-04 00:12:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15155.2, 300 sec: 14912.2). Total num frames: 164507648. Throughput: 0: 3688.5. Samples: 30298254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:12:58,968][134211] Avg episode reward: [(0, '6.482')] [2025-01-04 00:12:59,221][134294] Updated weights for policy 0, policy_version 40164 (0.0023) [2025-01-04 00:13:02,506][134294] Updated weights for policy 0, policy_version 40174 (0.0023) [2025-01-04 00:13:03,969][134211] Fps is (10 sec: 12696.3, 60 sec: 14608.6, 300 sec: 14884.4). Total num frames: 164569088. Throughput: 0: 3711.6. Samples: 30307332. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:13:03,970][134211] Avg episode reward: [(0, '6.650')] [2025-01-04 00:13:05,622][134294] Updated weights for policy 0, policy_version 40184 (0.0026) [2025-01-04 00:13:08,607][134294] Updated weights for policy 0, policy_version 40194 (0.0024) [2025-01-04 00:13:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14912.2). Total num frames: 164634624. Throughput: 0: 3757.9. Samples: 30326876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:13:08,968][134211] Avg episode reward: [(0, '6.298')] [2025-01-04 00:13:11,673][134294] Updated weights for policy 0, policy_version 40204 (0.0026) [2025-01-04 00:13:13,968][134211] Fps is (10 sec: 13519.1, 60 sec: 14336.0, 300 sec: 14940.0). Total num frames: 164704256. Throughput: 0: 3553.0. Samples: 30347228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:13:13,968][134211] Avg episode reward: [(0, '6.210')] [2025-01-04 00:13:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040211_164704256.pth... [2025-01-04 00:13:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039337_161124352.pth [2025-01-04 00:13:14,809][134294] Updated weights for policy 0, policy_version 40214 (0.0025) [2025-01-04 00:13:17,831][134294] Updated weights for policy 0, policy_version 40224 (0.0023) [2025-01-04 00:13:18,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.8, 300 sec: 14967.8). Total num frames: 164777984. Throughput: 0: 3406.0. Samples: 30356930. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:13:18,968][134211] Avg episode reward: [(0, '6.326')] [2025-01-04 00:13:19,756][134294] Updated weights for policy 0, policy_version 40234 (0.0014) [2025-01-04 00:13:22,453][134294] Updated weights for policy 0, policy_version 40244 (0.0023) [2025-01-04 00:13:23,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14882.1, 300 sec: 14912.2). Total num frames: 164855808. Throughput: 0: 3389.5. Samples: 30382102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:13:23,968][134211] Avg episode reward: [(0, '6.438')] [2025-01-04 00:13:25,483][134294] Updated weights for policy 0, policy_version 40254 (0.0026) [2025-01-04 00:13:28,718][134294] Updated weights for policy 0, policy_version 40264 (0.0028) [2025-01-04 00:13:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14759.5). Total num frames: 164921344. Throughput: 0: 3404.3. Samples: 30401850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:13:28,968][134211] Avg episode reward: [(0, '6.535')] [2025-01-04 00:13:31,297][134294] Updated weights for policy 0, policy_version 40274 (0.0020) [2025-01-04 00:13:33,194][134294] Updated weights for policy 0, policy_version 40284 (0.0012) [2025-01-04 00:13:33,968][134211] Fps is (10 sec: 16384.3, 60 sec: 13994.7, 300 sec: 14787.3). Total num frames: 165019648. Throughput: 0: 3437.3. Samples: 30413276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 00:13:33,968][134211] Avg episode reward: [(0, '6.374')] [2025-01-04 00:13:35,073][134294] Updated weights for policy 0, policy_version 40294 (0.0012) [2025-01-04 00:13:37,007][134294] Updated weights for policy 0, policy_version 40304 (0.0013) [2025-01-04 00:13:38,903][134294] Updated weights for policy 0, policy_version 40314 (0.0012) [2025-01-04 00:13:38,968][134211] Fps is (10 sec: 20480.4, 60 sec: 14677.4, 300 sec: 14940.0). Total num frames: 165126144. Throughput: 0: 3715.3. Samples: 30445586. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:13:38,968][134211] Avg episode reward: [(0, '6.089')] [2025-01-04 00:13:40,774][134294] Updated weights for policy 0, policy_version 40324 (0.0013) [2025-01-04 00:13:42,664][134294] Updated weights for policy 0, policy_version 40334 (0.0015) [2025-01-04 00:13:43,968][134211] Fps is (10 sec: 20889.0, 60 sec: 15291.7, 300 sec: 15051.1). Total num frames: 165228544. Throughput: 0: 3991.9. Samples: 30477890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:13:43,968][134211] Avg episode reward: [(0, '6.892')] [2025-01-04 00:13:45,509][134294] Updated weights for policy 0, policy_version 40344 (0.0025) [2025-01-04 00:13:48,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15155.2, 300 sec: 15023.3). Total num frames: 165285888. Throughput: 0: 3996.9. Samples: 30487184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:13:48,968][134211] Avg episode reward: [(0, '6.441')] [2025-01-04 00:13:49,127][134294] Updated weights for policy 0, policy_version 40354 (0.0028) [2025-01-04 00:13:52,266][134294] Updated weights for policy 0, policy_version 40364 (0.0028) [2025-01-04 00:13:53,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15087.1, 300 sec: 15009.4). Total num frames: 165347328. Throughput: 0: 3968.1. Samples: 30505442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:13:53,968][134211] Avg episode reward: [(0, '5.981')] [2025-01-04 00:13:55,396][134294] Updated weights for policy 0, policy_version 40374 (0.0028) [2025-01-04 00:13:58,351][134294] Updated weights for policy 0, policy_version 40384 (0.0025) [2025-01-04 00:13:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.2, 300 sec: 15009.5). Total num frames: 165416960. Throughput: 0: 3966.6. Samples: 30525726. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 00:13:58,968][134211] Avg episode reward: [(0, '6.252')] [2025-01-04 00:14:01,445][134294] Updated weights for policy 0, policy_version 40394 (0.0025) [2025-01-04 00:14:03,969][134211] Fps is (10 sec: 13515.2, 60 sec: 15223.6, 300 sec: 15009.3). Total num frames: 165482496. Throughput: 0: 3972.2. Samples: 30535684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:14:03,970][134211] Avg episode reward: [(0, '6.321')] [2025-01-04 00:14:04,792][134294] Updated weights for policy 0, policy_version 40404 (0.0025) [2025-01-04 00:14:07,830][134294] Updated weights for policy 0, policy_version 40414 (0.0025) [2025-01-04 00:14:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.5, 300 sec: 14995.5). Total num frames: 165548032. Throughput: 0: 3840.5. Samples: 30554924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:14:08,968][134211] Avg episode reward: [(0, '5.730')] [2025-01-04 00:14:10,821][134294] Updated weights for policy 0, policy_version 40424 (0.0025) [2025-01-04 00:14:13,890][134294] Updated weights for policy 0, policy_version 40434 (0.0025) [2025-01-04 00:14:13,968][134211] Fps is (10 sec: 13518.3, 60 sec: 15223.4, 300 sec: 14995.5). Total num frames: 165617664. Throughput: 0: 3862.3. Samples: 30575654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:14:13,969][134211] Avg episode reward: [(0, '5.831')] [2025-01-04 00:14:17,446][134294] Updated weights for policy 0, policy_version 40444 (0.0024) [2025-01-04 00:14:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14950.4, 300 sec: 14912.2). Total num frames: 165675008. Throughput: 0: 3800.6. Samples: 30584302. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:14:18,968][134211] Avg episode reward: [(0, '6.390')] [2025-01-04 00:14:20,158][134294] Updated weights for policy 0, policy_version 40454 (0.0020) [2025-01-04 00:14:22,124][134294] Updated weights for policy 0, policy_version 40464 (0.0014) [2025-01-04 00:14:23,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 165777408. Throughput: 0: 3621.1. Samples: 30608534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:14:23,968][134211] Avg episode reward: [(0, '5.456')] [2025-01-04 00:14:23,996][134294] Updated weights for policy 0, policy_version 40474 (0.0013) [2025-01-04 00:14:25,900][134294] Updated weights for policy 0, policy_version 40484 (0.0014) [2025-01-04 00:14:27,795][134294] Updated weights for policy 0, policy_version 40494 (0.0015) [2025-01-04 00:14:28,968][134211] Fps is (10 sec: 21299.3, 60 sec: 16111.0, 300 sec: 14953.9). Total num frames: 165888000. Throughput: 0: 3629.7. Samples: 30641226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:28,968][134211] Avg episode reward: [(0, '6.017')] [2025-01-04 00:14:29,654][134294] Updated weights for policy 0, policy_version 40504 (0.0013) [2025-01-04 00:14:32,110][134294] Updated weights for policy 0, policy_version 40514 (0.0020) [2025-01-04 00:14:33,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15769.5, 300 sec: 15009.4). Total num frames: 165965824. Throughput: 0: 3763.2. Samples: 30656526. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:33,970][134211] Avg episode reward: [(0, '6.291')] [2025-01-04 00:14:35,358][134294] Updated weights for policy 0, policy_version 40524 (0.0029) [2025-01-04 00:14:38,483][134294] Updated weights for policy 0, policy_version 40534 (0.0030) [2025-01-04 00:14:38,969][134211] Fps is (10 sec: 14334.2, 60 sec: 15086.6, 300 sec: 15009.4). Total num frames: 166031360. Throughput: 0: 3786.1. Samples: 30675818. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:38,969][134211] Avg episode reward: [(0, '6.174')] [2025-01-04 00:14:41,619][134294] Updated weights for policy 0, policy_version 40544 (0.0024) [2025-01-04 00:14:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 15009.4). Total num frames: 166096896. Throughput: 0: 3768.1. Samples: 30695290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:43,969][134211] Avg episode reward: [(0, '6.467')] [2025-01-04 00:14:44,778][134294] Updated weights for policy 0, policy_version 40554 (0.0024) [2025-01-04 00:14:47,941][134294] Updated weights for policy 0, policy_version 40564 (0.0024) [2025-01-04 00:14:48,968][134211] Fps is (10 sec: 13108.7, 60 sec: 14609.1, 300 sec: 14995.5). Total num frames: 166162432. Throughput: 0: 3764.3. Samples: 30705074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:48,968][134211] Avg episode reward: [(0, '5.861')] [2025-01-04 00:14:50,844][134294] Updated weights for policy 0, policy_version 40574 (0.0029) [2025-01-04 00:14:53,939][134294] Updated weights for policy 0, policy_version 40584 (0.0026) [2025-01-04 00:14:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14745.6, 300 sec: 14995.5). Total num frames: 166232064. Throughput: 0: 3791.2. Samples: 30725528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:53,968][134211] Avg episode reward: [(0, '6.039')] [2025-01-04 00:14:57,024][134294] Updated weights for policy 0, policy_version 40594 (0.0025) [2025-01-04 00:14:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14677.3, 300 sec: 14967.7). Total num frames: 166297600. Throughput: 0: 3773.7. Samples: 30745468. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:14:58,968][134211] Avg episode reward: [(0, '6.339')] [2025-01-04 00:14:59,911][134294] Updated weights for policy 0, policy_version 40604 (0.0025) [2025-01-04 00:15:02,913][134294] Updated weights for policy 0, policy_version 40614 (0.0024) [2025-01-04 00:15:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.9, 300 sec: 14898.3). Total num frames: 166367232. Throughput: 0: 3816.9. Samples: 30756064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:03,968][134211] Avg episode reward: [(0, '6.084')] [2025-01-04 00:15:05,902][134294] Updated weights for policy 0, policy_version 40624 (0.0024) [2025-01-04 00:15:08,726][134294] Updated weights for policy 0, policy_version 40634 (0.0024) [2025-01-04 00:15:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14940.0). Total num frames: 166436864. Throughput: 0: 3738.7. Samples: 30776776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:08,968][134211] Avg episode reward: [(0, '6.551')] [2025-01-04 00:15:11,773][134294] Updated weights for policy 0, policy_version 40644 (0.0023) [2025-01-04 00:15:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 166506496. Throughput: 0: 3469.8. Samples: 30797370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:13,968][134211] Avg episode reward: [(0, '6.219')] [2025-01-04 00:15:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040651_166506496.pth... 
[2025-01-04 00:15:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039777_162926592.pth [2025-01-04 00:15:14,644][134294] Updated weights for policy 0, policy_version 40654 (0.0024) [2025-01-04 00:15:16,562][134294] Updated weights for policy 0, policy_version 40664 (0.0013) [2025-01-04 00:15:18,457][134294] Updated weights for policy 0, policy_version 40674 (0.0012) [2025-01-04 00:15:18,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15564.8, 300 sec: 14870.6). Total num frames: 166608896. Throughput: 0: 3434.2. Samples: 30811064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:18,968][134211] Avg episode reward: [(0, '6.049')] [2025-01-04 00:15:20,351][134294] Updated weights for policy 0, policy_version 40684 (0.0012) [2025-01-04 00:15:22,223][134294] Updated weights for policy 0, policy_version 40694 (0.0013) [2025-01-04 00:15:23,967][134211] Fps is (10 sec: 21299.8, 60 sec: 15701.3, 300 sec: 15037.2). Total num frames: 166719488. Throughput: 0: 3731.5. Samples: 30843730. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:23,968][134211] Avg episode reward: [(0, '5.511')] [2025-01-04 00:15:24,134][134294] Updated weights for policy 0, policy_version 40704 (0.0013) [2025-01-04 00:15:26,120][134294] Updated weights for policy 0, policy_version 40714 (0.0015) [2025-01-04 00:15:28,968][134211] Fps is (10 sec: 19250.6, 60 sec: 15223.4, 300 sec: 15092.7). Total num frames: 166801408. Throughput: 0: 3916.1. Samples: 30871512. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:28,968][134211] Avg episode reward: [(0, '6.413')] [2025-01-04 00:15:29,188][134294] Updated weights for policy 0, policy_version 40724 (0.0025) [2025-01-04 00:15:32,455][134294] Updated weights for policy 0, policy_version 40734 (0.0030) [2025-01-04 00:15:33,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14950.4, 300 sec: 15078.8). Total num frames: 166862848. 
Throughput: 0: 3904.6. Samples: 30880782. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:33,968][134211] Avg episode reward: [(0, '5.765')] [2025-01-04 00:15:35,590][134294] Updated weights for policy 0, policy_version 40744 (0.0026) [2025-01-04 00:15:38,590][134294] Updated weights for policy 0, policy_version 40754 (0.0029) [2025-01-04 00:15:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15019.0, 300 sec: 15078.8). Total num frames: 166932480. Throughput: 0: 3892.6. Samples: 30900696. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:38,968][134211] Avg episode reward: [(0, '6.337')] [2025-01-04 00:15:41,655][134294] Updated weights for policy 0, policy_version 40764 (0.0025) [2025-01-04 00:15:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15018.7, 300 sec: 15064.9). Total num frames: 166998016. Throughput: 0: 3888.4. Samples: 30920446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:15:43,968][134211] Avg episode reward: [(0, '6.392')] [2025-01-04 00:15:44,847][134294] Updated weights for policy 0, policy_version 40774 (0.0026) [2025-01-04 00:15:47,902][134294] Updated weights for policy 0, policy_version 40784 (0.0027) [2025-01-04 00:15:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.7, 300 sec: 15065.0). Total num frames: 167063552. Throughput: 0: 3864.8. Samples: 30929980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:15:48,968][134211] Avg episode reward: [(0, '5.772')] [2025-01-04 00:15:50,934][134294] Updated weights for policy 0, policy_version 40794 (0.0026) [2025-01-04 00:15:53,856][134294] Updated weights for policy 0, policy_version 40804 (0.0025) [2025-01-04 00:15:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15018.7, 300 sec: 15023.3). Total num frames: 167133184. Throughput: 0: 3861.6. Samples: 30950546. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:15:53,968][134211] Avg episode reward: [(0, '5.439')] [2025-01-04 00:15:56,845][134294] Updated weights for policy 0, policy_version 40814 (0.0022) [2025-01-04 00:15:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 167198720. Throughput: 0: 3862.4. Samples: 30971176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:15:58,968][134211] Avg episode reward: [(0, '6.384')] [2025-01-04 00:15:59,955][134294] Updated weights for policy 0, policy_version 40824 (0.0027) [2025-01-04 00:16:02,658][134294] Updated weights for policy 0, policy_version 40834 (0.0021) [2025-01-04 00:16:03,967][134211] Fps is (10 sec: 14745.8, 60 sec: 15223.5, 300 sec: 14995.5). Total num frames: 167280640. Throughput: 0: 3778.8. Samples: 30981110. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:03,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-04 00:16:04,557][134294] Updated weights for policy 0, policy_version 40844 (0.0013) [2025-01-04 00:16:06,458][134294] Updated weights for policy 0, policy_version 40854 (0.0013) [2025-01-04 00:16:08,350][134294] Updated weights for policy 0, policy_version 40864 (0.0014) [2025-01-04 00:16:08,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15906.2, 300 sec: 15148.3). Total num frames: 167391232. Throughput: 0: 3734.1. Samples: 31011764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:08,968][134211] Avg episode reward: [(0, '6.075')] [2025-01-04 00:16:11,108][134294] Updated weights for policy 0, policy_version 40874 (0.0019) [2025-01-04 00:16:13,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15769.6, 300 sec: 15064.9). Total num frames: 167452672. Throughput: 0: 3631.9. Samples: 31034946. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:13,969][134211] Avg episode reward: [(0, '5.902')] [2025-01-04 00:16:14,269][134294] Updated weights for policy 0, policy_version 40884 (0.0030) [2025-01-04 00:16:18,097][134294] Updated weights for policy 0, policy_version 40894 (0.0032) [2025-01-04 00:16:18,968][134211] Fps is (10 sec: 11877.7, 60 sec: 15018.5, 300 sec: 15023.3). Total num frames: 167510016. Throughput: 0: 3615.3. Samples: 31043470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:18,969][134211] Avg episode reward: [(0, '5.855')] [2025-01-04 00:16:21,590][134294] Updated weights for policy 0, policy_version 40904 (0.0026) [2025-01-04 00:16:23,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14336.0, 300 sec: 15023.3). Total num frames: 167579648. Throughput: 0: 3551.4. Samples: 31060508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:23,968][134211] Avg episode reward: [(0, '5.492')] [2025-01-04 00:16:24,212][134294] Updated weights for policy 0, policy_version 40914 (0.0015) [2025-01-04 00:16:26,125][134294] Updated weights for policy 0, policy_version 40924 (0.0014) [2025-01-04 00:16:27,972][134294] Updated weights for policy 0, policy_version 40934 (0.0013) [2025-01-04 00:16:28,968][134211] Fps is (10 sec: 17203.9, 60 sec: 14677.4, 300 sec: 15009.4). Total num frames: 167682048. Throughput: 0: 3780.3. Samples: 31090560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:28,968][134211] Avg episode reward: [(0, '6.183')] [2025-01-04 00:16:30,561][134294] Updated weights for policy 0, policy_version 40944 (0.0022) [2025-01-04 00:16:33,793][134294] Updated weights for policy 0, policy_version 40954 (0.0027) [2025-01-04 00:16:33,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14745.6, 300 sec: 14940.0). Total num frames: 167747584. Throughput: 0: 3816.5. Samples: 31101722. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:33,969][134211] Avg episode reward: [(0, '6.312')] [2025-01-04 00:16:36,825][134294] Updated weights for policy 0, policy_version 40964 (0.0025) [2025-01-04 00:16:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 167813120. Throughput: 0: 3795.7. Samples: 31121352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:38,968][134211] Avg episode reward: [(0, '6.064')] [2025-01-04 00:16:40,099][134294] Updated weights for policy 0, policy_version 40974 (0.0021) [2025-01-04 00:16:43,190][134294] Updated weights for policy 0, policy_version 40984 (0.0026) [2025-01-04 00:16:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14953.9). Total num frames: 167878656. Throughput: 0: 3767.4. Samples: 31140710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:43,969][134211] Avg episode reward: [(0, '5.976')] [2025-01-04 00:16:46,219][134294] Updated weights for policy 0, policy_version 40994 (0.0024) [2025-01-04 00:16:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 167944192. Throughput: 0: 3774.4. Samples: 31150960. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:48,968][134211] Avg episode reward: [(0, '6.442')] [2025-01-04 00:16:49,300][134294] Updated weights for policy 0, policy_version 41004 (0.0028) [2025-01-04 00:16:52,400][134294] Updated weights for policy 0, policy_version 41014 (0.0025) [2025-01-04 00:16:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14967.7). Total num frames: 168013824. Throughput: 0: 3534.6. Samples: 31170820. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:53,968][134211] Avg episode reward: [(0, '6.259')] [2025-01-04 00:16:54,998][134294] Updated weights for policy 0, policy_version 41024 (0.0016) [2025-01-04 00:16:56,875][134294] Updated weights for policy 0, policy_version 41034 (0.0012) [2025-01-04 00:16:58,778][134294] Updated weights for policy 0, policy_version 41044 (0.0014) [2025-01-04 00:16:58,967][134211] Fps is (10 sec: 17613.3, 60 sec: 15360.0, 300 sec: 15009.4). Total num frames: 168120320. Throughput: 0: 3650.4. Samples: 31199214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:16:58,968][134211] Avg episode reward: [(0, '6.628')] [2025-01-04 00:17:00,672][134294] Updated weights for policy 0, policy_version 41054 (0.0015) [2025-01-04 00:17:03,300][134294] Updated weights for policy 0, policy_version 41064 (0.0021) [2025-01-04 00:17:03,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15428.2, 300 sec: 14981.6). Total num frames: 168206336. Throughput: 0: 3816.9. Samples: 31215230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:03,968][134211] Avg episode reward: [(0, '6.525')] [2025-01-04 00:17:06,559][134294] Updated weights for policy 0, policy_version 41074 (0.0028) [2025-01-04 00:17:08,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14609.0, 300 sec: 14995.5). Total num frames: 168267776. Throughput: 0: 3874.1. Samples: 31234842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:08,968][134211] Avg episode reward: [(0, '6.717')] [2025-01-04 00:17:10,046][134294] Updated weights for policy 0, policy_version 41084 (0.0030) [2025-01-04 00:17:13,229][134294] Updated weights for policy 0, policy_version 41094 (0.0025) [2025-01-04 00:17:13,969][134211] Fps is (10 sec: 12286.9, 60 sec: 14608.9, 300 sec: 15023.3). Total num frames: 168329216. Throughput: 0: 3618.9. Samples: 31253412. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:13,969][134211] Avg episode reward: [(0, '6.235')] [2025-01-04 00:17:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041096_168329216.pth... [2025-01-04 00:17:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040211_164704256.pth [2025-01-04 00:17:16,418][134294] Updated weights for policy 0, policy_version 41104 (0.0026) [2025-01-04 00:17:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.7, 300 sec: 15023.3). Total num frames: 168394752. Throughput: 0: 3578.5. Samples: 31262754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:18,968][134211] Avg episode reward: [(0, '6.854')] [2025-01-04 00:17:19,396][134294] Updated weights for policy 0, policy_version 41114 (0.0026) [2025-01-04 00:17:22,464][134294] Updated weights for policy 0, policy_version 41124 (0.0025) [2025-01-04 00:17:23,968][134211] Fps is (10 sec: 13108.3, 60 sec: 14677.3, 300 sec: 14870.6). Total num frames: 168460288. Throughput: 0: 3599.2. Samples: 31283314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:23,968][134211] Avg episode reward: [(0, '7.134')] [2025-01-04 00:17:25,184][134294] Updated weights for policy 0, policy_version 41134 (0.0021) [2025-01-04 00:17:27,339][134294] Updated weights for policy 0, policy_version 41144 (0.0015) [2025-01-04 00:17:28,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14404.2, 300 sec: 14801.1). Total num frames: 168546304. Throughput: 0: 3710.0. Samples: 31307658. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:17:28,968][134211] Avg episode reward: [(0, '6.994')] [2025-01-04 00:17:30,317][134294] Updated weights for policy 0, policy_version 41154 (0.0027) [2025-01-04 00:17:33,239][134294] Updated weights for policy 0, policy_version 41164 (0.0026) [2025-01-04 00:17:33,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14472.5, 300 sec: 14815.0). Total num frames: 168615936. Throughput: 0: 3710.2. Samples: 31317920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:33,968][134211] Avg episode reward: [(0, '7.578')] [2025-01-04 00:17:33,977][134264] Saving new best policy, reward=7.578! [2025-01-04 00:17:36,288][134294] Updated weights for policy 0, policy_version 41174 (0.0026) [2025-01-04 00:17:38,478][134294] Updated weights for policy 0, policy_version 41184 (0.0015) [2025-01-04 00:17:38,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14745.7, 300 sec: 14870.6). Total num frames: 168697856. Throughput: 0: 3731.4. Samples: 31338732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:38,968][134211] Avg episode reward: [(0, '6.143')] [2025-01-04 00:17:40,434][134294] Updated weights for policy 0, policy_version 41194 (0.0014) [2025-01-04 00:17:42,289][134294] Updated weights for policy 0, policy_version 41204 (0.0015) [2025-01-04 00:17:43,967][134211] Fps is (10 sec: 18842.1, 60 sec: 15428.4, 300 sec: 15009.4). Total num frames: 168804352. Throughput: 0: 3816.7. Samples: 31370964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:43,968][134211] Avg episode reward: [(0, '7.038')] [2025-01-04 00:17:44,180][134294] Updated weights for policy 0, policy_version 41214 (0.0014) [2025-01-04 00:17:46,091][134294] Updated weights for policy 0, policy_version 41224 (0.0015) [2025-01-04 00:17:48,802][134294] Updated weights for policy 0, policy_version 41234 (0.0024) [2025-01-04 00:17:48,968][134211] Fps is (10 sec: 19660.5, 60 sec: 15837.9, 300 sec: 15092.8). Total num frames: 168894464. 
Throughput: 0: 3823.7. Samples: 31387298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:48,968][134211] Avg episode reward: [(0, '6.723')] [2025-01-04 00:17:52,183][134294] Updated weights for policy 0, policy_version 41244 (0.0026) [2025-01-04 00:17:53,968][134211] Fps is (10 sec: 15154.7, 60 sec: 15701.3, 300 sec: 15078.8). Total num frames: 168955904. Throughput: 0: 3827.7. Samples: 31407088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:53,968][134211] Avg episode reward: [(0, '6.338')] [2025-01-04 00:17:55,303][134294] Updated weights for policy 0, policy_version 41254 (0.0025) [2025-01-04 00:17:58,451][134294] Updated weights for policy 0, policy_version 41264 (0.0029) [2025-01-04 00:17:58,969][134211] Fps is (10 sec: 12696.6, 60 sec: 15018.4, 300 sec: 15092.8). Total num frames: 169021440. Throughput: 0: 3851.7. Samples: 31426736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:17:58,969][134211] Avg episode reward: [(0, '5.825')] [2025-01-04 00:18:01,514][134294] Updated weights for policy 0, policy_version 41274 (0.0029) [2025-01-04 00:18:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 15092.7). Total num frames: 169086976. Throughput: 0: 3867.6. Samples: 31436796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:18:03,968][134211] Avg episode reward: [(0, '6.123')] [2025-01-04 00:18:04,747][134294] Updated weights for policy 0, policy_version 41284 (0.0022) [2025-01-04 00:18:08,169][134294] Updated weights for policy 0, policy_version 41294 (0.0025) [2025-01-04 00:18:08,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14677.3, 300 sec: 15064.9). Total num frames: 169148416. Throughput: 0: 3816.6. Samples: 31455062. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:18:08,968][134211] Avg episode reward: [(0, '6.542')] [2025-01-04 00:18:11,184][134294] Updated weights for policy 0, policy_version 41304 (0.0024) [2025-01-04 00:18:13,969][134211] Fps is (10 sec: 13105.9, 60 sec: 14813.8, 300 sec: 15051.0). Total num frames: 169218048. Throughput: 0: 3723.3. Samples: 31475212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:18:13,969][134211] Avg episode reward: [(0, '6.256')] [2025-01-04 00:18:14,210][134294] Updated weights for policy 0, policy_version 41314 (0.0025) [2025-01-04 00:18:17,781][134294] Updated weights for policy 0, policy_version 41324 (0.0028) [2025-01-04 00:18:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14677.4, 300 sec: 14981.6). Total num frames: 169275392. Throughput: 0: 3705.1. Samples: 31484650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:18:18,968][134211] Avg episode reward: [(0, '6.120')] [2025-01-04 00:18:20,267][134294] Updated weights for policy 0, policy_version 41334 (0.0016) [2025-01-04 00:18:22,273][134294] Updated weights for policy 0, policy_version 41344 (0.0013) [2025-01-04 00:18:23,968][134211] Fps is (10 sec: 15976.4, 60 sec: 15291.8, 300 sec: 15106.6). Total num frames: 169377792. Throughput: 0: 3781.5. Samples: 31508900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:23,968][134211] Avg episode reward: [(0, '6.645')] [2025-01-04 00:18:24,175][134294] Updated weights for policy 0, policy_version 41354 (0.0012) [2025-01-04 00:18:26,073][134294] Updated weights for policy 0, policy_version 41364 (0.0013) [2025-01-04 00:18:27,956][134294] Updated weights for policy 0, policy_version 41374 (0.0014) [2025-01-04 00:18:28,968][134211] Fps is (10 sec: 21298.7, 60 sec: 15701.3, 300 sec: 15148.2). Total num frames: 169488384. Throughput: 0: 3789.4. Samples: 31541488. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:28,968][134211] Avg episode reward: [(0, '6.035')] [2025-01-04 00:18:29,801][134294] Updated weights for policy 0, policy_version 41384 (0.0015) [2025-01-04 00:18:31,723][134294] Updated weights for policy 0, policy_version 41394 (0.0013) [2025-01-04 00:18:33,968][134211] Fps is (10 sec: 20478.6, 60 sec: 16110.8, 300 sec: 15106.6). Total num frames: 169582592. Throughput: 0: 3790.8. Samples: 31557886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:33,969][134211] Avg episode reward: [(0, '6.362')] [2025-01-04 00:18:34,605][134294] Updated weights for policy 0, policy_version 41404 (0.0024) [2025-01-04 00:18:38,020][134294] Updated weights for policy 0, policy_version 41414 (0.0030) [2025-01-04 00:18:38,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15701.3, 300 sec: 14953.9). Total num frames: 169639936. Throughput: 0: 3812.3. Samples: 31578640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:38,968][134211] Avg episode reward: [(0, '5.753')] [2025-01-04 00:18:41,119][134294] Updated weights for policy 0, policy_version 41424 (0.0029) [2025-01-04 00:18:43,968][134211] Fps is (10 sec: 12288.5, 60 sec: 15018.6, 300 sec: 14981.6). Total num frames: 169705472. Throughput: 0: 3805.1. Samples: 31597964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:43,969][134211] Avg episode reward: [(0, '6.312')] [2025-01-04 00:18:44,308][134294] Updated weights for policy 0, policy_version 41434 (0.0026) [2025-01-04 00:18:47,318][134294] Updated weights for policy 0, policy_version 41444 (0.0024) [2025-01-04 00:18:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.1, 300 sec: 14995.5). Total num frames: 169771008. Throughput: 0: 3799.7. Samples: 31607784. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:18:48,968][134211] Avg episode reward: [(0, '6.680')] [2025-01-04 00:18:50,430][134294] Updated weights for policy 0, policy_version 41454 (0.0027) [2025-01-04 00:18:53,432][134294] Updated weights for policy 0, policy_version 41464 (0.0025) [2025-01-04 00:18:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14745.5, 300 sec: 14995.5). Total num frames: 169840640. Throughput: 0: 3850.1. Samples: 31628318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:18:53,969][134211] Avg episode reward: [(0, '5.962')] [2025-01-04 00:18:56,374][134294] Updated weights for policy 0, policy_version 41474 (0.0026) [2025-01-04 00:18:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14814.1, 300 sec: 15009.5). Total num frames: 169910272. Throughput: 0: 3840.6. Samples: 31648036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:18:58,968][134211] Avg episode reward: [(0, '6.266')] [2025-01-04 00:18:59,598][134294] Updated weights for policy 0, policy_version 41484 (0.0025) [2025-01-04 00:19:02,594][134294] Updated weights for policy 0, policy_version 41494 (0.0026) [2025-01-04 00:19:03,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14813.9, 300 sec: 15009.4). Total num frames: 169975808. Throughput: 0: 3855.5. Samples: 31658150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:03,969][134211] Avg episode reward: [(0, '6.407')] [2025-01-04 00:19:05,738][134294] Updated weights for policy 0, policy_version 41504 (0.0026) [2025-01-04 00:19:08,575][134294] Updated weights for policy 0, policy_version 41514 (0.0025) [2025-01-04 00:19:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 15009.4). Total num frames: 170045440. Throughput: 0: 3774.8. Samples: 31678768. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:08,968][134211] Avg episode reward: [(0, '5.940')] [2025-01-04 00:19:11,577][134294] Updated weights for policy 0, policy_version 41524 (0.0025) [2025-01-04 00:19:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.6, 300 sec: 15051.0). Total num frames: 170115072. Throughput: 0: 3506.9. Samples: 31699298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:13,969][134211] Avg episode reward: [(0, '6.092')] [2025-01-04 00:19:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041532_170115072.pth... [2025-01-04 00:19:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040651_166506496.pth [2025-01-04 00:19:14,579][134294] Updated weights for policy 0, policy_version 41534 (0.0024) [2025-01-04 00:19:17,055][134294] Updated weights for policy 0, policy_version 41544 (0.0018) [2025-01-04 00:19:18,967][134211] Fps is (10 sec: 15565.2, 60 sec: 15428.3, 300 sec: 14995.5). Total num frames: 170201088. Throughput: 0: 3372.8. Samples: 31709660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:19:18,968][134211] Avg episode reward: [(0, '6.725')] [2025-01-04 00:19:18,970][134294] Updated weights for policy 0, policy_version 41554 (0.0014) [2025-01-04 00:19:20,880][134294] Updated weights for policy 0, policy_version 41564 (0.0013) [2025-01-04 00:19:22,762][134294] Updated weights for policy 0, policy_version 41574 (0.0014) [2025-01-04 00:19:23,968][134211] Fps is (10 sec: 19661.6, 60 sec: 15564.8, 300 sec: 14995.5). Total num frames: 170311680. Throughput: 0: 3629.9. Samples: 31741984. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:19:23,968][134211] Avg episode reward: [(0, '6.561')] [2025-01-04 00:19:24,647][134294] Updated weights for policy 0, policy_version 41584 (0.0015) [2025-01-04 00:19:26,542][134294] Updated weights for policy 0, policy_version 41594 (0.0013) [2025-01-04 00:19:28,941][134294] Updated weights for policy 0, policy_version 41604 (0.0020) [2025-01-04 00:19:28,968][134211] Fps is (10 sec: 20888.9, 60 sec: 15360.0, 300 sec: 15064.9). Total num frames: 170409984. Throughput: 0: 3893.6. Samples: 31773178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:19:28,968][134211] Avg episode reward: [(0, '7.590')] [2025-01-04 00:19:28,969][134264] Saving new best policy, reward=7.590! [2025-01-04 00:19:32,201][134294] Updated weights for policy 0, policy_version 41614 (0.0031) [2025-01-04 00:19:33,968][134211] Fps is (10 sec: 15564.1, 60 sec: 14745.7, 300 sec: 15037.2). Total num frames: 170467328. Throughput: 0: 3883.0. Samples: 31782522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:19:33,969][134211] Avg episode reward: [(0, '6.422')] [2025-01-04 00:19:35,418][134294] Updated weights for policy 0, policy_version 41624 (0.0026) [2025-01-04 00:19:38,428][134294] Updated weights for policy 0, policy_version 41634 (0.0026) [2025-01-04 00:19:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14950.4, 300 sec: 15051.1). Total num frames: 170536960. Throughput: 0: 3862.7. Samples: 31802138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:19:38,968][134211] Avg episode reward: [(0, '6.520')] [2025-01-04 00:19:41,553][134294] Updated weights for policy 0, policy_version 41644 (0.0022) [2025-01-04 00:19:43,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14950.4, 300 sec: 15051.1). Total num frames: 170602496. Throughput: 0: 3863.3. Samples: 31821882. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:43,969][134211] Avg episode reward: [(0, '6.623')] [2025-01-04 00:19:44,742][134294] Updated weights for policy 0, policy_version 41654 (0.0027) [2025-01-04 00:19:47,874][134294] Updated weights for policy 0, policy_version 41664 (0.0029) [2025-01-04 00:19:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 170668032. Throughput: 0: 3852.7. Samples: 31831520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:48,968][134211] Avg episode reward: [(0, '6.273')] [2025-01-04 00:19:50,896][134294] Updated weights for policy 0, policy_version 41674 (0.0025) [2025-01-04 00:19:53,841][134294] Updated weights for policy 0, policy_version 41684 (0.0024) [2025-01-04 00:19:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.5, 300 sec: 15051.1). Total num frames: 170737664. Throughput: 0: 3848.9. Samples: 31851968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:53,968][134211] Avg episode reward: [(0, '6.777')] [2025-01-04 00:19:56,735][134294] Updated weights for policy 0, policy_version 41694 (0.0025) [2025-01-04 00:19:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 15051.1). Total num frames: 170807296. Throughput: 0: 3847.0. Samples: 31872412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:19:58,968][134211] Avg episode reward: [(0, '6.488')] [2025-01-04 00:19:59,935][134294] Updated weights for policy 0, policy_version 41704 (0.0025) [2025-01-04 00:20:02,854][134294] Updated weights for policy 0, policy_version 41714 (0.0026) [2025-01-04 00:20:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 170872832. Throughput: 0: 3836.8. Samples: 31882316. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:20:03,968][134211] Avg episode reward: [(0, '6.429')] [2025-01-04 00:20:05,917][134294] Updated weights for policy 0, policy_version 41724 (0.0025) [2025-01-04 00:20:07,773][134294] Updated weights for policy 0, policy_version 41734 (0.0014) [2025-01-04 00:20:08,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15360.1, 300 sec: 15120.5). Total num frames: 170967040. Throughput: 0: 3636.9. Samples: 31905644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:08,968][134211] Avg episode reward: [(0, '6.489')] [2025-01-04 00:20:09,669][134294] Updated weights for policy 0, policy_version 41744 (0.0013) [2025-01-04 00:20:11,505][134294] Updated weights for policy 0, policy_version 41754 (0.0015) [2025-01-04 00:20:13,389][134294] Updated weights for policy 0, policy_version 41764 (0.0014) [2025-01-04 00:20:13,968][134211] Fps is (10 sec: 20070.6, 60 sec: 15974.5, 300 sec: 15134.4). Total num frames: 171073536. Throughput: 0: 3672.8. Samples: 31938454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:13,968][134211] Avg episode reward: [(0, '6.732')] [2025-01-04 00:20:16,099][134294] Updated weights for policy 0, policy_version 41774 (0.0023) [2025-01-04 00:20:18,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15564.7, 300 sec: 14967.7). Total num frames: 171134976. Throughput: 0: 3718.2. Samples: 31949838. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:18,968][134211] Avg episode reward: [(0, '6.745')] [2025-01-04 00:20:19,850][134294] Updated weights for policy 0, policy_version 41784 (0.0025) [2025-01-04 00:20:23,412][134294] Updated weights for policy 0, policy_version 41794 (0.0030) [2025-01-04 00:20:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14677.3, 300 sec: 14884.4). Total num frames: 171192320. Throughput: 0: 3657.5. Samples: 31966726. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:23,968][134211] Avg episode reward: [(0, '5.988')] [2025-01-04 00:20:26,890][134294] Updated weights for policy 0, policy_version 41804 (0.0027) [2025-01-04 00:20:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14062.9, 300 sec: 14884.5). Total num frames: 171253760. Throughput: 0: 3607.9. Samples: 31984240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:28,968][134211] Avg episode reward: [(0, '6.723')] [2025-01-04 00:20:29,834][134294] Updated weights for policy 0, policy_version 41814 (0.0022) [2025-01-04 00:20:31,728][134294] Updated weights for policy 0, policy_version 41824 (0.0015) [2025-01-04 00:20:33,652][134294] Updated weights for policy 0, policy_version 41834 (0.0013) [2025-01-04 00:20:33,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14813.9, 300 sec: 14995.5). Total num frames: 171356160. Throughput: 0: 3698.9. Samples: 31997970. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:20:33,968][134211] Avg episode reward: [(0, '6.865')] [2025-01-04 00:20:35,499][134294] Updated weights for policy 0, policy_version 41844 (0.0014) [2025-01-04 00:20:37,527][134294] Updated weights for policy 0, policy_version 41854 (0.0016) [2025-01-04 00:20:38,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15223.4, 300 sec: 15092.7). Total num frames: 171450368. Throughput: 0: 3962.0. Samples: 32030256. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:20:38,968][134211] Avg episode reward: [(0, '6.963')] [2025-01-04 00:20:40,811][134294] Updated weights for policy 0, policy_version 41864 (0.0029) [2025-01-04 00:20:43,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15155.1, 300 sec: 15078.8). Total num frames: 171511808. Throughput: 0: 3943.1. Samples: 32049850. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:20:43,970][134211] Avg episode reward: [(0, '6.542')] [2025-01-04 00:20:43,983][134294] Updated weights for policy 0, policy_version 41874 (0.0024) [2025-01-04 00:20:47,630][134294] Updated weights for policy 0, policy_version 41884 (0.0022) [2025-01-04 00:20:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15086.9, 300 sec: 15051.1). Total num frames: 171573248. Throughput: 0: 3910.4. Samples: 32058282. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:20:48,968][134211] Avg episode reward: [(0, '6.578')] [2025-01-04 00:20:50,898][134294] Updated weights for policy 0, policy_version 41894 (0.0024) [2025-01-04 00:20:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 171634688. Throughput: 0: 3804.0. Samples: 32076826. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:20:53,969][134211] Avg episode reward: [(0, '7.135')] [2025-01-04 00:20:54,101][134294] Updated weights for policy 0, policy_version 41904 (0.0026) [2025-01-04 00:20:57,222][134294] Updated weights for policy 0, policy_version 41914 (0.0027) [2025-01-04 00:20:58,969][134211] Fps is (10 sec: 12695.9, 60 sec: 14881.8, 300 sec: 14981.6). Total num frames: 171700224. Throughput: 0: 3495.8. Samples: 32095770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:20:58,970][134211] Avg episode reward: [(0, '7.013')] [2025-01-04 00:21:00,595][134294] Updated weights for policy 0, policy_version 41924 (0.0022) [2025-01-04 00:21:03,351][134294] Updated weights for policy 0, policy_version 41934 (0.0022) [2025-01-04 00:21:03,967][134211] Fps is (10 sec: 13926.9, 60 sec: 15018.7, 300 sec: 14856.7). Total num frames: 171773952. Throughput: 0: 3460.3. Samples: 32105552. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:21:03,968][134211] Avg episode reward: [(0, '6.611')] [2025-01-04 00:21:05,323][134294] Updated weights for policy 0, policy_version 41944 (0.0013) [2025-01-04 00:21:07,257][134294] Updated weights for policy 0, policy_version 41954 (0.0015) [2025-01-04 00:21:08,968][134211] Fps is (10 sec: 16796.0, 60 sec: 15018.6, 300 sec: 14967.8). Total num frames: 171868160. Throughput: 0: 3712.7. Samples: 32133796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:21:08,968][134211] Avg episode reward: [(0, '6.598')] [2025-01-04 00:21:10,071][134294] Updated weights for policy 0, policy_version 41964 (0.0026) [2025-01-04 00:21:13,060][134294] Updated weights for policy 0, policy_version 41974 (0.0026) [2025-01-04 00:21:13,969][134211] Fps is (10 sec: 15972.3, 60 sec: 14335.7, 300 sec: 14995.5). Total num frames: 171933696. Throughput: 0: 3792.0. Samples: 32154884. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:21:13,970][134211] Avg episode reward: [(0, '6.845')] [2025-01-04 00:21:14,056][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041977_171937792.pth... [2025-01-04 00:21:14,124][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041096_168329216.pth [2025-01-04 00:21:16,209][134294] Updated weights for policy 0, policy_version 41984 (0.0025) [2025-01-04 00:21:18,970][134211] Fps is (10 sec: 13513.1, 60 sec: 14471.9, 300 sec: 14995.4). Total num frames: 172003328. Throughput: 0: 3705.4. Samples: 32164722. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:21:18,971][134211] Avg episode reward: [(0, '7.009')] [2025-01-04 00:21:19,163][134294] Updated weights for policy 0, policy_version 41994 (0.0026) [2025-01-04 00:21:21,882][134294] Updated weights for policy 0, policy_version 42004 (0.0021) [2025-01-04 00:21:23,843][134294] Updated weights for policy 0, policy_version 42014 (0.0013) [2025-01-04 00:21:23,967][134211] Fps is (10 sec: 15566.9, 60 sec: 14950.5, 300 sec: 14940.0). Total num frames: 172089344. Throughput: 0: 3478.9. Samples: 32186806. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:21:23,968][134211] Avg episode reward: [(0, '7.207')] [2025-01-04 00:21:25,727][134294] Updated weights for policy 0, policy_version 42024 (0.0014) [2025-01-04 00:21:27,634][134294] Updated weights for policy 0, policy_version 42034 (0.0013) [2025-01-04 00:21:28,968][134211] Fps is (10 sec: 19665.3, 60 sec: 15769.5, 300 sec: 15092.7). Total num frames: 172199936. Throughput: 0: 3764.2. Samples: 32219238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:28,969][134211] Avg episode reward: [(0, '6.682')] [2025-01-04 00:21:29,653][134294] Updated weights for policy 0, policy_version 42044 (0.0016) [2025-01-04 00:21:32,711][134294] Updated weights for policy 0, policy_version 42054 (0.0030) [2025-01-04 00:21:33,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15223.5, 300 sec: 15106.6). Total num frames: 172269568. Throughput: 0: 3850.7. Samples: 32231562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:33,968][134211] Avg episode reward: [(0, '6.156')] [2025-01-04 00:21:35,805][134294] Updated weights for policy 0, policy_version 42064 (0.0026) [2025-01-04 00:21:38,905][134294] Updated weights for policy 0, policy_version 42074 (0.0026) [2025-01-04 00:21:38,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14745.6, 300 sec: 15106.6). Total num frames: 172335104. Throughput: 0: 3879.7. Samples: 32251410. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:38,968][134211] Avg episode reward: [(0, '6.894')] [2025-01-04 00:21:41,933][134294] Updated weights for policy 0, policy_version 42084 (0.0026) [2025-01-04 00:21:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14813.9, 300 sec: 15106.6). Total num frames: 172400640. Throughput: 0: 3896.7. Samples: 32271118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:43,968][134211] Avg episode reward: [(0, '6.866')] [2025-01-04 00:21:45,099][134294] Updated weights for policy 0, policy_version 42094 (0.0026) [2025-01-04 00:21:48,226][134294] Updated weights for policy 0, policy_version 42104 (0.0024) [2025-01-04 00:21:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 15092.7). Total num frames: 172466176. Throughput: 0: 3898.8. Samples: 32280998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:48,968][134211] Avg episode reward: [(0, '6.647')] [2025-01-04 00:21:51,275][134294] Updated weights for policy 0, policy_version 42114 (0.0025) [2025-01-04 00:21:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 172531712. Throughput: 0: 3708.2. Samples: 32300664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:53,968][134211] Avg episode reward: [(0, '6.885')] [2025-01-04 00:21:54,570][134294] Updated weights for policy 0, policy_version 42124 (0.0025) [2025-01-04 00:21:57,843][134294] Updated weights for policy 0, policy_version 42134 (0.0026) [2025-01-04 00:21:58,967][134211] Fps is (10 sec: 13517.1, 60 sec: 15019.0, 300 sec: 14898.3). Total num frames: 172601344. Throughput: 0: 3661.5. Samples: 32319648. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:21:58,968][134211] Avg episode reward: [(0, '6.510')] [2025-01-04 00:21:59,905][134294] Updated weights for policy 0, policy_version 42144 (0.0012) [2025-01-04 00:22:01,773][134294] Updated weights for policy 0, policy_version 42154 (0.0013) [2025-01-04 00:22:03,654][134294] Updated weights for policy 0, policy_version 42164 (0.0014) [2025-01-04 00:22:03,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15564.7, 300 sec: 15051.1). Total num frames: 172707840. Throughput: 0: 3800.0. Samples: 32335712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:03,968][134211] Avg episode reward: [(0, '6.733')] [2025-01-04 00:22:05,566][134294] Updated weights for policy 0, policy_version 42174 (0.0013) [2025-01-04 00:22:08,461][134294] Updated weights for policy 0, policy_version 42184 (0.0025) [2025-01-04 00:22:08,968][134211] Fps is (10 sec: 18839.9, 60 sec: 15359.8, 300 sec: 15120.5). Total num frames: 172789760. Throughput: 0: 3972.5. Samples: 32365572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:08,969][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 00:22:11,865][134294] Updated weights for policy 0, policy_version 42194 (0.0031) [2025-01-04 00:22:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15223.8, 300 sec: 15092.7). Total num frames: 172847104. Throughput: 0: 3652.6. Samples: 32383602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:13,968][134211] Avg episode reward: [(0, '6.415')] [2025-01-04 00:22:15,262][134294] Updated weights for policy 0, policy_version 42204 (0.0026) [2025-01-04 00:22:18,827][134294] Updated weights for policy 0, policy_version 42214 (0.0032) [2025-01-04 00:22:18,968][134211] Fps is (10 sec: 11879.1, 60 sec: 15087.6, 300 sec: 15078.8). Total num frames: 172908544. Throughput: 0: 3580.7. Samples: 32392694. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:18,968][134211] Avg episode reward: [(0, '6.534')] [2025-01-04 00:22:22,393][134294] Updated weights for policy 0, policy_version 42224 (0.0025) [2025-01-04 00:22:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14677.3, 300 sec: 14995.5). Total num frames: 172969984. Throughput: 0: 3524.0. Samples: 32409988. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:22:23,968][134211] Avg episode reward: [(0, '6.586')] [2025-01-04 00:22:24,920][134294] Updated weights for policy 0, policy_version 42234 (0.0017) [2025-01-04 00:22:26,831][134294] Updated weights for policy 0, policy_version 42244 (0.0013) [2025-01-04 00:22:28,800][134294] Updated weights for policy 0, policy_version 42254 (0.0014) [2025-01-04 00:22:28,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14540.9, 300 sec: 15106.6). Total num frames: 173072384. Throughput: 0: 3709.4. Samples: 32438042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:22:28,968][134211] Avg episode reward: [(0, '6.317')] [2025-01-04 00:22:30,679][134294] Updated weights for policy 0, policy_version 42264 (0.0013) [2025-01-04 00:22:32,564][134294] Updated weights for policy 0, policy_version 42274 (0.0014) [2025-01-04 00:22:33,968][134211] Fps is (10 sec: 21299.3, 60 sec: 15223.5, 300 sec: 15203.8). Total num frames: 173182976. Throughput: 0: 3851.8. Samples: 32454328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:22:33,968][134211] Avg episode reward: [(0, '6.327')] [2025-01-04 00:22:34,444][134294] Updated weights for policy 0, policy_version 42284 (0.0012) [2025-01-04 00:22:37,218][134294] Updated weights for policy 0, policy_version 42294 (0.0024) [2025-01-04 00:22:38,968][134211] Fps is (10 sec: 18431.3, 60 sec: 15359.9, 300 sec: 15092.7). Total num frames: 173256704. Throughput: 0: 4037.5. Samples: 32482352. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:22:38,969][134211] Avg episode reward: [(0, '5.927')] [2025-01-04 00:22:40,481][134294] Updated weights for policy 0, policy_version 42304 (0.0028) [2025-01-04 00:22:43,645][134294] Updated weights for policy 0, policy_version 42314 (0.0027) [2025-01-04 00:22:43,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15360.0, 300 sec: 15009.4). Total num frames: 173322240. Throughput: 0: 4037.8. Samples: 32501350. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:22:43,968][134211] Avg episode reward: [(0, '5.978')] [2025-01-04 00:22:46,667][134294] Updated weights for policy 0, policy_version 42324 (0.0023) [2025-01-04 00:22:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 173387776. Throughput: 0: 3900.5. Samples: 32511232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:48,969][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 00:22:49,796][134294] Updated weights for policy 0, policy_version 42334 (0.0025) [2025-01-04 00:22:52,885][134294] Updated weights for policy 0, policy_version 42344 (0.0026) [2025-01-04 00:22:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 173453312. Throughput: 0: 3678.7. Samples: 32531112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:53,968][134211] Avg episode reward: [(0, '6.536')] [2025-01-04 00:22:55,924][134294] Updated weights for policy 0, policy_version 42354 (0.0024) [2025-01-04 00:22:58,943][134294] Updated weights for policy 0, policy_version 42364 (0.0023) [2025-01-04 00:22:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15359.9, 300 sec: 15037.2). Total num frames: 173522944. Throughput: 0: 3730.8. Samples: 32551490. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:22:58,968][134211] Avg episode reward: [(0, '6.188')] [2025-01-04 00:23:02,022][134294] Updated weights for policy 0, policy_version 42374 (0.0026) [2025-01-04 00:23:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 15037.2). Total num frames: 173584384. Throughput: 0: 3742.6. Samples: 32561110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:23:03,968][134211] Avg episode reward: [(0, '6.362')] [2025-01-04 00:23:05,550][134294] Updated weights for policy 0, policy_version 42384 (0.0029) [2025-01-04 00:23:08,877][134294] Updated weights for policy 0, policy_version 42394 (0.0027) [2025-01-04 00:23:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14267.9, 300 sec: 15009.5). Total num frames: 173645824. Throughput: 0: 3767.4. Samples: 32579520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:23:08,968][134211] Avg episode reward: [(0, '6.519')] [2025-01-04 00:23:12,014][134294] Updated weights for policy 0, policy_version 42404 (0.0025) [2025-01-04 00:23:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.6, 300 sec: 15051.1). Total num frames: 173715456. Throughput: 0: 3573.5. Samples: 32598852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:23:13,968][134211] Avg episode reward: [(0, '6.241')] [2025-01-04 00:23:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042411_173715456.pth... 
[2025-01-04 00:23:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041532_170115072.pth [2025-01-04 00:23:14,531][134294] Updated weights for policy 0, policy_version 42414 (0.0019) [2025-01-04 00:23:16,425][134294] Updated weights for policy 0, policy_version 42424 (0.0014) [2025-01-04 00:23:18,343][134294] Updated weights for policy 0, policy_version 42434 (0.0013) [2025-01-04 00:23:18,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15223.4, 300 sec: 15064.9). Total num frames: 173821952. Throughput: 0: 3554.3. Samples: 32614274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:18,968][134211] Avg episode reward: [(0, '6.971')] [2025-01-04 00:23:20,209][134294] Updated weights for policy 0, policy_version 42444 (0.0016) [2025-01-04 00:23:22,568][134294] Updated weights for policy 0, policy_version 42454 (0.0016) [2025-01-04 00:23:23,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15633.0, 300 sec: 14981.6). Total num frames: 173907968. Throughput: 0: 3614.0. Samples: 32644982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:23,968][134211] Avg episode reward: [(0, '6.638')] [2025-01-04 00:23:25,742][134294] Updated weights for policy 0, policy_version 42464 (0.0028) [2025-01-04 00:23:28,958][134294] Updated weights for policy 0, policy_version 42474 (0.0025) [2025-01-04 00:23:28,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15018.6, 300 sec: 14884.5). Total num frames: 173973504. Throughput: 0: 3622.0. Samples: 32664340. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:28,969][134211] Avg episode reward: [(0, '6.981')] [2025-01-04 00:23:32,250][134294] Updated weights for policy 0, policy_version 42484 (0.0030) [2025-01-04 00:23:33,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14198.9, 300 sec: 14898.2). Total num frames: 174034944. Throughput: 0: 3608.9. Samples: 32673640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:33,970][134211] Avg episode reward: [(0, '6.777')] [2025-01-04 00:23:35,457][134294] Updated weights for policy 0, policy_version 42494 (0.0024) [2025-01-04 00:23:38,457][134294] Updated weights for policy 0, policy_version 42504 (0.0026) [2025-01-04 00:23:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14062.9, 300 sec: 14898.3). Total num frames: 174100480. Throughput: 0: 3600.9. Samples: 32693152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:38,968][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 00:23:41,475][134294] Updated weights for policy 0, policy_version 42514 (0.0026) [2025-01-04 00:23:43,968][134211] Fps is (10 sec: 13519.4, 60 sec: 14131.2, 300 sec: 14912.2). Total num frames: 174170112. Throughput: 0: 3595.4. Samples: 32713284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:43,968][134211] Avg episode reward: [(0, '7.281')] [2025-01-04 00:23:44,606][134294] Updated weights for policy 0, policy_version 42524 (0.0023) [2025-01-04 00:23:47,502][134294] Updated weights for policy 0, policy_version 42534 (0.0023) [2025-01-04 00:23:48,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14336.0, 300 sec: 14940.0). Total num frames: 174247936. Throughput: 0: 3604.9. Samples: 32723330. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:48,968][134211] Avg episode reward: [(0, '7.279')] [2025-01-04 00:23:49,500][134294] Updated weights for policy 0, policy_version 42544 (0.0017) [2025-01-04 00:23:52,350][134294] Updated weights for policy 0, policy_version 42554 (0.0021) [2025-01-04 00:23:53,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14953.9). Total num frames: 174321664. Throughput: 0: 3741.5. Samples: 32747890. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:53,968][134211] Avg episode reward: [(0, '6.690')] [2025-01-04 00:23:55,351][134294] Updated weights for policy 0, policy_version 42564 (0.0027) [2025-01-04 00:23:58,323][134294] Updated weights for policy 0, policy_version 42574 (0.0024) [2025-01-04 00:23:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.6, 300 sec: 14967.8). Total num frames: 174391296. Throughput: 0: 3772.5. Samples: 32768614. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:23:58,968][134211] Avg episode reward: [(0, '6.717')] [2025-01-04 00:24:00,268][134294] Updated weights for policy 0, policy_version 42584 (0.0015) [2025-01-04 00:24:02,156][134294] Updated weights for policy 0, policy_version 42594 (0.0012) [2025-01-04 00:24:03,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15291.8, 300 sec: 15106.6). Total num frames: 174501888. Throughput: 0: 3777.3. Samples: 32784252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:24:03,968][134211] Avg episode reward: [(0, '6.391')] [2025-01-04 00:24:04,026][134294] Updated weights for policy 0, policy_version 42604 (0.0013) [2025-01-04 00:24:06,490][134294] Updated weights for policy 0, policy_version 42614 (0.0021) [2025-01-04 00:24:08,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15496.5, 300 sec: 15120.5). Total num frames: 174575616. Throughput: 0: 3713.1. Samples: 32812072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:08,968][134211] Avg episode reward: [(0, '6.078')] [2025-01-04 00:24:09,723][134294] Updated weights for policy 0, policy_version 42624 (0.0025) [2025-01-04 00:24:12,822][134294] Updated weights for policy 0, policy_version 42634 (0.0024) [2025-01-04 00:24:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15428.3, 300 sec: 15051.1). Total num frames: 174641152. Throughput: 0: 3714.5. Samples: 32831494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:13,968][134211] Avg episode reward: [(0, '7.273')] [2025-01-04 00:24:15,951][134294] Updated weights for policy 0, policy_version 42644 (0.0027) [2025-01-04 00:24:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14677.4, 300 sec: 14884.4). Total num frames: 174702592. Throughput: 0: 3731.6. Samples: 32841556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:18,968][134211] Avg episode reward: [(0, '6.954')] [2025-01-04 00:24:19,395][134294] Updated weights for policy 0, policy_version 42654 (0.0025) [2025-01-04 00:24:22,446][134294] Updated weights for policy 0, policy_version 42664 (0.0023) [2025-01-04 00:24:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14540.8, 300 sec: 14815.0). Total num frames: 174780416. Throughput: 0: 3703.2. Samples: 32859794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:23,968][134211] Avg episode reward: [(0, '6.514')] [2025-01-04 00:24:24,523][134294] Updated weights for policy 0, policy_version 42674 (0.0014) [2025-01-04 00:24:26,429][134294] Updated weights for policy 0, policy_version 42684 (0.0016) [2025-01-04 00:24:28,337][134294] Updated weights for policy 0, policy_version 42694 (0.0013) [2025-01-04 00:24:28,967][134211] Fps is (10 sec: 18432.2, 60 sec: 15223.5, 300 sec: 14981.7). Total num frames: 174886912. Throughput: 0: 3949.7. Samples: 32891020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:28,968][134211] Avg episode reward: [(0, '6.921')] [2025-01-04 00:24:30,683][134294] Updated weights for policy 0, policy_version 42704 (0.0018) [2025-01-04 00:24:33,804][134294] Updated weights for policy 0, policy_version 42714 (0.0026) [2025-01-04 00:24:33,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15360.5, 300 sec: 14981.6). Total num frames: 174956544. Throughput: 0: 4010.4. Samples: 32903800. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:33,968][134211] Avg episode reward: [(0, '7.368')] [2025-01-04 00:24:37,006][134294] Updated weights for policy 0, policy_version 42724 (0.0029) [2025-01-04 00:24:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15360.0, 300 sec: 14981.6). Total num frames: 175022080. Throughput: 0: 3895.2. Samples: 32923172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:38,968][134211] Avg episode reward: [(0, '6.873')] [2025-01-04 00:24:40,185][134294] Updated weights for policy 0, policy_version 42734 (0.0025) [2025-01-04 00:24:43,107][134294] Updated weights for policy 0, policy_version 42744 (0.0022) [2025-01-04 00:24:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.8, 300 sec: 14981.6). Total num frames: 175087616. Throughput: 0: 3877.0. Samples: 32943078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:43,968][134211] Avg episode reward: [(0, '7.175')] [2025-01-04 00:24:46,091][134294] Updated weights for policy 0, policy_version 42754 (0.0025) [2025-01-04 00:24:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 14981.6). Total num frames: 175157248. Throughput: 0: 3759.2. Samples: 32953416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:48,968][134211] Avg episode reward: [(0, '6.860')] [2025-01-04 00:24:49,219][134294] Updated weights for policy 0, policy_version 42764 (0.0023) [2025-01-04 00:24:52,296][134294] Updated weights for policy 0, policy_version 42774 (0.0025) [2025-01-04 00:24:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 175218688. Throughput: 0: 3584.9. Samples: 32973392. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:53,968][134211] Avg episode reward: [(0, '6.804')] [2025-01-04 00:24:55,596][134294] Updated weights for policy 0, policy_version 42784 (0.0027) [2025-01-04 00:24:57,586][134294] Updated weights for policy 0, policy_version 42794 (0.0014) [2025-01-04 00:24:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15155.1, 300 sec: 15009.4). Total num frames: 175300608. Throughput: 0: 3664.9. Samples: 32996414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:24:58,968][134211] Avg episode reward: [(0, '6.710')] [2025-01-04 00:25:00,476][134294] Updated weights for policy 0, policy_version 42804 (0.0024) [2025-01-04 00:25:03,396][134294] Updated weights for policy 0, policy_version 42814 (0.0026) [2025-01-04 00:25:03,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14472.5, 300 sec: 14926.1). Total num frames: 175370240. Throughput: 0: 3676.6. Samples: 33007004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:03,968][134211] Avg episode reward: [(0, '6.901')] [2025-01-04 00:25:06,365][134294] Updated weights for policy 0, policy_version 42824 (0.0025) [2025-01-04 00:25:08,502][134294] Updated weights for policy 0, policy_version 42834 (0.0014) [2025-01-04 00:25:08,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14677.4, 300 sec: 14856.7). Total num frames: 175456256. Throughput: 0: 3748.0. Samples: 33028452. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:08,968][134211] Avg episode reward: [(0, '6.187')] [2025-01-04 00:25:10,366][134294] Updated weights for policy 0, policy_version 42844 (0.0013) [2025-01-04 00:25:13,032][134294] Updated weights for policy 0, policy_version 42854 (0.0026) [2025-01-04 00:25:13,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15018.6, 300 sec: 14940.0). Total num frames: 175542272. Throughput: 0: 3667.8. Samples: 33056072. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:13,968][134211] Avg episode reward: [(0, '6.773')] [2025-01-04 00:25:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042857_175542272.pth... [2025-01-04 00:25:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041977_171937792.pth [2025-01-04 00:25:16,214][134294] Updated weights for policy 0, policy_version 42864 (0.0026) [2025-01-04 00:25:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15086.9, 300 sec: 14967.8). Total num frames: 175607808. Throughput: 0: 3600.6. Samples: 33065826. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:18,968][134211] Avg episode reward: [(0, '6.367')] [2025-01-04 00:25:19,207][134294] Updated weights for policy 0, policy_version 42874 (0.0027) [2025-01-04 00:25:22,263][134294] Updated weights for policy 0, policy_version 42884 (0.0025) [2025-01-04 00:25:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14882.1, 300 sec: 14981.6). Total num frames: 175673344. Throughput: 0: 3622.9. Samples: 33086200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:23,968][134211] Avg episode reward: [(0, '6.519')] [2025-01-04 00:25:25,254][134294] Updated weights for policy 0, policy_version 42894 (0.0024) [2025-01-04 00:25:27,156][134294] Updated weights for policy 0, policy_version 42904 (0.0014) [2025-01-04 00:25:28,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14540.8, 300 sec: 14926.1). Total num frames: 175759360. Throughput: 0: 3732.8. Samples: 33111054. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:25:28,968][134211] Avg episode reward: [(0, '6.181')] [2025-01-04 00:25:29,869][134294] Updated weights for policy 0, policy_version 42914 (0.0025) [2025-01-04 00:25:32,720][134294] Updated weights for policy 0, policy_version 42924 (0.0026) [2025-01-04 00:25:33,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14540.8, 300 sec: 14842.8). Total num frames: 175828992. Throughput: 0: 3745.2. Samples: 33121952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:25:33,968][134211] Avg episode reward: [(0, '6.786')] [2025-01-04 00:25:35,658][134294] Updated weights for policy 0, policy_version 42934 (0.0024) [2025-01-04 00:25:37,592][134294] Updated weights for policy 0, policy_version 42944 (0.0015) [2025-01-04 00:25:38,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14882.2, 300 sec: 14926.1). Total num frames: 175915008. Throughput: 0: 3831.9. Samples: 33145828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:25:38,968][134211] Avg episode reward: [(0, '6.009')] [2025-01-04 00:25:40,434][134294] Updated weights for policy 0, policy_version 42954 (0.0025) [2025-01-04 00:25:43,358][134294] Updated weights for policy 0, policy_version 42964 (0.0026) [2025-01-04 00:25:43,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 175984640. Throughput: 0: 3803.5. Samples: 33167570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:25:43,968][134211] Avg episode reward: [(0, '6.155')] [2025-01-04 00:25:46,421][134294] Updated weights for policy 0, policy_version 42974 (0.0026) [2025-01-04 00:25:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14981.6). Total num frames: 176054272. Throughput: 0: 3794.5. Samples: 33177758. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:25:48,968][134211] Avg episode reward: [(0, '6.427')] [2025-01-04 00:25:49,683][134294] Updated weights for policy 0, policy_version 42984 (0.0025) [2025-01-04 00:25:51,907][134294] Updated weights for policy 0, policy_version 42994 (0.0018) [2025-01-04 00:25:53,845][134294] Updated weights for policy 0, policy_version 43004 (0.0012) [2025-01-04 00:25:53,968][134211] Fps is (10 sec: 15974.4, 60 sec: 15428.3, 300 sec: 15065.0). Total num frames: 176144384. Throughput: 0: 3819.5. Samples: 33200332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:25:53,968][134211] Avg episode reward: [(0, '6.122')] [2025-01-04 00:25:55,723][134294] Updated weights for policy 0, policy_version 43014 (0.0014) [2025-01-04 00:25:57,818][134294] Updated weights for policy 0, policy_version 43024 (0.0017) [2025-01-04 00:25:58,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15633.1, 300 sec: 15134.4). Total num frames: 176238592. Throughput: 0: 3883.5. Samples: 33230828. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:25:58,968][134211] Avg episode reward: [(0, '6.039')] [2025-01-04 00:26:01,066][134294] Updated weights for policy 0, policy_version 43034 (0.0026) [2025-01-04 00:26:03,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15496.5, 300 sec: 15023.3). Total num frames: 176300032. Throughput: 0: 3878.0. Samples: 33240338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:26:03,968][134211] Avg episode reward: [(0, '6.652')] [2025-01-04 00:26:04,519][134294] Updated weights for policy 0, policy_version 43044 (0.0028) [2025-01-04 00:26:07,537][134294] Updated weights for policy 0, policy_version 43054 (0.0025) [2025-01-04 00:26:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15155.1, 300 sec: 15023.3). Total num frames: 176365568. Throughput: 0: 3845.6. Samples: 33259254. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:26:08,968][134211] Avg episode reward: [(0, '7.034')] [2025-01-04 00:26:10,659][134294] Updated weights for policy 0, policy_version 43064 (0.0024) [2025-01-04 00:26:13,433][134294] Updated weights for policy 0, policy_version 43074 (0.0025) [2025-01-04 00:26:13,967][134211] Fps is (10 sec: 13926.7, 60 sec: 14950.5, 300 sec: 15037.3). Total num frames: 176439296. Throughput: 0: 3747.9. Samples: 33279708. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:26:13,968][134211] Avg episode reward: [(0, '6.107')] [2025-01-04 00:26:15,345][134294] Updated weights for policy 0, policy_version 43084 (0.0014) [2025-01-04 00:26:17,200][134294] Updated weights for policy 0, policy_version 43094 (0.0012) [2025-01-04 00:26:18,968][134211] Fps is (10 sec: 18022.9, 60 sec: 15633.1, 300 sec: 15106.6). Total num frames: 176545792. Throughput: 0: 3863.5. Samples: 33295810. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:26:18,968][134211] Avg episode reward: [(0, '7.240')] [2025-01-04 00:26:19,233][134294] Updated weights for policy 0, policy_version 43104 (0.0012) [2025-01-04 00:26:21,343][134294] Updated weights for policy 0, policy_version 43114 (0.0014) [2025-01-04 00:26:23,968][134211] Fps is (10 sec: 19250.3, 60 sec: 15974.3, 300 sec: 15023.3). Total num frames: 176631808. Throughput: 0: 4001.7. Samples: 33325906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:23,969][134211] Avg episode reward: [(0, '7.465')] [2025-01-04 00:26:24,053][134294] Updated weights for policy 0, policy_version 43124 (0.0020) [2025-01-04 00:26:27,408][134294] Updated weights for policy 0, policy_version 43134 (0.0029) [2025-01-04 00:26:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15564.8, 300 sec: 14995.5). Total num frames: 176693248. Throughput: 0: 3933.8. Samples: 33344592. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:28,968][134211] Avg episode reward: [(0, '7.740')] [2025-01-04 00:26:28,969][134264] Saving new best policy, reward=7.740! [2025-01-04 00:26:30,714][134294] Updated weights for policy 0, policy_version 43144 (0.0026) [2025-01-04 00:26:33,724][134294] Updated weights for policy 0, policy_version 43154 (0.0025) [2025-01-04 00:26:33,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15496.6, 300 sec: 14995.5). Total num frames: 176758784. Throughput: 0: 3923.2. Samples: 33354300. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:33,968][134211] Avg episode reward: [(0, '7.044')] [2025-01-04 00:26:36,869][134294] Updated weights for policy 0, policy_version 43164 (0.0026) [2025-01-04 00:26:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 14995.5). Total num frames: 176824320. Throughput: 0: 3866.3. Samples: 33374316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:38,968][134211] Avg episode reward: [(0, '6.644')] [2025-01-04 00:26:39,911][134294] Updated weights for policy 0, policy_version 43174 (0.0026) [2025-01-04 00:26:42,985][134294] Updated weights for policy 0, policy_version 43184 (0.0025) [2025-01-04 00:26:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15155.2, 300 sec: 15009.4). Total num frames: 176893952. Throughput: 0: 3635.3. Samples: 33394418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:43,968][134211] Avg episode reward: [(0, '6.390')] [2025-01-04 00:26:45,943][134294] Updated weights for policy 0, policy_version 43194 (0.0027) [2025-01-04 00:26:48,858][134294] Updated weights for policy 0, policy_version 43204 (0.0024) [2025-01-04 00:26:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 15023.3). Total num frames: 176963584. Throughput: 0: 3650.0. Samples: 33404588. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:26:48,968][134211] Avg episode reward: [(0, '6.437')] [2025-01-04 00:26:51,863][134294] Updated weights for policy 0, policy_version 43214 (0.0026) [2025-01-04 00:26:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.8, 300 sec: 15023.3). Total num frames: 177033216. Throughput: 0: 3692.1. Samples: 33425400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:26:53,969][134211] Avg episode reward: [(0, '6.949')] [2025-01-04 00:26:54,897][134294] Updated weights for policy 0, policy_version 43224 (0.0025) [2025-01-04 00:26:57,622][134294] Updated weights for policy 0, policy_version 43234 (0.0023) [2025-01-04 00:26:58,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14540.9, 300 sec: 14926.1). Total num frames: 177111040. Throughput: 0: 3735.7. Samples: 33447814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:26:58,968][134211] Avg episode reward: [(0, '6.964')] [2025-01-04 00:26:59,573][134294] Updated weights for policy 0, policy_version 43244 (0.0016) [2025-01-04 00:27:01,451][134294] Updated weights for policy 0, policy_version 43254 (0.0014) [2025-01-04 00:27:03,347][134294] Updated weights for policy 0, policy_version 43264 (0.0014) [2025-01-04 00:27:03,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15359.9, 300 sec: 15023.3). Total num frames: 177221632. Throughput: 0: 3734.9. Samples: 33463882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:27:03,968][134211] Avg episode reward: [(0, '6.372')] [2025-01-04 00:27:05,212][134294] Updated weights for policy 0, policy_version 43274 (0.0013) [2025-01-04 00:27:07,371][134294] Updated weights for policy 0, policy_version 43284 (0.0018) [2025-01-04 00:27:08,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15701.3, 300 sec: 15120.5). Total num frames: 177307648. Throughput: 0: 3762.3. Samples: 33495210. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:27:08,968][134211] Avg episode reward: [(0, '6.772')] [2025-01-04 00:27:10,760][134294] Updated weights for policy 0, policy_version 43294 (0.0029) [2025-01-04 00:27:13,859][134294] Updated weights for policy 0, policy_version 43304 (0.0027) [2025-01-04 00:27:13,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15564.7, 300 sec: 15134.4). Total num frames: 177373184. Throughput: 0: 3772.2. Samples: 33514344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:27:13,968][134211] Avg episode reward: [(0, '6.376')] [2025-01-04 00:27:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043304_177373184.pth... [2025-01-04 00:27:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042411_173715456.pth [2025-01-04 00:27:17,052][134294] Updated weights for policy 0, policy_version 43314 (0.0028) [2025-01-04 00:27:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.8, 300 sec: 15134.4). Total num frames: 177434624. Throughput: 0: 3767.7. Samples: 33523846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:18,968][134211] Avg episode reward: [(0, '6.330')] [2025-01-04 00:27:20,303][134294] Updated weights for policy 0, policy_version 43324 (0.0026) [2025-01-04 00:27:23,209][134294] Updated weights for policy 0, policy_version 43334 (0.0023) [2025-01-04 00:27:23,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14540.7, 300 sec: 15023.3). Total num frames: 177504256. Throughput: 0: 3765.2. Samples: 33543754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:23,970][134211] Avg episode reward: [(0, '7.025')] [2025-01-04 00:27:26,279][134294] Updated weights for policy 0, policy_version 43344 (0.0025) [2025-01-04 00:27:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 177569792. 
Throughput: 0: 3766.5. Samples: 33563908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:28,968][134211] Avg episode reward: [(0, '6.989')] [2025-01-04 00:27:29,403][134294] Updated weights for policy 0, policy_version 43354 (0.0026) [2025-01-04 00:27:32,433][134294] Updated weights for policy 0, policy_version 43364 (0.0023) [2025-01-04 00:27:33,967][134211] Fps is (10 sec: 13517.8, 60 sec: 14677.4, 300 sec: 14856.7). Total num frames: 177639424. Throughput: 0: 3752.2. Samples: 33573438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:33,968][134211] Avg episode reward: [(0, '6.270')] [2025-01-04 00:27:34,790][134294] Updated weights for policy 0, policy_version 43374 (0.0016) [2025-01-04 00:27:36,720][134294] Updated weights for policy 0, policy_version 43384 (0.0014) [2025-01-04 00:27:38,579][134294] Updated weights for policy 0, policy_version 43394 (0.0013) [2025-01-04 00:27:38,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15360.0, 300 sec: 14995.5). Total num frames: 177745920. Throughput: 0: 3904.4. Samples: 33601096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:38,968][134211] Avg episode reward: [(0, '6.043')] [2025-01-04 00:27:40,465][134294] Updated weights for policy 0, policy_version 43404 (0.0015) [2025-01-04 00:27:42,825][134294] Updated weights for policy 0, policy_version 43414 (0.0020) [2025-01-04 00:27:43,968][134211] Fps is (10 sec: 19659.1, 60 sec: 15701.2, 300 sec: 15078.8). Total num frames: 177836032. Throughput: 0: 4052.1. Samples: 33630160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:27:43,969][134211] Avg episode reward: [(0, '6.190')] [2025-01-04 00:27:45,977][134294] Updated weights for policy 0, policy_version 43424 (0.0025) [2025-01-04 00:27:48,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15633.1, 300 sec: 15078.8). Total num frames: 177901568. Throughput: 0: 3914.2. Samples: 33640020. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:27:48,968][134211] Avg episode reward: [(0, '6.277')] [2025-01-04 00:27:49,028][134294] Updated weights for policy 0, policy_version 43434 (0.0028) [2025-01-04 00:27:52,258][134294] Updated weights for policy 0, policy_version 43444 (0.0025) [2025-01-04 00:27:53,968][134211] Fps is (10 sec: 13107.9, 60 sec: 15564.8, 300 sec: 15064.9). Total num frames: 177967104. Throughput: 0: 3652.3. Samples: 33659564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:27:53,968][134211] Avg episode reward: [(0, '6.229')] [2025-01-04 00:27:55,439][134294] Updated weights for policy 0, policy_version 43454 (0.0028) [2025-01-04 00:27:58,484][134294] Updated weights for policy 0, policy_version 43464 (0.0029) [2025-01-04 00:27:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15359.9, 300 sec: 15078.8). Total num frames: 178032640. Throughput: 0: 3668.4. Samples: 33679422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:27:58,968][134211] Avg episode reward: [(0, '6.412')] [2025-01-04 00:28:01,507][134294] Updated weights for policy 0, policy_version 43474 (0.0025) [2025-01-04 00:28:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.4, 300 sec: 15106.6). Total num frames: 178102272. Throughput: 0: 3684.1. Samples: 33689630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:03,968][134211] Avg episode reward: [(0, '6.746')] [2025-01-04 00:28:04,561][134294] Updated weights for policy 0, policy_version 43484 (0.0026) [2025-01-04 00:28:07,552][134294] Updated weights for policy 0, policy_version 43494 (0.0027) [2025-01-04 00:28:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 15092.7). Total num frames: 178167808. Throughput: 0: 3688.4. Samples: 33709730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:08,968][134211] Avg episode reward: [(0, '6.224')] [2025-01-04 00:28:10,579][134294] Updated weights for policy 0, policy_version 43504 (0.0027) [2025-01-04 00:28:12,732][134294] Updated weights for policy 0, policy_version 43514 (0.0015) [2025-01-04 00:28:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14745.7, 300 sec: 15037.2). Total num frames: 178257920. Throughput: 0: 3776.6. Samples: 33733856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:13,968][134211] Avg episode reward: [(0, '6.700')] [2025-01-04 00:28:14,673][134294] Updated weights for policy 0, policy_version 43524 (0.0013) [2025-01-04 00:28:16,533][134294] Updated weights for policy 0, policy_version 43534 (0.0014) [2025-01-04 00:28:18,452][134294] Updated weights for policy 0, policy_version 43544 (0.0013) [2025-01-04 00:28:18,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15496.6, 300 sec: 15106.6). Total num frames: 178364416. Throughput: 0: 3925.6. Samples: 33750092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:18,968][134211] Avg episode reward: [(0, '6.366')] [2025-01-04 00:28:20,430][134294] Updated weights for policy 0, policy_version 43554 (0.0014) [2025-01-04 00:28:22,631][134294] Updated weights for policy 0, policy_version 43564 (0.0016) [2025-01-04 00:28:23,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15838.0, 300 sec: 15189.9). Total num frames: 178454528. Throughput: 0: 3997.7. Samples: 33780992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:23,969][134211] Avg episode reward: [(0, '6.896')] [2025-01-04 00:28:26,107][134294] Updated weights for policy 0, policy_version 43574 (0.0030) [2025-01-04 00:28:28,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15701.3, 300 sec: 15176.1). Total num frames: 178511872. Throughput: 0: 3752.6. Samples: 33799024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:28,969][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 00:28:29,519][134294] Updated weights for policy 0, policy_version 43584 (0.0028) [2025-01-04 00:28:32,599][134294] Updated weights for policy 0, policy_version 43594 (0.0023) [2025-01-04 00:28:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15633.0, 300 sec: 15176.0). Total num frames: 178577408. Throughput: 0: 3745.8. Samples: 33808582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:33,968][134211] Avg episode reward: [(0, '6.358')] [2025-01-04 00:28:35,695][134294] Updated weights for policy 0, policy_version 43604 (0.0023) [2025-01-04 00:28:38,713][134294] Updated weights for policy 0, policy_version 43614 (0.0025) [2025-01-04 00:28:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14950.4, 300 sec: 15162.2). Total num frames: 178642944. Throughput: 0: 3762.8. Samples: 33828890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:38,968][134211] Avg episode reward: [(0, '6.331')] [2025-01-04 00:28:41,790][134294] Updated weights for policy 0, policy_version 43624 (0.0026) [2025-01-04 00:28:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.2, 300 sec: 15134.4). Total num frames: 178712576. Throughput: 0: 3766.4. Samples: 33848908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:43,968][134211] Avg episode reward: [(0, '6.594')] [2025-01-04 00:28:44,791][134294] Updated weights for policy 0, policy_version 43634 (0.0028) [2025-01-04 00:28:48,012][134294] Updated weights for policy 0, policy_version 43644 (0.0025) [2025-01-04 00:28:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 15092.7). Total num frames: 178774016. Throughput: 0: 3762.1. Samples: 33858922. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:48,968][134211] Avg episode reward: [(0, '6.654')] [2025-01-04 00:28:51,010][134294] Updated weights for policy 0, policy_version 43654 (0.0026) [2025-01-04 00:28:53,970][134211] Fps is (10 sec: 13104.0, 60 sec: 14608.5, 300 sec: 15092.6). Total num frames: 178843648. Throughput: 0: 3754.7. Samples: 33878702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:53,971][134211] Avg episode reward: [(0, '6.261')] [2025-01-04 00:28:54,252][134294] Updated weights for policy 0, policy_version 43664 (0.0026) [2025-01-04 00:28:56,558][134294] Updated weights for policy 0, policy_version 43674 (0.0015) [2025-01-04 00:28:58,487][134294] Updated weights for policy 0, policy_version 43684 (0.0015) [2025-01-04 00:28:58,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15087.0, 300 sec: 15037.2). Total num frames: 178937856. Throughput: 0: 3784.7. Samples: 33904168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:28:58,968][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 00:29:00,742][134294] Updated weights for policy 0, policy_version 43694 (0.0019) [2025-01-04 00:29:03,968][134211] Fps is (10 sec: 16388.2, 60 sec: 15087.0, 300 sec: 15023.3). Total num frames: 179007488. Throughput: 0: 3712.9. Samples: 33917174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:03,968][134211] Avg episode reward: [(0, '6.847')] [2025-01-04 00:29:04,007][134294] Updated weights for policy 0, policy_version 43704 (0.0027) [2025-01-04 00:29:06,992][134294] Updated weights for policy 0, policy_version 43714 (0.0025) [2025-01-04 00:29:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15155.2, 300 sec: 15037.2). Total num frames: 179077120. Throughput: 0: 3461.3. Samples: 33936752. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:08,968][134211] Avg episode reward: [(0, '6.148')] [2025-01-04 00:29:10,153][134294] Updated weights for policy 0, policy_version 43724 (0.0023) [2025-01-04 00:29:13,031][134294] Updated weights for policy 0, policy_version 43734 (0.0025) [2025-01-04 00:29:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14813.8, 300 sec: 15064.9). Total num frames: 179146752. Throughput: 0: 3512.6. Samples: 33957090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:13,968][134211] Avg episode reward: [(0, '6.639')] [2025-01-04 00:29:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043737_179146752.pth... [2025-01-04 00:29:14,024][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042857_175542272.pth [2025-01-04 00:29:15,215][134294] Updated weights for policy 0, policy_version 43744 (0.0017) [2025-01-04 00:29:17,869][134294] Updated weights for policy 0, policy_version 43754 (0.0020) [2025-01-04 00:29:18,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14404.2, 300 sec: 15078.8). Total num frames: 179228672. Throughput: 0: 3616.4. Samples: 33971320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:18,968][134211] Avg episode reward: [(0, '6.481')] [2025-01-04 00:29:20,910][134294] Updated weights for policy 0, policy_version 43764 (0.0025) [2025-01-04 00:29:23,420][134294] Updated weights for policy 0, policy_version 43774 (0.0021) [2025-01-04 00:29:23,967][134211] Fps is (10 sec: 15975.0, 60 sec: 14199.6, 300 sec: 14981.6). Total num frames: 179306496. Throughput: 0: 3627.6. Samples: 33992132. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:23,968][134211] Avg episode reward: [(0, '5.949')] [2025-01-04 00:29:25,330][134294] Updated weights for policy 0, policy_version 43784 (0.0012) [2025-01-04 00:29:27,220][134294] Updated weights for policy 0, policy_version 43794 (0.0013) [2025-01-04 00:29:28,967][134211] Fps is (10 sec: 18842.1, 60 sec: 15087.0, 300 sec: 15120.5). Total num frames: 179417088. Throughput: 0: 3878.1. Samples: 34023422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:28,968][134211] Avg episode reward: [(0, '6.536')] [2025-01-04 00:29:29,126][134294] Updated weights for policy 0, policy_version 43804 (0.0014) [2025-01-04 00:29:31,279][134294] Updated weights for policy 0, policy_version 43814 (0.0015) [2025-01-04 00:29:33,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15291.8, 300 sec: 15162.1). Total num frames: 179494912. Throughput: 0: 3991.3. Samples: 34038532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:33,968][134211] Avg episode reward: [(0, '6.489')] [2025-01-04 00:29:34,376][134294] Updated weights for policy 0, policy_version 43824 (0.0028) [2025-01-04 00:29:37,612][134294] Updated weights for policy 0, policy_version 43834 (0.0028) [2025-01-04 00:29:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.5, 300 sec: 15148.3). Total num frames: 179556352. Throughput: 0: 3981.7. Samples: 34057870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:38,968][134211] Avg episode reward: [(0, '5.676')] [2025-01-04 00:29:40,797][134294] Updated weights for policy 0, policy_version 43844 (0.0028) [2025-01-04 00:29:43,853][134294] Updated weights for policy 0, policy_version 43854 (0.0026) [2025-01-04 00:29:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.5, 300 sec: 15148.3). Total num frames: 179625984. Throughput: 0: 3852.9. Samples: 34077548. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:43,968][134211] Avg episode reward: [(0, '6.409')] [2025-01-04 00:29:46,804][134294] Updated weights for policy 0, policy_version 43864 (0.0027) [2025-01-04 00:29:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15291.7, 300 sec: 15162.1). Total num frames: 179691520. Throughput: 0: 3787.6. Samples: 34087618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:48,968][134211] Avg episode reward: [(0, '6.410')] [2025-01-04 00:29:50,012][134294] Updated weights for policy 0, policy_version 43874 (0.0027) [2025-01-04 00:29:53,101][134294] Updated weights for policy 0, policy_version 43884 (0.0023) [2025-01-04 00:29:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15224.1, 300 sec: 15106.6). Total num frames: 179757056. Throughput: 0: 3791.5. Samples: 34107368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:53,968][134211] Avg episode reward: [(0, '6.661')] [2025-01-04 00:29:56,214][134294] Updated weights for policy 0, policy_version 43894 (0.0025) [2025-01-04 00:29:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 15106.6). Total num frames: 179826688. Throughput: 0: 3786.6. Samples: 34127488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:29:58,968][134211] Avg episode reward: [(0, '6.502')] [2025-01-04 00:29:59,171][134294] Updated weights for policy 0, policy_version 43904 (0.0025) [2025-01-04 00:30:02,380][134294] Updated weights for policy 0, policy_version 43914 (0.0026) [2025-01-04 00:30:03,967][134211] Fps is (10 sec: 14336.2, 60 sec: 14882.2, 300 sec: 15064.9). Total num frames: 179900416. Throughput: 0: 3694.1. Samples: 34137554. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:03,968][134211] Avg episode reward: [(0, '6.604')] [2025-01-04 00:30:04,349][134294] Updated weights for policy 0, policy_version 43924 (0.0014) [2025-01-04 00:30:06,187][134294] Updated weights for policy 0, policy_version 43934 (0.0015) [2025-01-04 00:30:08,100][134294] Updated weights for policy 0, policy_version 43944 (0.0014) [2025-01-04 00:30:08,968][134211] Fps is (10 sec: 18432.3, 60 sec: 15564.8, 300 sec: 15148.3). Total num frames: 180011008. Throughput: 0: 3885.0. Samples: 34166958. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:08,968][134211] Avg episode reward: [(0, '6.281')] [2025-01-04 00:30:09,954][134294] Updated weights for policy 0, policy_version 43954 (0.0013) [2025-01-04 00:30:12,815][134294] Updated weights for policy 0, policy_version 43964 (0.0025) [2025-01-04 00:30:13,968][134211] Fps is (10 sec: 18840.5, 60 sec: 15701.3, 300 sec: 15189.9). Total num frames: 180088832. Throughput: 0: 3773.3. Samples: 34193222. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:13,969][134211] Avg episode reward: [(0, '6.656')] [2025-01-04 00:30:16,110][134294] Updated weights for policy 0, policy_version 43974 (0.0025) [2025-01-04 00:30:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15360.0, 300 sec: 15176.0). Total num frames: 180150272. Throughput: 0: 3647.9. Samples: 34202688. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:18,968][134211] Avg episode reward: [(0, '6.324')] [2025-01-04 00:30:19,436][134294] Updated weights for policy 0, policy_version 43984 (0.0029) [2025-01-04 00:30:22,971][134294] Updated weights for policy 0, policy_version 43994 (0.0026) [2025-01-04 00:30:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15018.5, 300 sec: 15078.8). Total num frames: 180207616. Throughput: 0: 3619.5. Samples: 34220750. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:23,969][134211] Avg episode reward: [(0, '6.498')] [2025-01-04 00:30:26,624][134294] Updated weights for policy 0, policy_version 44004 (0.0029) [2025-01-04 00:30:28,756][134294] Updated weights for policy 0, policy_version 44014 (0.0015) [2025-01-04 00:30:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 15092.7). Total num frames: 180281344. Throughput: 0: 3619.4. Samples: 34240420. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 00:30:28,968][134211] Avg episode reward: [(0, '6.778')] [2025-01-04 00:30:30,753][134294] Updated weights for policy 0, policy_version 44024 (0.0014) [2025-01-04 00:30:32,612][134294] Updated weights for policy 0, policy_version 44034 (0.0014) [2025-01-04 00:30:33,968][134211] Fps is (10 sec: 18433.1, 60 sec: 14950.4, 300 sec: 15176.0). Total num frames: 180391936. Throughput: 0: 3745.8. Samples: 34256178. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 00:30:33,968][134211] Avg episode reward: [(0, '7.046')] [2025-01-04 00:30:34,610][134294] Updated weights for policy 0, policy_version 44044 (0.0015) [2025-01-04 00:30:37,647][134294] Updated weights for policy 0, policy_version 44054 (0.0027) [2025-01-04 00:30:38,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15018.7, 300 sec: 15162.1). Total num frames: 180457472. Throughput: 0: 3906.8. Samples: 34283174. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 00:30:38,968][134211] Avg episode reward: [(0, '6.620')] [2025-01-04 00:30:40,873][134294] Updated weights for policy 0, policy_version 44064 (0.0029) [2025-01-04 00:30:43,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14950.3, 300 sec: 15148.2). Total num frames: 180523008. Throughput: 0: 3887.5. Samples: 34302426. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 00:30:43,969][134211] Avg episode reward: [(0, '7.091')] [2025-01-04 00:30:44,016][134294] Updated weights for policy 0, policy_version 44074 (0.0028) [2025-01-04 00:30:47,123][134294] Updated weights for policy 0, policy_version 44084 (0.0027) [2025-01-04 00:30:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14950.4, 300 sec: 15065.0). Total num frames: 180588544. Throughput: 0: 3876.7. Samples: 34312006. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 00:30:48,968][134211] Avg episode reward: [(0, '6.579')] [2025-01-04 00:30:50,510][134294] Updated weights for policy 0, policy_version 44094 (0.0027) [2025-01-04 00:30:53,344][134294] Updated weights for policy 0, policy_version 44104 (0.0025) [2025-01-04 00:30:53,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14950.4, 300 sec: 14967.8). Total num frames: 180654080. Throughput: 0: 3657.9. Samples: 34331562. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 00:30:53,968][134211] Avg episode reward: [(0, '6.380')] [2025-01-04 00:30:56,434][134294] Updated weights for policy 0, policy_version 44114 (0.0025) [2025-01-04 00:30:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14995.5). Total num frames: 180723712. Throughput: 0: 3519.9. Samples: 34351614. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:30:58,968][134211] Avg episode reward: [(0, '6.966')] [2025-01-04 00:30:59,598][134294] Updated weights for policy 0, policy_version 44124 (0.0025) [2025-01-04 00:31:01,995][134294] Updated weights for policy 0, policy_version 44134 (0.0019) [2025-01-04 00:31:03,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15155.2, 300 sec: 15065.0). Total num frames: 180809728. Throughput: 0: 3544.2. Samples: 34362176. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:31:03,968][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 00:31:04,004][134294] Updated weights for policy 0, policy_version 44144 (0.0012) [2025-01-04 00:31:05,854][134294] Updated weights for policy 0, policy_version 44154 (0.0014) [2025-01-04 00:31:07,775][134294] Updated weights for policy 0, policy_version 44164 (0.0013) [2025-01-04 00:31:08,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15155.2, 300 sec: 15189.9). Total num frames: 180920320. Throughput: 0: 3850.4. Samples: 34394016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:31:08,968][134211] Avg episode reward: [(0, '6.298')] [2025-01-04 00:31:09,647][134294] Updated weights for policy 0, policy_version 44174 (0.0013) [2025-01-04 00:31:11,571][134294] Updated weights for policy 0, policy_version 44184 (0.0013) [2025-01-04 00:31:13,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15360.0, 300 sec: 15134.4). Total num frames: 181010432. Throughput: 0: 4081.4. Samples: 34424084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:31:13,969][134211] Avg episode reward: [(0, '7.014')] [2025-01-04 00:31:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044192_181010432.pth... [2025-01-04 00:31:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043304_177373184.pth [2025-01-04 00:31:14,561][134294] Updated weights for policy 0, policy_version 44194 (0.0024) [2025-01-04 00:31:18,133][134294] Updated weights for policy 0, policy_version 44204 (0.0030) [2025-01-04 00:31:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15291.7, 300 sec: 15037.2). Total num frames: 181067776. Throughput: 0: 3911.1. Samples: 34432180. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:31:18,968][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 00:31:21,417][134294] Updated weights for policy 0, policy_version 44214 (0.0023) [2025-01-04 00:31:23,968][134211] Fps is (10 sec: 11878.7, 60 sec: 15360.1, 300 sec: 15037.2). Total num frames: 181129216. Throughput: 0: 3716.3. Samples: 34450406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:31:23,968][134211] Avg episode reward: [(0, '6.392')] [2025-01-04 00:31:24,909][134294] Updated weights for policy 0, policy_version 44224 (0.0028) [2025-01-04 00:31:28,106][134294] Updated weights for policy 0, policy_version 44234 (0.0025) [2025-01-04 00:31:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15155.2, 300 sec: 15023.3). Total num frames: 181190656. Throughput: 0: 3705.0. Samples: 34469148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:28,968][134211] Avg episode reward: [(0, '6.524')] [2025-01-04 00:31:31,061][134294] Updated weights for policy 0, policy_version 44244 (0.0025) [2025-01-04 00:31:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 15037.2). Total num frames: 181260288. Throughput: 0: 3716.6. Samples: 34479252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:33,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-04 00:31:34,198][134294] Updated weights for policy 0, policy_version 44254 (0.0025) [2025-01-04 00:31:37,196][134294] Updated weights for policy 0, policy_version 44264 (0.0024) [2025-01-04 00:31:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 15023.3). Total num frames: 181325824. Throughput: 0: 3731.0. Samples: 34499456. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:38,969][134211] Avg episode reward: [(0, '6.411')] [2025-01-04 00:31:40,339][134294] Updated weights for policy 0, policy_version 44274 (0.0024) [2025-01-04 00:31:43,245][134294] Updated weights for policy 0, policy_version 44284 (0.0026) [2025-01-04 00:31:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 15023.3). Total num frames: 181395456. Throughput: 0: 3735.9. Samples: 34519732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:43,968][134211] Avg episode reward: [(0, '6.621')] [2025-01-04 00:31:46,334][134294] Updated weights for policy 0, policy_version 44294 (0.0026) [2025-01-04 00:31:48,308][134294] Updated weights for policy 0, policy_version 44304 (0.0012) [2025-01-04 00:31:48,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14882.1, 300 sec: 15078.8). Total num frames: 181481472. Throughput: 0: 3728.9. Samples: 34529978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:48,968][134211] Avg episode reward: [(0, '7.035')] [2025-01-04 00:31:50,145][134294] Updated weights for policy 0, policy_version 44314 (0.0014) [2025-01-04 00:31:52,170][134294] Updated weights for policy 0, policy_version 44324 (0.0013) [2025-01-04 00:31:53,968][134211] Fps is (10 sec: 19251.4, 60 sec: 15564.8, 300 sec: 15176.0). Total num frames: 181587968. Throughput: 0: 3713.6. Samples: 34561128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:31:53,968][134211] Avg episode reward: [(0, '7.347')] [2025-01-04 00:31:54,179][134294] Updated weights for policy 0, policy_version 44334 (0.0018) [2025-01-04 00:31:57,328][134294] Updated weights for policy 0, policy_version 44344 (0.0025) [2025-01-04 00:31:58,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15428.2, 300 sec: 15009.4). Total num frames: 181649408. Throughput: 0: 3551.7. Samples: 34583910. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:31:58,968][134211] Avg episode reward: [(0, '6.299')] [2025-01-04 00:32:00,630][134294] Updated weights for policy 0, policy_version 44354 (0.0029) [2025-01-04 00:32:03,793][134294] Updated weights for policy 0, policy_version 44364 (0.0025) [2025-01-04 00:32:03,969][134211] Fps is (10 sec: 12696.1, 60 sec: 15086.6, 300 sec: 14939.9). Total num frames: 181714944. Throughput: 0: 3583.5. Samples: 34593442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:32:03,970][134211] Avg episode reward: [(0, '6.438')] [2025-01-04 00:32:06,968][134294] Updated weights for policy 0, policy_version 44374 (0.0025) [2025-01-04 00:32:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.7, 300 sec: 14926.1). Total num frames: 181776384. Throughput: 0: 3609.0. Samples: 34612812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:32:08,968][134211] Avg episode reward: [(0, '6.244')] [2025-01-04 00:32:10,296][134294] Updated weights for policy 0, policy_version 44384 (0.0025) [2025-01-04 00:32:12,472][134294] Updated weights for policy 0, policy_version 44394 (0.0015) [2025-01-04 00:32:13,968][134211] Fps is (10 sec: 15157.1, 60 sec: 14267.8, 300 sec: 15023.3). Total num frames: 181866496. Throughput: 0: 3713.6. Samples: 34636258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:32:13,968][134211] Avg episode reward: [(0, '7.011')] [2025-01-04 00:32:14,447][134294] Updated weights for policy 0, policy_version 44404 (0.0012) [2025-01-04 00:32:16,321][134294] Updated weights for policy 0, policy_version 44414 (0.0013) [2025-01-04 00:32:18,238][134294] Updated weights for policy 0, policy_version 44424 (0.0015) [2025-01-04 00:32:18,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15087.0, 300 sec: 15148.3). Total num frames: 181972992. Throughput: 0: 3849.3. Samples: 34652470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:32:18,968][134211] Avg episode reward: [(0, '6.751')] [2025-01-04 00:32:21,004][134294] Updated weights for policy 0, policy_version 44434 (0.0024) [2025-01-04 00:32:23,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15018.6, 300 sec: 15120.5). Total num frames: 182030336. Throughput: 0: 3941.5. Samples: 34676824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:23,969][134211] Avg episode reward: [(0, '6.110')] [2025-01-04 00:32:24,787][134294] Updated weights for policy 0, policy_version 44444 (0.0032) [2025-01-04 00:32:28,362][134294] Updated weights for policy 0, policy_version 44454 (0.0028) [2025-01-04 00:32:28,968][134211] Fps is (10 sec: 11468.6, 60 sec: 14950.4, 300 sec: 15078.8). Total num frames: 182087680. Throughput: 0: 3862.9. Samples: 34693564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:28,968][134211] Avg episode reward: [(0, '7.410')] [2025-01-04 00:32:32,032][134294] Updated weights for policy 0, policy_version 44464 (0.0030) [2025-01-04 00:32:33,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15018.7, 300 sec: 14967.8). Total num frames: 182161408. Throughput: 0: 3822.8. Samples: 34702002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:33,968][134211] Avg episode reward: [(0, '6.738')] [2025-01-04 00:32:34,157][134294] Updated weights for policy 0, policy_version 44474 (0.0013) [2025-01-04 00:32:36,000][134294] Updated weights for policy 0, policy_version 44484 (0.0013) [2025-01-04 00:32:38,623][134294] Updated weights for policy 0, policy_version 44494 (0.0023) [2025-01-04 00:32:38,968][134211] Fps is (10 sec: 15974.4, 60 sec: 15360.0, 300 sec: 14953.9). Total num frames: 182247424. Throughput: 0: 3754.0. Samples: 34730060. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:38,968][134211] Avg episode reward: [(0, '6.820')] [2025-01-04 00:32:41,813][134294] Updated weights for policy 0, policy_version 44504 (0.0028) [2025-01-04 00:32:43,968][134211] Fps is (10 sec: 15154.1, 60 sec: 15291.6, 300 sec: 14953.8). Total num frames: 182312960. Throughput: 0: 3683.5. Samples: 34749668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:43,969][134211] Avg episode reward: [(0, '7.273')] [2025-01-04 00:32:45,003][134294] Updated weights for policy 0, policy_version 44514 (0.0025) [2025-01-04 00:32:48,024][134294] Updated weights for policy 0, policy_version 44524 (0.0022) [2025-01-04 00:32:48,968][134211] Fps is (10 sec: 13515.9, 60 sec: 15018.4, 300 sec: 14967.7). Total num frames: 182382592. Throughput: 0: 3691.7. Samples: 34759566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:32:48,969][134211] Avg episode reward: [(0, '7.173')] [2025-01-04 00:32:50,975][134294] Updated weights for policy 0, policy_version 44534 (0.0026) [2025-01-04 00:32:53,904][134294] Updated weights for policy 0, policy_version 44544 (0.0024) [2025-01-04 00:32:53,969][134211] Fps is (10 sec: 13925.9, 60 sec: 14404.0, 300 sec: 14981.6). Total num frames: 182452224. Throughput: 0: 3721.1. Samples: 34780266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:32:53,969][134211] Avg episode reward: [(0, '6.483')] [2025-01-04 00:32:56,885][134294] Updated weights for policy 0, policy_version 44554 (0.0026) [2025-01-04 00:32:58,910][134294] Updated weights for policy 0, policy_version 44564 (0.0014) [2025-01-04 00:32:58,968][134211] Fps is (10 sec: 15156.5, 60 sec: 14745.7, 300 sec: 15023.3). Total num frames: 182534144. Throughput: 0: 3708.3. Samples: 34803132. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:32:58,968][134211] Avg episode reward: [(0, '5.887')] [2025-01-04 00:33:01,663][134294] Updated weights for policy 0, policy_version 44574 (0.0026) [2025-01-04 00:33:03,968][134211] Fps is (10 sec: 15156.7, 60 sec: 14814.2, 300 sec: 15037.2). Total num frames: 182603776. Throughput: 0: 3614.1. Samples: 34815106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:33:03,968][134211] Avg episode reward: [(0, '6.462')] [2025-01-04 00:33:04,851][134294] Updated weights for policy 0, policy_version 44584 (0.0026) [2025-01-04 00:33:07,579][134294] Updated weights for policy 0, policy_version 44594 (0.0020) [2025-01-04 00:33:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15155.2, 300 sec: 15009.4). Total num frames: 182685696. Throughput: 0: 3523.8. Samples: 34835392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:33:08,968][134211] Avg episode reward: [(0, '6.770')] [2025-01-04 00:33:09,470][134294] Updated weights for policy 0, policy_version 44604 (0.0013) [2025-01-04 00:33:11,453][134294] Updated weights for policy 0, policy_version 44614 (0.0012) [2025-01-04 00:33:13,393][134294] Updated weights for policy 0, policy_version 44624 (0.0013) [2025-01-04 00:33:13,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15360.0, 300 sec: 14995.5). Total num frames: 182788096. Throughput: 0: 3852.2. Samples: 34866910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:33:13,968][134211] Avg episode reward: [(0, '6.995')] [2025-01-04 00:33:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044626_182788096.pth... 
[2025-01-04 00:33:14,030][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043737_179146752.pth [2025-01-04 00:33:16,053][134294] Updated weights for policy 0, policy_version 44634 (0.0020) [2025-01-04 00:33:18,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14745.6, 300 sec: 14926.1). Total num frames: 182857728. Throughput: 0: 3925.2. Samples: 34878636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:18,968][134211] Avg episode reward: [(0, '7.389')] [2025-01-04 00:33:19,330][134294] Updated weights for policy 0, policy_version 44644 (0.0028) [2025-01-04 00:33:22,525][134294] Updated weights for policy 0, policy_version 44654 (0.0025) [2025-01-04 00:33:23,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14813.9, 300 sec: 14940.0). Total num frames: 182919168. Throughput: 0: 3728.4. Samples: 34897840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:23,969][134211] Avg episode reward: [(0, '6.462')] [2025-01-04 00:33:25,582][134294] Updated weights for policy 0, policy_version 44664 (0.0027) [2025-01-04 00:33:28,586][134294] Updated weights for policy 0, policy_version 44674 (0.0026) [2025-01-04 00:33:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 182988800. Throughput: 0: 3743.7. Samples: 34918132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:28,968][134211] Avg episode reward: [(0, '6.583')] [2025-01-04 00:33:31,539][134294] Updated weights for policy 0, policy_version 44684 (0.0024) [2025-01-04 00:33:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 183054336. Throughput: 0: 3748.2. Samples: 34928234. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:33,968][134211] Avg episode reward: [(0, '6.560')] [2025-01-04 00:33:34,743][134294] Updated weights for policy 0, policy_version 44694 (0.0024) [2025-01-04 00:33:37,818][134294] Updated weights for policy 0, policy_version 44704 (0.0025) [2025-01-04 00:33:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14940.0). Total num frames: 183119872. Throughput: 0: 3729.9. Samples: 34948110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:38,969][134211] Avg episode reward: [(0, '7.419')] [2025-01-04 00:33:40,437][134294] Updated weights for policy 0, policy_version 44714 (0.0021) [2025-01-04 00:33:42,340][134294] Updated weights for policy 0, policy_version 44724 (0.0013) [2025-01-04 00:33:43,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15155.4, 300 sec: 15078.8). Total num frames: 183222272. Throughput: 0: 3824.1. Samples: 34975218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:43,968][134211] Avg episode reward: [(0, '6.711')] [2025-01-04 00:33:44,285][134294] Updated weights for policy 0, policy_version 44734 (0.0015) [2025-01-04 00:33:47,188][134294] Updated weights for policy 0, policy_version 44744 (0.0026) [2025-01-04 00:33:48,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15155.4, 300 sec: 15079.0). Total num frames: 183291904. Throughput: 0: 3835.4. Samples: 34987700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:48,968][134211] Avg episode reward: [(0, '6.409')] [2025-01-04 00:33:50,319][134294] Updated weights for policy 0, policy_version 44754 (0.0025) [2025-01-04 00:33:53,373][134294] Updated weights for policy 0, policy_version 44764 (0.0024) [2025-01-04 00:33:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15087.2, 300 sec: 14981.6). Total num frames: 183357440. Throughput: 0: 3826.0. Samples: 35007564. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:53,968][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 00:33:55,924][134294] Updated weights for policy 0, policy_version 44774 (0.0020) [2025-01-04 00:33:57,847][134294] Updated weights for policy 0, policy_version 44784 (0.0014) [2025-01-04 00:33:58,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15428.3, 300 sec: 15092.7). Total num frames: 183459840. Throughput: 0: 3713.0. Samples: 35033996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:33:58,968][134211] Avg episode reward: [(0, '6.374')] [2025-01-04 00:33:59,710][134294] Updated weights for policy 0, policy_version 44794 (0.0013) [2025-01-04 00:34:01,651][134294] Updated weights for policy 0, policy_version 44804 (0.0013) [2025-01-04 00:34:03,733][134294] Updated weights for policy 0, policy_version 44814 (0.0016) [2025-01-04 00:34:03,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15906.0, 300 sec: 15189.9). Total num frames: 183558144. Throughput: 0: 3813.5. Samples: 35050246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:03,969][134211] Avg episode reward: [(0, '7.546')] [2025-01-04 00:34:07,114][134294] Updated weights for policy 0, policy_version 44824 (0.0028) [2025-01-04 00:34:08,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15564.8, 300 sec: 15162.2). Total num frames: 183619584. Throughput: 0: 3897.4. Samples: 35073220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:08,968][134211] Avg episode reward: [(0, '6.403')] [2025-01-04 00:34:10,293][134294] Updated weights for policy 0, policy_version 44834 (0.0030) [2025-01-04 00:34:13,530][134294] Updated weights for policy 0, policy_version 44844 (0.0027) [2025-01-04 00:34:13,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14950.4, 300 sec: 15106.6). Total num frames: 183685120. Throughput: 0: 3870.8. Samples: 35092318. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:13,968][134211] Avg episode reward: [(0, '7.051')] [2025-01-04 00:34:16,614][134294] Updated weights for policy 0, policy_version 44854 (0.0026) [2025-01-04 00:34:18,968][134211] Fps is (10 sec: 13106.4, 60 sec: 14882.0, 300 sec: 15064.9). Total num frames: 183750656. Throughput: 0: 3866.7. Samples: 35102238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:18,969][134211] Avg episode reward: [(0, '7.350')] [2025-01-04 00:34:19,781][134294] Updated weights for policy 0, policy_version 44864 (0.0027) [2025-01-04 00:34:23,205][134294] Updated weights for policy 0, policy_version 44874 (0.0026) [2025-01-04 00:34:23,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14882.1, 300 sec: 14898.3). Total num frames: 183812096. Throughput: 0: 3848.5. Samples: 35121292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:23,968][134211] Avg episode reward: [(0, '6.417')] [2025-01-04 00:34:26,284][134294] Updated weights for policy 0, policy_version 44884 (0.0021) [2025-01-04 00:34:28,311][134294] Updated weights for policy 0, policy_version 44894 (0.0012) [2025-01-04 00:34:28,968][134211] Fps is (10 sec: 14746.6, 60 sec: 15155.3, 300 sec: 14926.1). Total num frames: 183898112. Throughput: 0: 3748.7. Samples: 35143910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:28,968][134211] Avg episode reward: [(0, '7.087')] [2025-01-04 00:34:30,265][134294] Updated weights for policy 0, policy_version 44904 (0.0014) [2025-01-04 00:34:32,134][134294] Updated weights for policy 0, policy_version 44914 (0.0014) [2025-01-04 00:34:33,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15837.9, 300 sec: 15078.8). Total num frames: 184004608. Throughput: 0: 3829.4. Samples: 35160024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:33,968][134211] Avg episode reward: [(0, '6.472')] [2025-01-04 00:34:34,050][134294] Updated weights for policy 0, policy_version 44924 (0.0014) [2025-01-04 00:34:36,312][134294] Updated weights for policy 0, policy_version 44934 (0.0018) [2025-01-04 00:34:38,968][134211] Fps is (10 sec: 18431.9, 60 sec: 16042.7, 300 sec: 15106.6). Total num frames: 184082432. Throughput: 0: 4015.1. Samples: 35188244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:38,968][134211] Avg episode reward: [(0, '7.090')] [2025-01-04 00:34:39,665][134294] Updated weights for policy 0, policy_version 44944 (0.0024) [2025-01-04 00:34:42,832][134294] Updated weights for policy 0, policy_version 44954 (0.0027) [2025-01-04 00:34:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15360.0, 300 sec: 15092.7). Total num frames: 184143872. Throughput: 0: 3846.7. Samples: 35207100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:43,968][134211] Avg episode reward: [(0, '6.299')] [2025-01-04 00:34:45,975][134294] Updated weights for policy 0, policy_version 44964 (0.0027) [2025-01-04 00:34:48,914][134294] Updated weights for policy 0, policy_version 44974 (0.0023) [2025-01-04 00:34:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.0, 300 sec: 15106.6). Total num frames: 184213504. Throughput: 0: 3710.5. Samples: 35217216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:48,968][134211] Avg episode reward: [(0, '7.251')] [2025-01-04 00:34:51,966][134294] Updated weights for policy 0, policy_version 44984 (0.0026) [2025-01-04 00:34:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15360.0, 300 sec: 15092.7). Total num frames: 184279040. Throughput: 0: 3652.6. Samples: 35237586. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:53,968][134211] Avg episode reward: [(0, '6.620')] [2025-01-04 00:34:55,025][134294] Updated weights for policy 0, policy_version 44994 (0.0026) [2025-01-04 00:34:57,915][134294] Updated weights for policy 0, policy_version 45004 (0.0024) [2025-01-04 00:34:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 15078.8). Total num frames: 184348672. Throughput: 0: 3686.6. Samples: 35258214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:34:58,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 00:35:01,019][134294] Updated weights for policy 0, policy_version 45014 (0.0026) [2025-01-04 00:35:03,969][134211] Fps is (10 sec: 13515.6, 60 sec: 14267.6, 300 sec: 14926.0). Total num frames: 184414208. Throughput: 0: 3687.2. Samples: 35268164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:03,969][134211] Avg episode reward: [(0, '6.116')] [2025-01-04 00:35:04,220][134294] Updated weights for policy 0, policy_version 45024 (0.0027) [2025-01-04 00:35:06,954][134294] Updated weights for policy 0, policy_version 45034 (0.0022) [2025-01-04 00:35:08,847][134294] Updated weights for policy 0, policy_version 45044 (0.0014) [2025-01-04 00:35:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14677.3, 300 sec: 14953.9). Total num frames: 184500224. Throughput: 0: 3733.0. Samples: 35289276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:08,969][134211] Avg episode reward: [(0, '7.222')] [2025-01-04 00:35:10,729][134294] Updated weights for policy 0, policy_version 45054 (0.0013) [2025-01-04 00:35:12,632][134294] Updated weights for policy 0, policy_version 45064 (0.0013) [2025-01-04 00:35:13,968][134211] Fps is (10 sec: 19662.7, 60 sec: 15428.3, 300 sec: 15120.5). Total num frames: 184610816. Throughput: 0: 3954.1. Samples: 35321844. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:13,968][134211] Avg episode reward: [(0, '6.533')] [2025-01-04 00:35:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045071_184610816.pth... [2025-01-04 00:35:14,029][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044192_181010432.pth [2025-01-04 00:35:14,641][134294] Updated weights for policy 0, policy_version 45074 (0.0016) [2025-01-04 00:35:17,790][134294] Updated weights for policy 0, policy_version 45084 (0.0031) [2025-01-04 00:35:18,970][134211] Fps is (10 sec: 17609.3, 60 sec: 15427.9, 300 sec: 15148.2). Total num frames: 184676352. Throughput: 0: 3867.3. Samples: 35334062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:18,970][134211] Avg episode reward: [(0, '6.343')] [2025-01-04 00:35:20,819][134294] Updated weights for policy 0, policy_version 45094 (0.0025) [2025-01-04 00:35:23,938][134294] Updated weights for policy 0, policy_version 45104 (0.0025) [2025-01-04 00:35:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15564.8, 300 sec: 15134.4). Total num frames: 184745984. Throughput: 0: 3680.3. Samples: 35353858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:23,968][134211] Avg episode reward: [(0, '6.742')] [2025-01-04 00:35:27,034][134294] Updated weights for policy 0, policy_version 45114 (0.0026) [2025-01-04 00:35:28,968][134211] Fps is (10 sec: 13519.6, 60 sec: 15223.4, 300 sec: 14981.6). Total num frames: 184811520. Throughput: 0: 3704.9. Samples: 35373822. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:28,968][134211] Avg episode reward: [(0, '5.997')] [2025-01-04 00:35:30,029][134294] Updated weights for policy 0, policy_version 45124 (0.0026) [2025-01-04 00:35:33,025][134294] Updated weights for policy 0, policy_version 45134 (0.0027) [2025-01-04 00:35:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14981.6). Total num frames: 184877056. Throughput: 0: 3711.1. Samples: 35384214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:33,968][134211] Avg episode reward: [(0, '6.402')] [2025-01-04 00:35:35,988][134294] Updated weights for policy 0, policy_version 45144 (0.0027) [2025-01-04 00:35:38,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14404.1, 300 sec: 14995.5). Total num frames: 184946688. Throughput: 0: 3715.7. Samples: 35404794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:35:38,969][134211] Avg episode reward: [(0, '6.672')] [2025-01-04 00:35:39,036][134294] Updated weights for policy 0, policy_version 45154 (0.0026) [2025-01-04 00:35:41,354][134294] Updated weights for policy 0, policy_version 45164 (0.0018) [2025-01-04 00:35:43,728][134294] Updated weights for policy 0, policy_version 45174 (0.0019) [2025-01-04 00:35:43,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14882.1, 300 sec: 15078.8). Total num frames: 185036800. Throughput: 0: 3808.8. Samples: 35429612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:43,968][134211] Avg episode reward: [(0, '7.617')] [2025-01-04 00:35:46,729][134294] Updated weights for policy 0, policy_version 45184 (0.0025) [2025-01-04 00:35:48,968][134211] Fps is (10 sec: 15565.6, 60 sec: 14813.9, 300 sec: 15078.8). Total num frames: 185102336. Throughput: 0: 3810.6. Samples: 35439638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:48,968][134211] Avg episode reward: [(0, '6.494')] [2025-01-04 00:35:49,919][134294] Updated weights for policy 0, policy_version 45194 (0.0024) [2025-01-04 00:35:51,947][134294] Updated weights for policy 0, policy_version 45204 (0.0013) [2025-01-04 00:35:53,911][134294] Updated weights for policy 0, policy_version 45214 (0.0013) [2025-01-04 00:35:53,968][134211] Fps is (10 sec: 15974.5, 60 sec: 15291.8, 300 sec: 15162.1). Total num frames: 185196544. Throughput: 0: 3862.5. Samples: 35463086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:53,968][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 00:35:55,795][134294] Updated weights for policy 0, policy_version 45224 (0.0013) [2025-01-04 00:35:57,652][134294] Updated weights for policy 0, policy_version 45234 (0.0013) [2025-01-04 00:35:58,967][134211] Fps is (10 sec: 20070.7, 60 sec: 15906.2, 300 sec: 15231.6). Total num frames: 185303040. Throughput: 0: 3860.0. Samples: 35495544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:35:58,968][134211] Avg episode reward: [(0, '6.716')] [2025-01-04 00:35:59,586][134294] Updated weights for policy 0, policy_version 45244 (0.0014) [2025-01-04 00:36:02,523][134294] Updated weights for policy 0, policy_version 45254 (0.0030) [2025-01-04 00:36:03,968][134211] Fps is (10 sec: 18022.3, 60 sec: 16042.9, 300 sec: 15106.6). Total num frames: 185376768. Throughput: 0: 3891.1. Samples: 35509154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:03,968][134211] Avg episode reward: [(0, '6.949')] [2025-01-04 00:36:05,790][134294] Updated weights for policy 0, policy_version 45264 (0.0023) [2025-01-04 00:36:08,846][134294] Updated weights for policy 0, policy_version 45274 (0.0028) [2025-01-04 00:36:08,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15701.3, 300 sec: 15023.3). Total num frames: 185442304. Throughput: 0: 3877.5. Samples: 35528348. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:08,968][134211] Avg episode reward: [(0, '7.064')] [2025-01-04 00:36:11,951][134294] Updated weights for policy 0, policy_version 45284 (0.0027) [2025-01-04 00:36:13,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14950.3, 300 sec: 15051.1). Total num frames: 185507840. Throughput: 0: 3874.1. Samples: 35548158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:13,969][134211] Avg episode reward: [(0, '7.001')] [2025-01-04 00:36:15,030][134294] Updated weights for policy 0, policy_version 45294 (0.0025) [2025-01-04 00:36:18,030][134294] Updated weights for policy 0, policy_version 45304 (0.0026) [2025-01-04 00:36:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14950.9, 300 sec: 15064.9). Total num frames: 185573376. Throughput: 0: 3860.5. Samples: 35557936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:18,968][134211] Avg episode reward: [(0, '6.634')] [2025-01-04 00:36:21,232][134294] Updated weights for policy 0, policy_version 45314 (0.0020) [2025-01-04 00:36:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14813.8, 300 sec: 15064.9). Total num frames: 185634816. Throughput: 0: 3838.3. Samples: 35577514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:23,968][134211] Avg episode reward: [(0, '6.877')] [2025-01-04 00:36:24,780][134294] Updated weights for policy 0, policy_version 45324 (0.0027) [2025-01-04 00:36:28,149][134294] Updated weights for policy 0, policy_version 45334 (0.0022) [2025-01-04 00:36:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 15051.1). Total num frames: 185700352. Throughput: 0: 3682.9. Samples: 35595344. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:28,968][134211] Avg episode reward: [(0, '6.318')] [2025-01-04 00:36:30,211][134294] Updated weights for policy 0, policy_version 45344 (0.0015) [2025-01-04 00:36:32,072][134294] Updated weights for policy 0, policy_version 45354 (0.0014) [2025-01-04 00:36:33,940][134294] Updated weights for policy 0, policy_version 45364 (0.0013) [2025-01-04 00:36:33,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15564.8, 300 sec: 15203.8). Total num frames: 185810944. Throughput: 0: 3803.5. Samples: 35610796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:36:33,968][134211] Avg episode reward: [(0, '6.575')] [2025-01-04 00:36:35,967][134294] Updated weights for policy 0, policy_version 45374 (0.0016) [2025-01-04 00:36:38,968][134211] Fps is (10 sec: 18841.4, 60 sec: 15701.4, 300 sec: 15231.6). Total num frames: 185888768. Throughput: 0: 3947.2. Samples: 35640710. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:36:38,968][134211] Avg episode reward: [(0, '6.744')] [2025-01-04 00:36:39,012][134294] Updated weights for policy 0, policy_version 45384 (0.0026) [2025-01-04 00:36:42,260][134294] Updated weights for policy 0, policy_version 45394 (0.0030) [2025-01-04 00:36:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15291.7, 300 sec: 15162.1). Total num frames: 185954304. Throughput: 0: 3648.2. Samples: 35659716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:36:43,968][134211] Avg episode reward: [(0, '6.327')] [2025-01-04 00:36:45,365][134294] Updated weights for policy 0, policy_version 45404 (0.0026) [2025-01-04 00:36:48,346][134294] Updated weights for policy 0, policy_version 45414 (0.0025) [2025-01-04 00:36:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15037.2). Total num frames: 186023936. Throughput: 0: 3570.5. Samples: 35669826. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:36:48,968][134211] Avg episode reward: [(0, '6.401')] [2025-01-04 00:36:51,342][134294] Updated weights for policy 0, policy_version 45424 (0.0026) [2025-01-04 00:36:53,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14881.8, 300 sec: 15051.0). Total num frames: 186089472. Throughput: 0: 3594.8. Samples: 35690118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:36:53,970][134211] Avg episode reward: [(0, '6.667')] [2025-01-04 00:36:54,526][134294] Updated weights for policy 0, policy_version 45434 (0.0026) [2025-01-04 00:36:57,514][134294] Updated weights for policy 0, policy_version 45444 (0.0024) [2025-01-04 00:36:58,968][134211] Fps is (10 sec: 13106.2, 60 sec: 14199.2, 300 sec: 15051.1). Total num frames: 186155008. Throughput: 0: 3595.3. Samples: 35709948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:36:58,969][134211] Avg episode reward: [(0, '6.748')] [2025-01-04 00:37:00,257][134294] Updated weights for policy 0, policy_version 45454 (0.0021) [2025-01-04 00:37:02,325][134294] Updated weights for policy 0, policy_version 45464 (0.0014) [2025-01-04 00:37:03,967][134211] Fps is (10 sec: 16386.2, 60 sec: 14609.1, 300 sec: 15176.0). Total num frames: 186253312. Throughput: 0: 3662.2. Samples: 35722732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:03,968][134211] Avg episode reward: [(0, '7.036')] [2025-01-04 00:37:04,210][134294] Updated weights for policy 0, policy_version 45474 (0.0015) [2025-01-04 00:37:06,148][134294] Updated weights for policy 0, policy_version 45484 (0.0013) [2025-01-04 00:37:08,025][134294] Updated weights for policy 0, policy_version 45494 (0.0014) [2025-01-04 00:37:08,968][134211] Fps is (10 sec: 20481.9, 60 sec: 15291.8, 300 sec: 15231.6). Total num frames: 186359808. Throughput: 0: 3933.4. Samples: 35754518. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:08,968][134211] Avg episode reward: [(0, '7.177')] [2025-01-04 00:37:10,390][134294] Updated weights for policy 0, policy_version 45504 (0.0019) [2025-01-04 00:37:13,468][134294] Updated weights for policy 0, policy_version 45514 (0.0028) [2025-01-04 00:37:13,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15360.1, 300 sec: 15106.6). Total num frames: 186429440. Throughput: 0: 4070.6. Samples: 35778520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:13,968][134211] Avg episode reward: [(0, '6.223')] [2025-01-04 00:37:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045515_186429440.pth... [2025-01-04 00:37:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044626_182788096.pth [2025-01-04 00:37:16,828][134294] Updated weights for policy 0, policy_version 45524 (0.0027) [2025-01-04 00:37:18,968][134211] Fps is (10 sec: 13106.0, 60 sec: 15291.6, 300 sec: 15120.5). Total num frames: 186490880. Throughput: 0: 3933.3. Samples: 35787798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:18,969][134211] Avg episode reward: [(0, '6.371')] [2025-01-04 00:37:19,960][134294] Updated weights for policy 0, policy_version 45534 (0.0027) [2025-01-04 00:37:23,155][134294] Updated weights for policy 0, policy_version 45544 (0.0025) [2025-01-04 00:37:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15360.0, 300 sec: 15148.3). Total num frames: 186556416. Throughput: 0: 3703.8. Samples: 35807380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:23,968][134211] Avg episode reward: [(0, '6.465')] [2025-01-04 00:37:26,162][134294] Updated weights for policy 0, policy_version 45554 (0.0026) [2025-01-04 00:37:28,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15359.9, 300 sec: 15120.5). Total num frames: 186621952. 
Throughput: 0: 3716.6. Samples: 35826962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:28,969][134211] Avg episode reward: [(0, '7.194')] [2025-01-04 00:37:29,359][134294] Updated weights for policy 0, policy_version 45564 (0.0024) [2025-01-04 00:37:32,330][134294] Updated weights for policy 0, policy_version 45574 (0.0025) [2025-01-04 00:37:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.0, 300 sec: 15051.1). Total num frames: 186687488. Throughput: 0: 3716.9. Samples: 35837088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:37:33,968][134211] Avg episode reward: [(0, '6.430')] [2025-01-04 00:37:35,461][134294] Updated weights for policy 0, policy_version 45584 (0.0026) [2025-01-04 00:37:37,687][134294] Updated weights for policy 0, policy_version 45594 (0.0017) [2025-01-04 00:37:38,968][134211] Fps is (10 sec: 15565.6, 60 sec: 14813.9, 300 sec: 15134.4). Total num frames: 186777600. Throughput: 0: 3750.3. Samples: 35858878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:37:38,968][134211] Avg episode reward: [(0, '6.947')] [2025-01-04 00:37:39,570][134294] Updated weights for policy 0, policy_version 45604 (0.0014) [2025-01-04 00:37:41,506][134294] Updated weights for policy 0, policy_version 45614 (0.0013) [2025-01-04 00:37:43,424][134294] Updated weights for policy 0, policy_version 45624 (0.0012) [2025-01-04 00:37:43,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15496.6, 300 sec: 15259.4). Total num frames: 186884096. Throughput: 0: 4024.1. Samples: 35891028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:37:43,968][134211] Avg episode reward: [(0, '6.944')] [2025-01-04 00:37:45,818][134294] Updated weights for policy 0, policy_version 45634 (0.0019) [2025-01-04 00:37:48,962][134294] Updated weights for policy 0, policy_version 45644 (0.0028) [2025-01-04 00:37:48,970][134211] Fps is (10 sec: 18018.3, 60 sec: 15564.3, 300 sec: 15273.2). Total num frames: 186957824. Throughput: 0: 4017.3. 
Samples: 35903518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:37:48,970][134211] Avg episode reward: [(0, '7.049')] [2025-01-04 00:37:52,042][134294] Updated weights for policy 0, policy_version 45654 (0.0026) [2025-01-04 00:37:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15496.9, 300 sec: 15203.8). Total num frames: 187019264. Throughput: 0: 3753.0. Samples: 35923402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:37:53,968][134211] Avg episode reward: [(0, '6.674')] [2025-01-04 00:37:55,248][134294] Updated weights for policy 0, policy_version 45664 (0.0025) [2025-01-04 00:37:58,332][134294] Updated weights for policy 0, policy_version 45674 (0.0026) [2025-01-04 00:37:58,968][134211] Fps is (10 sec: 12699.8, 60 sec: 15496.6, 300 sec: 15189.9). Total num frames: 187084800. Throughput: 0: 3652.4. Samples: 35942880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:37:58,969][134211] Avg episode reward: [(0, '6.496')] [2025-01-04 00:38:01,646][134294] Updated weights for policy 0, policy_version 45684 (0.0029) [2025-01-04 00:38:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14950.3, 300 sec: 15134.4). Total num frames: 187150336. Throughput: 0: 3654.5. Samples: 35952250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:03,968][134211] Avg episode reward: [(0, '7.194')] [2025-01-04 00:38:04,861][134294] Updated weights for policy 0, policy_version 45694 (0.0027) [2025-01-04 00:38:07,494][134294] Updated weights for policy 0, policy_version 45704 (0.0022) [2025-01-04 00:38:08,971][134211] Fps is (10 sec: 14741.7, 60 sec: 14540.0, 300 sec: 15064.8). Total num frames: 187232256. Throughput: 0: 3666.4. Samples: 35972380. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:08,971][134211] Avg episode reward: [(0, '6.342')] [2025-01-04 00:38:09,397][134294] Updated weights for policy 0, policy_version 45714 (0.0013) [2025-01-04 00:38:11,321][134294] Updated weights for policy 0, policy_version 45724 (0.0014) [2025-01-04 00:38:13,968][134211] Fps is (10 sec: 17203.2, 60 sec: 14882.1, 300 sec: 15134.4). Total num frames: 187322368. Throughput: 0: 3884.0. Samples: 36001740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:13,968][134211] Avg episode reward: [(0, '7.389')] [2025-01-04 00:38:14,068][134294] Updated weights for policy 0, policy_version 45734 (0.0023) [2025-01-04 00:38:17,335][134294] Updated weights for policy 0, policy_version 45744 (0.0026) [2025-01-04 00:38:18,968][134211] Fps is (10 sec: 15568.9, 60 sec: 14950.5, 300 sec: 15148.2). Total num frames: 187387904. Throughput: 0: 3873.7. Samples: 36011406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:18,969][134211] Avg episode reward: [(0, '7.359')] [2025-01-04 00:38:20,287][134294] Updated weights for policy 0, policy_version 45754 (0.0026) [2025-01-04 00:38:23,295][134294] Updated weights for policy 0, policy_version 45764 (0.0023) [2025-01-04 00:38:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14950.4, 300 sec: 15134.4). Total num frames: 187453440. Throughput: 0: 3844.6. Samples: 36031886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:23,968][134211] Avg episode reward: [(0, '7.198')] [2025-01-04 00:38:26,604][134294] Updated weights for policy 0, policy_version 45774 (0.0024) [2025-01-04 00:38:28,722][134294] Updated weights for policy 0, policy_version 45784 (0.0013) [2025-01-04 00:38:28,968][134211] Fps is (10 sec: 14746.2, 60 sec: 15223.6, 300 sec: 15189.9). Total num frames: 187535360. Throughput: 0: 3608.2. Samples: 36053396. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:38:28,968][134211] Avg episode reward: [(0, '6.746')] [2025-01-04 00:38:30,781][134294] Updated weights for policy 0, policy_version 45794 (0.0014) [2025-01-04 00:38:32,646][134294] Updated weights for policy 0, policy_version 45804 (0.0013) [2025-01-04 00:38:33,967][134211] Fps is (10 sec: 18842.0, 60 sec: 15906.2, 300 sec: 15328.8). Total num frames: 187641856. Throughput: 0: 3664.9. Samples: 36068430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:33,968][134211] Avg episode reward: [(0, '5.992')] [2025-01-04 00:38:34,524][134294] Updated weights for policy 0, policy_version 45814 (0.0014) [2025-01-04 00:38:36,409][134294] Updated weights for policy 0, policy_version 45824 (0.0014) [2025-01-04 00:38:38,878][134294] Updated weights for policy 0, policy_version 45834 (0.0019) [2025-01-04 00:38:38,968][134211] Fps is (10 sec: 20070.0, 60 sec: 15974.3, 300 sec: 15301.0). Total num frames: 187736064. Throughput: 0: 3947.6. Samples: 36101046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:38,968][134211] Avg episode reward: [(0, '6.359')] [2025-01-04 00:38:42,195][134294] Updated weights for policy 0, policy_version 45844 (0.0028) [2025-01-04 00:38:43,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15223.4, 300 sec: 15273.2). Total num frames: 187797504. Throughput: 0: 3944.3. Samples: 36120372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:43,968][134211] Avg episode reward: [(0, '7.456')] [2025-01-04 00:38:45,436][134294] Updated weights for policy 0, policy_version 45854 (0.0025) [2025-01-04 00:38:48,502][134294] Updated weights for policy 0, policy_version 45864 (0.0024) [2025-01-04 00:38:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15087.5, 300 sec: 15273.2). Total num frames: 187863040. Throughput: 0: 3958.4. Samples: 36130376. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:48,968][134211] Avg episode reward: [(0, '6.571')] [2025-01-04 00:38:51,403][134294] Updated weights for policy 0, policy_version 45874 (0.0025) [2025-01-04 00:38:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.1, 300 sec: 15148.2). Total num frames: 187928576. Throughput: 0: 3954.8. Samples: 36150336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:53,968][134211] Avg episode reward: [(0, '6.809')] [2025-01-04 00:38:54,858][134294] Updated weights for policy 0, policy_version 45884 (0.0026) [2025-01-04 00:38:58,105][134294] Updated weights for policy 0, policy_version 45894 (0.0026) [2025-01-04 00:38:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15087.0, 300 sec: 15023.3). Total num frames: 187990016. Throughput: 0: 3712.7. Samples: 36168810. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:38:58,968][134211] Avg episode reward: [(0, '6.755')] [2025-01-04 00:39:01,426][134294] Updated weights for policy 0, policy_version 45904 (0.0025) [2025-01-04 00:39:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15018.6, 300 sec: 15023.3). Total num frames: 188051456. Throughput: 0: 3707.7. Samples: 36178250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:03,969][134211] Avg episode reward: [(0, '6.249')] [2025-01-04 00:39:04,916][134294] Updated weights for policy 0, policy_version 45914 (0.0026) [2025-01-04 00:39:07,816][134294] Updated weights for policy 0, policy_version 45924 (0.0022) [2025-01-04 00:39:08,967][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.9, 300 sec: 15051.1). Total num frames: 188125184. Throughput: 0: 3654.6. Samples: 36196342. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:08,968][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 00:39:09,756][134294] Updated weights for policy 0, policy_version 45934 (0.0014) [2025-01-04 00:39:11,716][134294] Updated weights for policy 0, policy_version 45944 (0.0013) [2025-01-04 00:39:13,598][134294] Updated weights for policy 0, policy_version 45954 (0.0013) [2025-01-04 00:39:13,968][134211] Fps is (10 sec: 18023.0, 60 sec: 15155.3, 300 sec: 15189.9). Total num frames: 188231680. Throughput: 0: 3878.0. Samples: 36227906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:13,968][134211] Avg episode reward: [(0, '6.789')] [2025-01-04 00:39:13,988][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045956_188235776.pth... [2025-01-04 00:39:14,038][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045071_184610816.pth [2025-01-04 00:39:15,490][134294] Updated weights for policy 0, policy_version 45964 (0.0013) [2025-01-04 00:39:17,395][134294] Updated weights for policy 0, policy_version 45974 (0.0013) [2025-01-04 00:39:18,968][134211] Fps is (10 sec: 20888.3, 60 sec: 15769.6, 300 sec: 15328.7). Total num frames: 188334080. Throughput: 0: 3902.7. Samples: 36244052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:18,969][134211] Avg episode reward: [(0, '6.497')] [2025-01-04 00:39:20,093][134294] Updated weights for policy 0, policy_version 45984 (0.0024) [2025-01-04 00:39:23,271][134294] Updated weights for policy 0, policy_version 45994 (0.0028) [2025-01-04 00:39:23,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15769.6, 300 sec: 15259.3). Total num frames: 188399616. Throughput: 0: 3703.8. Samples: 36267718. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:23,968][134211] Avg episode reward: [(0, '6.651')] [2025-01-04 00:39:26,406][134294] Updated weights for policy 0, policy_version 46004 (0.0025) [2025-01-04 00:39:28,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15496.5, 300 sec: 15120.5). Total num frames: 188465152. Throughput: 0: 3701.2. Samples: 36286928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:28,968][134211] Avg episode reward: [(0, '6.431')] [2025-01-04 00:39:29,617][134294] Updated weights for policy 0, policy_version 46014 (0.0029) [2025-01-04 00:39:32,670][134294] Updated weights for policy 0, policy_version 46024 (0.0028) [2025-01-04 00:39:33,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14745.4, 300 sec: 15064.9). Total num frames: 188526592. Throughput: 0: 3695.8. Samples: 36296688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:33,969][134211] Avg episode reward: [(0, '6.342')] [2025-01-04 00:39:35,737][134294] Updated weights for policy 0, policy_version 46034 (0.0026) [2025-01-04 00:39:38,713][134294] Updated weights for policy 0, policy_version 46044 (0.0028) [2025-01-04 00:39:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 15092.7). Total num frames: 188596224. Throughput: 0: 3707.6. Samples: 36317176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:38,968][134211] Avg episode reward: [(0, '6.135')] [2025-01-04 00:39:41,679][134294] Updated weights for policy 0, policy_version 46054 (0.0025) [2025-01-04 00:39:43,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14472.5, 300 sec: 15092.7). Total num frames: 188665856. Throughput: 0: 3748.2. Samples: 36337478. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:43,968][134211] Avg episode reward: [(0, '6.696')] [2025-01-04 00:39:44,702][134294] Updated weights for policy 0, policy_version 46064 (0.0026) [2025-01-04 00:39:47,781][134294] Updated weights for policy 0, policy_version 46074 (0.0025) [2025-01-04 00:39:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.5, 300 sec: 15092.7). Total num frames: 188731392. Throughput: 0: 3762.5. Samples: 36347562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:48,968][134211] Avg episode reward: [(0, '7.320')] [2025-01-04 00:39:50,798][134294] Updated weights for policy 0, policy_version 46084 (0.0025) [2025-01-04 00:39:53,664][134294] Updated weights for policy 0, policy_version 46094 (0.0027) [2025-01-04 00:39:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 15092.7). Total num frames: 188801024. Throughput: 0: 3820.4. Samples: 36368262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:39:53,968][134211] Avg episode reward: [(0, '6.510')] [2025-01-04 00:39:55,788][134294] Updated weights for policy 0, policy_version 46104 (0.0015) [2025-01-04 00:39:57,586][134294] Updated weights for policy 0, policy_version 46114 (0.0015) [2025-01-04 00:39:58,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15291.7, 300 sec: 15231.6). Total num frames: 188907520. Throughput: 0: 3751.1. Samples: 36396706. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:39:58,968][134211] Avg episode reward: [(0, '6.607')] [2025-01-04 00:40:00,117][134294] Updated weights for policy 0, policy_version 46124 (0.0021) [2025-01-04 00:40:03,153][134294] Updated weights for policy 0, policy_version 46134 (0.0025) [2025-01-04 00:40:03,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15360.1, 300 sec: 15162.1). Total num frames: 188973056. Throughput: 0: 3637.5. Samples: 36407738. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:40:03,968][134211] Avg episode reward: [(0, '6.117')] [2025-01-04 00:40:06,295][134294] Updated weights for policy 0, policy_version 46144 (0.0026) [2025-01-04 00:40:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15223.4, 300 sec: 15009.4). Total num frames: 189038592. Throughput: 0: 3550.9. Samples: 36427508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:40:08,968][134211] Avg episode reward: [(0, '6.649')] [2025-01-04 00:40:09,356][134294] Updated weights for policy 0, policy_version 46154 (0.0024) [2025-01-04 00:40:11,306][134294] Updated weights for policy 0, policy_version 46164 (0.0012) [2025-01-04 00:40:13,198][134294] Updated weights for policy 0, policy_version 46174 (0.0014) [2025-01-04 00:40:13,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15155.2, 300 sec: 15134.5). Total num frames: 189140992. Throughput: 0: 3743.5. Samples: 36455384. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:40:13,968][134211] Avg episode reward: [(0, '6.929')] [2025-01-04 00:40:16,084][134294] Updated weights for policy 0, policy_version 46184 (0.0025) [2025-01-04 00:40:18,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14472.6, 300 sec: 15106.6). Total num frames: 189202432. Throughput: 0: 3758.4. Samples: 36465814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:40:18,968][134211] Avg episode reward: [(0, '7.006')] [2025-01-04 00:40:19,702][134294] Updated weights for policy 0, policy_version 46194 (0.0029) [2025-01-04 00:40:22,716][134294] Updated weights for policy 0, policy_version 46204 (0.0026) [2025-01-04 00:40:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.6, 300 sec: 15106.6). Total num frames: 189267968. Throughput: 0: 3713.0. Samples: 36484262. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:40:23,968][134211] Avg episode reward: [(0, '6.076')] [2025-01-04 00:40:25,060][134294] Updated weights for policy 0, policy_version 46214 (0.0016) [2025-01-04 00:40:27,230][134294] Updated weights for policy 0, policy_version 46224 (0.0014) [2025-01-04 00:40:28,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15018.7, 300 sec: 15217.7). Total num frames: 189366272. Throughput: 0: 3863.1. Samples: 36511316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:28,968][134211] Avg episode reward: [(0, '6.864')] [2025-01-04 00:40:29,253][134294] Updated weights for policy 0, policy_version 46234 (0.0013) [2025-01-04 00:40:31,162][134294] Updated weights for policy 0, policy_version 46244 (0.0012) [2025-01-04 00:40:33,300][134294] Updated weights for policy 0, policy_version 46254 (0.0019) [2025-01-04 00:40:33,970][134211] Fps is (10 sec: 19656.2, 60 sec: 15632.7, 300 sec: 15314.8). Total num frames: 189464576. Throughput: 0: 3990.0. Samples: 36527120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:33,971][134211] Avg episode reward: [(0, '7.058')] [2025-01-04 00:40:36,741][134294] Updated weights for policy 0, policy_version 46264 (0.0030) [2025-01-04 00:40:38,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15428.3, 300 sec: 15203.8). Total num frames: 189521920. Throughput: 0: 4014.3. Samples: 36548906. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:38,968][134211] Avg episode reward: [(0, '6.833')] [2025-01-04 00:40:40,079][134294] Updated weights for policy 0, policy_version 46274 (0.0031) [2025-01-04 00:40:43,341][134294] Updated weights for policy 0, policy_version 46284 (0.0029) [2025-01-04 00:40:43,968][134211] Fps is (10 sec: 11880.9, 60 sec: 15291.7, 300 sec: 15189.9). Total num frames: 189583360. Throughput: 0: 3799.6. Samples: 36567688. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:43,968][134211] Avg episode reward: [(0, '6.954')] [2025-01-04 00:40:46,573][134294] Updated weights for policy 0, policy_version 46294 (0.0027) [2025-01-04 00:40:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15092.7). Total num frames: 189648896. Throughput: 0: 3760.2. Samples: 36576946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:48,968][134211] Avg episode reward: [(0, '6.495')] [2025-01-04 00:40:49,706][134294] Updated weights for policy 0, policy_version 46304 (0.0027) [2025-01-04 00:40:52,969][134294] Updated weights for policy 0, policy_version 46314 (0.0026) [2025-01-04 00:40:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15223.5, 300 sec: 14953.9). Total num frames: 189714432. Throughput: 0: 3749.9. Samples: 36596252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:40:53,968][134211] Avg episode reward: [(0, '6.588')] [2025-01-04 00:40:56,096][134294] Updated weights for policy 0, policy_version 46324 (0.0027) [2025-01-04 00:40:58,556][134294] Updated weights for policy 0, policy_version 46334 (0.0015) [2025-01-04 00:40:58,967][134211] Fps is (10 sec: 14336.2, 60 sec: 14745.6, 300 sec: 14967.8). Total num frames: 189792256. Throughput: 0: 3599.3. Samples: 36617350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:40:58,968][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 00:41:01,111][134294] Updated weights for policy 0, policy_version 46344 (0.0022) [2025-01-04 00:41:03,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14813.8, 300 sec: 14981.6). Total num frames: 189861888. Throughput: 0: 3643.5. Samples: 36629774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:41:03,969][134211] Avg episode reward: [(0, '6.870')] [2025-01-04 00:41:04,226][134294] Updated weights for policy 0, policy_version 46354 (0.0026) [2025-01-04 00:41:07,112][134294] Updated weights for policy 0, policy_version 46364 (0.0024) [2025-01-04 00:41:08,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.2, 300 sec: 14995.5). Total num frames: 189931520. Throughput: 0: 3678.8. Samples: 36649810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:41:08,968][134211] Avg episode reward: [(0, '6.391')] [2025-01-04 00:41:09,660][134294] Updated weights for policy 0, policy_version 46374 (0.0017) [2025-01-04 00:41:11,557][134294] Updated weights for policy 0, policy_version 46384 (0.0014) [2025-01-04 00:41:13,436][134294] Updated weights for policy 0, policy_version 46394 (0.0014) [2025-01-04 00:41:13,967][134211] Fps is (10 sec: 17613.5, 60 sec: 14950.5, 300 sec: 15134.4). Total num frames: 190038016. Throughput: 0: 3736.9. Samples: 36679476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:41:13,968][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 00:41:13,999][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046397_190042112.pth... [2025-01-04 00:41:14,043][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045515_186429440.pth [2025-01-04 00:41:15,291][134294] Updated weights for policy 0, policy_version 46404 (0.0014) [2025-01-04 00:41:17,693][134294] Updated weights for policy 0, policy_version 46414 (0.0020) [2025-01-04 00:41:18,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 190124032. Throughput: 0: 3743.6. Samples: 36695574. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:41:18,969][134211] Avg episode reward: [(0, '7.217')] [2025-01-04 00:41:20,983][134294] Updated weights for policy 0, policy_version 46424 (0.0030) [2025-01-04 00:41:23,968][134211] Fps is (10 sec: 15154.7, 60 sec: 15359.9, 300 sec: 15217.7). Total num frames: 190189568. Throughput: 0: 3693.1. Samples: 36715096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:41:23,969][134211] Avg episode reward: [(0, '6.380')] [2025-01-04 00:41:24,222][134294] Updated weights for policy 0, policy_version 46434 (0.0025) [2025-01-04 00:41:27,357][134294] Updated weights for policy 0, policy_version 46444 (0.0025) [2025-01-04 00:41:28,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14813.8, 300 sec: 15064.9). Total num frames: 190255104. Throughput: 0: 3709.2. Samples: 36734602. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:28,969][134211] Avg episode reward: [(0, '7.185')] [2025-01-04 00:41:30,450][134294] Updated weights for policy 0, policy_version 46454 (0.0028) [2025-01-04 00:41:33,582][134294] Updated weights for policy 0, policy_version 46464 (0.0026) [2025-01-04 00:41:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14268.3, 300 sec: 15023.3). Total num frames: 190320640. Throughput: 0: 3725.3. Samples: 36744584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:33,968][134211] Avg episode reward: [(0, '6.855')] [2025-01-04 00:41:36,461][134294] Updated weights for policy 0, policy_version 46474 (0.0025) [2025-01-04 00:41:38,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14472.6, 300 sec: 15037.2). Total num frames: 190390272. Throughput: 0: 3746.5. Samples: 36764846. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:38,968][134211] Avg episode reward: [(0, '6.980')] [2025-01-04 00:41:39,675][134294] Updated weights for policy 0, policy_version 46484 (0.0027) [2025-01-04 00:41:41,742][134294] Updated weights for policy 0, policy_version 46494 (0.0017) [2025-01-04 00:41:43,686][134294] Updated weights for policy 0, policy_version 46504 (0.0013) [2025-01-04 00:41:43,968][134211] Fps is (10 sec: 16383.0, 60 sec: 15018.6, 300 sec: 15120.5). Total num frames: 190484480. Throughput: 0: 3855.7. Samples: 36790860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:43,969][134211] Avg episode reward: [(0, '6.843')] [2025-01-04 00:41:45,479][134294] Updated weights for policy 0, policy_version 46514 (0.0013) [2025-01-04 00:41:47,391][134294] Updated weights for policy 0, policy_version 46524 (0.0013) [2025-01-04 00:41:48,968][134211] Fps is (10 sec: 20480.0, 60 sec: 15769.6, 300 sec: 15273.3). Total num frames: 190595072. Throughput: 0: 3943.5. Samples: 36807230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:48,968][134211] Avg episode reward: [(0, '6.356')] [2025-01-04 00:41:49,367][134294] Updated weights for policy 0, policy_version 46534 (0.0013) [2025-01-04 00:41:52,569][134294] Updated weights for policy 0, policy_version 46544 (0.0026) [2025-01-04 00:41:53,968][134211] Fps is (10 sec: 17204.2, 60 sec: 15701.3, 300 sec: 15259.4). Total num frames: 190656512. Throughput: 0: 4077.1. Samples: 36833278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:41:53,968][134211] Avg episode reward: [(0, '6.032')] [2025-01-04 00:41:56,048][134294] Updated weights for policy 0, policy_version 46554 (0.0028) [2025-01-04 00:41:58,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15428.2, 300 sec: 15134.4). Total num frames: 190717952. Throughput: 0: 3810.3. Samples: 36850940. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:41:58,969][134211] Avg episode reward: [(0, '6.622')] [2025-01-04 00:41:59,521][134294] Updated weights for policy 0, policy_version 46564 (0.0029) [2025-01-04 00:42:02,740][134294] Updated weights for policy 0, policy_version 46574 (0.0026) [2025-01-04 00:42:03,968][134211] Fps is (10 sec: 12287.5, 60 sec: 15291.7, 300 sec: 14981.6). Total num frames: 190779392. Throughput: 0: 3652.0. Samples: 36859916. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:42:03,969][134211] Avg episode reward: [(0, '7.238')] [2025-01-04 00:42:05,792][134294] Updated weights for policy 0, policy_version 46584 (0.0028) [2025-01-04 00:42:08,783][134294] Updated weights for policy 0, policy_version 46594 (0.0025) [2025-01-04 00:42:08,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15291.7, 300 sec: 14981.6). Total num frames: 190849024. Throughput: 0: 3665.4. Samples: 36880036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:42:08,968][134211] Avg episode reward: [(0, '6.668')] [2025-01-04 00:42:10,700][134294] Updated weights for policy 0, policy_version 46604 (0.0013) [2025-01-04 00:42:12,597][134294] Updated weights for policy 0, policy_version 46614 (0.0016) [2025-01-04 00:42:13,968][134211] Fps is (10 sec: 18023.3, 60 sec: 15360.0, 300 sec: 15148.3). Total num frames: 190959616. Throughput: 0: 3879.3. Samples: 36909170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:42:13,968][134211] Avg episode reward: [(0, '6.770')] [2025-01-04 00:42:14,441][134294] Updated weights for policy 0, policy_version 46624 (0.0012) [2025-01-04 00:42:16,472][134294] Updated weights for policy 0, policy_version 46634 (0.0015) [2025-01-04 00:42:18,968][134211] Fps is (10 sec: 19660.4, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 191045632. Throughput: 0: 4016.1. Samples: 36925308. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:42:18,969][134211] Avg episode reward: [(0, '6.407')] [2025-01-04 00:42:19,448][134294] Updated weights for policy 0, policy_version 46644 (0.0027) [2025-01-04 00:42:22,546][134294] Updated weights for policy 0, policy_version 46654 (0.0027) [2025-01-04 00:42:23,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 191111168. Throughput: 0: 4018.1. Samples: 36945662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:23,968][134211] Avg episode reward: [(0, '6.746')] [2025-01-04 00:42:25,860][134294] Updated weights for policy 0, policy_version 46664 (0.0028) [2025-01-04 00:42:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15223.6, 300 sec: 15189.9). Total num frames: 191168512. Throughput: 0: 3848.3. Samples: 36964032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:28,968][134211] Avg episode reward: [(0, '6.769')] [2025-01-04 00:42:29,339][134294] Updated weights for policy 0, policy_version 46674 (0.0027) [2025-01-04 00:42:32,941][134294] Updated weights for policy 0, policy_version 46684 (0.0027) [2025-01-04 00:42:33,968][134211] Fps is (10 sec: 11468.6, 60 sec: 15086.9, 300 sec: 15078.8). Total num frames: 191225856. Throughput: 0: 3671.3. Samples: 36972440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:33,969][134211] Avg episode reward: [(0, '6.889')] [2025-01-04 00:42:36,003][134294] Updated weights for policy 0, policy_version 46694 (0.0020) [2025-01-04 00:42:38,206][134294] Updated weights for policy 0, policy_version 46704 (0.0013) [2025-01-04 00:42:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15360.0, 300 sec: 15009.4). Total num frames: 191311872. Throughput: 0: 3551.0. Samples: 36993074. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:38,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 00:42:40,640][134294] Updated weights for policy 0, policy_version 46714 (0.0019) [2025-01-04 00:42:43,546][134294] Updated weights for policy 0, policy_version 46724 (0.0024) [2025-01-04 00:42:43,968][134211] Fps is (10 sec: 15974.5, 60 sec: 15018.8, 300 sec: 15009.5). Total num frames: 191385600. Throughput: 0: 3697.4. Samples: 37017322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:43,968][134211] Avg episode reward: [(0, '6.238')] [2025-01-04 00:42:46,636][134294] Updated weights for policy 0, policy_version 46734 (0.0027) [2025-01-04 00:42:48,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14267.7, 300 sec: 15023.3). Total num frames: 191451136. Throughput: 0: 3716.7. Samples: 37027168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 00:42:48,968][134211] Avg episode reward: [(0, '7.445')] [2025-01-04 00:42:49,792][134294] Updated weights for policy 0, policy_version 46744 (0.0029) [2025-01-04 00:42:52,678][134294] Updated weights for policy 0, policy_version 46754 (0.0024) [2025-01-04 00:42:53,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14404.3, 300 sec: 15037.2). Total num frames: 191520768. Throughput: 0: 3720.8. Samples: 37047474. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:42:53,968][134211] Avg episode reward: [(0, '6.259')] [2025-01-04 00:42:54,941][134294] Updated weights for policy 0, policy_version 46764 (0.0016) [2025-01-04 00:42:56,849][134294] Updated weights for policy 0, policy_version 46774 (0.0012) [2025-01-04 00:42:58,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15086.8, 300 sec: 15162.1). Total num frames: 191623168. Throughput: 0: 3716.9. Samples: 37076432. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:42:58,969][134211] Avg episode reward: [(0, '6.623')] [2025-01-04 00:42:59,116][134294] Updated weights for policy 0, policy_version 46784 (0.0016) [2025-01-04 00:43:02,442][134294] Updated weights for policy 0, policy_version 46794 (0.0026) [2025-01-04 00:43:03,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15155.3, 300 sec: 15106.7). Total num frames: 191688704. Throughput: 0: 3580.2. Samples: 37086416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:03,968][134211] Avg episode reward: [(0, '7.330')] [2025-01-04 00:43:05,458][134294] Updated weights for policy 0, policy_version 46804 (0.0025) [2025-01-04 00:43:08,640][134294] Updated weights for policy 0, policy_version 46814 (0.0029) [2025-01-04 00:43:08,968][134211] Fps is (10 sec: 13107.8, 60 sec: 15086.9, 300 sec: 15023.3). Total num frames: 191754240. Throughput: 0: 3566.8. Samples: 37106166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:08,968][134211] Avg episode reward: [(0, '6.909')] [2025-01-04 00:43:11,099][134294] Updated weights for policy 0, policy_version 46824 (0.0018) [2025-01-04 00:43:12,979][134294] Updated weights for policy 0, policy_version 46834 (0.0014) [2025-01-04 00:43:13,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14813.9, 300 sec: 15120.5). Total num frames: 191848448. Throughput: 0: 3736.5. Samples: 37132174. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:13,968][134211] Avg episode reward: [(0, '5.917')] [2025-01-04 00:43:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046838_191848448.pth... 
[2025-01-04 00:43:14,044][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045956_188235776.pth [2025-01-04 00:43:15,584][134294] Updated weights for policy 0, policy_version 46844 (0.0021) [2025-01-04 00:43:18,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14404.3, 300 sec: 15106.6). Total num frames: 191909888. Throughput: 0: 3795.2. Samples: 37143224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:18,968][134211] Avg episode reward: [(0, '7.051')] [2025-01-04 00:43:18,986][134294] Updated weights for policy 0, policy_version 46854 (0.0029) [2025-01-04 00:43:22,061][134294] Updated weights for policy 0, policy_version 46864 (0.0029) [2025-01-04 00:43:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 15051.1). Total num frames: 191975424. Throughput: 0: 3759.3. Samples: 37162244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:23,968][134211] Avg episode reward: [(0, '6.942')] [2025-01-04 00:43:24,874][134294] Updated weights for policy 0, policy_version 46874 (0.0020) [2025-01-04 00:43:26,691][134294] Updated weights for policy 0, policy_version 46884 (0.0015) [2025-01-04 00:43:28,604][134294] Updated weights for policy 0, policy_version 46894 (0.0013) [2025-01-04 00:43:28,967][134211] Fps is (10 sec: 17613.2, 60 sec: 15291.7, 300 sec: 15064.9). Total num frames: 192086016. Throughput: 0: 3849.4. Samples: 37190544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:28,968][134211] Avg episode reward: [(0, '6.952')] [2025-01-04 00:43:30,471][134294] Updated weights for policy 0, policy_version 46904 (0.0013) [2025-01-04 00:43:32,325][134294] Updated weights for policy 0, policy_version 46914 (0.0013) [2025-01-04 00:43:33,968][134211] Fps is (10 sec: 21299.2, 60 sec: 16042.7, 300 sec: 15092.7). Total num frames: 192188416. Throughput: 0: 3993.2. Samples: 37206860. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:33,968][134211] Avg episode reward: [(0, '7.056')] [2025-01-04 00:43:34,798][134294] Updated weights for policy 0, policy_version 46924 (0.0021) [2025-01-04 00:43:37,948][134294] Updated weights for policy 0, policy_version 46934 (0.0026) [2025-01-04 00:43:38,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15633.0, 300 sec: 15092.7). Total num frames: 192249856. Throughput: 0: 4080.4. Samples: 37231092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:38,968][134211] Avg episode reward: [(0, '7.960')] [2025-01-04 00:43:39,036][134264] Saving new best policy, reward=7.960! [2025-01-04 00:43:41,196][134294] Updated weights for policy 0, policy_version 46944 (0.0027) [2025-01-04 00:43:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.6, 300 sec: 15092.7). Total num frames: 192315392. Throughput: 0: 3854.4. Samples: 37249880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:43,969][134211] Avg episode reward: [(0, '6.663')] [2025-01-04 00:43:44,566][134294] Updated weights for policy 0, policy_version 46954 (0.0028) [2025-01-04 00:43:47,640][134294] Updated weights for policy 0, policy_version 46964 (0.0029) [2025-01-04 00:43:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15428.3, 300 sec: 15078.8). Total num frames: 192376832. Throughput: 0: 3842.9. Samples: 37259346. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:43:48,968][134211] Avg episode reward: [(0, '6.856')] [2025-01-04 00:43:50,849][134294] Updated weights for policy 0, policy_version 46974 (0.0027) [2025-01-04 00:43:53,913][134294] Updated weights for policy 0, policy_version 46984 (0.0026) [2025-01-04 00:43:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15428.2, 300 sec: 15106.6). Total num frames: 192446464. Throughput: 0: 3839.7. Samples: 37278950. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:43:53,968][134211] Avg episode reward: [(0, '6.508')] [2025-01-04 00:43:56,973][134294] Updated weights for policy 0, policy_version 46994 (0.0024) [2025-01-04 00:43:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14814.0, 300 sec: 15120.5). Total num frames: 192512000. Throughput: 0: 3708.2. Samples: 37299042. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:43:58,968][134211] Avg episode reward: [(0, '7.034')] [2025-01-04 00:44:00,016][134294] Updated weights for policy 0, policy_version 47004 (0.0026) [2025-01-04 00:44:03,008][134294] Updated weights for policy 0, policy_version 47014 (0.0024) [2025-01-04 00:44:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.1, 300 sec: 15106.6). Total num frames: 192581632. Throughput: 0: 3691.9. Samples: 37309358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:44:03,968][134211] Avg episode reward: [(0, '7.124')] [2025-01-04 00:44:06,013][134294] Updated weights for policy 0, policy_version 47024 (0.0025) [2025-01-04 00:44:08,471][134294] Updated weights for policy 0, policy_version 47034 (0.0013) [2025-01-04 00:44:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15086.9, 300 sec: 15009.4). Total num frames: 192659456. Throughput: 0: 3718.1. Samples: 37329560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:44:08,968][134211] Avg episode reward: [(0, '6.563')] [2025-01-04 00:44:10,377][134294] Updated weights for policy 0, policy_version 47044 (0.0014) [2025-01-04 00:44:12,256][134294] Updated weights for policy 0, policy_version 47054 (0.0014) [2025-01-04 00:44:13,967][134211] Fps is (10 sec: 18432.5, 60 sec: 15291.8, 300 sec: 15023.3). Total num frames: 192765952. Throughput: 0: 3802.3. Samples: 37361646. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:44:13,968][134211] Avg episode reward: [(0, '6.598')] [2025-01-04 00:44:14,147][134294] Updated weights for policy 0, policy_version 47064 (0.0015) [2025-01-04 00:44:16,023][134294] Updated weights for policy 0, policy_version 47074 (0.0013) [2025-01-04 00:44:17,902][134294] Updated weights for policy 0, policy_version 47084 (0.0013) [2025-01-04 00:44:18,968][134211] Fps is (10 sec: 21299.5, 60 sec: 16042.7, 300 sec: 15162.1). Total num frames: 192872448. Throughput: 0: 3802.4. Samples: 37377968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:44:18,969][134211] Avg episode reward: [(0, '7.427')] [2025-01-04 00:44:20,674][134294] Updated weights for policy 0, policy_version 47094 (0.0023) [2025-01-04 00:44:23,893][134294] Updated weights for policy 0, policy_version 47104 (0.0030) [2025-01-04 00:44:23,968][134211] Fps is (10 sec: 17202.6, 60 sec: 16042.6, 300 sec: 15162.1). Total num frames: 192937984. Throughput: 0: 3808.0. Samples: 37402454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:23,969][134211] Avg episode reward: [(0, '6.428')] [2025-01-04 00:44:27,341][134294] Updated weights for policy 0, policy_version 47114 (0.0026) [2025-01-04 00:44:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15155.1, 300 sec: 15148.3). Total num frames: 192995328. Throughput: 0: 3781.1. Samples: 37420030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:28,969][134211] Avg episode reward: [(0, '7.065')] [2025-01-04 00:44:30,935][134294] Updated weights for policy 0, policy_version 47124 (0.0032) [2025-01-04 00:44:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14472.5, 300 sec: 15120.5). Total num frames: 193056768. Throughput: 0: 3769.0. Samples: 37428952. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:33,968][134211] Avg episode reward: [(0, '6.470')] [2025-01-04 00:44:34,293][134294] Updated weights for policy 0, policy_version 47134 (0.0025) [2025-01-04 00:44:37,379][134294] Updated weights for policy 0, policy_version 47144 (0.0028) [2025-01-04 00:44:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14472.5, 300 sec: 15092.7). Total num frames: 193118208. Throughput: 0: 3753.6. Samples: 37447862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:38,968][134211] Avg episode reward: [(0, '6.474')] [2025-01-04 00:44:40,585][134294] Updated weights for policy 0, policy_version 47154 (0.0025) [2025-01-04 00:44:42,912][134294] Updated weights for policy 0, policy_version 47164 (0.0015) [2025-01-04 00:44:43,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14813.9, 300 sec: 15162.2). Total num frames: 193204224. Throughput: 0: 3811.6. Samples: 37470564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:43,968][134211] Avg episode reward: [(0, '6.312')] [2025-01-04 00:44:44,839][134294] Updated weights for policy 0, policy_version 47174 (0.0013) [2025-01-04 00:44:46,699][134294] Updated weights for policy 0, policy_version 47184 (0.0012) [2025-01-04 00:44:48,595][134294] Updated weights for policy 0, policy_version 47194 (0.0014) [2025-01-04 00:44:48,968][134211] Fps is (10 sec: 19661.1, 60 sec: 15633.1, 300 sec: 15301.0). Total num frames: 193314816. Throughput: 0: 3939.7. Samples: 37486642. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:44:48,968][134211] Avg episode reward: [(0, '6.751')] [2025-01-04 00:44:51,189][134294] Updated weights for policy 0, policy_version 47204 (0.0023) [2025-01-04 00:44:53,970][134211] Fps is (10 sec: 17608.7, 60 sec: 15564.2, 300 sec: 15162.0). Total num frames: 193380352. Throughput: 0: 4084.5. Samples: 37513370. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:44:53,971][134211] Avg episode reward: [(0, '6.270')] [2025-01-04 00:44:54,261][134294] Updated weights for policy 0, policy_version 47214 (0.0028) [2025-01-04 00:44:57,526][134294] Updated weights for policy 0, policy_version 47224 (0.0026) [2025-01-04 00:44:58,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15564.7, 300 sec: 15162.1). Total num frames: 193445888. Throughput: 0: 3797.7. Samples: 37532546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:44:58,969][134211] Avg episode reward: [(0, '5.848')] [2025-01-04 00:45:00,676][134294] Updated weights for policy 0, policy_version 47234 (0.0028) [2025-01-04 00:45:03,717][134294] Updated weights for policy 0, policy_version 47244 (0.0026) [2025-01-04 00:45:03,968][134211] Fps is (10 sec: 13109.9, 60 sec: 15496.5, 300 sec: 15162.1). Total num frames: 193511424. Throughput: 0: 3658.3. Samples: 37542592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:03,968][134211] Avg episode reward: [(0, '6.054')] [2025-01-04 00:45:06,698][134294] Updated weights for policy 0, policy_version 47254 (0.0024) [2025-01-04 00:45:08,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15291.8, 300 sec: 15037.2). Total num frames: 193576960. Throughput: 0: 3561.3. Samples: 37562712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:08,968][134211] Avg episode reward: [(0, '6.075')] [2025-01-04 00:45:09,962][134294] Updated weights for policy 0, policy_version 47264 (0.0026) [2025-01-04 00:45:12,752][134294] Updated weights for policy 0, policy_version 47274 (0.0021) [2025-01-04 00:45:13,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14882.1, 300 sec: 15106.6). Total num frames: 193658880. Throughput: 0: 3644.2. Samples: 37584018. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:13,968][134211] Avg episode reward: [(0, '6.400')] [2025-01-04 00:45:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047280_193658880.pth... [2025-01-04 00:45:14,030][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046397_190042112.pth [2025-01-04 00:45:15,005][134294] Updated weights for policy 0, policy_version 47284 (0.0014) [2025-01-04 00:45:17,864][134294] Updated weights for policy 0, policy_version 47294 (0.0023) [2025-01-04 00:45:18,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14267.7, 300 sec: 15120.5). Total num frames: 193728512. Throughput: 0: 3720.1. Samples: 37596356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:18,968][134211] Avg episode reward: [(0, '6.360')] [2025-01-04 00:45:20,848][134294] Updated weights for policy 0, policy_version 47304 (0.0026) [2025-01-04 00:45:23,689][134294] Updated weights for policy 0, policy_version 47314 (0.0023) [2025-01-04 00:45:23,967][134211] Fps is (10 sec: 14336.1, 60 sec: 14404.4, 300 sec: 15037.2). Total num frames: 193802240. Throughput: 0: 3760.6. Samples: 37617088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:23,968][134211] Avg episode reward: [(0, '6.664')] [2025-01-04 00:45:25,676][134294] Updated weights for policy 0, policy_version 47324 (0.0016) [2025-01-04 00:45:27,537][134294] Updated weights for policy 0, policy_version 47334 (0.0015) [2025-01-04 00:45:28,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15086.9, 300 sec: 15037.3). Total num frames: 193900544. Throughput: 0: 3896.4. Samples: 37645906. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:28,969][134211] Avg episode reward: [(0, '7.026')] [2025-01-04 00:45:30,306][134294] Updated weights for policy 0, policy_version 47344 (0.0025) [2025-01-04 00:45:33,484][134294] Updated weights for policy 0, policy_version 47354 (0.0026) [2025-01-04 00:45:33,968][134211] Fps is (10 sec: 16383.4, 60 sec: 15155.2, 300 sec: 15064.9). Total num frames: 193966080. Throughput: 0: 3768.1. Samples: 37656208. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:33,968][134211] Avg episode reward: [(0, '6.985')] [2025-01-04 00:45:36,496][134294] Updated weights for policy 0, policy_version 47364 (0.0027) [2025-01-04 00:45:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15291.6, 300 sec: 15092.7). Total num frames: 194035712. Throughput: 0: 3618.7. Samples: 37676204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:38,969][134211] Avg episode reward: [(0, '6.878')] [2025-01-04 00:45:39,703][134294] Updated weights for policy 0, policy_version 47374 (0.0028) [2025-01-04 00:45:42,053][134294] Updated weights for policy 0, policy_version 47384 (0.0018) [2025-01-04 00:45:43,964][134294] Updated weights for policy 0, policy_version 47394 (0.0012) [2025-01-04 00:45:43,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15360.0, 300 sec: 15176.0). Total num frames: 194125824. Throughput: 0: 3733.9. Samples: 37700570. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:43,968][134211] Avg episode reward: [(0, '6.963')] [2025-01-04 00:45:46,225][134294] Updated weights for policy 0, policy_version 47404 (0.0017) [2025-01-04 00:45:48,968][134211] Fps is (10 sec: 16794.6, 60 sec: 14813.8, 300 sec: 15217.7). Total num frames: 194203648. Throughput: 0: 3824.9. Samples: 37714712. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:45:48,968][134211] Avg episode reward: [(0, '7.299')] [2025-01-04 00:45:49,241][134294] Updated weights for policy 0, policy_version 47414 (0.0023) [2025-01-04 00:45:52,321][134294] Updated weights for policy 0, policy_version 47424 (0.0024) [2025-01-04 00:45:53,969][134211] Fps is (10 sec: 14334.4, 60 sec: 14814.2, 300 sec: 15176.0). Total num frames: 194269184. Throughput: 0: 3826.4. Samples: 37734904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:53,969][134211] Avg episode reward: [(0, '6.761')] [2025-01-04 00:45:55,424][134294] Updated weights for policy 0, policy_version 47434 (0.0025) [2025-01-04 00:45:58,523][134294] Updated weights for policy 0, policy_version 47444 (0.0027) [2025-01-04 00:45:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 15162.2). Total num frames: 194334720. Throughput: 0: 3793.2. Samples: 37754712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:45:58,968][134211] Avg episode reward: [(0, '6.621')] [2025-01-04 00:46:00,712][134294] Updated weights for policy 0, policy_version 47454 (0.0014) [2025-01-04 00:46:02,584][134294] Updated weights for policy 0, policy_version 47464 (0.0014) [2025-01-04 00:46:03,968][134211] Fps is (10 sec: 17205.2, 60 sec: 15496.6, 300 sec: 15287.1). Total num frames: 194441216. Throughput: 0: 3833.5. Samples: 37768862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:03,968][134211] Avg episode reward: [(0, '6.281')] [2025-01-04 00:46:04,452][134294] Updated weights for policy 0, policy_version 47474 (0.0015) [2025-01-04 00:46:06,418][134294] Updated weights for policy 0, policy_version 47484 (0.0014) [2025-01-04 00:46:08,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15837.9, 300 sec: 15217.7). Total num frames: 194527232. Throughput: 0: 4059.0. Samples: 37799742. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:08,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 00:46:09,470][134294] Updated weights for policy 0, policy_version 47494 (0.0025) [2025-01-04 00:46:12,783][134294] Updated weights for policy 0, policy_version 47504 (0.0028) [2025-01-04 00:46:13,969][134211] Fps is (10 sec: 14743.9, 60 sec: 15496.2, 300 sec: 15134.3). Total num frames: 194588672. Throughput: 0: 3839.0. Samples: 37818664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:13,969][134211] Avg episode reward: [(0, '6.629')] [2025-01-04 00:46:15,839][134294] Updated weights for policy 0, policy_version 47514 (0.0028) [2025-01-04 00:46:18,894][134294] Updated weights for policy 0, policy_version 47524 (0.0025) [2025-01-04 00:46:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.5, 300 sec: 15148.3). Total num frames: 194658304. Throughput: 0: 3832.5. Samples: 37828668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:18,968][134211] Avg episode reward: [(0, '6.857')] [2025-01-04 00:46:21,889][134294] Updated weights for policy 0, policy_version 47534 (0.0027) [2025-01-04 00:46:23,968][134211] Fps is (10 sec: 13518.1, 60 sec: 15359.9, 300 sec: 15148.3). Total num frames: 194723840. Throughput: 0: 3839.2. Samples: 37848966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:23,968][134211] Avg episode reward: [(0, '6.983')] [2025-01-04 00:46:25,036][134294] Updated weights for policy 0, policy_version 47544 (0.0024) [2025-01-04 00:46:28,279][134294] Updated weights for policy 0, policy_version 47554 (0.0022) [2025-01-04 00:46:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.2, 300 sec: 15162.1). Total num frames: 194793472. Throughput: 0: 3722.5. Samples: 37868084. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:28,968][134211] Avg episode reward: [(0, '6.361')] [2025-01-04 00:46:30,325][134294] Updated weights for policy 0, policy_version 47564 (0.0013) [2025-01-04 00:46:32,337][134294] Updated weights for policy 0, policy_version 47574 (0.0013) [2025-01-04 00:46:33,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15496.6, 300 sec: 15273.2). Total num frames: 194895872. Throughput: 0: 3737.0. Samples: 37882876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:33,968][134211] Avg episode reward: [(0, '7.360')] [2025-01-04 00:46:34,219][134294] Updated weights for policy 0, policy_version 47584 (0.0013) [2025-01-04 00:46:36,145][134294] Updated weights for policy 0, policy_version 47594 (0.0014) [2025-01-04 00:46:38,000][134294] Updated weights for policy 0, policy_version 47604 (0.0014) [2025-01-04 00:46:38,967][134211] Fps is (10 sec: 20890.2, 60 sec: 16111.2, 300 sec: 15314.9). Total num frames: 195002368. Throughput: 0: 4008.4. Samples: 37915278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:38,968][134211] Avg episode reward: [(0, '7.028')] [2025-01-04 00:46:39,906][134294] Updated weights for policy 0, policy_version 47614 (0.0014) [2025-01-04 00:46:42,581][134294] Updated weights for policy 0, policy_version 47624 (0.0025) [2025-01-04 00:46:43,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15974.3, 300 sec: 15217.7). Total num frames: 195084288. Throughput: 0: 4165.6. Samples: 37942166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:43,969][134211] Avg episode reward: [(0, '5.911')] [2025-01-04 00:46:45,780][134294] Updated weights for policy 0, policy_version 47634 (0.0029) [2025-01-04 00:46:48,928][134294] Updated weights for policy 0, policy_version 47644 (0.0024) [2025-01-04 00:46:48,968][134211] Fps is (10 sec: 14745.0, 60 sec: 15769.5, 300 sec: 15231.6). Total num frames: 195149824. Throughput: 0: 4061.5. Samples: 37951632. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:46:48,969][134211] Avg episode reward: [(0, '7.034')] [2025-01-04 00:46:51,902][134294] Updated weights for policy 0, policy_version 47654 (0.0025) [2025-01-04 00:46:53,969][134211] Fps is (10 sec: 13105.9, 60 sec: 15769.6, 300 sec: 15245.4). Total num frames: 195215360. Throughput: 0: 3821.1. Samples: 37971698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:46:53,970][134211] Avg episode reward: [(0, '6.504')] [2025-01-04 00:46:55,048][134294] Updated weights for policy 0, policy_version 47664 (0.0027) [2025-01-04 00:46:58,071][134294] Updated weights for policy 0, policy_version 47674 (0.0028) [2025-01-04 00:46:58,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15769.6, 300 sec: 15259.4). Total num frames: 195280896. Throughput: 0: 3843.9. Samples: 37991634. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:46:58,968][134211] Avg episode reward: [(0, '6.959')] [2025-01-04 00:47:01,197][134294] Updated weights for policy 0, policy_version 47684 (0.0024) [2025-01-04 00:47:03,968][134211] Fps is (10 sec: 13108.9, 60 sec: 15086.9, 300 sec: 15245.4). Total num frames: 195346432. Throughput: 0: 3841.1. Samples: 38001516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:47:03,968][134211] Avg episode reward: [(0, '6.764')] [2025-01-04 00:47:04,462][134294] Updated weights for policy 0, policy_version 47694 (0.0028) [2025-01-04 00:47:07,646][134294] Updated weights for policy 0, policy_version 47704 (0.0026) [2025-01-04 00:47:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14745.6, 300 sec: 15092.7). Total num frames: 195411968. Throughput: 0: 3811.3. Samples: 38020476. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:47:08,968][134211] Avg episode reward: [(0, '7.041')] [2025-01-04 00:47:10,691][134294] Updated weights for policy 0, policy_version 47714 (0.0025) [2025-01-04 00:47:13,606][134294] Updated weights for policy 0, policy_version 47724 (0.0025) [2025-01-04 00:47:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14814.1, 300 sec: 15023.3). Total num frames: 195477504. Throughput: 0: 3844.7. Samples: 38041096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:47:13,968][134211] Avg episode reward: [(0, '6.478')] [2025-01-04 00:47:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047725_195481600.pth... [2025-01-04 00:47:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046838_191848448.pth [2025-01-04 00:47:16,613][134294] Updated weights for policy 0, policy_version 47734 (0.0023) [2025-01-04 00:47:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 15037.2). Total num frames: 195547136. Throughput: 0: 3739.3. Samples: 38051144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:47:18,969][134211] Avg episode reward: [(0, '6.418')] [2025-01-04 00:47:19,746][134294] Updated weights for policy 0, policy_version 47744 (0.0027) [2025-01-04 00:47:22,772][134294] Updated weights for policy 0, policy_version 47754 (0.0024) [2025-01-04 00:47:23,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15018.7, 300 sec: 15106.6). Total num frames: 195624960. Throughput: 0: 3467.3. Samples: 38071306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:23,968][134211] Avg episode reward: [(0, '6.753')] [2025-01-04 00:47:24,652][134294] Updated weights for policy 0, policy_version 47764 (0.0013) [2025-01-04 00:47:26,620][134294] Updated weights for policy 0, policy_version 47774 (0.0013) [2025-01-04 00:47:28,421][134294] Updated weights for policy 0, policy_version 47784 (0.0015) [2025-01-04 00:47:28,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15633.0, 300 sec: 15273.2). Total num frames: 195731456. Throughput: 0: 3564.0. Samples: 38102548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:28,968][134211] Avg episode reward: [(0, '6.913')] [2025-01-04 00:47:30,328][134294] Updated weights for policy 0, policy_version 47794 (0.0014) [2025-01-04 00:47:33,145][134294] Updated weights for policy 0, policy_version 47804 (0.0024) [2025-01-04 00:47:33,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15291.7, 300 sec: 15259.3). Total num frames: 195813376. Throughput: 0: 3695.3. Samples: 38117918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:33,968][134211] Avg episode reward: [(0, '7.419')] [2025-01-04 00:47:36,398][134294] Updated weights for policy 0, policy_version 47814 (0.0029) [2025-01-04 00:47:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14540.7, 300 sec: 15217.7). Total num frames: 195874816. Throughput: 0: 3668.9. Samples: 38136796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:38,969][134211] Avg episode reward: [(0, '6.463')] [2025-01-04 00:47:39,870][134294] Updated weights for policy 0, policy_version 47824 (0.0030) [2025-01-04 00:47:43,025][134294] Updated weights for policy 0, policy_version 47834 (0.0029) [2025-01-04 00:47:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 15217.7). Total num frames: 195940352. Throughput: 0: 3645.1. Samples: 38155662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:43,970][134211] Avg episode reward: [(0, '7.035')] [2025-01-04 00:47:45,788][134294] Updated weights for policy 0, policy_version 47844 (0.0022) [2025-01-04 00:47:47,832][134294] Updated weights for policy 0, policy_version 47854 (0.0017) [2025-01-04 00:47:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.1, 300 sec: 15273.2). Total num frames: 196026368. Throughput: 0: 3693.9. Samples: 38167744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:47:48,968][134211] Avg episode reward: [(0, '6.433')] [2025-01-04 00:47:50,770][134294] Updated weights for policy 0, policy_version 47864 (0.0024) [2025-01-04 00:47:53,695][134294] Updated weights for policy 0, policy_version 47874 (0.0023) [2025-01-04 00:47:53,969][134211] Fps is (10 sec: 15153.7, 60 sec: 14609.1, 300 sec: 15148.2). Total num frames: 196091904. Throughput: 0: 3788.7. Samples: 38190970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:47:53,969][134211] Avg episode reward: [(0, '6.707')] [2025-01-04 00:47:56,763][134294] Updated weights for policy 0, policy_version 47884 (0.0023) [2025-01-04 00:47:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 15148.3). Total num frames: 196157440. Throughput: 0: 3771.2. Samples: 38210800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:47:58,968][134211] Avg episode reward: [(0, '6.718')] [2025-01-04 00:47:59,809][134294] Updated weights for policy 0, policy_version 47894 (0.0024) [2025-01-04 00:48:01,730][134294] Updated weights for policy 0, policy_version 47904 (0.0014) [2025-01-04 00:48:03,623][134294] Updated weights for policy 0, policy_version 47914 (0.0014) [2025-01-04 00:48:03,968][134211] Fps is (10 sec: 16795.7, 60 sec: 15223.5, 300 sec: 15273.2). Total num frames: 196259840. Throughput: 0: 3841.8. Samples: 38224024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:03,968][134211] Avg episode reward: [(0, '6.437')] [2025-01-04 00:48:05,561][134294] Updated weights for policy 0, policy_version 47924 (0.0014) [2025-01-04 00:48:07,444][134294] Updated weights for policy 0, policy_version 47934 (0.0012) [2025-01-04 00:48:08,967][134211] Fps is (10 sec: 21299.8, 60 sec: 15974.5, 300 sec: 15328.8). Total num frames: 196370432. Throughput: 0: 4114.2. Samples: 38256446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:08,968][134211] Avg episode reward: [(0, '6.364')] [2025-01-04 00:48:09,279][134294] Updated weights for policy 0, policy_version 47944 (0.0014) [2025-01-04 00:48:11,147][134294] Updated weights for policy 0, policy_version 47954 (0.0012) [2025-01-04 00:48:13,196][134294] Updated weights for policy 0, policy_version 47964 (0.0015) [2025-01-04 00:48:13,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16520.5, 300 sec: 15453.7). Total num frames: 196468736. Throughput: 0: 4128.0. Samples: 38288308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:13,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 00:48:16,481][134294] Updated weights for policy 0, policy_version 47974 (0.0026) [2025-01-04 00:48:18,968][134211] Fps is (10 sec: 15564.6, 60 sec: 16315.8, 300 sec: 15426.0). Total num frames: 196526080. Throughput: 0: 3997.3. Samples: 38297794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:18,968][134211] Avg episode reward: [(0, '7.140')] [2025-01-04 00:48:20,117][134294] Updated weights for policy 0, policy_version 47984 (0.0029) [2025-01-04 00:48:23,350][134294] Updated weights for policy 0, policy_version 47994 (0.0027) [2025-01-04 00:48:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 16042.6, 300 sec: 15259.3). Total num frames: 196587520. Throughput: 0: 3965.2. Samples: 38315228. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:23,968][134211] Avg episode reward: [(0, '6.579')] [2025-01-04 00:48:26,767][134294] Updated weights for policy 0, policy_version 48004 (0.0025) [2025-01-04 00:48:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15223.5, 300 sec: 15106.6). Total num frames: 196644864. Throughput: 0: 3938.5. Samples: 38332896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:28,968][134211] Avg episode reward: [(0, '6.204')] [2025-01-04 00:48:30,518][134294] Updated weights for policy 0, policy_version 48014 (0.0026) [2025-01-04 00:48:33,940][134294] Updated weights for policy 0, policy_version 48024 (0.0028) [2025-01-04 00:48:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14882.1, 300 sec: 15106.6). Total num frames: 196706304. Throughput: 0: 3859.8. Samples: 38341436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:33,968][134211] Avg episode reward: [(0, '7.169')] [2025-01-04 00:48:37,137][134294] Updated weights for policy 0, policy_version 48034 (0.0025) [2025-01-04 00:48:38,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14882.2, 300 sec: 15092.7). Total num frames: 196767744. Throughput: 0: 3757.8. Samples: 38360066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:38,968][134211] Avg episode reward: [(0, '6.792')] [2025-01-04 00:48:40,189][134294] Updated weights for policy 0, policy_version 48044 (0.0027) [2025-01-04 00:48:43,140][134294] Updated weights for policy 0, policy_version 48054 (0.0024) [2025-01-04 00:48:43,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15018.7, 300 sec: 15134.4). Total num frames: 196841472. Throughput: 0: 3775.6. Samples: 38380700. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:43,968][134211] Avg episode reward: [(0, '7.078')] [2025-01-04 00:48:45,129][134294] Updated weights for policy 0, policy_version 48064 (0.0014) [2025-01-04 00:48:46,983][134294] Updated weights for policy 0, policy_version 48074 (0.0013) [2025-01-04 00:48:48,889][134294] Updated weights for policy 0, policy_version 48084 (0.0014) [2025-01-04 00:48:48,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15428.3, 300 sec: 15273.2). Total num frames: 196952064. Throughput: 0: 3835.6. Samples: 38396626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:48:48,969][134211] Avg episode reward: [(0, '6.342')] [2025-01-04 00:48:51,208][134294] Updated weights for policy 0, policy_version 48094 (0.0020) [2025-01-04 00:48:53,968][134211] Fps is (10 sec: 18431.4, 60 sec: 15565.0, 300 sec: 15301.0). Total num frames: 197025792. Throughput: 0: 3746.1. Samples: 38425022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:53,969][134211] Avg episode reward: [(0, '6.416')] [2025-01-04 00:48:54,198][134294] Updated weights for policy 0, policy_version 48104 (0.0027) [2025-01-04 00:48:57,325][134294] Updated weights for policy 0, policy_version 48114 (0.0031) [2025-01-04 00:48:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15564.8, 300 sec: 15287.1). Total num frames: 197091328. Throughput: 0: 3465.5. Samples: 38444254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:48:58,968][134211] Avg episode reward: [(0, '6.627')] [2025-01-04 00:49:00,540][134294] Updated weights for policy 0, policy_version 48124 (0.0028) [2025-01-04 00:49:03,620][134294] Updated weights for policy 0, policy_version 48134 (0.0024) [2025-01-04 00:49:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.6, 300 sec: 15259.3). Total num frames: 197160960. Throughput: 0: 3477.9. Samples: 38454302. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:03,969][134211] Avg episode reward: [(0, '6.592')] [2025-01-04 00:49:06,651][134294] Updated weights for policy 0, policy_version 48144 (0.0027) [2025-01-04 00:49:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 15120.5). Total num frames: 197226496. Throughput: 0: 3536.6. Samples: 38474374. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:08,968][134211] Avg episode reward: [(0, '6.642')] [2025-01-04 00:49:09,839][134294] Updated weights for policy 0, policy_version 48154 (0.0023) [2025-01-04 00:49:12,180][134294] Updated weights for policy 0, policy_version 48164 (0.0017) [2025-01-04 00:49:13,968][134211] Fps is (10 sec: 15565.3, 60 sec: 14131.2, 300 sec: 15065.0). Total num frames: 197316608. Throughput: 0: 3679.5. Samples: 38498474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:13,968][134211] Avg episode reward: [(0, '6.407')] [2025-01-04 00:49:13,971][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048173_197316608.pth... [2025-01-04 00:49:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047280_193658880.pth [2025-01-04 00:49:14,101][134294] Updated weights for policy 0, policy_version 48174 (0.0013) [2025-01-04 00:49:16,004][134294] Updated weights for policy 0, policy_version 48184 (0.0012) [2025-01-04 00:49:17,864][134294] Updated weights for policy 0, policy_version 48194 (0.0013) [2025-01-04 00:49:18,968][134211] Fps is (10 sec: 19660.9, 60 sec: 14950.4, 300 sec: 15203.8). Total num frames: 197423104. Throughput: 0: 3850.6. Samples: 38514714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:18,968][134211] Avg episode reward: [(0, '7.007')] [2025-01-04 00:49:20,450][134294] Updated weights for policy 0, policy_version 48204 (0.0020) [2025-01-04 00:49:23,611][134294] Updated weights for policy 0, policy_version 48214 (0.0028) [2025-01-04 00:49:23,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15018.7, 300 sec: 15231.6). Total num frames: 197488640. Throughput: 0: 3996.8. Samples: 38539922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:23,968][134211] Avg episode reward: [(0, '6.693')] [2025-01-04 00:49:26,776][134294] Updated weights for policy 0, policy_version 48224 (0.0026) [2025-01-04 00:49:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15086.9, 300 sec: 15231.6). Total num frames: 197550080. Throughput: 0: 3964.8. Samples: 38559118. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:28,969][134211] Avg episode reward: [(0, '6.576')] [2025-01-04 00:49:29,980][134294] Updated weights for policy 0, policy_version 48234 (0.0028) [2025-01-04 00:49:32,924][134294] Updated weights for policy 0, policy_version 48244 (0.0025) [2025-01-04 00:49:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.5, 300 sec: 15259.3). Total num frames: 197619712. Throughput: 0: 3823.9. Samples: 38568700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:33,968][134211] Avg episode reward: [(0, '6.533')] [2025-01-04 00:49:36,053][134294] Updated weights for policy 0, policy_version 48254 (0.0027) [2025-01-04 00:49:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 15189.9). Total num frames: 197685248. Throughput: 0: 3639.9. Samples: 38588818. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:38,968][134211] Avg episode reward: [(0, '7.322')] [2025-01-04 00:49:39,263][134294] Updated weights for policy 0, policy_version 48264 (0.0024) [2025-01-04 00:49:42,012][134294] Updated weights for policy 0, policy_version 48274 (0.0019) [2025-01-04 00:49:43,919][134294] Updated weights for policy 0, policy_version 48284 (0.0011) [2025-01-04 00:49:43,967][134211] Fps is (10 sec: 15155.6, 60 sec: 15496.6, 300 sec: 15106.6). Total num frames: 197771264. Throughput: 0: 3729.4. Samples: 38612076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:43,968][134211] Avg episode reward: [(0, '7.320')] [2025-01-04 00:49:45,814][134294] Updated weights for policy 0, policy_version 48294 (0.0017) [2025-01-04 00:49:48,706][134294] Updated weights for policy 0, policy_version 48304 (0.0023) [2025-01-04 00:49:48,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15018.6, 300 sec: 15162.2). Total num frames: 197853184. Throughput: 0: 3849.4. Samples: 38627526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:49:48,969][134211] Avg episode reward: [(0, '6.789')] [2025-01-04 00:49:51,834][134294] Updated weights for policy 0, policy_version 48314 (0.0027) [2025-01-04 00:49:53,968][134211] Fps is (10 sec: 14745.0, 60 sec: 14882.1, 300 sec: 15162.1). Total num frames: 197918720. Throughput: 0: 3852.6. Samples: 38647742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:53,969][134211] Avg episode reward: [(0, '6.620')] [2025-01-04 00:49:54,998][134294] Updated weights for policy 0, policy_version 48324 (0.0023) [2025-01-04 00:49:57,669][134294] Updated weights for policy 0, policy_version 48334 (0.0020) [2025-01-04 00:49:58,967][134211] Fps is (10 sec: 15155.9, 60 sec: 15223.5, 300 sec: 15231.6). Total num frames: 198004736. Throughput: 0: 3817.4. Samples: 38670258. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:49:58,968][134211] Avg episode reward: [(0, '6.756')] [2025-01-04 00:49:59,544][134294] Updated weights for policy 0, policy_version 48344 (0.0012) [2025-01-04 00:50:01,436][134294] Updated weights for policy 0, policy_version 48354 (0.0014) [2025-01-04 00:50:03,424][134294] Updated weights for policy 0, policy_version 48364 (0.0015) [2025-01-04 00:50:03,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15701.3, 300 sec: 15342.6). Total num frames: 198103040. Throughput: 0: 3814.8. Samples: 38686380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:03,968][134211] Avg episode reward: [(0, '6.729')] [2025-01-04 00:50:06,379][134294] Updated weights for policy 0, policy_version 48374 (0.0026) [2025-01-04 00:50:08,968][134211] Fps is (10 sec: 16382.8, 60 sec: 15701.2, 300 sec: 15287.1). Total num frames: 198168576. Throughput: 0: 3788.7. Samples: 38710414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:08,969][134211] Avg episode reward: [(0, '7.410')] [2025-01-04 00:50:09,731][134294] Updated weights for policy 0, policy_version 48384 (0.0025) [2025-01-04 00:50:12,862][134294] Updated weights for policy 0, policy_version 48394 (0.0028) [2025-01-04 00:50:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.7, 300 sec: 15273.2). Total num frames: 198234112. Throughput: 0: 3787.9. Samples: 38729574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:13,968][134211] Avg episode reward: [(0, '6.708')] [2025-01-04 00:50:15,895][134294] Updated weights for policy 0, policy_version 48404 (0.0024) [2025-01-04 00:50:18,916][134294] Updated weights for policy 0, policy_version 48414 (0.0024) [2025-01-04 00:50:18,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14677.3, 300 sec: 15259.3). Total num frames: 198303744. Throughput: 0: 3800.5. Samples: 38739722. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:18,968][134211] Avg episode reward: [(0, '5.907')] [2025-01-04 00:50:21,949][134294] Updated weights for policy 0, policy_version 48424 (0.0023) [2025-01-04 00:50:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 15148.3). Total num frames: 198369280. Throughput: 0: 3810.6. Samples: 38760296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:23,968][134211] Avg episode reward: [(0, '6.688')] [2025-01-04 00:50:25,013][134294] Updated weights for policy 0, policy_version 48434 (0.0024) [2025-01-04 00:50:27,627][134294] Updated weights for policy 0, policy_version 48444 (0.0018) [2025-01-04 00:50:28,967][134211] Fps is (10 sec: 14745.8, 60 sec: 15018.7, 300 sec: 15203.8). Total num frames: 198451200. Throughput: 0: 3784.5. Samples: 38782380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:28,968][134211] Avg episode reward: [(0, '6.188')] [2025-01-04 00:50:29,727][134294] Updated weights for policy 0, policy_version 48454 (0.0014) [2025-01-04 00:50:31,722][134294] Updated weights for policy 0, policy_version 48464 (0.0014) [2025-01-04 00:50:33,630][134294] Updated weights for policy 0, policy_version 48474 (0.0014) [2025-01-04 00:50:33,968][134211] Fps is (10 sec: 18432.4, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 198553600. Throughput: 0: 3776.6. Samples: 38797470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:33,968][134211] Avg episode reward: [(0, '6.481')] [2025-01-04 00:50:35,584][134294] Updated weights for policy 0, policy_version 48484 (0.0014) [2025-01-04 00:50:38,663][134294] Updated weights for policy 0, policy_version 48494 (0.0024) [2025-01-04 00:50:38,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15769.6, 300 sec: 15273.2). Total num frames: 198631424. Throughput: 0: 3971.2. Samples: 38826446. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:38,968][134211] Avg episode reward: [(0, '6.521')] [2025-01-04 00:50:41,809][134294] Updated weights for policy 0, policy_version 48504 (0.0026) [2025-01-04 00:50:43,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15428.2, 300 sec: 15231.6). Total num frames: 198696960. Throughput: 0: 3898.7. Samples: 38845702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:43,968][134211] Avg episode reward: [(0, '6.436')] [2025-01-04 00:50:44,994][134294] Updated weights for policy 0, policy_version 48514 (0.0027) [2025-01-04 00:50:48,189][134294] Updated weights for policy 0, policy_version 48524 (0.0025) [2025-01-04 00:50:48,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15155.2, 300 sec: 15231.6). Total num frames: 198762496. Throughput: 0: 3753.1. Samples: 38855270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:50:48,969][134211] Avg episode reward: [(0, '6.029')] [2025-01-04 00:50:51,255][134294] Updated weights for policy 0, policy_version 48534 (0.0025) [2025-01-04 00:50:53,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15155.1, 300 sec: 15231.5). Total num frames: 198828032. Throughput: 0: 3651.4. Samples: 38874726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:53,969][134211] Avg episode reward: [(0, '6.588')] [2025-01-04 00:50:54,491][134294] Updated weights for policy 0, policy_version 48544 (0.0024) [2025-01-04 00:50:57,425][134294] Updated weights for policy 0, policy_version 48554 (0.0024) [2025-01-04 00:50:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.0, 300 sec: 15106.6). Total num frames: 198897664. Throughput: 0: 3671.3. Samples: 38894784. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:50:58,970][134211] Avg episode reward: [(0, '6.966')] [2025-01-04 00:51:00,450][134294] Updated weights for policy 0, policy_version 48564 (0.0025) [2025-01-04 00:51:02,625][134294] Updated weights for policy 0, policy_version 48574 (0.0014) [2025-01-04 00:51:03,967][134211] Fps is (10 sec: 15566.0, 60 sec: 14677.4, 300 sec: 15106.6). Total num frames: 198983680. Throughput: 0: 3685.9. Samples: 38905586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:51:03,968][134211] Avg episode reward: [(0, '6.699')] [2025-01-04 00:51:04,551][134294] Updated weights for policy 0, policy_version 48584 (0.0015) [2025-01-04 00:51:06,497][134294] Updated weights for policy 0, policy_version 48594 (0.0014) [2025-01-04 00:51:08,882][134294] Updated weights for policy 0, policy_version 48604 (0.0018) [2025-01-04 00:51:08,968][134211] Fps is (10 sec: 18432.9, 60 sec: 15223.6, 300 sec: 15231.6). Total num frames: 199081984. Throughput: 0: 3930.3. Samples: 38937158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:51:08,968][134211] Avg episode reward: [(0, '6.681')] [2025-01-04 00:51:12,522][134294] Updated weights for policy 0, policy_version 48614 (0.0027) [2025-01-04 00:51:13,968][134211] Fps is (10 sec: 15154.6, 60 sec: 15018.6, 300 sec: 15176.0). Total num frames: 199135232. Throughput: 0: 3853.4. Samples: 38955784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:51:13,970][134211] Avg episode reward: [(0, '6.570')] [2025-01-04 00:51:13,995][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048618_199139328.pth... 
[2025-01-04 00:51:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047725_195481600.pth [2025-01-04 00:51:16,148][134294] Updated weights for policy 0, policy_version 48624 (0.0029) [2025-01-04 00:51:18,957][134294] Updated weights for policy 0, policy_version 48634 (0.0025) [2025-01-04 00:51:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15018.7, 300 sec: 15189.9). Total num frames: 199204864. Throughput: 0: 3707.8. Samples: 38964322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:51:18,969][134211] Avg episode reward: [(0, '7.744')] [2025-01-04 00:51:20,816][134294] Updated weights for policy 0, policy_version 48644 (0.0014) [2025-01-04 00:51:22,702][134294] Updated weights for policy 0, policy_version 48654 (0.0013) [2025-01-04 00:51:23,968][134211] Fps is (10 sec: 17613.4, 60 sec: 15701.4, 300 sec: 15314.9). Total num frames: 199311360. Throughput: 0: 3678.0. Samples: 38991956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:23,968][134211] Avg episode reward: [(0, '6.813')] [2025-01-04 00:51:24,626][134294] Updated weights for policy 0, policy_version 48664 (0.0013) [2025-01-04 00:51:26,517][134294] Updated weights for policy 0, policy_version 48674 (0.0015) [2025-01-04 00:51:28,664][134294] Updated weights for policy 0, policy_version 48684 (0.0019) [2025-01-04 00:51:28,968][134211] Fps is (10 sec: 20479.9, 60 sec: 15974.4, 300 sec: 15301.0). Total num frames: 199409664. Throughput: 0: 3959.1. Samples: 39023860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:28,968][134211] Avg episode reward: [(0, '6.497')] [2025-01-04 00:51:31,809][134294] Updated weights for policy 0, policy_version 48694 (0.0029) [2025-01-04 00:51:33,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15360.0, 300 sec: 15162.1). Total num frames: 199475200. Throughput: 0: 3970.1. Samples: 39033922. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:33,968][134211] Avg episode reward: [(0, '6.989')] [2025-01-04 00:51:35,064][134294] Updated weights for policy 0, policy_version 48704 (0.0026) [2025-01-04 00:51:38,237][134294] Updated weights for policy 0, policy_version 48714 (0.0028) [2025-01-04 00:51:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 15106.6). Total num frames: 199540736. Throughput: 0: 3962.6. Samples: 39053042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:38,968][134211] Avg episode reward: [(0, '6.715')] [2025-01-04 00:51:41,282][134294] Updated weights for policy 0, policy_version 48724 (0.0026) [2025-01-04 00:51:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 15106.6). Total num frames: 199606272. Throughput: 0: 3948.5. Samples: 39072464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:43,968][134211] Avg episode reward: [(0, '6.133')] [2025-01-04 00:51:44,491][134294] Updated weights for policy 0, policy_version 48734 (0.0025) [2025-01-04 00:51:47,499][134294] Updated weights for policy 0, policy_version 48744 (0.0028) [2025-01-04 00:51:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.3, 300 sec: 15106.7). Total num frames: 199671808. Throughput: 0: 3930.2. Samples: 39082444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:48,968][134211] Avg episode reward: [(0, '6.612')] [2025-01-04 00:51:50,711][134294] Updated weights for policy 0, policy_version 48754 (0.0026) [2025-01-04 00:51:53,858][134294] Updated weights for policy 0, policy_version 48764 (0.0024) [2025-01-04 00:51:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.3, 300 sec: 15106.6). Total num frames: 199737344. Throughput: 0: 3669.2. Samples: 39102274. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:53,968][134211] Avg episode reward: [(0, '7.090')] [2025-01-04 00:51:57,031][134294] Updated weights for policy 0, policy_version 48774 (0.0025) [2025-01-04 00:51:58,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15018.7, 300 sec: 15092.7). Total num frames: 199798784. Throughput: 0: 3667.6. Samples: 39120824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:51:58,969][134211] Avg episode reward: [(0, '6.361')] [2025-01-04 00:51:59,968][134294] Updated weights for policy 0, policy_version 48784 (0.0022) [2025-01-04 00:52:02,033][134294] Updated weights for policy 0, policy_version 48794 (0.0012) [2025-01-04 00:52:03,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15155.2, 300 sec: 15189.9). Total num frames: 199892992. Throughput: 0: 3769.0. Samples: 39133926. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:52:03,968][134211] Avg episode reward: [(0, '7.078')] [2025-01-04 00:52:04,202][134294] Updated weights for policy 0, policy_version 48804 (0.0013) [2025-01-04 00:52:06,285][134294] Updated weights for policy 0, policy_version 48814 (0.0014) [2025-01-04 00:52:08,637][134294] Updated weights for policy 0, policy_version 48824 (0.0017) [2025-01-04 00:52:08,968][134211] Fps is (10 sec: 18841.7, 60 sec: 15086.9, 300 sec: 15287.1). Total num frames: 199987200. Throughput: 0: 3813.9. Samples: 39163584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:52:08,969][134211] Avg episode reward: [(0, '7.099')] [2025-01-04 00:52:12,127][134294] Updated weights for policy 0, policy_version 48834 (0.0030) [2025-01-04 00:52:13,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15155.2, 300 sec: 15245.4). Total num frames: 200044544. Throughput: 0: 3526.8. Samples: 39182566. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:52:13,969][134211] Avg episode reward: [(0, '6.748')] [2025-01-04 00:52:15,302][134294] Updated weights for policy 0, policy_version 48844 (0.0029) [2025-01-04 00:52:18,329][134294] Updated weights for policy 0, policy_version 48854 (0.0026) [2025-01-04 00:52:18,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15155.2, 300 sec: 15217.7). Total num frames: 200114176. Throughput: 0: 3523.7. Samples: 39192486. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:52:18,968][134211] Avg episode reward: [(0, '6.365')] [2025-01-04 00:52:21,350][134294] Updated weights for policy 0, policy_version 48864 (0.0025) [2025-01-04 00:52:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 15078.8). Total num frames: 200179712. Throughput: 0: 3542.4. Samples: 39212452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:23,969][134211] Avg episode reward: [(0, '6.715')] [2025-01-04 00:52:24,523][134294] Updated weights for policy 0, policy_version 48874 (0.0026) [2025-01-04 00:52:27,756][134294] Updated weights for policy 0, policy_version 48884 (0.0025) [2025-01-04 00:52:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13994.6, 300 sec: 15037.2). Total num frames: 200249344. Throughput: 0: 3552.4. Samples: 39232320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:28,968][134211] Avg episode reward: [(0, '7.385')] [2025-01-04 00:52:29,914][134294] Updated weights for policy 0, policy_version 48894 (0.0013) [2025-01-04 00:52:31,961][134294] Updated weights for policy 0, policy_version 48904 (0.0013) [2025-01-04 00:52:33,877][134294] Updated weights for policy 0, policy_version 48914 (0.0012) [2025-01-04 00:52:33,968][134211] Fps is (10 sec: 17203.7, 60 sec: 14609.1, 300 sec: 15176.0). Total num frames: 200351744. Throughput: 0: 3659.4. Samples: 39247118. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:33,968][134211] Avg episode reward: [(0, '6.632')] [2025-01-04 00:52:36,298][134294] Updated weights for policy 0, policy_version 48924 (0.0020) [2025-01-04 00:52:38,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14745.6, 300 sec: 15203.8). Total num frames: 200425472. Throughput: 0: 3817.2. Samples: 39274046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:38,968][134211] Avg episode reward: [(0, '6.279')] [2025-01-04 00:52:39,753][134294] Updated weights for policy 0, policy_version 48934 (0.0025) [2025-01-04 00:52:42,840][134294] Updated weights for policy 0, policy_version 48944 (0.0026) [2025-01-04 00:52:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14677.4, 300 sec: 15120.5). Total num frames: 200486912. Throughput: 0: 3822.4. Samples: 39292830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:43,968][134211] Avg episode reward: [(0, '6.760')] [2025-01-04 00:52:45,922][134294] Updated weights for policy 0, policy_version 48954 (0.0026) [2025-01-04 00:52:48,901][134294] Updated weights for policy 0, policy_version 48964 (0.0025) [2025-01-04 00:52:48,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14745.5, 300 sec: 15134.4). Total num frames: 200556544. Throughput: 0: 3757.5. Samples: 39303016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:48,969][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 00:52:51,814][134294] Updated weights for policy 0, policy_version 48974 (0.0026) [2025-01-04 00:52:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 15134.4). Total num frames: 200622080. Throughput: 0: 3555.8. Samples: 39323594. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:53,969][134211] Avg episode reward: [(0, '7.238')] [2025-01-04 00:52:55,281][134294] Updated weights for policy 0, policy_version 48984 (0.0023) [2025-01-04 00:52:57,259][134294] Updated weights for policy 0, policy_version 48994 (0.0013) [2025-01-04 00:52:58,967][134211] Fps is (10 sec: 15565.8, 60 sec: 15223.6, 300 sec: 15092.7). Total num frames: 200712192. Throughput: 0: 3672.5. Samples: 39347828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:52:58,968][134211] Avg episode reward: [(0, '7.418')] [2025-01-04 00:52:59,276][134294] Updated weights for policy 0, policy_version 49004 (0.0014) [2025-01-04 00:53:01,149][134294] Updated weights for policy 0, policy_version 49014 (0.0014) [2025-01-04 00:53:03,285][134294] Updated weights for policy 0, policy_version 49024 (0.0015) [2025-01-04 00:53:03,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15223.4, 300 sec: 15037.2). Total num frames: 200806400. Throughput: 0: 3801.1. Samples: 39363534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:03,968][134211] Avg episode reward: [(0, '6.687')] [2025-01-04 00:53:06,762][134294] Updated weights for policy 0, policy_version 49034 (0.0030) [2025-01-04 00:53:08,968][134211] Fps is (10 sec: 15564.0, 60 sec: 14677.3, 300 sec: 14912.2). Total num frames: 200867840. Throughput: 0: 3838.0. Samples: 39385162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:08,969][134211] Avg episode reward: [(0, '6.945')] [2025-01-04 00:53:10,085][134294] Updated weights for policy 0, policy_version 49044 (0.0028) [2025-01-04 00:53:13,246][134294] Updated weights for policy 0, policy_version 49054 (0.0027) [2025-01-04 00:53:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14813.9, 300 sec: 14940.0). Total num frames: 200933376. Throughput: 0: 3822.4. Samples: 39404330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:13,969][134211] Avg episode reward: [(0, '7.034')] [2025-01-04 00:53:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049056_200933376.pth... [2025-01-04 00:53:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048173_197316608.pth [2025-01-04 00:53:16,306][134294] Updated weights for policy 0, policy_version 49064 (0.0025) [2025-01-04 00:53:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14745.6, 300 sec: 14953.9). Total num frames: 200998912. Throughput: 0: 3717.5. Samples: 39414408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:18,968][134211] Avg episode reward: [(0, '5.809')] [2025-01-04 00:53:19,443][134294] Updated weights for policy 0, policy_version 49074 (0.0025) [2025-01-04 00:53:22,285][134294] Updated weights for policy 0, policy_version 49084 (0.0021) [2025-01-04 00:53:23,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15018.7, 300 sec: 15037.2). Total num frames: 201080832. Throughput: 0: 3567.0. Samples: 39434560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:23,968][134211] Avg episode reward: [(0, '6.716')] [2025-01-04 00:53:24,189][134294] Updated weights for policy 0, policy_version 49094 (0.0014) [2025-01-04 00:53:26,054][134294] Updated weights for policy 0, policy_version 49104 (0.0012) [2025-01-04 00:53:27,965][134294] Updated weights for policy 0, policy_version 49114 (0.0013) [2025-01-04 00:53:28,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15701.4, 300 sec: 15203.8). Total num frames: 201191424. Throughput: 0: 3873.8. Samples: 39467150. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:28,968][134211] Avg episode reward: [(0, '6.583')] [2025-01-04 00:53:29,862][134294] Updated weights for policy 0, policy_version 49124 (0.0012) [2025-01-04 00:53:31,726][134294] Updated weights for policy 0, policy_version 49134 (0.0012) [2025-01-04 00:53:33,868][134294] Updated weights for policy 0, policy_version 49144 (0.0017) [2025-01-04 00:53:33,968][134211] Fps is (10 sec: 21298.8, 60 sec: 15701.3, 300 sec: 15342.6). Total num frames: 201293824. Throughput: 0: 4010.4. Samples: 39483482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:33,968][134211] Avg episode reward: [(0, '6.647')] [2025-01-04 00:53:37,203][134294] Updated weights for policy 0, policy_version 49154 (0.0026) [2025-01-04 00:53:38,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15496.5, 300 sec: 15301.0). Total num frames: 201355264. Throughput: 0: 4073.5. Samples: 39506900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:38,968][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 00:53:40,389][134294] Updated weights for policy 0, policy_version 49164 (0.0024) [2025-01-04 00:53:43,461][134294] Updated weights for policy 0, policy_version 49174 (0.0028) [2025-01-04 00:53:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15564.8, 300 sec: 15148.2). Total num frames: 201420800. Throughput: 0: 3971.7. Samples: 39526556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:43,968][134211] Avg episode reward: [(0, '6.964')] [2025-01-04 00:53:46,487][134294] Updated weights for policy 0, policy_version 49184 (0.0023) [2025-01-04 00:53:48,969][134211] Fps is (10 sec: 13105.5, 60 sec: 15496.3, 300 sec: 15120.4). Total num frames: 201486336. Throughput: 0: 3844.5. Samples: 39536542. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:48,970][134211] Avg episode reward: [(0, '6.857')] [2025-01-04 00:53:49,657][134294] Updated weights for policy 0, policy_version 49194 (0.0027) [2025-01-04 00:53:52,751][134294] Updated weights for policy 0, policy_version 49204 (0.0021) [2025-01-04 00:53:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15564.8, 300 sec: 15134.4). Total num frames: 201555968. Throughput: 0: 3805.5. Samples: 39556410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:53:53,968][134211] Avg episode reward: [(0, '6.400')] [2025-01-04 00:53:55,609][134294] Updated weights for policy 0, policy_version 49214 (0.0025) [2025-01-04 00:53:58,681][134294] Updated weights for policy 0, policy_version 49224 (0.0027) [2025-01-04 00:53:58,969][134211] Fps is (10 sec: 13517.1, 60 sec: 15154.9, 300 sec: 15120.4). Total num frames: 201621504. Throughput: 0: 3834.6. Samples: 39576892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:53:58,969][134211] Avg episode reward: [(0, '6.402')] [2025-01-04 00:54:01,732][134294] Updated weights for policy 0, policy_version 49234 (0.0026) [2025-01-04 00:54:03,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14745.4, 300 sec: 15134.3). Total num frames: 201691136. Throughput: 0: 3835.4. Samples: 39587002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:54:03,969][134211] Avg episode reward: [(0, '6.822')] [2025-01-04 00:54:04,821][134294] Updated weights for policy 0, policy_version 49244 (0.0026) [2025-01-04 00:54:07,088][134294] Updated weights for policy 0, policy_version 49254 (0.0016) [2025-01-04 00:54:08,946][134294] Updated weights for policy 0, policy_version 49264 (0.0013) [2025-01-04 00:54:08,967][134211] Fps is (10 sec: 16386.1, 60 sec: 15291.9, 300 sec: 15148.3). Total num frames: 201785344. Throughput: 0: 3899.6. Samples: 39610040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:54:08,968][134211] Avg episode reward: [(0, '6.821')] [2025-01-04 00:54:11,159][134294] Updated weights for policy 0, policy_version 49274 (0.0018) [2025-01-04 00:54:13,968][134211] Fps is (10 sec: 17204.2, 60 sec: 15496.5, 300 sec: 15051.1). Total num frames: 201863168. Throughput: 0: 3770.2. Samples: 39636808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:54:13,968][134211] Avg episode reward: [(0, '6.476')] [2025-01-04 00:54:14,190][134294] Updated weights for policy 0, policy_version 49284 (0.0027) [2025-01-04 00:54:17,181][134294] Updated weights for policy 0, policy_version 49294 (0.0025) [2025-01-04 00:54:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15496.5, 300 sec: 15051.1). Total num frames: 201928704. Throughput: 0: 3628.3. Samples: 39646754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:54:18,968][134211] Avg episode reward: [(0, '6.513')] [2025-01-04 00:54:20,249][134294] Updated weights for policy 0, policy_version 49304 (0.0027) [2025-01-04 00:54:22,726][134294] Updated weights for policy 0, policy_version 49314 (0.0018) [2025-01-04 00:54:23,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15564.7, 300 sec: 15134.4). Total num frames: 202014720. Throughput: 0: 3575.1. Samples: 39667778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 00:54:23,968][134211] Avg episode reward: [(0, '6.773')] [2025-01-04 00:54:24,949][134294] Updated weights for policy 0, policy_version 49324 (0.0018) [2025-01-04 00:54:28,301][134294] Updated weights for policy 0, policy_version 49334 (0.0026) [2025-01-04 00:54:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14745.6, 300 sec: 15106.6). Total num frames: 202076160. Throughput: 0: 3651.3. Samples: 39690866. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:28,968][134211] Avg episode reward: [(0, '6.151')] [2025-01-04 00:54:31,142][134294] Updated weights for policy 0, policy_version 49344 (0.0021) [2025-01-04 00:54:33,177][134294] Updated weights for policy 0, policy_version 49354 (0.0013) [2025-01-04 00:54:33,967][134211] Fps is (10 sec: 15155.7, 60 sec: 14540.9, 300 sec: 15189.9). Total num frames: 202166272. Throughput: 0: 3671.7. Samples: 39701764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:33,968][134211] Avg episode reward: [(0, '6.419')] [2025-01-04 00:54:35,150][134294] Updated weights for policy 0, policy_version 49364 (0.0014) [2025-01-04 00:54:37,005][134294] Updated weights for policy 0, policy_version 49374 (0.0013) [2025-01-04 00:54:38,921][134294] Updated weights for policy 0, policy_version 49384 (0.0013) [2025-01-04 00:54:38,967][134211] Fps is (10 sec: 20071.0, 60 sec: 15360.1, 300 sec: 15273.2). Total num frames: 202276864. Throughput: 0: 3924.0. Samples: 39732988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:38,968][134211] Avg episode reward: [(0, '7.130')] [2025-01-04 00:54:40,827][134294] Updated weights for policy 0, policy_version 49394 (0.0013) [2025-01-04 00:54:42,915][134294] Updated weights for policy 0, policy_version 49404 (0.0015) [2025-01-04 00:54:43,968][134211] Fps is (10 sec: 20479.7, 60 sec: 15837.9, 300 sec: 15314.9). Total num frames: 202371072. Throughput: 0: 4154.1. Samples: 39763820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:43,968][134211] Avg episode reward: [(0, '6.906')] [2025-01-04 00:54:46,107][134294] Updated weights for policy 0, policy_version 49414 (0.0029) [2025-01-04 00:54:48,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15769.9, 300 sec: 15301.0). Total num frames: 202432512. Throughput: 0: 4142.3. Samples: 39773402. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:48,968][134211] Avg episode reward: [(0, '7.014')] [2025-01-04 00:54:49,566][134294] Updated weights for policy 0, policy_version 49424 (0.0029) [2025-01-04 00:54:52,627][134294] Updated weights for policy 0, policy_version 49434 (0.0028) [2025-01-04 00:54:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15701.3, 300 sec: 15231.6). Total num frames: 202498048. Throughput: 0: 4045.0. Samples: 39792068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:53,968][134211] Avg episode reward: [(0, '6.840')] [2025-01-04 00:54:55,728][134294] Updated weights for policy 0, policy_version 49444 (0.0024) [2025-01-04 00:54:58,762][134294] Updated weights for policy 0, policy_version 49454 (0.0025) [2025-01-04 00:54:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15701.6, 300 sec: 15120.5). Total num frames: 202563584. Throughput: 0: 3897.2. Samples: 39812180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:54:58,969][134211] Avg episode reward: [(0, '6.863')] [2025-01-04 00:55:01,768][134294] Updated weights for policy 0, policy_version 49464 (0.0024) [2025-01-04 00:55:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15633.2, 300 sec: 15120.5). Total num frames: 202629120. Throughput: 0: 3901.0. Samples: 39822300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:03,968][134211] Avg episode reward: [(0, '6.267')] [2025-01-04 00:55:04,962][134294] Updated weights for policy 0, policy_version 49474 (0.0027) [2025-01-04 00:55:08,048][134294] Updated weights for policy 0, policy_version 49484 (0.0026) [2025-01-04 00:55:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.1, 300 sec: 15120.5). Total num frames: 202694656. Throughput: 0: 3870.7. Samples: 39841958. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:08,968][134211] Avg episode reward: [(0, '7.054')] [2025-01-04 00:55:11,067][134294] Updated weights for policy 0, policy_version 49494 (0.0026) [2025-01-04 00:55:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15018.7, 300 sec: 15120.5). Total num frames: 202764288. Throughput: 0: 3803.6. Samples: 39862028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:13,968][134211] Avg episode reward: [(0, '6.842')] [2025-01-04 00:55:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049503_202764288.pth... [2025-01-04 00:55:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048618_199139328.pth [2025-01-04 00:55:14,139][134294] Updated weights for policy 0, policy_version 49504 (0.0027) [2025-01-04 00:55:17,149][134294] Updated weights for policy 0, policy_version 49514 (0.0024) [2025-01-04 00:55:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15086.9, 300 sec: 15134.4). Total num frames: 202833920. Throughput: 0: 3785.7. Samples: 39872120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:18,968][134211] Avg episode reward: [(0, '6.695')] [2025-01-04 00:55:19,779][134294] Updated weights for policy 0, policy_version 49524 (0.0018) [2025-01-04 00:55:21,894][134294] Updated weights for policy 0, policy_version 49534 (0.0016) [2025-01-04 00:55:23,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15087.0, 300 sec: 15148.2). Total num frames: 202919936. Throughput: 0: 3650.2. Samples: 39897248. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:23,968][134211] Avg episode reward: [(0, '7.480')] [2025-01-04 00:55:24,860][134294] Updated weights for policy 0, policy_version 49544 (0.0024) [2025-01-04 00:55:27,909][134294] Updated weights for policy 0, policy_version 49554 (0.0026) [2025-01-04 00:55:28,967][134211] Fps is (10 sec: 15974.6, 60 sec: 15291.8, 300 sec: 15051.1). Total num frames: 202993664. Throughput: 0: 3417.0. Samples: 39917586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:28,968][134211] Avg episode reward: [(0, '6.116')] [2025-01-04 00:55:29,918][134294] Updated weights for policy 0, policy_version 49564 (0.0012) [2025-01-04 00:55:31,790][134294] Updated weights for policy 0, policy_version 49574 (0.0012) [2025-01-04 00:55:33,680][134294] Updated weights for policy 0, policy_version 49584 (0.0014) [2025-01-04 00:55:33,967][134211] Fps is (10 sec: 18022.9, 60 sec: 15564.8, 300 sec: 15148.3). Total num frames: 203100160. Throughput: 0: 3562.9. Samples: 39933732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:33,968][134211] Avg episode reward: [(0, '6.694')] [2025-01-04 00:55:35,562][134294] Updated weights for policy 0, policy_version 49594 (0.0013) [2025-01-04 00:55:38,272][134294] Updated weights for policy 0, policy_version 49604 (0.0024) [2025-01-04 00:55:38,968][134211] Fps is (10 sec: 19250.9, 60 sec: 15155.2, 300 sec: 15217.7). Total num frames: 203186176. Throughput: 0: 3824.6. Samples: 39964176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:38,968][134211] Avg episode reward: [(0, '6.624')] [2025-01-04 00:55:41,479][134294] Updated weights for policy 0, policy_version 49614 (0.0025) [2025-01-04 00:55:43,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14609.0, 300 sec: 15203.8). Total num frames: 203247616. Throughput: 0: 3804.2. Samples: 39983370. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:43,969][134211] Avg episode reward: [(0, '7.111')] [2025-01-04 00:55:44,727][134294] Updated weights for policy 0, policy_version 49624 (0.0029) [2025-01-04 00:55:47,855][134294] Updated weights for policy 0, policy_version 49634 (0.0029) [2025-01-04 00:55:48,968][134211] Fps is (10 sec: 12697.1, 60 sec: 14677.3, 300 sec: 15203.8). Total num frames: 203313152. Throughput: 0: 3791.6. Samples: 39992924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:48,969][134211] Avg episode reward: [(0, '6.944')] [2025-01-04 00:55:50,824][134294] Updated weights for policy 0, policy_version 49644 (0.0027) [2025-01-04 00:55:53,890][134294] Updated weights for policy 0, policy_version 49654 (0.0027) [2025-01-04 00:55:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14745.6, 300 sec: 15203.8). Total num frames: 203382784. Throughput: 0: 3806.6. Samples: 40013254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:55:53,968][134211] Avg episode reward: [(0, '6.590')] [2025-01-04 00:55:56,793][134294] Updated weights for policy 0, policy_version 49664 (0.0027) [2025-01-04 00:55:58,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14745.6, 300 sec: 15134.4). Total num frames: 203448320. Throughput: 0: 3807.0. Samples: 40033344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:55:58,968][134211] Avg episode reward: [(0, '6.433')] [2025-01-04 00:56:00,178][134294] Updated weights for policy 0, policy_version 49674 (0.0025) [2025-01-04 00:56:02,244][134294] Updated weights for policy 0, policy_version 49684 (0.0014) [2025-01-04 00:56:03,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15155.3, 300 sec: 15106.6). Total num frames: 203538432. Throughput: 0: 3812.1. Samples: 40043666. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:03,968][134211] Avg episode reward: [(0, '6.701')] [2025-01-04 00:56:04,197][134294] Updated weights for policy 0, policy_version 49694 (0.0012) [2025-01-04 00:56:06,035][134294] Updated weights for policy 0, policy_version 49704 (0.0014) [2025-01-04 00:56:07,929][134294] Updated weights for policy 0, policy_version 49714 (0.0014) [2025-01-04 00:56:08,968][134211] Fps is (10 sec: 20070.3, 60 sec: 15906.1, 300 sec: 15301.0). Total num frames: 203649024. Throughput: 0: 3978.7. Samples: 40076288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:08,969][134211] Avg episode reward: [(0, '7.068')] [2025-01-04 00:56:10,411][134294] Updated weights for policy 0, policy_version 49724 (0.0021) [2025-01-04 00:56:13,620][134294] Updated weights for policy 0, policy_version 49734 (0.0026) [2025-01-04 00:56:13,969][134211] Fps is (10 sec: 17610.8, 60 sec: 15837.6, 300 sec: 15287.0). Total num frames: 203714560. Throughput: 0: 4044.3. Samples: 40099584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:13,969][134211] Avg episode reward: [(0, '6.892')] [2025-01-04 00:56:16,778][134294] Updated weights for policy 0, policy_version 49744 (0.0027) [2025-01-04 00:56:18,968][134211] Fps is (10 sec: 12697.1, 60 sec: 15701.2, 300 sec: 15134.3). Total num frames: 203776000. Throughput: 0: 3894.1. Samples: 40108970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:18,969][134211] Avg episode reward: [(0, '6.917')] [2025-01-04 00:56:19,945][134294] Updated weights for policy 0, policy_version 49754 (0.0026) [2025-01-04 00:56:22,974][134294] Updated weights for policy 0, policy_version 49764 (0.0026) [2025-01-04 00:56:23,968][134211] Fps is (10 sec: 13108.4, 60 sec: 15428.3, 300 sec: 15037.2). Total num frames: 203845632. Throughput: 0: 3658.3. Samples: 40128798. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:23,968][134211] Avg episode reward: [(0, '7.537')] [2025-01-04 00:56:26,125][134294] Updated weights for policy 0, policy_version 49774 (0.0026) [2025-01-04 00:56:28,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15223.4, 300 sec: 15023.3). Total num frames: 203907072. Throughput: 0: 3649.7. Samples: 40147606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:28,968][134211] Avg episode reward: [(0, '6.574')] [2025-01-04 00:56:29,830][134294] Updated weights for policy 0, policy_version 49784 (0.0027) [2025-01-04 00:56:32,534][134294] Updated weights for policy 0, policy_version 49794 (0.0019) [2025-01-04 00:56:33,967][134211] Fps is (10 sec: 13517.1, 60 sec: 14677.3, 300 sec: 15051.1). Total num frames: 203980800. Throughput: 0: 3626.8. Samples: 40156130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:33,968][134211] Avg episode reward: [(0, '6.679')] [2025-01-04 00:56:34,560][134294] Updated weights for policy 0, policy_version 49804 (0.0014) [2025-01-04 00:56:36,502][134294] Updated weights for policy 0, policy_version 49814 (0.0013) [2025-01-04 00:56:38,412][134294] Updated weights for policy 0, policy_version 49824 (0.0013) [2025-01-04 00:56:38,967][134211] Fps is (10 sec: 18022.7, 60 sec: 15018.7, 300 sec: 15189.9). Total num frames: 204087296. Throughput: 0: 3852.9. Samples: 40186632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:38,968][134211] Avg episode reward: [(0, '6.485')] [2025-01-04 00:56:40,286][134294] Updated weights for policy 0, policy_version 49834 (0.0013) [2025-01-04 00:56:42,830][134294] Updated weights for policy 0, policy_version 49844 (0.0022) [2025-01-04 00:56:43,968][134211] Fps is (10 sec: 19250.5, 60 sec: 15428.3, 300 sec: 15259.3). Total num frames: 204173312. Throughput: 0: 4021.3. Samples: 40214304. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:43,969][134211] Avg episode reward: [(0, '6.544')] [2025-01-04 00:56:46,157][134294] Updated weights for policy 0, policy_version 49854 (0.0031) [2025-01-04 00:56:48,968][134211] Fps is (10 sec: 14745.1, 60 sec: 15360.1, 300 sec: 15245.5). Total num frames: 204234752. Throughput: 0: 4001.4. Samples: 40223730. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:48,968][134211] Avg episode reward: [(0, '5.588')] [2025-01-04 00:56:49,430][134294] Updated weights for policy 0, policy_version 49864 (0.0028) [2025-01-04 00:56:52,598][134294] Updated weights for policy 0, policy_version 49874 (0.0026) [2025-01-04 00:56:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15259.3). Total num frames: 204300288. Throughput: 0: 3703.0. Samples: 40242924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:53,968][134211] Avg episode reward: [(0, '6.521')] [2025-01-04 00:56:55,648][134294] Updated weights for policy 0, policy_version 49884 (0.0026) [2025-01-04 00:56:58,552][134294] Updated weights for policy 0, policy_version 49894 (0.0022) [2025-01-04 00:56:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15360.0, 300 sec: 15176.0). Total num frames: 204369920. Throughput: 0: 3637.3. Samples: 40263260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 00:56:58,968][134211] Avg episode reward: [(0, '6.554')] [2025-01-04 00:57:01,569][134294] Updated weights for policy 0, policy_version 49904 (0.0024) [2025-01-04 00:57:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.3, 300 sec: 15078.8). Total num frames: 204435456. Throughput: 0: 3653.8. Samples: 40273392. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:03,968][134211] Avg episode reward: [(0, '6.302')] [2025-01-04 00:57:04,751][134294] Updated weights for policy 0, policy_version 49914 (0.0024) [2025-01-04 00:57:07,962][134294] Updated weights for policy 0, policy_version 49924 (0.0027) [2025-01-04 00:57:08,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14267.8, 300 sec: 15120.5). Total num frames: 204505088. Throughput: 0: 3644.4. Samples: 40292796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:08,968][134211] Avg episode reward: [(0, '5.944')] [2025-01-04 00:57:09,991][134294] Updated weights for policy 0, policy_version 49934 (0.0012) [2025-01-04 00:57:11,872][134294] Updated weights for policy 0, policy_version 49944 (0.0013) [2025-01-04 00:57:13,777][134294] Updated weights for policy 0, policy_version 49954 (0.0013) [2025-01-04 00:57:13,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15018.9, 300 sec: 15259.3). Total num frames: 204615680. Throughput: 0: 3898.9. Samples: 40323056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:13,968][134211] Avg episode reward: [(0, '6.034')] [2025-01-04 00:57:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049955_204615680.pth... [2025-01-04 00:57:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049056_200933376.pth [2025-01-04 00:57:15,655][134294] Updated weights for policy 0, policy_version 49964 (0.0013) [2025-01-04 00:57:17,524][134294] Updated weights for policy 0, policy_version 49974 (0.0013) [2025-01-04 00:57:18,968][134211] Fps is (10 sec: 20888.5, 60 sec: 15633.1, 300 sec: 15370.4). Total num frames: 204713984. Throughput: 0: 4070.2. Samples: 40339290. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:18,969][134211] Avg episode reward: [(0, '6.121')] [2025-01-04 00:57:20,277][134294] Updated weights for policy 0, policy_version 49984 (0.0024) [2025-01-04 00:57:23,518][134294] Updated weights for policy 0, policy_version 49994 (0.0025) [2025-01-04 00:57:23,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15564.8, 300 sec: 15356.5). Total num frames: 204779520. Throughput: 0: 3916.2. Samples: 40362860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:23,968][134211] Avg episode reward: [(0, '6.486')] [2025-01-04 00:57:26,692][134294] Updated weights for policy 0, policy_version 50004 (0.0023) [2025-01-04 00:57:28,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15633.1, 300 sec: 15231.6). Total num frames: 204845056. Throughput: 0: 3722.7. Samples: 40381826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:28,968][134211] Avg episode reward: [(0, '6.643')] [2025-01-04 00:57:29,879][134294] Updated weights for policy 0, policy_version 50014 (0.0026) [2025-01-04 00:57:32,937][134294] Updated weights for policy 0, policy_version 50024 (0.0026) [2025-01-04 00:57:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.5, 300 sec: 15203.8). Total num frames: 204910592. Throughput: 0: 3731.6. Samples: 40391650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:33,968][134211] Avg episode reward: [(0, '6.580')] [2025-01-04 00:57:35,963][134294] Updated weights for policy 0, policy_version 50034 (0.0027) [2025-01-04 00:57:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.8, 300 sec: 15217.7). Total num frames: 204976128. Throughput: 0: 3761.3. Samples: 40412182. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:38,968][134211] Avg episode reward: [(0, '6.516')] [2025-01-04 00:57:39,094][134294] Updated weights for policy 0, policy_version 50044 (0.0027) [2025-01-04 00:57:42,179][134294] Updated weights for policy 0, policy_version 50054 (0.0025) [2025-01-04 00:57:43,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14609.1, 300 sec: 15231.6). Total num frames: 205049856. Throughput: 0: 3756.8. Samples: 40432316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:43,968][134211] Avg episode reward: [(0, '6.127')] [2025-01-04 00:57:44,395][134294] Updated weights for policy 0, policy_version 50064 (0.0016) [2025-01-04 00:57:46,269][134294] Updated weights for policy 0, policy_version 50074 (0.0014) [2025-01-04 00:57:48,183][134294] Updated weights for policy 0, policy_version 50084 (0.0013) [2025-01-04 00:57:48,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15428.3, 300 sec: 15384.3). Total num frames: 205160448. Throughput: 0: 3891.6. Samples: 40448512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:48,968][134211] Avg episode reward: [(0, '6.773')] [2025-01-04 00:57:50,447][134294] Updated weights for policy 0, policy_version 50094 (0.0019) [2025-01-04 00:57:53,615][134294] Updated weights for policy 0, policy_version 50104 (0.0026) [2025-01-04 00:57:53,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15496.5, 300 sec: 15314.9). Total num frames: 205230080. Throughput: 0: 4056.0. Samples: 40475318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:53,968][134211] Avg episode reward: [(0, '6.700')] [2025-01-04 00:57:56,575][134294] Updated weights for policy 0, policy_version 50114 (0.0026) [2025-01-04 00:57:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 205295616. Throughput: 0: 3816.2. Samples: 40494784. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:57:58,968][134211] Avg episode reward: [(0, '6.892')] [2025-01-04 00:57:59,902][134294] Updated weights for policy 0, policy_version 50124 (0.0027) [2025-01-04 00:58:03,103][134294] Updated weights for policy 0, policy_version 50134 (0.0027) [2025-01-04 00:58:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 205357056. Throughput: 0: 3662.5. Samples: 40504100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:03,968][134211] Avg episode reward: [(0, '6.652')] [2025-01-04 00:58:05,532][134294] Updated weights for policy 0, policy_version 50144 (0.0019) [2025-01-04 00:58:07,880][134294] Updated weights for policy 0, policy_version 50154 (0.0021) [2025-01-04 00:58:08,968][134211] Fps is (10 sec: 14744.8, 60 sec: 15632.9, 300 sec: 15287.1). Total num frames: 205443072. Throughput: 0: 3678.8. Samples: 40528408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:08,969][134211] Avg episode reward: [(0, '6.260')] [2025-01-04 00:58:10,971][134294] Updated weights for policy 0, policy_version 50164 (0.0027) [2025-01-04 00:58:13,777][134294] Updated weights for policy 0, policy_version 50174 (0.0020) [2025-01-04 00:58:13,968][134211] Fps is (10 sec: 15974.5, 60 sec: 15018.7, 300 sec: 15314.9). Total num frames: 205516800. Throughput: 0: 3712.1. Samples: 40548870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:13,968][134211] Avg episode reward: [(0, '6.703')] [2025-01-04 00:58:15,775][134294] Updated weights for policy 0, policy_version 50184 (0.0016) [2025-01-04 00:58:18,699][134294] Updated weights for policy 0, policy_version 50194 (0.0024) [2025-01-04 00:58:18,968][134211] Fps is (10 sec: 15155.8, 60 sec: 14677.4, 300 sec: 15301.0). Total num frames: 205594624. Throughput: 0: 3808.0. Samples: 40563010. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:18,969][134211] Avg episode reward: [(0, '6.160')] [2025-01-04 00:58:21,721][134294] Updated weights for policy 0, policy_version 50204 (0.0024) [2025-01-04 00:58:23,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 15162.1). Total num frames: 205664256. Throughput: 0: 3810.3. Samples: 40583644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:23,968][134211] Avg episode reward: [(0, '6.705')] [2025-01-04 00:58:24,914][134294] Updated weights for policy 0, policy_version 50214 (0.0026) [2025-01-04 00:58:27,632][134294] Updated weights for policy 0, policy_version 50224 (0.0021) [2025-01-04 00:58:28,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14950.4, 300 sec: 15078.8). Total num frames: 205742080. Throughput: 0: 3844.4. Samples: 40605316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:28,968][134211] Avg episode reward: [(0, '6.598')] [2025-01-04 00:58:29,685][134294] Updated weights for policy 0, policy_version 50234 (0.0012) [2025-01-04 00:58:31,761][134294] Updated weights for policy 0, policy_version 50244 (0.0013) [2025-01-04 00:58:33,772][134294] Updated weights for policy 0, policy_version 50254 (0.0015) [2025-01-04 00:58:33,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15496.6, 300 sec: 15203.8). Total num frames: 205840384. Throughput: 0: 3817.1. Samples: 40620280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:58:33,968][134211] Avg episode reward: [(0, '6.198')] [2025-01-04 00:58:36,776][134294] Updated weights for policy 0, policy_version 50264 (0.0025) [2025-01-04 00:58:38,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15496.5, 300 sec: 15203.8). Total num frames: 205905920. Throughput: 0: 3768.7. Samples: 40644910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:58:38,969][134211] Avg episode reward: [(0, '7.354')] [2025-01-04 00:58:40,024][134294] Updated weights for policy 0, policy_version 50274 (0.0026) [2025-01-04 00:58:43,276][134294] Updated weights for policy 0, policy_version 50284 (0.0027) [2025-01-04 00:58:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15359.9, 300 sec: 15203.9). Total num frames: 205971456. Throughput: 0: 3756.6. Samples: 40663834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:58:43,968][134211] Avg episode reward: [(0, '6.795')] [2025-01-04 00:58:46,228][134294] Updated weights for policy 0, policy_version 50294 (0.0025) [2025-01-04 00:58:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 15189.9). Total num frames: 206036992. Throughput: 0: 3771.9. Samples: 40673838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:58:48,968][134211] Avg episode reward: [(0, '7.424')] [2025-01-04 00:58:49,439][134294] Updated weights for policy 0, policy_version 50304 (0.0026) [2025-01-04 00:58:51,987][134294] Updated weights for policy 0, policy_version 50314 (0.0018) [2025-01-04 00:58:53,967][134211] Fps is (10 sec: 15155.7, 60 sec: 14882.2, 300 sec: 15259.4). Total num frames: 206123008. Throughput: 0: 3716.3. Samples: 40695640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:58:53,968][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 00:58:53,985][134294] Updated weights for policy 0, policy_version 50324 (0.0014) [2025-01-04 00:58:55,883][134294] Updated weights for policy 0, policy_version 50334 (0.0014) [2025-01-04 00:58:57,773][134294] Updated weights for policy 0, policy_version 50344 (0.0013) [2025-01-04 00:58:58,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15564.8, 300 sec: 15384.3). Total num frames: 206229504. Throughput: 0: 3970.6. Samples: 40727548. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:58:58,969][134211] Avg episode reward: [(0, '7.156')] [2025-01-04 00:59:00,415][134294] Updated weights for policy 0, policy_version 50354 (0.0021) [2025-01-04 00:59:03,656][134294] Updated weights for policy 0, policy_version 50364 (0.0028) [2025-01-04 00:59:03,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15564.8, 300 sec: 15273.2). Total num frames: 206290944. Throughput: 0: 3894.6. Samples: 40738268. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 00:59:03,968][134211] Avg episode reward: [(0, '6.575')] [2025-01-04 00:59:06,886][134294] Updated weights for policy 0, policy_version 50374 (0.0026) [2025-01-04 00:59:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15223.5, 300 sec: 15231.6). Total num frames: 206356480. Throughput: 0: 3855.0. Samples: 40757118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:08,968][134211] Avg episode reward: [(0, '6.832')] [2025-01-04 00:59:10,057][134294] Updated weights for policy 0, policy_version 50384 (0.0026) [2025-01-04 00:59:13,133][134294] Updated weights for policy 0, policy_version 50394 (0.0025) [2025-01-04 00:59:13,971][134211] Fps is (10 sec: 13103.1, 60 sec: 15086.1, 300 sec: 15231.4). Total num frames: 206422016. Throughput: 0: 3810.9. Samples: 40776820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:13,972][134211] Avg episode reward: [(0, '6.896')] [2025-01-04 00:59:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050396_206422016.pth... [2025-01-04 00:59:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049503_202764288.pth [2025-01-04 00:59:16,244][134294] Updated weights for policy 0, policy_version 50404 (0.0025) [2025-01-04 00:59:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14950.4, 300 sec: 15176.0). Total num frames: 206491648. 
Throughput: 0: 3695.7. Samples: 40786584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:18,968][134211] Avg episode reward: [(0, '7.127')] [2025-01-04 00:59:19,209][134294] Updated weights for policy 0, policy_version 50414 (0.0025) [2025-01-04 00:59:22,038][134294] Updated weights for policy 0, policy_version 50424 (0.0023) [2025-01-04 00:59:23,924][134294] Updated weights for policy 0, policy_version 50434 (0.0012) [2025-01-04 00:59:23,967][134211] Fps is (10 sec: 15570.2, 60 sec: 15223.5, 300 sec: 15259.4). Total num frames: 206577664. Throughput: 0: 3629.4. Samples: 40808234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:23,968][134211] Avg episode reward: [(0, '6.842')] [2025-01-04 00:59:25,854][134294] Updated weights for policy 0, policy_version 50444 (0.0014) [2025-01-04 00:59:28,693][134294] Updated weights for policy 0, policy_version 50454 (0.0023) [2025-01-04 00:59:28,981][134211] Fps is (10 sec: 16770.5, 60 sec: 15288.2, 300 sec: 15230.9). Total num frames: 206659584. Throughput: 0: 3834.1. Samples: 40836420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:28,984][134211] Avg episode reward: [(0, '7.068')] [2025-01-04 00:59:31,832][134294] Updated weights for policy 0, policy_version 50464 (0.0028) [2025-01-04 00:59:33,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14745.6, 300 sec: 15078.8). Total num frames: 206725120. Throughput: 0: 3829.0. Samples: 40846144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 00:59:33,968][134211] Avg episode reward: [(0, '7.471')] [2025-01-04 00:59:35,179][134294] Updated weights for policy 0, policy_version 50474 (0.0024) [2025-01-04 00:59:37,403][134294] Updated weights for policy 0, policy_version 50484 (0.0016) [2025-01-04 00:59:38,968][134211] Fps is (10 sec: 15586.3, 60 sec: 15155.3, 300 sec: 15065.0). Total num frames: 206815232. Throughput: 0: 3825.1. Samples: 40867770. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:59:38,968][134211] Avg episode reward: [(0, '7.065')] [2025-01-04 00:59:39,252][134294] Updated weights for policy 0, policy_version 50494 (0.0013) [2025-01-04 00:59:41,155][134294] Updated weights for policy 0, policy_version 50504 (0.0014) [2025-01-04 00:59:43,058][134294] Updated weights for policy 0, policy_version 50514 (0.0013) [2025-01-04 00:59:43,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15837.9, 300 sec: 15217.7). Total num frames: 206921728. Throughput: 0: 3840.1. Samples: 40900354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:59:43,968][134211] Avg episode reward: [(0, '6.917')] [2025-01-04 00:59:45,572][134294] Updated weights for policy 0, policy_version 50524 (0.0022) [2025-01-04 00:59:48,677][134294] Updated weights for policy 0, policy_version 50534 (0.0028) [2025-01-04 00:59:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15837.9, 300 sec: 15217.7). Total num frames: 206987264. Throughput: 0: 3858.9. Samples: 40911916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:59:48,968][134211] Avg episode reward: [(0, '6.927')] [2025-01-04 00:59:51,882][134294] Updated weights for policy 0, policy_version 50544 (0.0027) [2025-01-04 00:59:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.4, 300 sec: 15217.7). Total num frames: 207052800. Throughput: 0: 3868.8. Samples: 40931216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:59:53,968][134211] Avg episode reward: [(0, '6.451')] [2025-01-04 00:59:55,027][134294] Updated weights for policy 0, policy_version 50554 (0.0025) [2025-01-04 00:59:58,115][134294] Updated weights for policy 0, policy_version 50564 (0.0028) [2025-01-04 00:59:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.9, 300 sec: 15217.7). Total num frames: 207118336. Throughput: 0: 3862.0. Samples: 40950596. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 00:59:58,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 01:00:01,306][134294] Updated weights for policy 0, policy_version 50574 (0.0026) [2025-01-04 01:00:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 15217.7). Total num frames: 207183872. Throughput: 0: 3863.0. Samples: 40960422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:03,968][134211] Avg episode reward: [(0, '6.903')] [2025-01-04 01:00:04,589][134294] Updated weights for policy 0, policy_version 50584 (0.0024) [2025-01-04 01:00:07,524][134294] Updated weights for policy 0, policy_version 50594 (0.0024) [2025-01-04 01:00:08,967][134211] Fps is (10 sec: 14336.4, 60 sec: 15087.0, 300 sec: 15245.5). Total num frames: 207261696. Throughput: 0: 3821.0. Samples: 40980178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:08,968][134211] Avg episode reward: [(0, '6.721')] [2025-01-04 01:00:09,431][134294] Updated weights for policy 0, policy_version 50604 (0.0013) [2025-01-04 01:00:11,332][134294] Updated weights for policy 0, policy_version 50614 (0.0015) [2025-01-04 01:00:13,220][134294] Updated weights for policy 0, policy_version 50624 (0.0017) [2025-01-04 01:00:13,967][134211] Fps is (10 sec: 18842.3, 60 sec: 15838.8, 300 sec: 15384.3). Total num frames: 207372288. Throughput: 0: 3903.1. Samples: 41012006. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:13,968][134211] Avg episode reward: [(0, '6.787')] [2025-01-04 01:00:15,098][134294] Updated weights for policy 0, policy_version 50634 (0.0014) [2025-01-04 01:00:17,894][134294] Updated weights for policy 0, policy_version 50644 (0.0024) [2025-01-04 01:00:18,968][134211] Fps is (10 sec: 18841.3, 60 sec: 15974.4, 300 sec: 15356.5). Total num frames: 207450112. Throughput: 0: 4017.6. Samples: 41026936. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:18,969][134211] Avg episode reward: [(0, '6.466')] [2025-01-04 01:00:21,204][134294] Updated weights for policy 0, policy_version 50654 (0.0027) [2025-01-04 01:00:23,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15564.7, 300 sec: 15314.9). Total num frames: 207511552. Throughput: 0: 3963.5. Samples: 41046128. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:23,968][134211] Avg episode reward: [(0, '6.723')] [2025-01-04 01:00:24,373][134294] Updated weights for policy 0, policy_version 50664 (0.0028) [2025-01-04 01:00:27,484][134294] Updated weights for policy 0, policy_version 50674 (0.0025) [2025-01-04 01:00:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15226.9, 300 sec: 15162.1). Total num frames: 207572992. Throughput: 0: 3661.2. Samples: 41065108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:28,968][134211] Avg episode reward: [(0, '6.972')] [2025-01-04 01:00:31,064][134294] Updated weights for policy 0, policy_version 50684 (0.0031) [2025-01-04 01:00:33,637][134294] Updated weights for policy 0, policy_version 50694 (0.0015) [2025-01-04 01:00:33,969][134211] Fps is (10 sec: 13515.7, 60 sec: 15359.8, 300 sec: 15120.4). Total num frames: 207646720. Throughput: 0: 3597.7. Samples: 41073814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:33,969][134211] Avg episode reward: [(0, '6.625')] [2025-01-04 01:00:35,630][134294] Updated weights for policy 0, policy_version 50704 (0.0013) [2025-01-04 01:00:37,526][134294] Updated weights for policy 0, policy_version 50714 (0.0013) [2025-01-04 01:00:38,967][134211] Fps is (10 sec: 18023.0, 60 sec: 15633.1, 300 sec: 15273.2). Total num frames: 207753216. Throughput: 0: 3805.6. Samples: 41102466. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:00:38,968][134211] Avg episode reward: [(0, '7.394')] [2025-01-04 01:00:39,536][134294] Updated weights for policy 0, policy_version 50724 (0.0015) [2025-01-04 01:00:42,484][134294] Updated weights for policy 0, policy_version 50734 (0.0023) [2025-01-04 01:00:43,968][134211] Fps is (10 sec: 17613.4, 60 sec: 15018.5, 300 sec: 15287.1). Total num frames: 207822848. Throughput: 0: 3919.2. Samples: 41126962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:00:43,969][134211] Avg episode reward: [(0, '6.851')] [2025-01-04 01:00:45,716][134294] Updated weights for policy 0, policy_version 50744 (0.0025) [2025-01-04 01:00:48,637][134294] Updated weights for policy 0, policy_version 50754 (0.0027) [2025-01-04 01:00:48,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15018.7, 300 sec: 15273.2). Total num frames: 207888384. Throughput: 0: 3918.9. Samples: 41136772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:00:48,968][134211] Avg episode reward: [(0, '7.062')] [2025-01-04 01:00:51,792][134294] Updated weights for policy 0, policy_version 50764 (0.0027) [2025-01-04 01:00:53,968][134211] Fps is (10 sec: 13517.7, 60 sec: 15087.0, 300 sec: 15287.1). Total num frames: 207958016. Throughput: 0: 3929.8. Samples: 41157018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:00:53,968][134211] Avg episode reward: [(0, '6.460')] [2025-01-04 01:00:54,783][134294] Updated weights for policy 0, policy_version 50774 (0.0026) [2025-01-04 01:00:57,837][134294] Updated weights for policy 0, policy_version 50784 (0.0026) [2025-01-04 01:00:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 15203.8). Total num frames: 208023552. Throughput: 0: 3668.5. Samples: 41177088. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:00:58,968][134211] Avg episode reward: [(0, '7.481')] [2025-01-04 01:01:00,868][134294] Updated weights for policy 0, policy_version 50794 (0.0025) [2025-01-04 01:01:03,905][134294] Updated weights for policy 0, policy_version 50804 (0.0023) [2025-01-04 01:01:03,969][134211] Fps is (10 sec: 13514.9, 60 sec: 15154.9, 300 sec: 15064.9). Total num frames: 208093184. Throughput: 0: 3563.9. Samples: 41187316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:01:03,970][134211] Avg episode reward: [(0, '7.355')] [2025-01-04 01:01:06,087][134294] Updated weights for policy 0, policy_version 50814 (0.0017) [2025-01-04 01:01:07,997][134294] Updated weights for policy 0, policy_version 50824 (0.0013) [2025-01-04 01:01:08,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15496.5, 300 sec: 15176.1). Total num frames: 208191488. Throughput: 0: 3698.7. Samples: 41212568. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:01:08,968][134211] Avg episode reward: [(0, '7.405')] [2025-01-04 01:01:09,906][134294] Updated weights for policy 0, policy_version 50834 (0.0012) [2025-01-04 01:01:12,027][134294] Updated weights for policy 0, policy_version 50844 (0.0016) [2025-01-04 01:01:13,968][134211] Fps is (10 sec: 18434.0, 60 sec: 15086.8, 300 sec: 15259.3). Total num frames: 208277504. Throughput: 0: 3913.9. Samples: 41241232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:13,969][134211] Avg episode reward: [(0, '7.046')] [2025-01-04 01:01:13,988][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050850_208281600.pth... 
[2025-01-04 01:01:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049955_204615680.pth [2025-01-04 01:01:15,271][134294] Updated weights for policy 0, policy_version 50854 (0.0028) [2025-01-04 01:01:18,403][134294] Updated weights for policy 0, policy_version 50864 (0.0028) [2025-01-04 01:01:18,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14882.1, 300 sec: 15245.5). Total num frames: 208343040. Throughput: 0: 3928.1. Samples: 41250574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:18,968][134211] Avg episode reward: [(0, '6.407')] [2025-01-04 01:01:21,581][134294] Updated weights for policy 0, policy_version 50874 (0.0026) [2025-01-04 01:01:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14950.4, 300 sec: 15259.3). Total num frames: 208408576. Throughput: 0: 3723.5. Samples: 41270026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:23,969][134211] Avg episode reward: [(0, '7.192')] [2025-01-04 01:01:24,736][134294] Updated weights for policy 0, policy_version 50884 (0.0029) [2025-01-04 01:01:27,571][134294] Updated weights for policy 0, policy_version 50894 (0.0021) [2025-01-04 01:01:28,967][134211] Fps is (10 sec: 14745.9, 60 sec: 15291.8, 300 sec: 15287.1). Total num frames: 208490496. Throughput: 0: 3664.4. Samples: 41291858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:28,968][134211] Avg episode reward: [(0, '6.558')] [2025-01-04 01:01:29,509][134294] Updated weights for policy 0, policy_version 50904 (0.0013) [2025-01-04 01:01:31,326][134294] Updated weights for policy 0, policy_version 50914 (0.0014) [2025-01-04 01:01:33,232][134294] Updated weights for policy 0, policy_version 50924 (0.0012) [2025-01-04 01:01:33,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15906.3, 300 sec: 15301.0). Total num frames: 208601088. Throughput: 0: 3810.7. Samples: 41308252. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:33,968][134211] Avg episode reward: [(0, '6.530')] [2025-01-04 01:01:35,098][134294] Updated weights for policy 0, policy_version 50934 (0.0012) [2025-01-04 01:01:37,136][134294] Updated weights for policy 0, policy_version 50944 (0.0015) [2025-01-04 01:01:38,970][134211] Fps is (10 sec: 19656.1, 60 sec: 15564.2, 300 sec: 15300.9). Total num frames: 208687104. Throughput: 0: 4068.4. Samples: 41340104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:01:38,971][134211] Avg episode reward: [(0, '7.615')] [2025-01-04 01:01:40,309][134294] Updated weights for policy 0, policy_version 50954 (0.0028) [2025-01-04 01:01:43,545][134294] Updated weights for policy 0, policy_version 50964 (0.0028) [2025-01-04 01:01:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15496.7, 300 sec: 15314.9). Total num frames: 208752640. Throughput: 0: 4050.7. Samples: 41359370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:01:43,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-04 01:01:46,555][134294] Updated weights for policy 0, policy_version 50974 (0.0026) [2025-01-04 01:01:48,968][134211] Fps is (10 sec: 13110.0, 60 sec: 15496.5, 300 sec: 15314.9). Total num frames: 208818176. Throughput: 0: 4043.3. Samples: 41369260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:01:48,968][134211] Avg episode reward: [(0, '6.942')] [2025-01-04 01:01:49,742][134294] Updated weights for policy 0, policy_version 50984 (0.0024) [2025-01-04 01:01:53,029][134294] Updated weights for policy 0, policy_version 50994 (0.0028) [2025-01-04 01:01:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15360.0, 300 sec: 15287.1). Total num frames: 208879616. Throughput: 0: 3911.7. Samples: 41388596. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:01:53,968][134211] Avg episode reward: [(0, '6.700')] [2025-01-04 01:01:56,128][134294] Updated weights for policy 0, policy_version 51004 (0.0024) [2025-01-04 01:01:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15360.0, 300 sec: 15287.1). Total num frames: 208945152. Throughput: 0: 3699.9. Samples: 41407728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:01:58,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 01:01:59,411][134294] Updated weights for policy 0, policy_version 51014 (0.0025) [2025-01-04 01:02:02,426][134294] Updated weights for policy 0, policy_version 51024 (0.0025) [2025-01-04 01:02:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15292.0, 300 sec: 15273.2). Total num frames: 209010688. Throughput: 0: 3710.4. Samples: 41417544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:02:03,969][134211] Avg episode reward: [(0, '6.494')] [2025-01-04 01:02:05,262][134294] Updated weights for policy 0, policy_version 51034 (0.0021) [2025-01-04 01:02:07,232][134294] Updated weights for policy 0, policy_version 51044 (0.0014) [2025-01-04 01:02:08,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15291.7, 300 sec: 15231.6). Total num frames: 209108992. Throughput: 0: 3828.5. Samples: 41442306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:02:08,968][134211] Avg episode reward: [(0, '6.154')] [2025-01-04 01:02:09,147][134294] Updated weights for policy 0, policy_version 51054 (0.0015) [2025-01-04 01:02:11,043][134294] Updated weights for policy 0, policy_version 51064 (0.0015) [2025-01-04 01:02:13,158][134294] Updated weights for policy 0, policy_version 51074 (0.0017) [2025-01-04 01:02:13,968][134211] Fps is (10 sec: 19661.4, 60 sec: 15496.6, 300 sec: 15231.6). Total num frames: 209207296. Throughput: 0: 4027.3. Samples: 41473090. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:02:13,968][134211] Avg episode reward: [(0, '6.813')] [2025-01-04 01:02:16,408][134294] Updated weights for policy 0, policy_version 51084 (0.0028) [2025-01-04 01:02:18,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15428.2, 300 sec: 15217.7). Total num frames: 209268736. Throughput: 0: 3871.8. Samples: 41482482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:18,968][134211] Avg episode reward: [(0, '6.447')] [2025-01-04 01:02:19,693][134294] Updated weights for policy 0, policy_version 51094 (0.0028) [2025-01-04 01:02:22,889][134294] Updated weights for policy 0, policy_version 51104 (0.0027) [2025-01-04 01:02:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 209334272. Throughput: 0: 3588.3. Samples: 41501572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:23,968][134211] Avg episode reward: [(0, '6.731')] [2025-01-04 01:02:25,873][134294] Updated weights for policy 0, policy_version 51114 (0.0026) [2025-01-04 01:02:28,846][134294] Updated weights for policy 0, policy_version 51124 (0.0025) [2025-01-04 01:02:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.4, 300 sec: 15231.6). Total num frames: 209403904. Throughput: 0: 3613.2. Samples: 41521964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:28,968][134211] Avg episode reward: [(0, '7.030')] [2025-01-04 01:02:32,237][134294] Updated weights for policy 0, policy_version 51134 (0.0024) [2025-01-04 01:02:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 15217.7). Total num frames: 209465344. Throughput: 0: 3598.3. Samples: 41531184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:33,968][134211] Avg episode reward: [(0, '6.501')] [2025-01-04 01:02:34,830][134294] Updated weights for policy 0, policy_version 51144 (0.0016) [2025-01-04 01:02:36,834][134294] Updated weights for policy 0, policy_version 51154 (0.0013) [2025-01-04 01:02:38,794][134294] Updated weights for policy 0, policy_version 51164 (0.0013) [2025-01-04 01:02:38,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14677.9, 300 sec: 15314.9). Total num frames: 209567744. Throughput: 0: 3732.4. Samples: 41556552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:38,968][134211] Avg episode reward: [(0, '5.931')] [2025-01-04 01:02:40,654][134294] Updated weights for policy 0, policy_version 51174 (0.0013) [2025-01-04 01:02:43,715][134294] Updated weights for policy 0, policy_version 51184 (0.0025) [2025-01-04 01:02:43,968][134211] Fps is (10 sec: 18431.8, 60 sec: 14950.4, 300 sec: 15217.7). Total num frames: 209649664. Throughput: 0: 3915.7. Samples: 41583934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:02:43,969][134211] Avg episode reward: [(0, '6.565')] [2025-01-04 01:02:46,784][134294] Updated weights for policy 0, policy_version 51194 (0.0026) [2025-01-04 01:02:48,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14950.4, 300 sec: 15203.8). Total num frames: 209715200. Throughput: 0: 3918.2. Samples: 41593862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:02:48,968][134211] Avg episode reward: [(0, '6.888')] [2025-01-04 01:02:50,087][134294] Updated weights for policy 0, policy_version 51204 (0.0027) [2025-01-04 01:02:53,121][134294] Updated weights for policy 0, policy_version 51214 (0.0027) [2025-01-04 01:02:53,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15018.6, 300 sec: 15203.8). Total num frames: 209780736. Throughput: 0: 3800.1. Samples: 41613312. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:02:53,969][134211] Avg episode reward: [(0, '7.478')] [2025-01-04 01:02:56,070][134294] Updated weights for policy 0, policy_version 51224 (0.0027) [2025-01-04 01:02:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15087.0, 300 sec: 15231.6). Total num frames: 209850368. Throughput: 0: 3559.1. Samples: 41633248. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:02:58,968][134211] Avg episode reward: [(0, '6.345')] [2025-01-04 01:02:59,294][134294] Updated weights for policy 0, policy_version 51234 (0.0025) [2025-01-04 01:03:01,879][134294] Updated weights for policy 0, policy_version 51244 (0.0019) [2025-01-04 01:03:03,829][134294] Updated weights for policy 0, policy_version 51254 (0.0012) [2025-01-04 01:03:03,968][134211] Fps is (10 sec: 15565.8, 60 sec: 15428.4, 300 sec: 15231.6). Total num frames: 209936384. Throughput: 0: 3575.3. Samples: 41643368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:03,968][134211] Avg episode reward: [(0, '7.016')] [2025-01-04 01:03:05,731][134294] Updated weights for policy 0, policy_version 51264 (0.0014) [2025-01-04 01:03:07,558][134294] Updated weights for policy 0, policy_version 51274 (0.0012) [2025-01-04 01:03:08,968][134211] Fps is (10 sec: 19660.0, 60 sec: 15632.9, 300 sec: 15356.5). Total num frames: 210046976. Throughput: 0: 3868.2. Samples: 41675640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:08,969][134211] Avg episode reward: [(0, '7.113')] [2025-01-04 01:03:09,503][134294] Updated weights for policy 0, policy_version 51284 (0.0014) [2025-01-04 01:03:11,833][134294] Updated weights for policy 0, policy_version 51294 (0.0019) [2025-01-04 01:03:13,968][134211] Fps is (10 sec: 18840.9, 60 sec: 15291.7, 300 sec: 15356.5). Total num frames: 210124800. Throughput: 0: 4011.2. Samples: 41702468. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:13,969][134211] Avg episode reward: [(0, '7.327')] [2025-01-04 01:03:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051300_210124800.pth... [2025-01-04 01:03:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050396_206422016.pth [2025-01-04 01:03:15,126][134294] Updated weights for policy 0, policy_version 51304 (0.0033) [2025-01-04 01:03:18,372][134294] Updated weights for policy 0, policy_version 51314 (0.0027) [2025-01-04 01:03:18,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15359.9, 300 sec: 15342.6). Total num frames: 210190336. Throughput: 0: 4011.5. Samples: 41711702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:18,969][134211] Avg episode reward: [(0, '6.920')] [2025-01-04 01:03:21,523][134294] Updated weights for policy 0, policy_version 51324 (0.0027) [2025-01-04 01:03:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15287.1). Total num frames: 210251776. Throughput: 0: 3874.5. Samples: 41730908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:23,968][134211] Avg episode reward: [(0, '6.082')] [2025-01-04 01:03:25,074][134294] Updated weights for policy 0, policy_version 51334 (0.0025) [2025-01-04 01:03:28,304][134294] Updated weights for policy 0, policy_version 51344 (0.0027) [2025-01-04 01:03:28,968][134211] Fps is (10 sec: 11878.8, 60 sec: 15086.9, 300 sec: 15148.3). Total num frames: 210309120. Throughput: 0: 3668.3. Samples: 41749008. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:28,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 01:03:31,126][134294] Updated weights for policy 0, policy_version 51354 (0.0020) [2025-01-04 01:03:33,355][134294] Updated weights for policy 0, policy_version 51364 (0.0015) [2025-01-04 01:03:33,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15496.6, 300 sec: 15217.7). Total num frames: 210395136. Throughput: 0: 3692.1. Samples: 41760008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:33,968][134211] Avg episode reward: [(0, '6.780')] [2025-01-04 01:03:36,321][134294] Updated weights for policy 0, policy_version 51374 (0.0027) [2025-01-04 01:03:38,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14882.0, 300 sec: 15217.7). Total num frames: 210460672. Throughput: 0: 3766.7. Samples: 41782812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:38,969][134211] Avg episode reward: [(0, '6.178')] [2025-01-04 01:03:39,553][134294] Updated weights for policy 0, policy_version 51384 (0.0026) [2025-01-04 01:03:41,989][134294] Updated weights for policy 0, policy_version 51394 (0.0015) [2025-01-04 01:03:43,858][134294] Updated weights for policy 0, policy_version 51404 (0.0012) [2025-01-04 01:03:43,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15018.7, 300 sec: 15301.0). Total num frames: 210550784. Throughput: 0: 3862.5. Samples: 41807062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:43,968][134211] Avg episode reward: [(0, '6.335')] [2025-01-04 01:03:45,767][134294] Updated weights for policy 0, policy_version 51414 (0.0013) [2025-01-04 01:03:47,647][134294] Updated weights for policy 0, policy_version 51424 (0.0013) [2025-01-04 01:03:48,967][134211] Fps is (10 sec: 19661.7, 60 sec: 15701.4, 300 sec: 15370.4). Total num frames: 210657280. Throughput: 0: 3997.2. Samples: 41823240. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:03:48,968][134211] Avg episode reward: [(0, '6.938')] [2025-01-04 01:03:49,557][134294] Updated weights for policy 0, policy_version 51434 (0.0013) [2025-01-04 01:03:52,003][134294] Updated weights for policy 0, policy_version 51444 (0.0020) [2025-01-04 01:03:53,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15974.6, 300 sec: 15287.1). Total num frames: 210739200. Throughput: 0: 3926.4. Samples: 41852326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:03:53,968][134211] Avg episode reward: [(0, '6.803')] [2025-01-04 01:03:55,202][134294] Updated weights for policy 0, policy_version 51454 (0.0026) [2025-01-04 01:03:58,292][134294] Updated weights for policy 0, policy_version 51464 (0.0026) [2025-01-04 01:03:58,968][134211] Fps is (10 sec: 14744.8, 60 sec: 15906.0, 300 sec: 15301.0). Total num frames: 210804736. Throughput: 0: 3766.7. Samples: 41871972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:03:58,970][134211] Avg episode reward: [(0, '6.281')] [2025-01-04 01:04:01,453][134294] Updated weights for policy 0, policy_version 51474 (0.0026) [2025-01-04 01:04:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.5, 300 sec: 15287.1). Total num frames: 210866176. Throughput: 0: 3777.1. Samples: 41881668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:03,968][134211] Avg episode reward: [(0, '6.341')] [2025-01-04 01:04:04,824][134294] Updated weights for policy 0, policy_version 51484 (0.0026) [2025-01-04 01:04:07,975][134294] Updated weights for policy 0, policy_version 51494 (0.0026) [2025-01-04 01:04:08,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14745.5, 300 sec: 15287.2). Total num frames: 210931712. Throughput: 0: 3770.3. Samples: 41900572. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:08,969][134211] Avg episode reward: [(0, '6.799')] [2025-01-04 01:04:10,963][134294] Updated weights for policy 0, policy_version 51504 (0.0024) [2025-01-04 01:04:13,824][134294] Updated weights for policy 0, policy_version 51514 (0.0025) [2025-01-04 01:04:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 15287.1). Total num frames: 211001344. Throughput: 0: 3827.6. Samples: 41921252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:13,968][134211] Avg episode reward: [(0, '7.374')] [2025-01-04 01:04:16,854][134294] Updated weights for policy 0, policy_version 51524 (0.0029) [2025-01-04 01:04:18,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14609.1, 300 sec: 15217.7). Total num frames: 211066880. Throughput: 0: 3807.2. Samples: 41931332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:18,968][134211] Avg episode reward: [(0, '6.654')] [2025-01-04 01:04:19,944][134294] Updated weights for policy 0, policy_version 51534 (0.0025) [2025-01-04 01:04:22,453][134294] Updated weights for policy 0, policy_version 51544 (0.0018) [2025-01-04 01:04:23,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15086.9, 300 sec: 15246.2). Total num frames: 211156992. Throughput: 0: 3780.6. Samples: 41952940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:23,968][134211] Avg episode reward: [(0, '6.215')] [2025-01-04 01:04:24,311][134294] Updated weights for policy 0, policy_version 51554 (0.0013) [2025-01-04 01:04:26,193][134294] Updated weights for policy 0, policy_version 51564 (0.0015) [2025-01-04 01:04:28,063][134294] Updated weights for policy 0, policy_version 51574 (0.0015) [2025-01-04 01:04:28,968][134211] Fps is (10 sec: 19661.1, 60 sec: 15906.2, 300 sec: 15384.3). Total num frames: 211263488. Throughput: 0: 3969.3. Samples: 41985682. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:28,968][134211] Avg episode reward: [(0, '6.469')] [2025-01-04 01:04:30,132][134294] Updated weights for policy 0, policy_version 51584 (0.0016) [2025-01-04 01:04:33,617][134294] Updated weights for policy 0, policy_version 51594 (0.0030) [2025-01-04 01:04:33,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15564.7, 300 sec: 15301.0). Total num frames: 211329024. Throughput: 0: 3898.4. Samples: 41998668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:33,969][134211] Avg episode reward: [(0, '6.722')] [2025-01-04 01:04:37,490][134294] Updated weights for policy 0, policy_version 51604 (0.0028) [2025-01-04 01:04:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15428.4, 300 sec: 15134.4). Total num frames: 211386368. Throughput: 0: 3614.4. Samples: 42014972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:38,968][134211] Avg episode reward: [(0, '6.389')] [2025-01-04 01:04:40,122][134294] Updated weights for policy 0, policy_version 51614 (0.0017) [2025-01-04 01:04:42,898][134294] Updated weights for policy 0, policy_version 51624 (0.0023) [2025-01-04 01:04:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.4, 300 sec: 15176.0). Total num frames: 211464192. Throughput: 0: 3666.2. Samples: 42036950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:43,969][134211] Avg episode reward: [(0, '5.735')] [2025-01-04 01:04:46,000][134294] Updated weights for policy 0, policy_version 51634 (0.0026) [2025-01-04 01:04:48,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14540.8, 300 sec: 15176.0). Total num frames: 211529728. Throughput: 0: 3673.3. Samples: 42046968. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:48,968][134211] Avg episode reward: [(0, '6.486')] [2025-01-04 01:04:49,117][134294] Updated weights for policy 0, policy_version 51644 (0.0029) [2025-01-04 01:04:51,133][134294] Updated weights for policy 0, policy_version 51654 (0.0014) [2025-01-04 01:04:53,889][134294] Updated weights for policy 0, policy_version 51664 (0.0022) [2025-01-04 01:04:53,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14609.1, 300 sec: 15245.5). Total num frames: 211615744. Throughput: 0: 3790.9. Samples: 42071162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:04:53,968][134211] Avg episode reward: [(0, '5.969')] [2025-01-04 01:04:56,951][134294] Updated weights for policy 0, policy_version 51674 (0.0026) [2025-01-04 01:04:58,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.2, 300 sec: 15245.5). Total num frames: 211681280. Throughput: 0: 3780.1. Samples: 42091354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:04:58,968][134211] Avg episode reward: [(0, '5.858')] [2025-01-04 01:05:00,037][134294] Updated weights for policy 0, policy_version 51684 (0.0028) [2025-01-04 01:05:02,756][134294] Updated weights for policy 0, policy_version 51694 (0.0023) [2025-01-04 01:05:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14950.4, 300 sec: 15259.3). Total num frames: 211763200. Throughput: 0: 3788.5. Samples: 42101814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:05:03,968][134211] Avg episode reward: [(0, '7.100')] [2025-01-04 01:05:04,685][134294] Updated weights for policy 0, policy_version 51704 (0.0013) [2025-01-04 01:05:06,535][134294] Updated weights for policy 0, policy_version 51714 (0.0014) [2025-01-04 01:05:08,440][134294] Updated weights for policy 0, policy_version 51724 (0.0014) [2025-01-04 01:05:08,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15633.3, 300 sec: 15245.4). Total num frames: 211869696. Throughput: 0: 3975.9. Samples: 42131854. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:05:08,968][134211] Avg episode reward: [(0, '6.317')] [2025-01-04 01:05:10,353][134294] Updated weights for policy 0, policy_version 51734 (0.0013) [2025-01-04 01:05:13,464][134294] Updated weights for policy 0, policy_version 51744 (0.0025) [2025-01-04 01:05:13,968][134211] Fps is (10 sec: 18431.4, 60 sec: 15769.6, 300 sec: 15245.4). Total num frames: 211947520. Throughput: 0: 3832.7. Samples: 42158156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:05:13,969][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 01:05:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051745_211947520.pth... [2025-01-04 01:05:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050850_208281600.pth [2025-01-04 01:05:16,849][134294] Updated weights for policy 0, policy_version 51754 (0.0029) [2025-01-04 01:05:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15701.3, 300 sec: 15245.5). Total num frames: 212008960. Throughput: 0: 3743.3. Samples: 42167118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:05:18,968][134211] Avg episode reward: [(0, '6.575')] [2025-01-04 01:05:20,139][134294] Updated weights for policy 0, policy_version 51764 (0.0027) [2025-01-04 01:05:23,286][134294] Updated weights for policy 0, policy_version 51774 (0.0028) [2025-01-04 01:05:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15291.8, 300 sec: 15259.3). Total num frames: 212074496. Throughput: 0: 3810.0. Samples: 42186422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:05:23,968][134211] Avg episode reward: [(0, '6.411')] [2025-01-04 01:05:26,196][134294] Updated weights for policy 0, policy_version 51784 (0.0025) [2025-01-04 01:05:28,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14608.8, 300 sec: 15231.6). Total num frames: 212140032. 
Throughput: 0: 3763.0. Samples: 42206290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:28,970][134211] Avg episode reward: [(0, '6.523')] [2025-01-04 01:05:29,373][134294] Updated weights for policy 0, policy_version 51794 (0.0024) [2025-01-04 01:05:32,354][134294] Updated weights for policy 0, policy_version 51804 (0.0025) [2025-01-04 01:05:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 15134.4). Total num frames: 212217856. Throughput: 0: 3765.3. Samples: 42216408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:33,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 01:05:34,525][134294] Updated weights for policy 0, policy_version 51814 (0.0015) [2025-01-04 01:05:36,412][134294] Updated weights for policy 0, policy_version 51824 (0.0012) [2025-01-04 01:05:38,890][134294] Updated weights for policy 0, policy_version 51834 (0.0020) [2025-01-04 01:05:38,968][134211] Fps is (10 sec: 17204.7, 60 sec: 15428.2, 300 sec: 15217.7). Total num frames: 212312064. Throughput: 0: 3865.1. Samples: 42245094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:38,968][134211] Avg episode reward: [(0, '6.909')] [2025-01-04 01:05:41,934][134294] Updated weights for policy 0, policy_version 51844 (0.0028) [2025-01-04 01:05:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15223.5, 300 sec: 15217.7). Total num frames: 212377600. Throughput: 0: 3864.7. Samples: 42265264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:43,968][134211] Avg episode reward: [(0, '6.252')] [2025-01-04 01:05:45,237][134294] Updated weights for policy 0, policy_version 51854 (0.0028) [2025-01-04 01:05:47,565][134294] Updated weights for policy 0, policy_version 51864 (0.0017) [2025-01-04 01:05:48,967][134211] Fps is (10 sec: 15155.6, 60 sec: 15564.9, 300 sec: 15273.2). Total num frames: 212463616. Throughput: 0: 3839.2. Samples: 42274576. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:48,968][134211] Avg episode reward: [(0, '6.523')] [2025-01-04 01:05:49,459][134294] Updated weights for policy 0, policy_version 51874 (0.0013) [2025-01-04 01:05:51,361][134294] Updated weights for policy 0, policy_version 51884 (0.0013) [2025-01-04 01:05:53,182][134294] Updated weights for policy 0, policy_version 51894 (0.0013) [2025-01-04 01:05:53,968][134211] Fps is (10 sec: 19251.4, 60 sec: 15906.2, 300 sec: 15412.1). Total num frames: 212570112. Throughput: 0: 3894.5. Samples: 42307108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:53,968][134211] Avg episode reward: [(0, '5.991')] [2025-01-04 01:05:55,729][134294] Updated weights for policy 0, policy_version 51904 (0.0020) [2025-01-04 01:05:58,838][134294] Updated weights for policy 0, policy_version 51914 (0.0027) [2025-01-04 01:05:58,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15974.4, 300 sec: 15412.1). Total num frames: 212639744. Throughput: 0: 3840.0. Samples: 42330954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:05:58,968][134211] Avg episode reward: [(0, '6.696')] [2025-01-04 01:06:02,109][134294] Updated weights for policy 0, policy_version 51924 (0.0026) [2025-01-04 01:06:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15633.0, 300 sec: 15287.1). Total num frames: 212701184. Throughput: 0: 3849.1. Samples: 42340326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:03,969][134211] Avg episode reward: [(0, '6.346')] [2025-01-04 01:06:05,510][134294] Updated weights for policy 0, policy_version 51934 (0.0026) [2025-01-04 01:06:08,478][134294] Updated weights for policy 0, policy_version 51944 (0.0023) [2025-01-04 01:06:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14950.3, 300 sec: 15217.7). Total num frames: 212766720. Throughput: 0: 3845.9. Samples: 42359486. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:08,968][134211] Avg episode reward: [(0, '6.737')] [2025-01-04 01:06:11,547][134294] Updated weights for policy 0, policy_version 51954 (0.0022) [2025-01-04 01:06:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 15217.7). Total num frames: 212832256. Throughput: 0: 3847.6. Samples: 42379430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:13,968][134211] Avg episode reward: [(0, '6.780')] [2025-01-04 01:06:14,646][134294] Updated weights for policy 0, policy_version 51964 (0.0029) [2025-01-04 01:06:17,756][134294] Updated weights for policy 0, policy_version 51974 (0.0022) [2025-01-04 01:06:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 15217.7). Total num frames: 212897792. Throughput: 0: 3844.4. Samples: 42389408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:18,968][134211] Avg episode reward: [(0, '6.393')] [2025-01-04 01:06:20,410][134294] Updated weights for policy 0, policy_version 51984 (0.0020) [2025-01-04 01:06:22,309][134294] Updated weights for policy 0, policy_version 51994 (0.0013) [2025-01-04 01:06:23,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15428.3, 300 sec: 15287.1). Total num frames: 213000192. Throughput: 0: 3764.0. Samples: 42414474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:23,968][134211] Avg episode reward: [(0, '7.052')] [2025-01-04 01:06:24,220][134294] Updated weights for policy 0, policy_version 52004 (0.0013) [2025-01-04 01:06:26,022][134294] Updated weights for policy 0, policy_version 52014 (0.0013) [2025-01-04 01:06:27,953][134294] Updated weights for policy 0, policy_version 52024 (0.0013) [2025-01-04 01:06:28,967][134211] Fps is (10 sec: 21299.7, 60 sec: 16179.5, 300 sec: 15287.1). Total num frames: 213110784. Throughput: 0: 4041.3. Samples: 42447120. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:06:28,968][134211] Avg episode reward: [(0, '6.587')] [2025-01-04 01:06:29,781][134294] Updated weights for policy 0, policy_version 52034 (0.0013) [2025-01-04 01:06:31,708][134294] Updated weights for policy 0, policy_version 52044 (0.0013) [2025-01-04 01:06:33,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16520.5, 300 sec: 15328.9). Total num frames: 213209088. Throughput: 0: 4196.7. Samples: 42463428. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:33,968][134211] Avg episode reward: [(0, '6.817')] [2025-01-04 01:06:34,213][134294] Updated weights for policy 0, policy_version 52054 (0.0017) [2025-01-04 01:06:38,160][134294] Updated weights for policy 0, policy_version 52064 (0.0032) [2025-01-04 01:06:38,968][134211] Fps is (10 sec: 15154.1, 60 sec: 15837.7, 300 sec: 15287.1). Total num frames: 213262336. Throughput: 0: 3930.4. Samples: 42483978. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:38,969][134211] Avg episode reward: [(0, '6.894')] [2025-01-04 01:06:41,714][134294] Updated weights for policy 0, policy_version 52074 (0.0029) [2025-01-04 01:06:43,969][134211] Fps is (10 sec: 11057.9, 60 sec: 15701.0, 300 sec: 15259.3). Total num frames: 213319680. Throughput: 0: 3784.5. Samples: 42501260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:43,970][134211] Avg episode reward: [(0, '6.986')] [2025-01-04 01:06:44,951][134294] Updated weights for policy 0, policy_version 52084 (0.0028) [2025-01-04 01:06:48,008][134294] Updated weights for policy 0, policy_version 52094 (0.0027) [2025-01-04 01:06:48,968][134211] Fps is (10 sec: 12698.3, 60 sec: 15428.2, 300 sec: 15287.1). Total num frames: 213389312. Throughput: 0: 3794.8. Samples: 42511092. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:48,968][134211] Avg episode reward: [(0, '7.235')] [2025-01-04 01:06:51,027][134294] Updated weights for policy 0, policy_version 52104 (0.0024) [2025-01-04 01:06:53,968][134211] Fps is (10 sec: 13517.9, 60 sec: 14745.5, 300 sec: 15287.1). Total num frames: 213454848. Throughput: 0: 3821.7. Samples: 42531466. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:53,969][134211] Avg episode reward: [(0, '6.697')] [2025-01-04 01:06:54,193][134294] Updated weights for policy 0, policy_version 52114 (0.0023) [2025-01-04 01:06:57,191][134294] Updated weights for policy 0, policy_version 52124 (0.0025) [2025-01-04 01:06:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 15287.1). Total num frames: 213520384. Throughput: 0: 3826.3. Samples: 42551614. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:06:58,968][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 01:07:00,080][134294] Updated weights for policy 0, policy_version 52134 (0.0028) [2025-01-04 01:07:03,081][134294] Updated weights for policy 0, policy_version 52144 (0.0024) [2025-01-04 01:07:03,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14813.9, 300 sec: 15189.9). Total num frames: 213590016. Throughput: 0: 3838.3. Samples: 42562130. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:07:03,968][134211] Avg episode reward: [(0, '6.868')] [2025-01-04 01:07:06,041][134294] Updated weights for policy 0, policy_version 52154 (0.0028) [2025-01-04 01:07:08,357][134294] Updated weights for policy 0, policy_version 52164 (0.0015) [2025-01-04 01:07:08,969][134211] Fps is (10 sec: 15153.9, 60 sec: 15086.7, 300 sec: 15134.3). Total num frames: 213671936. Throughput: 0: 3749.1. Samples: 42583188. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:08,969][134211] Avg episode reward: [(0, '6.322')] [2025-01-04 01:07:10,383][134294] Updated weights for policy 0, policy_version 52174 (0.0012) [2025-01-04 01:07:12,240][134294] Updated weights for policy 0, policy_version 52184 (0.0014) [2025-01-04 01:07:13,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15837.9, 300 sec: 15301.0). Total num frames: 213782528. Throughput: 0: 3725.4. Samples: 42614762. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:13,968][134211] Avg episode reward: [(0, '7.137')] [2025-01-04 01:07:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052193_213782528.pth... [2025-01-04 01:07:14,028][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051300_210124800.pth [2025-01-04 01:07:14,289][134294] Updated weights for policy 0, policy_version 52194 (0.0017) [2025-01-04 01:07:17,162][134294] Updated weights for policy 0, policy_version 52204 (0.0023) [2025-01-04 01:07:18,968][134211] Fps is (10 sec: 17614.3, 60 sec: 15837.9, 300 sec: 15301.0). Total num frames: 213848064. Throughput: 0: 3627.1. Samples: 42626646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:18,968][134211] Avg episode reward: [(0, '6.671')] [2025-01-04 01:07:20,462][134294] Updated weights for policy 0, policy_version 52214 (0.0027) [2025-01-04 01:07:23,542][134294] Updated weights for policy 0, policy_version 52224 (0.0030) [2025-01-04 01:07:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15155.2, 300 sec: 15273.2). Total num frames: 213909504. Throughput: 0: 3602.8. Samples: 42646104. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:23,968][134211] Avg episode reward: [(0, '7.328')] [2025-01-04 01:07:26,930][134294] Updated weights for policy 0, policy_version 52234 (0.0027) [2025-01-04 01:07:28,969][134211] Fps is (10 sec: 12696.7, 60 sec: 14404.0, 300 sec: 15287.1). Total num frames: 213975040. Throughput: 0: 3636.4. Samples: 42664896. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:28,969][134211] Avg episode reward: [(0, '6.615')] [2025-01-04 01:07:29,968][134294] Updated weights for policy 0, policy_version 52244 (0.0027) [2025-01-04 01:07:32,103][134294] Updated weights for policy 0, policy_version 52254 (0.0014) [2025-01-04 01:07:33,915][134294] Updated weights for policy 0, policy_version 52264 (0.0013) [2025-01-04 01:07:33,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14404.3, 300 sec: 15273.2). Total num frames: 214073344. Throughput: 0: 3685.1. Samples: 42676922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:07:33,968][134211] Avg episode reward: [(0, '6.716')] [2025-01-04 01:07:35,840][134294] Updated weights for policy 0, policy_version 52274 (0.0014) [2025-01-04 01:07:37,719][134294] Updated weights for policy 0, policy_version 52284 (0.0013) [2025-01-04 01:07:38,968][134211] Fps is (10 sec: 20071.9, 60 sec: 15223.6, 300 sec: 15342.6). Total num frames: 214175744. Throughput: 0: 3949.6. Samples: 42709196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:07:38,968][134211] Avg episode reward: [(0, '6.576')] [2025-01-04 01:07:40,598][134294] Updated weights for policy 0, policy_version 52294 (0.0023) [2025-01-04 01:07:43,765][134294] Updated weights for policy 0, policy_version 52304 (0.0028) [2025-01-04 01:07:43,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15292.1, 300 sec: 15328.8). Total num frames: 214237184. Throughput: 0: 3984.1. Samples: 42730898. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:07:43,968][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 01:07:46,986][134294] Updated weights for policy 0, policy_version 52314 (0.0025) [2025-01-04 01:07:48,968][134211] Fps is (10 sec: 12696.9, 60 sec: 15223.3, 300 sec: 15328.8). Total num frames: 214302720. Throughput: 0: 3952.1. Samples: 42739978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:07:48,969][134211] Avg episode reward: [(0, '6.127')] [2025-01-04 01:07:50,090][134294] Updated weights for policy 0, policy_version 52324 (0.0023) [2025-01-04 01:07:53,120][134294] Updated weights for policy 0, policy_version 52334 (0.0028) [2025-01-04 01:07:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15223.5, 300 sec: 15314.9). Total num frames: 214368256. Throughput: 0: 3935.0. Samples: 42760262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:07:53,968][134211] Avg episode reward: [(0, '6.697')] [2025-01-04 01:07:56,130][134294] Updated weights for policy 0, policy_version 52344 (0.0025) [2025-01-04 01:07:58,968][134211] Fps is (10 sec: 13517.6, 60 sec: 15291.7, 300 sec: 15259.3). Total num frames: 214437888. Throughput: 0: 3679.5. Samples: 42780340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:07:58,968][134211] Avg episode reward: [(0, '6.863')] [2025-01-04 01:07:59,157][134294] Updated weights for policy 0, policy_version 52354 (0.0025) [2025-01-04 01:08:02,267][134294] Updated weights for policy 0, policy_version 52364 (0.0027) [2025-01-04 01:08:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15223.5, 300 sec: 15106.6). Total num frames: 214503424. Throughput: 0: 3639.3. Samples: 42790412. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:08:03,968][134211] Avg episode reward: [(0, '7.198')] [2025-01-04 01:08:04,852][134294] Updated weights for policy 0, policy_version 52374 (0.0018) [2025-01-04 01:08:06,791][134294] Updated weights for policy 0, policy_version 52384 (0.0013) [2025-01-04 01:08:08,766][134294] Updated weights for policy 0, policy_version 52394 (0.0015) [2025-01-04 01:08:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15565.0, 300 sec: 15189.9). Total num frames: 214605824. Throughput: 0: 3791.2. Samples: 42816706. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:08:08,968][134211] Avg episode reward: [(0, '6.702')] [2025-01-04 01:08:11,708][134294] Updated weights for policy 0, policy_version 52404 (0.0024) [2025-01-04 01:08:13,968][134211] Fps is (10 sec: 17203.2, 60 sec: 14882.1, 300 sec: 15203.8). Total num frames: 214675456. Throughput: 0: 3881.2. Samples: 42839546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:13,968][134211] Avg episode reward: [(0, '7.379')] [2025-01-04 01:08:14,950][134294] Updated weights for policy 0, policy_version 52414 (0.0031) [2025-01-04 01:08:17,988][134294] Updated weights for policy 0, policy_version 52424 (0.0024) [2025-01-04 01:08:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 15231.6). Total num frames: 214745088. Throughput: 0: 3824.8. Samples: 42849040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:18,968][134211] Avg episode reward: [(0, '6.516')] [2025-01-04 01:08:20,004][134294] Updated weights for policy 0, policy_version 52434 (0.0014) [2025-01-04 01:08:21,892][134294] Updated weights for policy 0, policy_version 52444 (0.0013) [2025-01-04 01:08:23,733][134294] Updated weights for policy 0, policy_version 52454 (0.0014) [2025-01-04 01:08:23,967][134211] Fps is (10 sec: 18022.7, 60 sec: 15769.7, 300 sec: 15412.1). Total num frames: 214855680. Throughput: 0: 3741.9. Samples: 42877582. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:23,968][134211] Avg episode reward: [(0, '6.623')] [2025-01-04 01:08:25,644][134294] Updated weights for policy 0, policy_version 52464 (0.0012) [2025-01-04 01:08:28,450][134294] Updated weights for policy 0, policy_version 52474 (0.0025) [2025-01-04 01:08:28,968][134211] Fps is (10 sec: 19250.7, 60 sec: 16042.8, 300 sec: 15398.2). Total num frames: 214937600. Throughput: 0: 3883.1. Samples: 42905640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:28,968][134211] Avg episode reward: [(0, '7.117')] [2025-01-04 01:08:31,615][134294] Updated weights for policy 0, policy_version 52484 (0.0028) [2025-01-04 01:08:33,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15360.0, 300 sec: 15370.4). Total num frames: 214994944. Throughput: 0: 3899.7. Samples: 42915462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:33,968][134211] Avg episode reward: [(0, '7.522')] [2025-01-04 01:08:35,523][134294] Updated weights for policy 0, policy_version 52494 (0.0026) [2025-01-04 01:08:38,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14609.1, 300 sec: 15259.3). Total num frames: 215052288. Throughput: 0: 3814.0. Samples: 42931894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:08:38,969][134211] Avg episode reward: [(0, '6.249')] [2025-01-04 01:08:39,061][134294] Updated weights for policy 0, policy_version 52504 (0.0028) [2025-01-04 01:08:41,857][134294] Updated weights for policy 0, policy_version 52514 (0.0017) [2025-01-04 01:08:43,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14950.2, 300 sec: 15176.0). Total num frames: 215134208. Throughput: 0: 3850.7. Samples: 42953622. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:08:43,969][134211] Avg episode reward: [(0, '6.729')] [2025-01-04 01:08:44,004][134294] Updated weights for policy 0, policy_version 52524 (0.0012) [2025-01-04 01:08:45,939][134294] Updated weights for policy 0, policy_version 52534 (0.0013) [2025-01-04 01:08:47,816][134294] Updated weights for policy 0, policy_version 52544 (0.0012) [2025-01-04 01:08:48,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15701.5, 300 sec: 15273.2). Total num frames: 215244800. Throughput: 0: 3977.4. Samples: 42969394. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:08:48,968][134211] Avg episode reward: [(0, '7.083')] [2025-01-04 01:08:49,688][134294] Updated weights for policy 0, policy_version 52554 (0.0013) [2025-01-04 01:08:51,617][134294] Updated weights for policy 0, policy_version 52564 (0.0015) [2025-01-04 01:08:53,968][134211] Fps is (10 sec: 19661.7, 60 sec: 16042.7, 300 sec: 15342.7). Total num frames: 215330816. Throughput: 0: 4093.3. Samples: 43000906. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:08:53,969][134211] Avg episode reward: [(0, '7.008')] [2025-01-04 01:08:54,725][134294] Updated weights for policy 0, policy_version 52574 (0.0025) [2025-01-04 01:08:58,109][134294] Updated weights for policy 0, policy_version 52584 (0.0027) [2025-01-04 01:08:58,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15906.1, 300 sec: 15342.6). Total num frames: 215392256. Throughput: 0: 3998.7. Samples: 43019488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:08:58,968][134211] Avg episode reward: [(0, '6.698')] [2025-01-04 01:09:01,175][134294] Updated weights for policy 0, policy_version 52594 (0.0025) [2025-01-04 01:09:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15906.1, 300 sec: 15342.7). Total num frames: 215457792. Throughput: 0: 4011.3. Samples: 43029550. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:09:03,968][134211] Avg episode reward: [(0, '6.907')] [2025-01-04 01:09:04,408][134294] Updated weights for policy 0, policy_version 52604 (0.0027) [2025-01-04 01:09:07,425][134294] Updated weights for policy 0, policy_version 52614 (0.0024) [2025-01-04 01:09:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 15328.8). Total num frames: 215523328. Throughput: 0: 3808.9. Samples: 43048982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:09:08,968][134211] Avg episode reward: [(0, '7.567')] [2025-01-04 01:09:10,461][134294] Updated weights for policy 0, policy_version 52624 (0.0026) [2025-01-04 01:09:13,414][134294] Updated weights for policy 0, policy_version 52634 (0.0026) [2025-01-04 01:09:13,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15291.7, 300 sec: 15342.6). Total num frames: 215592960. Throughput: 0: 3643.4. Samples: 43069592. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:09:13,969][134211] Avg episode reward: [(0, '7.080')] [2025-01-04 01:09:14,000][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052636_215597056.pth... [2025-01-04 01:09:14,074][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051745_211947520.pth [2025-01-04 01:09:16,490][134294] Updated weights for policy 0, policy_version 52644 (0.0025) [2025-01-04 01:09:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.5, 300 sec: 15259.3). Total num frames: 215658496. Throughput: 0: 3643.9. Samples: 43079438. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:18,968][134211] Avg episode reward: [(0, '7.103')] [2025-01-04 01:09:19,716][134294] Updated weights for policy 0, policy_version 52654 (0.0031) [2025-01-04 01:09:22,308][134294] Updated weights for policy 0, policy_version 52664 (0.0019) [2025-01-04 01:09:23,968][134211] Fps is (10 sec: 15155.7, 60 sec: 14813.8, 300 sec: 15189.9). Total num frames: 215744512. Throughput: 0: 3748.7. Samples: 43100586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:23,968][134211] Avg episode reward: [(0, '6.915')] [2025-01-04 01:09:24,199][134294] Updated weights for policy 0, policy_version 52674 (0.0012) [2025-01-04 01:09:26,329][134294] Updated weights for policy 0, policy_version 52684 (0.0019) [2025-01-04 01:09:28,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14813.9, 300 sec: 15245.5). Total num frames: 215826432. Throughput: 0: 3876.9. Samples: 43128080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:28,968][134211] Avg episode reward: [(0, '7.241')] [2025-01-04 01:09:29,331][134294] Updated weights for policy 0, policy_version 52694 (0.0026) [2025-01-04 01:09:32,387][134294] Updated weights for policy 0, policy_version 52704 (0.0026) [2025-01-04 01:09:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14950.4, 300 sec: 15273.2). Total num frames: 215891968. Throughput: 0: 3745.5. Samples: 43137940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:33,969][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 01:09:35,469][134294] Updated weights for policy 0, policy_version 52714 (0.0024) [2025-01-04 01:09:38,169][134294] Updated weights for policy 0, policy_version 52724 (0.0025) [2025-01-04 01:09:38,967][134211] Fps is (10 sec: 14336.4, 60 sec: 15291.8, 300 sec: 15273.2). Total num frames: 215969792. Throughput: 0: 3499.4. Samples: 43158380. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:38,968][134211] Avg episode reward: [(0, '6.581')] [2025-01-04 01:09:40,329][134294] Updated weights for policy 0, policy_version 52734 (0.0014) [2025-01-04 01:09:43,177][134294] Updated weights for policy 0, policy_version 52744 (0.0027) [2025-01-04 01:09:43,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15223.6, 300 sec: 15314.9). Total num frames: 216047616. Throughput: 0: 3632.5. Samples: 43182950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:43,968][134211] Avg episode reward: [(0, '7.138')] [2025-01-04 01:09:46,235][134294] Updated weights for policy 0, policy_version 52754 (0.0025) [2025-01-04 01:09:48,746][134294] Updated weights for policy 0, policy_version 52764 (0.0019) [2025-01-04 01:09:48,967][134211] Fps is (10 sec: 15564.9, 60 sec: 14677.4, 300 sec: 15287.1). Total num frames: 216125440. Throughput: 0: 3637.4. Samples: 43193234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:09:48,968][134211] Avg episode reward: [(0, '7.064')] [2025-01-04 01:09:50,650][134294] Updated weights for policy 0, policy_version 52774 (0.0014) [2025-01-04 01:09:52,499][134294] Updated weights for policy 0, policy_version 52784 (0.0015) [2025-01-04 01:09:53,968][134211] Fps is (10 sec: 18432.4, 60 sec: 15018.7, 300 sec: 15426.0). Total num frames: 216231936. Throughput: 0: 3848.5. Samples: 43222164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:09:53,968][134211] Avg episode reward: [(0, '6.965')] [2025-01-04 01:09:54,415][134294] Updated weights for policy 0, policy_version 52794 (0.0015) [2025-01-04 01:09:56,600][134294] Updated weights for policy 0, policy_version 52804 (0.0018) [2025-01-04 01:09:58,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15360.0, 300 sec: 15425.9). Total num frames: 216313856. Throughput: 0: 4001.6. Samples: 43249662. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:09:58,968][134211] Avg episode reward: [(0, '6.371')] [2025-01-04 01:09:59,910][134294] Updated weights for policy 0, policy_version 52814 (0.0025) [2025-01-04 01:10:03,486][134294] Updated weights for policy 0, policy_version 52824 (0.0027) [2025-01-04 01:10:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15155.1, 300 sec: 15245.4). Total num frames: 216367104. Throughput: 0: 3974.9. Samples: 43258308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:10:03,969][134211] Avg episode reward: [(0, '7.045')] [2025-01-04 01:10:06,910][134294] Updated weights for policy 0, policy_version 52834 (0.0028) [2025-01-04 01:10:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15155.2, 300 sec: 15203.8). Total num frames: 216432640. Throughput: 0: 3898.3. Samples: 43276010. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:10:08,968][134211] Avg episode reward: [(0, '6.561')] [2025-01-04 01:10:09,827][134294] Updated weights for policy 0, policy_version 52844 (0.0023) [2025-01-04 01:10:11,946][134294] Updated weights for policy 0, policy_version 52854 (0.0017) [2025-01-04 01:10:13,970][134211] Fps is (10 sec: 15152.0, 60 sec: 15427.7, 300 sec: 15287.0). Total num frames: 216518656. Throughput: 0: 3826.7. Samples: 43300290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:10:13,971][134211] Avg episode reward: [(0, '6.552')] [2025-01-04 01:10:14,827][134294] Updated weights for policy 0, policy_version 52864 (0.0026) [2025-01-04 01:10:17,947][134294] Updated weights for policy 0, policy_version 52874 (0.0025) [2025-01-04 01:10:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15428.2, 300 sec: 15287.1). Total num frames: 216584192. Throughput: 0: 3830.0. Samples: 43310290. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:10:18,968][134211] Avg episode reward: [(0, '6.645')] [2025-01-04 01:10:21,031][134294] Updated weights for policy 0, policy_version 52884 (0.0026) [2025-01-04 01:10:23,146][134294] Updated weights for policy 0, policy_version 52894 (0.0015) [2025-01-04 01:10:23,968][134211] Fps is (10 sec: 15158.7, 60 sec: 15428.3, 300 sec: 15356.6). Total num frames: 216670208. Throughput: 0: 3846.0. Samples: 43331450. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:23,968][134211] Avg episode reward: [(0, '6.384')] [2025-01-04 01:10:25,121][134294] Updated weights for policy 0, policy_version 52904 (0.0014) [2025-01-04 01:10:27,665][134294] Updated weights for policy 0, policy_version 52914 (0.0021) [2025-01-04 01:10:28,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15428.2, 300 sec: 15370.4). Total num frames: 216752128. Throughput: 0: 3904.7. Samples: 43358660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:28,969][134211] Avg episode reward: [(0, '6.285')] [2025-01-04 01:10:30,821][134294] Updated weights for policy 0, policy_version 52924 (0.0028) [2025-01-04 01:10:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15360.0, 300 sec: 15259.3). Total num frames: 216813568. Throughput: 0: 3899.3. Samples: 43368702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:33,968][134211] Avg episode reward: [(0, '6.596')] [2025-01-04 01:10:33,994][134294] Updated weights for policy 0, policy_version 52934 (0.0023) [2025-01-04 01:10:37,370][134294] Updated weights for policy 0, policy_version 52944 (0.0021) [2025-01-04 01:10:38,967][134211] Fps is (10 sec: 13517.4, 60 sec: 15291.7, 300 sec: 15287.1). Total num frames: 216887296. Throughput: 0: 3661.7. Samples: 43386942. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:38,968][134211] Avg episode reward: [(0, '6.141')] [2025-01-04 01:10:39,419][134294] Updated weights for policy 0, policy_version 52954 (0.0013) [2025-01-04 01:10:41,346][134294] Updated weights for policy 0, policy_version 52964 (0.0014) [2025-01-04 01:10:43,301][134294] Updated weights for policy 0, policy_version 52974 (0.0014) [2025-01-04 01:10:43,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15769.6, 300 sec: 15356.5). Total num frames: 216993792. Throughput: 0: 3740.0. Samples: 43417960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:43,968][134211] Avg episode reward: [(0, '6.226')] [2025-01-04 01:10:45,823][134294] Updated weights for policy 0, policy_version 52984 (0.0021) [2025-01-04 01:10:48,913][134294] Updated weights for policy 0, policy_version 52994 (0.0027) [2025-01-04 01:10:48,970][134211] Fps is (10 sec: 17608.7, 60 sec: 15632.5, 300 sec: 15231.4). Total num frames: 217063424. Throughput: 0: 3817.9. Samples: 43430120. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:48,970][134211] Avg episode reward: [(0, '6.364')] [2025-01-04 01:10:52,127][134294] Updated weights for policy 0, policy_version 53004 (0.0028) [2025-01-04 01:10:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 15203.8). Total num frames: 217124864. Throughput: 0: 3857.2. Samples: 43449584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:10:53,969][134211] Avg episode reward: [(0, '6.598')] [2025-01-04 01:10:55,126][134294] Updated weights for policy 0, policy_version 53014 (0.0027) [2025-01-04 01:10:58,234][134294] Updated weights for policy 0, policy_version 53024 (0.0024) [2025-01-04 01:10:58,968][134211] Fps is (10 sec: 13109.9, 60 sec: 14677.3, 300 sec: 15231.6). Total num frames: 217194496. Throughput: 0: 3767.8. Samples: 43469832. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:10:58,968][134211] Avg episode reward: [(0, '6.589')] [2025-01-04 01:11:01,107][134294] Updated weights for policy 0, policy_version 53034 (0.0025) [2025-01-04 01:11:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.2, 300 sec: 15231.6). Total num frames: 217260032. Throughput: 0: 3770.3. Samples: 43479954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:11:03,968][134211] Avg episode reward: [(0, '6.865')] [2025-01-04 01:11:04,377][134294] Updated weights for policy 0, policy_version 53044 (0.0027) [2025-01-04 01:11:07,331][134294] Updated weights for policy 0, policy_version 53054 (0.0023) [2025-01-04 01:11:08,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15155.2, 300 sec: 15287.1). Total num frames: 217341952. Throughput: 0: 3744.4. Samples: 43499948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:11:08,968][134211] Avg episode reward: [(0, '6.241')] [2025-01-04 01:11:09,298][134294] Updated weights for policy 0, policy_version 53064 (0.0014) [2025-01-04 01:11:11,189][134294] Updated weights for policy 0, policy_version 53074 (0.0014) [2025-01-04 01:11:13,085][134294] Updated weights for policy 0, policy_version 53084 (0.0013) [2025-01-04 01:11:13,967][134211] Fps is (10 sec: 18841.9, 60 sec: 15497.2, 300 sec: 15426.0). Total num frames: 217448448. Throughput: 0: 3845.3. Samples: 43531698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:11:13,968][134211] Avg episode reward: [(0, '6.745')] [2025-01-04 01:11:14,040][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053089_217452544.pth... 
[2025-01-04 01:11:14,081][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052193_213782528.pth [2025-01-04 01:11:15,006][134294] Updated weights for policy 0, policy_version 53094 (0.0014) [2025-01-04 01:11:16,909][134294] Updated weights for policy 0, policy_version 53104 (0.0015) [2025-01-04 01:11:18,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15974.4, 300 sec: 15398.2). Total num frames: 217542656. Throughput: 0: 3981.0. Samples: 43547848. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:11:18,968][134211] Avg episode reward: [(0, '6.875')] [2025-01-04 01:11:19,906][134294] Updated weights for policy 0, policy_version 53114 (0.0027) [2025-01-04 01:11:23,199][134294] Updated weights for policy 0, policy_version 53124 (0.0026) [2025-01-04 01:11:23,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15564.7, 300 sec: 15231.5). Total num frames: 217604096. Throughput: 0: 4044.2. Samples: 43568932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:11:23,968][134211] Avg episode reward: [(0, '6.286')] [2025-01-04 01:11:26,343][134294] Updated weights for policy 0, policy_version 53134 (0.0025) [2025-01-04 01:11:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15291.8, 300 sec: 15120.5). Total num frames: 217669632. Throughput: 0: 3787.5. Samples: 43588400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:28,968][134211] Avg episode reward: [(0, '7.100')] [2025-01-04 01:11:29,517][134294] Updated weights for policy 0, policy_version 53144 (0.0027) [2025-01-04 01:11:32,542][134294] Updated weights for policy 0, policy_version 53154 (0.0027) [2025-01-04 01:11:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15360.0, 300 sec: 15162.2). Total num frames: 217735168. Throughput: 0: 3735.6. Samples: 43598212. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:33,968][134211] Avg episode reward: [(0, '6.414')] [2025-01-04 01:11:35,616][134294] Updated weights for policy 0, policy_version 53164 (0.0024) [2025-01-04 01:11:38,579][134294] Updated weights for policy 0, policy_version 53174 (0.0025) [2025-01-04 01:11:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 15203.9). Total num frames: 217804800. Throughput: 0: 3759.4. Samples: 43618758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:38,968][134211] Avg episode reward: [(0, '6.310')] [2025-01-04 01:11:41,592][134294] Updated weights for policy 0, policy_version 53184 (0.0021) [2025-01-04 01:11:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.0, 300 sec: 15189.9). Total num frames: 217870336. Throughput: 0: 3760.4. Samples: 43639052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:43,968][134211] Avg episode reward: [(0, '5.992')] [2025-01-04 01:11:44,650][134294] Updated weights for policy 0, policy_version 53194 (0.0026) [2025-01-04 01:11:47,133][134294] Updated weights for policy 0, policy_version 53204 (0.0017) [2025-01-04 01:11:48,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14951.0, 300 sec: 15273.3). Total num frames: 217960448. Throughput: 0: 3760.8. Samples: 43649190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:48,968][134211] Avg episode reward: [(0, '6.162')] [2025-01-04 01:11:49,054][134294] Updated weights for policy 0, policy_version 53214 (0.0011) [2025-01-04 01:11:50,889][134294] Updated weights for policy 0, policy_version 53224 (0.0014) [2025-01-04 01:11:52,941][134294] Updated weights for policy 0, policy_version 53234 (0.0013) [2025-01-04 01:11:53,968][134211] Fps is (10 sec: 19661.3, 60 sec: 15701.4, 300 sec: 15412.1). Total num frames: 218066944. Throughput: 0: 4024.8. Samples: 43681062. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:53,968][134211] Avg episode reward: [(0, '6.974')] [2025-01-04 01:11:54,870][134294] Updated weights for policy 0, policy_version 53244 (0.0013) [2025-01-04 01:11:56,769][134294] Updated weights for policy 0, policy_version 53254 (0.0013) [2025-01-04 01:11:58,968][134211] Fps is (10 sec: 20069.8, 60 sec: 16110.9, 300 sec: 15495.4). Total num frames: 218161152. Throughput: 0: 3992.1. Samples: 43711344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:11:58,968][134211] Avg episode reward: [(0, '7.158')] [2025-01-04 01:11:59,449][134294] Updated weights for policy 0, policy_version 53264 (0.0022) [2025-01-04 01:12:02,825][134294] Updated weights for policy 0, policy_version 53274 (0.0029) [2025-01-04 01:12:03,968][134211] Fps is (10 sec: 15563.9, 60 sec: 16042.5, 300 sec: 15426.0). Total num frames: 218222592. Throughput: 0: 3842.7. Samples: 43720770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:03,969][134211] Avg episode reward: [(0, '6.551')] [2025-01-04 01:12:06,099][134294] Updated weights for policy 0, policy_version 53284 (0.0029) [2025-01-04 01:12:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15701.3, 300 sec: 15259.3). Total num frames: 218284032. Throughput: 0: 3786.4. Samples: 43739320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:08,968][134211] Avg episode reward: [(0, '6.163')] [2025-01-04 01:12:09,457][134294] Updated weights for policy 0, policy_version 53294 (0.0029) [2025-01-04 01:12:12,512][134294] Updated weights for policy 0, policy_version 53304 (0.0026) [2025-01-04 01:12:13,968][134211] Fps is (10 sec: 12698.1, 60 sec: 15018.6, 300 sec: 15259.3). Total num frames: 218349568. Throughput: 0: 3781.6. Samples: 43758572. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:13,968][134211] Avg episode reward: [(0, '6.861')] [2025-01-04 01:12:15,550][134294] Updated weights for policy 0, policy_version 53314 (0.0021) [2025-01-04 01:12:18,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14472.4, 300 sec: 15259.3). Total num frames: 218411008. Throughput: 0: 3785.7. Samples: 43768572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:18,969][134211] Avg episode reward: [(0, '7.675')] [2025-01-04 01:12:19,105][134294] Updated weights for policy 0, policy_version 53324 (0.0025) [2025-01-04 01:12:22,166][134294] Updated weights for policy 0, policy_version 53334 (0.0025) [2025-01-04 01:12:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 15301.0). Total num frames: 218488832. Throughput: 0: 3743.1. Samples: 43787198. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:23,968][134211] Avg episode reward: [(0, '6.575')] [2025-01-04 01:12:24,449][134294] Updated weights for policy 0, policy_version 53344 (0.0015) [2025-01-04 01:12:27,550][134294] Updated weights for policy 0, policy_version 53354 (0.0024) [2025-01-04 01:12:28,968][134211] Fps is (10 sec: 14337.2, 60 sec: 14745.6, 300 sec: 15189.9). Total num frames: 218554368. Throughput: 0: 3789.8. Samples: 43809592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:28,968][134211] Avg episode reward: [(0, '6.087')] [2025-01-04 01:12:30,395][134294] Updated weights for policy 0, policy_version 53364 (0.0022) [2025-01-04 01:12:32,275][134294] Updated weights for policy 0, policy_version 53374 (0.0012) [2025-01-04 01:12:33,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15291.8, 300 sec: 15176.0). Total num frames: 218652672. Throughput: 0: 3842.5. Samples: 43822104. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:33,968][134211] Avg episode reward: [(0, '7.148')] [2025-01-04 01:12:34,329][134294] Updated weights for policy 0, policy_version 53384 (0.0013) [2025-01-04 01:12:36,570][134294] Updated weights for policy 0, policy_version 53394 (0.0015) [2025-01-04 01:12:38,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15428.3, 300 sec: 15231.6). Total num frames: 218730496. Throughput: 0: 3770.1. Samples: 43850718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:38,969][134211] Avg episode reward: [(0, '7.130')] [2025-01-04 01:12:40,071][134294] Updated weights for policy 0, policy_version 53404 (0.0027) [2025-01-04 01:12:43,504][134294] Updated weights for policy 0, policy_version 53414 (0.0026) [2025-01-04 01:12:43,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15291.7, 300 sec: 15203.8). Total num frames: 218787840. Throughput: 0: 3483.2. Samples: 43868090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:43,968][134211] Avg episode reward: [(0, '6.968')] [2025-01-04 01:12:46,209][134294] Updated weights for policy 0, policy_version 53424 (0.0020) [2025-01-04 01:12:48,441][134294] Updated weights for policy 0, policy_version 53434 (0.0018) [2025-01-04 01:12:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.2, 300 sec: 15259.3). Total num frames: 218869760. Throughput: 0: 3525.1. Samples: 43879398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:48,968][134211] Avg episode reward: [(0, '6.231')] [2025-01-04 01:12:51,509][134294] Updated weights for policy 0, policy_version 53444 (0.0025) [2025-01-04 01:12:53,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14472.5, 300 sec: 15245.5). Total num frames: 218935296. Throughput: 0: 3613.6. Samples: 43901930. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:53,968][134211] Avg episode reward: [(0, '7.164')] [2025-01-04 01:12:54,625][134294] Updated weights for policy 0, policy_version 53454 (0.0026) [2025-01-04 01:12:57,762][134294] Updated weights for policy 0, policy_version 53464 (0.0024) [2025-01-04 01:12:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14199.5, 300 sec: 15287.1). Total num frames: 219013120. Throughput: 0: 3640.1. Samples: 43922376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:12:58,968][134211] Avg episode reward: [(0, '8.141')] [2025-01-04 01:12:58,969][134264] Saving new best policy, reward=8.141! [2025-01-04 01:12:59,702][134294] Updated weights for policy 0, policy_version 53474 (0.0015) [2025-01-04 01:13:01,575][134294] Updated weights for policy 0, policy_version 53484 (0.0013) [2025-01-04 01:13:03,470][134294] Updated weights for policy 0, policy_version 53494 (0.0012) [2025-01-04 01:13:03,968][134211] Fps is (10 sec: 18432.0, 60 sec: 14950.5, 300 sec: 15301.0). Total num frames: 219119616. Throughput: 0: 3782.5. Samples: 43938782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:13:03,968][134211] Avg episode reward: [(0, '7.773')] [2025-01-04 01:13:06,154][134294] Updated weights for policy 0, policy_version 53504 (0.0023) [2025-01-04 01:13:08,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15018.6, 300 sec: 15287.1). Total num frames: 219185152. Throughput: 0: 3944.5. Samples: 43964700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:08,969][134211] Avg episode reward: [(0, '7.679')] [2025-01-04 01:13:09,304][134294] Updated weights for policy 0, policy_version 53514 (0.0028) [2025-01-04 01:13:12,415][134294] Updated weights for policy 0, policy_version 53524 (0.0025) [2025-01-04 01:13:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.6, 300 sec: 15273.2). Total num frames: 219250688. Throughput: 0: 3876.6. Samples: 43984040. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:13,969][134211] Avg episode reward: [(0, '7.051')] [2025-01-04 01:13:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053528_219250688.pth... [2025-01-04 01:13:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052636_215597056.pth [2025-01-04 01:13:15,720][134294] Updated weights for policy 0, policy_version 53534 (0.0027) [2025-01-04 01:13:18,835][134294] Updated weights for policy 0, policy_version 53544 (0.0028) [2025-01-04 01:13:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15087.1, 300 sec: 15120.5). Total num frames: 219316224. Throughput: 0: 3809.9. Samples: 43993552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:18,968][134211] Avg episode reward: [(0, '6.815')] [2025-01-04 01:13:21,841][134294] Updated weights for policy 0, policy_version 53554 (0.0024) [2025-01-04 01:13:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 15064.9). Total num frames: 219381760. Throughput: 0: 3619.2. Samples: 44013582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:23,969][134211] Avg episode reward: [(0, '6.904')] [2025-01-04 01:13:25,026][134294] Updated weights for policy 0, policy_version 53564 (0.0027) [2025-01-04 01:13:27,388][134294] Updated weights for policy 0, policy_version 53574 (0.0017) [2025-01-04 01:13:28,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15223.5, 300 sec: 15162.2). Total num frames: 219467776. Throughput: 0: 3757.2. Samples: 44037164. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:28,968][134211] Avg episode reward: [(0, '7.711')] [2025-01-04 01:13:29,441][134294] Updated weights for policy 0, policy_version 53584 (0.0013) [2025-01-04 01:13:31,407][134294] Updated weights for policy 0, policy_version 53594 (0.0014) [2025-01-04 01:13:33,838][134294] Updated weights for policy 0, policy_version 53604 (0.0020) [2025-01-04 01:13:33,969][134211] Fps is (10 sec: 18020.1, 60 sec: 15154.8, 300 sec: 15287.0). Total num frames: 219561984. Throughput: 0: 3841.0. Samples: 44052250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:33,970][134211] Avg episode reward: [(0, '7.094')] [2025-01-04 01:13:37,071][134294] Updated weights for policy 0, policy_version 53614 (0.0027) [2025-01-04 01:13:38,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14882.2, 300 sec: 15217.7). Total num frames: 219623424. Throughput: 0: 3831.3. Samples: 44074340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:13:38,968][134211] Avg episode reward: [(0, '7.226')] [2025-01-04 01:13:40,278][134294] Updated weights for policy 0, policy_version 53624 (0.0028) [2025-01-04 01:13:43,216][134294] Updated weights for policy 0, policy_version 53634 (0.0030) [2025-01-04 01:13:43,968][134211] Fps is (10 sec: 13109.0, 60 sec: 15086.9, 300 sec: 15078.8). Total num frames: 219693056. Throughput: 0: 3817.7. Samples: 44094172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:13:43,968][134211] Avg episode reward: [(0, '7.135')] [2025-01-04 01:13:45,986][134294] Updated weights for policy 0, policy_version 53644 (0.0019) [2025-01-04 01:13:47,858][134294] Updated weights for policy 0, policy_version 53654 (0.0013) [2025-01-04 01:13:48,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15360.0, 300 sec: 15120.5). Total num frames: 219791360. Throughput: 0: 3710.4. Samples: 44105748. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:13:48,968][134211] Avg episode reward: [(0, '7.022')] [2025-01-04 01:13:49,788][134294] Updated weights for policy 0, policy_version 53664 (0.0016) [2025-01-04 01:13:52,660][134294] Updated weights for policy 0, policy_version 53674 (0.0026) [2025-01-04 01:13:53,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15496.5, 300 sec: 15162.1). Total num frames: 219865088. Throughput: 0: 3761.1. Samples: 44133948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:13:53,968][134211] Avg episode reward: [(0, '6.785')] [2025-01-04 01:13:55,960][134294] Updated weights for policy 0, policy_version 53684 (0.0028) [2025-01-04 01:13:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.5, 300 sec: 15148.3). Total num frames: 219926528. Throughput: 0: 3761.9. Samples: 44153324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:13:58,968][134211] Avg episode reward: [(0, '7.125')] [2025-01-04 01:13:59,011][134294] Updated weights for policy 0, policy_version 53694 (0.0026) [2025-01-04 01:14:01,957][134294] Updated weights for policy 0, policy_version 53704 (0.0027) [2025-01-04 01:14:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 15162.1). Total num frames: 219996160. Throughput: 0: 3771.5. Samples: 44163272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:03,968][134211] Avg episode reward: [(0, '7.540')] [2025-01-04 01:14:05,134][134294] Updated weights for policy 0, policy_version 53714 (0.0020) [2025-01-04 01:14:07,039][134294] Updated weights for policy 0, policy_version 53724 (0.0012) [2025-01-04 01:14:08,964][134294] Updated weights for policy 0, policy_version 53734 (0.0011) [2025-01-04 01:14:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15155.3, 300 sec: 15259.4). Total num frames: 220094464. Throughput: 0: 3864.5. Samples: 44187482. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:08,968][134211] Avg episode reward: [(0, '6.681')] [2025-01-04 01:14:10,854][134294] Updated weights for policy 0, policy_version 53744 (0.0013) [2025-01-04 01:14:12,919][134294] Updated weights for policy 0, policy_version 53754 (0.0016) [2025-01-04 01:14:13,968][134211] Fps is (10 sec: 19250.9, 60 sec: 15633.1, 300 sec: 15356.5). Total num frames: 220188672. Throughput: 0: 4025.3. Samples: 44218302. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:13,968][134211] Avg episode reward: [(0, '6.913')] [2025-01-04 01:14:16,101][134294] Updated weights for policy 0, policy_version 53764 (0.0025) [2025-01-04 01:14:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15564.8, 300 sec: 15273.2). Total num frames: 220250112. Throughput: 0: 3898.2. Samples: 44227664. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:18,968][134211] Avg episode reward: [(0, '7.528')] [2025-01-04 01:14:19,443][134294] Updated weights for policy 0, policy_version 53774 (0.0028) [2025-01-04 01:14:22,577][134294] Updated weights for policy 0, policy_version 53784 (0.0032) [2025-01-04 01:14:23,969][134211] Fps is (10 sec: 12696.6, 60 sec: 15564.6, 300 sec: 15217.6). Total num frames: 220315648. Throughput: 0: 3829.2. Samples: 44246656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:23,969][134211] Avg episode reward: [(0, '7.536')] [2025-01-04 01:14:25,717][134294] Updated weights for policy 0, policy_version 53794 (0.0028) [2025-01-04 01:14:28,587][134294] Updated weights for policy 0, policy_version 53804 (0.0024) [2025-01-04 01:14:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 15231.6). Total num frames: 220385280. Throughput: 0: 3841.2. Samples: 44267024. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:28,968][134211] Avg episode reward: [(0, '7.418')] [2025-01-04 01:14:31,632][134294] Updated weights for policy 0, policy_version 53814 (0.0024) [2025-01-04 01:14:33,968][134211] Fps is (10 sec: 13108.4, 60 sec: 14746.0, 300 sec: 15176.0). Total num frames: 220446720. Throughput: 0: 3807.7. Samples: 44277094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:33,968][134211] Avg episode reward: [(0, '6.622')] [2025-01-04 01:14:34,795][134294] Updated weights for policy 0, policy_version 53824 (0.0023) [2025-01-04 01:14:36,977][134294] Updated weights for policy 0, policy_version 53834 (0.0012) [2025-01-04 01:14:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15291.8, 300 sec: 15231.6). Total num frames: 220540928. Throughput: 0: 3695.9. Samples: 44300262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:38,968][134211] Avg episode reward: [(0, '7.037')] [2025-01-04 01:14:39,048][134294] Updated weights for policy 0, policy_version 53844 (0.0013) [2025-01-04 01:14:41,084][134294] Updated weights for policy 0, policy_version 53854 (0.0012) [2025-01-04 01:14:42,958][134294] Updated weights for policy 0, policy_version 53864 (0.0013) [2025-01-04 01:14:43,967][134211] Fps is (10 sec: 20070.7, 60 sec: 15906.2, 300 sec: 15328.8). Total num frames: 220647424. Throughput: 0: 3954.4. Samples: 44331270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:14:43,968][134211] Avg episode reward: [(0, '7.524')] [2025-01-04 01:14:44,838][134294] Updated weights for policy 0, policy_version 53874 (0.0014) [2025-01-04 01:14:47,594][134294] Updated weights for policy 0, policy_version 53884 (0.0023) [2025-01-04 01:14:48,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15564.8, 300 sec: 15231.6). Total num frames: 220725248. Throughput: 0: 4056.6. Samples: 44345820. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:14:48,968][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 01:14:50,817][134294] Updated weights for policy 0, policy_version 53894 (0.0029) [2025-01-04 01:14:53,822][134294] Updated weights for policy 0, policy_version 53904 (0.0029) [2025-01-04 01:14:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15428.3, 300 sec: 15176.0). Total num frames: 220790784. Throughput: 0: 3951.1. Samples: 44365284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:14:53,968][134211] Avg episode reward: [(0, '6.413')] [2025-01-04 01:14:56,986][134294] Updated weights for policy 0, policy_version 53914 (0.0027) [2025-01-04 01:14:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15496.5, 300 sec: 15217.7). Total num frames: 220856320. Throughput: 0: 3703.6. Samples: 44384962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:14:58,968][134211] Avg episode reward: [(0, '7.295')] [2025-01-04 01:15:00,045][134294] Updated weights for policy 0, policy_version 53924 (0.0026) [2025-01-04 01:15:03,098][134294] Updated weights for policy 0, policy_version 53934 (0.0025) [2025-01-04 01:15:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15428.2, 300 sec: 15217.7). Total num frames: 220921856. Throughput: 0: 3725.3. Samples: 44395304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:15:03,968][134211] Avg episode reward: [(0, '6.886')] [2025-01-04 01:15:06,137][134294] Updated weights for policy 0, policy_version 53944 (0.0023) [2025-01-04 01:15:08,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14950.3, 300 sec: 15162.2). Total num frames: 220991488. Throughput: 0: 3751.0. Samples: 44415448. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:15:08,969][134211] Avg episode reward: [(0, '6.661')] [2025-01-04 01:15:09,172][134294] Updated weights for policy 0, policy_version 53954 (0.0025) [2025-01-04 01:15:11,797][134294] Updated weights for policy 0, policy_version 53964 (0.0020) [2025-01-04 01:15:13,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14745.6, 300 sec: 15217.7). Total num frames: 221073408. Throughput: 0: 3824.1. Samples: 44439108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:15:13,968][134211] Avg episode reward: [(0, '7.618')] [2025-01-04 01:15:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053974_221077504.pth... [2025-01-04 01:15:14,047][134294] Updated weights for policy 0, policy_version 53974 (0.0017) [2025-01-04 01:15:14,116][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053089_217452544.pth [2025-01-04 01:15:17,142][134294] Updated weights for policy 0, policy_version 53984 (0.0024) [2025-01-04 01:15:18,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14813.9, 300 sec: 15148.3). Total num frames: 221138944. Throughput: 0: 3826.6. Samples: 44449292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:15:18,968][134211] Avg episode reward: [(0, '6.841')] [2025-01-04 01:15:20,237][134294] Updated weights for policy 0, policy_version 53994 (0.0025) [2025-01-04 01:15:22,136][134294] Updated weights for policy 0, policy_version 54004 (0.0013) [2025-01-04 01:15:23,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15360.3, 300 sec: 15203.8). Total num frames: 221237248. Throughput: 0: 3851.2. Samples: 44473566. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:23,968][134211] Avg episode reward: [(0, '7.292')] [2025-01-04 01:15:24,040][134294] Updated weights for policy 0, policy_version 54014 (0.0015) [2025-01-04 01:15:25,909][134294] Updated weights for policy 0, policy_version 54024 (0.0014) [2025-01-04 01:15:27,768][134294] Updated weights for policy 0, policy_version 54034 (0.0014) [2025-01-04 01:15:28,968][134211] Fps is (10 sec: 20069.6, 60 sec: 15906.0, 300 sec: 15342.6). Total num frames: 221339648. Throughput: 0: 3882.8. Samples: 44506000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:28,969][134211] Avg episode reward: [(0, '6.939')] [2025-01-04 01:15:30,499][134294] Updated weights for policy 0, policy_version 54044 (0.0024) [2025-01-04 01:15:33,731][134294] Updated weights for policy 0, policy_version 54054 (0.0027) [2025-01-04 01:15:33,968][134211] Fps is (10 sec: 17203.0, 60 sec: 16042.7, 300 sec: 15328.7). Total num frames: 221409280. Throughput: 0: 3791.1. Samples: 44516420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:33,968][134211] Avg episode reward: [(0, '6.891')] [2025-01-04 01:15:36,842][134294] Updated weights for policy 0, policy_version 54064 (0.0027) [2025-01-04 01:15:38,968][134211] Fps is (10 sec: 13107.9, 60 sec: 15496.5, 300 sec: 15176.0). Total num frames: 221470720. Throughput: 0: 3789.9. Samples: 44535830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:38,968][134211] Avg episode reward: [(0, '6.823')] [2025-01-04 01:15:40,089][134294] Updated weights for policy 0, policy_version 54074 (0.0026) [2025-01-04 01:15:43,161][134294] Updated weights for policy 0, policy_version 54084 (0.0027) [2025-01-04 01:15:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14813.8, 300 sec: 15162.2). Total num frames: 221536256. Throughput: 0: 3783.6. Samples: 44555226. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:43,968][134211] Avg episode reward: [(0, '7.291')] [2025-01-04 01:15:46,230][134294] Updated weights for policy 0, policy_version 54094 (0.0026) [2025-01-04 01:15:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 15189.9). Total num frames: 221605888. Throughput: 0: 3780.8. Samples: 44565440. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:48,968][134211] Avg episode reward: [(0, '6.928')] [2025-01-04 01:15:49,246][134294] Updated weights for policy 0, policy_version 54104 (0.0025) [2025-01-04 01:15:51,414][134294] Updated weights for policy 0, policy_version 54114 (0.0014) [2025-01-04 01:15:53,906][134294] Updated weights for policy 0, policy_version 54124 (0.0021) [2025-01-04 01:15:53,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15018.7, 300 sec: 15245.5). Total num frames: 221691904. Throughput: 0: 3872.6. Samples: 44589712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:15:53,968][134211] Avg episode reward: [(0, '6.562')] [2025-01-04 01:15:56,982][134294] Updated weights for policy 0, policy_version 54134 (0.0027) [2025-01-04 01:15:58,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15018.6, 300 sec: 15245.4). Total num frames: 221757440. Throughput: 0: 3805.9. Samples: 44610372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:15:58,968][134211] Avg episode reward: [(0, '7.107')] [2025-01-04 01:15:59,978][134294] Updated weights for policy 0, policy_version 54144 (0.0026) [2025-01-04 01:16:01,963][134294] Updated weights for policy 0, policy_version 54154 (0.0014) [2025-01-04 01:16:03,800][134294] Updated weights for policy 0, policy_version 54164 (0.0012) [2025-01-04 01:16:03,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15564.9, 300 sec: 15301.0). Total num frames: 221855744. Throughput: 0: 3856.7. Samples: 44622842. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:03,968][134211] Avg episode reward: [(0, '6.671')] [2025-01-04 01:16:05,722][134294] Updated weights for policy 0, policy_version 54174 (0.0014) [2025-01-04 01:16:07,657][134294] Updated weights for policy 0, policy_version 54184 (0.0012) [2025-01-04 01:16:08,968][134211] Fps is (10 sec: 20889.9, 60 sec: 16247.6, 300 sec: 15314.9). Total num frames: 221966336. Throughput: 0: 4040.1. Samples: 44655372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:08,968][134211] Avg episode reward: [(0, '6.996')] [2025-01-04 01:16:09,665][134294] Updated weights for policy 0, policy_version 54194 (0.0013) [2025-01-04 01:16:12,826][134294] Updated weights for policy 0, policy_version 54204 (0.0026) [2025-01-04 01:16:13,968][134211] Fps is (10 sec: 17612.2, 60 sec: 15974.4, 300 sec: 15217.7). Total num frames: 222031872. Throughput: 0: 3846.2. Samples: 44679076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:13,968][134211] Avg episode reward: [(0, '7.130')] [2025-01-04 01:16:16,115][134294] Updated weights for policy 0, policy_version 54214 (0.0026) [2025-01-04 01:16:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15974.4, 300 sec: 15231.6). Total num frames: 222097408. Throughput: 0: 3829.3. Samples: 44688738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:18,968][134211] Avg episode reward: [(0, '6.021')] [2025-01-04 01:16:19,304][134294] Updated weights for policy 0, policy_version 54224 (0.0026) [2025-01-04 01:16:22,450][134294] Updated weights for policy 0, policy_version 54234 (0.0025) [2025-01-04 01:16:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15359.9, 300 sec: 15217.7). Total num frames: 222158848. Throughput: 0: 3824.3. Samples: 44707924. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:23,969][134211] Avg episode reward: [(0, '6.679')] [2025-01-04 01:16:25,612][134294] Updated weights for policy 0, policy_version 54244 (0.0026) [2025-01-04 01:16:28,500][134294] Updated weights for policy 0, policy_version 54254 (0.0026) [2025-01-04 01:16:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14814.0, 300 sec: 15231.6). Total num frames: 222228480. Throughput: 0: 3842.1. Samples: 44728120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:28,968][134211] Avg episode reward: [(0, '7.190')] [2025-01-04 01:16:31,474][134294] Updated weights for policy 0, policy_version 54264 (0.0023) [2025-01-04 01:16:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.8, 300 sec: 15231.6). Total num frames: 222298112. Throughput: 0: 3842.6. Samples: 44738358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:33,969][134211] Avg episode reward: [(0, '6.874')] [2025-01-04 01:16:34,620][134294] Updated weights for policy 0, policy_version 54274 (0.0026) [2025-01-04 01:16:36,980][134294] Updated weights for policy 0, policy_version 54284 (0.0018) [2025-01-04 01:16:38,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15223.5, 300 sec: 15301.0). Total num frames: 222384128. Throughput: 0: 3806.0. Samples: 44760980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:38,968][134211] Avg episode reward: [(0, '6.964')] [2025-01-04 01:16:39,126][134294] Updated weights for policy 0, policy_version 54294 (0.0012) [2025-01-04 01:16:41,178][134294] Updated weights for policy 0, policy_version 54304 (0.0015) [2025-01-04 01:16:43,157][134294] Updated weights for policy 0, policy_version 54314 (0.0012) [2025-01-04 01:16:43,967][134211] Fps is (10 sec: 18842.2, 60 sec: 15837.9, 300 sec: 15342.6). Total num frames: 222486528. Throughput: 0: 4010.5. Samples: 44790846. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:43,968][134211] Avg episode reward: [(0, '6.742')] [2025-01-04 01:16:45,211][134294] Updated weights for policy 0, policy_version 54324 (0.0014) [2025-01-04 01:16:48,232][134294] Updated weights for policy 0, policy_version 54334 (0.0028) [2025-01-04 01:16:48,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15906.1, 300 sec: 15231.6). Total num frames: 222560256. Throughput: 0: 4039.7. Samples: 44804630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:48,968][134211] Avg episode reward: [(0, '7.356')] [2025-01-04 01:16:51,421][134294] Updated weights for policy 0, policy_version 54344 (0.0027) [2025-01-04 01:16:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15496.5, 300 sec: 15120.5). Total num frames: 222621696. Throughput: 0: 3743.9. Samples: 44823850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:53,968][134211] Avg episode reward: [(0, '6.948')] [2025-01-04 01:16:54,622][134294] Updated weights for policy 0, policy_version 54354 (0.0026) [2025-01-04 01:16:57,683][134294] Updated weights for policy 0, policy_version 54364 (0.0026) [2025-01-04 01:16:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.5, 300 sec: 15134.4). Total num frames: 222687232. Throughput: 0: 3653.8. Samples: 44843496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:16:58,968][134211] Avg episode reward: [(0, '6.990')] [2025-01-04 01:17:00,771][134294] Updated weights for policy 0, policy_version 54374 (0.0026) [2025-01-04 01:17:03,862][134294] Updated weights for policy 0, policy_version 54384 (0.0024) [2025-01-04 01:17:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.6, 300 sec: 15162.1). Total num frames: 222756864. Throughput: 0: 3667.0. Samples: 44853754. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:03,969][134211] Avg episode reward: [(0, '7.013')] [2025-01-04 01:17:06,620][134294] Updated weights for policy 0, policy_version 54394 (0.0023) [2025-01-04 01:17:08,556][134294] Updated weights for policy 0, policy_version 54404 (0.0013) [2025-01-04 01:17:08,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14609.1, 300 sec: 15231.6). Total num frames: 222842880. Throughput: 0: 3729.7. Samples: 44875760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:08,968][134211] Avg episode reward: [(0, '7.321')] [2025-01-04 01:17:11,586][134294] Updated weights for policy 0, policy_version 54414 (0.0023) [2025-01-04 01:17:13,970][134211] Fps is (10 sec: 15152.5, 60 sec: 14608.6, 300 sec: 15245.4). Total num frames: 222908416. Throughput: 0: 3777.9. Samples: 44898134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:13,970][134211] Avg episode reward: [(0, '7.408')] [2025-01-04 01:17:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000054421_222908416.pth... [2025-01-04 01:17:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053528_219250688.pth [2025-01-04 01:17:14,952][134294] Updated weights for policy 0, policy_version 54424 (0.0025) [2025-01-04 01:17:17,449][134294] Updated weights for policy 0, policy_version 54434 (0.0019) [2025-01-04 01:17:18,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14950.4, 300 sec: 15273.2). Total num frames: 222994432. Throughput: 0: 3756.4. Samples: 44907396. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:18,968][134211] Avg episode reward: [(0, '7.432')] [2025-01-04 01:17:19,336][134294] Updated weights for policy 0, policy_version 54444 (0.0014) [2025-01-04 01:17:21,251][134294] Updated weights for policy 0, policy_version 54454 (0.0013) [2025-01-04 01:17:23,140][134294] Updated weights for policy 0, policy_version 54464 (0.0012) [2025-01-04 01:17:23,967][134211] Fps is (10 sec: 19255.3, 60 sec: 15701.4, 300 sec: 15412.1). Total num frames: 223100928. Throughput: 0: 3959.9. Samples: 44939174. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:23,968][134211] Avg episode reward: [(0, '7.032')] [2025-01-04 01:17:25,312][134294] Updated weights for policy 0, policy_version 54474 (0.0019) [2025-01-04 01:17:28,371][134294] Updated weights for policy 0, policy_version 54484 (0.0025) [2025-01-04 01:17:28,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15701.3, 300 sec: 15314.9). Total num frames: 223170560. Throughput: 0: 3848.4. Samples: 44964024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:28,969][134211] Avg episode reward: [(0, '7.017')] [2025-01-04 01:17:31,595][134294] Updated weights for policy 0, policy_version 54494 (0.0026) [2025-01-04 01:17:33,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15633.1, 300 sec: 15273.2). Total num frames: 223236096. Throughput: 0: 3758.2. Samples: 44973748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:17:33,968][134211] Avg episode reward: [(0, '6.799')] [2025-01-04 01:17:34,804][134294] Updated weights for policy 0, policy_version 54504 (0.0026) [2025-01-04 01:17:37,905][134294] Updated weights for policy 0, policy_version 54514 (0.0023) [2025-01-04 01:17:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15291.7, 300 sec: 15301.0). Total num frames: 223301632. Throughput: 0: 3763.3. Samples: 44993196. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:17:38,970][134211] Avg episode reward: [(0, '6.493')] [2025-01-04 01:17:41,104][134294] Updated weights for policy 0, policy_version 54524 (0.0026) [2025-01-04 01:17:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14677.3, 300 sec: 15245.4). Total num frames: 223367168. Throughput: 0: 3758.9. Samples: 45012648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:17:43,968][134211] Avg episode reward: [(0, '7.497')] [2025-01-04 01:17:44,149][134294] Updated weights for policy 0, policy_version 54534 (0.0026) [2025-01-04 01:17:46,478][134294] Updated weights for policy 0, policy_version 54544 (0.0016) [2025-01-04 01:17:48,387][134294] Updated weights for policy 0, policy_version 54554 (0.0013) [2025-01-04 01:17:48,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15086.9, 300 sec: 15356.5). Total num frames: 223465472. Throughput: 0: 3802.2. Samples: 45024854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:17:48,968][134211] Avg episode reward: [(0, '6.862')] [2025-01-04 01:17:50,244][134294] Updated weights for policy 0, policy_version 54564 (0.0014) [2025-01-04 01:17:52,105][134294] Updated weights for policy 0, policy_version 54574 (0.0014) [2025-01-04 01:17:53,968][134211] Fps is (10 sec: 20070.4, 60 sec: 15769.6, 300 sec: 15439.8). Total num frames: 223567872. Throughput: 0: 4032.6. Samples: 45057228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:17:53,968][134211] Avg episode reward: [(0, '7.832')] [2025-01-04 01:17:54,639][134294] Updated weights for policy 0, policy_version 54584 (0.0021) [2025-01-04 01:17:57,733][134294] Updated weights for policy 0, policy_version 54594 (0.0026) [2025-01-04 01:17:58,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15701.3, 300 sec: 15287.1). Total num frames: 223629312. Throughput: 0: 4014.3. Samples: 45078772. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:17:58,968][134211] Avg episode reward: [(0, '7.240')] [2025-01-04 01:18:00,916][134294] Updated weights for policy 0, policy_version 54604 (0.0029) [2025-01-04 01:18:03,923][134294] Updated weights for policy 0, policy_version 54614 (0.0027) [2025-01-04 01:18:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15701.3, 300 sec: 15301.0). Total num frames: 223698944. Throughput: 0: 4028.0. Samples: 45088656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:03,969][134211] Avg episode reward: [(0, '6.606')] [2025-01-04 01:18:06,991][134294] Updated weights for policy 0, policy_version 54624 (0.0025) [2025-01-04 01:18:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15301.0). Total num frames: 223764480. Throughput: 0: 3768.3. Samples: 45108750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:08,968][134211] Avg episode reward: [(0, '6.797')] [2025-01-04 01:18:10,115][134294] Updated weights for policy 0, policy_version 54634 (0.0025) [2025-01-04 01:18:13,130][134294] Updated weights for policy 0, policy_version 54644 (0.0027) [2025-01-04 01:18:13,969][134211] Fps is (10 sec: 13106.1, 60 sec: 15360.2, 300 sec: 15300.9). Total num frames: 223830016. Throughput: 0: 3660.9. Samples: 45128768. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:13,969][134211] Avg episode reward: [(0, '7.799')] [2025-01-04 01:18:15,658][134294] Updated weights for policy 0, policy_version 54654 (0.0019) [2025-01-04 01:18:17,620][134294] Updated weights for policy 0, policy_version 54664 (0.0014) [2025-01-04 01:18:18,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15564.8, 300 sec: 15412.1). Total num frames: 223928320. Throughput: 0: 3734.4. Samples: 45141796. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:18,968][134211] Avg episode reward: [(0, '6.007')] [2025-01-04 01:18:19,645][134294] Updated weights for policy 0, policy_version 54674 (0.0014) [2025-01-04 01:18:21,569][134294] Updated weights for policy 0, policy_version 54684 (0.0013) [2025-01-04 01:18:23,560][134294] Updated weights for policy 0, policy_version 54694 (0.0013) [2025-01-04 01:18:23,968][134211] Fps is (10 sec: 20482.5, 60 sec: 15564.8, 300 sec: 15481.5). Total num frames: 224034816. Throughput: 0: 3991.8. Samples: 45172828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:23,968][134211] Avg episode reward: [(0, '7.331')] [2025-01-04 01:18:25,624][134294] Updated weights for policy 0, policy_version 54704 (0.0013) [2025-01-04 01:18:28,732][134294] Updated weights for policy 0, policy_version 54714 (0.0026) [2025-01-04 01:18:28,969][134211] Fps is (10 sec: 18019.7, 60 sec: 15632.8, 300 sec: 15412.1). Total num frames: 224108544. Throughput: 0: 4137.8. Samples: 45198852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:28,970][134211] Avg episode reward: [(0, '6.857')] [2025-01-04 01:18:31,995][134294] Updated weights for policy 0, policy_version 54724 (0.0026) [2025-01-04 01:18:33,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15633.0, 300 sec: 15425.9). Total num frames: 224174080. Throughput: 0: 4065.1. Samples: 45207784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:33,968][134211] Avg episode reward: [(0, '6.207')] [2025-01-04 01:18:35,127][134294] Updated weights for policy 0, policy_version 54734 (0.0026) [2025-01-04 01:18:38,194][134294] Updated weights for policy 0, policy_version 54744 (0.0028) [2025-01-04 01:18:38,968][134211] Fps is (10 sec: 13108.8, 60 sec: 15633.0, 300 sec: 15412.1). Total num frames: 224239616. Throughput: 0: 3792.3. Samples: 45227882. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:18:38,968][134211] Avg episode reward: [(0, '6.479')] [2025-01-04 01:18:41,743][134294] Updated weights for policy 0, policy_version 54754 (0.0027) [2025-01-04 01:18:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15496.5, 300 sec: 15273.2). Total num frames: 224296960. Throughput: 0: 3699.6. Samples: 45245254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:43,968][134211] Avg episode reward: [(0, '6.919')] [2025-01-04 01:18:45,301][134294] Updated weights for policy 0, policy_version 54764 (0.0026) [2025-01-04 01:18:48,606][134294] Updated weights for policy 0, policy_version 54774 (0.0028) [2025-01-04 01:18:48,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14813.9, 300 sec: 15217.7). Total num frames: 224354304. Throughput: 0: 3677.6. Samples: 45254146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:48,968][134211] Avg episode reward: [(0, '6.900')] [2025-01-04 01:18:50,861][134294] Updated weights for policy 0, policy_version 54784 (0.0016) [2025-01-04 01:18:53,333][134294] Updated weights for policy 0, policy_version 54794 (0.0020) [2025-01-04 01:18:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14540.8, 300 sec: 15301.0). Total num frames: 224440320. Throughput: 0: 3759.4. Samples: 45277922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:53,968][134211] Avg episode reward: [(0, '7.293')] [2025-01-04 01:18:56,350][134294] Updated weights for policy 0, policy_version 54804 (0.0027) [2025-01-04 01:18:58,350][134294] Updated weights for policy 0, policy_version 54814 (0.0013) [2025-01-04 01:18:58,967][134211] Fps is (10 sec: 17613.3, 60 sec: 15018.7, 300 sec: 15370.4). Total num frames: 224530432. Throughput: 0: 3847.0. Samples: 45301878. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:18:58,968][134211] Avg episode reward: [(0, '6.686')] [2025-01-04 01:19:00,196][134294] Updated weights for policy 0, policy_version 54824 (0.0013) [2025-01-04 01:19:02,101][134294] Updated weights for policy 0, policy_version 54834 (0.0013) [2025-01-04 01:19:03,968][134211] Fps is (10 sec: 19661.1, 60 sec: 15633.2, 300 sec: 15398.2). Total num frames: 224636928. Throughput: 0: 3921.9. Samples: 45318282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:03,968][134211] Avg episode reward: [(0, '6.853')] [2025-01-04 01:19:04,030][134294] Updated weights for policy 0, policy_version 54844 (0.0014) [2025-01-04 01:19:06,831][134294] Updated weights for policy 0, policy_version 54854 (0.0025) [2025-01-04 01:19:08,968][134211] Fps is (10 sec: 17612.1, 60 sec: 15701.3, 300 sec: 15314.9). Total num frames: 224706560. Throughput: 0: 3818.2. Samples: 45344646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:08,968][134211] Avg episode reward: [(0, '6.262')] [2025-01-04 01:19:10,121][134294] Updated weights for policy 0, policy_version 54864 (0.0029) [2025-01-04 01:19:13,247][134294] Updated weights for policy 0, policy_version 54874 (0.0024) [2025-01-04 01:19:13,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15701.6, 300 sec: 15328.8). Total num frames: 224772096. Throughput: 0: 3668.9. Samples: 45363948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:13,969][134211] Avg episode reward: [(0, '6.495')] [2025-01-04 01:19:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000054876_224772096.pth... 
[2025-01-04 01:19:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000053974_221077504.pth [2025-01-04 01:19:16,387][134294] Updated weights for policy 0, policy_version 54884 (0.0028) [2025-01-04 01:19:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.1, 300 sec: 15328.8). Total num frames: 224837632. Throughput: 0: 3687.1. Samples: 45373702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:18,968][134211] Avg episode reward: [(0, '6.504')] [2025-01-04 01:19:19,356][134294] Updated weights for policy 0, policy_version 54894 (0.0023) [2025-01-04 01:19:22,515][134294] Updated weights for policy 0, policy_version 54904 (0.0025) [2025-01-04 01:19:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 15314.9). Total num frames: 224903168. Throughput: 0: 3686.2. Samples: 45393760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:23,968][134211] Avg episode reward: [(0, '6.956')] [2025-01-04 01:19:25,596][134294] Updated weights for policy 0, policy_version 54914 (0.0027) [2025-01-04 01:19:28,570][134294] Updated weights for policy 0, policy_version 54924 (0.0024) [2025-01-04 01:19:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.6, 300 sec: 15342.6). Total num frames: 224972800. Throughput: 0: 3751.4. Samples: 45414068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:28,968][134211] Avg episode reward: [(0, '6.370')] [2025-01-04 01:19:31,525][134294] Updated weights for policy 0, policy_version 54934 (0.0026) [2025-01-04 01:19:33,681][134294] Updated weights for policy 0, policy_version 54944 (0.0015) [2025-01-04 01:19:33,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14677.4, 300 sec: 15301.0). Total num frames: 225054720. Throughput: 0: 3783.3. Samples: 45424396. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:33,968][134211] Avg episode reward: [(0, '6.518')] [2025-01-04 01:19:35,625][134294] Updated weights for policy 0, policy_version 54954 (0.0014) [2025-01-04 01:19:37,475][134294] Updated weights for policy 0, policy_version 54964 (0.0014) [2025-01-04 01:19:38,967][134211] Fps is (10 sec: 18841.9, 60 sec: 15360.1, 300 sec: 15301.0). Total num frames: 225161216. Throughput: 0: 3929.2. Samples: 45454736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:38,968][134211] Avg episode reward: [(0, '7.002')] [2025-01-04 01:19:39,398][134294] Updated weights for policy 0, policy_version 54974 (0.0012) [2025-01-04 01:19:42,033][134294] Updated weights for policy 0, policy_version 54984 (0.0025) [2025-01-04 01:19:43,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15633.0, 300 sec: 15287.1). Total num frames: 225234944. Throughput: 0: 3961.2. Samples: 45480134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:43,969][134211] Avg episode reward: [(0, '6.223')] [2025-01-04 01:19:45,313][134294] Updated weights for policy 0, policy_version 54994 (0.0027) [2025-01-04 01:19:48,382][134294] Updated weights for policy 0, policy_version 55004 (0.0026) [2025-01-04 01:19:48,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15769.6, 300 sec: 15287.1). Total num frames: 225300480. Throughput: 0: 3815.0. Samples: 45489958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:48,969][134211] Avg episode reward: [(0, '7.512')] [2025-01-04 01:19:51,506][134294] Updated weights for policy 0, policy_version 55014 (0.0028) [2025-01-04 01:19:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.2, 300 sec: 15287.1). Total num frames: 225366016. Throughput: 0: 3667.2. Samples: 45509672. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:53,968][134211] Avg episode reward: [(0, '6.682')] [2025-01-04 01:19:54,711][134294] Updated weights for policy 0, policy_version 55024 (0.0026) [2025-01-04 01:19:57,333][134294] Updated weights for policy 0, policy_version 55034 (0.0020) [2025-01-04 01:19:58,967][134211] Fps is (10 sec: 15155.7, 60 sec: 15360.0, 300 sec: 15356.5). Total num frames: 225452032. Throughput: 0: 3746.0. Samples: 45532518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:19:58,968][134211] Avg episode reward: [(0, '7.414')] [2025-01-04 01:19:59,273][134294] Updated weights for policy 0, policy_version 55044 (0.0014) [2025-01-04 01:20:01,202][134294] Updated weights for policy 0, policy_version 55054 (0.0012) [2025-01-04 01:20:03,311][134294] Updated weights for policy 0, policy_version 55064 (0.0012) [2025-01-04 01:20:03,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15223.5, 300 sec: 15453.7). Total num frames: 225550336. Throughput: 0: 3878.5. Samples: 45548234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:20:03,968][134211] Avg episode reward: [(0, '6.916')] [2025-01-04 01:20:06,270][134294] Updated weights for policy 0, policy_version 55074 (0.0025) [2025-01-04 01:20:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15155.2, 300 sec: 15398.2). Total num frames: 225615872. Throughput: 0: 3959.3. Samples: 45571930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:20:08,968][134211] Avg episode reward: [(0, '7.499')] [2025-01-04 01:20:09,612][134294] Updated weights for policy 0, policy_version 55084 (0.0030) [2025-01-04 01:20:12,942][134294] Updated weights for policy 0, policy_version 55094 (0.0024) [2025-01-04 01:20:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15087.0, 300 sec: 15384.3). Total num frames: 225677312. Throughput: 0: 3910.7. Samples: 45590048. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:20:13,968][134211] Avg episode reward: [(0, '6.865')] [2025-01-04 01:20:16,031][134294] Updated weights for policy 0, policy_version 55104 (0.0026) [2025-01-04 01:20:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15086.9, 300 sec: 15273.2). Total num frames: 225742848. Throughput: 0: 3906.4. Samples: 45600184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:20:18,968][134211] Avg episode reward: [(0, '6.766')] [2025-01-04 01:20:19,203][134294] Updated weights for policy 0, policy_version 55114 (0.0028) [2025-01-04 01:20:22,221][134294] Updated weights for policy 0, policy_version 55124 (0.0024) [2025-01-04 01:20:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15360.0, 300 sec: 15203.8). Total num frames: 225824768. Throughput: 0: 3677.1. Samples: 45620208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:20:23,968][134211] Avg episode reward: [(0, '5.989')] [2025-01-04 01:20:24,102][134294] Updated weights for policy 0, policy_version 55134 (0.0013) [2025-01-04 01:20:25,963][134294] Updated weights for policy 0, policy_version 55144 (0.0014) [2025-01-04 01:20:28,702][134294] Updated weights for policy 0, policy_version 55154 (0.0023) [2025-01-04 01:20:28,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15633.0, 300 sec: 15259.3). Total num frames: 225910784. Throughput: 0: 3754.3. Samples: 45649080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:28,969][134211] Avg episode reward: [(0, '7.290')] [2025-01-04 01:20:31,869][134294] Updated weights for policy 0, policy_version 55164 (0.0027) [2025-01-04 01:20:33,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15359.9, 300 sec: 15273.2). Total num frames: 225976320. Throughput: 0: 3752.5. Samples: 45658820. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:33,968][134211] Avg episode reward: [(0, '6.141')] [2025-01-04 01:20:35,036][134294] Updated weights for policy 0, policy_version 55174 (0.0028) [2025-01-04 01:20:37,445][134294] Updated weights for policy 0, policy_version 55184 (0.0017) [2025-01-04 01:20:38,967][134211] Fps is (10 sec: 15155.8, 60 sec: 15018.7, 300 sec: 15342.7). Total num frames: 226062336. Throughput: 0: 3791.6. Samples: 45680294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:38,968][134211] Avg episode reward: [(0, '6.198')] [2025-01-04 01:20:39,593][134294] Updated weights for policy 0, policy_version 55194 (0.0013) [2025-01-04 01:20:41,637][134294] Updated weights for policy 0, policy_version 55204 (0.0014) [2025-01-04 01:20:43,688][134294] Updated weights for policy 0, policy_version 55214 (0.0014) [2025-01-04 01:20:43,968][134211] Fps is (10 sec: 18432.3, 60 sec: 15428.3, 300 sec: 15439.8). Total num frames: 226160640. Throughput: 0: 3948.3. Samples: 45710190. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:43,968][134211] Avg episode reward: [(0, '6.247')] [2025-01-04 01:20:46,814][134294] Updated weights for policy 0, policy_version 55224 (0.0026) [2025-01-04 01:20:48,969][134211] Fps is (10 sec: 15971.8, 60 sec: 15359.7, 300 sec: 15356.4). Total num frames: 226222080. Throughput: 0: 3832.7. Samples: 45720710. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:48,970][134211] Avg episode reward: [(0, '6.504')] [2025-01-04 01:20:50,008][134294] Updated weights for policy 0, policy_version 55234 (0.0026) [2025-01-04 01:20:53,137][134294] Updated weights for policy 0, policy_version 55244 (0.0026) [2025-01-04 01:20:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15360.0, 300 sec: 15356.5). Total num frames: 226287616. Throughput: 0: 3730.9. Samples: 45739820. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:53,968][134211] Avg episode reward: [(0, '6.431')] [2025-01-04 01:20:56,209][134294] Updated weights for policy 0, policy_version 55254 (0.0026) [2025-01-04 01:20:58,968][134211] Fps is (10 sec: 13109.0, 60 sec: 15018.6, 300 sec: 15245.4). Total num frames: 226353152. Throughput: 0: 3763.4. Samples: 45759400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 01:20:58,968][134211] Avg episode reward: [(0, '6.753')] [2025-01-04 01:20:59,425][134294] Updated weights for policy 0, policy_version 55264 (0.0024) [2025-01-04 01:21:02,523][134294] Updated weights for policy 0, policy_version 55274 (0.0028) [2025-01-04 01:21:03,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14472.4, 300 sec: 15092.7). Total num frames: 226418688. Throughput: 0: 3759.1. Samples: 45769344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:03,969][134211] Avg episode reward: [(0, '6.511')] [2025-01-04 01:21:05,438][134294] Updated weights for policy 0, policy_version 55284 (0.0022) [2025-01-04 01:21:07,323][134294] Updated weights for policy 0, policy_version 55294 (0.0012) [2025-01-04 01:21:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15018.7, 300 sec: 15203.8). Total num frames: 226516992. Throughput: 0: 3853.5. Samples: 45793614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:08,968][134211] Avg episode reward: [(0, '7.270')] [2025-01-04 01:21:09,197][134294] Updated weights for policy 0, policy_version 55304 (0.0014) [2025-01-04 01:21:11,062][134294] Updated weights for policy 0, policy_version 55314 (0.0013) [2025-01-04 01:21:13,072][134294] Updated weights for policy 0, policy_version 55324 (0.0014) [2025-01-04 01:21:13,968][134211] Fps is (10 sec: 20070.4, 60 sec: 15701.2, 300 sec: 15328.7). Total num frames: 226619392. Throughput: 0: 3920.7. Samples: 45825514. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:13,969][134211] Avg episode reward: [(0, '6.778')] [2025-01-04 01:21:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000055327_226619392.pth... [2025-01-04 01:21:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000054421_222908416.pth [2025-01-04 01:21:16,142][134294] Updated weights for policy 0, policy_version 55334 (0.0026) [2025-01-04 01:21:18,968][134211] Fps is (10 sec: 16383.4, 60 sec: 15633.0, 300 sec: 15328.8). Total num frames: 226680832. Throughput: 0: 3917.1. Samples: 45835088. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:18,969][134211] Avg episode reward: [(0, '6.669')] [2025-01-04 01:21:19,459][134294] Updated weights for policy 0, policy_version 55344 (0.0023) [2025-01-04 01:21:22,552][134294] Updated weights for policy 0, policy_version 55354 (0.0026) [2025-01-04 01:21:23,968][134211] Fps is (10 sec: 12698.1, 60 sec: 15360.0, 300 sec: 15314.9). Total num frames: 226746368. Throughput: 0: 3868.0. Samples: 45854354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:23,968][134211] Avg episode reward: [(0, '6.616')] [2025-01-04 01:21:25,668][134294] Updated weights for policy 0, policy_version 55364 (0.0027) [2025-01-04 01:21:28,636][134294] Updated weights for policy 0, policy_version 55374 (0.0029) [2025-01-04 01:21:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15018.7, 300 sec: 15301.0). Total num frames: 226811904. Throughput: 0: 3655.6. Samples: 45874692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:28,968][134211] Avg episode reward: [(0, '7.164')] [2025-01-04 01:21:31,707][134294] Updated weights for policy 0, policy_version 55384 (0.0027) [2025-01-04 01:21:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15087.0, 300 sec: 15245.4). Total num frames: 226881536. 
Throughput: 0: 3645.5. Samples: 45884752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:21:33,968][134211] Avg episode reward: [(0, '7.229')] [2025-01-04 01:21:34,728][134294] Updated weights for policy 0, policy_version 55394 (0.0025) [2025-01-04 01:21:36,568][134294] Updated weights for policy 0, policy_version 55404 (0.0013) [2025-01-04 01:21:38,514][134294] Updated weights for policy 0, policy_version 55414 (0.0014) [2025-01-04 01:21:38,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15360.0, 300 sec: 15245.4). Total num frames: 226983936. Throughput: 0: 3784.7. Samples: 45910132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:21:38,968][134211] Avg episode reward: [(0, '7.255')] [2025-01-04 01:21:41,176][134294] Updated weights for policy 0, policy_version 55424 (0.0026) [2025-01-04 01:21:43,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14882.1, 300 sec: 15231.6). Total num frames: 227053568. Throughput: 0: 3882.6. Samples: 45934118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:21:43,969][134211] Avg episode reward: [(0, '6.405')] [2025-01-04 01:21:44,240][134294] Updated weights for policy 0, policy_version 55434 (0.0027) [2025-01-04 01:21:47,283][134294] Updated weights for policy 0, policy_version 55444 (0.0028) [2025-01-04 01:21:48,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14950.6, 300 sec: 15245.4). Total num frames: 227119104. Throughput: 0: 3879.9. Samples: 45943942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:21:48,969][134211] Avg episode reward: [(0, '7.118')] [2025-01-04 01:21:50,181][134294] Updated weights for policy 0, policy_version 55454 (0.0022) [2025-01-04 01:21:52,125][134294] Updated weights for policy 0, policy_version 55464 (0.0013) [2025-01-04 01:21:53,961][134294] Updated weights for policy 0, policy_version 55474 (0.0013) [2025-01-04 01:21:53,967][134211] Fps is (10 sec: 16794.1, 60 sec: 15564.8, 300 sec: 15370.4). Total num frames: 227221504. Throughput: 0: 3891.5. 
Samples: 45968732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:21:53,968][134211] Avg episode reward: [(0, '7.173')] [2025-01-04 01:21:55,887][134294] Updated weights for policy 0, policy_version 55484 (0.0013) [2025-01-04 01:21:57,737][134294] Updated weights for policy 0, policy_version 55494 (0.0013) [2025-01-04 01:21:58,967][134211] Fps is (10 sec: 20891.3, 60 sec: 16247.5, 300 sec: 15495.4). Total num frames: 227328000. Throughput: 0: 3910.1. Samples: 46001464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:21:58,968][134211] Avg episode reward: [(0, '7.179')] [2025-01-04 01:21:59,586][134294] Updated weights for policy 0, policy_version 55504 (0.0014) [2025-01-04 01:22:01,528][134294] Updated weights for policy 0, policy_version 55514 (0.0013) [2025-01-04 01:22:03,540][134294] Updated weights for policy 0, policy_version 55524 (0.0014) [2025-01-04 01:22:03,968][134211] Fps is (10 sec: 20889.4, 60 sec: 16862.0, 300 sec: 15550.9). Total num frames: 227430400. Throughput: 0: 4062.1. Samples: 46017882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:03,968][134211] Avg episode reward: [(0, '7.364')] [2025-01-04 01:22:06,606][134294] Updated weights for policy 0, policy_version 55534 (0.0026) [2025-01-04 01:22:08,968][134211] Fps is (10 sec: 16383.5, 60 sec: 16247.4, 300 sec: 15537.1). Total num frames: 227491840. Throughput: 0: 4163.9. Samples: 46041728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:08,968][134211] Avg episode reward: [(0, '6.858')] [2025-01-04 01:22:10,152][134294] Updated weights for policy 0, policy_version 55544 (0.0027) [2025-01-04 01:22:13,385][134294] Updated weights for policy 0, policy_version 55554 (0.0024) [2025-01-04 01:22:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15564.9, 300 sec: 15453.7). Total num frames: 227553280. Throughput: 0: 4114.1. Samples: 46059828. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:13,968][134211] Avg episode reward: [(0, '6.994')] [2025-01-04 01:22:16,739][134294] Updated weights for policy 0, policy_version 55564 (0.0028) [2025-01-04 01:22:18,968][134211] Fps is (10 sec: 12697.0, 60 sec: 15633.0, 300 sec: 15314.8). Total num frames: 227618816. Throughput: 0: 4091.7. Samples: 46068882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:18,969][134211] Avg episode reward: [(0, '6.543')] [2025-01-04 01:22:19,907][134294] Updated weights for policy 0, policy_version 55574 (0.0025) [2025-01-04 01:22:22,772][134294] Updated weights for policy 0, policy_version 55584 (0.0025) [2025-01-04 01:22:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15633.1, 300 sec: 15301.0). Total num frames: 227684352. Throughput: 0: 3974.3. Samples: 46088974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:23,968][134211] Avg episode reward: [(0, '7.502')] [2025-01-04 01:22:25,868][134294] Updated weights for policy 0, policy_version 55594 (0.0027) [2025-01-04 01:22:28,766][134294] Updated weights for policy 0, policy_version 55604 (0.0024) [2025-01-04 01:22:28,968][134211] Fps is (10 sec: 13517.5, 60 sec: 15701.4, 300 sec: 15314.9). Total num frames: 227753984. Throughput: 0: 3899.6. Samples: 46109598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:28,968][134211] Avg episode reward: [(0, '6.283')] [2025-01-04 01:22:31,816][134294] Updated weights for policy 0, policy_version 55614 (0.0028) [2025-01-04 01:22:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15701.3, 300 sec: 15328.8). Total num frames: 227823616. Throughput: 0: 3911.4. Samples: 46119954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:33,968][134211] Avg episode reward: [(0, '6.735')] [2025-01-04 01:22:34,720][134294] Updated weights for policy 0, policy_version 55624 (0.0025) [2025-01-04 01:22:37,822][134294] Updated weights for policy 0, policy_version 55634 (0.0025) [2025-01-04 01:22:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15086.9, 300 sec: 15328.8). Total num frames: 227889152. Throughput: 0: 3812.1. Samples: 46140278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:38,968][134211] Avg episode reward: [(0, '6.455')] [2025-01-04 01:22:40,589][134294] Updated weights for policy 0, policy_version 55644 (0.0021) [2025-01-04 01:22:42,710][134294] Updated weights for policy 0, policy_version 55654 (0.0012) [2025-01-04 01:22:43,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15496.6, 300 sec: 15314.9). Total num frames: 227983360. Throughput: 0: 3639.4. Samples: 46165236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:43,968][134211] Avg episode reward: [(0, '7.809')] [2025-01-04 01:22:44,796][134294] Updated weights for policy 0, policy_version 55664 (0.0013) [2025-01-04 01:22:46,688][134294] Updated weights for policy 0, policy_version 55674 (0.0013) [2025-01-04 01:22:48,630][134294] Updated weights for policy 0, policy_version 55684 (0.0013) [2025-01-04 01:22:48,967][134211] Fps is (10 sec: 19661.0, 60 sec: 16111.1, 300 sec: 15314.9). Total num frames: 228085760. Throughput: 0: 3618.1. Samples: 46180696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:48,968][134211] Avg episode reward: [(0, '7.352')] [2025-01-04 01:22:50,515][134294] Updated weights for policy 0, policy_version 55694 (0.0014) [2025-01-04 01:22:53,366][134294] Updated weights for policy 0, policy_version 55704 (0.0025) [2025-01-04 01:22:53,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15769.5, 300 sec: 15384.3). Total num frames: 228167680. Throughput: 0: 3745.3. Samples: 46210266. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:53,968][134211] Avg episode reward: [(0, '6.977')] [2025-01-04 01:22:56,431][134294] Updated weights for policy 0, policy_version 55714 (0.0027) [2025-01-04 01:22:58,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15086.9, 300 sec: 15370.4). Total num frames: 228233216. Throughput: 0: 3774.6. Samples: 46229684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:22:58,968][134211] Avg episode reward: [(0, '6.810')] [2025-01-04 01:22:59,656][134294] Updated weights for policy 0, policy_version 55724 (0.0026) [2025-01-04 01:23:02,897][134294] Updated weights for policy 0, policy_version 55734 (0.0029) [2025-01-04 01:23:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.5, 300 sec: 15370.4). Total num frames: 228298752. Throughput: 0: 3789.6. Samples: 46239414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:23:03,968][134211] Avg episode reward: [(0, '6.412')] [2025-01-04 01:23:05,908][134294] Updated weights for policy 0, policy_version 55744 (0.0025) [2025-01-04 01:23:08,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14540.7, 300 sec: 15370.5). Total num frames: 228364288. Throughput: 0: 3786.4. Samples: 46259364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:23:08,969][134211] Avg episode reward: [(0, '6.904')] [2025-01-04 01:23:08,970][134294] Updated weights for policy 0, policy_version 55754 (0.0026) [2025-01-04 01:23:11,890][134294] Updated weights for policy 0, policy_version 55764 (0.0023) [2025-01-04 01:23:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14677.3, 300 sec: 15273.2). Total num frames: 228433920. Throughput: 0: 3781.6. Samples: 46279770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:23:13,968][134211] Avg episode reward: [(0, '6.987')] [2025-01-04 01:23:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000055770_228433920.pth... 
[2025-01-04 01:23:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000054876_224772096.pth [2025-01-04 01:23:15,108][134294] Updated weights for policy 0, policy_version 55774 (0.0024) [2025-01-04 01:23:17,897][134294] Updated weights for policy 0, policy_version 55784 (0.0022) [2025-01-04 01:23:18,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14882.2, 300 sec: 15176.0). Total num frames: 228511744. Throughput: 0: 3769.0. Samples: 46289560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:18,968][134211] Avg episode reward: [(0, '6.688')] [2025-01-04 01:23:19,853][134294] Updated weights for policy 0, policy_version 55794 (0.0016) [2025-01-04 01:23:21,748][134294] Updated weights for policy 0, policy_version 55804 (0.0014) [2025-01-04 01:23:23,642][134294] Updated weights for policy 0, policy_version 55814 (0.0013) [2025-01-04 01:23:23,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15564.8, 300 sec: 15287.2). Total num frames: 228618240. Throughput: 0: 3966.8. Samples: 46318782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:23,968][134211] Avg episode reward: [(0, '6.565')] [2025-01-04 01:23:25,696][134294] Updated weights for policy 0, policy_version 55824 (0.0016) [2025-01-04 01:23:28,801][134294] Updated weights for policy 0, policy_version 55834 (0.0025) [2025-01-04 01:23:28,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15701.3, 300 sec: 15328.8). Total num frames: 228696064. Throughput: 0: 4001.9. Samples: 46345324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:28,968][134211] Avg episode reward: [(0, '6.902')] [2025-01-04 01:23:32,069][134294] Updated weights for policy 0, policy_version 55844 (0.0023) [2025-01-04 01:23:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 228757504. Throughput: 0: 3866.2. Samples: 46354678. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:33,968][134211] Avg episode reward: [(0, '6.863')] [2025-01-04 01:23:35,450][134294] Updated weights for policy 0, policy_version 55854 (0.0027) [2025-01-04 01:23:38,723][134294] Updated weights for policy 0, policy_version 55864 (0.0027) [2025-01-04 01:23:38,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15496.5, 300 sec: 15328.8). Total num frames: 228818944. Throughput: 0: 3620.4. Samples: 46373184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:38,968][134211] Avg episode reward: [(0, '6.342')] [2025-01-04 01:23:41,755][134294] Updated weights for policy 0, policy_version 55874 (0.0025) [2025-01-04 01:23:43,967][134211] Fps is (10 sec: 13517.1, 60 sec: 15155.2, 300 sec: 15384.3). Total num frames: 228892672. Throughput: 0: 3627.4. Samples: 46392916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:43,968][134211] Avg episode reward: [(0, '6.995')] [2025-01-04 01:23:44,250][134294] Updated weights for policy 0, policy_version 55884 (0.0019) [2025-01-04 01:23:46,091][134294] Updated weights for policy 0, policy_version 55894 (0.0015) [2025-01-04 01:23:47,978][134294] Updated weights for policy 0, policy_version 55904 (0.0015) [2025-01-04 01:23:48,968][134211] Fps is (10 sec: 18432.2, 60 sec: 15291.7, 300 sec: 15467.6). Total num frames: 229003264. Throughput: 0: 3768.4. Samples: 46408990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:23:48,968][134211] Avg episode reward: [(0, '7.493')] [2025-01-04 01:23:50,006][134294] Updated weights for policy 0, policy_version 55914 (0.0016) [2025-01-04 01:23:53,183][134294] Updated weights for policy 0, policy_version 55924 (0.0027) [2025-01-04 01:23:53,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15087.0, 300 sec: 15398.2). Total num frames: 229072896. Throughput: 0: 3935.2. Samples: 46436446. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:23:53,968][134211] Avg episode reward: [(0, '6.485')] [2025-01-04 01:23:56,309][134294] Updated weights for policy 0, policy_version 55934 (0.0025) [2025-01-04 01:23:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15087.0, 300 sec: 15259.3). Total num frames: 229138432. Throughput: 0: 3901.7. Samples: 46455344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:23:58,968][134211] Avg episode reward: [(0, '6.788')] [2025-01-04 01:23:59,618][134294] Updated weights for policy 0, policy_version 55944 (0.0031) [2025-01-04 01:24:02,663][134294] Updated weights for policy 0, policy_version 55954 (0.0028) [2025-01-04 01:24:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15018.6, 300 sec: 15231.6). Total num frames: 229199872. Throughput: 0: 3902.1. Samples: 46465152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:24:03,968][134211] Avg episode reward: [(0, '7.033')] [2025-01-04 01:24:05,765][134294] Updated weights for policy 0, policy_version 55964 (0.0025) [2025-01-04 01:24:08,259][134294] Updated weights for policy 0, policy_version 55974 (0.0017) [2025-01-04 01:24:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15291.9, 300 sec: 15287.1). Total num frames: 229281792. Throughput: 0: 3705.0. Samples: 46485506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:24:08,968][134211] Avg episode reward: [(0, '7.326')] [2025-01-04 01:24:10,440][134294] Updated weights for policy 0, policy_version 55984 (0.0019) [2025-01-04 01:24:13,486][134294] Updated weights for policy 0, policy_version 55994 (0.0025) [2025-01-04 01:24:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15360.1, 300 sec: 15314.9). Total num frames: 229355520. Throughput: 0: 3663.5. Samples: 46510180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:24:13,968][134211] Avg episode reward: [(0, '6.748')] [2025-01-04 01:24:16,694][134294] Updated weights for policy 0, policy_version 56004 (0.0024) [2025-01-04 01:24:18,686][134294] Updated weights for policy 0, policy_version 56014 (0.0013) [2025-01-04 01:24:18,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15428.4, 300 sec: 15370.4). Total num frames: 229437440. Throughput: 0: 3663.5. Samples: 46519534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:24:18,968][134211] Avg episode reward: [(0, '6.617')] [2025-01-04 01:24:20,586][134294] Updated weights for policy 0, policy_version 56024 (0.0015) [2025-01-04 01:24:22,440][134294] Updated weights for policy 0, policy_version 56034 (0.0014) [2025-01-04 01:24:23,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15428.3, 300 sec: 15495.4). Total num frames: 229543936. Throughput: 0: 3944.2. Samples: 46550674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:24:23,968][134211] Avg episode reward: [(0, '6.142')] [2025-01-04 01:24:24,371][134294] Updated weights for policy 0, policy_version 56044 (0.0013) [2025-01-04 01:24:26,509][134294] Updated weights for policy 0, policy_version 56054 (0.0013) [2025-01-04 01:24:28,970][134211] Fps is (10 sec: 18837.3, 60 sec: 15496.0, 300 sec: 15495.3). Total num frames: 229625856. Throughput: 0: 4112.4. Samples: 46577982. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:28,971][134211] Avg episode reward: [(0, '6.280')] [2025-01-04 01:24:29,937][134294] Updated weights for policy 0, policy_version 56064 (0.0028) [2025-01-04 01:24:33,141][134294] Updated weights for policy 0, policy_version 56074 (0.0030) [2025-01-04 01:24:33,968][134211] Fps is (10 sec: 14335.4, 60 sec: 15496.5, 300 sec: 15342.6). Total num frames: 229687296. Throughput: 0: 3948.5. Samples: 46586672. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:33,969][134211] Avg episode reward: [(0, '6.255')] [2025-01-04 01:24:36,323][134294] Updated weights for policy 0, policy_version 56084 (0.0025) [2025-01-04 01:24:38,968][134211] Fps is (10 sec: 12700.2, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 229752832. Throughput: 0: 3769.1. Samples: 46606056. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:38,968][134211] Avg episode reward: [(0, '6.666')] [2025-01-04 01:24:39,545][134294] Updated weights for policy 0, policy_version 56094 (0.0026) [2025-01-04 01:24:43,143][134294] Updated weights for policy 0, policy_version 56104 (0.0027) [2025-01-04 01:24:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15291.7, 300 sec: 15287.1). Total num frames: 229810176. Throughput: 0: 3748.0. Samples: 46624006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:43,968][134211] Avg episode reward: [(0, '6.917')] [2025-01-04 01:24:46,046][134294] Updated weights for policy 0, policy_version 56114 (0.0022) [2025-01-04 01:24:48,131][134294] Updated weights for policy 0, policy_version 56124 (0.0015) [2025-01-04 01:24:48,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14882.1, 300 sec: 15356.5). Total num frames: 229896192. Throughput: 0: 3764.7. Samples: 46634562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:48,968][134211] Avg episode reward: [(0, '7.226')] [2025-01-04 01:24:50,144][134294] Updated weights for policy 0, policy_version 56134 (0.0013) [2025-01-04 01:24:52,051][134294] Updated weights for policy 0, policy_version 56144 (0.0012) [2025-01-04 01:24:53,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15360.0, 300 sec: 15398.2). Total num frames: 229994496. Throughput: 0: 4001.0. Samples: 46665550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:53,969][134211] Avg episode reward: [(0, '6.780')] [2025-01-04 01:24:54,627][134294] Updated weights for policy 0, policy_version 56154 (0.0022) [2025-01-04 01:24:57,861][134294] Updated weights for policy 0, policy_version 56164 (0.0027) [2025-01-04 01:24:58,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15360.0, 300 sec: 15287.1). Total num frames: 230060032. Throughput: 0: 3914.1. Samples: 46686316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:24:58,968][134211] Avg episode reward: [(0, '7.474')] [2025-01-04 01:25:01,148][134294] Updated weights for policy 0, policy_version 56174 (0.0025) [2025-01-04 01:25:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15291.7, 300 sec: 15259.3). Total num frames: 230117376. Throughput: 0: 3906.0. Samples: 46695304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:03,968][134211] Avg episode reward: [(0, '7.041')] [2025-01-04 01:25:04,618][134294] Updated weights for policy 0, policy_version 56184 (0.0027) [2025-01-04 01:25:07,969][134294] Updated weights for policy 0, policy_version 56194 (0.0025) [2025-01-04 01:25:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15018.6, 300 sec: 15273.2). Total num frames: 230182912. Throughput: 0: 3617.5. Samples: 46713464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:08,968][134211] Avg episode reward: [(0, '7.216')] [2025-01-04 01:25:10,478][134294] Updated weights for policy 0, policy_version 56204 (0.0019) [2025-01-04 01:25:12,395][134294] Updated weights for policy 0, policy_version 56214 (0.0015) [2025-01-04 01:25:13,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15496.5, 300 sec: 15398.2). Total num frames: 230285312. Throughput: 0: 3616.2. Samples: 46740704. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:13,968][134211] Avg episode reward: [(0, '6.104')] [2025-01-04 01:25:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000056222_230285312.pth... [2025-01-04 01:25:14,018][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000055327_226619392.pth [2025-01-04 01:25:14,226][134294] Updated weights for policy 0, policy_version 56224 (0.0014) [2025-01-04 01:25:16,271][134294] Updated weights for policy 0, policy_version 56234 (0.0013) [2025-01-04 01:25:18,829][134294] Updated weights for policy 0, policy_version 56244 (0.0021) [2025-01-04 01:25:18,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15633.1, 300 sec: 15426.0). Total num frames: 230375424. Throughput: 0: 3769.5. Samples: 46756300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:18,968][134211] Avg episode reward: [(0, '6.418')] [2025-01-04 01:25:22,565][134294] Updated weights for policy 0, policy_version 56254 (0.0029) [2025-01-04 01:25:23,969][134211] Fps is (10 sec: 14334.0, 60 sec: 14745.2, 300 sec: 15314.8). Total num frames: 230428672. Throughput: 0: 3776.3. Samples: 46775996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:23,970][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 01:25:25,988][134294] Updated weights for policy 0, policy_version 56264 (0.0025) [2025-01-04 01:25:28,716][134294] Updated weights for policy 0, policy_version 56274 (0.0021) [2025-01-04 01:25:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.6, 300 sec: 15342.7). Total num frames: 230502400. Throughput: 0: 3799.8. Samples: 46794996. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:28,968][134211] Avg episode reward: [(0, '7.504')] [2025-01-04 01:25:31,007][134294] Updated weights for policy 0, policy_version 56284 (0.0018) [2025-01-04 01:25:33,954][134294] Updated weights for policy 0, policy_version 56294 (0.0026) [2025-01-04 01:25:33,969][134211] Fps is (10 sec: 15154.6, 60 sec: 14881.8, 300 sec: 15314.8). Total num frames: 230580224. Throughput: 0: 3868.4. Samples: 46808646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:33,970][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 01:25:37,014][134294] Updated weights for policy 0, policy_version 56304 (0.0027) [2025-01-04 01:25:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.2, 300 sec: 15203.8). Total num frames: 230645760. Throughput: 0: 3632.3. Samples: 46829002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:38,968][134211] Avg episode reward: [(0, '7.443')] [2025-01-04 01:25:40,096][134294] Updated weights for policy 0, policy_version 56314 (0.0025) [2025-01-04 01:25:43,052][134294] Updated weights for policy 0, policy_version 56324 (0.0026) [2025-01-04 01:25:43,968][134211] Fps is (10 sec: 13519.2, 60 sec: 15087.0, 300 sec: 15231.6). Total num frames: 230715392. Throughput: 0: 3622.9. Samples: 46849348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:43,968][134211] Avg episode reward: [(0, '6.873')] [2025-01-04 01:25:45,854][134294] Updated weights for policy 0, policy_version 56334 (0.0020) [2025-01-04 01:25:47,666][134294] Updated weights for policy 0, policy_version 56344 (0.0014) [2025-01-04 01:25:48,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15223.5, 300 sec: 15328.8). Total num frames: 230809600. Throughput: 0: 3686.7. Samples: 46861204. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:48,968][134211] Avg episode reward: [(0, '7.162')] [2025-01-04 01:25:49,546][134294] Updated weights for policy 0, policy_version 56354 (0.0015) [2025-01-04 01:25:51,441][134294] Updated weights for policy 0, policy_version 56364 (0.0013) [2025-01-04 01:25:53,325][134294] Updated weights for policy 0, policy_version 56374 (0.0013) [2025-01-04 01:25:53,968][134211] Fps is (10 sec: 20480.1, 60 sec: 15428.3, 300 sec: 15481.5). Total num frames: 230920192. Throughput: 0: 4013.8. Samples: 46894084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:53,968][134211] Avg episode reward: [(0, '6.941')] [2025-01-04 01:25:55,284][134294] Updated weights for policy 0, policy_version 56384 (0.0014) [2025-01-04 01:25:58,346][134294] Updated weights for policy 0, policy_version 56394 (0.0027) [2025-01-04 01:25:58,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15633.0, 300 sec: 15523.2). Total num frames: 230998016. Throughput: 0: 3993.4. Samples: 46920406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:25:58,969][134211] Avg episode reward: [(0, '6.536')] [2025-01-04 01:26:01,443][134294] Updated weights for policy 0, policy_version 56404 (0.0027) [2025-01-04 01:26:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15701.4, 300 sec: 15398.2). Total num frames: 231059456. Throughput: 0: 3858.3. Samples: 46929922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:26:03,968][134211] Avg episode reward: [(0, '6.541')] [2025-01-04 01:26:04,827][134294] Updated weights for policy 0, policy_version 56414 (0.0029) [2025-01-04 01:26:07,881][134294] Updated weights for policy 0, policy_version 56424 (0.0027) [2025-01-04 01:26:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.4, 300 sec: 15273.2). Total num frames: 231124992. Throughput: 0: 3840.5. Samples: 46948812. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:26:08,968][134211] Avg episode reward: [(0, '6.972')] [2025-01-04 01:26:11,193][134294] Updated weights for policy 0, policy_version 56434 (0.0027) [2025-01-04 01:26:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15018.6, 300 sec: 15273.2). Total num frames: 231186432. Throughput: 0: 3840.4. Samples: 46967814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:13,968][134211] Avg episode reward: [(0, '7.194')] [2025-01-04 01:26:14,480][134294] Updated weights for policy 0, policy_version 56444 (0.0027) [2025-01-04 01:26:17,334][134294] Updated weights for policy 0, policy_version 56454 (0.0023) [2025-01-04 01:26:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14882.1, 300 sec: 15328.8). Total num frames: 231268352. Throughput: 0: 3752.8. Samples: 46977516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:18,968][134211] Avg episode reward: [(0, '6.747')] [2025-01-04 01:26:19,203][134294] Updated weights for policy 0, policy_version 56464 (0.0013) [2025-01-04 01:26:21,258][134294] Updated weights for policy 0, policy_version 56474 (0.0015) [2025-01-04 01:26:23,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15428.6, 300 sec: 15398.2). Total num frames: 231354368. Throughput: 0: 3929.5. Samples: 47005828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:23,968][134211] Avg episode reward: [(0, '6.320')] [2025-01-04 01:26:24,324][134294] Updated weights for policy 0, policy_version 56484 (0.0027) [2025-01-04 01:26:27,392][134294] Updated weights for policy 0, policy_version 56494 (0.0025) [2025-01-04 01:26:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15291.7, 300 sec: 15384.3). Total num frames: 231419904. Throughput: 0: 3916.0. Samples: 47025568. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:28,968][134211] Avg episode reward: [(0, '7.310')] [2025-01-04 01:26:30,448][134294] Updated weights for policy 0, policy_version 56504 (0.0026) [2025-01-04 01:26:33,379][134294] Updated weights for policy 0, policy_version 56514 (0.0024) [2025-01-04 01:26:33,967][134211] Fps is (10 sec: 13926.8, 60 sec: 15224.0, 300 sec: 15287.1). Total num frames: 231493632. Throughput: 0: 3882.8. Samples: 47035928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:33,968][134211] Avg episode reward: [(0, '6.577')] [2025-01-04 01:26:35,345][134294] Updated weights for policy 0, policy_version 56524 (0.0015) [2025-01-04 01:26:38,140][134294] Updated weights for policy 0, policy_version 56534 (0.0022) [2025-01-04 01:26:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 231571456. Throughput: 0: 3704.5. Samples: 47060786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:38,968][134211] Avg episode reward: [(0, '6.514')] [2025-01-04 01:26:41,161][134294] Updated weights for policy 0, policy_version 56544 (0.0028) [2025-01-04 01:26:43,629][134294] Updated weights for policy 0, policy_version 56554 (0.0015) [2025-01-04 01:26:43,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15564.8, 300 sec: 15356.6). Total num frames: 231649280. Throughput: 0: 3611.4. Samples: 47082918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:26:43,968][134211] Avg episode reward: [(0, '6.860')] [2025-01-04 01:26:45,860][134294] Updated weights for policy 0, policy_version 56564 (0.0017) [2025-01-04 01:26:48,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15223.4, 300 sec: 15259.3). Total num frames: 231723008. Throughput: 0: 3684.6. Samples: 47095730. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:26:48,968][134211] Avg episode reward: [(0, '6.455')] [2025-01-04 01:26:49,052][134294] Updated weights for policy 0, policy_version 56574 (0.0024) [2025-01-04 01:26:52,130][134294] Updated weights for policy 0, policy_version 56584 (0.0025) [2025-01-04 01:26:53,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14677.3, 300 sec: 15162.1). Total num frames: 231800832. Throughput: 0: 3702.8. Samples: 47115440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:26:53,968][134211] Avg episode reward: [(0, '5.994')] [2025-01-04 01:26:54,247][134294] Updated weights for policy 0, policy_version 56594 (0.0014) [2025-01-04 01:26:56,160][134294] Updated weights for policy 0, policy_version 56604 (0.0014) [2025-01-04 01:26:58,543][134294] Updated weights for policy 0, policy_version 56614 (0.0017) [2025-01-04 01:26:58,968][134211] Fps is (10 sec: 17203.2, 60 sec: 14950.4, 300 sec: 15134.4). Total num frames: 231895040. Throughput: 0: 3931.3. Samples: 47144720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:26:58,968][134211] Avg episode reward: [(0, '6.530')] [2025-01-04 01:27:01,703][134294] Updated weights for policy 0, policy_version 56624 (0.0028) [2025-01-04 01:27:03,970][134211] Fps is (10 sec: 15561.2, 60 sec: 14949.8, 300 sec: 15134.3). Total num frames: 231956480. Throughput: 0: 3933.9. Samples: 47154550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:27:03,972][134211] Avg episode reward: [(0, '6.178')] [2025-01-04 01:27:05,229][134294] Updated weights for policy 0, policy_version 56634 (0.0027) [2025-01-04 01:27:08,026][134294] Updated weights for policy 0, policy_version 56644 (0.0019) [2025-01-04 01:27:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15018.7, 300 sec: 15162.1). Total num frames: 232026112. Throughput: 0: 3710.3. Samples: 47172792. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:27:08,968][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 01:27:10,295][134294] Updated weights for policy 0, policy_version 56654 (0.0012) [2025-01-04 01:27:12,475][134294] Updated weights for policy 0, policy_version 56664 (0.0013) [2025-01-04 01:27:13,968][134211] Fps is (10 sec: 15977.3, 60 sec: 15496.5, 300 sec: 15245.5). Total num frames: 232116224. Throughput: 0: 3879.2. Samples: 47200134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:27:13,969][134211] Avg episode reward: [(0, '6.562')] [2025-01-04 01:27:13,986][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000056670_232120320.pth... [2025-01-04 01:27:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000055770_228433920.pth [2025-01-04 01:27:15,450][134294] Updated weights for policy 0, policy_version 56674 (0.0023) [2025-01-04 01:27:18,814][134294] Updated weights for policy 0, policy_version 56684 (0.0025) [2025-01-04 01:27:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15155.1, 300 sec: 15231.6). Total num frames: 232177664. Throughput: 0: 3857.8. Samples: 47209532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:27:18,968][134211] Avg episode reward: [(0, '7.110')] [2025-01-04 01:27:22,348][134294] Updated weights for policy 0, policy_version 56694 (0.0027) [2025-01-04 01:27:23,968][134211] Fps is (10 sec: 11878.9, 60 sec: 14677.4, 300 sec: 15189.9). Total num frames: 232235008. Throughput: 0: 3700.8. Samples: 47227320. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:23,968][134211] Avg episode reward: [(0, '6.802')] [2025-01-04 01:27:25,605][134294] Updated weights for policy 0, policy_version 56704 (0.0027) [2025-01-04 01:27:27,592][134294] Updated weights for policy 0, policy_version 56714 (0.0014) [2025-01-04 01:27:28,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15087.0, 300 sec: 15259.3). Total num frames: 232325120. Throughput: 0: 3745.2. Samples: 47251452. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:28,968][134211] Avg episode reward: [(0, '6.704')] [2025-01-04 01:27:29,596][134294] Updated weights for policy 0, policy_version 56724 (0.0013) [2025-01-04 01:27:31,593][134294] Updated weights for policy 0, policy_version 56734 (0.0013) [2025-01-04 01:27:33,628][134294] Updated weights for policy 0, policy_version 56744 (0.0013) [2025-01-04 01:27:33,968][134211] Fps is (10 sec: 19251.4, 60 sec: 15564.8, 300 sec: 15384.3). Total num frames: 232427520. Throughput: 0: 3801.8. Samples: 47266812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:33,968][134211] Avg episode reward: [(0, '6.781')] [2025-01-04 01:27:36,554][134294] Updated weights for policy 0, policy_version 56754 (0.0023) [2025-01-04 01:27:38,969][134211] Fps is (10 sec: 16381.7, 60 sec: 15291.4, 300 sec: 15273.2). Total num frames: 232488960. Throughput: 0: 3893.1. Samples: 47290634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:38,970][134211] Avg episode reward: [(0, '6.285')] [2025-01-04 01:27:40,651][134294] Updated weights for policy 0, policy_version 56764 (0.0030) [2025-01-04 01:27:43,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14882.1, 300 sec: 15106.6). Total num frames: 232542208. Throughput: 0: 3604.8. Samples: 47306936. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:43,968][134211] Avg episode reward: [(0, '6.361')] [2025-01-04 01:27:44,250][134294] Updated weights for policy 0, policy_version 56774 (0.0028) [2025-01-04 01:27:47,715][134294] Updated weights for policy 0, policy_version 56784 (0.0028) [2025-01-04 01:27:48,968][134211] Fps is (10 sec: 11060.6, 60 sec: 14609.1, 300 sec: 15023.3). Total num frames: 232599552. Throughput: 0: 3575.0. Samples: 47315418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:48,968][134211] Avg episode reward: [(0, '7.138')] [2025-01-04 01:27:50,703][134294] Updated weights for policy 0, policy_version 56794 (0.0020) [2025-01-04 01:27:52,758][134294] Updated weights for policy 0, policy_version 56804 (0.0013) [2025-01-04 01:27:53,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14813.8, 300 sec: 15106.6). Total num frames: 232689664. Throughput: 0: 3660.1. Samples: 47337498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:27:53,968][134211] Avg episode reward: [(0, '7.015')] [2025-01-04 01:27:55,321][134294] Updated weights for policy 0, policy_version 56814 (0.0020) [2025-01-04 01:27:58,560][134294] Updated weights for policy 0, policy_version 56824 (0.0026) [2025-01-04 01:27:58,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14267.8, 300 sec: 15092.7). Total num frames: 232751104. Throughput: 0: 3546.1. Samples: 47359706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:27:58,968][134211] Avg episode reward: [(0, '7.040')] [2025-01-04 01:28:01,888][134294] Updated weights for policy 0, policy_version 56834 (0.0027) [2025-01-04 01:28:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.5, 300 sec: 15092.7). Total num frames: 232816640. Throughput: 0: 3542.0. Samples: 47368920. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:03,970][134211] Avg episode reward: [(0, '7.045')] [2025-01-04 01:28:04,915][134294] Updated weights for policy 0, policy_version 56844 (0.0023) [2025-01-04 01:28:07,339][134294] Updated weights for policy 0, policy_version 56854 (0.0020) [2025-01-04 01:28:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.5, 300 sec: 15120.5). Total num frames: 232894464. Throughput: 0: 3640.5. Samples: 47391142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:08,968][134211] Avg episode reward: [(0, '6.557')] [2025-01-04 01:28:10,498][134294] Updated weights for policy 0, policy_version 56864 (0.0024) [2025-01-04 01:28:13,739][134294] Updated weights for policy 0, policy_version 56874 (0.0025) [2025-01-04 01:28:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.8, 300 sec: 15065.0). Total num frames: 232955904. Throughput: 0: 3535.5. Samples: 47410552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:13,968][134211] Avg episode reward: [(0, '6.043')] [2025-01-04 01:28:15,993][134294] Updated weights for policy 0, policy_version 56884 (0.0015) [2025-01-04 01:28:18,014][134294] Updated weights for policy 0, policy_version 56894 (0.0012) [2025-01-04 01:28:18,968][134211] Fps is (10 sec: 15973.4, 60 sec: 14608.9, 300 sec: 15037.1). Total num frames: 233054208. Throughput: 0: 3478.4. Samples: 47423344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:18,969][134211] Avg episode reward: [(0, '6.255')] [2025-01-04 01:28:20,199][134294] Updated weights for policy 0, policy_version 56904 (0.0015) [2025-01-04 01:28:22,759][134294] Updated weights for policy 0, policy_version 56914 (0.0022) [2025-01-04 01:28:23,970][134211] Fps is (10 sec: 17609.0, 60 sec: 14949.9, 300 sec: 15037.1). Total num frames: 233132032. Throughput: 0: 3569.7. Samples: 47451274. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:23,970][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 01:28:26,499][134294] Updated weights for policy 0, policy_version 56924 (0.0027) [2025-01-04 01:28:28,969][134211] Fps is (10 sec: 13106.2, 60 sec: 14335.6, 300 sec: 15009.3). Total num frames: 233185280. Throughput: 0: 3582.2. Samples: 47468138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 01:28:28,970][134211] Avg episode reward: [(0, '6.772')] [2025-01-04 01:28:30,242][134294] Updated weights for policy 0, policy_version 56934 (0.0025) [2025-01-04 01:28:32,830][134294] Updated weights for policy 0, policy_version 56944 (0.0018) [2025-01-04 01:28:33,967][134211] Fps is (10 sec: 12700.6, 60 sec: 13858.2, 300 sec: 15051.1). Total num frames: 233259008. Throughput: 0: 3582.2. Samples: 47476618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:33,968][134211] Avg episode reward: [(0, '6.682')] [2025-01-04 01:28:34,979][134294] Updated weights for policy 0, policy_version 56954 (0.0014) [2025-01-04 01:28:37,113][134294] Updated weights for policy 0, policy_version 56964 (0.0015) [2025-01-04 01:28:38,971][134211] Fps is (10 sec: 15971.7, 60 sec: 14267.3, 300 sec: 15092.6). Total num frames: 233345024. Throughput: 0: 3710.9. Samples: 47504502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:38,971][134211] Avg episode reward: [(0, '6.599')] [2025-01-04 01:28:40,708][134294] Updated weights for policy 0, policy_version 56974 (0.0030) [2025-01-04 01:28:43,968][134211] Fps is (10 sec: 14335.2, 60 sec: 14335.9, 300 sec: 14912.2). Total num frames: 233402368. Throughput: 0: 3607.7. Samples: 47522052. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:43,969][134211] Avg episode reward: [(0, '6.425')] [2025-01-04 01:28:44,176][134294] Updated weights for policy 0, policy_version 56984 (0.0026) [2025-01-04 01:28:47,681][134294] Updated weights for policy 0, policy_version 56994 (0.0025) [2025-01-04 01:28:48,968][134211] Fps is (10 sec: 11882.2, 60 sec: 14404.3, 300 sec: 14884.5). Total num frames: 233463808. Throughput: 0: 3590.9. Samples: 47530508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:48,968][134211] Avg episode reward: [(0, '6.378')] [2025-01-04 01:28:50,069][134294] Updated weights for policy 0, policy_version 57004 (0.0013) [2025-01-04 01:28:52,237][134294] Updated weights for policy 0, policy_version 57014 (0.0012) [2025-01-04 01:28:53,968][134211] Fps is (10 sec: 15975.2, 60 sec: 14540.8, 300 sec: 14995.5). Total num frames: 233562112. Throughput: 0: 3648.9. Samples: 47555344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:53,968][134211] Avg episode reward: [(0, '6.238')] [2025-01-04 01:28:54,404][134294] Updated weights for policy 0, policy_version 57024 (0.0012) [2025-01-04 01:28:56,377][134294] Updated weights for policy 0, policy_version 57034 (0.0012) [2025-01-04 01:28:58,541][134294] Updated weights for policy 0, policy_version 57044 (0.0012) [2025-01-04 01:28:58,967][134211] Fps is (10 sec: 19661.0, 60 sec: 15155.3, 300 sec: 15120.5). Total num frames: 233660416. Throughput: 0: 3868.7. Samples: 47584642. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:28:58,968][134211] Avg episode reward: [(0, '6.880')] [2025-01-04 01:29:01,584][134294] Updated weights for policy 0, policy_version 57054 (0.0026) [2025-01-04 01:29:03,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15018.6, 300 sec: 15037.2). Total num frames: 233717760. Throughput: 0: 3821.5. Samples: 47595310. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:03,969][134211] Avg episode reward: [(0, '6.695')] [2025-01-04 01:29:05,329][134294] Updated weights for policy 0, policy_version 57064 (0.0028) [2025-01-04 01:29:08,810][134294] Updated weights for policy 0, policy_version 57074 (0.0030) [2025-01-04 01:29:08,968][134211] Fps is (10 sec: 11468.4, 60 sec: 14677.3, 300 sec: 14981.6). Total num frames: 233775104. Throughput: 0: 3578.0. Samples: 47612276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:08,972][134211] Avg episode reward: [(0, '6.761')] [2025-01-04 01:29:12,355][134294] Updated weights for policy 0, policy_version 57084 (0.0024) [2025-01-04 01:29:13,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14677.3, 300 sec: 14912.2). Total num frames: 233836544. Throughput: 0: 3601.2. Samples: 47630188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:13,968][134211] Avg episode reward: [(0, '5.650')] [2025-01-04 01:29:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057089_233836544.pth... [2025-01-04 01:29:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000056222_230285312.pth [2025-01-04 01:29:15,551][134294] Updated weights for policy 0, policy_version 57094 (0.0026) [2025-01-04 01:29:18,822][134294] Updated weights for policy 0, policy_version 57104 (0.0025) [2025-01-04 01:29:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14063.1, 300 sec: 14759.5). Total num frames: 233897984. Throughput: 0: 3617.5. Samples: 47639406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:18,968][134211] Avg episode reward: [(0, '6.172')] [2025-01-04 01:29:22,072][134294] Updated weights for policy 0, policy_version 57114 (0.0025) [2025-01-04 01:29:23,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13790.3, 300 sec: 14690.2). Total num frames: 233959424. 
Throughput: 0: 3418.3. Samples: 47658316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:23,968][134211] Avg episode reward: [(0, '6.341')] [2025-01-04 01:29:25,289][134294] Updated weights for policy 0, policy_version 57124 (0.0023) [2025-01-04 01:29:28,692][134294] Updated weights for policy 0, policy_version 57134 (0.0027) [2025-01-04 01:29:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13926.7, 300 sec: 14690.1). Total num frames: 234020864. Throughput: 0: 3442.1. Samples: 47676944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:28,968][134211] Avg episode reward: [(0, '6.430')] [2025-01-04 01:29:31,480][134294] Updated weights for policy 0, policy_version 57144 (0.0019) [2025-01-04 01:29:33,492][134294] Updated weights for policy 0, policy_version 57154 (0.0013) [2025-01-04 01:29:33,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14199.5, 300 sec: 14773.4). Total num frames: 234110976. Throughput: 0: 3481.4. Samples: 47687172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:33,968][134211] Avg episode reward: [(0, '5.819')] [2025-01-04 01:29:35,518][134294] Updated weights for policy 0, policy_version 57164 (0.0014) [2025-01-04 01:29:37,493][134294] Updated weights for policy 0, policy_version 57174 (0.0013) [2025-01-04 01:29:38,968][134211] Fps is (10 sec: 18841.9, 60 sec: 14405.0, 300 sec: 14912.2). Total num frames: 234209280. Throughput: 0: 3609.6. Samples: 47717776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:38,968][134211] Avg episode reward: [(0, '6.019')] [2025-01-04 01:29:40,221][134294] Updated weights for policy 0, policy_version 57184 (0.0022) [2025-01-04 01:29:43,629][134294] Updated weights for policy 0, policy_version 57194 (0.0028) [2025-01-04 01:29:43,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14404.3, 300 sec: 14815.0). Total num frames: 234266624. Throughput: 0: 3419.0. Samples: 47738500. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:29:43,968][134211] Avg episode reward: [(0, '6.219')] [2025-01-04 01:29:47,356][134294] Updated weights for policy 0, policy_version 57204 (0.0028) [2025-01-04 01:29:48,968][134211] Fps is (10 sec: 11468.6, 60 sec: 14336.0, 300 sec: 14676.2). Total num frames: 234323968. Throughput: 0: 3370.2. Samples: 47746968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:29:48,968][134211] Avg episode reward: [(0, '7.033')] [2025-01-04 01:29:50,477][134294] Updated weights for policy 0, policy_version 57214 (0.0025) [2025-01-04 01:29:52,588][134294] Updated weights for policy 0, policy_version 57224 (0.0014) [2025-01-04 01:29:53,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14267.7, 300 sec: 14773.4). Total num frames: 234418176. Throughput: 0: 3471.9. Samples: 47768512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:29:53,968][134211] Avg episode reward: [(0, '5.985')] [2025-01-04 01:29:54,516][134294] Updated weights for policy 0, policy_version 57234 (0.0014) [2025-01-04 01:29:56,452][134294] Updated weights for policy 0, policy_version 57244 (0.0015) [2025-01-04 01:29:58,920][134294] Updated weights for policy 0, policy_version 57254 (0.0022) [2025-01-04 01:29:58,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14199.3, 300 sec: 14898.3). Total num frames: 234512384. Throughput: 0: 3748.2. Samples: 47798858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:29:58,969][134211] Avg episode reward: [(0, '6.875')] [2025-01-04 01:30:02,196][134294] Updated weights for policy 0, policy_version 57264 (0.0029) [2025-01-04 01:30:03,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14267.8, 300 sec: 14884.4). Total num frames: 234573824. Throughput: 0: 3754.9. Samples: 47808376. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:03,968][134211] Avg episode reward: [(0, '7.013')] [2025-01-04 01:30:05,452][134294] Updated weights for policy 0, policy_version 57274 (0.0028) [2025-01-04 01:30:08,497][134294] Updated weights for policy 0, policy_version 57284 (0.0026) [2025-01-04 01:30:08,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14404.3, 300 sec: 14759.5). Total num frames: 234639360. Throughput: 0: 3760.4. Samples: 47827534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:08,968][134211] Avg episode reward: [(0, '6.403')] [2025-01-04 01:30:11,555][134294] Updated weights for policy 0, policy_version 57294 (0.0023) [2025-01-04 01:30:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 14676.2). Total num frames: 234704896. Throughput: 0: 3787.7. Samples: 47847390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:13,968][134211] Avg episode reward: [(0, '6.576')] [2025-01-04 01:30:14,771][134294] Updated weights for policy 0, policy_version 57304 (0.0024) [2025-01-04 01:30:17,858][134294] Updated weights for policy 0, policy_version 57314 (0.0024) [2025-01-04 01:30:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.8, 300 sec: 14717.9). Total num frames: 234770432. Throughput: 0: 3773.7. Samples: 47856988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:18,968][134211] Avg episode reward: [(0, '6.480')] [2025-01-04 01:30:20,511][134294] Updated weights for policy 0, policy_version 57324 (0.0020) [2025-01-04 01:30:22,431][134294] Updated weights for policy 0, policy_version 57334 (0.0013) [2025-01-04 01:30:23,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15223.5, 300 sec: 14815.0). Total num frames: 234872832. Throughput: 0: 3648.4. Samples: 47881954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:23,968][134211] Avg episode reward: [(0, '6.825')] [2025-01-04 01:30:24,477][134294] Updated weights for policy 0, policy_version 57344 (0.0015) [2025-01-04 01:30:27,521][134294] Updated weights for policy 0, policy_version 57354 (0.0025) [2025-01-04 01:30:28,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15291.7, 300 sec: 14773.4). Total num frames: 234938368. Throughput: 0: 3717.2. Samples: 47905776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:28,969][134211] Avg episode reward: [(0, '6.788')] [2025-01-04 01:30:30,822][134294] Updated weights for policy 0, policy_version 57364 (0.0029) [2025-01-04 01:30:33,743][134294] Updated weights for policy 0, policy_version 57374 (0.0026) [2025-01-04 01:30:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 235003904. Throughput: 0: 3741.3. Samples: 47915326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:33,968][134211] Avg episode reward: [(0, '6.774')] [2025-01-04 01:30:36,570][134294] Updated weights for policy 0, policy_version 57384 (0.0024) [2025-01-04 01:30:38,583][134294] Updated weights for policy 0, policy_version 57394 (0.0014) [2025-01-04 01:30:38,967][134211] Fps is (10 sec: 15565.3, 60 sec: 14745.6, 300 sec: 14842.8). Total num frames: 235094016. Throughput: 0: 3756.8. Samples: 47937570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:38,968][134211] Avg episode reward: [(0, '6.855')] [2025-01-04 01:30:40,634][134294] Updated weights for policy 0, policy_version 57404 (0.0016) [2025-01-04 01:30:43,660][134294] Updated weights for policy 0, policy_version 57414 (0.0023) [2025-01-04 01:30:43,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 235167744. Throughput: 0: 3666.9. Samples: 47963868. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:43,968][134211] Avg episode reward: [(0, '6.604')] [2025-01-04 01:30:47,279][134294] Updated weights for policy 0, policy_version 57424 (0.0025) [2025-01-04 01:30:48,967][134211] Fps is (10 sec: 14336.0, 60 sec: 15223.5, 300 sec: 14634.5). Total num frames: 235237376. Throughput: 0: 3644.7. Samples: 47972388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:48,968][134211] Avg episode reward: [(0, '6.719')] [2025-01-04 01:30:49,406][134294] Updated weights for policy 0, policy_version 57434 (0.0013) [2025-01-04 01:30:51,367][134294] Updated weights for policy 0, policy_version 57444 (0.0013) [2025-01-04 01:30:53,249][134294] Updated weights for policy 0, policy_version 57454 (0.0014) [2025-01-04 01:30:53,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15428.3, 300 sec: 14731.7). Total num frames: 235343872. Throughput: 0: 3837.0. Samples: 48000200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:30:53,968][134211] Avg episode reward: [(0, '6.820')] [2025-01-04 01:30:55,164][134294] Updated weights for policy 0, policy_version 57464 (0.0013) [2025-01-04 01:30:57,098][134294] Updated weights for policy 0, policy_version 57474 (0.0013) [2025-01-04 01:30:58,968][134211] Fps is (10 sec: 21299.0, 60 sec: 15633.2, 300 sec: 14884.5). Total num frames: 235450368. Throughput: 0: 4112.5. Samples: 48032454. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:30:58,968][134211] Avg episode reward: [(0, '6.554')] [2025-01-04 01:30:59,004][134294] Updated weights for policy 0, policy_version 57484 (0.0014) [2025-01-04 01:31:01,376][134294] Updated weights for policy 0, policy_version 57494 (0.0019) [2025-01-04 01:31:03,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15906.1, 300 sec: 14926.1). Total num frames: 235528192. Throughput: 0: 4212.8. Samples: 48046564. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:03,969][134211] Avg episode reward: [(0, '6.842')] [2025-01-04 01:31:04,509][134294] Updated weights for policy 0, policy_version 57504 (0.0032) [2025-01-04 01:31:07,823][134294] Updated weights for policy 0, policy_version 57514 (0.0027) [2025-01-04 01:31:08,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15837.8, 300 sec: 14926.1). Total num frames: 235589632. Throughput: 0: 4078.9. Samples: 48065506. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:08,968][134211] Avg episode reward: [(0, '6.912')] [2025-01-04 01:31:11,002][134294] Updated weights for policy 0, policy_version 57524 (0.0025) [2025-01-04 01:31:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15837.8, 300 sec: 14870.5). Total num frames: 235655168. Throughput: 0: 3978.1. Samples: 48084792. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:13,969][134211] Avg episode reward: [(0, '6.758')] [2025-01-04 01:31:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057533_235655168.pth... [2025-01-04 01:31:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000056670_232120320.pth [2025-01-04 01:31:14,195][134294] Updated weights for policy 0, policy_version 57534 (0.0024) [2025-01-04 01:31:17,322][134294] Updated weights for policy 0, policy_version 57544 (0.0028) [2025-01-04 01:31:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15769.6, 300 sec: 14787.3). Total num frames: 235716608. Throughput: 0: 3978.2. Samples: 48094344. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:18,968][134211] Avg episode reward: [(0, '6.290')] [2025-01-04 01:31:20,385][134294] Updated weights for policy 0, policy_version 57554 (0.0026) [2025-01-04 01:31:23,483][134294] Updated weights for policy 0, policy_version 57564 (0.0027) [2025-01-04 01:31:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.4, 300 sec: 14801.1). Total num frames: 235786240. Throughput: 0: 3936.5. Samples: 48114714. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:23,968][134211] Avg episode reward: [(0, '6.802')] [2025-01-04 01:31:26,526][134294] Updated weights for policy 0, policy_version 57574 (0.0024) [2025-01-04 01:31:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15291.8, 300 sec: 14787.2). Total num frames: 235855872. Throughput: 0: 3790.0. Samples: 48134418. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 01:31:28,968][134211] Avg episode reward: [(0, '6.586')] [2025-01-04 01:31:29,605][134294] Updated weights for policy 0, policy_version 57584 (0.0025) [2025-01-04 01:31:32,648][134294] Updated weights for policy 0, policy_version 57594 (0.0025) [2025-01-04 01:31:33,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15291.6, 300 sec: 14745.6). Total num frames: 235921408. Throughput: 0: 3820.3. Samples: 48144302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:33,969][134211] Avg episode reward: [(0, '6.890')] [2025-01-04 01:31:35,731][134294] Updated weights for policy 0, policy_version 57604 (0.0026) [2025-01-04 01:31:38,749][134294] Updated weights for policy 0, policy_version 57614 (0.0028) [2025-01-04 01:31:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 14703.9). Total num frames: 235986944. Throughput: 0: 3656.2. Samples: 48164730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:38,968][134211] Avg episode reward: [(0, '6.791')] [2025-01-04 01:31:41,565][134294] Updated weights for policy 0, policy_version 57624 (0.0023) [2025-01-04 01:31:43,603][134294] Updated weights for policy 0, policy_version 57634 (0.0014) [2025-01-04 01:31:43,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15086.9, 300 sec: 14745.6). Total num frames: 236072960. Throughput: 0: 3465.3. Samples: 48188392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:43,968][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 01:31:45,507][134294] Updated weights for policy 0, policy_version 57644 (0.0014) [2025-01-04 01:31:47,422][134294] Updated weights for policy 0, policy_version 57654 (0.0013) [2025-01-04 01:31:48,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15769.6, 300 sec: 14856.7). Total num frames: 236183552. Throughput: 0: 3505.4. Samples: 48204304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:48,968][134211] Avg episode reward: [(0, '6.316')] [2025-01-04 01:31:49,327][134294] Updated weights for policy 0, policy_version 57664 (0.0015) [2025-01-04 01:31:51,509][134294] Updated weights for policy 0, policy_version 57674 (0.0015) [2025-01-04 01:31:53,968][134211] Fps is (10 sec: 18841.9, 60 sec: 15291.7, 300 sec: 14801.1). Total num frames: 236261376. Throughput: 0: 3738.5. Samples: 48233740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:53,969][134211] Avg episode reward: [(0, '6.352')] [2025-01-04 01:31:54,689][134294] Updated weights for policy 0, policy_version 57684 (0.0029) [2025-01-04 01:31:57,909][134294] Updated weights for policy 0, policy_version 57694 (0.0027) [2025-01-04 01:31:58,968][134211] Fps is (10 sec: 14335.3, 60 sec: 14609.0, 300 sec: 14815.1). Total num frames: 236326912. Throughput: 0: 3736.1. Samples: 48252916. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:31:58,969][134211] Avg episode reward: [(0, '6.626')] [2025-01-04 01:32:00,959][134294] Updated weights for policy 0, policy_version 57704 (0.0027) [2025-01-04 01:32:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 14801.1). Total num frames: 236392448. Throughput: 0: 3744.9. Samples: 48262866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:03,968][134211] Avg episode reward: [(0, '7.262')] [2025-01-04 01:32:04,078][134294] Updated weights for policy 0, policy_version 57714 (0.0025) [2025-01-04 01:32:07,149][134294] Updated weights for policy 0, policy_version 57724 (0.0028) [2025-01-04 01:32:08,969][134211] Fps is (10 sec: 13106.1, 60 sec: 14472.3, 300 sec: 14717.8). Total num frames: 236457984. Throughput: 0: 3729.8. Samples: 48282558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:08,969][134211] Avg episode reward: [(0, '6.444')] [2025-01-04 01:32:10,282][134294] Updated weights for policy 0, policy_version 57734 (0.0026) [2025-01-04 01:32:13,202][134294] Updated weights for policy 0, policy_version 57744 (0.0025) [2025-01-04 01:32:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.8, 300 sec: 14745.6). Total num frames: 236527616. Throughput: 0: 3741.6. Samples: 48302792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:13,968][134211] Avg episode reward: [(0, '6.169')] [2025-01-04 01:32:16,312][134294] Updated weights for policy 0, policy_version 57754 (0.0022) [2025-01-04 01:32:18,247][134294] Updated weights for policy 0, policy_version 57764 (0.0013) [2025-01-04 01:32:18,968][134211] Fps is (10 sec: 15566.8, 60 sec: 14950.4, 300 sec: 14842.8). Total num frames: 236613632. Throughput: 0: 3745.0. Samples: 48312826. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:18,968][134211] Avg episode reward: [(0, '6.800')] [2025-01-04 01:32:20,212][134294] Updated weights for policy 0, policy_version 57774 (0.0015) [2025-01-04 01:32:22,931][134294] Updated weights for policy 0, policy_version 57784 (0.0024) [2025-01-04 01:32:23,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15155.2, 300 sec: 14815.0). Total num frames: 236695552. Throughput: 0: 3921.9. Samples: 48341216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:23,968][134211] Avg episode reward: [(0, '7.566')] [2025-01-04 01:32:26,084][134294] Updated weights for policy 0, policy_version 57794 (0.0028) [2025-01-04 01:32:28,469][134294] Updated weights for policy 0, policy_version 57804 (0.0017) [2025-01-04 01:32:28,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15291.7, 300 sec: 14731.7). Total num frames: 236773376. Throughput: 0: 3877.1. Samples: 48362860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:28,968][134211] Avg episode reward: [(0, '6.923')] [2025-01-04 01:32:30,473][134294] Updated weights for policy 0, policy_version 57814 (0.0013) [2025-01-04 01:32:32,771][134294] Updated weights for policy 0, policy_version 57824 (0.0019) [2025-01-04 01:32:33,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15701.4, 300 sec: 14829.0). Total num frames: 236863488. Throughput: 0: 3874.5. Samples: 48378658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:33,969][134211] Avg episode reward: [(0, '6.706')] [2025-01-04 01:32:35,806][134294] Updated weights for policy 0, policy_version 57834 (0.0025) [2025-01-04 01:32:38,906][134294] Updated weights for policy 0, policy_version 57844 (0.0024) [2025-01-04 01:32:38,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15701.3, 300 sec: 14870.5). Total num frames: 236929024. Throughput: 0: 3694.2. Samples: 48399978. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:32:38,969][134211] Avg episode reward: [(0, '6.845')] [2025-01-04 01:32:41,867][134294] Updated weights for policy 0, policy_version 57854 (0.0025) [2025-01-04 01:32:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 236994560. Throughput: 0: 3705.0. Samples: 48419640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:32:43,968][134211] Avg episode reward: [(0, '6.866')] [2025-01-04 01:32:45,091][134294] Updated weights for policy 0, policy_version 57864 (0.0021) [2025-01-04 01:32:47,187][134294] Updated weights for policy 0, policy_version 57874 (0.0015) [2025-01-04 01:32:48,968][134211] Fps is (10 sec: 15565.6, 60 sec: 15018.7, 300 sec: 14898.3). Total num frames: 237084672. Throughput: 0: 3738.9. Samples: 48431116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:32:48,968][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 01:32:49,220][134294] Updated weights for policy 0, policy_version 57884 (0.0013) [2025-01-04 01:32:51,184][134294] Updated weights for policy 0, policy_version 57894 (0.0013) [2025-01-04 01:32:53,068][134294] Updated weights for policy 0, policy_version 57904 (0.0015) [2025-01-04 01:32:53,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15496.5, 300 sec: 15051.1). Total num frames: 237191168. Throughput: 0: 3989.4. Samples: 48462078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:32:53,968][134211] Avg episode reward: [(0, '6.429')] [2025-01-04 01:32:54,938][134294] Updated weights for policy 0, policy_version 57914 (0.0013) [2025-01-04 01:32:57,594][134294] Updated weights for policy 0, policy_version 57924 (0.0023) [2025-01-04 01:32:58,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15769.6, 300 sec: 15106.6). Total num frames: 237273088. Throughput: 0: 4141.6. Samples: 48489164. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:32:58,969][134211] Avg episode reward: [(0, '6.766')] [2025-01-04 01:33:00,872][134294] Updated weights for policy 0, policy_version 57934 (0.0025) [2025-01-04 01:33:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15701.3, 300 sec: 15051.1). Total num frames: 237334528. Throughput: 0: 4131.6. Samples: 48498750. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:33:03,968][134211] Avg episode reward: [(0, '6.482')] [2025-01-04 01:33:04,012][134294] Updated weights for policy 0, policy_version 57944 (0.0025) [2025-01-04 01:33:07,063][134294] Updated weights for policy 0, policy_version 57954 (0.0024) [2025-01-04 01:33:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15701.6, 300 sec: 15064.9). Total num frames: 237400064. Throughput: 0: 3939.1. Samples: 48518478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:33:08,968][134211] Avg episode reward: [(0, '7.265')] [2025-01-04 01:33:10,101][134294] Updated weights for policy 0, policy_version 57964 (0.0027) [2025-01-04 01:33:13,143][134294] Updated weights for policy 0, policy_version 57974 (0.0023) [2025-01-04 01:33:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15701.4, 300 sec: 14967.8). Total num frames: 237469696. Throughput: 0: 3906.9. Samples: 48538670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:33:13,968][134211] Avg episode reward: [(0, '6.908')] [2025-01-04 01:33:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057976_237469696.pth... [2025-01-04 01:33:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057089_233836544.pth [2025-01-04 01:33:16,133][134294] Updated weights for policy 0, policy_version 57984 (0.0025) [2025-01-04 01:33:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15428.3, 300 sec: 14940.1). Total num frames: 237539328. 
Throughput: 0: 3780.4. Samples: 48548776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:18,968][134211] Avg episode reward: [(0, '6.978')] [2025-01-04 01:33:19,140][134294] Updated weights for policy 0, policy_version 57994 (0.0025) [2025-01-04 01:33:22,289][134294] Updated weights for policy 0, policy_version 58004 (0.0027) [2025-01-04 01:33:23,968][134211] Fps is (10 sec: 13516.1, 60 sec: 15155.1, 300 sec: 14981.7). Total num frames: 237604864. Throughput: 0: 3753.4. Samples: 48568882. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:23,969][134211] Avg episode reward: [(0, '6.462')] [2025-01-04 01:33:25,334][134294] Updated weights for policy 0, policy_version 58014 (0.0025) [2025-01-04 01:33:28,318][134294] Updated weights for policy 0, policy_version 58024 (0.0026) [2025-01-04 01:33:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14967.8). Total num frames: 237674496. Throughput: 0: 3772.8. Samples: 48589416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:28,968][134211] Avg episode reward: [(0, '7.114')] [2025-01-04 01:33:30,627][134294] Updated weights for policy 0, policy_version 58034 (0.0018) [2025-01-04 01:33:32,520][134294] Updated weights for policy 0, policy_version 58044 (0.0013) [2025-01-04 01:33:33,967][134211] Fps is (10 sec: 17204.4, 60 sec: 15223.5, 300 sec: 15023.5). Total num frames: 237776896. Throughput: 0: 3822.0. Samples: 48603108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:33,968][134211] Avg episode reward: [(0, '6.631')] [2025-01-04 01:33:34,420][134294] Updated weights for policy 0, policy_version 58054 (0.0012) [2025-01-04 01:33:36,400][134294] Updated weights for policy 0, policy_version 58064 (0.0013) [2025-01-04 01:33:38,392][134294] Updated weights for policy 0, policy_version 58074 (0.0013) [2025-01-04 01:33:38,968][134211] Fps is (10 sec: 20479.9, 60 sec: 15838.0, 300 sec: 15176.0). Total num frames: 237879296. Throughput: 0: 3839.7. 
Samples: 48634864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:38,968][134211] Avg episode reward: [(0, '7.056')] [2025-01-04 01:33:41,031][134294] Updated weights for policy 0, policy_version 58084 (0.0025) [2025-01-04 01:33:43,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15906.2, 300 sec: 15203.8). Total num frames: 237948928. Throughput: 0: 3748.8. Samples: 48657860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:43,968][134211] Avg episode reward: [(0, '7.275')] [2025-01-04 01:33:44,310][134294] Updated weights for policy 0, policy_version 58094 (0.0030) [2025-01-04 01:33:47,523][134294] Updated weights for policy 0, policy_version 58104 (0.0030) [2025-01-04 01:33:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15428.2, 300 sec: 15078.8). Total num frames: 238010368. Throughput: 0: 3744.6. Samples: 48667256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:33:48,968][134211] Avg episode reward: [(0, '6.627')] [2025-01-04 01:33:50,663][134294] Updated weights for policy 0, policy_version 58114 (0.0029) [2025-01-04 01:33:53,633][134294] Updated weights for policy 0, policy_version 58124 (0.0025) [2025-01-04 01:33:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14745.6, 300 sec: 14967.7). Total num frames: 238075904. Throughput: 0: 3752.4. Samples: 48687336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:33:53,968][134211] Avg episode reward: [(0, '7.085')] [2025-01-04 01:33:56,631][134294] Updated weights for policy 0, policy_version 58134 (0.0023) [2025-01-04 01:33:58,970][134211] Fps is (10 sec: 13514.3, 60 sec: 14540.4, 300 sec: 15009.3). Total num frames: 238145536. Throughput: 0: 3747.4. Samples: 48707308. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:33:58,970][134211] Avg episode reward: [(0, '6.590')] [2025-01-04 01:33:59,897][134294] Updated weights for policy 0, policy_version 58144 (0.0025) [2025-01-04 01:34:02,261][134294] Updated weights for policy 0, policy_version 58154 (0.0017) [2025-01-04 01:34:03,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14950.5, 300 sec: 15106.6). Total num frames: 238231552. Throughput: 0: 3743.3. Samples: 48717226. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:03,968][134211] Avg episode reward: [(0, '6.544')] [2025-01-04 01:34:04,201][134294] Updated weights for policy 0, policy_version 58164 (0.0014) [2025-01-04 01:34:06,043][134294] Updated weights for policy 0, policy_version 58174 (0.0013) [2025-01-04 01:34:07,908][134294] Updated weights for policy 0, policy_version 58184 (0.0013) [2025-01-04 01:34:08,968][134211] Fps is (10 sec: 19664.0, 60 sec: 15701.3, 300 sec: 15273.2). Total num frames: 238342144. Throughput: 0: 4015.6. Samples: 48749584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:08,968][134211] Avg episode reward: [(0, '6.457')] [2025-01-04 01:34:09,819][134294] Updated weights for policy 0, policy_version 58194 (0.0013) [2025-01-04 01:34:11,724][134294] Updated weights for policy 0, policy_version 58204 (0.0014) [2025-01-04 01:34:13,968][134211] Fps is (10 sec: 20889.2, 60 sec: 16179.2, 300 sec: 15398.2). Total num frames: 238440448. Throughput: 0: 4251.2. Samples: 48780720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:13,968][134211] Avg episode reward: [(0, '6.620')] [2025-01-04 01:34:14,236][134294] Updated weights for policy 0, policy_version 58214 (0.0022) [2025-01-04 01:34:17,734][134294] Updated weights for policy 0, policy_version 58224 (0.0031) [2025-01-04 01:34:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15974.3, 300 sec: 15384.3). Total num frames: 238497792. Throughput: 0: 4149.8. Samples: 48789852. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:18,968][134211] Avg episode reward: [(0, '7.717')] [2025-01-04 01:34:20,890][134294] Updated weights for policy 0, policy_version 58234 (0.0027) [2025-01-04 01:34:23,924][134294] Updated weights for policy 0, policy_version 58244 (0.0027) [2025-01-04 01:34:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 16042.8, 300 sec: 15412.1). Total num frames: 238567424. Throughput: 0: 3876.4. Samples: 48809304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:23,969][134211] Avg episode reward: [(0, '7.143')] [2025-01-04 01:34:27,186][134294] Updated weights for policy 0, policy_version 58254 (0.0025) [2025-01-04 01:34:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15906.1, 300 sec: 15314.9). Total num frames: 238628864. Throughput: 0: 3792.0. Samples: 48828498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 01:34:28,968][134211] Avg episode reward: [(0, '7.490')] [2025-01-04 01:34:30,223][134294] Updated weights for policy 0, policy_version 58264 (0.0025) [2025-01-04 01:34:33,194][134294] Updated weights for policy 0, policy_version 58274 (0.0026) [2025-01-04 01:34:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 238698496. Throughput: 0: 3812.2. Samples: 48838806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:33,968][134211] Avg episode reward: [(0, '7.443')] [2025-01-04 01:34:36,230][134294] Updated weights for policy 0, policy_version 58284 (0.0027) [2025-01-04 01:34:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 15259.3). Total num frames: 238768128. Throughput: 0: 3818.5. Samples: 48859168. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:38,968][134211] Avg episode reward: [(0, '6.398')] [2025-01-04 01:34:39,253][134294] Updated weights for policy 0, policy_version 58294 (0.0028) [2025-01-04 01:34:42,297][134294] Updated weights for policy 0, policy_version 58304 (0.0025) [2025-01-04 01:34:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 15287.1). Total num frames: 238833664. Throughput: 0: 3828.0. Samples: 48879562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:43,968][134211] Avg episode reward: [(0, '7.031')] [2025-01-04 01:34:45,023][134294] Updated weights for policy 0, policy_version 58314 (0.0018) [2025-01-04 01:34:47,146][134294] Updated weights for policy 0, policy_version 58324 (0.0016) [2025-01-04 01:34:48,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15291.8, 300 sec: 15287.1). Total num frames: 238927872. Throughput: 0: 3883.8. Samples: 48891996. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:48,968][134211] Avg episode reward: [(0, '7.297')] [2025-01-04 01:34:49,231][134294] Updated weights for policy 0, policy_version 58334 (0.0013) [2025-01-04 01:34:51,152][134294] Updated weights for policy 0, policy_version 58344 (0.0013) [2025-01-04 01:34:52,977][134294] Updated weights for policy 0, policy_version 58354 (0.0016) [2025-01-04 01:34:53,968][134211] Fps is (10 sec: 20070.3, 60 sec: 15974.4, 300 sec: 15328.8). Total num frames: 239034368. Throughput: 0: 3857.2. Samples: 48923156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:53,968][134211] Avg episode reward: [(0, '6.402')] [2025-01-04 01:34:54,886][134294] Updated weights for policy 0, policy_version 58364 (0.0013) [2025-01-04 01:34:57,705][134294] Updated weights for policy 0, policy_version 58374 (0.0024) [2025-01-04 01:34:58,968][134211] Fps is (10 sec: 18431.8, 60 sec: 16111.5, 300 sec: 15384.3). Total num frames: 239112192. Throughput: 0: 3752.2. Samples: 48949570. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:34:58,968][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 01:35:00,956][134294] Updated weights for policy 0, policy_version 58384 (0.0029) [2025-01-04 01:35:03,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15769.5, 300 sec: 15384.3). Total num frames: 239177728. Throughput: 0: 3763.5. Samples: 48959208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:35:03,969][134211] Avg episode reward: [(0, '7.430')] [2025-01-04 01:35:04,087][134294] Updated weights for policy 0, policy_version 58394 (0.0027) [2025-01-04 01:35:07,102][134294] Updated weights for policy 0, policy_version 58404 (0.0025) [2025-01-04 01:35:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.7, 300 sec: 15384.3). Total num frames: 239243264. Throughput: 0: 3770.1. Samples: 48978960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:08,968][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 01:35:10,249][134294] Updated weights for policy 0, policy_version 58414 (0.0026) [2025-01-04 01:35:13,270][134294] Updated weights for policy 0, policy_version 58424 (0.0026) [2025-01-04 01:35:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 15398.2). Total num frames: 239312896. Throughput: 0: 3795.8. Samples: 48999310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:13,969][134211] Avg episode reward: [(0, '6.712')] [2025-01-04 01:35:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000058426_239312896.pth... [2025-01-04 01:35:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057533_235655168.pth [2025-01-04 01:35:16,260][134294] Updated weights for policy 0, policy_version 58434 (0.0026) [2025-01-04 01:35:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.4, 300 sec: 15273.2). Total num frames: 239378432. 
Throughput: 0: 3791.2. Samples: 49009408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:18,968][134211] Avg episode reward: [(0, '6.555')] [2025-01-04 01:35:19,352][134294] Updated weights for policy 0, policy_version 58444 (0.0022) [2025-01-04 01:35:22,320][134294] Updated weights for policy 0, policy_version 58454 (0.0026) [2025-01-04 01:35:23,986][134211] Fps is (10 sec: 13083.5, 60 sec: 14604.6, 300 sec: 15272.3). Total num frames: 239443968. Throughput: 0: 3778.2. Samples: 49029254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:23,987][134211] Avg episode reward: [(0, '6.808')] [2025-01-04 01:35:25,244][134294] Updated weights for policy 0, policy_version 58464 (0.0022) [2025-01-04 01:35:27,097][134294] Updated weights for policy 0, policy_version 58474 (0.0015) [2025-01-04 01:35:28,968][134211] Fps is (10 sec: 16793.0, 60 sec: 15291.7, 300 sec: 15398.2). Total num frames: 239546368. Throughput: 0: 3931.5. Samples: 49056482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:28,968][134211] Avg episode reward: [(0, '7.290')] [2025-01-04 01:35:28,982][134294] Updated weights for policy 0, policy_version 58484 (0.0015) [2025-01-04 01:35:30,934][134294] Updated weights for policy 0, policy_version 58494 (0.0013) [2025-01-04 01:35:32,811][134294] Updated weights for policy 0, policy_version 58504 (0.0013) [2025-01-04 01:35:33,968][134211] Fps is (10 sec: 21338.5, 60 sec: 15974.4, 300 sec: 15467.6). Total num frames: 239656960. Throughput: 0: 4010.3. Samples: 49072458. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:33,968][134211] Avg episode reward: [(0, '6.955')] [2025-01-04 01:35:34,678][134294] Updated weights for policy 0, policy_version 58514 (0.0013) [2025-01-04 01:35:36,794][134294] Updated weights for policy 0, policy_version 58524 (0.0016) [2025-01-04 01:35:38,971][134211] Fps is (10 sec: 19245.8, 60 sec: 16178.3, 300 sec: 15495.2). Total num frames: 239738880. Throughput: 0: 3996.1. 
Samples: 49102994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:35:38,971][134211] Avg episode reward: [(0, '7.022')] [2025-01-04 01:35:40,215][134294] Updated weights for policy 0, policy_version 58534 (0.0026) [2025-01-04 01:35:43,457][134294] Updated weights for policy 0, policy_version 58544 (0.0026) [2025-01-04 01:35:43,968][134211] Fps is (10 sec: 14334.7, 60 sec: 16110.7, 300 sec: 15467.6). Total num frames: 239800320. Throughput: 0: 3822.3. Samples: 49121578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:35:43,969][134211] Avg episode reward: [(0, '7.216')] [2025-01-04 01:35:46,484][134294] Updated weights for policy 0, policy_version 58554 (0.0030) [2025-01-04 01:35:48,970][134211] Fps is (10 sec: 12698.9, 60 sec: 15632.5, 300 sec: 15328.6). Total num frames: 239865856. Throughput: 0: 3826.8. Samples: 49131422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:35:48,970][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 01:35:49,621][134294] Updated weights for policy 0, policy_version 58564 (0.0027) [2025-01-04 01:35:52,720][134294] Updated weights for policy 0, policy_version 58574 (0.0026) [2025-01-04 01:35:53,968][134211] Fps is (10 sec: 13517.9, 60 sec: 15018.6, 300 sec: 15203.8). Total num frames: 239935488. Throughput: 0: 3828.0. Samples: 49151220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:35:53,968][134211] Avg episode reward: [(0, '6.697')] [2025-01-04 01:35:55,732][134294] Updated weights for policy 0, policy_version 58584 (0.0026) [2025-01-04 01:35:58,634][134294] Updated weights for policy 0, policy_version 58594 (0.0027) [2025-01-04 01:35:58,968][134211] Fps is (10 sec: 13519.5, 60 sec: 14813.8, 300 sec: 15162.1). Total num frames: 240001024. Throughput: 0: 3836.0. Samples: 49171928. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:35:58,968][134211] Avg episode reward: [(0, '6.774')] [2025-01-04 01:36:01,738][134294] Updated weights for policy 0, policy_version 58604 (0.0028) [2025-01-04 01:36:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 15189.9). Total num frames: 240070656. Throughput: 0: 3830.0. Samples: 49181758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:36:03,968][134211] Avg episode reward: [(0, '7.469')] [2025-01-04 01:36:04,625][134294] Updated weights for policy 0, policy_version 58614 (0.0023) [2025-01-04 01:36:06,649][134294] Updated weights for policy 0, policy_version 58624 (0.0013) [2025-01-04 01:36:08,504][134294] Updated weights for policy 0, policy_version 58634 (0.0014) [2025-01-04 01:36:08,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15496.5, 300 sec: 15314.9). Total num frames: 240173056. Throughput: 0: 3955.5. Samples: 49207180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:36:08,968][134211] Avg episode reward: [(0, '7.109')] [2025-01-04 01:36:10,408][134294] Updated weights for policy 0, policy_version 58644 (0.0013) [2025-01-04 01:36:12,247][134294] Updated weights for policy 0, policy_version 58654 (0.0013) [2025-01-04 01:36:13,967][134211] Fps is (10 sec: 20890.1, 60 sec: 16111.0, 300 sec: 15467.6). Total num frames: 240279552. Throughput: 0: 4074.8. Samples: 49239846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:36:13,968][134211] Avg episode reward: [(0, '6.696')] [2025-01-04 01:36:14,175][134294] Updated weights for policy 0, policy_version 58664 (0.0014) [2025-01-04 01:36:17,142][134294] Updated weights for policy 0, policy_version 58674 (0.0025) [2025-01-04 01:36:18,968][134211] Fps is (10 sec: 17612.8, 60 sec: 16179.2, 300 sec: 15467.6). Total num frames: 240349184. Throughput: 0: 3997.2. Samples: 49252334. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:18,968][134211] Avg episode reward: [(0, '6.731')] [2025-01-04 01:36:20,422][134294] Updated weights for policy 0, policy_version 58684 (0.0028) [2025-01-04 01:36:23,460][134294] Updated weights for policy 0, policy_version 58694 (0.0024) [2025-01-04 01:36:23,968][134211] Fps is (10 sec: 13516.4, 60 sec: 16184.1, 300 sec: 15453.7). Total num frames: 240414720. Throughput: 0: 3746.2. Samples: 49271562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:23,968][134211] Avg episode reward: [(0, '6.892')] [2025-01-04 01:36:26,552][134294] Updated weights for policy 0, policy_version 58704 (0.0027) [2025-01-04 01:36:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15564.9, 300 sec: 15453.7). Total num frames: 240480256. Throughput: 0: 3769.1. Samples: 49291186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:28,968][134211] Avg episode reward: [(0, '6.767')] [2025-01-04 01:36:29,766][134294] Updated weights for policy 0, policy_version 58714 (0.0026) [2025-01-04 01:36:32,844][134294] Updated weights for policy 0, policy_version 58724 (0.0029) [2025-01-04 01:36:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.8, 300 sec: 15453.7). Total num frames: 240545792. Throughput: 0: 3769.1. Samples: 49301022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:33,968][134211] Avg episode reward: [(0, '6.960')] [2025-01-04 01:36:35,865][134294] Updated weights for policy 0, policy_version 58734 (0.0025) [2025-01-04 01:36:38,508][134294] Updated weights for policy 0, policy_version 58744 (0.0021) [2025-01-04 01:36:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14746.4, 300 sec: 15426.0). Total num frames: 240623616. Throughput: 0: 3787.5. Samples: 49321658. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:38,968][134211] Avg episode reward: [(0, '6.626')] [2025-01-04 01:36:40,442][134294] Updated weights for policy 0, policy_version 58754 (0.0013) [2025-01-04 01:36:42,291][134294] Updated weights for policy 0, policy_version 58764 (0.0014) [2025-01-04 01:36:43,967][134211] Fps is (10 sec: 18432.5, 60 sec: 15496.8, 300 sec: 15412.1). Total num frames: 240730112. Throughput: 0: 4010.5. Samples: 49352400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:43,968][134211] Avg episode reward: [(0, '7.261')] [2025-01-04 01:36:44,159][134294] Updated weights for policy 0, policy_version 58774 (0.0013) [2025-01-04 01:36:46,299][134294] Updated weights for policy 0, policy_version 58784 (0.0013) [2025-01-04 01:36:48,908][134294] Updated weights for policy 0, policy_version 58794 (0.0023) [2025-01-04 01:36:48,968][134211] Fps is (10 sec: 19660.7, 60 sec: 15906.7, 300 sec: 15453.7). Total num frames: 240820224. Throughput: 0: 4128.8. Samples: 49367554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:48,968][134211] Avg episode reward: [(0, '7.753')] [2025-01-04 01:36:52,647][134294] Updated weights for policy 0, policy_version 58804 (0.0028) [2025-01-04 01:36:53,968][134211] Fps is (10 sec: 14745.1, 60 sec: 15701.3, 300 sec: 15426.0). Total num frames: 240877568. Throughput: 0: 3999.5. Samples: 49387158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:53,968][134211] Avg episode reward: [(0, '7.733')] [2025-01-04 01:36:55,836][134294] Updated weights for policy 0, policy_version 58814 (0.0026) [2025-01-04 01:36:58,762][134294] Updated weights for policy 0, policy_version 58824 (0.0025) [2025-01-04 01:36:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15701.4, 300 sec: 15426.0). Total num frames: 240943104. Throughput: 0: 3711.9. Samples: 49406880. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:36:58,968][134211] Avg episode reward: [(0, '7.521')] [2025-01-04 01:37:01,911][134294] Updated weights for policy 0, policy_version 58834 (0.0023) [2025-01-04 01:37:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15633.1, 300 sec: 15426.0). Total num frames: 241008640. Throughput: 0: 3654.5. Samples: 49416786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:03,968][134211] Avg episode reward: [(0, '7.170')] [2025-01-04 01:37:05,120][134294] Updated weights for policy 0, policy_version 58844 (0.0024) [2025-01-04 01:37:08,157][134294] Updated weights for policy 0, policy_version 58854 (0.0025) [2025-01-04 01:37:08,967][134211] Fps is (10 sec: 13107.4, 60 sec: 15018.7, 300 sec: 15412.1). Total num frames: 241074176. Throughput: 0: 3660.8. Samples: 49436296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:08,968][134211] Avg episode reward: [(0, '7.044')] [2025-01-04 01:37:10,328][134294] Updated weights for policy 0, policy_version 58864 (0.0017) [2025-01-04 01:37:12,195][134294] Updated weights for policy 0, policy_version 58874 (0.0014) [2025-01-04 01:37:13,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15086.9, 300 sec: 15495.4). Total num frames: 241184768. Throughput: 0: 3874.3. Samples: 49465530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:13,968][134211] Avg episode reward: [(0, '6.358')] [2025-01-04 01:37:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000058883_241184768.pth... 
[2025-01-04 01:37:14,025][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000057976_237469696.pth [2025-01-04 01:37:14,112][134294] Updated weights for policy 0, policy_version 58884 (0.0013) [2025-01-04 01:37:15,981][134294] Updated weights for policy 0, policy_version 58894 (0.0015) [2025-01-04 01:37:17,911][134294] Updated weights for policy 0, policy_version 58904 (0.0013) [2025-01-04 01:37:18,968][134211] Fps is (10 sec: 21707.8, 60 sec: 15701.3, 300 sec: 15578.7). Total num frames: 241291264. Throughput: 0: 4015.4. Samples: 49481714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:18,968][134211] Avg episode reward: [(0, '6.971')] [2025-01-04 01:37:19,920][134294] Updated weights for policy 0, policy_version 58914 (0.0015) [2025-01-04 01:37:22,902][134294] Updated weights for policy 0, policy_version 58924 (0.0025) [2025-01-04 01:37:23,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15769.6, 300 sec: 15550.9). Total num frames: 241360896. Throughput: 0: 4162.0. Samples: 49508950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:23,969][134211] Avg episode reward: [(0, '7.166')] [2025-01-04 01:37:26,330][134294] Updated weights for policy 0, policy_version 58934 (0.0028) [2025-01-04 01:37:28,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15701.3, 300 sec: 15453.7). Total num frames: 241422336. Throughput: 0: 3878.7. Samples: 49526944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:37:28,968][134211] Avg episode reward: [(0, '6.111')] [2025-01-04 01:37:29,671][134294] Updated weights for policy 0, policy_version 58944 (0.0027) [2025-01-04 01:37:32,800][134294] Updated weights for policy 0, policy_version 58954 (0.0026) [2025-01-04 01:37:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.3, 300 sec: 15453.7). Total num frames: 241487872. Throughput: 0: 3761.1. Samples: 49536802. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:33,968][134211] Avg episode reward: [(0, '6.390')] [2025-01-04 01:37:35,741][134294] Updated weights for policy 0, policy_version 58964 (0.0024) [2025-01-04 01:37:38,790][134294] Updated weights for policy 0, policy_version 58974 (0.0027) [2025-01-04 01:37:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15564.7, 300 sec: 15467.6). Total num frames: 241557504. Throughput: 0: 3782.0. Samples: 49557350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:38,968][134211] Avg episode reward: [(0, '7.045')] [2025-01-04 01:37:41,700][134294] Updated weights for policy 0, policy_version 58984 (0.0025) [2025-01-04 01:37:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.3, 300 sec: 15398.2). Total num frames: 241627136. Throughput: 0: 3789.4. Samples: 49577404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:43,968][134211] Avg episode reward: [(0, '6.445')] [2025-01-04 01:37:44,949][134294] Updated weights for policy 0, policy_version 58994 (0.0027) [2025-01-04 01:37:47,972][134294] Updated weights for policy 0, policy_version 59004 (0.0024) [2025-01-04 01:37:48,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14609.1, 300 sec: 15273.2). Total num frames: 241696768. Throughput: 0: 3785.0. Samples: 49587110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:48,968][134211] Avg episode reward: [(0, '6.401')] [2025-01-04 01:37:50,080][134294] Updated weights for policy 0, policy_version 59014 (0.0012) [2025-01-04 01:37:52,003][134294] Updated weights for policy 0, policy_version 59024 (0.0013) [2025-01-04 01:37:53,896][134294] Updated weights for policy 0, policy_version 59034 (0.0014) [2025-01-04 01:37:53,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15428.3, 300 sec: 15356.5). Total num frames: 241803264. Throughput: 0: 3966.9. Samples: 49614808. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:53,968][134211] Avg episode reward: [(0, '7.098')] [2025-01-04 01:37:55,778][134294] Updated weights for policy 0, policy_version 59044 (0.0015) [2025-01-04 01:37:57,637][134294] Updated weights for policy 0, policy_version 59054 (0.0013) [2025-01-04 01:37:58,967][134211] Fps is (10 sec: 21299.3, 60 sec: 16111.0, 300 sec: 15509.3). Total num frames: 241909760. Throughput: 0: 4041.6. Samples: 49647402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:37:58,968][134211] Avg episode reward: [(0, '6.259')] [2025-01-04 01:37:59,556][134294] Updated weights for policy 0, policy_version 59064 (0.0013) [2025-01-04 01:38:02,292][134294] Updated weights for policy 0, policy_version 59074 (0.0022) [2025-01-04 01:38:03,968][134211] Fps is (10 sec: 18021.9, 60 sec: 16247.5, 300 sec: 15537.0). Total num frames: 241983488. Throughput: 0: 3992.0. Samples: 49661354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:03,969][134211] Avg episode reward: [(0, '6.610')] [2025-01-04 01:38:05,683][134294] Updated weights for policy 0, policy_version 59084 (0.0027) [2025-01-04 01:38:08,723][134294] Updated weights for policy 0, policy_version 59094 (0.0026) [2025-01-04 01:38:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 16247.4, 300 sec: 15523.1). Total num frames: 242049024. Throughput: 0: 3810.1. Samples: 49680402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:08,969][134211] Avg episode reward: [(0, '7.762')] [2025-01-04 01:38:11,798][134294] Updated weights for policy 0, policy_version 59104 (0.0025) [2025-01-04 01:38:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15564.7, 300 sec: 15523.1). Total num frames: 242118656. Throughput: 0: 3849.6. Samples: 49700176. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:13,969][134211] Avg episode reward: [(0, '6.682')] [2025-01-04 01:38:14,909][134294] Updated weights for policy 0, policy_version 59114 (0.0028) [2025-01-04 01:38:18,161][134294] Updated weights for policy 0, policy_version 59124 (0.0023) [2025-01-04 01:38:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.9, 300 sec: 15509.3). Total num frames: 242180096. Throughput: 0: 3847.4. Samples: 49709934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:18,969][134211] Avg episode reward: [(0, '6.675')] [2025-01-04 01:38:21,184][134294] Updated weights for policy 0, policy_version 59134 (0.0026) [2025-01-04 01:38:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14813.9, 300 sec: 15509.3). Total num frames: 242249728. Throughput: 0: 3828.6. Samples: 49729638. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:23,968][134211] Avg episode reward: [(0, '7.270')] [2025-01-04 01:38:24,342][134294] Updated weights for policy 0, policy_version 59144 (0.0026) [2025-01-04 01:38:27,288][134294] Updated weights for policy 0, policy_version 59154 (0.0024) [2025-01-04 01:38:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.1, 300 sec: 15384.3). Total num frames: 242315264. Throughput: 0: 3835.8. Samples: 49750016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:28,968][134211] Avg episode reward: [(0, '6.856')] [2025-01-04 01:38:30,210][134294] Updated weights for policy 0, policy_version 59164 (0.0023) [2025-01-04 01:38:32,092][134294] Updated weights for policy 0, policy_version 59174 (0.0013) [2025-01-04 01:38:33,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15428.3, 300 sec: 15370.4). Total num frames: 242413568. Throughput: 0: 3895.5. Samples: 49762406. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:33,968][134211] Avg episode reward: [(0, '7.179')] [2025-01-04 01:38:34,013][134294] Updated weights for policy 0, policy_version 59184 (0.0014) [2025-01-04 01:38:35,828][134294] Updated weights for policy 0, policy_version 59194 (0.0012) [2025-01-04 01:38:37,737][134294] Updated weights for policy 0, policy_version 59204 (0.0013) [2025-01-04 01:38:38,968][134211] Fps is (10 sec: 20480.0, 60 sec: 16042.7, 300 sec: 15495.4). Total num frames: 242520064. Throughput: 0: 4008.7. Samples: 49795200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:38,968][134211] Avg episode reward: [(0, '6.979')] [2025-01-04 01:38:40,490][134294] Updated weights for policy 0, policy_version 59214 (0.0022) [2025-01-04 01:38:43,640][134294] Updated weights for policy 0, policy_version 59224 (0.0028) [2025-01-04 01:38:43,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15906.1, 300 sec: 15495.4). Total num frames: 242581504. Throughput: 0: 3771.2. Samples: 49817106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:38:43,969][134211] Avg episode reward: [(0, '6.644')] [2025-01-04 01:38:46,988][134294] Updated weights for policy 0, policy_version 59234 (0.0027) [2025-01-04 01:38:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15769.5, 300 sec: 15481.5). Total num frames: 242642944. Throughput: 0: 3667.7. Samples: 49826398. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:38:48,969][134211] Avg episode reward: [(0, '6.939')] [2025-01-04 01:38:50,575][134294] Updated weights for policy 0, policy_version 59244 (0.0030) [2025-01-04 01:38:53,815][134294] Updated weights for policy 0, policy_version 59254 (0.0023) [2025-01-04 01:38:53,967][134211] Fps is (10 sec: 12288.4, 60 sec: 15018.7, 300 sec: 15453.8). Total num frames: 242704384. Throughput: 0: 3636.8. Samples: 49844058. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:38:53,968][134211] Avg episode reward: [(0, '7.800')] [2025-01-04 01:38:55,906][134294] Updated weights for policy 0, policy_version 59264 (0.0015) [2025-01-04 01:38:57,694][134294] Updated weights for policy 0, policy_version 59274 (0.0013) [2025-01-04 01:38:58,968][134211] Fps is (10 sec: 16794.0, 60 sec: 15018.6, 300 sec: 15523.1). Total num frames: 242810880. Throughput: 0: 3824.5. Samples: 49872278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:38:58,968][134211] Avg episode reward: [(0, '6.788')] [2025-01-04 01:38:59,676][134294] Updated weights for policy 0, policy_version 59284 (0.0013) [2025-01-04 01:39:01,586][134294] Updated weights for policy 0, policy_version 59294 (0.0013) [2025-01-04 01:39:03,707][134294] Updated weights for policy 0, policy_version 59304 (0.0016) [2025-01-04 01:39:03,968][134211] Fps is (10 sec: 20889.0, 60 sec: 15496.6, 300 sec: 15495.4). Total num frames: 242913280. Throughput: 0: 3960.0. Samples: 49888134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:39:03,968][134211] Avg episode reward: [(0, '7.204')] [2025-01-04 01:39:06,704][134294] Updated weights for policy 0, policy_version 59314 (0.0027) [2025-01-04 01:39:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15428.2, 300 sec: 15370.4). Total num frames: 242974720. Throughput: 0: 4051.6. Samples: 49911962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:39:08,968][134211] Avg episode reward: [(0, '7.046')] [2025-01-04 01:39:10,039][134294] Updated weights for policy 0, policy_version 59324 (0.0026) [2025-01-04 01:39:13,146][134294] Updated weights for policy 0, policy_version 59334 (0.0025) [2025-01-04 01:39:13,969][134211] Fps is (10 sec: 12696.1, 60 sec: 15359.7, 300 sec: 15398.1). Total num frames: 243040256. Throughput: 0: 4026.8. Samples: 49931226. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:39:13,970][134211] Avg episode reward: [(0, '7.505')] [2025-01-04 01:39:13,986][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000059336_243040256.pth... [2025-01-04 01:39:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000058426_239312896.pth [2025-01-04 01:39:16,323][134294] Updated weights for policy 0, policy_version 59344 (0.0028) [2025-01-04 01:39:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15496.5, 300 sec: 15398.2). Total num frames: 243109888. Throughput: 0: 3968.5. Samples: 49940988. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:39:18,968][134211] Avg episode reward: [(0, '6.850')] [2025-01-04 01:39:19,294][134294] Updated weights for policy 0, policy_version 59354 (0.0026) [2025-01-04 01:39:22,451][134294] Updated weights for policy 0, policy_version 59364 (0.0026) [2025-01-04 01:39:23,968][134211] Fps is (10 sec: 13108.4, 60 sec: 15359.9, 300 sec: 15398.2). Total num frames: 243171328. Throughput: 0: 3686.3. Samples: 49961084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:23,969][134211] Avg episode reward: [(0, '7.428')] [2025-01-04 01:39:25,421][134294] Updated weights for policy 0, policy_version 59374 (0.0024) [2025-01-04 01:39:28,345][134294] Updated weights for policy 0, policy_version 59384 (0.0024) [2025-01-04 01:39:28,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15564.8, 300 sec: 15426.0). Total num frames: 243249152. Throughput: 0: 3656.2. Samples: 49981632. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:28,968][134211] Avg episode reward: [(0, '7.669')] [2025-01-04 01:39:30,344][134294] Updated weights for policy 0, policy_version 59394 (0.0012) [2025-01-04 01:39:32,186][134294] Updated weights for policy 0, policy_version 59404 (0.0014) [2025-01-04 01:39:33,968][134211] Fps is (10 sec: 18432.8, 60 sec: 15701.3, 300 sec: 15550.9). Total num frames: 243355648. Throughput: 0: 3798.6. Samples: 49997336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:33,968][134211] Avg episode reward: [(0, '7.579')] [2025-01-04 01:39:34,100][134294] Updated weights for policy 0, policy_version 59414 (0.0012) [2025-01-04 01:39:35,940][134294] Updated weights for policy 0, policy_version 59424 (0.0014) [2025-01-04 01:39:37,848][134294] Updated weights for policy 0, policy_version 59434 (0.0011) [2025-01-04 01:39:38,967][134211] Fps is (10 sec: 21299.3, 60 sec: 15701.4, 300 sec: 15689.8). Total num frames: 243462144. Throughput: 0: 4132.7. Samples: 50030030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:38,970][134211] Avg episode reward: [(0, '6.403')] [2025-01-04 01:39:39,768][134294] Updated weights for policy 0, policy_version 59444 (0.0014) [2025-01-04 01:39:42,759][134294] Updated weights for policy 0, policy_version 59454 (0.0025) [2025-01-04 01:39:43,969][134211] Fps is (10 sec: 18019.5, 60 sec: 15905.8, 300 sec: 15620.2). Total num frames: 243535872. Throughput: 0: 4067.5. Samples: 50055324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:43,970][134211] Avg episode reward: [(0, '7.010')] [2025-01-04 01:39:45,998][134294] Updated weights for policy 0, policy_version 59464 (0.0025) [2025-01-04 01:39:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15974.4, 300 sec: 15481.5). Total num frames: 243601408. Throughput: 0: 3929.4. Samples: 50064956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:48,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 01:39:49,273][134294] Updated weights for policy 0, policy_version 59474 (0.0027) [2025-01-04 01:39:52,258][134294] Updated weights for policy 0, policy_version 59484 (0.0026) [2025-01-04 01:39:53,968][134211] Fps is (10 sec: 13109.2, 60 sec: 16042.6, 300 sec: 15439.8). Total num frames: 243666944. Throughput: 0: 3832.5. Samples: 50084424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:39:53,968][134211] Avg episode reward: [(0, '6.543')] [2025-01-04 01:39:55,317][134294] Updated weights for policy 0, policy_version 59494 (0.0026) [2025-01-04 01:39:58,288][134294] Updated weights for policy 0, policy_version 59504 (0.0025) [2025-01-04 01:39:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15428.3, 300 sec: 15453.7). Total num frames: 243736576. Throughput: 0: 3857.3. Samples: 50104798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:39:58,968][134211] Avg episode reward: [(0, '6.617')] [2025-01-04 01:40:01,295][134294] Updated weights for policy 0, policy_version 59514 (0.0024) [2025-01-04 01:40:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14813.9, 300 sec: 15453.7). Total num frames: 243802112. Throughput: 0: 3869.2. Samples: 50115102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:03,968][134211] Avg episode reward: [(0, '6.262')] [2025-01-04 01:40:04,456][134294] Updated weights for policy 0, policy_version 59524 (0.0025) [2025-01-04 01:40:07,492][134294] Updated weights for policy 0, policy_version 59534 (0.0026) [2025-01-04 01:40:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15018.7, 300 sec: 15467.6). Total num frames: 243875840. Throughput: 0: 3866.9. Samples: 50135092. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:08,968][134211] Avg episode reward: [(0, '6.432')] [2025-01-04 01:40:09,610][134294] Updated weights for policy 0, policy_version 59544 (0.0015) [2025-01-04 01:40:12,091][134294] Updated weights for policy 0, policy_version 59554 (0.0021) [2025-01-04 01:40:13,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15292.1, 300 sec: 15523.1). Total num frames: 243957760. Throughput: 0: 3963.6. Samples: 50159992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:13,968][134211] Avg episode reward: [(0, '6.604')] [2025-01-04 01:40:15,130][134294] Updated weights for policy 0, policy_version 59564 (0.0026) [2025-01-04 01:40:17,083][134294] Updated weights for policy 0, policy_version 59574 (0.0013) [2025-01-04 01:40:18,967][134211] Fps is (10 sec: 17612.8, 60 sec: 15701.4, 300 sec: 15621.3). Total num frames: 244051968. Throughput: 0: 3886.5. Samples: 50172226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:18,968][134211] Avg episode reward: [(0, '6.398')] [2025-01-04 01:40:18,999][134294] Updated weights for policy 0, policy_version 59584 (0.0013) [2025-01-04 01:40:20,889][134294] Updated weights for policy 0, policy_version 59594 (0.0014) [2025-01-04 01:40:23,249][134294] Updated weights for policy 0, policy_version 59604 (0.0017) [2025-01-04 01:40:23,968][134211] Fps is (10 sec: 18841.3, 60 sec: 16247.5, 300 sec: 15592.6). Total num frames: 244146176. Throughput: 0: 3863.0. Samples: 50203864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:23,968][134211] Avg episode reward: [(0, '6.442')] [2025-01-04 01:40:26,497][134294] Updated weights for policy 0, policy_version 59614 (0.0030) [2025-01-04 01:40:28,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15974.4, 300 sec: 15425.9). Total num frames: 244207616. Throughput: 0: 3727.3. Samples: 50223046. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:28,968][134211] Avg episode reward: [(0, '6.842')] [2025-01-04 01:40:29,807][134294] Updated weights for policy 0, policy_version 59624 (0.0028) [2025-01-04 01:40:32,907][134294] Updated weights for policy 0, policy_version 59634 (0.0027) [2025-01-04 01:40:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15370.6). Total num frames: 244273152. Throughput: 0: 3727.3. Samples: 50232686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:40:33,968][134211] Avg episode reward: [(0, '6.598')] [2025-01-04 01:40:35,851][134294] Updated weights for policy 0, policy_version 59644 (0.0023) [2025-01-04 01:40:37,709][134294] Updated weights for policy 0, policy_version 59654 (0.0013) [2025-01-04 01:40:38,967][134211] Fps is (10 sec: 15974.8, 60 sec: 15086.9, 300 sec: 15481.5). Total num frames: 244367360. Throughput: 0: 3813.2. Samples: 50256016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:40:38,968][134211] Avg episode reward: [(0, '6.639')] [2025-01-04 01:40:39,600][134294] Updated weights for policy 0, policy_version 59664 (0.0013) [2025-01-04 01:40:41,481][134294] Updated weights for policy 0, policy_version 59674 (0.0015) [2025-01-04 01:40:43,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15428.6, 300 sec: 15578.8). Total num frames: 244461568. Throughput: 0: 4048.2. Samples: 50286970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:40:43,969][134211] Avg episode reward: [(0, '6.457')] [2025-01-04 01:40:43,987][134294] Updated weights for policy 0, policy_version 59684 (0.0022) [2025-01-04 01:40:47,284][134294] Updated weights for policy 0, policy_version 59694 (0.0029) [2025-01-04 01:40:48,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15360.0, 300 sec: 15550.9). Total num frames: 244523008. Throughput: 0: 4038.4. Samples: 50296828. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:40:48,969][134211] Avg episode reward: [(0, '7.318')] [2025-01-04 01:40:50,962][134294] Updated weights for policy 0, policy_version 59704 (0.0028) [2025-01-04 01:40:53,968][134211] Fps is (10 sec: 12288.3, 60 sec: 15291.7, 300 sec: 15537.0). Total num frames: 244584448. Throughput: 0: 3972.9. Samples: 50313872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:40:53,968][134211] Avg episode reward: [(0, '7.495')] [2025-01-04 01:40:54,174][134294] Updated weights for policy 0, policy_version 59714 (0.0025) [2025-01-04 01:40:56,353][134294] Updated weights for policy 0, policy_version 59724 (0.0016) [2025-01-04 01:40:58,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15496.5, 300 sec: 15578.7). Total num frames: 244666368. Throughput: 0: 3936.1. Samples: 50337116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:40:58,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 01:40:59,289][134294] Updated weights for policy 0, policy_version 59734 (0.0025) [2025-01-04 01:41:02,442][134294] Updated weights for policy 0, policy_version 59744 (0.0027) [2025-01-04 01:41:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 15496.6, 300 sec: 15453.7). Total num frames: 244731904. Throughput: 0: 3886.2. Samples: 50347104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:41:03,968][134211] Avg episode reward: [(0, '7.476')] [2025-01-04 01:41:04,966][134294] Updated weights for policy 0, policy_version 59754 (0.0021) [2025-01-04 01:41:06,869][134294] Updated weights for policy 0, policy_version 59764 (0.0012) [2025-01-04 01:41:08,744][134294] Updated weights for policy 0, policy_version 59774 (0.0014) [2025-01-04 01:41:08,968][134211] Fps is (10 sec: 17203.4, 60 sec: 16042.7, 300 sec: 15453.7). Total num frames: 244838400. Throughput: 0: 3764.6. Samples: 50373270. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:41:08,968][134211] Avg episode reward: [(0, '7.573')] [2025-01-04 01:41:10,671][134294] Updated weights for policy 0, policy_version 59784 (0.0014) [2025-01-04 01:41:12,544][134294] Updated weights for policy 0, policy_version 59794 (0.0013) [2025-01-04 01:41:13,968][134211] Fps is (10 sec: 21299.5, 60 sec: 16452.3, 300 sec: 15578.7). Total num frames: 244944896. Throughput: 0: 4062.2. Samples: 50405846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:13,968][134211] Avg episode reward: [(0, '7.061')] [2025-01-04 01:41:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000059802_244948992.pth... [2025-01-04 01:41:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000058883_241184768.pth [2025-01-04 01:41:14,441][134294] Updated weights for policy 0, policy_version 59804 (0.0014) [2025-01-04 01:41:16,620][134294] Updated weights for policy 0, policy_version 59814 (0.0017) [2025-01-04 01:41:18,968][134211] Fps is (10 sec: 18841.3, 60 sec: 16247.4, 300 sec: 15634.2). Total num frames: 245026816. Throughput: 0: 4185.3. Samples: 50421026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:18,968][134211] Avg episode reward: [(0, '6.373')] [2025-01-04 01:41:19,841][134294] Updated weights for policy 0, policy_version 59824 (0.0027) [2025-01-04 01:41:23,070][134294] Updated weights for policy 0, policy_version 59834 (0.0027) [2025-01-04 01:41:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15701.4, 300 sec: 15620.3). Total num frames: 245088256. Throughput: 0: 4098.7. Samples: 50440460. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:23,968][134211] Avg episode reward: [(0, '6.589')] [2025-01-04 01:41:26,137][134294] Updated weights for policy 0, policy_version 59844 (0.0027) [2025-01-04 01:41:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15769.6, 300 sec: 15620.3). Total num frames: 245153792. Throughput: 0: 3837.0. Samples: 50459636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:28,968][134211] Avg episode reward: [(0, '6.458')] [2025-01-04 01:41:29,476][134294] Updated weights for policy 0, policy_version 59854 (0.0025) [2025-01-04 01:41:32,526][134294] Updated weights for policy 0, policy_version 59864 (0.0029) [2025-01-04 01:41:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15769.6, 300 sec: 15578.7). Total num frames: 245219328. Throughput: 0: 3832.1. Samples: 50469274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:33,968][134211] Avg episode reward: [(0, '7.278')] [2025-01-04 01:41:35,585][134294] Updated weights for policy 0, policy_version 59874 (0.0027) [2025-01-04 01:41:38,495][134294] Updated weights for policy 0, policy_version 59884 (0.0025) [2025-01-04 01:41:38,968][134211] Fps is (10 sec: 13516.2, 60 sec: 15359.9, 300 sec: 15453.7). Total num frames: 245288960. Throughput: 0: 3912.1. Samples: 50489916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:38,969][134211] Avg episode reward: [(0, '7.227')] [2025-01-04 01:41:41,564][134294] Updated weights for policy 0, policy_version 59894 (0.0022) [2025-01-04 01:41:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.2, 300 sec: 15370.4). Total num frames: 245354496. Throughput: 0: 3844.1. Samples: 50510100. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:41:43,968][134211] Avg episode reward: [(0, '6.989')] [2025-01-04 01:41:44,605][134294] Updated weights for policy 0, policy_version 59904 (0.0027) [2025-01-04 01:41:47,607][134294] Updated weights for policy 0, policy_version 59914 (0.0027) [2025-01-04 01:41:48,968][134211] Fps is (10 sec: 14336.6, 60 sec: 15155.2, 300 sec: 15439.8). Total num frames: 245432320. Throughput: 0: 3847.8. Samples: 50520256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:41:48,968][134211] Avg episode reward: [(0, '7.029')] [2025-01-04 01:41:49,658][134294] Updated weights for policy 0, policy_version 59924 (0.0014) [2025-01-04 01:41:51,558][134294] Updated weights for policy 0, policy_version 59934 (0.0013) [2025-01-04 01:41:53,589][134294] Updated weights for policy 0, policy_version 59944 (0.0014) [2025-01-04 01:41:53,968][134211] Fps is (10 sec: 18432.3, 60 sec: 15906.2, 300 sec: 15578.7). Total num frames: 245538816. Throughput: 0: 3896.7. Samples: 50548622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:41:53,968][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 01:41:56,036][134294] Updated weights for policy 0, policy_version 59954 (0.0020) [2025-01-04 01:41:58,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15701.3, 300 sec: 15592.6). Total num frames: 245608448. Throughput: 0: 3715.0. Samples: 50573020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:41:58,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 01:41:59,261][134294] Updated weights for policy 0, policy_version 59964 (0.0027) [2025-01-04 01:42:02,443][134294] Updated weights for policy 0, policy_version 59974 (0.0028) [2025-01-04 01:42:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15633.0, 300 sec: 15578.7). Total num frames: 245669888. Throughput: 0: 3587.2. Samples: 50582450. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:42:03,968][134211] Avg episode reward: [(0, '6.887')] [2025-01-04 01:42:05,166][134294] Updated weights for policy 0, policy_version 59984 (0.0022) [2025-01-04 01:42:07,066][134294] Updated weights for policy 0, policy_version 59994 (0.0012) [2025-01-04 01:42:08,967][134211] Fps is (10 sec: 16384.2, 60 sec: 15564.8, 300 sec: 15550.9). Total num frames: 245772288. Throughput: 0: 3719.4. Samples: 50607832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:42:08,968][134211] Avg episode reward: [(0, '6.912')] [2025-01-04 01:42:09,000][134294] Updated weights for policy 0, policy_version 60004 (0.0014) [2025-01-04 01:42:10,868][134294] Updated weights for policy 0, policy_version 60014 (0.0013) [2025-01-04 01:42:12,724][134294] Updated weights for policy 0, policy_version 60024 (0.0013) [2025-01-04 01:42:13,968][134211] Fps is (10 sec: 21299.8, 60 sec: 15633.1, 300 sec: 15564.8). Total num frames: 245882880. Throughput: 0: 4013.0. Samples: 50640222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:42:13,968][134211] Avg episode reward: [(0, '6.790')] [2025-01-04 01:42:14,595][134294] Updated weights for policy 0, policy_version 60034 (0.0014) [2025-01-04 01:42:16,951][134294] Updated weights for policy 0, policy_version 60044 (0.0019) [2025-01-04 01:42:18,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15564.8, 300 sec: 15592.6). Total num frames: 245960704. Throughput: 0: 4139.9. Samples: 50655568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:42:18,968][134211] Avg episode reward: [(0, '6.316')] [2025-01-04 01:42:20,459][134294] Updated weights for policy 0, policy_version 60054 (0.0030) [2025-01-04 01:42:23,502][134294] Updated weights for policy 0, policy_version 60064 (0.0027) [2025-01-04 01:42:23,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15633.1, 300 sec: 15606.5). Total num frames: 246026240. Throughput: 0: 4099.9. Samples: 50674408. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:42:23,968][134211] Avg episode reward: [(0, '6.876')] [2025-01-04 01:42:26,694][134294] Updated weights for policy 0, policy_version 60074 (0.0025) [2025-01-04 01:42:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15564.8, 300 sec: 15592.6). Total num frames: 246087680. Throughput: 0: 4068.9. Samples: 50693198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:28,968][134211] Avg episode reward: [(0, '6.404')] [2025-01-04 01:42:30,391][134294] Updated weights for policy 0, policy_version 60084 (0.0025) [2025-01-04 01:42:33,603][134294] Updated weights for policy 0, policy_version 60094 (0.0026) [2025-01-04 01:42:33,969][134211] Fps is (10 sec: 12286.7, 60 sec: 15496.3, 300 sec: 15564.7). Total num frames: 246149120. Throughput: 0: 4026.6. Samples: 50701460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:33,969][134211] Avg episode reward: [(0, '6.815')] [2025-01-04 01:42:36,644][134294] Updated weights for policy 0, policy_version 60104 (0.0024) [2025-01-04 01:42:38,968][134211] Fps is (10 sec: 12696.5, 60 sec: 15428.2, 300 sec: 15550.9). Total num frames: 246214656. Throughput: 0: 3839.2. Samples: 50721390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:38,969][134211] Avg episode reward: [(0, '6.963')] [2025-01-04 01:42:39,798][134294] Updated weights for policy 0, policy_version 60114 (0.0026) [2025-01-04 01:42:42,798][134294] Updated weights for policy 0, policy_version 60124 (0.0024) [2025-01-04 01:42:43,967][134211] Fps is (10 sec: 13928.2, 60 sec: 15564.9, 300 sec: 15564.8). Total num frames: 246288384. Throughput: 0: 3748.0. Samples: 50741678. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:43,968][134211] Avg episode reward: [(0, '7.072')] [2025-01-04 01:42:44,822][134294] Updated weights for policy 0, policy_version 60134 (0.0013) [2025-01-04 01:42:46,791][134294] Updated weights for policy 0, policy_version 60144 (0.0012) [2025-01-04 01:42:48,934][134294] Updated weights for policy 0, policy_version 60154 (0.0013) [2025-01-04 01:42:48,968][134211] Fps is (10 sec: 17614.4, 60 sec: 15974.4, 300 sec: 15550.9). Total num frames: 246390784. Throughput: 0: 3892.2. Samples: 50757598. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:48,968][134211] Avg episode reward: [(0, '6.285')] [2025-01-04 01:42:51,383][134294] Updated weights for policy 0, policy_version 60164 (0.0020) [2025-01-04 01:42:53,968][134211] Fps is (10 sec: 17202.1, 60 sec: 15359.9, 300 sec: 15425.9). Total num frames: 246460416. Throughput: 0: 3895.2. Samples: 50783120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:53,969][134211] Avg episode reward: [(0, '7.053')] [2025-01-04 01:42:54,907][134294] Updated weights for policy 0, policy_version 60174 (0.0028) [2025-01-04 01:42:58,048][134294] Updated weights for policy 0, policy_version 60184 (0.0025) [2025-01-04 01:42:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15223.4, 300 sec: 15384.3). Total num frames: 246521856. Throughput: 0: 3594.0. Samples: 50801952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:42:58,968][134211] Avg episode reward: [(0, '7.337')] [2025-01-04 01:43:01,182][134294] Updated weights for policy 0, policy_version 60194 (0.0024) [2025-01-04 01:43:03,967][134211] Fps is (10 sec: 13108.1, 60 sec: 15360.1, 300 sec: 15398.2). Total num frames: 246591488. Throughput: 0: 3471.6. Samples: 50811790. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:03,968][134211] Avg episode reward: [(0, '6.989')] [2025-01-04 01:43:04,114][134294] Updated weights for policy 0, policy_version 60204 (0.0018) [2025-01-04 01:43:05,995][134294] Updated weights for policy 0, policy_version 60214 (0.0015) [2025-01-04 01:43:08,083][134294] Updated weights for policy 0, policy_version 60224 (0.0016) [2025-01-04 01:43:08,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15291.7, 300 sec: 15495.4). Total num frames: 246689792. Throughput: 0: 3652.8. Samples: 50838784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:08,968][134211] Avg episode reward: [(0, '6.705')] [2025-01-04 01:43:11,214][134294] Updated weights for policy 0, policy_version 60234 (0.0027) [2025-01-04 01:43:13,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14472.5, 300 sec: 15495.4). Total num frames: 246751232. Throughput: 0: 3682.7. Samples: 50858922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:13,968][134211] Avg episode reward: [(0, '6.462')] [2025-01-04 01:43:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000060242_246751232.pth... [2025-01-04 01:43:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000059336_243040256.pth [2025-01-04 01:43:14,509][134294] Updated weights for policy 0, policy_version 60244 (0.0029) [2025-01-04 01:43:16,477][134294] Updated weights for policy 0, policy_version 60254 (0.0013) [2025-01-04 01:43:18,384][134294] Updated weights for policy 0, policy_version 60264 (0.0015) [2025-01-04 01:43:18,967][134211] Fps is (10 sec: 16384.2, 60 sec: 14882.2, 300 sec: 15606.5). Total num frames: 246853632. Throughput: 0: 3780.0. Samples: 50871556. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:18,968][134211] Avg episode reward: [(0, '6.442')] [2025-01-04 01:43:20,261][134294] Updated weights for policy 0, policy_version 60274 (0.0013) [2025-01-04 01:43:22,163][134294] Updated weights for policy 0, policy_version 60284 (0.0013) [2025-01-04 01:43:23,968][134211] Fps is (10 sec: 19660.6, 60 sec: 15360.0, 300 sec: 15703.6). Total num frames: 246947840. Throughput: 0: 4062.1. Samples: 50904182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:23,969][134211] Avg episode reward: [(0, '6.558')] [2025-01-04 01:43:25,061][134294] Updated weights for policy 0, policy_version 60294 (0.0027) [2025-01-04 01:43:28,310][134294] Updated weights for policy 0, policy_version 60304 (0.0027) [2025-01-04 01:43:28,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15360.0, 300 sec: 15578.7). Total num frames: 247009280. Throughput: 0: 4051.1. Samples: 50923980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:28,968][134211] Avg episode reward: [(0, '6.503')] [2025-01-04 01:43:31,478][134294] Updated weights for policy 0, policy_version 60314 (0.0024) [2025-01-04 01:43:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15428.5, 300 sec: 15439.8). Total num frames: 247074816. Throughput: 0: 3915.3. Samples: 50933786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:33,968][134211] Avg episode reward: [(0, '7.032')] [2025-01-04 01:43:34,672][134294] Updated weights for policy 0, policy_version 60324 (0.0029) [2025-01-04 01:43:38,033][134294] Updated weights for policy 0, policy_version 60334 (0.0027) [2025-01-04 01:43:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15360.2, 300 sec: 15439.8). Total num frames: 247136256. Throughput: 0: 3766.1. Samples: 50952594. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:43:38,968][134211] Avg episode reward: [(0, '6.392')] [2025-01-04 01:43:41,337][134294] Updated weights for policy 0, policy_version 60344 (0.0025) [2025-01-04 01:43:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15155.1, 300 sec: 15439.8). Total num frames: 247197696. Throughput: 0: 3757.2. Samples: 50971026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:43:43,968][134211] Avg episode reward: [(0, '6.626')] [2025-01-04 01:43:44,587][134294] Updated weights for policy 0, policy_version 60354 (0.0026) [2025-01-04 01:43:46,868][134294] Updated weights for policy 0, policy_version 60364 (0.0014) [2025-01-04 01:43:48,785][134294] Updated weights for policy 0, policy_version 60374 (0.0014) [2025-01-04 01:43:48,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15086.9, 300 sec: 15564.8). Total num frames: 247296000. Throughput: 0: 3790.6. Samples: 50982366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:43:48,968][134211] Avg episode reward: [(0, '7.640')] [2025-01-04 01:43:50,673][134294] Updated weights for policy 0, policy_version 60384 (0.0013) [2025-01-04 01:43:52,538][134294] Updated weights for policy 0, policy_version 60394 (0.0014) [2025-01-04 01:43:53,967][134211] Fps is (10 sec: 20480.6, 60 sec: 15701.5, 300 sec: 15564.8). Total num frames: 247402496. Throughput: 0: 3908.4. Samples: 51014660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:43:53,968][134211] Avg episode reward: [(0, '6.622')] [2025-01-04 01:43:54,425][134294] Updated weights for policy 0, policy_version 60404 (0.0014) [2025-01-04 01:43:57,005][134294] Updated weights for policy 0, policy_version 60414 (0.0022) [2025-01-04 01:43:58,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15974.4, 300 sec: 15481.5). Total num frames: 247480320. Throughput: 0: 4037.8. Samples: 51040622. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:43:58,968][134211] Avg episode reward: [(0, '5.848')] [2025-01-04 01:44:00,194][134294] Updated weights for policy 0, policy_version 60424 (0.0027) [2025-01-04 01:44:03,559][134294] Updated weights for policy 0, policy_version 60434 (0.0032) [2025-01-04 01:44:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15769.5, 300 sec: 15467.6). Total num frames: 247537664. Throughput: 0: 3973.3. Samples: 51050354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:44:03,968][134211] Avg episode reward: [(0, '6.518')] [2025-01-04 01:44:06,677][134294] Updated weights for policy 0, policy_version 60444 (0.0026) [2025-01-04 01:44:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15481.6). Total num frames: 247607296. Throughput: 0: 3673.4. Samples: 51069484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:44:08,968][134211] Avg episode reward: [(0, '6.062')] [2025-01-04 01:44:09,873][134294] Updated weights for policy 0, policy_version 60454 (0.0027) [2025-01-04 01:44:12,912][134294] Updated weights for policy 0, policy_version 60464 (0.0025) [2025-01-04 01:44:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15467.6). Total num frames: 247672832. Throughput: 0: 3675.3. Samples: 51089370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:44:13,968][134211] Avg episode reward: [(0, '6.009')] [2025-01-04 01:44:15,828][134294] Updated weights for policy 0, policy_version 60474 (0.0026) [2025-01-04 01:44:18,915][134294] Updated weights for policy 0, policy_version 60484 (0.0025) [2025-01-04 01:44:18,967][134211] Fps is (10 sec: 13517.0, 60 sec: 14813.9, 300 sec: 15495.4). Total num frames: 247742464. Throughput: 0: 3684.6. Samples: 51099592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:18,968][134211] Avg episode reward: [(0, '6.547')] [2025-01-04 01:44:20,883][134294] Updated weights for policy 0, policy_version 60494 (0.0014) [2025-01-04 01:44:22,827][134294] Updated weights for policy 0, policy_version 60504 (0.0014) [2025-01-04 01:44:23,967][134211] Fps is (10 sec: 17203.4, 60 sec: 14950.5, 300 sec: 15578.7). Total num frames: 247844864. Throughput: 0: 3857.4. Samples: 51126178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:23,968][134211] Avg episode reward: [(0, '6.674')] [2025-01-04 01:44:24,708][134294] Updated weights for policy 0, policy_version 60514 (0.0014) [2025-01-04 01:44:26,696][134294] Updated weights for policy 0, policy_version 60524 (0.0016) [2025-01-04 01:44:28,968][134211] Fps is (10 sec: 19250.9, 60 sec: 15428.3, 300 sec: 15523.1). Total num frames: 247934976. Throughput: 0: 4094.1. Samples: 51155260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:28,968][134211] Avg episode reward: [(0, '6.191')] [2025-01-04 01:44:29,652][134294] Updated weights for policy 0, policy_version 60534 (0.0024) [2025-01-04 01:44:33,004][134294] Updated weights for policy 0, policy_version 60544 (0.0031) [2025-01-04 01:44:33,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15360.0, 300 sec: 15370.4). Total num frames: 247996416. Throughput: 0: 4054.3. Samples: 51164812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:33,968][134211] Avg episode reward: [(0, '6.249')] [2025-01-04 01:44:36,533][134294] Updated weights for policy 0, policy_version 60554 (0.0029) [2025-01-04 01:44:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15291.7, 300 sec: 15315.0). Total num frames: 248053760. Throughput: 0: 3719.7. Samples: 51182048. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:38,968][134211] Avg episode reward: [(0, '5.842')] [2025-01-04 01:44:39,820][134294] Updated weights for policy 0, policy_version 60564 (0.0024) [2025-01-04 01:44:41,710][134294] Updated weights for policy 0, policy_version 60574 (0.0012) [2025-01-04 01:44:43,605][134294] Updated weights for policy 0, policy_version 60584 (0.0013) [2025-01-04 01:44:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15974.5, 300 sec: 15439.8). Total num frames: 248156160. Throughput: 0: 3738.2. Samples: 51208842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:43,968][134211] Avg episode reward: [(0, '5.533')] [2025-01-04 01:44:45,574][134294] Updated weights for policy 0, policy_version 60594 (0.0014) [2025-01-04 01:44:48,763][134294] Updated weights for policy 0, policy_version 60604 (0.0026) [2025-01-04 01:44:48,968][134211] Fps is (10 sec: 18021.7, 60 sec: 15632.9, 300 sec: 15481.5). Total num frames: 248233984. Throughput: 0: 3862.5. Samples: 51224170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:48,969][134211] Avg episode reward: [(0, '5.787')] [2025-01-04 01:44:52,614][134294] Updated weights for policy 0, policy_version 60614 (0.0025) [2025-01-04 01:44:53,967][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 15426.0). Total num frames: 248287232. Throughput: 0: 3806.4. Samples: 51240770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:53,968][134211] Avg episode reward: [(0, '5.642')] [2025-01-04 01:44:55,200][134294] Updated weights for policy 0, policy_version 60624 (0.0016) [2025-01-04 01:44:57,254][134294] Updated weights for policy 0, policy_version 60634 (0.0013) [2025-01-04 01:44:58,967][134211] Fps is (10 sec: 15565.8, 60 sec: 15155.3, 300 sec: 15550.9). Total num frames: 248389632. Throughput: 0: 3951.7. Samples: 51267196. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:44:58,968][134211] Avg episode reward: [(0, '5.401')] [2025-01-04 01:44:59,212][134294] Updated weights for policy 0, policy_version 60644 (0.0014) [2025-01-04 01:45:01,099][134294] Updated weights for policy 0, policy_version 60654 (0.0014) [2025-01-04 01:45:03,158][134294] Updated weights for policy 0, policy_version 60664 (0.0016) [2025-01-04 01:45:03,968][134211] Fps is (10 sec: 20069.8, 60 sec: 15837.8, 300 sec: 15634.2). Total num frames: 248487936. Throughput: 0: 4086.4. Samples: 51283480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:03,968][134211] Avg episode reward: [(0, '6.533')] [2025-01-04 01:45:06,395][134294] Updated weights for policy 0, policy_version 60674 (0.0029) [2025-01-04 01:45:08,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15701.3, 300 sec: 15564.8). Total num frames: 248549376. Throughput: 0: 3990.1. Samples: 51305732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:08,968][134211] Avg episode reward: [(0, '6.061')] [2025-01-04 01:45:09,770][134294] Updated weights for policy 0, policy_version 60684 (0.0027) [2025-01-04 01:45:12,916][134294] Updated weights for policy 0, policy_version 60694 (0.0028) [2025-01-04 01:45:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15701.3, 300 sec: 15467.6). Total num frames: 248614912. Throughput: 0: 3766.1. Samples: 51324736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:13,968][134211] Avg episode reward: [(0, '6.175')] [2025-01-04 01:45:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000060697_248614912.pth... 
[2025-01-04 01:45:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000059802_244948992.pth [2025-01-04 01:45:15,912][134294] Updated weights for policy 0, policy_version 60704 (0.0024) [2025-01-04 01:45:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.0, 300 sec: 15370.4). Total num frames: 248680448. Throughput: 0: 3777.1. Samples: 51334780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:18,968][134211] Avg episode reward: [(0, '6.293')] [2025-01-04 01:45:18,986][134294] Updated weights for policy 0, policy_version 60714 (0.0026) [2025-01-04 01:45:22,009][134294] Updated weights for policy 0, policy_version 60724 (0.0023) [2025-01-04 01:45:23,968][134211] Fps is (10 sec: 13517.2, 60 sec: 15086.9, 300 sec: 15398.2). Total num frames: 248750080. Throughput: 0: 3848.2. Samples: 51355218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:23,968][134211] Avg episode reward: [(0, '6.332')] [2025-01-04 01:45:25,013][134294] Updated weights for policy 0, policy_version 60734 (0.0024) [2025-01-04 01:45:27,777][134294] Updated weights for policy 0, policy_version 60744 (0.0021) [2025-01-04 01:45:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14950.4, 300 sec: 15453.7). Total num frames: 248832000. Throughput: 0: 3742.9. Samples: 51377274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:28,968][134211] Avg episode reward: [(0, '6.626')] [2025-01-04 01:45:29,839][134294] Updated weights for policy 0, policy_version 60754 (0.0017) [2025-01-04 01:45:33,148][134294] Updated weights for policy 0, policy_version 60764 (0.0025) [2025-01-04 01:45:33,968][134211] Fps is (10 sec: 14744.7, 60 sec: 15018.5, 300 sec: 15356.5). Total num frames: 248897536. Throughput: 0: 3674.9. Samples: 51389540. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:33,969][134211] Avg episode reward: [(0, '6.471')] [2025-01-04 01:45:35,500][134294] Updated weights for policy 0, policy_version 60774 (0.0017) [2025-01-04 01:45:37,363][134294] Updated weights for policy 0, policy_version 60784 (0.0014) [2025-01-04 01:45:38,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15769.6, 300 sec: 15384.3). Total num frames: 248999936. Throughput: 0: 3872.1. Samples: 51415014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:38,968][134211] Avg episode reward: [(0, '6.530')] [2025-01-04 01:45:39,344][134294] Updated weights for policy 0, policy_version 60794 (0.0014) [2025-01-04 01:45:41,390][134294] Updated weights for policy 0, policy_version 60804 (0.0014) [2025-01-04 01:45:43,968][134211] Fps is (10 sec: 19252.0, 60 sec: 15564.7, 300 sec: 15481.5). Total num frames: 249090048. Throughput: 0: 3925.6. Samples: 51443848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:43,969][134211] Avg episode reward: [(0, '6.402')] [2025-01-04 01:45:44,130][134294] Updated weights for policy 0, policy_version 60814 (0.0027) [2025-01-04 01:45:47,816][134294] Updated weights for policy 0, policy_version 60824 (0.0029) [2025-01-04 01:45:48,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15223.5, 300 sec: 15467.6). Total num frames: 249147392. Throughput: 0: 3753.8. Samples: 51452400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:48,969][134211] Avg episode reward: [(0, '7.234')] [2025-01-04 01:45:51,106][134294] Updated weights for policy 0, policy_version 60834 (0.0026) [2025-01-04 01:45:53,969][134211] Fps is (10 sec: 11877.1, 60 sec: 15359.7, 300 sec: 15398.1). Total num frames: 249208832. Throughput: 0: 3660.4. Samples: 51470454. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:53,970][134211] Avg episode reward: [(0, '6.726')] [2025-01-04 01:45:54,421][134294] Updated weights for policy 0, policy_version 60844 (0.0023) [2025-01-04 01:45:56,430][134294] Updated weights for policy 0, policy_version 60854 (0.0013) [2025-01-04 01:45:58,352][134294] Updated weights for policy 0, policy_version 60864 (0.0014) [2025-01-04 01:45:58,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15360.0, 300 sec: 15523.2). Total num frames: 249311232. Throughput: 0: 3828.8. Samples: 51497028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:45:58,968][134211] Avg episode reward: [(0, '6.833')] [2025-01-04 01:46:00,227][134294] Updated weights for policy 0, policy_version 60874 (0.0013) [2025-01-04 01:46:02,178][134294] Updated weights for policy 0, policy_version 60884 (0.0014) [2025-01-04 01:46:03,968][134211] Fps is (10 sec: 19663.2, 60 sec: 15291.8, 300 sec: 15481.5). Total num frames: 249405440. Throughput: 0: 3967.1. Samples: 51513298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:46:03,968][134211] Avg episode reward: [(0, '6.125')] [2025-01-04 01:46:05,156][134294] Updated weights for policy 0, policy_version 60894 (0.0024) [2025-01-04 01:46:08,440][134294] Updated weights for policy 0, policy_version 60904 (0.0029) [2025-01-04 01:46:08,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15291.7, 300 sec: 15328.8). Total num frames: 249466880. Throughput: 0: 3997.8. Samples: 51535120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:46:08,968][134211] Avg episode reward: [(0, '5.262')] [2025-01-04 01:46:11,523][134294] Updated weights for policy 0, policy_version 60914 (0.0026) [2025-01-04 01:46:13,968][134211] Fps is (10 sec: 12697.0, 60 sec: 15291.7, 300 sec: 15273.2). Total num frames: 249532416. Throughput: 0: 3934.3. Samples: 51554320. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:13,969][134211] Avg episode reward: [(0, '6.116')] [2025-01-04 01:46:14,749][134294] Updated weights for policy 0, policy_version 60924 (0.0028) [2025-01-04 01:46:17,909][134294] Updated weights for policy 0, policy_version 60934 (0.0025) [2025-01-04 01:46:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 15287.1). Total num frames: 249597952. Throughput: 0: 3876.8. Samples: 51563992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:18,968][134211] Avg episode reward: [(0, '6.063')] [2025-01-04 01:46:20,987][134294] Updated weights for policy 0, policy_version 60944 (0.0026) [2025-01-04 01:46:23,763][134294] Updated weights for policy 0, policy_version 60954 (0.0022) [2025-01-04 01:46:23,968][134211] Fps is (10 sec: 13927.1, 60 sec: 15360.0, 300 sec: 15314.9). Total num frames: 249671680. Throughput: 0: 3755.5. Samples: 51584012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:23,968][134211] Avg episode reward: [(0, '5.630')] [2025-01-04 01:46:25,765][134294] Updated weights for policy 0, policy_version 60964 (0.0018) [2025-01-04 01:46:28,627][134294] Updated weights for policy 0, policy_version 60974 (0.0026) [2025-01-04 01:46:28,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15291.7, 300 sec: 15356.5). Total num frames: 249749504. Throughput: 0: 3667.9. Samples: 51608902. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:28,968][134211] Avg episode reward: [(0, '5.790')] [2025-01-04 01:46:31,779][134294] Updated weights for policy 0, policy_version 60984 (0.0023) [2025-01-04 01:46:33,751][134294] Updated weights for policy 0, policy_version 60994 (0.0014) [2025-01-04 01:46:33,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15633.2, 300 sec: 15412.1). Total num frames: 249835520. Throughput: 0: 3692.6. Samples: 51618568. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:33,968][134211] Avg episode reward: [(0, '6.573')] [2025-01-04 01:46:35,642][134294] Updated weights for policy 0, policy_version 61004 (0.0013) [2025-01-04 01:46:38,403][134294] Updated weights for policy 0, policy_version 61014 (0.0025) [2025-01-04 01:46:38,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15291.7, 300 sec: 15467.6). Total num frames: 249917440. Throughput: 0: 3930.4. Samples: 51647316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:38,968][134211] Avg episode reward: [(0, '6.375')] [2025-01-04 01:46:41,500][134294] Updated weights for policy 0, policy_version 61024 (0.0024) [2025-01-04 01:46:43,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14882.1, 300 sec: 15425.9). Total num frames: 249982976. Throughput: 0: 3779.5. Samples: 51667108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:46:43,968][134211] Avg episode reward: [(0, '8.439')] [2025-01-04 01:46:43,975][134264] Saving new best policy, reward=8.439! [2025-01-04 01:46:44,691][134294] Updated weights for policy 0, policy_version 61034 (0.0027) [2025-01-04 01:46:46,660][134294] Updated weights for policy 0, policy_version 61044 (0.0013) [2025-01-04 01:46:48,810][134294] Updated weights for policy 0, policy_version 61054 (0.0016) [2025-01-04 01:46:48,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15496.6, 300 sec: 15384.3). Total num frames: 250077184. Throughput: 0: 3702.2. Samples: 51679898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:46:48,968][134211] Avg episode reward: [(0, '6.391')] [2025-01-04 01:46:52,168][134294] Updated weights for policy 0, policy_version 61064 (0.0026) [2025-01-04 01:46:53,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15496.8, 300 sec: 15356.5). Total num frames: 250138624. Throughput: 0: 3725.1. Samples: 51702750. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:46:53,969][134211] Avg episode reward: [(0, '6.832')] [2025-01-04 01:46:55,323][134294] Updated weights for policy 0, policy_version 61074 (0.0022) [2025-01-04 01:46:57,251][134294] Updated weights for policy 0, policy_version 61084 (0.0012) [2025-01-04 01:46:58,967][134211] Fps is (10 sec: 15974.6, 60 sec: 15428.3, 300 sec: 15481.5). Total num frames: 250236928. Throughput: 0: 3869.7. Samples: 51728454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:46:58,968][134211] Avg episode reward: [(0, '6.335')] [2025-01-04 01:46:59,085][134294] Updated weights for policy 0, policy_version 61094 (0.0013) [2025-01-04 01:47:00,993][134294] Updated weights for policy 0, policy_version 61104 (0.0014) [2025-01-04 01:47:03,023][134294] Updated weights for policy 0, policy_version 61114 (0.0016) [2025-01-04 01:47:03,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15496.5, 300 sec: 15467.6). Total num frames: 250335232. Throughput: 0: 4017.4. Samples: 51744776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:47:03,969][134211] Avg episode reward: [(0, '6.176')] [2025-01-04 01:47:06,313][134294] Updated weights for policy 0, policy_version 61124 (0.0031) [2025-01-04 01:47:08,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15496.5, 300 sec: 15301.0). Total num frames: 250396672. Throughput: 0: 4063.8. Samples: 51766882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:47:08,968][134211] Avg episode reward: [(0, '7.431')] [2025-01-04 01:47:09,798][134294] Updated weights for policy 0, policy_version 61134 (0.0028) [2025-01-04 01:47:12,868][134294] Updated weights for policy 0, policy_version 61144 (0.0024) [2025-01-04 01:47:13,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15428.4, 300 sec: 15245.5). Total num frames: 250458112. Throughput: 0: 3926.8. Samples: 51785608. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:47:13,968][134211] Avg episode reward: [(0, '7.277')] [2025-01-04 01:47:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000061147_250458112.pth... [2025-01-04 01:47:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000060242_246751232.pth [2025-01-04 01:47:16,020][134294] Updated weights for policy 0, policy_version 61154 (0.0026) [2025-01-04 01:47:18,425][134294] Updated weights for policy 0, policy_version 61164 (0.0016) [2025-01-04 01:47:18,967][134211] Fps is (10 sec: 13926.7, 60 sec: 15633.1, 300 sec: 15287.1). Total num frames: 250535936. Throughput: 0: 3926.8. Samples: 51795272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:47:18,968][134211] Avg episode reward: [(0, '7.358')] [2025-01-04 01:47:20,810][134294] Updated weights for policy 0, policy_version 61174 (0.0018) [2025-01-04 01:47:23,914][134294] Updated weights for policy 0, policy_version 61184 (0.0028) [2025-01-04 01:47:23,969][134211] Fps is (10 sec: 15153.5, 60 sec: 15632.8, 300 sec: 15328.7). Total num frames: 250609664. Throughput: 0: 3837.7. Samples: 51820016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:47:23,969][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 01:47:26,884][134294] Updated weights for policy 0, policy_version 61194 (0.0025) [2025-01-04 01:47:28,968][134211] Fps is (10 sec: 13925.1, 60 sec: 15428.1, 300 sec: 15342.7). Total num frames: 250675200. Throughput: 0: 3837.6. Samples: 51839802. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:28,969][134211] Avg episode reward: [(0, '6.944')] [2025-01-04 01:47:29,917][134294] Updated weights for policy 0, policy_version 61204 (0.0023) [2025-01-04 01:47:31,791][134294] Updated weights for policy 0, policy_version 61214 (0.0013) [2025-01-04 01:47:33,725][134294] Updated weights for policy 0, policy_version 61224 (0.0015) [2025-01-04 01:47:33,967][134211] Fps is (10 sec: 16795.9, 60 sec: 15701.4, 300 sec: 15467.7). Total num frames: 250777600. Throughput: 0: 3843.3. Samples: 51852846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:33,968][134211] Avg episode reward: [(0, '7.406')] [2025-01-04 01:47:35,611][134294] Updated weights for policy 0, policy_version 61234 (0.0014) [2025-01-04 01:47:37,536][134294] Updated weights for policy 0, policy_version 61244 (0.0013) [2025-01-04 01:47:38,968][134211] Fps is (10 sec: 20891.3, 60 sec: 16111.0, 300 sec: 15578.7). Total num frames: 250884096. Throughput: 0: 4058.6. Samples: 51885386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:38,968][134211] Avg episode reward: [(0, '6.630')] [2025-01-04 01:47:39,547][134294] Updated weights for policy 0, policy_version 61254 (0.0015) [2025-01-04 01:47:42,537][134294] Updated weights for policy 0, policy_version 61264 (0.0027) [2025-01-04 01:47:43,968][134211] Fps is (10 sec: 17612.0, 60 sec: 16179.2, 300 sec: 15467.6). Total num frames: 250953728. Throughput: 0: 4018.1. Samples: 51909272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:43,969][134211] Avg episode reward: [(0, '6.922')] [2025-01-04 01:47:45,874][134294] Updated weights for policy 0, policy_version 61274 (0.0031) [2025-01-04 01:47:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15633.0, 300 sec: 15439.9). Total num frames: 251015168. Throughput: 0: 3869.3. Samples: 51918896. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:48,968][134211] Avg episode reward: [(0, '6.915')] [2025-01-04 01:47:49,106][134294] Updated weights for policy 0, policy_version 61284 (0.0025) [2025-01-04 01:47:52,178][134294] Updated weights for policy 0, policy_version 61294 (0.0022) [2025-01-04 01:47:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.3, 300 sec: 15453.7). Total num frames: 251080704. Throughput: 0: 3809.7. Samples: 51938318. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:53,968][134211] Avg episode reward: [(0, '7.166')] [2025-01-04 01:47:55,354][134294] Updated weights for policy 0, policy_version 61304 (0.0025) [2025-01-04 01:47:58,523][134294] Updated weights for policy 0, policy_version 61314 (0.0026) [2025-01-04 01:47:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15155.1, 300 sec: 15439.8). Total num frames: 251146240. Throughput: 0: 3819.3. Samples: 51957478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:47:58,969][134211] Avg episode reward: [(0, '6.902')] [2025-01-04 01:48:01,722][134294] Updated weights for policy 0, policy_version 61324 (0.0027) [2025-01-04 01:48:03,960][134294] Updated weights for policy 0, policy_version 61334 (0.0015) [2025-01-04 01:48:03,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14813.9, 300 sec: 15370.4). Total num frames: 251224064. Throughput: 0: 3819.9. Samples: 51967168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:03,968][134211] Avg episode reward: [(0, '6.914')] [2025-01-04 01:48:05,951][134294] Updated weights for policy 0, policy_version 61344 (0.0014) [2025-01-04 01:48:07,885][134294] Updated weights for policy 0, policy_version 61354 (0.0013) [2025-01-04 01:48:08,967][134211] Fps is (10 sec: 18023.3, 60 sec: 15496.6, 300 sec: 15509.3). Total num frames: 251326464. Throughput: 0: 3909.8. Samples: 51995950. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:08,968][134211] Avg episode reward: [(0, '6.996')] [2025-01-04 01:48:09,769][134294] Updated weights for policy 0, policy_version 61364 (0.0013) [2025-01-04 01:48:11,659][134294] Updated weights for policy 0, policy_version 61374 (0.0015) [2025-01-04 01:48:13,968][134211] Fps is (10 sec: 19660.4, 60 sec: 16042.7, 300 sec: 15481.5). Total num frames: 251420672. Throughput: 0: 4134.4. Samples: 52025848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:13,968][134211] Avg episode reward: [(0, '7.963')] [2025-01-04 01:48:14,738][134294] Updated weights for policy 0, policy_version 61384 (0.0026) [2025-01-04 01:48:17,763][134294] Updated weights for policy 0, policy_version 61394 (0.0027) [2025-01-04 01:48:18,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15769.5, 300 sec: 15370.4). Total num frames: 251482112. Throughput: 0: 4058.0. Samples: 52035456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:18,968][134211] Avg episode reward: [(0, '8.247')] [2025-01-04 01:48:21,024][134294] Updated weights for policy 0, policy_version 61404 (0.0026) [2025-01-04 01:48:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15633.4, 300 sec: 15384.3). Total num frames: 251547648. Throughput: 0: 3762.9. Samples: 52054716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:23,968][134211] Avg episode reward: [(0, '7.738')] [2025-01-04 01:48:24,259][134294] Updated weights for policy 0, policy_version 61414 (0.0027) [2025-01-04 01:48:27,248][134294] Updated weights for policy 0, policy_version 61424 (0.0025) [2025-01-04 01:48:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15633.3, 300 sec: 15384.3). Total num frames: 251613184. Throughput: 0: 3664.1. Samples: 52074154. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:28,969][134211] Avg episode reward: [(0, '7.431')] [2025-01-04 01:48:30,302][134294] Updated weights for policy 0, policy_version 61434 (0.0026) [2025-01-04 01:48:33,381][134294] Updated weights for policy 0, policy_version 61444 (0.0027) [2025-01-04 01:48:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 15412.1). Total num frames: 251682816. Throughput: 0: 3679.8. Samples: 52084488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:33,968][134211] Avg episode reward: [(0, '6.509')] [2025-01-04 01:48:35,456][134294] Updated weights for policy 0, policy_version 61454 (0.0013) [2025-01-04 01:48:37,344][134294] Updated weights for policy 0, policy_version 61464 (0.0012) [2025-01-04 01:48:38,967][134211] Fps is (10 sec: 17613.1, 60 sec: 15087.0, 300 sec: 15564.8). Total num frames: 251789312. Throughput: 0: 3850.3. Samples: 52111582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:38,968][134211] Avg episode reward: [(0, '7.262')] [2025-01-04 01:48:39,196][134294] Updated weights for policy 0, policy_version 61474 (0.0014) [2025-01-04 01:48:41,157][134294] Updated weights for policy 0, policy_version 61484 (0.0015) [2025-01-04 01:48:43,053][134294] Updated weights for policy 0, policy_version 61494 (0.0015) [2025-01-04 01:48:43,968][134211] Fps is (10 sec: 21299.4, 60 sec: 15701.4, 300 sec: 15592.6). Total num frames: 251895808. Throughput: 0: 4142.7. Samples: 52143896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:43,968][134211] Avg episode reward: [(0, '6.601')] [2025-01-04 01:48:45,056][134294] Updated weights for policy 0, policy_version 61504 (0.0013) [2025-01-04 01:48:48,229][134294] Updated weights for policy 0, policy_version 61514 (0.0028) [2025-01-04 01:48:48,968][134211] Fps is (10 sec: 17612.2, 60 sec: 15837.9, 300 sec: 15467.6). Total num frames: 251965440. Throughput: 0: 4229.5. Samples: 52157498. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:48,969][134211] Avg episode reward: [(0, '7.690')] [2025-01-04 01:48:51,999][134294] Updated weights for policy 0, policy_version 61524 (0.0030) [2025-01-04 01:48:53,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15701.3, 300 sec: 15398.2). Total num frames: 252022784. Throughput: 0: 3959.5. Samples: 52174130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:53,969][134211] Avg episode reward: [(0, '6.264')] [2025-01-04 01:48:55,615][134294] Updated weights for policy 0, policy_version 61534 (0.0028) [2025-01-04 01:48:58,969][134211] Fps is (10 sec: 11467.8, 60 sec: 15564.6, 300 sec: 15398.1). Total num frames: 252080128. Throughput: 0: 3690.3. Samples: 52191916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:48:58,969][134211] Avg episode reward: [(0, '6.751')] [2025-01-04 01:48:59,000][134294] Updated weights for policy 0, policy_version 61544 (0.0027) [2025-01-04 01:49:02,428][134294] Updated weights for policy 0, policy_version 61554 (0.0026) [2025-01-04 01:49:03,968][134211] Fps is (10 sec: 12287.6, 60 sec: 15359.9, 300 sec: 15384.3). Total num frames: 252145664. Throughput: 0: 3671.6. Samples: 52200678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:03,969][134211] Avg episode reward: [(0, '7.053')] [2025-01-04 01:49:05,512][134294] Updated weights for policy 0, policy_version 61564 (0.0026) [2025-01-04 01:49:08,300][134294] Updated weights for policy 0, policy_version 61574 (0.0026) [2025-01-04 01:49:08,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14745.4, 300 sec: 15384.3). Total num frames: 252211200. Throughput: 0: 3696.4. Samples: 52221054. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:08,969][134211] Avg episode reward: [(0, '6.710')] [2025-01-04 01:49:11,080][134294] Updated weights for policy 0, policy_version 61584 (0.0021) [2025-01-04 01:49:12,962][134294] Updated weights for policy 0, policy_version 61594 (0.0013) [2025-01-04 01:49:13,967][134211] Fps is (10 sec: 16385.0, 60 sec: 14813.9, 300 sec: 15481.5). Total num frames: 252309504. Throughput: 0: 3833.7. Samples: 52246668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:13,968][134211] Avg episode reward: [(0, '6.612')] [2025-01-04 01:49:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000061599_252309504.pth... [2025-01-04 01:49:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000060697_248614912.pth [2025-01-04 01:49:14,881][134294] Updated weights for policy 0, policy_version 61604 (0.0012) [2025-01-04 01:49:16,788][134294] Updated weights for policy 0, policy_version 61614 (0.0013) [2025-01-04 01:49:18,647][134294] Updated weights for policy 0, policy_version 61624 (0.0014) [2025-01-04 01:49:18,967][134211] Fps is (10 sec: 20481.4, 60 sec: 15564.9, 300 sec: 15495.4). Total num frames: 252416000. Throughput: 0: 3964.7. Samples: 52262898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:18,968][134211] Avg episode reward: [(0, '6.809')] [2025-01-04 01:49:20,668][134294] Updated weights for policy 0, policy_version 61634 (0.0016) [2025-01-04 01:49:23,660][134294] Updated weights for policy 0, policy_version 61644 (0.0026) [2025-01-04 01:49:23,968][134211] Fps is (10 sec: 18431.3, 60 sec: 15769.5, 300 sec: 15453.7). Total num frames: 252493824. Throughput: 0: 4005.1. Samples: 52291814. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:23,969][134211] Avg episode reward: [(0, '7.480')] [2025-01-04 01:49:26,896][134294] Updated weights for policy 0, policy_version 61654 (0.0029) [2025-01-04 01:49:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15769.6, 300 sec: 15467.6). Total num frames: 252559360. Throughput: 0: 3711.2. Samples: 52310902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:28,968][134211] Avg episode reward: [(0, '6.481')] [2025-01-04 01:49:30,195][134294] Updated weights for policy 0, policy_version 61664 (0.0027) [2025-01-04 01:49:33,364][134294] Updated weights for policy 0, policy_version 61674 (0.0024) [2025-01-04 01:49:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15701.3, 300 sec: 15495.4). Total num frames: 252624896. Throughput: 0: 3623.2. Samples: 52320540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:33,969][134211] Avg episode reward: [(0, '7.147')] [2025-01-04 01:49:36,463][134294] Updated weights for policy 0, policy_version 61684 (0.0026) [2025-01-04 01:49:38,969][134211] Fps is (10 sec: 13105.8, 60 sec: 15018.3, 300 sec: 15370.4). Total num frames: 252690432. Throughput: 0: 3687.5. Samples: 52340070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:38,969][134211] Avg episode reward: [(0, '6.500')] [2025-01-04 01:49:39,558][134294] Updated weights for policy 0, policy_version 61694 (0.0027) [2025-01-04 01:49:42,440][134294] Updated weights for policy 0, policy_version 61704 (0.0024) [2025-01-04 01:49:43,967][134211] Fps is (10 sec: 14336.4, 60 sec: 14540.8, 300 sec: 15370.4). Total num frames: 252768256. Throughput: 0: 3782.2. Samples: 52362110. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:43,968][134211] Avg episode reward: [(0, '6.807')] [2025-01-04 01:49:44,417][134294] Updated weights for policy 0, policy_version 61714 (0.0015) [2025-01-04 01:49:47,258][134294] Updated weights for policy 0, policy_version 61724 (0.0024) [2025-01-04 01:49:48,968][134211] Fps is (10 sec: 15157.0, 60 sec: 14609.1, 300 sec: 15439.8). Total num frames: 252841984. Throughput: 0: 3867.2. Samples: 52374700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:48,968][134211] Avg episode reward: [(0, '7.372')] [2025-01-04 01:49:50,006][134294] Updated weights for policy 0, policy_version 61734 (0.0022) [2025-01-04 01:49:52,071][134294] Updated weights for policy 0, policy_version 61744 (0.0017) [2025-01-04 01:49:53,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15018.7, 300 sec: 15370.4). Total num frames: 252923904. Throughput: 0: 3959.5. Samples: 52399230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:53,968][134211] Avg episode reward: [(0, '6.962')] [2025-01-04 01:49:55,450][134294] Updated weights for policy 0, policy_version 61754 (0.0024) [2025-01-04 01:49:57,799][134294] Updated weights for policy 0, policy_version 61764 (0.0016) [2025-01-04 01:49:58,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15428.6, 300 sec: 15314.9). Total num frames: 253005824. Throughput: 0: 3885.8. Samples: 52421528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:49:58,968][134211] Avg episode reward: [(0, '6.253')] [2025-01-04 01:49:59,830][134294] Updated weights for policy 0, policy_version 61774 (0.0015) [2025-01-04 01:50:01,747][134294] Updated weights for policy 0, policy_version 61784 (0.0012) [2025-01-04 01:50:03,697][134294] Updated weights for policy 0, policy_version 61794 (0.0015) [2025-01-04 01:50:03,968][134211] Fps is (10 sec: 18431.9, 60 sec: 16042.8, 300 sec: 15453.7). Total num frames: 253108224. Throughput: 0: 3866.6. Samples: 52436896. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:03,968][134211] Avg episode reward: [(0, '7.357')] [2025-01-04 01:50:06,628][134294] Updated weights for policy 0, policy_version 61804 (0.0027) [2025-01-04 01:50:08,968][134211] Fps is (10 sec: 17202.8, 60 sec: 16111.0, 300 sec: 15467.6). Total num frames: 253177856. Throughput: 0: 3786.3. Samples: 52462196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:08,968][134211] Avg episode reward: [(0, '7.578')] [2025-01-04 01:50:09,924][134294] Updated weights for policy 0, policy_version 61814 (0.0026) [2025-01-04 01:50:13,168][134294] Updated weights for policy 0, policy_version 61824 (0.0027) [2025-01-04 01:50:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.5, 300 sec: 15453.7). Total num frames: 253239296. Throughput: 0: 3787.7. Samples: 52481348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:13,968][134211] Avg episode reward: [(0, '7.148')] [2025-01-04 01:50:16,015][134294] Updated weights for policy 0, policy_version 61834 (0.0023) [2025-01-04 01:50:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14882.1, 300 sec: 15453.7). Total num frames: 253308928. Throughput: 0: 3803.2. Samples: 52491682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:18,968][134211] Avg episode reward: [(0, '8.034')] [2025-01-04 01:50:19,179][134294] Updated weights for policy 0, policy_version 61844 (0.0029) [2025-01-04 01:50:21,823][134294] Updated weights for policy 0, policy_version 61854 (0.0018) [2025-01-04 01:50:23,738][134294] Updated weights for policy 0, policy_version 61864 (0.0013) [2025-01-04 01:50:23,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15087.0, 300 sec: 15481.5). Total num frames: 253399040. Throughput: 0: 3857.5. Samples: 52513652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:23,968][134211] Avg episode reward: [(0, '7.342')] [2025-01-04 01:50:25,652][134294] Updated weights for policy 0, policy_version 61874 (0.0013) [2025-01-04 01:50:27,510][134294] Updated weights for policy 0, policy_version 61884 (0.0012) [2025-01-04 01:50:28,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15769.6, 300 sec: 15620.4). Total num frames: 253505536. Throughput: 0: 4088.2. Samples: 52546080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:28,968][134211] Avg episode reward: [(0, '6.322')] [2025-01-04 01:50:29,826][134294] Updated weights for policy 0, policy_version 61894 (0.0020) [2025-01-04 01:50:32,916][134294] Updated weights for policy 0, policy_version 61904 (0.0025) [2025-01-04 01:50:33,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15769.6, 300 sec: 15495.4). Total num frames: 253571072. Throughput: 0: 4052.4. Samples: 52557060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:50:33,969][134211] Avg episode reward: [(0, '6.911')] [2025-01-04 01:50:36,014][134294] Updated weights for policy 0, policy_version 61914 (0.0025) [2025-01-04 01:50:38,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15769.8, 300 sec: 15412.1). Total num frames: 253636608. Throughput: 0: 3947.2. Samples: 52576856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:50:38,969][134211] Avg episode reward: [(0, '7.048')] [2025-01-04 01:50:39,314][134294] Updated weights for policy 0, policy_version 61924 (0.0025) [2025-01-04 01:50:42,362][134294] Updated weights for policy 0, policy_version 61934 (0.0026) [2025-01-04 01:50:43,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15564.6, 300 sec: 15439.8). Total num frames: 253702144. Throughput: 0: 3883.6. Samples: 52596294. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:50:43,969][134211] Avg episode reward: [(0, '6.360')] [2025-01-04 01:50:45,327][134294] Updated weights for policy 0, policy_version 61944 (0.0022) [2025-01-04 01:50:48,345][134294] Updated weights for policy 0, policy_version 61954 (0.0023) [2025-01-04 01:50:48,968][134211] Fps is (10 sec: 13517.3, 60 sec: 15496.5, 300 sec: 15467.7). Total num frames: 253771776. Throughput: 0: 3772.9. Samples: 52606678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:50:48,968][134211] Avg episode reward: [(0, '7.065')] [2025-01-04 01:50:51,217][134294] Updated weights for policy 0, policy_version 61964 (0.0022) [2025-01-04 01:50:53,314][134294] Updated weights for policy 0, policy_version 61974 (0.0012) [2025-01-04 01:50:53,968][134211] Fps is (10 sec: 15565.7, 60 sec: 15564.8, 300 sec: 15412.1). Total num frames: 253857792. Throughput: 0: 3706.7. Samples: 52628996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:50:53,968][134211] Avg episode reward: [(0, '6.602')] [2025-01-04 01:50:55,377][134294] Updated weights for policy 0, policy_version 61984 (0.0013) [2025-01-04 01:50:57,293][134294] Updated weights for policy 0, policy_version 61994 (0.0014) [2025-01-04 01:50:58,968][134211] Fps is (10 sec: 18841.9, 60 sec: 15906.1, 300 sec: 15439.8). Total num frames: 253960192. Throughput: 0: 3969.8. Samples: 52659986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:50:58,968][134211] Avg episode reward: [(0, '7.145')] [2025-01-04 01:50:59,156][134294] Updated weights for policy 0, policy_version 62004 (0.0013) [2025-01-04 01:51:01,030][134294] Updated weights for policy 0, policy_version 62014 (0.0015) [2025-01-04 01:51:03,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15633.0, 300 sec: 15523.1). Total num frames: 254046208. Throughput: 0: 4099.1. Samples: 52676142. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:51:03,969][134211] Avg episode reward: [(0, '7.050')] [2025-01-04 01:51:04,033][134294] Updated weights for policy 0, policy_version 62024 (0.0024) [2025-01-04 01:51:07,295][134294] Updated weights for policy 0, policy_version 62034 (0.0028) [2025-01-04 01:51:08,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15564.8, 300 sec: 15523.2). Total num frames: 254111744. Throughput: 0: 4042.4. Samples: 52695560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:51:08,968][134211] Avg episode reward: [(0, '7.355')] [2025-01-04 01:51:10,543][134294] Updated weights for policy 0, policy_version 62044 (0.0029) [2025-01-04 01:51:13,466][134294] Updated weights for policy 0, policy_version 62054 (0.0025) [2025-01-04 01:51:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15633.0, 300 sec: 15523.1). Total num frames: 254177280. Throughput: 0: 3760.8. Samples: 52715318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:13,969][134211] Avg episode reward: [(0, '6.919')] [2025-01-04 01:51:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062055_254177280.pth... [2025-01-04 01:51:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000061147_250458112.pth [2025-01-04 01:51:16,685][134294] Updated weights for policy 0, policy_version 62064 (0.0025) [2025-01-04 01:51:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15564.8, 300 sec: 15495.4). Total num frames: 254242816. Throughput: 0: 3732.7. Samples: 52725030. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:18,968][134211] Avg episode reward: [(0, '6.369')] [2025-01-04 01:51:19,987][134294] Updated weights for policy 0, policy_version 62074 (0.0027) [2025-01-04 01:51:23,056][134294] Updated weights for policy 0, policy_version 62084 (0.0025) [2025-01-04 01:51:23,968][134211] Fps is (10 sec: 13517.2, 60 sec: 15223.4, 300 sec: 15467.6). Total num frames: 254312448. Throughput: 0: 3719.9. Samples: 52744252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:23,968][134211] Avg episode reward: [(0, '6.658')] [2025-01-04 01:51:25,100][134294] Updated weights for policy 0, policy_version 62094 (0.0013) [2025-01-04 01:51:27,854][134294] Updated weights for policy 0, policy_version 62104 (0.0024) [2025-01-04 01:51:28,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 15439.8). Total num frames: 254390272. Throughput: 0: 3829.4. Samples: 52768616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:28,968][134211] Avg episode reward: [(0, '5.898')] [2025-01-04 01:51:30,910][134294] Updated weights for policy 0, policy_version 62114 (0.0025) [2025-01-04 01:51:33,340][134294] Updated weights for policy 0, policy_version 62124 (0.0017) [2025-01-04 01:51:33,968][134211] Fps is (10 sec: 15973.0, 60 sec: 15018.5, 300 sec: 15439.8). Total num frames: 254472192. Throughput: 0: 3824.4. Samples: 52778780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:33,969][134211] Avg episode reward: [(0, '5.798')] [2025-01-04 01:51:35,679][134294] Updated weights for policy 0, policy_version 62134 (0.0020) [2025-01-04 01:51:38,742][134294] Updated weights for policy 0, policy_version 62144 (0.0026) [2025-01-04 01:51:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15087.0, 300 sec: 15453.7). Total num frames: 254541824. Throughput: 0: 3882.1. Samples: 52803690. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:38,968][134211] Avg episode reward: [(0, '7.020')] [2025-01-04 01:51:41,233][134294] Updated weights for policy 0, policy_version 62154 (0.0021) [2025-01-04 01:51:43,160][134294] Updated weights for policy 0, policy_version 62164 (0.0014) [2025-01-04 01:51:43,967][134211] Fps is (10 sec: 16795.3, 60 sec: 15633.2, 300 sec: 15467.6). Total num frames: 254640128. Throughput: 0: 3764.0. Samples: 52829368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:43,968][134211] Avg episode reward: [(0, '6.803')] [2025-01-04 01:51:45,024][134294] Updated weights for policy 0, policy_version 62174 (0.0012) [2025-01-04 01:51:46,898][134294] Updated weights for policy 0, policy_version 62184 (0.0013) [2025-01-04 01:51:48,802][134294] Updated weights for policy 0, policy_version 62194 (0.0013) [2025-01-04 01:51:48,968][134211] Fps is (10 sec: 20889.8, 60 sec: 16315.8, 300 sec: 15634.2). Total num frames: 254750720. Throughput: 0: 3765.7. Samples: 52845596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:51:48,968][134211] Avg episode reward: [(0, '7.239')] [2025-01-04 01:51:51,213][134294] Updated weights for policy 0, policy_version 62204 (0.0017) [2025-01-04 01:51:53,968][134211] Fps is (10 sec: 18021.3, 60 sec: 16042.5, 300 sec: 15537.0). Total num frames: 254820352. Throughput: 0: 3945.2. Samples: 52873094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:51:53,969][134211] Avg episode reward: [(0, '7.020')] [2025-01-04 01:51:54,648][134294] Updated weights for policy 0, policy_version 62214 (0.0025) [2025-01-04 01:51:58,318][134294] Updated weights for policy 0, policy_version 62224 (0.0029) [2025-01-04 01:51:58,968][134211] Fps is (10 sec: 12287.7, 60 sec: 15223.4, 300 sec: 15384.3). Total num frames: 254873600. Throughput: 0: 3887.2. Samples: 52890242. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:51:58,968][134211] Avg episode reward: [(0, '6.591')] [2025-01-04 01:52:01,753][134294] Updated weights for policy 0, policy_version 62234 (0.0024) [2025-01-04 01:52:03,968][134211] Fps is (10 sec: 11469.3, 60 sec: 14813.9, 300 sec: 15384.3). Total num frames: 254935040. Throughput: 0: 3865.4. Samples: 52898972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:52:03,968][134211] Avg episode reward: [(0, '7.286')] [2025-01-04 01:52:05,264][134294] Updated weights for policy 0, policy_version 62244 (0.0026) [2025-01-04 01:52:07,882][134294] Updated weights for policy 0, policy_version 62254 (0.0017) [2025-01-04 01:52:08,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15018.7, 300 sec: 15439.8). Total num frames: 255012864. Throughput: 0: 3847.1. Samples: 52917370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:52:08,968][134211] Avg episode reward: [(0, '7.004')] [2025-01-04 01:52:09,867][134294] Updated weights for policy 0, policy_version 62264 (0.0013) [2025-01-04 01:52:11,764][134294] Updated weights for policy 0, policy_version 62274 (0.0014) [2025-01-04 01:52:13,859][134294] Updated weights for policy 0, policy_version 62284 (0.0016) [2025-01-04 01:52:13,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15633.1, 300 sec: 15523.1). Total num frames: 255115264. Throughput: 0: 4020.4. Samples: 52949532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:52:13,968][134211] Avg episode reward: [(0, '7.082')] [2025-01-04 01:52:16,927][134294] Updated weights for policy 0, policy_version 62294 (0.0032) [2025-01-04 01:52:18,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15564.8, 300 sec: 15481.6). Total num frames: 255176704. Throughput: 0: 4017.8. Samples: 52959578. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:52:18,968][134211] Avg episode reward: [(0, '6.684')] [2025-01-04 01:52:20,284][134294] Updated weights for policy 0, policy_version 62304 (0.0026) [2025-01-04 01:52:23,763][134294] Updated weights for policy 0, policy_version 62314 (0.0029) [2025-01-04 01:52:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15428.2, 300 sec: 15467.6). Total num frames: 255238144. Throughput: 0: 3875.3. Samples: 52978078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:52:23,969][134211] Avg episode reward: [(0, '6.681')] [2025-01-04 01:52:26,733][134294] Updated weights for policy 0, policy_version 62324 (0.0023) [2025-01-04 01:52:28,663][134294] Updated weights for policy 0, policy_version 62334 (0.0013) [2025-01-04 01:52:28,967][134211] Fps is (10 sec: 14745.7, 60 sec: 15564.9, 300 sec: 15412.1). Total num frames: 255324160. Throughput: 0: 3809.5. Samples: 53000796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:28,968][134211] Avg episode reward: [(0, '7.041')] [2025-01-04 01:52:30,523][134294] Updated weights for policy 0, policy_version 62344 (0.0013) [2025-01-04 01:52:32,409][134294] Updated weights for policy 0, policy_version 62354 (0.0013) [2025-01-04 01:52:33,968][134211] Fps is (10 sec: 19661.2, 60 sec: 16042.9, 300 sec: 15426.0). Total num frames: 255434752. Throughput: 0: 3811.0. Samples: 53017092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:33,968][134211] Avg episode reward: [(0, '7.050')] [2025-01-04 01:52:34,450][134294] Updated weights for policy 0, policy_version 62364 (0.0017) [2025-01-04 01:52:37,572][134294] Updated weights for policy 0, policy_version 62374 (0.0028) [2025-01-04 01:52:38,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15974.4, 300 sec: 15412.1). Total num frames: 255500288. Throughput: 0: 3775.7. Samples: 53042998. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:38,968][134211] Avg episode reward: [(0, '6.721')] [2025-01-04 01:52:40,679][134294] Updated weights for policy 0, policy_version 62384 (0.0027) [2025-01-04 01:52:43,689][134294] Updated weights for policy 0, policy_version 62394 (0.0026) [2025-01-04 01:52:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15428.2, 300 sec: 15426.0). Total num frames: 255565824. Throughput: 0: 3837.3. Samples: 53062922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:43,968][134211] Avg episode reward: [(0, '7.071')] [2025-01-04 01:52:46,770][134294] Updated weights for policy 0, policy_version 62404 (0.0025) [2025-01-04 01:52:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 15426.0). Total num frames: 255631360. Throughput: 0: 3863.0. Samples: 53072806. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:48,969][134211] Avg episode reward: [(0, '6.790')] [2025-01-04 01:52:49,952][134294] Updated weights for policy 0, policy_version 62414 (0.0026) [2025-01-04 01:52:53,297][134294] Updated weights for policy 0, policy_version 62424 (0.0026) [2025-01-04 01:52:53,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14609.1, 300 sec: 15425.9). Total num frames: 255696896. Throughput: 0: 3881.3. Samples: 53092032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:53,969][134211] Avg episode reward: [(0, '7.099')] [2025-01-04 01:52:55,762][134294] Updated weights for policy 0, policy_version 62434 (0.0014) [2025-01-04 01:52:57,908][134294] Updated weights for policy 0, policy_version 62444 (0.0014) [2025-01-04 01:52:58,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15155.2, 300 sec: 15453.7). Total num frames: 255782912. Throughput: 0: 3718.2. Samples: 53116852. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:52:58,968][134211] Avg episode reward: [(0, '7.164')] [2025-01-04 01:53:00,858][134294] Updated weights for policy 0, policy_version 62454 (0.0025) [2025-01-04 01:53:02,852][134294] Updated weights for policy 0, policy_version 62464 (0.0013) [2025-01-04 01:53:03,968][134211] Fps is (10 sec: 17203.8, 60 sec: 15564.8, 300 sec: 15398.2). Total num frames: 255868928. Throughput: 0: 3731.5. Samples: 53127496. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:53:03,969][134211] Avg episode reward: [(0, '7.075')] [2025-01-04 01:53:05,720][134294] Updated weights for policy 0, policy_version 62474 (0.0026) [2025-01-04 01:53:08,730][134294] Updated weights for policy 0, policy_version 62484 (0.0030) [2025-01-04 01:53:08,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15360.0, 300 sec: 15301.0). Total num frames: 255934464. Throughput: 0: 3853.6. Samples: 53151490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:08,968][134211] Avg episode reward: [(0, '7.018')] [2025-01-04 01:53:11,555][134294] Updated weights for policy 0, policy_version 62494 (0.0024) [2025-01-04 01:53:13,487][134294] Updated weights for policy 0, policy_version 62504 (0.0014) [2025-01-04 01:53:13,968][134211] Fps is (10 sec: 15565.2, 60 sec: 15155.2, 300 sec: 15398.2). Total num frames: 256024576. Throughput: 0: 3885.9. Samples: 53175664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:13,968][134211] Avg episode reward: [(0, '7.280')] [2025-01-04 01:53:14,025][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062507_256028672.pth... 
[2025-01-04 01:53:14,076][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000061599_252309504.pth [2025-01-04 01:53:15,394][134294] Updated weights for policy 0, policy_version 62514 (0.0013) [2025-01-04 01:53:17,297][134294] Updated weights for policy 0, policy_version 62524 (0.0013) [2025-01-04 01:53:18,968][134211] Fps is (10 sec: 19250.3, 60 sec: 15837.7, 300 sec: 15523.1). Total num frames: 256126976. Throughput: 0: 3882.4. Samples: 53191802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:18,969][134211] Avg episode reward: [(0, '7.324')] [2025-01-04 01:53:19,601][134294] Updated weights for policy 0, policy_version 62534 (0.0020) [2025-01-04 01:53:22,794][134294] Updated weights for policy 0, policy_version 62544 (0.0027) [2025-01-04 01:53:23,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15906.2, 300 sec: 15523.1). Total num frames: 256192512. Throughput: 0: 3850.8. Samples: 53216286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:23,968][134211] Avg episode reward: [(0, '6.760')] [2025-01-04 01:53:25,779][134294] Updated weights for policy 0, policy_version 62554 (0.0030) [2025-01-04 01:53:28,824][134294] Updated weights for policy 0, policy_version 62564 (0.0025) [2025-01-04 01:53:28,968][134211] Fps is (10 sec: 13517.5, 60 sec: 15633.0, 300 sec: 15523.1). Total num frames: 256262144. Throughput: 0: 3859.3. Samples: 53236592. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:28,968][134211] Avg episode reward: [(0, '7.397')] [2025-01-04 01:53:31,979][134294] Updated weights for policy 0, policy_version 62574 (0.0024) [2025-01-04 01:53:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.1, 300 sec: 15384.3). Total num frames: 256327680. Throughput: 0: 3852.6. Samples: 53246172. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:33,968][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 01:53:35,149][134294] Updated weights for policy 0, policy_version 62584 (0.0024) [2025-01-04 01:53:38,083][134294] Updated weights for policy 0, policy_version 62594 (0.0027) [2025-01-04 01:53:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14882.1, 300 sec: 15245.4). Total num frames: 256393216. Throughput: 0: 3877.0. Samples: 53266498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:38,969][134211] Avg episode reward: [(0, '6.973')] [2025-01-04 01:53:41,084][134294] Updated weights for policy 0, policy_version 62604 (0.0024) [2025-01-04 01:53:43,677][134294] Updated weights for policy 0, policy_version 62614 (0.0018) [2025-01-04 01:53:43,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15086.9, 300 sec: 15273.2). Total num frames: 256471040. Throughput: 0: 3793.8. Samples: 53287572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:53:43,968][134211] Avg episode reward: [(0, '7.033')] [2025-01-04 01:53:45,702][134294] Updated weights for policy 0, policy_version 62624 (0.0012) [2025-01-04 01:53:47,579][134294] Updated weights for policy 0, policy_version 62634 (0.0015) [2025-01-04 01:53:48,968][134211] Fps is (10 sec: 18432.6, 60 sec: 15769.7, 300 sec: 15439.9). Total num frames: 256577536. Throughput: 0: 3896.5. Samples: 53302838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:53:48,968][134211] Avg episode reward: [(0, '7.253')] [2025-01-04 01:53:49,489][134294] Updated weights for policy 0, policy_version 62644 (0.0013) [2025-01-04 01:53:51,874][134294] Updated weights for policy 0, policy_version 62654 (0.0019) [2025-01-04 01:53:53,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15974.5, 300 sec: 15509.3). Total num frames: 256655360. Throughput: 0: 4008.6. Samples: 53331876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:53:53,969][134211] Avg episode reward: [(0, '7.384')] [2025-01-04 01:53:55,345][134294] Updated weights for policy 0, policy_version 62664 (0.0030) [2025-01-04 01:53:58,411][134294] Updated weights for policy 0, policy_version 62674 (0.0030) [2025-01-04 01:53:58,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15564.7, 300 sec: 15495.4). Total num frames: 256716800. Throughput: 0: 3885.2. Samples: 53350498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:53:58,969][134211] Avg episode reward: [(0, '6.900')] [2025-01-04 01:54:01,566][134294] Updated weights for policy 0, policy_version 62684 (0.0028) [2025-01-04 01:54:03,967][134211] Fps is (10 sec: 13517.2, 60 sec: 15360.1, 300 sec: 15523.2). Total num frames: 256790528. Throughput: 0: 3744.9. Samples: 53360322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:03,968][134211] Avg episode reward: [(0, '7.317')] [2025-01-04 01:54:04,124][134294] Updated weights for policy 0, policy_version 62694 (0.0018) [2025-01-04 01:54:06,037][134294] Updated weights for policy 0, policy_version 62704 (0.0014) [2025-01-04 01:54:07,911][134294] Updated weights for policy 0, policy_version 62714 (0.0015) [2025-01-04 01:54:08,968][134211] Fps is (10 sec: 18023.0, 60 sec: 16042.7, 300 sec: 15550.9). Total num frames: 256897024. Throughput: 0: 3824.0. Samples: 53388366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:08,968][134211] Avg episode reward: [(0, '6.710')] [2025-01-04 01:54:09,803][134294] Updated weights for policy 0, policy_version 62724 (0.0013) [2025-01-04 01:54:12,536][134294] Updated weights for policy 0, policy_version 62734 (0.0023) [2025-01-04 01:54:13,968][134211] Fps is (10 sec: 18431.4, 60 sec: 15837.8, 300 sec: 15453.7). Total num frames: 256974848. Throughput: 0: 3954.4. Samples: 53414542. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:13,969][134211] Avg episode reward: [(0, '6.626')] [2025-01-04 01:54:15,789][134294] Updated weights for policy 0, policy_version 62744 (0.0026) [2025-01-04 01:54:18,819][134294] Updated weights for policy 0, policy_version 62754 (0.0024) [2025-01-04 01:54:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15223.6, 300 sec: 15412.1). Total num frames: 257040384. Throughput: 0: 3959.8. Samples: 53424364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:18,968][134211] Avg episode reward: [(0, '6.512')] [2025-01-04 01:54:21,983][134294] Updated weights for policy 0, policy_version 62764 (0.0025) [2025-01-04 01:54:23,968][134211] Fps is (10 sec: 12697.1, 60 sec: 15155.1, 300 sec: 15398.2). Total num frames: 257101824. Throughput: 0: 3940.7. Samples: 53443830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:23,969][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 01:54:25,295][134294] Updated weights for policy 0, policy_version 62774 (0.0023) [2025-01-04 01:54:28,013][134294] Updated weights for policy 0, policy_version 62784 (0.0020) [2025-01-04 01:54:28,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15291.7, 300 sec: 15439.8). Total num frames: 257179648. Throughput: 0: 3937.9. Samples: 53464776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:28,968][134211] Avg episode reward: [(0, '6.906')] [2025-01-04 01:54:30,000][134294] Updated weights for policy 0, policy_version 62794 (0.0015) [2025-01-04 01:54:31,940][134294] Updated weights for policy 0, policy_version 62804 (0.0013) [2025-01-04 01:54:33,968][134211] Fps is (10 sec: 17204.1, 60 sec: 15769.6, 300 sec: 15537.1). Total num frames: 257273856. Throughput: 0: 3951.7. Samples: 53480666. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:33,968][134211] Avg episode reward: [(0, '6.729')] [2025-01-04 01:54:34,814][134294] Updated weights for policy 0, policy_version 62814 (0.0026) [2025-01-04 01:54:37,874][134294] Updated weights for policy 0, policy_version 62824 (0.0030) [2025-01-04 01:54:38,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15769.6, 300 sec: 15495.4). Total num frames: 257339392. Throughput: 0: 3797.1. Samples: 53502748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:38,969][134211] Avg episode reward: [(0, '7.044')] [2025-01-04 01:54:40,865][134294] Updated weights for policy 0, policy_version 62834 (0.0025) [2025-01-04 01:54:42,753][134294] Updated weights for policy 0, policy_version 62844 (0.0014) [2025-01-04 01:54:43,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15906.0, 300 sec: 15537.0). Total num frames: 257425408. Throughput: 0: 3928.5. Samples: 53527280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:43,969][134211] Avg episode reward: [(0, '7.357')] [2025-01-04 01:54:45,430][134294] Updated weights for policy 0, policy_version 62854 (0.0022) [2025-01-04 01:54:48,563][134294] Updated weights for policy 0, policy_version 62864 (0.0024) [2025-01-04 01:54:48,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15291.7, 300 sec: 15495.4). Total num frames: 257495040. Throughput: 0: 3956.0. Samples: 53538344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:48,968][134211] Avg episode reward: [(0, '8.348')] [2025-01-04 01:54:51,014][134294] Updated weights for policy 0, policy_version 62874 (0.0020) [2025-01-04 01:54:53,077][134294] Updated weights for policy 0, policy_version 62884 (0.0012) [2025-01-04 01:54:53,969][134211] Fps is (10 sec: 15972.2, 60 sec: 15496.1, 300 sec: 15523.0). Total num frames: 257585152. Throughput: 0: 3856.6. Samples: 53561918. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:53,970][134211] Avg episode reward: [(0, '6.723')] [2025-01-04 01:54:55,199][134294] Updated weights for policy 0, policy_version 62894 (0.0014) [2025-01-04 01:54:57,199][134294] Updated weights for policy 0, policy_version 62904 (0.0012) [2025-01-04 01:54:58,968][134211] Fps is (10 sec: 19661.1, 60 sec: 16247.5, 300 sec: 15537.0). Total num frames: 257691648. Throughput: 0: 3948.0. Samples: 53592200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 01:54:58,968][134211] Avg episode reward: [(0, '7.158')] [2025-01-04 01:54:59,193][134294] Updated weights for policy 0, policy_version 62914 (0.0015) [2025-01-04 01:55:02,240][134294] Updated weights for policy 0, policy_version 62924 (0.0024) [2025-01-04 01:55:03,968][134211] Fps is (10 sec: 17206.1, 60 sec: 16110.9, 300 sec: 15523.1). Total num frames: 257757184. Throughput: 0: 3996.6. Samples: 53604210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:03,968][134211] Avg episode reward: [(0, '7.580')] [2025-01-04 01:55:05,468][134294] Updated weights for policy 0, policy_version 62934 (0.0026) [2025-01-04 01:55:08,483][134294] Updated weights for policy 0, policy_version 62944 (0.0026) [2025-01-04 01:55:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15428.2, 300 sec: 15537.0). Total num frames: 257822720. Throughput: 0: 3993.5. Samples: 53623536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:08,968][134211] Avg episode reward: [(0, '7.009')] [2025-01-04 01:55:11,490][134294] Updated weights for policy 0, policy_version 62954 (0.0026) [2025-01-04 01:55:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15223.5, 300 sec: 15523.1). Total num frames: 257888256. Throughput: 0: 3969.8. Samples: 53643420. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:13,968][134211] Avg episode reward: [(0, '6.972')] [2025-01-04 01:55:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062961_257888256.pth... [2025-01-04 01:55:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062055_254177280.pth [2025-01-04 01:55:14,879][134294] Updated weights for policy 0, policy_version 62964 (0.0023) [2025-01-04 01:55:17,867][134294] Updated weights for policy 0, policy_version 62974 (0.0023) [2025-01-04 01:55:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 15360.1, 300 sec: 15467.6). Total num frames: 257961984. Throughput: 0: 3816.0. Samples: 53652384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:18,968][134211] Avg episode reward: [(0, '7.515')] [2025-01-04 01:55:19,774][134294] Updated weights for policy 0, policy_version 62984 (0.0014) [2025-01-04 01:55:21,760][134294] Updated weights for policy 0, policy_version 62994 (0.0014) [2025-01-04 01:55:23,699][134294] Updated weights for policy 0, policy_version 63004 (0.0013) [2025-01-04 01:55:23,968][134211] Fps is (10 sec: 18022.6, 60 sec: 16111.1, 300 sec: 15467.6). Total num frames: 258068480. Throughput: 0: 3966.0. Samples: 53681216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:23,968][134211] Avg episode reward: [(0, '7.637')] [2025-01-04 01:55:25,654][134294] Updated weights for policy 0, policy_version 63014 (0.0015) [2025-01-04 01:55:27,900][134294] Updated weights for policy 0, policy_version 63024 (0.0017) [2025-01-04 01:55:28,968][134211] Fps is (10 sec: 19660.3, 60 sec: 16315.7, 300 sec: 15550.9). Total num frames: 258158592. Throughput: 0: 4076.3. Samples: 53710712. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:28,968][134211] Avg episode reward: [(0, '7.557')] [2025-01-04 01:55:31,071][134294] Updated weights for policy 0, policy_version 63034 (0.0030) [2025-01-04 01:55:33,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15769.6, 300 sec: 15537.0). Total num frames: 258220032. Throughput: 0: 4041.7. Samples: 53720220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:33,968][134211] Avg episode reward: [(0, '7.439')] [2025-01-04 01:55:34,482][134294] Updated weights for policy 0, policy_version 63044 (0.0026) [2025-01-04 01:55:37,618][134294] Updated weights for policy 0, policy_version 63054 (0.0025) [2025-01-04 01:55:38,968][134211] Fps is (10 sec: 12697.1, 60 sec: 15769.5, 300 sec: 15537.0). Total num frames: 258285568. Throughput: 0: 3938.9. Samples: 53739164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:55:38,969][134211] Avg episode reward: [(0, '7.458')] [2025-01-04 01:55:40,574][134294] Updated weights for policy 0, policy_version 63064 (0.0025) [2025-01-04 01:55:43,636][134294] Updated weights for policy 0, policy_version 63074 (0.0024) [2025-01-04 01:55:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.3, 300 sec: 15523.1). Total num frames: 258351104. Throughput: 0: 3717.8. Samples: 53759502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:55:43,968][134211] Avg episode reward: [(0, '7.141')] [2025-01-04 01:55:46,565][134294] Updated weights for policy 0, policy_version 63084 (0.0028) [2025-01-04 01:55:48,968][134211] Fps is (10 sec: 13517.6, 60 sec: 15428.3, 300 sec: 15467.6). Total num frames: 258420736. Throughput: 0: 3678.2. Samples: 53769730. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:55:48,968][134211] Avg episode reward: [(0, '7.243')] [2025-01-04 01:55:49,547][134294] Updated weights for policy 0, policy_version 63094 (0.0023) [2025-01-04 01:55:51,426][134294] Updated weights for policy 0, policy_version 63104 (0.0012) [2025-01-04 01:55:53,319][134294] Updated weights for policy 0, policy_version 63114 (0.0012) [2025-01-04 01:55:53,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15701.8, 300 sec: 15481.5). Total num frames: 258527232. Throughput: 0: 3828.7. Samples: 53795826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:55:53,968][134211] Avg episode reward: [(0, '7.312')] [2025-01-04 01:55:55,212][134294] Updated weights for policy 0, policy_version 63124 (0.0014) [2025-01-04 01:55:57,076][134294] Updated weights for policy 0, policy_version 63134 (0.0013) [2025-01-04 01:55:58,948][134294] Updated weights for policy 0, policy_version 63144 (0.0012) [2025-01-04 01:55:58,968][134211] Fps is (10 sec: 21709.0, 60 sec: 15769.6, 300 sec: 15564.8). Total num frames: 258637824. Throughput: 0: 4112.2. Samples: 53828470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:55:58,968][134211] Avg episode reward: [(0, '6.551')] [2025-01-04 01:56:01,273][134294] Updated weights for policy 0, policy_version 63154 (0.0019) [2025-01-04 01:56:03,968][134211] Fps is (10 sec: 18431.5, 60 sec: 15906.1, 300 sec: 15592.6). Total num frames: 258711552. Throughput: 0: 4229.3. Samples: 53842704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:56:03,969][134211] Avg episode reward: [(0, '7.262')] [2025-01-04 01:56:04,450][134294] Updated weights for policy 0, policy_version 63164 (0.0028) [2025-01-04 01:56:07,660][134294] Updated weights for policy 0, policy_version 63174 (0.0026) [2025-01-04 01:56:08,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15837.8, 300 sec: 15578.7). Total num frames: 258772992. Throughput: 0: 4013.9. Samples: 53861842. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:56:08,969][134211] Avg episode reward: [(0, '7.648')] [2025-01-04 01:56:10,891][134294] Updated weights for policy 0, policy_version 63184 (0.0027) [2025-01-04 01:56:13,782][134294] Updated weights for policy 0, policy_version 63194 (0.0026) [2025-01-04 01:56:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15906.1, 300 sec: 15592.6). Total num frames: 258842624. Throughput: 0: 3801.9. Samples: 53881798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:56:13,969][134211] Avg episode reward: [(0, '6.878')] [2025-01-04 01:56:16,875][134294] Updated weights for policy 0, policy_version 63204 (0.0025) [2025-01-04 01:56:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15769.5, 300 sec: 15578.7). Total num frames: 258908160. Throughput: 0: 3812.1. Samples: 53891766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 01:56:18,969][134211] Avg episode reward: [(0, '6.505')] [2025-01-04 01:56:19,936][134294] Updated weights for policy 0, policy_version 63214 (0.0028) [2025-01-04 01:56:22,862][134294] Updated weights for policy 0, policy_version 63224 (0.0023) [2025-01-04 01:56:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15155.2, 300 sec: 15550.9). Total num frames: 258977792. Throughput: 0: 3843.6. Samples: 53912124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:23,969][134211] Avg episode reward: [(0, '7.165')] [2025-01-04 01:56:26,037][134294] Updated weights for policy 0, policy_version 63234 (0.0028) [2025-01-04 01:56:28,677][134294] Updated weights for policy 0, policy_version 63244 (0.0019) [2025-01-04 01:56:28,968][134211] Fps is (10 sec: 14336.5, 60 sec: 14882.2, 300 sec: 15523.2). Total num frames: 259051520. Throughput: 0: 3843.8. Samples: 53932470. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:28,968][134211] Avg episode reward: [(0, '7.169')] [2025-01-04 01:56:30,682][134294] Updated weights for policy 0, policy_version 63254 (0.0013) [2025-01-04 01:56:32,674][134294] Updated weights for policy 0, policy_version 63264 (0.0016) [2025-01-04 01:56:33,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15428.3, 300 sec: 15606.4). Total num frames: 259145728. Throughput: 0: 3968.8. Samples: 53948326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:33,968][134211] Avg episode reward: [(0, '6.933')] [2025-01-04 01:56:35,666][134294] Updated weights for policy 0, policy_version 63274 (0.0024) [2025-01-04 01:56:38,650][134294] Updated weights for policy 0, policy_version 63284 (0.0024) [2025-01-04 01:56:38,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15428.4, 300 sec: 15495.4). Total num frames: 259211264. Throughput: 0: 3893.2. Samples: 53971020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:38,968][134211] Avg episode reward: [(0, '6.873')] [2025-01-04 01:56:40,792][134294] Updated weights for policy 0, policy_version 63294 (0.0013) [2025-01-04 01:56:42,686][134294] Updated weights for policy 0, policy_version 63304 (0.0014) [2025-01-04 01:56:43,968][134211] Fps is (10 sec: 17203.5, 60 sec: 16111.0, 300 sec: 15481.5). Total num frames: 259317760. Throughput: 0: 3798.7. Samples: 53999410. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:43,968][134211] Avg episode reward: [(0, '6.799')] [2025-01-04 01:56:44,565][134294] Updated weights for policy 0, policy_version 63314 (0.0013) [2025-01-04 01:56:46,445][134294] Updated weights for policy 0, policy_version 63324 (0.0014) [2025-01-04 01:56:48,968][134211] Fps is (10 sec: 20070.6, 60 sec: 16520.5, 300 sec: 15564.8). Total num frames: 259411968. Throughput: 0: 3845.2. Samples: 54015736. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:48,968][134211] Avg episode reward: [(0, '6.573')] [2025-01-04 01:56:49,114][134294] Updated weights for policy 0, policy_version 63334 (0.0023) [2025-01-04 01:56:52,340][134294] Updated weights for policy 0, policy_version 63344 (0.0025) [2025-01-04 01:56:53,968][134211] Fps is (10 sec: 15563.6, 60 sec: 15769.4, 300 sec: 15592.5). Total num frames: 259473408. Throughput: 0: 3892.9. Samples: 54037022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:53,969][134211] Avg episode reward: [(0, '7.122')] [2025-01-04 01:56:55,983][134294] Updated weights for policy 0, policy_version 63354 (0.0025) [2025-01-04 01:56:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14882.1, 300 sec: 15578.7). Total num frames: 259530752. Throughput: 0: 3828.2. Samples: 54054066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 01:56:58,969][134211] Avg episode reward: [(0, '6.921')] [2025-01-04 01:56:59,646][134294] Updated weights for policy 0, policy_version 63364 (0.0025) [2025-01-04 01:57:02,553][134294] Updated weights for policy 0, policy_version 63374 (0.0019) [2025-01-04 01:57:03,967][134211] Fps is (10 sec: 13108.2, 60 sec: 14882.2, 300 sec: 15564.8). Total num frames: 259604480. Throughput: 0: 3789.1. Samples: 54062276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:03,968][134211] Avg episode reward: [(0, '6.912')] [2025-01-04 01:57:04,550][134294] Updated weights for policy 0, policy_version 63384 (0.0013) [2025-01-04 01:57:06,498][134294] Updated weights for policy 0, policy_version 63394 (0.0012) [2025-01-04 01:57:08,474][134294] Updated weights for policy 0, policy_version 63404 (0.0014) [2025-01-04 01:57:08,967][134211] Fps is (10 sec: 18022.7, 60 sec: 15633.2, 300 sec: 15578.7). Total num frames: 259710976. Throughput: 0: 4000.6. Samples: 54092148. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:08,968][134211] Avg episode reward: [(0, '6.912')] [2025-01-04 01:57:10,478][134294] Updated weights for policy 0, policy_version 63414 (0.0014) [2025-01-04 01:57:13,427][134294] Updated weights for policy 0, policy_version 63424 (0.0025) [2025-01-04 01:57:13,968][134211] Fps is (10 sec: 18431.2, 60 sec: 15769.6, 300 sec: 15634.2). Total num frames: 259788800. Throughput: 0: 4134.5. Samples: 54118522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:13,969][134211] Avg episode reward: [(0, '7.185')] [2025-01-04 01:57:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000063425_259788800.pth... [2025-01-04 01:57:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062507_256028672.pth [2025-01-04 01:57:16,884][134294] Updated weights for policy 0, policy_version 63434 (0.0027) [2025-01-04 01:57:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15701.4, 300 sec: 15634.2). Total num frames: 259850240. Throughput: 0: 3981.4. Samples: 54127490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:18,968][134211] Avg episode reward: [(0, '6.988')] [2025-01-04 01:57:20,067][134294] Updated weights for policy 0, policy_version 63444 (0.0026) [2025-01-04 01:57:23,218][134294] Updated weights for policy 0, policy_version 63454 (0.0027) [2025-01-04 01:57:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15633.1, 300 sec: 15564.8). Total num frames: 259915776. Throughput: 0: 3908.3. Samples: 54146894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:23,968][134211] Avg episode reward: [(0, '7.498')] [2025-01-04 01:57:26,134][134294] Updated weights for policy 0, policy_version 63464 (0.0024) [2025-01-04 01:57:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15564.7, 300 sec: 15425.9). Total num frames: 259985408. 
Throughput: 0: 3722.2. Samples: 54166908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:28,968][134211] Avg episode reward: [(0, '6.914')] [2025-01-04 01:57:29,257][134294] Updated weights for policy 0, policy_version 63474 (0.0025) [2025-01-04 01:57:32,263][134294] Updated weights for policy 0, policy_version 63484 (0.0024) [2025-01-04 01:57:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 15425.9). Total num frames: 260050944. Throughput: 0: 3582.6. Samples: 54176954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:33,968][134211] Avg episode reward: [(0, '7.243')] [2025-01-04 01:57:34,941][134294] Updated weights for policy 0, policy_version 63494 (0.0021) [2025-01-04 01:57:36,803][134294] Updated weights for policy 0, policy_version 63504 (0.0012) [2025-01-04 01:57:38,747][134294] Updated weights for policy 0, policy_version 63514 (0.0012) [2025-01-04 01:57:38,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15769.6, 300 sec: 15564.8). Total num frames: 260157440. Throughput: 0: 3694.2. Samples: 54203258. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:57:38,968][134211] Avg episode reward: [(0, '6.947')] [2025-01-04 01:57:40,578][134294] Updated weights for policy 0, policy_version 63524 (0.0015) [2025-01-04 01:57:42,493][134294] Updated weights for policy 0, policy_version 63534 (0.0013) [2025-01-04 01:57:43,968][134211] Fps is (10 sec: 20889.9, 60 sec: 15701.3, 300 sec: 15689.8). Total num frames: 260259840. Throughput: 0: 4031.9. Samples: 54235502. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:57:43,968][134211] Avg episode reward: [(0, '6.648')] [2025-01-04 01:57:44,982][134294] Updated weights for policy 0, policy_version 63544 (0.0022) [2025-01-04 01:57:48,243][134294] Updated weights for policy 0, policy_version 63554 (0.0029) [2025-01-04 01:57:48,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15223.4, 300 sec: 15689.8). Total num frames: 260325376. Throughput: 0: 4083.9. 
Samples: 54246054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:57:48,968][134211] Avg episode reward: [(0, '6.902')] [2025-01-04 01:57:51,317][134294] Updated weights for policy 0, policy_version 63564 (0.0023) [2025-01-04 01:57:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15223.6, 300 sec: 15606.4). Total num frames: 260386816. Throughput: 0: 3843.1. Samples: 54265088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:57:53,968][134211] Avg episode reward: [(0, '6.953')] [2025-01-04 01:57:54,702][134294] Updated weights for policy 0, policy_version 63574 (0.0028) [2025-01-04 01:57:57,918][134294] Updated weights for policy 0, policy_version 63584 (0.0027) [2025-01-04 01:57:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15360.0, 300 sec: 15537.0). Total num frames: 260452352. Throughput: 0: 3677.9. Samples: 54284026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:57:58,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 01:58:00,941][134294] Updated weights for policy 0, policy_version 63594 (0.0023) [2025-01-04 01:58:03,928][134294] Updated weights for policy 0, policy_version 63604 (0.0022) [2025-01-04 01:58:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15291.7, 300 sec: 15550.9). Total num frames: 260521984. Throughput: 0: 3703.5. Samples: 54294146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:58:03,968][134211] Avg episode reward: [(0, '7.077')] [2025-01-04 01:58:05,813][134294] Updated weights for policy 0, policy_version 63614 (0.0012) [2025-01-04 01:58:07,658][134294] Updated weights for policy 0, policy_version 63624 (0.0013) [2025-01-04 01:58:08,968][134211] Fps is (10 sec: 17613.3, 60 sec: 15291.7, 300 sec: 15606.5). Total num frames: 260628480. Throughput: 0: 3880.1. Samples: 54321496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:58:08,968][134211] Avg episode reward: [(0, '6.931')] [2025-01-04 01:58:09,583][134294] Updated weights for policy 0, policy_version 63634 (0.0013) [2025-01-04 01:58:12,024][134294] Updated weights for policy 0, policy_version 63644 (0.0021) [2025-01-04 01:58:13,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15291.8, 300 sec: 15523.2). Total num frames: 260706304. Throughput: 0: 4026.9. Samples: 54348118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 01:58:13,968][134211] Avg episode reward: [(0, '7.489')] [2025-01-04 01:58:15,291][134294] Updated weights for policy 0, policy_version 63654 (0.0028) [2025-01-04 01:58:18,457][134294] Updated weights for policy 0, policy_version 63664 (0.0024) [2025-01-04 01:58:18,968][134211] Fps is (10 sec: 14335.3, 60 sec: 15359.9, 300 sec: 15523.1). Total num frames: 260771840. Throughput: 0: 4016.7. Samples: 54357706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:18,969][134211] Avg episode reward: [(0, '6.772')] [2025-01-04 01:58:21,533][134294] Updated weights for policy 0, policy_version 63674 (0.0025) [2025-01-04 01:58:23,719][134294] Updated weights for policy 0, policy_version 63684 (0.0014) [2025-01-04 01:58:23,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15633.1, 300 sec: 15564.8). Total num frames: 260853760. Throughput: 0: 3874.1. Samples: 54377592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:23,968][134211] Avg episode reward: [(0, '6.212')] [2025-01-04 01:58:25,692][134294] Updated weights for policy 0, policy_version 63694 (0.0013) [2025-01-04 01:58:28,380][134294] Updated weights for policy 0, policy_version 63704 (0.0022) [2025-01-04 01:58:28,968][134211] Fps is (10 sec: 16794.0, 60 sec: 15906.1, 300 sec: 15634.2). Total num frames: 260939776. Throughput: 0: 3777.5. Samples: 54405490. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:28,968][134211] Avg episode reward: [(0, '7.018')] [2025-01-04 01:58:31,485][134294] Updated weights for policy 0, policy_version 63714 (0.0025) [2025-01-04 01:58:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15837.9, 300 sec: 15620.4). Total num frames: 261001216. Throughput: 0: 3763.8. Samples: 54415424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:33,968][134211] Avg episode reward: [(0, '5.971')] [2025-01-04 01:58:34,879][134294] Updated weights for policy 0, policy_version 63724 (0.0026) [2025-01-04 01:58:37,419][134294] Updated weights for policy 0, policy_version 63734 (0.0019) [2025-01-04 01:58:38,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15496.5, 300 sec: 15648.1). Total num frames: 261087232. Throughput: 0: 3788.4. Samples: 54435566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:38,968][134211] Avg episode reward: [(0, '6.288')] [2025-01-04 01:58:39,317][134294] Updated weights for policy 0, policy_version 63744 (0.0014) [2025-01-04 01:58:41,203][134294] Updated weights for policy 0, policy_version 63754 (0.0013) [2025-01-04 01:58:43,084][134294] Updated weights for policy 0, policy_version 63764 (0.0013) [2025-01-04 01:58:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15564.7, 300 sec: 15648.1). Total num frames: 261193728. Throughput: 0: 4090.0. Samples: 54468078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:43,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 01:58:44,980][134294] Updated weights for policy 0, policy_version 63774 (0.0014) [2025-01-04 01:58:46,865][134294] Updated weights for policy 0, policy_version 63784 (0.0013) [2025-01-04 01:58:48,968][134211] Fps is (10 sec: 20889.1, 60 sec: 16179.2, 300 sec: 15731.4). Total num frames: 261296128. Throughput: 0: 4228.5. Samples: 54484428. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:48,968][134211] Avg episode reward: [(0, '6.766')] [2025-01-04 01:58:49,204][134294] Updated weights for policy 0, policy_version 63794 (0.0022) [2025-01-04 01:58:52,661][134294] Updated weights for policy 0, policy_version 63804 (0.0027) [2025-01-04 01:58:53,968][134211] Fps is (10 sec: 15974.7, 60 sec: 16111.0, 300 sec: 15717.5). Total num frames: 261353472. Throughput: 0: 4123.7. Samples: 54507062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 01:58:53,968][134211] Avg episode reward: [(0, '6.534')] [2025-01-04 01:58:56,391][134294] Updated weights for policy 0, policy_version 63814 (0.0026) [2025-01-04 01:58:58,968][134211] Fps is (10 sec: 11469.0, 60 sec: 15974.5, 300 sec: 15662.0). Total num frames: 261410816. Throughput: 0: 3902.8. Samples: 54523742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:58:58,968][134211] Avg episode reward: [(0, '6.175')] [2025-01-04 01:58:59,954][134294] Updated weights for policy 0, policy_version 63824 (0.0027) [2025-01-04 01:59:03,463][134294] Updated weights for policy 0, policy_version 63834 (0.0026) [2025-01-04 01:59:03,968][134211] Fps is (10 sec: 11468.7, 60 sec: 15769.5, 300 sec: 15495.4). Total num frames: 261468160. Throughput: 0: 3884.8. Samples: 54532520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:03,968][134211] Avg episode reward: [(0, '6.433')] [2025-01-04 01:59:06,682][134294] Updated weights for policy 0, policy_version 63844 (0.0027) [2025-01-04 01:59:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.4, 300 sec: 15481.5). Total num frames: 261541888. Throughput: 0: 3852.1. Samples: 54550938. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:08,968][134211] Avg episode reward: [(0, '6.301')] [2025-01-04 01:59:09,157][134294] Updated weights for policy 0, policy_version 63854 (0.0018) [2025-01-04 01:59:11,057][134294] Updated weights for policy 0, policy_version 63864 (0.0014) [2025-01-04 01:59:12,940][134294] Updated weights for policy 0, policy_version 63874 (0.0012) [2025-01-04 01:59:13,967][134211] Fps is (10 sec: 18022.8, 60 sec: 15701.4, 300 sec: 15620.4). Total num frames: 261648384. Throughput: 0: 3912.8. Samples: 54581566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:13,968][134211] Avg episode reward: [(0, '6.531')] [2025-01-04 01:59:14,038][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000063880_261652480.pth... [2025-01-04 01:59:14,080][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000062961_257888256.pth [2025-01-04 01:59:14,810][134294] Updated weights for policy 0, policy_version 63884 (0.0014) [2025-01-04 01:59:16,706][134294] Updated weights for policy 0, policy_version 63894 (0.0013) [2025-01-04 01:59:18,595][134294] Updated weights for policy 0, policy_version 63904 (0.0014) [2025-01-04 01:59:18,967][134211] Fps is (10 sec: 21709.1, 60 sec: 16452.4, 300 sec: 15787.0). Total num frames: 261758976. Throughput: 0: 4052.5. Samples: 54597784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:18,968][134211] Avg episode reward: [(0, '7.089')] [2025-01-04 01:59:20,472][134294] Updated weights for policy 0, policy_version 63914 (0.0015) [2025-01-04 01:59:23,021][134294] Updated weights for policy 0, policy_version 63924 (0.0022) [2025-01-04 01:59:23,968][134211] Fps is (10 sec: 19250.9, 60 sec: 16452.2, 300 sec: 15800.8). Total num frames: 261840896. Throughput: 0: 4285.9. Samples: 54628432. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:23,968][134211] Avg episode reward: [(0, '6.864')] [2025-01-04 01:59:26,335][134294] Updated weights for policy 0, policy_version 63934 (0.0027) [2025-01-04 01:59:28,968][134211] Fps is (10 sec: 14335.6, 60 sec: 16042.7, 300 sec: 15689.8). Total num frames: 261902336. Throughput: 0: 3981.4. Samples: 54647240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:28,968][134211] Avg episode reward: [(0, '7.154')] [2025-01-04 01:59:29,655][134294] Updated weights for policy 0, policy_version 63944 (0.0027) [2025-01-04 01:59:33,013][134294] Updated weights for policy 0, policy_version 63954 (0.0027) [2025-01-04 01:59:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 16042.7, 300 sec: 15675.9). Total num frames: 261963776. Throughput: 0: 3821.8. Samples: 54656408. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:33,968][134211] Avg episode reward: [(0, '7.082')] [2025-01-04 01:59:36,141][134294] Updated weights for policy 0, policy_version 63964 (0.0027) [2025-01-04 01:59:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15701.3, 300 sec: 15606.5). Total num frames: 262029312. Throughput: 0: 3744.7. Samples: 54675572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:38,968][134211] Avg episode reward: [(0, '7.101')] [2025-01-04 01:59:39,612][134294] Updated weights for policy 0, policy_version 63974 (0.0030) [2025-01-04 01:59:42,610][134294] Updated weights for policy 0, policy_version 63984 (0.0026) [2025-01-04 01:59:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15018.7, 300 sec: 15592.6). Total num frames: 262094848. Throughput: 0: 3798.3. Samples: 54694666. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:43,969][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 01:59:45,719][134294] Updated weights for policy 0, policy_version 63994 (0.0026) [2025-01-04 01:59:48,399][134294] Updated weights for policy 0, policy_version 64004 (0.0020) [2025-01-04 01:59:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 15537.1). Total num frames: 262168576. Throughput: 0: 3820.0. Samples: 54704420. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:48,969][134211] Avg episode reward: [(0, '6.977')] [2025-01-04 01:59:50,600][134294] Updated weights for policy 0, policy_version 64014 (0.0018) [2025-01-04 01:59:53,579][134294] Updated weights for policy 0, policy_version 64024 (0.0025) [2025-01-04 01:59:53,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14882.1, 300 sec: 15439.8). Total num frames: 262246400. Throughput: 0: 3955.5. Samples: 54728938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:53,968][134211] Avg episode reward: [(0, '7.307')] [2025-01-04 01:59:56,662][134294] Updated weights for policy 0, policy_version 64034 (0.0025) [2025-01-04 01:59:58,716][134294] Updated weights for policy 0, policy_version 64044 (0.0013) [2025-01-04 01:59:58,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15291.8, 300 sec: 15495.4). Total num frames: 262328320. Throughput: 0: 3776.0. Samples: 54751486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 01:59:58,968][134211] Avg episode reward: [(0, '6.883')] [2025-01-04 02:00:00,653][134294] Updated weights for policy 0, policy_version 64054 (0.0014) [2025-01-04 02:00:02,601][134294] Updated weights for policy 0, policy_version 64064 (0.0015) [2025-01-04 02:00:03,968][134211] Fps is (10 sec: 18842.0, 60 sec: 16111.0, 300 sec: 15634.2). Total num frames: 262434816. Throughput: 0: 3770.4. Samples: 54767454. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:00:03,968][134211] Avg episode reward: [(0, '6.941')] [2025-01-04 02:00:04,504][134294] Updated weights for policy 0, policy_version 64074 (0.0013) [2025-01-04 02:00:06,375][134294] Updated weights for policy 0, policy_version 64084 (0.0012) [2025-01-04 02:00:08,493][134294] Updated weights for policy 0, policy_version 64094 (0.0017) [2025-01-04 02:00:08,968][134211] Fps is (10 sec: 20479.4, 60 sec: 16520.5, 300 sec: 15745.3). Total num frames: 262533120. Throughput: 0: 3800.2. Samples: 54799442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:00:08,969][134211] Avg episode reward: [(0, '6.980')] [2025-01-04 02:00:11,786][134294] Updated weights for policy 0, policy_version 64104 (0.0025) [2025-01-04 02:00:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15769.5, 300 sec: 15703.6). Total num frames: 262594560. Throughput: 0: 3837.6. Samples: 54819932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:00:13,968][134211] Avg episode reward: [(0, '6.668')] [2025-01-04 02:00:15,098][134294] Updated weights for policy 0, policy_version 64114 (0.0028) [2025-01-04 02:00:18,160][134294] Updated weights for policy 0, policy_version 64124 (0.0027) [2025-01-04 02:00:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15018.6, 300 sec: 15564.8). Total num frames: 262660096. Throughput: 0: 3844.0. Samples: 54829386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:18,968][134211] Avg episode reward: [(0, '7.087')] [2025-01-04 02:00:21,377][134294] Updated weights for policy 0, policy_version 64134 (0.0022) [2025-01-04 02:00:23,970][134211] Fps is (10 sec: 12695.4, 60 sec: 14676.9, 300 sec: 15467.5). Total num frames: 262721536. Throughput: 0: 3850.9. Samples: 54848870. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:23,970][134211] Avg episode reward: [(0, '6.839')] [2025-01-04 02:00:24,669][134294] Updated weights for policy 0, policy_version 64144 (0.0029) [2025-01-04 02:00:27,335][134294] Updated weights for policy 0, policy_version 64154 (0.0019) [2025-01-04 02:00:28,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15018.6, 300 sec: 15537.0). Total num frames: 262803456. Throughput: 0: 3922.8. Samples: 54871194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:28,969][134211] Avg episode reward: [(0, '6.532')] [2025-01-04 02:00:29,708][134294] Updated weights for policy 0, policy_version 64164 (0.0021) [2025-01-04 02:00:32,769][134294] Updated weights for policy 0, policy_version 64174 (0.0026) [2025-01-04 02:00:33,968][134211] Fps is (10 sec: 15157.8, 60 sec: 15155.2, 300 sec: 15550.9). Total num frames: 262873088. Throughput: 0: 3952.6. Samples: 54882288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:33,968][134211] Avg episode reward: [(0, '6.901')] [2025-01-04 02:00:35,739][134294] Updated weights for policy 0, policy_version 64184 (0.0025) [2025-01-04 02:00:38,359][134294] Updated weights for policy 0, policy_version 64194 (0.0020) [2025-01-04 02:00:38,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15360.0, 300 sec: 15592.6). Total num frames: 262950912. Throughput: 0: 3860.3. Samples: 54902652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:38,968][134211] Avg episode reward: [(0, '6.916')] [2025-01-04 02:00:40,281][134294] Updated weights for policy 0, policy_version 64204 (0.0013) [2025-01-04 02:00:42,209][134294] Updated weights for policy 0, policy_version 64214 (0.0013) [2025-01-04 02:00:43,967][134211] Fps is (10 sec: 18432.6, 60 sec: 16042.8, 300 sec: 15717.5). Total num frames: 263057408. Throughput: 0: 4046.8. Samples: 54933594. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:43,968][134211] Avg episode reward: [(0, '6.892')] [2025-01-04 02:00:44,115][134294] Updated weights for policy 0, policy_version 64224 (0.0013) [2025-01-04 02:00:46,026][134294] Updated weights for policy 0, policy_version 64234 (0.0015) [2025-01-04 02:00:47,922][134294] Updated weights for policy 0, policy_version 64244 (0.0014) [2025-01-04 02:00:48,968][134211] Fps is (10 sec: 21299.3, 60 sec: 16588.8, 300 sec: 15717.5). Total num frames: 263163904. Throughput: 0: 4051.2. Samples: 54949756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:48,968][134211] Avg episode reward: [(0, '6.306')] [2025-01-04 02:00:50,335][134294] Updated weights for policy 0, policy_version 64254 (0.0020) [2025-01-04 02:00:53,896][134294] Updated weights for policy 0, policy_version 64264 (0.0030) [2025-01-04 02:00:53,968][134211] Fps is (10 sec: 16793.0, 60 sec: 16315.7, 300 sec: 15550.9). Total num frames: 263225344. Throughput: 0: 3902.5. Samples: 54975056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:00:53,969][134211] Avg episode reward: [(0, '7.348')] [2025-01-04 02:00:57,484][134294] Updated weights for policy 0, policy_version 64274 (0.0027) [2025-01-04 02:00:58,968][134211] Fps is (10 sec: 11468.6, 60 sec: 15837.8, 300 sec: 15481.5). Total num frames: 263278592. Throughput: 0: 3813.5. Samples: 54991538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:00:58,968][134211] Avg episode reward: [(0, '7.386')] [2025-01-04 02:01:01,074][134294] Updated weights for policy 0, policy_version 64284 (0.0027) [2025-01-04 02:01:03,968][134211] Fps is (10 sec: 11468.8, 60 sec: 15086.9, 300 sec: 15481.5). Total num frames: 263340032. Throughput: 0: 3803.6. Samples: 55000550. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:03,970][134211] Avg episode reward: [(0, '6.715')] [2025-01-04 02:01:04,576][134294] Updated weights for policy 0, policy_version 64294 (0.0029) [2025-01-04 02:01:07,684][134294] Updated weights for policy 0, policy_version 64304 (0.0027) [2025-01-04 02:01:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 15467.6). Total num frames: 263405568. Throughput: 0: 3783.7. Samples: 55019130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:08,969][134211] Avg episode reward: [(0, '6.128')] [2025-01-04 02:01:10,274][134294] Updated weights for policy 0, policy_version 64314 (0.0018) [2025-01-04 02:01:12,099][134294] Updated weights for policy 0, policy_version 64324 (0.0014) [2025-01-04 02:01:13,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15223.5, 300 sec: 15592.6). Total num frames: 263507968. Throughput: 0: 3898.7. Samples: 55046636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:13,968][134211] Avg episode reward: [(0, '6.452')] [2025-01-04 02:01:14,041][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000064334_263512064.pth... [2025-01-04 02:01:14,041][134294] Updated weights for policy 0, policy_version 64334 (0.0015) [2025-01-04 02:01:14,079][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000063425_259788800.pth [2025-01-04 02:01:15,963][134294] Updated weights for policy 0, policy_version 64344 (0.0013) [2025-01-04 02:01:17,878][134294] Updated weights for policy 0, policy_version 64354 (0.0013) [2025-01-04 02:01:18,968][134211] Fps is (10 sec: 20890.1, 60 sec: 15906.2, 300 sec: 15717.5). Total num frames: 263614464. Throughput: 0: 4008.4. Samples: 55062664. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:18,968][134211] Avg episode reward: [(0, '6.589')] [2025-01-04 02:01:19,734][134294] Updated weights for policy 0, policy_version 64364 (0.0014) [2025-01-04 02:01:22,115][134294] Updated weights for policy 0, policy_version 64374 (0.0018) [2025-01-04 02:01:23,968][134211] Fps is (10 sec: 18841.5, 60 sec: 16248.0, 300 sec: 15745.3). Total num frames: 263696384. Throughput: 0: 4216.2. Samples: 55092382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:23,968][134211] Avg episode reward: [(0, '6.753')] [2025-01-04 02:01:25,424][134294] Updated weights for policy 0, policy_version 64384 (0.0029) [2025-01-04 02:01:28,449][134294] Updated weights for policy 0, policy_version 64394 (0.0027) [2025-01-04 02:01:28,969][134211] Fps is (10 sec: 14744.0, 60 sec: 15974.2, 300 sec: 15648.1). Total num frames: 263761920. Throughput: 0: 3955.9. Samples: 55111612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:28,969][134211] Avg episode reward: [(0, '7.144')] [2025-01-04 02:01:31,623][134294] Updated weights for policy 0, policy_version 64404 (0.0026) [2025-01-04 02:01:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15906.2, 300 sec: 15648.1). Total num frames: 263827456. Throughput: 0: 3814.7. Samples: 55121418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:01:33,968][134211] Avg episode reward: [(0, '6.807')] [2025-01-04 02:01:34,746][134294] Updated weights for policy 0, policy_version 64414 (0.0028) [2025-01-04 02:01:38,060][134294] Updated weights for policy 0, policy_version 64424 (0.0026) [2025-01-04 02:01:38,969][134211] Fps is (10 sec: 12697.8, 60 sec: 15632.9, 300 sec: 15495.3). Total num frames: 263888896. Throughput: 0: 3680.3. Samples: 55140672. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:01:38,969][134211] Avg episode reward: [(0, '7.071')] [2025-01-04 02:01:41,041][134294] Updated weights for policy 0, policy_version 64434 (0.0029) [2025-01-04 02:01:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.6, 300 sec: 15412.1). Total num frames: 263958528. Throughput: 0: 3753.9. Samples: 55160462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:01:43,969][134211] Avg episode reward: [(0, '6.549')] [2025-01-04 02:01:44,141][134294] Updated weights for policy 0, policy_version 64444 (0.0028) [2025-01-04 02:01:47,231][134294] Updated weights for policy 0, policy_version 64454 (0.0026) [2025-01-04 02:01:48,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14335.9, 300 sec: 15426.0). Total num frames: 264024064. Throughput: 0: 3777.4. Samples: 55170532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:01:48,969][134211] Avg episode reward: [(0, '7.242')] [2025-01-04 02:01:50,146][134294] Updated weights for policy 0, policy_version 64464 (0.0025) [2025-01-04 02:01:52,111][134294] Updated weights for policy 0, policy_version 64474 (0.0013) [2025-01-04 02:01:53,967][134211] Fps is (10 sec: 16384.6, 60 sec: 14950.5, 300 sec: 15564.8). Total num frames: 264122368. Throughput: 0: 3905.7. Samples: 55194884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:01:53,968][134211] Avg episode reward: [(0, '6.748')] [2025-01-04 02:01:54,063][134294] Updated weights for policy 0, policy_version 64484 (0.0013) [2025-01-04 02:01:55,928][134294] Updated weights for policy 0, policy_version 64494 (0.0013) [2025-01-04 02:01:57,816][134294] Updated weights for policy 0, policy_version 64504 (0.0014) [2025-01-04 02:01:58,968][134211] Fps is (10 sec: 20889.7, 60 sec: 15906.1, 300 sec: 15689.7). Total num frames: 264232960. Throughput: 0: 4014.6. Samples: 55227292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:01:58,968][134211] Avg episode reward: [(0, '6.950')] [2025-01-04 02:01:59,701][134294] Updated weights for policy 0, policy_version 64514 (0.0012) [2025-01-04 02:02:01,652][134294] Updated weights for policy 0, policy_version 64524 (0.0016) [2025-01-04 02:02:03,968][134211] Fps is (10 sec: 19660.5, 60 sec: 16315.8, 300 sec: 15620.3). Total num frames: 264318976. Throughput: 0: 4021.7. Samples: 55243642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:02:03,968][134211] Avg episode reward: [(0, '7.024')] [2025-01-04 02:02:04,767][134294] Updated weights for policy 0, policy_version 64534 (0.0029) [2025-01-04 02:02:08,005][134294] Updated weights for policy 0, policy_version 64544 (0.0032) [2025-01-04 02:02:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 16247.5, 300 sec: 15564.8). Total num frames: 264380416. Throughput: 0: 3801.7. Samples: 55263460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:02:08,968][134211] Avg episode reward: [(0, '6.812')] [2025-01-04 02:02:11,212][134294] Updated weights for policy 0, policy_version 64554 (0.0025) [2025-01-04 02:02:13,968][134211] Fps is (10 sec: 12697.2, 60 sec: 15633.0, 300 sec: 15578.7). Total num frames: 264445952. Throughput: 0: 3802.6. Samples: 55282728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:02:13,969][134211] Avg episode reward: [(0, '6.756')] [2025-01-04 02:02:14,429][134294] Updated weights for policy 0, policy_version 64564 (0.0028) [2025-01-04 02:02:17,517][134294] Updated weights for policy 0, policy_version 64574 (0.0023) [2025-01-04 02:02:18,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15018.5, 300 sec: 15592.6). Total num frames: 264515584. Throughput: 0: 3805.3. Samples: 55292660. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:18,970][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 02:02:20,497][134294] Updated weights for policy 0, policy_version 64584 (0.0027) [2025-01-04 02:02:23,470][134294] Updated weights for policy 0, policy_version 64594 (0.0024) [2025-01-04 02:02:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 15578.7). Total num frames: 264581120. Throughput: 0: 3837.8. Samples: 55313370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:23,968][134211] Avg episode reward: [(0, '6.660')] [2025-01-04 02:02:26,577][134294] Updated weights for policy 0, policy_version 64604 (0.0024) [2025-01-04 02:02:28,962][134294] Updated weights for policy 0, policy_version 64614 (0.0017) [2025-01-04 02:02:28,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14950.7, 300 sec: 15620.4). Total num frames: 264658944. Throughput: 0: 3852.9. Samples: 55333842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:28,968][134211] Avg episode reward: [(0, '7.365')] [2025-01-04 02:02:30,887][134294] Updated weights for policy 0, policy_version 64624 (0.0012) [2025-01-04 02:02:32,801][134294] Updated weights for policy 0, policy_version 64634 (0.0014) [2025-01-04 02:02:33,967][134211] Fps is (10 sec: 18432.5, 60 sec: 15633.1, 300 sec: 15620.3). Total num frames: 264765440. Throughput: 0: 3986.4. Samples: 55349918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:33,968][134211] Avg episode reward: [(0, '6.716')] [2025-01-04 02:02:34,659][134294] Updated weights for policy 0, policy_version 64644 (0.0014) [2025-01-04 02:02:37,033][134294] Updated weights for policy 0, policy_version 64654 (0.0020) [2025-01-04 02:02:38,968][134211] Fps is (10 sec: 18431.5, 60 sec: 15906.3, 300 sec: 15537.0). Total num frames: 264843264. Throughput: 0: 4089.7. Samples: 55378924. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:38,969][134211] Avg episode reward: [(0, '6.977')] [2025-01-04 02:02:40,460][134294] Updated weights for policy 0, policy_version 64664 (0.0030) [2025-01-04 02:02:43,602][134294] Updated weights for policy 0, policy_version 64674 (0.0026) [2025-01-04 02:02:43,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15837.9, 300 sec: 15537.0). Total num frames: 264908800. Throughput: 0: 3794.8. Samples: 55398058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:43,969][134211] Avg episode reward: [(0, '6.688')] [2025-01-04 02:02:46,721][134294] Updated weights for policy 0, policy_version 64684 (0.0025) [2025-01-04 02:02:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15769.6, 300 sec: 15537.0). Total num frames: 264970240. Throughput: 0: 3645.2. Samples: 55407676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:48,968][134211] Avg episode reward: [(0, '7.272')] [2025-01-04 02:02:49,693][134294] Updated weights for policy 0, policy_version 64694 (0.0023) [2025-01-04 02:02:51,543][134294] Updated weights for policy 0, policy_version 64704 (0.0013) [2025-01-04 02:02:53,643][134294] Updated weights for policy 0, policy_version 64714 (0.0016) [2025-01-04 02:02:53,968][134211] Fps is (10 sec: 16384.4, 60 sec: 15837.8, 300 sec: 15662.0). Total num frames: 265072640. Throughput: 0: 3773.0. Samples: 55433246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:53,968][134211] Avg episode reward: [(0, '6.674')] [2025-01-04 02:02:55,732][134294] Updated weights for policy 0, policy_version 64724 (0.0016) [2025-01-04 02:02:57,787][134294] Updated weights for policy 0, policy_version 64734 (0.0014) [2025-01-04 02:02:58,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15633.1, 300 sec: 15759.2). Total num frames: 265170944. Throughput: 0: 4005.1. Samples: 55462956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:02:58,968][134211] Avg episode reward: [(0, '6.410')] [2025-01-04 02:02:59,855][134294] Updated weights for policy 0, policy_version 64744 (0.0015) [2025-01-04 02:03:03,044][134294] Updated weights for policy 0, policy_version 64754 (0.0029) [2025-01-04 02:03:03,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15360.0, 300 sec: 15634.2). Total num frames: 265240576. Throughput: 0: 4064.5. Samples: 55475562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:03,968][134211] Avg episode reward: [(0, '6.199')] [2025-01-04 02:03:06,286][134294] Updated weights for policy 0, policy_version 64764 (0.0030) [2025-01-04 02:03:08,968][134211] Fps is (10 sec: 13516.1, 60 sec: 15428.2, 300 sec: 15592.6). Total num frames: 265306112. Throughput: 0: 4023.7. Samples: 55494438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:08,969][134211] Avg episode reward: [(0, '6.772')] [2025-01-04 02:03:09,555][134294] Updated weights for policy 0, policy_version 64774 (0.0027) [2025-01-04 02:03:12,700][134294] Updated weights for policy 0, policy_version 64784 (0.0026) [2025-01-04 02:03:13,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15428.2, 300 sec: 15592.6). Total num frames: 265371648. Throughput: 0: 3995.9. Samples: 55513660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:13,969][134211] Avg episode reward: [(0, '6.660')] [2025-01-04 02:03:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000064788_265371648.pth... 
[2025-01-04 02:03:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000063880_261652480.pth [2025-01-04 02:03:15,794][134294] Updated weights for policy 0, policy_version 64794 (0.0024) [2025-01-04 02:03:18,743][134294] Updated weights for policy 0, policy_version 64804 (0.0023) [2025-01-04 02:03:18,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15359.9, 300 sec: 15537.0). Total num frames: 265437184. Throughput: 0: 3862.6. Samples: 55523740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:18,969][134211] Avg episode reward: [(0, '6.814')] [2025-01-04 02:03:21,723][134294] Updated weights for policy 0, policy_version 64814 (0.0024) [2025-01-04 02:03:23,968][134211] Fps is (10 sec: 13517.4, 60 sec: 15428.3, 300 sec: 15481.5). Total num frames: 265506816. Throughput: 0: 3673.8. Samples: 55544246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:23,968][134211] Avg episode reward: [(0, '7.316')] [2025-01-04 02:03:24,812][134294] Updated weights for policy 0, policy_version 64824 (0.0026) [2025-01-04 02:03:27,152][134294] Updated weights for policy 0, policy_version 64834 (0.0016) [2025-01-04 02:03:28,967][134211] Fps is (10 sec: 15975.7, 60 sec: 15633.1, 300 sec: 15578.7). Total num frames: 265596928. Throughput: 0: 3789.4. Samples: 55568578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:28,968][134211] Avg episode reward: [(0, '6.878')] [2025-01-04 02:03:29,067][134294] Updated weights for policy 0, policy_version 64844 (0.0013) [2025-01-04 02:03:31,014][134294] Updated weights for policy 0, policy_version 64854 (0.0013) [2025-01-04 02:03:32,907][134294] Updated weights for policy 0, policy_version 64864 (0.0013) [2025-01-04 02:03:33,967][134211] Fps is (10 sec: 19661.3, 60 sec: 15633.1, 300 sec: 15648.1). Total num frames: 265703424. Throughput: 0: 3933.8. Samples: 55584694. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:03:33,968][134211] Avg episode reward: [(0, '6.161')] [2025-01-04 02:03:34,803][134294] Updated weights for policy 0, policy_version 64874 (0.0013) [2025-01-04 02:03:36,687][134294] Updated weights for policy 0, policy_version 64884 (0.0012) [2025-01-04 02:03:38,968][134211] Fps is (10 sec: 20479.1, 60 sec: 15974.4, 300 sec: 15620.3). Total num frames: 265801728. Throughput: 0: 4083.9. Samples: 55617024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:03:38,969][134211] Avg episode reward: [(0, '6.520')] [2025-01-04 02:03:39,427][134294] Updated weights for policy 0, policy_version 64894 (0.0022) [2025-01-04 02:03:42,747][134294] Updated weights for policy 0, policy_version 64904 (0.0028) [2025-01-04 02:03:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15837.9, 300 sec: 15467.6). Total num frames: 265859072. Throughput: 0: 3849.2. Samples: 55636170. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:03:43,968][134211] Avg episode reward: [(0, '6.623')] [2025-01-04 02:03:46,010][134294] Updated weights for policy 0, policy_version 64914 (0.0026) [2025-01-04 02:03:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15837.8, 300 sec: 15481.5). Total num frames: 265920512. Throughput: 0: 3782.7. Samples: 55645786. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:03:48,968][134211] Avg episode reward: [(0, '6.873')] [2025-01-04 02:03:49,355][134294] Updated weights for policy 0, policy_version 64924 (0.0026) [2025-01-04 02:03:52,824][134294] Updated weights for policy 0, policy_version 64934 (0.0030) [2025-01-04 02:03:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15155.1, 300 sec: 15495.4). Total num frames: 265981952. Throughput: 0: 3765.7. Samples: 55663892. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:03:53,968][134211] Avg episode reward: [(0, '6.643')] [2025-01-04 02:03:56,096][134294] Updated weights for policy 0, policy_version 64944 (0.0026) [2025-01-04 02:03:58,637][134294] Updated weights for policy 0, policy_version 64954 (0.0017) [2025-01-04 02:03:58,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14745.6, 300 sec: 15550.9). Total num frames: 266055680. Throughput: 0: 3782.2. Samples: 55683856. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:03:58,968][134211] Avg episode reward: [(0, '6.613')] [2025-01-04 02:04:00,507][134294] Updated weights for policy 0, policy_version 64964 (0.0013) [2025-01-04 02:04:02,396][134294] Updated weights for policy 0, policy_version 64974 (0.0014) [2025-01-04 02:04:03,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15428.3, 300 sec: 15675.9). Total num frames: 266166272. Throughput: 0: 3917.3. Samples: 55700016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:04:03,968][134211] Avg episode reward: [(0, '7.064')] [2025-01-04 02:04:04,279][134294] Updated weights for policy 0, policy_version 64984 (0.0013) [2025-01-04 02:04:06,193][134294] Updated weights for policy 0, policy_version 64994 (0.0013) [2025-01-04 02:04:08,083][134294] Updated weights for policy 0, policy_version 65004 (0.0015) [2025-01-04 02:04:08,968][134211] Fps is (10 sec: 21707.3, 60 sec: 16110.9, 300 sec: 15675.8). Total num frames: 266272768. Throughput: 0: 4182.0. Samples: 55732436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:04:08,969][134211] Avg episode reward: [(0, '6.208')] [2025-01-04 02:04:10,608][134294] Updated weights for policy 0, policy_version 65014 (0.0022) [2025-01-04 02:04:13,716][134294] Updated weights for policy 0, policy_version 65024 (0.0027) [2025-01-04 02:04:13,968][134211] Fps is (10 sec: 17202.2, 60 sec: 16110.9, 300 sec: 15523.1). Total num frames: 266338304. Throughput: 0: 4165.5. Samples: 55756030. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:04:13,969][134211] Avg episode reward: [(0, '7.191')] [2025-01-04 02:04:16,934][134294] Updated weights for policy 0, policy_version 65034 (0.0030) [2025-01-04 02:04:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 16111.0, 300 sec: 15467.6). Total num frames: 266403840. Throughput: 0: 4018.4. Samples: 55765524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:18,969][134211] Avg episode reward: [(0, '7.468')] [2025-01-04 02:04:20,156][134294] Updated weights for policy 0, policy_version 65044 (0.0029) [2025-01-04 02:04:23,229][134294] Updated weights for policy 0, policy_version 65054 (0.0028) [2025-01-04 02:04:23,968][134211] Fps is (10 sec: 13107.8, 60 sec: 16042.7, 300 sec: 15481.5). Total num frames: 266469376. Throughput: 0: 3733.0. Samples: 55785010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:23,968][134211] Avg episode reward: [(0, '6.879')] [2025-01-04 02:04:26,378][134294] Updated weights for policy 0, policy_version 65064 (0.0025) [2025-01-04 02:04:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15633.0, 300 sec: 15495.4). Total num frames: 266534912. Throughput: 0: 3742.4. Samples: 55804578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:28,968][134211] Avg episode reward: [(0, '7.275')] [2025-01-04 02:04:29,502][134294] Updated weights for policy 0, policy_version 65074 (0.0026) [2025-01-04 02:04:32,491][134294] Updated weights for policy 0, policy_version 65084 (0.0024) [2025-01-04 02:04:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14950.3, 300 sec: 15495.4). Total num frames: 266600448. Throughput: 0: 3751.6. Samples: 55814610. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:33,969][134211] Avg episode reward: [(0, '6.833')] [2025-01-04 02:04:35,377][134294] Updated weights for policy 0, policy_version 65094 (0.0022) [2025-01-04 02:04:37,260][134294] Updated weights for policy 0, policy_version 65104 (0.0013) [2025-01-04 02:04:38,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14950.5, 300 sec: 15606.5). Total num frames: 266698752. Throughput: 0: 3896.7. Samples: 55839242. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:38,968][134211] Avg episode reward: [(0, '6.702')] [2025-01-04 02:04:39,152][134294] Updated weights for policy 0, policy_version 65114 (0.0013) [2025-01-04 02:04:41,044][134294] Updated weights for policy 0, policy_version 65124 (0.0012) [2025-01-04 02:04:42,938][134294] Updated weights for policy 0, policy_version 65134 (0.0012) [2025-01-04 02:04:43,967][134211] Fps is (10 sec: 20890.3, 60 sec: 15837.9, 300 sec: 15731.4). Total num frames: 266809344. Throughput: 0: 4177.5. Samples: 55871842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:43,968][134211] Avg episode reward: [(0, '7.048')] [2025-01-04 02:04:44,802][134294] Updated weights for policy 0, policy_version 65144 (0.0015) [2025-01-04 02:04:47,701][134294] Updated weights for policy 0, policy_version 65154 (0.0024) [2025-01-04 02:04:48,969][134211] Fps is (10 sec: 18430.1, 60 sec: 16042.5, 300 sec: 15717.5). Total num frames: 266883072. Throughput: 0: 4131.3. Samples: 55885928. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:48,970][134211] Avg episode reward: [(0, '7.094')] [2025-01-04 02:04:50,908][134294] Updated weights for policy 0, policy_version 65164 (0.0028) [2025-01-04 02:04:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 16111.0, 300 sec: 15662.0). Total num frames: 266948608. Throughput: 0: 3842.9. Samples: 55905364. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:04:53,968][134211] Avg episode reward: [(0, '6.993')] [2025-01-04 02:04:54,288][134294] Updated weights for policy 0, policy_version 65174 (0.0027) [2025-01-04 02:04:57,918][134294] Updated weights for policy 0, policy_version 65184 (0.0027) [2025-01-04 02:04:58,968][134211] Fps is (10 sec: 11879.4, 60 sec: 15769.5, 300 sec: 15481.5). Total num frames: 267001856. Throughput: 0: 3694.6. Samples: 55922286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:04:58,968][134211] Avg episode reward: [(0, '6.548')] [2025-01-04 02:05:01,024][134294] Updated weights for policy 0, policy_version 65194 (0.0028) [2025-01-04 02:05:03,132][134294] Updated weights for policy 0, policy_version 65204 (0.0012) [2025-01-04 02:05:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15428.3, 300 sec: 15453.7). Total num frames: 267091968. Throughput: 0: 3707.8. Samples: 55932372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:03,968][134211] Avg episode reward: [(0, '7.079')] [2025-01-04 02:05:05,102][134294] Updated weights for policy 0, policy_version 65214 (0.0014) [2025-01-04 02:05:06,955][134294] Updated weights for policy 0, policy_version 65224 (0.0013) [2025-01-04 02:05:08,968][134211] Fps is (10 sec: 18841.9, 60 sec: 15291.9, 300 sec: 15578.7). Total num frames: 267190272. Throughput: 0: 3972.9. Samples: 55963792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:08,968][134211] Avg episode reward: [(0, '6.682')] [2025-01-04 02:05:09,301][134294] Updated weights for policy 0, policy_version 65234 (0.0021) [2025-01-04 02:05:12,432][134294] Updated weights for policy 0, policy_version 65244 (0.0029) [2025-01-04 02:05:13,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15291.8, 300 sec: 15578.7). Total num frames: 267255808. Throughput: 0: 4014.8. Samples: 55985244. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:13,968][134211] Avg episode reward: [(0, '7.006')] [2025-01-04 02:05:14,026][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000065249_267259904.pth... [2025-01-04 02:05:14,095][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000064334_263512064.pth [2025-01-04 02:05:15,763][134294] Updated weights for policy 0, policy_version 65254 (0.0029) [2025-01-04 02:05:18,667][134294] Updated weights for policy 0, policy_version 65264 (0.0026) [2025-01-04 02:05:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15291.8, 300 sec: 15592.7). Total num frames: 267321344. Throughput: 0: 4007.0. Samples: 55994924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:18,968][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 02:05:21,729][134294] Updated weights for policy 0, policy_version 65274 (0.0025) [2025-01-04 02:05:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15360.0, 300 sec: 15550.9). Total num frames: 267390976. Throughput: 0: 3912.6. Samples: 56015310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:23,968][134211] Avg episode reward: [(0, '6.859')] [2025-01-04 02:05:24,837][134294] Updated weights for policy 0, policy_version 65284 (0.0027) [2025-01-04 02:05:26,890][134294] Updated weights for policy 0, policy_version 65294 (0.0014) [2025-01-04 02:05:28,735][134294] Updated weights for policy 0, policy_version 65304 (0.0013) [2025-01-04 02:05:28,967][134211] Fps is (10 sec: 16793.9, 60 sec: 15906.2, 300 sec: 15648.1). Total num frames: 267489280. Throughput: 0: 3770.3. Samples: 56041504. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:28,968][134211] Avg episode reward: [(0, '6.890')] [2025-01-04 02:05:30,616][134294] Updated weights for policy 0, policy_version 65314 (0.0013) [2025-01-04 02:05:32,478][134294] Updated weights for policy 0, policy_version 65324 (0.0013) [2025-01-04 02:05:33,968][134211] Fps is (10 sec: 19250.8, 60 sec: 16384.0, 300 sec: 15703.6). Total num frames: 267583488. Throughput: 0: 3823.0. Samples: 56057958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:05:33,969][134211] Avg episode reward: [(0, '7.350')] [2025-01-04 02:05:35,658][134294] Updated weights for policy 0, policy_version 65334 (0.0027) [2025-01-04 02:05:38,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15769.5, 300 sec: 15550.9). Total num frames: 267644928. Throughput: 0: 3869.4. Samples: 56079486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:05:38,969][134211] Avg episode reward: [(0, '6.752')] [2025-01-04 02:05:39,289][134294] Updated weights for policy 0, policy_version 65344 (0.0032) [2025-01-04 02:05:42,829][134294] Updated weights for policy 0, policy_version 65354 (0.0026) [2025-01-04 02:05:43,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14882.1, 300 sec: 15384.3). Total num frames: 267702272. Throughput: 0: 3878.0. Samples: 56096796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:05:43,968][134211] Avg episode reward: [(0, '7.241')] [2025-01-04 02:05:45,563][134294] Updated weights for policy 0, policy_version 65364 (0.0020) [2025-01-04 02:05:47,629][134294] Updated weights for policy 0, policy_version 65374 (0.0013) [2025-01-04 02:05:48,967][134211] Fps is (10 sec: 15565.3, 60 sec: 15292.0, 300 sec: 15509.3). Total num frames: 267800576. Throughput: 0: 3932.7. Samples: 56109342. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:05:48,968][134211] Avg episode reward: [(0, '7.057')] [2025-01-04 02:05:49,529][134294] Updated weights for policy 0, policy_version 65384 (0.0013) [2025-01-04 02:05:51,411][134294] Updated weights for policy 0, policy_version 65394 (0.0013) [2025-01-04 02:05:53,260][134294] Updated weights for policy 0, policy_version 65404 (0.0013) [2025-01-04 02:05:53,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15974.4, 300 sec: 15689.8). Total num frames: 267907072. Throughput: 0: 3936.4. Samples: 56140930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:05:53,968][134211] Avg episode reward: [(0, '7.048')] [2025-01-04 02:05:55,933][134294] Updated weights for policy 0, policy_version 65414 (0.0022) [2025-01-04 02:05:58,968][134211] Fps is (10 sec: 17202.9, 60 sec: 16179.2, 300 sec: 15703.7). Total num frames: 267972608. Throughput: 0: 3981.7. Samples: 56164422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:05:58,968][134211] Avg episode reward: [(0, '6.843')] [2025-01-04 02:05:59,121][134294] Updated weights for policy 0, policy_version 65424 (0.0029) [2025-01-04 02:06:02,387][134294] Updated weights for policy 0, policy_version 65434 (0.0027) [2025-01-04 02:06:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15769.6, 300 sec: 15703.7). Total num frames: 268038144. Throughput: 0: 3976.8. Samples: 56173878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:06:03,968][134211] Avg episode reward: [(0, '6.247')] [2025-01-04 02:06:05,475][134294] Updated weights for policy 0, policy_version 65444 (0.0025) [2025-01-04 02:06:08,542][134294] Updated weights for policy 0, policy_version 65454 (0.0022) [2025-01-04 02:06:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.4, 300 sec: 15578.7). Total num frames: 268103680. Throughput: 0: 3958.8. Samples: 56193456. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:06:08,969][134211] Avg episode reward: [(0, '6.613')] [2025-01-04 02:06:11,586][134294] Updated weights for policy 0, policy_version 65464 (0.0025) [2025-01-04 02:06:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.5, 300 sec: 15439.8). Total num frames: 268169216. Throughput: 0: 3820.0. Samples: 56213404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:06:13,968][134211] Avg episode reward: [(0, '7.044')] [2025-01-04 02:06:14,720][134294] Updated weights for policy 0, policy_version 65474 (0.0027) [2025-01-04 02:06:17,564][134294] Updated weights for policy 0, policy_version 65484 (0.0020) [2025-01-04 02:06:18,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15496.5, 300 sec: 15439.8). Total num frames: 268251136. Throughput: 0: 3676.6. Samples: 56223404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:18,968][134211] Avg episode reward: [(0, '6.791')] [2025-01-04 02:06:19,616][134294] Updated weights for policy 0, policy_version 65494 (0.0017) [2025-01-04 02:06:22,623][134294] Updated weights for policy 0, policy_version 65504 (0.0023) [2025-01-04 02:06:23,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15496.5, 300 sec: 15453.8). Total num frames: 268320768. Throughput: 0: 3743.4. Samples: 56247940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:23,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 02:06:25,584][134294] Updated weights for policy 0, policy_version 65514 (0.0022) [2025-01-04 02:06:27,476][134294] Updated weights for policy 0, policy_version 65524 (0.0014) [2025-01-04 02:06:28,967][134211] Fps is (10 sec: 16384.2, 60 sec: 15428.3, 300 sec: 15550.9). Total num frames: 268414976. Throughput: 0: 3929.2. Samples: 56273608. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:28,968][134211] Avg episode reward: [(0, '6.572')] [2025-01-04 02:06:29,354][134294] Updated weights for policy 0, policy_version 65534 (0.0015) [2025-01-04 02:06:31,269][134294] Updated weights for policy 0, policy_version 65544 (0.0013) [2025-01-04 02:06:33,197][134294] Updated weights for policy 0, policy_version 65554 (0.0013) [2025-01-04 02:06:33,968][134211] Fps is (10 sec: 19660.7, 60 sec: 15564.8, 300 sec: 15689.8). Total num frames: 268517376. Throughput: 0: 4013.3. Samples: 56289940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:33,969][134211] Avg episode reward: [(0, '6.764')] [2025-01-04 02:06:36,169][134294] Updated weights for policy 0, policy_version 65564 (0.0026) [2025-01-04 02:06:38,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15633.1, 300 sec: 15675.9). Total num frames: 268582912. Throughput: 0: 3844.7. Samples: 56313942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:38,968][134211] Avg episode reward: [(0, '6.799')] [2025-01-04 02:06:39,573][134294] Updated weights for policy 0, policy_version 65574 (0.0027) [2025-01-04 02:06:42,669][134294] Updated weights for policy 0, policy_version 65584 (0.0027) [2025-01-04 02:06:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15769.6, 300 sec: 15675.9). Total num frames: 268648448. Throughput: 0: 3739.1. Samples: 56332682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:43,968][134211] Avg episode reward: [(0, '7.009')] [2025-01-04 02:06:44,974][134294] Updated weights for policy 0, policy_version 65594 (0.0018) [2025-01-04 02:06:47,464][134294] Updated weights for policy 0, policy_version 65604 (0.0021) [2025-01-04 02:06:48,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15496.4, 300 sec: 15620.3). Total num frames: 268730368. Throughput: 0: 3847.9. Samples: 56347034. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:48,969][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 02:06:50,533][134294] Updated weights for policy 0, policy_version 65614 (0.0024) [2025-01-04 02:06:53,699][134294] Updated weights for policy 0, policy_version 65624 (0.0024) [2025-01-04 02:06:53,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14813.8, 300 sec: 15467.6). Total num frames: 268795904. Throughput: 0: 3873.5. Samples: 56367764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:06:53,968][134211] Avg episode reward: [(0, '7.906')] [2025-01-04 02:06:56,703][134294] Updated weights for policy 0, policy_version 65634 (0.0022) [2025-01-04 02:06:58,790][134294] Updated weights for policy 0, policy_version 65644 (0.0013) [2025-01-04 02:06:58,967][134211] Fps is (10 sec: 15155.9, 60 sec: 15155.3, 300 sec: 15467.6). Total num frames: 268881920. Throughput: 0: 3915.9. Samples: 56389618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:06:58,968][134211] Avg episode reward: [(0, '7.357')] [2025-01-04 02:07:00,806][134294] Updated weights for policy 0, policy_version 65654 (0.0013) [2025-01-04 02:07:02,712][134294] Updated weights for policy 0, policy_version 65664 (0.0014) [2025-01-04 02:07:03,967][134211] Fps is (10 sec: 18842.2, 60 sec: 15769.6, 300 sec: 15606.5). Total num frames: 268984320. Throughput: 0: 4035.5. Samples: 56405002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:03,968][134211] Avg episode reward: [(0, '6.506')] [2025-01-04 02:07:04,598][134294] Updated weights for policy 0, policy_version 65674 (0.0013) [2025-01-04 02:07:06,493][134294] Updated weights for policy 0, policy_version 65684 (0.0013) [2025-01-04 02:07:08,968][134211] Fps is (10 sec: 19250.6, 60 sec: 16179.2, 300 sec: 15689.8). Total num frames: 269074432. Throughput: 0: 4191.1. Samples: 56436538. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:08,971][134211] Avg episode reward: [(0, '6.667')] [2025-01-04 02:07:09,493][134294] Updated weights for policy 0, policy_version 65694 (0.0027) [2025-01-04 02:07:12,795][134294] Updated weights for policy 0, policy_version 65704 (0.0026) [2025-01-04 02:07:13,969][134211] Fps is (10 sec: 15152.9, 60 sec: 16110.6, 300 sec: 15661.9). Total num frames: 269135872. Throughput: 0: 4037.2. Samples: 56455288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:13,970][134211] Avg episode reward: [(0, '7.286')] [2025-01-04 02:07:14,033][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000065708_269139968.pth... [2025-01-04 02:07:14,107][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000064788_265371648.pth [2025-01-04 02:07:15,967][134294] Updated weights for policy 0, policy_version 65714 (0.0025) [2025-01-04 02:07:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15837.8, 300 sec: 15662.0). Total num frames: 269201408. Throughput: 0: 3890.4. Samples: 56465010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:18,969][134211] Avg episode reward: [(0, '6.514')] [2025-01-04 02:07:18,979][134294] Updated weights for policy 0, policy_version 65724 (0.0026) [2025-01-04 02:07:22,087][134294] Updated weights for policy 0, policy_version 65734 (0.0026) [2025-01-04 02:07:23,968][134211] Fps is (10 sec: 13518.3, 60 sec: 15837.8, 300 sec: 15634.2). Total num frames: 269271040. Throughput: 0: 3809.1. Samples: 56485350. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:23,969][134211] Avg episode reward: [(0, '6.300')] [2025-01-04 02:07:24,962][134294] Updated weights for policy 0, policy_version 65744 (0.0024) [2025-01-04 02:07:28,085][134294] Updated weights for policy 0, policy_version 65754 (0.0023) [2025-01-04 02:07:28,969][134211] Fps is (10 sec: 13514.9, 60 sec: 15359.6, 300 sec: 15495.3). Total num frames: 269336576. Throughput: 0: 3839.0. Samples: 56505444. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:28,970][134211] Avg episode reward: [(0, '6.308')] [2025-01-04 02:07:31,323][134294] Updated weights for policy 0, policy_version 65764 (0.0021) [2025-01-04 02:07:33,810][134294] Updated weights for policy 0, policy_version 65774 (0.0018) [2025-01-04 02:07:33,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14882.2, 300 sec: 15481.5). Total num frames: 269410304. Throughput: 0: 3732.2. Samples: 56514980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:07:33,968][134211] Avg episode reward: [(0, '6.204')] [2025-01-04 02:07:35,712][134294] Updated weights for policy 0, policy_version 65784 (0.0013) [2025-01-04 02:07:37,634][134294] Updated weights for policy 0, policy_version 65794 (0.0014) [2025-01-04 02:07:38,968][134211] Fps is (10 sec: 18435.1, 60 sec: 15633.1, 300 sec: 15634.2). Total num frames: 269520896. Throughput: 0: 3916.8. Samples: 56544020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:07:38,968][134211] Avg episode reward: [(0, '7.069')] [2025-01-04 02:07:39,507][134294] Updated weights for policy 0, policy_version 65804 (0.0013) [2025-01-04 02:07:41,384][134294] Updated weights for policy 0, policy_version 65814 (0.0013) [2025-01-04 02:07:43,878][134294] Updated weights for policy 0, policy_version 65824 (0.0020) [2025-01-04 02:07:43,968][134211] Fps is (10 sec: 20479.5, 60 sec: 16110.9, 300 sec: 15745.3). Total num frames: 269615104. Throughput: 0: 4114.9. Samples: 56574788. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:07:43,968][134211] Avg episode reward: [(0, '7.417')] [2025-01-04 02:07:47,142][134294] Updated weights for policy 0, policy_version 65834 (0.0027) [2025-01-04 02:07:48,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15769.6, 300 sec: 15606.4). Total num frames: 269676544. Throughput: 0: 3980.9. Samples: 56584142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:07:48,968][134211] Avg episode reward: [(0, '6.708')] [2025-01-04 02:07:50,244][134294] Updated weights for policy 0, policy_version 65844 (0.0028) [2025-01-04 02:07:53,409][134294] Updated weights for policy 0, policy_version 65854 (0.0026) [2025-01-04 02:07:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15769.6, 300 sec: 15495.4). Total num frames: 269742080. Throughput: 0: 3717.9. Samples: 56603842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:07:53,968][134211] Avg episode reward: [(0, '7.519')] [2025-01-04 02:07:56,491][134294] Updated weights for policy 0, policy_version 65864 (0.0024) [2025-01-04 02:07:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15428.2, 300 sec: 15481.5). Total num frames: 269807616. Throughput: 0: 3734.4. Samples: 56623330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:07:58,968][134211] Avg episode reward: [(0, '7.149')] [2025-01-04 02:07:59,609][134294] Updated weights for policy 0, policy_version 65874 (0.0027) [2025-01-04 02:08:01,835][134294] Updated weights for policy 0, policy_version 65884 (0.0015) [2025-01-04 02:08:03,817][134294] Updated weights for policy 0, policy_version 65894 (0.0012) [2025-01-04 02:08:03,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15291.7, 300 sec: 15578.7). Total num frames: 269901824. Throughput: 0: 3778.1. Samples: 56635024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:08:03,968][134211] Avg episode reward: [(0, '7.133')] [2025-01-04 02:08:05,684][134294] Updated weights for policy 0, policy_version 65904 (0.0012) [2025-01-04 02:08:07,575][134294] Updated weights for policy 0, policy_version 65914 (0.0014) [2025-01-04 02:08:08,967][134211] Fps is (10 sec: 20480.6, 60 sec: 15633.1, 300 sec: 15731.5). Total num frames: 270012416. Throughput: 0: 4037.3. Samples: 56667028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:08:08,968][134211] Avg episode reward: [(0, '6.970')] [2025-01-04 02:08:09,471][134294] Updated weights for policy 0, policy_version 65924 (0.0013) [2025-01-04 02:08:11,366][134294] Updated weights for policy 0, policy_version 65934 (0.0013) [2025-01-04 02:08:13,968][134211] Fps is (10 sec: 20070.2, 60 sec: 16111.3, 300 sec: 15814.8). Total num frames: 270102528. Throughput: 0: 4241.7. Samples: 56696312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:08:13,969][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 02:08:14,353][134294] Updated weights for policy 0, policy_version 65944 (0.0024) [2025-01-04 02:08:17,562][134294] Updated weights for policy 0, policy_version 65954 (0.0028) [2025-01-04 02:08:18,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15974.4, 300 sec: 15773.1). Total num frames: 270159872. Throughput: 0: 4233.4. Samples: 56705484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:18,968][134211] Avg episode reward: [(0, '7.075')] [2025-01-04 02:08:20,937][134294] Updated weights for policy 0, policy_version 65964 (0.0027) [2025-01-04 02:08:23,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15906.2, 300 sec: 15689.7). Total num frames: 270225408. Throughput: 0: 4008.3. Samples: 56724396. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:23,968][134211] Avg episode reward: [(0, '6.672')] [2025-01-04 02:08:24,048][134294] Updated weights for policy 0, policy_version 65974 (0.0026) [2025-01-04 02:08:27,448][134294] Updated weights for policy 0, policy_version 65984 (0.0028) [2025-01-04 02:08:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15838.3, 300 sec: 15537.0). Total num frames: 270286848. Throughput: 0: 3740.1. Samples: 56743094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:28,968][134211] Avg episode reward: [(0, '7.002')] [2025-01-04 02:08:30,777][134294] Updated weights for policy 0, policy_version 65994 (0.0028) [2025-01-04 02:08:33,098][134294] Updated weights for policy 0, policy_version 66004 (0.0017) [2025-01-04 02:08:33,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15974.4, 300 sec: 15481.5). Total num frames: 270368768. Throughput: 0: 3739.2. Samples: 56752406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:33,968][134211] Avg episode reward: [(0, '6.391')] [2025-01-04 02:08:35,013][134294] Updated weights for policy 0, policy_version 66014 (0.0012) [2025-01-04 02:08:36,891][134294] Updated weights for policy 0, policy_version 66024 (0.0013) [2025-01-04 02:08:38,829][134294] Updated weights for policy 0, policy_version 66034 (0.0015) [2025-01-04 02:08:38,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15974.4, 300 sec: 15662.0). Total num frames: 270479360. Throughput: 0: 3988.7. Samples: 56783332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:38,968][134211] Avg episode reward: [(0, '6.763')] [2025-01-04 02:08:40,710][134294] Updated weights for policy 0, policy_version 66044 (0.0013) [2025-01-04 02:08:42,613][134294] Updated weights for policy 0, policy_version 66054 (0.0014) [2025-01-04 02:08:43,968][134211] Fps is (10 sec: 20889.1, 60 sec: 16042.7, 300 sec: 15787.0). Total num frames: 270577664. Throughput: 0: 4265.9. Samples: 56815294. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:43,968][134211] Avg episode reward: [(0, '7.514')] [2025-01-04 02:08:45,289][134294] Updated weights for policy 0, policy_version 66064 (0.0021) [2025-01-04 02:08:48,468][134294] Updated weights for policy 0, policy_version 66074 (0.0026) [2025-01-04 02:08:48,968][134211] Fps is (10 sec: 16383.7, 60 sec: 16110.9, 300 sec: 15800.8). Total num frames: 270643200. Throughput: 0: 4236.1. Samples: 56825648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:48,968][134211] Avg episode reward: [(0, '6.900')] [2025-01-04 02:08:51,586][134294] Updated weights for policy 0, policy_version 66084 (0.0029) [2025-01-04 02:08:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 16111.0, 300 sec: 15773.1). Total num frames: 270708736. Throughput: 0: 3955.8. Samples: 56845042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:08:53,968][134211] Avg episode reward: [(0, '7.590')] [2025-01-04 02:08:55,040][134294] Updated weights for policy 0, policy_version 66094 (0.0028) [2025-01-04 02:08:58,677][134294] Updated weights for policy 0, policy_version 66104 (0.0026) [2025-01-04 02:08:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15906.1, 300 sec: 15578.7). Total num frames: 270761984. Throughput: 0: 3688.8. Samples: 56862310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:08:58,969][134211] Avg episode reward: [(0, '6.909')] [2025-01-04 02:09:01,841][134294] Updated weights for policy 0, policy_version 66114 (0.0022) [2025-01-04 02:09:03,924][134294] Updated weights for policy 0, policy_version 66124 (0.0014) [2025-01-04 02:09:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15701.3, 300 sec: 15495.4). Total num frames: 270843904. Throughput: 0: 3675.2. Samples: 56870866. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:03,968][134211] Avg episode reward: [(0, '6.754')] [2025-01-04 02:09:05,882][134294] Updated weights for policy 0, policy_version 66134 (0.0014) [2025-01-04 02:09:07,775][134294] Updated weights for policy 0, policy_version 66144 (0.0014) [2025-01-04 02:09:08,968][134211] Fps is (10 sec: 18431.4, 60 sec: 15564.6, 300 sec: 15620.3). Total num frames: 270946304. Throughput: 0: 3936.0. Samples: 56901516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:08,969][134211] Avg episode reward: [(0, '6.885')] [2025-01-04 02:09:10,276][134294] Updated weights for policy 0, policy_version 66154 (0.0021) [2025-01-04 02:09:13,359][134294] Updated weights for policy 0, policy_version 66164 (0.0029) [2025-01-04 02:09:13,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15155.2, 300 sec: 15620.3). Total num frames: 271011840. Throughput: 0: 4034.4. Samples: 56924644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:13,968][134211] Avg episode reward: [(0, '6.564')] [2025-01-04 02:09:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000066166_271015936.pth... [2025-01-04 02:09:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000065249_267259904.pth [2025-01-04 02:09:16,533][134294] Updated weights for policy 0, policy_version 66174 (0.0027) [2025-01-04 02:09:18,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15291.7, 300 sec: 15620.3). Total num frames: 271077376. Throughput: 0: 4041.7. Samples: 56934284. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:18,968][134211] Avg episode reward: [(0, '6.647')] [2025-01-04 02:09:19,700][134294] Updated weights for policy 0, policy_version 66184 (0.0025) [2025-01-04 02:09:22,790][134294] Updated weights for policy 0, policy_version 66194 (0.0027) [2025-01-04 02:09:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.7, 300 sec: 15620.3). Total num frames: 271142912. Throughput: 0: 3789.6. Samples: 56953864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:23,968][134211] Avg episode reward: [(0, '7.135')] [2025-01-04 02:09:25,838][134294] Updated weights for policy 0, policy_version 66204 (0.0026) [2025-01-04 02:09:28,470][134294] Updated weights for policy 0, policy_version 66214 (0.0020) [2025-01-04 02:09:28,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15564.9, 300 sec: 15662.0). Total num frames: 271220736. Throughput: 0: 3547.8. Samples: 56974946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:28,968][134211] Avg episode reward: [(0, '6.980')] [2025-01-04 02:09:30,334][134294] Updated weights for policy 0, policy_version 66224 (0.0013) [2025-01-04 02:09:32,221][134294] Updated weights for policy 0, policy_version 66234 (0.0013) [2025-01-04 02:09:33,967][134211] Fps is (10 sec: 18842.2, 60 sec: 16042.7, 300 sec: 15703.7). Total num frames: 271331328. Throughput: 0: 3678.2. Samples: 56991168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:09:33,968][134211] Avg episode reward: [(0, '6.840')] [2025-01-04 02:09:34,166][134294] Updated weights for policy 0, policy_version 66244 (0.0013) [2025-01-04 02:09:36,003][134294] Updated weights for policy 0, policy_version 66254 (0.0013) [2025-01-04 02:09:37,954][134294] Updated weights for policy 0, policy_version 66264 (0.0014) [2025-01-04 02:09:38,968][134211] Fps is (10 sec: 21708.6, 60 sec: 15974.4, 300 sec: 15689.8). Total num frames: 271437824. Throughput: 0: 3965.9. Samples: 57023508. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:09:38,968][134211] Avg episode reward: [(0, '6.606')] [2025-01-04 02:09:40,169][134294] Updated weights for policy 0, policy_version 66274 (0.0018) [2025-01-04 02:09:43,399][134294] Updated weights for policy 0, policy_version 66284 (0.0029) [2025-01-04 02:09:43,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15428.3, 300 sec: 15662.0). Total num frames: 271503360. Throughput: 0: 4115.0. Samples: 57047486. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:09:43,968][134211] Avg episode reward: [(0, '7.047')] [2025-01-04 02:09:46,521][134294] Updated weights for policy 0, policy_version 66294 (0.0027) [2025-01-04 02:09:48,969][134211] Fps is (10 sec: 13105.9, 60 sec: 15428.0, 300 sec: 15661.9). Total num frames: 271568896. Throughput: 0: 4137.5. Samples: 57057058. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:09:48,969][134211] Avg episode reward: [(0, '7.487')] [2025-01-04 02:09:49,823][134294] Updated weights for policy 0, policy_version 66304 (0.0026) [2025-01-04 02:09:52,748][134294] Updated weights for policy 0, policy_version 66314 (0.0026) [2025-01-04 02:09:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15428.2, 300 sec: 15703.6). Total num frames: 271634432. Throughput: 0: 3890.6. Samples: 57076590. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:09:53,969][134211] Avg episode reward: [(0, '7.001')] [2025-01-04 02:09:55,958][134294] Updated weights for policy 0, policy_version 66324 (0.0027) [2025-01-04 02:09:58,946][134294] Updated weights for policy 0, policy_version 66334 (0.0027) [2025-01-04 02:09:58,968][134211] Fps is (10 sec: 13518.0, 60 sec: 15701.4, 300 sec: 15634.2). Total num frames: 271704064. Throughput: 0: 3824.3. Samples: 57096738. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:09:58,968][134211] Avg episode reward: [(0, '6.742')] [2025-01-04 02:10:02,235][134294] Updated weights for policy 0, policy_version 66344 (0.0025) [2025-01-04 02:10:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15359.9, 300 sec: 15509.2). Total num frames: 271765504. Throughput: 0: 3820.0. Samples: 57106186. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:10:03,969][134211] Avg episode reward: [(0, '7.040')] [2025-01-04 02:10:05,311][134294] Updated weights for policy 0, policy_version 66354 (0.0030) [2025-01-04 02:10:07,241][134294] Updated weights for policy 0, policy_version 66364 (0.0012) [2025-01-04 02:10:08,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15291.8, 300 sec: 15620.3). Total num frames: 271863808. Throughput: 0: 3907.6. Samples: 57129706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:10:08,968][134211] Avg episode reward: [(0, '7.781')] [2025-01-04 02:10:09,112][134294] Updated weights for policy 0, policy_version 66374 (0.0014) [2025-01-04 02:10:11,024][134294] Updated weights for policy 0, policy_version 66384 (0.0012) [2025-01-04 02:10:12,871][134294] Updated weights for policy 0, policy_version 66394 (0.0013) [2025-01-04 02:10:13,968][134211] Fps is (10 sec: 20070.7, 60 sec: 15906.2, 300 sec: 15745.3). Total num frames: 271966208. Throughput: 0: 4163.4. Samples: 57162298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:10:13,968][134211] Avg episode reward: [(0, '6.610')] [2025-01-04 02:10:15,471][134294] Updated weights for policy 0, policy_version 66404 (0.0023) [2025-01-04 02:10:18,631][134294] Updated weights for policy 0, policy_version 66414 (0.0027) [2025-01-04 02:10:18,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15974.4, 300 sec: 15745.3). Total num frames: 272035840. Throughput: 0: 4043.7. Samples: 57173136. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:18,968][134211] Avg episode reward: [(0, '6.992')] [2025-01-04 02:10:21,748][134294] Updated weights for policy 0, policy_version 66424 (0.0025) [2025-01-04 02:10:23,970][134211] Fps is (10 sec: 13103.8, 60 sec: 15905.5, 300 sec: 15620.2). Total num frames: 272097280. Throughput: 0: 3759.9. Samples: 57192712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:23,971][134211] Avg episode reward: [(0, '6.704')] [2025-01-04 02:10:24,938][134294] Updated weights for policy 0, policy_version 66434 (0.0027) [2025-01-04 02:10:28,081][134294] Updated weights for policy 0, policy_version 66444 (0.0027) [2025-01-04 02:10:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15701.3, 300 sec: 15523.2). Total num frames: 272162816. Throughput: 0: 3660.7. Samples: 57212216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:28,968][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 02:10:31,120][134294] Updated weights for policy 0, policy_version 66454 (0.0024) [2025-01-04 02:10:33,034][134294] Updated weights for policy 0, policy_version 66464 (0.0012) [2025-01-04 02:10:33,967][134211] Fps is (10 sec: 15569.2, 60 sec: 15360.0, 300 sec: 15620.4). Total num frames: 272252928. Throughput: 0: 3674.7. Samples: 57222416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:33,968][134211] Avg episode reward: [(0, '6.726')] [2025-01-04 02:10:34,892][134294] Updated weights for policy 0, policy_version 66474 (0.0012) [2025-01-04 02:10:36,792][134294] Updated weights for policy 0, policy_version 66484 (0.0014) [2025-01-04 02:10:38,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15223.4, 300 sec: 15759.2). Total num frames: 272351232. Throughput: 0: 3963.5. Samples: 57254948. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:38,968][134211] Avg episode reward: [(0, '6.888')] [2025-01-04 02:10:39,342][134294] Updated weights for policy 0, policy_version 66494 (0.0022) [2025-01-04 02:10:42,736][134294] Updated weights for policy 0, policy_version 66504 (0.0029) [2025-01-04 02:10:43,968][134211] Fps is (10 sec: 15973.5, 60 sec: 15155.1, 300 sec: 15634.2). Total num frames: 272412672. Throughput: 0: 3951.1. Samples: 57274540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:43,969][134211] Avg episode reward: [(0, '6.465')] [2025-01-04 02:10:46,169][134294] Updated weights for policy 0, policy_version 66514 (0.0026) [2025-01-04 02:10:48,651][134294] Updated weights for policy 0, policy_version 66524 (0.0016) [2025-01-04 02:10:48,967][134211] Fps is (10 sec: 13517.2, 60 sec: 15292.0, 300 sec: 15523.2). Total num frames: 272486400. Throughput: 0: 3943.1. Samples: 57283622. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:48,968][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 02:10:50,599][134294] Updated weights for policy 0, policy_version 66534 (0.0013) [2025-01-04 02:10:52,454][134294] Updated weights for policy 0, policy_version 66544 (0.0014) [2025-01-04 02:10:53,968][134211] Fps is (10 sec: 18431.4, 60 sec: 16042.5, 300 sec: 15675.8). Total num frames: 272596992. Throughput: 0: 4069.5. Samples: 57312836. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:10:53,969][134211] Avg episode reward: [(0, '7.030')] [2025-01-04 02:10:54,348][134294] Updated weights for policy 0, policy_version 66554 (0.0013) [2025-01-04 02:10:56,361][134294] Updated weights for policy 0, policy_version 66564 (0.0014) [2025-01-04 02:10:58,415][134294] Updated weights for policy 0, policy_version 66574 (0.0017) [2025-01-04 02:10:58,968][134211] Fps is (10 sec: 20889.0, 60 sec: 16520.5, 300 sec: 15787.0). Total num frames: 272695296. Throughput: 0: 4031.5. Samples: 57343716. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:10:58,968][134211] Avg episode reward: [(0, '6.573')] [2025-01-04 02:11:01,086][134294] Updated weights for policy 0, policy_version 66584 (0.0021) [2025-01-04 02:11:03,968][134211] Fps is (10 sec: 16384.5, 60 sec: 16588.7, 300 sec: 15786.9). Total num frames: 272760832. Throughput: 0: 4051.5. Samples: 57355456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:03,969][134211] Avg episode reward: [(0, '6.445')] [2025-01-04 02:11:04,695][134294] Updated weights for policy 0, policy_version 66594 (0.0029) [2025-01-04 02:11:07,860][134294] Updated weights for policy 0, policy_version 66604 (0.0028) [2025-01-04 02:11:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15974.4, 300 sec: 15773.1). Total num frames: 272822272. Throughput: 0: 4019.1. Samples: 57373560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:08,968][134211] Avg episode reward: [(0, '6.816')] [2025-01-04 02:11:10,906][134294] Updated weights for policy 0, policy_version 66614 (0.0024) [2025-01-04 02:11:13,901][134294] Updated weights for policy 0, policy_version 66624 (0.0024) [2025-01-04 02:11:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15428.2, 300 sec: 15731.4). Total num frames: 272891904. Throughput: 0: 4036.3. Samples: 57393850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:13,969][134211] Avg episode reward: [(0, '6.810')] [2025-01-04 02:11:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000066624_272891904.pth... [2025-01-04 02:11:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000065708_269139968.pth [2025-01-04 02:11:17,001][134294] Updated weights for policy 0, policy_version 66634 (0.0023) [2025-01-04 02:11:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15717.5). Total num frames: 272957440. 
Throughput: 0: 4023.6. Samples: 57403478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:18,968][134211] Avg episode reward: [(0, '6.615')] [2025-01-04 02:11:20,064][134294] Updated weights for policy 0, policy_version 66644 (0.0024) [2025-01-04 02:11:23,094][134294] Updated weights for policy 0, policy_version 66654 (0.0025) [2025-01-04 02:11:23,969][134211] Fps is (10 sec: 13106.1, 60 sec: 15428.7, 300 sec: 15620.3). Total num frames: 273022976. Throughput: 0: 3759.2. Samples: 57424118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:23,969][134211] Avg episode reward: [(0, '7.163')] [2025-01-04 02:11:25,987][134294] Updated weights for policy 0, policy_version 66664 (0.0026) [2025-01-04 02:11:28,967][134211] Fps is (10 sec: 13926.7, 60 sec: 15564.9, 300 sec: 15523.2). Total num frames: 273096704. Throughput: 0: 3785.1. Samples: 57444866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:28,968][134211] Avg episode reward: [(0, '7.184')] [2025-01-04 02:11:28,972][134294] Updated weights for policy 0, policy_version 66674 (0.0024) [2025-01-04 02:11:31,846][134294] Updated weights for policy 0, policy_version 66684 (0.0023) [2025-01-04 02:11:33,956][134294] Updated weights for policy 0, policy_version 66694 (0.0013) [2025-01-04 02:11:33,968][134211] Fps is (10 sec: 15566.5, 60 sec: 15428.2, 300 sec: 15578.7). Total num frames: 273178624. Throughput: 0: 3808.9. Samples: 57455022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:33,968][134211] Avg episode reward: [(0, '6.623')] [2025-01-04 02:11:35,857][134294] Updated weights for policy 0, policy_version 66704 (0.0013) [2025-01-04 02:11:37,738][134294] Updated weights for policy 0, policy_version 66714 (0.0013) [2025-01-04 02:11:38,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15564.9, 300 sec: 15717.5). Total num frames: 273285120. Throughput: 0: 3829.8. Samples: 57485172. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:11:38,968][134211] Avg episode reward: [(0, '6.214')] [2025-01-04 02:11:39,685][134294] Updated weights for policy 0, policy_version 66724 (0.0015) [2025-01-04 02:11:41,527][134294] Updated weights for policy 0, policy_version 66734 (0.0013) [2025-01-04 02:11:43,509][134294] Updated weights for policy 0, policy_version 66744 (0.0014) [2025-01-04 02:11:43,968][134211] Fps is (10 sec: 20889.4, 60 sec: 16247.6, 300 sec: 15787.0). Total num frames: 273387520. Throughput: 0: 3865.6. Samples: 57517668. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:11:43,968][134211] Avg episode reward: [(0, '6.366')] [2025-01-04 02:11:46,438][134294] Updated weights for policy 0, policy_version 66754 (0.0025) [2025-01-04 02:11:48,968][134211] Fps is (10 sec: 16793.3, 60 sec: 16110.9, 300 sec: 15787.0). Total num frames: 273453056. Throughput: 0: 3844.3. Samples: 57528448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:11:48,968][134211] Avg episode reward: [(0, '6.478')] [2025-01-04 02:11:49,772][134294] Updated weights for policy 0, policy_version 66764 (0.0029) [2025-01-04 02:11:53,070][134294] Updated weights for policy 0, policy_version 66774 (0.0026) [2025-01-04 02:11:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15291.9, 300 sec: 15703.6). Total num frames: 273514496. Throughput: 0: 3857.0. Samples: 57547126. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:11:53,969][134211] Avg episode reward: [(0, '6.585')] [2025-01-04 02:11:56,378][134294] Updated weights for policy 0, policy_version 66784 (0.0026) [2025-01-04 02:11:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 15578.7). Total num frames: 273580032. Throughput: 0: 3824.9. Samples: 57565972. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:11:58,968][134211] Avg episode reward: [(0, '6.911')] [2025-01-04 02:11:59,518][134294] Updated weights for policy 0, policy_version 66794 (0.0024) [2025-01-04 02:12:02,499][134294] Updated weights for policy 0, policy_version 66804 (0.0027) [2025-01-04 02:12:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14745.7, 300 sec: 15495.4). Total num frames: 273645568. Throughput: 0: 3831.3. Samples: 57575886. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:12:03,969][134211] Avg episode reward: [(0, '7.644')] [2025-01-04 02:12:05,332][134294] Updated weights for policy 0, policy_version 66814 (0.0021) [2025-01-04 02:12:07,251][134294] Updated weights for policy 0, policy_version 66824 (0.0015) [2025-01-04 02:12:08,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15360.0, 300 sec: 15620.4). Total num frames: 273743872. Throughput: 0: 3922.8. Samples: 57600638. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:12:08,968][134211] Avg episode reward: [(0, '6.672')] [2025-01-04 02:12:09,322][134294] Updated weights for policy 0, policy_version 66834 (0.0015) [2025-01-04 02:12:12,228][134294] Updated weights for policy 0, policy_version 66844 (0.0027) [2025-01-04 02:12:13,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15360.0, 300 sec: 15634.2). Total num frames: 273813504. Throughput: 0: 3990.1. Samples: 57624422. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:12:13,968][134211] Avg episode reward: [(0, '6.265')] [2025-01-04 02:12:15,448][134294] Updated weights for policy 0, policy_version 66854 (0.0027) [2025-01-04 02:12:18,564][134294] Updated weights for policy 0, policy_version 66864 (0.0022) [2025-01-04 02:12:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15360.0, 300 sec: 15620.4). Total num frames: 273879040. Throughput: 0: 3979.3. Samples: 57634090. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:12:18,968][134211] Avg episode reward: [(0, '6.337')] [2025-01-04 02:12:20,625][134294] Updated weights for policy 0, policy_version 66874 (0.0015) [2025-01-04 02:12:22,472][134294] Updated weights for policy 0, policy_version 66884 (0.0014) [2025-01-04 02:12:23,968][134211] Fps is (10 sec: 17203.7, 60 sec: 16043.0, 300 sec: 15759.3). Total num frames: 273985536. Throughput: 0: 3898.9. Samples: 57660624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:23,968][134211] Avg episode reward: [(0, '6.844')] [2025-01-04 02:12:24,409][134294] Updated weights for policy 0, policy_version 66894 (0.0014) [2025-01-04 02:12:26,385][134294] Updated weights for policy 0, policy_version 66904 (0.0015) [2025-01-04 02:12:28,968][134211] Fps is (10 sec: 18840.3, 60 sec: 16179.0, 300 sec: 15786.9). Total num frames: 274067456. Throughput: 0: 3801.7. Samples: 57688748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:28,969][134211] Avg episode reward: [(0, '6.620')] [2025-01-04 02:12:29,804][134294] Updated weights for policy 0, policy_version 66914 (0.0025) [2025-01-04 02:12:33,349][134294] Updated weights for policy 0, policy_version 66924 (0.0028) [2025-01-04 02:12:33,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15769.6, 300 sec: 15606.4). Total num frames: 274124800. Throughput: 0: 3745.4. Samples: 57696992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:33,968][134211] Avg episode reward: [(0, '7.277')] [2025-01-04 02:12:36,297][134294] Updated weights for policy 0, policy_version 66934 (0.0020) [2025-01-04 02:12:38,192][134294] Updated weights for policy 0, policy_version 66944 (0.0014) [2025-01-04 02:12:38,968][134211] Fps is (10 sec: 14746.6, 60 sec: 15496.5, 300 sec: 15592.6). Total num frames: 274214912. Throughput: 0: 3803.6. Samples: 57718288. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:38,968][134211] Avg episode reward: [(0, '6.281')] [2025-01-04 02:12:40,143][134294] Updated weights for policy 0, policy_version 66954 (0.0014) [2025-01-04 02:12:42,020][134294] Updated weights for policy 0, policy_version 66964 (0.0014) [2025-01-04 02:12:43,869][134294] Updated weights for policy 0, policy_version 66974 (0.0015) [2025-01-04 02:12:43,968][134211] Fps is (10 sec: 20070.7, 60 sec: 15633.1, 300 sec: 15759.2). Total num frames: 274325504. Throughput: 0: 4107.5. Samples: 57750808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:43,968][134211] Avg episode reward: [(0, '6.842')] [2025-01-04 02:12:45,755][134294] Updated weights for policy 0, policy_version 66984 (0.0014) [2025-01-04 02:12:47,685][134294] Updated weights for policy 0, policy_version 66994 (0.0014) [2025-01-04 02:12:48,968][134211] Fps is (10 sec: 21297.6, 60 sec: 16247.3, 300 sec: 15884.1). Total num frames: 274427904. Throughput: 0: 4249.8. Samples: 57767128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:48,969][134211] Avg episode reward: [(0, '6.781')] [2025-01-04 02:12:50,257][134294] Updated weights for policy 0, policy_version 67004 (0.0020) [2025-01-04 02:12:53,493][134294] Updated weights for policy 0, policy_version 67014 (0.0029) [2025-01-04 02:12:53,968][134211] Fps is (10 sec: 16793.7, 60 sec: 16315.8, 300 sec: 15884.2). Total num frames: 274493440. Throughput: 0: 4239.1. Samples: 57791396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:53,968][134211] Avg episode reward: [(0, '6.693')] [2025-01-04 02:12:56,664][134294] Updated weights for policy 0, policy_version 67024 (0.0028) [2025-01-04 02:12:58,970][134211] Fps is (10 sec: 12695.6, 60 sec: 16246.9, 300 sec: 15773.0). Total num frames: 274554880. Throughput: 0: 4117.6. Samples: 57809724. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:12:58,971][134211] Avg episode reward: [(0, '6.781')] [2025-01-04 02:13:00,427][134294] Updated weights for policy 0, policy_version 67034 (0.0027) [2025-01-04 02:13:03,968][134211] Fps is (10 sec: 11468.3, 60 sec: 16042.6, 300 sec: 15578.7). Total num frames: 274608128. Throughput: 0: 4086.4. Samples: 57817982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:03,969][134211] Avg episode reward: [(0, '7.093')] [2025-01-04 02:13:04,205][134294] Updated weights for policy 0, policy_version 67044 (0.0025) [2025-01-04 02:13:07,478][134294] Updated weights for policy 0, policy_version 67054 (0.0024) [2025-01-04 02:13:08,968][134211] Fps is (10 sec: 11471.3, 60 sec: 15428.3, 300 sec: 15481.5). Total num frames: 274669568. Throughput: 0: 3887.6. Samples: 57835568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:08,968][134211] Avg episode reward: [(0, '6.393')] [2025-01-04 02:13:10,591][134294] Updated weights for policy 0, policy_version 67064 (0.0027) [2025-01-04 02:13:13,420][134294] Updated weights for policy 0, policy_version 67074 (0.0025) [2025-01-04 02:13:13,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15428.3, 300 sec: 15523.1). Total num frames: 274739200. Throughput: 0: 3715.7. Samples: 57855954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:13,968][134211] Avg episode reward: [(0, '5.872')] [2025-01-04 02:13:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000067075_274739200.pth... [2025-01-04 02:13:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000066166_271015936.pth [2025-01-04 02:13:16,486][134294] Updated weights for policy 0, policy_version 67084 (0.0023) [2025-01-04 02:13:18,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15564.8, 300 sec: 15550.9). Total num frames: 274812928. 
Throughput: 0: 3755.3. Samples: 57865980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:18,968][134211] Avg episode reward: [(0, '6.280')] [2025-01-04 02:13:19,100][134294] Updated weights for policy 0, policy_version 67094 (0.0019) [2025-01-04 02:13:20,988][134294] Updated weights for policy 0, policy_version 67104 (0.0012) [2025-01-04 02:13:22,897][134294] Updated weights for policy 0, policy_version 67114 (0.0013) [2025-01-04 02:13:23,967][134211] Fps is (10 sec: 18023.0, 60 sec: 15564.8, 300 sec: 15703.7). Total num frames: 274919424. Throughput: 0: 3910.0. Samples: 57894236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:23,968][134211] Avg episode reward: [(0, '6.863')] [2025-01-04 02:13:24,739][134294] Updated weights for policy 0, policy_version 67124 (0.0011) [2025-01-04 02:13:26,644][134294] Updated weights for policy 0, policy_version 67134 (0.0012) [2025-01-04 02:13:28,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15838.0, 300 sec: 15759.2). Total num frames: 275017728. Throughput: 0: 3886.7. Samples: 57925710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:28,968][134211] Avg episode reward: [(0, '6.857')] [2025-01-04 02:13:29,050][134294] Updated weights for policy 0, policy_version 67144 (0.0020) [2025-01-04 02:13:32,182][134294] Updated weights for policy 0, policy_version 67154 (0.0027) [2025-01-04 02:13:33,969][134211] Fps is (10 sec: 16381.7, 60 sec: 15974.1, 300 sec: 15606.4). Total num frames: 275083264. Throughput: 0: 3744.3. Samples: 57935622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:33,969][134211] Avg episode reward: [(0, '6.665')] [2025-01-04 02:13:35,433][134294] Updated weights for policy 0, policy_version 67164 (0.0024) [2025-01-04 02:13:38,342][134294] Updated weights for policy 0, policy_version 67174 (0.0026) [2025-01-04 02:13:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15564.8, 300 sec: 15495.4). Total num frames: 275148800. Throughput: 0: 3643.5. 
Samples: 57955354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:13:38,969][134211] Avg episode reward: [(0, '7.137')] [2025-01-04 02:13:41,616][134294] Updated weights for policy 0, policy_version 67184 (0.0025) [2025-01-04 02:13:43,968][134211] Fps is (10 sec: 13108.6, 60 sec: 14813.8, 300 sec: 15495.4). Total num frames: 275214336. Throughput: 0: 3663.5. Samples: 57974576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:13:43,968][134211] Avg episode reward: [(0, '6.603')] [2025-01-04 02:13:44,853][134294] Updated weights for policy 0, policy_version 67194 (0.0026) [2025-01-04 02:13:47,835][134294] Updated weights for policy 0, policy_version 67204 (0.0023) [2025-01-04 02:13:48,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14336.2, 300 sec: 15523.2). Total num frames: 275288064. Throughput: 0: 3694.5. Samples: 57984232. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:13:48,968][134211] Avg episode reward: [(0, '6.196')] [2025-01-04 02:13:49,741][134294] Updated weights for policy 0, policy_version 67214 (0.0014) [2025-01-04 02:13:51,627][134294] Updated weights for policy 0, policy_version 67224 (0.0015) [2025-01-04 02:13:53,449][134294] Updated weights for policy 0, policy_version 67234 (0.0013) [2025-01-04 02:13:53,967][134211] Fps is (10 sec: 18432.5, 60 sec: 15087.0, 300 sec: 15717.5). Total num frames: 275398656. Throughput: 0: 3960.5. Samples: 58013790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:13:53,968][134211] Avg episode reward: [(0, '6.867')] [2025-01-04 02:13:55,551][134294] Updated weights for policy 0, policy_version 67244 (0.0015) [2025-01-04 02:13:58,705][134294] Updated weights for policy 0, policy_version 67254 (0.0028) [2025-01-04 02:13:58,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15292.3, 300 sec: 15689.8). Total num frames: 275472384. Throughput: 0: 4081.9. Samples: 58039640. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:13:58,969][134211] Avg episode reward: [(0, '7.202')] [2025-01-04 02:14:02,012][134294] Updated weights for policy 0, policy_version 67264 (0.0026) [2025-01-04 02:14:03,968][134211] Fps is (10 sec: 13516.2, 60 sec: 15428.3, 300 sec: 15550.9). Total num frames: 275533824. Throughput: 0: 4060.8. Samples: 58048716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:14:03,969][134211] Avg episode reward: [(0, '7.099')] [2025-01-04 02:14:05,281][134294] Updated weights for policy 0, policy_version 67274 (0.0024) [2025-01-04 02:14:08,510][134294] Updated weights for policy 0, policy_version 67284 (0.0025) [2025-01-04 02:14:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15496.5, 300 sec: 15550.9). Total num frames: 275599360. Throughput: 0: 3860.2. Samples: 58067946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:14:08,968][134211] Avg episode reward: [(0, '7.295')] [2025-01-04 02:14:11,415][134294] Updated weights for policy 0, policy_version 67294 (0.0025) [2025-01-04 02:14:13,380][134294] Updated weights for policy 0, policy_version 67304 (0.0013) [2025-01-04 02:14:13,967][134211] Fps is (10 sec: 15155.9, 60 sec: 15769.7, 300 sec: 15620.4). Total num frames: 275685376. Throughput: 0: 3684.4. Samples: 58091508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:14:13,968][134211] Avg episode reward: [(0, '6.739')] [2025-01-04 02:14:15,320][134294] Updated weights for policy 0, policy_version 67314 (0.0012) [2025-01-04 02:14:17,232][134294] Updated weights for policy 0, policy_version 67324 (0.0013) [2025-01-04 02:14:18,967][134211] Fps is (10 sec: 19661.3, 60 sec: 16384.0, 300 sec: 15773.1). Total num frames: 275795968. Throughput: 0: 3818.3. Samples: 58107442. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:14:18,968][134211] Avg episode reward: [(0, '7.267')] [2025-01-04 02:14:19,071][134294] Updated weights for policy 0, policy_version 67334 (0.0013) [2025-01-04 02:14:21,115][134294] Updated weights for policy 0, policy_version 67344 (0.0016) [2025-01-04 02:14:23,968][134211] Fps is (10 sec: 19250.5, 60 sec: 15974.3, 300 sec: 15786.9). Total num frames: 275877888. Throughput: 0: 4041.0. Samples: 58137198. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:23,969][134211] Avg episode reward: [(0, '6.612')] [2025-01-04 02:14:24,171][134294] Updated weights for policy 0, policy_version 67354 (0.0028) [2025-01-04 02:14:27,391][134294] Updated weights for policy 0, policy_version 67364 (0.0027) [2025-01-04 02:14:28,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15360.0, 300 sec: 15620.3). Total num frames: 275939328. Throughput: 0: 4037.0. Samples: 58156240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:28,969][134211] Avg episode reward: [(0, '7.166')] [2025-01-04 02:14:30,636][134294] Updated weights for policy 0, policy_version 67374 (0.0025) [2025-01-04 02:14:33,694][134294] Updated weights for policy 0, policy_version 67384 (0.0025) [2025-01-04 02:14:33,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15360.3, 300 sec: 15481.5). Total num frames: 276004864. Throughput: 0: 4038.3. Samples: 58165958. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:33,968][134211] Avg episode reward: [(0, '6.300')] [2025-01-04 02:14:36,786][134294] Updated weights for policy 0, policy_version 67394 (0.0024) [2025-01-04 02:14:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15428.3, 300 sec: 15495.4). Total num frames: 276074496. Throughput: 0: 3828.5. Samples: 58186072. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:38,968][134211] Avg episode reward: [(0, '6.591')] [2025-01-04 02:14:39,856][134294] Updated weights for policy 0, policy_version 67404 (0.0027) [2025-01-04 02:14:42,871][134294] Updated weights for policy 0, policy_version 67414 (0.0022) [2025-01-04 02:14:43,967][134211] Fps is (10 sec: 14336.2, 60 sec: 15564.9, 300 sec: 15523.2). Total num frames: 276148224. Throughput: 0: 3714.2. Samples: 58206776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:43,968][134211] Avg episode reward: [(0, '7.070')] [2025-01-04 02:14:44,751][134294] Updated weights for policy 0, policy_version 67424 (0.0014) [2025-01-04 02:14:46,648][134294] Updated weights for policy 0, policy_version 67434 (0.0012) [2025-01-04 02:14:48,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15906.1, 300 sec: 15620.3). Total num frames: 276242432. Throughput: 0: 3874.8. Samples: 58223080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:48,968][134211] Avg episode reward: [(0, '7.020')] [2025-01-04 02:14:49,315][134294] Updated weights for policy 0, policy_version 67444 (0.0024) [2025-01-04 02:14:52,623][134294] Updated weights for policy 0, policy_version 67454 (0.0026) [2025-01-04 02:14:53,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15155.2, 300 sec: 15606.5). Total num frames: 276307968. Throughput: 0: 3926.8. Samples: 58244654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:53,968][134211] Avg episode reward: [(0, '6.473')] [2025-01-04 02:14:55,042][134294] Updated weights for policy 0, policy_version 67464 (0.0020) [2025-01-04 02:14:56,982][134294] Updated weights for policy 0, policy_version 67474 (0.0013) [2025-01-04 02:14:58,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15564.8, 300 sec: 15731.4). Total num frames: 276406272. Throughput: 0: 4011.9. Samples: 58272044. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:14:58,968][134211] Avg episode reward: [(0, '6.316')] [2025-01-04 02:14:59,537][134294] Updated weights for policy 0, policy_version 67484 (0.0019) [2025-01-04 02:15:03,082][134294] Updated weights for policy 0, policy_version 67494 (0.0028) [2025-01-04 02:15:03,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15496.6, 300 sec: 15592.6). Total num frames: 276463616. Throughput: 0: 3869.4. Samples: 58281564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:15:03,968][134211] Avg episode reward: [(0, '6.496')] [2025-01-04 02:15:05,951][134294] Updated weights for policy 0, policy_version 67504 (0.0021) [2025-01-04 02:15:07,832][134294] Updated weights for policy 0, policy_version 67514 (0.0014) [2025-01-04 02:15:08,967][134211] Fps is (10 sec: 15565.1, 60 sec: 16042.7, 300 sec: 15578.7). Total num frames: 276561920. Throughput: 0: 3706.4. Samples: 58303986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:08,968][134211] Avg episode reward: [(0, '6.163')] [2025-01-04 02:15:09,684][134294] Updated weights for policy 0, policy_version 67524 (0.0014) [2025-01-04 02:15:11,580][134294] Updated weights for policy 0, policy_version 67534 (0.0014) [2025-01-04 02:15:13,695][134294] Updated weights for policy 0, policy_version 67544 (0.0018) [2025-01-04 02:15:13,969][134211] Fps is (10 sec: 19659.1, 60 sec: 16247.2, 300 sec: 15675.8). Total num frames: 276660224. Throughput: 0: 3998.0. Samples: 58336152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:13,969][134211] Avg episode reward: [(0, '6.476')] [2025-01-04 02:15:14,014][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000067545_276664320.pth... 
[2025-01-04 02:15:14,086][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000066624_272891904.pth [2025-01-04 02:15:17,075][134294] Updated weights for policy 0, policy_version 67554 (0.0028) [2025-01-04 02:15:18,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15428.2, 300 sec: 15676.0). Total num frames: 276721664. Throughput: 0: 3992.4. Samples: 58345614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:18,968][134211] Avg episode reward: [(0, '6.724')] [2025-01-04 02:15:20,256][134294] Updated weights for policy 0, policy_version 67564 (0.0029) [2025-01-04 02:15:23,314][134294] Updated weights for policy 0, policy_version 67574 (0.0029) [2025-01-04 02:15:23,968][134211] Fps is (10 sec: 13108.3, 60 sec: 15223.5, 300 sec: 15689.8). Total num frames: 276791296. Throughput: 0: 3985.0. Samples: 58365396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:23,968][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 02:15:26,344][134294] Updated weights for policy 0, policy_version 67584 (0.0024) [2025-01-04 02:15:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.8, 300 sec: 15606.4). Total num frames: 276856832. Throughput: 0: 3964.8. Samples: 58385192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:28,968][134211] Avg episode reward: [(0, '7.066')] [2025-01-04 02:15:29,528][134294] Updated weights for policy 0, policy_version 67594 (0.0026) [2025-01-04 02:15:31,821][134294] Updated weights for policy 0, policy_version 67604 (0.0017) [2025-01-04 02:15:33,771][134294] Updated weights for policy 0, policy_version 67614 (0.0012) [2025-01-04 02:15:33,967][134211] Fps is (10 sec: 15974.7, 60 sec: 15769.6, 300 sec: 15592.6). Total num frames: 276951040. Throughput: 0: 3845.5. Samples: 58396126. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:33,968][134211] Avg episode reward: [(0, '7.161')] [2025-01-04 02:15:35,624][134294] Updated weights for policy 0, policy_version 67624 (0.0013) [2025-01-04 02:15:37,768][134294] Updated weights for policy 0, policy_version 67634 (0.0016) [2025-01-04 02:15:38,971][134211] Fps is (10 sec: 18426.1, 60 sec: 16110.1, 300 sec: 15689.6). Total num frames: 277041152. Throughput: 0: 4079.9. Samples: 58428262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:38,972][134211] Avg episode reward: [(0, '6.890')] [2025-01-04 02:15:41,058][134294] Updated weights for policy 0, policy_version 67644 (0.0031) [2025-01-04 02:15:43,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15906.1, 300 sec: 15648.1). Total num frames: 277102592. Throughput: 0: 3887.5. Samples: 58446980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:15:43,968][134211] Avg episode reward: [(0, '6.166')] [2025-01-04 02:15:44,408][134294] Updated weights for policy 0, policy_version 67654 (0.0027) [2025-01-04 02:15:47,529][134294] Updated weights for policy 0, policy_version 67664 (0.0025) [2025-01-04 02:15:48,968][134211] Fps is (10 sec: 12701.6, 60 sec: 15428.3, 300 sec: 15495.4). Total num frames: 277168128. Throughput: 0: 3889.7. Samples: 58456600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:15:48,968][134211] Avg episode reward: [(0, '6.490')] [2025-01-04 02:15:50,358][134294] Updated weights for policy 0, policy_version 67674 (0.0022) [2025-01-04 02:15:52,302][134294] Updated weights for policy 0, policy_version 67684 (0.0013) [2025-01-04 02:15:53,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15974.4, 300 sec: 15495.4). Total num frames: 277266432. Throughput: 0: 3936.7. Samples: 58481140. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:15:53,968][134211] Avg episode reward: [(0, '6.348')] [2025-01-04 02:15:54,158][134294] Updated weights for policy 0, policy_version 67694 (0.0013) [2025-01-04 02:15:56,044][134294] Updated weights for policy 0, policy_version 67704 (0.0013) [2025-01-04 02:15:57,981][134294] Updated weights for policy 0, policy_version 67714 (0.0014) [2025-01-04 02:15:58,968][134211] Fps is (10 sec: 20070.5, 60 sec: 16042.7, 300 sec: 15620.4). Total num frames: 277368832. Throughput: 0: 3932.6. Samples: 58513116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:15:58,968][134211] Avg episode reward: [(0, '6.633')] [2025-01-04 02:16:01,067][134294] Updated weights for policy 0, policy_version 67724 (0.0025) [2025-01-04 02:16:03,968][134211] Fps is (10 sec: 16792.6, 60 sec: 16179.1, 300 sec: 15634.2). Total num frames: 277434368. Throughput: 0: 3947.9. Samples: 58523270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:16:03,969][134211] Avg episode reward: [(0, '7.392')] [2025-01-04 02:16:04,281][134294] Updated weights for policy 0, policy_version 67734 (0.0027) [2025-01-04 02:16:07,411][134294] Updated weights for policy 0, policy_version 67744 (0.0026) [2025-01-04 02:16:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15564.7, 300 sec: 15606.5). Total num frames: 277495808. Throughput: 0: 3928.3. Samples: 58542172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:16:08,968][134211] Avg episode reward: [(0, '6.964')] [2025-01-04 02:16:10,829][134294] Updated weights for policy 0, policy_version 67754 (0.0030) [2025-01-04 02:16:13,917][134294] Updated weights for policy 0, policy_version 67764 (0.0025) [2025-01-04 02:16:13,968][134211] Fps is (10 sec: 12698.1, 60 sec: 15018.9, 300 sec: 15606.5). Total num frames: 277561344. Throughput: 0: 3910.7. Samples: 58561176. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:16:13,968][134211] Avg episode reward: [(0, '6.689')] [2025-01-04 02:16:16,377][134294] Updated weights for policy 0, policy_version 67774 (0.0017) [2025-01-04 02:16:18,299][134294] Updated weights for policy 0, policy_version 67784 (0.0014) [2025-01-04 02:16:18,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15564.8, 300 sec: 15703.7). Total num frames: 277655552. Throughput: 0: 3921.8. Samples: 58572606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:16:18,968][134211] Avg episode reward: [(0, '7.104')] [2025-01-04 02:16:20,180][134294] Updated weights for policy 0, policy_version 67794 (0.0013) [2025-01-04 02:16:22,217][134294] Updated weights for policy 0, policy_version 67804 (0.0013) [2025-01-04 02:16:23,968][134211] Fps is (10 sec: 20070.3, 60 sec: 16179.2, 300 sec: 15814.7). Total num frames: 277762048. Throughput: 0: 3916.1. Samples: 58604476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:16:23,968][134211] Avg episode reward: [(0, '6.556')] [2025-01-04 02:16:24,133][134294] Updated weights for policy 0, policy_version 67814 (0.0015) [2025-01-04 02:16:25,976][134294] Updated weights for policy 0, policy_version 67824 (0.0013) [2025-01-04 02:16:28,592][134294] Updated weights for policy 0, policy_version 67834 (0.0022) [2025-01-04 02:16:28,968][134211] Fps is (10 sec: 19660.5, 60 sec: 16588.8, 300 sec: 15842.5). Total num frames: 277852160. Throughput: 0: 4154.2. Samples: 58633918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:28,968][134211] Avg episode reward: [(0, '6.732')] [2025-01-04 02:16:31,950][134294] Updated weights for policy 0, policy_version 67844 (0.0029) [2025-01-04 02:16:33,968][134211] Fps is (10 sec: 15155.5, 60 sec: 16042.6, 300 sec: 15689.8). Total num frames: 277913600. Throughput: 0: 4142.8. Samples: 58643024. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:33,968][134211] Avg episode reward: [(0, '7.289')] [2025-01-04 02:16:35,280][134294] Updated weights for policy 0, policy_version 67854 (0.0028) [2025-01-04 02:16:38,284][134294] Updated weights for policy 0, policy_version 67864 (0.0028) [2025-01-04 02:16:38,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15565.6, 300 sec: 15550.9). Total num frames: 277975040. Throughput: 0: 4021.0. Samples: 58662084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:38,968][134211] Avg episode reward: [(0, '6.727')] [2025-01-04 02:16:41,365][134294] Updated weights for policy 0, policy_version 67874 (0.0027) [2025-01-04 02:16:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15701.3, 300 sec: 15564.8). Total num frames: 278044672. Throughput: 0: 3750.0. Samples: 58681868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:43,968][134211] Avg episode reward: [(0, '6.735')] [2025-01-04 02:16:44,547][134294] Updated weights for policy 0, policy_version 67884 (0.0028) [2025-01-04 02:16:47,585][134294] Updated weights for policy 0, policy_version 67894 (0.0027) [2025-01-04 02:16:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15701.3, 300 sec: 15578.7). Total num frames: 278110208. Throughput: 0: 3747.2. Samples: 58691890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:48,968][134211] Avg episode reward: [(0, '6.836')] [2025-01-04 02:16:50,072][134294] Updated weights for policy 0, policy_version 67904 (0.0017) [2025-01-04 02:16:52,185][134294] Updated weights for policy 0, policy_version 67914 (0.0015) [2025-01-04 02:16:53,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15496.5, 300 sec: 15648.1). Total num frames: 278196224. Throughput: 0: 3884.5. Samples: 58716976. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:53,968][134211] Avg episode reward: [(0, '6.662')] [2025-01-04 02:16:55,199][134294] Updated weights for policy 0, policy_version 67924 (0.0025) [2025-01-04 02:16:58,419][134294] Updated weights for policy 0, policy_version 67934 (0.0023) [2025-01-04 02:16:58,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14882.1, 300 sec: 15648.1). Total num frames: 278261760. Throughput: 0: 3903.2. Samples: 58736818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:16:58,968][134211] Avg episode reward: [(0, '6.932')] [2025-01-04 02:17:00,969][134294] Updated weights for policy 0, policy_version 67944 (0.0016) [2025-01-04 02:17:02,958][134294] Updated weights for policy 0, policy_version 67954 (0.0014) [2025-01-04 02:17:03,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15428.5, 300 sec: 15648.1). Total num frames: 278360064. Throughput: 0: 3920.7. Samples: 58749036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:17:03,968][134211] Avg episode reward: [(0, '6.492')] [2025-01-04 02:17:04,944][134294] Updated weights for policy 0, policy_version 67964 (0.0012) [2025-01-04 02:17:06,802][134294] Updated weights for policy 0, policy_version 67974 (0.0014) [2025-01-04 02:17:08,695][134294] Updated weights for policy 0, policy_version 67984 (0.0013) [2025-01-04 02:17:08,968][134211] Fps is (10 sec: 20480.5, 60 sec: 16179.3, 300 sec: 15773.1). Total num frames: 278466560. Throughput: 0: 3915.0. Samples: 58780648. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:17:08,968][134211] Avg episode reward: [(0, '6.606')] [2025-01-04 02:17:10,700][134294] Updated weights for policy 0, policy_version 67994 (0.0016) [2025-01-04 02:17:13,752][134294] Updated weights for policy 0, policy_version 68004 (0.0027) [2025-01-04 02:17:13,968][134211] Fps is (10 sec: 18431.2, 60 sec: 16384.0, 300 sec: 15814.7). Total num frames: 278544384. Throughput: 0: 3855.5. Samples: 58807416. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:13,969][134211] Avg episode reward: [(0, '6.810')] [2025-01-04 02:17:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068004_278544384.pth... [2025-01-04 02:17:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000067075_274739200.pth [2025-01-04 02:17:17,101][134294] Updated weights for policy 0, policy_version 68014 (0.0029) [2025-01-04 02:17:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15837.8, 300 sec: 15662.0). Total num frames: 278605824. Throughput: 0: 3859.7. Samples: 58816710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:18,968][134211] Avg episode reward: [(0, '6.594')] [2025-01-04 02:17:20,142][134294] Updated weights for policy 0, policy_version 68024 (0.0025) [2025-01-04 02:17:23,182][134294] Updated weights for policy 0, policy_version 68034 (0.0026) [2025-01-04 02:17:23,969][134211] Fps is (10 sec: 13105.4, 60 sec: 15223.1, 300 sec: 15620.3). Total num frames: 278675456. Throughput: 0: 3879.3. Samples: 58836660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:23,970][134211] Avg episode reward: [(0, '7.648')] [2025-01-04 02:17:26,233][134294] Updated weights for policy 0, policy_version 68044 (0.0024) [2025-01-04 02:17:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 15648.1). Total num frames: 278740992. Throughput: 0: 3882.9. Samples: 58856600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:28,968][134211] Avg episode reward: [(0, '6.911')] [2025-01-04 02:17:29,343][134294] Updated weights for policy 0, policy_version 68054 (0.0025) [2025-01-04 02:17:32,420][134294] Updated weights for policy 0, policy_version 68064 (0.0027) [2025-01-04 02:17:33,967][134211] Fps is (10 sec: 13929.0, 60 sec: 15018.7, 300 sec: 15592.6). Total num frames: 278814720. 
Throughput: 0: 3883.1. Samples: 58866630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:33,968][134211] Avg episode reward: [(0, '6.666')] [2025-01-04 02:17:34,583][134294] Updated weights for policy 0, policy_version 68074 (0.0017) [2025-01-04 02:17:36,440][134294] Updated weights for policy 0, policy_version 68084 (0.0013) [2025-01-04 02:17:38,386][134294] Updated weights for policy 0, policy_version 68094 (0.0013) [2025-01-04 02:17:38,968][134211] Fps is (10 sec: 18432.4, 60 sec: 15837.9, 300 sec: 15592.6). Total num frames: 278925312. Throughput: 0: 3958.6. Samples: 58895112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:38,968][134211] Avg episode reward: [(0, '7.055')] [2025-01-04 02:17:40,279][134294] Updated weights for policy 0, policy_version 68104 (0.0013) [2025-01-04 02:17:42,181][134294] Updated weights for policy 0, policy_version 68114 (0.0011) [2025-01-04 02:17:43,968][134211] Fps is (10 sec: 20888.9, 60 sec: 16315.7, 300 sec: 15578.7). Total num frames: 279023616. Throughput: 0: 4219.9. Samples: 58926712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:43,968][134211] Avg episode reward: [(0, '6.561')] [2025-01-04 02:17:44,755][134294] Updated weights for policy 0, policy_version 68124 (0.0021) [2025-01-04 02:17:47,860][134294] Updated weights for policy 0, policy_version 68134 (0.0028) [2025-01-04 02:17:48,968][134211] Fps is (10 sec: 16383.7, 60 sec: 16315.7, 300 sec: 15578.7). Total num frames: 279089152. Throughput: 0: 4174.4. Samples: 58936884. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:48,968][134211] Avg episode reward: [(0, '5.921')] [2025-01-04 02:17:51,092][134294] Updated weights for policy 0, policy_version 68144 (0.0029) [2025-01-04 02:17:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15974.4, 300 sec: 15592.7). Total num frames: 279154688. Throughput: 0: 3900.2. Samples: 58956156. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:53,968][134211] Avg episode reward: [(0, '6.640')] [2025-01-04 02:17:54,298][134294] Updated weights for policy 0, policy_version 68154 (0.0028) [2025-01-04 02:17:57,463][134294] Updated weights for policy 0, policy_version 68164 (0.0026) [2025-01-04 02:17:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15906.1, 300 sec: 15620.4). Total num frames: 279216128. Throughput: 0: 3731.8. Samples: 58975346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:17:58,970][134211] Avg episode reward: [(0, '6.837')] [2025-01-04 02:18:00,515][134294] Updated weights for policy 0, policy_version 68174 (0.0021) [2025-01-04 02:18:02,382][134294] Updated weights for policy 0, policy_version 68184 (0.0012) [2025-01-04 02:18:03,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15837.8, 300 sec: 15731.4). Total num frames: 279310336. Throughput: 0: 3782.4. Samples: 58986918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:03,968][134211] Avg episode reward: [(0, '5.801')] [2025-01-04 02:18:04,357][134294] Updated weights for policy 0, policy_version 68194 (0.0013) [2025-01-04 02:18:06,243][134294] Updated weights for policy 0, policy_version 68204 (0.0015) [2025-01-04 02:18:08,126][134294] Updated weights for policy 0, policy_version 68214 (0.0013) [2025-01-04 02:18:08,968][134211] Fps is (10 sec: 20480.4, 60 sec: 15906.1, 300 sec: 15870.3). Total num frames: 279420928. Throughput: 0: 4057.2. Samples: 59019226. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:08,968][134211] Avg episode reward: [(0, '6.001')] [2025-01-04 02:18:10,055][134294] Updated weights for policy 0, policy_version 68224 (0.0013) [2025-01-04 02:18:12,895][134294] Updated weights for policy 0, policy_version 68234 (0.0027) [2025-01-04 02:18:13,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15906.2, 300 sec: 15884.1). Total num frames: 279498752. Throughput: 0: 4198.9. Samples: 59045552. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:13,969][134211] Avg episode reward: [(0, '6.836')] [2025-01-04 02:18:16,147][134294] Updated weights for policy 0, policy_version 68244 (0.0029) [2025-01-04 02:18:18,970][134211] Fps is (10 sec: 13923.0, 60 sec: 15905.5, 300 sec: 15731.3). Total num frames: 279560192. Throughput: 0: 4186.0. Samples: 59055010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:18,971][134211] Avg episode reward: [(0, '7.143')] [2025-01-04 02:18:19,401][134294] Updated weights for policy 0, policy_version 68254 (0.0028) [2025-01-04 02:18:22,479][134294] Updated weights for policy 0, policy_version 68264 (0.0026) [2025-01-04 02:18:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15838.3, 300 sec: 15620.3). Total num frames: 279625728. Throughput: 0: 3985.0. Samples: 59074438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:23,968][134211] Avg episode reward: [(0, '6.052')] [2025-01-04 02:18:25,555][134294] Updated weights for policy 0, policy_version 68274 (0.0025) [2025-01-04 02:18:28,450][134294] Updated weights for policy 0, policy_version 68284 (0.0026) [2025-01-04 02:18:28,968][134211] Fps is (10 sec: 13519.5, 60 sec: 15906.0, 300 sec: 15634.3). Total num frames: 279695360. Throughput: 0: 3740.3. Samples: 59095028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:18:28,969][134211] Avg episode reward: [(0, '6.564')] [2025-01-04 02:18:31,609][134294] Updated weights for policy 0, policy_version 68294 (0.0028) [2025-01-04 02:18:33,747][134294] Updated weights for policy 0, policy_version 68304 (0.0014) [2025-01-04 02:18:33,968][134211] Fps is (10 sec: 15155.3, 60 sec: 16042.6, 300 sec: 15689.8). Total num frames: 279777280. Throughput: 0: 3731.4. Samples: 59104796. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:33,968][134211] Avg episode reward: [(0, '6.462')] [2025-01-04 02:18:35,680][134294] Updated weights for policy 0, policy_version 68314 (0.0013) [2025-01-04 02:18:38,312][134294] Updated weights for policy 0, policy_version 68324 (0.0023) [2025-01-04 02:18:38,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15632.9, 300 sec: 15759.2). Total num frames: 279863296. Throughput: 0: 3932.3. Samples: 59133110. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:38,969][134211] Avg episode reward: [(0, '6.266')] [2025-01-04 02:18:41,502][134294] Updated weights for policy 0, policy_version 68334 (0.0024) [2025-01-04 02:18:43,670][134294] Updated weights for policy 0, policy_version 68344 (0.0017) [2025-01-04 02:18:43,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15291.8, 300 sec: 15773.1). Total num frames: 279941120. Throughput: 0: 3987.9. Samples: 59154802. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:43,968][134211] Avg episode reward: [(0, '6.578')] [2025-01-04 02:18:45,594][134294] Updated weights for policy 0, policy_version 68354 (0.0014) [2025-01-04 02:18:47,478][134294] Updated weights for policy 0, policy_version 68364 (0.0015) [2025-01-04 02:18:48,967][134211] Fps is (10 sec: 18433.5, 60 sec: 15974.5, 300 sec: 15759.2). Total num frames: 280047616. Throughput: 0: 4086.5. Samples: 59170808. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:48,968][134211] Avg episode reward: [(0, '6.284')] [2025-01-04 02:18:49,365][134294] Updated weights for policy 0, policy_version 68374 (0.0013) [2025-01-04 02:18:52,045][134294] Updated weights for policy 0, policy_version 68384 (0.0023) [2025-01-04 02:18:53,968][134211] Fps is (10 sec: 18022.3, 60 sec: 16110.9, 300 sec: 15759.2). Total num frames: 280121344. Throughput: 0: 3992.5. Samples: 59198888. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:53,968][134211] Avg episode reward: [(0, '6.498')] [2025-01-04 02:18:55,326][134294] Updated weights for policy 0, policy_version 68394 (0.0026) [2025-01-04 02:18:58,642][134294] Updated weights for policy 0, policy_version 68404 (0.0028) [2025-01-04 02:18:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 16111.0, 300 sec: 15759.2). Total num frames: 280182784. Throughput: 0: 3824.5. Samples: 59217656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:18:58,968][134211] Avg episode reward: [(0, '6.647')] [2025-01-04 02:19:02,268][134294] Updated weights for policy 0, policy_version 68414 (0.0027) [2025-01-04 02:19:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15564.8, 300 sec: 15745.3). Total num frames: 280244224. Throughput: 0: 3802.4. Samples: 59226108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:19:03,968][134211] Avg episode reward: [(0, '6.928')] [2025-01-04 02:19:04,951][134294] Updated weights for policy 0, policy_version 68424 (0.0022) [2025-01-04 02:19:06,908][134294] Updated weights for policy 0, policy_version 68434 (0.0013) [2025-01-04 02:19:08,883][134294] Updated weights for policy 0, policy_version 68444 (0.0015) [2025-01-04 02:19:08,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15428.3, 300 sec: 15800.8). Total num frames: 280346624. Throughput: 0: 3919.0. Samples: 59250794. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:19:08,968][134211] Avg episode reward: [(0, '6.683')] [2025-01-04 02:19:10,748][134294] Updated weights for policy 0, policy_version 68454 (0.0014) [2025-01-04 02:19:12,609][134294] Updated weights for policy 0, policy_version 68464 (0.0013) [2025-01-04 02:19:13,968][134211] Fps is (10 sec: 21299.3, 60 sec: 15974.5, 300 sec: 15800.8). Total num frames: 280457216. Throughput: 0: 4187.1. Samples: 59283444. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:19:13,968][134211] Avg episode reward: [(0, '6.532')] [2025-01-04 02:19:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068471_280457216.pth... [2025-01-04 02:19:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000067545_276664320.pth [2025-01-04 02:19:14,507][134294] Updated weights for policy 0, policy_version 68474 (0.0013) [2025-01-04 02:19:16,425][134294] Updated weights for policy 0, policy_version 68484 (0.0015) [2025-01-04 02:19:18,968][134211] Fps is (10 sec: 19660.5, 60 sec: 16384.7, 300 sec: 15814.7). Total num frames: 280543232. Throughput: 0: 4330.5. Samples: 59299670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:18,968][134211] Avg episode reward: [(0, '6.528')] [2025-01-04 02:19:19,588][134294] Updated weights for policy 0, policy_version 68494 (0.0028) [2025-01-04 02:19:22,963][134294] Updated weights for policy 0, policy_version 68504 (0.0028) [2025-01-04 02:19:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 16247.5, 300 sec: 15800.8). Total num frames: 280600576. Throughput: 0: 4128.5. Samples: 59318890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:23,968][134211] Avg episode reward: [(0, '7.163')] [2025-01-04 02:19:26,001][134294] Updated weights for policy 0, policy_version 68514 (0.0026) [2025-01-04 02:19:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 16247.5, 300 sec: 15814.7). Total num frames: 280670208. Throughput: 0: 4079.2. Samples: 59338366. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:28,969][134211] Avg episode reward: [(0, '6.254')] [2025-01-04 02:19:29,251][134294] Updated weights for policy 0, policy_version 68524 (0.0025) [2025-01-04 02:19:32,356][134294] Updated weights for policy 0, policy_version 68534 (0.0028) [2025-01-04 02:19:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15974.4, 300 sec: 15800.8). Total num frames: 280735744. Throughput: 0: 3942.3. Samples: 59348214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:33,968][134211] Avg episode reward: [(0, '7.232')] [2025-01-04 02:19:35,359][134294] Updated weights for policy 0, policy_version 68544 (0.0027) [2025-01-04 02:19:38,408][134294] Updated weights for policy 0, policy_version 68554 (0.0020) [2025-01-04 02:19:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15633.2, 300 sec: 15773.1). Total num frames: 280801280. Throughput: 0: 3773.0. Samples: 59368674. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:38,968][134211] Avg episode reward: [(0, '6.415')] [2025-01-04 02:19:41,435][134294] Updated weights for policy 0, policy_version 68564 (0.0025) [2025-01-04 02:19:43,969][134211] Fps is (10 sec: 13515.6, 60 sec: 15496.3, 300 sec: 15689.7). Total num frames: 280870912. Throughput: 0: 3797.9. Samples: 59388566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:43,969][134211] Avg episode reward: [(0, '6.357')] [2025-01-04 02:19:44,441][134294] Updated weights for policy 0, policy_version 68574 (0.0026) [2025-01-04 02:19:47,466][134294] Updated weights for policy 0, policy_version 68584 (0.0026) [2025-01-04 02:19:48,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15018.7, 300 sec: 15731.4). Total num frames: 280948736. Throughput: 0: 3836.6. Samples: 59398756. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:48,968][134211] Avg episode reward: [(0, '7.186')] [2025-01-04 02:19:49,403][134294] Updated weights for policy 0, policy_version 68594 (0.0014) [2025-01-04 02:19:51,303][134294] Updated weights for policy 0, policy_version 68604 (0.0014) [2025-01-04 02:19:53,132][134294] Updated weights for policy 0, policy_version 68614 (0.0014) [2025-01-04 02:19:53,967][134211] Fps is (10 sec: 18843.5, 60 sec: 15633.1, 300 sec: 15773.1). Total num frames: 281059328. Throughput: 0: 3952.6. Samples: 59428660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:19:53,968][134211] Avg episode reward: [(0, '6.771')] [2025-01-04 02:19:55,030][134294] Updated weights for policy 0, policy_version 68624 (0.0014) [2025-01-04 02:19:57,801][134294] Updated weights for policy 0, policy_version 68634 (0.0024) [2025-01-04 02:19:58,968][134211] Fps is (10 sec: 18840.1, 60 sec: 15906.0, 300 sec: 15842.5). Total num frames: 281137152. Throughput: 0: 3822.4. Samples: 59455454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:19:58,969][134211] Avg episode reward: [(0, '6.485')] [2025-01-04 02:20:01,081][134294] Updated weights for policy 0, policy_version 68644 (0.0027) [2025-01-04 02:20:03,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15906.1, 300 sec: 15717.5). Total num frames: 281198592. Throughput: 0: 3674.7. Samples: 59465032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:03,969][134211] Avg episode reward: [(0, '7.028')] [2025-01-04 02:20:04,388][134294] Updated weights for policy 0, policy_version 68654 (0.0026) [2025-01-04 02:20:07,570][134294] Updated weights for policy 0, policy_version 68664 (0.0025) [2025-01-04 02:20:08,967][134211] Fps is (10 sec: 13518.0, 60 sec: 15428.3, 300 sec: 15634.3). Total num frames: 281272320. Throughput: 0: 3669.7. Samples: 59484026. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:08,968][134211] Avg episode reward: [(0, '6.621')] [2025-01-04 02:20:09,583][134294] Updated weights for policy 0, policy_version 68674 (0.0015) [2025-01-04 02:20:11,463][134294] Updated weights for policy 0, policy_version 68684 (0.0014) [2025-01-04 02:20:13,362][134294] Updated weights for policy 0, policy_version 68694 (0.0014) [2025-01-04 02:20:13,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15428.3, 300 sec: 15800.8). Total num frames: 281382912. Throughput: 0: 3923.8. Samples: 59514938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:13,968][134211] Avg episode reward: [(0, '7.271')] [2025-01-04 02:20:15,232][134294] Updated weights for policy 0, policy_version 68704 (0.0013) [2025-01-04 02:20:17,960][134294] Updated weights for policy 0, policy_version 68714 (0.0024) [2025-01-04 02:20:18,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15291.7, 300 sec: 15828.6). Total num frames: 281460736. Throughput: 0: 4046.8. Samples: 59530320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:18,969][134211] Avg episode reward: [(0, '7.077')] [2025-01-04 02:20:21,122][134294] Updated weights for policy 0, policy_version 68724 (0.0031) [2025-01-04 02:20:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15428.2, 300 sec: 15828.6). Total num frames: 281526272. Throughput: 0: 4025.4. Samples: 59549816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:23,968][134211] Avg episode reward: [(0, '6.796')] [2025-01-04 02:20:24,410][134294] Updated weights for policy 0, policy_version 68734 (0.0028) [2025-01-04 02:20:27,505][134294] Updated weights for policy 0, policy_version 68744 (0.0024) [2025-01-04 02:20:28,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15359.9, 300 sec: 15731.4). Total num frames: 281591808. Throughput: 0: 4013.9. Samples: 59569188. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:28,969][134211] Avg episode reward: [(0, '6.902')] [2025-01-04 02:20:30,526][134294] Updated weights for policy 0, policy_version 68754 (0.0024) [2025-01-04 02:20:32,773][134294] Updated weights for policy 0, policy_version 68764 (0.0016) [2025-01-04 02:20:33,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15769.6, 300 sec: 15731.6). Total num frames: 281681920. Throughput: 0: 4011.6. Samples: 59579280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:33,968][134211] Avg episode reward: [(0, '7.020')] [2025-01-04 02:20:34,697][134294] Updated weights for policy 0, policy_version 68774 (0.0013) [2025-01-04 02:20:36,585][134294] Updated weights for policy 0, policy_version 68784 (0.0013) [2025-01-04 02:20:38,506][134294] Updated weights for policy 0, policy_version 68794 (0.0014) [2025-01-04 02:20:38,968][134211] Fps is (10 sec: 19661.6, 60 sec: 16452.3, 300 sec: 15884.2). Total num frames: 281788416. Throughput: 0: 4058.9. Samples: 59611312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:20:38,968][134211] Avg episode reward: [(0, '6.768')] [2025-01-04 02:20:41,159][134294] Updated weights for policy 0, policy_version 68804 (0.0025) [2025-01-04 02:20:43,968][134211] Fps is (10 sec: 17613.0, 60 sec: 16452.5, 300 sec: 15898.0). Total num frames: 281858048. Throughput: 0: 3994.6. Samples: 59635210. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:20:43,968][134211] Avg episode reward: [(0, '6.891')] [2025-01-04 02:20:44,285][134294] Updated weights for policy 0, policy_version 68814 (0.0026) [2025-01-04 02:20:47,495][134294] Updated weights for policy 0, policy_version 68824 (0.0025) [2025-01-04 02:20:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 16179.2, 300 sec: 15773.1). Total num frames: 281919488. Throughput: 0: 3993.4. Samples: 59644736. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:20:48,968][134211] Avg episode reward: [(0, '6.711')] [2025-01-04 02:20:50,581][134294] Updated weights for policy 0, policy_version 68834 (0.0027) [2025-01-04 02:20:53,534][134294] Updated weights for policy 0, policy_version 68844 (0.0025) [2025-01-04 02:20:53,969][134211] Fps is (10 sec: 13106.0, 60 sec: 15496.3, 300 sec: 15661.9). Total num frames: 281989120. Throughput: 0: 4018.7. Samples: 59664874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:20:53,969][134211] Avg episode reward: [(0, '7.037')] [2025-01-04 02:20:56,606][134294] Updated weights for policy 0, policy_version 68854 (0.0023) [2025-01-04 02:20:58,967][134211] Fps is (10 sec: 13926.5, 60 sec: 15360.2, 300 sec: 15675.9). Total num frames: 282058752. Throughput: 0: 3773.0. Samples: 59684724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:20:58,968][134211] Avg episode reward: [(0, '6.998')] [2025-01-04 02:20:59,326][134294] Updated weights for policy 0, policy_version 68864 (0.0019) [2025-01-04 02:21:01,397][134294] Updated weights for policy 0, policy_version 68874 (0.0013) [2025-01-04 02:21:03,377][134294] Updated weights for policy 0, policy_version 68884 (0.0014) [2025-01-04 02:21:03,968][134211] Fps is (10 sec: 17204.9, 60 sec: 16042.8, 300 sec: 15814.7). Total num frames: 282161152. Throughput: 0: 3752.1. Samples: 59699162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:21:03,968][134211] Avg episode reward: [(0, '7.965')] [2025-01-04 02:21:05,236][134294] Updated weights for policy 0, policy_version 68894 (0.0014) [2025-01-04 02:21:07,115][134294] Updated weights for policy 0, policy_version 68904 (0.0012) [2025-01-04 02:21:08,968][134211] Fps is (10 sec: 20889.2, 60 sec: 16588.7, 300 sec: 15953.6). Total num frames: 282267648. Throughput: 0: 4029.4. Samples: 59731140. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:21:08,968][134211] Avg episode reward: [(0, '6.527')] [2025-01-04 02:21:09,199][134294] Updated weights for policy 0, policy_version 68914 (0.0015) [2025-01-04 02:21:12,284][134294] Updated weights for policy 0, policy_version 68924 (0.0025) [2025-01-04 02:21:13,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15837.8, 300 sec: 15856.4). Total num frames: 282333184. Throughput: 0: 4110.9. Samples: 59754176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:21:13,968][134211] Avg episode reward: [(0, '6.541')] [2025-01-04 02:21:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068929_282333184.pth... [2025-01-04 02:21:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068004_278544384.pth [2025-01-04 02:21:15,557][134294] Updated weights for policy 0, policy_version 68934 (0.0031) [2025-01-04 02:21:18,667][134294] Updated weights for policy 0, policy_version 68944 (0.0028) [2025-01-04 02:21:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15633.1, 300 sec: 15717.5). Total num frames: 282398720. Throughput: 0: 4097.7. Samples: 59763678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:21:18,968][134211] Avg episode reward: [(0, '6.991')] [2025-01-04 02:21:21,727][134294] Updated weights for policy 0, policy_version 68954 (0.0024) [2025-01-04 02:21:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15564.8, 300 sec: 15620.3). Total num frames: 282460160. Throughput: 0: 3823.3. Samples: 59783360. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:23,968][134211] Avg episode reward: [(0, '6.821')] [2025-01-04 02:21:24,930][134294] Updated weights for policy 0, policy_version 68964 (0.0027) [2025-01-04 02:21:27,940][134294] Updated weights for policy 0, policy_version 68974 (0.0027) [2025-01-04 02:21:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.2, 300 sec: 15648.1). Total num frames: 282529792. Throughput: 0: 3737.2. Samples: 59803386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:28,968][134211] Avg episode reward: [(0, '7.231')] [2025-01-04 02:21:30,429][134294] Updated weights for policy 0, policy_version 68984 (0.0018) [2025-01-04 02:21:32,339][134294] Updated weights for policy 0, policy_version 68994 (0.0013) [2025-01-04 02:21:33,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15837.9, 300 sec: 15787.0). Total num frames: 282632192. Throughput: 0: 3824.8. Samples: 59816852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:33,968][134211] Avg episode reward: [(0, '6.996')] [2025-01-04 02:21:34,412][134294] Updated weights for policy 0, policy_version 69004 (0.0016) [2025-01-04 02:21:37,489][134294] Updated weights for policy 0, policy_version 69014 (0.0024) [2025-01-04 02:21:38,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15155.2, 300 sec: 15773.1). Total num frames: 282697728. Throughput: 0: 3948.8. Samples: 59842566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:38,968][134211] Avg episode reward: [(0, '7.066')] [2025-01-04 02:21:40,650][134294] Updated weights for policy 0, policy_version 69024 (0.0027) [2025-01-04 02:21:43,559][134294] Updated weights for policy 0, policy_version 69034 (0.0027) [2025-01-04 02:21:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15155.2, 300 sec: 15787.0). Total num frames: 282767360. Throughput: 0: 3954.4. Samples: 59862674. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:43,968][134211] Avg episode reward: [(0, '6.283')] [2025-01-04 02:21:45,586][134294] Updated weights for policy 0, policy_version 69044 (0.0012) [2025-01-04 02:21:47,459][134294] Updated weights for policy 0, policy_version 69054 (0.0015) [2025-01-04 02:21:48,968][134211] Fps is (10 sec: 18022.5, 60 sec: 15974.4, 300 sec: 15870.3). Total num frames: 282877952. Throughput: 0: 3965.5. Samples: 59877610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:48,968][134211] Avg episode reward: [(0, '6.706')] [2025-01-04 02:21:49,313][134294] Updated weights for policy 0, policy_version 69064 (0.0012) [2025-01-04 02:21:51,232][134294] Updated weights for policy 0, policy_version 69074 (0.0014) [2025-01-04 02:21:53,968][134211] Fps is (10 sec: 19660.6, 60 sec: 16247.7, 300 sec: 15939.7). Total num frames: 282963968. Throughput: 0: 3948.0. Samples: 59908798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:53,968][134211] Avg episode reward: [(0, '6.161')] [2025-01-04 02:21:54,154][134294] Updated weights for policy 0, policy_version 69084 (0.0025) [2025-01-04 02:21:57,450][134294] Updated weights for policy 0, policy_version 69094 (0.0030) [2025-01-04 02:21:58,968][134211] Fps is (10 sec: 14745.2, 60 sec: 16110.8, 300 sec: 15814.7). Total num frames: 283025408. Throughput: 0: 3857.2. Samples: 59927748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:21:58,969][134211] Avg episode reward: [(0, '7.255')] [2025-01-04 02:22:00,578][134294] Updated weights for policy 0, policy_version 69104 (0.0025) [2025-01-04 02:22:03,700][134294] Updated weights for policy 0, policy_version 69114 (0.0026) [2025-01-04 02:22:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15564.7, 300 sec: 15689.7). Total num frames: 283095040. Throughput: 0: 3867.7. Samples: 59937726. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:03,968][134211] Avg episode reward: [(0, '7.109')] [2025-01-04 02:22:06,725][134294] Updated weights for policy 0, policy_version 69124 (0.0024) [2025-01-04 02:22:08,901][134294] Updated weights for policy 0, policy_version 69134 (0.0014) [2025-01-04 02:22:08,968][134211] Fps is (10 sec: 14746.1, 60 sec: 15087.0, 300 sec: 15689.8). Total num frames: 283172864. Throughput: 0: 3873.4. Samples: 59957664. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:08,968][134211] Avg episode reward: [(0, '6.436')] [2025-01-04 02:22:11,779][134294] Updated weights for policy 0, policy_version 69144 (0.0022) [2025-01-04 02:22:13,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15086.9, 300 sec: 15703.6). Total num frames: 283238400. Throughput: 0: 3947.4. Samples: 59981020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:13,968][134211] Avg episode reward: [(0, '6.339')] [2025-01-04 02:22:14,791][134294] Updated weights for policy 0, policy_version 69154 (0.0022) [2025-01-04 02:22:16,640][134294] Updated weights for policy 0, policy_version 69164 (0.0013) [2025-01-04 02:22:18,549][134294] Updated weights for policy 0, policy_version 69174 (0.0012) [2025-01-04 02:22:18,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15769.6, 300 sec: 15828.7). Total num frames: 283344896. Throughput: 0: 3948.7. Samples: 59994544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:18,968][134211] Avg episode reward: [(0, '6.797')] [2025-01-04 02:22:20,427][134294] Updated weights for policy 0, policy_version 69184 (0.0013) [2025-01-04 02:22:22,323][134294] Updated weights for policy 0, policy_version 69194 (0.0014) [2025-01-04 02:22:23,967][134211] Fps is (10 sec: 21299.8, 60 sec: 16520.6, 300 sec: 15967.5). Total num frames: 283451392. Throughput: 0: 4105.3. Samples: 60027302. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:23,968][134211] Avg episode reward: [(0, '6.942')] [2025-01-04 02:22:24,140][134294] Updated weights for policy 0, policy_version 69204 (0.0012) [2025-01-04 02:22:26,214][134294] Updated weights for policy 0, policy_version 69214 (0.0016) [2025-01-04 02:22:28,970][134211] Fps is (10 sec: 18837.0, 60 sec: 16724.7, 300 sec: 15995.1). Total num frames: 283533312. Throughput: 0: 4274.3. Samples: 60055026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:28,971][134211] Avg episode reward: [(0, '6.763')] [2025-01-04 02:22:29,355][134294] Updated weights for policy 0, policy_version 69224 (0.0028) [2025-01-04 02:22:32,785][134294] Updated weights for policy 0, policy_version 69234 (0.0028) [2025-01-04 02:22:33,968][134211] Fps is (10 sec: 14335.6, 60 sec: 16042.6, 300 sec: 15828.6). Total num frames: 283594752. Throughput: 0: 4139.4. Samples: 60063884. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:33,968][134211] Avg episode reward: [(0, '6.611')] [2025-01-04 02:22:36,048][134294] Updated weights for policy 0, policy_version 69244 (0.0029) [2025-01-04 02:22:38,968][134211] Fps is (10 sec: 12290.9, 60 sec: 15974.4, 300 sec: 15703.7). Total num frames: 283656192. Throughput: 0: 3856.1. Samples: 60082322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:38,968][134211] Avg episode reward: [(0, '7.196')] [2025-01-04 02:22:39,515][134294] Updated weights for policy 0, policy_version 69254 (0.0025) [2025-01-04 02:22:42,456][134294] Updated weights for policy 0, policy_version 69264 (0.0026) [2025-01-04 02:22:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15974.3, 300 sec: 15717.5). Total num frames: 283725824. Throughput: 0: 3868.0. Samples: 60101806. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:22:43,968][134211] Avg episode reward: [(0, '6.967')] [2025-01-04 02:22:45,474][134294] Updated weights for policy 0, policy_version 69274 (0.0026) [2025-01-04 02:22:48,264][134294] Updated weights for policy 0, policy_version 69284 (0.0024) [2025-01-04 02:22:48,967][134211] Fps is (10 sec: 14336.3, 60 sec: 15360.0, 300 sec: 15745.3). Total num frames: 283799552. Throughput: 0: 3880.1. Samples: 60112330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:22:48,968][134211] Avg episode reward: [(0, '6.998')] [2025-01-04 02:22:50,174][134294] Updated weights for policy 0, policy_version 69294 (0.0012) [2025-01-04 02:22:52,044][134294] Updated weights for policy 0, policy_version 69304 (0.0013) [2025-01-04 02:22:53,921][134294] Updated weights for policy 0, policy_version 69314 (0.0013) [2025-01-04 02:22:53,967][134211] Fps is (10 sec: 18432.8, 60 sec: 15769.7, 300 sec: 15911.9). Total num frames: 283910144. Throughput: 0: 4082.2. Samples: 60141362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:22:53,968][134211] Avg episode reward: [(0, '6.745')] [2025-01-04 02:22:55,797][134294] Updated weights for policy 0, policy_version 69324 (0.0013) [2025-01-04 02:22:58,004][134294] Updated weights for policy 0, policy_version 69334 (0.0017) [2025-01-04 02:22:58,968][134211] Fps is (10 sec: 20069.7, 60 sec: 16247.5, 300 sec: 15898.0). Total num frames: 284000256. Throughput: 0: 4240.3. Samples: 60171834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:22:58,969][134211] Avg episode reward: [(0, '7.739')] [2025-01-04 02:23:01,494][134294] Updated weights for policy 0, policy_version 69344 (0.0026) [2025-01-04 02:23:03,968][134211] Fps is (10 sec: 14745.1, 60 sec: 16042.7, 300 sec: 15717.5). Total num frames: 284057600. Throughput: 0: 4131.6. Samples: 60180468. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:23:03,969][134211] Avg episode reward: [(0, '7.408')] [2025-01-04 02:23:05,322][134294] Updated weights for policy 0, policy_version 69354 (0.0029) [2025-01-04 02:23:08,673][134294] Updated weights for policy 0, policy_version 69364 (0.0026) [2025-01-04 02:23:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15769.6, 300 sec: 15662.0). Total num frames: 284119040. Throughput: 0: 3774.0. Samples: 60197132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:23:08,968][134211] Avg episode reward: [(0, '7.442')] [2025-01-04 02:23:10,869][134294] Updated weights for policy 0, policy_version 69374 (0.0015) [2025-01-04 02:23:12,777][134294] Updated weights for policy 0, policy_version 69384 (0.0013) [2025-01-04 02:23:13,968][134211] Fps is (10 sec: 16384.4, 60 sec: 16384.1, 300 sec: 15801.0). Total num frames: 284221440. Throughput: 0: 3768.9. Samples: 60224616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:23:13,968][134211] Avg episode reward: [(0, '6.603')] [2025-01-04 02:23:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000069390_284221440.pth... [2025-01-04 02:23:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068471_280457216.pth [2025-01-04 02:23:14,695][134294] Updated weights for policy 0, policy_version 69394 (0.0012) [2025-01-04 02:23:16,894][134294] Updated weights for policy 0, policy_version 69404 (0.0017) [2025-01-04 02:23:18,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15974.3, 300 sec: 15856.4). Total num frames: 284303360. Throughput: 0: 3920.2. Samples: 60240294. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:23:18,968][134211] Avg episode reward: [(0, '7.525')] [2025-01-04 02:23:20,163][134294] Updated weights for policy 0, policy_version 69414 (0.0025) [2025-01-04 02:23:23,409][134294] Updated weights for policy 0, policy_version 69424 (0.0028) [2025-01-04 02:23:23,971][134211] Fps is (10 sec: 14331.4, 60 sec: 15222.6, 300 sec: 15828.5). Total num frames: 284364800. Throughput: 0: 3934.6. Samples: 60259390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:23:23,971][134211] Avg episode reward: [(0, '6.768')] [2025-01-04 02:23:26,517][134294] Updated weights for policy 0, policy_version 69434 (0.0025) [2025-01-04 02:23:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14951.0, 300 sec: 15773.1). Total num frames: 284430336. Throughput: 0: 3939.5. Samples: 60279084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:28,970][134211] Avg episode reward: [(0, '6.927')] [2025-01-04 02:23:29,632][134294] Updated weights for policy 0, policy_version 69444 (0.0028) [2025-01-04 02:23:32,419][134294] Updated weights for policy 0, policy_version 69454 (0.0021) [2025-01-04 02:23:33,968][134211] Fps is (10 sec: 15160.1, 60 sec: 15360.1, 300 sec: 15773.1). Total num frames: 284516352. Throughput: 0: 3926.0. Samples: 60289002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:33,968][134211] Avg episode reward: [(0, '7.041')] [2025-01-04 02:23:34,262][134294] Updated weights for policy 0, policy_version 69464 (0.0013) [2025-01-04 02:23:36,134][134294] Updated weights for policy 0, policy_version 69474 (0.0013) [2025-01-04 02:23:38,024][134294] Updated weights for policy 0, policy_version 69484 (0.0013) [2025-01-04 02:23:38,968][134211] Fps is (10 sec: 19251.6, 60 sec: 16111.0, 300 sec: 15870.3). Total num frames: 284622848. Throughput: 0: 3974.9. Samples: 60320232. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:38,968][134211] Avg episode reward: [(0, '7.884')] [2025-01-04 02:23:39,993][134294] Updated weights for policy 0, policy_version 69494 (0.0013) [2025-01-04 02:23:42,112][134294] Updated weights for policy 0, policy_version 69504 (0.0016) [2025-01-04 02:23:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 16384.0, 300 sec: 15800.8). Total num frames: 284708864. Throughput: 0: 3925.0. Samples: 60348458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:43,969][134211] Avg episode reward: [(0, '7.489')] [2025-01-04 02:23:45,415][134294] Updated weights for policy 0, policy_version 69514 (0.0030) [2025-01-04 02:23:48,577][134294] Updated weights for policy 0, policy_version 69524 (0.0025) [2025-01-04 02:23:48,968][134211] Fps is (10 sec: 15154.0, 60 sec: 16247.2, 300 sec: 15773.0). Total num frames: 284774400. Throughput: 0: 3947.2. Samples: 60358092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:48,969][134211] Avg episode reward: [(0, '7.343')] [2025-01-04 02:23:51,592][134294] Updated weights for policy 0, policy_version 69534 (0.0026) [2025-01-04 02:23:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15496.5, 300 sec: 15787.0). Total num frames: 284839936. Throughput: 0: 4016.1. Samples: 60377856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:53,968][134211] Avg episode reward: [(0, '6.831')] [2025-01-04 02:23:54,939][134294] Updated weights for policy 0, policy_version 69544 (0.0029) [2025-01-04 02:23:57,966][134294] Updated weights for policy 0, policy_version 69554 (0.0023) [2025-01-04 02:23:58,968][134211] Fps is (10 sec: 12698.4, 60 sec: 15018.7, 300 sec: 15787.0). Total num frames: 284901376. Throughput: 0: 3831.8. Samples: 60397048. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:23:58,968][134211] Avg episode reward: [(0, '6.792')] [2025-01-04 02:24:01,363][134294] Updated weights for policy 0, policy_version 69564 (0.0027) [2025-01-04 02:24:03,908][134294] Updated weights for policy 0, policy_version 69574 (0.0019) [2025-01-04 02:24:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.8, 300 sec: 15689.8). Total num frames: 284975104. Throughput: 0: 3681.0. Samples: 60405940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:24:03,968][134211] Avg episode reward: [(0, '6.653')] [2025-01-04 02:24:05,880][134294] Updated weights for policy 0, policy_version 69584 (0.0013) [2025-01-04 02:24:07,762][134294] Updated weights for policy 0, policy_version 69594 (0.0013) [2025-01-04 02:24:08,968][134211] Fps is (10 sec: 18022.6, 60 sec: 16042.7, 300 sec: 15675.9). Total num frames: 285081600. Throughput: 0: 3888.3. Samples: 60434350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:24:08,968][134211] Avg episode reward: [(0, '6.877')] [2025-01-04 02:24:09,624][134294] Updated weights for policy 0, policy_version 69604 (0.0014) [2025-01-04 02:24:11,549][134294] Updated weights for policy 0, policy_version 69614 (0.0015) [2025-01-04 02:24:13,422][134294] Updated weights for policy 0, policy_version 69624 (0.0014) [2025-01-04 02:24:13,968][134211] Fps is (10 sec: 21299.3, 60 sec: 16110.9, 300 sec: 15745.3). Total num frames: 285188096. Throughput: 0: 4176.5. Samples: 60467026. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:13,968][134211] Avg episode reward: [(0, '6.279')] [2025-01-04 02:24:16,187][134294] Updated weights for policy 0, policy_version 69634 (0.0027) [2025-01-04 02:24:18,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15837.9, 300 sec: 15773.1). Total num frames: 285253632. Throughput: 0: 4208.2. Samples: 60478372. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:18,969][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 02:24:19,340][134294] Updated weights for policy 0, policy_version 69644 (0.0031) [2025-01-04 02:24:22,603][134294] Updated weights for policy 0, policy_version 69654 (0.0026) [2025-01-04 02:24:23,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15906.9, 300 sec: 15759.2). Total num frames: 285319168. Throughput: 0: 3945.3. Samples: 60497774. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:23,969][134211] Avg episode reward: [(0, '7.566')] [2025-01-04 02:24:25,566][134294] Updated weights for policy 0, policy_version 69664 (0.0028) [2025-01-04 02:24:28,628][134294] Updated weights for policy 0, policy_version 69674 (0.0025) [2025-01-04 02:24:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15906.1, 300 sec: 15759.2). Total num frames: 285384704. Throughput: 0: 3766.6. Samples: 60517956. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:28,968][134211] Avg episode reward: [(0, '6.629')] [2025-01-04 02:24:31,703][134294] Updated weights for policy 0, policy_version 69684 (0.0024) [2025-01-04 02:24:33,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15633.0, 300 sec: 15773.1). Total num frames: 285454336. Throughput: 0: 3771.6. Samples: 60527810. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:33,968][134211] Avg episode reward: [(0, '7.206')] [2025-01-04 02:24:34,895][134294] Updated weights for policy 0, policy_version 69694 (0.0024) [2025-01-04 02:24:37,579][134294] Updated weights for policy 0, policy_version 69704 (0.0022) [2025-01-04 02:24:38,967][134211] Fps is (10 sec: 15155.6, 60 sec: 15223.5, 300 sec: 15814.8). Total num frames: 285536256. Throughput: 0: 3783.3. Samples: 60548104. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:38,968][134211] Avg episode reward: [(0, '6.716')] [2025-01-04 02:24:39,506][134294] Updated weights for policy 0, policy_version 69714 (0.0013) [2025-01-04 02:24:41,408][134294] Updated weights for policy 0, policy_version 69724 (0.0013) [2025-01-04 02:24:43,245][134294] Updated weights for policy 0, policy_version 69734 (0.0013) [2025-01-04 02:24:43,969][134211] Fps is (10 sec: 18839.3, 60 sec: 15564.5, 300 sec: 15911.8). Total num frames: 285642752. Throughput: 0: 4075.3. Samples: 60580444. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:43,969][134211] Avg episode reward: [(0, '7.755')] [2025-01-04 02:24:45,287][134294] Updated weights for policy 0, policy_version 69744 (0.0017) [2025-01-04 02:24:48,445][134294] Updated weights for policy 0, policy_version 69754 (0.0029) [2025-01-04 02:24:48,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15701.5, 300 sec: 15786.9). Total num frames: 285716480. Throughput: 0: 4180.8. Samples: 60594078. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 02:24:48,968][134211] Avg episode reward: [(0, '7.228')] [2025-01-04 02:24:51,484][134294] Updated weights for policy 0, policy_version 69764 (0.0027) [2025-01-04 02:24:53,971][134211] Fps is (10 sec: 13923.8, 60 sec: 15700.5, 300 sec: 15745.2). Total num frames: 285782016. Throughput: 0: 3981.5. Samples: 60613532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:24:53,972][134211] Avg episode reward: [(0, '6.852')] [2025-01-04 02:24:54,840][134294] Updated weights for policy 0, policy_version 69774 (0.0026) [2025-01-04 02:24:57,937][134294] Updated weights for policy 0, policy_version 69784 (0.0024) [2025-01-04 02:24:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.3, 300 sec: 15745.3). Total num frames: 285843456. Throughput: 0: 3680.1. Samples: 60632630. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:24:58,968][134211] Avg episode reward: [(0, '7.599')] [2025-01-04 02:25:01,159][134294] Updated weights for policy 0, policy_version 69794 (0.0025) [2025-01-04 02:25:03,289][134294] Updated weights for policy 0, policy_version 69804 (0.0014) [2025-01-04 02:25:03,968][134211] Fps is (10 sec: 14750.1, 60 sec: 15906.1, 300 sec: 15786.9). Total num frames: 285929472. Throughput: 0: 3647.2. Samples: 60642498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:03,968][134211] Avg episode reward: [(0, '6.823')] [2025-01-04 02:25:05,319][134294] Updated weights for policy 0, policy_version 69814 (0.0014) [2025-01-04 02:25:07,979][134294] Updated weights for policy 0, policy_version 69824 (0.0021) [2025-01-04 02:25:08,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15564.8, 300 sec: 15703.6). Total num frames: 286015488. Throughput: 0: 3837.0. Samples: 60670436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:08,968][134211] Avg episode reward: [(0, '7.345')] [2025-01-04 02:25:10,178][134294] Updated weights for policy 0, policy_version 69834 (0.0016) [2025-01-04 02:25:13,006][134294] Updated weights for policy 0, policy_version 69844 (0.0023) [2025-01-04 02:25:13,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15018.6, 300 sec: 15689.8). Total num frames: 286089216. Throughput: 0: 3908.6. Samples: 60693842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:13,969][134211] Avg episode reward: [(0, '6.520')] [2025-01-04 02:25:14,036][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000069847_286093312.pth... 
[2025-01-04 02:25:14,106][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000068929_282333184.pth [2025-01-04 02:25:16,185][134294] Updated weights for policy 0, policy_version 69854 (0.0025) [2025-01-04 02:25:18,969][134211] Fps is (10 sec: 14334.4, 60 sec: 15086.7, 300 sec: 15703.6). Total num frames: 286158848. Throughput: 0: 3904.5. Samples: 60703514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:18,969][134211] Avg episode reward: [(0, '6.887')] [2025-01-04 02:25:19,217][134294] Updated weights for policy 0, policy_version 69864 (0.0025) [2025-01-04 02:25:22,226][134294] Updated weights for policy 0, policy_version 69874 (0.0025) [2025-01-04 02:25:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15087.0, 300 sec: 15703.7). Total num frames: 286224384. Throughput: 0: 3909.7. Samples: 60724040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:23,968][134211] Avg episode reward: [(0, '6.608')] [2025-01-04 02:25:24,793][134294] Updated weights for policy 0, policy_version 69884 (0.0019) [2025-01-04 02:25:26,691][134294] Updated weights for policy 0, policy_version 69894 (0.0012) [2025-01-04 02:25:28,597][134294] Updated weights for policy 0, policy_version 69904 (0.0013) [2025-01-04 02:25:28,968][134211] Fps is (10 sec: 17614.8, 60 sec: 15837.9, 300 sec: 15773.1). Total num frames: 286334976. Throughput: 0: 3830.1. Samples: 60752792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:28,968][134211] Avg episode reward: [(0, '6.992')] [2025-01-04 02:25:30,476][134294] Updated weights for policy 0, policy_version 69914 (0.0014) [2025-01-04 02:25:32,347][134294] Updated weights for policy 0, policy_version 69924 (0.0013) [2025-01-04 02:25:33,968][134211] Fps is (10 sec: 21708.3, 60 sec: 16452.2, 300 sec: 15773.1). Total num frames: 286441472. Throughput: 0: 3890.8. Samples: 60769166. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:25:33,968][134211] Avg episode reward: [(0, '6.622')] [2025-01-04 02:25:34,499][134294] Updated weights for policy 0, policy_version 69934 (0.0018) [2025-01-04 02:25:37,649][134294] Updated weights for policy 0, policy_version 69944 (0.0029) [2025-01-04 02:25:38,968][134211] Fps is (10 sec: 16793.2, 60 sec: 16110.9, 300 sec: 15745.3). Total num frames: 286502912. Throughput: 0: 4016.6. Samples: 60794266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:25:38,968][134211] Avg episode reward: [(0, '7.073')] [2025-01-04 02:25:40,985][134294] Updated weights for policy 0, policy_version 69954 (0.0028) [2025-01-04 02:25:43,968][134211] Fps is (10 sec: 12697.0, 60 sec: 15428.4, 300 sec: 15759.1). Total num frames: 286568448. Throughput: 0: 4021.2. Samples: 60813588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:25:43,969][134211] Avg episode reward: [(0, '6.521')] [2025-01-04 02:25:44,062][134294] Updated weights for policy 0, policy_version 69964 (0.0029) [2025-01-04 02:25:47,037][134294] Updated weights for policy 0, policy_version 69974 (0.0026) [2025-01-04 02:25:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15360.0, 300 sec: 15759.2). Total num frames: 286638080. Throughput: 0: 4022.5. Samples: 60823510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:25:48,968][134211] Avg episode reward: [(0, '7.079')] [2025-01-04 02:25:50,111][134294] Updated weights for policy 0, policy_version 69984 (0.0023) [2025-01-04 02:25:53,008][134294] Updated weights for policy 0, policy_version 69994 (0.0025) [2025-01-04 02:25:53,968][134211] Fps is (10 sec: 13927.5, 60 sec: 15429.1, 300 sec: 15759.2). Total num frames: 286707712. Throughput: 0: 3859.6. Samples: 60844118. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:25:53,969][134211] Avg episode reward: [(0, '6.950')] [2025-01-04 02:25:56,054][134294] Updated weights for policy 0, policy_version 70004 (0.0025) [2025-01-04 02:25:58,139][134294] Updated weights for policy 0, policy_version 70014 (0.0015) [2025-01-04 02:25:58,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15837.9, 300 sec: 15703.6). Total num frames: 286793728. Throughput: 0: 3862.9. Samples: 60867670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:25:58,968][134211] Avg episode reward: [(0, '7.210')] [2025-01-04 02:26:00,087][134294] Updated weights for policy 0, policy_version 70024 (0.0012) [2025-01-04 02:26:01,967][134294] Updated weights for policy 0, policy_version 70034 (0.0014) [2025-01-04 02:26:03,880][134294] Updated weights for policy 0, policy_version 70044 (0.0014) [2025-01-04 02:26:03,968][134211] Fps is (10 sec: 19251.1, 60 sec: 16179.2, 300 sec: 15703.6). Total num frames: 286900224. Throughput: 0: 4005.7. Samples: 60883768. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:03,968][134211] Avg episode reward: [(0, '7.138')] [2025-01-04 02:26:05,775][134294] Updated weights for policy 0, policy_version 70054 (0.0017) [2025-01-04 02:26:08,507][134294] Updated weights for policy 0, policy_version 70064 (0.0024) [2025-01-04 02:26:08,968][134211] Fps is (10 sec: 19251.1, 60 sec: 16179.2, 300 sec: 15773.1). Total num frames: 286986240. Throughput: 0: 4230.4. Samples: 60914410. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:08,968][134211] Avg episode reward: [(0, '6.222')] [2025-01-04 02:26:11,826][134294] Updated weights for policy 0, policy_version 70074 (0.0025) [2025-01-04 02:26:13,968][134211] Fps is (10 sec: 14745.6, 60 sec: 15974.5, 300 sec: 15759.2). Total num frames: 287047680. Throughput: 0: 4011.3. Samples: 60933300. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:13,968][134211] Avg episode reward: [(0, '7.171')] [2025-01-04 02:26:15,056][134294] Updated weights for policy 0, policy_version 70084 (0.0028) [2025-01-04 02:26:18,139][134294] Updated weights for policy 0, policy_version 70094 (0.0026) [2025-01-04 02:26:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15906.4, 300 sec: 15773.1). Total num frames: 287113216. Throughput: 0: 3861.1. Samples: 60942914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:18,970][134211] Avg episode reward: [(0, '6.826')] [2025-01-04 02:26:21,179][134294] Updated weights for policy 0, policy_version 70104 (0.0023) [2025-01-04 02:26:23,800][134294] Updated weights for policy 0, policy_version 70114 (0.0021) [2025-01-04 02:26:23,967][134211] Fps is (10 sec: 13926.6, 60 sec: 16042.7, 300 sec: 15787.0). Total num frames: 287186944. Throughput: 0: 3749.9. Samples: 60963010. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:23,968][134211] Avg episode reward: [(0, '7.578')] [2025-01-04 02:26:26,070][134294] Updated weights for policy 0, policy_version 70124 (0.0021) [2025-01-04 02:26:28,936][134294] Updated weights for policy 0, policy_version 70134 (0.0026) [2025-01-04 02:26:28,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15564.8, 300 sec: 15717.5). Total num frames: 287268864. Throughput: 0: 3872.1. Samples: 60987828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:28,968][134211] Avg episode reward: [(0, '7.378')] [2025-01-04 02:26:32,031][134294] Updated weights for policy 0, policy_version 70144 (0.0026) [2025-01-04 02:26:33,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15087.0, 300 sec: 15759.2). Total num frames: 287346688. Throughput: 0: 3867.2. Samples: 60997534. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:33,968][134211] Avg episode reward: [(0, '6.368')] [2025-01-04 02:26:34,106][134294] Updated weights for policy 0, policy_version 70154 (0.0014) [2025-01-04 02:26:35,973][134294] Updated weights for policy 0, policy_version 70164 (0.0013) [2025-01-04 02:26:37,876][134294] Updated weights for policy 0, policy_version 70174 (0.0013) [2025-01-04 02:26:38,968][134211] Fps is (10 sec: 18432.3, 60 sec: 15837.9, 300 sec: 15884.2). Total num frames: 287453184. Throughput: 0: 4080.9. Samples: 61027758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:38,968][134211] Avg episode reward: [(0, '7.136')] [2025-01-04 02:26:39,757][134294] Updated weights for policy 0, policy_version 70184 (0.0014) [2025-01-04 02:26:41,685][134294] Updated weights for policy 0, policy_version 70194 (0.0013) [2025-01-04 02:26:43,968][134211] Fps is (10 sec: 20070.1, 60 sec: 16315.9, 300 sec: 15828.6). Total num frames: 287547392. Throughput: 0: 4224.7. Samples: 61057782. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:43,968][134211] Avg episode reward: [(0, '7.309')] [2025-01-04 02:26:44,531][134294] Updated weights for policy 0, policy_version 70204 (0.0026) [2025-01-04 02:26:47,797][134294] Updated weights for policy 0, policy_version 70214 (0.0027) [2025-01-04 02:26:48,968][134211] Fps is (10 sec: 15564.6, 60 sec: 16179.2, 300 sec: 15745.3). Total num frames: 287608832. Throughput: 0: 4074.8. Samples: 61067132. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:48,968][134211] Avg episode reward: [(0, '7.173')] [2025-01-04 02:26:50,922][134294] Updated weights for policy 0, policy_version 70224 (0.0028) [2025-01-04 02:26:53,912][134294] Updated weights for policy 0, policy_version 70234 (0.0025) [2025-01-04 02:26:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 16179.1, 300 sec: 15773.1). Total num frames: 287678464. Throughput: 0: 3830.4. Samples: 61086780. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:53,968][134211] Avg episode reward: [(0, '6.572')] [2025-01-04 02:26:57,049][134294] Updated weights for policy 0, policy_version 70244 (0.0028) [2025-01-04 02:26:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15769.6, 300 sec: 15745.3). Total num frames: 287739904. Throughput: 0: 3844.8. Samples: 61106316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:26:58,968][134211] Avg episode reward: [(0, '7.083')] [2025-01-04 02:27:00,508][134294] Updated weights for policy 0, policy_version 70254 (0.0027) [2025-01-04 02:27:03,149][134294] Updated weights for policy 0, policy_version 70264 (0.0018) [2025-01-04 02:27:03,968][134211] Fps is (10 sec: 13926.8, 60 sec: 15291.8, 300 sec: 15745.3). Total num frames: 287817728. Throughput: 0: 3829.0. Samples: 61115218. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:03,968][134211] Avg episode reward: [(0, '6.876')] [2025-01-04 02:27:05,180][134294] Updated weights for policy 0, policy_version 70274 (0.0014) [2025-01-04 02:27:07,079][134294] Updated weights for policy 0, policy_version 70284 (0.0013) [2025-01-04 02:27:08,968][134211] Fps is (10 sec: 18022.5, 60 sec: 15564.8, 300 sec: 15870.3). Total num frames: 287920128. Throughput: 0: 4034.2. Samples: 61144550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:08,969][134211] Avg episode reward: [(0, '7.451')] [2025-01-04 02:27:09,032][134294] Updated weights for policy 0, policy_version 70294 (0.0014) [2025-01-04 02:27:12,004][134294] Updated weights for policy 0, policy_version 70304 (0.0025) [2025-01-04 02:27:13,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15701.3, 300 sec: 15745.3). Total num frames: 287989760. Throughput: 0: 4009.9. Samples: 61168276. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:13,968][134211] Avg episode reward: [(0, '7.233')] [2025-01-04 02:27:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000070310_287989760.pth... [2025-01-04 02:27:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000069390_284221440.pth [2025-01-04 02:27:15,235][134294] Updated weights for policy 0, policy_version 70314 (0.0028) [2025-01-04 02:27:18,254][134294] Updated weights for policy 0, policy_version 70324 (0.0023) [2025-01-04 02:27:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15701.3, 300 sec: 15606.4). Total num frames: 288055296. Throughput: 0: 4009.1. Samples: 61177942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:18,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 02:27:20,927][134294] Updated weights for policy 0, policy_version 70334 (0.0019) [2025-01-04 02:27:23,087][134294] Updated weights for policy 0, policy_version 70344 (0.0017) [2025-01-04 02:27:23,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15837.8, 300 sec: 15606.6). Total num frames: 288137216. Throughput: 0: 3870.9. Samples: 61201948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:23,968][134211] Avg episode reward: [(0, '7.411')] [2025-01-04 02:27:26,079][134294] Updated weights for policy 0, policy_version 70354 (0.0024) [2025-01-04 02:27:28,451][134294] Updated weights for policy 0, policy_version 70364 (0.0018) [2025-01-04 02:27:28,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15837.9, 300 sec: 15675.9). Total num frames: 288219136. Throughput: 0: 3702.4. Samples: 61224390. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:28,968][134211] Avg episode reward: [(0, '7.254')] [2025-01-04 02:27:30,357][134294] Updated weights for policy 0, policy_version 70374 (0.0013) [2025-01-04 02:27:32,764][134294] Updated weights for policy 0, policy_version 70384 (0.0021) [2025-01-04 02:27:33,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15974.3, 300 sec: 15759.2). Total num frames: 288305152. Throughput: 0: 3850.4. Samples: 61240402. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:33,969][134211] Avg episode reward: [(0, '7.431')] [2025-01-04 02:27:35,885][134294] Updated weights for policy 0, policy_version 70394 (0.0027) [2025-01-04 02:27:38,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15291.7, 300 sec: 15745.3). Total num frames: 288370688. Throughput: 0: 3868.0. Samples: 61260838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:38,968][134211] Avg episode reward: [(0, '6.875')] [2025-01-04 02:27:39,061][134294] Updated weights for policy 0, policy_version 70404 (0.0022) [2025-01-04 02:27:41,523][134294] Updated weights for policy 0, policy_version 70414 (0.0020) [2025-01-04 02:27:43,797][134294] Updated weights for policy 0, policy_version 70424 (0.0019) [2025-01-04 02:27:43,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15155.1, 300 sec: 15786.9). Total num frames: 288456704. Throughput: 0: 3973.5. Samples: 61285124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:27:43,969][134211] Avg episode reward: [(0, '7.615')] [2025-01-04 02:27:46,758][134294] Updated weights for policy 0, policy_version 70434 (0.0027) [2025-01-04 02:27:48,967][134211] Fps is (10 sec: 15565.1, 60 sec: 15291.8, 300 sec: 15648.1). Total num frames: 288526336. Throughput: 0: 4002.2. Samples: 61295318. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:27:48,968][134211] Avg episode reward: [(0, '6.441')] [2025-01-04 02:27:49,538][134294] Updated weights for policy 0, policy_version 70444 (0.0018) [2025-01-04 02:27:51,399][134294] Updated weights for policy 0, policy_version 70454 (0.0017) [2025-01-04 02:27:53,322][134294] Updated weights for policy 0, policy_version 70464 (0.0012) [2025-01-04 02:27:53,967][134211] Fps is (10 sec: 17613.8, 60 sec: 15906.2, 300 sec: 15703.7). Total num frames: 288632832. Throughput: 0: 3950.1. Samples: 61322306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:27:53,968][134211] Avg episode reward: [(0, '7.232')] [2025-01-04 02:27:55,203][134294] Updated weights for policy 0, policy_version 70474 (0.0014) [2025-01-04 02:27:57,073][134294] Updated weights for policy 0, policy_version 70484 (0.0013) [2025-01-04 02:27:58,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16588.8, 300 sec: 15856.4). Total num frames: 288735232. Throughput: 0: 4142.7. Samples: 61354696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:27:58,968][134211] Avg episode reward: [(0, '7.410')] [2025-01-04 02:27:59,389][134294] Updated weights for policy 0, policy_version 70494 (0.0020) [2025-01-04 02:28:02,639][134294] Updated weights for policy 0, policy_version 70504 (0.0030) [2025-01-04 02:28:03,968][134211] Fps is (10 sec: 16383.4, 60 sec: 16315.7, 300 sec: 15856.4). Total num frames: 288796672. Throughput: 0: 4156.2. Samples: 61364970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:28:03,969][134211] Avg episode reward: [(0, '6.624')] [2025-01-04 02:28:05,918][134294] Updated weights for policy 0, policy_version 70514 (0.0029) [2025-01-04 02:28:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15701.3, 300 sec: 15731.4). Total num frames: 288862208. Throughput: 0: 4040.5. Samples: 61383772. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:28:08,968][134211] Avg episode reward: [(0, '8.276')] [2025-01-04 02:28:09,056][134294] Updated weights for policy 0, policy_version 70524 (0.0024) [2025-01-04 02:28:12,109][134294] Updated weights for policy 0, policy_version 70534 (0.0028) [2025-01-04 02:28:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15633.1, 300 sec: 15675.9). Total num frames: 288927744. Throughput: 0: 3976.8. Samples: 61403348. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:28:13,968][134211] Avg episode reward: [(0, '7.079')] [2025-01-04 02:28:15,212][134294] Updated weights for policy 0, policy_version 70544 (0.0028) [2025-01-04 02:28:18,172][134294] Updated weights for policy 0, policy_version 70554 (0.0025) [2025-01-04 02:28:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15701.3, 300 sec: 15703.8). Total num frames: 288997376. Throughput: 0: 3850.3. Samples: 61413664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:28:18,968][134211] Avg episode reward: [(0, '6.772')] [2025-01-04 02:28:20,869][134294] Updated weights for policy 0, policy_version 70564 (0.0020) [2025-01-04 02:28:22,724][134294] Updated weights for policy 0, policy_version 70574 (0.0014) [2025-01-04 02:28:23,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15974.4, 300 sec: 15814.7). Total num frames: 289095680. Throughput: 0: 3935.1. Samples: 61437918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:28:23,968][134211] Avg episode reward: [(0, '6.663')] [2025-01-04 02:28:24,631][134294] Updated weights for policy 0, policy_version 70584 (0.0013) [2025-01-04 02:28:26,529][134294] Updated weights for policy 0, policy_version 70594 (0.0014) [2025-01-04 02:28:28,415][134294] Updated weights for policy 0, policy_version 70604 (0.0014) [2025-01-04 02:28:28,967][134211] Fps is (10 sec: 20890.1, 60 sec: 16452.3, 300 sec: 15898.0). Total num frames: 289206272. Throughput: 0: 4121.6. Samples: 61470592. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:28,968][134211] Avg episode reward: [(0, '7.359')] [2025-01-04 02:28:30,436][134294] Updated weights for policy 0, policy_version 70614 (0.0015) [2025-01-04 02:28:33,645][134294] Updated weights for policy 0, policy_version 70624 (0.0026) [2025-01-04 02:28:33,968][134211] Fps is (10 sec: 18431.4, 60 sec: 16247.4, 300 sec: 15786.9). Total num frames: 289280000. Throughput: 0: 4205.7. Samples: 61484578. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:33,969][134211] Avg episode reward: [(0, '6.755')] [2025-01-04 02:28:36,813][134294] Updated weights for policy 0, policy_version 70634 (0.0026) [2025-01-04 02:28:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 16179.2, 300 sec: 15703.7). Total num frames: 289341440. Throughput: 0: 4029.7. Samples: 61503642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:38,969][134211] Avg episode reward: [(0, '7.110')] [2025-01-04 02:28:40,043][134294] Updated weights for policy 0, policy_version 70644 (0.0026) [2025-01-04 02:28:43,157][134294] Updated weights for policy 0, policy_version 70654 (0.0026) [2025-01-04 02:28:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15837.9, 300 sec: 15703.7). Total num frames: 289406976. Throughput: 0: 3742.4. Samples: 61523106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:43,969][134211] Avg episode reward: [(0, '6.924')] [2025-01-04 02:28:46,236][134294] Updated weights for policy 0, policy_version 70664 (0.0027) [2025-01-04 02:28:48,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15837.8, 300 sec: 15717.5). Total num frames: 289476608. Throughput: 0: 3737.2. Samples: 61533144. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:48,968][134211] Avg episode reward: [(0, '6.961')] [2025-01-04 02:28:49,122][134294] Updated weights for policy 0, policy_version 70674 (0.0023) [2025-01-04 02:28:50,990][134294] Updated weights for policy 0, policy_version 70684 (0.0012) [2025-01-04 02:28:52,866][134294] Updated weights for policy 0, policy_version 70694 (0.0012) [2025-01-04 02:28:53,967][134211] Fps is (10 sec: 17613.4, 60 sec: 15837.9, 300 sec: 15870.3). Total num frames: 289583104. Throughput: 0: 3919.4. Samples: 61560144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:53,968][134211] Avg episode reward: [(0, '7.023')] [2025-01-04 02:28:54,753][134294] Updated weights for policy 0, policy_version 70704 (0.0012) [2025-01-04 02:28:56,646][134294] Updated weights for policy 0, policy_version 70714 (0.0013) [2025-01-04 02:28:58,554][134294] Updated weights for policy 0, policy_version 70724 (0.0013) [2025-01-04 02:28:58,968][134211] Fps is (10 sec: 21299.1, 60 sec: 15906.1, 300 sec: 15981.3). Total num frames: 289689600. Throughput: 0: 4209.1. Samples: 61592758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:28:58,968][134211] Avg episode reward: [(0, '6.367')] [2025-01-04 02:29:01,999][134294] Updated weights for policy 0, policy_version 70734 (0.0029) [2025-01-04 02:29:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15769.6, 300 sec: 15800.8). Total num frames: 289742848. Throughput: 0: 4201.1. Samples: 61602716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:29:03,969][134211] Avg episode reward: [(0, '7.317')] [2025-01-04 02:29:05,798][134294] Updated weights for policy 0, policy_version 70744 (0.0031) [2025-01-04 02:29:08,968][134211] Fps is (10 sec: 11468.6, 60 sec: 15701.3, 300 sec: 15648.1). Total num frames: 289804288. Throughput: 0: 4034.8. Samples: 61619486. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:29:08,968][134211] Avg episode reward: [(0, '7.415')] [2025-01-04 02:29:09,108][134294] Updated weights for policy 0, policy_version 70754 (0.0028) [2025-01-04 02:29:11,643][134294] Updated weights for policy 0, policy_version 70764 (0.0016) [2025-01-04 02:29:13,512][134294] Updated weights for policy 0, policy_version 70774 (0.0015) [2025-01-04 02:29:13,968][134211] Fps is (10 sec: 15565.2, 60 sec: 16179.2, 300 sec: 15745.3). Total num frames: 289898496. Throughput: 0: 3853.1. Samples: 61643984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:13,968][134211] Avg episode reward: [(0, '7.528')] [2025-01-04 02:29:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000070776_289898496.pth... [2025-01-04 02:29:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000069847_286093312.pth [2025-01-04 02:29:15,404][134294] Updated weights for policy 0, policy_version 70784 (0.0012) [2025-01-04 02:29:17,637][134294] Updated weights for policy 0, policy_version 70794 (0.0018) [2025-01-04 02:29:18,968][134211] Fps is (10 sec: 18431.9, 60 sec: 16520.5, 300 sec: 15828.6). Total num frames: 289988608. Throughput: 0: 3903.3. Samples: 61660224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:18,969][134211] Avg episode reward: [(0, '7.073')] [2025-01-04 02:29:20,769][134294] Updated weights for policy 0, policy_version 70804 (0.0028) [2025-01-04 02:29:23,926][134294] Updated weights for policy 0, policy_version 70814 (0.0026) [2025-01-04 02:29:23,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15974.3, 300 sec: 15828.6). Total num frames: 290054144. Throughput: 0: 3941.3. Samples: 61681000. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:23,969][134211] Avg episode reward: [(0, '7.068')] [2025-01-04 02:29:27,011][134294] Updated weights for policy 0, policy_version 70824 (0.0025) [2025-01-04 02:29:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.4, 300 sec: 15814.7). Total num frames: 290119680. Throughput: 0: 3950.0. Samples: 61700856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:28,968][134211] Avg episode reward: [(0, '6.907')] [2025-01-04 02:29:30,115][134294] Updated weights for policy 0, policy_version 70834 (0.0025) [2025-01-04 02:29:32,534][134294] Updated weights for policy 0, policy_version 70844 (0.0020) [2025-01-04 02:29:33,968][134211] Fps is (10 sec: 15155.6, 60 sec: 15428.4, 300 sec: 15828.6). Total num frames: 290205696. Throughput: 0: 3956.0. Samples: 61711162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:33,968][134211] Avg episode reward: [(0, '7.128')] [2025-01-04 02:29:34,446][134294] Updated weights for policy 0, policy_version 70854 (0.0013) [2025-01-04 02:29:36,407][134294] Updated weights for policy 0, policy_version 70864 (0.0012) [2025-01-04 02:29:38,339][134294] Updated weights for policy 0, policy_version 70874 (0.0014) [2025-01-04 02:29:38,967][134211] Fps is (10 sec: 19251.7, 60 sec: 16179.3, 300 sec: 15828.7). Total num frames: 290312192. Throughput: 0: 4042.0. Samples: 61742034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:38,968][134211] Avg episode reward: [(0, '6.704')] [2025-01-04 02:29:40,257][134294] Updated weights for policy 0, policy_version 70884 (0.0013) [2025-01-04 02:29:42,571][134294] Updated weights for policy 0, policy_version 70894 (0.0017) [2025-01-04 02:29:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 16520.5, 300 sec: 15870.3). Total num frames: 290398208. Throughput: 0: 3951.0. Samples: 61770556. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:43,969][134211] Avg episode reward: [(0, '7.827')] [2025-01-04 02:29:45,899][134294] Updated weights for policy 0, policy_version 70904 (0.0029) [2025-01-04 02:29:48,970][134211] Fps is (10 sec: 14741.7, 60 sec: 16383.3, 300 sec: 15856.4). Total num frames: 290459648. Throughput: 0: 3935.0. Samples: 61779800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:48,972][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 02:29:48,990][134294] Updated weights for policy 0, policy_version 70914 (0.0028) [2025-01-04 02:29:52,065][134294] Updated weights for policy 0, policy_version 70924 (0.0026) [2025-01-04 02:29:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.3, 300 sec: 15870.3). Total num frames: 290525184. Throughput: 0: 4002.8. Samples: 61799612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:29:53,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 02:29:55,188][134294] Updated weights for policy 0, policy_version 70934 (0.0027) [2025-01-04 02:29:58,170][134294] Updated weights for policy 0, policy_version 70944 (0.0026) [2025-01-04 02:29:58,968][134211] Fps is (10 sec: 13519.4, 60 sec: 15086.8, 300 sec: 15814.7). Total num frames: 290594816. Throughput: 0: 3908.7. Samples: 61819880. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:29:58,969][134211] Avg episode reward: [(0, '7.397')] [2025-01-04 02:30:01,189][134294] Updated weights for policy 0, policy_version 70954 (0.0027) [2025-01-04 02:30:03,949][134294] Updated weights for policy 0, policy_version 70964 (0.0020) [2025-01-04 02:30:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15428.3, 300 sec: 15773.1). Total num frames: 290668544. Throughput: 0: 3775.7. Samples: 61830128. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:03,968][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 02:30:05,899][134294] Updated weights for policy 0, policy_version 70974 (0.0014) [2025-01-04 02:30:07,775][134294] Updated weights for policy 0, policy_version 70984 (0.0014) [2025-01-04 02:30:08,968][134211] Fps is (10 sec: 18023.6, 60 sec: 16179.3, 300 sec: 15884.2). Total num frames: 290775040. Throughput: 0: 3927.4. Samples: 61857730. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:08,968][134211] Avg episode reward: [(0, '6.976')] [2025-01-04 02:30:09,615][134294] Updated weights for policy 0, policy_version 70994 (0.0013) [2025-01-04 02:30:11,497][134294] Updated weights for policy 0, policy_version 71004 (0.0013) [2025-01-04 02:30:13,501][134294] Updated weights for policy 0, policy_version 71014 (0.0018) [2025-01-04 02:30:13,968][134211] Fps is (10 sec: 20889.0, 60 sec: 16315.7, 300 sec: 15995.3). Total num frames: 290877440. Throughput: 0: 4207.1. Samples: 61890174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:13,968][134211] Avg episode reward: [(0, '7.049')] [2025-01-04 02:30:16,521][134294] Updated weights for policy 0, policy_version 71024 (0.0026) [2025-01-04 02:30:18,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15906.1, 300 sec: 15995.2). Total num frames: 290942976. Throughput: 0: 4210.8. Samples: 61900650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:18,968][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 02:30:19,839][134294] Updated weights for policy 0, policy_version 71034 (0.0032) [2025-01-04 02:30:23,043][134294] Updated weights for policy 0, policy_version 71044 (0.0026) [2025-01-04 02:30:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15837.9, 300 sec: 15828.6). Total num frames: 291004416. Throughput: 0: 3945.2. Samples: 61919568. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:23,969][134211] Avg episode reward: [(0, '7.786')] [2025-01-04 02:30:25,991][134294] Updated weights for policy 0, policy_version 71054 (0.0026) [2025-01-04 02:30:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15906.1, 300 sec: 15703.7). Total num frames: 291074048. Throughput: 0: 3752.0. Samples: 61939394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:28,968][134211] Avg episode reward: [(0, '7.905')] [2025-01-04 02:30:29,226][134294] Updated weights for policy 0, policy_version 71064 (0.0026) [2025-01-04 02:30:31,789][134294] Updated weights for policy 0, policy_version 71074 (0.0019) [2025-01-04 02:30:33,694][134294] Updated weights for policy 0, policy_version 71084 (0.0013) [2025-01-04 02:30:33,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15974.3, 300 sec: 15800.8). Total num frames: 291164160. Throughput: 0: 3777.4. Samples: 61949776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:30:33,968][134211] Avg episode reward: [(0, '7.975')] [2025-01-04 02:30:35,603][134294] Updated weights for policy 0, policy_version 71094 (0.0013) [2025-01-04 02:30:37,409][134294] Updated weights for policy 0, policy_version 71104 (0.0015) [2025-01-04 02:30:38,968][134211] Fps is (10 sec: 19661.3, 60 sec: 15974.4, 300 sec: 15939.7). Total num frames: 291270656. Throughput: 0: 4058.6. Samples: 61982248. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:30:38,968][134211] Avg episode reward: [(0, '7.795')] [2025-01-04 02:30:39,355][134294] Updated weights for policy 0, policy_version 71114 (0.0014) [2025-01-04 02:30:41,230][134294] Updated weights for policy 0, policy_version 71124 (0.0016) [2025-01-04 02:30:43,112][134294] Updated weights for policy 0, policy_version 71134 (0.0013) [2025-01-04 02:30:43,968][134211] Fps is (10 sec: 21709.3, 60 sec: 16384.0, 300 sec: 16078.5). Total num frames: 291381248. Throughput: 0: 4329.3. Samples: 62014694. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:30:43,968][134211] Avg episode reward: [(0, '7.733')] [2025-01-04 02:30:45,607][134294] Updated weights for policy 0, policy_version 71144 (0.0021) [2025-01-04 02:30:48,820][134294] Updated weights for policy 0, policy_version 71154 (0.0028) [2025-01-04 02:30:48,968][134211] Fps is (10 sec: 17611.2, 60 sec: 16452.7, 300 sec: 16064.6). Total num frames: 291446784. Throughput: 0: 4364.6. Samples: 62026540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:30:48,969][134211] Avg episode reward: [(0, '7.397')] [2025-01-04 02:30:52,349][134294] Updated weights for policy 0, policy_version 71164 (0.0026) [2025-01-04 02:30:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 16384.0, 300 sec: 15981.3). Total num frames: 291508224. Throughput: 0: 4158.6. Samples: 62044868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:30:53,968][134211] Avg episode reward: [(0, '7.190')] [2025-01-04 02:30:55,448][134294] Updated weights for policy 0, policy_version 71174 (0.0025) [2025-01-04 02:30:58,378][134294] Updated weights for policy 0, policy_version 71184 (0.0025) [2025-01-04 02:30:58,968][134211] Fps is (10 sec: 12698.6, 60 sec: 16315.9, 300 sec: 15842.5). Total num frames: 291573760. Throughput: 0: 3880.1. Samples: 62064776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:30:58,968][134211] Avg episode reward: [(0, '6.584')] [2025-01-04 02:31:01,896][134294] Updated weights for policy 0, policy_version 71194 (0.0023) [2025-01-04 02:31:03,968][134211] Fps is (10 sec: 12287.6, 60 sec: 16042.5, 300 sec: 15745.3). Total num frames: 291631104. Throughput: 0: 3849.5. Samples: 62073880. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:31:03,969][134211] Avg episode reward: [(0, '6.730')] [2025-01-04 02:31:05,561][134294] Updated weights for policy 0, policy_version 71204 (0.0029) [2025-01-04 02:31:07,986][134294] Updated weights for policy 0, policy_version 71214 (0.0016) [2025-01-04 02:31:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15564.8, 300 sec: 15800.8). Total num frames: 291708928. Throughput: 0: 3839.5. Samples: 62092346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:31:08,968][134211] Avg episode reward: [(0, '6.564')] [2025-01-04 02:31:09,926][134294] Updated weights for policy 0, policy_version 71224 (0.0014) [2025-01-04 02:31:11,806][134294] Updated weights for policy 0, policy_version 71234 (0.0015) [2025-01-04 02:31:13,723][134294] Updated weights for policy 0, policy_version 71244 (0.0016) [2025-01-04 02:31:13,968][134211] Fps is (10 sec: 18842.7, 60 sec: 15701.4, 300 sec: 15953.6). Total num frames: 291819520. Throughput: 0: 4109.0. Samples: 62124296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:31:13,968][134211] Avg episode reward: [(0, '6.903')] [2025-01-04 02:31:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000071245_291819520.pth... [2025-01-04 02:31:14,028][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000070310_287989760.pth [2025-01-04 02:31:15,605][134294] Updated weights for policy 0, policy_version 71254 (0.0015) [2025-01-04 02:31:17,535][134294] Updated weights for policy 0, policy_version 71264 (0.0012) [2025-01-04 02:31:18,968][134211] Fps is (10 sec: 21708.7, 60 sec: 16384.0, 300 sec: 16064.6). Total num frames: 291926016. Throughput: 0: 4233.0. Samples: 62140258. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:31:18,968][134211] Avg episode reward: [(0, '6.913')] [2025-01-04 02:31:19,472][134294] Updated weights for policy 0, policy_version 71274 (0.0017) [2025-01-04 02:31:22,510][134294] Updated weights for policy 0, policy_version 71284 (0.0028) [2025-01-04 02:31:23,968][134211] Fps is (10 sec: 17612.4, 60 sec: 16520.5, 300 sec: 16023.0). Total num frames: 291995648. Throughput: 0: 4108.7. Samples: 62167142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:23,969][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 02:31:25,756][134294] Updated weights for policy 0, policy_version 71294 (0.0030) [2025-01-04 02:31:28,869][134294] Updated weights for policy 0, policy_version 71304 (0.0030) [2025-01-04 02:31:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 16452.3, 300 sec: 15981.3). Total num frames: 292061184. Throughput: 0: 3811.1. Samples: 62186196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:28,968][134211] Avg episode reward: [(0, '7.299')] [2025-01-04 02:31:32,044][134294] Updated weights for policy 0, policy_version 71314 (0.0029) [2025-01-04 02:31:33,969][134211] Fps is (10 sec: 13105.2, 60 sec: 16042.3, 300 sec: 15842.4). Total num frames: 292126720. Throughput: 0: 3763.5. Samples: 62195900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:33,970][134211] Avg episode reward: [(0, '7.069')] [2025-01-04 02:31:35,083][134294] Updated weights for policy 0, policy_version 71324 (0.0029) [2025-01-04 02:31:38,285][134294] Updated weights for policy 0, policy_version 71334 (0.0029) [2025-01-04 02:31:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15359.9, 300 sec: 15745.3). Total num frames: 292192256. Throughput: 0: 3807.8. Samples: 62216218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:38,968][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 02:31:41,455][134294] Updated weights for policy 0, policy_version 71344 (0.0027) [2025-01-04 02:31:43,968][134211] Fps is (10 sec: 12699.7, 60 sec: 14540.8, 300 sec: 15745.3). Total num frames: 292253696. Throughput: 0: 3775.9. Samples: 62234692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:43,969][134211] Avg episode reward: [(0, '8.013')] [2025-01-04 02:31:44,728][134294] Updated weights for policy 0, policy_version 71354 (0.0027) [2025-01-04 02:31:46,889][134294] Updated weights for policy 0, policy_version 71364 (0.0014) [2025-01-04 02:31:48,873][134294] Updated weights for policy 0, policy_version 71374 (0.0015) [2025-01-04 02:31:48,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15018.9, 300 sec: 15828.6). Total num frames: 292347904. Throughput: 0: 3833.1. Samples: 62246366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:48,968][134211] Avg episode reward: [(0, '7.108')] [2025-01-04 02:31:51,606][134294] Updated weights for policy 0, policy_version 71384 (0.0023) [2025-01-04 02:31:53,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15155.2, 300 sec: 15856.4). Total num frames: 292417536. Throughput: 0: 4001.3. Samples: 62272404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:53,968][134211] Avg episode reward: [(0, '7.087')] [2025-01-04 02:31:54,749][134294] Updated weights for policy 0, policy_version 71394 (0.0023) [2025-01-04 02:31:57,346][134294] Updated weights for policy 0, policy_version 71404 (0.0019) [2025-01-04 02:31:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15496.6, 300 sec: 15884.2). Total num frames: 292503552. Throughput: 0: 3797.6. Samples: 62295186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:31:58,968][134211] Avg episode reward: [(0, '7.638')] [2025-01-04 02:31:59,295][134294] Updated weights for policy 0, policy_version 71414 (0.0014) [2025-01-04 02:32:01,199][134294] Updated weights for policy 0, policy_version 71424 (0.0014) [2025-01-04 02:32:03,113][134294] Updated weights for policy 0, policy_version 71434 (0.0012) [2025-01-04 02:32:03,967][134211] Fps is (10 sec: 19251.7, 60 sec: 16315.9, 300 sec: 15898.0). Total num frames: 292610048. Throughput: 0: 3801.4. Samples: 62311320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:32:03,968][134211] Avg episode reward: [(0, '7.729')] [2025-01-04 02:32:05,039][134294] Updated weights for policy 0, policy_version 71444 (0.0013) [2025-01-04 02:32:06,960][134294] Updated weights for policy 0, policy_version 71454 (0.0013) [2025-01-04 02:32:08,968][134211] Fps is (10 sec: 20069.0, 60 sec: 16588.6, 300 sec: 15981.3). Total num frames: 292704256. Throughput: 0: 3911.8. Samples: 62343176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:08,969][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 02:32:09,913][134294] Updated weights for policy 0, policy_version 71464 (0.0023) [2025-01-04 02:32:13,579][134294] Updated weights for policy 0, policy_version 71474 (0.0031) [2025-01-04 02:32:13,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15701.3, 300 sec: 15953.6). Total num frames: 292761600. Throughput: 0: 3896.9. Samples: 62361558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:13,969][134211] Avg episode reward: [(0, '7.795')] [2025-01-04 02:32:16,890][134294] Updated weights for policy 0, policy_version 71484 (0.0027) [2025-01-04 02:32:18,968][134211] Fps is (10 sec: 12288.6, 60 sec: 15018.6, 300 sec: 15898.0). Total num frames: 292827136. Throughput: 0: 3882.1. Samples: 62370588. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:18,968][134211] Avg episode reward: [(0, '7.797')] [2025-01-04 02:32:19,971][134294] Updated weights for policy 0, policy_version 71494 (0.0026) [2025-01-04 02:32:23,081][134294] Updated weights for policy 0, policy_version 71504 (0.0028) [2025-01-04 02:32:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14882.1, 300 sec: 15828.6). Total num frames: 292888576. Throughput: 0: 3862.1. Samples: 62390014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:23,968][134211] Avg episode reward: [(0, '7.919')] [2025-01-04 02:32:26,290][134294] Updated weights for policy 0, policy_version 71514 (0.0024) [2025-01-04 02:32:28,969][134211] Fps is (10 sec: 12696.6, 60 sec: 14882.0, 300 sec: 15759.1). Total num frames: 292954112. Throughput: 0: 3889.8. Samples: 62409734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:28,969][134211] Avg episode reward: [(0, '7.261')] [2025-01-04 02:32:29,413][134294] Updated weights for policy 0, policy_version 71524 (0.0025) [2025-01-04 02:32:31,986][134294] Updated weights for policy 0, policy_version 71534 (0.0017) [2025-01-04 02:32:33,950][134294] Updated weights for policy 0, policy_version 71544 (0.0014) [2025-01-04 02:32:33,968][134211] Fps is (10 sec: 15565.2, 60 sec: 15292.2, 300 sec: 15842.5). Total num frames: 293044224. Throughput: 0: 3852.4. Samples: 62419724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:33,968][134211] Avg episode reward: [(0, '7.651')] [2025-01-04 02:32:35,884][134294] Updated weights for policy 0, policy_version 71554 (0.0013) [2025-01-04 02:32:37,836][134294] Updated weights for policy 0, policy_version 71564 (0.0013) [2025-01-04 02:32:38,968][134211] Fps is (10 sec: 19252.9, 60 sec: 15906.2, 300 sec: 15898.1). Total num frames: 293146624. Throughput: 0: 3975.3. Samples: 62451294. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:38,968][134211] Avg episode reward: [(0, '7.281')] [2025-01-04 02:32:40,493][134294] Updated weights for policy 0, policy_version 71574 (0.0022) [2025-01-04 02:32:43,772][134294] Updated weights for policy 0, policy_version 71584 (0.0027) [2025-01-04 02:32:43,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15906.1, 300 sec: 15870.2). Total num frames: 293208064. Throughput: 0: 3962.5. Samples: 62473498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:43,969][134211] Avg episode reward: [(0, '7.450')] [2025-01-04 02:32:46,925][134294] Updated weights for policy 0, policy_version 71594 (0.0028) [2025-01-04 02:32:48,967][134211] Fps is (10 sec: 12697.8, 60 sec: 15428.3, 300 sec: 15731.4). Total num frames: 293273600. Throughput: 0: 3811.2. Samples: 62482826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:48,968][134211] Avg episode reward: [(0, '7.168')] [2025-01-04 02:32:49,597][134294] Updated weights for policy 0, policy_version 71604 (0.0017) [2025-01-04 02:32:51,493][134294] Updated weights for policy 0, policy_version 71614 (0.0014) [2025-01-04 02:32:53,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15837.8, 300 sec: 15703.6). Total num frames: 293367808. Throughput: 0: 3691.1. Samples: 62509276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:53,969][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 02:32:54,120][134294] Updated weights for policy 0, policy_version 71624 (0.0023) [2025-01-04 02:32:57,434][134294] Updated weights for policy 0, policy_version 71634 (0.0031) [2025-01-04 02:32:58,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15428.2, 300 sec: 15703.7). Total num frames: 293429248. Throughput: 0: 3713.0. Samples: 62528642. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:32:58,968][134211] Avg episode reward: [(0, '6.747')] [2025-01-04 02:33:00,427][134294] Updated weights for policy 0, policy_version 71644 (0.0021) [2025-01-04 02:33:02,598][134294] Updated weights for policy 0, policy_version 71654 (0.0012) [2025-01-04 02:33:03,968][134211] Fps is (10 sec: 14746.1, 60 sec: 15086.9, 300 sec: 15773.1). Total num frames: 293515264. Throughput: 0: 3774.8. Samples: 62540452. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:03,968][134211] Avg episode reward: [(0, '6.661')] [2025-01-04 02:33:05,621][134294] Updated weights for policy 0, policy_version 71664 (0.0023) [2025-01-04 02:33:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14472.7, 300 sec: 15745.3). Total num frames: 293572608. Throughput: 0: 3821.8. Samples: 62561996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:08,968][134211] Avg episode reward: [(0, '6.890')] [2025-01-04 02:33:09,013][134294] Updated weights for policy 0, policy_version 71674 (0.0024) [2025-01-04 02:33:10,947][134294] Updated weights for policy 0, policy_version 71684 (0.0014) [2025-01-04 02:33:12,905][134294] Updated weights for policy 0, policy_version 71694 (0.0014) [2025-01-04 02:33:13,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15291.8, 300 sec: 15870.3). Total num frames: 293679104. Throughput: 0: 3993.8. Samples: 62589452. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:13,968][134211] Avg episode reward: [(0, '7.205')] [2025-01-04 02:33:14,014][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000071700_293683200.pth... 
[2025-01-04 02:33:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000070776_289898496.pth [2025-01-04 02:33:14,839][134294] Updated weights for policy 0, policy_version 71704 (0.0014) [2025-01-04 02:33:16,745][134294] Updated weights for policy 0, policy_version 71714 (0.0014) [2025-01-04 02:33:18,649][134294] Updated weights for policy 0, policy_version 71724 (0.0012) [2025-01-04 02:33:18,967][134211] Fps is (10 sec: 21299.4, 60 sec: 15974.5, 300 sec: 15898.0). Total num frames: 293785600. Throughput: 0: 4123.3. Samples: 62605272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:18,968][134211] Avg episode reward: [(0, '6.866')] [2025-01-04 02:33:21,088][134294] Updated weights for policy 0, policy_version 71734 (0.0020) [2025-01-04 02:33:23,969][134211] Fps is (10 sec: 17611.2, 60 sec: 16110.7, 300 sec: 15759.1). Total num frames: 293855232. Throughput: 0: 4017.3. Samples: 62632076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:23,969][134211] Avg episode reward: [(0, '7.019')] [2025-01-04 02:33:24,529][134294] Updated weights for policy 0, policy_version 71744 (0.0027) [2025-01-04 02:33:27,776][134294] Updated weights for policy 0, policy_version 71754 (0.0025) [2025-01-04 02:33:28,968][134211] Fps is (10 sec: 13106.9, 60 sec: 16042.9, 300 sec: 15717.5). Total num frames: 293916672. Throughput: 0: 3934.2. Samples: 62650536. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:33:28,968][134211] Avg episode reward: [(0, '6.768')] [2025-01-04 02:33:30,960][134294] Updated weights for policy 0, policy_version 71764 (0.0025) [2025-01-04 02:33:33,968][134211] Fps is (10 sec: 12698.7, 60 sec: 15633.0, 300 sec: 15731.4). Total num frames: 293982208. Throughput: 0: 3942.5. Samples: 62660240. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:33,968][134211] Avg episode reward: [(0, '6.939')] [2025-01-04 02:33:34,096][134294] Updated weights for policy 0, policy_version 71774 (0.0026) [2025-01-04 02:33:37,343][134294] Updated weights for policy 0, policy_version 71784 (0.0028) [2025-01-04 02:33:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14950.4, 300 sec: 15717.5). Total num frames: 294043648. Throughput: 0: 3782.8. Samples: 62679500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:38,968][134211] Avg episode reward: [(0, '6.964')] [2025-01-04 02:33:40,787][134294] Updated weights for policy 0, policy_version 71794 (0.0027) [2025-01-04 02:33:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14950.5, 300 sec: 15689.8). Total num frames: 294105088. Throughput: 0: 3759.7. Samples: 62697830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:43,968][134211] Avg episode reward: [(0, '7.022')] [2025-01-04 02:33:44,045][134294] Updated weights for policy 0, policy_version 71804 (0.0028) [2025-01-04 02:33:46,261][134294] Updated weights for policy 0, policy_version 71814 (0.0013) [2025-01-04 02:33:48,260][134294] Updated weights for policy 0, policy_version 71824 (0.0013) [2025-01-04 02:33:48,968][134211] Fps is (10 sec: 15974.5, 60 sec: 15496.5, 300 sec: 15662.0). Total num frames: 294203392. Throughput: 0: 3777.6. Samples: 62710442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:48,968][134211] Avg episode reward: [(0, '7.156')] [2025-01-04 02:33:50,280][134294] Updated weights for policy 0, policy_version 71834 (0.0013) [2025-01-04 02:33:52,258][134294] Updated weights for policy 0, policy_version 71844 (0.0014) [2025-01-04 02:33:53,968][134211] Fps is (10 sec: 20070.6, 60 sec: 15633.2, 300 sec: 15648.1). Total num frames: 294305792. Throughput: 0: 3974.8. Samples: 62740864. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:53,968][134211] Avg episode reward: [(0, '7.052')] [2025-01-04 02:33:54,288][134294] Updated weights for policy 0, policy_version 71854 (0.0012) [2025-01-04 02:33:56,716][134294] Updated weights for policy 0, policy_version 71864 (0.0019) [2025-01-04 02:33:58,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15837.9, 300 sec: 15717.5). Total num frames: 294379520. Throughput: 0: 3927.4. Samples: 62766184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:33:58,969][134211] Avg episode reward: [(0, '6.453')] [2025-01-04 02:34:00,207][134294] Updated weights for policy 0, policy_version 71874 (0.0026) [2025-01-04 02:34:03,671][134294] Updated weights for policy 0, policy_version 71884 (0.0029) [2025-01-04 02:34:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15360.0, 300 sec: 15703.6). Total num frames: 294436864. Throughput: 0: 3769.7. Samples: 62774910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:34:03,970][134211] Avg episode reward: [(0, '7.249')] [2025-01-04 02:34:07,286][134294] Updated weights for policy 0, policy_version 71894 (0.0026) [2025-01-04 02:34:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 15360.0, 300 sec: 15578.7). Total num frames: 294494208. Throughput: 0: 3559.1. Samples: 62792234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:34:08,968][134211] Avg episode reward: [(0, '7.348')] [2025-01-04 02:34:10,611][134294] Updated weights for policy 0, policy_version 71904 (0.0026) [2025-01-04 02:34:12,817][134294] Updated weights for policy 0, policy_version 71914 (0.0014) [2025-01-04 02:34:13,968][134211] Fps is (10 sec: 14336.4, 60 sec: 15018.7, 300 sec: 15564.8). Total num frames: 294580224. Throughput: 0: 3654.6. Samples: 62814992. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 02:34:13,968][134211] Avg episode reward: [(0, '6.696')] [2025-01-04 02:34:14,837][134294] Updated weights for policy 0, policy_version 71924 (0.0014) [2025-01-04 02:34:16,931][134294] Updated weights for policy 0, policy_version 71934 (0.0013) [2025-01-04 02:34:18,968][134211] Fps is (10 sec: 18022.6, 60 sec: 14813.8, 300 sec: 15662.0). Total num frames: 294674432. Throughput: 0: 3775.6. Samples: 62830144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:18,968][134211] Avg episode reward: [(0, '7.102')] [2025-01-04 02:34:19,579][134294] Updated weights for policy 0, policy_version 71944 (0.0019) [2025-01-04 02:34:23,172][134294] Updated weights for policy 0, policy_version 71954 (0.0028) [2025-01-04 02:34:23,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14609.2, 300 sec: 15634.2). Total num frames: 294731776. Throughput: 0: 3815.8. Samples: 62851212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:23,969][134211] Avg episode reward: [(0, '7.223')] [2025-01-04 02:34:26,472][134294] Updated weights for policy 0, policy_version 71964 (0.0025) [2025-01-04 02:34:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14609.1, 300 sec: 15550.9). Total num frames: 294793216. Throughput: 0: 3820.4. Samples: 62869750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:28,968][134211] Avg episode reward: [(0, '6.672')] [2025-01-04 02:34:29,553][134294] Updated weights for policy 0, policy_version 71974 (0.0024) [2025-01-04 02:34:31,587][134294] Updated weights for policy 0, policy_version 71984 (0.0012) [2025-01-04 02:34:33,467][134294] Updated weights for policy 0, policy_version 71994 (0.0015) [2025-01-04 02:34:33,968][134211] Fps is (10 sec: 16384.4, 60 sec: 15223.5, 300 sec: 15537.0). Total num frames: 294895616. Throughput: 0: 3833.7. Samples: 62882960. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:33,968][134211] Avg episode reward: [(0, '6.737')] [2025-01-04 02:34:35,424][134294] Updated weights for policy 0, policy_version 72004 (0.0012) [2025-01-04 02:34:37,436][134294] Updated weights for policy 0, policy_version 72014 (0.0013) [2025-01-04 02:34:38,968][134211] Fps is (10 sec: 20070.5, 60 sec: 15837.9, 300 sec: 15578.7). Total num frames: 294993920. Throughput: 0: 3856.2. Samples: 62914392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:38,968][134211] Avg episode reward: [(0, '7.202')] [2025-01-04 02:34:39,901][134294] Updated weights for policy 0, policy_version 72024 (0.0021) [2025-01-04 02:34:43,459][134294] Updated weights for policy 0, policy_version 72034 (0.0029) [2025-01-04 02:34:43,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15837.8, 300 sec: 15578.8). Total num frames: 295055360. Throughput: 0: 3755.1. Samples: 62935166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:43,969][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 02:34:46,912][134294] Updated weights for policy 0, policy_version 72044 (0.0027) [2025-01-04 02:34:48,969][134211] Fps is (10 sec: 12286.0, 60 sec: 15223.0, 300 sec: 15564.7). Total num frames: 295116800. Throughput: 0: 3760.1. Samples: 62944120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:48,970][134211] Avg episode reward: [(0, '6.987')] [2025-01-04 02:34:50,335][134294] Updated weights for policy 0, policy_version 72054 (0.0026) [2025-01-04 02:34:53,434][134294] Updated weights for policy 0, policy_version 72064 (0.0028) [2025-01-04 02:34:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14540.7, 300 sec: 15537.1). Total num frames: 295178240. Throughput: 0: 3786.3. Samples: 62962616. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:53,968][134211] Avg episode reward: [(0, '7.056')] [2025-01-04 02:34:56,515][134294] Updated weights for policy 0, policy_version 72074 (0.0026) [2025-01-04 02:34:58,968][134211] Fps is (10 sec: 12699.3, 60 sec: 14404.2, 300 sec: 15509.2). Total num frames: 295243776. Throughput: 0: 3711.4. Samples: 62982004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:34:58,969][134211] Avg episode reward: [(0, '7.970')] [2025-01-04 02:34:59,784][134294] Updated weights for policy 0, policy_version 72084 (0.0023) [2025-01-04 02:35:02,486][134294] Updated weights for policy 0, policy_version 72094 (0.0016) [2025-01-04 02:35:03,968][134211] Fps is (10 sec: 14336.4, 60 sec: 14745.7, 300 sec: 15412.1). Total num frames: 295321600. Throughput: 0: 3585.9. Samples: 62991510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:03,968][134211] Avg episode reward: [(0, '7.095')] [2025-01-04 02:35:04,606][134294] Updated weights for policy 0, policy_version 72104 (0.0012) [2025-01-04 02:35:06,691][134294] Updated weights for policy 0, policy_version 72114 (0.0014) [2025-01-04 02:35:08,712][134294] Updated weights for policy 0, policy_version 72124 (0.0014) [2025-01-04 02:35:08,967][134211] Fps is (10 sec: 18023.1, 60 sec: 15496.6, 300 sec: 15412.1). Total num frames: 295424000. Throughput: 0: 3757.5. Samples: 63020296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:08,968][134211] Avg episode reward: [(0, '7.061')] [2025-01-04 02:35:10,690][134294] Updated weights for policy 0, policy_version 72134 (0.0013) [2025-01-04 02:35:13,297][134294] Updated weights for policy 0, policy_version 72144 (0.0024) [2025-01-04 02:35:13,970][134211] Fps is (10 sec: 18428.5, 60 sec: 15427.8, 300 sec: 15467.5). Total num frames: 295505920. Throughput: 0: 3966.6. Samples: 63048256. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:13,970][134211] Avg episode reward: [(0, '7.366')] [2025-01-04 02:35:13,998][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000072146_295510016.pth... [2025-01-04 02:35:14,080][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000071245_291819520.pth [2025-01-04 02:35:16,779][134294] Updated weights for policy 0, policy_version 72154 (0.0030) [2025-01-04 02:35:18,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14950.4, 300 sec: 15481.5). Total num frames: 295571456. Throughput: 0: 3865.9. Samples: 63056926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:18,968][134211] Avg episode reward: [(0, '6.992')] [2025-01-04 02:35:20,003][134294] Updated weights for policy 0, policy_version 72164 (0.0027) [2025-01-04 02:35:23,090][134294] Updated weights for policy 0, policy_version 72174 (0.0028) [2025-01-04 02:35:23,968][134211] Fps is (10 sec: 12699.7, 60 sec: 15018.7, 300 sec: 15453.7). Total num frames: 295632896. Throughput: 0: 3599.1. Samples: 63076352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:23,968][134211] Avg episode reward: [(0, '6.798')] [2025-01-04 02:35:26,354][134294] Updated weights for policy 0, policy_version 72184 (0.0024) [2025-01-04 02:35:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15086.9, 300 sec: 15370.4). Total num frames: 295698432. Throughput: 0: 3562.1. Samples: 63095462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:28,968][134211] Avg episode reward: [(0, '7.184')] [2025-01-04 02:35:29,498][134294] Updated weights for policy 0, policy_version 72194 (0.0027) [2025-01-04 02:35:32,211][134294] Updated weights for policy 0, policy_version 72204 (0.0022) [2025-01-04 02:35:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.5, 300 sec: 15287.1). Total num frames: 295780352. 
Throughput: 0: 3579.5. Samples: 63105194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:33,968][134211] Avg episode reward: [(0, '7.008')] [2025-01-04 02:35:34,193][134294] Updated weights for policy 0, policy_version 72214 (0.0015) [2025-01-04 02:35:36,147][134294] Updated weights for policy 0, policy_version 72224 (0.0013) [2025-01-04 02:35:38,376][134294] Updated weights for policy 0, policy_version 72234 (0.0017) [2025-01-04 02:35:38,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14677.3, 300 sec: 15231.6). Total num frames: 295874560. Throughput: 0: 3850.6. Samples: 63135894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:38,969][134211] Avg episode reward: [(0, '7.138')] [2025-01-04 02:35:42,170][134294] Updated weights for policy 0, policy_version 72244 (0.0031) [2025-01-04 02:35:43,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14540.8, 300 sec: 15189.9). Total num frames: 295927808. Throughput: 0: 3817.4. Samples: 63153786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:35:43,969][134211] Avg episode reward: [(0, '6.941')] [2025-01-04 02:35:45,294][134294] Updated weights for policy 0, policy_version 72254 (0.0023) [2025-01-04 02:35:47,353][134294] Updated weights for policy 0, policy_version 72264 (0.0014) [2025-01-04 02:35:48,967][134211] Fps is (10 sec: 15155.8, 60 sec: 15155.6, 300 sec: 15314.9). Total num frames: 296026112. Throughput: 0: 3871.1. Samples: 63165708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:35:48,968][134211] Avg episode reward: [(0, '7.254')] [2025-01-04 02:35:49,476][134294] Updated weights for policy 0, policy_version 72274 (0.0014) [2025-01-04 02:35:51,477][134294] Updated weights for policy 0, policy_version 72284 (0.0015) [2025-01-04 02:35:53,498][134294] Updated weights for policy 0, policy_version 72294 (0.0013) [2025-01-04 02:35:53,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15769.7, 300 sec: 15426.0). Total num frames: 296124416. Throughput: 0: 3900.6. 
Samples: 63195822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:35:53,968][134211] Avg episode reward: [(0, '6.914')] [2025-01-04 02:35:55,599][134294] Updated weights for policy 0, policy_version 72304 (0.0014) [2025-01-04 02:35:58,700][134294] Updated weights for policy 0, policy_version 72314 (0.0028) [2025-01-04 02:35:58,968][134211] Fps is (10 sec: 17202.4, 60 sec: 15906.1, 300 sec: 15481.5). Total num frames: 296198144. Throughput: 0: 3837.9. Samples: 63220956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:35:58,970][134211] Avg episode reward: [(0, '6.402')] [2025-01-04 02:36:01,990][134294] Updated weights for policy 0, policy_version 72324 (0.0028) [2025-01-04 02:36:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15633.0, 300 sec: 15425.9). Total num frames: 296259584. Throughput: 0: 3849.7. Samples: 63230162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:03,968][134211] Avg episode reward: [(0, '7.013')] [2025-01-04 02:36:05,335][134294] Updated weights for policy 0, policy_version 72334 (0.0025) [2025-01-04 02:36:08,492][134294] Updated weights for policy 0, policy_version 72344 (0.0027) [2025-01-04 02:36:08,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15018.6, 300 sec: 15273.2). Total num frames: 296325120. Throughput: 0: 3844.1. Samples: 63249336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:08,968][134211] Avg episode reward: [(0, '7.141')] [2025-01-04 02:36:11,640][134294] Updated weights for policy 0, policy_version 72354 (0.0029) [2025-01-04 02:36:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14746.0, 300 sec: 15134.4). Total num frames: 296390656. Throughput: 0: 3846.8. Samples: 63268570. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:13,968][134211] Avg episode reward: [(0, '7.100')] [2025-01-04 02:36:14,853][134294] Updated weights for policy 0, policy_version 72364 (0.0025) [2025-01-04 02:36:17,828][134294] Updated weights for policy 0, policy_version 72374 (0.0025) [2025-01-04 02:36:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.6, 300 sec: 15120.5). Total num frames: 296456192. Throughput: 0: 3846.6. Samples: 63278292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:18,968][134211] Avg episode reward: [(0, '7.126')] [2025-01-04 02:36:20,955][134294] Updated weights for policy 0, policy_version 72384 (0.0026) [2025-01-04 02:36:23,044][134294] Updated weights for policy 0, policy_version 72394 (0.0014) [2025-01-04 02:36:23,967][134211] Fps is (10 sec: 15155.5, 60 sec: 15155.3, 300 sec: 15189.9). Total num frames: 296542208. Throughput: 0: 3645.1. Samples: 63299924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:23,968][134211] Avg episode reward: [(0, '6.880')] [2025-01-04 02:36:25,065][134294] Updated weights for policy 0, policy_version 72404 (0.0012) [2025-01-04 02:36:26,971][134294] Updated weights for policy 0, policy_version 72414 (0.0014) [2025-01-04 02:36:28,907][134294] Updated weights for policy 0, policy_version 72424 (0.0013) [2025-01-04 02:36:28,967][134211] Fps is (10 sec: 19251.4, 60 sec: 15837.9, 300 sec: 15328.9). Total num frames: 296648704. Throughput: 0: 3951.8. Samples: 63331618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:36:28,968][134211] Avg episode reward: [(0, '7.315')] [2025-01-04 02:36:30,812][134294] Updated weights for policy 0, policy_version 72434 (0.0015) [2025-01-04 02:36:33,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15769.7, 300 sec: 15370.4). Total num frames: 296726528. Throughput: 0: 4026.9. Samples: 63346920. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:33,968][134211] Avg episode reward: [(0, '7.401')] [2025-01-04 02:36:34,057][134294] Updated weights for policy 0, policy_version 72444 (0.0026) [2025-01-04 02:36:37,389][134294] Updated weights for policy 0, policy_version 72454 (0.0030) [2025-01-04 02:36:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.5, 300 sec: 15370.4). Total num frames: 296787968. Throughput: 0: 3765.7. Samples: 63365280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:38,968][134211] Avg episode reward: [(0, '7.587')] [2025-01-04 02:36:40,724][134294] Updated weights for policy 0, policy_version 72464 (0.0027) [2025-01-04 02:36:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15360.0, 300 sec: 15259.3). Total num frames: 296849408. Throughput: 0: 3627.5. Samples: 63384194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:43,968][134211] Avg episode reward: [(0, '7.509')] [2025-01-04 02:36:43,982][134294] Updated weights for policy 0, policy_version 72474 (0.0028) [2025-01-04 02:36:47,124][134294] Updated weights for policy 0, policy_version 72484 (0.0025) [2025-01-04 02:36:48,967][134211] Fps is (10 sec: 13926.6, 60 sec: 15018.7, 300 sec: 15287.1). Total num frames: 296927232. Throughput: 0: 3625.2. Samples: 63393294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:48,968][134211] Avg episode reward: [(0, '7.250')] [2025-01-04 02:36:49,207][134294] Updated weights for policy 0, policy_version 72494 (0.0014) [2025-01-04 02:36:51,198][134294] Updated weights for policy 0, policy_version 72504 (0.0014) [2025-01-04 02:36:53,218][134294] Updated weights for policy 0, policy_version 72514 (0.0014) [2025-01-04 02:36:53,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15086.8, 300 sec: 15342.6). Total num frames: 297029632. Throughput: 0: 3843.8. Samples: 63422308. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:53,968][134211] Avg episode reward: [(0, '7.704')] [2025-01-04 02:36:56,097][134294] Updated weights for policy 0, policy_version 72524 (0.0024) [2025-01-04 02:36:58,969][134211] Fps is (10 sec: 16381.0, 60 sec: 14881.8, 300 sec: 15189.8). Total num frames: 297091072. Throughput: 0: 3892.0. Samples: 63443718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:36:58,970][134211] Avg episode reward: [(0, '7.479')] [2025-01-04 02:36:59,677][134294] Updated weights for policy 0, policy_version 72534 (0.0027) [2025-01-04 02:37:02,182][134294] Updated weights for policy 0, policy_version 72544 (0.0014) [2025-01-04 02:37:03,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15223.5, 300 sec: 15148.3). Total num frames: 297172992. Throughput: 0: 3906.3. Samples: 63454076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:37:03,968][134211] Avg episode reward: [(0, '6.857')] [2025-01-04 02:37:04,367][134294] Updated weights for policy 0, policy_version 72554 (0.0012) [2025-01-04 02:37:06,521][134294] Updated weights for policy 0, policy_version 72564 (0.0012) [2025-01-04 02:37:08,731][134294] Updated weights for policy 0, policy_version 72574 (0.0014) [2025-01-04 02:37:08,969][134211] Fps is (10 sec: 17203.5, 60 sec: 15632.7, 300 sec: 15259.3). Total num frames: 297263104. Throughput: 0: 4045.5. Samples: 63481978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:37:08,970][134211] Avg episode reward: [(0, '7.358')] [2025-01-04 02:37:11,994][134294] Updated weights for policy 0, policy_version 72584 (0.0026) [2025-01-04 02:37:13,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15564.8, 300 sec: 15245.4). Total num frames: 297324544. Throughput: 0: 3803.1. Samples: 63502758. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:13,968][134211] Avg episode reward: [(0, '7.368')] [2025-01-04 02:37:13,985][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000072590_297328640.pth... [2025-01-04 02:37:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000071700_293683200.pth [2025-01-04 02:37:15,259][134294] Updated weights for policy 0, policy_version 72594 (0.0028) [2025-01-04 02:37:18,422][134294] Updated weights for policy 0, policy_version 72604 (0.0024) [2025-01-04 02:37:18,968][134211] Fps is (10 sec: 12699.6, 60 sec: 15564.8, 300 sec: 15259.3). Total num frames: 297390080. Throughput: 0: 3676.1. Samples: 63512346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:18,968][134211] Avg episode reward: [(0, '6.969')] [2025-01-04 02:37:21,556][134294] Updated weights for policy 0, policy_version 72614 (0.0024) [2025-01-04 02:37:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.4, 300 sec: 15259.4). Total num frames: 297455616. Throughput: 0: 3698.7. Samples: 63531722. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:23,968][134211] Avg episode reward: [(0, '7.067')] [2025-01-04 02:37:24,829][134294] Updated weights for policy 0, policy_version 72624 (0.0027) [2025-01-04 02:37:27,916][134294] Updated weights for policy 0, policy_version 72634 (0.0023) [2025-01-04 02:37:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.7, 300 sec: 15176.0). Total num frames: 297521152. Throughput: 0: 3710.3. Samples: 63551156. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:28,968][134211] Avg episode reward: [(0, '7.125')] [2025-01-04 02:37:31,030][134294] Updated weights for policy 0, policy_version 72644 (0.0025) [2025-01-04 02:37:33,058][134294] Updated weights for policy 0, policy_version 72654 (0.0013) [2025-01-04 02:37:33,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14677.3, 300 sec: 15120.5). Total num frames: 297607168. Throughput: 0: 3736.8. Samples: 63561450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:33,968][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 02:37:35,012][134294] Updated weights for policy 0, policy_version 72664 (0.0011) [2025-01-04 02:37:36,933][134294] Updated weights for policy 0, policy_version 72674 (0.0014) [2025-01-04 02:37:38,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15291.7, 300 sec: 15245.5). Total num frames: 297705472. Throughput: 0: 3785.1. Samples: 63592636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:38,968][134211] Avg episode reward: [(0, '8.008')] [2025-01-04 02:37:39,536][134294] Updated weights for policy 0, policy_version 72684 (0.0019) [2025-01-04 02:37:43,321][134294] Updated weights for policy 0, policy_version 72694 (0.0028) [2025-01-04 02:37:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15155.2, 300 sec: 15203.8). Total num frames: 297758720. Throughput: 0: 3725.8. Samples: 63611374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:43,968][134211] Avg episode reward: [(0, '7.610')] [2025-01-04 02:37:46,600][134294] Updated weights for policy 0, policy_version 72704 (0.0025) [2025-01-04 02:37:48,728][134294] Updated weights for policy 0, policy_version 72714 (0.0014) [2025-01-04 02:37:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.4, 300 sec: 15162.2). Total num frames: 297840640. Throughput: 0: 3695.5. Samples: 63620372. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:48,968][134211] Avg episode reward: [(0, '7.081')] [2025-01-04 02:37:50,789][134294] Updated weights for policy 0, policy_version 72724 (0.0015) [2025-01-04 02:37:52,829][134294] Updated weights for policy 0, policy_version 72734 (0.0014) [2025-01-04 02:37:53,967][134211] Fps is (10 sec: 18022.8, 60 sec: 15155.3, 300 sec: 15287.1). Total num frames: 297938944. Throughput: 0: 3725.9. Samples: 63649638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:53,968][134211] Avg episode reward: [(0, '7.425')] [2025-01-04 02:37:55,110][134294] Updated weights for policy 0, policy_version 72744 (0.0018) [2025-01-04 02:37:58,293][134294] Updated weights for policy 0, policy_version 72754 (0.0026) [2025-01-04 02:37:58,969][134211] Fps is (10 sec: 16382.4, 60 sec: 15223.6, 300 sec: 15217.6). Total num frames: 298004480. Throughput: 0: 3777.0. Samples: 63672726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:37:58,969][134211] Avg episode reward: [(0, '6.434')] [2025-01-04 02:38:01,838][134294] Updated weights for policy 0, policy_version 72764 (0.0029) [2025-01-04 02:38:03,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14882.1, 300 sec: 15231.6). Total num frames: 298065920. Throughput: 0: 3763.2. Samples: 63681692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:03,968][134211] Avg episode reward: [(0, '7.792')] [2025-01-04 02:38:05,251][134294] Updated weights for policy 0, policy_version 72774 (0.0030) [2025-01-04 02:38:08,604][134294] Updated weights for policy 0, policy_version 72784 (0.0027) [2025-01-04 02:38:08,968][134211] Fps is (10 sec: 11879.6, 60 sec: 14336.3, 300 sec: 15064.9). Total num frames: 298123264. Throughput: 0: 3731.6. Samples: 63699644. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:08,968][134211] Avg episode reward: [(0, '7.203')] [2025-01-04 02:38:11,444][134294] Updated weights for policy 0, policy_version 72794 (0.0018) [2025-01-04 02:38:13,513][134294] Updated weights for policy 0, policy_version 72804 (0.0011) [2025-01-04 02:38:13,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14814.0, 300 sec: 15009.4). Total num frames: 298213376. Throughput: 0: 3815.0. Samples: 63722828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:13,968][134211] Avg episode reward: [(0, '7.346')] [2025-01-04 02:38:15,600][134294] Updated weights for policy 0, policy_version 72814 (0.0012) [2025-01-04 02:38:17,642][134294] Updated weights for policy 0, policy_version 72824 (0.0016) [2025-01-04 02:38:18,968][134211] Fps is (10 sec: 17612.9, 60 sec: 15155.2, 300 sec: 15065.0). Total num frames: 298299392. Throughput: 0: 3917.6. Samples: 63737744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:18,968][134211] Avg episode reward: [(0, '7.502')] [2025-01-04 02:38:21,089][134294] Updated weights for policy 0, policy_version 72834 (0.0028) [2025-01-04 02:38:23,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15018.7, 300 sec: 15051.1). Total num frames: 298356736. Throughput: 0: 3679.7. Samples: 63758222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:23,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 02:38:24,621][134294] Updated weights for policy 0, policy_version 72844 (0.0025) [2025-01-04 02:38:28,072][134294] Updated weights for policy 0, policy_version 72854 (0.0025) [2025-01-04 02:38:28,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 298418176. Throughput: 0: 3663.7. Samples: 63776242. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:28,968][134211] Avg episode reward: [(0, '7.315')] [2025-01-04 02:38:30,387][134294] Updated weights for policy 0, policy_version 72864 (0.0014) [2025-01-04 02:38:32,405][134294] Updated weights for policy 0, policy_version 72874 (0.0013) [2025-01-04 02:38:33,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15223.5, 300 sec: 15176.0). Total num frames: 298520576. Throughput: 0: 3770.3. Samples: 63790036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:33,968][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 02:38:34,497][134294] Updated weights for policy 0, policy_version 72884 (0.0013) [2025-01-04 02:38:36,559][134294] Updated weights for policy 0, policy_version 72894 (0.0015) [2025-01-04 02:38:38,548][134294] Updated weights for policy 0, policy_version 72904 (0.0013) [2025-01-04 02:38:38,968][134211] Fps is (10 sec: 20479.9, 60 sec: 15291.8, 300 sec: 15314.9). Total num frames: 298622976. Throughput: 0: 3787.3. Samples: 63820066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:38:38,968][134211] Avg episode reward: [(0, '6.956')] [2025-01-04 02:38:41,469][134294] Updated weights for policy 0, policy_version 72914 (0.0022) [2025-01-04 02:38:43,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15428.3, 300 sec: 15189.9). Total num frames: 298684416. Throughput: 0: 3762.6. Samples: 63842038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:38:43,968][134211] Avg episode reward: [(0, '6.494')] [2025-01-04 02:38:44,889][134294] Updated weights for policy 0, policy_version 72924 (0.0030) [2025-01-04 02:38:48,168][134294] Updated weights for policy 0, policy_version 72934 (0.0026) [2025-01-04 02:38:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15087.0, 300 sec: 15051.1). Total num frames: 298745856. Throughput: 0: 3764.6. Samples: 63851100. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:38:48,968][134211] Avg episode reward: [(0, '7.155')] [2025-01-04 02:38:51,371][134294] Updated weights for policy 0, policy_version 72944 (0.0026) [2025-01-04 02:38:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.5, 300 sec: 15009.4). Total num frames: 298807296. Throughput: 0: 3792.2. Samples: 63870292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:38:53,968][134211] Avg episode reward: [(0, '6.858')] [2025-01-04 02:38:54,705][134294] Updated weights for policy 0, policy_version 72954 (0.0030) [2025-01-04 02:38:57,820][134294] Updated weights for policy 0, policy_version 72964 (0.0027) [2025-01-04 02:38:58,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14472.7, 300 sec: 15037.2). Total num frames: 298872832. Throughput: 0: 3700.1. Samples: 63889334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:38:58,969][134211] Avg episode reward: [(0, '7.517')] [2025-01-04 02:39:01,014][134294] Updated weights for policy 0, policy_version 72974 (0.0023) [2025-01-04 02:39:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14472.6, 300 sec: 15051.1). Total num frames: 298934272. Throughput: 0: 3586.6. Samples: 63899140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:39:03,968][134211] Avg episode reward: [(0, '7.303')] [2025-01-04 02:39:04,274][134294] Updated weights for policy 0, policy_version 72984 (0.0020) [2025-01-04 02:39:06,372][134294] Updated weights for policy 0, policy_version 72994 (0.0013) [2025-01-04 02:39:08,452][134294] Updated weights for policy 0, policy_version 73004 (0.0016) [2025-01-04 02:39:08,967][134211] Fps is (10 sec: 15975.1, 60 sec: 15155.3, 300 sec: 15092.7). Total num frames: 299032576. Throughput: 0: 3666.6. Samples: 63923218. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:39:08,968][134211] Avg episode reward: [(0, '7.161')] [2025-01-04 02:39:10,409][134294] Updated weights for policy 0, policy_version 73014 (0.0014) [2025-01-04 02:39:12,293][134294] Updated weights for policy 0, policy_version 73024 (0.0012) [2025-01-04 02:39:13,968][134211] Fps is (10 sec: 20479.8, 60 sec: 15428.2, 300 sec: 15134.4). Total num frames: 299139072. Throughput: 0: 3959.5. Samples: 63954422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:39:13,968][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 02:39:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073032_299139072.pth... [2025-01-04 02:39:14,018][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000072146_295510016.pth [2025-01-04 02:39:14,239][134294] Updated weights for policy 0, policy_version 73034 (0.0013) [2025-01-04 02:39:16,278][134294] Updated weights for policy 0, policy_version 73044 (0.0015) [2025-01-04 02:39:18,968][134211] Fps is (10 sec: 18841.3, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 299220992. Throughput: 0: 3995.2. Samples: 63969822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:39:18,968][134211] Avg episode reward: [(0, '7.038')] [2025-01-04 02:39:19,428][134294] Updated weights for policy 0, policy_version 73054 (0.0026) [2025-01-04 02:39:22,957][134294] Updated weights for policy 0, policy_version 73064 (0.0028) [2025-01-04 02:39:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15360.0, 300 sec: 15203.8). Total num frames: 299278336. Throughput: 0: 3742.4. Samples: 63988476. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 02:39:23,969][134211] Avg episode reward: [(0, '6.840')] [2025-01-04 02:39:26,354][134294] Updated weights for policy 0, policy_version 73074 (0.0031) [2025-01-04 02:39:28,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15360.0, 300 sec: 15064.9). Total num frames: 299339776. Throughput: 0: 3651.3. Samples: 64006348. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:28,968][134211] Avg episode reward: [(0, '6.582')] [2025-01-04 02:39:29,833][134294] Updated weights for policy 0, policy_version 73084 (0.0029) [2025-01-04 02:39:33,034][134294] Updated weights for policy 0, policy_version 73094 (0.0027) [2025-01-04 02:39:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 299401216. Throughput: 0: 3651.2. Samples: 64015406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:33,968][134211] Avg episode reward: [(0, '6.896')] [2025-01-04 02:39:36,132][134294] Updated weights for policy 0, policy_version 73104 (0.0024) [2025-01-04 02:39:38,469][134294] Updated weights for policy 0, policy_version 73114 (0.0017) [2025-01-04 02:39:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.7, 300 sec: 14995.5). Total num frames: 299479040. Throughput: 0: 3665.5. Samples: 64035238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:38,968][134211] Avg episode reward: [(0, '7.338')] [2025-01-04 02:39:41,256][134294] Updated weights for policy 0, policy_version 73124 (0.0019) [2025-01-04 02:39:43,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14404.3, 300 sec: 15023.4). Total num frames: 299548672. Throughput: 0: 3762.8. Samples: 64058660. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:43,968][134211] Avg episode reward: [(0, '7.789')] [2025-01-04 02:39:44,382][134294] Updated weights for policy 0, policy_version 73134 (0.0025) [2025-01-04 02:39:46,672][134294] Updated weights for policy 0, policy_version 73144 (0.0015) [2025-01-04 02:39:48,560][134294] Updated weights for policy 0, policy_version 73154 (0.0013) [2025-01-04 02:39:48,967][134211] Fps is (10 sec: 16793.8, 60 sec: 15018.7, 300 sec: 15148.3). Total num frames: 299646976. Throughput: 0: 3798.1. Samples: 64070054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:48,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 02:39:50,528][134294] Updated weights for policy 0, policy_version 73164 (0.0015) [2025-01-04 02:39:52,401][134294] Updated weights for policy 0, policy_version 73174 (0.0015) [2025-01-04 02:39:53,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15769.7, 300 sec: 15287.1). Total num frames: 299753472. Throughput: 0: 3976.7. Samples: 64102168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:53,968][134211] Avg episode reward: [(0, '7.653')] [2025-01-04 02:39:54,318][134294] Updated weights for policy 0, policy_version 73184 (0.0012) [2025-01-04 02:39:56,891][134294] Updated weights for policy 0, policy_version 73194 (0.0025) [2025-01-04 02:39:58,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15906.2, 300 sec: 15273.2). Total num frames: 299827200. Throughput: 0: 3851.6. Samples: 64127744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:39:58,968][134211] Avg episode reward: [(0, '7.244')] [2025-01-04 02:40:00,328][134294] Updated weights for policy 0, policy_version 73204 (0.0028) [2025-01-04 02:40:03,457][134294] Updated weights for policy 0, policy_version 73214 (0.0026) [2025-01-04 02:40:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15906.1, 300 sec: 15134.4). Total num frames: 299888640. Throughput: 0: 3710.7. Samples: 64136806. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:40:03,969][134211] Avg episode reward: [(0, '7.211')] [2025-01-04 02:40:06,590][134294] Updated weights for policy 0, policy_version 73224 (0.0027) [2025-01-04 02:40:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15360.0, 300 sec: 15078.9). Total num frames: 299954176. Throughput: 0: 3732.9. Samples: 64156454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:40:08,968][134211] Avg episode reward: [(0, '7.435')] [2025-01-04 02:40:09,808][134294] Updated weights for policy 0, policy_version 73234 (0.0026) [2025-01-04 02:40:12,693][134294] Updated weights for policy 0, policy_version 73244 (0.0024) [2025-01-04 02:40:13,970][134211] Fps is (10 sec: 13104.4, 60 sec: 14676.8, 300 sec: 15078.7). Total num frames: 300019712. Throughput: 0: 3781.7. Samples: 64176534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:13,971][134211] Avg episode reward: [(0, '7.644')] [2025-01-04 02:40:15,780][134294] Updated weights for policy 0, policy_version 73254 (0.0025) [2025-01-04 02:40:18,630][134294] Updated weights for policy 0, policy_version 73264 (0.0021) [2025-01-04 02:40:18,970][134211] Fps is (10 sec: 13513.9, 60 sec: 14472.0, 300 sec: 15106.5). Total num frames: 300089344. Throughput: 0: 3810.1. Samples: 64186870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:18,970][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 02:40:21,053][134294] Updated weights for policy 0, policy_version 73274 (0.0016) [2025-01-04 02:40:22,993][134294] Updated weights for policy 0, policy_version 73284 (0.0013) [2025-01-04 02:40:23,968][134211] Fps is (10 sec: 17207.2, 60 sec: 15223.5, 300 sec: 15231.6). Total num frames: 300191744. Throughput: 0: 3922.5. Samples: 64211752. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:23,968][134211] Avg episode reward: [(0, '8.401')] [2025-01-04 02:40:24,829][134294] Updated weights for policy 0, policy_version 73294 (0.0013) [2025-01-04 02:40:26,762][134294] Updated weights for policy 0, policy_version 73304 (0.0014) [2025-01-04 02:40:28,968][134211] Fps is (10 sec: 19254.0, 60 sec: 15701.2, 300 sec: 15259.3). Total num frames: 300281856. Throughput: 0: 4067.5. Samples: 64241700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:28,969][134211] Avg episode reward: [(0, '7.435')] [2025-01-04 02:40:29,701][134294] Updated weights for policy 0, policy_version 73314 (0.0027) [2025-01-04 02:40:32,952][134294] Updated weights for policy 0, policy_version 73324 (0.0030) [2025-01-04 02:40:33,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15701.3, 300 sec: 15148.3). Total num frames: 300343296. Throughput: 0: 4032.6. Samples: 64251520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:33,968][134211] Avg episode reward: [(0, '7.697')] [2025-01-04 02:40:36,113][134294] Updated weights for policy 0, policy_version 73334 (0.0026) [2025-01-04 02:40:38,114][134294] Updated weights for policy 0, policy_version 73344 (0.0013) [2025-01-04 02:40:38,968][134211] Fps is (10 sec: 15156.3, 60 sec: 15906.1, 300 sec: 15273.2). Total num frames: 300433408. Throughput: 0: 3782.9. Samples: 64272398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:38,968][134211] Avg episode reward: [(0, '7.587')] [2025-01-04 02:40:39,997][134294] Updated weights for policy 0, policy_version 73354 (0.0012) [2025-01-04 02:40:41,913][134294] Updated weights for policy 0, policy_version 73364 (0.0014) [2025-01-04 02:40:43,766][134294] Updated weights for policy 0, policy_version 73374 (0.0013) [2025-01-04 02:40:43,968][134211] Fps is (10 sec: 20070.6, 60 sec: 16588.8, 300 sec: 15314.9). Total num frames: 300544000. Throughput: 0: 3938.1. Samples: 64304956. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:43,968][134211] Avg episode reward: [(0, '6.973')] [2025-01-04 02:40:45,671][134294] Updated weights for policy 0, policy_version 73384 (0.0014) [2025-01-04 02:40:47,617][134294] Updated weights for policy 0, policy_version 73394 (0.0015) [2025-01-04 02:40:48,968][134211] Fps is (10 sec: 20889.7, 60 sec: 16588.8, 300 sec: 15314.9). Total num frames: 300642304. Throughput: 0: 4101.1. Samples: 64321356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:48,968][134211] Avg episode reward: [(0, '7.203')] [2025-01-04 02:40:50,619][134294] Updated weights for policy 0, policy_version 73404 (0.0023) [2025-01-04 02:40:53,775][134294] Updated weights for policy 0, policy_version 73414 (0.0026) [2025-01-04 02:40:53,972][134211] Fps is (10 sec: 15967.4, 60 sec: 15836.7, 300 sec: 15273.0). Total num frames: 300703744. Throughput: 0: 4170.9. Samples: 64344162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:53,973][134211] Avg episode reward: [(0, '7.547')] [2025-01-04 02:40:56,789][134294] Updated weights for policy 0, policy_version 73424 (0.0025) [2025-01-04 02:40:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15701.3, 300 sec: 15287.1). Total num frames: 300769280. Throughput: 0: 4157.8. Samples: 64363624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:40:58,968][134211] Avg episode reward: [(0, '6.875')] [2025-01-04 02:40:59,983][134294] Updated weights for policy 0, policy_version 73434 (0.0023) [2025-01-04 02:41:03,515][134294] Updated weights for policy 0, policy_version 73444 (0.0029) [2025-01-04 02:41:03,968][134211] Fps is (10 sec: 12702.8, 60 sec: 15701.3, 300 sec: 15273.2). Total num frames: 300830720. Throughput: 0: 4137.1. Samples: 64373030. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:03,968][134211] Avg episode reward: [(0, '6.989')] [2025-01-04 02:41:07,139][134294] Updated weights for policy 0, policy_version 73454 (0.0027) [2025-01-04 02:41:08,969][134211] Fps is (10 sec: 11877.3, 60 sec: 15564.6, 300 sec: 15245.4). Total num frames: 300888064. Throughput: 0: 3968.7. Samples: 64390348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:08,969][134211] Avg episode reward: [(0, '6.290')] [2025-01-04 02:41:10,409][134294] Updated weights for policy 0, policy_version 73464 (0.0026) [2025-01-04 02:41:12,516][134294] Updated weights for policy 0, policy_version 73474 (0.0015) [2025-01-04 02:41:13,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15975.0, 300 sec: 15328.8). Total num frames: 300978176. Throughput: 0: 3829.4. Samples: 64414020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:13,968][134211] Avg episode reward: [(0, '7.749')] [2025-01-04 02:41:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073481_300978176.pth... [2025-01-04 02:41:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000072590_297328640.pth [2025-01-04 02:41:14,462][134294] Updated weights for policy 0, policy_version 73484 (0.0013) [2025-01-04 02:41:16,343][134294] Updated weights for policy 0, policy_version 73494 (0.0015) [2025-01-04 02:41:18,222][134294] Updated weights for policy 0, policy_version 73504 (0.0013) [2025-01-04 02:41:18,968][134211] Fps is (10 sec: 19662.1, 60 sec: 16589.3, 300 sec: 15398.2). Total num frames: 301084672. Throughput: 0: 3970.1. Samples: 64430176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:18,968][134211] Avg episode reward: [(0, '7.673')] [2025-01-04 02:41:21,009][134294] Updated weights for policy 0, policy_version 73514 (0.0022) [2025-01-04 02:41:23,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15974.3, 300 sec: 15259.3). Total num frames: 301150208. Throughput: 0: 4064.5. Samples: 64455302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:23,968][134211] Avg episode reward: [(0, '6.596')] [2025-01-04 02:41:24,245][134294] Updated weights for policy 0, policy_version 73524 (0.0027) [2025-01-04 02:41:27,329][134294] Updated weights for policy 0, policy_version 73534 (0.0026) [2025-01-04 02:41:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15564.9, 300 sec: 15217.7). Total num frames: 301215744. Throughput: 0: 3770.7. Samples: 64474638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:28,968][134211] Avg episode reward: [(0, '7.142')] [2025-01-04 02:41:30,445][134294] Updated weights for policy 0, policy_version 73544 (0.0026) [2025-01-04 02:41:33,411][134294] Updated weights for policy 0, policy_version 73554 (0.0026) [2025-01-04 02:41:33,967][134211] Fps is (10 sec: 13517.2, 60 sec: 15701.4, 300 sec: 15245.5). Total num frames: 301285376. Throughput: 0: 3633.8. Samples: 64484876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:33,968][134211] Avg episode reward: [(0, '7.258')] [2025-01-04 02:41:35,411][134294] Updated weights for policy 0, policy_version 73564 (0.0014) [2025-01-04 02:41:38,143][134294] Updated weights for policy 0, policy_version 73574 (0.0023) [2025-01-04 02:41:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 301367296. Throughput: 0: 3679.5. Samples: 64509724. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:41:38,968][134211] Avg episode reward: [(0, '7.024')] [2025-01-04 02:41:41,197][134294] Updated weights for policy 0, policy_version 73584 (0.0024) [2025-01-04 02:41:43,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14882.1, 300 sec: 15287.1). Total num frames: 301436928. Throughput: 0: 3683.9. Samples: 64529400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:41:43,968][134211] Avg episode reward: [(0, '7.307')] [2025-01-04 02:41:44,067][134294] Updated weights for policy 0, policy_version 73594 (0.0024) [2025-01-04 02:41:45,996][134294] Updated weights for policy 0, policy_version 73604 (0.0013) [2025-01-04 02:41:47,864][134294] Updated weights for policy 0, policy_version 73614 (0.0014) [2025-01-04 02:41:48,967][134211] Fps is (10 sec: 17613.2, 60 sec: 15018.7, 300 sec: 15301.0). Total num frames: 301543424. Throughput: 0: 3818.6. Samples: 64544866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:41:48,968][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 02:41:49,760][134294] Updated weights for policy 0, policy_version 73624 (0.0013) [2025-01-04 02:41:51,885][134294] Updated weights for policy 0, policy_version 73634 (0.0018) [2025-01-04 02:41:53,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15361.0, 300 sec: 15370.5). Total num frames: 301625344. Throughput: 0: 4106.5. Samples: 64575138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:41:53,968][134211] Avg episode reward: [(0, '7.296')] [2025-01-04 02:41:55,571][134294] Updated weights for policy 0, policy_version 73644 (0.0028) [2025-01-04 02:41:58,810][134294] Updated weights for policy 0, policy_version 73654 (0.0028) [2025-01-04 02:41:58,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15291.7, 300 sec: 15301.0). Total num frames: 301686784. Throughput: 0: 3975.4. Samples: 64592914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:41:58,968][134211] Avg episode reward: [(0, '6.998')] [2025-01-04 02:42:01,573][134294] Updated weights for policy 0, policy_version 73664 (0.0021) [2025-01-04 02:42:03,702][134294] Updated weights for policy 0, policy_version 73674 (0.0016) [2025-01-04 02:42:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15633.1, 300 sec: 15273.3). Total num frames: 301768704. Throughput: 0: 3844.3. Samples: 64603170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:42:03,968][134211] Avg episode reward: [(0, '6.793')] [2025-01-04 02:42:06,822][134294] Updated weights for policy 0, policy_version 73684 (0.0026) [2025-01-04 02:42:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 15769.8, 300 sec: 15287.1). Total num frames: 301834240. Throughput: 0: 3810.8. Samples: 64626786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:42:08,968][134211] Avg episode reward: [(0, '6.611')] [2025-01-04 02:42:10,189][134294] Updated weights for policy 0, policy_version 73694 (0.0024) [2025-01-04 02:42:12,515][134294] Updated weights for policy 0, policy_version 73704 (0.0015) [2025-01-04 02:42:13,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15701.3, 300 sec: 15356.5). Total num frames: 301920256. Throughput: 0: 3894.1. Samples: 64649872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:42:13,968][134211] Avg episode reward: [(0, '7.133')] [2025-01-04 02:42:14,368][134294] Updated weights for policy 0, policy_version 73714 (0.0013) [2025-01-04 02:42:16,274][134294] Updated weights for policy 0, policy_version 73724 (0.0014) [2025-01-04 02:42:18,118][134294] Updated weights for policy 0, policy_version 73734 (0.0017) [2025-01-04 02:42:18,967][134211] Fps is (10 sec: 19661.4, 60 sec: 15769.7, 300 sec: 15509.3). Total num frames: 302030848. Throughput: 0: 4029.0. Samples: 64666180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:42:18,968][134211] Avg episode reward: [(0, '6.713')] [2025-01-04 02:42:20,021][134294] Updated weights for policy 0, policy_version 73744 (0.0013) [2025-01-04 02:42:22,529][134294] Updated weights for policy 0, policy_version 73754 (0.0022) [2025-01-04 02:42:23,968][134211] Fps is (10 sec: 19251.1, 60 sec: 16042.7, 300 sec: 15564.8). Total num frames: 302112768. Throughput: 0: 4142.1. Samples: 64696120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:42:23,968][134211] Avg episode reward: [(0, '6.886')] [2025-01-04 02:42:25,850][134294] Updated weights for policy 0, policy_version 73764 (0.0029) [2025-01-04 02:42:28,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15906.1, 300 sec: 15467.6). Total num frames: 302170112. Throughput: 0: 4114.1. Samples: 64714536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:28,968][134211] Avg episode reward: [(0, '7.601')] [2025-01-04 02:42:29,293][134294] Updated weights for policy 0, policy_version 73774 (0.0027) [2025-01-04 02:42:32,695][134294] Updated weights for policy 0, policy_version 73784 (0.0027) [2025-01-04 02:42:33,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15769.5, 300 sec: 15342.6). Total num frames: 302231552. Throughput: 0: 3967.0. Samples: 64723382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:33,968][134211] Avg episode reward: [(0, '8.472')] [2025-01-04 02:42:33,975][134264] Saving new best policy, reward=8.472! [2025-01-04 02:42:36,287][134294] Updated weights for policy 0, policy_version 73794 (0.0030) [2025-01-04 02:42:38,450][134294] Updated weights for policy 0, policy_version 73804 (0.0014) [2025-01-04 02:42:38,967][134211] Fps is (10 sec: 13926.8, 60 sec: 15701.4, 300 sec: 15426.0). Total num frames: 302309376. Throughput: 0: 3708.0. Samples: 64741996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:38,968][134211] Avg episode reward: [(0, '7.632')] [2025-01-04 02:42:40,342][134294] Updated weights for policy 0, policy_version 73814 (0.0015) [2025-01-04 02:42:42,286][134294] Updated weights for policy 0, policy_version 73824 (0.0013) [2025-01-04 02:42:43,968][134211] Fps is (10 sec: 18022.4, 60 sec: 16247.4, 300 sec: 15495.4). Total num frames: 302411776. Throughput: 0: 4006.4. Samples: 64773202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:43,968][134211] Avg episode reward: [(0, '7.059')] [2025-01-04 02:42:44,887][134294] Updated weights for policy 0, policy_version 73834 (0.0023) [2025-01-04 02:42:48,037][134294] Updated weights for policy 0, policy_version 73844 (0.0024) [2025-01-04 02:42:48,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15496.5, 300 sec: 15370.4). Total num frames: 302473216. Throughput: 0: 4006.9. Samples: 64783480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:48,968][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 02:42:51,140][134294] Updated weights for policy 0, policy_version 73854 (0.0024) [2025-01-04 02:42:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15223.5, 300 sec: 15370.5). Total num frames: 302538752. Throughput: 0: 3914.7. Samples: 64802950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:53,969][134211] Avg episode reward: [(0, '6.884')] [2025-01-04 02:42:54,400][134294] Updated weights for policy 0, policy_version 73864 (0.0027) [2025-01-04 02:42:57,263][134294] Updated weights for policy 0, policy_version 73874 (0.0021) [2025-01-04 02:42:58,967][134211] Fps is (10 sec: 14746.0, 60 sec: 15564.9, 300 sec: 15439.9). Total num frames: 302620672. Throughput: 0: 3894.6. Samples: 64825130. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:42:58,968][134211] Avg episode reward: [(0, '7.027')] [2025-01-04 02:42:59,216][134294] Updated weights for policy 0, policy_version 73884 (0.0013) [2025-01-04 02:43:01,065][134294] Updated weights for policy 0, policy_version 73894 (0.0014) [2025-01-04 02:43:03,230][134294] Updated weights for policy 0, policy_version 73904 (0.0013) [2025-01-04 02:43:03,968][134211] Fps is (10 sec: 18432.6, 60 sec: 15906.2, 300 sec: 15592.6). Total num frames: 302723072. Throughput: 0: 3893.2. Samples: 64841374. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:43:03,968][134211] Avg episode reward: [(0, '6.737')] [2025-01-04 02:43:05,353][134294] Updated weights for policy 0, policy_version 73914 (0.0015) [2025-01-04 02:43:07,346][134294] Updated weights for policy 0, policy_version 73924 (0.0012) [2025-01-04 02:43:08,968][134211] Fps is (10 sec: 20479.6, 60 sec: 16520.6, 300 sec: 15634.2). Total num frames: 302825472. Throughput: 0: 3875.7. Samples: 64870524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:43:08,968][134211] Avg episode reward: [(0, '6.541')] [2025-01-04 02:43:09,308][134294] Updated weights for policy 0, policy_version 73934 (0.0015) [2025-01-04 02:43:12,485][134294] Updated weights for policy 0, policy_version 73944 (0.0028) [2025-01-04 02:43:13,968][134211] Fps is (10 sec: 16792.9, 60 sec: 16179.1, 300 sec: 15564.8). Total num frames: 302891008. Throughput: 0: 3985.9. Samples: 64893904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:13,969][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 02:43:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073948_302891008.pth... 
[2025-01-04 02:43:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073032_299139072.pth [2025-01-04 02:43:15,860][134294] Updated weights for policy 0, policy_version 73954 (0.0030) [2025-01-04 02:43:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15359.9, 300 sec: 15578.7). Total num frames: 302952448. Throughput: 0: 3996.2. Samples: 64903210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:18,968][134211] Avg episode reward: [(0, '7.332')] [2025-01-04 02:43:19,001][134294] Updated weights for policy 0, policy_version 73964 (0.0028) [2025-01-04 02:43:21,940][134294] Updated weights for policy 0, policy_version 73974 (0.0027) [2025-01-04 02:43:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.2, 300 sec: 15606.4). Total num frames: 303022080. Throughput: 0: 4027.8. Samples: 64923248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:23,969][134211] Avg episode reward: [(0, '6.641')] [2025-01-04 02:43:25,107][134294] Updated weights for policy 0, policy_version 73984 (0.0027) [2025-01-04 02:43:28,085][134294] Updated weights for policy 0, policy_version 73994 (0.0024) [2025-01-04 02:43:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15291.6, 300 sec: 15481.5). Total num frames: 303087616. Throughput: 0: 3776.8. Samples: 64943160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:28,969][134211] Avg episode reward: [(0, '7.352')] [2025-01-04 02:43:31,186][134294] Updated weights for policy 0, policy_version 74004 (0.0028) [2025-01-04 02:43:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15428.3, 300 sec: 15370.4). Total num frames: 303157248. Throughput: 0: 3774.5. Samples: 64953334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:33,969][134211] Avg episode reward: [(0, '7.441')] [2025-01-04 02:43:34,243][134294] Updated weights for policy 0, policy_version 74014 (0.0026) [2025-01-04 02:43:37,288][134294] Updated weights for policy 0, policy_version 74024 (0.0027) [2025-01-04 02:43:38,967][134211] Fps is (10 sec: 14337.0, 60 sec: 15360.0, 300 sec: 15412.1). Total num frames: 303230976. Throughput: 0: 3791.2. Samples: 64973550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:38,968][134211] Avg episode reward: [(0, '7.264')] [2025-01-04 02:43:39,439][134294] Updated weights for policy 0, policy_version 74034 (0.0013) [2025-01-04 02:43:41,317][134294] Updated weights for policy 0, policy_version 74044 (0.0014) [2025-01-04 02:43:43,147][134294] Updated weights for policy 0, policy_version 74054 (0.0014) [2025-01-04 02:43:43,968][134211] Fps is (10 sec: 18432.5, 60 sec: 15496.6, 300 sec: 15578.7). Total num frames: 303341568. Throughput: 0: 3987.1. Samples: 65004550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:43,968][134211] Avg episode reward: [(0, '7.019')] [2025-01-04 02:43:45,518][134294] Updated weights for policy 0, policy_version 74064 (0.0017) [2025-01-04 02:43:48,732][134294] Updated weights for policy 0, policy_version 74074 (0.0027) [2025-01-04 02:43:48,968][134211] Fps is (10 sec: 17612.2, 60 sec: 15564.8, 300 sec: 15592.6). Total num frames: 303407104. Throughput: 0: 3898.0. Samples: 65016784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:48,968][134211] Avg episode reward: [(0, '7.095')] [2025-01-04 02:43:51,694][134294] Updated weights for policy 0, policy_version 74084 (0.0028) [2025-01-04 02:43:53,968][134211] Fps is (10 sec: 13516.1, 60 sec: 15633.0, 300 sec: 15606.4). Total num frames: 303476736. Throughput: 0: 3689.8. Samples: 65036566. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:43:53,970][134211] Avg episode reward: [(0, '7.673')] [2025-01-04 02:43:54,937][134294] Updated weights for policy 0, policy_version 74094 (0.0025) [2025-01-04 02:43:57,643][134294] Updated weights for policy 0, policy_version 74104 (0.0019) [2025-01-04 02:43:58,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15564.8, 300 sec: 15662.0). Total num frames: 303554560. Throughput: 0: 3658.4. Samples: 65058530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:43:58,968][134211] Avg episode reward: [(0, '7.326')] [2025-01-04 02:43:59,580][134294] Updated weights for policy 0, policy_version 74114 (0.0016) [2025-01-04 02:44:01,451][134294] Updated weights for policy 0, policy_version 74124 (0.0015) [2025-01-04 02:44:03,376][134294] Updated weights for policy 0, policy_version 74134 (0.0013) [2025-01-04 02:44:03,968][134211] Fps is (10 sec: 18842.5, 60 sec: 15701.3, 300 sec: 15703.6). Total num frames: 303665152. Throughput: 0: 3809.7. Samples: 65074644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:03,968][134211] Avg episode reward: [(0, '7.101')] [2025-01-04 02:44:05,523][134294] Updated weights for policy 0, policy_version 74144 (0.0017) [2025-01-04 02:44:08,668][134294] Updated weights for policy 0, policy_version 74154 (0.0027) [2025-01-04 02:44:08,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15223.4, 300 sec: 15592.6). Total num frames: 303738880. Throughput: 0: 3976.3. Samples: 65102182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:08,968][134211] Avg episode reward: [(0, '7.693')] [2025-01-04 02:44:11,746][134294] Updated weights for policy 0, policy_version 74164 (0.0025) [2025-01-04 02:44:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15155.2, 300 sec: 15523.1). Total num frames: 303800320. Throughput: 0: 3955.8. Samples: 65121170. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:13,968][134211] Avg episode reward: [(0, '6.885')] [2025-01-04 02:44:15,216][134294] Updated weights for policy 0, policy_version 74174 (0.0030) [2025-01-04 02:44:18,216][134294] Updated weights for policy 0, policy_version 74184 (0.0023) [2025-01-04 02:44:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15360.1, 300 sec: 15578.7). Total num frames: 303874048. Throughput: 0: 3925.7. Samples: 65129988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:18,968][134211] Avg episode reward: [(0, '7.323')] [2025-01-04 02:44:20,137][134294] Updated weights for policy 0, policy_version 74194 (0.0013) [2025-01-04 02:44:22,022][134294] Updated weights for policy 0, policy_version 74204 (0.0016) [2025-01-04 02:44:23,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15906.1, 300 sec: 15717.5). Total num frames: 303976448. Throughput: 0: 4119.8. Samples: 65158944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:23,968][134211] Avg episode reward: [(0, '6.814')] [2025-01-04 02:44:24,032][134294] Updated weights for policy 0, policy_version 74214 (0.0016) [2025-01-04 02:44:26,975][134294] Updated weights for policy 0, policy_version 74224 (0.0027) [2025-01-04 02:44:28,968][134211] Fps is (10 sec: 16792.8, 60 sec: 15906.2, 300 sec: 15731.4). Total num frames: 304041984. Throughput: 0: 3946.1. Samples: 65182126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:28,970][134211] Avg episode reward: [(0, '6.263')] [2025-01-04 02:44:30,283][134294] Updated weights for policy 0, policy_version 74234 (0.0031) [2025-01-04 02:44:33,330][134294] Updated weights for policy 0, policy_version 74244 (0.0028) [2025-01-04 02:44:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15837.9, 300 sec: 15689.8). Total num frames: 304107520. Throughput: 0: 3892.5. Samples: 65191946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:33,968][134211] Avg episode reward: [(0, '6.812')] [2025-01-04 02:44:36,607][134294] Updated weights for policy 0, policy_version 74254 (0.0024) [2025-01-04 02:44:38,571][134294] Updated weights for policy 0, policy_version 74264 (0.0012) [2025-01-04 02:44:38,968][134211] Fps is (10 sec: 15155.9, 60 sec: 16042.6, 300 sec: 15745.3). Total num frames: 304193536. Throughput: 0: 3901.2. Samples: 65212118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:44:38,968][134211] Avg episode reward: [(0, '6.758')] [2025-01-04 02:44:40,493][134294] Updated weights for policy 0, policy_version 74274 (0.0013) [2025-01-04 02:44:42,374][134294] Updated weights for policy 0, policy_version 74284 (0.0013) [2025-01-04 02:44:43,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15974.4, 300 sec: 15773.1). Total num frames: 304300032. Throughput: 0: 4131.2. Samples: 65244434. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:44:43,968][134211] Avg episode reward: [(0, '5.993')] [2025-01-04 02:44:44,222][134294] Updated weights for policy 0, policy_version 74294 (0.0013) [2025-01-04 02:44:47,141][134294] Updated weights for policy 0, policy_version 74304 (0.0023) [2025-01-04 02:44:48,968][134211] Fps is (10 sec: 17612.4, 60 sec: 16042.7, 300 sec: 15648.1). Total num frames: 304369664. Throughput: 0: 4062.0. Samples: 65257436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:44:48,968][134211] Avg episode reward: [(0, '6.493')] [2025-01-04 02:44:50,388][134294] Updated weights for policy 0, policy_version 74314 (0.0027) [2025-01-04 02:44:53,647][134294] Updated weights for policy 0, policy_version 74324 (0.0028) [2025-01-04 02:44:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15974.5, 300 sec: 15620.3). Total num frames: 304435200. Throughput: 0: 3872.5. Samples: 65276446. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:44:53,969][134211] Avg episode reward: [(0, '6.959')] [2025-01-04 02:44:56,772][134294] Updated weights for policy 0, policy_version 74334 (0.0027) [2025-01-04 02:44:58,969][134211] Fps is (10 sec: 13106.1, 60 sec: 15769.3, 300 sec: 15634.2). Total num frames: 304500736. Throughput: 0: 3877.6. Samples: 65295666. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:44:58,969][134211] Avg episode reward: [(0, '6.504')] [2025-01-04 02:44:59,999][134294] Updated weights for policy 0, policy_version 74344 (0.0027) [2025-01-04 02:45:03,190][134294] Updated weights for policy 0, policy_version 74354 (0.0027) [2025-01-04 02:45:03,967][134211] Fps is (10 sec: 13107.7, 60 sec: 15018.7, 300 sec: 15634.2). Total num frames: 304566272. Throughput: 0: 3894.1. Samples: 65305222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:45:03,968][134211] Avg episode reward: [(0, '7.003')] [2025-01-04 02:45:05,438][134294] Updated weights for policy 0, policy_version 74364 (0.0015) [2025-01-04 02:45:07,529][134294] Updated weights for policy 0, policy_version 74374 (0.0013) [2025-01-04 02:45:08,968][134211] Fps is (10 sec: 15976.0, 60 sec: 15360.0, 300 sec: 15731.5). Total num frames: 304660480. Throughput: 0: 3811.7. Samples: 65330468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:45:08,968][134211] Avg episode reward: [(0, '7.425')] [2025-01-04 02:45:09,593][134294] Updated weights for policy 0, policy_version 74384 (0.0015) [2025-01-04 02:45:11,459][134294] Updated weights for policy 0, policy_version 74394 (0.0014) [2025-01-04 02:45:13,379][134294] Updated weights for policy 0, policy_version 74404 (0.0013) [2025-01-04 02:45:13,968][134211] Fps is (10 sec: 20070.0, 60 sec: 16111.0, 300 sec: 15856.5). Total num frames: 304766976. Throughput: 0: 3993.2. Samples: 65361820. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:45:13,968][134211] Avg episode reward: [(0, '6.977')] [2025-01-04 02:45:14,009][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000074407_304771072.pth... [2025-01-04 02:45:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073481_300978176.pth [2025-01-04 02:45:16,106][134294] Updated weights for policy 0, policy_version 74414 (0.0022) [2025-01-04 02:45:18,968][134211] Fps is (10 sec: 17612.5, 60 sec: 16042.6, 300 sec: 15745.3). Total num frames: 304836608. Throughput: 0: 4030.7. Samples: 65373328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:45:18,968][134211] Avg episode reward: [(0, '7.290')] [2025-01-04 02:45:19,360][134294] Updated weights for policy 0, policy_version 74424 (0.0029) [2025-01-04 02:45:22,586][134294] Updated weights for policy 0, policy_version 74434 (0.0023) [2025-01-04 02:45:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.1, 300 sec: 15648.1). Total num frames: 304898048. Throughput: 0: 4004.7. Samples: 65392330. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:45:23,968][134211] Avg episode reward: [(0, '6.852')] [2025-01-04 02:45:25,750][134294] Updated weights for policy 0, policy_version 74444 (0.0026) [2025-01-04 02:45:28,724][134294] Updated weights for policy 0, policy_version 74454 (0.0025) [2025-01-04 02:45:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15360.1, 300 sec: 15662.0). Total num frames: 304963584. Throughput: 0: 3725.2. Samples: 65412068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:28,968][134211] Avg episode reward: [(0, '6.524')] [2025-01-04 02:45:31,850][134294] Updated weights for policy 0, policy_version 74464 (0.0026) [2025-01-04 02:45:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15360.0, 300 sec: 15578.7). Total num frames: 305029120. 
Throughput: 0: 3656.3. Samples: 65421968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:33,968][134211] Avg episode reward: [(0, '7.117')] [2025-01-04 02:45:35,007][134294] Updated weights for policy 0, policy_version 74474 (0.0026) [2025-01-04 02:45:37,717][134294] Updated weights for policy 0, policy_version 74484 (0.0019) [2025-01-04 02:45:38,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15291.7, 300 sec: 15481.5). Total num frames: 305111040. Throughput: 0: 3681.6. Samples: 65442118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:38,968][134211] Avg episode reward: [(0, '7.109')] [2025-01-04 02:45:39,716][134294] Updated weights for policy 0, policy_version 74494 (0.0016) [2025-01-04 02:45:41,921][134294] Updated weights for policy 0, policy_version 74504 (0.0016) [2025-01-04 02:45:43,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14882.1, 300 sec: 15425.9). Total num frames: 305192960. Throughput: 0: 3857.4. Samples: 65469244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:43,968][134211] Avg episode reward: [(0, '6.191')] [2025-01-04 02:45:45,119][134294] Updated weights for policy 0, policy_version 74514 (0.0027) [2025-01-04 02:45:47,477][134294] Updated weights for policy 0, policy_version 74524 (0.0016) [2025-01-04 02:45:48,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15223.5, 300 sec: 15523.4). Total num frames: 305283072. Throughput: 0: 3862.1. Samples: 65479016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:48,968][134211] Avg episode reward: [(0, '7.201')] [2025-01-04 02:45:49,410][134294] Updated weights for policy 0, policy_version 74534 (0.0014) [2025-01-04 02:45:51,384][134294] Updated weights for policy 0, policy_version 74544 (0.0014) [2025-01-04 02:45:53,307][134294] Updated weights for policy 0, policy_version 74554 (0.0013) [2025-01-04 02:45:53,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15769.6, 300 sec: 15634.2). Total num frames: 305381376. Throughput: 0: 4005.9. 
Samples: 65510734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:53,968][134211] Avg episode reward: [(0, '6.602')] [2025-01-04 02:45:56,315][134294] Updated weights for policy 0, policy_version 74564 (0.0025) [2025-01-04 02:45:58,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15701.5, 300 sec: 15634.2). Total num frames: 305442816. Throughput: 0: 3781.5. Samples: 65531986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:45:58,968][134211] Avg episode reward: [(0, '7.364')] [2025-01-04 02:45:59,748][134294] Updated weights for policy 0, policy_version 74574 (0.0026) [2025-01-04 02:46:03,047][134294] Updated weights for policy 0, policy_version 74584 (0.0027) [2025-01-04 02:46:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15633.0, 300 sec: 15648.1). Total num frames: 305504256. Throughput: 0: 3733.7. Samples: 65541344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:46:03,969][134211] Avg episode reward: [(0, '7.248')] [2025-01-04 02:46:06,022][134294] Updated weights for policy 0, policy_version 74594 (0.0027) [2025-01-04 02:46:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.4, 300 sec: 15578.7). Total num frames: 305573888. Throughput: 0: 3748.9. Samples: 65561030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 02:46:08,968][134211] Avg episode reward: [(0, '7.441')] [2025-01-04 02:46:09,268][134294] Updated weights for policy 0, policy_version 74604 (0.0025) [2025-01-04 02:46:11,299][134294] Updated weights for policy 0, policy_version 74614 (0.0013) [2025-01-04 02:46:13,235][134294] Updated weights for policy 0, policy_version 74624 (0.0012) [2025-01-04 02:46:13,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15155.2, 300 sec: 15564.8). Total num frames: 305676288. Throughput: 0: 3915.8. Samples: 65588278. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:13,968][134211] Avg episode reward: [(0, '7.368')] [2025-01-04 02:46:15,065][134294] Updated weights for policy 0, policy_version 74634 (0.0014) [2025-01-04 02:46:16,938][134294] Updated weights for policy 0, policy_version 74644 (0.0015) [2025-01-04 02:46:18,827][134294] Updated weights for policy 0, policy_version 74654 (0.0014) [2025-01-04 02:46:18,967][134211] Fps is (10 sec: 20890.2, 60 sec: 15769.7, 300 sec: 15703.7). Total num frames: 305782784. Throughput: 0: 4056.3. Samples: 65604500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:18,968][134211] Avg episode reward: [(0, '8.381')] [2025-01-04 02:46:21,473][134294] Updated weights for policy 0, policy_version 74664 (0.0022) [2025-01-04 02:46:23,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15906.1, 300 sec: 15717.5). Total num frames: 305852416. Throughput: 0: 4210.0. Samples: 65631568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:23,968][134211] Avg episode reward: [(0, '6.936')] [2025-01-04 02:46:24,592][134294] Updated weights for policy 0, policy_version 74674 (0.0026) [2025-01-04 02:46:27,775][134294] Updated weights for policy 0, policy_version 74684 (0.0030) [2025-01-04 02:46:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15906.1, 300 sec: 15703.6). Total num frames: 305917952. Throughput: 0: 4035.6. Samples: 65650848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:28,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 02:46:30,886][134294] Updated weights for policy 0, policy_version 74694 (0.0026) [2025-01-04 02:46:33,894][134294] Updated weights for policy 0, policy_version 74704 (0.0027) [2025-01-04 02:46:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15974.5, 300 sec: 15662.0). Total num frames: 305987584. Throughput: 0: 4042.5. Samples: 65660930. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:33,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 02:46:37,006][134294] Updated weights for policy 0, policy_version 74714 (0.0025) [2025-01-04 02:46:38,969][134211] Fps is (10 sec: 13105.3, 60 sec: 15632.6, 300 sec: 15634.1). Total num frames: 306049024. Throughput: 0: 3780.2. Samples: 65680848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:38,970][134211] Avg episode reward: [(0, '6.679')] [2025-01-04 02:46:40,169][134294] Updated weights for policy 0, policy_version 74724 (0.0027) [2025-01-04 02:46:43,123][134294] Updated weights for policy 0, policy_version 74734 (0.0026) [2025-01-04 02:46:43,968][134211] Fps is (10 sec: 13925.3, 60 sec: 15564.6, 300 sec: 15537.0). Total num frames: 306126848. Throughput: 0: 3758.6. Samples: 65701126. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:43,969][134211] Avg episode reward: [(0, '7.518')] [2025-01-04 02:46:45,041][134294] Updated weights for policy 0, policy_version 74744 (0.0015) [2025-01-04 02:46:47,716][134294] Updated weights for policy 0, policy_version 74754 (0.0020) [2025-01-04 02:46:48,968][134211] Fps is (10 sec: 15567.3, 60 sec: 15360.0, 300 sec: 15523.2). Total num frames: 306204672. Throughput: 0: 3874.3. Samples: 65715688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:48,968][134211] Avg episode reward: [(0, '7.458')] [2025-01-04 02:46:50,763][134294] Updated weights for policy 0, policy_version 74764 (0.0026) [2025-01-04 02:46:52,841][134294] Updated weights for policy 0, policy_version 74774 (0.0014) [2025-01-04 02:46:53,968][134211] Fps is (10 sec: 16795.1, 60 sec: 15223.5, 300 sec: 15620.3). Total num frames: 306294784. Throughput: 0: 3935.5. Samples: 65738126. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:46:53,968][134211] Avg episode reward: [(0, '7.412')] [2025-01-04 02:46:54,795][134294] Updated weights for policy 0, policy_version 74784 (0.0012) [2025-01-04 02:46:56,687][134294] Updated weights for policy 0, policy_version 74794 (0.0012) [2025-01-04 02:46:58,569][134294] Updated weights for policy 0, policy_version 74804 (0.0014) [2025-01-04 02:46:58,968][134211] Fps is (10 sec: 20070.5, 60 sec: 16042.7, 300 sec: 15717.5). Total num frames: 306405376. Throughput: 0: 4048.9. Samples: 65770478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:46:58,968][134211] Avg episode reward: [(0, '6.783')] [2025-01-04 02:47:00,433][134294] Updated weights for policy 0, policy_version 74814 (0.0013) [2025-01-04 02:47:02,350][134294] Updated weights for policy 0, policy_version 74824 (0.0013) [2025-01-04 02:47:03,968][134211] Fps is (10 sec: 21708.7, 60 sec: 16793.6, 300 sec: 15856.4). Total num frames: 306511872. Throughput: 0: 4049.3. Samples: 65786720. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:03,968][134211] Avg episode reward: [(0, '7.822')] [2025-01-04 02:47:04,312][134294] Updated weights for policy 0, policy_version 74834 (0.0016) [2025-01-04 02:47:07,722][134294] Updated weights for policy 0, policy_version 74844 (0.0026) [2025-01-04 02:47:08,968][134211] Fps is (10 sec: 16793.2, 60 sec: 16657.0, 300 sec: 15773.1). Total num frames: 306573312. Throughput: 0: 4010.2. Samples: 65812026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:08,969][134211] Avg episode reward: [(0, '7.782')] [2025-01-04 02:47:11,488][134294] Updated weights for policy 0, policy_version 74854 (0.0027) [2025-01-04 02:47:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 15837.8, 300 sec: 15578.7). Total num frames: 306626560. Throughput: 0: 3947.6. Samples: 65828490. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:13,968][134211] Avg episode reward: [(0, '7.327')] [2025-01-04 02:47:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000074861_306630656.pth... [2025-01-04 02:47:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000073948_302891008.pth [2025-01-04 02:47:15,031][134294] Updated weights for policy 0, policy_version 74864 (0.0028) [2025-01-04 02:47:18,126][134294] Updated weights for policy 0, policy_version 74874 (0.0027) [2025-01-04 02:47:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15155.1, 300 sec: 15523.1). Total num frames: 306692096. Throughput: 0: 3925.0. Samples: 65837556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:18,968][134211] Avg episode reward: [(0, '7.513')] [2025-01-04 02:47:21,080][134294] Updated weights for policy 0, policy_version 74884 (0.0026) [2025-01-04 02:47:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15086.9, 300 sec: 15550.9). Total num frames: 306757632. Throughput: 0: 3927.3. Samples: 65857570. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:23,969][134211] Avg episode reward: [(0, '7.667')] [2025-01-04 02:47:24,282][134294] Updated weights for policy 0, policy_version 74894 (0.0022) [2025-01-04 02:47:26,272][134294] Updated weights for policy 0, policy_version 74904 (0.0013) [2025-01-04 02:47:28,149][134294] Updated weights for policy 0, policy_version 74914 (0.0014) [2025-01-04 02:47:28,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15701.3, 300 sec: 15689.8). Total num frames: 306860032. Throughput: 0: 4090.1. Samples: 65885176. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:28,968][134211] Avg episode reward: [(0, '7.581')] [2025-01-04 02:47:30,949][134294] Updated weights for policy 0, policy_version 74924 (0.0024) [2025-01-04 02:47:33,929][134294] Updated weights for policy 0, policy_version 74934 (0.0024) [2025-01-04 02:47:33,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15701.3, 300 sec: 15662.0). Total num frames: 306929664. Throughput: 0: 4011.9. Samples: 65896226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:33,968][134211] Avg episode reward: [(0, '7.135')] [2025-01-04 02:47:36,938][134294] Updated weights for policy 0, policy_version 74944 (0.0026) [2025-01-04 02:47:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15770.0, 300 sec: 15537.0). Total num frames: 306995200. Throughput: 0: 3964.0. Samples: 65916506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:38,968][134211] Avg episode reward: [(0, '7.805')] [2025-01-04 02:47:40,239][134294] Updated weights for policy 0, policy_version 74954 (0.0028) [2025-01-04 02:47:42,571][134294] Updated weights for policy 0, policy_version 74964 (0.0016) [2025-01-04 02:47:43,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15838.1, 300 sec: 15606.5). Total num frames: 307077120. Throughput: 0: 3750.2. Samples: 65939238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:47:43,968][134211] Avg episode reward: [(0, '7.539')] [2025-01-04 02:47:44,543][134294] Updated weights for policy 0, policy_version 74974 (0.0014) [2025-01-04 02:47:46,451][134294] Updated weights for policy 0, policy_version 74984 (0.0014) [2025-01-04 02:47:48,370][134294] Updated weights for policy 0, policy_version 74994 (0.0013) [2025-01-04 02:47:48,968][134211] Fps is (10 sec: 19251.3, 60 sec: 16384.0, 300 sec: 15759.2). Total num frames: 307187712. Throughput: 0: 3750.2. Samples: 65955480. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:47:48,968][134211] Avg episode reward: [(0, '7.199')] [2025-01-04 02:47:50,251][134294] Updated weights for policy 0, policy_version 75004 (0.0015) [2025-01-04 02:47:52,788][134294] Updated weights for policy 0, policy_version 75014 (0.0025) [2025-01-04 02:47:53,968][134211] Fps is (10 sec: 19249.8, 60 sec: 16247.3, 300 sec: 15759.1). Total num frames: 307269632. Throughput: 0: 3855.5. Samples: 65985524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:47:53,969][134211] Avg episode reward: [(0, '7.136')] [2025-01-04 02:47:56,085][134294] Updated weights for policy 0, policy_version 75024 (0.0026) [2025-01-04 02:47:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15428.3, 300 sec: 15620.3). Total num frames: 307331072. Throughput: 0: 3907.4. Samples: 66004322. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:47:58,968][134211] Avg episode reward: [(0, '7.850')] [2025-01-04 02:47:59,336][134294] Updated weights for policy 0, policy_version 75034 (0.0027) [2025-01-04 02:48:02,522][134294] Updated weights for policy 0, policy_version 75044 (0.0029) [2025-01-04 02:48:03,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14745.6, 300 sec: 15495.4). Total num frames: 307396608. Throughput: 0: 3920.6. Samples: 66013984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:03,968][134211] Avg episode reward: [(0, '6.743')] [2025-01-04 02:48:05,649][134294] Updated weights for policy 0, policy_version 75054 (0.0024) [2025-01-04 02:48:08,678][134294] Updated weights for policy 0, policy_version 75064 (0.0025) [2025-01-04 02:48:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.9, 300 sec: 15495.4). Total num frames: 307462144. Throughput: 0: 3913.4. Samples: 66033674. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:08,968][134211] Avg episode reward: [(0, '7.430')] [2025-01-04 02:48:11,061][134294] Updated weights for policy 0, policy_version 75074 (0.0018) [2025-01-04 02:48:12,884][134294] Updated weights for policy 0, policy_version 75084 (0.0014) [2025-01-04 02:48:13,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15633.1, 300 sec: 15634.2). Total num frames: 307564544. Throughput: 0: 3898.8. Samples: 66060620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:13,968][134211] Avg episode reward: [(0, '7.159')] [2025-01-04 02:48:14,881][134294] Updated weights for policy 0, policy_version 75094 (0.0013) [2025-01-04 02:48:16,744][134294] Updated weights for policy 0, policy_version 75104 (0.0013) [2025-01-04 02:48:18,638][134294] Updated weights for policy 0, policy_version 75114 (0.0013) [2025-01-04 02:48:18,967][134211] Fps is (10 sec: 20889.9, 60 sec: 16315.8, 300 sec: 15759.2). Total num frames: 307671040. Throughput: 0: 4007.7. Samples: 66076572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:18,968][134211] Avg episode reward: [(0, '7.159')] [2025-01-04 02:48:21,023][134294] Updated weights for policy 0, policy_version 75124 (0.0021) [2025-01-04 02:48:23,968][134211] Fps is (10 sec: 18021.9, 60 sec: 16452.2, 300 sec: 15787.0). Total num frames: 307744768. Throughput: 0: 4166.2. Samples: 66103984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:23,969][134211] Avg episode reward: [(0, '7.486')] [2025-01-04 02:48:24,195][134294] Updated weights for policy 0, policy_version 75134 (0.0027) [2025-01-04 02:48:27,388][134294] Updated weights for policy 0, policy_version 75144 (0.0027) [2025-01-04 02:48:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15837.9, 300 sec: 15773.1). Total num frames: 307810304. Throughput: 0: 4085.8. Samples: 66123098. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:48:28,968][134211] Avg episode reward: [(0, '7.214')] [2025-01-04 02:48:30,490][134294] Updated weights for policy 0, policy_version 75154 (0.0025) [2025-01-04 02:48:33,458][134294] Updated weights for policy 0, policy_version 75164 (0.0025) [2025-01-04 02:48:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15769.6, 300 sec: 15745.3). Total num frames: 307875840. Throughput: 0: 3952.3. Samples: 66133334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:33,968][134211] Avg episode reward: [(0, '7.103')] [2025-01-04 02:48:36,799][134294] Updated weights for policy 0, policy_version 75174 (0.0026) [2025-01-04 02:48:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15701.3, 300 sec: 15578.7). Total num frames: 307937280. Throughput: 0: 3707.7. Samples: 66152368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:38,969][134211] Avg episode reward: [(0, '6.777')] [2025-01-04 02:48:40,240][134294] Updated weights for policy 0, policy_version 75184 (0.0027) [2025-01-04 02:48:42,346][134294] Updated weights for policy 0, policy_version 75194 (0.0013) [2025-01-04 02:48:43,967][134211] Fps is (10 sec: 15155.5, 60 sec: 15837.9, 300 sec: 15662.0). Total num frames: 308027392. Throughput: 0: 3817.6. Samples: 66176112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:43,968][134211] Avg episode reward: [(0, '6.423')] [2025-01-04 02:48:44,376][134294] Updated weights for policy 0, policy_version 75204 (0.0013) [2025-01-04 02:48:46,204][134294] Updated weights for policy 0, policy_version 75214 (0.0013) [2025-01-04 02:48:48,131][134294] Updated weights for policy 0, policy_version 75224 (0.0015) [2025-01-04 02:48:48,967][134211] Fps is (10 sec: 19661.5, 60 sec: 15769.6, 300 sec: 15787.0). Total num frames: 308133888. Throughput: 0: 3958.1. Samples: 66192098. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:48,968][134211] Avg episode reward: [(0, '6.947')] [2025-01-04 02:48:50,029][134294] Updated weights for policy 0, policy_version 75234 (0.0014) [2025-01-04 02:48:51,992][134294] Updated weights for policy 0, policy_version 75244 (0.0015) [2025-01-04 02:48:53,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15906.3, 300 sec: 15828.6). Total num frames: 308224000. Throughput: 0: 4221.5. Samples: 66223642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:53,969][134211] Avg episode reward: [(0, '6.865')] [2025-01-04 02:48:55,074][134294] Updated weights for policy 0, policy_version 75254 (0.0027) [2025-01-04 02:48:58,167][134294] Updated weights for policy 0, policy_version 75264 (0.0025) [2025-01-04 02:48:58,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15974.4, 300 sec: 15675.9). Total num frames: 308289536. Throughput: 0: 4063.0. Samples: 66243456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:48:58,968][134211] Avg episode reward: [(0, '6.660')] [2025-01-04 02:49:01,345][134294] Updated weights for policy 0, policy_version 75274 (0.0025) [2025-01-04 02:49:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15974.4, 300 sec: 15648.1). Total num frames: 308355072. Throughput: 0: 3924.7. Samples: 66253186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:49:03,969][134211] Avg episode reward: [(0, '7.474')] [2025-01-04 02:49:04,617][134294] Updated weights for policy 0, policy_version 75284 (0.0028) [2025-01-04 02:49:08,006][134294] Updated weights for policy 0, policy_version 75294 (0.0030) [2025-01-04 02:49:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15837.9, 300 sec: 15634.2). Total num frames: 308412416. Throughput: 0: 3729.3. Samples: 66271802. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:49:08,968][134211] Avg episode reward: [(0, '6.939')] [2025-01-04 02:49:11,388][134294] Updated weights for policy 0, policy_version 75304 (0.0026) [2025-01-04 02:49:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15155.1, 300 sec: 15592.6). Total num frames: 308473856. Throughput: 0: 3700.3. Samples: 66289614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:49:13,969][134211] Avg episode reward: [(0, '6.975')] [2025-01-04 02:49:13,986][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000075311_308473856.pth... [2025-01-04 02:49:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000074407_304771072.pth [2025-01-04 02:49:14,918][134294] Updated weights for policy 0, policy_version 75314 (0.0027) [2025-01-04 02:49:16,900][134294] Updated weights for policy 0, policy_version 75324 (0.0014) [2025-01-04 02:49:18,812][134294] Updated weights for policy 0, policy_version 75334 (0.0013) [2025-01-04 02:49:18,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14950.4, 300 sec: 15564.8). Total num frames: 308568064. Throughput: 0: 3723.9. Samples: 66300908. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:18,968][134211] Avg episode reward: [(0, '6.969')] [2025-01-04 02:49:20,754][134294] Updated weights for policy 0, policy_version 75344 (0.0013) [2025-01-04 02:49:22,730][134294] Updated weights for policy 0, policy_version 75354 (0.0014) [2025-01-04 02:49:23,968][134211] Fps is (10 sec: 19251.4, 60 sec: 15360.0, 300 sec: 15675.9). Total num frames: 308666368. Throughput: 0: 4012.4. Samples: 66332928. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:23,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 02:49:25,806][134294] Updated weights for policy 0, policy_version 75364 (0.0027) [2025-01-04 02:49:28,969][134211] Fps is (10 sec: 15972.4, 60 sec: 15291.4, 300 sec: 15661.9). Total num frames: 308727808. Throughput: 0: 3945.8. Samples: 66353680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:28,970][134211] Avg episode reward: [(0, '7.044')] [2025-01-04 02:49:29,033][134294] Updated weights for policy 0, policy_version 75374 (0.0027) [2025-01-04 02:49:32,099][134294] Updated weights for policy 0, policy_version 75384 (0.0027) [2025-01-04 02:49:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.7, 300 sec: 15592.6). Total num frames: 308793344. Throughput: 0: 3804.9. Samples: 66363322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:33,969][134211] Avg episode reward: [(0, '7.788')] [2025-01-04 02:49:35,250][134294] Updated weights for policy 0, policy_version 75394 (0.0026) [2025-01-04 02:49:37,902][134294] Updated weights for policy 0, policy_version 75404 (0.0022) [2025-01-04 02:49:38,968][134211] Fps is (10 sec: 14747.6, 60 sec: 15633.1, 300 sec: 15509.3). Total num frames: 308875264. Throughput: 0: 3555.7. Samples: 66383646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:38,968][134211] Avg episode reward: [(0, '7.786')] [2025-01-04 02:49:39,861][134294] Updated weights for policy 0, policy_version 75414 (0.0012) [2025-01-04 02:49:41,693][134294] Updated weights for policy 0, policy_version 75424 (0.0012) [2025-01-04 02:49:43,609][134294] Updated weights for policy 0, policy_version 75434 (0.0013) [2025-01-04 02:49:43,968][134211] Fps is (10 sec: 18842.0, 60 sec: 15906.1, 300 sec: 15634.2). Total num frames: 308981760. Throughput: 0: 3819.0. Samples: 66415312. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:43,968][134211] Avg episode reward: [(0, '7.402')] [2025-01-04 02:49:45,527][134294] Updated weights for policy 0, policy_version 75444 (0.0013) [2025-01-04 02:49:47,510][134294] Updated weights for policy 0, policy_version 75454 (0.0014) [2025-01-04 02:49:48,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15701.3, 300 sec: 15731.4). Total num frames: 309075968. Throughput: 0: 3965.4. Samples: 66431628. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:48,968][134211] Avg episode reward: [(0, '8.148')] [2025-01-04 02:49:50,613][134294] Updated weights for policy 0, policy_version 75464 (0.0027) [2025-01-04 02:49:53,801][134294] Updated weights for policy 0, policy_version 75474 (0.0027) [2025-01-04 02:49:53,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15291.7, 300 sec: 15731.5). Total num frames: 309141504. Throughput: 0: 4037.0. Samples: 66453466. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:53,969][134211] Avg episode reward: [(0, '7.808')] [2025-01-04 02:49:56,872][134294] Updated weights for policy 0, policy_version 75484 (0.0024) [2025-01-04 02:49:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15291.8, 300 sec: 15731.4). Total num frames: 309207040. Throughput: 0: 4074.4. Samples: 66472960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:49:58,968][134211] Avg episode reward: [(0, '7.218')] [2025-01-04 02:50:00,031][134294] Updated weights for policy 0, policy_version 75494 (0.0026) [2025-01-04 02:50:03,160][134294] Updated weights for policy 0, policy_version 75504 (0.0027) [2025-01-04 02:50:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15291.8, 300 sec: 15634.2). Total num frames: 309272576. Throughput: 0: 4038.9. Samples: 66482660. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:03,968][134211] Avg episode reward: [(0, '6.425')] [2025-01-04 02:50:06,179][134294] Updated weights for policy 0, policy_version 75514 (0.0027) [2025-01-04 02:50:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15428.3, 300 sec: 15495.4). Total num frames: 309338112. Throughput: 0: 3773.1. Samples: 66502718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:08,968][134211] Avg episode reward: [(0, '7.246')] [2025-01-04 02:50:09,358][134294] Updated weights for policy 0, policy_version 75524 (0.0025) [2025-01-04 02:50:11,539][134294] Updated weights for policy 0, policy_version 75534 (0.0018) [2025-01-04 02:50:13,358][134294] Updated weights for policy 0, policy_version 75544 (0.0013) [2025-01-04 02:50:13,967][134211] Fps is (10 sec: 16793.9, 60 sec: 16111.0, 300 sec: 15606.5). Total num frames: 309440512. Throughput: 0: 3898.5. Samples: 66529106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:13,968][134211] Avg episode reward: [(0, '7.442')] [2025-01-04 02:50:15,293][134294] Updated weights for policy 0, policy_version 75554 (0.0012) [2025-01-04 02:50:17,145][134294] Updated weights for policy 0, policy_version 75564 (0.0014) [2025-01-04 02:50:18,967][134211] Fps is (10 sec: 20889.9, 60 sec: 16315.8, 300 sec: 15759.2). Total num frames: 309547008. Throughput: 0: 4046.9. Samples: 66545430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:18,968][134211] Avg episode reward: [(0, '7.340')] [2025-01-04 02:50:19,133][134294] Updated weights for policy 0, policy_version 75574 (0.0012) [2025-01-04 02:50:21,750][134294] Updated weights for policy 0, policy_version 75584 (0.0024) [2025-01-04 02:50:23,968][134211] Fps is (10 sec: 17611.3, 60 sec: 15837.7, 300 sec: 15773.0). Total num frames: 309616640. Throughput: 0: 4193.7. Samples: 66572364. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:23,969][134211] Avg episode reward: [(0, '6.149')] [2025-01-04 02:50:25,026][134294] Updated weights for policy 0, policy_version 75594 (0.0028) [2025-01-04 02:50:28,058][134294] Updated weights for policy 0, policy_version 75604 (0.0027) [2025-01-04 02:50:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15906.5, 300 sec: 15773.1). Total num frames: 309682176. Throughput: 0: 3923.8. Samples: 66591882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:28,968][134211] Avg episode reward: [(0, '6.615')] [2025-01-04 02:50:31,145][134294] Updated weights for policy 0, policy_version 75614 (0.0026) [2025-01-04 02:50:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15974.3, 300 sec: 15731.4). Total num frames: 309751808. Throughput: 0: 3787.1. Samples: 66602050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:33,969][134211] Avg episode reward: [(0, '7.007')] [2025-01-04 02:50:34,336][134294] Updated weights for policy 0, policy_version 75624 (0.0027) [2025-01-04 02:50:37,339][134294] Updated weights for policy 0, policy_version 75634 (0.0026) [2025-01-04 02:50:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15633.0, 300 sec: 15662.0). Total num frames: 309813248. Throughput: 0: 3741.3. Samples: 66621824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:38,969][134211] Avg episode reward: [(0, '7.413')] [2025-01-04 02:50:40,230][134294] Updated weights for policy 0, policy_version 75644 (0.0023) [2025-01-04 02:50:42,084][134294] Updated weights for policy 0, policy_version 75654 (0.0013) [2025-01-04 02:50:43,968][134211] Fps is (10 sec: 16385.0, 60 sec: 15564.8, 300 sec: 15703.6). Total num frames: 309915648. Throughput: 0: 3907.7. Samples: 66648806. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:50:43,968][134211] Avg episode reward: [(0, '6.810')] [2025-01-04 02:50:44,007][134294] Updated weights for policy 0, policy_version 75664 (0.0012) [2025-01-04 02:50:45,887][134294] Updated weights for policy 0, policy_version 75674 (0.0015) [2025-01-04 02:50:47,746][134294] Updated weights for policy 0, policy_version 75684 (0.0013) [2025-01-04 02:50:48,968][134211] Fps is (10 sec: 21299.7, 60 sec: 15837.9, 300 sec: 15745.3). Total num frames: 310026240. Throughput: 0: 4055.3. Samples: 66665148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:50:48,968][134211] Avg episode reward: [(0, '7.943')] [2025-01-04 02:50:49,649][134294] Updated weights for policy 0, policy_version 75694 (0.0012) [2025-01-04 02:50:51,495][134294] Updated weights for policy 0, policy_version 75704 (0.0014) [2025-01-04 02:50:53,526][134294] Updated weights for policy 0, policy_version 75714 (0.0017) [2025-01-04 02:50:53,968][134211] Fps is (10 sec: 21298.7, 60 sec: 16452.3, 300 sec: 15884.1). Total num frames: 310128640. Throughput: 0: 4334.2. Samples: 66697758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:50:53,968][134211] Avg episode reward: [(0, '7.572')] [2025-01-04 02:50:56,770][134294] Updated weights for policy 0, policy_version 75724 (0.0025) [2025-01-04 02:50:58,968][134211] Fps is (10 sec: 16383.7, 60 sec: 16384.0, 300 sec: 15884.2). Total num frames: 310190080. Throughput: 0: 4215.0. Samples: 66718782. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:50:58,968][134211] Avg episode reward: [(0, '7.468')] [2025-01-04 02:51:00,145][134294] Updated weights for policy 0, policy_version 75734 (0.0027) [2025-01-04 02:51:03,315][134294] Updated weights for policy 0, policy_version 75744 (0.0027) [2025-01-04 02:51:03,968][134211] Fps is (10 sec: 12697.1, 60 sec: 16383.9, 300 sec: 15870.2). Total num frames: 310255616. Throughput: 0: 4060.0. Samples: 66728134. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:03,969][134211] Avg episode reward: [(0, '6.835')] [2025-01-04 02:51:06,371][134294] Updated weights for policy 0, policy_version 75754 (0.0027) [2025-01-04 02:51:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 16315.7, 300 sec: 15731.4). Total num frames: 310317056. Throughput: 0: 3889.7. Samples: 66747398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:08,968][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 02:51:10,091][134294] Updated weights for policy 0, policy_version 75764 (0.0025) [2025-01-04 02:51:13,174][134294] Updated weights for policy 0, policy_version 75774 (0.0021) [2025-01-04 02:51:13,968][134211] Fps is (10 sec: 12698.2, 60 sec: 15701.3, 300 sec: 15592.6). Total num frames: 310382592. Throughput: 0: 3867.8. Samples: 66765934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:13,968][134211] Avg episode reward: [(0, '6.591')] [2025-01-04 02:51:14,039][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000075778_310386688.pth... [2025-01-04 02:51:14,085][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000074861_306630656.pth [2025-01-04 02:51:15,260][134294] Updated weights for policy 0, policy_version 75784 (0.0014) [2025-01-04 02:51:17,593][134294] Updated weights for policy 0, policy_version 75794 (0.0018) [2025-01-04 02:51:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15360.0, 300 sec: 15648.1). Total num frames: 310468608. Throughput: 0: 3975.2. Samples: 66780932. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:18,968][134211] Avg episode reward: [(0, '7.634')] [2025-01-04 02:51:20,485][134294] Updated weights for policy 0, policy_version 75804 (0.0026) [2025-01-04 02:51:23,444][134294] Updated weights for policy 0, policy_version 75814 (0.0026) [2025-01-04 02:51:23,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15360.1, 300 sec: 15662.0). Total num frames: 310538240. Throughput: 0: 4011.5. Samples: 66802340. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:23,968][134211] Avg episode reward: [(0, '6.708')] [2025-01-04 02:51:26,424][134294] Updated weights for policy 0, policy_version 75824 (0.0025) [2025-01-04 02:51:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15428.2, 300 sec: 15662.0). Total num frames: 310607872. Throughput: 0: 3864.7. Samples: 66822716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 02:51:28,968][134211] Avg episode reward: [(0, '7.231')] [2025-01-04 02:51:29,549][134294] Updated weights for policy 0, policy_version 75834 (0.0027) [2025-01-04 02:51:31,488][134294] Updated weights for policy 0, policy_version 75844 (0.0014) [2025-01-04 02:51:33,375][134294] Updated weights for policy 0, policy_version 75854 (0.0012) [2025-01-04 02:51:33,967][134211] Fps is (10 sec: 17203.7, 60 sec: 15974.6, 300 sec: 15800.9). Total num frames: 310710272. Throughput: 0: 3790.8. Samples: 66835732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:33,968][134211] Avg episode reward: [(0, '6.712')] [2025-01-04 02:51:35,206][134294] Updated weights for policy 0, policy_version 75864 (0.0013) [2025-01-04 02:51:37,128][134294] Updated weights for policy 0, policy_version 75874 (0.0011) [2025-01-04 02:51:38,968][134211] Fps is (10 sec: 20890.0, 60 sec: 16725.4, 300 sec: 15898.1). Total num frames: 310816768. Throughput: 0: 3793.1. Samples: 66868448. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:38,968][134211] Avg episode reward: [(0, '7.214')] [2025-01-04 02:51:39,040][134294] Updated weights for policy 0, policy_version 75884 (0.0014) [2025-01-04 02:51:40,955][134294] Updated weights for policy 0, policy_version 75894 (0.0014) [2025-01-04 02:51:43,881][134294] Updated weights for policy 0, policy_version 75904 (0.0028) [2025-01-04 02:51:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 16452.2, 300 sec: 15925.8). Total num frames: 310902784. Throughput: 0: 3956.7. Samples: 66896832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:43,968][134211] Avg episode reward: [(0, '6.877')] [2025-01-04 02:51:47,170][134294] Updated weights for policy 0, policy_version 75914 (0.0027) [2025-01-04 02:51:48,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15633.0, 300 sec: 15828.6). Total num frames: 310964224. Throughput: 0: 3954.6. Samples: 66906090. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:48,968][134211] Avg episode reward: [(0, '7.702')] [2025-01-04 02:51:50,528][134294] Updated weights for policy 0, policy_version 75924 (0.0029) [2025-01-04 02:51:53,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14882.1, 300 sec: 15648.1). Total num frames: 311021568. Throughput: 0: 3932.7. Samples: 66924370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:53,969][134211] Avg episode reward: [(0, '7.663')] [2025-01-04 02:51:53,992][134294] Updated weights for policy 0, policy_version 75934 (0.0025) [2025-01-04 02:51:57,490][134294] Updated weights for policy 0, policy_version 75944 (0.0028) [2025-01-04 02:51:58,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14813.9, 300 sec: 15481.5). Total num frames: 311078912. Throughput: 0: 3905.7. Samples: 66941690. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:51:58,968][134211] Avg episode reward: [(0, '7.514')] [2025-01-04 02:52:00,711][134294] Updated weights for policy 0, policy_version 75954 (0.0022) [2025-01-04 02:52:02,707][134294] Updated weights for policy 0, policy_version 75964 (0.0015) [2025-01-04 02:52:03,967][134211] Fps is (10 sec: 15155.7, 60 sec: 15291.9, 300 sec: 15592.6). Total num frames: 311173120. Throughput: 0: 3816.1. Samples: 66952658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:03,968][134211] Avg episode reward: [(0, '6.203')] [2025-01-04 02:52:04,600][134294] Updated weights for policy 0, policy_version 75974 (0.0013) [2025-01-04 02:52:06,577][134294] Updated weights for policy 0, policy_version 75984 (0.0015) [2025-01-04 02:52:08,968][134211] Fps is (10 sec: 18022.1, 60 sec: 15701.3, 300 sec: 15703.6). Total num frames: 311259136. Throughput: 0: 4014.1. Samples: 66982974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:08,969][134211] Avg episode reward: [(0, '7.071')] [2025-01-04 02:52:09,840][134294] Updated weights for policy 0, policy_version 75994 (0.0024) [2025-01-04 02:52:13,189][134294] Updated weights for policy 0, policy_version 76004 (0.0027) [2025-01-04 02:52:13,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15633.0, 300 sec: 15689.8). Total num frames: 311320576. Throughput: 0: 3968.9. Samples: 67001316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:13,968][134211] Avg episode reward: [(0, '7.997')] [2025-01-04 02:52:16,274][134294] Updated weights for policy 0, policy_version 76014 (0.0026) [2025-01-04 02:52:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15291.7, 300 sec: 15689.8). Total num frames: 311386112. Throughput: 0: 3901.6. Samples: 67011304. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:18,968][134211] Avg episode reward: [(0, '7.258')] [2025-01-04 02:52:19,392][134294] Updated weights for policy 0, policy_version 76024 (0.0025) [2025-01-04 02:52:21,828][134294] Updated weights for policy 0, policy_version 76034 (0.0020) [2025-01-04 02:52:23,770][134294] Updated weights for policy 0, policy_version 76044 (0.0012) [2025-01-04 02:52:23,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15701.4, 300 sec: 15662.0). Total num frames: 311480320. Throughput: 0: 3679.8. Samples: 67034038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:23,968][134211] Avg episode reward: [(0, '6.917')] [2025-01-04 02:52:25,601][134294] Updated weights for policy 0, policy_version 76054 (0.0014) [2025-01-04 02:52:27,664][134294] Updated weights for policy 0, policy_version 76064 (0.0017) [2025-01-04 02:52:28,968][134211] Fps is (10 sec: 18841.0, 60 sec: 16110.9, 300 sec: 15745.3). Total num frames: 311574528. Throughput: 0: 3726.8. Samples: 67064540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:28,969][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 02:52:30,867][134294] Updated weights for policy 0, policy_version 76074 (0.0026) [2025-01-04 02:52:33,962][134294] Updated weights for policy 0, policy_version 76084 (0.0029) [2025-01-04 02:52:33,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15496.5, 300 sec: 15745.3). Total num frames: 311640064. Throughput: 0: 3730.4. Samples: 67073960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:33,968][134211] Avg episode reward: [(0, '7.658')] [2025-01-04 02:52:37,127][134294] Updated weights for policy 0, policy_version 76094 (0.0027) [2025-01-04 02:52:38,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14745.6, 300 sec: 15675.9). Total num frames: 311701504. Throughput: 0: 3758.6. Samples: 67093508. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:38,968][134211] Avg episode reward: [(0, '7.325')] [2025-01-04 02:52:40,372][134294] Updated weights for policy 0, policy_version 76104 (0.0027) [2025-01-04 02:52:42,509][134294] Updated weights for policy 0, policy_version 76114 (0.0015) [2025-01-04 02:52:43,967][134211] Fps is (10 sec: 15155.5, 60 sec: 14813.9, 300 sec: 15606.5). Total num frames: 311791616. Throughput: 0: 3902.0. Samples: 67117280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:43,968][134211] Avg episode reward: [(0, '7.150')] [2025-01-04 02:52:44,466][134294] Updated weights for policy 0, policy_version 76124 (0.0014) [2025-01-04 02:52:46,382][134294] Updated weights for policy 0, policy_version 76134 (0.0013) [2025-01-04 02:52:48,944][134294] Updated weights for policy 0, policy_version 76144 (0.0020) [2025-01-04 02:52:48,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15360.0, 300 sec: 15648.1). Total num frames: 311885824. Throughput: 0: 4016.7. Samples: 67133412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:48,968][134211] Avg episode reward: [(0, '6.699')] [2025-01-04 02:52:52,291][134294] Updated weights for policy 0, policy_version 76154 (0.0029) [2025-01-04 02:52:53,968][134211] Fps is (10 sec: 15564.2, 60 sec: 15428.3, 300 sec: 15648.1). Total num frames: 311947264. Throughput: 0: 3814.8. Samples: 67154640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:53,972][134211] Avg episode reward: [(0, '6.969')] [2025-01-04 02:52:55,426][134294] Updated weights for policy 0, policy_version 76164 (0.0028) [2025-01-04 02:52:58,038][134294] Updated weights for policy 0, policy_version 76174 (0.0020) [2025-01-04 02:52:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15769.6, 300 sec: 15689.8). Total num frames: 312025088. Throughput: 0: 3888.1. Samples: 67176280. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:52:58,968][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 02:53:00,386][134294] Updated weights for policy 0, policy_version 76184 (0.0020) [2025-01-04 02:53:03,555][134294] Updated weights for policy 0, policy_version 76194 (0.0028) [2025-01-04 02:53:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15359.9, 300 sec: 15703.6). Total num frames: 312094720. Throughput: 0: 3930.3. Samples: 67188166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:53:03,968][134211] Avg episode reward: [(0, '7.560')] [2025-01-04 02:53:06,620][134294] Updated weights for policy 0, policy_version 76204 (0.0029) [2025-01-04 02:53:08,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15155.3, 300 sec: 15606.5). Total num frames: 312168448. Throughput: 0: 3865.7. Samples: 67207996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:08,968][134211] Avg episode reward: [(0, '7.447')] [2025-01-04 02:53:09,067][134294] Updated weights for policy 0, policy_version 76214 (0.0015) [2025-01-04 02:53:11,154][134294] Updated weights for policy 0, policy_version 76224 (0.0012) [2025-01-04 02:53:13,165][134294] Updated weights for policy 0, policy_version 76234 (0.0012) [2025-01-04 02:53:13,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15837.9, 300 sec: 15592.6). Total num frames: 312270848. Throughput: 0: 3826.6. Samples: 67236736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:13,968][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 02:53:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000076238_312270848.pth... 
[2025-01-04 02:53:14,021][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000075311_308473856.pth [2025-01-04 02:53:15,165][134294] Updated weights for policy 0, policy_version 76244 (0.0015) [2025-01-04 02:53:17,425][134294] Updated weights for policy 0, policy_version 76254 (0.0018) [2025-01-04 02:53:18,968][134211] Fps is (10 sec: 18841.1, 60 sec: 16179.2, 300 sec: 15634.2). Total num frames: 312356864. Throughput: 0: 3967.0. Samples: 67252476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:18,968][134211] Avg episode reward: [(0, '7.339')] [2025-01-04 02:53:20,463][134294] Updated weights for policy 0, policy_version 76264 (0.0027) [2025-01-04 02:53:23,645][134294] Updated weights for policy 0, policy_version 76274 (0.0024) [2025-01-04 02:53:23,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15701.3, 300 sec: 15634.2). Total num frames: 312422400. Throughput: 0: 3993.6. Samples: 67273220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:23,969][134211] Avg episode reward: [(0, '6.719')] [2025-01-04 02:53:26,650][134294] Updated weights for policy 0, policy_version 76284 (0.0029) [2025-01-04 02:53:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15223.6, 300 sec: 15634.2). Total num frames: 312487936. Throughput: 0: 3900.3. Samples: 67292796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:28,968][134211] Avg episode reward: [(0, '7.957')] [2025-01-04 02:53:29,808][134294] Updated weights for policy 0, policy_version 76294 (0.0026) [2025-01-04 02:53:32,865][134294] Updated weights for policy 0, policy_version 76304 (0.0024) [2025-01-04 02:53:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.4, 300 sec: 15648.1). Total num frames: 312553472. Throughput: 0: 3758.2. Samples: 67302530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:33,968][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 02:53:35,918][134294] Updated weights for policy 0, policy_version 76314 (0.0024) [2025-01-04 02:53:37,859][134294] Updated weights for policy 0, policy_version 76324 (0.0013) [2025-01-04 02:53:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15701.4, 300 sec: 15648.1). Total num frames: 312643584. Throughput: 0: 3794.2. Samples: 67325378. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:38,968][134211] Avg episode reward: [(0, '6.984')] [2025-01-04 02:53:39,755][134294] Updated weights for policy 0, policy_version 76334 (0.0014) [2025-01-04 02:53:41,621][134294] Updated weights for policy 0, policy_version 76344 (0.0013) [2025-01-04 02:53:43,547][134294] Updated weights for policy 0, policy_version 76354 (0.0013) [2025-01-04 02:53:43,968][134211] Fps is (10 sec: 20070.9, 60 sec: 16042.6, 300 sec: 15662.0). Total num frames: 312754176. Throughput: 0: 4036.7. Samples: 67357930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:43,968][134211] Avg episode reward: [(0, '7.568')] [2025-01-04 02:53:45,670][134294] Updated weights for policy 0, policy_version 76364 (0.0014) [2025-01-04 02:53:48,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15633.0, 300 sec: 15592.6). Total num frames: 312823808. Throughput: 0: 4083.0. Samples: 67371902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:53:48,969][134211] Avg episode reward: [(0, '8.208')] [2025-01-04 02:53:49,058][134294] Updated weights for policy 0, policy_version 76374 (0.0025) [2025-01-04 02:53:52,484][134294] Updated weights for policy 0, policy_version 76384 (0.0027) [2025-01-04 02:53:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15633.1, 300 sec: 15578.7). Total num frames: 312885248. Throughput: 0: 4027.0. Samples: 67389214. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:53:53,968][134211] Avg episode reward: [(0, '6.871')] [2025-01-04 02:53:55,927][134294] Updated weights for policy 0, policy_version 76394 (0.0027) [2025-01-04 02:53:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15360.0, 300 sec: 15564.8). Total num frames: 312946688. Throughput: 0: 3800.5. Samples: 67407760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:53:58,968][134211] Avg episode reward: [(0, '7.821')] [2025-01-04 02:53:59,162][134294] Updated weights for policy 0, policy_version 76404 (0.0027) [2025-01-04 02:54:02,222][134294] Updated weights for policy 0, policy_version 76414 (0.0023) [2025-01-04 02:54:03,967][134211] Fps is (10 sec: 13926.8, 60 sec: 15496.6, 300 sec: 15634.2). Total num frames: 313024512. Throughput: 0: 3652.2. Samples: 67416826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:03,968][134211] Avg episode reward: [(0, '7.951')] [2025-01-04 02:54:04,291][134294] Updated weights for policy 0, policy_version 76424 (0.0015) [2025-01-04 02:54:06,372][134294] Updated weights for policy 0, policy_version 76434 (0.0012) [2025-01-04 02:54:08,452][134294] Updated weights for policy 0, policy_version 76444 (0.0014) [2025-01-04 02:54:08,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15906.1, 300 sec: 15759.2). Total num frames: 313122816. Throughput: 0: 3831.0. Samples: 67445612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:08,968][134211] Avg episode reward: [(0, '8.123')] [2025-01-04 02:54:10,633][134294] Updated weights for policy 0, policy_version 76454 (0.0013) [2025-01-04 02:54:12,794][134294] Updated weights for policy 0, policy_version 76464 (0.0014) [2025-01-04 02:54:13,968][134211] Fps is (10 sec: 18840.5, 60 sec: 15701.2, 300 sec: 15745.3). Total num frames: 313212928. Throughput: 0: 4027.6. Samples: 67474042. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:13,969][134211] Avg episode reward: [(0, '7.240')] [2025-01-04 02:54:16,151][134294] Updated weights for policy 0, policy_version 76474 (0.0025) [2025-01-04 02:54:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15155.2, 300 sec: 15592.6). Total num frames: 313266176. Throughput: 0: 4004.4. Samples: 67482726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:18,968][134211] Avg episode reward: [(0, '7.523')] [2025-01-04 02:54:19,941][134294] Updated weights for policy 0, policy_version 76484 (0.0028) [2025-01-04 02:54:23,567][134294] Updated weights for policy 0, policy_version 76494 (0.0026) [2025-01-04 02:54:23,970][134211] Fps is (10 sec: 11057.2, 60 sec: 15018.1, 300 sec: 15578.6). Total num frames: 313323520. Throughput: 0: 3860.2. Samples: 67499098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:23,971][134211] Avg episode reward: [(0, '7.436')] [2025-01-04 02:54:27,126][134294] Updated weights for policy 0, policy_version 76504 (0.0025) [2025-01-04 02:54:28,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15086.9, 300 sec: 15592.6). Total num frames: 313393152. Throughput: 0: 3561.4. Samples: 67518194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:28,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 02:54:29,287][134294] Updated weights for policy 0, policy_version 76514 (0.0013) [2025-01-04 02:54:31,361][134294] Updated weights for policy 0, policy_version 76524 (0.0017) [2025-01-04 02:54:33,968][134211] Fps is (10 sec: 15158.3, 60 sec: 15360.0, 300 sec: 15592.6). Total num frames: 313475072. Throughput: 0: 3583.1. Samples: 67533142. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:33,968][134211] Avg episode reward: [(0, '7.585')] [2025-01-04 02:54:34,271][134294] Updated weights for policy 0, policy_version 76534 (0.0025) [2025-01-04 02:54:37,394][134294] Updated weights for policy 0, policy_version 76544 (0.0026) [2025-01-04 02:54:38,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14950.4, 300 sec: 15453.7). Total num frames: 313540608. Throughput: 0: 3653.0. Samples: 67553598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 02:54:38,968][134211] Avg episode reward: [(0, '8.047')] [2025-01-04 02:54:40,517][134294] Updated weights for policy 0, policy_version 76554 (0.0028) [2025-01-04 02:54:43,638][134294] Updated weights for policy 0, policy_version 76564 (0.0027) [2025-01-04 02:54:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.6, 300 sec: 15370.4). Total num frames: 313610240. Throughput: 0: 3681.5. Samples: 67573428. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:54:43,969][134211] Avg episode reward: [(0, '7.452')] [2025-01-04 02:54:46,174][134294] Updated weights for policy 0, policy_version 76574 (0.0021) [2025-01-04 02:54:48,079][134294] Updated weights for policy 0, policy_version 76584 (0.0013) [2025-01-04 02:54:48,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14677.4, 300 sec: 15467.6). Total num frames: 313704448. Throughput: 0: 3738.9. Samples: 67585078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:54:48,968][134211] Avg episode reward: [(0, '7.515')] [2025-01-04 02:54:50,000][134294] Updated weights for policy 0, policy_version 76594 (0.0015) [2025-01-04 02:54:53,313][134294] Updated weights for policy 0, policy_version 76604 (0.0027) [2025-01-04 02:54:53,968][134211] Fps is (10 sec: 16384.5, 60 sec: 14813.9, 300 sec: 15481.5). Total num frames: 313774080. Throughput: 0: 3709.0. Samples: 67612516. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:54:53,968][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 02:54:56,457][134294] Updated weights for policy 0, policy_version 76614 (0.0026) [2025-01-04 02:54:58,422][134294] Updated weights for policy 0, policy_version 76624 (0.0015) [2025-01-04 02:54:58,967][134211] Fps is (10 sec: 15564.9, 60 sec: 15223.5, 300 sec: 15550.9). Total num frames: 313860096. Throughput: 0: 3574.4. Samples: 67634890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:54:58,968][134211] Avg episode reward: [(0, '7.191')] [2025-01-04 02:55:00,310][134294] Updated weights for policy 0, policy_version 76634 (0.0014) [2025-01-04 02:55:02,218][134294] Updated weights for policy 0, policy_version 76644 (0.0013) [2025-01-04 02:55:03,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15769.6, 300 sec: 15703.7). Total num frames: 313970688. Throughput: 0: 3743.1. Samples: 67651166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:55:03,968][134211] Avg episode reward: [(0, '7.794')] [2025-01-04 02:55:04,128][134294] Updated weights for policy 0, policy_version 76654 (0.0015) [2025-01-04 02:55:07,012][134294] Updated weights for policy 0, policy_version 76664 (0.0027) [2025-01-04 02:55:08,968][134211] Fps is (10 sec: 17611.2, 60 sec: 15223.3, 300 sec: 15578.6). Total num frames: 314036224. Throughput: 0: 3962.2. Samples: 67677392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:55:08,969][134211] Avg episode reward: [(0, '6.706')] [2025-01-04 02:55:10,500][134294] Updated weights for policy 0, policy_version 76674 (0.0032) [2025-01-04 02:55:13,968][134211] Fps is (10 sec: 12287.5, 60 sec: 14677.4, 300 sec: 15412.0). Total num frames: 314093568. Throughput: 0: 3929.2. Samples: 67695008. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:55:13,969][134211] Avg episode reward: [(0, '6.864')] [2025-01-04 02:55:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000076683_314093568.pth... [2025-01-04 02:55:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000075778_310386688.pth [2025-01-04 02:55:14,143][134294] Updated weights for policy 0, policy_version 76684 (0.0029) [2025-01-04 02:55:17,509][134294] Updated weights for policy 0, policy_version 76694 (0.0026) [2025-01-04 02:55:18,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14950.4, 300 sec: 15412.1). Total num frames: 314163200. Throughput: 0: 3789.1. Samples: 67703650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:55:18,968][134211] Avg episode reward: [(0, '7.657')] [2025-01-04 02:55:19,721][134294] Updated weights for policy 0, policy_version 76704 (0.0013) [2025-01-04 02:55:21,591][134294] Updated weights for policy 0, policy_version 76714 (0.0014) [2025-01-04 02:55:23,461][134294] Updated weights for policy 0, policy_version 76724 (0.0013) [2025-01-04 02:55:23,968][134211] Fps is (10 sec: 17613.5, 60 sec: 15770.2, 300 sec: 15550.9). Total num frames: 314269696. Throughput: 0: 3954.9. Samples: 67731566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:55:23,968][134211] Avg episode reward: [(0, '7.179')] [2025-01-04 02:55:25,809][134294] Updated weights for policy 0, policy_version 76734 (0.0019) [2025-01-04 02:55:28,870][134294] Updated weights for policy 0, policy_version 76744 (0.0028) [2025-01-04 02:55:28,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15837.8, 300 sec: 15564.8). Total num frames: 314343424. Throughput: 0: 4071.8. Samples: 67756658. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:28,968][134211] Avg episode reward: [(0, '7.370')] [2025-01-04 02:55:31,933][134294] Updated weights for policy 0, policy_version 76754 (0.0025) [2025-01-04 02:55:33,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15564.8, 300 sec: 15578.7). Total num frames: 314408960. Throughput: 0: 4037.5. Samples: 67766768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:33,969][134211] Avg episode reward: [(0, '7.598')] [2025-01-04 02:55:35,102][134294] Updated weights for policy 0, policy_version 76764 (0.0028) [2025-01-04 02:55:38,355][134294] Updated weights for policy 0, policy_version 76774 (0.0026) [2025-01-04 02:55:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15496.5, 300 sec: 15439.8). Total num frames: 314470400. Throughput: 0: 3853.8. Samples: 67785936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:38,968][134211] Avg episode reward: [(0, '7.123')] [2025-01-04 02:55:40,709][134294] Updated weights for policy 0, policy_version 76784 (0.0016) [2025-01-04 02:55:42,604][134294] Updated weights for policy 0, policy_version 76794 (0.0015) [2025-01-04 02:55:43,967][134211] Fps is (10 sec: 16794.1, 60 sec: 16111.1, 300 sec: 15426.0). Total num frames: 314576896. Throughput: 0: 3970.9. Samples: 67813582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:43,968][134211] Avg episode reward: [(0, '8.366')] [2025-01-04 02:55:44,432][134294] Updated weights for policy 0, policy_version 76804 (0.0013) [2025-01-04 02:55:46,348][134294] Updated weights for policy 0, policy_version 76814 (0.0015) [2025-01-04 02:55:48,233][134294] Updated weights for policy 0, policy_version 76824 (0.0013) [2025-01-04 02:55:48,968][134211] Fps is (10 sec: 21299.7, 60 sec: 16315.7, 300 sec: 15439.8). Total num frames: 314683392. Throughput: 0: 3971.9. Samples: 67829904. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:48,968][134211] Avg episode reward: [(0, '7.286')] [2025-01-04 02:55:50,712][134294] Updated weights for policy 0, policy_version 76834 (0.0020) [2025-01-04 02:55:53,968][134211] Fps is (10 sec: 17202.9, 60 sec: 16247.4, 300 sec: 15453.7). Total num frames: 314748928. Throughput: 0: 3966.8. Samples: 67855894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:53,968][134211] Avg episode reward: [(0, '7.735')] [2025-01-04 02:55:54,003][134294] Updated weights for policy 0, policy_version 76844 (0.0027) [2025-01-04 02:55:57,402][134294] Updated weights for policy 0, policy_version 76854 (0.0026) [2025-01-04 02:55:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15906.1, 300 sec: 15453.7). Total num frames: 314814464. Throughput: 0: 3986.5. Samples: 67874400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:55:58,968][134211] Avg episode reward: [(0, '7.713')] [2025-01-04 02:56:00,398][134294] Updated weights for policy 0, policy_version 76864 (0.0026) [2025-01-04 02:56:03,479][134294] Updated weights for policy 0, policy_version 76874 (0.0028) [2025-01-04 02:56:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15155.1, 300 sec: 15467.6). Total num frames: 314880000. Throughput: 0: 4018.6. Samples: 67884486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:03,968][134211] Avg episode reward: [(0, '7.614')] [2025-01-04 02:56:06,575][134294] Updated weights for policy 0, policy_version 76884 (0.0025) [2025-01-04 02:56:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.4, 300 sec: 15467.6). Total num frames: 314945536. Throughput: 0: 3843.5. Samples: 67904524. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:08,968][134211] Avg episode reward: [(0, '8.029')] [2025-01-04 02:56:09,657][134294] Updated weights for policy 0, policy_version 76894 (0.0028) [2025-01-04 02:56:12,658][134294] Updated weights for policy 0, policy_version 76904 (0.0028) [2025-01-04 02:56:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15496.6, 300 sec: 15439.8). Total num frames: 315023360. Throughput: 0: 3740.9. Samples: 67924996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:13,968][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 02:56:14,699][134294] Updated weights for policy 0, policy_version 76914 (0.0012) [2025-01-04 02:56:16,564][134294] Updated weights for policy 0, policy_version 76924 (0.0013) [2025-01-04 02:56:18,432][134294] Updated weights for policy 0, policy_version 76934 (0.0014) [2025-01-04 02:56:18,967][134211] Fps is (10 sec: 18432.4, 60 sec: 16110.9, 300 sec: 15564.8). Total num frames: 315129856. Throughput: 0: 3879.0. Samples: 67941322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:18,968][134211] Avg episode reward: [(0, '7.668')] [2025-01-04 02:56:20,350][134294] Updated weights for policy 0, policy_version 76944 (0.0014) [2025-01-04 02:56:23,188][134294] Updated weights for policy 0, policy_version 76954 (0.0024) [2025-01-04 02:56:23,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15701.3, 300 sec: 15606.4). Total num frames: 315211776. Throughput: 0: 4116.6. Samples: 67971184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:23,969][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 02:56:26,613][134294] Updated weights for policy 0, policy_version 76964 (0.0029) [2025-01-04 02:56:28,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15496.5, 300 sec: 15467.6). Total num frames: 315273216. Throughput: 0: 3906.9. Samples: 67989396. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:28,968][134211] Avg episode reward: [(0, '7.300')] [2025-01-04 02:56:29,782][134294] Updated weights for policy 0, policy_version 76974 (0.0029) [2025-01-04 02:56:31,888][134294] Updated weights for policy 0, policy_version 76984 (0.0016) [2025-01-04 02:56:33,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15769.6, 300 sec: 15384.3). Total num frames: 315355136. Throughput: 0: 3799.1. Samples: 68000862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:33,968][134211] Avg episode reward: [(0, '8.116')] [2025-01-04 02:56:34,701][134294] Updated weights for policy 0, policy_version 76994 (0.0024) [2025-01-04 02:56:37,887][134294] Updated weights for policy 0, policy_version 77004 (0.0027) [2025-01-04 02:56:38,968][134211] Fps is (10 sec: 15155.6, 60 sec: 15906.2, 300 sec: 15328.8). Total num frames: 315424768. Throughput: 0: 3714.0. Samples: 68023024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:38,968][134211] Avg episode reward: [(0, '7.078')] [2025-01-04 02:56:40,051][134294] Updated weights for policy 0, policy_version 77014 (0.0015) [2025-01-04 02:56:41,926][134294] Updated weights for policy 0, policy_version 77024 (0.0013) [2025-01-04 02:56:43,938][134294] Updated weights for policy 0, policy_version 77034 (0.0013) [2025-01-04 02:56:43,968][134211] Fps is (10 sec: 17612.9, 60 sec: 15906.1, 300 sec: 15481.5). Total num frames: 315531264. Throughput: 0: 3949.9. Samples: 68052146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:43,968][134211] Avg episode reward: [(0, '7.726')] [2025-01-04 02:56:45,975][134294] Updated weights for policy 0, policy_version 77044 (0.0013) [2025-01-04 02:56:47,925][134294] Updated weights for policy 0, policy_version 77054 (0.0013) [2025-01-04 02:56:48,968][134211] Fps is (10 sec: 20889.6, 60 sec: 15837.9, 300 sec: 15634.2). Total num frames: 315633664. Throughput: 0: 4063.4. Samples: 68067336. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:48,968][134211] Avg episode reward: [(0, '8.203')] [2025-01-04 02:56:49,977][134294] Updated weights for policy 0, policy_version 77064 (0.0016) [2025-01-04 02:56:53,123][134294] Updated weights for policy 0, policy_version 77074 (0.0027) [2025-01-04 02:56:53,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15906.1, 300 sec: 15675.9). Total num frames: 315703296. Throughput: 0: 4213.6. Samples: 68094138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:53,968][134211] Avg episode reward: [(0, '7.404')] [2025-01-04 02:56:56,341][134294] Updated weights for policy 0, policy_version 77084 (0.0027) [2025-01-04 02:56:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15906.1, 300 sec: 15578.7). Total num frames: 315768832. Throughput: 0: 4184.5. Samples: 68113298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:56:58,968][134211] Avg episode reward: [(0, '7.120')] [2025-01-04 02:56:59,390][134294] Updated weights for policy 0, policy_version 77094 (0.0026) [2025-01-04 02:57:02,619][134294] Updated weights for policy 0, policy_version 77104 (0.0024) [2025-01-04 02:57:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15837.9, 300 sec: 15495.4). Total num frames: 315830272. Throughput: 0: 4035.5. Samples: 68122920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:03,968][134211] Avg episode reward: [(0, '6.663')] [2025-01-04 02:57:05,949][134294] Updated weights for policy 0, policy_version 77114 (0.0027) [2025-01-04 02:57:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15837.9, 300 sec: 15509.3). Total num frames: 315895808. Throughput: 0: 3787.1. Samples: 68141604. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:08,968][134211] Avg episode reward: [(0, '8.023')] [2025-01-04 02:57:09,312][134294] Updated weights for policy 0, policy_version 77124 (0.0026) [2025-01-04 02:57:12,211][134294] Updated weights for policy 0, policy_version 77134 (0.0018) [2025-01-04 02:57:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15837.8, 300 sec: 15550.9). Total num frames: 315973632. Throughput: 0: 3856.9. Samples: 68162954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:13,968][134211] Avg episode reward: [(0, '7.049')] [2025-01-04 02:57:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000077142_315973632.pth... [2025-01-04 02:57:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000076238_312270848.pth [2025-01-04 02:57:14,414][134294] Updated weights for policy 0, policy_version 77144 (0.0014) [2025-01-04 02:57:16,452][134294] Updated weights for policy 0, policy_version 77154 (0.0016) [2025-01-04 02:57:18,318][134294] Updated weights for policy 0, policy_version 77164 (0.0014) [2025-01-04 02:57:18,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15769.6, 300 sec: 15578.7). Total num frames: 316076032. Throughput: 0: 3929.6. Samples: 68177692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:18,968][134211] Avg episode reward: [(0, '6.681')] [2025-01-04 02:57:20,243][134294] Updated weights for policy 0, policy_version 77174 (0.0013) [2025-01-04 02:57:22,455][134294] Updated weights for policy 0, policy_version 77184 (0.0018) [2025-01-04 02:57:23,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15837.9, 300 sec: 15550.9). Total num frames: 316162048. Throughput: 0: 4122.8. Samples: 68208552. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:23,969][134211] Avg episode reward: [(0, '8.354')] [2025-01-04 02:57:25,857][134294] Updated weights for policy 0, policy_version 77194 (0.0031) [2025-01-04 02:57:28,895][134294] Updated weights for policy 0, policy_version 77204 (0.0025) [2025-01-04 02:57:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15906.2, 300 sec: 15550.9). Total num frames: 316227584. Throughput: 0: 3903.8. Samples: 68227816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:28,968][134211] Avg episode reward: [(0, '7.075')] [2025-01-04 02:57:32,015][134294] Updated weights for policy 0, policy_version 77214 (0.0026) [2025-01-04 02:57:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.0, 300 sec: 15564.8). Total num frames: 316293120. Throughput: 0: 3780.7. Samples: 68237470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:33,968][134211] Avg episode reward: [(0, '6.929')] [2025-01-04 02:57:35,049][134294] Updated weights for policy 0, policy_version 77224 (0.0024) [2025-01-04 02:57:38,172][134294] Updated weights for policy 0, policy_version 77234 (0.0028) [2025-01-04 02:57:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15564.8, 300 sec: 15481.5). Total num frames: 316358656. Throughput: 0: 3636.3. Samples: 68257772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:38,968][134211] Avg episode reward: [(0, '7.284')] [2025-01-04 02:57:41,219][134294] Updated weights for policy 0, policy_version 77244 (0.0025) [2025-01-04 02:57:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 15384.3). Total num frames: 316424192. Throughput: 0: 3651.9. Samples: 68277632. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 02:57:43,968][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 02:57:44,415][134294] Updated weights for policy 0, policy_version 77254 (0.0026) [2025-01-04 02:57:47,263][134294] Updated weights for policy 0, policy_version 77264 (0.0020) [2025-01-04 02:57:48,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14540.8, 300 sec: 15453.7). Total num frames: 316506112. Throughput: 0: 3642.8. Samples: 68286846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:57:48,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 02:57:49,244][134294] Updated weights for policy 0, policy_version 77274 (0.0014) [2025-01-04 02:57:51,135][134294] Updated weights for policy 0, policy_version 77284 (0.0014) [2025-01-04 02:57:53,057][134294] Updated weights for policy 0, policy_version 77294 (0.0013) [2025-01-04 02:57:53,967][134211] Fps is (10 sec: 18842.1, 60 sec: 15155.3, 300 sec: 15550.9). Total num frames: 316612608. Throughput: 0: 3908.6. Samples: 68317490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:57:53,968][134211] Avg episode reward: [(0, '7.142')] [2025-01-04 02:57:55,014][134294] Updated weights for policy 0, policy_version 77304 (0.0013) [2025-01-04 02:57:56,920][134294] Updated weights for policy 0, policy_version 77314 (0.0014) [2025-01-04 02:57:58,968][134211] Fps is (10 sec: 20889.3, 60 sec: 15769.6, 300 sec: 15662.0). Total num frames: 316715008. Throughput: 0: 4139.8. Samples: 68349244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:57:58,968][134211] Avg episode reward: [(0, '7.205')] [2025-01-04 02:57:59,162][134294] Updated weights for policy 0, policy_version 77324 (0.0018) [2025-01-04 02:58:02,654][134294] Updated weights for policy 0, policy_version 77334 (0.0032) [2025-01-04 02:58:03,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15701.3, 300 sec: 15606.4). Total num frames: 316772352. Throughput: 0: 4027.5. Samples: 68358930. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:03,969][134211] Avg episode reward: [(0, '6.868')] [2025-01-04 02:58:05,893][134294] Updated weights for policy 0, policy_version 77344 (0.0027) [2025-01-04 02:58:08,963][134294] Updated weights for policy 0, policy_version 77354 (0.0026) [2025-01-04 02:58:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15769.6, 300 sec: 15495.4). Total num frames: 316841984. Throughput: 0: 3760.2. Samples: 68377760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:08,968][134211] Avg episode reward: [(0, '7.665')] [2025-01-04 02:58:12,057][134294] Updated weights for policy 0, policy_version 77364 (0.0027) [2025-01-04 02:58:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15496.5, 300 sec: 15412.1). Total num frames: 316903424. Throughput: 0: 3768.9. Samples: 68397416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:13,969][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 02:58:15,147][134294] Updated weights for policy 0, policy_version 77374 (0.0025) [2025-01-04 02:58:18,333][134294] Updated weights for policy 0, policy_version 77384 (0.0025) [2025-01-04 02:58:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14950.3, 300 sec: 15425.9). Total num frames: 316973056. Throughput: 0: 3777.4. Samples: 68407454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:18,969][134211] Avg episode reward: [(0, '7.576')] [2025-01-04 02:58:21,402][134294] Updated weights for policy 0, policy_version 77394 (0.0025) [2025-01-04 02:58:23,323][134294] Updated weights for policy 0, policy_version 77404 (0.0014) [2025-01-04 02:58:23,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14950.5, 300 sec: 15495.4). Total num frames: 317059072. Throughput: 0: 3796.7. Samples: 68428622. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:23,968][134211] Avg episode reward: [(0, '7.806')] [2025-01-04 02:58:25,241][134294] Updated weights for policy 0, policy_version 77414 (0.0013) [2025-01-04 02:58:27,110][134294] Updated weights for policy 0, policy_version 77424 (0.0014) [2025-01-04 02:58:28,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15633.0, 300 sec: 15634.2). Total num frames: 317165568. Throughput: 0: 4076.0. Samples: 68461052. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:28,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 02:58:29,081][134294] Updated weights for policy 0, policy_version 77434 (0.0012) [2025-01-04 02:58:31,157][134294] Updated weights for policy 0, policy_version 77444 (0.0017) [2025-01-04 02:58:33,968][134211] Fps is (10 sec: 18841.4, 60 sec: 15906.2, 300 sec: 15606.4). Total num frames: 317247488. Throughput: 0: 4208.3. Samples: 68476222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 02:58:33,968][134211] Avg episode reward: [(0, '6.993')] [2025-01-04 02:58:34,095][134294] Updated weights for policy 0, policy_version 77454 (0.0026) [2025-01-04 02:58:37,358][134294] Updated weights for policy 0, policy_version 77464 (0.0023) [2025-01-04 02:58:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15837.9, 300 sec: 15439.8). Total num frames: 317308928. Throughput: 0: 3959.7. Samples: 68495676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:58:38,968][134211] Avg episode reward: [(0, '7.028')] [2025-01-04 02:58:40,557][134294] Updated weights for policy 0, policy_version 77474 (0.0024) [2025-01-04 02:58:43,752][134294] Updated weights for policy 0, policy_version 77484 (0.0023) [2025-01-04 02:58:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15837.9, 300 sec: 15426.0). Total num frames: 317374464. Throughput: 0: 3686.3. Samples: 68515128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:58:43,968][134211] Avg episode reward: [(0, '7.332')] [2025-01-04 02:58:46,917][134294] Updated weights for policy 0, policy_version 77494 (0.0027) [2025-01-04 02:58:48,967][134211] Fps is (10 sec: 13926.6, 60 sec: 15701.3, 300 sec: 15467.6). Total num frames: 317448192. Throughput: 0: 3684.5. Samples: 68524732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:58:48,968][134211] Avg episode reward: [(0, '7.779')] [2025-01-04 02:58:49,361][134294] Updated weights for policy 0, policy_version 77504 (0.0016) [2025-01-04 02:58:51,218][134294] Updated weights for policy 0, policy_version 77514 (0.0016) [2025-01-04 02:58:53,070][134294] Updated weights for policy 0, policy_version 77524 (0.0012) [2025-01-04 02:58:53,967][134211] Fps is (10 sec: 18022.9, 60 sec: 15701.3, 300 sec: 15620.4). Total num frames: 317554688. Throughput: 0: 3889.0. Samples: 68552766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:58:53,968][134211] Avg episode reward: [(0, '6.792')] [2025-01-04 02:58:55,422][134294] Updated weights for policy 0, policy_version 77534 (0.0020) [2025-01-04 02:58:58,662][134294] Updated weights for policy 0, policy_version 77544 (0.0028) [2025-01-04 02:58:58,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15155.2, 300 sec: 15592.6). Total num frames: 317624320. Throughput: 0: 3986.3. Samples: 68576800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:58:58,968][134211] Avg episode reward: [(0, '7.394')] [2025-01-04 02:59:01,777][134294] Updated weights for policy 0, policy_version 77554 (0.0028) [2025-01-04 02:59:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15223.5, 300 sec: 15467.6). Total num frames: 317685760. Throughput: 0: 3978.7. Samples: 68586494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:59:03,968][134211] Avg episode reward: [(0, '7.648')] [2025-01-04 02:59:05,129][134294] Updated weights for policy 0, policy_version 77564 (0.0029) [2025-01-04 02:59:08,164][134294] Updated weights for policy 0, policy_version 77574 (0.0025) [2025-01-04 02:59:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15223.5, 300 sec: 15398.2). Total num frames: 317755392. Throughput: 0: 3935.1. Samples: 68605700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:59:08,968][134211] Avg episode reward: [(0, '7.472')] [2025-01-04 02:59:10,181][134294] Updated weights for policy 0, policy_version 77584 (0.0016) [2025-01-04 02:59:12,302][134294] Updated weights for policy 0, policy_version 77594 (0.0012) [2025-01-04 02:59:13,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15837.9, 300 sec: 15550.9). Total num frames: 317853696. Throughput: 0: 3836.3. Samples: 68633684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:59:13,968][134211] Avg episode reward: [(0, '7.757')] [2025-01-04 02:59:14,011][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000077602_317857792.pth... [2025-01-04 02:59:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000076683_314093568.pth [2025-01-04 02:59:14,639][134294] Updated weights for policy 0, policy_version 77604 (0.0018) [2025-01-04 02:59:18,155][134294] Updated weights for policy 0, policy_version 77614 (0.0029) [2025-01-04 02:59:18,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15701.4, 300 sec: 15564.9). Total num frames: 317915136. Throughput: 0: 3732.4. Samples: 68644178. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 02:59:18,968][134211] Avg episode reward: [(0, '6.680')] [2025-01-04 02:59:21,044][134294] Updated weights for policy 0, policy_version 77624 (0.0023) [2025-01-04 02:59:23,147][134294] Updated weights for policy 0, policy_version 77634 (0.0018) [2025-01-04 02:59:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15633.0, 300 sec: 15606.4). Total num frames: 317997056. Throughput: 0: 3792.2. Samples: 68666326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:23,968][134211] Avg episode reward: [(0, '7.530')] [2025-01-04 02:59:26,430][134294] Updated weights for policy 0, policy_version 77644 (0.0028) [2025-01-04 02:59:28,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14950.5, 300 sec: 15550.9). Total num frames: 318062592. Throughput: 0: 3799.7. Samples: 68686112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:28,968][134211] Avg episode reward: [(0, '7.470')] [2025-01-04 02:59:29,282][134294] Updated weights for policy 0, policy_version 77654 (0.0023) [2025-01-04 02:59:31,192][134294] Updated weights for policy 0, policy_version 77664 (0.0015) [2025-01-04 02:59:33,224][134294] Updated weights for policy 0, policy_version 77674 (0.0016) [2025-01-04 02:59:33,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15223.4, 300 sec: 15662.0). Total num frames: 318160896. Throughput: 0: 3916.9. Samples: 68700994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:33,968][134211] Avg episode reward: [(0, '7.334')] [2025-01-04 02:59:36,288][134294] Updated weights for policy 0, policy_version 77684 (0.0026) [2025-01-04 02:59:38,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15291.7, 300 sec: 15648.1). Total num frames: 318226432. Throughput: 0: 3813.7. Samples: 68724384. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:38,968][134211] Avg episode reward: [(0, '6.754')] [2025-01-04 02:59:39,534][134294] Updated weights for policy 0, policy_version 77694 (0.0027) [2025-01-04 02:59:42,345][134294] Updated weights for policy 0, policy_version 77704 (0.0022) [2025-01-04 02:59:43,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15496.6, 300 sec: 15592.6). Total num frames: 318304256. Throughput: 0: 3769.2. Samples: 68746414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:43,968][134211] Avg episode reward: [(0, '7.005')] [2025-01-04 02:59:44,429][134294] Updated weights for policy 0, policy_version 77714 (0.0016) [2025-01-04 02:59:47,375][134294] Updated weights for policy 0, policy_version 77724 (0.0025) [2025-01-04 02:59:48,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15428.2, 300 sec: 15592.6). Total num frames: 318373888. Throughput: 0: 3823.9. Samples: 68758570. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:48,968][134211] Avg episode reward: [(0, '7.780')] [2025-01-04 02:59:50,397][134294] Updated weights for policy 0, policy_version 77734 (0.0024) [2025-01-04 02:59:52,288][134294] Updated weights for policy 0, policy_version 77744 (0.0014) [2025-01-04 02:59:53,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15291.7, 300 sec: 15634.2). Total num frames: 318472192. Throughput: 0: 3929.0. Samples: 68782504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:53,968][134211] Avg episode reward: [(0, '7.971')] [2025-01-04 02:59:54,212][134294] Updated weights for policy 0, policy_version 77754 (0.0014) [2025-01-04 02:59:57,038][134294] Updated weights for policy 0, policy_version 77764 (0.0024) [2025-01-04 02:59:58,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15360.0, 300 sec: 15509.3). Total num frames: 318545920. Throughput: 0: 3855.2. Samples: 68807168. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 02:59:58,968][134211] Avg episode reward: [(0, '7.663')] [2025-01-04 03:00:00,167][134294] Updated weights for policy 0, policy_version 77774 (0.0025) [2025-01-04 03:00:03,081][134294] Updated weights for policy 0, policy_version 77784 (0.0020) [2025-01-04 03:00:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15564.9, 300 sec: 15537.1). Total num frames: 318619648. Throughput: 0: 3839.7. Samples: 68816964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:00:03,968][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 03:00:05,133][134294] Updated weights for policy 0, policy_version 77794 (0.0015) [2025-01-04 03:00:06,970][134294] Updated weights for policy 0, policy_version 77804 (0.0014) [2025-01-04 03:00:08,968][134211] Fps is (10 sec: 17612.9, 60 sec: 16110.9, 300 sec: 15689.8). Total num frames: 318722048. Throughput: 0: 3982.8. Samples: 68845552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:00:08,968][134211] Avg episode reward: [(0, '6.948')] [2025-01-04 03:00:08,974][134294] Updated weights for policy 0, policy_version 77814 (0.0014) [2025-01-04 03:00:10,973][134294] Updated weights for policy 0, policy_version 77824 (0.0015) [2025-01-04 03:00:13,735][134294] Updated weights for policy 0, policy_version 77834 (0.0024) [2025-01-04 03:00:13,968][134211] Fps is (10 sec: 18840.8, 60 sec: 15906.1, 300 sec: 15745.3). Total num frames: 318808064. Throughput: 0: 4169.0. Samples: 68873720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:13,969][134211] Avg episode reward: [(0, '7.315')] [2025-01-04 03:00:17,611][134294] Updated weights for policy 0, policy_version 77844 (0.0026) [2025-01-04 03:00:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15769.6, 300 sec: 15564.8). Total num frames: 318861312. Throughput: 0: 4013.5. Samples: 68881602. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:18,968][134211] Avg episode reward: [(0, '6.766')] [2025-01-04 03:00:21,169][134294] Updated weights for policy 0, policy_version 77854 (0.0025) [2025-01-04 03:00:23,968][134211] Fps is (10 sec: 11059.4, 60 sec: 15360.0, 300 sec: 15509.3). Total num frames: 318918656. Throughput: 0: 3869.8. Samples: 68898526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:23,968][134211] Avg episode reward: [(0, '7.208')] [2025-01-04 03:00:24,863][134294] Updated weights for policy 0, policy_version 77864 (0.0028) [2025-01-04 03:00:27,459][134294] Updated weights for policy 0, policy_version 77874 (0.0019) [2025-01-04 03:00:28,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15633.0, 300 sec: 15564.8). Total num frames: 319000576. Throughput: 0: 3851.2. Samples: 68919718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:28,968][134211] Avg episode reward: [(0, '7.455')] [2025-01-04 03:00:29,624][134294] Updated weights for policy 0, policy_version 77884 (0.0014) [2025-01-04 03:00:31,761][134294] Updated weights for policy 0, policy_version 77894 (0.0013) [2025-01-04 03:00:33,915][134294] Updated weights for policy 0, policy_version 77904 (0.0014) [2025-01-04 03:00:33,968][134211] Fps is (10 sec: 17613.2, 60 sec: 15564.9, 300 sec: 15675.9). Total num frames: 319094784. Throughput: 0: 3898.6. Samples: 68934008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:33,968][134211] Avg episode reward: [(0, '7.955')] [2025-01-04 03:00:36,042][134294] Updated weights for policy 0, policy_version 77914 (0.0012) [2025-01-04 03:00:38,485][134294] Updated weights for policy 0, policy_version 77924 (0.0016) [2025-01-04 03:00:38,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15906.2, 300 sec: 15606.4). Total num frames: 319180800. Throughput: 0: 3999.6. Samples: 68962486. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:38,968][134211] Avg episode reward: [(0, '7.376')] [2025-01-04 03:00:42,287][134294] Updated weights for policy 0, policy_version 77934 (0.0029) [2025-01-04 03:00:43,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15496.5, 300 sec: 15425.9). Total num frames: 319234048. Throughput: 0: 3838.7. Samples: 68979912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:43,968][134211] Avg episode reward: [(0, '7.498')] [2025-01-04 03:00:45,939][134294] Updated weights for policy 0, policy_version 77944 (0.0030) [2025-01-04 03:00:48,968][134211] Fps is (10 sec: 11059.1, 60 sec: 15291.7, 300 sec: 15398.2). Total num frames: 319291392. Throughput: 0: 3814.2. Samples: 68988604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:48,968][134211] Avg episode reward: [(0, '6.895')] [2025-01-04 03:00:49,575][134294] Updated weights for policy 0, policy_version 77954 (0.0028) [2025-01-04 03:00:52,803][134294] Updated weights for policy 0, policy_version 77964 (0.0023) [2025-01-04 03:00:53,967][134211] Fps is (10 sec: 12698.0, 60 sec: 14813.9, 300 sec: 15412.1). Total num frames: 319361024. Throughput: 0: 3556.5. Samples: 69005596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:00:53,968][134211] Avg episode reward: [(0, '7.723')] [2025-01-04 03:00:54,985][134294] Updated weights for policy 0, policy_version 77974 (0.0014) [2025-01-04 03:00:57,173][134294] Updated weights for policy 0, policy_version 77984 (0.0014) [2025-01-04 03:00:58,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15155.2, 300 sec: 15509.3). Total num frames: 319455232. Throughput: 0: 3549.2. Samples: 69033432. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:00:58,968][134211] Avg episode reward: [(0, '7.222')] [2025-01-04 03:00:59,359][134294] Updated weights for policy 0, policy_version 77994 (0.0015) [2025-01-04 03:01:01,964][134294] Updated weights for policy 0, policy_version 78004 (0.0020) [2025-01-04 03:01:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15018.6, 300 sec: 15509.3). Total num frames: 319520768. Throughput: 0: 3675.7. Samples: 69047008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:03,969][134211] Avg episode reward: [(0, '7.795')] [2025-01-04 03:01:05,933][134294] Updated weights for policy 0, policy_version 78014 (0.0030) [2025-01-04 03:01:08,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14267.7, 300 sec: 15439.8). Total num frames: 319578112. Throughput: 0: 3661.7. Samples: 69063304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:08,968][134211] Avg episode reward: [(0, '6.949')] [2025-01-04 03:01:09,456][134294] Updated weights for policy 0, policy_version 78024 (0.0025) [2025-01-04 03:01:12,028][134294] Updated weights for policy 0, policy_version 78034 (0.0017) [2025-01-04 03:01:13,968][134211] Fps is (10 sec: 14336.4, 60 sec: 14267.8, 300 sec: 15370.4). Total num frames: 319664128. Throughput: 0: 3678.6. Samples: 69085254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:13,968][134211] Avg episode reward: [(0, '7.167')] [2025-01-04 03:01:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078043_319664128.pth... 
[2025-01-04 03:01:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000077142_315973632.pth [2025-01-04 03:01:14,190][134294] Updated weights for policy 0, policy_version 78044 (0.0014) [2025-01-04 03:01:16,251][134294] Updated weights for policy 0, policy_version 78054 (0.0016) [2025-01-04 03:01:18,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14677.4, 300 sec: 15356.5). Total num frames: 319741952. Throughput: 0: 3690.7. Samples: 69100088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:18,968][134211] Avg episode reward: [(0, '7.172')] [2025-01-04 03:01:19,614][134294] Updated weights for policy 0, policy_version 78064 (0.0025) [2025-01-04 03:01:23,036][134294] Updated weights for policy 0, policy_version 78074 (0.0028) [2025-01-04 03:01:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.4, 300 sec: 15342.7). Total num frames: 319799296. Throughput: 0: 3466.2. Samples: 69118464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:23,968][134211] Avg episode reward: [(0, '7.544')] [2025-01-04 03:01:26,430][134294] Updated weights for policy 0, policy_version 78084 (0.0028) [2025-01-04 03:01:28,970][134211] Fps is (10 sec: 11875.9, 60 sec: 14335.5, 300 sec: 15273.1). Total num frames: 319860736. Throughput: 0: 3475.2. Samples: 69136304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:28,971][134211] Avg episode reward: [(0, '7.187')] [2025-01-04 03:01:29,598][134294] Updated weights for policy 0, policy_version 78094 (0.0024) [2025-01-04 03:01:31,577][134294] Updated weights for policy 0, policy_version 78104 (0.0013) [2025-01-04 03:01:33,631][134294] Updated weights for policy 0, policy_version 78114 (0.0013) [2025-01-04 03:01:33,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14404.2, 300 sec: 15370.4). Total num frames: 319959040. Throughput: 0: 3576.0. Samples: 69149522. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:33,968][134211] Avg episode reward: [(0, '7.213')] [2025-01-04 03:01:35,749][134294] Updated weights for policy 0, policy_version 78124 (0.0012) [2025-01-04 03:01:37,806][134294] Updated weights for policy 0, policy_version 78134 (0.0014) [2025-01-04 03:01:38,968][134211] Fps is (10 sec: 18845.3, 60 sec: 14472.5, 300 sec: 15314.9). Total num frames: 320049152. Throughput: 0: 3865.4. Samples: 69179542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:01:38,968][134211] Avg episode reward: [(0, '7.090')] [2025-01-04 03:01:41,269][134294] Updated weights for policy 0, policy_version 78144 (0.0027) [2025-01-04 03:01:43,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14540.8, 300 sec: 15162.1). Total num frames: 320106496. Throughput: 0: 3657.6. Samples: 69198024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:01:43,968][134211] Avg episode reward: [(0, '7.216')] [2025-01-04 03:01:44,892][134294] Updated weights for policy 0, policy_version 78154 (0.0029) [2025-01-04 03:01:48,278][134294] Updated weights for policy 0, policy_version 78164 (0.0027) [2025-01-04 03:01:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14609.1, 300 sec: 15134.4). Total num frames: 320167936. Throughput: 0: 3549.7. Samples: 69206746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:01:48,969][134211] Avg episode reward: [(0, '8.209')] [2025-01-04 03:01:51,379][134294] Updated weights for policy 0, policy_version 78174 (0.0025) [2025-01-04 03:01:53,971][134211] Fps is (10 sec: 12284.1, 60 sec: 14471.7, 300 sec: 15120.3). Total num frames: 320229376. Throughput: 0: 3608.9. Samples: 69225714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:01:53,972][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 03:01:54,810][134294] Updated weights for policy 0, policy_version 78184 (0.0024) [2025-01-04 03:01:57,707][134294] Updated weights for policy 0, policy_version 78194 (0.0023) [2025-01-04 03:01:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 15148.3). Total num frames: 320299008. Throughput: 0: 3557.6. Samples: 69245348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:01:58,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 03:02:00,684][134294] Updated weights for policy 0, policy_version 78204 (0.0023) [2025-01-04 03:02:02,565][134294] Updated weights for policy 0, policy_version 78214 (0.0014) [2025-01-04 03:02:03,968][134211] Fps is (10 sec: 16389.3, 60 sec: 14540.9, 300 sec: 15245.5). Total num frames: 320393216. Throughput: 0: 3484.3. Samples: 69256882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:03,968][134211] Avg episode reward: [(0, '7.042')] [2025-01-04 03:02:04,474][134294] Updated weights for policy 0, policy_version 78224 (0.0015) [2025-01-04 03:02:06,357][134294] Updated weights for policy 0, policy_version 78234 (0.0014) [2025-01-04 03:02:08,299][134294] Updated weights for policy 0, policy_version 78244 (0.0014) [2025-01-04 03:02:08,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15360.1, 300 sec: 15342.6). Total num frames: 320499712. Throughput: 0: 3799.6. Samples: 69289446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:08,968][134211] Avg episode reward: [(0, '7.955')] [2025-01-04 03:02:10,299][134294] Updated weights for policy 0, policy_version 78254 (0.0014) [2025-01-04 03:02:13,969][134211] Fps is (10 sec: 17200.7, 60 sec: 15018.3, 300 sec: 15217.6). Total num frames: 320565248. Throughput: 0: 3932.7. Samples: 69313274. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:13,970][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 03:02:14,001][134264] Saving new best policy, reward=8.739! [2025-01-04 03:02:14,003][134294] Updated weights for policy 0, policy_version 78264 (0.0029) [2025-01-04 03:02:17,696][134294] Updated weights for policy 0, policy_version 78274 (0.0030) [2025-01-04 03:02:18,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14677.3, 300 sec: 15120.5). Total num frames: 320622592. Throughput: 0: 3813.2. Samples: 69321114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:18,969][134211] Avg episode reward: [(0, '8.436')] [2025-01-04 03:02:21,128][134294] Updated weights for policy 0, policy_version 78284 (0.0030) [2025-01-04 03:02:23,968][134211] Fps is (10 sec: 11880.0, 60 sec: 14745.6, 300 sec: 15106.6). Total num frames: 320684032. Throughput: 0: 3546.5. Samples: 69339136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:23,968][134211] Avg episode reward: [(0, '7.708')] [2025-01-04 03:02:24,263][134294] Updated weights for policy 0, policy_version 78294 (0.0024) [2025-01-04 03:02:26,157][134294] Updated weights for policy 0, policy_version 78304 (0.0012) [2025-01-04 03:02:28,057][134294] Updated weights for policy 0, policy_version 78314 (0.0014) [2025-01-04 03:02:28,968][134211] Fps is (10 sec: 17203.6, 60 sec: 15565.4, 300 sec: 15259.3). Total num frames: 320794624. Throughput: 0: 3765.3. Samples: 69367464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:28,968][134211] Avg episode reward: [(0, '8.075')] [2025-01-04 03:02:29,933][134294] Updated weights for policy 0, policy_version 78324 (0.0013) [2025-01-04 03:02:31,903][134294] Updated weights for policy 0, policy_version 78334 (0.0014) [2025-01-04 03:02:33,746][134294] Updated weights for policy 0, policy_version 78344 (0.0013) [2025-01-04 03:02:33,967][134211] Fps is (10 sec: 21299.5, 60 sec: 15633.1, 300 sec: 15384.3). Total num frames: 320897024. 
Throughput: 0: 3930.9. Samples: 69383634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:33,968][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 03:02:35,848][134294] Updated weights for policy 0, policy_version 78354 (0.0016) [2025-01-04 03:02:38,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15428.3, 300 sec: 15426.0). Total num frames: 320974848. Throughput: 0: 4148.7. Samples: 69412394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:38,968][134211] Avg episode reward: [(0, '8.173')] [2025-01-04 03:02:38,981][134294] Updated weights for policy 0, policy_version 78364 (0.0028) [2025-01-04 03:02:42,248][134294] Updated weights for policy 0, policy_version 78374 (0.0028) [2025-01-04 03:02:43,968][134211] Fps is (10 sec: 14335.2, 60 sec: 15564.7, 300 sec: 15370.4). Total num frames: 321040384. Throughput: 0: 4126.5. Samples: 69431042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:43,969][134211] Avg episode reward: [(0, '7.849')] [2025-01-04 03:02:45,397][134294] Updated weights for policy 0, policy_version 78384 (0.0025) [2025-01-04 03:02:48,501][134294] Updated weights for policy 0, policy_version 78394 (0.0026) [2025-01-04 03:02:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.1, 300 sec: 15231.6). Total num frames: 321105920. Throughput: 0: 4089.1. Samples: 69440894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:48,968][134211] Avg episode reward: [(0, '8.176')] [2025-01-04 03:02:51,501][134294] Updated weights for policy 0, policy_version 78404 (0.0024) [2025-01-04 03:02:53,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15702.0, 300 sec: 15106.6). Total num frames: 321171456. Throughput: 0: 3812.4. Samples: 69461006. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:53,969][134211] Avg episode reward: [(0, '7.265')] [2025-01-04 03:02:54,639][134294] Updated weights for policy 0, policy_version 78414 (0.0026) [2025-01-04 03:02:57,770][134294] Updated weights for policy 0, policy_version 78424 (0.0027) [2025-01-04 03:02:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15633.1, 300 sec: 15134.4). Total num frames: 321236992. Throughput: 0: 3719.5. Samples: 69480648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:02:58,968][134211] Avg episode reward: [(0, '8.102')] [2025-01-04 03:03:00,788][134294] Updated weights for policy 0, policy_version 78434 (0.0025) [2025-01-04 03:03:03,781][134294] Updated weights for policy 0, policy_version 78444 (0.0023) [2025-01-04 03:03:03,968][134211] Fps is (10 sec: 13517.7, 60 sec: 15223.4, 300 sec: 15134.4). Total num frames: 321306624. Throughput: 0: 3773.3. Samples: 69490912. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:03,968][134211] Avg episode reward: [(0, '7.030')] [2025-01-04 03:03:06,717][134294] Updated weights for policy 0, policy_version 78454 (0.0024) [2025-01-04 03:03:08,710][134294] Updated weights for policy 0, policy_version 78464 (0.0013) [2025-01-04 03:03:08,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14882.1, 300 sec: 15217.7). Total num frames: 321392640. Throughput: 0: 3846.6. Samples: 69512232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:08,968][134211] Avg episode reward: [(0, '6.954')] [2025-01-04 03:03:10,670][134294] Updated weights for policy 0, policy_version 78474 (0.0014) [2025-01-04 03:03:12,516][134294] Updated weights for policy 0, policy_version 78484 (0.0014) [2025-01-04 03:03:13,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15565.2, 300 sec: 15342.7). Total num frames: 321499136. Throughput: 0: 3930.0. Samples: 69544314. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:13,968][134211] Avg episode reward: [(0, '7.528')] [2025-01-04 03:03:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078492_321503232.pth... [2025-01-04 03:03:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000077602_317857792.pth [2025-01-04 03:03:14,461][134294] Updated weights for policy 0, policy_version 78494 (0.0013) [2025-01-04 03:03:17,461][134294] Updated weights for policy 0, policy_version 78504 (0.0026) [2025-01-04 03:03:18,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15769.6, 300 sec: 15287.1). Total num frames: 321568768. Throughput: 0: 3860.2. Samples: 69557344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:18,968][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 03:03:20,640][134294] Updated weights for policy 0, policy_version 78514 (0.0030) [2025-01-04 03:03:23,753][134294] Updated weights for policy 0, policy_version 78524 (0.0026) [2025-01-04 03:03:23,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15837.8, 300 sec: 15148.3). Total num frames: 321634304. Throughput: 0: 3651.3. Samples: 69576702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:23,968][134211] Avg episode reward: [(0, '7.226')] [2025-01-04 03:03:26,738][134294] Updated weights for policy 0, policy_version 78534 (0.0024) [2025-01-04 03:03:28,864][134294] Updated weights for policy 0, policy_version 78544 (0.0015) [2025-01-04 03:03:28,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15360.0, 300 sec: 15148.3). Total num frames: 321716224. Throughput: 0: 3720.3. Samples: 69598454. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:28,968][134211] Avg episode reward: [(0, '7.517')] [2025-01-04 03:03:30,900][134294] Updated weights for policy 0, policy_version 78554 (0.0014) [2025-01-04 03:03:33,455][134294] Updated weights for policy 0, policy_version 78564 (0.0023) [2025-01-04 03:03:33,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15086.9, 300 sec: 15231.6). Total num frames: 321802240. Throughput: 0: 3846.3. Samples: 69613976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:33,970][134211] Avg episode reward: [(0, '8.027')] [2025-01-04 03:03:36,660][134294] Updated weights for policy 0, policy_version 78574 (0.0028) [2025-01-04 03:03:38,968][134211] Fps is (10 sec: 15154.1, 60 sec: 14882.0, 300 sec: 15231.5). Total num frames: 321867776. Throughput: 0: 3863.2. Samples: 69634850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:38,969][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 03:03:39,730][134294] Updated weights for policy 0, policy_version 78584 (0.0024) [2025-01-04 03:03:42,399][134294] Updated weights for policy 0, policy_version 78594 (0.0022) [2025-01-04 03:03:43,968][134211] Fps is (10 sec: 15155.6, 60 sec: 15223.6, 300 sec: 15273.2). Total num frames: 321953792. Throughput: 0: 3936.1. Samples: 69657770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:43,968][134211] Avg episode reward: [(0, '7.359')] [2025-01-04 03:03:44,249][134294] Updated weights for policy 0, policy_version 78604 (0.0012) [2025-01-04 03:03:46,191][134294] Updated weights for policy 0, policy_version 78614 (0.0014) [2025-01-04 03:03:48,089][134294] Updated weights for policy 0, policy_version 78624 (0.0015) [2025-01-04 03:03:48,967][134211] Fps is (10 sec: 19252.8, 60 sec: 15906.2, 300 sec: 15273.2). Total num frames: 322060288. Throughput: 0: 4067.2. Samples: 69673936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:48,968][134211] Avg episode reward: [(0, '7.632')] [2025-01-04 03:03:49,967][134294] Updated weights for policy 0, policy_version 78634 (0.0013) [2025-01-04 03:03:52,172][134294] Updated weights for policy 0, policy_version 78644 (0.0020) [2025-01-04 03:03:53,968][134211] Fps is (10 sec: 19251.0, 60 sec: 16247.7, 300 sec: 15328.8). Total num frames: 322146304. Throughput: 0: 4274.3. Samples: 69704576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:53,968][134211] Avg episode reward: [(0, '7.937')] [2025-01-04 03:03:55,452][134294] Updated weights for policy 0, policy_version 78654 (0.0030) [2025-01-04 03:03:58,589][134294] Updated weights for policy 0, policy_version 78664 (0.0026) [2025-01-04 03:03:58,968][134211] Fps is (10 sec: 15154.0, 60 sec: 16247.3, 300 sec: 15342.6). Total num frames: 322211840. Throughput: 0: 3989.9. Samples: 69723862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:03:58,969][134211] Avg episode reward: [(0, '7.244')] [2025-01-04 03:04:01,711][134294] Updated weights for policy 0, policy_version 78674 (0.0027) [2025-01-04 03:04:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 16110.9, 300 sec: 15314.9). Total num frames: 322273280. Throughput: 0: 3915.7. Samples: 69733550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:04:03,968][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 03:04:04,950][134294] Updated weights for policy 0, policy_version 78684 (0.0028) [2025-01-04 03:04:07,951][134294] Updated weights for policy 0, policy_version 78694 (0.0026) [2025-01-04 03:04:08,968][134211] Fps is (10 sec: 13107.8, 60 sec: 15837.8, 300 sec: 15217.7). Total num frames: 322342912. Throughput: 0: 3923.2. Samples: 69753248. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:08,968][134211] Avg episode reward: [(0, '7.660')] [2025-01-04 03:04:11,139][134294] Updated weights for policy 0, policy_version 78704 (0.0029) [2025-01-04 03:04:13,968][134211] Fps is (10 sec: 12696.8, 60 sec: 15018.5, 300 sec: 15203.8). Total num frames: 322400256. Throughput: 0: 3856.0. Samples: 69771976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:13,969][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 03:04:14,698][134294] Updated weights for policy 0, policy_version 78714 (0.0028) [2025-01-04 03:04:16,961][134294] Updated weights for policy 0, policy_version 78724 (0.0013) [2025-01-04 03:04:18,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15360.1, 300 sec: 15231.6). Total num frames: 322490368. Throughput: 0: 3747.2. Samples: 69782598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:18,968][134211] Avg episode reward: [(0, '7.402')] [2025-01-04 03:04:19,017][134294] Updated weights for policy 0, policy_version 78734 (0.0013) [2025-01-04 03:04:21,026][134294] Updated weights for policy 0, policy_version 78744 (0.0014) [2025-01-04 03:04:23,526][134294] Updated weights for policy 0, policy_version 78754 (0.0021) [2025-01-04 03:04:23,968][134211] Fps is (10 sec: 18433.0, 60 sec: 15837.9, 300 sec: 15328.7). Total num frames: 322584576. Throughput: 0: 3933.8. Samples: 69811870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:23,968][134211] Avg episode reward: [(0, '7.871')] [2025-01-04 03:04:25,753][134294] Updated weights for policy 0, policy_version 78764 (0.0016) [2025-01-04 03:04:28,819][134294] Updated weights for policy 0, policy_version 78774 (0.0027) [2025-01-04 03:04:28,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15701.3, 300 sec: 15245.5). Total num frames: 322658304. Throughput: 0: 3952.4. Samples: 69835630. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:28,969][134211] Avg episode reward: [(0, '7.345')] [2025-01-04 03:04:32,215][134294] Updated weights for policy 0, policy_version 78784 (0.0025) [2025-01-04 03:04:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15360.1, 300 sec: 15245.5). Total num frames: 322723840. Throughput: 0: 3796.1. Samples: 69844762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:33,968][134211] Avg episode reward: [(0, '7.761')] [2025-01-04 03:04:34,689][134294] Updated weights for policy 0, policy_version 78794 (0.0016) [2025-01-04 03:04:36,587][134294] Updated weights for policy 0, policy_version 78804 (0.0013) [2025-01-04 03:04:38,429][134294] Updated weights for policy 0, policy_version 78814 (0.0013) [2025-01-04 03:04:38,967][134211] Fps is (10 sec: 17203.7, 60 sec: 16042.9, 300 sec: 15342.7). Total num frames: 322830336. Throughput: 0: 3717.8. Samples: 69871878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:38,968][134211] Avg episode reward: [(0, '7.143')] [2025-01-04 03:04:40,378][134294] Updated weights for policy 0, policy_version 78824 (0.0014) [2025-01-04 03:04:42,200][134294] Updated weights for policy 0, policy_version 78834 (0.0014) [2025-01-04 03:04:43,968][134211] Fps is (10 sec: 21707.8, 60 sec: 16452.1, 300 sec: 15481.5). Total num frames: 322940928. Throughput: 0: 4015.4. Samples: 69904552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:43,969][134211] Avg episode reward: [(0, '8.236')] [2025-01-04 03:04:44,118][134294] Updated weights for policy 0, policy_version 78844 (0.0014) [2025-01-04 03:04:46,519][134294] Updated weights for policy 0, policy_version 78854 (0.0021) [2025-01-04 03:04:48,968][134211] Fps is (10 sec: 18431.5, 60 sec: 15906.1, 300 sec: 15398.2). Total num frames: 323014656. Throughput: 0: 4105.8. Samples: 69918312. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:48,968][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 03:04:49,814][134294] Updated weights for policy 0, policy_version 78864 (0.0028) [2025-01-04 03:04:53,030][134294] Updated weights for policy 0, policy_version 78874 (0.0026) [2025-01-04 03:04:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15496.5, 300 sec: 15356.5). Total num frames: 323076096. Throughput: 0: 4089.2. Samples: 69937260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:04:53,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 03:04:56,250][134294] Updated weights for policy 0, policy_version 78884 (0.0027) [2025-01-04 03:04:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.7, 300 sec: 15328.7). Total num frames: 323141632. Throughput: 0: 4102.5. Samples: 69956586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:04:58,968][134211] Avg episode reward: [(0, '7.220')] [2025-01-04 03:04:59,429][134294] Updated weights for policy 0, policy_version 78894 (0.0026) [2025-01-04 03:05:02,512][134294] Updated weights for policy 0, policy_version 78904 (0.0024) [2025-01-04 03:05:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15564.8, 300 sec: 15203.8). Total num frames: 323207168. Throughput: 0: 4084.5. Samples: 69966400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:03,968][134211] Avg episode reward: [(0, '7.482')] [2025-01-04 03:05:05,570][134294] Updated weights for policy 0, policy_version 78914 (0.0027) [2025-01-04 03:05:08,502][134294] Updated weights for policy 0, policy_version 78924 (0.0024) [2025-01-04 03:05:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15564.8, 300 sec: 15148.3). Total num frames: 323276800. Throughput: 0: 3889.0. Samples: 69986874. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:08,968][134211] Avg episode reward: [(0, '7.726')] [2025-01-04 03:05:11,503][134294] Updated weights for policy 0, policy_version 78934 (0.0027) [2025-01-04 03:05:13,968][134211] Fps is (10 sec: 13925.4, 60 sec: 15769.6, 300 sec: 15203.8). Total num frames: 323346432. Throughput: 0: 3818.2. Samples: 70007450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:13,969][134211] Avg episode reward: [(0, '7.673')] [2025-01-04 03:05:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078942_323346432.pth... [2025-01-04 03:05:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078043_319664128.pth [2025-01-04 03:05:14,385][134294] Updated weights for policy 0, policy_version 78944 (0.0024) [2025-01-04 03:05:16,358][134294] Updated weights for policy 0, policy_version 78954 (0.0013) [2025-01-04 03:05:18,224][134294] Updated weights for policy 0, policy_version 78964 (0.0012) [2025-01-04 03:05:18,968][134211] Fps is (10 sec: 17201.9, 60 sec: 15974.2, 300 sec: 15356.5). Total num frames: 323448832. Throughput: 0: 3919.6. Samples: 70021146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:18,969][134211] Avg episode reward: [(0, '8.041')] [2025-01-04 03:05:20,917][134294] Updated weights for policy 0, policy_version 78974 (0.0024) [2025-01-04 03:05:23,963][134294] Updated weights for policy 0, policy_version 78984 (0.0024) [2025-01-04 03:05:23,968][134211] Fps is (10 sec: 17204.3, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 323518464. Throughput: 0: 3888.5. Samples: 70046860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:23,968][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 03:05:26,970][134294] Updated weights for policy 0, policy_version 78994 (0.0027) [2025-01-04 03:05:28,968][134211] Fps is (10 sec: 13518.0, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 323584000. Throughput: 0: 3607.0. Samples: 70066866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:28,968][134211] Avg episode reward: [(0, '7.824')] [2025-01-04 03:05:29,909][134294] Updated weights for policy 0, policy_version 79004 (0.0020) [2025-01-04 03:05:31,731][134294] Updated weights for policy 0, policy_version 79014 (0.0013) [2025-01-04 03:05:33,629][134294] Updated weights for policy 0, policy_version 79024 (0.0013) [2025-01-04 03:05:33,968][134211] Fps is (10 sec: 16793.9, 60 sec: 16042.7, 300 sec: 15273.2). Total num frames: 323686400. Throughput: 0: 3602.7. Samples: 70080432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:33,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 03:05:35,471][134294] Updated weights for policy 0, policy_version 79034 (0.0014) [2025-01-04 03:05:38,003][134294] Updated weights for policy 0, policy_version 79044 (0.0022) [2025-01-04 03:05:38,970][134211] Fps is (10 sec: 18837.4, 60 sec: 15700.7, 300 sec: 15384.2). Total num frames: 323772416. Throughput: 0: 3868.7. Samples: 70111358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:05:38,971][134211] Avg episode reward: [(0, '7.330')] [2025-01-04 03:05:41,162][134294] Updated weights for policy 0, policy_version 79054 (0.0026) [2025-01-04 03:05:43,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14950.5, 300 sec: 15412.1). Total num frames: 323837952. Throughput: 0: 3858.3. Samples: 70130208. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:05:43,968][134211] Avg episode reward: [(0, '7.630')] [2025-01-04 03:05:44,659][134294] Updated weights for policy 0, policy_version 79064 (0.0027) [2025-01-04 03:05:47,864][134294] Updated weights for policy 0, policy_version 79074 (0.0028) [2025-01-04 03:05:48,968][134211] Fps is (10 sec: 13519.6, 60 sec: 14882.1, 300 sec: 15412.1). Total num frames: 323907584. Throughput: 0: 3850.9. Samples: 70139690. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:05:48,968][134211] Avg episode reward: [(0, '7.621')] [2025-01-04 03:05:49,822][134294] Updated weights for policy 0, policy_version 79084 (0.0014) [2025-01-04 03:05:51,667][134294] Updated weights for policy 0, policy_version 79094 (0.0013) [2025-01-04 03:05:53,564][134294] Updated weights for policy 0, policy_version 79104 (0.0013) [2025-01-04 03:05:53,968][134211] Fps is (10 sec: 18022.5, 60 sec: 15701.4, 300 sec: 15467.6). Total num frames: 324018176. Throughput: 0: 4029.4. Samples: 70168198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:05:53,968][134211] Avg episode reward: [(0, '8.046')] [2025-01-04 03:05:55,451][134294] Updated weights for policy 0, policy_version 79114 (0.0014) [2025-01-04 03:05:57,429][134294] Updated weights for policy 0, policy_version 79124 (0.0014) [2025-01-04 03:05:58,968][134211] Fps is (10 sec: 20480.0, 60 sec: 16179.2, 300 sec: 15564.8). Total num frames: 324112384. Throughput: 0: 4260.2. Samples: 70199158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:05:58,969][134211] Avg episode reward: [(0, '7.104')] [2025-01-04 03:06:00,363][134294] Updated weights for policy 0, policy_version 79134 (0.0026) [2025-01-04 03:06:03,582][134294] Updated weights for policy 0, policy_version 79144 (0.0028) [2025-01-04 03:06:03,968][134211] Fps is (10 sec: 15564.4, 60 sec: 16110.9, 300 sec: 15578.7). Total num frames: 324173824. Throughput: 0: 4177.8. Samples: 70209146. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:03,968][134211] Avg episode reward: [(0, '7.375')] [2025-01-04 03:06:06,668][134294] Updated weights for policy 0, policy_version 79154 (0.0025) [2025-01-04 03:06:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 16110.9, 300 sec: 15523.1). Total num frames: 324243456. Throughput: 0: 4035.7. Samples: 70228466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:08,968][134211] Avg episode reward: [(0, '7.034')] [2025-01-04 03:06:09,838][134294] Updated weights for policy 0, policy_version 79164 (0.0027) [2025-01-04 03:06:13,229][134294] Updated weights for policy 0, policy_version 79174 (0.0027) [2025-01-04 03:06:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15906.3, 300 sec: 15453.7). Total num frames: 324300800. Throughput: 0: 4009.2. Samples: 70247280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:13,968][134211] Avg episode reward: [(0, '7.301')] [2025-01-04 03:06:16,898][134294] Updated weights for policy 0, policy_version 79184 (0.0032) [2025-01-04 03:06:18,967][134211] Fps is (10 sec: 12288.4, 60 sec: 15292.0, 300 sec: 15481.5). Total num frames: 324366336. Throughput: 0: 3889.8. Samples: 70255472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:18,968][134211] Avg episode reward: [(0, '7.797')] [2025-01-04 03:06:19,618][134294] Updated weights for policy 0, policy_version 79194 (0.0019) [2025-01-04 03:06:21,609][134294] Updated weights for policy 0, policy_version 79204 (0.0013) [2025-01-04 03:06:23,504][134294] Updated weights for policy 0, policy_version 79214 (0.0015) [2025-01-04 03:06:23,968][134211] Fps is (10 sec: 16794.0, 60 sec: 15837.9, 300 sec: 15620.5). Total num frames: 324468736. Throughput: 0: 3777.3. Samples: 70281326. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:23,968][134211] Avg episode reward: [(0, '6.942')] [2025-01-04 03:06:25,340][134294] Updated weights for policy 0, policy_version 79224 (0.0013) [2025-01-04 03:06:27,240][134294] Updated weights for policy 0, policy_version 79234 (0.0013) [2025-01-04 03:06:28,967][134211] Fps is (10 sec: 21299.2, 60 sec: 16588.8, 300 sec: 15662.0). Total num frames: 324579328. Throughput: 0: 4085.9. Samples: 70314072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:28,968][134211] Avg episode reward: [(0, '8.488')] [2025-01-04 03:06:29,102][134294] Updated weights for policy 0, policy_version 79244 (0.0012) [2025-01-04 03:06:31,682][134294] Updated weights for policy 0, policy_version 79254 (0.0021) [2025-01-04 03:06:33,968][134211] Fps is (10 sec: 18431.7, 60 sec: 16110.9, 300 sec: 15606.5). Total num frames: 324653056. Throughput: 0: 4178.2. Samples: 70327710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:33,968][134211] Avg episode reward: [(0, '7.129')] [2025-01-04 03:06:34,796][134294] Updated weights for policy 0, policy_version 79264 (0.0027) [2025-01-04 03:06:37,972][134294] Updated weights for policy 0, policy_version 79274 (0.0029) [2025-01-04 03:06:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15770.1, 300 sec: 15634.2). Total num frames: 324718592. Throughput: 0: 3976.5. Samples: 70347142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:38,968][134211] Avg episode reward: [(0, '7.452')] [2025-01-04 03:06:41,104][134294] Updated weights for policy 0, policy_version 79284 (0.0025) [2025-01-04 03:06:43,968][134211] Fps is (10 sec: 12697.0, 60 sec: 15701.2, 300 sec: 15634.2). Total num frames: 324780032. Throughput: 0: 3721.9. Samples: 70366646. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:43,969][134211] Avg episode reward: [(0, '7.362')] [2025-01-04 03:06:44,317][134294] Updated weights for policy 0, policy_version 79294 (0.0027) [2025-01-04 03:06:47,369][134294] Updated weights for policy 0, policy_version 79304 (0.0027) [2025-01-04 03:06:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15701.3, 300 sec: 15662.2). Total num frames: 324849664. Throughput: 0: 3720.1. Samples: 70376550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:48,968][134211] Avg episode reward: [(0, '7.369')] [2025-01-04 03:06:50,306][134294] Updated weights for policy 0, policy_version 79314 (0.0026) [2025-01-04 03:06:53,339][134294] Updated weights for policy 0, policy_version 79324 (0.0027) [2025-01-04 03:06:53,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14950.3, 300 sec: 15648.1). Total num frames: 324915200. Throughput: 0: 3750.0. Samples: 70397214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:53,968][134211] Avg episode reward: [(0, '7.010')] [2025-01-04 03:06:56,277][134294] Updated weights for policy 0, policy_version 79334 (0.0023) [2025-01-04 03:06:58,234][134294] Updated weights for policy 0, policy_version 79344 (0.0011) [2025-01-04 03:06:58,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14950.5, 300 sec: 15648.1). Total num frames: 325009408. Throughput: 0: 3868.7. Samples: 70421370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:06:58,968][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 03:07:00,113][134294] Updated weights for policy 0, policy_version 79354 (0.0012) [2025-01-04 03:07:02,013][134294] Updated weights for policy 0, policy_version 79364 (0.0014) [2025-01-04 03:07:03,867][134294] Updated weights for policy 0, policy_version 79374 (0.0015) [2025-01-04 03:07:03,967][134211] Fps is (10 sec: 20070.9, 60 sec: 15701.4, 300 sec: 15648.1). Total num frames: 325115904. Throughput: 0: 4049.0. Samples: 70437678. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:07:03,968][134211] Avg episode reward: [(0, '7.950')] [2025-01-04 03:07:06,011][134294] Updated weights for policy 0, policy_version 79384 (0.0018) [2025-01-04 03:07:08,969][134211] Fps is (10 sec: 18429.5, 60 sec: 15837.6, 300 sec: 15689.8). Total num frames: 325193728. Throughput: 0: 4106.5. Samples: 70466122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:07:08,970][134211] Avg episode reward: [(0, '6.879')] [2025-01-04 03:07:09,314][134294] Updated weights for policy 0, policy_version 79394 (0.0030) [2025-01-04 03:07:12,637][134294] Updated weights for policy 0, policy_version 79404 (0.0028) [2025-01-04 03:07:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15906.2, 300 sec: 15703.7). Total num frames: 325255168. Throughput: 0: 3791.8. Samples: 70484706. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:07:13,968][134211] Avg episode reward: [(0, '7.740')] [2025-01-04 03:07:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000079408_325255168.pth... [2025-01-04 03:07:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078492_321503232.pth [2025-01-04 03:07:15,980][134294] Updated weights for policy 0, policy_version 79414 (0.0030) [2025-01-04 03:07:18,968][134211] Fps is (10 sec: 12289.5, 60 sec: 15837.8, 300 sec: 15703.6). Total num frames: 325316608. Throughput: 0: 3691.4. Samples: 70493824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:07:18,968][134211] Avg episode reward: [(0, '7.713')] [2025-01-04 03:07:19,045][134294] Updated weights for policy 0, policy_version 79424 (0.0026) [2025-01-04 03:07:21,879][134294] Updated weights for policy 0, policy_version 79434 (0.0022) [2025-01-04 03:07:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15496.5, 300 sec: 15606.4). Total num frames: 325398528. 
Throughput: 0: 3733.3. Samples: 70515138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:23,968][134211] Avg episode reward: [(0, '7.749')] [2025-01-04 03:07:23,997][134294] Updated weights for policy 0, policy_version 79444 (0.0015) [2025-01-04 03:07:27,309][134294] Updated weights for policy 0, policy_version 79454 (0.0026) [2025-01-04 03:07:28,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14882.1, 300 sec: 15509.3). Total num frames: 325472256. Throughput: 0: 3801.4. Samples: 70537708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:28,968][134211] Avg episode reward: [(0, '7.413')] [2025-01-04 03:07:29,377][134294] Updated weights for policy 0, policy_version 79464 (0.0013) [2025-01-04 03:07:31,273][134294] Updated weights for policy 0, policy_version 79474 (0.0015) [2025-01-04 03:07:33,168][134294] Updated weights for policy 0, policy_version 79484 (0.0014) [2025-01-04 03:07:33,967][134211] Fps is (10 sec: 18432.4, 60 sec: 15496.6, 300 sec: 15620.4). Total num frames: 325582848. Throughput: 0: 3945.7. Samples: 70554104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:33,968][134211] Avg episode reward: [(0, '6.865')] [2025-01-04 03:07:35,060][134294] Updated weights for policy 0, policy_version 79494 (0.0014) [2025-01-04 03:07:36,949][134294] Updated weights for policy 0, policy_version 79504 (0.0014) [2025-01-04 03:07:38,968][134211] Fps is (10 sec: 20889.0, 60 sec: 16042.7, 300 sec: 15731.4). Total num frames: 325681152. Throughput: 0: 4210.1. Samples: 70586668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:38,969][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 03:07:39,638][134294] Updated weights for policy 0, policy_version 79514 (0.0022) [2025-01-04 03:07:43,055][134294] Updated weights for policy 0, policy_version 79524 (0.0025) [2025-01-04 03:07:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15974.5, 300 sec: 15703.6). Total num frames: 325738496. Throughput: 0: 4105.1. 
Samples: 70606100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:43,968][134211] Avg episode reward: [(0, '7.192')] [2025-01-04 03:07:46,220][134294] Updated weights for policy 0, policy_version 79534 (0.0026) [2025-01-04 03:07:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15906.1, 300 sec: 15703.7). Total num frames: 325804032. Throughput: 0: 3959.0. Samples: 70615836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:48,968][134211] Avg episode reward: [(0, '7.522')] [2025-01-04 03:07:49,422][134294] Updated weights for policy 0, policy_version 79544 (0.0027) [2025-01-04 03:07:52,676][134294] Updated weights for policy 0, policy_version 79554 (0.0026) [2025-01-04 03:07:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15906.1, 300 sec: 15703.6). Total num frames: 325869568. Throughput: 0: 3753.5. Samples: 70635026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:53,968][134211] Avg episode reward: [(0, '7.519')] [2025-01-04 03:07:55,920][134294] Updated weights for policy 0, policy_version 79564 (0.0026) [2025-01-04 03:07:58,283][134294] Updated weights for policy 0, policy_version 79574 (0.0019) [2025-01-04 03:07:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15633.0, 300 sec: 15731.4). Total num frames: 325947392. Throughput: 0: 3813.7. Samples: 70656324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:07:58,968][134211] Avg episode reward: [(0, '7.596')] [2025-01-04 03:08:00,625][134294] Updated weights for policy 0, policy_version 79584 (0.0019) [2025-01-04 03:08:03,729][134294] Updated weights for policy 0, policy_version 79594 (0.0022) [2025-01-04 03:08:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 15018.6, 300 sec: 15675.9). Total num frames: 326017024. Throughput: 0: 3898.3. Samples: 70669250. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:08:03,968][134211] Avg episode reward: [(0, '7.039')] [2025-01-04 03:08:06,285][134294] Updated weights for policy 0, policy_version 79604 (0.0020) [2025-01-04 03:08:08,448][134294] Updated weights for policy 0, policy_version 79614 (0.0015) [2025-01-04 03:08:08,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15155.5, 300 sec: 15606.4). Total num frames: 326103040. Throughput: 0: 3944.6. Samples: 70692644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:08,968][134211] Avg episode reward: [(0, '8.232')] [2025-01-04 03:08:11,614][134294] Updated weights for policy 0, policy_version 79624 (0.0025) [2025-01-04 03:08:13,967][134211] Fps is (10 sec: 14746.0, 60 sec: 15155.2, 300 sec: 15578.7). Total num frames: 326164480. Throughput: 0: 3885.4. Samples: 70712552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:13,968][134211] Avg episode reward: [(0, '7.339')] [2025-01-04 03:08:14,630][134294] Updated weights for policy 0, policy_version 79634 (0.0019) [2025-01-04 03:08:16,704][134294] Updated weights for policy 0, policy_version 79644 (0.0014) [2025-01-04 03:08:18,761][134294] Updated weights for policy 0, policy_version 79654 (0.0012) [2025-01-04 03:08:18,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15769.6, 300 sec: 15689.8). Total num frames: 326262784. Throughput: 0: 3819.9. Samples: 70726002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:18,968][134211] Avg episode reward: [(0, '7.244')] [2025-01-04 03:08:20,750][134294] Updated weights for policy 0, policy_version 79664 (0.0012) [2025-01-04 03:08:22,625][134294] Updated weights for policy 0, policy_version 79674 (0.0013) [2025-01-04 03:08:23,968][134211] Fps is (10 sec: 20889.6, 60 sec: 16247.5, 300 sec: 15787.0). Total num frames: 326373376. Throughput: 0: 3785.4. Samples: 70757010. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:23,968][134211] Avg episode reward: [(0, '7.992')] [2025-01-04 03:08:24,516][134294] Updated weights for policy 0, policy_version 79684 (0.0013) [2025-01-04 03:08:26,517][134294] Updated weights for policy 0, policy_version 79694 (0.0016) [2025-01-04 03:08:28,968][134211] Fps is (10 sec: 19250.6, 60 sec: 16383.9, 300 sec: 15773.1). Total num frames: 326455296. Throughput: 0: 3987.9. Samples: 70785558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:28,969][134211] Avg episode reward: [(0, '6.731')] [2025-01-04 03:08:29,591][134294] Updated weights for policy 0, policy_version 79704 (0.0027) [2025-01-04 03:08:32,769][134294] Updated weights for policy 0, policy_version 79714 (0.0027) [2025-01-04 03:08:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15633.0, 300 sec: 15773.1). Total num frames: 326520832. Throughput: 0: 3987.0. Samples: 70795252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:33,968][134211] Avg episode reward: [(0, '8.127')] [2025-01-04 03:08:35,856][134294] Updated weights for policy 0, policy_version 79724 (0.0026) [2025-01-04 03:08:38,869][134294] Updated weights for policy 0, policy_version 79734 (0.0026) [2025-01-04 03:08:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15155.2, 300 sec: 15717.5). Total num frames: 326590464. Throughput: 0: 4005.6. Samples: 70815276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:38,968][134211] Avg episode reward: [(0, '8.203')] [2025-01-04 03:08:41,922][134294] Updated weights for policy 0, policy_version 79744 (0.0023) [2025-01-04 03:08:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 15578.7). Total num frames: 326656000. Throughput: 0: 3971.5. Samples: 70835040. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:43,968][134211] Avg episode reward: [(0, '7.119')] [2025-01-04 03:08:45,045][134294] Updated weights for policy 0, policy_version 79754 (0.0029) [2025-01-04 03:08:48,061][134294] Updated weights for policy 0, policy_version 79764 (0.0025) [2025-01-04 03:08:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15523.1). Total num frames: 326725632. Throughput: 0: 3908.6. Samples: 70845138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:48,968][134211] Avg episode reward: [(0, '7.446')] [2025-01-04 03:08:51,127][134294] Updated weights for policy 0, policy_version 79774 (0.0023) [2025-01-04 03:08:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15360.0, 300 sec: 15523.2). Total num frames: 326791168. Throughput: 0: 3836.7. Samples: 70865296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:08:53,969][134211] Avg episode reward: [(0, '7.250')] [2025-01-04 03:08:54,268][134294] Updated weights for policy 0, policy_version 79784 (0.0026) [2025-01-04 03:08:56,676][134294] Updated weights for policy 0, policy_version 79794 (0.0019) [2025-01-04 03:08:58,474][134294] Updated weights for policy 0, policy_version 79804 (0.0014) [2025-01-04 03:08:58,967][134211] Fps is (10 sec: 15974.7, 60 sec: 15633.2, 300 sec: 15634.2). Total num frames: 326885376. Throughput: 0: 3959.5. Samples: 70890728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:08:58,968][134211] Avg episode reward: [(0, '7.093')] [2025-01-04 03:09:00,389][134294] Updated weights for policy 0, policy_version 79814 (0.0013) [2025-01-04 03:09:02,274][134294] Updated weights for policy 0, policy_version 79824 (0.0013) [2025-01-04 03:09:03,968][134211] Fps is (10 sec: 20480.6, 60 sec: 16315.8, 300 sec: 15773.1). Total num frames: 326995968. Throughput: 0: 4024.4. Samples: 70907100. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:03,968][134211] Avg episode reward: [(0, '8.103')] [2025-01-04 03:09:04,188][134294] Updated weights for policy 0, policy_version 79834 (0.0013) [2025-01-04 03:09:06,607][134294] Updated weights for policy 0, policy_version 79844 (0.0021) [2025-01-04 03:09:08,968][134211] Fps is (10 sec: 18431.6, 60 sec: 16111.0, 300 sec: 15828.6). Total num frames: 327069696. Throughput: 0: 3970.9. Samples: 70935702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:08,968][134211] Avg episode reward: [(0, '6.904')] [2025-01-04 03:09:09,658][134294] Updated weights for policy 0, policy_version 79854 (0.0027) [2025-01-04 03:09:12,944][134294] Updated weights for policy 0, policy_version 79864 (0.0026) [2025-01-04 03:09:13,968][134211] Fps is (10 sec: 13925.9, 60 sec: 16179.1, 300 sec: 15745.3). Total num frames: 327135232. Throughput: 0: 3758.7. Samples: 70954698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:13,969][134211] Avg episode reward: [(0, '6.820')] [2025-01-04 03:09:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000079867_327135232.pth... [2025-01-04 03:09:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000078942_323346432.pth [2025-01-04 03:09:16,073][134294] Updated weights for policy 0, policy_version 79874 (0.0026) [2025-01-04 03:09:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15633.0, 300 sec: 15648.1). Total num frames: 327200768. Throughput: 0: 3761.7. Samples: 70964528. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:18,968][134211] Avg episode reward: [(0, '7.440')] [2025-01-04 03:09:19,066][134294] Updated weights for policy 0, policy_version 79884 (0.0028) [2025-01-04 03:09:22,109][134294] Updated weights for policy 0, policy_version 79894 (0.0026) [2025-01-04 03:09:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.3, 300 sec: 15634.2). Total num frames: 327270400. Throughput: 0: 3770.0. Samples: 70984928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:23,968][134211] Avg episode reward: [(0, '7.412')] [2025-01-04 03:09:25,165][134294] Updated weights for policy 0, policy_version 79904 (0.0023) [2025-01-04 03:09:27,667][134294] Updated weights for policy 0, policy_version 79914 (0.0020) [2025-01-04 03:09:28,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14950.5, 300 sec: 15689.8). Total num frames: 327352320. Throughput: 0: 3839.6. Samples: 71007820. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:28,968][134211] Avg episode reward: [(0, '7.339')] [2025-01-04 03:09:29,616][134294] Updated weights for policy 0, policy_version 79924 (0.0013) [2025-01-04 03:09:31,560][134294] Updated weights for policy 0, policy_version 79934 (0.0020) [2025-01-04 03:09:33,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15360.0, 300 sec: 15634.2). Total num frames: 327442432. Throughput: 0: 3970.9. Samples: 71023828. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:33,968][134211] Avg episode reward: [(0, '7.081')] [2025-01-04 03:09:34,538][134294] Updated weights for policy 0, policy_version 79944 (0.0022) [2025-01-04 03:09:37,642][134294] Updated weights for policy 0, policy_version 79954 (0.0027) [2025-01-04 03:09:38,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15223.5, 300 sec: 15467.6). Total num frames: 327503872. Throughput: 0: 3984.7. Samples: 71044608. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:38,968][134211] Avg episode reward: [(0, '7.258')] [2025-01-04 03:09:40,587][134294] Updated weights for policy 0, policy_version 79964 (0.0023) [2025-01-04 03:09:42,380][134294] Updated weights for policy 0, policy_version 79974 (0.0013) [2025-01-04 03:09:43,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15769.6, 300 sec: 15550.9). Total num frames: 327602176. Throughput: 0: 3997.2. Samples: 71070604. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:43,968][134211] Avg episode reward: [(0, '7.929')] [2025-01-04 03:09:44,410][134294] Updated weights for policy 0, policy_version 79984 (0.0013) [2025-01-04 03:09:46,284][134294] Updated weights for policy 0, policy_version 79994 (0.0014) [2025-01-04 03:09:48,256][134294] Updated weights for policy 0, policy_version 80004 (0.0015) [2025-01-04 03:09:48,968][134211] Fps is (10 sec: 20070.4, 60 sec: 16315.8, 300 sec: 15689.8). Total num frames: 327704576. Throughput: 0: 3987.3. Samples: 71086528. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:48,968][134211] Avg episode reward: [(0, '7.103')] [2025-01-04 03:09:51,198][134294] Updated weights for policy 0, policy_version 80014 (0.0026) [2025-01-04 03:09:53,968][134211] Fps is (10 sec: 16792.1, 60 sec: 16315.6, 300 sec: 15689.7). Total num frames: 327770112. Throughput: 0: 3883.6. Samples: 71110468. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:53,969][134211] Avg episode reward: [(0, '7.713')] [2025-01-04 03:09:54,499][134294] Updated weights for policy 0, policy_version 80024 (0.0028) [2025-01-04 03:09:57,680][134294] Updated weights for policy 0, policy_version 80034 (0.0024) [2025-01-04 03:09:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15837.8, 300 sec: 15689.8). Total num frames: 327835648. Throughput: 0: 3885.4. Samples: 71129540. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:09:58,968][134211] Avg episode reward: [(0, '8.040')] [2025-01-04 03:10:00,974][134294] Updated weights for policy 0, policy_version 80044 (0.0028) [2025-01-04 03:10:03,968][134211] Fps is (10 sec: 12698.6, 60 sec: 15018.7, 300 sec: 15662.0). Total num frames: 327897088. Throughput: 0: 3871.3. Samples: 71138736. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:03,968][134211] Avg episode reward: [(0, '7.113')] [2025-01-04 03:10:04,175][134294] Updated weights for policy 0, policy_version 80054 (0.0024) [2025-01-04 03:10:06,089][134294] Updated weights for policy 0, policy_version 80064 (0.0013) [2025-01-04 03:10:07,948][134294] Updated weights for policy 0, policy_version 80074 (0.0013) [2025-01-04 03:10:08,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15564.8, 300 sec: 15787.0). Total num frames: 328003584. Throughput: 0: 3999.5. Samples: 71164906. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:08,968][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 03:10:09,827][134294] Updated weights for policy 0, policy_version 80084 (0.0014) [2025-01-04 03:10:11,742][134294] Updated weights for policy 0, policy_version 80094 (0.0014) [2025-01-04 03:10:13,968][134211] Fps is (10 sec: 20479.8, 60 sec: 16111.0, 300 sec: 15773.1). Total num frames: 328101888. Throughput: 0: 4187.9. Samples: 71196274. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:13,968][134211] Avg episode reward: [(0, '7.158')] [2025-01-04 03:10:14,256][134294] Updated weights for policy 0, policy_version 80104 (0.0018) [2025-01-04 03:10:18,009][134294] Updated weights for policy 0, policy_version 80114 (0.0027) [2025-01-04 03:10:18,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15906.1, 300 sec: 15717.5). Total num frames: 328155136. Throughput: 0: 4027.2. Samples: 71205054. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:18,968][134211] Avg episode reward: [(0, '7.417')] [2025-01-04 03:10:21,673][134294] Updated weights for policy 0, policy_version 80124 (0.0027) [2025-01-04 03:10:23,970][134211] Fps is (10 sec: 11466.3, 60 sec: 15769.0, 300 sec: 15703.5). Total num frames: 328216576. Throughput: 0: 3939.3. Samples: 71221886. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:23,970][134211] Avg episode reward: [(0, '7.223')] [2025-01-04 03:10:24,887][134294] Updated weights for policy 0, policy_version 80134 (0.0026) [2025-01-04 03:10:26,877][134294] Updated weights for policy 0, policy_version 80144 (0.0014) [2025-01-04 03:10:28,772][134294] Updated weights for policy 0, policy_version 80154 (0.0014) [2025-01-04 03:10:28,968][134211] Fps is (10 sec: 15974.7, 60 sec: 16042.7, 300 sec: 15689.8). Total num frames: 328314880. Throughput: 0: 3942.4. Samples: 71248012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:10:28,968][134211] Avg episode reward: [(0, '6.983')] [2025-01-04 03:10:30,618][134294] Updated weights for policy 0, policy_version 80164 (0.0013) [2025-01-04 03:10:32,599][134294] Updated weights for policy 0, policy_version 80174 (0.0017) [2025-01-04 03:10:33,968][134211] Fps is (10 sec: 19255.1, 60 sec: 16110.9, 300 sec: 15717.6). Total num frames: 328409088. Throughput: 0: 3951.0. Samples: 71264324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:33,968][134211] Avg episode reward: [(0, '7.711')] [2025-01-04 03:10:35,499][134294] Updated weights for policy 0, policy_version 80184 (0.0025) [2025-01-04 03:10:38,648][134294] Updated weights for policy 0, policy_version 80194 (0.0027) [2025-01-04 03:10:38,968][134211] Fps is (10 sec: 16383.7, 60 sec: 16247.4, 300 sec: 15731.4). Total num frames: 328478720. Throughput: 0: 3927.0. Samples: 71287182. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:38,968][134211] Avg episode reward: [(0, '7.616')] [2025-01-04 03:10:41,799][134294] Updated weights for policy 0, policy_version 80204 (0.0026) [2025-01-04 03:10:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15701.3, 300 sec: 15717.5). Total num frames: 328544256. Throughput: 0: 3931.8. Samples: 71306470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:43,968][134211] Avg episode reward: [(0, '7.238')] [2025-01-04 03:10:44,931][134294] Updated weights for policy 0, policy_version 80214 (0.0028) [2025-01-04 03:10:47,957][134294] Updated weights for policy 0, policy_version 80224 (0.0024) [2025-01-04 03:10:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15018.6, 300 sec: 15550.9). Total num frames: 328605696. Throughput: 0: 3945.8. Samples: 71316296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:48,968][134211] Avg episode reward: [(0, '8.458')] [2025-01-04 03:10:50,977][134294] Updated weights for policy 0, policy_version 80234 (0.0025) [2025-01-04 03:10:53,925][134294] Updated weights for policy 0, policy_version 80244 (0.0024) [2025-01-04 03:10:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.4, 300 sec: 15481.5). Total num frames: 328679424. Throughput: 0: 3824.8. Samples: 71337022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:53,968][134211] Avg episode reward: [(0, '7.026')] [2025-01-04 03:10:56,566][134294] Updated weights for policy 0, policy_version 80254 (0.0019) [2025-01-04 03:10:58,482][134294] Updated weights for policy 0, policy_version 80264 (0.0015) [2025-01-04 03:10:58,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15564.9, 300 sec: 15578.7). Total num frames: 328769536. Throughput: 0: 3679.4. Samples: 71361848. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:10:58,968][134211] Avg episode reward: [(0, '6.939')] [2025-01-04 03:11:00,386][134294] Updated weights for policy 0, policy_version 80274 (0.0012) [2025-01-04 03:11:02,267][134294] Updated weights for policy 0, policy_version 80284 (0.0013) [2025-01-04 03:11:03,968][134211] Fps is (10 sec: 20070.6, 60 sec: 16384.0, 300 sec: 15717.5). Total num frames: 328880128. Throughput: 0: 3845.8. Samples: 71378116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:11:03,968][134211] Avg episode reward: [(0, '7.236')] [2025-01-04 03:11:04,161][134294] Updated weights for policy 0, policy_version 80294 (0.0013) [2025-01-04 03:11:06,047][134294] Updated weights for policy 0, policy_version 80304 (0.0013) [2025-01-04 03:11:08,330][134294] Updated weights for policy 0, policy_version 80314 (0.0021) [2025-01-04 03:11:08,968][134211] Fps is (10 sec: 20069.8, 60 sec: 16110.9, 300 sec: 15828.6). Total num frames: 328970240. Throughput: 0: 4189.8. Samples: 71410418. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:11:08,968][134211] Avg episode reward: [(0, '7.732')] [2025-01-04 03:11:11,607][134294] Updated weights for policy 0, policy_version 80324 (0.0024) [2025-01-04 03:11:13,969][134211] Fps is (10 sec: 15562.5, 60 sec: 15564.4, 300 sec: 15828.5). Total num frames: 329035776. Throughput: 0: 4037.1. Samples: 71429688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:11:13,970][134211] Avg episode reward: [(0, '6.917')] [2025-01-04 03:11:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000080331_329035776.pth... 
[2025-01-04 03:11:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000079408_325255168.pth [2025-01-04 03:11:15,027][134294] Updated weights for policy 0, policy_version 80334 (0.0025) [2025-01-04 03:11:18,089][134294] Updated weights for policy 0, policy_version 80344 (0.0027) [2025-01-04 03:11:18,968][134211] Fps is (10 sec: 12696.9, 60 sec: 15701.2, 300 sec: 15689.7). Total num frames: 329097216. Throughput: 0: 3884.7. Samples: 71439136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:11:18,969][134211] Avg episode reward: [(0, '7.582')] [2025-01-04 03:11:21,147][134294] Updated weights for policy 0, policy_version 80354 (0.0028) [2025-01-04 03:11:23,968][134211] Fps is (10 sec: 12699.3, 60 sec: 15770.2, 300 sec: 15537.0). Total num frames: 329162752. Throughput: 0: 3816.7. Samples: 71458934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:23,968][134211] Avg episode reward: [(0, '7.966')] [2025-01-04 03:11:24,305][134294] Updated weights for policy 0, policy_version 80364 (0.0028) [2025-01-04 03:11:27,358][134294] Updated weights for policy 0, policy_version 80374 (0.0025) [2025-01-04 03:11:28,968][134211] Fps is (10 sec: 13517.8, 60 sec: 15291.7, 300 sec: 15523.1). Total num frames: 329232384. Throughput: 0: 3831.1. Samples: 71478870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:28,968][134211] Avg episode reward: [(0, '7.592')] [2025-01-04 03:11:30,356][134294] Updated weights for policy 0, policy_version 80384 (0.0028) [2025-01-04 03:11:33,347][134294] Updated weights for policy 0, policy_version 80394 (0.0028) [2025-01-04 03:11:33,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14950.5, 300 sec: 15550.9). Total num frames: 329306112. Throughput: 0: 3840.7. Samples: 71489128. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:33,968][134211] Avg episode reward: [(0, '7.849')] [2025-01-04 03:11:35,281][134294] Updated weights for policy 0, policy_version 80404 (0.0015) [2025-01-04 03:11:37,143][134294] Updated weights for policy 0, policy_version 80414 (0.0014) [2025-01-04 03:11:38,967][134211] Fps is (10 sec: 18022.7, 60 sec: 15564.9, 300 sec: 15703.7). Total num frames: 329412608. Throughput: 0: 4005.9. Samples: 71517288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:38,968][134211] Avg episode reward: [(0, '7.002')] [2025-01-04 03:11:39,081][134294] Updated weights for policy 0, policy_version 80424 (0.0014) [2025-01-04 03:11:40,924][134294] Updated weights for policy 0, policy_version 80434 (0.0013) [2025-01-04 03:11:42,823][134294] Updated weights for policy 0, policy_version 80444 (0.0013) [2025-01-04 03:11:43,968][134211] Fps is (10 sec: 20889.0, 60 sec: 16179.2, 300 sec: 15814.7). Total num frames: 329515008. Throughput: 0: 4173.0. Samples: 71549636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:43,968][134211] Avg episode reward: [(0, '7.815')] [2025-01-04 03:11:45,673][134294] Updated weights for policy 0, policy_version 80454 (0.0025) [2025-01-04 03:11:48,893][134294] Updated weights for policy 0, policy_version 80464 (0.0029) [2025-01-04 03:11:48,968][134211] Fps is (10 sec: 16793.1, 60 sec: 16247.5, 300 sec: 15814.7). Total num frames: 329580544. Throughput: 0: 4033.8. Samples: 71559636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:48,968][134211] Avg episode reward: [(0, '6.742')] [2025-01-04 03:11:52,167][134294] Updated weights for policy 0, policy_version 80474 (0.0028) [2025-01-04 03:11:53,971][134211] Fps is (10 sec: 12693.8, 60 sec: 16041.8, 300 sec: 15703.5). Total num frames: 329641984. Throughput: 0: 3737.2. Samples: 71578604. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:53,971][134211] Avg episode reward: [(0, '8.574')] [2025-01-04 03:11:55,618][134294] Updated weights for policy 0, policy_version 80484 (0.0025) [2025-01-04 03:11:58,676][134294] Updated weights for policy 0, policy_version 80494 (0.0025) [2025-01-04 03:11:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15564.7, 300 sec: 15550.9). Total num frames: 329703424. Throughput: 0: 3728.6. Samples: 71597468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:11:58,968][134211] Avg episode reward: [(0, '8.205')] [2025-01-04 03:12:01,107][134294] Updated weights for policy 0, policy_version 80504 (0.0016) [2025-01-04 03:12:03,425][134294] Updated weights for policy 0, policy_version 80514 (0.0019) [2025-01-04 03:12:03,968][134211] Fps is (10 sec: 14750.1, 60 sec: 15155.2, 300 sec: 15578.7). Total num frames: 329789440. Throughput: 0: 3793.1. Samples: 71609822. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:12:03,968][134211] Avg episode reward: [(0, '8.168')] [2025-01-04 03:12:06,593][134294] Updated weights for policy 0, policy_version 80524 (0.0029) [2025-01-04 03:12:08,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14882.2, 300 sec: 15620.3). Total num frames: 329863168. Throughput: 0: 3839.5. Samples: 71631710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:12:08,968][134211] Avg episode reward: [(0, '8.511')] [2025-01-04 03:12:09,127][134294] Updated weights for policy 0, policy_version 80534 (0.0017) [2025-01-04 03:12:11,010][134294] Updated weights for policy 0, policy_version 80544 (0.0012) [2025-01-04 03:12:12,913][134294] Updated weights for policy 0, policy_version 80554 (0.0013) [2025-01-04 03:12:13,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15496.9, 300 sec: 15759.2). Total num frames: 329965568. Throughput: 0: 4061.4. Samples: 71661634. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:13,968][134211] Avg episode reward: [(0, '7.391')] [2025-01-04 03:12:15,461][134294] Updated weights for policy 0, policy_version 80564 (0.0020) [2025-01-04 03:12:18,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15496.7, 300 sec: 15689.8). Total num frames: 330027008. Throughput: 0: 4078.2. Samples: 71672650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:18,968][134211] Avg episode reward: [(0, '7.286')] [2025-01-04 03:12:19,068][134294] Updated weights for policy 0, policy_version 80574 (0.0032) [2025-01-04 03:12:22,430][134294] Updated weights for policy 0, policy_version 80584 (0.0025) [2025-01-04 03:12:23,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15701.3, 300 sec: 15703.6). Total num frames: 330104832. Throughput: 0: 3845.7. Samples: 71690348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:23,968][134211] Avg episode reward: [(0, '7.061')] [2025-01-04 03:12:24,268][134294] Updated weights for policy 0, policy_version 80594 (0.0013) [2025-01-04 03:12:26,207][134294] Updated weights for policy 0, policy_version 80604 (0.0014) [2025-01-04 03:12:28,081][134294] Updated weights for policy 0, policy_version 80614 (0.0012) [2025-01-04 03:12:28,967][134211] Fps is (10 sec: 18432.3, 60 sec: 16315.8, 300 sec: 15689.8). Total num frames: 330211328. Throughput: 0: 3839.4. Samples: 71722406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:28,968][134211] Avg episode reward: [(0, '7.574')] [2025-01-04 03:12:30,224][134294] Updated weights for policy 0, policy_version 80624 (0.0015) [2025-01-04 03:12:33,276][134294] Updated weights for policy 0, policy_version 80634 (0.0029) [2025-01-04 03:12:33,968][134211] Fps is (10 sec: 17612.9, 60 sec: 16247.4, 300 sec: 15592.6). Total num frames: 330280960. Throughput: 0: 3908.0. Samples: 71735496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:33,968][134211] Avg episode reward: [(0, '6.946')] [2025-01-04 03:12:36,665][134294] Updated weights for policy 0, policy_version 80644 (0.0028) [2025-01-04 03:12:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15428.2, 300 sec: 15592.6). Total num frames: 330338304. Throughput: 0: 3895.3. Samples: 71753880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:38,968][134211] Avg episode reward: [(0, '7.246')] [2025-01-04 03:12:40,443][134294] Updated weights for policy 0, policy_version 80654 (0.0026) [2025-01-04 03:12:43,890][134294] Updated weights for policy 0, policy_version 80664 (0.0026) [2025-01-04 03:12:43,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14745.6, 300 sec: 15578.7). Total num frames: 330399744. Throughput: 0: 3857.6. Samples: 71771058. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:43,968][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 03:12:46,321][134294] Updated weights for policy 0, policy_version 80674 (0.0018) [2025-01-04 03:12:48,204][134294] Updated weights for policy 0, policy_version 80684 (0.0013) [2025-01-04 03:12:48,968][134211] Fps is (10 sec: 15974.4, 60 sec: 15291.8, 300 sec: 15689.8). Total num frames: 330498048. Throughput: 0: 3842.9. Samples: 71782754. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:48,968][134211] Avg episode reward: [(0, '7.138')] [2025-01-04 03:12:50,106][134294] Updated weights for policy 0, policy_version 80694 (0.0012) [2025-01-04 03:12:51,928][134294] Updated weights for policy 0, policy_version 80704 (0.0013) [2025-01-04 03:12:53,877][134294] Updated weights for policy 0, policy_version 80714 (0.0013) [2025-01-04 03:12:53,968][134211] Fps is (10 sec: 20480.3, 60 sec: 16043.5, 300 sec: 15787.0). Total num frames: 330604544. Throughput: 0: 4081.3. Samples: 71815370. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:53,968][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 03:12:55,741][134294] Updated weights for policy 0, policy_version 80724 (0.0013) [2025-01-04 03:12:58,670][134294] Updated weights for policy 0, policy_version 80734 (0.0026) [2025-01-04 03:12:58,968][134211] Fps is (10 sec: 18841.6, 60 sec: 16384.0, 300 sec: 15828.6). Total num frames: 330686464. Throughput: 0: 4038.0. Samples: 71843344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:12:58,968][134211] Avg episode reward: [(0, '7.050')] [2025-01-04 03:13:01,915][134294] Updated weights for policy 0, policy_version 80744 (0.0027) [2025-01-04 03:13:03,968][134211] Fps is (10 sec: 14745.4, 60 sec: 16042.7, 300 sec: 15759.2). Total num frames: 330752000. Throughput: 0: 3995.7. Samples: 71852456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:03,968][134211] Avg episode reward: [(0, '8.066')] [2025-01-04 03:13:05,328][134294] Updated weights for policy 0, policy_version 80754 (0.0031) [2025-01-04 03:13:08,551][134294] Updated weights for policy 0, policy_version 80764 (0.0028) [2025-01-04 03:13:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15837.8, 300 sec: 15759.2). Total num frames: 330813440. Throughput: 0: 4017.2. Samples: 71871124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:08,968][134211] Avg episode reward: [(0, '6.798')] [2025-01-04 03:13:11,580][134294] Updated weights for policy 0, policy_version 80774 (0.0026) [2025-01-04 03:13:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15223.4, 300 sec: 15648.1). Total num frames: 330878976. Throughput: 0: 3747.3. Samples: 71891036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:13,968][134211] Avg episode reward: [(0, '7.162')] [2025-01-04 03:13:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000080781_330878976.pth... 
[2025-01-04 03:13:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000079867_327135232.pth [2025-01-04 03:13:14,706][134294] Updated weights for policy 0, policy_version 80784 (0.0024) [2025-01-04 03:13:17,862][134294] Updated weights for policy 0, policy_version 80794 (0.0027) [2025-01-04 03:13:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.7, 300 sec: 15495.4). Total num frames: 330944512. Throughput: 0: 3671.3. Samples: 71900702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:18,968][134211] Avg episode reward: [(0, '7.333')] [2025-01-04 03:13:20,891][134294] Updated weights for policy 0, policy_version 80804 (0.0026) [2025-01-04 03:13:22,978][134294] Updated weights for policy 0, policy_version 80814 (0.0014) [2025-01-04 03:13:23,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15428.3, 300 sec: 15509.3). Total num frames: 331030528. Throughput: 0: 3741.7. Samples: 71922256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:23,968][134211] Avg episode reward: [(0, '6.651')] [2025-01-04 03:13:25,603][134294] Updated weights for policy 0, policy_version 80824 (0.0022) [2025-01-04 03:13:28,541][134294] Updated weights for policy 0, policy_version 80834 (0.0028) [2025-01-04 03:13:28,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14813.8, 300 sec: 15523.1). Total num frames: 331100160. Throughput: 0: 3886.8. Samples: 71945964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:28,968][134211] Avg episode reward: [(0, '7.639')] [2025-01-04 03:13:31,249][134294] Updated weights for policy 0, policy_version 80844 (0.0020) [2025-01-04 03:13:33,135][134294] Updated weights for policy 0, policy_version 80854 (0.0012) [2025-01-04 03:13:33,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15223.5, 300 sec: 15606.5). Total num frames: 331194368. Throughput: 0: 3877.2. Samples: 71957228. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:33,968][134211] Avg episode reward: [(0, '7.806')] [2025-01-04 03:13:35,005][134294] Updated weights for policy 0, policy_version 80864 (0.0014) [2025-01-04 03:13:37,017][134294] Updated weights for policy 0, policy_version 80874 (0.0014) [2025-01-04 03:13:38,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15906.1, 300 sec: 15717.5). Total num frames: 331292672. Throughput: 0: 3861.8. Samples: 71989152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:38,968][134211] Avg episode reward: [(0, '7.583')] [2025-01-04 03:13:39,136][134294] Updated weights for policy 0, policy_version 80884 (0.0017) [2025-01-04 03:13:42,346][134294] Updated weights for policy 0, policy_version 80894 (0.0026) [2025-01-04 03:13:43,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15974.4, 300 sec: 15703.6). Total num frames: 331358208. Throughput: 0: 3731.0. Samples: 72011240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:43,969][134211] Avg episode reward: [(0, '7.726')] [2025-01-04 03:13:45,566][134294] Updated weights for policy 0, policy_version 80904 (0.0026) [2025-01-04 03:13:48,756][134294] Updated weights for policy 0, policy_version 80914 (0.0027) [2025-01-04 03:13:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.2, 300 sec: 15703.7). Total num frames: 331423744. Throughput: 0: 3739.7. Samples: 72020742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:13:48,968][134211] Avg episode reward: [(0, '7.470')] [2025-01-04 03:13:51,854][134294] Updated weights for policy 0, policy_version 80924 (0.0027) [2025-01-04 03:13:53,970][134211] Fps is (10 sec: 13514.2, 60 sec: 14813.3, 300 sec: 15620.2). Total num frames: 331493376. Throughput: 0: 3768.0. Samples: 72040692. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:13:53,970][134211] Avg episode reward: [(0, '7.512')] [2025-01-04 03:13:54,869][134294] Updated weights for policy 0, policy_version 80934 (0.0022) [2025-01-04 03:13:57,262][134294] Updated weights for policy 0, policy_version 80944 (0.0017) [2025-01-04 03:13:58,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14950.4, 300 sec: 15550.9). Total num frames: 331583488. Throughput: 0: 3863.1. Samples: 72064876. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:13:58,968][134211] Avg episode reward: [(0, '7.263')] [2025-01-04 03:13:59,147][134294] Updated weights for policy 0, policy_version 80954 (0.0012) [2025-01-04 03:14:01,005][134294] Updated weights for policy 0, policy_version 80964 (0.0013) [2025-01-04 03:14:02,997][134294] Updated weights for policy 0, policy_version 80974 (0.0015) [2025-01-04 03:14:03,968][134211] Fps is (10 sec: 18845.5, 60 sec: 15496.5, 300 sec: 15634.2). Total num frames: 331681792. Throughput: 0: 4011.6. Samples: 72081224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:03,968][134211] Avg episode reward: [(0, '7.434')] [2025-01-04 03:14:06,054][134294] Updated weights for policy 0, policy_version 80984 (0.0027) [2025-01-04 03:14:08,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15564.8, 300 sec: 15634.2). Total num frames: 331747328. Throughput: 0: 4051.0. Samples: 72104550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:08,968][134211] Avg episode reward: [(0, '7.072')] [2025-01-04 03:14:09,161][134294] Updated weights for policy 0, policy_version 80994 (0.0024) [2025-01-04 03:14:12,314][134294] Updated weights for policy 0, policy_version 81004 (0.0023) [2025-01-04 03:14:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15496.6, 300 sec: 15620.3). Total num frames: 331808768. Throughput: 0: 3954.7. Samples: 72123926. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:13,968][134211] Avg episode reward: [(0, '7.261')] [2025-01-04 03:14:15,273][134294] Updated weights for policy 0, policy_version 81014 (0.0019) [2025-01-04 03:14:17,398][134294] Updated weights for policy 0, policy_version 81024 (0.0015) [2025-01-04 03:14:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15974.4, 300 sec: 15703.7). Total num frames: 331902976. Throughput: 0: 3970.8. Samples: 72135912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:18,968][134211] Avg episode reward: [(0, '7.901')] [2025-01-04 03:14:19,504][134294] Updated weights for policy 0, policy_version 81034 (0.0014) [2025-01-04 03:14:21,476][134294] Updated weights for policy 0, policy_version 81044 (0.0017) [2025-01-04 03:14:23,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15906.1, 300 sec: 15703.6). Total num frames: 331984896. Throughput: 0: 3901.0. Samples: 72164698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:23,969][134211] Avg episode reward: [(0, '7.083')] [2025-01-04 03:14:24,702][134294] Updated weights for policy 0, policy_version 81054 (0.0028) [2025-01-04 03:14:27,922][134294] Updated weights for policy 0, policy_version 81064 (0.0025) [2025-01-04 03:14:28,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15837.9, 300 sec: 15620.3). Total num frames: 332050432. Throughput: 0: 3831.6. Samples: 72183662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:28,969][134211] Avg episode reward: [(0, '7.993')] [2025-01-04 03:14:30,950][134294] Updated weights for policy 0, policy_version 81074 (0.0025) [2025-01-04 03:14:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15359.9, 300 sec: 15634.2). Total num frames: 332115968. Throughput: 0: 3846.0. Samples: 72193814. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:33,968][134211] Avg episode reward: [(0, '7.404')] [2025-01-04 03:14:34,001][134294] Updated weights for policy 0, policy_version 81084 (0.0024) [2025-01-04 03:14:37,180][134294] Updated weights for policy 0, policy_version 81094 (0.0025) [2025-01-04 03:14:38,967][134211] Fps is (10 sec: 14336.4, 60 sec: 15018.8, 300 sec: 15564.8). Total num frames: 332193792. Throughput: 0: 3839.4. Samples: 72213458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:38,968][134211] Avg episode reward: [(0, '7.231')] [2025-01-04 03:14:39,218][134294] Updated weights for policy 0, policy_version 81104 (0.0012) [2025-01-04 03:14:41,162][134294] Updated weights for policy 0, policy_version 81114 (0.0013) [2025-01-04 03:14:43,044][134294] Updated weights for policy 0, policy_version 81124 (0.0013) [2025-01-04 03:14:43,968][134211] Fps is (10 sec: 18842.1, 60 sec: 15769.7, 300 sec: 15592.6). Total num frames: 332304384. Throughput: 0: 4002.8. Samples: 72245000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:43,968][134211] Avg episode reward: [(0, '7.317')] [2025-01-04 03:14:44,956][134294] Updated weights for policy 0, policy_version 81134 (0.0013) [2025-01-04 03:14:46,849][134294] Updated weights for policy 0, policy_version 81144 (0.0013) [2025-01-04 03:14:48,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16315.8, 300 sec: 15703.7). Total num frames: 332402688. Throughput: 0: 3999.9. Samples: 72261220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:48,968][134211] Avg episode reward: [(0, '7.293')] [2025-01-04 03:14:48,990][134294] Updated weights for policy 0, policy_version 81154 (0.0017) [2025-01-04 03:14:52,352][134294] Updated weights for policy 0, policy_version 81164 (0.0029) [2025-01-04 03:14:53,970][134211] Fps is (10 sec: 15970.5, 60 sec: 16179.1, 300 sec: 15689.6). Total num frames: 332464128. Throughput: 0: 4000.7. Samples: 72284590. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:53,971][134211] Avg episode reward: [(0, '7.684')] [2025-01-04 03:14:55,640][134294] Updated weights for policy 0, policy_version 81174 (0.0030) [2025-01-04 03:14:58,738][134294] Updated weights for policy 0, policy_version 81184 (0.0023) [2025-01-04 03:14:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15769.6, 300 sec: 15703.6). Total num frames: 332529664. Throughput: 0: 3997.0. Samples: 72303792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:14:58,968][134211] Avg episode reward: [(0, '7.849')] [2025-01-04 03:15:01,818][134294] Updated weights for policy 0, policy_version 81194 (0.0025) [2025-01-04 03:15:03,968][134211] Fps is (10 sec: 13109.4, 60 sec: 15223.3, 300 sec: 15564.8). Total num frames: 332595200. Throughput: 0: 3951.7. Samples: 72313742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:03,969][134211] Avg episode reward: [(0, '7.058')] [2025-01-04 03:15:05,029][134294] Updated weights for policy 0, policy_version 81204 (0.0027) [2025-01-04 03:15:08,101][134294] Updated weights for policy 0, policy_version 81214 (0.0024) [2025-01-04 03:15:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.5, 300 sec: 15453.7). Total num frames: 332660736. Throughput: 0: 3748.9. Samples: 72333398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:08,968][134211] Avg episode reward: [(0, '7.699')] [2025-01-04 03:15:11,043][134294] Updated weights for policy 0, policy_version 81224 (0.0024) [2025-01-04 03:15:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15359.8, 300 sec: 15509.2). Total num frames: 332730368. Throughput: 0: 3775.6. Samples: 72353564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:13,969][134211] Avg episode reward: [(0, '7.565')] [2025-01-04 03:15:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000081233_332730368.pth... 
[2025-01-04 03:15:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000080331_329035776.pth [2025-01-04 03:15:14,186][134294] Updated weights for policy 0, policy_version 81234 (0.0023) [2025-01-04 03:15:17,043][134294] Updated weights for policy 0, policy_version 81244 (0.0022) [2025-01-04 03:15:18,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15155.2, 300 sec: 15578.8). Total num frames: 332812288. Throughput: 0: 3768.7. Samples: 72363404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:18,968][134211] Avg episode reward: [(0, '6.955')] [2025-01-04 03:15:19,015][134294] Updated weights for policy 0, policy_version 81254 (0.0011) [2025-01-04 03:15:20,946][134294] Updated weights for policy 0, policy_version 81264 (0.0016) [2025-01-04 03:15:23,779][134294] Updated weights for policy 0, policy_version 81274 (0.0023) [2025-01-04 03:15:23,969][134211] Fps is (10 sec: 16793.2, 60 sec: 15223.3, 300 sec: 15537.0). Total num frames: 332898304. Throughput: 0: 3972.7. Samples: 72392234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:23,969][134211] Avg episode reward: [(0, '7.509')] [2025-01-04 03:15:26,827][134294] Updated weights for policy 0, policy_version 81284 (0.0023) [2025-01-04 03:15:28,967][134211] Fps is (10 sec: 15155.4, 60 sec: 15223.6, 300 sec: 15439.9). Total num frames: 332963840. Throughput: 0: 3716.1. Samples: 72412224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:28,968][134211] Avg episode reward: [(0, '7.877')] [2025-01-04 03:15:29,622][134294] Updated weights for policy 0, policy_version 81294 (0.0021) [2025-01-04 03:15:31,475][134294] Updated weights for policy 0, policy_version 81304 (0.0013) [2025-01-04 03:15:33,522][134294] Updated weights for policy 0, policy_version 81314 (0.0017) [2025-01-04 03:15:33,968][134211] Fps is (10 sec: 16794.9, 60 sec: 15837.9, 300 sec: 15550.9). Total num frames: 333066240. 
Throughput: 0: 3680.5. Samples: 72426844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:33,968][134211] Avg episode reward: [(0, '7.156')] [2025-01-04 03:15:36,528][134294] Updated weights for policy 0, policy_version 81324 (0.0028) [2025-01-04 03:15:38,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15633.0, 300 sec: 15550.9). Total num frames: 333131776. Throughput: 0: 3698.4. Samples: 72451010. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:38,968][134211] Avg episode reward: [(0, '8.008')] [2025-01-04 03:15:39,745][134294] Updated weights for policy 0, policy_version 81334 (0.0022) [2025-01-04 03:15:42,644][134294] Updated weights for policy 0, policy_version 81344 (0.0021) [2025-01-04 03:15:43,968][134211] Fps is (10 sec: 14335.4, 60 sec: 15086.8, 300 sec: 15606.4). Total num frames: 333209600. Throughput: 0: 3746.2. Samples: 72472374. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:43,969][134211] Avg episode reward: [(0, '7.396')] [2025-01-04 03:15:44,679][134294] Updated weights for policy 0, policy_version 81354 (0.0014) [2025-01-04 03:15:46,645][134294] Updated weights for policy 0, policy_version 81364 (0.0015) [2025-01-04 03:15:48,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14882.1, 300 sec: 15648.1). Total num frames: 333295616. Throughput: 0: 3868.8. Samples: 72487838. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:48,968][134211] Avg episode reward: [(0, '7.472')] [2025-01-04 03:15:49,686][134294] Updated weights for policy 0, policy_version 81374 (0.0025) [2025-01-04 03:15:53,084][134294] Updated weights for policy 0, policy_version 81384 (0.0025) [2025-01-04 03:15:53,968][134211] Fps is (10 sec: 14746.2, 60 sec: 14882.7, 300 sec: 15550.9). Total num frames: 333357056. Throughput: 0: 3876.3. Samples: 72507832. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:53,968][134211] Avg episode reward: [(0, '7.309')] [2025-01-04 03:15:56,152][134294] Updated weights for policy 0, policy_version 81394 (0.0020) [2025-01-04 03:15:58,096][134294] Updated weights for policy 0, policy_version 81404 (0.0013) [2025-01-04 03:15:58,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15291.8, 300 sec: 15481.5). Total num frames: 333447168. Throughput: 0: 3950.3. Samples: 72531324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:15:58,968][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 03:15:59,974][134294] Updated weights for policy 0, policy_version 81414 (0.0012) [2025-01-04 03:16:01,847][134294] Updated weights for policy 0, policy_version 81424 (0.0014) [2025-01-04 03:16:03,854][134294] Updated weights for policy 0, policy_version 81434 (0.0015) [2025-01-04 03:16:03,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15974.6, 300 sec: 15537.0). Total num frames: 333553664. Throughput: 0: 4094.5. Samples: 72547658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:16:03,968][134211] Avg episode reward: [(0, '8.037')] [2025-01-04 03:16:06,822][134294] Updated weights for policy 0, policy_version 81444 (0.0026) [2025-01-04 03:16:08,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15974.4, 300 sec: 15537.1). Total num frames: 333619200. Throughput: 0: 4016.5. Samples: 72572974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:16:08,968][134211] Avg episode reward: [(0, '7.229')] [2025-01-04 03:16:10,244][134294] Updated weights for policy 0, policy_version 81454 (0.0028) [2025-01-04 03:16:13,642][134294] Updated weights for policy 0, policy_version 81464 (0.0025) [2025-01-04 03:16:13,970][134211] Fps is (10 sec: 12285.1, 60 sec: 15769.2, 300 sec: 15523.1). Total num frames: 333676544. Throughput: 0: 3974.8. Samples: 72591098. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:16:13,971][134211] Avg episode reward: [(0, '7.896')] [2025-01-04 03:16:17,191][134294] Updated weights for policy 0, policy_version 81474 (0.0023) [2025-01-04 03:16:18,968][134211] Fps is (10 sec: 11468.9, 60 sec: 15360.0, 300 sec: 15495.4). Total num frames: 333733888. Throughput: 0: 3838.2. Samples: 72599564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:18,969][134211] Avg episode reward: [(0, '7.642')] [2025-01-04 03:16:20,653][134294] Updated weights for policy 0, policy_version 81484 (0.0026) [2025-01-04 03:16:22,767][134294] Updated weights for policy 0, policy_version 81494 (0.0014) [2025-01-04 03:16:23,968][134211] Fps is (10 sec: 14749.0, 60 sec: 15428.5, 300 sec: 15564.8). Total num frames: 333824000. Throughput: 0: 3753.2. Samples: 72619904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:23,968][134211] Avg episode reward: [(0, '7.672')] [2025-01-04 03:16:24,732][134294] Updated weights for policy 0, policy_version 81504 (0.0014) [2025-01-04 03:16:26,593][134294] Updated weights for policy 0, policy_version 81514 (0.0013) [2025-01-04 03:16:28,499][134294] Updated weights for policy 0, policy_version 81524 (0.0014) [2025-01-04 03:16:28,968][134211] Fps is (10 sec: 19660.9, 60 sec: 16110.9, 300 sec: 15675.9). Total num frames: 333930496. Throughput: 0: 3992.3. Samples: 72652026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:28,968][134211] Avg episode reward: [(0, '7.658')] [2025-01-04 03:16:30,386][134294] Updated weights for policy 0, policy_version 81534 (0.0013) [2025-01-04 03:16:32,372][134294] Updated weights for policy 0, policy_version 81544 (0.0014) [2025-01-04 03:16:33,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15974.4, 300 sec: 15634.2). Total num frames: 334024704. Throughput: 0: 4012.8. Samples: 72668414. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:33,969][134211] Avg episode reward: [(0, '8.227')] [2025-01-04 03:16:35,431][134294] Updated weights for policy 0, policy_version 81554 (0.0028) [2025-01-04 03:16:38,409][134294] Updated weights for policy 0, policy_version 81564 (0.0026) [2025-01-04 03:16:38,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15974.4, 300 sec: 15509.3). Total num frames: 334090240. Throughput: 0: 4062.8. Samples: 72690656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:38,969][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 03:16:41,729][134294] Updated weights for policy 0, policy_version 81574 (0.0027) [2025-01-04 03:16:43,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15769.7, 300 sec: 15509.3). Total num frames: 334155776. Throughput: 0: 3965.3. Samples: 72709764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:43,968][134211] Avg episode reward: [(0, '7.703')] [2025-01-04 03:16:44,959][134294] Updated weights for policy 0, policy_version 81584 (0.0026) [2025-01-04 03:16:48,143][134294] Updated weights for policy 0, policy_version 81594 (0.0025) [2025-01-04 03:16:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15360.0, 300 sec: 15509.4). Total num frames: 334217216. Throughput: 0: 3801.5. Samples: 72718724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:48,968][134211] Avg episode reward: [(0, '8.041')] [2025-01-04 03:16:51,240][134294] Updated weights for policy 0, policy_version 81604 (0.0026) [2025-01-04 03:16:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15428.3, 300 sec: 15523.1). Total num frames: 334282752. Throughput: 0: 3687.1. Samples: 72738892. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:53,969][134211] Avg episode reward: [(0, '6.981')] [2025-01-04 03:16:54,442][134294] Updated weights for policy 0, policy_version 81614 (0.0025) [2025-01-04 03:16:57,784][134294] Updated weights for policy 0, policy_version 81624 (0.0023) [2025-01-04 03:16:58,967][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.2, 300 sec: 15481.5). Total num frames: 334356480. Throughput: 0: 3712.6. Samples: 72758158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:16:58,968][134211] Avg episode reward: [(0, '7.125')] [2025-01-04 03:16:59,735][134294] Updated weights for policy 0, policy_version 81634 (0.0013) [2025-01-04 03:17:01,660][134294] Updated weights for policy 0, policy_version 81644 (0.0014) [2025-01-04 03:17:03,575][134294] Updated weights for policy 0, policy_version 81654 (0.0014) [2025-01-04 03:17:03,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15155.2, 300 sec: 15592.6). Total num frames: 334462976. Throughput: 0: 3878.7. Samples: 72774106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:17:03,968][134211] Avg episode reward: [(0, '6.931')] [2025-01-04 03:17:05,439][134294] Updated weights for policy 0, policy_version 81664 (0.0013) [2025-01-04 03:17:07,438][134294] Updated weights for policy 0, policy_version 81674 (0.0017) [2025-01-04 03:17:08,968][134211] Fps is (10 sec: 19660.5, 60 sec: 15564.8, 300 sec: 15550.9). Total num frames: 334553088. Throughput: 0: 4142.1. Samples: 72806298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:08,968][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 03:17:10,821][134294] Updated weights for policy 0, policy_version 81684 (0.0025) [2025-01-04 03:17:13,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15633.6, 300 sec: 15550.9). Total num frames: 334614528. Throughput: 0: 3833.9. Samples: 72824554. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:13,969][134211] Avg episode reward: [(0, '7.537')] [2025-01-04 03:17:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000081693_334614528.pth... [2025-01-04 03:17:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000080781_330878976.pth [2025-01-04 03:17:14,369][134294] Updated weights for policy 0, policy_version 81694 (0.0026) [2025-01-04 03:17:17,532][134294] Updated weights for policy 0, policy_version 81704 (0.0028) [2025-01-04 03:17:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15701.3, 300 sec: 15495.4). Total num frames: 334675968. Throughput: 0: 3672.1. Samples: 72833658. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:18,968][134211] Avg episode reward: [(0, '8.036')] [2025-01-04 03:17:20,551][134294] Updated weights for policy 0, policy_version 81714 (0.0026) [2025-01-04 03:17:23,601][134294] Updated weights for policy 0, policy_version 81724 (0.0027) [2025-01-04 03:17:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15359.9, 300 sec: 15370.4). Total num frames: 334745600. Throughput: 0: 3626.8. Samples: 72853860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:23,969][134211] Avg episode reward: [(0, '8.685')] [2025-01-04 03:17:26,561][134294] Updated weights for policy 0, policy_version 81734 (0.0025) [2025-01-04 03:17:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.3, 300 sec: 15356.5). Total num frames: 334811136. Throughput: 0: 3646.8. Samples: 72873870. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:28,968][134211] Avg episode reward: [(0, '7.134')] [2025-01-04 03:17:29,783][134294] Updated weights for policy 0, policy_version 81744 (0.0023) [2025-01-04 03:17:32,234][134294] Updated weights for policy 0, policy_version 81754 (0.0019) [2025-01-04 03:17:33,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14609.1, 300 sec: 15467.6). Total num frames: 334901248. Throughput: 0: 3668.8. Samples: 72883820. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:33,968][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 03:17:34,145][134294] Updated weights for policy 0, policy_version 81764 (0.0014) [2025-01-04 03:17:35,998][134294] Updated weights for policy 0, policy_version 81774 (0.0014) [2025-01-04 03:17:37,864][134294] Updated weights for policy 0, policy_version 81784 (0.0014) [2025-01-04 03:17:38,968][134211] Fps is (10 sec: 18841.9, 60 sec: 15155.3, 300 sec: 15592.6). Total num frames: 334999552. Throughput: 0: 3940.1. Samples: 72916194. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:38,968][134211] Avg episode reward: [(0, '7.708')] [2025-01-04 03:17:40,815][134294] Updated weights for policy 0, policy_version 81794 (0.0024) [2025-01-04 03:17:43,938][134294] Updated weights for policy 0, policy_version 81804 (0.0026) [2025-01-04 03:17:43,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15223.5, 300 sec: 15495.4). Total num frames: 335069184. Throughput: 0: 3999.9. Samples: 72938156. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:43,968][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 03:17:47,234][134294] Updated weights for policy 0, policy_version 81814 (0.0030) [2025-01-04 03:17:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15155.2, 300 sec: 15328.8). Total num frames: 335126528. Throughput: 0: 3847.6. Samples: 72947246. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:48,968][134211] Avg episode reward: [(0, '7.252')] [2025-01-04 03:17:50,409][134294] Updated weights for policy 0, policy_version 81824 (0.0024) [2025-01-04 03:17:52,455][134294] Updated weights for policy 0, policy_version 81834 (0.0013) [2025-01-04 03:17:53,967][134211] Fps is (10 sec: 15155.5, 60 sec: 15633.2, 300 sec: 15370.4). Total num frames: 335220736. Throughput: 0: 3634.2. Samples: 72969836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:17:53,968][134211] Avg episode reward: [(0, '7.500')] [2025-01-04 03:17:54,360][134294] Updated weights for policy 0, policy_version 81844 (0.0013) [2025-01-04 03:17:56,262][134294] Updated weights for policy 0, policy_version 81854 (0.0015) [2025-01-04 03:17:58,181][134294] Updated weights for policy 0, policy_version 81864 (0.0014) [2025-01-04 03:17:58,968][134211] Fps is (10 sec: 20070.3, 60 sec: 16179.2, 300 sec: 15509.3). Total num frames: 335327232. Throughput: 0: 3942.7. Samples: 73001976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:17:58,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 03:18:01,041][134294] Updated weights for policy 0, policy_version 81874 (0.0025) [2025-01-04 03:18:03,968][134211] Fps is (10 sec: 16793.0, 60 sec: 15428.2, 300 sec: 15509.3). Total num frames: 335388672. Throughput: 0: 3975.4. Samples: 73012552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:03,969][134211] Avg episode reward: [(0, '7.702')] [2025-01-04 03:18:04,495][134294] Updated weights for policy 0, policy_version 81884 (0.0025) [2025-01-04 03:18:07,572][134294] Updated weights for policy 0, policy_version 81894 (0.0024) [2025-01-04 03:18:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15018.7, 300 sec: 15509.3). Total num frames: 335454208. Throughput: 0: 3943.7. Samples: 73031324. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:08,968][134211] Avg episode reward: [(0, '7.842')] [2025-01-04 03:18:10,701][134294] Updated weights for policy 0, policy_version 81904 (0.0024) [2025-01-04 03:18:13,710][134294] Updated weights for policy 0, policy_version 81914 (0.0026) [2025-01-04 03:18:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15087.0, 300 sec: 15509.3). Total num frames: 335519744. Throughput: 0: 3942.8. Samples: 73051296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:13,968][134211] Avg episode reward: [(0, '7.511')] [2025-01-04 03:18:17,102][134294] Updated weights for policy 0, policy_version 81924 (0.0024) [2025-01-04 03:18:18,969][134211] Fps is (10 sec: 13924.5, 60 sec: 15291.4, 300 sec: 15467.5). Total num frames: 335593472. Throughput: 0: 3929.1. Samples: 73060636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:18,969][134211] Avg episode reward: [(0, '8.305')] [2025-01-04 03:18:19,299][134294] Updated weights for policy 0, policy_version 81934 (0.0012) [2025-01-04 03:18:21,360][134294] Updated weights for policy 0, policy_version 81944 (0.0014) [2025-01-04 03:18:23,264][134294] Updated weights for policy 0, policy_version 81954 (0.0014) [2025-01-04 03:18:23,967][134211] Fps is (10 sec: 17613.2, 60 sec: 15838.0, 300 sec: 15578.7). Total num frames: 335695872. Throughput: 0: 3819.8. Samples: 73088084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:23,968][134211] Avg episode reward: [(0, '7.241')] [2025-01-04 03:18:25,180][134294] Updated weights for policy 0, policy_version 81964 (0.0012) [2025-01-04 03:18:27,053][134294] Updated weights for policy 0, policy_version 81974 (0.0015) [2025-01-04 03:18:28,968][134211] Fps is (10 sec: 20482.6, 60 sec: 16452.3, 300 sec: 15606.4). Total num frames: 335798272. Throughput: 0: 4053.9. Samples: 73120580. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:28,968][134211] Avg episode reward: [(0, '8.017')] [2025-01-04 03:18:29,277][134294] Updated weights for policy 0, policy_version 81984 (0.0020) [2025-01-04 03:18:32,373][134294] Updated weights for policy 0, policy_version 81994 (0.0024) [2025-01-04 03:18:33,968][134211] Fps is (10 sec: 16793.1, 60 sec: 16042.6, 300 sec: 15495.4). Total num frames: 335863808. Throughput: 0: 4084.1. Samples: 73131030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:33,968][134211] Avg episode reward: [(0, '7.071')] [2025-01-04 03:18:35,656][134294] Updated weights for policy 0, policy_version 82004 (0.0026) [2025-01-04 03:18:38,807][134294] Updated weights for policy 0, policy_version 82014 (0.0026) [2025-01-04 03:18:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15496.5, 300 sec: 15495.4). Total num frames: 335929344. Throughput: 0: 4015.0. Samples: 73150512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:38,969][134211] Avg episode reward: [(0, '7.386')] [2025-01-04 03:18:41,792][134294] Updated weights for policy 0, policy_version 82024 (0.0025) [2025-01-04 03:18:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15428.2, 300 sec: 15495.4). Total num frames: 335994880. Throughput: 0: 3737.7. Samples: 73170174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:18:43,968][134211] Avg episode reward: [(0, '7.936')] [2025-01-04 03:18:45,048][134294] Updated weights for policy 0, policy_version 82034 (0.0029) [2025-01-04 03:18:47,899][134294] Updated weights for policy 0, policy_version 82044 (0.0025) [2025-01-04 03:18:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15633.0, 300 sec: 15495.5). Total num frames: 336064512. Throughput: 0: 3720.4. Samples: 73179970. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:18:48,968][134211] Avg episode reward: [(0, '6.930')] [2025-01-04 03:18:50,978][134294] Updated weights for policy 0, policy_version 82054 (0.0022) [2025-01-04 03:18:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15155.2, 300 sec: 15412.1). Total num frames: 336130048. Throughput: 0: 3765.8. Samples: 73200786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:18:53,968][134211] Avg episode reward: [(0, '7.163')] [2025-01-04 03:18:53,980][134294] Updated weights for policy 0, policy_version 82064 (0.0025) [2025-01-04 03:18:56,982][134294] Updated weights for policy 0, policy_version 82074 (0.0025) [2025-01-04 03:18:58,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14745.6, 300 sec: 15356.5). Total num frames: 336211968. Throughput: 0: 3820.6. Samples: 73223222. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:18:58,968][134211] Avg episode reward: [(0, '6.969')] [2025-01-04 03:18:58,983][134294] Updated weights for policy 0, policy_version 82084 (0.0015) [2025-01-04 03:19:01,673][134294] Updated weights for policy 0, policy_version 82094 (0.0025) [2025-01-04 03:19:03,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14950.4, 300 sec: 15384.3). Total num frames: 336285696. Throughput: 0: 3885.9. Samples: 73235496. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:03,968][134211] Avg episode reward: [(0, '7.137')] [2025-01-04 03:19:04,760][134294] Updated weights for policy 0, policy_version 82104 (0.0027) [2025-01-04 03:19:07,152][134294] Updated weights for policy 0, policy_version 82114 (0.0019) [2025-01-04 03:19:08,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15360.0, 300 sec: 15481.5). Total num frames: 336375808. Throughput: 0: 3771.4. Samples: 73257796. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:08,968][134211] Avg episode reward: [(0, '7.783')] [2025-01-04 03:19:09,122][134294] Updated weights for policy 0, policy_version 82124 (0.0013) [2025-01-04 03:19:11,009][134294] Updated weights for policy 0, policy_version 82134 (0.0015) [2025-01-04 03:19:12,903][134294] Updated weights for policy 0, policy_version 82144 (0.0012) [2025-01-04 03:19:13,967][134211] Fps is (10 sec: 19661.1, 60 sec: 16042.7, 300 sec: 15523.1). Total num frames: 336482304. Throughput: 0: 3769.7. Samples: 73290214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:13,968][134211] Avg episode reward: [(0, '7.314')] [2025-01-04 03:19:14,035][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000082150_336486400.pth... [2025-01-04 03:19:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000081233_332730368.pth [2025-01-04 03:19:14,791][134294] Updated weights for policy 0, policy_version 82154 (0.0012) [2025-01-04 03:19:16,766][134294] Updated weights for policy 0, policy_version 82164 (0.0013) [2025-01-04 03:19:18,969][134211] Fps is (10 sec: 19658.7, 60 sec: 16315.8, 300 sec: 15550.9). Total num frames: 336572416. Throughput: 0: 3894.0. Samples: 73306264. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:18,969][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 03:19:19,671][134294] Updated weights for policy 0, policy_version 82174 (0.0023) [2025-01-04 03:19:22,926][134294] Updated weights for policy 0, policy_version 82184 (0.0032) [2025-01-04 03:19:23,968][134211] Fps is (10 sec: 15563.7, 60 sec: 15701.2, 300 sec: 15550.9). Total num frames: 336637952. Throughput: 0: 3926.5. Samples: 73327204. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:23,969][134211] Avg episode reward: [(0, '6.957')] [2025-01-04 03:19:26,095][134294] Updated weights for policy 0, policy_version 82194 (0.0027) [2025-01-04 03:19:28,968][134211] Fps is (10 sec: 13108.1, 60 sec: 15086.9, 300 sec: 15550.9). Total num frames: 336703488. Throughput: 0: 3919.8. Samples: 73346566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:28,969][134211] Avg episode reward: [(0, '6.695')] [2025-01-04 03:19:29,368][134294] Updated weights for policy 0, policy_version 82204 (0.0029) [2025-01-04 03:19:32,306][134294] Updated weights for policy 0, policy_version 82214 (0.0025) [2025-01-04 03:19:33,975][134211] Fps is (10 sec: 13098.5, 60 sec: 15085.2, 300 sec: 15508.9). Total num frames: 336769024. Throughput: 0: 3921.3. Samples: 73356458. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:19:33,976][134211] Avg episode reward: [(0, '7.147')] [2025-01-04 03:19:35,409][134294] Updated weights for policy 0, policy_version 82224 (0.0027) [2025-01-04 03:19:38,273][134294] Updated weights for policy 0, policy_version 82234 (0.0026) [2025-01-04 03:19:38,970][134211] Fps is (10 sec: 13514.3, 60 sec: 15154.7, 300 sec: 15370.3). Total num frames: 336838656. Throughput: 0: 3913.9. Samples: 73376918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:19:38,970][134211] Avg episode reward: [(0, '7.490')] [2025-01-04 03:19:41,171][134294] Updated weights for policy 0, policy_version 82244 (0.0022) [2025-01-04 03:19:43,136][134294] Updated weights for policy 0, policy_version 82254 (0.0014) [2025-01-04 03:19:43,968][134211] Fps is (10 sec: 15986.0, 60 sec: 15564.9, 300 sec: 15342.6). Total num frames: 336928768. Throughput: 0: 3962.4. Samples: 73401530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:19:43,968][134211] Avg episode reward: [(0, '6.899')] [2025-01-04 03:19:45,076][134294] Updated weights for policy 0, policy_version 82264 (0.0014) [2025-01-04 03:19:47,978][134294] Updated weights for policy 0, policy_version 82274 (0.0023) [2025-01-04 03:19:48,968][134211] Fps is (10 sec: 16797.0, 60 sec: 15701.3, 300 sec: 15398.3). Total num frames: 337006592. Throughput: 0: 4008.3. Samples: 73415868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:19:48,968][134211] Avg episode reward: [(0, '7.865')] [2025-01-04 03:19:51,024][134294] Updated weights for policy 0, policy_version 82284 (0.0023) [2025-01-04 03:19:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15701.3, 300 sec: 15398.2). Total num frames: 337072128. Throughput: 0: 3954.6. Samples: 73435754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:19:53,968][134211] Avg episode reward: [(0, '7.748')] [2025-01-04 03:19:54,034][134294] Updated weights for policy 0, policy_version 82294 (0.0021) [2025-01-04 03:19:55,976][134294] Updated weights for policy 0, policy_version 82304 (0.0013) [2025-01-04 03:19:57,856][134294] Updated weights for policy 0, policy_version 82314 (0.0014) [2025-01-04 03:19:58,967][134211] Fps is (10 sec: 17203.6, 60 sec: 16111.0, 300 sec: 15537.1). Total num frames: 337178624. Throughput: 0: 3869.6. Samples: 73464348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:19:58,968][134211] Avg episode reward: [(0, '7.331')] [2025-01-04 03:19:59,781][134294] Updated weights for policy 0, policy_version 82324 (0.0014) [2025-01-04 03:20:01,672][134294] Updated weights for policy 0, policy_version 82334 (0.0014) [2025-01-04 03:20:03,562][134294] Updated weights for policy 0, policy_version 82344 (0.0014) [2025-01-04 03:20:03,968][134211] Fps is (10 sec: 21709.0, 60 sec: 16725.4, 300 sec: 15689.8). Total num frames: 337289216. Throughput: 0: 3875.3. Samples: 73480650. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:03,968][134211] Avg episode reward: [(0, '7.447')] [2025-01-04 03:20:05,470][134294] Updated weights for policy 0, policy_version 82354 (0.0014) [2025-01-04 03:20:08,424][134294] Updated weights for policy 0, policy_version 82364 (0.0025) [2025-01-04 03:20:08,968][134211] Fps is (10 sec: 18841.0, 60 sec: 16520.5, 300 sec: 15717.6). Total num frames: 337367040. Throughput: 0: 4059.9. Samples: 73509900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:08,968][134211] Avg episode reward: [(0, '7.143')] [2025-01-04 03:20:11,753][134294] Updated weights for policy 0, policy_version 82374 (0.0027) [2025-01-04 03:20:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15769.6, 300 sec: 15648.1). Total num frames: 337428480. Throughput: 0: 4044.7. Samples: 73528576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:13,968][134211] Avg episode reward: [(0, '7.119')] [2025-01-04 03:20:14,997][134294] Updated weights for policy 0, policy_version 82384 (0.0026) [2025-01-04 03:20:18,566][134294] Updated weights for policy 0, policy_version 82394 (0.0024) [2025-01-04 03:20:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15223.7, 300 sec: 15551.0). Total num frames: 337485824. Throughput: 0: 4025.9. Samples: 73537596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:18,968][134211] Avg episode reward: [(0, '7.769')] [2025-01-04 03:20:22,077][134294] Updated weights for policy 0, policy_version 82404 (0.0026) [2025-01-04 03:20:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15155.3, 300 sec: 15537.0). Total num frames: 337547264. Throughput: 0: 3961.9. Samples: 73555196. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:23,968][134211] Avg episode reward: [(0, '7.088')] [2025-01-04 03:20:25,377][134294] Updated weights for policy 0, policy_version 82414 (0.0025) [2025-01-04 03:20:28,412][134294] Updated weights for policy 0, policy_version 82424 (0.0025) [2025-01-04 03:20:28,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15223.6, 300 sec: 15426.0). Total num frames: 337616896. Throughput: 0: 3847.0. Samples: 73574646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:28,968][134211] Avg episode reward: [(0, '7.650')] [2025-01-04 03:20:30,393][134294] Updated weights for policy 0, policy_version 82434 (0.0014) [2025-01-04 03:20:32,319][134294] Updated weights for policy 0, policy_version 82444 (0.0014) [2025-01-04 03:20:33,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15839.8, 300 sec: 15550.9). Total num frames: 337719296. Throughput: 0: 3862.5. Samples: 73589680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:33,968][134211] Avg episode reward: [(0, '7.577')] [2025-01-04 03:20:34,728][134294] Updated weights for policy 0, policy_version 82454 (0.0019) [2025-01-04 03:20:37,739][134294] Updated weights for policy 0, policy_version 82464 (0.0025) [2025-01-04 03:20:38,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15770.1, 300 sec: 15509.3). Total num frames: 337784832. Throughput: 0: 3973.4. Samples: 73614558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:38,968][134211] Avg episode reward: [(0, '7.955')] [2025-01-04 03:20:40,777][134294] Updated weights for policy 0, policy_version 82474 (0.0027) [2025-01-04 03:20:43,740][134294] Updated weights for policy 0, policy_version 82484 (0.0024) [2025-01-04 03:20:43,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15428.2, 300 sec: 15453.7). Total num frames: 337854464. Throughput: 0: 3793.8. Samples: 73635070. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:43,968][134211] Avg episode reward: [(0, '6.937')] [2025-01-04 03:20:46,501][134294] Updated weights for policy 0, policy_version 82494 (0.0022) [2025-01-04 03:20:48,360][134294] Updated weights for policy 0, policy_version 82504 (0.0012) [2025-01-04 03:20:48,968][134211] Fps is (10 sec: 16384.4, 60 sec: 15701.4, 300 sec: 15564.8). Total num frames: 337948672. Throughput: 0: 3664.7. Samples: 73645562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:48,968][134211] Avg episode reward: [(0, '7.873')] [2025-01-04 03:20:50,244][134294] Updated weights for policy 0, policy_version 82514 (0.0012) [2025-01-04 03:20:52,109][134294] Updated weights for policy 0, policy_version 82524 (0.0014) [2025-01-04 03:20:53,955][134294] Updated weights for policy 0, policy_version 82534 (0.0013) [2025-01-04 03:20:53,968][134211] Fps is (10 sec: 20480.5, 60 sec: 16452.3, 300 sec: 15634.2). Total num frames: 338059264. Throughput: 0: 3741.8. Samples: 73678280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:53,968][134211] Avg episode reward: [(0, '7.875')] [2025-01-04 03:20:55,854][134294] Updated weights for policy 0, policy_version 82544 (0.0013) [2025-01-04 03:20:57,846][134294] Updated weights for policy 0, policy_version 82554 (0.0015) [2025-01-04 03:20:58,968][134211] Fps is (10 sec: 20479.4, 60 sec: 16247.4, 300 sec: 15592.6). Total num frames: 338153472. Throughput: 0: 4024.3. Samples: 73709670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:20:58,969][134211] Avg episode reward: [(0, '7.285')] [2025-01-04 03:21:00,867][134294] Updated weights for policy 0, policy_version 82564 (0.0028) [2025-01-04 03:21:03,968][134211] Fps is (10 sec: 15973.5, 60 sec: 15496.4, 300 sec: 15592.6). Total num frames: 338219008. Throughput: 0: 4045.7. Samples: 73719654. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:03,969][134211] Avg episode reward: [(0, '7.868')] [2025-01-04 03:21:04,117][134294] Updated weights for policy 0, policy_version 82574 (0.0025) [2025-01-04 03:21:07,306][134294] Updated weights for policy 0, policy_version 82584 (0.0025) [2025-01-04 03:21:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15291.8, 300 sec: 15620.5). Total num frames: 338284544. Throughput: 0: 4082.0. Samples: 73738888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:08,968][134211] Avg episode reward: [(0, '7.618')] [2025-01-04 03:21:10,415][134294] Updated weights for policy 0, policy_version 82594 (0.0026) [2025-01-04 03:21:13,427][134294] Updated weights for policy 0, policy_version 82604 (0.0024) [2025-01-04 03:21:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15359.9, 300 sec: 15648.1). Total num frames: 338350080. Throughput: 0: 4096.3. Samples: 73758982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:13,968][134211] Avg episode reward: [(0, '7.614')] [2025-01-04 03:21:14,055][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000082606_338354176.pth... [2025-01-04 03:21:14,129][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000081693_334614528.pth [2025-01-04 03:21:16,580][134294] Updated weights for policy 0, policy_version 82614 (0.0026) [2025-01-04 03:21:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.5, 300 sec: 15564.8). Total num frames: 338415616. Throughput: 0: 3977.8. Samples: 73768680. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:18,968][134211] Avg episode reward: [(0, '7.399')] [2025-01-04 03:21:19,683][134294] Updated weights for policy 0, policy_version 82624 (0.0025) [2025-01-04 03:21:22,708][134294] Updated weights for policy 0, policy_version 82634 (0.0026) [2025-01-04 03:21:23,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15632.9, 300 sec: 15439.8). Total num frames: 338485248. Throughput: 0: 3871.2. Samples: 73788762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:23,969][134211] Avg episode reward: [(0, '7.819')] [2025-01-04 03:21:25,627][134294] Updated weights for policy 0, policy_version 82644 (0.0025) [2025-01-04 03:21:28,221][134294] Updated weights for policy 0, policy_version 82654 (0.0020) [2025-01-04 03:21:28,967][134211] Fps is (10 sec: 14746.0, 60 sec: 15769.6, 300 sec: 15384.3). Total num frames: 338563072. Throughput: 0: 3901.7. Samples: 73810646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:28,968][134211] Avg episode reward: [(0, '7.508')] [2025-01-04 03:21:30,125][134294] Updated weights for policy 0, policy_version 82664 (0.0013) [2025-01-04 03:21:32,012][134294] Updated weights for policy 0, policy_version 82674 (0.0013) [2025-01-04 03:21:33,902][134294] Updated weights for policy 0, policy_version 82684 (0.0013) [2025-01-04 03:21:33,967][134211] Fps is (10 sec: 18842.9, 60 sec: 15906.2, 300 sec: 15537.0). Total num frames: 338673664. Throughput: 0: 4031.2. Samples: 73826966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:33,968][134211] Avg episode reward: [(0, '8.203')] [2025-01-04 03:21:35,896][134294] Updated weights for policy 0, policy_version 82694 (0.0020) [2025-01-04 03:21:38,953][134294] Updated weights for policy 0, policy_version 82704 (0.0023) [2025-01-04 03:21:38,968][134211] Fps is (10 sec: 19249.7, 60 sec: 16179.1, 300 sec: 15592.5). Total num frames: 338755584. Throughput: 0: 3962.1. Samples: 73856576. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:38,969][134211] Avg episode reward: [(0, '7.257')] [2025-01-04 03:21:42,114][134294] Updated weights for policy 0, policy_version 82714 (0.0024) [2025-01-04 03:21:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 16042.7, 300 sec: 15592.6). Total num frames: 338817024. Throughput: 0: 3686.5. Samples: 73875560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:43,968][134211] Avg episode reward: [(0, '7.913')] [2025-01-04 03:21:45,240][134294] Updated weights for policy 0, policy_version 82724 (0.0024) [2025-01-04 03:21:48,409][134294] Updated weights for policy 0, policy_version 82734 (0.0028) [2025-01-04 03:21:48,968][134211] Fps is (10 sec: 12698.4, 60 sec: 15564.8, 300 sec: 15592.6). Total num frames: 338882560. Throughput: 0: 3689.8. Samples: 73885692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:48,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 03:21:51,217][134294] Updated weights for policy 0, policy_version 82744 (0.0020) [2025-01-04 03:21:53,205][134294] Updated weights for policy 0, policy_version 82754 (0.0014) [2025-01-04 03:21:53,967][134211] Fps is (10 sec: 15974.7, 60 sec: 15291.8, 300 sec: 15662.0). Total num frames: 338976768. Throughput: 0: 3759.9. Samples: 73908084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:53,968][134211] Avg episode reward: [(0, '7.910')] [2025-01-04 03:21:55,190][134294] Updated weights for policy 0, policy_version 82764 (0.0013) [2025-01-04 03:21:57,098][134294] Updated weights for policy 0, policy_version 82774 (0.0014) [2025-01-04 03:21:58,967][134211] Fps is (10 sec: 19661.2, 60 sec: 15428.4, 300 sec: 15648.1). Total num frames: 339079168. Throughput: 0: 4007.3. Samples: 73939308. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:21:58,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 03:21:59,062][134294] Updated weights for policy 0, policy_version 82784 (0.0013) [2025-01-04 03:22:01,715][134294] Updated weights for policy 0, policy_version 82794 (0.0021) [2025-01-04 03:22:03,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15496.6, 300 sec: 15578.7). Total num frames: 339148800. Throughput: 0: 4080.2. Samples: 73952288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:22:03,968][134211] Avg episode reward: [(0, '7.452')] [2025-01-04 03:22:05,026][134294] Updated weights for policy 0, policy_version 82804 (0.0028) [2025-01-04 03:22:08,244][134294] Updated weights for policy 0, policy_version 82814 (0.0024) [2025-01-04 03:22:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15496.6, 300 sec: 15592.6). Total num frames: 339214336. Throughput: 0: 4058.4. Samples: 73971388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:08,968][134211] Avg episode reward: [(0, '6.833')] [2025-01-04 03:22:11,334][134294] Updated weights for policy 0, policy_version 82824 (0.0025) [2025-01-04 03:22:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15496.6, 300 sec: 15606.5). Total num frames: 339279872. Throughput: 0: 4002.2. Samples: 73990744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:13,968][134211] Avg episode reward: [(0, '7.382')] [2025-01-04 03:22:14,621][134294] Updated weights for policy 0, policy_version 82834 (0.0025) [2025-01-04 03:22:18,005][134294] Updated weights for policy 0, policy_version 82844 (0.0027) [2025-01-04 03:22:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15360.0, 300 sec: 15564.8). Total num frames: 339337216. Throughput: 0: 3851.8. Samples: 74000300. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:18,969][134211] Avg episode reward: [(0, '7.346')] [2025-01-04 03:22:20,800][134294] Updated weights for policy 0, policy_version 82854 (0.0019) [2025-01-04 03:22:22,848][134294] Updated weights for policy 0, policy_version 82864 (0.0014) [2025-01-04 03:22:23,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15769.8, 300 sec: 15662.0). Total num frames: 339431424. Throughput: 0: 3690.6. Samples: 74022652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:23,968][134211] Avg episode reward: [(0, '7.641')] [2025-01-04 03:22:24,809][134294] Updated weights for policy 0, policy_version 82874 (0.0015) [2025-01-04 03:22:26,679][134294] Updated weights for policy 0, policy_version 82884 (0.0013) [2025-01-04 03:22:28,968][134211] Fps is (10 sec: 18841.9, 60 sec: 16042.6, 300 sec: 15675.9). Total num frames: 339525632. Throughput: 0: 3933.1. Samples: 74052548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:28,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 03:22:29,406][134294] Updated weights for policy 0, policy_version 82894 (0.0023) [2025-01-04 03:22:32,708][134294] Updated weights for policy 0, policy_version 82904 (0.0028) [2025-01-04 03:22:33,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15223.4, 300 sec: 15550.9). Total num frames: 339587072. Throughput: 0: 3920.1. Samples: 74062098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:33,968][134211] Avg episode reward: [(0, '7.202')] [2025-01-04 03:22:35,800][134294] Updated weights for policy 0, policy_version 82914 (0.0027) [2025-01-04 03:22:38,925][134294] Updated weights for policy 0, policy_version 82924 (0.0027) [2025-01-04 03:22:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.8, 300 sec: 15550.9). Total num frames: 339656704. Throughput: 0: 3865.0. Samples: 74082010. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:38,968][134211] Avg episode reward: [(0, '8.001')] [2025-01-04 03:22:41,903][134294] Updated weights for policy 0, policy_version 82934 (0.0029) [2025-01-04 03:22:43,968][134211] Fps is (10 sec: 13516.0, 60 sec: 15086.7, 300 sec: 15578.6). Total num frames: 339722240. Throughput: 0: 3606.4. Samples: 74101600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:43,969][134211] Avg episode reward: [(0, '7.923')] [2025-01-04 03:22:45,031][134294] Updated weights for policy 0, policy_version 82944 (0.0028) [2025-01-04 03:22:47,481][134294] Updated weights for policy 0, policy_version 82954 (0.0017) [2025-01-04 03:22:48,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15360.0, 300 sec: 15537.0). Total num frames: 339804160. Throughput: 0: 3539.7. Samples: 74111576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:48,968][134211] Avg episode reward: [(0, '7.673')] [2025-01-04 03:22:49,823][134294] Updated weights for policy 0, policy_version 82964 (0.0019) [2025-01-04 03:22:52,766][134294] Updated weights for policy 0, policy_version 82974 (0.0026) [2025-01-04 03:22:53,968][134211] Fps is (10 sec: 15156.2, 60 sec: 14950.3, 300 sec: 15412.1). Total num frames: 339873792. Throughput: 0: 3672.1. Samples: 74136632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:22:53,968][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 03:22:55,386][134294] Updated weights for policy 0, policy_version 82984 (0.0020) [2025-01-04 03:22:57,346][134294] Updated weights for policy 0, policy_version 82994 (0.0013) [2025-01-04 03:22:58,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14950.4, 300 sec: 15550.9). Total num frames: 339976192. Throughput: 0: 3836.4. Samples: 74163382. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:22:58,968][134211] Avg episode reward: [(0, '8.085')] [2025-01-04 03:22:59,321][134294] Updated weights for policy 0, policy_version 83004 (0.0014) [2025-01-04 03:23:01,180][134294] Updated weights for policy 0, policy_version 83014 (0.0014) [2025-01-04 03:23:03,090][134294] Updated weights for policy 0, policy_version 83024 (0.0014) [2025-01-04 03:23:03,968][134211] Fps is (10 sec: 20889.8, 60 sec: 15564.8, 300 sec: 15689.8). Total num frames: 340082688. Throughput: 0: 3982.6. Samples: 74179518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:03,968][134211] Avg episode reward: [(0, '6.695')] [2025-01-04 03:23:05,551][134294] Updated weights for policy 0, policy_version 83034 (0.0021) [2025-01-04 03:23:08,697][134294] Updated weights for policy 0, policy_version 83044 (0.0028) [2025-01-04 03:23:08,968][134211] Fps is (10 sec: 17202.0, 60 sec: 15564.6, 300 sec: 15689.7). Total num frames: 340148224. Throughput: 0: 4062.9. Samples: 74205486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:08,969][134211] Avg episode reward: [(0, '7.465')] [2025-01-04 03:23:11,881][134294] Updated weights for policy 0, policy_version 83054 (0.0029) [2025-01-04 03:23:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15564.8, 300 sec: 15662.1). Total num frames: 340213760. Throughput: 0: 3825.3. Samples: 74224688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:13,969][134211] Avg episode reward: [(0, '8.060')] [2025-01-04 03:23:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083060_340213760.pth... 
[2025-01-04 03:23:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000082150_336486400.pth [2025-01-04 03:23:15,168][134294] Updated weights for policy 0, policy_version 83064 (0.0029) [2025-01-04 03:23:18,179][134294] Updated weights for policy 0, policy_version 83074 (0.0025) [2025-01-04 03:23:18,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15701.3, 300 sec: 15537.0). Total num frames: 340279296. Throughput: 0: 3823.6. Samples: 74234158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:18,969][134211] Avg episode reward: [(0, '7.890')] [2025-01-04 03:23:21,211][134294] Updated weights for policy 0, policy_version 83084 (0.0027) [2025-01-04 03:23:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15223.4, 300 sec: 15412.1). Total num frames: 340344832. Throughput: 0: 3826.0. Samples: 74254182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:23,968][134211] Avg episode reward: [(0, '7.165')] [2025-01-04 03:23:24,502][134294] Updated weights for policy 0, policy_version 83094 (0.0026) [2025-01-04 03:23:26,631][134294] Updated weights for policy 0, policy_version 83104 (0.0015) [2025-01-04 03:23:28,970][134211] Fps is (10 sec: 15152.4, 60 sec: 15086.4, 300 sec: 15481.4). Total num frames: 340430848. Throughput: 0: 3930.4. Samples: 74278474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:28,970][134211] Avg episode reward: [(0, '6.796')] [2025-01-04 03:23:29,122][134294] Updated weights for policy 0, policy_version 83114 (0.0020) [2025-01-04 03:23:32,219][134294] Updated weights for policy 0, policy_version 83124 (0.0027) [2025-01-04 03:23:33,968][134211] Fps is (10 sec: 15974.5, 60 sec: 15291.8, 300 sec: 15509.3). Total num frames: 340504576. Throughput: 0: 3936.5. Samples: 74288720. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:33,968][134211] Avg episode reward: [(0, '7.242')] [2025-01-04 03:23:34,447][134294] Updated weights for policy 0, policy_version 83134 (0.0015) [2025-01-04 03:23:36,350][134294] Updated weights for policy 0, policy_version 83144 (0.0012) [2025-01-04 03:23:38,213][134294] Updated weights for policy 0, policy_version 83154 (0.0014) [2025-01-04 03:23:38,967][134211] Fps is (10 sec: 18026.3, 60 sec: 15906.2, 300 sec: 15648.1). Total num frames: 340611072. Throughput: 0: 4012.0. Samples: 74317172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:38,968][134211] Avg episode reward: [(0, '7.735')] [2025-01-04 03:23:40,309][134294] Updated weights for policy 0, policy_version 83164 (0.0015) [2025-01-04 03:23:43,363][134294] Updated weights for policy 0, policy_version 83174 (0.0028) [2025-01-04 03:23:43,968][134211] Fps is (10 sec: 18021.9, 60 sec: 16042.8, 300 sec: 15662.0). Total num frames: 340684800. Throughput: 0: 3987.6. Samples: 74342826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:43,970][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 03:23:46,547][134294] Updated weights for policy 0, policy_version 83184 (0.0029) [2025-01-04 03:23:48,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15769.6, 300 sec: 15662.0). Total num frames: 340750336. Throughput: 0: 3843.8. Samples: 74352488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:48,968][134211] Avg episode reward: [(0, '8.831')] [2025-01-04 03:23:48,969][134264] Saving new best policy, reward=8.831! [2025-01-04 03:23:50,023][134294] Updated weights for policy 0, policy_version 83194 (0.0027) [2025-01-04 03:23:52,859][134294] Updated weights for policy 0, policy_version 83204 (0.0024) [2025-01-04 03:23:53,968][134211] Fps is (10 sec: 13926.8, 60 sec: 15837.9, 300 sec: 15634.2). Total num frames: 340824064. Throughput: 0: 3680.3. Samples: 74371098. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:53,968][134211] Avg episode reward: [(0, '7.929')] [2025-01-04 03:23:54,772][134294] Updated weights for policy 0, policy_version 83214 (0.0012) [2025-01-04 03:23:56,637][134294] Updated weights for policy 0, policy_version 83224 (0.0014) [2025-01-04 03:23:58,529][134294] Updated weights for policy 0, policy_version 83234 (0.0013) [2025-01-04 03:23:58,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15974.4, 300 sec: 15759.2). Total num frames: 340934656. Throughput: 0: 3960.1. Samples: 74402894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:23:58,968][134211] Avg episode reward: [(0, '7.191')] [2025-01-04 03:24:00,566][134294] Updated weights for policy 0, policy_version 83244 (0.0015) [2025-01-04 03:24:03,697][134294] Updated weights for policy 0, policy_version 83254 (0.0026) [2025-01-04 03:24:03,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15428.3, 300 sec: 15703.6). Total num frames: 341008384. Throughput: 0: 4070.6. Samples: 74417334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:03,968][134211] Avg episode reward: [(0, '7.107')] [2025-01-04 03:24:06,705][134294] Updated weights for policy 0, policy_version 83264 (0.0026) [2025-01-04 03:24:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15428.4, 300 sec: 15564.8). Total num frames: 341073920. Throughput: 0: 4061.1. Samples: 74436934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:08,968][134211] Avg episode reward: [(0, '7.159')] [2025-01-04 03:24:09,980][134294] Updated weights for policy 0, policy_version 83274 (0.0026) [2025-01-04 03:24:13,008][134294] Updated weights for policy 0, policy_version 83284 (0.0028) [2025-01-04 03:24:13,969][134211] Fps is (10 sec: 13106.1, 60 sec: 15428.1, 300 sec: 15481.5). Total num frames: 341139456. Throughput: 0: 3957.0. Samples: 74456536. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:13,969][134211] Avg episode reward: [(0, '6.906')] [2025-01-04 03:24:16,079][134294] Updated weights for policy 0, policy_version 83294 (0.0026) [2025-01-04 03:24:18,968][134211] Fps is (10 sec: 12697.0, 60 sec: 15359.9, 300 sec: 15467.6). Total num frames: 341200896. Throughput: 0: 3945.6. Samples: 74466274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:18,969][134211] Avg episode reward: [(0, '7.361')] [2025-01-04 03:24:19,547][134294] Updated weights for policy 0, policy_version 83304 (0.0025) [2025-01-04 03:24:21,576][134294] Updated weights for policy 0, policy_version 83314 (0.0012) [2025-01-04 03:24:23,616][134294] Updated weights for policy 0, policy_version 83324 (0.0013) [2025-01-04 03:24:23,968][134211] Fps is (10 sec: 15975.2, 60 sec: 15906.0, 300 sec: 15578.7). Total num frames: 341299200. Throughput: 0: 3835.7. Samples: 74489782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:23,968][134211] Avg episode reward: [(0, '7.297')] [2025-01-04 03:24:25,558][134294] Updated weights for policy 0, policy_version 83334 (0.0015) [2025-01-04 03:24:27,479][134294] Updated weights for policy 0, policy_version 83344 (0.0013) [2025-01-04 03:24:28,968][134211] Fps is (10 sec: 20891.0, 60 sec: 16316.3, 300 sec: 15731.8). Total num frames: 341409792. Throughput: 0: 3969.2. Samples: 74521440. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:28,968][134211] Avg episode reward: [(0, '7.940')] [2025-01-04 03:24:29,348][134294] Updated weights for policy 0, policy_version 83354 (0.0013) [2025-01-04 03:24:31,898][134294] Updated weights for policy 0, policy_version 83364 (0.0023) [2025-01-04 03:24:33,968][134211] Fps is (10 sec: 18432.5, 60 sec: 16315.7, 300 sec: 15745.4). Total num frames: 341483520. Throughput: 0: 4068.1. Samples: 74535552. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:24:33,968][134211] Avg episode reward: [(0, '7.598')] [2025-01-04 03:24:35,143][134294] Updated weights for policy 0, policy_version 83374 (0.0027) [2025-01-04 03:24:38,351][134294] Updated weights for policy 0, policy_version 83384 (0.0027) [2025-01-04 03:24:38,968][134211] Fps is (10 sec: 13516.1, 60 sec: 15564.6, 300 sec: 15648.1). Total num frames: 341544960. Throughput: 0: 4075.5. Samples: 74554498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:24:38,969][134211] Avg episode reward: [(0, '8.174')] [2025-01-04 03:24:41,724][134294] Updated weights for policy 0, policy_version 83394 (0.0025) [2025-01-04 03:24:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15360.0, 300 sec: 15592.6). Total num frames: 341606400. Throughput: 0: 3780.3. Samples: 74573008. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:24:43,968][134211] Avg episode reward: [(0, '8.161')] [2025-01-04 03:24:44,998][134294] Updated weights for policy 0, policy_version 83404 (0.0027) [2025-01-04 03:24:47,978][134294] Updated weights for policy 0, policy_version 83414 (0.0022) [2025-01-04 03:24:48,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15428.3, 300 sec: 15606.5). Total num frames: 341676032. Throughput: 0: 3676.4. Samples: 74582770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:24:48,968][134211] Avg episode reward: [(0, '7.747')] [2025-01-04 03:24:50,985][134294] Updated weights for policy 0, policy_version 83424 (0.0024) [2025-01-04 03:24:53,871][134294] Updated weights for policy 0, policy_version 83434 (0.0023) [2025-01-04 03:24:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15359.9, 300 sec: 15481.5). Total num frames: 341745664. Throughput: 0: 3702.8. Samples: 74603560. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:24:53,968][134211] Avg episode reward: [(0, '7.163')] [2025-01-04 03:24:56,943][134294] Updated weights for policy 0, policy_version 83444 (0.0023) [2025-01-04 03:24:58,878][134294] Updated weights for policy 0, policy_version 83454 (0.0015) [2025-01-04 03:24:58,967][134211] Fps is (10 sec: 15155.5, 60 sec: 14882.2, 300 sec: 15384.3). Total num frames: 341827584. Throughput: 0: 3773.1. Samples: 74626322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:24:58,968][134211] Avg episode reward: [(0, '7.611')] [2025-01-04 03:25:00,754][134294] Updated weights for policy 0, policy_version 83464 (0.0014) [2025-01-04 03:25:02,643][134294] Updated weights for policy 0, policy_version 83474 (0.0013) [2025-01-04 03:25:03,968][134211] Fps is (10 sec: 18842.4, 60 sec: 15428.3, 300 sec: 15481.5). Total num frames: 341934080. Throughput: 0: 3920.2. Samples: 74642682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:25:03,968][134211] Avg episode reward: [(0, '7.419')] [2025-01-04 03:25:04,607][134294] Updated weights for policy 0, policy_version 83484 (0.0016) [2025-01-04 03:25:07,651][134294] Updated weights for policy 0, policy_version 83494 (0.0025) [2025-01-04 03:25:08,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15564.8, 300 sec: 15523.1). Total num frames: 342007808. Throughput: 0: 4002.0. Samples: 74669872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:25:08,968][134211] Avg episode reward: [(0, '7.252')] [2025-01-04 03:25:10,658][134294] Updated weights for policy 0, policy_version 83504 (0.0028) [2025-01-04 03:25:13,679][134294] Updated weights for policy 0, policy_version 83514 (0.0029) [2025-01-04 03:25:13,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15565.0, 300 sec: 15550.9). Total num frames: 342073344. Throughput: 0: 3742.3. Samples: 74689846. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:25:13,969][134211] Avg episode reward: [(0, '7.351')] [2025-01-04 03:25:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083514_342073344.pth... [2025-01-04 03:25:14,069][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000082606_338354176.pth [2025-01-04 03:25:17,249][134294] Updated weights for policy 0, policy_version 83524 (0.0028) [2025-01-04 03:25:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15565.0, 300 sec: 15550.9). Total num frames: 342134784. Throughput: 0: 3623.9. Samples: 74698626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:25:18,968][134211] Avg episode reward: [(0, '7.713')] [2025-01-04 03:25:20,358][134294] Updated weights for policy 0, policy_version 83534 (0.0024) [2025-01-04 03:25:22,686][134294] Updated weights for policy 0, policy_version 83544 (0.0016) [2025-01-04 03:25:23,968][134211] Fps is (10 sec: 14746.1, 60 sec: 15360.1, 300 sec: 15606.5). Total num frames: 342220800. Throughput: 0: 3671.5. Samples: 74719712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:25:23,968][134211] Avg episode reward: [(0, '7.078')] [2025-01-04 03:25:24,588][134294] Updated weights for policy 0, policy_version 83554 (0.0012) [2025-01-04 03:25:26,951][134294] Updated weights for policy 0, policy_version 83564 (0.0018) [2025-01-04 03:25:28,969][134211] Fps is (10 sec: 16791.6, 60 sec: 14881.8, 300 sec: 15537.0). Total num frames: 342302720. Throughput: 0: 3857.8. Samples: 74746612. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:28,969][134211] Avg episode reward: [(0, '7.689')] [2025-01-04 03:25:30,146][134294] Updated weights for policy 0, policy_version 83574 (0.0029) [2025-01-04 03:25:33,367][134294] Updated weights for policy 0, policy_version 83584 (0.0026) [2025-01-04 03:25:33,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.6, 300 sec: 15537.0). Total num frames: 342368256. Throughput: 0: 3854.5. Samples: 74756222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:33,968][134211] Avg episode reward: [(0, '7.824')] [2025-01-04 03:25:36,263][134294] Updated weights for policy 0, policy_version 83594 (0.0025) [2025-01-04 03:25:38,310][134294] Updated weights for policy 0, policy_version 83604 (0.0013) [2025-01-04 03:25:38,968][134211] Fps is (10 sec: 15157.1, 60 sec: 15155.3, 300 sec: 15592.6). Total num frames: 342454272. Throughput: 0: 3869.5. Samples: 74777686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:38,968][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 03:25:40,222][134294] Updated weights for policy 0, policy_version 83614 (0.0012) [2025-01-04 03:25:42,141][134294] Updated weights for policy 0, policy_version 83624 (0.0014) [2025-01-04 03:25:43,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15906.2, 300 sec: 15634.2). Total num frames: 342560768. Throughput: 0: 4083.0. Samples: 74810056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:43,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 03:25:43,999][134294] Updated weights for policy 0, policy_version 83634 (0.0014) [2025-01-04 03:25:46,906][134294] Updated weights for policy 0, policy_version 83644 (0.0023) [2025-01-04 03:25:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15837.9, 300 sec: 15481.5). Total num frames: 342626304. Throughput: 0: 3991.1. Samples: 74822282. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:48,968][134211] Avg episode reward: [(0, '7.651')] [2025-01-04 03:25:50,386][134294] Updated weights for policy 0, policy_version 83654 (0.0029) [2025-01-04 03:25:53,529][134294] Updated weights for policy 0, policy_version 83664 (0.0026) [2025-01-04 03:25:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15769.7, 300 sec: 15384.3). Total num frames: 342691840. Throughput: 0: 3796.5. Samples: 74840714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:53,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 03:25:56,638][134294] Updated weights for policy 0, policy_version 83674 (0.0024) [2025-01-04 03:25:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15496.5, 300 sec: 15384.3). Total num frames: 342757376. Throughput: 0: 3792.2. Samples: 74860494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:25:58,968][134211] Avg episode reward: [(0, '7.553')] [2025-01-04 03:25:59,722][134294] Updated weights for policy 0, policy_version 83684 (0.0024) [2025-01-04 03:26:02,540][134294] Updated weights for policy 0, policy_version 83694 (0.0021) [2025-01-04 03:26:03,967][134211] Fps is (10 sec: 14746.1, 60 sec: 15087.0, 300 sec: 15439.8). Total num frames: 342839296. Throughput: 0: 3817.8. Samples: 74870426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:03,968][134211] Avg episode reward: [(0, '7.821')] [2025-01-04 03:26:04,408][134294] Updated weights for policy 0, policy_version 83704 (0.0013) [2025-01-04 03:26:06,418][134294] Updated weights for policy 0, policy_version 83714 (0.0015) [2025-01-04 03:26:08,968][134211] Fps is (10 sec: 16793.5, 60 sec: 15291.7, 300 sec: 15509.3). Total num frames: 342925312. Throughput: 0: 3991.0. Samples: 74899306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:08,968][134211] Avg episode reward: [(0, '7.475')] [2025-01-04 03:26:09,279][134294] Updated weights for policy 0, policy_version 83724 (0.0024) [2025-01-04 03:26:12,426][134294] Updated weights for policy 0, policy_version 83734 (0.0026) [2025-01-04 03:26:13,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15360.1, 300 sec: 15523.2). Total num frames: 342994944. Throughput: 0: 3834.8. Samples: 74919172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:13,968][134211] Avg episode reward: [(0, '7.553')] [2025-01-04 03:26:15,564][134294] Updated weights for policy 0, policy_version 83744 (0.0023) [2025-01-04 03:26:18,741][134294] Updated weights for policy 0, policy_version 83754 (0.0027) [2025-01-04 03:26:18,967][134211] Fps is (10 sec: 13517.1, 60 sec: 15428.3, 300 sec: 15509.3). Total num frames: 343060480. Throughput: 0: 3842.5. Samples: 74929136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:18,968][134211] Avg episode reward: [(0, '7.409')] [2025-01-04 03:26:20,833][134294] Updated weights for policy 0, policy_version 83764 (0.0014) [2025-01-04 03:26:22,800][134294] Updated weights for policy 0, policy_version 83774 (0.0014) [2025-01-04 03:26:23,967][134211] Fps is (10 sec: 16793.9, 60 sec: 15701.3, 300 sec: 15592.6). Total num frames: 343162880. Throughput: 0: 3926.5. Samples: 74954380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:23,968][134211] Avg episode reward: [(0, '7.349')] [2025-01-04 03:26:24,739][134294] Updated weights for policy 0, policy_version 83784 (0.0014) [2025-01-04 03:26:26,585][134294] Updated weights for policy 0, policy_version 83794 (0.0014) [2025-01-04 03:26:28,483][134294] Updated weights for policy 0, policy_version 83804 (0.0013) [2025-01-04 03:26:28,967][134211] Fps is (10 sec: 20889.7, 60 sec: 16111.3, 300 sec: 15578.7). Total num frames: 343269376. Throughput: 0: 3928.6. Samples: 74986844. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:28,968][134211] Avg episode reward: [(0, '7.308')] [2025-01-04 03:26:30,850][134294] Updated weights for policy 0, policy_version 83814 (0.0021) [2025-01-04 03:26:33,968][134211] Fps is (10 sec: 17612.2, 60 sec: 16179.2, 300 sec: 15537.1). Total num frames: 343339008. Throughput: 0: 3944.3. Samples: 74999774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:33,969][134211] Avg episode reward: [(0, '7.544')] [2025-01-04 03:26:34,064][134294] Updated weights for policy 0, policy_version 83824 (0.0026) [2025-01-04 03:26:37,144][134294] Updated weights for policy 0, policy_version 83834 (0.0027) [2025-01-04 03:26:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15837.8, 300 sec: 15550.9). Total num frames: 343404544. Throughput: 0: 3965.7. Samples: 75019172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:38,968][134211] Avg episode reward: [(0, '8.072')] [2025-01-04 03:26:40,427][134294] Updated weights for policy 0, policy_version 83844 (0.0028) [2025-01-04 03:26:43,272][134294] Updated weights for policy 0, policy_version 83854 (0.0024) [2025-01-04 03:26:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15155.2, 300 sec: 15550.9). Total num frames: 343470080. Throughput: 0: 3971.7. Samples: 75039220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:43,968][134211] Avg episode reward: [(0, '8.899')] [2025-01-04 03:26:44,004][134264] Saving new best policy, reward=8.899! [2025-01-04 03:26:46,398][134294] Updated weights for policy 0, policy_version 83864 (0.0026) [2025-01-04 03:26:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.5, 300 sec: 15467.6). Total num frames: 343539712. Throughput: 0: 3969.0. Samples: 75049030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:48,968][134211] Avg episode reward: [(0, '7.833')] [2025-01-04 03:26:49,555][134294] Updated weights for policy 0, policy_version 83874 (0.0024) [2025-01-04 03:26:52,592][134294] Updated weights for policy 0, policy_version 83884 (0.0024) [2025-01-04 03:26:53,968][134211] Fps is (10 sec: 13516.0, 60 sec: 15223.3, 300 sec: 15342.6). Total num frames: 343605248. Throughput: 0: 3774.8. Samples: 75069176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:53,969][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 03:26:55,595][134294] Updated weights for policy 0, policy_version 83894 (0.0024) [2025-01-04 03:26:58,588][134294] Updated weights for policy 0, policy_version 83904 (0.0022) [2025-01-04 03:26:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.7, 300 sec: 15342.6). Total num frames: 343674880. Throughput: 0: 3784.8. Samples: 75089488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:26:58,968][134211] Avg episode reward: [(0, '7.706')] [2025-01-04 03:27:00,713][134294] Updated weights for policy 0, policy_version 83914 (0.0016) [2025-01-04 03:27:02,599][134294] Updated weights for policy 0, policy_version 83924 (0.0013) [2025-01-04 03:27:03,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15632.9, 300 sec: 15467.6). Total num frames: 343777280. Throughput: 0: 3882.5. Samples: 75103850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:03,969][134211] Avg episode reward: [(0, '7.158')] [2025-01-04 03:27:04,959][134294] Updated weights for policy 0, policy_version 83934 (0.0018) [2025-01-04 03:27:08,004][134294] Updated weights for policy 0, policy_version 83944 (0.0026) [2025-01-04 03:27:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15291.8, 300 sec: 15467.6). Total num frames: 343842816. Throughput: 0: 3885.6. Samples: 75129232. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:08,968][134211] Avg episode reward: [(0, '7.729')] [2025-01-04 03:27:11,237][134294] Updated weights for policy 0, policy_version 83954 (0.0026) [2025-01-04 03:27:13,968][134211] Fps is (10 sec: 13107.8, 60 sec: 15223.5, 300 sec: 15495.4). Total num frames: 343908352. Throughput: 0: 3589.0. Samples: 75148350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:13,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 03:27:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083962_343908352.pth... [2025-01-04 03:27:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083060_340213760.pth [2025-01-04 03:27:14,335][134294] Updated weights for policy 0, policy_version 83964 (0.0026) [2025-01-04 03:27:16,356][134294] Updated weights for policy 0, policy_version 83974 (0.0012) [2025-01-04 03:27:18,283][134294] Updated weights for policy 0, policy_version 83984 (0.0014) [2025-01-04 03:27:18,967][134211] Fps is (10 sec: 16793.8, 60 sec: 15837.9, 300 sec: 15523.2). Total num frames: 344010752. Throughput: 0: 3596.6. Samples: 75161618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:18,968][134211] Avg episode reward: [(0, '7.166')] [2025-01-04 03:27:20,145][134294] Updated weights for policy 0, policy_version 83994 (0.0013) [2025-01-04 03:27:21,996][134294] Updated weights for policy 0, policy_version 84004 (0.0014) [2025-01-04 03:27:23,869][134294] Updated weights for policy 0, policy_version 84014 (0.0015) [2025-01-04 03:27:23,968][134211] Fps is (10 sec: 21299.4, 60 sec: 15974.4, 300 sec: 15578.7). Total num frames: 344121344. Throughput: 0: 3889.8. Samples: 75194212. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:23,968][134211] Avg episode reward: [(0, '7.620')] [2025-01-04 03:27:25,756][134294] Updated weights for policy 0, policy_version 84024 (0.0013) [2025-01-04 03:27:28,750][134294] Updated weights for policy 0, policy_version 84034 (0.0024) [2025-01-04 03:27:28,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15564.7, 300 sec: 15648.1). Total num frames: 344203264. Throughput: 0: 4066.9. Samples: 75222232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:28,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 03:27:28,969][134264] Saving new best policy, reward=8.933! [2025-01-04 03:27:32,067][134294] Updated weights for policy 0, policy_version 84044 (0.0025) [2025-01-04 03:27:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15428.3, 300 sec: 15620.3). Total num frames: 344264704. Throughput: 0: 4049.1. Samples: 75231240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:33,968][134211] Avg episode reward: [(0, '7.152')] [2025-01-04 03:27:35,219][134294] Updated weights for policy 0, policy_version 84054 (0.0028) [2025-01-04 03:27:38,268][134294] Updated weights for policy 0, policy_version 84064 (0.0026) [2025-01-04 03:27:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15496.6, 300 sec: 15634.3). Total num frames: 344334336. Throughput: 0: 4045.1. Samples: 75251202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:38,968][134211] Avg episode reward: [(0, '7.659')] [2025-01-04 03:27:41,342][134294] Updated weights for policy 0, policy_version 84074 (0.0025) [2025-01-04 03:27:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15496.5, 300 sec: 15578.7). Total num frames: 344399872. Throughput: 0: 4028.7. Samples: 75270778. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:43,969][134211] Avg episode reward: [(0, '7.219')] [2025-01-04 03:27:44,527][134294] Updated weights for policy 0, policy_version 84084 (0.0029) [2025-01-04 03:27:47,550][134294] Updated weights for policy 0, policy_version 84094 (0.0023) [2025-01-04 03:27:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15428.3, 300 sec: 15564.8). Total num frames: 344465408. Throughput: 0: 3931.0. Samples: 75280744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:48,968][134211] Avg episode reward: [(0, '8.238')] [2025-01-04 03:27:50,578][134294] Updated weights for policy 0, policy_version 84104 (0.0024) [2025-01-04 03:27:53,492][134294] Updated weights for policy 0, policy_version 84114 (0.0025) [2025-01-04 03:27:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15496.6, 300 sec: 15453.7). Total num frames: 344535040. Throughput: 0: 3824.8. Samples: 75301348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:53,968][134211] Avg episode reward: [(0, '7.105')] [2025-01-04 03:27:56,054][134294] Updated weights for policy 0, policy_version 84124 (0.0020) [2025-01-04 03:27:57,940][134294] Updated weights for policy 0, policy_version 84134 (0.0014) [2025-01-04 03:27:58,967][134211] Fps is (10 sec: 16793.8, 60 sec: 15974.5, 300 sec: 15426.0). Total num frames: 344633344. Throughput: 0: 3986.9. Samples: 75327758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:27:58,968][134211] Avg episode reward: [(0, '7.703')] [2025-01-04 03:27:59,847][134294] Updated weights for policy 0, policy_version 84144 (0.0014) [2025-01-04 03:28:01,683][134294] Updated weights for policy 0, policy_version 84154 (0.0015) [2025-01-04 03:28:03,597][134294] Updated weights for policy 0, policy_version 84164 (0.0013) [2025-01-04 03:28:03,968][134211] Fps is (10 sec: 20890.1, 60 sec: 16111.1, 300 sec: 15578.7). Total num frames: 344743936. Throughput: 0: 4055.0. Samples: 75344094. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:03,968][134211] Avg episode reward: [(0, '7.612')] [2025-01-04 03:28:05,714][134294] Updated weights for policy 0, policy_version 84174 (0.0016) [2025-01-04 03:28:08,896][134294] Updated weights for policy 0, policy_version 84184 (0.0027) [2025-01-04 03:28:08,968][134211] Fps is (10 sec: 18431.0, 60 sec: 16247.4, 300 sec: 15606.4). Total num frames: 344817664. Throughput: 0: 3957.7. Samples: 75372310. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:08,969][134211] Avg episode reward: [(0, '8.031')] [2025-01-04 03:28:12,012][134294] Updated weights for policy 0, policy_version 84194 (0.0025) [2025-01-04 03:28:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 16179.2, 300 sec: 15592.6). Total num frames: 344879104. Throughput: 0: 3761.5. Samples: 75391498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:13,968][134211] Avg episode reward: [(0, '7.307')] [2025-01-04 03:28:15,211][134294] Updated weights for policy 0, policy_version 84204 (0.0029) [2025-01-04 03:28:18,431][134294] Updated weights for policy 0, policy_version 84214 (0.0024) [2025-01-04 03:28:18,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15564.7, 300 sec: 15592.6). Total num frames: 344944640. Throughput: 0: 3782.7. Samples: 75401462. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:18,968][134211] Avg episode reward: [(0, '7.710')] [2025-01-04 03:28:22,036][134294] Updated weights for policy 0, policy_version 84224 (0.0027) [2025-01-04 03:28:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 15509.4). Total num frames: 345006080. Throughput: 0: 3729.8. Samples: 75419044. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:23,968][134211] Avg episode reward: [(0, '7.358')] [2025-01-04 03:28:24,828][134294] Updated weights for policy 0, policy_version 84234 (0.0018) [2025-01-04 03:28:26,815][134294] Updated weights for policy 0, policy_version 84244 (0.0015) [2025-01-04 03:28:28,671][134294] Updated weights for policy 0, policy_version 84254 (0.0012) [2025-01-04 03:28:28,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15087.0, 300 sec: 15606.5). Total num frames: 345108480. Throughput: 0: 3909.0. Samples: 75446680. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:28,968][134211] Avg episode reward: [(0, '7.974')] [2025-01-04 03:28:30,520][134294] Updated weights for policy 0, policy_version 84264 (0.0015) [2025-01-04 03:28:32,437][134294] Updated weights for policy 0, policy_version 84274 (0.0015) [2025-01-04 03:28:33,968][134211] Fps is (10 sec: 20479.7, 60 sec: 15769.6, 300 sec: 15592.6). Total num frames: 345210880. Throughput: 0: 4048.7. Samples: 75462936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:33,968][134211] Avg episode reward: [(0, '8.213')] [2025-01-04 03:28:34,909][134294] Updated weights for policy 0, policy_version 84284 (0.0019) [2025-01-04 03:28:38,019][134294] Updated weights for policy 0, policy_version 84294 (0.0026) [2025-01-04 03:28:38,969][134211] Fps is (10 sec: 17200.2, 60 sec: 15769.1, 300 sec: 15578.6). Total num frames: 345280512. Throughput: 0: 4142.9. Samples: 75487786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:38,970][134211] Avg episode reward: [(0, '7.751')] [2025-01-04 03:28:41,116][134294] Updated weights for policy 0, policy_version 84304 (0.0027) [2025-01-04 03:28:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15701.3, 300 sec: 15564.8). Total num frames: 345341952. Throughput: 0: 3980.5. Samples: 75506882. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:43,968][134211] Avg episode reward: [(0, '7.493')] [2025-01-04 03:28:44,378][134294] Updated weights for policy 0, policy_version 84314 (0.0027) [2025-01-04 03:28:47,398][134294] Updated weights for policy 0, policy_version 84324 (0.0026) [2025-01-04 03:28:48,968][134211] Fps is (10 sec: 13109.3, 60 sec: 15769.6, 300 sec: 15550.9). Total num frames: 345411584. Throughput: 0: 3839.2. Samples: 75516860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:28:48,968][134211] Avg episode reward: [(0, '6.878')] [2025-01-04 03:28:50,461][134294] Updated weights for policy 0, policy_version 84334 (0.0027) [2025-01-04 03:28:53,334][134294] Updated weights for policy 0, policy_version 84344 (0.0023) [2025-01-04 03:28:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15769.7, 300 sec: 15412.1). Total num frames: 345481216. Throughput: 0: 3668.7. Samples: 75537402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:28:53,968][134211] Avg episode reward: [(0, '7.796')] [2025-01-04 03:28:56,386][134294] Updated weights for policy 0, policy_version 84354 (0.0025) [2025-01-04 03:28:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.4, 300 sec: 15384.3). Total num frames: 345546752. Throughput: 0: 3691.1. Samples: 75557596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:28:58,968][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 03:28:59,479][134294] Updated weights for policy 0, policy_version 84364 (0.0025) [2025-01-04 03:29:01,395][134294] Updated weights for policy 0, policy_version 84374 (0.0013) [2025-01-04 03:29:03,275][134294] Updated weights for policy 0, policy_version 84384 (0.0014) [2025-01-04 03:29:03,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15086.9, 300 sec: 15509.3). Total num frames: 345649152. Throughput: 0: 3769.6. Samples: 75571096. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:03,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 03:29:05,189][134294] Updated weights for policy 0, policy_version 84394 (0.0013) [2025-01-04 03:29:07,039][134294] Updated weights for policy 0, policy_version 84404 (0.0014) [2025-01-04 03:29:08,924][134294] Updated weights for policy 0, policy_version 84414 (0.0012) [2025-01-04 03:29:08,968][134211] Fps is (10 sec: 21299.4, 60 sec: 15701.4, 300 sec: 15662.0). Total num frames: 345759744. Throughput: 0: 4103.6. Samples: 75603706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:08,968][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 03:29:11,749][134294] Updated weights for policy 0, policy_version 84424 (0.0024) [2025-01-04 03:29:13,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15837.8, 300 sec: 15689.8). Total num frames: 345829376. Throughput: 0: 4032.7. Samples: 75628154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:13,969][134211] Avg episode reward: [(0, '7.396')] [2025-01-04 03:29:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000084431_345829376.pth... [2025-01-04 03:29:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083514_342073344.pth [2025-01-04 03:29:14,889][134294] Updated weights for policy 0, policy_version 84434 (0.0026) [2025-01-04 03:29:18,294][134294] Updated weights for policy 0, policy_version 84444 (0.0028) [2025-01-04 03:29:18,968][134211] Fps is (10 sec: 13106.4, 60 sec: 15769.5, 300 sec: 15564.8). Total num frames: 345890816. Throughput: 0: 3878.7. Samples: 75637480. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:18,969][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 03:29:21,548][134294] Updated weights for policy 0, policy_version 84454 (0.0028) [2025-01-04 03:29:23,970][134211] Fps is (10 sec: 11876.4, 60 sec: 15700.8, 300 sec: 15384.2). Total num frames: 345948160. Throughput: 0: 3735.4. Samples: 75655880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:23,970][134211] Avg episode reward: [(0, '7.581')] [2025-01-04 03:29:25,067][134294] Updated weights for policy 0, policy_version 84464 (0.0030) [2025-01-04 03:29:27,661][134294] Updated weights for policy 0, policy_version 84474 (0.0021) [2025-01-04 03:29:28,967][134211] Fps is (10 sec: 13927.4, 60 sec: 15360.0, 300 sec: 15412.1). Total num frames: 346030080. Throughput: 0: 3789.9. Samples: 75677428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:28,968][134211] Avg episode reward: [(0, '7.773')] [2025-01-04 03:29:29,567][134294] Updated weights for policy 0, policy_version 84484 (0.0012) [2025-01-04 03:29:31,594][134294] Updated weights for policy 0, policy_version 84494 (0.0016) [2025-01-04 03:29:33,968][134211] Fps is (10 sec: 16796.8, 60 sec: 15086.9, 300 sec: 15495.4). Total num frames: 346116096. Throughput: 0: 3919.6. Samples: 75693242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:33,968][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 03:29:34,562][134294] Updated weights for policy 0, policy_version 84504 (0.0025) [2025-01-04 03:29:37,699][134294] Updated weights for policy 0, policy_version 84514 (0.0025) [2025-01-04 03:29:38,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15019.1, 300 sec: 15509.3). Total num frames: 346181632. Throughput: 0: 3924.6. Samples: 75714008. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:29:38,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 03:29:40,775][134294] Updated weights for policy 0, policy_version 84524 (0.0025) [2025-01-04 03:29:42,754][134294] Updated weights for policy 0, policy_version 84534 (0.0014) [2025-01-04 03:29:43,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15564.8, 300 sec: 15592.6). Total num frames: 346275840. Throughput: 0: 4015.7. Samples: 75738304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:29:43,968][134211] Avg episode reward: [(0, '7.903')] [2025-01-04 03:29:44,622][134294] Updated weights for policy 0, policy_version 84544 (0.0013) [2025-01-04 03:29:46,491][134294] Updated weights for policy 0, policy_version 84554 (0.0013) [2025-01-04 03:29:48,392][134294] Updated weights for policy 0, policy_version 84564 (0.0014) [2025-01-04 03:29:48,968][134211] Fps is (10 sec: 20070.4, 60 sec: 16179.2, 300 sec: 15717.6). Total num frames: 346382336. Throughput: 0: 4080.4. Samples: 75754712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:29:48,968][134211] Avg episode reward: [(0, '7.331')] [2025-01-04 03:29:51,017][134294] Updated weights for policy 0, policy_version 84574 (0.0023) [2025-01-04 03:29:53,968][134211] Fps is (10 sec: 17612.4, 60 sec: 16179.1, 300 sec: 15675.9). Total num frames: 346451968. Throughput: 0: 3931.2. Samples: 75780610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:29:53,969][134211] Avg episode reward: [(0, '7.614')] [2025-01-04 03:29:54,255][134294] Updated weights for policy 0, policy_version 84584 (0.0023) [2025-01-04 03:29:57,434][134294] Updated weights for policy 0, policy_version 84594 (0.0028) [2025-01-04 03:29:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 16111.0, 300 sec: 15523.1). Total num frames: 346513408. Throughput: 0: 3815.5. Samples: 75799850. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:29:58,968][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 03:30:00,558][134294] Updated weights for policy 0, policy_version 84604 (0.0026) [2025-01-04 03:30:03,601][134294] Updated weights for policy 0, policy_version 84614 (0.0028) [2025-01-04 03:30:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.5, 300 sec: 15495.4). Total num frames: 346578944. Throughput: 0: 3830.6. Samples: 75809856. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:03,968][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 03:30:06,671][134294] Updated weights for policy 0, policy_version 84624 (0.0027) [2025-01-04 03:30:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14813.8, 300 sec: 15509.3). Total num frames: 346648576. Throughput: 0: 3863.9. Samples: 75829748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:08,968][134211] Avg episode reward: [(0, '7.807')] [2025-01-04 03:30:09,804][134294] Updated weights for policy 0, policy_version 84634 (0.0024) [2025-01-04 03:30:11,684][134294] Updated weights for policy 0, policy_version 84644 (0.0011) [2025-01-04 03:30:13,552][134294] Updated weights for policy 0, policy_version 84654 (0.0013) [2025-01-04 03:30:13,968][134211] Fps is (10 sec: 16794.1, 60 sec: 15291.8, 300 sec: 15634.2). Total num frames: 346746880. Throughput: 0: 3984.3. Samples: 75856724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:13,968][134211] Avg episode reward: [(0, '6.820')] [2025-01-04 03:30:15,486][134294] Updated weights for policy 0, policy_version 84664 (0.0013) [2025-01-04 03:30:17,554][134294] Updated weights for policy 0, policy_version 84674 (0.0015) [2025-01-04 03:30:18,969][134211] Fps is (10 sec: 19249.5, 60 sec: 15837.8, 300 sec: 15661.9). Total num frames: 346841088. Throughput: 0: 3993.8. Samples: 75872964. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:18,970][134211] Avg episode reward: [(0, '7.292')] [2025-01-04 03:30:21,455][134294] Updated weights for policy 0, policy_version 84684 (0.0029) [2025-01-04 03:30:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15701.9, 300 sec: 15551.0). Total num frames: 346890240. Throughput: 0: 3945.5. Samples: 75891556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:23,968][134211] Avg episode reward: [(0, '7.345')] [2025-01-04 03:30:25,073][134294] Updated weights for policy 0, policy_version 84694 (0.0029) [2025-01-04 03:30:27,496][134294] Updated weights for policy 0, policy_version 84704 (0.0017) [2025-01-04 03:30:28,968][134211] Fps is (10 sec: 13518.2, 60 sec: 15769.6, 300 sec: 15620.3). Total num frames: 346976256. Throughput: 0: 3904.3. Samples: 75913996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:28,968][134211] Avg episode reward: [(0, '7.192')] [2025-01-04 03:30:29,372][134294] Updated weights for policy 0, policy_version 84714 (0.0014) [2025-01-04 03:30:31,247][134294] Updated weights for policy 0, policy_version 84724 (0.0013) [2025-01-04 03:30:33,163][134294] Updated weights for policy 0, policy_version 84734 (0.0012) [2025-01-04 03:30:33,968][134211] Fps is (10 sec: 19251.4, 60 sec: 16111.0, 300 sec: 15689.8). Total num frames: 347082752. Throughput: 0: 3901.2. Samples: 75930268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:30:33,968][134211] Avg episode reward: [(0, '7.871')] [2025-01-04 03:30:35,877][134294] Updated weights for policy 0, policy_version 84744 (0.0025) [2025-01-04 03:30:38,968][134211] Fps is (10 sec: 17202.8, 60 sec: 16110.9, 300 sec: 15550.9). Total num frames: 347148288. Throughput: 0: 3886.0. Samples: 75955480. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:30:38,968][134211] Avg episode reward: [(0, '6.871')] [2025-01-04 03:30:38,989][134294] Updated weights for policy 0, policy_version 84754 (0.0029) [2025-01-04 03:30:42,235][134294] Updated weights for policy 0, policy_version 84764 (0.0026) [2025-01-04 03:30:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15633.0, 300 sec: 15550.9). Total num frames: 347213824. Throughput: 0: 3883.8. Samples: 75974622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:30:43,968][134211] Avg episode reward: [(0, '7.285')] [2025-01-04 03:30:45,394][134294] Updated weights for policy 0, policy_version 84774 (0.0030) [2025-01-04 03:30:48,359][134294] Updated weights for policy 0, policy_version 84784 (0.0026) [2025-01-04 03:30:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.3, 300 sec: 15550.9). Total num frames: 347279360. Throughput: 0: 3883.9. Samples: 75984632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:30:48,968][134211] Avg episode reward: [(0, '7.117')] [2025-01-04 03:30:51,393][134294] Updated weights for policy 0, policy_version 84794 (0.0027) [2025-01-04 03:30:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 15564.8). Total num frames: 347348992. Throughput: 0: 3890.3. Samples: 76004810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:30:53,968][134211] Avg episode reward: [(0, '7.961')] [2025-01-04 03:30:54,559][134294] Updated weights for policy 0, policy_version 84804 (0.0023) [2025-01-04 03:30:57,589][134294] Updated weights for policy 0, policy_version 84814 (0.0026) [2025-01-04 03:30:58,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15223.5, 300 sec: 15550.9). Total num frames: 347426816. Throughput: 0: 3753.5. Samples: 76025630. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:30:58,968][134211] Avg episode reward: [(0, '7.139')] [2025-01-04 03:30:59,582][134294] Updated weights for policy 0, policy_version 84824 (0.0013) [2025-01-04 03:31:01,512][134294] Updated weights for policy 0, policy_version 84834 (0.0013) [2025-01-04 03:31:03,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15633.1, 300 sec: 15564.8). Total num frames: 347516928. Throughput: 0: 3743.3. Samples: 76041410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:31:03,968][134211] Avg episode reward: [(0, '7.880')] [2025-01-04 03:31:04,175][134294] Updated weights for policy 0, policy_version 84844 (0.0023) [2025-01-04 03:31:07,306][134294] Updated weights for policy 0, policy_version 84854 (0.0027) [2025-01-04 03:31:08,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15564.8, 300 sec: 15550.9). Total num frames: 347582464. Throughput: 0: 3815.3. Samples: 76063244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:31:08,968][134211] Avg episode reward: [(0, '7.782')] [2025-01-04 03:31:10,241][134294] Updated weights for policy 0, policy_version 84864 (0.0022) [2025-01-04 03:31:12,151][134294] Updated weights for policy 0, policy_version 84874 (0.0013) [2025-01-04 03:31:13,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15564.8, 300 sec: 15662.0). Total num frames: 347680768. Throughput: 0: 3909.6. Samples: 76089928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:31:13,968][134211] Avg episode reward: [(0, '6.948')] [2025-01-04 03:31:14,022][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000084884_347684864.pth... 
[2025-01-04 03:31:14,025][134294] Updated weights for policy 0, policy_version 84884 (0.0015) [2025-01-04 03:31:14,076][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000083962_343908352.pth [2025-01-04 03:31:16,829][134294] Updated weights for policy 0, policy_version 84894 (0.0024) [2025-01-04 03:31:18,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15155.4, 300 sec: 15550.9). Total num frames: 347750400. Throughput: 0: 3826.2. Samples: 76102448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:31:18,968][134211] Avg episode reward: [(0, '6.993')] [2025-01-04 03:31:19,987][134294] Updated weights for policy 0, policy_version 84904 (0.0025) [2025-01-04 03:31:23,093][134294] Updated weights for policy 0, policy_version 84914 (0.0027) [2025-01-04 03:31:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15428.2, 300 sec: 15412.0). Total num frames: 347815936. Throughput: 0: 3702.2. Samples: 76122078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:31:23,968][134211] Avg episode reward: [(0, '7.901')] [2025-01-04 03:31:26,104][134294] Updated weights for policy 0, policy_version 84924 (0.0026) [2025-01-04 03:31:28,869][134294] Updated weights for policy 0, policy_version 84934 (0.0021) [2025-01-04 03:31:28,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15223.5, 300 sec: 15426.0). Total num frames: 347889664. Throughput: 0: 3723.5. Samples: 76142178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:28,968][134211] Avg episode reward: [(0, '7.891')] [2025-01-04 03:31:30,808][134294] Updated weights for policy 0, policy_version 84944 (0.0013) [2025-01-04 03:31:32,705][134294] Updated weights for policy 0, policy_version 84954 (0.0015) [2025-01-04 03:31:33,968][134211] Fps is (10 sec: 18022.9, 60 sec: 15223.5, 300 sec: 15564.8). Total num frames: 347996160. Throughput: 0: 3849.8. Samples: 76157872. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:33,968][134211] Avg episode reward: [(0, '7.198')] [2025-01-04 03:31:34,620][134294] Updated weights for policy 0, policy_version 84964 (0.0013) [2025-01-04 03:31:36,470][134294] Updated weights for policy 0, policy_version 84974 (0.0013) [2025-01-04 03:31:38,360][134294] Updated weights for policy 0, policy_version 84984 (0.0014) [2025-01-04 03:31:38,968][134211] Fps is (10 sec: 21708.3, 60 sec: 15974.4, 300 sec: 15717.5). Total num frames: 348106752. Throughput: 0: 4123.3. Samples: 76190360. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:38,968][134211] Avg episode reward: [(0, '7.866')] [2025-01-04 03:31:40,590][134294] Updated weights for policy 0, policy_version 84994 (0.0019) [2025-01-04 03:31:43,868][134294] Updated weights for policy 0, policy_version 85004 (0.0027) [2025-01-04 03:31:43,968][134211] Fps is (10 sec: 18021.9, 60 sec: 16042.7, 300 sec: 15717.5). Total num frames: 348176384. Throughput: 0: 4215.8. Samples: 76215344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:43,969][134211] Avg episode reward: [(0, '6.936')] [2025-01-04 03:31:47,013][134294] Updated weights for policy 0, policy_version 85014 (0.0027) [2025-01-04 03:31:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15974.5, 300 sec: 15703.7). Total num frames: 348237824. Throughput: 0: 4074.6. Samples: 76224766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:48,968][134211] Avg episode reward: [(0, '7.418')] [2025-01-04 03:31:50,334][134294] Updated weights for policy 0, policy_version 85024 (0.0024) [2025-01-04 03:31:53,554][134294] Updated weights for policy 0, policy_version 85034 (0.0029) [2025-01-04 03:31:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15906.1, 300 sec: 15689.8). Total num frames: 348303360. Throughput: 0: 4014.8. Samples: 76243910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:53,968][134211] Avg episode reward: [(0, '7.647')] [2025-01-04 03:31:56,570][134294] Updated weights for policy 0, policy_version 85044 (0.0025) [2025-01-04 03:31:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15701.3, 300 sec: 15564.8). Total num frames: 348368896. Throughput: 0: 3852.9. Samples: 76263310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:31:58,968][134211] Avg episode reward: [(0, '7.409')] [2025-01-04 03:31:59,799][134294] Updated weights for policy 0, policy_version 85054 (0.0027) [2025-01-04 03:32:02,828][134294] Updated weights for policy 0, policy_version 85064 (0.0021) [2025-01-04 03:32:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15360.1, 300 sec: 15578.7). Total num frames: 348438528. Throughput: 0: 3774.8. Samples: 76272314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:03,968][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 03:32:05,013][134294] Updated weights for policy 0, policy_version 85074 (0.0012) [2025-01-04 03:32:07,126][134294] Updated weights for policy 0, policy_version 85084 (0.0014) [2025-01-04 03:32:08,967][134211] Fps is (10 sec: 16793.9, 60 sec: 15906.2, 300 sec: 15689.8). Total num frames: 348536832. Throughput: 0: 3938.5. Samples: 76299308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:08,968][134211] Avg episode reward: [(0, '7.901')] [2025-01-04 03:32:09,382][134294] Updated weights for policy 0, policy_version 85094 (0.0015) [2025-01-04 03:32:12,345][134294] Updated weights for policy 0, policy_version 85104 (0.0022) [2025-01-04 03:32:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15291.7, 300 sec: 15550.9). Total num frames: 348598272. Throughput: 0: 3990.7. Samples: 76321760. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:13,968][134211] Avg episode reward: [(0, '7.523')] [2025-01-04 03:32:16,237][134294] Updated weights for policy 0, policy_version 85114 (0.0027) [2025-01-04 03:32:18,871][134294] Updated weights for policy 0, policy_version 85124 (0.0015) [2025-01-04 03:32:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15291.8, 300 sec: 15412.1). Total num frames: 348667904. Throughput: 0: 3820.4. Samples: 76329792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:18,968][134211] Avg episode reward: [(0, '7.759')] [2025-01-04 03:32:21,061][134294] Updated weights for policy 0, policy_version 85134 (0.0015) [2025-01-04 03:32:23,236][134294] Updated weights for policy 0, policy_version 85144 (0.0014) [2025-01-04 03:32:23,969][134211] Fps is (10 sec: 16382.5, 60 sec: 15769.4, 300 sec: 15453.7). Total num frames: 348762112. Throughput: 0: 3675.3. Samples: 76355750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:23,969][134211] Avg episode reward: [(0, '7.332')] [2025-01-04 03:32:25,463][134294] Updated weights for policy 0, policy_version 85154 (0.0014) [2025-01-04 03:32:28,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15633.0, 300 sec: 15467.6). Total num frames: 348827648. Throughput: 0: 3628.9. Samples: 76378644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:28,969][134211] Avg episode reward: [(0, '7.666')] [2025-01-04 03:32:29,199][134294] Updated weights for policy 0, policy_version 85164 (0.0026) [2025-01-04 03:32:33,088][134294] Updated weights for policy 0, policy_version 85174 (0.0029) [2025-01-04 03:32:33,968][134211] Fps is (10 sec: 11879.0, 60 sec: 14745.4, 300 sec: 15412.0). Total num frames: 348880896. Throughput: 0: 3589.8. Samples: 76386308. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:33,969][134211] Avg episode reward: [(0, '7.489')] [2025-01-04 03:32:36,699][134294] Updated weights for policy 0, policy_version 85184 (0.0029) [2025-01-04 03:32:38,968][134211] Fps is (10 sec: 11059.3, 60 sec: 13858.1, 300 sec: 15384.3). Total num frames: 348938240. Throughput: 0: 3532.0. Samples: 76402848. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:38,968][134211] Avg episode reward: [(0, '6.985')] [2025-01-04 03:32:40,463][134294] Updated weights for policy 0, policy_version 85194 (0.0029) [2025-01-04 03:32:42,789][134294] Updated weights for policy 0, policy_version 85204 (0.0013) [2025-01-04 03:32:43,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13858.1, 300 sec: 15398.2). Total num frames: 349007872. Throughput: 0: 3555.2. Samples: 76423296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:43,969][134211] Avg episode reward: [(0, '6.803')] [2025-01-04 03:32:46,222][134294] Updated weights for policy 0, policy_version 85214 (0.0026) [2025-01-04 03:32:48,649][134294] Updated weights for policy 0, policy_version 85224 (0.0015) [2025-01-04 03:32:48,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14063.0, 300 sec: 15412.1). Total num frames: 349081600. Throughput: 0: 3551.7. Samples: 76432142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:48,968][134211] Avg episode reward: [(0, '7.632')] [2025-01-04 03:32:50,793][134294] Updated weights for policy 0, policy_version 85234 (0.0012) [2025-01-04 03:32:52,936][134294] Updated weights for policy 0, policy_version 85244 (0.0012) [2025-01-04 03:32:53,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14472.5, 300 sec: 15384.3). Total num frames: 349171712. Throughput: 0: 3560.1. Samples: 76459514. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:53,968][134211] Avg episode reward: [(0, '7.771')] [2025-01-04 03:32:56,306][134294] Updated weights for policy 0, policy_version 85254 (0.0027) [2025-01-04 03:32:58,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14267.7, 300 sec: 15189.9). Total num frames: 349224960. Throughput: 0: 3472.9. Samples: 76478042. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:32:58,969][134211] Avg episode reward: [(0, '7.108')] [2025-01-04 03:33:00,279][134294] Updated weights for policy 0, policy_version 85264 (0.0027) [2025-01-04 03:33:03,217][134294] Updated weights for policy 0, policy_version 85274 (0.0022) [2025-01-04 03:33:03,967][134211] Fps is (10 sec: 12288.4, 60 sec: 14267.8, 300 sec: 15176.1). Total num frames: 349294592. Throughput: 0: 3473.3. Samples: 76486088. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:33:03,968][134211] Avg episode reward: [(0, '6.365')] [2025-01-04 03:33:05,403][134294] Updated weights for policy 0, policy_version 85284 (0.0014) [2025-01-04 03:33:07,598][134294] Updated weights for policy 0, policy_version 85294 (0.0014) [2025-01-04 03:33:08,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14199.4, 300 sec: 15287.1). Total num frames: 349388800. Throughput: 0: 3483.4. Samples: 76512498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:33:08,968][134211] Avg episode reward: [(0, '7.434')] [2025-01-04 03:33:09,756][134294] Updated weights for policy 0, policy_version 85304 (0.0016) [2025-01-04 03:33:12,870][134294] Updated weights for policy 0, policy_version 85314 (0.0025) [2025-01-04 03:33:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14267.7, 300 sec: 15287.1). Total num frames: 349454336. Throughput: 0: 3482.7. Samples: 76535364. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:13,968][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 03:33:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000085316_349454336.pth... [2025-01-04 03:33:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000084431_345829376.pth [2025-01-04 03:33:16,737][134294] Updated weights for policy 0, policy_version 85324 (0.0030) [2025-01-04 03:33:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14062.9, 300 sec: 15273.2). Total num frames: 349511680. Throughput: 0: 3486.8. Samples: 76543212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:18,968][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 03:33:20,383][134294] Updated weights for policy 0, policy_version 85334 (0.0028) [2025-01-04 03:33:23,968][134211] Fps is (10 sec: 11059.1, 60 sec: 13380.5, 300 sec: 15106.6). Total num frames: 349564928. Throughput: 0: 3500.1. Samples: 76560354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:23,968][134211] Avg episode reward: [(0, '7.372')] [2025-01-04 03:33:24,009][134294] Updated weights for policy 0, policy_version 85344 (0.0025) [2025-01-04 03:33:26,776][134294] Updated weights for policy 0, policy_version 85354 (0.0016) [2025-01-04 03:33:28,916][134294] Updated weights for policy 0, policy_version 85364 (0.0012) [2025-01-04 03:33:28,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13721.7, 300 sec: 15051.1). Total num frames: 349650944. Throughput: 0: 3536.8. Samples: 76582450. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:28,968][134211] Avg episode reward: [(0, '6.866')] [2025-01-04 03:33:31,085][134294] Updated weights for policy 0, policy_version 85374 (0.0012) [2025-01-04 03:33:33,353][134294] Updated weights for policy 0, policy_version 85384 (0.0014) [2025-01-04 03:33:33,968][134211] Fps is (10 sec: 17613.2, 60 sec: 14336.1, 300 sec: 15120.6). Total num frames: 349741056. Throughput: 0: 3651.5. Samples: 76596460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:33,968][134211] Avg episode reward: [(0, '7.198')] [2025-01-04 03:33:35,520][134294] Updated weights for policy 0, policy_version 85394 (0.0012) [2025-01-04 03:33:38,968][134211] Fps is (10 sec: 15973.4, 60 sec: 14540.7, 300 sec: 15148.2). Total num frames: 349810688. Throughput: 0: 3606.9. Samples: 76621826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:38,969][134211] Avg episode reward: [(0, '6.988')] [2025-01-04 03:33:39,116][134294] Updated weights for policy 0, policy_version 85404 (0.0023) [2025-01-04 03:33:42,841][134294] Updated weights for policy 0, policy_version 85414 (0.0031) [2025-01-04 03:33:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14267.8, 300 sec: 15092.7). Total num frames: 349863936. Throughput: 0: 3550.4. Samples: 76637810. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:43,968][134211] Avg episode reward: [(0, '7.404')] [2025-01-04 03:33:46,485][134294] Updated weights for policy 0, policy_version 85424 (0.0025) [2025-01-04 03:33:48,968][134211] Fps is (10 sec: 11059.6, 60 sec: 13994.6, 300 sec: 15051.1). Total num frames: 349921280. Throughput: 0: 3562.2. Samples: 76646388. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:48,969][134211] Avg episode reward: [(0, '7.078')] [2025-01-04 03:33:50,268][134294] Updated weights for policy 0, policy_version 85434 (0.0025) [2025-01-04 03:33:53,786][134294] Updated weights for policy 0, policy_version 85444 (0.0025) [2025-01-04 03:33:53,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13448.5, 300 sec: 15023.3). Total num frames: 349978624. Throughput: 0: 3355.1. Samples: 76663476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:53,969][134211] Avg episode reward: [(0, '8.103')] [2025-01-04 03:33:56,444][134294] Updated weights for policy 0, policy_version 85454 (0.0018) [2025-01-04 03:33:58,589][134294] Updated weights for policy 0, policy_version 85464 (0.0014) [2025-01-04 03:33:58,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13994.7, 300 sec: 14967.8). Total num frames: 350064640. Throughput: 0: 3356.5. Samples: 76686406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:33:58,968][134211] Avg episode reward: [(0, '7.948')] [2025-01-04 03:34:00,785][134294] Updated weights for policy 0, policy_version 85474 (0.0012) [2025-01-04 03:34:02,928][134294] Updated weights for policy 0, policy_version 85484 (0.0014) [2025-01-04 03:34:03,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14335.9, 300 sec: 14898.3). Total num frames: 350154752. Throughput: 0: 3495.7. Samples: 76700518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:03,968][134211] Avg episode reward: [(0, '7.418')] [2025-01-04 03:34:06,205][134294] Updated weights for policy 0, policy_version 85494 (0.0022) [2025-01-04 03:34:08,979][134211] Fps is (10 sec: 14729.0, 60 sec: 13719.0, 300 sec: 14856.1). Total num frames: 350212096. Throughput: 0: 3581.5. Samples: 76721560. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:08,988][134211] Avg episode reward: [(0, '8.186')] [2025-01-04 03:34:10,182][134294] Updated weights for policy 0, policy_version 85504 (0.0033) [2025-01-04 03:34:12,734][134294] Updated weights for policy 0, policy_version 85514 (0.0017) [2025-01-04 03:34:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13858.2, 300 sec: 14898.4). Total num frames: 350285824. Throughput: 0: 3533.8. Samples: 76741472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:13,968][134211] Avg episode reward: [(0, '7.957')] [2025-01-04 03:34:14,959][134294] Updated weights for policy 0, policy_version 85524 (0.0011) [2025-01-04 03:34:17,110][134294] Updated weights for policy 0, policy_version 85534 (0.0015) [2025-01-04 03:34:18,968][134211] Fps is (10 sec: 15992.3, 60 sec: 14336.0, 300 sec: 14995.6). Total num frames: 350371840. Throughput: 0: 3537.8. Samples: 76755660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:18,968][134211] Avg episode reward: [(0, '7.143')] [2025-01-04 03:34:20,123][134294] Updated weights for policy 0, policy_version 85544 (0.0023) [2025-01-04 03:34:23,881][134294] Updated weights for policy 0, policy_version 85554 (0.0026) [2025-01-04 03:34:23,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14404.2, 300 sec: 14912.2). Total num frames: 350429184. Throughput: 0: 3424.7. Samples: 76775938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:23,969][134211] Avg episode reward: [(0, '8.090')] [2025-01-04 03:34:27,635][134294] Updated weights for policy 0, policy_version 85564 (0.0030) [2025-01-04 03:34:28,968][134211] Fps is (10 sec: 11058.6, 60 sec: 13858.0, 300 sec: 14801.1). Total num frames: 350482432. Throughput: 0: 3425.7. Samples: 76791970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:28,969][134211] Avg episode reward: [(0, '7.247')] [2025-01-04 03:34:31,173][134294] Updated weights for policy 0, policy_version 85574 (0.0025) [2025-01-04 03:34:33,516][134294] Updated weights for policy 0, policy_version 85584 (0.0014) [2025-01-04 03:34:33,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13585.1, 300 sec: 14828.9). Total num frames: 350556160. Throughput: 0: 3435.3. Samples: 76800978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:33,968][134211] Avg episode reward: [(0, '7.114')] [2025-01-04 03:34:36,278][134294] Updated weights for policy 0, policy_version 85594 (0.0020) [2025-01-04 03:34:38,968][134211] Fps is (10 sec: 13927.1, 60 sec: 13516.9, 300 sec: 14731.7). Total num frames: 350621696. Throughput: 0: 3566.3. Samples: 76823958. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:38,968][134211] Avg episode reward: [(0, '7.221')] [2025-01-04 03:34:40,052][134294] Updated weights for policy 0, policy_version 85604 (0.0027) [2025-01-04 03:34:43,009][134294] Updated weights for policy 0, policy_version 85614 (0.0019) [2025-01-04 03:34:43,967][134211] Fps is (10 sec: 13516.9, 60 sec: 13789.9, 300 sec: 14606.8). Total num frames: 350691328. Throughput: 0: 3477.9. Samples: 76842910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:43,968][134211] Avg episode reward: [(0, '7.339')] [2025-01-04 03:34:45,242][134294] Updated weights for policy 0, policy_version 85624 (0.0015) [2025-01-04 03:34:47,445][134294] Updated weights for policy 0, policy_version 85634 (0.0015) [2025-01-04 03:34:48,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14404.3, 300 sec: 14690.1). Total num frames: 350785536. Throughput: 0: 3470.3. Samples: 76856682. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:34:48,968][134211] Avg episode reward: [(0, '6.654')] [2025-01-04 03:34:49,579][134294] Updated weights for policy 0, policy_version 85644 (0.0014) [2025-01-04 03:34:52,395][134294] Updated weights for policy 0, policy_version 85654 (0.0022) [2025-01-04 03:34:53,968][134211] Fps is (10 sec: 16383.4, 60 sec: 14609.1, 300 sec: 14717.8). Total num frames: 350855168. Throughput: 0: 3576.1. Samples: 76882444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:34:53,969][134211] Avg episode reward: [(0, '7.719')] [2025-01-04 03:34:56,125][134294] Updated weights for policy 0, policy_version 85664 (0.0027) [2025-01-04 03:34:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14062.9, 300 sec: 14676.2). Total num frames: 350908416. Throughput: 0: 3498.5. Samples: 76898906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:34:58,968][134211] Avg episode reward: [(0, '7.305')] [2025-01-04 03:34:59,668][134294] Updated weights for policy 0, policy_version 85674 (0.0028) [2025-01-04 03:35:03,245][134294] Updated weights for policy 0, policy_version 85684 (0.0027) [2025-01-04 03:35:03,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13585.1, 300 sec: 14648.4). Total num frames: 350969856. Throughput: 0: 3370.8. Samples: 76907344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:03,969][134211] Avg episode reward: [(0, '6.980')] [2025-01-04 03:35:06,346][134294] Updated weights for policy 0, policy_version 85694 (0.0026) [2025-01-04 03:35:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13792.5, 300 sec: 14551.2). Total num frames: 351039488. Throughput: 0: 3347.9. Samples: 76926592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:08,968][134211] Avg episode reward: [(0, '6.442')] [2025-01-04 03:35:09,156][134294] Updated weights for policy 0, policy_version 85704 (0.0023) [2025-01-04 03:35:11,309][134294] Updated weights for policy 0, policy_version 85714 (0.0017) [2025-01-04 03:35:13,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13926.4, 300 sec: 14509.6). Total num frames: 351121408. Throughput: 0: 3535.2. Samples: 76951052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:13,968][134211] Avg episode reward: [(0, '7.925')] [2025-01-04 03:35:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000085723_351121408.pth... [2025-01-04 03:35:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000084884_347684864.pth [2025-01-04 03:35:14,266][134294] Updated weights for policy 0, policy_version 85724 (0.0025) [2025-01-04 03:35:17,503][134294] Updated weights for policy 0, policy_version 85734 (0.0026) [2025-01-04 03:35:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13516.8, 300 sec: 14551.2). Total num frames: 351182848. Throughput: 0: 3551.2. Samples: 76960782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:18,968][134211] Avg episode reward: [(0, '6.828')] [2025-01-04 03:35:20,505][134294] Updated weights for policy 0, policy_version 85744 (0.0023) [2025-01-04 03:35:22,583][134294] Updated weights for policy 0, policy_version 85754 (0.0013) [2025-01-04 03:35:23,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14063.0, 300 sec: 14565.1). Total num frames: 351272960. Throughput: 0: 3543.3. Samples: 76983406. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:23,968][134211] Avg episode reward: [(0, '7.235')] [2025-01-04 03:35:24,609][134294] Updated weights for policy 0, policy_version 85764 (0.0015) [2025-01-04 03:35:26,480][134294] Updated weights for policy 0, policy_version 85774 (0.0015) [2025-01-04 03:35:28,968][134211] Fps is (10 sec: 18432.1, 60 sec: 14745.7, 300 sec: 14523.4). Total num frames: 351367168. Throughput: 0: 3775.5. Samples: 77012806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:28,968][134211] Avg episode reward: [(0, '6.650')] [2025-01-04 03:35:29,195][134294] Updated weights for policy 0, policy_version 85784 (0.0021) [2025-01-04 03:35:32,489][134294] Updated weights for policy 0, policy_version 85794 (0.0028) [2025-01-04 03:35:33,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14540.7, 300 sec: 14509.6). Total num frames: 351428608. Throughput: 0: 3674.9. Samples: 77022052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:33,968][134211] Avg episode reward: [(0, '8.017')] [2025-01-04 03:35:35,627][134294] Updated weights for policy 0, policy_version 85804 (0.0028) [2025-01-04 03:35:38,677][134294] Updated weights for policy 0, policy_version 85814 (0.0026) [2025-01-04 03:35:38,968][134211] Fps is (10 sec: 12697.1, 60 sec: 14540.7, 300 sec: 14509.5). Total num frames: 351494144. Throughput: 0: 3541.3. Samples: 77041804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:38,969][134211] Avg episode reward: [(0, '7.539')] [2025-01-04 03:35:41,744][134294] Updated weights for policy 0, policy_version 85824 (0.0027) [2025-01-04 03:35:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14509.6). Total num frames: 351559680. Throughput: 0: 3610.0. Samples: 77061356. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:35:43,968][134211] Avg episode reward: [(0, '7.420')] [2025-01-04 03:35:45,106][134294] Updated weights for policy 0, policy_version 85834 (0.0026) [2025-01-04 03:35:48,184][134294] Updated weights for policy 0, policy_version 85844 (0.0026) [2025-01-04 03:35:48,971][134211] Fps is (10 sec: 13103.7, 60 sec: 13994.0, 300 sec: 14495.5). Total num frames: 351625216. Throughput: 0: 3634.7. Samples: 77070916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:35:48,971][134211] Avg episode reward: [(0, '7.844')] [2025-01-04 03:35:50,844][134294] Updated weights for policy 0, policy_version 85854 (0.0025) [2025-01-04 03:35:52,736][134294] Updated weights for policy 0, policy_version 85864 (0.0012) [2025-01-04 03:35:53,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 351723520. Throughput: 0: 3740.8. Samples: 77094930. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:35:53,968][134211] Avg episode reward: [(0, '6.568')] [2025-01-04 03:35:54,579][134294] Updated weights for policy 0, policy_version 85874 (0.0014) [2025-01-04 03:35:56,462][134294] Updated weights for policy 0, policy_version 85884 (0.0013) [2025-01-04 03:35:58,382][134294] Updated weights for policy 0, policy_version 85894 (0.0012) [2025-01-04 03:35:58,967][134211] Fps is (10 sec: 20896.5, 60 sec: 15428.3, 300 sec: 14634.5). Total num frames: 351834112. Throughput: 0: 3926.8. Samples: 77127758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:35:58,968][134211] Avg episode reward: [(0, '7.428')] [2025-01-04 03:36:00,347][134294] Updated weights for policy 0, policy_version 85904 (0.0015) [2025-01-04 03:36:03,649][134294] Updated weights for policy 0, policy_version 85914 (0.0028) [2025-01-04 03:36:03,968][134211] Fps is (10 sec: 18430.5, 60 sec: 15632.9, 300 sec: 14662.3). Total num frames: 351907840. Throughput: 0: 4023.6. Samples: 77141848. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:03,969][134211] Avg episode reward: [(0, '7.711')] [2025-01-04 03:36:06,780][134294] Updated weights for policy 0, policy_version 85924 (0.0028) [2025-01-04 03:36:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15496.5, 300 sec: 14537.3). Total num frames: 351969280. Throughput: 0: 3935.4. Samples: 77160500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:08,968][134211] Avg episode reward: [(0, '8.013')] [2025-01-04 03:36:10,084][134294] Updated weights for policy 0, policy_version 85934 (0.0026) [2025-01-04 03:36:13,204][134294] Updated weights for policy 0, policy_version 85944 (0.0027) [2025-01-04 03:36:13,968][134211] Fps is (10 sec: 12698.2, 60 sec: 15223.5, 300 sec: 14523.4). Total num frames: 352034816. Throughput: 0: 3709.1. Samples: 77179714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:13,968][134211] Avg episode reward: [(0, '7.773')] [2025-01-04 03:36:16,286][134294] Updated weights for policy 0, policy_version 85954 (0.0027) [2025-01-04 03:36:18,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15291.7, 300 sec: 14523.4). Total num frames: 352100352. Throughput: 0: 3725.7. Samples: 77189710. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:18,969][134211] Avg episode reward: [(0, '7.671')] [2025-01-04 03:36:19,505][134294] Updated weights for policy 0, policy_version 85964 (0.0026) [2025-01-04 03:36:22,505][134294] Updated weights for policy 0, policy_version 85974 (0.0024) [2025-01-04 03:36:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14495.7). Total num frames: 352165888. Throughput: 0: 3725.8. Samples: 77209466. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:23,968][134211] Avg episode reward: [(0, '8.146')] [2025-01-04 03:36:25,521][134294] Updated weights for policy 0, policy_version 85984 (0.0024) [2025-01-04 03:36:27,399][134294] Updated weights for policy 0, policy_version 85994 (0.0014) [2025-01-04 03:36:28,967][134211] Fps is (10 sec: 16385.0, 60 sec: 14950.5, 300 sec: 14467.9). Total num frames: 352264192. Throughput: 0: 3868.0. Samples: 77235414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:28,968][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 03:36:29,272][134294] Updated weights for policy 0, policy_version 86004 (0.0012) [2025-01-04 03:36:31,156][134294] Updated weights for policy 0, policy_version 86014 (0.0012) [2025-01-04 03:36:33,004][134294] Updated weights for policy 0, policy_version 86024 (0.0014) [2025-01-04 03:36:33,967][134211] Fps is (10 sec: 20480.6, 60 sec: 15701.4, 300 sec: 14454.0). Total num frames: 352370688. Throughput: 0: 4018.6. Samples: 77251742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:36:33,968][134211] Avg episode reward: [(0, '7.540')] [2025-01-04 03:36:34,911][134294] Updated weights for policy 0, policy_version 86034 (0.0013) [2025-01-04 03:36:36,805][134294] Updated weights for policy 0, policy_version 86044 (0.0011) [2025-01-04 03:36:38,912][134294] Updated weights for policy 0, policy_version 86054 (0.0017) [2025-01-04 03:36:38,968][134211] Fps is (10 sec: 21298.7, 60 sec: 16384.1, 300 sec: 14579.0). Total num frames: 352477184. Throughput: 0: 4209.2. Samples: 77284344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:36:38,968][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 03:36:42,061][134294] Updated weights for policy 0, policy_version 86064 (0.0029) [2025-01-04 03:36:43,968][134211] Fps is (10 sec: 16793.1, 60 sec: 16315.7, 300 sec: 14579.0). Total num frames: 352538624. Throughput: 0: 3962.6. Samples: 77306076. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:36:43,969][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 03:36:45,231][134294] Updated weights for policy 0, policy_version 86074 (0.0029) [2025-01-04 03:36:48,287][134294] Updated weights for policy 0, policy_version 86084 (0.0027) [2025-01-04 03:36:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 16316.6, 300 sec: 14579.0). Total num frames: 352604160. Throughput: 0: 3871.2. Samples: 77316052. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:36:48,968][134211] Avg episode reward: [(0, '7.469')] [2025-01-04 03:36:51,493][134294] Updated weights for policy 0, policy_version 86094 (0.0024) [2025-01-04 03:36:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15769.5, 300 sec: 14579.0). Total num frames: 352669696. Throughput: 0: 3889.5. Samples: 77335528. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:36:53,968][134211] Avg episode reward: [(0, '8.152')] [2025-01-04 03:36:54,708][134294] Updated weights for policy 0, policy_version 86104 (0.0029) [2025-01-04 03:36:57,679][134294] Updated weights for policy 0, policy_version 86114 (0.0022) [2025-01-04 03:36:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.6, 300 sec: 14565.1). Total num frames: 352735232. Throughput: 0: 3903.2. Samples: 77355358. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:36:58,968][134211] Avg episode reward: [(0, '8.409')] [2025-01-04 03:37:00,803][134294] Updated weights for policy 0, policy_version 86124 (0.0027) [2025-01-04 03:37:03,686][134294] Updated weights for policy 0, policy_version 86134 (0.0024) [2025-01-04 03:37:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.5, 300 sec: 14467.9). Total num frames: 352804864. Throughput: 0: 3910.6. Samples: 77365688. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:03,968][134211] Avg episode reward: [(0, '7.380')] [2025-01-04 03:37:06,756][134294] Updated weights for policy 0, policy_version 86144 (0.0025) [2025-01-04 03:37:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15018.6, 300 sec: 14481.8). Total num frames: 352870400. Throughput: 0: 3927.0. Samples: 77386180. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:08,968][134211] Avg episode reward: [(0, '7.518')] [2025-01-04 03:37:09,976][134294] Updated weights for policy 0, policy_version 86154 (0.0026) [2025-01-04 03:37:13,243][134294] Updated weights for policy 0, policy_version 86164 (0.0029) [2025-01-04 03:37:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.7, 300 sec: 14467.9). Total num frames: 352935936. Throughput: 0: 3765.8. Samples: 77404878. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:13,968][134211] Avg episode reward: [(0, '7.439')] [2025-01-04 03:37:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000086166_352935936.pth... [2025-01-04 03:37:14,029][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000085316_349454336.pth [2025-01-04 03:37:15,548][134294] Updated weights for policy 0, policy_version 86174 (0.0015) [2025-01-04 03:37:17,464][134294] Updated weights for policy 0, policy_version 86184 (0.0012) [2025-01-04 03:37:18,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15564.9, 300 sec: 14481.8). Total num frames: 353034240. Throughput: 0: 3718.9. Samples: 77419092. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:18,968][134211] Avg episode reward: [(0, '7.599')] [2025-01-04 03:37:19,613][134294] Updated weights for policy 0, policy_version 86194 (0.0013) [2025-01-04 03:37:21,627][134294] Updated weights for policy 0, policy_version 86204 (0.0013) [2025-01-04 03:37:23,576][134294] Updated weights for policy 0, policy_version 86214 (0.0012) [2025-01-04 03:37:23,967][134211] Fps is (10 sec: 20480.5, 60 sec: 16247.5, 300 sec: 14620.7). Total num frames: 353140736. Throughput: 0: 3657.6. Samples: 77448934. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:23,968][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 03:37:25,704][134294] Updated weights for policy 0, policy_version 86224 (0.0016) [2025-01-04 03:37:28,841][134294] Updated weights for policy 0, policy_version 86234 (0.0027) [2025-01-04 03:37:28,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15837.8, 300 sec: 14690.1). Total num frames: 353214464. Throughput: 0: 3750.7. Samples: 77474858. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 03:37:28,968][134211] Avg episode reward: [(0, '7.083')] [2025-01-04 03:37:32,293][134294] Updated weights for policy 0, policy_version 86244 (0.0024) [2025-01-04 03:37:33,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15086.9, 300 sec: 14703.9). Total num frames: 353275904. Throughput: 0: 3722.1. Samples: 77483548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:33,969][134211] Avg episode reward: [(0, '7.475')] [2025-01-04 03:37:35,422][134294] Updated weights for policy 0, policy_version 86254 (0.0029) [2025-01-04 03:37:38,373][134294] Updated weights for policy 0, policy_version 86264 (0.0020) [2025-01-04 03:37:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 14690.1). Total num frames: 353341440. Throughput: 0: 3730.3. Samples: 77503392. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:38,968][134211] Avg episode reward: [(0, '6.193')] [2025-01-04 03:37:41,475][134294] Updated weights for policy 0, policy_version 86274 (0.0027) [2025-01-04 03:37:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 353406976. Throughput: 0: 3730.5. Samples: 77523230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:43,968][134211] Avg episode reward: [(0, '7.363')] [2025-01-04 03:37:44,656][134294] Updated weights for policy 0, policy_version 86284 (0.0026) [2025-01-04 03:37:47,594][134294] Updated weights for policy 0, policy_version 86294 (0.0025) [2025-01-04 03:37:48,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14745.7, 300 sec: 14634.5). Total num frames: 353488896. Throughput: 0: 3722.4. Samples: 77533196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:48,968][134211] Avg episode reward: [(0, '7.273')] [2025-01-04 03:37:49,512][134294] Updated weights for policy 0, policy_version 86304 (0.0016) [2025-01-04 03:37:51,424][134294] Updated weights for policy 0, policy_version 86314 (0.0012) [2025-01-04 03:37:53,283][134294] Updated weights for policy 0, policy_version 86324 (0.0014) [2025-01-04 03:37:53,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15428.3, 300 sec: 14815.0). Total num frames: 353595392. Throughput: 0: 3930.1. Samples: 77563034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:53,968][134211] Avg episode reward: [(0, '7.539')] [2025-01-04 03:37:55,205][134294] Updated weights for policy 0, policy_version 86334 (0.0013) [2025-01-04 03:37:58,099][134294] Updated weights for policy 0, policy_version 86344 (0.0024) [2025-01-04 03:37:58,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15633.0, 300 sec: 14842.8). Total num frames: 353673216. Throughput: 0: 4103.6. Samples: 77589538. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:37:58,969][134211] Avg episode reward: [(0, '7.047')] [2025-01-04 03:38:01,503][134294] Updated weights for policy 0, policy_version 86354 (0.0031) [2025-01-04 03:38:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15496.5, 300 sec: 14731.7). Total num frames: 353734656. Throughput: 0: 3993.2. Samples: 77598786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:38:03,968][134211] Avg episode reward: [(0, '8.479')] [2025-01-04 03:38:04,991][134294] Updated weights for policy 0, policy_version 86364 (0.0029) [2025-01-04 03:38:08,134][134294] Updated weights for policy 0, policy_version 86374 (0.0030) [2025-01-04 03:38:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15428.2, 300 sec: 14717.8). Total num frames: 353796096. Throughput: 0: 3733.4. Samples: 77616940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:38:08,969][134211] Avg episode reward: [(0, '7.025')] [2025-01-04 03:38:10,826][134294] Updated weights for policy 0, policy_version 86384 (0.0017) [2025-01-04 03:38:12,956][134294] Updated weights for policy 0, policy_version 86394 (0.0014) [2025-01-04 03:38:13,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15769.6, 300 sec: 14815.0). Total num frames: 353882112. Throughput: 0: 3705.2. Samples: 77641592. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:38:13,968][134211] Avg episode reward: [(0, '7.154')] [2025-01-04 03:38:15,823][134294] Updated weights for policy 0, policy_version 86404 (0.0026) [2025-01-04 03:38:18,866][134294] Updated weights for policy 0, policy_version 86414 (0.0026) [2025-01-04 03:38:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15291.7, 300 sec: 14870.6). Total num frames: 353951744. Throughput: 0: 3744.1. Samples: 77652034. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:38:18,968][134211] Avg episode reward: [(0, '7.467')] [2025-01-04 03:38:21,748][134294] Updated weights for policy 0, policy_version 86424 (0.0024) [2025-01-04 03:38:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.0, 300 sec: 14801.1). Total num frames: 354017280. Throughput: 0: 3755.6. Samples: 77672396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:23,968][134211] Avg episode reward: [(0, '7.487')] [2025-01-04 03:38:24,931][134294] Updated weights for policy 0, policy_version 86434 (0.0021) [2025-01-04 03:38:26,837][134294] Updated weights for policy 0, policy_version 86444 (0.0013) [2025-01-04 03:38:28,783][134294] Updated weights for policy 0, policy_version 86454 (0.0013) [2025-01-04 03:38:28,968][134211] Fps is (10 sec: 16384.4, 60 sec: 15018.7, 300 sec: 14828.9). Total num frames: 354115584. Throughput: 0: 3903.3. Samples: 77698876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:28,968][134211] Avg episode reward: [(0, '6.485')] [2025-01-04 03:38:30,655][134294] Updated weights for policy 0, policy_version 86464 (0.0013) [2025-01-04 03:38:32,573][134294] Updated weights for policy 0, policy_version 86474 (0.0014) [2025-01-04 03:38:33,968][134211] Fps is (10 sec: 20890.0, 60 sec: 15837.9, 300 sec: 14967.8). Total num frames: 354226176. Throughput: 0: 4043.6. Samples: 77715158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:33,968][134211] Avg episode reward: [(0, '7.422')] [2025-01-04 03:38:34,447][134294] Updated weights for policy 0, policy_version 86484 (0.0014) [2025-01-04 03:38:36,340][134294] Updated weights for policy 0, policy_version 86494 (0.0014) [2025-01-04 03:38:38,531][134294] Updated weights for policy 0, policy_version 86504 (0.0019) [2025-01-04 03:38:38,968][134211] Fps is (10 sec: 20889.1, 60 sec: 16384.0, 300 sec: 15120.5). Total num frames: 354324480. Throughput: 0: 4105.1. Samples: 77747764. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:38,968][134211] Avg episode reward: [(0, '7.719')] [2025-01-04 03:38:41,869][134294] Updated weights for policy 0, policy_version 86514 (0.0026) [2025-01-04 03:38:43,968][134211] Fps is (10 sec: 15974.0, 60 sec: 16315.7, 300 sec: 15134.4). Total num frames: 354385920. Throughput: 0: 3954.8. Samples: 77767502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:43,969][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 03:38:45,100][134294] Updated weights for policy 0, policy_version 86524 (0.0026) [2025-01-04 03:38:48,277][134294] Updated weights for policy 0, policy_version 86534 (0.0028) [2025-01-04 03:38:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15974.4, 300 sec: 15148.3). Total num frames: 354447360. Throughput: 0: 3960.9. Samples: 77777028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:48,968][134211] Avg episode reward: [(0, '7.768')] [2025-01-04 03:38:51,480][134294] Updated weights for policy 0, policy_version 86544 (0.0024) [2025-01-04 03:38:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.0, 300 sec: 15092.7). Total num frames: 354516992. Throughput: 0: 3988.2. Samples: 77796410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:53,968][134211] Avg episode reward: [(0, '6.718')] [2025-01-04 03:38:54,658][134294] Updated weights for policy 0, policy_version 86554 (0.0027) [2025-01-04 03:38:57,694][134294] Updated weights for policy 0, policy_version 86564 (0.0023) [2025-01-04 03:38:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15155.2, 300 sec: 15009.4). Total num frames: 354582528. Throughput: 0: 3881.7. Samples: 77816268. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:38:58,968][134211] Avg episode reward: [(0, '8.055')] [2025-01-04 03:39:00,704][134294] Updated weights for policy 0, policy_version 86574 (0.0023) [2025-01-04 03:39:03,713][134294] Updated weights for policy 0, policy_version 86584 (0.0024) [2025-01-04 03:39:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.5, 300 sec: 15037.7). Total num frames: 354648064. Throughput: 0: 3883.1. Samples: 77826772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:39:03,968][134211] Avg episode reward: [(0, '7.187')] [2025-01-04 03:39:06,835][134294] Updated weights for policy 0, policy_version 86594 (0.0022) [2025-01-04 03:39:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 354717696. Throughput: 0: 3878.1. Samples: 77846908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:39:08,968][134211] Avg episode reward: [(0, '7.509')] [2025-01-04 03:39:09,724][134294] Updated weights for policy 0, policy_version 86604 (0.0025) [2025-01-04 03:39:11,849][134294] Updated weights for policy 0, policy_version 86614 (0.0013) [2025-01-04 03:39:13,778][134294] Updated weights for policy 0, policy_version 86624 (0.0014) [2025-01-04 03:39:13,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15496.6, 300 sec: 15051.1). Total num frames: 354811904. Throughput: 0: 3869.3. Samples: 77872994. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:39:13,968][134211] Avg episode reward: [(0, '7.153')] [2025-01-04 03:39:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000086625_354816000.pth... 
[2025-01-04 03:39:14,030][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000085723_351121408.pth [2025-01-04 03:39:15,685][134294] Updated weights for policy 0, policy_version 86634 (0.0013) [2025-01-04 03:39:17,739][134294] Updated weights for policy 0, policy_version 86644 (0.0014) [2025-01-04 03:39:18,968][134211] Fps is (10 sec: 19659.2, 60 sec: 16042.5, 300 sec: 15203.8). Total num frames: 354914304. Throughput: 0: 3864.7. Samples: 77889074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:18,969][134211] Avg episode reward: [(0, '7.373')] [2025-01-04 03:39:20,166][134294] Updated weights for policy 0, policy_version 86654 (0.0019) [2025-01-04 03:39:23,643][134294] Updated weights for policy 0, policy_version 86664 (0.0027) [2025-01-04 03:39:23,969][134211] Fps is (10 sec: 16382.4, 60 sec: 15974.2, 300 sec: 15231.6). Total num frames: 354975744. Throughput: 0: 3668.9. Samples: 77912866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:23,969][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 03:39:27,317][134294] Updated weights for policy 0, policy_version 86674 (0.0025) [2025-01-04 03:39:28,968][134211] Fps is (10 sec: 12288.8, 60 sec: 15359.9, 300 sec: 15189.9). Total num frames: 355037184. Throughput: 0: 3610.8. Samples: 77929986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:28,968][134211] Avg episode reward: [(0, '6.730')] [2025-01-04 03:39:30,389][134294] Updated weights for policy 0, policy_version 86684 (0.0028) [2025-01-04 03:39:33,460][134294] Updated weights for policy 0, policy_version 86694 (0.0027) [2025-01-04 03:39:33,968][134211] Fps is (10 sec: 12698.7, 60 sec: 14609.1, 300 sec: 15189.9). Total num frames: 355102720. Throughput: 0: 3628.6. Samples: 77940314. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:33,968][134211] Avg episode reward: [(0, '7.869')] [2025-01-04 03:39:36,430][134294] Updated weights for policy 0, policy_version 86704 (0.0025) [2025-01-04 03:39:38,881][134294] Updated weights for policy 0, policy_version 86714 (0.0017) [2025-01-04 03:39:38,967][134211] Fps is (10 sec: 14336.4, 60 sec: 14267.8, 300 sec: 15217.7). Total num frames: 355180544. Throughput: 0: 3651.7. Samples: 77960736. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:38,968][134211] Avg episode reward: [(0, '7.681')] [2025-01-04 03:39:41,309][134294] Updated weights for policy 0, policy_version 86724 (0.0019) [2025-01-04 03:39:43,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14540.8, 300 sec: 15162.1). Total num frames: 355258368. Throughput: 0: 3759.3. Samples: 77985438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:43,968][134211] Avg episode reward: [(0, '7.388')] [2025-01-04 03:39:44,090][134294] Updated weights for policy 0, policy_version 86734 (0.0023) [2025-01-04 03:39:47,144][134294] Updated weights for policy 0, policy_version 86744 (0.0024) [2025-01-04 03:39:48,967][134211] Fps is (10 sec: 15974.4, 60 sec: 14882.2, 300 sec: 15203.8). Total num frames: 355340288. Throughput: 0: 3750.5. Samples: 77995544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:48,968][134211] Avg episode reward: [(0, '6.566')] [2025-01-04 03:39:48,999][134294] Updated weights for policy 0, policy_version 86754 (0.0013) [2025-01-04 03:39:50,932][134294] Updated weights for policy 0, policy_version 86764 (0.0013) [2025-01-04 03:39:52,786][134294] Updated weights for policy 0, policy_version 86774 (0.0012) [2025-01-04 03:39:53,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15564.9, 300 sec: 15398.2). Total num frames: 355450880. Throughput: 0: 3992.0. Samples: 78026548. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:53,968][134211] Avg episode reward: [(0, '7.768')] [2025-01-04 03:39:54,666][134294] Updated weights for policy 0, policy_version 86784 (0.0014) [2025-01-04 03:39:56,567][134294] Updated weights for policy 0, policy_version 86794 (0.0015) [2025-01-04 03:39:58,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15974.4, 300 sec: 15495.4). Total num frames: 355540992. Throughput: 0: 4072.0. Samples: 78056234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:39:58,968][134211] Avg episode reward: [(0, '6.921')] [2025-01-04 03:39:59,517][134294] Updated weights for policy 0, policy_version 86804 (0.0025) [2025-01-04 03:40:03,063][134294] Updated weights for policy 0, policy_version 86814 (0.0028) [2025-01-04 03:40:03,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15837.9, 300 sec: 15453.7). Total num frames: 355598336. Throughput: 0: 3912.7. Samples: 78065144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:40:03,969][134211] Avg episode reward: [(0, '7.850')] [2025-01-04 03:40:06,251][134294] Updated weights for policy 0, policy_version 86824 (0.0028) [2025-01-04 03:40:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15769.6, 300 sec: 15398.2). Total num frames: 355663872. Throughput: 0: 3803.9. Samples: 78084040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:08,968][134211] Avg episode reward: [(0, '7.764')] [2025-01-04 03:40:09,486][134294] Updated weights for policy 0, policy_version 86834 (0.0028) [2025-01-04 03:40:12,473][134294] Updated weights for policy 0, policy_version 86844 (0.0024) [2025-01-04 03:40:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 15412.1). Total num frames: 355729408. Throughput: 0: 3861.1. Samples: 78103736. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:13,968][134211] Avg episode reward: [(0, '7.453')] [2025-01-04 03:40:15,457][134294] Updated weights for policy 0, policy_version 86854 (0.0026) [2025-01-04 03:40:18,382][134294] Updated weights for policy 0, policy_version 86864 (0.0026) [2025-01-04 03:40:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.8, 300 sec: 15342.6). Total num frames: 355799040. Throughput: 0: 3863.2. Samples: 78114160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:18,968][134211] Avg episode reward: [(0, '6.334')] [2025-01-04 03:40:21,407][134294] Updated weights for policy 0, policy_version 86874 (0.0028) [2025-01-04 03:40:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.3, 300 sec: 15259.3). Total num frames: 355868672. Throughput: 0: 3867.1. Samples: 78134758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:23,968][134211] Avg episode reward: [(0, '7.547')] [2025-01-04 03:40:24,493][134294] Updated weights for policy 0, policy_version 86884 (0.0024) [2025-01-04 03:40:26,502][134294] Updated weights for policy 0, policy_version 86894 (0.0014) [2025-01-04 03:40:28,353][134294] Updated weights for policy 0, policy_version 86904 (0.0015) [2025-01-04 03:40:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15564.8, 300 sec: 15398.2). Total num frames: 355971072. Throughput: 0: 3923.4. Samples: 78161992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:28,968][134211] Avg episode reward: [(0, '7.449')] [2025-01-04 03:40:30,217][134294] Updated weights for policy 0, policy_version 86914 (0.0013) [2025-01-04 03:40:32,470][134294] Updated weights for policy 0, policy_version 86924 (0.0016) [2025-01-04 03:40:33,968][134211] Fps is (10 sec: 18841.3, 60 sec: 15906.1, 300 sec: 15467.6). Total num frames: 356057088. Throughput: 0: 4061.3. Samples: 78178306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:33,969][134211] Avg episode reward: [(0, '8.018')] [2025-01-04 03:40:35,812][134294] Updated weights for policy 0, policy_version 86934 (0.0027) [2025-01-04 03:40:38,850][134294] Updated weights for policy 0, policy_version 86944 (0.0024) [2025-01-04 03:40:38,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15701.3, 300 sec: 15467.6). Total num frames: 356122624. Throughput: 0: 3819.2. Samples: 78198412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:38,968][134211] Avg episode reward: [(0, '7.348')] [2025-01-04 03:40:41,989][134294] Updated weights for policy 0, policy_version 86954 (0.0026) [2025-01-04 03:40:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15496.5, 300 sec: 15467.8). Total num frames: 356188160. Throughput: 0: 3592.0. Samples: 78217874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:43,968][134211] Avg episode reward: [(0, '7.536')] [2025-01-04 03:40:45,063][134294] Updated weights for policy 0, policy_version 86964 (0.0024) [2025-01-04 03:40:46,970][134294] Updated weights for policy 0, policy_version 86974 (0.0013) [2025-01-04 03:40:48,902][134294] Updated weights for policy 0, policy_version 86984 (0.0013) [2025-01-04 03:40:48,970][134211] Fps is (10 sec: 16380.9, 60 sec: 15769.0, 300 sec: 15467.5). Total num frames: 356286464. Throughput: 0: 3670.7. Samples: 78230334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:48,970][134211] Avg episode reward: [(0, '8.483')] [2025-01-04 03:40:51,615][134294] Updated weights for policy 0, policy_version 86994 (0.0023) [2025-01-04 03:40:53,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15086.9, 300 sec: 15328.7). Total num frames: 356356096. Throughput: 0: 3837.6. Samples: 78256732. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:53,968][134211] Avg episode reward: [(0, '7.419')] [2025-01-04 03:40:54,922][134294] Updated weights for policy 0, policy_version 87004 (0.0026) [2025-01-04 03:40:58,186][134294] Updated weights for policy 0, policy_version 87014 (0.0028) [2025-01-04 03:40:58,968][134211] Fps is (10 sec: 13108.8, 60 sec: 14608.9, 300 sec: 15287.1). Total num frames: 356417536. Throughput: 0: 3818.1. Samples: 78275552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:40:58,969][134211] Avg episode reward: [(0, '7.658')] [2025-01-04 03:41:00,546][134294] Updated weights for policy 0, policy_version 87024 (0.0018) [2025-01-04 03:41:02,434][134294] Updated weights for policy 0, policy_version 87034 (0.0013) [2025-01-04 03:41:03,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15360.1, 300 sec: 15426.0). Total num frames: 356519936. Throughput: 0: 3892.2. Samples: 78289308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:03,968][134211] Avg episode reward: [(0, '7.047')] [2025-01-04 03:41:04,296][134294] Updated weights for policy 0, policy_version 87044 (0.0014) [2025-01-04 03:41:06,224][134294] Updated weights for policy 0, policy_version 87054 (0.0013) [2025-01-04 03:41:08,711][134294] Updated weights for policy 0, policy_version 87064 (0.0021) [2025-01-04 03:41:08,968][134211] Fps is (10 sec: 20071.9, 60 sec: 15906.1, 300 sec: 15537.0). Total num frames: 356618240. Throughput: 0: 4148.1. Samples: 78321424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:08,968][134211] Avg episode reward: [(0, '7.348')] [2025-01-04 03:41:11,958][134294] Updated weights for policy 0, policy_version 87074 (0.0024) [2025-01-04 03:41:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15837.9, 300 sec: 15523.2). Total num frames: 356679680. Throughput: 0: 3973.1. Samples: 78340782. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:13,968][134211] Avg episode reward: [(0, '7.717')] [2025-01-04 03:41:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087080_356679680.pth... [2025-01-04 03:41:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000086166_352935936.pth [2025-01-04 03:41:15,185][134294] Updated weights for policy 0, policy_version 87084 (0.0023) [2025-01-04 03:41:18,470][134294] Updated weights for policy 0, policy_version 87094 (0.0024) [2025-01-04 03:41:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15701.3, 300 sec: 15509.3). Total num frames: 356741120. Throughput: 0: 3826.3. Samples: 78350488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:18,968][134211] Avg episode reward: [(0, '7.290')] [2025-01-04 03:41:21,906][134294] Updated weights for policy 0, policy_version 87104 (0.0023) [2025-01-04 03:41:23,967][134211] Fps is (10 sec: 13517.0, 60 sec: 15769.7, 300 sec: 15426.0). Total num frames: 356814848. Throughput: 0: 3786.3. Samples: 78368794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:23,968][134211] Avg episode reward: [(0, '7.888')] [2025-01-04 03:41:23,986][134294] Updated weights for policy 0, policy_version 87114 (0.0015) [2025-01-04 03:41:25,959][134294] Updated weights for policy 0, policy_version 87124 (0.0013) [2025-01-04 03:41:27,883][134294] Updated weights for policy 0, policy_version 87134 (0.0014) [2025-01-04 03:41:28,968][134211] Fps is (10 sec: 18022.6, 60 sec: 15837.9, 300 sec: 15426.0). Total num frames: 356921344. Throughput: 0: 4052.0. Samples: 78400212. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:28,968][134211] Avg episode reward: [(0, '7.553')] [2025-01-04 03:41:29,738][134294] Updated weights for policy 0, policy_version 87144 (0.0013) [2025-01-04 03:41:31,598][134294] Updated weights for policy 0, policy_version 87154 (0.0013) [2025-01-04 03:41:33,911][134294] Updated weights for policy 0, policy_version 87164 (0.0018) [2025-01-04 03:41:33,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16111.0, 300 sec: 15412.1). Total num frames: 357023744. Throughput: 0: 4138.9. Samples: 78416578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:33,968][134211] Avg episode reward: [(0, '7.645')] [2025-01-04 03:41:37,258][134294] Updated weights for policy 0, policy_version 87174 (0.0029) [2025-01-04 03:41:38,968][134211] Fps is (10 sec: 15973.1, 60 sec: 15974.2, 300 sec: 15398.2). Total num frames: 357081088. Throughput: 0: 4051.9. Samples: 78439068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:38,969][134211] Avg episode reward: [(0, '7.189')] [2025-01-04 03:41:40,474][134294] Updated weights for policy 0, policy_version 87184 (0.0026) [2025-01-04 03:41:43,567][134294] Updated weights for policy 0, policy_version 87194 (0.0025) [2025-01-04 03:41:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 16042.6, 300 sec: 15412.1). Total num frames: 357150720. Throughput: 0: 4070.7. Samples: 78458732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:43,969][134211] Avg episode reward: [(0, '6.919')] [2025-01-04 03:41:46,524][134294] Updated weights for policy 0, policy_version 87204 (0.0028) [2025-01-04 03:41:48,968][134211] Fps is (10 sec: 13517.5, 60 sec: 15497.0, 300 sec: 15412.1). Total num frames: 357216256. Throughput: 0: 3987.3. Samples: 78468736. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:48,968][134211] Avg episode reward: [(0, '8.022')] [2025-01-04 03:41:49,741][134294] Updated weights for policy 0, policy_version 87214 (0.0027) [2025-01-04 03:41:52,914][134294] Updated weights for policy 0, policy_version 87224 (0.0023) [2025-01-04 03:41:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.3, 300 sec: 15412.1). Total num frames: 357281792. Throughput: 0: 3705.2. Samples: 78488156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:53,968][134211] Avg episode reward: [(0, '7.953')] [2025-01-04 03:41:56,116][134294] Updated weights for policy 0, policy_version 87234 (0.0025) [2025-01-04 03:41:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15428.4, 300 sec: 15384.3). Total num frames: 357343232. Throughput: 0: 3697.2. Samples: 78507156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:41:58,968][134211] Avg episode reward: [(0, '7.068')] [2025-01-04 03:41:59,293][134294] Updated weights for policy 0, policy_version 87244 (0.0027) [2025-01-04 03:42:02,150][134294] Updated weights for policy 0, policy_version 87254 (0.0019) [2025-01-04 03:42:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15155.2, 300 sec: 15453.7). Total num frames: 357429248. Throughput: 0: 3699.4. Samples: 78516962. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:03,968][134211] Avg episode reward: [(0, '7.300')] [2025-01-04 03:42:04,052][134294] Updated weights for policy 0, policy_version 87264 (0.0015) [2025-01-04 03:42:05,929][134294] Updated weights for policy 0, policy_version 87274 (0.0014) [2025-01-04 03:42:07,922][134294] Updated weights for policy 0, policy_version 87284 (0.0014) [2025-01-04 03:42:08,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15291.7, 300 sec: 15592.6). Total num frames: 357535744. Throughput: 0: 3986.5. Samples: 78548186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:08,968][134211] Avg episode reward: [(0, '6.971')] [2025-01-04 03:42:09,817][134294] Updated weights for policy 0, policy_version 87294 (0.0013) [2025-01-04 03:42:11,707][134294] Updated weights for policy 0, policy_version 87304 (0.0013) [2025-01-04 03:42:13,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15906.1, 300 sec: 15592.6). Total num frames: 357634048. Throughput: 0: 3987.8. Samples: 78579664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:13,968][134211] Avg episode reward: [(0, '7.414')] [2025-01-04 03:42:14,053][134294] Updated weights for policy 0, policy_version 87314 (0.0020) [2025-01-04 03:42:17,395][134294] Updated weights for policy 0, policy_version 87324 (0.0025) [2025-01-04 03:42:18,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15906.1, 300 sec: 15439.8). Total num frames: 357695488. Throughput: 0: 3836.1. Samples: 78589204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:18,969][134211] Avg episode reward: [(0, '7.623')] [2025-01-04 03:42:20,523][134294] Updated weights for policy 0, policy_version 87334 (0.0027) [2025-01-04 03:42:23,599][134294] Updated weights for policy 0, policy_version 87344 (0.0026) [2025-01-04 03:42:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15837.8, 300 sec: 15426.0). Total num frames: 357765120. Throughput: 0: 3770.7. Samples: 78608748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:23,968][134211] Avg episode reward: [(0, '7.435')] [2025-01-04 03:42:26,758][134294] Updated weights for policy 0, policy_version 87354 (0.0023) [2025-01-04 03:42:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15086.9, 300 sec: 15426.0). Total num frames: 357826560. Throughput: 0: 3755.8. Samples: 78627742. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:28,968][134211] Avg episode reward: [(0, '6.522')] [2025-01-04 03:42:30,240][134294] Updated weights for policy 0, policy_version 87364 (0.0027) [2025-01-04 03:42:33,505][134294] Updated weights for policy 0, policy_version 87374 (0.0027) [2025-01-04 03:42:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.2, 300 sec: 15412.1). Total num frames: 357888000. Throughput: 0: 3728.5. Samples: 78636518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:33,969][134211] Avg episode reward: [(0, '7.037')] [2025-01-04 03:42:36,520][134294] Updated weights for policy 0, policy_version 87384 (0.0023) [2025-01-04 03:42:38,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14540.4, 300 sec: 15412.0). Total num frames: 357953536. Throughput: 0: 3746.7. Samples: 78656766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:38,970][134211] Avg episode reward: [(0, '6.873')] [2025-01-04 03:42:39,428][134294] Updated weights for policy 0, policy_version 87394 (0.0021) [2025-01-04 03:42:41,381][134294] Updated weights for policy 0, policy_version 87404 (0.0014) [2025-01-04 03:42:43,331][134294] Updated weights for policy 0, policy_version 87414 (0.0014) [2025-01-04 03:42:43,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15155.2, 300 sec: 15495.4). Total num frames: 358060032. Throughput: 0: 3936.1. Samples: 78684280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:42:43,968][134211] Avg episode reward: [(0, '7.776')] [2025-01-04 03:42:45,276][134294] Updated weights for policy 0, policy_version 87424 (0.0013) [2025-01-04 03:42:47,341][134294] Updated weights for policy 0, policy_version 87434 (0.0016) [2025-01-04 03:42:48,968][134211] Fps is (10 sec: 19664.8, 60 sec: 15564.8, 300 sec: 15439.8). Total num frames: 358150144. Throughput: 0: 4078.2. Samples: 78700482. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:42:48,968][134211] Avg episode reward: [(0, '7.929')] [2025-01-04 03:42:50,683][134294] Updated weights for policy 0, policy_version 87444 (0.0028) [2025-01-04 03:42:53,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15428.2, 300 sec: 15370.4). Total num frames: 358207488. Throughput: 0: 3827.4. Samples: 78720420. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:42:53,969][134211] Avg episode reward: [(0, '7.275')] [2025-01-04 03:42:54,196][134294] Updated weights for policy 0, policy_version 87454 (0.0027) [2025-01-04 03:42:57,568][134294] Updated weights for policy 0, policy_version 87464 (0.0027) [2025-01-04 03:42:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15428.3, 300 sec: 15370.4). Total num frames: 358268928. Throughput: 0: 3520.8. Samples: 78738102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:42:58,968][134211] Avg episode reward: [(0, '7.482')] [2025-01-04 03:43:00,755][134294] Updated weights for policy 0, policy_version 87474 (0.0023) [2025-01-04 03:43:02,695][134294] Updated weights for policy 0, policy_version 87484 (0.0014) [2025-01-04 03:43:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15360.0, 300 sec: 15439.8). Total num frames: 358350848. Throughput: 0: 3548.6. Samples: 78748890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:03,968][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 03:43:05,450][134294] Updated weights for policy 0, policy_version 87494 (0.0022) [2025-01-04 03:43:08,571][134294] Updated weights for policy 0, policy_version 87504 (0.0024) [2025-01-04 03:43:08,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14745.6, 300 sec: 15384.3). Total num frames: 358420480. Throughput: 0: 3649.3. Samples: 78772964. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:08,968][134211] Avg episode reward: [(0, '7.829')] [2025-01-04 03:43:11,253][134294] Updated weights for policy 0, policy_version 87514 (0.0020) [2025-01-04 03:43:13,143][134294] Updated weights for policy 0, policy_version 87524 (0.0012) [2025-01-04 03:43:13,967][134211] Fps is (10 sec: 16384.4, 60 sec: 14677.4, 300 sec: 15467.6). Total num frames: 358514688. Throughput: 0: 3784.0. Samples: 78798020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:13,968][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 03:43:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087528_358514688.pth... [2025-01-04 03:43:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000086625_354816000.pth [2025-01-04 03:43:15,096][134294] Updated weights for policy 0, policy_version 87534 (0.0014) [2025-01-04 03:43:17,095][134294] Updated weights for policy 0, policy_version 87544 (0.0015) [2025-01-04 03:43:18,968][134211] Fps is (10 sec: 19251.4, 60 sec: 15291.8, 300 sec: 15578.7). Total num frames: 358612992. Throughput: 0: 3937.0. Samples: 78813684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:18,968][134211] Avg episode reward: [(0, '7.588')] [2025-01-04 03:43:19,202][134294] Updated weights for policy 0, policy_version 87554 (0.0013) [2025-01-04 03:43:22,144][134294] Updated weights for policy 0, policy_version 87564 (0.0024) [2025-01-04 03:43:23,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15291.8, 300 sec: 15481.5). Total num frames: 358682624. Throughput: 0: 4046.1. Samples: 78838832. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:23,968][134211] Avg episode reward: [(0, '8.009')] [2025-01-04 03:43:25,738][134294] Updated weights for policy 0, policy_version 87574 (0.0029) [2025-01-04 03:43:28,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15223.5, 300 sec: 15301.0). Total num frames: 358739968. Throughput: 0: 3826.8. Samples: 78856486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:28,971][134211] Avg episode reward: [(0, '7.159')] [2025-01-04 03:43:29,151][134294] Updated weights for policy 0, policy_version 87584 (0.0028) [2025-01-04 03:43:32,416][134294] Updated weights for policy 0, policy_version 87594 (0.0025) [2025-01-04 03:43:33,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15223.5, 300 sec: 15176.0). Total num frames: 358801408. Throughput: 0: 3663.8. Samples: 78865352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:43:33,968][134211] Avg episode reward: [(0, '7.636')] [2025-01-04 03:43:35,611][134294] Updated weights for policy 0, policy_version 87604 (0.0024) [2025-01-04 03:43:38,583][134294] Updated weights for policy 0, policy_version 87614 (0.0024) [2025-01-04 03:43:38,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15360.6, 300 sec: 15217.7). Total num frames: 358875136. Throughput: 0: 3659.0. Samples: 78885072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:43:38,968][134211] Avg episode reward: [(0, '7.391')] [2025-01-04 03:43:40,486][134294] Updated weights for policy 0, policy_version 87624 (0.0014) [2025-01-04 03:43:42,459][134294] Updated weights for policy 0, policy_version 87634 (0.0013) [2025-01-04 03:43:43,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15223.4, 300 sec: 15342.6). Total num frames: 358973440. Throughput: 0: 3902.3. Samples: 78913706. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:43:43,968][134211] Avg episode reward: [(0, '7.544')] [2025-01-04 03:43:45,029][134294] Updated weights for policy 0, policy_version 87644 (0.0023) [2025-01-04 03:43:48,207][134294] Updated weights for policy 0, policy_version 87654 (0.0023) [2025-01-04 03:43:48,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14813.9, 300 sec: 15328.8). Total num frames: 359038976. Throughput: 0: 3903.2. Samples: 78924532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:43:48,968][134211] Avg episode reward: [(0, '6.965')] [2025-01-04 03:43:51,251][134294] Updated weights for policy 0, policy_version 87664 (0.0031) [2025-01-04 03:43:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14950.4, 300 sec: 15328.8). Total num frames: 359104512. Throughput: 0: 3806.9. Samples: 78944276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:43:53,968][134211] Avg episode reward: [(0, '7.437')] [2025-01-04 03:43:54,310][134294] Updated weights for policy 0, policy_version 87674 (0.0028) [2025-01-04 03:43:56,380][134294] Updated weights for policy 0, policy_version 87684 (0.0013) [2025-01-04 03:43:58,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15360.0, 300 sec: 15398.2). Total num frames: 359190528. Throughput: 0: 3795.8. Samples: 78968834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:43:58,968][134211] Avg episode reward: [(0, '7.401')] [2025-01-04 03:43:59,083][134294] Updated weights for policy 0, policy_version 87694 (0.0026) [2025-01-04 03:44:02,359][134294] Updated weights for policy 0, policy_version 87704 (0.0027) [2025-01-04 03:44:03,967][134211] Fps is (10 sec: 15155.5, 60 sec: 15087.0, 300 sec: 15384.3). Total num frames: 359256064. Throughput: 0: 3658.5. Samples: 78978316. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:03,968][134211] Avg episode reward: [(0, '7.330')] [2025-01-04 03:44:04,858][134294] Updated weights for policy 0, policy_version 87714 (0.0018) [2025-01-04 03:44:06,881][134294] Updated weights for policy 0, policy_version 87724 (0.0012) [2025-01-04 03:44:08,956][134294] Updated weights for policy 0, policy_version 87734 (0.0014) [2025-01-04 03:44:08,967][134211] Fps is (10 sec: 16794.0, 60 sec: 15633.1, 300 sec: 15412.1). Total num frames: 359358464. Throughput: 0: 3672.5. Samples: 79004096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:08,968][134211] Avg episode reward: [(0, '8.009')] [2025-01-04 03:44:10,872][134294] Updated weights for policy 0, policy_version 87744 (0.0014) [2025-01-04 03:44:13,900][134294] Updated weights for policy 0, policy_version 87754 (0.0028) [2025-01-04 03:44:13,968][134211] Fps is (10 sec: 18431.3, 60 sec: 15428.2, 300 sec: 15342.7). Total num frames: 359440384. Throughput: 0: 3883.9. Samples: 79031264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:13,969][134211] Avg episode reward: [(0, '7.378')] [2025-01-04 03:44:17,009][134294] Updated weights for policy 0, policy_version 87764 (0.0028) [2025-01-04 03:44:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14882.1, 300 sec: 15356.6). Total num frames: 359505920. Throughput: 0: 3896.5. Samples: 79040694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:18,968][134211] Avg episode reward: [(0, '7.566')] [2025-01-04 03:44:20,203][134294] Updated weights for policy 0, policy_version 87774 (0.0033) [2025-01-04 03:44:23,359][134294] Updated weights for policy 0, policy_version 87784 (0.0025) [2025-01-04 03:44:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14745.6, 300 sec: 15356.5). Total num frames: 359567360. Throughput: 0: 3903.3. Samples: 79060724. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:23,969][134211] Avg episode reward: [(0, '7.823')] [2025-01-04 03:44:26,409][134294] Updated weights for policy 0, policy_version 87794 (0.0023) [2025-01-04 03:44:28,332][134294] Updated weights for policy 0, policy_version 87804 (0.0014) [2025-01-04 03:44:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15291.8, 300 sec: 15439.8). Total num frames: 359657472. Throughput: 0: 3778.8. Samples: 79083750. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 03:44:28,968][134211] Avg episode reward: [(0, '6.642')] [2025-01-04 03:44:30,226][134294] Updated weights for policy 0, policy_version 87814 (0.0013) [2025-01-04 03:44:32,105][134294] Updated weights for policy 0, policy_version 87824 (0.0014) [2025-01-04 03:44:33,967][134211] Fps is (10 sec: 19661.6, 60 sec: 16042.8, 300 sec: 15537.0). Total num frames: 359763968. Throughput: 0: 3898.7. Samples: 79099974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:33,968][134211] Avg episode reward: [(0, '8.239')] [2025-01-04 03:44:33,997][134294] Updated weights for policy 0, policy_version 87834 (0.0014) [2025-01-04 03:44:36,186][134294] Updated weights for policy 0, policy_version 87844 (0.0015) [2025-01-04 03:44:38,968][134211] Fps is (10 sec: 18431.6, 60 sec: 16110.9, 300 sec: 15537.0). Total num frames: 359841792. Throughput: 0: 4097.5. Samples: 79128666. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:38,969][134211] Avg episode reward: [(0, '7.349')] [2025-01-04 03:44:39,746][134294] Updated weights for policy 0, policy_version 87854 (0.0027) [2025-01-04 03:44:43,466][134294] Updated weights for policy 0, policy_version 87864 (0.0032) [2025-01-04 03:44:43,968][134211] Fps is (10 sec: 13106.7, 60 sec: 15360.0, 300 sec: 15439.8). Total num frames: 359895040. Throughput: 0: 3920.1. Samples: 79145240. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:43,969][134211] Avg episode reward: [(0, '7.501')] [2025-01-04 03:44:47,032][134294] Updated weights for policy 0, policy_version 87874 (0.0027) [2025-01-04 03:44:48,969][134211] Fps is (10 sec: 11058.2, 60 sec: 15223.2, 300 sec: 15259.3). Total num frames: 359952384. Throughput: 0: 3904.4. Samples: 79154020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:48,969][134211] Avg episode reward: [(0, '8.067')] [2025-01-04 03:44:50,430][134294] Updated weights for policy 0, policy_version 87884 (0.0024) [2025-01-04 03:44:53,533][134294] Updated weights for policy 0, policy_version 87894 (0.0023) [2025-01-04 03:44:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15291.7, 300 sec: 15189.9). Total num frames: 360022016. Throughput: 0: 3719.5. Samples: 79171472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:53,968][134211] Avg episode reward: [(0, '7.996')] [2025-01-04 03:44:55,754][134294] Updated weights for policy 0, policy_version 87904 (0.0013) [2025-01-04 03:44:57,696][134294] Updated weights for policy 0, policy_version 87914 (0.0014) [2025-01-04 03:44:58,968][134211] Fps is (10 sec: 16385.7, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 360116224. Throughput: 0: 3741.8. Samples: 79199644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:44:58,968][134211] Avg episode reward: [(0, '8.650')] [2025-01-04 03:45:00,139][134294] Updated weights for policy 0, policy_version 87924 (0.0021) [2025-01-04 03:45:03,396][134294] Updated weights for policy 0, policy_version 87934 (0.0030) [2025-01-04 03:45:03,968][134211] Fps is (10 sec: 15974.4, 60 sec: 15428.2, 300 sec: 15314.9). Total num frames: 360181760. Throughput: 0: 3778.1. Samples: 79210710. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:45:03,968][134211] Avg episode reward: [(0, '7.465')] [2025-01-04 03:45:06,535][134294] Updated weights for policy 0, policy_version 87944 (0.0027) [2025-01-04 03:45:08,971][134211] Fps is (10 sec: 13102.9, 60 sec: 14813.0, 300 sec: 15314.7). Total num frames: 360247296. Throughput: 0: 3760.4. Samples: 79229952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:45:08,972][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 03:45:09,782][134294] Updated weights for policy 0, policy_version 87954 (0.0029) [2025-01-04 03:45:12,231][134294] Updated weights for policy 0, policy_version 87964 (0.0016) [2025-01-04 03:45:13,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14813.9, 300 sec: 15356.5). Total num frames: 360329216. Throughput: 0: 3766.7. Samples: 79253252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:45:13,968][134211] Avg episode reward: [(0, '7.431')] [2025-01-04 03:45:14,019][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087972_360333312.pth... [2025-01-04 03:45:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087080_356679680.pth [2025-01-04 03:45:14,647][134294] Updated weights for policy 0, policy_version 87974 (0.0019) [2025-01-04 03:45:17,618][134294] Updated weights for policy 0, policy_version 87984 (0.0023) [2025-01-04 03:45:18,968][134211] Fps is (10 sec: 15160.0, 60 sec: 14882.1, 300 sec: 15356.5). Total num frames: 360398848. Throughput: 0: 3643.4. Samples: 79263928. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 03:45:18,968][134211] Avg episode reward: [(0, '7.454')] [2025-01-04 03:45:21,026][134294] Updated weights for policy 0, policy_version 87994 (0.0027) [2025-01-04 03:45:23,752][134294] Updated weights for policy 0, policy_version 88004 (0.0017) [2025-01-04 03:45:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 15245.4). Total num frames: 360468480. Throughput: 0: 3416.3. Samples: 79282400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:23,968][134211] Avg episode reward: [(0, '8.097')] [2025-01-04 03:45:25,829][134294] Updated weights for policy 0, policy_version 88014 (0.0013) [2025-01-04 03:45:27,829][134294] Updated weights for policy 0, policy_version 88024 (0.0015) [2025-01-04 03:45:28,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15155.2, 300 sec: 15287.1). Total num frames: 360566784. Throughput: 0: 3700.9. Samples: 79311780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:28,968][134211] Avg episode reward: [(0, '8.399')] [2025-01-04 03:45:29,723][134294] Updated weights for policy 0, policy_version 88034 (0.0015) [2025-01-04 03:45:31,881][134294] Updated weights for policy 0, policy_version 88044 (0.0016) [2025-01-04 03:45:33,968][134211] Fps is (10 sec: 18431.8, 60 sec: 14813.8, 300 sec: 15356.5). Total num frames: 360652800. Throughput: 0: 3848.4. Samples: 79327194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:33,969][134211] Avg episode reward: [(0, '7.600')] [2025-01-04 03:45:35,268][134294] Updated weights for policy 0, policy_version 88054 (0.0030) [2025-01-04 03:45:38,564][134294] Updated weights for policy 0, policy_version 88064 (0.0026) [2025-01-04 03:45:38,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14540.8, 300 sec: 15342.6). Total num frames: 360714240. Throughput: 0: 3885.3. Samples: 79346312. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:38,968][134211] Avg episode reward: [(0, '7.570')] [2025-01-04 03:45:41,854][134294] Updated weights for policy 0, policy_version 88074 (0.0027) [2025-01-04 03:45:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14677.4, 300 sec: 15217.8). Total num frames: 360775680. Throughput: 0: 3678.3. Samples: 79365168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:43,968][134211] Avg episode reward: [(0, '7.858')] [2025-01-04 03:45:45,046][134294] Updated weights for policy 0, policy_version 88084 (0.0025) [2025-01-04 03:45:48,242][134294] Updated weights for policy 0, policy_version 88094 (0.0027) [2025-01-04 03:45:48,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14814.0, 300 sec: 15203.8). Total num frames: 360841216. Throughput: 0: 3644.5. Samples: 79374714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:48,969][134211] Avg episode reward: [(0, '7.787')] [2025-01-04 03:45:50,848][134294] Updated weights for policy 0, policy_version 88104 (0.0018) [2025-01-04 03:45:52,837][134294] Updated weights for policy 0, policy_version 88114 (0.0014) [2025-01-04 03:45:53,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15291.7, 300 sec: 15328.8). Total num frames: 360939520. Throughput: 0: 3749.5. Samples: 79398666. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:53,968][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 03:45:54,715][134294] Updated weights for policy 0, policy_version 88124 (0.0013) [2025-01-04 03:45:56,641][134294] Updated weights for policy 0, policy_version 88134 (0.0014) [2025-01-04 03:45:58,564][134294] Updated weights for policy 0, policy_version 88144 (0.0016) [2025-01-04 03:45:58,967][134211] Fps is (10 sec: 20481.7, 60 sec: 15496.6, 300 sec: 15342.6). Total num frames: 361046016. Throughput: 0: 3943.9. Samples: 79430728. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:45:58,968][134211] Avg episode reward: [(0, '7.145')] [2025-01-04 03:46:00,604][134294] Updated weights for policy 0, policy_version 88154 (0.0015) [2025-01-04 03:46:03,873][134294] Updated weights for policy 0, policy_version 88164 (0.0026) [2025-01-04 03:46:03,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15633.0, 300 sec: 15259.3). Total num frames: 361119744. Throughput: 0: 4031.2. Samples: 79445332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:46:03,968][134211] Avg episode reward: [(0, '7.064')] [2025-01-04 03:46:07,063][134294] Updated weights for policy 0, policy_version 88174 (0.0030) [2025-01-04 03:46:08,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15565.6, 300 sec: 15259.3). Total num frames: 361181184. Throughput: 0: 4027.7. Samples: 79463646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:46:08,968][134211] Avg episode reward: [(0, '7.995')] [2025-01-04 03:46:10,326][134294] Updated weights for policy 0, policy_version 88184 (0.0027) [2025-01-04 03:46:13,467][134294] Updated weights for policy 0, policy_version 88194 (0.0028) [2025-01-04 03:46:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.7, 300 sec: 15273.2). Total num frames: 361246720. Throughput: 0: 3805.8. Samples: 79483042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:46:13,968][134211] Avg episode reward: [(0, '7.895')] [2025-01-04 03:46:16,526][134294] Updated weights for policy 0, policy_version 88204 (0.0024) [2025-01-04 03:46:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.5, 300 sec: 15245.4). Total num frames: 361312256. Throughput: 0: 3684.3. Samples: 79492986. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:18,968][134211] Avg episode reward: [(0, '8.377')] [2025-01-04 03:46:19,703][134294] Updated weights for policy 0, policy_version 88214 (0.0027) [2025-01-04 03:46:22,766][134294] Updated weights for policy 0, policy_version 88224 (0.0021) [2025-01-04 03:46:23,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15155.1, 300 sec: 15106.6). Total num frames: 361377792. Throughput: 0: 3696.8. Samples: 79512670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:23,969][134211] Avg episode reward: [(0, '7.783')] [2025-01-04 03:46:25,813][134294] Updated weights for policy 0, policy_version 88234 (0.0026) [2025-01-04 03:46:28,786][134294] Updated weights for policy 0, policy_version 88244 (0.0025) [2025-01-04 03:46:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14995.5). Total num frames: 361447424. Throughput: 0: 3731.5. Samples: 79533086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:28,968][134211] Avg episode reward: [(0, '7.732')] [2025-01-04 03:46:31,781][134294] Updated weights for policy 0, policy_version 88254 (0.0024) [2025-01-04 03:46:33,751][134294] Updated weights for policy 0, policy_version 88264 (0.0013) [2025-01-04 03:46:33,968][134211] Fps is (10 sec: 15565.6, 60 sec: 14677.4, 300 sec: 15092.8). Total num frames: 361533440. Throughput: 0: 3743.6. Samples: 79543172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:33,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 03:46:35,589][134294] Updated weights for policy 0, policy_version 88274 (0.0013) [2025-01-04 03:46:37,504][134294] Updated weights for policy 0, policy_version 88284 (0.0015) [2025-01-04 03:46:38,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 361639936. Throughput: 0: 3898.0. Samples: 79574076. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:38,968][134211] Avg episode reward: [(0, '8.124')] [2025-01-04 03:46:39,540][134294] Updated weights for policy 0, policy_version 88294 (0.0014) [2025-01-04 03:46:42,630][134294] Updated weights for policy 0, policy_version 88304 (0.0026) [2025-01-04 03:46:43,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15564.8, 300 sec: 15231.6). Total num frames: 361709568. Throughput: 0: 3724.8. Samples: 79598344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:43,969][134211] Avg episode reward: [(0, '7.716')] [2025-01-04 03:46:45,764][134294] Updated weights for policy 0, policy_version 88314 (0.0029) [2025-01-04 03:46:48,866][134294] Updated weights for policy 0, policy_version 88324 (0.0028) [2025-01-04 03:46:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15565.0, 300 sec: 15231.6). Total num frames: 361775104. Throughput: 0: 3620.5. Samples: 79608252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:48,968][134211] Avg episode reward: [(0, '7.365')] [2025-01-04 03:46:52,076][134294] Updated weights for policy 0, policy_version 88334 (0.0026) [2025-01-04 03:46:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14950.4, 300 sec: 15231.6). Total num frames: 361836544. Throughput: 0: 3643.1. Samples: 79627584. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:53,968][134211] Avg episode reward: [(0, '7.927')] [2025-01-04 03:46:55,220][134294] Updated weights for policy 0, policy_version 88344 (0.0026) [2025-01-04 03:46:57,489][134294] Updated weights for policy 0, policy_version 88354 (0.0017) [2025-01-04 03:46:58,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14677.3, 300 sec: 15245.5). Total num frames: 361926656. Throughput: 0: 3737.3. Samples: 79651220. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:46:58,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 03:46:59,444][134294] Updated weights for policy 0, policy_version 88364 (0.0015) [2025-01-04 03:47:01,317][134294] Updated weights for policy 0, policy_version 88374 (0.0015) [2025-01-04 03:47:03,414][134294] Updated weights for policy 0, policy_version 88384 (0.0016) [2025-01-04 03:47:03,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15086.9, 300 sec: 15217.7). Total num frames: 362024960. Throughput: 0: 3875.4. Samples: 79667378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:47:03,968][134211] Avg episode reward: [(0, '7.383')] [2025-01-04 03:47:06,908][134294] Updated weights for policy 0, policy_version 88394 (0.0027) [2025-01-04 03:47:08,970][134211] Fps is (10 sec: 15561.3, 60 sec: 15018.1, 300 sec: 15078.7). Total num frames: 362082304. Throughput: 0: 3927.5. Samples: 79689416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:47:08,971][134211] Avg episode reward: [(0, '8.127')] [2025-01-04 03:47:10,544][134294] Updated weights for policy 0, policy_version 88404 (0.0028) [2025-01-04 03:47:13,718][134294] Updated weights for policy 0, policy_version 88414 (0.0027) [2025-01-04 03:47:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14950.4, 300 sec: 15078.8). Total num frames: 362143744. Throughput: 0: 3873.8. Samples: 79707408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:13,969][134211] Avg episode reward: [(0, '8.108')] [2025-01-04 03:47:14,038][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000088415_362147840.pth... 
[2025-01-04 03:47:14,108][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087528_358514688.pth [2025-01-04 03:47:15,999][134294] Updated weights for policy 0, policy_version 88424 (0.0015) [2025-01-04 03:47:17,850][134294] Updated weights for policy 0, policy_version 88434 (0.0013) [2025-01-04 03:47:18,968][134211] Fps is (10 sec: 16387.7, 60 sec: 15564.8, 300 sec: 15189.9). Total num frames: 362246144. Throughput: 0: 3939.8. Samples: 79720464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:18,968][134211] Avg episode reward: [(0, '8.273')] [2025-01-04 03:47:19,873][134294] Updated weights for policy 0, policy_version 88444 (0.0012) [2025-01-04 03:47:21,957][134294] Updated weights for policy 0, policy_version 88454 (0.0012) [2025-01-04 03:47:23,968][134211] Fps is (10 sec: 19661.3, 60 sec: 16042.8, 300 sec: 15301.0). Total num frames: 362340352. Throughput: 0: 3935.2. Samples: 79751160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:23,968][134211] Avg episode reward: [(0, '8.290')] [2025-01-04 03:47:24,367][134294] Updated weights for policy 0, policy_version 88464 (0.0014) [2025-01-04 03:47:27,636][134294] Updated weights for policy 0, policy_version 88474 (0.0031) [2025-01-04 03:47:28,968][134211] Fps is (10 sec: 15563.4, 60 sec: 15905.9, 300 sec: 15301.0). Total num frames: 362401792. Throughput: 0: 3864.9. Samples: 79772266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:28,969][134211] Avg episode reward: [(0, '7.548')] [2025-01-04 03:47:30,798][134294] Updated weights for policy 0, policy_version 88484 (0.0026) [2025-01-04 03:47:33,737][134294] Updated weights for policy 0, policy_version 88494 (0.0024) [2025-01-04 03:47:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15633.0, 300 sec: 15315.0). Total num frames: 362471424. Throughput: 0: 3866.4. Samples: 79782242. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:33,968][134211] Avg episode reward: [(0, '7.890')] [2025-01-04 03:47:36,866][134294] Updated weights for policy 0, policy_version 88504 (0.0028) [2025-01-04 03:47:38,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14950.3, 300 sec: 15176.0). Total num frames: 362536960. Throughput: 0: 3883.5. Samples: 79802342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:38,968][134211] Avg episode reward: [(0, '7.968')] [2025-01-04 03:47:40,092][134294] Updated weights for policy 0, policy_version 88514 (0.0026) [2025-01-04 03:47:43,030][134294] Updated weights for policy 0, policy_version 88524 (0.0026) [2025-01-04 03:47:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 15092.7). Total num frames: 362602496. Throughput: 0: 3797.7. Samples: 79822116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:43,969][134211] Avg episode reward: [(0, '7.300')] [2025-01-04 03:47:46,070][134294] Updated weights for policy 0, policy_version 88534 (0.0024) [2025-01-04 03:47:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 15134.4). Total num frames: 362672128. Throughput: 0: 3663.1. Samples: 79832216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:48,968][134211] Avg episode reward: [(0, '7.732')] [2025-01-04 03:47:49,169][134294] Updated weights for policy 0, policy_version 88544 (0.0025) [2025-01-04 03:47:51,860][134294] Updated weights for policy 0, policy_version 88554 (0.0018) [2025-01-04 03:47:53,771][134294] Updated weights for policy 0, policy_version 88564 (0.0015) [2025-01-04 03:47:53,968][134211] Fps is (10 sec: 15565.2, 60 sec: 15360.1, 300 sec: 15217.7). Total num frames: 362758144. Throughput: 0: 3662.1. Samples: 79854204. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:53,968][134211] Avg episode reward: [(0, '8.117')] [2025-01-04 03:47:55,665][134294] Updated weights for policy 0, policy_version 88574 (0.0013) [2025-01-04 03:47:57,530][134294] Updated weights for policy 0, policy_version 88584 (0.0013) [2025-01-04 03:47:58,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15701.2, 300 sec: 15314.9). Total num frames: 362868736. Throughput: 0: 3981.4. Samples: 79886572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:47:58,968][134211] Avg episode reward: [(0, '7.524')] [2025-01-04 03:47:59,678][134294] Updated weights for policy 0, policy_version 88594 (0.0016) [2025-01-04 03:48:03,009][134294] Updated weights for policy 0, policy_version 88604 (0.0026) [2025-01-04 03:48:03,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15087.0, 300 sec: 15287.1). Total num frames: 362930176. Throughput: 0: 3950.8. Samples: 79898252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:48:03,968][134211] Avg episode reward: [(0, '8.156')] [2025-01-04 03:48:06,143][134294] Updated weights for policy 0, policy_version 88614 (0.0026) [2025-01-04 03:48:08,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15224.0, 300 sec: 15189.9). Total num frames: 362995712. Throughput: 0: 3691.9. Samples: 79917298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:08,968][134211] Avg episode reward: [(0, '7.563')] [2025-01-04 03:48:09,345][134294] Updated weights for policy 0, policy_version 88624 (0.0025) [2025-01-04 03:48:12,441][134294] Updated weights for policy 0, policy_version 88634 (0.0028) [2025-01-04 03:48:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.8, 300 sec: 15078.8). Total num frames: 363061248. Throughput: 0: 3657.8. Samples: 79936866. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:13,968][134211] Avg episode reward: [(0, '7.820')] [2025-01-04 03:48:15,516][134294] Updated weights for policy 0, policy_version 88644 (0.0025) [2025-01-04 03:48:18,206][134294] Updated weights for policy 0, policy_version 88654 (0.0022) [2025-01-04 03:48:18,967][134211] Fps is (10 sec: 14336.5, 60 sec: 14882.2, 300 sec: 15106.6). Total num frames: 363139072. Throughput: 0: 3661.9. Samples: 79947028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:18,968][134211] Avg episode reward: [(0, '7.533')] [2025-01-04 03:48:20,130][134294] Updated weights for policy 0, policy_version 88664 (0.0012) [2025-01-04 03:48:21,979][134294] Updated weights for policy 0, policy_version 88674 (0.0014) [2025-01-04 03:48:23,879][134294] Updated weights for policy 0, policy_version 88684 (0.0013) [2025-01-04 03:48:23,967][134211] Fps is (10 sec: 18841.9, 60 sec: 15155.2, 300 sec: 15287.1). Total num frames: 363249664. Throughput: 0: 3870.2. Samples: 79976502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:23,968][134211] Avg episode reward: [(0, '7.409')] [2025-01-04 03:48:25,794][134294] Updated weights for policy 0, policy_version 88694 (0.0013) [2025-01-04 03:48:27,673][134294] Updated weights for policy 0, policy_version 88704 (0.0012) [2025-01-04 03:48:28,967][134211] Fps is (10 sec: 21708.7, 60 sec: 15906.4, 300 sec: 15439.9). Total num frames: 363356160. Throughput: 0: 4151.7. Samples: 80008942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:28,968][134211] Avg episode reward: [(0, '7.760')] [2025-01-04 03:48:29,819][134294] Updated weights for policy 0, policy_version 88714 (0.0017) [2025-01-04 03:48:32,932][134294] Updated weights for policy 0, policy_version 88724 (0.0027) [2025-01-04 03:48:33,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15837.9, 300 sec: 15412.1). Total num frames: 363421696. Throughput: 0: 4191.4. Samples: 80020828. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:33,968][134211] Avg episode reward: [(0, '7.263')] [2025-01-04 03:48:36,221][134294] Updated weights for policy 0, policy_version 88734 (0.0029) [2025-01-04 03:48:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15837.9, 300 sec: 15301.0). Total num frames: 363487232. Throughput: 0: 4127.8. Samples: 80039954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:38,968][134211] Avg episode reward: [(0, '7.332')] [2025-01-04 03:48:39,396][134294] Updated weights for policy 0, policy_version 88744 (0.0025) [2025-01-04 03:48:42,571][134294] Updated weights for policy 0, policy_version 88754 (0.0025) [2025-01-04 03:48:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15837.9, 300 sec: 15301.0). Total num frames: 363552768. Throughput: 0: 3839.5. Samples: 80059346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:43,968][134211] Avg episode reward: [(0, '8.021')] [2025-01-04 03:48:45,654][134294] Updated weights for policy 0, policy_version 88764 (0.0028) [2025-01-04 03:48:48,788][134294] Updated weights for policy 0, policy_version 88774 (0.0026) [2025-01-04 03:48:48,970][134211] Fps is (10 sec: 13104.3, 60 sec: 15769.1, 300 sec: 15300.9). Total num frames: 363618304. Throughput: 0: 3798.4. Samples: 80069190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:48,970][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 03:48:52,013][134294] Updated weights for policy 0, policy_version 88784 (0.0026) [2025-01-04 03:48:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15428.2, 300 sec: 15231.6). Total num frames: 363683840. Throughput: 0: 3805.8. Samples: 80088560. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:48:53,968][134211] Avg episode reward: [(0, '7.042')] [2025-01-04 03:48:55,165][134294] Updated weights for policy 0, policy_version 88794 (0.0026) [2025-01-04 03:48:58,276][134294] Updated weights for policy 0, policy_version 88804 (0.0026) [2025-01-04 03:48:58,968][134211] Fps is (10 sec: 13110.2, 60 sec: 14677.5, 300 sec: 15231.6). Total num frames: 363749376. Throughput: 0: 3806.4. Samples: 80108154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:48:58,968][134211] Avg episode reward: [(0, '7.430')] [2025-01-04 03:49:00,472][134294] Updated weights for policy 0, policy_version 88814 (0.0014) [2025-01-04 03:49:02,335][134294] Updated weights for policy 0, policy_version 88824 (0.0013) [2025-01-04 03:49:03,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15428.3, 300 sec: 15245.4). Total num frames: 363855872. Throughput: 0: 3905.2. Samples: 80122764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:03,968][134211] Avg episode reward: [(0, '7.058')] [2025-01-04 03:49:04,253][134294] Updated weights for policy 0, policy_version 88834 (0.0014) [2025-01-04 03:49:06,107][134294] Updated weights for policy 0, policy_version 88844 (0.0013) [2025-01-04 03:49:08,039][134294] Updated weights for policy 0, policy_version 88854 (0.0014) [2025-01-04 03:49:08,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16042.7, 300 sec: 15314.9). Total num frames: 363958272. Throughput: 0: 3971.8. Samples: 80155236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:08,968][134211] Avg episode reward: [(0, '7.329')] [2025-01-04 03:49:10,887][134294] Updated weights for policy 0, policy_version 88864 (0.0026) [2025-01-04 03:49:13,968][134211] Fps is (10 sec: 16793.2, 60 sec: 16042.6, 300 sec: 15314.9). Total num frames: 364023808. Throughput: 0: 3745.6. Samples: 80177496. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:13,968][134211] Avg episode reward: [(0, '7.665')] [2025-01-04 03:49:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000088874_364027904.pth... [2025-01-04 03:49:13,990][134294] Updated weights for policy 0, policy_version 88874 (0.0030) [2025-01-04 03:49:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000087972_360333312.pth [2025-01-04 03:49:17,331][134294] Updated weights for policy 0, policy_version 88884 (0.0027) [2025-01-04 03:49:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15769.6, 300 sec: 15314.9). Total num frames: 364085248. Throughput: 0: 3687.4. Samples: 80186762. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:18,968][134211] Avg episode reward: [(0, '7.594')] [2025-01-04 03:49:20,797][134294] Updated weights for policy 0, policy_version 88894 (0.0029) [2025-01-04 03:49:23,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14950.4, 300 sec: 15217.7). Total num frames: 364146688. Throughput: 0: 3663.1. Samples: 80204794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:23,968][134211] Avg episode reward: [(0, '7.334')] [2025-01-04 03:49:24,271][134294] Updated weights for policy 0, policy_version 88904 (0.0027) [2025-01-04 03:49:27,166][134294] Updated weights for policy 0, policy_version 88914 (0.0021) [2025-01-04 03:49:28,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14540.8, 300 sec: 15134.4). Total num frames: 364228608. Throughput: 0: 3712.0. Samples: 80226384. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:28,968][134211] Avg episode reward: [(0, '7.692')] [2025-01-04 03:49:29,110][134294] Updated weights for policy 0, policy_version 88924 (0.0013) [2025-01-04 03:49:30,978][134294] Updated weights for policy 0, policy_version 88934 (0.0012) [2025-01-04 03:49:32,877][134294] Updated weights for policy 0, policy_version 88944 (0.0013) [2025-01-04 03:49:33,967][134211] Fps is (10 sec: 18842.0, 60 sec: 15223.5, 300 sec: 15231.6). Total num frames: 364335104. Throughput: 0: 3855.0. Samples: 80242654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:33,968][134211] Avg episode reward: [(0, '7.878')] [2025-01-04 03:49:34,783][134294] Updated weights for policy 0, policy_version 88954 (0.0013) [2025-01-04 03:49:36,662][134294] Updated weights for policy 0, policy_version 88964 (0.0016) [2025-01-04 03:49:38,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15701.3, 300 sec: 15370.4). Total num frames: 364429312. Throughput: 0: 4132.2. Samples: 80274508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:38,968][134211] Avg episode reward: [(0, '7.876')] [2025-01-04 03:49:39,615][134294] Updated weights for policy 0, policy_version 88974 (0.0024) [2025-01-04 03:49:42,977][134294] Updated weights for policy 0, policy_version 88984 (0.0026) [2025-01-04 03:49:43,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15633.0, 300 sec: 15384.3). Total num frames: 364490752. Throughput: 0: 4117.1. Samples: 80293426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:43,968][134211] Avg episode reward: [(0, '7.725')] [2025-01-04 03:49:46,048][134294] Updated weights for policy 0, policy_version 88994 (0.0029) [2025-01-04 03:49:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15633.6, 300 sec: 15370.4). Total num frames: 364556288. Throughput: 0: 4013.1. Samples: 80303356. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 03:49:48,968][134211] Avg episode reward: [(0, '7.622')] [2025-01-04 03:49:49,277][134294] Updated weights for policy 0, policy_version 89004 (0.0024) [2025-01-04 03:49:52,297][134294] Updated weights for policy 0, policy_version 89014 (0.0027) [2025-01-04 03:49:53,969][134211] Fps is (10 sec: 13106.2, 60 sec: 15632.9, 300 sec: 15273.2). Total num frames: 364621824. Throughput: 0: 3730.2. Samples: 80323096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:49:53,969][134211] Avg episode reward: [(0, '7.490')] [2025-01-04 03:49:55,337][134294] Updated weights for policy 0, policy_version 89024 (0.0023) [2025-01-04 03:49:58,333][134294] Updated weights for policy 0, policy_version 89034 (0.0022) [2025-01-04 03:49:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15633.0, 300 sec: 15273.2). Total num frames: 364687360. Throughput: 0: 3686.5. Samples: 80343390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:49:58,968][134211] Avg episode reward: [(0, '7.661')] [2025-01-04 03:50:01,654][134294] Updated weights for policy 0, policy_version 89044 (0.0025) [2025-01-04 03:50:03,968][134211] Fps is (10 sec: 12698.8, 60 sec: 14882.1, 300 sec: 15259.5). Total num frames: 364748800. Throughput: 0: 3691.4. Samples: 80352874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:03,968][134211] Avg episode reward: [(0, '7.851')] [2025-01-04 03:50:05,102][134294] Updated weights for policy 0, policy_version 89054 (0.0022) [2025-01-04 03:50:07,120][134294] Updated weights for policy 0, policy_version 89064 (0.0014) [2025-01-04 03:50:08,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14745.7, 300 sec: 15301.0). Total num frames: 364843008. Throughput: 0: 3787.5. Samples: 80375232. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:08,968][134211] Avg episode reward: [(0, '7.046')] [2025-01-04 03:50:08,979][134294] Updated weights for policy 0, policy_version 89074 (0.0012) [2025-01-04 03:50:10,938][134294] Updated weights for policy 0, policy_version 89084 (0.0012) [2025-01-04 03:50:13,601][134294] Updated weights for policy 0, policy_version 89094 (0.0023) [2025-01-04 03:50:13,971][134211] Fps is (10 sec: 18426.0, 60 sec: 15154.4, 300 sec: 15370.2). Total num frames: 364933120. Throughput: 0: 3950.1. Samples: 80404152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:13,972][134211] Avg episode reward: [(0, '7.587')] [2025-01-04 03:50:16,864][134294] Updated weights for policy 0, policy_version 89104 (0.0028) [2025-01-04 03:50:18,968][134211] Fps is (10 sec: 15154.6, 60 sec: 15155.2, 300 sec: 15342.6). Total num frames: 364994560. Throughput: 0: 3800.8. Samples: 80413690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:18,969][134211] Avg episode reward: [(0, '7.569')] [2025-01-04 03:50:20,087][134294] Updated weights for policy 0, policy_version 89114 (0.0029) [2025-01-04 03:50:23,278][134294] Updated weights for policy 0, policy_version 89124 (0.0021) [2025-01-04 03:50:23,968][134211] Fps is (10 sec: 13111.5, 60 sec: 15291.8, 300 sec: 15245.4). Total num frames: 365064192. Throughput: 0: 3508.8. Samples: 80432404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:23,968][134211] Avg episode reward: [(0, '7.650')] [2025-01-04 03:50:25,148][134294] Updated weights for policy 0, policy_version 89134 (0.0015) [2025-01-04 03:50:27,027][134294] Updated weights for policy 0, policy_version 89144 (0.0016) [2025-01-04 03:50:28,917][134294] Updated weights for policy 0, policy_version 89154 (0.0014) [2025-01-04 03:50:28,968][134211] Fps is (10 sec: 18022.6, 60 sec: 15769.5, 300 sec: 15328.8). Total num frames: 365174784. Throughput: 0: 3764.1. Samples: 80462812. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:28,968][134211] Avg episode reward: [(0, '8.184')] [2025-01-04 03:50:30,908][134294] Updated weights for policy 0, policy_version 89164 (0.0015) [2025-01-04 03:50:33,871][134294] Updated weights for policy 0, policy_version 89174 (0.0026) [2025-01-04 03:50:33,968][134211] Fps is (10 sec: 19250.8, 60 sec: 15359.9, 300 sec: 15398.2). Total num frames: 365256704. Throughput: 0: 3889.6. Samples: 80478388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:33,968][134211] Avg episode reward: [(0, '7.754')] [2025-01-04 03:50:37,046][134294] Updated weights for policy 0, policy_version 89184 (0.0028) [2025-01-04 03:50:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14813.9, 300 sec: 15398.2). Total num frames: 365318144. Throughput: 0: 3880.9. Samples: 80497732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:38,969][134211] Avg episode reward: [(0, '7.060')] [2025-01-04 03:50:40,335][134294] Updated weights for policy 0, policy_version 89194 (0.0024) [2025-01-04 03:50:43,416][134294] Updated weights for policy 0, policy_version 89204 (0.0023) [2025-01-04 03:50:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.2, 300 sec: 15398.2). Total num frames: 365383680. Throughput: 0: 3868.8. Samples: 80517484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:50:43,968][134211] Avg episode reward: [(0, '7.355')] [2025-01-04 03:50:46,401][134294] Updated weights for policy 0, policy_version 89214 (0.0027) [2025-01-04 03:50:48,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14950.4, 300 sec: 15301.0). Total num frames: 365453312. Throughput: 0: 3885.2. Samples: 80527710. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:50:48,968][134211] Avg episode reward: [(0, '7.626')] [2025-01-04 03:50:49,464][134294] Updated weights for policy 0, policy_version 89224 (0.0026) [2025-01-04 03:50:52,428][134294] Updated weights for policy 0, policy_version 89234 (0.0026) [2025-01-04 03:50:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15087.2, 300 sec: 15189.9). Total num frames: 365527040. Throughput: 0: 3835.3. Samples: 80547820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:50:53,968][134211] Avg episode reward: [(0, '7.442')] [2025-01-04 03:50:54,550][134294] Updated weights for policy 0, policy_version 89244 (0.0012) [2025-01-04 03:50:56,479][134294] Updated weights for policy 0, policy_version 89254 (0.0013) [2025-01-04 03:50:58,356][134294] Updated weights for policy 0, policy_version 89264 (0.0013) [2025-01-04 03:50:58,967][134211] Fps is (10 sec: 18432.2, 60 sec: 15837.9, 300 sec: 15314.9). Total num frames: 365637632. Throughput: 0: 3876.3. Samples: 80578572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:50:58,968][134211] Avg episode reward: [(0, '6.669')] [2025-01-04 03:51:00,811][134294] Updated weights for policy 0, policy_version 89274 (0.0019) [2025-01-04 03:51:03,925][134294] Updated weights for policy 0, policy_version 89284 (0.0026) [2025-01-04 03:51:03,968][134211] Fps is (10 sec: 18022.3, 60 sec: 15974.4, 300 sec: 15342.6). Total num frames: 365707264. Throughput: 0: 3942.2. Samples: 80591090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:03,968][134211] Avg episode reward: [(0, '7.219')] [2025-01-04 03:51:06,993][134294] Updated weights for policy 0, policy_version 89294 (0.0025) [2025-01-04 03:51:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15428.2, 300 sec: 15328.8). Total num frames: 365768704. Throughput: 0: 3964.7. Samples: 80610814. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:08,968][134211] Avg episode reward: [(0, '7.699')] [2025-01-04 03:51:10,145][134294] Updated weights for policy 0, policy_version 89304 (0.0025) [2025-01-04 03:51:13,255][134294] Updated weights for policy 0, policy_version 89314 (0.0027) [2025-01-04 03:51:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15087.7, 300 sec: 15342.6). Total num frames: 365838336. Throughput: 0: 3730.5. Samples: 80630684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:13,969][134211] Avg episode reward: [(0, '7.780')] [2025-01-04 03:51:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000089316_365838336.pth... [2025-01-04 03:51:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000088415_362147840.pth [2025-01-04 03:51:15,880][134294] Updated weights for policy 0, policy_version 89324 (0.0021) [2025-01-04 03:51:17,775][134294] Updated weights for policy 0, policy_version 89334 (0.0014) [2025-01-04 03:51:18,967][134211] Fps is (10 sec: 16794.0, 60 sec: 15701.4, 300 sec: 15453.8). Total num frames: 365936640. Throughput: 0: 3649.4. Samples: 80642608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:18,968][134211] Avg episode reward: [(0, '7.528')] [2025-01-04 03:51:19,654][134294] Updated weights for policy 0, policy_version 89344 (0.0014) [2025-01-04 03:51:21,776][134294] Updated weights for policy 0, policy_version 89354 (0.0014) [2025-01-04 03:51:23,968][134211] Fps is (10 sec: 18841.0, 60 sec: 16042.5, 300 sec: 15523.1). Total num frames: 366026752. Throughput: 0: 3912.8. Samples: 80673810. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:23,969][134211] Avg episode reward: [(0, '8.378')] [2025-01-04 03:51:24,589][134294] Updated weights for policy 0, policy_version 89364 (0.0023) [2025-01-04 03:51:28,177][134294] Updated weights for policy 0, policy_version 89374 (0.0032) [2025-01-04 03:51:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15087.0, 300 sec: 15412.1). Total num frames: 366080000. Throughput: 0: 3874.4. Samples: 80691830. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:28,969][134211] Avg episode reward: [(0, '6.764')] [2025-01-04 03:51:31,418][134294] Updated weights for policy 0, policy_version 89384 (0.0030) [2025-01-04 03:51:33,809][134294] Updated weights for policy 0, policy_version 89394 (0.0017) [2025-01-04 03:51:33,968][134211] Fps is (10 sec: 13108.0, 60 sec: 15018.7, 300 sec: 15314.9). Total num frames: 366157824. Throughput: 0: 3861.2. Samples: 80701462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:33,968][134211] Avg episode reward: [(0, '8.079')] [2025-01-04 03:51:36,420][134294] Updated weights for policy 0, policy_version 89404 (0.0021) [2025-01-04 03:51:38,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15223.5, 300 sec: 15328.8). Total num frames: 366231552. Throughput: 0: 3953.4. Samples: 80725724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:38,968][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 03:51:39,413][134294] Updated weights for policy 0, policy_version 89414 (0.0025) [2025-01-04 03:51:42,499][134294] Updated weights for policy 0, policy_version 89424 (0.0027) [2025-01-04 03:51:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15428.3, 300 sec: 15370.4). Total num frames: 366309376. Throughput: 0: 3733.2. Samples: 80746568. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:43,968][134211] Avg episode reward: [(0, '8.516')] [2025-01-04 03:51:44,479][134294] Updated weights for policy 0, policy_version 89434 (0.0015) [2025-01-04 03:51:47,155][134294] Updated weights for policy 0, policy_version 89444 (0.0023) [2025-01-04 03:51:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15496.5, 300 sec: 15412.1). Total num frames: 366383104. Throughput: 0: 3763.7. Samples: 80760456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:48,968][134211] Avg episode reward: [(0, '7.288')] [2025-01-04 03:51:50,321][134294] Updated weights for policy 0, policy_version 89454 (0.0025) [2025-01-04 03:51:53,638][134294] Updated weights for policy 0, policy_version 89464 (0.0026) [2025-01-04 03:51:53,970][134211] Fps is (10 sec: 13514.0, 60 sec: 15291.2, 300 sec: 15314.8). Total num frames: 366444544. Throughput: 0: 3756.2. Samples: 80779852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:53,970][134211] Avg episode reward: [(0, '7.763')] [2025-01-04 03:51:55,789][134294] Updated weights for policy 0, policy_version 89474 (0.0012) [2025-01-04 03:51:57,794][134294] Updated weights for policy 0, policy_version 89484 (0.0013) [2025-01-04 03:51:58,967][134211] Fps is (10 sec: 16384.2, 60 sec: 15155.2, 300 sec: 15328.8). Total num frames: 366546944. Throughput: 0: 3913.5. Samples: 80806790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:51:58,968][134211] Avg episode reward: [(0, '8.496')] [2025-01-04 03:51:59,800][134294] Updated weights for policy 0, policy_version 89494 (0.0012) [2025-01-04 03:52:01,808][134294] Updated weights for policy 0, policy_version 89504 (0.0013) [2025-01-04 03:52:03,887][134294] Updated weights for policy 0, policy_version 89514 (0.0014) [2025-01-04 03:52:03,968][134211] Fps is (10 sec: 20484.2, 60 sec: 15701.3, 300 sec: 15481.6). Total num frames: 366649344. Throughput: 0: 3986.6. Samples: 80822006. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:03,968][134211] Avg episode reward: [(0, '7.997')] [2025-01-04 03:52:07,023][134294] Updated weights for policy 0, policy_version 89524 (0.0022) [2025-01-04 03:52:08,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15633.1, 300 sec: 15467.6). Total num frames: 366706688. Throughput: 0: 3826.4. Samples: 80845994. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:08,968][134211] Avg episode reward: [(0, '7.528')] [2025-01-04 03:52:10,934][134294] Updated weights for policy 0, policy_version 89534 (0.0027) [2025-01-04 03:52:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 366764032. Throughput: 0: 3792.6. Samples: 80862496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:13,968][134211] Avg episode reward: [(0, '6.448')] [2025-01-04 03:52:14,625][134294] Updated weights for policy 0, policy_version 89544 (0.0027) [2025-01-04 03:52:17,891][134294] Updated weights for policy 0, policy_version 89554 (0.0025) [2025-01-04 03:52:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 15217.7). Total num frames: 366829568. Throughput: 0: 3763.3. Samples: 80870810. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:18,968][134211] Avg episode reward: [(0, '7.596')] [2025-01-04 03:52:19,984][134294] Updated weights for policy 0, policy_version 89564 (0.0014) [2025-01-04 03:52:22,541][134294] Updated weights for policy 0, policy_version 89574 (0.0020) [2025-01-04 03:52:23,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.7, 300 sec: 15287.1). Total num frames: 366911488. Throughput: 0: 3782.0. Samples: 80895914. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:23,969][134211] Avg episode reward: [(0, '6.903')] [2025-01-04 03:52:25,820][134294] Updated weights for policy 0, policy_version 89584 (0.0026) [2025-01-04 03:52:28,915][134294] Updated weights for policy 0, policy_version 89594 (0.0026) [2025-01-04 03:52:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14950.4, 300 sec: 15273.2). Total num frames: 366977024. Throughput: 0: 3745.9. Samples: 80915134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:28,968][134211] Avg episode reward: [(0, '7.474')] [2025-01-04 03:52:32,098][134294] Updated weights for policy 0, policy_version 89604 (0.0025) [2025-01-04 03:52:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.6, 300 sec: 15273.2). Total num frames: 367042560. Throughput: 0: 3648.8. Samples: 80924652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:52:33,968][134211] Avg episode reward: [(0, '7.450')] [2025-01-04 03:52:35,006][134294] Updated weights for policy 0, policy_version 89614 (0.0025) [2025-01-04 03:52:36,968][134294] Updated weights for policy 0, policy_version 89624 (0.0014) [2025-01-04 03:52:38,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14950.4, 300 sec: 15342.6). Total num frames: 367128576. Throughput: 0: 3760.0. Samples: 80949044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:52:38,968][134211] Avg episode reward: [(0, '8.248')] [2025-01-04 03:52:39,759][134294] Updated weights for policy 0, policy_version 89634 (0.0023) [2025-01-04 03:52:42,753][134294] Updated weights for policy 0, policy_version 89644 (0.0023) [2025-01-04 03:52:43,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14745.6, 300 sec: 15328.8). Total num frames: 367194112. Throughput: 0: 3625.8. Samples: 80969950. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:52:43,968][134211] Avg episode reward: [(0, '7.787')] [2025-01-04 03:52:45,798][134294] Updated weights for policy 0, policy_version 89654 (0.0024) [2025-01-04 03:52:48,482][134294] Updated weights for policy 0, policy_version 89664 (0.0022) [2025-01-04 03:52:48,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14813.9, 300 sec: 15301.0). Total num frames: 367271936. Throughput: 0: 3516.4. Samples: 80980244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:52:48,968][134211] Avg episode reward: [(0, '8.832')] [2025-01-04 03:52:50,404][134294] Updated weights for policy 0, policy_version 89674 (0.0014) [2025-01-04 03:52:52,533][134294] Updated weights for policy 0, policy_version 89684 (0.0012) [2025-01-04 03:52:53,967][134211] Fps is (10 sec: 17613.1, 60 sec: 15428.8, 300 sec: 15259.4). Total num frames: 367370240. Throughput: 0: 3601.6. Samples: 81008064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:52:53,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 03:52:54,633][134294] Updated weights for policy 0, policy_version 89694 (0.0013) [2025-01-04 03:52:56,753][134294] Updated weights for policy 0, policy_version 89704 (0.0014) [2025-01-04 03:52:58,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15086.9, 300 sec: 15328.8). Total num frames: 367452160. Throughput: 0: 3828.2. Samples: 81034766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:52:58,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 03:53:00,220][134294] Updated weights for policy 0, policy_version 89714 (0.0025) [2025-01-04 03:53:03,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14267.6, 300 sec: 15287.1). Total num frames: 367505408. Throughput: 0: 3822.3. Samples: 81042816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:03,969][134211] Avg episode reward: [(0, '7.707')] [2025-01-04 03:53:04,067][134294] Updated weights for policy 0, policy_version 89724 (0.0028) [2025-01-04 03:53:07,760][134294] Updated weights for policy 0, policy_version 89734 (0.0026) [2025-01-04 03:53:08,967][134211] Fps is (10 sec: 11469.1, 60 sec: 14336.1, 300 sec: 15273.2). Total num frames: 367566848. Throughput: 0: 3633.2. Samples: 81059408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:08,968][134211] Avg episode reward: [(0, '8.155')] [2025-01-04 03:53:10,102][134294] Updated weights for policy 0, policy_version 89744 (0.0014) [2025-01-04 03:53:12,218][134294] Updated weights for policy 0, policy_version 89754 (0.0014) [2025-01-04 03:53:13,968][134211] Fps is (10 sec: 15565.4, 60 sec: 14950.4, 300 sec: 15328.8). Total num frames: 367661056. Throughput: 0: 3789.2. Samples: 81085648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:13,968][134211] Avg episode reward: [(0, '7.960')] [2025-01-04 03:53:13,992][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000089762_367665152.pth... [2025-01-04 03:53:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000088874_364027904.pth [2025-01-04 03:53:14,534][134294] Updated weights for policy 0, policy_version 89764 (0.0013) [2025-01-04 03:53:17,993][134294] Updated weights for policy 0, policy_version 89774 (0.0027) [2025-01-04 03:53:18,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14882.1, 300 sec: 15162.1). Total num frames: 367722496. Throughput: 0: 3823.9. Samples: 81096726. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:18,968][134211] Avg episode reward: [(0, '7.518')] [2025-01-04 03:53:21,803][134294] Updated weights for policy 0, policy_version 89784 (0.0031) [2025-01-04 03:53:23,968][134211] Fps is (10 sec: 11468.5, 60 sec: 14404.3, 300 sec: 14981.6). Total num frames: 367775744. Throughput: 0: 3643.6. Samples: 81113006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:23,969][134211] Avg episode reward: [(0, '8.179')] [2025-01-04 03:53:25,597][134294] Updated weights for policy 0, policy_version 89794 (0.0025) [2025-01-04 03:53:27,779][134294] Updated weights for policy 0, policy_version 89804 (0.0013) [2025-01-04 03:53:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14677.4, 300 sec: 15037.2). Total num frames: 367857664. Throughput: 0: 3653.4. Samples: 81134352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:28,968][134211] Avg episode reward: [(0, '8.238')] [2025-01-04 03:53:29,957][134294] Updated weights for policy 0, policy_version 89814 (0.0014) [2025-01-04 03:53:32,931][134294] Updated weights for policy 0, policy_version 89824 (0.0022) [2025-01-04 03:53:33,969][134211] Fps is (10 sec: 15154.2, 60 sec: 14745.4, 300 sec: 15051.0). Total num frames: 367927296. Throughput: 0: 3725.8. Samples: 81147910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:33,969][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 03:53:36,871][134294] Updated weights for policy 0, policy_version 89834 (0.0031) [2025-01-04 03:53:38,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14472.3, 300 sec: 15064.9). Total num frames: 367996928. Throughput: 0: 3474.8. Samples: 81164436. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:38,969][134211] Avg episode reward: [(0, '7.230')] [2025-01-04 03:53:39,138][134294] Updated weights for policy 0, policy_version 89844 (0.0015) [2025-01-04 03:53:41,291][134294] Updated weights for policy 0, policy_version 89854 (0.0013) [2025-01-04 03:53:43,551][134294] Updated weights for policy 0, policy_version 89864 (0.0013) [2025-01-04 03:53:43,968][134211] Fps is (10 sec: 15975.5, 60 sec: 14882.1, 300 sec: 15148.4). Total num frames: 368087040. Throughput: 0: 3509.9. Samples: 81192714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:43,968][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 03:53:47,355][134294] Updated weights for policy 0, policy_version 89874 (0.0025) [2025-01-04 03:53:48,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14472.4, 300 sec: 15106.6). Total num frames: 368140288. Throughput: 0: 3522.5. Samples: 81201328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:48,969][134211] Avg episode reward: [(0, '8.163')] [2025-01-04 03:53:51,076][134294] Updated weights for policy 0, policy_version 89884 (0.0030) [2025-01-04 03:53:53,968][134211] Fps is (10 sec: 10649.4, 60 sec: 13721.5, 300 sec: 15064.9). Total num frames: 368193536. Throughput: 0: 3511.3. Samples: 81217420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:53,969][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 03:53:54,615][134294] Updated weights for policy 0, policy_version 89894 (0.0026) [2025-01-04 03:53:56,636][134294] Updated weights for policy 0, policy_version 89904 (0.0014) [2025-01-04 03:53:58,688][134294] Updated weights for policy 0, policy_version 89914 (0.0014) [2025-01-04 03:53:58,968][134211] Fps is (10 sec: 15155.8, 60 sec: 13994.7, 300 sec: 15037.2). Total num frames: 368291840. Throughput: 0: 3487.2. Samples: 81242574. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:53:58,968][134211] Avg episode reward: [(0, '7.530')] [2025-01-04 03:54:00,773][134294] Updated weights for policy 0, policy_version 89924 (0.0015) [2025-01-04 03:54:02,835][134294] Updated weights for policy 0, policy_version 89934 (0.0014) [2025-01-04 03:54:03,968][134211] Fps is (10 sec: 19661.2, 60 sec: 14745.6, 300 sec: 15023.3). Total num frames: 368390144. Throughput: 0: 3574.1. Samples: 81257562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:54:03,968][134211] Avg episode reward: [(0, '7.985')] [2025-01-04 03:54:05,079][134294] Updated weights for policy 0, policy_version 89944 (0.0016) [2025-01-04 03:54:08,538][134294] Updated weights for policy 0, policy_version 89954 (0.0029) [2025-01-04 03:54:08,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14745.5, 300 sec: 15009.4). Total num frames: 368451584. Throughput: 0: 3759.8. Samples: 81282198. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:54:08,968][134211] Avg episode reward: [(0, '7.494')] [2025-01-04 03:54:12,340][134294] Updated weights for policy 0, policy_version 89964 (0.0028) [2025-01-04 03:54:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14131.1, 300 sec: 14995.5). Total num frames: 368508928. Throughput: 0: 3649.0. Samples: 81298560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:54:13,969][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 03:54:15,876][134294] Updated weights for policy 0, policy_version 89974 (0.0030) [2025-01-04 03:54:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 14995.5). Total num frames: 368570368. Throughput: 0: 3545.1. Samples: 81307438. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:54:18,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 03:54:19,313][134294] Updated weights for policy 0, policy_version 89984 (0.0028) [2025-01-04 03:54:22,639][134294] Updated weights for policy 0, policy_version 89994 (0.0026) [2025-01-04 03:54:23,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14199.5, 300 sec: 14912.2). Total num frames: 368627712. Throughput: 0: 3583.3. Samples: 81325682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:23,968][134211] Avg episode reward: [(0, '7.391')] [2025-01-04 03:54:25,987][134294] Updated weights for policy 0, policy_version 90004 (0.0029) [2025-01-04 03:54:28,905][134294] Updated weights for policy 0, policy_version 90014 (0.0025) [2025-01-04 03:54:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.6, 300 sec: 14787.2). Total num frames: 368697344. Throughput: 0: 3387.0. Samples: 81345130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:28,968][134211] Avg episode reward: [(0, '7.854')] [2025-01-04 03:54:31,194][134294] Updated weights for policy 0, policy_version 90024 (0.0017) [2025-01-04 03:54:33,083][134294] Updated weights for policy 0, policy_version 90034 (0.0012) [2025-01-04 03:54:33,968][134211] Fps is (10 sec: 16794.1, 60 sec: 14472.8, 300 sec: 14801.2). Total num frames: 368795648. Throughput: 0: 3474.2. Samples: 81357666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:33,968][134211] Avg episode reward: [(0, '8.583')] [2025-01-04 03:54:35,280][134294] Updated weights for policy 0, policy_version 90044 (0.0018) [2025-01-04 03:54:38,236][134294] Updated weights for policy 0, policy_version 90054 (0.0025) [2025-01-04 03:54:38,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14541.0, 300 sec: 14842.8). Total num frames: 368869376. Throughput: 0: 3722.5. Samples: 81384932. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:38,969][134211] Avg episode reward: [(0, '8.059')] [2025-01-04 03:54:41,451][134294] Updated weights for policy 0, policy_version 90064 (0.0026) [2025-01-04 03:54:43,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14131.2, 300 sec: 14842.8). Total num frames: 368934912. Throughput: 0: 3596.1. Samples: 81404400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:43,968][134211] Avg episode reward: [(0, '8.322')] [2025-01-04 03:54:44,556][134294] Updated weights for policy 0, policy_version 90074 (0.0027) [2025-01-04 03:54:47,697][134294] Updated weights for policy 0, policy_version 90084 (0.0026) [2025-01-04 03:54:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.1, 300 sec: 14842.8). Total num frames: 369000448. Throughput: 0: 3484.1. Samples: 81414348. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:48,968][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 03:54:50,618][134294] Updated weights for policy 0, policy_version 90094 (0.0023) [2025-01-04 03:54:52,869][134294] Updated weights for policy 0, policy_version 90104 (0.0015) [2025-01-04 03:54:53,967][134211] Fps is (10 sec: 15155.7, 60 sec: 14882.3, 300 sec: 14912.2). Total num frames: 369086464. Throughput: 0: 3428.0. Samples: 81436456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:53,968][134211] Avg episode reward: [(0, '7.884')] [2025-01-04 03:54:54,736][134294] Updated weights for policy 0, policy_version 90114 (0.0012) [2025-01-04 03:54:56,654][134294] Updated weights for policy 0, policy_version 90124 (0.0014) [2025-01-04 03:54:58,630][134294] Updated weights for policy 0, policy_version 90134 (0.0012) [2025-01-04 03:54:58,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15018.7, 300 sec: 15065.0). Total num frames: 369192960. Throughput: 0: 3767.7. Samples: 81468104. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:54:58,968][134211] Avg episode reward: [(0, '7.166')] [2025-01-04 03:55:00,645][134294] Updated weights for policy 0, policy_version 90144 (0.0014) [2025-01-04 03:55:03,740][134294] Updated weights for policy 0, policy_version 90154 (0.0025) [2025-01-04 03:55:03,968][134211] Fps is (10 sec: 18431.2, 60 sec: 14677.3, 300 sec: 15009.4). Total num frames: 369270784. Throughput: 0: 3895.2. Samples: 81482724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:55:03,969][134211] Avg episode reward: [(0, '7.384')] [2025-01-04 03:55:06,775][134294] Updated weights for policy 0, policy_version 90164 (0.0027) [2025-01-04 03:55:08,968][134211] Fps is (10 sec: 14334.9, 60 sec: 14745.4, 300 sec: 14926.2). Total num frames: 369336320. Throughput: 0: 3923.0. Samples: 81502218. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:55:08,969][134211] Avg episode reward: [(0, '7.975')] [2025-01-04 03:55:10,107][134294] Updated weights for policy 0, policy_version 90174 (0.0030) [2025-01-04 03:55:13,221][134294] Updated weights for policy 0, policy_version 90184 (0.0027) [2025-01-04 03:55:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.2, 300 sec: 14940.0). Total num frames: 369401856. Throughput: 0: 3919.9. Samples: 81521528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 03:55:13,969][134211] Avg episode reward: [(0, '7.927')] [2025-01-04 03:55:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000090186_369401856.pth... [2025-01-04 03:55:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000089316_365838336.pth [2025-01-04 03:55:16,278][134294] Updated weights for policy 0, policy_version 90194 (0.0026) [2025-01-04 03:55:18,968][134211] Fps is (10 sec: 13517.5, 60 sec: 15018.6, 300 sec: 14940.0). Total num frames: 369471488. 
Throughput: 0: 3861.1. Samples: 81531418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:18,969][134211] Avg episode reward: [(0, '7.901')] [2025-01-04 03:55:19,325][134294] Updated weights for policy 0, policy_version 90204 (0.0024) [2025-01-04 03:55:22,707][134294] Updated weights for policy 0, policy_version 90214 (0.0024) [2025-01-04 03:55:23,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 369528832. Throughput: 0: 3684.1. Samples: 81550718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:23,968][134211] Avg episode reward: [(0, '8.312')] [2025-01-04 03:55:25,405][134294] Updated weights for policy 0, policy_version 90224 (0.0016) [2025-01-04 03:55:27,429][134294] Updated weights for policy 0, policy_version 90234 (0.0012) [2025-01-04 03:55:28,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15564.7, 300 sec: 14828.9). Total num frames: 369631232. Throughput: 0: 3833.3. Samples: 81576900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:28,968][134211] Avg episode reward: [(0, '8.277')] [2025-01-04 03:55:29,338][134294] Updated weights for policy 0, policy_version 90244 (0.0014) [2025-01-04 03:55:31,166][134294] Updated weights for policy 0, policy_version 90254 (0.0014) [2025-01-04 03:55:33,143][134294] Updated weights for policy 0, policy_version 90264 (0.0014) [2025-01-04 03:55:33,968][134211] Fps is (10 sec: 20479.8, 60 sec: 15633.0, 300 sec: 14967.8). Total num frames: 369733632. Throughput: 0: 3975.0. Samples: 81593224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:33,968][134211] Avg episode reward: [(0, '8.822')] [2025-01-04 03:55:36,028][134294] Updated weights for policy 0, policy_version 90274 (0.0027) [2025-01-04 03:55:38,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15496.5, 300 sec: 14967.7). Total num frames: 369799168. Throughput: 0: 4023.1. Samples: 81617496. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:38,969][134211] Avg episode reward: [(0, '7.772')] [2025-01-04 03:55:39,278][134294] Updated weights for policy 0, policy_version 90284 (0.0024) [2025-01-04 03:55:42,415][134294] Updated weights for policy 0, policy_version 90294 (0.0024) [2025-01-04 03:55:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15428.3, 300 sec: 14940.0). Total num frames: 369860608. Throughput: 0: 3744.4. Samples: 81636604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:43,969][134211] Avg episode reward: [(0, '7.599')] [2025-01-04 03:55:45,529][134294] Updated weights for policy 0, policy_version 90304 (0.0027) [2025-01-04 03:55:48,531][134294] Updated weights for policy 0, policy_version 90314 (0.0027) [2025-01-04 03:55:48,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15496.6, 300 sec: 14926.1). Total num frames: 369930240. Throughput: 0: 3643.6. Samples: 81646686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:48,968][134211] Avg episode reward: [(0, '8.212')] [2025-01-04 03:55:51,499][134294] Updated weights for policy 0, policy_version 90324 (0.0026) [2025-01-04 03:55:53,969][134211] Fps is (10 sec: 13515.9, 60 sec: 15154.9, 300 sec: 14773.3). Total num frames: 369995776. Throughput: 0: 3664.6. Samples: 81667128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:53,970][134211] Avg episode reward: [(0, '8.385')] [2025-01-04 03:55:54,609][134294] Updated weights for policy 0, policy_version 90334 (0.0026) [2025-01-04 03:55:57,591][134294] Updated weights for policy 0, policy_version 90344 (0.0022) [2025-01-04 03:55:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14773.4). Total num frames: 370065408. Throughput: 0: 3682.1. Samples: 81687224. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:55:58,969][134211] Avg episode reward: [(0, '8.484')] [2025-01-04 03:56:00,735][134294] Updated weights for policy 0, policy_version 90354 (0.0025) [2025-01-04 03:56:03,403][134294] Updated weights for policy 0, policy_version 90364 (0.0024) [2025-01-04 03:56:03,967][134211] Fps is (10 sec: 14337.5, 60 sec: 14472.6, 300 sec: 14815.0). Total num frames: 370139136. Throughput: 0: 3686.0. Samples: 81697286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:56:03,968][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 03:56:05,353][134294] Updated weights for policy 0, policy_version 90374 (0.0012) [2025-01-04 03:56:07,394][134294] Updated weights for policy 0, policy_version 90384 (0.0016) [2025-01-04 03:56:08,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14950.5, 300 sec: 14898.3). Total num frames: 370233344. Throughput: 0: 3889.1. Samples: 81725728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 03:56:08,972][134211] Avg episode reward: [(0, '7.869')] [2025-01-04 03:56:10,437][134294] Updated weights for policy 0, policy_version 90394 (0.0026) [2025-01-04 03:56:13,503][134294] Updated weights for policy 0, policy_version 90404 (0.0025) [2025-01-04 03:56:13,968][134211] Fps is (10 sec: 15973.8, 60 sec: 14950.4, 300 sec: 14787.2). Total num frames: 370298880. Throughput: 0: 3757.7. Samples: 81745996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:13,969][134211] Avg episode reward: [(0, '7.995')] [2025-01-04 03:56:16,027][134294] Updated weights for policy 0, policy_version 90414 (0.0016) [2025-01-04 03:56:18,241][134294] Updated weights for policy 0, policy_version 90424 (0.0018) [2025-01-04 03:56:18,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15223.5, 300 sec: 14773.4). Total num frames: 370384896. Throughput: 0: 3663.0. Samples: 81758058. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:18,968][134211] Avg episode reward: [(0, '8.854')] [2025-01-04 03:56:21,411][134294] Updated weights for policy 0, policy_version 90434 (0.0027) [2025-01-04 03:56:23,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15360.0, 300 sec: 14815.0). Total num frames: 370450432. Throughput: 0: 3616.0. Samples: 81780216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:23,968][134211] Avg episode reward: [(0, '7.274')] [2025-01-04 03:56:24,534][134294] Updated weights for policy 0, policy_version 90444 (0.0029) [2025-01-04 03:56:26,658][134294] Updated weights for policy 0, policy_version 90454 (0.0016) [2025-01-04 03:56:28,600][134294] Updated weights for policy 0, policy_version 90464 (0.0013) [2025-01-04 03:56:28,967][134211] Fps is (10 sec: 15974.7, 60 sec: 15223.6, 300 sec: 14870.6). Total num frames: 370544640. Throughput: 0: 3766.6. Samples: 81806098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:28,968][134211] Avg episode reward: [(0, '7.690')] [2025-01-04 03:56:30,495][134294] Updated weights for policy 0, policy_version 90474 (0.0013) [2025-01-04 03:56:32,387][134294] Updated weights for policy 0, policy_version 90484 (0.0013) [2025-01-04 03:56:33,968][134211] Fps is (10 sec: 19660.5, 60 sec: 15223.5, 300 sec: 14967.8). Total num frames: 370647040. Throughput: 0: 3905.1. Samples: 81822414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:33,968][134211] Avg episode reward: [(0, '8.206')] [2025-01-04 03:56:34,942][134294] Updated weights for policy 0, policy_version 90494 (0.0023) [2025-01-04 03:56:38,310][134294] Updated weights for policy 0, policy_version 90504 (0.0027) [2025-01-04 03:56:38,968][134211] Fps is (10 sec: 16382.4, 60 sec: 15155.1, 300 sec: 14912.2). Total num frames: 370708480. Throughput: 0: 3974.5. Samples: 81845978. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:38,969][134211] Avg episode reward: [(0, '7.876')] [2025-01-04 03:56:41,413][134294] Updated weights for policy 0, policy_version 90514 (0.0025) [2025-01-04 03:56:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 14898.3). Total num frames: 370778112. Throughput: 0: 3955.6. Samples: 81865226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:43,969][134211] Avg episode reward: [(0, '8.289')] [2025-01-04 03:56:44,499][134294] Updated weights for policy 0, policy_version 90524 (0.0028) [2025-01-04 03:56:47,710][134294] Updated weights for policy 0, policy_version 90534 (0.0024) [2025-01-04 03:56:48,968][134211] Fps is (10 sec: 13517.9, 60 sec: 15223.5, 300 sec: 14912.3). Total num frames: 370843648. Throughput: 0: 3946.7. Samples: 81874890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:48,968][134211] Avg episode reward: [(0, '7.426')] [2025-01-04 03:56:50,667][134294] Updated weights for policy 0, policy_version 90544 (0.0023) [2025-01-04 03:56:53,732][134294] Updated weights for policy 0, policy_version 90554 (0.0028) [2025-01-04 03:56:53,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15223.5, 300 sec: 14787.2). Total num frames: 370909184. Throughput: 0: 3769.5. Samples: 81895356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:53,969][134211] Avg episode reward: [(0, '7.676')] [2025-01-04 03:56:56,738][134294] Updated weights for policy 0, policy_version 90564 (0.0027) [2025-01-04 03:56:58,967][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.5, 300 sec: 14676.2). Total num frames: 370978816. Throughput: 0: 3767.0. Samples: 81915508. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:56:58,968][134211] Avg episode reward: [(0, '8.339')] [2025-01-04 03:56:59,479][134294] Updated weights for policy 0, policy_version 90574 (0.0021) [2025-01-04 03:57:01,408][134294] Updated weights for policy 0, policy_version 90584 (0.0013) [2025-01-04 03:57:03,387][134294] Updated weights for policy 0, policy_version 90594 (0.0014) [2025-01-04 03:57:03,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15701.2, 300 sec: 14828.9). Total num frames: 371081216. Throughput: 0: 3823.8. Samples: 81930132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:03,969][134211] Avg episode reward: [(0, '9.042')] [2025-01-04 03:57:03,977][134264] Saving new best policy, reward=9.042! [2025-01-04 03:57:06,312][134294] Updated weights for policy 0, policy_version 90604 (0.0025) [2025-01-04 03:57:08,968][134211] Fps is (10 sec: 16793.0, 60 sec: 15223.4, 300 sec: 14856.7). Total num frames: 371146752. Throughput: 0: 3872.4. Samples: 81954476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:08,969][134211] Avg episode reward: [(0, '7.596')] [2025-01-04 03:57:09,593][134294] Updated weights for policy 0, policy_version 90614 (0.0026) [2025-01-04 03:57:12,493][134294] Updated weights for policy 0, policy_version 90624 (0.0023) [2025-01-04 03:57:13,968][134211] Fps is (10 sec: 14336.8, 60 sec: 15428.3, 300 sec: 14898.3). Total num frames: 371224576. Throughput: 0: 3770.4. Samples: 81975766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:13,968][134211] Avg episode reward: [(0, '7.523')] [2025-01-04 03:57:14,031][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000090632_371228672.pth... 
[2025-01-04 03:57:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000089762_367665152.pth [2025-01-04 03:57:14,445][134294] Updated weights for policy 0, policy_version 90634 (0.0014) [2025-01-04 03:57:16,291][134294] Updated weights for policy 0, policy_version 90644 (0.0012) [2025-01-04 03:57:18,203][134294] Updated weights for policy 0, policy_version 90654 (0.0013) [2025-01-04 03:57:18,967][134211] Fps is (10 sec: 18842.2, 60 sec: 15837.9, 300 sec: 14995.5). Total num frames: 371335168. Throughput: 0: 3767.5. Samples: 81991952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:18,968][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 03:57:20,053][134294] Updated weights for policy 0, policy_version 90664 (0.0014) [2025-01-04 03:57:22,148][134294] Updated weights for policy 0, policy_version 90674 (0.0013) [2025-01-04 03:57:23,968][134211] Fps is (10 sec: 20070.1, 60 sec: 16247.4, 300 sec: 15078.8). Total num frames: 371425280. Throughput: 0: 3947.9. Samples: 82023630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:23,968][134211] Avg episode reward: [(0, '8.100')] [2025-01-04 03:57:25,110][134294] Updated weights for policy 0, policy_version 90684 (0.0020) [2025-01-04 03:57:28,582][134294] Updated weights for policy 0, policy_version 90694 (0.0027) [2025-01-04 03:57:28,971][134211] Fps is (10 sec: 15150.2, 60 sec: 15700.4, 300 sec: 15064.8). Total num frames: 371486720. Throughput: 0: 3943.3. Samples: 82042688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:28,972][134211] Avg episode reward: [(0, '7.722')] [2025-01-04 03:57:31,804][134294] Updated weights for policy 0, policy_version 90704 (0.0028) [2025-01-04 03:57:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15086.9, 300 sec: 14995.5). Total num frames: 371552256. Throughput: 0: 3939.4. Samples: 82052166. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:33,969][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 03:57:34,887][134294] Updated weights for policy 0, policy_version 90714 (0.0025) [2025-01-04 03:57:37,975][134294] Updated weights for policy 0, policy_version 90724 (0.0024) [2025-01-04 03:57:38,968][134211] Fps is (10 sec: 13111.4, 60 sec: 15155.4, 300 sec: 14995.5). Total num frames: 371617792. Throughput: 0: 3919.7. Samples: 82071742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:38,968][134211] Avg episode reward: [(0, '7.394')] [2025-01-04 03:57:41,073][134294] Updated weights for policy 0, policy_version 90734 (0.0026) [2025-01-04 03:57:43,969][134211] Fps is (10 sec: 13106.1, 60 sec: 15086.7, 300 sec: 14953.8). Total num frames: 371683328. Throughput: 0: 3913.7. Samples: 82091628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:43,969][134211] Avg episode reward: [(0, '8.296')] [2025-01-04 03:57:44,259][134294] Updated weights for policy 0, policy_version 90744 (0.0029) [2025-01-04 03:57:47,225][134294] Updated weights for policy 0, policy_version 90754 (0.0026) [2025-01-04 03:57:48,968][134211] Fps is (10 sec: 13106.5, 60 sec: 15086.8, 300 sec: 14842.8). Total num frames: 371748864. Throughput: 0: 3814.5. Samples: 82101786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:48,969][134211] Avg episode reward: [(0, '7.544')] [2025-01-04 03:57:50,094][134294] Updated weights for policy 0, policy_version 90764 (0.0025) [2025-01-04 03:57:53,093][134294] Updated weights for policy 0, policy_version 90774 (0.0027) [2025-01-04 03:57:53,968][134211] Fps is (10 sec: 13518.1, 60 sec: 15155.3, 300 sec: 14801.1). Total num frames: 371818496. Throughput: 0: 3740.6. Samples: 82122802. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:53,968][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 03:57:56,061][134294] Updated weights for policy 0, policy_version 90784 (0.0026) [2025-01-04 03:57:58,744][134294] Updated weights for policy 0, policy_version 90794 (0.0019) [2025-01-04 03:57:58,968][134211] Fps is (10 sec: 14746.5, 60 sec: 15291.7, 300 sec: 14884.5). Total num frames: 371896320. Throughput: 0: 3726.4. Samples: 82143452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:57:58,968][134211] Avg episode reward: [(0, '8.665')] [2025-01-04 03:58:00,601][134294] Updated weights for policy 0, policy_version 90804 (0.0014) [2025-01-04 03:58:02,554][134294] Updated weights for policy 0, policy_version 90814 (0.0013) [2025-01-04 03:58:03,967][134211] Fps is (10 sec: 18432.6, 60 sec: 15360.2, 300 sec: 15037.2). Total num frames: 372002816. Throughput: 0: 3723.3. Samples: 82159500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:03,970][134211] Avg episode reward: [(0, '9.265')] [2025-01-04 03:58:03,974][134264] Saving new best policy, reward=9.265! [2025-01-04 03:58:04,476][134294] Updated weights for policy 0, policy_version 90824 (0.0015) [2025-01-04 03:58:06,763][134294] Updated weights for policy 0, policy_version 90834 (0.0021) [2025-01-04 03:58:08,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15564.8, 300 sec: 14981.6). Total num frames: 372080640. Throughput: 0: 3669.2. Samples: 82188742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:08,968][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 03:58:09,941][134294] Updated weights for policy 0, policy_version 90844 (0.0028) [2025-01-04 03:58:13,007][134294] Updated weights for policy 0, policy_version 90854 (0.0027) [2025-01-04 03:58:13,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15360.0, 300 sec: 14995.5). Total num frames: 372146176. Throughput: 0: 3681.8. Samples: 82208358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:13,968][134211] Avg episode reward: [(0, '7.999')] [2025-01-04 03:58:16,045][134294] Updated weights for policy 0, policy_version 90864 (0.0026) [2025-01-04 03:58:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14677.3, 300 sec: 15051.1). Total num frames: 372215808. Throughput: 0: 3688.9. Samples: 82218166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:18,969][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 03:58:19,283][134294] Updated weights for policy 0, policy_version 90874 (0.0025) [2025-01-04 03:58:22,375][134294] Updated weights for policy 0, policy_version 90884 (0.0024) [2025-01-04 03:58:23,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14472.6, 300 sec: 15037.2). Total num frames: 372293632. Throughput: 0: 3690.8. Samples: 82237826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:23,968][134211] Avg episode reward: [(0, '6.740')] [2025-01-04 03:58:24,328][134294] Updated weights for policy 0, policy_version 90894 (0.0014) [2025-01-04 03:58:26,180][134294] Updated weights for policy 0, policy_version 90904 (0.0014) [2025-01-04 03:58:28,813][134294] Updated weights for policy 0, policy_version 90914 (0.0023) [2025-01-04 03:58:28,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14951.2, 300 sec: 15106.6). Total num frames: 372383744. Throughput: 0: 3897.4. Samples: 82267008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:28,969][134211] Avg episode reward: [(0, '8.240')] [2025-01-04 03:58:31,951][134294] Updated weights for policy 0, policy_version 90924 (0.0029) [2025-01-04 03:58:33,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14950.4, 300 sec: 15092.8). Total num frames: 372449280. Throughput: 0: 3888.5. Samples: 82276766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:33,968][134211] Avg episode reward: [(0, '8.122')] [2025-01-04 03:58:35,235][134294] Updated weights for policy 0, policy_version 90934 (0.0027) [2025-01-04 03:58:38,205][134294] Updated weights for policy 0, policy_version 90944 (0.0023) [2025-01-04 03:58:38,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14950.3, 300 sec: 15009.4). Total num frames: 372514816. Throughput: 0: 3859.0. Samples: 82296458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:38,969][134211] Avg episode reward: [(0, '7.801')] [2025-01-04 03:58:41,456][134294] Updated weights for policy 0, policy_version 90954 (0.0028) [2025-01-04 03:58:43,490][134294] Updated weights for policy 0, policy_version 90964 (0.0013) [2025-01-04 03:58:43,967][134211] Fps is (10 sec: 14746.2, 60 sec: 15223.8, 300 sec: 15106.6). Total num frames: 372596736. Throughput: 0: 3897.5. Samples: 82318840. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:43,968][134211] Avg episode reward: [(0, '7.828')] [2025-01-04 03:58:45,377][134294] Updated weights for policy 0, policy_version 90974 (0.0013) [2025-01-04 03:58:47,262][134294] Updated weights for policy 0, policy_version 90984 (0.0014) [2025-01-04 03:58:48,967][134211] Fps is (10 sec: 19252.3, 60 sec: 15974.6, 300 sec: 15301.0). Total num frames: 372707328. Throughput: 0: 3898.7. Samples: 82334942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:48,968][134211] Avg episode reward: [(0, '7.690')] [2025-01-04 03:58:49,140][134294] Updated weights for policy 0, policy_version 90994 (0.0014) [2025-01-04 03:58:51,988][134294] Updated weights for policy 0, policy_version 91004 (0.0024) [2025-01-04 03:58:53,968][134211] Fps is (10 sec: 17612.2, 60 sec: 15906.1, 300 sec: 15189.9). Total num frames: 372772864. Throughput: 0: 3849.0. Samples: 82361946. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 03:58:53,968][134211] Avg episode reward: [(0, '8.254')] [2025-01-04 03:58:55,395][134294] Updated weights for policy 0, policy_version 91014 (0.0029) [2025-01-04 03:58:58,468][134294] Updated weights for policy 0, policy_version 91024 (0.0026) [2025-01-04 03:58:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15701.3, 300 sec: 15078.8). Total num frames: 372838400. Throughput: 0: 3831.5. Samples: 82380774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:58:58,969][134211] Avg episode reward: [(0, '7.160')] [2025-01-04 03:59:01,538][134294] Updated weights for policy 0, policy_version 91034 (0.0027) [2025-01-04 03:59:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.6, 300 sec: 15092.7). Total num frames: 372903936. Throughput: 0: 3833.4. Samples: 82390668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:03,968][134211] Avg episode reward: [(0, '7.946')] [2025-01-04 03:59:04,858][134294] Updated weights for policy 0, policy_version 91044 (0.0029) [2025-01-04 03:59:08,000][134294] Updated weights for policy 0, policy_version 91054 (0.0027) [2025-01-04 03:59:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 15120.5). Total num frames: 372969472. Throughput: 0: 3816.1. Samples: 82409550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:08,968][134211] Avg episode reward: [(0, '6.774')] [2025-01-04 03:59:10,449][134294] Updated weights for policy 0, policy_version 91064 (0.0017) [2025-01-04 03:59:12,317][134294] Updated weights for policy 0, policy_version 91074 (0.0013) [2025-01-04 03:59:13,967][134211] Fps is (10 sec: 16794.1, 60 sec: 15428.3, 300 sec: 15259.3). Total num frames: 373071872. Throughput: 0: 3792.9. Samples: 82437686. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:13,968][134211] Avg episode reward: [(0, '7.466')] [2025-01-04 03:59:14,029][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091083_373075968.pth... [2025-01-04 03:59:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000090186_369401856.pth [2025-01-04 03:59:14,239][134294] Updated weights for policy 0, policy_version 91084 (0.0013) [2025-01-04 03:59:16,127][134294] Updated weights for policy 0, policy_version 91094 (0.0012) [2025-01-04 03:59:18,045][134294] Updated weights for policy 0, policy_version 91104 (0.0013) [2025-01-04 03:59:18,968][134211] Fps is (10 sec: 20889.9, 60 sec: 16042.7, 300 sec: 15426.0). Total num frames: 373178368. Throughput: 0: 3931.1. Samples: 82453666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:18,968][134211] Avg episode reward: [(0, '7.835')] [2025-01-04 03:59:20,377][134294] Updated weights for policy 0, policy_version 91114 (0.0020) [2025-01-04 03:59:23,708][134294] Updated weights for policy 0, policy_version 91124 (0.0026) [2025-01-04 03:59:23,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15837.8, 300 sec: 15412.1). Total num frames: 373243904. Throughput: 0: 4076.2. Samples: 82479886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:23,969][134211] Avg episode reward: [(0, '7.546')] [2025-01-04 03:59:27,453][134294] Updated weights for policy 0, policy_version 91134 (0.0026) [2025-01-04 03:59:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15223.5, 300 sec: 15259.3). Total num frames: 373297152. Throughput: 0: 3942.6. Samples: 82496260. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:28,968][134211] Avg episode reward: [(0, '6.974')] [2025-01-04 03:59:31,128][134294] Updated weights for policy 0, policy_version 91144 (0.0025) [2025-01-04 03:59:33,970][134211] Fps is (10 sec: 11466.5, 60 sec: 15154.7, 300 sec: 15217.6). Total num frames: 373358592. Throughput: 0: 3779.0. Samples: 82505004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:33,970][134211] Avg episode reward: [(0, '8.113')] [2025-01-04 03:59:34,559][134294] Updated weights for policy 0, policy_version 91154 (0.0030) [2025-01-04 03:59:36,620][134294] Updated weights for policy 0, policy_version 91164 (0.0012) [2025-01-04 03:59:38,471][134294] Updated weights for policy 0, policy_version 91174 (0.0013) [2025-01-04 03:59:38,967][134211] Fps is (10 sec: 15974.7, 60 sec: 15701.5, 300 sec: 15328.8). Total num frames: 373456896. Throughput: 0: 3699.3. Samples: 82528412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:38,968][134211] Avg episode reward: [(0, '7.593')] [2025-01-04 03:59:40,394][134294] Updated weights for policy 0, policy_version 91184 (0.0013) [2025-01-04 03:59:42,241][134294] Updated weights for policy 0, policy_version 91194 (0.0012) [2025-01-04 03:59:43,968][134211] Fps is (10 sec: 19665.0, 60 sec: 15974.3, 300 sec: 15439.8). Total num frames: 373555200. Throughput: 0: 3981.7. Samples: 82559952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:43,968][134211] Avg episode reward: [(0, '7.837')] [2025-01-04 03:59:45,030][134294] Updated weights for policy 0, policy_version 91204 (0.0024) [2025-01-04 03:59:48,146][134294] Updated weights for policy 0, policy_version 91214 (0.0030) [2025-01-04 03:59:48,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15223.5, 300 sec: 15370.4). Total num frames: 373620736. Throughput: 0: 3985.3. Samples: 82570004. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 03:59:48,968][134211] Avg episode reward: [(0, '7.716')] [2025-01-04 03:59:51,262][134294] Updated weights for policy 0, policy_version 91224 (0.0028) [2025-01-04 03:59:53,968][134211] Fps is (10 sec: 13106.3, 60 sec: 15223.3, 300 sec: 15231.5). Total num frames: 373686272. Throughput: 0: 3998.9. Samples: 82589502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:59:53,969][134211] Avg episode reward: [(0, '7.344')] [2025-01-04 03:59:54,431][134294] Updated weights for policy 0, policy_version 91234 (0.0028) [2025-01-04 03:59:57,603][134294] Updated weights for policy 0, policy_version 91244 (0.0024) [2025-01-04 03:59:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.5, 300 sec: 15189.9). Total num frames: 373751808. Throughput: 0: 3806.1. Samples: 82608962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 03:59:58,968][134211] Avg episode reward: [(0, '7.914')] [2025-01-04 04:00:00,592][134294] Updated weights for policy 0, policy_version 91254 (0.0026) [2025-01-04 04:00:03,616][134294] Updated weights for policy 0, policy_version 91264 (0.0024) [2025-01-04 04:00:03,968][134211] Fps is (10 sec: 13517.5, 60 sec: 15291.7, 300 sec: 15203.8). Total num frames: 373821440. Throughput: 0: 3680.3. Samples: 82619280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:03,969][134211] Avg episode reward: [(0, '7.974')] [2025-01-04 04:00:06,603][134294] Updated weights for policy 0, policy_version 91274 (0.0026) [2025-01-04 04:00:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15291.8, 300 sec: 15203.8). Total num frames: 373886976. Throughput: 0: 3550.5. Samples: 82639658. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:08,968][134211] Avg episode reward: [(0, '7.871')] [2025-01-04 04:00:09,732][134294] Updated weights for policy 0, policy_version 91284 (0.0026) [2025-01-04 04:00:11,934][134294] Updated weights for policy 0, policy_version 91294 (0.0015) [2025-01-04 04:00:13,832][134294] Updated weights for policy 0, policy_version 91304 (0.0013) [2025-01-04 04:00:13,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15155.1, 300 sec: 15287.1). Total num frames: 373981184. Throughput: 0: 3745.5. Samples: 82664806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:13,968][134211] Avg episode reward: [(0, '7.799')] [2025-01-04 04:00:15,742][134294] Updated weights for policy 0, policy_version 91314 (0.0013) [2025-01-04 04:00:17,684][134294] Updated weights for policy 0, policy_version 91324 (0.0015) [2025-01-04 04:00:18,968][134211] Fps is (10 sec: 19660.8, 60 sec: 15086.9, 300 sec: 15439.8). Total num frames: 374083584. Throughput: 0: 3914.3. Samples: 82681138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:18,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 04:00:20,306][134294] Updated weights for policy 0, policy_version 91334 (0.0022) [2025-01-04 04:00:23,529][134294] Updated weights for policy 0, policy_version 91344 (0.0026) [2025-01-04 04:00:23,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15087.0, 300 sec: 15314.9). Total num frames: 374149120. Throughput: 0: 3934.0. Samples: 82705442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:23,968][134211] Avg episode reward: [(0, '7.156')] [2025-01-04 04:00:26,722][134294] Updated weights for policy 0, policy_version 91354 (0.0029) [2025-01-04 04:00:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.8, 300 sec: 15189.9). Total num frames: 374214656. Throughput: 0: 3654.5. Samples: 82724404. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:28,968][134211] Avg episode reward: [(0, '7.984')] [2025-01-04 04:00:29,913][134294] Updated weights for policy 0, policy_version 91364 (0.0026) [2025-01-04 04:00:32,938][134294] Updated weights for policy 0, policy_version 91374 (0.0024) [2025-01-04 04:00:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15360.6, 300 sec: 15189.9). Total num frames: 374280192. Throughput: 0: 3646.8. Samples: 82734108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:33,968][134211] Avg episode reward: [(0, '7.606')] [2025-01-04 04:00:35,931][134294] Updated weights for policy 0, policy_version 91384 (0.0024) [2025-01-04 04:00:38,657][134294] Updated weights for policy 0, policy_version 91394 (0.0020) [2025-01-04 04:00:38,967][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 15231.6). Total num frames: 374353920. Throughput: 0: 3671.2. Samples: 82754704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:38,968][134211] Avg episode reward: [(0, '8.185')] [2025-01-04 04:00:40,518][134294] Updated weights for policy 0, policy_version 91404 (0.0014) [2025-01-04 04:00:42,393][134294] Updated weights for policy 0, policy_version 91414 (0.0012) [2025-01-04 04:00:43,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15155.2, 300 sec: 15370.4). Total num frames: 374464512. Throughput: 0: 3923.8. Samples: 82785532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:00:43,968][134211] Avg episode reward: [(0, '8.556')] [2025-01-04 04:00:44,223][134294] Updated weights for policy 0, policy_version 91424 (0.0013) [2025-01-04 04:00:46,147][134294] Updated weights for policy 0, policy_version 91434 (0.0014) [2025-01-04 04:00:48,143][134294] Updated weights for policy 0, policy_version 91444 (0.0016) [2025-01-04 04:00:48,968][134211] Fps is (10 sec: 20889.2, 60 sec: 15701.3, 300 sec: 15481.5). Total num frames: 374562816. Throughput: 0: 4056.3. Samples: 82801812. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:00:48,968][134211] Avg episode reward: [(0, '8.560')] [2025-01-04 04:00:51,153][134294] Updated weights for policy 0, policy_version 91454 (0.0026) [2025-01-04 04:00:53,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15701.5, 300 sec: 15467.6). Total num frames: 374628352. Throughput: 0: 4123.5. Samples: 82825214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:00:53,968][134211] Avg episode reward: [(0, '6.979')] [2025-01-04 04:00:54,514][134294] Updated weights for policy 0, policy_version 91464 (0.0025) [2025-01-04 04:00:57,724][134294] Updated weights for policy 0, policy_version 91474 (0.0030) [2025-01-04 04:00:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15633.0, 300 sec: 15425.9). Total num frames: 374689792. Throughput: 0: 3982.1. Samples: 82844002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:00:58,968][134211] Avg episode reward: [(0, '7.487')] [2025-01-04 04:01:00,960][134294] Updated weights for policy 0, policy_version 91484 (0.0027) [2025-01-04 04:01:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15564.9, 300 sec: 15328.8). Total num frames: 374755328. Throughput: 0: 3833.7. Samples: 82853656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:03,968][134211] Avg episode reward: [(0, '8.496')] [2025-01-04 04:01:04,150][134294] Updated weights for policy 0, policy_version 91494 (0.0025) [2025-01-04 04:01:07,205][134294] Updated weights for policy 0, policy_version 91504 (0.0027) [2025-01-04 04:01:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15564.8, 300 sec: 15328.8). Total num frames: 374820864. Throughput: 0: 3724.9. Samples: 82873064. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:08,968][134211] Avg episode reward: [(0, '7.907')] [2025-01-04 04:01:10,305][134294] Updated weights for policy 0, policy_version 91514 (0.0027) [2025-01-04 04:01:12,352][134294] Updated weights for policy 0, policy_version 91524 (0.0012) [2025-01-04 04:01:13,967][134211] Fps is (10 sec: 15974.5, 60 sec: 15564.9, 300 sec: 15356.5). Total num frames: 374915072. Throughput: 0: 3859.6. Samples: 82898084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:13,968][134211] Avg episode reward: [(0, '8.250')] [2025-01-04 04:01:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091532_374915072.pth... [2025-01-04 04:01:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000090632_371228672.pth [2025-01-04 04:01:14,313][134294] Updated weights for policy 0, policy_version 91534 (0.0014) [2025-01-04 04:01:16,206][134294] Updated weights for policy 0, policy_version 91544 (0.0014) [2025-01-04 04:01:18,182][134294] Updated weights for policy 0, policy_version 91554 (0.0013) [2025-01-04 04:01:18,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15564.8, 300 sec: 15481.5). Total num frames: 375017472. Throughput: 0: 4000.0. Samples: 82914108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:18,969][134211] Avg episode reward: [(0, '7.485')] [2025-01-04 04:01:21,113][134294] Updated weights for policy 0, policy_version 91564 (0.0024) [2025-01-04 04:01:23,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15428.2, 300 sec: 15356.5). Total num frames: 375074816. Throughput: 0: 4073.7. Samples: 82938020. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:23,968][134211] Avg episode reward: [(0, '7.987')] [2025-01-04 04:01:24,741][134294] Updated weights for policy 0, policy_version 91574 (0.0026) [2025-01-04 04:01:28,352][134294] Updated weights for policy 0, policy_version 91584 (0.0027) [2025-01-04 04:01:28,968][134211] Fps is (10 sec: 11468.7, 60 sec: 15291.7, 300 sec: 15203.8). Total num frames: 375132160. Throughput: 0: 3754.2. Samples: 82954470. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:28,968][134211] Avg episode reward: [(0, '7.645')] [2025-01-04 04:01:31,978][134294] Updated weights for policy 0, policy_version 91594 (0.0027) [2025-01-04 04:01:33,968][134211] Fps is (10 sec: 12288.3, 60 sec: 15291.7, 300 sec: 15217.7). Total num frames: 375197696. Throughput: 0: 3590.0. Samples: 82963360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:33,968][134211] Avg episode reward: [(0, '8.827')] [2025-01-04 04:01:34,474][134294] Updated weights for policy 0, policy_version 91604 (0.0017) [2025-01-04 04:01:36,437][134294] Updated weights for policy 0, policy_version 91614 (0.0013) [2025-01-04 04:01:38,372][134294] Updated weights for policy 0, policy_version 91624 (0.0014) [2025-01-04 04:01:38,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15837.9, 300 sec: 15342.7). Total num frames: 375304192. Throughput: 0: 3660.3. Samples: 82989928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:01:38,968][134211] Avg episode reward: [(0, '8.623')] [2025-01-04 04:01:40,270][134294] Updated weights for policy 0, policy_version 91634 (0.0012) [2025-01-04 04:01:42,199][134294] Updated weights for policy 0, policy_version 91644 (0.0013) [2025-01-04 04:01:43,968][134211] Fps is (10 sec: 21298.0, 60 sec: 15769.5, 300 sec: 15481.5). Total num frames: 375410688. Throughput: 0: 3953.1. Samples: 83021892. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:01:43,969][134211] Avg episode reward: [(0, '7.709')] [2025-01-04 04:01:44,319][134294] Updated weights for policy 0, policy_version 91654 (0.0018) [2025-01-04 04:01:47,473][134294] Updated weights for policy 0, policy_version 91664 (0.0024) [2025-01-04 04:01:48,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15155.2, 300 sec: 15467.6). Total num frames: 375472128. Throughput: 0: 3989.1. Samples: 83033164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:01:48,968][134211] Avg episode reward: [(0, '7.756')] [2025-01-04 04:01:50,797][134294] Updated weights for policy 0, policy_version 91674 (0.0028) [2025-01-04 04:01:53,968][134211] Fps is (10 sec: 12288.4, 60 sec: 15086.9, 300 sec: 15439.8). Total num frames: 375533568. Throughput: 0: 3972.9. Samples: 83051844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:01:53,969][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 04:01:54,124][134294] Updated weights for policy 0, policy_version 91684 (0.0026) [2025-01-04 04:01:57,202][134294] Updated weights for policy 0, policy_version 91694 (0.0027) [2025-01-04 04:01:58,969][134211] Fps is (10 sec: 12696.1, 60 sec: 15154.9, 300 sec: 15314.8). Total num frames: 375599104. Throughput: 0: 3838.3. Samples: 83070812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:01:58,970][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 04:02:00,308][134294] Updated weights for policy 0, policy_version 91704 (0.0025) [2025-01-04 04:02:03,390][134294] Updated weights for policy 0, policy_version 91714 (0.0026) [2025-01-04 04:02:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15155.2, 300 sec: 15314.9). Total num frames: 375664640. Throughput: 0: 3707.7. Samples: 83080956. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:03,968][134211] Avg episode reward: [(0, '7.618')] [2025-01-04 04:02:06,492][134294] Updated weights for policy 0, policy_version 91724 (0.0025) [2025-01-04 04:02:08,968][134211] Fps is (10 sec: 13108.9, 60 sec: 15155.2, 300 sec: 15273.2). Total num frames: 375730176. Throughput: 0: 3614.9. Samples: 83100690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:08,968][134211] Avg episode reward: [(0, '7.454')] [2025-01-04 04:02:09,826][134294] Updated weights for policy 0, policy_version 91734 (0.0027) [2025-01-04 04:02:12,847][134294] Updated weights for policy 0, policy_version 91744 (0.0027) [2025-01-04 04:02:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 15120.5). Total num frames: 375795712. Throughput: 0: 3681.3. Samples: 83120130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:13,968][134211] Avg episode reward: [(0, '9.197')] [2025-01-04 04:02:15,242][134294] Updated weights for policy 0, policy_version 91754 (0.0017) [2025-01-04 04:02:17,780][134294] Updated weights for policy 0, policy_version 91764 (0.0022) [2025-01-04 04:02:18,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14336.0, 300 sec: 15092.7). Total num frames: 375877632. Throughput: 0: 3788.8. Samples: 83133854. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:18,969][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 04:02:20,835][134294] Updated weights for policy 0, policy_version 91774 (0.0026) [2025-01-04 04:02:22,815][134294] Updated weights for policy 0, policy_version 91784 (0.0013) [2025-01-04 04:02:23,968][134211] Fps is (10 sec: 17612.8, 60 sec: 14950.5, 300 sec: 15204.0). Total num frames: 375971840. Throughput: 0: 3705.2. Samples: 83156662. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:23,968][134211] Avg episode reward: [(0, '8.091')] [2025-01-04 04:02:24,695][134294] Updated weights for policy 0, policy_version 91794 (0.0013) [2025-01-04 04:02:27,199][134294] Updated weights for policy 0, policy_version 91804 (0.0021) [2025-01-04 04:02:28,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15291.8, 300 sec: 15245.5). Total num frames: 376049664. Throughput: 0: 3592.4. Samples: 83183548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:28,968][134211] Avg episode reward: [(0, '7.844')] [2025-01-04 04:02:30,363][134294] Updated weights for policy 0, policy_version 91814 (0.0026) [2025-01-04 04:02:33,426][134294] Updated weights for policy 0, policy_version 91824 (0.0027) [2025-01-04 04:02:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15291.7, 300 sec: 15245.5). Total num frames: 376115200. Throughput: 0: 3566.3. Samples: 83193646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:33,968][134211] Avg episode reward: [(0, '8.079')] [2025-01-04 04:02:36,566][134294] Updated weights for policy 0, policy_version 91834 (0.0026) [2025-01-04 04:02:38,565][134294] Updated weights for policy 0, policy_version 91844 (0.0012) [2025-01-04 04:02:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14950.4, 300 sec: 15314.9). Total num frames: 376201216. Throughput: 0: 3609.2. Samples: 83214258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:38,968][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 04:02:40,718][134294] Updated weights for policy 0, policy_version 91854 (0.0016) [2025-01-04 04:02:43,625][134294] Updated weights for policy 0, policy_version 91864 (0.0026) [2025-01-04 04:02:43,968][134211] Fps is (10 sec: 16383.0, 60 sec: 14472.5, 300 sec: 15356.5). Total num frames: 376279040. Throughput: 0: 3771.7. Samples: 83240536. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:43,969][134211] Avg episode reward: [(0, '7.477')] [2025-01-04 04:02:46,647][134294] Updated weights for policy 0, policy_version 91874 (0.0026) [2025-01-04 04:02:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 15342.7). Total num frames: 376344576. Throughput: 0: 3767.7. Samples: 83250504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:48,968][134211] Avg episode reward: [(0, '7.463')] [2025-01-04 04:02:49,822][134294] Updated weights for policy 0, policy_version 91884 (0.0027) [2025-01-04 04:02:52,834][134294] Updated weights for policy 0, policy_version 91894 (0.0025) [2025-01-04 04:02:53,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14677.4, 300 sec: 15314.9). Total num frames: 376414208. Throughput: 0: 3773.7. Samples: 83270506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:53,968][134211] Avg episode reward: [(0, '7.677')] [2025-01-04 04:02:55,117][134294] Updated weights for policy 0, policy_version 91904 (0.0015) [2025-01-04 04:02:57,028][134294] Updated weights for policy 0, policy_version 91914 (0.0013) [2025-01-04 04:02:58,910][134294] Updated weights for policy 0, policy_version 91924 (0.0012) [2025-01-04 04:02:58,968][134211] Fps is (10 sec: 17612.1, 60 sec: 15360.2, 300 sec: 15314.8). Total num frames: 376520704. Throughput: 0: 3988.5. Samples: 83299614. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:02:58,968][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 04:03:00,769][134294] Updated weights for policy 0, policy_version 91934 (0.0014) [2025-01-04 04:03:02,675][134294] Updated weights for policy 0, policy_version 91944 (0.0012) [2025-01-04 04:03:03,968][134211] Fps is (10 sec: 20888.6, 60 sec: 15974.3, 300 sec: 15398.2). Total num frames: 376623104. Throughput: 0: 4046.1. Samples: 83315930. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:03,969][134211] Avg episode reward: [(0, '7.797')] [2025-01-04 04:03:05,200][134294] Updated weights for policy 0, policy_version 91954 (0.0022) [2025-01-04 04:03:08,489][134294] Updated weights for policy 0, policy_version 91964 (0.0025) [2025-01-04 04:03:08,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15974.3, 300 sec: 15398.2). Total num frames: 376688640. Throughput: 0: 4079.5. Samples: 83340240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:08,969][134211] Avg episode reward: [(0, '7.593')] [2025-01-04 04:03:11,677][134294] Updated weights for policy 0, policy_version 91974 (0.0028) [2025-01-04 04:03:13,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15906.1, 300 sec: 15370.4). Total num frames: 376750080. Throughput: 0: 3906.3. Samples: 83359332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:13,968][134211] Avg episode reward: [(0, '7.449')] [2025-01-04 04:03:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091981_376754176.pth... [2025-01-04 04:03:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091083_373075968.pth [2025-01-04 04:03:14,966][134294] Updated weights for policy 0, policy_version 91984 (0.0027) [2025-01-04 04:03:18,051][134294] Updated weights for policy 0, policy_version 91994 (0.0027) [2025-01-04 04:03:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15633.0, 300 sec: 15328.8). Total num frames: 376815616. Throughput: 0: 3889.3. Samples: 83368664. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:18,968][134211] Avg episode reward: [(0, '7.744')] [2025-01-04 04:03:21,123][134294] Updated weights for policy 0, policy_version 92004 (0.0025) [2025-01-04 04:03:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15223.4, 300 sec: 15259.3). Total num frames: 376885248. 
Throughput: 0: 3877.9. Samples: 83388766. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:23,968][134211] Avg episode reward: [(0, '7.509')] [2025-01-04 04:03:24,304][134294] Updated weights for policy 0, policy_version 92014 (0.0028) [2025-01-04 04:03:27,750][134294] Updated weights for policy 0, policy_version 92024 (0.0026) [2025-01-04 04:03:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14950.4, 300 sec: 15245.5). Total num frames: 376946688. Throughput: 0: 3694.3. Samples: 83406776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:28,968][134211] Avg episode reward: [(0, '8.071')] [2025-01-04 04:03:30,155][134294] Updated weights for policy 0, policy_version 92034 (0.0015) [2025-01-04 04:03:32,115][134294] Updated weights for policy 0, policy_version 92044 (0.0014) [2025-01-04 04:03:33,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15564.8, 300 sec: 15370.4). Total num frames: 377049088. Throughput: 0: 3794.4. Samples: 83421252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:33,968][134211] Avg episode reward: [(0, '7.054')] [2025-01-04 04:03:34,002][134294] Updated weights for policy 0, policy_version 92054 (0.0013) [2025-01-04 04:03:35,883][134294] Updated weights for policy 0, policy_version 92064 (0.0013) [2025-01-04 04:03:37,764][134294] Updated weights for policy 0, policy_version 92074 (0.0014) [2025-01-04 04:03:38,968][134211] Fps is (10 sec: 21299.2, 60 sec: 15974.4, 300 sec: 15467.6). Total num frames: 377159680. Throughput: 0: 4074.7. Samples: 83453868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:38,968][134211] Avg episode reward: [(0, '7.710')] [2025-01-04 04:03:39,677][134294] Updated weights for policy 0, policy_version 92084 (0.0015) [2025-01-04 04:03:42,868][134294] Updated weights for policy 0, policy_version 92094 (0.0027) [2025-01-04 04:03:43,968][134211] Fps is (10 sec: 18021.7, 60 sec: 15838.0, 300 sec: 15328.7). Total num frames: 377229312. Throughput: 0: 3973.9. 
Samples: 83478438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:43,969][134211] Avg episode reward: [(0, '8.552')] [2025-01-04 04:03:46,169][134294] Updated weights for policy 0, policy_version 92104 (0.0029) [2025-01-04 04:03:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15769.6, 300 sec: 15314.9). Total num frames: 377290752. Throughput: 0: 3823.3. Samples: 83487976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:48,968][134211] Avg episode reward: [(0, '8.144')] [2025-01-04 04:03:49,395][134294] Updated weights for policy 0, policy_version 92114 (0.0027) [2025-01-04 04:03:52,568][134294] Updated weights for policy 0, policy_version 92124 (0.0024) [2025-01-04 04:03:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15701.3, 300 sec: 15314.9). Total num frames: 377356288. Throughput: 0: 3706.8. Samples: 83507046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:53,968][134211] Avg episode reward: [(0, '6.848')] [2025-01-04 04:03:55,539][134294] Updated weights for policy 0, policy_version 92134 (0.0024) [2025-01-04 04:03:58,540][134294] Updated weights for policy 0, policy_version 92144 (0.0025) [2025-01-04 04:03:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15087.0, 300 sec: 15328.8). Total num frames: 377425920. Throughput: 0: 3737.1. Samples: 83527500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:03:58,968][134211] Avg episode reward: [(0, '7.246')] [2025-01-04 04:04:01,544][134294] Updated weights for policy 0, policy_version 92154 (0.0024) [2025-01-04 04:04:03,970][134211] Fps is (10 sec: 13514.2, 60 sec: 14472.1, 300 sec: 15328.7). Total num frames: 377491456. Throughput: 0: 3754.8. Samples: 83537636. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:03,970][134211] Avg episode reward: [(0, '7.898')] [2025-01-04 04:04:04,769][134294] Updated weights for policy 0, policy_version 92164 (0.0028) [2025-01-04 04:04:07,920][134294] Updated weights for policy 0, policy_version 92174 (0.0025) [2025-01-04 04:04:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.6, 300 sec: 15203.8). Total num frames: 377556992. Throughput: 0: 3742.3. Samples: 83557168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:08,968][134211] Avg episode reward: [(0, '7.419')] [2025-01-04 04:04:10,579][134294] Updated weights for policy 0, policy_version 92184 (0.0024) [2025-01-04 04:04:12,488][134294] Updated weights for policy 0, policy_version 92194 (0.0013) [2025-01-04 04:04:13,968][134211] Fps is (10 sec: 16387.5, 60 sec: 15087.0, 300 sec: 15176.0). Total num frames: 377655296. Throughput: 0: 3931.6. Samples: 83583700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:13,968][134211] Avg episode reward: [(0, '7.472')] [2025-01-04 04:04:14,373][134294] Updated weights for policy 0, policy_version 92204 (0.0013) [2025-01-04 04:04:16,256][134294] Updated weights for policy 0, policy_version 92214 (0.0014) [2025-01-04 04:04:18,152][134294] Updated weights for policy 0, policy_version 92224 (0.0014) [2025-01-04 04:04:18,968][134211] Fps is (10 sec: 20889.4, 60 sec: 15837.9, 300 sec: 15328.8). Total num frames: 377765888. Throughput: 0: 3974.5. Samples: 83600106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:18,968][134211] Avg episode reward: [(0, '6.931')] [2025-01-04 04:04:20,574][134294] Updated weights for policy 0, policy_version 92234 (0.0022) [2025-01-04 04:04:23,793][134294] Updated weights for policy 0, policy_version 92244 (0.0026) [2025-01-04 04:04:23,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15769.6, 300 sec: 15370.4). Total num frames: 377831424. Throughput: 0: 3826.9. Samples: 83626078. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:23,968][134211] Avg episode reward: [(0, '8.310')] [2025-01-04 04:04:26,890][134294] Updated weights for policy 0, policy_version 92254 (0.0026) [2025-01-04 04:04:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15837.8, 300 sec: 15384.4). Total num frames: 377896960. Throughput: 0: 3708.1. Samples: 83645304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:04:28,968][134211] Avg episode reward: [(0, '7.601')] [2025-01-04 04:04:30,160][134294] Updated weights for policy 0, policy_version 92264 (0.0027) [2025-01-04 04:04:33,194][134294] Updated weights for policy 0, policy_version 92274 (0.0027) [2025-01-04 04:04:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15223.4, 300 sec: 15273.2). Total num frames: 377962496. Throughput: 0: 3711.9. Samples: 83655014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:33,968][134211] Avg episode reward: [(0, '7.726')] [2025-01-04 04:04:36,228][134294] Updated weights for policy 0, policy_version 92284 (0.0023) [2025-01-04 04:04:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 15162.1). Total num frames: 378028032. Throughput: 0: 3734.7. Samples: 83675106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:38,968][134211] Avg episode reward: [(0, '7.981')] [2025-01-04 04:04:39,376][134294] Updated weights for policy 0, policy_version 92294 (0.0024) [2025-01-04 04:04:41,810][134294] Updated weights for policy 0, policy_version 92304 (0.0020) [2025-01-04 04:04:43,752][134294] Updated weights for policy 0, policy_version 92314 (0.0014) [2025-01-04 04:04:43,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14882.2, 300 sec: 15259.3). Total num frames: 378122240. Throughput: 0: 3827.7. Samples: 83699746. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:43,968][134211] Avg episode reward: [(0, '7.760')] [2025-01-04 04:04:45,923][134294] Updated weights for policy 0, policy_version 92324 (0.0017) [2025-01-04 04:04:48,884][134294] Updated weights for policy 0, policy_version 92334 (0.0027) [2025-01-04 04:04:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15155.2, 300 sec: 15301.0). Total num frames: 378200064. Throughput: 0: 3917.6. Samples: 83713918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:48,968][134211] Avg episode reward: [(0, '7.734')] [2025-01-04 04:04:52,025][134294] Updated weights for policy 0, policy_version 92344 (0.0026) [2025-01-04 04:04:53,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15155.2, 300 sec: 15301.0). Total num frames: 378265600. Throughput: 0: 3925.0. Samples: 83733796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:53,968][134211] Avg episode reward: [(0, '7.479')] [2025-01-04 04:04:55,285][134294] Updated weights for policy 0, policy_version 92354 (0.0027) [2025-01-04 04:04:57,835][134294] Updated weights for policy 0, policy_version 92364 (0.0019) [2025-01-04 04:04:58,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15291.8, 300 sec: 15328.8). Total num frames: 378343424. Throughput: 0: 3820.8. Samples: 83755636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:04:58,968][134211] Avg episode reward: [(0, '7.660')] [2025-01-04 04:04:59,789][134294] Updated weights for policy 0, policy_version 92374 (0.0014) [2025-01-04 04:05:01,657][134294] Updated weights for policy 0, policy_version 92384 (0.0013) [2025-01-04 04:05:03,698][134294] Updated weights for policy 0, policy_version 92394 (0.0013) [2025-01-04 04:05:03,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15974.9, 300 sec: 15467.6). Total num frames: 378449920. Throughput: 0: 3817.9. Samples: 83771910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:05:03,968][134211] Avg episode reward: [(0, '7.664')] [2025-01-04 04:05:05,691][134294] Updated weights for policy 0, policy_version 92404 (0.0013) [2025-01-04 04:05:07,569][134294] Updated weights for policy 0, policy_version 92414 (0.0012) [2025-01-04 04:05:08,968][134211] Fps is (10 sec: 20479.7, 60 sec: 16520.5, 300 sec: 15481.5). Total num frames: 378548224. Throughput: 0: 3935.2. Samples: 83803160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:05:08,968][134211] Avg episode reward: [(0, '8.499')] [2025-01-04 04:05:10,138][134294] Updated weights for policy 0, policy_version 92424 (0.0024) [2025-01-04 04:05:13,369][134294] Updated weights for policy 0, policy_version 92434 (0.0026) [2025-01-04 04:05:13,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15974.3, 300 sec: 15356.5). Total num frames: 378613760. Throughput: 0: 3993.6. Samples: 83825016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:05:13,968][134211] Avg episode reward: [(0, '8.409')] [2025-01-04 04:05:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000092436_378617856.pth... [2025-01-04 04:05:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091532_374915072.pth [2025-01-04 04:05:16,529][134294] Updated weights for policy 0, policy_version 92444 (0.0027) [2025-01-04 04:05:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15223.5, 300 sec: 15356.5). Total num frames: 378679296. Throughput: 0: 3994.0. Samples: 83834744. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:05:18,968][134211] Avg episode reward: [(0, '7.806')] [2025-01-04 04:05:19,733][134294] Updated weights for policy 0, policy_version 92454 (0.0022) [2025-01-04 04:05:22,832][134294] Updated weights for policy 0, policy_version 92464 (0.0026) [2025-01-04 04:05:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15223.5, 300 sec: 15356.5). Total num frames: 378744832. Throughput: 0: 3981.2. Samples: 83854262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:05:23,968][134211] Avg episode reward: [(0, '8.259')] [2025-01-04 04:05:26,166][134294] Updated weights for policy 0, policy_version 92474 (0.0026) [2025-01-04 04:05:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15087.0, 300 sec: 15328.8). Total num frames: 378802176. Throughput: 0: 3831.2. Samples: 83872152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:28,968][134211] Avg episode reward: [(0, '7.949')] [2025-01-04 04:05:29,847][134294] Updated weights for policy 0, policy_version 92484 (0.0030) [2025-01-04 04:05:33,113][134294] Updated weights for policy 0, policy_version 92494 (0.0024) [2025-01-04 04:05:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15018.7, 300 sec: 15287.1). Total num frames: 378863616. Throughput: 0: 3715.1. Samples: 83881098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:33,968][134211] Avg episode reward: [(0, '7.368')] [2025-01-04 04:05:36,121][134294] Updated weights for policy 0, policy_version 92504 (0.0022) [2025-01-04 04:05:38,016][134294] Updated weights for policy 0, policy_version 92514 (0.0014) [2025-01-04 04:05:38,967][134211] Fps is (10 sec: 15155.4, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 378953728. Throughput: 0: 3762.6. Samples: 83903112. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:38,968][134211] Avg episode reward: [(0, '8.793')] [2025-01-04 04:05:39,946][134294] Updated weights for policy 0, policy_version 92524 (0.0014) [2025-01-04 04:05:41,806][134294] Updated weights for policy 0, policy_version 92534 (0.0011) [2025-01-04 04:05:43,700][134294] Updated weights for policy 0, policy_version 92544 (0.0012) [2025-01-04 04:05:43,968][134211] Fps is (10 sec: 20070.5, 60 sec: 15701.3, 300 sec: 15259.3). Total num frames: 379064320. Throughput: 0: 3997.2. Samples: 83935508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:43,968][134211] Avg episode reward: [(0, '7.957')] [2025-01-04 04:05:45,581][134294] Updated weights for policy 0, policy_version 92554 (0.0013) [2025-01-04 04:05:47,421][134294] Updated weights for policy 0, policy_version 92564 (0.0014) [2025-01-04 04:05:48,968][134211] Fps is (10 sec: 20889.2, 60 sec: 16042.7, 300 sec: 15370.4). Total num frames: 379162624. Throughput: 0: 3999.9. Samples: 83951906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:48,968][134211] Avg episode reward: [(0, '8.478')] [2025-01-04 04:05:50,435][134294] Updated weights for policy 0, policy_version 92574 (0.0026) [2025-01-04 04:05:53,640][134294] Updated weights for policy 0, policy_version 92584 (0.0025) [2025-01-04 04:05:53,968][134211] Fps is (10 sec: 16383.9, 60 sec: 16042.7, 300 sec: 15384.3). Total num frames: 379228160. Throughput: 0: 3814.2. Samples: 83974800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:53,968][134211] Avg episode reward: [(0, '7.856')] [2025-01-04 04:05:56,782][134294] Updated weights for policy 0, policy_version 92594 (0.0025) [2025-01-04 04:05:58,969][134211] Fps is (10 sec: 13105.9, 60 sec: 15837.6, 300 sec: 15384.2). Total num frames: 379293696. Throughput: 0: 3756.2. Samples: 83994048. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:05:58,969][134211] Avg episode reward: [(0, '7.885')] [2025-01-04 04:05:59,872][134294] Updated weights for policy 0, policy_version 92604 (0.0025) [2025-01-04 04:06:03,074][134294] Updated weights for policy 0, policy_version 92614 (0.0026) [2025-01-04 04:06:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15087.0, 300 sec: 15370.4). Total num frames: 379355136. Throughput: 0: 3756.4. Samples: 84003784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:06:03,968][134211] Avg episode reward: [(0, '7.849')] [2025-01-04 04:06:06,008][134294] Updated weights for policy 0, policy_version 92624 (0.0027) [2025-01-04 04:06:08,968][134211] Fps is (10 sec: 13108.4, 60 sec: 14609.1, 300 sec: 15287.1). Total num frames: 379424768. Throughput: 0: 3765.6. Samples: 84023714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:06:08,968][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 04:06:09,348][134294] Updated weights for policy 0, policy_version 92634 (0.0026) [2025-01-04 04:06:12,325][134294] Updated weights for policy 0, policy_version 92644 (0.0027) [2025-01-04 04:06:13,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14608.9, 300 sec: 15162.1). Total num frames: 379490304. Throughput: 0: 3805.5. Samples: 84043404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:06:13,969][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 04:06:15,340][134294] Updated weights for policy 0, policy_version 92654 (0.0025) [2025-01-04 04:06:18,262][134294] Updated weights for policy 0, policy_version 92664 (0.0025) [2025-01-04 04:06:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 15203.8). Total num frames: 379559936. Throughput: 0: 3840.0. Samples: 84053900. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:06:18,968][134211] Avg episode reward: [(0, '8.249')] [2025-01-04 04:06:21,113][134294] Updated weights for policy 0, policy_version 92674 (0.0021) [2025-01-04 04:06:23,165][134294] Updated weights for policy 0, policy_version 92684 (0.0016) [2025-01-04 04:06:23,968][134211] Fps is (10 sec: 15156.2, 60 sec: 14950.4, 300 sec: 15287.1). Total num frames: 379641856. Throughput: 0: 3867.6. Samples: 84077156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:23,968][134211] Avg episode reward: [(0, '8.456')] [2025-01-04 04:06:26,139][134294] Updated weights for policy 0, policy_version 92694 (0.0023) [2025-01-04 04:06:28,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15155.2, 300 sec: 15301.0). Total num frames: 379711488. Throughput: 0: 3623.3. Samples: 84098558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:28,968][134211] Avg episode reward: [(0, '8.394')] [2025-01-04 04:06:29,120][134294] Updated weights for policy 0, policy_version 92704 (0.0023) [2025-01-04 04:06:31,003][134294] Updated weights for policy 0, policy_version 92714 (0.0012) [2025-01-04 04:06:32,929][134294] Updated weights for policy 0, policy_version 92724 (0.0014) [2025-01-04 04:06:33,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15837.8, 300 sec: 15287.1). Total num frames: 379813888. Throughput: 0: 3586.7. Samples: 84113308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:33,968][134211] Avg episode reward: [(0, '6.739')] [2025-01-04 04:06:35,609][134294] Updated weights for policy 0, policy_version 92734 (0.0024) [2025-01-04 04:06:38,640][134294] Updated weights for policy 0, policy_version 92744 (0.0024) [2025-01-04 04:06:38,969][134211] Fps is (10 sec: 16791.6, 60 sec: 15427.9, 300 sec: 15148.2). Total num frames: 379879424. Throughput: 0: 3626.6. Samples: 84138002. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:38,969][134211] Avg episode reward: [(0, '7.809')] [2025-01-04 04:06:41,725][134294] Updated weights for policy 0, policy_version 92754 (0.0026) [2025-01-04 04:06:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.5, 300 sec: 15176.0). Total num frames: 379949056. Throughput: 0: 3641.3. Samples: 84157904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:43,968][134211] Avg episode reward: [(0, '8.205')] [2025-01-04 04:06:44,902][134294] Updated weights for policy 0, policy_version 92764 (0.0024) [2025-01-04 04:06:46,904][134294] Updated weights for policy 0, policy_version 92774 (0.0014) [2025-01-04 04:06:48,780][134294] Updated weights for policy 0, policy_version 92784 (0.0014) [2025-01-04 04:06:48,968][134211] Fps is (10 sec: 16795.7, 60 sec: 14745.6, 300 sec: 15301.0). Total num frames: 380047360. Throughput: 0: 3691.8. Samples: 84169916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:48,968][134211] Avg episode reward: [(0, '8.338')] [2025-01-04 04:06:50,654][134294] Updated weights for policy 0, policy_version 92794 (0.0013) [2025-01-04 04:06:52,575][134294] Updated weights for policy 0, policy_version 92804 (0.0013) [2025-01-04 04:06:53,968][134211] Fps is (10 sec: 20480.4, 60 sec: 15428.3, 300 sec: 15439.9). Total num frames: 380153856. Throughput: 0: 3974.8. Samples: 84202580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:53,968][134211] Avg episode reward: [(0, '8.221')] [2025-01-04 04:06:54,379][134294] Updated weights for policy 0, policy_version 92814 (0.0014) [2025-01-04 04:06:56,536][134294] Updated weights for policy 0, policy_version 92824 (0.0015) [2025-01-04 04:06:58,969][134211] Fps is (10 sec: 18841.2, 60 sec: 15701.6, 300 sec: 15495.4). Total num frames: 380235776. Throughput: 0: 4158.7. Samples: 84230544. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:06:58,969][134211] Avg episode reward: [(0, '8.250')] [2025-01-04 04:06:59,629][134294] Updated weights for policy 0, policy_version 92834 (0.0029) [2025-01-04 04:07:03,043][134294] Updated weights for policy 0, policy_version 92844 (0.0027) [2025-01-04 04:07:03,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15701.3, 300 sec: 15481.5). Total num frames: 380297216. Throughput: 0: 4131.2. Samples: 84239804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:07:03,968][134211] Avg episode reward: [(0, '6.764')] [2025-01-04 04:07:06,116][134294] Updated weights for policy 0, policy_version 92854 (0.0027) [2025-01-04 04:07:08,968][134211] Fps is (10 sec: 12696.9, 60 sec: 15632.9, 300 sec: 15481.5). Total num frames: 380362752. Throughput: 0: 4040.9. Samples: 84258998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:07:08,969][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 04:07:09,484][134294] Updated weights for policy 0, policy_version 92864 (0.0025) [2025-01-04 04:07:12,594][134294] Updated weights for policy 0, policy_version 92874 (0.0027) [2025-01-04 04:07:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15633.3, 300 sec: 15425.9). Total num frames: 380428288. Throughput: 0: 3986.7. Samples: 84277960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:07:13,968][134211] Avg episode reward: [(0, '7.434')] [2025-01-04 04:07:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000092878_380428288.pth... 
[2025-01-04 04:07:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000091981_376754176.pth [2025-01-04 04:07:15,714][134294] Updated weights for policy 0, policy_version 92884 (0.0027) [2025-01-04 04:07:18,687][134294] Updated weights for policy 0, policy_version 92894 (0.0026) [2025-01-04 04:07:18,968][134211] Fps is (10 sec: 13108.2, 60 sec: 15564.8, 300 sec: 15328.8). Total num frames: 380493824. Throughput: 0: 3882.1. Samples: 84288004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:18,968][134211] Avg episode reward: [(0, '7.002')] [2025-01-04 04:07:21,681][134294] Updated weights for policy 0, policy_version 92904 (0.0025) [2025-01-04 04:07:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15360.0, 300 sec: 15301.0). Total num frames: 380563456. Throughput: 0: 3795.2. Samples: 84308782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:23,968][134211] Avg episode reward: [(0, '7.629')] [2025-01-04 04:07:24,689][134294] Updated weights for policy 0, policy_version 92914 (0.0025) [2025-01-04 04:07:28,197][134294] Updated weights for policy 0, policy_version 92924 (0.0024) [2025-01-04 04:07:28,967][134211] Fps is (10 sec: 13517.0, 60 sec: 15291.8, 300 sec: 15301.0). Total num frames: 380628992. Throughput: 0: 3761.3. Samples: 84327160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:28,968][134211] Avg episode reward: [(0, '8.081')] [2025-01-04 04:07:30,257][134294] Updated weights for policy 0, policy_version 92934 (0.0013) [2025-01-04 04:07:32,250][134294] Updated weights for policy 0, policy_version 92944 (0.0013) [2025-01-04 04:07:33,968][134211] Fps is (10 sec: 17203.6, 60 sec: 15360.0, 300 sec: 15370.4). Total num frames: 380735488. Throughput: 0: 3829.5. Samples: 84342244. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:33,968][134211] Avg episode reward: [(0, '7.454')] [2025-01-04 04:07:34,127][134294] Updated weights for policy 0, policy_version 92954 (0.0013) [2025-01-04 04:07:36,052][134294] Updated weights for policy 0, policy_version 92964 (0.0015) [2025-01-04 04:07:37,914][134294] Updated weights for policy 0, policy_version 92974 (0.0012) [2025-01-04 04:07:38,968][134211] Fps is (10 sec: 21298.9, 60 sec: 16043.0, 300 sec: 15467.6). Total num frames: 380841984. Throughput: 0: 3822.0. Samples: 84374568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:38,968][134211] Avg episode reward: [(0, '6.940')] [2025-01-04 04:07:39,800][134294] Updated weights for policy 0, policy_version 92984 (0.0013) [2025-01-04 04:07:42,127][134294] Updated weights for policy 0, policy_version 92994 (0.0019) [2025-01-04 04:07:43,968][134211] Fps is (10 sec: 18841.5, 60 sec: 16247.5, 300 sec: 15523.1). Total num frames: 380923904. Throughput: 0: 3821.1. Samples: 84402492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:43,968][134211] Avg episode reward: [(0, '7.456')] [2025-01-04 04:07:45,339][134294] Updated weights for policy 0, policy_version 93004 (0.0033) [2025-01-04 04:07:48,424][134294] Updated weights for policy 0, policy_version 93014 (0.0026) [2025-01-04 04:07:48,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15701.3, 300 sec: 15509.3). Total num frames: 380989440. Throughput: 0: 3829.3. Samples: 84412122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:48,969][134211] Avg episode reward: [(0, '7.074')] [2025-01-04 04:07:51,563][134294] Updated weights for policy 0, policy_version 93024 (0.0026) [2025-01-04 04:07:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.6, 300 sec: 15370.4). Total num frames: 381054976. Throughput: 0: 3841.3. Samples: 84431856. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:53,969][134211] Avg episode reward: [(0, '8.138')] [2025-01-04 04:07:54,673][134294] Updated weights for policy 0, policy_version 93034 (0.0027) [2025-01-04 04:07:57,759][134294] Updated weights for policy 0, policy_version 93044 (0.0025) [2025-01-04 04:07:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14745.7, 300 sec: 15245.5). Total num frames: 381120512. Throughput: 0: 3854.8. Samples: 84451428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:07:58,968][134211] Avg episode reward: [(0, '8.334')] [2025-01-04 04:08:00,816][134294] Updated weights for policy 0, policy_version 93054 (0.0024) [2025-01-04 04:08:03,964][134294] Updated weights for policy 0, policy_version 93064 (0.0027) [2025-01-04 04:08:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 15259.3). Total num frames: 381190144. Throughput: 0: 3860.4. Samples: 84461722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:03,968][134211] Avg episode reward: [(0, '8.019')] [2025-01-04 04:08:06,970][134294] Updated weights for policy 0, policy_version 93074 (0.0027) [2025-01-04 04:08:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.3, 300 sec: 15273.2). Total num frames: 381255680. Throughput: 0: 3843.2. Samples: 84481726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:08,968][134211] Avg episode reward: [(0, '7.679')] [2025-01-04 04:08:10,044][134294] Updated weights for policy 0, policy_version 93084 (0.0028) [2025-01-04 04:08:13,020][134294] Updated weights for policy 0, policy_version 93094 (0.0026) [2025-01-04 04:08:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14882.1, 300 sec: 15273.2). Total num frames: 381321216. Throughput: 0: 3881.9. Samples: 84501844. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:13,968][134211] Avg episode reward: [(0, '7.927')] [2025-01-04 04:08:16,180][134294] Updated weights for policy 0, policy_version 93104 (0.0027) [2025-01-04 04:08:18,441][134294] Updated weights for policy 0, policy_version 93114 (0.0016) [2025-01-04 04:08:18,967][134211] Fps is (10 sec: 14745.9, 60 sec: 15155.2, 300 sec: 15314.9). Total num frames: 381403136. Throughput: 0: 3761.3. Samples: 84511504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:18,968][134211] Avg episode reward: [(0, '7.827')] [2025-01-04 04:08:21,058][134294] Updated weights for policy 0, policy_version 93124 (0.0021) [2025-01-04 04:08:23,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15155.2, 300 sec: 15342.6). Total num frames: 381472768. Throughput: 0: 3591.2. Samples: 84536174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:23,968][134211] Avg episode reward: [(0, '7.300')] [2025-01-04 04:08:23,972][134294] Updated weights for policy 0, policy_version 93134 (0.0025) [2025-01-04 04:08:26,699][134294] Updated weights for policy 0, policy_version 93144 (0.0022) [2025-01-04 04:08:28,536][134294] Updated weights for policy 0, policy_version 93154 (0.0013) [2025-01-04 04:08:28,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15633.0, 300 sec: 15314.9). Total num frames: 381566976. Throughput: 0: 3519.1. Samples: 84560850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:28,968][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 04:08:30,460][134294] Updated weights for policy 0, policy_version 93164 (0.0013) [2025-01-04 04:08:32,322][134294] Updated weights for policy 0, policy_version 93174 (0.0014) [2025-01-04 04:08:33,970][134211] Fps is (10 sec: 20066.1, 60 sec: 15632.5, 300 sec: 15300.9). Total num frames: 381673472. Throughput: 0: 3667.8. Samples: 84577180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:33,970][134211] Avg episode reward: [(0, '7.925')] [2025-01-04 04:08:34,237][134294] Updated weights for policy 0, policy_version 93184 (0.0014) [2025-01-04 04:08:36,312][134294] Updated weights for policy 0, policy_version 93194 (0.0017) [2025-01-04 04:08:38,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15155.2, 300 sec: 15328.8). Total num frames: 381751296. Throughput: 0: 3877.4. Samples: 84606338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:38,968][134211] Avg episode reward: [(0, '7.773')] [2025-01-04 04:08:40,100][134294] Updated weights for policy 0, policy_version 93204 (0.0027) [2025-01-04 04:08:43,704][134294] Updated weights for policy 0, policy_version 93214 (0.0027) [2025-01-04 04:08:43,968][134211] Fps is (10 sec: 13109.7, 60 sec: 14677.3, 300 sec: 15301.0). Total num frames: 381804544. Throughput: 0: 3807.6. Samples: 84622770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:43,969][134211] Avg episode reward: [(0, '8.172')] [2025-01-04 04:08:47,032][134294] Updated weights for policy 0, policy_version 93224 (0.0027) [2025-01-04 04:08:48,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14609.1, 300 sec: 15287.1). Total num frames: 381865984. Throughput: 0: 3776.1. Samples: 84631648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:48,968][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 04:08:50,433][134294] Updated weights for policy 0, policy_version 93234 (0.0029) [2025-01-04 04:08:53,279][134294] Updated weights for policy 0, policy_version 93244 (0.0020) [2025-01-04 04:08:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 15301.0). Total num frames: 381939712. Throughput: 0: 3746.3. Samples: 84650308. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:53,968][134211] Avg episode reward: [(0, '7.787')] [2025-01-04 04:08:55,505][134294] Updated weights for policy 0, policy_version 93254 (0.0018) [2025-01-04 04:08:58,528][134294] Updated weights for policy 0, policy_version 93264 (0.0027) [2025-01-04 04:08:58,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14882.1, 300 sec: 15328.9). Total num frames: 382013440. Throughput: 0: 3833.3. Samples: 84674342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:08:58,968][134211] Avg episode reward: [(0, '7.513')] [2025-01-04 04:09:01,643][134294] Updated weights for policy 0, policy_version 93274 (0.0025) [2025-01-04 04:09:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.8, 300 sec: 15328.7). Total num frames: 382078976. Throughput: 0: 3838.9. Samples: 84684254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:09:03,968][134211] Avg episode reward: [(0, '7.575')] [2025-01-04 04:09:04,786][134294] Updated weights for policy 0, policy_version 93284 (0.0027) [2025-01-04 04:09:06,748][134294] Updated weights for policy 0, policy_version 93294 (0.0013) [2025-01-04 04:09:08,634][134294] Updated weights for policy 0, policy_version 93304 (0.0012) [2025-01-04 04:09:08,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15360.0, 300 sec: 15328.8). Total num frames: 382177280. Throughput: 0: 3834.0. Samples: 84708704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:09:08,968][134211] Avg episode reward: [(0, '7.533')] [2025-01-04 04:09:10,485][134294] Updated weights for policy 0, policy_version 93314 (0.0013) [2025-01-04 04:09:12,354][134294] Updated weights for policy 0, policy_version 93324 (0.0015) [2025-01-04 04:09:13,968][134211] Fps is (10 sec: 20890.1, 60 sec: 16110.9, 300 sec: 15328.8). Total num frames: 382287872. Throughput: 0: 4010.9. Samples: 84741342. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:13,968][134211] Avg episode reward: [(0, '7.948')] [2025-01-04 04:09:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000093332_382287872.pth... [2025-01-04 04:09:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000092436_378617856.pth [2025-01-04 04:09:14,277][134294] Updated weights for policy 0, policy_version 93334 (0.0013) [2025-01-04 04:09:16,232][134294] Updated weights for policy 0, policy_version 93344 (0.0013) [2025-01-04 04:09:18,968][134211] Fps is (10 sec: 19660.5, 60 sec: 16179.1, 300 sec: 15398.2). Total num frames: 382373888. Throughput: 0: 4006.1. Samples: 84757444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:18,968][134211] Avg episode reward: [(0, '7.238')] [2025-01-04 04:09:19,143][134294] Updated weights for policy 0, policy_version 93354 (0.0026) [2025-01-04 04:09:22,320][134294] Updated weights for policy 0, policy_version 93364 (0.0025) [2025-01-04 04:09:23,968][134211] Fps is (10 sec: 14745.4, 60 sec: 16042.6, 300 sec: 15384.3). Total num frames: 382435328. Throughput: 0: 3802.6. Samples: 84777456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:23,968][134211] Avg episode reward: [(0, '7.822')] [2025-01-04 04:09:25,638][134294] Updated weights for policy 0, policy_version 93374 (0.0029) [2025-01-04 04:09:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15496.5, 300 sec: 15370.4). Total num frames: 382496768. Throughput: 0: 3844.0. Samples: 84795750. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:28,968][134211] Avg episode reward: [(0, '7.328')] [2025-01-04 04:09:29,098][134294] Updated weights for policy 0, policy_version 93384 (0.0024) [2025-01-04 04:09:32,643][134294] Updated weights for policy 0, policy_version 93394 (0.0025) [2025-01-04 04:09:33,968][134211] Fps is (10 sec: 11878.0, 60 sec: 14677.7, 300 sec: 15342.6). Total num frames: 382554112. Throughput: 0: 3837.4. Samples: 84804332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:33,969][134211] Avg episode reward: [(0, '7.474')] [2025-01-04 04:09:35,902][134294] Updated weights for policy 0, policy_version 93404 (0.0028) [2025-01-04 04:09:38,969][134211] Fps is (10 sec: 12285.8, 60 sec: 14472.1, 300 sec: 15245.4). Total num frames: 382619648. Throughput: 0: 3835.4. Samples: 84822906. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:38,970][134211] Avg episode reward: [(0, '7.646')] [2025-01-04 04:09:39,072][134294] Updated weights for policy 0, policy_version 93414 (0.0025) [2025-01-04 04:09:42,057][134294] Updated weights for policy 0, policy_version 93424 (0.0022) [2025-01-04 04:09:43,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14745.7, 300 sec: 15217.7). Total num frames: 382689280. Throughput: 0: 3746.4. Samples: 84842928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:43,968][134211] Avg episode reward: [(0, '6.734')] [2025-01-04 04:09:44,644][134294] Updated weights for policy 0, policy_version 93434 (0.0017) [2025-01-04 04:09:47,078][134294] Updated weights for policy 0, policy_version 93444 (0.0018) [2025-01-04 04:09:48,969][134211] Fps is (10 sec: 14745.5, 60 sec: 15018.2, 300 sec: 15259.3). Total num frames: 382767104. Throughput: 0: 3828.9. Samples: 84856562. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:48,970][134211] Avg episode reward: [(0, '7.608')] [2025-01-04 04:09:50,403][134294] Updated weights for policy 0, policy_version 93454 (0.0025) [2025-01-04 04:09:52,362][134294] Updated weights for policy 0, policy_version 93464 (0.0012) [2025-01-04 04:09:53,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15360.0, 300 sec: 15314.9). Total num frames: 382861312. Throughput: 0: 3791.7. Samples: 84879332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:53,968][134211] Avg episode reward: [(0, '8.546')] [2025-01-04 04:09:54,273][134294] Updated weights for policy 0, policy_version 93474 (0.0013) [2025-01-04 04:09:56,151][134294] Updated weights for policy 0, policy_version 93484 (0.0014) [2025-01-04 04:09:58,067][134294] Updated weights for policy 0, policy_version 93494 (0.0015) [2025-01-04 04:09:58,967][134211] Fps is (10 sec: 20074.3, 60 sec: 15906.2, 300 sec: 15314.9). Total num frames: 382967808. Throughput: 0: 3788.2. Samples: 84911812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:09:58,968][134211] Avg episode reward: [(0, '7.811')] [2025-01-04 04:10:00,845][134294] Updated weights for policy 0, policy_version 93504 (0.0025) [2025-01-04 04:10:03,971][134211] Fps is (10 sec: 16787.7, 60 sec: 15836.9, 300 sec: 15189.7). Total num frames: 383029248. Throughput: 0: 3661.3. Samples: 84922214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:10:03,972][134211] Avg episode reward: [(0, '7.937')] [2025-01-04 04:10:04,258][134294] Updated weights for policy 0, policy_version 93514 (0.0029) [2025-01-04 04:10:07,432][134294] Updated weights for policy 0, policy_version 93524 (0.0024) [2025-01-04 04:10:08,968][134211] Fps is (10 sec: 12287.7, 60 sec: 15223.4, 300 sec: 15176.0). Total num frames: 383090688. Throughput: 0: 3632.2. Samples: 84940906. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:08,968][134211] Avg episode reward: [(0, '7.683')] [2025-01-04 04:10:10,705][134294] Updated weights for policy 0, policy_version 93534 (0.0029) [2025-01-04 04:10:13,732][134294] Updated weights for policy 0, policy_version 93544 (0.0025) [2025-01-04 04:10:13,968][134211] Fps is (10 sec: 12701.9, 60 sec: 14472.4, 300 sec: 15176.0). Total num frames: 383156224. Throughput: 0: 3657.4. Samples: 84960334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:13,969][134211] Avg episode reward: [(0, '7.737')] [2025-01-04 04:10:16,796][134294] Updated weights for policy 0, policy_version 93554 (0.0026) [2025-01-04 04:10:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 15189.9). Total num frames: 383225856. Throughput: 0: 3688.6. Samples: 84970316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:18,968][134211] Avg episode reward: [(0, '7.725')] [2025-01-04 04:10:19,899][134294] Updated weights for policy 0, policy_version 93564 (0.0025) [2025-01-04 04:10:22,008][134294] Updated weights for policy 0, policy_version 93574 (0.0013) [2025-01-04 04:10:23,860][134294] Updated weights for policy 0, policy_version 93584 (0.0014) [2025-01-04 04:10:23,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14745.5, 300 sec: 15314.8). Total num frames: 383320064. Throughput: 0: 3804.3. Samples: 84994094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:23,969][134211] Avg episode reward: [(0, '7.752')] [2025-01-04 04:10:25,814][134294] Updated weights for policy 0, policy_version 93594 (0.0013) [2025-01-04 04:10:28,196][134294] Updated weights for policy 0, policy_version 93604 (0.0022) [2025-01-04 04:10:28,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15223.5, 300 sec: 15412.1). Total num frames: 383410176. Throughput: 0: 4021.1. Samples: 85023878. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:28,968][134211] Avg episode reward: [(0, '6.639')] [2025-01-04 04:10:31,384][134294] Updated weights for policy 0, policy_version 93614 (0.0027) [2025-01-04 04:10:33,968][134211] Fps is (10 sec: 15155.9, 60 sec: 15291.8, 300 sec: 15314.9). Total num frames: 383471616. Throughput: 0: 3922.4. Samples: 85033064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:33,968][134211] Avg episode reward: [(0, '7.126')] [2025-01-04 04:10:34,703][134294] Updated weights for policy 0, policy_version 93624 (0.0027) [2025-01-04 04:10:37,422][134294] Updated weights for policy 0, policy_version 93634 (0.0019) [2025-01-04 04:10:38,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15565.2, 300 sec: 15217.7). Total num frames: 383553536. Throughput: 0: 3865.8. Samples: 85053292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:38,968][134211] Avg episode reward: [(0, '7.240')] [2025-01-04 04:10:39,721][134294] Updated weights for policy 0, policy_version 93644 (0.0017) [2025-01-04 04:10:42,662][134294] Updated weights for policy 0, policy_version 93654 (0.0025) [2025-01-04 04:10:43,968][134211] Fps is (10 sec: 14745.8, 60 sec: 15496.5, 300 sec: 15106.6). Total num frames: 383619072. Throughput: 0: 3659.5. Samples: 85076490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:43,968][134211] Avg episode reward: [(0, '7.742')] [2025-01-04 04:10:45,813][134294] Updated weights for policy 0, policy_version 93664 (0.0026) [2025-01-04 04:10:47,635][134294] Updated weights for policy 0, policy_version 93674 (0.0014) [2025-01-04 04:10:48,967][134211] Fps is (10 sec: 15975.1, 60 sec: 15770.1, 300 sec: 15203.8). Total num frames: 383713280. Throughput: 0: 3669.6. Samples: 85087330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:48,968][134211] Avg episode reward: [(0, '7.688')] [2025-01-04 04:10:49,546][134294] Updated weights for policy 0, policy_version 93684 (0.0013) [2025-01-04 04:10:52,117][134294] Updated weights for policy 0, policy_version 93694 (0.0021) [2025-01-04 04:10:53,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15496.6, 300 sec: 15245.5). Total num frames: 383791104. Throughput: 0: 3893.0. Samples: 85116092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:53,968][134211] Avg episode reward: [(0, '7.989')] [2025-01-04 04:10:55,234][134294] Updated weights for policy 0, policy_version 93704 (0.0025) [2025-01-04 04:10:58,292][134294] Updated weights for policy 0, policy_version 93714 (0.0024) [2025-01-04 04:10:58,969][134211] Fps is (10 sec: 14333.4, 60 sec: 14813.4, 300 sec: 15259.2). Total num frames: 383856640. Throughput: 0: 3898.7. Samples: 85135782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:10:58,970][134211] Avg episode reward: [(0, '7.514')] [2025-01-04 04:11:01,507][134294] Updated weights for policy 0, policy_version 93724 (0.0027) [2025-01-04 04:11:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14883.0, 300 sec: 15245.4). Total num frames: 383922176. Throughput: 0: 3893.7. Samples: 85145532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:03,968][134211] Avg episode reward: [(0, '7.580')] [2025-01-04 04:11:04,463][134294] Updated weights for policy 0, policy_version 93734 (0.0021) [2025-01-04 04:11:06,351][134294] Updated weights for policy 0, policy_version 93744 (0.0014) [2025-01-04 04:11:08,302][134294] Updated weights for policy 0, policy_version 93754 (0.0013) [2025-01-04 04:11:08,968][134211] Fps is (10 sec: 17206.2, 60 sec: 15633.1, 300 sec: 15384.3). Total num frames: 384028672. Throughput: 0: 3940.6. Samples: 85171416. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:08,968][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 04:11:10,182][134294] Updated weights for policy 0, policy_version 93764 (0.0013) [2025-01-04 04:11:12,416][134294] Updated weights for policy 0, policy_version 93774 (0.0018) [2025-01-04 04:11:13,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15974.5, 300 sec: 15439.8). Total num frames: 384114688. Throughput: 0: 3913.5. Samples: 85199986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:13,968][134211] Avg episode reward: [(0, '7.764')] [2025-01-04 04:11:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000093778_384114688.pth... [2025-01-04 04:11:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000092878_380428288.pth [2025-01-04 04:11:15,847][134294] Updated weights for policy 0, policy_version 93784 (0.0026) [2025-01-04 04:11:18,962][134294] Updated weights for policy 0, policy_version 93794 (0.0026) [2025-01-04 04:11:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15906.1, 300 sec: 15384.3). Total num frames: 384180224. Throughput: 0: 3911.7. Samples: 85209090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:18,968][134211] Avg episode reward: [(0, '8.497')] [2025-01-04 04:11:22,042][134294] Updated weights for policy 0, policy_version 93804 (0.0030) [2025-01-04 04:11:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.4, 300 sec: 15370.4). Total num frames: 384245760. Throughput: 0: 3899.1. Samples: 85228750. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:23,968][134211] Avg episode reward: [(0, '7.913')] [2025-01-04 04:11:25,314][134294] Updated weights for policy 0, policy_version 93814 (0.0028) [2025-01-04 04:11:28,817][134294] Updated weights for policy 0, policy_version 93824 (0.0026) [2025-01-04 04:11:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14882.1, 300 sec: 15217.7). Total num frames: 384303104. Throughput: 0: 3793.9. Samples: 85247214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:28,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 04:11:31,573][134294] Updated weights for policy 0, policy_version 93834 (0.0019) [2025-01-04 04:11:33,633][134294] Updated weights for policy 0, policy_version 93844 (0.0013) [2025-01-04 04:11:33,969][134211] Fps is (10 sec: 14334.1, 60 sec: 15291.4, 300 sec: 15287.1). Total num frames: 384389120. Throughput: 0: 3771.3. Samples: 85257044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:33,969][134211] Avg episode reward: [(0, '6.987')] [2025-01-04 04:11:35,522][134294] Updated weights for policy 0, policy_version 93854 (0.0013) [2025-01-04 04:11:37,455][134294] Updated weights for policy 0, policy_version 93864 (0.0013) [2025-01-04 04:11:38,967][134211] Fps is (10 sec: 19661.2, 60 sec: 15769.7, 300 sec: 15426.0). Total num frames: 384499712. Throughput: 0: 3835.9. Samples: 85288708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:38,968][134211] Avg episode reward: [(0, '7.181')] [2025-01-04 04:11:39,314][134294] Updated weights for policy 0, policy_version 93874 (0.0016) [2025-01-04 04:11:41,834][134294] Updated weights for policy 0, policy_version 93884 (0.0020) [2025-01-04 04:11:43,968][134211] Fps is (10 sec: 18434.2, 60 sec: 15906.1, 300 sec: 15342.6). Total num frames: 384573440. Throughput: 0: 3972.1. Samples: 85314520. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:43,968][134211] Avg episode reward: [(0, '7.776')] [2025-01-04 04:11:45,122][134294] Updated weights for policy 0, policy_version 93894 (0.0028) [2025-01-04 04:11:48,359][134294] Updated weights for policy 0, policy_version 93904 (0.0028) [2025-01-04 04:11:48,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15360.0, 300 sec: 15189.9). Total num frames: 384634880. Throughput: 0: 3960.7. Samples: 85323762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:48,968][134211] Avg episode reward: [(0, '8.198')] [2025-01-04 04:11:51,538][134294] Updated weights for policy 0, policy_version 93914 (0.0024) [2025-01-04 04:11:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15155.2, 300 sec: 15134.4). Total num frames: 384700416. Throughput: 0: 3810.8. Samples: 85342902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:53,968][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 04:11:54,854][134294] Updated weights for policy 0, policy_version 93924 (0.0028) [2025-01-04 04:11:58,053][134294] Updated weights for policy 0, policy_version 93934 (0.0027) [2025-01-04 04:11:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.6, 300 sec: 15148.3). Total num frames: 384765952. Throughput: 0: 3601.7. Samples: 85362064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:11:58,968][134211] Avg episode reward: [(0, '8.164')] [2025-01-04 04:12:01,148][134294] Updated weights for policy 0, policy_version 93944 (0.0028) [2025-01-04 04:12:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15086.9, 300 sec: 15134.4). Total num frames: 384827392. Throughput: 0: 3619.4. Samples: 85371962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:03,968][134211] Avg episode reward: [(0, '7.739')] [2025-01-04 04:12:04,258][134294] Updated weights for policy 0, policy_version 93954 (0.0027) [2025-01-04 04:12:06,277][134294] Updated weights for policy 0, policy_version 93964 (0.0013) [2025-01-04 04:12:08,242][134294] Updated weights for policy 0, policy_version 93974 (0.0012) [2025-01-04 04:12:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15018.7, 300 sec: 15259.3). Total num frames: 384929792. Throughput: 0: 3744.4. Samples: 85397250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:08,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 04:12:10,212][134294] Updated weights for policy 0, policy_version 93984 (0.0014) [2025-01-04 04:12:12,915][134294] Updated weights for policy 0, policy_version 93994 (0.0025) [2025-01-04 04:12:13,968][134211] Fps is (10 sec: 18022.2, 60 sec: 14882.1, 300 sec: 15301.0). Total num frames: 385007616. Throughput: 0: 3917.6. Samples: 85423508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:13,969][134211] Avg episode reward: [(0, '7.841')] [2025-01-04 04:12:16,208][134294] Updated weights for policy 0, policy_version 94004 (0.0029) [2025-01-04 04:12:18,969][134211] Fps is (10 sec: 14334.5, 60 sec: 14881.9, 300 sec: 15287.1). Total num frames: 385073152. Throughput: 0: 3912.5. Samples: 85433106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:18,969][134211] Avg episode reward: [(0, '8.572')] [2025-01-04 04:12:19,512][134294] Updated weights for policy 0, policy_version 94014 (0.0029) [2025-01-04 04:12:22,565][134294] Updated weights for policy 0, policy_version 94024 (0.0024) [2025-01-04 04:12:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.1, 300 sec: 15287.1). Total num frames: 385138688. Throughput: 0: 3637.6. Samples: 85452400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:23,968][134211] Avg episode reward: [(0, '8.003')] [2025-01-04 04:12:25,683][134294] Updated weights for policy 0, policy_version 94034 (0.0027) [2025-01-04 04:12:27,640][134294] Updated weights for policy 0, policy_version 94044 (0.0012) [2025-01-04 04:12:28,967][134211] Fps is (10 sec: 15976.3, 60 sec: 15496.6, 300 sec: 15245.5). Total num frames: 385232896. Throughput: 0: 3616.7. Samples: 85477268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:28,968][134211] Avg episode reward: [(0, '7.656')] [2025-01-04 04:12:29,555][134294] Updated weights for policy 0, policy_version 94054 (0.0012) [2025-01-04 04:12:32,332][134294] Updated weights for policy 0, policy_version 94064 (0.0024) [2025-01-04 04:12:33,968][134211] Fps is (10 sec: 16384.3, 60 sec: 15223.8, 300 sec: 15120.5). Total num frames: 385302528. Throughput: 0: 3716.2. Samples: 85490990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:33,968][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 04:12:35,495][134294] Updated weights for policy 0, policy_version 94074 (0.0025) [2025-01-04 04:12:38,601][134294] Updated weights for policy 0, policy_version 94084 (0.0025) [2025-01-04 04:12:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14472.5, 300 sec: 15064.9). Total num frames: 385368064. Throughput: 0: 3724.8. Samples: 85510518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:38,968][134211] Avg episode reward: [(0, '7.436')] [2025-01-04 04:12:41,685][134294] Updated weights for policy 0, policy_version 94094 (0.0025) [2025-01-04 04:12:43,763][134294] Updated weights for policy 0, policy_version 94104 (0.0014) [2025-01-04 04:12:43,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14677.4, 300 sec: 15134.4). Total num frames: 385454080. Throughput: 0: 3788.9. Samples: 85532566. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:43,968][134211] Avg episode reward: [(0, '7.917')] [2025-01-04 04:12:45,693][134294] Updated weights for policy 0, policy_version 94114 (0.0013) [2025-01-04 04:12:48,902][134294] Updated weights for policy 0, policy_version 94124 (0.0027) [2025-01-04 04:12:48,971][134211] Fps is (10 sec: 16379.1, 60 sec: 14949.6, 300 sec: 15175.9). Total num frames: 385531904. Throughput: 0: 3909.2. Samples: 85547888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:48,971][134211] Avg episode reward: [(0, '8.139')] [2025-01-04 04:12:52,353][134294] Updated weights for policy 0, policy_version 94134 (0.0029) [2025-01-04 04:12:53,967][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.2, 300 sec: 15162.1). Total num frames: 385593344. Throughput: 0: 3745.6. Samples: 85565802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:12:53,968][134211] Avg episode reward: [(0, '7.030')] [2025-01-04 04:12:54,751][134294] Updated weights for policy 0, policy_version 94144 (0.0015) [2025-01-04 04:12:56,806][134294] Updated weights for policy 0, policy_version 94154 (0.0014) [2025-01-04 04:12:58,720][134294] Updated weights for policy 0, policy_version 94164 (0.0012) [2025-01-04 04:12:58,968][134211] Fps is (10 sec: 16799.0, 60 sec: 15564.9, 300 sec: 15287.1). Total num frames: 385699840. Throughput: 0: 3792.1. Samples: 85594150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:12:58,968][134211] Avg episode reward: [(0, '8.407')] [2025-01-04 04:13:00,696][134294] Updated weights for policy 0, policy_version 94174 (0.0013) [2025-01-04 04:13:02,698][134294] Updated weights for policy 0, policy_version 94184 (0.0013) [2025-01-04 04:13:03,968][134211] Fps is (10 sec: 20889.3, 60 sec: 16247.5, 300 sec: 15412.1). Total num frames: 385802240. Throughput: 0: 3927.8. Samples: 85609854. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:03,968][134211] Avg episode reward: [(0, '7.617')] [2025-01-04 04:13:04,901][134294] Updated weights for policy 0, policy_version 94194 (0.0017) [2025-01-04 04:13:08,126][134294] Updated weights for policy 0, policy_version 94204 (0.0029) [2025-01-04 04:13:08,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15633.0, 300 sec: 15412.1). Total num frames: 385867776. Throughput: 0: 4061.6. Samples: 85635170. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:08,968][134211] Avg episode reward: [(0, '8.751')] [2025-01-04 04:13:11,363][134294] Updated weights for policy 0, policy_version 94214 (0.0027) [2025-01-04 04:13:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15360.0, 300 sec: 15342.6). Total num frames: 385929216. Throughput: 0: 3917.7. Samples: 85653568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:13,968][134211] Avg episode reward: [(0, '7.423')] [2025-01-04 04:13:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000094221_385929216.pth... [2025-01-04 04:13:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000093332_382287872.pth [2025-01-04 04:13:14,825][134294] Updated weights for policy 0, policy_version 94224 (0.0028) [2025-01-04 04:13:17,882][134294] Updated weights for policy 0, policy_version 94234 (0.0029) [2025-01-04 04:13:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15360.2, 300 sec: 15328.8). Total num frames: 385994752. Throughput: 0: 3819.0. Samples: 85662846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:18,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 04:13:21,100][134294] Updated weights for policy 0, policy_version 94244 (0.0025) [2025-01-04 04:13:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15360.0, 300 sec: 15231.6). Total num frames: 386060288. 
Throughput: 0: 3826.0. Samples: 85682690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:23,968][134211] Avg episode reward: [(0, '7.605')] [2025-01-04 04:13:24,178][134294] Updated weights for policy 0, policy_version 94254 (0.0025) [2025-01-04 04:13:27,612][134294] Updated weights for policy 0, policy_version 94264 (0.0024) [2025-01-04 04:13:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.5, 300 sec: 15065.1). Total num frames: 386117632. Throughput: 0: 3744.0. Samples: 85701048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:28,968][134211] Avg episode reward: [(0, '7.694')] [2025-01-04 04:13:31,051][134294] Updated weights for policy 0, policy_version 94274 (0.0025) [2025-01-04 04:13:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14677.3, 300 sec: 15023.3). Total num frames: 386183168. Throughput: 0: 3603.1. Samples: 85710018. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:33,968][134211] Avg episode reward: [(0, '7.900')] [2025-01-04 04:13:34,024][134294] Updated weights for policy 0, policy_version 94284 (0.0020) [2025-01-04 04:13:35,977][134294] Updated weights for policy 0, policy_version 94294 (0.0014) [2025-01-04 04:13:37,812][134294] Updated weights for policy 0, policy_version 94304 (0.0013) [2025-01-04 04:13:38,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15360.1, 300 sec: 15203.8). Total num frames: 386289664. Throughput: 0: 3806.5. Samples: 85737094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:38,968][134211] Avg episode reward: [(0, '7.810')] [2025-01-04 04:13:39,705][134294] Updated weights for policy 0, policy_version 94314 (0.0013) [2025-01-04 04:13:41,608][134294] Updated weights for policy 0, policy_version 94324 (0.0013) [2025-01-04 04:13:43,504][134294] Updated weights for policy 0, policy_version 94334 (0.0011) [2025-01-04 04:13:43,968][134211] Fps is (10 sec: 21708.7, 60 sec: 15769.6, 300 sec: 15370.4). Total num frames: 386400256. Throughput: 0: 3896.0. 
Samples: 85769468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:43,968][134211] Avg episode reward: [(0, '7.955')] [2025-01-04 04:13:45,479][134294] Updated weights for policy 0, policy_version 94344 (0.0014) [2025-01-04 04:13:48,190][134294] Updated weights for policy 0, policy_version 94354 (0.0022) [2025-01-04 04:13:48,970][134211] Fps is (10 sec: 19246.7, 60 sec: 15838.1, 300 sec: 15398.1). Total num frames: 386482176. Throughput: 0: 3886.3. Samples: 85784748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:48,971][134211] Avg episode reward: [(0, '7.227')] [2025-01-04 04:13:51,588][134294] Updated weights for policy 0, policy_version 94364 (0.0030) [2025-01-04 04:13:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15837.8, 300 sec: 15356.5). Total num frames: 386543616. Throughput: 0: 3747.4. Samples: 85803802. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:53,968][134211] Avg episode reward: [(0, '7.267')] [2025-01-04 04:13:54,860][134294] Updated weights for policy 0, policy_version 94374 (0.0027) [2025-01-04 04:13:57,866][134294] Updated weights for policy 0, policy_version 94384 (0.0024) [2025-01-04 04:13:58,968][134211] Fps is (10 sec: 12700.3, 60 sec: 15155.1, 300 sec: 15356.5). Total num frames: 386609152. Throughput: 0: 3770.8. Samples: 85823252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:13:58,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 04:14:01,008][134294] Updated weights for policy 0, policy_version 94394 (0.0025) [2025-01-04 04:14:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.5, 300 sec: 15231.6). Total num frames: 386670592. Throughput: 0: 3786.0. Samples: 85833218. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:03,968][134211] Avg episode reward: [(0, '7.740')] [2025-01-04 04:14:04,388][134294] Updated weights for policy 0, policy_version 94404 (0.0029) [2025-01-04 04:14:07,534][134294] Updated weights for policy 0, policy_version 94414 (0.0025) [2025-01-04 04:14:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14472.5, 300 sec: 15078.8). Total num frames: 386736128. Throughput: 0: 3763.3. Samples: 85852040. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:08,969][134211] Avg episode reward: [(0, '7.288')] [2025-01-04 04:14:10,630][134294] Updated weights for policy 0, policy_version 94424 (0.0026) [2025-01-04 04:14:13,627][134294] Updated weights for policy 0, policy_version 94434 (0.0024) [2025-01-04 04:14:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 15009.4). Total num frames: 386801664. Throughput: 0: 3801.9. Samples: 85872134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:13,968][134211] Avg episode reward: [(0, '7.847')] [2025-01-04 04:14:16,652][134294] Updated weights for policy 0, policy_version 94444 (0.0025) [2025-01-04 04:14:18,969][134211] Fps is (10 sec: 13514.6, 60 sec: 14608.6, 300 sec: 15037.1). Total num frames: 386871296. Throughput: 0: 3828.8. Samples: 85882320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:18,970][134211] Avg episode reward: [(0, '7.237')] [2025-01-04 04:14:19,701][134294] Updated weights for policy 0, policy_version 94454 (0.0027) [2025-01-04 04:14:22,830][134294] Updated weights for policy 0, policy_version 94464 (0.0028) [2025-01-04 04:14:23,967][134211] Fps is (10 sec: 14336.2, 60 sec: 14745.7, 300 sec: 15078.8). Total num frames: 386945024. Throughput: 0: 3675.5. Samples: 85902490. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:23,968][134211] Avg episode reward: [(0, '7.504')] [2025-01-04 04:14:24,741][134294] Updated weights for policy 0, policy_version 94474 (0.0015) [2025-01-04 04:14:26,647][134294] Updated weights for policy 0, policy_version 94484 (0.0013) [2025-01-04 04:14:28,537][134294] Updated weights for policy 0, policy_version 94494 (0.0014) [2025-01-04 04:14:28,968][134211] Fps is (10 sec: 18435.6, 60 sec: 15633.1, 300 sec: 15259.4). Total num frames: 387055616. Throughput: 0: 3638.0. Samples: 85933176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:28,968][134211] Avg episode reward: [(0, '7.754')] [2025-01-04 04:14:30,474][134294] Updated weights for policy 0, policy_version 94504 (0.0015) [2025-01-04 04:14:32,332][134294] Updated weights for policy 0, policy_version 94514 (0.0013) [2025-01-04 04:14:33,968][134211] Fps is (10 sec: 20479.7, 60 sec: 16110.9, 300 sec: 15356.6). Total num frames: 387149824. Throughput: 0: 3659.1. Samples: 85949400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:33,968][134211] Avg episode reward: [(0, '7.476')] [2025-01-04 04:14:35,291][134294] Updated weights for policy 0, policy_version 94524 (0.0027) [2025-01-04 04:14:38,519][134294] Updated weights for policy 0, policy_version 94534 (0.0028) [2025-01-04 04:14:38,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15359.9, 300 sec: 15328.8). Total num frames: 387211264. Throughput: 0: 3735.2. Samples: 85971886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:38,968][134211] Avg episode reward: [(0, '8.087')] [2025-01-04 04:14:41,700][134294] Updated weights for policy 0, policy_version 94544 (0.0028) [2025-01-04 04:14:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 15301.1). Total num frames: 387280896. Throughput: 0: 3728.1. Samples: 85991018. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:43,968][134211] Avg episode reward: [(0, '8.191')] [2025-01-04 04:14:44,909][134294] Updated weights for policy 0, policy_version 94554 (0.0025) [2025-01-04 04:14:48,030][134294] Updated weights for policy 0, policy_version 94564 (0.0029) [2025-01-04 04:14:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14336.5, 300 sec: 15189.9). Total num frames: 387342336. Throughput: 0: 3718.0. Samples: 86000526. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:48,968][134211] Avg episode reward: [(0, '7.596')] [2025-01-04 04:14:51,090][134294] Updated weights for policy 0, policy_version 94574 (0.0026) [2025-01-04 04:14:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.6, 300 sec: 15064.9). Total num frames: 387411968. Throughput: 0: 3745.7. Samples: 86020594. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:53,968][134211] Avg episode reward: [(0, '7.614')] [2025-01-04 04:14:54,210][134294] Updated weights for policy 0, policy_version 94584 (0.0025) [2025-01-04 04:14:57,268][134294] Updated weights for policy 0, policy_version 94594 (0.0025) [2025-01-04 04:14:58,967][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 15079.0). Total num frames: 387477504. Throughput: 0: 3732.7. Samples: 86040104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:14:58,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 04:14:59,802][134294] Updated weights for policy 0, policy_version 94604 (0.0017) [2025-01-04 04:15:01,848][134294] Updated weights for policy 0, policy_version 94614 (0.0014) [2025-01-04 04:15:03,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15018.6, 300 sec: 15189.9). Total num frames: 387571712. Throughput: 0: 3826.3. Samples: 86054496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:03,968][134211] Avg episode reward: [(0, '7.538')] [2025-01-04 04:15:04,294][134294] Updated weights for policy 0, policy_version 94624 (0.0021) [2025-01-04 04:15:07,631][134294] Updated weights for policy 0, policy_version 94634 (0.0029) [2025-01-04 04:15:08,967][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.5, 300 sec: 15176.1). Total num frames: 387633152. Throughput: 0: 3874.3. Samples: 86076834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:08,968][134211] Avg episode reward: [(0, '7.688')] [2025-01-04 04:15:10,170][134294] Updated weights for policy 0, policy_version 94644 (0.0019) [2025-01-04 04:15:12,062][134294] Updated weights for policy 0, policy_version 94654 (0.0014) [2025-01-04 04:15:13,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15564.8, 300 sec: 15287.1). Total num frames: 387735552. Throughput: 0: 3804.4. Samples: 86104374. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:13,968][134211] Avg episode reward: [(0, '7.847')] [2025-01-04 04:15:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000094662_387735552.pth... [2025-01-04 04:15:14,045][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000093778_384114688.pth [2025-01-04 04:15:14,436][134294] Updated weights for policy 0, policy_version 94664 (0.0017) [2025-01-04 04:15:17,484][134294] Updated weights for policy 0, policy_version 94674 (0.0027) [2025-01-04 04:15:18,968][134211] Fps is (10 sec: 16792.7, 60 sec: 15496.9, 300 sec: 15189.9). Total num frames: 387801088. Throughput: 0: 3681.5. Samples: 86115070. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:18,969][134211] Avg episode reward: [(0, '7.645')] [2025-01-04 04:15:20,493][134294] Updated weights for policy 0, policy_version 94684 (0.0024) [2025-01-04 04:15:23,434][134294] Updated weights for policy 0, policy_version 94694 (0.0025) [2025-01-04 04:15:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15428.2, 300 sec: 15120.5). Total num frames: 387870720. Throughput: 0: 3635.9. Samples: 86135502. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:23,968][134211] Avg episode reward: [(0, '7.621')] [2025-01-04 04:15:26,730][134294] Updated weights for policy 0, policy_version 94704 (0.0026) [2025-01-04 04:15:28,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14540.6, 300 sec: 15106.6). Total num frames: 387928064. Throughput: 0: 3621.7. Samples: 86153996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:28,969][134211] Avg episode reward: [(0, '7.780')] [2025-01-04 04:15:29,982][134294] Updated weights for policy 0, policy_version 94714 (0.0022) [2025-01-04 04:15:32,019][134294] Updated weights for policy 0, policy_version 94724 (0.0013) [2025-01-04 04:15:33,967][134294] Updated weights for policy 0, policy_version 94734 (0.0014) [2025-01-04 04:15:33,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14677.4, 300 sec: 15176.0). Total num frames: 388030464. Throughput: 0: 3687.6. Samples: 86166466. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:33,968][134211] Avg episode reward: [(0, '7.885')] [2025-01-04 04:15:35,852][134294] Updated weights for policy 0, policy_version 94744 (0.0014) [2025-01-04 04:15:37,694][134294] Updated weights for policy 0, policy_version 94754 (0.0012) [2025-01-04 04:15:38,967][134211] Fps is (10 sec: 20891.4, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 388136960. Throughput: 0: 3948.9. Samples: 86198296. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:38,968][134211] Avg episode reward: [(0, '7.244')] [2025-01-04 04:15:39,629][134294] Updated weights for policy 0, policy_version 94764 (0.0013) [2025-01-04 04:15:42,088][134294] Updated weights for policy 0, policy_version 94774 (0.0022) [2025-01-04 04:15:43,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15564.8, 300 sec: 15259.3). Total num frames: 388214784. Throughput: 0: 4111.6. Samples: 86225126. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:15:43,968][134211] Avg episode reward: [(0, '7.444')] [2025-01-04 04:15:45,332][134294] Updated weights for policy 0, policy_version 94784 (0.0029) [2025-01-04 04:15:48,399][134294] Updated weights for policy 0, policy_version 94794 (0.0024) [2025-01-04 04:15:48,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15633.0, 300 sec: 15217.7). Total num frames: 388280320. Throughput: 0: 4012.1. Samples: 86235040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:15:48,968][134211] Avg episode reward: [(0, '8.351')] [2025-01-04 04:15:51,478][134294] Updated weights for policy 0, policy_version 94804 (0.0027) [2025-01-04 04:15:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15564.8, 300 sec: 15217.8). Total num frames: 388345856. Throughput: 0: 3953.6. Samples: 86254746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:15:53,968][134211] Avg episode reward: [(0, '8.445')] [2025-01-04 04:15:54,658][134294] Updated weights for policy 0, policy_version 94814 (0.0027) [2025-01-04 04:15:57,653][134294] Updated weights for policy 0, policy_version 94824 (0.0027) [2025-01-04 04:15:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15633.0, 300 sec: 15231.6). Total num frames: 388415488. Throughput: 0: 3781.3. Samples: 86274532. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:15:58,968][134211] Avg episode reward: [(0, '7.582')] [2025-01-04 04:16:00,660][134294] Updated weights for policy 0, policy_version 94834 (0.0024) [2025-01-04 04:16:03,676][134294] Updated weights for policy 0, policy_version 94844 (0.0025) [2025-01-04 04:16:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 15092.7). Total num frames: 388481024. Throughput: 0: 3777.5. Samples: 86285058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:03,968][134211] Avg episode reward: [(0, '7.702')] [2025-01-04 04:16:06,796][134294] Updated weights for policy 0, policy_version 94854 (0.0020) [2025-01-04 04:16:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15223.5, 300 sec: 15023.3). Total num frames: 388546560. Throughput: 0: 3765.4. Samples: 86304944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:08,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 04:16:10,184][134294] Updated weights for policy 0, policy_version 94864 (0.0026) [2025-01-04 04:16:12,985][134294] Updated weights for policy 0, policy_version 94874 (0.0016) [2025-01-04 04:16:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14813.9, 300 sec: 15065.0). Total num frames: 388624384. Throughput: 0: 3803.7. Samples: 86325162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:13,968][134211] Avg episode reward: [(0, '7.841')] [2025-01-04 04:16:14,902][134294] Updated weights for policy 0, policy_version 94884 (0.0012) [2025-01-04 04:16:16,822][134294] Updated weights for policy 0, policy_version 94894 (0.0015) [2025-01-04 04:16:18,726][134294] Updated weights for policy 0, policy_version 94904 (0.0014) [2025-01-04 04:16:18,967][134211] Fps is (10 sec: 18432.1, 60 sec: 15496.7, 300 sec: 15203.8). Total num frames: 388730880. Throughput: 0: 3881.5. Samples: 86341134. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:18,968][134211] Avg episode reward: [(0, '7.886')] [2025-01-04 04:16:21,491][134294] Updated weights for policy 0, policy_version 94914 (0.0026) [2025-01-04 04:16:23,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15360.0, 300 sec: 15217.7). Total num frames: 388792320. Throughput: 0: 3752.0. Samples: 86367138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:23,968][134211] Avg episode reward: [(0, '7.847')] [2025-01-04 04:16:25,002][134294] Updated weights for policy 0, policy_version 94924 (0.0025) [2025-01-04 04:16:28,232][134294] Updated weights for policy 0, policy_version 94934 (0.0028) [2025-01-04 04:16:28,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15496.7, 300 sec: 15148.3). Total num frames: 388857856. Throughput: 0: 3567.7. Samples: 86385674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:28,968][134211] Avg episode reward: [(0, '7.964')] [2025-01-04 04:16:31,181][134294] Updated weights for policy 0, policy_version 94944 (0.0024) [2025-01-04 04:16:33,969][134211] Fps is (10 sec: 13106.3, 60 sec: 14881.9, 300 sec: 14995.5). Total num frames: 388923392. Throughput: 0: 3569.7. Samples: 86395678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:33,969][134211] Avg episode reward: [(0, '7.972')] [2025-01-04 04:16:34,442][134294] Updated weights for policy 0, policy_version 94954 (0.0027) [2025-01-04 04:16:36,392][134294] Updated weights for policy 0, policy_version 94964 (0.0014) [2025-01-04 04:16:38,252][134294] Updated weights for policy 0, policy_version 94974 (0.0014) [2025-01-04 04:16:38,967][134211] Fps is (10 sec: 16794.0, 60 sec: 14813.9, 300 sec: 15092.7). Total num frames: 389025792. Throughput: 0: 3690.0. Samples: 86420794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:38,968][134211] Avg episode reward: [(0, '7.738')] [2025-01-04 04:16:40,169][134294] Updated weights for policy 0, policy_version 94984 (0.0014) [2025-01-04 04:16:42,055][134294] Updated weights for policy 0, policy_version 94994 (0.0014) [2025-01-04 04:16:43,957][134294] Updated weights for policy 0, policy_version 95004 (0.0012) [2025-01-04 04:16:43,968][134211] Fps is (10 sec: 21301.1, 60 sec: 15360.0, 300 sec: 15259.3). Total num frames: 389136384. Throughput: 0: 3970.0. Samples: 86453182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:43,968][134211] Avg episode reward: [(0, '8.290')] [2025-01-04 04:16:46,615][134294] Updated weights for policy 0, policy_version 95014 (0.0027) [2025-01-04 04:16:48,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15428.3, 300 sec: 15273.2). Total num frames: 389206016. Throughput: 0: 4016.2. Samples: 86465786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:48,968][134211] Avg episode reward: [(0, '7.628')] [2025-01-04 04:16:49,908][134294] Updated weights for policy 0, policy_version 95024 (0.0029) [2025-01-04 04:16:52,986][134294] Updated weights for policy 0, policy_version 95034 (0.0025) [2025-01-04 04:16:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15428.3, 300 sec: 15273.2). Total num frames: 389271552. Throughput: 0: 4006.6. Samples: 86485242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:53,969][134211] Avg episode reward: [(0, '8.152')] [2025-01-04 04:16:56,087][134294] Updated weights for policy 0, policy_version 95044 (0.0024) [2025-01-04 04:16:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.8, 300 sec: 15273.2). Total num frames: 389332992. Throughput: 0: 3988.3. Samples: 86504636. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:16:58,968][134211] Avg episode reward: [(0, '7.666')] [2025-01-04 04:16:59,344][134294] Updated weights for policy 0, policy_version 95054 (0.0025) [2025-01-04 04:17:02,474][134294] Updated weights for policy 0, policy_version 95064 (0.0026) [2025-01-04 04:17:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.7, 300 sec: 15148.2). Total num frames: 389398528. Throughput: 0: 3851.9. Samples: 86514472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:03,968][134211] Avg episode reward: [(0, '7.188')] [2025-01-04 04:17:05,510][134294] Updated weights for policy 0, policy_version 95074 (0.0025) [2025-01-04 04:17:08,507][134294] Updated weights for policy 0, policy_version 95084 (0.0023) [2025-01-04 04:17:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 15120.5). Total num frames: 389468160. Throughput: 0: 3725.5. Samples: 86534786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:08,968][134211] Avg episode reward: [(0, '7.710')] [2025-01-04 04:17:11,497][134294] Updated weights for policy 0, policy_version 95094 (0.0025) [2025-01-04 04:17:13,968][134211] Fps is (10 sec: 13925.8, 60 sec: 15223.3, 300 sec: 15134.4). Total num frames: 389537792. Throughput: 0: 3762.8. Samples: 86555000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:13,969][134211] Avg episode reward: [(0, '8.125')] [2025-01-04 04:17:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000095102_389537792.pth... 
[2025-01-04 04:17:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000094221_385929216.pth [2025-01-04 04:17:14,624][134294] Updated weights for policy 0, policy_version 95104 (0.0024) [2025-01-04 04:17:16,787][134294] Updated weights for policy 0, policy_version 95114 (0.0015) [2025-01-04 04:17:18,675][134294] Updated weights for policy 0, policy_version 95124 (0.0014) [2025-01-04 04:17:18,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15018.6, 300 sec: 15231.6). Total num frames: 389632000. Throughput: 0: 3800.7. Samples: 86566708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:18,968][134211] Avg episode reward: [(0, '7.587')] [2025-01-04 04:17:20,548][134294] Updated weights for policy 0, policy_version 95134 (0.0014) [2025-01-04 04:17:22,463][134294] Updated weights for policy 0, policy_version 95144 (0.0013) [2025-01-04 04:17:23,968][134211] Fps is (10 sec: 19661.7, 60 sec: 15701.4, 300 sec: 15259.3). Total num frames: 389734400. Throughput: 0: 3965.7. Samples: 86599250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:23,968][134211] Avg episode reward: [(0, '7.878')] [2025-01-04 04:17:24,960][134294] Updated weights for policy 0, policy_version 95154 (0.0020) [2025-01-04 04:17:28,528][134294] Updated weights for policy 0, policy_version 95164 (0.0026) [2025-01-04 04:17:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15564.8, 300 sec: 15217.7). Total num frames: 389791744. Throughput: 0: 3708.5. Samples: 86620066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:28,968][134211] Avg episode reward: [(0, '8.398')] [2025-01-04 04:17:32,179][134294] Updated weights for policy 0, policy_version 95174 (0.0028) [2025-01-04 04:17:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15496.8, 300 sec: 15203.8). Total num frames: 389853184. Throughput: 0: 3618.9. Samples: 86628638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:33,969][134211] Avg episode reward: [(0, '7.626')] [2025-01-04 04:17:35,557][134294] Updated weights for policy 0, policy_version 95184 (0.0026) [2025-01-04 04:17:38,621][134294] Updated weights for policy 0, policy_version 95194 (0.0023) [2025-01-04 04:17:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14882.1, 300 sec: 15134.4). Total num frames: 389918720. Throughput: 0: 3596.1. Samples: 86647066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:17:38,968][134211] Avg episode reward: [(0, '8.189')] [2025-01-04 04:17:40,676][134294] Updated weights for policy 0, policy_version 95204 (0.0014) [2025-01-04 04:17:43,374][134294] Updated weights for policy 0, policy_version 95214 (0.0021) [2025-01-04 04:17:43,968][134211] Fps is (10 sec: 15154.4, 60 sec: 14472.4, 300 sec: 15162.3). Total num frames: 390004736. Throughput: 0: 3713.5. Samples: 86671744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:17:43,969][134211] Avg episode reward: [(0, '7.726')] [2025-01-04 04:17:46,330][134294] Updated weights for policy 0, policy_version 95224 (0.0023) [2025-01-04 04:17:48,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14404.3, 300 sec: 15176.0). Total num frames: 390070272. Throughput: 0: 3720.2. Samples: 86681880. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:17:48,968][134211] Avg episode reward: [(0, '8.076')] [2025-01-04 04:17:49,490][134294] Updated weights for policy 0, policy_version 95234 (0.0024) [2025-01-04 04:17:51,986][134294] Updated weights for policy 0, policy_version 95244 (0.0019) [2025-01-04 04:17:53,937][134294] Updated weights for policy 0, policy_version 95254 (0.0013) [2025-01-04 04:17:53,968][134211] Fps is (10 sec: 15565.8, 60 sec: 14813.9, 300 sec: 15120.5). Total num frames: 390160384. Throughput: 0: 3761.7. Samples: 86704060. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:17:53,968][134211] Avg episode reward: [(0, '7.128')] [2025-01-04 04:17:55,804][134294] Updated weights for policy 0, policy_version 95264 (0.0013) [2025-01-04 04:17:57,826][134294] Updated weights for policy 0, policy_version 95274 (0.0014) [2025-01-04 04:17:58,967][134211] Fps is (10 sec: 19251.7, 60 sec: 15496.6, 300 sec: 15120.5). Total num frames: 390262784. Throughput: 0: 4009.4. Samples: 86735420. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:17:58,968][134211] Avg episode reward: [(0, '7.883')] [2025-01-04 04:17:59,925][134294] Updated weights for policy 0, policy_version 95284 (0.0012) [2025-01-04 04:18:02,744][134294] Updated weights for policy 0, policy_version 95294 (0.0024) [2025-01-04 04:18:03,970][134211] Fps is (10 sec: 17608.6, 60 sec: 15632.5, 300 sec: 15148.1). Total num frames: 390336512. Throughput: 0: 4056.5. Samples: 86749260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:03,971][134211] Avg episode reward: [(0, '7.459')] [2025-01-04 04:18:05,905][134294] Updated weights for policy 0, policy_version 95304 (0.0028) [2025-01-04 04:18:08,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15564.8, 300 sec: 15162.1). Total num frames: 390402048. Throughput: 0: 3761.3. Samples: 86768508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:08,969][134211] Avg episode reward: [(0, '8.098')] [2025-01-04 04:18:09,086][134294] Updated weights for policy 0, policy_version 95314 (0.0027) [2025-01-04 04:18:12,368][134294] Updated weights for policy 0, policy_version 95324 (0.0026) [2025-01-04 04:18:13,968][134211] Fps is (10 sec: 12699.5, 60 sec: 15428.2, 300 sec: 15148.2). Total num frames: 390463488. Throughput: 0: 3716.5. Samples: 86787310. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:13,969][134211] Avg episode reward: [(0, '7.872')] [2025-01-04 04:18:15,507][134294] Updated weights for policy 0, policy_version 95334 (0.0023) [2025-01-04 04:18:18,508][134294] Updated weights for policy 0, policy_version 95344 (0.0027) [2025-01-04 04:18:18,969][134211] Fps is (10 sec: 13105.8, 60 sec: 15018.4, 300 sec: 15162.1). Total num frames: 390533120. Throughput: 0: 3750.6. Samples: 86797418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:18,969][134211] Avg episode reward: [(0, '8.154')] [2025-01-04 04:18:21,542][134294] Updated weights for policy 0, policy_version 95354 (0.0025) [2025-01-04 04:18:23,968][134211] Fps is (10 sec: 13927.4, 60 sec: 14472.5, 300 sec: 15203.8). Total num frames: 390602752. Throughput: 0: 3794.4. Samples: 86817816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:23,968][134211] Avg episode reward: [(0, '7.924')] [2025-01-04 04:18:24,654][134294] Updated weights for policy 0, policy_version 95364 (0.0022) [2025-01-04 04:18:26,555][134294] Updated weights for policy 0, policy_version 95374 (0.0014) [2025-01-04 04:18:28,444][134294] Updated weights for policy 0, policy_version 95384 (0.0014) [2025-01-04 04:18:28,968][134211] Fps is (10 sec: 16795.8, 60 sec: 15155.2, 300 sec: 15314.9). Total num frames: 390701056. Throughput: 0: 3852.5. Samples: 86845102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:28,968][134211] Avg episode reward: [(0, '9.365')] [2025-01-04 04:18:29,009][134264] Saving new best policy, reward=9.365! [2025-01-04 04:18:30,336][134294] Updated weights for policy 0, policy_version 95394 (0.0013) [2025-01-04 04:18:32,738][134294] Updated weights for policy 0, policy_version 95404 (0.0020) [2025-01-04 04:18:33,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15564.8, 300 sec: 15245.4). Total num frames: 390787072. Throughput: 0: 3984.7. Samples: 86861190. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:33,968][134211] Avg episode reward: [(0, '7.797')] [2025-01-04 04:18:35,943][134294] Updated weights for policy 0, policy_version 95414 (0.0031) [2025-01-04 04:18:38,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15564.8, 300 sec: 15092.7). Total num frames: 390852608. Throughput: 0: 3939.5. Samples: 86881340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:38,968][134211] Avg episode reward: [(0, '8.411')] [2025-01-04 04:18:39,072][134294] Updated weights for policy 0, policy_version 95424 (0.0029) [2025-01-04 04:18:42,169][134294] Updated weights for policy 0, policy_version 95434 (0.0027) [2025-01-04 04:18:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15223.6, 300 sec: 15037.3). Total num frames: 390918144. Throughput: 0: 3672.0. Samples: 86900660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:43,968][134211] Avg episode reward: [(0, '8.376')] [2025-01-04 04:18:45,387][134294] Updated weights for policy 0, policy_version 95444 (0.0025) [2025-01-04 04:18:48,467][134294] Updated weights for policy 0, policy_version 95454 (0.0027) [2025-01-04 04:18:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15291.7, 300 sec: 15064.9). Total num frames: 390987776. Throughput: 0: 3580.6. Samples: 86910378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:48,968][134211] Avg episode reward: [(0, '8.134')] [2025-01-04 04:18:50,675][134294] Updated weights for policy 0, policy_version 95464 (0.0015) [2025-01-04 04:18:52,491][134294] Updated weights for policy 0, policy_version 95474 (0.0013) [2025-01-04 04:18:53,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15496.5, 300 sec: 15189.9). Total num frames: 391090176. Throughput: 0: 3740.0. Samples: 86936806. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:53,968][134211] Avg episode reward: [(0, '7.525')] [2025-01-04 04:18:54,793][134294] Updated weights for policy 0, policy_version 95484 (0.0020) [2025-01-04 04:18:57,875][134294] Updated weights for policy 0, policy_version 95494 (0.0025) [2025-01-04 04:18:58,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14882.1, 300 sec: 15203.8). Total num frames: 391155712. Throughput: 0: 3833.1. Samples: 86959798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:18:58,968][134211] Avg episode reward: [(0, '8.084')] [2025-01-04 04:19:01,058][134294] Updated weights for policy 0, policy_version 95504 (0.0030) [2025-01-04 04:19:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14746.2, 300 sec: 15203.8). Total num frames: 391221248. Throughput: 0: 3829.5. Samples: 86969742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:03,968][134211] Avg episode reward: [(0, '7.846')] [2025-01-04 04:19:04,268][134294] Updated weights for policy 0, policy_version 95514 (0.0033) [2025-01-04 04:19:07,282][134294] Updated weights for policy 0, policy_version 95524 (0.0024) [2025-01-04 04:19:08,968][134211] Fps is (10 sec: 14334.9, 60 sec: 14950.3, 300 sec: 15245.4). Total num frames: 391299072. Throughput: 0: 3816.4. Samples: 86989558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:08,969][134211] Avg episode reward: [(0, '7.995')] [2025-01-04 04:19:09,247][134294] Updated weights for policy 0, policy_version 95534 (0.0013) [2025-01-04 04:19:11,147][134294] Updated weights for policy 0, policy_version 95544 (0.0013) [2025-01-04 04:19:13,037][134294] Updated weights for policy 0, policy_version 95554 (0.0015) [2025-01-04 04:19:13,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15701.5, 300 sec: 15370.5). Total num frames: 391405568. Throughput: 0: 3917.3. Samples: 87021382. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:13,968][134211] Avg episode reward: [(0, '7.984')] [2025-01-04 04:19:14,000][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000095559_391409664.pth... [2025-01-04 04:19:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000094662_387735552.pth [2025-01-04 04:19:14,936][134294] Updated weights for policy 0, policy_version 95564 (0.0014) [2025-01-04 04:19:17,751][134294] Updated weights for policy 0, policy_version 95574 (0.0024) [2025-01-04 04:19:18,969][134211] Fps is (10 sec: 18840.7, 60 sec: 15906.1, 300 sec: 15398.1). Total num frames: 391487488. Throughput: 0: 3881.6. Samples: 87035868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:18,970][134211] Avg episode reward: [(0, '7.992')] [2025-01-04 04:19:20,806][134294] Updated weights for policy 0, policy_version 95584 (0.0029) [2025-01-04 04:19:23,811][134294] Updated weights for policy 0, policy_version 95594 (0.0027) [2025-01-04 04:19:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15837.9, 300 sec: 15245.4). Total num frames: 391553024. Throughput: 0: 3878.9. Samples: 87055892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:23,968][134211] Avg episode reward: [(0, '8.331')] [2025-01-04 04:19:27,257][134294] Updated weights for policy 0, policy_version 95604 (0.0026) [2025-01-04 04:19:28,968][134211] Fps is (10 sec: 12289.4, 60 sec: 15155.2, 300 sec: 15120.5). Total num frames: 391610368. Throughput: 0: 3850.8. Samples: 87073944. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:28,968][134211] Avg episode reward: [(0, '7.205')] [2025-01-04 04:19:30,972][134294] Updated weights for policy 0, policy_version 95614 (0.0027) [2025-01-04 04:19:33,811][134294] Updated weights for policy 0, policy_version 95624 (0.0021) [2025-01-04 04:19:33,967][134211] Fps is (10 sec: 12288.3, 60 sec: 14813.9, 300 sec: 15134.4). Total num frames: 391675904. Throughput: 0: 3826.7. Samples: 87082578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:19:33,968][134211] Avg episode reward: [(0, '7.932')] [2025-01-04 04:19:35,779][134294] Updated weights for policy 0, policy_version 95634 (0.0013) [2025-01-04 04:19:37,619][134294] Updated weights for policy 0, policy_version 95644 (0.0015) [2025-01-04 04:19:38,967][134211] Fps is (10 sec: 17613.2, 60 sec: 15564.9, 300 sec: 15273.2). Total num frames: 391786496. Throughput: 0: 3855.3. Samples: 87110294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:19:38,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 04:19:39,619][134294] Updated weights for policy 0, policy_version 95654 (0.0016) [2025-01-04 04:19:42,592][134294] Updated weights for policy 0, policy_version 95664 (0.0025) [2025-01-04 04:19:43,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15564.8, 300 sec: 15287.1). Total num frames: 391852032. Throughput: 0: 3888.8. Samples: 87134796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:19:43,969][134211] Avg episode reward: [(0, '8.195')] [2025-01-04 04:19:45,830][134294] Updated weights for policy 0, policy_version 95674 (0.0027) [2025-01-04 04:19:48,886][134294] Updated weights for policy 0, policy_version 95684 (0.0025) [2025-01-04 04:19:48,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15564.8, 300 sec: 15287.1). Total num frames: 391921664. Throughput: 0: 3886.0. Samples: 87144612. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:19:48,968][134211] Avg episode reward: [(0, '7.609')] [2025-01-04 04:19:51,818][134294] Updated weights for policy 0, policy_version 95694 (0.0026) [2025-01-04 04:19:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 15287.1). Total num frames: 391987200. Throughput: 0: 3897.0. Samples: 87164920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:19:53,968][134211] Avg episode reward: [(0, '7.933')] [2025-01-04 04:19:55,003][134294] Updated weights for policy 0, policy_version 95704 (0.0025) [2025-01-04 04:19:57,967][134294] Updated weights for policy 0, policy_version 95714 (0.0026) [2025-01-04 04:19:58,968][134211] Fps is (10 sec: 13515.8, 60 sec: 15018.5, 300 sec: 15203.8). Total num frames: 392056832. Throughput: 0: 3634.7. Samples: 87184946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:19:58,969][134211] Avg episode reward: [(0, '7.954')] [2025-01-04 04:20:00,897][134294] Updated weights for policy 0, policy_version 95724 (0.0025) [2025-01-04 04:20:02,772][134294] Updated weights for policy 0, policy_version 95734 (0.0014) [2025-01-04 04:20:03,968][134211] Fps is (10 sec: 16384.4, 60 sec: 15496.6, 300 sec: 15314.9). Total num frames: 392151040. Throughput: 0: 3562.7. Samples: 87196186. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:03,968][134211] Avg episode reward: [(0, '7.413')] [2025-01-04 04:20:04,663][134294] Updated weights for policy 0, policy_version 95744 (0.0014) [2025-01-04 04:20:06,598][134294] Updated weights for policy 0, policy_version 95754 (0.0016) [2025-01-04 04:20:08,968][134211] Fps is (10 sec: 18842.9, 60 sec: 15769.8, 300 sec: 15287.1). Total num frames: 392245248. Throughput: 0: 3834.4. Samples: 87228442. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:08,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 04:20:09,099][134294] Updated weights for policy 0, policy_version 95764 (0.0024) [2025-01-04 04:20:12,302][134294] Updated weights for policy 0, policy_version 95774 (0.0029) [2025-01-04 04:20:13,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15086.9, 300 sec: 15287.1). Total num frames: 392310784. Throughput: 0: 3877.9. Samples: 87248448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:13,968][134211] Avg episode reward: [(0, '7.589')] [2025-01-04 04:20:15,483][134294] Updated weights for policy 0, policy_version 95784 (0.0028) [2025-01-04 04:20:18,579][134294] Updated weights for policy 0, policy_version 95794 (0.0026) [2025-01-04 04:20:18,972][134211] Fps is (10 sec: 13102.1, 60 sec: 14813.2, 300 sec: 15273.0). Total num frames: 392376320. Throughput: 0: 3904.1. Samples: 87258278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:18,972][134211] Avg episode reward: [(0, '8.432')] [2025-01-04 04:20:21,633][134294] Updated weights for policy 0, policy_version 95804 (0.0027) [2025-01-04 04:20:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14813.9, 300 sec: 15301.0). Total num frames: 392441856. Throughput: 0: 3732.9. Samples: 87278274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:23,968][134211] Avg episode reward: [(0, '7.182')] [2025-01-04 04:20:24,819][134294] Updated weights for policy 0, policy_version 95814 (0.0028) [2025-01-04 04:20:27,340][134294] Updated weights for policy 0, policy_version 95824 (0.0017) [2025-01-04 04:20:28,968][134211] Fps is (10 sec: 15161.4, 60 sec: 15291.8, 300 sec: 15245.5). Total num frames: 392527872. Throughput: 0: 3702.0. Samples: 87301386. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:28,968][134211] Avg episode reward: [(0, '8.400')] [2025-01-04 04:20:29,248][134294] Updated weights for policy 0, policy_version 95834 (0.0013) [2025-01-04 04:20:31,174][134294] Updated weights for policy 0, policy_version 95844 (0.0015) [2025-01-04 04:20:33,029][134294] Updated weights for policy 0, policy_version 95854 (0.0014) [2025-01-04 04:20:33,968][134211] Fps is (10 sec: 19661.0, 60 sec: 16042.6, 300 sec: 15259.3). Total num frames: 392638464. Throughput: 0: 3845.8. Samples: 87317672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:20:33,968][134211] Avg episode reward: [(0, '8.058')] [2025-01-04 04:20:34,917][134294] Updated weights for policy 0, policy_version 95864 (0.0012) [2025-01-04 04:20:36,794][134294] Updated weights for policy 0, policy_version 95874 (0.0014) [2025-01-04 04:20:38,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15769.5, 300 sec: 15314.9). Total num frames: 392732672. Throughput: 0: 4117.0. Samples: 87350186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:20:38,968][134211] Avg episode reward: [(0, '7.415')] [2025-01-04 04:20:39,533][134294] Updated weights for policy 0, policy_version 95884 (0.0024) [2025-01-04 04:20:42,814][134294] Updated weights for policy 0, policy_version 95894 (0.0033) [2025-01-04 04:20:43,990][134211] Fps is (10 sec: 15530.4, 60 sec: 15695.6, 300 sec: 15299.8). Total num frames: 392794112. Throughput: 0: 4103.3. Samples: 87369682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:20:43,991][134211] Avg episode reward: [(0, '8.323')] [2025-01-04 04:20:46,008][134294] Updated weights for policy 0, policy_version 95904 (0.0027) [2025-01-04 04:20:48,955][134294] Updated weights for policy 0, policy_version 95914 (0.0026) [2025-01-04 04:20:48,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15701.3, 300 sec: 15314.9). Total num frames: 392863744. Throughput: 0: 4074.4. Samples: 87379536. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:20:48,969][134211] Avg episode reward: [(0, '7.853')] [2025-01-04 04:20:52,101][134294] Updated weights for policy 0, policy_version 95924 (0.0025) [2025-01-04 04:20:53,968][134211] Fps is (10 sec: 13546.6, 60 sec: 15701.3, 300 sec: 15301.0). Total num frames: 392929280. Throughput: 0: 3805.8. Samples: 87399702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:20:53,968][134211] Avg episode reward: [(0, '7.802')] [2025-01-04 04:20:55,095][134294] Updated weights for policy 0, policy_version 95934 (0.0025) [2025-01-04 04:20:58,275][134294] Updated weights for policy 0, policy_version 95944 (0.0026) [2025-01-04 04:20:58,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15565.0, 300 sec: 15287.1). Total num frames: 392990720. Throughput: 0: 3801.6. Samples: 87419518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:20:58,968][134211] Avg episode reward: [(0, '7.832')] [2025-01-04 04:21:01,389][134294] Updated weights for policy 0, policy_version 95954 (0.0025) [2025-01-04 04:21:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.1, 300 sec: 15301.0). Total num frames: 393060352. Throughput: 0: 3805.6. Samples: 87429516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:03,968][134211] Avg episode reward: [(0, '7.845')] [2025-01-04 04:21:04,488][134294] Updated weights for policy 0, policy_version 95964 (0.0024) [2025-01-04 04:21:07,517][134294] Updated weights for policy 0, policy_version 95974 (0.0025) [2025-01-04 04:21:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 15259.3). Total num frames: 393125888. Throughput: 0: 3801.5. Samples: 87449340. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:08,968][134211] Avg episode reward: [(0, '7.503')] [2025-01-04 04:21:10,321][134294] Updated weights for policy 0, policy_version 95984 (0.0022) [2025-01-04 04:21:12,229][134294] Updated weights for policy 0, policy_version 95994 (0.0013) [2025-01-04 04:21:13,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15291.8, 300 sec: 15245.4). Total num frames: 393228288. Throughput: 0: 3890.8. Samples: 87476474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:13,968][134211] Avg episode reward: [(0, '7.962')] [2025-01-04 04:21:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096003_393228288.pth... [2025-01-04 04:21:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000095102_389537792.pth [2025-01-04 04:21:14,115][134294] Updated weights for policy 0, policy_version 96004 (0.0014) [2025-01-04 04:21:16,785][134294] Updated weights for policy 0, policy_version 96014 (0.0025) [2025-01-04 04:21:18,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15429.3, 300 sec: 15287.1). Total num frames: 393302016. Throughput: 0: 3811.4. Samples: 87489186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:18,968][134211] Avg episode reward: [(0, '7.581')] [2025-01-04 04:21:19,929][134294] Updated weights for policy 0, policy_version 96024 (0.0027) [2025-01-04 04:21:23,048][134294] Updated weights for policy 0, policy_version 96034 (0.0026) [2025-01-04 04:21:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15360.0, 300 sec: 15273.2). Total num frames: 393363456. Throughput: 0: 3529.4. Samples: 87509008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:23,968][134211] Avg episode reward: [(0, '7.855')] [2025-01-04 04:21:26,289][134294] Updated weights for policy 0, policy_version 96044 (0.0025) [2025-01-04 04:21:28,968][134211] Fps is (10 sec: 12697.2, 60 sec: 15018.5, 300 sec: 15273.2). Total num frames: 393428992. Throughput: 0: 3510.6. Samples: 87527582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:21:28,968][134211] Avg episode reward: [(0, '8.883')] [2025-01-04 04:21:29,274][134294] Updated weights for policy 0, policy_version 96054 (0.0019) [2025-01-04 04:21:31,314][134294] Updated weights for policy 0, policy_version 96064 (0.0014) [2025-01-04 04:21:33,319][134294] Updated weights for policy 0, policy_version 96074 (0.0014) [2025-01-04 04:21:33,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14950.4, 300 sec: 15287.1). Total num frames: 393535488. Throughput: 0: 3612.5. Samples: 87542098. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:33,968][134211] Avg episode reward: [(0, '8.570')] [2025-01-04 04:21:35,134][134294] Updated weights for policy 0, policy_version 96084 (0.0014) [2025-01-04 04:21:36,959][134294] Updated weights for policy 0, policy_version 96094 (0.0014) [2025-01-04 04:21:38,922][134294] Updated weights for policy 0, policy_version 96104 (0.0012) [2025-01-04 04:21:38,968][134211] Fps is (10 sec: 21300.3, 60 sec: 15155.2, 300 sec: 15273.2). Total num frames: 393641984. Throughput: 0: 3882.9. Samples: 87574430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:38,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 04:21:40,805][134294] Updated weights for policy 0, policy_version 96114 (0.0013) [2025-01-04 04:21:43,614][134294] Updated weights for policy 0, policy_version 96124 (0.0024) [2025-01-04 04:21:43,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15570.5, 300 sec: 15328.8). Total num frames: 393728000. Throughput: 0: 4075.2. Samples: 87602902. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:43,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 04:21:46,778][134294] Updated weights for policy 0, policy_version 96134 (0.0027) [2025-01-04 04:21:48,968][134211] Fps is (10 sec: 14745.1, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 393789440. Throughput: 0: 4060.9. Samples: 87612256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:48,969][134211] Avg episode reward: [(0, '6.905')] [2025-01-04 04:21:50,039][134294] Updated weights for policy 0, policy_version 96144 (0.0026) [2025-01-04 04:21:53,311][134294] Updated weights for policy 0, policy_version 96154 (0.0030) [2025-01-04 04:21:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15428.3, 300 sec: 15328.8). Total num frames: 393854976. Throughput: 0: 4043.2. Samples: 87631284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:53,968][134211] Avg episode reward: [(0, '7.259')] [2025-01-04 04:21:56,436][134294] Updated weights for policy 0, policy_version 96164 (0.0021) [2025-01-04 04:21:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15496.5, 300 sec: 15328.8). Total num frames: 393920512. Throughput: 0: 3877.5. Samples: 87650960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:21:58,968][134211] Avg episode reward: [(0, '7.475')] [2025-01-04 04:21:59,524][134294] Updated weights for policy 0, policy_version 96174 (0.0025) [2025-01-04 04:22:02,588][134294] Updated weights for policy 0, policy_version 96184 (0.0025) [2025-01-04 04:22:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15428.3, 300 sec: 15314.9). Total num frames: 393986048. Throughput: 0: 3817.8. Samples: 87660988. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:03,968][134211] Avg episode reward: [(0, '6.738')] [2025-01-04 04:22:05,672][134294] Updated weights for policy 0, policy_version 96194 (0.0026) [2025-01-04 04:22:08,685][134294] Updated weights for policy 0, policy_version 96204 (0.0025) [2025-01-04 04:22:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15428.3, 300 sec: 15301.0). Total num frames: 394051584. Throughput: 0: 3824.9. Samples: 87681130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:08,968][134211] Avg episode reward: [(0, '7.938')] [2025-01-04 04:22:11,845][134294] Updated weights for policy 0, policy_version 96214 (0.0027) [2025-01-04 04:22:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14813.8, 300 sec: 15203.8). Total num frames: 394117120. Throughput: 0: 3847.0. Samples: 87700698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:13,968][134211] Avg episode reward: [(0, '7.639')] [2025-01-04 04:22:14,897][134294] Updated weights for policy 0, policy_version 96224 (0.0024) [2025-01-04 04:22:17,308][134294] Updated weights for policy 0, policy_version 96234 (0.0016) [2025-01-04 04:22:18,967][134211] Fps is (10 sec: 15974.9, 60 sec: 15155.3, 300 sec: 15176.0). Total num frames: 394211328. Throughput: 0: 3751.6. Samples: 87710920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:18,968][134211] Avg episode reward: [(0, '7.630')] [2025-01-04 04:22:19,171][134294] Updated weights for policy 0, policy_version 96244 (0.0012) [2025-01-04 04:22:21,027][134294] Updated weights for policy 0, policy_version 96254 (0.0015) [2025-01-04 04:22:22,886][134294] Updated weights for policy 0, policy_version 96264 (0.0014) [2025-01-04 04:22:23,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15837.9, 300 sec: 15328.8). Total num frames: 394313728. Throughput: 0: 3759.5. Samples: 87743606. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:23,968][134211] Avg episode reward: [(0, '7.414')] [2025-01-04 04:22:25,449][134294] Updated weights for policy 0, policy_version 96274 (0.0022) [2025-01-04 04:22:28,764][134294] Updated weights for policy 0, policy_version 96284 (0.0027) [2025-01-04 04:22:28,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15906.2, 300 sec: 15356.5). Total num frames: 394383360. Throughput: 0: 3635.5. Samples: 87766502. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:22:28,970][134211] Avg episode reward: [(0, '7.965')] [2025-01-04 04:22:31,791][134294] Updated weights for policy 0, policy_version 96294 (0.0026) [2025-01-04 04:22:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15155.1, 300 sec: 15342.6). Total num frames: 394444800. Throughput: 0: 3643.2. Samples: 87776198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:33,968][134211] Avg episode reward: [(0, '8.056')] [2025-01-04 04:22:34,995][134294] Updated weights for policy 0, policy_version 96304 (0.0024) [2025-01-04 04:22:38,006][134294] Updated weights for policy 0, policy_version 96314 (0.0025) [2025-01-04 04:22:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.5, 300 sec: 15273.2). Total num frames: 394510336. Throughput: 0: 3655.4. Samples: 87795778. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:38,968][134211] Avg episode reward: [(0, '8.203')] [2025-01-04 04:22:41,183][134294] Updated weights for policy 0, policy_version 96324 (0.0024) [2025-01-04 04:22:43,667][134294] Updated weights for policy 0, policy_version 96334 (0.0018) [2025-01-04 04:22:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14336.0, 300 sec: 15314.9). Total num frames: 394588160. Throughput: 0: 3681.1. Samples: 87816608. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:43,968][134211] Avg episode reward: [(0, '7.848')] [2025-01-04 04:22:45,568][134294] Updated weights for policy 0, policy_version 96344 (0.0012) [2025-01-04 04:22:47,442][134294] Updated weights for policy 0, policy_version 96354 (0.0013) [2025-01-04 04:22:48,968][134211] Fps is (10 sec: 18842.1, 60 sec: 15155.3, 300 sec: 15384.3). Total num frames: 394698752. Throughput: 0: 3817.8. Samples: 87832788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:48,968][134211] Avg episode reward: [(0, '7.991')] [2025-01-04 04:22:49,318][134294] Updated weights for policy 0, policy_version 96364 (0.0014) [2025-01-04 04:22:51,205][134294] Updated weights for policy 0, policy_version 96374 (0.0013) [2025-01-04 04:22:53,483][134294] Updated weights for policy 0, policy_version 96384 (0.0016) [2025-01-04 04:22:53,968][134211] Fps is (10 sec: 20479.9, 60 sec: 15633.1, 300 sec: 15356.5). Total num frames: 394792960. Throughput: 0: 4094.9. Samples: 87865400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:53,968][134211] Avg episode reward: [(0, '7.522')] [2025-01-04 04:22:56,863][134294] Updated weights for policy 0, policy_version 96394 (0.0026) [2025-01-04 04:22:58,968][134211] Fps is (10 sec: 15564.0, 60 sec: 15564.7, 300 sec: 15315.0). Total num frames: 394854400. Throughput: 0: 4086.4. Samples: 87884586. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:22:58,969][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 04:23:00,249][134294] Updated weights for policy 0, policy_version 96404 (0.0026) [2025-01-04 04:23:03,344][134294] Updated weights for policy 0, policy_version 96414 (0.0027) [2025-01-04 04:23:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15496.5, 300 sec: 15301.0). Total num frames: 394915840. Throughput: 0: 4068.6. Samples: 87894008. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:03,969][134211] Avg episode reward: [(0, '7.970')] [2025-01-04 04:23:06,416][134294] Updated weights for policy 0, policy_version 96424 (0.0027) [2025-01-04 04:23:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15564.7, 300 sec: 15328.8). Total num frames: 394985472. Throughput: 0: 3782.2. Samples: 87913806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:08,969][134211] Avg episode reward: [(0, '8.320')] [2025-01-04 04:23:09,588][134294] Updated weights for policy 0, policy_version 96434 (0.0025) [2025-01-04 04:23:12,607][134294] Updated weights for policy 0, policy_version 96444 (0.0027) [2025-01-04 04:23:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 395051008. Throughput: 0: 3717.2. Samples: 87933776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:13,968][134211] Avg episode reward: [(0, '7.422')] [2025-01-04 04:23:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096448_395051008.pth... [2025-01-04 04:23:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000095559_391409664.pth [2025-01-04 04:23:15,556][134294] Updated weights for policy 0, policy_version 96454 (0.0025) [2025-01-04 04:23:18,701][134294] Updated weights for policy 0, policy_version 96464 (0.0024) [2025-01-04 04:23:18,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15086.9, 300 sec: 15301.0). Total num frames: 395116544. Throughput: 0: 3731.3. Samples: 87944108. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:18,968][134211] Avg episode reward: [(0, '7.014')] [2025-01-04 04:23:21,382][134294] Updated weights for policy 0, policy_version 96474 (0.0018) [2025-01-04 04:23:23,211][134294] Updated weights for policy 0, policy_version 96484 (0.0013) [2025-01-04 04:23:23,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14950.4, 300 sec: 15287.1). Total num frames: 395210752. Throughput: 0: 3805.8. Samples: 87967040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:23,968][134211] Avg episode reward: [(0, '7.860')] [2025-01-04 04:23:25,126][134294] Updated weights for policy 0, policy_version 96494 (0.0013) [2025-01-04 04:23:27,087][134294] Updated weights for policy 0, policy_version 96504 (0.0014) [2025-01-04 04:23:28,967][134211] Fps is (10 sec: 20070.8, 60 sec: 15564.9, 300 sec: 15356.5). Total num frames: 395317248. Throughput: 0: 4042.3. Samples: 87998512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:23:28,968][134211] Avg episode reward: [(0, '8.072')] [2025-01-04 04:23:29,188][134294] Updated weights for policy 0, policy_version 96514 (0.0016) [2025-01-04 04:23:31,241][134294] Updated weights for policy 0, policy_version 96524 (0.0014) [2025-01-04 04:23:33,210][134294] Updated weights for policy 0, policy_version 96534 (0.0015) [2025-01-04 04:23:33,968][134211] Fps is (10 sec: 20070.2, 60 sec: 16111.0, 300 sec: 15453.7). Total num frames: 395411456. Throughput: 0: 4017.9. Samples: 88013596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:33,968][134211] Avg episode reward: [(0, '7.471')] [2025-01-04 04:23:36,184][134294] Updated weights for policy 0, policy_version 96544 (0.0025) [2025-01-04 04:23:38,968][134211] Fps is (10 sec: 15973.9, 60 sec: 16110.9, 300 sec: 15453.7). Total num frames: 395476992. Throughput: 0: 3821.3. Samples: 88037358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:38,968][134211] Avg episode reward: [(0, '7.964')] [2025-01-04 04:23:39,542][134294] Updated weights for policy 0, policy_version 96554 (0.0026) [2025-01-04 04:23:42,662][134294] Updated weights for policy 0, policy_version 96564 (0.0029) [2025-01-04 04:23:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15906.1, 300 sec: 15439.8). Total num frames: 395542528. Throughput: 0: 3814.6. Samples: 88056240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:43,969][134211] Avg episode reward: [(0, '7.733')] [2025-01-04 04:23:45,769][134294] Updated weights for policy 0, policy_version 96574 (0.0024) [2025-01-04 04:23:48,937][134294] Updated weights for policy 0, policy_version 96584 (0.0026) [2025-01-04 04:23:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.2, 300 sec: 15314.9). Total num frames: 395608064. Throughput: 0: 3830.4. Samples: 88066376. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:48,969][134211] Avg episode reward: [(0, '7.598')] [2025-01-04 04:23:51,890][134294] Updated weights for policy 0, policy_version 96594 (0.0025) [2025-01-04 04:23:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14677.3, 300 sec: 15314.9). Total num frames: 395673600. Throughput: 0: 3833.3. Samples: 88086306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:53,969][134211] Avg episode reward: [(0, '7.625')] [2025-01-04 04:23:55,060][134294] Updated weights for policy 0, policy_version 96604 (0.0025) [2025-01-04 04:23:58,005][134294] Updated weights for policy 0, policy_version 96614 (0.0027) [2025-01-04 04:23:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 15328.8). Total num frames: 395743232. Throughput: 0: 3835.7. Samples: 88106382. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:23:58,968][134211] Avg episode reward: [(0, '7.825')] [2025-01-04 04:24:01,089][134294] Updated weights for policy 0, policy_version 96624 (0.0025) [2025-01-04 04:24:03,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14813.9, 300 sec: 15273.3). Total num frames: 395804672. Throughput: 0: 3834.0. Samples: 88116636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:24:03,968][134211] Avg episode reward: [(0, '7.382')] [2025-01-04 04:24:04,289][134294] Updated weights for policy 0, policy_version 96634 (0.0027) [2025-01-04 04:24:07,346][134294] Updated weights for policy 0, policy_version 96644 (0.0026) [2025-01-04 04:24:08,967][134211] Fps is (10 sec: 13107.6, 60 sec: 14814.0, 300 sec: 15148.3). Total num frames: 395874304. Throughput: 0: 3758.4. Samples: 88136168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:24:08,968][134211] Avg episode reward: [(0, '7.851')] [2025-01-04 04:24:09,746][134294] Updated weights for policy 0, policy_version 96654 (0.0016) [2025-01-04 04:24:11,646][134294] Updated weights for policy 0, policy_version 96664 (0.0012) [2025-01-04 04:24:13,512][134294] Updated weights for policy 0, policy_version 96674 (0.0014) [2025-01-04 04:24:13,968][134211] Fps is (10 sec: 18022.5, 60 sec: 15564.8, 300 sec: 15245.5). Total num frames: 395984896. Throughput: 0: 3714.4. Samples: 88165660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:24:13,968][134211] Avg episode reward: [(0, '7.723')] [2025-01-04 04:24:15,592][134294] Updated weights for policy 0, policy_version 96684 (0.0019) [2025-01-04 04:24:18,966][134294] Updated weights for policy 0, policy_version 96694 (0.0028) [2025-01-04 04:24:18,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15701.3, 300 sec: 15273.2). Total num frames: 396058624. Throughput: 0: 3689.5. Samples: 88179622. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:24:18,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 04:24:22,199][134294] Updated weights for policy 0, policy_version 96704 (0.0025) [2025-01-04 04:24:23,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15155.1, 300 sec: 15287.1). Total num frames: 396120064. Throughput: 0: 3569.4. Samples: 88197982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:24:23,968][134211] Avg episode reward: [(0, '8.100')] [2025-01-04 04:24:25,677][134294] Updated weights for policy 0, policy_version 96714 (0.0026) [2025-01-04 04:24:27,557][134294] Updated weights for policy 0, policy_version 96724 (0.0015) [2025-01-04 04:24:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.5, 300 sec: 15342.6). Total num frames: 396201984. Throughput: 0: 3665.2. Samples: 88221176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:28,968][134211] Avg episode reward: [(0, '8.292')] [2025-01-04 04:24:30,319][134294] Updated weights for policy 0, policy_version 96734 (0.0024) [2025-01-04 04:24:33,302][134294] Updated weights for policy 0, policy_version 96744 (0.0025) [2025-01-04 04:24:33,969][134211] Fps is (10 sec: 15152.8, 60 sec: 14335.6, 300 sec: 15203.7). Total num frames: 396271616. Throughput: 0: 3674.5. Samples: 88231736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:33,970][134211] Avg episode reward: [(0, '8.433')] [2025-01-04 04:24:36,169][134294] Updated weights for policy 0, policy_version 96754 (0.0022) [2025-01-04 04:24:38,088][134294] Updated weights for policy 0, policy_version 96764 (0.0013) [2025-01-04 04:24:38,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14745.6, 300 sec: 15287.1). Total num frames: 396361728. Throughput: 0: 3743.9. Samples: 88254782. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:38,968][134211] Avg episode reward: [(0, '8.168')] [2025-01-04 04:24:40,061][134294] Updated weights for policy 0, policy_version 96774 (0.0014) [2025-01-04 04:24:41,962][134294] Updated weights for policy 0, policy_version 96784 (0.0013) [2025-01-04 04:24:43,968][134211] Fps is (10 sec: 19254.5, 60 sec: 15360.0, 300 sec: 15398.2). Total num frames: 396464128. Throughput: 0: 4003.4. Samples: 88286536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:43,968][134211] Avg episode reward: [(0, '7.907')] [2025-01-04 04:24:44,082][134294] Updated weights for policy 0, policy_version 96794 (0.0017) [2025-01-04 04:24:47,205][134294] Updated weights for policy 0, policy_version 96804 (0.0025) [2025-01-04 04:24:48,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15360.0, 300 sec: 15398.2). Total num frames: 396529664. Throughput: 0: 4013.5. Samples: 88297242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:48,968][134211] Avg episode reward: [(0, '7.149')] [2025-01-04 04:24:50,246][134294] Updated weights for policy 0, policy_version 96814 (0.0024) [2025-01-04 04:24:53,333][134294] Updated weights for policy 0, policy_version 96824 (0.0026) [2025-01-04 04:24:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15428.3, 300 sec: 15398.2). Total num frames: 396599296. Throughput: 0: 4024.3. Samples: 88317264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:53,968][134211] Avg episode reward: [(0, '8.302')] [2025-01-04 04:24:56,349][134294] Updated weights for policy 0, policy_version 96834 (0.0024) [2025-01-04 04:24:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15360.1, 300 sec: 15301.0). Total num frames: 396664832. Throughput: 0: 3806.0. Samples: 88336928. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:24:58,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 04:24:59,627][134294] Updated weights for policy 0, policy_version 96844 (0.0024) [2025-01-04 04:25:02,768][134294] Updated weights for policy 0, policy_version 96854 (0.0025) [2025-01-04 04:25:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15360.0, 300 sec: 15189.9). Total num frames: 396726272. Throughput: 0: 3705.5. Samples: 88346372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:25:03,969][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 04:25:05,814][134294] Updated weights for policy 0, policy_version 96864 (0.0027) [2025-01-04 04:25:08,675][134294] Updated weights for policy 0, policy_version 96874 (0.0025) [2025-01-04 04:25:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15360.0, 300 sec: 15203.8). Total num frames: 396795904. Throughput: 0: 3754.5. Samples: 88366934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:25:08,968][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 04:25:11,001][134294] Updated weights for policy 0, policy_version 96884 (0.0017) [2025-01-04 04:25:12,994][134294] Updated weights for policy 0, policy_version 96894 (0.0013) [2025-01-04 04:25:13,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15223.5, 300 sec: 15329.0). Total num frames: 396898304. Throughput: 0: 3832.9. Samples: 88393658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:25:13,968][134211] Avg episode reward: [(0, '8.199')] [2025-01-04 04:25:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096899_396898304.pth... 
[2025-01-04 04:25:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096003_393228288.pth [2025-01-04 04:25:14,963][134294] Updated weights for policy 0, policy_version 96904 (0.0013) [2025-01-04 04:25:16,850][134294] Updated weights for policy 0, policy_version 96914 (0.0015) [2025-01-04 04:25:18,723][134294] Updated weights for policy 0, policy_version 96924 (0.0014) [2025-01-04 04:25:18,968][134211] Fps is (10 sec: 20889.6, 60 sec: 15769.6, 300 sec: 15467.6). Total num frames: 397004800. Throughput: 0: 3954.8. Samples: 88409696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:25:18,968][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 04:25:20,599][134294] Updated weights for policy 0, policy_version 96934 (0.0015) [2025-01-04 04:25:23,827][134294] Updated weights for policy 0, policy_version 96944 (0.0024) [2025-01-04 04:25:23,968][134211] Fps is (10 sec: 18431.6, 60 sec: 16042.7, 300 sec: 15439.8). Total num frames: 397082624. Throughput: 0: 4101.1. Samples: 88439334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:25:23,969][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 04:25:27,349][134294] Updated weights for policy 0, policy_version 96954 (0.0029) [2025-01-04 04:25:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15633.1, 300 sec: 15259.3). Total num frames: 397139968. Throughput: 0: 3769.2. Samples: 88456150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:28,968][134211] Avg episode reward: [(0, '7.831')] [2025-01-04 04:25:31,047][134294] Updated weights for policy 0, policy_version 96964 (0.0028) [2025-01-04 04:25:33,968][134211] Fps is (10 sec: 11059.3, 60 sec: 15360.4, 300 sec: 15120.5). Total num frames: 397193216. Throughput: 0: 3723.5. Samples: 88464798. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:33,968][134211] Avg episode reward: [(0, '7.774')] [2025-01-04 04:25:34,657][134294] Updated weights for policy 0, policy_version 96974 (0.0027) [2025-01-04 04:25:38,131][134294] Updated weights for policy 0, policy_version 96984 (0.0027) [2025-01-04 04:25:38,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14882.1, 300 sec: 15121.6). Total num frames: 397254656. Throughput: 0: 3660.9. Samples: 88482004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:38,968][134211] Avg episode reward: [(0, '8.142')] [2025-01-04 04:25:40,494][134294] Updated weights for policy 0, policy_version 96994 (0.0017) [2025-01-04 04:25:42,438][134294] Updated weights for policy 0, policy_version 97004 (0.0014) [2025-01-04 04:25:43,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14882.2, 300 sec: 15231.6). Total num frames: 397357056. Throughput: 0: 3828.6. Samples: 88509216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:43,968][134211] Avg episode reward: [(0, '7.036')] [2025-01-04 04:25:44,305][134294] Updated weights for policy 0, policy_version 97014 (0.0013) [2025-01-04 04:25:46,229][134294] Updated weights for policy 0, policy_version 97024 (0.0014) [2025-01-04 04:25:48,088][134294] Updated weights for policy 0, policy_version 97034 (0.0012) [2025-01-04 04:25:48,968][134211] Fps is (10 sec: 21299.6, 60 sec: 15633.1, 300 sec: 15384.3). Total num frames: 397467648. Throughput: 0: 3983.6. Samples: 88525634. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:48,968][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 04:25:50,645][134294] Updated weights for policy 0, policy_version 97044 (0.0023) [2025-01-04 04:25:53,968][134211] Fps is (10 sec: 17201.8, 60 sec: 15496.4, 300 sec: 15384.3). Total num frames: 397529088. Throughput: 0: 4083.7. Samples: 88550704. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:53,969][134211] Avg episode reward: [(0, '7.299')] [2025-01-04 04:25:54,009][134294] Updated weights for policy 0, policy_version 97054 (0.0028) [2025-01-04 04:25:57,196][134294] Updated weights for policy 0, policy_version 97064 (0.0026) [2025-01-04 04:25:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.5, 300 sec: 15370.4). Total num frames: 397594624. Throughput: 0: 3916.8. Samples: 88569912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:25:58,968][134211] Avg episode reward: [(0, '7.451')] [2025-01-04 04:26:00,319][134294] Updated weights for policy 0, policy_version 97074 (0.0025) [2025-01-04 04:26:03,421][134294] Updated weights for policy 0, policy_version 97084 (0.0024) [2025-01-04 04:26:03,969][134211] Fps is (10 sec: 13515.5, 60 sec: 15632.7, 300 sec: 15384.2). Total num frames: 397664256. Throughput: 0: 3785.5. Samples: 88580050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:26:03,970][134211] Avg episode reward: [(0, '7.930')] [2025-01-04 04:26:06,339][134294] Updated weights for policy 0, policy_version 97094 (0.0026) [2025-01-04 04:26:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15564.8, 300 sec: 15259.3). Total num frames: 397729792. Throughput: 0: 3574.2. Samples: 88600174. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:26:08,968][134211] Avg episode reward: [(0, '7.683')] [2025-01-04 04:26:09,539][134294] Updated weights for policy 0, policy_version 97104 (0.0028) [2025-01-04 04:26:12,556][134294] Updated weights for policy 0, policy_version 97114 (0.0025) [2025-01-04 04:26:13,968][134211] Fps is (10 sec: 13109.3, 60 sec: 14950.4, 300 sec: 15231.6). Total num frames: 397795328. Throughput: 0: 3637.6. Samples: 88619842. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:26:13,968][134211] Avg episode reward: [(0, '7.810')] [2025-01-04 04:26:15,629][134294] Updated weights for policy 0, policy_version 97124 (0.0027) [2025-01-04 04:26:18,596][134294] Updated weights for policy 0, policy_version 97134 (0.0023) [2025-01-04 04:26:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14336.0, 300 sec: 15259.3). Total num frames: 397864960. Throughput: 0: 3675.5. Samples: 88630196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:26:18,968][134211] Avg episode reward: [(0, '8.459')] [2025-01-04 04:26:21,400][134294] Updated weights for policy 0, policy_version 97144 (0.0026) [2025-01-04 04:26:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 15273.2). Total num frames: 397934592. Throughput: 0: 3757.3. Samples: 88651080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:26:23,968][134211] Avg episode reward: [(0, '7.173')] [2025-01-04 04:26:24,397][134294] Updated weights for policy 0, policy_version 97154 (0.0023) [2025-01-04 04:26:27,158][134294] Updated weights for policy 0, policy_version 97164 (0.0017) [2025-01-04 04:26:28,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14677.3, 300 sec: 15203.8). Total num frames: 398020608. Throughput: 0: 3671.5. Samples: 88674436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:28,968][134211] Avg episode reward: [(0, '7.575')] [2025-01-04 04:26:29,163][134294] Updated weights for policy 0, policy_version 97174 (0.0015) [2025-01-04 04:26:31,007][134294] Updated weights for policy 0, policy_version 97184 (0.0013) [2025-01-04 04:26:32,921][134294] Updated weights for policy 0, policy_version 97194 (0.0014) [2025-01-04 04:26:33,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15564.8, 300 sec: 15203.8). Total num frames: 398127104. Throughput: 0: 3665.0. Samples: 88690560. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:33,968][134211] Avg episode reward: [(0, '8.166')] [2025-01-04 04:26:34,751][134294] Updated weights for policy 0, policy_version 97204 (0.0013) [2025-01-04 04:26:36,641][134294] Updated weights for policy 0, policy_version 97214 (0.0013) [2025-01-04 04:26:38,968][134211] Fps is (10 sec: 20480.5, 60 sec: 16179.2, 300 sec: 15245.5). Total num frames: 398225408. Throughput: 0: 3832.0. Samples: 88723140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:38,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 04:26:39,282][134294] Updated weights for policy 0, policy_version 97224 (0.0022) [2025-01-04 04:26:42,546][134294] Updated weights for policy 0, policy_version 97234 (0.0028) [2025-01-04 04:26:43,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15496.4, 300 sec: 15245.4). Total num frames: 398286848. Throughput: 0: 3838.4. Samples: 88742642. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:43,968][134211] Avg episode reward: [(0, '8.024')] [2025-01-04 04:26:45,765][134294] Updated weights for policy 0, policy_version 97244 (0.0028) [2025-01-04 04:26:48,728][134294] Updated weights for policy 0, policy_version 97254 (0.0030) [2025-01-04 04:26:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 15245.5). Total num frames: 398352384. Throughput: 0: 3835.8. Samples: 88752654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:48,968][134211] Avg episode reward: [(0, '8.145')] [2025-01-04 04:26:51,742][134294] Updated weights for policy 0, policy_version 97264 (0.0026) [2025-01-04 04:26:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14814.0, 300 sec: 15245.5). Total num frames: 398417920. Throughput: 0: 3834.7. Samples: 88772736. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:53,968][134211] Avg episode reward: [(0, '7.882')] [2025-01-04 04:26:54,902][134294] Updated weights for policy 0, policy_version 97274 (0.0027) [2025-01-04 04:26:57,949][134294] Updated weights for policy 0, policy_version 97284 (0.0025) [2025-01-04 04:26:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 15259.3). Total num frames: 398487552. Throughput: 0: 3845.5. Samples: 88792892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:26:58,968][134211] Avg episode reward: [(0, '7.910')] [2025-01-04 04:27:00,978][134294] Updated weights for policy 0, policy_version 97294 (0.0027) [2025-01-04 04:27:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14814.2, 300 sec: 15259.3). Total num frames: 398553088. Throughput: 0: 3839.1. Samples: 88802954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:27:03,968][134211] Avg episode reward: [(0, '8.207')] [2025-01-04 04:27:04,059][134294] Updated weights for policy 0, policy_version 97304 (0.0024) [2025-01-04 04:27:07,011][134294] Updated weights for policy 0, policy_version 97314 (0.0025) [2025-01-04 04:27:08,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15018.6, 300 sec: 15301.0). Total num frames: 398630912. Throughput: 0: 3825.7. Samples: 88823238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:27:08,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 04:27:09,262][134294] Updated weights for policy 0, policy_version 97324 (0.0017) [2025-01-04 04:27:11,128][134294] Updated weights for policy 0, policy_version 97334 (0.0014) [2025-01-04 04:27:13,029][134294] Updated weights for policy 0, policy_version 97344 (0.0013) [2025-01-04 04:27:13,968][134211] Fps is (10 sec: 18842.1, 60 sec: 15769.6, 300 sec: 15356.5). Total num frames: 398741504. Throughput: 0: 4000.1. Samples: 88854438. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:27:13,968][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 04:27:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000097349_398741504.pth... [2025-01-04 04:27:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096448_395051008.pth [2025-01-04 04:27:14,885][134294] Updated weights for policy 0, policy_version 97354 (0.0015) [2025-01-04 04:27:17,505][134294] Updated weights for policy 0, policy_version 97364 (0.0022) [2025-01-04 04:27:18,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15906.1, 300 sec: 15273.2). Total num frames: 398819328. Throughput: 0: 3973.4. Samples: 88869364. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:27:18,968][134211] Avg episode reward: [(0, '7.737')] [2025-01-04 04:27:20,525][134294] Updated weights for policy 0, policy_version 97374 (0.0027) [2025-01-04 04:27:23,679][134294] Updated weights for policy 0, policy_version 97384 (0.0026) [2025-01-04 04:27:23,968][134211] Fps is (10 sec: 14334.9, 60 sec: 15837.7, 300 sec: 15259.3). Total num frames: 398884864. Throughput: 0: 3695.9. Samples: 88889458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:23,969][134211] Avg episode reward: [(0, '8.411')] [2025-01-04 04:27:26,813][134294] Updated weights for policy 0, policy_version 97394 (0.0025) [2025-01-04 04:27:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15428.3, 300 sec: 15259.3). Total num frames: 398946304. Throughput: 0: 3687.2. Samples: 88908566. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:28,968][134211] Avg episode reward: [(0, '8.460')] [2025-01-04 04:27:30,466][134294] Updated weights for policy 0, policy_version 97404 (0.0027) [2025-01-04 04:27:33,667][134294] Updated weights for policy 0, policy_version 97414 (0.0022) [2025-01-04 04:27:33,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14745.6, 300 sec: 15259.3). Total num frames: 399011840. Throughput: 0: 3650.4. Samples: 88916922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:33,968][134211] Avg episode reward: [(0, '8.292')] [2025-01-04 04:27:35,749][134294] Updated weights for policy 0, policy_version 97424 (0.0013) [2025-01-04 04:27:37,636][134294] Updated weights for policy 0, policy_version 97434 (0.0013) [2025-01-04 04:27:38,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14882.2, 300 sec: 15356.5). Total num frames: 399118336. Throughput: 0: 3788.3. Samples: 88943208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:38,968][134211] Avg episode reward: [(0, '8.083')] [2025-01-04 04:27:39,510][134294] Updated weights for policy 0, policy_version 97444 (0.0014) [2025-01-04 04:27:41,406][134294] Updated weights for policy 0, policy_version 97454 (0.0013) [2025-01-04 04:27:43,283][134294] Updated weights for policy 0, policy_version 97464 (0.0013) [2025-01-04 04:27:43,968][134211] Fps is (10 sec: 21298.7, 60 sec: 15633.1, 300 sec: 15342.6). Total num frames: 399224832. Throughput: 0: 4062.1. Samples: 88975686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:43,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 04:27:45,483][134294] Updated weights for policy 0, policy_version 97474 (0.0017) [2025-01-04 04:27:48,583][134294] Updated weights for policy 0, policy_version 97484 (0.0028) [2025-01-04 04:27:48,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15701.3, 300 sec: 15259.3). Total num frames: 399294464. Throughput: 0: 4135.2. Samples: 88989038. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:48,968][134211] Avg episode reward: [(0, '7.442')] [2025-01-04 04:27:51,868][134294] Updated weights for policy 0, policy_version 97494 (0.0028) [2025-01-04 04:27:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15769.6, 300 sec: 15287.1). Total num frames: 399364096. Throughput: 0: 4109.6. Samples: 89008168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:53,968][134211] Avg episode reward: [(0, '8.188')] [2025-01-04 04:27:54,937][134294] Updated weights for policy 0, policy_version 97504 (0.0024) [2025-01-04 04:27:58,077][134294] Updated weights for policy 0, policy_version 97514 (0.0026) [2025-01-04 04:27:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15633.0, 300 sec: 15287.1). Total num frames: 399425536. Throughput: 0: 3853.6. Samples: 89027850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:27:58,968][134211] Avg episode reward: [(0, '7.497')] [2025-01-04 04:28:01,130][134294] Updated weights for policy 0, policy_version 97524 (0.0028) [2025-01-04 04:28:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15633.1, 300 sec: 15273.2). Total num frames: 399491072. Throughput: 0: 3742.9. Samples: 89037796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:28:03,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 04:28:04,383][134294] Updated weights for policy 0, policy_version 97534 (0.0029) [2025-01-04 04:28:07,356][134294] Updated weights for policy 0, policy_version 97544 (0.0024) [2025-01-04 04:28:08,970][134211] Fps is (10 sec: 13104.5, 60 sec: 15427.8, 300 sec: 15273.1). Total num frames: 399556608. Throughput: 0: 3732.4. Samples: 89057422. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:28:08,970][134211] Avg episode reward: [(0, '8.124')] [2025-01-04 04:28:10,508][134294] Updated weights for policy 0, policy_version 97554 (0.0023) [2025-01-04 04:28:13,454][134294] Updated weights for policy 0, policy_version 97564 (0.0023) [2025-01-04 04:28:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 15287.1). Total num frames: 399626240. Throughput: 0: 3759.4. Samples: 89077738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:28:13,968][134211] Avg episode reward: [(0, '7.565')] [2025-01-04 04:28:16,461][134294] Updated weights for policy 0, policy_version 97574 (0.0026) [2025-01-04 04:28:18,968][134211] Fps is (10 sec: 13929.3, 60 sec: 14609.1, 300 sec: 15203.8). Total num frames: 399695872. Throughput: 0: 3801.6. Samples: 89087994. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:28:18,968][134211] Avg episode reward: [(0, '7.448')] [2025-01-04 04:28:19,511][134294] Updated weights for policy 0, policy_version 97584 (0.0027) [2025-01-04 04:28:22,058][134294] Updated weights for policy 0, policy_version 97594 (0.0017) [2025-01-04 04:28:23,967][134211] Fps is (10 sec: 15565.0, 60 sec: 14950.6, 300 sec: 15134.4). Total num frames: 399781888. Throughput: 0: 3708.4. Samples: 89110084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:23,968][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 04:28:23,983][134294] Updated weights for policy 0, policy_version 97604 (0.0012) [2025-01-04 04:28:25,888][134294] Updated weights for policy 0, policy_version 97614 (0.0014) [2025-01-04 04:28:27,865][134294] Updated weights for policy 0, policy_version 97624 (0.0015) [2025-01-04 04:28:28,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15633.1, 300 sec: 15162.1). Total num frames: 399884288. Throughput: 0: 3687.0. Samples: 89141602. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:28,968][134211] Avg episode reward: [(0, '7.824')] [2025-01-04 04:28:30,699][134294] Updated weights for policy 0, policy_version 97634 (0.0025) [2025-01-04 04:28:33,840][134294] Updated weights for policy 0, policy_version 97644 (0.0026) [2025-01-04 04:28:33,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15633.0, 300 sec: 15162.1). Total num frames: 399949824. Throughput: 0: 3618.5. Samples: 89151870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:33,968][134211] Avg episode reward: [(0, '8.362')] [2025-01-04 04:28:37,007][134294] Updated weights for policy 0, policy_version 97654 (0.0027) [2025-01-04 04:28:38,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14950.3, 300 sec: 15162.1). Total num frames: 400015360. Throughput: 0: 3630.8. Samples: 89171556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:38,969][134211] Avg episode reward: [(0, '7.597')] [2025-01-04 04:28:40,111][134294] Updated weights for policy 0, policy_version 97664 (0.0025) [2025-01-04 04:28:43,119][134294] Updated weights for policy 0, policy_version 97674 (0.0028) [2025-01-04 04:28:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14267.7, 300 sec: 15162.1). Total num frames: 400080896. Throughput: 0: 3641.2. Samples: 89191702. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:43,968][134211] Avg episode reward: [(0, '8.176')] [2025-01-04 04:28:45,947][134294] Updated weights for policy 0, policy_version 97684 (0.0024) [2025-01-04 04:28:47,834][134294] Updated weights for policy 0, policy_version 97694 (0.0014) [2025-01-04 04:28:48,967][134211] Fps is (10 sec: 15975.4, 60 sec: 14677.4, 300 sec: 15259.4). Total num frames: 400175104. Throughput: 0: 3667.8. Samples: 89202848. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:48,968][134211] Avg episode reward: [(0, '7.995')] [2025-01-04 04:28:49,778][134294] Updated weights for policy 0, policy_version 97704 (0.0013) [2025-01-04 04:28:52,478][134294] Updated weights for policy 0, policy_version 97714 (0.0020) [2025-01-04 04:28:53,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14813.8, 300 sec: 15287.1). Total num frames: 400252928. Throughput: 0: 3865.9. Samples: 89231382. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:53,969][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 04:28:55,604][134294] Updated weights for policy 0, policy_version 97724 (0.0028) [2025-01-04 04:28:58,704][134294] Updated weights for policy 0, policy_version 97734 (0.0027) [2025-01-04 04:28:58,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14882.2, 300 sec: 15301.0). Total num frames: 400318464. Throughput: 0: 3857.5. Samples: 89251326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:28:58,968][134211] Avg episode reward: [(0, '7.826')] [2025-01-04 04:29:01,735][134294] Updated weights for policy 0, policy_version 97744 (0.0027) [2025-01-04 04:29:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.2, 300 sec: 15287.1). Total num frames: 400384000. Throughput: 0: 3844.2. Samples: 89260984. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:29:03,968][134211] Avg episode reward: [(0, '7.191')] [2025-01-04 04:29:04,649][134294] Updated weights for policy 0, policy_version 97754 (0.0020) [2025-01-04 04:29:06,548][134294] Updated weights for policy 0, policy_version 97764 (0.0015) [2025-01-04 04:29:08,435][134294] Updated weights for policy 0, policy_version 97774 (0.0013) [2025-01-04 04:29:08,967][134211] Fps is (10 sec: 17203.4, 60 sec: 15565.4, 300 sec: 15273.2). Total num frames: 400490496. Throughput: 0: 3932.2. Samples: 89287032. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:29:08,968][134211] Avg episode reward: [(0, '7.245')] [2025-01-04 04:29:10,285][134294] Updated weights for policy 0, policy_version 97784 (0.0013) [2025-01-04 04:29:12,192][134294] Updated weights for policy 0, policy_version 97794 (0.0014) [2025-01-04 04:29:13,968][134211] Fps is (10 sec: 21299.1, 60 sec: 16179.2, 300 sec: 15384.3). Total num frames: 400596992. Throughput: 0: 3958.0. Samples: 89319712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:29:13,968][134211] Avg episode reward: [(0, '8.413')] [2025-01-04 04:29:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000097803_400601088.pth... [2025-01-04 04:29:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000096899_396898304.pth [2025-01-04 04:29:14,326][134294] Updated weights for policy 0, policy_version 97804 (0.0018) [2025-01-04 04:29:17,424][134294] Updated weights for policy 0, policy_version 97814 (0.0028) [2025-01-04 04:29:18,968][134211] Fps is (10 sec: 17202.9, 60 sec: 16110.9, 300 sec: 15398.2). Total num frames: 400662528. Throughput: 0: 3974.7. Samples: 89330732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:29:18,968][134211] Avg episode reward: [(0, '8.066')] [2025-01-04 04:29:20,588][134294] Updated weights for policy 0, policy_version 97824 (0.0027) [2025-01-04 04:29:23,621][134294] Updated weights for policy 0, policy_version 97834 (0.0030) [2025-01-04 04:29:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15837.8, 300 sec: 15356.5). Total num frames: 400732160. Throughput: 0: 3971.4. Samples: 89350268. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:23,968][134211] Avg episode reward: [(0, '6.721')] [2025-01-04 04:29:26,830][134294] Updated weights for policy 0, policy_version 97844 (0.0025) [2025-01-04 04:29:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.2, 300 sec: 15328.9). Total num frames: 400793600. Throughput: 0: 3941.1. Samples: 89369052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:28,968][134211] Avg episode reward: [(0, '8.425')] [2025-01-04 04:29:30,473][134294] Updated weights for policy 0, policy_version 97854 (0.0030) [2025-01-04 04:29:33,955][134294] Updated weights for policy 0, policy_version 97864 (0.0028) [2025-01-04 04:29:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 15018.6, 300 sec: 15217.7). Total num frames: 400850944. Throughput: 0: 3884.3. Samples: 89377642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:33,969][134211] Avg episode reward: [(0, '7.847')] [2025-01-04 04:29:36,814][134294] Updated weights for policy 0, policy_version 97874 (0.0021) [2025-01-04 04:29:38,967][134294] Updated weights for policy 0, policy_version 97884 (0.0016) [2025-01-04 04:29:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15291.9, 300 sec: 15148.3). Total num frames: 400932864. Throughput: 0: 3701.1. Samples: 89397930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:38,968][134211] Avg episode reward: [(0, '8.231')] [2025-01-04 04:29:41,061][134294] Updated weights for policy 0, policy_version 97894 (0.0013) [2025-01-04 04:29:43,071][134294] Updated weights for policy 0, policy_version 97904 (0.0014) [2025-01-04 04:29:43,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15769.6, 300 sec: 15245.4). Total num frames: 401027072. Throughput: 0: 3910.1. Samples: 89427280. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:43,968][134211] Avg episode reward: [(0, '8.034')] [2025-01-04 04:29:45,660][134294] Updated weights for policy 0, policy_version 97914 (0.0020) [2025-01-04 04:29:48,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15291.7, 300 sec: 15231.6). Total num frames: 401092608. Throughput: 0: 3948.0. Samples: 89438644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:48,968][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 04:29:49,208][134294] Updated weights for policy 0, policy_version 97924 (0.0032) [2025-01-04 04:29:52,639][134294] Updated weights for policy 0, policy_version 97934 (0.0027) [2025-01-04 04:29:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14950.4, 300 sec: 15203.8). Total num frames: 401149952. Throughput: 0: 3762.0. Samples: 89456322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:53,968][134211] Avg episode reward: [(0, '7.930')] [2025-01-04 04:29:56,061][134294] Updated weights for policy 0, policy_version 97944 (0.0028) [2025-01-04 04:29:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14882.1, 300 sec: 15203.8). Total num frames: 401211392. Throughput: 0: 3432.5. Samples: 89474176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:29:58,968][134211] Avg episode reward: [(0, '7.474')] [2025-01-04 04:29:59,523][134294] Updated weights for policy 0, policy_version 97954 (0.0027) [2025-01-04 04:30:02,989][134294] Updated weights for policy 0, policy_version 97964 (0.0025) [2025-01-04 04:30:03,968][134211] Fps is (10 sec: 11877.4, 60 sec: 14745.4, 300 sec: 15162.1). Total num frames: 401268736. Throughput: 0: 3386.9. Samples: 89483146. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:30:03,969][134211] Avg episode reward: [(0, '7.765')] [2025-01-04 04:30:05,444][134294] Updated weights for policy 0, policy_version 97974 (0.0016) [2025-01-04 04:30:07,482][134294] Updated weights for policy 0, policy_version 97984 (0.0012) [2025-01-04 04:30:08,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14609.1, 300 sec: 15148.3). Total num frames: 401367040. Throughput: 0: 3488.7. Samples: 89507260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:30:08,968][134211] Avg episode reward: [(0, '8.496')] [2025-01-04 04:30:09,596][134294] Updated weights for policy 0, policy_version 97994 (0.0014) [2025-01-04 04:30:11,787][134294] Updated weights for policy 0, policy_version 98004 (0.0016) [2025-01-04 04:30:13,968][134211] Fps is (10 sec: 18023.7, 60 sec: 14199.5, 300 sec: 15064.9). Total num frames: 401448960. Throughput: 0: 3654.2. Samples: 89533492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:30:13,968][134211] Avg episode reward: [(0, '7.677')] [2025-01-04 04:30:15,248][134294] Updated weights for policy 0, policy_version 98014 (0.0028) [2025-01-04 04:30:18,931][134294] Updated weights for policy 0, policy_version 98024 (0.0027) [2025-01-04 04:30:18,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14062.8, 300 sec: 14995.5). Total num frames: 401506304. Throughput: 0: 3650.6. Samples: 89541918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:30:18,969][134211] Avg episode reward: [(0, '7.545')] [2025-01-04 04:30:22,179][134294] Updated weights for policy 0, policy_version 98034 (0.0026) [2025-01-04 04:30:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13926.4, 300 sec: 15009.4). Total num frames: 401567744. Throughput: 0: 3598.1. Samples: 89559844. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:23,968][134211] Avg episode reward: [(0, '7.812')] [2025-01-04 04:30:25,174][134294] Updated weights for policy 0, policy_version 98044 (0.0022) [2025-01-04 04:30:27,258][134294] Updated weights for policy 0, policy_version 98054 (0.0014) [2025-01-04 04:30:28,968][134211] Fps is (10 sec: 15156.1, 60 sec: 14404.3, 300 sec: 15134.4). Total num frames: 401657856. Throughput: 0: 3499.0. Samples: 89584734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:28,968][134211] Avg episode reward: [(0, '7.235')] [2025-01-04 04:30:29,363][134294] Updated weights for policy 0, policy_version 98064 (0.0012) [2025-01-04 04:30:31,440][134294] Updated weights for policy 0, policy_version 98074 (0.0014) [2025-01-04 04:30:33,968][134211] Fps is (10 sec: 17612.7, 60 sec: 14882.1, 300 sec: 15217.7). Total num frames: 401743872. Throughput: 0: 3583.5. Samples: 89599900. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:33,969][134211] Avg episode reward: [(0, '7.802')] [2025-01-04 04:30:34,622][134294] Updated weights for policy 0, policy_version 98084 (0.0028) [2025-01-04 04:30:38,161][134294] Updated weights for policy 0, policy_version 98094 (0.0032) [2025-01-04 04:30:38,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14404.2, 300 sec: 15051.0). Total num frames: 401797120. Throughput: 0: 3604.9. Samples: 89618542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:38,968][134211] Avg episode reward: [(0, '8.134')] [2025-01-04 04:30:41,662][134294] Updated weights for policy 0, policy_version 98104 (0.0026) [2025-01-04 04:30:43,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13858.1, 300 sec: 14884.4). Total num frames: 401858560. Throughput: 0: 3597.3. Samples: 89636054. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:43,968][134211] Avg episode reward: [(0, '8.625')] [2025-01-04 04:30:45,175][134294] Updated weights for policy 0, policy_version 98114 (0.0025) [2025-01-04 04:30:48,545][134294] Updated weights for policy 0, policy_version 98124 (0.0026) [2025-01-04 04:30:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.9, 300 sec: 14884.5). Total num frames: 401920000. Throughput: 0: 3594.8. Samples: 89644908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:48,969][134211] Avg episode reward: [(0, '7.785')] [2025-01-04 04:30:51,170][134294] Updated weights for policy 0, policy_version 98134 (0.0020) [2025-01-04 04:30:53,199][134294] Updated weights for policy 0, policy_version 98144 (0.0015) [2025-01-04 04:30:53,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14336.0, 300 sec: 14967.8). Total num frames: 402010112. Throughput: 0: 3558.8. Samples: 89667408. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:53,968][134211] Avg episode reward: [(0, '7.684')] [2025-01-04 04:30:55,100][134294] Updated weights for policy 0, policy_version 98154 (0.0015) [2025-01-04 04:30:57,278][134294] Updated weights for policy 0, policy_version 98164 (0.0016) [2025-01-04 04:30:58,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14813.9, 300 sec: 15037.3). Total num frames: 402100224. Throughput: 0: 3611.5. Samples: 89696008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:30:58,968][134211] Avg episode reward: [(0, '8.615')] [2025-01-04 04:31:00,764][134294] Updated weights for policy 0, policy_version 98174 (0.0029) [2025-01-04 04:31:03,968][134211] Fps is (10 sec: 14334.8, 60 sec: 14745.6, 300 sec: 14995.5). Total num frames: 402153472. Throughput: 0: 3611.4. Samples: 89704434. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:31:03,969][134211] Avg episode reward: [(0, '8.306')] [2025-01-04 04:31:04,469][134294] Updated weights for policy 0, policy_version 98184 (0.0029) [2025-01-04 04:31:07,786][134294] Updated weights for policy 0, policy_version 98194 (0.0024) [2025-01-04 04:31:08,968][134211] Fps is (10 sec: 11877.8, 60 sec: 14199.3, 300 sec: 14995.5). Total num frames: 402219008. Throughput: 0: 3601.5. Samples: 89721912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:31:08,968][134211] Avg episode reward: [(0, '7.803')] [2025-01-04 04:31:10,092][134294] Updated weights for policy 0, policy_version 98204 (0.0014) [2025-01-04 04:31:11,976][134294] Updated weights for policy 0, policy_version 98214 (0.0014) [2025-01-04 04:31:13,916][134294] Updated weights for policy 0, policy_version 98224 (0.0014) [2025-01-04 04:31:13,968][134211] Fps is (10 sec: 17204.2, 60 sec: 14609.0, 300 sec: 15120.5). Total num frames: 402325504. Throughput: 0: 3691.0. Samples: 89750828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:31:13,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 04:31:14,046][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000098225_402329600.pth... [2025-01-04 04:31:14,087][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000097349_398741504.pth [2025-01-04 04:31:15,870][134294] Updated weights for policy 0, policy_version 98234 (0.0013) [2025-01-04 04:31:18,968][134211] Fps is (10 sec: 18431.7, 60 sec: 14950.4, 300 sec: 15148.2). Total num frames: 402403328. Throughput: 0: 3693.7. Samples: 89766118. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:31:18,969][134211] Avg episode reward: [(0, '7.710')] [2025-01-04 04:31:19,082][134294] Updated weights for policy 0, policy_version 98244 (0.0026) [2025-01-04 04:31:22,592][134294] Updated weights for policy 0, policy_version 98254 (0.0027) [2025-01-04 04:31:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 15051.1). Total num frames: 402460672. Throughput: 0: 3678.9. Samples: 89784092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:23,969][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 04:31:26,070][134294] Updated weights for policy 0, policy_version 98264 (0.0030) [2025-01-04 04:31:28,968][134211] Fps is (10 sec: 11878.9, 60 sec: 14404.2, 300 sec: 14898.3). Total num frames: 402522112. Throughput: 0: 3678.0. Samples: 89801564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:28,969][134211] Avg episode reward: [(0, '8.200')] [2025-01-04 04:31:29,820][134294] Updated weights for policy 0, policy_version 98274 (0.0029) [2025-01-04 04:31:33,162][134294] Updated weights for policy 0, policy_version 98284 (0.0022) [2025-01-04 04:31:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13994.7, 300 sec: 14773.4). Total num frames: 402583552. Throughput: 0: 3664.1. Samples: 89809790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:33,968][134211] Avg episode reward: [(0, '7.803')] [2025-01-04 04:31:35,328][134294] Updated weights for policy 0, policy_version 98294 (0.0013) [2025-01-04 04:31:37,402][134294] Updated weights for policy 0, policy_version 98304 (0.0013) [2025-01-04 04:31:38,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14745.6, 300 sec: 14898.4). Total num frames: 402681856. Throughput: 0: 3731.1. Samples: 89835308. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:38,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 04:31:39,461][134294] Updated weights for policy 0, policy_version 98314 (0.0014) [2025-01-04 04:31:42,361][134294] Updated weights for policy 0, policy_version 98324 (0.0025) [2025-01-04 04:31:43,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14882.1, 300 sec: 14912.2). Total num frames: 402751488. Throughput: 0: 3630.5. Samples: 89859380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:43,969][134211] Avg episode reward: [(0, '7.542')] [2025-01-04 04:31:45,546][134294] Updated weights for policy 0, policy_version 98334 (0.0028) [2025-01-04 04:31:48,689][134294] Updated weights for policy 0, policy_version 98344 (0.0026) [2025-01-04 04:31:48,971][134211] Fps is (10 sec: 13512.3, 60 sec: 14949.6, 300 sec: 14912.1). Total num frames: 402817024. Throughput: 0: 3662.6. Samples: 89869262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:48,972][134211] Avg episode reward: [(0, '7.736')] [2025-01-04 04:31:52,135][134294] Updated weights for policy 0, policy_version 98354 (0.0026) [2025-01-04 04:31:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.5, 300 sec: 14884.4). Total num frames: 402878464. Throughput: 0: 3682.9. Samples: 89887640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:53,968][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 04:31:55,384][134294] Updated weights for policy 0, policy_version 98364 (0.0026) [2025-01-04 04:31:57,627][134294] Updated weights for policy 0, policy_version 98374 (0.0017) [2025-01-04 04:31:58,967][134211] Fps is (10 sec: 14750.7, 60 sec: 14404.3, 300 sec: 14953.9). Total num frames: 402964480. Throughput: 0: 3562.1. Samples: 89911120. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:31:58,968][134211] Avg episode reward: [(0, '7.609')] [2025-01-04 04:31:59,568][134294] Updated weights for policy 0, policy_version 98384 (0.0013) [2025-01-04 04:32:01,474][134294] Updated weights for policy 0, policy_version 98394 (0.0012) [2025-01-04 04:32:03,422][134294] Updated weights for policy 0, policy_version 98404 (0.0015) [2025-01-04 04:32:03,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15291.9, 300 sec: 15051.1). Total num frames: 403070976. Throughput: 0: 3581.2. Samples: 89927268. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:32:03,968][134211] Avg episode reward: [(0, '8.444')] [2025-01-04 04:32:05,878][134294] Updated weights for policy 0, policy_version 98414 (0.0022) [2025-01-04 04:32:08,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15360.1, 300 sec: 14912.2). Total num frames: 403140608. Throughput: 0: 3758.5. Samples: 89953224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:32:08,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 04:32:09,285][134294] Updated weights for policy 0, policy_version 98424 (0.0026) [2025-01-04 04:32:12,715][134294] Updated weights for policy 0, policy_version 98434 (0.0028) [2025-01-04 04:32:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14540.8, 300 sec: 14842.8). Total num frames: 403197952. Throughput: 0: 3762.9. Samples: 89970894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:32:13,968][134211] Avg episode reward: [(0, '8.918')] [2025-01-04 04:32:15,828][134294] Updated weights for policy 0, policy_version 98444 (0.0028) [2025-01-04 04:32:18,907][134294] Updated weights for policy 0, policy_version 98454 (0.0026) [2025-01-04 04:32:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.4, 300 sec: 14856.7). Total num frames: 403267584. Throughput: 0: 3801.2. Samples: 89980846. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:18,968][134211] Avg episode reward: [(0, '7.989')] [2025-01-04 04:32:21,924][134294] Updated weights for policy 0, policy_version 98464 (0.0026) [2025-01-04 04:32:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14870.6). Total num frames: 403333120. Throughput: 0: 3683.4. Samples: 90001060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:23,968][134211] Avg episode reward: [(0, '7.203')] [2025-01-04 04:32:25,058][134294] Updated weights for policy 0, policy_version 98474 (0.0024) [2025-01-04 04:32:28,162][134294] Updated weights for policy 0, policy_version 98484 (0.0025) [2025-01-04 04:32:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 403398656. Throughput: 0: 3587.8. Samples: 90020830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:28,968][134211] Avg episode reward: [(0, '7.709')] [2025-01-04 04:32:31,209][134294] Updated weights for policy 0, policy_version 98494 (0.0026) [2025-01-04 04:32:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 403468288. Throughput: 0: 3595.7. Samples: 90031056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:33,968][134211] Avg episode reward: [(0, '7.856')] [2025-01-04 04:32:34,230][134294] Updated weights for policy 0, policy_version 98504 (0.0023) [2025-01-04 04:32:36,168][134294] Updated weights for policy 0, policy_version 98514 (0.0013) [2025-01-04 04:32:37,992][134294] Updated weights for policy 0, policy_version 98524 (0.0013) [2025-01-04 04:32:38,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14813.7, 300 sec: 14731.7). Total num frames: 403570688. Throughput: 0: 3770.9. Samples: 90057332. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:38,968][134211] Avg episode reward: [(0, '8.223')] [2025-01-04 04:32:39,937][134294] Updated weights for policy 0, policy_version 98534 (0.0012) [2025-01-04 04:32:41,869][134294] Updated weights for policy 0, policy_version 98544 (0.0012) [2025-01-04 04:32:43,742][134294] Updated weights for policy 0, policy_version 98554 (0.0014) [2025-01-04 04:32:43,968][134211] Fps is (10 sec: 20890.0, 60 sec: 15428.3, 300 sec: 14856.7). Total num frames: 403677184. Throughput: 0: 3960.8. Samples: 90089356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:43,968][134211] Avg episode reward: [(0, '8.071')] [2025-01-04 04:32:45,646][134294] Updated weights for policy 0, policy_version 98564 (0.0015) [2025-01-04 04:32:48,073][134294] Updated weights for policy 0, policy_version 98574 (0.0019) [2025-01-04 04:32:48,968][134211] Fps is (10 sec: 19661.4, 60 sec: 15838.7, 300 sec: 14926.1). Total num frames: 403767296. Throughput: 0: 3964.7. Samples: 90105678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:48,968][134211] Avg episode reward: [(0, '7.880')] [2025-01-04 04:32:51,335][134294] Updated weights for policy 0, policy_version 98584 (0.0033) [2025-01-04 04:32:53,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15906.1, 300 sec: 14940.0). Total num frames: 403832832. Throughput: 0: 3836.1. Samples: 90125850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:53,969][134211] Avg episode reward: [(0, '8.141')] [2025-01-04 04:32:54,502][134294] Updated weights for policy 0, policy_version 98594 (0.0027) [2025-01-04 04:32:57,649][134294] Updated weights for policy 0, policy_version 98604 (0.0026) [2025-01-04 04:32:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15564.7, 300 sec: 14940.0). Total num frames: 403898368. Throughput: 0: 3874.7. Samples: 90145254. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:32:58,969][134211] Avg episode reward: [(0, '7.821')] [2025-01-04 04:33:00,756][134294] Updated weights for policy 0, policy_version 98614 (0.0025) [2025-01-04 04:33:03,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14813.7, 300 sec: 14926.2). Total num frames: 403959808. Throughput: 0: 3874.1. Samples: 90155184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:03,969][134211] Avg episode reward: [(0, '8.173')] [2025-01-04 04:33:04,030][134294] Updated weights for policy 0, policy_version 98624 (0.0029) [2025-01-04 04:33:06,964][134294] Updated weights for policy 0, policy_version 98634 (0.0028) [2025-01-04 04:33:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.9, 300 sec: 14926.1). Total num frames: 404029440. Throughput: 0: 3870.0. Samples: 90175208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:08,968][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 04:33:09,962][134294] Updated weights for policy 0, policy_version 98644 (0.0025) [2025-01-04 04:33:12,934][134294] Updated weights for policy 0, policy_version 98654 (0.0024) [2025-01-04 04:33:13,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 404099072. Throughput: 0: 3885.9. Samples: 90195696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:13,968][134211] Avg episode reward: [(0, '8.419')] [2025-01-04 04:33:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000098657_404099072.pth... [2025-01-04 04:33:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000097803_400601088.pth [2025-01-04 04:33:15,998][134294] Updated weights for policy 0, policy_version 98664 (0.0028) [2025-01-04 04:33:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 404164608. 
Throughput: 0: 3881.2. Samples: 90205708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:18,968][134211] Avg episode reward: [(0, '8.642')] [2025-01-04 04:33:18,975][134294] Updated weights for policy 0, policy_version 98674 (0.0024) [2025-01-04 04:33:21,658][134294] Updated weights for policy 0, policy_version 98684 (0.0022) [2025-01-04 04:33:23,609][134294] Updated weights for policy 0, policy_version 98694 (0.0012) [2025-01-04 04:33:23,967][134211] Fps is (10 sec: 15565.4, 60 sec: 15360.1, 300 sec: 14815.0). Total num frames: 404254720. Throughput: 0: 3804.9. Samples: 90228552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:23,968][134211] Avg episode reward: [(0, '7.490')] [2025-01-04 04:33:25,587][134294] Updated weights for policy 0, policy_version 98704 (0.0014) [2025-01-04 04:33:27,510][134294] Updated weights for policy 0, policy_version 98714 (0.0014) [2025-01-04 04:33:28,968][134211] Fps is (10 sec: 19250.6, 60 sec: 15974.3, 300 sec: 14940.0). Total num frames: 404357120. Throughput: 0: 3785.3. Samples: 90259698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:28,969][134211] Avg episode reward: [(0, '7.892')] [2025-01-04 04:33:30,126][134294] Updated weights for policy 0, policy_version 98724 (0.0021) [2025-01-04 04:33:33,865][134294] Updated weights for policy 0, policy_version 98734 (0.0032) [2025-01-04 04:33:33,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15769.6, 300 sec: 14912.2). Total num frames: 404414464. Throughput: 0: 3640.3. Samples: 90269494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:33,969][134211] Avg episode reward: [(0, '7.502')] [2025-01-04 04:33:37,603][134294] Updated weights for policy 0, policy_version 98744 (0.0025) [2025-01-04 04:33:38,968][134211] Fps is (10 sec: 11059.4, 60 sec: 14950.5, 300 sec: 14870.6). Total num frames: 404467712. Throughput: 0: 3555.7. Samples: 90285856. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:38,968][134211] Avg episode reward: [(0, '8.156')] [2025-01-04 04:33:40,458][134294] Updated weights for policy 0, policy_version 98754 (0.0019) [2025-01-04 04:33:42,458][134294] Updated weights for policy 0, policy_version 98764 (0.0013) [2025-01-04 04:33:43,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14882.1, 300 sec: 14898.3). Total num frames: 404570112. Throughput: 0: 3693.6. Samples: 90311468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:43,968][134211] Avg episode reward: [(0, '7.611')] [2025-01-04 04:33:44,414][134294] Updated weights for policy 0, policy_version 98774 (0.0013) [2025-01-04 04:33:46,357][134294] Updated weights for policy 0, policy_version 98784 (0.0012) [2025-01-04 04:33:48,622][134294] Updated weights for policy 0, policy_version 98794 (0.0018) [2025-01-04 04:33:48,968][134211] Fps is (10 sec: 19660.9, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 404664320. Throughput: 0: 3819.5. Samples: 90327058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:48,968][134211] Avg episode reward: [(0, '8.181')] [2025-01-04 04:33:52,457][134294] Updated weights for policy 0, policy_version 98804 (0.0025) [2025-01-04 04:33:53,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14745.6, 300 sec: 14912.2). Total num frames: 404717568. Throughput: 0: 3835.3. Samples: 90347796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:53,969][134211] Avg episode reward: [(0, '7.612')] [2025-01-04 04:33:55,631][134294] Updated weights for policy 0, policy_version 98814 (0.0030) [2025-01-04 04:33:58,968][134211] Fps is (10 sec: 11468.4, 60 sec: 14677.3, 300 sec: 14898.3). Total num frames: 404779008. Throughput: 0: 3789.4. Samples: 90366218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:33:58,969][134211] Avg episode reward: [(0, '8.420')] [2025-01-04 04:33:59,122][134294] Updated weights for policy 0, policy_version 98824 (0.0028) [2025-01-04 04:34:02,407][134294] Updated weights for policy 0, policy_version 98834 (0.0027) [2025-01-04 04:34:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14745.7, 300 sec: 14759.5). Total num frames: 404844544. Throughput: 0: 3768.1. Samples: 90375274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:34:03,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 04:34:05,519][134294] Updated weights for policy 0, policy_version 98844 (0.0025) [2025-01-04 04:34:08,352][134294] Updated weights for policy 0, policy_version 98854 (0.0025) [2025-01-04 04:34:08,970][134211] Fps is (10 sec: 13104.8, 60 sec: 14676.8, 300 sec: 14620.5). Total num frames: 404910080. Throughput: 0: 3710.7. Samples: 90395542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:34:08,971][134211] Avg episode reward: [(0, '7.529')] [2025-01-04 04:34:11,609][134294] Updated weights for policy 0, policy_version 98864 (0.0026) [2025-01-04 04:34:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 14620.6). Total num frames: 404975616. Throughput: 0: 3444.8. Samples: 90414714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:34:13,968][134211] Avg episode reward: [(0, '7.747')] [2025-01-04 04:34:14,859][134294] Updated weights for policy 0, policy_version 98874 (0.0023) [2025-01-04 04:34:16,875][134294] Updated weights for policy 0, policy_version 98884 (0.0012) [2025-01-04 04:34:18,806][134294] Updated weights for policy 0, policy_version 98894 (0.0012) [2025-01-04 04:34:18,968][134211] Fps is (10 sec: 15978.1, 60 sec: 15087.0, 300 sec: 14704.0). Total num frames: 405069824. Throughput: 0: 3490.5. Samples: 90426566. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:18,968][134211] Avg episode reward: [(0, '8.000')] [2025-01-04 04:34:21,337][134294] Updated weights for policy 0, policy_version 98904 (0.0020) [2025-01-04 04:34:23,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14813.8, 300 sec: 14745.6). Total num frames: 405143552. Throughput: 0: 3723.4. Samples: 90453408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:23,968][134211] Avg episode reward: [(0, '7.465')] [2025-01-04 04:34:24,539][134294] Updated weights for policy 0, policy_version 98914 (0.0026) [2025-01-04 04:34:27,629][134294] Updated weights for policy 0, policy_version 98924 (0.0024) [2025-01-04 04:34:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14131.3, 300 sec: 14759.5). Total num frames: 405204992. Throughput: 0: 3589.4. Samples: 90472990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:28,968][134211] Avg episode reward: [(0, '7.968')] [2025-01-04 04:34:30,762][134294] Updated weights for policy 0, policy_version 98934 (0.0024) [2025-01-04 04:34:33,745][134294] Updated weights for policy 0, policy_version 98944 (0.0026) [2025-01-04 04:34:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14717.8). Total num frames: 405274624. Throughput: 0: 3467.3. Samples: 90483088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:33,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 04:34:35,974][134294] Updated weights for policy 0, policy_version 98954 (0.0015) [2025-01-04 04:34:37,893][134294] Updated weights for policy 0, policy_version 98964 (0.0012) [2025-01-04 04:34:38,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15155.2, 300 sec: 14745.6). Total num frames: 405377024. Throughput: 0: 3574.3. Samples: 90508640. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:38,968][134211] Avg episode reward: [(0, '8.190')] [2025-01-04 04:34:40,359][134294] Updated weights for policy 0, policy_version 98974 (0.0020) [2025-01-04 04:34:43,503][134294] Updated weights for policy 0, policy_version 98984 (0.0025) [2025-01-04 04:34:43,968][134211] Fps is (10 sec: 16793.5, 60 sec: 14540.8, 300 sec: 14745.6). Total num frames: 405442560. Throughput: 0: 3680.7. Samples: 90531850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:43,969][134211] Avg episode reward: [(0, '7.712')] [2025-01-04 04:34:46,563][134294] Updated weights for policy 0, policy_version 98994 (0.0027) [2025-01-04 04:34:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14062.9, 300 sec: 14773.4). Total num frames: 405508096. Throughput: 0: 3698.8. Samples: 90541722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:48,969][134211] Avg episode reward: [(0, '7.611')] [2025-01-04 04:34:49,816][134294] Updated weights for policy 0, policy_version 99004 (0.0025) [2025-01-04 04:34:52,116][134294] Updated weights for policy 0, policy_version 99014 (0.0014) [2025-01-04 04:34:53,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14677.4, 300 sec: 14870.6). Total num frames: 405598208. Throughput: 0: 3741.7. Samples: 90563910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:53,968][134211] Avg episode reward: [(0, '7.438')] [2025-01-04 04:34:54,039][134294] Updated weights for policy 0, policy_version 99024 (0.0014) [2025-01-04 04:34:55,927][134294] Updated weights for policy 0, policy_version 99034 (0.0016) [2025-01-04 04:34:57,860][134294] Updated weights for policy 0, policy_version 99044 (0.0014) [2025-01-04 04:34:58,967][134211] Fps is (10 sec: 19661.5, 60 sec: 15428.4, 300 sec: 15037.2). Total num frames: 405704704. Throughput: 0: 4031.7. Samples: 90596140. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:34:58,968][134211] Avg episode reward: [(0, '8.232')] [2025-01-04 04:34:59,782][134294] Updated weights for policy 0, policy_version 99054 (0.0013) [2025-01-04 04:35:01,646][134294] Updated weights for policy 0, policy_version 99064 (0.0015) [2025-01-04 04:35:03,969][134211] Fps is (10 sec: 20067.4, 60 sec: 15905.8, 300 sec: 15023.2). Total num frames: 405798912. Throughput: 0: 4125.6. Samples: 90612226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:35:03,970][134211] Avg episode reward: [(0, '7.520')] [2025-01-04 04:35:04,444][134294] Updated weights for policy 0, policy_version 99074 (0.0027) [2025-01-04 04:35:07,901][134294] Updated weights for policy 0, policy_version 99084 (0.0028) [2025-01-04 04:35:08,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15838.4, 300 sec: 14953.9). Total num frames: 405860352. Throughput: 0: 3990.4. Samples: 90632974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:35:08,968][134211] Avg episode reward: [(0, '8.394')] [2025-01-04 04:35:11,033][134294] Updated weights for policy 0, policy_version 99094 (0.0026) [2025-01-04 04:35:13,968][134211] Fps is (10 sec: 12699.1, 60 sec: 15837.8, 300 sec: 14981.7). Total num frames: 405925888. Throughput: 0: 3986.5. Samples: 90652384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:35:13,968][134211] Avg episode reward: [(0, '8.548')] [2025-01-04 04:35:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099103_405925888.pth... 
[2025-01-04 04:35:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000098225_402329600.pth [2025-01-04 04:35:14,310][134294] Updated weights for policy 0, policy_version 99104 (0.0026) [2025-01-04 04:35:17,651][134294] Updated weights for policy 0, policy_version 99114 (0.0022) [2025-01-04 04:35:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15291.7, 300 sec: 14981.6). Total num frames: 405987328. Throughput: 0: 3958.6. Samples: 90661224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:18,968][134211] Avg episode reward: [(0, '9.127')] [2025-01-04 04:35:20,762][134294] Updated weights for policy 0, policy_version 99124 (0.0023) [2025-01-04 04:35:23,674][134294] Updated weights for policy 0, policy_version 99134 (0.0023) [2025-01-04 04:35:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15223.5, 300 sec: 14912.2). Total num frames: 406056960. Throughput: 0: 3837.6. Samples: 90681334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:23,968][134211] Avg episode reward: [(0, '7.353')] [2025-01-04 04:35:26,601][134294] Updated weights for policy 0, policy_version 99144 (0.0026) [2025-01-04 04:35:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15291.8, 300 sec: 14842.8). Total num frames: 406122496. Throughput: 0: 3779.0. Samples: 90701904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:28,968][134211] Avg episode reward: [(0, '8.084')] [2025-01-04 04:35:29,686][134294] Updated weights for policy 0, policy_version 99154 (0.0026) [2025-01-04 04:35:33,002][134294] Updated weights for policy 0, policy_version 99164 (0.0025) [2025-01-04 04:35:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15155.2, 300 sec: 14870.6). Total num frames: 406183936. Throughput: 0: 3780.0. Samples: 90711824. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:33,969][134211] Avg episode reward: [(0, '7.377')] [2025-01-04 04:35:36,421][134294] Updated weights for policy 0, policy_version 99174 (0.0027) [2025-01-04 04:35:38,593][134294] Updated weights for policy 0, policy_version 99184 (0.0016) [2025-01-04 04:35:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14926.1). Total num frames: 406261760. Throughput: 0: 3703.9. Samples: 90730586. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:38,968][134211] Avg episode reward: [(0, '8.676')] [2025-01-04 04:35:40,565][134294] Updated weights for policy 0, policy_version 99194 (0.0013) [2025-01-04 04:35:42,482][134294] Updated weights for policy 0, policy_version 99204 (0.0015) [2025-01-04 04:35:43,968][134211] Fps is (10 sec: 18432.3, 60 sec: 15428.3, 300 sec: 15078.8). Total num frames: 406368256. Throughput: 0: 3687.2. Samples: 90762064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:43,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 04:35:44,541][134294] Updated weights for policy 0, policy_version 99214 (0.0016) [2025-01-04 04:35:47,585][134294] Updated weights for policy 0, policy_version 99224 (0.0025) [2025-01-04 04:35:48,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15496.6, 300 sec: 15009.4). Total num frames: 406437888. Throughput: 0: 3598.3. Samples: 90774144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:48,968][134211] Avg episode reward: [(0, '8.096')] [2025-01-04 04:35:50,617][134294] Updated weights for policy 0, policy_version 99234 (0.0029) [2025-01-04 04:35:53,749][134294] Updated weights for policy 0, policy_version 99244 (0.0025) [2025-01-04 04:35:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15086.9, 300 sec: 14926.1). Total num frames: 406503424. Throughput: 0: 3579.6. Samples: 90794054. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:53,968][134211] Avg episode reward: [(0, '7.555')] [2025-01-04 04:35:56,778][134294] Updated weights for policy 0, policy_version 99254 (0.0029) [2025-01-04 04:35:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.2, 300 sec: 14967.8). Total num frames: 406568960. Throughput: 0: 3585.0. Samples: 90813708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:35:58,968][134211] Avg episode reward: [(0, '8.411')] [2025-01-04 04:35:59,970][134294] Updated weights for policy 0, policy_version 99264 (0.0026) [2025-01-04 04:36:02,995][134294] Updated weights for policy 0, policy_version 99274 (0.0027) [2025-01-04 04:36:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13995.0, 300 sec: 14981.7). Total num frames: 406638592. Throughput: 0: 3604.4. Samples: 90823422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:36:03,968][134211] Avg episode reward: [(0, '7.754')] [2025-01-04 04:36:05,730][134294] Updated weights for policy 0, policy_version 99284 (0.0020) [2025-01-04 04:36:07,731][134294] Updated weights for policy 0, policy_version 99294 (0.0016) [2025-01-04 04:36:08,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14404.3, 300 sec: 14912.2). Total num frames: 406724608. Throughput: 0: 3697.4. Samples: 90847718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:36:08,968][134211] Avg episode reward: [(0, '7.564')] [2025-01-04 04:36:10,582][134294] Updated weights for policy 0, policy_version 99304 (0.0023) [2025-01-04 04:36:13,573][134294] Updated weights for policy 0, policy_version 99314 (0.0025) [2025-01-04 04:36:13,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14472.5, 300 sec: 14884.5). Total num frames: 406794240. Throughput: 0: 3722.0. Samples: 90869394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:36:13,968][134211] Avg episode reward: [(0, '8.518')] [2025-01-04 04:36:16,257][134294] Updated weights for policy 0, policy_version 99324 (0.0019) [2025-01-04 04:36:18,134][134294] Updated weights for policy 0, policy_version 99334 (0.0012) [2025-01-04 04:36:18,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15018.8, 300 sec: 15009.4). Total num frames: 406888448. Throughput: 0: 3754.3. Samples: 90880766. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:18,968][134211] Avg episode reward: [(0, '7.858')] [2025-01-04 04:36:20,010][134294] Updated weights for policy 0, policy_version 99344 (0.0013) [2025-01-04 04:36:21,906][134294] Updated weights for policy 0, policy_version 99354 (0.0012) [2025-01-04 04:36:23,771][134294] Updated weights for policy 0, policy_version 99364 (0.0012) [2025-01-04 04:36:23,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15633.1, 300 sec: 15162.2). Total num frames: 406994944. Throughput: 0: 4056.5. Samples: 90913130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:23,968][134211] Avg episode reward: [(0, '7.399')] [2025-01-04 04:36:25,864][134294] Updated weights for policy 0, policy_version 99374 (0.0016) [2025-01-04 04:36:28,960][134294] Updated weights for policy 0, policy_version 99384 (0.0029) [2025-01-04 04:36:28,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15906.1, 300 sec: 15231.6). Total num frames: 407076864. Throughput: 0: 3954.2. Samples: 90940004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:28,968][134211] Avg episode reward: [(0, '7.690')] [2025-01-04 04:36:32,206][134294] Updated weights for policy 0, policy_version 99394 (0.0027) [2025-01-04 04:36:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15906.1, 300 sec: 15106.6). Total num frames: 407138304. Throughput: 0: 3893.2. Samples: 90949336. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:33,968][134211] Avg episode reward: [(0, '7.777')] [2025-01-04 04:36:35,412][134294] Updated weights for policy 0, policy_version 99404 (0.0027) [2025-01-04 04:36:38,383][134294] Updated weights for policy 0, policy_version 99414 (0.0022) [2025-01-04 04:36:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15701.3, 300 sec: 15092.7). Total num frames: 407203840. Throughput: 0: 3889.4. Samples: 90969078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:38,968][134211] Avg episode reward: [(0, '7.697')] [2025-01-04 04:36:41,406][134294] Updated weights for policy 0, policy_version 99424 (0.0025) [2025-01-04 04:36:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15086.9, 300 sec: 15106.8). Total num frames: 407273472. Throughput: 0: 3895.6. Samples: 90989012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:43,968][134211] Avg episode reward: [(0, '7.736')] [2025-01-04 04:36:44,601][134294] Updated weights for policy 0, policy_version 99434 (0.0026) [2025-01-04 04:36:47,562][134294] Updated weights for policy 0, policy_version 99444 (0.0024) [2025-01-04 04:36:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 15120.5). Total num frames: 407339008. Throughput: 0: 3903.6. Samples: 90999082. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:48,968][134211] Avg episode reward: [(0, '8.081')] [2025-01-04 04:36:50,511][134294] Updated weights for policy 0, policy_version 99454 (0.0023) [2025-01-04 04:36:53,477][134294] Updated weights for policy 0, policy_version 99464 (0.0022) [2025-01-04 04:36:53,968][134211] Fps is (10 sec: 13515.8, 60 sec: 15086.8, 300 sec: 15064.9). Total num frames: 407408640. Throughput: 0: 3828.9. Samples: 91020020. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:53,969][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 04:36:56,429][134294] Updated weights for policy 0, policy_version 99474 (0.0025) [2025-01-04 04:36:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 14940.0). Total num frames: 407478272. Throughput: 0: 3800.6. Samples: 91040422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:36:58,968][134211] Avg episode reward: [(0, '8.301')] [2025-01-04 04:36:59,383][134294] Updated weights for policy 0, policy_version 99484 (0.0023) [2025-01-04 04:37:01,294][134294] Updated weights for policy 0, policy_version 99494 (0.0013) [2025-01-04 04:37:03,238][134294] Updated weights for policy 0, policy_version 99504 (0.0014) [2025-01-04 04:37:03,968][134211] Fps is (10 sec: 17204.4, 60 sec: 15701.3, 300 sec: 15051.1). Total num frames: 407580672. Throughput: 0: 3858.0. Samples: 91054378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:37:03,968][134211] Avg episode reward: [(0, '8.076')] [2025-01-04 04:37:05,951][134294] Updated weights for policy 0, policy_version 99514 (0.0022) [2025-01-04 04:37:08,968][134211] Fps is (10 sec: 16793.5, 60 sec: 15360.0, 300 sec: 15078.8). Total num frames: 407646208. Throughput: 0: 3695.1. Samples: 91079410. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:37:08,968][134211] Avg episode reward: [(0, '7.982')] [2025-01-04 04:37:09,124][134294] Updated weights for policy 0, policy_version 99524 (0.0023) [2025-01-04 04:37:12,149][134294] Updated weights for policy 0, policy_version 99534 (0.0024) [2025-01-04 04:37:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.8, 300 sec: 15064.9). Total num frames: 407711744. Throughput: 0: 3537.4. Samples: 91099186. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 04:37:13,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 04:37:14,032][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099540_407715840.pth... [2025-01-04 04:37:14,096][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000098657_404099072.pth [2025-01-04 04:37:15,040][134294] Updated weights for policy 0, policy_version 99544 (0.0020) [2025-01-04 04:37:16,919][134294] Updated weights for policy 0, policy_version 99554 (0.0012) [2025-01-04 04:37:18,827][134294] Updated weights for policy 0, policy_version 99564 (0.0012) [2025-01-04 04:37:18,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15428.2, 300 sec: 15189.9). Total num frames: 407814144. Throughput: 0: 3625.4. Samples: 91112480. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:18,968][134211] Avg episode reward: [(0, '8.213')] [2025-01-04 04:37:20,714][134294] Updated weights for policy 0, policy_version 99574 (0.0016) [2025-01-04 04:37:23,358][134294] Updated weights for policy 0, policy_version 99584 (0.0023) [2025-01-04 04:37:23,968][134211] Fps is (10 sec: 18841.6, 60 sec: 15086.9, 300 sec: 15259.3). Total num frames: 407900160. Throughput: 0: 3871.7. Samples: 91143306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:23,968][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 04:37:26,453][134294] Updated weights for policy 0, policy_version 99594 (0.0025) [2025-01-04 04:37:28,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14813.9, 300 sec: 15245.4). Total num frames: 407965696. Throughput: 0: 3858.3. Samples: 91162636. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:28,968][134211] Avg episode reward: [(0, '8.027')] [2025-01-04 04:37:29,784][134294] Updated weights for policy 0, policy_version 99604 (0.0026) [2025-01-04 04:37:32,945][134294] Updated weights for policy 0, policy_version 99614 (0.0029) [2025-01-04 04:37:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14813.9, 300 sec: 15106.6). Total num frames: 408027136. Throughput: 0: 3847.5. Samples: 91172220. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:33,969][134211] Avg episode reward: [(0, '7.769')] [2025-01-04 04:37:36,579][134294] Updated weights for policy 0, policy_version 99624 (0.0030) [2025-01-04 04:37:38,867][134294] Updated weights for policy 0, policy_version 99634 (0.0015) [2025-01-04 04:37:38,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14950.4, 300 sec: 14995.5). Total num frames: 408100864. Throughput: 0: 3782.7. Samples: 91190240. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:38,968][134211] Avg episode reward: [(0, '7.691')] [2025-01-04 04:37:40,825][134294] Updated weights for policy 0, policy_version 99644 (0.0012) [2025-01-04 04:37:42,743][134294] Updated weights for policy 0, policy_version 99654 (0.0013) [2025-01-04 04:37:43,967][134211] Fps is (10 sec: 18023.0, 60 sec: 15564.8, 300 sec: 15051.1). Total num frames: 408207360. Throughput: 0: 4020.5. Samples: 91221346. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:43,968][134211] Avg episode reward: [(0, '8.156')] [2025-01-04 04:37:44,633][134294] Updated weights for policy 0, policy_version 99664 (0.0012) [2025-01-04 04:37:46,656][134294] Updated weights for policy 0, policy_version 99674 (0.0016) [2025-01-04 04:37:48,968][134211] Fps is (10 sec: 19250.8, 60 sec: 15906.1, 300 sec: 15120.5). Total num frames: 408293376. Throughput: 0: 4067.8. Samples: 91237428. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:48,968][134211] Avg episode reward: [(0, '8.161')] [2025-01-04 04:37:49,675][134294] Updated weights for policy 0, policy_version 99684 (0.0029) [2025-01-04 04:37:52,680][134294] Updated weights for policy 0, policy_version 99694 (0.0025) [2025-01-04 04:37:53,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15838.1, 300 sec: 15120.5). Total num frames: 408358912. Throughput: 0: 3967.9. Samples: 91257966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:53,968][134211] Avg episode reward: [(0, '8.148')] [2025-01-04 04:37:55,886][134294] Updated weights for policy 0, policy_version 99704 (0.0025) [2025-01-04 04:37:58,896][134294] Updated weights for policy 0, policy_version 99714 (0.0024) [2025-01-04 04:37:58,969][134211] Fps is (10 sec: 13515.7, 60 sec: 15837.6, 300 sec: 15148.2). Total num frames: 408428544. Throughput: 0: 3975.2. Samples: 91278074. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:37:58,969][134211] Avg episode reward: [(0, '7.926')] [2025-01-04 04:38:02,027][134294] Updated weights for policy 0, policy_version 99724 (0.0026) [2025-01-04 04:38:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.2, 300 sec: 15120.5). Total num frames: 408489984. Throughput: 0: 3894.7. Samples: 91287744. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:38:03,968][134211] Avg episode reward: [(0, '7.442')] [2025-01-04 04:38:05,053][134294] Updated weights for policy 0, policy_version 99734 (0.0025) [2025-01-04 04:38:08,067][134294] Updated weights for policy 0, policy_version 99744 (0.0026) [2025-01-04 04:38:08,969][134211] Fps is (10 sec: 13106.8, 60 sec: 15223.2, 300 sec: 15120.4). Total num frames: 408559616. Throughput: 0: 3664.5. Samples: 91308212. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:38:08,970][134211] Avg episode reward: [(0, '7.805')] [2025-01-04 04:38:11,076][134294] Updated weights for policy 0, policy_version 99754 (0.0025) [2025-01-04 04:38:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15291.7, 300 sec: 15134.4). Total num frames: 408629248. Throughput: 0: 3676.8. Samples: 91328090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:38:13,968][134211] Avg episode reward: [(0, '7.558')] [2025-01-04 04:38:14,328][134294] Updated weights for policy 0, policy_version 99764 (0.0027) [2025-01-04 04:38:17,081][134294] Updated weights for policy 0, policy_version 99774 (0.0024) [2025-01-04 04:38:18,967][134211] Fps is (10 sec: 15157.3, 60 sec: 14950.4, 300 sec: 15106.6). Total num frames: 408711168. Throughput: 0: 3682.8. Samples: 91337944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:18,968][134211] Avg episode reward: [(0, '7.553')] [2025-01-04 04:38:19,005][134294] Updated weights for policy 0, policy_version 99784 (0.0013) [2025-01-04 04:38:20,954][134294] Updated weights for policy 0, policy_version 99794 (0.0013) [2025-01-04 04:38:23,940][134294] Updated weights for policy 0, policy_version 99804 (0.0025) [2025-01-04 04:38:23,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14950.4, 300 sec: 15051.1). Total num frames: 408797184. Throughput: 0: 3925.7. Samples: 91366896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:23,968][134211] Avg episode reward: [(0, '7.868')] [2025-01-04 04:38:27,304][134294] Updated weights for policy 0, policy_version 99814 (0.0028) [2025-01-04 04:38:28,967][134211] Fps is (10 sec: 15564.8, 60 sec: 15018.7, 300 sec: 15092.7). Total num frames: 408866816. Throughput: 0: 3657.3. Samples: 91385924. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:28,968][134211] Avg episode reward: [(0, '7.341')] [2025-01-04 04:38:29,475][134294] Updated weights for policy 0, policy_version 99824 (0.0016) [2025-01-04 04:38:31,368][134294] Updated weights for policy 0, policy_version 99834 (0.0015) [2025-01-04 04:38:33,885][134294] Updated weights for policy 0, policy_version 99844 (0.0019) [2025-01-04 04:38:33,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15564.8, 300 sec: 15231.6). Total num frames: 408961024. Throughput: 0: 3660.7. Samples: 91402158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:33,968][134211] Avg episode reward: [(0, '8.291')] [2025-01-04 04:38:36,901][134294] Updated weights for policy 0, policy_version 99854 (0.0025) [2025-01-04 04:38:38,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15428.2, 300 sec: 15106.6). Total num frames: 409026560. Throughput: 0: 3698.9. Samples: 91424414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:38,968][134211] Avg episode reward: [(0, '7.271')] [2025-01-04 04:38:40,173][134294] Updated weights for policy 0, policy_version 99864 (0.0029) [2025-01-04 04:38:43,173][134294] Updated weights for policy 0, policy_version 99874 (0.0024) [2025-01-04 04:38:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.5, 300 sec: 15009.4). Total num frames: 409092096. Throughput: 0: 3690.1. Samples: 91444126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:43,968][134211] Avg episode reward: [(0, '7.831')] [2025-01-04 04:38:46,220][134294] Updated weights for policy 0, policy_version 99884 (0.0026) [2025-01-04 04:38:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 15065.0). Total num frames: 409161728. Throughput: 0: 3700.1. Samples: 91454250. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:48,968][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 04:38:49,003][134294] Updated weights for policy 0, policy_version 99894 (0.0021) [2025-01-04 04:38:50,934][134294] Updated weights for policy 0, policy_version 99904 (0.0013) [2025-01-04 04:38:53,114][134294] Updated weights for policy 0, policy_version 99914 (0.0018) [2025-01-04 04:38:53,968][134211] Fps is (10 sec: 16384.2, 60 sec: 14950.4, 300 sec: 15176.0). Total num frames: 409255936. Throughput: 0: 3842.7. Samples: 91481130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:53,968][134211] Avg episode reward: [(0, '7.707')] [2025-01-04 04:38:56,256][134294] Updated weights for policy 0, policy_version 99924 (0.0026) [2025-01-04 04:38:58,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14882.3, 300 sec: 15176.0). Total num frames: 409321472. Throughput: 0: 3856.5. Samples: 91501632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:38:58,968][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 04:38:59,377][134294] Updated weights for policy 0, policy_version 99934 (0.0026) [2025-01-04 04:39:02,446][134294] Updated weights for policy 0, policy_version 99944 (0.0028) [2025-01-04 04:39:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.4, 300 sec: 15176.1). Total num frames: 409387008. Throughput: 0: 3855.3. Samples: 91511432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:39:03,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 04:39:05,266][134294] Updated weights for policy 0, policy_version 99954 (0.0020) [2025-01-04 04:39:07,208][134294] Updated weights for policy 0, policy_version 99964 (0.0012) [2025-01-04 04:39:08,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15496.9, 300 sec: 15301.0). Total num frames: 409489408. Throughput: 0: 3767.1. Samples: 91536414. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:39:08,968][134211] Avg episode reward: [(0, '8.687')] [2025-01-04 04:39:09,159][134294] Updated weights for policy 0, policy_version 99974 (0.0016) [2025-01-04 04:39:11,977][134294] Updated weights for policy 0, policy_version 99984 (0.0023) [2025-01-04 04:39:13,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15496.5, 300 sec: 15217.7). Total num frames: 409559040. Throughput: 0: 3882.3. Samples: 91560628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:39:13,969][134211] Avg episode reward: [(0, '8.159')] [2025-01-04 04:39:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099990_409559040.pth... [2025-01-04 04:39:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099103_405925888.pth [2025-01-04 04:39:15,217][134294] Updated weights for policy 0, policy_version 99994 (0.0028) [2025-01-04 04:39:18,158][134294] Updated weights for policy 0, policy_version 100004 (0.0027) [2025-01-04 04:39:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15223.4, 300 sec: 15189.9). Total num frames: 409624576. Throughput: 0: 3736.9. Samples: 91570318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:18,968][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 04:39:21,170][134294] Updated weights for policy 0, policy_version 100014 (0.0024) [2025-01-04 04:39:23,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14882.1, 300 sec: 15203.8). Total num frames: 409690112. Throughput: 0: 3693.5. Samples: 91590622. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:23,968][134211] Avg episode reward: [(0, '8.486')] [2025-01-04 04:39:24,371][134294] Updated weights for policy 0, policy_version 100024 (0.0025) [2025-01-04 04:39:26,511][134294] Updated weights for policy 0, policy_version 100034 (0.0018) [2025-01-04 04:39:28,443][134294] Updated weights for policy 0, policy_version 100044 (0.0012) [2025-01-04 04:39:28,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15360.0, 300 sec: 15301.0). Total num frames: 409788416. Throughput: 0: 3838.6. Samples: 91616862. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:28,968][134211] Avg episode reward: [(0, '8.403')] [2025-01-04 04:39:30,371][134294] Updated weights for policy 0, policy_version 100054 (0.0013) [2025-01-04 04:39:32,301][134294] Updated weights for policy 0, policy_version 100064 (0.0012) [2025-01-04 04:39:33,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15564.9, 300 sec: 15314.9). Total num frames: 409894912. Throughput: 0: 3968.1. Samples: 91632812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:33,968][134211] Avg episode reward: [(0, '9.117')] [2025-01-04 04:39:34,257][134294] Updated weights for policy 0, policy_version 100074 (0.0014) [2025-01-04 04:39:37,328][134294] Updated weights for policy 0, policy_version 100084 (0.0028) [2025-01-04 04:39:38,968][134211] Fps is (10 sec: 17201.9, 60 sec: 15564.6, 300 sec: 15314.9). Total num frames: 409960448. Throughput: 0: 3947.6. Samples: 91658776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:38,969][134211] Avg episode reward: [(0, '7.836')] [2025-01-04 04:39:41,033][134294] Updated weights for policy 0, policy_version 100094 (0.0034) [2025-01-04 04:39:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15428.3, 300 sec: 15287.1). Total num frames: 410017792. Throughput: 0: 3864.1. Samples: 91675518. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:43,968][134211] Avg episode reward: [(0, '7.322')] [2025-01-04 04:39:44,595][134294] Updated weights for policy 0, policy_version 100104 (0.0028) [2025-01-04 04:39:47,752][134294] Updated weights for policy 0, policy_version 100114 (0.0025) [2025-01-04 04:39:48,968][134211] Fps is (10 sec: 12288.7, 60 sec: 15360.0, 300 sec: 15203.8). Total num frames: 410083328. Throughput: 0: 3854.1. Samples: 91684866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:48,968][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 04:39:50,712][134294] Updated weights for policy 0, policy_version 100124 (0.0028) [2025-01-04 04:39:53,739][134294] Updated weights for policy 0, policy_version 100134 (0.0022) [2025-01-04 04:39:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.4, 300 sec: 15078.8). Total num frames: 410152960. Throughput: 0: 3753.4. Samples: 91705318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:53,968][134211] Avg episode reward: [(0, '7.343')] [2025-01-04 04:39:56,631][134294] Updated weights for policy 0, policy_version 100144 (0.0023) [2025-01-04 04:39:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14950.4, 300 sec: 14981.7). Total num frames: 410218496. Throughput: 0: 3667.9. Samples: 91725680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:39:58,968][134211] Avg episode reward: [(0, '7.780')] [2025-01-04 04:39:59,719][134294] Updated weights for policy 0, policy_version 100154 (0.0025) [2025-01-04 04:40:02,797][134294] Updated weights for policy 0, policy_version 100164 (0.0027) [2025-01-04 04:40:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14995.5). Total num frames: 410284032. Throughput: 0: 3676.1. Samples: 91735744. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:03,968][134211] Avg episode reward: [(0, '7.664')] [2025-01-04 04:40:05,711][134294] Updated weights for policy 0, policy_version 100174 (0.0024) [2025-01-04 04:40:07,701][134294] Updated weights for policy 0, policy_version 100184 (0.0013) [2025-01-04 04:40:08,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14813.9, 300 sec: 15092.7). Total num frames: 410378240. Throughput: 0: 3740.7. Samples: 91758954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:08,968][134211] Avg episode reward: [(0, '7.590')] [2025-01-04 04:40:09,570][134294] Updated weights for policy 0, policy_version 100194 (0.0014) [2025-01-04 04:40:11,433][134294] Updated weights for policy 0, policy_version 100204 (0.0011) [2025-01-04 04:40:13,363][134294] Updated weights for policy 0, policy_version 100214 (0.0015) [2025-01-04 04:40:13,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15496.6, 300 sec: 15259.4). Total num frames: 410488832. Throughput: 0: 3881.9. Samples: 91791548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:13,968][134211] Avg episode reward: [(0, '7.805')] [2025-01-04 04:40:15,187][134294] Updated weights for policy 0, policy_version 100224 (0.0014) [2025-01-04 04:40:17,951][134294] Updated weights for policy 0, policy_version 100234 (0.0026) [2025-01-04 04:40:18,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15701.3, 300 sec: 15287.1). Total num frames: 410566656. Throughput: 0: 3866.6. Samples: 91806810. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:18,970][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 04:40:21,318][134294] Updated weights for policy 0, policy_version 100244 (0.0027) [2025-01-04 04:40:23,969][134211] Fps is (10 sec: 14334.0, 60 sec: 15701.0, 300 sec: 15287.0). Total num frames: 410632192. Throughput: 0: 3710.6. Samples: 91825754. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:23,970][134211] Avg episode reward: [(0, '7.367')] [2025-01-04 04:40:24,614][134294] Updated weights for policy 0, policy_version 100254 (0.0028) [2025-01-04 04:40:27,691][134294] Updated weights for policy 0, policy_version 100264 (0.0029) [2025-01-04 04:40:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.1, 300 sec: 15301.0). Total num frames: 410697728. Throughput: 0: 3769.2. Samples: 91845134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:28,968][134211] Avg episode reward: [(0, '8.725')] [2025-01-04 04:40:30,695][134294] Updated weights for policy 0, policy_version 100274 (0.0026) [2025-01-04 04:40:33,688][134294] Updated weights for policy 0, policy_version 100284 (0.0024) [2025-01-04 04:40:33,968][134211] Fps is (10 sec: 13108.9, 60 sec: 14472.5, 300 sec: 15259.3). Total num frames: 410763264. Throughput: 0: 3793.4. Samples: 91855568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:33,968][134211] Avg episode reward: [(0, '8.241')] [2025-01-04 04:40:36,814][134294] Updated weights for policy 0, policy_version 100294 (0.0024) [2025-01-04 04:40:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.9, 300 sec: 15134.4). Total num frames: 410832896. Throughput: 0: 3786.8. Samples: 91875724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:38,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 04:40:39,742][134294] Updated weights for policy 0, policy_version 100304 (0.0022) [2025-01-04 04:40:41,778][134294] Updated weights for policy 0, policy_version 100314 (0.0014) [2025-01-04 04:40:43,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14950.4, 300 sec: 15176.0). Total num frames: 410914816. Throughput: 0: 3876.6. Samples: 91900128. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:43,968][134211] Avg episode reward: [(0, '7.875')] [2025-01-04 04:40:44,530][134294] Updated weights for policy 0, policy_version 100324 (0.0023) [2025-01-04 04:40:47,603][134294] Updated weights for policy 0, policy_version 100334 (0.0023) [2025-01-04 04:40:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15018.7, 300 sec: 15189.9). Total num frames: 410984448. Throughput: 0: 3880.9. Samples: 91910384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:48,968][134211] Avg episode reward: [(0, '7.822')] [2025-01-04 04:40:50,338][134294] Updated weights for policy 0, policy_version 100344 (0.0020) [2025-01-04 04:40:52,211][134294] Updated weights for policy 0, policy_version 100354 (0.0015) [2025-01-04 04:40:53,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15564.8, 300 sec: 15314.9). Total num frames: 411086848. Throughput: 0: 3923.6. Samples: 91935518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:53,968][134211] Avg episode reward: [(0, '8.714')] [2025-01-04 04:40:54,200][134294] Updated weights for policy 0, policy_version 100364 (0.0014) [2025-01-04 04:40:56,998][134294] Updated weights for policy 0, policy_version 100374 (0.0025) [2025-01-04 04:40:58,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15633.0, 300 sec: 15314.9). Total num frames: 411156480. Throughput: 0: 3742.2. Samples: 91959948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:40:58,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 04:41:00,072][134294] Updated weights for policy 0, policy_version 100384 (0.0026) [2025-01-04 04:41:03,051][134294] Updated weights for policy 0, policy_version 100394 (0.0026) [2025-01-04 04:41:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15633.1, 300 sec: 15245.5). Total num frames: 411222016. Throughput: 0: 3635.0. Samples: 91970386. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:41:03,968][134211] Avg episode reward: [(0, '8.314')] [2025-01-04 04:41:06,182][134294] Updated weights for policy 0, policy_version 100404 (0.0025) [2025-01-04 04:41:08,286][134294] Updated weights for policy 0, policy_version 100414 (0.0014) [2025-01-04 04:41:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15428.1, 300 sec: 15287.1). Total num frames: 411303936. Throughput: 0: 3691.7. Samples: 91991876. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:41:08,969][134211] Avg episode reward: [(0, '8.051')] [2025-01-04 04:41:11,282][134294] Updated weights for policy 0, policy_version 100424 (0.0024) [2025-01-04 04:41:13,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14677.3, 300 sec: 15189.9). Total num frames: 411369472. Throughput: 0: 3741.2. Samples: 92013486. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:41:13,968][134211] Avg episode reward: [(0, '8.098')] [2025-01-04 04:41:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000100432_411369472.pth... [2025-01-04 04:41:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099540_407715840.pth [2025-01-04 04:41:14,626][134294] Updated weights for policy 0, policy_version 100434 (0.0025) [2025-01-04 04:41:17,400][134294] Updated weights for policy 0, policy_version 100444 (0.0018) [2025-01-04 04:41:18,968][134211] Fps is (10 sec: 14336.4, 60 sec: 14677.4, 300 sec: 15092.7). Total num frames: 411447296. Throughput: 0: 3697.3. Samples: 92021948. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:18,968][134211] Avg episode reward: [(0, '8.157')] [2025-01-04 04:41:19,439][134294] Updated weights for policy 0, policy_version 100454 (0.0014) [2025-01-04 04:41:21,339][134294] Updated weights for policy 0, policy_version 100464 (0.0014) [2025-01-04 04:41:23,257][134294] Updated weights for policy 0, policy_version 100474 (0.0014) [2025-01-04 04:41:23,968][134211] Fps is (10 sec: 18432.2, 60 sec: 15360.3, 300 sec: 15176.0). Total num frames: 411553792. Throughput: 0: 3935.4. Samples: 92052818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:23,968][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 04:41:25,661][134294] Updated weights for policy 0, policy_version 100484 (0.0020) [2025-01-04 04:41:28,968][134211] Fps is (10 sec: 17203.1, 60 sec: 15360.0, 300 sec: 15189.9). Total num frames: 411619328. Throughput: 0: 3918.5. Samples: 92076460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:28,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 04:41:29,094][134294] Updated weights for policy 0, policy_version 100494 (0.0030) [2025-01-04 04:41:32,472][134294] Updated weights for policy 0, policy_version 100504 (0.0029) [2025-01-04 04:41:33,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15291.7, 300 sec: 15176.0). Total num frames: 411680768. Throughput: 0: 3885.4. Samples: 92085226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:33,969][134211] Avg episode reward: [(0, '8.503')] [2025-01-04 04:41:35,892][134294] Updated weights for policy 0, policy_version 100514 (0.0024) [2025-01-04 04:41:38,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15087.0, 300 sec: 15134.4). Total num frames: 411738112. Throughput: 0: 3729.2. Samples: 92103330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:38,968][134211] Avg episode reward: [(0, '6.576')] [2025-01-04 04:41:39,520][134294] Updated weights for policy 0, policy_version 100524 (0.0027) [2025-01-04 04:41:42,096][134294] Updated weights for policy 0, policy_version 100534 (0.0015) [2025-01-04 04:41:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15155.2, 300 sec: 15203.8). Total num frames: 411824128. Throughput: 0: 3679.6. Samples: 92125530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:43,968][134211] Avg episode reward: [(0, '7.878')] [2025-01-04 04:41:44,093][134294] Updated weights for policy 0, policy_version 100544 (0.0013) [2025-01-04 04:41:45,999][134294] Updated weights for policy 0, policy_version 100554 (0.0013) [2025-01-04 04:41:47,901][134294] Updated weights for policy 0, policy_version 100564 (0.0012) [2025-01-04 04:41:48,967][134211] Fps is (10 sec: 19251.5, 60 sec: 15769.6, 300 sec: 15328.8). Total num frames: 411930624. Throughput: 0: 3805.7. Samples: 92141642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:48,968][134211] Avg episode reward: [(0, '8.245')] [2025-01-04 04:41:49,743][134294] Updated weights for policy 0, policy_version 100574 (0.0014) [2025-01-04 04:41:51,755][134294] Updated weights for policy 0, policy_version 100584 (0.0013) [2025-01-04 04:41:53,958][134294] Updated weights for policy 0, policy_version 100594 (0.0015) [2025-01-04 04:41:53,968][134211] Fps is (10 sec: 20889.2, 60 sec: 15769.6, 300 sec: 15439.8). Total num frames: 412033024. Throughput: 0: 4035.9. Samples: 92173490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:53,968][134211] Avg episode reward: [(0, '7.610')] [2025-01-04 04:41:57,204][134294] Updated weights for policy 0, policy_version 100604 (0.0029) [2025-01-04 04:41:58,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15633.1, 300 sec: 15301.0). Total num frames: 412094464. Throughput: 0: 4027.2. Samples: 92194708. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:41:58,968][134211] Avg episode reward: [(0, '7.589')] [2025-01-04 04:42:00,445][134294] Updated weights for policy 0, policy_version 100614 (0.0025) [2025-01-04 04:42:03,560][134294] Updated weights for policy 0, policy_version 100624 (0.0025) [2025-01-04 04:42:03,969][134211] Fps is (10 sec: 12696.2, 60 sec: 15632.8, 300 sec: 15300.9). Total num frames: 412160000. Throughput: 0: 4054.8. Samples: 92204420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:42:03,969][134211] Avg episode reward: [(0, '7.996')] [2025-01-04 04:42:06,730][134294] Updated weights for policy 0, policy_version 100634 (0.0023) [2025-01-04 04:42:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15291.8, 300 sec: 15287.1). Total num frames: 412221440. Throughput: 0: 3801.6. Samples: 92223892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:42:08,968][134211] Avg episode reward: [(0, '7.544')] [2025-01-04 04:42:10,049][134294] Updated weights for policy 0, policy_version 100644 (0.0029) [2025-01-04 04:42:13,318][134294] Updated weights for policy 0, policy_version 100654 (0.0027) [2025-01-04 04:42:13,968][134211] Fps is (10 sec: 12289.2, 60 sec: 15223.4, 300 sec: 15148.2). Total num frames: 412282880. Throughput: 0: 3684.6. Samples: 92242268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:42:13,968][134211] Avg episode reward: [(0, '8.239')] [2025-01-04 04:42:16,554][134294] Updated weights for policy 0, policy_version 100664 (0.0026) [2025-01-04 04:42:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15018.6, 300 sec: 15078.8). Total num frames: 412348416. Throughput: 0: 3705.9. Samples: 92251990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:18,968][134211] Avg episode reward: [(0, '7.701')] [2025-01-04 04:42:19,712][134294] Updated weights for policy 0, policy_version 100674 (0.0025) [2025-01-04 04:42:22,941][134294] Updated weights for policy 0, policy_version 100684 (0.0024) [2025-01-04 04:42:23,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14335.8, 300 sec: 15078.8). Total num frames: 412413952. Throughput: 0: 3731.5. Samples: 92271252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:23,969][134211] Avg episode reward: [(0, '7.759')] [2025-01-04 04:42:25,897][134294] Updated weights for policy 0, policy_version 100694 (0.0023) [2025-01-04 04:42:28,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14335.9, 300 sec: 15092.7). Total num frames: 412479488. Throughput: 0: 3680.1. Samples: 92291134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:28,969][134211] Avg episode reward: [(0, '7.743')] [2025-01-04 04:42:29,174][134294] Updated weights for policy 0, policy_version 100704 (0.0026) [2025-01-04 04:42:31,726][134294] Updated weights for policy 0, policy_version 100714 (0.0020) [2025-01-04 04:42:33,637][134294] Updated weights for policy 0, policy_version 100724 (0.0014) [2025-01-04 04:42:33,967][134211] Fps is (10 sec: 15566.1, 60 sec: 14813.9, 300 sec: 15148.3). Total num frames: 412569600. Throughput: 0: 3548.9. Samples: 92301344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:33,968][134211] Avg episode reward: [(0, '7.917')] [2025-01-04 04:42:35,540][134294] Updated weights for policy 0, policy_version 100734 (0.0013) [2025-01-04 04:42:37,375][134294] Updated weights for policy 0, policy_version 100744 (0.0013) [2025-01-04 04:42:38,968][134211] Fps is (10 sec: 20070.5, 60 sec: 15701.2, 300 sec: 15162.1). Total num frames: 412680192. Throughput: 0: 3563.9. Samples: 92333868. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:38,968][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 04:42:39,357][134294] Updated weights for policy 0, policy_version 100754 (0.0013) [2025-01-04 04:42:41,157][134294] Updated weights for policy 0, policy_version 100764 (0.0013) [2025-01-04 04:42:43,522][134294] Updated weights for policy 0, policy_version 100774 (0.0020) [2025-01-04 04:42:43,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15837.8, 300 sec: 15189.9). Total num frames: 412774400. Throughput: 0: 3776.8. Samples: 92364662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:43,968][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 04:42:46,793][134294] Updated weights for policy 0, policy_version 100784 (0.0028) [2025-01-04 04:42:48,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15086.9, 300 sec: 15176.0). Total num frames: 412835840. Throughput: 0: 3767.5. Samples: 92373954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:48,968][134211] Avg episode reward: [(0, '8.508')] [2025-01-04 04:42:50,092][134294] Updated weights for policy 0, policy_version 100794 (0.0028) [2025-01-04 04:42:53,354][134294] Updated weights for policy 0, policy_version 100804 (0.0027) [2025-01-04 04:42:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.3, 300 sec: 15148.3). Total num frames: 412897280. Throughput: 0: 3754.5. Samples: 92392844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:53,968][134211] Avg episode reward: [(0, '8.297')] [2025-01-04 04:42:56,513][134294] Updated weights for policy 0, policy_version 100814 (0.0027) [2025-01-04 04:42:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 15176.0). Total num frames: 412966912. Throughput: 0: 3775.8. Samples: 92412176. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:42:58,968][134211] Avg episode reward: [(0, '8.554')] [2025-01-04 04:42:59,636][134294] Updated weights for policy 0, policy_version 100824 (0.0026) [2025-01-04 04:43:02,764][134294] Updated weights for policy 0, policy_version 100834 (0.0023) [2025-01-04 04:43:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14541.0, 300 sec: 15162.2). Total num frames: 413032448. Throughput: 0: 3780.1. Samples: 92422094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:43:03,969][134211] Avg episode reward: [(0, '9.033')] [2025-01-04 04:43:05,690][134294] Updated weights for policy 0, policy_version 100844 (0.0024) [2025-01-04 04:43:08,809][134294] Updated weights for policy 0, policy_version 100854 (0.0022) [2025-01-04 04:43:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.0, 300 sec: 15148.3). Total num frames: 413097984. Throughput: 0: 3801.3. Samples: 92442308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:43:08,968][134211] Avg episode reward: [(0, '7.399')] [2025-01-04 04:43:11,780][134294] Updated weights for policy 0, policy_version 100864 (0.0022) [2025-01-04 04:43:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 15106.6). Total num frames: 413167616. Throughput: 0: 3807.4. Samples: 92462468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:43:13,968][134211] Avg episode reward: [(0, '7.812')] [2025-01-04 04:43:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000100872_413171712.pth... 
[2025-01-04 04:43:14,024][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000099990_409559040.pth [2025-01-04 04:43:14,417][134294] Updated weights for policy 0, policy_version 100874 (0.0019) [2025-01-04 04:43:16,290][134294] Updated weights for policy 0, policy_version 100884 (0.0012) [2025-01-04 04:43:18,223][134294] Updated weights for policy 0, policy_version 100894 (0.0013) [2025-01-04 04:43:18,968][134211] Fps is (10 sec: 18022.8, 60 sec: 15496.6, 300 sec: 15189.9). Total num frames: 413278208. Throughput: 0: 3922.2. Samples: 92477842. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:18,968][134211] Avg episode reward: [(0, '7.866')] [2025-01-04 04:43:20,074][134294] Updated weights for policy 0, policy_version 100904 (0.0013) [2025-01-04 04:43:21,942][134294] Updated weights for policy 0, policy_version 100914 (0.0016) [2025-01-04 04:43:23,820][134294] Updated weights for policy 0, policy_version 100924 (0.0014) [2025-01-04 04:43:23,967][134211] Fps is (10 sec: 21709.9, 60 sec: 16179.4, 300 sec: 15314.9). Total num frames: 413384704. Throughput: 0: 3924.3. Samples: 92510460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:23,968][134211] Avg episode reward: [(0, '8.712')] [2025-01-04 04:43:25,690][134294] Updated weights for policy 0, policy_version 100934 (0.0013) [2025-01-04 04:43:28,022][134294] Updated weights for policy 0, policy_version 100944 (0.0019) [2025-01-04 04:43:28,968][134211] Fps is (10 sec: 19660.4, 60 sec: 16588.9, 300 sec: 15301.0). Total num frames: 413474816. Throughput: 0: 3902.6. Samples: 92540278. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:28,968][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 04:43:31,383][134294] Updated weights for policy 0, policy_version 100954 (0.0027) [2025-01-04 04:43:33,968][134211] Fps is (10 sec: 15154.6, 60 sec: 16110.8, 300 sec: 15287.1). Total num frames: 413536256. 
Throughput: 0: 3902.4. Samples: 92549564. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:33,969][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 04:43:34,770][134294] Updated weights for policy 0, policy_version 100964 (0.0027) [2025-01-04 04:43:38,474][134294] Updated weights for policy 0, policy_version 100974 (0.0025) [2025-01-04 04:43:38,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15223.5, 300 sec: 15259.3). Total num frames: 413593600. Throughput: 0: 3871.5. Samples: 92567062. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:38,968][134211] Avg episode reward: [(0, '9.146')] [2025-01-04 04:43:41,846][134294] Updated weights for policy 0, policy_version 100984 (0.0026) [2025-01-04 04:43:43,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14677.3, 300 sec: 15231.6). Total num frames: 413655040. Throughput: 0: 3828.4. Samples: 92584454. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:43,968][134211] Avg episode reward: [(0, '8.178')] [2025-01-04 04:43:45,328][134294] Updated weights for policy 0, policy_version 100994 (0.0027) [2025-01-04 04:43:48,446][134294] Updated weights for policy 0, policy_version 101004 (0.0027) [2025-01-04 04:43:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.3, 300 sec: 15120.5). Total num frames: 413716480. Throughput: 0: 3811.7. Samples: 92593620. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:48,968][134211] Avg episode reward: [(0, '7.650')] [2025-01-04 04:43:51,481][134294] Updated weights for policy 0, policy_version 101014 (0.0026) [2025-01-04 04:43:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14745.6, 300 sec: 15120.5). Total num frames: 413782016. Throughput: 0: 3807.0. Samples: 92613622. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:53,968][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 04:43:53,975][134264] Saving new best policy, reward=9.406! 
[2025-01-04 04:43:54,846][134294] Updated weights for policy 0, policy_version 101024 (0.0028) [2025-01-04 04:43:56,889][134294] Updated weights for policy 0, policy_version 101034 (0.0015) [2025-01-04 04:43:58,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15018.6, 300 sec: 15189.9). Total num frames: 413868032. Throughput: 0: 3894.3. Samples: 92637712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:43:58,968][134211] Avg episode reward: [(0, '8.638')] [2025-01-04 04:43:59,397][134294] Updated weights for policy 0, policy_version 101044 (0.0021) [2025-01-04 04:44:02,485][134294] Updated weights for policy 0, policy_version 101054 (0.0024) [2025-01-04 04:44:03,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15018.7, 300 sec: 15064.9). Total num frames: 413933568. Throughput: 0: 3783.9. Samples: 92648118. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:44:03,968][134211] Avg episode reward: [(0, '7.586')] [2025-01-04 04:44:05,507][134294] Updated weights for policy 0, policy_version 101064 (0.0024) [2025-01-04 04:44:07,440][134294] Updated weights for policy 0, policy_version 101074 (0.0013) [2025-01-04 04:44:08,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15564.9, 300 sec: 15162.2). Total num frames: 414031872. Throughput: 0: 3582.3. Samples: 92671664. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:44:08,968][134211] Avg episode reward: [(0, '8.588')] [2025-01-04 04:44:09,330][134294] Updated weights for policy 0, policy_version 101084 (0.0013) [2025-01-04 04:44:11,800][134294] Updated weights for policy 0, policy_version 101094 (0.0019) [2025-01-04 04:44:13,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15633.1, 300 sec: 15189.9). Total num frames: 414105600. Throughput: 0: 3498.2. Samples: 92697698. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 04:44:13,968][134211] Avg episode reward: [(0, '7.891')] [2025-01-04 04:44:15,007][134294] Updated weights for policy 0, policy_version 101104 (0.0024) [2025-01-04 04:44:18,241][134294] Updated weights for policy 0, policy_version 101114 (0.0025) [2025-01-04 04:44:18,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14882.1, 300 sec: 15189.9). Total num frames: 414171136. Throughput: 0: 3507.8. Samples: 92707414. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:18,969][134211] Avg episode reward: [(0, '8.564')] [2025-01-04 04:44:21,284][134294] Updated weights for policy 0, policy_version 101124 (0.0025) [2025-01-04 04:44:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14199.4, 300 sec: 15078.8). Total num frames: 414236672. Throughput: 0: 3554.7. Samples: 92727026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:23,969][134211] Avg episode reward: [(0, '7.481')] [2025-01-04 04:44:24,492][134294] Updated weights for policy 0, policy_version 101134 (0.0024) [2025-01-04 04:44:26,517][134294] Updated weights for policy 0, policy_version 101144 (0.0014) [2025-01-04 04:44:28,455][134294] Updated weights for policy 0, policy_version 101154 (0.0013) [2025-01-04 04:44:28,967][134211] Fps is (10 sec: 16384.6, 60 sec: 14336.1, 300 sec: 15051.1). Total num frames: 414334976. Throughput: 0: 3759.4. Samples: 92753624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:28,968][134211] Avg episode reward: [(0, '7.860')] [2025-01-04 04:44:30,330][134294] Updated weights for policy 0, policy_version 101164 (0.0013) [2025-01-04 04:44:32,209][134294] Updated weights for policy 0, policy_version 101174 (0.0013) [2025-01-04 04:44:33,968][134211] Fps is (10 sec: 20070.6, 60 sec: 15018.7, 300 sec: 15176.0). Total num frames: 414437376. Throughput: 0: 3918.6. Samples: 92769958. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:33,969][134211] Avg episode reward: [(0, '8.859')] [2025-01-04 04:44:34,751][134294] Updated weights for policy 0, policy_version 101184 (0.0022) [2025-01-04 04:44:37,950][134294] Updated weights for policy 0, policy_version 101194 (0.0027) [2025-01-04 04:44:38,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15086.9, 300 sec: 15189.9). Total num frames: 414498816. Throughput: 0: 3991.1. Samples: 92793224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:38,969][134211] Avg episode reward: [(0, '8.300')] [2025-01-04 04:44:41,403][134294] Updated weights for policy 0, policy_version 101204 (0.0026) [2025-01-04 04:44:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15086.9, 300 sec: 15176.0). Total num frames: 414560256. Throughput: 0: 3861.0. Samples: 92811458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:43,968][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 04:44:44,678][134294] Updated weights for policy 0, policy_version 101214 (0.0026) [2025-01-04 04:44:47,929][134294] Updated weights for policy 0, policy_version 101224 (0.0026) [2025-01-04 04:44:48,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15155.2, 300 sec: 15162.1). Total num frames: 414625792. Throughput: 0: 3844.2. Samples: 92821108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:48,968][134211] Avg episode reward: [(0, '7.691')] [2025-01-04 04:44:50,997][134294] Updated weights for policy 0, policy_version 101234 (0.0026) [2025-01-04 04:44:53,970][134211] Fps is (10 sec: 13103.8, 60 sec: 15154.5, 300 sec: 15162.0). Total num frames: 414691328. Throughput: 0: 3756.5. Samples: 92840718. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:53,971][134211] Avg episode reward: [(0, '7.627')] [2025-01-04 04:44:54,163][134294] Updated weights for policy 0, policy_version 101244 (0.0024) [2025-01-04 04:44:56,534][134294] Updated weights for policy 0, policy_version 101254 (0.0017) [2025-01-04 04:44:58,731][134294] Updated weights for policy 0, policy_version 101264 (0.0017) [2025-01-04 04:44:58,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15155.2, 300 sec: 15231.6). Total num frames: 414777344. Throughput: 0: 3725.7. Samples: 92865356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:44:58,968][134211] Avg episode reward: [(0, '8.686')] [2025-01-04 04:45:01,734][134294] Updated weights for policy 0, policy_version 101274 (0.0026) [2025-01-04 04:45:03,968][134211] Fps is (10 sec: 15159.3, 60 sec: 15155.2, 300 sec: 15134.4). Total num frames: 414842880. Throughput: 0: 3740.0. Samples: 92875712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:45:03,968][134211] Avg episode reward: [(0, '7.589')] [2025-01-04 04:45:04,983][134294] Updated weights for policy 0, policy_version 101284 (0.0025) [2025-01-04 04:45:07,205][134294] Updated weights for policy 0, policy_version 101294 (0.0016) [2025-01-04 04:45:08,968][134211] Fps is (10 sec: 15974.7, 60 sec: 15086.9, 300 sec: 15078.8). Total num frames: 414937088. Throughput: 0: 3808.3. Samples: 92898398. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:45:08,968][134211] Avg episode reward: [(0, '8.198')] [2025-01-04 04:45:09,076][134294] Updated weights for policy 0, policy_version 101304 (0.0013) [2025-01-04 04:45:10,941][134294] Updated weights for policy 0, policy_version 101314 (0.0013) [2025-01-04 04:45:12,842][134294] Updated weights for policy 0, policy_version 101324 (0.0014) [2025-01-04 04:45:13,967][134211] Fps is (10 sec: 20480.3, 60 sec: 15701.4, 300 sec: 15189.9). Total num frames: 415047680. Throughput: 0: 3937.8. Samples: 92930826. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:45:13,968][134211] Avg episode reward: [(0, '8.492')] [2025-01-04 04:45:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000101330_415047680.pth... [2025-01-04 04:45:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000100432_411369472.pth [2025-01-04 04:45:14,762][134294] Updated weights for policy 0, policy_version 101334 (0.0014) [2025-01-04 04:45:16,658][134294] Updated weights for policy 0, policy_version 101344 (0.0012) [2025-01-04 04:45:18,969][134211] Fps is (10 sec: 20068.0, 60 sec: 16110.7, 300 sec: 15273.2). Total num frames: 415137792. Throughput: 0: 3935.7. Samples: 92947068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:18,969][134211] Avg episode reward: [(0, '8.301')] [2025-01-04 04:45:19,441][134294] Updated weights for policy 0, policy_version 101354 (0.0024) [2025-01-04 04:45:22,968][134294] Updated weights for policy 0, policy_version 101364 (0.0030) [2025-01-04 04:45:23,968][134211] Fps is (10 sec: 14745.1, 60 sec: 15974.4, 300 sec: 15245.4). Total num frames: 415195136. Throughput: 0: 3875.7. Samples: 92967630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:23,969][134211] Avg episode reward: [(0, '8.106')] [2025-01-04 04:45:26,317][134294] Updated weights for policy 0, policy_version 101374 (0.0026) [2025-01-04 04:45:28,970][134211] Fps is (10 sec: 12286.9, 60 sec: 15427.7, 300 sec: 15245.4). Total num frames: 415260672. Throughput: 0: 3878.5. Samples: 92985998. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:28,971][134211] Avg episode reward: [(0, '8.872')] [2025-01-04 04:45:29,575][134294] Updated weights for policy 0, policy_version 101384 (0.0024) [2025-01-04 04:45:32,897][134294] Updated weights for policy 0, policy_version 101394 (0.0025) [2025-01-04 04:45:33,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14745.5, 300 sec: 15217.7). Total num frames: 415322112. Throughput: 0: 3868.8. Samples: 92995206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:33,969][134211] Avg episode reward: [(0, '8.670')] [2025-01-04 04:45:36,163][134294] Updated weights for policy 0, policy_version 101404 (0.0024) [2025-01-04 04:45:38,968][134211] Fps is (10 sec: 11880.6, 60 sec: 14677.3, 300 sec: 15134.4). Total num frames: 415379456. Throughput: 0: 3843.0. Samples: 93013644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:38,968][134211] Avg episode reward: [(0, '8.831')] [2025-01-04 04:45:39,843][134294] Updated weights for policy 0, policy_version 101414 (0.0026) [2025-01-04 04:45:43,043][134294] Updated weights for policy 0, policy_version 101424 (0.0025) [2025-01-04 04:45:43,968][134211] Fps is (10 sec: 11878.8, 60 sec: 14677.3, 300 sec: 15106.6). Total num frames: 415440896. Throughput: 0: 3692.5. Samples: 93031518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:43,968][134211] Avg episode reward: [(0, '8.258')] [2025-01-04 04:45:45,619][134294] Updated weights for policy 0, policy_version 101434 (0.0018) [2025-01-04 04:45:47,509][134294] Updated weights for policy 0, policy_version 101444 (0.0016) [2025-01-04 04:45:48,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15291.8, 300 sec: 15106.6). Total num frames: 415543296. Throughput: 0: 3755.6. Samples: 93044714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:48,968][134211] Avg episode reward: [(0, '7.547')] [2025-01-04 04:45:49,406][134294] Updated weights for policy 0, policy_version 101454 (0.0012) [2025-01-04 04:45:51,285][134294] Updated weights for policy 0, policy_version 101464 (0.0012) [2025-01-04 04:45:53,293][134294] Updated weights for policy 0, policy_version 101474 (0.0013) [2025-01-04 04:45:53,968][134211] Fps is (10 sec: 20889.8, 60 sec: 15975.1, 300 sec: 15231.6). Total num frames: 415649792. Throughput: 0: 3968.0. Samples: 93076960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:53,968][134211] Avg episode reward: [(0, '8.515')] [2025-01-04 04:45:55,811][134294] Updated weights for policy 0, policy_version 101484 (0.0022) [2025-01-04 04:45:58,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15633.1, 300 sec: 15231.6). Total num frames: 415715328. Throughput: 0: 3770.5. Samples: 93100498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:45:58,968][134211] Avg episode reward: [(0, '8.614')] [2025-01-04 04:45:59,028][134294] Updated weights for policy 0, policy_version 101494 (0.0027) [2025-01-04 04:46:02,343][134294] Updated weights for policy 0, policy_version 101504 (0.0025) [2025-01-04 04:46:03,969][134211] Fps is (10 sec: 13105.9, 60 sec: 15632.8, 300 sec: 15176.0). Total num frames: 415780864. Throughput: 0: 3613.4. Samples: 93109672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:46:03,970][134211] Avg episode reward: [(0, '8.344')] [2025-01-04 04:46:05,458][134294] Updated weights for policy 0, policy_version 101514 (0.0028) [2025-01-04 04:46:08,524][134294] Updated weights for policy 0, policy_version 101524 (0.0025) [2025-01-04 04:46:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.1, 300 sec: 15176.0). Total num frames: 415846400. Throughput: 0: 3595.7. Samples: 93129438. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:46:08,968][134211] Avg episode reward: [(0, '8.003')] [2025-01-04 04:46:11,515][134294] Updated weights for policy 0, policy_version 101534 (0.0024) [2025-01-04 04:46:13,968][134211] Fps is (10 sec: 13108.3, 60 sec: 14404.2, 300 sec: 15134.4). Total num frames: 415911936. Throughput: 0: 3631.1. Samples: 93149392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:46:13,969][134211] Avg episode reward: [(0, '8.718')] [2025-01-04 04:46:14,623][134294] Updated weights for policy 0, policy_version 101544 (0.0028) [2025-01-04 04:46:17,731][134294] Updated weights for policy 0, policy_version 101554 (0.0024) [2025-01-04 04:46:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.9, 300 sec: 14995.5). Total num frames: 415977472. Throughput: 0: 3648.8. Samples: 93159402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:46:18,968][134211] Avg episode reward: [(0, '8.224')] [2025-01-04 04:46:21,094][134294] Updated weights for policy 0, policy_version 101564 (0.0026) [2025-01-04 04:46:23,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14131.3, 300 sec: 14995.5). Total num frames: 416043008. Throughput: 0: 3647.7. Samples: 93177788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:23,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 04:46:24,010][134294] Updated weights for policy 0, policy_version 101574 (0.0024) [2025-01-04 04:46:25,933][134294] Updated weights for policy 0, policy_version 101584 (0.0014) [2025-01-04 04:46:27,810][134294] Updated weights for policy 0, policy_version 101594 (0.0013) [2025-01-04 04:46:28,967][134211] Fps is (10 sec: 17613.3, 60 sec: 14882.7, 300 sec: 15162.2). Total num frames: 416153600. Throughput: 0: 3913.6. Samples: 93207630. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:28,968][134211] Avg episode reward: [(0, '8.564')] [2025-01-04 04:46:29,681][134294] Updated weights for policy 0, policy_version 101604 (0.0012) [2025-01-04 04:46:31,572][134294] Updated weights for policy 0, policy_version 101614 (0.0014) [2025-01-04 04:46:33,438][134294] Updated weights for policy 0, policy_version 101624 (0.0013) [2025-01-04 04:46:33,967][134211] Fps is (10 sec: 21709.0, 60 sec: 15633.2, 300 sec: 15328.8). Total num frames: 416260096. Throughput: 0: 3980.9. Samples: 93223856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:33,968][134211] Avg episode reward: [(0, '8.461')] [2025-01-04 04:46:35,712][134294] Updated weights for policy 0, policy_version 101634 (0.0017) [2025-01-04 04:46:38,911][134294] Updated weights for policy 0, policy_version 101644 (0.0027) [2025-01-04 04:46:38,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15906.1, 300 sec: 15287.1). Total num frames: 416333824. Throughput: 0: 3874.3. Samples: 93251304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:38,968][134211] Avg episode reward: [(0, '7.983')] [2025-01-04 04:46:42,149][134294] Updated weights for policy 0, policy_version 101654 (0.0027) [2025-01-04 04:46:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15906.2, 300 sec: 15134.4). Total num frames: 416395264. Throughput: 0: 3773.3. Samples: 93270296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:43,968][134211] Avg episode reward: [(0, '8.542')] [2025-01-04 04:46:45,223][134294] Updated weights for policy 0, policy_version 101664 (0.0025) [2025-01-04 04:46:48,426][134294] Updated weights for policy 0, policy_version 101674 (0.0024) [2025-01-04 04:46:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.7, 300 sec: 15009.4). Total num frames: 416460800. Throughput: 0: 3793.5. Samples: 93280378. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:48,968][134211] Avg episode reward: [(0, '7.755')] [2025-01-04 04:46:51,771][134294] Updated weights for policy 0, policy_version 101684 (0.0026) [2025-01-04 04:46:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14540.8, 300 sec: 15009.4). Total num frames: 416522240. Throughput: 0: 3761.9. Samples: 93298722. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:53,969][134211] Avg episode reward: [(0, '8.781')] [2025-01-04 04:46:55,142][134294] Updated weights for policy 0, policy_version 101694 (0.0028) [2025-01-04 04:46:58,084][134294] Updated weights for policy 0, policy_version 101704 (0.0024) [2025-01-04 04:46:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 15009.5). Total num frames: 416587776. Throughput: 0: 3754.2. Samples: 93318330. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:46:58,968][134211] Avg episode reward: [(0, '7.699')] [2025-01-04 04:47:01,129][134294] Updated weights for policy 0, policy_version 101714 (0.0024) [2025-01-04 04:47:03,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14609.1, 300 sec: 15037.1). Total num frames: 416657408. Throughput: 0: 3758.8. Samples: 93328550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:47:03,969][134211] Avg episode reward: [(0, '8.861')] [2025-01-04 04:47:04,347][134294] Updated weights for policy 0, policy_version 101724 (0.0025) [2025-01-04 04:47:07,261][134294] Updated weights for policy 0, policy_version 101734 (0.0027) [2025-01-04 04:47:08,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14677.4, 300 sec: 15065.0). Total num frames: 416727040. Throughput: 0: 3795.6. Samples: 93348588. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:47:08,968][134211] Avg episode reward: [(0, '7.779')] [2025-01-04 04:47:09,549][134294] Updated weights for policy 0, policy_version 101744 (0.0014) [2025-01-04 04:47:11,444][134294] Updated weights for policy 0, policy_version 101754 (0.0012) [2025-01-04 04:47:13,352][134294] Updated weights for policy 0, policy_version 101764 (0.0013) [2025-01-04 04:47:13,968][134211] Fps is (10 sec: 18024.1, 60 sec: 15428.3, 300 sec: 15217.7). Total num frames: 416837632. Throughput: 0: 3803.6. Samples: 93378792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:47:13,968][134211] Avg episode reward: [(0, '7.640')] [2025-01-04 04:47:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000101767_416837632.pth... [2025-01-04 04:47:14,026][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000100872_413171712.pth [2025-01-04 04:47:15,249][134294] Updated weights for policy 0, policy_version 101774 (0.0014) [2025-01-04 04:47:17,175][134294] Updated weights for policy 0, policy_version 101784 (0.0012) [2025-01-04 04:47:18,968][134211] Fps is (10 sec: 21299.1, 60 sec: 16042.7, 300 sec: 15342.7). Total num frames: 416940032. Throughput: 0: 3799.4. Samples: 93394828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:47:18,968][134211] Avg episode reward: [(0, '8.143')] [2025-01-04 04:47:19,190][134294] Updated weights for policy 0, policy_version 101794 (0.0014) [2025-01-04 04:47:22,411][134294] Updated weights for policy 0, policy_version 101804 (0.0025) [2025-01-04 04:47:23,968][134211] Fps is (10 sec: 16793.3, 60 sec: 16042.6, 300 sec: 15342.7). Total num frames: 417005568. Throughput: 0: 3742.6. Samples: 93419722. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:23,969][134211] Avg episode reward: [(0, '7.583')] [2025-01-04 04:47:25,749][134294] Updated weights for policy 0, policy_version 101814 (0.0028) [2025-01-04 04:47:28,887][134294] Updated weights for policy 0, policy_version 101824 (0.0024) [2025-01-04 04:47:28,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15291.7, 300 sec: 15259.3). Total num frames: 417071104. Throughput: 0: 3738.8. Samples: 93438542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:28,968][134211] Avg episode reward: [(0, '7.058')] [2025-01-04 04:47:31,958][134294] Updated weights for policy 0, policy_version 101834 (0.0024) [2025-01-04 04:47:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14609.0, 300 sec: 15106.6). Total num frames: 417136640. Throughput: 0: 3735.9. Samples: 93448492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:33,968][134211] Avg episode reward: [(0, '7.373')] [2025-01-04 04:47:35,039][134294] Updated weights for policy 0, policy_version 101844 (0.0025) [2025-01-04 04:47:38,583][134294] Updated weights for policy 0, policy_version 101854 (0.0023) [2025-01-04 04:47:38,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14403.8, 300 sec: 14995.4). Total num frames: 417198080. Throughput: 0: 3755.9. Samples: 93467744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:38,971][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 04:47:41,963][134294] Updated weights for policy 0, policy_version 101864 (0.0027) [2025-01-04 04:47:43,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14336.0, 300 sec: 14981.6). Total num frames: 417255424. Throughput: 0: 3711.4. Samples: 93485344. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:43,969][134211] Avg episode reward: [(0, '8.042')] [2025-01-04 04:47:45,274][134294] Updated weights for policy 0, policy_version 101874 (0.0025) [2025-01-04 04:47:48,154][134294] Updated weights for policy 0, policy_version 101884 (0.0024) [2025-01-04 04:47:48,967][134211] Fps is (10 sec: 13520.0, 60 sec: 14540.9, 300 sec: 15037.2). Total num frames: 417333248. Throughput: 0: 3701.5. Samples: 93495112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:48,968][134211] Avg episode reward: [(0, '7.416')] [2025-01-04 04:47:50,400][134294] Updated weights for policy 0, policy_version 101894 (0.0016) [2025-01-04 04:47:53,522][134294] Updated weights for policy 0, policy_version 101904 (0.0025) [2025-01-04 04:47:53,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14677.3, 300 sec: 15037.2). Total num frames: 417402880. Throughput: 0: 3781.0. Samples: 93518736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:53,969][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 04:47:56,353][134294] Updated weights for policy 0, policy_version 101914 (0.0020) [2025-01-04 04:47:58,242][134294] Updated weights for policy 0, policy_version 101924 (0.0014) [2025-01-04 04:47:58,967][134211] Fps is (10 sec: 15974.4, 60 sec: 15087.0, 300 sec: 15120.5). Total num frames: 417492992. Throughput: 0: 3651.1. Samples: 93543092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:47:58,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 04:48:00,157][134294] Updated weights for policy 0, policy_version 101934 (0.0013) [2025-01-04 04:48:02,018][134294] Updated weights for policy 0, policy_version 101944 (0.0014) [2025-01-04 04:48:03,946][134294] Updated weights for policy 0, policy_version 101954 (0.0015) [2025-01-04 04:48:03,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15769.8, 300 sec: 15273.2). Total num frames: 417603584. Throughput: 0: 3654.8. Samples: 93559296. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:48:03,968][134211] Avg episode reward: [(0, '8.063')] [2025-01-04 04:48:05,779][134294] Updated weights for policy 0, policy_version 101964 (0.0013) [2025-01-04 04:48:07,867][134294] Updated weights for policy 0, policy_version 101974 (0.0015) [2025-01-04 04:48:08,968][134211] Fps is (10 sec: 20479.7, 60 sec: 16179.2, 300 sec: 15356.5). Total num frames: 417697792. Throughput: 0: 3822.2. Samples: 93591718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:48:08,968][134211] Avg episode reward: [(0, '7.361')] [2025-01-04 04:48:10,998][134294] Updated weights for policy 0, policy_version 101984 (0.0028) [2025-01-04 04:48:13,969][134211] Fps is (10 sec: 15972.6, 60 sec: 15427.9, 300 sec: 15203.7). Total num frames: 417763328. Throughput: 0: 3852.1. Samples: 93611892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:48:13,970][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 04:48:14,181][134294] Updated weights for policy 0, policy_version 101994 (0.0027) [2025-01-04 04:48:17,295][134294] Updated weights for policy 0, policy_version 102004 (0.0027) [2025-01-04 04:48:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14745.5, 300 sec: 15051.0). Total num frames: 417824768. Throughput: 0: 3845.0. Samples: 93621518. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 04:48:18,969][134211] Avg episode reward: [(0, '7.933')] [2025-01-04 04:48:20,682][134294] Updated weights for policy 0, policy_version 102014 (0.0026) [2025-01-04 04:48:23,968][134211] Fps is (10 sec: 12289.5, 60 sec: 14677.3, 300 sec: 14953.9). Total num frames: 417886208. Throughput: 0: 3830.1. Samples: 93640092. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:23,969][134211] Avg episode reward: [(0, '8.240')] [2025-01-04 04:48:24,076][134294] Updated weights for policy 0, policy_version 102024 (0.0026) [2025-01-04 04:48:27,096][134294] Updated weights for policy 0, policy_version 102034 (0.0026) [2025-01-04 04:48:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14677.3, 300 sec: 14967.8). Total num frames: 417951744. Throughput: 0: 3874.9. Samples: 93659716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:28,968][134211] Avg episode reward: [(0, '8.379')] [2025-01-04 04:48:30,075][134294] Updated weights for policy 0, policy_version 102044 (0.0028) [2025-01-04 04:48:32,995][134294] Updated weights for policy 0, policy_version 102054 (0.0023) [2025-01-04 04:48:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 15009.4). Total num frames: 418021376. Throughput: 0: 3890.9. Samples: 93670204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:33,968][134211] Avg episode reward: [(0, '7.637')] [2025-01-04 04:48:36,005][134294] Updated weights for policy 0, policy_version 102064 (0.0025) [2025-01-04 04:48:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.7, 300 sec: 15037.2). Total num frames: 418091008. Throughput: 0: 3822.7. Samples: 93690756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:38,968][134211] Avg episode reward: [(0, '8.249')] [2025-01-04 04:48:39,142][134294] Updated weights for policy 0, policy_version 102074 (0.0024) [2025-01-04 04:48:42,103][134294] Updated weights for policy 0, policy_version 102084 (0.0025) [2025-01-04 04:48:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 15064.9). Total num frames: 418160640. Throughput: 0: 3731.5. Samples: 93711010. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:43,969][134211] Avg episode reward: [(0, '8.445')] [2025-01-04 04:48:45,029][134294] Updated weights for policy 0, policy_version 102094 (0.0025) [2025-01-04 04:48:47,891][134294] Updated weights for policy 0, policy_version 102104 (0.0025) [2025-01-04 04:48:48,971][134211] Fps is (10 sec: 13922.3, 60 sec: 14949.6, 300 sec: 15078.7). Total num frames: 418230272. Throughput: 0: 3606.6. Samples: 93721604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:48,972][134211] Avg episode reward: [(0, '7.929')] [2025-01-04 04:48:50,369][134294] Updated weights for policy 0, policy_version 102114 (0.0017) [2025-01-04 04:48:52,358][134294] Updated weights for policy 0, policy_version 102124 (0.0014) [2025-01-04 04:48:53,968][134211] Fps is (10 sec: 16794.0, 60 sec: 15428.3, 300 sec: 15120.5). Total num frames: 418328576. Throughput: 0: 3457.6. Samples: 93747310. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:53,968][134211] Avg episode reward: [(0, '9.144')] [2025-01-04 04:48:54,362][134294] Updated weights for policy 0, policy_version 102134 (0.0017) [2025-01-04 04:48:56,320][134294] Updated weights for policy 0, policy_version 102144 (0.0015) [2025-01-04 04:48:58,167][134294] Updated weights for policy 0, policy_version 102154 (0.0014) [2025-01-04 04:48:58,968][134211] Fps is (10 sec: 20485.9, 60 sec: 15701.3, 300 sec: 15259.3). Total num frames: 418435072. Throughput: 0: 3714.2. Samples: 93779026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:48:58,968][134211] Avg episode reward: [(0, '7.300')] [2025-01-04 04:49:00,898][134294] Updated weights for policy 0, policy_version 102164 (0.0025) [2025-01-04 04:49:03,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14950.5, 300 sec: 15148.2). Total num frames: 418500608. Throughput: 0: 3744.6. Samples: 93790026. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:49:03,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 04:49:04,187][134294] Updated weights for policy 0, policy_version 102174 (0.0028) [2025-01-04 04:49:07,488][134294] Updated weights for policy 0, policy_version 102184 (0.0026) [2025-01-04 04:49:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14404.2, 300 sec: 15106.6). Total num frames: 418562048. Throughput: 0: 3745.9. Samples: 93808658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:49:08,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 04:49:10,670][134294] Updated weights for policy 0, policy_version 102194 (0.0027) [2025-01-04 04:49:13,679][134294] Updated weights for policy 0, policy_version 102204 (0.0027) [2025-01-04 04:49:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.9, 300 sec: 15120.5). Total num frames: 418631680. Throughput: 0: 3753.2. Samples: 93828612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:49:13,968][134211] Avg episode reward: [(0, '7.844')] [2025-01-04 04:49:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000102205_418631680.pth... [2025-01-04 04:49:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000101330_415047680.pth [2025-01-04 04:49:17,014][134294] Updated weights for policy 0, policy_version 102214 (0.0026) [2025-01-04 04:49:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 15092.7). Total num frames: 418689024. Throughput: 0: 3724.2. Samples: 93837794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:49:18,968][134211] Avg episode reward: [(0, '8.700')] [2025-01-04 04:49:20,352][134294] Updated weights for policy 0, policy_version 102224 (0.0027) [2025-01-04 04:49:23,175][134294] Updated weights for policy 0, policy_version 102234 (0.0020) [2025-01-04 04:49:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.4, 300 sec: 15023.3). Total num frames: 418766848. Throughput: 0: 3689.7. Samples: 93856790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:23,968][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 04:49:25,085][134294] Updated weights for policy 0, policy_version 102244 (0.0013) [2025-01-04 04:49:26,975][134294] Updated weights for policy 0, policy_version 102254 (0.0015) [2025-01-04 04:49:28,885][134294] Updated weights for policy 0, policy_version 102264 (0.0013) [2025-01-04 04:49:28,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15360.0, 300 sec: 15037.2). Total num frames: 418873344. Throughput: 0: 3929.0. Samples: 93887816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:28,968][134211] Avg episode reward: [(0, '7.365')] [2025-01-04 04:49:31,618][134294] Updated weights for policy 0, policy_version 102274 (0.0026) [2025-01-04 04:49:33,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15360.0, 300 sec: 15065.0). Total num frames: 418942976. Throughput: 0: 3971.0. Samples: 93900286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:33,968][134211] Avg episode reward: [(0, '7.878')] [2025-01-04 04:49:34,794][134294] Updated weights for policy 0, policy_version 102284 (0.0029) [2025-01-04 04:49:38,259][134294] Updated weights for policy 0, policy_version 102294 (0.0030) [2025-01-04 04:49:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15155.2, 300 sec: 15051.1). Total num frames: 419000320. Throughput: 0: 3814.9. Samples: 93918982. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:38,968][134211] Avg episode reward: [(0, '8.000')] [2025-01-04 04:49:41,789][134294] Updated weights for policy 0, policy_version 102304 (0.0029) [2025-01-04 04:49:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 15018.7, 300 sec: 15037.2). Total num frames: 419061760. Throughput: 0: 3496.0. Samples: 93936344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:43,968][134211] Avg episode reward: [(0, '7.637')] [2025-01-04 04:49:44,876][134294] Updated weights for policy 0, policy_version 102314 (0.0022) [2025-01-04 04:49:46,831][134294] Updated weights for policy 0, policy_version 102324 (0.0012) [2025-01-04 04:49:48,748][134294] Updated weights for policy 0, policy_version 102334 (0.0013) [2025-01-04 04:49:48,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15565.6, 300 sec: 15162.3). Total num frames: 419164160. Throughput: 0: 3543.9. Samples: 93949500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:48,968][134211] Avg episode reward: [(0, '8.113')] [2025-01-04 04:49:50,757][134294] Updated weights for policy 0, policy_version 102344 (0.0014) [2025-01-04 04:49:52,743][134294] Updated weights for policy 0, policy_version 102354 (0.0014) [2025-01-04 04:49:53,967][134211] Fps is (10 sec: 20480.6, 60 sec: 15633.1, 300 sec: 15217.7). Total num frames: 419266560. Throughput: 0: 3825.9. Samples: 93980822. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:53,968][134211] Avg episode reward: [(0, '7.719')] [2025-01-04 04:49:54,701][134294] Updated weights for policy 0, policy_version 102364 (0.0015) [2025-01-04 04:49:57,747][134294] Updated weights for policy 0, policy_version 102374 (0.0026) [2025-01-04 04:49:58,968][134211] Fps is (10 sec: 17202.3, 60 sec: 15018.6, 300 sec: 15231.5). Total num frames: 419336192. Throughput: 0: 3930.3. Samples: 94005476. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:49:58,970][134211] Avg episode reward: [(0, '7.519')] [2025-01-04 04:50:00,955][134294] Updated weights for policy 0, policy_version 102384 (0.0027) [2025-01-04 04:50:03,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15018.6, 300 sec: 15134.4). Total num frames: 419401728. Throughput: 0: 3941.6. Samples: 94015168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:50:03,968][134211] Avg episode reward: [(0, '8.370')] [2025-01-04 04:50:04,142][134294] Updated weights for policy 0, policy_version 102394 (0.0026) [2025-01-04 04:50:07,122][134294] Updated weights for policy 0, policy_version 102404 (0.0028) [2025-01-04 04:50:08,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15087.0, 300 sec: 14981.6). Total num frames: 419467264. Throughput: 0: 3957.9. Samples: 94034894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:50:08,968][134211] Avg episode reward: [(0, '7.872')] [2025-01-04 04:50:10,178][134294] Updated weights for policy 0, policy_version 102414 (0.0023) [2025-01-04 04:50:13,183][134294] Updated weights for policy 0, policy_version 102424 (0.0026) [2025-01-04 04:50:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 14912.3). Total num frames: 419536896. Throughput: 0: 3718.6. Samples: 94055152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:50:13,968][134211] Avg episode reward: [(0, '7.228')] [2025-01-04 04:50:16,392][134294] Updated weights for policy 0, policy_version 102434 (0.0028) [2025-01-04 04:50:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 14926.1). Total num frames: 419598336. Throughput: 0: 3660.0. Samples: 94064986. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:50:18,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 04:50:19,861][134294] Updated weights for policy 0, policy_version 102444 (0.0026) [2025-01-04 04:50:23,123][134294] Updated weights for policy 0, policy_version 102454 (0.0025) [2025-01-04 04:50:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 14912.3). Total num frames: 419659776. Throughput: 0: 3650.3. Samples: 94083246. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:23,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 04:50:26,307][134294] Updated weights for policy 0, policy_version 102464 (0.0022) [2025-01-04 04:50:28,308][134294] Updated weights for policy 0, policy_version 102474 (0.0013) [2025-01-04 04:50:28,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14540.8, 300 sec: 14995.6). Total num frames: 419745792. Throughput: 0: 3763.9. Samples: 94105718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:28,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 04:50:30,278][134294] Updated weights for policy 0, policy_version 102484 (0.0014) [2025-01-04 04:50:32,132][134294] Updated weights for policy 0, policy_version 102494 (0.0014) [2025-01-04 04:50:33,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15155.2, 300 sec: 15162.2). Total num frames: 419852288. Throughput: 0: 3831.0. Samples: 94121894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:33,968][134211] Avg episode reward: [(0, '8.140')] [2025-01-04 04:50:34,012][134294] Updated weights for policy 0, policy_version 102504 (0.0014) [2025-01-04 04:50:36,161][134294] Updated weights for policy 0, policy_version 102514 (0.0016) [2025-01-04 04:50:38,968][134211] Fps is (10 sec: 18430.7, 60 sec: 15496.4, 300 sec: 15217.7). Total num frames: 419930112. Throughput: 0: 3775.8. Samples: 94150734. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:38,969][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 04:50:39,426][134294] Updated weights for policy 0, policy_version 102524 (0.0028) [2025-01-04 04:50:42,685][134294] Updated weights for policy 0, policy_version 102534 (0.0028) [2025-01-04 04:50:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15564.8, 300 sec: 15092.7). Total num frames: 419995648. Throughput: 0: 3648.2. Samples: 94169644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:43,968][134211] Avg episode reward: [(0, '8.134')] [2025-01-04 04:50:45,737][134294] Updated weights for policy 0, policy_version 102544 (0.0028) [2025-01-04 04:50:48,956][134294] Updated weights for policy 0, policy_version 102554 (0.0025) [2025-01-04 04:50:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14950.2, 300 sec: 14953.8). Total num frames: 420061184. Throughput: 0: 3658.6. Samples: 94179806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:48,969][134211] Avg episode reward: [(0, '8.282')] [2025-01-04 04:50:52,207][134294] Updated weights for policy 0, policy_version 102564 (0.0027) [2025-01-04 04:50:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.7, 300 sec: 14940.0). Total num frames: 420122624. Throughput: 0: 3636.9. Samples: 94198554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:53,968][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 04:50:55,314][134294] Updated weights for policy 0, policy_version 102574 (0.0024) [2025-01-04 04:50:58,310][134294] Updated weights for policy 0, policy_version 102584 (0.0024) [2025-01-04 04:50:58,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14267.8, 300 sec: 14953.9). Total num frames: 420192256. Throughput: 0: 3638.1. Samples: 94218868. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:50:58,968][134211] Avg episode reward: [(0, '7.765')] [2025-01-04 04:51:01,295][134294] Updated weights for policy 0, policy_version 102594 (0.0025) [2025-01-04 04:51:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.1, 300 sec: 14967.8). Total num frames: 420261888. Throughput: 0: 3649.3. Samples: 94229206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:51:03,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 04:51:04,125][134294] Updated weights for policy 0, policy_version 102604 (0.0023) [2025-01-04 04:51:05,995][134294] Updated weights for policy 0, policy_version 102614 (0.0014) [2025-01-04 04:51:07,880][134294] Updated weights for policy 0, policy_version 102624 (0.0014) [2025-01-04 04:51:08,968][134211] Fps is (10 sec: 17613.1, 60 sec: 15018.7, 300 sec: 15106.6). Total num frames: 420368384. Throughput: 0: 3846.5. Samples: 94256340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:51:08,968][134211] Avg episode reward: [(0, '7.881')] [2025-01-04 04:51:09,774][134294] Updated weights for policy 0, policy_version 102634 (0.0013) [2025-01-04 04:51:12,851][134294] Updated weights for policy 0, policy_version 102644 (0.0027) [2025-01-04 04:51:13,968][134211] Fps is (10 sec: 18021.6, 60 sec: 15086.9, 300 sec: 15134.4). Total num frames: 420442112. Throughput: 0: 3905.8. Samples: 94281480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:51:13,969][134211] Avg episode reward: [(0, '7.640')] [2025-01-04 04:51:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000102647_420442112.pth... 
[2025-01-04 04:51:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000101767_416837632.pth [2025-01-04 04:51:16,217][134294] Updated weights for policy 0, policy_version 102654 (0.0030) [2025-01-04 04:51:18,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15086.9, 300 sec: 15120.5). Total num frames: 420503552. Throughput: 0: 3751.8. Samples: 94290728. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:51:18,969][134211] Avg episode reward: [(0, '7.054')] [2025-01-04 04:51:19,598][134294] Updated weights for policy 0, policy_version 102664 (0.0025) [2025-01-04 04:51:23,026][134294] Updated weights for policy 0, policy_version 102674 (0.0025) [2025-01-04 04:51:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15155.2, 300 sec: 14967.7). Total num frames: 420569088. Throughput: 0: 3513.0. Samples: 94308818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:23,968][134211] Avg episode reward: [(0, '8.321')] [2025-01-04 04:51:25,016][134294] Updated weights for policy 0, policy_version 102684 (0.0014) [2025-01-04 04:51:26,874][134294] Updated weights for policy 0, policy_version 102694 (0.0012) [2025-01-04 04:51:28,783][134294] Updated weights for policy 0, policy_version 102704 (0.0012) [2025-01-04 04:51:28,967][134211] Fps is (10 sec: 17203.7, 60 sec: 15496.5, 300 sec: 14967.8). Total num frames: 420675584. Throughput: 0: 3757.4. Samples: 94338724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:28,968][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 04:51:30,748][134294] Updated weights for policy 0, policy_version 102714 (0.0013) [2025-01-04 04:51:33,795][134294] Updated weights for policy 0, policy_version 102724 (0.0029) [2025-01-04 04:51:33,968][134211] Fps is (10 sec: 18840.8, 60 sec: 15086.8, 300 sec: 14995.5). Total num frames: 420757504. Throughput: 0: 3873.2. Samples: 94354098. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:33,969][134211] Avg episode reward: [(0, '8.699')] [2025-01-04 04:51:37,289][134294] Updated weights for policy 0, policy_version 102734 (0.0028) [2025-01-04 04:51:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14745.7, 300 sec: 14981.6). Total num frames: 420814848. Throughput: 0: 3871.0. Samples: 94372748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:38,968][134211] Avg episode reward: [(0, '7.195')] [2025-01-04 04:51:40,870][134294] Updated weights for policy 0, policy_version 102744 (0.0026) [2025-01-04 04:51:43,968][134211] Fps is (10 sec: 11469.2, 60 sec: 14609.1, 300 sec: 14953.9). Total num frames: 420872192. Throughput: 0: 3803.0. Samples: 94390002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:43,968][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 04:51:44,411][134294] Updated weights for policy 0, policy_version 102754 (0.0028) [2025-01-04 04:51:47,704][134294] Updated weights for policy 0, policy_version 102764 (0.0028) [2025-01-04 04:51:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.2, 300 sec: 14967.8). Total num frames: 420937728. Throughput: 0: 3767.3. Samples: 94398736. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:48,968][134211] Avg episode reward: [(0, '7.704')] [2025-01-04 04:51:50,130][134294] Updated weights for policy 0, policy_version 102774 (0.0015) [2025-01-04 04:51:52,389][134294] Updated weights for policy 0, policy_version 102784 (0.0014) [2025-01-04 04:51:53,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14950.4, 300 sec: 15023.3). Total num frames: 421019648. Throughput: 0: 3713.9. Samples: 94423464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:53,968][134211] Avg episode reward: [(0, '6.782')] [2025-01-04 04:51:55,686][134294] Updated weights for policy 0, policy_version 102794 (0.0024) [2025-01-04 04:51:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 14995.6). 
Total num frames: 421081088. Throughput: 0: 3571.5. Samples: 94442196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:51:58,968][134211] Avg episode reward: [(0, '7.499')] [2025-01-04 04:51:58,992][134294] Updated weights for policy 0, policy_version 102804 (0.0030) [2025-01-04 04:52:01,234][134294] Updated weights for policy 0, policy_version 102814 (0.0016) [2025-01-04 04:52:03,798][134294] Updated weights for policy 0, policy_version 102824 (0.0022) [2025-01-04 04:52:03,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15086.8, 300 sec: 15051.0). Total num frames: 421167104. Throughput: 0: 3641.1. Samples: 94454576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:52:03,969][134211] Avg episode reward: [(0, '7.802')] [2025-01-04 04:52:06,947][134294] Updated weights for policy 0, policy_version 102834 (0.0026) [2025-01-04 04:52:08,968][134211] Fps is (10 sec: 14744.7, 60 sec: 14335.8, 300 sec: 14884.4). Total num frames: 421228544. Throughput: 0: 3716.4. Samples: 94476058. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:52:08,969][134211] Avg episode reward: [(0, '7.288')] [2025-01-04 04:52:10,395][134294] Updated weights for policy 0, policy_version 102844 (0.0027) [2025-01-04 04:52:12,923][134294] Updated weights for policy 0, policy_version 102854 (0.0019) [2025-01-04 04:52:13,967][134211] Fps is (10 sec: 14336.6, 60 sec: 14472.6, 300 sec: 14815.0). Total num frames: 421310464. Throughput: 0: 3518.9. Samples: 94497076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:52:13,968][134211] Avg episode reward: [(0, '8.721')] [2025-01-04 04:52:14,813][134294] Updated weights for policy 0, policy_version 102864 (0.0015) [2025-01-04 04:52:17,002][134294] Updated weights for policy 0, policy_version 102874 (0.0018) [2025-01-04 04:52:18,968][134211] Fps is (10 sec: 16794.6, 60 sec: 14882.1, 300 sec: 14884.5). Total num frames: 421396480. Throughput: 0: 3532.3. Samples: 94513048. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 04:52:18,969][134211] Avg episode reward: [(0, '7.542')] [2025-01-04 04:52:20,275][134294] Updated weights for policy 0, policy_version 102884 (0.0024) [2025-01-04 04:52:23,710][134294] Updated weights for policy 0, policy_version 102894 (0.0026) [2025-01-04 04:52:23,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 421453824. Throughput: 0: 3539.3. Samples: 94532016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:23,969][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 04:52:26,794][134294] Updated weights for policy 0, policy_version 102904 (0.0025) [2025-01-04 04:52:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.1, 300 sec: 14870.6). Total num frames: 421523456. Throughput: 0: 3585.3. Samples: 94551342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:28,968][134211] Avg episode reward: [(0, '7.630')] [2025-01-04 04:52:29,893][134294] Updated weights for policy 0, policy_version 102914 (0.0026) [2025-01-04 04:52:32,409][134294] Updated weights for policy 0, policy_version 102924 (0.0017) [2025-01-04 04:52:33,967][134211] Fps is (10 sec: 15565.3, 60 sec: 14199.6, 300 sec: 14954.0). Total num frames: 421609472. Throughput: 0: 3612.3. Samples: 94561290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:33,968][134211] Avg episode reward: [(0, '8.302')] [2025-01-04 04:52:34,277][134294] Updated weights for policy 0, policy_version 102934 (0.0015) [2025-01-04 04:52:36,419][134294] Updated weights for policy 0, policy_version 102944 (0.0017) [2025-01-04 04:52:38,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14609.1, 300 sec: 15037.2). Total num frames: 421691392. Throughput: 0: 3709.8. Samples: 94590404. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:38,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 04:52:39,482][134294] Updated weights for policy 0, policy_version 102954 (0.0025) [2025-01-04 04:52:42,584][134294] Updated weights for policy 0, policy_version 102964 (0.0027) [2025-01-04 04:52:43,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14745.6, 300 sec: 14995.5). Total num frames: 421756928. Throughput: 0: 3733.1. Samples: 94610186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:43,968][134211] Avg episode reward: [(0, '7.933')] [2025-01-04 04:52:45,735][134294] Updated weights for policy 0, policy_version 102974 (0.0028) [2025-01-04 04:52:48,848][134294] Updated weights for policy 0, policy_version 102984 (0.0026) [2025-01-04 04:52:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 14981.6). Total num frames: 421822464. Throughput: 0: 3680.7. Samples: 94620206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:48,968][134211] Avg episode reward: [(0, '8.677')] [2025-01-04 04:52:51,636][134294] Updated weights for policy 0, policy_version 102994 (0.0021) [2025-01-04 04:52:53,614][134294] Updated weights for policy 0, policy_version 103004 (0.0014) [2025-01-04 04:52:53,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14813.9, 300 sec: 14967.8). Total num frames: 421908480. Throughput: 0: 3688.2. Samples: 94642024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:53,968][134211] Avg episode reward: [(0, '8.092')] [2025-01-04 04:52:55,520][134294] Updated weights for policy 0, policy_version 103014 (0.0014) [2025-01-04 04:52:57,381][134294] Updated weights for policy 0, policy_version 103024 (0.0013) [2025-01-04 04:52:58,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15564.8, 300 sec: 14953.9). Total num frames: 422014976. Throughput: 0: 3933.3. Samples: 94674076. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:52:58,968][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 04:52:59,704][134294] Updated weights for policy 0, policy_version 103034 (0.0020) [2025-01-04 04:53:02,872][134294] Updated weights for policy 0, policy_version 103044 (0.0028) [2025-01-04 04:53:03,968][134211] Fps is (10 sec: 17202.5, 60 sec: 15223.5, 300 sec: 14856.7). Total num frames: 422080512. Throughput: 0: 3819.7. Samples: 94684934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:03,969][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 04:53:06,117][134294] Updated weights for policy 0, policy_version 103054 (0.0027) [2025-01-04 04:53:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15223.6, 300 sec: 14842.9). Total num frames: 422141952. Throughput: 0: 3821.4. Samples: 94703980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:08,968][134211] Avg episode reward: [(0, '8.349')] [2025-01-04 04:53:09,319][134294] Updated weights for policy 0, policy_version 103064 (0.0026) [2025-01-04 04:53:12,557][134294] Updated weights for policy 0, policy_version 103074 (0.0026) [2025-01-04 04:53:13,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 422207488. Throughput: 0: 3816.3. Samples: 94723074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:13,968][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 04:53:14,052][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103079_422211584.pth... 
[2025-01-04 04:53:14,120][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000102205_418631680.pth [2025-01-04 04:53:15,635][134294] Updated weights for policy 0, policy_version 103084 (0.0027) [2025-01-04 04:53:18,906][134294] Updated weights for policy 0, policy_version 103094 (0.0025) [2025-01-04 04:53:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 422273024. Throughput: 0: 3813.0. Samples: 94732874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:18,968][134211] Avg episode reward: [(0, '7.141')] [2025-01-04 04:53:21,873][134294] Updated weights for policy 0, policy_version 103104 (0.0024) [2025-01-04 04:53:23,957][134294] Updated weights for policy 0, policy_version 103114 (0.0013) [2025-01-04 04:53:23,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 422354944. Throughput: 0: 3617.5. Samples: 94753190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:23,968][134211] Avg episode reward: [(0, '7.694')] [2025-01-04 04:53:26,290][134294] Updated weights for policy 0, policy_version 103124 (0.0019) [2025-01-04 04:53:28,968][134211] Fps is (10 sec: 15565.0, 60 sec: 15087.0, 300 sec: 14940.0). Total num frames: 422428672. Throughput: 0: 3743.0. Samples: 94778622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:28,968][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 04:53:29,342][134294] Updated weights for policy 0, policy_version 103134 (0.0026) [2025-01-04 04:53:32,646][134294] Updated weights for policy 0, policy_version 103144 (0.0024) [2025-01-04 04:53:33,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 422502400. Throughput: 0: 3725.9. Samples: 94787870. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:33,968][134211] Avg episode reward: [(0, '8.514')] [2025-01-04 04:53:34,562][134294] Updated weights for policy 0, policy_version 103154 (0.0012) [2025-01-04 04:53:36,584][134294] Updated weights for policy 0, policy_version 103164 (0.0013) [2025-01-04 04:53:38,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14950.4, 300 sec: 15009.4). Total num frames: 422588416. Throughput: 0: 3851.5. Samples: 94815342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:38,968][134211] Avg episode reward: [(0, '8.615')] [2025-01-04 04:53:39,962][134294] Updated weights for policy 0, policy_version 103174 (0.0026) [2025-01-04 04:53:43,388][134294] Updated weights for policy 0, policy_version 103184 (0.0027) [2025-01-04 04:53:43,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14813.8, 300 sec: 14967.9). Total num frames: 422645760. Throughput: 0: 3531.2. Samples: 94832980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:43,968][134211] Avg episode reward: [(0, '7.627')] [2025-01-04 04:53:45,999][134294] Updated weights for policy 0, policy_version 103194 (0.0018) [2025-01-04 04:53:48,499][134294] Updated weights for policy 0, policy_version 103204 (0.0021) [2025-01-04 04:53:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14912.2). Total num frames: 422727680. Throughput: 0: 3551.1. Samples: 94844734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:48,969][134211] Avg episode reward: [(0, '7.929')] [2025-01-04 04:53:51,818][134294] Updated weights for policy 0, policy_version 103214 (0.0028) [2025-01-04 04:53:53,871][134294] Updated weights for policy 0, policy_version 103224 (0.0014) [2025-01-04 04:53:53,967][134211] Fps is (10 sec: 15975.0, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 422805504. Throughput: 0: 3598.6. Samples: 94865918. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:53,968][134211] Avg episode reward: [(0, '8.697')] [2025-01-04 04:53:55,887][134294] Updated weights for policy 0, policy_version 103234 (0.0013) [2025-01-04 04:53:57,780][134294] Updated weights for policy 0, policy_version 103244 (0.0013) [2025-01-04 04:53:58,968][134211] Fps is (10 sec: 18432.5, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 422912000. Throughput: 0: 3866.8. Samples: 94897078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:53:58,968][134211] Avg episode reward: [(0, '8.731')] [2025-01-04 04:53:59,920][134294] Updated weights for policy 0, policy_version 103254 (0.0017) [2025-01-04 04:54:03,048][134294] Updated weights for policy 0, policy_version 103264 (0.0027) [2025-01-04 04:54:03,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14950.4, 300 sec: 14967.8). Total num frames: 422977536. Throughput: 0: 3928.9. Samples: 94909676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:54:03,968][134211] Avg episode reward: [(0, '8.076')] [2025-01-04 04:54:06,249][134294] Updated weights for policy 0, policy_version 103274 (0.0025) [2025-01-04 04:54:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 423043072. Throughput: 0: 3905.1. Samples: 94928922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:54:08,968][134211] Avg episode reward: [(0, '7.898')] [2025-01-04 04:54:09,384][134294] Updated weights for policy 0, policy_version 103284 (0.0027) [2025-01-04 04:54:12,447][134294] Updated weights for policy 0, policy_version 103294 (0.0024) [2025-01-04 04:54:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.6, 300 sec: 14981.6). Total num frames: 423108608. Throughput: 0: 3772.9. Samples: 94948402. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:54:13,968][134211] Avg episode reward: [(0, '7.771')] [2025-01-04 04:54:15,538][134294] Updated weights for policy 0, policy_version 103304 (0.0026) [2025-01-04 04:54:18,818][134294] Updated weights for policy 0, policy_version 103314 (0.0024) [2025-01-04 04:54:18,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15018.6, 300 sec: 14940.0). Total num frames: 423174144. Throughput: 0: 3796.9. Samples: 94958730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:54:18,969][134211] Avg episode reward: [(0, '7.219')] [2025-01-04 04:54:22,144][134294] Updated weights for policy 0, policy_version 103324 (0.0026) [2025-01-04 04:54:23,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14677.2, 300 sec: 14787.2). Total num frames: 423235584. Throughput: 0: 3591.3. Samples: 94976950. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:23,970][134211] Avg episode reward: [(0, '8.071')] [2025-01-04 04:54:25,304][134294] Updated weights for policy 0, policy_version 103334 (0.0026) [2025-01-04 04:54:28,286][134294] Updated weights for policy 0, policy_version 103344 (0.0027) [2025-01-04 04:54:28,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14609.0, 300 sec: 14787.3). Total num frames: 423305216. Throughput: 0: 3650.8. Samples: 94997266. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:28,969][134211] Avg episode reward: [(0, '7.731')] [2025-01-04 04:54:31,158][134294] Updated weights for policy 0, policy_version 103354 (0.0028) [2025-01-04 04:54:33,317][134294] Updated weights for policy 0, policy_version 103364 (0.0015) [2025-01-04 04:54:33,968][134211] Fps is (10 sec: 15565.3, 60 sec: 14813.9, 300 sec: 14884.5). Total num frames: 423391232. Throughput: 0: 3619.3. Samples: 95007600. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:33,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 04:54:35,217][134294] Updated weights for policy 0, policy_version 103374 (0.0013) [2025-01-04 04:54:37,108][134294] Updated weights for policy 0, policy_version 103384 (0.0013) [2025-01-04 04:54:38,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15155.3, 300 sec: 15037.2). Total num frames: 423497728. Throughput: 0: 3842.8. Samples: 95038844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:38,968][134211] Avg episode reward: [(0, '8.926')] [2025-01-04 04:54:38,999][134294] Updated weights for policy 0, policy_version 103394 (0.0013) [2025-01-04 04:54:40,872][134294] Updated weights for policy 0, policy_version 103404 (0.0013) [2025-01-04 04:54:43,621][134294] Updated weights for policy 0, policy_version 103414 (0.0025) [2025-01-04 04:54:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15633.1, 300 sec: 14981.6). Total num frames: 423583744. Throughput: 0: 3790.6. Samples: 95067654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:43,969][134211] Avg episode reward: [(0, '8.482')] [2025-01-04 04:54:47,063][134294] Updated weights for policy 0, policy_version 103424 (0.0032) [2025-01-04 04:54:48,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15291.8, 300 sec: 14842.8). Total num frames: 423645184. Throughput: 0: 3719.2. Samples: 95077038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:48,968][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 04:54:50,357][134294] Updated weights for policy 0, policy_version 103434 (0.0027) [2025-01-04 04:54:53,743][134294] Updated weights for policy 0, policy_version 103444 (0.0028) [2025-01-04 04:54:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 423706624. Throughput: 0: 3695.2. Samples: 95095206. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:53,968][134211] Avg episode reward: [(0, '8.138')] [2025-01-04 04:54:56,749][134294] Updated weights for policy 0, policy_version 103454 (0.0027) [2025-01-04 04:54:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 14828.9). Total num frames: 423776256. Throughput: 0: 3694.5. Samples: 95114656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:54:58,968][134211] Avg episode reward: [(0, '8.418')] [2025-01-04 04:54:59,813][134294] Updated weights for policy 0, policy_version 103464 (0.0025) [2025-01-04 04:55:02,900][134294] Updated weights for policy 0, policy_version 103474 (0.0022) [2025-01-04 04:55:03,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14404.2, 300 sec: 14828.9). Total num frames: 423841792. Throughput: 0: 3687.8. Samples: 95124684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:03,969][134211] Avg episode reward: [(0, '7.793')] [2025-01-04 04:55:05,968][134294] Updated weights for policy 0, policy_version 103484 (0.0023) [2025-01-04 04:55:08,952][134294] Updated weights for policy 0, policy_version 103494 (0.0024) [2025-01-04 04:55:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.6, 300 sec: 14828.9). Total num frames: 423911424. Throughput: 0: 3735.2. Samples: 95145032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:08,968][134211] Avg episode reward: [(0, '9.113')] [2025-01-04 04:55:11,921][134294] Updated weights for policy 0, policy_version 103504 (0.0023) [2025-01-04 04:55:13,968][134211] Fps is (10 sec: 14746.3, 60 sec: 14677.4, 300 sec: 14884.4). Total num frames: 423989248. Throughput: 0: 3763.9. Samples: 95166640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:13,968][134211] Avg episode reward: [(0, '8.766')] [2025-01-04 04:55:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103513_423989248.pth... 
[2025-01-04 04:55:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000102647_420442112.pth [2025-01-04 04:55:14,100][134294] Updated weights for policy 0, policy_version 103514 (0.0016) [2025-01-04 04:55:16,034][134294] Updated weights for policy 0, policy_version 103524 (0.0014) [2025-01-04 04:55:18,012][134294] Updated weights for policy 0, policy_version 103534 (0.0012) [2025-01-04 04:55:18,968][134211] Fps is (10 sec: 18022.6, 60 sec: 15291.8, 300 sec: 15023.3). Total num frames: 424091648. Throughput: 0: 3891.5. Samples: 95182718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:18,968][134211] Avg episode reward: [(0, '7.740')] [2025-01-04 04:55:20,173][134294] Updated weights for policy 0, policy_version 103544 (0.0016) [2025-01-04 04:55:23,536][134294] Updated weights for policy 0, policy_version 103554 (0.0028) [2025-01-04 04:55:23,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15428.3, 300 sec: 14967.7). Total num frames: 424161280. Throughput: 0: 3770.9. Samples: 95208534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:23,968][134211] Avg episode reward: [(0, '7.104')] [2025-01-04 04:55:27,027][134294] Updated weights for policy 0, policy_version 103564 (0.0027) [2025-01-04 04:55:28,971][134211] Fps is (10 sec: 12693.5, 60 sec: 15222.7, 300 sec: 14801.0). Total num frames: 424218624. Throughput: 0: 3519.0. Samples: 95226020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:28,971][134211] Avg episode reward: [(0, '7.639')] [2025-01-04 04:55:30,363][134294] Updated weights for policy 0, policy_version 103574 (0.0029) [2025-01-04 04:55:33,419][134294] Updated weights for policy 0, policy_version 103584 (0.0025) [2025-01-04 04:55:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14882.1, 300 sec: 14759.5). Total num frames: 424284160. Throughput: 0: 3528.7. Samples: 95235828. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:33,968][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 04:55:36,289][134294] Updated weights for policy 0, policy_version 103594 (0.0023) [2025-01-04 04:55:38,581][134294] Updated weights for policy 0, policy_version 103604 (0.0014) [2025-01-04 04:55:38,968][134211] Fps is (10 sec: 14750.3, 60 sec: 14472.5, 300 sec: 14815.0). Total num frames: 424366080. Throughput: 0: 3606.7. Samples: 95257506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:38,968][134211] Avg episode reward: [(0, '9.030')] [2025-01-04 04:55:41,148][134294] Updated weights for policy 0, policy_version 103614 (0.0019) [2025-01-04 04:55:43,969][134211] Fps is (10 sec: 15154.1, 60 sec: 14199.3, 300 sec: 14828.9). Total num frames: 424435712. Throughput: 0: 3688.8. Samples: 95280656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:43,970][134211] Avg episode reward: [(0, '8.326')] [2025-01-04 04:55:44,381][134294] Updated weights for policy 0, policy_version 103624 (0.0026) [2025-01-04 04:55:47,441][134294] Updated weights for policy 0, policy_version 103634 (0.0027) [2025-01-04 04:55:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14267.7, 300 sec: 14842.8). Total num frames: 424501248. Throughput: 0: 3682.0. Samples: 95290374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:48,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 04:55:50,221][134294] Updated weights for policy 0, policy_version 103644 (0.0016) [2025-01-04 04:55:52,756][134294] Updated weights for policy 0, policy_version 103654 (0.0020) [2025-01-04 04:55:53,968][134211] Fps is (10 sec: 14337.1, 60 sec: 14540.8, 300 sec: 14870.6). Total num frames: 424579072. Throughput: 0: 3734.0. Samples: 95313062. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:53,968][134211] Avg episode reward: [(0, '8.542')] [2025-01-04 04:55:55,839][134294] Updated weights for policy 0, policy_version 103664 (0.0026) [2025-01-04 04:55:58,684][134294] Updated weights for policy 0, policy_version 103674 (0.0023) [2025-01-04 04:55:58,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14609.1, 300 sec: 14884.4). Total num frames: 424652800. Throughput: 0: 3700.8. Samples: 95333176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:55:58,968][134211] Avg episode reward: [(0, '7.600')] [2025-01-04 04:56:00,658][134294] Updated weights for policy 0, policy_version 103684 (0.0014) [2025-01-04 04:56:03,609][134294] Updated weights for policy 0, policy_version 103694 (0.0026) [2025-01-04 04:56:03,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14882.3, 300 sec: 14801.1). Total num frames: 424734720. Throughput: 0: 3662.5. Samples: 95347530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:03,968][134211] Avg episode reward: [(0, '8.183')] [2025-01-04 04:56:06,628][134294] Updated weights for policy 0, policy_version 103704 (0.0025) [2025-01-04 04:56:08,968][134211] Fps is (10 sec: 14744.8, 60 sec: 14813.8, 300 sec: 14773.4). Total num frames: 424800256. Throughput: 0: 3541.4. Samples: 95367900. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:08,969][134211] Avg episode reward: [(0, '8.242')] [2025-01-04 04:56:09,737][134294] Updated weights for policy 0, policy_version 103714 (0.0024) [2025-01-04 04:56:11,619][134294] Updated weights for policy 0, policy_version 103724 (0.0014) [2025-01-04 04:56:13,450][134294] Updated weights for policy 0, policy_version 103734 (0.0013) [2025-01-04 04:56:13,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15223.5, 300 sec: 14912.2). Total num frames: 424902656. Throughput: 0: 3763.2. Samples: 95395350. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:13,968][134211] Avg episode reward: [(0, '8.300')] [2025-01-04 04:56:16,188][134294] Updated weights for policy 0, policy_version 103744 (0.0022) [2025-01-04 04:56:18,968][134211] Fps is (10 sec: 16384.6, 60 sec: 14540.8, 300 sec: 14898.3). Total num frames: 424964096. Throughput: 0: 3802.1. Samples: 95406922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:18,968][134211] Avg episode reward: [(0, '8.078')] [2025-01-04 04:56:19,700][134294] Updated weights for policy 0, policy_version 103754 (0.0027) [2025-01-04 04:56:23,232][134294] Updated weights for policy 0, policy_version 103764 (0.0027) [2025-01-04 04:56:23,969][134211] Fps is (10 sec: 12286.8, 60 sec: 14404.1, 300 sec: 14745.5). Total num frames: 425025536. Throughput: 0: 3713.5. Samples: 95424616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:23,969][134211] Avg episode reward: [(0, '8.094')] [2025-01-04 04:56:25,803][134294] Updated weights for policy 0, policy_version 103774 (0.0017) [2025-01-04 04:56:27,694][134294] Updated weights for policy 0, policy_version 103784 (0.0014) [2025-01-04 04:56:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15087.8, 300 sec: 14801.2). Total num frames: 425123840. Throughput: 0: 3782.2. Samples: 95450850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:28,968][134211] Avg episode reward: [(0, '7.631')] [2025-01-04 04:56:29,553][134294] Updated weights for policy 0, policy_version 103794 (0.0013) [2025-01-04 04:56:31,430][134294] Updated weights for policy 0, policy_version 103804 (0.0013) [2025-01-04 04:56:33,575][134294] Updated weights for policy 0, policy_version 103814 (0.0017) [2025-01-04 04:56:33,968][134211] Fps is (10 sec: 20071.9, 60 sec: 15701.3, 300 sec: 14953.9). Total num frames: 425226240. Throughput: 0: 3928.2. Samples: 95467144. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:33,969][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 04:56:37,024][134294] Updated weights for policy 0, policy_version 103824 (0.0029) [2025-01-04 04:56:38,968][134211] Fps is (10 sec: 15973.7, 60 sec: 15291.6, 300 sec: 14953.9). Total num frames: 425283584. Throughput: 0: 3920.0. Samples: 95489462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:38,969][134211] Avg episode reward: [(0, '8.009')] [2025-01-04 04:56:40,257][134294] Updated weights for policy 0, policy_version 103834 (0.0025) [2025-01-04 04:56:43,293][134294] Updated weights for policy 0, policy_version 103844 (0.0025) [2025-01-04 04:56:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15291.9, 300 sec: 14967.7). Total num frames: 425353216. Throughput: 0: 3910.0. Samples: 95509128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:43,968][134211] Avg episode reward: [(0, '7.657')] [2025-01-04 04:56:46,418][134294] Updated weights for policy 0, policy_version 103854 (0.0025) [2025-01-04 04:56:48,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15223.4, 300 sec: 14898.3). Total num frames: 425414656. Throughput: 0: 3811.8. Samples: 95519064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:48,969][134211] Avg episode reward: [(0, '8.586')] [2025-01-04 04:56:49,870][134294] Updated weights for policy 0, policy_version 103864 (0.0026) [2025-01-04 04:56:53,145][134294] Updated weights for policy 0, policy_version 103874 (0.0028) [2025-01-04 04:56:53,969][134211] Fps is (10 sec: 12286.8, 60 sec: 14950.1, 300 sec: 14898.3). Total num frames: 425476096. Throughput: 0: 3760.1. Samples: 95537106. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:53,969][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 04:56:55,622][134294] Updated weights for policy 0, policy_version 103884 (0.0020) [2025-01-04 04:56:57,470][134294] Updated weights for policy 0, policy_version 103894 (0.0015) [2025-01-04 04:56:58,968][134211] Fps is (10 sec: 16385.0, 60 sec: 15428.3, 300 sec: 14953.9). Total num frames: 425578496. Throughput: 0: 3759.5. Samples: 95564528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:56:58,968][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 04:56:59,621][134294] Updated weights for policy 0, policy_version 103904 (0.0017) [2025-01-04 04:57:02,563][134294] Updated weights for policy 0, policy_version 103914 (0.0025) [2025-01-04 04:57:03,968][134211] Fps is (10 sec: 17204.9, 60 sec: 15223.4, 300 sec: 14981.7). Total num frames: 425648128. Throughput: 0: 3769.3. Samples: 95576542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:57:03,969][134211] Avg episode reward: [(0, '8.130')] [2025-01-04 04:57:05,723][134294] Updated weights for policy 0, policy_version 103924 (0.0026) [2025-01-04 04:57:08,884][134294] Updated weights for policy 0, policy_version 103934 (0.0025) [2025-01-04 04:57:08,968][134211] Fps is (10 sec: 13516.0, 60 sec: 15223.4, 300 sec: 14926.1). Total num frames: 425713664. Throughput: 0: 3816.3. Samples: 95596348. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:57:08,969][134211] Avg episode reward: [(0, '7.464')] [2025-01-04 04:57:12,098][134294] Updated weights for policy 0, policy_version 103944 (0.0025) [2025-01-04 04:57:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14813.9, 300 sec: 14898.3). Total num frames: 425791488. Throughput: 0: 3702.2. Samples: 95617450. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:57:13,968][134211] Avg episode reward: [(0, '7.155')] [2025-01-04 04:57:14,051][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103954_425795584.pth... [2025-01-04 04:57:14,052][134294] Updated weights for policy 0, policy_version 103954 (0.0013) [2025-01-04 04:57:14,090][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103079_422211584.pth [2025-01-04 04:57:15,941][134294] Updated weights for policy 0, policy_version 103964 (0.0014) [2025-01-04 04:57:17,932][134294] Updated weights for policy 0, policy_version 103974 (0.0014) [2025-01-04 04:57:18,967][134211] Fps is (10 sec: 18433.3, 60 sec: 15564.9, 300 sec: 15065.0). Total num frames: 425897984. Throughput: 0: 3697.9. Samples: 95633546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 04:57:18,968][134211] Avg episode reward: [(0, '7.402')] [2025-01-04 04:57:19,941][134294] Updated weights for policy 0, policy_version 103984 (0.0014) [2025-01-04 04:57:22,615][134294] Updated weights for policy 0, policy_version 103994 (0.0021) [2025-01-04 04:57:23,968][134211] Fps is (10 sec: 18022.1, 60 sec: 15769.8, 300 sec: 15078.8). Total num frames: 425971712. Throughput: 0: 3827.9. Samples: 95661716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:23,968][134211] Avg episode reward: [(0, '7.670')] [2025-01-04 04:57:26,097][134294] Updated weights for policy 0, policy_version 104004 (0.0028) [2025-01-04 04:57:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15155.1, 300 sec: 14995.5). Total num frames: 426033152. Throughput: 0: 3798.8. Samples: 95680076. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:28,968][134211] Avg episode reward: [(0, '8.020')] [2025-01-04 04:57:29,250][134294] Updated weights for policy 0, policy_version 104014 (0.0028) [2025-01-04 04:57:32,386][134294] Updated weights for policy 0, policy_version 104024 (0.0029) [2025-01-04 04:57:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14540.8, 300 sec: 14940.0). Total num frames: 426098688. Throughput: 0: 3791.2. Samples: 95689666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:33,968][134211] Avg episode reward: [(0, '7.660')] [2025-01-04 04:57:35,425][134294] Updated weights for policy 0, policy_version 104034 (0.0026) [2025-01-04 04:57:38,788][134294] Updated weights for policy 0, policy_version 104044 (0.0026) [2025-01-04 04:57:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14677.4, 300 sec: 14940.0). Total num frames: 426164224. Throughput: 0: 3832.4. Samples: 95709560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:38,969][134211] Avg episode reward: [(0, '7.722')] [2025-01-04 04:57:42,254][134294] Updated weights for policy 0, policy_version 104054 (0.0024) [2025-01-04 04:57:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14940.0). Total num frames: 426229760. Throughput: 0: 3613.0. Samples: 95727112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:43,968][134211] Avg episode reward: [(0, '8.761')] [2025-01-04 04:57:44,767][134294] Updated weights for policy 0, policy_version 104064 (0.0018) [2025-01-04 04:57:46,772][134294] Updated weights for policy 0, policy_version 104074 (0.0015) [2025-01-04 04:57:48,778][134294] Updated weights for policy 0, policy_version 104084 (0.0014) [2025-01-04 04:57:48,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15223.6, 300 sec: 14981.6). Total num frames: 426328064. Throughput: 0: 3677.0. Samples: 95742008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:48,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 04:57:50,766][134294] Updated weights for policy 0, policy_version 104094 (0.0015) [2025-01-04 04:57:52,848][134294] Updated weights for policy 0, policy_version 104104 (0.0014) [2025-01-04 04:57:53,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15769.9, 300 sec: 14940.0). Total num frames: 426422272. Throughput: 0: 3919.6. Samples: 95772730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:53,968][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 04:57:56,040][134294] Updated weights for policy 0, policy_version 104114 (0.0026) [2025-01-04 04:57:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15086.9, 300 sec: 14926.1). Total num frames: 426483712. Throughput: 0: 3890.1. Samples: 95792506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:57:58,968][134211] Avg episode reward: [(0, '8.105')] [2025-01-04 04:57:59,339][134294] Updated weights for policy 0, policy_version 104124 (0.0031) [2025-01-04 04:58:02,573][134294] Updated weights for policy 0, policy_version 104134 (0.0024) [2025-01-04 04:58:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15018.7, 300 sec: 14940.0). Total num frames: 426549248. Throughput: 0: 3744.3. Samples: 95802042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:58:03,968][134211] Avg episode reward: [(0, '7.780')] [2025-01-04 04:58:05,579][134294] Updated weights for policy 0, policy_version 104144 (0.0026) [2025-01-04 04:58:08,654][134294] Updated weights for policy 0, policy_version 104154 (0.0027) [2025-01-04 04:58:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15087.0, 300 sec: 14953.9). Total num frames: 426618880. Throughput: 0: 3563.9. Samples: 95822092. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:58:08,968][134211] Avg episode reward: [(0, '8.456')] [2025-01-04 04:58:11,584][134294] Updated weights for policy 0, policy_version 104164 (0.0024) [2025-01-04 04:58:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 426684416. Throughput: 0: 3601.9. Samples: 95842160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:58:13,968][134211] Avg episode reward: [(0, '8.537')] [2025-01-04 04:58:14,725][134294] Updated weights for policy 0, policy_version 104174 (0.0026) [2025-01-04 04:58:18,070][134294] Updated weights for policy 0, policy_version 104184 (0.0025) [2025-01-04 04:58:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.1, 300 sec: 14884.4). Total num frames: 426745856. Throughput: 0: 3611.6. Samples: 95852186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:58:18,968][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 04:58:21,161][134294] Updated weights for policy 0, policy_version 104194 (0.0026) [2025-01-04 04:58:23,316][134294] Updated weights for policy 0, policy_version 104204 (0.0015) [2025-01-04 04:58:23,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14336.0, 300 sec: 14926.1). Total num frames: 426831872. Throughput: 0: 3624.8. Samples: 95872676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:23,968][134211] Avg episode reward: [(0, '8.268')] [2025-01-04 04:58:25,240][134294] Updated weights for policy 0, policy_version 104214 (0.0015) [2025-01-04 04:58:27,135][134294] Updated weights for policy 0, policy_version 104224 (0.0015) [2025-01-04 04:58:28,968][134211] Fps is (10 sec: 18841.9, 60 sec: 15018.7, 300 sec: 15023.3). Total num frames: 426934272. Throughput: 0: 3927.0. Samples: 95903826. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:28,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 04:58:29,469][134294] Updated weights for policy 0, policy_version 104234 (0.0019) [2025-01-04 04:58:32,697][134294] Updated weights for policy 0, policy_version 104244 (0.0027) [2025-01-04 04:58:33,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15018.6, 300 sec: 14953.8). Total num frames: 426999808. Throughput: 0: 3825.7. Samples: 95914168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:33,969][134211] Avg episode reward: [(0, '7.953')] [2025-01-04 04:58:35,787][134294] Updated weights for policy 0, policy_version 104254 (0.0029) [2025-01-04 04:58:38,924][134294] Updated weights for policy 0, policy_version 104264 (0.0026) [2025-01-04 04:58:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.7, 300 sec: 14981.6). Total num frames: 427065344. Throughput: 0: 3581.4. Samples: 95933892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:38,968][134211] Avg episode reward: [(0, '7.819')] [2025-01-04 04:58:41,913][134294] Updated weights for policy 0, policy_version 104274 (0.0025) [2025-01-04 04:58:43,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 427130880. Throughput: 0: 3584.7. Samples: 95953818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:43,969][134211] Avg episode reward: [(0, '8.035')] [2025-01-04 04:58:45,056][134294] Updated weights for policy 0, policy_version 104284 (0.0026) [2025-01-04 04:58:48,224][134294] Updated weights for policy 0, policy_version 104294 (0.0028) [2025-01-04 04:58:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14884.4). Total num frames: 427196416. Throughput: 0: 3590.8. Samples: 95963628. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:48,968][134211] Avg episode reward: [(0, '8.284')] [2025-01-04 04:58:51,224][134294] Updated weights for policy 0, policy_version 104304 (0.0022) [2025-01-04 04:58:53,329][134294] Updated weights for policy 0, policy_version 104314 (0.0014) [2025-01-04 04:58:53,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14336.0, 300 sec: 14815.0). Total num frames: 427282432. Throughput: 0: 3621.4. Samples: 95985054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:53,968][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 04:58:55,299][134294] Updated weights for policy 0, policy_version 104324 (0.0013) [2025-01-04 04:58:57,210][134294] Updated weights for policy 0, policy_version 104334 (0.0012) [2025-01-04 04:58:58,967][134211] Fps is (10 sec: 19251.6, 60 sec: 15087.0, 300 sec: 14953.9). Total num frames: 427388928. Throughput: 0: 3873.3. Samples: 96016458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:58:58,968][134211] Avg episode reward: [(0, '6.979')] [2025-01-04 04:58:59,111][134294] Updated weights for policy 0, policy_version 104344 (0.0014) [2025-01-04 04:59:01,007][134294] Updated weights for policy 0, policy_version 104354 (0.0016) [2025-01-04 04:59:03,484][134294] Updated weights for policy 0, policy_version 104364 (0.0021) [2025-01-04 04:59:03,968][134211] Fps is (10 sec: 19660.3, 60 sec: 15496.5, 300 sec: 15037.2). Total num frames: 427479040. Throughput: 0: 4013.0. Samples: 96032772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:59:03,969][134211] Avg episode reward: [(0, '8.087')] [2025-01-04 04:59:06,834][134294] Updated weights for policy 0, policy_version 104374 (0.0029) [2025-01-04 04:59:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 427540480. Throughput: 0: 4011.4. Samples: 96053186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:59:08,968][134211] Avg episode reward: [(0, '7.459')] [2025-01-04 04:59:10,124][134294] Updated weights for policy 0, policy_version 104384 (0.0024) [2025-01-04 04:59:13,365][134294] Updated weights for policy 0, policy_version 104394 (0.0025) [2025-01-04 04:59:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 427606016. Throughput: 0: 3739.3. Samples: 96072094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:59:13,968][134211] Avg episode reward: [(0, '7.805')] [2025-01-04 04:59:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000104396_427606016.pth... [2025-01-04 04:59:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103513_423989248.pth [2025-01-04 04:59:16,596][134294] Updated weights for policy 0, policy_version 104404 (0.0027) [2025-01-04 04:59:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15291.7, 300 sec: 15009.4). Total num frames: 427663360. Throughput: 0: 3719.6. Samples: 96081550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:59:18,968][134211] Avg episode reward: [(0, '8.002')] [2025-01-04 04:59:19,996][134294] Updated weights for policy 0, policy_version 104414 (0.0026) [2025-01-04 04:59:23,266][134294] Updated weights for policy 0, policy_version 104424 (0.0030) [2025-01-04 04:59:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14882.2, 300 sec: 14981.6). Total num frames: 427724800. Throughput: 0: 3689.1. Samples: 96099900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 04:59:23,968][134211] Avg episode reward: [(0, '8.527')] [2025-01-04 04:59:26,346][134294] Updated weights for policy 0, policy_version 104434 (0.0025) [2025-01-04 04:59:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.0, 300 sec: 14926.1). Total num frames: 427794432. 
Throughput: 0: 3685.2. Samples: 96119652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:28,968][134211] Avg episode reward: [(0, '7.680')] [2025-01-04 04:59:29,430][134294] Updated weights for policy 0, policy_version 104444 (0.0024) [2025-01-04 04:59:32,413][134294] Updated weights for policy 0, policy_version 104454 (0.0025) [2025-01-04 04:59:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.4, 300 sec: 14801.1). Total num frames: 427864064. Throughput: 0: 3693.8. Samples: 96129848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:33,968][134211] Avg episode reward: [(0, '7.970')] [2025-01-04 04:59:35,139][134294] Updated weights for policy 0, policy_version 104464 (0.0020) [2025-01-04 04:59:37,146][134294] Updated weights for policy 0, policy_version 104474 (0.0014) [2025-01-04 04:59:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 427945984. Throughput: 0: 3771.0. Samples: 96154748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:38,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 04:59:40,391][134294] Updated weights for policy 0, policy_version 104484 (0.0028) [2025-01-04 04:59:43,390][134294] Updated weights for policy 0, policy_version 104494 (0.0020) [2025-01-04 04:59:43,967][134211] Fps is (10 sec: 15155.4, 60 sec: 14745.7, 300 sec: 14815.0). Total num frames: 428015616. Throughput: 0: 3502.8. Samples: 96174086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:43,968][134211] Avg episode reward: [(0, '7.667')] [2025-01-04 04:59:45,533][134294] Updated weights for policy 0, policy_version 104504 (0.0014) [2025-01-04 04:59:47,486][134294] Updated weights for policy 0, policy_version 104514 (0.0013) [2025-01-04 04:59:48,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15360.0, 300 sec: 14953.9). Total num frames: 428118016. Throughput: 0: 3469.4. Samples: 96188892. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:48,968][134211] Avg episode reward: [(0, '6.846')] [2025-01-04 04:59:49,499][134294] Updated weights for policy 0, policy_version 104524 (0.0013) [2025-01-04 04:59:51,512][134294] Updated weights for policy 0, policy_version 104534 (0.0017) [2025-01-04 04:59:53,561][134294] Updated weights for policy 0, policy_version 104544 (0.0014) [2025-01-04 04:59:53,971][134211] Fps is (10 sec: 20063.8, 60 sec: 15564.0, 300 sec: 15050.9). Total num frames: 428216320. Throughput: 0: 3695.2. Samples: 96219480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:53,971][134211] Avg episode reward: [(0, '8.095')] [2025-01-04 04:59:56,856][134294] Updated weights for policy 0, policy_version 104554 (0.0025) [2025-01-04 04:59:58,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14813.8, 300 sec: 15037.2). Total num frames: 428277760. Throughput: 0: 3744.2. Samples: 96240582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 04:59:58,968][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 05:00:00,183][134294] Updated weights for policy 0, policy_version 104564 (0.0030) [2025-01-04 05:00:03,296][134294] Updated weights for policy 0, policy_version 104574 (0.0027) [2025-01-04 05:00:03,968][134211] Fps is (10 sec: 12701.4, 60 sec: 14404.3, 300 sec: 15023.3). Total num frames: 428343296. Throughput: 0: 3741.9. Samples: 96249934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:03,968][134211] Avg episode reward: [(0, '7.724')] [2025-01-04 05:00:06,408][134294] Updated weights for policy 0, policy_version 104584 (0.0026) [2025-01-04 05:00:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14981.6). Total num frames: 428408832. Throughput: 0: 3772.3. Samples: 96269652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:08,968][134211] Avg episode reward: [(0, '8.163')] [2025-01-04 05:00:09,597][134294] Updated weights for policy 0, policy_version 104594 (0.0024) [2025-01-04 05:00:12,621][134294] Updated weights for policy 0, policy_version 104604 (0.0025) [2025-01-04 05:00:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.6, 300 sec: 14856.7). Total num frames: 428474368. Throughput: 0: 3775.9. Samples: 96289566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:13,968][134211] Avg episode reward: [(0, '7.329')] [2025-01-04 05:00:15,571][134294] Updated weights for policy 0, policy_version 104614 (0.0024) [2025-01-04 05:00:18,852][134294] Updated weights for policy 0, policy_version 104624 (0.0027) [2025-01-04 05:00:18,969][134211] Fps is (10 sec: 13106.2, 60 sec: 14608.9, 300 sec: 14842.8). Total num frames: 428539904. Throughput: 0: 3781.7. Samples: 96300028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:18,969][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 05:00:22,120][134294] Updated weights for policy 0, policy_version 104634 (0.0028) [2025-01-04 05:00:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.1, 300 sec: 14856.8). Total num frames: 428601344. Throughput: 0: 3639.8. Samples: 96318540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:23,968][134211] Avg episode reward: [(0, '7.189')] [2025-01-04 05:00:25,333][134294] Updated weights for policy 0, policy_version 104644 (0.0028) [2025-01-04 05:00:28,283][134294] Updated weights for policy 0, policy_version 104654 (0.0025) [2025-01-04 05:00:28,967][134211] Fps is (10 sec: 13108.5, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 428670976. Throughput: 0: 3660.2. Samples: 96338796. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:28,968][134211] Avg episode reward: [(0, '8.103')] [2025-01-04 05:00:30,232][134294] Updated weights for policy 0, policy_version 104664 (0.0013) [2025-01-04 05:00:32,155][134294] Updated weights for policy 0, policy_version 104674 (0.0014) [2025-01-04 05:00:33,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15291.7, 300 sec: 14967.7). Total num frames: 428781568. Throughput: 0: 3674.2. Samples: 96354232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:33,968][134211] Avg episode reward: [(0, '7.471')] [2025-01-04 05:00:34,004][134294] Updated weights for policy 0, policy_version 104684 (0.0013) [2025-01-04 05:00:35,927][134294] Updated weights for policy 0, policy_version 104694 (0.0014) [2025-01-04 05:00:38,605][134294] Updated weights for policy 0, policy_version 104704 (0.0021) [2025-01-04 05:00:38,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15360.0, 300 sec: 15023.3). Total num frames: 428867584. Throughput: 0: 3692.8. Samples: 96385644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:38,969][134211] Avg episode reward: [(0, '7.646')] [2025-01-04 05:00:42,174][134294] Updated weights for policy 0, policy_version 104714 (0.0027) [2025-01-04 05:00:43,968][134211] Fps is (10 sec: 14745.9, 60 sec: 15223.4, 300 sec: 15009.4). Total num frames: 428929024. Throughput: 0: 3622.6. Samples: 96403600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:43,968][134211] Avg episode reward: [(0, '8.085')] [2025-01-04 05:00:45,237][134294] Updated weights for policy 0, policy_version 104724 (0.0027) [2025-01-04 05:00:48,455][134294] Updated weights for policy 0, policy_version 104734 (0.0025) [2025-01-04 05:00:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14609.0, 300 sec: 14967.7). Total num frames: 428994560. Throughput: 0: 3639.2. Samples: 96413698. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:48,969][134211] Avg episode reward: [(0, '8.178')] [2025-01-04 05:00:51,717][134294] Updated weights for policy 0, policy_version 104744 (0.0028) [2025-01-04 05:00:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13995.4, 300 sec: 14926.1). Total num frames: 429056000. Throughput: 0: 3614.1. Samples: 96432288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:53,968][134211] Avg episode reward: [(0, '8.533')] [2025-01-04 05:00:54,813][134294] Updated weights for policy 0, policy_version 104754 (0.0023) [2025-01-04 05:00:57,011][134294] Updated weights for policy 0, policy_version 104764 (0.0018) [2025-01-04 05:00:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14336.0, 300 sec: 14926.1). Total num frames: 429137920. Throughput: 0: 3689.1. Samples: 96455578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:00:58,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 05:00:59,994][134294] Updated weights for policy 0, policy_version 104774 (0.0023) [2025-01-04 05:01:03,078][134294] Updated weights for policy 0, policy_version 104784 (0.0027) [2025-01-04 05:01:03,971][134211] Fps is (10 sec: 14741.2, 60 sec: 14335.3, 300 sec: 14926.0). Total num frames: 429203456. Throughput: 0: 3685.1. Samples: 96465868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:01:03,971][134211] Avg episode reward: [(0, '8.163')] [2025-01-04 05:01:05,953][134294] Updated weights for policy 0, policy_version 104794 (0.0024) [2025-01-04 05:01:08,028][134294] Updated weights for policy 0, policy_version 104804 (0.0016) [2025-01-04 05:01:08,967][134211] Fps is (10 sec: 15565.4, 60 sec: 14745.7, 300 sec: 14884.5). Total num frames: 429293568. Throughput: 0: 3766.6. Samples: 96488038. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:01:08,968][134211] Avg episode reward: [(0, '7.462')] [2025-01-04 05:01:09,959][134294] Updated weights for policy 0, policy_version 104814 (0.0014) [2025-01-04 05:01:12,682][134294] Updated weights for policy 0, policy_version 104824 (0.0022) [2025-01-04 05:01:13,968][134211] Fps is (10 sec: 16797.9, 60 sec: 14950.3, 300 sec: 14940.0). Total num frames: 429371392. Throughput: 0: 3901.8. Samples: 96514382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:01:13,969][134211] Avg episode reward: [(0, '8.234')] [2025-01-04 05:01:14,022][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000104828_429375488.pth... [2025-01-04 05:01:14,095][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000103954_425795584.pth [2025-01-04 05:01:15,940][134294] Updated weights for policy 0, policy_version 104834 (0.0028) [2025-01-04 05:01:18,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14950.6, 300 sec: 14953.9). Total num frames: 429436928. Throughput: 0: 3769.3. Samples: 96523848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:01:18,968][134211] Avg episode reward: [(0, '8.298')] [2025-01-04 05:01:19,323][134294] Updated weights for policy 0, policy_version 104844 (0.0025) [2025-01-04 05:01:22,598][134294] Updated weights for policy 0, policy_version 104854 (0.0025) [2025-01-04 05:01:23,968][134211] Fps is (10 sec: 13517.7, 60 sec: 15086.9, 300 sec: 14856.7). Total num frames: 429506560. Throughput: 0: 3482.7. Samples: 96542364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:01:23,968][134211] Avg episode reward: [(0, '8.445')] [2025-01-04 05:01:24,726][134294] Updated weights for policy 0, policy_version 104864 (0.0014) [2025-01-04 05:01:26,576][134294] Updated weights for policy 0, policy_version 104874 (0.0014) [2025-01-04 05:01:28,406][134294] Updated weights for policy 0, policy_version 104884 (0.0013) [2025-01-04 05:01:28,967][134211] Fps is (10 sec: 18023.0, 60 sec: 15769.6, 300 sec: 14884.5). Total num frames: 429617152. Throughput: 0: 3767.1. Samples: 96573118. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:28,968][134211] Avg episode reward: [(0, '7.293')] [2025-01-04 05:01:30,320][134294] Updated weights for policy 0, policy_version 104894 (0.0014) [2025-01-04 05:01:32,231][134294] Updated weights for policy 0, policy_version 104904 (0.0015) [2025-01-04 05:01:33,968][134211] Fps is (10 sec: 21708.6, 60 sec: 15701.4, 300 sec: 15051.1). Total num frames: 429723648. Throughput: 0: 3902.7. Samples: 96589318. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:33,968][134211] Avg episode reward: [(0, '8.165')] [2025-01-04 05:01:34,214][134294] Updated weights for policy 0, policy_version 104914 (0.0014) [2025-01-04 05:01:37,286][134294] Updated weights for policy 0, policy_version 104924 (0.0026) [2025-01-04 05:01:38,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15291.7, 300 sec: 15023.3). Total num frames: 429785088. Throughput: 0: 4065.1. Samples: 96615218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:38,968][134211] Avg episode reward: [(0, '7.621')] [2025-01-04 05:01:40,784][134294] Updated weights for policy 0, policy_version 104934 (0.0026) [2025-01-04 05:01:43,969][134211] Fps is (10 sec: 11877.3, 60 sec: 15223.2, 300 sec: 15009.4). Total num frames: 429842432. Throughput: 0: 3926.6. Samples: 96632280. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:43,969][134211] Avg episode reward: [(0, '7.892')] [2025-01-04 05:01:44,411][134294] Updated weights for policy 0, policy_version 104944 (0.0026) [2025-01-04 05:01:47,986][134294] Updated weights for policy 0, policy_version 104954 (0.0027) [2025-01-04 05:01:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 15087.0, 300 sec: 14995.6). Total num frames: 429899776. Throughput: 0: 3891.5. Samples: 96640972. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:48,968][134211] Avg episode reward: [(0, '7.669')] [2025-01-04 05:01:51,522][134294] Updated weights for policy 0, policy_version 104964 (0.0025) [2025-01-04 05:01:53,971][134211] Fps is (10 sec: 11466.2, 60 sec: 15017.9, 300 sec: 14842.6). Total num frames: 429957120. Throughput: 0: 3779.4. Samples: 96658124. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:53,971][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 05:01:55,121][134294] Updated weights for policy 0, policy_version 104974 (0.0027) [2025-01-04 05:01:58,345][134294] Updated weights for policy 0, policy_version 104984 (0.0024) [2025-01-04 05:01:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 430022656. Throughput: 0: 3598.9. Samples: 96676330. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:01:58,968][134211] Avg episode reward: [(0, '7.339')] [2025-01-04 05:02:01,075][134294] Updated weights for policy 0, policy_version 104994 (0.0022) [2025-01-04 05:02:03,021][134294] Updated weights for policy 0, policy_version 105004 (0.0014) [2025-01-04 05:02:03,968][134211] Fps is (10 sec: 15979.0, 60 sec: 15224.2, 300 sec: 14926.1). Total num frames: 430116864. Throughput: 0: 3639.5. Samples: 96687628. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:02:03,968][134211] Avg episode reward: [(0, '7.708')] [2025-01-04 05:02:05,019][134294] Updated weights for policy 0, policy_version 105014 (0.0014) [2025-01-04 05:02:07,782][134294] Updated weights for policy 0, policy_version 105024 (0.0020) [2025-01-04 05:02:08,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14950.3, 300 sec: 14912.2). Total num frames: 430190592. Throughput: 0: 3850.8. Samples: 96715650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:02:08,968][134211] Avg episode reward: [(0, '7.407')] [2025-01-04 05:02:10,965][134294] Updated weights for policy 0, policy_version 105034 (0.0028) [2025-01-04 05:02:13,943][134294] Updated weights for policy 0, policy_version 105044 (0.0026) [2025-01-04 05:02:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14814.0, 300 sec: 14787.2). Total num frames: 430260224. Throughput: 0: 3612.3. Samples: 96735672. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:02:13,969][134211] Avg episode reward: [(0, '8.135')] [2025-01-04 05:02:16,947][134294] Updated weights for policy 0, policy_version 105054 (0.0025) [2025-01-04 05:02:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 430325760. Throughput: 0: 3476.8. Samples: 96745772. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:02:18,968][134211] Avg episode reward: [(0, '7.456')] [2025-01-04 05:02:20,181][134294] Updated weights for policy 0, policy_version 105064 (0.0024) [2025-01-04 05:02:22,835][134294] Updated weights for policy 0, policy_version 105074 (0.0019) [2025-01-04 05:02:23,967][134211] Fps is (10 sec: 14336.5, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 430403584. Throughput: 0: 3348.4. Samples: 96765896. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:02:23,968][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 05:02:24,865][134294] Updated weights for policy 0, policy_version 105084 (0.0015) [2025-01-04 05:02:26,810][134294] Updated weights for policy 0, policy_version 105094 (0.0013) [2025-01-04 05:02:28,671][134294] Updated weights for policy 0, policy_version 105104 (0.0013) [2025-01-04 05:02:28,967][134211] Fps is (10 sec: 18432.4, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 430510080. Throughput: 0: 3658.3. Samples: 96796898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:28,968][134211] Avg episode reward: [(0, '8.008')] [2025-01-04 05:02:31,289][134294] Updated weights for policy 0, policy_version 105114 (0.0022) [2025-01-04 05:02:33,968][134211] Fps is (10 sec: 17611.9, 60 sec: 14267.7, 300 sec: 14967.7). Total num frames: 430579712. Throughput: 0: 3747.4. Samples: 96809608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:33,969][134211] Avg episode reward: [(0, '7.883')] [2025-01-04 05:02:34,373][134294] Updated weights for policy 0, policy_version 105124 (0.0025) [2025-01-04 05:02:37,626][134294] Updated weights for policy 0, policy_version 105134 (0.0030) [2025-01-04 05:02:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14267.7, 300 sec: 14953.9). Total num frames: 430641152. Throughput: 0: 3795.7. Samples: 96828920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:38,968][134211] Avg episode reward: [(0, '9.192')] [2025-01-04 05:02:40,773][134294] Updated weights for policy 0, policy_version 105144 (0.0027) [2025-01-04 05:02:43,802][134294] Updated weights for policy 0, policy_version 105154 (0.0025) [2025-01-04 05:02:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.7, 300 sec: 14856.7). Total num frames: 430710784. Throughput: 0: 3835.0. Samples: 96848904. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:43,968][134211] Avg episode reward: [(0, '7.783')] [2025-01-04 05:02:46,696][134294] Updated weights for policy 0, policy_version 105164 (0.0023) [2025-01-04 05:02:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 430776320. Throughput: 0: 3809.8. Samples: 96859066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:48,968][134211] Avg episode reward: [(0, '8.011')] [2025-01-04 05:02:49,985][134294] Updated weights for policy 0, policy_version 105174 (0.0026) [2025-01-04 05:02:53,043][134294] Updated weights for policy 0, policy_version 105184 (0.0026) [2025-01-04 05:02:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14746.3, 300 sec: 14773.4). Total num frames: 430841856. Throughput: 0: 3622.0. Samples: 96878642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:53,968][134211] Avg episode reward: [(0, '7.096')] [2025-01-04 05:02:55,650][134294] Updated weights for policy 0, policy_version 105194 (0.0017) [2025-01-04 05:02:57,505][134294] Updated weights for policy 0, policy_version 105204 (0.0014) [2025-01-04 05:02:58,968][134211] Fps is (10 sec: 16793.6, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 430944256. Throughput: 0: 3779.3. Samples: 96905740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:02:58,968][134211] Avg episode reward: [(0, '8.131')] [2025-01-04 05:02:59,668][134294] Updated weights for policy 0, policy_version 105214 (0.0018) [2025-01-04 05:03:02,677][134294] Updated weights for policy 0, policy_version 105224 (0.0026) [2025-01-04 05:03:03,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14882.2, 300 sec: 14884.4). Total num frames: 431009792. Throughput: 0: 3818.7. Samples: 96917614. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:03:03,969][134211] Avg episode reward: [(0, '7.373')] [2025-01-04 05:03:05,799][134294] Updated weights for policy 0, policy_version 105234 (0.0026) [2025-01-04 05:03:08,828][134294] Updated weights for policy 0, policy_version 105244 (0.0025) [2025-01-04 05:03:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.9, 300 sec: 14898.3). Total num frames: 431079424. Throughput: 0: 3816.7. Samples: 96937646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:03:08,968][134211] Avg episode reward: [(0, '8.350')] [2025-01-04 05:03:11,983][134294] Updated weights for policy 0, policy_version 105254 (0.0025) [2025-01-04 05:03:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14882.2, 300 sec: 14940.0). Total num frames: 431153152. Throughput: 0: 3577.6. Samples: 96957892. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:03:13,968][134211] Avg episode reward: [(0, '7.766')] [2025-01-04 05:03:14,042][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000105263_431157248.pth... [2025-01-04 05:03:14,083][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000104396_427606016.pth [2025-01-04 05:03:14,269][134294] Updated weights for policy 0, policy_version 105264 (0.0015) [2025-01-04 05:03:16,227][134294] Updated weights for policy 0, policy_version 105274 (0.0014) [2025-01-04 05:03:18,491][134294] Updated weights for policy 0, policy_version 105284 (0.0017) [2025-01-04 05:03:18,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15360.0, 300 sec: 14967.8). Total num frames: 431247360. Throughput: 0: 3648.8. Samples: 96973802. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:03:18,971][134211] Avg episode reward: [(0, '8.220')] [2025-01-04 05:03:21,880][134294] Updated weights for policy 0, policy_version 105294 (0.0026) [2025-01-04 05:03:23,995][134211] Fps is (10 sec: 15523.1, 60 sec: 15080.1, 300 sec: 14827.6). Total num frames: 431308800. Throughput: 0: 3691.8. Samples: 96995150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:03:23,995][134211] Avg episode reward: [(0, '8.572')] [2025-01-04 05:03:25,140][134294] Updated weights for policy 0, policy_version 105304 (0.0028) [2025-01-04 05:03:27,494][134294] Updated weights for policy 0, policy_version 105314 (0.0015) [2025-01-04 05:03:28,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14745.6, 300 sec: 14898.4). Total num frames: 431394816. Throughput: 0: 3767.8. Samples: 97018456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:28,968][134211] Avg episode reward: [(0, '8.174')] [2025-01-04 05:03:29,392][134294] Updated weights for policy 0, policy_version 105324 (0.0014) [2025-01-04 05:03:31,295][134294] Updated weights for policy 0, policy_version 105334 (0.0012) [2025-01-04 05:03:33,128][134294] Updated weights for policy 0, policy_version 105344 (0.0011) [2025-01-04 05:03:33,968][134211] Fps is (10 sec: 19303.2, 60 sec: 15360.1, 300 sec: 15037.2). Total num frames: 431501312. Throughput: 0: 3903.4. Samples: 97034720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:33,968][134211] Avg episode reward: [(0, '8.801')] [2025-01-04 05:03:35,664][134294] Updated weights for policy 0, policy_version 105354 (0.0021) [2025-01-04 05:03:38,902][134294] Updated weights for policy 0, policy_version 105364 (0.0027) [2025-01-04 05:03:38,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15496.6, 300 sec: 15051.1). Total num frames: 431570944. Throughput: 0: 4045.7. Samples: 97060698. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:38,968][134211] Avg episode reward: [(0, '8.450')] [2025-01-04 05:03:42,604][134294] Updated weights for policy 0, policy_version 105374 (0.0028) [2025-01-04 05:03:43,968][134211] Fps is (10 sec: 12287.4, 60 sec: 15223.4, 300 sec: 15009.4). Total num frames: 431624192. Throughput: 0: 3821.2. Samples: 97077694. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:43,969][134211] Avg episode reward: [(0, '6.995')] [2025-01-04 05:03:46,070][134294] Updated weights for policy 0, policy_version 105384 (0.0026) [2025-01-04 05:03:48,968][134211] Fps is (10 sec: 11468.7, 60 sec: 15155.2, 300 sec: 14926.1). Total num frames: 431685632. Throughput: 0: 3755.9. Samples: 97086630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:48,968][134211] Avg episode reward: [(0, '7.865')] [2025-01-04 05:03:49,696][134294] Updated weights for policy 0, policy_version 105394 (0.0030) [2025-01-04 05:03:52,848][134294] Updated weights for policy 0, policy_version 105404 (0.0025) [2025-01-04 05:03:53,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15223.5, 300 sec: 14801.1). Total num frames: 431755264. Throughput: 0: 3705.7. Samples: 97104404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:53,968][134211] Avg episode reward: [(0, '7.493')] [2025-01-04 05:03:54,986][134294] Updated weights for policy 0, policy_version 105414 (0.0017) [2025-01-04 05:03:57,970][134294] Updated weights for policy 0, policy_version 105424 (0.0024) [2025-01-04 05:03:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 431828992. Throughput: 0: 3778.9. Samples: 97127944. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:03:58,968][134211] Avg episode reward: [(0, '8.204')] [2025-01-04 05:04:01,033][134294] Updated weights for policy 0, policy_version 105434 (0.0023) [2025-01-04 05:04:03,231][134294] Updated weights for policy 0, policy_version 105444 (0.0015) [2025-01-04 05:04:03,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 431910912. Throughput: 0: 3652.4. Samples: 97138158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:04:03,968][134211] Avg episode reward: [(0, '8.378')] [2025-01-04 05:04:05,174][134294] Updated weights for policy 0, policy_version 105454 (0.0013) [2025-01-04 05:04:07,059][134294] Updated weights for policy 0, policy_version 105464 (0.0012) [2025-01-04 05:04:08,895][134294] Updated weights for policy 0, policy_version 105474 (0.0014) [2025-01-04 05:04:08,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15701.3, 300 sec: 14967.8). Total num frames: 432021504. Throughput: 0: 3869.6. Samples: 97169180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:04:08,968][134211] Avg episode reward: [(0, '8.170')] [2025-01-04 05:04:11,569][134294] Updated weights for policy 0, policy_version 105484 (0.0025) [2025-01-04 05:04:13,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15633.0, 300 sec: 15009.4). Total num frames: 432091136. Throughput: 0: 3901.3. Samples: 97194014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:04:13,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 05:04:14,866][134294] Updated weights for policy 0, policy_version 105494 (0.0027) [2025-01-04 05:04:18,294][134294] Updated weights for policy 0, policy_version 105504 (0.0026) [2025-01-04 05:04:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15018.7, 300 sec: 14995.5). Total num frames: 432148480. Throughput: 0: 3742.1. Samples: 97203114. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:04:18,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 05:04:21,865][134294] Updated weights for policy 0, policy_version 105514 (0.0025) [2025-01-04 05:04:23,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14957.1, 300 sec: 14953.9). Total num frames: 432205824. Throughput: 0: 3546.7. Samples: 97220302. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 05:04:23,968][134211] Avg episode reward: [(0, '7.778')] [2025-01-04 05:04:25,505][134294] Updated weights for policy 0, policy_version 105524 (0.0030) [2025-01-04 05:04:28,263][134294] Updated weights for policy 0, policy_version 105534 (0.0019) [2025-01-04 05:04:28,967][134211] Fps is (10 sec: 13107.5, 60 sec: 14745.6, 300 sec: 14967.8). Total num frames: 432279552. Throughput: 0: 3602.2. Samples: 97239792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:28,968][134211] Avg episode reward: [(0, '7.662')] [2025-01-04 05:04:30,126][134294] Updated weights for policy 0, policy_version 105544 (0.0012) [2025-01-04 05:04:32,318][134294] Updated weights for policy 0, policy_version 105554 (0.0018) [2025-01-04 05:04:33,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14472.5, 300 sec: 14995.5). Total num frames: 432369664. Throughput: 0: 3759.6. Samples: 97255812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:33,968][134211] Avg episode reward: [(0, '7.848')] [2025-01-04 05:04:35,371][134294] Updated weights for policy 0, policy_version 105564 (0.0025) [2025-01-04 05:04:38,407][134294] Updated weights for policy 0, policy_version 105574 (0.0028) [2025-01-04 05:04:38,968][134211] Fps is (10 sec: 15563.7, 60 sec: 14404.1, 300 sec: 14981.6). Total num frames: 432435200. Throughput: 0: 3832.2. Samples: 97276854. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:38,969][134211] Avg episode reward: [(0, '7.840')] [2025-01-04 05:04:41,485][134294] Updated weights for policy 0, policy_version 105584 (0.0023) [2025-01-04 05:04:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 14856.7). Total num frames: 432500736. Throughput: 0: 3748.3. Samples: 97296616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:43,968][134211] Avg episode reward: [(0, '8.231')] [2025-01-04 05:04:44,592][134294] Updated weights for policy 0, policy_version 105594 (0.0027) [2025-01-04 05:04:47,735][134294] Updated weights for policy 0, policy_version 105604 (0.0025) [2025-01-04 05:04:48,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14882.2, 300 sec: 14787.4). Total num frames: 432578560. Throughput: 0: 3740.6. Samples: 97306484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:48,968][134211] Avg episode reward: [(0, '9.786')] [2025-01-04 05:04:49,038][134264] Saving new best policy, reward=9.786! [2025-01-04 05:04:49,663][134294] Updated weights for policy 0, policy_version 105614 (0.0015) [2025-01-04 05:04:51,739][134294] Updated weights for policy 0, policy_version 105624 (0.0013) [2025-01-04 05:04:53,968][134211] Fps is (10 sec: 16384.1, 60 sec: 15155.2, 300 sec: 14870.6). Total num frames: 432664576. Throughput: 0: 3659.8. Samples: 97333872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:53,968][134211] Avg episode reward: [(0, '9.101')] [2025-01-04 05:04:54,790][134294] Updated weights for policy 0, policy_version 105634 (0.0024) [2025-01-04 05:04:57,955][134294] Updated weights for policy 0, policy_version 105644 (0.0024) [2025-01-04 05:04:58,968][134211] Fps is (10 sec: 15154.2, 60 sec: 15018.5, 300 sec: 14870.5). Total num frames: 432730112. Throughput: 0: 3542.1. Samples: 97353412. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:04:58,969][134211] Avg episode reward: [(0, '8.336')] [2025-01-04 05:05:00,996][134294] Updated weights for policy 0, policy_version 105654 (0.0025) [2025-01-04 05:05:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14745.6, 300 sec: 14870.6). Total num frames: 432795648. Throughput: 0: 3568.0. Samples: 97363672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:05:03,968][134211] Avg episode reward: [(0, '8.176')] [2025-01-04 05:05:04,135][134294] Updated weights for policy 0, policy_version 105664 (0.0025) [2025-01-04 05:05:06,410][134294] Updated weights for policy 0, policy_version 105674 (0.0016) [2025-01-04 05:05:08,317][134294] Updated weights for policy 0, policy_version 105684 (0.0014) [2025-01-04 05:05:08,968][134211] Fps is (10 sec: 16384.7, 60 sec: 14540.8, 300 sec: 14981.6). Total num frames: 432893952. Throughput: 0: 3717.3. Samples: 97387582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:05:08,969][134211] Avg episode reward: [(0, '7.870')] [2025-01-04 05:05:10,243][134294] Updated weights for policy 0, policy_version 105694 (0.0013) [2025-01-04 05:05:12,157][134294] Updated weights for policy 0, policy_version 105704 (0.0013) [2025-01-04 05:05:13,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15155.2, 300 sec: 15120.5). Total num frames: 433000448. Throughput: 0: 3999.5. Samples: 97419772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:05:13,968][134211] Avg episode reward: [(0, '8.401')] [2025-01-04 05:05:14,016][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000105714_433004544.pth... 
[2025-01-04 05:05:14,016][134294] Updated weights for policy 0, policy_version 105714 (0.0012) [2025-01-04 05:05:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000104828_429375488.pth [2025-01-04 05:05:16,814][134294] Updated weights for policy 0, policy_version 105724 (0.0021) [2025-01-04 05:05:18,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15360.0, 300 sec: 15148.3). Total num frames: 433070080. Throughput: 0: 3931.2. Samples: 97432716. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:05:18,968][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 05:05:20,490][134294] Updated weights for policy 0, policy_version 105734 (0.0029) [2025-01-04 05:05:23,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15291.8, 300 sec: 15092.7). Total num frames: 433123328. Throughput: 0: 3838.4. Samples: 97449582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:05:23,968][134211] Avg episode reward: [(0, '8.158')] [2025-01-04 05:05:23,988][134294] Updated weights for policy 0, policy_version 105744 (0.0027) [2025-01-04 05:05:27,166][134294] Updated weights for policy 0, policy_version 105754 (0.0025) [2025-01-04 05:05:28,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15155.1, 300 sec: 14940.0). Total num frames: 433188864. Throughput: 0: 3823.0. Samples: 97468652. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:28,968][134211] Avg episode reward: [(0, '8.621')] [2025-01-04 05:05:30,290][134294] Updated weights for policy 0, policy_version 105764 (0.0024) [2025-01-04 05:05:33,577][134294] Updated weights for policy 0, policy_version 105774 (0.0025) [2025-01-04 05:05:33,971][134211] Fps is (10 sec: 13103.0, 60 sec: 14744.8, 300 sec: 14870.4). Total num frames: 433254400. Throughput: 0: 3820.0. Samples: 97478398. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:33,971][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 05:05:36,771][134294] Updated weights for policy 0, policy_version 105784 (0.0027) [2025-01-04 05:05:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14677.5, 300 sec: 14870.6). Total num frames: 433315840. Throughput: 0: 3628.8. Samples: 97497168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:38,968][134211] Avg episode reward: [(0, '8.030')] [2025-01-04 05:05:40,197][134294] Updated weights for policy 0, policy_version 105794 (0.0025) [2025-01-04 05:05:42,508][134294] Updated weights for policy 0, policy_version 105804 (0.0015) [2025-01-04 05:05:43,968][134211] Fps is (10 sec: 14750.4, 60 sec: 15018.7, 300 sec: 14940.0). Total num frames: 433401856. Throughput: 0: 3694.5. Samples: 97519662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:43,968][134211] Avg episode reward: [(0, '7.274')] [2025-01-04 05:05:44,599][134294] Updated weights for policy 0, policy_version 105814 (0.0013) [2025-01-04 05:05:46,509][134294] Updated weights for policy 0, policy_version 105824 (0.0012) [2025-01-04 05:05:48,444][134294] Updated weights for policy 0, policy_version 105834 (0.0013) [2025-01-04 05:05:48,968][134211] Fps is (10 sec: 18841.7, 60 sec: 15428.3, 300 sec: 15078.8). Total num frames: 433504256. Throughput: 0: 3810.1. Samples: 97535124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:48,969][134211] Avg episode reward: [(0, '7.655')] [2025-01-04 05:05:50,661][134294] Updated weights for policy 0, policy_version 105844 (0.0019) [2025-01-04 05:05:53,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15086.9, 300 sec: 15023.3). Total num frames: 433569792. Throughput: 0: 3872.7. Samples: 97561856. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:53,969][134211] Avg episode reward: [(0, '7.550')] [2025-01-04 05:05:54,285][134294] Updated weights for policy 0, policy_version 105854 (0.0027) [2025-01-04 05:05:57,473][134294] Updated weights for policy 0, policy_version 105864 (0.0025) [2025-01-04 05:05:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15087.1, 300 sec: 15023.5). Total num frames: 433635328. Throughput: 0: 3558.4. Samples: 97579900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:05:58,968][134211] Avg episode reward: [(0, '8.542')] [2025-01-04 05:06:00,659][134294] Updated weights for policy 0, policy_version 105874 (0.0027) [2025-01-04 05:06:03,721][134294] Updated weights for policy 0, policy_version 105884 (0.0027) [2025-01-04 05:06:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 14953.9). Total num frames: 433704960. Throughput: 0: 3496.9. Samples: 97590076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:06:03,968][134211] Avg episode reward: [(0, '7.553')] [2025-01-04 05:06:06,689][134294] Updated weights for policy 0, policy_version 105894 (0.0023) [2025-01-04 05:06:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14912.2). Total num frames: 433770496. Throughput: 0: 3568.4. Samples: 97610162. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:06:08,968][134211] Avg episode reward: [(0, '8.621')] [2025-01-04 05:06:09,823][134294] Updated weights for policy 0, policy_version 105904 (0.0026) [2025-01-04 05:06:12,813][134294] Updated weights for policy 0, policy_version 105914 (0.0027) [2025-01-04 05:06:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.4, 300 sec: 14912.2). Total num frames: 433836032. Throughput: 0: 3592.8. Samples: 97630328. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:06:13,968][134211] Avg episode reward: [(0, '8.068')] [2025-01-04 05:06:15,854][134294] Updated weights for policy 0, policy_version 105924 (0.0026) [2025-01-04 05:06:18,726][134294] Updated weights for policy 0, policy_version 105934 (0.0025) [2025-01-04 05:06:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14926.1). Total num frames: 433909760. Throughput: 0: 3603.5. Samples: 97640544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:06:18,968][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 05:06:20,768][134294] Updated weights for policy 0, policy_version 105944 (0.0013) [2025-01-04 05:06:22,994][134294] Updated weights for policy 0, policy_version 105954 (0.0016) [2025-01-04 05:06:23,969][134211] Fps is (10 sec: 15973.1, 60 sec: 14540.6, 300 sec: 14842.7). Total num frames: 433995776. Throughput: 0: 3770.0. Samples: 97666820. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:06:23,969][134211] Avg episode reward: [(0, '7.845')] [2025-01-04 05:06:26,221][134294] Updated weights for policy 0, policy_version 105964 (0.0027) [2025-01-04 05:06:28,968][134211] Fps is (10 sec: 15154.6, 60 sec: 14540.7, 300 sec: 14703.9). Total num frames: 434061312. Throughput: 0: 3714.2. Samples: 97686804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:28,969][134211] Avg episode reward: [(0, '8.660')] [2025-01-04 05:06:29,342][134294] Updated weights for policy 0, policy_version 105974 (0.0028) [2025-01-04 05:06:31,387][134294] Updated weights for policy 0, policy_version 105984 (0.0013) [2025-01-04 05:06:33,207][134294] Updated weights for policy 0, policy_version 105994 (0.0014) [2025-01-04 05:06:33,967][134211] Fps is (10 sec: 16795.5, 60 sec: 15156.1, 300 sec: 14842.8). Total num frames: 434163712. Throughput: 0: 3662.1. Samples: 97699916. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:33,968][134211] Avg episode reward: [(0, '8.448')] [2025-01-04 05:06:35,081][134294] Updated weights for policy 0, policy_version 106004 (0.0012) [2025-01-04 05:06:36,989][134294] Updated weights for policy 0, policy_version 106014 (0.0014) [2025-01-04 05:06:38,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15769.6, 300 sec: 14981.7). Total num frames: 434262016. Throughput: 0: 3792.5. Samples: 97732518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:38,968][134211] Avg episode reward: [(0, '8.601')] [2025-01-04 05:06:39,665][134294] Updated weights for policy 0, policy_version 106024 (0.0022) [2025-01-04 05:06:42,915][134294] Updated weights for policy 0, policy_version 106034 (0.0027) [2025-01-04 05:06:43,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15359.9, 300 sec: 14995.5). Total num frames: 434323456. Throughput: 0: 3841.7. Samples: 97752778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:43,969][134211] Avg episode reward: [(0, '8.395')] [2025-01-04 05:06:46,072][134294] Updated weights for policy 0, policy_version 106044 (0.0029) [2025-01-04 05:06:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 15023.5). Total num frames: 434388992. Throughput: 0: 3831.5. Samples: 97762492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:48,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 05:06:49,339][134294] Updated weights for policy 0, policy_version 106054 (0.0027) [2025-01-04 05:06:52,688][134294] Updated weights for policy 0, policy_version 106064 (0.0030) [2025-01-04 05:06:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14677.4, 300 sec: 15009.4). Total num frames: 434450432. Throughput: 0: 3803.4. Samples: 97781314. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:53,968][134211] Avg episode reward: [(0, '8.178')] [2025-01-04 05:06:55,850][134294] Updated weights for policy 0, policy_version 106074 (0.0024) [2025-01-04 05:06:58,767][134294] Updated weights for policy 0, policy_version 106084 (0.0024) [2025-01-04 05:06:58,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14745.4, 300 sec: 14926.1). Total num frames: 434520064. Throughput: 0: 3796.9. Samples: 97801190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:06:58,969][134211] Avg episode reward: [(0, '8.051')] [2025-01-04 05:07:01,745][134294] Updated weights for policy 0, policy_version 106094 (0.0022) [2025-01-04 05:07:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.4, 300 sec: 14898.3). Total num frames: 434585600. Throughput: 0: 3797.6. Samples: 97811436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:03,968][134211] Avg episode reward: [(0, '8.906')] [2025-01-04 05:07:04,971][134294] Updated weights for policy 0, policy_version 106104 (0.0023) [2025-01-04 05:07:07,752][134294] Updated weights for policy 0, policy_version 106114 (0.0021) [2025-01-04 05:07:08,968][134211] Fps is (10 sec: 14747.2, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 434667520. Throughput: 0: 3657.6. Samples: 97831406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:08,968][134211] Avg episode reward: [(0, '8.466')] [2025-01-04 05:07:09,737][134294] Updated weights for policy 0, policy_version 106124 (0.0014) [2025-01-04 05:07:11,665][134294] Updated weights for policy 0, policy_version 106134 (0.0014) [2025-01-04 05:07:13,533][134294] Updated weights for policy 0, policy_version 106144 (0.0013) [2025-01-04 05:07:13,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15633.1, 300 sec: 15078.8). Total num frames: 434774016. Throughput: 0: 3911.0. Samples: 97862796. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:13,968][134211] Avg episode reward: [(0, '8.622')] [2025-01-04 05:07:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000106146_434774016.pth... [2025-01-04 05:07:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000105263_431157248.pth [2025-01-04 05:07:15,479][134294] Updated weights for policy 0, policy_version 106154 (0.0013) [2025-01-04 05:07:17,351][134294] Updated weights for policy 0, policy_version 106164 (0.0013) [2025-01-04 05:07:18,968][134211] Fps is (10 sec: 21298.5, 60 sec: 16179.1, 300 sec: 15176.0). Total num frames: 434880512. Throughput: 0: 3977.6. Samples: 97878910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:18,968][134211] Avg episode reward: [(0, '8.253')] [2025-01-04 05:07:19,515][134294] Updated weights for policy 0, policy_version 106174 (0.0016) [2025-01-04 05:07:22,937][134294] Updated weights for policy 0, policy_version 106184 (0.0028) [2025-01-04 05:07:23,968][134211] Fps is (10 sec: 16383.1, 60 sec: 15701.4, 300 sec: 15009.4). Total num frames: 434937856. Throughput: 0: 3797.2. Samples: 97903396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:23,969][134211] Avg episode reward: [(0, '8.476')] [2025-01-04 05:07:26,226][134294] Updated weights for policy 0, policy_version 106194 (0.0026) [2025-01-04 05:07:28,968][134211] Fps is (10 sec: 12288.4, 60 sec: 15701.4, 300 sec: 14995.5). Total num frames: 435003392. Throughput: 0: 3763.2. Samples: 97922122. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:28,968][134211] Avg episode reward: [(0, '8.352')] [2025-01-04 05:07:29,436][134294] Updated weights for policy 0, policy_version 106204 (0.0028) [2025-01-04 05:07:32,536][134294] Updated weights for policy 0, policy_version 106214 (0.0027) [2025-01-04 05:07:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15086.7, 300 sec: 15009.4). Total num frames: 435068928. Throughput: 0: 3763.5. Samples: 97931852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:33,969][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 05:07:35,599][134294] Updated weights for policy 0, policy_version 106224 (0.0026) [2025-01-04 05:07:38,525][134294] Updated weights for policy 0, policy_version 106234 (0.0023) [2025-01-04 05:07:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.1, 300 sec: 15009.4). Total num frames: 435138560. Throughput: 0: 3798.7. Samples: 97952258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:38,968][134211] Avg episode reward: [(0, '8.146')] [2025-01-04 05:07:41,721][134294] Updated weights for policy 0, policy_version 106244 (0.0023) [2025-01-04 05:07:43,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14609.0, 300 sec: 14995.5). Total num frames: 435200000. Throughput: 0: 3779.3. Samples: 97971254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:43,969][134211] Avg episode reward: [(0, '7.760')] [2025-01-04 05:07:45,312][134294] Updated weights for policy 0, policy_version 106254 (0.0027) [2025-01-04 05:07:48,735][134294] Updated weights for policy 0, policy_version 106264 (0.0024) [2025-01-04 05:07:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14540.8, 300 sec: 14981.6). Total num frames: 435261440. Throughput: 0: 3743.5. Samples: 97979894. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:48,968][134211] Avg episode reward: [(0, '7.756')] [2025-01-04 05:07:50,965][134294] Updated weights for policy 0, policy_version 106274 (0.0015) [2025-01-04 05:07:52,983][134294] Updated weights for policy 0, policy_version 106284 (0.0013) [2025-01-04 05:07:53,967][134211] Fps is (10 sec: 15975.2, 60 sec: 15155.2, 300 sec: 14967.8). Total num frames: 435359744. Throughput: 0: 3836.4. Samples: 98004046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:53,968][134211] Avg episode reward: [(0, '7.496')] [2025-01-04 05:07:54,896][134294] Updated weights for policy 0, policy_version 106294 (0.0013) [2025-01-04 05:07:56,890][134294] Updated weights for policy 0, policy_version 106304 (0.0015) [2025-01-04 05:07:58,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15496.8, 300 sec: 15051.1). Total num frames: 435449856. Throughput: 0: 3786.5. Samples: 98033190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:07:58,968][134211] Avg episode reward: [(0, '8.964')] [2025-01-04 05:07:59,981][134294] Updated weights for policy 0, policy_version 106314 (0.0028) [2025-01-04 05:08:03,111][134294] Updated weights for policy 0, policy_version 106324 (0.0029) [2025-01-04 05:08:03,968][134211] Fps is (10 sec: 15154.7, 60 sec: 15428.2, 300 sec: 15023.3). Total num frames: 435511296. Throughput: 0: 3638.4. Samples: 98042636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:03,968][134211] Avg episode reward: [(0, '7.883')] [2025-01-04 05:08:06,321][134294] Updated weights for policy 0, policy_version 106334 (0.0024) [2025-01-04 05:08:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15155.1, 300 sec: 14995.5). Total num frames: 435576832. Throughput: 0: 3527.2. Samples: 98062118. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:08,968][134211] Avg episode reward: [(0, '8.098')] [2025-01-04 05:08:09,438][134294] Updated weights for policy 0, policy_version 106344 (0.0025) [2025-01-04 05:08:12,483][134294] Updated weights for policy 0, policy_version 106354 (0.0023) [2025-01-04 05:08:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.5, 300 sec: 14898.3). Total num frames: 435642368. Throughput: 0: 3547.7. Samples: 98081768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:13,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 05:08:15,630][134294] Updated weights for policy 0, policy_version 106364 (0.0024) [2025-01-04 05:08:18,420][134294] Updated weights for policy 0, policy_version 106374 (0.0025) [2025-01-04 05:08:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.2, 300 sec: 14927.5). Total num frames: 435712000. Throughput: 0: 3563.3. Samples: 98092200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:18,968][134211] Avg episode reward: [(0, '7.975')] [2025-01-04 05:08:21,730][134294] Updated weights for policy 0, policy_version 106384 (0.0027) [2025-01-04 05:08:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.5, 300 sec: 14842.8). Total num frames: 435773440. Throughput: 0: 3540.9. Samples: 98111600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:23,968][134211] Avg episode reward: [(0, '7.652')] [2025-01-04 05:08:24,693][134294] Updated weights for policy 0, policy_version 106394 (0.0020) [2025-01-04 05:08:26,615][134294] Updated weights for policy 0, policy_version 106404 (0.0013) [2025-01-04 05:08:28,489][134294] Updated weights for policy 0, policy_version 106414 (0.0014) [2025-01-04 05:08:28,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14609.1, 300 sec: 14842.8). Total num frames: 435879936. Throughput: 0: 3741.9. Samples: 98139638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:08:28,968][134211] Avg episode reward: [(0, '8.610')] [2025-01-04 05:08:30,345][134294] Updated weights for policy 0, policy_version 106424 (0.0013) [2025-01-04 05:08:32,218][134294] Updated weights for policy 0, policy_version 106434 (0.0014) [2025-01-04 05:08:33,968][134211] Fps is (10 sec: 21708.9, 60 sec: 15360.2, 300 sec: 14981.6). Total num frames: 435990528. Throughput: 0: 3913.1. Samples: 98155982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:33,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 05:08:34,085][134294] Updated weights for policy 0, policy_version 106444 (0.0014) [2025-01-04 05:08:36,567][134294] Updated weights for policy 0, policy_version 106454 (0.0019) [2025-01-04 05:08:38,968][134211] Fps is (10 sec: 18431.5, 60 sec: 15428.3, 300 sec: 15051.1). Total num frames: 436064256. Throughput: 0: 3997.9. Samples: 98183952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:38,969][134211] Avg episode reward: [(0, '7.774')] [2025-01-04 05:08:39,843][134294] Updated weights for policy 0, policy_version 106464 (0.0027) [2025-01-04 05:08:43,108][134294] Updated weights for policy 0, policy_version 106474 (0.0029) [2025-01-04 05:08:43,968][134211] Fps is (10 sec: 13515.8, 60 sec: 15428.2, 300 sec: 15051.0). Total num frames: 436125696. Throughput: 0: 3770.0. Samples: 98202844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:43,969][134211] Avg episode reward: [(0, '8.609')] [2025-01-04 05:08:46,148][134294] Updated weights for policy 0, policy_version 106484 (0.0029) [2025-01-04 05:08:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15564.8, 300 sec: 15051.1). Total num frames: 436195328. Throughput: 0: 3782.8. Samples: 98212864. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:48,968][134211] Avg episode reward: [(0, '8.819')] [2025-01-04 05:08:49,328][134294] Updated weights for policy 0, policy_version 106494 (0.0027) [2025-01-04 05:08:52,638][134294] Updated weights for policy 0, policy_version 106504 (0.0029) [2025-01-04 05:08:53,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14950.3, 300 sec: 15009.4). Total num frames: 436256768. Throughput: 0: 3773.3. Samples: 98231918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:53,968][134211] Avg episode reward: [(0, '7.872')] [2025-01-04 05:08:55,742][134294] Updated weights for policy 0, policy_version 106514 (0.0023) [2025-01-04 05:08:58,754][134294] Updated weights for policy 0, policy_version 106524 (0.0024) [2025-01-04 05:08:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14953.9). Total num frames: 436322304. Throughput: 0: 3782.0. Samples: 98251958. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:08:58,968][134211] Avg episode reward: [(0, '7.595')] [2025-01-04 05:09:01,818][134294] Updated weights for policy 0, policy_version 106534 (0.0024) [2025-01-04 05:09:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 436387840. Throughput: 0: 3769.4. Samples: 98261824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:03,968][134211] Avg episode reward: [(0, '8.364')] [2025-01-04 05:09:04,948][134294] Updated weights for policy 0, policy_version 106544 (0.0026) [2025-01-04 05:09:07,887][134294] Updated weights for policy 0, policy_version 106554 (0.0026) [2025-01-04 05:09:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14801.1). Total num frames: 436457472. Throughput: 0: 3786.4. Samples: 98281988. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:08,968][134211] Avg episode reward: [(0, '8.353')] [2025-01-04 05:09:10,874][134294] Updated weights for policy 0, policy_version 106564 (0.0024) [2025-01-04 05:09:13,503][134294] Updated weights for policy 0, policy_version 106574 (0.0021) [2025-01-04 05:09:13,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14882.1, 300 sec: 14870.6). Total num frames: 436535296. Throughput: 0: 3636.5. Samples: 98303280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:13,968][134211] Avg episode reward: [(0, '8.465')] [2025-01-04 05:09:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000106576_436535296.pth... [2025-01-04 05:09:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000105714_433004544.pth [2025-01-04 05:09:15,441][134294] Updated weights for policy 0, policy_version 106584 (0.0013) [2025-01-04 05:09:17,400][134294] Updated weights for policy 0, policy_version 106594 (0.0014) [2025-01-04 05:09:18,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15428.3, 300 sec: 15023.3). Total num frames: 436637696. Throughput: 0: 3628.3. Samples: 98319254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:18,968][134211] Avg episode reward: [(0, '7.318')] [2025-01-04 05:09:19,374][134294] Updated weights for policy 0, policy_version 106604 (0.0014) [2025-01-04 05:09:21,365][134294] Updated weights for policy 0, policy_version 106614 (0.0013) [2025-01-04 05:09:23,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15906.1, 300 sec: 15078.8). Total num frames: 436727808. Throughput: 0: 3686.6. Samples: 98349850. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:23,969][134211] Avg episode reward: [(0, '8.239')] [2025-01-04 05:09:24,061][134294] Updated weights for policy 0, policy_version 106624 (0.0020) [2025-01-04 05:09:27,440][134294] Updated weights for policy 0, policy_version 106634 (0.0028) [2025-01-04 05:09:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 15155.1, 300 sec: 14981.6). Total num frames: 436789248. Throughput: 0: 3689.7. Samples: 98368880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:09:28,968][134211] Avg episode reward: [(0, '8.578')] [2025-01-04 05:09:30,504][134294] Updated weights for policy 0, policy_version 106644 (0.0028) [2025-01-04 05:09:33,602][134294] Updated weights for policy 0, policy_version 106654 (0.0027) [2025-01-04 05:09:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 14995.5). Total num frames: 436858880. Throughput: 0: 3687.8. Samples: 98378814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:33,968][134211] Avg episode reward: [(0, '8.138')] [2025-01-04 05:09:36,576][134294] Updated weights for policy 0, policy_version 106664 (0.0024) [2025-01-04 05:09:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14995.5). Total num frames: 436924416. Throughput: 0: 3713.4. Samples: 98399020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:38,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 05:09:39,845][134294] Updated weights for policy 0, policy_version 106674 (0.0026) [2025-01-04 05:09:43,091][134294] Updated weights for policy 0, policy_version 106684 (0.0026) [2025-01-04 05:09:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14336.1, 300 sec: 14940.0). Total num frames: 436985856. Throughput: 0: 3687.6. Samples: 98417898. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:43,968][134211] Avg episode reward: [(0, '7.608')] [2025-01-04 05:09:46,544][134294] Updated weights for policy 0, policy_version 106694 (0.0025) [2025-01-04 05:09:48,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14336.0, 300 sec: 14884.5). Total num frames: 437055488. Throughput: 0: 3663.7. Samples: 98426690. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:48,968][134211] Avg episode reward: [(0, '7.478')] [2025-01-04 05:09:49,187][134294] Updated weights for policy 0, policy_version 106704 (0.0015) [2025-01-04 05:09:51,273][134294] Updated weights for policy 0, policy_version 106714 (0.0014) [2025-01-04 05:09:53,256][134294] Updated weights for policy 0, policy_version 106724 (0.0013) [2025-01-04 05:09:53,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14950.4, 300 sec: 14995.6). Total num frames: 437153792. Throughput: 0: 3803.5. Samples: 98453144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:53,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 05:09:55,246][134294] Updated weights for policy 0, policy_version 106734 (0.0014) [2025-01-04 05:09:57,080][134294] Updated weights for policy 0, policy_version 106744 (0.0011) [2025-01-04 05:09:58,968][134211] Fps is (10 sec: 20070.1, 60 sec: 15564.8, 300 sec: 15120.5). Total num frames: 437256192. Throughput: 0: 4038.4. Samples: 98485008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:09:58,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 05:09:59,283][134294] Updated weights for policy 0, policy_version 106754 (0.0018) [2025-01-04 05:10:03,041][134294] Updated weights for policy 0, policy_version 106764 (0.0025) [2025-01-04 05:10:03,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15428.2, 300 sec: 14981.6). Total num frames: 437313536. Throughput: 0: 3901.0. Samples: 98494798. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:03,969][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 05:10:06,179][134294] Updated weights for policy 0, policy_version 106774 (0.0024) [2025-01-04 05:10:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15360.0, 300 sec: 14842.8). Total num frames: 437379072. Throughput: 0: 3620.8. Samples: 98512784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:08,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 05:10:09,487][134294] Updated weights for policy 0, policy_version 106784 (0.0024) [2025-01-04 05:10:12,505][134294] Updated weights for policy 0, policy_version 106794 (0.0024) [2025-01-04 05:10:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.2, 300 sec: 14828.9). Total num frames: 437444608. Throughput: 0: 3631.7. Samples: 98532308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:13,969][134211] Avg episode reward: [(0, '8.468')] [2025-01-04 05:10:15,552][134294] Updated weights for policy 0, policy_version 106804 (0.0024) [2025-01-04 05:10:18,677][134294] Updated weights for policy 0, policy_version 106814 (0.0023) [2025-01-04 05:10:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.1, 300 sec: 14884.4). Total num frames: 437514240. Throughput: 0: 3639.8. Samples: 98542606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:18,968][134211] Avg episode reward: [(0, '7.932')] [2025-01-04 05:10:21,929][134294] Updated weights for policy 0, policy_version 106824 (0.0026) [2025-01-04 05:10:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.2, 300 sec: 14870.6). Total num frames: 437575680. Throughput: 0: 3616.2. Samples: 98561748. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:23,968][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 05:10:25,359][134294] Updated weights for policy 0, policy_version 106834 (0.0025) [2025-01-04 05:10:28,196][134294] Updated weights for policy 0, policy_version 106844 (0.0024) [2025-01-04 05:10:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14870.7). Total num frames: 437641216. Throughput: 0: 3627.7. Samples: 98581146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:10:28,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 05:10:30,570][134294] Updated weights for policy 0, policy_version 106854 (0.0016) [2025-01-04 05:10:32,378][134294] Updated weights for policy 0, policy_version 106864 (0.0013) [2025-01-04 05:10:33,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14813.9, 300 sec: 15023.3). Total num frames: 437747712. Throughput: 0: 3742.6. Samples: 98595108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:33,968][134211] Avg episode reward: [(0, '8.726')] [2025-01-04 05:10:34,304][134294] Updated weights for policy 0, policy_version 106874 (0.0012) [2025-01-04 05:10:36,185][134294] Updated weights for policy 0, policy_version 106884 (0.0013) [2025-01-04 05:10:38,484][134294] Updated weights for policy 0, policy_version 106894 (0.0019) [2025-01-04 05:10:38,968][134211] Fps is (10 sec: 20069.0, 60 sec: 15291.6, 300 sec: 15051.0). Total num frames: 437841920. Throughput: 0: 3875.8. Samples: 98627558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:38,969][134211] Avg episode reward: [(0, '7.866')] [2025-01-04 05:10:42,006][134294] Updated weights for policy 0, policy_version 106904 (0.0027) [2025-01-04 05:10:43,968][134211] Fps is (10 sec: 15564.0, 60 sec: 15291.7, 300 sec: 14912.2). Total num frames: 437903360. Throughput: 0: 3588.6. Samples: 98646496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:43,970][134211] Avg episode reward: [(0, '7.569')] [2025-01-04 05:10:45,209][134294] Updated weights for policy 0, policy_version 106914 (0.0028) [2025-01-04 05:10:48,331][134294] Updated weights for policy 0, policy_version 106924 (0.0024) [2025-01-04 05:10:48,968][134211] Fps is (10 sec: 12288.7, 60 sec: 15155.1, 300 sec: 14898.3). Total num frames: 437964800. Throughput: 0: 3593.3. Samples: 98656494. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:48,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 05:10:51,639][134294] Updated weights for policy 0, policy_version 106934 (0.0025) [2025-01-04 05:10:53,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14609.0, 300 sec: 14898.3). Total num frames: 438030336. Throughput: 0: 3613.1. Samples: 98675372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:53,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 05:10:54,984][134294] Updated weights for policy 0, policy_version 106944 (0.0024) [2025-01-04 05:10:58,015][134294] Updated weights for policy 0, policy_version 106954 (0.0025) [2025-01-04 05:10:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13994.7, 300 sec: 14884.5). Total num frames: 438095872. Throughput: 0: 3609.8. Samples: 98694748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:10:58,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 05:11:00,974][134294] Updated weights for policy 0, policy_version 106964 (0.0025) [2025-01-04 05:11:03,203][134294] Updated weights for policy 0, policy_version 106974 (0.0018) [2025-01-04 05:11:03,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14404.3, 300 sec: 14940.0). Total num frames: 438177792. Throughput: 0: 3611.5. Samples: 98705124. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:03,968][134211] Avg episode reward: [(0, '8.646')] [2025-01-04 05:11:05,114][134294] Updated weights for policy 0, policy_version 106984 (0.0013) [2025-01-04 05:11:06,958][134294] Updated weights for policy 0, policy_version 106994 (0.0013) [2025-01-04 05:11:08,879][134294] Updated weights for policy 0, policy_version 107004 (0.0014) [2025-01-04 05:11:08,967][134211] Fps is (10 sec: 19251.4, 60 sec: 15155.2, 300 sec: 15092.7). Total num frames: 438288384. Throughput: 0: 3875.4. Samples: 98736142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:08,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 05:11:11,345][134294] Updated weights for policy 0, policy_version 107014 (0.0023) [2025-01-04 05:11:13,968][134211] Fps is (10 sec: 18430.9, 60 sec: 15291.7, 300 sec: 15092.7). Total num frames: 438362112. Throughput: 0: 4005.2. Samples: 98761380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:13,969][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 05:11:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107022_438362112.pth... [2025-01-04 05:11:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000106146_434774016.pth [2025-01-04 05:11:14,697][134294] Updated weights for policy 0, policy_version 107024 (0.0028) [2025-01-04 05:11:18,164][134294] Updated weights for policy 0, policy_version 107034 (0.0033) [2025-01-04 05:11:18,969][134211] Fps is (10 sec: 13105.6, 60 sec: 15086.7, 300 sec: 14995.5). Total num frames: 438419456. Throughput: 0: 3891.1. Samples: 98770212. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:18,969][134211] Avg episode reward: [(0, '8.588')] [2025-01-04 05:11:21,398][134294] Updated weights for policy 0, policy_version 107044 (0.0026) [2025-01-04 05:11:23,968][134211] Fps is (10 sec: 11878.8, 60 sec: 15086.9, 300 sec: 14981.7). Total num frames: 438480896. Throughput: 0: 3574.2. Samples: 98788394. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:23,968][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 05:11:24,882][134294] Updated weights for policy 0, policy_version 107054 (0.0026) [2025-01-04 05:11:27,856][134294] Updated weights for policy 0, policy_version 107064 (0.0026) [2025-01-04 05:11:28,968][134211] Fps is (10 sec: 12698.8, 60 sec: 15086.9, 300 sec: 14856.7). Total num frames: 438546432. Throughput: 0: 3582.4. Samples: 98807704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 05:11:28,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 05:11:30,980][134294] Updated weights for policy 0, policy_version 107074 (0.0026) [2025-01-04 05:11:33,540][134294] Updated weights for policy 0, policy_version 107084 (0.0018) [2025-01-04 05:11:33,967][134211] Fps is (10 sec: 14336.4, 60 sec: 14609.1, 300 sec: 14787.3). Total num frames: 438624256. Throughput: 0: 3582.9. Samples: 98817722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:33,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 05:11:35,424][134294] Updated weights for policy 0, policy_version 107094 (0.0013) [2025-01-04 05:11:37,314][134294] Updated weights for policy 0, policy_version 107104 (0.0014) [2025-01-04 05:11:38,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14814.1, 300 sec: 14940.0). Total num frames: 438730752. Throughput: 0: 3814.9. Samples: 98847040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:38,968][134211] Avg episode reward: [(0, '8.519')] [2025-01-04 05:11:39,194][134294] Updated weights for policy 0, policy_version 107114 (0.0013) [2025-01-04 05:11:41,169][134294] Updated weights for policy 0, policy_version 107124 (0.0012) [2025-01-04 05:11:43,968][134211] Fps is (10 sec: 19250.4, 60 sec: 15223.5, 300 sec: 15009.4). Total num frames: 438816768. Throughput: 0: 4029.0. Samples: 98876054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:43,969][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 05:11:44,058][134294] Updated weights for policy 0, policy_version 107134 (0.0022) [2025-01-04 05:11:47,968][134294] Updated weights for policy 0, policy_version 107144 (0.0028) [2025-01-04 05:11:48,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15086.9, 300 sec: 14981.6). Total num frames: 438870016. Throughput: 0: 3972.2. Samples: 98883872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:48,969][134211] Avg episode reward: [(0, '7.242')] [2025-01-04 05:11:51,632][134294] Updated weights for policy 0, policy_version 107154 (0.0024) [2025-01-04 05:11:53,968][134211] Fps is (10 sec: 11059.3, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 438927360. Throughput: 0: 3650.4. Samples: 98900412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:53,969][134211] Avg episode reward: [(0, '7.914')] [2025-01-04 05:11:55,277][134294] Updated weights for policy 0, policy_version 107164 (0.0026) [2025-01-04 05:11:58,049][134294] Updated weights for policy 0, policy_version 107174 (0.0019) [2025-01-04 05:11:58,967][134211] Fps is (10 sec: 13107.6, 60 sec: 15087.0, 300 sec: 14967.8). Total num frames: 439001088. Throughput: 0: 3524.1. Samples: 98919964. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:11:58,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 05:11:59,954][134294] Updated weights for policy 0, policy_version 107184 (0.0013) [2025-01-04 05:12:01,931][134294] Updated weights for policy 0, policy_version 107194 (0.0014) [2025-01-04 05:12:03,853][134294] Updated weights for policy 0, policy_version 107204 (0.0013) [2025-01-04 05:12:03,968][134211] Fps is (10 sec: 18022.9, 60 sec: 15496.5, 300 sec: 15051.1). Total num frames: 439107584. Throughput: 0: 3684.3. Samples: 98936000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:03,968][134211] Avg episode reward: [(0, '8.773')] [2025-01-04 05:12:05,732][134294] Updated weights for policy 0, policy_version 107214 (0.0014) [2025-01-04 05:12:08,710][134294] Updated weights for policy 0, policy_version 107224 (0.0025) [2025-01-04 05:12:08,968][134211] Fps is (10 sec: 18840.2, 60 sec: 15018.5, 300 sec: 14967.7). Total num frames: 439189504. Throughput: 0: 3940.5. Samples: 98965720. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:08,969][134211] Avg episode reward: [(0, '8.176')] [2025-01-04 05:12:11,898][134294] Updated weights for policy 0, policy_version 107234 (0.0032) [2025-01-04 05:12:13,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14882.2, 300 sec: 14828.9). Total num frames: 439255040. Throughput: 0: 3935.1. Samples: 98984784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:13,972][134211] Avg episode reward: [(0, '7.554')] [2025-01-04 05:12:15,112][134294] Updated weights for policy 0, policy_version 107244 (0.0026) [2025-01-04 05:12:18,303][134294] Updated weights for policy 0, policy_version 107254 (0.0024) [2025-01-04 05:12:18,968][134211] Fps is (10 sec: 13107.8, 60 sec: 15018.9, 300 sec: 14856.7). Total num frames: 439320576. Throughput: 0: 3920.1. Samples: 98994128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:18,968][134211] Avg episode reward: [(0, '7.432')] [2025-01-04 05:12:21,503][134294] Updated weights for policy 0, policy_version 107264 (0.0028) [2025-01-04 05:12:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15018.7, 300 sec: 14842.8). Total num frames: 439382016. Throughput: 0: 3698.2. Samples: 99013460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:23,968][134211] Avg episode reward: [(0, '8.965')] [2025-01-04 05:12:25,010][134294] Updated weights for policy 0, policy_version 107274 (0.0028) [2025-01-04 05:12:28,049][134294] Updated weights for policy 0, policy_version 107284 (0.0028) [2025-01-04 05:12:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 439443456. Throughput: 0: 3472.7. Samples: 99032326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:28,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 05:12:31,137][134294] Updated weights for policy 0, policy_version 107294 (0.0026) [2025-01-04 05:12:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.8, 300 sec: 14828.9). Total num frames: 439513088. Throughput: 0: 3523.2. Samples: 99042418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:33,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 05:12:34,229][134294] Updated weights for policy 0, policy_version 107304 (0.0024) [2025-01-04 05:12:37,248][134294] Updated weights for policy 0, policy_version 107314 (0.0027) [2025-01-04 05:12:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14842.8). Total num frames: 439578624. Throughput: 0: 3601.2. Samples: 99062464. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:38,968][134211] Avg episode reward: [(0, '7.059')] [2025-01-04 05:12:40,367][134294] Updated weights for policy 0, policy_version 107324 (0.0025) [2025-01-04 05:12:42,834][134294] Updated weights for policy 0, policy_version 107334 (0.0018) [2025-01-04 05:12:43,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14063.0, 300 sec: 14912.2). Total num frames: 439660544. Throughput: 0: 3670.8. Samples: 99085152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:43,968][134211] Avg episode reward: [(0, '8.606')] [2025-01-04 05:12:44,755][134294] Updated weights for policy 0, policy_version 107344 (0.0013) [2025-01-04 05:12:46,644][134294] Updated weights for policy 0, policy_version 107354 (0.0013) [2025-01-04 05:12:48,454][134294] Updated weights for policy 0, policy_version 107364 (0.0013) [2025-01-04 05:12:48,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 439771136. Throughput: 0: 3675.9. Samples: 99101414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:48,968][134211] Avg episode reward: [(0, '8.897')] [2025-01-04 05:12:50,739][134294] Updated weights for policy 0, policy_version 107374 (0.0017) [2025-01-04 05:12:53,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15223.5, 300 sec: 14884.4). Total num frames: 439840768. Throughput: 0: 3618.7. Samples: 99128560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:53,968][134211] Avg episode reward: [(0, '8.549')] [2025-01-04 05:12:54,227][134294] Updated weights for policy 0, policy_version 107384 (0.0032) [2025-01-04 05:12:57,582][134294] Updated weights for policy 0, policy_version 107394 (0.0028) [2025-01-04 05:12:58,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14950.3, 300 sec: 14870.6). Total num frames: 439898112. Throughput: 0: 3586.6. Samples: 99146182. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:12:58,968][134211] Avg episode reward: [(0, '7.704')] [2025-01-04 05:13:01,222][134294] Updated weights for policy 0, policy_version 107404 (0.0029) [2025-01-04 05:13:03,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14131.2, 300 sec: 14842.8). Total num frames: 439955456. Throughput: 0: 3571.7. Samples: 99154856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:03,968][134211] Avg episode reward: [(0, '7.908')] [2025-01-04 05:13:04,668][134294] Updated weights for policy 0, policy_version 107414 (0.0027) [2025-01-04 05:13:06,640][134294] Updated weights for policy 0, policy_version 107424 (0.0014) [2025-01-04 05:13:08,576][134294] Updated weights for policy 0, policy_version 107434 (0.0015) [2025-01-04 05:13:08,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14472.7, 300 sec: 14967.8). Total num frames: 440057856. Throughput: 0: 3663.7. Samples: 99178328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:08,968][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 05:13:10,399][134294] Updated weights for policy 0, policy_version 107444 (0.0014) [2025-01-04 05:13:12,376][134294] Updated weights for policy 0, policy_version 107454 (0.0013) [2025-01-04 05:13:13,968][134211] Fps is (10 sec: 20480.0, 60 sec: 15087.0, 300 sec: 15078.8). Total num frames: 440160256. Throughput: 0: 3960.6. Samples: 99210554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:13,968][134211] Avg episode reward: [(0, '9.084')] [2025-01-04 05:13:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107462_440164352.pth... 
[2025-01-04 05:13:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000106576_436535296.pth [2025-01-04 05:13:14,638][134294] Updated weights for policy 0, policy_version 107464 (0.0019) [2025-01-04 05:13:17,991][134294] Updated weights for policy 0, policy_version 107474 (0.0028) [2025-01-04 05:13:18,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15018.7, 300 sec: 15078.8). Total num frames: 440221696. Throughput: 0: 3975.0. Samples: 99221294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:18,968][134211] Avg episode reward: [(0, '8.612')] [2025-01-04 05:13:21,164][134294] Updated weights for policy 0, policy_version 107484 (0.0027) [2025-01-04 05:13:23,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 440283136. Throughput: 0: 3934.4. Samples: 99239512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:23,968][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 05:13:24,784][134294] Updated weights for policy 0, policy_version 107494 (0.0028) [2025-01-04 05:13:27,839][134294] Updated weights for policy 0, policy_version 107504 (0.0025) [2025-01-04 05:13:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 440348672. Throughput: 0: 3849.0. Samples: 99258360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:28,968][134211] Avg episode reward: [(0, '8.788')] [2025-01-04 05:13:30,932][134294] Updated weights for policy 0, policy_version 107514 (0.0026) [2025-01-04 05:13:33,799][134294] Updated weights for policy 0, policy_version 107524 (0.0026) [2025-01-04 05:13:33,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15086.9, 300 sec: 14759.5). Total num frames: 440418304. Throughput: 0: 3713.7. Samples: 99268532. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:33,968][134211] Avg episode reward: [(0, '7.951')] [2025-01-04 05:13:36,848][134294] Updated weights for policy 0, policy_version 107534 (0.0026) [2025-01-04 05:13:38,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 440483840. Throughput: 0: 3569.3. Samples: 99289178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:38,968][134211] Avg episode reward: [(0, '7.912')] [2025-01-04 05:13:39,968][134294] Updated weights for policy 0, policy_version 107544 (0.0025) [2025-01-04 05:13:43,244][134294] Updated weights for policy 0, policy_version 107554 (0.0026) [2025-01-04 05:13:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.8, 300 sec: 14759.5). Total num frames: 440549376. Throughput: 0: 3603.7. Samples: 99308350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:43,969][134211] Avg episode reward: [(0, '8.100')] [2025-01-04 05:13:46,562][134294] Updated weights for policy 0, policy_version 107564 (0.0021) [2025-01-04 05:13:48,683][134294] Updated weights for policy 0, policy_version 107574 (0.0014) [2025-01-04 05:13:48,969][134211] Fps is (10 sec: 14334.3, 60 sec: 14267.4, 300 sec: 14815.0). Total num frames: 440627200. Throughput: 0: 3605.2. Samples: 99317096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:48,969][134211] Avg episode reward: [(0, '8.276')] [2025-01-04 05:13:50,619][134294] Updated weights for policy 0, policy_version 107584 (0.0013) [2025-01-04 05:13:52,939][134294] Updated weights for policy 0, policy_version 107594 (0.0019) [2025-01-04 05:13:53,968][134211] Fps is (10 sec: 16384.2, 60 sec: 14540.8, 300 sec: 14884.4). Total num frames: 440713216. Throughput: 0: 3742.1. Samples: 99346722. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:53,968][134211] Avg episode reward: [(0, '8.404')] [2025-01-04 05:13:56,187][134294] Updated weights for policy 0, policy_version 107604 (0.0027) [2025-01-04 05:13:58,968][134211] Fps is (10 sec: 15156.1, 60 sec: 14677.2, 300 sec: 14884.4). Total num frames: 440778752. Throughput: 0: 3447.8. Samples: 99365706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:13:58,969][134211] Avg episode reward: [(0, '8.306')] [2025-01-04 05:13:59,557][134294] Updated weights for policy 0, policy_version 107614 (0.0030) [2025-01-04 05:14:02,902][134294] Updated weights for policy 0, policy_version 107624 (0.0029) [2025-01-04 05:14:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 440840192. Throughput: 0: 3408.7. Samples: 99374688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:03,968][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 05:14:05,520][134294] Updated weights for policy 0, policy_version 107634 (0.0020) [2025-01-04 05:14:07,600][134294] Updated weights for policy 0, policy_version 107644 (0.0020) [2025-01-04 05:14:08,968][134211] Fps is (10 sec: 14746.2, 60 sec: 14472.5, 300 sec: 14884.4). Total num frames: 440926208. Throughput: 0: 3541.3. Samples: 99398872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:08,968][134211] Avg episode reward: [(0, '8.226')] [2025-01-04 05:14:10,647][134294] Updated weights for policy 0, policy_version 107654 (0.0024) [2025-01-04 05:14:13,636][134294] Updated weights for policy 0, policy_version 107664 (0.0025) [2025-01-04 05:14:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13926.4, 300 sec: 14773.4). Total num frames: 440995840. Throughput: 0: 3584.8. Samples: 99419674. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:13,968][134211] Avg episode reward: [(0, '7.949')] [2025-01-04 05:14:16,544][134294] Updated weights for policy 0, policy_version 107674 (0.0025) [2025-01-04 05:14:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.6, 300 sec: 14690.1). Total num frames: 441061376. Throughput: 0: 3585.7. Samples: 99429886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:18,968][134211] Avg episode reward: [(0, '8.402')] [2025-01-04 05:14:19,930][134294] Updated weights for policy 0, policy_version 107684 (0.0024) [2025-01-04 05:14:22,011][134294] Updated weights for policy 0, policy_version 107694 (0.0014) [2025-01-04 05:14:23,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14472.6, 300 sec: 14787.3). Total num frames: 441151488. Throughput: 0: 3638.0. Samples: 99452888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:23,968][134211] Avg episode reward: [(0, '8.279')] [2025-01-04 05:14:23,999][134294] Updated weights for policy 0, policy_version 107704 (0.0013) [2025-01-04 05:14:25,958][134294] Updated weights for policy 0, policy_version 107714 (0.0015) [2025-01-04 05:14:27,784][134294] Updated weights for policy 0, policy_version 107724 (0.0014) [2025-01-04 05:14:28,967][134211] Fps is (10 sec: 20070.8, 60 sec: 15223.5, 300 sec: 14926.1). Total num frames: 441262080. Throughput: 0: 3916.1. Samples: 99484572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:28,968][134211] Avg episode reward: [(0, '8.080')] [2025-01-04 05:14:29,717][134294] Updated weights for policy 0, policy_version 107734 (0.0014) [2025-01-04 05:14:31,798][134294] Updated weights for policy 0, policy_version 107744 (0.0015) [2025-01-04 05:14:33,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15428.3, 300 sec: 14981.6). Total num frames: 441344000. Throughput: 0: 4077.7. Samples: 99500588. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:33,969][134211] Avg episode reward: [(0, '9.690')] [2025-01-04 05:14:35,160][134294] Updated weights for policy 0, policy_version 107754 (0.0028) [2025-01-04 05:14:38,383][134294] Updated weights for policy 0, policy_version 107764 (0.0024) [2025-01-04 05:14:38,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15360.0, 300 sec: 14981.6). Total num frames: 441405440. Throughput: 0: 3841.8. Samples: 99519604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:38,968][134211] Avg episode reward: [(0, '8.404')] [2025-01-04 05:14:41,646][134294] Updated weights for policy 0, policy_version 107774 (0.0025) [2025-01-04 05:14:43,968][134211] Fps is (10 sec: 12287.5, 60 sec: 15291.6, 300 sec: 14953.8). Total num frames: 441466880. Throughput: 0: 3835.3. Samples: 99538296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:43,969][134211] Avg episode reward: [(0, '8.099')] [2025-01-04 05:14:44,950][134294] Updated weights for policy 0, policy_version 107784 (0.0021) [2025-01-04 05:14:48,257][134294] Updated weights for policy 0, policy_version 107794 (0.0025) [2025-01-04 05:14:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15087.2, 300 sec: 14842.8). Total num frames: 441532416. Throughput: 0: 3846.2. Samples: 99547768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:48,968][134211] Avg episode reward: [(0, '8.297')] [2025-01-04 05:14:51,686][134294] Updated weights for policy 0, policy_version 107804 (0.0028) [2025-01-04 05:14:53,968][134211] Fps is (10 sec: 12288.6, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 441589760. Throughput: 0: 3713.6. Samples: 99565984. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:53,968][134211] Avg episode reward: [(0, '7.626')] [2025-01-04 05:14:54,996][134294] Updated weights for policy 0, policy_version 107814 (0.0026) [2025-01-04 05:14:57,550][134294] Updated weights for policy 0, policy_version 107824 (0.0018) [2025-01-04 05:14:58,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14950.6, 300 sec: 14787.3). Total num frames: 441675776. Throughput: 0: 3752.5. Samples: 99588534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:14:58,968][134211] Avg episode reward: [(0, '7.522')] [2025-01-04 05:14:59,417][134294] Updated weights for policy 0, policy_version 107834 (0.0013) [2025-01-04 05:15:01,303][134294] Updated weights for policy 0, policy_version 107844 (0.0013) [2025-01-04 05:15:03,200][134294] Updated weights for policy 0, policy_version 107854 (0.0014) [2025-01-04 05:15:03,968][134211] Fps is (10 sec: 19251.7, 60 sec: 15701.4, 300 sec: 14926.1). Total num frames: 441782272. Throughput: 0: 3887.8. Samples: 99604838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:03,968][134211] Avg episode reward: [(0, '8.905')] [2025-01-04 05:15:05,075][134294] Updated weights for policy 0, policy_version 107864 (0.0014) [2025-01-04 05:15:07,215][134294] Updated weights for policy 0, policy_version 107874 (0.0015) [2025-01-04 05:15:08,968][134211] Fps is (10 sec: 19660.3, 60 sec: 15769.6, 300 sec: 15009.4). Total num frames: 441872384. Throughput: 0: 4072.3. Samples: 99636144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:08,969][134211] Avg episode reward: [(0, '8.222')] [2025-01-04 05:15:10,399][134294] Updated weights for policy 0, policy_version 107884 (0.0029) [2025-01-04 05:15:13,630][134294] Updated weights for policy 0, policy_version 107894 (0.0026) [2025-01-04 05:15:13,968][134211] Fps is (10 sec: 15154.4, 60 sec: 15633.0, 300 sec: 14981.6). Total num frames: 441933824. Throughput: 0: 3792.6. Samples: 99655240. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:13,969][134211] Avg episode reward: [(0, '8.195')] [2025-01-04 05:15:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107894_441933824.pth... [2025-01-04 05:15:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107022_438362112.pth [2025-01-04 05:15:16,880][134294] Updated weights for policy 0, policy_version 107904 (0.0027) [2025-01-04 05:15:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15633.1, 300 sec: 14995.5). Total num frames: 441999360. Throughput: 0: 3643.0. Samples: 99664524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:18,968][134211] Avg episode reward: [(0, '8.358')] [2025-01-04 05:15:20,415][134294] Updated weights for policy 0, policy_version 107914 (0.0025) [2025-01-04 05:15:23,661][134294] Updated weights for policy 0, policy_version 107924 (0.0032) [2025-01-04 05:15:23,971][134211] Fps is (10 sec: 12284.6, 60 sec: 15086.1, 300 sec: 14967.6). Total num frames: 442056704. Throughput: 0: 3624.5. Samples: 99682718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:23,971][134211] Avg episode reward: [(0, '8.279')] [2025-01-04 05:15:26,804][134294] Updated weights for policy 0, policy_version 107934 (0.0027) [2025-01-04 05:15:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.2, 300 sec: 14842.8). Total num frames: 442126336. Throughput: 0: 3644.8. Samples: 99702308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:28,968][134211] Avg episode reward: [(0, '8.749')] [2025-01-04 05:15:29,780][134294] Updated weights for policy 0, policy_version 107944 (0.0025) [2025-01-04 05:15:32,767][134294] Updated weights for policy 0, policy_version 107954 (0.0024) [2025-01-04 05:15:33,968][134211] Fps is (10 sec: 13520.9, 60 sec: 14131.2, 300 sec: 14745.6). Total num frames: 442191872. 
Throughput: 0: 3656.6. Samples: 99712314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:15:33,968][134211] Avg episode reward: [(0, '8.468')] [2025-01-04 05:15:35,800][134294] Updated weights for policy 0, policy_version 107964 (0.0023) [2025-01-04 05:15:38,850][134294] Updated weights for policy 0, policy_version 107974 (0.0023) [2025-01-04 05:15:38,970][134211] Fps is (10 sec: 13513.9, 60 sec: 14267.3, 300 sec: 14773.3). Total num frames: 442261504. Throughput: 0: 3717.0. Samples: 99733256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:15:38,970][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 05:15:42,144][134294] Updated weights for policy 0, policy_version 107984 (0.0024) [2025-01-04 05:15:43,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14404.4, 300 sec: 14801.2). Total num frames: 442331136. Throughput: 0: 3655.2. Samples: 99753016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:15:43,968][134211] Avg episode reward: [(0, '7.779')] [2025-01-04 05:15:44,443][134294] Updated weights for policy 0, policy_version 107994 (0.0013) [2025-01-04 05:15:46,539][134294] Updated weights for policy 0, policy_version 108004 (0.0013) [2025-01-04 05:15:48,679][134294] Updated weights for policy 0, policy_version 108014 (0.0012) [2025-01-04 05:15:48,967][134211] Fps is (10 sec: 16797.4, 60 sec: 14950.5, 300 sec: 14912.2). Total num frames: 442429440. Throughput: 0: 3614.3. Samples: 99767480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:15:48,968][134211] Avg episode reward: [(0, '8.193')] [2025-01-04 05:15:50,717][134294] Updated weights for policy 0, policy_version 108024 (0.0014) [2025-01-04 05:15:52,683][134294] Updated weights for policy 0, policy_version 108034 (0.0014) [2025-01-04 05:15:53,968][134211] Fps is (10 sec: 20070.4, 60 sec: 15701.4, 300 sec: 15037.2). Total num frames: 442531840. Throughput: 0: 3580.3. Samples: 99797258. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:15:53,968][134211] Avg episode reward: [(0, '8.256')] [2025-01-04 05:15:54,787][134294] Updated weights for policy 0, policy_version 108044 (0.0015) [2025-01-04 05:15:57,766][134294] Updated weights for policy 0, policy_version 108054 (0.0024) [2025-01-04 05:15:58,968][134211] Fps is (10 sec: 17202.4, 60 sec: 15428.1, 300 sec: 14995.5). Total num frames: 442601472. Throughput: 0: 3698.3. Samples: 99821662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:15:58,969][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 05:16:01,013][134294] Updated weights for policy 0, policy_version 108064 (0.0024) [2025-01-04 05:16:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 442662912. Throughput: 0: 3709.3. Samples: 99831444. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:03,968][134211] Avg episode reward: [(0, '8.042')] [2025-01-04 05:16:04,262][134294] Updated weights for policy 0, policy_version 108074 (0.0030) [2025-01-04 05:16:07,322][134294] Updated weights for policy 0, policy_version 108084 (0.0026) [2025-01-04 05:16:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14801.1). Total num frames: 442728448. Throughput: 0: 3732.9. Samples: 99850686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:08,970][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 05:16:10,616][134294] Updated weights for policy 0, policy_version 108094 (0.0026) [2025-01-04 05:16:13,721][134294] Updated weights for policy 0, policy_version 108104 (0.0025) [2025-01-04 05:16:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.1, 300 sec: 14829.0). Total num frames: 442793984. Throughput: 0: 3721.4. Samples: 99869770. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:13,968][134211] Avg episode reward: [(0, '8.927')] [2025-01-04 05:16:16,752][134294] Updated weights for policy 0, policy_version 108114 (0.0022) [2025-01-04 05:16:18,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14336.0, 300 sec: 14842.8). Total num frames: 442859520. Throughput: 0: 3726.1. Samples: 99879986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:18,968][134211] Avg episode reward: [(0, '8.394')] [2025-01-04 05:16:20,085][134294] Updated weights for policy 0, policy_version 108124 (0.0024) [2025-01-04 05:16:23,528][134294] Updated weights for policy 0, policy_version 108134 (0.0028) [2025-01-04 05:16:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14405.0, 300 sec: 14828.9). Total num frames: 442920960. Throughput: 0: 3668.7. Samples: 99898342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:23,968][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 05:16:26,656][134294] Updated weights for policy 0, policy_version 108144 (0.0027) [2025-01-04 05:16:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 14801.1). Total num frames: 442990592. Throughput: 0: 3659.2. Samples: 99917680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:28,968][134211] Avg episode reward: [(0, '7.382')] [2025-01-04 05:16:29,277][134294] Updated weights for policy 0, policy_version 108154 (0.0021) [2025-01-04 05:16:31,151][134294] Updated weights for policy 0, policy_version 108164 (0.0012) [2025-01-04 05:16:33,028][134294] Updated weights for policy 0, policy_version 108174 (0.0014) [2025-01-04 05:16:33,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15155.3, 300 sec: 14815.0). Total num frames: 443101184. Throughput: 0: 3686.0. Samples: 99933348. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:16:33,968][134211] Avg episode reward: [(0, '8.719')] [2025-01-04 05:16:34,874][134294] Updated weights for policy 0, policy_version 108184 (0.0012) [2025-01-04 05:16:36,723][134294] Updated weights for policy 0, policy_version 108194 (0.0013) [2025-01-04 05:16:38,968][134211] Fps is (10 sec: 20889.6, 60 sec: 15633.6, 300 sec: 14856.7). Total num frames: 443199488. Throughput: 0: 3752.5. Samples: 99966122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:16:38,968][134211] Avg episode reward: [(0, '8.078')] [2025-01-04 05:16:38,977][134294] Updated weights for policy 0, policy_version 108204 (0.0016) [2025-01-04 05:16:42,106][134294] Updated weights for policy 0, policy_version 108214 (0.0027) [2025-01-04 05:16:43,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15564.8, 300 sec: 14898.3). Total num frames: 443265024. Throughput: 0: 3682.9. Samples: 99987390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:16:43,968][134211] Avg episode reward: [(0, '10.094')] [2025-01-04 05:16:43,980][134264] Saving new best policy, reward=10.094! [2025-01-04 05:16:45,423][134294] Updated weights for policy 0, policy_version 108224 (0.0026) [2025-01-04 05:16:48,720][134294] Updated weights for policy 0, policy_version 108234 (0.0026) [2025-01-04 05:16:48,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14950.3, 300 sec: 14912.2). Total num frames: 443326464. Throughput: 0: 3679.9. Samples: 99997040. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:16:48,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 05:16:51,914][134294] Updated weights for policy 0, policy_version 108244 (0.0025) [2025-01-04 05:16:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14267.7, 300 sec: 14870.6). Total num frames: 443387904. Throughput: 0: 3667.9. Samples: 100015740. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:16:53,968][134211] Avg episode reward: [(0, '8.985')] [2025-01-04 05:16:55,348][134294] Updated weights for policy 0, policy_version 108254 (0.0022) [2025-01-04 05:16:58,364][134294] Updated weights for policy 0, policy_version 108264 (0.0025) [2025-01-04 05:16:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14731.7). Total num frames: 443453440. Throughput: 0: 3671.5. Samples: 100034986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:16:58,968][134211] Avg episode reward: [(0, '8.520')] [2025-01-04 05:17:01,422][134294] Updated weights for policy 0, policy_version 108274 (0.0026) [2025-01-04 05:17:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.7, 300 sec: 14676.2). Total num frames: 443518976. Throughput: 0: 3664.3. Samples: 100044878. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:03,969][134211] Avg episode reward: [(0, '8.504')] [2025-01-04 05:17:04,963][134294] Updated weights for policy 0, policy_version 108284 (0.0027) [2025-01-04 05:17:07,456][134294] Updated weights for policy 0, policy_version 108294 (0.0018) [2025-01-04 05:17:08,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14540.9, 300 sec: 14731.7). Total num frames: 443600896. Throughput: 0: 3701.1. Samples: 100064890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:08,968][134211] Avg episode reward: [(0, '8.075')] [2025-01-04 05:17:09,395][134294] Updated weights for policy 0, policy_version 108304 (0.0013) [2025-01-04 05:17:11,252][134294] Updated weights for policy 0, policy_version 108314 (0.0013) [2025-01-04 05:17:13,205][134294] Updated weights for policy 0, policy_version 108324 (0.0016) [2025-01-04 05:17:13,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15223.4, 300 sec: 14870.6). Total num frames: 443707392. Throughput: 0: 3987.8. Samples: 100097130. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:13,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 05:17:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000108327_443707392.pth... [2025-01-04 05:17:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107462_440164352.pth [2025-01-04 05:17:16,155][134294] Updated weights for policy 0, policy_version 108334 (0.0028) [2025-01-04 05:17:18,968][134211] Fps is (10 sec: 16382.6, 60 sec: 15086.8, 300 sec: 14856.6). Total num frames: 443764736. Throughput: 0: 3859.8. Samples: 100107042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:18,969][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 05:17:19,713][134294] Updated weights for policy 0, policy_version 108344 (0.0028) [2025-01-04 05:17:23,083][134294] Updated weights for policy 0, policy_version 108354 (0.0025) [2025-01-04 05:17:23,968][134211] Fps is (10 sec: 11878.7, 60 sec: 15087.0, 300 sec: 14856.7). Total num frames: 443826176. Throughput: 0: 3529.3. Samples: 100124940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:23,968][134211] Avg episode reward: [(0, '7.669')] [2025-01-04 05:17:26,360][134294] Updated weights for policy 0, policy_version 108364 (0.0028) [2025-01-04 05:17:28,797][134294] Updated weights for policy 0, policy_version 108374 (0.0015) [2025-01-04 05:17:28,967][134211] Fps is (10 sec: 13927.6, 60 sec: 15223.5, 300 sec: 14884.5). Total num frames: 443904000. Throughput: 0: 3507.2. Samples: 100145212. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:28,968][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 05:17:30,716][134294] Updated weights for policy 0, policy_version 108384 (0.0011) [2025-01-04 05:17:32,546][134294] Updated weights for policy 0, policy_version 108394 (0.0013) [2025-01-04 05:17:33,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15155.2, 300 sec: 15023.3). Total num frames: 444010496. Throughput: 0: 3648.3. Samples: 100161214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:33,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 05:17:34,419][134294] Updated weights for policy 0, policy_version 108404 (0.0012) [2025-01-04 05:17:36,386][134294] Updated weights for policy 0, policy_version 108414 (0.0015) [2025-01-04 05:17:38,243][134294] Updated weights for policy 0, policy_version 108424 (0.0014) [2025-01-04 05:17:38,968][134211] Fps is (10 sec: 21299.0, 60 sec: 15291.7, 300 sec: 15106.6). Total num frames: 444116992. Throughput: 0: 3950.5. Samples: 100193512. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:38,968][134211] Avg episode reward: [(0, '8.205')] [2025-01-04 05:17:40,899][134294] Updated weights for policy 0, policy_version 108434 (0.0020) [2025-01-04 05:17:43,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15291.7, 300 sec: 14953.9). Total num frames: 444182528. Throughput: 0: 4043.5. Samples: 100216942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:43,969][134211] Avg episode reward: [(0, '8.425')] [2025-01-04 05:17:44,264][134294] Updated weights for policy 0, policy_version 108444 (0.0029) [2025-01-04 05:17:48,052][134294] Updated weights for policy 0, policy_version 108454 (0.0030) [2025-01-04 05:17:48,968][134211] Fps is (10 sec: 11878.1, 60 sec: 15155.2, 300 sec: 14898.3). Total num frames: 444235776. Throughput: 0: 4002.8. Samples: 100225006. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:48,969][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 05:17:51,643][134294] Updated weights for policy 0, policy_version 108464 (0.0027) [2025-01-04 05:17:53,968][134211] Fps is (10 sec: 11059.4, 60 sec: 15086.9, 300 sec: 14898.3). Total num frames: 444293120. Throughput: 0: 3930.8. Samples: 100241778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:53,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 05:17:55,324][134294] Updated weights for policy 0, policy_version 108474 (0.0024) [2025-01-04 05:17:58,520][134294] Updated weights for policy 0, policy_version 108484 (0.0027) [2025-01-04 05:17:58,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15018.7, 300 sec: 14912.2). Total num frames: 444354560. Throughput: 0: 3613.7. Samples: 100259748. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:17:58,968][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 05:18:01,516][134294] Updated weights for policy 0, policy_version 108494 (0.0025) [2025-01-04 05:18:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15018.7, 300 sec: 14787.3). Total num frames: 444420096. Throughput: 0: 3617.0. Samples: 100269806. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:03,968][134211] Avg episode reward: [(0, '8.499')] [2025-01-04 05:18:04,661][134294] Updated weights for policy 0, policy_version 108504 (0.0024) [2025-01-04 05:18:07,669][134294] Updated weights for policy 0, policy_version 108514 (0.0025) [2025-01-04 05:18:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14676.2). Total num frames: 444489728. Throughput: 0: 3666.5. Samples: 100289934. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:08,968][134211] Avg episode reward: [(0, '8.450')] [2025-01-04 05:18:10,638][134294] Updated weights for policy 0, policy_version 108524 (0.0025) [2025-01-04 05:18:13,557][134294] Updated weights for policy 0, policy_version 108534 (0.0026) [2025-01-04 05:18:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.5, 300 sec: 14703.9). Total num frames: 444559360. Throughput: 0: 3681.5. Samples: 100310882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:13,969][134211] Avg episode reward: [(0, '8.851')] [2025-01-04 05:18:16,232][134294] Updated weights for policy 0, policy_version 108544 (0.0021) [2025-01-04 05:18:18,234][134294] Updated weights for policy 0, policy_version 108554 (0.0017) [2025-01-04 05:18:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14677.5, 300 sec: 14787.3). Total num frames: 444645376. Throughput: 0: 3581.7. Samples: 100322390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:18,968][134211] Avg episode reward: [(0, '9.447')] [2025-01-04 05:18:21,045][134294] Updated weights for policy 0, policy_version 108564 (0.0024) [2025-01-04 05:18:23,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14813.9, 300 sec: 14801.1). Total num frames: 444715008. Throughput: 0: 3395.7. Samples: 100346320. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:23,968][134211] Avg episode reward: [(0, '8.919')] [2025-01-04 05:18:24,277][134294] Updated weights for policy 0, policy_version 108574 (0.0023) [2025-01-04 05:18:26,813][134294] Updated weights for policy 0, policy_version 108584 (0.0017) [2025-01-04 05:18:28,732][134294] Updated weights for policy 0, policy_version 108594 (0.0013) [2025-01-04 05:18:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 444805120. Throughput: 0: 3412.9. Samples: 100370522. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:28,968][134211] Avg episode reward: [(0, '8.520')] [2025-01-04 05:18:30,665][134294] Updated weights for policy 0, policy_version 108604 (0.0012) [2025-01-04 05:18:32,551][134294] Updated weights for policy 0, policy_version 108614 (0.0015) [2025-01-04 05:18:33,968][134211] Fps is (10 sec: 19251.3, 60 sec: 14950.4, 300 sec: 14995.5). Total num frames: 444907520. Throughput: 0: 3590.8. Samples: 100386590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:33,968][134211] Avg episode reward: [(0, '8.090')] [2025-01-04 05:18:35,008][134294] Updated weights for policy 0, policy_version 108624 (0.0019) [2025-01-04 05:18:38,170][134294] Updated weights for policy 0, policy_version 108634 (0.0028) [2025-01-04 05:18:38,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14267.7, 300 sec: 14995.5). Total num frames: 444973056. Throughput: 0: 3767.2. Samples: 100411304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:38,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 05:18:41,331][134294] Updated weights for policy 0, policy_version 108644 (0.0027) [2025-01-04 05:18:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14199.5, 300 sec: 14940.0). Total num frames: 445034496. Throughput: 0: 3792.0. Samples: 100430390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:43,968][134211] Avg episode reward: [(0, '8.546')] [2025-01-04 05:18:44,602][134294] Updated weights for policy 0, policy_version 108654 (0.0026) [2025-01-04 05:18:47,669][134294] Updated weights for policy 0, policy_version 108664 (0.0027) [2025-01-04 05:18:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.6, 300 sec: 14884.5). Total num frames: 445104128. Throughput: 0: 3783.1. Samples: 100440046. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:48,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 05:18:50,731][134294] Updated weights for policy 0, policy_version 108674 (0.0027) [2025-01-04 05:18:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14884.5). Total num frames: 445169664. Throughput: 0: 3780.5. Samples: 100460056. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:53,968][134211] Avg episode reward: [(0, '6.932')] [2025-01-04 05:18:53,971][134294] Updated weights for policy 0, policy_version 108684 (0.0024) [2025-01-04 05:18:56,753][134294] Updated weights for policy 0, policy_version 108694 (0.0022) [2025-01-04 05:18:58,645][134294] Updated weights for policy 0, policy_version 108704 (0.0013) [2025-01-04 05:18:58,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15018.7, 300 sec: 14967.8). Total num frames: 445255680. Throughput: 0: 3841.7. Samples: 100483756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:18:58,968][134211] Avg episode reward: [(0, '8.225')] [2025-01-04 05:19:00,503][134294] Updated weights for policy 0, policy_version 108714 (0.0014) [2025-01-04 05:19:02,492][134294] Updated weights for policy 0, policy_version 108724 (0.0012) [2025-01-04 05:19:03,967][134211] Fps is (10 sec: 19661.1, 60 sec: 15769.6, 300 sec: 15051.1). Total num frames: 445366272. Throughput: 0: 3949.9. Samples: 100500136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:03,968][134211] Avg episode reward: [(0, '8.093')] [2025-01-04 05:19:04,297][134294] Updated weights for policy 0, policy_version 108734 (0.0015) [2025-01-04 05:19:06,301][134294] Updated weights for policy 0, policy_version 108744 (0.0014) [2025-01-04 05:19:08,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15974.4, 300 sec: 15092.7). Total num frames: 445448192. Throughput: 0: 4088.1. Samples: 100530286. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:08,968][134211] Avg episode reward: [(0, '9.200')] [2025-01-04 05:19:09,330][134294] Updated weights for policy 0, policy_version 108754 (0.0027) [2025-01-04 05:19:12,581][134294] Updated weights for policy 0, policy_version 108764 (0.0028) [2025-01-04 05:19:13,968][134211] Fps is (10 sec: 14745.0, 60 sec: 15906.1, 300 sec: 15092.7). Total num frames: 445513728. Throughput: 0: 3974.1. Samples: 100549356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:13,969][134211] Avg episode reward: [(0, '8.815')] [2025-01-04 05:19:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000108768_445513728.pth... [2025-01-04 05:19:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000107894_441933824.pth [2025-01-04 05:19:15,897][134294] Updated weights for policy 0, policy_version 108774 (0.0028) [2025-01-04 05:19:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15496.5, 300 sec: 14995.5). Total num frames: 445575168. Throughput: 0: 3826.0. Samples: 100558758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:18,969][134211] Avg episode reward: [(0, '8.443')] [2025-01-04 05:19:19,151][134294] Updated weights for policy 0, policy_version 108784 (0.0029) [2025-01-04 05:19:22,572][134294] Updated weights for policy 0, policy_version 108794 (0.0022) [2025-01-04 05:19:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 15360.0, 300 sec: 14828.9). Total num frames: 445636608. Throughput: 0: 3683.9. Samples: 100577078. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:23,968][134211] Avg episode reward: [(0, '7.911')] [2025-01-04 05:19:25,754][134294] Updated weights for policy 0, policy_version 108804 (0.0026) [2025-01-04 05:19:28,699][134294] Updated weights for policy 0, policy_version 108814 (0.0024) [2025-01-04 05:19:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 445702144. Throughput: 0: 3702.3. Samples: 100596994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:28,968][134211] Avg episode reward: [(0, '8.943')] [2025-01-04 05:19:31,664][134294] Updated weights for policy 0, policy_version 108824 (0.0026) [2025-01-04 05:19:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14801.1). Total num frames: 445771776. Throughput: 0: 3715.2. Samples: 100607230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:19:33,968][134211] Avg episode reward: [(0, '8.950')] [2025-01-04 05:19:34,726][134294] Updated weights for policy 0, policy_version 108834 (0.0025) [2025-01-04 05:19:37,734][134294] Updated weights for policy 0, policy_version 108844 (0.0025) [2025-01-04 05:19:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 445841408. Throughput: 0: 3723.3. Samples: 100627604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:19:38,968][134211] Avg episode reward: [(0, '6.895')] [2025-01-04 05:19:40,178][134294] Updated weights for policy 0, policy_version 108854 (0.0018) [2025-01-04 05:19:42,052][134294] Updated weights for policy 0, policy_version 108864 (0.0014) [2025-01-04 05:19:43,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15155.3, 300 sec: 14953.9). Total num frames: 445943808. Throughput: 0: 3818.3. Samples: 100655580. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:19:43,968][134211] Avg episode reward: [(0, '8.523')] [2025-01-04 05:19:44,114][134294] Updated weights for policy 0, policy_version 108874 (0.0013) [2025-01-04 05:19:46,203][134294] Updated weights for policy 0, policy_version 108884 (0.0013) [2025-01-04 05:19:48,157][134294] Updated weights for policy 0, policy_version 108894 (0.0014) [2025-01-04 05:19:48,967][134211] Fps is (10 sec: 20480.5, 60 sec: 15701.4, 300 sec: 15106.6). Total num frames: 446046208. Throughput: 0: 3784.3. Samples: 100670430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:19:48,968][134211] Avg episode reward: [(0, '8.401')] [2025-01-04 05:19:50,075][134294] Updated weights for policy 0, policy_version 108904 (0.0014) [2025-01-04 05:19:53,304][134294] Updated weights for policy 0, policy_version 108914 (0.0026) [2025-01-04 05:19:53,968][134211] Fps is (10 sec: 17202.6, 60 sec: 15769.6, 300 sec: 15051.0). Total num frames: 446115840. Throughput: 0: 3725.1. Samples: 100697914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:19:53,968][134211] Avg episode reward: [(0, '9.320')] [2025-01-04 05:19:56,614][134294] Updated weights for policy 0, policy_version 108924 (0.0028) [2025-01-04 05:19:58,968][134211] Fps is (10 sec: 13516.4, 60 sec: 15428.2, 300 sec: 14912.2). Total num frames: 446181376. Throughput: 0: 3707.8. Samples: 100716206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:19:58,968][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 05:19:59,868][134294] Updated weights for policy 0, policy_version 108934 (0.0028) [2025-01-04 05:20:02,995][134294] Updated weights for policy 0, policy_version 108944 (0.0027) [2025-01-04 05:20:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 446246912. Throughput: 0: 3714.1. Samples: 100725892. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:03,968][134211] Avg episode reward: [(0, '8.141')] [2025-01-04 05:20:06,059][134294] Updated weights for policy 0, policy_version 108954 (0.0026) [2025-01-04 05:20:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 14842.8). Total num frames: 446312448. Throughput: 0: 3746.7. Samples: 100745678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:08,968][134211] Avg episode reward: [(0, '8.209')] [2025-01-04 05:20:09,210][134294] Updated weights for policy 0, policy_version 108964 (0.0025) [2025-01-04 05:20:12,236][134294] Updated weights for policy 0, policy_version 108974 (0.0025) [2025-01-04 05:20:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14404.3, 300 sec: 14842.8). Total num frames: 446377984. Throughput: 0: 3749.8. Samples: 100765736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:13,968][134211] Avg episode reward: [(0, '8.504')] [2025-01-04 05:20:15,149][134294] Updated weights for policy 0, policy_version 108984 (0.0025) [2025-01-04 05:20:18,320][134294] Updated weights for policy 0, policy_version 108994 (0.0025) [2025-01-04 05:20:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.6, 300 sec: 14870.7). Total num frames: 446443520. Throughput: 0: 3754.9. Samples: 100776202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:18,968][134211] Avg episode reward: [(0, '7.171')] [2025-01-04 05:20:21,639][134294] Updated weights for policy 0, policy_version 109004 (0.0025) [2025-01-04 05:20:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 446513152. Throughput: 0: 3718.1. Samples: 100794916. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:23,968][134211] Avg episode reward: [(0, '9.022')] [2025-01-04 05:20:24,343][134294] Updated weights for policy 0, policy_version 109014 (0.0016) [2025-01-04 05:20:26,287][134294] Updated weights for policy 0, policy_version 109024 (0.0012) [2025-01-04 05:20:28,142][134294] Updated weights for policy 0, policy_version 109034 (0.0013) [2025-01-04 05:20:28,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15291.7, 300 sec: 15009.4). Total num frames: 446619648. Throughput: 0: 3742.3. Samples: 100823986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:28,968][134211] Avg episode reward: [(0, '8.077')] [2025-01-04 05:20:30,034][134294] Updated weights for policy 0, policy_version 109044 (0.0012) [2025-01-04 05:20:31,948][134294] Updated weights for policy 0, policy_version 109054 (0.0013) [2025-01-04 05:20:33,845][134294] Updated weights for policy 0, policy_version 109064 (0.0013) [2025-01-04 05:20:33,967][134211] Fps is (10 sec: 21299.3, 60 sec: 15906.2, 300 sec: 15134.5). Total num frames: 446726144. Throughput: 0: 3776.8. Samples: 100840388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:20:33,968][134211] Avg episode reward: [(0, '8.682')] [2025-01-04 05:20:35,716][134294] Updated weights for policy 0, policy_version 109074 (0.0015) [2025-01-04 05:20:38,482][134294] Updated weights for policy 0, policy_version 109084 (0.0025) [2025-01-04 05:20:38,968][134211] Fps is (10 sec: 19251.0, 60 sec: 16179.2, 300 sec: 15189.9). Total num frames: 446812160. Throughput: 0: 3848.0. Samples: 100871072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:20:38,970][134211] Avg episode reward: [(0, '8.570')] [2025-01-04 05:20:41,874][134294] Updated weights for policy 0, policy_version 109094 (0.0029) [2025-01-04 05:20:43,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15428.2, 300 sec: 15051.0). Total num frames: 446869504. Throughput: 0: 3838.3. Samples: 100888930. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:20:43,969][134211] Avg episode reward: [(0, '7.104')] [2025-01-04 05:20:45,590][134294] Updated weights for policy 0, policy_version 109104 (0.0027) [2025-01-04 05:20:48,969][134211] Fps is (10 sec: 11468.0, 60 sec: 14677.1, 300 sec: 14898.3). Total num frames: 446926848. Throughput: 0: 3811.3. Samples: 100897404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:20:48,969][134211] Avg episode reward: [(0, '8.623')] [2025-01-04 05:20:49,065][134294] Updated weights for policy 0, policy_version 109114 (0.0027) [2025-01-04 05:20:52,286][134294] Updated weights for policy 0, policy_version 109124 (0.0026) [2025-01-04 05:20:53,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14540.8, 300 sec: 14870.6). Total num frames: 446988288. Throughput: 0: 3777.9. Samples: 100915686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:20:53,969][134211] Avg episode reward: [(0, '8.337')] [2025-01-04 05:20:55,579][134294] Updated weights for policy 0, policy_version 109134 (0.0028) [2025-01-04 05:20:58,503][134294] Updated weights for policy 0, policy_version 109144 (0.0022) [2025-01-04 05:20:58,968][134211] Fps is (10 sec: 13108.2, 60 sec: 14609.1, 300 sec: 14898.3). Total num frames: 447057920. Throughput: 0: 3774.8. Samples: 100935600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:20:58,968][134211] Avg episode reward: [(0, '7.924')] [2025-01-04 05:21:01,533][134294] Updated weights for policy 0, policy_version 109154 (0.0024) [2025-01-04 05:21:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14898.4). Total num frames: 447123456. Throughput: 0: 3768.8. Samples: 100945796. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:03,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 05:21:04,618][134294] Updated weights for policy 0, policy_version 109164 (0.0026) [2025-01-04 05:21:07,617][134294] Updated weights for policy 0, policy_version 109174 (0.0022) [2025-01-04 05:21:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.3, 300 sec: 14912.2). Total num frames: 447193088. Throughput: 0: 3797.0. Samples: 100965782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:08,968][134211] Avg episode reward: [(0, '8.532')] [2025-01-04 05:21:10,660][134294] Updated weights for policy 0, policy_version 109184 (0.0025) [2025-01-04 05:21:13,530][134294] Updated weights for policy 0, policy_version 109194 (0.0026) [2025-01-04 05:21:13,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14745.5, 300 sec: 14926.1). Total num frames: 447262720. Throughput: 0: 3615.7. Samples: 100986694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:13,969][134211] Avg episode reward: [(0, '7.758')] [2025-01-04 05:21:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000109195_447262720.pth... [2025-01-04 05:21:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000108327_443707392.pth [2025-01-04 05:21:16,485][134294] Updated weights for policy 0, policy_version 109204 (0.0021) [2025-01-04 05:21:18,522][134294] Updated weights for policy 0, policy_version 109214 (0.0012) [2025-01-04 05:21:18,968][134211] Fps is (10 sec: 15155.3, 60 sec: 15018.7, 300 sec: 14995.5). Total num frames: 447344640. Throughput: 0: 3473.4. Samples: 100996690. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:18,969][134211] Avg episode reward: [(0, '8.574')] [2025-01-04 05:21:20,633][134294] Updated weights for policy 0, policy_version 109224 (0.0011) [2025-01-04 05:21:22,968][134294] Updated weights for policy 0, policy_version 109234 (0.0019) [2025-01-04 05:21:23,968][134211] Fps is (10 sec: 16794.1, 60 sec: 15291.7, 300 sec: 15051.1). Total num frames: 447430656. Throughput: 0: 3445.8. Samples: 101026134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:23,968][134211] Avg episode reward: [(0, '9.014')] [2025-01-04 05:21:26,482][134294] Updated weights for policy 0, policy_version 109244 (0.0029) [2025-01-04 05:21:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14540.8, 300 sec: 14884.4). Total num frames: 447492096. Throughput: 0: 3456.7. Samples: 101044480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:28,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 05:21:29,711][134294] Updated weights for policy 0, policy_version 109254 (0.0028) [2025-01-04 05:21:32,325][134294] Updated weights for policy 0, policy_version 109264 (0.0019) [2025-01-04 05:21:33,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14199.5, 300 sec: 14842.8). Total num frames: 447578112. Throughput: 0: 3479.7. Samples: 101053986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:33,968][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 05:21:34,152][134294] Updated weights for policy 0, policy_version 109274 (0.0013) [2025-01-04 05:21:36,299][134294] Updated weights for policy 0, policy_version 109284 (0.0017) [2025-01-04 05:21:38,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14131.2, 300 sec: 14898.3). Total num frames: 447660032. Throughput: 0: 3720.4. Samples: 101083102. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:38,969][134211] Avg episode reward: [(0, '9.258')] [2025-01-04 05:21:39,644][134294] Updated weights for policy 0, policy_version 109294 (0.0026) [2025-01-04 05:21:43,034][134294] Updated weights for policy 0, policy_version 109304 (0.0030) [2025-01-04 05:21:43,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14131.2, 300 sec: 14884.4). Total num frames: 447717376. Throughput: 0: 3675.7. Samples: 101101006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:43,968][134211] Avg episode reward: [(0, '8.349')] [2025-01-04 05:21:46,472][134294] Updated weights for policy 0, policy_version 109314 (0.0024) [2025-01-04 05:21:48,968][134211] Fps is (10 sec: 11878.7, 60 sec: 14199.7, 300 sec: 14884.5). Total num frames: 447778816. Throughput: 0: 3648.2. Samples: 101109964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:48,968][134211] Avg episode reward: [(0, '8.790')] [2025-01-04 05:21:49,623][134294] Updated weights for policy 0, policy_version 109324 (0.0021) [2025-01-04 05:21:51,776][134294] Updated weights for policy 0, policy_version 109334 (0.0014) [2025-01-04 05:21:53,920][134294] Updated weights for policy 0, policy_version 109344 (0.0015) [2025-01-04 05:21:53,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14745.7, 300 sec: 14981.6). Total num frames: 447873024. Throughput: 0: 3719.8. Samples: 101133174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:53,968][134211] Avg episode reward: [(0, '8.028')] [2025-01-04 05:21:56,147][134294] Updated weights for policy 0, policy_version 109354 (0.0014) [2025-01-04 05:21:58,185][134294] Updated weights for policy 0, policy_version 109364 (0.0013) [2025-01-04 05:21:58,968][134211] Fps is (10 sec: 18841.4, 60 sec: 15155.2, 300 sec: 15078.8). Total num frames: 447967232. Throughput: 0: 3896.5. Samples: 101162034. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:21:58,968][134211] Avg episode reward: [(0, '8.241')] [2025-01-04 05:22:01,527][134294] Updated weights for policy 0, policy_version 109374 (0.0027) [2025-01-04 05:22:03,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14950.4, 300 sec: 14981.6). Total num frames: 448020480. Throughput: 0: 3878.6. Samples: 101171226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:03,968][134211] Avg episode reward: [(0, '8.580')] [2025-01-04 05:22:05,283][134294] Updated weights for policy 0, policy_version 109384 (0.0026) [2025-01-04 05:22:08,948][134294] Updated weights for policy 0, policy_version 109394 (0.0027) [2025-01-04 05:22:08,968][134211] Fps is (10 sec: 11059.2, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 448077824. Throughput: 0: 3592.4. Samples: 101187790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:08,968][134211] Avg episode reward: [(0, '8.658')] [2025-01-04 05:22:12,193][134294] Updated weights for policy 0, policy_version 109404 (0.0026) [2025-01-04 05:22:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14609.1, 300 sec: 14828.9). Total num frames: 448139264. Throughput: 0: 3583.5. Samples: 101205740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:13,969][134211] Avg episode reward: [(0, '7.775')] [2025-01-04 05:22:15,499][134294] Updated weights for policy 0, policy_version 109414 (0.0025) [2025-01-04 05:22:18,730][134294] Updated weights for policy 0, policy_version 109424 (0.0024) [2025-01-04 05:22:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14267.7, 300 sec: 14828.9). Total num frames: 448200704. Throughput: 0: 3582.9. Samples: 101215218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:18,968][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 05:22:22,140][134294] Updated weights for policy 0, policy_version 109434 (0.0027) [2025-01-04 05:22:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.7, 300 sec: 14801.1). Total num frames: 448270336. Throughput: 0: 3346.1. Samples: 101233678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:23,968][134211] Avg episode reward: [(0, '8.752')] [2025-01-04 05:22:24,432][134294] Updated weights for policy 0, policy_version 109444 (0.0012) [2025-01-04 05:22:26,365][134294] Updated weights for policy 0, policy_version 109454 (0.0012) [2025-01-04 05:22:28,284][134294] Updated weights for policy 0, policy_version 109464 (0.0013) [2025-01-04 05:22:28,967][134211] Fps is (10 sec: 17613.2, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 448376832. Throughput: 0: 3621.1. Samples: 101263954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:28,968][134211] Avg episode reward: [(0, '7.982')] [2025-01-04 05:22:30,318][134294] Updated weights for policy 0, policy_version 109474 (0.0016) [2025-01-04 05:22:33,452][134294] Updated weights for policy 0, policy_version 109484 (0.0026) [2025-01-04 05:22:33,968][134211] Fps is (10 sec: 18021.8, 60 sec: 14540.7, 300 sec: 14690.0). Total num frames: 448450560. Throughput: 0: 3723.7. Samples: 101277532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:33,969][134211] Avg episode reward: [(0, '9.105')] [2025-01-04 05:22:36,563][134294] Updated weights for policy 0, policy_version 109494 (0.0029) [2025-01-04 05:22:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14267.7, 300 sec: 14690.1). Total num frames: 448516096. Throughput: 0: 3640.2. Samples: 101296982. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:22:38,968][134211] Avg episode reward: [(0, '8.158')] [2025-01-04 05:22:39,838][134294] Updated weights for policy 0, policy_version 109504 (0.0027) [2025-01-04 05:22:43,049][134294] Updated weights for policy 0, policy_version 109514 (0.0028) [2025-01-04 05:22:43,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14336.0, 300 sec: 14717.8). Total num frames: 448577536. Throughput: 0: 3422.9. Samples: 101316066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:22:43,968][134211] Avg episode reward: [(0, '8.351')] [2025-01-04 05:22:46,046][134294] Updated weights for policy 0, policy_version 109524 (0.0023) [2025-01-04 05:22:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14759.5). Total num frames: 448647168. Throughput: 0: 3448.6. Samples: 101326414. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:22:48,968][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 05:22:49,054][134294] Updated weights for policy 0, policy_version 109534 (0.0025) [2025-01-04 05:22:52,267][134294] Updated weights for policy 0, policy_version 109544 (0.0027) [2025-01-04 05:22:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14759.5). Total num frames: 448708608. Throughput: 0: 3514.7. Samples: 101345954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:22:53,968][134211] Avg episode reward: [(0, '8.711')] [2025-01-04 05:22:55,398][134294] Updated weights for policy 0, policy_version 109554 (0.0025) [2025-01-04 05:22:57,336][134294] Updated weights for policy 0, policy_version 109564 (0.0014) [2025-01-04 05:22:58,967][134211] Fps is (10 sec: 15974.8, 60 sec: 13994.7, 300 sec: 14870.6). Total num frames: 448806912. Throughput: 0: 3679.4. Samples: 101371314. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:22:58,968][134211] Avg episode reward: [(0, '8.416')] [2025-01-04 05:22:59,242][134294] Updated weights for policy 0, policy_version 109574 (0.0014) [2025-01-04 05:23:01,113][134294] Updated weights for policy 0, policy_version 109584 (0.0014) [2025-01-04 05:23:03,149][134294] Updated weights for policy 0, policy_version 109594 (0.0016) [2025-01-04 05:23:03,968][134211] Fps is (10 sec: 19661.0, 60 sec: 14745.6, 300 sec: 14967.8). Total num frames: 448905216. Throughput: 0: 3828.5. Samples: 101387500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:03,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 05:23:06,269][134294] Updated weights for policy 0, policy_version 109604 (0.0030) [2025-01-04 05:23:08,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14813.9, 300 sec: 14940.0). Total num frames: 448966656. Throughput: 0: 3926.7. Samples: 101410378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:08,968][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 05:23:09,642][134294] Updated weights for policy 0, policy_version 109614 (0.0027) [2025-01-04 05:23:12,632][134294] Updated weights for policy 0, policy_version 109624 (0.0027) [2025-01-04 05:23:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14882.2, 300 sec: 14870.6). Total num frames: 449032192. Throughput: 0: 3680.8. Samples: 101429592. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:13,968][134211] Avg episode reward: [(0, '7.896')] [2025-01-04 05:23:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000109628_449036288.pth... 
[2025-01-04 05:23:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000108768_445513728.pth [2025-01-04 05:23:15,911][134294] Updated weights for policy 0, policy_version 109634 (0.0025) [2025-01-04 05:23:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 449101824. Throughput: 0: 3595.0. Samples: 101439308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:18,968][134211] Avg episode reward: [(0, '8.329')] [2025-01-04 05:23:18,974][134294] Updated weights for policy 0, policy_version 109644 (0.0025) [2025-01-04 05:23:22,116][134294] Updated weights for policy 0, policy_version 109654 (0.0025) [2025-01-04 05:23:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 449163264. Throughput: 0: 3598.1. Samples: 101458898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:23,968][134211] Avg episode reward: [(0, '8.064')] [2025-01-04 05:23:25,544][134294] Updated weights for policy 0, policy_version 109664 (0.0024) [2025-01-04 05:23:28,581][134294] Updated weights for policy 0, policy_version 109674 (0.0026) [2025-01-04 05:23:28,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14199.4, 300 sec: 14648.4). Total num frames: 449228800. Throughput: 0: 3601.2. Samples: 101478122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:28,969][134211] Avg episode reward: [(0, '7.834')] [2025-01-04 05:23:31,484][134294] Updated weights for policy 0, policy_version 109684 (0.0026) [2025-01-04 05:23:33,939][134294] Updated weights for policy 0, policy_version 109694 (0.0017) [2025-01-04 05:23:33,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14267.8, 300 sec: 14690.1). Total num frames: 449306624. Throughput: 0: 3600.1. Samples: 101488418. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:33,968][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 05:23:35,859][134294] Updated weights for policy 0, policy_version 109704 (0.0014) [2025-01-04 05:23:37,751][134294] Updated weights for policy 0, policy_version 109714 (0.0014) [2025-01-04 05:23:38,968][134211] Fps is (10 sec: 18023.1, 60 sec: 14882.2, 300 sec: 14828.9). Total num frames: 449409024. Throughput: 0: 3805.8. Samples: 101517216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:23:38,968][134211] Avg episode reward: [(0, '9.581')] [2025-01-04 05:23:39,797][134294] Updated weights for policy 0, policy_version 109724 (0.0014) [2025-01-04 05:23:41,702][134294] Updated weights for policy 0, policy_version 109734 (0.0014) [2025-01-04 05:23:43,968][134211] Fps is (10 sec: 20070.3, 60 sec: 15496.5, 300 sec: 14926.1). Total num frames: 449507328. Throughput: 0: 3923.2. Samples: 101547860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:23:43,968][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 05:23:44,155][134294] Updated weights for policy 0, policy_version 109744 (0.0023) [2025-01-04 05:23:47,947][134294] Updated weights for policy 0, policy_version 109754 (0.0029) [2025-01-04 05:23:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15223.5, 300 sec: 14884.4). Total num frames: 449560576. Throughput: 0: 3760.4. Samples: 101556720. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:23:48,968][134211] Avg episode reward: [(0, '9.688')] [2025-01-04 05:23:51,561][134294] Updated weights for policy 0, policy_version 109764 (0.0030) [2025-01-04 05:23:53,968][134211] Fps is (10 sec: 11059.0, 60 sec: 15155.2, 300 sec: 14787.2). Total num frames: 449617920. Throughput: 0: 3620.8. Samples: 101573316. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:23:53,968][134211] Avg episode reward: [(0, '9.099')] [2025-01-04 05:23:55,119][134294] Updated weights for policy 0, policy_version 109774 (0.0028) [2025-01-04 05:23:58,290][134294] Updated weights for policy 0, policy_version 109784 (0.0027) [2025-01-04 05:23:58,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 449679360. Throughput: 0: 3597.4. Samples: 101591474. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:23:58,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 05:24:01,366][134294] Updated weights for policy 0, policy_version 109794 (0.0023) [2025-01-04 05:24:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.6, 300 sec: 14565.1). Total num frames: 449744896. Throughput: 0: 3607.2. Samples: 101601632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:03,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 05:24:04,618][134294] Updated weights for policy 0, policy_version 109804 (0.0025) [2025-01-04 05:24:06,556][134294] Updated weights for policy 0, policy_version 109814 (0.0013) [2025-01-04 05:24:08,373][134294] Updated weights for policy 0, policy_version 109824 (0.0013) [2025-01-04 05:24:08,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14677.3, 300 sec: 14690.1). Total num frames: 449847296. Throughput: 0: 3722.1. Samples: 101626392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:08,970][134211] Avg episode reward: [(0, '7.962')] [2025-01-04 05:24:11,070][134294] Updated weights for policy 0, policy_version 109834 (0.0022) [2025-01-04 05:24:13,970][134211] Fps is (10 sec: 17199.1, 60 sec: 14745.0, 300 sec: 14717.7). Total num frames: 449916928. Throughput: 0: 3830.1. Samples: 101650486. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:13,971][134211] Avg episode reward: [(0, '8.198')] [2025-01-04 05:24:14,004][134294] Updated weights for policy 0, policy_version 109844 (0.0025) [2025-01-04 05:24:17,190][134294] Updated weights for policy 0, policy_version 109854 (0.0025) [2025-01-04 05:24:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.3, 300 sec: 14731.7). Total num frames: 449982464. Throughput: 0: 3821.8. Samples: 101660400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:18,968][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 05:24:20,191][134294] Updated weights for policy 0, policy_version 109864 (0.0024) [2025-01-04 05:24:23,450][134294] Updated weights for policy 0, policy_version 109874 (0.0026) [2025-01-04 05:24:23,968][134211] Fps is (10 sec: 13110.6, 60 sec: 14745.7, 300 sec: 14731.7). Total num frames: 450048000. Throughput: 0: 3628.3. Samples: 101680492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:23,968][134211] Avg episode reward: [(0, '8.471')] [2025-01-04 05:24:25,724][134294] Updated weights for policy 0, policy_version 109884 (0.0015) [2025-01-04 05:24:27,740][134294] Updated weights for policy 0, policy_version 109894 (0.0013) [2025-01-04 05:24:28,968][134211] Fps is (10 sec: 15973.5, 60 sec: 15223.4, 300 sec: 14815.0). Total num frames: 450142208. Throughput: 0: 3528.4. Samples: 101706638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:28,969][134211] Avg episode reward: [(0, '7.986')] [2025-01-04 05:24:30,580][134294] Updated weights for policy 0, policy_version 109904 (0.0025) [2025-01-04 05:24:33,538][134294] Updated weights for policy 0, policy_version 109914 (0.0024) [2025-01-04 05:24:33,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15086.9, 300 sec: 14815.0). Total num frames: 450211840. Throughput: 0: 3558.9. Samples: 101716872. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:33,968][134211] Avg episode reward: [(0, '8.405')] [2025-01-04 05:24:36,516][134294] Updated weights for policy 0, policy_version 109924 (0.0025) [2025-01-04 05:24:38,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14472.5, 300 sec: 14690.1). Total num frames: 450277376. Throughput: 0: 3650.7. Samples: 101737596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:24:38,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 05:24:39,672][134294] Updated weights for policy 0, policy_version 109934 (0.0024) [2025-01-04 05:24:42,770][134294] Updated weights for policy 0, policy_version 109944 (0.0024) [2025-01-04 05:24:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14579.0). Total num frames: 450347008. Throughput: 0: 3684.7. Samples: 101757286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:24:43,968][134211] Avg episode reward: [(0, '7.896')] [2025-01-04 05:24:45,568][134294] Updated weights for policy 0, policy_version 109954 (0.0020) [2025-01-04 05:24:47,481][134294] Updated weights for policy 0, policy_version 109964 (0.0012) [2025-01-04 05:24:48,967][134211] Fps is (10 sec: 16794.0, 60 sec: 14745.7, 300 sec: 14676.2). Total num frames: 450445312. Throughput: 0: 3727.7. Samples: 101769378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:24:48,968][134211] Avg episode reward: [(0, '8.589')] [2025-01-04 05:24:49,363][134294] Updated weights for policy 0, policy_version 109974 (0.0015) [2025-01-04 05:24:51,265][134294] Updated weights for policy 0, policy_version 109984 (0.0015) [2025-01-04 05:24:53,227][134294] Updated weights for policy 0, policy_version 109994 (0.0014) [2025-01-04 05:24:53,969][134211] Fps is (10 sec: 20067.8, 60 sec: 15496.2, 300 sec: 14801.1). Total num frames: 450547712. Throughput: 0: 3895.8. Samples: 101801708. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:24:53,969][134211] Avg episode reward: [(0, '8.310')] [2025-01-04 05:24:55,613][134294] Updated weights for policy 0, policy_version 110004 (0.0017) [2025-01-04 05:24:58,778][134294] Updated weights for policy 0, policy_version 110014 (0.0027) [2025-01-04 05:24:58,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15633.1, 300 sec: 14815.0). Total num frames: 450617344. Throughput: 0: 3893.2. Samples: 101825672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:24:58,968][134211] Avg episode reward: [(0, '9.295')] [2025-01-04 05:25:01,951][134294] Updated weights for policy 0, policy_version 110024 (0.0029) [2025-01-04 05:25:03,968][134211] Fps is (10 sec: 13518.5, 60 sec: 15633.1, 300 sec: 14815.0). Total num frames: 450682880. Throughput: 0: 3882.9. Samples: 101835132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:03,968][134211] Avg episode reward: [(0, '7.879')] [2025-01-04 05:25:05,151][134294] Updated weights for policy 0, policy_version 110034 (0.0028) [2025-01-04 05:25:08,299][134294] Updated weights for policy 0, policy_version 110044 (0.0027) [2025-01-04 05:25:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 450748416. Throughput: 0: 3867.1. Samples: 101854512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:08,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 05:25:11,412][134294] Updated weights for policy 0, policy_version 110054 (0.0025) [2025-01-04 05:25:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14951.0, 300 sec: 14815.0). Total num frames: 450813952. Throughput: 0: 3727.4. Samples: 101874370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:13,970][134211] Avg episode reward: [(0, '8.525')] [2025-01-04 05:25:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110062_450813952.pth... 
[2025-01-04 05:25:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000109195_447262720.pth [2025-01-04 05:25:14,561][134294] Updated weights for policy 0, policy_version 110064 (0.0028) [2025-01-04 05:25:17,760][134294] Updated weights for policy 0, policy_version 110074 (0.0025) [2025-01-04 05:25:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14882.1, 300 sec: 14787.2). Total num frames: 450875392. Throughput: 0: 3709.2. Samples: 101883786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:18,968][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 05:25:21,002][134294] Updated weights for policy 0, policy_version 110084 (0.0025) [2025-01-04 05:25:23,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14813.8, 300 sec: 14634.5). Total num frames: 450936832. Throughput: 0: 3669.3. Samples: 101902714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:23,968][134211] Avg episode reward: [(0, '8.919')] [2025-01-04 05:25:24,394][134294] Updated weights for policy 0, policy_version 110094 (0.0024) [2025-01-04 05:25:26,745][134294] Updated weights for policy 0, policy_version 110104 (0.0013) [2025-01-04 05:25:28,628][134294] Updated weights for policy 0, policy_version 110114 (0.0013) [2025-01-04 05:25:28,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14882.3, 300 sec: 14606.7). Total num frames: 451035136. Throughput: 0: 3787.5. Samples: 101927724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:28,968][134211] Avg episode reward: [(0, '8.132')] [2025-01-04 05:25:30,484][134294] Updated weights for policy 0, policy_version 110124 (0.0013) [2025-01-04 05:25:32,419][134294] Updated weights for policy 0, policy_version 110134 (0.0013) [2025-01-04 05:25:33,968][134211] Fps is (10 sec: 20070.5, 60 sec: 15428.3, 300 sec: 14662.3). Total num frames: 451137536. Throughput: 0: 3880.4. Samples: 101943996. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:33,968][134211] Avg episode reward: [(0, '7.994')] [2025-01-04 05:25:34,416][134294] Updated weights for policy 0, policy_version 110144 (0.0013) [2025-01-04 05:25:36,604][134294] Updated weights for policy 0, policy_version 110154 (0.0018) [2025-01-04 05:25:38,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15633.1, 300 sec: 14731.7). Total num frames: 451215360. Throughput: 0: 3797.8. Samples: 101972606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:25:38,968][134211] Avg episode reward: [(0, '8.420')] [2025-01-04 05:25:40,068][134294] Updated weights for policy 0, policy_version 110164 (0.0028) [2025-01-04 05:25:43,249][134294] Updated weights for policy 0, policy_version 110174 (0.0028) [2025-01-04 05:25:43,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15564.7, 300 sec: 14759.5). Total num frames: 451280896. Throughput: 0: 3679.0. Samples: 101991230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:25:43,969][134211] Avg episode reward: [(0, '7.750')] [2025-01-04 05:25:46,845][134294] Updated weights for policy 0, policy_version 110184 (0.0028) [2025-01-04 05:25:48,968][134211] Fps is (10 sec: 11877.5, 60 sec: 14813.6, 300 sec: 14731.7). Total num frames: 451334144. Throughput: 0: 3663.6. Samples: 101999998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:25:48,969][134211] Avg episode reward: [(0, '7.807')] [2025-01-04 05:25:50,441][134294] Updated weights for policy 0, policy_version 110194 (0.0028) [2025-01-04 05:25:53,968][134211] Fps is (10 sec: 11059.3, 60 sec: 14063.2, 300 sec: 14690.1). Total num frames: 451391488. Throughput: 0: 3603.5. Samples: 102016672. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:25:53,968][134211] Avg episode reward: [(0, '8.314')] [2025-01-04 05:25:54,075][134294] Updated weights for policy 0, policy_version 110204 (0.0025) [2025-01-04 05:25:57,468][134294] Updated weights for policy 0, policy_version 110214 (0.0027) [2025-01-04 05:25:58,967][134211] Fps is (10 sec: 12289.2, 60 sec: 13994.7, 300 sec: 14690.1). Total num frames: 451457024. Throughput: 0: 3557.1. Samples: 102034436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:25:58,968][134211] Avg episode reward: [(0, '8.022')] [2025-01-04 05:25:59,833][134294] Updated weights for policy 0, policy_version 110224 (0.0018) [2025-01-04 05:26:01,750][134294] Updated weights for policy 0, policy_version 110234 (0.0014) [2025-01-04 05:26:03,662][134294] Updated weights for policy 0, policy_version 110244 (0.0014) [2025-01-04 05:26:03,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 451563520. Throughput: 0: 3688.6. Samples: 102049774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:03,968][134211] Avg episode reward: [(0, '8.195')] [2025-01-04 05:26:05,553][134294] Updated weights for policy 0, policy_version 110254 (0.0014) [2025-01-04 05:26:07,848][134294] Updated weights for policy 0, policy_version 110264 (0.0020) [2025-01-04 05:26:08,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15086.9, 300 sec: 14884.5). Total num frames: 451653632. Throughput: 0: 3962.1. Samples: 102081008. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:08,968][134211] Avg episode reward: [(0, '8.170')] [2025-01-04 05:26:11,076][134294] Updated weights for policy 0, policy_version 110274 (0.0028) [2025-01-04 05:26:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15087.0, 300 sec: 14828.9). Total num frames: 451719168. Throughput: 0: 3839.2. Samples: 102100488. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:13,968][134211] Avg episode reward: [(0, '7.910')] [2025-01-04 05:26:14,310][134294] Updated weights for policy 0, policy_version 110284 (0.0027) [2025-01-04 05:26:17,446][134294] Updated weights for policy 0, policy_version 110294 (0.0026) [2025-01-04 05:26:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15155.2, 300 sec: 14759.5). Total num frames: 451784704. Throughput: 0: 3693.7. Samples: 102110212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:18,968][134211] Avg episode reward: [(0, '9.016')] [2025-01-04 05:26:20,563][134294] Updated weights for policy 0, policy_version 110304 (0.0026) [2025-01-04 05:26:23,875][134294] Updated weights for policy 0, policy_version 110314 (0.0027) [2025-01-04 05:26:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15155.2, 300 sec: 14759.5). Total num frames: 451846144. Throughput: 0: 3492.8. Samples: 102129784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:23,968][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 05:26:27,307][134294] Updated weights for policy 0, policy_version 110324 (0.0028) [2025-01-04 05:26:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14676.2). Total num frames: 451907584. Throughput: 0: 3472.3. Samples: 102147484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:28,968][134211] Avg episode reward: [(0, '8.430')] [2025-01-04 05:26:30,377][134294] Updated weights for policy 0, policy_version 110334 (0.0023) [2025-01-04 05:26:33,367][134294] Updated weights for policy 0, policy_version 110344 (0.0027) [2025-01-04 05:26:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13926.4, 300 sec: 14620.6). Total num frames: 451973120. Throughput: 0: 3508.1. Samples: 102157862. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:33,968][134211] Avg episode reward: [(0, '8.823')] [2025-01-04 05:26:35,651][134294] Updated weights for policy 0, policy_version 110354 (0.0016) [2025-01-04 05:26:37,521][134294] Updated weights for policy 0, policy_version 110364 (0.0012) [2025-01-04 05:26:38,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14404.3, 300 sec: 14787.3). Total num frames: 452079616. Throughput: 0: 3721.6. Samples: 102184142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:26:38,968][134211] Avg episode reward: [(0, '8.348')] [2025-01-04 05:26:39,396][134294] Updated weights for policy 0, policy_version 110374 (0.0014) [2025-01-04 05:26:41,351][134294] Updated weights for policy 0, policy_version 110384 (0.0012) [2025-01-04 05:26:43,172][134294] Updated weights for policy 0, policy_version 110394 (0.0014) [2025-01-04 05:26:43,968][134211] Fps is (10 sec: 21299.7, 60 sec: 15087.0, 300 sec: 14940.0). Total num frames: 452186112. Throughput: 0: 4047.0. Samples: 102216550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:26:43,968][134211] Avg episode reward: [(0, '8.182')] [2025-01-04 05:26:45,608][134294] Updated weights for policy 0, policy_version 110404 (0.0021) [2025-01-04 05:26:48,723][134294] Updated weights for policy 0, policy_version 110414 (0.0026) [2025-01-04 05:26:48,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15360.2, 300 sec: 14856.7). Total num frames: 452255744. Throughput: 0: 3981.1. Samples: 102228924. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:26:48,968][134211] Avg episode reward: [(0, '8.551')] [2025-01-04 05:26:52,066][134294] Updated weights for policy 0, policy_version 110424 (0.0028) [2025-01-04 05:26:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15428.3, 300 sec: 14745.6). Total num frames: 452317184. Throughput: 0: 3705.0. Samples: 102247732. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:26:53,969][134211] Avg episode reward: [(0, '8.815')] [2025-01-04 05:26:55,463][134294] Updated weights for policy 0, policy_version 110434 (0.0029) [2025-01-04 05:26:58,738][134294] Updated weights for policy 0, policy_version 110444 (0.0027) [2025-01-04 05:26:58,970][134211] Fps is (10 sec: 12695.3, 60 sec: 15427.7, 300 sec: 14787.2). Total num frames: 452382720. Throughput: 0: 3681.9. Samples: 102266178. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:26:58,970][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 05:27:01,616][134294] Updated weights for policy 0, policy_version 110454 (0.0024) [2025-01-04 05:27:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 452448256. Throughput: 0: 3693.1. Samples: 102276402. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:03,968][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 05:27:04,860][134294] Updated weights for policy 0, policy_version 110464 (0.0023) [2025-01-04 05:27:07,826][134294] Updated weights for policy 0, policy_version 110474 (0.0024) [2025-01-04 05:27:08,968][134211] Fps is (10 sec: 13109.0, 60 sec: 14335.9, 300 sec: 14828.9). Total num frames: 452513792. Throughput: 0: 3700.8. Samples: 102296322. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:08,969][134211] Avg episode reward: [(0, '8.537')] [2025-01-04 05:27:11,114][134294] Updated weights for policy 0, policy_version 110484 (0.0025) [2025-01-04 05:27:13,971][134211] Fps is (10 sec: 13103.2, 60 sec: 14335.3, 300 sec: 14842.6). Total num frames: 452579328. Throughput: 0: 3739.7. Samples: 102315784. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:13,971][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 05:27:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110493_452579328.pth... 
[2025-01-04 05:27:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000109628_449036288.pth [2025-01-04 05:27:14,195][134294] Updated weights for policy 0, policy_version 110494 (0.0027) [2025-01-04 05:27:17,235][134294] Updated weights for policy 0, policy_version 110504 (0.0026) [2025-01-04 05:27:18,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14540.8, 300 sec: 14870.6). Total num frames: 452657152. Throughput: 0: 3723.9. Samples: 102325436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:18,968][134211] Avg episode reward: [(0, '8.018')] [2025-01-04 05:27:19,193][134294] Updated weights for policy 0, policy_version 110514 (0.0012) [2025-01-04 05:27:21,133][134294] Updated weights for policy 0, policy_version 110524 (0.0014) [2025-01-04 05:27:23,169][134294] Updated weights for policy 0, policy_version 110534 (0.0014) [2025-01-04 05:27:23,968][134211] Fps is (10 sec: 18027.0, 60 sec: 15223.3, 300 sec: 14856.6). Total num frames: 452759552. Throughput: 0: 3798.8. Samples: 102355090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:23,969][134211] Avg episode reward: [(0, '7.991')] [2025-01-04 05:27:25,261][134294] Updated weights for policy 0, policy_version 110544 (0.0013) [2025-01-04 05:27:27,298][134294] Updated weights for policy 0, policy_version 110554 (0.0016) [2025-01-04 05:27:28,967][134211] Fps is (10 sec: 20480.1, 60 sec: 15906.2, 300 sec: 14953.9). Total num frames: 452861952. Throughput: 0: 3749.0. Samples: 102385254. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:28,968][134211] Avg episode reward: [(0, '8.389')] [2025-01-04 05:27:29,182][134294] Updated weights for policy 0, policy_version 110564 (0.0015) [2025-01-04 05:27:31,882][134294] Updated weights for policy 0, policy_version 110574 (0.0025) [2025-01-04 05:27:33,968][134211] Fps is (10 sec: 17203.7, 60 sec: 15974.3, 300 sec: 14967.7). Total num frames: 452931584. 
Throughput: 0: 3768.8. Samples: 102398522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:33,969][134211] Avg episode reward: [(0, '8.363')] [2025-01-04 05:27:35,360][134294] Updated weights for policy 0, policy_version 110584 (0.0030) [2025-01-04 05:27:38,515][134294] Updated weights for policy 0, policy_version 110594 (0.0026) [2025-01-04 05:27:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15291.7, 300 sec: 14981.6). Total num frames: 452997120. Throughput: 0: 3763.4. Samples: 102417084. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:38,968][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 05:27:41,656][134294] Updated weights for policy 0, policy_version 110604 (0.0027) [2025-01-04 05:27:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14608.9, 300 sec: 14967.7). Total num frames: 453062656. Throughput: 0: 3783.9. Samples: 102436448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:27:43,969][134211] Avg episode reward: [(0, '8.915')] [2025-01-04 05:27:44,932][134294] Updated weights for policy 0, policy_version 110614 (0.0026) [2025-01-04 05:27:48,415][134294] Updated weights for policy 0, policy_version 110624 (0.0023) [2025-01-04 05:27:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.3, 300 sec: 14953.9). Total num frames: 453120000. Throughput: 0: 3756.2. Samples: 102445430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:27:48,969][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 05:27:52,008][134294] Updated weights for policy 0, policy_version 110634 (0.0024) [2025-01-04 05:27:53,968][134211] Fps is (10 sec: 11469.5, 60 sec: 14336.0, 300 sec: 14815.0). Total num frames: 453177344. Throughput: 0: 3704.1. Samples: 102463006. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:27:53,968][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 05:27:55,459][134294] Updated weights for policy 0, policy_version 110644 (0.0025) [2025-01-04 05:27:58,577][134294] Updated weights for policy 0, policy_version 110654 (0.0021) [2025-01-04 05:27:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14336.4, 300 sec: 14703.9). Total num frames: 453242880. Throughput: 0: 3687.3. Samples: 102481702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:27:58,968][134211] Avg episode reward: [(0, '8.407')] [2025-01-04 05:28:01,486][134294] Updated weights for policy 0, policy_version 110664 (0.0024) [2025-01-04 05:28:03,666][134294] Updated weights for policy 0, policy_version 110674 (0.0014) [2025-01-04 05:28:03,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14609.1, 300 sec: 14773.4). Total num frames: 453324800. Throughput: 0: 3700.5. Samples: 102491958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:03,968][134211] Avg episode reward: [(0, '8.169')] [2025-01-04 05:28:06,377][134294] Updated weights for policy 0, policy_version 110684 (0.0022) [2025-01-04 05:28:08,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14745.7, 300 sec: 14801.1). Total num frames: 453398528. Throughput: 0: 3589.6. Samples: 102516618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:08,968][134211] Avg episode reward: [(0, '8.376')] [2025-01-04 05:28:09,291][134294] Updated weights for policy 0, policy_version 110694 (0.0025) [2025-01-04 05:28:12,230][134294] Updated weights for policy 0, policy_version 110704 (0.0022) [2025-01-04 05:28:13,967][134211] Fps is (10 sec: 15155.3, 60 sec: 14951.2, 300 sec: 14828.9). Total num frames: 453476352. Throughput: 0: 3413.7. Samples: 102538872. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:13,968][134211] Avg episode reward: [(0, '8.488')] [2025-01-04 05:28:14,192][134294] Updated weights for policy 0, policy_version 110714 (0.0014) [2025-01-04 05:28:16,091][134294] Updated weights for policy 0, policy_version 110724 (0.0014) [2025-01-04 05:28:17,992][134294] Updated weights for policy 0, policy_version 110734 (0.0014) [2025-01-04 05:28:18,968][134211] Fps is (10 sec: 18841.7, 60 sec: 15496.5, 300 sec: 14995.5). Total num frames: 453586944. Throughput: 0: 3480.3. Samples: 102555132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:18,968][134211] Avg episode reward: [(0, '8.610')] [2025-01-04 05:28:19,831][134294] Updated weights for policy 0, policy_version 110744 (0.0013) [2025-01-04 05:28:22,879][134294] Updated weights for policy 0, policy_version 110754 (0.0025) [2025-01-04 05:28:23,968][134211] Fps is (10 sec: 18022.0, 60 sec: 14950.5, 300 sec: 15009.4). Total num frames: 453656576. Throughput: 0: 3692.3. Samples: 102583238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:23,968][134211] Avg episode reward: [(0, '8.612')] [2025-01-04 05:28:26,490][134294] Updated weights for policy 0, policy_version 110764 (0.0027) [2025-01-04 05:28:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14199.4, 300 sec: 14940.0). Total num frames: 453713920. Throughput: 0: 3641.3. Samples: 102600306. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:28,968][134211] Avg episode reward: [(0, '8.518')] [2025-01-04 05:28:29,955][134294] Updated weights for policy 0, policy_version 110774 (0.0030) [2025-01-04 05:28:33,008][134294] Updated weights for policy 0, policy_version 110784 (0.0028) [2025-01-04 05:28:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14131.3, 300 sec: 14815.0). Total num frames: 453779456. Throughput: 0: 3650.8. Samples: 102609718. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:33,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 05:28:36,126][134294] Updated weights for policy 0, policy_version 110794 (0.0026) [2025-01-04 05:28:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14703.9). Total num frames: 453844992. Throughput: 0: 3702.1. Samples: 102629600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:38,968][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 05:28:39,284][134294] Updated weights for policy 0, policy_version 110804 (0.0027) [2025-01-04 05:28:42,143][134294] Updated weights for policy 0, policy_version 110814 (0.0022) [2025-01-04 05:28:43,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14472.7, 300 sec: 14815.0). Total num frames: 453931008. Throughput: 0: 3789.1. Samples: 102652210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:28:43,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 05:28:44,198][134294] Updated weights for policy 0, policy_version 110824 (0.0016) [2025-01-04 05:28:46,979][134294] Updated weights for policy 0, policy_version 110834 (0.0025) [2025-01-04 05:28:48,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14677.3, 300 sec: 14856.7). Total num frames: 454000640. Throughput: 0: 3830.5. Samples: 102664332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:28:48,968][134211] Avg episode reward: [(0, '8.437')] [2025-01-04 05:28:50,077][134294] Updated weights for policy 0, policy_version 110844 (0.0023) [2025-01-04 05:28:53,150][134294] Updated weights for policy 0, policy_version 110854 (0.0026) [2025-01-04 05:28:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.9, 300 sec: 14870.6). Total num frames: 454066176. Throughput: 0: 3734.4. Samples: 102684668. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:28:53,968][134211] Avg episode reward: [(0, '7.785')] [2025-01-04 05:28:56,043][134294] Updated weights for policy 0, policy_version 110864 (0.0021) [2025-01-04 05:28:57,963][134294] Updated weights for policy 0, policy_version 110874 (0.0013) [2025-01-04 05:28:58,968][134211] Fps is (10 sec: 15564.9, 60 sec: 15223.5, 300 sec: 14953.9). Total num frames: 454156288. Throughput: 0: 3787.8. Samples: 102709322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:28:58,968][134211] Avg episode reward: [(0, '8.096')] [2025-01-04 05:29:00,382][134294] Updated weights for policy 0, policy_version 110884 (0.0021) [2025-01-04 05:29:03,443][134294] Updated weights for policy 0, policy_version 110894 (0.0023) [2025-01-04 05:29:03,968][134211] Fps is (10 sec: 15974.2, 60 sec: 15018.6, 300 sec: 14842.8). Total num frames: 454225920. Throughput: 0: 3692.7. Samples: 102721304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:03,968][134211] Avg episode reward: [(0, '8.173')] [2025-01-04 05:29:06,692][134294] Updated weights for policy 0, policy_version 110904 (0.0025) [2025-01-04 05:29:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 14829.0). Total num frames: 454291456. Throughput: 0: 3493.1. Samples: 102740428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:08,968][134211] Avg episode reward: [(0, '8.660')] [2025-01-04 05:29:09,829][134294] Updated weights for policy 0, policy_version 110914 (0.0028) [2025-01-04 05:29:12,307][134294] Updated weights for policy 0, policy_version 110924 (0.0019) [2025-01-04 05:29:13,968][134211] Fps is (10 sec: 15155.4, 60 sec: 15018.6, 300 sec: 14898.3). Total num frames: 454377472. Throughput: 0: 3635.6. Samples: 102763906. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:13,968][134211] Avg episode reward: [(0, '7.429')] [2025-01-04 05:29:14,044][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110933_454381568.pth... [2025-01-04 05:29:14,085][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110062_450813952.pth [2025-01-04 05:29:14,275][134294] Updated weights for policy 0, policy_version 110934 (0.0013) [2025-01-04 05:29:16,161][134294] Updated weights for policy 0, policy_version 110944 (0.0013) [2025-01-04 05:29:18,093][134294] Updated weights for policy 0, policy_version 110954 (0.0016) [2025-01-04 05:29:18,967][134211] Fps is (10 sec: 19251.7, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 454483968. Throughput: 0: 3786.6. Samples: 102780112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:18,968][134211] Avg episode reward: [(0, '7.598')] [2025-01-04 05:29:19,984][134294] Updated weights for policy 0, policy_version 110964 (0.0013) [2025-01-04 05:29:22,471][134294] Updated weights for policy 0, policy_version 110974 (0.0023) [2025-01-04 05:29:23,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15086.9, 300 sec: 14981.7). Total num frames: 454561792. Throughput: 0: 3998.5. Samples: 102809534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:23,968][134211] Avg episode reward: [(0, '7.824')] [2025-01-04 05:29:26,189][134294] Updated weights for policy 0, policy_version 110984 (0.0029) [2025-01-04 05:29:28,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15155.2, 300 sec: 14953.9). Total num frames: 454623232. Throughput: 0: 3875.2. Samples: 102826596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:28,968][134211] Avg episode reward: [(0, '8.121')] [2025-01-04 05:29:29,622][134294] Updated weights for policy 0, policy_version 110994 (0.0029) [2025-01-04 05:29:32,755][134294] Updated weights for policy 0, policy_version 111004 (0.0027) [2025-01-04 05:29:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15155.2, 300 sec: 14953.9). Total num frames: 454688768. Throughput: 0: 3819.7. Samples: 102836220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:33,968][134211] Avg episode reward: [(0, '8.491')] [2025-01-04 05:29:35,685][134294] Updated weights for policy 0, policy_version 111014 (0.0026) [2025-01-04 05:29:38,801][134294] Updated weights for policy 0, policy_version 111024 (0.0026) [2025-01-04 05:29:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 14940.0). Total num frames: 454754304. Throughput: 0: 3819.2. Samples: 102856532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:38,968][134211] Avg episode reward: [(0, '8.381')] [2025-01-04 05:29:41,724][134294] Updated weights for policy 0, policy_version 111034 (0.0025) [2025-01-04 05:29:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.8, 300 sec: 14828.9). Total num frames: 454819840. Throughput: 0: 3718.5. Samples: 102876656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:43,968][134211] Avg episode reward: [(0, '7.928')] [2025-01-04 05:29:44,913][134294] Updated weights for policy 0, policy_version 111044 (0.0025) [2025-01-04 05:29:48,365][134294] Updated weights for policy 0, policy_version 111054 (0.0023) [2025-01-04 05:29:48,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14676.8, 300 sec: 14690.0). Total num frames: 454881280. Throughput: 0: 3658.4. Samples: 102885938. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:48,970][134211] Avg episode reward: [(0, '7.629')] [2025-01-04 05:29:51,689][134294] Updated weights for policy 0, policy_version 111064 (0.0023) [2025-01-04 05:29:53,849][134294] Updated weights for policy 0, policy_version 111074 (0.0013) [2025-01-04 05:29:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 454959104. Throughput: 0: 3660.0. Samples: 102905130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:53,968][134211] Avg episode reward: [(0, '8.840')] [2025-01-04 05:29:56,014][134294] Updated weights for policy 0, policy_version 111084 (0.0014) [2025-01-04 05:29:58,045][134294] Updated weights for policy 0, policy_version 111094 (0.0013) [2025-01-04 05:29:58,968][134211] Fps is (10 sec: 17616.8, 60 sec: 15018.7, 300 sec: 14828.9). Total num frames: 455057408. Throughput: 0: 3781.1. Samples: 102934056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:29:58,968][134211] Avg episode reward: [(0, '8.157')] [2025-01-04 05:30:00,285][134294] Updated weights for policy 0, policy_version 111104 (0.0014) [2025-01-04 05:30:03,851][134294] Updated weights for policy 0, policy_version 111114 (0.0027) [2025-01-04 05:30:03,968][134211] Fps is (10 sec: 16384.6, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 455122944. Throughput: 0: 3703.1. Samples: 102946750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:03,968][134211] Avg episode reward: [(0, '8.438')] [2025-01-04 05:30:07,098][134294] Updated weights for policy 0, policy_version 111124 (0.0029) [2025-01-04 05:30:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14882.1, 300 sec: 14815.0). Total num frames: 455184384. Throughput: 0: 3443.0. Samples: 102964470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:08,968][134211] Avg episode reward: [(0, '8.758')] [2025-01-04 05:30:10,197][134294] Updated weights for policy 0, policy_version 111134 (0.0027) [2025-01-04 05:30:13,271][134294] Updated weights for policy 0, policy_version 111144 (0.0028) [2025-01-04 05:30:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.0, 300 sec: 14842.8). Total num frames: 455254016. Throughput: 0: 3507.3. Samples: 102984426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:13,969][134211] Avg episode reward: [(0, '7.874')] [2025-01-04 05:30:16,242][134294] Updated weights for policy 0, policy_version 111154 (0.0027) [2025-01-04 05:30:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.3, 300 sec: 14856.7). Total num frames: 455319552. Throughput: 0: 3520.2. Samples: 102994630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:18,968][134211] Avg episode reward: [(0, '8.021')] [2025-01-04 05:30:19,469][134294] Updated weights for policy 0, policy_version 111164 (0.0025) [2025-01-04 05:30:22,316][134294] Updated weights for policy 0, policy_version 111174 (0.0022) [2025-01-04 05:30:23,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13926.4, 300 sec: 14787.3). Total num frames: 455397376. Throughput: 0: 3512.1. Samples: 103014578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:23,968][134211] Avg episode reward: [(0, '8.175')] [2025-01-04 05:30:24,474][134294] Updated weights for policy 0, policy_version 111184 (0.0016) [2025-01-04 05:30:26,977][134294] Updated weights for policy 0, policy_version 111194 (0.0018) [2025-01-04 05:30:28,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.5, 300 sec: 14703.9). Total num frames: 455475200. Throughput: 0: 3617.8. Samples: 103039456. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:28,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 05:30:30,215][134294] Updated weights for policy 0, policy_version 111204 (0.0027) [2025-01-04 05:30:33,249][134294] Updated weights for policy 0, policy_version 111214 (0.0025) [2025-01-04 05:30:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14199.4, 300 sec: 14662.3). Total num frames: 455540736. Throughput: 0: 3631.1. Samples: 103049330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:33,968][134211] Avg episode reward: [(0, '7.985')] [2025-01-04 05:30:36,446][134294] Updated weights for policy 0, policy_version 111224 (0.0024) [2025-01-04 05:30:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.2, 300 sec: 14648.4). Total num frames: 455602176. Throughput: 0: 3631.9. Samples: 103068566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:38,968][134211] Avg episode reward: [(0, '8.354')] [2025-01-04 05:30:39,679][134294] Updated weights for policy 0, policy_version 111234 (0.0023) [2025-01-04 05:30:41,677][134294] Updated weights for policy 0, policy_version 111244 (0.0015) [2025-01-04 05:30:43,756][134294] Updated weights for policy 0, policy_version 111254 (0.0013) [2025-01-04 05:30:43,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14677.4, 300 sec: 14801.2). Total num frames: 455700480. Throughput: 0: 3560.6. Samples: 103094282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:30:43,968][134211] Avg episode reward: [(0, '7.757')] [2025-01-04 05:30:45,784][134294] Updated weights for policy 0, policy_version 111264 (0.0013) [2025-01-04 05:30:47,598][134294] Updated weights for policy 0, policy_version 111274 (0.0014) [2025-01-04 05:30:48,968][134211] Fps is (10 sec: 19661.1, 60 sec: 15292.3, 300 sec: 14940.0). Total num frames: 455798784. Throughput: 0: 3623.5. Samples: 103109808. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:30:48,968][134211] Avg episode reward: [(0, '7.399')] [2025-01-04 05:30:50,284][134294] Updated weights for policy 0, policy_version 111284 (0.0020) [2025-01-04 05:30:53,687][134294] Updated weights for policy 0, policy_version 111294 (0.0030) [2025-01-04 05:30:53,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 455860224. Throughput: 0: 3762.0. Samples: 103133760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:30:53,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 05:30:57,271][134294] Updated weights for policy 0, policy_version 111304 (0.0028) [2025-01-04 05:30:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.2, 300 sec: 14773.4). Total num frames: 455921664. Throughput: 0: 3705.4. Samples: 103151168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:30:58,968][134211] Avg episode reward: [(0, '8.645')] [2025-01-04 05:31:00,403][134294] Updated weights for policy 0, policy_version 111314 (0.0028) [2025-01-04 05:31:03,495][134294] Updated weights for policy 0, policy_version 111324 (0.0025) [2025-01-04 05:31:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.2, 300 sec: 14690.1). Total num frames: 455987200. Throughput: 0: 3704.7. Samples: 103161342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:03,968][134211] Avg episode reward: [(0, '7.868')] [2025-01-04 05:31:06,447][134294] Updated weights for policy 0, policy_version 111334 (0.0022) [2025-01-04 05:31:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14690.1). Total num frames: 456052736. Throughput: 0: 3707.0. Samples: 103181394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:08,968][134211] Avg episode reward: [(0, '8.133')] [2025-01-04 05:31:09,638][134294] Updated weights for policy 0, policy_version 111344 (0.0024) [2025-01-04 05:31:12,750][134294] Updated weights for policy 0, policy_version 111354 (0.0025) [2025-01-04 05:31:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.5, 300 sec: 14703.9). Total num frames: 456122368. Throughput: 0: 3591.4. Samples: 103201070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:13,968][134211] Avg episode reward: [(0, '7.750')] [2025-01-04 05:31:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000111358_456122368.pth... [2025-01-04 05:31:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110493_452579328.pth [2025-01-04 05:31:15,802][134294] Updated weights for policy 0, policy_version 111364 (0.0028) [2025-01-04 05:31:18,192][134294] Updated weights for policy 0, policy_version 111374 (0.0017) [2025-01-04 05:31:18,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 456204288. Throughput: 0: 3595.4. Samples: 103211124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:18,968][134211] Avg episode reward: [(0, '8.510')] [2025-01-04 05:31:20,207][134294] Updated weights for policy 0, policy_version 111384 (0.0013) [2025-01-04 05:31:22,216][134294] Updated weights for policy 0, policy_version 111394 (0.0013) [2025-01-04 05:31:23,968][134211] Fps is (10 sec: 18022.8, 60 sec: 15086.9, 300 sec: 14898.3). Total num frames: 456302592. Throughput: 0: 3822.7. Samples: 103240586. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:23,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 05:31:24,348][134294] Updated weights for policy 0, policy_version 111404 (0.0018) [2025-01-04 05:31:27,742][134294] Updated weights for policy 0, policy_version 111414 (0.0029) [2025-01-04 05:31:28,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14813.8, 300 sec: 14884.5). Total num frames: 456364032. Throughput: 0: 3735.8. Samples: 103262394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:28,968][134211] Avg episode reward: [(0, '9.086')] [2025-01-04 05:31:31,005][134294] Updated weights for policy 0, policy_version 111424 (0.0025) [2025-01-04 05:31:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14813.9, 300 sec: 14745.6). Total num frames: 456429568. Throughput: 0: 3607.3. Samples: 103272138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:33,969][134211] Avg episode reward: [(0, '7.550')] [2025-01-04 05:31:34,183][134294] Updated weights for policy 0, policy_version 111434 (0.0027) [2025-01-04 05:31:37,299][134294] Updated weights for policy 0, policy_version 111444 (0.0023) [2025-01-04 05:31:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14606.7). Total num frames: 456495104. Throughput: 0: 3504.8. Samples: 103291476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:38,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 05:31:40,455][134294] Updated weights for policy 0, policy_version 111454 (0.0031) [2025-01-04 05:31:43,287][134294] Updated weights for policy 0, policy_version 111464 (0.0023) [2025-01-04 05:31:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.2, 300 sec: 14606.8). Total num frames: 456564736. Throughput: 0: 3570.8. Samples: 103311852. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:31:43,968][134211] Avg episode reward: [(0, '9.278')] [2025-01-04 05:31:46,387][134294] Updated weights for policy 0, policy_version 111474 (0.0026) [2025-01-04 05:31:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.4, 300 sec: 14634.5). Total num frames: 456634368. Throughput: 0: 3567.6. Samples: 103321882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:31:48,968][134211] Avg episode reward: [(0, '8.123')] [2025-01-04 05:31:49,117][134294] Updated weights for policy 0, policy_version 111484 (0.0017) [2025-01-04 05:31:52,120][134294] Updated weights for policy 0, policy_version 111494 (0.0020) [2025-01-04 05:31:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14131.3, 300 sec: 14662.4). Total num frames: 456708096. Throughput: 0: 3600.6. Samples: 103343420. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:31:53,968][134211] Avg episode reward: [(0, '8.435')] [2025-01-04 05:31:54,351][134294] Updated weights for policy 0, policy_version 111504 (0.0013) [2025-01-04 05:31:56,498][134294] Updated weights for policy 0, policy_version 111514 (0.0014) [2025-01-04 05:31:58,456][134294] Updated weights for policy 0, policy_version 111524 (0.0013) [2025-01-04 05:31:58,967][134211] Fps is (10 sec: 17613.1, 60 sec: 14813.9, 300 sec: 14787.3). Total num frames: 456810496. Throughput: 0: 3808.6. Samples: 103372456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:31:58,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 05:32:00,313][134294] Updated weights for policy 0, policy_version 111534 (0.0012) [2025-01-04 05:32:03,152][134294] Updated weights for policy 0, policy_version 111544 (0.0023) [2025-01-04 05:32:03,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15086.9, 300 sec: 14842.8). Total num frames: 456892416. Throughput: 0: 3929.0. Samples: 103387932. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:03,968][134211] Avg episode reward: [(0, '8.940')] [2025-01-04 05:32:06,399][134294] Updated weights for policy 0, policy_version 111554 (0.0029) [2025-01-04 05:32:08,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15018.7, 300 sec: 14829.1). Total num frames: 456953856. Throughput: 0: 3696.7. Samples: 103406940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:08,968][134211] Avg episode reward: [(0, '8.098')] [2025-01-04 05:32:09,711][134294] Updated weights for policy 0, policy_version 111564 (0.0030) [2025-01-04 05:32:12,784][134294] Updated weights for policy 0, policy_version 111574 (0.0026) [2025-01-04 05:32:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14950.4, 300 sec: 14787.2). Total num frames: 457019392. Throughput: 0: 3637.3. Samples: 103426072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:13,969][134211] Avg episode reward: [(0, '8.189')] [2025-01-04 05:32:15,837][134294] Updated weights for policy 0, policy_version 111584 (0.0024) [2025-01-04 05:32:18,893][134294] Updated weights for policy 0, policy_version 111594 (0.0025) [2025-01-04 05:32:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 457089024. Throughput: 0: 3646.7. Samples: 103436240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:18,969][134211] Avg episode reward: [(0, '9.070')] [2025-01-04 05:32:22,033][134294] Updated weights for policy 0, policy_version 111604 (0.0025) [2025-01-04 05:32:23,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14131.1, 300 sec: 14537.3). Total num frames: 457150464. Throughput: 0: 3660.6. Samples: 103456206. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:23,969][134211] Avg episode reward: [(0, '8.200')] [2025-01-04 05:32:25,369][134294] Updated weights for policy 0, policy_version 111614 (0.0024) [2025-01-04 05:32:28,759][134294] Updated weights for policy 0, policy_version 111624 (0.0027) [2025-01-04 05:32:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14131.2, 300 sec: 14509.6). Total num frames: 457211904. Throughput: 0: 3613.3. Samples: 103474452. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:28,968][134211] Avg episode reward: [(0, '7.964')] [2025-01-04 05:32:31,925][134294] Updated weights for policy 0, policy_version 111634 (0.0022) [2025-01-04 05:32:33,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14336.0, 300 sec: 14551.2). Total num frames: 457289728. Throughput: 0: 3601.4. Samples: 103483946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:33,968][134211] Avg episode reward: [(0, '8.253')] [2025-01-04 05:32:34,119][134294] Updated weights for policy 0, policy_version 111644 (0.0014) [2025-01-04 05:32:36,038][134294] Updated weights for policy 0, policy_version 111654 (0.0013) [2025-01-04 05:32:37,973][134294] Updated weights for policy 0, policy_version 111664 (0.0013) [2025-01-04 05:32:38,968][134211] Fps is (10 sec: 18432.4, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 457396224. Throughput: 0: 3768.0. Samples: 103512978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:38,968][134211] Avg episode reward: [(0, '8.805')] [2025-01-04 05:32:39,910][134294] Updated weights for policy 0, policy_version 111674 (0.0013) [2025-01-04 05:32:41,935][134294] Updated weights for policy 0, policy_version 111684 (0.0016) [2025-01-04 05:32:43,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15291.7, 300 sec: 14787.3). Total num frames: 457482240. Throughput: 0: 3763.8. Samples: 103541826. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:43,969][134211] Avg episode reward: [(0, '8.187')] [2025-01-04 05:32:45,209][134294] Updated weights for policy 0, policy_version 111694 (0.0031) [2025-01-04 05:32:48,545][134294] Updated weights for policy 0, policy_version 111704 (0.0029) [2025-01-04 05:32:48,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15086.9, 300 sec: 14787.3). Total num frames: 457539584. Throughput: 0: 3614.8. Samples: 103550600. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:32:48,969][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 05:32:51,862][134294] Updated weights for policy 0, policy_version 111714 (0.0027) [2025-01-04 05:32:53,975][134211] Fps is (10 sec: 12279.4, 60 sec: 14948.6, 300 sec: 14786.9). Total num frames: 457605120. Throughput: 0: 3610.6. Samples: 103569442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:32:53,976][134211] Avg episode reward: [(0, '8.971')] [2025-01-04 05:32:55,351][134294] Updated weights for policy 0, policy_version 111724 (0.0028) [2025-01-04 05:32:58,697][134294] Updated weights for policy 0, policy_version 111734 (0.0025) [2025-01-04 05:32:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14717.8). Total num frames: 457666560. Throughput: 0: 3583.2. Samples: 103587314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:32:58,971][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 05:33:01,684][134294] Updated weights for policy 0, policy_version 111744 (0.0025) [2025-01-04 05:33:03,968][134211] Fps is (10 sec: 12706.7, 60 sec: 13994.7, 300 sec: 14690.1). Total num frames: 457732096. Throughput: 0: 3577.1. Samples: 103597210. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:03,968][134211] Avg episode reward: [(0, '8.593')] [2025-01-04 05:33:04,860][134294] Updated weights for policy 0, policy_version 111754 (0.0023) [2025-01-04 05:33:07,550][134294] Updated weights for policy 0, policy_version 111764 (0.0021) [2025-01-04 05:33:08,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14336.0, 300 sec: 14703.9). Total num frames: 457814016. Throughput: 0: 3581.9. Samples: 103617392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:08,968][134211] Avg episode reward: [(0, '8.402')] [2025-01-04 05:33:09,643][134294] Updated weights for policy 0, policy_version 111774 (0.0015) [2025-01-04 05:33:12,573][134294] Updated weights for policy 0, policy_version 111784 (0.0027) [2025-01-04 05:33:13,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14404.3, 300 sec: 14565.1). Total num frames: 457883648. Throughput: 0: 3722.5. Samples: 103641966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:13,968][134211] Avg episode reward: [(0, '7.946')] [2025-01-04 05:33:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000111788_457883648.pth... [2025-01-04 05:33:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000110933_454381568.pth [2025-01-04 05:33:15,671][134294] Updated weights for policy 0, policy_version 111794 (0.0028) [2025-01-04 05:33:18,259][134294] Updated weights for policy 0, policy_version 111804 (0.0018) [2025-01-04 05:33:18,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14540.9, 300 sec: 14592.9). Total num frames: 457961472. Throughput: 0: 3732.2. Samples: 103651894. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:18,968][134211] Avg episode reward: [(0, '8.149')] [2025-01-04 05:33:20,605][134294] Updated weights for policy 0, policy_version 111814 (0.0019) [2025-01-04 05:33:23,937][134294] Updated weights for policy 0, policy_version 111824 (0.0030) [2025-01-04 05:33:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14677.4, 300 sec: 14634.5). Total num frames: 458031104. Throughput: 0: 3613.3. Samples: 103675576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:23,968][134211] Avg episode reward: [(0, '7.858')] [2025-01-04 05:33:27,013][134294] Updated weights for policy 0, policy_version 111834 (0.0021) [2025-01-04 05:33:28,967][134211] Fps is (10 sec: 14745.7, 60 sec: 14950.5, 300 sec: 14676.2). Total num frames: 458108928. Throughput: 0: 3455.6. Samples: 103697326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:28,968][134211] Avg episode reward: [(0, '8.065')] [2025-01-04 05:33:28,973][134294] Updated weights for policy 0, policy_version 111844 (0.0012) [2025-01-04 05:33:30,884][134294] Updated weights for policy 0, policy_version 111854 (0.0014) [2025-01-04 05:33:32,792][134294] Updated weights for policy 0, policy_version 111864 (0.0014) [2025-01-04 05:33:33,968][134211] Fps is (10 sec: 18432.4, 60 sec: 15428.3, 300 sec: 14815.0). Total num frames: 458215424. Throughput: 0: 3616.7. Samples: 103713350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:33,968][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 05:33:35,101][134294] Updated weights for policy 0, policy_version 111874 (0.0019) [2025-01-04 05:33:38,523][134294] Updated weights for policy 0, policy_version 111884 (0.0028) [2025-01-04 05:33:38,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 458280960. Throughput: 0: 3754.4. Samples: 103738362. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:38,968][134211] Avg episode reward: [(0, '7.712')] [2025-01-04 05:33:41,809][134294] Updated weights for policy 0, policy_version 111894 (0.0027) [2025-01-04 05:33:43,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14336.0, 300 sec: 14717.8). Total num frames: 458342400. Throughput: 0: 3768.1. Samples: 103756878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:43,969][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 05:33:45,083][134294] Updated weights for policy 0, policy_version 111904 (0.0028) [2025-01-04 05:33:48,541][134294] Updated weights for policy 0, policy_version 111914 (0.0027) [2025-01-04 05:33:48,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14336.0, 300 sec: 14690.1). Total num frames: 458399744. Throughput: 0: 3758.8. Samples: 103766354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:33:48,968][134211] Avg episode reward: [(0, '8.019')] [2025-01-04 05:33:52,066][134294] Updated weights for policy 0, policy_version 111924 (0.0029) [2025-01-04 05:33:53,968][134211] Fps is (10 sec: 11877.8, 60 sec: 14269.3, 300 sec: 14592.8). Total num frames: 458461184. Throughput: 0: 3693.3. Samples: 103783592. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:33:53,969][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 05:33:54,983][134294] Updated weights for policy 0, policy_version 111934 (0.0019) [2025-01-04 05:33:57,168][134294] Updated weights for policy 0, policy_version 111944 (0.0013) [2025-01-04 05:33:58,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14882.2, 300 sec: 14690.1). Total num frames: 458559488. Throughput: 0: 3715.4. Samples: 103809160. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:33:58,968][134211] Avg episode reward: [(0, '8.100')] [2025-01-04 05:33:59,110][134294] Updated weights for policy 0, policy_version 111954 (0.0015) [2025-01-04 05:34:00,999][134294] Updated weights for policy 0, policy_version 111964 (0.0014) [2025-01-04 05:34:02,951][134294] Updated weights for policy 0, policy_version 111974 (0.0014) [2025-01-04 05:34:03,968][134211] Fps is (10 sec: 20071.7, 60 sec: 15496.5, 300 sec: 14815.0). Total num frames: 458661888. Throughput: 0: 3855.9. Samples: 103825410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:03,969][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 05:34:05,818][134294] Updated weights for policy 0, policy_version 111984 (0.0025) [2025-01-04 05:34:08,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15155.2, 300 sec: 14731.7). Total num frames: 458723328. Throughput: 0: 3850.4. Samples: 103848844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:08,968][134211] Avg episode reward: [(0, '8.318')] [2025-01-04 05:34:09,128][134294] Updated weights for policy 0, policy_version 111994 (0.0028) [2025-01-04 05:34:12,147][134294] Updated weights for policy 0, policy_version 112004 (0.0025) [2025-01-04 05:34:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15086.9, 300 sec: 14592.9). Total num frames: 458788864. Throughput: 0: 3799.2. Samples: 103868290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:13,969][134211] Avg episode reward: [(0, '7.810')] [2025-01-04 05:34:15,478][134294] Updated weights for policy 0, policy_version 112014 (0.0027) [2025-01-04 05:34:18,494][134294] Updated weights for policy 0, policy_version 112024 (0.0027) [2025-01-04 05:34:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.1, 300 sec: 14551.2). Total num frames: 458854400. Throughput: 0: 3656.0. Samples: 103877870. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:18,968][134211] Avg episode reward: [(0, '7.515')] [2025-01-04 05:34:21,521][134294] Updated weights for policy 0, policy_version 112034 (0.0027) [2025-01-04 05:34:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14745.6, 300 sec: 14551.2). Total num frames: 458915840. Throughput: 0: 3541.6. Samples: 103897732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:23,968][134211] Avg episode reward: [(0, '7.866')] [2025-01-04 05:34:25,076][134294] Updated weights for policy 0, policy_version 112044 (0.0029) [2025-01-04 05:34:28,159][134294] Updated weights for policy 0, policy_version 112054 (0.0027) [2025-01-04 05:34:28,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14540.6, 300 sec: 14551.2). Total num frames: 458981376. Throughput: 0: 3548.5. Samples: 103916562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:28,969][134211] Avg episode reward: [(0, '8.429')] [2025-01-04 05:34:31,220][134294] Updated weights for policy 0, policy_version 112064 (0.0026) [2025-01-04 05:34:33,968][134211] Fps is (10 sec: 13106.2, 60 sec: 13857.9, 300 sec: 14551.2). Total num frames: 459046912. Throughput: 0: 3558.7. Samples: 103926498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:33,969][134211] Avg episode reward: [(0, '7.462')] [2025-01-04 05:34:34,381][134294] Updated weights for policy 0, policy_version 112074 (0.0027) [2025-01-04 05:34:36,369][134294] Updated weights for policy 0, policy_version 112084 (0.0012) [2025-01-04 05:34:38,318][134294] Updated weights for policy 0, policy_version 112094 (0.0013) [2025-01-04 05:34:38,967][134211] Fps is (10 sec: 16794.8, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 459149312. Throughput: 0: 3731.0. Samples: 103951484. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:38,968][134211] Avg episode reward: [(0, '7.793')] [2025-01-04 05:34:40,176][134294] Updated weights for policy 0, policy_version 112104 (0.0013) [2025-01-04 05:34:42,105][134294] Updated weights for policy 0, policy_version 112114 (0.0012) [2025-01-04 05:34:43,968][134211] Fps is (10 sec: 20071.9, 60 sec: 15087.0, 300 sec: 14801.2). Total num frames: 459247616. Throughput: 0: 3862.2. Samples: 103982960. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:43,968][134211] Avg episode reward: [(0, '8.331')] [2025-01-04 05:34:44,645][134294] Updated weights for policy 0, policy_version 112124 (0.0020) [2025-01-04 05:34:47,975][134294] Updated weights for policy 0, policy_version 112134 (0.0029) [2025-01-04 05:34:48,969][134211] Fps is (10 sec: 15972.1, 60 sec: 15154.9, 300 sec: 14745.6). Total num frames: 459309056. Throughput: 0: 3726.6. Samples: 103993110. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:34:48,970][134211] Avg episode reward: [(0, '7.957')] [2025-01-04 05:34:51,367][134294] Updated weights for policy 0, policy_version 112144 (0.0027) [2025-01-04 05:34:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 15155.3, 300 sec: 14620.6). Total num frames: 459370496. Throughput: 0: 3596.3. Samples: 104010676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:34:53,969][134211] Avg episode reward: [(0, '8.033')] [2025-01-04 05:34:55,008][134294] Updated weights for policy 0, policy_version 112154 (0.0028) [2025-01-04 05:34:58,365][134294] Updated weights for policy 0, policy_version 112164 (0.0028) [2025-01-04 05:34:58,968][134211] Fps is (10 sec: 11879.8, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 459427840. Throughput: 0: 3562.6. Samples: 104028608. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:34:58,968][134211] Avg episode reward: [(0, '8.272')] [2025-01-04 05:35:01,411][134294] Updated weights for policy 0, policy_version 112174 (0.0026) [2025-01-04 05:35:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 14620.6). Total num frames: 459497472. Throughput: 0: 3572.9. Samples: 104038650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:03,968][134211] Avg episode reward: [(0, '7.139')] [2025-01-04 05:35:04,533][134294] Updated weights for policy 0, policy_version 112184 (0.0025) [2025-01-04 05:35:07,110][134294] Updated weights for policy 0, policy_version 112194 (0.0020) [2025-01-04 05:35:08,967][134211] Fps is (10 sec: 15565.3, 60 sec: 14336.1, 300 sec: 14676.2). Total num frames: 459583488. Throughput: 0: 3609.5. Samples: 104060158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:08,968][134211] Avg episode reward: [(0, '8.143')] [2025-01-04 05:35:09,049][134294] Updated weights for policy 0, policy_version 112204 (0.0015) [2025-01-04 05:35:10,902][134294] Updated weights for policy 0, policy_version 112214 (0.0012) [2025-01-04 05:35:12,800][134294] Updated weights for policy 0, policy_version 112224 (0.0014) [2025-01-04 05:35:13,968][134211] Fps is (10 sec: 19251.6, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 459689984. Throughput: 0: 3914.8. Samples: 104092726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:13,968][134211] Avg episode reward: [(0, '7.654')] [2025-01-04 05:35:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000112230_459694080.pth... 
[2025-01-04 05:35:14,036][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000111358_456122368.pth [2025-01-04 05:35:14,973][134294] Updated weights for policy 0, policy_version 112234 (0.0019) [2025-01-04 05:35:18,160][134294] Updated weights for policy 0, policy_version 112244 (0.0028) [2025-01-04 05:35:18,969][134211] Fps is (10 sec: 17609.7, 60 sec: 15086.5, 300 sec: 14787.2). Total num frames: 459759616. Throughput: 0: 3963.4. Samples: 104104856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:18,970][134211] Avg episode reward: [(0, '8.188')] [2025-01-04 05:35:21,556][134294] Updated weights for policy 0, policy_version 112254 (0.0027) [2025-01-04 05:35:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15018.6, 300 sec: 14717.8). Total num frames: 459816960. Throughput: 0: 3814.0. Samples: 104123116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:23,969][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 05:35:25,104][134294] Updated weights for policy 0, policy_version 112264 (0.0028) [2025-01-04 05:35:28,393][134294] Updated weights for policy 0, policy_version 112274 (0.0025) [2025-01-04 05:35:28,968][134211] Fps is (10 sec: 11880.1, 60 sec: 14950.5, 300 sec: 14703.9). Total num frames: 459878400. Throughput: 0: 3512.8. Samples: 104141036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:28,968][134211] Avg episode reward: [(0, '8.588')] [2025-01-04 05:35:31,419][134294] Updated weights for policy 0, policy_version 112284 (0.0027) [2025-01-04 05:35:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.8, 300 sec: 14731.7). Total num frames: 459948032. Throughput: 0: 3511.6. Samples: 104151128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:33,968][134211] Avg episode reward: [(0, '8.009')] [2025-01-04 05:35:34,614][134294] Updated weights for policy 0, policy_version 112294 (0.0025) [2025-01-04 05:35:37,585][134294] Updated weights for policy 0, policy_version 112304 (0.0025) [2025-01-04 05:35:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.2, 300 sec: 14620.6). Total num frames: 460013568. Throughput: 0: 3568.1. Samples: 104171240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:38,968][134211] Avg episode reward: [(0, '8.367')] [2025-01-04 05:35:40,546][134294] Updated weights for policy 0, policy_version 112314 (0.0024) [2025-01-04 05:35:43,032][134294] Updated weights for policy 0, policy_version 112324 (0.0021) [2025-01-04 05:35:43,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14131.2, 300 sec: 14565.1). Total num frames: 460095488. Throughput: 0: 3671.2. Samples: 104193812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:43,968][134211] Avg episode reward: [(0, '8.268')] [2025-01-04 05:35:44,959][134294] Updated weights for policy 0, policy_version 112334 (0.0013) [2025-01-04 05:35:46,789][134294] Updated weights for policy 0, policy_version 112344 (0.0015) [2025-01-04 05:35:48,909][134294] Updated weights for policy 0, policy_version 112354 (0.0014) [2025-01-04 05:35:48,968][134211] Fps is (10 sec: 18839.9, 60 sec: 14882.3, 300 sec: 14717.8). Total num frames: 460201984. Throughput: 0: 3809.4. Samples: 104210074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:35:48,969][134211] Avg episode reward: [(0, '8.887')] [2025-01-04 05:35:51,015][134294] Updated weights for policy 0, policy_version 112364 (0.0013) [2025-01-04 05:35:53,968][134211] Fps is (10 sec: 18021.0, 60 sec: 15086.8, 300 sec: 14759.4). Total num frames: 460275712. Throughput: 0: 3948.2. Samples: 104237832. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:35:53,969][134211] Avg episode reward: [(0, '7.986')] [2025-01-04 05:35:54,525][134294] Updated weights for policy 0, policy_version 112374 (0.0029) [2025-01-04 05:35:58,053][134294] Updated weights for policy 0, policy_version 112384 (0.0027) [2025-01-04 05:35:58,968][134211] Fps is (10 sec: 13108.2, 60 sec: 15086.9, 300 sec: 14731.7). Total num frames: 460333056. Throughput: 0: 3599.8. Samples: 104254716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:35:58,968][134211] Avg episode reward: [(0, '8.190')] [2025-01-04 05:36:01,284][134294] Updated weights for policy 0, policy_version 112394 (0.0028) [2025-01-04 05:36:03,968][134211] Fps is (10 sec: 11879.0, 60 sec: 14950.4, 300 sec: 14717.8). Total num frames: 460394496. Throughput: 0: 3536.6. Samples: 104263996. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:03,969][134211] Avg episode reward: [(0, '8.025')] [2025-01-04 05:36:04,873][134294] Updated weights for policy 0, policy_version 112404 (0.0030) [2025-01-04 05:36:08,065][134294] Updated weights for policy 0, policy_version 112414 (0.0025) [2025-01-04 05:36:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 460455936. Throughput: 0: 3537.9. Samples: 104282320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:08,968][134211] Avg episode reward: [(0, '8.767')] [2025-01-04 05:36:10,556][134294] Updated weights for policy 0, policy_version 112424 (0.0021) [2025-01-04 05:36:12,740][134294] Updated weights for policy 0, policy_version 112434 (0.0016) [2025-01-04 05:36:13,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14199.4, 300 sec: 14703.9). Total num frames: 460541952. Throughput: 0: 3689.4. Samples: 104307058. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:13,968][134211] Avg episode reward: [(0, '9.737')] [2025-01-04 05:36:15,795][134294] Updated weights for policy 0, policy_version 112444 (0.0023) [2025-01-04 05:36:18,735][134294] Updated weights for policy 0, policy_version 112454 (0.0025) [2025-01-04 05:36:18,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14199.8, 300 sec: 14606.8). Total num frames: 460611584. Throughput: 0: 3690.5. Samples: 104317200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:18,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 05:36:21,779][134294] Updated weights for policy 0, policy_version 112464 (0.0026) [2025-01-04 05:36:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14620.6). Total num frames: 460677120. Throughput: 0: 3697.1. Samples: 104337610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:23,968][134211] Avg episode reward: [(0, '8.817')] [2025-01-04 05:36:25,064][134294] Updated weights for policy 0, policy_version 112474 (0.0024) [2025-01-04 05:36:27,167][134294] Updated weights for policy 0, policy_version 112484 (0.0013) [2025-01-04 05:36:28,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14882.2, 300 sec: 14717.8). Total num frames: 460771328. Throughput: 0: 3743.8. Samples: 104362284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:28,968][134211] Avg episode reward: [(0, '8.531')] [2025-01-04 05:36:29,049][134294] Updated weights for policy 0, policy_version 112494 (0.0013) [2025-01-04 05:36:30,992][134294] Updated weights for policy 0, policy_version 112504 (0.0014) [2025-01-04 05:36:32,867][134294] Updated weights for policy 0, policy_version 112514 (0.0012) [2025-01-04 05:36:33,967][134211] Fps is (10 sec: 20070.7, 60 sec: 15496.6, 300 sec: 14856.7). Total num frames: 460877824. Throughput: 0: 3743.4. Samples: 104378524. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:33,968][134211] Avg episode reward: [(0, '8.268')] [2025-01-04 05:36:34,724][134294] Updated weights for policy 0, policy_version 112524 (0.0014) [2025-01-04 05:36:36,863][134294] Updated weights for policy 0, policy_version 112534 (0.0018) [2025-01-04 05:36:38,969][134211] Fps is (10 sec: 19249.2, 60 sec: 15837.6, 300 sec: 14912.2). Total num frames: 460963840. Throughput: 0: 3805.4. Samples: 104409074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:38,969][134211] Avg episode reward: [(0, '8.562')] [2025-01-04 05:36:40,205][134294] Updated weights for policy 0, policy_version 112544 (0.0030) [2025-01-04 05:36:43,452][134294] Updated weights for policy 0, policy_version 112554 (0.0031) [2025-01-04 05:36:43,968][134211] Fps is (10 sec: 14745.1, 60 sec: 15496.5, 300 sec: 14884.4). Total num frames: 461025280. Throughput: 0: 3840.2. Samples: 104427524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:43,968][134211] Avg episode reward: [(0, '8.161')] [2025-01-04 05:36:46,618][134294] Updated weights for policy 0, policy_version 112564 (0.0025) [2025-01-04 05:36:48,968][134211] Fps is (10 sec: 12698.8, 60 sec: 14814.1, 300 sec: 14856.7). Total num frames: 461090816. Throughput: 0: 3849.3. Samples: 104437216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:48,968][134211] Avg episode reward: [(0, '7.601')] [2025-01-04 05:36:49,751][134294] Updated weights for policy 0, policy_version 112574 (0.0023) [2025-01-04 05:36:53,008][134294] Updated weights for policy 0, policy_version 112584 (0.0028) [2025-01-04 05:36:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14609.2, 300 sec: 14717.8). Total num frames: 461152256. Throughput: 0: 3871.9. Samples: 104456556. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:53,968][134211] Avg episode reward: [(0, '8.789')] [2025-01-04 05:36:56,381][134294] Updated weights for policy 0, policy_version 112594 (0.0028) [2025-01-04 05:36:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.4, 300 sec: 14648.4). Total num frames: 461213696. Throughput: 0: 3730.9. Samples: 104474950. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:36:58,968][134211] Avg episode reward: [(0, '8.011')] [2025-01-04 05:36:59,646][134294] Updated weights for policy 0, policy_version 112604 (0.0028) [2025-01-04 05:37:02,632][134294] Updated weights for policy 0, policy_version 112614 (0.0025) [2025-01-04 05:37:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 461283328. Throughput: 0: 3725.3. Samples: 104484840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:03,968][134211] Avg episode reward: [(0, '8.972')] [2025-01-04 05:37:05,657][134294] Updated weights for policy 0, policy_version 112624 (0.0026) [2025-01-04 05:37:08,671][134294] Updated weights for policy 0, policy_version 112634 (0.0026) [2025-01-04 05:37:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 14676.2). Total num frames: 461348864. Throughput: 0: 3730.9. Samples: 104505502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:08,968][134211] Avg episode reward: [(0, '7.545')] [2025-01-04 05:37:11,713][134294] Updated weights for policy 0, policy_version 112644 (0.0027) [2025-01-04 05:37:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.1, 300 sec: 14676.2). Total num frames: 461418496. Throughput: 0: 3626.7. Samples: 104525488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:13,968][134211] Avg episode reward: [(0, '8.620')] [2025-01-04 05:37:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000112651_461418496.pth... 
[2025-01-04 05:37:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000111788_457883648.pth [2025-01-04 05:37:14,859][134294] Updated weights for policy 0, policy_version 112654 (0.0024) [2025-01-04 05:37:17,913][134294] Updated weights for policy 0, policy_version 112664 (0.0026) [2025-01-04 05:37:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 461484032. Throughput: 0: 3481.8. Samples: 104535206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:18,968][134211] Avg episode reward: [(0, '8.379')] [2025-01-04 05:37:20,273][134294] Updated weights for policy 0, policy_version 112674 (0.0016) [2025-01-04 05:37:22,307][134294] Updated weights for policy 0, policy_version 112684 (0.0014) [2025-01-04 05:37:23,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15155.2, 300 sec: 14828.9). Total num frames: 461586432. Throughput: 0: 3378.5. Samples: 104561104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:23,968][134211] Avg episode reward: [(0, '8.551')] [2025-01-04 05:37:24,347][134294] Updated weights for policy 0, policy_version 112694 (0.0012) [2025-01-04 05:37:26,657][134294] Updated weights for policy 0, policy_version 112704 (0.0016) [2025-01-04 05:37:28,968][134211] Fps is (10 sec: 18022.2, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 461664256. Throughput: 0: 3550.4. Samples: 104587290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:28,968][134211] Avg episode reward: [(0, '8.832')] [2025-01-04 05:37:29,745][134294] Updated weights for policy 0, policy_version 112714 (0.0026) [2025-01-04 05:37:32,993][134294] Updated weights for policy 0, policy_version 112724 (0.0025) [2025-01-04 05:37:33,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14199.4, 300 sec: 14690.0). Total num frames: 461729792. Throughput: 0: 3545.1. Samples: 104596748. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:33,968][134211] Avg episode reward: [(0, '8.283')] [2025-01-04 05:37:36,068][134294] Updated weights for policy 0, policy_version 112734 (0.0025) [2025-01-04 05:37:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.3, 300 sec: 14620.6). Total num frames: 461795328. Throughput: 0: 3546.7. Samples: 104616158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:38,968][134211] Avg episode reward: [(0, '8.309')] [2025-01-04 05:37:39,369][134294] Updated weights for policy 0, policy_version 112744 (0.0028) [2025-01-04 05:37:42,489][134294] Updated weights for policy 0, policy_version 112754 (0.0026) [2025-01-04 05:37:43,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13926.5, 300 sec: 14648.4). Total num frames: 461860864. Throughput: 0: 3569.7. Samples: 104635588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:43,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 05:37:44,747][134294] Updated weights for policy 0, policy_version 112764 (0.0016) [2025-01-04 05:37:46,646][134294] Updated weights for policy 0, policy_version 112774 (0.0013) [2025-01-04 05:37:48,675][134294] Updated weights for policy 0, policy_version 112784 (0.0013) [2025-01-04 05:37:48,969][134211] Fps is (10 sec: 17201.7, 60 sec: 14608.8, 300 sec: 14787.6). Total num frames: 461967360. Throughput: 0: 3702.8. Samples: 104651472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:48,969][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 05:37:50,886][134294] Updated weights for policy 0, policy_version 112794 (0.0014) [2025-01-04 05:37:53,971][134211] Fps is (10 sec: 17606.9, 60 sec: 14744.8, 300 sec: 14814.9). Total num frames: 462036992. Throughput: 0: 3838.1. Samples: 104678228. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:53,972][134211] Avg episode reward: [(0, '8.387')] [2025-01-04 05:37:54,645][134294] Updated weights for policy 0, policy_version 112804 (0.0028) [2025-01-04 05:37:58,208][134294] Updated weights for policy 0, policy_version 112814 (0.0030) [2025-01-04 05:37:58,968][134211] Fps is (10 sec: 12698.7, 60 sec: 14677.3, 300 sec: 14787.2). Total num frames: 462094336. Throughput: 0: 3760.7. Samples: 104694718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:37:58,968][134211] Avg episode reward: [(0, '9.098')] [2025-01-04 05:38:01,235][134294] Updated weights for policy 0, policy_version 112824 (0.0029) [2025-01-04 05:38:03,968][134211] Fps is (10 sec: 12292.0, 60 sec: 14609.0, 300 sec: 14731.7). Total num frames: 462159872. Throughput: 0: 3767.1. Samples: 104704726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:03,968][134211] Avg episode reward: [(0, '8.039')] [2025-01-04 05:38:04,554][134294] Updated weights for policy 0, policy_version 112834 (0.0027) [2025-01-04 05:38:07,619][134294] Updated weights for policy 0, policy_version 112844 (0.0025) [2025-01-04 05:38:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14609.1, 300 sec: 14717.8). Total num frames: 462225408. Throughput: 0: 3623.8. Samples: 104724174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:08,968][134211] Avg episode reward: [(0, '7.946')] [2025-01-04 05:38:10,573][134294] Updated weights for policy 0, policy_version 112854 (0.0027) [2025-01-04 05:38:13,570][134294] Updated weights for policy 0, policy_version 112864 (0.0024) [2025-01-04 05:38:13,969][134211] Fps is (10 sec: 13514.6, 60 sec: 14608.7, 300 sec: 14690.0). Total num frames: 462295040. Throughput: 0: 3498.7. Samples: 104744736. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:13,970][134211] Avg episode reward: [(0, '8.976')] [2025-01-04 05:38:16,011][134294] Updated weights for policy 0, policy_version 112874 (0.0020) [2025-01-04 05:38:18,235][134294] Updated weights for policy 0, policy_version 112884 (0.0017) [2025-01-04 05:38:18,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.4, 300 sec: 14745.6). Total num frames: 462381056. Throughput: 0: 3562.5. Samples: 104757058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:18,968][134211] Avg episode reward: [(0, '8.716')] [2025-01-04 05:38:21,310][134294] Updated weights for policy 0, policy_version 112894 (0.0028) [2025-01-04 05:38:23,968][134211] Fps is (10 sec: 14747.9, 60 sec: 14267.7, 300 sec: 14690.0). Total num frames: 462442496. Throughput: 0: 3626.8. Samples: 104779366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:23,968][134211] Avg episode reward: [(0, '8.415')] [2025-01-04 05:38:24,569][134294] Updated weights for policy 0, policy_version 112904 (0.0024) [2025-01-04 05:38:26,696][134294] Updated weights for policy 0, policy_version 112914 (0.0013) [2025-01-04 05:38:28,909][134294] Updated weights for policy 0, policy_version 112924 (0.0016) [2025-01-04 05:38:28,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 462536704. Throughput: 0: 3748.0. Samples: 104804250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:28,968][134211] Avg episode reward: [(0, '8.208')] [2025-01-04 05:38:31,990][134294] Updated weights for policy 0, policy_version 112934 (0.0025) [2025-01-04 05:38:33,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14540.9, 300 sec: 14648.4). Total num frames: 462602240. Throughput: 0: 3623.3. Samples: 104814516. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:33,968][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 05:38:35,157][134294] Updated weights for policy 0, policy_version 112944 (0.0023) [2025-01-04 05:38:38,168][134294] Updated weights for policy 0, policy_version 112954 (0.0025) [2025-01-04 05:38:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 462667776. Throughput: 0: 3483.1. Samples: 104834954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:38,968][134211] Avg episode reward: [(0, '8.501')] [2025-01-04 05:38:41,422][134294] Updated weights for policy 0, policy_version 112964 (0.0029) [2025-01-04 05:38:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14704.0). Total num frames: 462737408. Throughput: 0: 3554.9. Samples: 104854686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:43,968][134211] Avg episode reward: [(0, '8.800')] [2025-01-04 05:38:44,034][134294] Updated weights for policy 0, policy_version 112974 (0.0016) [2025-01-04 05:38:46,110][134294] Updated weights for policy 0, policy_version 112984 (0.0013) [2025-01-04 05:38:48,127][134294] Updated weights for policy 0, policy_version 112994 (0.0011) [2025-01-04 05:38:48,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14472.8, 300 sec: 14829.0). Total num frames: 462835712. Throughput: 0: 3654.7. Samples: 104869188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:48,968][134211] Avg episode reward: [(0, '8.051')] [2025-01-04 05:38:50,176][134294] Updated weights for policy 0, policy_version 113004 (0.0013) [2025-01-04 05:38:52,936][134294] Updated weights for policy 0, policy_version 113014 (0.0018) [2025-01-04 05:38:53,968][134211] Fps is (10 sec: 17612.2, 60 sec: 14609.8, 300 sec: 14759.5). Total num frames: 462913536. Throughput: 0: 3851.4. Samples: 104897490. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:38:53,969][134211] Avg episode reward: [(0, '7.820')] [2025-01-04 05:38:56,755][134294] Updated weights for policy 0, policy_version 113024 (0.0029) [2025-01-04 05:38:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14609.1, 300 sec: 14606.8). Total num frames: 462970880. Throughput: 0: 3751.0. Samples: 104913524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:38:58,968][134211] Avg episode reward: [(0, '8.247')] [2025-01-04 05:39:00,405][134294] Updated weights for policy 0, policy_version 113034 (0.0026) [2025-01-04 05:39:03,857][134294] Updated weights for policy 0, policy_version 113044 (0.0027) [2025-01-04 05:39:03,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 463028224. Throughput: 0: 3669.5. Samples: 104922188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:03,969][134211] Avg episode reward: [(0, '8.208')] [2025-01-04 05:39:06,279][134294] Updated weights for policy 0, policy_version 113054 (0.0016) [2025-01-04 05:39:08,799][134294] Updated weights for policy 0, policy_version 113064 (0.0021) [2025-01-04 05:39:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 463110144. Throughput: 0: 3681.6. Samples: 104945038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:08,968][134211] Avg episode reward: [(0, '8.284')] [2025-01-04 05:39:11,855][134294] Updated weights for policy 0, policy_version 113074 (0.0024) [2025-01-04 05:39:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14677.6, 300 sec: 14648.4). Total num frames: 463175680. Throughput: 0: 3576.1. Samples: 104965176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:13,969][134211] Avg episode reward: [(0, '8.680')] [2025-01-04 05:39:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113080_463175680.pth... 
[2025-01-04 05:39:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000112230_459694080.pth [2025-01-04 05:39:15,039][134294] Updated weights for policy 0, policy_version 113084 (0.0025) [2025-01-04 05:39:18,160][134294] Updated weights for policy 0, policy_version 113094 (0.0027) [2025-01-04 05:39:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 463241216. Throughput: 0: 3563.1. Samples: 104974854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:18,968][134211] Avg episode reward: [(0, '7.162')] [2025-01-04 05:39:21,301][134294] Updated weights for policy 0, policy_version 113104 (0.0024) [2025-01-04 05:39:23,827][134294] Updated weights for policy 0, policy_version 113114 (0.0017) [2025-01-04 05:39:23,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 463314944. Throughput: 0: 3540.1. Samples: 104994258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:23,968][134211] Avg episode reward: [(0, '8.519')] [2025-01-04 05:39:25,952][134294] Updated weights for policy 0, policy_version 113124 (0.0014) [2025-01-04 05:39:27,971][134294] Updated weights for policy 0, policy_version 113134 (0.0013) [2025-01-04 05:39:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14609.0, 300 sec: 14801.2). Total num frames: 463413248. Throughput: 0: 3751.3. Samples: 105023494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:28,968][134211] Avg episode reward: [(0, '8.247')] [2025-01-04 05:39:30,771][134294] Updated weights for policy 0, policy_version 113144 (0.0024) [2025-01-04 05:39:33,836][134294] Updated weights for policy 0, policy_version 113154 (0.0030) [2025-01-04 05:39:33,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14609.0, 300 sec: 14676.2). Total num frames: 463478784. Throughput: 0: 3663.4. Samples: 105034040. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:33,969][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 05:39:36,852][134294] Updated weights for policy 0, policy_version 113164 (0.0026) [2025-01-04 05:39:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14565.1). Total num frames: 463544320. Throughput: 0: 3481.6. Samples: 105054160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:38,968][134211] Avg episode reward: [(0, '9.401')] [2025-01-04 05:39:40,083][134294] Updated weights for policy 0, policy_version 113174 (0.0025) [2025-01-04 05:39:43,078][134294] Updated weights for policy 0, policy_version 113184 (0.0026) [2025-01-04 05:39:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 14579.0). Total num frames: 463609856. Throughput: 0: 3562.9. Samples: 105073854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:43,970][134211] Avg episode reward: [(0, '7.858')] [2025-01-04 05:39:46,141][134294] Updated weights for policy 0, policy_version 113194 (0.0026) [2025-01-04 05:39:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13994.6, 300 sec: 14592.9). Total num frames: 463675392. Throughput: 0: 3593.5. Samples: 105083896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:48,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 05:39:49,456][134294] Updated weights for policy 0, policy_version 113204 (0.0026) [2025-01-04 05:39:52,699][134294] Updated weights for policy 0, policy_version 113214 (0.0022) [2025-01-04 05:39:52,712][134264] Signal inference workers to stop experience collection... (50 times) [2025-01-04 05:39:52,713][134264] Signal inference workers to resume experience collection... 
(50 times) [2025-01-04 05:39:52,719][134294] InferenceWorker_p0-w0: stopping experience collection (50 times) [2025-01-04 05:39:52,726][134294] InferenceWorker_p0-w0: resuming experience collection (50 times) [2025-01-04 05:39:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.5, 300 sec: 14648.4). Total num frames: 463749120. Throughput: 0: 3494.6. Samples: 105102294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 05:39:53,968][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 05:39:54,797][134294] Updated weights for policy 0, policy_version 113224 (0.0013) [2025-01-04 05:39:56,928][134294] Updated weights for policy 0, policy_version 113234 (0.0013) [2025-01-04 05:39:58,958][134294] Updated weights for policy 0, policy_version 113244 (0.0015) [2025-01-04 05:39:58,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14609.0, 300 sec: 14745.6). Total num frames: 463847424. Throughput: 0: 3697.3. Samples: 105131556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:39:58,968][134211] Avg episode reward: [(0, '8.747')] [2025-01-04 05:40:02,232][134294] Updated weights for policy 0, policy_version 113254 (0.0027) [2025-01-04 05:40:03,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.4, 300 sec: 14662.3). Total num frames: 463908864. Throughput: 0: 3716.1. Samples: 105142076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:03,968][134211] Avg episode reward: [(0, '8.151')] [2025-01-04 05:40:05,344][134294] Updated weights for policy 0, policy_version 113264 (0.0026) [2025-01-04 05:40:08,382][134294] Updated weights for policy 0, policy_version 113274 (0.0027) [2025-01-04 05:40:08,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14404.1, 300 sec: 14523.4). Total num frames: 463974400. Throughput: 0: 3724.8. Samples: 105161878. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:08,969][134211] Avg episode reward: [(0, '7.616')] [2025-01-04 05:40:11,477][134294] Updated weights for policy 0, policy_version 113284 (0.0024) [2025-01-04 05:40:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.6, 300 sec: 14523.5). Total num frames: 464044032. Throughput: 0: 3519.0. Samples: 105181850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:13,969][134211] Avg episode reward: [(0, '7.865')] [2025-01-04 05:40:14,449][134294] Updated weights for policy 0, policy_version 113294 (0.0027) [2025-01-04 05:40:17,540][134294] Updated weights for policy 0, policy_version 113304 (0.0024) [2025-01-04 05:40:18,968][134211] Fps is (10 sec: 13518.0, 60 sec: 14472.6, 300 sec: 14551.2). Total num frames: 464109568. Throughput: 0: 3506.6. Samples: 105191836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:18,968][134211] Avg episode reward: [(0, '8.640')] [2025-01-04 05:40:20,485][134294] Updated weights for policy 0, policy_version 113314 (0.0024) [2025-01-04 05:40:23,623][134294] Updated weights for policy 0, policy_version 113324 (0.0025) [2025-01-04 05:40:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.2, 300 sec: 14579.0). Total num frames: 464179200. Throughput: 0: 3515.2. Samples: 105212346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:23,969][134211] Avg episode reward: [(0, '8.343')] [2025-01-04 05:40:26,077][134294] Updated weights for policy 0, policy_version 113334 (0.0016) [2025-01-04 05:40:28,045][134294] Updated weights for policy 0, policy_version 113344 (0.0013) [2025-01-04 05:40:28,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 464273408. Throughput: 0: 3654.3. Samples: 105238296. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:28,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 05:40:29,952][134294] Updated weights for policy 0, policy_version 113354 (0.0013) [2025-01-04 05:40:31,875][134294] Updated weights for policy 0, policy_version 113364 (0.0015) [2025-01-04 05:40:33,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 464367616. Throughput: 0: 3789.1. Samples: 105254408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:33,968][134211] Avg episode reward: [(0, '8.559')] [2025-01-04 05:40:34,827][134294] Updated weights for policy 0, policy_version 113374 (0.0025) [2025-01-04 05:40:38,074][134294] Updated weights for policy 0, policy_version 113384 (0.0029) [2025-01-04 05:40:38,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14745.5, 300 sec: 14690.0). Total num frames: 464429056. Throughput: 0: 3854.8. Samples: 105275760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:38,969][134211] Avg episode reward: [(0, '8.599')] [2025-01-04 05:40:41,139][134294] Updated weights for policy 0, policy_version 113394 (0.0026) [2025-01-04 05:40:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 14551.2). Total num frames: 464494592. Throughput: 0: 3635.1. Samples: 105295134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:43,968][134211] Avg episode reward: [(0, '8.291')] [2025-01-04 05:40:44,369][134294] Updated weights for policy 0, policy_version 113404 (0.0026) [2025-01-04 05:40:47,455][134294] Updated weights for policy 0, policy_version 113414 (0.0028) [2025-01-04 05:40:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 14523.5). Total num frames: 464560128. Throughput: 0: 3620.3. Samples: 105304988. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:48,968][134211] Avg episode reward: [(0, '7.811')] [2025-01-04 05:40:50,857][134294] Updated weights for policy 0, policy_version 113424 (0.0023) [2025-01-04 05:40:53,717][134294] Updated weights for policy 0, policy_version 113434 (0.0022) [2025-01-04 05:40:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 464629760. Throughput: 0: 3594.3. Samples: 105323618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:40:53,968][134211] Avg episode reward: [(0, '8.405')] [2025-01-04 05:40:55,749][134294] Updated weights for policy 0, policy_version 113444 (0.0013) [2025-01-04 05:40:57,725][134294] Updated weights for policy 0, policy_version 113454 (0.0012) [2025-01-04 05:40:58,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14745.7, 300 sec: 14704.0). Total num frames: 464732160. Throughput: 0: 3790.8. Samples: 105352434. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:40:58,968][134211] Avg episode reward: [(0, '8.024')] [2025-01-04 05:40:59,642][134294] Updated weights for policy 0, policy_version 113464 (0.0014) [2025-01-04 05:41:02,682][134294] Updated weights for policy 0, policy_version 113474 (0.0022) [2025-01-04 05:41:03,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14950.3, 300 sec: 14745.6). Total num frames: 464805888. Throughput: 0: 3874.5. Samples: 105366190. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:03,969][134211] Avg episode reward: [(0, '8.173')] [2025-01-04 05:41:05,755][134294] Updated weights for policy 0, policy_version 113484 (0.0024) [2025-01-04 05:41:08,900][134294] Updated weights for policy 0, policy_version 113494 (0.0026) [2025-01-04 05:41:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.6, 300 sec: 14676.2). Total num frames: 464871424. Throughput: 0: 3843.0. Samples: 105385278. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:08,969][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 05:41:12,024][134294] Updated weights for policy 0, policy_version 113504 (0.0024) [2025-01-04 05:41:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 464936960. Throughput: 0: 3704.4. Samples: 105404996. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:13,969][134211] Avg episode reward: [(0, '8.133')] [2025-01-04 05:41:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113510_464936960.pth... [2025-01-04 05:41:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000112651_461418496.pth [2025-01-04 05:41:15,226][134294] Updated weights for policy 0, policy_version 113514 (0.0027) [2025-01-04 05:41:18,082][134294] Updated weights for policy 0, policy_version 113524 (0.0026) [2025-01-04 05:41:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 465002496. Throughput: 0: 3570.7. Samples: 105415088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:18,969][134211] Avg episode reward: [(0, '8.172')] [2025-01-04 05:41:21,236][134294] Updated weights for policy 0, policy_version 113534 (0.0026) [2025-01-04 05:41:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.6, 300 sec: 14551.2). Total num frames: 465063936. Throughput: 0: 3529.6. Samples: 105434594. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:23,970][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 05:41:24,413][134294] Updated weights for policy 0, policy_version 113544 (0.0022) [2025-01-04 05:41:26,477][134294] Updated weights for policy 0, policy_version 113554 (0.0014) [2025-01-04 05:41:28,880][134294] Updated weights for policy 0, policy_version 113564 (0.0019) [2025-01-04 05:41:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14745.6, 300 sec: 14509.5). Total num frames: 465158144. Throughput: 0: 3671.6. Samples: 105460356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:28,969][134211] Avg episode reward: [(0, '7.823')] [2025-01-04 05:41:31,944][134294] Updated weights for policy 0, policy_version 113574 (0.0026) [2025-01-04 05:41:33,968][134211] Fps is (10 sec: 15974.9, 60 sec: 14267.7, 300 sec: 14440.2). Total num frames: 465223680. Throughput: 0: 3669.0. Samples: 105470092. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:33,968][134211] Avg episode reward: [(0, '8.058')] [2025-01-04 05:41:35,129][134294] Updated weights for policy 0, policy_version 113584 (0.0025) [2025-01-04 05:41:37,729][134294] Updated weights for policy 0, policy_version 113594 (0.0018) [2025-01-04 05:41:38,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14609.0, 300 sec: 14509.6). Total num frames: 465305600. Throughput: 0: 3714.0. Samples: 105490748. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:38,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 05:41:39,615][134294] Updated weights for policy 0, policy_version 113604 (0.0014) [2025-01-04 05:41:41,542][134294] Updated weights for policy 0, policy_version 113614 (0.0013) [2025-01-04 05:41:43,437][134294] Updated weights for policy 0, policy_version 113624 (0.0012) [2025-01-04 05:41:43,968][134211] Fps is (10 sec: 18842.0, 60 sec: 15291.8, 300 sec: 14648.4). Total num frames: 465412096. Throughput: 0: 3789.7. Samples: 105522972. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:43,968][134211] Avg episode reward: [(0, '8.346')] [2025-01-04 05:41:45,617][134294] Updated weights for policy 0, policy_version 113634 (0.0017) [2025-01-04 05:41:48,702][134294] Updated weights for policy 0, policy_version 113644 (0.0025) [2025-01-04 05:41:48,968][134211] Fps is (10 sec: 18023.0, 60 sec: 15428.3, 300 sec: 14690.1). Total num frames: 465485824. Throughput: 0: 3783.8. Samples: 105536462. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:48,968][134211] Avg episode reward: [(0, '8.260')] [2025-01-04 05:41:52,180][134294] Updated weights for policy 0, policy_version 113654 (0.0030) [2025-01-04 05:41:53,969][134211] Fps is (10 sec: 13105.9, 60 sec: 15223.2, 300 sec: 14676.1). Total num frames: 465543168. Throughput: 0: 3766.5. Samples: 105554772. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 05:41:53,969][134211] Avg episode reward: [(0, '9.133')] [2025-01-04 05:41:55,828][134294] Updated weights for policy 0, policy_version 113664 (0.0024) [2025-01-04 05:41:58,971][134211] Fps is (10 sec: 11874.6, 60 sec: 14540.0, 300 sec: 14648.2). Total num frames: 465604608. Throughput: 0: 3713.7. Samples: 105572122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:41:58,971][134211] Avg episode reward: [(0, '8.677')] [2025-01-04 05:41:59,360][134294] Updated weights for policy 0, policy_version 113674 (0.0031) [2025-01-04 05:42:03,004][134294] Updated weights for policy 0, policy_version 113684 (0.0026) [2025-01-04 05:42:03,968][134211] Fps is (10 sec: 11469.8, 60 sec: 14199.5, 300 sec: 14606.8). Total num frames: 465657856. Throughput: 0: 3677.6. Samples: 105580578. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:03,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 05:42:06,124][134294] Updated weights for policy 0, policy_version 113694 (0.0028) [2025-01-04 05:42:08,279][134294] Updated weights for policy 0, policy_version 113704 (0.0015) [2025-01-04 05:42:08,968][134211] Fps is (10 sec: 13930.7, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 465743872. Throughput: 0: 3681.3. Samples: 105600252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:08,968][134211] Avg episode reward: [(0, '7.928')] [2025-01-04 05:42:10,236][134294] Updated weights for policy 0, policy_version 113714 (0.0016) [2025-01-04 05:42:12,868][134294] Updated weights for policy 0, policy_version 113724 (0.0021) [2025-01-04 05:42:13,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 465825792. Throughput: 0: 3713.9. Samples: 105627484. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:13,969][134211] Avg episode reward: [(0, '7.377')] [2025-01-04 05:42:15,980][134294] Updated weights for policy 0, policy_version 113734 (0.0026) [2025-01-04 05:42:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14813.9, 300 sec: 14592.9). Total num frames: 465891328. Throughput: 0: 3718.9. Samples: 105637442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:18,968][134211] Avg episode reward: [(0, '8.957')] [2025-01-04 05:42:18,993][134294] Updated weights for policy 0, policy_version 113744 (0.0026) [2025-01-04 05:42:22,001][134294] Updated weights for policy 0, policy_version 113754 (0.0025) [2025-01-04 05:42:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.2, 300 sec: 14551.2). Total num frames: 465956864. Throughput: 0: 3708.7. Samples: 105657638. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:23,968][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 05:42:25,379][134294] Updated weights for policy 0, policy_version 113764 (0.0024) [2025-01-04 05:42:28,505][134294] Updated weights for policy 0, policy_version 113774 (0.0022) [2025-01-04 05:42:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 466026496. Throughput: 0: 3405.3. Samples: 105676212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:28,968][134211] Avg episode reward: [(0, '7.839')] [2025-01-04 05:42:30,576][134294] Updated weights for policy 0, policy_version 113784 (0.0013) [2025-01-04 05:42:32,535][134294] Updated weights for policy 0, policy_version 113794 (0.0013) [2025-01-04 05:42:33,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15087.0, 300 sec: 14690.1). Total num frames: 466128896. Throughput: 0: 3435.6. Samples: 105691064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:33,968][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 05:42:34,425][134294] Updated weights for policy 0, policy_version 113804 (0.0014) [2025-01-04 05:42:36,360][134294] Updated weights for policy 0, policy_version 113814 (0.0012) [2025-01-04 05:42:38,880][134294] Updated weights for policy 0, policy_version 113824 (0.0020) [2025-01-04 05:42:38,968][134211] Fps is (10 sec: 19660.5, 60 sec: 15291.8, 300 sec: 14787.2). Total num frames: 466223104. Throughput: 0: 3741.5. Samples: 105723138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:38,968][134211] Avg episode reward: [(0, '8.598')] [2025-01-04 05:42:42,228][134294] Updated weights for policy 0, policy_version 113834 (0.0029) [2025-01-04 05:42:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14540.7, 300 sec: 14634.6). Total num frames: 466284544. Throughput: 0: 3779.1. Samples: 105742170. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:43,969][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 05:42:45,373][134294] Updated weights for policy 0, policy_version 113844 (0.0026) [2025-01-04 05:42:48,457][134294] Updated weights for policy 0, policy_version 113854 (0.0027) [2025-01-04 05:42:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.2, 300 sec: 14620.8). Total num frames: 466350080. Throughput: 0: 3812.4. Samples: 105752138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:48,968][134211] Avg episode reward: [(0, '7.932')] [2025-01-04 05:42:51,462][134294] Updated weights for policy 0, policy_version 113864 (0.0025) [2025-01-04 05:42:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14541.0, 300 sec: 14648.4). Total num frames: 466415616. Throughput: 0: 3820.6. Samples: 105772178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:53,968][134211] Avg episode reward: [(0, '8.458')] [2025-01-04 05:42:54,690][134294] Updated weights for policy 0, policy_version 113874 (0.0024) [2025-01-04 05:42:58,223][134294] Updated weights for policy 0, policy_version 113884 (0.0026) [2025-01-04 05:42:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14541.6, 300 sec: 14634.5). Total num frames: 466477056. Throughput: 0: 3616.0. Samples: 105790202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:42:58,968][134211] Avg episode reward: [(0, '8.758')] [2025-01-04 05:43:01,481][134294] Updated weights for policy 0, policy_version 113894 (0.0026) [2025-01-04 05:43:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14677.3, 300 sec: 14620.6). Total num frames: 466538496. Throughput: 0: 3603.1. Samples: 105799582. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:03,968][134211] Avg episode reward: [(0, '7.400')] [2025-01-04 05:43:04,674][134294] Updated weights for policy 0, policy_version 113904 (0.0027) [2025-01-04 05:43:07,707][134294] Updated weights for policy 0, policy_version 113914 (0.0024) [2025-01-04 05:43:08,967][134211] Fps is (10 sec: 13926.6, 60 sec: 14540.9, 300 sec: 14648.5). Total num frames: 466616320. Throughput: 0: 3597.8. Samples: 105819538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:08,968][134211] Avg episode reward: [(0, '7.786')] [2025-01-04 05:43:09,611][134294] Updated weights for policy 0, policy_version 113924 (0.0014) [2025-01-04 05:43:12,348][134294] Updated weights for policy 0, policy_version 113934 (0.0021) [2025-01-04 05:43:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14472.6, 300 sec: 14620.6). Total num frames: 466694144. Throughput: 0: 3736.7. Samples: 105844364. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:13,968][134211] Avg episode reward: [(0, '8.895')] [2025-01-04 05:43:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113939_466694144.pth... [2025-01-04 05:43:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113080_463175680.pth [2025-01-04 05:43:15,368][134294] Updated weights for policy 0, policy_version 113944 (0.0026) [2025-01-04 05:43:18,344][134294] Updated weights for policy 0, policy_version 113954 (0.0027) [2025-01-04 05:43:18,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 466759680. Throughput: 0: 3632.4. Samples: 105854522. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:18,968][134211] Avg episode reward: [(0, '8.055')] [2025-01-04 05:43:21,402][134294] Updated weights for policy 0, policy_version 113964 (0.0029) [2025-01-04 05:43:23,585][134294] Updated weights for policy 0, policy_version 113974 (0.0015) [2025-01-04 05:43:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 466841600. Throughput: 0: 3378.6. Samples: 105875174. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:23,969][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 05:43:25,730][134294] Updated weights for policy 0, policy_version 113984 (0.0014) [2025-01-04 05:43:28,823][134294] Updated weights for policy 0, policy_version 113994 (0.0025) [2025-01-04 05:43:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 466919424. Throughput: 0: 3534.7. Samples: 105901230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:28,968][134211] Avg episode reward: [(0, '8.653')] [2025-01-04 05:43:31,983][134294] Updated weights for policy 0, policy_version 114004 (0.0028) [2025-01-04 05:43:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14267.7, 300 sec: 14634.5). Total num frames: 466984960. Throughput: 0: 3524.8. Samples: 105910756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:33,968][134211] Avg episode reward: [(0, '7.771')] [2025-01-04 05:43:35,019][134294] Updated weights for policy 0, policy_version 114014 (0.0025) [2025-01-04 05:43:37,062][134294] Updated weights for policy 0, policy_version 114024 (0.0014) [2025-01-04 05:43:38,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14062.9, 300 sec: 14676.2). Total num frames: 467066880. Throughput: 0: 3604.7. Samples: 105934390. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:38,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 05:43:39,859][134294] Updated weights for policy 0, policy_version 114034 (0.0025) [2025-01-04 05:43:42,905][134294] Updated weights for policy 0, policy_version 114044 (0.0022) [2025-01-04 05:43:43,969][134211] Fps is (10 sec: 15154.1, 60 sec: 14199.3, 300 sec: 14578.9). Total num frames: 467136512. Throughput: 0: 3665.4. Samples: 105955148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:43,969][134211] Avg episode reward: [(0, '8.085')] [2025-01-04 05:43:45,893][134294] Updated weights for policy 0, policy_version 114054 (0.0025) [2025-01-04 05:43:48,808][134294] Updated weights for policy 0, policy_version 114064 (0.0025) [2025-01-04 05:43:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14267.7, 300 sec: 14551.2). Total num frames: 467206144. Throughput: 0: 3685.7. Samples: 105965440. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:48,968][134211] Avg episode reward: [(0, '7.624')] [2025-01-04 05:43:51,896][134294] Updated weights for policy 0, policy_version 114074 (0.0023) [2025-01-04 05:43:53,968][134211] Fps is (10 sec: 14747.0, 60 sec: 14472.5, 300 sec: 14620.6). Total num frames: 467283968. Throughput: 0: 3704.1. Samples: 105986224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:53,968][134211] Avg episode reward: [(0, '7.456')] [2025-01-04 05:43:54,108][134294] Updated weights for policy 0, policy_version 114084 (0.0013) [2025-01-04 05:43:56,276][134294] Updated weights for policy 0, policy_version 114094 (0.0014) [2025-01-04 05:43:58,437][134294] Updated weights for policy 0, policy_version 114104 (0.0013) [2025-01-04 05:43:58,969][134211] Fps is (10 sec: 17200.9, 60 sec: 15018.3, 300 sec: 14745.5). Total num frames: 467378176. Throughput: 0: 3783.5. Samples: 106014626. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:43:58,969][134211] Avg episode reward: [(0, '8.828')] [2025-01-04 05:44:00,517][134294] Updated weights for policy 0, policy_version 114114 (0.0014) [2025-01-04 05:44:03,660][134294] Updated weights for policy 0, policy_version 114124 (0.0029) [2025-01-04 05:44:03,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15223.5, 300 sec: 14717.8). Total num frames: 467451904. Throughput: 0: 3860.2. Samples: 106028232. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:03,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 05:44:06,904][134294] Updated weights for policy 0, policy_version 114134 (0.0029) [2025-01-04 05:44:08,968][134211] Fps is (10 sec: 13928.3, 60 sec: 15018.6, 300 sec: 14717.8). Total num frames: 467517440. Throughput: 0: 3823.6. Samples: 106047238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:08,968][134211] Avg episode reward: [(0, '8.461')] [2025-01-04 05:44:10,201][134294] Updated weights for policy 0, policy_version 114144 (0.0025) [2025-01-04 05:44:13,412][134294] Updated weights for policy 0, policy_version 114154 (0.0028) [2025-01-04 05:44:13,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14745.1, 300 sec: 14703.8). Total num frames: 467578880. Throughput: 0: 3665.4. Samples: 106066182. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:13,971][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 05:44:16,440][134294] Updated weights for policy 0, policy_version 114164 (0.0027) [2025-01-04 05:44:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 467648512. Throughput: 0: 3678.6. Samples: 106076294. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:18,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 05:44:19,523][134294] Updated weights for policy 0, policy_version 114174 (0.0026) [2025-01-04 05:44:22,711][134294] Updated weights for policy 0, policy_version 114184 (0.0024) [2025-01-04 05:44:23,968][134211] Fps is (10 sec: 13110.1, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 467709952. Throughput: 0: 3589.6. Samples: 106095922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:23,968][134211] Avg episode reward: [(0, '8.594')] [2025-01-04 05:44:25,873][134294] Updated weights for policy 0, policy_version 114194 (0.0027) [2025-01-04 05:44:28,569][134294] Updated weights for policy 0, policy_version 114204 (0.0017) [2025-01-04 05:44:28,967][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14592.9). Total num frames: 467783680. Throughput: 0: 3580.6. Samples: 106116272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:28,968][134211] Avg episode reward: [(0, '8.524')] [2025-01-04 05:44:30,520][134294] Updated weights for policy 0, policy_version 114214 (0.0014) [2025-01-04 05:44:32,432][134294] Updated weights for policy 0, policy_version 114224 (0.0013) [2025-01-04 05:44:33,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15155.3, 300 sec: 14745.6). Total num frames: 467894272. Throughput: 0: 3703.9. Samples: 106132114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:33,968][134211] Avg episode reward: [(0, '8.315')] [2025-01-04 05:44:34,281][134294] Updated weights for policy 0, policy_version 114234 (0.0013) [2025-01-04 05:44:36,270][134294] Updated weights for policy 0, policy_version 114244 (0.0013) [2025-01-04 05:44:38,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15223.4, 300 sec: 14815.0). Total num frames: 467980288. Throughput: 0: 3929.0. Samples: 106163030. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:38,969][134211] Avg episode reward: [(0, '7.310')] [2025-01-04 05:44:39,026][134294] Updated weights for policy 0, policy_version 114254 (0.0022) [2025-01-04 05:44:42,427][134294] Updated weights for policy 0, policy_version 114264 (0.0028) [2025-01-04 05:44:43,968][134211] Fps is (10 sec: 14745.4, 60 sec: 15087.1, 300 sec: 14801.1). Total num frames: 468041728. Throughput: 0: 3719.8. Samples: 106182014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:43,968][134211] Avg episode reward: [(0, '7.641')] [2025-01-04 05:44:45,504][134294] Updated weights for policy 0, policy_version 114274 (0.0024) [2025-01-04 05:44:48,539][134294] Updated weights for policy 0, policy_version 114284 (0.0029) [2025-01-04 05:44:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15087.0, 300 sec: 14787.3). Total num frames: 468111360. Throughput: 0: 3639.4. Samples: 106192004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:48,968][134211] Avg episode reward: [(0, '9.009')] [2025-01-04 05:44:51,731][134294] Updated weights for policy 0, policy_version 114294 (0.0023) [2025-01-04 05:44:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.9, 300 sec: 14662.3). Total num frames: 468172800. Throughput: 0: 3650.5. Samples: 106211512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:53,968][134211] Avg episode reward: [(0, '8.511')] [2025-01-04 05:44:55,138][134294] Updated weights for policy 0, policy_version 114304 (0.0025) [2025-01-04 05:44:58,513][134294] Updated weights for policy 0, policy_version 114314 (0.0027) [2025-01-04 05:44:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14268.1, 300 sec: 14662.3). Total num frames: 468234240. Throughput: 0: 3634.7. Samples: 106229734. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:44:58,968][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 05:45:02,043][134294] Updated weights for policy 0, policy_version 114324 (0.0027) [2025-01-04 05:45:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14062.9, 300 sec: 14648.4). Total num frames: 468295680. Throughput: 0: 3605.5. Samples: 106238540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:03,968][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 05:45:04,808][134294] Updated weights for policy 0, policy_version 114334 (0.0021) [2025-01-04 05:45:06,655][134294] Updated weights for policy 0, policy_version 114344 (0.0013) [2025-01-04 05:45:08,562][134294] Updated weights for policy 0, policy_version 114354 (0.0016) [2025-01-04 05:45:08,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 468402176. Throughput: 0: 3742.4. Samples: 106264330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:08,968][134211] Avg episode reward: [(0, '8.309')] [2025-01-04 05:45:10,386][134294] Updated weights for policy 0, policy_version 114364 (0.0013) [2025-01-04 05:45:12,336][134294] Updated weights for policy 0, policy_version 114374 (0.0013) [2025-01-04 05:45:13,968][134211] Fps is (10 sec: 21299.2, 60 sec: 15497.1, 300 sec: 14912.2). Total num frames: 468508672. Throughput: 0: 4016.1. Samples: 106296996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:13,968][134211] Avg episode reward: [(0, '7.865')] [2025-01-04 05:45:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000114382_468508672.pth... 
[2025-01-04 05:45:14,038][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113510_464936960.pth [2025-01-04 05:45:14,337][134294] Updated weights for policy 0, policy_version 114384 (0.0015) [2025-01-04 05:45:17,419][134294] Updated weights for policy 0, policy_version 114394 (0.0027) [2025-01-04 05:45:18,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15428.2, 300 sec: 14898.3). Total num frames: 468574208. Throughput: 0: 3922.1. Samples: 106308608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:18,969][134211] Avg episode reward: [(0, '8.552')] [2025-01-04 05:45:20,713][134294] Updated weights for policy 0, policy_version 114404 (0.0025) [2025-01-04 05:45:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15428.2, 300 sec: 14787.2). Total num frames: 468635648. Throughput: 0: 3655.8. Samples: 106327542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:23,968][134211] Avg episode reward: [(0, '8.204')] [2025-01-04 05:45:24,064][134294] Updated weights for policy 0, policy_version 114414 (0.0023) [2025-01-04 05:45:27,417][134294] Updated weights for policy 0, policy_version 114424 (0.0027) [2025-01-04 05:45:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15223.4, 300 sec: 14676.2). Total num frames: 468697088. Throughput: 0: 3628.9. Samples: 106345316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:28,968][134211] Avg episode reward: [(0, '8.539')] [2025-01-04 05:45:30,956][134294] Updated weights for policy 0, policy_version 114434 (0.0026) [2025-01-04 05:45:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14404.2, 300 sec: 14676.2). Total num frames: 468758528. Throughput: 0: 3610.0. Samples: 106354454. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:33,968][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 05:45:34,025][134294] Updated weights for policy 0, policy_version 114444 (0.0024) [2025-01-04 05:45:37,024][134294] Updated weights for policy 0, policy_version 114454 (0.0026) [2025-01-04 05:45:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14690.1). Total num frames: 468828160. Throughput: 0: 3624.3. Samples: 106374604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:38,968][134211] Avg episode reward: [(0, '7.312')] [2025-01-04 05:45:40,031][134294] Updated weights for policy 0, policy_version 114464 (0.0026) [2025-01-04 05:45:42,995][134294] Updated weights for policy 0, policy_version 114474 (0.0024) [2025-01-04 05:45:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14267.7, 300 sec: 14703.9). Total num frames: 468897792. Throughput: 0: 3676.1. Samples: 106395158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:43,968][134211] Avg episode reward: [(0, '8.860')] [2025-01-04 05:45:45,964][134294] Updated weights for policy 0, policy_version 114484 (0.0025) [2025-01-04 05:45:48,877][134294] Updated weights for policy 0, policy_version 114494 (0.0023) [2025-01-04 05:45:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14267.7, 300 sec: 14703.9). Total num frames: 468967424. Throughput: 0: 3712.8. Samples: 106405616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:48,968][134211] Avg episode reward: [(0, '7.779')] [2025-01-04 05:45:52,022][134294] Updated weights for policy 0, policy_version 114504 (0.0022) [2025-01-04 05:45:53,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14267.8, 300 sec: 14565.1). Total num frames: 469028864. Throughput: 0: 3588.4. Samples: 106425810. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:53,968][134211] Avg episode reward: [(0, '7.932')] [2025-01-04 05:45:54,943][134294] Updated weights for policy 0, policy_version 114514 (0.0020) [2025-01-04 05:45:57,101][134294] Updated weights for policy 0, policy_version 114524 (0.0014) [2025-01-04 05:45:58,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 469123072. Throughput: 0: 3417.9. Samples: 106450802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:45:58,968][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 05:45:59,394][134294] Updated weights for policy 0, policy_version 114534 (0.0014) [2025-01-04 05:46:02,233][134294] Updated weights for policy 0, policy_version 114544 (0.0020) [2025-01-04 05:46:03,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15155.2, 300 sec: 14690.1). Total num frames: 469204992. Throughput: 0: 3418.7. Samples: 106462450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:03,968][134211] Avg episode reward: [(0, '8.009')] [2025-01-04 05:46:04,200][134294] Updated weights for policy 0, policy_version 114554 (0.0016) [2025-01-04 05:46:07,478][134294] Updated weights for policy 0, policy_version 114564 (0.0024) [2025-01-04 05:46:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14472.5, 300 sec: 14690.1). Total num frames: 469270528. Throughput: 0: 3515.3. Samples: 106485730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:08,968][134211] Avg episode reward: [(0, '8.361')] [2025-01-04 05:46:10,862][134294] Updated weights for policy 0, policy_version 114574 (0.0028) [2025-01-04 05:46:13,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13721.6, 300 sec: 14676.2). Total num frames: 469331968. Throughput: 0: 3540.7. Samples: 106504648. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:13,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 05:46:13,980][134294] Updated weights for policy 0, policy_version 114584 (0.0027) [2025-01-04 05:46:17,040][134294] Updated weights for policy 0, policy_version 114594 (0.0027) [2025-01-04 05:46:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13721.6, 300 sec: 14690.1). Total num frames: 469397504. Throughput: 0: 3552.1. Samples: 106514300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:18,969][134211] Avg episode reward: [(0, '9.020')] [2025-01-04 05:46:20,134][134294] Updated weights for policy 0, policy_version 114604 (0.0027) [2025-01-04 05:46:22,957][134294] Updated weights for policy 0, policy_version 114614 (0.0025) [2025-01-04 05:46:23,968][134211] Fps is (10 sec: 14336.4, 60 sec: 13994.7, 300 sec: 14634.5). Total num frames: 469475328. Throughput: 0: 3560.6. Samples: 106534832. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:23,968][134211] Avg episode reward: [(0, '8.023')] [2025-01-04 05:46:25,024][134294] Updated weights for policy 0, policy_version 114624 (0.0015) [2025-01-04 05:46:27,061][134294] Updated weights for policy 0, policy_version 114634 (0.0015) [2025-01-04 05:46:28,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14677.4, 300 sec: 14759.5). Total num frames: 469577728. Throughput: 0: 3756.8. Samples: 106564212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:28,968][134211] Avg episode reward: [(0, '8.819')] [2025-01-04 05:46:29,020][134294] Updated weights for policy 0, policy_version 114644 (0.0015) [2025-01-04 05:46:31,806][134294] Updated weights for policy 0, policy_version 114654 (0.0021) [2025-01-04 05:46:33,968][134211] Fps is (10 sec: 17202.6, 60 sec: 14813.8, 300 sec: 14717.8). Total num frames: 469647360. Throughput: 0: 3802.2. Samples: 106576716. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:33,969][134211] Avg episode reward: [(0, '8.142')] [2025-01-04 05:46:35,376][134294] Updated weights for policy 0, policy_version 114664 (0.0026) [2025-01-04 05:46:38,485][134294] Updated weights for policy 0, policy_version 114674 (0.0027) [2025-01-04 05:46:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 469708800. Throughput: 0: 3758.6. Samples: 106594946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:38,968][134211] Avg episode reward: [(0, '8.410')] [2025-01-04 05:46:41,583][134294] Updated weights for policy 0, policy_version 114684 (0.0025) [2025-01-04 05:46:43,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14609.1, 300 sec: 14537.3). Total num frames: 469774336. Throughput: 0: 3641.5. Samples: 106614670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:43,968][134211] Avg episode reward: [(0, '8.250')] [2025-01-04 05:46:44,837][134294] Updated weights for policy 0, policy_version 114694 (0.0028) [2025-01-04 05:46:47,758][134294] Updated weights for policy 0, policy_version 114704 (0.0024) [2025-01-04 05:46:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 469839872. Throughput: 0: 3600.3. Samples: 106624464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:48,970][134211] Avg episode reward: [(0, '8.379')] [2025-01-04 05:46:50,735][134294] Updated weights for policy 0, policy_version 114714 (0.0028) [2025-01-04 05:46:53,298][134294] Updated weights for policy 0, policy_version 114724 (0.0018) [2025-01-04 05:46:53,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14882.1, 300 sec: 14634.7). Total num frames: 469921792. Throughput: 0: 3544.0. Samples: 106645212. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:53,968][134211] Avg episode reward: [(0, '8.111')] [2025-01-04 05:46:55,880][134294] Updated weights for policy 0, policy_version 114734 (0.0021) [2025-01-04 05:46:58,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14404.2, 300 sec: 14676.2). Total num frames: 469987328. Throughput: 0: 3637.3. Samples: 106668326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:46:58,968][134211] Avg episode reward: [(0, '8.360')] [2025-01-04 05:46:59,148][134294] Updated weights for policy 0, policy_version 114744 (0.0025) [2025-01-04 05:47:01,931][134294] Updated weights for policy 0, policy_version 114754 (0.0021) [2025-01-04 05:47:03,937][134294] Updated weights for policy 0, policy_version 114764 (0.0013) [2025-01-04 05:47:03,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14472.5, 300 sec: 14676.2). Total num frames: 470073344. Throughput: 0: 3627.4. Samples: 106677532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:47:03,968][134211] Avg episode reward: [(0, '8.027')] [2025-01-04 05:47:05,800][134294] Updated weights for policy 0, policy_version 114774 (0.0013) [2025-01-04 05:47:07,729][134294] Updated weights for policy 0, policy_version 114784 (0.0013) [2025-01-04 05:47:08,967][134211] Fps is (10 sec: 19251.6, 60 sec: 15155.3, 300 sec: 14759.5). Total num frames: 470179840. Throughput: 0: 3874.4. Samples: 106709180. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:08,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 05:47:09,656][134294] Updated weights for policy 0, policy_version 114794 (0.0014) [2025-01-04 05:47:12,211][134294] Updated weights for policy 0, policy_version 114804 (0.0020) [2025-01-04 05:47:13,968][134211] Fps is (10 sec: 18432.2, 60 sec: 15428.3, 300 sec: 14801.1). Total num frames: 470257664. Throughput: 0: 3808.3. Samples: 106735586. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:13,968][134211] Avg episode reward: [(0, '8.284')] [2025-01-04 05:47:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000114809_470257664.pth... [2025-01-04 05:47:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000113939_466694144.pth [2025-01-04 05:47:15,524][134294] Updated weights for policy 0, policy_version 114814 (0.0025) [2025-01-04 05:47:18,628][134294] Updated weights for policy 0, policy_version 114824 (0.0030) [2025-01-04 05:47:18,968][134211] Fps is (10 sec: 13925.7, 60 sec: 15359.9, 300 sec: 14787.2). Total num frames: 470319104. Throughput: 0: 3734.9. Samples: 106744788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:18,969][134211] Avg episode reward: [(0, '8.916')] [2025-01-04 05:47:21,821][134294] Updated weights for policy 0, policy_version 114834 (0.0025) [2025-01-04 05:47:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15155.2, 300 sec: 14773.4). Total num frames: 470384640. Throughput: 0: 3762.9. Samples: 106764278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:23,968][134211] Avg episode reward: [(0, '9.279')] [2025-01-04 05:47:25,290][134294] Updated weights for policy 0, policy_version 114844 (0.0027) [2025-01-04 05:47:28,584][134294] Updated weights for policy 0, policy_version 114854 (0.0026) [2025-01-04 05:47:28,968][134211] Fps is (10 sec: 12288.4, 60 sec: 14404.2, 300 sec: 14620.6). Total num frames: 470441984. Throughput: 0: 3724.4. Samples: 106782268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:28,968][134211] Avg episode reward: [(0, '7.722')] [2025-01-04 05:47:32,033][134294] Updated weights for policy 0, policy_version 114864 (0.0028) [2025-01-04 05:47:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.1, 300 sec: 14523.4). Total num frames: 470507520. 
Throughput: 0: 3709.2. Samples: 106791378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:33,968][134211] Avg episode reward: [(0, '8.163')] [2025-01-04 05:47:35,032][134294] Updated weights for policy 0, policy_version 114874 (0.0026) [2025-01-04 05:47:37,950][134294] Updated weights for policy 0, policy_version 114884 (0.0024) [2025-01-04 05:47:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14472.5, 300 sec: 14551.2). Total num frames: 470577152. Throughput: 0: 3700.3. Samples: 106811726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:38,969][134211] Avg episode reward: [(0, '7.839')] [2025-01-04 05:47:40,711][134294] Updated weights for policy 0, policy_version 114894 (0.0021) [2025-01-04 05:47:42,732][134294] Updated weights for policy 0, policy_version 114904 (0.0016) [2025-01-04 05:47:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14813.8, 300 sec: 14620.6). Total num frames: 470663168. Throughput: 0: 3737.6. Samples: 106836518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:43,968][134211] Avg episode reward: [(0, '8.335')] [2025-01-04 05:47:45,703][134294] Updated weights for policy 0, policy_version 114914 (0.0024) [2025-01-04 05:47:48,647][134294] Updated weights for policy 0, policy_version 114924 (0.0023) [2025-01-04 05:47:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14813.9, 300 sec: 14620.6). Total num frames: 470728704. Throughput: 0: 3762.3. Samples: 106846834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:48,968][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 05:47:51,684][134294] Updated weights for policy 0, policy_version 114934 (0.0025) [2025-01-04 05:47:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.1, 300 sec: 14648.4). Total num frames: 470798336. Throughput: 0: 3513.8. Samples: 106867302. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:53,968][134211] Avg episode reward: [(0, '8.474')] [2025-01-04 05:47:54,817][134294] Updated weights for policy 0, policy_version 114944 (0.0022) [2025-01-04 05:47:56,942][134294] Updated weights for policy 0, policy_version 114954 (0.0013) [2025-01-04 05:47:58,970][134211] Fps is (10 sec: 15970.9, 60 sec: 15018.1, 300 sec: 14745.5). Total num frames: 470888448. Throughput: 0: 3471.0. Samples: 106891790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:47:58,970][134211] Avg episode reward: [(0, '8.104')] [2025-01-04 05:47:59,140][134294] Updated weights for policy 0, policy_version 114964 (0.0014) [2025-01-04 05:48:01,254][134294] Updated weights for policy 0, policy_version 114974 (0.0013) [2025-01-04 05:48:03,467][134294] Updated weights for policy 0, policy_version 114984 (0.0016) [2025-01-04 05:48:03,968][134211] Fps is (10 sec: 18022.1, 60 sec: 15086.9, 300 sec: 14787.2). Total num frames: 470978560. Throughput: 0: 3587.1. Samples: 106906208. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:48:03,969][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 05:48:06,801][134294] Updated weights for policy 0, policy_version 114994 (0.0028) [2025-01-04 05:48:08,968][134211] Fps is (10 sec: 15158.6, 60 sec: 14335.9, 300 sec: 14731.7). Total num frames: 471040000. Throughput: 0: 3643.3. Samples: 106928226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:08,968][134211] Avg episode reward: [(0, '7.902')] [2025-01-04 05:48:10,083][134294] Updated weights for policy 0, policy_version 115004 (0.0030) [2025-01-04 05:48:13,200][134294] Updated weights for policy 0, policy_version 115014 (0.0025) [2025-01-04 05:48:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.2, 300 sec: 14731.7). Total num frames: 471105536. Throughput: 0: 3673.4. Samples: 106947570. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:13,968][134211] Avg episode reward: [(0, '8.412')] [2025-01-04 05:48:16,117][134294] Updated weights for policy 0, policy_version 115024 (0.0027) [2025-01-04 05:48:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.5, 300 sec: 14676.2). Total num frames: 471171072. Throughput: 0: 3695.7. Samples: 106957684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:18,968][134211] Avg episode reward: [(0, '7.705')] [2025-01-04 05:48:19,341][134294] Updated weights for policy 0, policy_version 115034 (0.0028) [2025-01-04 05:48:22,488][134294] Updated weights for policy 0, policy_version 115044 (0.0025) [2025-01-04 05:48:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14199.4, 300 sec: 14634.5). Total num frames: 471236608. Throughput: 0: 3678.7. Samples: 106977266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:23,969][134211] Avg episode reward: [(0, '9.604')] [2025-01-04 05:48:25,662][134294] Updated weights for policy 0, policy_version 115054 (0.0026) [2025-01-04 05:48:28,861][134294] Updated weights for policy 0, policy_version 115064 (0.0023) [2025-01-04 05:48:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 471302144. Throughput: 0: 3555.8. Samples: 106996530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:28,968][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 05:48:30,933][134294] Updated weights for policy 0, policy_version 115074 (0.0014) [2025-01-04 05:48:32,980][134294] Updated weights for policy 0, policy_version 115084 (0.0012) [2025-01-04 05:48:33,968][134211] Fps is (10 sec: 16794.1, 60 sec: 14950.4, 300 sec: 14704.0). Total num frames: 471404544. Throughput: 0: 3630.1. Samples: 107010188. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:33,968][134211] Avg episode reward: [(0, '8.255')] [2025-01-04 05:48:34,940][134294] Updated weights for policy 0, policy_version 115094 (0.0015) [2025-01-04 05:48:37,911][134294] Updated weights for policy 0, policy_version 115104 (0.0028) [2025-01-04 05:48:38,968][134211] Fps is (10 sec: 17613.1, 60 sec: 15018.7, 300 sec: 14717.9). Total num frames: 471478272. Throughput: 0: 3778.7. Samples: 107037344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:38,968][134211] Avg episode reward: [(0, '7.327')] [2025-01-04 05:48:41,147][134294] Updated weights for policy 0, policy_version 115114 (0.0025) [2025-01-04 05:48:43,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 471539712. Throughput: 0: 3655.6. Samples: 107056284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:43,968][134211] Avg episode reward: [(0, '8.258')] [2025-01-04 05:48:44,354][134294] Updated weights for policy 0, policy_version 115124 (0.0027) [2025-01-04 05:48:47,457][134294] Updated weights for policy 0, policy_version 115134 (0.0026) [2025-01-04 05:48:48,968][134211] Fps is (10 sec: 12697.1, 60 sec: 14609.0, 300 sec: 14648.4). Total num frames: 471605248. Throughput: 0: 3550.9. Samples: 107066000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:48,969][134211] Avg episode reward: [(0, '7.222')] [2025-01-04 05:48:50,539][134294] Updated weights for policy 0, policy_version 115144 (0.0025) [2025-01-04 05:48:53,785][134294] Updated weights for policy 0, policy_version 115154 (0.0023) [2025-01-04 05:48:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 14551.3). Total num frames: 471670784. Throughput: 0: 3507.5. Samples: 107086064. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:53,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 05:48:56,845][134294] Updated weights for policy 0, policy_version 115164 (0.0022) [2025-01-04 05:48:58,897][134294] Updated weights for policy 0, policy_version 115174 (0.0014) [2025-01-04 05:48:58,968][134211] Fps is (10 sec: 14746.3, 60 sec: 14404.8, 300 sec: 14579.0). Total num frames: 471752704. Throughput: 0: 3558.7. Samples: 107107712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:48:58,968][134211] Avg episode reward: [(0, '8.617')] [2025-01-04 05:49:01,083][134294] Updated weights for policy 0, policy_version 115184 (0.0017) [2025-01-04 05:49:03,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14131.2, 300 sec: 14606.7). Total num frames: 471826432. Throughput: 0: 3648.4. Samples: 107121862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:03,969][134211] Avg episode reward: [(0, '8.353')] [2025-01-04 05:49:04,314][134294] Updated weights for policy 0, policy_version 115194 (0.0027) [2025-01-04 05:49:07,423][134294] Updated weights for policy 0, policy_version 115204 (0.0027) [2025-01-04 05:49:08,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.5, 300 sec: 14620.7). Total num frames: 471891968. Throughput: 0: 3647.5. Samples: 107141402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:08,968][134211] Avg episode reward: [(0, '8.571')] [2025-01-04 05:49:10,465][134294] Updated weights for policy 0, policy_version 115214 (0.0021) [2025-01-04 05:49:12,517][134294] Updated weights for policy 0, policy_version 115224 (0.0015) [2025-01-04 05:49:13,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 471973888. Throughput: 0: 3740.7. Samples: 107164860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:13,968][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 05:49:14,017][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000115229_471977984.pth... [2025-01-04 05:49:14,086][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000114382_468508672.pth [2025-01-04 05:49:15,521][134294] Updated weights for policy 0, policy_version 115234 (0.0025) [2025-01-04 05:49:18,405][134294] Updated weights for policy 0, policy_version 115244 (0.0022) [2025-01-04 05:49:18,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14609.1, 300 sec: 14704.0). Total num frames: 472047616. Throughput: 0: 3661.5. Samples: 107174954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:18,968][134211] Avg episode reward: [(0, '8.972')] [2025-01-04 05:49:20,366][134294] Updated weights for policy 0, policy_version 115254 (0.0013) [2025-01-04 05:49:22,409][134294] Updated weights for policy 0, policy_version 115264 (0.0015) [2025-01-04 05:49:23,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 472137728. Throughput: 0: 3669.1. Samples: 107202454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:23,968][134211] Avg episode reward: [(0, '7.686')] [2025-01-04 05:49:25,775][134294] Updated weights for policy 0, policy_version 115274 (0.0028) [2025-01-04 05:49:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14950.4, 300 sec: 14592.9). Total num frames: 472199168. Throughput: 0: 3661.4. Samples: 107221048. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:28,968][134211] Avg episode reward: [(0, '7.840')] [2025-01-04 05:49:29,161][134294] Updated weights for policy 0, policy_version 115284 (0.0027) [2025-01-04 05:49:32,479][134294] Updated weights for policy 0, policy_version 115294 (0.0028) [2025-01-04 05:49:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14335.9, 300 sec: 14523.4). Total num frames: 472264704. Throughput: 0: 3650.6. Samples: 107230276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:33,968][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 05:49:34,818][134294] Updated weights for policy 0, policy_version 115304 (0.0017) [2025-01-04 05:49:37,312][134294] Updated weights for policy 0, policy_version 115314 (0.0020) [2025-01-04 05:49:38,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 472346624. Throughput: 0: 3738.7. Samples: 107254304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:38,969][134211] Avg episode reward: [(0, '8.794')] [2025-01-04 05:49:40,328][134294] Updated weights for policy 0, policy_version 115324 (0.0027) [2025-01-04 05:49:43,270][134294] Updated weights for policy 0, policy_version 115334 (0.0026) [2025-01-04 05:49:43,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14540.8, 300 sec: 14579.0). Total num frames: 472412160. Throughput: 0: 3713.0. Samples: 107274796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:43,968][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 05:49:46,315][134294] Updated weights for policy 0, policy_version 115344 (0.0026) [2025-01-04 05:49:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14606.7). Total num frames: 472481792. Throughput: 0: 3624.4. Samples: 107284958. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:48,968][134211] Avg episode reward: [(0, '7.512')] [2025-01-04 05:49:49,438][134294] Updated weights for policy 0, policy_version 115354 (0.0026) [2025-01-04 05:49:52,253][134294] Updated weights for policy 0, policy_version 115364 (0.0021) [2025-01-04 05:49:53,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14882.1, 300 sec: 14676.2). Total num frames: 472563712. Throughput: 0: 3650.3. Samples: 107305666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:53,968][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 05:49:54,255][134294] Updated weights for policy 0, policy_version 115374 (0.0013) [2025-01-04 05:49:56,344][134294] Updated weights for policy 0, policy_version 115384 (0.0013) [2025-01-04 05:49:58,477][134294] Updated weights for policy 0, policy_version 115394 (0.0013) [2025-01-04 05:49:58,968][134211] Fps is (10 sec: 17612.9, 60 sec: 15086.9, 300 sec: 14787.3). Total num frames: 472657920. Throughput: 0: 3782.0. Samples: 107335048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:49:58,968][134211] Avg episode reward: [(0, '8.042')] [2025-01-04 05:50:01,958][134294] Updated weights for policy 0, policy_version 115404 (0.0027) [2025-01-04 05:50:03,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14813.9, 300 sec: 14620.6). Total num frames: 472715264. Throughput: 0: 3771.0. Samples: 107344648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:50:03,968][134211] Avg episode reward: [(0, '8.682')] [2025-01-04 05:50:05,414][134294] Updated weights for policy 0, policy_version 115414 (0.0028) [2025-01-04 05:50:08,456][134294] Updated weights for policy 0, policy_version 115424 (0.0027) [2025-01-04 05:50:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14813.9, 300 sec: 14481.8). Total num frames: 472780800. Throughput: 0: 3572.4. Samples: 107363210. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:08,968][134211] Avg episode reward: [(0, '8.332')] [2025-01-04 05:50:11,664][134294] Updated weights for policy 0, policy_version 115434 (0.0028) [2025-01-04 05:50:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14481.8). Total num frames: 472846336. Throughput: 0: 3587.0. Samples: 107382464. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:13,968][134211] Avg episode reward: [(0, '8.289')] [2025-01-04 05:50:14,832][134294] Updated weights for policy 0, policy_version 115444 (0.0025) [2025-01-04 05:50:17,913][134294] Updated weights for policy 0, policy_version 115454 (0.0025) [2025-01-04 05:50:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.2, 300 sec: 14495.7). Total num frames: 472911872. Throughput: 0: 3598.6. Samples: 107392212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:18,968][134211] Avg episode reward: [(0, '8.798')] [2025-01-04 05:50:21,030][134294] Updated weights for policy 0, policy_version 115464 (0.0027) [2025-01-04 05:50:23,201][134294] Updated weights for policy 0, policy_version 115474 (0.0016) [2025-01-04 05:50:23,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14267.8, 300 sec: 14565.1). Total num frames: 472993792. Throughput: 0: 3533.5. Samples: 107413312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:23,968][134211] Avg episode reward: [(0, '8.227')] [2025-01-04 05:50:25,393][134294] Updated weights for policy 0, policy_version 115484 (0.0016) [2025-01-04 05:50:28,572][134294] Updated weights for policy 0, policy_version 115494 (0.0024) [2025-01-04 05:50:28,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14404.2, 300 sec: 14592.9). Total num frames: 473063424. Throughput: 0: 3624.1. Samples: 107437882. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:28,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 05:50:31,804][134294] Updated weights for policy 0, policy_version 115504 (0.0026) [2025-01-04 05:50:33,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14609.1, 300 sec: 14620.6). Total num frames: 473141248. Throughput: 0: 3607.4. Samples: 107447290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:33,968][134211] Avg episode reward: [(0, '8.124')] [2025-01-04 05:50:34,173][134294] Updated weights for policy 0, policy_version 115514 (0.0015) [2025-01-04 05:50:36,108][134294] Updated weights for policy 0, policy_version 115524 (0.0012) [2025-01-04 05:50:38,027][134294] Updated weights for policy 0, policy_version 115534 (0.0014) [2025-01-04 05:50:38,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14950.5, 300 sec: 14731.7). Total num frames: 473243648. Throughput: 0: 3775.0. Samples: 107475542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:38,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 05:50:40,156][134294] Updated weights for policy 0, policy_version 115544 (0.0015) [2025-01-04 05:50:43,157][134294] Updated weights for policy 0, policy_version 115554 (0.0030) [2025-01-04 05:50:43,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15087.0, 300 sec: 14745.6). Total num frames: 473317376. Throughput: 0: 3674.2. Samples: 107500388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:43,968][134211] Avg episode reward: [(0, '8.084')] [2025-01-04 05:50:46,392][134294] Updated weights for policy 0, policy_version 115564 (0.0028) [2025-01-04 05:50:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 473382912. Throughput: 0: 3675.2. Samples: 107510032. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:48,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 05:50:49,556][134294] Updated weights for policy 0, policy_version 115574 (0.0031) [2025-01-04 05:50:52,909][134294] Updated weights for policy 0, policy_version 115584 (0.0027) [2025-01-04 05:50:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14677.3, 300 sec: 14648.4). Total num frames: 473444352. Throughput: 0: 3686.6. Samples: 107529108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:53,968][134211] Avg episode reward: [(0, '7.537')] [2025-01-04 05:50:56,207][134294] Updated weights for policy 0, policy_version 115594 (0.0027) [2025-01-04 05:50:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14131.2, 300 sec: 14579.0). Total num frames: 473505792. Throughput: 0: 3666.3. Samples: 107547448. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:50:58,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 05:50:59,568][134294] Updated weights for policy 0, policy_version 115604 (0.0026) [2025-01-04 05:51:02,710][134294] Updated weights for policy 0, policy_version 115614 (0.0029) [2025-01-04 05:51:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14199.5, 300 sec: 14565.1). Total num frames: 473567232. Throughput: 0: 3662.1. Samples: 107557006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:51:03,968][134211] Avg episode reward: [(0, '8.587')] [2025-01-04 05:51:05,873][134294] Updated weights for policy 0, policy_version 115624 (0.0025) [2025-01-04 05:51:08,620][134294] Updated weights for policy 0, policy_version 115634 (0.0022) [2025-01-04 05:51:08,967][134211] Fps is (10 sec: 13517.1, 60 sec: 14336.1, 300 sec: 14606.8). Total num frames: 473640960. Throughput: 0: 3631.2. Samples: 107576716. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:51:08,968][134211] Avg episode reward: [(0, '8.189')] [2025-01-04 05:51:10,493][134294] Updated weights for policy 0, policy_version 115644 (0.0013) [2025-01-04 05:51:12,436][134294] Updated weights for policy 0, policy_version 115654 (0.0013) [2025-01-04 05:51:13,968][134211] Fps is (10 sec: 18022.5, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 473747456. Throughput: 0: 3751.1. Samples: 107606680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:13,968][134211] Avg episode reward: [(0, '9.020')] [2025-01-04 05:51:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000115661_473747456.pth... [2025-01-04 05:51:14,021][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000114809_470257664.pth [2025-01-04 05:51:14,539][134294] Updated weights for policy 0, policy_version 115664 (0.0013) [2025-01-04 05:51:16,877][134294] Updated weights for policy 0, policy_version 115674 (0.0021) [2025-01-04 05:51:18,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15223.5, 300 sec: 14745.6). Total num frames: 473825280. Throughput: 0: 3861.2. Samples: 107621044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:18,968][134211] Avg episode reward: [(0, '7.632')] [2025-01-04 05:51:20,212][134294] Updated weights for policy 0, policy_version 115684 (0.0026) [2025-01-04 05:51:23,667][134294] Updated weights for policy 0, policy_version 115694 (0.0029) [2025-01-04 05:51:23,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14813.8, 300 sec: 14592.9). Total num frames: 473882624. Throughput: 0: 3642.6. Samples: 107639458. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:23,968][134211] Avg episode reward: [(0, '8.286')] [2025-01-04 05:51:27,046][134294] Updated weights for policy 0, policy_version 115704 (0.0028) [2025-01-04 05:51:28,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 473944064. Throughput: 0: 3486.7. Samples: 107657290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:28,968][134211] Avg episode reward: [(0, '7.904')] [2025-01-04 05:51:30,439][134294] Updated weights for policy 0, policy_version 115714 (0.0030) [2025-01-04 05:51:32,494][134294] Updated weights for policy 0, policy_version 115724 (0.0013) [2025-01-04 05:51:33,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 474034176. Throughput: 0: 3504.4. Samples: 107667730. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:33,968][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 05:51:34,656][134294] Updated weights for policy 0, policy_version 115734 (0.0016) [2025-01-04 05:51:37,642][134294] Updated weights for policy 0, policy_version 115744 (0.0028) [2025-01-04 05:51:38,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14336.0, 300 sec: 14676.2). Total num frames: 474103808. Throughput: 0: 3655.4. Samples: 107693600. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:38,968][134211] Avg episode reward: [(0, '8.153')] [2025-01-04 05:51:40,688][134294] Updated weights for policy 0, policy_version 115754 (0.0027) [2025-01-04 05:51:43,762][134294] Updated weights for policy 0, policy_version 115764 (0.0024) [2025-01-04 05:51:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.4, 300 sec: 14676.2). Total num frames: 474169344. Throughput: 0: 3694.0. Samples: 107713680. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:43,968][134211] Avg episode reward: [(0, '8.221')] [2025-01-04 05:51:46,772][134294] Updated weights for policy 0, policy_version 115774 (0.0026) [2025-01-04 05:51:48,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14267.7, 300 sec: 14634.5). Total num frames: 474238976. Throughput: 0: 3704.6. Samples: 107723714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:48,969][134211] Avg episode reward: [(0, '8.007')] [2025-01-04 05:51:49,995][134294] Updated weights for policy 0, policy_version 115784 (0.0029) [2025-01-04 05:51:52,970][134294] Updated weights for policy 0, policy_version 115794 (0.0019) [2025-01-04 05:51:53,967][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 474308608. Throughput: 0: 3685.2. Samples: 107742550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:53,968][134211] Avg episode reward: [(0, '8.255')] [2025-01-04 05:51:55,161][134294] Updated weights for policy 0, policy_version 115804 (0.0012) [2025-01-04 05:51:57,283][134294] Updated weights for policy 0, policy_version 115814 (0.0014) [2025-01-04 05:51:58,967][134211] Fps is (10 sec: 16384.7, 60 sec: 14950.4, 300 sec: 14676.2). Total num frames: 474402816. Throughput: 0: 3644.9. Samples: 107770700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:51:58,968][134211] Avg episode reward: [(0, '8.312')] [2025-01-04 05:51:59,526][134294] Updated weights for policy 0, policy_version 115824 (0.0012) [2025-01-04 05:52:02,639][134294] Updated weights for policy 0, policy_version 115834 (0.0026) [2025-01-04 05:52:03,968][134211] Fps is (10 sec: 15973.8, 60 sec: 15018.6, 300 sec: 14537.3). Total num frames: 474468352. Throughput: 0: 3606.2. Samples: 107783326. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:52:03,968][134211] Avg episode reward: [(0, '8.546')] [2025-01-04 05:52:06,023][134294] Updated weights for policy 0, policy_version 115844 (0.0027) [2025-01-04 05:52:08,968][134211] Fps is (10 sec: 12697.1, 60 sec: 14813.8, 300 sec: 14481.8). Total num frames: 474529792. Throughput: 0: 3588.3. Samples: 107800934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:52:08,969][134211] Avg episode reward: [(0, '7.628')] [2025-01-04 05:52:09,332][134294] Updated weights for policy 0, policy_version 115854 (0.0025) [2025-01-04 05:52:12,454][134294] Updated weights for policy 0, policy_version 115864 (0.0025) [2025-01-04 05:52:13,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14131.2, 300 sec: 14495.7). Total num frames: 474595328. Throughput: 0: 3621.5. Samples: 107820256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:13,968][134211] Avg episode reward: [(0, '8.133')] [2025-01-04 05:52:15,534][134294] Updated weights for policy 0, policy_version 115874 (0.0028) [2025-01-04 05:52:18,380][134294] Updated weights for policy 0, policy_version 115884 (0.0026) [2025-01-04 05:52:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13994.6, 300 sec: 14509.6). Total num frames: 474664960. Throughput: 0: 3619.7. Samples: 107830618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:18,968][134211] Avg episode reward: [(0, '8.314')] [2025-01-04 05:52:21,427][134294] Updated weights for policy 0, policy_version 115894 (0.0027) [2025-01-04 05:52:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14537.3). Total num frames: 474730496. Throughput: 0: 3499.2. Samples: 107851064. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:23,968][134211] Avg episode reward: [(0, '7.962')] [2025-01-04 05:52:24,761][134294] Updated weights for policy 0, policy_version 115904 (0.0027) [2025-01-04 05:52:26,772][134294] Updated weights for policy 0, policy_version 115914 (0.0014) [2025-01-04 05:52:28,751][134294] Updated weights for policy 0, policy_version 115924 (0.0013) [2025-01-04 05:52:28,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14745.7, 300 sec: 14648.4). Total num frames: 474828800. Throughput: 0: 3616.5. Samples: 107876420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:28,968][134211] Avg episode reward: [(0, '8.125')] [2025-01-04 05:52:30,698][134294] Updated weights for policy 0, policy_version 115934 (0.0013) [2025-01-04 05:52:32,861][134294] Updated weights for policy 0, policy_version 115944 (0.0015) [2025-01-04 05:52:33,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14745.5, 300 sec: 14717.8). Total num frames: 474918912. Throughput: 0: 3746.9. Samples: 107892324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:33,969][134211] Avg episode reward: [(0, '8.255')] [2025-01-04 05:52:36,376][134294] Updated weights for policy 0, policy_version 115954 (0.0032) [2025-01-04 05:52:38,968][134211] Fps is (10 sec: 14744.4, 60 sec: 14540.6, 300 sec: 14620.6). Total num frames: 474976256. Throughput: 0: 3774.9. Samples: 107912422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:38,969][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 05:52:39,786][134294] Updated weights for policy 0, policy_version 115964 (0.0028) [2025-01-04 05:52:42,994][134294] Updated weights for policy 0, policy_version 115974 (0.0025) [2025-01-04 05:52:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 475041792. Throughput: 0: 3568.3. Samples: 107931276. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:43,969][134211] Avg episode reward: [(0, '7.812')] [2025-01-04 05:52:45,942][134294] Updated weights for policy 0, policy_version 115984 (0.0025) [2025-01-04 05:52:48,942][134294] Updated weights for policy 0, policy_version 115994 (0.0023) [2025-01-04 05:52:48,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 475111424. Throughput: 0: 3516.8. Samples: 107941580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:48,968][134211] Avg episode reward: [(0, '8.705')] [2025-01-04 05:52:52,014][134294] Updated weights for policy 0, policy_version 116004 (0.0024) [2025-01-04 05:52:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.2, 300 sec: 14523.6). Total num frames: 475172864. Throughput: 0: 3576.1. Samples: 107961858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:53,968][134211] Avg episode reward: [(0, '7.832')] [2025-01-04 05:52:55,214][134294] Updated weights for policy 0, policy_version 116014 (0.0027) [2025-01-04 05:52:57,707][134294] Updated weights for policy 0, policy_version 116024 (0.0016) [2025-01-04 05:52:58,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14267.7, 300 sec: 14509.6). Total num frames: 475258880. Throughput: 0: 3644.3. Samples: 107984248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:52:58,968][134211] Avg episode reward: [(0, '9.105')] [2025-01-04 05:52:59,682][134294] Updated weights for policy 0, policy_version 116034 (0.0015) [2025-01-04 05:53:01,616][134294] Updated weights for policy 0, policy_version 116044 (0.0016) [2025-01-04 05:53:03,569][134294] Updated weights for policy 0, policy_version 116054 (0.0012) [2025-01-04 05:53:03,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14882.2, 300 sec: 14648.4). Total num frames: 475361280. Throughput: 0: 3764.9. Samples: 108000036. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:53:03,968][134211] Avg episode reward: [(0, '8.136')] [2025-01-04 05:53:05,689][134294] Updated weights for policy 0, policy_version 116064 (0.0015) [2025-01-04 05:53:08,737][134294] Updated weights for policy 0, policy_version 116074 (0.0028) [2025-01-04 05:53:08,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15155.2, 300 sec: 14690.1). Total num frames: 475439104. Throughput: 0: 3932.7. Samples: 108028034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 05:53:08,969][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 05:53:11,757][134294] Updated weights for policy 0, policy_version 116084 (0.0024) [2025-01-04 05:53:13,968][134211] Fps is (10 sec: 14335.1, 60 sec: 15155.1, 300 sec: 14690.0). Total num frames: 475504640. Throughput: 0: 3804.1. Samples: 108047608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:13,969][134211] Avg episode reward: [(0, '7.627')] [2025-01-04 05:53:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116090_475504640.pth... [2025-01-04 05:53:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000115229_471977984.pth [2025-01-04 05:53:15,111][134294] Updated weights for policy 0, policy_version 116094 (0.0030) [2025-01-04 05:53:18,093][134294] Updated weights for policy 0, policy_version 116104 (0.0025) [2025-01-04 05:53:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15086.9, 300 sec: 14690.1). Total num frames: 475570176. Throughput: 0: 3661.1. Samples: 108057072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:18,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 05:53:21,219][134294] Updated weights for policy 0, policy_version 116114 (0.0024) [2025-01-04 05:53:23,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15086.9, 300 sec: 14690.1). Total num frames: 475635712. 
Throughput: 0: 3651.3. Samples: 108076728. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:23,968][134211] Avg episode reward: [(0, '8.532')] [2025-01-04 05:53:24,631][134294] Updated weights for policy 0, policy_version 116124 (0.0027) [2025-01-04 05:53:27,934][134294] Updated weights for policy 0, policy_version 116134 (0.0026) [2025-01-04 05:53:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.5, 300 sec: 14551.2). Total num frames: 475697152. Throughput: 0: 3641.4. Samples: 108095136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:28,968][134211] Avg episode reward: [(0, '7.543')] [2025-01-04 05:53:31,110][134294] Updated weights for policy 0, policy_version 116144 (0.0026) [2025-01-04 05:53:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14063.0, 300 sec: 14523.4). Total num frames: 475762688. Throughput: 0: 3629.0. Samples: 108104886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:33,968][134211] Avg episode reward: [(0, '7.841')] [2025-01-04 05:53:34,378][134294] Updated weights for policy 0, policy_version 116154 (0.0023) [2025-01-04 05:53:37,385][134294] Updated weights for policy 0, policy_version 116164 (0.0025) [2025-01-04 05:53:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14267.9, 300 sec: 14551.2). Total num frames: 475832320. Throughput: 0: 3613.5. Samples: 108124466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:38,968][134211] Avg episode reward: [(0, '8.554')] [2025-01-04 05:53:39,675][134294] Updated weights for policy 0, policy_version 116174 (0.0017) [2025-01-04 05:53:42,043][134294] Updated weights for policy 0, policy_version 116184 (0.0020) [2025-01-04 05:53:43,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14540.8, 300 sec: 14606.8). Total num frames: 475914240. Throughput: 0: 3669.5. Samples: 108149378. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:43,968][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 05:53:45,046][134294] Updated weights for policy 0, policy_version 116194 (0.0025) [2025-01-04 05:53:48,227][134294] Updated weights for policy 0, policy_version 116204 (0.0025) [2025-01-04 05:53:48,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14472.5, 300 sec: 14606.7). Total num frames: 475979776. Throughput: 0: 3547.2. Samples: 108159660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:48,969][134211] Avg episode reward: [(0, '8.355')] [2025-01-04 05:53:50,689][134294] Updated weights for policy 0, policy_version 116214 (0.0016) [2025-01-04 05:53:52,624][134294] Updated weights for policy 0, policy_version 116224 (0.0014) [2025-01-04 05:53:53,967][134211] Fps is (10 sec: 16384.4, 60 sec: 15087.0, 300 sec: 14662.3). Total num frames: 476078080. Throughput: 0: 3473.0. Samples: 108184318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:53,968][134211] Avg episode reward: [(0, '8.373')] [2025-01-04 05:53:54,693][134294] Updated weights for policy 0, policy_version 116234 (0.0014) [2025-01-04 05:53:56,846][134294] Updated weights for policy 0, policy_version 116244 (0.0014) [2025-01-04 05:53:58,968][134211] Fps is (10 sec: 17613.0, 60 sec: 14950.3, 300 sec: 14676.2). Total num frames: 476155904. Throughput: 0: 3629.5. Samples: 108210936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:53:58,969][134211] Avg episode reward: [(0, '8.078')] [2025-01-04 05:54:00,757][134294] Updated weights for policy 0, policy_version 116254 (0.0031) [2025-01-04 05:54:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14131.1, 300 sec: 14634.5). Total num frames: 476209152. Throughput: 0: 3585.3. Samples: 108218412. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:54:03,968][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 05:54:04,514][134294] Updated weights for policy 0, policy_version 116264 (0.0030) [2025-01-04 05:54:07,099][134294] Updated weights for policy 0, policy_version 116274 (0.0017) [2025-01-04 05:54:08,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14267.8, 300 sec: 14648.4). Total num frames: 476295168. Throughput: 0: 3590.5. Samples: 108238298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:54:08,968][134211] Avg episode reward: [(0, '9.153')] [2025-01-04 05:54:09,005][134294] Updated weights for policy 0, policy_version 116284 (0.0014) [2025-01-04 05:54:10,844][134294] Updated weights for policy 0, policy_version 116294 (0.0012) [2025-01-04 05:54:12,832][134294] Updated weights for policy 0, policy_version 116304 (0.0013) [2025-01-04 05:54:13,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14882.3, 300 sec: 14745.6). Total num frames: 476397568. Throughput: 0: 3888.4. Samples: 108270116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:13,968][134211] Avg episode reward: [(0, '7.804')] [2025-01-04 05:54:15,529][134294] Updated weights for policy 0, policy_version 116314 (0.0025) [2025-01-04 05:54:18,690][134294] Updated weights for policy 0, policy_version 116324 (0.0028) [2025-01-04 05:54:18,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 476463104. Throughput: 0: 3907.5. Samples: 108280726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:18,968][134211] Avg episode reward: [(0, '7.597')] [2025-01-04 05:54:22,069][134294] Updated weights for policy 0, policy_version 116334 (0.0030) [2025-01-04 05:54:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14813.9, 300 sec: 14662.3). Total num frames: 476524544. Throughput: 0: 3887.3. Samples: 108299396. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:23,968][134211] Avg episode reward: [(0, '7.532')] [2025-01-04 05:54:25,445][134294] Updated weights for policy 0, policy_version 116344 (0.0025) [2025-01-04 05:54:28,699][134294] Updated weights for policy 0, policy_version 116354 (0.0026) [2025-01-04 05:54:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14813.9, 300 sec: 14648.4). Total num frames: 476585984. Throughput: 0: 3750.5. Samples: 108318152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:28,968][134211] Avg episode reward: [(0, '8.478')] [2025-01-04 05:54:31,738][134294] Updated weights for policy 0, policy_version 116364 (0.0027) [2025-01-04 05:54:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14882.1, 300 sec: 14606.8). Total num frames: 476655616. Throughput: 0: 3737.1. Samples: 108327828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:33,968][134211] Avg episode reward: [(0, '7.689')] [2025-01-04 05:54:34,916][134294] Updated weights for policy 0, policy_version 116374 (0.0024) [2025-01-04 05:54:38,093][134294] Updated weights for policy 0, policy_version 116384 (0.0026) [2025-01-04 05:54:38,968][134211] Fps is (10 sec: 13106.3, 60 sec: 14745.5, 300 sec: 14592.8). Total num frames: 476717056. Throughput: 0: 3623.4. Samples: 108347376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:38,969][134211] Avg episode reward: [(0, '8.662')] [2025-01-04 05:54:41,287][134294] Updated weights for policy 0, policy_version 116394 (0.0025) [2025-01-04 05:54:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14540.8, 300 sec: 14592.9). Total num frames: 476786688. Throughput: 0: 3468.6. Samples: 108367024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:43,968][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 05:54:44,314][134294] Updated weights for policy 0, policy_version 116404 (0.0027) [2025-01-04 05:54:47,442][134294] Updated weights for policy 0, policy_version 116414 (0.0023) [2025-01-04 05:54:48,967][134211] Fps is (10 sec: 13927.6, 60 sec: 14609.2, 300 sec: 14551.2). Total num frames: 476856320. Throughput: 0: 3524.9. Samples: 108377032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:48,968][134211] Avg episode reward: [(0, '7.777')] [2025-01-04 05:54:49,584][134294] Updated weights for policy 0, policy_version 116424 (0.0013) [2025-01-04 05:54:51,490][134294] Updated weights for policy 0, policy_version 116434 (0.0014) [2025-01-04 05:54:53,412][134294] Updated weights for policy 0, policy_version 116444 (0.0015) [2025-01-04 05:54:53,967][134211] Fps is (10 sec: 17613.4, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 476962816. Throughput: 0: 3708.2. Samples: 108405166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:53,968][134211] Avg episode reward: [(0, '8.731')] [2025-01-04 05:54:55,348][134294] Updated weights for policy 0, policy_version 116454 (0.0013) [2025-01-04 05:54:57,382][134294] Updated weights for policy 0, policy_version 116464 (0.0014) [2025-01-04 05:54:58,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15155.2, 300 sec: 14745.6). Total num frames: 477065216. Throughput: 0: 3692.0. Samples: 108436256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:54:58,968][134211] Avg episode reward: [(0, '8.531')] [2025-01-04 05:54:59,830][134294] Updated weights for policy 0, policy_version 116474 (0.0021) [2025-01-04 05:55:03,194][134294] Updated weights for policy 0, policy_version 116484 (0.0025) [2025-01-04 05:55:03,968][134211] Fps is (10 sec: 16383.4, 60 sec: 15291.7, 300 sec: 14731.7). Total num frames: 477126656. Throughput: 0: 3691.9. Samples: 108446860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:03,969][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 05:55:06,555][134294] Updated weights for policy 0, policy_version 116494 (0.0029) [2025-01-04 05:55:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 477188096. Throughput: 0: 3679.3. Samples: 108464966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:08,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 05:55:09,717][134294] Updated weights for policy 0, policy_version 116504 (0.0029) [2025-01-04 05:55:12,884][134294] Updated weights for policy 0, policy_version 116514 (0.0028) [2025-01-04 05:55:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14717.8). Total num frames: 477253632. Throughput: 0: 3695.7. Samples: 108484458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:13,969][134211] Avg episode reward: [(0, '7.512')] [2025-01-04 05:55:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116517_477253632.pth... [2025-01-04 05:55:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000115661_473747456.pth [2025-01-04 05:55:15,954][134294] Updated weights for policy 0, policy_version 116524 (0.0026) [2025-01-04 05:55:18,911][134294] Updated weights for policy 0, policy_version 116534 (0.0027) [2025-01-04 05:55:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14676.2). Total num frames: 477323264. Throughput: 0: 3705.0. Samples: 108494554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:18,968][134211] Avg episode reward: [(0, '7.982')] [2025-01-04 05:55:21,864][134294] Updated weights for policy 0, policy_version 116544 (0.0025) [2025-01-04 05:55:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14662.3). Total num frames: 477388800. 
Throughput: 0: 3732.4. Samples: 108515330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:23,968][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 05:55:24,928][134294] Updated weights for policy 0, policy_version 116554 (0.0028) [2025-01-04 05:55:28,275][134294] Updated weights for policy 0, policy_version 116564 (0.0026) [2025-01-04 05:55:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 14606.7). Total num frames: 477450240. Throughput: 0: 3717.7. Samples: 108534322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:28,968][134211] Avg episode reward: [(0, '7.674')] [2025-01-04 05:55:31,439][134294] Updated weights for policy 0, policy_version 116574 (0.0024) [2025-01-04 05:55:33,969][134211] Fps is (10 sec: 12696.7, 60 sec: 14335.8, 300 sec: 14481.7). Total num frames: 477515776. Throughput: 0: 3711.7. Samples: 108544064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:33,969][134211] Avg episode reward: [(0, '8.261')] [2025-01-04 05:55:34,735][134294] Updated weights for policy 0, policy_version 116584 (0.0027) [2025-01-04 05:55:37,046][134294] Updated weights for policy 0, policy_version 116594 (0.0015) [2025-01-04 05:55:38,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14677.5, 300 sec: 14509.6). Total num frames: 477597696. Throughput: 0: 3580.3. Samples: 108566280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:38,968][134211] Avg episode reward: [(0, '8.190')] [2025-01-04 05:55:39,655][134294] Updated weights for policy 0, policy_version 116604 (0.0023) [2025-01-04 05:55:42,553][134294] Updated weights for policy 0, policy_version 116614 (0.0027) [2025-01-04 05:55:43,968][134211] Fps is (10 sec: 15156.3, 60 sec: 14677.3, 300 sec: 14523.4). Total num frames: 477667328. Throughput: 0: 3370.3. Samples: 108587920. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:43,968][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 05:55:45,559][134294] Updated weights for policy 0, policy_version 116624 (0.0024) [2025-01-04 05:55:47,516][134294] Updated weights for policy 0, policy_version 116634 (0.0014) [2025-01-04 05:55:48,967][134211] Fps is (10 sec: 16384.3, 60 sec: 15086.9, 300 sec: 14634.5). Total num frames: 477761536. Throughput: 0: 3385.4. Samples: 108599202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:48,968][134211] Avg episode reward: [(0, '8.251')] [2025-01-04 05:55:49,395][134294] Updated weights for policy 0, policy_version 116644 (0.0014) [2025-01-04 05:55:51,249][134294] Updated weights for policy 0, policy_version 116654 (0.0013) [2025-01-04 05:55:53,280][134294] Updated weights for policy 0, policy_version 116664 (0.0013) [2025-01-04 05:55:53,968][134211] Fps is (10 sec: 19660.9, 60 sec: 15018.6, 300 sec: 14773.4). Total num frames: 477863936. Throughput: 0: 3702.4. Samples: 108631574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:53,968][134211] Avg episode reward: [(0, '7.880')] [2025-01-04 05:55:56,556][134294] Updated weights for policy 0, policy_version 116674 (0.0023) [2025-01-04 05:55:58,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14267.7, 300 sec: 14759.5). Total num frames: 477921280. Throughput: 0: 3710.1. Samples: 108651414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:55:58,968][134211] Avg episode reward: [(0, '8.640')] [2025-01-04 05:56:00,435][134294] Updated weights for policy 0, policy_version 116684 (0.0029) [2025-01-04 05:56:03,968][134211] Fps is (10 sec: 11059.3, 60 sec: 14131.2, 300 sec: 14690.1). Total num frames: 477974528. Throughput: 0: 3663.2. Samples: 108659400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:56:03,968][134211] Avg episode reward: [(0, '7.687')] [2025-01-04 05:56:04,194][134294] Updated weights for policy 0, policy_version 116694 (0.0024) [2025-01-04 05:56:06,454][134294] Updated weights for policy 0, policy_version 116704 (0.0016) [2025-01-04 05:56:08,390][134294] Updated weights for policy 0, policy_version 116714 (0.0014) [2025-01-04 05:56:08,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14677.4, 300 sec: 14648.4). Total num frames: 478068736. Throughput: 0: 3706.5. Samples: 108682120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:56:08,968][134211] Avg episode reward: [(0, '8.278')] [2025-01-04 05:56:11,271][134294] Updated weights for policy 0, policy_version 116724 (0.0024) [2025-01-04 05:56:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14677.3, 300 sec: 14606.7). Total num frames: 478134272. Throughput: 0: 3778.5. Samples: 108704356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 05:56:13,968][134211] Avg episode reward: [(0, '8.649')] [2025-01-04 05:56:14,617][134294] Updated weights for policy 0, policy_version 116734 (0.0024) [2025-01-04 05:56:17,717][134294] Updated weights for policy 0, policy_version 116744 (0.0024) [2025-01-04 05:56:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 478195712. Throughput: 0: 3768.5. Samples: 108713644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:18,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 05:56:20,739][134294] Updated weights for policy 0, policy_version 116754 (0.0025) [2025-01-04 05:56:23,619][134294] Updated weights for policy 0, policy_version 116764 (0.0026) [2025-01-04 05:56:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 478269440. Throughput: 0: 3736.4. Samples: 108734420. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:23,969][134211] Avg episode reward: [(0, '8.397')] [2025-01-04 05:56:26,667][134294] Updated weights for policy 0, policy_version 116774 (0.0026) [2025-01-04 05:56:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 478330880. Throughput: 0: 3692.5. Samples: 108754082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:28,969][134211] Avg episode reward: [(0, '8.292')] [2025-01-04 05:56:30,075][134294] Updated weights for policy 0, policy_version 116784 (0.0026) [2025-01-04 05:56:32,220][134294] Updated weights for policy 0, policy_version 116794 (0.0014) [2025-01-04 05:56:33,967][134211] Fps is (10 sec: 15155.7, 60 sec: 15087.2, 300 sec: 14634.5). Total num frames: 478420992. Throughput: 0: 3676.3. Samples: 108764634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:33,968][134211] Avg episode reward: [(0, '7.891')] [2025-01-04 05:56:34,169][134294] Updated weights for policy 0, policy_version 116804 (0.0013) [2025-01-04 05:56:36,172][134294] Updated weights for policy 0, policy_version 116814 (0.0012) [2025-01-04 05:56:38,968][134211] Fps is (10 sec: 17612.9, 60 sec: 15155.2, 300 sec: 14703.9). Total num frames: 478507008. Throughput: 0: 3632.1. Samples: 108795018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:38,969][134211] Avg episode reward: [(0, '8.466')] [2025-01-04 05:56:39,156][134294] Updated weights for policy 0, policy_version 116824 (0.0022) [2025-01-04 05:56:42,673][134294] Updated weights for policy 0, policy_version 116834 (0.0028) [2025-01-04 05:56:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 478564352. Throughput: 0: 3585.8. Samples: 108812776. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:43,968][134211] Avg episode reward: [(0, '8.335')] [2025-01-04 05:56:45,934][134294] Updated weights for policy 0, policy_version 116844 (0.0029) [2025-01-04 05:56:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 478629888. Throughput: 0: 3619.4. Samples: 108822274. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:48,968][134211] Avg episode reward: [(0, '8.295')] [2025-01-04 05:56:49,158][134294] Updated weights for policy 0, policy_version 116854 (0.0024) [2025-01-04 05:56:51,130][134294] Updated weights for policy 0, policy_version 116864 (0.0012) [2025-01-04 05:56:53,229][134294] Updated weights for policy 0, policy_version 116874 (0.0016) [2025-01-04 05:56:53,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14404.3, 300 sec: 14662.3). Total num frames: 478728192. Throughput: 0: 3661.3. Samples: 108846880. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:53,968][134211] Avg episode reward: [(0, '7.724')] [2025-01-04 05:56:55,299][134294] Updated weights for policy 0, policy_version 116884 (0.0013) [2025-01-04 05:56:57,329][134294] Updated weights for policy 0, policy_version 116894 (0.0015) [2025-01-04 05:56:58,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 478822400. Throughput: 0: 3820.6. Samples: 108876284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:56:58,968][134211] Avg episode reward: [(0, '8.711')] [2025-01-04 05:57:00,352][134294] Updated weights for policy 0, policy_version 116904 (0.0025) [2025-01-04 05:57:03,539][134294] Updated weights for policy 0, policy_version 116914 (0.0028) [2025-01-04 05:57:03,968][134211] Fps is (10 sec: 15564.6, 60 sec: 15155.2, 300 sec: 14759.5). Total num frames: 478883840. Throughput: 0: 3820.5. Samples: 108885566. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:03,969][134211] Avg episode reward: [(0, '9.007')] [2025-01-04 05:57:06,883][134294] Updated weights for policy 0, policy_version 116924 (0.0024) [2025-01-04 05:57:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14609.0, 300 sec: 14745.6). Total num frames: 478945280. Throughput: 0: 3776.0. Samples: 108904340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:08,968][134211] Avg episode reward: [(0, '8.937')] [2025-01-04 05:57:10,217][134294] Updated weights for policy 0, policy_version 116934 (0.0028) [2025-01-04 05:57:13,386][134294] Updated weights for policy 0, policy_version 116944 (0.0023) [2025-01-04 05:57:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 14717.8). Total num frames: 479006720. Throughput: 0: 3758.2. Samples: 108923202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:13,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 05:57:14,045][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116946_479010816.pth... [2025-01-04 05:57:14,113][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116090_475504640.pth [2025-01-04 05:57:16,436][134294] Updated weights for policy 0, policy_version 116954 (0.0025) [2025-01-04 05:57:18,970][134211] Fps is (10 sec: 13104.4, 60 sec: 14676.8, 300 sec: 14731.6). Total num frames: 479076352. Throughput: 0: 3743.7. Samples: 108933108. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:18,971][134211] Avg episode reward: [(0, '8.709')] [2025-01-04 05:57:19,616][134294] Updated weights for policy 0, policy_version 116964 (0.0025) [2025-01-04 05:57:22,691][134294] Updated weights for policy 0, policy_version 116974 (0.0028) [2025-01-04 05:57:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14472.6, 300 sec: 14606.7). Total num frames: 479137792. 
Throughput: 0: 3506.3. Samples: 108952802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:23,968][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 05:57:25,971][134294] Updated weights for policy 0, policy_version 116984 (0.0026) [2025-01-04 05:57:28,968][134211] Fps is (10 sec: 12700.3, 60 sec: 14540.8, 300 sec: 14523.4). Total num frames: 479203328. Throughput: 0: 3539.4. Samples: 108972048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:28,968][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 05:57:29,119][134294] Updated weights for policy 0, policy_version 116994 (0.0025) [2025-01-04 05:57:32,145][134294] Updated weights for policy 0, policy_version 117004 (0.0024) [2025-01-04 05:57:33,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14404.2, 300 sec: 14606.8). Total num frames: 479285248. Throughput: 0: 3538.9. Samples: 108981526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:33,968][134211] Avg episode reward: [(0, '9.006')] [2025-01-04 05:57:34,020][134294] Updated weights for policy 0, policy_version 117014 (0.0014) [2025-01-04 05:57:36,143][134294] Updated weights for policy 0, policy_version 117024 (0.0016) [2025-01-04 05:57:38,970][134211] Fps is (10 sec: 16381.0, 60 sec: 14335.5, 300 sec: 14662.2). Total num frames: 479367168. Throughput: 0: 3613.8. Samples: 109009506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:38,970][134211] Avg episode reward: [(0, '8.839')] [2025-01-04 05:57:39,201][134294] Updated weights for policy 0, policy_version 117034 (0.0028) [2025-01-04 05:57:42,350][134294] Updated weights for policy 0, policy_version 117044 (0.0026) [2025-01-04 05:57:43,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 479432704. Throughput: 0: 3392.8. Samples: 109028960. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:43,968][134211] Avg episode reward: [(0, '8.876')] [2025-01-04 05:57:45,336][134294] Updated weights for policy 0, policy_version 117054 (0.0026) [2025-01-04 05:57:48,333][134294] Updated weights for policy 0, policy_version 117064 (0.0026) [2025-01-04 05:57:48,968][134211] Fps is (10 sec: 13519.4, 60 sec: 14540.8, 300 sec: 14676.2). Total num frames: 479502336. Throughput: 0: 3417.3. Samples: 109039346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:48,968][134211] Avg episode reward: [(0, '8.573')] [2025-01-04 05:57:50,939][134294] Updated weights for policy 0, policy_version 117074 (0.0019) [2025-01-04 05:57:53,014][134294] Updated weights for policy 0, policy_version 117084 (0.0011) [2025-01-04 05:57:53,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14404.3, 300 sec: 14690.1). Total num frames: 479592448. Throughput: 0: 3527.3. Samples: 109063070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:53,968][134211] Avg episode reward: [(0, '7.685')] [2025-01-04 05:57:55,407][134294] Updated weights for policy 0, policy_version 117094 (0.0019) [2025-01-04 05:57:58,919][134294] Updated weights for policy 0, policy_version 117104 (0.0027) [2025-01-04 05:57:58,968][134211] Fps is (10 sec: 15564.7, 60 sec: 13926.4, 300 sec: 14565.1). Total num frames: 479657984. Throughput: 0: 3612.9. Samples: 109085780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:57:58,968][134211] Avg episode reward: [(0, '8.937')] [2025-01-04 05:58:02,345][134294] Updated weights for policy 0, policy_version 117114 (0.0026) [2025-01-04 05:58:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13858.2, 300 sec: 14495.7). Total num frames: 479715328. Throughput: 0: 3585.8. Samples: 109094460. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:03,968][134211] Avg episode reward: [(0, '9.220')] [2025-01-04 05:58:05,382][134294] Updated weights for policy 0, policy_version 117124 (0.0022) [2025-01-04 05:58:07,375][134294] Updated weights for policy 0, policy_version 117134 (0.0012) [2025-01-04 05:58:08,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14472.6, 300 sec: 14606.8). Total num frames: 479813632. Throughput: 0: 3657.0. Samples: 109117366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:08,968][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 05:58:09,317][134294] Updated weights for policy 0, policy_version 117144 (0.0013) [2025-01-04 05:58:11,481][134294] Updated weights for policy 0, policy_version 117154 (0.0022) [2025-01-04 05:58:13,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 479891456. Throughput: 0: 3830.4. Samples: 109144416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:13,969][134211] Avg episode reward: [(0, '7.720')] [2025-01-04 05:58:14,691][134294] Updated weights for policy 0, policy_version 117164 (0.0027) [2025-01-04 05:58:17,823][134294] Updated weights for policy 0, policy_version 117174 (0.0026) [2025-01-04 05:58:18,968][134211] Fps is (10 sec: 14335.1, 60 sec: 14677.8, 300 sec: 14648.4). Total num frames: 479956992. Throughput: 0: 3827.6. Samples: 109153770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:18,969][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 05:58:20,931][134294] Updated weights for policy 0, policy_version 117184 (0.0025) [2025-01-04 05:58:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 480022528. Throughput: 0: 3650.1. Samples: 109173752. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:23,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 05:58:24,110][134294] Updated weights for policy 0, policy_version 117194 (0.0030) [2025-01-04 05:58:27,407][134294] Updated weights for policy 0, policy_version 117204 (0.0028) [2025-01-04 05:58:28,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14677.4, 300 sec: 14648.4). Total num frames: 480083968. Throughput: 0: 3628.8. Samples: 109192254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:28,968][134211] Avg episode reward: [(0, '8.020')] [2025-01-04 05:58:30,677][134294] Updated weights for policy 0, policy_version 117214 (0.0024) [2025-01-04 05:58:33,591][134294] Updated weights for policy 0, policy_version 117224 (0.0024) [2025-01-04 05:58:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 480153600. Throughput: 0: 3617.6. Samples: 109202136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:33,968][134211] Avg episode reward: [(0, '9.009')] [2025-01-04 05:58:36,418][134294] Updated weights for policy 0, policy_version 117234 (0.0023) [2025-01-04 05:58:38,432][134294] Updated weights for policy 0, policy_version 117244 (0.0013) [2025-01-04 05:58:38,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14541.2, 300 sec: 14662.3). Total num frames: 480239616. Throughput: 0: 3592.5. Samples: 109224734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:38,968][134211] Avg episode reward: [(0, '8.593')] [2025-01-04 05:58:40,364][134294] Updated weights for policy 0, policy_version 117254 (0.0013) [2025-01-04 05:58:42,278][134294] Updated weights for policy 0, policy_version 117264 (0.0013) [2025-01-04 05:58:43,968][134211] Fps is (10 sec: 18841.3, 60 sec: 15155.2, 300 sec: 14787.3). Total num frames: 480342016. Throughput: 0: 3780.5. Samples: 109255904. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:43,969][134211] Avg episode reward: [(0, '7.943')] [2025-01-04 05:58:44,776][134294] Updated weights for policy 0, policy_version 117274 (0.0022) [2025-01-04 05:58:47,931][134294] Updated weights for policy 0, policy_version 117284 (0.0028) [2025-01-04 05:58:48,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15086.9, 300 sec: 14676.2). Total num frames: 480407552. Throughput: 0: 3819.3. Samples: 109266330. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:48,968][134211] Avg episode reward: [(0, '7.687')] [2025-01-04 05:58:50,957][134294] Updated weights for policy 0, policy_version 117294 (0.0025) [2025-01-04 05:58:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14634.5). Total num frames: 480473088. Throughput: 0: 3759.0. Samples: 109286524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:53,968][134211] Avg episode reward: [(0, '8.510')] [2025-01-04 05:58:54,010][134294] Updated weights for policy 0, policy_version 117304 (0.0028) [2025-01-04 05:58:57,155][134294] Updated weights for policy 0, policy_version 117314 (0.0028) [2025-01-04 05:58:58,969][134211] Fps is (10 sec: 13105.6, 60 sec: 14677.0, 300 sec: 14676.1). Total num frames: 480538624. Throughput: 0: 3587.0. Samples: 109305834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:58:58,970][134211] Avg episode reward: [(0, '8.540')] [2025-01-04 05:59:00,435][134294] Updated weights for policy 0, policy_version 117324 (0.0028) [2025-01-04 05:59:03,395][134294] Updated weights for policy 0, policy_version 117334 (0.0026) [2025-01-04 05:59:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14813.9, 300 sec: 14606.7). Total num frames: 480604160. Throughput: 0: 3598.0. Samples: 109315676. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:59:03,968][134211] Avg episode reward: [(0, '8.023')] [2025-01-04 05:59:06,486][134294] Updated weights for policy 0, policy_version 117344 (0.0026) [2025-01-04 05:59:08,968][134211] Fps is (10 sec: 13109.0, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 480669696. Throughput: 0: 3591.7. Samples: 109335376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:59:08,968][134211] Avg episode reward: [(0, '8.516')] [2025-01-04 05:59:09,740][134294] Updated weights for policy 0, policy_version 117354 (0.0026) [2025-01-04 05:59:12,792][134294] Updated weights for policy 0, policy_version 117364 (0.0024) [2025-01-04 05:59:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14063.0, 300 sec: 14481.8). Total num frames: 480735232. Throughput: 0: 3619.8. Samples: 109355144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:59:13,968][134211] Avg episode reward: [(0, '8.472')] [2025-01-04 05:59:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000117367_480735232.pth... [2025-01-04 05:59:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116517_477253632.pth [2025-01-04 05:59:15,641][134294] Updated weights for policy 0, policy_version 117374 (0.0020) [2025-01-04 05:59:17,541][134294] Updated weights for policy 0, policy_version 117384 (0.0013) [2025-01-04 05:59:18,967][134211] Fps is (10 sec: 16384.4, 60 sec: 14609.2, 300 sec: 14606.8). Total num frames: 480833536. Throughput: 0: 3670.5. Samples: 109367310. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 05:59:18,968][134211] Avg episode reward: [(0, '7.684')] [2025-01-04 05:59:19,362][134294] Updated weights for policy 0, policy_version 117394 (0.0013) [2025-01-04 05:59:21,296][134294] Updated weights for policy 0, policy_version 117404 (0.0014) [2025-01-04 05:59:23,281][134294] Updated weights for policy 0, policy_version 117414 (0.0013) [2025-01-04 05:59:23,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15291.8, 300 sec: 14759.5). Total num frames: 480940032. Throughput: 0: 3885.6. Samples: 109399584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:23,968][134211] Avg episode reward: [(0, '8.618')] [2025-01-04 05:59:25,604][134294] Updated weights for policy 0, policy_version 117424 (0.0020) [2025-01-04 05:59:28,968][134211] Fps is (10 sec: 16793.1, 60 sec: 15291.7, 300 sec: 14731.7). Total num frames: 481001472. Throughput: 0: 3703.6. Samples: 109422564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:28,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 05:59:29,265][134294] Updated weights for policy 0, policy_version 117434 (0.0030) [2025-01-04 05:59:32,505][134294] Updated weights for policy 0, policy_version 117444 (0.0027) [2025-01-04 05:59:33,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15223.4, 300 sec: 14745.6). Total num frames: 481067008. Throughput: 0: 3665.8. Samples: 109431290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:33,968][134211] Avg episode reward: [(0, '7.790')] [2025-01-04 05:59:35,836][134294] Updated weights for policy 0, policy_version 117454 (0.0025) [2025-01-04 05:59:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 481128448. Throughput: 0: 3638.5. Samples: 109450258. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:38,969][134211] Avg episode reward: [(0, '8.322')] [2025-01-04 05:59:39,058][134294] Updated weights for policy 0, policy_version 117464 (0.0027) [2025-01-04 05:59:42,176][134294] Updated weights for policy 0, policy_version 117474 (0.0026) [2025-01-04 05:59:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14703.9). Total num frames: 481193984. Throughput: 0: 3644.9. Samples: 109469848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:43,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 05:59:45,210][134294] Updated weights for policy 0, policy_version 117484 (0.0025) [2025-01-04 05:59:48,178][134294] Updated weights for policy 0, policy_version 117494 (0.0025) [2025-01-04 05:59:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14579.0). Total num frames: 481263616. Throughput: 0: 3658.3. Samples: 109480302. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:48,968][134211] Avg episode reward: [(0, '7.817')] [2025-01-04 05:59:51,001][134294] Updated weights for policy 0, policy_version 117504 (0.0024) [2025-01-04 05:59:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14336.1, 300 sec: 14467.9). Total num frames: 481333248. Throughput: 0: 3682.0. Samples: 109501066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:53,968][134211] Avg episode reward: [(0, '8.039')] [2025-01-04 05:59:54,000][134294] Updated weights for policy 0, policy_version 117514 (0.0023) [2025-01-04 05:59:56,087][134294] Updated weights for policy 0, policy_version 117524 (0.0017) [2025-01-04 05:59:58,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14609.4, 300 sec: 14537.3). Total num frames: 481415168. Throughput: 0: 3769.7. Samples: 109524780. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 05:59:58,968][134211] Avg episode reward: [(0, '8.400')] [2025-01-04 05:59:59,333][134294] Updated weights for policy 0, policy_version 117534 (0.0025) [2025-01-04 06:00:01,660][134294] Updated weights for policy 0, policy_version 117544 (0.0014) [2025-01-04 06:00:03,808][134294] Updated weights for policy 0, policy_version 117554 (0.0015) [2025-01-04 06:00:03,967][134211] Fps is (10 sec: 17203.3, 60 sec: 15018.7, 300 sec: 14634.5). Total num frames: 481505280. Throughput: 0: 3751.2. Samples: 109536112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:00:03,968][134211] Avg episode reward: [(0, '8.210')] [2025-01-04 06:00:05,844][134294] Updated weights for policy 0, policy_version 117564 (0.0013) [2025-01-04 06:00:08,965][134294] Updated weights for policy 0, policy_version 117574 (0.0022) [2025-01-04 06:00:08,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15223.5, 300 sec: 14676.2). Total num frames: 481583104. Throughput: 0: 3647.4. Samples: 109563718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:00:08,968][134211] Avg episode reward: [(0, '7.851')] [2025-01-04 06:00:12,123][134294] Updated weights for policy 0, policy_version 117584 (0.0027) [2025-01-04 06:00:13,968][134211] Fps is (10 sec: 13925.6, 60 sec: 15155.1, 300 sec: 14648.4). Total num frames: 481644544. Throughput: 0: 3548.5. Samples: 109582246. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:00:13,969][134211] Avg episode reward: [(0, '8.122')] [2025-01-04 06:00:15,601][134294] Updated weights for policy 0, policy_version 117594 (0.0029) [2025-01-04 06:00:18,779][134294] Updated weights for policy 0, policy_version 117604 (0.0025) [2025-01-04 06:00:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.7, 300 sec: 14634.5). Total num frames: 481705984. Throughput: 0: 3558.3. Samples: 109591412. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:00:18,970][134211] Avg episode reward: [(0, '8.146')] [2025-01-04 06:00:22,200][134294] Updated weights for policy 0, policy_version 117614 (0.0026) [2025-01-04 06:00:23,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13721.5, 300 sec: 14620.6). Total num frames: 481763328. Throughput: 0: 3546.0. Samples: 109609830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:23,969][134211] Avg episode reward: [(0, '8.476')] [2025-01-04 06:00:25,222][134294] Updated weights for policy 0, policy_version 117624 (0.0019) [2025-01-04 06:00:27,329][134294] Updated weights for policy 0, policy_version 117634 (0.0012) [2025-01-04 06:00:28,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14267.8, 300 sec: 14717.9). Total num frames: 481857536. Throughput: 0: 3650.9. Samples: 109634136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:28,968][134211] Avg episode reward: [(0, '7.607')] [2025-01-04 06:00:29,532][134294] Updated weights for policy 0, policy_version 117644 (0.0012) [2025-01-04 06:00:31,909][134294] Updated weights for policy 0, policy_version 117654 (0.0018) [2025-01-04 06:00:33,969][134211] Fps is (10 sec: 16792.2, 60 sec: 14404.1, 300 sec: 14690.0). Total num frames: 481931264. Throughput: 0: 3730.1. Samples: 109648160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:33,969][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 06:00:35,897][134294] Updated weights for policy 0, policy_version 117664 (0.0027) [2025-01-04 06:00:38,968][134211] Fps is (10 sec: 12696.7, 60 sec: 14267.6, 300 sec: 14634.5). Total num frames: 481984512. Throughput: 0: 3636.9. Samples: 109664728. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:38,969][134211] Avg episode reward: [(0, '7.928')] [2025-01-04 06:00:39,604][134294] Updated weights for policy 0, policy_version 117674 (0.0028) [2025-01-04 06:00:42,185][134294] Updated weights for policy 0, policy_version 117684 (0.0016) [2025-01-04 06:00:43,967][134211] Fps is (10 sec: 13518.3, 60 sec: 14540.9, 300 sec: 14592.9). Total num frames: 482066432. Throughput: 0: 3587.9. Samples: 109686234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:43,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 06:00:44,389][134294] Updated weights for policy 0, policy_version 117694 (0.0013) [2025-01-04 06:00:46,477][134294] Updated weights for policy 0, policy_version 117704 (0.0013) [2025-01-04 06:00:48,968][134211] Fps is (10 sec: 16385.0, 60 sec: 14745.6, 300 sec: 14523.5). Total num frames: 482148352. Throughput: 0: 3653.8. Samples: 109700534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:48,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 06:00:49,572][134294] Updated weights for policy 0, policy_version 117714 (0.0022) [2025-01-04 06:00:53,310][134294] Updated weights for policy 0, policy_version 117724 (0.0027) [2025-01-04 06:00:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 14509.6). Total num frames: 482201600. Throughput: 0: 3458.8. Samples: 109719366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:53,968][134211] Avg episode reward: [(0, '7.821')] [2025-01-04 06:00:56,962][134294] Updated weights for policy 0, policy_version 117734 (0.0027) [2025-01-04 06:00:58,968][134211] Fps is (10 sec: 11059.1, 60 sec: 14062.9, 300 sec: 14523.4). Total num frames: 482258944. Throughput: 0: 3417.0. Samples: 109736008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:00:58,968][134211] Avg episode reward: [(0, '8.669')] [2025-01-04 06:01:00,538][134294] Updated weights for policy 0, policy_version 117744 (0.0026) [2025-01-04 06:01:03,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13516.7, 300 sec: 14398.5). Total num frames: 482316288. Throughput: 0: 3410.0. Samples: 109744864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:03,968][134211] Avg episode reward: [(0, '9.239')] [2025-01-04 06:01:04,136][134294] Updated weights for policy 0, policy_version 117754 (0.0024) [2025-01-04 06:01:06,673][134294] Updated weights for policy 0, policy_version 117764 (0.0016) [2025-01-04 06:01:08,835][134294] Updated weights for policy 0, policy_version 117774 (0.0013) [2025-01-04 06:01:08,967][134211] Fps is (10 sec: 14336.4, 60 sec: 13653.4, 300 sec: 14467.9). Total num frames: 482402304. Throughput: 0: 3469.2. Samples: 109765944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:08,968][134211] Avg episode reward: [(0, '8.223')] [2025-01-04 06:01:11,052][134294] Updated weights for policy 0, policy_version 117784 (0.0013) [2025-01-04 06:01:13,927][134294] Updated weights for policy 0, policy_version 117794 (0.0026) [2025-01-04 06:01:13,968][134211] Fps is (10 sec: 16793.1, 60 sec: 13994.7, 300 sec: 14537.3). Total num frames: 482484224. Throughput: 0: 3509.9. Samples: 109792082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:13,969][134211] Avg episode reward: [(0, '8.658')] [2025-01-04 06:01:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000117794_482484224.pth... 
[2025-01-04 06:01:14,071][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000116946_479010816.pth [2025-01-04 06:01:17,889][134294] Updated weights for policy 0, policy_version 117804 (0.0030) [2025-01-04 06:01:18,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13789.9, 300 sec: 14454.0). Total num frames: 482533376. Throughput: 0: 3370.2. Samples: 109799814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:18,968][134211] Avg episode reward: [(0, '8.265')] [2025-01-04 06:01:21,288][134294] Updated weights for policy 0, policy_version 117814 (0.0023) [2025-01-04 06:01:23,969][134211] Fps is (10 sec: 11058.3, 60 sec: 13857.9, 300 sec: 14454.0). Total num frames: 482594816. Throughput: 0: 3387.5. Samples: 109817166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:23,970][134211] Avg episode reward: [(0, '8.297')] [2025-01-04 06:01:24,895][134294] Updated weights for policy 0, policy_version 117824 (0.0023) [2025-01-04 06:01:28,119][134294] Updated weights for policy 0, policy_version 117834 (0.0024) [2025-01-04 06:01:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13311.9, 300 sec: 14356.8). Total num frames: 482656256. Throughput: 0: 3312.6. Samples: 109835302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:28,968][134211] Avg episode reward: [(0, '8.818')] [2025-01-04 06:01:31,441][134294] Updated weights for policy 0, policy_version 117844 (0.0029) [2025-01-04 06:01:33,968][134211] Fps is (10 sec: 12289.5, 60 sec: 13107.4, 300 sec: 14273.5). Total num frames: 482717696. Throughput: 0: 3200.9. Samples: 109844574. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:33,968][134211] Avg episode reward: [(0, '8.515')] [2025-01-04 06:01:34,893][134294] Updated weights for policy 0, policy_version 117854 (0.0026) [2025-01-04 06:01:37,076][134294] Updated weights for policy 0, policy_version 117864 (0.0011) [2025-01-04 06:01:38,967][134211] Fps is (10 sec: 14746.0, 60 sec: 13653.5, 300 sec: 14370.7). Total num frames: 482803712. Throughput: 0: 3271.6. Samples: 109866588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:38,968][134211] Avg episode reward: [(0, '7.601')] [2025-01-04 06:01:39,130][134294] Updated weights for policy 0, policy_version 117874 (0.0013) [2025-01-04 06:01:41,179][134294] Updated weights for policy 0, policy_version 117884 (0.0013) [2025-01-04 06:01:43,236][134294] Updated weights for policy 0, policy_version 117894 (0.0014) [2025-01-04 06:01:43,967][134211] Fps is (10 sec: 18841.9, 60 sec: 13994.7, 300 sec: 14495.7). Total num frames: 482906112. Throughput: 0: 3564.5. Samples: 109896408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:43,968][134211] Avg episode reward: [(0, '9.143')] [2025-01-04 06:01:45,439][134294] Updated weights for policy 0, policy_version 117904 (0.0016) [2025-01-04 06:01:48,742][134294] Updated weights for policy 0, policy_version 117914 (0.0029) [2025-01-04 06:01:48,968][134211] Fps is (10 sec: 17202.9, 60 sec: 13789.9, 300 sec: 14398.5). Total num frames: 482975744. Throughput: 0: 3656.3. Samples: 109909398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:48,968][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 06:01:52,359][134294] Updated weights for policy 0, policy_version 117924 (0.0025) [2025-01-04 06:01:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13858.1, 300 sec: 14273.5). Total num frames: 483033088. Throughput: 0: 3576.6. Samples: 109926894. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:53,969][134211] Avg episode reward: [(0, '8.802')] [2025-01-04 06:01:55,801][134294] Updated weights for policy 0, policy_version 117934 (0.0025) [2025-01-04 06:01:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13926.4, 300 sec: 14273.5). Total num frames: 483094528. Throughput: 0: 3394.7. Samples: 109944844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:01:58,968][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 06:01:59,178][134294] Updated weights for policy 0, policy_version 117944 (0.0027) [2025-01-04 06:02:02,851][134294] Updated weights for policy 0, policy_version 117954 (0.0030) [2025-01-04 06:02:03,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13858.2, 300 sec: 14245.8). Total num frames: 483147776. Throughput: 0: 3415.0. Samples: 109953490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:02:03,968][134211] Avg episode reward: [(0, '8.445')] [2025-01-04 06:02:06,476][134294] Updated weights for policy 0, policy_version 117964 (0.0026) [2025-01-04 06:02:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13448.5, 300 sec: 14245.8). Total num frames: 483209216. Throughput: 0: 3406.2. Samples: 109970442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:02:08,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 06:02:10,094][134294] Updated weights for policy 0, policy_version 117974 (0.0026) [2025-01-04 06:02:13,496][134294] Updated weights for policy 0, policy_version 117984 (0.0022) [2025-01-04 06:02:13,967][134211] Fps is (10 sec: 12288.2, 60 sec: 13107.3, 300 sec: 14218.1). Total num frames: 483270656. Throughput: 0: 3394.2. Samples: 109988040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:02:13,968][134211] Avg episode reward: [(0, '8.666')] [2025-01-04 06:02:15,482][134294] Updated weights for policy 0, policy_version 117994 (0.0014) [2025-01-04 06:02:17,323][134294] Updated weights for policy 0, policy_version 118004 (0.0014) [2025-01-04 06:02:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13926.4, 300 sec: 14342.9). Total num frames: 483368960. Throughput: 0: 3517.2. Samples: 110002846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:02:18,970][134211] Avg episode reward: [(0, '8.865')] [2025-01-04 06:02:19,999][134294] Updated weights for policy 0, policy_version 118014 (0.0023) [2025-01-04 06:02:23,084][134294] Updated weights for policy 0, policy_version 118024 (0.0027) [2025-01-04 06:02:23,968][134211] Fps is (10 sec: 16383.8, 60 sec: 13995.0, 300 sec: 14342.9). Total num frames: 483434496. Throughput: 0: 3562.1. Samples: 110026882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:02:23,968][134211] Avg episode reward: [(0, '7.911')] [2025-01-04 06:02:26,123][134294] Updated weights for policy 0, policy_version 118034 (0.0025) [2025-01-04 06:02:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14131.1, 300 sec: 14301.3). Total num frames: 483504128. Throughput: 0: 3339.7. Samples: 110046696. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:28,969][134211] Avg episode reward: [(0, '8.298')] [2025-01-04 06:02:29,310][134294] Updated weights for policy 0, policy_version 118044 (0.0023) [2025-01-04 06:02:32,222][134294] Updated weights for policy 0, policy_version 118054 (0.0024) [2025-01-04 06:02:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.4, 300 sec: 14245.8). Total num frames: 483569664. Throughput: 0: 3274.9. Samples: 110056768. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:33,968][134211] Avg episode reward: [(0, '8.646')] [2025-01-04 06:02:35,344][134294] Updated weights for policy 0, policy_version 118064 (0.0026) [2025-01-04 06:02:38,761][134294] Updated weights for policy 0, policy_version 118074 (0.0025) [2025-01-04 06:02:38,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13789.8, 300 sec: 14231.9). Total num frames: 483631104. Throughput: 0: 3324.1. Samples: 110076478. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:38,968][134211] Avg episode reward: [(0, '8.671')] [2025-01-04 06:02:41,159][134294] Updated weights for policy 0, policy_version 118084 (0.0017) [2025-01-04 06:02:43,079][134294] Updated weights for policy 0, policy_version 118094 (0.0014) [2025-01-04 06:02:43,967][134211] Fps is (10 sec: 15974.8, 60 sec: 13721.6, 300 sec: 14329.1). Total num frames: 483729408. Throughput: 0: 3496.2. Samples: 110102172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:43,968][134211] Avg episode reward: [(0, '8.179')] [2025-01-04 06:02:44,940][134294] Updated weights for policy 0, policy_version 118104 (0.0013) [2025-01-04 06:02:46,876][134294] Updated weights for policy 0, policy_version 118114 (0.0016) [2025-01-04 06:02:48,744][134294] Updated weights for policy 0, policy_version 118124 (0.0013) [2025-01-04 06:02:48,968][134211] Fps is (10 sec: 20889.6, 60 sec: 14404.2, 300 sec: 14398.5). Total num frames: 483840000. Throughput: 0: 3664.4. Samples: 110118390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:48,968][134211] Avg episode reward: [(0, '8.641')] [2025-01-04 06:02:51,257][134294] Updated weights for policy 0, policy_version 118134 (0.0017) [2025-01-04 06:02:53,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 483909632. Throughput: 0: 3888.8. Samples: 110145438. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:53,968][134211] Avg episode reward: [(0, '8.128')] [2025-01-04 06:02:54,513][134294] Updated weights for policy 0, policy_version 118144 (0.0026) [2025-01-04 06:02:57,887][134294] Updated weights for policy 0, policy_version 118154 (0.0028) [2025-01-04 06:02:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 14426.2). Total num frames: 483971072. Throughput: 0: 3908.7. Samples: 110163934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:02:58,968][134211] Avg episode reward: [(0, '8.140')] [2025-01-04 06:03:00,952][134294] Updated weights for policy 0, policy_version 118164 (0.0021) [2025-01-04 06:03:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.9, 300 sec: 14315.2). Total num frames: 484036608. Throughput: 0: 3804.7. Samples: 110174058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:03:03,968][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 06:03:04,081][134294] Updated weights for policy 0, policy_version 118174 (0.0024) [2025-01-04 06:03:07,485][134294] Updated weights for policy 0, policy_version 118184 (0.0026) [2025-01-04 06:03:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14259.6). Total num frames: 484098048. Throughput: 0: 3683.6. Samples: 110192646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:03:08,968][134211] Avg episode reward: [(0, '8.938')] [2025-01-04 06:03:10,676][134294] Updated weights for policy 0, policy_version 118194 (0.0024) [2025-01-04 06:03:13,662][134294] Updated weights for policy 0, policy_version 118204 (0.0025) [2025-01-04 06:03:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14882.1, 300 sec: 14259.7). Total num frames: 484163584. Throughput: 0: 3681.7. Samples: 110212370. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:03:13,968][134211] Avg episode reward: [(0, '8.608')] [2025-01-04 06:03:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000118204_484163584.pth... [2025-01-04 06:03:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000117367_480735232.pth [2025-01-04 06:03:16,772][134294] Updated weights for policy 0, policy_version 118214 (0.0026) [2025-01-04 06:03:18,970][134211] Fps is (10 sec: 13514.1, 60 sec: 14403.8, 300 sec: 14273.4). Total num frames: 484233216. Throughput: 0: 3678.4. Samples: 110222304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:03:18,970][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 06:03:19,872][134294] Updated weights for policy 0, policy_version 118224 (0.0024) [2025-01-04 06:03:22,750][134294] Updated weights for policy 0, policy_version 118234 (0.0026) [2025-01-04 06:03:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 484298752. Throughput: 0: 3692.5. Samples: 110242640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:03:23,968][134211] Avg episode reward: [(0, '9.050')] [2025-01-04 06:03:25,131][134294] Updated weights for policy 0, policy_version 118244 (0.0018) [2025-01-04 06:03:27,508][134294] Updated weights for policy 0, policy_version 118254 (0.0020) [2025-01-04 06:03:28,970][134211] Fps is (10 sec: 15155.1, 60 sec: 14676.9, 300 sec: 14342.8). Total num frames: 484384768. Throughput: 0: 3672.7. Samples: 110267454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:28,970][134211] Avg episode reward: [(0, '7.839')] [2025-01-04 06:03:30,715][134294] Updated weights for policy 0, policy_version 118264 (0.0027) [2025-01-04 06:03:33,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14609.1, 300 sec: 14259.6). Total num frames: 484446208. 
Throughput: 0: 3524.4. Samples: 110276986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:33,968][134211] Avg episode reward: [(0, '8.012')] [2025-01-04 06:03:34,034][134294] Updated weights for policy 0, policy_version 118274 (0.0022) [2025-01-04 06:03:36,392][134294] Updated weights for policy 0, policy_version 118284 (0.0017) [2025-01-04 06:03:38,569][134294] Updated weights for policy 0, policy_version 118294 (0.0013) [2025-01-04 06:03:38,968][134211] Fps is (10 sec: 15158.6, 60 sec: 15087.0, 300 sec: 14218.0). Total num frames: 484536320. Throughput: 0: 3428.3. Samples: 110299712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:38,968][134211] Avg episode reward: [(0, '7.962')] [2025-01-04 06:03:40,527][134294] Updated weights for policy 0, policy_version 118304 (0.0014) [2025-01-04 06:03:42,415][134294] Updated weights for policy 0, policy_version 118314 (0.0013) [2025-01-04 06:03:43,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15291.7, 300 sec: 14370.7). Total num frames: 484646912. Throughput: 0: 3710.7. Samples: 110330916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:43,968][134211] Avg episode reward: [(0, '7.643')] [2025-01-04 06:03:44,305][134294] Updated weights for policy 0, policy_version 118324 (0.0013) [2025-01-04 06:03:46,362][134294] Updated weights for policy 0, policy_version 118334 (0.0015) [2025-01-04 06:03:48,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14813.9, 300 sec: 14426.3). Total num frames: 484728832. Throughput: 0: 3833.3. Samples: 110346556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:48,968][134211] Avg episode reward: [(0, '8.720')] [2025-01-04 06:03:49,587][134294] Updated weights for policy 0, policy_version 118344 (0.0024) [2025-01-04 06:03:52,755][134294] Updated weights for policy 0, policy_version 118354 (0.0027) [2025-01-04 06:03:53,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14677.3, 300 sec: 14412.4). Total num frames: 484790272. 
Throughput: 0: 3850.8. Samples: 110365934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:53,969][134211] Avg episode reward: [(0, '8.960')] [2025-01-04 06:03:55,910][134294] Updated weights for policy 0, policy_version 118364 (0.0026) [2025-01-04 06:03:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 484855808. Throughput: 0: 3851.7. Samples: 110385696. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:03:58,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 06:03:59,020][134294] Updated weights for policy 0, policy_version 118374 (0.0025) [2025-01-04 06:04:02,680][134294] Updated weights for policy 0, policy_version 118384 (0.0025) [2025-01-04 06:04:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 484913152. Throughput: 0: 3820.2. Samples: 110394204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:04:03,970][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 06:04:06,143][134294] Updated weights for policy 0, policy_version 118394 (0.0025) [2025-01-04 06:04:08,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 484970496. Throughput: 0: 3751.8. Samples: 110411470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:04:08,969][134211] Avg episode reward: [(0, '8.486')] [2025-01-04 06:04:09,777][134294] Updated weights for policy 0, policy_version 118404 (0.0026) [2025-01-04 06:04:12,803][134294] Updated weights for policy 0, policy_version 118414 (0.0025) [2025-01-04 06:04:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 485036032. Throughput: 0: 3621.3. Samples: 110430406. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:04:13,968][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 06:04:16,044][134294] Updated weights for policy 0, policy_version 118424 (0.0027) [2025-01-04 06:04:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14473.1, 300 sec: 14106.9). Total num frames: 485101568. Throughput: 0: 3622.6. Samples: 110440004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:04:18,968][134211] Avg episode reward: [(0, '8.577')] [2025-01-04 06:04:19,028][134294] Updated weights for policy 0, policy_version 118434 (0.0025) [2025-01-04 06:04:20,949][134294] Updated weights for policy 0, policy_version 118444 (0.0014) [2025-01-04 06:04:22,807][134294] Updated weights for policy 0, policy_version 118454 (0.0014) [2025-01-04 06:04:23,968][134211] Fps is (10 sec: 17612.9, 60 sec: 15223.4, 300 sec: 14273.5). Total num frames: 485212160. Throughput: 0: 3708.2. Samples: 110466584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:04:23,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 06:04:24,723][134294] Updated weights for policy 0, policy_version 118464 (0.0013) [2025-01-04 06:04:26,605][134294] Updated weights for policy 0, policy_version 118474 (0.0013) [2025-01-04 06:04:28,440][134294] Updated weights for policy 0, policy_version 118484 (0.0014) [2025-01-04 06:04:28,967][134211] Fps is (10 sec: 21708.9, 60 sec: 15565.4, 300 sec: 14412.4). Total num frames: 485318656. Throughput: 0: 3743.5. Samples: 110499372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:28,968][134211] Avg episode reward: [(0, '8.629')] [2025-01-04 06:04:30,485][134294] Updated weights for policy 0, policy_version 118494 (0.0016) [2025-01-04 06:04:33,913][134294] Updated weights for policy 0, policy_version 118504 (0.0025) [2025-01-04 06:04:33,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15769.6, 300 sec: 14454.0). Total num frames: 485392384. Throughput: 0: 3706.8. Samples: 110513364. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:33,969][134211] Avg episode reward: [(0, '8.108')] [2025-01-04 06:04:37,403][134294] Updated weights for policy 0, policy_version 118514 (0.0027) [2025-01-04 06:04:38,968][134211] Fps is (10 sec: 13106.8, 60 sec: 15223.4, 300 sec: 14426.3). Total num frames: 485449728. Throughput: 0: 3667.8. Samples: 110530986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:38,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 06:04:40,778][134294] Updated weights for policy 0, policy_version 118524 (0.0028) [2025-01-04 06:04:43,920][134294] Updated weights for policy 0, policy_version 118534 (0.0025) [2025-01-04 06:04:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14472.5, 300 sec: 14412.4). Total num frames: 485515264. Throughput: 0: 3643.8. Samples: 110549668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:43,968][134211] Avg episode reward: [(0, '8.502')] [2025-01-04 06:04:46,967][134294] Updated weights for policy 0, policy_version 118544 (0.0027) [2025-01-04 06:04:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.5, 300 sec: 14398.5). Total num frames: 485580800. Throughput: 0: 3669.2. Samples: 110559316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:48,968][134211] Avg episode reward: [(0, '8.735')] [2025-01-04 06:04:50,111][134294] Updated weights for policy 0, policy_version 118554 (0.0028) [2025-01-04 06:04:52,997][134294] Updated weights for policy 0, policy_version 118564 (0.0027) [2025-01-04 06:04:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.1, 300 sec: 14356.8). Total num frames: 485650432. Throughput: 0: 3742.1. Samples: 110579862. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:53,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 06:04:55,965][134294] Updated weights for policy 0, policy_version 118574 (0.0025) [2025-01-04 06:04:58,924][134294] Updated weights for policy 0, policy_version 118584 (0.0022) [2025-01-04 06:04:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 485720064. Throughput: 0: 3784.0. Samples: 110600686. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:04:58,968][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 06:05:02,154][134294] Updated weights for policy 0, policy_version 118594 (0.0023) [2025-01-04 06:05:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 485781504. Throughput: 0: 3785.2. Samples: 110610340. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:05:03,968][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 06:05:05,359][134294] Updated weights for policy 0, policy_version 118604 (0.0025) [2025-01-04 06:05:08,652][134294] Updated weights for policy 0, policy_version 118614 (0.0028) [2025-01-04 06:05:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 14231.9). Total num frames: 485842944. Throughput: 0: 3622.2. Samples: 110629582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:05:08,968][134211] Avg episode reward: [(0, '8.703')] [2025-01-04 06:05:11,206][134294] Updated weights for policy 0, policy_version 118624 (0.0016) [2025-01-04 06:05:13,055][134294] Updated weights for policy 0, policy_version 118634 (0.0013) [2025-01-04 06:05:13,967][134211] Fps is (10 sec: 16384.5, 60 sec: 15155.3, 300 sec: 14370.7). Total num frames: 485945344. Throughput: 0: 3459.5. Samples: 110655048. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:05:13,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 06:05:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000118639_485945344.pth... [2025-01-04 06:05:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000117794_482484224.pth [2025-01-04 06:05:15,026][134294] Updated weights for policy 0, policy_version 118644 (0.0013) [2025-01-04 06:05:17,151][134294] Updated weights for policy 0, policy_version 118654 (0.0017) [2025-01-04 06:05:18,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15428.2, 300 sec: 14454.0). Total num frames: 486027264. Throughput: 0: 3501.5. Samples: 110670934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:05:18,969][134211] Avg episode reward: [(0, '9.386')] [2025-01-04 06:05:20,266][134294] Updated weights for policy 0, policy_version 118664 (0.0026) [2025-01-04 06:05:23,410][134294] Updated weights for policy 0, policy_version 118674 (0.0027) [2025-01-04 06:05:23,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14677.3, 300 sec: 14356.8). Total num frames: 486092800. Throughput: 0: 3567.6. Samples: 110691526. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:05:23,968][134211] Avg episode reward: [(0, '8.808')] [2025-01-04 06:05:26,493][134294] Updated weights for policy 0, policy_version 118684 (0.0029) [2025-01-04 06:05:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13994.6, 300 sec: 14329.1). Total num frames: 486158336. Throughput: 0: 3578.8. Samples: 110710714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:28,968][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 06:05:29,786][134294] Updated weights for policy 0, policy_version 118694 (0.0027) [2025-01-04 06:05:33,198][134294] Updated weights for policy 0, policy_version 118704 (0.0024) [2025-01-04 06:05:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.9, 300 sec: 14356.9). Total num frames: 486219776. Throughput: 0: 3567.0. Samples: 110719830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:33,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 06:05:36,232][134294] Updated weights for policy 0, policy_version 118714 (0.0020) [2025-01-04 06:05:38,330][134294] Updated weights for policy 0, policy_version 118724 (0.0013) [2025-01-04 06:05:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14199.5, 300 sec: 14356.8). Total num frames: 486301696. Throughput: 0: 3580.4. Samples: 110740980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:38,968][134211] Avg episode reward: [(0, '8.097')] [2025-01-04 06:05:40,402][134294] Updated weights for policy 0, policy_version 118734 (0.0013) [2025-01-04 06:05:42,334][134294] Updated weights for policy 0, policy_version 118744 (0.0013) [2025-01-04 06:05:43,968][134211] Fps is (10 sec: 18841.6, 60 sec: 14882.2, 300 sec: 14440.1). Total num frames: 486408192. Throughput: 0: 3798.2. Samples: 110771604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:43,968][134211] Avg episode reward: [(0, '8.068')] [2025-01-04 06:05:44,409][134294] Updated weights for policy 0, policy_version 118754 (0.0015) [2025-01-04 06:05:47,498][134294] Updated weights for policy 0, policy_version 118764 (0.0027) [2025-01-04 06:05:48,968][134211] Fps is (10 sec: 17202.8, 60 sec: 14882.1, 300 sec: 14481.8). Total num frames: 486473728. Throughput: 0: 3849.4. Samples: 110783562. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:48,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 06:05:50,889][134294] Updated weights for policy 0, policy_version 118774 (0.0029) [2025-01-04 06:05:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14745.6, 300 sec: 14495.7). Total num frames: 486535168. Throughput: 0: 3826.4. Samples: 110801770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:53,969][134211] Avg episode reward: [(0, '7.940')] [2025-01-04 06:05:54,124][134294] Updated weights for policy 0, policy_version 118784 (0.0029) [2025-01-04 06:05:57,447][134294] Updated weights for policy 0, policy_version 118794 (0.0026) [2025-01-04 06:05:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.1, 300 sec: 14509.6). Total num frames: 486596608. Throughput: 0: 3669.2. Samples: 110820164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:05:58,970][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 06:06:00,983][134294] Updated weights for policy 0, policy_version 118804 (0.0026) [2025-01-04 06:06:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14540.8, 300 sec: 14412.3). Total num frames: 486653952. Throughput: 0: 3514.2. Samples: 110829072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:03,969][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 06:06:04,546][134294] Updated weights for policy 0, policy_version 118814 (0.0026) [2025-01-04 06:06:08,045][134294] Updated weights for policy 0, policy_version 118824 (0.0026) [2025-01-04 06:06:08,967][134211] Fps is (10 sec: 12288.3, 60 sec: 14609.1, 300 sec: 14356.9). Total num frames: 486719488. Throughput: 0: 3441.6. Samples: 110846398. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:08,968][134211] Avg episode reward: [(0, '8.237')] [2025-01-04 06:06:10,411][134294] Updated weights for policy 0, policy_version 118834 (0.0016) [2025-01-04 06:06:13,529][134294] Updated weights for policy 0, policy_version 118844 (0.0025) [2025-01-04 06:06:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 486789120. Throughput: 0: 3504.1. Samples: 110868398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:13,968][134211] Avg episode reward: [(0, '8.215')] [2025-01-04 06:06:16,794][134294] Updated weights for policy 0, policy_version 118854 (0.0027) [2025-01-04 06:06:18,967][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.7, 300 sec: 14426.3). Total num frames: 486850560. Throughput: 0: 3508.9. Samples: 110877728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:18,968][134211] Avg episode reward: [(0, '8.608')] [2025-01-04 06:06:19,681][134294] Updated weights for policy 0, policy_version 118864 (0.0016) [2025-01-04 06:06:21,720][134294] Updated weights for policy 0, policy_version 118874 (0.0015) [2025-01-04 06:06:23,805][134294] Updated weights for policy 0, policy_version 118884 (0.0014) [2025-01-04 06:06:23,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14267.8, 300 sec: 14551.2). Total num frames: 486948864. Throughput: 0: 3586.0. Samples: 110902350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:23,968][134211] Avg episode reward: [(0, '8.171')] [2025-01-04 06:06:25,839][134294] Updated weights for policy 0, policy_version 118894 (0.0014) [2025-01-04 06:06:28,063][134294] Updated weights for policy 0, policy_version 118904 (0.0016) [2025-01-04 06:06:28,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14677.4, 300 sec: 14648.4). Total num frames: 487038976. Throughput: 0: 3546.1. Samples: 110931180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:28,968][134211] Avg episode reward: [(0, '8.036')] [2025-01-04 06:06:31,611][134294] Updated weights for policy 0, policy_version 118914 (0.0025) [2025-01-04 06:06:33,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14609.0, 300 sec: 14551.2). Total num frames: 487096320. Throughput: 0: 3474.7. Samples: 110939922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:33,969][134211] Avg episode reward: [(0, '8.691')] [2025-01-04 06:06:35,317][134294] Updated weights for policy 0, policy_version 118924 (0.0026) [2025-01-04 06:06:38,892][134294] Updated weights for policy 0, policy_version 118934 (0.0027) [2025-01-04 06:06:38,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14199.4, 300 sec: 14398.5). Total num frames: 487153664. Throughput: 0: 3446.5. Samples: 110956860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:38,968][134211] Avg episode reward: [(0, '8.145')] [2025-01-04 06:06:42,305][134294] Updated weights for policy 0, policy_version 118944 (0.0027) [2025-01-04 06:06:43,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13380.2, 300 sec: 14356.8). Total num frames: 487211008. Throughput: 0: 3427.8. Samples: 110974414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:43,968][134211] Avg episode reward: [(0, '8.356')] [2025-01-04 06:06:45,512][134294] Updated weights for policy 0, policy_version 118954 (0.0026) [2025-01-04 06:06:48,551][134294] Updated weights for policy 0, policy_version 118964 (0.0023) [2025-01-04 06:06:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13448.5, 300 sec: 14398.5). Total num frames: 487280640. Throughput: 0: 3444.2. Samples: 110984062. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:48,968][134211] Avg episode reward: [(0, '8.887')] [2025-01-04 06:06:51,527][134294] Updated weights for policy 0, policy_version 118974 (0.0027) [2025-01-04 06:06:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13516.8, 300 sec: 14412.4). Total num frames: 487346176. Throughput: 0: 3518.9. Samples: 111004750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:53,968][134211] Avg episode reward: [(0, '8.060')] [2025-01-04 06:06:54,657][134294] Updated weights for policy 0, policy_version 118984 (0.0026) [2025-01-04 06:06:57,651][134294] Updated weights for policy 0, policy_version 118994 (0.0026) [2025-01-04 06:06:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 14454.0). Total num frames: 487411712. Throughput: 0: 3473.5. Samples: 111024704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:06:58,968][134211] Avg episode reward: [(0, '9.031')] [2025-01-04 06:07:00,672][134294] Updated weights for policy 0, policy_version 119004 (0.0026) [2025-01-04 06:07:03,579][134294] Updated weights for policy 0, policy_version 119014 (0.0022) [2025-01-04 06:07:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13858.2, 300 sec: 14495.7). Total num frames: 487485440. Throughput: 0: 3498.1. Samples: 111035144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:03,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 06:07:05,596][134294] Updated weights for policy 0, policy_version 119024 (0.0011) [2025-01-04 06:07:07,587][134294] Updated weights for policy 0, policy_version 119034 (0.0015) [2025-01-04 06:07:08,968][134211] Fps is (10 sec: 17613.2, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 487587840. Throughput: 0: 3546.4. Samples: 111061938. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:08,968][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 06:07:09,739][134294] Updated weights for policy 0, policy_version 119044 (0.0012) [2025-01-04 06:07:11,741][134294] Updated weights for policy 0, policy_version 119054 (0.0012) [2025-01-04 06:07:13,896][134294] Updated weights for policy 0, policy_version 119064 (0.0016) [2025-01-04 06:07:13,968][134211] Fps is (10 sec: 20069.9, 60 sec: 14950.4, 300 sec: 14634.5). Total num frames: 487686144. Throughput: 0: 3575.1. Samples: 111092058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:13,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 06:07:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119064_487686144.pth... [2025-01-04 06:07:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000118204_484163584.pth [2025-01-04 06:07:17,253][134294] Updated weights for policy 0, policy_version 119074 (0.0029) [2025-01-04 06:07:18,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 487747584. Throughput: 0: 3596.6. Samples: 111101768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:18,968][134211] Avg episode reward: [(0, '7.499')] [2025-01-04 06:07:20,429][134294] Updated weights for policy 0, policy_version 119084 (0.0029) [2025-01-04 06:07:23,589][134294] Updated weights for policy 0, policy_version 119094 (0.0025) [2025-01-04 06:07:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14404.2, 300 sec: 14606.8). Total num frames: 487813120. Throughput: 0: 3650.2. Samples: 111121120. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:23,969][134211] Avg episode reward: [(0, '9.577')] [2025-01-04 06:07:26,559][134294] Updated weights for policy 0, policy_version 119104 (0.0025) [2025-01-04 06:07:28,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13994.6, 300 sec: 14606.7). Total num frames: 487878656. Throughput: 0: 3697.5. Samples: 111140802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:07:28,969][134211] Avg episode reward: [(0, '7.627')] [2025-01-04 06:07:29,755][134294] Updated weights for policy 0, policy_version 119114 (0.0027) [2025-01-04 06:07:32,931][134294] Updated weights for policy 0, policy_version 119124 (0.0025) [2025-01-04 06:07:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14063.0, 300 sec: 14606.8). Total num frames: 487940096. Throughput: 0: 3702.0. Samples: 111150652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:33,968][134211] Avg episode reward: [(0, '8.246')] [2025-01-04 06:07:36,224][134294] Updated weights for policy 0, policy_version 119134 (0.0025) [2025-01-04 06:07:38,968][134211] Fps is (10 sec: 12288.4, 60 sec: 14131.2, 300 sec: 14481.8). Total num frames: 488001536. Throughput: 0: 3654.6. Samples: 111169206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:38,968][134211] Avg episode reward: [(0, '8.389')] [2025-01-04 06:07:39,839][134294] Updated weights for policy 0, policy_version 119144 (0.0029) [2025-01-04 06:07:42,874][134294] Updated weights for policy 0, policy_version 119154 (0.0024) [2025-01-04 06:07:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14329.1). Total num frames: 488067072. Throughput: 0: 3631.2. Samples: 111188106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:43,968][134211] Avg episode reward: [(0, '8.368')] [2025-01-04 06:07:45,827][134294] Updated weights for policy 0, policy_version 119164 (0.0027) [2025-01-04 06:07:48,648][134294] Updated weights for policy 0, policy_version 119174 (0.0025) [2025-01-04 06:07:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14329.1). Total num frames: 488136704. Throughput: 0: 3627.2. Samples: 111198370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:48,968][134211] Avg episode reward: [(0, '8.457')] [2025-01-04 06:07:51,458][134294] Updated weights for policy 0, policy_version 119184 (0.0019) [2025-01-04 06:07:53,416][134294] Updated weights for policy 0, policy_version 119194 (0.0015) [2025-01-04 06:07:53,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.3, 300 sec: 14426.3). Total num frames: 488226816. Throughput: 0: 3547.8. Samples: 111221592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:53,968][134211] Avg episode reward: [(0, '8.917')] [2025-01-04 06:07:56,234][134294] Updated weights for policy 0, policy_version 119204 (0.0024) [2025-01-04 06:07:58,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14745.6, 300 sec: 14440.1). Total num frames: 488296448. Throughput: 0: 3398.8. Samples: 111245004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:07:58,968][134211] Avg episode reward: [(0, '7.913')] [2025-01-04 06:07:59,301][134294] Updated weights for policy 0, policy_version 119214 (0.0025) [2025-01-04 06:08:02,616][134294] Updated weights for policy 0, policy_version 119224 (0.0024) [2025-01-04 06:08:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.1, 300 sec: 14454.0). Total num frames: 488361984. Throughput: 0: 3381.5. Samples: 111253936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:03,968][134211] Avg episode reward: [(0, '7.703')] [2025-01-04 06:08:04,805][134294] Updated weights for policy 0, policy_version 119234 (0.0015) [2025-01-04 06:08:06,928][134294] Updated weights for policy 0, policy_version 119244 (0.0013) [2025-01-04 06:08:08,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 488460288. Throughput: 0: 3535.6. Samples: 111280222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:08,968][134211] Avg episode reward: [(0, '8.280')] [2025-01-04 06:08:09,052][134294] Updated weights for policy 0, policy_version 119254 (0.0013) [2025-01-04 06:08:11,062][134294] Updated weights for policy 0, policy_version 119264 (0.0013) [2025-01-04 06:08:12,952][134294] Updated weights for policy 0, policy_version 119274 (0.0013) [2025-01-04 06:08:13,968][134211] Fps is (10 sec: 20479.9, 60 sec: 14677.4, 300 sec: 14690.2). Total num frames: 488566784. Throughput: 0: 3784.0. Samples: 111311080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:13,968][134211] Avg episode reward: [(0, '8.673')] [2025-01-04 06:08:15,185][134294] Updated weights for policy 0, policy_version 119284 (0.0019) [2025-01-04 06:08:18,365][134294] Updated weights for policy 0, policy_version 119294 (0.0028) [2025-01-04 06:08:18,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14813.9, 300 sec: 14703.9). Total num frames: 488636416. Throughput: 0: 3842.4. Samples: 111323562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:18,968][134211] Avg episode reward: [(0, '8.035')] [2025-01-04 06:08:21,511][134294] Updated weights for policy 0, policy_version 119304 (0.0025) [2025-01-04 06:08:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14745.6, 300 sec: 14620.7). Total num frames: 488697856. Throughput: 0: 3854.4. Samples: 111342652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:23,968][134211] Avg episode reward: [(0, '8.329')] [2025-01-04 06:08:24,731][134294] Updated weights for policy 0, policy_version 119314 (0.0029) [2025-01-04 06:08:27,752][134294] Updated weights for policy 0, policy_version 119324 (0.0025) [2025-01-04 06:08:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14745.7, 300 sec: 14634.5). Total num frames: 488763392. Throughput: 0: 3865.8. Samples: 111362066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:08:28,969][134211] Avg episode reward: [(0, '9.001')] [2025-01-04 06:08:30,975][134294] Updated weights for policy 0, policy_version 119334 (0.0027) [2025-01-04 06:08:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14813.8, 300 sec: 14551.2). Total num frames: 488828928. Throughput: 0: 3856.9. Samples: 111371930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:33,968][134211] Avg episode reward: [(0, '8.244')] [2025-01-04 06:08:34,219][134294] Updated weights for policy 0, policy_version 119344 (0.0025) [2025-01-04 06:08:37,294][134294] Updated weights for policy 0, policy_version 119354 (0.0026) [2025-01-04 06:08:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14384.6). Total num frames: 488890368. Throughput: 0: 3763.9. Samples: 111390968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:38,968][134211] Avg episode reward: [(0, '7.593')] [2025-01-04 06:08:40,915][134294] Updated weights for policy 0, policy_version 119364 (0.0024) [2025-01-04 06:08:43,848][134294] Updated weights for policy 0, policy_version 119374 (0.0025) [2025-01-04 06:08:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.9, 300 sec: 14329.1). Total num frames: 488955904. Throughput: 0: 3668.2. Samples: 111410072. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:43,968][134211] Avg episode reward: [(0, '7.911')] [2025-01-04 06:08:46,821][134294] Updated weights for policy 0, policy_version 119384 (0.0026) [2025-01-04 06:08:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14356.8). Total num frames: 489025536. Throughput: 0: 3693.9. Samples: 111420164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:48,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 06:08:49,886][134294] Updated weights for policy 0, policy_version 119394 (0.0025) [2025-01-04 06:08:53,014][134294] Updated weights for policy 0, policy_version 119404 (0.0026) [2025-01-04 06:08:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14342.9). Total num frames: 489086976. Throughput: 0: 3554.7. Samples: 111440182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:53,968][134211] Avg episode reward: [(0, '8.560')] [2025-01-04 06:08:56,266][134294] Updated weights for policy 0, policy_version 119414 (0.0024) [2025-01-04 06:08:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14370.7). Total num frames: 489152512. Throughput: 0: 3298.5. Samples: 111459514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:08:58,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 06:08:59,414][134294] Updated weights for policy 0, policy_version 119424 (0.0028) [2025-01-04 06:09:02,604][134294] Updated weights for policy 0, policy_version 119434 (0.0023) [2025-01-04 06:09:03,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14404.3, 300 sec: 14426.3). Total num frames: 489226240. Throughput: 0: 3236.2. Samples: 111469190. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:03,968][134211] Avg episode reward: [(0, '8.575')] [2025-01-04 06:09:04,669][134294] Updated weights for policy 0, policy_version 119444 (0.0012) [2025-01-04 06:09:06,624][134294] Updated weights for policy 0, policy_version 119454 (0.0015) [2025-01-04 06:09:08,651][134294] Updated weights for policy 0, policy_version 119464 (0.0013) [2025-01-04 06:09:08,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14472.5, 300 sec: 14551.2). Total num frames: 489328640. Throughput: 0: 3432.4. Samples: 111497108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:08,968][134211] Avg episode reward: [(0, '8.628')] [2025-01-04 06:09:11,934][134294] Updated weights for policy 0, policy_version 119474 (0.0028) [2025-01-04 06:09:13,968][134211] Fps is (10 sec: 16383.6, 60 sec: 13721.6, 300 sec: 14537.3). Total num frames: 489390080. Throughput: 0: 3468.5. Samples: 111518150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:13,968][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 06:09:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119480_489390080.pth... [2025-01-04 06:09:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000118639_485945344.pth [2025-01-04 06:09:15,337][134294] Updated weights for policy 0, policy_version 119484 (0.0026) [2025-01-04 06:09:18,622][134294] Updated weights for policy 0, policy_version 119494 (0.0029) [2025-01-04 06:09:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13585.1, 300 sec: 14370.7). Total num frames: 489451520. Throughput: 0: 3455.5. Samples: 111527428. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:18,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 06:09:21,418][134294] Updated weights for policy 0, policy_version 119504 (0.0022) [2025-01-04 06:09:23,462][134294] Updated weights for policy 0, policy_version 119514 (0.0016) [2025-01-04 06:09:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13926.4, 300 sec: 14287.4). Total num frames: 489533440. Throughput: 0: 3523.2. Samples: 111549514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:23,968][134211] Avg episode reward: [(0, '7.445')] [2025-01-04 06:09:26,472][134294] Updated weights for policy 0, policy_version 119524 (0.0027) [2025-01-04 06:09:28,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13994.7, 300 sec: 14273.5). Total num frames: 489603072. Throughput: 0: 3582.3. Samples: 111571276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:28,968][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 06:09:29,681][134294] Updated weights for policy 0, policy_version 119534 (0.0027) [2025-01-04 06:09:32,888][134294] Updated weights for policy 0, policy_version 119544 (0.0029) [2025-01-04 06:09:33,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13926.4, 300 sec: 14287.4). Total num frames: 489664512. Throughput: 0: 3573.7. Samples: 111580980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:33,969][134211] Avg episode reward: [(0, '8.279')] [2025-01-04 06:09:36,018][134294] Updated weights for policy 0, policy_version 119554 (0.0026) [2025-01-04 06:09:38,198][134294] Updated weights for policy 0, policy_version 119564 (0.0014) [2025-01-04 06:09:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14267.8, 300 sec: 14343.0). Total num frames: 489746432. Throughput: 0: 3587.8. Samples: 111601634. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:38,968][134211] Avg episode reward: [(0, '9.242')] [2025-01-04 06:09:40,361][134294] Updated weights for policy 0, policy_version 119574 (0.0012) [2025-01-04 06:09:42,343][134294] Updated weights for policy 0, policy_version 119584 (0.0013) [2025-01-04 06:09:43,968][134211] Fps is (10 sec: 18432.8, 60 sec: 14882.2, 300 sec: 14467.9). Total num frames: 489848832. Throughput: 0: 3831.4. Samples: 111631926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:43,968][134211] Avg episode reward: [(0, '8.869')] [2025-01-04 06:09:44,223][134294] Updated weights for policy 0, policy_version 119594 (0.0014) [2025-01-04 06:09:46,077][134294] Updated weights for policy 0, policy_version 119604 (0.0013) [2025-01-04 06:09:48,968][134211] Fps is (10 sec: 19250.8, 60 sec: 15223.5, 300 sec: 14537.3). Total num frames: 489938944. Throughput: 0: 3974.2. Samples: 111648032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:48,968][134211] Avg episode reward: [(0, '7.918')] [2025-01-04 06:09:48,969][134294] Updated weights for policy 0, policy_version 119614 (0.0025) [2025-01-04 06:09:52,305][134294] Updated weights for policy 0, policy_version 119624 (0.0028) [2025-01-04 06:09:53,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15223.5, 300 sec: 14509.6). Total num frames: 490000384. Throughput: 0: 3791.3. Samples: 111667718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:53,968][134211] Avg episode reward: [(0, '8.448')] [2025-01-04 06:09:55,369][134294] Updated weights for policy 0, policy_version 119634 (0.0024) [2025-01-04 06:09:58,470][134294] Updated weights for policy 0, policy_version 119644 (0.0026) [2025-01-04 06:09:58,968][134211] Fps is (10 sec: 12697.2, 60 sec: 15223.4, 300 sec: 14523.4). Total num frames: 490065920. Throughput: 0: 3771.9. Samples: 111687888. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:09:58,969][134211] Avg episode reward: [(0, '8.254')] [2025-01-04 06:10:01,876][134294] Updated weights for policy 0, policy_version 119654 (0.0023) [2025-01-04 06:10:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14950.3, 300 sec: 14509.6). Total num frames: 490123264. Throughput: 0: 3766.7. Samples: 111696930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:03,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 06:10:05,585][134294] Updated weights for policy 0, policy_version 119664 (0.0030) [2025-01-04 06:10:08,968][134211] Fps is (10 sec: 11469.1, 60 sec: 14199.4, 300 sec: 14356.8). Total num frames: 490180608. Throughput: 0: 3651.1. Samples: 111713814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:08,968][134211] Avg episode reward: [(0, '8.345')] [2025-01-04 06:10:09,049][134294] Updated weights for policy 0, policy_version 119674 (0.0026) [2025-01-04 06:10:12,465][134294] Updated weights for policy 0, policy_version 119684 (0.0027) [2025-01-04 06:10:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14199.4, 300 sec: 14287.4). Total num frames: 490242048. Throughput: 0: 3565.3. Samples: 111731714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:13,968][134211] Avg episode reward: [(0, '8.409')] [2025-01-04 06:10:15,521][134294] Updated weights for policy 0, policy_version 119694 (0.0024) [2025-01-04 06:10:17,564][134294] Updated weights for policy 0, policy_version 119704 (0.0013) [2025-01-04 06:10:18,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14609.1, 300 sec: 14356.8). Total num frames: 490328064. Throughput: 0: 3591.9. Samples: 111742616. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:18,968][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 06:10:20,171][134294] Updated weights for policy 0, policy_version 119714 (0.0020) [2025-01-04 06:10:23,072][134294] Updated weights for policy 0, policy_version 119724 (0.0023) [2025-01-04 06:10:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14404.3, 300 sec: 14370.7). Total num frames: 490397696. Throughput: 0: 3683.8. Samples: 111767406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:23,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 06:10:26,063][134294] Updated weights for policy 0, policy_version 119734 (0.0023) [2025-01-04 06:10:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14398.5). Total num frames: 490467328. Throughput: 0: 3458.3. Samples: 111787550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:28,968][134211] Avg episode reward: [(0, '8.712')] [2025-01-04 06:10:29,266][134294] Updated weights for policy 0, policy_version 119744 (0.0026) [2025-01-04 06:10:32,260][134294] Updated weights for policy 0, policy_version 119754 (0.0025) [2025-01-04 06:10:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.6, 300 sec: 14342.9). Total num frames: 490532864. Throughput: 0: 3327.6. Samples: 111797772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:33,968][134211] Avg episode reward: [(0, '8.068')] [2025-01-04 06:10:35,026][134294] Updated weights for policy 0, policy_version 119764 (0.0022) [2025-01-04 06:10:37,056][134294] Updated weights for policy 0, policy_version 119774 (0.0014) [2025-01-04 06:10:38,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14745.6, 300 sec: 14315.2). Total num frames: 490631168. Throughput: 0: 3439.4. Samples: 111822492. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:38,968][134211] Avg episode reward: [(0, '9.025')] [2025-01-04 06:10:39,121][134294] Updated weights for policy 0, policy_version 119784 (0.0015) [2025-01-04 06:10:41,252][134294] Updated weights for policy 0, policy_version 119794 (0.0016) [2025-01-04 06:10:43,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14404.2, 300 sec: 14370.7). Total num frames: 490713088. Throughput: 0: 3584.5. Samples: 111849188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:43,969][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 06:10:44,256][134294] Updated weights for policy 0, policy_version 119804 (0.0023) [2025-01-04 06:10:47,569][134294] Updated weights for policy 0, policy_version 119814 (0.0026) [2025-01-04 06:10:48,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13858.1, 300 sec: 14356.8). Total num frames: 490770432. Throughput: 0: 3581.5. Samples: 111858096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:48,969][134211] Avg episode reward: [(0, '8.182')] [2025-01-04 06:10:50,833][134294] Updated weights for policy 0, policy_version 119824 (0.0027) [2025-01-04 06:10:53,965][134294] Updated weights for policy 0, policy_version 119834 (0.0027) [2025-01-04 06:10:53,969][134211] Fps is (10 sec: 12696.7, 60 sec: 13994.5, 300 sec: 14384.6). Total num frames: 490840064. Throughput: 0: 3633.6. Samples: 111877330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:53,969][134211] Avg episode reward: [(0, '8.135')] [2025-01-04 06:10:56,989][134294] Updated weights for policy 0, policy_version 119844 (0.0025) [2025-01-04 06:10:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14412.4). Total num frames: 490905600. Throughput: 0: 3680.1. Samples: 111897320. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:10:58,968][134211] Avg episode reward: [(0, '9.152')] [2025-01-04 06:11:00,087][134294] Updated weights for policy 0, policy_version 119854 (0.0024) [2025-01-04 06:11:03,204][134294] Updated weights for policy 0, policy_version 119864 (0.0025) [2025-01-04 06:11:03,968][134211] Fps is (10 sec: 13108.4, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 490971136. Throughput: 0: 3664.8. Samples: 111907530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:03,968][134211] Avg episode reward: [(0, '8.915')] [2025-01-04 06:11:06,397][134294] Updated weights for policy 0, policy_version 119874 (0.0026) [2025-01-04 06:11:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.1, 300 sec: 14412.4). Total num frames: 491040768. Throughput: 0: 3534.9. Samples: 111926476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:08,968][134211] Avg episode reward: [(0, '8.749')] [2025-01-04 06:11:09,138][134294] Updated weights for policy 0, policy_version 119884 (0.0019) [2025-01-04 06:11:11,219][134294] Updated weights for policy 0, policy_version 119894 (0.0014) [2025-01-04 06:11:13,948][134294] Updated weights for policy 0, policy_version 119904 (0.0023) [2025-01-04 06:11:13,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14745.6, 300 sec: 14495.7). Total num frames: 491126784. Throughput: 0: 3662.0. Samples: 111952340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:13,969][134211] Avg episode reward: [(0, '7.987')] [2025-01-04 06:11:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119904_491126784.pth... 
[2025-01-04 06:11:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119064_487686144.pth [2025-01-04 06:11:17,170][134294] Updated weights for policy 0, policy_version 119914 (0.0027) [2025-01-04 06:11:18,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 491188224. Throughput: 0: 3645.3. Samples: 111961810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:18,968][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 06:11:20,262][134294] Updated weights for policy 0, policy_version 119924 (0.0027) [2025-01-04 06:11:23,099][134294] Updated weights for policy 0, policy_version 119934 (0.0028) [2025-01-04 06:11:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 491257856. Throughput: 0: 3552.8. Samples: 111982370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:23,968][134211] Avg episode reward: [(0, '9.006')] [2025-01-04 06:11:26,167][134294] Updated weights for policy 0, policy_version 119944 (0.0024) [2025-01-04 06:11:28,100][134294] Updated weights for policy 0, policy_version 119954 (0.0012) [2025-01-04 06:11:28,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14677.4, 300 sec: 14412.4). Total num frames: 491347968. Throughput: 0: 3495.4. Samples: 112006480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:28,968][134211] Avg episode reward: [(0, '8.151')] [2025-01-04 06:11:30,015][134294] Updated weights for policy 0, policy_version 119964 (0.0015) [2025-01-04 06:11:31,874][134294] Updated weights for policy 0, policy_version 119974 (0.0017) [2025-01-04 06:11:33,832][134294] Updated weights for policy 0, policy_version 119984 (0.0015) [2025-01-04 06:11:33,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15360.0, 300 sec: 14579.0). Total num frames: 491454464. Throughput: 0: 3662.4. Samples: 112022902. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:33,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 06:11:37,107][134294] Updated weights for policy 0, policy_version 119994 (0.0030) [2025-01-04 06:11:38,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 491515904. Throughput: 0: 3774.1. Samples: 112047160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:38,968][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 06:11:40,722][134294] Updated weights for policy 0, policy_version 120004 (0.0029) [2025-01-04 06:11:43,734][134294] Updated weights for policy 0, policy_version 120014 (0.0024) [2025-01-04 06:11:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14404.3, 300 sec: 14565.1). Total num frames: 491577344. Throughput: 0: 3742.3. Samples: 112065726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:43,969][134211] Avg episode reward: [(0, '9.244')] [2025-01-04 06:11:46,935][134294] Updated weights for policy 0, policy_version 120024 (0.0025) [2025-01-04 06:11:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 491642880. Throughput: 0: 3726.3. Samples: 112075212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:48,968][134211] Avg episode reward: [(0, '8.503')] [2025-01-04 06:11:50,139][134294] Updated weights for policy 0, policy_version 120034 (0.0027) [2025-01-04 06:11:53,300][134294] Updated weights for policy 0, policy_version 120044 (0.0029) [2025-01-04 06:11:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.7, 300 sec: 14565.1). Total num frames: 491708416. Throughput: 0: 3735.8. Samples: 112094590. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:53,968][134211] Avg episode reward: [(0, '8.587')] [2025-01-04 06:11:56,333][134294] Updated weights for policy 0, policy_version 120054 (0.0026) [2025-01-04 06:11:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 14537.3). Total num frames: 491773952. Throughput: 0: 3606.4. Samples: 112114626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:11:58,968][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 06:11:59,430][134294] Updated weights for policy 0, policy_version 120064 (0.0024) [2025-01-04 06:12:03,014][134294] Updated weights for policy 0, policy_version 120074 (0.0030) [2025-01-04 06:12:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14336.0, 300 sec: 14384.6). Total num frames: 491831296. Throughput: 0: 3598.4. Samples: 112123736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:03,968][134211] Avg episode reward: [(0, '8.144')] [2025-01-04 06:12:05,618][134294] Updated weights for policy 0, policy_version 120084 (0.0014) [2025-01-04 06:12:07,750][134294] Updated weights for policy 0, policy_version 120094 (0.0012) [2025-01-04 06:12:08,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14677.3, 300 sec: 14356.8). Total num frames: 491921408. Throughput: 0: 3649.0. Samples: 112146574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:08,968][134211] Avg episode reward: [(0, '8.725')] [2025-01-04 06:12:10,196][134294] Updated weights for policy 0, policy_version 120104 (0.0016) [2025-01-04 06:12:13,658][134294] Updated weights for policy 0, policy_version 120114 (0.0025) [2025-01-04 06:12:13,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 491986944. Throughput: 0: 3602.2. Samples: 112168582. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:13,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 06:12:16,291][134294] Updated weights for policy 0, policy_version 120124 (0.0019) [2025-01-04 06:12:18,174][134294] Updated weights for policy 0, policy_version 120134 (0.0014) [2025-01-04 06:12:18,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14950.4, 300 sec: 14481.8). Total num frames: 492085248. Throughput: 0: 3484.8. Samples: 112179716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:18,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 06:12:20,048][134294] Updated weights for policy 0, policy_version 120144 (0.0012) [2025-01-04 06:12:22,068][134294] Updated weights for policy 0, policy_version 120154 (0.0016) [2025-01-04 06:12:23,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15223.5, 300 sec: 14551.2). Total num frames: 492171264. Throughput: 0: 3647.6. Samples: 112211302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:23,968][134211] Avg episode reward: [(0, '8.258')] [2025-01-04 06:12:25,242][134294] Updated weights for policy 0, policy_version 120164 (0.0028) [2025-01-04 06:12:28,418][134294] Updated weights for policy 0, policy_version 120174 (0.0029) [2025-01-04 06:12:28,968][134211] Fps is (10 sec: 15154.6, 60 sec: 14813.7, 300 sec: 14565.1). Total num frames: 492236800. Throughput: 0: 3665.1. Samples: 112230658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:28,969][134211] Avg episode reward: [(0, '8.360')] [2025-01-04 06:12:31,618][134294] Updated weights for policy 0, policy_version 120184 (0.0030) [2025-01-04 06:12:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14579.0). Total num frames: 492302336. Throughput: 0: 3668.3. Samples: 112240288. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:12:33,968][134211] Avg episode reward: [(0, '8.660')] [2025-01-04 06:12:34,766][134294] Updated weights for policy 0, policy_version 120194 (0.0025) [2025-01-04 06:12:37,977][134294] Updated weights for policy 0, policy_version 120204 (0.0025) [2025-01-04 06:12:38,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14131.2, 300 sec: 14565.1). Total num frames: 492363776. Throughput: 0: 3668.0. Samples: 112259648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:12:38,968][134211] Avg episode reward: [(0, '9.556')] [2025-01-04 06:12:41,428][134294] Updated weights for policy 0, policy_version 120214 (0.0026) [2025-01-04 06:12:43,971][134211] Fps is (10 sec: 12284.3, 60 sec: 14130.5, 300 sec: 14537.2). Total num frames: 492425216. Throughput: 0: 3611.7. Samples: 112277164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:12:43,972][134211] Avg episode reward: [(0, '8.636')] [2025-01-04 06:12:44,969][134294] Updated weights for policy 0, policy_version 120224 (0.0027) [2025-01-04 06:12:48,099][134294] Updated weights for policy 0, policy_version 120234 (0.0026) [2025-01-04 06:12:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14440.1). Total num frames: 492486656. Throughput: 0: 3609.5. Samples: 112286162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:12:48,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 06:12:51,044][134294] Updated weights for policy 0, policy_version 120244 (0.0027) [2025-01-04 06:12:53,968][134211] Fps is (10 sec: 13111.4, 60 sec: 14131.2, 300 sec: 14440.1). Total num frames: 492556288. Throughput: 0: 3561.4. Samples: 112306836. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:12:53,968][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 06:12:54,076][134294] Updated weights for policy 0, policy_version 120254 (0.0023) [2025-01-04 06:12:56,020][134294] Updated weights for policy 0, policy_version 120264 (0.0014) [2025-01-04 06:12:58,853][134294] Updated weights for policy 0, policy_version 120274 (0.0024) [2025-01-04 06:12:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14472.5, 300 sec: 14509.5). Total num frames: 492642304. Throughput: 0: 3621.2. Samples: 112331538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:12:58,968][134211] Avg episode reward: [(0, '8.569')] [2025-01-04 06:13:02,015][134294] Updated weights for policy 0, policy_version 120284 (0.0026) [2025-01-04 06:13:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 492703744. Throughput: 0: 3585.1. Samples: 112341046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:03,968][134211] Avg episode reward: [(0, '8.370')] [2025-01-04 06:13:05,117][134294] Updated weights for policy 0, policy_version 120294 (0.0024) [2025-01-04 06:13:08,365][134294] Updated weights for policy 0, policy_version 120304 (0.0024) [2025-01-04 06:13:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.1, 300 sec: 14245.7). Total num frames: 492769280. Throughput: 0: 3329.6. Samples: 112361132. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:08,968][134211] Avg episode reward: [(0, '8.137')] [2025-01-04 06:13:10,915][134294] Updated weights for policy 0, policy_version 120314 (0.0019) [2025-01-04 06:13:13,068][134294] Updated weights for policy 0, policy_version 120324 (0.0013) [2025-01-04 06:13:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 492859392. Throughput: 0: 3440.3. Samples: 112385472. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:13,968][134211] Avg episode reward: [(0, '8.099')] [2025-01-04 06:13:14,008][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000120328_492863488.pth... [2025-01-04 06:13:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119480_489390080.pth [2025-01-04 06:13:15,257][134294] Updated weights for policy 0, policy_version 120334 (0.0013) [2025-01-04 06:13:17,370][134294] Updated weights for policy 0, policy_version 120344 (0.0013) [2025-01-04 06:13:18,968][134211] Fps is (10 sec: 18022.4, 60 sec: 14404.2, 300 sec: 14412.4). Total num frames: 492949504. Throughput: 0: 3547.9. Samples: 112399942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:18,968][134211] Avg episode reward: [(0, '8.405')] [2025-01-04 06:13:20,440][134294] Updated weights for policy 0, policy_version 120354 (0.0025) [2025-01-04 06:13:23,636][134294] Updated weights for policy 0, policy_version 120364 (0.0030) [2025-01-04 06:13:23,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13994.7, 300 sec: 14398.5). Total num frames: 493010944. Throughput: 0: 3589.1. Samples: 112421158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:23,968][134211] Avg episode reward: [(0, '8.733')] [2025-01-04 06:13:26,783][134294] Updated weights for policy 0, policy_version 120374 (0.0026) [2025-01-04 06:13:28,969][134211] Fps is (10 sec: 12695.5, 60 sec: 13994.3, 300 sec: 14398.4). Total num frames: 493076480. Throughput: 0: 3630.9. Samples: 112440550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:28,970][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 06:13:30,029][134294] Updated weights for policy 0, policy_version 120384 (0.0023) [2025-01-04 06:13:33,098][134294] Updated weights for policy 0, policy_version 120394 (0.0026) [2025-01-04 06:13:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14412.4). Total num frames: 493142016. Throughput: 0: 3647.8. Samples: 112450312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:33,968][134211] Avg episode reward: [(0, '9.449')] [2025-01-04 06:13:36,206][134294] Updated weights for policy 0, policy_version 120404 (0.0028) [2025-01-04 06:13:38,968][134211] Fps is (10 sec: 12699.7, 60 sec: 13994.6, 300 sec: 14398.5). Total num frames: 493203456. Throughput: 0: 3621.8. Samples: 112469816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:13:38,968][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 06:13:39,664][134294] Updated weights for policy 0, policy_version 120414 (0.0025) [2025-01-04 06:13:43,017][134294] Updated weights for policy 0, policy_version 120424 (0.0024) [2025-01-04 06:13:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14132.0, 300 sec: 14398.5). Total num frames: 493273088. Throughput: 0: 3480.5. Samples: 112488160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:13:43,968][134211] Avg episode reward: [(0, '8.816')] [2025-01-04 06:13:44,968][134294] Updated weights for policy 0, policy_version 120434 (0.0014) [2025-01-04 06:13:46,794][134294] Updated weights for policy 0, policy_version 120444 (0.0013) [2025-01-04 06:13:48,713][134294] Updated weights for policy 0, policy_version 120454 (0.0012) [2025-01-04 06:13:48,967][134211] Fps is (10 sec: 18023.0, 60 sec: 14950.5, 300 sec: 14565.1). Total num frames: 493383680. Throughput: 0: 3631.4. Samples: 112504460. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:13:48,968][134211] Avg episode reward: [(0, '8.789')] [2025-01-04 06:13:50,622][134294] Updated weights for policy 0, policy_version 120464 (0.0012) [2025-01-04 06:13:53,478][134294] Updated weights for policy 0, policy_version 120474 (0.0024) [2025-01-04 06:13:53,968][134211] Fps is (10 sec: 19251.1, 60 sec: 15155.2, 300 sec: 14620.6). Total num frames: 493465600. Throughput: 0: 3853.9. Samples: 112534558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:13:53,968][134211] Avg episode reward: [(0, '8.023')] [2025-01-04 06:13:56,452][134294] Updated weights for policy 0, policy_version 120484 (0.0028) [2025-01-04 06:13:58,969][134211] Fps is (10 sec: 14743.6, 60 sec: 14813.6, 300 sec: 14592.8). Total num frames: 493531136. Throughput: 0: 3744.6. Samples: 112553984. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:13:58,969][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 06:13:59,798][134294] Updated weights for policy 0, policy_version 120494 (0.0026) [2025-01-04 06:14:03,375][134294] Updated weights for policy 0, policy_version 120504 (0.0029) [2025-01-04 06:14:03,968][134211] Fps is (10 sec: 12287.3, 60 sec: 14745.5, 300 sec: 14440.1). Total num frames: 493588480. Throughput: 0: 3630.5. Samples: 112563316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:03,969][134211] Avg episode reward: [(0, '9.481')] [2025-01-04 06:14:06,966][134294] Updated weights for policy 0, policy_version 120514 (0.0025) [2025-01-04 06:14:08,968][134211] Fps is (10 sec: 11470.0, 60 sec: 14609.1, 300 sec: 14426.2). Total num frames: 493645824. Throughput: 0: 3534.0. Samples: 112580188. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:08,968][134211] Avg episode reward: [(0, '8.969')] [2025-01-04 06:14:10,443][134294] Updated weights for policy 0, policy_version 120524 (0.0027) [2025-01-04 06:14:13,874][134294] Updated weights for policy 0, policy_version 120534 (0.0023) [2025-01-04 06:14:13,968][134211] Fps is (10 sec: 11879.1, 60 sec: 14131.2, 300 sec: 14426.3). Total num frames: 493707264. Throughput: 0: 3496.7. Samples: 112597894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:13,968][134211] Avg episode reward: [(0, '9.384')] [2025-01-04 06:14:16,766][134294] Updated weights for policy 0, policy_version 120544 (0.0024) [2025-01-04 06:14:18,728][134294] Updated weights for policy 0, policy_version 120554 (0.0015) [2025-01-04 06:14:18,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14063.0, 300 sec: 14440.1). Total num frames: 493793280. Throughput: 0: 3504.9. Samples: 112608032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:18,968][134211] Avg episode reward: [(0, '8.773')] [2025-01-04 06:14:20,643][134294] Updated weights for policy 0, policy_version 120564 (0.0013) [2025-01-04 06:14:22,528][134294] Updated weights for policy 0, policy_version 120574 (0.0014) [2025-01-04 06:14:23,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14745.6, 300 sec: 14551.2). Total num frames: 493895680. Throughput: 0: 3762.0. Samples: 112639106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:23,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 06:14:25,299][134294] Updated weights for policy 0, policy_version 120584 (0.0023) [2025-01-04 06:14:28,368][134294] Updated weights for policy 0, policy_version 120594 (0.0027) [2025-01-04 06:14:28,968][134211] Fps is (10 sec: 16383.4, 60 sec: 14677.7, 300 sec: 14551.2). Total num frames: 493957120. Throughput: 0: 3834.1. Samples: 112660696. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:28,969][134211] Avg episode reward: [(0, '9.272')] [2025-01-04 06:14:31,735][134294] Updated weights for policy 0, policy_version 120604 (0.0030) [2025-01-04 06:14:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 494022656. Throughput: 0: 3678.4. Samples: 112669988. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:33,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 06:14:34,928][134294] Updated weights for policy 0, policy_version 120614 (0.0027) [2025-01-04 06:14:38,250][134294] Updated weights for policy 0, policy_version 120624 (0.0027) [2025-01-04 06:14:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14677.3, 300 sec: 14356.8). Total num frames: 494084096. Throughput: 0: 3435.7. Samples: 112689166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:38,968][134211] Avg episode reward: [(0, '8.774')] [2025-01-04 06:14:41,673][134294] Updated weights for policy 0, policy_version 120634 (0.0025) [2025-01-04 06:14:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14540.8, 300 sec: 14259.6). Total num frames: 494145536. Throughput: 0: 3408.4. Samples: 112707356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:43,968][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 06:14:44,886][134294] Updated weights for policy 0, policy_version 120644 (0.0025) [2025-01-04 06:14:47,321][134294] Updated weights for policy 0, policy_version 120654 (0.0019) [2025-01-04 06:14:48,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14131.2, 300 sec: 14342.9). Total num frames: 494231552. Throughput: 0: 3414.4. Samples: 112716962. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:48,968][134211] Avg episode reward: [(0, '8.515')] [2025-01-04 06:14:49,192][134294] Updated weights for policy 0, policy_version 120664 (0.0012) [2025-01-04 06:14:51,135][134294] Updated weights for policy 0, policy_version 120674 (0.0014) [2025-01-04 06:14:52,987][134294] Updated weights for policy 0, policy_version 120684 (0.0014) [2025-01-04 06:14:53,967][134211] Fps is (10 sec: 19661.0, 60 sec: 14609.1, 300 sec: 14495.7). Total num frames: 494342144. Throughput: 0: 3756.6. Samples: 112749232. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:53,968][134211] Avg episode reward: [(0, '8.823')] [2025-01-04 06:14:55,207][134294] Updated weights for policy 0, policy_version 120694 (0.0017) [2025-01-04 06:14:58,373][134294] Updated weights for policy 0, policy_version 120704 (0.0031) [2025-01-04 06:14:58,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14609.3, 300 sec: 14523.4). Total num frames: 494407680. Throughput: 0: 3903.2. Samples: 112773538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:14:58,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 06:15:01,478][134294] Updated weights for policy 0, policy_version 120714 (0.0028) [2025-01-04 06:15:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14745.7, 300 sec: 14551.2). Total num frames: 494473216. Throughput: 0: 3893.4. Samples: 112783236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:03,968][134211] Avg episode reward: [(0, '8.162')] [2025-01-04 06:15:05,030][134294] Updated weights for policy 0, policy_version 120724 (0.0028) [2025-01-04 06:15:08,527][134294] Updated weights for policy 0, policy_version 120734 (0.0029) [2025-01-04 06:15:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14677.4, 300 sec: 14523.5). Total num frames: 494526464. Throughput: 0: 3591.9. Samples: 112800740. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:08,968][134211] Avg episode reward: [(0, '8.066')] [2025-01-04 06:15:12,130][134294] Updated weights for policy 0, policy_version 120744 (0.0026) [2025-01-04 06:15:13,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14677.3, 300 sec: 14440.1). Total num frames: 494587904. Throughput: 0: 3506.7. Samples: 112818498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:13,968][134211] Avg episode reward: [(0, '8.639')] [2025-01-04 06:15:13,986][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000120750_494592000.pth... [2025-01-04 06:15:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000119904_491126784.pth [2025-01-04 06:15:15,202][134294] Updated weights for policy 0, policy_version 120754 (0.0027) [2025-01-04 06:15:18,108][134294] Updated weights for policy 0, policy_version 120764 (0.0026) [2025-01-04 06:15:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14440.1). Total num frames: 494657536. Throughput: 0: 3529.1. Samples: 112828798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:18,968][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 06:15:20,338][134294] Updated weights for policy 0, policy_version 120774 (0.0016) [2025-01-04 06:15:22,181][134294] Updated weights for policy 0, policy_version 120784 (0.0012) [2025-01-04 06:15:23,968][134211] Fps is (10 sec: 18022.6, 60 sec: 14540.8, 300 sec: 14579.0). Total num frames: 494768128. Throughput: 0: 3705.5. Samples: 112855914. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:23,968][134211] Avg episode reward: [(0, '8.046')] [2025-01-04 06:15:24,123][134294] Updated weights for policy 0, policy_version 120794 (0.0013) [2025-01-04 06:15:26,184][134294] Updated weights for policy 0, policy_version 120804 (0.0014) [2025-01-04 06:15:28,939][134294] Updated weights for policy 0, policy_version 120814 (0.0021) [2025-01-04 06:15:28,968][134211] Fps is (10 sec: 19660.6, 60 sec: 14950.5, 300 sec: 14648.4). Total num frames: 494854144. Throughput: 0: 3934.5. Samples: 112884408. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:28,968][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 06:15:32,229][134294] Updated weights for policy 0, policy_version 120824 (0.0025) [2025-01-04 06:15:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14813.9, 300 sec: 14509.5). Total num frames: 494911488. Throughput: 0: 3924.7. Samples: 112893576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:33,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 06:15:35,668][134294] Updated weights for policy 0, policy_version 120834 (0.0026) [2025-01-04 06:15:38,970][134211] Fps is (10 sec: 11876.2, 60 sec: 14813.4, 300 sec: 14440.1). Total num frames: 494972928. Throughput: 0: 3613.0. Samples: 112911822. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:15:38,970][134211] Avg episode reward: [(0, '8.733')] [2025-01-04 06:15:39,139][134294] Updated weights for policy 0, policy_version 120844 (0.0026) [2025-01-04 06:15:42,565][134294] Updated weights for policy 0, policy_version 120854 (0.0027) [2025-01-04 06:15:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14813.8, 300 sec: 14454.0). Total num frames: 495034368. Throughput: 0: 3472.0. Samples: 112929776. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:15:43,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 06:15:45,566][134294] Updated weights for policy 0, policy_version 120864 (0.0023) [2025-01-04 06:15:48,564][134294] Updated weights for policy 0, policy_version 120874 (0.0026) [2025-01-04 06:15:48,968][134211] Fps is (10 sec: 13109.6, 60 sec: 14540.8, 300 sec: 14454.1). Total num frames: 495104000. Throughput: 0: 3484.5. Samples: 112940038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:15:48,968][134211] Avg episode reward: [(0, '8.898')] [2025-01-04 06:15:51,518][134294] Updated weights for policy 0, policy_version 120884 (0.0027) [2025-01-04 06:15:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13858.0, 300 sec: 14467.9). Total num frames: 495173632. Throughput: 0: 3551.5. Samples: 112960560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:15:53,969][134211] Avg episode reward: [(0, '8.982')] [2025-01-04 06:15:54,668][134294] Updated weights for policy 0, policy_version 120894 (0.0026) [2025-01-04 06:15:56,966][134294] Updated weights for policy 0, policy_version 120904 (0.0015) [2025-01-04 06:15:58,864][134294] Updated weights for policy 0, policy_version 120914 (0.0013) [2025-01-04 06:15:58,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14267.7, 300 sec: 14551.2). Total num frames: 495263744. Throughput: 0: 3709.4. Samples: 112985422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:15:58,968][134211] Avg episode reward: [(0, '7.616')] [2025-01-04 06:16:00,793][134294] Updated weights for policy 0, policy_version 120924 (0.0015) [2025-01-04 06:16:03,531][134294] Updated weights for policy 0, policy_version 120934 (0.0019) [2025-01-04 06:16:03,968][134211] Fps is (10 sec: 17613.3, 60 sec: 14609.1, 300 sec: 14606.7). Total num frames: 495349760. Throughput: 0: 3833.3. Samples: 113001296. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:03,968][134211] Avg episode reward: [(0, '9.691')] [2025-01-04 06:16:07,325][134294] Updated weights for policy 0, policy_version 120944 (0.0028) [2025-01-04 06:16:08,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14609.0, 300 sec: 14495.7). Total num frames: 495403008. Throughput: 0: 3630.8. Samples: 113019302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:08,968][134211] Avg episode reward: [(0, '8.407')] [2025-01-04 06:16:10,956][134294] Updated weights for policy 0, policy_version 120954 (0.0028) [2025-01-04 06:16:13,438][134294] Updated weights for policy 0, policy_version 120964 (0.0018) [2025-01-04 06:16:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14537.3). Total num frames: 495476736. Throughput: 0: 3435.1. Samples: 113038986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:13,968][134211] Avg episode reward: [(0, '9.681')] [2025-01-04 06:16:15,339][134294] Updated weights for policy 0, policy_version 120974 (0.0015) [2025-01-04 06:16:17,230][134294] Updated weights for policy 0, policy_version 120984 (0.0012) [2025-01-04 06:16:18,968][134211] Fps is (10 sec: 17613.1, 60 sec: 15360.0, 300 sec: 14648.4). Total num frames: 495579136. Throughput: 0: 3589.0. Samples: 113055080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:18,968][134211] Avg episode reward: [(0, '8.815')] [2025-01-04 06:16:19,654][134294] Updated weights for policy 0, policy_version 120994 (0.0019) [2025-01-04 06:16:23,109][134294] Updated weights for policy 0, policy_version 121004 (0.0028) [2025-01-04 06:16:23,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14540.8, 300 sec: 14551.2). Total num frames: 495640576. Throughput: 0: 3710.3. Samples: 113078780. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:23,969][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 06:16:26,771][134294] Updated weights for policy 0, policy_version 121014 (0.0024) [2025-01-04 06:16:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14062.9, 300 sec: 14384.6). Total num frames: 495697920. Throughput: 0: 3687.8. Samples: 113095728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:28,968][134211] Avg episode reward: [(0, '8.771')] [2025-01-04 06:16:30,159][134294] Updated weights for policy 0, policy_version 121024 (0.0027) [2025-01-04 06:16:33,406][134294] Updated weights for policy 0, policy_version 121034 (0.0026) [2025-01-04 06:16:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 14384.6). Total num frames: 495759360. Throughput: 0: 3660.8. Samples: 113104774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:33,969][134211] Avg episode reward: [(0, '7.980')] [2025-01-04 06:16:36,107][134294] Updated weights for policy 0, policy_version 121044 (0.0021) [2025-01-04 06:16:38,465][134294] Updated weights for policy 0, policy_version 121054 (0.0017) [2025-01-04 06:16:38,969][134211] Fps is (10 sec: 14334.6, 60 sec: 14472.7, 300 sec: 14454.0). Total num frames: 495841280. Throughput: 0: 3711.8. Samples: 113127594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:38,969][134211] Avg episode reward: [(0, '8.301')] [2025-01-04 06:16:41,977][134294] Updated weights for policy 0, policy_version 121064 (0.0027) [2025-01-04 06:16:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 495902720. Throughput: 0: 3585.1. Samples: 113146750. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:43,968][134211] Avg episode reward: [(0, '9.123')] [2025-01-04 06:16:45,043][134294] Updated weights for policy 0, policy_version 121074 (0.0024) [2025-01-04 06:16:47,509][134294] Updated weights for policy 0, policy_version 121084 (0.0019) [2025-01-04 06:16:48,967][134211] Fps is (10 sec: 14747.3, 60 sec: 14745.6, 300 sec: 14509.6). Total num frames: 495988736. Throughput: 0: 3459.0. Samples: 113156950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:48,968][134211] Avg episode reward: [(0, '8.217')] [2025-01-04 06:16:49,450][134294] Updated weights for policy 0, policy_version 121094 (0.0014) [2025-01-04 06:16:51,299][134294] Updated weights for policy 0, policy_version 121104 (0.0014) [2025-01-04 06:16:53,199][134294] Updated weights for policy 0, policy_version 121114 (0.0013) [2025-01-04 06:16:53,968][134211] Fps is (10 sec: 19661.3, 60 sec: 15428.4, 300 sec: 14662.3). Total num frames: 496099328. Throughput: 0: 3759.7. Samples: 113188486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:53,968][134211] Avg episode reward: [(0, '8.226')] [2025-01-04 06:16:55,406][134294] Updated weights for policy 0, policy_version 121124 (0.0020) [2025-01-04 06:16:58,588][134294] Updated weights for policy 0, policy_version 121134 (0.0030) [2025-01-04 06:16:58,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15087.0, 300 sec: 14703.9). Total num frames: 496168960. Throughput: 0: 3867.9. Samples: 113213040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:16:58,968][134211] Avg episode reward: [(0, '8.215')] [2025-01-04 06:17:01,813][134294] Updated weights for policy 0, policy_version 121144 (0.0027) [2025-01-04 06:17:03,969][134211] Fps is (10 sec: 13105.9, 60 sec: 14677.1, 300 sec: 14606.7). Total num frames: 496230400. Throughput: 0: 3723.5. Samples: 113222640. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:03,969][134211] Avg episode reward: [(0, '9.648')] [2025-01-04 06:17:05,025][134294] Updated weights for policy 0, policy_version 121154 (0.0027) [2025-01-04 06:17:08,321][134294] Updated weights for policy 0, policy_version 121164 (0.0029) [2025-01-04 06:17:08,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14813.8, 300 sec: 14592.9). Total num frames: 496291840. Throughput: 0: 3620.7. Samples: 113241712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:08,969][134211] Avg episode reward: [(0, '8.633')] [2025-01-04 06:17:12,049][134294] Updated weights for policy 0, policy_version 121174 (0.0030) [2025-01-04 06:17:13,968][134211] Fps is (10 sec: 12289.0, 60 sec: 14609.0, 300 sec: 14467.9). Total num frames: 496353280. Throughput: 0: 3624.9. Samples: 113258848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:13,968][134211] Avg episode reward: [(0, '8.974')] [2025-01-04 06:17:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000121180_496353280.pth... [2025-01-04 06:17:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000120328_492863488.pth [2025-01-04 06:17:15,148][134294] Updated weights for policy 0, policy_version 121184 (0.0027) [2025-01-04 06:17:18,116][134294] Updated weights for policy 0, policy_version 121194 (0.0025) [2025-01-04 06:17:18,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13994.6, 300 sec: 14398.5). Total num frames: 496418816. Throughput: 0: 3652.4. Samples: 113269132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:18,968][134211] Avg episode reward: [(0, '9.038')] [2025-01-04 06:17:21,070][134294] Updated weights for policy 0, policy_version 121204 (0.0024) [2025-01-04 06:17:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 496488448. 
Throughput: 0: 3602.9. Samples: 113289722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:23,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 06:17:24,129][134294] Updated weights for policy 0, policy_version 121214 (0.0027) [2025-01-04 06:17:26,157][134294] Updated weights for policy 0, policy_version 121224 (0.0014) [2025-01-04 06:17:28,757][134294] Updated weights for policy 0, policy_version 121234 (0.0021) [2025-01-04 06:17:28,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14609.1, 300 sec: 14481.8). Total num frames: 496574464. Throughput: 0: 3733.1. Samples: 113314738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:28,968][134211] Avg episode reward: [(0, '8.627')] [2025-01-04 06:17:31,763][134294] Updated weights for policy 0, policy_version 121244 (0.0026) [2025-01-04 06:17:33,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14677.4, 300 sec: 14495.7). Total num frames: 496640000. Throughput: 0: 3734.0. Samples: 113324980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:33,968][134211] Avg episode reward: [(0, '8.827')] [2025-01-04 06:17:35,004][134294] Updated weights for policy 0, policy_version 121254 (0.0028) [2025-01-04 06:17:38,150][134294] Updated weights for policy 0, policy_version 121264 (0.0026) [2025-01-04 06:17:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.5, 300 sec: 14509.7). Total num frames: 496705536. Throughput: 0: 3465.9. Samples: 113344452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:38,968][134211] Avg episode reward: [(0, '9.091')] [2025-01-04 06:17:40,930][134294] Updated weights for policy 0, policy_version 121274 (0.0018) [2025-01-04 06:17:42,948][134294] Updated weights for policy 0, policy_version 121284 (0.0013) [2025-01-04 06:17:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14950.5, 300 sec: 14620.6). Total num frames: 496799744. Throughput: 0: 3473.5. Samples: 113369346. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:43,968][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 06:17:44,886][134294] Updated weights for policy 0, policy_version 121294 (0.0014) [2025-01-04 06:17:47,432][134294] Updated weights for policy 0, policy_version 121304 (0.0034) [2025-01-04 06:17:48,967][134211] Fps is (10 sec: 18432.4, 60 sec: 15018.7, 300 sec: 14690.1). Total num frames: 496889856. Throughput: 0: 3575.5. Samples: 113383536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:48,968][134211] Avg episode reward: [(0, '8.334')] [2025-01-04 06:17:49,484][134294] Updated weights for policy 0, policy_version 121314 (0.0013) [2025-01-04 06:17:52,164][134294] Updated weights for policy 0, policy_version 121324 (0.0021) [2025-01-04 06:17:53,968][134211] Fps is (10 sec: 16383.5, 60 sec: 14404.2, 300 sec: 14648.4). Total num frames: 496963584. Throughput: 0: 3716.7. Samples: 113408964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:53,969][134211] Avg episode reward: [(0, '8.997')] [2025-01-04 06:17:55,268][134294] Updated weights for policy 0, policy_version 121334 (0.0024) [2025-01-04 06:17:58,309][134294] Updated weights for policy 0, policy_version 121344 (0.0030) [2025-01-04 06:17:58,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 497033216. Throughput: 0: 3783.0. Samples: 113429082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:17:58,968][134211] Avg episode reward: [(0, '7.873')] [2025-01-04 06:18:01,398][134294] Updated weights for policy 0, policy_version 121354 (0.0024) [2025-01-04 06:18:03,969][134211] Fps is (10 sec: 12696.1, 60 sec: 14335.9, 300 sec: 14648.3). Total num frames: 497090560. Throughput: 0: 3775.0. Samples: 113439014. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:03,970][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 06:18:05,205][134294] Updated weights for policy 0, policy_version 121364 (0.0027) [2025-01-04 06:18:08,731][134294] Updated weights for policy 0, policy_version 121374 (0.0029) [2025-01-04 06:18:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14267.8, 300 sec: 14537.3). Total num frames: 497147904. Throughput: 0: 3691.3. Samples: 113455830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:08,969][134211] Avg episode reward: [(0, '8.159')] [2025-01-04 06:18:11,867][134294] Updated weights for policy 0, policy_version 121384 (0.0020) [2025-01-04 06:18:13,847][134294] Updated weights for policy 0, policy_version 121394 (0.0014) [2025-01-04 06:18:13,968][134211] Fps is (10 sec: 13928.5, 60 sec: 14609.1, 300 sec: 14509.6). Total num frames: 497229824. Throughput: 0: 3612.8. Samples: 113477314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:13,968][134211] Avg episode reward: [(0, '8.083')] [2025-01-04 06:18:15,794][134294] Updated weights for policy 0, policy_version 121404 (0.0012) [2025-01-04 06:18:17,643][134294] Updated weights for policy 0, policy_version 121414 (0.0013) [2025-01-04 06:18:18,968][134211] Fps is (10 sec: 18841.5, 60 sec: 15291.7, 300 sec: 14662.3). Total num frames: 497336320. Throughput: 0: 3736.2. Samples: 113493110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:18,968][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 06:18:20,020][134294] Updated weights for policy 0, policy_version 121424 (0.0022) [2025-01-04 06:18:23,208][134294] Updated weights for policy 0, policy_version 121434 (0.0028) [2025-01-04 06:18:23,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15223.4, 300 sec: 14662.4). Total num frames: 497401856. Throughput: 0: 3868.6. Samples: 113518540. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:23,969][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 06:18:26,308][134294] Updated weights for policy 0, policy_version 121444 (0.0025) [2025-01-04 06:18:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 497467392. Throughput: 0: 3747.5. Samples: 113537986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:28,968][134211] Avg episode reward: [(0, '8.618')] [2025-01-04 06:18:29,524][134294] Updated weights for policy 0, policy_version 121454 (0.0025) [2025-01-04 06:18:32,606][134294] Updated weights for policy 0, policy_version 121464 (0.0023) [2025-01-04 06:18:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14882.1, 300 sec: 14676.2). Total num frames: 497532928. Throughput: 0: 3644.9. Samples: 113547558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:33,968][134211] Avg episode reward: [(0, '8.545')] [2025-01-04 06:18:35,666][134294] Updated weights for policy 0, policy_version 121474 (0.0023) [2025-01-04 06:18:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14648.4). Total num frames: 497594368. Throughput: 0: 3518.8. Samples: 113567308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:38,968][134211] Avg episode reward: [(0, '8.860')] [2025-01-04 06:18:38,986][134294] Updated weights for policy 0, policy_version 121484 (0.0027) [2025-01-04 06:18:42,326][134294] Updated weights for policy 0, policy_version 121494 (0.0027) [2025-01-04 06:18:43,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14335.8, 300 sec: 14495.6). Total num frames: 497659904. Throughput: 0: 3483.0. Samples: 113585820. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:18:43,970][134211] Avg episode reward: [(0, '8.510')] [2025-01-04 06:18:45,454][134294] Updated weights for policy 0, policy_version 121504 (0.0029) [2025-01-04 06:18:48,466][134294] Updated weights for policy 0, policy_version 121514 (0.0025) [2025-01-04 06:18:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.3, 300 sec: 14440.1). Total num frames: 497725440. Throughput: 0: 3485.3. Samples: 113595850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:18:48,968][134211] Avg episode reward: [(0, '9.263')] [2025-01-04 06:18:51,544][134294] Updated weights for policy 0, policy_version 121524 (0.0022) [2025-01-04 06:18:53,968][134211] Fps is (10 sec: 13927.4, 60 sec: 13926.5, 300 sec: 14468.0). Total num frames: 497799168. Throughput: 0: 3552.0. Samples: 113615672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:18:53,968][134211] Avg episode reward: [(0, '9.007')] [2025-01-04 06:18:54,127][134294] Updated weights for policy 0, policy_version 121534 (0.0018) [2025-01-04 06:18:56,433][134294] Updated weights for policy 0, policy_version 121544 (0.0019) [2025-01-04 06:18:58,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14062.9, 300 sec: 14537.4). Total num frames: 497876992. Throughput: 0: 3624.2. Samples: 113640404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:18:58,968][134211] Avg episode reward: [(0, '7.838')] [2025-01-04 06:18:59,374][134294] Updated weights for policy 0, policy_version 121554 (0.0025) [2025-01-04 06:19:02,455][134294] Updated weights for policy 0, policy_version 121564 (0.0025) [2025-01-04 06:19:03,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14199.8, 300 sec: 14565.1). Total num frames: 497942528. Throughput: 0: 3496.6. Samples: 113650458. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:03,968][134211] Avg episode reward: [(0, '9.286')] [2025-01-04 06:19:05,112][134294] Updated weights for policy 0, policy_version 121574 (0.0020) [2025-01-04 06:19:07,128][134294] Updated weights for policy 0, policy_version 121584 (0.0013) [2025-01-04 06:19:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 498040832. Throughput: 0: 3491.1. Samples: 113675638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:08,968][134211] Avg episode reward: [(0, '9.018')] [2025-01-04 06:19:09,398][134294] Updated weights for policy 0, policy_version 121594 (0.0018) [2025-01-04 06:19:12,760][134294] Updated weights for policy 0, policy_version 121604 (0.0028) [2025-01-04 06:19:13,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14540.8, 300 sec: 14606.7). Total num frames: 498102272. Throughput: 0: 3530.3. Samples: 113696850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:13,968][134211] Avg episode reward: [(0, '10.179')] [2025-01-04 06:19:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000121607_498102272.pth... [2025-01-04 06:19:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000120750_494592000.pth [2025-01-04 06:19:14,055][134264] Saving new best policy, reward=10.179! [2025-01-04 06:19:15,974][134294] Updated weights for policy 0, policy_version 121614 (0.0027) [2025-01-04 06:19:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 14481.8). Total num frames: 498167808. Throughput: 0: 3530.3. Samples: 113706422. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:18,968][134211] Avg episode reward: [(0, '8.458')] [2025-01-04 06:19:19,020][134294] Updated weights for policy 0, policy_version 121624 (0.0024) [2025-01-04 06:19:21,973][134294] Updated weights for policy 0, policy_version 121634 (0.0025) [2025-01-04 06:19:23,951][134294] Updated weights for policy 0, policy_version 121644 (0.0013) [2025-01-04 06:19:23,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14199.5, 300 sec: 14565.1). Total num frames: 498253824. Throughput: 0: 3553.4. Samples: 113727212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:23,968][134211] Avg episode reward: [(0, '8.798')] [2025-01-04 06:19:25,900][134294] Updated weights for policy 0, policy_version 121654 (0.0013) [2025-01-04 06:19:28,777][134294] Updated weights for policy 0, policy_version 121664 (0.0026) [2025-01-04 06:19:28,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14472.5, 300 sec: 14620.6). Total num frames: 498335744. Throughput: 0: 3764.7. Samples: 113755230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:28,968][134211] Avg episode reward: [(0, '7.620')] [2025-01-04 06:19:31,896][134294] Updated weights for policy 0, policy_version 121674 (0.0027) [2025-01-04 06:19:33,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 498401280. Throughput: 0: 3757.2. Samples: 113764924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:33,968][134211] Avg episode reward: [(0, '8.535')] [2025-01-04 06:19:35,252][134294] Updated weights for policy 0, policy_version 121684 (0.0026) [2025-01-04 06:19:38,504][134294] Updated weights for policy 0, policy_version 121694 (0.0026) [2025-01-04 06:19:38,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14404.3, 300 sec: 14620.6). Total num frames: 498458624. Throughput: 0: 3734.3. Samples: 113783718. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:38,968][134211] Avg episode reward: [(0, '9.281')] [2025-01-04 06:19:41,587][134294] Updated weights for policy 0, policy_version 121704 (0.0023) [2025-01-04 06:19:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.2, 300 sec: 14592.9). Total num frames: 498536448. Throughput: 0: 3660.8. Samples: 113805138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:43,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 06:19:44,043][134294] Updated weights for policy 0, policy_version 121714 (0.0020) [2025-01-04 06:19:47,004][134294] Updated weights for policy 0, policy_version 121724 (0.0021) [2025-01-04 06:19:48,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14677.3, 300 sec: 14454.0). Total num frames: 498606080. Throughput: 0: 3667.2. Samples: 113815480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:19:48,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 06:19:50,016][134294] Updated weights for policy 0, policy_version 121734 (0.0025) [2025-01-04 06:19:52,021][134294] Updated weights for policy 0, policy_version 121744 (0.0013) [2025-01-04 06:19:53,967][134294] Updated weights for policy 0, policy_version 121754 (0.0016) [2025-01-04 06:19:53,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15086.9, 300 sec: 14565.1). Total num frames: 498704384. Throughput: 0: 3648.3. Samples: 113839810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:19:53,968][134211] Avg episode reward: [(0, '8.851')] [2025-01-04 06:19:56,444][134294] Updated weights for policy 0, policy_version 121764 (0.0022) [2025-01-04 06:19:58,968][134211] Fps is (10 sec: 17203.2, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 498778112. Throughput: 0: 3739.4. Samples: 113865122. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:19:58,968][134211] Avg episode reward: [(0, '9.367')] [2025-01-04 06:19:59,602][134294] Updated weights for policy 0, policy_version 121774 (0.0024) [2025-01-04 06:20:02,835][134294] Updated weights for policy 0, policy_version 121784 (0.0028) [2025-01-04 06:20:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 498839552. Throughput: 0: 3743.1. Samples: 113874860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:03,968][134211] Avg episode reward: [(0, '8.755')] [2025-01-04 06:20:06,332][134294] Updated weights for policy 0, policy_version 121794 (0.0026) [2025-01-04 06:20:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14267.7, 300 sec: 14606.8). Total num frames: 498896896. Throughput: 0: 3675.2. Samples: 113892598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:08,968][134211] Avg episode reward: [(0, '9.274')] [2025-01-04 06:20:10,040][134294] Updated weights for policy 0, policy_version 121804 (0.0027) [2025-01-04 06:20:12,799][134294] Updated weights for policy 0, policy_version 121814 (0.0019) [2025-01-04 06:20:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14634.5). Total num frames: 498974720. Throughput: 0: 3499.4. Samples: 113912702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:13,968][134211] Avg episode reward: [(0, '8.621')] [2025-01-04 06:20:14,753][134294] Updated weights for policy 0, policy_version 121824 (0.0013) [2025-01-04 06:20:16,636][134294] Updated weights for policy 0, policy_version 121834 (0.0013) [2025-01-04 06:20:18,486][134294] Updated weights for policy 0, policy_version 121844 (0.0013) [2025-01-04 06:20:18,968][134211] Fps is (10 sec: 18431.2, 60 sec: 15223.3, 300 sec: 14620.6). Total num frames: 499081216. Throughput: 0: 3644.5. Samples: 113928928. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:18,969][134211] Avg episode reward: [(0, '8.699')] [2025-01-04 06:20:21,076][134294] Updated weights for policy 0, policy_version 121854 (0.0023) [2025-01-04 06:20:23,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 499150848. Throughput: 0: 3816.6. Samples: 113955464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:23,968][134211] Avg episode reward: [(0, '8.336')] [2025-01-04 06:20:24,177][134294] Updated weights for policy 0, policy_version 121864 (0.0025) [2025-01-04 06:20:27,288][134294] Updated weights for policy 0, policy_version 121874 (0.0028) [2025-01-04 06:20:28,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 499212288. Throughput: 0: 3770.0. Samples: 113974786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:28,968][134211] Avg episode reward: [(0, '8.524')] [2025-01-04 06:20:30,448][134294] Updated weights for policy 0, policy_version 121884 (0.0028) [2025-01-04 06:20:33,407][134294] Updated weights for policy 0, policy_version 121894 (0.0023) [2025-01-04 06:20:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14677.3, 300 sec: 14606.8). Total num frames: 499281920. Throughput: 0: 3766.7. Samples: 113984982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:33,969][134211] Avg episode reward: [(0, '8.586')] [2025-01-04 06:20:36,684][134294] Updated weights for policy 0, policy_version 121904 (0.0027) [2025-01-04 06:20:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 499343360. Throughput: 0: 3659.2. Samples: 114004476. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:38,968][134211] Avg episode reward: [(0, '8.006')] [2025-01-04 06:20:40,068][134294] Updated weights for policy 0, policy_version 121914 (0.0025) [2025-01-04 06:20:43,522][134294] Updated weights for policy 0, policy_version 121924 (0.0024) [2025-01-04 06:20:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14472.5, 300 sec: 14579.0). Total num frames: 499404800. Throughput: 0: 3493.6. Samples: 114022336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:43,971][134211] Avg episode reward: [(0, '9.102')] [2025-01-04 06:20:46,154][134294] Updated weights for policy 0, policy_version 121934 (0.0018) [2025-01-04 06:20:48,430][134294] Updated weights for policy 0, policy_version 121944 (0.0016) [2025-01-04 06:20:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14677.3, 300 sec: 14620.6). Total num frames: 499486720. Throughput: 0: 3527.7. Samples: 114033608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:20:48,968][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 06:20:51,444][134294] Updated weights for policy 0, policy_version 121954 (0.0025) [2025-01-04 06:20:53,971][134211] Fps is (10 sec: 15150.6, 60 sec: 14198.7, 300 sec: 14551.1). Total num frames: 499556352. Throughput: 0: 3643.0. Samples: 114056544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:20:53,972][134211] Avg episode reward: [(0, '8.342')] [2025-01-04 06:20:54,553][134294] Updated weights for policy 0, policy_version 121964 (0.0025) [2025-01-04 06:20:57,566][134294] Updated weights for policy 0, policy_version 121974 (0.0024) [2025-01-04 06:20:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14481.8). Total num frames: 499621888. Throughput: 0: 3643.1. Samples: 114076644. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:20:58,968][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 06:21:00,486][134294] Updated weights for policy 0, policy_version 121984 (0.0025) [2025-01-04 06:21:02,489][134294] Updated weights for policy 0, policy_version 121994 (0.0013) [2025-01-04 06:21:03,968][134211] Fps is (10 sec: 15159.4, 60 sec: 14472.4, 300 sec: 14592.9). Total num frames: 499707904. Throughput: 0: 3536.4. Samples: 114088066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:03,969][134211] Avg episode reward: [(0, '8.455')] [2025-01-04 06:21:05,224][134294] Updated weights for policy 0, policy_version 122004 (0.0018) [2025-01-04 06:21:08,446][134294] Updated weights for policy 0, policy_version 122014 (0.0025) [2025-01-04 06:21:08,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14609.0, 300 sec: 14565.1). Total num frames: 499773440. Throughput: 0: 3472.2. Samples: 114111716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:08,969][134211] Avg episode reward: [(0, '7.834')] [2025-01-04 06:21:11,764][134294] Updated weights for policy 0, policy_version 122024 (0.0027) [2025-01-04 06:21:13,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 499843072. Throughput: 0: 3457.5. Samples: 114130372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:13,968][134211] Avg episode reward: [(0, '7.947')] [2025-01-04 06:21:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122032_499843072.pth... 
[2025-01-04 06:21:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000121180_496353280.pth [2025-01-04 06:21:14,304][134294] Updated weights for policy 0, policy_version 122034 (0.0017) [2025-01-04 06:21:16,164][134294] Updated weights for policy 0, policy_version 122044 (0.0013) [2025-01-04 06:21:18,059][134294] Updated weights for policy 0, policy_version 122054 (0.0013) [2025-01-04 06:21:18,967][134211] Fps is (10 sec: 17613.7, 60 sec: 14472.7, 300 sec: 14606.8). Total num frames: 499949568. Throughput: 0: 3586.8. Samples: 114146386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:18,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 06:21:20,259][134294] Updated weights for policy 0, policy_version 122064 (0.0017) [2025-01-04 06:21:23,424][134294] Updated weights for policy 0, policy_version 122074 (0.0029) [2025-01-04 06:21:23,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 500019200. Throughput: 0: 3750.2. Samples: 114173236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:23,968][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 06:21:26,506][134294] Updated weights for policy 0, policy_version 122084 (0.0029) [2025-01-04 06:21:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 500084736. Throughput: 0: 3779.2. Samples: 114192398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:28,968][134211] Avg episode reward: [(0, '8.952')] [2025-01-04 06:21:29,732][134294] Updated weights for policy 0, policy_version 122094 (0.0028) [2025-01-04 06:21:32,761][134294] Updated weights for policy 0, policy_version 122104 (0.0024) [2025-01-04 06:21:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14606.8). Total num frames: 500150272. Throughput: 0: 3746.2. Samples: 114202186. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:33,969][134211] Avg episode reward: [(0, '8.748')] [2025-01-04 06:21:36,079][134294] Updated weights for policy 0, policy_version 122114 (0.0026) [2025-01-04 06:21:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.6, 300 sec: 14606.8). Total num frames: 500211712. Throughput: 0: 3658.2. Samples: 114221150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:38,968][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 06:21:39,558][134294] Updated weights for policy 0, policy_version 122124 (0.0022) [2025-01-04 06:21:41,609][134294] Updated weights for policy 0, policy_version 122134 (0.0013) [2025-01-04 06:21:43,593][134294] Updated weights for policy 0, policy_version 122144 (0.0014) [2025-01-04 06:21:43,968][134211] Fps is (10 sec: 15974.8, 60 sec: 15087.0, 300 sec: 14648.4). Total num frames: 500310016. Throughput: 0: 3773.0. Samples: 114246430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:43,968][134211] Avg episode reward: [(0, '9.655')] [2025-01-04 06:21:45,481][134294] Updated weights for policy 0, policy_version 122154 (0.0013) [2025-01-04 06:21:47,347][134294] Updated weights for policy 0, policy_version 122164 (0.0012) [2025-01-04 06:21:48,968][134211] Fps is (10 sec: 20070.6, 60 sec: 15428.3, 300 sec: 14620.6). Total num frames: 500412416. Throughput: 0: 3880.5. Samples: 114262686. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:21:48,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 06:21:49,620][134294] Updated weights for policy 0, policy_version 122174 (0.0019) [2025-01-04 06:21:52,992][134294] Updated weights for policy 0, policy_version 122184 (0.0030) [2025-01-04 06:21:53,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15292.5, 300 sec: 14592.9). Total num frames: 500473856. Throughput: 0: 3899.4. Samples: 114287186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:21:53,969][134211] Avg episode reward: [(0, '8.469')] [2025-01-04 06:21:56,316][134294] Updated weights for policy 0, policy_version 122194 (0.0028) [2025-01-04 06:21:58,968][134211] Fps is (10 sec: 12697.3, 60 sec: 15291.7, 300 sec: 14606.8). Total num frames: 500539392. Throughput: 0: 3899.1. Samples: 114305834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:21:58,968][134211] Avg episode reward: [(0, '8.220')] [2025-01-04 06:21:59,550][134294] Updated weights for policy 0, policy_version 122204 (0.0027) [2025-01-04 06:22:02,695][134294] Updated weights for policy 0, policy_version 122214 (0.0026) [2025-01-04 06:22:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14882.2, 300 sec: 14606.8). Total num frames: 500600832. Throughput: 0: 3758.3. Samples: 114315510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:03,968][134211] Avg episode reward: [(0, '8.327')] [2025-01-04 06:22:06,200][134294] Updated weights for policy 0, policy_version 122224 (0.0025) [2025-01-04 06:22:08,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14745.7, 300 sec: 14592.9). Total num frames: 500658176. Throughput: 0: 3556.2. Samples: 114333264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:08,968][134211] Avg episode reward: [(0, '7.981')] [2025-01-04 06:22:09,971][134294] Updated weights for policy 0, policy_version 122234 (0.0031) [2025-01-04 06:22:13,475][134294] Updated weights for policy 0, policy_version 122244 (0.0027) [2025-01-04 06:22:13,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 500715520. Throughput: 0: 3507.9. Samples: 114350256. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:13,968][134211] Avg episode reward: [(0, '8.036')] [2025-01-04 06:22:16,083][134294] Updated weights for policy 0, policy_version 122254 (0.0019) [2025-01-04 06:22:17,984][134294] Updated weights for policy 0, policy_version 122264 (0.0014) [2025-01-04 06:22:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14404.3, 300 sec: 14662.3). Total num frames: 500813824. Throughput: 0: 3548.8. Samples: 114361882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:18,968][134211] Avg episode reward: [(0, '8.205')] [2025-01-04 06:22:20,007][134294] Updated weights for policy 0, policy_version 122274 (0.0015) [2025-01-04 06:22:23,042][134294] Updated weights for policy 0, policy_version 122284 (0.0024) [2025-01-04 06:22:23,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14404.2, 300 sec: 14606.7). Total num frames: 500883456. Throughput: 0: 3738.6. Samples: 114389388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:23,968][134211] Avg episode reward: [(0, '8.127')] [2025-01-04 06:22:26,037][134294] Updated weights for policy 0, policy_version 122294 (0.0025) [2025-01-04 06:22:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14606.8). Total num frames: 500948992. Throughput: 0: 3610.4. Samples: 114408898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:28,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 06:22:29,361][134294] Updated weights for policy 0, policy_version 122304 (0.0026) [2025-01-04 06:22:32,498][134294] Updated weights for policy 0, policy_version 122314 (0.0027) [2025-01-04 06:22:33,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14404.3, 300 sec: 14606.8). Total num frames: 501014528. Throughput: 0: 3453.8. Samples: 114418106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:33,968][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 06:22:35,648][134294] Updated weights for policy 0, policy_version 122324 (0.0024) [2025-01-04 06:22:38,864][134294] Updated weights for policy 0, policy_version 122334 (0.0024) [2025-01-04 06:22:38,968][134211] Fps is (10 sec: 13106.2, 60 sec: 14472.4, 300 sec: 14509.5). Total num frames: 501080064. Throughput: 0: 3350.1. Samples: 114437944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:38,971][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 06:22:41,608][134294] Updated weights for policy 0, policy_version 122344 (0.0018) [2025-01-04 06:22:43,699][134294] Updated weights for policy 0, policy_version 122354 (0.0013) [2025-01-04 06:22:43,967][134211] Fps is (10 sec: 15155.4, 60 sec: 14267.7, 300 sec: 14495.7). Total num frames: 501166080. Throughput: 0: 3451.5. Samples: 114461150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:43,968][134211] Avg episode reward: [(0, '8.838')] [2025-01-04 06:22:45,622][134294] Updated weights for policy 0, policy_version 122364 (0.0012) [2025-01-04 06:22:47,576][134294] Updated weights for policy 0, policy_version 122374 (0.0014) [2025-01-04 06:22:48,968][134211] Fps is (10 sec: 18433.4, 60 sec: 14199.5, 300 sec: 14579.0). Total num frames: 501264384. Throughput: 0: 3589.3. Samples: 114477030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:48,968][134211] Avg episode reward: [(0, '8.782')] [2025-01-04 06:22:50,348][134294] Updated weights for policy 0, policy_version 122384 (0.0025) [2025-01-04 06:22:53,629][134294] Updated weights for policy 0, policy_version 122394 (0.0029) [2025-01-04 06:22:53,968][134211] Fps is (10 sec: 16383.5, 60 sec: 14267.7, 300 sec: 14565.1). Total num frames: 501329920. Throughput: 0: 3712.7. Samples: 114500336. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:22:53,968][134211] Avg episode reward: [(0, '7.881')] [2025-01-04 06:22:56,653][134294] Updated weights for policy 0, policy_version 122404 (0.0027) [2025-01-04 06:22:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.8, 300 sec: 14592.9). Total num frames: 501395456. Throughput: 0: 3767.2. Samples: 114519778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:22:58,968][134211] Avg episode reward: [(0, '8.808')] [2025-01-04 06:22:59,859][134294] Updated weights for policy 0, policy_version 122414 (0.0027) [2025-01-04 06:23:02,940][134294] Updated weights for policy 0, policy_version 122424 (0.0025) [2025-01-04 06:23:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.8, 300 sec: 14606.8). Total num frames: 501456896. Throughput: 0: 3725.6. Samples: 114529534. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:03,968][134211] Avg episode reward: [(0, '7.937')] [2025-01-04 06:23:06,399][134294] Updated weights for policy 0, policy_version 122434 (0.0026) [2025-01-04 06:23:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14404.2, 300 sec: 14551.2). Total num frames: 501522432. Throughput: 0: 3532.4. Samples: 114548344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:08,968][134211] Avg episode reward: [(0, '9.831')] [2025-01-04 06:23:09,531][134294] Updated weights for policy 0, policy_version 122444 (0.0028) [2025-01-04 06:23:13,037][134294] Updated weights for policy 0, policy_version 122454 (0.0021) [2025-01-04 06:23:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.3, 300 sec: 14384.6). Total num frames: 501579776. Throughput: 0: 3503.2. Samples: 114566542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:13,968][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 06:23:14,014][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122457_501583872.pth... 
[2025-01-04 06:23:14,089][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000121607_498102272.pth [2025-01-04 06:23:15,493][134294] Updated weights for policy 0, policy_version 122464 (0.0017) [2025-01-04 06:23:17,386][134294] Updated weights for policy 0, policy_version 122474 (0.0014) [2025-01-04 06:23:18,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14540.8, 300 sec: 14523.5). Total num frames: 501686272. Throughput: 0: 3599.9. Samples: 114580102. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:18,968][134211] Avg episode reward: [(0, '8.381')] [2025-01-04 06:23:19,359][134294] Updated weights for policy 0, policy_version 122484 (0.0014) [2025-01-04 06:23:21,669][134294] Updated weights for policy 0, policy_version 122494 (0.0020) [2025-01-04 06:23:23,968][134211] Fps is (10 sec: 18432.1, 60 sec: 14677.4, 300 sec: 14565.1). Total num frames: 501764096. Throughput: 0: 3794.9. Samples: 114608710. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:23,968][134211] Avg episode reward: [(0, '8.293')] [2025-01-04 06:23:24,994][134294] Updated weights for policy 0, policy_version 122504 (0.0028) [2025-01-04 06:23:28,208][134294] Updated weights for policy 0, policy_version 122514 (0.0028) [2025-01-04 06:23:28,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14609.0, 300 sec: 14551.2). Total num frames: 501825536. Throughput: 0: 3696.2. Samples: 114627478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:28,968][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 06:23:31,263][134294] Updated weights for policy 0, policy_version 122524 (0.0029) [2025-01-04 06:23:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14609.1, 300 sec: 14565.1). Total num frames: 501891072. Throughput: 0: 3564.8. Samples: 114637446. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:33,968][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 06:23:34,519][134294] Updated weights for policy 0, policy_version 122534 (0.0030) [2025-01-04 06:23:37,864][134294] Updated weights for policy 0, policy_version 122544 (0.0027) [2025-01-04 06:23:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14541.0, 300 sec: 14551.2). Total num frames: 501952512. Throughput: 0: 3460.5. Samples: 114656056. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:38,968][134211] Avg episode reward: [(0, '8.448')] [2025-01-04 06:23:41,246][134294] Updated weights for policy 0, policy_version 122554 (0.0025) [2025-01-04 06:23:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14551.2). Total num frames: 502018048. Throughput: 0: 3443.0. Samples: 114674714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:43,968][134211] Avg episode reward: [(0, '8.258')] [2025-01-04 06:23:44,004][134294] Updated weights for policy 0, policy_version 122564 (0.0017) [2025-01-04 06:23:45,916][134294] Updated weights for policy 0, policy_version 122574 (0.0012) [2025-01-04 06:23:47,826][134294] Updated weights for policy 0, policy_version 122584 (0.0012) [2025-01-04 06:23:48,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 502128640. Throughput: 0: 3575.3. Samples: 114690422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:48,968][134211] Avg episode reward: [(0, '8.198')] [2025-01-04 06:23:49,733][134294] Updated weights for policy 0, policy_version 122594 (0.0015) [2025-01-04 06:23:51,615][134294] Updated weights for policy 0, policy_version 122604 (0.0013) [2025-01-04 06:23:53,462][134294] Updated weights for policy 0, policy_version 122614 (0.0015) [2025-01-04 06:23:53,967][134211] Fps is (10 sec: 21708.9, 60 sec: 15087.0, 300 sec: 14773.4). Total num frames: 502235136. Throughput: 0: 3880.4. Samples: 114722960. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:53,968][134211] Avg episode reward: [(0, '9.031')] [2025-01-04 06:23:55,343][134294] Updated weights for policy 0, policy_version 122624 (0.0013) [2025-01-04 06:23:58,215][134294] Updated weights for policy 0, policy_version 122634 (0.0027) [2025-01-04 06:23:58,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15291.7, 300 sec: 14815.0). Total num frames: 502312960. Throughput: 0: 4080.4. Samples: 114750162. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:23:58,968][134211] Avg episode reward: [(0, '8.413')] [2025-01-04 06:24:01,853][134294] Updated weights for policy 0, policy_version 122644 (0.0033) [2025-01-04 06:24:03,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15291.7, 300 sec: 14690.0). Total num frames: 502374400. Throughput: 0: 3969.7. Samples: 114758742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:03,969][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 06:24:05,404][134294] Updated weights for policy 0, policy_version 122654 (0.0028) [2025-01-04 06:24:08,968][134211] Fps is (10 sec: 11468.9, 60 sec: 15087.0, 300 sec: 14662.3). Total num frames: 502427648. Throughput: 0: 3712.0. Samples: 114775752. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:08,968][134211] Avg episode reward: [(0, '8.387')] [2025-01-04 06:24:09,085][134294] Updated weights for policy 0, policy_version 122664 (0.0027) [2025-01-04 06:24:12,646][134294] Updated weights for policy 0, policy_version 122674 (0.0026) [2025-01-04 06:24:13,968][134211] Fps is (10 sec: 11059.2, 60 sec: 15086.9, 300 sec: 14634.5). Total num frames: 502484992. Throughput: 0: 3670.2. Samples: 114792636. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:13,969][134211] Avg episode reward: [(0, '8.385')] [2025-01-04 06:24:16,056][134294] Updated weights for policy 0, policy_version 122684 (0.0022) [2025-01-04 06:24:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14404.2, 300 sec: 14565.1). Total num frames: 502550528. Throughput: 0: 3655.2. Samples: 114801930. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:18,968][134211] Avg episode reward: [(0, '8.000')] [2025-01-04 06:24:19,198][134294] Updated weights for policy 0, policy_version 122694 (0.0025) [2025-01-04 06:24:22,151][134294] Updated weights for policy 0, policy_version 122704 (0.0025) [2025-01-04 06:24:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.2, 300 sec: 14495.7). Total num frames: 502611968. Throughput: 0: 3687.4. Samples: 114821988. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:23,969][134211] Avg episode reward: [(0, '9.538')] [2025-01-04 06:24:25,533][134294] Updated weights for policy 0, policy_version 122714 (0.0028) [2025-01-04 06:24:28,493][134294] Updated weights for policy 0, policy_version 122724 (0.0024) [2025-01-04 06:24:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.7, 300 sec: 14509.6). Total num frames: 502681600. Throughput: 0: 3708.8. Samples: 114841610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:28,968][134211] Avg episode reward: [(0, '7.752')] [2025-01-04 06:24:31,442][134294] Updated weights for policy 0, policy_version 122734 (0.0025) [2025-01-04 06:24:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14551.2). Total num frames: 502751232. Throughput: 0: 3590.6. Samples: 114851998. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:33,968][134211] Avg episode reward: [(0, '8.055')] [2025-01-04 06:24:34,464][134294] Updated weights for policy 0, policy_version 122744 (0.0025) [2025-01-04 06:24:37,520][134294] Updated weights for policy 0, policy_version 122754 (0.0024) [2025-01-04 06:24:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.2, 300 sec: 14509.6). Total num frames: 502816768. Throughput: 0: 3317.2. Samples: 114872234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:38,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 06:24:40,748][134294] Updated weights for policy 0, policy_version 122764 (0.0024) [2025-01-04 06:24:42,936][134294] Updated weights for policy 0, policy_version 122774 (0.0013) [2025-01-04 06:24:43,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14677.3, 300 sec: 14551.2). Total num frames: 502898688. Throughput: 0: 3217.9. Samples: 114894968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:43,968][134211] Avg episode reward: [(0, '8.745')] [2025-01-04 06:24:45,046][134294] Updated weights for policy 0, policy_version 122784 (0.0015) [2025-01-04 06:24:46,959][134294] Updated weights for policy 0, policy_version 122794 (0.0014) [2025-01-04 06:24:48,883][134294] Updated weights for policy 0, policy_version 122804 (0.0015) [2025-01-04 06:24:48,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 503005184. Throughput: 0: 3366.2. Samples: 114910218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:48,968][134211] Avg episode reward: [(0, '8.352')] [2025-01-04 06:24:50,763][134294] Updated weights for policy 0, policy_version 122814 (0.0013) [2025-01-04 06:24:52,979][134294] Updated weights for policy 0, policy_version 122824 (0.0018) [2025-01-04 06:24:53,968][134211] Fps is (10 sec: 20070.3, 60 sec: 14404.2, 300 sec: 14648.4). Total num frames: 503099392. Throughput: 0: 3695.4. Samples: 114942044. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:24:53,968][134211] Avg episode reward: [(0, '8.563')] [2025-01-04 06:24:56,020][134294] Updated weights for policy 0, policy_version 122834 (0.0025) [2025-01-04 06:24:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14131.2, 300 sec: 14648.4). Total num frames: 503160832. Throughput: 0: 3756.7. Samples: 114961688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:24:58,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 06:24:59,409][134294] Updated weights for policy 0, policy_version 122844 (0.0028) [2025-01-04 06:25:02,584][134294] Updated weights for policy 0, policy_version 122854 (0.0024) [2025-01-04 06:25:03,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14199.4, 300 sec: 14676.2). Total num frames: 503226368. Throughput: 0: 3761.8. Samples: 114971214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:03,969][134211] Avg episode reward: [(0, '8.798')] [2025-01-04 06:25:05,752][134294] Updated weights for policy 0, policy_version 122864 (0.0026) [2025-01-04 06:25:08,876][134294] Updated weights for policy 0, policy_version 122874 (0.0025) [2025-01-04 06:25:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14404.2, 300 sec: 14634.5). Total num frames: 503291904. Throughput: 0: 3745.4. Samples: 114990530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:08,969][134211] Avg episode reward: [(0, '9.315')] [2025-01-04 06:25:12,261][134294] Updated weights for policy 0, policy_version 122884 (0.0028) [2025-01-04 06:25:13,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14404.3, 300 sec: 14467.9). Total num frames: 503349248. Throughput: 0: 3723.5. Samples: 115009168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:13,969][134211] Avg episode reward: [(0, '8.371')] [2025-01-04 06:25:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122888_503349248.pth... 
[2025-01-04 06:25:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122032_499843072.pth [2025-01-04 06:25:15,706][134294] Updated weights for policy 0, policy_version 122894 (0.0028) [2025-01-04 06:25:18,638][134294] Updated weights for policy 0, policy_version 122904 (0.0025) [2025-01-04 06:25:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.6, 300 sec: 14467.9). Total num frames: 503418880. Throughput: 0: 3694.8. Samples: 115018262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:18,968][134211] Avg episode reward: [(0, '9.570')] [2025-01-04 06:25:21,610][134294] Updated weights for policy 0, policy_version 122914 (0.0026) [2025-01-04 06:25:23,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14540.8, 300 sec: 14481.8). Total num frames: 503484416. Throughput: 0: 3709.0. Samples: 115039140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:23,968][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 06:25:24,656][134294] Updated weights for policy 0, policy_version 122924 (0.0026) [2025-01-04 06:25:27,715][134294] Updated weights for policy 0, policy_version 122934 (0.0025) [2025-01-04 06:25:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14540.8, 300 sec: 14481.8). Total num frames: 503554048. Throughput: 0: 3652.4. Samples: 115059326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:28,968][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 06:25:30,651][134294] Updated weights for policy 0, policy_version 122944 (0.0022) [2025-01-04 06:25:33,512][134294] Updated weights for policy 0, policy_version 122954 (0.0023) [2025-01-04 06:25:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14540.8, 300 sec: 14509.6). Total num frames: 503623680. Throughput: 0: 3546.6. Samples: 115069816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:33,968][134211] Avg episode reward: [(0, '9.211')] [2025-01-04 06:25:36,688][134294] Updated weights for policy 0, policy_version 122964 (0.0028) [2025-01-04 06:25:38,946][134294] Updated weights for policy 0, policy_version 122974 (0.0016) [2025-01-04 06:25:38,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 503701504. Throughput: 0: 3286.7. Samples: 115089946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:38,968][134211] Avg episode reward: [(0, '8.868')] [2025-01-04 06:25:41,175][134294] Updated weights for policy 0, policy_version 122984 (0.0012) [2025-01-04 06:25:43,260][134294] Updated weights for policy 0, policy_version 122994 (0.0013) [2025-01-04 06:25:43,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14950.4, 300 sec: 14606.8). Total num frames: 503795712. Throughput: 0: 3484.5. Samples: 115118492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:43,968][134211] Avg episode reward: [(0, '8.063')] [2025-01-04 06:25:45,624][134294] Updated weights for policy 0, policy_version 123004 (0.0018) [2025-01-04 06:25:48,879][134294] Updated weights for policy 0, policy_version 123014 (0.0024) [2025-01-04 06:25:48,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14336.0, 300 sec: 14606.9). Total num frames: 503865344. Throughput: 0: 3549.9. Samples: 115130960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:48,968][134211] Avg episode reward: [(0, '9.264')] [2025-01-04 06:25:52,127][134294] Updated weights for policy 0, policy_version 123024 (0.0028) [2025-01-04 06:25:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13858.1, 300 sec: 14606.7). Total num frames: 503930880. Throughput: 0: 3549.3. Samples: 115150248. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:53,969][134211] Avg episode reward: [(0, '8.794')] [2025-01-04 06:25:55,166][134294] Updated weights for policy 0, policy_version 123034 (0.0026) [2025-01-04 06:25:58,171][134294] Updated weights for policy 0, policy_version 123044 (0.0026) [2025-01-04 06:25:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.3, 300 sec: 14537.3). Total num frames: 503996416. Throughput: 0: 3580.2. Samples: 115170276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:25:58,968][134211] Avg episode reward: [(0, '8.920')] [2025-01-04 06:26:01,145][134294] Updated weights for policy 0, policy_version 123054 (0.0026) [2025-01-04 06:26:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13926.5, 300 sec: 14537.3). Total num frames: 504061952. Throughput: 0: 3605.4. Samples: 115180504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:03,968][134211] Avg episode reward: [(0, '9.538')] [2025-01-04 06:26:04,519][134294] Updated weights for policy 0, policy_version 123064 (0.0024) [2025-01-04 06:26:07,900][134294] Updated weights for policy 0, policy_version 123074 (0.0024) [2025-01-04 06:26:08,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13858.2, 300 sec: 14509.6). Total num frames: 504123392. Throughput: 0: 3546.0. Samples: 115198710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:08,968][134211] Avg episode reward: [(0, '8.963')] [2025-01-04 06:26:10,467][134294] Updated weights for policy 0, policy_version 123084 (0.0016) [2025-01-04 06:26:12,610][134294] Updated weights for policy 0, policy_version 123094 (0.0014) [2025-01-04 06:26:13,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14472.6, 300 sec: 14467.9). Total num frames: 504217600. Throughput: 0: 3662.9. Samples: 115224158. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:13,968][134211] Avg episode reward: [(0, '8.617')] [2025-01-04 06:26:14,638][134294] Updated weights for policy 0, policy_version 123104 (0.0013) [2025-01-04 06:26:16,637][134294] Updated weights for policy 0, policy_version 123114 (0.0013) [2025-01-04 06:26:18,566][134294] Updated weights for policy 0, policy_version 123124 (0.0013) [2025-01-04 06:26:18,967][134211] Fps is (10 sec: 20070.5, 60 sec: 15087.0, 300 sec: 14592.9). Total num frames: 504324096. Throughput: 0: 3772.9. Samples: 115239596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:18,968][134211] Avg episode reward: [(0, '8.646')] [2025-01-04 06:26:21,024][134294] Updated weights for policy 0, policy_version 123134 (0.0023) [2025-01-04 06:26:23,968][134211] Fps is (10 sec: 17612.2, 60 sec: 15155.1, 300 sec: 14606.7). Total num frames: 504393728. Throughput: 0: 3914.2. Samples: 115266084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:23,969][134211] Avg episode reward: [(0, '8.065')] [2025-01-04 06:26:24,293][134294] Updated weights for policy 0, policy_version 123144 (0.0028) [2025-01-04 06:26:27,439][134294] Updated weights for policy 0, policy_version 123154 (0.0028) [2025-01-04 06:26:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.6, 300 sec: 14592.9). Total num frames: 504455168. Throughput: 0: 3699.9. Samples: 115284986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:28,968][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 06:26:30,796][134294] Updated weights for policy 0, policy_version 123164 (0.0025) [2025-01-04 06:26:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 14592.9). Total num frames: 504516608. Throughput: 0: 3632.5. Samples: 115294422. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:33,968][134211] Avg episode reward: [(0, '9.019')] [2025-01-04 06:26:34,069][134294] Updated weights for policy 0, policy_version 123174 (0.0025) [2025-01-04 06:26:37,644][134294] Updated weights for policy 0, policy_version 123184 (0.0023) [2025-01-04 06:26:38,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 504573952. Throughput: 0: 3601.9. Samples: 115312334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:38,968][134211] Avg episode reward: [(0, '8.406')] [2025-01-04 06:26:41,052][134294] Updated weights for policy 0, policy_version 123194 (0.0028) [2025-01-04 06:26:43,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13994.6, 300 sec: 14315.2). Total num frames: 504635392. Throughput: 0: 3551.1. Samples: 115330074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:43,968][134211] Avg episode reward: [(0, '7.963')] [2025-01-04 06:26:44,568][134294] Updated weights for policy 0, policy_version 123204 (0.0030) [2025-01-04 06:26:47,865][134294] Updated weights for policy 0, policy_version 123214 (0.0028) [2025-01-04 06:26:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.1, 300 sec: 14315.2). Total num frames: 504696832. Throughput: 0: 3518.4. Samples: 115338834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:48,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 06:26:50,802][134294] Updated weights for policy 0, policy_version 123224 (0.0023) [2025-01-04 06:26:53,776][134294] Updated weights for policy 0, policy_version 123234 (0.0021) [2025-01-04 06:26:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.5, 300 sec: 14329.1). Total num frames: 504766464. Throughput: 0: 3564.1. Samples: 115359094. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:53,968][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 06:26:56,741][134294] Updated weights for policy 0, policy_version 123244 (0.0026) [2025-01-04 06:26:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 504836096. Throughput: 0: 3459.9. Samples: 115379856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:26:58,968][134211] Avg episode reward: [(0, '8.628')] [2025-01-04 06:26:59,818][134294] Updated weights for policy 0, policy_version 123254 (0.0026) [2025-01-04 06:27:02,917][134294] Updated weights for policy 0, policy_version 123264 (0.0026) [2025-01-04 06:27:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14384.6). Total num frames: 504901632. Throughput: 0: 3337.6. Samples: 115389788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:03,968][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 06:27:05,555][134294] Updated weights for policy 0, policy_version 123274 (0.0019) [2025-01-04 06:27:07,403][134294] Updated weights for policy 0, policy_version 123284 (0.0013) [2025-01-04 06:27:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14609.1, 300 sec: 14523.5). Total num frames: 504999936. Throughput: 0: 3304.1. Samples: 115414768. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:08,968][134211] Avg episode reward: [(0, '8.765')] [2025-01-04 06:27:09,526][134294] Updated weights for policy 0, policy_version 123294 (0.0013) [2025-01-04 06:27:11,489][134294] Updated weights for policy 0, policy_version 123304 (0.0014) [2025-01-04 06:27:13,562][134294] Updated weights for policy 0, policy_version 123314 (0.0015) [2025-01-04 06:27:13,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14677.3, 300 sec: 14523.4). Total num frames: 505098240. Throughput: 0: 3550.2. Samples: 115444746. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:13,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 06:27:14,012][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000123316_505102336.pth... [2025-01-04 06:27:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122457_501583872.pth [2025-01-04 06:27:15,693][134294] Updated weights for policy 0, policy_version 123324 (0.0015) [2025-01-04 06:27:18,889][134294] Updated weights for policy 0, policy_version 123334 (0.0029) [2025-01-04 06:27:18,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14199.4, 300 sec: 14551.2). Total num frames: 505176064. Throughput: 0: 3653.8. Samples: 115458842. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:18,968][134211] Avg episode reward: [(0, '7.901')] [2025-01-04 06:27:22,128][134294] Updated weights for policy 0, policy_version 123344 (0.0029) [2025-01-04 06:27:23,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14063.0, 300 sec: 14537.3). Total num frames: 505237504. Throughput: 0: 3672.8. Samples: 115477608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:23,968][134211] Avg episode reward: [(0, '9.264')] [2025-01-04 06:27:25,269][134294] Updated weights for policy 0, policy_version 123354 (0.0026) [2025-01-04 06:27:28,243][134294] Updated weights for policy 0, policy_version 123364 (0.0026) [2025-01-04 06:27:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.5, 300 sec: 14551.2). Total num frames: 505307136. Throughput: 0: 3724.3. Samples: 115497668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:28,968][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 06:27:31,360][134294] Updated weights for policy 0, policy_version 123374 (0.0026) [2025-01-04 06:27:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14551.2). Total num frames: 505372672. 
Throughput: 0: 3757.0. Samples: 115507900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:33,969][134211] Avg episode reward: [(0, '8.778')] [2025-01-04 06:27:34,424][134294] Updated weights for policy 0, policy_version 123384 (0.0025) [2025-01-04 06:27:37,586][134294] Updated weights for policy 0, policy_version 123394 (0.0025) [2025-01-04 06:27:38,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14335.9, 300 sec: 14467.9). Total num frames: 505434112. Throughput: 0: 3739.8. Samples: 115527386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:38,969][134211] Avg episode reward: [(0, '8.136')] [2025-01-04 06:27:40,934][134294] Updated weights for policy 0, policy_version 123404 (0.0025) [2025-01-04 06:27:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 14356.8). Total num frames: 505499648. Throughput: 0: 3698.0. Samples: 115546266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:43,969][134211] Avg episode reward: [(0, '8.991')] [2025-01-04 06:27:44,111][134294] Updated weights for policy 0, policy_version 123414 (0.0026) [2025-01-04 06:27:47,370][134294] Updated weights for policy 0, policy_version 123424 (0.0025) [2025-01-04 06:27:48,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14472.5, 300 sec: 14356.8). Total num frames: 505565184. Throughput: 0: 3684.7. Samples: 115555600. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:48,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 06:27:50,400][134294] Updated weights for policy 0, policy_version 123434 (0.0024) [2025-01-04 06:27:53,253][134294] Updated weights for policy 0, policy_version 123444 (0.0023) [2025-01-04 06:27:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14472.5, 300 sec: 14370.7). Total num frames: 505634816. Throughput: 0: 3585.9. Samples: 115576134. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:53,968][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 06:27:56,229][134294] Updated weights for policy 0, policy_version 123454 (0.0024) [2025-01-04 06:27:58,834][134294] Updated weights for policy 0, policy_version 123464 (0.0019) [2025-01-04 06:27:58,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14540.8, 300 sec: 14412.4). Total num frames: 505708544. Throughput: 0: 3385.3. Samples: 115597084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:27:58,968][134211] Avg episode reward: [(0, '8.178')] [2025-01-04 06:28:01,005][134294] Updated weights for policy 0, policy_version 123474 (0.0017) [2025-01-04 06:28:03,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14745.6, 300 sec: 14454.0). Total num frames: 505786368. Throughput: 0: 3380.1. Samples: 115610946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:03,968][134211] Avg episode reward: [(0, '8.182')] [2025-01-04 06:28:04,145][134294] Updated weights for policy 0, policy_version 123484 (0.0025) [2025-01-04 06:28:07,405][134294] Updated weights for policy 0, policy_version 123494 (0.0021) [2025-01-04 06:28:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14336.0, 300 sec: 14509.6). Total num frames: 505860096. Throughput: 0: 3387.1. Samples: 115630028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:08,968][134211] Avg episode reward: [(0, '8.745')] [2025-01-04 06:28:09,617][134294] Updated weights for policy 0, policy_version 123504 (0.0016) [2025-01-04 06:28:11,725][134294] Updated weights for policy 0, policy_version 123514 (0.0012) [2025-01-04 06:28:13,753][134294] Updated weights for policy 0, policy_version 123524 (0.0014) [2025-01-04 06:28:13,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14267.7, 300 sec: 14467.9). Total num frames: 505954304. Throughput: 0: 3582.8. Samples: 115658892. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:13,968][134211] Avg episode reward: [(0, '7.871')] [2025-01-04 06:28:16,985][134294] Updated weights for policy 0, policy_version 123534 (0.0030) [2025-01-04 06:28:18,968][134211] Fps is (10 sec: 15564.7, 60 sec: 13994.7, 300 sec: 14412.4). Total num frames: 506015744. Throughput: 0: 3589.5. Samples: 115669426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:18,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 06:28:20,357][134294] Updated weights for policy 0, policy_version 123544 (0.0027) [2025-01-04 06:28:23,433][134294] Updated weights for policy 0, policy_version 123554 (0.0022) [2025-01-04 06:28:23,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 506081280. Throughput: 0: 3579.8. Samples: 115688474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:23,969][134211] Avg episode reward: [(0, '8.861')] [2025-01-04 06:28:26,494][134294] Updated weights for policy 0, policy_version 123564 (0.0022) [2025-01-04 06:28:28,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14062.8, 300 sec: 14440.1). Total num frames: 506150912. Throughput: 0: 3599.0. Samples: 115708224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:28,969][134211] Avg episode reward: [(0, '8.309')] [2025-01-04 06:28:29,632][134294] Updated weights for policy 0, policy_version 123574 (0.0028) [2025-01-04 06:28:32,604][134294] Updated weights for policy 0, policy_version 123584 (0.0028) [2025-01-04 06:28:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.7, 300 sec: 14440.1). Total num frames: 506212352. Throughput: 0: 3614.4. Samples: 115718246. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:33,968][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 06:28:35,746][134294] Updated weights for policy 0, policy_version 123594 (0.0024) [2025-01-04 06:28:38,759][134294] Updated weights for policy 0, policy_version 123604 (0.0025) [2025-01-04 06:28:38,968][134211] Fps is (10 sec: 13107.9, 60 sec: 14131.3, 300 sec: 14454.0). Total num frames: 506281984. Throughput: 0: 3609.4. Samples: 115738556. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:38,968][134211] Avg episode reward: [(0, '9.176')] [2025-01-04 06:28:42,105][134294] Updated weights for policy 0, policy_version 123614 (0.0024) [2025-01-04 06:28:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14063.0, 300 sec: 14287.4). Total num frames: 506343424. Throughput: 0: 3558.1. Samples: 115757200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:43,968][134211] Avg episode reward: [(0, '7.985')] [2025-01-04 06:28:44,835][134294] Updated weights for policy 0, policy_version 123624 (0.0019) [2025-01-04 06:28:46,787][134294] Updated weights for policy 0, policy_version 123634 (0.0015) [2025-01-04 06:28:48,759][134294] Updated weights for policy 0, policy_version 123644 (0.0013) [2025-01-04 06:28:48,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14745.7, 300 sec: 14287.4). Total num frames: 506449920. Throughput: 0: 3563.3. Samples: 115771292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:48,968][134211] Avg episode reward: [(0, '8.101')] [2025-01-04 06:28:50,586][134294] Updated weights for policy 0, policy_version 123654 (0.0014) [2025-01-04 06:28:53,305][134294] Updated weights for policy 0, policy_version 123664 (0.0025) [2025-01-04 06:28:53,968][134211] Fps is (10 sec: 19251.0, 60 sec: 15018.7, 300 sec: 14315.2). Total num frames: 506535936. Throughput: 0: 3807.4. Samples: 115801360. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:53,968][134211] Avg episode reward: [(0, '8.176')] [2025-01-04 06:28:56,420][134294] Updated weights for policy 0, policy_version 123674 (0.0026) [2025-01-04 06:28:58,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14882.1, 300 sec: 14329.1). Total num frames: 506601472. Throughput: 0: 3601.3. Samples: 115820950. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:28:58,968][134211] Avg episode reward: [(0, '7.821')] [2025-01-04 06:28:59,643][134294] Updated weights for policy 0, policy_version 123684 (0.0028) [2025-01-04 06:29:02,812][134294] Updated weights for policy 0, policy_version 123694 (0.0027) [2025-01-04 06:29:03,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14609.0, 300 sec: 14356.8). Total num frames: 506662912. Throughput: 0: 3581.9. Samples: 115830612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:03,969][134211] Avg episode reward: [(0, '8.592')] [2025-01-04 06:29:05,952][134294] Updated weights for policy 0, policy_version 123704 (0.0026) [2025-01-04 06:29:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.5, 300 sec: 14384.6). Total num frames: 506728448. Throughput: 0: 3594.0. Samples: 115850206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:08,968][134211] Avg episode reward: [(0, '8.772')] [2025-01-04 06:29:09,144][134294] Updated weights for policy 0, policy_version 123714 (0.0027) [2025-01-04 06:29:12,302][134294] Updated weights for policy 0, policy_version 123724 (0.0028) [2025-01-04 06:29:13,968][134211] Fps is (10 sec: 12696.9, 60 sec: 13926.2, 300 sec: 14370.7). Total num frames: 506789888. Throughput: 0: 3575.2. Samples: 115869110. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:13,969][134211] Avg episode reward: [(0, '7.702')] [2025-01-04 06:29:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000123729_506793984.pth... 
[2025-01-04 06:29:14,115][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000122888_503349248.pth [2025-01-04 06:29:15,686][134294] Updated weights for policy 0, policy_version 123734 (0.0024) [2025-01-04 06:29:18,806][134294] Updated weights for policy 0, policy_version 123744 (0.0025) [2025-01-04 06:29:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.6, 300 sec: 14384.6). Total num frames: 506855424. Throughput: 0: 3560.2. Samples: 115878454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:18,968][134211] Avg episode reward: [(0, '8.329')] [2025-01-04 06:29:21,362][134294] Updated weights for policy 0, policy_version 123754 (0.0021) [2025-01-04 06:29:23,542][134294] Updated weights for policy 0, policy_version 123764 (0.0016) [2025-01-04 06:29:23,968][134211] Fps is (10 sec: 15156.0, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 506941440. Throughput: 0: 3623.1. Samples: 115901594. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:23,968][134211] Avg episode reward: [(0, '7.992')] [2025-01-04 06:29:26,744][134294] Updated weights for policy 0, policy_version 123774 (0.0025) [2025-01-04 06:29:28,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14267.9, 300 sec: 14426.3). Total num frames: 507006976. Throughput: 0: 3676.3. Samples: 115922634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:28,968][134211] Avg episode reward: [(0, '8.280')] [2025-01-04 06:29:29,828][134294] Updated weights for policy 0, policy_version 123784 (0.0025) [2025-01-04 06:29:32,751][134294] Updated weights for policy 0, policy_version 123794 (0.0024) [2025-01-04 06:29:33,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 507080704. Throughput: 0: 3584.6. Samples: 115932598. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:33,968][134211] Avg episode reward: [(0, '8.177')] [2025-01-04 06:29:34,839][134294] Updated weights for policy 0, policy_version 123804 (0.0014) [2025-01-04 06:29:37,900][134294] Updated weights for policy 0, policy_version 123814 (0.0027) [2025-01-04 06:29:38,968][134211] Fps is (10 sec: 14744.8, 60 sec: 14540.7, 300 sec: 14426.2). Total num frames: 507154432. Throughput: 0: 3450.8. Samples: 115956646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:38,969][134211] Avg episode reward: [(0, '7.926')] [2025-01-04 06:29:41,151][134294] Updated weights for policy 0, policy_version 123824 (0.0023) [2025-01-04 06:29:43,262][134294] Updated weights for policy 0, policy_version 123834 (0.0014) [2025-01-04 06:29:43,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14882.1, 300 sec: 14342.9). Total num frames: 507236352. Throughput: 0: 3512.9. Samples: 115979028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:43,968][134211] Avg episode reward: [(0, '8.531')] [2025-01-04 06:29:45,341][134294] Updated weights for policy 0, policy_version 123844 (0.0013) [2025-01-04 06:29:47,627][134294] Updated weights for policy 0, policy_version 123854 (0.0018) [2025-01-04 06:29:48,968][134211] Fps is (10 sec: 16384.6, 60 sec: 14472.5, 300 sec: 14301.3). Total num frames: 507318272. Throughput: 0: 3631.7. Samples: 115994038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:48,968][134211] Avg episode reward: [(0, '8.295')] [2025-01-04 06:29:50,948][134294] Updated weights for policy 0, policy_version 123864 (0.0029) [2025-01-04 06:29:53,946][134294] Updated weights for policy 0, policy_version 123874 (0.0025) [2025-01-04 06:29:53,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14199.5, 300 sec: 14329.1). Total num frames: 507387904. Throughput: 0: 3643.5. Samples: 116014164. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:53,968][134211] Avg episode reward: [(0, '7.674')] [2025-01-04 06:29:57,290][134294] Updated weights for policy 0, policy_version 123884 (0.0026) [2025-01-04 06:29:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 507449344. Throughput: 0: 3651.5. Samples: 116033426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:29:58,968][134211] Avg episode reward: [(0, '7.841')] [2025-01-04 06:30:00,261][134294] Updated weights for policy 0, policy_version 123894 (0.0023) [2025-01-04 06:30:03,441][134294] Updated weights for policy 0, policy_version 123904 (0.0024) [2025-01-04 06:30:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14199.5, 300 sec: 14315.2). Total num frames: 507514880. Throughput: 0: 3670.4. Samples: 116043624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:03,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 06:30:06,823][134294] Updated weights for policy 0, policy_version 123914 (0.0030) [2025-01-04 06:30:08,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14131.1, 300 sec: 14329.1). Total num frames: 507576320. Throughput: 0: 3569.5. Samples: 116062222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:08,969][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 06:30:10,123][134294] Updated weights for policy 0, policy_version 123924 (0.0022) [2025-01-04 06:30:12,258][134294] Updated weights for policy 0, policy_version 123934 (0.0014) [2025-01-04 06:30:13,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14541.0, 300 sec: 14384.6). Total num frames: 507662336. Throughput: 0: 3626.5. Samples: 116085826. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:13,968][134211] Avg episode reward: [(0, '8.548')] [2025-01-04 06:30:14,460][134294] Updated weights for policy 0, policy_version 123944 (0.0016) [2025-01-04 06:30:16,441][134294] Updated weights for policy 0, policy_version 123954 (0.0013) [2025-01-04 06:30:18,468][134294] Updated weights for policy 0, policy_version 123964 (0.0013) [2025-01-04 06:30:18,968][134211] Fps is (10 sec: 18842.6, 60 sec: 15155.2, 300 sec: 14509.6). Total num frames: 507764736. Throughput: 0: 3735.0. Samples: 116100674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:18,968][134211] Avg episode reward: [(0, '8.795')] [2025-01-04 06:30:20,779][134294] Updated weights for policy 0, policy_version 123974 (0.0018) [2025-01-04 06:30:23,968][134211] Fps is (10 sec: 17202.6, 60 sec: 14882.1, 300 sec: 14509.5). Total num frames: 507834368. Throughput: 0: 3788.6. Samples: 116127130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:23,969][134211] Avg episode reward: [(0, '8.066')] [2025-01-04 06:30:24,057][134294] Updated weights for policy 0, policy_version 123984 (0.0033) [2025-01-04 06:30:27,161][134294] Updated weights for policy 0, policy_version 123994 (0.0028) [2025-01-04 06:30:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14882.1, 300 sec: 14495.7). Total num frames: 507899904. Throughput: 0: 3714.5. Samples: 116146180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:28,968][134211] Avg episode reward: [(0, '9.346')] [2025-01-04 06:30:30,362][134294] Updated weights for policy 0, policy_version 124004 (0.0027) [2025-01-04 06:30:33,385][134294] Updated weights for policy 0, policy_version 124014 (0.0026) [2025-01-04 06:30:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14745.6, 300 sec: 14454.0). Total num frames: 507965440. Throughput: 0: 3604.7. Samples: 116156248. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:33,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 06:30:36,434][134294] Updated weights for policy 0, policy_version 124024 (0.0028) [2025-01-04 06:30:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.2, 300 sec: 14356.8). Total num frames: 508030976. Throughput: 0: 3598.0. Samples: 116176072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:38,968][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 06:30:39,791][134294] Updated weights for policy 0, policy_version 124034 (0.0029) [2025-01-04 06:30:43,108][134294] Updated weights for policy 0, policy_version 124044 (0.0030) [2025-01-04 06:30:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14329.1). Total num frames: 508092416. Throughput: 0: 3573.9. Samples: 116194252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:43,968][134211] Avg episode reward: [(0, '8.302')] [2025-01-04 06:30:46,412][134294] Updated weights for policy 0, policy_version 124054 (0.0026) [2025-01-04 06:30:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 508153856. Throughput: 0: 3554.5. Samples: 116203576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:48,968][134211] Avg episode reward: [(0, '8.429')] [2025-01-04 06:30:49,835][134294] Updated weights for policy 0, policy_version 124064 (0.0027) [2025-01-04 06:30:52,768][134294] Updated weights for policy 0, policy_version 124074 (0.0026) [2025-01-04 06:30:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 14315.2). Total num frames: 508219392. Throughput: 0: 3570.9. Samples: 116222912. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:53,968][134211] Avg episode reward: [(0, '8.309')] [2025-01-04 06:30:55,811][134294] Updated weights for policy 0, policy_version 124084 (0.0027) [2025-01-04 06:30:58,717][134294] Updated weights for policy 0, policy_version 124094 (0.0026) [2025-01-04 06:30:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14329.1). Total num frames: 508289024. Throughput: 0: 3503.9. Samples: 116243502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:30:58,968][134211] Avg episode reward: [(0, '7.969')] [2025-01-04 06:31:01,769][134294] Updated weights for policy 0, policy_version 124104 (0.0027) [2025-01-04 06:31:03,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14063.0, 300 sec: 14356.8). Total num frames: 508358656. Throughput: 0: 3401.3. Samples: 116253734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:31:03,968][134211] Avg episode reward: [(0, '8.526')] [2025-01-04 06:31:04,435][134294] Updated weights for policy 0, policy_version 124114 (0.0018) [2025-01-04 06:31:06,321][134294] Updated weights for policy 0, policy_version 124124 (0.0014) [2025-01-04 06:31:08,350][134294] Updated weights for policy 0, policy_version 124134 (0.0014) [2025-01-04 06:31:08,967][134211] Fps is (10 sec: 17203.6, 60 sec: 14745.7, 300 sec: 14384.6). Total num frames: 508461056. Throughput: 0: 3408.0. Samples: 116280488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:31:08,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 06:31:10,425][134294] Updated weights for policy 0, policy_version 124144 (0.0013) [2025-01-04 06:31:13,400][134294] Updated weights for policy 0, policy_version 124154 (0.0029) [2025-01-04 06:31:13,968][134211] Fps is (10 sec: 18021.8, 60 sec: 14609.0, 300 sec: 14287.4). Total num frames: 508538880. Throughput: 0: 3556.8. Samples: 116306238. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:13,968][134211] Avg episode reward: [(0, '8.676')] [2025-01-04 06:31:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124155_508538880.pth... [2025-01-04 06:31:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000123316_505102336.pth [2025-01-04 06:31:16,975][134294] Updated weights for policy 0, policy_version 124164 (0.0030) [2025-01-04 06:31:18,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13858.1, 300 sec: 14245.8). Total num frames: 508596224. Throughput: 0: 3524.4. Samples: 116314846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:18,968][134211] Avg episode reward: [(0, '8.900')] [2025-01-04 06:31:20,335][134294] Updated weights for policy 0, policy_version 124174 (0.0025) [2025-01-04 06:31:23,365][134294] Updated weights for policy 0, policy_version 124184 (0.0026) [2025-01-04 06:31:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.9, 300 sec: 14259.6). Total num frames: 508661760. Throughput: 0: 3506.0. Samples: 116333844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:23,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 06:31:26,443][134294] Updated weights for policy 0, policy_version 124194 (0.0028) [2025-01-04 06:31:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.2, 300 sec: 14287.4). Total num frames: 508731392. Throughput: 0: 3538.7. Samples: 116353494. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:28,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 06:31:29,575][134294] Updated weights for policy 0, policy_version 124204 (0.0026) [2025-01-04 06:31:32,665][134294] Updated weights for policy 0, policy_version 124214 (0.0025) [2025-01-04 06:31:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14315.2). Total num frames: 508796928. 
Throughput: 0: 3554.0. Samples: 116363506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:33,968][134211] Avg episode reward: [(0, '8.459')] [2025-01-04 06:31:35,579][134294] Updated weights for policy 0, policy_version 124224 (0.0025) [2025-01-04 06:31:38,240][134294] Updated weights for policy 0, policy_version 124234 (0.0019) [2025-01-04 06:31:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14063.0, 300 sec: 14370.7). Total num frames: 508874752. Throughput: 0: 3572.6. Samples: 116383678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:38,968][134211] Avg episode reward: [(0, '8.020')] [2025-01-04 06:31:40,295][134294] Updated weights for policy 0, policy_version 124244 (0.0015) [2025-01-04 06:31:42,947][134294] Updated weights for policy 0, policy_version 124254 (0.0023) [2025-01-04 06:31:43,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14336.0, 300 sec: 14426.2). Total num frames: 508952576. Throughput: 0: 3701.0. Samples: 116410046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:43,969][134211] Avg episode reward: [(0, '9.088')] [2025-01-04 06:31:46,373][134294] Updated weights for policy 0, policy_version 124264 (0.0028) [2025-01-04 06:31:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.1, 300 sec: 14398.5). Total num frames: 509014016. Throughput: 0: 3667.2. Samples: 116418756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:48,968][134211] Avg episode reward: [(0, '8.442')] [2025-01-04 06:31:49,414][134294] Updated weights for policy 0, policy_version 124274 (0.0021) [2025-01-04 06:31:51,340][134294] Updated weights for policy 0, policy_version 124284 (0.0012) [2025-01-04 06:31:53,337][134294] Updated weights for policy 0, policy_version 124294 (0.0013) [2025-01-04 06:31:53,967][134211] Fps is (10 sec: 16794.2, 60 sec: 15018.7, 300 sec: 14523.5). Total num frames: 509120512. Throughput: 0: 3635.4. Samples: 116444082. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:53,968][134211] Avg episode reward: [(0, '8.685')] [2025-01-04 06:31:55,232][134294] Updated weights for policy 0, policy_version 124304 (0.0014) [2025-01-04 06:31:57,849][134294] Updated weights for policy 0, policy_version 124314 (0.0024) [2025-01-04 06:31:58,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15223.4, 300 sec: 14579.0). Total num frames: 509202432. Throughput: 0: 3678.3. Samples: 116471762. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:31:58,968][134211] Avg episode reward: [(0, '8.206')] [2025-01-04 06:32:01,079][134294] Updated weights for policy 0, policy_version 124324 (0.0029) [2025-01-04 06:32:03,970][134211] Fps is (10 sec: 14332.2, 60 sec: 15086.3, 300 sec: 14453.9). Total num frames: 509263872. Throughput: 0: 3693.5. Samples: 116481060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:32:03,971][134211] Avg episode reward: [(0, '8.115')] [2025-01-04 06:32:04,539][134294] Updated weights for policy 0, policy_version 124334 (0.0025) [2025-01-04 06:32:08,007][134294] Updated weights for policy 0, policy_version 124344 (0.0025) [2025-01-04 06:32:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14335.9, 300 sec: 14315.2). Total num frames: 509321216. Throughput: 0: 3670.8. Samples: 116499030. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 06:32:08,968][134211] Avg episode reward: [(0, '8.505')] [2025-01-04 06:32:11,657][134294] Updated weights for policy 0, policy_version 124354 (0.0027) [2025-01-04 06:32:13,969][134211] Fps is (10 sec: 11470.5, 60 sec: 13994.5, 300 sec: 14245.7). Total num frames: 509378560. Throughput: 0: 3605.3. Samples: 116515736. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:13,969][134211] Avg episode reward: [(0, '8.808')] [2025-01-04 06:32:15,384][134294] Updated weights for policy 0, policy_version 124364 (0.0022) [2025-01-04 06:32:18,525][134294] Updated weights for policy 0, policy_version 124374 (0.0023) [2025-01-04 06:32:18,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14131.3, 300 sec: 14259.6). Total num frames: 509444096. Throughput: 0: 3576.0. Samples: 116524426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:18,969][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 06:32:20,616][134294] Updated weights for policy 0, policy_version 124384 (0.0016) [2025-01-04 06:32:23,559][134294] Updated weights for policy 0, policy_version 124394 (0.0025) [2025-01-04 06:32:23,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14335.9, 300 sec: 14287.4). Total num frames: 509521920. Throughput: 0: 3654.8. Samples: 116548146. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:23,969][134211] Avg episode reward: [(0, '9.647')] [2025-01-04 06:32:26,756][134294] Updated weights for policy 0, policy_version 124404 (0.0025) [2025-01-04 06:32:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14273.5). Total num frames: 509583360. Throughput: 0: 3494.8. Samples: 116567312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:28,968][134211] Avg episode reward: [(0, '8.090')] [2025-01-04 06:32:29,963][134294] Updated weights for policy 0, policy_version 124414 (0.0023) [2025-01-04 06:32:32,913][134294] Updated weights for policy 0, policy_version 124424 (0.0025) [2025-01-04 06:32:33,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14267.7, 300 sec: 14301.3). Total num frames: 509652992. Throughput: 0: 3523.6. Samples: 116577320. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:33,968][134211] Avg episode reward: [(0, '8.880')] [2025-01-04 06:32:35,796][134294] Updated weights for policy 0, policy_version 124434 (0.0021) [2025-01-04 06:32:37,709][134294] Updated weights for policy 0, policy_version 124444 (0.0013) [2025-01-04 06:32:38,967][134211] Fps is (10 sec: 16384.2, 60 sec: 14540.8, 300 sec: 14398.5). Total num frames: 509747200. Throughput: 0: 3489.5. Samples: 116601110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:38,968][134211] Avg episode reward: [(0, '7.397')] [2025-01-04 06:32:39,592][134294] Updated weights for policy 0, policy_version 124454 (0.0015) [2025-01-04 06:32:41,617][134294] Updated weights for policy 0, policy_version 124464 (0.0014) [2025-01-04 06:32:43,557][134294] Updated weights for policy 0, policy_version 124474 (0.0013) [2025-01-04 06:32:43,967][134211] Fps is (10 sec: 19661.4, 60 sec: 14950.5, 300 sec: 14523.5). Total num frames: 509849600. Throughput: 0: 3572.5. Samples: 116632524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:43,968][134211] Avg episode reward: [(0, '9.263')] [2025-01-04 06:32:45,906][134294] Updated weights for policy 0, policy_version 124484 (0.0018) [2025-01-04 06:32:48,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15155.1, 300 sec: 14537.3). Total num frames: 509923328. Throughput: 0: 3659.8. Samples: 116645744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:48,969][134211] Avg episode reward: [(0, '9.015')] [2025-01-04 06:32:49,372][134294] Updated weights for policy 0, policy_version 124494 (0.0029) [2025-01-04 06:32:52,668][134294] Updated weights for policy 0, policy_version 124504 (0.0027) [2025-01-04 06:32:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14404.2, 300 sec: 14495.7). Total num frames: 509984768. Throughput: 0: 3663.7. Samples: 116663898. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:53,968][134211] Avg episode reward: [(0, '8.618')] [2025-01-04 06:32:55,758][134294] Updated weights for policy 0, policy_version 124514 (0.0027) [2025-01-04 06:32:58,748][134294] Updated weights for policy 0, policy_version 124524 (0.0028) [2025-01-04 06:32:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14131.2, 300 sec: 14454.0). Total num frames: 510050304. Throughput: 0: 3737.9. Samples: 116683940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:32:58,968][134211] Avg episode reward: [(0, '8.170')] [2025-01-04 06:33:01,831][134294] Updated weights for policy 0, policy_version 124534 (0.0028) [2025-01-04 06:33:03,969][134211] Fps is (10 sec: 13106.2, 60 sec: 14199.8, 300 sec: 14426.2). Total num frames: 510115840. Throughput: 0: 3760.5. Samples: 116693654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:33:03,969][134211] Avg episode reward: [(0, '9.107')] [2025-01-04 06:33:05,130][134294] Updated weights for policy 0, policy_version 124544 (0.0027) [2025-01-04 06:33:08,364][134294] Updated weights for policy 0, policy_version 124554 (0.0026) [2025-01-04 06:33:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.8, 300 sec: 14315.2). Total num frames: 510177280. Throughput: 0: 3656.4. Samples: 116712684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 06:33:08,968][134211] Avg episode reward: [(0, '8.427')] [2025-01-04 06:33:11,631][134294] Updated weights for policy 0, policy_version 124564 (0.0024) [2025-01-04 06:33:13,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14404.4, 300 sec: 14329.0). Total num frames: 510242816. Throughput: 0: 3645.2. Samples: 116731348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:13,969][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 06:33:13,986][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124571_510242816.pth... 
[2025-01-04 06:33:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000123729_506793984.pth [2025-01-04 06:33:15,014][134294] Updated weights for policy 0, policy_version 124574 (0.0027) [2025-01-04 06:33:18,140][134294] Updated weights for policy 0, policy_version 124584 (0.0026) [2025-01-04 06:33:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 510304256. Throughput: 0: 3629.4. Samples: 116740642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:18,968][134211] Avg episode reward: [(0, '7.872')] [2025-01-04 06:33:21,181][134294] Updated weights for policy 0, policy_version 124594 (0.0024) [2025-01-04 06:33:23,568][134294] Updated weights for policy 0, policy_version 124604 (0.0016) [2025-01-04 06:33:23,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14336.1, 300 sec: 14343.0). Total num frames: 510382080. Throughput: 0: 3545.9. Samples: 116760676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:23,968][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 06:33:25,482][134294] Updated weights for policy 0, policy_version 124614 (0.0013) [2025-01-04 06:33:27,747][134294] Updated weights for policy 0, policy_version 124624 (0.0016) [2025-01-04 06:33:28,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14813.9, 300 sec: 14440.1). Total num frames: 510472192. Throughput: 0: 3489.5. Samples: 116789554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:28,968][134211] Avg episode reward: [(0, '7.842')] [2025-01-04 06:33:30,889][134294] Updated weights for policy 0, policy_version 124634 (0.0023) [2025-01-04 06:33:33,892][134294] Updated weights for policy 0, policy_version 124644 (0.0027) [2025-01-04 06:33:33,968][134211] Fps is (10 sec: 15973.2, 60 sec: 14813.7, 300 sec: 14440.1). Total num frames: 510541824. Throughput: 0: 3414.0. Samples: 116799378. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:33,969][134211] Avg episode reward: [(0, '8.589')] [2025-01-04 06:33:37,003][134294] Updated weights for policy 0, policy_version 124654 (0.0025) [2025-01-04 06:33:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14440.1). Total num frames: 510603264. Throughput: 0: 3458.1. Samples: 116819514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:38,968][134211] Avg episode reward: [(0, '8.224')] [2025-01-04 06:33:40,495][134294] Updated weights for policy 0, policy_version 124664 (0.0026) [2025-01-04 06:33:43,421][134294] Updated weights for policy 0, policy_version 124674 (0.0021) [2025-01-04 06:33:43,968][134211] Fps is (10 sec: 13108.2, 60 sec: 13721.6, 300 sec: 14315.2). Total num frames: 510672896. Throughput: 0: 3429.8. Samples: 116838282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:43,968][134211] Avg episode reward: [(0, '8.602')] [2025-01-04 06:33:45,391][134294] Updated weights for policy 0, policy_version 124684 (0.0013) [2025-01-04 06:33:47,365][134294] Updated weights for policy 0, policy_version 124694 (0.0013) [2025-01-04 06:33:48,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14199.5, 300 sec: 14370.7). Total num frames: 510775296. Throughput: 0: 3557.5. Samples: 116853736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:48,968][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 06:33:49,651][134294] Updated weights for policy 0, policy_version 124704 (0.0020) [2025-01-04 06:33:52,841][134294] Updated weights for policy 0, policy_version 124714 (0.0029) [2025-01-04 06:33:53,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14267.8, 300 sec: 14370.7). Total num frames: 510840832. Throughput: 0: 3682.6. Samples: 116878400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:53,968][134211] Avg episode reward: [(0, '7.721')] [2025-01-04 06:33:55,965][134294] Updated weights for policy 0, policy_version 124724 (0.0026) [2025-01-04 06:33:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14384.6). Total num frames: 510906368. Throughput: 0: 3700.2. Samples: 116897858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:33:58,968][134211] Avg episode reward: [(0, '8.669')] [2025-01-04 06:33:59,131][134294] Updated weights for policy 0, policy_version 124734 (0.0024) [2025-01-04 06:34:02,071][134294] Updated weights for policy 0, policy_version 124744 (0.0029) [2025-01-04 06:34:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.9, 300 sec: 14384.6). Total num frames: 510971904. Throughput: 0: 3715.2. Samples: 116907826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:34:03,968][134211] Avg episode reward: [(0, '8.774')] [2025-01-04 06:34:05,208][134294] Updated weights for policy 0, policy_version 124754 (0.0023) [2025-01-04 06:34:08,504][134294] Updated weights for policy 0, policy_version 124764 (0.0028) [2025-01-04 06:34:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14335.9, 300 sec: 14398.5). Total num frames: 511037440. Throughput: 0: 3715.1. Samples: 116927856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:34:08,969][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 06:34:11,871][134294] Updated weights for policy 0, policy_version 124774 (0.0024) [2025-01-04 06:34:13,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 511107072. Throughput: 0: 3493.7. Samples: 116946770. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:34:13,968][134211] Avg episode reward: [(0, '8.760')] [2025-01-04 06:34:14,339][134294] Updated weights for policy 0, policy_version 124784 (0.0013) [2025-01-04 06:34:16,399][134294] Updated weights for policy 0, policy_version 124794 (0.0013) [2025-01-04 06:34:18,361][134294] Updated weights for policy 0, policy_version 124804 (0.0014) [2025-01-04 06:34:18,967][134211] Fps is (10 sec: 17204.0, 60 sec: 15087.0, 300 sec: 14467.9). Total num frames: 511209472. Throughput: 0: 3598.3. Samples: 116961298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:18,968][134211] Avg episode reward: [(0, '7.939')] [2025-01-04 06:34:20,279][134294] Updated weights for policy 0, policy_version 124814 (0.0014) [2025-01-04 06:34:22,719][134294] Updated weights for policy 0, policy_version 124824 (0.0020) [2025-01-04 06:34:23,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15155.1, 300 sec: 14523.4). Total num frames: 511291392. Throughput: 0: 3814.1. Samples: 116991148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:23,969][134211] Avg episode reward: [(0, '8.408')] [2025-01-04 06:34:26,282][134294] Updated weights for policy 0, policy_version 124834 (0.0028) [2025-01-04 06:34:28,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14609.0, 300 sec: 14467.9). Total num frames: 511348736. Throughput: 0: 3789.3. Samples: 117008802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:28,968][134211] Avg episode reward: [(0, '8.158')] [2025-01-04 06:34:29,649][134294] Updated weights for policy 0, policy_version 124844 (0.0028) [2025-01-04 06:34:32,750][134294] Updated weights for policy 0, policy_version 124854 (0.0024) [2025-01-04 06:34:33,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14609.1, 300 sec: 14454.0). Total num frames: 511418368. Throughput: 0: 3660.6. Samples: 117018466. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:33,969][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 06:34:35,798][134294] Updated weights for policy 0, policy_version 124864 (0.0026) [2025-01-04 06:34:38,871][134294] Updated weights for policy 0, policy_version 124874 (0.0027) [2025-01-04 06:34:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.4, 300 sec: 14398.5). Total num frames: 511483904. Throughput: 0: 3565.7. Samples: 117038858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:38,968][134211] Avg episode reward: [(0, '8.509')] [2025-01-04 06:34:42,118][134294] Updated weights for policy 0, policy_version 124884 (0.0026) [2025-01-04 06:34:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.7, 300 sec: 14329.0). Total num frames: 511545344. Throughput: 0: 3548.2. Samples: 117057528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:43,969][134211] Avg episode reward: [(0, '8.577')] [2025-01-04 06:34:45,401][134294] Updated weights for policy 0, policy_version 124894 (0.0028) [2025-01-04 06:34:48,582][134294] Updated weights for policy 0, policy_version 124904 (0.0027) [2025-01-04 06:34:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 511610880. Throughput: 0: 3548.6. Samples: 117067514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:48,968][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 06:34:51,723][134294] Updated weights for policy 0, policy_version 124914 (0.0026) [2025-01-04 06:34:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 14329.1). Total num frames: 511676416. Throughput: 0: 3528.8. Samples: 117086652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:53,970][134211] Avg episode reward: [(0, '8.340')] [2025-01-04 06:34:54,729][134294] Updated weights for policy 0, policy_version 124924 (0.0027) [2025-01-04 06:34:56,873][134294] Updated weights for policy 0, policy_version 124934 (0.0015) [2025-01-04 06:34:58,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14199.5, 300 sec: 14384.6). Total num frames: 511758336. Throughput: 0: 3645.4. Samples: 117110814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:34:58,968][134211] Avg episode reward: [(0, '8.547')] [2025-01-04 06:34:59,633][134294] Updated weights for policy 0, policy_version 124944 (0.0023) [2025-01-04 06:35:02,741][134294] Updated weights for policy 0, policy_version 124954 (0.0030) [2025-01-04 06:35:03,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14199.5, 300 sec: 14398.5). Total num frames: 511823872. Throughput: 0: 3549.7. Samples: 117121036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:35:03,968][134211] Avg episode reward: [(0, '8.472')] [2025-01-04 06:35:05,780][134294] Updated weights for policy 0, policy_version 124964 (0.0025) [2025-01-04 06:35:08,709][134294] Updated weights for policy 0, policy_version 124974 (0.0022) [2025-01-04 06:35:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.1, 300 sec: 14356.8). Total num frames: 511897600. Throughput: 0: 3336.6. Samples: 117141294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:35:08,968][134211] Avg episode reward: [(0, '8.833')] [2025-01-04 06:35:10,773][134294] Updated weights for policy 0, policy_version 124984 (0.0014) [2025-01-04 06:35:12,793][134294] Updated weights for policy 0, policy_version 124994 (0.0013) [2025-01-04 06:35:13,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14813.9, 300 sec: 14342.9). Total num frames: 511995904. Throughput: 0: 3558.8. Samples: 117168946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:35:13,968][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 06:35:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124999_511995904.pth... [2025-01-04 06:35:14,025][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124155_508538880.pth [2025-01-04 06:35:15,016][134294] Updated weights for policy 0, policy_version 125004 (0.0016) [2025-01-04 06:35:16,986][134294] Updated weights for policy 0, policy_version 125014 (0.0012) [2025-01-04 06:35:18,933][134294] Updated weights for policy 0, policy_version 125024 (0.0014) [2025-01-04 06:35:18,968][134211] Fps is (10 sec: 20070.3, 60 sec: 14813.8, 300 sec: 14454.0). Total num frames: 512098304. Throughput: 0: 3665.2. Samples: 117183400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:18,968][134211] Avg episode reward: [(0, '8.609')] [2025-01-04 06:35:21,971][134294] Updated weights for policy 0, policy_version 125034 (0.0025) [2025-01-04 06:35:23,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 512159744. Throughput: 0: 3769.2. Samples: 117208472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:23,969][134211] Avg episode reward: [(0, '9.076')] [2025-01-04 06:35:25,304][134294] Updated weights for policy 0, policy_version 125044 (0.0026) [2025-01-04 06:35:28,378][134294] Updated weights for policy 0, policy_version 125054 (0.0025) [2025-01-04 06:35:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14609.1, 300 sec: 14440.1). Total num frames: 512225280. Throughput: 0: 3785.8. Samples: 117227886. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:28,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 06:35:31,337][134294] Updated weights for policy 0, policy_version 125064 (0.0025) [2025-01-04 06:35:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14454.0). Total num frames: 512294912. Throughput: 0: 3792.0. Samples: 117238154. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:33,969][134211] Avg episode reward: [(0, '8.278')] [2025-01-04 06:35:34,548][134294] Updated weights for policy 0, policy_version 125074 (0.0025) [2025-01-04 06:35:37,667][134294] Updated weights for policy 0, policy_version 125084 (0.0024) [2025-01-04 06:35:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 512356352. Throughput: 0: 3800.6. Samples: 117257678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:38,968][134211] Avg episode reward: [(0, '8.180')] [2025-01-04 06:35:41,049][134294] Updated weights for policy 0, policy_version 125094 (0.0025) [2025-01-04 06:35:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 512417792. Throughput: 0: 3663.9. Samples: 117275692. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:43,968][134211] Avg episode reward: [(0, '7.823')] [2025-01-04 06:35:44,459][134294] Updated weights for policy 0, policy_version 125104 (0.0027) [2025-01-04 06:35:47,588][134294] Updated weights for policy 0, policy_version 125114 (0.0026) [2025-01-04 06:35:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 512483328. Throughput: 0: 3647.1. Samples: 117285154. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:48,968][134211] Avg episode reward: [(0, '7.638')] [2025-01-04 06:35:50,700][134294] Updated weights for policy 0, policy_version 125124 (0.0022) [2025-01-04 06:35:53,708][134294] Updated weights for policy 0, policy_version 125134 (0.0026) [2025-01-04 06:35:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14454.0). Total num frames: 512552960. Throughput: 0: 3644.0. Samples: 117305276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:53,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 06:35:56,578][134294] Updated weights for policy 0, policy_version 125144 (0.0026) [2025-01-04 06:35:58,967][134211] Fps is (10 sec: 13517.2, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 512618496. Throughput: 0: 3488.5. Samples: 117325928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:35:58,968][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 06:35:59,681][134294] Updated weights for policy 0, policy_version 125154 (0.0028) [2025-01-04 06:36:02,775][134294] Updated weights for policy 0, policy_version 125164 (0.0027) [2025-01-04 06:36:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 512684032. Throughput: 0: 3391.1. Samples: 117335998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:36:03,968][134211] Avg episode reward: [(0, '8.705')] [2025-01-04 06:36:05,648][134294] Updated weights for policy 0, policy_version 125174 (0.0023) [2025-01-04 06:36:07,653][134294] Updated weights for policy 0, policy_version 125184 (0.0015) [2025-01-04 06:36:08,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.3, 300 sec: 14370.7). Total num frames: 512778240. Throughput: 0: 3352.3. Samples: 117359324. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:36:08,968][134211] Avg episode reward: [(0, '7.971')] [2025-01-04 06:36:09,810][134294] Updated weights for policy 0, policy_version 125194 (0.0012) [2025-01-04 06:36:11,967][134294] Updated weights for policy 0, policy_version 125204 (0.0016) [2025-01-04 06:36:13,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14609.1, 300 sec: 14495.7). Total num frames: 512872448. Throughput: 0: 3554.0. Samples: 117387816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:36:13,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 06:36:14,071][134294] Updated weights for policy 0, policy_version 125214 (0.0014) [2025-01-04 06:36:17,040][134294] Updated weights for policy 0, policy_version 125224 (0.0026) [2025-01-04 06:36:18,968][134211] Fps is (10 sec: 15974.1, 60 sec: 13994.6, 300 sec: 14495.7). Total num frames: 512937984. Throughput: 0: 3594.7. Samples: 117399916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:18,968][134211] Avg episode reward: [(0, '8.167')] [2025-01-04 06:36:20,486][134294] Updated weights for policy 0, policy_version 125234 (0.0025) [2025-01-04 06:36:23,531][134294] Updated weights for policy 0, policy_version 125244 (0.0028) [2025-01-04 06:36:23,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14062.9, 300 sec: 14481.8). Total num frames: 513003520. Throughput: 0: 3575.0. Samples: 117418554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:23,969][134211] Avg episode reward: [(0, '8.957')] [2025-01-04 06:36:26,591][134294] Updated weights for policy 0, policy_version 125254 (0.0027) [2025-01-04 06:36:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14062.9, 300 sec: 14481.8). Total num frames: 513069056. Throughput: 0: 3613.2. Samples: 117438286. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:28,968][134211] Avg episode reward: [(0, '9.241')] [2025-01-04 06:36:29,887][134294] Updated weights for policy 0, policy_version 125264 (0.0027) [2025-01-04 06:36:32,810][134294] Updated weights for policy 0, policy_version 125274 (0.0025) [2025-01-04 06:36:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14440.1). Total num frames: 513134592. Throughput: 0: 3616.9. Samples: 117447916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:33,968][134211] Avg episode reward: [(0, '8.721')] [2025-01-04 06:36:35,977][134294] Updated weights for policy 0, policy_version 125284 (0.0024) [2025-01-04 06:36:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 14398.5). Total num frames: 513200128. Throughput: 0: 3617.3. Samples: 117468054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:38,968][134211] Avg episode reward: [(0, '8.636')] [2025-01-04 06:36:39,244][134294] Updated weights for policy 0, policy_version 125294 (0.0024) [2025-01-04 06:36:42,753][134294] Updated weights for policy 0, policy_version 125304 (0.0027) [2025-01-04 06:36:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13994.6, 300 sec: 14384.6). Total num frames: 513257472. Throughput: 0: 3552.4. Samples: 117485786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:43,969][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 06:36:45,501][134294] Updated weights for policy 0, policy_version 125314 (0.0019) [2025-01-04 06:36:47,411][134294] Updated weights for policy 0, policy_version 125324 (0.0013) [2025-01-04 06:36:48,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 513355776. Throughput: 0: 3611.9. Samples: 117498534. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:48,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 06:36:49,714][134294] Updated weights for policy 0, policy_version 125334 (0.0018) [2025-01-04 06:36:52,966][134294] Updated weights for policy 0, policy_version 125344 (0.0026) [2025-01-04 06:36:53,969][134211] Fps is (10 sec: 16383.0, 60 sec: 14472.3, 300 sec: 14301.2). Total num frames: 513421312. Throughput: 0: 3636.7. Samples: 117522978. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:53,969][134211] Avg episode reward: [(0, '7.912')] [2025-01-04 06:36:55,981][134294] Updated weights for policy 0, policy_version 125354 (0.0028) [2025-01-04 06:36:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14472.5, 300 sec: 14315.3). Total num frames: 513486848. Throughput: 0: 3456.0. Samples: 117543338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:36:58,968][134211] Avg episode reward: [(0, '8.140')] [2025-01-04 06:36:58,999][134294] Updated weights for policy 0, policy_version 125364 (0.0025) [2025-01-04 06:37:02,317][134294] Updated weights for policy 0, policy_version 125374 (0.0026) [2025-01-04 06:37:03,969][134211] Fps is (10 sec: 13106.5, 60 sec: 14472.2, 300 sec: 14342.9). Total num frames: 513552384. Throughput: 0: 3397.6. Samples: 117552814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:03,970][134211] Avg episode reward: [(0, '8.360')] [2025-01-04 06:37:05,340][134294] Updated weights for policy 0, policy_version 125384 (0.0025) [2025-01-04 06:37:08,356][134294] Updated weights for policy 0, policy_version 125394 (0.0027) [2025-01-04 06:37:08,970][134211] Fps is (10 sec: 13104.5, 60 sec: 13994.1, 300 sec: 14370.7). Total num frames: 513617920. Throughput: 0: 3429.2. Samples: 117572876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:08,970][134211] Avg episode reward: [(0, '7.986')] [2025-01-04 06:37:11,442][134294] Updated weights for policy 0, policy_version 125404 (0.0023) [2025-01-04 06:37:13,515][134294] Updated weights for policy 0, policy_version 125414 (0.0013) [2025-01-04 06:37:13,968][134211] Fps is (10 sec: 15157.1, 60 sec: 13858.1, 300 sec: 14440.1). Total num frames: 513703936. Throughput: 0: 3495.4. Samples: 117595580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:13,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 06:37:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000125416_513703936.pth... [2025-01-04 06:37:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124571_510242816.pth [2025-01-04 06:37:15,582][134294] Updated weights for policy 0, policy_version 125424 (0.0014) [2025-01-04 06:37:17,427][134294] Updated weights for policy 0, policy_version 125434 (0.0015) [2025-01-04 06:37:18,968][134211] Fps is (10 sec: 18436.0, 60 sec: 14404.3, 300 sec: 14509.6). Total num frames: 513802240. Throughput: 0: 3619.3. Samples: 117610786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:18,968][134211] Avg episode reward: [(0, '7.898')] [2025-01-04 06:37:20,047][134294] Updated weights for policy 0, policy_version 125444 (0.0022) [2025-01-04 06:37:23,338][134294] Updated weights for policy 0, policy_version 125454 (0.0028) [2025-01-04 06:37:23,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14336.0, 300 sec: 14509.6). Total num frames: 513863680. Throughput: 0: 3698.1. Samples: 117634470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:23,968][134211] Avg episode reward: [(0, '8.672')] [2025-01-04 06:37:26,503][134294] Updated weights for policy 0, policy_version 125464 (0.0028) [2025-01-04 06:37:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14509.6). Total num frames: 513933312. Throughput: 0: 3736.3. Samples: 117653920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:28,968][134211] Avg episode reward: [(0, '8.482')] [2025-01-04 06:37:29,597][134294] Updated weights for policy 0, policy_version 125474 (0.0026) [2025-01-04 06:37:32,713][134294] Updated weights for policy 0, policy_version 125484 (0.0025) [2025-01-04 06:37:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 513998848. Throughput: 0: 3670.9. Samples: 117663722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:33,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 06:37:35,647][134294] Updated weights for policy 0, policy_version 125494 (0.0026) [2025-01-04 06:37:38,710][134294] Updated weights for policy 0, policy_version 125504 (0.0026) [2025-01-04 06:37:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 514064384. Throughput: 0: 3588.7. Samples: 117684468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:38,968][134211] Avg episode reward: [(0, '9.093')] [2025-01-04 06:37:42,094][134294] Updated weights for policy 0, policy_version 125514 (0.0026) [2025-01-04 06:37:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14472.6, 300 sec: 14245.7). Total num frames: 514125824. Throughput: 0: 3541.8. Samples: 117702718. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:43,968][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 06:37:45,443][134294] Updated weights for policy 0, policy_version 125524 (0.0027) [2025-01-04 06:37:48,361][134294] Updated weights for policy 0, policy_version 125534 (0.0026) [2025-01-04 06:37:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 14259.6). Total num frames: 514191360. Throughput: 0: 3547.6. Samples: 117712450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:48,968][134211] Avg episode reward: [(0, '6.962')] [2025-01-04 06:37:50,772][134294] Updated weights for policy 0, policy_version 125544 (0.0018) [2025-01-04 06:37:52,877][134294] Updated weights for policy 0, policy_version 125554 (0.0014) [2025-01-04 06:37:53,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14404.5, 300 sec: 14356.8). Total num frames: 514285568. Throughput: 0: 3642.0. Samples: 117736758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:53,968][134211] Avg episode reward: [(0, '8.457')] [2025-01-04 06:37:55,317][134294] Updated weights for policy 0, policy_version 125564 (0.0020) [2025-01-04 06:37:58,327][134294] Updated weights for policy 0, policy_version 125574 (0.0024) [2025-01-04 06:37:58,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 514359296. Throughput: 0: 3664.0. Samples: 117760460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:37:58,968][134211] Avg episode reward: [(0, '7.714')] [2025-01-04 06:38:01,375][134294] Updated weights for policy 0, policy_version 125584 (0.0029) [2025-01-04 06:38:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14541.1, 300 sec: 14398.5). Total num frames: 514424832. Throughput: 0: 3551.3. Samples: 117770596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:38:03,968][134211] Avg episode reward: [(0, '8.804')] [2025-01-04 06:38:04,594][134294] Updated weights for policy 0, policy_version 125594 (0.0024) [2025-01-04 06:38:07,696][134294] Updated weights for policy 0, policy_version 125604 (0.0023) [2025-01-04 06:38:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14473.1, 300 sec: 14384.6). Total num frames: 514486272. Throughput: 0: 3458.6. Samples: 117790108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:38:08,968][134211] Avg episode reward: [(0, '8.480')] [2025-01-04 06:38:10,710][134294] Updated weights for policy 0, policy_version 125614 (0.0020) [2025-01-04 06:38:12,885][134294] Updated weights for policy 0, policy_version 125624 (0.0012) [2025-01-04 06:38:13,968][134211] Fps is (10 sec: 14746.1, 60 sec: 14472.6, 300 sec: 14467.9). Total num frames: 514572288. Throughput: 0: 3547.6. Samples: 117813562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:38:13,968][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 06:38:15,075][134294] Updated weights for policy 0, policy_version 125634 (0.0013) [2025-01-04 06:38:16,980][134294] Updated weights for policy 0, policy_version 125644 (0.0012) [2025-01-04 06:38:18,931][134294] Updated weights for policy 0, policy_version 125654 (0.0013) [2025-01-04 06:38:18,967][134211] Fps is (10 sec: 19251.5, 60 sec: 14609.1, 300 sec: 14565.1). Total num frames: 514678784. Throughput: 0: 3656.6. Samples: 117828268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:38:18,968][134211] Avg episode reward: [(0, '8.315')] [2025-01-04 06:38:21,904][134294] Updated weights for policy 0, policy_version 125664 (0.0029) [2025-01-04 06:38:23,968][134211] Fps is (10 sec: 16792.7, 60 sec: 14609.0, 300 sec: 14467.9). Total num frames: 514740224. Throughput: 0: 3762.6. Samples: 117853784. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:23,969][134211] Avg episode reward: [(0, '8.189')] [2025-01-04 06:38:25,337][134294] Updated weights for policy 0, policy_version 125674 (0.0025) [2025-01-04 06:38:28,481][134294] Updated weights for policy 0, policy_version 125684 (0.0025) [2025-01-04 06:38:28,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14540.7, 300 sec: 14454.0). Total num frames: 514805760. Throughput: 0: 3773.6. Samples: 117872532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:28,969][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 06:38:31,616][134294] Updated weights for policy 0, policy_version 125694 (0.0028) [2025-01-04 06:38:33,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14540.8, 300 sec: 14467.9). Total num frames: 514871296. Throughput: 0: 3778.6. Samples: 117882488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:33,968][134211] Avg episode reward: [(0, '7.940')] [2025-01-04 06:38:34,770][134294] Updated weights for policy 0, policy_version 125704 (0.0027) [2025-01-04 06:38:37,898][134294] Updated weights for policy 0, policy_version 125714 (0.0027) [2025-01-04 06:38:38,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 514936832. Throughput: 0: 3675.9. Samples: 117902172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:38,968][134211] Avg episode reward: [(0, '8.676')] [2025-01-04 06:38:41,226][134294] Updated weights for policy 0, policy_version 125724 (0.0027) [2025-01-04 06:38:43,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 514998272. Throughput: 0: 3558.3. Samples: 117920582. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:43,969][134211] Avg episode reward: [(0, '8.089')] [2025-01-04 06:38:44,579][134294] Updated weights for policy 0, policy_version 125734 (0.0027) [2025-01-04 06:38:47,911][134294] Updated weights for policy 0, policy_version 125744 (0.0029) [2025-01-04 06:38:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14472.6, 300 sec: 14301.3). Total num frames: 515059712. Throughput: 0: 3537.0. Samples: 117929762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:48,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 06:38:51,012][134294] Updated weights for policy 0, policy_version 125754 (0.0022) [2025-01-04 06:38:53,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13994.7, 300 sec: 14301.3). Total num frames: 515125248. Throughput: 0: 3526.0. Samples: 117948776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:53,968][134211] Avg episode reward: [(0, '8.193')] [2025-01-04 06:38:54,114][134294] Updated weights for policy 0, policy_version 125764 (0.0022) [2025-01-04 06:38:56,051][134294] Updated weights for policy 0, policy_version 125774 (0.0013) [2025-01-04 06:38:57,899][134294] Updated weights for policy 0, policy_version 125784 (0.0013) [2025-01-04 06:38:58,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14540.8, 300 sec: 14440.2). Total num frames: 515231744. Throughput: 0: 3644.5. Samples: 117977564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:38:58,968][134211] Avg episode reward: [(0, '8.910')] [2025-01-04 06:38:59,783][134294] Updated weights for policy 0, policy_version 125794 (0.0013) [2025-01-04 06:39:01,743][134294] Updated weights for policy 0, policy_version 125804 (0.0012) [2025-01-04 06:39:03,897][134294] Updated weights for policy 0, policy_version 125814 (0.0017) [2025-01-04 06:39:03,968][134211] Fps is (10 sec: 20889.5, 60 sec: 15155.3, 300 sec: 14565.1). Total num frames: 515334144. Throughput: 0: 3669.4. Samples: 117993392. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:39:03,968][134211] Avg episode reward: [(0, '8.358')] [2025-01-04 06:39:07,353][134294] Updated weights for policy 0, policy_version 125824 (0.0028) [2025-01-04 06:39:08,970][134211] Fps is (10 sec: 15561.2, 60 sec: 15018.1, 300 sec: 14509.4). Total num frames: 515387392. Throughput: 0: 3614.5. Samples: 118016444. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:39:08,971][134211] Avg episode reward: [(0, '8.438')] [2025-01-04 06:39:11,135][134294] Updated weights for policy 0, policy_version 125834 (0.0030) [2025-01-04 06:39:13,968][134211] Fps is (10 sec: 11059.1, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 515444736. Throughput: 0: 3565.7. Samples: 118032986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:39:13,968][134211] Avg episode reward: [(0, '8.716')] [2025-01-04 06:39:14,033][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000125842_515448832.pth... [2025-01-04 06:39:14,104][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000124999_511995904.pth [2025-01-04 06:39:14,770][134294] Updated weights for policy 0, policy_version 125844 (0.0028) [2025-01-04 06:39:18,045][134294] Updated weights for policy 0, policy_version 125854 (0.0026) [2025-01-04 06:39:18,968][134211] Fps is (10 sec: 11880.3, 60 sec: 13789.7, 300 sec: 14287.4). Total num frames: 515506176. Throughput: 0: 3534.0. Samples: 118041520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:39:18,969][134211] Avg episode reward: [(0, '8.293')] [2025-01-04 06:39:21,155][134294] Updated weights for policy 0, policy_version 125864 (0.0025) [2025-01-04 06:39:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13789.9, 300 sec: 14301.3). Total num frames: 515567616. Throughput: 0: 3523.6. Samples: 118060736. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:39:23,969][134211] Avg episode reward: [(0, '8.041')] [2025-01-04 06:39:24,639][134294] Updated weights for policy 0, policy_version 125874 (0.0022) [2025-01-04 06:39:26,592][134294] Updated weights for policy 0, policy_version 125884 (0.0011) [2025-01-04 06:39:28,576][134294] Updated weights for policy 0, policy_version 125894 (0.0014) [2025-01-04 06:39:28,968][134211] Fps is (10 sec: 16385.0, 60 sec: 14404.4, 300 sec: 14412.4). Total num frames: 515670016. Throughput: 0: 3690.7. Samples: 118086662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:28,968][134211] Avg episode reward: [(0, '8.720')] [2025-01-04 06:39:30,489][134294] Updated weights for policy 0, policy_version 125904 (0.0013) [2025-01-04 06:39:32,344][134294] Updated weights for policy 0, policy_version 125914 (0.0013) [2025-01-04 06:39:33,968][134211] Fps is (10 sec: 20890.1, 60 sec: 15087.0, 300 sec: 14551.2). Total num frames: 515776512. Throughput: 0: 3844.4. Samples: 118102760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:33,968][134211] Avg episode reward: [(0, '9.325')] [2025-01-04 06:39:34,295][134294] Updated weights for policy 0, policy_version 125924 (0.0012) [2025-01-04 06:39:36,439][134294] Updated weights for policy 0, policy_version 125934 (0.0017) [2025-01-04 06:39:38,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15291.7, 300 sec: 14606.8). Total num frames: 515854336. Throughput: 0: 4074.0. Samples: 118132106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:38,968][134211] Avg episode reward: [(0, '8.163')] [2025-01-04 06:39:39,850][134294] Updated weights for policy 0, policy_version 125944 (0.0028) [2025-01-04 06:39:43,572][134294] Updated weights for policy 0, policy_version 125954 (0.0029) [2025-01-04 06:39:43,969][134211] Fps is (10 sec: 13105.4, 60 sec: 15154.9, 300 sec: 14565.0). Total num frames: 515907584. Throughput: 0: 3809.8. Samples: 118149010. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:43,970][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 06:39:46,999][134294] Updated weights for policy 0, policy_version 125964 (0.0028) [2025-01-04 06:39:48,968][134211] Fps is (10 sec: 11468.6, 60 sec: 15155.1, 300 sec: 14551.2). Total num frames: 515969024. Throughput: 0: 3654.4. Samples: 118157840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:48,969][134211] Avg episode reward: [(0, '9.695')] [2025-01-04 06:39:50,249][134294] Updated weights for policy 0, policy_version 125974 (0.0023) [2025-01-04 06:39:53,493][134294] Updated weights for policy 0, policy_version 125984 (0.0023) [2025-01-04 06:39:53,968][134211] Fps is (10 sec: 12699.2, 60 sec: 15155.2, 300 sec: 14495.7). Total num frames: 516034560. Throughput: 0: 3566.5. Samples: 118176928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:53,968][134211] Avg episode reward: [(0, '8.615')] [2025-01-04 06:39:56,596][134294] Updated weights for policy 0, policy_version 125994 (0.0028) [2025-01-04 06:39:58,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14472.3, 300 sec: 14495.6). Total num frames: 516100096. Throughput: 0: 3626.4. Samples: 118196176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:39:58,969][134211] Avg episode reward: [(0, '7.728')] [2025-01-04 06:39:59,799][134294] Updated weights for policy 0, policy_version 126004 (0.0024) [2025-01-04 06:40:02,930][134294] Updated weights for policy 0, policy_version 126014 (0.0027) [2025-01-04 06:40:03,969][134211] Fps is (10 sec: 13105.4, 60 sec: 13857.8, 300 sec: 14467.8). Total num frames: 516165632. Throughput: 0: 3654.0. Samples: 118205954. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:40:03,970][134211] Avg episode reward: [(0, '9.931')] [2025-01-04 06:40:06,013][134294] Updated weights for policy 0, policy_version 126024 (0.0022) [2025-01-04 06:40:08,968][134211] Fps is (10 sec: 13108.1, 60 sec: 14063.4, 300 sec: 14356.8). Total num frames: 516231168. Throughput: 0: 3660.8. Samples: 118225470. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:40:08,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 06:40:09,302][134294] Updated weights for policy 0, policy_version 126034 (0.0028) [2025-01-04 06:40:12,768][134294] Updated weights for policy 0, policy_version 126044 (0.0023) [2025-01-04 06:40:13,967][134211] Fps is (10 sec: 12289.9, 60 sec: 14063.0, 300 sec: 14204.1). Total num frames: 516288512. Throughput: 0: 3482.1. Samples: 118243358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:40:13,968][134211] Avg episode reward: [(0, '9.086')] [2025-01-04 06:40:15,252][134294] Updated weights for policy 0, policy_version 126054 (0.0016) [2025-01-04 06:40:17,317][134294] Updated weights for policy 0, policy_version 126064 (0.0014) [2025-01-04 06:40:18,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14745.7, 300 sec: 14342.9). Total num frames: 516390912. Throughput: 0: 3427.6. Samples: 118257004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:40:18,968][134211] Avg episode reward: [(0, '8.940')] [2025-01-04 06:40:19,259][134294] Updated weights for policy 0, policy_version 126074 (0.0016) [2025-01-04 06:40:21,174][134294] Updated weights for policy 0, policy_version 126084 (0.0015) [2025-01-04 06:40:23,579][134294] Updated weights for policy 0, policy_version 126094 (0.0018) [2025-01-04 06:40:23,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15291.7, 300 sec: 14440.1). Total num frames: 516485120. Throughput: 0: 3472.3. Samples: 118288362. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 06:40:23,969][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 06:40:26,990][134294] Updated weights for policy 0, policy_version 126104 (0.0027) [2025-01-04 06:40:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14609.0, 300 sec: 14412.4). Total num frames: 516546560. Throughput: 0: 3520.2. Samples: 118307414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:28,968][134211] Avg episode reward: [(0, '8.336')] [2025-01-04 06:40:30,166][134294] Updated weights for policy 0, policy_version 126114 (0.0027) [2025-01-04 06:40:33,177][134294] Updated weights for policy 0, policy_version 126124 (0.0027) [2025-01-04 06:40:33,969][134211] Fps is (10 sec: 12696.6, 60 sec: 13926.2, 300 sec: 14426.2). Total num frames: 516612096. Throughput: 0: 3546.0. Samples: 118317412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:33,969][134211] Avg episode reward: [(0, '8.268')] [2025-01-04 06:40:36,227][134294] Updated weights for policy 0, policy_version 126134 (0.0027) [2025-01-04 06:40:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.6, 300 sec: 14440.1). Total num frames: 516677632. Throughput: 0: 3567.2. Samples: 118337450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:38,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 06:40:39,527][134294] Updated weights for policy 0, policy_version 126144 (0.0026) [2025-01-04 06:40:42,798][134294] Updated weights for policy 0, policy_version 126154 (0.0028) [2025-01-04 06:40:43,968][134211] Fps is (10 sec: 12698.6, 60 sec: 13858.4, 300 sec: 14426.2). Total num frames: 516739072. Throughput: 0: 3551.3. Samples: 118355982. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:43,968][134211] Avg episode reward: [(0, '8.454')] [2025-01-04 06:40:46,076][134294] Updated weights for policy 0, policy_version 126164 (0.0024) [2025-01-04 06:40:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 14412.4). Total num frames: 516804608. Throughput: 0: 3547.3. Samples: 118365580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:48,968][134211] Avg episode reward: [(0, '8.045')] [2025-01-04 06:40:49,197][134294] Updated weights for policy 0, policy_version 126174 (0.0025) [2025-01-04 06:40:52,401][134294] Updated weights for policy 0, policy_version 126184 (0.0022) [2025-01-04 06:40:53,969][134211] Fps is (10 sec: 12696.6, 60 sec: 13857.9, 300 sec: 14398.4). Total num frames: 516866048. Throughput: 0: 3542.6. Samples: 118384892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:53,969][134211] Avg episode reward: [(0, '8.733')] [2025-01-04 06:40:55,699][134294] Updated weights for policy 0, policy_version 126194 (0.0027) [2025-01-04 06:40:57,850][134294] Updated weights for policy 0, policy_version 126204 (0.0015) [2025-01-04 06:40:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14131.4, 300 sec: 14454.0). Total num frames: 516947968. Throughput: 0: 3648.3. Samples: 118407530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:40:58,968][134211] Avg episode reward: [(0, '8.039')] [2025-01-04 06:41:00,646][134294] Updated weights for policy 0, policy_version 126214 (0.0025) [2025-01-04 06:41:03,648][134294] Updated weights for policy 0, policy_version 126224 (0.0025) [2025-01-04 06:41:03,968][134211] Fps is (10 sec: 15156.4, 60 sec: 14199.8, 300 sec: 14370.7). Total num frames: 517017600. Throughput: 0: 3583.5. Samples: 118418260. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:03,969][134211] Avg episode reward: [(0, '8.041')] [2025-01-04 06:41:06,739][134294] Updated weights for policy 0, policy_version 126234 (0.0024) [2025-01-04 06:41:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 14273.5). Total num frames: 517083136. Throughput: 0: 3329.1. Samples: 118438170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:08,968][134211] Avg episode reward: [(0, '9.210')] [2025-01-04 06:41:09,535][134294] Updated weights for policy 0, policy_version 126244 (0.0020) [2025-01-04 06:41:11,577][134294] Updated weights for policy 0, policy_version 126254 (0.0013) [2025-01-04 06:41:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14745.5, 300 sec: 14356.8). Total num frames: 517173248. Throughput: 0: 3483.9. Samples: 118464190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:13,969][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 06:41:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000126263_517173248.pth... [2025-01-04 06:41:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000125416_513703936.pth [2025-01-04 06:41:14,364][134294] Updated weights for policy 0, policy_version 126264 (0.0023) [2025-01-04 06:41:17,698][134294] Updated weights for policy 0, policy_version 126274 (0.0028) [2025-01-04 06:41:18,968][134211] Fps is (10 sec: 14745.0, 60 sec: 13994.6, 300 sec: 14329.1). Total num frames: 517230592. Throughput: 0: 3452.1. Samples: 118472754. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:18,969][134211] Avg episode reward: [(0, '8.303')] [2025-01-04 06:41:20,475][134294] Updated weights for policy 0, policy_version 126284 (0.0025) [2025-01-04 06:41:22,607][134294] Updated weights for policy 0, policy_version 126294 (0.0016) [2025-01-04 06:41:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13858.2, 300 sec: 14398.5). Total num frames: 517316608. Throughput: 0: 3538.4. Samples: 118496680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:23,968][134211] Avg episode reward: [(0, '8.561')] [2025-01-04 06:41:25,827][134294] Updated weights for policy 0, policy_version 126304 (0.0027) [2025-01-04 06:41:28,918][134294] Updated weights for policy 0, policy_version 126314 (0.0029) [2025-01-04 06:41:28,968][134211] Fps is (10 sec: 15155.7, 60 sec: 13926.4, 300 sec: 14398.5). Total num frames: 517382144. Throughput: 0: 3566.9. Samples: 118516492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:28,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 06:41:31,893][134294] Updated weights for policy 0, policy_version 126324 (0.0027) [2025-01-04 06:41:33,970][134211] Fps is (10 sec: 13104.2, 60 sec: 13926.1, 300 sec: 14398.4). Total num frames: 517447680. Throughput: 0: 3574.1. Samples: 118526422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:33,971][134211] Avg episode reward: [(0, '8.934')] [2025-01-04 06:41:34,970][134294] Updated weights for policy 0, policy_version 126334 (0.0026) [2025-01-04 06:41:38,217][134294] Updated weights for policy 0, policy_version 126344 (0.0027) [2025-01-04 06:41:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14426.3). Total num frames: 517513216. Throughput: 0: 3587.1. Samples: 118546306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:38,968][134211] Avg episode reward: [(0, '8.323')] [2025-01-04 06:41:41,495][134294] Updated weights for policy 0, policy_version 126354 (0.0025) [2025-01-04 06:41:43,859][134294] Updated weights for policy 0, policy_version 126364 (0.0014) [2025-01-04 06:41:43,968][134211] Fps is (10 sec: 13929.8, 60 sec: 14131.3, 300 sec: 14342.9). Total num frames: 517586944. Throughput: 0: 3530.6. Samples: 118566408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:43,968][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 06:41:45,951][134294] Updated weights for policy 0, policy_version 126374 (0.0015) [2025-01-04 06:41:47,805][134294] Updated weights for policy 0, policy_version 126384 (0.0012) [2025-01-04 06:41:48,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14813.9, 300 sec: 14481.8). Total num frames: 517693440. Throughput: 0: 3628.7. Samples: 118581550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:48,968][134211] Avg episode reward: [(0, '7.704')] [2025-01-04 06:41:49,779][134294] Updated weights for policy 0, policy_version 126394 (0.0015) [2025-01-04 06:41:52,798][134294] Updated weights for policy 0, policy_version 126404 (0.0025) [2025-01-04 06:41:53,968][134211] Fps is (10 sec: 17202.8, 60 sec: 14882.3, 300 sec: 14481.8). Total num frames: 517758976. Throughput: 0: 3791.4. Samples: 118608784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:53,969][134211] Avg episode reward: [(0, '7.986')] [2025-01-04 06:41:56,320][134294] Updated weights for policy 0, policy_version 126414 (0.0029) [2025-01-04 06:41:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.1, 300 sec: 14481.9). Total num frames: 517824512. Throughput: 0: 3613.5. Samples: 118626798. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:41:58,968][134211] Avg episode reward: [(0, '8.166')] [2025-01-04 06:41:59,594][134294] Updated weights for policy 0, policy_version 126424 (0.0025) [2025-01-04 06:42:02,795][134294] Updated weights for policy 0, policy_version 126434 (0.0027) [2025-01-04 06:42:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 14481.9). Total num frames: 517890048. Throughput: 0: 3640.4. Samples: 118636570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:03,968][134211] Avg episode reward: [(0, '7.642')] [2025-01-04 06:42:05,777][134294] Updated weights for policy 0, policy_version 126444 (0.0024) [2025-01-04 06:42:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14472.5, 300 sec: 14398.5). Total num frames: 517951488. Throughput: 0: 3547.8. Samples: 118656332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:08,969][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 06:42:08,973][134294] Updated weights for policy 0, policy_version 126454 (0.0028) [2025-01-04 06:42:12,486][134294] Updated weights for policy 0, policy_version 126464 (0.0026) [2025-01-04 06:42:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14063.0, 300 sec: 14287.4). Total num frames: 518017024. Throughput: 0: 3518.3. Samples: 118674814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:13,968][134211] Avg episode reward: [(0, '8.667')] [2025-01-04 06:42:14,788][134294] Updated weights for policy 0, policy_version 126474 (0.0015) [2025-01-04 06:42:16,898][134294] Updated weights for policy 0, policy_version 126484 (0.0014) [2025-01-04 06:42:18,856][134294] Updated weights for policy 0, policy_version 126494 (0.0013) [2025-01-04 06:42:18,968][134211] Fps is (10 sec: 16794.2, 60 sec: 14814.0, 300 sec: 14426.3). Total num frames: 518119424. Throughput: 0: 3616.9. Samples: 118689172. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:18,968][134211] Avg episode reward: [(0, '9.227')] [2025-01-04 06:42:20,734][134294] Updated weights for policy 0, policy_version 126504 (0.0013) [2025-01-04 06:42:22,655][134294] Updated weights for policy 0, policy_version 126514 (0.0012) [2025-01-04 06:42:23,968][134211] Fps is (10 sec: 20889.5, 60 sec: 15155.2, 300 sec: 14551.2). Total num frames: 518225920. Throughput: 0: 3884.9. Samples: 118721128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:23,968][134211] Avg episode reward: [(0, '8.697')] [2025-01-04 06:42:25,278][134294] Updated weights for policy 0, policy_version 126524 (0.0018) [2025-01-04 06:42:28,780][134294] Updated weights for policy 0, policy_version 126534 (0.0026) [2025-01-04 06:42:28,969][134211] Fps is (10 sec: 16381.9, 60 sec: 15018.4, 300 sec: 14523.4). Total num frames: 518283264. Throughput: 0: 3909.8. Samples: 118742356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:42:28,970][134211] Avg episode reward: [(0, '8.182')] [2025-01-04 06:42:32,260][134294] Updated weights for policy 0, policy_version 126544 (0.0026) [2025-01-04 06:42:33,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14951.0, 300 sec: 14509.6). Total num frames: 518344704. Throughput: 0: 3762.7. Samples: 118750872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:33,969][134211] Avg episode reward: [(0, '9.065')] [2025-01-04 06:42:35,533][134294] Updated weights for policy 0, policy_version 126554 (0.0027) [2025-01-04 06:42:38,514][134294] Updated weights for policy 0, policy_version 126564 (0.0027) [2025-01-04 06:42:38,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14950.4, 300 sec: 14523.4). Total num frames: 518410240. Throughput: 0: 3594.8. Samples: 118770550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:38,968][134211] Avg episode reward: [(0, '9.170')] [2025-01-04 06:42:41,786][134294] Updated weights for policy 0, policy_version 126574 (0.0024) [2025-01-04 06:42:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14509.6). Total num frames: 518471680. Throughput: 0: 3602.4. Samples: 118788908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:43,968][134211] Avg episode reward: [(0, '8.688')] [2025-01-04 06:42:45,194][134294] Updated weights for policy 0, policy_version 126584 (0.0027) [2025-01-04 06:42:48,293][134294] Updated weights for policy 0, policy_version 126594 (0.0024) [2025-01-04 06:42:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14062.9, 300 sec: 14412.4). Total num frames: 518537216. Throughput: 0: 3588.6. Samples: 118798058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:48,968][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 06:42:51,381][134294] Updated weights for policy 0, policy_version 126604 (0.0026) [2025-01-04 06:42:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14063.0, 300 sec: 14384.6). Total num frames: 518602752. Throughput: 0: 3600.7. Samples: 118818364. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:53,968][134211] Avg episode reward: [(0, '8.327')] [2025-01-04 06:42:54,639][134294] Updated weights for policy 0, policy_version 126614 (0.0026) [2025-01-04 06:42:57,714][134294] Updated weights for policy 0, policy_version 126624 (0.0025) [2025-01-04 06:42:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14063.0, 300 sec: 14384.6). Total num frames: 518668288. Throughput: 0: 3619.9. Samples: 118837710. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:42:58,968][134211] Avg episode reward: [(0, '8.606')] [2025-01-04 06:43:00,723][134294] Updated weights for policy 0, policy_version 126634 (0.0024) [2025-01-04 06:43:03,333][134294] Updated weights for policy 0, policy_version 126644 (0.0022) [2025-01-04 06:43:03,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14267.7, 300 sec: 14440.1). Total num frames: 518746112. Throughput: 0: 3533.2. Samples: 118848166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:03,968][134211] Avg episode reward: [(0, '7.889')] [2025-01-04 06:43:05,239][134294] Updated weights for policy 0, policy_version 126654 (0.0014) [2025-01-04 06:43:07,156][134294] Updated weights for policy 0, policy_version 126664 (0.0013) [2025-01-04 06:43:08,968][134211] Fps is (10 sec: 18432.1, 60 sec: 15018.7, 300 sec: 14509.6). Total num frames: 518852608. Throughput: 0: 3472.1. Samples: 118877372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:08,968][134211] Avg episode reward: [(0, '8.432')] [2025-01-04 06:43:09,097][134294] Updated weights for policy 0, policy_version 126674 (0.0012) [2025-01-04 06:43:11,536][134294] Updated weights for policy 0, policy_version 126684 (0.0017) [2025-01-04 06:43:13,969][134211] Fps is (10 sec: 18020.9, 60 sec: 15154.9, 300 sec: 14398.4). Total num frames: 518926336. Throughput: 0: 3561.7. Samples: 118902634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:13,969][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 06:43:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000126691_518926336.pth... 
[2025-01-04 06:43:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000125842_515448832.pth [2025-01-04 06:43:15,072][134294] Updated weights for policy 0, policy_version 126694 (0.0028) [2025-01-04 06:43:18,483][134294] Updated weights for policy 0, policy_version 126704 (0.0026) [2025-01-04 06:43:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14404.2, 300 sec: 14384.6). Total num frames: 518983680. Throughput: 0: 3549.8. Samples: 118910614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:18,968][134211] Avg episode reward: [(0, '9.706')] [2025-01-04 06:43:21,632][134294] Updated weights for policy 0, policy_version 126714 (0.0026) [2025-01-04 06:43:23,968][134211] Fps is (10 sec: 12289.4, 60 sec: 13721.6, 300 sec: 14384.6). Total num frames: 519049216. Throughput: 0: 3547.1. Samples: 118930168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:23,968][134211] Avg episode reward: [(0, '8.868')] [2025-01-04 06:43:24,968][134294] Updated weights for policy 0, policy_version 126724 (0.0028) [2025-01-04 06:43:28,021][134294] Updated weights for policy 0, policy_version 126734 (0.0026) [2025-01-04 06:43:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13790.1, 300 sec: 14370.7). Total num frames: 519110656. Throughput: 0: 3563.9. Samples: 118949282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:28,968][134211] Avg episode reward: [(0, '8.286')] [2025-01-04 06:43:31,073][134294] Updated weights for policy 0, policy_version 126744 (0.0024) [2025-01-04 06:43:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 14384.6). Total num frames: 519180288. Throughput: 0: 3586.5. Samples: 118959450. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:33,968][134211] Avg episode reward: [(0, '8.735')] [2025-01-04 06:43:34,197][134294] Updated weights for policy 0, policy_version 126754 (0.0025) [2025-01-04 06:43:37,196][134294] Updated weights for policy 0, policy_version 126764 (0.0027) [2025-01-04 06:43:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14398.5). Total num frames: 519245824. Throughput: 0: 3579.8. Samples: 118979456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:38,968][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 06:43:40,523][134294] Updated weights for policy 0, policy_version 126774 (0.0026) [2025-01-04 06:43:42,578][134294] Updated weights for policy 0, policy_version 126784 (0.0013) [2025-01-04 06:43:43,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 519331840. Throughput: 0: 3669.3. Samples: 119002826. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:43,968][134211] Avg episode reward: [(0, '9.222')] [2025-01-04 06:43:44,921][134294] Updated weights for policy 0, policy_version 126794 (0.0017) [2025-01-04 06:43:48,084][134294] Updated weights for policy 0, policy_version 126804 (0.0027) [2025-01-04 06:43:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 519397376. Throughput: 0: 3695.7. Samples: 119014470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:48,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 06:43:51,199][134294] Updated weights for policy 0, policy_version 126814 (0.0026) [2025-01-04 06:43:53,868][134294] Updated weights for policy 0, policy_version 126824 (0.0017) [2025-01-04 06:43:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.6, 300 sec: 14370.7). Total num frames: 519471104. Throughput: 0: 3475.6. Samples: 119033772. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:53,968][134211] Avg episode reward: [(0, '8.437')] [2025-01-04 06:43:55,967][134294] Updated weights for policy 0, policy_version 126834 (0.0014) [2025-01-04 06:43:57,855][134294] Updated weights for policy 0, policy_version 126844 (0.0013) [2025-01-04 06:43:58,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15087.0, 300 sec: 14370.7). Total num frames: 519573504. Throughput: 0: 3572.1. Samples: 119063376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:43:58,968][134211] Avg episode reward: [(0, '8.800')] [2025-01-04 06:43:59,730][134294] Updated weights for policy 0, policy_version 126854 (0.0014) [2025-01-04 06:44:01,605][134294] Updated weights for policy 0, policy_version 126864 (0.0014) [2025-01-04 06:44:03,968][134211] Fps is (10 sec: 20069.9, 60 sec: 15428.3, 300 sec: 14523.5). Total num frames: 519671808. Throughput: 0: 3758.0. Samples: 119079724. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:03,968][134211] Avg episode reward: [(0, '8.756')] [2025-01-04 06:44:04,161][134294] Updated weights for policy 0, policy_version 126874 (0.0020) [2025-01-04 06:44:07,340][134294] Updated weights for policy 0, policy_version 126884 (0.0027) [2025-01-04 06:44:08,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14677.3, 300 sec: 14537.3). Total num frames: 519733248. Throughput: 0: 3811.5. Samples: 119101684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:08,968][134211] Avg episode reward: [(0, '7.940')] [2025-01-04 06:44:11,070][134294] Updated weights for policy 0, policy_version 126894 (0.0033) [2025-01-04 06:44:13,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14336.2, 300 sec: 14509.6). Total num frames: 519786496. Throughput: 0: 3757.2. Samples: 119118356. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:13,968][134211] Avg episode reward: [(0, '9.525')] [2025-01-04 06:44:14,870][134294] Updated weights for policy 0, policy_version 126904 (0.0026) [2025-01-04 06:44:18,437][134294] Updated weights for policy 0, policy_version 126914 (0.0027) [2025-01-04 06:44:18,968][134211] Fps is (10 sec: 11059.2, 60 sec: 14336.0, 300 sec: 14495.7). Total num frames: 519843840. Throughput: 0: 3716.4. Samples: 119126686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:18,968][134211] Avg episode reward: [(0, '7.951')] [2025-01-04 06:44:21,960][134294] Updated weights for policy 0, policy_version 126924 (0.0023) [2025-01-04 06:44:23,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14199.4, 300 sec: 14342.9). Total num frames: 519901184. Throughput: 0: 3657.7. Samples: 119144052. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:23,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 06:44:25,361][134294] Updated weights for policy 0, policy_version 126934 (0.0024) [2025-01-04 06:44:28,557][134294] Updated weights for policy 0, policy_version 126944 (0.0025) [2025-01-04 06:44:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 519966720. Throughput: 0: 3552.3. Samples: 119162678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:28,968][134211] Avg episode reward: [(0, '8.123')] [2025-01-04 06:44:31,903][134294] Updated weights for policy 0, policy_version 126954 (0.0025) [2025-01-04 06:44:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14131.2, 300 sec: 14148.6). Total num frames: 520028160. Throughput: 0: 3496.0. Samples: 119171792. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:44:33,968][134211] Avg episode reward: [(0, '8.859')] [2025-01-04 06:44:34,995][134294] Updated weights for policy 0, policy_version 126964 (0.0027) [2025-01-04 06:44:37,589][134294] Updated weights for policy 0, policy_version 126974 (0.0021) [2025-01-04 06:44:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14245.8). Total num frames: 520110080. Throughput: 0: 3524.2. Samples: 119192360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:44:38,968][134211] Avg episode reward: [(0, '8.910')] [2025-01-04 06:44:39,779][134294] Updated weights for policy 0, policy_version 126984 (0.0015) [2025-01-04 06:44:42,881][134294] Updated weights for policy 0, policy_version 126994 (0.0026) [2025-01-04 06:44:43,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14131.2, 300 sec: 14273.5). Total num frames: 520179712. Throughput: 0: 3387.9. Samples: 119215830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:44:43,968][134211] Avg episode reward: [(0, '7.608')] [2025-01-04 06:44:46,244][134294] Updated weights for policy 0, policy_version 127004 (0.0028) [2025-01-04 06:44:48,968][134211] Fps is (10 sec: 13106.1, 60 sec: 14062.7, 300 sec: 14259.6). Total num frames: 520241152. Throughput: 0: 3233.2. Samples: 119225220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:44:48,969][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 06:44:49,295][134294] Updated weights for policy 0, policy_version 127014 (0.0026) [2025-01-04 06:44:51,812][134294] Updated weights for policy 0, policy_version 127024 (0.0017) [2025-01-04 06:44:53,827][134294] Updated weights for policy 0, policy_version 127034 (0.0013) [2025-01-04 06:44:53,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14336.0, 300 sec: 14343.0). Total num frames: 520331264. Throughput: 0: 3244.6. Samples: 119247692. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:44:53,968][134211] Avg episode reward: [(0, '8.110')] [2025-01-04 06:44:55,888][134294] Updated weights for policy 0, policy_version 127044 (0.0013) [2025-01-04 06:44:57,792][134294] Updated weights for policy 0, policy_version 127054 (0.0013) [2025-01-04 06:44:58,968][134211] Fps is (10 sec: 18842.2, 60 sec: 14267.6, 300 sec: 14454.1). Total num frames: 520429568. Throughput: 0: 3549.4. Samples: 119278082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:44:58,969][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 06:45:00,697][134294] Updated weights for policy 0, policy_version 127064 (0.0025) [2025-01-04 06:45:03,868][134294] Updated weights for policy 0, policy_version 127074 (0.0027) [2025-01-04 06:45:03,970][134211] Fps is (10 sec: 16380.5, 60 sec: 13721.2, 300 sec: 14453.9). Total num frames: 520495104. Throughput: 0: 3591.8. Samples: 119288324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:03,970][134211] Avg episode reward: [(0, '9.055')] [2025-01-04 06:45:07,071][134294] Updated weights for policy 0, policy_version 127084 (0.0026) [2025-01-04 06:45:08,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13721.6, 300 sec: 14467.9). Total num frames: 520556544. Throughput: 0: 3632.7. Samples: 119307522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:08,968][134211] Avg episode reward: [(0, '8.184')] [2025-01-04 06:45:10,474][134294] Updated weights for policy 0, policy_version 127094 (0.0026) [2025-01-04 06:45:13,725][134294] Updated weights for policy 0, policy_version 127104 (0.0027) [2025-01-04 06:45:13,968][134211] Fps is (10 sec: 12290.5, 60 sec: 13858.2, 300 sec: 14329.1). Total num frames: 520617984. Throughput: 0: 3629.1. Samples: 119325988. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:13,969][134211] Avg episode reward: [(0, '8.160')] [2025-01-04 06:45:14,018][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127105_520622080.pth... [2025-01-04 06:45:14,088][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000126263_517173248.pth [2025-01-04 06:45:17,196][134294] Updated weights for policy 0, policy_version 127114 (0.0028) [2025-01-04 06:45:18,970][134211] Fps is (10 sec: 12285.5, 60 sec: 13925.9, 300 sec: 14217.9). Total num frames: 520679424. Throughput: 0: 3621.9. Samples: 119334786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:18,970][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 06:45:20,221][134294] Updated weights for policy 0, policy_version 127124 (0.0027) [2025-01-04 06:45:23,158][134294] Updated weights for policy 0, policy_version 127134 (0.0024) [2025-01-04 06:45:23,971][134211] Fps is (10 sec: 13102.9, 60 sec: 14130.4, 300 sec: 14245.6). Total num frames: 520749056. Throughput: 0: 3614.1. Samples: 119355006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:23,972][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 06:45:26,283][134294] Updated weights for policy 0, policy_version 127144 (0.0023) [2025-01-04 06:45:28,968][134211] Fps is (10 sec: 13519.7, 60 sec: 14131.2, 300 sec: 14245.8). Total num frames: 520814592. Throughput: 0: 3535.2. Samples: 119374916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:28,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 06:45:29,378][134294] Updated weights for policy 0, policy_version 127154 (0.0025) [2025-01-04 06:45:31,537][134294] Updated weights for policy 0, policy_version 127164 (0.0014) [2025-01-04 06:45:33,968][134211] Fps is (10 sec: 15160.3, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 520900608. 
Throughput: 0: 3602.2. Samples: 119387316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:33,968][134211] Avg episode reward: [(0, '8.224')] [2025-01-04 06:45:34,179][134294] Updated weights for policy 0, policy_version 127174 (0.0022) [2025-01-04 06:45:37,139][134294] Updated weights for policy 0, policy_version 127184 (0.0026) [2025-01-04 06:45:38,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14267.7, 300 sec: 14329.1). Total num frames: 520966144. Throughput: 0: 3597.8. Samples: 119409594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:38,968][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 06:45:40,184][134294] Updated weights for policy 0, policy_version 127194 (0.0025) [2025-01-04 06:45:42,915][134294] Updated weights for policy 0, policy_version 127204 (0.0020) [2025-01-04 06:45:43,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14472.6, 300 sec: 14384.6). Total num frames: 521048064. Throughput: 0: 3403.0. Samples: 119431216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:43,968][134211] Avg episode reward: [(0, '8.180')] [2025-01-04 06:45:44,983][134294] Updated weights for policy 0, policy_version 127214 (0.0013) [2025-01-04 06:45:46,934][134294] Updated weights for policy 0, policy_version 127224 (0.0015) [2025-01-04 06:45:48,819][134294] Updated weights for policy 0, policy_version 127234 (0.0012) [2025-01-04 06:45:48,968][134211] Fps is (10 sec: 18431.9, 60 sec: 15155.4, 300 sec: 14523.5). Total num frames: 521150464. Throughput: 0: 3517.0. Samples: 119446582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:48,969][134211] Avg episode reward: [(0, '8.228')] [2025-01-04 06:45:50,689][134294] Updated weights for policy 0, policy_version 127244 (0.0013) [2025-01-04 06:45:53,264][134294] Updated weights for policy 0, policy_version 127254 (0.0021) [2025-01-04 06:45:53,968][134211] Fps is (10 sec: 19250.6, 60 sec: 15155.1, 300 sec: 14551.2). Total num frames: 521240576. 
Throughput: 0: 3782.1. Samples: 119477718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:53,969][134211] Avg episode reward: [(0, '8.478')] [2025-01-04 06:45:56,550][134294] Updated weights for policy 0, policy_version 127264 (0.0027) [2025-01-04 06:45:58,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14540.9, 300 sec: 14523.4). Total num frames: 521302016. Throughput: 0: 3790.3. Samples: 119496552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:45:58,968][134211] Avg episode reward: [(0, '7.866')] [2025-01-04 06:45:59,755][134294] Updated weights for policy 0, policy_version 127274 (0.0027) [2025-01-04 06:46:03,151][134294] Updated weights for policy 0, policy_version 127284 (0.0027) [2025-01-04 06:46:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14473.0, 300 sec: 14509.6). Total num frames: 521363456. Throughput: 0: 3806.0. Samples: 119506046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:03,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 06:46:06,192][134294] Updated weights for policy 0, policy_version 127294 (0.0026) [2025-01-04 06:46:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14540.8, 300 sec: 14426.3). Total num frames: 521428992. Throughput: 0: 3785.9. Samples: 119525358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:08,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 06:46:09,345][134294] Updated weights for policy 0, policy_version 127304 (0.0026) [2025-01-04 06:46:12,599][134294] Updated weights for policy 0, policy_version 127314 (0.0026) [2025-01-04 06:46:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 521490432. Throughput: 0: 3763.1. Samples: 119544258. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:13,968][134211] Avg episode reward: [(0, '8.760')] [2025-01-04 06:46:16,155][134294] Updated weights for policy 0, policy_version 127324 (0.0028) [2025-01-04 06:46:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14541.3, 300 sec: 14356.8). Total num frames: 521551872. Throughput: 0: 3682.6. Samples: 119553034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:18,968][134211] Avg episode reward: [(0, '9.130')] [2025-01-04 06:46:19,700][134294] Updated weights for policy 0, policy_version 127334 (0.0028) [2025-01-04 06:46:21,732][134294] Updated weights for policy 0, policy_version 127344 (0.0014) [2025-01-04 06:46:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14746.4, 300 sec: 14412.4). Total num frames: 521633792. Throughput: 0: 3684.9. Samples: 119575416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:23,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 06:46:24,502][134294] Updated weights for policy 0, policy_version 127354 (0.0023) [2025-01-04 06:46:27,624][134294] Updated weights for policy 0, policy_version 127364 (0.0026) [2025-01-04 06:46:28,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.6, 300 sec: 14412.5). Total num frames: 521699328. Throughput: 0: 3652.3. Samples: 119595572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:28,968][134211] Avg episode reward: [(0, '7.888')] [2025-01-04 06:46:30,703][134294] Updated weights for policy 0, policy_version 127374 (0.0024) [2025-01-04 06:46:33,526][134294] Updated weights for policy 0, policy_version 127384 (0.0024) [2025-01-04 06:46:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14472.5, 300 sec: 14426.2). Total num frames: 521768960. Throughput: 0: 3543.3. Samples: 119606032. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:33,969][134211] Avg episode reward: [(0, '9.767')] [2025-01-04 06:46:36,587][134294] Updated weights for policy 0, policy_version 127394 (0.0024) [2025-01-04 06:46:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14472.4, 300 sec: 14398.5). Total num frames: 521834496. Throughput: 0: 3309.3. Samples: 119626636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:38,969][134211] Avg episode reward: [(0, '8.601')] [2025-01-04 06:46:39,713][134294] Updated weights for policy 0, policy_version 127404 (0.0027) [2025-01-04 06:46:42,398][134294] Updated weights for policy 0, policy_version 127414 (0.0021) [2025-01-04 06:46:43,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14472.5, 300 sec: 14315.2). Total num frames: 521916416. Throughput: 0: 3393.3. Samples: 119649250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:43,968][134211] Avg episode reward: [(0, '7.998')] [2025-01-04 06:46:44,457][134294] Updated weights for policy 0, policy_version 127424 (0.0012) [2025-01-04 06:46:46,510][134294] Updated weights for policy 0, policy_version 127434 (0.0012) [2025-01-04 06:46:48,449][134294] Updated weights for policy 0, policy_version 127444 (0.0013) [2025-01-04 06:46:48,968][134211] Fps is (10 sec: 18432.9, 60 sec: 14472.6, 300 sec: 14440.2). Total num frames: 522018816. Throughput: 0: 3506.7. Samples: 119663846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:48,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 06:46:50,535][134294] Updated weights for policy 0, policy_version 127454 (0.0014) [2025-01-04 06:46:53,742][134294] Updated weights for policy 0, policy_version 127464 (0.0026) [2025-01-04 06:46:53,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14199.5, 300 sec: 14467.9). Total num frames: 522092544. Throughput: 0: 3696.6. Samples: 119691704. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:53,969][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 06:46:57,258][134294] Updated weights for policy 0, policy_version 127474 (0.0029) [2025-01-04 06:46:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14454.0). Total num frames: 522153984. Throughput: 0: 3673.7. Samples: 119709572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:46:58,968][134211] Avg episode reward: [(0, '9.253')] [2025-01-04 06:47:00,317][134294] Updated weights for policy 0, policy_version 127484 (0.0025) [2025-01-04 06:47:03,380][134294] Updated weights for policy 0, policy_version 127494 (0.0026) [2025-01-04 06:47:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14267.7, 300 sec: 14467.9). Total num frames: 522219520. Throughput: 0: 3703.4. Samples: 119719686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:03,968][134211] Avg episode reward: [(0, '8.642')] [2025-01-04 06:47:06,494][134294] Updated weights for policy 0, policy_version 127504 (0.0026) [2025-01-04 06:47:08,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14267.6, 300 sec: 14467.9). Total num frames: 522285056. Throughput: 0: 3646.5. Samples: 119739508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:08,969][134211] Avg episode reward: [(0, '8.088')] [2025-01-04 06:47:09,922][134294] Updated weights for policy 0, policy_version 127514 (0.0024) [2025-01-04 06:47:13,297][134294] Updated weights for policy 0, policy_version 127524 (0.0025) [2025-01-04 06:47:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14329.0). Total num frames: 522346496. Throughput: 0: 3597.5. Samples: 119757462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:13,968][134211] Avg episode reward: [(0, '8.727')] [2025-01-04 06:47:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127526_522346496.pth... 
[2025-01-04 06:47:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000126691_518926336.pth [2025-01-04 06:47:16,645][134294] Updated weights for policy 0, policy_version 127534 (0.0027) [2025-01-04 06:47:18,969][134211] Fps is (10 sec: 12287.4, 60 sec: 14267.5, 300 sec: 14176.3). Total num frames: 522407936. Throughput: 0: 3569.1. Samples: 119766646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:18,969][134211] Avg episode reward: [(0, '8.138')] [2025-01-04 06:47:19,743][134294] Updated weights for policy 0, policy_version 127544 (0.0023) [2025-01-04 06:47:22,687][134294] Updated weights for policy 0, policy_version 127554 (0.0027) [2025-01-04 06:47:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14062.9, 300 sec: 14218.0). Total num frames: 522477568. Throughput: 0: 3557.9. Samples: 119786742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:23,968][134211] Avg episode reward: [(0, '7.869')] [2025-01-04 06:47:25,814][134294] Updated weights for policy 0, policy_version 127564 (0.0026) [2025-01-04 06:47:27,797][134294] Updated weights for policy 0, policy_version 127574 (0.0015) [2025-01-04 06:47:28,968][134211] Fps is (10 sec: 15976.1, 60 sec: 14472.6, 300 sec: 14315.2). Total num frames: 522567680. Throughput: 0: 3596.7. Samples: 119811100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:28,968][134211] Avg episode reward: [(0, '8.200')] [2025-01-04 06:47:29,631][134294] Updated weights for policy 0, policy_version 127584 (0.0014) [2025-01-04 06:47:31,536][134294] Updated weights for policy 0, policy_version 127594 (0.0012) [2025-01-04 06:47:33,430][134294] Updated weights for policy 0, policy_version 127604 (0.0013) [2025-01-04 06:47:33,968][134211] Fps is (10 sec: 19661.3, 60 sec: 15087.0, 300 sec: 14454.0). Total num frames: 522674176. Throughput: 0: 3637.3. Samples: 119827524. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:33,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 06:47:35,364][134294] Updated weights for policy 0, policy_version 127614 (0.0014) [2025-01-04 06:47:37,986][134294] Updated weights for policy 0, policy_version 127624 (0.0023) [2025-01-04 06:47:38,968][134211] Fps is (10 sec: 18841.0, 60 sec: 15360.0, 300 sec: 14523.4). Total num frames: 522756096. Throughput: 0: 3689.3. Samples: 119857722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:47:38,969][134211] Avg episode reward: [(0, '9.285')] [2025-01-04 06:47:41,547][134294] Updated weights for policy 0, policy_version 127634 (0.0031) [2025-01-04 06:47:43,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14950.3, 300 sec: 14495.7). Total num frames: 522813440. Throughput: 0: 3671.7. Samples: 119874798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:47:43,969][134211] Avg episode reward: [(0, '8.020')] [2025-01-04 06:47:45,285][134294] Updated weights for policy 0, policy_version 127644 (0.0025) [2025-01-04 06:47:48,451][134294] Updated weights for policy 0, policy_version 127654 (0.0028) [2025-01-04 06:47:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 522874880. Throughput: 0: 3636.0. Samples: 119883304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:47:48,968][134211] Avg episode reward: [(0, '8.219')] [2025-01-04 06:47:51,473][134294] Updated weights for policy 0, policy_version 127664 (0.0028) [2025-01-04 06:47:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.5, 300 sec: 14495.7). Total num frames: 522944512. Throughput: 0: 3641.1. Samples: 119903356. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:47:53,968][134211] Avg episode reward: [(0, '8.960')] [2025-01-04 06:47:54,663][134294] Updated weights for policy 0, policy_version 127674 (0.0024) [2025-01-04 06:47:57,924][134294] Updated weights for policy 0, policy_version 127684 (0.0032) [2025-01-04 06:47:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.5, 300 sec: 14440.2). Total num frames: 523005952. Throughput: 0: 3667.2. Samples: 119922486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:47:58,968][134211] Avg episode reward: [(0, '9.460')] [2025-01-04 06:48:00,815][134294] Updated weights for policy 0, policy_version 127694 (0.0026) [2025-01-04 06:48:03,773][134294] Updated weights for policy 0, policy_version 127704 (0.0025) [2025-01-04 06:48:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14315.2). Total num frames: 523075584. Throughput: 0: 3696.0. Samples: 119932962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:03,968][134211] Avg episode reward: [(0, '8.356')] [2025-01-04 06:48:06,715][134294] Updated weights for policy 0, policy_version 127714 (0.0024) [2025-01-04 06:48:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14267.8, 300 sec: 14287.5). Total num frames: 523141120. Throughput: 0: 3707.1. Samples: 119953560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:08,968][134211] Avg episode reward: [(0, '9.428')] [2025-01-04 06:48:10,175][134294] Updated weights for policy 0, policy_version 127724 (0.0027) [2025-01-04 06:48:13,424][134294] Updated weights for policy 0, policy_version 127734 (0.0022) [2025-01-04 06:48:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14336.1, 300 sec: 14315.2). Total num frames: 523206656. Throughput: 0: 3568.5. Samples: 119971682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:13,968][134211] Avg episode reward: [(0, '9.052')] [2025-01-04 06:48:15,576][134294] Updated weights for policy 0, policy_version 127744 (0.0013) [2025-01-04 06:48:17,661][134294] Updated weights for policy 0, policy_version 127754 (0.0014) [2025-01-04 06:48:18,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14950.7, 300 sec: 14426.3). Total num frames: 523304960. Throughput: 0: 3520.8. Samples: 119985962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:18,968][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 06:48:19,610][134294] Updated weights for policy 0, policy_version 127764 (0.0012) [2025-01-04 06:48:21,503][134294] Updated weights for policy 0, policy_version 127774 (0.0013) [2025-01-04 06:48:23,343][134294] Updated weights for policy 0, policy_version 127784 (0.0013) [2025-01-04 06:48:23,968][134211] Fps is (10 sec: 20889.4, 60 sec: 15633.1, 300 sec: 14592.9). Total num frames: 523415552. Throughput: 0: 3554.6. Samples: 120017680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:23,968][134211] Avg episode reward: [(0, '8.327')] [2025-01-04 06:48:26,025][134294] Updated weights for policy 0, policy_version 127794 (0.0024) [2025-01-04 06:48:28,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15223.4, 300 sec: 14579.0). Total num frames: 523481088. Throughput: 0: 3688.0. Samples: 120040756. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:28,968][134211] Avg episode reward: [(0, '8.394')] [2025-01-04 06:48:29,399][134294] Updated weights for policy 0, policy_version 127804 (0.0026) [2025-01-04 06:48:32,555][134294] Updated weights for policy 0, policy_version 127814 (0.0028) [2025-01-04 06:48:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14472.5, 300 sec: 14565.1). Total num frames: 523542528. Throughput: 0: 3711.0. Samples: 120050300. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:33,969][134211] Avg episode reward: [(0, '8.654')] [2025-01-04 06:48:35,578][134294] Updated weights for policy 0, policy_version 127824 (0.0027) [2025-01-04 06:48:38,653][134294] Updated weights for policy 0, policy_version 127834 (0.0024) [2025-01-04 06:48:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14495.7). Total num frames: 523608064. Throughput: 0: 3709.5. Samples: 120070282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:38,968][134211] Avg episode reward: [(0, '8.160')] [2025-01-04 06:48:42,060][134294] Updated weights for policy 0, policy_version 127844 (0.0027) [2025-01-04 06:48:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 523669504. Throughput: 0: 3692.2. Samples: 120088634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:43,969][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 06:48:45,458][134294] Updated weights for policy 0, policy_version 127854 (0.0026) [2025-01-04 06:48:48,509][134294] Updated weights for policy 0, policy_version 127864 (0.0027) [2025-01-04 06:48:48,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 523735040. Throughput: 0: 3668.0. Samples: 120098022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:48,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 06:48:51,528][134294] Updated weights for policy 0, policy_version 127874 (0.0025) [2025-01-04 06:48:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.0, 300 sec: 14342.9). Total num frames: 523804672. Throughput: 0: 3660.4. Samples: 120118278. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:53,968][134211] Avg episode reward: [(0, '8.197')] [2025-01-04 06:48:54,730][134294] Updated weights for policy 0, policy_version 127884 (0.0024) [2025-01-04 06:48:57,861][134294] Updated weights for policy 0, policy_version 127894 (0.0023) [2025-01-04 06:48:58,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 523866112. Throughput: 0: 3687.0. Samples: 120137596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:48:58,968][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 06:49:00,904][134294] Updated weights for policy 0, policy_version 127904 (0.0027) [2025-01-04 06:49:03,963][134294] Updated weights for policy 0, policy_version 127914 (0.0027) [2025-01-04 06:49:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.0, 300 sec: 14245.8). Total num frames: 523935744. Throughput: 0: 3597.5. Samples: 120147848. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:03,968][134211] Avg episode reward: [(0, '8.479')] [2025-01-04 06:49:07,102][134294] Updated weights for policy 0, policy_version 127924 (0.0023) [2025-01-04 06:49:08,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14540.8, 300 sec: 14329.1). Total num frames: 524013568. Throughput: 0: 3340.4. Samples: 120167998. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:08,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 06:49:09,029][134294] Updated weights for policy 0, policy_version 127934 (0.0013) [2025-01-04 06:49:11,069][134294] Updated weights for policy 0, policy_version 127944 (0.0015) [2025-01-04 06:49:13,100][134294] Updated weights for policy 0, policy_version 127954 (0.0014) [2025-01-04 06:49:13,968][134211] Fps is (10 sec: 18022.2, 60 sec: 15155.2, 300 sec: 14481.8). Total num frames: 524115968. Throughput: 0: 3505.1. Samples: 120198484. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:13,968][134211] Avg episode reward: [(0, '7.960')] [2025-01-04 06:49:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127958_524115968.pth... [2025-01-04 06:49:14,040][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127105_520622080.pth [2025-01-04 06:49:16,012][134294] Updated weights for policy 0, policy_version 127964 (0.0019) [2025-01-04 06:49:18,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14540.8, 300 sec: 14495.7). Total num frames: 524177408. Throughput: 0: 3519.9. Samples: 120208696. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:18,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 06:49:19,317][134294] Updated weights for policy 0, policy_version 127974 (0.0025) [2025-01-04 06:49:22,454][134294] Updated weights for policy 0, policy_version 127984 (0.0026) [2025-01-04 06:49:23,969][134211] Fps is (10 sec: 12286.9, 60 sec: 13721.4, 300 sec: 14481.7). Total num frames: 524238848. Throughput: 0: 3503.7. Samples: 120227954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:23,969][134211] Avg episode reward: [(0, '8.801')] [2025-01-04 06:49:25,622][134294] Updated weights for policy 0, policy_version 127994 (0.0028) [2025-01-04 06:49:28,830][134294] Updated weights for policy 0, policy_version 128004 (0.0024) [2025-01-04 06:49:28,968][134211] Fps is (10 sec: 12697.1, 60 sec: 13721.5, 300 sec: 14495.7). Total num frames: 524304384. Throughput: 0: 3522.3. Samples: 120247140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:28,969][134211] Avg episode reward: [(0, '8.474')] [2025-01-04 06:49:32,006][134294] Updated weights for policy 0, policy_version 128014 (0.0027) [2025-01-04 06:49:33,968][134211] Fps is (10 sec: 13108.5, 60 sec: 13789.9, 300 sec: 14440.1). Total num frames: 524369920. 
Throughput: 0: 3528.7. Samples: 120256812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:33,968][134211] Avg episode reward: [(0, '8.816')] [2025-01-04 06:49:35,024][134294] Updated weights for policy 0, policy_version 128024 (0.0024) [2025-01-04 06:49:37,984][134294] Updated weights for policy 0, policy_version 128034 (0.0022) [2025-01-04 06:49:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13789.8, 300 sec: 14426.2). Total num frames: 524435456. Throughput: 0: 3537.7. Samples: 120277476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:38,968][134211] Avg episode reward: [(0, '8.111')] [2025-01-04 06:49:41,136][134294] Updated weights for policy 0, policy_version 128044 (0.0021) [2025-01-04 06:49:43,206][134294] Updated weights for policy 0, policy_version 128054 (0.0013) [2025-01-04 06:49:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14199.5, 300 sec: 14509.6). Total num frames: 524521472. Throughput: 0: 3615.2. Samples: 120300282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:49:43,968][134211] Avg episode reward: [(0, '9.380')] [2025-01-04 06:49:45,305][134294] Updated weights for policy 0, policy_version 128064 (0.0013) [2025-01-04 06:49:47,260][134294] Updated weights for policy 0, policy_version 128074 (0.0014) [2025-01-04 06:49:48,968][134211] Fps is (10 sec: 19251.7, 60 sec: 14882.2, 300 sec: 14565.1). Total num frames: 524627968. Throughput: 0: 3716.5. Samples: 120315088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:49:48,968][134211] Avg episode reward: [(0, '8.966')] [2025-01-04 06:49:49,131][134294] Updated weights for policy 0, policy_version 128084 (0.0013) [2025-01-04 06:49:51,021][134294] Updated weights for policy 0, policy_version 128094 (0.0014) [2025-01-04 06:49:53,761][134294] Updated weights for policy 0, policy_version 128104 (0.0025) [2025-01-04 06:49:53,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15155.2, 300 sec: 14523.5). Total num frames: 524713984. 
Throughput: 0: 3966.1. Samples: 120346472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:49:53,968][134211] Avg episode reward: [(0, '7.705')] [2025-01-04 06:49:57,191][134294] Updated weights for policy 0, policy_version 128114 (0.0028) [2025-01-04 06:49:58,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15155.2, 300 sec: 14509.7). Total num frames: 524775424. Throughput: 0: 3696.3. Samples: 120364818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:49:58,969][134211] Avg episode reward: [(0, '9.853')] [2025-01-04 06:50:00,407][134294] Updated weights for policy 0, policy_version 128124 (0.0028) [2025-01-04 06:50:03,534][134294] Updated weights for policy 0, policy_version 128134 (0.0029) [2025-01-04 06:50:03,969][134211] Fps is (10 sec: 12696.4, 60 sec: 15086.7, 300 sec: 14523.4). Total num frames: 524840960. Throughput: 0: 3689.5. Samples: 120374726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:03,969][134211] Avg episode reward: [(0, '7.992')] [2025-01-04 06:50:06,536][134294] Updated weights for policy 0, policy_version 128144 (0.0026) [2025-01-04 06:50:08,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14882.0, 300 sec: 14537.3). Total num frames: 524906496. Throughput: 0: 3702.9. Samples: 120394582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:08,969][134211] Avg episode reward: [(0, '8.412')] [2025-01-04 06:50:09,996][134294] Updated weights for policy 0, policy_version 128154 (0.0027) [2025-01-04 06:50:13,466][134294] Updated weights for policy 0, policy_version 128164 (0.0024) [2025-01-04 06:50:13,968][134211] Fps is (10 sec: 12289.0, 60 sec: 14131.2, 300 sec: 14523.5). Total num frames: 524963840. Throughput: 0: 3671.7. Samples: 120412368. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:13,969][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 06:50:16,912][134294] Updated weights for policy 0, policy_version 128174 (0.0025) [2025-01-04 06:50:18,968][134211] Fps is (10 sec: 11879.1, 60 sec: 14131.2, 300 sec: 14495.8). Total num frames: 525025280. Throughput: 0: 3649.9. Samples: 120421056. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:18,968][134211] Avg episode reward: [(0, '8.452')] [2025-01-04 06:50:20,372][134294] Updated weights for policy 0, policy_version 128184 (0.0027) [2025-01-04 06:50:23,485][134294] Updated weights for policy 0, policy_version 128194 (0.0025) [2025-01-04 06:50:23,969][134211] Fps is (10 sec: 12287.2, 60 sec: 14131.2, 300 sec: 14481.7). Total num frames: 525086720. Throughput: 0: 3599.8. Samples: 120439468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:23,969][134211] Avg episode reward: [(0, '8.795')] [2025-01-04 06:50:26,444][134294] Updated weights for policy 0, policy_version 128204 (0.0025) [2025-01-04 06:50:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 525152256. Throughput: 0: 3538.8. Samples: 120459528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:28,968][134211] Avg episode reward: [(0, '8.622')] [2025-01-04 06:50:29,702][134294] Updated weights for policy 0, policy_version 128214 (0.0027) [2025-01-04 06:50:31,983][134294] Updated weights for policy 0, policy_version 128224 (0.0018) [2025-01-04 06:50:33,916][134294] Updated weights for policy 0, policy_version 128234 (0.0012) [2025-01-04 06:50:33,968][134211] Fps is (10 sec: 15976.1, 60 sec: 14609.1, 300 sec: 14509.6). Total num frames: 525246464. Throughput: 0: 3449.3. Samples: 120470308. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:33,968][134211] Avg episode reward: [(0, '7.397')] [2025-01-04 06:50:35,795][134294] Updated weights for policy 0, policy_version 128244 (0.0013) [2025-01-04 06:50:37,741][134294] Updated weights for policy 0, policy_version 128254 (0.0014) [2025-01-04 06:50:38,968][134211] Fps is (10 sec: 19251.7, 60 sec: 15155.3, 300 sec: 14565.1). Total num frames: 525344768. Throughput: 0: 3471.0. Samples: 120502666. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:38,968][134211] Avg episode reward: [(0, '8.639')] [2025-01-04 06:50:40,653][134294] Updated weights for policy 0, policy_version 128264 (0.0023) [2025-01-04 06:50:43,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14745.6, 300 sec: 14426.2). Total num frames: 525406208. Throughput: 0: 3507.8. Samples: 120522668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:50:43,969][134211] Avg episode reward: [(0, '7.666')] [2025-01-04 06:50:44,366][134294] Updated weights for policy 0, policy_version 128274 (0.0027) [2025-01-04 06:50:47,706][134294] Updated weights for policy 0, policy_version 128284 (0.0027) [2025-01-04 06:50:48,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 525463552. Throughput: 0: 3472.3. Samples: 120530978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:50:48,968][134211] Avg episode reward: [(0, '8.228')] [2025-01-04 06:50:50,804][134294] Updated weights for policy 0, policy_version 128294 (0.0031) [2025-01-04 06:50:53,765][134294] Updated weights for policy 0, policy_version 128304 (0.0025) [2025-01-04 06:50:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13653.3, 300 sec: 14342.9). Total num frames: 525533184. Throughput: 0: 3479.7. Samples: 120551168. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:50:53,968][134211] Avg episode reward: [(0, '8.901')] [2025-01-04 06:50:56,976][134294] Updated weights for policy 0, policy_version 128314 (0.0024) [2025-01-04 06:50:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.6, 300 sec: 14356.8). Total num frames: 525598720. Throughput: 0: 3519.8. Samples: 120570760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:50:58,968][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 06:51:00,099][134294] Updated weights for policy 0, policy_version 128324 (0.0028) [2025-01-04 06:51:03,066][134294] Updated weights for policy 0, policy_version 128334 (0.0025) [2025-01-04 06:51:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13721.8, 300 sec: 14356.8). Total num frames: 525664256. Throughput: 0: 3550.4. Samples: 120580824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:03,968][134211] Avg episode reward: [(0, '8.426')] [2025-01-04 06:51:06,056][134294] Updated weights for policy 0, policy_version 128344 (0.0027) [2025-01-04 06:51:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13721.7, 300 sec: 14370.7). Total num frames: 525729792. Throughput: 0: 3594.0. Samples: 120601194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:08,968][134211] Avg episode reward: [(0, '8.599')] [2025-01-04 06:51:09,340][134294] Updated weights for policy 0, policy_version 128354 (0.0028) [2025-01-04 06:51:12,257][134294] Updated weights for policy 0, policy_version 128364 (0.0019) [2025-01-04 06:51:13,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14131.3, 300 sec: 14440.1). Total num frames: 525811712. Throughput: 0: 3621.2. Samples: 120622482. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:13,968][134211] Avg episode reward: [(0, '8.119')] [2025-01-04 06:51:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000128372_525811712.pth... 
[2025-01-04 06:51:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127526_522346496.pth [2025-01-04 06:51:14,421][134294] Updated weights for policy 0, policy_version 128374 (0.0015) [2025-01-04 06:51:16,846][134294] Updated weights for policy 0, policy_version 128384 (0.0018) [2025-01-04 06:51:18,968][134211] Fps is (10 sec: 15563.5, 60 sec: 14335.8, 300 sec: 14412.3). Total num frames: 525885440. Throughput: 0: 3682.0. Samples: 120636002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:18,969][134211] Avg episode reward: [(0, '8.673')] [2025-01-04 06:51:20,065][134294] Updated weights for policy 0, policy_version 128394 (0.0026) [2025-01-04 06:51:22,949][134294] Updated weights for policy 0, policy_version 128404 (0.0029) [2025-01-04 06:51:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14472.7, 300 sec: 14426.2). Total num frames: 525955072. Throughput: 0: 3407.4. Samples: 120656000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:23,968][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 06:51:25,964][134294] Updated weights for policy 0, policy_version 128414 (0.0027) [2025-01-04 06:51:28,604][134294] Updated weights for policy 0, policy_version 128424 (0.0020) [2025-01-04 06:51:28,967][134211] Fps is (10 sec: 14337.4, 60 sec: 14609.1, 300 sec: 14440.2). Total num frames: 526028800. Throughput: 0: 3433.9. Samples: 120677192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:28,968][134211] Avg episode reward: [(0, '8.306')] [2025-01-04 06:51:31,140][134294] Updated weights for policy 0, policy_version 128434 (0.0019) [2025-01-04 06:51:33,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14335.9, 300 sec: 14481.8). Total num frames: 526106624. Throughput: 0: 3531.3. Samples: 120689886. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:33,968][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 06:51:33,969][134294] Updated weights for policy 0, policy_version 128444 (0.0025) [2025-01-04 06:51:37,080][134294] Updated weights for policy 0, policy_version 128454 (0.0025) [2025-01-04 06:51:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13721.6, 300 sec: 14412.4). Total num frames: 526168064. Throughput: 0: 3540.0. Samples: 120710468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:38,968][134211] Avg episode reward: [(0, '7.895')] [2025-01-04 06:51:40,430][134294] Updated weights for policy 0, policy_version 128464 (0.0029) [2025-01-04 06:51:43,456][134294] Updated weights for policy 0, policy_version 128474 (0.0020) [2025-01-04 06:51:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13858.2, 300 sec: 14301.3). Total num frames: 526237696. Throughput: 0: 3520.3. Samples: 120729174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:51:43,968][134211] Avg episode reward: [(0, '8.404')] [2025-01-04 06:51:45,563][134294] Updated weights for policy 0, policy_version 128484 (0.0014) [2025-01-04 06:51:47,553][134294] Updated weights for policy 0, policy_version 128494 (0.0013) [2025-01-04 06:51:48,967][134211] Fps is (10 sec: 16793.8, 60 sec: 14540.9, 300 sec: 14384.6). Total num frames: 526336000. Throughput: 0: 3623.3. Samples: 120743872. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:51:48,968][134211] Avg episode reward: [(0, '8.929')] [2025-01-04 06:51:49,753][134294] Updated weights for policy 0, policy_version 128504 (0.0017) [2025-01-04 06:51:53,147][134294] Updated weights for policy 0, policy_version 128514 (0.0027) [2025-01-04 06:51:53,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14472.6, 300 sec: 14398.5). Total num frames: 526401536. Throughput: 0: 3722.8. Samples: 120768720. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:51:53,968][134211] Avg episode reward: [(0, '7.221')] [2025-01-04 06:51:56,599][134294] Updated weights for policy 0, policy_version 128524 (0.0027) [2025-01-04 06:51:58,970][134211] Fps is (10 sec: 12285.1, 60 sec: 14335.5, 300 sec: 14370.6). Total num frames: 526458880. Throughput: 0: 3633.7. Samples: 120786006. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:51:58,971][134211] Avg episode reward: [(0, '7.999')] [2025-01-04 06:52:00,224][134294] Updated weights for policy 0, policy_version 128534 (0.0027) [2025-01-04 06:52:03,692][134294] Updated weights for policy 0, policy_version 128544 (0.0025) [2025-01-04 06:52:03,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14267.8, 300 sec: 14356.8). Total num frames: 526520320. Throughput: 0: 3526.4. Samples: 120794688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:03,968][134211] Avg episode reward: [(0, '8.633')] [2025-01-04 06:52:05,709][134294] Updated weights for policy 0, policy_version 128554 (0.0013) [2025-01-04 06:52:07,845][134294] Updated weights for policy 0, policy_version 128564 (0.0015) [2025-01-04 06:52:08,968][134211] Fps is (10 sec: 14748.7, 60 sec: 14609.0, 300 sec: 14440.1). Total num frames: 526606336. Throughput: 0: 3643.1. Samples: 120819938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:08,968][134211] Avg episode reward: [(0, '8.110')] [2025-01-04 06:52:11,853][134294] Updated weights for policy 0, policy_version 128574 (0.0027) [2025-01-04 06:52:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 526659584. Throughput: 0: 3531.8. Samples: 120836122. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:13,968][134211] Avg episode reward: [(0, '9.011')] [2025-01-04 06:52:15,846][134294] Updated weights for policy 0, policy_version 128584 (0.0029) [2025-01-04 06:52:17,961][134294] Updated weights for policy 0, policy_version 128594 (0.0014) [2025-01-04 06:52:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14199.6, 300 sec: 14440.1). Total num frames: 526737408. Throughput: 0: 3435.7. Samples: 120844492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:18,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 06:52:20,008][134294] Updated weights for policy 0, policy_version 128604 (0.0016) [2025-01-04 06:52:21,899][134294] Updated weights for policy 0, policy_version 128614 (0.0013) [2025-01-04 06:52:23,801][134294] Updated weights for policy 0, policy_version 128624 (0.0015) [2025-01-04 06:52:23,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14882.2, 300 sec: 14509.6). Total num frames: 526848000. Throughput: 0: 3665.5. Samples: 120875414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:23,968][134211] Avg episode reward: [(0, '8.658')] [2025-01-04 06:52:25,692][134294] Updated weights for policy 0, policy_version 128634 (0.0013) [2025-01-04 06:52:27,563][134294] Updated weights for policy 0, policy_version 128644 (0.0013) [2025-01-04 06:52:28,968][134211] Fps is (10 sec: 21299.4, 60 sec: 15359.9, 300 sec: 14495.7). Total num frames: 526950400. Throughput: 0: 3970.8. Samples: 120907862. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:28,968][134211] Avg episode reward: [(0, '7.945')] [2025-01-04 06:52:30,152][134294] Updated weights for policy 0, policy_version 128654 (0.0021) [2025-01-04 06:52:33,328][134294] Updated weights for policy 0, policy_version 128664 (0.0025) [2025-01-04 06:52:33,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15086.9, 300 sec: 14426.3). Total num frames: 527011840. Throughput: 0: 3876.0. Samples: 120918292. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:33,969][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 06:52:36,464][134294] Updated weights for policy 0, policy_version 128674 (0.0025) [2025-01-04 06:52:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15086.9, 300 sec: 14440.1). Total num frames: 527073280. Throughput: 0: 3745.3. Samples: 120937258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:38,968][134211] Avg episode reward: [(0, '8.218')] [2025-01-04 06:52:40,098][134294] Updated weights for policy 0, policy_version 128684 (0.0026) [2025-01-04 06:52:43,402][134294] Updated weights for policy 0, policy_version 128694 (0.0026) [2025-01-04 06:52:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14950.3, 300 sec: 14440.1). Total num frames: 527134720. Throughput: 0: 3754.6. Samples: 120954956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:43,969][134211] Avg episode reward: [(0, '8.828')] [2025-01-04 06:52:46,749][134294] Updated weights for policy 0, policy_version 128704 (0.0025) [2025-01-04 06:52:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.2, 300 sec: 14426.3). Total num frames: 527200256. Throughput: 0: 3767.0. Samples: 120964204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:52:48,968][134211] Avg episode reward: [(0, '8.984')] [2025-01-04 06:52:49,889][134294] Updated weights for policy 0, policy_version 128714 (0.0024) [2025-01-04 06:52:52,908][134294] Updated weights for policy 0, policy_version 128724 (0.0025) [2025-01-04 06:52:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.2, 300 sec: 14440.1). Total num frames: 527265792. Throughput: 0: 3648.3. Samples: 120984110. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:52:53,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 06:52:56,027][134294] Updated weights for policy 0, policy_version 128734 (0.0025) [2025-01-04 06:52:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14541.3, 300 sec: 14426.3). Total num frames: 527331328. Throughput: 0: 3727.6. Samples: 121003864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:52:58,968][134211] Avg episode reward: [(0, '9.466')] [2025-01-04 06:52:59,198][134294] Updated weights for policy 0, policy_version 128744 (0.0028) [2025-01-04 06:53:02,257][134294] Updated weights for policy 0, policy_version 128754 (0.0026) [2025-01-04 06:53:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14609.1, 300 sec: 14426.3). Total num frames: 527396864. Throughput: 0: 3759.2. Samples: 121013656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:03,968][134211] Avg episode reward: [(0, '9.753')] [2025-01-04 06:53:05,256][134294] Updated weights for policy 0, policy_version 128764 (0.0026) [2025-01-04 06:53:08,302][134294] Updated weights for policy 0, policy_version 128774 (0.0023) [2025-01-04 06:53:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 527466496. Throughput: 0: 3527.4. Samples: 121034148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:08,968][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 06:53:11,684][134294] Updated weights for policy 0, policy_version 128784 (0.0024) [2025-01-04 06:53:13,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14404.2, 300 sec: 14301.3). Total num frames: 527523840. Throughput: 0: 3206.3. Samples: 121052144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:13,969][134211] Avg episode reward: [(0, '8.692')] [2025-01-04 06:53:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000128790_527523840.pth... 
[2025-01-04 06:53:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000127958_524115968.pth [2025-01-04 06:53:15,268][134294] Updated weights for policy 0, policy_version 128794 (0.0025) [2025-01-04 06:53:18,245][134294] Updated weights for policy 0, policy_version 128804 (0.0025) [2025-01-04 06:53:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.5, 300 sec: 14148.6). Total num frames: 527589376. Throughput: 0: 3175.2. Samples: 121061174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:18,968][134211] Avg episode reward: [(0, '8.121')] [2025-01-04 06:53:21,318][134294] Updated weights for policy 0, policy_version 128814 (0.0027) [2025-01-04 06:53:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13448.5, 300 sec: 14148.6). Total num frames: 527654912. Throughput: 0: 3210.3. Samples: 121081720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:23,968][134211] Avg episode reward: [(0, '8.063')] [2025-01-04 06:53:24,329][134294] Updated weights for policy 0, policy_version 128824 (0.0028) [2025-01-04 06:53:27,284][134294] Updated weights for policy 0, policy_version 128834 (0.0026) [2025-01-04 06:53:28,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13107.2, 300 sec: 14218.0). Total num frames: 527736832. Throughput: 0: 3304.6. Samples: 121103664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:28,968][134211] Avg episode reward: [(0, '8.589')] [2025-01-04 06:53:29,270][134294] Updated weights for policy 0, policy_version 128844 (0.0013) [2025-01-04 06:53:31,206][134294] Updated weights for policy 0, policy_version 128854 (0.0015) [2025-01-04 06:53:33,066][134294] Updated weights for policy 0, policy_version 128864 (0.0012) [2025-01-04 06:53:33,968][134211] Fps is (10 sec: 18842.0, 60 sec: 13858.2, 300 sec: 14356.8). Total num frames: 527843328. Throughput: 0: 3454.3. Samples: 121119646. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:33,968][134211] Avg episode reward: [(0, '7.932')] [2025-01-04 06:53:35,064][134294] Updated weights for policy 0, policy_version 128874 (0.0017) [2025-01-04 06:53:38,159][134294] Updated weights for policy 0, policy_version 128884 (0.0028) [2025-01-04 06:53:38,968][134211] Fps is (10 sec: 18021.1, 60 sec: 14062.8, 300 sec: 14398.5). Total num frames: 527917056. Throughput: 0: 3631.2. Samples: 121147516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:38,969][134211] Avg episode reward: [(0, '8.127')] [2025-01-04 06:53:41,323][134294] Updated weights for policy 0, policy_version 128894 (0.0028) [2025-01-04 06:53:43,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14062.9, 300 sec: 14384.6). Total num frames: 527978496. Throughput: 0: 3600.4. Samples: 121165884. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:43,969][134211] Avg episode reward: [(0, '9.203')] [2025-01-04 06:53:44,905][134294] Updated weights for policy 0, policy_version 128904 (0.0027) [2025-01-04 06:53:48,098][134294] Updated weights for policy 0, policy_version 128914 (0.0027) [2025-01-04 06:53:48,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13994.6, 300 sec: 14356.8). Total num frames: 528039936. Throughput: 0: 3579.1. Samples: 121174716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:48,969][134211] Avg episode reward: [(0, '8.914')] [2025-01-04 06:53:51,315][134294] Updated weights for policy 0, policy_version 128924 (0.0028) [2025-01-04 06:53:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 528105472. Throughput: 0: 3554.8. Samples: 121194114. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:53,969][134211] Avg episode reward: [(0, '8.289')] [2025-01-04 06:53:54,430][134294] Updated weights for policy 0, policy_version 128934 (0.0025) [2025-01-04 06:53:57,527][134294] Updated weights for policy 0, policy_version 128944 (0.0023) [2025-01-04 06:53:58,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 528171008. Throughput: 0: 3595.0. Samples: 121213920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:53:58,968][134211] Avg episode reward: [(0, '8.309')] [2025-01-04 06:54:00,718][134294] Updated weights for policy 0, policy_version 128954 (0.0026) [2025-01-04 06:54:03,682][134294] Updated weights for policy 0, policy_version 128964 (0.0022) [2025-01-04 06:54:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.7, 300 sec: 14315.2). Total num frames: 528236544. Throughput: 0: 3612.5. Samples: 121223738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:03,968][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 06:54:05,690][134294] Updated weights for policy 0, policy_version 128974 (0.0014) [2025-01-04 06:54:07,577][134294] Updated weights for policy 0, policy_version 128984 (0.0013) [2025-01-04 06:54:08,968][134211] Fps is (10 sec: 17613.1, 60 sec: 14677.4, 300 sec: 14343.0). Total num frames: 528347136. Throughput: 0: 3755.6. Samples: 121250720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:08,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 06:54:09,570][134294] Updated weights for policy 0, policy_version 128994 (0.0013) [2025-01-04 06:54:11,428][134294] Updated weights for policy 0, policy_version 129004 (0.0014) [2025-01-04 06:54:13,969][134211] Fps is (10 sec: 19248.6, 60 sec: 15086.7, 300 sec: 14412.3). Total num frames: 528429056. Throughput: 0: 3895.0. Samples: 121278946. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:13,970][134211] Avg episode reward: [(0, '8.280')] [2025-01-04 06:54:14,899][134294] Updated weights for policy 0, policy_version 129014 (0.0027) [2025-01-04 06:54:18,531][134294] Updated weights for policy 0, policy_version 129024 (0.0027) [2025-01-04 06:54:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14950.4, 300 sec: 14398.5). Total num frames: 528486400. Throughput: 0: 3723.5. Samples: 121287204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:18,968][134211] Avg episode reward: [(0, '8.499')] [2025-01-04 06:54:22,148][134294] Updated weights for policy 0, policy_version 129034 (0.0028) [2025-01-04 06:54:23,968][134211] Fps is (10 sec: 11470.1, 60 sec: 14813.9, 300 sec: 14370.7). Total num frames: 528543744. Throughput: 0: 3478.9. Samples: 121304066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:23,968][134211] Avg episode reward: [(0, '8.274')] [2025-01-04 06:54:25,509][134294] Updated weights for policy 0, policy_version 129044 (0.0026) [2025-01-04 06:54:28,632][134294] Updated weights for policy 0, policy_version 129054 (0.0027) [2025-01-04 06:54:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14540.8, 300 sec: 14370.7). Total num frames: 528609280. Throughput: 0: 3494.3. Samples: 121323128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:28,968][134211] Avg episode reward: [(0, '7.965')] [2025-01-04 06:54:31,792][134294] Updated weights for policy 0, policy_version 129064 (0.0025) [2025-01-04 06:54:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.8, 300 sec: 14356.8). Total num frames: 528670720. Throughput: 0: 3510.4. Samples: 121332682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:33,968][134211] Avg episode reward: [(0, '8.558')] [2025-01-04 06:54:35,044][134294] Updated weights for policy 0, policy_version 129074 (0.0025) [2025-01-04 06:54:37,993][134294] Updated weights for policy 0, policy_version 129084 (0.0023) [2025-01-04 06:54:38,968][134211] Fps is (10 sec: 13106.4, 60 sec: 13721.6, 300 sec: 14301.3). Total num frames: 528740352. Throughput: 0: 3516.7. Samples: 121352366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:38,969][134211] Avg episode reward: [(0, '8.947')] [2025-01-04 06:54:41,038][134294] Updated weights for policy 0, policy_version 129094 (0.0024) [2025-01-04 06:54:43,967][134211] Fps is (10 sec: 13517.1, 60 sec: 13789.9, 300 sec: 14162.4). Total num frames: 528805888. Throughput: 0: 3521.9. Samples: 121372406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:43,968][134211] Avg episode reward: [(0, '8.413')] [2025-01-04 06:54:44,040][134294] Updated weights for policy 0, policy_version 129104 (0.0024) [2025-01-04 06:54:46,098][134294] Updated weights for policy 0, policy_version 129114 (0.0013) [2025-01-04 06:54:48,064][134294] Updated weights for policy 0, policy_version 129124 (0.0013) [2025-01-04 06:54:48,967][134211] Fps is (10 sec: 16795.0, 60 sec: 14472.7, 300 sec: 14218.0). Total num frames: 528908288. Throughput: 0: 3612.2. Samples: 121386288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:48,968][134211] Avg episode reward: [(0, '7.759')] [2025-01-04 06:54:49,947][134294] Updated weights for policy 0, policy_version 129134 (0.0013) [2025-01-04 06:54:52,123][134294] Updated weights for policy 0, policy_version 129144 (0.0017) [2025-01-04 06:54:53,968][134211] Fps is (10 sec: 18840.9, 60 sec: 14813.9, 300 sec: 14301.3). Total num frames: 528994304. Throughput: 0: 3693.8. Samples: 121416940. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 06:54:53,969][134211] Avg episode reward: [(0, '8.099')] [2025-01-04 06:54:55,281][134294] Updated weights for policy 0, policy_version 129154 (0.0025) [2025-01-04 06:54:58,573][134294] Updated weights for policy 0, policy_version 129164 (0.0026) [2025-01-04 06:54:58,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14745.6, 300 sec: 14287.5). Total num frames: 529055744. Throughput: 0: 3489.3. Samples: 121435960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:54:58,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 06:55:01,863][134294] Updated weights for policy 0, policy_version 129174 (0.0026) [2025-01-04 06:55:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14745.6, 300 sec: 14287.4). Total num frames: 529121280. Throughput: 0: 3506.1. Samples: 121444978. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:03,968][134211] Avg episode reward: [(0, '8.637')] [2025-01-04 06:55:05,262][134294] Updated weights for policy 0, policy_version 129184 (0.0030) [2025-01-04 06:55:08,201][134294] Updated weights for policy 0, policy_version 129194 (0.0028) [2025-01-04 06:55:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.6, 300 sec: 14315.2). Total num frames: 529186816. Throughput: 0: 3560.6. Samples: 121464292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:08,969][134211] Avg episode reward: [(0, '9.063')] [2025-01-04 06:55:11,785][134294] Updated weights for policy 0, policy_version 129204 (0.0028) [2025-01-04 06:55:13,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13585.3, 300 sec: 14301.3). Total num frames: 529244160. Throughput: 0: 3538.3. Samples: 121482350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:13,968][134211] Avg episode reward: [(0, '8.335')] [2025-01-04 06:55:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000129210_529244160.pth... 
[2025-01-04 06:55:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000128372_525811712.pth [2025-01-04 06:55:15,230][134294] Updated weights for policy 0, policy_version 129214 (0.0027) [2025-01-04 06:55:17,516][134294] Updated weights for policy 0, policy_version 129224 (0.0013) [2025-01-04 06:55:18,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14063.0, 300 sec: 14384.6). Total num frames: 529330176. Throughput: 0: 3542.5. Samples: 121492094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:18,968][134211] Avg episode reward: [(0, '8.348')] [2025-01-04 06:55:19,324][134294] Updated weights for policy 0, policy_version 129234 (0.0014) [2025-01-04 06:55:21,246][134294] Updated weights for policy 0, policy_version 129244 (0.0014) [2025-01-04 06:55:23,117][134294] Updated weights for policy 0, policy_version 129254 (0.0013) [2025-01-04 06:55:23,968][134211] Fps is (10 sec: 19661.4, 60 sec: 14950.5, 300 sec: 14537.3). Total num frames: 529440768. Throughput: 0: 3818.9. Samples: 121524212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:23,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 06:55:25,033][134294] Updated weights for policy 0, policy_version 129264 (0.0013) [2025-01-04 06:55:28,002][134294] Updated weights for policy 0, policy_version 129274 (0.0026) [2025-01-04 06:55:28,968][134211] Fps is (10 sec: 18431.8, 60 sec: 15086.9, 300 sec: 14467.9). Total num frames: 529514496. Throughput: 0: 3951.5. Samples: 121550222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:28,968][134211] Avg episode reward: [(0, '7.705')] [2025-01-04 06:55:31,642][134294] Updated weights for policy 0, policy_version 129284 (0.0027) [2025-01-04 06:55:33,968][134211] Fps is (10 sec: 13516.2, 60 sec: 15086.9, 300 sec: 14342.9). Total num frames: 529575936. Throughput: 0: 3832.8. Samples: 121558768. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:33,969][134211] Avg episode reward: [(0, '8.698')] [2025-01-04 06:55:34,802][134294] Updated weights for policy 0, policy_version 129294 (0.0028) [2025-01-04 06:55:37,863][134294] Updated weights for policy 0, policy_version 129304 (0.0025) [2025-01-04 06:55:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15018.8, 300 sec: 14356.8). Total num frames: 529641472. Throughput: 0: 3583.1. Samples: 121578180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:38,968][134211] Avg episode reward: [(0, '7.877')] [2025-01-04 06:55:41,095][134294] Updated weights for policy 0, policy_version 129314 (0.0025) [2025-01-04 06:55:43,971][134211] Fps is (10 sec: 12694.1, 60 sec: 14949.6, 300 sec: 14370.6). Total num frames: 529702912. Throughput: 0: 3573.4. Samples: 121596776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:43,971][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 06:55:44,604][134294] Updated weights for policy 0, policy_version 129324 (0.0024) [2025-01-04 06:55:47,974][134294] Updated weights for policy 0, policy_version 129334 (0.0027) [2025-01-04 06:55:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 529760256. Throughput: 0: 3568.2. Samples: 121605546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:48,968][134211] Avg episode reward: [(0, '8.185')] [2025-01-04 06:55:50,993][134294] Updated weights for policy 0, policy_version 129344 (0.0024) [2025-01-04 06:55:53,968][134211] Fps is (10 sec: 12701.4, 60 sec: 13926.4, 300 sec: 14342.9). Total num frames: 529829888. Throughput: 0: 3579.4. Samples: 121625364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:53,969][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 06:55:54,141][134294] Updated weights for policy 0, policy_version 129354 (0.0027) [2025-01-04 06:55:57,027][134294] Updated weights for policy 0, policy_version 129364 (0.0022) [2025-01-04 06:55:58,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14267.8, 300 sec: 14398.5). Total num frames: 529911808. Throughput: 0: 3667.4. Samples: 121647380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:55:58,968][134211] Avg episode reward: [(0, '8.134')] [2025-01-04 06:55:59,123][134294] Updated weights for policy 0, policy_version 129374 (0.0014) [2025-01-04 06:56:01,948][134294] Updated weights for policy 0, policy_version 129384 (0.0027) [2025-01-04 06:56:03,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14267.7, 300 sec: 14398.5). Total num frames: 529977344. Throughput: 0: 3727.3. Samples: 121659824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:03,968][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 06:56:05,273][134294] Updated weights for policy 0, policy_version 129394 (0.0030) [2025-01-04 06:56:08,012][134294] Updated weights for policy 0, policy_version 129404 (0.0022) [2025-01-04 06:56:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.6, 300 sec: 14384.6). Total num frames: 530055168. Throughput: 0: 3442.3. Samples: 121679116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:08,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 06:56:10,386][134294] Updated weights for policy 0, policy_version 129414 (0.0020) [2025-01-04 06:56:13,706][134294] Updated weights for policy 0, policy_version 129424 (0.0025) [2025-01-04 06:56:13,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14677.3, 300 sec: 14370.7). Total num frames: 530124800. Throughput: 0: 3374.9. Samples: 121702094. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:13,969][134211] Avg episode reward: [(0, '8.389')] [2025-01-04 06:56:16,567][134294] Updated weights for policy 0, policy_version 129434 (0.0020) [2025-01-04 06:56:18,711][134294] Updated weights for policy 0, policy_version 129444 (0.0016) [2025-01-04 06:56:18,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 530206720. Throughput: 0: 3398.7. Samples: 121711708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:18,968][134211] Avg episode reward: [(0, '7.534')] [2025-01-04 06:56:20,815][134294] Updated weights for policy 0, policy_version 129454 (0.0015) [2025-01-04 06:56:22,725][134294] Updated weights for policy 0, policy_version 129464 (0.0015) [2025-01-04 06:56:23,967][134211] Fps is (10 sec: 18432.9, 60 sec: 14472.6, 300 sec: 14509.6). Total num frames: 530309120. Throughput: 0: 3627.0. Samples: 121741394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:23,968][134211] Avg episode reward: [(0, '8.308')] [2025-01-04 06:56:24,677][134294] Updated weights for policy 0, policy_version 129474 (0.0014) [2025-01-04 06:56:27,646][134294] Updated weights for policy 0, policy_version 129484 (0.0026) [2025-01-04 06:56:28,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14404.3, 300 sec: 14481.8). Total num frames: 530378752. Throughput: 0: 3771.6. Samples: 121766484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:28,968][134211] Avg episode reward: [(0, '8.520')] [2025-01-04 06:56:30,853][134294] Updated weights for policy 0, policy_version 129494 (0.0025) [2025-01-04 06:56:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.6, 300 sec: 14495.7). Total num frames: 530444288. Throughput: 0: 3792.3. Samples: 121776200. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:33,968][134211] Avg episode reward: [(0, '8.631')] [2025-01-04 06:56:34,041][134294] Updated weights for policy 0, policy_version 129504 (0.0025) [2025-01-04 06:56:37,323][134294] Updated weights for policy 0, policy_version 129514 (0.0026) [2025-01-04 06:56:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14481.8). Total num frames: 530509824. Throughput: 0: 3774.1. Samples: 121795198. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:38,968][134211] Avg episode reward: [(0, '7.444')] [2025-01-04 06:56:40,585][134294] Updated weights for policy 0, policy_version 129524 (0.0030) [2025-01-04 06:56:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14405.0, 300 sec: 14342.9). Total num frames: 530567168. Throughput: 0: 3690.8. Samples: 121813468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:43,968][134211] Avg episode reward: [(0, '8.346')] [2025-01-04 06:56:44,055][134294] Updated weights for policy 0, policy_version 129534 (0.0024) [2025-01-04 06:56:47,445][134294] Updated weights for policy 0, policy_version 129544 (0.0024) [2025-01-04 06:56:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14472.5, 300 sec: 14329.1). Total num frames: 530628608. Throughput: 0: 3614.8. Samples: 121822490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:48,968][134211] Avg episode reward: [(0, '8.261')] [2025-01-04 06:56:50,626][134294] Updated weights for policy 0, policy_version 129554 (0.0028) [2025-01-04 06:56:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14343.0). Total num frames: 530690048. Throughput: 0: 3601.5. Samples: 121841184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:53,968][134211] Avg episode reward: [(0, '8.938')] [2025-01-04 06:56:54,043][134294] Updated weights for policy 0, policy_version 129564 (0.0026) [2025-01-04 06:56:57,004][134294] Updated weights for policy 0, policy_version 129574 (0.0023) [2025-01-04 06:56:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14370.7). Total num frames: 530759680. Throughput: 0: 3524.7. Samples: 121860702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:56:58,968][134211] Avg episode reward: [(0, '8.092')] [2025-01-04 06:57:00,096][134294] Updated weights for policy 0, policy_version 129584 (0.0028) [2025-01-04 06:57:03,349][134294] Updated weights for policy 0, policy_version 129594 (0.0022) [2025-01-04 06:57:03,969][134211] Fps is (10 sec: 13105.5, 60 sec: 14062.6, 300 sec: 14287.3). Total num frames: 530821120. Throughput: 0: 3542.5. Samples: 121871124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:03,970][134211] Avg episode reward: [(0, '8.706')] [2025-01-04 06:57:06,555][134294] Updated weights for policy 0, policy_version 129604 (0.0023) [2025-01-04 06:57:08,540][134294] Updated weights for policy 0, policy_version 129614 (0.0013) [2025-01-04 06:57:08,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14199.5, 300 sec: 14398.5). Total num frames: 530907136. Throughput: 0: 3323.3. Samples: 121890944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:08,968][134211] Avg episode reward: [(0, '8.038')] [2025-01-04 06:57:10,603][134294] Updated weights for policy 0, policy_version 129624 (0.0014) [2025-01-04 06:57:12,618][134294] Updated weights for policy 0, policy_version 129634 (0.0014) [2025-01-04 06:57:13,968][134211] Fps is (10 sec: 18434.6, 60 sec: 14677.4, 300 sec: 14467.9). Total num frames: 531005440. Throughput: 0: 3443.1. Samples: 121921422. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:13,968][134211] Avg episode reward: [(0, '7.708')] [2025-01-04 06:57:14,026][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000129641_531009536.pth... [2025-01-04 06:57:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000128790_527523840.pth [2025-01-04 06:57:14,654][134294] Updated weights for policy 0, policy_version 129644 (0.0013) [2025-01-04 06:57:16,637][134294] Updated weights for policy 0, policy_version 129654 (0.0015) [2025-01-04 06:57:18,968][134211] Fps is (10 sec: 19250.7, 60 sec: 14882.1, 300 sec: 14412.4). Total num frames: 531099648. Throughput: 0: 3564.6. Samples: 121936606. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:18,968][134211] Avg episode reward: [(0, '7.613')] [2025-01-04 06:57:19,202][134294] Updated weights for policy 0, policy_version 129664 (0.0021) [2025-01-04 06:57:22,474][134294] Updated weights for policy 0, policy_version 129674 (0.0029) [2025-01-04 06:57:23,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14199.4, 300 sec: 14273.5). Total num frames: 531161088. Throughput: 0: 3626.2. Samples: 121958378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:23,968][134211] Avg episode reward: [(0, '8.509')] [2025-01-04 06:57:25,676][134294] Updated weights for policy 0, policy_version 129684 (0.0026) [2025-01-04 06:57:28,717][134294] Updated weights for policy 0, policy_version 129694 (0.0029) [2025-01-04 06:57:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14287.4). Total num frames: 531226624. Throughput: 0: 3658.1. Samples: 121978080. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:28,968][134211] Avg episode reward: [(0, '8.136')] [2025-01-04 06:57:31,833][134294] Updated weights for policy 0, policy_version 129704 (0.0026) [2025-01-04 06:57:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14131.2, 300 sec: 14301.3). Total num frames: 531292160. Throughput: 0: 3681.9. Samples: 121988176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:33,969][134211] Avg episode reward: [(0, '8.109')] [2025-01-04 06:57:35,325][134294] Updated weights for policy 0, policy_version 129714 (0.0031) [2025-01-04 06:57:38,687][134294] Updated weights for policy 0, policy_version 129724 (0.0029) [2025-01-04 06:57:38,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13994.7, 300 sec: 14287.4). Total num frames: 531349504. Throughput: 0: 3663.6. Samples: 122006046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:38,968][134211] Avg episode reward: [(0, '7.783')] [2025-01-04 06:57:42,169][134294] Updated weights for policy 0, policy_version 129734 (0.0027) [2025-01-04 06:57:43,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14063.0, 300 sec: 14273.5). Total num frames: 531410944. Throughput: 0: 3626.9. Samples: 122023914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:43,968][134211] Avg episode reward: [(0, '8.219')] [2025-01-04 06:57:45,476][134294] Updated weights for policy 0, policy_version 129744 (0.0028) [2025-01-04 06:57:48,540][134294] Updated weights for policy 0, policy_version 129754 (0.0022) [2025-01-04 06:57:48,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14199.5, 300 sec: 14287.4). Total num frames: 531480576. Throughput: 0: 3605.7. Samples: 122033374. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:48,968][134211] Avg episode reward: [(0, '7.707')] [2025-01-04 06:57:50,432][134294] Updated weights for policy 0, policy_version 129764 (0.0013) [2025-01-04 06:57:52,306][134294] Updated weights for policy 0, policy_version 129774 (0.0013) [2025-01-04 06:57:53,967][134211] Fps is (10 sec: 17613.1, 60 sec: 14950.5, 300 sec: 14426.3). Total num frames: 531587072. Throughput: 0: 3781.2. Samples: 122061098. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:53,968][134211] Avg episode reward: [(0, '8.355')] [2025-01-04 06:57:54,187][134294] Updated weights for policy 0, policy_version 129784 (0.0014) [2025-01-04 06:57:56,134][134294] Updated weights for policy 0, policy_version 129794 (0.0014) [2025-01-04 06:57:58,884][134294] Updated weights for policy 0, policy_version 129804 (0.0025) [2025-01-04 06:57:58,968][134211] Fps is (10 sec: 19660.4, 60 sec: 15291.7, 300 sec: 14509.6). Total num frames: 531677184. Throughput: 0: 3749.4. Samples: 122090144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 06:57:58,968][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 06:58:02,303][134294] Updated weights for policy 0, policy_version 129814 (0.0026) [2025-01-04 06:58:03,969][134211] Fps is (10 sec: 15152.8, 60 sec: 15291.7, 300 sec: 14481.7). Total num frames: 531738624. Throughput: 0: 3613.5. Samples: 122099220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:03,970][134211] Avg episode reward: [(0, '9.362')] [2025-01-04 06:58:05,492][134294] Updated weights for policy 0, policy_version 129824 (0.0028) [2025-01-04 06:58:08,556][134294] Updated weights for policy 0, policy_version 129834 (0.0025) [2025-01-04 06:58:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14950.3, 300 sec: 14509.6). Total num frames: 531804160. Throughput: 0: 3559.2. Samples: 122118542. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:08,969][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 06:58:11,999][134294] Updated weights for policy 0, policy_version 129844 (0.0028) [2025-01-04 06:58:13,968][134211] Fps is (10 sec: 12289.7, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 531861504. Throughput: 0: 3527.8. Samples: 122136830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:13,969][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 06:58:15,474][134294] Updated weights for policy 0, policy_version 129854 (0.0029) [2025-01-04 06:58:18,957][134294] Updated weights for policy 0, policy_version 129864 (0.0030) [2025-01-04 06:58:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13721.6, 300 sec: 14467.9). Total num frames: 531922944. Throughput: 0: 3501.2. Samples: 122145730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:18,968][134211] Avg episode reward: [(0, '8.393')] [2025-01-04 06:58:22,285][134294] Updated weights for policy 0, policy_version 129874 (0.0026) [2025-01-04 06:58:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13653.3, 300 sec: 14384.6). Total num frames: 531980288. Throughput: 0: 3505.8. Samples: 122163808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:23,968][134211] Avg episode reward: [(0, '8.518')] [2025-01-04 06:58:25,500][134294] Updated weights for policy 0, policy_version 129884 (0.0023) [2025-01-04 06:58:27,368][134294] Updated weights for policy 0, policy_version 129894 (0.0015) [2025-01-04 06:58:28,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.5, 300 sec: 14356.8). Total num frames: 532078592. Throughput: 0: 3675.3. Samples: 122189304. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:28,968][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 06:58:29,170][134294] Updated weights for policy 0, policy_version 129904 (0.0015) [2025-01-04 06:58:31,081][134294] Updated weights for policy 0, policy_version 129914 (0.0015) [2025-01-04 06:58:33,674][134294] Updated weights for policy 0, policy_version 129924 (0.0022) [2025-01-04 06:58:33,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14609.0, 300 sec: 14412.4). Total num frames: 532168704. Throughput: 0: 3828.5. Samples: 122205658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:33,969][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 06:58:36,921][134294] Updated weights for policy 0, policy_version 129934 (0.0028) [2025-01-04 06:58:38,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14745.6, 300 sec: 14426.3). Total num frames: 532234240. Throughput: 0: 3675.0. Samples: 122226474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:38,968][134211] Avg episode reward: [(0, '7.908')] [2025-01-04 06:58:40,354][134294] Updated weights for policy 0, policy_version 129944 (0.0028) [2025-01-04 06:58:43,498][134294] Updated weights for policy 0, policy_version 129954 (0.0026) [2025-01-04 06:58:43,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14745.6, 300 sec: 14426.3). Total num frames: 532295680. Throughput: 0: 3442.0. Samples: 122245032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:43,970][134211] Avg episode reward: [(0, '8.448')] [2025-01-04 06:58:46,871][134294] Updated weights for policy 0, policy_version 129964 (0.0027) [2025-01-04 06:58:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 532357120. Throughput: 0: 3444.1. Samples: 122254200. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:48,968][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 06:58:50,010][134294] Updated weights for policy 0, policy_version 129974 (0.0026) [2025-01-04 06:58:53,045][134294] Updated weights for policy 0, policy_version 129984 (0.0026) [2025-01-04 06:58:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 14412.4). Total num frames: 532422656. Throughput: 0: 3450.9. Samples: 122273830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:53,968][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 06:58:56,004][134294] Updated weights for policy 0, policy_version 129994 (0.0025) [2025-01-04 06:58:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.1, 300 sec: 14426.2). Total num frames: 532492288. Throughput: 0: 3491.9. Samples: 122293966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 06:58:58,968][134211] Avg episode reward: [(0, '7.805')] [2025-01-04 06:58:59,248][134294] Updated weights for policy 0, policy_version 130004 (0.0026) [2025-01-04 06:59:02,206][134294] Updated weights for policy 0, policy_version 130014 (0.0025) [2025-01-04 06:59:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13858.5, 300 sec: 14315.2). Total num frames: 532570112. Throughput: 0: 3514.2. Samples: 122303868. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:03,968][134211] Avg episode reward: [(0, '8.963')] [2025-01-04 06:59:04,276][134294] Updated weights for policy 0, policy_version 130024 (0.0015) [2025-01-04 06:59:06,214][134294] Updated weights for policy 0, policy_version 130034 (0.0015) [2025-01-04 06:59:08,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14199.5, 300 sec: 14329.1). Total num frames: 532656128. Throughput: 0: 3736.4. Samples: 122331944. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:08,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 06:59:09,107][134294] Updated weights for policy 0, policy_version 130044 (0.0024) [2025-01-04 06:59:12,331][134294] Updated weights for policy 0, policy_version 130054 (0.0028) [2025-01-04 06:59:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14267.7, 300 sec: 14342.9). Total num frames: 532717568. Throughput: 0: 3591.6. Samples: 122350928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:13,968][134211] Avg episode reward: [(0, '7.897')] [2025-01-04 06:59:14,026][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130059_532721664.pth... [2025-01-04 06:59:14,100][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000129210_529244160.pth [2025-01-04 06:59:15,714][134294] Updated weights for policy 0, policy_version 130064 (0.0026) [2025-01-04 06:59:18,930][134294] Updated weights for policy 0, policy_version 130074 (0.0027) [2025-01-04 06:59:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 532783104. Throughput: 0: 3430.1. Samples: 122360014. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:18,968][134211] Avg episode reward: [(0, '7.768')] [2025-01-04 06:59:20,969][134294] Updated weights for policy 0, policy_version 130084 (0.0013) [2025-01-04 06:59:22,814][134294] Updated weights for policy 0, policy_version 130094 (0.0012) [2025-01-04 06:59:23,968][134211] Fps is (10 sec: 16793.9, 60 sec: 15087.0, 300 sec: 14495.7). Total num frames: 532885504. Throughput: 0: 3542.8. Samples: 122385900. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:23,968][134211] Avg episode reward: [(0, '8.141')] [2025-01-04 06:59:25,344][134294] Updated weights for policy 0, policy_version 130104 (0.0021) [2025-01-04 06:59:28,403][134294] Updated weights for policy 0, policy_version 130114 (0.0028) [2025-01-04 06:59:28,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14540.8, 300 sec: 14509.6). Total num frames: 532951040. Throughput: 0: 3650.0. Samples: 122409282. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:28,968][134211] Avg episode reward: [(0, '8.240')] [2025-01-04 06:59:31,414][134294] Updated weights for policy 0, policy_version 130124 (0.0027) [2025-01-04 06:59:33,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14199.5, 300 sec: 14509.6). Total num frames: 533020672. Throughput: 0: 3671.0. Samples: 122419394. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:33,970][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 06:59:34,729][134294] Updated weights for policy 0, policy_version 130134 (0.0024) [2025-01-04 06:59:37,878][134294] Updated weights for policy 0, policy_version 130144 (0.0026) [2025-01-04 06:59:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14495.7). Total num frames: 533082112. Throughput: 0: 3662.0. Samples: 122438622. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:38,968][134211] Avg episode reward: [(0, '8.565')] [2025-01-04 06:59:41,305][134294] Updated weights for policy 0, policy_version 130154 (0.0028) [2025-01-04 06:59:43,968][134211] Fps is (10 sec: 11878.7, 60 sec: 14063.0, 300 sec: 14342.9). Total num frames: 533139456. Throughput: 0: 3601.5. Samples: 122456034. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:43,968][134211] Avg episode reward: [(0, '7.648')] [2025-01-04 06:59:44,456][134294] Updated weights for policy 0, policy_version 130164 (0.0024) [2025-01-04 06:59:46,857][134294] Updated weights for policy 0, policy_version 130174 (0.0018) [2025-01-04 06:59:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14335.9, 300 sec: 14315.2). Total num frames: 533217280. Throughput: 0: 3671.5. Samples: 122469088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:48,969][134211] Avg episode reward: [(0, '8.661')] [2025-01-04 06:59:50,186][134294] Updated weights for policy 0, policy_version 130184 (0.0027) [2025-01-04 06:59:53,137][134294] Updated weights for policy 0, policy_version 130194 (0.0024) [2025-01-04 06:59:53,967][134211] Fps is (10 sec: 14336.1, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 533282816. Throughput: 0: 3477.9. Samples: 122488448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:53,968][134211] Avg episode reward: [(0, '8.256')] [2025-01-04 06:59:55,370][134294] Updated weights for policy 0, policy_version 130204 (0.0015) [2025-01-04 06:59:57,300][134294] Updated weights for policy 0, policy_version 130214 (0.0012) [2025-01-04 06:59:58,967][134211] Fps is (10 sec: 17203.8, 60 sec: 14950.4, 300 sec: 14467.9). Total num frames: 533389312. Throughput: 0: 3694.6. Samples: 122517184. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 06:59:58,968][134211] Avg episode reward: [(0, '8.113')] [2025-01-04 06:59:59,243][134294] Updated weights for policy 0, policy_version 130224 (0.0012) [2025-01-04 07:00:01,113][134294] Updated weights for policy 0, policy_version 130234 (0.0014) [2025-01-04 07:00:03,851][134294] Updated weights for policy 0, policy_version 130244 (0.0025) [2025-01-04 07:00:03,968][134211] Fps is (10 sec: 19660.3, 60 sec: 15155.2, 300 sec: 14551.2). Total num frames: 533479424. Throughput: 0: 3846.8. Samples: 122533120. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:00:03,968][134211] Avg episode reward: [(0, '7.988')] [2025-01-04 07:00:07,095][134294] Updated weights for policy 0, policy_version 130254 (0.0026) [2025-01-04 07:00:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 533540864. Throughput: 0: 3721.3. Samples: 122553358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:08,968][134211] Avg episode reward: [(0, '8.360')] [2025-01-04 07:00:10,590][134294] Updated weights for policy 0, policy_version 130264 (0.0030) [2025-01-04 07:00:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14677.3, 300 sec: 14467.9). Total num frames: 533598208. Throughput: 0: 3591.0. Samples: 122570878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:13,969][134211] Avg episode reward: [(0, '8.356')] [2025-01-04 07:00:14,202][134294] Updated weights for policy 0, policy_version 130274 (0.0026) [2025-01-04 07:00:17,758][134294] Updated weights for policy 0, policy_version 130284 (0.0027) [2025-01-04 07:00:18,968][134211] Fps is (10 sec: 11468.1, 60 sec: 14540.7, 300 sec: 14287.4). Total num frames: 533655552. Throughput: 0: 3558.5. Samples: 122579530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:18,969][134211] Avg episode reward: [(0, '8.075')] [2025-01-04 07:00:21,353][134294] Updated weights for policy 0, policy_version 130294 (0.0027) [2025-01-04 07:00:23,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13789.8, 300 sec: 14231.9). Total num frames: 533712896. Throughput: 0: 3516.5. Samples: 122596864. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:23,968][134211] Avg episode reward: [(0, '7.799')] [2025-01-04 07:00:24,850][134294] Updated weights for policy 0, policy_version 130304 (0.0025) [2025-01-04 07:00:26,981][134294] Updated weights for policy 0, policy_version 130314 (0.0015) [2025-01-04 07:00:28,960][134294] Updated weights for policy 0, policy_version 130324 (0.0012) [2025-01-04 07:00:28,968][134211] Fps is (10 sec: 15156.3, 60 sec: 14267.8, 300 sec: 14343.0). Total num frames: 533807104. Throughput: 0: 3661.9. Samples: 122620820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:28,968][134211] Avg episode reward: [(0, '7.629')] [2025-01-04 07:00:30,883][134294] Updated weights for policy 0, policy_version 130334 (0.0014) [2025-01-04 07:00:32,757][134294] Updated weights for policy 0, policy_version 130344 (0.0012) [2025-01-04 07:00:33,968][134211] Fps is (10 sec: 19660.7, 60 sec: 14813.9, 300 sec: 14467.9). Total num frames: 533909504. Throughput: 0: 3731.5. Samples: 122637006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:33,968][134211] Avg episode reward: [(0, '8.148')] [2025-01-04 07:00:35,380][134294] Updated weights for policy 0, policy_version 130354 (0.0022) [2025-01-04 07:00:38,833][134294] Updated weights for policy 0, policy_version 130364 (0.0029) [2025-01-04 07:00:38,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14882.1, 300 sec: 14481.9). Total num frames: 533975040. Throughput: 0: 3843.8. Samples: 122661420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:38,968][134211] Avg episode reward: [(0, '8.343')] [2025-01-04 07:00:42,066][134294] Updated weights for policy 0, policy_version 130374 (0.0029) [2025-01-04 07:00:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14882.1, 300 sec: 14481.8). Total num frames: 534032384. Throughput: 0: 3597.5. Samples: 122679074. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:43,969][134211] Avg episode reward: [(0, '7.851')] [2025-01-04 07:00:45,457][134294] Updated weights for policy 0, policy_version 130384 (0.0027) [2025-01-04 07:00:48,692][134294] Updated weights for policy 0, policy_version 130394 (0.0026) [2025-01-04 07:00:48,968][134211] Fps is (10 sec: 11877.6, 60 sec: 14608.9, 300 sec: 14454.0). Total num frames: 534093824. Throughput: 0: 3454.1. Samples: 122688556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:48,969][134211] Avg episode reward: [(0, '10.172')] [2025-01-04 07:00:51,622][134294] Updated weights for policy 0, policy_version 130404 (0.0025) [2025-01-04 07:00:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14412.4). Total num frames: 534163456. Throughput: 0: 3448.5. Samples: 122708540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:53,968][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 07:00:54,835][134294] Updated weights for policy 0, policy_version 130414 (0.0024) [2025-01-04 07:00:57,954][134294] Updated weights for policy 0, policy_version 130424 (0.0025) [2025-01-04 07:00:58,969][134211] Fps is (10 sec: 13516.5, 60 sec: 13994.4, 300 sec: 14412.3). Total num frames: 534228992. Throughput: 0: 3496.4. Samples: 122728218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:00:58,969][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 07:01:00,976][134294] Updated weights for policy 0, policy_version 130434 (0.0024) [2025-01-04 07:01:03,920][134294] Updated weights for policy 0, policy_version 130444 (0.0025) [2025-01-04 07:01:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13653.4, 300 sec: 14384.6). Total num frames: 534298624. Throughput: 0: 3528.9. Samples: 122738330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:03,969][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 07:01:07,001][134294] Updated weights for policy 0, policy_version 130454 (0.0024) [2025-01-04 07:01:08,968][134211] Fps is (10 sec: 13108.4, 60 sec: 13653.3, 300 sec: 14356.8). Total num frames: 534360064. Throughput: 0: 3595.6. Samples: 122758668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:08,968][134211] Avg episode reward: [(0, '7.943')] [2025-01-04 07:01:10,213][134294] Updated weights for policy 0, policy_version 130464 (0.0024) [2025-01-04 07:01:13,550][134294] Updated weights for policy 0, policy_version 130474 (0.0028) [2025-01-04 07:01:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13858.2, 300 sec: 14315.2). Total num frames: 534429696. Throughput: 0: 3482.6. Samples: 122777538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:13,968][134211] Avg episode reward: [(0, '8.577')] [2025-01-04 07:01:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130476_534429696.pth... [2025-01-04 07:01:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000129641_531009536.pth [2025-01-04 07:01:15,544][134294] Updated weights for policy 0, policy_version 130484 (0.0015) [2025-01-04 07:01:17,592][134294] Updated weights for policy 0, policy_version 130494 (0.0015) [2025-01-04 07:01:18,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14609.3, 300 sec: 14315.2). Total num frames: 534532096. Throughput: 0: 3441.9. Samples: 122791892. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:18,968][134211] Avg episode reward: [(0, '7.999')] [2025-01-04 07:01:19,537][134294] Updated weights for policy 0, policy_version 130504 (0.0013) [2025-01-04 07:01:21,645][134294] Updated weights for policy 0, policy_version 130514 (0.0018) [2025-01-04 07:01:23,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15018.6, 300 sec: 14356.8). Total num frames: 534614016. Throughput: 0: 3557.2. Samples: 122821496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:23,969][134211] Avg episode reward: [(0, '8.119')] [2025-01-04 07:01:24,678][134294] Updated weights for policy 0, policy_version 130524 (0.0028) [2025-01-04 07:01:27,953][134294] Updated weights for policy 0, policy_version 130534 (0.0027) [2025-01-04 07:01:28,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 534679552. Throughput: 0: 3592.1. Samples: 122840720. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:28,968][134211] Avg episode reward: [(0, '7.658')] [2025-01-04 07:01:31,014][134294] Updated weights for policy 0, policy_version 130544 (0.0028) [2025-01-04 07:01:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.4, 300 sec: 14356.8). Total num frames: 534745088. Throughput: 0: 3605.0. Samples: 122850778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:33,968][134211] Avg episode reward: [(0, '7.930')] [2025-01-04 07:01:34,172][134294] Updated weights for policy 0, policy_version 130554 (0.0027) [2025-01-04 07:01:37,189][134294] Updated weights for policy 0, policy_version 130564 (0.0026) [2025-01-04 07:01:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 14384.6). Total num frames: 534810624. Throughput: 0: 3600.7. Samples: 122870570. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:38,968][134211] Avg episode reward: [(0, '7.739')] [2025-01-04 07:01:40,482][134294] Updated weights for policy 0, policy_version 130574 (0.0025) [2025-01-04 07:01:43,766][134294] Updated weights for policy 0, policy_version 130584 (0.0024) [2025-01-04 07:01:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.7, 300 sec: 14384.6). Total num frames: 534872064. Throughput: 0: 3578.8. Samples: 122889260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:43,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 07:01:47,178][134294] Updated weights for policy 0, policy_version 130594 (0.0028) [2025-01-04 07:01:48,967][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.7, 300 sec: 14426.3). Total num frames: 534945792. Throughput: 0: 3555.7. Samples: 122898338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:48,968][134211] Avg episode reward: [(0, '8.667')] [2025-01-04 07:01:49,128][134294] Updated weights for policy 0, policy_version 130604 (0.0016) [2025-01-04 07:01:51,044][134294] Updated weights for policy 0, policy_version 130614 (0.0014) [2025-01-04 07:01:53,054][134294] Updated weights for policy 0, policy_version 130624 (0.0014) [2025-01-04 07:01:53,967][134211] Fps is (10 sec: 18023.0, 60 sec: 14813.9, 300 sec: 14551.2). Total num frames: 535052288. Throughput: 0: 3763.6. Samples: 122928028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:53,968][134211] Avg episode reward: [(0, '7.688')] [2025-01-04 07:01:54,924][134294] Updated weights for policy 0, policy_version 130634 (0.0014) [2025-01-04 07:01:56,809][134294] Updated weights for policy 0, policy_version 130644 (0.0014) [2025-01-04 07:01:58,661][134294] Updated weights for policy 0, policy_version 130654 (0.0014) [2025-01-04 07:01:58,968][134211] Fps is (10 sec: 21708.7, 60 sec: 15565.1, 300 sec: 14717.9). Total num frames: 535162880. Throughput: 0: 4059.2. Samples: 122960202. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:01:58,968][134211] Avg episode reward: [(0, '7.604')] [2025-01-04 07:02:00,714][134294] Updated weights for policy 0, policy_version 130664 (0.0014) [2025-01-04 07:02:03,900][134294] Updated weights for policy 0, policy_version 130674 (0.0030) [2025-01-04 07:02:03,970][134211] Fps is (10 sec: 18837.1, 60 sec: 15700.7, 300 sec: 14689.9). Total num frames: 535240704. Throughput: 0: 4065.9. Samples: 122974868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:02:03,971][134211] Avg episode reward: [(0, '8.106')] [2025-01-04 07:02:07,168][134294] Updated weights for policy 0, policy_version 130684 (0.0027) [2025-01-04 07:02:08,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15701.3, 300 sec: 14565.1). Total num frames: 535302144. Throughput: 0: 3827.9. Samples: 122993750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:02:08,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 07:02:10,741][134294] Updated weights for policy 0, policy_version 130694 (0.0028) [2025-01-04 07:02:13,968][134211] Fps is (10 sec: 11880.9, 60 sec: 15496.5, 300 sec: 14440.1). Total num frames: 535359488. Throughput: 0: 3790.0. Samples: 123011270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:13,972][134211] Avg episode reward: [(0, '8.421')] [2025-01-04 07:02:14,207][134294] Updated weights for policy 0, policy_version 130704 (0.0029) [2025-01-04 07:02:17,791][134294] Updated weights for policy 0, policy_version 130714 (0.0025) [2025-01-04 07:02:18,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14745.5, 300 sec: 14426.2). Total num frames: 535416832. Throughput: 0: 3752.5. Samples: 123019642. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:18,968][134211] Avg episode reward: [(0, '7.606')] [2025-01-04 07:02:21,186][134294] Updated weights for policy 0, policy_version 130724 (0.0022) [2025-01-04 07:02:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 535478272. Throughput: 0: 3706.3. Samples: 123037354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:23,968][134211] Avg episode reward: [(0, '8.208')] [2025-01-04 07:02:24,668][134294] Updated weights for policy 0, policy_version 130734 (0.0024) [2025-01-04 07:02:27,605][134294] Updated weights for policy 0, policy_version 130744 (0.0024) [2025-01-04 07:02:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 535543808. Throughput: 0: 3721.9. Samples: 123056746. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:28,968][134211] Avg episode reward: [(0, '7.727')] [2025-01-04 07:02:30,714][134294] Updated weights for policy 0, policy_version 130754 (0.0023) [2025-01-04 07:02:33,599][134294] Updated weights for policy 0, policy_version 130764 (0.0025) [2025-01-04 07:02:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 535613440. Throughput: 0: 3750.1. Samples: 123067092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:33,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 07:02:36,519][134294] Updated weights for policy 0, policy_version 130774 (0.0025) [2025-01-04 07:02:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14540.8, 300 sec: 14481.8). Total num frames: 535683072. Throughput: 0: 3554.5. Samples: 123087980. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:38,968][134211] Avg episode reward: [(0, '7.655')] [2025-01-04 07:02:39,727][134294] Updated weights for policy 0, policy_version 130784 (0.0026) [2025-01-04 07:02:42,900][134294] Updated weights for policy 0, policy_version 130794 (0.0025) [2025-01-04 07:02:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 535744512. Throughput: 0: 3264.1. Samples: 123107088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:43,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 07:02:46,077][134294] Updated weights for policy 0, policy_version 130804 (0.0027) [2025-01-04 07:02:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 535805952. Throughput: 0: 3155.7. Samples: 123116866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:48,968][134211] Avg episode reward: [(0, '8.459')] [2025-01-04 07:02:49,301][134294] Updated weights for policy 0, policy_version 130814 (0.0027) [2025-01-04 07:02:51,266][134294] Updated weights for policy 0, policy_version 130824 (0.0013) [2025-01-04 07:02:53,132][134294] Updated weights for policy 0, policy_version 130834 (0.0015) [2025-01-04 07:02:53,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14336.0, 300 sec: 14356.8). Total num frames: 535912448. Throughput: 0: 3295.3. Samples: 123142038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:53,968][134211] Avg episode reward: [(0, '8.796')] [2025-01-04 07:02:55,022][134294] Updated weights for policy 0, policy_version 130844 (0.0013) [2025-01-04 07:02:56,949][134294] Updated weights for policy 0, policy_version 130854 (0.0013) [2025-01-04 07:02:58,808][134294] Updated weights for policy 0, policy_version 130864 (0.0015) [2025-01-04 07:02:58,968][134211] Fps is (10 sec: 21299.0, 60 sec: 14267.7, 300 sec: 14509.6). Total num frames: 536018944. Throughput: 0: 3627.4. Samples: 123174504. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:02:58,968][134211] Avg episode reward: [(0, '8.140')] [2025-01-04 07:03:01,569][134294] Updated weights for policy 0, policy_version 130874 (0.0022) [2025-01-04 07:03:03,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14131.7, 300 sec: 14523.4). Total num frames: 536088576. Throughput: 0: 3714.4. Samples: 123186788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:03:03,969][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 07:03:04,833][134294] Updated weights for policy 0, policy_version 130884 (0.0029) [2025-01-04 07:03:07,963][134294] Updated weights for policy 0, policy_version 130894 (0.0028) [2025-01-04 07:03:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14537.3). Total num frames: 536150016. Throughput: 0: 3750.2. Samples: 123206112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:03:08,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 07:03:11,520][134294] Updated weights for policy 0, policy_version 130904 (0.0026) [2025-01-04 07:03:13,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14131.2, 300 sec: 14523.4). Total num frames: 536207360. Throughput: 0: 3708.1. Samples: 123223612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:13,969][134211] Avg episode reward: [(0, '8.195')] [2025-01-04 07:03:14,020][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130911_536211456.pth... [2025-01-04 07:03:14,098][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130059_532721664.pth [2025-01-04 07:03:15,205][134294] Updated weights for policy 0, policy_version 130914 (0.0027) [2025-01-04 07:03:18,498][134294] Updated weights for policy 0, policy_version 130924 (0.0026) [2025-01-04 07:03:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14199.5, 300 sec: 14537.3). Total num frames: 536268800. 
Throughput: 0: 3668.8. Samples: 123232188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:18,969][134211] Avg episode reward: [(0, '7.667')] [2025-01-04 07:03:21,439][134294] Updated weights for policy 0, policy_version 130934 (0.0023) [2025-01-04 07:03:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.8, 300 sec: 14426.2). Total num frames: 536334336. Throughput: 0: 3644.8. Samples: 123251998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:23,968][134211] Avg episode reward: [(0, '7.960')] [2025-01-04 07:03:24,673][134294] Updated weights for policy 0, policy_version 130944 (0.0027) [2025-01-04 07:03:28,036][134294] Updated weights for policy 0, policy_version 130954 (0.0026) [2025-01-04 07:03:28,969][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 536395776. Throughput: 0: 3632.0. Samples: 123270528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:28,969][134211] Avg episode reward: [(0, '7.815')] [2025-01-04 07:03:31,227][134294] Updated weights for policy 0, policy_version 130964 (0.0025) [2025-01-04 07:03:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 14329.0). Total num frames: 536461312. Throughput: 0: 3631.6. Samples: 123280288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:33,968][134211] Avg episode reward: [(0, '8.132')] [2025-01-04 07:03:34,565][134294] Updated weights for policy 0, policy_version 130974 (0.0024) [2025-01-04 07:03:36,825][134294] Updated weights for policy 0, policy_version 130984 (0.0015) [2025-01-04 07:03:38,764][134294] Updated weights for policy 0, policy_version 130994 (0.0013) [2025-01-04 07:03:38,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14540.9, 300 sec: 14440.1). Total num frames: 536555520. Throughput: 0: 3571.3. Samples: 123302748. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:38,968][134211] Avg episode reward: [(0, '8.263')] [2025-01-04 07:03:40,808][134294] Updated weights for policy 0, policy_version 131004 (0.0013) [2025-01-04 07:03:42,781][134294] Updated weights for policy 0, policy_version 131014 (0.0017) [2025-01-04 07:03:43,968][134211] Fps is (10 sec: 18432.0, 60 sec: 15018.7, 300 sec: 14537.3). Total num frames: 536645632. Throughput: 0: 3520.5. Samples: 123332928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:43,968][134211] Avg episode reward: [(0, '8.290')] [2025-01-04 07:03:46,059][134294] Updated weights for policy 0, policy_version 131024 (0.0028) [2025-01-04 07:03:48,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14950.4, 300 sec: 14509.6). Total num frames: 536702976. Throughput: 0: 3452.6. Samples: 123342156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:48,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 07:03:49,627][134294] Updated weights for policy 0, policy_version 131034 (0.0029) [2025-01-04 07:03:52,784][134294] Updated weights for policy 0, policy_version 131044 (0.0027) [2025-01-04 07:03:53,971][134211] Fps is (10 sec: 12284.4, 60 sec: 14267.0, 300 sec: 14495.5). Total num frames: 536768512. Throughput: 0: 3425.7. Samples: 123360280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:53,971][134211] Avg episode reward: [(0, '8.304')] [2025-01-04 07:03:55,853][134294] Updated weights for policy 0, policy_version 131054 (0.0024) [2025-01-04 07:03:58,886][134294] Updated weights for policy 0, policy_version 131064 (0.0025) [2025-01-04 07:03:58,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13653.3, 300 sec: 14467.9). Total num frames: 536838144. Throughput: 0: 3488.7. Samples: 123380602. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:03:58,969][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 07:04:01,830][134294] Updated weights for policy 0, policy_version 131074 (0.0025) [2025-01-04 07:04:03,968][134211] Fps is (10 sec: 13520.9, 60 sec: 13585.1, 300 sec: 14398.5). Total num frames: 536903680. Throughput: 0: 3522.9. Samples: 123390718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:04:03,968][134211] Avg episode reward: [(0, '8.619')] [2025-01-04 07:04:04,951][134294] Updated weights for policy 0, policy_version 131084 (0.0027) [2025-01-04 07:04:08,077][134294] Updated weights for policy 0, policy_version 131094 (0.0025) [2025-01-04 07:04:08,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13653.3, 300 sec: 14412.4). Total num frames: 536969216. Throughput: 0: 3518.3. Samples: 123410322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:04:08,968][134211] Avg episode reward: [(0, '8.205')] [2025-01-04 07:04:11,428][134294] Updated weights for policy 0, policy_version 131104 (0.0027) [2025-01-04 07:04:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.6, 300 sec: 14398.5). Total num frames: 537030656. Throughput: 0: 3532.1. Samples: 123429472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:04:13,968][134211] Avg episode reward: [(0, '8.605')] [2025-01-04 07:04:14,646][134294] Updated weights for policy 0, policy_version 131114 (0.0029) [2025-01-04 07:04:17,421][134294] Updated weights for policy 0, policy_version 131124 (0.0018) [2025-01-04 07:04:18,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14063.0, 300 sec: 14329.1). Total num frames: 537112576. Throughput: 0: 3514.2. Samples: 123438428. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:18,968][134211] Avg episode reward: [(0, '8.467')] [2025-01-04 07:04:19,560][134294] Updated weights for policy 0, policy_version 131134 (0.0014) [2025-01-04 07:04:21,503][134294] Updated weights for policy 0, policy_version 131144 (0.0015) [2025-01-04 07:04:23,900][134294] Updated weights for policy 0, policy_version 131154 (0.0020) [2025-01-04 07:04:23,968][134211] Fps is (10 sec: 17613.2, 60 sec: 14540.8, 300 sec: 14426.3). Total num frames: 537206784. Throughput: 0: 3656.6. Samples: 123467294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:23,968][134211] Avg episode reward: [(0, '7.779')] [2025-01-04 07:04:26,071][134294] Updated weights for policy 0, policy_version 131164 (0.0017) [2025-01-04 07:04:28,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14813.9, 300 sec: 14454.0). Total num frames: 537284608. Throughput: 0: 3545.1. Samples: 123492456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:28,968][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 07:04:29,189][134294] Updated weights for policy 0, policy_version 131174 (0.0024) [2025-01-04 07:04:32,289][134294] Updated weights for policy 0, policy_version 131184 (0.0025) [2025-01-04 07:04:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14813.9, 300 sec: 14467.9). Total num frames: 537350144. Throughput: 0: 3550.8. Samples: 123501944. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:33,968][134211] Avg episode reward: [(0, '8.305')] [2025-01-04 07:04:35,334][134294] Updated weights for policy 0, policy_version 131194 (0.0025) [2025-01-04 07:04:38,507][134294] Updated weights for policy 0, policy_version 131204 (0.0027) [2025-01-04 07:04:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 537411584. Throughput: 0: 3587.6. Samples: 123521712. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:38,968][134211] Avg episode reward: [(0, '8.616')] [2025-01-04 07:04:42,084][134294] Updated weights for policy 0, policy_version 131214 (0.0028) [2025-01-04 07:04:43,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13721.6, 300 sec: 14412.4). Total num frames: 537468928. Throughput: 0: 3526.6. Samples: 123539298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:43,968][134211] Avg episode reward: [(0, '8.556')] [2025-01-04 07:04:45,611][134294] Updated weights for policy 0, policy_version 131224 (0.0029) [2025-01-04 07:04:48,853][134294] Updated weights for policy 0, policy_version 131234 (0.0028) [2025-01-04 07:04:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13858.1, 300 sec: 14412.4). Total num frames: 537534464. Throughput: 0: 3502.1. Samples: 123548312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:48,968][134211] Avg episode reward: [(0, '7.836')] [2025-01-04 07:04:51,608][134294] Updated weights for policy 0, policy_version 131244 (0.0020) [2025-01-04 07:04:53,807][134294] Updated weights for policy 0, policy_version 131254 (0.0015) [2025-01-04 07:04:53,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14131.9, 300 sec: 14329.0). Total num frames: 537616384. Throughput: 0: 3556.0. Samples: 123570342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:53,969][134211] Avg episode reward: [(0, '7.787')] [2025-01-04 07:04:56,819][134294] Updated weights for policy 0, policy_version 131264 (0.0026) [2025-01-04 07:04:58,969][134211] Fps is (10 sec: 14743.9, 60 sec: 14062.7, 300 sec: 14245.7). Total num frames: 537681920. Throughput: 0: 3606.0. Samples: 123591744. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:04:58,970][134211] Avg episode reward: [(0, '8.640')] [2025-01-04 07:05:00,061][134294] Updated weights for policy 0, policy_version 131274 (0.0027) [2025-01-04 07:05:02,966][134294] Updated weights for policy 0, policy_version 131284 (0.0022) [2025-01-04 07:05:03,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14131.0, 300 sec: 14273.5). Total num frames: 537751552. Throughput: 0: 3628.4. Samples: 123601708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:05:03,969][134211] Avg episode reward: [(0, '8.217')] [2025-01-04 07:05:06,050][134294] Updated weights for policy 0, policy_version 131294 (0.0022) [2025-01-04 07:05:08,969][134211] Fps is (10 sec: 13516.9, 60 sec: 14130.9, 300 sec: 14301.2). Total num frames: 537817088. Throughput: 0: 3438.1. Samples: 123622012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:05:08,969][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 07:05:09,203][134294] Updated weights for policy 0, policy_version 131304 (0.0028) [2025-01-04 07:05:12,133][134294] Updated weights for policy 0, policy_version 131314 (0.0020) [2025-01-04 07:05:13,968][134211] Fps is (10 sec: 14746.8, 60 sec: 14472.6, 300 sec: 14384.6). Total num frames: 537899008. Throughput: 0: 3369.6. Samples: 123644086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:05:13,968][134211] Avg episode reward: [(0, '8.799')] [2025-01-04 07:05:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000131323_537899008.pth... 
[2025-01-04 07:05:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130476_534429696.pth [2025-01-04 07:05:14,155][134294] Updated weights for policy 0, policy_version 131324 (0.0014) [2025-01-04 07:05:16,189][134294] Updated weights for policy 0, policy_version 131334 (0.0015) [2025-01-04 07:05:18,213][134294] Updated weights for policy 0, policy_version 131344 (0.0014) [2025-01-04 07:05:18,968][134211] Fps is (10 sec: 17614.7, 60 sec: 14677.3, 300 sec: 14509.6). Total num frames: 537993216. Throughput: 0: 3494.3. Samples: 123659186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:18,968][134211] Avg episode reward: [(0, '8.058')] [2025-01-04 07:05:21,202][134294] Updated weights for policy 0, policy_version 131354 (0.0025) [2025-01-04 07:05:23,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14199.4, 300 sec: 14412.4). Total num frames: 538058752. Throughput: 0: 3578.8. Samples: 123682758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:23,969][134211] Avg episode reward: [(0, '8.842')] [2025-01-04 07:05:24,529][134294] Updated weights for policy 0, policy_version 131364 (0.0026) [2025-01-04 07:05:27,579][134294] Updated weights for policy 0, policy_version 131374 (0.0029) [2025-01-04 07:05:28,968][134211] Fps is (10 sec: 13106.3, 60 sec: 13994.5, 300 sec: 14287.4). Total num frames: 538124288. Throughput: 0: 3615.1. Samples: 123701980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:28,969][134211] Avg episode reward: [(0, '8.600')] [2025-01-04 07:05:30,716][134294] Updated weights for policy 0, policy_version 131384 (0.0025) [2025-01-04 07:05:33,680][134294] Updated weights for policy 0, policy_version 131394 (0.0025) [2025-01-04 07:05:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14301.3). Total num frames: 538193920. Throughput: 0: 3640.8. Samples: 123712146. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:33,968][134211] Avg episode reward: [(0, '8.533')] [2025-01-04 07:05:36,666][134294] Updated weights for policy 0, policy_version 131404 (0.0024) [2025-01-04 07:05:38,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14131.2, 300 sec: 14329.1). Total num frames: 538259456. Throughput: 0: 3606.0. Samples: 123732612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:38,968][134211] Avg episode reward: [(0, '7.836')] [2025-01-04 07:05:39,949][134294] Updated weights for policy 0, policy_version 131414 (0.0023) [2025-01-04 07:05:43,426][134294] Updated weights for policy 0, policy_version 131424 (0.0031) [2025-01-04 07:05:43,969][134211] Fps is (10 sec: 12286.9, 60 sec: 14131.0, 300 sec: 14315.2). Total num frames: 538316800. Throughput: 0: 3532.3. Samples: 123750698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:43,969][134211] Avg episode reward: [(0, '8.941')] [2025-01-04 07:05:46,703][134294] Updated weights for policy 0, policy_version 131434 (0.0025) [2025-01-04 07:05:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14063.0, 300 sec: 14287.4). Total num frames: 538378240. Throughput: 0: 3517.4. Samples: 123759986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:48,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 07:05:49,636][134294] Updated weights for policy 0, policy_version 131444 (0.0021) [2025-01-04 07:05:51,586][134294] Updated weights for policy 0, policy_version 131454 (0.0013) [2025-01-04 07:05:53,445][134294] Updated weights for policy 0, policy_version 131464 (0.0014) [2025-01-04 07:05:53,968][134211] Fps is (10 sec: 16794.7, 60 sec: 14472.5, 300 sec: 14426.3). Total num frames: 538484736. Throughput: 0: 3632.6. Samples: 123785478. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:53,968][134211] Avg episode reward: [(0, '8.979')] [2025-01-04 07:05:55,343][134294] Updated weights for policy 0, policy_version 131474 (0.0014) [2025-01-04 07:05:58,317][134294] Updated weights for policy 0, policy_version 131484 (0.0026) [2025-01-04 07:05:58,968][134211] Fps is (10 sec: 18431.9, 60 sec: 14677.6, 300 sec: 14454.0). Total num frames: 538562560. Throughput: 0: 3742.4. Samples: 123812494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:05:58,968][134211] Avg episode reward: [(0, '8.306')] [2025-01-04 07:06:01,436][134294] Updated weights for policy 0, policy_version 131494 (0.0028) [2025-01-04 07:06:03,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14609.2, 300 sec: 14467.9). Total num frames: 538628096. Throughput: 0: 3619.3. Samples: 123822056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:06:03,968][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 07:06:04,746][134294] Updated weights for policy 0, policy_version 131504 (0.0026) [2025-01-04 07:06:07,932][134294] Updated weights for policy 0, policy_version 131514 (0.0027) [2025-01-04 07:06:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.3, 300 sec: 14454.0). Total num frames: 538693632. Throughput: 0: 3520.2. Samples: 123841168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:06:08,968][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 07:06:11,285][134294] Updated weights for policy 0, policy_version 131524 (0.0025) [2025-01-04 07:06:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.4, 300 sec: 14301.3). Total num frames: 538750976. Throughput: 0: 3493.7. Samples: 123859196. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:06:13,968][134211] Avg episode reward: [(0, '7.971')] [2025-01-04 07:06:14,784][134294] Updated weights for policy 0, policy_version 131534 (0.0025) [2025-01-04 07:06:18,551][134294] Updated weights for policy 0, policy_version 131544 (0.0024) [2025-01-04 07:06:18,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13585.1, 300 sec: 14218.0). Total num frames: 538808320. Throughput: 0: 3461.5. Samples: 123867914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:06:18,968][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 07:06:20,638][134294] Updated weights for policy 0, policy_version 131554 (0.0014) [2025-01-04 07:06:22,701][134294] Updated weights for policy 0, policy_version 131564 (0.0014) [2025-01-04 07:06:23,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14199.5, 300 sec: 14342.9). Total num frames: 538910720. Throughput: 0: 3543.6. Samples: 123892074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:23,968][134211] Avg episode reward: [(0, '8.004')] [2025-01-04 07:06:24,562][134294] Updated weights for policy 0, policy_version 131574 (0.0013) [2025-01-04 07:06:26,510][134294] Updated weights for policy 0, policy_version 131584 (0.0013) [2025-01-04 07:06:28,406][134294] Updated weights for policy 0, policy_version 131594 (0.0012) [2025-01-04 07:06:28,968][134211] Fps is (10 sec: 21299.2, 60 sec: 14950.6, 300 sec: 14495.7). Total num frames: 539021312. Throughput: 0: 3858.6. Samples: 123924332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:28,968][134211] Avg episode reward: [(0, '8.489')] [2025-01-04 07:06:30,570][134294] Updated weights for policy 0, policy_version 131604 (0.0017) [2025-01-04 07:06:33,749][134294] Updated weights for policy 0, policy_version 131614 (0.0027) [2025-01-04 07:06:33,968][134211] Fps is (10 sec: 18021.9, 60 sec: 14950.4, 300 sec: 14509.6). Total num frames: 539090944. Throughput: 0: 3948.5. Samples: 123937670. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:33,969][134211] Avg episode reward: [(0, '8.329')] [2025-01-04 07:06:36,796][134294] Updated weights for policy 0, policy_version 131624 (0.0027) [2025-01-04 07:06:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14523.5). Total num frames: 539156480. Throughput: 0: 3818.9. Samples: 123957328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:38,968][134211] Avg episode reward: [(0, '7.535')] [2025-01-04 07:06:40,194][134294] Updated weights for policy 0, policy_version 131634 (0.0026) [2025-01-04 07:06:43,513][134294] Updated weights for policy 0, policy_version 131644 (0.0029) [2025-01-04 07:06:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15018.9, 300 sec: 14481.8). Total num frames: 539217920. Throughput: 0: 3625.7. Samples: 123975650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:43,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 07:06:46,880][134294] Updated weights for policy 0, policy_version 131654 (0.0026) [2025-01-04 07:06:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 15018.6, 300 sec: 14329.0). Total num frames: 539279360. Throughput: 0: 3619.4. Samples: 123984928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:48,968][134211] Avg episode reward: [(0, '7.868')] [2025-01-04 07:06:50,090][134294] Updated weights for policy 0, policy_version 131664 (0.0027) [2025-01-04 07:06:53,162][134294] Updated weights for policy 0, policy_version 131674 (0.0025) [2025-01-04 07:06:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.1, 300 sec: 14176.3). Total num frames: 539344896. Throughput: 0: 3622.4. Samples: 124004176. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:53,968][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 07:06:56,582][134294] Updated weights for policy 0, policy_version 131684 (0.0026) [2025-01-04 07:06:58,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13994.6, 300 sec: 14107.0). Total num frames: 539402240. Throughput: 0: 3621.8. Samples: 124022176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:06:58,969][134211] Avg episode reward: [(0, '8.483')] [2025-01-04 07:07:00,034][134294] Updated weights for policy 0, policy_version 131694 (0.0026) [2025-01-04 07:07:03,297][134294] Updated weights for policy 0, policy_version 131704 (0.0027) [2025-01-04 07:07:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 539467776. Throughput: 0: 3631.6. Samples: 124031336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:07:03,968][134211] Avg episode reward: [(0, '7.768')] [2025-01-04 07:07:06,409][134294] Updated weights for policy 0, policy_version 131714 (0.0025) [2025-01-04 07:07:08,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13994.5, 300 sec: 14148.5). Total num frames: 539533312. Throughput: 0: 3531.1. Samples: 124050976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:07:08,969][134211] Avg episode reward: [(0, '8.454')] [2025-01-04 07:07:09,546][134294] Updated weights for policy 0, policy_version 131724 (0.0029) [2025-01-04 07:07:13,021][134294] Updated weights for policy 0, policy_version 131734 (0.0024) [2025-01-04 07:07:13,968][134211] Fps is (10 sec: 12287.7, 60 sec: 13994.7, 300 sec: 14148.5). Total num frames: 539590656. Throughput: 0: 3222.3. Samples: 124069338. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:07:13,969][134211] Avg episode reward: [(0, '7.409')] [2025-01-04 07:07:14,007][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000131737_539594752.pth... 
[2025-01-04 07:07:14,077][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000130911_536211456.pth [2025-01-04 07:07:15,719][134294] Updated weights for policy 0, policy_version 131744 (0.0020) [2025-01-04 07:07:17,744][134294] Updated weights for policy 0, policy_version 131754 (0.0014) [2025-01-04 07:07:18,968][134211] Fps is (10 sec: 15156.3, 60 sec: 14609.1, 300 sec: 14259.6). Total num frames: 539684864. Throughput: 0: 3193.5. Samples: 124081378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:07:18,968][134211] Avg episode reward: [(0, '7.894')] [2025-01-04 07:07:19,750][134294] Updated weights for policy 0, policy_version 131764 (0.0013) [2025-01-04 07:07:21,637][134294] Updated weights for policy 0, policy_version 131774 (0.0013) [2025-01-04 07:07:23,499][134294] Updated weights for policy 0, policy_version 131784 (0.0014) [2025-01-04 07:07:23,968][134211] Fps is (10 sec: 20480.6, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 539795456. Throughput: 0: 3457.4. Samples: 124112912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:23,968][134211] Avg episode reward: [(0, '8.970')] [2025-01-04 07:07:25,370][134294] Updated weights for policy 0, policy_version 131794 (0.0014) [2025-01-04 07:07:28,067][134294] Updated weights for policy 0, policy_version 131804 (0.0026) [2025-01-04 07:07:28,968][134211] Fps is (10 sec: 19251.0, 60 sec: 14267.7, 300 sec: 14454.0). Total num frames: 539877376. Throughput: 0: 3669.0. Samples: 124140756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:28,968][134211] Avg episode reward: [(0, '8.173')] [2025-01-04 07:07:31,335][134294] Updated weights for policy 0, policy_version 131814 (0.0030) [2025-01-04 07:07:33,968][134211] Fps is (10 sec: 14744.3, 60 sec: 14199.3, 300 sec: 14440.1). Total num frames: 539942912. Throughput: 0: 3673.4. Samples: 124150232. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:33,969][134211] Avg episode reward: [(0, '8.928')] [2025-01-04 07:07:34,542][134294] Updated weights for policy 0, policy_version 131824 (0.0025) [2025-01-04 07:07:37,698][134294] Updated weights for policy 0, policy_version 131834 (0.0026) [2025-01-04 07:07:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.4, 300 sec: 14454.0). Total num frames: 540008448. Throughput: 0: 3677.1. Samples: 124169646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:38,968][134211] Avg episode reward: [(0, '7.976')] [2025-01-04 07:07:40,923][134294] Updated weights for policy 0, policy_version 131844 (0.0026) [2025-01-04 07:07:43,968][134211] Fps is (10 sec: 12288.7, 60 sec: 14131.2, 300 sec: 14440.1). Total num frames: 540065792. Throughput: 0: 3690.8. Samples: 124188260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:43,969][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 07:07:44,362][134294] Updated weights for policy 0, policy_version 131854 (0.0025) [2025-01-04 07:07:47,833][134294] Updated weights for policy 0, policy_version 131864 (0.0023) [2025-01-04 07:07:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 14287.4). Total num frames: 540127232. Throughput: 0: 3680.5. Samples: 124196960. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:48,968][134211] Avg episode reward: [(0, '8.900')] [2025-01-04 07:07:50,976][134294] Updated weights for policy 0, policy_version 131874 (0.0026) [2025-01-04 07:07:53,912][134294] Updated weights for policy 0, policy_version 131884 (0.0024) [2025-01-04 07:07:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.4, 300 sec: 14162.4). Total num frames: 540196864. Throughput: 0: 3678.0. Samples: 124216486. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:53,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 07:07:56,802][134294] Updated weights for policy 0, policy_version 131894 (0.0023) [2025-01-04 07:07:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14336.1, 300 sec: 14148.6). Total num frames: 540262400. Throughput: 0: 3727.4. Samples: 124237068. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:07:58,968][134211] Avg episode reward: [(0, '9.135')] [2025-01-04 07:07:59,996][134294] Updated weights for policy 0, policy_version 131904 (0.0025) [2025-01-04 07:08:02,926][134294] Updated weights for policy 0, policy_version 131914 (0.0024) [2025-01-04 07:08:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.2, 300 sec: 14176.3). Total num frames: 540332032. Throughput: 0: 3679.9. Samples: 124246974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:08:03,969][134211] Avg episode reward: [(0, '8.594')] [2025-01-04 07:08:05,982][134294] Updated weights for policy 0, policy_version 131924 (0.0024) [2025-01-04 07:08:08,918][134294] Updated weights for policy 0, policy_version 131934 (0.0024) [2025-01-04 07:08:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.7, 300 sec: 14218.0). Total num frames: 540401664. Throughput: 0: 3438.8. Samples: 124267658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:08:08,968][134211] Avg episode reward: [(0, '8.572')] [2025-01-04 07:08:12,237][134294] Updated weights for policy 0, policy_version 131944 (0.0026) [2025-01-04 07:08:13,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14677.4, 300 sec: 14245.8). Total num frames: 540471296. Throughput: 0: 3257.6. Samples: 124287350. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:08:13,968][134211] Avg episode reward: [(0, '7.763')] [2025-01-04 07:08:14,515][134294] Updated weights for policy 0, policy_version 131954 (0.0015) [2025-01-04 07:08:16,630][134294] Updated weights for policy 0, policy_version 131964 (0.0014) [2025-01-04 07:08:18,747][134294] Updated weights for policy 0, policy_version 131974 (0.0012) [2025-01-04 07:08:18,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14745.6, 300 sec: 14356.8). Total num frames: 540569600. Throughput: 0: 3379.7. Samples: 124302316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:08:18,968][134211] Avg episode reward: [(0, '8.371')] [2025-01-04 07:08:20,789][134294] Updated weights for policy 0, policy_version 131984 (0.0014) [2025-01-04 07:08:23,244][134294] Updated weights for policy 0, policy_version 131994 (0.0019) [2025-01-04 07:08:23,968][134211] Fps is (10 sec: 18021.9, 60 sec: 14267.7, 300 sec: 14426.2). Total num frames: 540651520. Throughput: 0: 3591.7. Samples: 124331274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 07:08:23,969][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 07:08:26,648][134294] Updated weights for policy 0, policy_version 132004 (0.0028) [2025-01-04 07:08:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13994.7, 300 sec: 14426.3). Total num frames: 540717056. Throughput: 0: 3592.8. Samples: 124349934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:28,968][134211] Avg episode reward: [(0, '8.090')] [2025-01-04 07:08:29,959][134294] Updated weights for policy 0, policy_version 132014 (0.0027) [2025-01-04 07:08:32,995][134294] Updated weights for policy 0, policy_version 132024 (0.0028) [2025-01-04 07:08:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.9, 300 sec: 14329.1). Total num frames: 540782592. Throughput: 0: 3611.7. Samples: 124359488. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:33,968][134211] Avg episode reward: [(0, '7.819')] [2025-01-04 07:08:36,346][134294] Updated weights for policy 0, policy_version 132034 (0.0027) [2025-01-04 07:08:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 540844032. Throughput: 0: 3602.0. Samples: 124378574. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:38,968][134211] Avg episode reward: [(0, '8.326')] [2025-01-04 07:08:39,642][134294] Updated weights for policy 0, policy_version 132044 (0.0027) [2025-01-04 07:08:43,089][134294] Updated weights for policy 0, policy_version 132054 (0.0026) [2025-01-04 07:08:43,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 540901376. Throughput: 0: 3550.2. Samples: 124396826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:43,968][134211] Avg episode reward: [(0, '9.026')] [2025-01-04 07:08:46,359][134294] Updated weights for policy 0, policy_version 132064 (0.0026) [2025-01-04 07:08:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13994.7, 300 sec: 14232.0). Total num frames: 540966912. Throughput: 0: 3541.5. Samples: 124406342. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:48,968][134211] Avg episode reward: [(0, '8.153')] [2025-01-04 07:08:49,571][134294] Updated weights for policy 0, policy_version 132074 (0.0026) [2025-01-04 07:08:51,890][134294] Updated weights for policy 0, policy_version 132084 (0.0016) [2025-01-04 07:08:53,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14199.5, 300 sec: 14273.5). Total num frames: 541048832. Throughput: 0: 3578.4. Samples: 124428686. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:53,968][134211] Avg episode reward: [(0, '8.081')] [2025-01-04 07:08:54,337][134294] Updated weights for policy 0, policy_version 132094 (0.0021) [2025-01-04 07:08:57,495][134294] Updated weights for policy 0, policy_version 132104 (0.0027) [2025-01-04 07:08:58,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14199.4, 300 sec: 14273.5). Total num frames: 541114368. Throughput: 0: 3609.6. Samples: 124449784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:08:58,968][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 07:09:00,466][134294] Updated weights for policy 0, policy_version 132114 (0.0029) [2025-01-04 07:09:03,421][134294] Updated weights for policy 0, policy_version 132124 (0.0025) [2025-01-04 07:09:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 14287.4). Total num frames: 541184000. Throughput: 0: 3512.6. Samples: 124460386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:09:03,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 07:09:06,429][134294] Updated weights for policy 0, policy_version 132134 (0.0027) [2025-01-04 07:09:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14315.2). Total num frames: 541253632. Throughput: 0: 3327.2. Samples: 124480998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:09:08,968][134211] Avg episode reward: [(0, '9.068')] [2025-01-04 07:09:09,466][134294] Updated weights for policy 0, policy_version 132144 (0.0024) [2025-01-04 07:09:11,689][134294] Updated weights for policy 0, policy_version 132154 (0.0014) [2025-01-04 07:09:13,761][134294] Updated weights for policy 0, policy_version 132164 (0.0013) [2025-01-04 07:09:13,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14540.8, 300 sec: 14342.9). Total num frames: 541343744. Throughput: 0: 3473.5. Samples: 124506240. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:09:13,968][134211] Avg episode reward: [(0, '9.138')] [2025-01-04 07:09:13,996][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132165_541347840.pth... [2025-01-04 07:09:14,038][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000131323_537899008.pth [2025-01-04 07:09:15,851][134294] Updated weights for policy 0, policy_version 132174 (0.0013) [2025-01-04 07:09:18,390][134294] Updated weights for policy 0, policy_version 132184 (0.0022) [2025-01-04 07:09:18,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 541429760. Throughput: 0: 3592.9. Samples: 124521168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:09:18,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 07:09:21,663][134294] Updated weights for policy 0, policy_version 132194 (0.0027) [2025-01-04 07:09:23,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14063.0, 300 sec: 14273.5). Total num frames: 541495296. Throughput: 0: 3621.3. Samples: 124541532. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:09:23,968][134211] Avg episode reward: [(0, '7.955')] [2025-01-04 07:09:24,979][134294] Updated weights for policy 0, policy_version 132204 (0.0025) [2025-01-04 07:09:28,386][134294] Updated weights for policy 0, policy_version 132214 (0.0028) [2025-01-04 07:09:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13926.4, 300 sec: 14245.8). Total num frames: 541552640. Throughput: 0: 3621.0. Samples: 124559768. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:28,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 07:09:31,492][134294] Updated weights for policy 0, policy_version 132224 (0.0026) [2025-01-04 07:09:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.6, 300 sec: 14273.5). Total num frames: 541622272. 
Throughput: 0: 3629.0. Samples: 124569648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:33,968][134211] Avg episode reward: [(0, '8.381')] [2025-01-04 07:09:34,630][134294] Updated weights for policy 0, policy_version 132234 (0.0025) [2025-01-04 07:09:37,695][134294] Updated weights for policy 0, policy_version 132244 (0.0025) [2025-01-04 07:09:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14287.4). Total num frames: 541683712. Throughput: 0: 3570.8. Samples: 124589372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:38,968][134211] Avg episode reward: [(0, '8.471')] [2025-01-04 07:09:40,986][134294] Updated weights for policy 0, policy_version 132254 (0.0025) [2025-01-04 07:09:43,969][134211] Fps is (10 sec: 12696.2, 60 sec: 14130.9, 300 sec: 14287.3). Total num frames: 541749248. Throughput: 0: 3518.9. Samples: 124608140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:43,969][134211] Avg episode reward: [(0, '8.561')] [2025-01-04 07:09:44,296][134294] Updated weights for policy 0, policy_version 132264 (0.0027) [2025-01-04 07:09:47,461][134294] Updated weights for policy 0, policy_version 132274 (0.0027) [2025-01-04 07:09:48,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14199.4, 300 sec: 14245.7). Total num frames: 541818880. Throughput: 0: 3490.5. Samples: 124617460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:48,968][134211] Avg episode reward: [(0, '8.648')] [2025-01-04 07:09:49,703][134294] Updated weights for policy 0, policy_version 132284 (0.0011) [2025-01-04 07:09:51,554][134294] Updated weights for policy 0, policy_version 132294 (0.0013) [2025-01-04 07:09:53,968][134211] Fps is (10 sec: 16385.8, 60 sec: 14404.3, 300 sec: 14343.0). Total num frames: 541913088. Throughput: 0: 3644.9. Samples: 124645020. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:53,968][134211] Avg episode reward: [(0, '8.653')] [2025-01-04 07:09:54,293][134294] Updated weights for policy 0, policy_version 132304 (0.0023) [2025-01-04 07:09:57,405][134294] Updated weights for policy 0, policy_version 132314 (0.0025) [2025-01-04 07:09:58,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 541978624. Throughput: 0: 3534.5. Samples: 124665294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:09:58,968][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 07:10:00,694][134294] Updated weights for policy 0, policy_version 132324 (0.0031) [2025-01-04 07:10:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.5, 300 sec: 14301.3). Total num frames: 542035968. Throughput: 0: 3406.7. Samples: 124674470. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:03,968][134211] Avg episode reward: [(0, '9.450')] [2025-01-04 07:10:04,070][134294] Updated weights for policy 0, policy_version 132334 (0.0026) [2025-01-04 07:10:06,760][134294] Updated weights for policy 0, policy_version 132344 (0.0020) [2025-01-04 07:10:08,650][134294] Updated weights for policy 0, policy_version 132354 (0.0014) [2025-01-04 07:10:08,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14540.9, 300 sec: 14329.1). Total num frames: 542126080. Throughput: 0: 3430.3. Samples: 124695894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:08,968][134211] Avg episode reward: [(0, '8.913')] [2025-01-04 07:10:10,596][134294] Updated weights for policy 0, policy_version 132364 (0.0013) [2025-01-04 07:10:12,753][134294] Updated weights for policy 0, policy_version 132374 (0.0014) [2025-01-04 07:10:13,968][134211] Fps is (10 sec: 18022.6, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 542216192. Throughput: 0: 3684.7. Samples: 124725580. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:13,968][134211] Avg episode reward: [(0, '7.950')] [2025-01-04 07:10:16,120][134294] Updated weights for policy 0, policy_version 132384 (0.0027) [2025-01-04 07:10:18,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14062.9, 300 sec: 14287.4). Total num frames: 542273536. Throughput: 0: 3666.8. Samples: 124734654. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:18,968][134211] Avg episode reward: [(0, '8.535')] [2025-01-04 07:10:19,886][134294] Updated weights for policy 0, policy_version 132394 (0.0027) [2025-01-04 07:10:23,474][134294] Updated weights for policy 0, policy_version 132404 (0.0024) [2025-01-04 07:10:23,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13926.4, 300 sec: 14259.7). Total num frames: 542330880. Throughput: 0: 3593.7. Samples: 124751088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:23,968][134211] Avg episode reward: [(0, '8.172')] [2025-01-04 07:10:26,910][134294] Updated weights for policy 0, policy_version 132414 (0.0027) [2025-01-04 07:10:28,968][134211] Fps is (10 sec: 11878.7, 60 sec: 13994.7, 300 sec: 14231.9). Total num frames: 542392320. Throughput: 0: 3575.2. Samples: 124769020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:10:28,968][134211] Avg episode reward: [(0, '8.338')] [2025-01-04 07:10:29,641][134294] Updated weights for policy 0, policy_version 132424 (0.0020) [2025-01-04 07:10:31,675][134294] Updated weights for policy 0, policy_version 132434 (0.0016) [2025-01-04 07:10:33,873][134294] Updated weights for policy 0, policy_version 132444 (0.0021) [2025-01-04 07:10:33,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14472.6, 300 sec: 14342.9). Total num frames: 542490624. Throughput: 0: 3678.0. Samples: 124782970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:33,968][134211] Avg episode reward: [(0, '8.248')] [2025-01-04 07:10:37,025][134294] Updated weights for policy 0, policy_version 132454 (0.0026) [2025-01-04 07:10:38,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14540.8, 300 sec: 14370.8). Total num frames: 542556160. Throughput: 0: 3595.4. Samples: 124806812. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:38,968][134211] Avg episode reward: [(0, '8.051')] [2025-01-04 07:10:40,062][134294] Updated weights for policy 0, policy_version 132464 (0.0028) [2025-01-04 07:10:43,470][134294] Updated weights for policy 0, policy_version 132474 (0.0029) [2025-01-04 07:10:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.8, 300 sec: 14370.7). Total num frames: 542617600. Throughput: 0: 3560.7. Samples: 124825524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:43,968][134211] Avg episode reward: [(0, '8.945')] [2025-01-04 07:10:46,731][134294] Updated weights for policy 0, policy_version 132484 (0.0024) [2025-01-04 07:10:48,968][134211] Fps is (10 sec: 12287.6, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 542679040. Throughput: 0: 3569.2. Samples: 124835086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:48,969][134211] Avg episode reward: [(0, '9.454')] [2025-01-04 07:10:50,036][134294] Updated weights for policy 0, policy_version 132494 (0.0025) [2025-01-04 07:10:53,030][134294] Updated weights for policy 0, policy_version 132504 (0.0025) [2025-01-04 07:10:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 14176.3). Total num frames: 542744576. Throughput: 0: 3522.6. Samples: 124854410. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:53,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 07:10:55,949][134294] Updated weights for policy 0, policy_version 132514 (0.0025) [2025-01-04 07:10:58,896][134294] Updated weights for policy 0, policy_version 132524 (0.0024) [2025-01-04 07:10:58,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13994.7, 300 sec: 14204.1). Total num frames: 542818304. Throughput: 0: 3327.1. Samples: 124875298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:10:58,968][134211] Avg episode reward: [(0, '9.143')] [2025-01-04 07:11:01,950][134294] Updated weights for policy 0, policy_version 132534 (0.0027) [2025-01-04 07:11:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14199.5, 300 sec: 14218.0). Total num frames: 542887936. Throughput: 0: 3347.4. Samples: 124885288. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:03,968][134211] Avg episode reward: [(0, '9.110')] [2025-01-04 07:11:04,512][134294] Updated weights for policy 0, policy_version 132544 (0.0018) [2025-01-04 07:11:06,990][134294] Updated weights for policy 0, policy_version 132554 (0.0020) [2025-01-04 07:11:08,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13994.6, 300 sec: 14287.4). Total num frames: 542965760. Throughput: 0: 3517.4. Samples: 124909370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:08,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 07:11:09,989][134294] Updated weights for policy 0, policy_version 132564 (0.0023) [2025-01-04 07:11:13,434][134294] Updated weights for policy 0, policy_version 132574 (0.0024) [2025-01-04 07:11:13,969][134211] Fps is (10 sec: 13924.5, 60 sec: 13516.5, 300 sec: 14301.2). Total num frames: 543027200. Throughput: 0: 3534.6. Samples: 124928084. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:13,970][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 07:11:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132575_543027200.pth... [2025-01-04 07:11:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000131737_539594752.pth [2025-01-04 07:11:15,957][134294] Updated weights for policy 0, policy_version 132584 (0.0019) [2025-01-04 07:11:17,949][134294] Updated weights for policy 0, policy_version 132594 (0.0013) [2025-01-04 07:11:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14199.5, 300 sec: 14287.4). Total num frames: 543125504. Throughput: 0: 3499.7. Samples: 124940454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:18,968][134211] Avg episode reward: [(0, '8.796')] [2025-01-04 07:11:19,973][134294] Updated weights for policy 0, policy_version 132604 (0.0014) [2025-01-04 07:11:21,816][134294] Updated weights for policy 0, policy_version 132614 (0.0014) [2025-01-04 07:11:23,968][134211] Fps is (10 sec: 18843.7, 60 sec: 14745.6, 300 sec: 14218.0). Total num frames: 543215616. Throughput: 0: 3661.1. Samples: 124971564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:23,969][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 07:11:24,649][134294] Updated weights for policy 0, policy_version 132624 (0.0023) [2025-01-04 07:11:27,909][134294] Updated weights for policy 0, policy_version 132634 (0.0028) [2025-01-04 07:11:28,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14813.9, 300 sec: 14204.1). Total num frames: 543281152. Throughput: 0: 3680.9. Samples: 124991164. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:11:28,968][134211] Avg episode reward: [(0, '8.138')] [2025-01-04 07:11:31,015][134294] Updated weights for policy 0, policy_version 132644 (0.0025) [2025-01-04 07:11:33,970][134211] Fps is (10 sec: 13104.5, 60 sec: 14267.2, 300 sec: 14204.0). Total num frames: 543346688. Throughput: 0: 3688.6. Samples: 125001078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:33,971][134211] Avg episode reward: [(0, '8.449')] [2025-01-04 07:11:34,201][134294] Updated weights for policy 0, policy_version 132654 (0.0029) [2025-01-04 07:11:37,269][134294] Updated weights for policy 0, policy_version 132664 (0.0026) [2025-01-04 07:11:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14204.1). Total num frames: 543408128. Throughput: 0: 3696.3. Samples: 125020742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:38,968][134211] Avg episode reward: [(0, '8.183')] [2025-01-04 07:11:40,507][134294] Updated weights for policy 0, policy_version 132674 (0.0025) [2025-01-04 07:11:43,955][134294] Updated weights for policy 0, policy_version 132684 (0.0030) [2025-01-04 07:11:43,968][134211] Fps is (10 sec: 12700.4, 60 sec: 14267.7, 300 sec: 14218.0). Total num frames: 543473664. Throughput: 0: 3649.6. Samples: 125039532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:43,968][134211] Avg episode reward: [(0, '7.788')] [2025-01-04 07:11:47,219][134294] Updated weights for policy 0, policy_version 132694 (0.0028) [2025-01-04 07:11:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.8, 300 sec: 14204.1). Total num frames: 543535104. Throughput: 0: 3633.0. Samples: 125048772. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:48,968][134211] Avg episode reward: [(0, '7.678')] [2025-01-04 07:11:50,343][134294] Updated weights for policy 0, policy_version 132704 (0.0023) [2025-01-04 07:11:53,538][134294] Updated weights for policy 0, policy_version 132714 (0.0023) [2025-01-04 07:11:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14231.9). Total num frames: 543600640. Throughput: 0: 3526.6. Samples: 125068068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:53,968][134211] Avg episode reward: [(0, '8.565')] [2025-01-04 07:11:56,102][134294] Updated weights for policy 0, policy_version 132724 (0.0019) [2025-01-04 07:11:58,005][134294] Updated weights for policy 0, policy_version 132734 (0.0013) [2025-01-04 07:11:58,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14609.1, 300 sec: 14329.1). Total num frames: 543694848. Throughput: 0: 3683.5. Samples: 125093834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:11:58,968][134211] Avg episode reward: [(0, '8.957')] [2025-01-04 07:11:59,897][134294] Updated weights for policy 0, policy_version 132744 (0.0013) [2025-01-04 07:12:01,873][134294] Updated weights for policy 0, policy_version 132754 (0.0013) [2025-01-04 07:12:03,944][134294] Updated weights for policy 0, policy_version 132764 (0.0018) [2025-01-04 07:12:03,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15223.4, 300 sec: 14467.9). Total num frames: 543801344. Throughput: 0: 3760.6. Samples: 125109684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:03,968][134211] Avg episode reward: [(0, '8.073')] [2025-01-04 07:12:07,263][134294] Updated weights for policy 0, policy_version 132774 (0.0028) [2025-01-04 07:12:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14882.1, 300 sec: 14467.9). Total num frames: 543858688. Throughput: 0: 3599.1. Samples: 125133524. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:08,969][134211] Avg episode reward: [(0, '8.938')] [2025-01-04 07:12:10,774][134294] Updated weights for policy 0, policy_version 132784 (0.0029) [2025-01-04 07:12:13,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14814.2, 300 sec: 14342.9). Total num frames: 543916032. Throughput: 0: 3548.4. Samples: 125150840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:13,968][134211] Avg episode reward: [(0, '8.646')] [2025-01-04 07:12:14,478][134294] Updated weights for policy 0, policy_version 132794 (0.0026) [2025-01-04 07:12:17,758][134294] Updated weights for policy 0, policy_version 132804 (0.0023) [2025-01-04 07:12:18,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14199.4, 300 sec: 14176.3). Total num frames: 543977472. Throughput: 0: 3524.7. Samples: 125159682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:18,968][134211] Avg episode reward: [(0, '8.161')] [2025-01-04 07:12:21,310][134294] Updated weights for policy 0, policy_version 132814 (0.0027) [2025-01-04 07:12:23,967][134211] Fps is (10 sec: 12697.9, 60 sec: 13789.9, 300 sec: 14120.8). Total num frames: 544043008. Throughput: 0: 3474.4. Samples: 125177088. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:23,968][134211] Avg episode reward: [(0, '8.955')] [2025-01-04 07:12:24,089][134294] Updated weights for policy 0, policy_version 132824 (0.0021) [2025-01-04 07:12:26,116][134294] Updated weights for policy 0, policy_version 132834 (0.0013) [2025-01-04 07:12:28,031][134294] Updated weights for policy 0, policy_version 132844 (0.0014) [2025-01-04 07:12:28,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14472.6, 300 sec: 14259.7). Total num frames: 544149504. Throughput: 0: 3708.7. Samples: 125206422. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:12:28,968][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 07:12:29,908][134294] Updated weights for policy 0, policy_version 132854 (0.0013) [2025-01-04 07:12:31,882][134294] Updated weights for policy 0, policy_version 132864 (0.0014) [2025-01-04 07:12:33,968][134211] Fps is (10 sec: 19660.1, 60 sec: 14882.6, 300 sec: 14342.9). Total num frames: 544239616. Throughput: 0: 3853.6. Samples: 125222186. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:33,968][134211] Avg episode reward: [(0, '8.841')] [2025-01-04 07:12:34,639][134294] Updated weights for policy 0, policy_version 132874 (0.0025) [2025-01-04 07:12:37,816][134294] Updated weights for policy 0, policy_version 132884 (0.0026) [2025-01-04 07:12:38,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14950.3, 300 sec: 14370.7). Total num frames: 544305152. Throughput: 0: 3919.0. Samples: 125244422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:38,969][134211] Avg episode reward: [(0, '8.533')] [2025-01-04 07:12:41,056][134294] Updated weights for policy 0, policy_version 132894 (0.0027) [2025-01-04 07:12:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14882.1, 300 sec: 14370.7). Total num frames: 544366592. Throughput: 0: 3751.6. Samples: 125262656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:43,969][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 07:12:44,579][134294] Updated weights for policy 0, policy_version 132904 (0.0027) [2025-01-04 07:12:47,906][134294] Updated weights for policy 0, policy_version 132914 (0.0025) [2025-01-04 07:12:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 14342.9). Total num frames: 544428032. Throughput: 0: 3602.5. Samples: 125271796. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:48,968][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 07:12:51,018][134294] Updated weights for policy 0, policy_version 132924 (0.0027) [2025-01-04 07:12:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.2, 300 sec: 14342.9). Total num frames: 544493568. Throughput: 0: 3504.1. Samples: 125291208. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:53,968][134211] Avg episode reward: [(0, '8.627')] [2025-01-04 07:12:54,152][134294] Updated weights for policy 0, policy_version 132934 (0.0027) [2025-01-04 07:12:57,157][134294] Updated weights for policy 0, policy_version 132944 (0.0024) [2025-01-04 07:12:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.2, 300 sec: 14329.1). Total num frames: 544559104. Throughput: 0: 3560.1. Samples: 125311046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:12:58,969][134211] Avg episode reward: [(0, '8.083')] [2025-01-04 07:13:00,261][134294] Updated weights for policy 0, policy_version 132954 (0.0028) [2025-01-04 07:13:03,241][134294] Updated weights for policy 0, policy_version 132964 (0.0024) [2025-01-04 07:13:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.9, 300 sec: 14329.1). Total num frames: 544628736. Throughput: 0: 3592.7. Samples: 125321352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:03,968][134211] Avg episode reward: [(0, '8.183')] [2025-01-04 07:13:06,279][134294] Updated weights for policy 0, policy_version 132974 (0.0025) [2025-01-04 07:13:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 544694272. Throughput: 0: 3661.1. Samples: 125341838. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:08,969][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 07:13:09,276][134294] Updated weights for policy 0, policy_version 132984 (0.0028) [2025-01-04 07:13:12,736][134294] Updated weights for policy 0, policy_version 132994 (0.0021) [2025-01-04 07:13:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.7, 300 sec: 14190.2). Total num frames: 544755712. Throughput: 0: 3419.0. Samples: 125360278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:13,968][134211] Avg episode reward: [(0, '8.852')] [2025-01-04 07:13:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132997_544755712.pth... [2025-01-04 07:13:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132165_541347840.pth [2025-01-04 07:13:16,092][134294] Updated weights for policy 0, policy_version 133004 (0.0024) [2025-01-04 07:13:18,188][134294] Updated weights for policy 0, policy_version 133014 (0.0015) [2025-01-04 07:13:18,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14336.0, 300 sec: 14190.2). Total num frames: 544837632. Throughput: 0: 3281.5. Samples: 125369854. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:18,968][134211] Avg episode reward: [(0, '8.530')] [2025-01-04 07:13:20,306][134294] Updated weights for policy 0, policy_version 133024 (0.0014) [2025-01-04 07:13:22,227][134294] Updated weights for policy 0, policy_version 133034 (0.0014) [2025-01-04 07:13:23,967][134211] Fps is (10 sec: 18432.4, 60 sec: 14950.4, 300 sec: 14315.2). Total num frames: 544940032. Throughput: 0: 3454.2. Samples: 125399860. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:23,968][134211] Avg episode reward: [(0, '9.024')] [2025-01-04 07:13:24,214][134294] Updated weights for policy 0, policy_version 133044 (0.0014) [2025-01-04 07:13:26,921][134294] Updated weights for policy 0, policy_version 133054 (0.0023) [2025-01-04 07:13:28,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 545009664. Throughput: 0: 3580.4. Samples: 125423776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:28,968][134211] Avg episode reward: [(0, '8.888')] [2025-01-04 07:13:30,493][134294] Updated weights for policy 0, policy_version 133064 (0.0030) [2025-01-04 07:13:33,774][134294] Updated weights for policy 0, policy_version 133074 (0.0027) [2025-01-04 07:13:33,971][134211] Fps is (10 sec: 13102.8, 60 sec: 13857.4, 300 sec: 14328.9). Total num frames: 545071104. Throughput: 0: 3573.7. Samples: 125432624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:13:33,972][134211] Avg episode reward: [(0, '7.696')] [2025-01-04 07:13:36,744][134294] Updated weights for policy 0, policy_version 133084 (0.0026) [2025-01-04 07:13:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.2, 300 sec: 14356.8). Total num frames: 545136640. Throughput: 0: 3586.2. Samples: 125452588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:13:38,968][134211] Avg episode reward: [(0, '8.420')] [2025-01-04 07:13:39,952][134294] Updated weights for policy 0, policy_version 133094 (0.0026) [2025-01-04 07:13:43,287][134294] Updated weights for policy 0, policy_version 133104 (0.0026) [2025-01-04 07:13:43,968][134211] Fps is (10 sec: 13111.4, 60 sec: 13926.4, 300 sec: 14356.8). Total num frames: 545202176. Throughput: 0: 3563.8. Samples: 125471418. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:13:43,968][134211] Avg episode reward: [(0, '8.763')] [2025-01-04 07:13:46,568][134294] Updated weights for policy 0, policy_version 133114 (0.0028) [2025-01-04 07:13:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 14287.4). Total num frames: 545263616. Throughput: 0: 3544.3. Samples: 125480844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:13:48,968][134211] Avg episode reward: [(0, '9.022')] [2025-01-04 07:13:49,893][134294] Updated weights for policy 0, policy_version 133124 (0.0030) [2025-01-04 07:13:52,996][134294] Updated weights for policy 0, policy_version 133134 (0.0024) [2025-01-04 07:13:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13926.4, 300 sec: 14287.4). Total num frames: 545329152. Throughput: 0: 3512.2. Samples: 125499886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:13:53,969][134211] Avg episode reward: [(0, '7.922')] [2025-01-04 07:13:55,949][134294] Updated weights for policy 0, policy_version 133144 (0.0025) [2025-01-04 07:13:58,838][134294] Updated weights for policy 0, policy_version 133154 (0.0024) [2025-01-04 07:13:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13994.6, 300 sec: 14287.4). Total num frames: 545398784. Throughput: 0: 3564.0. Samples: 125520658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:13:58,969][134211] Avg episode reward: [(0, '9.555')] [2025-01-04 07:14:01,923][134294] Updated weights for policy 0, policy_version 133164 (0.0025) [2025-01-04 07:14:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.6, 300 sec: 14287.4). Total num frames: 545468416. Throughput: 0: 3577.1. Samples: 125530824. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:03,968][134211] Avg episode reward: [(0, '8.044')] [2025-01-04 07:14:04,834][134294] Updated weights for policy 0, policy_version 133174 (0.0025) [2025-01-04 07:14:07,232][134294] Updated weights for policy 0, policy_version 133184 (0.0017) [2025-01-04 07:14:08,968][134211] Fps is (10 sec: 15975.1, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 545558528. Throughput: 0: 3413.8. Samples: 125553482. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:08,968][134211] Avg episode reward: [(0, '8.000')] [2025-01-04 07:14:09,136][134294] Updated weights for policy 0, policy_version 133194 (0.0014) [2025-01-04 07:14:11,041][134294] Updated weights for policy 0, policy_version 133204 (0.0016) [2025-01-04 07:14:13,476][134294] Updated weights for policy 0, policy_version 133214 (0.0018) [2025-01-04 07:14:13,968][134211] Fps is (10 sec: 18022.3, 60 sec: 14882.1, 300 sec: 14301.3). Total num frames: 545648640. Throughput: 0: 3551.5. Samples: 125583594. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:13,968][134211] Avg episode reward: [(0, '7.651')] [2025-01-04 07:14:17,364][134294] Updated weights for policy 0, policy_version 133224 (0.0029) [2025-01-04 07:14:18,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14404.2, 300 sec: 14259.6). Total num frames: 545701888. Throughput: 0: 3538.3. Samples: 125591836. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:18,968][134211] Avg episode reward: [(0, '8.407')] [2025-01-04 07:14:21,109][134294] Updated weights for policy 0, policy_version 133234 (0.0027) [2025-01-04 07:14:23,968][134211] Fps is (10 sec: 10649.6, 60 sec: 13585.0, 300 sec: 14245.7). Total num frames: 545755136. Throughput: 0: 3449.1. Samples: 125607796. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:23,968][134211] Avg episode reward: [(0, '7.850')] [2025-01-04 07:14:24,823][134294] Updated weights for policy 0, policy_version 133244 (0.0029) [2025-01-04 07:14:27,044][134294] Updated weights for policy 0, policy_version 133254 (0.0015) [2025-01-04 07:14:28,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.4, 300 sec: 14315.2). Total num frames: 545845248. Throughput: 0: 3545.7. Samples: 125630976. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:28,968][134211] Avg episode reward: [(0, '8.368')] [2025-01-04 07:14:29,032][134294] Updated weights for policy 0, policy_version 133264 (0.0014) [2025-01-04 07:14:32,008][134294] Updated weights for policy 0, policy_version 133274 (0.0024) [2025-01-04 07:14:33,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13995.4, 300 sec: 14329.1). Total num frames: 545910784. Throughput: 0: 3611.1. Samples: 125643344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:14:33,968][134211] Avg episode reward: [(0, '7.751')] [2025-01-04 07:14:35,274][134294] Updated weights for policy 0, policy_version 133284 (0.0025) [2025-01-04 07:14:38,343][134294] Updated weights for policy 0, policy_version 133294 (0.0025) [2025-01-04 07:14:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13994.7, 300 sec: 14329.1). Total num frames: 545976320. Throughput: 0: 3618.1. Samples: 125662702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:14:38,968][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 07:14:41,499][134294] Updated weights for policy 0, policy_version 133304 (0.0026) [2025-01-04 07:14:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13994.6, 300 sec: 14315.2). Total num frames: 546041856. Throughput: 0: 3572.6. Samples: 125681426. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:14:43,968][134211] Avg episode reward: [(0, '8.983')] [2025-01-04 07:14:44,855][134294] Updated weights for policy 0, policy_version 133314 (0.0026) [2025-01-04 07:14:48,024][134294] Updated weights for policy 0, policy_version 133324 (0.0025) [2025-01-04 07:14:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.7, 300 sec: 14204.1). Total num frames: 546103296. Throughput: 0: 3560.4. Samples: 125691042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:14:48,968][134211] Avg episode reward: [(0, '7.676')] [2025-01-04 07:14:51,264][134294] Updated weights for policy 0, policy_version 133334 (0.0026) [2025-01-04 07:14:53,806][134294] Updated weights for policy 0, policy_version 133344 (0.0018) [2025-01-04 07:14:53,967][134211] Fps is (10 sec: 13517.1, 60 sec: 14131.3, 300 sec: 14231.9). Total num frames: 546177024. Throughput: 0: 3483.6. Samples: 125710246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:14:53,968][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 07:14:55,703][134294] Updated weights for policy 0, policy_version 133354 (0.0013) [2025-01-04 07:14:57,590][134294] Updated weights for policy 0, policy_version 133364 (0.0014) [2025-01-04 07:14:58,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14814.0, 300 sec: 14412.4). Total num frames: 546287616. Throughput: 0: 3501.1. Samples: 125741144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:14:58,968][134211] Avg episode reward: [(0, '8.261')] [2025-01-04 07:14:59,499][134294] Updated weights for policy 0, policy_version 133374 (0.0013) [2025-01-04 07:15:01,814][134294] Updated weights for policy 0, policy_version 133384 (0.0019) [2025-01-04 07:15:03,968][134211] Fps is (10 sec: 18431.4, 60 sec: 14882.1, 300 sec: 14356.8). Total num frames: 546361344. Throughput: 0: 3649.9. Samples: 125756080. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:03,969][134211] Avg episode reward: [(0, '8.934')] [2025-01-04 07:15:05,371][134294] Updated weights for policy 0, policy_version 133394 (0.0030) [2025-01-04 07:15:08,519][134294] Updated weights for policy 0, policy_version 133404 (0.0027) [2025-01-04 07:15:08,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14273.5). Total num frames: 546426880. Throughput: 0: 3706.8. Samples: 125774602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:08,968][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 07:15:11,671][134294] Updated weights for policy 0, policy_version 133414 (0.0025) [2025-01-04 07:15:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.6, 300 sec: 14287.4). Total num frames: 546488320. Throughput: 0: 3612.3. Samples: 125793532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:13,969][134211] Avg episode reward: [(0, '7.876')] [2025-01-04 07:15:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000133420_546488320.pth... [2025-01-04 07:15:14,070][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132575_543027200.pth [2025-01-04 07:15:15,107][134294] Updated weights for policy 0, policy_version 133424 (0.0029) [2025-01-04 07:15:18,319][134294] Updated weights for policy 0, policy_version 133434 (0.0025) [2025-01-04 07:15:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14131.2, 300 sec: 14301.3). Total num frames: 546549760. Throughput: 0: 3545.5. Samples: 125802894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:18,969][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 07:15:21,594][134294] Updated weights for policy 0, policy_version 133444 (0.0026) [2025-01-04 07:15:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 546615296. 
Throughput: 0: 3537.0. Samples: 125821868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:23,968][134211] Avg episode reward: [(0, '8.936')] [2025-01-04 07:15:24,709][134294] Updated weights for policy 0, policy_version 133454 (0.0027) [2025-01-04 07:15:27,792][134294] Updated weights for policy 0, policy_version 133464 (0.0025) [2025-01-04 07:15:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 14204.1). Total num frames: 546680832. Throughput: 0: 3555.5. Samples: 125841424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:28,968][134211] Avg episode reward: [(0, '7.853')] [2025-01-04 07:15:30,779][134294] Updated weights for policy 0, policy_version 133474 (0.0025) [2025-01-04 07:15:33,692][134294] Updated weights for policy 0, policy_version 133484 (0.0024) [2025-01-04 07:15:33,969][134211] Fps is (10 sec: 13515.5, 60 sec: 13994.4, 300 sec: 14217.9). Total num frames: 546750464. Throughput: 0: 3574.7. Samples: 125851906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:33,969][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 07:15:36,643][134294] Updated weights for policy 0, policy_version 133494 (0.0022) [2025-01-04 07:15:38,972][134211] Fps is (10 sec: 13920.8, 60 sec: 14062.0, 300 sec: 14245.6). Total num frames: 546820096. Throughput: 0: 3611.7. Samples: 125872786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:38,973][134211] Avg episode reward: [(0, '8.223')] [2025-01-04 07:15:39,880][134294] Updated weights for policy 0, policy_version 133504 (0.0025) [2025-01-04 07:15:43,147][134294] Updated weights for policy 0, policy_version 133514 (0.0024) [2025-01-04 07:15:43,967][134211] Fps is (10 sec: 13518.4, 60 sec: 14063.0, 300 sec: 14259.7). Total num frames: 546885632. Throughput: 0: 3343.3. Samples: 125891592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:43,968][134211] Avg episode reward: [(0, '9.305')] [2025-01-04 07:15:45,232][134294] Updated weights for policy 0, policy_version 133524 (0.0014) [2025-01-04 07:15:47,206][134294] Updated weights for policy 0, policy_version 133534 (0.0015) [2025-01-04 07:15:48,967][134211] Fps is (10 sec: 16800.7, 60 sec: 14745.7, 300 sec: 14384.6). Total num frames: 546988032. Throughput: 0: 3349.0. Samples: 125906782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:48,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 07:15:49,165][134294] Updated weights for policy 0, policy_version 133544 (0.0010) [2025-01-04 07:15:51,205][134294] Updated weights for policy 0, policy_version 133554 (0.0014) [2025-01-04 07:15:53,145][134294] Updated weights for policy 0, policy_version 133564 (0.0013) [2025-01-04 07:15:53,968][134211] Fps is (10 sec: 20888.8, 60 sec: 15291.6, 300 sec: 14495.7). Total num frames: 547094528. Throughput: 0: 3624.3. Samples: 125937696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:53,969][134211] Avg episode reward: [(0, '9.200')] [2025-01-04 07:15:55,589][134294] Updated weights for policy 0, policy_version 133574 (0.0023) [2025-01-04 07:15:58,785][134294] Updated weights for policy 0, policy_version 133584 (0.0027) [2025-01-04 07:15:58,968][134211] Fps is (10 sec: 17202.6, 60 sec: 14540.7, 300 sec: 14481.8). Total num frames: 547160064. Throughput: 0: 3728.6. Samples: 125961318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:15:58,969][134211] Avg episode reward: [(0, '8.256')] [2025-01-04 07:16:02,000][134294] Updated weights for policy 0, policy_version 133594 (0.0024) [2025-01-04 07:16:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14404.3, 300 sec: 14440.1). Total num frames: 547225600. Throughput: 0: 3730.9. Samples: 125970786. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:03,968][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 07:16:05,067][134294] Updated weights for policy 0, policy_version 133604 (0.0029) [2025-01-04 07:16:08,064][134294] Updated weights for policy 0, policy_version 133614 (0.0028) [2025-01-04 07:16:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.2, 300 sec: 14454.1). Total num frames: 547291136. Throughput: 0: 3756.4. Samples: 125990906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:08,969][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 07:16:11,541][134294] Updated weights for policy 0, policy_version 133624 (0.0028) [2025-01-04 07:16:13,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 547348480. Throughput: 0: 3716.4. Samples: 126008662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:13,968][134211] Avg episode reward: [(0, '8.762')] [2025-01-04 07:16:15,104][134294] Updated weights for policy 0, policy_version 133634 (0.0028) [2025-01-04 07:16:18,502][134294] Updated weights for policy 0, policy_version 133644 (0.0025) [2025-01-04 07:16:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 547409920. Throughput: 0: 3681.0. Samples: 126017548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:18,968][134211] Avg episode reward: [(0, '8.169')] [2025-01-04 07:16:21,904][134294] Updated weights for policy 0, policy_version 133654 (0.0022) [2025-01-04 07:16:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14199.4, 300 sec: 14190.2). Total num frames: 547467264. Throughput: 0: 3617.0. Samples: 126035538. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:23,968][134211] Avg episode reward: [(0, '9.297')] [2025-01-04 07:16:25,510][134294] Updated weights for policy 0, policy_version 133664 (0.0025) [2025-01-04 07:16:28,952][134294] Updated weights for policy 0, policy_version 133674 (0.0028) [2025-01-04 07:16:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 14176.4). Total num frames: 547528704. Throughput: 0: 3591.7. Samples: 126053220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:28,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 07:16:31,757][134294] Updated weights for policy 0, policy_version 133684 (0.0017) [2025-01-04 07:16:33,602][134294] Updated weights for policy 0, policy_version 133694 (0.0013) [2025-01-04 07:16:33,967][134211] Fps is (10 sec: 14746.1, 60 sec: 14404.6, 300 sec: 14259.6). Total num frames: 547614720. Throughput: 0: 3467.5. Samples: 126062820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:33,968][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 07:16:35,505][134294] Updated weights for policy 0, policy_version 133704 (0.0014) [2025-01-04 07:16:37,367][134294] Updated weights for policy 0, policy_version 133714 (0.0012) [2025-01-04 07:16:38,968][134211] Fps is (10 sec: 19661.2, 60 sec: 15088.0, 300 sec: 14412.4). Total num frames: 547725312. Throughput: 0: 3503.2. Samples: 126095340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:38,968][134211] Avg episode reward: [(0, '8.215')] [2025-01-04 07:16:39,252][134294] Updated weights for policy 0, policy_version 133724 (0.0012) [2025-01-04 07:16:41,333][134294] Updated weights for policy 0, policy_version 133734 (0.0015) [2025-01-04 07:16:43,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15291.7, 300 sec: 14467.9). Total num frames: 547803136. Throughput: 0: 3587.4. Samples: 126122750. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:16:43,968][134211] Avg episode reward: [(0, '8.382')] [2025-01-04 07:16:44,626][134294] Updated weights for policy 0, policy_version 133744 (0.0031) [2025-01-04 07:16:48,241][134294] Updated weights for policy 0, policy_version 133754 (0.0025) [2025-01-04 07:16:48,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14540.7, 300 sec: 14440.1). Total num frames: 547860480. Throughput: 0: 3562.8. Samples: 126131112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:16:48,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 07:16:51,787][134294] Updated weights for policy 0, policy_version 133764 (0.0030) [2025-01-04 07:16:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.2, 300 sec: 14342.9). Total num frames: 547926016. Throughput: 0: 3508.7. Samples: 126148796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:16:53,968][134211] Avg episode reward: [(0, '8.516')] [2025-01-04 07:16:54,905][134294] Updated weights for policy 0, policy_version 133774 (0.0025) [2025-01-04 07:16:58,194][134294] Updated weights for policy 0, policy_version 133784 (0.0027) [2025-01-04 07:16:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 547987456. Throughput: 0: 3545.3. Samples: 126168202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:16:58,968][134211] Avg episode reward: [(0, '9.713')] [2025-01-04 07:17:01,385][134294] Updated weights for policy 0, policy_version 133794 (0.0026) [2025-01-04 07:17:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 14204.1). Total num frames: 548048896. Throughput: 0: 3554.3. Samples: 126177490. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:03,968][134211] Avg episode reward: [(0, '8.079')] [2025-01-04 07:17:04,927][134294] Updated weights for policy 0, policy_version 133804 (0.0028) [2025-01-04 07:17:07,997][134294] Updated weights for policy 0, policy_version 133814 (0.0020) [2025-01-04 07:17:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13858.1, 300 sec: 14259.6). Total num frames: 548122624. Throughput: 0: 3554.3. Samples: 126195480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:08,968][134211] Avg episode reward: [(0, '8.279')] [2025-01-04 07:17:10,000][134294] Updated weights for policy 0, policy_version 133824 (0.0014) [2025-01-04 07:17:13,155][134294] Updated weights for policy 0, policy_version 133834 (0.0022) [2025-01-04 07:17:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14063.0, 300 sec: 14287.4). Total num frames: 548192256. Throughput: 0: 3688.4. Samples: 126219196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:13,968][134211] Avg episode reward: [(0, '8.831')] [2025-01-04 07:17:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000133836_548192256.pth... [2025-01-04 07:17:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000132997_544755712.pth [2025-01-04 07:17:16,774][134294] Updated weights for policy 0, policy_version 133844 (0.0030) [2025-01-04 07:17:18,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 548249600. Throughput: 0: 3664.4. Samples: 126227716. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:18,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 07:17:19,662][134294] Updated weights for policy 0, policy_version 133854 (0.0021) [2025-01-04 07:17:21,701][134294] Updated weights for policy 0, policy_version 133864 (0.0013) [2025-01-04 07:17:23,655][134294] Updated weights for policy 0, policy_version 133874 (0.0014) [2025-01-04 07:17:23,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14745.7, 300 sec: 14245.8). Total num frames: 548352000. Throughput: 0: 3489.6. Samples: 126252372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:23,968][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 07:17:25,760][134294] Updated weights for policy 0, policy_version 133884 (0.0014) [2025-01-04 07:17:27,781][134294] Updated weights for policy 0, policy_version 133894 (0.0013) [2025-01-04 07:17:28,967][134211] Fps is (10 sec: 20480.1, 60 sec: 15428.3, 300 sec: 14287.4). Total num frames: 548454400. Throughput: 0: 3555.7. Samples: 126282758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:28,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 07:17:29,724][134294] Updated weights for policy 0, policy_version 133904 (0.0013) [2025-01-04 07:17:31,840][134294] Updated weights for policy 0, policy_version 133914 (0.0018) [2025-01-04 07:17:33,968][134211] Fps is (10 sec: 18431.3, 60 sec: 15359.9, 300 sec: 14342.9). Total num frames: 548536320. Throughput: 0: 3718.5. Samples: 126298444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:33,969][134211] Avg episode reward: [(0, '7.991')] [2025-01-04 07:17:35,175][134294] Updated weights for policy 0, policy_version 133924 (0.0029) [2025-01-04 07:17:38,464][134294] Updated weights for policy 0, policy_version 133934 (0.0029) [2025-01-04 07:17:38,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14540.7, 300 sec: 14342.9). Total num frames: 548597760. Throughput: 0: 3748.0. Samples: 126317458. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:38,968][134211] Avg episode reward: [(0, '8.569')] [2025-01-04 07:17:41,625][134294] Updated weights for policy 0, policy_version 133944 (0.0025) [2025-01-04 07:17:43,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14267.7, 300 sec: 14343.0). Total num frames: 548659200. Throughput: 0: 3738.0. Samples: 126336410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:17:43,968][134211] Avg episode reward: [(0, '8.934')] [2025-01-04 07:17:45,020][134294] Updated weights for policy 0, policy_version 133954 (0.0027) [2025-01-04 07:17:48,153][134294] Updated weights for policy 0, policy_version 133964 (0.0023) [2025-01-04 07:17:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.3, 300 sec: 14342.9). Total num frames: 548724736. Throughput: 0: 3744.1. Samples: 126345976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:17:48,968][134211] Avg episode reward: [(0, '8.253')] [2025-01-04 07:17:51,548][134294] Updated weights for policy 0, policy_version 133974 (0.0024) [2025-01-04 07:17:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 548786176. Throughput: 0: 3750.1. Samples: 126364236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:17:53,968][134211] Avg episode reward: [(0, '8.448')] [2025-01-04 07:17:54,818][134294] Updated weights for policy 0, policy_version 133984 (0.0025) [2025-01-04 07:17:57,947][134294] Updated weights for policy 0, policy_version 133994 (0.0025) [2025-01-04 07:17:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 14315.2). Total num frames: 548851712. Throughput: 0: 3663.6. Samples: 126384058. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:17:58,968][134211] Avg episode reward: [(0, '8.439')] [2025-01-04 07:18:00,807][134294] Updated weights for policy 0, policy_version 134004 (0.0029) [2025-01-04 07:18:03,770][134294] Updated weights for policy 0, policy_version 134014 (0.0027) [2025-01-04 07:18:03,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14540.7, 300 sec: 14329.0). Total num frames: 548921344. Throughput: 0: 3707.3. Samples: 126394546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:03,969][134211] Avg episode reward: [(0, '7.515')] [2025-01-04 07:18:06,756][134294] Updated weights for policy 0, policy_version 134024 (0.0026) [2025-01-04 07:18:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.3, 300 sec: 14342.9). Total num frames: 548986880. Throughput: 0: 3618.9. Samples: 126415224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:08,968][134211] Avg episode reward: [(0, '8.184')] [2025-01-04 07:18:09,944][134294] Updated weights for policy 0, policy_version 134034 (0.0027) [2025-01-04 07:18:12,793][134294] Updated weights for policy 0, policy_version 134044 (0.0023) [2025-01-04 07:18:13,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14404.3, 300 sec: 14301.3). Total num frames: 549056512. Throughput: 0: 3387.2. Samples: 126435184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:13,968][134211] Avg episode reward: [(0, '8.018')] [2025-01-04 07:18:16,178][134294] Updated weights for policy 0, policy_version 134054 (0.0025) [2025-01-04 07:18:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.2, 300 sec: 14148.5). Total num frames: 549113856. Throughput: 0: 3239.8. Samples: 126444236. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:18,968][134211] Avg episode reward: [(0, '8.920')] [2025-01-04 07:18:19,629][134294] Updated weights for policy 0, policy_version 134064 (0.0028) [2025-01-04 07:18:23,224][134294] Updated weights for policy 0, policy_version 134074 (0.0028) [2025-01-04 07:18:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13721.5, 300 sec: 14120.8). Total num frames: 549175296. Throughput: 0: 3213.8. Samples: 126462080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:23,969][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 07:18:25,690][134294] Updated weights for policy 0, policy_version 134084 (0.0014) [2025-01-04 07:18:27,718][134294] Updated weights for policy 0, policy_version 134094 (0.0014) [2025-01-04 07:18:28,968][134211] Fps is (10 sec: 15974.8, 60 sec: 13653.3, 300 sec: 14245.9). Total num frames: 549273600. Throughput: 0: 3366.5. Samples: 126487902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:28,968][134211] Avg episode reward: [(0, '9.343')] [2025-01-04 07:18:29,631][134294] Updated weights for policy 0, policy_version 134104 (0.0013) [2025-01-04 07:18:31,609][134294] Updated weights for policy 0, policy_version 134114 (0.0013) [2025-01-04 07:18:33,483][134294] Updated weights for policy 0, policy_version 134124 (0.0014) [2025-01-04 07:18:33,968][134211] Fps is (10 sec: 20479.2, 60 sec: 14062.9, 300 sec: 14384.6). Total num frames: 549380096. Throughput: 0: 3509.7. Samples: 126503916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:33,969][134211] Avg episode reward: [(0, '7.655')] [2025-01-04 07:18:36,316][134294] Updated weights for policy 0, policy_version 134134 (0.0024) [2025-01-04 07:18:38,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14131.2, 300 sec: 14384.6). Total num frames: 549445632. Throughput: 0: 3665.4. Samples: 126529178. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:38,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 07:18:39,604][134294] Updated weights for policy 0, policy_version 134144 (0.0028) [2025-01-04 07:18:42,761][134294] Updated weights for policy 0, policy_version 134154 (0.0028) [2025-01-04 07:18:43,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14131.2, 300 sec: 14384.6). Total num frames: 549507072. Throughput: 0: 3643.7. Samples: 126548024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:43,968][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 07:18:46,165][134294] Updated weights for policy 0, policy_version 134164 (0.0027) [2025-01-04 07:18:48,968][134211] Fps is (10 sec: 11878.7, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 549564416. Throughput: 0: 3615.4. Samples: 126557238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:18:48,968][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 07:18:49,571][134294] Updated weights for policy 0, policy_version 134174 (0.0025) [2025-01-04 07:18:52,733][134294] Updated weights for policy 0, policy_version 134184 (0.0028) [2025-01-04 07:18:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14063.0, 300 sec: 14343.0). Total num frames: 549629952. Throughput: 0: 3565.6. Samples: 126575676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:18:53,968][134211] Avg episode reward: [(0, '7.692')] [2025-01-04 07:18:55,809][134294] Updated weights for policy 0, policy_version 134194 (0.0026) [2025-01-04 07:18:58,799][134294] Updated weights for policy 0, policy_version 134204 (0.0025) [2025-01-04 07:18:58,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14131.1, 300 sec: 14342.9). Total num frames: 549699584. Throughput: 0: 3577.7. Samples: 126596184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:18:58,969][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 07:19:01,898][134294] Updated weights for policy 0, policy_version 134214 (0.0024) [2025-01-04 07:19:03,971][134211] Fps is (10 sec: 13512.5, 60 sec: 14062.3, 300 sec: 14259.5). Total num frames: 549765120. Throughput: 0: 3590.1. Samples: 126605800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:03,971][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 07:19:05,117][134294] Updated weights for policy 0, policy_version 134224 (0.0025) [2025-01-04 07:19:08,053][134294] Updated weights for policy 0, policy_version 134234 (0.0027) [2025-01-04 07:19:08,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14131.2, 300 sec: 14190.2). Total num frames: 549834752. Throughput: 0: 3637.2. Samples: 126625752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:08,968][134211] Avg episode reward: [(0, '8.388')] [2025-01-04 07:19:11,128][134294] Updated weights for policy 0, policy_version 134244 (0.0027) [2025-01-04 07:19:13,968][134211] Fps is (10 sec: 13111.3, 60 sec: 13994.7, 300 sec: 14218.0). Total num frames: 549896192. Throughput: 0: 3499.1. Samples: 126645360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:13,968][134211] Avg episode reward: [(0, '8.534')] [2025-01-04 07:19:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000134252_549896192.pth... [2025-01-04 07:19:14,075][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000133420_546488320.pth [2025-01-04 07:19:14,493][134294] Updated weights for policy 0, policy_version 134254 (0.0028) [2025-01-04 07:19:17,771][134294] Updated weights for policy 0, policy_version 134264 (0.0028) [2025-01-04 07:19:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14245.8). Total num frames: 549957632. 
Throughput: 0: 3344.8. Samples: 126654432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:18,968][134211] Avg episode reward: [(0, '8.467')] [2025-01-04 07:19:20,385][134294] Updated weights for policy 0, policy_version 134274 (0.0021) [2025-01-04 07:19:22,381][134294] Updated weights for policy 0, policy_version 134284 (0.0014) [2025-01-04 07:19:23,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.3, 300 sec: 14273.5). Total num frames: 550055936. Throughput: 0: 3328.4. Samples: 126678958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:23,968][134211] Avg episode reward: [(0, '8.814')] [2025-01-04 07:19:24,573][134294] Updated weights for policy 0, policy_version 134294 (0.0021) [2025-01-04 07:19:27,638][134294] Updated weights for policy 0, policy_version 134304 (0.0027) [2025-01-04 07:19:28,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14199.4, 300 sec: 14287.4). Total num frames: 550125568. Throughput: 0: 3427.0. Samples: 126702240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:28,968][134211] Avg episode reward: [(0, '8.296')] [2025-01-04 07:19:30,694][134294] Updated weights for policy 0, policy_version 134314 (0.0024) [2025-01-04 07:19:33,705][134294] Updated weights for policy 0, policy_version 134324 (0.0024) [2025-01-04 07:19:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.9, 300 sec: 14287.4). Total num frames: 550191104. Throughput: 0: 3449.0. Samples: 126712446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:33,969][134211] Avg episode reward: [(0, '8.154')] [2025-01-04 07:19:36,713][134294] Updated weights for policy 0, policy_version 134334 (0.0027) [2025-01-04 07:19:38,970][134211] Fps is (10 sec: 13513.1, 60 sec: 13584.5, 300 sec: 14301.2). Total num frames: 550260736. Throughput: 0: 3493.9. Samples: 126732910. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:38,971][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 07:19:39,801][134294] Updated weights for policy 0, policy_version 134344 (0.0025) [2025-01-04 07:19:43,170][134294] Updated weights for policy 0, policy_version 134354 (0.0028) [2025-01-04 07:19:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13585.0, 300 sec: 14301.3). Total num frames: 550322176. Throughput: 0: 3457.1. Samples: 126751754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:43,968][134211] Avg episode reward: [(0, '8.829')] [2025-01-04 07:19:45,889][134294] Updated weights for policy 0, policy_version 134364 (0.0019) [2025-01-04 07:19:47,853][134294] Updated weights for policy 0, policy_version 134374 (0.0014) [2025-01-04 07:19:48,968][134211] Fps is (10 sec: 15569.2, 60 sec: 14199.5, 300 sec: 14370.7). Total num frames: 550416384. Throughput: 0: 3506.2. Samples: 126763568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 07:19:48,968][134211] Avg episode reward: [(0, '9.300')] [2025-01-04 07:19:49,856][134294] Updated weights for policy 0, policy_version 134384 (0.0014) [2025-01-04 07:19:51,910][134294] Updated weights for policy 0, policy_version 134394 (0.0014) [2025-01-04 07:19:53,907][134294] Updated weights for policy 0, policy_version 134404 (0.0016) [2025-01-04 07:19:53,968][134211] Fps is (10 sec: 19661.2, 60 sec: 14813.9, 300 sec: 14342.9). Total num frames: 550518784. Throughput: 0: 3739.6. Samples: 126794036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:19:53,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 07:19:55,847][134294] Updated weights for policy 0, policy_version 134414 (0.0014) [2025-01-04 07:19:58,834][134294] Updated weights for policy 0, policy_version 134424 (0.0026) [2025-01-04 07:19:58,970][134211] Fps is (10 sec: 18427.0, 60 sec: 15018.1, 300 sec: 14370.6). Total num frames: 550600704. Throughput: 0: 3910.3. Samples: 126821332. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:19:58,971][134211] Avg episode reward: [(0, '8.993')] [2025-01-04 07:20:02,093][134294] Updated weights for policy 0, policy_version 134434 (0.0032) [2025-01-04 07:20:03,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14951.2, 300 sec: 14356.8). Total num frames: 550662144. Throughput: 0: 3915.2. Samples: 126830616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:03,968][134211] Avg episode reward: [(0, '9.499')] [2025-01-04 07:20:05,374][134294] Updated weights for policy 0, policy_version 134444 (0.0025) [2025-01-04 07:20:08,339][134294] Updated weights for policy 0, policy_version 134454 (0.0026) [2025-01-04 07:20:08,968][134211] Fps is (10 sec: 12700.8, 60 sec: 14882.1, 300 sec: 14370.7). Total num frames: 550727680. Throughput: 0: 3804.3. Samples: 126850152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:08,968][134211] Avg episode reward: [(0, '9.397')] [2025-01-04 07:20:11,377][134294] Updated weights for policy 0, policy_version 134464 (0.0028) [2025-01-04 07:20:13,969][134211] Fps is (10 sec: 13105.4, 60 sec: 14950.1, 300 sec: 14384.5). Total num frames: 550793216. Throughput: 0: 3717.3. Samples: 126869524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:13,970][134211] Avg episode reward: [(0, '8.264')] [2025-01-04 07:20:14,996][134294] Updated weights for policy 0, policy_version 134474 (0.0027) [2025-01-04 07:20:18,233][134294] Updated weights for policy 0, policy_version 134484 (0.0027) [2025-01-04 07:20:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14950.4, 300 sec: 14370.7). Total num frames: 550854656. Throughput: 0: 3687.2. Samples: 126878370. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:18,968][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 07:20:21,502][134294] Updated weights for policy 0, policy_version 134494 (0.0023) [2025-01-04 07:20:23,968][134211] Fps is (10 sec: 11879.9, 60 sec: 14267.7, 300 sec: 14342.9). Total num frames: 550912000. Throughput: 0: 3643.8. Samples: 126896872. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:23,968][134211] Avg episode reward: [(0, '7.895')] [2025-01-04 07:20:25,322][134294] Updated weights for policy 0, policy_version 134504 (0.0033) [2025-01-04 07:20:28,763][134294] Updated weights for policy 0, policy_version 134514 (0.0025) [2025-01-04 07:20:28,968][134211] Fps is (10 sec: 11878.0, 60 sec: 14131.1, 300 sec: 14315.2). Total num frames: 550973440. Throughput: 0: 3606.3. Samples: 126914040. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:28,969][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 07:20:31,750][134294] Updated weights for policy 0, policy_version 134524 (0.0023) [2025-01-04 07:20:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14301.5). Total num frames: 551038976. Throughput: 0: 3554.5. Samples: 126923520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:33,968][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 07:20:34,849][134294] Updated weights for policy 0, policy_version 134534 (0.0026) [2025-01-04 07:20:37,894][134294] Updated weights for policy 0, policy_version 134544 (0.0028) [2025-01-04 07:20:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14063.5, 300 sec: 14301.3). Total num frames: 551104512. Throughput: 0: 3328.7. Samples: 126943826. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:38,968][134211] Avg episode reward: [(0, '7.660')] [2025-01-04 07:20:40,904][134294] Updated weights for policy 0, policy_version 134554 (0.0023) [2025-01-04 07:20:43,934][134294] Updated weights for policy 0, policy_version 134564 (0.0026) [2025-01-04 07:20:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 14190.2). Total num frames: 551174144. Throughput: 0: 3177.3. Samples: 126964304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:43,968][134211] Avg episode reward: [(0, '8.776')] [2025-01-04 07:20:46,762][134294] Updated weights for policy 0, policy_version 134574 (0.0022) [2025-01-04 07:20:48,883][134294] Updated weights for policy 0, policy_version 134584 (0.0014) [2025-01-04 07:20:48,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13994.7, 300 sec: 14106.9). Total num frames: 551256064. Throughput: 0: 3186.1. Samples: 126973990. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:48,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 07:20:50,970][134294] Updated weights for policy 0, policy_version 134594 (0.0013) [2025-01-04 07:20:52,963][134294] Updated weights for policy 0, policy_version 134604 (0.0013) [2025-01-04 07:20:53,968][134211] Fps is (10 sec: 18430.9, 60 sec: 13994.5, 300 sec: 14231.8). Total num frames: 551358464. Throughput: 0: 3409.7. Samples: 127003588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:20:53,969][134211] Avg episode reward: [(0, '7.976')] [2025-01-04 07:20:54,876][134294] Updated weights for policy 0, policy_version 134614 (0.0013) [2025-01-04 07:20:57,709][134294] Updated weights for policy 0, policy_version 134624 (0.0027) [2025-01-04 07:20:58,968][134211] Fps is (10 sec: 17612.6, 60 sec: 13858.7, 300 sec: 14259.6). Total num frames: 551432192. Throughput: 0: 3557.4. Samples: 127029602. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:20:58,968][134211] Avg episode reward: [(0, '9.425')] [2025-01-04 07:21:00,950][134294] Updated weights for policy 0, policy_version 134634 (0.0026) [2025-01-04 07:21:03,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13858.1, 300 sec: 14245.8). Total num frames: 551493632. Throughput: 0: 3575.9. Samples: 127039284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:03,968][134211] Avg episode reward: [(0, '7.991')] [2025-01-04 07:21:04,249][134294] Updated weights for policy 0, policy_version 134644 (0.0030) [2025-01-04 07:21:07,462][134294] Updated weights for policy 0, policy_version 134654 (0.0026) [2025-01-04 07:21:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 14273.5). Total num frames: 551559168. Throughput: 0: 3579.2. Samples: 127057934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:08,970][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 07:21:10,442][134294] Updated weights for policy 0, policy_version 134664 (0.0024) [2025-01-04 07:21:13,712][134294] Updated weights for policy 0, policy_version 134674 (0.0027) [2025-01-04 07:21:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13858.4, 300 sec: 14287.4). Total num frames: 551624704. Throughput: 0: 3638.0. Samples: 127077748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:13,969][134211] Avg episode reward: [(0, '8.830')] [2025-01-04 07:21:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000134674_551624704.pth... [2025-01-04 07:21:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000133836_548192256.pth [2025-01-04 07:21:17,005][134294] Updated weights for policy 0, policy_version 134684 (0.0029) [2025-01-04 07:21:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.1, 300 sec: 14301.3). Total num frames: 551686144. 
Throughput: 0: 3631.3. Samples: 127086926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:18,968][134211] Avg episode reward: [(0, '8.393')] [2025-01-04 07:21:20,325][134294] Updated weights for policy 0, policy_version 134694 (0.0028) [2025-01-04 07:21:23,412][134294] Updated weights for policy 0, policy_version 134704 (0.0026) [2025-01-04 07:21:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.6, 300 sec: 14315.2). Total num frames: 551751680. Throughput: 0: 3606.9. Samples: 127106136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:23,968][134211] Avg episode reward: [(0, '8.206')] [2025-01-04 07:21:26,370][134294] Updated weights for policy 0, policy_version 134714 (0.0025) [2025-01-04 07:21:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 551821312. Throughput: 0: 3596.9. Samples: 127126164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:28,969][134211] Avg episode reward: [(0, '8.583')] [2025-01-04 07:21:29,473][134294] Updated weights for policy 0, policy_version 134724 (0.0024) [2025-01-04 07:21:32,602][134294] Updated weights for policy 0, policy_version 134734 (0.0024) [2025-01-04 07:21:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 551886848. Throughput: 0: 3606.7. Samples: 127136294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:33,968][134211] Avg episode reward: [(0, '9.482')] [2025-01-04 07:21:35,013][134294] Updated weights for policy 0, policy_version 134744 (0.0018) [2025-01-04 07:21:36,870][134294] Updated weights for policy 0, policy_version 134754 (0.0013) [2025-01-04 07:21:38,968][134211] Fps is (10 sec: 17203.8, 60 sec: 14813.9, 300 sec: 14204.1). Total num frames: 551993344. Throughput: 0: 3545.0. Samples: 127163110. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:38,968][134211] Avg episode reward: [(0, '8.772')] [2025-01-04 07:21:38,973][134294] Updated weights for policy 0, policy_version 134764 (0.0016) [2025-01-04 07:21:42,466][134294] Updated weights for policy 0, policy_version 134774 (0.0029) [2025-01-04 07:21:43,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14540.7, 300 sec: 14190.2). Total num frames: 552046592. Throughput: 0: 3423.4. Samples: 127183658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:43,969][134211] Avg episode reward: [(0, '8.831')] [2025-01-04 07:21:46,017][134294] Updated weights for policy 0, policy_version 134784 (0.0029) [2025-01-04 07:21:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14267.7, 300 sec: 14190.2). Total num frames: 552112128. Throughput: 0: 3404.8. Samples: 127192498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:48,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 07:21:49,116][134294] Updated weights for policy 0, policy_version 134794 (0.0022) [2025-01-04 07:21:51,105][134294] Updated weights for policy 0, policy_version 134804 (0.0014) [2025-01-04 07:21:53,259][134294] Updated weights for policy 0, policy_version 134814 (0.0013) [2025-01-04 07:21:53,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14199.6, 300 sec: 14315.2). Total num frames: 552210432. Throughput: 0: 3549.0. Samples: 127217638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:21:53,969][134211] Avg episode reward: [(0, '8.835')] [2025-01-04 07:21:55,830][134294] Updated weights for policy 0, policy_version 134824 (0.0021) [2025-01-04 07:21:58,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14062.9, 300 sec: 14329.1). Total num frames: 552275968. Throughput: 0: 3619.9. Samples: 127240642. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:21:58,968][134211] Avg episode reward: [(0, '7.751')] [2025-01-04 07:21:59,034][134294] Updated weights for policy 0, policy_version 134834 (0.0028) [2025-01-04 07:22:02,195][134294] Updated weights for policy 0, policy_version 134844 (0.0027) [2025-01-04 07:22:03,968][134211] Fps is (10 sec: 13106.2, 60 sec: 14131.0, 300 sec: 14301.3). Total num frames: 552341504. Throughput: 0: 3629.9. Samples: 127250274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:03,969][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 07:22:05,314][134294] Updated weights for policy 0, policy_version 134854 (0.0030) [2025-01-04 07:22:08,314][134294] Updated weights for policy 0, policy_version 134864 (0.0027) [2025-01-04 07:22:08,969][134211] Fps is (10 sec: 13105.7, 60 sec: 14131.0, 300 sec: 14287.4). Total num frames: 552407040. Throughput: 0: 3644.8. Samples: 127270156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:08,969][134211] Avg episode reward: [(0, '8.320')] [2025-01-04 07:22:11,387][134294] Updated weights for policy 0, policy_version 134874 (0.0026) [2025-01-04 07:22:13,968][134211] Fps is (10 sec: 13108.0, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 552472576. Throughput: 0: 3627.1. Samples: 127289384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:13,969][134211] Avg episode reward: [(0, '8.012')] [2025-01-04 07:22:14,842][134294] Updated weights for policy 0, policy_version 134884 (0.0030) [2025-01-04 07:22:18,135][134294] Updated weights for policy 0, policy_version 134894 (0.0025) [2025-01-04 07:22:18,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14131.2, 300 sec: 14176.3). Total num frames: 552534016. Throughput: 0: 3607.4. Samples: 127298628. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:18,968][134211] Avg episode reward: [(0, '7.852')] [2025-01-04 07:22:21,377][134294] Updated weights for policy 0, policy_version 134904 (0.0024) [2025-01-04 07:22:23,967][134211] Fps is (10 sec: 12288.4, 60 sec: 14063.0, 300 sec: 14037.5). Total num frames: 552595456. Throughput: 0: 3422.7. Samples: 127317132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:23,968][134211] Avg episode reward: [(0, '8.115')] [2025-01-04 07:22:24,599][134294] Updated weights for policy 0, policy_version 134914 (0.0022) [2025-01-04 07:22:26,633][134294] Updated weights for policy 0, policy_version 134924 (0.0013) [2025-01-04 07:22:28,632][134294] Updated weights for policy 0, policy_version 134934 (0.0013) [2025-01-04 07:22:28,968][134211] Fps is (10 sec: 16383.5, 60 sec: 14609.1, 300 sec: 14106.9). Total num frames: 552697856. Throughput: 0: 3551.9. Samples: 127343496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:28,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 07:22:30,441][134294] Updated weights for policy 0, policy_version 134944 (0.0013) [2025-01-04 07:22:32,992][134294] Updated weights for policy 0, policy_version 134954 (0.0020) [2025-01-04 07:22:33,968][134211] Fps is (10 sec: 18431.3, 60 sec: 14882.1, 300 sec: 14176.3). Total num frames: 552779776. Throughput: 0: 3716.3. Samples: 127359734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:33,969][134211] Avg episode reward: [(0, '7.734')] [2025-01-04 07:22:36,080][134294] Updated weights for policy 0, policy_version 134964 (0.0027) [2025-01-04 07:22:38,969][134211] Fps is (10 sec: 14744.8, 60 sec: 14199.3, 300 sec: 14190.2). Total num frames: 552845312. Throughput: 0: 3600.3. Samples: 127379656. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:38,969][134211] Avg episode reward: [(0, '9.129')] [2025-01-04 07:22:39,425][134294] Updated weights for policy 0, policy_version 134974 (0.0027) [2025-01-04 07:22:42,526][134294] Updated weights for policy 0, policy_version 134984 (0.0026) [2025-01-04 07:22:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14336.0, 300 sec: 14176.3). Total num frames: 552906752. Throughput: 0: 3512.2. Samples: 127398692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:43,968][134211] Avg episode reward: [(0, '7.633')] [2025-01-04 07:22:45,978][134294] Updated weights for policy 0, policy_version 134994 (0.0028) [2025-01-04 07:22:48,968][134211] Fps is (10 sec: 12698.7, 60 sec: 14336.0, 300 sec: 14190.2). Total num frames: 552972288. Throughput: 0: 3499.7. Samples: 127407756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:48,968][134211] Avg episode reward: [(0, '7.319')] [2025-01-04 07:22:49,143][134294] Updated weights for policy 0, policy_version 135004 (0.0025) [2025-01-04 07:22:52,469][134294] Updated weights for policy 0, policy_version 135014 (0.0026) [2025-01-04 07:22:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13721.6, 300 sec: 14176.3). Total num frames: 553033728. Throughput: 0: 3480.1. Samples: 127426758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:53,969][134211] Avg episode reward: [(0, '7.764')] [2025-01-04 07:22:55,509][134294] Updated weights for policy 0, policy_version 135024 (0.0024) [2025-01-04 07:22:58,452][134294] Updated weights for policy 0, policy_version 135034 (0.0025) [2025-01-04 07:22:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13789.8, 300 sec: 14176.3). Total num frames: 553103360. Throughput: 0: 3505.8. Samples: 127447144. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:22:58,968][134211] Avg episode reward: [(0, '8.621')] [2025-01-04 07:23:01,535][134294] Updated weights for policy 0, policy_version 135044 (0.0026) [2025-01-04 07:23:03,970][134211] Fps is (10 sec: 13514.2, 60 sec: 13789.6, 300 sec: 14176.2). Total num frames: 553168896. Throughput: 0: 3523.7. Samples: 127457204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:03,970][134211] Avg episode reward: [(0, '8.972')] [2025-01-04 07:23:04,619][134294] Updated weights for policy 0, policy_version 135054 (0.0025) [2025-01-04 07:23:07,728][134294] Updated weights for policy 0, policy_version 135064 (0.0025) [2025-01-04 07:23:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.4, 300 sec: 14176.3). Total num frames: 553238528. Throughput: 0: 3550.0. Samples: 127476882. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:08,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 07:23:10,692][134294] Updated weights for policy 0, policy_version 135074 (0.0024) [2025-01-04 07:23:13,108][134294] Updated weights for policy 0, policy_version 135084 (0.0018) [2025-01-04 07:23:13,968][134211] Fps is (10 sec: 15158.5, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 553320448. Throughput: 0: 3470.7. Samples: 127499674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:13,968][134211] Avg episode reward: [(0, '7.762')] [2025-01-04 07:23:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135088_553320448.pth... 
[2025-01-04 07:23:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000134252_549896192.pth [2025-01-04 07:23:15,172][134294] Updated weights for policy 0, policy_version 135094 (0.0014) [2025-01-04 07:23:17,114][134294] Updated weights for policy 0, policy_version 135104 (0.0015) [2025-01-04 07:23:18,968][134211] Fps is (10 sec: 18022.6, 60 sec: 14745.6, 300 sec: 14384.6). Total num frames: 553418752. Throughput: 0: 3443.2. Samples: 127514678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:18,968][134211] Avg episode reward: [(0, '7.879')] [2025-01-04 07:23:19,435][134294] Updated weights for policy 0, policy_version 135114 (0.0016) [2025-01-04 07:23:22,771][134294] Updated weights for policy 0, policy_version 135124 (0.0027) [2025-01-04 07:23:23,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14745.5, 300 sec: 14259.6). Total num frames: 553480192. Throughput: 0: 3527.2. Samples: 127538378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:23,968][134211] Avg episode reward: [(0, '7.671')] [2025-01-04 07:23:25,949][134294] Updated weights for policy 0, policy_version 135134 (0.0029) [2025-01-04 07:23:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.3, 300 sec: 14120.8). Total num frames: 553545728. Throughput: 0: 3541.4. Samples: 127558052. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:28,968][134211] Avg episode reward: [(0, '8.012')] [2025-01-04 07:23:28,992][134294] Updated weights for policy 0, policy_version 135144 (0.0025) [2025-01-04 07:23:32,082][134294] Updated weights for policy 0, policy_version 135154 (0.0026) [2025-01-04 07:23:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14134.7). Total num frames: 553615360. Throughput: 0: 3559.7. Samples: 127567942. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:33,968][134211] Avg episode reward: [(0, '7.839')] [2025-01-04 07:23:35,167][134294] Updated weights for policy 0, policy_version 135164 (0.0026) [2025-01-04 07:23:37,910][134294] Updated weights for policy 0, policy_version 135174 (0.0024) [2025-01-04 07:23:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.9, 300 sec: 14162.4). Total num frames: 553684992. Throughput: 0: 3594.8. Samples: 127588524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:38,968][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 07:23:41,035][134294] Updated weights for policy 0, policy_version 135184 (0.0023) [2025-01-04 07:23:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14176.3). Total num frames: 553746432. Throughput: 0: 3582.2. Samples: 127608342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:43,968][134211] Avg episode reward: [(0, '8.907')] [2025-01-04 07:23:44,366][134294] Updated weights for policy 0, policy_version 135194 (0.0024) [2025-01-04 07:23:47,679][134294] Updated weights for policy 0, policy_version 135204 (0.0025) [2025-01-04 07:23:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 553820160. Throughput: 0: 3555.0. Samples: 127617172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:48,968][134211] Avg episode reward: [(0, '8.239')] [2025-01-04 07:23:49,809][134294] Updated weights for policy 0, policy_version 135214 (0.0015) [2025-01-04 07:23:52,836][134294] Updated weights for policy 0, policy_version 135224 (0.0027) [2025-01-04 07:23:53,968][134211] Fps is (10 sec: 14335.2, 60 sec: 14267.6, 300 sec: 14204.1). Total num frames: 553889792. Throughput: 0: 3635.1. Samples: 127640466. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:53,969][134211] Avg episode reward: [(0, '8.670')] [2025-01-04 07:23:55,996][134294] Updated weights for policy 0, policy_version 135234 (0.0023) [2025-01-04 07:23:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 14204.3). Total num frames: 553955328. Throughput: 0: 3561.9. Samples: 127659958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:23:58,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 07:23:58,999][134294] Updated weights for policy 0, policy_version 135244 (0.0025) [2025-01-04 07:24:01,031][134294] Updated weights for policy 0, policy_version 135254 (0.0013) [2025-01-04 07:24:03,861][134294] Updated weights for policy 0, policy_version 135264 (0.0023) [2025-01-04 07:24:03,968][134211] Fps is (10 sec: 15156.0, 60 sec: 14541.3, 300 sec: 14259.6). Total num frames: 554041344. Throughput: 0: 3540.5. Samples: 127674000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:03,969][134211] Avg episode reward: [(0, '8.779')] [2025-01-04 07:24:06,827][134294] Updated weights for policy 0, policy_version 135274 (0.0025) [2025-01-04 07:24:08,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14472.5, 300 sec: 14273.5). Total num frames: 554106880. Throughput: 0: 3474.9. Samples: 127694750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:08,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 07:24:10,101][134294] Updated weights for policy 0, policy_version 135284 (0.0025) [2025-01-04 07:24:13,033][134294] Updated weights for policy 0, policy_version 135294 (0.0026) [2025-01-04 07:24:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14199.4, 300 sec: 14287.4). Total num frames: 554172416. Throughput: 0: 3475.2. Samples: 127714436. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:13,968][134211] Avg episode reward: [(0, '8.357')] [2025-01-04 07:24:16,490][134294] Updated weights for policy 0, policy_version 135304 (0.0026) [2025-01-04 07:24:18,847][134294] Updated weights for policy 0, policy_version 135314 (0.0016) [2025-01-04 07:24:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13789.9, 300 sec: 14204.1). Total num frames: 554246144. Throughput: 0: 3456.1. Samples: 127723466. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:18,968][134211] Avg episode reward: [(0, '8.683')] [2025-01-04 07:24:20,900][134294] Updated weights for policy 0, policy_version 135324 (0.0014) [2025-01-04 07:24:23,046][134294] Updated weights for policy 0, policy_version 135334 (0.0012) [2025-01-04 07:24:23,970][134211] Fps is (10 sec: 16380.7, 60 sec: 14267.3, 300 sec: 14273.4). Total num frames: 554336256. Throughput: 0: 3613.6. Samples: 127751142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:23,970][134211] Avg episode reward: [(0, '8.696')] [2025-01-04 07:24:26,513][134294] Updated weights for policy 0, policy_version 135344 (0.0024) [2025-01-04 07:24:28,968][134211] Fps is (10 sec: 14744.8, 60 sec: 14131.1, 300 sec: 14245.7). Total num frames: 554393600. Throughput: 0: 3591.8. Samples: 127769974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:28,969][134211] Avg episode reward: [(0, '8.469')] [2025-01-04 07:24:30,190][134294] Updated weights for policy 0, policy_version 135354 (0.0029) [2025-01-04 07:24:33,583][134294] Updated weights for policy 0, policy_version 135364 (0.0026) [2025-01-04 07:24:33,968][134211] Fps is (10 sec: 11471.1, 60 sec: 13926.4, 300 sec: 14204.2). Total num frames: 554450944. Throughput: 0: 3585.4. Samples: 127778518. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:33,969][134211] Avg episode reward: [(0, '8.064')] [2025-01-04 07:24:36,673][134294] Updated weights for policy 0, policy_version 135374 (0.0026) [2025-01-04 07:24:38,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 554520576. Throughput: 0: 3498.0. Samples: 127797876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:38,968][134211] Avg episode reward: [(0, '8.456')] [2025-01-04 07:24:39,765][134294] Updated weights for policy 0, policy_version 135384 (0.0025) [2025-01-04 07:24:42,930][134294] Updated weights for policy 0, policy_version 135394 (0.0025) [2025-01-04 07:24:43,967][134211] Fps is (10 sec: 13517.3, 60 sec: 13994.7, 300 sec: 14134.7). Total num frames: 554586112. Throughput: 0: 3497.5. Samples: 127817346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:43,968][134211] Avg episode reward: [(0, '7.623')] [2025-01-04 07:24:45,306][134294] Updated weights for policy 0, policy_version 135404 (0.0015) [2025-01-04 07:24:48,084][134294] Updated weights for policy 0, policy_version 135414 (0.0023) [2025-01-04 07:24:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 14051.4). Total num frames: 554663936. Throughput: 0: 3481.9. Samples: 127830686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:48,968][134211] Avg episode reward: [(0, '8.721')] [2025-01-04 07:24:51,385][134294] Updated weights for policy 0, policy_version 135424 (0.0027) [2025-01-04 07:24:53,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13926.5, 300 sec: 13982.1). Total num frames: 554725376. Throughput: 0: 3443.2. Samples: 127849694. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:53,968][134211] Avg episode reward: [(0, '7.759')] [2025-01-04 07:24:54,651][134294] Updated weights for policy 0, policy_version 135434 (0.0027) [2025-01-04 07:24:56,698][134294] Updated weights for policy 0, policy_version 135444 (0.0013) [2025-01-04 07:24:58,582][134294] Updated weights for policy 0, policy_version 135454 (0.0013) [2025-01-04 07:24:58,967][134211] Fps is (10 sec: 16384.5, 60 sec: 14540.8, 300 sec: 14120.8). Total num frames: 554827776. Throughput: 0: 3591.5. Samples: 127876052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:24:58,968][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 07:25:01,135][134294] Updated weights for policy 0, policy_version 135464 (0.0024) [2025-01-04 07:25:03,968][134211] Fps is (10 sec: 17202.3, 60 sec: 14267.6, 300 sec: 14134.6). Total num frames: 554897408. Throughput: 0: 3674.0. Samples: 127888800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:25:03,969][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 07:25:04,132][134294] Updated weights for policy 0, policy_version 135474 (0.0026) [2025-01-04 07:25:07,359][134294] Updated weights for policy 0, policy_version 135484 (0.0027) [2025-01-04 07:25:08,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14199.5, 300 sec: 14120.8). Total num frames: 554958848. Throughput: 0: 3490.7. Samples: 127908214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:08,968][134211] Avg episode reward: [(0, '8.034')] [2025-01-04 07:25:10,336][134294] Updated weights for policy 0, policy_version 135494 (0.0024) [2025-01-04 07:25:13,531][134294] Updated weights for policy 0, policy_version 135504 (0.0023) [2025-01-04 07:25:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14267.7, 300 sec: 14148.5). Total num frames: 555028480. Throughput: 0: 3519.2. Samples: 127928340. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:13,969][134211] Avg episode reward: [(0, '8.716')] [2025-01-04 07:25:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135505_555028480.pth... [2025-01-04 07:25:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000134674_551624704.pth [2025-01-04 07:25:16,793][134294] Updated weights for policy 0, policy_version 135514 (0.0025) [2025-01-04 07:25:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14062.9, 300 sec: 14162.4). Total num frames: 555089920. Throughput: 0: 3532.5. Samples: 127937480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:18,968][134211] Avg episode reward: [(0, '7.238')] [2025-01-04 07:25:20,206][134294] Updated weights for policy 0, policy_version 135524 (0.0027) [2025-01-04 07:25:22,807][134294] Updated weights for policy 0, policy_version 135534 (0.0016) [2025-01-04 07:25:23,968][134211] Fps is (10 sec: 13927.1, 60 sec: 13858.6, 300 sec: 14218.0). Total num frames: 555167744. Throughput: 0: 3537.5. Samples: 127957062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:23,968][134211] Avg episode reward: [(0, '8.017')] [2025-01-04 07:25:24,862][134294] Updated weights for policy 0, policy_version 135544 (0.0013) [2025-01-04 07:25:26,737][134294] Updated weights for policy 0, policy_version 135554 (0.0013) [2025-01-04 07:25:28,765][134294] Updated weights for policy 0, policy_version 135564 (0.0014) [2025-01-04 07:25:28,968][134211] Fps is (10 sec: 18022.3, 60 sec: 14609.2, 300 sec: 14342.9). Total num frames: 555270144. Throughput: 0: 3791.9. Samples: 127987982. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:28,968][134211] Avg episode reward: [(0, '8.301')] [2025-01-04 07:25:31,868][134294] Updated weights for policy 0, policy_version 135574 (0.0026) [2025-01-04 07:25:33,968][134211] Fps is (10 sec: 16792.4, 60 sec: 14745.5, 300 sec: 14342.9). Total num frames: 555335680. Throughput: 0: 3736.4. Samples: 127998828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:33,969][134211] Avg episode reward: [(0, '8.265')] [2025-01-04 07:25:35,119][134294] Updated weights for policy 0, policy_version 135584 (0.0030) [2025-01-04 07:25:38,303][134294] Updated weights for policy 0, policy_version 135594 (0.0025) [2025-01-04 07:25:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14329.1). Total num frames: 555401216. Throughput: 0: 3736.1. Samples: 128017816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:38,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 07:25:41,329][134294] Updated weights for policy 0, policy_version 135604 (0.0025) [2025-01-04 07:25:43,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14609.0, 300 sec: 14259.6). Total num frames: 555462656. Throughput: 0: 3579.2. Samples: 128037116. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:43,969][134211] Avg episode reward: [(0, '8.563')] [2025-01-04 07:25:44,791][134294] Updated weights for policy 0, policy_version 135614 (0.0029) [2025-01-04 07:25:48,441][134294] Updated weights for policy 0, policy_version 135624 (0.0027) [2025-01-04 07:25:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14267.8, 300 sec: 14106.9). Total num frames: 555520000. Throughput: 0: 3488.7. Samples: 128045790. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:48,968][134211] Avg episode reward: [(0, '8.568')] [2025-01-04 07:25:51,920][134294] Updated weights for policy 0, policy_version 135634 (0.0027) [2025-01-04 07:25:53,967][134211] Fps is (10 sec: 12288.4, 60 sec: 14336.1, 300 sec: 14079.1). Total num frames: 555585536. Throughput: 0: 3444.1. Samples: 128063200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:53,968][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 07:25:54,491][134294] Updated weights for policy 0, policy_version 135644 (0.0017) [2025-01-04 07:25:56,518][134294] Updated weights for policy 0, policy_version 135654 (0.0016) [2025-01-04 07:25:58,814][134294] Updated weights for policy 0, policy_version 135664 (0.0021) [2025-01-04 07:25:58,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14199.4, 300 sec: 14190.2). Total num frames: 555679744. Throughput: 0: 3610.8. Samples: 128090826. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:25:58,968][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 07:26:01,875][134294] Updated weights for policy 0, policy_version 135674 (0.0029) [2025-01-04 07:26:03,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14131.3, 300 sec: 14190.2). Total num frames: 555745280. Throughput: 0: 3635.5. Samples: 128101078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:26:03,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 07:26:05,065][134294] Updated weights for policy 0, policy_version 135684 (0.0026) [2025-01-04 07:26:08,231][134294] Updated weights for policy 0, policy_version 135694 (0.0026) [2025-01-04 07:26:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14199.5, 300 sec: 14190.2). Total num frames: 555810816. Throughput: 0: 3625.2. Samples: 128120196. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:08,968][134211] Avg episode reward: [(0, '7.976')] [2025-01-04 07:26:11,350][134294] Updated weights for policy 0, policy_version 135704 (0.0025) [2025-01-04 07:26:13,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 555876352. Throughput: 0: 3383.5. Samples: 128140242. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:13,969][134211] Avg episode reward: [(0, '7.992')] [2025-01-04 07:26:14,637][134294] Updated weights for policy 0, policy_version 135714 (0.0025) [2025-01-04 07:26:17,913][134294] Updated weights for policy 0, policy_version 135724 (0.0027) [2025-01-04 07:26:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14190.2). Total num frames: 555937792. Throughput: 0: 3342.3. Samples: 128149228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:18,968][134211] Avg episode reward: [(0, '8.371')] [2025-01-04 07:26:21,057][134294] Updated weights for policy 0, policy_version 135734 (0.0025) [2025-01-04 07:26:23,564][134294] Updated weights for policy 0, policy_version 135744 (0.0017) [2025-01-04 07:26:23,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14062.9, 300 sec: 14204.1). Total num frames: 556011520. Throughput: 0: 3356.1. Samples: 128168840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:23,968][134211] Avg episode reward: [(0, '8.358')] [2025-01-04 07:26:25,816][134294] Updated weights for policy 0, policy_version 135754 (0.0014) [2025-01-04 07:26:28,913][134294] Updated weights for policy 0, policy_version 135764 (0.0025) [2025-01-04 07:26:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13653.3, 300 sec: 14245.8). Total num frames: 556089344. Throughput: 0: 3469.4. Samples: 128193238. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:28,968][134211] Avg episode reward: [(0, '8.762')] [2025-01-04 07:26:32,013][134294] Updated weights for policy 0, policy_version 135774 (0.0026) [2025-01-04 07:26:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13653.5, 300 sec: 14106.9). Total num frames: 556154880. Throughput: 0: 3491.1. Samples: 128202890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:33,968][134211] Avg episode reward: [(0, '8.020')] [2025-01-04 07:26:35,011][134294] Updated weights for policy 0, policy_version 135784 (0.0026) [2025-01-04 07:26:37,970][134294] Updated weights for policy 0, policy_version 135794 (0.0028) [2025-01-04 07:26:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 14162.4). Total num frames: 556224512. Throughput: 0: 3565.3. Samples: 128223638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:38,969][134211] Avg episode reward: [(0, '7.991')] [2025-01-04 07:26:40,989][134294] Updated weights for policy 0, policy_version 135804 (0.0025) [2025-01-04 07:26:43,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13789.9, 300 sec: 14162.4). Total num frames: 556290048. Throughput: 0: 3389.3. Samples: 128243346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:43,968][134211] Avg episode reward: [(0, '8.919')] [2025-01-04 07:26:44,337][134294] Updated weights for policy 0, policy_version 135814 (0.0030) [2025-01-04 07:26:47,509][134294] Updated weights for policy 0, policy_version 135824 (0.0027) [2025-01-04 07:26:48,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14062.9, 300 sec: 14079.1). Total num frames: 556363776. Throughput: 0: 3358.6. Samples: 128252214. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:48,968][134211] Avg episode reward: [(0, '8.069')] [2025-01-04 07:26:49,563][134294] Updated weights for policy 0, policy_version 135834 (0.0013) [2025-01-04 07:26:51,573][134294] Updated weights for policy 0, policy_version 135844 (0.0014) [2025-01-04 07:26:53,586][134294] Updated weights for policy 0, policy_version 135854 (0.0012) [2025-01-04 07:26:53,967][134211] Fps is (10 sec: 17613.3, 60 sec: 14677.3, 300 sec: 14204.1). Total num frames: 556466176. Throughput: 0: 3568.1. Samples: 128280760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:53,968][134211] Avg episode reward: [(0, '8.294')] [2025-01-04 07:26:55,933][134294] Updated weights for policy 0, policy_version 135864 (0.0023) [2025-01-04 07:26:58,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14267.7, 300 sec: 14218.0). Total num frames: 556535808. Throughput: 0: 3670.0. Samples: 128305390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:26:58,968][134211] Avg episode reward: [(0, '9.258')] [2025-01-04 07:26:59,142][134294] Updated weights for policy 0, policy_version 135874 (0.0024) [2025-01-04 07:27:02,391][134294] Updated weights for policy 0, policy_version 135884 (0.0030) [2025-01-04 07:27:03,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14199.4, 300 sec: 14204.1). Total num frames: 556597248. Throughput: 0: 3673.6. Samples: 128314540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:27:03,968][134211] Avg episode reward: [(0, '8.505')] [2025-01-04 07:27:05,680][134294] Updated weights for policy 0, policy_version 135894 (0.0024) [2025-01-04 07:27:08,886][134294] Updated weights for policy 0, policy_version 135904 (0.0027) [2025-01-04 07:27:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14204.1). Total num frames: 556662784. Throughput: 0: 3659.5. Samples: 128333518. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:27:08,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 07:27:12,401][134294] Updated weights for policy 0, policy_version 135914 (0.0028) [2025-01-04 07:27:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14063.0, 300 sec: 14190.2). Total num frames: 556720128. Throughput: 0: 3513.4. Samples: 128351342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:13,968][134211] Avg episode reward: [(0, '9.645')] [2025-01-04 07:27:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135918_556720128.pth... [2025-01-04 07:27:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135088_553320448.pth [2025-01-04 07:27:15,878][134294] Updated weights for policy 0, policy_version 135924 (0.0026) [2025-01-04 07:27:18,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14062.9, 300 sec: 14190.2). Total num frames: 556781568. Throughput: 0: 3496.7. Samples: 128360240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:18,968][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 07:27:19,126][134294] Updated weights for policy 0, policy_version 135934 (0.0024) [2025-01-04 07:27:22,346][134294] Updated weights for policy 0, policy_version 135944 (0.0027) [2025-01-04 07:27:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13994.7, 300 sec: 14079.2). Total num frames: 556851200. Throughput: 0: 3462.0. Samples: 128379426. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:23,968][134211] Avg episode reward: [(0, '8.438')] [2025-01-04 07:27:24,598][134294] Updated weights for policy 0, policy_version 135954 (0.0014) [2025-01-04 07:27:26,465][134294] Updated weights for policy 0, policy_version 135964 (0.0014) [2025-01-04 07:27:28,338][134294] Updated weights for policy 0, policy_version 135974 (0.0013) [2025-01-04 07:27:28,968][134211] Fps is (10 sec: 18022.8, 60 sec: 14540.8, 300 sec: 14176.3). Total num frames: 556961792. Throughput: 0: 3695.1. Samples: 128409624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:28,968][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 07:27:30,269][134294] Updated weights for policy 0, policy_version 135984 (0.0012) [2025-01-04 07:27:32,138][134294] Updated weights for policy 0, policy_version 135994 (0.0014) [2025-01-04 07:27:33,968][134211] Fps is (10 sec: 20888.5, 60 sec: 15086.8, 300 sec: 14287.4). Total num frames: 557060096. Throughput: 0: 3860.1. Samples: 128425922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:33,969][134211] Avg episode reward: [(0, '8.154')] [2025-01-04 07:27:34,767][134294] Updated weights for policy 0, policy_version 136004 (0.0021) [2025-01-04 07:27:37,966][134294] Updated weights for policy 0, policy_version 136014 (0.0029) [2025-01-04 07:27:38,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14950.4, 300 sec: 14287.4). Total num frames: 557121536. Throughput: 0: 3743.8. Samples: 128449230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:38,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 07:27:41,110][134294] Updated weights for policy 0, policy_version 136024 (0.0030) [2025-01-04 07:27:43,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14950.4, 300 sec: 14287.4). Total num frames: 557187072. Throughput: 0: 3619.8. Samples: 128468282. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:43,968][134211] Avg episode reward: [(0, '8.693')] [2025-01-04 07:27:44,538][134294] Updated weights for policy 0, policy_version 136034 (0.0028) [2025-01-04 07:27:47,929][134294] Updated weights for policy 0, policy_version 136044 (0.0032) [2025-01-04 07:27:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.3, 300 sec: 14273.5). Total num frames: 557244416. Throughput: 0: 3606.6. Samples: 128476838. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:48,968][134211] Avg episode reward: [(0, '8.164')] [2025-01-04 07:27:51,156][134294] Updated weights for policy 0, policy_version 136054 (0.0026) [2025-01-04 07:27:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13994.6, 300 sec: 14245.8). Total num frames: 557305856. Throughput: 0: 3596.6. Samples: 128495364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:53,968][134211] Avg episode reward: [(0, '8.400')] [2025-01-04 07:27:54,646][134294] Updated weights for policy 0, policy_version 136064 (0.0030) [2025-01-04 07:27:57,704][134294] Updated weights for policy 0, policy_version 136074 (0.0022) [2025-01-04 07:27:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 14245.9). Total num frames: 557371392. Throughput: 0: 3623.3. Samples: 128514392. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:27:58,968][134211] Avg episode reward: [(0, '8.483')] [2025-01-04 07:28:00,724][134294] Updated weights for policy 0, policy_version 136084 (0.0026) [2025-01-04 07:28:03,744][134294] Updated weights for policy 0, policy_version 136094 (0.0026) [2025-01-04 07:28:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14062.9, 300 sec: 14245.7). Total num frames: 557441024. Throughput: 0: 3657.6. Samples: 128524834. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:28:03,968][134211] Avg episode reward: [(0, '8.332')] [2025-01-04 07:28:06,674][134294] Updated weights for policy 0, policy_version 136104 (0.0025) [2025-01-04 07:28:08,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 557510656. Throughput: 0: 3686.8. Samples: 128545334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:28:08,968][134211] Avg episode reward: [(0, '8.488')] [2025-01-04 07:28:09,764][134294] Updated weights for policy 0, policy_version 136114 (0.0025) [2025-01-04 07:28:12,902][134294] Updated weights for policy 0, policy_version 136124 (0.0026) [2025-01-04 07:28:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 557576192. Throughput: 0: 3454.8. Samples: 128565092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:13,968][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 07:28:16,067][134294] Updated weights for policy 0, policy_version 136134 (0.0025) [2025-01-04 07:28:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14267.8, 300 sec: 14093.0). Total num frames: 557637632. Throughput: 0: 3305.8. Samples: 128574682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:18,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 07:28:19,422][134294] Updated weights for policy 0, policy_version 136144 (0.0026) [2025-01-04 07:28:22,727][134294] Updated weights for policy 0, policy_version 136154 (0.0025) [2025-01-04 07:28:23,967][134211] Fps is (10 sec: 12697.9, 60 sec: 14199.5, 300 sec: 14093.0). Total num frames: 557703168. Throughput: 0: 3202.1. Samples: 128593324. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:23,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 07:28:25,053][134294] Updated weights for policy 0, policy_version 136164 (0.0015) [2025-01-04 07:28:27,130][134294] Updated weights for policy 0, policy_version 136174 (0.0014) [2025-01-04 07:28:28,967][134211] Fps is (10 sec: 16793.9, 60 sec: 14062.9, 300 sec: 14204.1). Total num frames: 557805568. Throughput: 0: 3385.8. Samples: 128620640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:28,968][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 07:28:29,130][134294] Updated weights for policy 0, policy_version 136184 (0.0012) [2025-01-04 07:28:31,452][134294] Updated weights for policy 0, policy_version 136194 (0.0017) [2025-01-04 07:28:33,968][134211] Fps is (10 sec: 18021.8, 60 sec: 13721.7, 300 sec: 14231.9). Total num frames: 557883392. Throughput: 0: 3515.1. Samples: 128635018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:33,968][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 07:28:34,490][134294] Updated weights for policy 0, policy_version 136204 (0.0025) [2025-01-04 07:28:37,773][134294] Updated weights for policy 0, policy_version 136214 (0.0027) [2025-01-04 07:28:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13789.9, 300 sec: 14245.8). Total num frames: 557948928. Throughput: 0: 3538.0. Samples: 128654574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:38,968][134211] Avg episode reward: [(0, '9.681')] [2025-01-04 07:28:40,738][134294] Updated weights for policy 0, policy_version 136224 (0.0024) [2025-01-04 07:28:43,858][134294] Updated weights for policy 0, policy_version 136234 (0.0024) [2025-01-04 07:28:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.9, 300 sec: 14218.0). Total num frames: 558014464. Throughput: 0: 3564.0. Samples: 128674774. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:43,968][134211] Avg episode reward: [(0, '8.102')] [2025-01-04 07:28:47,140][134294] Updated weights for policy 0, policy_version 136244 (0.0026) [2025-01-04 07:28:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 14190.2). Total num frames: 558075904. Throughput: 0: 3534.0. Samples: 128683864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:48,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 07:28:50,382][134294] Updated weights for policy 0, policy_version 136254 (0.0024) [2025-01-04 07:28:53,601][134294] Updated weights for policy 0, policy_version 136264 (0.0025) [2025-01-04 07:28:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 558141440. Throughput: 0: 3508.6. Samples: 128703222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:53,968][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 07:28:56,678][134294] Updated weights for policy 0, policy_version 136274 (0.0026) [2025-01-04 07:28:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14120.8). Total num frames: 558206976. Throughput: 0: 3507.6. Samples: 128722934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:28:58,968][134211] Avg episode reward: [(0, '8.777')] [2025-01-04 07:28:59,862][134294] Updated weights for policy 0, policy_version 136284 (0.0025) [2025-01-04 07:29:03,107][134294] Updated weights for policy 0, policy_version 136294 (0.0027) [2025-01-04 07:29:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 14106.9). Total num frames: 558268416. Throughput: 0: 3500.0. Samples: 128732184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:29:03,968][134211] Avg episode reward: [(0, '8.666')] [2025-01-04 07:29:05,711][134294] Updated weights for policy 0, policy_version 136304 (0.0021) [2025-01-04 07:29:07,605][134294] Updated weights for policy 0, policy_version 136314 (0.0012) [2025-01-04 07:29:08,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 558370816. Throughput: 0: 3627.6. Samples: 128756566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:29:08,968][134211] Avg episode reward: [(0, '8.365')] [2025-01-04 07:29:09,538][134294] Updated weights for policy 0, policy_version 136324 (0.0014) [2025-01-04 07:29:12,406][134294] Updated weights for policy 0, policy_version 136334 (0.0023) [2025-01-04 07:29:13,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14404.2, 300 sec: 14218.0). Total num frames: 558440448. Throughput: 0: 3576.4. Samples: 128781580. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:29:13,969][134211] Avg episode reward: [(0, '7.937')] [2025-01-04 07:29:13,989][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000136339_558444544.pth... [2025-01-04 07:29:14,069][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135505_555028480.pth [2025-01-04 07:29:15,848][134294] Updated weights for policy 0, policy_version 136344 (0.0025) [2025-01-04 07:29:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.2, 300 sec: 14120.9). Total num frames: 558501888. Throughput: 0: 3456.5. Samples: 128790560. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:18,968][134211] Avg episode reward: [(0, '9.100')] [2025-01-04 07:29:19,103][134294] Updated weights for policy 0, policy_version 136354 (0.0029) [2025-01-04 07:29:22,338][134294] Updated weights for policy 0, policy_version 136364 (0.0026) [2025-01-04 07:29:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14335.9, 300 sec: 14134.7). Total num frames: 558563328. Throughput: 0: 3440.5. Samples: 128809398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:23,971][134211] Avg episode reward: [(0, '9.549')] [2025-01-04 07:29:25,635][134294] Updated weights for policy 0, policy_version 136374 (0.0027) [2025-01-04 07:29:27,825][134294] Updated weights for policy 0, policy_version 136384 (0.0017) [2025-01-04 07:29:28,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14062.9, 300 sec: 14231.9). Total num frames: 558649344. Throughput: 0: 3500.7. Samples: 128832306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:28,968][134211] Avg episode reward: [(0, '8.644')] [2025-01-04 07:29:29,760][134294] Updated weights for policy 0, policy_version 136394 (0.0014) [2025-01-04 07:29:32,073][134294] Updated weights for policy 0, policy_version 136404 (0.0022) [2025-01-04 07:29:33,968][134211] Fps is (10 sec: 16794.2, 60 sec: 14131.3, 300 sec: 14273.5). Total num frames: 558731264. Throughput: 0: 3636.8. Samples: 128847522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:33,968][134211] Avg episode reward: [(0, '7.841')] [2025-01-04 07:29:35,275][134294] Updated weights for policy 0, policy_version 136414 (0.0026) [2025-01-04 07:29:38,192][134294] Updated weights for policy 0, policy_version 136424 (0.0027) [2025-01-04 07:29:38,969][134211] Fps is (10 sec: 15153.8, 60 sec: 14199.3, 300 sec: 14287.4). Total num frames: 558800896. Throughput: 0: 3659.4. Samples: 128867898. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:38,969][134211] Avg episode reward: [(0, '9.515')] [2025-01-04 07:29:41,210][134294] Updated weights for policy 0, policy_version 136434 (0.0028) [2025-01-04 07:29:43,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14199.3, 300 sec: 14245.7). Total num frames: 558866432. Throughput: 0: 3661.8. Samples: 128887718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:43,969][134211] Avg episode reward: [(0, '9.106')] [2025-01-04 07:29:44,641][134294] Updated weights for policy 0, policy_version 136444 (0.0024) [2025-01-04 07:29:47,979][134294] Updated weights for policy 0, policy_version 136454 (0.0029) [2025-01-04 07:29:48,968][134211] Fps is (10 sec: 12288.9, 60 sec: 14131.2, 300 sec: 14231.9). Total num frames: 558923776. Throughput: 0: 3648.1. Samples: 128896348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:48,968][134211] Avg episode reward: [(0, '8.477')] [2025-01-04 07:29:51,390][134294] Updated weights for policy 0, policy_version 136464 (0.0023) [2025-01-04 07:29:53,968][134211] Fps is (10 sec: 12288.9, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 558989312. Throughput: 0: 3520.6. Samples: 128914992. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:53,968][134211] Avg episode reward: [(0, '8.663')] [2025-01-04 07:29:54,746][134294] Updated weights for policy 0, policy_version 136474 (0.0026) [2025-01-04 07:29:57,095][134294] Updated weights for policy 0, policy_version 136484 (0.0017) [2025-01-04 07:29:58,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14472.6, 300 sec: 14162.5). Total num frames: 559075328. Throughput: 0: 3484.2. Samples: 128938366. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:29:58,968][134211] Avg episode reward: [(0, '8.713')] [2025-01-04 07:29:59,054][134294] Updated weights for policy 0, policy_version 136494 (0.0015) [2025-01-04 07:30:00,919][134294] Updated weights for policy 0, policy_version 136504 (0.0012) [2025-01-04 07:30:02,864][134294] Updated weights for policy 0, policy_version 136514 (0.0015) [2025-01-04 07:30:03,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15223.5, 300 sec: 14315.2). Total num frames: 559181824. Throughput: 0: 3643.9. Samples: 128954536. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:03,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 07:30:05,332][134294] Updated weights for policy 0, policy_version 136524 (0.0021) [2025-01-04 07:30:08,556][134294] Updated weights for policy 0, policy_version 136534 (0.0027) [2025-01-04 07:30:08,968][134211] Fps is (10 sec: 17202.4, 60 sec: 14609.0, 300 sec: 14301.3). Total num frames: 559247360. Throughput: 0: 3788.5. Samples: 128979882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:08,969][134211] Avg episode reward: [(0, '8.421')] [2025-01-04 07:30:11,763][134294] Updated weights for policy 0, policy_version 136544 (0.0025) [2025-01-04 07:30:13,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14540.8, 300 sec: 14315.2). Total num frames: 559312896. Throughput: 0: 3700.3. Samples: 128998822. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:13,969][134211] Avg episode reward: [(0, '7.725')] [2025-01-04 07:30:15,132][134294] Updated weights for policy 0, policy_version 136554 (0.0027) [2025-01-04 07:30:18,593][134294] Updated weights for policy 0, policy_version 136564 (0.0030) [2025-01-04 07:30:18,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 559370240. Throughput: 0: 3551.9. Samples: 129007358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:18,968][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 07:30:21,790][134294] Updated weights for policy 0, policy_version 136574 (0.0024) [2025-01-04 07:30:23,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14472.6, 300 sec: 14106.9). Total num frames: 559431680. Throughput: 0: 3514.9. Samples: 129026066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:23,969][134211] Avg episode reward: [(0, '8.293')] [2025-01-04 07:30:25,468][134294] Updated weights for policy 0, policy_version 136584 (0.0030) [2025-01-04 07:30:28,911][134294] Updated weights for policy 0, policy_version 136594 (0.0025) [2025-01-04 07:30:28,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13994.7, 300 sec: 14079.2). Total num frames: 559489024. Throughput: 0: 3456.5. Samples: 129043258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:28,968][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 07:30:31,430][134294] Updated weights for policy 0, policy_version 136604 (0.0014) [2025-01-04 07:30:33,470][134294] Updated weights for policy 0, policy_version 136614 (0.0013) [2025-01-04 07:30:33,968][134211] Fps is (10 sec: 14746.0, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 559579136. Throughput: 0: 3514.7. Samples: 129054508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:33,968][134211] Avg episode reward: [(0, '8.749')] [2025-01-04 07:30:35,364][134294] Updated weights for policy 0, policy_version 136624 (0.0014) [2025-01-04 07:30:37,229][134294] Updated weights for policy 0, policy_version 136634 (0.0015) [2025-01-04 07:30:38,967][134211] Fps is (10 sec: 19661.1, 60 sec: 14745.8, 300 sec: 14315.2). Total num frames: 559685632. Throughput: 0: 3801.9. Samples: 129086076. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:38,968][134211] Avg episode reward: [(0, '8.639')] [2025-01-04 07:30:39,174][134294] Updated weights for policy 0, policy_version 136644 (0.0015) [2025-01-04 07:30:41,170][134294] Updated weights for policy 0, policy_version 136654 (0.0017) [2025-01-04 07:30:43,968][134211] Fps is (10 sec: 19250.7, 60 sec: 15087.1, 300 sec: 14412.4). Total num frames: 559771648. Throughput: 0: 3893.3. Samples: 129113566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:43,968][134211] Avg episode reward: [(0, '9.072')] [2025-01-04 07:30:44,375][134294] Updated weights for policy 0, policy_version 136664 (0.0027) [2025-01-04 07:30:47,802][134294] Updated weights for policy 0, policy_version 136674 (0.0029) [2025-01-04 07:30:48,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15086.9, 300 sec: 14384.6). Total num frames: 559828992. Throughput: 0: 3727.4. Samples: 129122270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:48,968][134211] Avg episode reward: [(0, '9.094')] [2025-01-04 07:30:51,003][134294] Updated weights for policy 0, policy_version 136684 (0.0025) [2025-01-04 07:30:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 15018.7, 300 sec: 14273.5). Total num frames: 559890432. Throughput: 0: 3577.7. Samples: 129140878. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:53,968][134211] Avg episode reward: [(0, '8.502')] [2025-01-04 07:30:54,574][134294] Updated weights for policy 0, policy_version 136694 (0.0027) [2025-01-04 07:30:57,687][134294] Updated weights for policy 0, policy_version 136704 (0.0027) [2025-01-04 07:30:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14609.0, 300 sec: 14259.6). Total num frames: 559951872. Throughput: 0: 3572.9. Samples: 129159604. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:30:58,968][134211] Avg episode reward: [(0, '8.412')] [2025-01-04 07:31:00,754][134294] Updated weights for policy 0, policy_version 136714 (0.0027) [2025-01-04 07:31:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13926.3, 300 sec: 14259.6). Total num frames: 560017408. Throughput: 0: 3611.5. Samples: 129169874. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:31:03,969][134211] Avg episode reward: [(0, '8.454')] [2025-01-04 07:31:04,016][134294] Updated weights for policy 0, policy_version 136724 (0.0026) [2025-01-04 07:31:07,086][134294] Updated weights for policy 0, policy_version 136734 (0.0025) [2025-01-04 07:31:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 14273.5). Total num frames: 560087040. Throughput: 0: 3622.8. Samples: 129189094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:31:08,968][134211] Avg episode reward: [(0, '8.751')] [2025-01-04 07:31:10,084][134294] Updated weights for policy 0, policy_version 136744 (0.0025) [2025-01-04 07:31:13,053][134294] Updated weights for policy 0, policy_version 136754 (0.0027) [2025-01-04 07:31:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14287.4). Total num frames: 560152576. Throughput: 0: 3697.0. Samples: 129209622. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:31:13,969][134211] Avg episode reward: [(0, '8.444')] [2025-01-04 07:31:14,032][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000136757_560156672.pth... [2025-01-04 07:31:14,109][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000135918_556720128.pth [2025-01-04 07:31:16,440][134294] Updated weights for policy 0, policy_version 136764 (0.0025) [2025-01-04 07:31:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 560218112. 
Throughput: 0: 3646.1. Samples: 129218584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:31:18,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 07:31:19,408][134294] Updated weights for policy 0, policy_version 136774 (0.0025) [2025-01-04 07:31:22,770][134294] Updated weights for policy 0, policy_version 136784 (0.0025) [2025-01-04 07:31:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 560279552. Throughput: 0: 3382.0. Samples: 129238268. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:23,968][134211] Avg episode reward: [(0, '9.034')] [2025-01-04 07:31:26,098][134294] Updated weights for policy 0, policy_version 136794 (0.0025) [2025-01-04 07:31:28,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 560345088. Throughput: 0: 3192.9. Samples: 129257246. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:28,968][134211] Avg episode reward: [(0, '9.110')] [2025-01-04 07:31:29,102][134294] Updated weights for policy 0, policy_version 136804 (0.0023) [2025-01-04 07:31:32,150][134294] Updated weights for policy 0, policy_version 136814 (0.0024) [2025-01-04 07:31:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 14190.2). Total num frames: 560410624. Throughput: 0: 3225.8. Samples: 129267430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:33,968][134211] Avg episode reward: [(0, '7.471')] [2025-01-04 07:31:35,124][134294] Updated weights for policy 0, policy_version 136824 (0.0023) [2025-01-04 07:31:37,081][134294] Updated weights for policy 0, policy_version 136834 (0.0013) [2025-01-04 07:31:38,953][134294] Updated weights for policy 0, policy_version 136844 (0.0013) [2025-01-04 07:31:38,968][134211] Fps is (10 sec: 16794.0, 60 sec: 13789.9, 300 sec: 14315.2). Total num frames: 560513024. Throughput: 0: 3357.6. Samples: 129291970. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:38,968][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 07:31:40,880][134294] Updated weights for policy 0, policy_version 136854 (0.0012) [2025-01-04 07:31:42,766][134294] Updated weights for policy 0, policy_version 136864 (0.0012) [2025-01-04 07:31:43,968][134211] Fps is (10 sec: 20890.0, 60 sec: 14131.3, 300 sec: 14426.2). Total num frames: 560619520. Throughput: 0: 3654.7. Samples: 129324064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:43,968][134211] Avg episode reward: [(0, '7.765')] [2025-01-04 07:31:44,954][134294] Updated weights for policy 0, policy_version 136874 (0.0016) [2025-01-04 07:31:48,500][134294] Updated weights for policy 0, policy_version 136884 (0.0029) [2025-01-04 07:31:48,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14131.2, 300 sec: 14273.5). Total num frames: 560676864. Throughput: 0: 3695.9. Samples: 129336188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:48,968][134211] Avg episode reward: [(0, '8.626')] [2025-01-04 07:31:52,359][134294] Updated weights for policy 0, policy_version 136894 (0.0029) [2025-01-04 07:31:53,968][134211] Fps is (10 sec: 11059.1, 60 sec: 13994.7, 300 sec: 14218.0). Total num frames: 560730112. Throughput: 0: 3621.7. Samples: 129352068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:53,968][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 07:31:56,019][134294] Updated weights for policy 0, policy_version 136904 (0.0031) [2025-01-04 07:31:58,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13994.7, 300 sec: 14218.0). Total num frames: 560791552. Throughput: 0: 3545.4. Samples: 129369166. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:31:58,968][134211] Avg episode reward: [(0, '8.206')] [2025-01-04 07:31:59,565][134294] Updated weights for policy 0, policy_version 136914 (0.0027) [2025-01-04 07:32:02,675][134294] Updated weights for policy 0, policy_version 136924 (0.0026) [2025-01-04 07:32:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.7, 300 sec: 14218.0). Total num frames: 560857088. Throughput: 0: 3557.7. Samples: 129378682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:32:03,968][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 07:32:05,531][134294] Updated weights for policy 0, policy_version 136934 (0.0026) [2025-01-04 07:32:08,601][134294] Updated weights for policy 0, policy_version 136944 (0.0025) [2025-01-04 07:32:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 560926720. Throughput: 0: 3576.0. Samples: 129399186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:32:08,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 07:32:11,598][134294] Updated weights for policy 0, policy_version 136954 (0.0024) [2025-01-04 07:32:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14273.5). Total num frames: 560992256. Throughput: 0: 3598.3. Samples: 129419168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:32:13,968][134211] Avg episode reward: [(0, '8.175')] [2025-01-04 07:32:14,739][134294] Updated weights for policy 0, policy_version 136964 (0.0024) [2025-01-04 07:32:17,974][134294] Updated weights for policy 0, policy_version 136974 (0.0024) [2025-01-04 07:32:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 561057792. Throughput: 0: 3593.1. Samples: 129429120. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:32:18,968][134211] Avg episode reward: [(0, '9.033')] [2025-01-04 07:32:20,760][134294] Updated weights for policy 0, policy_version 136984 (0.0022) [2025-01-04 07:32:22,875][134294] Updated weights for policy 0, policy_version 136994 (0.0014) [2025-01-04 07:32:23,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14404.3, 300 sec: 14176.3). Total num frames: 561143808. Throughput: 0: 3554.2. Samples: 129451910. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:23,968][134211] Avg episode reward: [(0, '8.023')] [2025-01-04 07:32:25,031][134294] Updated weights for policy 0, policy_version 137004 (0.0012) [2025-01-04 07:32:27,056][134294] Updated weights for policy 0, policy_version 137014 (0.0013) [2025-01-04 07:32:28,968][134211] Fps is (10 sec: 18432.0, 60 sec: 14950.4, 300 sec: 14176.3). Total num frames: 561242112. Throughput: 0: 3482.1. Samples: 129480758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:28,968][134211] Avg episode reward: [(0, '8.354')] [2025-01-04 07:32:29,438][134294] Updated weights for policy 0, policy_version 137024 (0.0020) [2025-01-04 07:32:32,587][134294] Updated weights for policy 0, policy_version 137034 (0.0026) [2025-01-04 07:32:33,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14882.1, 300 sec: 14176.3). Total num frames: 561303552. Throughput: 0: 3451.1. Samples: 129491488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:33,969][134211] Avg episode reward: [(0, '8.429')] [2025-01-04 07:32:35,783][134294] Updated weights for policy 0, policy_version 137044 (0.0026) [2025-01-04 07:32:38,908][134294] Updated weights for policy 0, policy_version 137054 (0.0025) [2025-01-04 07:32:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14190.2). Total num frames: 561373184. Throughput: 0: 3526.9. Samples: 129510780. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:38,968][134211] Avg episode reward: [(0, '8.186')] [2025-01-04 07:32:42,007][134294] Updated weights for policy 0, policy_version 137064 (0.0027) [2025-01-04 07:32:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13653.3, 300 sec: 14218.0). Total num frames: 561438720. Throughput: 0: 3584.3. Samples: 129530460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:43,968][134211] Avg episode reward: [(0, '8.381')] [2025-01-04 07:32:45,111][134294] Updated weights for policy 0, policy_version 137074 (0.0027) [2025-01-04 07:32:48,246][134294] Updated weights for policy 0, policy_version 137084 (0.0024) [2025-01-04 07:32:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.6, 300 sec: 14218.0). Total num frames: 561500160. Throughput: 0: 3601.2. Samples: 129540736. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:48,968][134211] Avg episode reward: [(0, '9.172')] [2025-01-04 07:32:51,573][134294] Updated weights for policy 0, policy_version 137094 (0.0024) [2025-01-04 07:32:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13858.1, 300 sec: 14204.1). Total num frames: 561561600. Throughput: 0: 3552.6. Samples: 129559054. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:53,968][134211] Avg episode reward: [(0, '9.038')] [2025-01-04 07:32:55,143][134294] Updated weights for policy 0, policy_version 137104 (0.0026) [2025-01-04 07:32:58,052][134294] Updated weights for policy 0, policy_version 137114 (0.0025) [2025-01-04 07:32:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 561627136. Throughput: 0: 3535.0. Samples: 129578242. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:32:58,968][134211] Avg episode reward: [(0, '8.726')] [2025-01-04 07:33:01,186][134294] Updated weights for policy 0, policy_version 137124 (0.0033) [2025-01-04 07:33:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 14176.3). Total num frames: 561692672. Throughput: 0: 3535.3. Samples: 129588208. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:33:03,968][134211] Avg episode reward: [(0, '8.355')] [2025-01-04 07:33:04,410][134294] Updated weights for policy 0, policy_version 137134 (0.0024) [2025-01-04 07:33:07,348][134294] Updated weights for policy 0, policy_version 137144 (0.0026) [2025-01-04 07:33:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 561762304. Throughput: 0: 3469.2. Samples: 129608024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:33:08,969][134211] Avg episode reward: [(0, '9.544')] [2025-01-04 07:33:10,356][134294] Updated weights for policy 0, policy_version 137154 (0.0026) [2025-01-04 07:33:12,779][134294] Updated weights for policy 0, policy_version 137164 (0.0019) [2025-01-04 07:33:13,967][134211] Fps is (10 sec: 15565.3, 60 sec: 14267.8, 300 sec: 14273.5). Total num frames: 561848320. Throughput: 0: 3343.2. Samples: 129631200. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:33:13,968][134211] Avg episode reward: [(0, '9.764')] [2025-01-04 07:33:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000137170_561848320.pth... 
[2025-01-04 07:33:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000136339_558444544.pth [2025-01-04 07:33:14,735][134294] Updated weights for policy 0, policy_version 137174 (0.0013) [2025-01-04 07:33:16,599][134294] Updated weights for policy 0, policy_version 137184 (0.0014) [2025-01-04 07:33:18,633][134294] Updated weights for policy 0, policy_version 137194 (0.0014) [2025-01-04 07:33:18,968][134211] Fps is (10 sec: 18841.5, 60 sec: 14882.1, 300 sec: 14398.5). Total num frames: 561950720. Throughput: 0: 3463.5. Samples: 129647346. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:33:18,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 07:33:20,636][134294] Updated weights for policy 0, policy_version 137204 (0.0013) [2025-01-04 07:33:23,841][134294] Updated weights for policy 0, policy_version 137214 (0.0024) [2025-01-04 07:33:23,968][134211] Fps is (10 sec: 18021.7, 60 sec: 14745.5, 300 sec: 14315.2). Total num frames: 562028544. Throughput: 0: 3670.1. Samples: 129675936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:33:23,969][134211] Avg episode reward: [(0, '8.647')] [2025-01-04 07:33:27,245][134294] Updated weights for policy 0, policy_version 137224 (0.0029) [2025-01-04 07:33:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 562089984. Throughput: 0: 3625.7. Samples: 129693614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:28,969][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 07:33:30,356][134294] Updated weights for policy 0, policy_version 137234 (0.0023) [2025-01-04 07:33:33,385][134294] Updated weights for policy 0, policy_version 137244 (0.0025) [2025-01-04 07:33:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14259.6). Total num frames: 562155520. Throughput: 0: 3622.3. Samples: 129703738. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:33,969][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 07:33:36,399][134294] Updated weights for policy 0, policy_version 137254 (0.0028) [2025-01-04 07:33:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 562221056. Throughput: 0: 3656.8. Samples: 129723608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:38,968][134211] Avg episode reward: [(0, '8.157')] [2025-01-04 07:33:39,611][134294] Updated weights for policy 0, policy_version 137264 (0.0027) [2025-01-04 07:33:42,665][134294] Updated weights for policy 0, policy_version 137274 (0.0028) [2025-01-04 07:33:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14273.5). Total num frames: 562286592. Throughput: 0: 3669.6. Samples: 129743374. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:43,968][134211] Avg episode reward: [(0, '8.640')] [2025-01-04 07:33:45,846][134294] Updated weights for policy 0, policy_version 137284 (0.0024) [2025-01-04 07:33:48,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14199.4, 300 sec: 14273.5). Total num frames: 562352128. Throughput: 0: 3665.3. Samples: 129753146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:48,968][134211] Avg episode reward: [(0, '8.444')] [2025-01-04 07:33:49,080][134294] Updated weights for policy 0, policy_version 137294 (0.0025) [2025-01-04 07:33:52,397][134294] Updated weights for policy 0, policy_version 137304 (0.0028) [2025-01-04 07:33:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.4, 300 sec: 14259.6). Total num frames: 562413568. Throughput: 0: 3640.9. Samples: 129771866. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:53,968][134211] Avg episode reward: [(0, '8.350')] [2025-01-04 07:33:55,742][134294] Updated weights for policy 0, policy_version 137314 (0.0026) [2025-01-04 07:33:58,513][134294] Updated weights for policy 0, policy_version 137324 (0.0020) [2025-01-04 07:33:58,967][134211] Fps is (10 sec: 13517.3, 60 sec: 14336.1, 300 sec: 14301.3). Total num frames: 562487296. Throughput: 0: 3561.7. Samples: 129791478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:33:58,968][134211] Avg episode reward: [(0, '8.000')] [2025-01-04 07:34:00,516][134294] Updated weights for policy 0, policy_version 137334 (0.0013) [2025-01-04 07:34:02,421][134294] Updated weights for policy 0, policy_version 137344 (0.0015) [2025-01-04 07:34:03,967][134211] Fps is (10 sec: 18023.0, 60 sec: 15018.8, 300 sec: 14315.2). Total num frames: 562593792. Throughput: 0: 3547.8. Samples: 129806994. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:03,968][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 07:34:04,309][134294] Updated weights for policy 0, policy_version 137354 (0.0014) [2025-01-04 07:34:06,278][134294] Updated weights for policy 0, policy_version 137364 (0.0014) [2025-01-04 07:34:08,164][134294] Updated weights for policy 0, policy_version 137374 (0.0013) [2025-01-04 07:34:08,968][134211] Fps is (10 sec: 21299.0, 60 sec: 15633.1, 300 sec: 14440.2). Total num frames: 562700288. Throughput: 0: 3622.0. Samples: 129838924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:08,968][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 07:34:10,070][134294] Updated weights for policy 0, policy_version 137384 (0.0013) [2025-01-04 07:34:12,675][134294] Updated weights for policy 0, policy_version 137394 (0.0022) [2025-01-04 07:34:13,968][134211] Fps is (10 sec: 18430.4, 60 sec: 15496.3, 300 sec: 14495.6). Total num frames: 562778112. Throughput: 0: 3834.3. Samples: 129866162. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:13,969][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 07:34:16,087][134294] Updated weights for policy 0, policy_version 137404 (0.0028) [2025-01-04 07:34:18,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14745.6, 300 sec: 14481.8). Total num frames: 562835456. Throughput: 0: 3805.4. Samples: 129874982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:18,968][134211] Avg episode reward: [(0, '8.464')] [2025-01-04 07:34:19,749][134294] Updated weights for policy 0, policy_version 137414 (0.0030) [2025-01-04 07:34:23,355][134294] Updated weights for policy 0, policy_version 137424 (0.0030) [2025-01-04 07:34:23,968][134211] Fps is (10 sec: 11469.6, 60 sec: 14404.3, 300 sec: 14384.6). Total num frames: 562892800. Throughput: 0: 3748.1. Samples: 129892272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:23,968][134211] Avg episode reward: [(0, '8.413')] [2025-01-04 07:34:26,915][134294] Updated weights for policy 0, policy_version 137434 (0.0025) [2025-01-04 07:34:28,968][134211] Fps is (10 sec: 11469.0, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 562950144. Throughput: 0: 3680.1. Samples: 129908976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:28,968][134211] Avg episode reward: [(0, '7.889')] [2025-01-04 07:34:30,641][134294] Updated weights for policy 0, policy_version 137444 (0.0023) [2025-01-04 07:34:33,968][134211] Fps is (10 sec: 11468.6, 60 sec: 14199.5, 300 sec: 14259.7). Total num frames: 563007488. Throughput: 0: 3654.3. Samples: 129917588. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:33,968][134211] Avg episode reward: [(0, '7.689')] [2025-01-04 07:34:33,995][134294] Updated weights for policy 0, policy_version 137454 (0.0025) [2025-01-04 07:34:37,099][134294] Updated weights for policy 0, policy_version 137464 (0.0025) [2025-01-04 07:34:38,970][134211] Fps is (10 sec: 12285.3, 60 sec: 14198.9, 300 sec: 14259.6). Total num frames: 563073024. Throughput: 0: 3658.8. Samples: 129936520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:38,970][134211] Avg episode reward: [(0, '8.469')] [2025-01-04 07:34:40,229][134294] Updated weights for policy 0, policy_version 137474 (0.0025) [2025-01-04 07:34:43,261][134294] Updated weights for policy 0, policy_version 137484 (0.0027) [2025-01-04 07:34:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14301.3). Total num frames: 563142656. Throughput: 0: 3671.7. Samples: 129956706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:43,968][134211] Avg episode reward: [(0, '8.324')] [2025-01-04 07:34:46,186][134294] Updated weights for policy 0, policy_version 137494 (0.0025) [2025-01-04 07:34:48,968][134211] Fps is (10 sec: 13519.8, 60 sec: 14267.8, 300 sec: 14301.3). Total num frames: 563208192. Throughput: 0: 3551.6. Samples: 129966818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:48,968][134211] Avg episode reward: [(0, '8.029')] [2025-01-04 07:34:49,627][134294] Updated weights for policy 0, policy_version 137504 (0.0029) [2025-01-04 07:34:52,822][134294] Updated weights for policy 0, policy_version 137514 (0.0024) [2025-01-04 07:34:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.8, 300 sec: 14218.0). Total num frames: 563269632. Throughput: 0: 3258.0. Samples: 129985534. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:53,968][134211] Avg episode reward: [(0, '7.757')] [2025-01-04 07:34:56,125][134294] Updated weights for policy 0, policy_version 137524 (0.0029) [2025-01-04 07:34:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.1, 300 sec: 14079.1). Total num frames: 563335168. Throughput: 0: 3074.5. Samples: 130004512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:34:58,968][134211] Avg episode reward: [(0, '9.395')] [2025-01-04 07:34:59,249][134294] Updated weights for policy 0, policy_version 137534 (0.0025) [2025-01-04 07:35:01,568][134294] Updated weights for policy 0, policy_version 137544 (0.0016) [2025-01-04 07:35:03,494][134294] Updated weights for policy 0, policy_version 137554 (0.0014) [2025-01-04 07:35:03,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13926.4, 300 sec: 14176.3). Total num frames: 563429376. Throughput: 0: 3142.1. Samples: 130016376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:03,968][134211] Avg episode reward: [(0, '8.516')] [2025-01-04 07:35:05,413][134294] Updated weights for policy 0, policy_version 137564 (0.0011) [2025-01-04 07:35:07,272][134294] Updated weights for policy 0, policy_version 137574 (0.0014) [2025-01-04 07:35:08,967][134211] Fps is (10 sec: 19661.3, 60 sec: 13858.1, 300 sec: 14301.3). Total num frames: 563531776. Throughput: 0: 3467.1. Samples: 130048290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:08,968][134211] Avg episode reward: [(0, '7.768')] [2025-01-04 07:35:09,396][134294] Updated weights for policy 0, policy_version 137584 (0.0014) [2025-01-04 07:35:12,596][134294] Updated weights for policy 0, policy_version 137594 (0.0029) [2025-01-04 07:35:13,968][134211] Fps is (10 sec: 16793.1, 60 sec: 13653.4, 300 sec: 14329.1). Total num frames: 563597312. Throughput: 0: 3601.8. Samples: 130071060. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:13,969][134211] Avg episode reward: [(0, '9.357')] [2025-01-04 07:35:14,035][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000137598_563601408.pth... [2025-01-04 07:35:14,114][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000136757_560156672.pth [2025-01-04 07:35:15,958][134294] Updated weights for policy 0, policy_version 137604 (0.0029) [2025-01-04 07:35:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13789.9, 300 sec: 14343.0). Total num frames: 563662848. Throughput: 0: 3616.9. Samples: 130080350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:18,968][134211] Avg episode reward: [(0, '8.137')] [2025-01-04 07:35:19,091][134294] Updated weights for policy 0, policy_version 137614 (0.0026) [2025-01-04 07:35:22,481][134294] Updated weights for policy 0, policy_version 137624 (0.0025) [2025-01-04 07:35:23,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13858.1, 300 sec: 14356.8). Total num frames: 563724288. Throughput: 0: 3616.4. Samples: 130099248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:23,968][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 07:35:25,904][134294] Updated weights for policy 0, policy_version 137634 (0.0025) [2025-01-04 07:35:28,937][134294] Updated weights for policy 0, policy_version 137644 (0.0024) [2025-01-04 07:35:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.7, 300 sec: 14273.5). Total num frames: 563789824. Throughput: 0: 3587.2. Samples: 130118128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:28,968][134211] Avg episode reward: [(0, '8.105')] [2025-01-04 07:35:31,999][134294] Updated weights for policy 0, policy_version 137654 (0.0025) [2025-01-04 07:35:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14134.7). Total num frames: 563855360. 
Throughput: 0: 3586.0. Samples: 130128188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:33,968][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 07:35:35,091][134294] Updated weights for policy 0, policy_version 137664 (0.0026) [2025-01-04 07:35:37,937][134294] Updated weights for policy 0, policy_version 137674 (0.0023) [2025-01-04 07:35:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14200.0, 300 sec: 14079.1). Total num frames: 563924992. Throughput: 0: 3626.7. Samples: 130148734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:38,968][134211] Avg episode reward: [(0, '9.504')] [2025-01-04 07:35:41,038][134294] Updated weights for policy 0, policy_version 137684 (0.0027) [2025-01-04 07:35:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 563990528. Throughput: 0: 3652.6. Samples: 130168878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:43,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 07:35:44,054][134294] Updated weights for policy 0, policy_version 137694 (0.0023) [2025-01-04 07:35:47,173][134294] Updated weights for policy 0, policy_version 137704 (0.0025) [2025-01-04 07:35:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14120.8). Total num frames: 564056064. Throughput: 0: 3611.6. Samples: 130178900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:48,968][134211] Avg episode reward: [(0, '8.984')] [2025-01-04 07:35:50,508][134294] Updated weights for policy 0, policy_version 137714 (0.0027) [2025-01-04 07:35:53,887][134294] Updated weights for policy 0, policy_version 137724 (0.0027) [2025-01-04 07:35:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14120.8). Total num frames: 564117504. Throughput: 0: 3313.9. Samples: 130197418. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:53,968][134211] Avg episode reward: [(0, '7.844')] [2025-01-04 07:35:56,398][134294] Updated weights for policy 0, policy_version 137734 (0.0019) [2025-01-04 07:35:58,321][134294] Updated weights for policy 0, policy_version 137744 (0.0013) [2025-01-04 07:35:58,967][134211] Fps is (10 sec: 15565.0, 60 sec: 14609.1, 300 sec: 14218.0). Total num frames: 564211712. Throughput: 0: 3365.2. Samples: 130222492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:35:58,968][134211] Avg episode reward: [(0, '8.261')] [2025-01-04 07:36:00,202][134294] Updated weights for policy 0, policy_version 137754 (0.0015) [2025-01-04 07:36:02,181][134294] Updated weights for policy 0, policy_version 137764 (0.0012) [2025-01-04 07:36:03,967][134211] Fps is (10 sec: 20070.7, 60 sec: 14813.9, 300 sec: 14343.0). Total num frames: 564318208. Throughput: 0: 3513.6. Samples: 130238460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:03,968][134211] Avg episode reward: [(0, '8.212')] [2025-01-04 07:36:04,028][134294] Updated weights for policy 0, policy_version 137774 (0.0015) [2025-01-04 07:36:06,484][134294] Updated weights for policy 0, policy_version 137784 (0.0020) [2025-01-04 07:36:08,968][134211] Fps is (10 sec: 18021.9, 60 sec: 14335.9, 300 sec: 14370.7). Total num frames: 564391936. Throughput: 0: 3713.5. Samples: 130266354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:08,968][134211] Avg episode reward: [(0, '9.443')] [2025-01-04 07:36:09,669][134294] Updated weights for policy 0, policy_version 137794 (0.0028) [2025-01-04 07:36:12,916][134294] Updated weights for policy 0, policy_version 137804 (0.0029) [2025-01-04 07:36:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 564457472. Throughput: 0: 3719.4. Samples: 130285502. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:13,968][134211] Avg episode reward: [(0, '8.187')] [2025-01-04 07:36:15,943][134294] Updated weights for policy 0, policy_version 137814 (0.0026) [2025-01-04 07:36:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.0, 300 sec: 14384.6). Total num frames: 564523008. Throughput: 0: 3720.0. Samples: 130295588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:18,968][134211] Avg episode reward: [(0, '8.656')] [2025-01-04 07:36:18,992][134294] Updated weights for policy 0, policy_version 137824 (0.0025) [2025-01-04 07:36:22,128][134294] Updated weights for policy 0, policy_version 137834 (0.0024) [2025-01-04 07:36:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 564584448. Throughput: 0: 3697.6. Samples: 130315128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:23,968][134211] Avg episode reward: [(0, '8.126')] [2025-01-04 07:36:25,866][134294] Updated weights for policy 0, policy_version 137844 (0.0027) [2025-01-04 07:36:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14267.7, 300 sec: 14356.8). Total num frames: 564645888. Throughput: 0: 3630.3. Samples: 130332242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:36:28,968][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 07:36:29,380][134294] Updated weights for policy 0, policy_version 137854 (0.0025) [2025-01-04 07:36:32,735][134294] Updated weights for policy 0, policy_version 137864 (0.0024) [2025-01-04 07:36:33,969][134211] Fps is (10 sec: 11877.3, 60 sec: 14131.0, 300 sec: 14204.0). Total num frames: 564703232. Throughput: 0: 3605.3. Samples: 130341142. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:33,970][134211] Avg episode reward: [(0, '9.012')] [2025-01-04 07:36:36,056][134294] Updated weights for policy 0, policy_version 137874 (0.0024) [2025-01-04 07:36:38,971][134211] Fps is (10 sec: 12284.3, 60 sec: 14062.2, 300 sec: 14065.1). Total num frames: 564768768. Throughput: 0: 3612.3. Samples: 130359982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:38,971][134211] Avg episode reward: [(0, '9.045')] [2025-01-04 07:36:39,231][134294] Updated weights for policy 0, policy_version 137884 (0.0027) [2025-01-04 07:36:42,051][134294] Updated weights for policy 0, policy_version 137894 (0.0025) [2025-01-04 07:36:43,968][134211] Fps is (10 sec: 13518.0, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 564838400. Throughput: 0: 3506.6. Samples: 130380288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:43,968][134211] Avg episode reward: [(0, '8.875')] [2025-01-04 07:36:45,099][134294] Updated weights for policy 0, policy_version 137904 (0.0024) [2025-01-04 07:36:47,433][134294] Updated weights for policy 0, policy_version 137914 (0.0016) [2025-01-04 07:36:48,968][134211] Fps is (10 sec: 15569.8, 60 sec: 14472.5, 300 sec: 14218.0). Total num frames: 564924416. Throughput: 0: 3377.6. Samples: 130390454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:48,968][134211] Avg episode reward: [(0, '9.322')] [2025-01-04 07:36:49,408][134294] Updated weights for policy 0, policy_version 137924 (0.0013) [2025-01-04 07:36:51,761][134294] Updated weights for policy 0, policy_version 137934 (0.0021) [2025-01-04 07:36:53,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14745.6, 300 sec: 14273.5). Total num frames: 565002240. Throughput: 0: 3387.7. Samples: 130418800. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:53,969][134211] Avg episode reward: [(0, '8.676')] [2025-01-04 07:36:55,279][134294] Updated weights for policy 0, policy_version 137944 (0.0026) [2025-01-04 07:36:58,221][134294] Updated weights for policy 0, policy_version 137954 (0.0025) [2025-01-04 07:36:58,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.4, 300 sec: 14259.6). Total num frames: 565063680. Throughput: 0: 3381.1. Samples: 130437652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:36:58,968][134211] Avg episode reward: [(0, '7.541')] [2025-01-04 07:37:01,387][134294] Updated weights for policy 0, policy_version 137964 (0.0025) [2025-01-04 07:37:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13585.1, 300 sec: 14259.6). Total num frames: 565133312. Throughput: 0: 3379.6. Samples: 130447670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:03,968][134211] Avg episode reward: [(0, '8.992')] [2025-01-04 07:37:04,581][134294] Updated weights for policy 0, policy_version 137974 (0.0026) [2025-01-04 07:37:07,663][134294] Updated weights for policy 0, policy_version 137984 (0.0025) [2025-01-04 07:37:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 14245.7). Total num frames: 565194752. Throughput: 0: 3385.5. Samples: 130467474. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:08,968][134211] Avg episode reward: [(0, '8.674')] [2025-01-04 07:37:10,988][134294] Updated weights for policy 0, policy_version 137994 (0.0029) [2025-01-04 07:37:13,967][134211] Fps is (10 sec: 12697.7, 60 sec: 13380.3, 300 sec: 14245.8). Total num frames: 565260288. Throughput: 0: 3420.8. Samples: 130486178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:13,968][134211] Avg episode reward: [(0, '9.488')] [2025-01-04 07:37:14,022][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138004_565264384.pth... 
[2025-01-04 07:37:14,026][134294] Updated weights for policy 0, policy_version 138004 (0.0025) [2025-01-04 07:37:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000137170_561848320.pth [2025-01-04 07:37:16,056][134294] Updated weights for policy 0, policy_version 138014 (0.0015) [2025-01-04 07:37:18,032][134294] Updated weights for policy 0, policy_version 138024 (0.0016) [2025-01-04 07:37:18,967][134211] Fps is (10 sec: 16794.1, 60 sec: 13994.7, 300 sec: 14301.3). Total num frames: 565362688. Throughput: 0: 3533.9. Samples: 130500162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:18,968][134211] Avg episode reward: [(0, '7.947')] [2025-01-04 07:37:20,122][134294] Updated weights for policy 0, policy_version 138034 (0.0017) [2025-01-04 07:37:23,519][134294] Updated weights for policy 0, policy_version 138044 (0.0027) [2025-01-04 07:37:23,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 565432320. Throughput: 0: 3704.2. Samples: 130526658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:23,968][134211] Avg episode reward: [(0, '7.844')] [2025-01-04 07:37:26,870][134294] Updated weights for policy 0, policy_version 138054 (0.0031) [2025-01-04 07:37:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 565493760. Throughput: 0: 3651.0. Samples: 130544584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:28,968][134211] Avg episode reward: [(0, '8.364')] [2025-01-04 07:37:30,083][134294] Updated weights for policy 0, policy_version 138064 (0.0028) [2025-01-04 07:37:33,097][134294] Updated weights for policy 0, policy_version 138074 (0.0027) [2025-01-04 07:37:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14267.9, 300 sec: 14190.2). Total num frames: 565559296. Throughput: 0: 3641.7. Samples: 130554332. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:37:33,969][134211] Avg episode reward: [(0, '9.597')] [2025-01-04 07:37:36,181][134294] Updated weights for policy 0, policy_version 138084 (0.0027) [2025-01-04 07:37:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14268.5, 300 sec: 14190.2). Total num frames: 565624832. Throughput: 0: 3458.1. Samples: 130574414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:37:38,969][134211] Avg episode reward: [(0, '9.150')] [2025-01-04 07:37:39,307][134294] Updated weights for policy 0, policy_version 138094 (0.0024) [2025-01-04 07:37:42,348][134294] Updated weights for policy 0, policy_version 138104 (0.0026) [2025-01-04 07:37:43,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14267.8, 300 sec: 14218.0). Total num frames: 565694464. Throughput: 0: 3485.6. Samples: 130594502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:37:43,968][134211] Avg episode reward: [(0, '9.317')] [2025-01-04 07:37:45,279][134294] Updated weights for policy 0, policy_version 138114 (0.0023) [2025-01-04 07:37:48,328][134294] Updated weights for policy 0, policy_version 138124 (0.0029) [2025-01-04 07:37:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.3, 300 sec: 14231.9). Total num frames: 565760000. Throughput: 0: 3495.8. Samples: 130604982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:37:48,968][134211] Avg episode reward: [(0, '8.667')] [2025-01-04 07:37:51,545][134294] Updated weights for policy 0, policy_version 138134 (0.0025) [2025-01-04 07:37:53,652][134294] Updated weights for policy 0, policy_version 138144 (0.0012) [2025-01-04 07:37:53,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13994.7, 300 sec: 14287.4). Total num frames: 565841920. Throughput: 0: 3514.0. Samples: 130625604. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:37:53,968][134211] Avg episode reward: [(0, '8.004')] [2025-01-04 07:37:55,738][134294] Updated weights for policy 0, policy_version 138154 (0.0014) [2025-01-04 07:37:57,648][134294] Updated weights for policy 0, policy_version 138164 (0.0012) [2025-01-04 07:37:58,967][134211] Fps is (10 sec: 18842.1, 60 sec: 14745.7, 300 sec: 14426.3). Total num frames: 565948416. Throughput: 0: 3777.2. Samples: 130656154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:37:58,968][134211] Avg episode reward: [(0, '8.182')] [2025-01-04 07:37:59,545][134294] Updated weights for policy 0, policy_version 138174 (0.0013) [2025-01-04 07:38:01,821][134294] Updated weights for policy 0, policy_version 138184 (0.0018) [2025-01-04 07:38:03,968][134211] Fps is (10 sec: 18431.9, 60 sec: 14882.1, 300 sec: 14454.0). Total num frames: 566026240. Throughput: 0: 3804.2. Samples: 130671354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:03,968][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 07:38:05,313][134294] Updated weights for policy 0, policy_version 138194 (0.0029) [2025-01-04 07:38:08,452][134294] Updated weights for policy 0, policy_version 138204 (0.0026) [2025-01-04 07:38:08,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14882.1, 300 sec: 14370.7). Total num frames: 566087680. Throughput: 0: 3628.3. Samples: 130689930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:08,968][134211] Avg episode reward: [(0, '8.467')] [2025-01-04 07:38:11,626][134294] Updated weights for policy 0, policy_version 138214 (0.0027) [2025-01-04 07:38:13,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.1, 300 sec: 14245.8). Total num frames: 566153216. Throughput: 0: 3666.5. Samples: 130709578. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:13,968][134211] Avg episode reward: [(0, '9.439')] [2025-01-04 07:38:14,667][134294] Updated weights for policy 0, policy_version 138224 (0.0026) [2025-01-04 07:38:17,837][134294] Updated weights for policy 0, policy_version 138234 (0.0025) [2025-01-04 07:38:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 566218752. Throughput: 0: 3666.9. Samples: 130719340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:18,968][134211] Avg episode reward: [(0, '8.520')] [2025-01-04 07:38:21,120][134294] Updated weights for policy 0, policy_version 138244 (0.0027) [2025-01-04 07:38:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14190.2). Total num frames: 566276096. Throughput: 0: 3636.3. Samples: 130738048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:23,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 07:38:24,673][134294] Updated weights for policy 0, policy_version 138254 (0.0025) [2025-01-04 07:38:28,256][134294] Updated weights for policy 0, policy_version 138264 (0.0025) [2025-01-04 07:38:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14062.9, 300 sec: 14176.3). Total num frames: 566337536. Throughput: 0: 3574.0. Samples: 130755332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:28,968][134211] Avg episode reward: [(0, '7.856')] [2025-01-04 07:38:31,584][134294] Updated weights for policy 0, policy_version 138274 (0.0027) [2025-01-04 07:38:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13926.4, 300 sec: 14148.6). Total num frames: 566394880. Throughput: 0: 3542.0. Samples: 130764372. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:33,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 07:38:35,083][134294] Updated weights for policy 0, policy_version 138284 (0.0026) [2025-01-04 07:38:37,151][134294] Updated weights for policy 0, policy_version 138294 (0.0012) [2025-01-04 07:38:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14404.3, 300 sec: 14245.8). Total num frames: 566489088. Throughput: 0: 3580.7. Samples: 130786734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:38:38,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 07:38:39,017][134294] Updated weights for policy 0, policy_version 138304 (0.0014) [2025-01-04 07:38:40,886][134294] Updated weights for policy 0, policy_version 138314 (0.0014) [2025-01-04 07:38:42,775][134294] Updated weights for policy 0, policy_version 138324 (0.0013) [2025-01-04 07:38:43,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15086.9, 300 sec: 14398.5). Total num frames: 566599680. Throughput: 0: 3623.9. Samples: 130819232. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:38:43,968][134211] Avg episode reward: [(0, '8.658')] [2025-01-04 07:38:44,611][134294] Updated weights for policy 0, policy_version 138334 (0.0013) [2025-01-04 07:38:47,326][134294] Updated weights for policy 0, policy_version 138344 (0.0023) [2025-01-04 07:38:48,968][134211] Fps is (10 sec: 18840.4, 60 sec: 15291.6, 300 sec: 14454.0). Total num frames: 566677504. Throughput: 0: 3607.6. Samples: 130833698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:38:48,969][134211] Avg episode reward: [(0, '7.958')] [2025-01-04 07:38:50,532][134294] Updated weights for policy 0, policy_version 138354 (0.0030) [2025-01-04 07:38:53,907][134294] Updated weights for policy 0, policy_version 138364 (0.0028) [2025-01-04 07:38:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14950.4, 300 sec: 14412.3). Total num frames: 566738944. Throughput: 0: 3621.0. Samples: 130852874. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:38:53,969][134211] Avg episode reward: [(0, '7.646')] [2025-01-04 07:38:57,295][134294] Updated weights for policy 0, policy_version 138374 (0.0030) [2025-01-04 07:38:58,968][134211] Fps is (10 sec: 12288.9, 60 sec: 14199.4, 300 sec: 14259.6). Total num frames: 566800384. Throughput: 0: 3586.3. Samples: 130870960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:38:58,968][134211] Avg episode reward: [(0, '9.347')] [2025-01-04 07:39:00,400][134294] Updated weights for policy 0, policy_version 138384 (0.0028) [2025-01-04 07:39:03,474][134294] Updated weights for policy 0, policy_version 138394 (0.0023) [2025-01-04 07:39:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 566865920. Throughput: 0: 3596.8. Samples: 130881194. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:03,968][134211] Avg episode reward: [(0, '7.920')] [2025-01-04 07:39:06,394][134294] Updated weights for policy 0, policy_version 138404 (0.0023) [2025-01-04 07:39:08,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14131.1, 300 sec: 14093.0). Total num frames: 566935552. Throughput: 0: 3627.4. Samples: 130901284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:08,969][134211] Avg episode reward: [(0, '9.651')] [2025-01-04 07:39:09,680][134294] Updated weights for policy 0, policy_version 138414 (0.0027) [2025-01-04 07:39:12,705][134294] Updated weights for policy 0, policy_version 138424 (0.0028) [2025-01-04 07:39:13,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14130.9, 300 sec: 14120.7). Total num frames: 567001088. Throughput: 0: 3680.5. Samples: 130920960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:13,969][134211] Avg episode reward: [(0, '7.789')] [2025-01-04 07:39:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138428_567001088.pth... 
[2025-01-04 07:39:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000137598_563601408.pth [2025-01-04 07:39:15,701][134294] Updated weights for policy 0, policy_version 138434 (0.0027) [2025-01-04 07:39:18,655][134294] Updated weights for policy 0, policy_version 138444 (0.0026) [2025-01-04 07:39:18,969][134211] Fps is (10 sec: 13105.6, 60 sec: 14130.8, 300 sec: 14148.5). Total num frames: 567066624. Throughput: 0: 3707.5. Samples: 130931216. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:18,970][134211] Avg episode reward: [(0, '8.283')] [2025-01-04 07:39:21,948][134294] Updated weights for policy 0, policy_version 138454 (0.0027) [2025-01-04 07:39:23,968][134211] Fps is (10 sec: 12698.9, 60 sec: 14199.5, 300 sec: 14162.4). Total num frames: 567128064. Throughput: 0: 3645.5. Samples: 130950784. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:23,968][134211] Avg episode reward: [(0, '8.652')] [2025-01-04 07:39:25,434][134294] Updated weights for policy 0, policy_version 138464 (0.0027) [2025-01-04 07:39:28,530][134294] Updated weights for policy 0, policy_version 138474 (0.0024) [2025-01-04 07:39:28,971][134211] Fps is (10 sec: 12695.8, 60 sec: 14267.0, 300 sec: 14190.1). Total num frames: 567193600. Throughput: 0: 3334.4. Samples: 130969290. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:28,971][134211] Avg episode reward: [(0, '8.555')] [2025-01-04 07:39:31,499][134294] Updated weights for policy 0, policy_version 138484 (0.0024) [2025-01-04 07:39:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14204.2). Total num frames: 567263232. Throughput: 0: 3243.9. Samples: 130979674. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:33,968][134211] Avg episode reward: [(0, '8.093')] [2025-01-04 07:39:34,550][134294] Updated weights for policy 0, policy_version 138494 (0.0026) [2025-01-04 07:39:37,416][134294] Updated weights for policy 0, policy_version 138504 (0.0020) [2025-01-04 07:39:38,968][134211] Fps is (10 sec: 14750.4, 60 sec: 14199.5, 300 sec: 14231.9). Total num frames: 567341056. Throughput: 0: 3275.9. Samples: 131000290. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:38,968][134211] Avg episode reward: [(0, '8.902')] [2025-01-04 07:39:39,374][134294] Updated weights for policy 0, policy_version 138514 (0.0013) [2025-01-04 07:39:41,248][134294] Updated weights for policy 0, policy_version 138524 (0.0013) [2025-01-04 07:39:43,099][134294] Updated weights for policy 0, policy_version 138534 (0.0013) [2025-01-04 07:39:43,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14199.5, 300 sec: 14384.6). Total num frames: 567451648. Throughput: 0: 3588.3. Samples: 131032434. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:43,968][134211] Avg episode reward: [(0, '8.760')] [2025-01-04 07:39:44,990][134294] Updated weights for policy 0, policy_version 138544 (0.0014) [2025-01-04 07:39:47,234][134294] Updated weights for policy 0, policy_version 138554 (0.0020) [2025-01-04 07:39:48,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14267.9, 300 sec: 14454.0). Total num frames: 567533568. Throughput: 0: 3717.9. Samples: 131048498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:48,969][134211] Avg episode reward: [(0, '7.554')] [2025-01-04 07:39:51,056][134294] Updated weights for policy 0, policy_version 138564 (0.0032) [2025-01-04 07:39:53,969][134211] Fps is (10 sec: 13515.0, 60 sec: 14131.0, 300 sec: 14412.3). Total num frames: 567586816. Throughput: 0: 3659.4. Samples: 131065958. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:53,970][134211] Avg episode reward: [(0, '8.634')] [2025-01-04 07:39:54,738][134294] Updated weights for policy 0, policy_version 138574 (0.0029) [2025-01-04 07:39:58,105][134294] Updated weights for policy 0, policy_version 138584 (0.0025) [2025-01-04 07:39:58,969][134211] Fps is (10 sec: 11467.8, 60 sec: 14130.9, 300 sec: 14301.2). Total num frames: 567648256. Throughput: 0: 3611.6. Samples: 131083482. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:39:58,969][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 07:40:01,117][134294] Updated weights for policy 0, policy_version 138594 (0.0027) [2025-01-04 07:40:03,968][134211] Fps is (10 sec: 12698.9, 60 sec: 14131.2, 300 sec: 14176.3). Total num frames: 567713792. Throughput: 0: 3606.9. Samples: 131093522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:03,968][134211] Avg episode reward: [(0, '8.596')] [2025-01-04 07:40:04,433][134294] Updated weights for policy 0, policy_version 138604 (0.0028) [2025-01-04 07:40:07,398][134294] Updated weights for policy 0, policy_version 138614 (0.0026) [2025-01-04 07:40:08,968][134211] Fps is (10 sec: 13518.1, 60 sec: 14131.3, 300 sec: 14190.2). Total num frames: 567783424. Throughput: 0: 3609.5. Samples: 131113212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:08,968][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 07:40:10,448][134294] Updated weights for policy 0, policy_version 138624 (0.0026) [2025-01-04 07:40:13,339][134294] Updated weights for policy 0, policy_version 138634 (0.0025) [2025-01-04 07:40:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14199.7, 300 sec: 14204.1). Total num frames: 567853056. Throughput: 0: 3658.2. Samples: 131133898. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:13,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 07:40:16,258][134294] Updated weights for policy 0, policy_version 138644 (0.0023) [2025-01-04 07:40:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.9, 300 sec: 14218.0). Total num frames: 567918592. Throughput: 0: 3658.5. Samples: 131144304. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:18,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 07:40:19,369][134294] Updated weights for policy 0, policy_version 138654 (0.0028) [2025-01-04 07:40:22,505][134294] Updated weights for policy 0, policy_version 138664 (0.0024) [2025-01-04 07:40:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14199.5, 300 sec: 14204.1). Total num frames: 567980032. Throughput: 0: 3643.6. Samples: 131164254. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:23,969][134211] Avg episode reward: [(0, '7.933')] [2025-01-04 07:40:26,003][134294] Updated weights for policy 0, policy_version 138674 (0.0026) [2025-01-04 07:40:28,420][134294] Updated weights for policy 0, policy_version 138684 (0.0015) [2025-01-04 07:40:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14405.1, 300 sec: 14245.8). Total num frames: 568057856. Throughput: 0: 3381.8. Samples: 131184614. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:28,968][134211] Avg episode reward: [(0, '9.524')] [2025-01-04 07:40:30,613][134294] Updated weights for policy 0, policy_version 138694 (0.0015) [2025-01-04 07:40:33,719][134294] Updated weights for policy 0, policy_version 138704 (0.0026) [2025-01-04 07:40:33,968][134211] Fps is (10 sec: 15154.6, 60 sec: 14472.4, 300 sec: 14259.6). Total num frames: 568131584. Throughput: 0: 3318.3. Samples: 131197824. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:33,969][134211] Avg episode reward: [(0, '8.578')] [2025-01-04 07:40:36,764][134294] Updated weights for policy 0, policy_version 138714 (0.0027) [2025-01-04 07:40:38,970][134211] Fps is (10 sec: 14332.8, 60 sec: 14335.5, 300 sec: 14273.4). Total num frames: 568201216. Throughput: 0: 3376.4. Samples: 131217900. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:38,971][134211] Avg episode reward: [(0, '8.538')] [2025-01-04 07:40:39,874][134294] Updated weights for policy 0, policy_version 138724 (0.0025) [2025-01-04 07:40:42,853][134294] Updated weights for policy 0, policy_version 138734 (0.0026) [2025-01-04 07:40:43,968][134211] Fps is (10 sec: 14336.9, 60 sec: 13721.6, 300 sec: 14301.3). Total num frames: 568274944. Throughput: 0: 3442.1. Samples: 131238374. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 07:40:43,968][134211] Avg episode reward: [(0, '8.671')] [2025-01-04 07:40:44,838][134294] Updated weights for policy 0, policy_version 138744 (0.0015) [2025-01-04 07:40:47,583][134294] Updated weights for policy 0, policy_version 138754 (0.0022) [2025-01-04 07:40:48,968][134211] Fps is (10 sec: 15158.5, 60 sec: 13653.4, 300 sec: 14356.8). Total num frames: 568352768. Throughput: 0: 3527.4. Samples: 131252256. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:40:48,969][134211] Avg episode reward: [(0, '8.552')] [2025-01-04 07:40:50,770][134294] Updated weights for policy 0, policy_version 138764 (0.0026) [2025-01-04 07:40:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13790.1, 300 sec: 14245.7). Total num frames: 568414208. Throughput: 0: 3533.7. Samples: 131272228. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:40:53,974][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 07:40:53,985][134294] Updated weights for policy 0, policy_version 138774 (0.0026) [2025-01-04 07:40:56,992][134294] Updated weights for policy 0, policy_version 138784 (0.0020) [2025-01-04 07:40:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14131.5, 300 sec: 14162.4). Total num frames: 568496128. Throughput: 0: 3560.5. Samples: 131294120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:40:58,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 07:40:59,254][134294] Updated weights for policy 0, policy_version 138794 (0.0017) [2025-01-04 07:41:02,222][134294] Updated weights for policy 0, policy_version 138804 (0.0026) [2025-01-04 07:41:03,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14131.2, 300 sec: 14134.7). Total num frames: 568561664. Throughput: 0: 3574.3. Samples: 131305148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:03,968][134211] Avg episode reward: [(0, '7.753')] [2025-01-04 07:41:05,398][134294] Updated weights for policy 0, policy_version 138814 (0.0027) [2025-01-04 07:41:08,371][134294] Updated weights for policy 0, policy_version 138824 (0.0025) [2025-01-04 07:41:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14062.9, 300 sec: 14134.7). Total num frames: 568627200. Throughput: 0: 3569.3. Samples: 131324870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:08,968][134211] Avg episode reward: [(0, '9.033')] [2025-01-04 07:41:11,429][134294] Updated weights for policy 0, policy_version 138834 (0.0026) [2025-01-04 07:41:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 568696832. Throughput: 0: 3564.3. Samples: 131345008. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:13,968][134211] Avg episode reward: [(0, '8.682')] [2025-01-04 07:41:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138842_568696832.pth... [2025-01-04 07:41:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138004_565264384.pth [2025-01-04 07:41:14,580][134294] Updated weights for policy 0, policy_version 138844 (0.0026) [2025-01-04 07:41:17,561][134294] Updated weights for policy 0, policy_version 138854 (0.0025) [2025-01-04 07:41:18,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14267.8, 300 sec: 14204.1). Total num frames: 568774656. Throughput: 0: 3488.5. Samples: 131354804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:18,968][134211] Avg episode reward: [(0, '10.017')] [2025-01-04 07:41:19,503][134294] Updated weights for policy 0, policy_version 138864 (0.0015) [2025-01-04 07:41:21,566][134294] Updated weights for policy 0, policy_version 138874 (0.0013) [2025-01-04 07:41:23,608][134294] Updated weights for policy 0, policy_version 138884 (0.0013) [2025-01-04 07:41:23,968][134211] Fps is (10 sec: 17613.0, 60 sec: 14882.2, 300 sec: 14329.1). Total num frames: 568872960. Throughput: 0: 3678.0. Samples: 131383400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:23,968][134211] Avg episode reward: [(0, '8.798')] [2025-01-04 07:41:25,994][134294] Updated weights for policy 0, policy_version 138894 (0.0018) [2025-01-04 07:41:28,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14745.6, 300 sec: 14370.8). Total num frames: 568942592. Throughput: 0: 3756.8. Samples: 131407430. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:28,968][134211] Avg episode reward: [(0, '8.372')] [2025-01-04 07:41:29,324][134294] Updated weights for policy 0, policy_version 138904 (0.0028) [2025-01-04 07:41:32,746][134294] Updated weights for policy 0, policy_version 138914 (0.0026) [2025-01-04 07:41:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.9, 300 sec: 14357.0). Total num frames: 569004032. Throughput: 0: 3640.6. Samples: 131416084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:33,968][134211] Avg episode reward: [(0, '8.453')] [2025-01-04 07:41:35,886][134294] Updated weights for policy 0, policy_version 138924 (0.0026) [2025-01-04 07:41:38,901][134294] Updated weights for policy 0, policy_version 138934 (0.0026) [2025-01-04 07:41:38,968][134211] Fps is (10 sec: 13106.2, 60 sec: 14541.1, 300 sec: 14356.8). Total num frames: 569073664. Throughput: 0: 3637.4. Samples: 131435914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:38,969][134211] Avg episode reward: [(0, '8.565')] [2025-01-04 07:41:41,944][134294] Updated weights for policy 0, policy_version 138944 (0.0027) [2025-01-04 07:41:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.2, 300 sec: 14287.4). Total num frames: 569139200. Throughput: 0: 3594.4. Samples: 131455870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:41:43,968][134211] Avg episode reward: [(0, '9.908')] [2025-01-04 07:41:45,081][134294] Updated weights for policy 0, policy_version 138954 (0.0026) [2025-01-04 07:41:48,056][134294] Updated weights for policy 0, policy_version 138964 (0.0027) [2025-01-04 07:41:48,968][134211] Fps is (10 sec: 13108.1, 60 sec: 14199.4, 300 sec: 14245.8). Total num frames: 569204736. Throughput: 0: 3569.6. Samples: 131465780. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:41:48,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 07:41:51,339][134294] Updated weights for policy 0, policy_version 138974 (0.0025) [2025-01-04 07:41:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.5, 300 sec: 14245.8). Total num frames: 569266176. Throughput: 0: 3560.0. Samples: 131485068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:41:53,968][134211] Avg episode reward: [(0, '8.647')] [2025-01-04 07:41:54,853][134294] Updated weights for policy 0, policy_version 138984 (0.0026) [2025-01-04 07:41:57,811][134294] Updated weights for policy 0, policy_version 138994 (0.0023) [2025-01-04 07:41:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14063.0, 300 sec: 14259.6). Total num frames: 569339904. Throughput: 0: 3557.0. Samples: 131505072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:41:58,968][134211] Avg episode reward: [(0, '8.049')] [2025-01-04 07:41:59,710][134294] Updated weights for policy 0, policy_version 139004 (0.0012) [2025-01-04 07:42:01,637][134294] Updated weights for policy 0, policy_version 139014 (0.0012) [2025-01-04 07:42:03,522][134294] Updated weights for policy 0, policy_version 139024 (0.0013) [2025-01-04 07:42:03,968][134211] Fps is (10 sec: 18432.2, 60 sec: 14813.9, 300 sec: 14426.3). Total num frames: 569450496. Throughput: 0: 3694.8. Samples: 131521070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:03,968][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 07:42:05,614][134294] Updated weights for policy 0, policy_version 139034 (0.0016) [2025-01-04 07:42:08,671][134294] Updated weights for policy 0, policy_version 139044 (0.0029) [2025-01-04 07:42:08,968][134211] Fps is (10 sec: 18431.8, 60 sec: 14950.4, 300 sec: 14454.0). Total num frames: 569524224. Throughput: 0: 3691.4. Samples: 131549514. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:08,968][134211] Avg episode reward: [(0, '8.379')] [2025-01-04 07:42:11,826][134294] Updated weights for policy 0, policy_version 139054 (0.0025) [2025-01-04 07:42:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14329.0). Total num frames: 569589760. Throughput: 0: 3588.8. Samples: 131568926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:13,968][134211] Avg episode reward: [(0, '8.730')] [2025-01-04 07:42:15,129][134294] Updated weights for policy 0, policy_version 139064 (0.0025) [2025-01-04 07:42:18,329][134294] Updated weights for policy 0, policy_version 139074 (0.0028) [2025-01-04 07:42:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14609.0, 300 sec: 14301.3). Total num frames: 569651200. Throughput: 0: 3599.1. Samples: 131578044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:18,968][134211] Avg episode reward: [(0, '8.531')] [2025-01-04 07:42:21,723][134294] Updated weights for policy 0, policy_version 139084 (0.0032) [2025-01-04 07:42:23,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13994.6, 300 sec: 14301.3). Total num frames: 569712640. Throughput: 0: 3570.1. Samples: 131596566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:23,968][134211] Avg episode reward: [(0, '9.258')] [2025-01-04 07:42:25,166][134294] Updated weights for policy 0, policy_version 139094 (0.0026) [2025-01-04 07:42:28,568][134294] Updated weights for policy 0, policy_version 139104 (0.0023) [2025-01-04 07:42:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13789.8, 300 sec: 14273.5). Total num frames: 569769984. Throughput: 0: 3523.4. Samples: 131614422. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:28,968][134211] Avg episode reward: [(0, '8.460')] [2025-01-04 07:42:32,016][134294] Updated weights for policy 0, policy_version 139114 (0.0022) [2025-01-04 07:42:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13789.8, 300 sec: 14259.6). Total num frames: 569831424. Throughput: 0: 3503.1. Samples: 131623418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:33,968][134211] Avg episode reward: [(0, '7.869')] [2025-01-04 07:42:35,375][134294] Updated weights for policy 0, policy_version 139124 (0.0025) [2025-01-04 07:42:37,480][134294] Updated weights for policy 0, policy_version 139134 (0.0014) [2025-01-04 07:42:38,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14131.4, 300 sec: 14329.1). Total num frames: 569921536. Throughput: 0: 3555.1. Samples: 131645046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:38,968][134211] Avg episode reward: [(0, '7.974')] [2025-01-04 07:42:39,516][134294] Updated weights for policy 0, policy_version 139144 (0.0012) [2025-01-04 07:42:41,506][134294] Updated weights for policy 0, policy_version 139154 (0.0014) [2025-01-04 07:42:43,396][134294] Updated weights for policy 0, policy_version 139164 (0.0012) [2025-01-04 07:42:43,968][134211] Fps is (10 sec: 19661.3, 60 sec: 14813.9, 300 sec: 14467.9). Total num frames: 570028032. Throughput: 0: 3798.7. Samples: 131676012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:43,968][134211] Avg episode reward: [(0, '9.219')] [2025-01-04 07:42:45,313][134294] Updated weights for policy 0, policy_version 139174 (0.0012) [2025-01-04 07:42:47,771][134294] Updated weights for policy 0, policy_version 139184 (0.0021) [2025-01-04 07:42:48,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15086.9, 300 sec: 14467.9). Total num frames: 570109952. Throughput: 0: 3793.4. Samples: 131691774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:48,968][134211] Avg episode reward: [(0, '8.633')] [2025-01-04 07:42:51,208][134294] Updated weights for policy 0, policy_version 139194 (0.0031) [2025-01-04 07:42:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15018.6, 300 sec: 14301.3). Total num frames: 570167296. Throughput: 0: 3578.5. Samples: 131710548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:53,969][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 07:42:54,833][134294] Updated weights for policy 0, policy_version 139204 (0.0027) [2025-01-04 07:42:58,390][134294] Updated weights for policy 0, policy_version 139214 (0.0027) [2025-01-04 07:42:58,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14745.5, 300 sec: 14231.9). Total num frames: 570224640. Throughput: 0: 3529.2. Samples: 131727738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:42:58,968][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 07:43:01,803][134294] Updated weights for policy 0, policy_version 139224 (0.0024) [2025-01-04 07:43:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 570286080. Throughput: 0: 3523.1. Samples: 131736584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:03,968][134211] Avg episode reward: [(0, '9.091')] [2025-01-04 07:43:05,396][134294] Updated weights for policy 0, policy_version 139234 (0.0027) [2025-01-04 07:43:08,673][134294] Updated weights for policy 0, policy_version 139244 (0.0029) [2025-01-04 07:43:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13653.3, 300 sec: 14204.1). Total num frames: 570343424. Throughput: 0: 3505.4. Samples: 131754308. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:08,968][134211] Avg episode reward: [(0, '8.087')] [2025-01-04 07:43:11,811][134294] Updated weights for policy 0, policy_version 139254 (0.0026) [2025-01-04 07:43:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13653.3, 300 sec: 14204.1). Total num frames: 570408960. Throughput: 0: 3537.9. Samples: 131773630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:13,969][134211] Avg episode reward: [(0, '9.339')] [2025-01-04 07:43:14,049][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000139261_570413056.pth... [2025-01-04 07:43:14,127][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138428_567001088.pth [2025-01-04 07:43:15,243][134294] Updated weights for policy 0, policy_version 139264 (0.0027) [2025-01-04 07:43:18,313][134294] Updated weights for policy 0, policy_version 139274 (0.0026) [2025-01-04 07:43:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13653.3, 300 sec: 14218.0). Total num frames: 570470400. Throughput: 0: 3537.9. Samples: 131782622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:18,968][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 07:43:21,421][134294] Updated weights for policy 0, policy_version 139284 (0.0023) [2025-01-04 07:43:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13721.6, 300 sec: 14231.9). Total num frames: 570535936. Throughput: 0: 3498.7. Samples: 131802490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:23,968][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 07:43:24,582][134294] Updated weights for policy 0, policy_version 139294 (0.0026) [2025-01-04 07:43:28,074][134294] Updated weights for policy 0, policy_version 139304 (0.0026) [2025-01-04 07:43:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 14245.7). Total num frames: 570597376. 
Throughput: 0: 3218.3. Samples: 131820836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:28,968][134211] Avg episode reward: [(0, '8.361')] [2025-01-04 07:43:30,980][134294] Updated weights for policy 0, policy_version 139314 (0.0024) [2025-01-04 07:43:32,878][134294] Updated weights for policy 0, policy_version 139324 (0.0013) [2025-01-04 07:43:33,967][134211] Fps is (10 sec: 15565.3, 60 sec: 14336.1, 300 sec: 14245.8). Total num frames: 570691584. Throughput: 0: 3111.3. Samples: 131831780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:33,968][134211] Avg episode reward: [(0, '8.276')] [2025-01-04 07:43:34,770][134294] Updated weights for policy 0, policy_version 139334 (0.0012) [2025-01-04 07:43:36,660][134294] Updated weights for policy 0, policy_version 139344 (0.0014) [2025-01-04 07:43:38,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 570785792. Throughput: 0: 3402.4. Samples: 131863654. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:38,968][134211] Avg episode reward: [(0, '8.022')] [2025-01-04 07:43:39,690][134294] Updated weights for policy 0, policy_version 139354 (0.0024) [2025-01-04 07:43:42,930][134294] Updated weights for policy 0, policy_version 139364 (0.0028) [2025-01-04 07:43:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13653.3, 300 sec: 14134.7). Total num frames: 570847232. Throughput: 0: 3441.9. Samples: 131882624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:43,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 07:43:46,023][134294] Updated weights for policy 0, policy_version 139374 (0.0025) [2025-01-04 07:43:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13312.0, 300 sec: 14134.7). Total num frames: 570908672. Throughput: 0: 3465.5. Samples: 131892532. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:43:48,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 07:43:49,198][134294] Updated weights for policy 0, policy_version 139384 (0.0027) [2025-01-04 07:43:52,274][134294] Updated weights for policy 0, policy_version 139394 (0.0023) [2025-01-04 07:43:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13448.6, 300 sec: 14148.6). Total num frames: 570974208. Throughput: 0: 3512.0. Samples: 131912350. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:43:53,968][134211] Avg episode reward: [(0, '8.162')] [2025-01-04 07:43:55,737][134294] Updated weights for policy 0, policy_version 139404 (0.0029) [2025-01-04 07:43:58,946][134294] Updated weights for policy 0, policy_version 139414 (0.0023) [2025-01-04 07:43:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 14148.6). Total num frames: 571039744. Throughput: 0: 3493.6. Samples: 131930842. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:43:58,968][134211] Avg episode reward: [(0, '8.297')] [2025-01-04 07:44:02,248][134294] Updated weights for policy 0, policy_version 139424 (0.0027) [2025-01-04 07:44:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13585.1, 300 sec: 14120.8). Total num frames: 571101184. Throughput: 0: 3495.9. Samples: 131939936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:03,968][134211] Avg episode reward: [(0, '8.014')] [2025-01-04 07:44:05,190][134294] Updated weights for policy 0, policy_version 139434 (0.0023) [2025-01-04 07:44:07,161][134294] Updated weights for policy 0, policy_version 139444 (0.0013) [2025-01-04 07:44:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 571191296. Throughput: 0: 3593.1. Samples: 131964178. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:08,968][134211] Avg episode reward: [(0, '9.110')] [2025-01-04 07:44:09,804][134294] Updated weights for policy 0, policy_version 139454 (0.0024) [2025-01-04 07:44:12,775][134294] Updated weights for policy 0, policy_version 139464 (0.0025) [2025-01-04 07:44:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14131.2, 300 sec: 14204.2). Total num frames: 571256832. Throughput: 0: 3664.6. Samples: 131985742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:13,968][134211] Avg episode reward: [(0, '9.177')] [2025-01-04 07:44:15,910][134294] Updated weights for policy 0, policy_version 139474 (0.0026) [2025-01-04 07:44:18,806][134294] Updated weights for policy 0, policy_version 139484 (0.0026) [2025-01-04 07:44:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14231.9). Total num frames: 571326464. Throughput: 0: 3639.9. Samples: 131995574. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:18,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 07:44:21,781][134294] Updated weights for policy 0, policy_version 139494 (0.0026) [2025-01-04 07:44:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14232.0). Total num frames: 571392000. Throughput: 0: 3392.1. Samples: 132016298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:23,969][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 07:44:24,800][134294] Updated weights for policy 0, policy_version 139504 (0.0020) [2025-01-04 07:44:27,019][134294] Updated weights for policy 0, policy_version 139514 (0.0018) [2025-01-04 07:44:28,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14259.6). Total num frames: 571469824. Throughput: 0: 3480.8. Samples: 132039260. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:28,968][134211] Avg episode reward: [(0, '8.849')] [2025-01-04 07:44:30,296][134294] Updated weights for policy 0, policy_version 139524 (0.0026) [2025-01-04 07:44:32,877][134294] Updated weights for policy 0, policy_version 139534 (0.0019) [2025-01-04 07:44:33,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14267.7, 300 sec: 14259.6). Total num frames: 571547648. Throughput: 0: 3463.3. Samples: 132048378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:33,968][134211] Avg episode reward: [(0, '8.761')] [2025-01-04 07:44:34,980][134294] Updated weights for policy 0, policy_version 139544 (0.0012) [2025-01-04 07:44:36,980][134294] Updated weights for policy 0, policy_version 139554 (0.0015) [2025-01-04 07:44:38,875][134294] Updated weights for policy 0, policy_version 139564 (0.0014) [2025-01-04 07:44:38,967][134211] Fps is (10 sec: 18432.4, 60 sec: 14472.6, 300 sec: 14245.8). Total num frames: 571654144. Throughput: 0: 3671.8. Samples: 132077580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:38,968][134211] Avg episode reward: [(0, '9.529')] [2025-01-04 07:44:40,768][134294] Updated weights for policy 0, policy_version 139574 (0.0013) [2025-01-04 07:44:43,581][134294] Updated weights for policy 0, policy_version 139584 (0.0023) [2025-01-04 07:44:43,968][134211] Fps is (10 sec: 19249.9, 60 sec: 14882.0, 300 sec: 14259.6). Total num frames: 571740160. Throughput: 0: 3893.3. Samples: 132106044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:43,969][134211] Avg episode reward: [(0, '9.070')] [2025-01-04 07:44:46,679][134294] Updated weights for policy 0, policy_version 139594 (0.0025) [2025-01-04 07:44:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14950.4, 300 sec: 14301.3). Total num frames: 571805696. Throughput: 0: 3908.1. Samples: 132115802. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:48,968][134211] Avg episode reward: [(0, '8.474')] [2025-01-04 07:44:49,938][134294] Updated weights for policy 0, policy_version 139604 (0.0031) [2025-01-04 07:44:53,145][134294] Updated weights for policy 0, policy_version 139614 (0.0026) [2025-01-04 07:44:53,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14882.1, 300 sec: 14301.3). Total num frames: 571867136. Throughput: 0: 3787.1. Samples: 132134598. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:44:53,968][134211] Avg episode reward: [(0, '8.621')] [2025-01-04 07:44:56,465][134294] Updated weights for policy 0, policy_version 139624 (0.0026) [2025-01-04 07:44:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14813.9, 300 sec: 14287.4). Total num frames: 571928576. Throughput: 0: 3724.7. Samples: 132153352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:44:58,968][134211] Avg episode reward: [(0, '9.824')] [2025-01-04 07:44:59,850][134294] Updated weights for policy 0, policy_version 139634 (0.0026) [2025-01-04 07:45:03,196][134294] Updated weights for policy 0, policy_version 139644 (0.0027) [2025-01-04 07:45:03,968][134211] Fps is (10 sec: 12287.5, 60 sec: 14813.8, 300 sec: 14259.6). Total num frames: 571990016. Throughput: 0: 3706.8. Samples: 132162382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:03,969][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 07:45:06,099][134294] Updated weights for policy 0, policy_version 139654 (0.0024) [2025-01-04 07:45:08,968][134211] Fps is (10 sec: 12696.7, 60 sec: 14404.1, 300 sec: 14245.7). Total num frames: 572055552. Throughput: 0: 3688.9. Samples: 132182300. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:08,970][134211] Avg episode reward: [(0, '7.603')] [2025-01-04 07:45:09,385][134294] Updated weights for policy 0, policy_version 139664 (0.0025) [2025-01-04 07:45:12,373][134294] Updated weights for policy 0, policy_version 139674 (0.0027) [2025-01-04 07:45:13,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14472.5, 300 sec: 14259.6). Total num frames: 572125184. Throughput: 0: 3617.4. Samples: 132202044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:13,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 07:45:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000139679_572125184.pth... [2025-01-04 07:45:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000138842_568696832.pth [2025-01-04 07:45:15,441][134294] Updated weights for policy 0, policy_version 139684 (0.0027) [2025-01-04 07:45:18,250][134294] Updated weights for policy 0, policy_version 139694 (0.0023) [2025-01-04 07:45:18,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 572194816. Throughput: 0: 3642.8. Samples: 132212306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:18,969][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 07:45:21,277][134294] Updated weights for policy 0, policy_version 139704 (0.0024) [2025-01-04 07:45:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.6, 300 sec: 14245.7). Total num frames: 572260352. Throughput: 0: 3456.2. Samples: 132233110. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:23,968][134211] Avg episode reward: [(0, '8.835')] [2025-01-04 07:45:24,482][134294] Updated weights for policy 0, policy_version 139714 (0.0027) [2025-01-04 07:45:27,873][134294] Updated weights for policy 0, policy_version 139724 (0.0025) [2025-01-04 07:45:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14131.2, 300 sec: 14190.2). Total num frames: 572317696. Throughput: 0: 3231.2. Samples: 132251446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:28,968][134211] Avg episode reward: [(0, '8.997')] [2025-01-04 07:45:30,975][134294] Updated weights for policy 0, policy_version 139734 (0.0023) [2025-01-04 07:45:32,979][134294] Updated weights for policy 0, policy_version 139744 (0.0013) [2025-01-04 07:45:33,967][134211] Fps is (10 sec: 15155.5, 60 sec: 14404.3, 300 sec: 14273.6). Total num frames: 572411904. Throughput: 0: 3243.9. Samples: 132261778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:33,968][134211] Avg episode reward: [(0, '7.863')] [2025-01-04 07:45:34,900][134294] Updated weights for policy 0, policy_version 139754 (0.0013) [2025-01-04 07:45:36,805][134294] Updated weights for policy 0, policy_version 139764 (0.0013) [2025-01-04 07:45:38,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14199.4, 300 sec: 14342.9). Total num frames: 572506112. Throughput: 0: 3529.3. Samples: 132293418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:38,968][134211] Avg episode reward: [(0, '9.693')] [2025-01-04 07:45:39,606][134294] Updated weights for policy 0, policy_version 139774 (0.0025) [2025-01-04 07:45:43,064][134294] Updated weights for policy 0, policy_version 139784 (0.0027) [2025-01-04 07:45:43,968][134211] Fps is (10 sec: 15154.8, 60 sec: 13721.7, 300 sec: 14273.5). Total num frames: 572563456. Throughput: 0: 3529.4. Samples: 132312176. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:43,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 07:45:46,220][134294] Updated weights for policy 0, policy_version 139794 (0.0026) [2025-01-04 07:45:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13721.6, 300 sec: 14287.4). Total num frames: 572628992. Throughput: 0: 3549.8. Samples: 132322122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:48,968][134211] Avg episode reward: [(0, '8.108')] [2025-01-04 07:45:49,344][134294] Updated weights for policy 0, policy_version 139804 (0.0027) [2025-01-04 07:45:52,417][134294] Updated weights for policy 0, policy_version 139814 (0.0025) [2025-01-04 07:45:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13789.9, 300 sec: 14231.9). Total num frames: 572694528. Throughput: 0: 3545.7. Samples: 132341856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:45:53,968][134211] Avg episode reward: [(0, '8.940')] [2025-01-04 07:45:55,696][134294] Updated weights for policy 0, policy_version 139824 (0.0025) [2025-01-04 07:45:58,889][134294] Updated weights for policy 0, policy_version 139834 (0.0024) [2025-01-04 07:45:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 14231.9). Total num frames: 572760064. Throughput: 0: 3534.4. Samples: 132361090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:45:58,968][134211] Avg episode reward: [(0, '8.338')] [2025-01-04 07:46:02,206][134294] Updated weights for policy 0, policy_version 139844 (0.0027) [2025-01-04 07:46:03,968][134211] Fps is (10 sec: 12696.9, 60 sec: 13858.1, 300 sec: 14218.0). Total num frames: 572821504. Throughput: 0: 3512.5. Samples: 132370370. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:03,969][134211] Avg episode reward: [(0, '8.228')] [2025-01-04 07:46:04,909][134294] Updated weights for policy 0, policy_version 139854 (0.0021) [2025-01-04 07:46:06,809][134294] Updated weights for policy 0, policy_version 139864 (0.0013) [2025-01-04 07:46:08,703][134294] Updated weights for policy 0, policy_version 139874 (0.0013) [2025-01-04 07:46:08,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14541.0, 300 sec: 14342.9). Total num frames: 572928000. Throughput: 0: 3618.0. Samples: 132395918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:08,968][134211] Avg episode reward: [(0, '8.340')] [2025-01-04 07:46:10,590][134294] Updated weights for policy 0, policy_version 139884 (0.0013) [2025-01-04 07:46:12,478][134294] Updated weights for policy 0, policy_version 139894 (0.0015) [2025-01-04 07:46:13,968][134211] Fps is (10 sec: 21300.7, 60 sec: 15155.3, 300 sec: 14440.1). Total num frames: 573034496. Throughput: 0: 3937.5. Samples: 132428634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:13,968][134211] Avg episode reward: [(0, '7.579')] [2025-01-04 07:46:14,553][134294] Updated weights for policy 0, policy_version 139904 (0.0016) [2025-01-04 07:46:17,695][134294] Updated weights for policy 0, policy_version 139914 (0.0028) [2025-01-04 07:46:18,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15155.2, 300 sec: 14342.9). Total num frames: 573104128. Throughput: 0: 3974.3. Samples: 132440624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:18,968][134211] Avg episode reward: [(0, '8.854')] [2025-01-04 07:46:20,741][134294] Updated weights for policy 0, policy_version 139924 (0.0027) [2025-01-04 07:46:23,968][134211] Fps is (10 sec: 13106.9, 60 sec: 15086.9, 300 sec: 14315.2). Total num frames: 573165568. Throughput: 0: 3701.3. Samples: 132459978. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:23,968][134211] Avg episode reward: [(0, '8.399')] [2025-01-04 07:46:24,121][134294] Updated weights for policy 0, policy_version 139934 (0.0028) [2025-01-04 07:46:27,396][134294] Updated weights for policy 0, policy_version 139944 (0.0028) [2025-01-04 07:46:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 15155.2, 300 sec: 14315.2). Total num frames: 573227008. Throughput: 0: 3684.4. Samples: 132477972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:28,968][134211] Avg episode reward: [(0, '7.864')] [2025-01-04 07:46:30,944][134294] Updated weights for policy 0, policy_version 139954 (0.0025) [2025-01-04 07:46:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14609.0, 300 sec: 14287.4). Total num frames: 573288448. Throughput: 0: 3660.3. Samples: 132486838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:33,969][134211] Avg episode reward: [(0, '8.597')] [2025-01-04 07:46:34,345][134294] Updated weights for policy 0, policy_version 139964 (0.0028) [2025-01-04 07:46:37,695][134294] Updated weights for policy 0, policy_version 139974 (0.0026) [2025-01-04 07:46:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 573345792. Throughput: 0: 3623.4. Samples: 132504908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:38,968][134211] Avg episode reward: [(0, '7.760')] [2025-01-04 07:46:40,919][134294] Updated weights for policy 0, policy_version 139984 (0.0025) [2025-01-04 07:46:43,802][134294] Updated weights for policy 0, policy_version 139994 (0.0024) [2025-01-04 07:46:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14199.5, 300 sec: 14273.5). Total num frames: 573415424. Throughput: 0: 3642.1. Samples: 132524986. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:43,968][134211] Avg episode reward: [(0, '8.693')] [2025-01-04 07:46:46,714][134294] Updated weights for policy 0, policy_version 140004 (0.0025) [2025-01-04 07:46:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14267.7, 300 sec: 14301.3). Total num frames: 573485056. Throughput: 0: 3666.4. Samples: 132535358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:48,968][134211] Avg episode reward: [(0, '8.899')] [2025-01-04 07:46:49,794][134294] Updated weights for policy 0, policy_version 140014 (0.0026) [2025-01-04 07:46:52,800][134294] Updated weights for policy 0, policy_version 140024 (0.0024) [2025-01-04 07:46:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14336.0, 300 sec: 14287.4). Total num frames: 573554688. Throughput: 0: 3551.8. Samples: 132555750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:53,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 07:46:55,913][134294] Updated weights for policy 0, policy_version 140034 (0.0025) [2025-01-04 07:46:58,765][134294] Updated weights for policy 0, policy_version 140044 (0.0020) [2025-01-04 07:46:58,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.3, 300 sec: 14148.6). Total num frames: 573624320. Throughput: 0: 3267.1. Samples: 132575652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:46:58,968][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 07:47:00,751][134294] Updated weights for policy 0, policy_version 140054 (0.0013) [2025-01-04 07:47:02,711][134294] Updated weights for policy 0, policy_version 140064 (0.0013) [2025-01-04 07:47:03,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15087.1, 300 sec: 14245.8). Total num frames: 573726720. Throughput: 0: 3333.7. Samples: 132590640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:03,968][134211] Avg episode reward: [(0, '8.795')] [2025-01-04 07:47:04,582][134294] Updated weights for policy 0, policy_version 140074 (0.0013) [2025-01-04 07:47:06,710][134294] Updated weights for policy 0, policy_version 140084 (0.0016) [2025-01-04 07:47:08,968][134211] Fps is (10 sec: 18840.8, 60 sec: 14745.5, 300 sec: 14315.2). Total num frames: 573812736. Throughput: 0: 3574.9. Samples: 132620848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:08,969][134211] Avg episode reward: [(0, '8.066')] [2025-01-04 07:47:09,972][134294] Updated weights for policy 0, policy_version 140094 (0.0028) [2025-01-04 07:47:13,335][134294] Updated weights for policy 0, policy_version 140104 (0.0030) [2025-01-04 07:47:13,968][134211] Fps is (10 sec: 14744.5, 60 sec: 13994.5, 300 sec: 14315.1). Total num frames: 573874176. Throughput: 0: 3581.6. Samples: 132639148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:13,969][134211] Avg episode reward: [(0, '8.427')] [2025-01-04 07:47:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140106_573874176.pth... [2025-01-04 07:47:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000139261_570413056.pth [2025-01-04 07:47:16,554][134294] Updated weights for policy 0, policy_version 140114 (0.0024) [2025-01-04 07:47:18,968][134211] Fps is (10 sec: 11878.7, 60 sec: 13789.9, 300 sec: 14301.3). Total num frames: 573931520. Throughput: 0: 3595.6. Samples: 132648640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:18,968][134211] Avg episode reward: [(0, '8.109')] [2025-01-04 07:47:20,074][134294] Updated weights for policy 0, policy_version 140124 (0.0028) [2025-01-04 07:47:23,358][134294] Updated weights for policy 0, policy_version 140134 (0.0025) [2025-01-04 07:47:23,968][134211] Fps is (10 sec: 11879.2, 60 sec: 13789.9, 300 sec: 14315.2). Total num frames: 573992960. Throughput: 0: 3596.6. Samples: 132666756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:23,968][134211] Avg episode reward: [(0, '8.415')] [2025-01-04 07:47:26,696][134294] Updated weights for policy 0, policy_version 140144 (0.0023) [2025-01-04 07:47:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 14329.1). Total num frames: 574058496. Throughput: 0: 3561.2. Samples: 132685240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:28,968][134211] Avg episode reward: [(0, '7.347')] [2025-01-04 07:47:29,924][134294] Updated weights for policy 0, policy_version 140154 (0.0026) [2025-01-04 07:47:32,722][134294] Updated weights for policy 0, policy_version 140164 (0.0020) [2025-01-04 07:47:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14287.4). Total num frames: 574136320. Throughput: 0: 3538.9. Samples: 132694610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:33,968][134211] Avg episode reward: [(0, '9.029')] [2025-01-04 07:47:34,829][134294] Updated weights for policy 0, policy_version 140174 (0.0019) [2025-01-04 07:47:37,787][134294] Updated weights for policy 0, policy_version 140184 (0.0026) [2025-01-04 07:47:38,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 574205952. Throughput: 0: 3642.0. Samples: 132719640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:38,968][134211] Avg episode reward: [(0, '8.330')] [2025-01-04 07:47:40,794][134294] Updated weights for policy 0, policy_version 140194 (0.0024) [2025-01-04 07:47:43,760][134294] Updated weights for policy 0, policy_version 140204 (0.0025) [2025-01-04 07:47:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 574275584. Throughput: 0: 3654.9. Samples: 132740124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:43,968][134211] Avg episode reward: [(0, '7.923')] [2025-01-04 07:47:46,722][134294] Updated weights for policy 0, policy_version 140214 (0.0024) [2025-01-04 07:47:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 574345216. Throughput: 0: 3550.9. Samples: 132750430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:48,968][134211] Avg episode reward: [(0, '7.817')] [2025-01-04 07:47:49,731][134294] Updated weights for policy 0, policy_version 140224 (0.0027) [2025-01-04 07:47:52,969][134294] Updated weights for policy 0, policy_version 140234 (0.0026) [2025-01-04 07:47:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14199.4, 300 sec: 14176.3). Total num frames: 574406656. Throughput: 0: 3328.5. Samples: 132770628. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:53,969][134211] Avg episode reward: [(0, '7.799')] [2025-01-04 07:47:56,146][134294] Updated weights for policy 0, policy_version 140244 (0.0027) [2025-01-04 07:47:58,395][134294] Updated weights for policy 0, policy_version 140254 (0.0015) [2025-01-04 07:47:58,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14245.8). Total num frames: 574488576. Throughput: 0: 3405.2. Samples: 132792380. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:47:58,968][134211] Avg episode reward: [(0, '8.383')] [2025-01-04 07:48:00,436][134294] Updated weights for policy 0, policy_version 140264 (0.0014) [2025-01-04 07:48:02,438][134294] Updated weights for policy 0, policy_version 140274 (0.0012) [2025-01-04 07:48:03,968][134211] Fps is (10 sec: 18842.0, 60 sec: 14472.5, 300 sec: 14412.4). Total num frames: 574595072. Throughput: 0: 3528.9. Samples: 132807442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:03,968][134211] Avg episode reward: [(0, '8.030')] [2025-01-04 07:48:04,353][134294] Updated weights for policy 0, policy_version 140284 (0.0014) [2025-01-04 07:48:06,242][134294] Updated weights for policy 0, policy_version 140294 (0.0013) [2025-01-04 07:48:08,397][134294] Updated weights for policy 0, policy_version 140304 (0.0017) [2025-01-04 07:48:08,968][134211] Fps is (10 sec: 20479.8, 60 sec: 14677.4, 300 sec: 14523.5). Total num frames: 574693376. Throughput: 0: 3838.3. Samples: 132839478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:08,968][134211] Avg episode reward: [(0, '8.435')] [2025-01-04 07:48:11,537][134294] Updated weights for policy 0, policy_version 140314 (0.0026) [2025-01-04 07:48:13,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14677.4, 300 sec: 14523.4). Total num frames: 574754816. Throughput: 0: 3872.6. Samples: 132859506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:13,969][134211] Avg episode reward: [(0, '8.976')] [2025-01-04 07:48:14,897][134294] Updated weights for policy 0, policy_version 140324 (0.0028) [2025-01-04 07:48:18,064][134294] Updated weights for policy 0, policy_version 140334 (0.0029) [2025-01-04 07:48:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.6, 300 sec: 14509.6). Total num frames: 574816256. Throughput: 0: 3877.9. Samples: 132869116. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:18,968][134211] Avg episode reward: [(0, '8.153')] [2025-01-04 07:48:21,127][134294] Updated weights for policy 0, policy_version 140344 (0.0025) [2025-01-04 07:48:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14813.8, 300 sec: 14523.4). Total num frames: 574881792. Throughput: 0: 3762.5. Samples: 132888954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:23,969][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 07:48:24,385][134294] Updated weights for policy 0, policy_version 140354 (0.0025) [2025-01-04 07:48:27,605][134294] Updated weights for policy 0, policy_version 140364 (0.0024) [2025-01-04 07:48:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 574943232. Throughput: 0: 3718.2. Samples: 132907444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:28,968][134211] Avg episode reward: [(0, '9.383')] [2025-01-04 07:48:31,045][134294] Updated weights for policy 0, policy_version 140374 (0.0028) [2025-01-04 07:48:33,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14472.5, 300 sec: 14301.3). Total num frames: 575004672. Throughput: 0: 3686.8. Samples: 132916338. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:33,968][134211] Avg episode reward: [(0, '8.212')] [2025-01-04 07:48:34,718][134294] Updated weights for policy 0, policy_version 140384 (0.0022) [2025-01-04 07:48:37,950][134294] Updated weights for policy 0, policy_version 140394 (0.0026) [2025-01-04 07:48:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 575066112. Throughput: 0: 3634.7. Samples: 132934188. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:38,968][134211] Avg episode reward: [(0, '8.180')] [2025-01-04 07:48:41,025][134294] Updated weights for policy 0, policy_version 140404 (0.0023) [2025-01-04 07:48:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.8, 300 sec: 14315.2). Total num frames: 575131648. Throughput: 0: 3594.2. Samples: 132954120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:43,968][134211] Avg episode reward: [(0, '8.855')] [2025-01-04 07:48:44,068][134294] Updated weights for policy 0, policy_version 140414 (0.0025) [2025-01-04 07:48:47,095][134294] Updated weights for policy 0, policy_version 140424 (0.0024) [2025-01-04 07:48:48,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14267.6, 300 sec: 14329.0). Total num frames: 575201280. Throughput: 0: 3484.8. Samples: 132964262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:48,969][134211] Avg episode reward: [(0, '8.425')] [2025-01-04 07:48:50,091][134294] Updated weights for policy 0, policy_version 140434 (0.0026) [2025-01-04 07:48:53,004][134294] Updated weights for policy 0, policy_version 140444 (0.0024) [2025-01-04 07:48:53,967][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.3, 300 sec: 14343.0). Total num frames: 575270912. Throughput: 0: 3240.5. Samples: 132985300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:53,968][134211] Avg episode reward: [(0, '8.413')] [2025-01-04 07:48:55,230][134294] Updated weights for policy 0, policy_version 140454 (0.0015) [2025-01-04 07:48:57,966][134294] Updated weights for policy 0, policy_version 140464 (0.0022) [2025-01-04 07:48:58,968][134211] Fps is (10 sec: 15156.4, 60 sec: 14404.2, 300 sec: 14412.4). Total num frames: 575352832. Throughput: 0: 3322.5. Samples: 133009016. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:48:58,968][134211] Avg episode reward: [(0, '8.584')] [2025-01-04 07:49:01,235][134294] Updated weights for policy 0, policy_version 140474 (0.0026) [2025-01-04 07:49:03,808][134294] Updated weights for policy 0, policy_version 140484 (0.0017) [2025-01-04 07:49:03,968][134211] Fps is (10 sec: 15155.0, 60 sec: 13789.9, 300 sec: 14342.9). Total num frames: 575422464. Throughput: 0: 3319.7. Samples: 133018504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:49:03,968][134211] Avg episode reward: [(0, '8.102')] [2025-01-04 07:49:05,717][134294] Updated weights for policy 0, policy_version 140494 (0.0014) [2025-01-04 07:49:07,603][134294] Updated weights for policy 0, policy_version 140504 (0.0014) [2025-01-04 07:49:08,968][134211] Fps is (10 sec: 18021.7, 60 sec: 13994.6, 300 sec: 14495.7). Total num frames: 575533056. Throughput: 0: 3515.2. Samples: 133047140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:08,968][134211] Avg episode reward: [(0, '8.893')] [2025-01-04 07:49:09,444][134294] Updated weights for policy 0, policy_version 140514 (0.0014) [2025-01-04 07:49:11,787][134294] Updated weights for policy 0, policy_version 140524 (0.0017) [2025-01-04 07:49:13,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14267.8, 300 sec: 14523.4). Total num frames: 575610880. Throughput: 0: 3707.4. Samples: 133074276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:13,968][134211] Avg episode reward: [(0, '8.632')] [2025-01-04 07:49:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140531_575614976.pth... 
[2025-01-04 07:49:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000139679_572125184.pth [2025-01-04 07:49:15,031][134294] Updated weights for policy 0, policy_version 140534 (0.0030) [2025-01-04 07:49:18,179][134294] Updated weights for policy 0, policy_version 140544 (0.0027) [2025-01-04 07:49:18,969][134211] Fps is (10 sec: 14334.1, 60 sec: 14335.6, 300 sec: 14523.4). Total num frames: 575676416. Throughput: 0: 3720.3. Samples: 133083756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:18,971][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 07:49:21,323][134294] Updated weights for policy 0, policy_version 140554 (0.0027) [2025-01-04 07:49:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14267.8, 300 sec: 14467.9). Total num frames: 575737856. Throughput: 0: 3752.3. Samples: 133103040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:23,968][134211] Avg episode reward: [(0, '7.919')] [2025-01-04 07:49:24,751][134294] Updated weights for policy 0, policy_version 140564 (0.0029) [2025-01-04 07:49:28,079][134294] Updated weights for policy 0, policy_version 140574 (0.0027) [2025-01-04 07:49:28,968][134211] Fps is (10 sec: 12290.1, 60 sec: 14267.8, 300 sec: 14412.4). Total num frames: 575799296. Throughput: 0: 3712.0. Samples: 133121162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:28,968][134211] Avg episode reward: [(0, '8.320')] [2025-01-04 07:49:31,242][134294] Updated weights for policy 0, policy_version 140584 (0.0028) [2025-01-04 07:49:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14273.5). Total num frames: 575864832. Throughput: 0: 3700.9. Samples: 133130800. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:33,968][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 07:49:34,505][134294] Updated weights for policy 0, policy_version 140594 (0.0024) [2025-01-04 07:49:37,691][134294] Updated weights for policy 0, policy_version 140604 (0.0026) [2025-01-04 07:49:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.3, 300 sec: 14204.1). Total num frames: 575930368. Throughput: 0: 3663.7. Samples: 133150168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:38,968][134211] Avg episode reward: [(0, '9.540')] [2025-01-04 07:49:40,690][134294] Updated weights for policy 0, policy_version 140614 (0.0025) [2025-01-04 07:49:43,739][134294] Updated weights for policy 0, policy_version 140624 (0.0026) [2025-01-04 07:49:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14204.1). Total num frames: 575995904. Throughput: 0: 3590.8. Samples: 133170604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:43,968][134211] Avg episode reward: [(0, '8.680')] [2025-01-04 07:49:46,725][134294] Updated weights for policy 0, policy_version 140634 (0.0026) [2025-01-04 07:49:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.5, 300 sec: 14231.9). Total num frames: 576065536. Throughput: 0: 3603.8. Samples: 133180676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:48,968][134211] Avg episode reward: [(0, '8.276')] [2025-01-04 07:49:49,741][134294] Updated weights for policy 0, policy_version 140644 (0.0027) [2025-01-04 07:49:52,910][134294] Updated weights for policy 0, policy_version 140654 (0.0025) [2025-01-04 07:49:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.2, 300 sec: 14259.6). Total num frames: 576135168. Throughput: 0: 3411.2. Samples: 133200642. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:53,968][134211] Avg episode reward: [(0, '7.823')] [2025-01-04 07:49:55,014][134294] Updated weights for policy 0, policy_version 140664 (0.0012) [2025-01-04 07:49:57,093][134294] Updated weights for policy 0, policy_version 140674 (0.0015) [2025-01-04 07:49:58,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14745.6, 300 sec: 14398.5). Total num frames: 576237568. Throughput: 0: 3433.6. Samples: 133228788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:49:58,968][134211] Avg episode reward: [(0, '8.793')] [2025-01-04 07:49:59,147][134294] Updated weights for policy 0, policy_version 140684 (0.0015) [2025-01-04 07:50:01,238][134294] Updated weights for policy 0, policy_version 140694 (0.0014) [2025-01-04 07:50:03,875][134294] Updated weights for policy 0, policy_version 140704 (0.0023) [2025-01-04 07:50:03,968][134211] Fps is (10 sec: 18841.4, 60 sec: 15018.6, 300 sec: 14467.9). Total num frames: 576323584. Throughput: 0: 3551.7. Samples: 133243576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:50:03,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 07:50:07,211][134294] Updated weights for policy 0, policy_version 140714 (0.0031) [2025-01-04 07:50:08,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14131.3, 300 sec: 14426.3). Total num frames: 576380928. Throughput: 0: 3577.9. Samples: 133264044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:50:08,968][134211] Avg episode reward: [(0, '8.528')] [2025-01-04 07:50:10,516][134294] Updated weights for policy 0, policy_version 140724 (0.0024) [2025-01-04 07:50:13,605][134294] Updated weights for policy 0, policy_version 140734 (0.0022) [2025-01-04 07:50:13,968][134211] Fps is (10 sec: 12697.0, 60 sec: 13994.6, 300 sec: 14426.2). Total num frames: 576450560. Throughput: 0: 3606.4. Samples: 133283450. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:13,969][134211] Avg episode reward: [(0, '8.814')] [2025-01-04 07:50:16,705][134294] Updated weights for policy 0, policy_version 140744 (0.0027) [2025-01-04 07:50:18,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13994.9, 300 sec: 14426.2). Total num frames: 576516096. Throughput: 0: 3610.8. Samples: 133293286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:18,969][134211] Avg episode reward: [(0, '8.540')] [2025-01-04 07:50:19,895][134294] Updated weights for policy 0, policy_version 140754 (0.0025) [2025-01-04 07:50:23,004][134294] Updated weights for policy 0, policy_version 140764 (0.0026) [2025-01-04 07:50:23,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13994.7, 300 sec: 14440.1). Total num frames: 576577536. Throughput: 0: 3615.9. Samples: 133312882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:23,968][134211] Avg episode reward: [(0, '9.414')] [2025-01-04 07:50:26,358][134294] Updated weights for policy 0, policy_version 140774 (0.0029) [2025-01-04 07:50:28,968][134211] Fps is (10 sec: 12288.7, 60 sec: 13994.6, 300 sec: 14329.0). Total num frames: 576638976. Throughput: 0: 3562.8. Samples: 133330928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:28,968][134211] Avg episode reward: [(0, '8.515')] [2025-01-04 07:50:29,924][134294] Updated weights for policy 0, policy_version 140784 (0.0025) [2025-01-04 07:50:33,422][134294] Updated weights for policy 0, policy_version 140794 (0.0026) [2025-01-04 07:50:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13858.1, 300 sec: 14204.1). Total num frames: 576696320. Throughput: 0: 3534.3. Samples: 133339722. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:33,968][134211] Avg episode reward: [(0, '8.703')] [2025-01-04 07:50:36,842][134294] Updated weights for policy 0, policy_version 140804 (0.0025) [2025-01-04 07:50:38,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13789.9, 300 sec: 14218.0). Total num frames: 576757760. Throughput: 0: 3486.1. Samples: 133357516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:38,968][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 07:50:39,731][134294] Updated weights for policy 0, policy_version 140814 (0.0021) [2025-01-04 07:50:41,611][134294] Updated weights for policy 0, policy_version 140824 (0.0014) [2025-01-04 07:50:43,538][134294] Updated weights for policy 0, policy_version 140834 (0.0013) [2025-01-04 07:50:43,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14472.6, 300 sec: 14356.8). Total num frames: 576864256. Throughput: 0: 3479.6. Samples: 133385372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:43,968][134211] Avg episode reward: [(0, '8.419')] [2025-01-04 07:50:45,465][134294] Updated weights for policy 0, policy_version 140844 (0.0013) [2025-01-04 07:50:47,365][134294] Updated weights for policy 0, policy_version 140854 (0.0012) [2025-01-04 07:50:48,968][134211] Fps is (10 sec: 20889.8, 60 sec: 15018.6, 300 sec: 14481.8). Total num frames: 576966656. Throughput: 0: 3508.8. Samples: 133401470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:48,968][134211] Avg episode reward: [(0, '8.248')] [2025-01-04 07:50:49,713][134294] Updated weights for policy 0, policy_version 140864 (0.0021) [2025-01-04 07:50:52,976][134294] Updated weights for policy 0, policy_version 140874 (0.0027) [2025-01-04 07:50:53,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14882.1, 300 sec: 14467.9). Total num frames: 577028096. Throughput: 0: 3594.3. Samples: 133425790. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:53,968][134211] Avg episode reward: [(0, '8.146')] [2025-01-04 07:50:56,191][134294] Updated weights for policy 0, policy_version 140884 (0.0028) [2025-01-04 07:50:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 14481.8). Total num frames: 577093632. Throughput: 0: 3574.1. Samples: 133444282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:50:58,968][134211] Avg episode reward: [(0, '8.772')] [2025-01-04 07:50:59,694][134294] Updated weights for policy 0, policy_version 140894 (0.0027) [2025-01-04 07:51:03,076][134294] Updated weights for policy 0, policy_version 140904 (0.0028) [2025-01-04 07:51:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13789.8, 300 sec: 14315.2). Total num frames: 577150976. Throughput: 0: 3553.4. Samples: 133453188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:51:03,968][134211] Avg episode reward: [(0, '9.030')] [2025-01-04 07:51:06,090][134294] Updated weights for policy 0, policy_version 140914 (0.0027) [2025-01-04 07:51:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13926.4, 300 sec: 14176.3). Total num frames: 577216512. Throughput: 0: 3544.9. Samples: 133472400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 07:51:08,968][134211] Avg episode reward: [(0, '8.846')] [2025-01-04 07:51:09,358][134294] Updated weights for policy 0, policy_version 140924 (0.0024) [2025-01-04 07:51:12,465][134294] Updated weights for policy 0, policy_version 140934 (0.0025) [2025-01-04 07:51:13,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 577282048. Throughput: 0: 3575.8. Samples: 133491842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:13,969][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 07:51:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140938_577282048.pth... 
[2025-01-04 07:51:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140106_573874176.pth [2025-01-04 07:51:15,613][134294] Updated weights for policy 0, policy_version 140944 (0.0025) [2025-01-04 07:51:18,653][134294] Updated weights for policy 0, policy_version 140954 (0.0024) [2025-01-04 07:51:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13926.5, 300 sec: 14190.2). Total num frames: 577351680. Throughput: 0: 3599.8. Samples: 133501712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:18,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 07:51:21,611][134294] Updated weights for policy 0, policy_version 140964 (0.0026) [2025-01-04 07:51:23,968][134211] Fps is (10 sec: 13517.4, 60 sec: 13994.7, 300 sec: 14204.1). Total num frames: 577417216. Throughput: 0: 3655.1. Samples: 133521994. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:23,969][134211] Avg episode reward: [(0, '8.200')] [2025-01-04 07:51:25,011][134294] Updated weights for policy 0, policy_version 140974 (0.0024) [2025-01-04 07:51:28,314][134294] Updated weights for policy 0, policy_version 140984 (0.0025) [2025-01-04 07:51:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 577474560. Throughput: 0: 3443.4. Samples: 133540326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:28,968][134211] Avg episode reward: [(0, '8.112')] [2025-01-04 07:51:31,535][134294] Updated weights for policy 0, policy_version 140994 (0.0024) [2025-01-04 07:51:33,682][134294] Updated weights for policy 0, policy_version 141004 (0.0013) [2025-01-04 07:51:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14273.5). Total num frames: 577556480. Throughput: 0: 3296.9. Samples: 133549830. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:33,968][134211] Avg episode reward: [(0, '7.169')] [2025-01-04 07:51:35,640][134294] Updated weights for policy 0, policy_version 141014 (0.0012) [2025-01-04 07:51:37,581][134294] Updated weights for policy 0, policy_version 141024 (0.0015) [2025-01-04 07:51:38,967][134211] Fps is (10 sec: 18842.0, 60 sec: 15087.0, 300 sec: 14398.5). Total num frames: 577662976. Throughput: 0: 3421.3. Samples: 133579746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:38,968][134211] Avg episode reward: [(0, '8.119')] [2025-01-04 07:51:39,505][134294] Updated weights for policy 0, policy_version 141034 (0.0014) [2025-01-04 07:51:41,510][134294] Updated weights for policy 0, policy_version 141044 (0.0018) [2025-01-04 07:51:43,968][134211] Fps is (10 sec: 18840.4, 60 sec: 14677.1, 300 sec: 14440.1). Total num frames: 577744896. Throughput: 0: 3636.6. Samples: 133607930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:43,969][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 07:51:44,634][134294] Updated weights for policy 0, policy_version 141054 (0.0024) [2025-01-04 07:51:47,797][134294] Updated weights for policy 0, policy_version 141064 (0.0026) [2025-01-04 07:51:48,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 577810432. Throughput: 0: 3651.7. Samples: 133617514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:48,968][134211] Avg episode reward: [(0, '8.088')] [2025-01-04 07:51:51,169][134294] Updated weights for policy 0, policy_version 141074 (0.0027) [2025-01-04 07:51:53,968][134211] Fps is (10 sec: 12288.7, 60 sec: 13994.7, 300 sec: 14384.6). Total num frames: 577867776. Throughput: 0: 3635.4. Samples: 133635996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:53,969][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 07:51:54,744][134294] Updated weights for policy 0, policy_version 141084 (0.0028) [2025-01-04 07:51:58,321][134294] Updated weights for policy 0, policy_version 141094 (0.0025) [2025-01-04 07:51:58,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13858.1, 300 sec: 14231.9). Total num frames: 577925120. Throughput: 0: 3582.9. Samples: 133653072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:51:58,968][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 07:52:02,006][134294] Updated weights for policy 0, policy_version 141104 (0.0027) [2025-01-04 07:52:03,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13858.2, 300 sec: 14134.7). Total num frames: 577982464. Throughput: 0: 3549.1. Samples: 133661422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:52:03,968][134211] Avg episode reward: [(0, '8.648')] [2025-01-04 07:52:05,303][134294] Updated weights for policy 0, policy_version 141114 (0.0023) [2025-01-04 07:52:08,527][134294] Updated weights for policy 0, policy_version 141124 (0.0028) [2025-01-04 07:52:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.1, 300 sec: 14148.6). Total num frames: 578048000. Throughput: 0: 3508.0. Samples: 133679854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:52:08,968][134211] Avg episode reward: [(0, '8.141')] [2025-01-04 07:52:11,589][134294] Updated weights for policy 0, policy_version 141134 (0.0026) [2025-01-04 07:52:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13858.2, 300 sec: 14176.3). Total num frames: 578113536. Throughput: 0: 3540.1. Samples: 133699632. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:52:13,968][134211] Avg episode reward: [(0, '7.745')] [2025-01-04 07:52:14,724][134294] Updated weights for policy 0, policy_version 141144 (0.0026) [2025-01-04 07:52:17,753][134294] Updated weights for policy 0, policy_version 141154 (0.0024) [2025-01-04 07:52:18,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13994.7, 300 sec: 14231.9). Total num frames: 578191360. Throughput: 0: 3548.7. Samples: 133709520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:18,968][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 07:52:19,685][134294] Updated weights for policy 0, policy_version 141164 (0.0013) [2025-01-04 07:52:21,648][134294] Updated weights for policy 0, policy_version 141174 (0.0012) [2025-01-04 07:52:23,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 578277376. Throughput: 0: 3503.9. Samples: 133737422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:23,968][134211] Avg episode reward: [(0, '7.972')] [2025-01-04 07:52:24,730][134294] Updated weights for policy 0, policy_version 141184 (0.0022) [2025-01-04 07:52:28,322][134294] Updated weights for policy 0, policy_version 141194 (0.0031) [2025-01-04 07:52:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 578334720. Throughput: 0: 3276.9. Samples: 133755390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:28,968][134211] Avg episode reward: [(0, '8.087')] [2025-01-04 07:52:31,125][134294] Updated weights for policy 0, policy_version 141204 (0.0020) [2025-01-04 07:52:33,277][134294] Updated weights for policy 0, policy_version 141214 (0.0015) [2025-01-04 07:52:33,971][134211] Fps is (10 sec: 14331.8, 60 sec: 14403.5, 300 sec: 14287.3). Total num frames: 578420736. Throughput: 0: 3305.2. Samples: 133766258. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:33,971][134211] Avg episode reward: [(0, '8.378')] [2025-01-04 07:52:36,318][134294] Updated weights for policy 0, policy_version 141224 (0.0025) [2025-01-04 07:52:38,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13721.5, 300 sec: 14273.5). Total num frames: 578486272. Throughput: 0: 3404.1. Samples: 133789180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:38,968][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 07:52:39,694][134294] Updated weights for policy 0, policy_version 141234 (0.0026) [2025-01-04 07:52:42,844][134294] Updated weights for policy 0, policy_version 141244 (0.0027) [2025-01-04 07:52:43,968][134211] Fps is (10 sec: 12701.4, 60 sec: 13380.4, 300 sec: 14245.7). Total num frames: 578547712. Throughput: 0: 3443.4. Samples: 133808026. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:43,968][134211] Avg episode reward: [(0, '9.107')] [2025-01-04 07:52:45,945][134294] Updated weights for policy 0, policy_version 141254 (0.0026) [2025-01-04 07:52:48,937][134294] Updated weights for policy 0, policy_version 141264 (0.0026) [2025-01-04 07:52:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 14273.5). Total num frames: 578617344. Throughput: 0: 3482.3. Samples: 133818124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:48,968][134211] Avg episode reward: [(0, '8.416')] [2025-01-04 07:52:52,030][134294] Updated weights for policy 0, policy_version 141274 (0.0027) [2025-01-04 07:52:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13585.1, 300 sec: 14218.0). Total num frames: 578682880. Throughput: 0: 3522.7. Samples: 133838374. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:53,968][134211] Avg episode reward: [(0, '7.891')] [2025-01-04 07:52:55,013][134294] Updated weights for policy 0, policy_version 141284 (0.0028) [2025-01-04 07:52:58,140][134294] Updated weights for policy 0, policy_version 141294 (0.0026) [2025-01-04 07:52:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.6, 300 sec: 14079.1). Total num frames: 578748416. Throughput: 0: 3526.1. Samples: 133858308. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:52:58,968][134211] Avg episode reward: [(0, '9.020')] [2025-01-04 07:53:00,752][134294] Updated weights for policy 0, policy_version 141304 (0.0017) [2025-01-04 07:53:03,329][134294] Updated weights for policy 0, policy_version 141314 (0.0019) [2025-01-04 07:53:03,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14062.9, 300 sec: 14009.7). Total num frames: 578826240. Throughput: 0: 3580.4. Samples: 133870638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:53:03,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 07:53:06,416][134294] Updated weights for policy 0, policy_version 141324 (0.0027) [2025-01-04 07:53:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 14023.6). Total num frames: 578891776. Throughput: 0: 3419.0. Samples: 133891276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:53:08,968][134211] Avg episode reward: [(0, '9.006')] [2025-01-04 07:53:09,749][134294] Updated weights for policy 0, policy_version 141334 (0.0024) [2025-01-04 07:53:12,751][134294] Updated weights for policy 0, policy_version 141344 (0.0022) [2025-01-04 07:53:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14267.8, 300 sec: 14079.1). Total num frames: 578969600. Throughput: 0: 3467.2. Samples: 133911414. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:53:13,968][134211] Avg episode reward: [(0, '8.683')] [2025-01-04 07:53:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000141350_578969600.pth... [2025-01-04 07:53:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140531_575614976.pth [2025-01-04 07:53:14,731][134294] Updated weights for policy 0, policy_version 141354 (0.0013) [2025-01-04 07:53:16,604][134294] Updated weights for policy 0, policy_version 141364 (0.0014) [2025-01-04 07:53:18,498][134294] Updated weights for policy 0, policy_version 141374 (0.0012) [2025-01-04 07:53:18,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14745.6, 300 sec: 14218.0). Total num frames: 579076096. Throughput: 0: 3585.4. Samples: 133927588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 07:53:18,968][134211] Avg episode reward: [(0, '9.454')] [2025-01-04 07:53:20,394][134294] Updated weights for policy 0, policy_version 141384 (0.0013) [2025-01-04 07:53:22,291][134294] Updated weights for policy 0, policy_version 141394 (0.0014) [2025-01-04 07:53:23,968][134211] Fps is (10 sec: 20478.4, 60 sec: 14950.3, 300 sec: 14342.9). Total num frames: 579174400. Throughput: 0: 3800.3. Samples: 133960196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:23,969][134211] Avg episode reward: [(0, '8.107')] [2025-01-04 07:53:25,061][134294] Updated weights for policy 0, policy_version 141404 (0.0027) [2025-01-04 07:53:28,324][134294] Updated weights for policy 0, policy_version 141414 (0.0028) [2025-01-04 07:53:28,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15018.7, 300 sec: 14343.0). Total num frames: 579235840. Throughput: 0: 3833.0. Samples: 133980512. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:28,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 07:53:31,849][134294] Updated weights for policy 0, policy_version 141424 (0.0025) [2025-01-04 07:53:33,968][134211] Fps is (10 sec: 12288.8, 60 sec: 14609.8, 300 sec: 14342.9). Total num frames: 579297280. Throughput: 0: 3803.2. Samples: 133989270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:33,968][134211] Avg episode reward: [(0, '8.961')] [2025-01-04 07:53:35,340][134294] Updated weights for policy 0, policy_version 141434 (0.0026) [2025-01-04 07:53:38,494][134294] Updated weights for policy 0, policy_version 141444 (0.0023) [2025-01-04 07:53:38,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14540.8, 300 sec: 14329.1). Total num frames: 579358720. Throughput: 0: 3760.6. Samples: 134007600. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:38,968][134211] Avg episode reward: [(0, '9.664')] [2025-01-04 07:53:41,522][134294] Updated weights for policy 0, policy_version 141454 (0.0026) [2025-01-04 07:53:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14609.1, 300 sec: 14315.2). Total num frames: 579424256. Throughput: 0: 3761.9. Samples: 134027592. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:43,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 07:53:44,601][134294] Updated weights for policy 0, policy_version 141464 (0.0028) [2025-01-04 07:53:47,623][134294] Updated weights for policy 0, policy_version 141474 (0.0024) [2025-01-04 07:53:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14315.2). Total num frames: 579493888. Throughput: 0: 3710.4. Samples: 134037604. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:48,968][134211] Avg episode reward: [(0, '8.479')] [2025-01-04 07:53:50,580][134294] Updated weights for policy 0, policy_version 141484 (0.0024) [2025-01-04 07:53:53,500][134294] Updated weights for policy 0, policy_version 141494 (0.0024) [2025-01-04 07:53:53,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14677.2, 300 sec: 14273.5). Total num frames: 579563520. Throughput: 0: 3714.5. Samples: 134058428. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:53,969][134211] Avg episode reward: [(0, '8.181')] [2025-01-04 07:53:56,460][134294] Updated weights for policy 0, policy_version 141504 (0.0024) [2025-01-04 07:53:58,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14677.3, 300 sec: 14259.6). Total num frames: 579629056. Throughput: 0: 3717.8. Samples: 134078716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:53:58,970][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 07:53:59,749][134294] Updated weights for policy 0, policy_version 141514 (0.0024) [2025-01-04 07:54:03,074][134294] Updated weights for policy 0, policy_version 141524 (0.0024) [2025-01-04 07:54:03,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 579690496. Throughput: 0: 3565.0. Samples: 134088012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:54:03,968][134211] Avg episode reward: [(0, '8.593')] [2025-01-04 07:54:06,343][134294] Updated weights for policy 0, policy_version 141534 (0.0026) [2025-01-04 07:54:08,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14404.3, 300 sec: 14051.4). Total num frames: 579756032. Throughput: 0: 3262.9. Samples: 134107022. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:54:08,968][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 07:54:09,343][134294] Updated weights for policy 0, policy_version 141544 (0.0025) [2025-01-04 07:54:12,286][134294] Updated weights for policy 0, policy_version 141554 (0.0023) [2025-01-04 07:54:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.7, 300 sec: 14065.3). Total num frames: 579825664. Throughput: 0: 3264.3. Samples: 134127406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:54:13,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 07:54:15,325][134294] Updated weights for policy 0, policy_version 141564 (0.0025) [2025-01-04 07:54:18,149][134294] Updated weights for policy 0, policy_version 141574 (0.0024) [2025-01-04 07:54:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13653.3, 300 sec: 14093.0). Total num frames: 579895296. Throughput: 0: 3301.9. Samples: 134137854. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:54:18,968][134211] Avg episode reward: [(0, '8.799')] [2025-01-04 07:54:20,443][134294] Updated weights for policy 0, policy_version 141584 (0.0017) [2025-01-04 07:54:22,348][134294] Updated weights for policy 0, policy_version 141594 (0.0012) [2025-01-04 07:54:23,967][134211] Fps is (10 sec: 17613.1, 60 sec: 13790.1, 300 sec: 14245.8). Total num frames: 580001792. Throughput: 0: 3490.1. Samples: 134164654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:23,968][134211] Avg episode reward: [(0, '8.907')] [2025-01-04 07:54:24,234][134294] Updated weights for policy 0, policy_version 141604 (0.0014) [2025-01-04 07:54:26,110][134294] Updated weights for policy 0, policy_version 141614 (0.0013) [2025-01-04 07:54:28,034][134294] Updated weights for policy 0, policy_version 141624 (0.0014) [2025-01-04 07:54:28,968][134211] Fps is (10 sec: 21299.6, 60 sec: 14540.8, 300 sec: 14384.6). Total num frames: 580108288. Throughput: 0: 3756.6. Samples: 134196638. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:28,968][134211] Avg episode reward: [(0, '9.281')] [2025-01-04 07:54:30,243][134294] Updated weights for policy 0, policy_version 141634 (0.0015) [2025-01-04 07:54:33,659][134294] Updated weights for policy 0, policy_version 141644 (0.0026) [2025-01-04 07:54:33,968][134211] Fps is (10 sec: 17202.6, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 580173824. Throughput: 0: 3822.5. Samples: 134209616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:33,969][134211] Avg episode reward: [(0, '8.314')] [2025-01-04 07:54:37,150][134294] Updated weights for policy 0, policy_version 141654 (0.0027) [2025-01-04 07:54:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 580235264. Throughput: 0: 3742.4. Samples: 134226836. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:38,968][134211] Avg episode reward: [(0, '8.595')] [2025-01-04 07:54:40,500][134294] Updated weights for policy 0, policy_version 141664 (0.0027) [2025-01-04 07:54:43,765][134294] Updated weights for policy 0, policy_version 141674 (0.0027) [2025-01-04 07:54:43,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14540.8, 300 sec: 14342.9). Total num frames: 580296704. Throughput: 0: 3711.1. Samples: 134245714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:43,968][134211] Avg episode reward: [(0, '8.795')] [2025-01-04 07:54:46,871][134294] Updated weights for policy 0, policy_version 141684 (0.0027) [2025-01-04 07:54:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.6, 300 sec: 14329.1). Total num frames: 580362240. Throughput: 0: 3718.1. Samples: 134255324. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:48,968][134211] Avg episode reward: [(0, '9.808')] [2025-01-04 07:54:49,912][134294] Updated weights for policy 0, policy_version 141694 (0.0026) [2025-01-04 07:54:52,868][134294] Updated weights for policy 0, policy_version 141704 (0.0027) [2025-01-04 07:54:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14472.6, 300 sec: 14218.0). Total num frames: 580431872. Throughput: 0: 3743.9. Samples: 134275500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:53,969][134211] Avg episode reward: [(0, '8.823')] [2025-01-04 07:54:56,185][134294] Updated weights for policy 0, policy_version 141714 (0.0023) [2025-01-04 07:54:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14404.3, 300 sec: 14134.7). Total num frames: 580493312. Throughput: 0: 3719.9. Samples: 134294800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:54:58,969][134211] Avg episode reward: [(0, '9.435')] [2025-01-04 07:54:59,346][134294] Updated weights for policy 0, policy_version 141724 (0.0027) [2025-01-04 07:55:02,691][134294] Updated weights for policy 0, policy_version 141734 (0.0029) [2025-01-04 07:55:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.5, 300 sec: 14162.4). Total num frames: 580558848. Throughput: 0: 3692.5. Samples: 134304016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:55:03,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 07:55:05,706][134294] Updated weights for policy 0, policy_version 141744 (0.0026) [2025-01-04 07:55:08,658][134294] Updated weights for policy 0, policy_version 141754 (0.0024) [2025-01-04 07:55:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.5, 300 sec: 14148.6). Total num frames: 580624384. Throughput: 0: 3546.3. Samples: 134324238. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:55:08,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 07:55:11,697][134294] Updated weights for policy 0, policy_version 141764 (0.0027) [2025-01-04 07:55:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14472.5, 300 sec: 14162.5). Total num frames: 580694016. Throughput: 0: 3279.2. Samples: 134344202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:55:13,969][134211] Avg episode reward: [(0, '9.884')] [2025-01-04 07:55:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000141771_580694016.pth... [2025-01-04 07:55:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000140938_577282048.pth [2025-01-04 07:55:14,866][134294] Updated weights for policy 0, policy_version 141774 (0.0027) [2025-01-04 07:55:17,736][134294] Updated weights for policy 0, policy_version 141784 (0.0020) [2025-01-04 07:55:18,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14609.1, 300 sec: 14218.0). Total num frames: 580771840. Throughput: 0: 3207.8. Samples: 134353966. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:55:18,968][134211] Avg episode reward: [(0, '8.352')] [2025-01-04 07:55:19,652][134294] Updated weights for policy 0, policy_version 141794 (0.0013) [2025-01-04 07:55:21,542][134294] Updated weights for policy 0, policy_version 141804 (0.0014) [2025-01-04 07:55:23,384][134294] Updated weights for policy 0, policy_version 141814 (0.0013) [2025-01-04 07:55:23,968][134211] Fps is (10 sec: 18432.7, 60 sec: 14609.0, 300 sec: 14370.7). Total num frames: 580878336. Throughput: 0: 3495.1. Samples: 134384114. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 07:55:23,968][134211] Avg episode reward: [(0, '8.665')] [2025-01-04 07:55:25,488][134294] Updated weights for policy 0, policy_version 141824 (0.0016) [2025-01-04 07:55:28,727][134294] Updated weights for policy 0, policy_version 141834 (0.0028) [2025-01-04 07:55:28,968][134211] Fps is (10 sec: 18021.9, 60 sec: 14062.9, 300 sec: 14426.3). Total num frames: 580952064. Throughput: 0: 3637.8. Samples: 134409416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:28,968][134211] Avg episode reward: [(0, '9.342')] [2025-01-04 07:55:32,226][134294] Updated weights for policy 0, policy_version 141844 (0.0029) [2025-01-04 07:55:33,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13994.7, 300 sec: 14426.2). Total num frames: 581013504. Throughput: 0: 3626.3. Samples: 134418508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:33,969][134211] Avg episode reward: [(0, '8.551')] [2025-01-04 07:55:35,443][134294] Updated weights for policy 0, policy_version 141854 (0.0027) [2025-01-04 07:55:38,667][134294] Updated weights for policy 0, policy_version 141864 (0.0025) [2025-01-04 07:55:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14062.9, 300 sec: 14287.4). Total num frames: 581079040. Throughput: 0: 3594.9. Samples: 134437270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:38,968][134211] Avg episode reward: [(0, '8.097')] [2025-01-04 07:55:41,810][134294] Updated weights for policy 0, policy_version 141874 (0.0027) [2025-01-04 07:55:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 581140480. Throughput: 0: 3592.7. Samples: 134456470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:43,968][134211] Avg episode reward: [(0, '8.767')] [2025-01-04 07:55:44,997][134294] Updated weights for policy 0, policy_version 141884 (0.0025) [2025-01-04 07:55:47,973][134294] Updated weights for policy 0, policy_version 141894 (0.0024) [2025-01-04 07:55:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14176.3). Total num frames: 581210112. Throughput: 0: 3604.5. Samples: 134466220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:48,968][134211] Avg episode reward: [(0, '8.391')] [2025-01-04 07:55:51,103][134294] Updated weights for policy 0, policy_version 141904 (0.0024) [2025-01-04 07:55:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14063.0, 300 sec: 14176.3). Total num frames: 581275648. Throughput: 0: 3606.4. Samples: 134486528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:53,968][134211] Avg episode reward: [(0, '9.196')] [2025-01-04 07:55:54,188][134294] Updated weights for policy 0, policy_version 141914 (0.0027) [2025-01-04 07:55:57,596][134294] Updated weights for policy 0, policy_version 141924 (0.0025) [2025-01-04 07:55:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14063.0, 300 sec: 14190.2). Total num frames: 581337088. Throughput: 0: 3576.0. Samples: 134505122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:55:58,968][134211] Avg episode reward: [(0, '8.432')] [2025-01-04 07:56:00,695][134294] Updated weights for policy 0, policy_version 141934 (0.0024) [2025-01-04 07:56:02,788][134294] Updated weights for policy 0, policy_version 141944 (0.0013) [2025-01-04 07:56:03,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14472.5, 300 sec: 14273.5). Total num frames: 581427200. Throughput: 0: 3597.0. Samples: 134515832. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:03,968][134211] Avg episode reward: [(0, '8.524')] [2025-01-04 07:56:04,682][134294] Updated weights for policy 0, policy_version 141954 (0.0014) [2025-01-04 07:56:06,586][134294] Updated weights for policy 0, policy_version 141964 (0.0014) [2025-01-04 07:56:08,528][134294] Updated weights for policy 0, policy_version 141974 (0.0014) [2025-01-04 07:56:08,968][134211] Fps is (10 sec: 19661.0, 60 sec: 15155.2, 300 sec: 14412.4). Total num frames: 581533696. Throughput: 0: 3632.1. Samples: 134547560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:08,968][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 07:56:10,432][134294] Updated weights for policy 0, policy_version 141984 (0.0015) [2025-01-04 07:56:12,884][134294] Updated weights for policy 0, policy_version 141994 (0.0019) [2025-01-04 07:56:13,968][134211] Fps is (10 sec: 19251.9, 60 sec: 15428.4, 300 sec: 14467.9). Total num frames: 581619712. Throughput: 0: 3705.9. Samples: 134576182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:13,968][134211] Avg episode reward: [(0, '8.252')] [2025-01-04 07:56:16,046][134294] Updated weights for policy 0, policy_version 142004 (0.0027) [2025-01-04 07:56:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15155.2, 300 sec: 14454.0). Total num frames: 581681152. Throughput: 0: 3718.0. Samples: 134585818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:18,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 07:56:19,334][134294] Updated weights for policy 0, policy_version 142014 (0.0025) [2025-01-04 07:56:22,354][134294] Updated weights for policy 0, policy_version 142024 (0.0026) [2025-01-04 07:56:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14540.8, 300 sec: 14495.7). Total num frames: 581750784. Throughput: 0: 3730.9. Samples: 134605162. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:23,968][134211] Avg episode reward: [(0, '8.725')] [2025-01-04 07:56:25,505][134294] Updated weights for policy 0, policy_version 142034 (0.0025) [2025-01-04 07:56:28,774][134294] Updated weights for policy 0, policy_version 142044 (0.0026) [2025-01-04 07:56:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14426.3). Total num frames: 581812224. Throughput: 0: 3731.9. Samples: 134624408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:28,968][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 07:56:32,393][134294] Updated weights for policy 0, policy_version 142054 (0.0025) [2025-01-04 07:56:33,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14267.7, 300 sec: 14259.6). Total num frames: 581869568. Throughput: 0: 3710.0. Samples: 134633170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:33,969][134211] Avg episode reward: [(0, '7.791')] [2025-01-04 07:56:35,747][134294] Updated weights for policy 0, policy_version 142064 (0.0026) [2025-01-04 07:56:38,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14199.5, 300 sec: 14190.2). Total num frames: 581931008. Throughput: 0: 3656.3. Samples: 134651060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:38,968][134211] Avg episode reward: [(0, '8.609')] [2025-01-04 07:56:39,166][134294] Updated weights for policy 0, policy_version 142074 (0.0025) [2025-01-04 07:56:42,117][134294] Updated weights for policy 0, policy_version 142084 (0.0025) [2025-01-04 07:56:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.7, 300 sec: 14190.2). Total num frames: 581996544. Throughput: 0: 3678.8. Samples: 134670670. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:43,968][134211] Avg episode reward: [(0, '9.155')] [2025-01-04 07:56:45,114][134294] Updated weights for policy 0, policy_version 142094 (0.0024) [2025-01-04 07:56:48,047][134294] Updated weights for policy 0, policy_version 142104 (0.0024) [2025-01-04 07:56:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14245.8). Total num frames: 582070272. Throughput: 0: 3677.3. Samples: 134681308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:48,968][134211] Avg episode reward: [(0, '9.414')] [2025-01-04 07:56:50,936][134294] Updated weights for policy 0, policy_version 142114 (0.0023) [2025-01-04 07:56:53,843][134294] Updated weights for policy 0, policy_version 142124 (0.0023) [2025-01-04 07:56:53,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 582139904. Throughput: 0: 3438.0. Samples: 134702270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:53,968][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 07:56:57,177][134294] Updated weights for policy 0, policy_version 142134 (0.0026) [2025-01-04 07:56:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.2, 300 sec: 14301.3). Total num frames: 582201344. Throughput: 0: 3225.5. Samples: 134721330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:56:58,968][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 07:57:00,437][134294] Updated weights for policy 0, policy_version 142144 (0.0024) [2025-01-04 07:57:02,527][134294] Updated weights for policy 0, policy_version 142154 (0.0013) [2025-01-04 07:57:03,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14404.4, 300 sec: 14384.6). Total num frames: 582291456. Throughput: 0: 3260.4. Samples: 134732534. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:03,968][134211] Avg episode reward: [(0, '9.547')] [2025-01-04 07:57:04,479][134294] Updated weights for policy 0, policy_version 142164 (0.0013) [2025-01-04 07:57:06,383][134294] Updated weights for policy 0, policy_version 142174 (0.0013) [2025-01-04 07:57:08,230][134294] Updated weights for policy 0, policy_version 142184 (0.0013) [2025-01-04 07:57:08,968][134211] Fps is (10 sec: 19661.0, 60 sec: 14404.2, 300 sec: 14523.4). Total num frames: 582397952. Throughput: 0: 3527.6. Samples: 134763904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:08,968][134211] Avg episode reward: [(0, '8.499')] [2025-01-04 07:57:10,221][134294] Updated weights for policy 0, policy_version 142194 (0.0013) [2025-01-04 07:57:12,732][134294] Updated weights for policy 0, policy_version 142204 (0.0019) [2025-01-04 07:57:13,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14336.0, 300 sec: 14537.3). Total num frames: 582479872. Throughput: 0: 3709.5. Samples: 134791336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:13,968][134211] Avg episode reward: [(0, '8.574')] [2025-01-04 07:57:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000142207_582479872.pth... [2025-01-04 07:57:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000141350_578969600.pth [2025-01-04 07:57:16,262][134294] Updated weights for policy 0, policy_version 142214 (0.0032) [2025-01-04 07:57:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 582541312. Throughput: 0: 3713.5. Samples: 134800278. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:18,968][134211] Avg episode reward: [(0, '7.973')] [2025-01-04 07:57:19,258][134294] Updated weights for policy 0, policy_version 142224 (0.0026) [2025-01-04 07:57:22,559][134294] Updated weights for policy 0, policy_version 142234 (0.0026) [2025-01-04 07:57:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14267.8, 300 sec: 14481.8). Total num frames: 582606848. Throughput: 0: 3747.0. Samples: 134819676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:23,968][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 07:57:25,906][134294] Updated weights for policy 0, policy_version 142244 (0.0024) [2025-01-04 07:57:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.5, 300 sec: 14384.7). Total num frames: 582664192. Throughput: 0: 3714.4. Samples: 134837816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:57:28,968][134211] Avg episode reward: [(0, '8.196')] [2025-01-04 07:57:29,426][134294] Updated weights for policy 0, policy_version 142254 (0.0024) [2025-01-04 07:57:32,837][134294] Updated weights for policy 0, policy_version 142264 (0.0027) [2025-01-04 07:57:33,969][134211] Fps is (10 sec: 11876.9, 60 sec: 14267.5, 300 sec: 14370.7). Total num frames: 582725632. Throughput: 0: 3672.0. Samples: 134846552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:33,969][134211] Avg episode reward: [(0, '8.809')] [2025-01-04 07:57:35,819][134294] Updated weights for policy 0, policy_version 142274 (0.0027) [2025-01-04 07:57:38,803][134294] Updated weights for policy 0, policy_version 142284 (0.0024) [2025-01-04 07:57:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14404.2, 300 sec: 14398.5). Total num frames: 582795264. Throughput: 0: 3653.0. Samples: 134866656. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:38,968][134211] Avg episode reward: [(0, '8.226')] [2025-01-04 07:57:41,802][134294] Updated weights for policy 0, policy_version 142294 (0.0025) [2025-01-04 07:57:43,968][134211] Fps is (10 sec: 13928.2, 60 sec: 14472.6, 300 sec: 14398.5). Total num frames: 582864896. Throughput: 0: 3677.2. Samples: 134886804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:43,968][134211] Avg episode reward: [(0, '9.424')] [2025-01-04 07:57:44,838][134294] Updated weights for policy 0, policy_version 142304 (0.0023) [2025-01-04 07:57:47,814][134294] Updated weights for policy 0, policy_version 142314 (0.0025) [2025-01-04 07:57:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 14398.5). Total num frames: 582930432. Throughput: 0: 3655.9. Samples: 134897050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:48,968][134211] Avg episode reward: [(0, '8.783')] [2025-01-04 07:57:50,782][134294] Updated weights for policy 0, policy_version 142324 (0.0026) [2025-01-04 07:57:53,715][134294] Updated weights for policy 0, policy_version 142334 (0.0025) [2025-01-04 07:57:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14426.3). Total num frames: 583004160. Throughput: 0: 3426.3. Samples: 134918086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:53,968][134211] Avg episode reward: [(0, '8.771')] [2025-01-04 07:57:56,866][134294] Updated weights for policy 0, policy_version 142344 (0.0026) [2025-01-04 07:57:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14370.7). Total num frames: 583065600. Throughput: 0: 3241.3. Samples: 134937194. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:57:58,968][134211] Avg episode reward: [(0, '7.815')] [2025-01-04 07:58:00,228][134294] Updated weights for policy 0, policy_version 142354 (0.0025) [2025-01-04 07:58:02,483][134294] Updated weights for policy 0, policy_version 142364 (0.0013) [2025-01-04 07:58:03,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 583151616. Throughput: 0: 3269.4. Samples: 134947400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:03,968][134211] Avg episode reward: [(0, '8.936')] [2025-01-04 07:58:04,410][134294] Updated weights for policy 0, policy_version 142374 (0.0014) [2025-01-04 07:58:07,048][134294] Updated weights for policy 0, policy_version 142384 (0.0024) [2025-01-04 07:58:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 13858.1, 300 sec: 14440.1). Total num frames: 583229440. Throughput: 0: 3446.7. Samples: 134974776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:08,968][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 07:58:10,276][134294] Updated weights for policy 0, policy_version 142394 (0.0027) [2025-01-04 07:58:13,188][134294] Updated weights for policy 0, policy_version 142404 (0.0025) [2025-01-04 07:58:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13585.1, 300 sec: 14301.3). Total num frames: 583294976. Throughput: 0: 3489.3. Samples: 134994836. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:13,968][134211] Avg episode reward: [(0, '8.545')] [2025-01-04 07:58:16,204][134294] Updated weights for policy 0, policy_version 142414 (0.0025) [2025-01-04 07:58:18,968][134211] Fps is (10 sec: 13106.4, 60 sec: 13653.2, 300 sec: 14190.2). Total num frames: 583360512. Throughput: 0: 3522.1. Samples: 135005046. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:18,969][134211] Avg episode reward: [(0, '8.598')] [2025-01-04 07:58:19,415][134294] Updated weights for policy 0, policy_version 142424 (0.0028) [2025-01-04 07:58:22,432][134294] Updated weights for policy 0, policy_version 142434 (0.0026) [2025-01-04 07:58:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 14218.0). Total num frames: 583430144. Throughput: 0: 3516.2. Samples: 135024884. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:23,968][134211] Avg episode reward: [(0, '8.560')] [2025-01-04 07:58:25,467][134294] Updated weights for policy 0, policy_version 142444 (0.0023) [2025-01-04 07:58:28,472][134294] Updated weights for policy 0, policy_version 142454 (0.0024) [2025-01-04 07:58:28,968][134211] Fps is (10 sec: 13517.8, 60 sec: 13858.1, 300 sec: 14231.9). Total num frames: 583495680. Throughput: 0: 3508.4. Samples: 135044684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:28,968][134211] Avg episode reward: [(0, '9.555')] [2025-01-04 07:58:30,648][134294] Updated weights for policy 0, policy_version 142464 (0.0014) [2025-01-04 07:58:32,775][134294] Updated weights for policy 0, policy_version 142474 (0.0012) [2025-01-04 07:58:33,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14472.8, 300 sec: 14356.8). Total num frames: 583593984. Throughput: 0: 3598.7. Samples: 135058994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:58:33,968][134211] Avg episode reward: [(0, '8.298')] [2025-01-04 07:58:35,538][134294] Updated weights for policy 0, policy_version 142484 (0.0020) [2025-01-04 07:58:38,547][134294] Updated weights for policy 0, policy_version 142494 (0.0026) [2025-01-04 07:58:38,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14404.3, 300 sec: 14356.8). Total num frames: 583659520. Throughput: 0: 3655.5. Samples: 135082584. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:58:38,968][134211] Avg episode reward: [(0, '8.272')] [2025-01-04 07:58:41,782][134294] Updated weights for policy 0, policy_version 142504 (0.0027) [2025-01-04 07:58:43,969][134211] Fps is (10 sec: 12695.9, 60 sec: 14267.4, 300 sec: 14329.0). Total num frames: 583720960. Throughput: 0: 3658.3. Samples: 135101824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:58:43,970][134211] Avg episode reward: [(0, '8.776')] [2025-01-04 07:58:44,967][134294] Updated weights for policy 0, policy_version 142514 (0.0024) [2025-01-04 07:58:48,031][134294] Updated weights for policy 0, policy_version 142524 (0.0026) [2025-01-04 07:58:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 583790592. Throughput: 0: 3647.4. Samples: 135111536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:58:48,968][134211] Avg episode reward: [(0, '8.486')] [2025-01-04 07:58:51,062][134294] Updated weights for policy 0, policy_version 142534 (0.0026) [2025-01-04 07:58:53,968][134211] Fps is (10 sec: 13518.5, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 583856128. Throughput: 0: 3490.5. Samples: 135131848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:58:53,968][134211] Avg episode reward: [(0, '8.299')] [2025-01-04 07:58:54,126][134294] Updated weights for policy 0, policy_version 142544 (0.0028) [2025-01-04 07:58:57,430][134294] Updated weights for policy 0, policy_version 142554 (0.0027) [2025-01-04 07:58:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 583917568. Throughput: 0: 3460.0. Samples: 135150534. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:58:58,969][134211] Avg episode reward: [(0, '9.044')] [2025-01-04 07:59:00,426][134294] Updated weights for policy 0, policy_version 142564 (0.0020) [2025-01-04 07:59:02,380][134294] Updated weights for policy 0, policy_version 142574 (0.0013) [2025-01-04 07:59:03,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14336.0, 300 sec: 14426.3). Total num frames: 584011776. Throughput: 0: 3504.1. Samples: 135162730. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:03,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 07:59:04,361][134294] Updated weights for policy 0, policy_version 142584 (0.0013) [2025-01-04 07:59:06,278][134294] Updated weights for policy 0, policy_version 142594 (0.0012) [2025-01-04 07:59:08,689][134294] Updated weights for policy 0, policy_version 142604 (0.0019) [2025-01-04 07:59:08,968][134211] Fps is (10 sec: 18841.4, 60 sec: 14609.1, 300 sec: 14509.6). Total num frames: 584105984. Throughput: 0: 3765.5. Samples: 135194330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:08,969][134211] Avg episode reward: [(0, '9.088')] [2025-01-04 07:59:12,210][134294] Updated weights for policy 0, policy_version 142614 (0.0028) [2025-01-04 07:59:13,969][134211] Fps is (10 sec: 15562.7, 60 sec: 14540.5, 300 sec: 14481.7). Total num frames: 584167424. Throughput: 0: 3734.2. Samples: 135212728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:13,970][134211] Avg episode reward: [(0, '8.795')] [2025-01-04 07:59:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000142619_584167424.pth... 
[2025-01-04 07:59:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000141771_580694016.pth [2025-01-04 07:59:15,626][134294] Updated weights for policy 0, policy_version 142624 (0.0031) [2025-01-04 07:59:18,666][134294] Updated weights for policy 0, policy_version 142634 (0.0027) [2025-01-04 07:59:18,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14472.7, 300 sec: 14329.1). Total num frames: 584228864. Throughput: 0: 3625.3. Samples: 135222134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:18,968][134211] Avg episode reward: [(0, '8.393')] [2025-01-04 07:59:21,801][134294] Updated weights for policy 0, policy_version 142644 (0.0026) [2025-01-04 07:59:23,968][134211] Fps is (10 sec: 13108.7, 60 sec: 14472.5, 300 sec: 14204.1). Total num frames: 584298496. Throughput: 0: 3538.4. Samples: 135241814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:23,968][134211] Avg episode reward: [(0, '9.259')] [2025-01-04 07:59:24,935][134294] Updated weights for policy 0, policy_version 142654 (0.0026) [2025-01-04 07:59:28,430][134294] Updated weights for policy 0, policy_version 142664 (0.0028) [2025-01-04 07:59:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14176.3). Total num frames: 584355840. Throughput: 0: 3528.5. Samples: 135260602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:28,968][134211] Avg episode reward: [(0, '8.354')] [2025-01-04 07:59:31,795][134294] Updated weights for policy 0, policy_version 142674 (0.0022) [2025-01-04 07:59:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13721.6, 300 sec: 14176.3). Total num frames: 584417280. Throughput: 0: 3508.1. Samples: 135269400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 07:59:33,968][134211] Avg episode reward: [(0, '9.300')] [2025-01-04 07:59:35,009][134294] Updated weights for policy 0, policy_version 142684 (0.0027) [2025-01-04 07:59:38,047][134294] Updated weights for policy 0, policy_version 142694 (0.0026) [2025-01-04 07:59:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13789.9, 300 sec: 14204.1). Total num frames: 584486912. Throughput: 0: 3492.4. Samples: 135289008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:59:38,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 07:59:41,060][134294] Updated weights for policy 0, policy_version 142704 (0.0026) [2025-01-04 07:59:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.4, 300 sec: 14204.1). Total num frames: 584552448. Throughput: 0: 3521.4. Samples: 135308998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:59:43,968][134211] Avg episode reward: [(0, '8.701')] [2025-01-04 07:59:44,156][134294] Updated weights for policy 0, policy_version 142714 (0.0024) [2025-01-04 07:59:47,257][134294] Updated weights for policy 0, policy_version 142724 (0.0025) [2025-01-04 07:59:48,967][134211] Fps is (10 sec: 14336.4, 60 sec: 13994.7, 300 sec: 14231.9). Total num frames: 584630272. Throughput: 0: 3471.5. Samples: 135318948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:59:48,968][134211] Avg episode reward: [(0, '8.890')] [2025-01-04 07:59:49,330][134294] Updated weights for policy 0, policy_version 142734 (0.0015) [2025-01-04 07:59:51,318][134294] Updated weights for policy 0, policy_version 142744 (0.0013) [2025-01-04 07:59:53,254][134294] Updated weights for policy 0, policy_version 142754 (0.0013) [2025-01-04 07:59:53,968][134211] Fps is (10 sec: 18022.6, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 584732672. Throughput: 0: 3404.2. Samples: 135347520. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:59:53,968][134211] Avg episode reward: [(0, '8.856')] [2025-01-04 07:59:55,366][134294] Updated weights for policy 0, policy_version 142764 (0.0015) [2025-01-04 07:59:58,854][134294] Updated weights for policy 0, policy_version 142774 (0.0029) [2025-01-04 07:59:58,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14745.6, 300 sec: 14384.6). Total num frames: 584802304. Throughput: 0: 3534.8. Samples: 135371788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 07:59:58,969][134211] Avg episode reward: [(0, '8.973')] [2025-01-04 08:00:02,635][134294] Updated weights for policy 0, policy_version 142784 (0.0032) [2025-01-04 08:00:03,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14062.9, 300 sec: 14342.9). Total num frames: 584855552. Throughput: 0: 3513.7. Samples: 135380250. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:03,968][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 08:00:05,976][134294] Updated weights for policy 0, policy_version 142794 (0.0022) [2025-01-04 08:00:08,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13585.1, 300 sec: 14329.1). Total num frames: 584921088. Throughput: 0: 3471.9. Samples: 135398048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:08,968][134211] Avg episode reward: [(0, '8.457')] [2025-01-04 08:00:09,141][134294] Updated weights for policy 0, policy_version 142804 (0.0025) [2025-01-04 08:00:12,149][134294] Updated weights for policy 0, policy_version 142814 (0.0025) [2025-01-04 08:00:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13653.6, 300 sec: 14287.4). Total num frames: 584986624. Throughput: 0: 3494.0. Samples: 135417834. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:13,968][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 08:00:15,335][134294] Updated weights for policy 0, policy_version 142824 (0.0025) [2025-01-04 08:00:18,330][134294] Updated weights for policy 0, policy_version 142834 (0.0030) [2025-01-04 08:00:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13721.6, 300 sec: 14148.6). Total num frames: 585052160. Throughput: 0: 3526.2. Samples: 135428078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:18,968][134211] Avg episode reward: [(0, '9.144')] [2025-01-04 08:00:21,399][134294] Updated weights for policy 0, policy_version 142844 (0.0026) [2025-01-04 08:00:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13653.3, 300 sec: 14120.8). Total num frames: 585117696. Throughput: 0: 3526.3. Samples: 135447692. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:23,968][134211] Avg episode reward: [(0, '8.661')] [2025-01-04 08:00:24,671][134294] Updated weights for policy 0, policy_version 142854 (0.0023) [2025-01-04 08:00:27,282][134294] Updated weights for policy 0, policy_version 142864 (0.0021) [2025-01-04 08:00:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 585203712. Throughput: 0: 3588.9. Samples: 135470498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:28,968][134211] Avg episode reward: [(0, '8.379')] [2025-01-04 08:00:29,328][134294] Updated weights for policy 0, policy_version 142874 (0.0013) [2025-01-04 08:00:32,032][134294] Updated weights for policy 0, policy_version 142884 (0.0020) [2025-01-04 08:00:33,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14267.7, 300 sec: 14218.0). Total num frames: 585273344. Throughput: 0: 3666.6. Samples: 135483948. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:33,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 08:00:35,617][134294] Updated weights for policy 0, policy_version 142894 (0.0030) [2025-01-04 08:00:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14062.9, 300 sec: 14204.1). Total num frames: 585330688. Throughput: 0: 3416.6. Samples: 135501266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:38,968][134211] Avg episode reward: [(0, '10.020')] [2025-01-04 08:00:39,165][134294] Updated weights for policy 0, policy_version 142904 (0.0026) [2025-01-04 08:00:41,248][134294] Updated weights for policy 0, policy_version 142914 (0.0014) [2025-01-04 08:00:43,162][134294] Updated weights for policy 0, policy_version 142924 (0.0012) [2025-01-04 08:00:43,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14677.4, 300 sec: 14315.2). Total num frames: 585433088. Throughput: 0: 3462.3. Samples: 135527590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:43,968][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 08:00:45,039][134294] Updated weights for policy 0, policy_version 142934 (0.0012) [2025-01-04 08:00:46,908][134294] Updated weights for policy 0, policy_version 142944 (0.0014) [2025-01-04 08:00:48,811][134294] Updated weights for policy 0, policy_version 142954 (0.0013) [2025-01-04 08:00:48,968][134211] Fps is (10 sec: 21299.6, 60 sec: 15223.5, 300 sec: 14467.9). Total num frames: 585543680. Throughput: 0: 3637.7. Samples: 135543944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:48,968][134211] Avg episode reward: [(0, '8.623')] [2025-01-04 08:00:51,239][134294] Updated weights for policy 0, policy_version 142964 (0.0019) [2025-01-04 08:00:53,968][134211] Fps is (10 sec: 18022.0, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 585613312. Throughput: 0: 3847.8. Samples: 135571200. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:53,968][134211] Avg episode reward: [(0, '8.183')] [2025-01-04 08:00:54,546][134294] Updated weights for policy 0, policy_version 142974 (0.0028) [2025-01-04 08:00:58,329][134294] Updated weights for policy 0, policy_version 142984 (0.0025) [2025-01-04 08:00:58,968][134211] Fps is (10 sec: 12287.1, 60 sec: 14404.1, 300 sec: 14370.7). Total num frames: 585666560. Throughput: 0: 3785.9. Samples: 135588202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:00:58,969][134211] Avg episode reward: [(0, '8.194')] [2025-01-04 08:01:01,800][134294] Updated weights for policy 0, policy_version 142994 (0.0025) [2025-01-04 08:01:03,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 585728000. Throughput: 0: 3763.7. Samples: 135597446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:03,968][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 08:01:04,979][134294] Updated weights for policy 0, policy_version 143004 (0.0025) [2025-01-04 08:01:08,041][134294] Updated weights for policy 0, policy_version 143014 (0.0025) [2025-01-04 08:01:08,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14540.8, 300 sec: 14148.5). Total num frames: 585793536. Throughput: 0: 3754.6. Samples: 135616648. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:08,968][134211] Avg episode reward: [(0, '8.774')] [2025-01-04 08:01:11,064][134294] Updated weights for policy 0, policy_version 143024 (0.0026) [2025-01-04 08:01:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14176.3). Total num frames: 585863168. Throughput: 0: 3682.5. Samples: 135636212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:13,968][134211] Avg episode reward: [(0, '8.043')] [2025-01-04 08:01:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143033_585863168.pth... 
[2025-01-04 08:01:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000142207_582479872.pth [2025-01-04 08:01:14,377][134294] Updated weights for policy 0, policy_version 143034 (0.0025) [2025-01-04 08:01:17,453][134294] Updated weights for policy 0, policy_version 143044 (0.0026) [2025-01-04 08:01:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14148.6). Total num frames: 585924608. Throughput: 0: 3601.0. Samples: 135645992. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:18,968][134211] Avg episode reward: [(0, '8.879')] [2025-01-04 08:01:20,410][134294] Updated weights for policy 0, policy_version 143054 (0.0025) [2025-01-04 08:01:23,358][134294] Updated weights for policy 0, policy_version 143064 (0.0024) [2025-01-04 08:01:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.4, 300 sec: 14190.2). Total num frames: 585998336. Throughput: 0: 3675.0. Samples: 135666640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:23,968][134211] Avg episode reward: [(0, '8.565')] [2025-01-04 08:01:26,466][134294] Updated weights for policy 0, policy_version 143074 (0.0025) [2025-01-04 08:01:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 586059776. Throughput: 0: 3523.4. Samples: 135686146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:28,968][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 08:01:29,794][134294] Updated weights for policy 0, policy_version 143084 (0.0027) [2025-01-04 08:01:33,030][134294] Updated weights for policy 0, policy_version 143094 (0.0026) [2025-01-04 08:01:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14218.0). Total num frames: 586125312. Throughput: 0: 3369.3. Samples: 135695564. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:33,968][134211] Avg episode reward: [(0, '10.162')] [2025-01-04 08:01:36,030][134294] Updated weights for policy 0, policy_version 143104 (0.0023) [2025-01-04 08:01:38,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14335.9, 300 sec: 14218.0). Total num frames: 586190848. Throughput: 0: 3211.1. Samples: 135715700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:01:38,969][134211] Avg episode reward: [(0, '7.850')] [2025-01-04 08:01:39,133][134294] Updated weights for policy 0, policy_version 143114 (0.0026) [2025-01-04 08:01:42,083][134294] Updated weights for policy 0, policy_version 143124 (0.0025) [2025-01-04 08:01:43,967][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 586268672. Throughput: 0: 3289.5. Samples: 135736226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:01:43,968][134211] Avg episode reward: [(0, '8.231')] [2025-01-04 08:01:44,369][134294] Updated weights for policy 0, policy_version 143134 (0.0016) [2025-01-04 08:01:46,189][134294] Updated weights for policy 0, policy_version 143144 (0.0013) [2025-01-04 08:01:48,372][134294] Updated weights for policy 0, policy_version 143154 (0.0018) [2025-01-04 08:01:48,969][134211] Fps is (10 sec: 17201.3, 60 sec: 13653.0, 300 sec: 14315.1). Total num frames: 586362880. Throughput: 0: 3444.1. Samples: 135752436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:01:48,970][134211] Avg episode reward: [(0, '7.169')] [2025-01-04 08:01:51,566][134294] Updated weights for policy 0, policy_version 143164 (0.0029) [2025-01-04 08:01:53,968][134211] Fps is (10 sec: 15974.1, 60 sec: 13585.1, 300 sec: 14329.1). Total num frames: 586428416. Throughput: 0: 3516.2. Samples: 135774876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:01:53,968][134211] Avg episode reward: [(0, '7.848')] [2025-01-04 08:01:55,092][134294] Updated weights for policy 0, policy_version 143174 (0.0024) [2025-01-04 08:01:58,363][134294] Updated weights for policy 0, policy_version 143184 (0.0028) [2025-01-04 08:01:58,968][134211] Fps is (10 sec: 12289.9, 60 sec: 13653.5, 300 sec: 14218.0). Total num frames: 586485760. Throughput: 0: 3485.2. Samples: 135793048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:01:58,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 08:02:01,680][134294] Updated weights for policy 0, policy_version 143194 (0.0027) [2025-01-04 08:02:03,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13653.3, 300 sec: 14065.2). Total num frames: 586547200. Throughput: 0: 3474.9. Samples: 135802364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:03,968][134211] Avg episode reward: [(0, '8.340')] [2025-01-04 08:02:04,864][134294] Updated weights for policy 0, policy_version 143204 (0.0024) [2025-01-04 08:02:06,788][134294] Updated weights for policy 0, policy_version 143214 (0.0015) [2025-01-04 08:02:08,621][134294] Updated weights for policy 0, policy_version 143224 (0.0014) [2025-01-04 08:02:08,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 586649600. Throughput: 0: 3556.9. Samples: 135826698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:08,968][134211] Avg episode reward: [(0, '8.331')] [2025-01-04 08:02:10,519][134294] Updated weights for policy 0, policy_version 143234 (0.0013) [2025-01-04 08:02:12,504][134294] Updated weights for policy 0, policy_version 143244 (0.0017) [2025-01-04 08:02:13,968][134211] Fps is (10 sec: 20070.7, 60 sec: 14745.6, 300 sec: 14259.6). Total num frames: 586747904. Throughput: 0: 3809.7. Samples: 135857584. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:13,968][134211] Avg episode reward: [(0, '8.343')] [2025-01-04 08:02:15,487][134294] Updated weights for policy 0, policy_version 143254 (0.0026) [2025-01-04 08:02:18,644][134294] Updated weights for policy 0, policy_version 143264 (0.0027) [2025-01-04 08:02:18,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14745.6, 300 sec: 14245.7). Total num frames: 586809344. Throughput: 0: 3822.4. Samples: 135867574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:18,968][134211] Avg episode reward: [(0, '8.759')] [2025-01-04 08:02:21,780][134294] Updated weights for policy 0, policy_version 143274 (0.0025) [2025-01-04 08:02:23,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14609.0, 300 sec: 14273.5). Total num frames: 586874880. Throughput: 0: 3807.0. Samples: 135887012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:23,968][134211] Avg episode reward: [(0, '8.897')] [2025-01-04 08:02:24,964][134294] Updated weights for policy 0, policy_version 143284 (0.0029) [2025-01-04 08:02:28,220][134294] Updated weights for policy 0, policy_version 143294 (0.0024) [2025-01-04 08:02:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.4, 300 sec: 14287.5). Total num frames: 586940416. Throughput: 0: 3777.8. Samples: 135906226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:28,968][134211] Avg episode reward: [(0, '8.183')] [2025-01-04 08:02:31,713][134294] Updated weights for policy 0, policy_version 143304 (0.0028) [2025-01-04 08:02:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 586997760. Throughput: 0: 3615.4. Samples: 135915126. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:33,969][134211] Avg episode reward: [(0, '8.361')] [2025-01-04 08:02:35,442][134294] Updated weights for policy 0, policy_version 143314 (0.0030) [2025-01-04 08:02:38,567][134294] Updated weights for policy 0, policy_version 143324 (0.0027) [2025-01-04 08:02:38,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14404.4, 300 sec: 14204.1). Total num frames: 587055104. Throughput: 0: 3503.6. Samples: 135932538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:38,968][134211] Avg episode reward: [(0, '9.501')] [2025-01-04 08:02:41,914][134294] Updated weights for policy 0, policy_version 143334 (0.0025) [2025-01-04 08:02:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14199.4, 300 sec: 14204.1). Total num frames: 587120640. Throughput: 0: 3520.4. Samples: 135951468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:02:43,968][134211] Avg episode reward: [(0, '9.065')] [2025-01-04 08:02:44,963][134294] Updated weights for policy 0, policy_version 143344 (0.0025) [2025-01-04 08:02:47,976][134294] Updated weights for policy 0, policy_version 143354 (0.0024) [2025-01-04 08:02:48,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13858.5, 300 sec: 14204.1). Total num frames: 587194368. Throughput: 0: 3535.2. Samples: 135961448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:02:48,968][134211] Avg episode reward: [(0, '8.216')] [2025-01-04 08:02:49,972][134294] Updated weights for policy 0, policy_version 143364 (0.0014) [2025-01-04 08:02:51,932][134294] Updated weights for policy 0, policy_version 143374 (0.0014) [2025-01-04 08:02:53,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 587292672. Throughput: 0: 3619.1. Samples: 135989558. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:02:53,968][134211] Avg episode reward: [(0, '8.451')] [2025-01-04 08:02:54,311][134294] Updated weights for policy 0, policy_version 143384 (0.0022) [2025-01-04 08:02:57,692][134294] Updated weights for policy 0, policy_version 143394 (0.0030) [2025-01-04 08:02:58,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 587354112. Throughput: 0: 3392.3. Samples: 136010236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:02:58,968][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 08:03:00,889][134294] Updated weights for policy 0, policy_version 143404 (0.0027) [2025-01-04 08:03:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.8, 300 sec: 14204.1). Total num frames: 587419648. Throughput: 0: 3378.2. Samples: 136019594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:03,968][134211] Avg episode reward: [(0, '7.783')] [2025-01-04 08:03:04,155][134294] Updated weights for policy 0, policy_version 143414 (0.0026) [2025-01-04 08:03:07,183][134294] Updated weights for policy 0, policy_version 143424 (0.0024) [2025-01-04 08:03:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.3, 300 sec: 14204.1). Total num frames: 587485184. Throughput: 0: 3384.2. Samples: 136039300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:08,969][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 08:03:10,199][134294] Updated weights for policy 0, policy_version 143434 (0.0029) [2025-01-04 08:03:13,267][134294] Updated weights for policy 0, policy_version 143444 (0.0026) [2025-01-04 08:03:13,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13448.4, 300 sec: 14218.0). Total num frames: 587554816. Throughput: 0: 3416.3. Samples: 136059962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:13,969][134211] Avg episode reward: [(0, '8.015')] [2025-01-04 08:03:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143446_587554816.pth... [2025-01-04 08:03:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000142619_584167424.pth [2025-01-04 08:03:16,243][134294] Updated weights for policy 0, policy_version 143454 (0.0023) [2025-01-04 08:03:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13585.0, 300 sec: 14218.0). Total num frames: 587624448. Throughput: 0: 3436.0. Samples: 136069746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:18,968][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 08:03:19,248][134294] Updated weights for policy 0, policy_version 143464 (0.0028) [2025-01-04 08:03:22,184][134294] Updated weights for policy 0, policy_version 143474 (0.0024) [2025-01-04 08:03:23,968][134211] Fps is (10 sec: 15155.7, 60 sec: 13858.2, 300 sec: 14273.5). Total num frames: 587706368. Throughput: 0: 3514.8. Samples: 136090706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:23,968][134211] Avg episode reward: [(0, '8.346')] [2025-01-04 08:03:24,028][134294] Updated weights for policy 0, policy_version 143484 (0.0014) [2025-01-04 08:03:25,963][134294] Updated weights for policy 0, policy_version 143494 (0.0015) [2025-01-04 08:03:27,982][134294] Updated weights for policy 0, policy_version 143504 (0.0016) [2025-01-04 08:03:28,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 587808768. Throughput: 0: 3798.3. Samples: 136122390. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:28,968][134211] Avg episode reward: [(0, '9.055')] [2025-01-04 08:03:30,389][134294] Updated weights for policy 0, policy_version 143514 (0.0017) [2025-01-04 08:03:33,842][134294] Updated weights for policy 0, policy_version 143524 (0.0027) [2025-01-04 08:03:33,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14609.1, 300 sec: 14287.4). Total num frames: 587874304. Throughput: 0: 3835.8. Samples: 136134062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:33,969][134211] Avg episode reward: [(0, '8.766')] [2025-01-04 08:03:36,974][134294] Updated weights for policy 0, policy_version 143534 (0.0022) [2025-01-04 08:03:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14745.6, 300 sec: 14301.3). Total num frames: 587939840. Throughput: 0: 3623.6. Samples: 136152620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:38,968][134211] Avg episode reward: [(0, '9.665')] [2025-01-04 08:03:40,162][134294] Updated weights for policy 0, policy_version 143544 (0.0024) [2025-01-04 08:03:43,143][134294] Updated weights for policy 0, policy_version 143554 (0.0027) [2025-01-04 08:03:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14745.6, 300 sec: 14287.4). Total num frames: 588005376. Throughput: 0: 3605.0. Samples: 136172460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:43,968][134211] Avg episode reward: [(0, '8.575')] [2025-01-04 08:03:46,186][134294] Updated weights for policy 0, policy_version 143564 (0.0025) [2025-01-04 08:03:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 14287.4). Total num frames: 588070912. Throughput: 0: 3624.3. Samples: 136182686. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:03:48,968][134211] Avg episode reward: [(0, '9.140')] [2025-01-04 08:03:49,273][134294] Updated weights for policy 0, policy_version 143574 (0.0024) [2025-01-04 08:03:52,294][134294] Updated weights for policy 0, policy_version 143584 (0.0025) [2025-01-04 08:03:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14315.2). Total num frames: 588140544. Throughput: 0: 3634.2. Samples: 136202840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:03:53,968][134211] Avg episode reward: [(0, '9.305')] [2025-01-04 08:03:55,305][134294] Updated weights for policy 0, policy_version 143594 (0.0023) [2025-01-04 08:03:58,539][134294] Updated weights for policy 0, policy_version 143604 (0.0023) [2025-01-04 08:03:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 14218.0). Total num frames: 588206080. Throughput: 0: 3617.9. Samples: 136222766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:03:58,969][134211] Avg episode reward: [(0, '8.383')] [2025-01-04 08:04:01,637][134294] Updated weights for policy 0, policy_version 143614 (0.0025) [2025-01-04 08:04:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 588267520. Throughput: 0: 3616.6. Samples: 136232494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:03,968][134211] Avg episode reward: [(0, '9.110')] [2025-01-04 08:04:05,129][134294] Updated weights for policy 0, policy_version 143624 (0.0028) [2025-01-04 08:04:07,705][134294] Updated weights for policy 0, policy_version 143634 (0.0019) [2025-01-04 08:04:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14404.3, 300 sec: 14176.4). Total num frames: 588349440. Throughput: 0: 3571.8. Samples: 136251436. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:08,968][134211] Avg episode reward: [(0, '7.850')] [2025-01-04 08:04:09,644][134294] Updated weights for policy 0, policy_version 143644 (0.0014) [2025-01-04 08:04:11,710][134294] Updated weights for policy 0, policy_version 143654 (0.0016) [2025-01-04 08:04:13,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14677.4, 300 sec: 14259.6). Total num frames: 588435456. Throughput: 0: 3509.1. Samples: 136280300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:13,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 08:04:14,664][134294] Updated weights for policy 0, policy_version 143664 (0.0023) [2025-01-04 08:04:17,750][134294] Updated weights for policy 0, policy_version 143674 (0.0024) [2025-01-04 08:04:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14609.1, 300 sec: 14245.8). Total num frames: 588500992. Throughput: 0: 3469.0. Samples: 136290166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:18,968][134211] Avg episode reward: [(0, '8.716')] [2025-01-04 08:04:20,795][134294] Updated weights for policy 0, policy_version 143684 (0.0023) [2025-01-04 08:04:23,778][134294] Updated weights for policy 0, policy_version 143694 (0.0026) [2025-01-04 08:04:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.2, 300 sec: 14287.4). Total num frames: 588570624. Throughput: 0: 3508.0. Samples: 136310482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:23,969][134211] Avg episode reward: [(0, '8.319')] [2025-01-04 08:04:27,023][134294] Updated weights for policy 0, policy_version 143704 (0.0025) [2025-01-04 08:04:28,970][134211] Fps is (10 sec: 13104.6, 60 sec: 13721.2, 300 sec: 14287.3). Total num frames: 588632064. Throughput: 0: 3487.6. Samples: 136329410. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:28,970][134211] Avg episode reward: [(0, '7.747')] [2025-01-04 08:04:30,285][134294] Updated weights for policy 0, policy_version 143714 (0.0027) [2025-01-04 08:04:32,891][134294] Updated weights for policy 0, policy_version 143724 (0.0015) [2025-01-04 08:04:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13994.7, 300 sec: 14329.1). Total num frames: 588713984. Throughput: 0: 3475.7. Samples: 136339094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:33,968][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 08:04:35,031][134294] Updated weights for policy 0, policy_version 143734 (0.0014) [2025-01-04 08:04:38,079][134294] Updated weights for policy 0, policy_version 143744 (0.0026) [2025-01-04 08:04:38,968][134211] Fps is (10 sec: 15157.6, 60 sec: 14062.8, 300 sec: 14342.9). Total num frames: 588783616. Throughput: 0: 3588.6. Samples: 136364328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:38,969][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 08:04:41,177][134294] Updated weights for policy 0, policy_version 143754 (0.0027) [2025-01-04 08:04:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14301.3). Total num frames: 588849152. Throughput: 0: 3576.4. Samples: 136383706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:43,968][134211] Avg episode reward: [(0, '8.375')] [2025-01-04 08:04:44,316][134294] Updated weights for policy 0, policy_version 143764 (0.0028) [2025-01-04 08:04:47,304][134294] Updated weights for policy 0, policy_version 143774 (0.0026) [2025-01-04 08:04:48,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14062.9, 300 sec: 14176.3). Total num frames: 588914688. Throughput: 0: 3583.5. Samples: 136393750. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:04:48,968][134211] Avg episode reward: [(0, '9.511')] [2025-01-04 08:04:50,364][134294] Updated weights for policy 0, policy_version 143784 (0.0026) [2025-01-04 08:04:53,325][134294] Updated weights for policy 0, policy_version 143794 (0.0026) [2025-01-04 08:04:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14131.1, 300 sec: 14190.2). Total num frames: 588988416. Throughput: 0: 3620.4. Samples: 136414356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:04:53,969][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 08:04:56,348][134294] Updated weights for policy 0, policy_version 143804 (0.0022) [2025-01-04 08:04:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14062.9, 300 sec: 14218.0). Total num frames: 589049856. Throughput: 0: 3409.7. Samples: 136433738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:04:58,968][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 08:04:59,828][134294] Updated weights for policy 0, policy_version 143814 (0.0026) [2025-01-04 08:05:01,944][134294] Updated weights for policy 0, policy_version 143824 (0.0013) [2025-01-04 08:05:03,967][134211] Fps is (10 sec: 15155.9, 60 sec: 14540.9, 300 sec: 14301.3). Total num frames: 589139968. Throughput: 0: 3442.4. Samples: 136445074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:03,968][134211] Avg episode reward: [(0, '9.512')] [2025-01-04 08:05:03,976][134294] Updated weights for policy 0, policy_version 143834 (0.0015) [2025-01-04 08:05:05,836][134294] Updated weights for policy 0, policy_version 143844 (0.0013) [2025-01-04 08:05:07,755][134294] Updated weights for policy 0, policy_version 143854 (0.0014) [2025-01-04 08:05:08,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15018.7, 300 sec: 14454.0). Total num frames: 589250560. Throughput: 0: 3692.0. Samples: 136476620. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:08,968][134211] Avg episode reward: [(0, '8.955')] [2025-01-04 08:05:09,686][134294] Updated weights for policy 0, policy_version 143864 (0.0014) [2025-01-04 08:05:11,820][134294] Updated weights for policy 0, policy_version 143874 (0.0016) [2025-01-04 08:05:13,969][134211] Fps is (10 sec: 19248.2, 60 sec: 14950.1, 300 sec: 14509.5). Total num frames: 589332480. Throughput: 0: 3889.1. Samples: 136504416. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:13,970][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 08:05:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143880_589332480.pth... [2025-01-04 08:05:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143033_585863168.pth [2025-01-04 08:05:15,293][134294] Updated weights for policy 0, policy_version 143884 (0.0024) [2025-01-04 08:05:18,480][134294] Updated weights for policy 0, policy_version 143894 (0.0029) [2025-01-04 08:05:18,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.2, 300 sec: 14495.7). Total num frames: 589393920. Throughput: 0: 3866.5. Samples: 136513084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:18,968][134211] Avg episode reward: [(0, '8.612')] [2025-01-04 08:05:21,668][134294] Updated weights for policy 0, policy_version 143904 (0.0025) [2025-01-04 08:05:23,968][134211] Fps is (10 sec: 12699.2, 60 sec: 14813.9, 300 sec: 14426.2). Total num frames: 589459456. Throughput: 0: 3739.2. Samples: 136532590. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:23,968][134211] Avg episode reward: [(0, '8.849')] [2025-01-04 08:05:24,809][134294] Updated weights for policy 0, policy_version 143914 (0.0027) [2025-01-04 08:05:28,383][134294] Updated weights for policy 0, policy_version 143924 (0.0024) [2025-01-04 08:05:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14746.1, 300 sec: 14384.6). Total num frames: 589516800. Throughput: 0: 3718.6. Samples: 136551044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:28,968][134211] Avg episode reward: [(0, '8.337')] [2025-01-04 08:05:31,592][134294] Updated weights for policy 0, policy_version 143934 (0.0027) [2025-01-04 08:05:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14472.6, 300 sec: 14412.4). Total num frames: 589582336. Throughput: 0: 3700.1. Samples: 136560252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:33,968][134211] Avg episode reward: [(0, '7.798')] [2025-01-04 08:05:34,931][134294] Updated weights for policy 0, policy_version 143944 (0.0027) [2025-01-04 08:05:37,823][134294] Updated weights for policy 0, policy_version 143954 (0.0022) [2025-01-04 08:05:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.4, 300 sec: 14287.4). Total num frames: 589647872. Throughput: 0: 3674.6. Samples: 136579710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:38,968][134211] Avg episode reward: [(0, '9.047')] [2025-01-04 08:05:40,896][134294] Updated weights for policy 0, policy_version 143964 (0.0022) [2025-01-04 08:05:43,817][134294] Updated weights for policy 0, policy_version 143974 (0.0024) [2025-01-04 08:05:43,969][134211] Fps is (10 sec: 13514.5, 60 sec: 14472.2, 300 sec: 14148.5). Total num frames: 589717504. Throughput: 0: 3703.4. Samples: 136600396. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:43,970][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 08:05:46,790][134294] Updated weights for policy 0, policy_version 143984 (0.0026) [2025-01-04 08:05:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14540.8, 300 sec: 14148.6). Total num frames: 589787136. Throughput: 0: 3681.0. Samples: 136610718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:48,969][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 08:05:49,801][134294] Updated weights for policy 0, policy_version 143994 (0.0025) [2025-01-04 08:05:52,888][134294] Updated weights for policy 0, policy_version 144004 (0.0027) [2025-01-04 08:05:53,968][134211] Fps is (10 sec: 13518.9, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 589852672. Throughput: 0: 3427.9. Samples: 136630876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:05:53,968][134211] Avg episode reward: [(0, '8.700')] [2025-01-04 08:05:56,111][134294] Updated weights for policy 0, policy_version 144014 (0.0025) [2025-01-04 08:05:58,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14404.2, 300 sec: 14190.2). Total num frames: 589914112. Throughput: 0: 3227.7. Samples: 136649660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:05:58,969][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 08:05:59,604][134294] Updated weights for policy 0, policy_version 144024 (0.0026) [2025-01-04 08:06:02,722][134294] Updated weights for policy 0, policy_version 144034 (0.0020) [2025-01-04 08:06:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.1, 300 sec: 14218.0). Total num frames: 589987840. Throughput: 0: 3230.3. Samples: 136658446. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:03,968][134211] Avg episode reward: [(0, '7.745')] [2025-01-04 08:06:04,696][134294] Updated weights for policy 0, policy_version 144044 (0.0013) [2025-01-04 08:06:06,601][134294] Updated weights for policy 0, policy_version 144054 (0.0015) [2025-01-04 08:06:08,472][134294] Updated weights for policy 0, policy_version 144064 (0.0013) [2025-01-04 08:06:08,968][134211] Fps is (10 sec: 18022.9, 60 sec: 14062.9, 300 sec: 14342.9). Total num frames: 590094336. Throughput: 0: 3446.9. Samples: 136687698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:08,968][134211] Avg episode reward: [(0, '7.611')] [2025-01-04 08:06:10,581][134294] Updated weights for policy 0, policy_version 144074 (0.0014) [2025-01-04 08:06:13,589][134294] Updated weights for policy 0, policy_version 144084 (0.0026) [2025-01-04 08:06:13,968][134211] Fps is (10 sec: 18022.2, 60 sec: 13926.7, 300 sec: 14384.6). Total num frames: 590168064. Throughput: 0: 3615.9. Samples: 136713758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:13,969][134211] Avg episode reward: [(0, '7.950')] [2025-01-04 08:06:16,675][134294] Updated weights for policy 0, policy_version 144094 (0.0026) [2025-01-04 08:06:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 590233600. Throughput: 0: 3632.6. Samples: 136723720. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:18,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 08:06:19,969][134294] Updated weights for policy 0, policy_version 144104 (0.0028) [2025-01-04 08:06:23,157][134294] Updated weights for policy 0, policy_version 144114 (0.0025) [2025-01-04 08:06:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 590299136. Throughput: 0: 3628.1. Samples: 136742976. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:23,968][134211] Avg episode reward: [(0, '8.380')] [2025-01-04 08:06:26,162][134294] Updated weights for policy 0, policy_version 144124 (0.0023) [2025-01-04 08:06:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14370.7). Total num frames: 590364672. Throughput: 0: 3597.5. Samples: 136762278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:28,968][134211] Avg episode reward: [(0, '8.415')] [2025-01-04 08:06:29,591][134294] Updated weights for policy 0, policy_version 144134 (0.0026) [2025-01-04 08:06:32,883][134294] Updated weights for policy 0, policy_version 144144 (0.0027) [2025-01-04 08:06:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14062.9, 300 sec: 14356.8). Total num frames: 590426112. Throughput: 0: 3572.5. Samples: 136771480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:33,968][134211] Avg episode reward: [(0, '8.148')] [2025-01-04 08:06:36,465][134294] Updated weights for policy 0, policy_version 144154 (0.0023) [2025-01-04 08:06:38,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13994.7, 300 sec: 14301.3). Total num frames: 590487552. Throughput: 0: 3514.1. Samples: 136789010. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:38,968][134211] Avg episode reward: [(0, '8.497')] [2025-01-04 08:06:39,163][134294] Updated weights for policy 0, policy_version 144164 (0.0018) [2025-01-04 08:06:41,811][134294] Updated weights for policy 0, policy_version 144174 (0.0021) [2025-01-04 08:06:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.3, 300 sec: 14231.9). Total num frames: 590561280. Throughput: 0: 3600.5. Samples: 136811682. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:43,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 08:06:45,034][134294] Updated weights for policy 0, policy_version 144184 (0.0025) [2025-01-04 08:06:47,942][134294] Updated weights for policy 0, policy_version 144194 (0.0024) [2025-01-04 08:06:48,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 590635008. Throughput: 0: 3621.8. Samples: 136821428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:48,968][134211] Avg episode reward: [(0, '8.605')] [2025-01-04 08:06:50,036][134294] Updated weights for policy 0, policy_version 144204 (0.0013) [2025-01-04 08:06:52,786][134294] Updated weights for policy 0, policy_version 144214 (0.0022) [2025-01-04 08:06:53,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 590712832. Throughput: 0: 3522.9. Samples: 136846228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:53,968][134211] Avg episode reward: [(0, '8.776')] [2025-01-04 08:06:55,908][134294] Updated weights for policy 0, policy_version 144224 (0.0027) [2025-01-04 08:06:58,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14404.3, 300 sec: 14342.9). Total num frames: 590778368. Throughput: 0: 3383.4. Samples: 136866012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:06:58,968][134211] Avg episode reward: [(0, '8.203')] [2025-01-04 08:06:59,122][134294] Updated weights for policy 0, policy_version 144234 (0.0024) [2025-01-04 08:07:02,371][134294] Updated weights for policy 0, policy_version 144244 (0.0029) [2025-01-04 08:07:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14204.1). Total num frames: 590839808. Throughput: 0: 3366.5. Samples: 136875214. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:03,968][134211] Avg episode reward: [(0, '9.036')] [2025-01-04 08:07:05,499][134294] Updated weights for policy 0, policy_version 144254 (0.0026) [2025-01-04 08:07:07,488][134294] Updated weights for policy 0, policy_version 144264 (0.0012) [2025-01-04 08:07:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 590925824. Throughput: 0: 3449.6. Samples: 136898210. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:08,968][134211] Avg episode reward: [(0, '8.908')] [2025-01-04 08:07:10,224][134294] Updated weights for policy 0, policy_version 144274 (0.0025) [2025-01-04 08:07:13,250][134294] Updated weights for policy 0, policy_version 144284 (0.0024) [2025-01-04 08:07:13,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 590995456. Throughput: 0: 3501.0. Samples: 136919824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:13,968][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 08:07:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000144286_590995456.pth... [2025-01-04 08:07:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143446_587554816.pth [2025-01-04 08:07:16,407][134294] Updated weights for policy 0, policy_version 144294 (0.0026) [2025-01-04 08:07:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 591060992. Throughput: 0: 3509.1. Samples: 136929388. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:18,968][134211] Avg episode reward: [(0, '9.645')] [2025-01-04 08:07:19,461][134294] Updated weights for policy 0, policy_version 144304 (0.0028) [2025-01-04 08:07:22,399][134294] Updated weights for policy 0, policy_version 144314 (0.0025) [2025-01-04 08:07:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 591126528. Throughput: 0: 3574.8. Samples: 136949878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:23,968][134211] Avg episode reward: [(0, '8.888')] [2025-01-04 08:07:25,316][134294] Updated weights for policy 0, policy_version 144324 (0.0020) [2025-01-04 08:07:27,573][134294] Updated weights for policy 0, policy_version 144334 (0.0015) [2025-01-04 08:07:28,969][134211] Fps is (10 sec: 14744.0, 60 sec: 14062.7, 300 sec: 14273.5). Total num frames: 591208448. Throughput: 0: 3588.0. Samples: 136973144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:28,969][134211] Avg episode reward: [(0, '8.325')] [2025-01-04 08:07:31,021][134294] Updated weights for policy 0, policy_version 144344 (0.0028) [2025-01-04 08:07:33,595][134294] Updated weights for policy 0, policy_version 144354 (0.0019) [2025-01-04 08:07:33,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14199.5, 300 sec: 14315.2). Total num frames: 591278080. Throughput: 0: 3569.2. Samples: 136982042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:33,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 08:07:35,595][134294] Updated weights for policy 0, policy_version 144364 (0.0013) [2025-01-04 08:07:37,467][134294] Updated weights for policy 0, policy_version 144374 (0.0013) [2025-01-04 08:07:38,968][134211] Fps is (10 sec: 17204.9, 60 sec: 14882.1, 300 sec: 14440.1). Total num frames: 591380480. Throughput: 0: 3651.0. Samples: 137010524. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:38,968][134211] Avg episode reward: [(0, '8.728')] [2025-01-04 08:07:40,097][134294] Updated weights for policy 0, policy_version 144384 (0.0020) [2025-01-04 08:07:43,261][134294] Updated weights for policy 0, policy_version 144394 (0.0028) [2025-01-04 08:07:43,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 591446016. Throughput: 0: 3700.8. Samples: 137032546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:43,968][134211] Avg episode reward: [(0, '8.688')] [2025-01-04 08:07:46,280][134294] Updated weights for policy 0, policy_version 144404 (0.0025) [2025-01-04 08:07:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.0, 300 sec: 14301.3). Total num frames: 591511552. Throughput: 0: 3718.2. Samples: 137042534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:48,968][134211] Avg episode reward: [(0, '7.931')] [2025-01-04 08:07:49,470][134294] Updated weights for policy 0, policy_version 144414 (0.0027) [2025-01-04 08:07:52,584][134294] Updated weights for policy 0, policy_version 144424 (0.0025) [2025-01-04 08:07:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.2, 300 sec: 14315.2). Total num frames: 591577088. Throughput: 0: 3642.1. Samples: 137062104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:53,969][134211] Avg episode reward: [(0, '8.742')] [2025-01-04 08:07:55,592][134294] Updated weights for policy 0, policy_version 144434 (0.0029) [2025-01-04 08:07:58,879][134294] Updated weights for policy 0, policy_version 144444 (0.0026) [2025-01-04 08:07:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14315.2). Total num frames: 591642624. Throughput: 0: 3599.6. Samples: 137081806. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:07:58,968][134211] Avg episode reward: [(0, '8.882')] [2025-01-04 08:08:02,030][134294] Updated weights for policy 0, policy_version 144454 (0.0025) [2025-01-04 08:08:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14287.4). Total num frames: 591699968. Throughput: 0: 3593.0. Samples: 137091072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:03,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 08:08:05,367][134294] Updated weights for policy 0, policy_version 144464 (0.0024) [2025-01-04 08:08:08,348][134294] Updated weights for policy 0, policy_version 144474 (0.0025) [2025-01-04 08:08:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14301.3). Total num frames: 591773696. Throughput: 0: 3569.2. Samples: 137110492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:08,968][134211] Avg episode reward: [(0, '8.247')] [2025-01-04 08:08:11,392][134294] Updated weights for policy 0, policy_version 144484 (0.0026) [2025-01-04 08:08:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14062.9, 300 sec: 14287.4). Total num frames: 591839232. Throughput: 0: 3502.8. Samples: 137130766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:13,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 08:08:14,450][134294] Updated weights for policy 0, policy_version 144494 (0.0027) [2025-01-04 08:08:17,459][134294] Updated weights for policy 0, policy_version 144504 (0.0025) [2025-01-04 08:08:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14063.0, 300 sec: 14231.9). Total num frames: 591904768. Throughput: 0: 3532.2. Samples: 137140990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:18,968][134211] Avg episode reward: [(0, '8.654')] [2025-01-04 08:08:20,347][134294] Updated weights for policy 0, policy_version 144514 (0.0029) [2025-01-04 08:08:23,328][134294] Updated weights for policy 0, policy_version 144524 (0.0026) [2025-01-04 08:08:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14134.7). Total num frames: 591978496. Throughput: 0: 3364.1. Samples: 137161908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:23,968][134211] Avg episode reward: [(0, '9.056')] [2025-01-04 08:08:25,627][134294] Updated weights for policy 0, policy_version 144534 (0.0015) [2025-01-04 08:08:27,635][134294] Updated weights for policy 0, policy_version 144544 (0.0013) [2025-01-04 08:08:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14472.8, 300 sec: 14245.8). Total num frames: 592076800. Throughput: 0: 3478.6. Samples: 137189082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:28,968][134211] Avg episode reward: [(0, '8.733')] [2025-01-04 08:08:29,671][134294] Updated weights for policy 0, policy_version 144554 (0.0012) [2025-01-04 08:08:31,968][134294] Updated weights for policy 0, policy_version 144564 (0.0018) [2025-01-04 08:08:33,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14609.0, 300 sec: 14287.4). Total num frames: 592154624. Throughput: 0: 3586.4. Samples: 137203920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:33,968][134211] Avg episode reward: [(0, '8.320')] [2025-01-04 08:08:35,726][134294] Updated weights for policy 0, policy_version 144574 (0.0028) [2025-01-04 08:08:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13858.1, 300 sec: 14259.6). Total num frames: 592211968. Throughput: 0: 3537.4. Samples: 137221286. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:38,968][134211] Avg episode reward: [(0, '7.892')] [2025-01-04 08:08:39,312][134294] Updated weights for policy 0, policy_version 144584 (0.0027) [2025-01-04 08:08:42,691][134294] Updated weights for policy 0, policy_version 144594 (0.0024) [2025-01-04 08:08:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13789.9, 300 sec: 14245.7). Total num frames: 592273408. Throughput: 0: 3493.5. Samples: 137239014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:43,969][134211] Avg episode reward: [(0, '8.330')] [2025-01-04 08:08:45,711][134294] Updated weights for policy 0, policy_version 144604 (0.0026) [2025-01-04 08:08:48,722][134294] Updated weights for policy 0, policy_version 144614 (0.0025) [2025-01-04 08:08:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.9, 300 sec: 14231.9). Total num frames: 592338944. Throughput: 0: 3514.9. Samples: 137249244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:48,968][134211] Avg episode reward: [(0, '7.819')] [2025-01-04 08:08:51,701][134294] Updated weights for policy 0, policy_version 144624 (0.0024) [2025-01-04 08:08:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13858.1, 300 sec: 14245.7). Total num frames: 592408576. Throughput: 0: 3541.1. Samples: 137269842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:53,969][134211] Avg episode reward: [(0, '8.345')] [2025-01-04 08:08:54,710][134294] Updated weights for policy 0, policy_version 144634 (0.0026) [2025-01-04 08:08:57,837][134294] Updated weights for policy 0, policy_version 144644 (0.0026) [2025-01-04 08:08:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.2, 300 sec: 14259.6). Total num frames: 592474112. Throughput: 0: 3529.7. Samples: 137289602. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:08:58,968][134211] Avg episode reward: [(0, '8.957')] [2025-01-04 08:09:01,047][134294] Updated weights for policy 0, policy_version 144654 (0.0027) [2025-01-04 08:09:03,922][134294] Updated weights for policy 0, policy_version 144664 (0.0021) [2025-01-04 08:09:03,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14063.0, 300 sec: 14218.0). Total num frames: 592543744. Throughput: 0: 3515.0. Samples: 137299164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:09:03,968][134211] Avg episode reward: [(0, '9.156')] [2025-01-04 08:09:05,798][134294] Updated weights for policy 0, policy_version 144674 (0.0012) [2025-01-04 08:09:07,705][134294] Updated weights for policy 0, policy_version 144684 (0.0014) [2025-01-04 08:09:08,967][134211] Fps is (10 sec: 17613.0, 60 sec: 14609.1, 300 sec: 14287.4). Total num frames: 592650240. Throughput: 0: 3663.4. Samples: 137326760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:08,968][134211] Avg episode reward: [(0, '7.976')] [2025-01-04 08:09:09,595][134294] Updated weights for policy 0, policy_version 144694 (0.0013) [2025-01-04 08:09:11,469][134294] Updated weights for policy 0, policy_version 144704 (0.0013) [2025-01-04 08:09:13,348][134294] Updated weights for policy 0, policy_version 144714 (0.0014) [2025-01-04 08:09:13,968][134211] Fps is (10 sec: 21708.9, 60 sec: 15360.0, 300 sec: 14440.1). Total num frames: 592760832. Throughput: 0: 3786.0. Samples: 137359452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:13,968][134211] Avg episode reward: [(0, '8.449')] [2025-01-04 08:09:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000144717_592760832.pth... 
[2025-01-04 08:09:14,026][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000143880_589332480.pth [2025-01-04 08:09:15,930][134294] Updated weights for policy 0, policy_version 144724 (0.0020) [2025-01-04 08:09:18,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15360.0, 300 sec: 14426.3). Total num frames: 592826368. Throughput: 0: 3721.6. Samples: 137371394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:18,968][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 08:09:19,126][134294] Updated weights for policy 0, policy_version 144734 (0.0027) [2025-01-04 08:09:22,316][134294] Updated weights for policy 0, policy_version 144744 (0.0028) [2025-01-04 08:09:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 15155.2, 300 sec: 14426.4). Total num frames: 592887808. Throughput: 0: 3758.9. Samples: 137390438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:23,968][134211] Avg episode reward: [(0, '8.558')] [2025-01-04 08:09:25,503][134294] Updated weights for policy 0, policy_version 144754 (0.0025) [2025-01-04 08:09:28,650][134294] Updated weights for policy 0, policy_version 144764 (0.0027) [2025-01-04 08:09:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.3, 300 sec: 14384.6). Total num frames: 592957440. Throughput: 0: 3804.5. Samples: 137410216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:28,968][134211] Avg episode reward: [(0, '8.847')] [2025-01-04 08:09:31,935][134294] Updated weights for policy 0, policy_version 144774 (0.0026) [2025-01-04 08:09:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14336.0, 300 sec: 14343.0). Total num frames: 593014784. Throughput: 0: 3781.8. Samples: 137419426. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:33,968][134211] Avg episode reward: [(0, '8.830')] [2025-01-04 08:09:35,249][134294] Updated weights for policy 0, policy_version 144784 (0.0025) [2025-01-04 08:09:38,187][134294] Updated weights for policy 0, policy_version 144794 (0.0028) [2025-01-04 08:09:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 593084416. Throughput: 0: 3748.8. Samples: 137438538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:38,968][134211] Avg episode reward: [(0, '8.368')] [2025-01-04 08:09:41,279][134294] Updated weights for policy 0, policy_version 144804 (0.0026) [2025-01-04 08:09:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14356.8). Total num frames: 593149952. Throughput: 0: 3757.1. Samples: 137458670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:43,968][134211] Avg episode reward: [(0, '7.990')] [2025-01-04 08:09:44,399][134294] Updated weights for policy 0, policy_version 144814 (0.0023) [2025-01-04 08:09:47,676][134294] Updated weights for policy 0, policy_version 144824 (0.0026) [2025-01-04 08:09:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 14329.1). Total num frames: 593215488. Throughput: 0: 3761.1. Samples: 137468412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:48,968][134211] Avg episode reward: [(0, '8.206')] [2025-01-04 08:09:50,835][134294] Updated weights for policy 0, policy_version 144834 (0.0027) [2025-01-04 08:09:53,607][134294] Updated weights for policy 0, policy_version 144844 (0.0023) [2025-01-04 08:09:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14356.8). Total num frames: 593285120. Throughput: 0: 3590.3. Samples: 137488322. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:53,968][134211] Avg episode reward: [(0, '7.899')] [2025-01-04 08:09:56,564][134294] Updated weights for policy 0, policy_version 144854 (0.0026) [2025-01-04 08:09:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14273.5). Total num frames: 593350656. Throughput: 0: 3313.0. Samples: 137508538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:09:58,968][134211] Avg episode reward: [(0, '7.870')] [2025-01-04 08:10:00,002][134294] Updated weights for policy 0, policy_version 144864 (0.0025) [2025-01-04 08:10:03,499][134294] Updated weights for policy 0, policy_version 144874 (0.0024) [2025-01-04 08:10:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.2, 300 sec: 14093.0). Total num frames: 593408000. Throughput: 0: 3241.0. Samples: 137517240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:10:03,968][134211] Avg episode reward: [(0, '9.058')] [2025-01-04 08:10:06,662][134294] Updated weights for policy 0, policy_version 144884 (0.0026) [2025-01-04 08:10:08,782][134294] Updated weights for policy 0, policy_version 144894 (0.0014) [2025-01-04 08:10:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14093.1). Total num frames: 593489920. Throughput: 0: 3239.5. Samples: 137536214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:08,968][134211] Avg episode reward: [(0, '8.082')] [2025-01-04 08:10:10,710][134294] Updated weights for policy 0, policy_version 144904 (0.0013) [2025-01-04 08:10:13,388][134294] Updated weights for policy 0, policy_version 144914 (0.0021) [2025-01-04 08:10:13,968][134211] Fps is (10 sec: 16384.0, 60 sec: 13516.7, 300 sec: 14162.4). Total num frames: 593571840. Throughput: 0: 3428.6. Samples: 137564502. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:13,968][134211] Avg episode reward: [(0, '8.045')] [2025-01-04 08:10:16,447][134294] Updated weights for policy 0, policy_version 144924 (0.0028) [2025-01-04 08:10:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13585.1, 300 sec: 14176.3). Total num frames: 593641472. Throughput: 0: 3442.5. Samples: 137574338. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:18,968][134211] Avg episode reward: [(0, '8.021')] [2025-01-04 08:10:19,657][134294] Updated weights for policy 0, policy_version 144934 (0.0026) [2025-01-04 08:10:22,661][134294] Updated weights for policy 0, policy_version 144944 (0.0022) [2025-01-04 08:10:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13653.3, 300 sec: 14204.1). Total num frames: 593707008. Throughput: 0: 3460.9. Samples: 137594278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:23,968][134211] Avg episode reward: [(0, '7.476')] [2025-01-04 08:10:25,741][134294] Updated weights for policy 0, policy_version 144954 (0.0024) [2025-01-04 08:10:28,805][134294] Updated weights for policy 0, policy_version 144964 (0.0021) [2025-01-04 08:10:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 14204.1). Total num frames: 593772544. Throughput: 0: 3457.6. Samples: 137614264. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:28,969][134211] Avg episode reward: [(0, '7.471')] [2025-01-04 08:10:32,020][134294] Updated weights for policy 0, policy_version 144974 (0.0023) [2025-01-04 08:10:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13653.3, 300 sec: 14190.2). Total num frames: 593833984. Throughput: 0: 3452.3. Samples: 137623766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:33,968][134211] Avg episode reward: [(0, '8.429')] [2025-01-04 08:10:35,591][134294] Updated weights for policy 0, policy_version 144984 (0.0024) [2025-01-04 08:10:37,724][134294] Updated weights for policy 0, policy_version 144994 (0.0013) [2025-01-04 08:10:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13858.1, 300 sec: 14231.9). Total num frames: 593915904. Throughput: 0: 3471.5. Samples: 137644538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:38,968][134211] Avg episode reward: [(0, '7.886')] [2025-01-04 08:10:39,780][134294] Updated weights for policy 0, policy_version 145004 (0.0014) [2025-01-04 08:10:41,633][134294] Updated weights for policy 0, policy_version 145014 (0.0014) [2025-01-04 08:10:43,571][134294] Updated weights for policy 0, policy_version 145024 (0.0012) [2025-01-04 08:10:43,968][134211] Fps is (10 sec: 19251.6, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 594026496. Throughput: 0: 3718.8. Samples: 137675882. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:43,968][134211] Avg episode reward: [(0, '8.475')] [2025-01-04 08:10:45,373][134294] Updated weights for policy 0, policy_version 145034 (0.0014) [2025-01-04 08:10:47,302][134294] Updated weights for policy 0, policy_version 145044 (0.0013) [2025-01-04 08:10:48,968][134211] Fps is (10 sec: 21708.5, 60 sec: 15291.7, 300 sec: 14509.6). Total num frames: 594132992. Throughput: 0: 3887.7. Samples: 137692188. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:48,968][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 08:10:49,567][134294] Updated weights for policy 0, policy_version 145054 (0.0020) [2025-01-04 08:10:52,705][134294] Updated weights for policy 0, policy_version 145064 (0.0027) [2025-01-04 08:10:53,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15223.4, 300 sec: 14523.4). Total num frames: 594198528. Throughput: 0: 4023.2. Samples: 137717260. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:53,969][134211] Avg episode reward: [(0, '9.135')] [2025-01-04 08:10:55,820][134294] Updated weights for policy 0, policy_version 145074 (0.0026) [2025-01-04 08:10:58,948][134294] Updated weights for policy 0, policy_version 145084 (0.0028) [2025-01-04 08:10:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15223.4, 300 sec: 14495.7). Total num frames: 594264064. Throughput: 0: 3826.9. Samples: 137736712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:10:58,968][134211] Avg episode reward: [(0, '8.339')] [2025-01-04 08:11:02,321][134294] Updated weights for policy 0, policy_version 145094 (0.0026) [2025-01-04 08:11:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 15223.5, 300 sec: 14329.1). Total num frames: 594321408. Throughput: 0: 3813.0. Samples: 137745922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:11:03,968][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 08:11:05,732][134294] Updated weights for policy 0, policy_version 145104 (0.0028) [2025-01-04 08:11:08,682][134294] Updated weights for policy 0, policy_version 145114 (0.0025) [2025-01-04 08:11:08,969][134211] Fps is (10 sec: 12285.9, 60 sec: 14949.9, 300 sec: 14301.2). Total num frames: 594386944. Throughput: 0: 3794.2. Samples: 137765026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:11:08,970][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 08:11:11,738][134294] Updated weights for policy 0, policy_version 145124 (0.0026) [2025-01-04 08:11:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14677.3, 300 sec: 14301.3). Total num frames: 594452480. Throughput: 0: 3786.7. Samples: 137784668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:13,969][134211] Avg episode reward: [(0, '8.868')] [2025-01-04 08:11:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145131_594456576.pth... 
[2025-01-04 08:11:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000144286_590995456.pth [2025-01-04 08:11:14,967][134294] Updated weights for policy 0, policy_version 145134 (0.0025) [2025-01-04 08:11:17,997][134294] Updated weights for policy 0, policy_version 145144 (0.0024) [2025-01-04 08:11:18,968][134211] Fps is (10 sec: 13519.1, 60 sec: 14677.3, 300 sec: 14315.2). Total num frames: 594522112. Throughput: 0: 3790.7. Samples: 137794346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:18,968][134211] Avg episode reward: [(0, '8.406')] [2025-01-04 08:11:20,972][134294] Updated weights for policy 0, policy_version 145154 (0.0024) [2025-01-04 08:11:23,943][134294] Updated weights for policy 0, policy_version 145164 (0.0022) [2025-01-04 08:11:23,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14745.6, 300 sec: 14329.1). Total num frames: 594591744. Throughput: 0: 3790.0. Samples: 137815088. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:23,968][134211] Avg episode reward: [(0, '8.527')] [2025-01-04 08:11:26,955][134294] Updated weights for policy 0, policy_version 145174 (0.0024) [2025-01-04 08:11:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14342.9). Total num frames: 594657280. Throughput: 0: 3547.2. Samples: 137835506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:28,968][134211] Avg episode reward: [(0, '10.346')] [2025-01-04 08:11:28,969][134264] Saving new best policy, reward=10.346! [2025-01-04 08:11:30,154][134294] Updated weights for policy 0, policy_version 145184 (0.0025) [2025-01-04 08:11:33,223][134294] Updated weights for policy 0, policy_version 145194 (0.0025) [2025-01-04 08:11:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14813.9, 300 sec: 14356.8). Total num frames: 594722816. Throughput: 0: 3399.9. Samples: 137845184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:33,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 08:11:36,421][134294] Updated weights for policy 0, policy_version 145204 (0.0022) [2025-01-04 08:11:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14329.1). Total num frames: 594788352. Throughput: 0: 3276.0. Samples: 137864678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:38,968][134211] Avg episode reward: [(0, '8.747')] [2025-01-04 08:11:39,516][134294] Updated weights for policy 0, policy_version 145214 (0.0026) [2025-01-04 08:11:42,475][134294] Updated weights for policy 0, policy_version 145224 (0.0024) [2025-01-04 08:11:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13789.8, 300 sec: 14301.3). Total num frames: 594853888. Throughput: 0: 3290.3. Samples: 137884776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:43,968][134211] Avg episode reward: [(0, '9.455')] [2025-01-04 08:11:45,501][134294] Updated weights for policy 0, policy_version 145234 (0.0026) [2025-01-04 08:11:48,309][134294] Updated weights for policy 0, policy_version 145244 (0.0026) [2025-01-04 08:11:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13243.7, 300 sec: 14287.4). Total num frames: 594927616. Throughput: 0: 3321.3. Samples: 137895382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:48,968][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 08:11:51,378][134294] Updated weights for policy 0, policy_version 145254 (0.0022) [2025-01-04 08:11:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.5, 300 sec: 14273.5). Total num frames: 594989056. Throughput: 0: 3350.9. Samples: 137915812. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:53,968][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 08:11:54,563][134294] Updated weights for policy 0, policy_version 145264 (0.0028) [2025-01-04 08:11:56,733][134294] Updated weights for policy 0, policy_version 145274 (0.0015) [2025-01-04 08:11:58,666][134294] Updated weights for policy 0, policy_version 145284 (0.0014) [2025-01-04 08:11:58,967][134211] Fps is (10 sec: 15974.8, 60 sec: 13721.7, 300 sec: 14398.5). Total num frames: 595087360. Throughput: 0: 3481.6. Samples: 137941340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:11:58,968][134211] Avg episode reward: [(0, '8.725')] [2025-01-04 08:12:00,782][134294] Updated weights for policy 0, policy_version 145294 (0.0014) [2025-01-04 08:12:02,840][134294] Updated weights for policy 0, policy_version 145304 (0.0014) [2025-01-04 08:12:03,968][134211] Fps is (10 sec: 18841.0, 60 sec: 14267.6, 300 sec: 14412.3). Total num frames: 595177472. Throughput: 0: 3601.1. Samples: 137956396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:12:03,969][134211] Avg episode reward: [(0, '8.378')] [2025-01-04 08:12:06,360][134294] Updated weights for policy 0, policy_version 145314 (0.0027) [2025-01-04 08:12:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14131.6, 300 sec: 14370.7). Total num frames: 595234816. Throughput: 0: 3603.6. Samples: 137977252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:12:08,968][134211] Avg episode reward: [(0, '8.442')] [2025-01-04 08:12:09,722][134294] Updated weights for policy 0, policy_version 145324 (0.0029) [2025-01-04 08:12:13,055][134294] Updated weights for policy 0, policy_version 145334 (0.0028) [2025-01-04 08:12:13,968][134211] Fps is (10 sec: 11879.0, 60 sec: 14063.0, 300 sec: 14356.8). Total num frames: 595296256. Throughput: 0: 3564.1. Samples: 137995888. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:12:13,968][134211] Avg episode reward: [(0, '8.596')] [2025-01-04 08:12:15,934][134294] Updated weights for policy 0, policy_version 145344 (0.0025) [2025-01-04 08:12:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14062.9, 300 sec: 14370.7). Total num frames: 595365888. Throughput: 0: 3576.2. Samples: 138006112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:18,968][134211] Avg episode reward: [(0, '8.652')] [2025-01-04 08:12:19,015][134294] Updated weights for policy 0, policy_version 145354 (0.0025) [2025-01-04 08:12:21,933][134294] Updated weights for policy 0, policy_version 145364 (0.0026) [2025-01-04 08:12:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14062.9, 300 sec: 14329.1). Total num frames: 595435520. Throughput: 0: 3598.8. Samples: 138026624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:23,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 08:12:25,275][134294] Updated weights for policy 0, policy_version 145374 (0.0026) [2025-01-04 08:12:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13858.2, 300 sec: 14273.5). Total num frames: 595488768. Throughput: 0: 3535.4. Samples: 138043870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:28,968][134211] Avg episode reward: [(0, '8.620')] [2025-01-04 08:12:29,053][134294] Updated weights for policy 0, policy_version 145384 (0.0025) [2025-01-04 08:12:31,085][134294] Updated weights for policy 0, policy_version 145394 (0.0013) [2025-01-04 08:12:33,094][134294] Updated weights for policy 0, policy_version 145404 (0.0014) [2025-01-04 08:12:33,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14472.6, 300 sec: 14273.5). Total num frames: 595591168. Throughput: 0: 3599.6. Samples: 138057364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:33,968][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 08:12:35,403][134294] Updated weights for policy 0, policy_version 145414 (0.0015) [2025-01-04 08:12:38,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14404.3, 300 sec: 14259.6). Total num frames: 595652608. Throughput: 0: 3702.6. Samples: 138082430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:38,969][134211] Avg episode reward: [(0, '8.142')] [2025-01-04 08:12:38,995][134294] Updated weights for policy 0, policy_version 145424 (0.0027) [2025-01-04 08:12:42,422][134294] Updated weights for policy 0, policy_version 145434 (0.0028) [2025-01-04 08:12:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14336.0, 300 sec: 14245.8). Total num frames: 595714048. Throughput: 0: 3521.9. Samples: 138099828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:43,968][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 08:12:45,593][134294] Updated weights for policy 0, policy_version 145444 (0.0026) [2025-01-04 08:12:48,619][134294] Updated weights for policy 0, policy_version 145454 (0.0025) [2025-01-04 08:12:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.7, 300 sec: 14259.6). Total num frames: 595783680. Throughput: 0: 3407.8. Samples: 138109744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:48,968][134211] Avg episode reward: [(0, '8.967')] [2025-01-04 08:12:51,623][134294] Updated weights for policy 0, policy_version 145464 (0.0025) [2025-01-04 08:12:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14336.0, 300 sec: 14259.6). Total num frames: 595849216. Throughput: 0: 3397.4. Samples: 138130134. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:53,969][134211] Avg episode reward: [(0, '8.237')] [2025-01-04 08:12:54,745][134294] Updated weights for policy 0, policy_version 145474 (0.0025) [2025-01-04 08:12:57,749][134294] Updated weights for policy 0, policy_version 145484 (0.0024) [2025-01-04 08:12:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13789.8, 300 sec: 14287.4). Total num frames: 595914752. Throughput: 0: 3427.3. Samples: 138150116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:12:58,969][134211] Avg episode reward: [(0, '8.295')] [2025-01-04 08:13:00,961][134294] Updated weights for policy 0, policy_version 145494 (0.0025) [2025-01-04 08:13:03,967][134211] Fps is (10 sec: 13107.6, 60 sec: 13380.4, 300 sec: 14259.6). Total num frames: 595980288. Throughput: 0: 3413.7. Samples: 138159728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:13:03,968][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 08:13:04,002][134294] Updated weights for policy 0, policy_version 145504 (0.0023) [2025-01-04 08:13:06,033][134294] Updated weights for policy 0, policy_version 145514 (0.0015) [2025-01-04 08:13:07,902][134294] Updated weights for policy 0, policy_version 145524 (0.0012) [2025-01-04 08:13:08,967][134211] Fps is (10 sec: 17203.8, 60 sec: 14199.5, 300 sec: 14398.5). Total num frames: 596086784. Throughput: 0: 3539.7. Samples: 138185910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:13:08,968][134211] Avg episode reward: [(0, '8.103')] [2025-01-04 08:13:09,773][134294] Updated weights for policy 0, policy_version 145534 (0.0015) [2025-01-04 08:13:11,823][134294] Updated weights for policy 0, policy_version 145544 (0.0017) [2025-01-04 08:13:13,974][134211] Fps is (10 sec: 19238.8, 60 sec: 14607.5, 300 sec: 14467.6). Total num frames: 596172800. Throughput: 0: 3794.3. Samples: 138214638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:13:13,975][134211] Avg episode reward: [(0, '8.248')] [2025-01-04 08:13:13,987][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145550_596172800.pth... [2025-01-04 08:13:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000144717_592760832.pth [2025-01-04 08:13:15,265][134294] Updated weights for policy 0, policy_version 145554 (0.0031) [2025-01-04 08:13:18,597][134294] Updated weights for policy 0, policy_version 145564 (0.0028) [2025-01-04 08:13:18,968][134211] Fps is (10 sec: 14744.7, 60 sec: 14472.5, 300 sec: 14426.2). Total num frames: 596234240. Throughput: 0: 3683.8. Samples: 138223136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:18,969][134211] Avg episode reward: [(0, '8.592')] [2025-01-04 08:13:21,705][134294] Updated weights for policy 0, policy_version 145574 (0.0025) [2025-01-04 08:13:23,968][134211] Fps is (10 sec: 12295.6, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 596295680. Throughput: 0: 3553.5. Samples: 138242338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:23,968][134211] Avg episode reward: [(0, '8.311')] [2025-01-04 08:13:24,970][134294] Updated weights for policy 0, policy_version 145584 (0.0027) [2025-01-04 08:13:27,977][134294] Updated weights for policy 0, policy_version 145594 (0.0028) [2025-01-04 08:13:28,969][134211] Fps is (10 sec: 12696.7, 60 sec: 14540.5, 300 sec: 14259.6). Total num frames: 596361216. Throughput: 0: 3600.0. Samples: 138261834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:28,969][134211] Avg episode reward: [(0, '8.518')] [2025-01-04 08:13:31,621][134294] Updated weights for policy 0, policy_version 145604 (0.0024) [2025-01-04 08:13:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13789.8, 300 sec: 14259.6). Total num frames: 596418560. 
Throughput: 0: 3571.0. Samples: 138270438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:33,969][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 08:13:35,268][134294] Updated weights for policy 0, policy_version 145614 (0.0029) [2025-01-04 08:13:38,620][134294] Updated weights for policy 0, policy_version 145624 (0.0025) [2025-01-04 08:13:38,968][134211] Fps is (10 sec: 11879.7, 60 sec: 13789.9, 300 sec: 14259.6). Total num frames: 596480000. Throughput: 0: 3504.6. Samples: 138287840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:38,968][134211] Avg episode reward: [(0, '7.615')] [2025-01-04 08:13:41,847][134294] Updated weights for policy 0, policy_version 145634 (0.0028) [2025-01-04 08:13:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13789.9, 300 sec: 14245.7). Total num frames: 596541440. Throughput: 0: 3471.3. Samples: 138306326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:43,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 08:13:45,259][134294] Updated weights for policy 0, policy_version 145644 (0.0024) [2025-01-04 08:13:48,514][134294] Updated weights for policy 0, policy_version 145654 (0.0026) [2025-01-04 08:13:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13721.6, 300 sec: 14231.9). Total num frames: 596606976. Throughput: 0: 3464.0. Samples: 138315608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:48,968][134211] Avg episode reward: [(0, '9.139')] [2025-01-04 08:13:50,668][134294] Updated weights for policy 0, policy_version 145664 (0.0012) [2025-01-04 08:13:53,497][134294] Updated weights for policy 0, policy_version 145674 (0.0025) [2025-01-04 08:13:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13926.4, 300 sec: 14273.5). Total num frames: 596684800. Throughput: 0: 3398.4. Samples: 138338840. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:53,968][134211] Avg episode reward: [(0, '9.664')] [2025-01-04 08:13:56,467][134294] Updated weights for policy 0, policy_version 145684 (0.0028) [2025-01-04 08:13:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13858.2, 300 sec: 14245.7). Total num frames: 596746240. Throughput: 0: 3197.4. Samples: 138358502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:13:58,968][134211] Avg episode reward: [(0, '9.575')] [2025-01-04 08:13:59,976][134294] Updated weights for policy 0, policy_version 145694 (0.0025) [2025-01-04 08:14:02,825][134294] Updated weights for policy 0, policy_version 145704 (0.0020) [2025-01-04 08:14:03,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 596824064. Throughput: 0: 3196.9. Samples: 138366996. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:14:03,968][134211] Avg episode reward: [(0, '10.501')] [2025-01-04 08:14:03,972][134264] Saving new best policy, reward=10.501! [2025-01-04 08:14:04,979][134294] Updated weights for policy 0, policy_version 145714 (0.0014) [2025-01-04 08:14:06,912][134294] Updated weights for policy 0, policy_version 145724 (0.0013) [2025-01-04 08:14:08,816][134294] Updated weights for policy 0, policy_version 145734 (0.0013) [2025-01-04 08:14:08,968][134211] Fps is (10 sec: 18022.6, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 596926464. Throughput: 0: 3413.7. Samples: 138395952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:14:08,968][134211] Avg episode reward: [(0, '9.100')] [2025-01-04 08:14:10,779][134294] Updated weights for policy 0, policy_version 145744 (0.0013) [2025-01-04 08:14:12,769][134294] Updated weights for policy 0, policy_version 145754 (0.0015) [2025-01-04 08:14:13,968][134211] Fps is (10 sec: 20070.0, 60 sec: 14201.0, 300 sec: 14231.9). Total num frames: 597024768. Throughput: 0: 3671.5. Samples: 138427046. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:14:13,968][134211] Avg episode reward: [(0, '7.969')] [2025-01-04 08:14:15,730][134294] Updated weights for policy 0, policy_version 145764 (0.0027) [2025-01-04 08:14:18,827][134294] Updated weights for policy 0, policy_version 145774 (0.0027) [2025-01-04 08:14:18,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14267.8, 300 sec: 14245.7). Total num frames: 597090304. Throughput: 0: 3706.1. Samples: 138437210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:14:18,968][134211] Avg episode reward: [(0, '9.915')] [2025-01-04 08:14:22,020][134294] Updated weights for policy 0, policy_version 145784 (0.0025) [2025-01-04 08:14:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 597155840. Throughput: 0: 3751.1. Samples: 138456640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:23,969][134211] Avg episode reward: [(0, '8.499')] [2025-01-04 08:14:25,084][134294] Updated weights for policy 0, policy_version 145794 (0.0024) [2025-01-04 08:14:28,284][134294] Updated weights for policy 0, policy_version 145804 (0.0026) [2025-01-04 08:14:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.3, 300 sec: 14259.6). Total num frames: 597221376. Throughput: 0: 3773.4. Samples: 138476130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:28,968][134211] Avg episode reward: [(0, '8.545')] [2025-01-04 08:14:31,635][134294] Updated weights for policy 0, policy_version 145814 (0.0025) [2025-01-04 08:14:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 597278720. Throughput: 0: 3776.3. Samples: 138485544. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:33,968][134211] Avg episode reward: [(0, '8.345')] [2025-01-04 08:14:35,217][134294] Updated weights for policy 0, policy_version 145824 (0.0025) [2025-01-04 08:14:38,731][134294] Updated weights for policy 0, policy_version 145834 (0.0026) [2025-01-04 08:14:38,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14267.7, 300 sec: 14190.2). Total num frames: 597336064. Throughput: 0: 3637.3. Samples: 138502518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:38,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 08:14:42,095][134294] Updated weights for policy 0, policy_version 145844 (0.0024) [2025-01-04 08:14:43,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14267.7, 300 sec: 14176.3). Total num frames: 597397504. Throughput: 0: 3602.9. Samples: 138520632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:43,968][134211] Avg episode reward: [(0, '9.518')] [2025-01-04 08:14:45,323][134294] Updated weights for policy 0, policy_version 145854 (0.0025) [2025-01-04 08:14:48,358][134294] Updated weights for policy 0, policy_version 145864 (0.0025) [2025-01-04 08:14:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.7, 300 sec: 14162.4). Total num frames: 597463040. Throughput: 0: 3638.7. Samples: 138530740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:48,968][134211] Avg episode reward: [(0, '8.747')] [2025-01-04 08:14:51,362][134294] Updated weights for policy 0, policy_version 145874 (0.0024) [2025-01-04 08:14:53,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14131.0, 300 sec: 14176.3). Total num frames: 597532672. Throughput: 0: 3443.4. Samples: 138550908. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:53,969][134211] Avg episode reward: [(0, '8.892')] [2025-01-04 08:14:54,380][134294] Updated weights for policy 0, policy_version 145884 (0.0023) [2025-01-04 08:14:57,489][134294] Updated weights for policy 0, policy_version 145894 (0.0027) [2025-01-04 08:14:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.4, 300 sec: 14204.1). Total num frames: 597598208. Throughput: 0: 3197.3. Samples: 138570924. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:14:58,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 08:15:00,799][134294] Updated weights for policy 0, policy_version 145904 (0.0026) [2025-01-04 08:15:03,968][134211] Fps is (10 sec: 12698.8, 60 sec: 13926.3, 300 sec: 14134.7). Total num frames: 597659648. Throughput: 0: 3175.2. Samples: 138580096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:03,968][134211] Avg episode reward: [(0, '8.560')] [2025-01-04 08:15:04,225][134294] Updated weights for policy 0, policy_version 145914 (0.0023) [2025-01-04 08:15:06,559][134294] Updated weights for policy 0, policy_version 145924 (0.0015) [2025-01-04 08:15:08,463][134294] Updated weights for policy 0, policy_version 145934 (0.0013) [2025-01-04 08:15:08,967][134211] Fps is (10 sec: 15565.1, 60 sec: 13789.9, 300 sec: 14176.3). Total num frames: 597753856. Throughput: 0: 3244.5. Samples: 138602642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:08,968][134211] Avg episode reward: [(0, '8.167')] [2025-01-04 08:15:10,378][134294] Updated weights for policy 0, policy_version 145944 (0.0014) [2025-01-04 08:15:12,537][134294] Updated weights for policy 0, policy_version 145954 (0.0018) [2025-01-04 08:15:13,968][134211] Fps is (10 sec: 18432.2, 60 sec: 13653.3, 300 sec: 14245.8). Total num frames: 597843968. Throughput: 0: 3471.2. Samples: 138632334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:13,968][134211] Avg episode reward: [(0, '8.382')] [2025-01-04 08:15:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145958_597843968.pth... [2025-01-04 08:15:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145131_594456576.pth [2025-01-04 08:15:15,988][134294] Updated weights for policy 0, policy_version 145964 (0.0028) [2025-01-04 08:15:18,968][134211] Fps is (10 sec: 14745.3, 60 sec: 13516.8, 300 sec: 14218.0). Total num frames: 597901312. Throughput: 0: 3452.7. Samples: 138640914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:18,968][134211] Avg episode reward: [(0, '7.730')] [2025-01-04 08:15:19,239][134294] Updated weights for policy 0, policy_version 145974 (0.0026) [2025-01-04 08:15:22,547][134294] Updated weights for policy 0, policy_version 145984 (0.0028) [2025-01-04 08:15:23,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13516.8, 300 sec: 14218.0). Total num frames: 597966848. Throughput: 0: 3493.9. Samples: 138659742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:23,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 08:15:25,505][134294] Updated weights for policy 0, policy_version 145994 (0.0024) [2025-01-04 08:15:28,550][134294] Updated weights for policy 0, policy_version 146004 (0.0026) [2025-01-04 08:15:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.0, 300 sec: 14245.7). Total num frames: 598036480. Throughput: 0: 3543.6. Samples: 138680094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:28,968][134211] Avg episode reward: [(0, '10.063')] [2025-01-04 08:15:31,845][134294] Updated weights for policy 0, policy_version 146014 (0.0028) [2025-01-04 08:15:33,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13653.3, 300 sec: 14176.3). Total num frames: 598097920. 
Throughput: 0: 3526.6. Samples: 138689438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:33,969][134211] Avg episode reward: [(0, '7.709')] [2025-01-04 08:15:35,311][134294] Updated weights for policy 0, policy_version 146024 (0.0023) [2025-01-04 08:15:38,440][134294] Updated weights for policy 0, policy_version 146034 (0.0022) [2025-01-04 08:15:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 14009.7). Total num frames: 598159360. Throughput: 0: 3490.4. Samples: 138707974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:38,968][134211] Avg episode reward: [(0, '8.875')] [2025-01-04 08:15:41,614][134294] Updated weights for policy 0, policy_version 146044 (0.0022) [2025-01-04 08:15:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.8, 300 sec: 13870.9). Total num frames: 598224896. Throughput: 0: 3474.1. Samples: 138727258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:43,968][134211] Avg episode reward: [(0, '8.534')] [2025-01-04 08:15:44,708][134294] Updated weights for policy 0, policy_version 146054 (0.0025) [2025-01-04 08:15:47,909][134294] Updated weights for policy 0, policy_version 146064 (0.0025) [2025-01-04 08:15:48,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13789.9, 300 sec: 13870.9). Total num frames: 598290432. Throughput: 0: 3489.9. Samples: 138737142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:48,968][134211] Avg episode reward: [(0, '8.629')] [2025-01-04 08:15:50,147][134294] Updated weights for policy 0, policy_version 146074 (0.0017) [2025-01-04 08:15:52,070][134294] Updated weights for policy 0, policy_version 146084 (0.0014) [2025-01-04 08:15:53,963][134294] Updated weights for policy 0, policy_version 146094 (0.0012) [2025-01-04 08:15:53,968][134211] Fps is (10 sec: 17613.0, 60 sec: 14472.8, 300 sec: 14023.6). Total num frames: 598401024. Throughput: 0: 3587.5. Samples: 138764078. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:53,968][134211] Avg episode reward: [(0, '8.133')] [2025-01-04 08:15:55,803][134294] Updated weights for policy 0, policy_version 146104 (0.0014) [2025-01-04 08:15:58,101][134294] Updated weights for policy 0, policy_version 146114 (0.0021) [2025-01-04 08:15:58,968][134211] Fps is (10 sec: 20070.1, 60 sec: 14882.1, 300 sec: 14134.7). Total num frames: 598491136. Throughput: 0: 3594.5. Samples: 138794086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:15:58,968][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 08:16:01,559][134294] Updated weights for policy 0, policy_version 146124 (0.0028) [2025-01-04 08:16:03,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14813.8, 300 sec: 14107.0). Total num frames: 598548480. Throughput: 0: 3604.2. Samples: 138803104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:16:03,969][134211] Avg episode reward: [(0, '9.348')] [2025-01-04 08:16:05,264][134294] Updated weights for policy 0, policy_version 146134 (0.0026) [2025-01-04 08:16:08,551][134294] Updated weights for policy 0, policy_version 146144 (0.0024) [2025-01-04 08:16:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 598609920. Throughput: 0: 3569.0. Samples: 138820346. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:16:08,968][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 08:16:11,871][134294] Updated weights for policy 0, policy_version 146154 (0.0029) [2025-01-04 08:16:13,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.8, 300 sec: 14065.2). Total num frames: 598671360. Throughput: 0: 3529.5. Samples: 138838920. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:16:13,968][134211] Avg episode reward: [(0, '8.772')] [2025-01-04 08:16:15,106][134294] Updated weights for policy 0, policy_version 146164 (0.0027) [2025-01-04 08:16:18,090][134294] Updated weights for policy 0, policy_version 146174 (0.0029) [2025-01-04 08:16:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13926.4, 300 sec: 14051.4). Total num frames: 598736896. Throughput: 0: 3544.1. Samples: 138848924. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:16:18,968][134211] Avg episode reward: [(0, '8.257')] [2025-01-04 08:16:21,200][134294] Updated weights for policy 0, policy_version 146184 (0.0025) [2025-01-04 08:16:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14051.4). Total num frames: 598802432. Throughput: 0: 3577.3. Samples: 138868952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:16:23,968][134211] Avg episode reward: [(0, '8.281')] [2025-01-04 08:16:24,447][134294] Updated weights for policy 0, policy_version 146194 (0.0024) [2025-01-04 08:16:27,467][134294] Updated weights for policy 0, policy_version 146204 (0.0025) [2025-01-04 08:16:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 14051.4). Total num frames: 598867968. Throughput: 0: 3576.8. Samples: 138888212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:28,968][134211] Avg episode reward: [(0, '9.052')] [2025-01-04 08:16:30,740][134294] Updated weights for policy 0, policy_version 146214 (0.0025) [2025-01-04 08:16:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 14037.5). Total num frames: 598929408. Throughput: 0: 3573.7. Samples: 138897958. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:33,968][134211] Avg episode reward: [(0, '8.438')] [2025-01-04 08:16:34,074][134294] Updated weights for policy 0, policy_version 146224 (0.0026) [2025-01-04 08:16:36,894][134294] Updated weights for policy 0, policy_version 146234 (0.0018) [2025-01-04 08:16:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14199.5, 300 sec: 14093.0). Total num frames: 599011328. Throughput: 0: 3422.1. Samples: 138918074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:38,968][134211] Avg episode reward: [(0, '8.869')] [2025-01-04 08:16:38,983][134294] Updated weights for policy 0, policy_version 146244 (0.0014) [2025-01-04 08:16:40,993][134294] Updated weights for policy 0, policy_version 146254 (0.0014) [2025-01-04 08:16:42,883][134294] Updated weights for policy 0, policy_version 146264 (0.0013) [2025-01-04 08:16:43,968][134211] Fps is (10 sec: 18841.9, 60 sec: 14882.2, 300 sec: 14204.1). Total num frames: 599117824. Throughput: 0: 3446.4. Samples: 138949172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:43,968][134211] Avg episode reward: [(0, '9.571')] [2025-01-04 08:16:44,791][134294] Updated weights for policy 0, policy_version 146274 (0.0014) [2025-01-04 08:16:47,525][134294] Updated weights for policy 0, policy_version 146284 (0.0025) [2025-01-04 08:16:48,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15086.9, 300 sec: 14259.6). Total num frames: 599195648. Throughput: 0: 3566.9. Samples: 138963614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:48,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 08:16:50,746][134294] Updated weights for policy 0, policy_version 146294 (0.0027) [2025-01-04 08:16:53,840][134294] Updated weights for policy 0, policy_version 146304 (0.0024) [2025-01-04 08:16:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14336.0, 300 sec: 14148.5). Total num frames: 599261184. Throughput: 0: 3617.6. Samples: 138983140. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:53,968][134211] Avg episode reward: [(0, '8.458')] [2025-01-04 08:16:56,890][134294] Updated weights for policy 0, policy_version 146314 (0.0030) [2025-01-04 08:16:58,969][134211] Fps is (10 sec: 13105.2, 60 sec: 13926.0, 300 sec: 14065.2). Total num frames: 599326720. Throughput: 0: 3636.4. Samples: 139002564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:16:58,970][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 08:17:00,390][134294] Updated weights for policy 0, policy_version 146324 (0.0023) [2025-01-04 08:17:03,783][134294] Updated weights for policy 0, policy_version 146334 (0.0027) [2025-01-04 08:17:03,968][134211] Fps is (10 sec: 12287.5, 60 sec: 13926.3, 300 sec: 14065.2). Total num frames: 599384064. Throughput: 0: 3608.0. Samples: 139011284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:03,969][134211] Avg episode reward: [(0, '7.952')] [2025-01-04 08:17:06,935][134294] Updated weights for policy 0, policy_version 146344 (0.0022) [2025-01-04 08:17:08,968][134211] Fps is (10 sec: 12289.9, 60 sec: 13994.6, 300 sec: 14079.1). Total num frames: 599449600. Throughput: 0: 3582.6. Samples: 139030168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:08,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 08:17:10,143][134294] Updated weights for policy 0, policy_version 146354 (0.0026) [2025-01-04 08:17:13,221][134294] Updated weights for policy 0, policy_version 146364 (0.0025) [2025-01-04 08:17:13,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14062.9, 300 sec: 14065.2). Total num frames: 599515136. Throughput: 0: 3591.5. Samples: 139049828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:13,968][134211] Avg episode reward: [(0, '8.095')] [2025-01-04 08:17:13,985][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000146366_599515136.pth... 
[2025-01-04 08:17:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145550_596172800.pth [2025-01-04 08:17:16,286][134294] Updated weights for policy 0, policy_version 146374 (0.0024) [2025-01-04 08:17:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14131.2, 300 sec: 14065.3). Total num frames: 599584768. Throughput: 0: 3591.0. Samples: 139059552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:18,968][134211] Avg episode reward: [(0, '8.225')] [2025-01-04 08:17:19,263][134294] Updated weights for policy 0, policy_version 146384 (0.0023) [2025-01-04 08:17:22,179][134294] Updated weights for policy 0, policy_version 146394 (0.0027) [2025-01-04 08:17:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 599650304. Throughput: 0: 3610.2. Samples: 139080534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:23,968][134211] Avg episode reward: [(0, '8.236')] [2025-01-04 08:17:25,178][134294] Updated weights for policy 0, policy_version 146404 (0.0025) [2025-01-04 08:17:28,245][134294] Updated weights for policy 0, policy_version 146414 (0.0025) [2025-01-04 08:17:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 13995.8). Total num frames: 599719936. Throughput: 0: 3372.1. Samples: 139100918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:28,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 08:17:31,541][134294] Updated weights for policy 0, policy_version 146424 (0.0024) [2025-01-04 08:17:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.2, 300 sec: 13981.9). Total num frames: 599777280. Throughput: 0: 3262.7. Samples: 139110434. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:33,969][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 08:17:35,035][134294] Updated weights for policy 0, policy_version 146434 (0.0025) [2025-01-04 08:17:37,966][134294] Updated weights for policy 0, policy_version 146444 (0.0020) [2025-01-04 08:17:38,967][134211] Fps is (10 sec: 13517.0, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 599855104. Throughput: 0: 3223.0. Samples: 139128176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:38,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 08:17:39,963][134294] Updated weights for policy 0, policy_version 146454 (0.0013) [2025-01-04 08:17:41,894][134294] Updated weights for policy 0, policy_version 146464 (0.0013) [2025-01-04 08:17:43,758][134294] Updated weights for policy 0, policy_version 146474 (0.0012) [2025-01-04 08:17:43,968][134211] Fps is (10 sec: 18432.5, 60 sec: 14063.0, 300 sec: 14162.5). Total num frames: 599961600. Throughput: 0: 3488.4. Samples: 139159536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:43,968][134211] Avg episode reward: [(0, '8.540')] [2025-01-04 08:17:46,366][134294] Updated weights for policy 0, policy_version 146484 (0.0022) [2025-01-04 08:17:48,968][134211] Fps is (10 sec: 17612.5, 60 sec: 13926.4, 300 sec: 14176.3). Total num frames: 600031232. Throughput: 0: 3571.8. Samples: 139172012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:48,968][134211] Avg episode reward: [(0, '9.229')] [2025-01-04 08:17:49,560][134294] Updated weights for policy 0, policy_version 146494 (0.0027) [2025-01-04 08:17:52,736][134294] Updated weights for policy 0, policy_version 146504 (0.0025) [2025-01-04 08:17:53,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13858.1, 300 sec: 14162.4). Total num frames: 600092672. Throughput: 0: 3583.3. Samples: 139191418. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:53,969][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 08:17:55,855][134294] Updated weights for policy 0, policy_version 146514 (0.0024) [2025-01-04 08:17:58,896][134294] Updated weights for policy 0, policy_version 146524 (0.0025) [2025-01-04 08:17:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.8, 300 sec: 14176.3). Total num frames: 600162304. Throughput: 0: 3597.7. Samples: 139211726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:17:58,968][134211] Avg episode reward: [(0, '9.317')] [2025-01-04 08:18:02,084][134294] Updated weights for policy 0, policy_version 146534 (0.0029) [2025-01-04 08:18:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.8, 300 sec: 14023.6). Total num frames: 600223744. Throughput: 0: 3589.0. Samples: 139221060. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:03,969][134211] Avg episode reward: [(0, '9.019')] [2025-01-04 08:18:05,543][134294] Updated weights for policy 0, policy_version 146544 (0.0027) [2025-01-04 08:18:08,598][134294] Updated weights for policy 0, policy_version 146554 (0.0027) [2025-01-04 08:18:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13994.7, 300 sec: 13954.5). Total num frames: 600289280. Throughput: 0: 3541.4. Samples: 139239898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:08,968][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 08:18:11,509][134294] Updated weights for policy 0, policy_version 146564 (0.0024) [2025-01-04 08:18:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.7, 300 sec: 13968.1). Total num frames: 600354816. Throughput: 0: 3538.9. Samples: 139260170. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:13,968][134211] Avg episode reward: [(0, '9.789')] [2025-01-04 08:18:14,617][134294] Updated weights for policy 0, policy_version 146574 (0.0027) [2025-01-04 08:18:17,656][134294] Updated weights for policy 0, policy_version 146584 (0.0025) [2025-01-04 08:18:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 13995.8). Total num frames: 600424448. Throughput: 0: 3550.6. Samples: 139270212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:18,968][134211] Avg episode reward: [(0, '8.617')] [2025-01-04 08:18:20,548][134294] Updated weights for policy 0, policy_version 146594 (0.0024) [2025-01-04 08:18:23,487][134294] Updated weights for policy 0, policy_version 146604 (0.0026) [2025-01-04 08:18:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14062.9, 300 sec: 14009.8). Total num frames: 600494080. Throughput: 0: 3621.0. Samples: 139291122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:23,968][134211] Avg episode reward: [(0, '7.908')] [2025-01-04 08:18:26,477][134294] Updated weights for policy 0, policy_version 146614 (0.0025) [2025-01-04 08:18:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.6, 300 sec: 14037.5). Total num frames: 600559616. Throughput: 0: 3369.5. Samples: 139311162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:28,970][134211] Avg episode reward: [(0, '8.830')] [2025-01-04 08:18:29,850][134294] Updated weights for policy 0, policy_version 146624 (0.0029) [2025-01-04 08:18:32,024][134294] Updated weights for policy 0, policy_version 146634 (0.0014) [2025-01-04 08:18:33,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14540.9, 300 sec: 14134.7). Total num frames: 600649728. Throughput: 0: 3337.3. Samples: 139322190. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:18:33,968][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 08:18:34,095][134294] Updated weights for policy 0, policy_version 146644 (0.0012) [2025-01-04 08:18:36,255][134294] Updated weights for policy 0, policy_version 146654 (0.0015) [2025-01-04 08:18:38,279][134294] Updated weights for policy 0, policy_version 146664 (0.0012) [2025-01-04 08:18:38,968][134211] Fps is (10 sec: 18841.7, 60 sec: 14882.1, 300 sec: 14259.6). Total num frames: 600748032. Throughput: 0: 3559.3. Samples: 139351586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:18:38,968][134211] Avg episode reward: [(0, '9.244')] [2025-01-04 08:18:40,258][134294] Updated weights for policy 0, policy_version 146674 (0.0012) [2025-01-04 08:18:42,917][134294] Updated weights for policy 0, policy_version 146684 (0.0023) [2025-01-04 08:18:43,968][134211] Fps is (10 sec: 18022.0, 60 sec: 14472.5, 300 sec: 14315.2). Total num frames: 600829952. Throughput: 0: 3700.3. Samples: 139378238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:18:43,969][134211] Avg episode reward: [(0, '8.325')] [2025-01-04 08:18:46,171][134294] Updated weights for policy 0, policy_version 146694 (0.0025) [2025-01-04 08:18:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.0, 300 sec: 14259.6). Total num frames: 600891392. Throughput: 0: 3708.4. Samples: 139387938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:18:48,968][134211] Avg episode reward: [(0, '8.315')] [2025-01-04 08:18:49,496][134294] Updated weights for policy 0, policy_version 146704 (0.0027) [2025-01-04 08:18:52,487][134294] Updated weights for policy 0, policy_version 146714 (0.0027) [2025-01-04 08:18:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 14273.5). Total num frames: 600956928. Throughput: 0: 3718.0. Samples: 139407206. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:18:53,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 08:18:55,597][134294] Updated weights for policy 0, policy_version 146724 (0.0025) [2025-01-04 08:18:58,615][134294] Updated weights for policy 0, policy_version 146734 (0.0027) [2025-01-04 08:18:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 601022464. Throughput: 0: 3712.3. Samples: 139427222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:18:58,968][134211] Avg episode reward: [(0, '7.489')] [2025-01-04 08:19:02,145][134294] Updated weights for policy 0, policy_version 146744 (0.0027) [2025-01-04 08:19:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 601079808. Throughput: 0: 3687.0. Samples: 139436126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:03,969][134211] Avg episode reward: [(0, '8.765')] [2025-01-04 08:19:05,719][134294] Updated weights for policy 0, policy_version 146754 (0.0029) [2025-01-04 08:19:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14199.4, 300 sec: 13954.2). Total num frames: 601141248. Throughput: 0: 3613.4. Samples: 139453724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:08,968][134211] Avg episode reward: [(0, '8.523')] [2025-01-04 08:19:09,141][134294] Updated weights for policy 0, policy_version 146764 (0.0024) [2025-01-04 08:19:12,360][134294] Updated weights for policy 0, policy_version 146774 (0.0026) [2025-01-04 08:19:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14131.2, 300 sec: 13940.3). Total num frames: 601202688. Throughput: 0: 3582.2. Samples: 139472362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:13,968][134211] Avg episode reward: [(0, '8.347')] [2025-01-04 08:19:14,030][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000146779_601206784.pth... 
[2025-01-04 08:19:14,096][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000145958_597843968.pth [2025-01-04 08:19:15,473][134294] Updated weights for policy 0, policy_version 146784 (0.0026) [2025-01-04 08:19:18,460][134294] Updated weights for policy 0, policy_version 146794 (0.0025) [2025-01-04 08:19:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 13954.2). Total num frames: 601272320. Throughput: 0: 3560.3. Samples: 139482406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:18,969][134211] Avg episode reward: [(0, '7.906')] [2025-01-04 08:19:21,447][134294] Updated weights for policy 0, policy_version 146804 (0.0024) [2025-01-04 08:19:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14131.2, 300 sec: 13968.1). Total num frames: 601341952. Throughput: 0: 3367.9. Samples: 139503140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:23,968][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 08:19:24,437][134294] Updated weights for policy 0, policy_version 146814 (0.0023) [2025-01-04 08:19:27,457][134294] Updated weights for policy 0, policy_version 146824 (0.0025) [2025-01-04 08:19:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14131.2, 300 sec: 13995.8). Total num frames: 601407488. Throughput: 0: 3223.1. Samples: 139523276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:28,968][134211] Avg episode reward: [(0, '8.518')] [2025-01-04 08:19:30,631][134294] Updated weights for policy 0, policy_version 146834 (0.0023) [2025-01-04 08:19:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13653.3, 300 sec: 14009.7). Total num frames: 601468928. Throughput: 0: 3219.5. Samples: 139532816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:33,968][134211] Avg episode reward: [(0, '8.601')] [2025-01-04 08:19:33,990][134294] Updated weights for policy 0, policy_version 146844 (0.0024) [2025-01-04 08:19:36,866][134294] Updated weights for policy 0, policy_version 146854 (0.0019) [2025-01-04 08:19:38,817][134294] Updated weights for policy 0, policy_version 146864 (0.0013) [2025-01-04 08:19:38,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13448.6, 300 sec: 14093.0). Total num frames: 601554944. Throughput: 0: 3244.6. Samples: 139553214. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:38,968][134211] Avg episode reward: [(0, '8.488')] [2025-01-04 08:19:40,755][134294] Updated weights for policy 0, policy_version 146874 (0.0014) [2025-01-04 08:19:42,584][134294] Updated weights for policy 0, policy_version 146884 (0.0011) [2025-01-04 08:19:43,967][134211] Fps is (10 sec: 19661.5, 60 sec: 13926.5, 300 sec: 14245.8). Total num frames: 601665536. Throughput: 0: 3517.7. Samples: 139585516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:43,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 08:19:44,493][134294] Updated weights for policy 0, policy_version 146894 (0.0013) [2025-01-04 08:19:46,481][134294] Updated weights for policy 0, policy_version 146904 (0.0015) [2025-01-04 08:19:48,971][134211] Fps is (10 sec: 19654.4, 60 sec: 14335.3, 300 sec: 14301.2). Total num frames: 601751552. Throughput: 0: 3677.7. Samples: 139601632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:48,972][134211] Avg episode reward: [(0, '9.038')] [2025-01-04 08:19:49,564][134294] Updated weights for policy 0, policy_version 146914 (0.0024) [2025-01-04 08:19:52,825][134294] Updated weights for policy 0, policy_version 146924 (0.0026) [2025-01-04 08:19:53,968][134211] Fps is (10 sec: 14745.0, 60 sec: 14267.7, 300 sec: 14287.4). Total num frames: 601812992. Throughput: 0: 3727.6. Samples: 139621468. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:53,968][134211] Avg episode reward: [(0, '8.822')] [2025-01-04 08:19:55,934][134294] Updated weights for policy 0, policy_version 146934 (0.0025) [2025-01-04 08:19:58,897][134294] Updated weights for policy 0, policy_version 146944 (0.0026) [2025-01-04 08:19:58,968][134211] Fps is (10 sec: 13111.2, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 601882624. Throughput: 0: 3759.1. Samples: 139641520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:19:58,968][134211] Avg episode reward: [(0, '9.248')] [2025-01-04 08:20:02,309][134294] Updated weights for policy 0, policy_version 146954 (0.0026) [2025-01-04 08:20:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.0, 300 sec: 14190.2). Total num frames: 601939968. Throughput: 0: 3739.7. Samples: 139650694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:03,968][134211] Avg episode reward: [(0, '9.272')] [2025-01-04 08:20:05,702][134294] Updated weights for policy 0, policy_version 146964 (0.0026) [2025-01-04 08:20:08,718][134294] Updated weights for policy 0, policy_version 146974 (0.0026) [2025-01-04 08:20:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.3, 300 sec: 14106.9). Total num frames: 602005504. Throughput: 0: 3690.2. Samples: 139669200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:08,968][134211] Avg episode reward: [(0, '9.949')] [2025-01-04 08:20:11,804][134294] Updated weights for policy 0, policy_version 146984 (0.0026) [2025-01-04 08:20:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 14134.7). Total num frames: 602071040. Throughput: 0: 3688.2. Samples: 139689246. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:13,968][134211] Avg episode reward: [(0, '8.566')] [2025-01-04 08:20:14,961][134294] Updated weights for policy 0, policy_version 146994 (0.0026) [2025-01-04 08:20:17,930][134294] Updated weights for policy 0, policy_version 147004 (0.0024) [2025-01-04 08:20:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14148.5). Total num frames: 602140672. Throughput: 0: 3697.2. Samples: 139699192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:18,968][134211] Avg episode reward: [(0, '8.719')] [2025-01-04 08:20:20,951][134294] Updated weights for policy 0, policy_version 147014 (0.0026) [2025-01-04 08:20:23,802][134294] Updated weights for policy 0, policy_version 147024 (0.0025) [2025-01-04 08:20:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14148.6). Total num frames: 602210304. Throughput: 0: 3708.8. Samples: 139720110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:23,968][134211] Avg episode reward: [(0, '8.463')] [2025-01-04 08:20:26,719][134294] Updated weights for policy 0, policy_version 147034 (0.0026) [2025-01-04 08:20:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14176.3). Total num frames: 602279936. Throughput: 0: 3450.8. Samples: 139740802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:28,968][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 08:20:29,977][134294] Updated weights for policy 0, policy_version 147044 (0.0025) [2025-01-04 08:20:33,144][134294] Updated weights for policy 0, policy_version 147054 (0.0029) [2025-01-04 08:20:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14176.3). Total num frames: 602341376. Throughput: 0: 3296.0. Samples: 139749944. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:33,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 08:20:36,597][134294] Updated weights for policy 0, policy_version 147064 (0.0023) [2025-01-04 08:20:38,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14131.2, 300 sec: 14162.5). Total num frames: 602402816. Throughput: 0: 3265.3. Samples: 139768406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:20:38,968][134211] Avg episode reward: [(0, '7.435')] [2025-01-04 08:20:39,647][134294] Updated weights for policy 0, policy_version 147074 (0.0022) [2025-01-04 08:20:41,706][134294] Updated weights for policy 0, policy_version 147084 (0.0013) [2025-01-04 08:20:43,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13789.8, 300 sec: 14245.7). Total num frames: 602492928. Throughput: 0: 3381.3. Samples: 139793680. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:20:43,968][134211] Avg episode reward: [(0, '9.707')] [2025-01-04 08:20:44,234][134294] Updated weights for policy 0, policy_version 147094 (0.0023) [2025-01-04 08:20:47,333][134294] Updated weights for policy 0, policy_version 147104 (0.0029) [2025-01-04 08:20:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 13380.9, 300 sec: 14079.1). Total num frames: 602554368. Throughput: 0: 3401.7. Samples: 139803772. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:20:48,968][134211] Avg episode reward: [(0, '8.319')] [2025-01-04 08:20:50,683][134294] Updated weights for policy 0, policy_version 147114 (0.0024) [2025-01-04 08:20:53,581][134294] Updated weights for policy 0, policy_version 147124 (0.0023) [2025-01-04 08:20:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13516.8, 300 sec: 14009.7). Total num frames: 602624000. Throughput: 0: 3426.9. Samples: 139823410. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:20:53,968][134211] Avg episode reward: [(0, '7.936')] [2025-01-04 08:20:56,714][134294] Updated weights for policy 0, policy_version 147134 (0.0025) [2025-01-04 08:20:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.5, 300 sec: 14037.5). Total num frames: 602689536. Throughput: 0: 3414.5. Samples: 139842898. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:20:58,968][134211] Avg episode reward: [(0, '7.886')] [2025-01-04 08:21:00,108][134294] Updated weights for policy 0, policy_version 147144 (0.0026) [2025-01-04 08:21:03,021][134294] Updated weights for policy 0, policy_version 147154 (0.0020) [2025-01-04 08:21:03,967][134211] Fps is (10 sec: 13517.2, 60 sec: 13653.4, 300 sec: 14065.3). Total num frames: 602759168. Throughput: 0: 3388.7. Samples: 139851684. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:03,968][134211] Avg episode reward: [(0, '9.814')] [2025-01-04 08:21:05,103][134294] Updated weights for policy 0, policy_version 147164 (0.0013) [2025-01-04 08:21:07,193][134294] Updated weights for policy 0, policy_version 147174 (0.0015) [2025-01-04 08:21:08,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14062.9, 300 sec: 14162.4). Total num frames: 602849280. Throughput: 0: 3535.1. Samples: 139879188. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:08,968][134211] Avg episode reward: [(0, '8.084')] [2025-01-04 08:21:10,060][134294] Updated weights for policy 0, policy_version 147184 (0.0024) [2025-01-04 08:21:13,239][134294] Updated weights for policy 0, policy_version 147194 (0.0028) [2025-01-04 08:21:13,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14062.9, 300 sec: 14162.4). Total num frames: 602914816. Throughput: 0: 3535.0. Samples: 139899876. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:13,968][134211] Avg episode reward: [(0, '8.936')] [2025-01-04 08:21:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000147196_602914816.pth... [2025-01-04 08:21:14,045][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000146366_599515136.pth [2025-01-04 08:21:16,607][134294] Updated weights for policy 0, policy_version 147204 (0.0027) [2025-01-04 08:21:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14162.4). Total num frames: 602980352. Throughput: 0: 3531.7. Samples: 139908870. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:18,968][134211] Avg episode reward: [(0, '8.044')] [2025-01-04 08:21:19,569][134294] Updated weights for policy 0, policy_version 147214 (0.0024) [2025-01-04 08:21:22,669][134294] Updated weights for policy 0, policy_version 147224 (0.0026) [2025-01-04 08:21:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13858.2, 300 sec: 14148.6). Total num frames: 603041792. Throughput: 0: 3566.6. Samples: 139928904. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:23,968][134211] Avg episode reward: [(0, '8.384')] [2025-01-04 08:21:25,729][134294] Updated weights for policy 0, policy_version 147234 (0.0026) [2025-01-04 08:21:28,907][134294] Updated weights for policy 0, policy_version 147244 (0.0025) [2025-01-04 08:21:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 14176.3). Total num frames: 603111424. Throughput: 0: 3448.7. Samples: 139948870. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:28,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 08:21:32,358][134294] Updated weights for policy 0, policy_version 147254 (0.0026) [2025-01-04 08:21:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 14093.0). Total num frames: 603168768. 
Throughput: 0: 3420.3. Samples: 139957684. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:33,968][134211] Avg episode reward: [(0, '9.326')] [2025-01-04 08:21:35,623][134294] Updated weights for policy 0, policy_version 147264 (0.0025) [2025-01-04 08:21:37,706][134294] Updated weights for policy 0, policy_version 147274 (0.0013) [2025-01-04 08:21:38,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 603258880. Throughput: 0: 3454.3. Samples: 139978852. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:38,968][134211] Avg episode reward: [(0, '8.433')] [2025-01-04 08:21:39,643][134294] Updated weights for policy 0, policy_version 147284 (0.0013) [2025-01-04 08:21:42,401][134294] Updated weights for policy 0, policy_version 147294 (0.0024) [2025-01-04 08:21:43,968][134211] Fps is (10 sec: 16793.8, 60 sec: 14063.0, 300 sec: 14037.5). Total num frames: 603336704. Throughput: 0: 3597.8. Samples: 140004798. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-04 08:21:43,968][134211] Avg episode reward: [(0, '8.764')] [2025-01-04 08:21:45,481][134294] Updated weights for policy 0, policy_version 147304 (0.0028) [2025-01-04 08:21:48,506][134294] Updated weights for policy 0, policy_version 147314 (0.0026) [2025-01-04 08:21:48,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 603402240. Throughput: 0: 3624.2. Samples: 140014774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:21:48,968][134211] Avg episode reward: [(0, '7.989')] [2025-01-04 08:21:51,456][134294] Updated weights for policy 0, policy_version 147324 (0.0025) [2025-01-04 08:21:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 603467776. Throughput: 0: 3463.4. Samples: 140035040. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:21:53,968][134211] Avg episode reward: [(0, '9.184')] [2025-01-04 08:21:54,932][134294] Updated weights for policy 0, policy_version 147334 (0.0024) [2025-01-04 08:21:57,917][134294] Updated weights for policy 0, policy_version 147344 (0.0027) [2025-01-04 08:21:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 14065.3). Total num frames: 603533312. Throughput: 0: 3426.1. Samples: 140054052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:21:58,968][134211] Avg episode reward: [(0, '8.233')] [2025-01-04 08:22:01,301][134294] Updated weights for policy 0, policy_version 147354 (0.0029) [2025-01-04 08:22:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13858.1, 300 sec: 14037.5). Total num frames: 603590656. Throughput: 0: 3432.9. Samples: 140063350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:03,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 08:22:04,694][134294] Updated weights for policy 0, policy_version 147364 (0.0026) [2025-01-04 08:22:07,041][134294] Updated weights for policy 0, policy_version 147374 (0.0014) [2025-01-04 08:22:08,967][134211] Fps is (10 sec: 14746.0, 60 sec: 13858.2, 300 sec: 14120.8). Total num frames: 603680768. Throughput: 0: 3464.7. Samples: 140084816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:08,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 08:22:09,079][134294] Updated weights for policy 0, policy_version 147384 (0.0013) [2025-01-04 08:22:10,928][134294] Updated weights for policy 0, policy_version 147394 (0.0012) [2025-01-04 08:22:12,894][134294] Updated weights for policy 0, policy_version 147404 (0.0014) [2025-01-04 08:22:13,968][134211] Fps is (10 sec: 19251.0, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 603783168. Throughput: 0: 3713.8. Samples: 140115990. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:13,968][134211] Avg episode reward: [(0, '8.601')] [2025-01-04 08:22:15,809][134294] Updated weights for policy 0, policy_version 147414 (0.0027) [2025-01-04 08:22:18,965][134294] Updated weights for policy 0, policy_version 147424 (0.0024) [2025-01-04 08:22:18,968][134211] Fps is (10 sec: 16793.1, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 603848704. Throughput: 0: 3747.6. Samples: 140126326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:18,969][134211] Avg episode reward: [(0, '8.110')] [2025-01-04 08:22:22,052][134294] Updated weights for policy 0, policy_version 147434 (0.0024) [2025-01-04 08:22:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.5, 300 sec: 14204.1). Total num frames: 603910144. Throughput: 0: 3712.8. Samples: 140145928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:23,968][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 08:22:25,110][134294] Updated weights for policy 0, policy_version 147444 (0.0025) [2025-01-04 08:22:28,215][134294] Updated weights for policy 0, policy_version 147454 (0.0025) [2025-01-04 08:22:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14472.6, 300 sec: 14245.8). Total num frames: 603979776. Throughput: 0: 3581.9. Samples: 140165982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:28,968][134211] Avg episode reward: [(0, '8.902')] [2025-01-04 08:22:31,516][134294] Updated weights for policy 0, policy_version 147464 (0.0025) [2025-01-04 08:22:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14190.2). Total num frames: 604041216. Throughput: 0: 3567.8. Samples: 140175326. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:33,968][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 08:22:34,929][134294] Updated weights for policy 0, policy_version 147474 (0.0025) [2025-01-04 08:22:38,145][134294] Updated weights for policy 0, policy_version 147484 (0.0025) [2025-01-04 08:22:38,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 604102656. Throughput: 0: 3527.7. Samples: 140193786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:38,968][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 08:22:41,765][134294] Updated weights for policy 0, policy_version 147494 (0.0025) [2025-01-04 08:22:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.9, 300 sec: 14009.7). Total num frames: 604164096. Throughput: 0: 3492.0. Samples: 140211192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:22:43,968][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 08:22:44,489][134294] Updated weights for policy 0, policy_version 147504 (0.0018) [2025-01-04 08:22:46,462][134294] Updated weights for policy 0, policy_version 147514 (0.0014) [2025-01-04 08:22:48,289][134294] Updated weights for policy 0, policy_version 147524 (0.0012) [2025-01-04 08:22:48,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14472.6, 300 sec: 14162.5). Total num frames: 604270592. Throughput: 0: 3618.3. Samples: 140226174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:22:48,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 08:22:50,179][134294] Updated weights for policy 0, policy_version 147534 (0.0014) [2025-01-04 08:22:52,081][134294] Updated weights for policy 0, policy_version 147544 (0.0013) [2025-01-04 08:22:53,960][134294] Updated weights for policy 0, policy_version 147554 (0.0014) [2025-01-04 08:22:53,967][134211] Fps is (10 sec: 21708.9, 60 sec: 15223.5, 300 sec: 14301.3). Total num frames: 604381184. Throughput: 0: 3867.6. Samples: 140258856. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:22:53,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 08:22:56,573][134294] Updated weights for policy 0, policy_version 147564 (0.0022) [2025-01-04 08:22:58,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15291.7, 300 sec: 14329.1). Total num frames: 604450816. Throughput: 0: 3734.8. Samples: 140284058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:22:58,968][134211] Avg episode reward: [(0, '8.256')] [2025-01-04 08:22:59,796][134294] Updated weights for policy 0, policy_version 147574 (0.0029) [2025-01-04 08:23:03,305][134294] Updated weights for policy 0, policy_version 147584 (0.0030) [2025-01-04 08:23:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 15291.7, 300 sec: 14301.3). Total num frames: 604508160. Throughput: 0: 3703.7. Samples: 140292990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:03,968][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 08:23:06,668][134294] Updated weights for policy 0, policy_version 147594 (0.0031) [2025-01-04 08:23:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14882.1, 300 sec: 14301.3). Total num frames: 604573696. Throughput: 0: 3669.1. Samples: 140311038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:08,968][134211] Avg episode reward: [(0, '8.298')] [2025-01-04 08:23:09,782][134294] Updated weights for policy 0, policy_version 147604 (0.0025) [2025-01-04 08:23:12,846][134294] Updated weights for policy 0, policy_version 147614 (0.0025) [2025-01-04 08:23:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.7, 300 sec: 14287.4). Total num frames: 604639232. Throughput: 0: 3666.1. Samples: 140330956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:13,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 08:23:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000147617_604639232.pth... 
[2025-01-04 08:23:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000146779_601206784.pth [2025-01-04 08:23:15,965][134294] Updated weights for policy 0, policy_version 147624 (0.0025) [2025-01-04 08:23:18,938][134294] Updated weights for policy 0, policy_version 147634 (0.0024) [2025-01-04 08:23:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14287.4). Total num frames: 604708864. Throughput: 0: 3683.6. Samples: 140341088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:18,968][134211] Avg episode reward: [(0, '8.429')] [2025-01-04 08:23:21,927][134294] Updated weights for policy 0, policy_version 147644 (0.0025) [2025-01-04 08:23:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.2, 300 sec: 14287.4). Total num frames: 604774400. Throughput: 0: 3724.1. Samples: 140361372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:23,968][134211] Avg episode reward: [(0, '9.108')] [2025-01-04 08:23:25,012][134294] Updated weights for policy 0, policy_version 147654 (0.0022) [2025-01-04 08:23:28,005][134294] Updated weights for policy 0, policy_version 147664 (0.0025) [2025-01-04 08:23:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.2, 300 sec: 14218.0). Total num frames: 604844032. Throughput: 0: 3790.7. Samples: 140381776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:28,968][134211] Avg episode reward: [(0, '8.356')] [2025-01-04 08:23:31,327][134294] Updated weights for policy 0, policy_version 147674 (0.0028) [2025-01-04 08:23:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.0, 300 sec: 14079.1). Total num frames: 604901376. Throughput: 0: 3663.3. Samples: 140391024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:33,968][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 08:23:34,674][134294] Updated weights for policy 0, policy_version 147684 (0.0026) [2025-01-04 08:23:37,883][134294] Updated weights for policy 0, policy_version 147694 (0.0024) [2025-01-04 08:23:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.3, 300 sec: 14023.6). Total num frames: 604966912. Throughput: 0: 3349.5. Samples: 140409586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:38,969][134211] Avg episode reward: [(0, '9.405')] [2025-01-04 08:23:41,013][134294] Updated weights for policy 0, policy_version 147704 (0.0027) [2025-01-04 08:23:43,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14472.3, 300 sec: 14037.4). Total num frames: 605032448. Throughput: 0: 3226.8. Samples: 140429266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:43,969][134211] Avg episode reward: [(0, '8.070')] [2025-01-04 08:23:44,149][134294] Updated weights for policy 0, policy_version 147714 (0.0026) [2025-01-04 08:23:47,079][134294] Updated weights for policy 0, policy_version 147724 (0.0024) [2025-01-04 08:23:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14051.4). Total num frames: 605102080. Throughput: 0: 3255.3. Samples: 140439478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:23:48,968][134211] Avg episode reward: [(0, '9.044')] [2025-01-04 08:23:50,073][134294] Updated weights for policy 0, policy_version 147734 (0.0024) [2025-01-04 08:23:52,897][134294] Updated weights for policy 0, policy_version 147744 (0.0026) [2025-01-04 08:23:53,968][134211] Fps is (10 sec: 13927.5, 60 sec: 13175.4, 300 sec: 14065.2). Total num frames: 605171712. Throughput: 0: 3319.8. Samples: 140460428. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:23:53,969][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 08:23:55,866][134294] Updated weights for policy 0, policy_version 147754 (0.0024) [2025-01-04 08:23:58,730][134294] Updated weights for policy 0, policy_version 147764 (0.0024) [2025-01-04 08:23:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13175.5, 300 sec: 14106.9). Total num frames: 605241344. Throughput: 0: 3345.9. Samples: 140481520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:23:58,968][134211] Avg episode reward: [(0, '8.972')] [2025-01-04 08:24:01,998][134294] Updated weights for policy 0, policy_version 147774 (0.0028) [2025-01-04 08:24:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13380.3, 300 sec: 14134.7). Total num frames: 605310976. Throughput: 0: 3334.9. Samples: 140491160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:03,968][134211] Avg episode reward: [(0, '8.804')] [2025-01-04 08:24:04,417][134294] Updated weights for policy 0, policy_version 147784 (0.0015) [2025-01-04 08:24:06,531][134294] Updated weights for policy 0, policy_version 147794 (0.0012) [2025-01-04 08:24:08,411][134294] Updated weights for policy 0, policy_version 147804 (0.0012) [2025-01-04 08:24:08,968][134211] Fps is (10 sec: 17203.4, 60 sec: 13994.7, 300 sec: 14273.5). Total num frames: 605413376. Throughput: 0: 3479.1. Samples: 140517932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:08,968][134211] Avg episode reward: [(0, '9.837')] [2025-01-04 08:24:11,128][134294] Updated weights for policy 0, policy_version 147814 (0.0022) [2025-01-04 08:24:13,968][134211] Fps is (10 sec: 16793.2, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 605478912. Throughput: 0: 3552.2. Samples: 140541624. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:13,968][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 08:24:14,278][134294] Updated weights for policy 0, policy_version 147824 (0.0027) [2025-01-04 08:24:17,467][134294] Updated weights for policy 0, policy_version 147834 (0.0027) [2025-01-04 08:24:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 14245.7). Total num frames: 605544448. Throughput: 0: 3553.3. Samples: 140550924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:18,968][134211] Avg episode reward: [(0, '9.140')] [2025-01-04 08:24:20,544][134294] Updated weights for policy 0, policy_version 147844 (0.0026) [2025-01-04 08:24:23,509][134294] Updated weights for policy 0, policy_version 147854 (0.0025) [2025-01-04 08:24:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14259.6). Total num frames: 605614080. Throughput: 0: 3591.0. Samples: 140571180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:23,968][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 08:24:26,555][134294] Updated weights for policy 0, policy_version 147864 (0.0025) [2025-01-04 08:24:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14273.5). Total num frames: 605679616. Throughput: 0: 3601.5. Samples: 140591332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:28,968][134211] Avg episode reward: [(0, '9.378')] [2025-01-04 08:24:29,711][134294] Updated weights for policy 0, policy_version 147874 (0.0024) [2025-01-04 08:24:33,028][134294] Updated weights for policy 0, policy_version 147884 (0.0026) [2025-01-04 08:24:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.6, 300 sec: 14190.2). Total num frames: 605741056. Throughput: 0: 3580.8. Samples: 140600616. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:33,968][134211] Avg episode reward: [(0, '8.440')] [2025-01-04 08:24:36,179][134294] Updated weights for policy 0, policy_version 147894 (0.0023) [2025-01-04 08:24:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.7, 300 sec: 14037.5). Total num frames: 605806592. Throughput: 0: 3536.6. Samples: 140619576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:38,968][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 08:24:39,600][134294] Updated weights for policy 0, policy_version 147904 (0.0025) [2025-01-04 08:24:41,936][134294] Updated weights for policy 0, policy_version 147914 (0.0016) [2025-01-04 08:24:43,971][134211] Fps is (10 sec: 13513.0, 60 sec: 14062.5, 300 sec: 13981.9). Total num frames: 605876224. Throughput: 0: 3545.3. Samples: 140641070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:43,971][134211] Avg episode reward: [(0, '9.000')] [2025-01-04 08:24:45,295][134294] Updated weights for policy 0, policy_version 147924 (0.0026) [2025-01-04 08:24:48,518][134294] Updated weights for policy 0, policy_version 147934 (0.0028) [2025-01-04 08:24:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 13995.8). Total num frames: 605941760. Throughput: 0: 3528.7. Samples: 140649952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:48,968][134211] Avg episode reward: [(0, '8.505')] [2025-01-04 08:24:50,537][134294] Updated weights for policy 0, policy_version 147944 (0.0014) [2025-01-04 08:24:53,400][134294] Updated weights for policy 0, policy_version 147954 (0.0023) [2025-01-04 08:24:53,968][134211] Fps is (10 sec: 14749.7, 60 sec: 14199.5, 300 sec: 14037.5). Total num frames: 606023680. Throughput: 0: 3474.4. Samples: 140674280. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:24:53,968][134211] Avg episode reward: [(0, '8.376')] [2025-01-04 08:24:56,648][134294] Updated weights for policy 0, policy_version 147964 (0.0026) [2025-01-04 08:24:58,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14131.2, 300 sec: 14065.2). Total num frames: 606089216. Throughput: 0: 3371.0. Samples: 140693318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:24:58,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 08:24:59,812][134294] Updated weights for policy 0, policy_version 147974 (0.0028) [2025-01-04 08:25:03,144][134294] Updated weights for policy 0, policy_version 147984 (0.0022) [2025-01-04 08:25:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14131.2, 300 sec: 14079.1). Total num frames: 606158848. Throughput: 0: 3372.9. Samples: 140702702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:03,968][134211] Avg episode reward: [(0, '8.533')] [2025-01-04 08:25:05,260][134294] Updated weights for policy 0, policy_version 147994 (0.0014) [2025-01-04 08:25:07,242][134294] Updated weights for policy 0, policy_version 148004 (0.0013) [2025-01-04 08:25:08,967][134211] Fps is (10 sec: 17203.7, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 606261248. Throughput: 0: 3508.4. Samples: 140729056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:08,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 08:25:09,150][134294] Updated weights for policy 0, policy_version 148014 (0.0014) [2025-01-04 08:25:11,248][134294] Updated weights for policy 0, policy_version 148024 (0.0018) [2025-01-04 08:25:13,968][134211] Fps is (10 sec: 18021.7, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 606339072. Throughput: 0: 3663.6. Samples: 140756196. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:13,969][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 08:25:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148032_606339072.pth... [2025-01-04 08:25:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000147196_602914816.pth [2025-01-04 08:25:14,414][134294] Updated weights for policy 0, policy_version 148034 (0.0032) [2025-01-04 08:25:17,575][134294] Updated weights for policy 0, policy_version 148044 (0.0024) [2025-01-04 08:25:18,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 606404608. Throughput: 0: 3665.1. Samples: 140765546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:18,968][134211] Avg episode reward: [(0, '8.225')] [2025-01-04 08:25:20,805][134294] Updated weights for policy 0, policy_version 148054 (0.0025) [2025-01-04 08:25:23,956][134294] Updated weights for policy 0, policy_version 148064 (0.0025) [2025-01-04 08:25:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14267.7, 300 sec: 14204.1). Total num frames: 606470144. Throughput: 0: 3674.6. Samples: 140784934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:23,968][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 08:25:26,937][134294] Updated weights for policy 0, policy_version 148074 (0.0024) [2025-01-04 08:25:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14218.0). Total num frames: 606535680. Throughput: 0: 3642.1. Samples: 140804956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:28,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 08:25:30,064][134294] Updated weights for policy 0, policy_version 148084 (0.0023) [2025-01-04 08:25:33,403][134294] Updated weights for policy 0, policy_version 148094 (0.0029) [2025-01-04 08:25:33,970][134211] Fps is (10 sec: 12695.1, 60 sec: 14267.3, 300 sec: 14217.9). Total num frames: 606597120. Throughput: 0: 3655.8. Samples: 140814470. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:33,970][134211] Avg episode reward: [(0, '8.536')] [2025-01-04 08:25:36,687][134294] Updated weights for policy 0, policy_version 148104 (0.0026) [2025-01-04 08:25:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14267.7, 300 sec: 14134.7). Total num frames: 606662656. Throughput: 0: 3532.9. Samples: 140833260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:38,968][134211] Avg episode reward: [(0, '9.127')] [2025-01-04 08:25:39,762][134294] Updated weights for policy 0, policy_version 148114 (0.0026) [2025-01-04 08:25:42,864][134294] Updated weights for policy 0, policy_version 148124 (0.0025) [2025-01-04 08:25:43,968][134211] Fps is (10 sec: 13109.1, 60 sec: 14200.0, 300 sec: 14148.5). Total num frames: 606728192. Throughput: 0: 3552.3. Samples: 140853172. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:43,969][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 08:25:45,832][134294] Updated weights for policy 0, policy_version 148134 (0.0025) [2025-01-04 08:25:48,824][134294] Updated weights for policy 0, policy_version 148144 (0.0026) [2025-01-04 08:25:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14148.6). Total num frames: 606797824. Throughput: 0: 3578.3. Samples: 140863726. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:48,968][134211] Avg episode reward: [(0, '7.840')] [2025-01-04 08:25:51,838][134294] Updated weights for policy 0, policy_version 148154 (0.0027) [2025-01-04 08:25:53,968][134211] Fps is (10 sec: 13517.5, 60 sec: 13994.7, 300 sec: 14148.6). Total num frames: 606863360. Throughput: 0: 3439.9. Samples: 140883852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:53,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 08:25:55,092][134294] Updated weights for policy 0, policy_version 148164 (0.0024) [2025-01-04 08:25:58,393][134294] Updated weights for policy 0, policy_version 148174 (0.0024) [2025-01-04 08:25:58,969][134211] Fps is (10 sec: 12695.9, 60 sec: 13926.1, 300 sec: 14120.7). Total num frames: 606924800. Throughput: 0: 3254.1. Samples: 140902634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:25:58,970][134211] Avg episode reward: [(0, '8.179')] [2025-01-04 08:26:00,823][134294] Updated weights for policy 0, policy_version 148184 (0.0014) [2025-01-04 08:26:02,861][134294] Updated weights for policy 0, policy_version 148194 (0.0014) [2025-01-04 08:26:03,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14404.3, 300 sec: 14148.6). Total num frames: 607023104. Throughput: 0: 3333.1. Samples: 140915534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:03,968][134211] Avg episode reward: [(0, '9.724')] [2025-01-04 08:26:04,872][134294] Updated weights for policy 0, policy_version 148204 (0.0015) [2025-01-04 08:26:07,520][134294] Updated weights for policy 0, policy_version 148214 (0.0020) [2025-01-04 08:26:08,968][134211] Fps is (10 sec: 17615.2, 60 sec: 13994.6, 300 sec: 14190.2). Total num frames: 607100928. Throughput: 0: 3513.4. Samples: 140943038. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:08,968][134211] Avg episode reward: [(0, '9.026')] [2025-01-04 08:26:10,964][134294] Updated weights for policy 0, policy_version 148224 (0.0021) [2025-01-04 08:26:13,970][134211] Fps is (10 sec: 13923.2, 60 sec: 13721.2, 300 sec: 14176.2). Total num frames: 607162368. Throughput: 0: 3489.1. Samples: 140961972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:13,971][134211] Avg episode reward: [(0, '8.166')] [2025-01-04 08:26:14,168][134294] Updated weights for policy 0, policy_version 148234 (0.0027) [2025-01-04 08:26:17,221][134294] Updated weights for policy 0, policy_version 148244 (0.0024) [2025-01-04 08:26:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13721.6, 300 sec: 14190.2). Total num frames: 607227904. Throughput: 0: 3491.0. Samples: 140971556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:18,969][134211] Avg episode reward: [(0, '9.294')] [2025-01-04 08:26:20,205][134294] Updated weights for policy 0, policy_version 148254 (0.0025) [2025-01-04 08:26:23,167][134294] Updated weights for policy 0, policy_version 148264 (0.0027) [2025-01-04 08:26:23,968][134211] Fps is (10 sec: 13519.6, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 607297536. Throughput: 0: 3532.1. Samples: 140992204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:23,968][134211] Avg episode reward: [(0, '8.864')] [2025-01-04 08:26:26,230][134294] Updated weights for policy 0, policy_version 148274 (0.0024) [2025-01-04 08:26:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13789.9, 300 sec: 14218.0). Total num frames: 607363072. Throughput: 0: 3536.8. Samples: 141012328. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:28,968][134211] Avg episode reward: [(0, '8.037')] [2025-01-04 08:26:29,423][134294] Updated weights for policy 0, policy_version 148284 (0.0026) [2025-01-04 08:26:32,663][134294] Updated weights for policy 0, policy_version 148294 (0.0022) [2025-01-04 08:26:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13790.3, 300 sec: 14120.8). Total num frames: 607424512. Throughput: 0: 3509.2. Samples: 141021642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:33,968][134211] Avg episode reward: [(0, '8.111')] [2025-01-04 08:26:36,003][134294] Updated weights for policy 0, policy_version 148304 (0.0030) [2025-01-04 08:26:38,335][134294] Updated weights for policy 0, policy_version 148314 (0.0017) [2025-01-04 08:26:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 607502336. Throughput: 0: 3484.0. Samples: 141040632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:38,968][134211] Avg episode reward: [(0, '8.331')] [2025-01-04 08:26:41,334][134294] Updated weights for policy 0, policy_version 148324 (0.0023) [2025-01-04 08:26:43,969][134211] Fps is (10 sec: 13925.4, 60 sec: 13926.3, 300 sec: 14106.9). Total num frames: 607563776. Throughput: 0: 3541.3. Samples: 141061990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:43,970][134211] Avg episode reward: [(0, '9.022')] [2025-01-04 08:26:45,023][134294] Updated weights for policy 0, policy_version 148334 (0.0029) [2025-01-04 08:26:47,519][134294] Updated weights for policy 0, policy_version 148344 (0.0015) [2025-01-04 08:26:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14063.0, 300 sec: 14148.6). Total num frames: 607641600. Throughput: 0: 3447.8. Samples: 141070684. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:48,968][134211] Avg episode reward: [(0, '9.090')] [2025-01-04 08:26:49,587][134294] Updated weights for policy 0, policy_version 148354 (0.0015) [2025-01-04 08:26:51,577][134294] Updated weights for policy 0, policy_version 148364 (0.0013) [2025-01-04 08:26:53,551][134294] Updated weights for policy 0, policy_version 148374 (0.0013) [2025-01-04 08:26:53,968][134211] Fps is (10 sec: 18024.1, 60 sec: 14677.4, 300 sec: 14273.5). Total num frames: 607744000. Throughput: 0: 3510.6. Samples: 141101016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:53,968][134211] Avg episode reward: [(0, '8.744')] [2025-01-04 08:26:55,792][134294] Updated weights for policy 0, policy_version 148384 (0.0018) [2025-01-04 08:26:58,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14882.4, 300 sec: 14329.1). Total num frames: 607817728. Throughput: 0: 3640.2. Samples: 141125772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:26:58,969][134211] Avg episode reward: [(0, '8.019')] [2025-01-04 08:26:59,218][134294] Updated weights for policy 0, policy_version 148394 (0.0027) [2025-01-04 08:27:02,961][134294] Updated weights for policy 0, policy_version 148404 (0.0027) [2025-01-04 08:27:03,970][134211] Fps is (10 sec: 12694.7, 60 sec: 14130.7, 300 sec: 14204.0). Total num frames: 607870976. Throughput: 0: 3609.6. Samples: 141133996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:03,971][134211] Avg episode reward: [(0, '8.983')] [2025-01-04 08:27:06,482][134294] Updated weights for policy 0, policy_version 148414 (0.0028) [2025-01-04 08:27:08,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13858.1, 300 sec: 14065.3). Total num frames: 607932416. Throughput: 0: 3526.6. Samples: 141150902. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:08,968][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 08:27:09,904][134294] Updated weights for policy 0, policy_version 148424 (0.0026) [2025-01-04 08:27:12,876][134294] Updated weights for policy 0, policy_version 148434 (0.0025) [2025-01-04 08:27:13,969][134211] Fps is (10 sec: 12699.0, 60 sec: 13926.6, 300 sec: 14065.2). Total num frames: 607997952. Throughput: 0: 3517.4. Samples: 141170614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:13,969][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 08:27:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148437_607997952.pth... [2025-01-04 08:27:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000147617_604639232.pth [2025-01-04 08:27:16,043][134294] Updated weights for policy 0, policy_version 148444 (0.0023) [2025-01-04 08:27:18,957][134294] Updated weights for policy 0, policy_version 148454 (0.0024) [2025-01-04 08:27:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 608067584. Throughput: 0: 3528.7. Samples: 141180432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:18,968][134211] Avg episode reward: [(0, '8.745')] [2025-01-04 08:27:21,990][134294] Updated weights for policy 0, policy_version 148464 (0.0024) [2025-01-04 08:27:23,968][134211] Fps is (10 sec: 13518.3, 60 sec: 13926.4, 300 sec: 14079.1). Total num frames: 608133120. Throughput: 0: 3561.3. Samples: 141200890. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:23,968][134211] Avg episode reward: [(0, '8.745')] [2025-01-04 08:27:25,054][134294] Updated weights for policy 0, policy_version 148474 (0.0024) [2025-01-04 08:27:27,992][134294] Updated weights for policy 0, policy_version 148484 (0.0023) [2025-01-04 08:27:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14106.9). Total num frames: 608202752. Throughput: 0: 3542.6. Samples: 141221406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:28,968][134211] Avg episode reward: [(0, '8.941')] [2025-01-04 08:27:31,106][134294] Updated weights for policy 0, policy_version 148494 (0.0025) [2025-01-04 08:27:33,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13994.6, 300 sec: 14106.9). Total num frames: 608264192. Throughput: 0: 3570.5. Samples: 141231356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:33,969][134211] Avg episode reward: [(0, '9.537')] [2025-01-04 08:27:34,482][134294] Updated weights for policy 0, policy_version 148504 (0.0026) [2025-01-04 08:27:37,624][134294] Updated weights for policy 0, policy_version 148514 (0.0027) [2025-01-04 08:27:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 14120.8). Total num frames: 608329728. Throughput: 0: 3310.2. Samples: 141249974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:38,968][134211] Avg episode reward: [(0, '8.580')] [2025-01-04 08:27:40,685][134294] Updated weights for policy 0, policy_version 148524 (0.0025) [2025-01-04 08:27:43,889][134294] Updated weights for policy 0, policy_version 148534 (0.0024) [2025-01-04 08:27:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13858.3, 300 sec: 13981.9). Total num frames: 608395264. Throughput: 0: 3202.8. Samples: 141269900. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:43,968][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 08:27:46,721][134294] Updated weights for policy 0, policy_version 148544 (0.0023) [2025-01-04 08:27:48,754][134294] Updated weights for policy 0, policy_version 148554 (0.0013) [2025-01-04 08:27:48,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13926.4, 300 sec: 13884.7). Total num frames: 608477184. Throughput: 0: 3228.7. Samples: 141279280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:48,968][134211] Avg episode reward: [(0, '9.270')] [2025-01-04 08:27:50,702][134294] Updated weights for policy 0, policy_version 148564 (0.0013) [2025-01-04 08:27:52,600][134294] Updated weights for policy 0, policy_version 148574 (0.0013) [2025-01-04 08:27:53,968][134211] Fps is (10 sec: 19251.7, 60 sec: 14062.9, 300 sec: 14023.6). Total num frames: 608587776. Throughput: 0: 3553.4. Samples: 141310806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:53,968][134211] Avg episode reward: [(0, '8.425')] [2025-01-04 08:27:54,707][134294] Updated weights for policy 0, policy_version 148584 (0.0016) [2025-01-04 08:27:57,894][134294] Updated weights for policy 0, policy_version 148594 (0.0028) [2025-01-04 08:27:58,968][134211] Fps is (10 sec: 17612.7, 60 sec: 13926.4, 300 sec: 14051.4). Total num frames: 608653312. Throughput: 0: 3637.3. Samples: 141334290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:27:58,968][134211] Avg episode reward: [(0, '8.021')] [2025-01-04 08:28:01,144][134294] Updated weights for policy 0, policy_version 148604 (0.0026) [2025-01-04 08:28:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13995.2, 300 sec: 14023.6). Total num frames: 608710656. Throughput: 0: 3627.8. Samples: 141343684. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:28:03,968][134211] Avg episode reward: [(0, '8.353')] [2025-01-04 08:28:04,807][134294] Updated weights for policy 0, policy_version 148614 (0.0027) [2025-01-04 08:28:08,034][134294] Updated weights for policy 0, policy_version 148624 (0.0028) [2025-01-04 08:28:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13994.7, 300 sec: 14009.7). Total num frames: 608772096. Throughput: 0: 3566.8. Samples: 141361398. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:08,968][134211] Avg episode reward: [(0, '7.911')] [2025-01-04 08:28:11,126][134294] Updated weights for policy 0, policy_version 148634 (0.0025) [2025-01-04 08:28:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14063.2, 300 sec: 14009.7). Total num frames: 608841728. Throughput: 0: 3547.3. Samples: 141381036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:13,968][134211] Avg episode reward: [(0, '8.977')] [2025-01-04 08:28:14,343][134294] Updated weights for policy 0, policy_version 148644 (0.0026) [2025-01-04 08:28:17,343][134294] Updated weights for policy 0, policy_version 148654 (0.0024) [2025-01-04 08:28:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 14009.7). Total num frames: 608907264. Throughput: 0: 3545.1. Samples: 141390884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:18,968][134211] Avg episode reward: [(0, '9.008')] [2025-01-04 08:28:20,403][134294] Updated weights for policy 0, policy_version 148664 (0.0024) [2025-01-04 08:28:23,345][134294] Updated weights for policy 0, policy_version 148674 (0.0024) [2025-01-04 08:28:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.6, 300 sec: 13995.8). Total num frames: 608972800. Throughput: 0: 3588.4. Samples: 141411454. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:23,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 08:28:26,339][134294] Updated weights for policy 0, policy_version 148684 (0.0026) [2025-01-04 08:28:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.6, 300 sec: 14037.5). Total num frames: 609042432. Throughput: 0: 3589.0. Samples: 141431406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:28,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 08:28:29,695][134294] Updated weights for policy 0, policy_version 148694 (0.0025) [2025-01-04 08:28:33,054][134294] Updated weights for policy 0, policy_version 148704 (0.0028) [2025-01-04 08:28:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13926.5, 300 sec: 14009.7). Total num frames: 609099776. Throughput: 0: 3578.6. Samples: 141440316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:33,968][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 08:28:35,926][134294] Updated weights for policy 0, policy_version 148714 (0.0019) [2025-01-04 08:28:37,823][134294] Updated weights for policy 0, policy_version 148724 (0.0013) [2025-01-04 08:28:38,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14404.3, 300 sec: 14107.0). Total num frames: 609193984. Throughput: 0: 3381.7. Samples: 141462982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:38,968][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 08:28:39,897][134294] Updated weights for policy 0, policy_version 148734 (0.0015) [2025-01-04 08:28:41,968][134294] Updated weights for policy 0, policy_version 148744 (0.0013) [2025-01-04 08:28:43,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14745.6, 300 sec: 14162.4). Total num frames: 609280000. Throughput: 0: 3493.4. Samples: 141491494. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:43,969][134211] Avg episode reward: [(0, '9.146')] [2025-01-04 08:28:45,152][134294] Updated weights for policy 0, policy_version 148754 (0.0025) [2025-01-04 08:28:48,331][134294] Updated weights for policy 0, policy_version 148764 (0.0028) [2025-01-04 08:28:48,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14404.3, 300 sec: 14134.7). Total num frames: 609341440. Throughput: 0: 3484.6. Samples: 141500492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:48,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 08:28:51,452][134294] Updated weights for policy 0, policy_version 148774 (0.0028) [2025-01-04 08:28:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13721.6, 300 sec: 14134.7). Total num frames: 609411072. Throughput: 0: 3526.8. Samples: 141520106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:53,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 08:28:54,508][134294] Updated weights for policy 0, policy_version 148784 (0.0027) [2025-01-04 08:28:57,616][134294] Updated weights for policy 0, policy_version 148794 (0.0026) [2025-01-04 08:28:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 14120.8). Total num frames: 609476608. Throughput: 0: 3535.0. Samples: 141540110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:28:58,968][134211] Avg episode reward: [(0, '9.494')] [2025-01-04 08:29:00,692][134294] Updated weights for policy 0, policy_version 148804 (0.0026) [2025-01-04 08:29:03,919][134294] Updated weights for policy 0, policy_version 148814 (0.0025) [2025-01-04 08:29:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.2, 300 sec: 13995.8). Total num frames: 609542144. Throughput: 0: 3536.8. Samples: 141550040. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:29:03,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 08:29:07,168][134294] Updated weights for policy 0, policy_version 148824 (0.0025) [2025-01-04 08:29:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 13981.9). Total num frames: 609603584. Throughput: 0: 3502.7. Samples: 141569076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:08,968][134211] Avg episode reward: [(0, '7.961')] [2025-01-04 08:29:10,148][134294] Updated weights for policy 0, policy_version 148834 (0.0027) [2025-01-04 08:29:13,091][134294] Updated weights for policy 0, policy_version 148844 (0.0023) [2025-01-04 08:29:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13858.1, 300 sec: 13995.8). Total num frames: 609673216. Throughput: 0: 3520.2. Samples: 141589814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:13,968][134211] Avg episode reward: [(0, '8.383')] [2025-01-04 08:29:14,005][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148847_609677312.pth... [2025-01-04 08:29:14,069][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148032_606339072.pth [2025-01-04 08:29:16,112][134294] Updated weights for policy 0, policy_version 148854 (0.0026) [2025-01-04 08:29:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.4, 300 sec: 13995.8). Total num frames: 609742848. Throughput: 0: 3548.1. Samples: 141599980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:18,968][134211] Avg episode reward: [(0, '8.525')] [2025-01-04 08:29:19,058][134294] Updated weights for policy 0, policy_version 148864 (0.0023) [2025-01-04 08:29:22,054][134294] Updated weights for policy 0, policy_version 148874 (0.0027) [2025-01-04 08:29:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14009.7). Total num frames: 609812480. 
Throughput: 0: 3497.8. Samples: 141620384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:23,968][134211] Avg episode reward: [(0, '9.954')] [2025-01-04 08:29:25,123][134294] Updated weights for policy 0, policy_version 148884 (0.0026) [2025-01-04 08:29:28,110][134294] Updated weights for policy 0, policy_version 148894 (0.0025) [2025-01-04 08:29:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14037.5). Total num frames: 609882112. Throughput: 0: 3320.6. Samples: 141640918. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:28,968][134211] Avg episode reward: [(0, '9.118')] [2025-01-04 08:29:30,293][134294] Updated weights for policy 0, policy_version 148904 (0.0014) [2025-01-04 08:29:33,219][134294] Updated weights for policy 0, policy_version 148914 (0.0026) [2025-01-04 08:29:33,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14336.0, 300 sec: 14079.1). Total num frames: 609959936. Throughput: 0: 3422.0. Samples: 141654484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:33,968][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 08:29:36,530][134294] Updated weights for policy 0, policy_version 148924 (0.0028) [2025-01-04 08:29:38,901][134294] Updated weights for policy 0, policy_version 148934 (0.0015) [2025-01-04 08:29:38,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13994.7, 300 sec: 14093.2). Total num frames: 610033664. Throughput: 0: 3402.5. Samples: 141673220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:38,968][134211] Avg episode reward: [(0, '9.423')] [2025-01-04 08:29:40,911][134294] Updated weights for policy 0, policy_version 148944 (0.0013) [2025-01-04 08:29:42,771][134294] Updated weights for policy 0, policy_version 148954 (0.0013) [2025-01-04 08:29:43,968][134211] Fps is (10 sec: 18022.2, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 610140160. Throughput: 0: 3648.0. Samples: 141704270. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:43,968][134211] Avg episode reward: [(0, '9.153')] [2025-01-04 08:29:44,689][134294] Updated weights for policy 0, policy_version 148964 (0.0014) [2025-01-04 08:29:46,527][134294] Updated weights for policy 0, policy_version 148974 (0.0012) [2025-01-04 08:29:48,926][134294] Updated weights for policy 0, policy_version 148984 (0.0020) [2025-01-04 08:29:48,968][134211] Fps is (10 sec: 20479.6, 60 sec: 14950.4, 300 sec: 14287.4). Total num frames: 610238464. Throughput: 0: 3790.9. Samples: 141720630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:48,968][134211] Avg episode reward: [(0, '9.529')] [2025-01-04 08:29:51,933][134294] Updated weights for policy 0, policy_version 148994 (0.0025) [2025-01-04 08:29:53,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14882.1, 300 sec: 14287.4). Total num frames: 610304000. Throughput: 0: 3875.2. Samples: 141743462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:53,969][134211] Avg episode reward: [(0, '9.140')] [2025-01-04 08:29:55,363][134294] Updated weights for policy 0, policy_version 149004 (0.0025) [2025-01-04 08:29:58,484][134294] Updated weights for policy 0, policy_version 149014 (0.0026) [2025-01-04 08:29:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14259.6). Total num frames: 610365440. Throughput: 0: 3840.4. Samples: 141762632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:29:58,968][134211] Avg episode reward: [(0, '9.043')] [2025-01-04 08:30:01,969][134294] Updated weights for policy 0, policy_version 149024 (0.0026) [2025-01-04 08:30:03,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14677.3, 300 sec: 14106.9). Total num frames: 610422784. Throughput: 0: 3816.1. Samples: 141771704. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:30:03,968][134211] Avg episode reward: [(0, '9.223')] [2025-01-04 08:30:05,324][134294] Updated weights for policy 0, policy_version 149034 (0.0028) [2025-01-04 08:30:08,619][134294] Updated weights for policy 0, policy_version 149044 (0.0023) [2025-01-04 08:30:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14745.6, 300 sec: 14065.3). Total num frames: 610488320. Throughput: 0: 3758.6. Samples: 141789522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:30:08,968][134211] Avg episode reward: [(0, '8.582')] [2025-01-04 08:30:11,638][134294] Updated weights for policy 0, policy_version 149054 (0.0022) [2025-01-04 08:30:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14677.3, 300 sec: 14065.2). Total num frames: 610553856. Throughput: 0: 3739.9. Samples: 141809216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:13,968][134211] Avg episode reward: [(0, '8.776')] [2025-01-04 08:30:14,872][134294] Updated weights for policy 0, policy_version 149064 (0.0027) [2025-01-04 08:30:17,867][134294] Updated weights for policy 0, policy_version 149074 (0.0025) [2025-01-04 08:30:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.0, 300 sec: 14065.2). Total num frames: 610619392. Throughput: 0: 3658.4. Samples: 141819114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:18,968][134211] Avg episode reward: [(0, '9.162')] [2025-01-04 08:30:20,898][134294] Updated weights for policy 0, policy_version 149084 (0.0027) [2025-01-04 08:30:23,730][134294] Updated weights for policy 0, policy_version 149094 (0.0025) [2025-01-04 08:30:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14079.1). Total num frames: 610689024. Throughput: 0: 3702.8. Samples: 141839848. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:23,968][134211] Avg episode reward: [(0, '8.483')] [2025-01-04 08:30:26,694][134294] Updated weights for policy 0, policy_version 149104 (0.0024) [2025-01-04 08:30:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14093.1). Total num frames: 610754560. Throughput: 0: 3467.9. Samples: 141860326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:28,968][134211] Avg episode reward: [(0, '9.784')] [2025-01-04 08:30:30,020][134294] Updated weights for policy 0, policy_version 149114 (0.0024) [2025-01-04 08:30:33,326][134294] Updated weights for policy 0, policy_version 149124 (0.0029) [2025-01-04 08:30:33,969][134211] Fps is (10 sec: 13105.6, 60 sec: 14335.7, 300 sec: 14093.0). Total num frames: 610820096. Throughput: 0: 3308.9. Samples: 141869534. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:33,970][134211] Avg episode reward: [(0, '8.756')] [2025-01-04 08:30:36,488][134294] Updated weights for policy 0, policy_version 149134 (0.0028) [2025-01-04 08:30:38,967][134211] Fps is (10 sec: 13107.6, 60 sec: 14199.5, 300 sec: 14093.1). Total num frames: 610885632. Throughput: 0: 3222.5. Samples: 141888474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:38,968][134211] Avg episode reward: [(0, '9.088')] [2025-01-04 08:30:39,450][134294] Updated weights for policy 0, policy_version 149144 (0.0020) [2025-01-04 08:30:41,569][134294] Updated weights for policy 0, policy_version 149154 (0.0013) [2025-01-04 08:30:43,968][134211] Fps is (10 sec: 14747.2, 60 sec: 13789.9, 300 sec: 14134.7). Total num frames: 610967552. Throughput: 0: 3348.3. Samples: 141913304. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:43,968][134211] Avg episode reward: [(0, '8.118')] [2025-01-04 08:30:44,559][134294] Updated weights for policy 0, policy_version 149164 (0.0023) [2025-01-04 08:30:47,883][134294] Updated weights for policy 0, policy_version 149174 (0.0027) [2025-01-04 08:30:48,968][134211] Fps is (10 sec: 14335.5, 60 sec: 13175.5, 300 sec: 14120.8). Total num frames: 611028992. Throughput: 0: 3339.2. Samples: 141921970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:48,968][134211] Avg episode reward: [(0, '8.263')] [2025-01-04 08:30:50,925][134294] Updated weights for policy 0, policy_version 149184 (0.0026) [2025-01-04 08:30:53,201][134294] Updated weights for policy 0, policy_version 149194 (0.0017) [2025-01-04 08:30:53,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13448.6, 300 sec: 14190.3). Total num frames: 611110912. Throughput: 0: 3395.0. Samples: 141942298. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:53,968][134211] Avg episode reward: [(0, '10.060')] [2025-01-04 08:30:55,653][134294] Updated weights for policy 0, policy_version 149204 (0.0021) [2025-01-04 08:30:58,666][134294] Updated weights for policy 0, policy_version 149214 (0.0026) [2025-01-04 08:30:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13653.3, 300 sec: 14106.9). Total num frames: 611184640. Throughput: 0: 3504.2. Samples: 141966904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:30:58,968][134211] Avg episode reward: [(0, '8.904')] [2025-01-04 08:31:02,066][134294] Updated weights for policy 0, policy_version 149224 (0.0028) [2025-01-04 08:31:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13653.3, 300 sec: 14037.5). Total num frames: 611241984. Throughput: 0: 3486.0. Samples: 141975984. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:03,969][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 08:31:05,724][134294] Updated weights for policy 0, policy_version 149234 (0.0025) [2025-01-04 08:31:08,744][134294] Updated weights for policy 0, policy_version 149244 (0.0028) [2025-01-04 08:31:08,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13585.0, 300 sec: 14037.6). Total num frames: 611303424. Throughput: 0: 3421.0. Samples: 141993794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:08,969][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 08:31:11,793][134294] Updated weights for policy 0, policy_version 149254 (0.0025) [2025-01-04 08:31:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13653.3, 300 sec: 14051.4). Total num frames: 611373056. Throughput: 0: 3415.1. Samples: 142014004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:13,968][134211] Avg episode reward: [(0, '8.426')] [2025-01-04 08:31:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000149261_611373056.pth... [2025-01-04 08:31:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148437_607997952.pth [2025-01-04 08:31:14,887][134294] Updated weights for policy 0, policy_version 149264 (0.0026) [2025-01-04 08:31:17,878][134294] Updated weights for policy 0, policy_version 149274 (0.0024) [2025-01-04 08:31:18,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13653.3, 300 sec: 14037.5). Total num frames: 611438592. Throughput: 0: 3424.4. Samples: 142023628. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:18,968][134211] Avg episode reward: [(0, '8.346')] [2025-01-04 08:31:20,922][134294] Updated weights for policy 0, policy_version 149284 (0.0025) [2025-01-04 08:31:22,871][134294] Updated weights for policy 0, policy_version 149294 (0.0014) [2025-01-04 08:31:23,967][134211] Fps is (10 sec: 15565.2, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 611528704. Throughput: 0: 3515.0. Samples: 142046650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:23,968][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 08:31:24,774][134294] Updated weights for policy 0, policy_version 149304 (0.0014) [2025-01-04 08:31:26,618][134294] Updated weights for policy 0, policy_version 149314 (0.0014) [2025-01-04 08:31:28,523][134294] Updated weights for policy 0, policy_version 149324 (0.0013) [2025-01-04 08:31:28,967][134211] Fps is (10 sec: 20070.8, 60 sec: 14745.7, 300 sec: 14287.4). Total num frames: 611639296. Throughput: 0: 3687.5. Samples: 142079240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:28,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 08:31:30,588][134294] Updated weights for policy 0, policy_version 149334 (0.0015) [2025-01-04 08:31:33,936][134294] Updated weights for policy 0, policy_version 149344 (0.0026) [2025-01-04 08:31:33,968][134211] Fps is (10 sec: 18431.7, 60 sec: 14882.4, 300 sec: 14273.5). Total num frames: 611713024. Throughput: 0: 3806.5. Samples: 142093264. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:33,968][134211] Avg episode reward: [(0, '9.335')] [2025-01-04 08:31:37,503][134294] Updated weights for policy 0, policy_version 149354 (0.0030) [2025-01-04 08:31:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14745.5, 300 sec: 14259.7). Total num frames: 611770368. Throughput: 0: 3741.9. Samples: 142110682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:38,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 08:31:40,772][134294] Updated weights for policy 0, policy_version 149364 (0.0026) [2025-01-04 08:31:43,862][134294] Updated weights for policy 0, policy_version 149374 (0.0026) [2025-01-04 08:31:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.5, 300 sec: 14218.0). Total num frames: 611835904. Throughput: 0: 3625.7. Samples: 142130062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:43,968][134211] Avg episode reward: [(0, '8.385')] [2025-01-04 08:31:46,815][134294] Updated weights for policy 0, policy_version 149384 (0.0023) [2025-01-04 08:31:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.8, 300 sec: 14093.0). Total num frames: 611901440. Throughput: 0: 3643.8. Samples: 142139954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:48,968][134211] Avg episode reward: [(0, '7.967')] [2025-01-04 08:31:50,073][134294] Updated weights for policy 0, policy_version 149394 (0.0028) [2025-01-04 08:31:53,142][134294] Updated weights for policy 0, policy_version 149404 (0.0027) [2025-01-04 08:31:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 14065.2). Total num frames: 611966976. Throughput: 0: 3687.3. Samples: 142159720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:53,969][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 08:31:56,233][134294] Updated weights for policy 0, policy_version 149414 (0.0028) [2025-01-04 08:31:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 14107.0). Total num frames: 612032512. Throughput: 0: 3672.2. Samples: 142179252. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:31:58,968][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 08:31:59,505][134294] Updated weights for policy 0, policy_version 149424 (0.0026) [2025-01-04 08:32:02,878][134294] Updated weights for policy 0, policy_version 149434 (0.0024) [2025-01-04 08:32:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 14106.9). Total num frames: 612093952. Throughput: 0: 3659.8. Samples: 142188318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:03,969][134211] Avg episode reward: [(0, '9.312')] [2025-01-04 08:32:06,111][134294] Updated weights for policy 0, policy_version 149444 (0.0027) [2025-01-04 08:32:08,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14199.5, 300 sec: 14093.1). Total num frames: 612155392. Throughput: 0: 3562.6. Samples: 142206966. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:08,968][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 08:32:09,548][134294] Updated weights for policy 0, policy_version 149454 (0.0025) [2025-01-04 08:32:12,718][134294] Updated weights for policy 0, policy_version 149464 (0.0025) [2025-01-04 08:32:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14062.9, 300 sec: 14065.2). Total num frames: 612216832. Throughput: 0: 3256.4. Samples: 142225778. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:13,968][134211] Avg episode reward: [(0, '9.136')] [2025-01-04 08:32:15,788][134294] Updated weights for policy 0, policy_version 149474 (0.0024) [2025-01-04 08:32:18,573][134294] Updated weights for policy 0, policy_version 149484 (0.0026) [2025-01-04 08:32:18,971][134211] Fps is (10 sec: 13103.3, 60 sec: 14130.5, 300 sec: 14079.0). Total num frames: 612286464. Throughput: 0: 3172.9. Samples: 142236052. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:18,971][134211] Avg episode reward: [(0, '9.137')] [2025-01-04 08:32:21,651][134294] Updated weights for policy 0, policy_version 149494 (0.0027) [2025-01-04 08:32:23,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13926.4, 300 sec: 14106.9). Total num frames: 612364288. Throughput: 0: 3248.4. Samples: 142256858. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:23,968][134211] Avg episode reward: [(0, '8.383')] [2025-01-04 08:32:23,989][134294] Updated weights for policy 0, policy_version 149504 (0.0017) [2025-01-04 08:32:26,499][134294] Updated weights for policy 0, policy_version 149514 (0.0020) [2025-01-04 08:32:28,968][134211] Fps is (10 sec: 15569.5, 60 sec: 13380.2, 300 sec: 14162.5). Total num frames: 612442112. Throughput: 0: 3369.4. Samples: 142281684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:28,968][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 08:32:29,285][134294] Updated weights for policy 0, policy_version 149524 (0.0024) [2025-01-04 08:32:32,569][134294] Updated weights for policy 0, policy_version 149534 (0.0026) [2025-01-04 08:32:33,968][134211] Fps is (10 sec: 14335.3, 60 sec: 13243.6, 300 sec: 14162.4). Total num frames: 612507648. Throughput: 0: 3367.2. Samples: 142291480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:33,969][134211] Avg episode reward: [(0, '8.603')] [2025-01-04 08:32:35,734][134294] Updated weights for policy 0, policy_version 149544 (0.0025) [2025-01-04 08:32:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13312.0, 300 sec: 14148.6). Total num frames: 612569088. Throughput: 0: 3349.4. Samples: 142310444. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:38,968][134211] Avg episode reward: [(0, '7.910')] [2025-01-04 08:32:39,171][134294] Updated weights for policy 0, policy_version 149554 (0.0024) [2025-01-04 08:32:41,183][134294] Updated weights for policy 0, policy_version 149564 (0.0012) [2025-01-04 08:32:43,338][134294] Updated weights for policy 0, policy_version 149574 (0.0013) [2025-01-04 08:32:43,968][134211] Fps is (10 sec: 15565.6, 60 sec: 13789.9, 300 sec: 14190.2). Total num frames: 612663296. Throughput: 0: 3493.3. Samples: 142336448. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:43,968][134211] Avg episode reward: [(0, '8.272')] [2025-01-04 08:32:45,417][134294] Updated weights for policy 0, policy_version 149584 (0.0013) [2025-01-04 08:32:47,304][134294] Updated weights for policy 0, policy_version 149594 (0.0013) [2025-01-04 08:32:48,967][134211] Fps is (10 sec: 20070.7, 60 sec: 14472.6, 300 sec: 14176.3). Total num frames: 612769792. Throughput: 0: 3620.8. Samples: 142351252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:48,968][134211] Avg episode reward: [(0, '8.835')] [2025-01-04 08:32:49,171][134294] Updated weights for policy 0, policy_version 149604 (0.0012) [2025-01-04 08:32:51,773][134294] Updated weights for policy 0, policy_version 149614 (0.0023) [2025-01-04 08:32:53,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14677.3, 300 sec: 14218.0). Total num frames: 612847616. Throughput: 0: 3829.4. Samples: 142379288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:53,968][134211] Avg episode reward: [(0, '9.193')] [2025-01-04 08:32:54,927][134294] Updated weights for policy 0, policy_version 149624 (0.0028) [2025-01-04 08:32:58,014][134294] Updated weights for policy 0, policy_version 149634 (0.0025) [2025-01-04 08:32:58,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14677.3, 300 sec: 14245.8). Total num frames: 612913152. Throughput: 0: 3849.5. Samples: 142399004. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:32:58,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 08:33:01,183][134294] Updated weights for policy 0, policy_version 149644 (0.0026) [2025-01-04 08:33:03,968][134211] Fps is (10 sec: 12287.6, 60 sec: 14609.0, 300 sec: 14231.8). Total num frames: 612970496. Throughput: 0: 3833.1. Samples: 142408530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:33:03,969][134211] Avg episode reward: [(0, '9.290')] [2025-01-04 08:33:04,702][134294] Updated weights for policy 0, policy_version 149654 (0.0027) [2025-01-04 08:33:07,986][134294] Updated weights for policy 0, policy_version 149664 (0.0026) [2025-01-04 08:33:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14677.4, 300 sec: 14218.0). Total num frames: 613036032. Throughput: 0: 3769.1. Samples: 142426468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:33:08,968][134211] Avg episode reward: [(0, '9.056')] [2025-01-04 08:33:10,995][134294] Updated weights for policy 0, policy_version 149674 (0.0026) [2025-01-04 08:33:13,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14745.6, 300 sec: 14218.0). Total num frames: 613101568. Throughput: 0: 3666.2. Samples: 142446662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:33:13,968][134211] Avg episode reward: [(0, '9.333')] [2025-01-04 08:33:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000149683_613101568.pth... [2025-01-04 08:33:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000148847_609677312.pth [2025-01-04 08:33:14,144][134294] Updated weights for policy 0, policy_version 149684 (0.0028) [2025-01-04 08:33:17,179][134294] Updated weights for policy 0, policy_version 149694 (0.0026) [2025-01-04 08:33:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14678.1, 300 sec: 14218.0). Total num frames: 613167104. 
Throughput: 0: 3668.2. Samples: 142456548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:33:18,968][134211] Avg episode reward: [(0, '9.301')] [2025-01-04 08:33:20,218][134294] Updated weights for policy 0, policy_version 149704 (0.0024) [2025-01-04 08:33:23,202][134294] Updated weights for policy 0, policy_version 149714 (0.0027) [2025-01-04 08:33:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.7, 300 sec: 14218.0). Total num frames: 613236736. Throughput: 0: 3700.2. Samples: 142476952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:23,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 08:33:26,114][134294] Updated weights for policy 0, policy_version 149724 (0.0028) [2025-01-04 08:33:28,968][134211] Fps is (10 sec: 13516.1, 60 sec: 14335.9, 300 sec: 14245.7). Total num frames: 613302272. Throughput: 0: 3571.8. Samples: 142497182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:28,969][134211] Avg episode reward: [(0, '8.610')] [2025-01-04 08:33:29,288][134294] Updated weights for policy 0, policy_version 149734 (0.0026) [2025-01-04 08:33:32,588][134294] Updated weights for policy 0, policy_version 149744 (0.0024) [2025-01-04 08:33:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14336.1, 300 sec: 14148.5). Total num frames: 613367808. Throughput: 0: 3452.2. Samples: 142506600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:33,968][134211] Avg episode reward: [(0, '8.739')] [2025-01-04 08:33:35,824][134294] Updated weights for policy 0, policy_version 149754 (0.0026) [2025-01-04 08:33:38,845][134294] Updated weights for policy 0, policy_version 149764 (0.0026) [2025-01-04 08:33:38,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14404.2, 300 sec: 14079.1). Total num frames: 613433344. Throughput: 0: 3256.3. Samples: 142525822. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:38,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 08:33:42,004][134294] Updated weights for policy 0, policy_version 149774 (0.0024) [2025-01-04 08:33:43,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13926.3, 300 sec: 14093.0). Total num frames: 613498880. Throughput: 0: 3256.6. Samples: 142545552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:43,969][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 08:33:44,968][134294] Updated weights for policy 0, policy_version 149784 (0.0027) [2025-01-04 08:33:47,927][134294] Updated weights for policy 0, policy_version 149794 (0.0023) [2025-01-04 08:33:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13311.9, 300 sec: 14093.0). Total num frames: 613568512. Throughput: 0: 3280.9. Samples: 142556170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:48,968][134211] Avg episode reward: [(0, '9.071')] [2025-01-04 08:33:50,879][134294] Updated weights for policy 0, policy_version 149804 (0.0028) [2025-01-04 08:33:53,705][134294] Updated weights for policy 0, policy_version 149814 (0.0024) [2025-01-04 08:33:53,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13175.5, 300 sec: 14106.9). Total num frames: 613638144. Throughput: 0: 3354.1. Samples: 142577404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:53,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 08:33:56,646][134294] Updated weights for policy 0, policy_version 149824 (0.0025) [2025-01-04 08:33:58,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13380.3, 300 sec: 14148.6). Total num frames: 613715968. Throughput: 0: 3386.1. Samples: 142599036. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:33:58,968][134211] Avg episode reward: [(0, '8.905')] [2025-01-04 08:33:58,974][134294] Updated weights for policy 0, policy_version 149834 (0.0014) [2025-01-04 08:34:01,072][134294] Updated weights for policy 0, policy_version 149844 (0.0013) [2025-01-04 08:34:03,168][134294] Updated weights for policy 0, policy_version 149854 (0.0013) [2025-01-04 08:34:03,967][134211] Fps is (10 sec: 18023.0, 60 sec: 14131.4, 300 sec: 14287.4). Total num frames: 613818368. Throughput: 0: 3501.3. Samples: 142614106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:34:03,968][134211] Avg episode reward: [(0, '10.015')] [2025-01-04 08:34:05,488][134294] Updated weights for policy 0, policy_version 149864 (0.0017) [2025-01-04 08:34:08,759][134294] Updated weights for policy 0, policy_version 149874 (0.0028) [2025-01-04 08:34:08,969][134211] Fps is (10 sec: 16791.5, 60 sec: 14130.9, 300 sec: 14273.5). Total num frames: 613883904. Throughput: 0: 3608.6. Samples: 142639342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:34:08,969][134211] Avg episode reward: [(0, '9.437')] [2025-01-04 08:34:11,842][134294] Updated weights for policy 0, policy_version 149884 (0.0025) [2025-01-04 08:34:13,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14131.2, 300 sec: 14259.6). Total num frames: 613949440. Throughput: 0: 3579.0. Samples: 142658238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:34:13,968][134211] Avg episode reward: [(0, '8.852')] [2025-01-04 08:34:15,195][134294] Updated weights for policy 0, policy_version 149894 (0.0028) [2025-01-04 08:34:18,351][134294] Updated weights for policy 0, policy_version 149904 (0.0025) [2025-01-04 08:34:18,968][134211] Fps is (10 sec: 13108.8, 60 sec: 14131.2, 300 sec: 14245.8). Total num frames: 614014976. Throughput: 0: 3578.9. Samples: 142667652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:34:18,968][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 08:34:21,346][134294] Updated weights for policy 0, policy_version 149914 (0.0025) [2025-01-04 08:34:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14062.9, 300 sec: 14231.8). Total num frames: 614080512. Throughput: 0: 3599.6. Samples: 142687804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:34:23,969][134211] Avg episode reward: [(0, '9.314')] [2025-01-04 08:34:24,577][134294] Updated weights for policy 0, policy_version 149924 (0.0028) [2025-01-04 08:34:27,455][134294] Updated weights for policy 0, policy_version 149934 (0.0023) [2025-01-04 08:34:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14063.0, 300 sec: 14190.2). Total num frames: 614146048. Throughput: 0: 3603.6. Samples: 142707714. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:28,968][134211] Avg episode reward: [(0, '9.136')] [2025-01-04 08:34:30,749][134294] Updated weights for policy 0, policy_version 149944 (0.0025) [2025-01-04 08:34:33,959][134294] Updated weights for policy 0, policy_version 149954 (0.0025) [2025-01-04 08:34:33,969][134211] Fps is (10 sec: 13105.8, 60 sec: 14062.6, 300 sec: 14162.4). Total num frames: 614211584. Throughput: 0: 3579.3. Samples: 142717244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:33,970][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 08:34:37,260][134294] Updated weights for policy 0, policy_version 149964 (0.0026) [2025-01-04 08:34:38,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13994.6, 300 sec: 14009.7). Total num frames: 614273024. Throughput: 0: 3525.5. Samples: 142736052. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:38,969][134211] Avg episode reward: [(0, '8.542')] [2025-01-04 08:34:40,393][134294] Updated weights for policy 0, policy_version 149974 (0.0025) [2025-01-04 08:34:43,705][134294] Updated weights for policy 0, policy_version 149984 (0.0024) [2025-01-04 08:34:43,968][134211] Fps is (10 sec: 12699.3, 60 sec: 13994.8, 300 sec: 13898.6). Total num frames: 614338560. Throughput: 0: 3477.4. Samples: 142755518. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:43,968][134211] Avg episode reward: [(0, '8.758')] [2025-01-04 08:34:46,135][134294] Updated weights for policy 0, policy_version 149994 (0.0018) [2025-01-04 08:34:48,152][134294] Updated weights for policy 0, policy_version 150004 (0.0014) [2025-01-04 08:34:48,967][134211] Fps is (10 sec: 15565.8, 60 sec: 14336.1, 300 sec: 13982.0). Total num frames: 614428672. Throughput: 0: 3402.5. Samples: 142767220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:48,968][134211] Avg episode reward: [(0, '9.087')] [2025-01-04 08:34:50,094][134294] Updated weights for policy 0, policy_version 150014 (0.0014) [2025-01-04 08:34:52,006][134294] Updated weights for policy 0, policy_version 150024 (0.0013) [2025-01-04 08:34:53,887][134294] Updated weights for policy 0, policy_version 150034 (0.0014) [2025-01-04 08:34:53,968][134211] Fps is (10 sec: 20070.4, 60 sec: 15018.7, 300 sec: 14148.6). Total num frames: 614539264. Throughput: 0: 3544.1. Samples: 142798824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:53,968][134211] Avg episode reward: [(0, '9.033')] [2025-01-04 08:34:56,690][134294] Updated weights for policy 0, policy_version 150044 (0.0027) [2025-01-04 08:34:58,968][134211] Fps is (10 sec: 18022.2, 60 sec: 14882.1, 300 sec: 14190.2). Total num frames: 614608896. Throughput: 0: 3665.9. Samples: 142823202. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:34:58,968][134211] Avg episode reward: [(0, '9.002')] [2025-01-04 08:34:59,787][134294] Updated weights for policy 0, policy_version 150054 (0.0028) [2025-01-04 08:35:03,371][134294] Updated weights for policy 0, policy_version 150064 (0.0029) [2025-01-04 08:35:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 614666240. Throughput: 0: 3655.4. Samples: 142832146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:03,968][134211] Avg episode reward: [(0, '9.489')] [2025-01-04 08:35:06,692][134294] Updated weights for policy 0, policy_version 150074 (0.0029) [2025-01-04 08:35:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14063.2, 300 sec: 14148.6). Total num frames: 614727680. Throughput: 0: 3607.8. Samples: 142850152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:08,968][134211] Avg episode reward: [(0, '9.461')] [2025-01-04 08:35:10,109][134294] Updated weights for policy 0, policy_version 150084 (0.0025) [2025-01-04 08:35:13,101][134294] Updated weights for policy 0, policy_version 150094 (0.0024) [2025-01-04 08:35:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14063.0, 300 sec: 14148.6). Total num frames: 614793216. Throughput: 0: 3601.7. Samples: 142869792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:13,968][134211] Avg episode reward: [(0, '8.968')] [2025-01-04 08:35:14,020][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150097_614797312.pth... [2025-01-04 08:35:14,097][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000149261_611373056.pth [2025-01-04 08:35:16,137][134294] Updated weights for policy 0, policy_version 150104 (0.0025) [2025-01-04 08:35:18,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14131.1, 300 sec: 14148.5). Total num frames: 614862848. 
Throughput: 0: 3613.8. Samples: 142879860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:18,969][134211] Avg episode reward: [(0, '9.011')] [2025-01-04 08:35:19,083][134294] Updated weights for policy 0, policy_version 150114 (0.0023) [2025-01-04 08:35:22,081][134294] Updated weights for policy 0, policy_version 150124 (0.0023) [2025-01-04 08:35:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14131.3, 300 sec: 14148.6). Total num frames: 614928384. Throughput: 0: 3653.2. Samples: 142900446. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:23,968][134211] Avg episode reward: [(0, '9.439')] [2025-01-04 08:35:25,120][134294] Updated weights for policy 0, policy_version 150134 (0.0026) [2025-01-04 08:35:28,140][134294] Updated weights for policy 0, policy_version 150144 (0.0024) [2025-01-04 08:35:28,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14199.5, 300 sec: 14162.5). Total num frames: 614998016. Throughput: 0: 3675.2. Samples: 142920904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:35:28,968][134211] Avg episode reward: [(0, '8.742')] [2025-01-04 08:35:31,078][134294] Updated weights for policy 0, policy_version 150154 (0.0025) [2025-01-04 08:35:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.7, 300 sec: 14162.4). Total num frames: 615063552. Throughput: 0: 3637.9. Samples: 142930928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:33,969][134211] Avg episode reward: [(0, '9.469')] [2025-01-04 08:35:34,498][134294] Updated weights for policy 0, policy_version 150164 (0.0024) [2025-01-04 08:35:37,719][134294] Updated weights for policy 0, policy_version 150174 (0.0026) [2025-01-04 08:35:38,969][134211] Fps is (10 sec: 12696.3, 60 sec: 14199.3, 300 sec: 14093.0). Total num frames: 615124992. Throughput: 0: 3350.2. Samples: 142949586. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:38,969][134211] Avg episode reward: [(0, '9.737')] [2025-01-04 08:35:40,797][134294] Updated weights for policy 0, policy_version 150184 (0.0024) [2025-01-04 08:35:43,881][134294] Updated weights for policy 0, policy_version 150194 (0.0025) [2025-01-04 08:35:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14267.7, 300 sec: 14120.8). Total num frames: 615194624. Throughput: 0: 3253.8. Samples: 142969622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:43,968][134211] Avg episode reward: [(0, '8.923')] [2025-01-04 08:35:46,766][134294] Updated weights for policy 0, policy_version 150204 (0.0027) [2025-01-04 08:35:48,968][134211] Fps is (10 sec: 13927.8, 60 sec: 13926.4, 300 sec: 14079.1). Total num frames: 615264256. Throughput: 0: 3285.2. Samples: 142979978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:48,968][134211] Avg episode reward: [(0, '8.987')] [2025-01-04 08:35:49,795][134294] Updated weights for policy 0, policy_version 150214 (0.0023) [2025-01-04 08:35:52,757][134294] Updated weights for policy 0, policy_version 150224 (0.0023) [2025-01-04 08:35:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13175.5, 300 sec: 14051.4). Total num frames: 615329792. Throughput: 0: 3341.6. Samples: 143000524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:53,968][134211] Avg episode reward: [(0, '8.354')] [2025-01-04 08:35:55,506][134294] Updated weights for policy 0, policy_version 150234 (0.0021) [2025-01-04 08:35:57,518][134294] Updated weights for policy 0, policy_version 150244 (0.0014) [2025-01-04 08:35:58,968][134211] Fps is (10 sec: 15155.0, 60 sec: 13448.5, 300 sec: 14148.6). Total num frames: 615415808. Throughput: 0: 3458.4. Samples: 143025420. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:35:58,968][134211] Avg episode reward: [(0, '8.938')] [2025-01-04 08:36:00,499][134294] Updated weights for policy 0, policy_version 150254 (0.0023) [2025-01-04 08:36:03,772][134294] Updated weights for policy 0, policy_version 150264 (0.0025) [2025-01-04 08:36:03,968][134211] Fps is (10 sec: 15155.0, 60 sec: 13585.0, 300 sec: 14162.4). Total num frames: 615481344. Throughput: 0: 3455.2. Samples: 143035344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:03,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 08:36:06,375][134294] Updated weights for policy 0, policy_version 150274 (0.0020) [2025-01-04 08:36:08,352][134294] Updated weights for policy 0, policy_version 150284 (0.0013) [2025-01-04 08:36:08,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14131.2, 300 sec: 14245.8). Total num frames: 615575552. Throughput: 0: 3502.4. Samples: 143058054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:08,968][134211] Avg episode reward: [(0, '8.404')] [2025-01-04 08:36:10,163][134294] Updated weights for policy 0, policy_version 150294 (0.0016) [2025-01-04 08:36:12,058][134294] Updated weights for policy 0, policy_version 150304 (0.0016) [2025-01-04 08:36:13,970][134211] Fps is (10 sec: 19247.2, 60 sec: 14676.8, 300 sec: 14356.7). Total num frames: 615673856. Throughput: 0: 3755.8. Samples: 143089922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:13,971][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 08:36:14,613][134294] Updated weights for policy 0, policy_version 150314 (0.0020) [2025-01-04 08:36:17,943][134294] Updated weights for policy 0, policy_version 150324 (0.0028) [2025-01-04 08:36:18,968][134211] Fps is (10 sec: 16383.5, 60 sec: 14609.1, 300 sec: 14273.5). Total num frames: 615739392. Throughput: 0: 3753.0. Samples: 143099814. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:18,968][134211] Avg episode reward: [(0, '8.839')] [2025-01-04 08:36:20,985][134294] Updated weights for policy 0, policy_version 150334 (0.0025) [2025-01-04 08:36:23,968][134211] Fps is (10 sec: 13110.0, 60 sec: 14609.0, 300 sec: 14120.8). Total num frames: 615804928. Throughput: 0: 3769.6. Samples: 143119216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:23,968][134211] Avg episode reward: [(0, '8.824')] [2025-01-04 08:36:24,031][134294] Updated weights for policy 0, policy_version 150344 (0.0028) [2025-01-04 08:36:27,044][134294] Updated weights for policy 0, policy_version 150354 (0.0027) [2025-01-04 08:36:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14106.9). Total num frames: 615874560. Throughput: 0: 3771.2. Samples: 143139326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 08:36:28,968][134211] Avg episode reward: [(0, '8.420')] [2025-01-04 08:36:30,204][134294] Updated weights for policy 0, policy_version 150364 (0.0029) [2025-01-04 08:36:33,340][134294] Updated weights for policy 0, policy_version 150374 (0.0027) [2025-01-04 08:36:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 14120.8). Total num frames: 615936000. Throughput: 0: 3761.3. Samples: 143149236. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:33,968][134211] Avg episode reward: [(0, '9.471')] [2025-01-04 08:36:36,619][134294] Updated weights for policy 0, policy_version 150384 (0.0027) [2025-01-04 08:36:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.3, 300 sec: 14120.8). Total num frames: 616001536. Throughput: 0: 3718.5. Samples: 143167858. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:38,968][134211] Avg episode reward: [(0, '8.267')] [2025-01-04 08:36:39,879][134294] Updated weights for policy 0, policy_version 150394 (0.0026) [2025-01-04 08:36:42,824][134294] Updated weights for policy 0, policy_version 150404 (0.0024) [2025-01-04 08:36:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14540.8, 300 sec: 14120.8). Total num frames: 616067072. Throughput: 0: 3611.4. Samples: 143187934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:43,968][134211] Avg episode reward: [(0, '8.378')] [2025-01-04 08:36:46,223][134294] Updated weights for policy 0, policy_version 150414 (0.0028) [2025-01-04 08:36:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14336.0, 300 sec: 14093.0). Total num frames: 616124416. Throughput: 0: 3587.6. Samples: 143196784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:48,968][134211] Avg episode reward: [(0, '9.129')] [2025-01-04 08:36:49,846][134294] Updated weights for policy 0, policy_version 150424 (0.0025) [2025-01-04 08:36:53,113][134294] Updated weights for policy 0, policy_version 150434 (0.0027) [2025-01-04 08:36:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 616185856. Throughput: 0: 3480.7. Samples: 143214684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:53,968][134211] Avg episode reward: [(0, '8.113')] [2025-01-04 08:36:55,494][134294] Updated weights for policy 0, policy_version 150444 (0.0019) [2025-01-04 08:36:57,910][134294] Updated weights for policy 0, policy_version 150454 (0.0019) [2025-01-04 08:36:58,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14267.7, 300 sec: 14162.4). Total num frames: 616271872. Throughput: 0: 3316.1. Samples: 143239140. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:36:58,968][134211] Avg episode reward: [(0, '9.079')] [2025-01-04 08:37:00,914][134294] Updated weights for policy 0, policy_version 150464 (0.0027) [2025-01-04 08:37:03,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14267.8, 300 sec: 14176.3). Total num frames: 616337408. Throughput: 0: 3319.5. Samples: 143249190. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:03,968][134211] Avg episode reward: [(0, '10.337')] [2025-01-04 08:37:04,268][134294] Updated weights for policy 0, policy_version 150474 (0.0024) [2025-01-04 08:37:07,499][134294] Updated weights for policy 0, policy_version 150484 (0.0027) [2025-01-04 08:37:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.5, 300 sec: 14176.3). Total num frames: 616398848. Throughput: 0: 3302.8. Samples: 143267842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:08,969][134211] Avg episode reward: [(0, '8.964')] [2025-01-04 08:37:10,564][134294] Updated weights for policy 0, policy_version 150494 (0.0025) [2025-01-04 08:37:12,794][134294] Updated weights for policy 0, policy_version 150504 (0.0017) [2025-01-04 08:37:13,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13517.3, 300 sec: 14232.0). Total num frames: 616484864. Throughput: 0: 3378.0. Samples: 143291338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:13,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 08:37:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150509_616484864.pth... 
[2025-01-04 08:37:14,036][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000149683_613101568.pth [2025-01-04 08:37:15,243][134294] Updated weights for policy 0, policy_version 150514 (0.0022) [2025-01-04 08:37:18,206][134294] Updated weights for policy 0, policy_version 150524 (0.0026) [2025-01-04 08:37:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13585.1, 300 sec: 14204.1). Total num frames: 616554496. Throughput: 0: 3410.4. Samples: 143302706. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:18,968][134211] Avg episode reward: [(0, '8.818')] [2025-01-04 08:37:21,159][134294] Updated weights for policy 0, policy_version 150534 (0.0026) [2025-01-04 08:37:23,968][134211] Fps is (10 sec: 13925.7, 60 sec: 13653.3, 300 sec: 14176.3). Total num frames: 616624128. Throughput: 0: 3453.3. Samples: 143323256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:23,969][134211] Avg episode reward: [(0, '9.508')] [2025-01-04 08:37:24,295][134294] Updated weights for policy 0, policy_version 150544 (0.0025) [2025-01-04 08:37:27,381][134294] Updated weights for policy 0, policy_version 150554 (0.0024) [2025-01-04 08:37:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 14162.5). Total num frames: 616685568. Throughput: 0: 3446.8. Samples: 143343040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:28,968][134211] Avg episode reward: [(0, '9.225')] [2025-01-04 08:37:30,360][134294] Updated weights for policy 0, policy_version 150564 (0.0021) [2025-01-04 08:37:32,320][134294] Updated weights for policy 0, policy_version 150574 (0.0013) [2025-01-04 08:37:33,967][134211] Fps is (10 sec: 15975.4, 60 sec: 14131.2, 300 sec: 14287.4). Total num frames: 616783872. Throughput: 0: 3518.0. Samples: 143355094. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:37:33,968][134211] Avg episode reward: [(0, '9.076')] [2025-01-04 08:37:34,340][134294] Updated weights for policy 0, policy_version 150584 (0.0011) [2025-01-04 08:37:36,300][134294] Updated weights for policy 0, policy_version 150594 (0.0013) [2025-01-04 08:37:38,707][134294] Updated weights for policy 0, policy_version 150604 (0.0018) [2025-01-04 08:37:38,968][134211] Fps is (10 sec: 18841.4, 60 sec: 14540.8, 300 sec: 14273.5). Total num frames: 616873984. Throughput: 0: 3803.2. Samples: 143385830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:37:38,968][134211] Avg episode reward: [(0, '9.322')] [2025-01-04 08:37:42,117][134294] Updated weights for policy 0, policy_version 150614 (0.0031) [2025-01-04 08:37:43,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14472.5, 300 sec: 14120.8). Total num frames: 616935424. Throughput: 0: 3687.5. Samples: 143405076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:37:43,969][134211] Avg episode reward: [(0, '9.380')] [2025-01-04 08:37:45,399][134294] Updated weights for policy 0, policy_version 150624 (0.0027) [2025-01-04 08:37:48,537][134294] Updated weights for policy 0, policy_version 150634 (0.0024) [2025-01-04 08:37:48,969][134211] Fps is (10 sec: 12696.5, 60 sec: 14608.9, 300 sec: 14079.1). Total num frames: 617000960. Throughput: 0: 3683.8. Samples: 143414966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:37:48,969][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 08:37:51,915][134294] Updated weights for policy 0, policy_version 150644 (0.0025) [2025-01-04 08:37:53,969][134211] Fps is (10 sec: 12696.5, 60 sec: 14608.8, 300 sec: 14065.2). Total num frames: 617062400. Throughput: 0: 3684.3. Samples: 143433640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:37:53,971][134211] Avg episode reward: [(0, '9.149')] [2025-01-04 08:37:55,158][134294] Updated weights for policy 0, policy_version 150654 (0.0028) [2025-01-04 08:37:58,177][134294] Updated weights for policy 0, policy_version 150664 (0.0027) [2025-01-04 08:37:58,968][134211] Fps is (10 sec: 12289.1, 60 sec: 14199.5, 300 sec: 14079.2). Total num frames: 617123840. Throughput: 0: 3590.3. Samples: 143452900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:37:58,968][134211] Avg episode reward: [(0, '8.537')] [2025-01-04 08:38:01,436][134294] Updated weights for policy 0, policy_version 150674 (0.0023) [2025-01-04 08:38:03,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14199.4, 300 sec: 14079.1). Total num frames: 617189376. Throughput: 0: 3551.9. Samples: 143462544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:03,969][134211] Avg episode reward: [(0, '9.604')] [2025-01-04 08:38:04,849][134294] Updated weights for policy 0, policy_version 150684 (0.0027) [2025-01-04 08:38:07,988][134294] Updated weights for policy 0, policy_version 150694 (0.0025) [2025-01-04 08:38:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.8, 300 sec: 14079.1). Total num frames: 617254912. Throughput: 0: 3511.7. Samples: 143481280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:08,969][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 08:38:11,023][134294] Updated weights for policy 0, policy_version 150704 (0.0022) [2025-01-04 08:38:13,970][134211] Fps is (10 sec: 13104.8, 60 sec: 13925.9, 300 sec: 14079.0). Total num frames: 617320448. Throughput: 0: 3518.5. Samples: 143501380. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:13,970][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 08:38:14,053][134294] Updated weights for policy 0, policy_version 150714 (0.0026) [2025-01-04 08:38:17,323][134294] Updated weights for policy 0, policy_version 150724 (0.0025) [2025-01-04 08:38:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13858.2, 300 sec: 14065.3). Total num frames: 617385984. Throughput: 0: 3464.1. Samples: 143510980. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:18,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 08:38:19,875][134294] Updated weights for policy 0, policy_version 150734 (0.0019) [2025-01-04 08:38:21,787][134294] Updated weights for policy 0, policy_version 150744 (0.0013) [2025-01-04 08:38:23,795][134294] Updated weights for policy 0, policy_version 150754 (0.0015) [2025-01-04 08:38:23,968][134211] Fps is (10 sec: 16797.0, 60 sec: 14404.4, 300 sec: 14190.2). Total num frames: 617488384. Throughput: 0: 3367.6. Samples: 143537370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:23,968][134211] Avg episode reward: [(0, '8.768')] [2025-01-04 08:38:26,707][134294] Updated weights for policy 0, policy_version 150764 (0.0024) [2025-01-04 08:38:28,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14540.8, 300 sec: 14204.1). Total num frames: 617558016. Throughput: 0: 3447.9. Samples: 143560232. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:28,968][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 08:38:29,935][134294] Updated weights for policy 0, policy_version 150774 (0.0030) [2025-01-04 08:38:33,189][134294] Updated weights for policy 0, policy_version 150784 (0.0026) [2025-01-04 08:38:33,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13926.3, 300 sec: 14190.2). Total num frames: 617619456. Throughput: 0: 3436.9. Samples: 143569626. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:33,969][134211] Avg episode reward: [(0, '8.789')] [2025-01-04 08:38:36,463][134294] Updated weights for policy 0, policy_version 150794 (0.0026) [2025-01-04 08:38:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13448.5, 300 sec: 14176.3). Total num frames: 617680896. Throughput: 0: 3437.5. Samples: 143588326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 08:38:38,968][134211] Avg episode reward: [(0, '8.270')] [2025-01-04 08:38:39,723][134294] Updated weights for policy 0, policy_version 150804 (0.0024) [2025-01-04 08:38:42,805][134294] Updated weights for policy 0, policy_version 150814 (0.0022) [2025-01-04 08:38:43,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13516.8, 300 sec: 14162.4). Total num frames: 617746432. Throughput: 0: 3446.3. Samples: 143607984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:38:43,968][134211] Avg episode reward: [(0, '9.971')] [2025-01-04 08:38:46,006][134294] Updated weights for policy 0, policy_version 150824 (0.0028) [2025-01-04 08:38:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13448.7, 300 sec: 14134.7). Total num frames: 617807872. Throughput: 0: 3444.7. Samples: 143617554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:38:48,968][134211] Avg episode reward: [(0, '8.801')] [2025-01-04 08:38:49,570][134294] Updated weights for policy 0, policy_version 150834 (0.0028) [2025-01-04 08:38:51,989][134294] Updated weights for policy 0, policy_version 150844 (0.0015) [2025-01-04 08:38:53,948][134294] Updated weights for policy 0, policy_version 150854 (0.0014) [2025-01-04 08:38:53,967][134211] Fps is (10 sec: 15155.6, 60 sec: 13926.7, 300 sec: 14176.3). Total num frames: 617897984. Throughput: 0: 3489.2. Samples: 143638294. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:38:53,968][134211] Avg episode reward: [(0, '7.804')] [2025-01-04 08:38:55,892][134294] Updated weights for policy 0, policy_version 150864 (0.0013) [2025-01-04 08:38:57,710][134294] Updated weights for policy 0, policy_version 150874 (0.0013) [2025-01-04 08:38:58,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14540.8, 300 sec: 14162.4). Total num frames: 617996288. Throughput: 0: 3752.3. Samples: 143670224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:38:58,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 08:39:00,680][134294] Updated weights for policy 0, policy_version 150884 (0.0025) [2025-01-04 08:39:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14472.5, 300 sec: 14148.6). Total num frames: 618057728. Throughput: 0: 3754.5. Samples: 143679932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:03,969][134211] Avg episode reward: [(0, '8.231')] [2025-01-04 08:39:04,172][134294] Updated weights for policy 0, policy_version 150894 (0.0026) [2025-01-04 08:39:07,589][134294] Updated weights for policy 0, policy_version 150904 (0.0026) [2025-01-04 08:39:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.3, 300 sec: 14134.7). Total num frames: 618119168. Throughput: 0: 3566.9. Samples: 143697880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:08,968][134211] Avg episode reward: [(0, '8.238')] [2025-01-04 08:39:10,590][134294] Updated weights for policy 0, policy_version 150914 (0.0024) [2025-01-04 08:39:13,593][134294] Updated weights for policy 0, policy_version 150924 (0.0025) [2025-01-04 08:39:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14473.0, 300 sec: 14148.5). Total num frames: 618188800. Throughput: 0: 3508.0. Samples: 143718094. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:13,968][134211] Avg episode reward: [(0, '8.876')] [2025-01-04 08:39:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150925_618188800.pth... [2025-01-04 08:39:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150097_614797312.pth [2025-01-04 08:39:16,625][134294] Updated weights for policy 0, policy_version 150934 (0.0023) [2025-01-04 08:39:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14148.6). Total num frames: 618254336. Throughput: 0: 3520.3. Samples: 143728040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:18,968][134211] Avg episode reward: [(0, '9.756')] [2025-01-04 08:39:19,695][134294] Updated weights for policy 0, policy_version 150944 (0.0025) [2025-01-04 08:39:22,755][134294] Updated weights for policy 0, policy_version 150954 (0.0026) [2025-01-04 08:39:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13926.4, 300 sec: 14162.4). Total num frames: 618323968. Throughput: 0: 3555.2. Samples: 143748308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:23,968][134211] Avg episode reward: [(0, '8.593')] [2025-01-04 08:39:25,761][134294] Updated weights for policy 0, policy_version 150964 (0.0027) [2025-01-04 08:39:28,724][134294] Updated weights for policy 0, policy_version 150974 (0.0023) [2025-01-04 08:39:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13858.1, 300 sec: 14162.5). Total num frames: 618389504. Throughput: 0: 3572.8. Samples: 143768760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:28,968][134211] Avg episode reward: [(0, '9.379')] [2025-01-04 08:39:31,883][134294] Updated weights for policy 0, policy_version 150984 (0.0027) [2025-01-04 08:39:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.5, 300 sec: 14176.3). Total num frames: 618455040. 
Throughput: 0: 3574.7. Samples: 143778416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:33,968][134211] Avg episode reward: [(0, '9.138')] [2025-01-04 08:39:35,307][134294] Updated weights for policy 0, policy_version 150994 (0.0025) [2025-01-04 08:39:38,409][134294] Updated weights for policy 0, policy_version 151004 (0.0025) [2025-01-04 08:39:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13926.4, 300 sec: 14162.4). Total num frames: 618516480. Throughput: 0: 3529.4. Samples: 143797118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:39:38,968][134211] Avg episode reward: [(0, '9.071')] [2025-01-04 08:39:41,467][134294] Updated weights for policy 0, policy_version 151014 (0.0024) [2025-01-04 08:39:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 618586112. Throughput: 0: 3267.1. Samples: 143817246. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:39:43,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 08:39:44,450][134294] Updated weights for policy 0, policy_version 151024 (0.0024) [2025-01-04 08:39:47,606][134294] Updated weights for policy 0, policy_version 151034 (0.0023) [2025-01-04 08:39:48,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14267.8, 300 sec: 13981.9). Total num frames: 618663936. Throughput: 0: 3275.1. Samples: 143827312. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:39:48,968][134211] Avg episode reward: [(0, '8.468')] [2025-01-04 08:39:49,471][134294] Updated weights for policy 0, policy_version 151044 (0.0012) [2025-01-04 08:39:51,389][134294] Updated weights for policy 0, policy_version 151054 (0.0013) [2025-01-04 08:39:53,244][134294] Updated weights for policy 0, policy_version 151064 (0.0013) [2025-01-04 08:39:53,967][134211] Fps is (10 sec: 18432.5, 60 sec: 14540.8, 300 sec: 14106.9). Total num frames: 618770432. Throughput: 0: 3533.5. Samples: 143856886. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:39:53,968][134211] Avg episode reward: [(0, '8.102')] [2025-01-04 08:39:55,594][134294] Updated weights for policy 0, policy_version 151074 (0.0019) [2025-01-04 08:39:58,737][134294] Updated weights for policy 0, policy_version 151084 (0.0030) [2025-01-04 08:39:58,968][134211] Fps is (10 sec: 17612.5, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 618840064. Throughput: 0: 3631.5. Samples: 143881512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:39:58,968][134211] Avg episode reward: [(0, '8.591')] [2025-01-04 08:40:02,355][134294] Updated weights for policy 0, policy_version 151094 (0.0029) [2025-01-04 08:40:03,968][134211] Fps is (10 sec: 12696.5, 60 sec: 13994.5, 300 sec: 14134.6). Total num frames: 618897408. Throughput: 0: 3608.9. Samples: 143890442. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:03,969][134211] Avg episode reward: [(0, '9.129')] [2025-01-04 08:40:05,688][134294] Updated weights for policy 0, policy_version 151104 (0.0025) [2025-01-04 08:40:08,889][134294] Updated weights for policy 0, policy_version 151114 (0.0025) [2025-01-04 08:40:08,971][134211] Fps is (10 sec: 12284.3, 60 sec: 14062.2, 300 sec: 14134.5). Total num frames: 618962944. Throughput: 0: 3559.3. Samples: 143908490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:08,971][134211] Avg episode reward: [(0, '9.194')] [2025-01-04 08:40:11,924][134294] Updated weights for policy 0, policy_version 151124 (0.0027) [2025-01-04 08:40:13,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13994.6, 300 sec: 14120.8). Total num frames: 619028480. Throughput: 0: 3536.1. Samples: 143927884. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:13,968][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 08:40:15,106][134294] Updated weights for policy 0, policy_version 151134 (0.0028) [2025-01-04 08:40:18,216][134294] Updated weights for policy 0, policy_version 151144 (0.0025) [2025-01-04 08:40:18,968][134211] Fps is (10 sec: 13111.4, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 619094016. Throughput: 0: 3541.0. Samples: 143937760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:18,968][134211] Avg episode reward: [(0, '8.772')] [2025-01-04 08:40:21,259][134294] Updated weights for policy 0, policy_version 151154 (0.0024) [2025-01-04 08:40:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 14106.9). Total num frames: 619159552. Throughput: 0: 3575.3. Samples: 143958006. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:23,968][134211] Avg episode reward: [(0, '9.770')] [2025-01-04 08:40:24,356][134294] Updated weights for policy 0, policy_version 151164 (0.0025) [2025-01-04 08:40:27,278][134294] Updated weights for policy 0, policy_version 151174 (0.0023) [2025-01-04 08:40:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 619229184. Throughput: 0: 3579.2. Samples: 143978310. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:28,968][134211] Avg episode reward: [(0, '8.489')] [2025-01-04 08:40:30,434][134294] Updated weights for policy 0, policy_version 151184 (0.0026) [2025-01-04 08:40:33,475][134294] Updated weights for policy 0, policy_version 151194 (0.0025) [2025-01-04 08:40:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.6, 300 sec: 14134.7). Total num frames: 619294720. Throughput: 0: 3577.0. Samples: 143988278. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:33,968][134211] Avg episode reward: [(0, '8.282')] [2025-01-04 08:40:36,651][134294] Updated weights for policy 0, policy_version 151204 (0.0023) [2025-01-04 08:40:38,796][134294] Updated weights for policy 0, policy_version 151214 (0.0013) [2025-01-04 08:40:38,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14336.0, 300 sec: 14176.3). Total num frames: 619376640. Throughput: 0: 3365.9. Samples: 144008350. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:38,968][134211] Avg episode reward: [(0, '9.932')] [2025-01-04 08:40:40,893][134294] Updated weights for policy 0, policy_version 151224 (0.0017) [2025-01-04 08:40:43,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 619450368. Throughput: 0: 3388.8. Samples: 144034008. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 08:40:43,968][134211] Avg episode reward: [(0, '8.168')] [2025-01-04 08:40:44,172][134294] Updated weights for policy 0, policy_version 151234 (0.0026) [2025-01-04 08:40:47,453][134294] Updated weights for policy 0, policy_version 151244 (0.0029) [2025-01-04 08:40:48,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14062.9, 300 sec: 14162.4). Total num frames: 619507712. Throughput: 0: 3393.3. Samples: 144043140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:40:48,968][134211] Avg episode reward: [(0, '8.728')] [2025-01-04 08:40:50,626][134294] Updated weights for policy 0, policy_version 151254 (0.0021) [2025-01-04 08:40:52,643][134294] Updated weights for policy 0, policy_version 151264 (0.0014) [2025-01-04 08:40:53,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13858.1, 300 sec: 14190.2). Total num frames: 619601920. Throughput: 0: 3482.5. Samples: 144065192. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:40:53,968][134211] Avg episode reward: [(0, '10.214')] [2025-01-04 08:40:54,667][134294] Updated weights for policy 0, policy_version 151274 (0.0014) [2025-01-04 08:40:56,491][134294] Updated weights for policy 0, policy_version 151284 (0.0013) [2025-01-04 08:40:58,373][134294] Updated weights for policy 0, policy_version 151294 (0.0013) [2025-01-04 08:40:58,968][134211] Fps is (10 sec: 20480.2, 60 sec: 14540.8, 300 sec: 14343.0). Total num frames: 619712512. Throughput: 0: 3762.2. Samples: 144097180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:40:58,968][134211] Avg episode reward: [(0, '8.664')] [2025-01-04 08:41:00,756][134294] Updated weights for policy 0, policy_version 151304 (0.0020) [2025-01-04 08:41:03,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14677.5, 300 sec: 14245.7). Total num frames: 619778048. Throughput: 0: 3819.6. Samples: 144109640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:03,968][134211] Avg episode reward: [(0, '8.636')] [2025-01-04 08:41:04,324][134294] Updated weights for policy 0, policy_version 151314 (0.0027) [2025-01-04 08:41:07,736][134294] Updated weights for policy 0, policy_version 151324 (0.0026) [2025-01-04 08:41:08,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14541.5, 300 sec: 14107.0). Total num frames: 619835392. Throughput: 0: 3761.1. Samples: 144127256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:08,968][134211] Avg episode reward: [(0, '8.983')] [2025-01-04 08:41:10,886][134294] Updated weights for policy 0, policy_version 151334 (0.0028) [2025-01-04 08:41:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14540.8, 300 sec: 14106.9). Total num frames: 619900928. Throughput: 0: 3740.4. Samples: 144146628. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:13,968][134211] Avg episode reward: [(0, '8.241')] [2025-01-04 08:41:14,022][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000151344_619905024.pth... [2025-01-04 08:41:14,029][134294] Updated weights for policy 0, policy_version 151344 (0.0028) [2025-01-04 08:41:14,090][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150509_616484864.pth [2025-01-04 08:41:17,133][134294] Updated weights for policy 0, policy_version 151354 (0.0025) [2025-01-04 08:41:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14106.9). Total num frames: 619966464. Throughput: 0: 3735.1. Samples: 144156356. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:18,968][134211] Avg episode reward: [(0, '8.580')] [2025-01-04 08:41:20,234][134294] Updated weights for policy 0, policy_version 151364 (0.0022) [2025-01-04 08:41:23,110][134294] Updated weights for policy 0, policy_version 151374 (0.0025) [2025-01-04 08:41:23,969][134211] Fps is (10 sec: 13515.6, 60 sec: 14608.8, 300 sec: 14106.9). Total num frames: 620036096. Throughput: 0: 3745.2. Samples: 144176888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:23,969][134211] Avg episode reward: [(0, '8.618')] [2025-01-04 08:41:26,221][134294] Updated weights for policy 0, policy_version 151384 (0.0025) [2025-01-04 08:41:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 14120.8). Total num frames: 620101632. Throughput: 0: 3618.2. Samples: 144196826. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:28,968][134211] Avg episode reward: [(0, '8.594')] [2025-01-04 08:41:29,416][134294] Updated weights for policy 0, policy_version 151394 (0.0024) [2025-01-04 08:41:32,633][134294] Updated weights for policy 0, policy_version 151404 (0.0024) [2025-01-04 08:41:33,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14472.5, 300 sec: 14106.9). Total num frames: 620163072. Throughput: 0: 3622.7. Samples: 144206164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:33,968][134211] Avg episode reward: [(0, '8.358')] [2025-01-04 08:41:35,840][134294] Updated weights for policy 0, policy_version 151414 (0.0026) [2025-01-04 08:41:38,948][134294] Updated weights for policy 0, policy_version 151424 (0.0026) [2025-01-04 08:41:38,970][134211] Fps is (10 sec: 13104.4, 60 sec: 14267.2, 300 sec: 14120.7). Total num frames: 620232704. Throughput: 0: 3564.9. Samples: 144225622. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:38,970][134211] Avg episode reward: [(0, '8.269')] [2025-01-04 08:41:41,766][134294] Updated weights for policy 0, policy_version 151434 (0.0025) [2025-01-04 08:41:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 14162.4). Total num frames: 620302336. Throughput: 0: 3314.5. Samples: 144246334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:43,968][134211] Avg episode reward: [(0, '8.385')] [2025-01-04 08:41:44,810][134294] Updated weights for policy 0, policy_version 151444 (0.0026) [2025-01-04 08:41:47,775][134294] Updated weights for policy 0, policy_version 151454 (0.0026) [2025-01-04 08:41:48,968][134211] Fps is (10 sec: 13929.3, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 620371968. Throughput: 0: 3260.8. Samples: 144256378. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:48,968][134211] Avg episode reward: [(0, '8.294')] [2025-01-04 08:41:50,793][134294] Updated weights for policy 0, policy_version 151464 (0.0025) [2025-01-04 08:41:53,779][134294] Updated weights for policy 0, policy_version 151474 (0.0025) [2025-01-04 08:41:53,968][134211] Fps is (10 sec: 13516.1, 60 sec: 13926.3, 300 sec: 14120.8). Total num frames: 620437504. Throughput: 0: 3333.0. Samples: 144277242. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:53,969][134211] Avg episode reward: [(0, '8.317')] [2025-01-04 08:41:56,875][134294] Updated weights for policy 0, policy_version 151484 (0.0026) [2025-01-04 08:41:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13243.7, 300 sec: 14134.7). Total num frames: 620507136. Throughput: 0: 3348.0. Samples: 144297286. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:41:58,968][134211] Avg episode reward: [(0, '8.485')] [2025-01-04 08:41:59,942][134294] Updated weights for policy 0, policy_version 151494 (0.0024) [2025-01-04 08:42:03,095][134294] Updated weights for policy 0, policy_version 151504 (0.0025) [2025-01-04 08:42:03,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13243.7, 300 sec: 14148.6). Total num frames: 620572672. Throughput: 0: 3348.1. Samples: 144307022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:03,968][134211] Avg episode reward: [(0, '8.928')] [2025-01-04 08:42:05,286][134294] Updated weights for policy 0, policy_version 151514 (0.0012) [2025-01-04 08:42:08,236][134294] Updated weights for policy 0, policy_version 151524 (0.0026) [2025-01-04 08:42:08,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13585.1, 300 sec: 14120.8). Total num frames: 620650496. Throughput: 0: 3409.9. Samples: 144330330. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:08,968][134211] Avg episode reward: [(0, '8.993')] [2025-01-04 08:42:11,520][134294] Updated weights for policy 0, policy_version 151534 (0.0023) [2025-01-04 08:42:13,970][134211] Fps is (10 sec: 14742.1, 60 sec: 13652.8, 300 sec: 14120.7). Total num frames: 620720128. Throughput: 0: 3409.1. Samples: 144350244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:13,970][134211] Avg episode reward: [(0, '8.227')] [2025-01-04 08:42:14,011][134294] Updated weights for policy 0, policy_version 151544 (0.0016) [2025-01-04 08:42:16,098][134294] Updated weights for policy 0, policy_version 151554 (0.0013) [2025-01-04 08:42:17,944][134294] Updated weights for policy 0, policy_version 151564 (0.0015) [2025-01-04 08:42:18,968][134211] Fps is (10 sec: 17613.1, 60 sec: 14336.0, 300 sec: 14245.8). Total num frames: 620826624. Throughput: 0: 3533.1. Samples: 144365154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:18,968][134211] Avg episode reward: [(0, '8.659')] [2025-01-04 08:42:19,887][134294] Updated weights for policy 0, policy_version 151574 (0.0013) [2025-01-04 08:42:21,740][134294] Updated weights for policy 0, policy_version 151584 (0.0013) [2025-01-04 08:42:23,692][134294] Updated weights for policy 0, policy_version 151594 (0.0014) [2025-01-04 08:42:23,968][134211] Fps is (10 sec: 21304.1, 60 sec: 14950.6, 300 sec: 14398.5). Total num frames: 620933120. Throughput: 0: 3821.1. Samples: 144397564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:23,968][134211] Avg episode reward: [(0, '9.061')] [2025-01-04 08:42:26,616][134294] Updated weights for policy 0, policy_version 151604 (0.0024) [2025-01-04 08:42:28,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14882.1, 300 sec: 14273.5). Total num frames: 620994560. Throughput: 0: 3856.1. Samples: 144419856. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:28,968][134211] Avg episode reward: [(0, '8.084')] [2025-01-04 08:42:30,456][134294] Updated weights for policy 0, policy_version 151614 (0.0032) [2025-01-04 08:42:33,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14745.6, 300 sec: 14148.6). Total num frames: 621047808. Throughput: 0: 3809.5. Samples: 144427806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:33,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 08:42:34,143][134294] Updated weights for policy 0, policy_version 151624 (0.0029) [2025-01-04 08:42:37,645][134294] Updated weights for policy 0, policy_version 151634 (0.0026) [2025-01-04 08:42:38,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14609.6, 300 sec: 14148.6). Total num frames: 621109248. Throughput: 0: 3731.8. Samples: 144445170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:38,968][134211] Avg episode reward: [(0, '9.486')] [2025-01-04 08:42:40,732][134294] Updated weights for policy 0, policy_version 151644 (0.0025) [2025-01-04 08:42:43,710][134294] Updated weights for policy 0, policy_version 151654 (0.0022) [2025-01-04 08:42:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14540.8, 300 sec: 14148.6). Total num frames: 621174784. Throughput: 0: 3729.5. Samples: 144465114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:43,968][134211] Avg episode reward: [(0, '8.817')] [2025-01-04 08:42:46,787][134294] Updated weights for policy 0, policy_version 151664 (0.0027) [2025-01-04 08:42:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14162.5). Total num frames: 621240320. Throughput: 0: 3738.9. Samples: 144475272. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:48,968][134211] Avg episode reward: [(0, '8.692')] [2025-01-04 08:42:50,328][134294] Updated weights for policy 0, policy_version 151674 (0.0028) [2025-01-04 08:42:53,813][134294] Updated weights for policy 0, policy_version 151684 (0.0030) [2025-01-04 08:42:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14336.1, 300 sec: 14148.6). Total num frames: 621297664. Throughput: 0: 3607.2. Samples: 144492656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:53,968][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 08:42:57,059][134294] Updated weights for policy 0, policy_version 151694 (0.0025) [2025-01-04 08:42:58,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14267.7, 300 sec: 14148.6). Total num frames: 621363200. Throughput: 0: 3581.8. Samples: 144511418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:42:58,969][134211] Avg episode reward: [(0, '7.882')] [2025-01-04 08:43:00,104][134294] Updated weights for policy 0, policy_version 151704 (0.0022) [2025-01-04 08:43:03,010][134294] Updated weights for policy 0, policy_version 151714 (0.0027) [2025-01-04 08:43:03,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14267.6, 300 sec: 14148.5). Total num frames: 621428736. Throughput: 0: 3485.3. Samples: 144521994. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:03,969][134211] Avg episode reward: [(0, '9.024')] [2025-01-04 08:43:06,262][134294] Updated weights for policy 0, policy_version 151724 (0.0027) [2025-01-04 08:43:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 621494272. Throughput: 0: 3193.6. Samples: 144541276. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:08,968][134211] Avg episode reward: [(0, '9.553')] [2025-01-04 08:43:09,514][134294] Updated weights for policy 0, policy_version 151734 (0.0025) [2025-01-04 08:43:12,509][134294] Updated weights for policy 0, policy_version 151744 (0.0025) [2025-01-04 08:43:13,968][134211] Fps is (10 sec: 13107.9, 60 sec: 13995.2, 300 sec: 14148.6). Total num frames: 621559808. Throughput: 0: 3144.4. Samples: 144561352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:13,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 08:43:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000151749_621563904.pth... [2025-01-04 08:43:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000150925_618188800.pth [2025-01-04 08:43:15,547][134294] Updated weights for policy 0, policy_version 151754 (0.0026) [2025-01-04 08:43:18,447][134294] Updated weights for policy 0, policy_version 151764 (0.0024) [2025-01-04 08:43:18,971][134211] Fps is (10 sec: 13514.6, 60 sec: 13379.9, 300 sec: 14037.4). Total num frames: 621629440. Throughput: 0: 3197.0. Samples: 144571678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:18,971][134211] Avg episode reward: [(0, '8.492')] [2025-01-04 08:43:21,488][134294] Updated weights for policy 0, policy_version 151774 (0.0024) [2025-01-04 08:43:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 12765.9, 300 sec: 14037.5). Total num frames: 621699072. Throughput: 0: 3268.6. Samples: 144592256. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:23,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 08:43:24,509][134294] Updated weights for policy 0, policy_version 151784 (0.0023) [2025-01-04 08:43:27,405][134294] Updated weights for policy 0, policy_version 151794 (0.0023) [2025-01-04 08:43:28,968][134211] Fps is (10 sec: 14748.3, 60 sec: 13039.0, 300 sec: 14093.0). Total num frames: 621776896. Throughput: 0: 3318.0. Samples: 144614424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:28,968][134211] Avg episode reward: [(0, '8.388')] [2025-01-04 08:43:29,410][134294] Updated weights for policy 0, policy_version 151804 (0.0017) [2025-01-04 08:43:32,355][134294] Updated weights for policy 0, policy_version 151814 (0.0024) [2025-01-04 08:43:33,968][134211] Fps is (10 sec: 15155.4, 60 sec: 13380.3, 300 sec: 14134.7). Total num frames: 621850624. Throughput: 0: 3368.5. Samples: 144626852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:33,968][134211] Avg episode reward: [(0, '8.547')] [2025-01-04 08:43:35,654][134294] Updated weights for policy 0, policy_version 151824 (0.0028) [2025-01-04 08:43:37,903][134294] Updated weights for policy 0, policy_version 151834 (0.0014) [2025-01-04 08:43:38,967][134211] Fps is (10 sec: 15564.9, 60 sec: 13721.6, 300 sec: 14190.2). Total num frames: 621932544. Throughput: 0: 3440.2. Samples: 144647464. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:38,968][134211] Avg episode reward: [(0, '8.174')] [2025-01-04 08:43:39,961][134294] Updated weights for policy 0, policy_version 151844 (0.0013) [2025-01-04 08:43:41,825][134294] Updated weights for policy 0, policy_version 151854 (0.0013) [2025-01-04 08:43:43,749][134294] Updated weights for policy 0, policy_version 151864 (0.0013) [2025-01-04 08:43:43,968][134211] Fps is (10 sec: 18431.8, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 622034944. Throughput: 0: 3725.2. Samples: 144679050. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:43,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 08:43:46,688][134294] Updated weights for policy 0, policy_version 151874 (0.0021) [2025-01-04 08:43:48,968][134211] Fps is (10 sec: 16793.4, 60 sec: 14336.0, 300 sec: 14245.7). Total num frames: 622100480. Throughput: 0: 3743.8. Samples: 144690462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:48,968][134211] Avg episode reward: [(0, '8.752')] [2025-01-04 08:43:50,119][134294] Updated weights for policy 0, policy_version 151884 (0.0028) [2025-01-04 08:43:53,237][134294] Updated weights for policy 0, policy_version 151894 (0.0028) [2025-01-04 08:43:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.6, 300 sec: 14134.7). Total num frames: 622166016. Throughput: 0: 3732.0. Samples: 144709216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:43:53,968][134211] Avg episode reward: [(0, '9.643')] [2025-01-04 08:43:56,505][134294] Updated weights for policy 0, policy_version 151904 (0.0029) [2025-01-04 08:43:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.6, 300 sec: 14148.6). Total num frames: 622231552. Throughput: 0: 3713.8. Samples: 144728472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:43:58,968][134211] Avg episode reward: [(0, '8.169')] [2025-01-04 08:43:59,665][134294] Updated weights for policy 0, policy_version 151914 (0.0026) [2025-01-04 08:44:02,953][134294] Updated weights for policy 0, policy_version 151924 (0.0026) [2025-01-04 08:44:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14336.1, 300 sec: 14134.7). Total num frames: 622288896. Throughput: 0: 3693.3. Samples: 144737870. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:03,969][134211] Avg episode reward: [(0, '9.708')] [2025-01-04 08:44:06,159][134294] Updated weights for policy 0, policy_version 151934 (0.0026) [2025-01-04 08:44:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 622354432. Throughput: 0: 3651.7. Samples: 144756584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:08,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 08:44:09,398][134294] Updated weights for policy 0, policy_version 151944 (0.0026) [2025-01-04 08:44:12,584][134294] Updated weights for policy 0, policy_version 151954 (0.0031) [2025-01-04 08:44:13,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14335.8, 300 sec: 14120.8). Total num frames: 622419968. Throughput: 0: 3591.4. Samples: 144776040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:13,969][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 08:44:15,749][134294] Updated weights for policy 0, policy_version 151964 (0.0025) [2025-01-04 08:44:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.8, 300 sec: 14093.0). Total num frames: 622481408. Throughput: 0: 3529.5. Samples: 144785680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:18,968][134211] Avg episode reward: [(0, '8.189')] [2025-01-04 08:44:18,985][134294] Updated weights for policy 0, policy_version 151974 (0.0026) [2025-01-04 08:44:22,217][134294] Updated weights for policy 0, policy_version 151984 (0.0026) [2025-01-04 08:44:23,968][134211] Fps is (10 sec: 12288.6, 60 sec: 14062.9, 300 sec: 14079.1). Total num frames: 622542848. Throughput: 0: 3494.5. Samples: 144804716. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:23,969][134211] Avg episode reward: [(0, '9.035')] [2025-01-04 08:44:25,483][134294] Updated weights for policy 0, policy_version 151994 (0.0022) [2025-01-04 08:44:27,374][134294] Updated weights for policy 0, policy_version 152004 (0.0013) [2025-01-04 08:44:28,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 622641152. Throughput: 0: 3344.9. Samples: 144829570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:28,968][134211] Avg episode reward: [(0, '8.035')] [2025-01-04 08:44:29,293][134294] Updated weights for policy 0, policy_version 152014 (0.0013) [2025-01-04 08:44:31,305][134294] Updated weights for policy 0, policy_version 152024 (0.0013) [2025-01-04 08:44:33,398][134294] Updated weights for policy 0, policy_version 152034 (0.0017) [2025-01-04 08:44:33,968][134211] Fps is (10 sec: 19251.5, 60 sec: 14745.6, 300 sec: 14301.3). Total num frames: 622735360. Throughput: 0: 3441.0. Samples: 144845308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:33,968][134211] Avg episode reward: [(0, '10.318')] [2025-01-04 08:44:37,108][134294] Updated weights for policy 0, policy_version 152044 (0.0033) [2025-01-04 08:44:38,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14335.9, 300 sec: 14259.6). Total num frames: 622792704. Throughput: 0: 3495.5. Samples: 144866512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:38,969][134211] Avg episode reward: [(0, '8.093')] [2025-01-04 08:44:40,368][134294] Updated weights for policy 0, policy_version 152054 (0.0026) [2025-01-04 08:44:43,496][134294] Updated weights for policy 0, policy_version 152064 (0.0026) [2025-01-04 08:44:43,970][134211] Fps is (10 sec: 12285.7, 60 sec: 13721.2, 300 sec: 14217.9). Total num frames: 622858240. Throughput: 0: 3499.1. Samples: 144885936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:43,970][134211] Avg episode reward: [(0, '9.349')] [2025-01-04 08:44:46,383][134294] Updated weights for policy 0, policy_version 152074 (0.0027) [2025-01-04 08:44:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13653.3, 300 sec: 14065.2). Total num frames: 622919680. Throughput: 0: 3515.4. Samples: 144896064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:48,969][134211] Avg episode reward: [(0, '8.561')] [2025-01-04 08:44:50,178][134294] Updated weights for policy 0, policy_version 152084 (0.0029) [2025-01-04 08:44:53,583][134294] Updated weights for policy 0, policy_version 152094 (0.0026) [2025-01-04 08:44:53,968][134211] Fps is (10 sec: 11880.4, 60 sec: 13516.8, 300 sec: 14023.6). Total num frames: 622977024. Throughput: 0: 3477.2. Samples: 144913058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:53,968][134211] Avg episode reward: [(0, '8.120')] [2025-01-04 08:44:56,895][134294] Updated weights for policy 0, policy_version 152104 (0.0026) [2025-01-04 08:44:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13516.8, 300 sec: 14051.4). Total num frames: 623042560. Throughput: 0: 3452.7. Samples: 144931410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:44:58,968][134211] Avg episode reward: [(0, '8.856')] [2025-01-04 08:45:00,322][134294] Updated weights for policy 0, policy_version 152114 (0.0025) [2025-01-04 08:45:02,890][134294] Updated weights for policy 0, policy_version 152124 (0.0020) [2025-01-04 08:45:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13858.2, 300 sec: 14093.2). Total num frames: 623120384. Throughput: 0: 3446.4. Samples: 144940766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:03,968][134211] Avg episode reward: [(0, '8.014')] [2025-01-04 08:45:04,928][134294] Updated weights for policy 0, policy_version 152134 (0.0014) [2025-01-04 08:45:06,946][134294] Updated weights for policy 0, policy_version 152144 (0.0014) [2025-01-04 08:45:08,967][134211] Fps is (10 sec: 17613.3, 60 sec: 14404.3, 300 sec: 14204.1). Total num frames: 623218688. Throughput: 0: 3670.1. Samples: 144969868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:08,968][134211] Avg episode reward: [(0, '8.255')] [2025-01-04 08:45:08,974][134294] Updated weights for policy 0, policy_version 152154 (0.0014) [2025-01-04 08:45:12,203][134294] Updated weights for policy 0, policy_version 152164 (0.0027) [2025-01-04 08:45:13,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14404.4, 300 sec: 14204.1). Total num frames: 623284224. Throughput: 0: 3616.7. Samples: 144992322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:13,969][134211] Avg episode reward: [(0, '8.297')] [2025-01-04 08:45:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152169_623284224.pth... [2025-01-04 08:45:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000151344_619905024.pth [2025-01-04 08:45:15,529][134294] Updated weights for policy 0, policy_version 152174 (0.0027) [2025-01-04 08:45:18,599][134294] Updated weights for policy 0, policy_version 152184 (0.0028) [2025-01-04 08:45:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.6, 300 sec: 14204.1). Total num frames: 623349760. Throughput: 0: 3470.9. Samples: 145001498. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:18,969][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 08:45:21,691][134294] Updated weights for policy 0, policy_version 152194 (0.0026) [2025-01-04 08:45:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14540.8, 300 sec: 14190.2). Total num frames: 623415296. Throughput: 0: 3440.2. Samples: 145021322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:23,968][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 08:45:24,873][134294] Updated weights for policy 0, policy_version 152204 (0.0024) [2025-01-04 08:45:27,974][134294] Updated weights for policy 0, policy_version 152214 (0.0027) [2025-01-04 08:45:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 14176.3). Total num frames: 623476736. Throughput: 0: 3442.7. Samples: 145040852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:28,968][134211] Avg episode reward: [(0, '8.727')] [2025-01-04 08:45:31,394][134294] Updated weights for policy 0, policy_version 152224 (0.0025) [2025-01-04 08:45:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13380.3, 300 sec: 14106.9). Total num frames: 623538176. Throughput: 0: 3416.5. Samples: 145049808. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:33,968][134211] Avg episode reward: [(0, '9.032')] [2025-01-04 08:45:34,794][134294] Updated weights for policy 0, policy_version 152234 (0.0025) [2025-01-04 08:45:38,103][134294] Updated weights for policy 0, policy_version 152244 (0.0025) [2025-01-04 08:45:38,968][134211] Fps is (10 sec: 12287.5, 60 sec: 13448.5, 300 sec: 14065.2). Total num frames: 623599616. Throughput: 0: 3444.4. Samples: 145068058. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:38,969][134211] Avg episode reward: [(0, '8.335')] [2025-01-04 08:45:40,692][134294] Updated weights for policy 0, policy_version 152254 (0.0019) [2025-01-04 08:45:42,901][134294] Updated weights for policy 0, policy_version 152264 (0.0017) [2025-01-04 08:45:43,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13790.2, 300 sec: 14162.4). Total num frames: 623685632. Throughput: 0: 3582.9. Samples: 145092642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:43,968][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 08:45:46,012][134294] Updated weights for policy 0, policy_version 152274 (0.0026) [2025-01-04 08:45:48,968][134211] Fps is (10 sec: 14746.3, 60 sec: 13789.9, 300 sec: 14051.4). Total num frames: 623747072. Throughput: 0: 3585.7. Samples: 145102122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:48,968][134211] Avg episode reward: [(0, '9.219')] [2025-01-04 08:45:49,544][134294] Updated weights for policy 0, policy_version 152284 (0.0027) [2025-01-04 08:45:52,502][134294] Updated weights for policy 0, policy_version 152294 (0.0025) [2025-01-04 08:45:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 13898.6). Total num frames: 623812608. Throughput: 0: 3358.8. Samples: 145121014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:53,968][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 08:45:55,524][134294] Updated weights for policy 0, policy_version 152304 (0.0025) [2025-01-04 08:45:58,632][134294] Updated weights for policy 0, policy_version 152314 (0.0027) [2025-01-04 08:45:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 13898.6). Total num frames: 623878144. Throughput: 0: 3310.2. Samples: 145141280. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:45:58,968][134211] Avg episode reward: [(0, '8.408')] [2025-01-04 08:46:01,696][134294] Updated weights for policy 0, policy_version 152324 (0.0026) [2025-01-04 08:46:03,948][134294] Updated weights for policy 0, policy_version 152334 (0.0015) [2025-01-04 08:46:03,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13994.7, 300 sec: 13981.9). Total num frames: 623960064. Throughput: 0: 3327.0. Samples: 145151214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:46:03,968][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 08:46:06,060][134294] Updated weights for policy 0, policy_version 152344 (0.0015) [2025-01-04 08:46:08,968][134211] Fps is (10 sec: 15974.2, 60 sec: 13653.3, 300 sec: 14023.6). Total num frames: 624037888. Throughput: 0: 3471.4. Samples: 145177534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:08,968][134211] Avg episode reward: [(0, '8.485')] [2025-01-04 08:46:09,010][134294] Updated weights for policy 0, policy_version 152354 (0.0024) [2025-01-04 08:46:12,183][134294] Updated weights for policy 0, policy_version 152364 (0.0026) [2025-01-04 08:46:13,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13653.4, 300 sec: 14023.6). Total num frames: 624103424. Throughput: 0: 3476.6. Samples: 145197300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:13,968][134211] Avg episode reward: [(0, '8.008')] [2025-01-04 08:46:15,162][134294] Updated weights for policy 0, policy_version 152374 (0.0024) [2025-01-04 08:46:18,261][134294] Updated weights for policy 0, policy_version 152384 (0.0025) [2025-01-04 08:46:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 14023.6). Total num frames: 624173056. Throughput: 0: 3505.5. Samples: 145207554. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:18,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 08:46:21,255][134294] Updated weights for policy 0, policy_version 152394 (0.0025) [2025-01-04 08:46:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 14023.6). Total num frames: 624238592. Throughput: 0: 3542.9. Samples: 145227488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:23,968][134211] Avg episode reward: [(0, '9.025')] [2025-01-04 08:46:24,459][134294] Updated weights for policy 0, policy_version 152404 (0.0026) [2025-01-04 08:46:27,479][134294] Updated weights for policy 0, policy_version 152414 (0.0024) [2025-01-04 08:46:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.9, 300 sec: 14037.5). Total num frames: 624304128. Throughput: 0: 3438.2. Samples: 145247360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:28,968][134211] Avg episode reward: [(0, '9.336')] [2025-01-04 08:46:30,473][134294] Updated weights for policy 0, policy_version 152424 (0.0023) [2025-01-04 08:46:33,336][134294] Updated weights for policy 0, policy_version 152434 (0.0024) [2025-01-04 08:46:33,969][134211] Fps is (10 sec: 13515.7, 60 sec: 13926.2, 300 sec: 14037.5). Total num frames: 624373760. Throughput: 0: 3461.2. Samples: 145257878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:33,969][134211] Avg episode reward: [(0, '9.647')] [2025-01-04 08:46:36,513][134294] Updated weights for policy 0, policy_version 152444 (0.0026) [2025-01-04 08:46:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.8, 300 sec: 14023.6). Total num frames: 624439296. Throughput: 0: 3477.5. Samples: 145277502. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:38,968][134211] Avg episode reward: [(0, '9.196')] [2025-01-04 08:46:39,870][134294] Updated weights for policy 0, policy_version 152454 (0.0028) [2025-01-04 08:46:42,534][134294] Updated weights for policy 0, policy_version 152464 (0.0021) [2025-01-04 08:46:43,968][134211] Fps is (10 sec: 14747.1, 60 sec: 13926.5, 300 sec: 14065.3). Total num frames: 624521216. Throughput: 0: 3526.6. Samples: 145299976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:43,968][134211] Avg episode reward: [(0, '8.685')] [2025-01-04 08:46:44,402][134294] Updated weights for policy 0, policy_version 152474 (0.0012) [2025-01-04 08:46:46,300][134294] Updated weights for policy 0, policy_version 152484 (0.0016) [2025-01-04 08:46:48,366][134294] Updated weights for policy 0, policy_version 152494 (0.0012) [2025-01-04 08:46:48,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14609.1, 300 sec: 14190.2). Total num frames: 624623616. Throughput: 0: 3669.4. Samples: 145316336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:48,968][134211] Avg episode reward: [(0, '9.791')] [2025-01-04 08:46:51,007][134294] Updated weights for policy 0, policy_version 152504 (0.0022) [2025-01-04 08:46:53,970][134211] Fps is (10 sec: 16789.2, 60 sec: 14608.5, 300 sec: 14176.2). Total num frames: 624689152. Throughput: 0: 3623.8. Samples: 145340616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:53,971][134211] Avg episode reward: [(0, '9.052')] [2025-01-04 08:46:54,487][134294] Updated weights for policy 0, policy_version 152514 (0.0027) [2025-01-04 08:46:57,803][134294] Updated weights for policy 0, policy_version 152524 (0.0028) [2025-01-04 08:46:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14540.8, 300 sec: 14162.4). Total num frames: 624750592. Throughput: 0: 3592.3. Samples: 145358952. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:46:58,968][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 08:47:00,896][134294] Updated weights for policy 0, policy_version 152534 (0.0027) [2025-01-04 08:47:03,968][134211] Fps is (10 sec: 12700.2, 60 sec: 14267.6, 300 sec: 14120.8). Total num frames: 624816128. Throughput: 0: 3589.1. Samples: 145369064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:47:03,969][134211] Avg episode reward: [(0, '9.090')] [2025-01-04 08:47:04,019][134294] Updated weights for policy 0, policy_version 152544 (0.0026) [2025-01-04 08:47:07,211][134294] Updated weights for policy 0, policy_version 152554 (0.0025) [2025-01-04 08:47:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14062.9, 300 sec: 14107.0). Total num frames: 624881664. Throughput: 0: 3574.0. Samples: 145388316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:47:08,968][134211] Avg episode reward: [(0, '8.390')] [2025-01-04 08:47:10,396][134294] Updated weights for policy 0, policy_version 152564 (0.0030) [2025-01-04 08:47:13,371][134294] Updated weights for policy 0, policy_version 152574 (0.0027) [2025-01-04 08:47:13,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14130.8, 300 sec: 13981.9). Total num frames: 624951296. Throughput: 0: 3579.4. Samples: 145408440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:13,970][134211] Avg episode reward: [(0, '9.742')] [2025-01-04 08:47:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152576_624951296.pth... [2025-01-04 08:47:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000151749_621563904.pth [2025-01-04 08:47:16,410][134294] Updated weights for policy 0, policy_version 152584 (0.0028) [2025-01-04 08:47:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14062.9, 300 sec: 13843.1). Total num frames: 625016832. 
Throughput: 0: 3569.8. Samples: 145418516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:18,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 08:47:19,566][134294] Updated weights for policy 0, policy_version 152594 (0.0029) [2025-01-04 08:47:22,696][134294] Updated weights for policy 0, policy_version 152604 (0.0024) [2025-01-04 08:47:23,968][134211] Fps is (10 sec: 13109.3, 60 sec: 14063.0, 300 sec: 13857.0). Total num frames: 625082368. Throughput: 0: 3566.5. Samples: 145437994. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:23,968][134211] Avg episode reward: [(0, '8.804')] [2025-01-04 08:47:25,674][134294] Updated weights for policy 0, policy_version 152614 (0.0025) [2025-01-04 08:47:28,688][134294] Updated weights for policy 0, policy_version 152624 (0.0025) [2025-01-04 08:47:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.1, 300 sec: 13912.5). Total num frames: 625152000. Throughput: 0: 3522.8. Samples: 145458504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:28,968][134211] Avg episode reward: [(0, '8.636')] [2025-01-04 08:47:31,746][134294] Updated weights for policy 0, policy_version 152634 (0.0024) [2025-01-04 08:47:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.1, 300 sec: 13926.4). Total num frames: 625217536. Throughput: 0: 3382.7. Samples: 145468558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:33,968][134211] Avg episode reward: [(0, '9.043')] [2025-01-04 08:47:34,813][134294] Updated weights for policy 0, policy_version 152644 (0.0029) [2025-01-04 08:47:38,139][134294] Updated weights for policy 0, policy_version 152654 (0.0026) [2025-01-04 08:47:38,967][134211] Fps is (10 sec: 13107.7, 60 sec: 14063.0, 300 sec: 13926.4). Total num frames: 625283072. Throughput: 0: 3272.3. Samples: 145487860. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:38,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 08:47:40,119][134294] Updated weights for policy 0, policy_version 152664 (0.0012) [2025-01-04 08:47:41,989][134294] Updated weights for policy 0, policy_version 152674 (0.0013) [2025-01-04 08:47:43,872][134294] Updated weights for policy 0, policy_version 152684 (0.0013) [2025-01-04 08:47:43,967][134211] Fps is (10 sec: 17613.1, 60 sec: 14540.8, 300 sec: 14079.1). Total num frames: 625393664. Throughput: 0: 3535.1. Samples: 145518032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:43,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 08:47:45,875][134294] Updated weights for policy 0, policy_version 152694 (0.0016) [2025-01-04 08:47:48,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14131.2, 300 sec: 14148.6). Total num frames: 625471488. Throughput: 0: 3642.0. Samples: 145532952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:48,969][134211] Avg episode reward: [(0, '8.415')] [2025-01-04 08:47:48,978][134294] Updated weights for policy 0, policy_version 152704 (0.0028) [2025-01-04 08:47:52,214][134294] Updated weights for policy 0, policy_version 152714 (0.0029) [2025-01-04 08:47:53,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14063.5, 300 sec: 14134.7). Total num frames: 625532928. Throughput: 0: 3639.3. Samples: 145552086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:53,969][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 08:47:55,662][134294] Updated weights for policy 0, policy_version 152724 (0.0028) [2025-01-04 08:47:58,939][134294] Updated weights for policy 0, policy_version 152734 (0.0028) [2025-01-04 08:47:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14134.7). Total num frames: 625598464. Throughput: 0: 3601.8. Samples: 145570516. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:47:58,968][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 08:48:02,203][134294] Updated weights for policy 0, policy_version 152744 (0.0027) [2025-01-04 08:48:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14063.0, 300 sec: 14120.8). Total num frames: 625659904. Throughput: 0: 3586.4. Samples: 145579904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:48:03,968][134211] Avg episode reward: [(0, '8.897')] [2025-01-04 08:48:05,575][134294] Updated weights for policy 0, policy_version 152754 (0.0023) [2025-01-04 08:48:08,701][134294] Updated weights for policy 0, policy_version 152764 (0.0024) [2025-01-04 08:48:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14062.9, 300 sec: 14120.8). Total num frames: 625725440. Throughput: 0: 3571.8. Samples: 145598726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 08:48:08,968][134211] Avg episode reward: [(0, '8.592')] [2025-01-04 08:48:11,618][134294] Updated weights for policy 0, policy_version 152774 (0.0022) [2025-01-04 08:48:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13995.0, 300 sec: 14107.0). Total num frames: 625790976. Throughput: 0: 3564.3. Samples: 145618896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:13,968][134211] Avg episode reward: [(0, '9.546')] [2025-01-04 08:48:14,720][134294] Updated weights for policy 0, policy_version 152784 (0.0027) [2025-01-04 08:48:17,693][134294] Updated weights for policy 0, policy_version 152794 (0.0026) [2025-01-04 08:48:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 625856512. Throughput: 0: 3565.3. Samples: 145628998. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:18,968][134211] Avg episode reward: [(0, '9.266')] [2025-01-04 08:48:20,820][134294] Updated weights for policy 0, policy_version 152804 (0.0026) [2025-01-04 08:48:23,708][134294] Updated weights for policy 0, policy_version 152814 (0.0026) [2025-01-04 08:48:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14065.2). Total num frames: 625926144. Throughput: 0: 3586.8. Samples: 145649266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:23,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 08:48:26,734][134294] Updated weights for policy 0, policy_version 152824 (0.0026) [2025-01-04 08:48:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13994.7, 300 sec: 14037.5). Total num frames: 625991680. Throughput: 0: 3373.7. Samples: 145669850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:28,968][134211] Avg episode reward: [(0, '8.654')] [2025-01-04 08:48:30,012][134294] Updated weights for policy 0, policy_version 152834 (0.0023) [2025-01-04 08:48:32,519][134294] Updated weights for policy 0, policy_version 152844 (0.0017) [2025-01-04 08:48:33,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14336.0, 300 sec: 14051.4). Total num frames: 626077696. Throughput: 0: 3242.2. Samples: 145678850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:33,968][134211] Avg episode reward: [(0, '9.196')] [2025-01-04 08:48:34,481][134294] Updated weights for policy 0, policy_version 152854 (0.0013) [2025-01-04 08:48:36,482][134294] Updated weights for policy 0, policy_version 152864 (0.0013) [2025-01-04 08:48:38,964][134294] Updated weights for policy 0, policy_version 152874 (0.0020) [2025-01-04 08:48:38,968][134211] Fps is (10 sec: 18022.3, 60 sec: 14813.8, 300 sec: 14023.6). Total num frames: 626171904. Throughput: 0: 3498.0. Samples: 145709496. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:38,968][134211] Avg episode reward: [(0, '9.303')] [2025-01-04 08:48:42,198][134294] Updated weights for policy 0, policy_version 152884 (0.0027) [2025-01-04 08:48:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13994.6, 300 sec: 14009.7). Total num frames: 626233344. Throughput: 0: 3529.9. Samples: 145729360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:43,968][134211] Avg episode reward: [(0, '7.788')] [2025-01-04 08:48:45,382][134294] Updated weights for policy 0, policy_version 152894 (0.0029) [2025-01-04 08:48:48,427][134294] Updated weights for policy 0, policy_version 152904 (0.0024) [2025-01-04 08:48:48,977][134211] Fps is (10 sec: 12685.9, 60 sec: 13787.8, 300 sec: 14009.3). Total num frames: 626298880. Throughput: 0: 3542.7. Samples: 145739358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:48,978][134211] Avg episode reward: [(0, '8.820')] [2025-01-04 08:48:51,942][134294] Updated weights for policy 0, policy_version 152914 (0.0028) [2025-01-04 08:48:53,968][134211] Fps is (10 sec: 12287.5, 60 sec: 13721.5, 300 sec: 13981.9). Total num frames: 626356224. Throughput: 0: 3531.9. Samples: 145757662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:53,969][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 08:48:55,554][134294] Updated weights for policy 0, policy_version 152924 (0.0024) [2025-01-04 08:48:58,787][134294] Updated weights for policy 0, policy_version 152934 (0.0022) [2025-01-04 08:48:58,968][134211] Fps is (10 sec: 11889.5, 60 sec: 13653.4, 300 sec: 13995.8). Total num frames: 626417664. Throughput: 0: 3482.4. Samples: 145775604. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:48:58,968][134211] Avg episode reward: [(0, '8.762')] [2025-01-04 08:49:00,761][134294] Updated weights for policy 0, policy_version 152944 (0.0015) [2025-01-04 08:49:02,719][134294] Updated weights for policy 0, policy_version 152954 (0.0014) [2025-01-04 08:49:03,968][134211] Fps is (10 sec: 16794.6, 60 sec: 14404.3, 300 sec: 14134.7). Total num frames: 626524160. Throughput: 0: 3579.3. Samples: 145790068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:49:03,968][134211] Avg episode reward: [(0, '9.137')] [2025-01-04 08:49:04,783][134294] Updated weights for policy 0, policy_version 152964 (0.0013) [2025-01-04 08:49:07,019][134294] Updated weights for policy 0, policy_version 152974 (0.0017) [2025-01-04 08:49:08,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14609.1, 300 sec: 14176.4). Total num frames: 626601984. Throughput: 0: 3772.8. Samples: 145819042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:49:08,968][134211] Avg episode reward: [(0, '10.036')] [2025-01-04 08:49:10,404][134294] Updated weights for policy 0, policy_version 152984 (0.0027) [2025-01-04 08:49:13,411][134294] Updated weights for policy 0, policy_version 152994 (0.0029) [2025-01-04 08:49:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14609.1, 300 sec: 14190.2). Total num frames: 626667520. Throughput: 0: 3741.6. Samples: 145838224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:49:13,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 08:49:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152995_626667520.pth... 
[2025-01-04 08:49:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152169_623284224.pth [2025-01-04 08:49:16,627][134294] Updated weights for policy 0, policy_version 153004 (0.0026) [2025-01-04 08:49:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14204.1). Total num frames: 626733056. Throughput: 0: 3755.1. Samples: 145847828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:18,968][134211] Avg episode reward: [(0, '8.820')] [2025-01-04 08:49:19,717][134294] Updated weights for policy 0, policy_version 153014 (0.0026) [2025-01-04 08:49:22,802][134294] Updated weights for policy 0, policy_version 153024 (0.0026) [2025-01-04 08:49:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14540.8, 300 sec: 14093.0). Total num frames: 626798592. Throughput: 0: 3515.2. Samples: 145867682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:23,969][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 08:49:25,742][134294] Updated weights for policy 0, policy_version 153034 (0.0026) [2025-01-04 08:49:28,662][134294] Updated weights for policy 0, policy_version 153044 (0.0028) [2025-01-04 08:49:28,969][134211] Fps is (10 sec: 13515.6, 60 sec: 14608.8, 300 sec: 14009.7). Total num frames: 626868224. Throughput: 0: 3537.8. Samples: 145888564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:28,969][134211] Avg episode reward: [(0, '9.115')] [2025-01-04 08:49:31,814][134294] Updated weights for policy 0, policy_version 153054 (0.0024) [2025-01-04 08:49:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 626933760. Throughput: 0: 3534.6. Samples: 145898384. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:33,968][134211] Avg episode reward: [(0, '9.386')] [2025-01-04 08:49:35,263][134294] Updated weights for policy 0, policy_version 153064 (0.0026) [2025-01-04 08:49:38,504][134294] Updated weights for policy 0, policy_version 153074 (0.0025) [2025-01-04 08:49:38,968][134211] Fps is (10 sec: 12698.6, 60 sec: 13721.6, 300 sec: 14023.7). Total num frames: 626995200. Throughput: 0: 3539.9. Samples: 145916956. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:38,968][134211] Avg episode reward: [(0, '8.818')] [2025-01-04 08:49:41,518][134294] Updated weights for policy 0, policy_version 153084 (0.0027) [2025-01-04 08:49:43,969][134211] Fps is (10 sec: 12696.3, 60 sec: 13789.6, 300 sec: 14037.4). Total num frames: 627060736. Throughput: 0: 3573.7. Samples: 145936426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:43,969][134211] Avg episode reward: [(0, '8.366')] [2025-01-04 08:49:44,644][134294] Updated weights for policy 0, policy_version 153094 (0.0025) [2025-01-04 08:49:47,708][134294] Updated weights for policy 0, policy_version 153104 (0.0026) [2025-01-04 08:49:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13860.2, 300 sec: 14079.1). Total num frames: 627130368. Throughput: 0: 3476.0. Samples: 145946490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:48,968][134211] Avg episode reward: [(0, '9.594')] [2025-01-04 08:49:50,588][134294] Updated weights for policy 0, policy_version 153114 (0.0024) [2025-01-04 08:49:52,995][134294] Updated weights for policy 0, policy_version 153124 (0.0016) [2025-01-04 08:49:53,968][134211] Fps is (10 sec: 15156.9, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 627212288. Throughput: 0: 3309.5. Samples: 145967968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:53,968][134211] Avg episode reward: [(0, '10.596')] [2025-01-04 08:49:53,998][134264] Saving new best policy, reward=10.596! 
[2025-01-04 08:49:55,359][134294] Updated weights for policy 0, policy_version 153134 (0.0019) [2025-01-04 08:49:58,273][134294] Updated weights for policy 0, policy_version 153144 (0.0025) [2025-01-04 08:49:58,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14472.5, 300 sec: 14120.8). Total num frames: 627286016. Throughput: 0: 3423.8. Samples: 145992296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:49:58,968][134211] Avg episode reward: [(0, '9.452')] [2025-01-04 08:50:01,486][134294] Updated weights for policy 0, policy_version 153154 (0.0024) [2025-01-04 08:50:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13721.6, 300 sec: 13995.8). Total num frames: 627347456. Throughput: 0: 3424.7. Samples: 146001938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:50:03,968][134211] Avg episode reward: [(0, '8.583')] [2025-01-04 08:50:04,992][134294] Updated weights for policy 0, policy_version 153164 (0.0029) [2025-01-04 08:50:08,264][134294] Updated weights for policy 0, policy_version 153174 (0.0028) [2025-01-04 08:50:08,967][134211] Fps is (10 sec: 12697.9, 60 sec: 13516.9, 300 sec: 13995.8). Total num frames: 627412992. Throughput: 0: 3382.1. Samples: 146019874. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:50:08,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 08:50:10,326][134294] Updated weights for policy 0, policy_version 153184 (0.0014) [2025-01-04 08:50:12,146][134294] Updated weights for policy 0, policy_version 153194 (0.0012) [2025-01-04 08:50:13,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14199.5, 300 sec: 14134.7). Total num frames: 627519488. Throughput: 0: 3567.8. Samples: 146049110. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:50:13,968][134211] Avg episode reward: [(0, '8.419')] [2025-01-04 08:50:14,137][134294] Updated weights for policy 0, policy_version 153204 (0.0013) [2025-01-04 08:50:16,001][134294] Updated weights for policy 0, policy_version 153214 (0.0013) [2025-01-04 08:50:17,833][134294] Updated weights for policy 0, policy_version 153224 (0.0013) [2025-01-04 08:50:18,967][134211] Fps is (10 sec: 21708.9, 60 sec: 14950.4, 300 sec: 14287.4). Total num frames: 627630080. Throughput: 0: 3710.1. Samples: 146065338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:50:18,968][134211] Avg episode reward: [(0, '9.138')] [2025-01-04 08:50:19,959][134294] Updated weights for policy 0, policy_version 153234 (0.0016) [2025-01-04 08:50:23,180][134294] Updated weights for policy 0, policy_version 153244 (0.0028) [2025-01-04 08:50:23,968][134211] Fps is (10 sec: 17612.2, 60 sec: 14950.4, 300 sec: 14301.3). Total num frames: 627695616. Throughput: 0: 3891.0. Samples: 146092050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:23,969][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 08:50:26,460][134294] Updated weights for policy 0, policy_version 153254 (0.0028) [2025-01-04 08:50:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.4, 300 sec: 14315.2). Total num frames: 627761152. Throughput: 0: 3878.5. Samples: 146110952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:28,968][134211] Avg episode reward: [(0, '8.549')] [2025-01-04 08:50:29,684][134294] Updated weights for policy 0, policy_version 153264 (0.0026) [2025-01-04 08:50:33,005][134294] Updated weights for policy 0, policy_version 153274 (0.0025) [2025-01-04 08:50:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14745.6, 300 sec: 14301.3). Total num frames: 627818496. Throughput: 0: 3859.9. Samples: 146120186. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:33,968][134211] Avg episode reward: [(0, '9.780')] [2025-01-04 08:50:36,351][134294] Updated weights for policy 0, policy_version 153284 (0.0022) [2025-01-04 08:50:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14745.6, 300 sec: 14218.0). Total num frames: 627879936. Throughput: 0: 3793.9. Samples: 146138692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:38,968][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 08:50:39,622][134294] Updated weights for policy 0, policy_version 153294 (0.0027) [2025-01-04 08:50:42,611][134294] Updated weights for policy 0, policy_version 153304 (0.0025) [2025-01-04 08:50:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14814.2, 300 sec: 14245.7). Total num frames: 627949568. Throughput: 0: 3688.9. Samples: 146158298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:43,968][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 08:50:45,574][134294] Updated weights for policy 0, policy_version 153314 (0.0024) [2025-01-04 08:50:48,549][134294] Updated weights for policy 0, policy_version 153324 (0.0028) [2025-01-04 08:50:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14259.6). Total num frames: 628019200. Throughput: 0: 3709.3. Samples: 146168856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:48,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 08:50:51,811][134294] Updated weights for policy 0, policy_version 153334 (0.0026) [2025-01-04 08:50:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 628080640. Throughput: 0: 3738.5. Samples: 146188106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:53,969][134211] Avg episode reward: [(0, '8.746')] [2025-01-04 08:50:55,456][134294] Updated weights for policy 0, policy_version 153344 (0.0025) [2025-01-04 08:50:58,861][134294] Updated weights for policy 0, policy_version 153354 (0.0026) [2025-01-04 08:50:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14199.5, 300 sec: 14162.4). Total num frames: 628137984. Throughput: 0: 3480.6. Samples: 146205740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:50:58,968][134211] Avg episode reward: [(0, '10.088')] [2025-01-04 08:51:01,988][134294] Updated weights for policy 0, policy_version 153364 (0.0023) [2025-01-04 08:51:03,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14199.4, 300 sec: 14106.9). Total num frames: 628199424. Throughput: 0: 3337.0. Samples: 146215506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:51:03,969][134211] Avg episode reward: [(0, '8.763')] [2025-01-04 08:51:05,362][134294] Updated weights for policy 0, policy_version 153374 (0.0025) [2025-01-04 08:51:08,623][134294] Updated weights for policy 0, policy_version 153384 (0.0027) [2025-01-04 08:51:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14199.4, 300 sec: 14106.9). Total num frames: 628264960. Throughput: 0: 3155.0. Samples: 146234026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:51:08,968][134211] Avg episode reward: [(0, '9.106')] [2025-01-04 08:51:11,517][134294] Updated weights for policy 0, policy_version 153394 (0.0024) [2025-01-04 08:51:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13516.7, 300 sec: 14093.0). Total num frames: 628330496. Throughput: 0: 3173.7. Samples: 146253772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:51:13,970][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 08:51:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000153401_628330496.pth... 
[2025-01-04 08:51:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152576_624951296.pth [2025-01-04 08:51:14,740][134294] Updated weights for policy 0, policy_version 153404 (0.0025) [2025-01-04 08:51:17,394][134294] Updated weights for policy 0, policy_version 153414 (0.0019) [2025-01-04 08:51:18,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13038.9, 300 sec: 14148.6). Total num frames: 628412416. Throughput: 0: 3183.1. Samples: 146263426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:51:18,968][134211] Avg episode reward: [(0, '9.249')] [2025-01-04 08:51:19,659][134294] Updated weights for policy 0, policy_version 153424 (0.0018) [2025-01-04 08:51:22,574][134294] Updated weights for policy 0, policy_version 153434 (0.0026) [2025-01-04 08:51:23,968][134211] Fps is (10 sec: 15155.8, 60 sec: 13107.2, 300 sec: 14162.4). Total num frames: 628482048. Throughput: 0: 3324.6. Samples: 146288298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 08:51:23,968][134211] Avg episode reward: [(0, '9.864')] [2025-01-04 08:51:25,611][134294] Updated weights for policy 0, policy_version 153444 (0.0026) [2025-01-04 08:51:28,729][134294] Updated weights for policy 0, policy_version 153454 (0.0024) [2025-01-04 08:51:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13107.2, 300 sec: 14148.6). Total num frames: 628547584. Throughput: 0: 3331.4. Samples: 146308212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:28,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 08:51:31,955][134294] Updated weights for policy 0, policy_version 153464 (0.0028) [2025-01-04 08:51:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13175.5, 300 sec: 14134.7). Total num frames: 628609024. Throughput: 0: 3313.3. Samples: 146317956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:33,969][134211] Avg episode reward: [(0, '8.898')] [2025-01-04 08:51:35,325][134294] Updated weights for policy 0, policy_version 153474 (0.0026) [2025-01-04 08:51:37,543][134294] Updated weights for policy 0, policy_version 153484 (0.0014) [2025-01-04 08:51:38,967][134211] Fps is (10 sec: 14746.0, 60 sec: 13585.1, 300 sec: 14148.6). Total num frames: 628695040. Throughput: 0: 3353.3. Samples: 146339004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:38,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 08:51:39,573][134294] Updated weights for policy 0, policy_version 153494 (0.0014) [2025-01-04 08:51:41,472][134294] Updated weights for policy 0, policy_version 153504 (0.0013) [2025-01-04 08:51:43,344][134294] Updated weights for policy 0, policy_version 153514 (0.0013) [2025-01-04 08:51:43,968][134211] Fps is (10 sec: 19661.3, 60 sec: 14267.7, 300 sec: 14176.3). Total num frames: 628805632. Throughput: 0: 3665.5. Samples: 146370688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:43,968][134211] Avg episode reward: [(0, '8.714')] [2025-01-04 08:51:45,219][134294] Updated weights for policy 0, policy_version 153524 (0.0013) [2025-01-04 08:51:47,077][134294] Updated weights for policy 0, policy_version 153534 (0.0013) [2025-01-04 08:51:48,968][134211] Fps is (10 sec: 21708.5, 60 sec: 14882.1, 300 sec: 14315.3). Total num frames: 628912128. Throughput: 0: 3812.2. Samples: 146387054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:48,968][134211] Avg episode reward: [(0, '7.871')] [2025-01-04 08:51:49,024][134294] Updated weights for policy 0, policy_version 153544 (0.0016) [2025-01-04 08:51:52,330][134294] Updated weights for policy 0, policy_version 153554 (0.0029) [2025-01-04 08:51:53,968][134211] Fps is (10 sec: 16793.1, 60 sec: 14882.1, 300 sec: 14315.2). Total num frames: 628973568. Throughput: 0: 3956.2. Samples: 146412058. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:53,969][134211] Avg episode reward: [(0, '8.171')] [2025-01-04 08:51:55,836][134294] Updated weights for policy 0, policy_version 153564 (0.0030) [2025-01-04 08:51:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14950.4, 300 sec: 14301.3). Total num frames: 629035008. Throughput: 0: 3913.2. Samples: 146429864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:51:58,969][134211] Avg episode reward: [(0, '8.447')] [2025-01-04 08:51:59,255][134294] Updated weights for policy 0, policy_version 153574 (0.0027) [2025-01-04 08:52:02,784][134294] Updated weights for policy 0, policy_version 153584 (0.0025) [2025-01-04 08:52:03,968][134211] Fps is (10 sec: 11877.9, 60 sec: 14882.0, 300 sec: 14273.5). Total num frames: 629092352. Throughput: 0: 3895.9. Samples: 146438744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:52:03,969][134211] Avg episode reward: [(0, '8.342')] [2025-01-04 08:52:06,167][134294] Updated weights for policy 0, policy_version 153594 (0.0029) [2025-01-04 08:52:08,968][134211] Fps is (10 sec: 11878.0, 60 sec: 14813.8, 300 sec: 14245.8). Total num frames: 629153792. Throughput: 0: 3738.1. Samples: 146456516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:52:08,969][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 08:52:09,541][134294] Updated weights for policy 0, policy_version 153604 (0.0029) [2025-01-04 08:52:12,767][134294] Updated weights for policy 0, policy_version 153614 (0.0030) [2025-01-04 08:52:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14813.8, 300 sec: 14245.7). Total num frames: 629219328. Throughput: 0: 3715.1. Samples: 146475392. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:52:13,969][134211] Avg episode reward: [(0, '8.748')] [2025-01-04 08:52:15,701][134294] Updated weights for policy 0, policy_version 153624 (0.0023) [2025-01-04 08:52:18,664][134294] Updated weights for policy 0, policy_version 153634 (0.0025) [2025-01-04 08:52:18,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14609.0, 300 sec: 14259.6). Total num frames: 629288960. Throughput: 0: 3733.4. Samples: 146485960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:52:18,968][134211] Avg episode reward: [(0, '9.223')] [2025-01-04 08:52:21,577][134294] Updated weights for policy 0, policy_version 153644 (0.0024) [2025-01-04 08:52:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 629354496. Throughput: 0: 3720.5. Samples: 146506426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:52:23,968][134211] Avg episode reward: [(0, '8.019')] [2025-01-04 08:52:24,823][134294] Updated weights for policy 0, policy_version 153654 (0.0028) [2025-01-04 08:52:27,754][134294] Updated weights for policy 0, policy_version 153664 (0.0025) [2025-01-04 08:52:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 629420032. Throughput: 0: 3460.2. Samples: 146526398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:28,968][134211] Avg episode reward: [(0, '8.576')] [2025-01-04 08:52:30,854][134294] Updated weights for policy 0, policy_version 153674 (0.0026) [2025-01-04 08:52:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14245.7). Total num frames: 629485568. Throughput: 0: 3315.9. Samples: 146536272. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:33,968][134211] Avg episode reward: [(0, '8.669')] [2025-01-04 08:52:34,068][134294] Updated weights for policy 0, policy_version 153684 (0.0026) [2025-01-04 08:52:37,291][134294] Updated weights for policy 0, policy_version 153694 (0.0023) [2025-01-04 08:52:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 629551104. Throughput: 0: 3190.1. Samples: 146555612. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:38,968][134211] Avg episode reward: [(0, '8.231')] [2025-01-04 08:52:40,431][134294] Updated weights for policy 0, policy_version 153704 (0.0027) [2025-01-04 08:52:43,341][134294] Updated weights for policy 0, policy_version 153714 (0.0025) [2025-01-04 08:52:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.0, 300 sec: 14065.2). Total num frames: 629620736. Throughput: 0: 3243.9. Samples: 146575840. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:43,968][134211] Avg episode reward: [(0, '10.096')] [2025-01-04 08:52:46,262][134294] Updated weights for policy 0, policy_version 153724 (0.0028) [2025-01-04 08:52:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12902.4, 300 sec: 14079.1). Total num frames: 629686272. Throughput: 0: 3278.4. Samples: 146586272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:48,968][134211] Avg episode reward: [(0, '8.847')] [2025-01-04 08:52:49,330][134294] Updated weights for policy 0, policy_version 153734 (0.0027) [2025-01-04 08:52:52,798][134294] Updated weights for policy 0, policy_version 153744 (0.0024) [2025-01-04 08:52:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12902.4, 300 sec: 14065.2). Total num frames: 629747712. Throughput: 0: 3304.8. Samples: 146605232. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:53,968][134211] Avg episode reward: [(0, '9.125')] [2025-01-04 08:52:56,222][134294] Updated weights for policy 0, policy_version 153754 (0.0025) [2025-01-04 08:52:58,474][134294] Updated weights for policy 0, policy_version 153764 (0.0015) [2025-01-04 08:52:58,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13107.1, 300 sec: 14106.9). Total num frames: 629821440. Throughput: 0: 3356.4. Samples: 146626430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:52:58,969][134211] Avg episode reward: [(0, '9.037')] [2025-01-04 08:53:01,544][134294] Updated weights for policy 0, policy_version 153774 (0.0021) [2025-01-04 08:53:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.6, 300 sec: 14093.0). Total num frames: 629882880. Throughput: 0: 3348.4. Samples: 146636640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:03,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 08:53:04,994][134294] Updated weights for policy 0, policy_version 153784 (0.0025) [2025-01-04 08:53:07,215][134294] Updated weights for policy 0, policy_version 153794 (0.0014) [2025-01-04 08:53:08,968][134211] Fps is (10 sec: 15156.0, 60 sec: 13653.4, 300 sec: 14176.3). Total num frames: 629972992. Throughput: 0: 3375.3. Samples: 146658314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:08,968][134211] Avg episode reward: [(0, '9.560')] [2025-01-04 08:53:09,216][134294] Updated weights for policy 0, policy_version 153804 (0.0012) [2025-01-04 08:53:11,174][134294] Updated weights for policy 0, policy_version 153814 (0.0013) [2025-01-04 08:53:13,110][134294] Updated weights for policy 0, policy_version 153824 (0.0012) [2025-01-04 08:53:13,968][134211] Fps is (10 sec: 19661.4, 60 sec: 14336.2, 300 sec: 14315.2). Total num frames: 630079488. Throughput: 0: 3625.0. Samples: 146689524. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:13,968][134211] Avg episode reward: [(0, '8.229')] [2025-01-04 08:53:14,032][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000153829_630083584.pth... [2025-01-04 08:53:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000152995_626667520.pth [2025-01-04 08:53:15,009][134294] Updated weights for policy 0, policy_version 153834 (0.0014) [2025-01-04 08:53:17,544][134294] Updated weights for policy 0, policy_version 153844 (0.0023) [2025-01-04 08:53:18,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 630161408. Throughput: 0: 3738.5. Samples: 146704504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:18,969][134211] Avg episode reward: [(0, '7.643')] [2025-01-04 08:53:20,721][134294] Updated weights for policy 0, policy_version 153854 (0.0023) [2025-01-04 08:53:23,744][134294] Updated weights for policy 0, policy_version 153864 (0.0028) [2025-01-04 08:53:23,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 630226944. Throughput: 0: 3756.4. Samples: 146724652. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:23,968][134211] Avg episode reward: [(0, '9.770')] [2025-01-04 08:53:26,853][134294] Updated weights for policy 0, policy_version 153874 (0.0027) [2025-01-04 08:53:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14287.4). Total num frames: 630292480. Throughput: 0: 3744.0. Samples: 146744322. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 08:53:28,968][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 08:53:30,061][134294] Updated weights for policy 0, policy_version 153884 (0.0029) [2025-01-04 08:53:33,408][134294] Updated weights for policy 0, policy_version 153894 (0.0026) [2025-01-04 08:53:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14472.5, 300 sec: 14176.3). Total num frames: 630353920. Throughput: 0: 3724.4. Samples: 146753872. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:33,968][134211] Avg episode reward: [(0, '8.187')] [2025-01-04 08:53:36,768][134294] Updated weights for policy 0, policy_version 153904 (0.0027) [2025-01-04 08:53:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.3, 300 sec: 14176.3). Total num frames: 630415360. Throughput: 0: 3708.2. Samples: 146772102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:38,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 08:53:40,102][134294] Updated weights for policy 0, policy_version 153914 (0.0026) [2025-01-04 08:53:43,053][134294] Updated weights for policy 0, policy_version 153924 (0.0025) [2025-01-04 08:53:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14176.8). Total num frames: 630480896. Throughput: 0: 3671.8. Samples: 146791660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:43,968][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 08:53:46,058][134294] Updated weights for policy 0, policy_version 153934 (0.0026) [2025-01-04 08:53:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14218.0). Total num frames: 630550528. Throughput: 0: 3674.4. Samples: 146801990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:48,968][134211] Avg episode reward: [(0, '8.685')] [2025-01-04 08:53:49,136][134294] Updated weights for policy 0, policy_version 153944 (0.0023) [2025-01-04 08:53:52,266][134294] Updated weights for policy 0, policy_version 153954 (0.0023) [2025-01-04 08:53:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14231.9). Total num frames: 630616064. Throughput: 0: 3635.0. Samples: 146821890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:53,968][134211] Avg episode reward: [(0, '9.327')] [2025-01-04 08:53:55,257][134294] Updated weights for policy 0, policy_version 153964 (0.0025) [2025-01-04 08:53:58,226][134294] Updated weights for policy 0, policy_version 153974 (0.0024) [2025-01-04 08:53:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14106.9). Total num frames: 630685696. Throughput: 0: 3402.2. Samples: 146842624. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:53:58,968][134211] Avg episode reward: [(0, '8.430')] [2025-01-04 08:54:01,174][134294] Updated weights for policy 0, policy_version 153984 (0.0024) [2025-01-04 08:54:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.6, 300 sec: 14065.3). Total num frames: 630751232. Throughput: 0: 3292.2. Samples: 146852654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:03,968][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 08:54:04,567][134294] Updated weights for policy 0, policy_version 153994 (0.0025) [2025-01-04 08:54:07,810][134294] Updated weights for policy 0, policy_version 154004 (0.0026) [2025-01-04 08:54:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.6, 300 sec: 14051.4). Total num frames: 630812672. Throughput: 0: 3260.2. Samples: 146871360. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:08,968][134211] Avg episode reward: [(0, '10.208')] [2025-01-04 08:54:10,766][134294] Updated weights for policy 0, policy_version 154014 (0.0023) [2025-01-04 08:54:13,736][134294] Updated weights for policy 0, policy_version 154024 (0.0023) [2025-01-04 08:54:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13380.2, 300 sec: 14065.2). Total num frames: 630882304. Throughput: 0: 3282.8. Samples: 146892046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:13,968][134211] Avg episode reward: [(0, '9.154')] [2025-01-04 08:54:16,592][134294] Updated weights for policy 0, policy_version 154034 (0.0022) [2025-01-04 08:54:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13175.5, 300 sec: 14079.1). Total num frames: 630951936. Throughput: 0: 3304.0. Samples: 146902552. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:18,968][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 08:54:19,623][134294] Updated weights for policy 0, policy_version 154044 (0.0027) [2025-01-04 08:54:22,740][134294] Updated weights for policy 0, policy_version 154054 (0.0026) [2025-01-04 08:54:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13243.7, 300 sec: 14079.2). Total num frames: 631021568. Throughput: 0: 3348.7. Samples: 146922792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:23,968][134211] Avg episode reward: [(0, '9.527')] [2025-01-04 08:54:25,676][134294] Updated weights for policy 0, policy_version 154064 (0.0024) [2025-01-04 08:54:28,658][134294] Updated weights for policy 0, policy_version 154074 (0.0024) [2025-01-04 08:54:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13312.0, 300 sec: 14093.0). Total num frames: 631091200. Throughput: 0: 3371.9. Samples: 146943394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:28,968][134211] Avg episode reward: [(0, '8.476')] [2025-01-04 08:54:31,763][134294] Updated weights for policy 0, policy_version 154084 (0.0027) [2025-01-04 08:54:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13516.8, 300 sec: 14134.7). Total num frames: 631164928. Throughput: 0: 3359.3. Samples: 146953160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:54:33,968][134211] Avg episode reward: [(0, '10.082')] [2025-01-04 08:54:34,173][134294] Updated weights for policy 0, policy_version 154094 (0.0014) [2025-01-04 08:54:37,101][134294] Updated weights for policy 0, policy_version 154104 (0.0025) [2025-01-04 08:54:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13585.1, 300 sec: 14134.7). Total num frames: 631230464. Throughput: 0: 3418.0. Samples: 146975698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:54:38,968][134211] Avg episode reward: [(0, '8.920')] [2025-01-04 08:54:40,429][134294] Updated weights for policy 0, policy_version 154114 (0.0026) [2025-01-04 08:54:42,748][134294] Updated weights for policy 0, policy_version 154124 (0.0018) [2025-01-04 08:54:43,968][134211] Fps is (10 sec: 15155.6, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 631316480. Throughput: 0: 3464.6. Samples: 146998530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:54:43,968][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 08:54:44,674][134294] Updated weights for policy 0, policy_version 154134 (0.0014) [2025-01-04 08:54:46,519][134294] Updated weights for policy 0, policy_version 154144 (0.0013) [2025-01-04 08:54:48,413][134294] Updated weights for policy 0, policy_version 154154 (0.0013) [2025-01-04 08:54:48,968][134211] Fps is (10 sec: 19251.6, 60 sec: 14540.9, 300 sec: 14273.5). Total num frames: 631422976. Throughput: 0: 3602.3. Samples: 147014758. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:54:48,968][134211] Avg episode reward: [(0, '8.951')] [2025-01-04 08:54:50,356][134294] Updated weights for policy 0, policy_version 154164 (0.0014) [2025-01-04 08:54:53,347][134294] Updated weights for policy 0, policy_version 154174 (0.0022) [2025-01-04 08:54:53,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14745.6, 300 sec: 14287.4). Total num frames: 631500800. Throughput: 0: 3837.8. Samples: 147044062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:54:53,969][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 08:54:56,861][134294] Updated weights for policy 0, policy_version 154184 (0.0030) [2025-01-04 08:54:58,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14609.1, 300 sec: 14287.4). Total num frames: 631562240. Throughput: 0: 3760.5. Samples: 147061270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:54:58,968][134211] Avg episode reward: [(0, '9.721')] [2025-01-04 08:55:00,446][134294] Updated weights for policy 0, policy_version 154194 (0.0040) [2025-01-04 08:55:03,968][134211] Fps is (10 sec: 11469.0, 60 sec: 14404.3, 300 sec: 14245.7). Total num frames: 631615488. Throughput: 0: 3723.2. Samples: 147070098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:03,968][134211] Avg episode reward: [(0, '9.529')] [2025-01-04 08:55:04,157][134294] Updated weights for policy 0, policy_version 154204 (0.0027) [2025-01-04 08:55:07,576][134294] Updated weights for policy 0, policy_version 154214 (0.0027) [2025-01-04 08:55:08,968][134211] Fps is (10 sec: 11059.3, 60 sec: 14336.0, 300 sec: 14079.1). Total num frames: 631672832. Throughput: 0: 3653.9. Samples: 147087218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:08,968][134211] Avg episode reward: [(0, '9.210')] [2025-01-04 08:55:10,809][134294] Updated weights for policy 0, policy_version 154224 (0.0028) [2025-01-04 08:55:13,808][134294] Updated weights for policy 0, policy_version 154234 (0.0023) [2025-01-04 08:55:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 13940.3). Total num frames: 631742464. Throughput: 0: 3635.4. Samples: 147106988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:13,968][134211] Avg episode reward: [(0, '8.260')] [2025-01-04 08:55:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000154234_631742464.pth... [2025-01-04 08:55:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000153401_628330496.pth [2025-01-04 08:55:16,863][134294] Updated weights for policy 0, policy_version 154244 (0.0022) [2025-01-04 08:55:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 13940.3). Total num frames: 631808000. Throughput: 0: 3630.6. Samples: 147116536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:18,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 08:55:20,015][134294] Updated weights for policy 0, policy_version 154254 (0.0025) [2025-01-04 08:55:22,873][134294] Updated weights for policy 0, policy_version 154264 (0.0028) [2025-01-04 08:55:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 13954.2). Total num frames: 631877632. Throughput: 0: 3583.6. Samples: 147136960. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:23,968][134211] Avg episode reward: [(0, '8.527')] [2025-01-04 08:55:25,975][134294] Updated weights for policy 0, policy_version 154274 (0.0027) [2025-01-04 08:55:28,837][134294] Updated weights for policy 0, policy_version 154284 (0.0022) [2025-01-04 08:55:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14267.7, 300 sec: 13995.8). Total num frames: 631947264. Throughput: 0: 3535.8. Samples: 147157640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:28,968][134211] Avg episode reward: [(0, '8.999')] [2025-01-04 08:55:32,008][134294] Updated weights for policy 0, policy_version 154294 (0.0024) [2025-01-04 08:55:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14063.0, 300 sec: 13995.8). Total num frames: 632008704. Throughput: 0: 3397.9. Samples: 147167664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:33,968][134211] Avg episode reward: [(0, '8.771')] [2025-01-04 08:55:35,281][134294] Updated weights for policy 0, policy_version 154304 (0.0026) [2025-01-04 08:55:38,733][134294] Updated weights for policy 0, policy_version 154314 (0.0026) [2025-01-04 08:55:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13994.7, 300 sec: 13968.1). Total num frames: 632070144. Throughput: 0: 3159.4. Samples: 147186236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:38,968][134211] Avg episode reward: [(0, '7.918')] [2025-01-04 08:55:41,409][134294] Updated weights for policy 0, policy_version 154324 (0.0020) [2025-01-04 08:55:43,315][134294] Updated weights for policy 0, policy_version 154334 (0.0012) [2025-01-04 08:55:43,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14131.2, 300 sec: 14051.4). Total num frames: 632164352. Throughput: 0: 3315.3. Samples: 147210456. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:43,968][134211] Avg episode reward: [(0, '9.208')] [2025-01-04 08:55:45,276][134294] Updated weights for policy 0, policy_version 154344 (0.0015) [2025-01-04 08:55:48,137][134294] Updated weights for policy 0, policy_version 154354 (0.0026) [2025-01-04 08:55:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 13653.3, 300 sec: 14106.9). Total num frames: 632242176. Throughput: 0: 3444.7. Samples: 147225110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:48,969][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 08:55:51,142][134294] Updated weights for policy 0, policy_version 154364 (0.0027) [2025-01-04 08:55:53,968][134211] Fps is (10 sec: 14334.6, 60 sec: 13448.4, 300 sec: 14134.6). Total num frames: 632307712. Throughput: 0: 3507.6. Samples: 147245062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:53,969][134211] Avg episode reward: [(0, '8.556')] [2025-01-04 08:55:54,452][134294] Updated weights for policy 0, policy_version 154374 (0.0023) [2025-01-04 08:55:57,344][134294] Updated weights for policy 0, policy_version 154384 (0.0028) [2025-01-04 08:55:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 14148.6). Total num frames: 632373248. Throughput: 0: 3509.2. Samples: 147264902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:55:58,968][134211] Avg episode reward: [(0, '8.639')] [2025-01-04 08:56:00,568][134294] Updated weights for policy 0, policy_version 154394 (0.0027) [2025-01-04 08:56:03,859][134294] Updated weights for policy 0, policy_version 154404 (0.0024) [2025-01-04 08:56:03,968][134211] Fps is (10 sec: 13108.1, 60 sec: 13721.6, 300 sec: 14148.5). Total num frames: 632438784. Throughput: 0: 3513.9. Samples: 147274662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:03,968][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 08:56:07,056][134294] Updated weights for policy 0, policy_version 154414 (0.0027) [2025-01-04 08:56:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13858.1, 300 sec: 14148.6). Total num frames: 632504320. Throughput: 0: 3478.1. Samples: 147293474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:08,969][134211] Avg episode reward: [(0, '9.092')] [2025-01-04 08:56:10,192][134294] Updated weights for policy 0, policy_version 154424 (0.0024) [2025-01-04 08:56:13,194][134294] Updated weights for policy 0, policy_version 154434 (0.0024) [2025-01-04 08:56:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13789.9, 300 sec: 14093.0). Total num frames: 632569856. Throughput: 0: 3470.8. Samples: 147313828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:13,968][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 08:56:16,066][134294] Updated weights for policy 0, policy_version 154444 (0.0024) [2025-01-04 08:56:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.1, 300 sec: 14093.0). Total num frames: 632639488. Throughput: 0: 3476.7. Samples: 147324118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:18,968][134211] Avg episode reward: [(0, '9.348')] [2025-01-04 08:56:19,159][134294] Updated weights for policy 0, policy_version 154454 (0.0026) [2025-01-04 08:56:22,027][134294] Updated weights for policy 0, policy_version 154464 (0.0020) [2025-01-04 08:56:23,965][134294] Updated weights for policy 0, policy_version 154474 (0.0012) [2025-01-04 08:56:23,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14131.2, 300 sec: 14162.5). Total num frames: 632725504. Throughput: 0: 3531.3. Samples: 147345146. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:23,968][134211] Avg episode reward: [(0, '8.779')] [2025-01-04 08:56:25,921][134294] Updated weights for policy 0, policy_version 154484 (0.0014) [2025-01-04 08:56:28,784][134294] Updated weights for policy 0, policy_version 154494 (0.0024) [2025-01-04 08:56:28,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14336.0, 300 sec: 14231.9). Total num frames: 632807424. Throughput: 0: 3611.7. Samples: 147372984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:28,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 08:56:31,994][134294] Updated weights for policy 0, policy_version 154504 (0.0024) [2025-01-04 08:56:33,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14336.0, 300 sec: 14148.5). Total num frames: 632868864. Throughput: 0: 3502.6. Samples: 147382728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:33,969][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 08:56:35,392][134294] Updated weights for policy 0, policy_version 154514 (0.0029) [2025-01-04 08:56:38,650][134294] Updated weights for policy 0, policy_version 154524 (0.0025) [2025-01-04 08:56:38,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14336.0, 300 sec: 13981.9). Total num frames: 632930304. Throughput: 0: 3471.0. Samples: 147401252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:56:38,968][134211] Avg episode reward: [(0, '9.226')] [2025-01-04 08:56:41,669][134294] Updated weights for policy 0, policy_version 154534 (0.0024) [2025-01-04 08:56:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13926.4, 300 sec: 13857.0). Total num frames: 632999936. Throughput: 0: 3467.6. Samples: 147420944. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:56:43,968][134211] Avg episode reward: [(0, '8.945')] [2025-01-04 08:56:44,727][134294] Updated weights for policy 0, policy_version 154544 (0.0028) [2025-01-04 08:56:47,758][134294] Updated weights for policy 0, policy_version 154554 (0.0023) [2025-01-04 08:56:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13870.9). Total num frames: 633065472. Throughput: 0: 3473.9. Samples: 147430986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:56:48,968][134211] Avg episode reward: [(0, '8.030')] [2025-01-04 08:56:50,800][134294] Updated weights for policy 0, policy_version 154564 (0.0027) [2025-01-04 08:56:53,048][134294] Updated weights for policy 0, policy_version 154574 (0.0016) [2025-01-04 08:56:53,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14063.1, 300 sec: 13954.2). Total num frames: 633151488. Throughput: 0: 3540.0. Samples: 147452772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:56:53,968][134211] Avg episode reward: [(0, '8.527')] [2025-01-04 08:56:55,180][134294] Updated weights for policy 0, policy_version 154584 (0.0013) [2025-01-04 08:56:57,152][134294] Updated weights for policy 0, policy_version 154594 (0.0014) [2025-01-04 08:56:58,968][134211] Fps is (10 sec: 18430.7, 60 sec: 14608.9, 300 sec: 14093.0). Total num frames: 633249792. Throughput: 0: 3760.0. Samples: 147483028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:56:58,969][134211] Avg episode reward: [(0, '8.612')] [2025-01-04 08:56:59,350][134294] Updated weights for policy 0, policy_version 154604 (0.0018) [2025-01-04 08:57:02,743][134294] Updated weights for policy 0, policy_version 154614 (0.0028) [2025-01-04 08:57:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14540.8, 300 sec: 14093.0). Total num frames: 633311232. Throughput: 0: 3767.9. Samples: 147493674. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:03,969][134211] Avg episode reward: [(0, '9.302')] [2025-01-04 08:57:06,147][134294] Updated weights for policy 0, policy_version 154624 (0.0029) [2025-01-04 08:57:08,968][134211] Fps is (10 sec: 11879.1, 60 sec: 14404.3, 300 sec: 14065.3). Total num frames: 633368576. Throughput: 0: 3693.0. Samples: 147511334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:08,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 08:57:09,597][134294] Updated weights for policy 0, policy_version 154634 (0.0023) [2025-01-04 08:57:12,611][134294] Updated weights for policy 0, policy_version 154644 (0.0026) [2025-01-04 08:57:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14472.5, 300 sec: 14065.2). Total num frames: 633438208. Throughput: 0: 3494.6. Samples: 147530240. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:13,968][134211] Avg episode reward: [(0, '7.827')] [2025-01-04 08:57:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000154648_633438208.pth... [2025-01-04 08:57:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000153829_630083584.pth [2025-01-04 08:57:15,748][134294] Updated weights for policy 0, policy_version 154654 (0.0026) [2025-01-04 08:57:18,721][134294] Updated weights for policy 0, policy_version 154664 (0.0027) [2025-01-04 08:57:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14079.1). Total num frames: 633507840. Throughput: 0: 3504.9. Samples: 147540446. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:18,969][134211] Avg episode reward: [(0, '8.721')] [2025-01-04 08:57:21,851][134294] Updated weights for policy 0, policy_version 154674 (0.0026) [2025-01-04 08:57:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 14065.2). Total num frames: 633569280. 
Throughput: 0: 3538.2. Samples: 147560470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:23,968][134211] Avg episode reward: [(0, '9.046')] [2025-01-04 08:57:25,052][134294] Updated weights for policy 0, policy_version 154684 (0.0027) [2025-01-04 08:57:28,218][134294] Updated weights for policy 0, policy_version 154694 (0.0027) [2025-01-04 08:57:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 14065.2). Total num frames: 633634816. Throughput: 0: 3527.5. Samples: 147579680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:28,968][134211] Avg episode reward: [(0, '9.512')] [2025-01-04 08:57:31,369][134294] Updated weights for policy 0, policy_version 154704 (0.0024) [2025-01-04 08:57:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13789.9, 300 sec: 14051.4). Total num frames: 633696256. Throughput: 0: 3521.5. Samples: 147589454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:33,968][134211] Avg episode reward: [(0, '9.238')] [2025-01-04 08:57:34,771][134294] Updated weights for policy 0, policy_version 154714 (0.0024) [2025-01-04 08:57:38,016][134294] Updated weights for policy 0, policy_version 154724 (0.0025) [2025-01-04 08:57:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13789.8, 300 sec: 14023.6). Total num frames: 633757696. Throughput: 0: 3452.5. Samples: 147608134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:38,968][134211] Avg episode reward: [(0, '9.216')] [2025-01-04 08:57:40,762][134294] Updated weights for policy 0, policy_version 154734 (0.0022) [2025-01-04 08:57:42,836][134294] Updated weights for policy 0, policy_version 154744 (0.0015) [2025-01-04 08:57:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14131.2, 300 sec: 14106.9). Total num frames: 633847808. Throughput: 0: 3321.7. Samples: 147632504. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:57:43,968][134211] Avg episode reward: [(0, '10.059')] [2025-01-04 08:57:45,715][134294] Updated weights for policy 0, policy_version 154754 (0.0024) [2025-01-04 08:57:48,664][134294] Updated weights for policy 0, policy_version 154764 (0.0026) [2025-01-04 08:57:48,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14199.4, 300 sec: 14134.7). Total num frames: 633917440. Throughput: 0: 3316.2. Samples: 147642904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:57:48,969][134211] Avg episode reward: [(0, '8.135')] [2025-01-04 08:57:51,544][134294] Updated weights for policy 0, policy_version 154774 (0.0024) [2025-01-04 08:57:53,969][134211] Fps is (10 sec: 13515.5, 60 sec: 13857.9, 300 sec: 14106.9). Total num frames: 633982976. Throughput: 0: 3382.4. Samples: 147663544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:57:53,969][134211] Avg episode reward: [(0, '9.221')] [2025-01-04 08:57:54,880][134294] Updated weights for policy 0, policy_version 154784 (0.0028) [2025-01-04 08:57:57,973][134294] Updated weights for policy 0, policy_version 154794 (0.0024) [2025-01-04 08:57:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13380.4, 300 sec: 14134.7). Total num frames: 634052608. Throughput: 0: 3398.3. Samples: 147683164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:57:58,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 08:58:00,040][134294] Updated weights for policy 0, policy_version 154804 (0.0014) [2025-01-04 08:58:02,551][134294] Updated weights for policy 0, policy_version 154814 (0.0018) [2025-01-04 08:58:03,968][134211] Fps is (10 sec: 15156.4, 60 sec: 13721.6, 300 sec: 14106.9). Total num frames: 634134528. Throughput: 0: 3502.5. Samples: 147698058. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:03,969][134211] Avg episode reward: [(0, '9.139')] [2025-01-04 08:58:05,951][134294] Updated weights for policy 0, policy_version 154824 (0.0025) [2025-01-04 08:58:08,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13789.9, 300 sec: 13954.2). Total num frames: 634195968. Throughput: 0: 3474.9. Samples: 147716842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:08,969][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 08:58:09,196][134294] Updated weights for policy 0, policy_version 154834 (0.0027) [2025-01-04 08:58:12,156][134294] Updated weights for policy 0, policy_version 154844 (0.0023) [2025-01-04 08:58:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13721.6, 300 sec: 13898.6). Total num frames: 634261504. Throughput: 0: 3488.0. Samples: 147736640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:13,968][134211] Avg episode reward: [(0, '9.435')] [2025-01-04 08:58:15,225][134294] Updated weights for policy 0, policy_version 154854 (0.0026) [2025-01-04 08:58:18,146][134294] Updated weights for policy 0, policy_version 154864 (0.0024) [2025-01-04 08:58:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13912.5). Total num frames: 634331136. Throughput: 0: 3504.0. Samples: 147747136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:18,968][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 08:58:21,076][134294] Updated weights for policy 0, policy_version 154874 (0.0024) [2025-01-04 08:58:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13789.9, 300 sec: 13912.5). Total num frames: 634396672. Throughput: 0: 3546.9. Samples: 147767746. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:23,968][134211] Avg episode reward: [(0, '10.393')] [2025-01-04 08:58:24,287][134294] Updated weights for policy 0, policy_version 154884 (0.0025) [2025-01-04 08:58:27,324][134294] Updated weights for policy 0, policy_version 154894 (0.0023) [2025-01-04 08:58:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 13940.3). Total num frames: 634466304. Throughput: 0: 3442.3. Samples: 147787406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:28,968][134211] Avg episode reward: [(0, '8.456')] [2025-01-04 08:58:29,945][134294] Updated weights for policy 0, policy_version 154904 (0.0022) [2025-01-04 08:58:31,916][134294] Updated weights for policy 0, policy_version 154914 (0.0013) [2025-01-04 08:58:33,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14472.5, 300 sec: 14065.3). Total num frames: 634564608. Throughput: 0: 3525.0. Samples: 147801528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:33,968][134211] Avg episode reward: [(0, '9.802')] [2025-01-04 08:58:34,244][134294] Updated weights for policy 0, policy_version 154924 (0.0018) [2025-01-04 08:58:37,706][134294] Updated weights for policy 0, policy_version 154934 (0.0025) [2025-01-04 08:58:38,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14404.3, 300 sec: 14037.5). Total num frames: 634621952. Throughput: 0: 3571.7. Samples: 147824268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:38,968][134211] Avg episode reward: [(0, '8.995')] [2025-01-04 08:58:40,879][134294] Updated weights for policy 0, policy_version 154944 (0.0025) [2025-01-04 08:58:43,800][134294] Updated weights for policy 0, policy_version 154954 (0.0025) [2025-01-04 08:58:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 634691584. Throughput: 0: 3574.6. Samples: 147844020. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:43,969][134211] Avg episode reward: [(0, '8.530')] [2025-01-04 08:58:46,810][134294] Updated weights for policy 0, policy_version 154964 (0.0024) [2025-01-04 08:58:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14063.0, 300 sec: 14051.4). Total num frames: 634761216. Throughput: 0: 3468.8. Samples: 147854152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 08:58:48,968][134211] Avg episode reward: [(0, '8.827')] [2025-01-04 08:58:49,880][134294] Updated weights for policy 0, policy_version 154974 (0.0026) [2025-01-04 08:58:52,934][134294] Updated weights for policy 0, policy_version 154984 (0.0025) [2025-01-04 08:58:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13994.9, 300 sec: 14023.6). Total num frames: 634822656. Throughput: 0: 3502.3. Samples: 147874444. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:58:53,968][134211] Avg episode reward: [(0, '8.519')] [2025-01-04 08:58:56,514][134294] Updated weights for policy 0, policy_version 154994 (0.0026) [2025-01-04 08:58:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.1, 300 sec: 14009.7). Total num frames: 634884096. Throughput: 0: 3453.5. Samples: 147892046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:58:58,968][134211] Avg episode reward: [(0, '9.884')] [2025-01-04 08:58:59,870][134294] Updated weights for policy 0, policy_version 155004 (0.0023) [2025-01-04 08:59:01,990][134294] Updated weights for policy 0, policy_version 155014 (0.0014) [2025-01-04 08:59:03,967][134211] Fps is (10 sec: 15155.5, 60 sec: 13994.7, 300 sec: 14106.9). Total num frames: 634974208. Throughput: 0: 3477.8. Samples: 147903636. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:03,968][134211] Avg episode reward: [(0, '8.293')] [2025-01-04 08:59:04,043][134294] Updated weights for policy 0, policy_version 155024 (0.0015) [2025-01-04 08:59:06,075][134294] Updated weights for policy 0, policy_version 155034 (0.0013) [2025-01-04 08:59:08,780][134294] Updated weights for policy 0, policy_version 155044 (0.0021) [2025-01-04 08:59:08,968][134211] Fps is (10 sec: 17612.7, 60 sec: 14404.3, 300 sec: 14162.4). Total num frames: 635060224. Throughput: 0: 3678.3. Samples: 147933270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:08,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 08:59:12,176][134294] Updated weights for policy 0, policy_version 155054 (0.0029) [2025-01-04 08:59:13,969][134211] Fps is (10 sec: 14743.6, 60 sec: 14335.7, 300 sec: 14134.6). Total num frames: 635121664. Throughput: 0: 3647.4. Samples: 147951544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:13,970][134211] Avg episode reward: [(0, '8.911')] [2025-01-04 08:59:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155059_635121664.pth... [2025-01-04 08:59:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000154234_631742464.pth [2025-01-04 08:59:15,428][134294] Updated weights for policy 0, policy_version 155064 (0.0027) [2025-01-04 08:59:18,368][134294] Updated weights for policy 0, policy_version 155074 (0.0026) [2025-01-04 08:59:18,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14267.6, 300 sec: 14120.8). Total num frames: 635187200. Throughput: 0: 3550.8. Samples: 147961316. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:18,969][134211] Avg episode reward: [(0, '9.867')] [2025-01-04 08:59:21,466][134294] Updated weights for policy 0, policy_version 155084 (0.0024) [2025-01-04 08:59:23,968][134211] Fps is (10 sec: 13108.8, 60 sec: 14267.7, 300 sec: 14106.9). Total num frames: 635252736. Throughput: 0: 3493.9. Samples: 147981496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:23,968][134211] Avg episode reward: [(0, '8.853')] [2025-01-04 08:59:24,645][134294] Updated weights for policy 0, policy_version 155094 (0.0025) [2025-01-04 08:59:27,616][134294] Updated weights for policy 0, policy_version 155104 (0.0024) [2025-01-04 08:59:28,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14267.4, 300 sec: 14093.0). Total num frames: 635322368. Throughput: 0: 3500.3. Samples: 148001538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:28,970][134211] Avg episode reward: [(0, '8.914')] [2025-01-04 08:59:30,636][134294] Updated weights for policy 0, policy_version 155114 (0.0027) [2025-01-04 08:59:33,950][134294] Updated weights for policy 0, policy_version 155124 (0.0026) [2025-01-04 08:59:33,969][134211] Fps is (10 sec: 13515.7, 60 sec: 13721.4, 300 sec: 14093.0). Total num frames: 635387904. Throughput: 0: 3499.9. Samples: 148011652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:33,969][134211] Avg episode reward: [(0, '9.287')] [2025-01-04 08:59:37,203][134294] Updated weights for policy 0, policy_version 155134 (0.0027) [2025-01-04 08:59:38,968][134211] Fps is (10 sec: 12699.2, 60 sec: 13789.8, 300 sec: 14009.7). Total num frames: 635449344. Throughput: 0: 3464.8. Samples: 148030362. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:38,968][134211] Avg episode reward: [(0, '8.469')] [2025-01-04 08:59:40,354][134294] Updated weights for policy 0, policy_version 155144 (0.0025) [2025-01-04 08:59:43,318][134294] Updated weights for policy 0, policy_version 155154 (0.0025) [2025-01-04 08:59:43,968][134211] Fps is (10 sec: 13108.1, 60 sec: 13789.9, 300 sec: 13884.7). Total num frames: 635518976. Throughput: 0: 3524.2. Samples: 148050636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:43,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 08:59:46,196][134294] Updated weights for policy 0, policy_version 155164 (0.0025) [2025-01-04 08:59:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13789.9, 300 sec: 13857.0). Total num frames: 635588608. Throughput: 0: 3494.6. Samples: 148060892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:48,968][134211] Avg episode reward: [(0, '9.404')] [2025-01-04 08:59:49,263][134294] Updated weights for policy 0, policy_version 155174 (0.0025) [2025-01-04 08:59:52,348][134294] Updated weights for policy 0, policy_version 155184 (0.0027) [2025-01-04 08:59:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 13870.9). Total num frames: 635654144. Throughput: 0: 3287.7. Samples: 148081216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 08:59:53,969][134211] Avg episode reward: [(0, '9.490')] [2025-01-04 08:59:55,327][134294] Updated weights for policy 0, policy_version 155194 (0.0028) [2025-01-04 08:59:58,178][134294] Updated weights for policy 0, policy_version 155204 (0.0026) [2025-01-04 08:59:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 13926.4). Total num frames: 635723776. Throughput: 0: 3345.7. Samples: 148102098. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 08:59:58,968][134211] Avg episode reward: [(0, '10.527')] [2025-01-04 09:00:01,275][134294] Updated weights for policy 0, policy_version 155214 (0.0020) [2025-01-04 09:00:03,462][134294] Updated weights for policy 0, policy_version 155224 (0.0017) [2025-01-04 09:00:03,969][134211] Fps is (10 sec: 15153.1, 60 sec: 13857.7, 300 sec: 14009.6). Total num frames: 635805696. Throughput: 0: 3353.0. Samples: 148112206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:03,970][134211] Avg episode reward: [(0, '9.163')] [2025-01-04 09:00:05,561][134294] Updated weights for policy 0, policy_version 155234 (0.0014) [2025-01-04 09:00:07,559][134294] Updated weights for policy 0, policy_version 155244 (0.0012) [2025-01-04 09:00:08,968][134211] Fps is (10 sec: 17612.8, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 635899904. Throughput: 0: 3547.5. Samples: 148141134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:08,968][134211] Avg episode reward: [(0, '8.265')] [2025-01-04 09:00:10,718][134294] Updated weights for policy 0, policy_version 155254 (0.0025) [2025-01-04 09:00:13,968][134211] Fps is (10 sec: 15157.2, 60 sec: 13926.6, 300 sec: 14065.2). Total num frames: 635957248. Throughput: 0: 3542.9. Samples: 148160964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:13,968][134211] Avg episode reward: [(0, '9.649')] [2025-01-04 09:00:14,066][134294] Updated weights for policy 0, policy_version 155264 (0.0024) [2025-01-04 09:00:17,555][134294] Updated weights for policy 0, policy_version 155274 (0.0030) [2025-01-04 09:00:18,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13790.0, 300 sec: 14023.6). Total num frames: 636014592. Throughput: 0: 3514.1. Samples: 148169786. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:18,968][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 09:00:21,153][134294] Updated weights for policy 0, policy_version 155284 (0.0027) [2025-01-04 09:00:23,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13653.3, 300 sec: 13981.9). Total num frames: 636071936. Throughput: 0: 3478.7. Samples: 148186902. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:23,968][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 09:00:24,767][134294] Updated weights for policy 0, policy_version 155294 (0.0028) [2025-01-04 09:00:27,705][134294] Updated weights for policy 0, policy_version 155304 (0.0019) [2025-01-04 09:00:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13653.7, 300 sec: 14009.7). Total num frames: 636141568. Throughput: 0: 3466.0. Samples: 148206606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:28,968][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 09:00:30,633][134294] Updated weights for policy 0, policy_version 155314 (0.0023) [2025-01-04 09:00:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13585.2, 300 sec: 14009.7). Total num frames: 636203008. Throughput: 0: 3456.9. Samples: 148216452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:33,968][134211] Avg episode reward: [(0, '9.256')] [2025-01-04 09:00:33,973][134294] Updated weights for policy 0, policy_version 155324 (0.0028) [2025-01-04 09:00:37,547][134294] Updated weights for policy 0, policy_version 155334 (0.0024) [2025-01-04 09:00:38,967][134211] Fps is (10 sec: 13107.3, 60 sec: 13721.7, 300 sec: 13926.4). Total num frames: 636272640. Throughput: 0: 3396.8. Samples: 148234070. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:38,968][134211] Avg episode reward: [(0, '8.963')] [2025-01-04 09:00:39,819][134294] Updated weights for policy 0, policy_version 155344 (0.0015) [2025-01-04 09:00:42,002][134294] Updated weights for policy 0, policy_version 155354 (0.0015) [2025-01-04 09:00:43,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14063.0, 300 sec: 13968.1). Total num frames: 636362752. Throughput: 0: 3531.3. Samples: 148261004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:43,968][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 09:00:44,204][134294] Updated weights for policy 0, policy_version 155364 (0.0014) [2025-01-04 09:00:46,377][134294] Updated weights for policy 0, policy_version 155374 (0.0014) [2025-01-04 09:00:48,532][134294] Updated weights for policy 0, policy_version 155384 (0.0013) [2025-01-04 09:00:48,968][134211] Fps is (10 sec: 18431.9, 60 sec: 14472.6, 300 sec: 14065.3). Total num frames: 636456960. Throughput: 0: 3624.4. Samples: 148275296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:48,968][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 09:00:51,500][134294] Updated weights for policy 0, policy_version 155394 (0.0024) [2025-01-04 09:00:53,970][134211] Fps is (10 sec: 15561.3, 60 sec: 14403.8, 300 sec: 14051.3). Total num frames: 636518400. Throughput: 0: 3496.1. Samples: 148298466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 09:00:53,971][134211] Avg episode reward: [(0, '8.500')] [2025-01-04 09:00:55,372][134294] Updated weights for policy 0, policy_version 155404 (0.0032) [2025-01-04 09:00:58,968][134211] Fps is (10 sec: 11468.5, 60 sec: 14131.2, 300 sec: 14009.7). Total num frames: 636571648. Throughput: 0: 3409.2. Samples: 148314376. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:00:58,969][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 09:00:59,219][134294] Updated weights for policy 0, policy_version 155414 (0.0029) [2025-01-04 09:01:02,856][134294] Updated weights for policy 0, policy_version 155424 (0.0026) [2025-01-04 09:01:03,968][134211] Fps is (10 sec: 10651.7, 60 sec: 13653.6, 300 sec: 13968.1). Total num frames: 636624896. Throughput: 0: 3398.3. Samples: 148322708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:03,969][134211] Avg episode reward: [(0, '8.110')] [2025-01-04 09:01:06,516][134294] Updated weights for policy 0, policy_version 155434 (0.0028) [2025-01-04 09:01:08,968][134211] Fps is (10 sec: 11468.5, 60 sec: 13107.1, 300 sec: 13954.2). Total num frames: 636686336. Throughput: 0: 3394.2. Samples: 148339640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:08,969][134211] Avg episode reward: [(0, '9.375')] [2025-01-04 09:01:10,070][134294] Updated weights for policy 0, policy_version 155444 (0.0027) [2025-01-04 09:01:13,608][134294] Updated weights for policy 0, policy_version 155454 (0.0028) [2025-01-04 09:01:13,970][134211] Fps is (10 sec: 11466.4, 60 sec: 13038.5, 300 sec: 13898.5). Total num frames: 636739584. Throughput: 0: 3339.9. Samples: 148356908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:13,971][134211] Avg episode reward: [(0, '8.417')] [2025-01-04 09:01:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155455_636743680.pth... [2025-01-04 09:01:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000154648_633438208.pth [2025-01-04 09:01:17,139][134294] Updated weights for policy 0, policy_version 155464 (0.0025) [2025-01-04 09:01:18,968][134211] Fps is (10 sec: 11469.2, 60 sec: 13107.2, 300 sec: 13815.3). Total num frames: 636801024. 
Throughput: 0: 3314.2. Samples: 148365590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:18,968][134211] Avg episode reward: [(0, '8.357')] [2025-01-04 09:01:20,640][134294] Updated weights for policy 0, policy_version 155474 (0.0029) [2025-01-04 09:01:23,968][134211] Fps is (10 sec: 11880.9, 60 sec: 13107.2, 300 sec: 13732.0). Total num frames: 636858368. Throughput: 0: 3318.7. Samples: 148383414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:23,969][134211] Avg episode reward: [(0, '8.680')] [2025-01-04 09:01:24,056][134294] Updated weights for policy 0, policy_version 155484 (0.0024) [2025-01-04 09:01:27,601][134294] Updated weights for policy 0, policy_version 155494 (0.0025) [2025-01-04 09:01:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12970.7, 300 sec: 13732.0). Total num frames: 636919808. Throughput: 0: 3107.6. Samples: 148400846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:28,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 09:01:30,950][134294] Updated weights for policy 0, policy_version 155504 (0.0024) [2025-01-04 09:01:33,118][134294] Updated weights for policy 0, policy_version 155514 (0.0013) [2025-01-04 09:01:33,967][134211] Fps is (10 sec: 13926.8, 60 sec: 13243.8, 300 sec: 13787.6). Total num frames: 636997632. Throughput: 0: 3000.7. Samples: 148410328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:33,968][134211] Avg episode reward: [(0, '8.893')] [2025-01-04 09:01:35,332][134294] Updated weights for policy 0, policy_version 155524 (0.0013) [2025-01-04 09:01:37,527][134294] Updated weights for policy 0, policy_version 155534 (0.0012) [2025-01-04 09:01:38,967][134211] Fps is (10 sec: 17203.5, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 637091840. Throughput: 0: 3113.0. Samples: 148438544. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:38,968][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 09:01:39,727][134294] Updated weights for policy 0, policy_version 155544 (0.0013) [2025-01-04 09:01:42,305][134294] Updated weights for policy 0, policy_version 155554 (0.0018) [2025-01-04 09:01:43,968][134211] Fps is (10 sec: 16793.0, 60 sec: 13380.2, 300 sec: 13898.6). Total num frames: 637165568. Throughput: 0: 3296.2. Samples: 148462704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:43,969][134211] Avg episode reward: [(0, '8.618')] [2025-01-04 09:01:46,152][134294] Updated weights for policy 0, policy_version 155564 (0.0028) [2025-01-04 09:01:48,968][134211] Fps is (10 sec: 12697.2, 60 sec: 12697.5, 300 sec: 13787.5). Total num frames: 637218816. Throughput: 0: 3291.6. Samples: 148470830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:48,971][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 09:01:50,023][134294] Updated weights for policy 0, policy_version 155574 (0.0028) [2025-01-04 09:01:53,656][134294] Updated weights for policy 0, policy_version 155584 (0.0026) [2025-01-04 09:01:53,969][134211] Fps is (10 sec: 10648.5, 60 sec: 12561.3, 300 sec: 13634.8). Total num frames: 637272064. Throughput: 0: 3278.0. Samples: 148487154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:53,970][134211] Avg episode reward: [(0, '9.305')] [2025-01-04 09:01:57,184][134294] Updated weights for policy 0, policy_version 155594 (0.0027) [2025-01-04 09:01:58,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12629.4, 300 sec: 13620.9). Total num frames: 637329408. Throughput: 0: 3271.4. Samples: 148504112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:01:58,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 09:02:00,728][134294] Updated weights for policy 0, policy_version 155604 (0.0026) [2025-01-04 09:02:03,968][134211] Fps is (10 sec: 11879.7, 60 sec: 12765.9, 300 sec: 13634.8). Total num frames: 637390848. Throughput: 0: 3275.8. Samples: 148513002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:03,969][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 09:02:04,277][134294] Updated weights for policy 0, policy_version 155614 (0.0028) [2025-01-04 09:02:07,672][134294] Updated weights for policy 0, policy_version 155624 (0.0029) [2025-01-04 09:02:08,968][134211] Fps is (10 sec: 11468.6, 60 sec: 12629.4, 300 sec: 13579.3). Total num frames: 637444096. Throughput: 0: 3267.5. Samples: 148530450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:08,969][134211] Avg episode reward: [(0, '9.380')] [2025-01-04 09:02:11,592][134294] Updated weights for policy 0, policy_version 155634 (0.0026) [2025-01-04 09:02:13,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12698.1, 300 sec: 13537.6). Total num frames: 637501440. Throughput: 0: 3231.1. Samples: 148546244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:13,968][134211] Avg episode reward: [(0, '9.102')] [2025-01-04 09:02:14,870][134294] Updated weights for policy 0, policy_version 155644 (0.0019) [2025-01-04 09:02:17,046][134294] Updated weights for policy 0, policy_version 155654 (0.0015) [2025-01-04 09:02:18,968][134211] Fps is (10 sec: 14746.0, 60 sec: 13175.5, 300 sec: 13634.8). Total num frames: 637591552. Throughput: 0: 3299.8. Samples: 148558820. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:18,968][134211] Avg episode reward: [(0, '8.844')] [2025-01-04 09:02:19,205][134294] Updated weights for policy 0, policy_version 155664 (0.0013) [2025-01-04 09:02:22,508][134294] Updated weights for policy 0, policy_version 155674 (0.0026) [2025-01-04 09:02:23,968][134211] Fps is (10 sec: 15564.4, 60 sec: 13312.0, 300 sec: 13634.8). Total num frames: 637657088. Throughput: 0: 3201.7. Samples: 148582620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:23,969][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 09:02:26,057][134294] Updated weights for policy 0, policy_version 155684 (0.0028) [2025-01-04 09:02:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13175.5, 300 sec: 13607.1). Total num frames: 637710336. Throughput: 0: 3036.0. Samples: 148599322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:28,968][134211] Avg episode reward: [(0, '9.214')] [2025-01-04 09:02:29,753][134294] Updated weights for policy 0, policy_version 155694 (0.0025) [2025-01-04 09:02:32,840][134294] Updated weights for policy 0, policy_version 155704 (0.0027) [2025-01-04 09:02:33,968][134211] Fps is (10 sec: 11878.7, 60 sec: 12970.6, 300 sec: 13620.9). Total num frames: 637775872. Throughput: 0: 3063.7. Samples: 148608698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:33,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 09:02:35,879][134294] Updated weights for policy 0, policy_version 155714 (0.0025) [2025-01-04 09:02:38,836][134294] Updated weights for policy 0, policy_version 155724 (0.0025) [2025-01-04 09:02:38,968][134211] Fps is (10 sec: 13516.3, 60 sec: 12561.0, 300 sec: 13551.5). Total num frames: 637845504. Throughput: 0: 3151.3. Samples: 148628960. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:38,969][134211] Avg episode reward: [(0, '9.390')] [2025-01-04 09:02:42,052][134294] Updated weights for policy 0, policy_version 155734 (0.0025) [2025-01-04 09:02:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12356.3, 300 sec: 13523.7). Total num frames: 637906944. Throughput: 0: 3199.4. Samples: 148648086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:43,968][134211] Avg episode reward: [(0, '8.880')] [2025-01-04 09:02:45,416][134294] Updated weights for policy 0, policy_version 155744 (0.0028) [2025-01-04 09:02:48,389][134294] Updated weights for policy 0, policy_version 155754 (0.0023) [2025-01-04 09:02:48,968][134211] Fps is (10 sec: 12698.1, 60 sec: 12561.1, 300 sec: 13523.8). Total num frames: 637972480. Throughput: 0: 3214.6. Samples: 148657660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:48,968][134211] Avg episode reward: [(0, '8.970')] [2025-01-04 09:02:51,433][134294] Updated weights for policy 0, policy_version 155764 (0.0023) [2025-01-04 09:02:53,968][134211] Fps is (10 sec: 13106.4, 60 sec: 12766.0, 300 sec: 13509.8). Total num frames: 638038016. Throughput: 0: 3271.8. Samples: 148677682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:53,969][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 09:02:54,813][134294] Updated weights for policy 0, policy_version 155774 (0.0028) [2025-01-04 09:02:57,891][134294] Updated weights for policy 0, policy_version 155784 (0.0026) [2025-01-04 09:02:58,969][134211] Fps is (10 sec: 13106.6, 60 sec: 12902.3, 300 sec: 13454.3). Total num frames: 638103552. Throughput: 0: 3351.3. Samples: 148697052. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:02:58,970][134211] Avg episode reward: [(0, '9.111')] [2025-01-04 09:03:00,563][134294] Updated weights for policy 0, policy_version 155794 (0.0021) [2025-01-04 09:03:02,489][134294] Updated weights for policy 0, policy_version 155804 (0.0013) [2025-01-04 09:03:03,968][134211] Fps is (10 sec: 16794.4, 60 sec: 13585.1, 300 sec: 13593.2). Total num frames: 638205952. Throughput: 0: 3353.3. Samples: 148709718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:03:03,968][134211] Avg episode reward: [(0, '7.925')] [2025-01-04 09:03:04,375][134294] Updated weights for policy 0, policy_version 155814 (0.0014) [2025-01-04 09:03:06,326][134294] Updated weights for policy 0, policy_version 155824 (0.0013) [2025-01-04 09:03:08,354][134294] Updated weights for policy 0, policy_version 155834 (0.0013) [2025-01-04 09:03:08,968][134211] Fps is (10 sec: 20071.5, 60 sec: 14336.1, 300 sec: 13704.3). Total num frames: 638304256. Throughput: 0: 3531.9. Samples: 148741556. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:08,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 09:03:11,389][134294] Updated weights for policy 0, policy_version 155844 (0.0027) [2025-01-04 09:03:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14336.0, 300 sec: 13662.6). Total num frames: 638361600. Throughput: 0: 3614.8. Samples: 148761990. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:13,968][134211] Avg episode reward: [(0, '9.625')] [2025-01-04 09:03:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155850_638361600.pth... 
[2025-01-04 09:03:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155059_635121664.pth [2025-01-04 09:03:15,358][134294] Updated weights for policy 0, policy_version 155854 (0.0028) [2025-01-04 09:03:18,508][134294] Updated weights for policy 0, policy_version 155864 (0.0029) [2025-01-04 09:03:18,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13858.1, 300 sec: 13648.7). Total num frames: 638423040. Throughput: 0: 3589.1. Samples: 148770206. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:18,968][134211] Avg episode reward: [(0, '9.181')] [2025-01-04 09:03:21,567][134294] Updated weights for policy 0, policy_version 155874 (0.0026) [2025-01-04 09:03:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.2, 300 sec: 13634.8). Total num frames: 638488576. Throughput: 0: 3576.8. Samples: 148789914. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:23,968][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 09:03:24,790][134294] Updated weights for policy 0, policy_version 155884 (0.0025) [2025-01-04 09:03:27,857][134294] Updated weights for policy 0, policy_version 155894 (0.0027) [2025-01-04 09:03:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14062.9, 300 sec: 13523.7). Total num frames: 638554112. Throughput: 0: 3594.0. Samples: 148809816. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:28,968][134211] Avg episode reward: [(0, '8.424')] [2025-01-04 09:03:30,843][134294] Updated weights for policy 0, policy_version 155904 (0.0025) [2025-01-04 09:03:33,842][134294] Updated weights for policy 0, policy_version 155914 (0.0026) [2025-01-04 09:03:33,969][134211] Fps is (10 sec: 13515.7, 60 sec: 14131.0, 300 sec: 13565.4). Total num frames: 638623744. Throughput: 0: 3608.9. Samples: 148820062. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:33,969][134211] Avg episode reward: [(0, '7.706')] [2025-01-04 09:03:36,805][134294] Updated weights for policy 0, policy_version 155924 (0.0024) [2025-01-04 09:03:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14063.0, 300 sec: 13551.5). Total num frames: 638689280. Throughput: 0: 3619.2. Samples: 148840546. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:38,968][134211] Avg episode reward: [(0, '8.469')] [2025-01-04 09:03:40,055][134294] Updated weights for policy 0, policy_version 155934 (0.0024) [2025-01-04 09:03:43,516][134294] Updated weights for policy 0, policy_version 155944 (0.0024) [2025-01-04 09:03:43,968][134211] Fps is (10 sec: 12698.5, 60 sec: 14062.9, 300 sec: 13523.7). Total num frames: 638750720. Throughput: 0: 3592.6. Samples: 148858720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:43,968][134211] Avg episode reward: [(0, '9.716')] [2025-01-04 09:03:46,781][134294] Updated weights for policy 0, policy_version 155954 (0.0028) [2025-01-04 09:03:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14062.9, 300 sec: 13537.6). Total num frames: 638816256. Throughput: 0: 3516.6. Samples: 148867966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:48,968][134211] Avg episode reward: [(0, '8.120')] [2025-01-04 09:03:49,841][134294] Updated weights for policy 0, policy_version 155964 (0.0026) [2025-01-04 09:03:52,842][134294] Updated weights for policy 0, policy_version 155974 (0.0026) [2025-01-04 09:03:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14063.0, 300 sec: 13551.5). Total num frames: 638881792. Throughput: 0: 3259.9. Samples: 148888254. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:53,969][134211] Avg episode reward: [(0, '8.415')] [2025-01-04 09:03:56,242][134294] Updated weights for policy 0, policy_version 155984 (0.0024) [2025-01-04 09:03:58,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13926.5, 300 sec: 13440.4). Total num frames: 638939136. Throughput: 0: 3211.3. Samples: 148906498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:03:58,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 09:03:59,708][134294] Updated weights for policy 0, policy_version 155994 (0.0027) [2025-01-04 09:04:02,735][134294] Updated weights for policy 0, policy_version 156004 (0.0021) [2025-01-04 09:04:03,967][134211] Fps is (10 sec: 13517.2, 60 sec: 13516.8, 300 sec: 13412.7). Total num frames: 639016960. Throughput: 0: 3226.5. Samples: 148915398. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:04:03,968][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 09:04:04,687][134294] Updated weights for policy 0, policy_version 156014 (0.0014) [2025-01-04 09:04:06,540][134294] Updated weights for policy 0, policy_version 156024 (0.0013) [2025-01-04 09:04:08,639][134294] Updated weights for policy 0, policy_version 156034 (0.0017) [2025-01-04 09:04:08,968][134211] Fps is (10 sec: 18022.7, 60 sec: 13585.0, 300 sec: 13551.6). Total num frames: 639119360. Throughput: 0: 3449.0. Samples: 148945120. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 09:04:08,968][134211] Avg episode reward: [(0, '8.958')] [2025-01-04 09:04:12,137][134294] Updated weights for policy 0, policy_version 156044 (0.0026) [2025-01-04 09:04:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 13585.1, 300 sec: 13523.8). Total num frames: 639176704. Throughput: 0: 3452.3. Samples: 148965168. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:13,968][134211] Avg episode reward: [(0, '8.534')] [2025-01-04 09:04:15,584][134294] Updated weights for policy 0, policy_version 156054 (0.0026) [2025-01-04 09:04:18,645][134294] Updated weights for policy 0, policy_version 156064 (0.0025) [2025-01-04 09:04:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13653.3, 300 sec: 13523.7). Total num frames: 639242240. Throughput: 0: 3431.1. Samples: 148974458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:18,968][134211] Avg episode reward: [(0, '9.469')] [2025-01-04 09:04:21,644][134294] Updated weights for policy 0, policy_version 156074 (0.0028) [2025-01-04 09:04:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.3, 300 sec: 13509.9). Total num frames: 639307776. Throughput: 0: 3426.7. Samples: 148994746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:23,968][134211] Avg episode reward: [(0, '9.574')] [2025-01-04 09:04:24,818][134294] Updated weights for policy 0, policy_version 156084 (0.0026) [2025-01-04 09:04:27,841][134294] Updated weights for policy 0, policy_version 156094 (0.0027) [2025-01-04 09:04:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13653.3, 300 sec: 13509.9). Total num frames: 639373312. Throughput: 0: 3463.6. Samples: 149014582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:28,968][134211] Avg episode reward: [(0, '8.895')] [2025-01-04 09:04:30,894][134294] Updated weights for policy 0, policy_version 156104 (0.0027) [2025-01-04 09:04:33,846][134294] Updated weights for policy 0, policy_version 156114 (0.0024) [2025-01-04 09:04:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.5, 300 sec: 13537.6). Total num frames: 639442944. Throughput: 0: 3481.5. Samples: 149024632. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:33,968][134211] Avg episode reward: [(0, '9.428')] [2025-01-04 09:04:36,848][134294] Updated weights for policy 0, policy_version 156124 (0.0027) [2025-01-04 09:04:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13653.3, 300 sec: 13523.7). Total num frames: 639508480. Throughput: 0: 3491.4. Samples: 149045368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:38,969][134211] Avg episode reward: [(0, '9.021')] [2025-01-04 09:04:40,151][134294] Updated weights for policy 0, policy_version 156134 (0.0028) [2025-01-04 09:04:43,511][134294] Updated weights for policy 0, policy_version 156144 (0.0026) [2025-01-04 09:04:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13653.4, 300 sec: 13496.0). Total num frames: 639569920. Throughput: 0: 3493.3. Samples: 149063694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:43,968][134211] Avg episode reward: [(0, '8.478')] [2025-01-04 09:04:46,837][134294] Updated weights for policy 0, policy_version 156154 (0.0028) [2025-01-04 09:04:48,969][134211] Fps is (10 sec: 12287.1, 60 sec: 13584.8, 300 sec: 13482.0). Total num frames: 639631360. Throughput: 0: 3497.6. Samples: 149072794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:48,973][134211] Avg episode reward: [(0, '9.069')] [2025-01-04 09:04:49,908][134294] Updated weights for policy 0, policy_version 156164 (0.0025) [2025-01-04 09:04:52,269][134294] Updated weights for policy 0, policy_version 156174 (0.0017) [2025-01-04 09:04:53,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14063.0, 300 sec: 13565.4). Total num frames: 639725568. Throughput: 0: 3334.4. Samples: 149095166. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:53,968][134211] Avg episode reward: [(0, '8.466')] [2025-01-04 09:04:54,138][134294] Updated weights for policy 0, policy_version 156184 (0.0012) [2025-01-04 09:04:56,392][134294] Updated weights for policy 0, policy_version 156194 (0.0018) [2025-01-04 09:04:58,968][134211] Fps is (10 sec: 17204.7, 60 sec: 14404.2, 300 sec: 13551.6). Total num frames: 639803392. Throughput: 0: 3484.9. Samples: 149121988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:04:58,969][134211] Avg episode reward: [(0, '9.295')] [2025-01-04 09:04:59,461][134294] Updated weights for policy 0, policy_version 156204 (0.0024) [2025-01-04 09:05:02,642][134294] Updated weights for policy 0, policy_version 156214 (0.0023) [2025-01-04 09:05:03,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14199.4, 300 sec: 13454.3). Total num frames: 639868928. Throughput: 0: 3495.0. Samples: 149131734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:03,968][134211] Avg episode reward: [(0, '8.671')] [2025-01-04 09:05:05,680][134294] Updated weights for policy 0, policy_version 156224 (0.0024) [2025-01-04 09:05:08,721][134294] Updated weights for policy 0, policy_version 156234 (0.0029) [2025-01-04 09:05:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13585.0, 300 sec: 13482.1). Total num frames: 639934464. Throughput: 0: 3489.2. Samples: 149151758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:08,968][134211] Avg episode reward: [(0, '8.398')] [2025-01-04 09:05:12,053][134294] Updated weights for policy 0, policy_version 156244 (0.0028) [2025-01-04 09:05:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13653.3, 300 sec: 13496.0). Total num frames: 639995904. Throughput: 0: 3466.8. Samples: 149170588. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:13,969][134211] Avg episode reward: [(0, '8.274')] [2025-01-04 09:05:14,045][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000156250_640000000.pth... [2025-01-04 09:05:14,119][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155455_636743680.pth [2025-01-04 09:05:15,380][134294] Updated weights for policy 0, policy_version 156254 (0.0024) [2025-01-04 09:05:18,285][134294] Updated weights for policy 0, policy_version 156264 (0.0025) [2025-01-04 09:05:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13653.3, 300 sec: 13523.7). Total num frames: 640061440. Throughput: 0: 3455.1. Samples: 149180112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:18,968][134211] Avg episode reward: [(0, '9.138')] [2025-01-04 09:05:21,496][134294] Updated weights for policy 0, policy_version 156274 (0.0024) [2025-01-04 09:05:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.3, 300 sec: 13509.8). Total num frames: 640126976. Throughput: 0: 3437.0. Samples: 149200034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:23,968][134211] Avg episode reward: [(0, '8.340')] [2025-01-04 09:05:24,544][134294] Updated weights for policy 0, policy_version 156284 (0.0027) [2025-01-04 09:05:27,756][134294] Updated weights for policy 0, policy_version 156294 (0.0028) [2025-01-04 09:05:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13721.5, 300 sec: 13537.6). Total num frames: 640196608. Throughput: 0: 3472.0. Samples: 149219936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:28,968][134211] Avg episode reward: [(0, '9.808')] [2025-01-04 09:05:29,974][134294] Updated weights for policy 0, policy_version 156304 (0.0016) [2025-01-04 09:05:31,885][134294] Updated weights for policy 0, policy_version 156314 (0.0016) [2025-01-04 09:05:33,725][134294] Updated weights for policy 0, policy_version 156324 (0.0016) [2025-01-04 09:05:33,968][134211] Fps is (10 sec: 18022.8, 60 sec: 14404.3, 300 sec: 13676.5). Total num frames: 640307200. Throughput: 0: 3606.8. Samples: 149235098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:33,968][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 09:05:35,670][134294] Updated weights for policy 0, policy_version 156334 (0.0014) [2025-01-04 09:05:37,950][134294] Updated weights for policy 0, policy_version 156344 (0.0018) [2025-01-04 09:05:38,968][134211] Fps is (10 sec: 19661.4, 60 sec: 14745.7, 300 sec: 13662.6). Total num frames: 640393216. Throughput: 0: 3810.1. Samples: 149266622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:38,968][134211] Avg episode reward: [(0, '8.324')] [2025-01-04 09:05:41,397][134294] Updated weights for policy 0, policy_version 156354 (0.0028) [2025-01-04 09:05:43,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14745.6, 300 sec: 13551.5). Total num frames: 640454656. Throughput: 0: 3616.6. Samples: 149284736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:43,969][134211] Avg episode reward: [(0, '10.696')] [2025-01-04 09:05:43,978][134264] Saving new best policy, reward=10.696! [2025-01-04 09:05:45,019][134294] Updated weights for policy 0, policy_version 156364 (0.0025) [2025-01-04 09:05:48,065][134294] Updated weights for policy 0, policy_version 156374 (0.0026) [2025-01-04 09:05:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14745.8, 300 sec: 13551.6). Total num frames: 640516096. Throughput: 0: 3595.6. Samples: 149293536. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:48,968][134211] Avg episode reward: [(0, '8.327')] [2025-01-04 09:05:51,128][134294] Updated weights for policy 0, policy_version 156384 (0.0027) [2025-01-04 09:05:53,968][134211] Fps is (10 sec: 12697.1, 60 sec: 14267.6, 300 sec: 13593.1). Total num frames: 640581632. Throughput: 0: 3598.1. Samples: 149313676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:53,969][134211] Avg episode reward: [(0, '9.739')] [2025-01-04 09:05:54,579][134294] Updated weights for policy 0, policy_version 156394 (0.0027) [2025-01-04 09:05:58,075][134294] Updated weights for policy 0, policy_version 156404 (0.0029) [2025-01-04 09:05:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13926.4, 300 sec: 13607.1). Total num frames: 640638976. Throughput: 0: 3568.6. Samples: 149331174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:05:58,968][134211] Avg episode reward: [(0, '10.244')] [2025-01-04 09:06:01,361][134294] Updated weights for policy 0, policy_version 156414 (0.0025) [2025-01-04 09:06:03,968][134211] Fps is (10 sec: 11879.0, 60 sec: 13858.1, 300 sec: 13607.1). Total num frames: 640700416. Throughput: 0: 3565.2. Samples: 149340546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:06:03,968][134211] Avg episode reward: [(0, '8.374')] [2025-01-04 09:06:04,871][134294] Updated weights for policy 0, policy_version 156424 (0.0026) [2025-01-04 09:06:07,892][134294] Updated weights for policy 0, policy_version 156434 (0.0023) [2025-01-04 09:06:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 13648.8). Total num frames: 640765952. Throughput: 0: 3536.9. Samples: 149359192. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:06:08,968][134211] Avg episode reward: [(0, '9.050')] [2025-01-04 09:06:11,136][134294] Updated weights for policy 0, policy_version 156444 (0.0027) [2025-01-04 09:06:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.2, 300 sec: 13648.7). Total num frames: 640827392. Throughput: 0: 3518.1. Samples: 149378250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:06:13,968][134211] Avg episode reward: [(0, '9.810')] [2025-01-04 09:06:14,427][134294] Updated weights for policy 0, policy_version 156454 (0.0027) [2025-01-04 09:06:16,522][134294] Updated weights for policy 0, policy_version 156464 (0.0015) [2025-01-04 09:06:18,407][134294] Updated weights for policy 0, policy_version 156474 (0.0013) [2025-01-04 09:06:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14404.3, 300 sec: 13787.6). Total num frames: 640925696. Throughput: 0: 3452.4. Samples: 149390456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:18,968][134211] Avg episode reward: [(0, '9.532')] [2025-01-04 09:06:21,003][134294] Updated weights for policy 0, policy_version 156484 (0.0022) [2025-01-04 09:06:23,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14472.6, 300 sec: 13815.3). Total num frames: 640995328. Throughput: 0: 3339.8. Samples: 149416912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:23,968][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 09:06:24,029][134294] Updated weights for policy 0, policy_version 156494 (0.0026) [2025-01-04 09:06:26,992][134294] Updated weights for policy 0, policy_version 156504 (0.0025) [2025-01-04 09:06:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 13787.5). Total num frames: 641064960. Throughput: 0: 3382.1. Samples: 149436932. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:28,968][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 09:06:30,120][134294] Updated weights for policy 0, policy_version 156514 (0.0026) [2025-01-04 09:06:33,049][134294] Updated weights for policy 0, policy_version 156524 (0.0024) [2025-01-04 09:06:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13690.4). Total num frames: 641130496. Throughput: 0: 3417.9. Samples: 149447340. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:33,968][134211] Avg episode reward: [(0, '9.674')] [2025-01-04 09:06:36,000][134294] Updated weights for policy 0, policy_version 156534 (0.0029) [2025-01-04 09:06:38,973][134211] Fps is (10 sec: 13100.6, 60 sec: 13379.1, 300 sec: 13662.4). Total num frames: 641196032. Throughput: 0: 3415.4. Samples: 149467384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:38,974][134211] Avg episode reward: [(0, '8.451')] [2025-01-04 09:06:39,590][134294] Updated weights for policy 0, policy_version 156544 (0.0027) [2025-01-04 09:06:42,917][134294] Updated weights for policy 0, policy_version 156554 (0.0027) [2025-01-04 09:06:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13676.5). Total num frames: 641253376. Throughput: 0: 3420.2. Samples: 149485082. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:43,968][134211] Avg episode reward: [(0, '8.602')] [2025-01-04 09:06:46,221][134294] Updated weights for policy 0, policy_version 156564 (0.0022) [2025-01-04 09:06:48,968][134211] Fps is (10 sec: 12704.0, 60 sec: 13448.5, 300 sec: 13732.1). Total num frames: 641323008. Throughput: 0: 3425.4. Samples: 149494690. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:48,968][134211] Avg episode reward: [(0, '7.848')] [2025-01-04 09:06:49,224][134294] Updated weights for policy 0, policy_version 156574 (0.0026) [2025-01-04 09:06:52,255][134294] Updated weights for policy 0, policy_version 156584 (0.0024) [2025-01-04 09:06:53,967][134211] Fps is (10 sec: 13926.7, 60 sec: 13517.0, 300 sec: 13773.7). Total num frames: 641392640. Throughput: 0: 3461.8. Samples: 149514970. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:53,968][134211] Avg episode reward: [(0, '9.100')] [2025-01-04 09:06:54,597][134294] Updated weights for policy 0, policy_version 156594 (0.0017) [2025-01-04 09:06:56,887][134294] Updated weights for policy 0, policy_version 156604 (0.0020) [2025-01-04 09:06:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13994.7, 300 sec: 13857.0). Total num frames: 641478656. Throughput: 0: 3596.6. Samples: 149540098. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:06:58,968][134211] Avg episode reward: [(0, '8.156')] [2025-01-04 09:06:59,924][134294] Updated weights for policy 0, policy_version 156614 (0.0027) [2025-01-04 09:07:02,925][134294] Updated weights for policy 0, policy_version 156624 (0.0023) [2025-01-04 09:07:03,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14062.9, 300 sec: 13898.6). Total num frames: 641544192. Throughput: 0: 3544.7. Samples: 149549968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:07:03,968][134211] Avg episode reward: [(0, '8.303')] [2025-01-04 09:07:06,007][134294] Updated weights for policy 0, policy_version 156634 (0.0026) [2025-01-04 09:07:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.7, 300 sec: 13912.5). Total num frames: 641605632. Throughput: 0: 3405.7. Samples: 149570170. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:07:08,968][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 09:07:09,333][134294] Updated weights for policy 0, policy_version 156644 (0.0026) [2025-01-04 09:07:12,595][134294] Updated weights for policy 0, policy_version 156654 (0.0026) [2025-01-04 09:07:13,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14062.9, 300 sec: 13829.2). Total num frames: 641671168. Throughput: 0: 3370.4. Samples: 149588598. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:07:13,968][134211] Avg episode reward: [(0, '8.311')] [2025-01-04 09:07:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000156658_641671168.pth... [2025-01-04 09:07:14,028][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000155850_638361600.pth [2025-01-04 09:07:15,184][134294] Updated weights for policy 0, policy_version 156664 (0.0018) [2025-01-04 09:07:17,129][134294] Updated weights for policy 0, policy_version 156674 (0.0013) [2025-01-04 09:07:18,968][134211] Fps is (10 sec: 16384.2, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 641769472. Throughput: 0: 3445.1. Samples: 149602368. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:07:18,968][134211] Avg episode reward: [(0, '9.585')] [2025-01-04 09:07:19,607][134294] Updated weights for policy 0, policy_version 156684 (0.0019) [2025-01-04 09:07:22,586][134294] Updated weights for policy 0, policy_version 156694 (0.0027) [2025-01-04 09:07:23,968][134211] Fps is (10 sec: 16383.7, 60 sec: 13994.6, 300 sec: 13981.9). Total num frames: 641835008. Throughput: 0: 3537.4. Samples: 149626548. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:23,969][134211] Avg episode reward: [(0, '9.360')] [2025-01-04 09:07:25,667][134294] Updated weights for policy 0, policy_version 156704 (0.0024) [2025-01-04 09:07:28,697][134294] Updated weights for policy 0, policy_version 156714 (0.0024) [2025-01-04 09:07:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 13981.9). Total num frames: 641900544. Throughput: 0: 3589.0. Samples: 149646586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:28,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 09:07:31,773][134294] Updated weights for policy 0, policy_version 156724 (0.0025) [2025-01-04 09:07:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.6, 300 sec: 13981.9). Total num frames: 641970176. Throughput: 0: 3596.9. Samples: 149656550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:33,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 09:07:34,844][134294] Updated weights for policy 0, policy_version 156734 (0.0026) [2025-01-04 09:07:37,996][134294] Updated weights for policy 0, policy_version 156744 (0.0025) [2025-01-04 09:07:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13927.6, 300 sec: 13981.9). Total num frames: 642031616. Throughput: 0: 3586.5. Samples: 149676364. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:38,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 09:07:41,285][134294] Updated weights for policy 0, policy_version 156754 (0.0025) [2025-01-04 09:07:43,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14062.8, 300 sec: 13981.9). Total num frames: 642097152. Throughput: 0: 3444.2. Samples: 149695088. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:43,969][134211] Avg episode reward: [(0, '8.272')] [2025-01-04 09:07:44,551][134294] Updated weights for policy 0, policy_version 156764 (0.0025) [2025-01-04 09:07:47,276][134294] Updated weights for policy 0, policy_version 156774 (0.0021) [2025-01-04 09:07:48,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14267.7, 300 sec: 14037.5). Total num frames: 642179072. Throughput: 0: 3433.2. Samples: 149704464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:48,968][134211] Avg episode reward: [(0, '9.769')] [2025-01-04 09:07:49,206][134294] Updated weights for policy 0, policy_version 156784 (0.0015) [2025-01-04 09:07:51,121][134294] Updated weights for policy 0, policy_version 156794 (0.0013) [2025-01-04 09:07:53,020][134294] Updated weights for policy 0, policy_version 156804 (0.0014) [2025-01-04 09:07:53,968][134211] Fps is (10 sec: 19252.5, 60 sec: 14950.4, 300 sec: 14190.2). Total num frames: 642289664. Throughput: 0: 3687.3. Samples: 149736100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:53,968][134211] Avg episode reward: [(0, '10.589')] [2025-01-04 09:07:54,913][134294] Updated weights for policy 0, policy_version 156814 (0.0013) [2025-01-04 09:07:57,035][134294] Updated weights for policy 0, policy_version 156824 (0.0015) [2025-01-04 09:07:58,968][134211] Fps is (10 sec: 19660.9, 60 sec: 14950.4, 300 sec: 14134.7). Total num frames: 642375680. Throughput: 0: 3926.1. Samples: 149765274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:07:58,969][134211] Avg episode reward: [(0, '8.154')] [2025-01-04 09:08:00,171][134294] Updated weights for policy 0, policy_version 156834 (0.0027) [2025-01-04 09:08:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 13981.9). Total num frames: 642428928. Throughput: 0: 3812.1. Samples: 149773914. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:08:03,968][134211] Avg episode reward: [(0, '9.158')] [2025-01-04 09:08:04,052][134294] Updated weights for policy 0, policy_version 156844 (0.0027) [2025-01-04 09:08:07,239][134294] Updated weights for policy 0, policy_version 156854 (0.0028) [2025-01-04 09:08:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14745.6, 300 sec: 13995.8). Total num frames: 642490368. Throughput: 0: 3661.5. Samples: 149791314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:08:08,968][134211] Avg episode reward: [(0, '9.186')] [2025-01-04 09:08:10,801][134294] Updated weights for policy 0, policy_version 156864 (0.0025) [2025-01-04 09:08:13,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14677.3, 300 sec: 13995.8). Total num frames: 642551808. Throughput: 0: 3608.7. Samples: 149808976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:08:13,968][134211] Avg episode reward: [(0, '9.693')] [2025-01-04 09:08:14,269][134294] Updated weights for policy 0, policy_version 156874 (0.0026) [2025-01-04 09:08:17,475][134294] Updated weights for policy 0, policy_version 156884 (0.0025) [2025-01-04 09:08:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.2, 300 sec: 13995.8). Total num frames: 642617344. Throughput: 0: 3592.5. Samples: 149818214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:08:18,968][134211] Avg episode reward: [(0, '8.763')] [2025-01-04 09:08:20,416][134294] Updated weights for policy 0, policy_version 156894 (0.0027) [2025-01-04 09:08:23,488][134294] Updated weights for policy 0, policy_version 156904 (0.0022) [2025-01-04 09:08:23,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14131.1, 300 sec: 13995.8). Total num frames: 642682880. Throughput: 0: 3607.6. Samples: 149838710. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:08:23,969][134211] Avg episode reward: [(0, '8.665')] [2025-01-04 09:08:26,374][134294] Updated weights for policy 0, policy_version 156914 (0.0025) [2025-01-04 09:08:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 13995.9). Total num frames: 642752512. Throughput: 0: 3642.6. Samples: 149859004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:28,968][134211] Avg episode reward: [(0, '9.228')] [2025-01-04 09:08:29,503][134294] Updated weights for policy 0, policy_version 156924 (0.0028) [2025-01-04 09:08:32,538][134294] Updated weights for policy 0, policy_version 156934 (0.0024) [2025-01-04 09:08:33,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14131.2, 300 sec: 13995.8). Total num frames: 642818048. Throughput: 0: 3660.3. Samples: 149869176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:33,968][134211] Avg episode reward: [(0, '8.018')] [2025-01-04 09:08:35,531][134294] Updated weights for policy 0, policy_version 156944 (0.0025) [2025-01-04 09:08:38,540][134294] Updated weights for policy 0, policy_version 156954 (0.0025) [2025-01-04 09:08:38,971][134211] Fps is (10 sec: 13512.5, 60 sec: 14267.0, 300 sec: 14023.5). Total num frames: 642887680. Throughput: 0: 3414.4. Samples: 149889760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:38,971][134211] Avg episode reward: [(0, '9.158')] [2025-01-04 09:08:41,688][134294] Updated weights for policy 0, policy_version 156964 (0.0027) [2025-01-04 09:08:43,969][134211] Fps is (10 sec: 13105.6, 60 sec: 14199.3, 300 sec: 14009.7). Total num frames: 642949120. Throughput: 0: 3185.7. Samples: 149908636. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:43,969][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 09:08:45,120][134294] Updated weights for policy 0, policy_version 156974 (0.0032) [2025-01-04 09:08:48,184][134294] Updated weights for policy 0, policy_version 156984 (0.0025) [2025-01-04 09:08:48,968][134211] Fps is (10 sec: 12701.5, 60 sec: 13926.4, 300 sec: 14009.7). Total num frames: 643014656. Throughput: 0: 3199.5. Samples: 149917892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:48,968][134211] Avg episode reward: [(0, '8.243')] [2025-01-04 09:08:51,114][134294] Updated weights for policy 0, policy_version 156994 (0.0023) [2025-01-04 09:08:53,968][134211] Fps is (10 sec: 13518.1, 60 sec: 13243.7, 300 sec: 14051.4). Total num frames: 643084288. Throughput: 0: 3272.4. Samples: 149938574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:53,969][134211] Avg episode reward: [(0, '9.449')] [2025-01-04 09:08:54,159][134294] Updated weights for policy 0, policy_version 157004 (0.0025) [2025-01-04 09:08:57,139][134294] Updated weights for policy 0, policy_version 157014 (0.0026) [2025-01-04 09:08:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12902.4, 300 sec: 14009.7). Total num frames: 643149824. Throughput: 0: 3330.7. Samples: 149958856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:08:58,968][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 09:09:00,153][134294] Updated weights for policy 0, policy_version 157024 (0.0026) [2025-01-04 09:09:03,043][134294] Updated weights for policy 0, policy_version 157034 (0.0025) [2025-01-04 09:09:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13175.4, 300 sec: 13898.6). Total num frames: 643219456. Throughput: 0: 3362.8. Samples: 149969542. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:03,968][134211] Avg episode reward: [(0, '8.473')] [2025-01-04 09:09:06,132][134294] Updated weights for policy 0, policy_version 157044 (0.0025) [2025-01-04 09:09:08,024][134294] Updated weights for policy 0, policy_version 157054 (0.0012) [2025-01-04 09:09:08,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13653.4, 300 sec: 14009.7). Total num frames: 643309568. Throughput: 0: 3409.5. Samples: 149992134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:08,968][134211] Avg episode reward: [(0, '9.233')] [2025-01-04 09:09:10,070][134294] Updated weights for policy 0, policy_version 157064 (0.0014) [2025-01-04 09:09:12,011][134294] Updated weights for policy 0, policy_version 157074 (0.0014) [2025-01-04 09:09:13,968][134211] Fps is (10 sec: 18431.9, 60 sec: 14199.4, 300 sec: 14106.9). Total num frames: 643403776. Throughput: 0: 3626.3. Samples: 150022188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:13,969][134211] Avg episode reward: [(0, '9.140')] [2025-01-04 09:09:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157081_643403776.pth... [2025-01-04 09:09:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000156250_640000000.pth [2025-01-04 09:09:14,812][134294] Updated weights for policy 0, policy_version 157084 (0.0022) [2025-01-04 09:09:18,245][134294] Updated weights for policy 0, policy_version 157094 (0.0027) [2025-01-04 09:09:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14062.9, 300 sec: 14079.1). Total num frames: 643461120. Throughput: 0: 3594.4. Samples: 150030926. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:18,968][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 09:09:21,768][134294] Updated weights for policy 0, policy_version 157104 (0.0026) [2025-01-04 09:09:23,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14063.1, 300 sec: 14079.1). Total num frames: 643526656. Throughput: 0: 3540.9. Samples: 150049090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:23,968][134211] Avg episode reward: [(0, '8.742')] [2025-01-04 09:09:24,908][134294] Updated weights for policy 0, policy_version 157114 (0.0024) [2025-01-04 09:09:28,029][134294] Updated weights for policy 0, policy_version 157124 (0.0025) [2025-01-04 09:09:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 14051.4). Total num frames: 643588096. Throughput: 0: 3554.0. Samples: 150068562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:09:28,969][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 09:09:31,198][134294] Updated weights for policy 0, policy_version 157134 (0.0024) [2025-01-04 09:09:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 14051.4). Total num frames: 643653632. Throughput: 0: 3567.1. Samples: 150078412. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:33,968][134211] Avg episode reward: [(0, '7.565')] [2025-01-04 09:09:34,285][134294] Updated weights for policy 0, policy_version 157144 (0.0027) [2025-01-04 09:09:37,320][134294] Updated weights for policy 0, policy_version 157154 (0.0026) [2025-01-04 09:09:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13858.8, 300 sec: 14065.2). Total num frames: 643719168. Throughput: 0: 3553.3. Samples: 150098472. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:38,968][134211] Avg episode reward: [(0, '8.323')] [2025-01-04 09:09:40,651][134294] Updated weights for policy 0, policy_version 157164 (0.0029) [2025-01-04 09:09:43,894][134294] Updated weights for policy 0, policy_version 157174 (0.0026) [2025-01-04 09:09:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.6, 300 sec: 14079.2). Total num frames: 643784704. Throughput: 0: 3519.0. Samples: 150117212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:43,969][134211] Avg episode reward: [(0, '8.945')] [2025-01-04 09:09:47,051][134294] Updated weights for policy 0, policy_version 157184 (0.0024) [2025-01-04 09:09:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 13981.9). Total num frames: 643850240. Throughput: 0: 3490.4. Samples: 150126610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:48,968][134211] Avg episode reward: [(0, '9.633')] [2025-01-04 09:09:50,059][134294] Updated weights for policy 0, policy_version 157194 (0.0026) [2025-01-04 09:09:51,934][134294] Updated weights for policy 0, policy_version 157204 (0.0012) [2025-01-04 09:09:53,778][134294] Updated weights for policy 0, policy_version 157214 (0.0013) [2025-01-04 09:09:53,968][134211] Fps is (10 sec: 16384.5, 60 sec: 14404.4, 300 sec: 14051.4). Total num frames: 643948544. Throughput: 0: 3542.0. Samples: 150151524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:53,968][134211] Avg episode reward: [(0, '9.719')] [2025-01-04 09:09:55,812][134294] Updated weights for policy 0, policy_version 157224 (0.0014) [2025-01-04 09:09:58,077][134294] Updated weights for policy 0, policy_version 157234 (0.0017) [2025-01-04 09:09:58,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14813.8, 300 sec: 14134.7). Total num frames: 644038656. Throughput: 0: 3530.8. Samples: 150181072. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:09:58,968][134211] Avg episode reward: [(0, '8.944')] [2025-01-04 09:10:01,784][134294] Updated weights for policy 0, policy_version 157244 (0.0026) [2025-01-04 09:10:03,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14540.8, 300 sec: 14093.0). Total num frames: 644091904. Throughput: 0: 3526.2. Samples: 150189604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:03,969][134211] Avg episode reward: [(0, '8.756')] [2025-01-04 09:10:05,596][134294] Updated weights for policy 0, policy_version 157254 (0.0030) [2025-01-04 09:10:08,929][134294] Updated weights for policy 0, policy_version 157264 (0.0028) [2025-01-04 09:10:08,968][134211] Fps is (10 sec: 11468.8, 60 sec: 14062.9, 300 sec: 14093.0). Total num frames: 644153344. Throughput: 0: 3492.1. Samples: 150206234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:08,968][134211] Avg episode reward: [(0, '9.216')] [2025-01-04 09:10:12,320][134294] Updated weights for policy 0, policy_version 157274 (0.0025) [2025-01-04 09:10:13,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13448.6, 300 sec: 14065.2). Total num frames: 644210688. Throughput: 0: 3455.3. Samples: 150224050. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:13,968][134211] Avg episode reward: [(0, '9.030')] [2025-01-04 09:10:15,884][134294] Updated weights for policy 0, policy_version 157284 (0.0026) [2025-01-04 09:10:18,123][134294] Updated weights for policy 0, policy_version 157294 (0.0015) [2025-01-04 09:10:18,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13789.8, 300 sec: 14106.9). Total num frames: 644288512. Throughput: 0: 3432.3. Samples: 150232866. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:18,969][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 09:10:20,753][134294] Updated weights for policy 0, policy_version 157304 (0.0022) [2025-01-04 09:10:23,820][134294] Updated weights for policy 0, policy_version 157314 (0.0027) [2025-01-04 09:10:23,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13858.1, 300 sec: 14106.9). Total num frames: 644358144. Throughput: 0: 3527.9. Samples: 150257228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:23,968][134211] Avg episode reward: [(0, '10.181')] [2025-01-04 09:10:26,859][134294] Updated weights for policy 0, policy_version 157324 (0.0024) [2025-01-04 09:10:28,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13994.7, 300 sec: 13968.0). Total num frames: 644427776. Throughput: 0: 3559.7. Samples: 150277400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:28,968][134211] Avg episode reward: [(0, '9.632')] [2025-01-04 09:10:29,863][134294] Updated weights for policy 0, policy_version 157334 (0.0025) [2025-01-04 09:10:33,150][134294] Updated weights for policy 0, policy_version 157344 (0.0026) [2025-01-04 09:10:33,968][134211] Fps is (10 sec: 13106.4, 60 sec: 13926.3, 300 sec: 13884.7). Total num frames: 644489216. Throughput: 0: 3562.8. Samples: 150286938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:33,969][134211] Avg episode reward: [(0, '9.863')] [2025-01-04 09:10:36,363][134294] Updated weights for policy 0, policy_version 157354 (0.0024) [2025-01-04 09:10:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 13898.6). Total num frames: 644554752. Throughput: 0: 3442.5. Samples: 150306436. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:38,968][134211] Avg episode reward: [(0, '8.095')] [2025-01-04 09:10:39,530][134294] Updated weights for policy 0, policy_version 157364 (0.0026) [2025-01-04 09:10:42,666][134294] Updated weights for policy 0, policy_version 157374 (0.0024) [2025-01-04 09:10:43,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 644628480. Throughput: 0: 3242.9. Samples: 150327004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:43,968][134211] Avg episode reward: [(0, '9.163')] [2025-01-04 09:10:44,660][134294] Updated weights for policy 0, policy_version 157384 (0.0014) [2025-01-04 09:10:46,672][134294] Updated weights for policy 0, policy_version 157394 (0.0013) [2025-01-04 09:10:48,593][134294] Updated weights for policy 0, policy_version 157404 (0.0013) [2025-01-04 09:10:48,968][134211] Fps is (10 sec: 17613.0, 60 sec: 14677.3, 300 sec: 14065.3). Total num frames: 644730880. Throughput: 0: 3386.6. Samples: 150342000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:48,968][134211] Avg episode reward: [(0, '8.871')] [2025-01-04 09:10:51,123][134294] Updated weights for policy 0, policy_version 157414 (0.0021) [2025-01-04 09:10:53,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14267.7, 300 sec: 14120.8). Total num frames: 644804608. Throughput: 0: 3610.3. Samples: 150368698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:53,968][134211] Avg episode reward: [(0, '10.445')] [2025-01-04 09:10:54,330][134294] Updated weights for policy 0, policy_version 157424 (0.0030) [2025-01-04 09:10:57,550][134294] Updated weights for policy 0, policy_version 157434 (0.0024) [2025-01-04 09:10:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13789.9, 300 sec: 14120.8). Total num frames: 644866048. Throughput: 0: 3636.5. Samples: 150387692. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:10:58,968][134211] Avg episode reward: [(0, '8.396')] [2025-01-04 09:11:00,645][134294] Updated weights for policy 0, policy_version 157444 (0.0024) [2025-01-04 09:11:03,833][134294] Updated weights for policy 0, policy_version 157454 (0.0023) [2025-01-04 09:11:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.7, 300 sec: 14120.8). Total num frames: 644931584. Throughput: 0: 3659.1. Samples: 150397526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:03,968][134211] Avg episode reward: [(0, '9.854')] [2025-01-04 09:11:06,810][134294] Updated weights for policy 0, policy_version 157464 (0.0027) [2025-01-04 09:11:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14062.9, 300 sec: 14134.7). Total num frames: 644997120. Throughput: 0: 3559.2. Samples: 150417390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:08,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 09:11:10,116][134294] Updated weights for policy 0, policy_version 157474 (0.0027) [2025-01-04 09:11:13,469][134294] Updated weights for policy 0, policy_version 157484 (0.0025) [2025-01-04 09:11:13,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14131.1, 300 sec: 14009.7). Total num frames: 645058560. Throughput: 0: 3519.2. Samples: 150435766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:13,969][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 09:11:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157485_645058560.pth... [2025-01-04 09:11:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000156658_641671168.pth [2025-01-04 09:11:16,813][134294] Updated weights for policy 0, policy_version 157494 (0.0027) [2025-01-04 09:11:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.2, 300 sec: 13981.9). Total num frames: 645120000. 
Throughput: 0: 3508.4. Samples: 150444816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:18,968][134211] Avg episode reward: [(0, '10.126')] [2025-01-04 09:11:19,970][134294] Updated weights for policy 0, policy_version 157504 (0.0027) [2025-01-04 09:11:23,035][134294] Updated weights for policy 0, policy_version 157514 (0.0025) [2025-01-04 09:11:23,968][134211] Fps is (10 sec: 12698.2, 60 sec: 13789.8, 300 sec: 13968.1). Total num frames: 645185536. Throughput: 0: 3514.4. Samples: 150464586. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:23,968][134211] Avg episode reward: [(0, '8.562')] [2025-01-04 09:11:26,039][134294] Updated weights for policy 0, policy_version 157524 (0.0028) [2025-01-04 09:11:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.6, 300 sec: 13968.1). Total num frames: 645251072. Throughput: 0: 3496.2. Samples: 150484332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:28,968][134211] Avg episode reward: [(0, '8.074')] [2025-01-04 09:11:29,385][134294] Updated weights for policy 0, policy_version 157534 (0.0026) [2025-01-04 09:11:32,411][134294] Updated weights for policy 0, policy_version 157544 (0.0025) [2025-01-04 09:11:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13790.0, 300 sec: 13968.3). Total num frames: 645316608. Throughput: 0: 3382.3. Samples: 150494202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:33,968][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 09:11:35,373][134294] Updated weights for policy 0, policy_version 157554 (0.0026) [2025-01-04 09:11:38,453][134294] Updated weights for policy 0, policy_version 157564 (0.0027) [2025-01-04 09:11:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14009.7). Total num frames: 645386240. Throughput: 0: 3244.1. Samples: 150514680. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:11:38,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 09:11:41,591][134294] Updated weights for policy 0, policy_version 157574 (0.0022) [2025-01-04 09:11:43,716][134294] Updated weights for policy 0, policy_version 157584 (0.0014) [2025-01-04 09:11:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13994.7, 300 sec: 14051.4). Total num frames: 645468160. Throughput: 0: 3309.9. Samples: 150536638. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:11:43,968][134211] Avg episode reward: [(0, '9.146')] [2025-01-04 09:11:45,763][134294] Updated weights for policy 0, policy_version 157594 (0.0014) [2025-01-04 09:11:47,753][134294] Updated weights for policy 0, policy_version 157604 (0.0012) [2025-01-04 09:11:48,968][134211] Fps is (10 sec: 18022.7, 60 sec: 13926.4, 300 sec: 14148.6). Total num frames: 645566464. Throughput: 0: 3421.2. Samples: 150551480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:11:48,968][134211] Avg episode reward: [(0, '9.043')] [2025-01-04 09:11:50,410][134294] Updated weights for policy 0, policy_version 157614 (0.0021) [2025-01-04 09:11:53,814][134294] Updated weights for policy 0, policy_version 157624 (0.0028) [2025-01-04 09:11:53,968][134211] Fps is (10 sec: 15974.1, 60 sec: 13721.6, 300 sec: 14065.2). Total num frames: 645627904. Throughput: 0: 3514.7. Samples: 150575552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:11:53,968][134211] Avg episode reward: [(0, '8.724')] [2025-01-04 09:11:57,270][134294] Updated weights for policy 0, policy_version 157634 (0.0028) [2025-01-04 09:11:58,968][134211] Fps is (10 sec: 11877.8, 60 sec: 13653.3, 300 sec: 14037.5). Total num frames: 645685248. Throughput: 0: 3488.7. Samples: 150592756. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:11:58,969][134211] Avg episode reward: [(0, '10.259')] [2025-01-04 09:12:00,825][134294] Updated weights for policy 0, policy_version 157644 (0.0024) [2025-01-04 09:12:03,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13516.8, 300 sec: 14023.6). Total num frames: 645742592. Throughput: 0: 3483.6. Samples: 150601578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:03,969][134211] Avg episode reward: [(0, '9.073')] [2025-01-04 09:12:04,423][134294] Updated weights for policy 0, policy_version 157654 (0.0027) [2025-01-04 09:12:07,871][134294] Updated weights for policy 0, policy_version 157664 (0.0026) [2025-01-04 09:12:08,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13380.2, 300 sec: 13995.8). Total num frames: 645799936. Throughput: 0: 3433.7. Samples: 150619102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:08,969][134211] Avg episode reward: [(0, '9.395')] [2025-01-04 09:12:11,302][134294] Updated weights for policy 0, policy_version 157674 (0.0025) [2025-01-04 09:12:13,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13380.4, 300 sec: 13870.9). Total num frames: 645861376. Throughput: 0: 3388.4. Samples: 150636810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:13,969][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 09:12:14,836][134294] Updated weights for policy 0, policy_version 157684 (0.0028) [2025-01-04 09:12:18,216][134294] Updated weights for policy 0, policy_version 157694 (0.0026) [2025-01-04 09:12:18,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13380.3, 300 sec: 13857.0). Total num frames: 645922816. Throughput: 0: 3360.7. Samples: 150645434. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:18,969][134211] Avg episode reward: [(0, '9.694')] [2025-01-04 09:12:20,841][134294] Updated weights for policy 0, policy_version 157704 (0.0021) [2025-01-04 09:12:22,769][134294] Updated weights for policy 0, policy_version 157714 (0.0013) [2025-01-04 09:12:23,968][134211] Fps is (10 sec: 15974.7, 60 sec: 13926.4, 300 sec: 13968.1). Total num frames: 646021120. Throughput: 0: 3437.0. Samples: 150669346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:23,968][134211] Avg episode reward: [(0, '8.974')] [2025-01-04 09:12:24,851][134294] Updated weights for policy 0, policy_version 157724 (0.0016) [2025-01-04 09:12:27,859][134294] Updated weights for policy 0, policy_version 157734 (0.0027) [2025-01-04 09:12:28,968][134211] Fps is (10 sec: 16793.8, 60 sec: 13994.7, 300 sec: 13968.1). Total num frames: 646090752. Throughput: 0: 3493.4. Samples: 150693840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:28,968][134211] Avg episode reward: [(0, '10.095')] [2025-01-04 09:12:30,930][134294] Updated weights for policy 0, policy_version 157744 (0.0026) [2025-01-04 09:12:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 13981.9). Total num frames: 646156288. Throughput: 0: 3384.8. Samples: 150703798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:33,969][134211] Avg episode reward: [(0, '8.735')] [2025-01-04 09:12:34,146][134294] Updated weights for policy 0, policy_version 157754 (0.0025) [2025-01-04 09:12:37,191][134294] Updated weights for policy 0, policy_version 157764 (0.0028) [2025-01-04 09:12:38,968][134211] Fps is (10 sec: 13106.5, 60 sec: 13926.3, 300 sec: 13981.9). Total num frames: 646221824. Throughput: 0: 3284.5. Samples: 150723356. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:12:38,969][134211] Avg episode reward: [(0, '9.845')] [2025-01-04 09:12:40,550][134294] Updated weights for policy 0, policy_version 157774 (0.0025) [2025-01-04 09:12:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13516.7, 300 sec: 13898.6). Total num frames: 646279168. Throughput: 0: 3310.2. Samples: 150741712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:12:43,969][134211] Avg episode reward: [(0, '8.130')] [2025-01-04 09:12:43,970][134294] Updated weights for policy 0, policy_version 157784 (0.0026) [2025-01-04 09:12:47,190][134294] Updated weights for policy 0, policy_version 157794 (0.0025) [2025-01-04 09:12:48,968][134211] Fps is (10 sec: 12288.5, 60 sec: 12970.6, 300 sec: 13745.9). Total num frames: 646344704. Throughput: 0: 3316.9. Samples: 150750838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:12:48,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 09:12:50,445][134294] Updated weights for policy 0, policy_version 157804 (0.0025) [2025-01-04 09:12:53,702][134294] Updated weights for policy 0, policy_version 157814 (0.0026) [2025-01-04 09:12:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12970.7, 300 sec: 13662.6). Total num frames: 646406144. Throughput: 0: 3356.0. Samples: 150770120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:12:53,968][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 09:12:56,130][134294] Updated weights for policy 0, policy_version 157824 (0.0018) [2025-01-04 09:12:58,097][134294] Updated weights for policy 0, policy_version 157834 (0.0014) [2025-01-04 09:12:58,968][134211] Fps is (10 sec: 15974.7, 60 sec: 13653.5, 300 sec: 13815.3). Total num frames: 646504448. Throughput: 0: 3527.6. Samples: 150795552. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:12:58,968][134211] Avg episode reward: [(0, '10.066')] [2025-01-04 09:13:00,101][134294] Updated weights for policy 0, policy_version 157844 (0.0014) [2025-01-04 09:13:02,250][134294] Updated weights for policy 0, policy_version 157854 (0.0016) [2025-01-04 09:13:03,969][134211] Fps is (10 sec: 18430.3, 60 sec: 14131.0, 300 sec: 13898.6). Total num frames: 646590464. Throughput: 0: 3674.9. Samples: 150810810. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:03,969][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 09:13:05,585][134294] Updated weights for policy 0, policy_version 157864 (0.0026) [2025-01-04 09:13:08,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14131.2, 300 sec: 13884.7). Total num frames: 646647808. Throughput: 0: 3587.5. Samples: 150830786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:08,969][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 09:13:09,049][134294] Updated weights for policy 0, policy_version 157874 (0.0027) [2025-01-04 09:13:12,606][134294] Updated weights for policy 0, policy_version 157884 (0.0030) [2025-01-04 09:13:13,968][134211] Fps is (10 sec: 11469.7, 60 sec: 14062.9, 300 sec: 13857.0). Total num frames: 646705152. Throughput: 0: 3424.8. Samples: 150847958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:13,969][134211] Avg episode reward: [(0, '8.912')] [2025-01-04 09:13:14,038][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157888_646709248.pth... [2025-01-04 09:13:14,121][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157081_643403776.pth [2025-01-04 09:13:16,257][134294] Updated weights for policy 0, policy_version 157894 (0.0028) [2025-01-04 09:13:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14062.9, 300 sec: 13843.1). Total num frames: 646766592. 
Throughput: 0: 3393.0. Samples: 150856482. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:18,968][134211] Avg episode reward: [(0, '8.486')] [2025-01-04 09:13:19,519][134294] Updated weights for policy 0, policy_version 157904 (0.0025) [2025-01-04 09:13:22,634][134294] Updated weights for policy 0, policy_version 157914 (0.0025) [2025-01-04 09:13:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13448.5, 300 sec: 13815.3). Total num frames: 646828032. Throughput: 0: 3385.7. Samples: 150875712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:23,968][134211] Avg episode reward: [(0, '9.701')] [2025-01-04 09:13:25,992][134294] Updated weights for policy 0, policy_version 157924 (0.0026) [2025-01-04 09:13:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13801.4). Total num frames: 646889472. Throughput: 0: 3381.1. Samples: 150893860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:28,968][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 09:13:29,471][134294] Updated weights for policy 0, policy_version 157934 (0.0028) [2025-01-04 09:13:32,265][134294] Updated weights for policy 0, policy_version 157944 (0.0021) [2025-01-04 09:13:33,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13585.1, 300 sec: 13843.2). Total num frames: 646971392. Throughput: 0: 3384.9. Samples: 150903156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:33,968][134211] Avg episode reward: [(0, '7.504')] [2025-01-04 09:13:34,590][134294] Updated weights for policy 0, policy_version 157954 (0.0019) [2025-01-04 09:13:37,671][134294] Updated weights for policy 0, policy_version 157964 (0.0025) [2025-01-04 09:13:38,969][134211] Fps is (10 sec: 14334.4, 60 sec: 13516.6, 300 sec: 13843.1). Total num frames: 647032832. Throughput: 0: 3480.0. Samples: 150926724. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:38,969][134211] Avg episode reward: [(0, '9.535')] [2025-01-04 09:13:41,198][134294] Updated weights for policy 0, policy_version 157974 (0.0022) [2025-01-04 09:13:43,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13516.8, 300 sec: 13815.3). Total num frames: 647090176. Throughput: 0: 3304.4. Samples: 150944250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:13:43,968][134211] Avg episode reward: [(0, '9.341')] [2025-01-04 09:13:44,848][134294] Updated weights for policy 0, policy_version 157984 (0.0023) [2025-01-04 09:13:46,930][134294] Updated weights for policy 0, policy_version 157994 (0.0014) [2025-01-04 09:13:48,968][134211] Fps is (10 sec: 14747.4, 60 sec: 13926.4, 300 sec: 13884.8). Total num frames: 647180288. Throughput: 0: 3212.2. Samples: 150955356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:13:48,968][134211] Avg episode reward: [(0, '9.580')] [2025-01-04 09:13:48,992][134294] Updated weights for policy 0, policy_version 158004 (0.0013) [2025-01-04 09:13:50,964][134294] Updated weights for policy 0, policy_version 158014 (0.0014) [2025-01-04 09:13:52,927][134294] Updated weights for policy 0, policy_version 158024 (0.0013) [2025-01-04 09:13:53,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14677.4, 300 sec: 14023.6). Total num frames: 647286784. Throughput: 0: 3450.7. Samples: 150986066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:13:53,968][134211] Avg episode reward: [(0, '9.455')] [2025-01-04 09:13:54,863][134294] Updated weights for policy 0, policy_version 158034 (0.0013) [2025-01-04 09:13:57,844][134294] Updated weights for policy 0, policy_version 158044 (0.0022) [2025-01-04 09:13:58,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14199.4, 300 sec: 14023.6). Total num frames: 647356416. Throughput: 0: 3624.2. Samples: 151011044. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:13:58,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 09:14:01,711][134294] Updated weights for policy 0, policy_version 158054 (0.0032) [2025-01-04 09:14:03,969][134211] Fps is (10 sec: 12286.4, 60 sec: 13653.3, 300 sec: 13898.6). Total num frames: 647409664. Throughput: 0: 3612.5. Samples: 151019048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:03,970][134211] Avg episode reward: [(0, '8.964')] [2025-01-04 09:14:05,625][134294] Updated weights for policy 0, policy_version 158064 (0.0030) [2025-01-04 09:14:08,968][134211] Fps is (10 sec: 11059.1, 60 sec: 13653.3, 300 sec: 13773.7). Total num frames: 647467008. Throughput: 0: 3549.6. Samples: 151035446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:08,968][134211] Avg episode reward: [(0, '9.833')] [2025-01-04 09:14:09,144][134294] Updated weights for policy 0, policy_version 158074 (0.0027) [2025-01-04 09:14:12,560][134294] Updated weights for policy 0, policy_version 158084 (0.0029) [2025-01-04 09:14:13,968][134211] Fps is (10 sec: 11470.0, 60 sec: 13653.4, 300 sec: 13773.7). Total num frames: 647524352. Throughput: 0: 3530.1. Samples: 151052716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:13,969][134211] Avg episode reward: [(0, '9.198')] [2025-01-04 09:14:16,139][134294] Updated weights for policy 0, policy_version 158094 (0.0027) [2025-01-04 09:14:18,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13585.1, 300 sec: 13745.9). Total num frames: 647581696. Throughput: 0: 3519.0. Samples: 151061512. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:18,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 09:14:19,792][134294] Updated weights for policy 0, policy_version 158104 (0.0026) [2025-01-04 09:14:23,192][134294] Updated weights for policy 0, policy_version 158114 (0.0029) [2025-01-04 09:14:23,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13585.1, 300 sec: 13745.9). Total num frames: 647643136. Throughput: 0: 3379.5. Samples: 151078798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:23,968][134211] Avg episode reward: [(0, '8.580')] [2025-01-04 09:14:26,274][134294] Updated weights for policy 0, policy_version 158124 (0.0026) [2025-01-04 09:14:28,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13653.3, 300 sec: 13745.9). Total num frames: 647708672. Throughput: 0: 3423.5. Samples: 151098310. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:28,969][134211] Avg episode reward: [(0, '7.944')] [2025-01-04 09:14:29,535][134294] Updated weights for policy 0, policy_version 158134 (0.0028) [2025-01-04 09:14:32,770][134294] Updated weights for policy 0, policy_version 158144 (0.0024) [2025-01-04 09:14:33,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13311.8, 300 sec: 13732.0). Total num frames: 647770112. Throughput: 0: 3382.4. Samples: 151107566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:33,969][134211] Avg episode reward: [(0, '9.658')] [2025-01-04 09:14:35,860][134294] Updated weights for policy 0, policy_version 158154 (0.0026) [2025-01-04 09:14:38,913][134294] Updated weights for policy 0, policy_version 158164 (0.0025) [2025-01-04 09:14:38,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13448.8, 300 sec: 13745.9). Total num frames: 647839744. Throughput: 0: 3142.2. Samples: 151127466. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:38,968][134211] Avg episode reward: [(0, '9.004')] [2025-01-04 09:14:42,002][134294] Updated weights for policy 0, policy_version 158174 (0.0025) [2025-01-04 09:14:43,968][134211] Fps is (10 sec: 13108.4, 60 sec: 13516.8, 300 sec: 13732.0). Total num frames: 647901184. Throughput: 0: 3013.1. Samples: 151146632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:43,968][134211] Avg episode reward: [(0, '8.622')] [2025-01-04 09:14:45,463][134294] Updated weights for policy 0, policy_version 158184 (0.0028) [2025-01-04 09:14:48,492][134294] Updated weights for policy 0, policy_version 158194 (0.0024) [2025-01-04 09:14:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 13634.8). Total num frames: 647970816. Throughput: 0: 3042.2. Samples: 151155942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:14:48,968][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 09:14:50,502][134294] Updated weights for policy 0, policy_version 158204 (0.0013) [2025-01-04 09:14:52,717][134294] Updated weights for policy 0, policy_version 158214 (0.0018) [2025-01-04 09:14:53,968][134211] Fps is (10 sec: 15564.3, 60 sec: 12834.0, 300 sec: 13620.9). Total num frames: 648056832. Throughput: 0: 3262.3. Samples: 151182252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:14:53,969][134211] Avg episode reward: [(0, '9.441')] [2025-01-04 09:14:56,056][134294] Updated weights for policy 0, policy_version 158224 (0.0025) [2025-01-04 09:14:58,968][134211] Fps is (10 sec: 14745.3, 60 sec: 12697.6, 300 sec: 13648.7). Total num frames: 648118272. Throughput: 0: 3300.6. Samples: 151201244. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:14:58,968][134211] Avg episode reward: [(0, '8.598')] [2025-01-04 09:14:59,265][134294] Updated weights for policy 0, policy_version 158234 (0.0030) [2025-01-04 09:15:02,560][134294] Updated weights for policy 0, policy_version 158244 (0.0026) [2025-01-04 09:15:03,968][134211] Fps is (10 sec: 12698.0, 60 sec: 12902.6, 300 sec: 13662.6). Total num frames: 648183808. Throughput: 0: 3317.5. Samples: 151210800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:03,968][134211] Avg episode reward: [(0, '8.428')] [2025-01-04 09:15:05,639][134294] Updated weights for policy 0, policy_version 158254 (0.0027) [2025-01-04 09:15:08,778][134294] Updated weights for policy 0, policy_version 158264 (0.0023) [2025-01-04 09:15:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 13690.4). Total num frames: 648249344. Throughput: 0: 3371.3. Samples: 151230506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:08,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 09:15:12,244][134294] Updated weights for policy 0, policy_version 158274 (0.0031) [2025-01-04 09:15:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13038.9, 300 sec: 13620.9). Total num frames: 648306688. Throughput: 0: 3332.4. Samples: 151248268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:13,968][134211] Avg episode reward: [(0, '9.098')] [2025-01-04 09:15:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000158278_648306688.pth... 
[2025-01-04 09:15:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157485_645058560.pth [2025-01-04 09:15:15,325][134294] Updated weights for policy 0, policy_version 158284 (0.0021) [2025-01-04 09:15:17,397][134294] Updated weights for policy 0, policy_version 158294 (0.0013) [2025-01-04 09:15:18,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13653.4, 300 sec: 13704.2). Total num frames: 648400896. Throughput: 0: 3389.0. Samples: 151260068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:18,968][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 09:15:19,444][134294] Updated weights for policy 0, policy_version 158304 (0.0014) [2025-01-04 09:15:21,383][134294] Updated weights for policy 0, policy_version 158314 (0.0012) [2025-01-04 09:15:23,968][134211] Fps is (10 sec: 18022.5, 60 sec: 14062.9, 300 sec: 13759.8). Total num frames: 648486912. Throughput: 0: 3612.4. Samples: 151290024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:23,968][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 09:15:24,527][134294] Updated weights for policy 0, policy_version 158324 (0.0023) [2025-01-04 09:15:28,079][134294] Updated weights for policy 0, policy_version 158334 (0.0029) [2025-01-04 09:15:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13926.4, 300 sec: 13745.9). Total num frames: 648544256. Throughput: 0: 3573.9. Samples: 151307458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:28,968][134211] Avg episode reward: [(0, '9.390')] [2025-01-04 09:15:31,484][134294] Updated weights for policy 0, policy_version 158344 (0.0025) [2025-01-04 09:15:33,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13858.4, 300 sec: 13718.1). Total num frames: 648601600. Throughput: 0: 3569.4. Samples: 151316568. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:33,968][134211] Avg episode reward: [(0, '8.714')] [2025-01-04 09:15:34,958][134294] Updated weights for policy 0, policy_version 158354 (0.0026) [2025-01-04 09:15:38,261][134294] Updated weights for policy 0, policy_version 158364 (0.0028) [2025-01-04 09:15:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13789.8, 300 sec: 13690.4). Total num frames: 648667136. Throughput: 0: 3386.5. Samples: 151334642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:38,968][134211] Avg episode reward: [(0, '9.479')] [2025-01-04 09:15:41,590][134294] Updated weights for policy 0, policy_version 158374 (0.0028) [2025-01-04 09:15:43,969][134211] Fps is (10 sec: 12696.6, 60 sec: 13789.7, 300 sec: 13551.5). Total num frames: 648728576. Throughput: 0: 3371.6. Samples: 151352970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:43,969][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 09:15:44,936][134294] Updated weights for policy 0, policy_version 158384 (0.0027) [2025-01-04 09:15:48,175][134294] Updated weights for policy 0, policy_version 158394 (0.0024) [2025-01-04 09:15:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13653.3, 300 sec: 13509.9). Total num frames: 648790016. Throughput: 0: 3358.5. Samples: 151361932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:48,968][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 09:15:51,220][134294] Updated weights for policy 0, policy_version 158404 (0.0023) [2025-01-04 09:15:53,968][134211] Fps is (10 sec: 12698.6, 60 sec: 13312.1, 300 sec: 13523.7). Total num frames: 648855552. Throughput: 0: 3360.9. Samples: 151381746. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:15:53,968][134211] Avg episode reward: [(0, '9.187')] [2025-01-04 09:15:54,509][134294] Updated weights for policy 0, policy_version 158414 (0.0025) [2025-01-04 09:15:57,903][134294] Updated weights for policy 0, policy_version 158424 (0.0025) [2025-01-04 09:15:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13312.0, 300 sec: 13509.9). Total num frames: 648916992. Throughput: 0: 3370.6. Samples: 151399946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:15:58,968][134211] Avg episode reward: [(0, '8.703')] [2025-01-04 09:16:01,112][134294] Updated weights for policy 0, policy_version 158434 (0.0024) [2025-01-04 09:16:03,320][134294] Updated weights for policy 0, policy_version 158444 (0.0014) [2025-01-04 09:16:03,967][134211] Fps is (10 sec: 14336.4, 60 sec: 13585.1, 300 sec: 13565.4). Total num frames: 648998912. Throughput: 0: 3330.8. Samples: 151409952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:03,968][134211] Avg episode reward: [(0, '8.556')] [2025-01-04 09:16:05,407][134294] Updated weights for policy 0, policy_version 158454 (0.0013) [2025-01-04 09:16:07,298][134294] Updated weights for policy 0, policy_version 158464 (0.0014) [2025-01-04 09:16:08,968][134211] Fps is (10 sec: 18432.2, 60 sec: 14199.5, 300 sec: 13704.3). Total num frames: 649101312. Throughput: 0: 3327.3. Samples: 151439750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:08,968][134211] Avg episode reward: [(0, '9.032')] [2025-01-04 09:16:09,352][134294] Updated weights for policy 0, policy_version 158474 (0.0012) [2025-01-04 09:16:12,329][134294] Updated weights for policy 0, policy_version 158484 (0.0022) [2025-01-04 09:16:13,968][134211] Fps is (10 sec: 16792.9, 60 sec: 14336.0, 300 sec: 13718.1). Total num frames: 649166848. Throughput: 0: 3458.2. Samples: 151463076. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:13,969][134211] Avg episode reward: [(0, '9.303')] [2025-01-04 09:16:16,006][134294] Updated weights for policy 0, policy_version 158494 (0.0028) [2025-01-04 09:16:18,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13653.3, 300 sec: 13676.5). Total num frames: 649220096. Throughput: 0: 3442.2. Samples: 151471468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:18,968][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 09:16:19,864][134294] Updated weights for policy 0, policy_version 158504 (0.0026) [2025-01-04 09:16:23,123][134294] Updated weights for policy 0, policy_version 158514 (0.0027) [2025-01-04 09:16:23,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13243.7, 300 sec: 13662.6). Total num frames: 649281536. Throughput: 0: 3418.7. Samples: 151488484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:23,969][134211] Avg episode reward: [(0, '7.685')] [2025-01-04 09:16:26,481][134294] Updated weights for policy 0, policy_version 158524 (0.0028) [2025-01-04 09:16:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13648.7). Total num frames: 649342976. Throughput: 0: 3425.8. Samples: 151507130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:28,968][134211] Avg episode reward: [(0, '9.315')] [2025-01-04 09:16:29,755][134294] Updated weights for policy 0, policy_version 158534 (0.0030) [2025-01-04 09:16:33,234][134294] Updated weights for policy 0, policy_version 158544 (0.0024) [2025-01-04 09:16:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13380.3, 300 sec: 13620.9). Total num frames: 649404416. Throughput: 0: 3429.1. Samples: 151516240. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:33,969][134211] Avg episode reward: [(0, '9.061')] [2025-01-04 09:16:36,591][134294] Updated weights for policy 0, policy_version 158554 (0.0027) [2025-01-04 09:16:38,968][134211] Fps is (10 sec: 11877.7, 60 sec: 13243.6, 300 sec: 13537.6). Total num frames: 649461760. Throughput: 0: 3383.3. Samples: 151533996. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:38,969][134211] Avg episode reward: [(0, '10.307')] [2025-01-04 09:16:40,227][134294] Updated weights for policy 0, policy_version 158564 (0.0027) [2025-01-04 09:16:43,698][134294] Updated weights for policy 0, policy_version 158574 (0.0025) [2025-01-04 09:16:43,969][134211] Fps is (10 sec: 11467.2, 60 sec: 13175.3, 300 sec: 13398.7). Total num frames: 649519104. Throughput: 0: 3368.7. Samples: 151551544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:43,970][134211] Avg episode reward: [(0, '9.791')] [2025-01-04 09:16:46,920][134294] Updated weights for policy 0, policy_version 158584 (0.0021) [2025-01-04 09:16:48,939][134294] Updated weights for policy 0, policy_version 158594 (0.0015) [2025-01-04 09:16:48,968][134211] Fps is (10 sec: 13927.4, 60 sec: 13516.8, 300 sec: 13468.2). Total num frames: 649601024. Throughput: 0: 3340.3. Samples: 151560266. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:48,968][134211] Avg episode reward: [(0, '10.002')] [2025-01-04 09:16:51,051][134294] Updated weights for policy 0, policy_version 158604 (0.0014) [2025-01-04 09:16:53,110][134294] Updated weights for policy 0, policy_version 158614 (0.0012) [2025-01-04 09:16:53,968][134211] Fps is (10 sec: 18025.4, 60 sec: 14063.0, 300 sec: 13607.1). Total num frames: 649699328. Throughput: 0: 3325.1. Samples: 151589380. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:53,969][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 09:16:55,182][134294] Updated weights for policy 0, policy_version 158624 (0.0014) [2025-01-04 09:16:58,457][134294] Updated weights for policy 0, policy_version 158634 (0.0030) [2025-01-04 09:16:58,968][134211] Fps is (10 sec: 16793.5, 60 sec: 14199.5, 300 sec: 13648.7). Total num frames: 649768960. Throughput: 0: 3348.6. Samples: 151613764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:16:58,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 09:17:01,935][134294] Updated weights for policy 0, policy_version 158644 (0.0030) [2025-01-04 09:17:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13789.8, 300 sec: 13648.7). Total num frames: 649826304. Throughput: 0: 3355.2. Samples: 151622450. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:03,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 09:17:05,384][134294] Updated weights for policy 0, policy_version 158654 (0.0024) [2025-01-04 09:17:08,532][134294] Updated weights for policy 0, policy_version 158664 (0.0026) [2025-01-04 09:17:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13175.4, 300 sec: 13662.6). Total num frames: 649891840. Throughput: 0: 3384.5. Samples: 151640786. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:08,968][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 09:17:11,688][134294] Updated weights for policy 0, policy_version 158674 (0.0027) [2025-01-04 09:17:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 13662.6). Total num frames: 649953280. Throughput: 0: 3388.1. Samples: 151659594. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:13,968][134211] Avg episode reward: [(0, '9.079')] [2025-01-04 09:17:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000158680_649953280.pth... 
[2025-01-04 09:17:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000157888_646709248.pth [2025-01-04 09:17:15,366][134294] Updated weights for policy 0, policy_version 158684 (0.0029) [2025-01-04 09:17:18,650][134294] Updated weights for policy 0, policy_version 158694 (0.0028) [2025-01-04 09:17:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13243.8, 300 sec: 13537.6). Total num frames: 650014720. Throughput: 0: 3371.1. Samples: 151667938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:18,968][134211] Avg episode reward: [(0, '9.328')] [2025-01-04 09:17:21,595][134294] Updated weights for policy 0, policy_version 158704 (0.0024) [2025-01-04 09:17:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13312.0, 300 sec: 13523.7). Total num frames: 650080256. Throughput: 0: 3417.6. Samples: 151687788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:23,968][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 09:17:24,872][134294] Updated weights for policy 0, policy_version 158714 (0.0025) [2025-01-04 09:17:28,204][134294] Updated weights for policy 0, policy_version 158724 (0.0027) [2025-01-04 09:17:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13243.8, 300 sec: 13496.0). Total num frames: 650137600. Throughput: 0: 3440.7. Samples: 151706370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:28,968][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 09:17:31,596][134294] Updated weights for policy 0, policy_version 158734 (0.0026) [2025-01-04 09:17:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13448.6, 300 sec: 13523.8). Total num frames: 650211328. Throughput: 0: 3448.5. Samples: 151715448. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:33,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 09:17:33,986][134294] Updated weights for policy 0, policy_version 158744 (0.0017) [2025-01-04 09:17:36,833][134294] Updated weights for policy 0, policy_version 158754 (0.0024) [2025-01-04 09:17:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13653.5, 300 sec: 13565.4). Total num frames: 650280960. Throughput: 0: 3317.9. Samples: 151738688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:38,968][134211] Avg episode reward: [(0, '8.538')] [2025-01-04 09:17:40,198][134294] Updated weights for policy 0, policy_version 158764 (0.0028) [2025-01-04 09:17:43,803][134294] Updated weights for policy 0, policy_version 158774 (0.0029) [2025-01-04 09:17:43,968][134211] Fps is (10 sec: 12696.6, 60 sec: 13653.5, 300 sec: 13537.6). Total num frames: 650338304. Throughput: 0: 3166.9. Samples: 151756278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:43,969][134211] Avg episode reward: [(0, '9.481')] [2025-01-04 09:17:46,306][134294] Updated weights for policy 0, policy_version 158784 (0.0015) [2025-01-04 09:17:48,969][134211] Fps is (10 sec: 13515.1, 60 sec: 13584.7, 300 sec: 13593.1). Total num frames: 650416128. Throughput: 0: 3224.2. Samples: 151767542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:48,970][134211] Avg episode reward: [(0, '10.173')] [2025-01-04 09:17:49,229][134294] Updated weights for policy 0, policy_version 158794 (0.0023) [2025-01-04 09:17:52,590][134294] Updated weights for policy 0, policy_version 158804 (0.0025) [2025-01-04 09:17:53,968][134211] Fps is (10 sec: 13517.9, 60 sec: 12902.4, 300 sec: 13454.3). Total num frames: 650473472. Throughput: 0: 3253.7. Samples: 151787204. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:53,968][134211] Avg episode reward: [(0, '9.700')] [2025-01-04 09:17:55,548][134294] Updated weights for policy 0, policy_version 158814 (0.0022) [2025-01-04 09:17:57,680][134294] Updated weights for policy 0, policy_version 158824 (0.0012) [2025-01-04 09:17:58,968][134211] Fps is (10 sec: 14747.8, 60 sec: 13243.8, 300 sec: 13468.3). Total num frames: 650563584. Throughput: 0: 3368.1. Samples: 151811156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:17:58,968][134211] Avg episode reward: [(0, '9.102')] [2025-01-04 09:17:59,856][134294] Updated weights for policy 0, policy_version 158834 (0.0015) [2025-01-04 09:18:01,958][134294] Updated weights for policy 0, policy_version 158844 (0.0014) [2025-01-04 09:18:03,968][134211] Fps is (10 sec: 18432.1, 60 sec: 13858.1, 300 sec: 13593.2). Total num frames: 650657792. Throughput: 0: 3499.2. Samples: 151825404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:03,968][134211] Avg episode reward: [(0, '9.378')] [2025-01-04 09:18:04,402][134294] Updated weights for policy 0, policy_version 158854 (0.0015) [2025-01-04 09:18:07,800][134294] Updated weights for policy 0, policy_version 158864 (0.0027) [2025-01-04 09:18:08,968][134211] Fps is (10 sec: 15153.8, 60 sec: 13721.4, 300 sec: 13593.1). Total num frames: 650715136. Throughput: 0: 3562.8. Samples: 151848116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:08,969][134211] Avg episode reward: [(0, '8.976')] [2025-01-04 09:18:11,505][134294] Updated weights for policy 0, policy_version 158874 (0.0031) [2025-01-04 09:18:13,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13653.3, 300 sec: 13579.3). Total num frames: 650772480. Throughput: 0: 3517.0. Samples: 151864636. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:13,968][134211] Avg episode reward: [(0, '7.677')] [2025-01-04 09:18:15,283][134294] Updated weights for policy 0, policy_version 158884 (0.0031) [2025-01-04 09:18:18,834][134294] Updated weights for policy 0, policy_version 158894 (0.0025) [2025-01-04 09:18:18,968][134211] Fps is (10 sec: 11469.8, 60 sec: 13585.1, 300 sec: 13565.4). Total num frames: 650829824. Throughput: 0: 3500.4. Samples: 151872968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:18,968][134211] Avg episode reward: [(0, '8.752')] [2025-01-04 09:18:22,143][134294] Updated weights for policy 0, policy_version 158904 (0.0025) [2025-01-04 09:18:23,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13516.8, 300 sec: 13565.4). Total num frames: 650891264. Throughput: 0: 3381.1. Samples: 151890836. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:23,968][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 09:18:25,430][134294] Updated weights for policy 0, policy_version 158914 (0.0026) [2025-01-04 09:18:28,708][134294] Updated weights for policy 0, policy_version 158924 (0.0028) [2025-01-04 09:18:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13585.0, 300 sec: 13496.0). Total num frames: 650952704. Throughput: 0: 3414.9. Samples: 151909944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:28,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 09:18:31,805][134294] Updated weights for policy 0, policy_version 158934 (0.0026) [2025-01-04 09:18:33,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13380.2, 300 sec: 13496.0). Total num frames: 651014144. Throughput: 0: 3376.9. Samples: 151919498. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:33,968][134211] Avg episode reward: [(0, '8.403')] [2025-01-04 09:18:35,342][134294] Updated weights for policy 0, policy_version 158944 (0.0028) [2025-01-04 09:18:38,767][134294] Updated weights for policy 0, policy_version 158954 (0.0028) [2025-01-04 09:18:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13243.7, 300 sec: 13509.9). Total num frames: 651075584. Throughput: 0: 3333.8. Samples: 151937224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:38,968][134211] Avg episode reward: [(0, '9.526')] [2025-01-04 09:18:42,103][134294] Updated weights for policy 0, policy_version 158964 (0.0024) [2025-01-04 09:18:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13312.2, 300 sec: 13412.7). Total num frames: 651137024. Throughput: 0: 3203.9. Samples: 151955334. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:43,968][134211] Avg episode reward: [(0, '8.560')] [2025-01-04 09:18:45,560][134294] Updated weights for policy 0, policy_version 158974 (0.0023) [2025-01-04 09:18:48,749][134294] Updated weights for policy 0, policy_version 158984 (0.0025) [2025-01-04 09:18:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13039.2, 300 sec: 13259.9). Total num frames: 651198464. Throughput: 0: 3091.4. Samples: 151964516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:48,968][134211] Avg episode reward: [(0, '7.529')] [2025-01-04 09:18:51,970][134294] Updated weights for policy 0, policy_version 158994 (0.0026) [2025-01-04 09:18:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 651259904. Throughput: 0: 3010.0. Samples: 151983562. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:53,969][134211] Avg episode reward: [(0, '8.462')] [2025-01-04 09:18:55,251][134294] Updated weights for policy 0, policy_version 159004 (0.0026) [2025-01-04 09:18:58,010][134294] Updated weights for policy 0, policy_version 159014 (0.0018) [2025-01-04 09:18:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 12902.4, 300 sec: 13315.5). Total num frames: 651337728. Throughput: 0: 3104.0. Samples: 152004316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:18:58,968][134211] Avg episode reward: [(0, '9.287')] [2025-01-04 09:18:59,995][134294] Updated weights for policy 0, policy_version 159024 (0.0014) [2025-01-04 09:19:02,311][134294] Updated weights for policy 0, policy_version 159034 (0.0017) [2025-01-04 09:19:03,968][134211] Fps is (10 sec: 15974.3, 60 sec: 12697.5, 300 sec: 13398.8). Total num frames: 651419648. Throughput: 0: 3266.0. Samples: 152019940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:03,969][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 09:19:05,529][134294] Updated weights for policy 0, policy_version 159044 (0.0028) [2025-01-04 09:19:08,610][134294] Updated weights for policy 0, policy_version 159054 (0.0029) [2025-01-04 09:19:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 12902.6, 300 sec: 13440.4). Total num frames: 651489280. Throughput: 0: 3313.1. Samples: 152039926. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:08,968][134211] Avg episode reward: [(0, '8.812')] [2025-01-04 09:19:11,681][134294] Updated weights for policy 0, policy_version 159064 (0.0028) [2025-01-04 09:19:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 13454.3). Total num frames: 651550720. Throughput: 0: 3312.2. Samples: 152058994. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:13,969][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 09:19:13,989][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159070_651550720.pth... [2025-01-04 09:19:14,071][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000158278_648306688.pth [2025-01-04 09:19:15,360][134294] Updated weights for policy 0, policy_version 159074 (0.0027) [2025-01-04 09:19:18,577][134294] Updated weights for policy 0, policy_version 159084 (0.0026) [2025-01-04 09:19:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13038.9, 300 sec: 13454.3). Total num frames: 651612160. Throughput: 0: 3288.4. Samples: 152067476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:18,968][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 09:19:21,293][134294] Updated weights for policy 0, policy_version 159094 (0.0018) [2025-01-04 09:19:23,223][134294] Updated weights for policy 0, policy_version 159104 (0.0015) [2025-01-04 09:19:23,967][134211] Fps is (10 sec: 15155.7, 60 sec: 13516.8, 300 sec: 13537.6). Total num frames: 651702272. Throughput: 0: 3401.4. Samples: 152090284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:23,968][134211] Avg episode reward: [(0, '9.550')] [2025-01-04 09:19:25,186][134294] Updated weights for policy 0, policy_version 159114 (0.0014) [2025-01-04 09:19:27,150][134294] Updated weights for policy 0, policy_version 159124 (0.0013) [2025-01-04 09:19:28,968][134211] Fps is (10 sec: 19251.3, 60 sec: 14199.5, 300 sec: 13676.5). Total num frames: 651804672. Throughput: 0: 3695.1. Samples: 152121612. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:28,968][134211] Avg episode reward: [(0, '9.231')] [2025-01-04 09:19:29,419][134294] Updated weights for policy 0, policy_version 159134 (0.0018) [2025-01-04 09:19:32,936][134294] Updated weights for policy 0, policy_version 159144 (0.0030) [2025-01-04 09:19:33,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14199.5, 300 sec: 13648.7). Total num frames: 651866112. Throughput: 0: 3720.4. Samples: 152131934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:33,968][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 09:19:36,047][134294] Updated weights for policy 0, policy_version 159154 (0.0029) [2025-01-04 09:19:38,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.5, 300 sec: 13648.7). Total num frames: 651927552. Throughput: 0: 3702.1. Samples: 152150154. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:38,968][134211] Avg episode reward: [(0, '8.275')] [2025-01-04 09:19:39,475][134294] Updated weights for policy 0, policy_version 159164 (0.0030) [2025-01-04 09:19:42,840][134294] Updated weights for policy 0, policy_version 159174 (0.0027) [2025-01-04 09:19:43,968][134211] Fps is (10 sec: 12287.7, 60 sec: 14199.4, 300 sec: 13620.9). Total num frames: 651988992. Throughput: 0: 3636.8. Samples: 152167974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:43,969][134211] Avg episode reward: [(0, '10.125')] [2025-01-04 09:19:46,428][134294] Updated weights for policy 0, policy_version 159184 (0.0023) [2025-01-04 09:19:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 13523.8). Total num frames: 652046336. Throughput: 0: 3487.2. Samples: 152176862. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:48,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 09:19:49,880][134294] Updated weights for policy 0, policy_version 159194 (0.0027) [2025-01-04 09:19:53,006][134294] Updated weights for policy 0, policy_version 159204 (0.0025) [2025-01-04 09:19:53,968][134211] Fps is (10 sec: 11878.0, 60 sec: 14131.1, 300 sec: 13523.7). Total num frames: 652107776. Throughput: 0: 3454.3. Samples: 152195372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:53,969][134211] Avg episode reward: [(0, '9.130')] [2025-01-04 09:19:56,126][134294] Updated weights for policy 0, policy_version 159214 (0.0025) [2025-01-04 09:19:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.1, 300 sec: 13509.9). Total num frames: 652169216. Throughput: 0: 3449.0. Samples: 152214200. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:19:58,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 09:19:59,745][134294] Updated weights for policy 0, policy_version 159224 (0.0025) [2025-01-04 09:20:03,247][134294] Updated weights for policy 0, policy_version 159234 (0.0026) [2025-01-04 09:20:03,968][134211] Fps is (10 sec: 11878.8, 60 sec: 13448.5, 300 sec: 13482.1). Total num frames: 652226560. Throughput: 0: 3449.5. Samples: 152222704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:20:03,968][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 09:20:06,558][134294] Updated weights for policy 0, policy_version 159244 (0.0028) [2025-01-04 09:20:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13380.3, 300 sec: 13509.9). Total num frames: 652292096. Throughput: 0: 3343.4. Samples: 152240738. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:20:08,968][134211] Avg episode reward: [(0, '8.302')] [2025-01-04 09:20:09,813][134294] Updated weights for policy 0, policy_version 159254 (0.0025) [2025-01-04 09:20:12,138][134294] Updated weights for policy 0, policy_version 159264 (0.0016) [2025-01-04 09:20:13,968][134211] Fps is (10 sec: 15565.2, 60 sec: 13858.2, 300 sec: 13496.0). Total num frames: 652382208. Throughput: 0: 3180.1. Samples: 152264716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:13,968][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 09:20:14,221][134294] Updated weights for policy 0, policy_version 159274 (0.0015) [2025-01-04 09:20:16,211][134294] Updated weights for policy 0, policy_version 159284 (0.0013) [2025-01-04 09:20:18,198][134294] Updated weights for policy 0, policy_version 159294 (0.0012) [2025-01-04 09:20:18,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14540.8, 300 sec: 13551.5). Total num frames: 652484608. Throughput: 0: 3286.7. Samples: 152279834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:18,968][134211] Avg episode reward: [(0, '8.667')] [2025-01-04 09:20:20,096][134294] Updated weights for policy 0, policy_version 159304 (0.0012) [2025-01-04 09:20:21,976][134294] Updated weights for policy 0, policy_version 159314 (0.0015) [2025-01-04 09:20:23,968][134211] Fps is (10 sec: 19250.7, 60 sec: 14540.7, 300 sec: 13662.6). Total num frames: 652574720. Throughput: 0: 3586.7. Samples: 152311554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:23,969][134211] Avg episode reward: [(0, '9.075')] [2025-01-04 09:20:25,006][134294] Updated weights for policy 0, policy_version 159324 (0.0029) [2025-01-04 09:20:28,246][134294] Updated weights for policy 0, policy_version 159334 (0.0029) [2025-01-04 09:20:28,970][134211] Fps is (10 sec: 15561.8, 60 sec: 13925.9, 300 sec: 13690.3). Total num frames: 652640256. Throughput: 0: 3621.1. Samples: 152330932. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:28,970][134211] Avg episode reward: [(0, '8.850')] [2025-01-04 09:20:31,431][134294] Updated weights for policy 0, policy_version 159344 (0.0029) [2025-01-04 09:20:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.6, 300 sec: 13690.4). Total num frames: 652705792. Throughput: 0: 3641.7. Samples: 152340738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:33,968][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 09:20:34,570][134294] Updated weights for policy 0, policy_version 159354 (0.0026) [2025-01-04 09:20:37,601][134294] Updated weights for policy 0, policy_version 159364 (0.0027) [2025-01-04 09:20:38,968][134211] Fps is (10 sec: 13109.9, 60 sec: 14062.9, 300 sec: 13704.3). Total num frames: 652771328. Throughput: 0: 3670.6. Samples: 152360546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:38,968][134211] Avg episode reward: [(0, '9.773')] [2025-01-04 09:20:40,853][134294] Updated weights for policy 0, policy_version 159374 (0.0027) [2025-01-04 09:20:43,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14063.0, 300 sec: 13704.2). Total num frames: 652832768. Throughput: 0: 3678.6. Samples: 152379736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:43,968][134211] Avg episode reward: [(0, '9.650')] [2025-01-04 09:20:44,090][134294] Updated weights for policy 0, policy_version 159384 (0.0028) [2025-01-04 09:20:47,315][134294] Updated weights for policy 0, policy_version 159394 (0.0026) [2025-01-04 09:20:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 13704.2). Total num frames: 652898304. Throughput: 0: 3693.4. Samples: 152388906. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:48,968][134211] Avg episode reward: [(0, '9.875')] [2025-01-04 09:20:50,329][134294] Updated weights for policy 0, policy_version 159404 (0.0026) [2025-01-04 09:20:53,280][134294] Updated weights for policy 0, policy_version 159414 (0.0025) [2025-01-04 09:20:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14336.1, 300 sec: 13732.0). Total num frames: 652967936. Throughput: 0: 3750.8. Samples: 152409526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:53,969][134211] Avg episode reward: [(0, '9.510')] [2025-01-04 09:20:56,258][134294] Updated weights for policy 0, policy_version 159424 (0.0026) [2025-01-04 09:20:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 13676.5). Total num frames: 653033472. Throughput: 0: 3666.5. Samples: 152429708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:20:58,968][134211] Avg episode reward: [(0, '9.894')] [2025-01-04 09:20:59,333][134294] Updated weights for policy 0, policy_version 159434 (0.0025) [2025-01-04 09:21:02,357][134294] Updated weights for policy 0, policy_version 159444 (0.0028) [2025-01-04 09:21:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 13565.4). Total num frames: 653103104. Throughput: 0: 3553.0. Samples: 152439718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:21:03,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 09:21:05,481][134294] Updated weights for policy 0, policy_version 159454 (0.0026) [2025-01-04 09:21:08,644][134294] Updated weights for policy 0, policy_version 159464 (0.0024) [2025-01-04 09:21:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 13551.5). Total num frames: 653164544. Throughput: 0: 3294.4. Samples: 152459804. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:21:08,968][134211] Avg episode reward: [(0, '8.735')] [2025-01-04 09:21:12,182][134294] Updated weights for policy 0, policy_version 159474 (0.0028) [2025-01-04 09:21:13,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 13579.3). Total num frames: 653225984. Throughput: 0: 3256.5. Samples: 152477470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:21:13,969][134211] Avg episode reward: [(0, '9.237')] [2025-01-04 09:21:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159479_653225984.pth... [2025-01-04 09:21:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000158680_649953280.pth [2025-01-04 09:21:15,663][134294] Updated weights for policy 0, policy_version 159484 (0.0028) [2025-01-04 09:21:18,847][134294] Updated weights for policy 0, policy_version 159494 (0.0028) [2025-01-04 09:21:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13380.3, 300 sec: 13579.3). Total num frames: 653287424. Throughput: 0: 3242.1. Samples: 152486632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:18,968][134211] Avg episode reward: [(0, '10.273')] [2025-01-04 09:21:21,776][134294] Updated weights for policy 0, policy_version 159504 (0.0026) [2025-01-04 09:21:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13039.0, 300 sec: 13607.1). Total num frames: 653357056. Throughput: 0: 3246.9. Samples: 152506656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:23,968][134211] Avg episode reward: [(0, '9.752')] [2025-01-04 09:21:24,807][134294] Updated weights for policy 0, policy_version 159514 (0.0026) [2025-01-04 09:21:28,002][134294] Updated weights for policy 0, policy_version 159524 (0.0025) [2025-01-04 09:21:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13039.4, 300 sec: 13620.9). Total num frames: 653422592. 
Throughput: 0: 3264.1. Samples: 152526620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:28,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 09:21:30,948][134294] Updated weights for policy 0, policy_version 159534 (0.0025) [2025-01-04 09:21:33,897][134294] Updated weights for policy 0, policy_version 159544 (0.0027) [2025-01-04 09:21:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13107.2, 300 sec: 13662.6). Total num frames: 653492224. Throughput: 0: 3286.4. Samples: 152536792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:33,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 09:21:36,792][134294] Updated weights for policy 0, policy_version 159554 (0.0026) [2025-01-04 09:21:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13175.5, 300 sec: 13704.3). Total num frames: 653561856. Throughput: 0: 3294.0. Samples: 152557758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:38,968][134211] Avg episode reward: [(0, '8.937')] [2025-01-04 09:21:39,945][134294] Updated weights for policy 0, policy_version 159564 (0.0026) [2025-01-04 09:21:42,507][134294] Updated weights for policy 0, policy_version 159574 (0.0019) [2025-01-04 09:21:43,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13448.6, 300 sec: 13690.4). Total num frames: 653639680. Throughput: 0: 3347.0. Samples: 152580322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:43,968][134211] Avg episode reward: [(0, '8.168')] [2025-01-04 09:21:44,589][134294] Updated weights for policy 0, policy_version 159584 (0.0014) [2025-01-04 09:21:47,049][134294] Updated weights for policy 0, policy_version 159594 (0.0019) [2025-01-04 09:21:48,968][134211] Fps is (10 sec: 15974.2, 60 sec: 13721.6, 300 sec: 13634.8). Total num frames: 653721600. Throughput: 0: 3442.7. Samples: 152594638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:48,968][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 09:21:50,023][134294] Updated weights for policy 0, policy_version 159604 (0.0028) [2025-01-04 09:21:53,155][134294] Updated weights for policy 0, policy_version 159614 (0.0025) [2025-01-04 09:21:53,968][134211] Fps is (10 sec: 14745.3, 60 sec: 13653.3, 300 sec: 13620.9). Total num frames: 653787136. Throughput: 0: 3448.2. Samples: 152614974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:53,968][134211] Avg episode reward: [(0, '9.934')] [2025-01-04 09:21:56,298][134294] Updated weights for policy 0, policy_version 159624 (0.0025) [2025-01-04 09:21:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13585.1, 300 sec: 13634.8). Total num frames: 653848576. Throughput: 0: 3467.4. Samples: 152633502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:21:58,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 09:21:59,860][134294] Updated weights for policy 0, policy_version 159634 (0.0029) [2025-01-04 09:22:03,245][134294] Updated weights for policy 0, policy_version 159644 (0.0023) [2025-01-04 09:22:03,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13516.8, 300 sec: 13634.8). Total num frames: 653914112. Throughput: 0: 3453.0. Samples: 152642016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:22:03,968][134211] Avg episode reward: [(0, '9.894')] [2025-01-04 09:22:05,282][134294] Updated weights for policy 0, policy_version 159654 (0.0013) [2025-01-04 09:22:07,263][134294] Updated weights for policy 0, policy_version 159664 (0.0015) [2025-01-04 09:22:08,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13994.7, 300 sec: 13732.0). Total num frames: 654004224. Throughput: 0: 3603.4. Samples: 152668810. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:22:08,968][134211] Avg episode reward: [(0, '8.971')] [2025-01-04 09:22:10,303][134294] Updated weights for policy 0, policy_version 159674 (0.0024) [2025-01-04 09:22:13,376][134294] Updated weights for policy 0, policy_version 159684 (0.0025) [2025-01-04 09:22:13,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14062.9, 300 sec: 13745.9). Total num frames: 654069760. Throughput: 0: 3604.2. Samples: 152688810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:22:13,969][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 09:22:16,711][134294] Updated weights for policy 0, policy_version 159694 (0.0025) [2025-01-04 09:22:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 13745.9). Total num frames: 654135296. Throughput: 0: 3585.9. Samples: 152698156. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:22:18,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 09:22:19,847][134294] Updated weights for policy 0, policy_version 159704 (0.0026) [2025-01-04 09:22:22,775][134294] Updated weights for policy 0, policy_version 159714 (0.0024) [2025-01-04 09:22:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14062.9, 300 sec: 13773.7). Total num frames: 654200832. Throughput: 0: 3566.6. Samples: 152718256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:23,968][134211] Avg episode reward: [(0, '8.666')] [2025-01-04 09:22:25,794][134294] Updated weights for policy 0, policy_version 159724 (0.0025) [2025-01-04 09:22:28,689][134294] Updated weights for policy 0, policy_version 159734 (0.0023) [2025-01-04 09:22:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 654270464. Throughput: 0: 3525.5. Samples: 152738972. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:28,968][134211] Avg episode reward: [(0, '9.584')] [2025-01-04 09:22:31,719][134294] Updated weights for policy 0, policy_version 159744 (0.0026) [2025-01-04 09:22:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 654340096. Throughput: 0: 3436.4. Samples: 152749278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:33,968][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 09:22:34,610][134294] Updated weights for policy 0, policy_version 159754 (0.0024) [2025-01-04 09:22:37,809][134294] Updated weights for policy 0, policy_version 159764 (0.0021) [2025-01-04 09:22:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14063.0, 300 sec: 13787.6). Total num frames: 654405632. Throughput: 0: 3432.6. Samples: 152769440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:38,968][134211] Avg episode reward: [(0, '9.176')] [2025-01-04 09:22:40,852][134294] Updated weights for policy 0, policy_version 159774 (0.0026) [2025-01-04 09:22:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13858.1, 300 sec: 13746.0). Total num frames: 654471168. Throughput: 0: 3467.4. Samples: 152789536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:43,968][134211] Avg episode reward: [(0, '8.458')] [2025-01-04 09:22:44,076][134294] Updated weights for policy 0, policy_version 159784 (0.0026) [2025-01-04 09:22:47,288][134294] Updated weights for policy 0, policy_version 159794 (0.0027) [2025-01-04 09:22:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13721.7, 300 sec: 13801.4). Total num frames: 654544896. Throughput: 0: 3478.2. Samples: 152798536. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:48,968][134211] Avg episode reward: [(0, '8.507')] [2025-01-04 09:22:49,453][134294] Updated weights for policy 0, policy_version 159804 (0.0014) [2025-01-04 09:22:51,301][134294] Updated weights for policy 0, policy_version 159814 (0.0014) [2025-01-04 09:22:53,190][134294] Updated weights for policy 0, policy_version 159824 (0.0015) [2025-01-04 09:22:53,967][134211] Fps is (10 sec: 18432.2, 60 sec: 14472.6, 300 sec: 13870.9). Total num frames: 654655488. Throughput: 0: 3524.6. Samples: 152827418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:53,968][134211] Avg episode reward: [(0, '8.535')] [2025-01-04 09:22:55,081][134294] Updated weights for policy 0, policy_version 159834 (0.0012) [2025-01-04 09:22:56,977][134294] Updated weights for policy 0, policy_version 159844 (0.0013) [2025-01-04 09:22:58,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15086.9, 300 sec: 13884.7). Total num frames: 654753792. Throughput: 0: 3784.6. Samples: 152859116. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:22:58,968][134211] Avg episode reward: [(0, '9.450')] [2025-01-04 09:22:59,530][134294] Updated weights for policy 0, policy_version 159854 (0.0021) [2025-01-04 09:23:02,653][134294] Updated weights for policy 0, policy_version 159864 (0.0029) [2025-01-04 09:23:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15018.6, 300 sec: 13898.7). Total num frames: 654815232. Throughput: 0: 3802.3. Samples: 152869260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:03,969][134211] Avg episode reward: [(0, '8.197')] [2025-01-04 09:23:05,873][134294] Updated weights for policy 0, policy_version 159874 (0.0028) [2025-01-04 09:23:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14609.0, 300 sec: 13926.4). Total num frames: 654880768. Throughput: 0: 3788.5. Samples: 152888740. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:08,968][134211] Avg episode reward: [(0, '9.296')] [2025-01-04 09:23:08,982][134294] Updated weights for policy 0, policy_version 159884 (0.0027) [2025-01-04 09:23:12,319][134294] Updated weights for policy 0, policy_version 159894 (0.0031) [2025-01-04 09:23:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.8, 300 sec: 13940.3). Total num frames: 654942208. Throughput: 0: 3739.9. Samples: 152907268. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:13,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 09:23:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159899_654946304.pth... [2025-01-04 09:23:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159070_651550720.pth [2025-01-04 09:23:15,658][134294] Updated weights for policy 0, policy_version 159904 (0.0026) [2025-01-04 09:23:18,709][134294] Updated weights for policy 0, policy_version 159914 (0.0025) [2025-01-04 09:23:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.8, 300 sec: 13954.2). Total num frames: 655007744. Throughput: 0: 3715.8. Samples: 152916490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:18,968][134211] Avg episode reward: [(0, '8.245')] [2025-01-04 09:23:21,693][134294] Updated weights for policy 0, policy_version 159924 (0.0025) [2025-01-04 09:23:23,969][134211] Fps is (10 sec: 13515.3, 60 sec: 14608.8, 300 sec: 13981.9). Total num frames: 655077376. Throughput: 0: 3723.9. Samples: 152937022. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:23,969][134211] Avg episode reward: [(0, '8.895')] [2025-01-04 09:23:24,757][134294] Updated weights for policy 0, policy_version 159934 (0.0025) [2025-01-04 09:23:27,808][134294] Updated weights for policy 0, policy_version 159944 (0.0027) [2025-01-04 09:23:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 13995.8). Total num frames: 655142912. Throughput: 0: 3717.6. Samples: 152956826. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:28,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 09:23:30,871][134294] Updated weights for policy 0, policy_version 159954 (0.0023) [2025-01-04 09:23:33,719][134294] Updated weights for policy 0, policy_version 159964 (0.0025) [2025-01-04 09:23:33,968][134211] Fps is (10 sec: 13518.2, 60 sec: 14540.8, 300 sec: 14023.6). Total num frames: 655212544. Throughput: 0: 3747.5. Samples: 152967174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:33,968][134211] Avg episode reward: [(0, '9.641')] [2025-01-04 09:23:36,711][134294] Updated weights for policy 0, policy_version 159974 (0.0024) [2025-01-04 09:23:38,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14540.6, 300 sec: 14037.4). Total num frames: 655278080. Throughput: 0: 3569.2. Samples: 152988036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:38,969][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 09:23:40,071][134294] Updated weights for policy 0, policy_version 159984 (0.0025) [2025-01-04 09:23:43,308][134294] Updated weights for policy 0, policy_version 159994 (0.0027) [2025-01-04 09:23:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14051.4). Total num frames: 655343616. Throughput: 0: 3282.7. Samples: 153006836. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:43,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 09:23:46,579][134294] Updated weights for policy 0, policy_version 160004 (0.0023) [2025-01-04 09:23:48,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14335.9, 300 sec: 14051.4). Total num frames: 655405056. Throughput: 0: 3263.9. Samples: 153016138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:48,969][134211] Avg episode reward: [(0, '9.133')] [2025-01-04 09:23:49,593][134294] Updated weights for policy 0, policy_version 160014 (0.0025) [2025-01-04 09:23:52,592][134294] Updated weights for policy 0, policy_version 160024 (0.0025) [2025-01-04 09:23:53,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13653.3, 300 sec: 14023.6). Total num frames: 655474688. Throughput: 0: 3279.0. Samples: 153036294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:53,968][134211] Avg episode reward: [(0, '8.692')] [2025-01-04 09:23:55,654][134294] Updated weights for policy 0, policy_version 160034 (0.0026) [2025-01-04 09:23:58,864][134294] Updated weights for policy 0, policy_version 160044 (0.0025) [2025-01-04 09:23:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13107.2, 300 sec: 13968.1). Total num frames: 655540224. Throughput: 0: 3307.6. Samples: 153056112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:23:58,968][134211] Avg episode reward: [(0, '10.280')] [2025-01-04 09:24:02,402][134294] Updated weights for policy 0, policy_version 160054 (0.0029) [2025-01-04 09:24:03,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13107.2, 300 sec: 13940.3). Total num frames: 655601664. Throughput: 0: 3299.0. Samples: 153064948. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:03,968][134211] Avg episode reward: [(0, '9.424')] [2025-01-04 09:24:04,849][134294] Updated weights for policy 0, policy_version 160064 (0.0016) [2025-01-04 09:24:07,472][134294] Updated weights for policy 0, policy_version 160074 (0.0020) [2025-01-04 09:24:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13312.0, 300 sec: 13995.8). Total num frames: 655679488. Throughput: 0: 3357.2. Samples: 153088092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:08,968][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 09:24:10,622][134294] Updated weights for policy 0, policy_version 160084 (0.0027) [2025-01-04 09:24:13,868][134294] Updated weights for policy 0, policy_version 160094 (0.0027) [2025-01-04 09:24:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13380.3, 300 sec: 14009.7). Total num frames: 655745024. Throughput: 0: 3346.0. Samples: 153107398. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:13,968][134211] Avg episode reward: [(0, '9.706')] [2025-01-04 09:24:17,023][134294] Updated weights for policy 0, policy_version 160104 (0.0025) [2025-01-04 09:24:18,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13380.2, 300 sec: 13926.4). Total num frames: 655810560. Throughput: 0: 3329.8. Samples: 153117014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:18,969][134211] Avg episode reward: [(0, '9.154')] [2025-01-04 09:24:20,013][134294] Updated weights for policy 0, policy_version 160114 (0.0026) [2025-01-04 09:24:22,876][134294] Updated weights for policy 0, policy_version 160124 (0.0025) [2025-01-04 09:24:23,970][134211] Fps is (10 sec: 14333.3, 60 sec: 13516.6, 300 sec: 13843.0). Total num frames: 655888384. Throughput: 0: 3326.5. Samples: 153137732. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:23,970][134211] Avg episode reward: [(0, '10.064')] [2025-01-04 09:24:24,777][134294] Updated weights for policy 0, policy_version 160134 (0.0014) [2025-01-04 09:24:27,531][134294] Updated weights for policy 0, policy_version 160144 (0.0021) [2025-01-04 09:24:28,968][134211] Fps is (10 sec: 15565.4, 60 sec: 13721.6, 300 sec: 13898.6). Total num frames: 655966208. Throughput: 0: 3466.0. Samples: 153162806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:24:28,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 09:24:30,545][134294] Updated weights for policy 0, policy_version 160154 (0.0026) [2025-01-04 09:24:33,415][134294] Updated weights for policy 0, policy_version 160164 (0.0024) [2025-01-04 09:24:33,968][134211] Fps is (10 sec: 14748.3, 60 sec: 13721.6, 300 sec: 13926.4). Total num frames: 656035840. Throughput: 0: 3490.8. Samples: 153173224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:33,969][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 09:24:36,402][134294] Updated weights for policy 0, policy_version 160174 (0.0024) [2025-01-04 09:24:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13721.8, 300 sec: 13940.3). Total num frames: 656101376. Throughput: 0: 3498.0. Samples: 153193702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:38,968][134211] Avg episode reward: [(0, '8.290')] [2025-01-04 09:24:39,771][134294] Updated weights for policy 0, policy_version 160184 (0.0028) [2025-01-04 09:24:43,119][134294] Updated weights for policy 0, policy_version 160194 (0.0025) [2025-01-04 09:24:43,969][134211] Fps is (10 sec: 12696.7, 60 sec: 13653.2, 300 sec: 13954.1). Total num frames: 656162816. Throughput: 0: 3466.0. Samples: 153212086. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:43,969][134211] Avg episode reward: [(0, '9.640')] [2025-01-04 09:24:45,879][134294] Updated weights for policy 0, policy_version 160204 (0.0021) [2025-01-04 09:24:47,881][134294] Updated weights for policy 0, policy_version 160214 (0.0013) [2025-01-04 09:24:48,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.6, 300 sec: 14065.3). Total num frames: 656257024. Throughput: 0: 3521.5. Samples: 153223416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:48,968][134211] Avg episode reward: [(0, '9.700')] [2025-01-04 09:24:49,710][134294] Updated weights for policy 0, policy_version 160224 (0.0013) [2025-01-04 09:24:51,607][134294] Updated weights for policy 0, policy_version 160234 (0.0013) [2025-01-04 09:24:53,499][134294] Updated weights for policy 0, policy_version 160244 (0.0014) [2025-01-04 09:24:53,968][134211] Fps is (10 sec: 20481.7, 60 sec: 14882.1, 300 sec: 14231.9). Total num frames: 656367616. Throughput: 0: 3728.8. Samples: 153255888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:53,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 09:24:55,611][134294] Updated weights for policy 0, policy_version 160254 (0.0015) [2025-01-04 09:24:58,842][134294] Updated weights for policy 0, policy_version 160264 (0.0024) [2025-01-04 09:24:58,968][134211] Fps is (10 sec: 18431.4, 60 sec: 15018.6, 300 sec: 14287.4). Total num frames: 656441344. Throughput: 0: 3871.6. Samples: 153281620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:24:58,969][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 09:25:02,022][134294] Updated weights for policy 0, policy_version 160274 (0.0027) [2025-01-04 09:25:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15018.7, 300 sec: 14273.5). Total num frames: 656502784. Throughput: 0: 3866.3. Samples: 153290996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:03,969][134211] Avg episode reward: [(0, '9.557')] [2025-01-04 09:25:05,163][134294] Updated weights for policy 0, policy_version 160284 (0.0028) [2025-01-04 09:25:08,251][134294] Updated weights for policy 0, policy_version 160294 (0.0027) [2025-01-04 09:25:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14882.1, 300 sec: 14204.1). Total num frames: 656572416. Throughput: 0: 3849.5. Samples: 153310952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:08,968][134211] Avg episode reward: [(0, '8.017')] [2025-01-04 09:25:11,462][134294] Updated weights for policy 0, policy_version 160304 (0.0029) [2025-01-04 09:25:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14813.8, 300 sec: 14065.2). Total num frames: 656633856. Throughput: 0: 3704.2. Samples: 153329498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:13,969][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 09:25:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000160311_656633856.pth... [2025-01-04 09:25:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159479_653225984.pth [2025-01-04 09:25:14,895][134294] Updated weights for policy 0, policy_version 160314 (0.0025) [2025-01-04 09:25:18,211][134294] Updated weights for policy 0, policy_version 160324 (0.0027) [2025-01-04 09:25:18,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14745.7, 300 sec: 13968.1). Total num frames: 656695296. Throughput: 0: 3674.2. Samples: 153338562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:18,968][134211] Avg episode reward: [(0, '10.033')] [2025-01-04 09:25:21,380][134294] Updated weights for policy 0, policy_version 160334 (0.0023) [2025-01-04 09:25:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14541.2, 300 sec: 13968.1). Total num frames: 656760832. 
Throughput: 0: 3643.3. Samples: 153357652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:23,969][134211] Avg episode reward: [(0, '9.342')] [2025-01-04 09:25:24,497][134294] Updated weights for policy 0, policy_version 160344 (0.0022) [2025-01-04 09:25:27,655][134294] Updated weights for policy 0, policy_version 160354 (0.0023) [2025-01-04 09:25:28,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14267.7, 300 sec: 13954.2). Total num frames: 656822272. Throughput: 0: 3668.0. Samples: 153377142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:28,968][134211] Avg episode reward: [(0, '10.652')] [2025-01-04 09:25:30,713][134294] Updated weights for policy 0, policy_version 160364 (0.0026) [2025-01-04 09:25:33,634][134294] Updated weights for policy 0, policy_version 160374 (0.0024) [2025-01-04 09:25:33,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14267.8, 300 sec: 13968.1). Total num frames: 656891904. Throughput: 0: 3648.0. Samples: 153387578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:33,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 09:25:36,651][134294] Updated weights for policy 0, policy_version 160384 (0.0024) [2025-01-04 09:25:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14336.0, 300 sec: 13995.8). Total num frames: 656961536. Throughput: 0: 3387.3. Samples: 153408318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:38,968][134211] Avg episode reward: [(0, '8.840')] [2025-01-04 09:25:39,850][134294] Updated weights for policy 0, policy_version 160394 (0.0025) [2025-01-04 09:25:43,269][134294] Updated weights for policy 0, policy_version 160404 (0.0028) [2025-01-04 09:25:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.2, 300 sec: 13981.9). Total num frames: 657022976. Throughput: 0: 3226.1. Samples: 153426796. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:43,968][134211] Avg episode reward: [(0, '8.133')] [2025-01-04 09:25:46,468][134294] Updated weights for policy 0, policy_version 160414 (0.0027) [2025-01-04 09:25:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13789.8, 300 sec: 13954.2). Total num frames: 657084416. Throughput: 0: 3223.4. Samples: 153436050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:48,968][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 09:25:49,668][134294] Updated weights for policy 0, policy_version 160424 (0.0025) [2025-01-04 09:25:52,582][134294] Updated weights for policy 0, policy_version 160434 (0.0024) [2025-01-04 09:25:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13968.1). Total num frames: 657154048. Throughput: 0: 3229.2. Samples: 153456268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:53,968][134211] Avg episode reward: [(0, '10.272')] [2025-01-04 09:25:55,506][134294] Updated weights for policy 0, policy_version 160444 (0.0024) [2025-01-04 09:25:58,829][134294] Updated weights for policy 0, policy_version 160454 (0.0024) [2025-01-04 09:25:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12970.7, 300 sec: 13954.2). Total num frames: 657219584. Throughput: 0: 3259.8. Samples: 153476186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:25:58,968][134211] Avg episode reward: [(0, '8.218')] [2025-01-04 09:26:01,146][134294] Updated weights for policy 0, policy_version 160464 (0.0016) [2025-01-04 09:26:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13175.5, 300 sec: 13995.8). Total num frames: 657293312. Throughput: 0: 3325.3. Samples: 153488200. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:03,968][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 09:26:04,670][134294] Updated weights for policy 0, policy_version 160474 (0.0026) [2025-01-04 09:26:07,186][134294] Updated weights for policy 0, policy_version 160484 (0.0018) [2025-01-04 09:26:08,968][134211] Fps is (10 sec: 15974.6, 60 sec: 13448.6, 300 sec: 14079.1). Total num frames: 657379328. Throughput: 0: 3357.5. Samples: 153508736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:08,968][134211] Avg episode reward: [(0, '9.849')] [2025-01-04 09:26:09,132][134294] Updated weights for policy 0, policy_version 160494 (0.0014) [2025-01-04 09:26:11,941][134294] Updated weights for policy 0, policy_version 160504 (0.0021) [2025-01-04 09:26:13,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13516.8, 300 sec: 14093.0). Total num frames: 657444864. Throughput: 0: 3459.1. Samples: 153532804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:13,969][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 09:26:15,511][134294] Updated weights for policy 0, policy_version 160514 (0.0026) [2025-01-04 09:26:18,830][134294] Updated weights for policy 0, policy_version 160524 (0.0025) [2025-01-04 09:26:18,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13516.8, 300 sec: 14065.2). Total num frames: 657506304. Throughput: 0: 3414.9. Samples: 153541248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:18,968][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 09:26:21,680][134294] Updated weights for policy 0, policy_version 160534 (0.0024) [2025-01-04 09:26:23,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13585.2, 300 sec: 14079.1). Total num frames: 657575936. Throughput: 0: 3403.2. Samples: 153561460. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:23,968][134211] Avg episode reward: [(0, '9.177')] [2025-01-04 09:26:24,943][134294] Updated weights for policy 0, policy_version 160544 (0.0027) [2025-01-04 09:26:27,810][134294] Updated weights for policy 0, policy_version 160554 (0.0027) [2025-01-04 09:26:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.4, 300 sec: 14065.2). Total num frames: 657641472. Throughput: 0: 3440.6. Samples: 153581622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:28,968][134211] Avg episode reward: [(0, '9.617')] [2025-01-04 09:26:30,826][134294] Updated weights for policy 0, policy_version 160564 (0.0026) [2025-01-04 09:26:33,755][134294] Updated weights for policy 0, policy_version 160574 (0.0027) [2025-01-04 09:26:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.3, 300 sec: 14065.3). Total num frames: 657711104. Throughput: 0: 3462.2. Samples: 153591848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:26:33,968][134211] Avg episode reward: [(0, '8.260')] [2025-01-04 09:26:36,804][134294] Updated weights for policy 0, policy_version 160584 (0.0024) [2025-01-04 09:26:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 14023.6). Total num frames: 657776640. Throughput: 0: 3474.7. Samples: 153612628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:26:38,968][134211] Avg episode reward: [(0, '9.004')] [2025-01-04 09:26:40,108][134294] Updated weights for policy 0, policy_version 160594 (0.0024) [2025-01-04 09:26:43,508][134294] Updated weights for policy 0, policy_version 160604 (0.0026) [2025-01-04 09:26:43,968][134211] Fps is (10 sec: 12697.0, 60 sec: 13585.0, 300 sec: 13954.2). Total num frames: 657838080. Throughput: 0: 3436.3. Samples: 153630822. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:26:43,969][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 09:26:46,033][134294] Updated weights for policy 0, policy_version 160614 (0.0014) [2025-01-04 09:26:48,008][134294] Updated weights for policy 0, policy_version 160624 (0.0012) [2025-01-04 09:26:48,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14131.2, 300 sec: 14051.4). Total num frames: 657932288. Throughput: 0: 3434.5. Samples: 153642752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:26:48,968][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 09:26:50,366][134294] Updated weights for policy 0, policy_version 160634 (0.0021) [2025-01-04 09:26:53,402][134294] Updated weights for policy 0, policy_version 160644 (0.0026) [2025-01-04 09:26:53,968][134211] Fps is (10 sec: 16384.5, 60 sec: 14131.2, 300 sec: 14079.1). Total num frames: 658001920. Throughput: 0: 3549.8. Samples: 153668476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:26:53,968][134211] Avg episode reward: [(0, '9.339')] [2025-01-04 09:26:56,426][134294] Updated weights for policy 0, policy_version 160654 (0.0026) [2025-01-04 09:26:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14131.2, 300 sec: 14079.1). Total num frames: 658067456. Throughput: 0: 3456.9. Samples: 153688362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:26:58,968][134211] Avg episode reward: [(0, '9.135')] [2025-01-04 09:26:59,598][134294] Updated weights for policy 0, policy_version 160664 (0.0026) [2025-01-04 09:27:02,711][134294] Updated weights for policy 0, policy_version 160674 (0.0025) [2025-01-04 09:27:03,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13994.7, 300 sec: 13995.8). Total num frames: 658132992. Throughput: 0: 3493.6. Samples: 153698458. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:03,970][134211] Avg episode reward: [(0, '8.156')] [2025-01-04 09:27:06,146][134294] Updated weights for policy 0, policy_version 160684 (0.0024) [2025-01-04 09:27:08,968][134211] Fps is (10 sec: 12696.7, 60 sec: 13584.9, 300 sec: 13981.9). Total num frames: 658194432. Throughput: 0: 3450.2. Samples: 153716720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:08,969][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 09:27:09,728][134294] Updated weights for policy 0, policy_version 160694 (0.0023) [2025-01-04 09:27:12,463][134294] Updated weights for policy 0, policy_version 160704 (0.0020) [2025-01-04 09:27:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.7, 300 sec: 14009.7). Total num frames: 658268160. Throughput: 0: 3459.3. Samples: 153737292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:13,968][134211] Avg episode reward: [(0, '8.597')] [2025-01-04 09:27:14,023][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000160711_658272256.pth... [2025-01-04 09:27:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000159899_654946304.pth [2025-01-04 09:27:14,681][134294] Updated weights for policy 0, policy_version 160714 (0.0014) [2025-01-04 09:27:16,833][134294] Updated weights for policy 0, policy_version 160724 (0.0013) [2025-01-04 09:27:18,968][134211] Fps is (10 sec: 15975.5, 60 sec: 14131.2, 300 sec: 14079.1). Total num frames: 658354176. Throughput: 0: 3548.6. Samples: 153751536. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:18,968][134211] Avg episode reward: [(0, '8.941')] [2025-01-04 09:27:19,718][134294] Updated weights for policy 0, policy_version 160734 (0.0021) [2025-01-04 09:27:23,397][134294] Updated weights for policy 0, policy_version 160744 (0.0029) [2025-01-04 09:27:23,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13926.4, 300 sec: 14037.5). Total num frames: 658411520. Throughput: 0: 3537.5. Samples: 153771814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:23,968][134211] Avg episode reward: [(0, '9.012')] [2025-01-04 09:27:26,759][134294] Updated weights for policy 0, policy_version 160754 (0.0026) [2025-01-04 09:27:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13858.1, 300 sec: 14009.7). Total num frames: 658472960. Throughput: 0: 3523.7. Samples: 153789386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:28,969][134211] Avg episode reward: [(0, '8.925')] [2025-01-04 09:27:30,368][134294] Updated weights for policy 0, policy_version 160764 (0.0026) [2025-01-04 09:27:33,680][134294] Updated weights for policy 0, policy_version 160774 (0.0026) [2025-01-04 09:27:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13653.3, 300 sec: 13981.9). Total num frames: 658530304. Throughput: 0: 3448.0. Samples: 153797912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:33,968][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 09:27:37,063][134294] Updated weights for policy 0, policy_version 160784 (0.0025) [2025-01-04 09:27:38,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13585.1, 300 sec: 13968.1). Total num frames: 658591744. Throughput: 0: 3287.4. Samples: 153816410. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:27:38,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 09:27:40,492][134294] Updated weights for policy 0, policy_version 160794 (0.0026) [2025-01-04 09:27:43,704][134294] Updated weights for policy 0, policy_version 160804 (0.0025) [2025-01-04 09:27:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13585.1, 300 sec: 13926.4). Total num frames: 658653184. Throughput: 0: 3259.9. Samples: 153835058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:27:43,968][134211] Avg episode reward: [(0, '9.773')] [2025-01-04 09:27:46,697][134294] Updated weights for policy 0, policy_version 160814 (0.0021) [2025-01-04 09:27:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 13801.4). Total num frames: 658726912. Throughput: 0: 3240.9. Samples: 153844298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:27:48,968][134211] Avg episode reward: [(0, '9.501')] [2025-01-04 09:27:49,340][134294] Updated weights for policy 0, policy_version 160824 (0.0022) [2025-01-04 09:27:52,628][134294] Updated weights for policy 0, policy_version 160834 (0.0027) [2025-01-04 09:27:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13107.2, 300 sec: 13676.5). Total num frames: 658788352. Throughput: 0: 3310.4. Samples: 153865684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:27:53,968][134211] Avg episode reward: [(0, '9.731')] [2025-01-04 09:27:56,115][134294] Updated weights for policy 0, policy_version 160844 (0.0023) [2025-01-04 09:27:58,287][134294] Updated weights for policy 0, policy_version 160854 (0.0012) [2025-01-04 09:27:58,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 658870272. Throughput: 0: 3331.6. Samples: 153887216. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:27:58,968][134211] Avg episode reward: [(0, '9.312')] [2025-01-04 09:28:00,505][134294] Updated weights for policy 0, policy_version 160864 (0.0013) [2025-01-04 09:28:02,924][134294] Updated weights for policy 0, policy_version 160874 (0.0015) [2025-01-04 09:28:03,968][134211] Fps is (10 sec: 15974.3, 60 sec: 13585.0, 300 sec: 13787.5). Total num frames: 658948096. Throughput: 0: 3325.2. Samples: 153901170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:03,969][134211] Avg episode reward: [(0, '8.503')] [2025-01-04 09:28:06,434][134294] Updated weights for policy 0, policy_version 160884 (0.0029) [2025-01-04 09:28:08,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13585.2, 300 sec: 13787.5). Total num frames: 659009536. Throughput: 0: 3304.3. Samples: 153920508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:08,968][134211] Avg episode reward: [(0, '9.752')] [2025-01-04 09:28:09,924][134294] Updated weights for policy 0, policy_version 160894 (0.0026) [2025-01-04 09:28:13,287][134294] Updated weights for policy 0, policy_version 160904 (0.0026) [2025-01-04 09:28:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13311.9, 300 sec: 13759.8). Total num frames: 659066880. Throughput: 0: 3315.1. Samples: 153938566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:13,969][134211] Avg episode reward: [(0, '9.839')] [2025-01-04 09:28:16,929][134294] Updated weights for policy 0, policy_version 160914 (0.0027) [2025-01-04 09:28:18,968][134211] Fps is (10 sec: 11468.8, 60 sec: 12834.1, 300 sec: 13718.2). Total num frames: 659124224. Throughput: 0: 3315.6. Samples: 153947116. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:18,968][134211] Avg episode reward: [(0, '8.422')] [2025-01-04 09:28:20,406][134294] Updated weights for policy 0, policy_version 160924 (0.0025) [2025-01-04 09:28:23,825][134294] Updated weights for policy 0, policy_version 160934 (0.0025) [2025-01-04 09:28:23,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12902.4, 300 sec: 13704.2). Total num frames: 659185664. Throughput: 0: 3294.5. Samples: 153964662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:23,968][134211] Avg episode reward: [(0, '8.999')] [2025-01-04 09:28:26,932][134294] Updated weights for policy 0, policy_version 160944 (0.0023) [2025-01-04 09:28:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12970.7, 300 sec: 13690.4). Total num frames: 659251200. Throughput: 0: 3297.4. Samples: 153983442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:28,968][134211] Avg episode reward: [(0, '9.323')] [2025-01-04 09:28:30,378][134294] Updated weights for policy 0, policy_version 160954 (0.0025) [2025-01-04 09:28:33,628][134294] Updated weights for policy 0, policy_version 160964 (0.0023) [2025-01-04 09:28:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13038.9, 300 sec: 13676.5). Total num frames: 659312640. Throughput: 0: 3296.1. Samples: 153992624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:33,968][134211] Avg episode reward: [(0, '8.503')] [2025-01-04 09:28:36,872][134294] Updated weights for policy 0, policy_version 160974 (0.0025) [2025-01-04 09:28:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12970.7, 300 sec: 13648.7). Total num frames: 659369984. Throughput: 0: 3233.2. Samples: 154011180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:38,968][134211] Avg episode reward: [(0, '8.348')] [2025-01-04 09:28:40,440][134294] Updated weights for policy 0, policy_version 160984 (0.0026) [2025-01-04 09:28:43,570][134294] Updated weights for policy 0, policy_version 160994 (0.0023) [2025-01-04 09:28:43,967][134211] Fps is (10 sec: 12288.2, 60 sec: 13039.0, 300 sec: 13662.6). Total num frames: 659435520. Throughput: 0: 3155.0. Samples: 154029192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:43,968][134211] Avg episode reward: [(0, '9.220')] [2025-01-04 09:28:46,168][134294] Updated weights for policy 0, policy_version 161004 (0.0019) [2025-01-04 09:28:48,968][134211] Fps is (10 sec: 13925.8, 60 sec: 13038.8, 300 sec: 13676.5). Total num frames: 659509248. Throughput: 0: 3114.7. Samples: 154041332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:48,969][134211] Avg episode reward: [(0, '8.339')] [2025-01-04 09:28:49,300][134294] Updated weights for policy 0, policy_version 161014 (0.0023) [2025-01-04 09:28:52,882][134294] Updated weights for policy 0, policy_version 161024 (0.0026) [2025-01-04 09:28:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 12970.7, 300 sec: 13648.7). Total num frames: 659566592. Throughput: 0: 3094.0. Samples: 154059740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:53,968][134211] Avg episode reward: [(0, '9.171')] [2025-01-04 09:28:55,571][134294] Updated weights for policy 0, policy_version 161034 (0.0019) [2025-01-04 09:28:57,634][134294] Updated weights for policy 0, policy_version 161044 (0.0013) [2025-01-04 09:28:58,967][134211] Fps is (10 sec: 15156.2, 60 sec: 13175.5, 300 sec: 13759.8). Total num frames: 659660800. Throughput: 0: 3261.2. Samples: 154085318. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:28:58,968][134211] Avg episode reward: [(0, '9.164')] [2025-01-04 09:28:59,640][134294] Updated weights for policy 0, policy_version 161054 (0.0013) [2025-01-04 09:29:01,685][134294] Updated weights for policy 0, policy_version 161064 (0.0012) [2025-01-04 09:29:03,824][134294] Updated weights for policy 0, policy_version 161074 (0.0013) [2025-01-04 09:29:03,968][134211] Fps is (10 sec: 19251.4, 60 sec: 13516.8, 300 sec: 13829.2). Total num frames: 659759104. Throughput: 0: 3401.9. Samples: 154100202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:03,968][134211] Avg episode reward: [(0, '9.084')] [2025-01-04 09:29:06,797][134294] Updated weights for policy 0, policy_version 161084 (0.0026) [2025-01-04 09:29:08,968][134211] Fps is (10 sec: 15973.6, 60 sec: 13516.8, 300 sec: 13815.3). Total num frames: 659820544. Throughput: 0: 3549.3. Samples: 154124382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:08,969][134211] Avg episode reward: [(0, '8.128')] [2025-01-04 09:29:10,688][134294] Updated weights for policy 0, policy_version 161094 (0.0034) [2025-01-04 09:29:13,968][134211] Fps is (10 sec: 11877.8, 60 sec: 13516.7, 300 sec: 13787.6). Total num frames: 659877888. Throughput: 0: 3489.7. Samples: 154140482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:13,969][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 09:29:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161103_659877888.pth... 
[2025-01-04 09:29:14,070][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000160311_656633856.pth [2025-01-04 09:29:14,425][134294] Updated weights for policy 0, policy_version 161104 (0.0029) [2025-01-04 09:29:17,919][134294] Updated weights for policy 0, policy_version 161114 (0.0029) [2025-01-04 09:29:18,968][134211] Fps is (10 sec: 11469.2, 60 sec: 13516.8, 300 sec: 13718.2). Total num frames: 659935232. Throughput: 0: 3467.9. Samples: 154148678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:18,968][134211] Avg episode reward: [(0, '9.025')] [2025-01-04 09:29:21,240][134294] Updated weights for policy 0, policy_version 161124 (0.0025) [2025-01-04 09:29:23,968][134211] Fps is (10 sec: 11469.2, 60 sec: 13448.5, 300 sec: 13648.7). Total num frames: 659992576. Throughput: 0: 3463.2. Samples: 154167026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:23,969][134211] Avg episode reward: [(0, '7.709')] [2025-01-04 09:29:24,728][134294] Updated weights for policy 0, policy_version 161134 (0.0027) [2025-01-04 09:29:28,123][134294] Updated weights for policy 0, policy_version 161144 (0.0026) [2025-01-04 09:29:28,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13380.3, 300 sec: 13620.9). Total num frames: 660054016. Throughput: 0: 3455.5. Samples: 154184692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:28,968][134211] Avg episode reward: [(0, '9.032')] [2025-01-04 09:29:31,686][134294] Updated weights for policy 0, policy_version 161154 (0.0028) [2025-01-04 09:29:33,969][134211] Fps is (10 sec: 11877.3, 60 sec: 13311.8, 300 sec: 13593.1). Total num frames: 660111360. Throughput: 0: 3385.1. Samples: 154193662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:33,969][134211] Avg episode reward: [(0, '8.551')] [2025-01-04 09:29:34,987][134294] Updated weights for policy 0, policy_version 161164 (0.0026) [2025-01-04 09:29:38,369][134294] Updated weights for policy 0, policy_version 161174 (0.0023) [2025-01-04 09:29:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13380.3, 300 sec: 13593.2). Total num frames: 660172800. Throughput: 0: 3376.9. Samples: 154211700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:38,968][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 09:29:41,632][134294] Updated weights for policy 0, policy_version 161184 (0.0023) [2025-01-04 09:29:43,971][134211] Fps is (10 sec: 12285.7, 60 sec: 13311.3, 300 sec: 13481.9). Total num frames: 660234240. Throughput: 0: 3214.9. Samples: 154230000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:43,971][134211] Avg episode reward: [(0, '8.840')] [2025-01-04 09:29:45,123][134294] Updated weights for policy 0, policy_version 161194 (0.0027) [2025-01-04 09:29:48,604][134294] Updated weights for policy 0, policy_version 161204 (0.0027) [2025-01-04 09:29:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13107.3, 300 sec: 13315.5). Total num frames: 660295680. Throughput: 0: 3076.7. Samples: 154238652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:29:48,968][134211] Avg episode reward: [(0, '9.538')] [2025-01-04 09:29:51,664][134294] Updated weights for policy 0, policy_version 161214 (0.0027) [2025-01-04 09:29:53,968][134211] Fps is (10 sec: 12291.5, 60 sec: 13175.5, 300 sec: 13273.8). Total num frames: 660357120. Throughput: 0: 2959.6. Samples: 154257564. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:29:53,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 09:29:55,071][134294] Updated weights for policy 0, policy_version 161224 (0.0027) [2025-01-04 09:29:58,395][134294] Updated weights for policy 0, policy_version 161234 (0.0024) [2025-01-04 09:29:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12697.6, 300 sec: 13287.7). Total num frames: 660422656. Throughput: 0: 3020.3. Samples: 154276392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:29:58,968][134211] Avg episode reward: [(0, '9.471')] [2025-01-04 09:30:00,651][134294] Updated weights for policy 0, policy_version 161244 (0.0012) [2025-01-04 09:30:03,033][134294] Updated weights for policy 0, policy_version 161254 (0.0016) [2025-01-04 09:30:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 12424.5, 300 sec: 13329.4). Total num frames: 660504576. Throughput: 0: 3133.0. Samples: 154289662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:03,968][134211] Avg episode reward: [(0, '9.363')] [2025-01-04 09:30:06,744][134294] Updated weights for policy 0, policy_version 161264 (0.0024) [2025-01-04 09:30:08,968][134211] Fps is (10 sec: 14335.7, 60 sec: 12424.6, 300 sec: 13329.4). Total num frames: 660566016. Throughput: 0: 3154.9. Samples: 154308998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:08,968][134211] Avg episode reward: [(0, '9.116')] [2025-01-04 09:30:10,070][134294] Updated weights for policy 0, policy_version 161274 (0.0028) [2025-01-04 09:30:12,925][134294] Updated weights for policy 0, policy_version 161284 (0.0018) [2025-01-04 09:30:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12629.5, 300 sec: 13357.1). Total num frames: 660635648. Throughput: 0: 3210.0. Samples: 154329142. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:13,968][134211] Avg episode reward: [(0, '8.824')] [2025-01-04 09:30:15,901][134294] Updated weights for policy 0, policy_version 161294 (0.0019) [2025-01-04 09:30:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12697.6, 300 sec: 13343.3). Total num frames: 660697088. Throughput: 0: 3234.1. Samples: 154339192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:18,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 09:30:19,195][134294] Updated weights for policy 0, policy_version 161304 (0.0026) [2025-01-04 09:30:22,654][134294] Updated weights for policy 0, policy_version 161314 (0.0026) [2025-01-04 09:30:23,968][134211] Fps is (10 sec: 12287.7, 60 sec: 12765.9, 300 sec: 13343.2). Total num frames: 660758528. Throughput: 0: 3241.5. Samples: 154357566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:23,969][134211] Avg episode reward: [(0, '9.107')] [2025-01-04 09:30:25,796][134294] Updated weights for policy 0, policy_version 161324 (0.0024) [2025-01-04 09:30:28,906][134294] Updated weights for policy 0, policy_version 161334 (0.0024) [2025-01-04 09:30:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12834.1, 300 sec: 13329.4). Total num frames: 660824064. Throughput: 0: 3265.3. Samples: 154376928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:28,968][134211] Avg episode reward: [(0, '9.151')] [2025-01-04 09:30:31,938][134294] Updated weights for policy 0, policy_version 161344 (0.0028) [2025-01-04 09:30:33,969][134211] Fps is (10 sec: 13105.9, 60 sec: 12970.7, 300 sec: 13315.4). Total num frames: 660889600. Throughput: 0: 3295.1. Samples: 154386936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:33,969][134211] Avg episode reward: [(0, '9.246')] [2025-01-04 09:30:34,975][134294] Updated weights for policy 0, policy_version 161354 (0.0024) [2025-01-04 09:30:38,061][134294] Updated weights for policy 0, policy_version 161364 (0.0025) [2025-01-04 09:30:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13039.0, 300 sec: 13329.4). Total num frames: 660955136. Throughput: 0: 3322.6. Samples: 154407080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:38,968][134211] Avg episode reward: [(0, '9.172')] [2025-01-04 09:30:41,200][134294] Updated weights for policy 0, policy_version 161374 (0.0024) [2025-01-04 09:30:43,970][134211] Fps is (10 sec: 12696.3, 60 sec: 13039.1, 300 sec: 13329.3). Total num frames: 661016576. Throughput: 0: 3322.0. Samples: 154425888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:43,970][134211] Avg episode reward: [(0, '9.221')] [2025-01-04 09:30:44,749][134294] Updated weights for policy 0, policy_version 161384 (0.0026) [2025-01-04 09:30:47,182][134294] Updated weights for policy 0, policy_version 161394 (0.0015) [2025-01-04 09:30:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13516.8, 300 sec: 13398.8). Total num frames: 661106688. Throughput: 0: 3244.0. Samples: 154435642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:48,968][134211] Avg episode reward: [(0, '9.244')] [2025-01-04 09:30:49,110][134294] Updated weights for policy 0, policy_version 161404 (0.0014) [2025-01-04 09:30:50,946][134294] Updated weights for policy 0, policy_version 161414 (0.0013) [2025-01-04 09:30:53,306][134294] Updated weights for policy 0, policy_version 161424 (0.0020) [2025-01-04 09:30:53,968][134211] Fps is (10 sec: 18026.0, 60 sec: 13994.6, 300 sec: 13482.1). Total num frames: 661196800. Throughput: 0: 3515.0. Samples: 154467172. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:30:53,968][134211] Avg episode reward: [(0, '8.517')] [2025-01-04 09:30:56,636][134294] Updated weights for policy 0, policy_version 161434 (0.0028) [2025-01-04 09:30:58,968][134211] Fps is (10 sec: 15564.6, 60 sec: 13994.6, 300 sec: 13454.3). Total num frames: 661262336. Throughput: 0: 3491.7. Samples: 154486270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:30:58,968][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 09:30:59,936][134294] Updated weights for policy 0, policy_version 161444 (0.0028) [2025-01-04 09:31:03,166][134294] Updated weights for policy 0, policy_version 161454 (0.0029) [2025-01-04 09:31:03,969][134211] Fps is (10 sec: 12695.9, 60 sec: 13653.0, 300 sec: 13370.9). Total num frames: 661323776. Throughput: 0: 3482.5. Samples: 154495908. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:03,970][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 09:31:06,139][134294] Updated weights for policy 0, policy_version 161464 (0.0027) [2025-01-04 09:31:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.6, 300 sec: 13371.0). Total num frames: 661389312. Throughput: 0: 3510.7. Samples: 154515548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:08,968][134211] Avg episode reward: [(0, '9.782')] [2025-01-04 09:31:09,606][134294] Updated weights for policy 0, policy_version 161474 (0.0029) [2025-01-04 09:31:12,960][134294] Updated weights for policy 0, policy_version 161484 (0.0027) [2025-01-04 09:31:13,968][134211] Fps is (10 sec: 12289.5, 60 sec: 13516.7, 300 sec: 13357.1). Total num frames: 661446656. Throughput: 0: 3470.8. Samples: 154533116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:13,969][134211] Avg episode reward: [(0, '9.000')] [2025-01-04 09:31:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161486_661446656.pth... 
[2025-01-04 09:31:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000160711_658272256.pth [2025-01-04 09:31:16,483][134294] Updated weights for policy 0, policy_version 161494 (0.0024) [2025-01-04 09:31:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13585.1, 300 sec: 13343.2). Total num frames: 661512192. Throughput: 0: 3445.3. Samples: 154541972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:18,968][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 09:31:19,514][134294] Updated weights for policy 0, policy_version 161504 (0.0026) [2025-01-04 09:31:22,506][134294] Updated weights for policy 0, policy_version 161514 (0.0028) [2025-01-04 09:31:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13653.3, 300 sec: 13343.2). Total num frames: 661577728. Throughput: 0: 3449.9. Samples: 154562326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:23,969][134211] Avg episode reward: [(0, '9.703')] [2025-01-04 09:31:25,570][134294] Updated weights for policy 0, policy_version 161524 (0.0025) [2025-01-04 09:31:28,423][134294] Updated weights for policy 0, policy_version 161534 (0.0024) [2025-01-04 09:31:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 13343.2). Total num frames: 661647360. Throughput: 0: 3493.0. Samples: 154583064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:28,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 09:31:31,520][134294] Updated weights for policy 0, policy_version 161544 (0.0026) [2025-01-04 09:31:33,968][134211] Fps is (10 sec: 13517.4, 60 sec: 13721.9, 300 sec: 13343.2). Total num frames: 661712896. Throughput: 0: 3500.8. Samples: 154593180. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:33,968][134211] Avg episode reward: [(0, '8.777')] [2025-01-04 09:31:34,802][134294] Updated weights for policy 0, policy_version 161554 (0.0028) [2025-01-04 09:31:37,766][134294] Updated weights for policy 0, policy_version 161564 (0.0023) [2025-01-04 09:31:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13926.4, 300 sec: 13398.8). Total num frames: 661790720. Throughput: 0: 3235.2. Samples: 154612756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:38,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 09:31:39,857][134294] Updated weights for policy 0, policy_version 161574 (0.0017) [2025-01-04 09:31:43,143][134294] Updated weights for policy 0, policy_version 161584 (0.0024) [2025-01-04 09:31:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13995.1, 300 sec: 13301.6). Total num frames: 661856256. Throughput: 0: 3311.5. Samples: 154635288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:43,969][134211] Avg episode reward: [(0, '9.172')] [2025-01-04 09:31:46,679][134294] Updated weights for policy 0, policy_version 161594 (0.0024) [2025-01-04 09:31:48,967][134211] Fps is (10 sec: 13516.9, 60 sec: 13653.4, 300 sec: 13301.6). Total num frames: 661925888. Throughput: 0: 3293.1. Samples: 154644090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:48,968][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 09:31:49,105][134294] Updated weights for policy 0, policy_version 161604 (0.0018) [2025-01-04 09:31:51,003][134294] Updated weights for policy 0, policy_version 161614 (0.0012) [2025-01-04 09:31:52,974][134294] Updated weights for policy 0, policy_version 161624 (0.0014) [2025-01-04 09:31:53,968][134211] Fps is (10 sec: 17613.2, 60 sec: 13926.5, 300 sec: 13440.4). Total num frames: 662032384. Throughput: 0: 3476.9. Samples: 154672006. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:53,968][134211] Avg episode reward: [(0, '8.865')] [2025-01-04 09:31:54,884][134294] Updated weights for policy 0, policy_version 161634 (0.0013) [2025-01-04 09:31:56,805][134294] Updated weights for policy 0, policy_version 161644 (0.0014) [2025-01-04 09:31:58,968][134211] Fps is (10 sec: 20069.9, 60 sec: 14404.3, 300 sec: 13537.6). Total num frames: 662126592. Throughput: 0: 3773.8. Samples: 154702936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:31:58,968][134211] Avg episode reward: [(0, '8.808')] [2025-01-04 09:31:59,415][134294] Updated weights for policy 0, policy_version 161654 (0.0023) [2025-01-04 09:32:02,880][134294] Updated weights for policy 0, policy_version 161664 (0.0031) [2025-01-04 09:32:03,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14336.3, 300 sec: 13523.8). Total num frames: 662183936. Throughput: 0: 3784.6. Samples: 154712278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:03,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 09:32:06,729][134294] Updated weights for policy 0, policy_version 161674 (0.0030) [2025-01-04 09:32:08,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14199.5, 300 sec: 13468.2). Total num frames: 662241280. Throughput: 0: 3697.8. Samples: 154728726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:08,968][134211] Avg episode reward: [(0, '8.853')] [2025-01-04 09:32:10,498][134294] Updated weights for policy 0, policy_version 161684 (0.0027) [2025-01-04 09:32:13,944][134294] Updated weights for policy 0, policy_version 161694 (0.0027) [2025-01-04 09:32:13,968][134211] Fps is (10 sec: 11468.9, 60 sec: 14199.5, 300 sec: 13371.0). Total num frames: 662298624. Throughput: 0: 3610.6. Samples: 154745542. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:13,968][134211] Avg episode reward: [(0, '9.534')] [2025-01-04 09:32:17,389][134294] Updated weights for policy 0, policy_version 161704 (0.0029) [2025-01-04 09:32:18,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14131.2, 300 sec: 13384.9). Total num frames: 662360064. Throughput: 0: 3580.7. Samples: 154754310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:18,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 09:32:20,362][134294] Updated weights for policy 0, policy_version 161714 (0.0024) [2025-01-04 09:32:23,348][134294] Updated weights for policy 0, policy_version 161724 (0.0025) [2025-01-04 09:32:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14199.5, 300 sec: 13412.7). Total num frames: 662429696. Throughput: 0: 3598.7. Samples: 154774698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:23,968][134211] Avg episode reward: [(0, '8.184')] [2025-01-04 09:32:26,189][134294] Updated weights for policy 0, policy_version 161734 (0.0024) [2025-01-04 09:32:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14131.2, 300 sec: 13440.4). Total num frames: 662495232. Throughput: 0: 3559.1. Samples: 154795448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:28,968][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 09:32:29,319][134294] Updated weights for policy 0, policy_version 161744 (0.0029) [2025-01-04 09:32:32,585][134294] Updated weights for policy 0, policy_version 161754 (0.0025) [2025-01-04 09:32:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 13454.3). Total num frames: 662560768. Throughput: 0: 3569.1. Samples: 154804700. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:33,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 09:32:35,939][134294] Updated weights for policy 0, policy_version 161764 (0.0027) [2025-01-04 09:32:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13858.1, 300 sec: 13454.3). Total num frames: 662622208. Throughput: 0: 3365.4. Samples: 154823448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:38,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 09:32:39,165][134294] Updated weights for policy 0, policy_version 161774 (0.0025) [2025-01-04 09:32:42,582][134294] Updated weights for policy 0, policy_version 161784 (0.0025) [2025-01-04 09:32:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13789.9, 300 sec: 13412.7). Total num frames: 662683648. Throughput: 0: 3089.0. Samples: 154841942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:43,969][134211] Avg episode reward: [(0, '9.917')] [2025-01-04 09:32:45,835][134294] Updated weights for policy 0, policy_version 161794 (0.0023) [2025-01-04 09:32:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13653.3, 300 sec: 13412.7). Total num frames: 662745088. Throughput: 0: 3084.3. Samples: 154851070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:48,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 09:32:49,031][134294] Updated weights for policy 0, policy_version 161804 (0.0023) [2025-01-04 09:32:51,801][134294] Updated weights for policy 0, policy_version 161814 (0.0019) [2025-01-04 09:32:53,969][134211] Fps is (10 sec: 14334.1, 60 sec: 13243.4, 300 sec: 13412.6). Total num frames: 662827008. Throughput: 0: 3199.8. Samples: 154872720. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:53,970][134211] Avg episode reward: [(0, '8.723')] [2025-01-04 09:32:54,006][134294] Updated weights for policy 0, policy_version 161824 (0.0016) [2025-01-04 09:32:57,066][134294] Updated weights for policy 0, policy_version 161834 (0.0025) [2025-01-04 09:32:58,968][134211] Fps is (10 sec: 15155.3, 60 sec: 12834.1, 300 sec: 13384.9). Total num frames: 662896640. Throughput: 0: 3317.5. Samples: 154894830. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 09:32:58,968][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 09:33:00,144][134294] Updated weights for policy 0, policy_version 161844 (0.0025) [2025-01-04 09:33:03,079][134294] Updated weights for policy 0, policy_version 161854 (0.0023) [2025-01-04 09:33:03,968][134211] Fps is (10 sec: 13518.7, 60 sec: 12970.7, 300 sec: 13398.8). Total num frames: 662962176. Throughput: 0: 3354.9. Samples: 154905282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:03,968][134211] Avg episode reward: [(0, '9.307')] [2025-01-04 09:33:06,139][134294] Updated weights for policy 0, policy_version 161864 (0.0024) [2025-01-04 09:33:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13175.5, 300 sec: 13440.4). Total num frames: 663031808. Throughput: 0: 3348.1. Samples: 154925362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:08,968][134211] Avg episode reward: [(0, '9.294')] [2025-01-04 09:33:09,191][134294] Updated weights for policy 0, policy_version 161874 (0.0027) [2025-01-04 09:33:12,551][134294] Updated weights for policy 0, policy_version 161884 (0.0026) [2025-01-04 09:33:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13175.4, 300 sec: 13440.4). Total num frames: 663089152. Throughput: 0: 3302.5. Samples: 154944060. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:13,969][134211] Avg episode reward: [(0, '8.975')] [2025-01-04 09:33:14,022][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161888_663093248.pth... [2025-01-04 09:33:14,096][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161103_659877888.pth [2025-01-04 09:33:15,943][134294] Updated weights for policy 0, policy_version 161894 (0.0025) [2025-01-04 09:33:17,934][134294] Updated weights for policy 0, policy_version 161904 (0.0014) [2025-01-04 09:33:18,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13653.4, 300 sec: 13537.6). Total num frames: 663179264. Throughput: 0: 3312.2. Samples: 154953748. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:18,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 09:33:19,776][134294] Updated weights for policy 0, policy_version 161914 (0.0014) [2025-01-04 09:33:21,754][134294] Updated weights for policy 0, policy_version 161924 (0.0013) [2025-01-04 09:33:23,652][134294] Updated weights for policy 0, policy_version 161934 (0.0013) [2025-01-04 09:33:23,968][134211] Fps is (10 sec: 19660.8, 60 sec: 14267.7, 300 sec: 13676.5). Total num frames: 663285760. Throughput: 0: 3604.1. Samples: 154985632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:23,968][134211] Avg episode reward: [(0, '8.937')] [2025-01-04 09:33:25,723][134294] Updated weights for policy 0, policy_version 161944 (0.0015) [2025-01-04 09:33:28,883][134294] Updated weights for policy 0, policy_version 161954 (0.0030) [2025-01-04 09:33:28,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14472.5, 300 sec: 13732.0). Total num frames: 663363584. Throughput: 0: 3779.0. Samples: 155011996. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:28,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 09:33:32,067][134294] Updated weights for policy 0, policy_version 161964 (0.0025) [2025-01-04 09:33:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.3, 300 sec: 13745.9). Total num frames: 663425024. Throughput: 0: 3785.6. Samples: 155021424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:33,968][134211] Avg episode reward: [(0, '8.709')] [2025-01-04 09:33:35,288][134294] Updated weights for policy 0, policy_version 161974 (0.0025) [2025-01-04 09:33:38,347][134294] Updated weights for policy 0, policy_version 161984 (0.0024) [2025-01-04 09:33:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.5, 300 sec: 13745.9). Total num frames: 663490560. Throughput: 0: 3738.9. Samples: 155040964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:38,968][134211] Avg episode reward: [(0, '8.838')] [2025-01-04 09:33:41,805][134294] Updated weights for policy 0, policy_version 161994 (0.0026) [2025-01-04 09:33:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14472.5, 300 sec: 13704.3). Total num frames: 663552000. Throughput: 0: 3654.3. Samples: 155059276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:43,968][134211] Avg episode reward: [(0, '10.163')] [2025-01-04 09:33:45,254][134294] Updated weights for policy 0, policy_version 162004 (0.0027) [2025-01-04 09:33:48,542][134294] Updated weights for policy 0, policy_version 162014 (0.0028) [2025-01-04 09:33:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.6, 300 sec: 13718.1). Total num frames: 663613440. Throughput: 0: 3621.0. Samples: 155068226. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:48,968][134211] Avg episode reward: [(0, '9.571')] [2025-01-04 09:33:51,551][134294] Updated weights for policy 0, policy_version 162024 (0.0026) [2025-01-04 09:33:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14199.8, 300 sec: 13620.9). Total num frames: 663678976. Throughput: 0: 3612.6. Samples: 155087930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:53,968][134211] Avg episode reward: [(0, '9.533')] [2025-01-04 09:33:54,558][134294] Updated weights for policy 0, policy_version 162034 (0.0024) [2025-01-04 09:33:57,575][134294] Updated weights for policy 0, policy_version 162044 (0.0025) [2025-01-04 09:33:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 13523.7). Total num frames: 663748608. Throughput: 0: 3647.4. Samples: 155108194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:33:58,968][134211] Avg episode reward: [(0, '9.996')] [2025-01-04 09:34:00,534][134294] Updated weights for policy 0, policy_version 162054 (0.0026) [2025-01-04 09:34:03,768][134294] Updated weights for policy 0, policy_version 162064 (0.0029) [2025-01-04 09:34:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14199.4, 300 sec: 13537.6). Total num frames: 663814144. Throughput: 0: 3666.5. Samples: 155118740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:34:03,969][134211] Avg episode reward: [(0, '8.564')] [2025-01-04 09:34:07,108][134294] Updated weights for policy 0, policy_version 162074 (0.0024) [2025-01-04 09:34:08,970][134211] Fps is (10 sec: 12695.0, 60 sec: 14062.5, 300 sec: 13551.4). Total num frames: 663875584. Throughput: 0: 3366.5. Samples: 155137132. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:08,970][134211] Avg episode reward: [(0, '9.875')] [2025-01-04 09:34:10,509][134294] Updated weights for policy 0, policy_version 162084 (0.0025) [2025-01-04 09:34:13,928][134294] Updated weights for policy 0, policy_version 162094 (0.0026) [2025-01-04 09:34:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14131.2, 300 sec: 13565.4). Total num frames: 663937024. Throughput: 0: 3183.6. Samples: 155155260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:13,968][134211] Avg episode reward: [(0, '8.862')] [2025-01-04 09:34:17,097][134294] Updated weights for policy 0, policy_version 162104 (0.0025) [2025-01-04 09:34:18,968][134211] Fps is (10 sec: 12700.2, 60 sec: 13721.6, 300 sec: 13593.2). Total num frames: 664002560. Throughput: 0: 3184.1. Samples: 155164710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:18,968][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 09:34:20,069][134294] Updated weights for policy 0, policy_version 162114 (0.0025) [2025-01-04 09:34:22,971][134294] Updated weights for policy 0, policy_version 162124 (0.0028) [2025-01-04 09:34:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13107.3, 300 sec: 13620.9). Total num frames: 664072192. Throughput: 0: 3212.2. Samples: 155185512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:23,968][134211] Avg episode reward: [(0, '8.585')] [2025-01-04 09:34:25,950][134294] Updated weights for policy 0, policy_version 162134 (0.0024) [2025-01-04 09:34:27,912][134294] Updated weights for policy 0, policy_version 162144 (0.0013) [2025-01-04 09:34:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13243.7, 300 sec: 13718.2). Total num frames: 664158208. Throughput: 0: 3351.9. Samples: 155210110. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:28,968][134211] Avg episode reward: [(0, '8.342')] [2025-01-04 09:34:30,496][134294] Updated weights for policy 0, policy_version 162154 (0.0025) [2025-01-04 09:34:33,301][134294] Updated weights for policy 0, policy_version 162164 (0.0024) [2025-01-04 09:34:33,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 664227840. Throughput: 0: 3405.7. Samples: 155221484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:33,968][134211] Avg episode reward: [(0, '7.888')] [2025-01-04 09:34:36,228][134294] Updated weights for policy 0, policy_version 162174 (0.0025) [2025-01-04 09:34:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13448.5, 300 sec: 13773.8). Total num frames: 664297472. Throughput: 0: 3436.0. Samples: 155242552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:38,969][134211] Avg episode reward: [(0, '9.374')] [2025-01-04 09:34:39,405][134294] Updated weights for policy 0, policy_version 162184 (0.0024) [2025-01-04 09:34:42,610][134294] Updated weights for policy 0, policy_version 162194 (0.0027) [2025-01-04 09:34:43,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13448.5, 300 sec: 13773.7). Total num frames: 664358912. Throughput: 0: 3402.2. Samples: 155261294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:43,969][134211] Avg episode reward: [(0, '8.578')] [2025-01-04 09:34:45,944][134294] Updated weights for policy 0, policy_version 162204 (0.0025) [2025-01-04 09:34:48,968][134211] Fps is (10 sec: 12287.7, 60 sec: 13448.4, 300 sec: 13773.6). Total num frames: 664420352. Throughput: 0: 3378.7. Samples: 155270782. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:48,969][134211] Avg episode reward: [(0, '9.656')] [2025-01-04 09:34:49,316][134294] Updated weights for policy 0, policy_version 162214 (0.0026) [2025-01-04 09:34:51,496][134294] Updated weights for policy 0, policy_version 162224 (0.0015) [2025-01-04 09:34:53,493][134294] Updated weights for policy 0, policy_version 162234 (0.0014) [2025-01-04 09:34:53,968][134211] Fps is (10 sec: 15565.6, 60 sec: 13926.4, 300 sec: 13870.9). Total num frames: 664514560. Throughput: 0: 3485.7. Samples: 155293980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:53,968][134211] Avg episode reward: [(0, '9.600')] [2025-01-04 09:34:56,545][134294] Updated weights for policy 0, policy_version 162244 (0.0025) [2025-01-04 09:34:58,968][134211] Fps is (10 sec: 15565.4, 60 sec: 13789.9, 300 sec: 13801.4). Total num frames: 664576000. Throughput: 0: 3570.7. Samples: 155315942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:34:58,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 09:35:00,087][134294] Updated weights for policy 0, policy_version 162254 (0.0029) [2025-01-04 09:35:03,703][134294] Updated weights for policy 0, policy_version 162264 (0.0028) [2025-01-04 09:35:03,971][134211] Fps is (10 sec: 12283.9, 60 sec: 13720.9, 300 sec: 13801.3). Total num frames: 664637440. Throughput: 0: 3549.7. Samples: 155324456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:03,972][134211] Avg episode reward: [(0, '9.019')] [2025-01-04 09:35:06,648][134294] Updated weights for policy 0, policy_version 162274 (0.0023) [2025-01-04 09:35:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13926.9, 300 sec: 13815.3). Total num frames: 664711168. Throughput: 0: 3528.7. Samples: 155344304. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:08,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 09:35:09,084][134294] Updated weights for policy 0, policy_version 162284 (0.0019) [2025-01-04 09:35:12,327][134294] Updated weights for policy 0, policy_version 162294 (0.0027) [2025-01-04 09:35:13,968][134211] Fps is (10 sec: 13520.8, 60 sec: 13926.4, 300 sec: 13815.3). Total num frames: 664772608. Throughput: 0: 3435.5. Samples: 155364710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:13,968][134211] Avg episode reward: [(0, '9.492')] [2025-01-04 09:35:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000162298_664772608.pth... [2025-01-04 09:35:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161486_661446656.pth [2025-01-04 09:35:15,738][134294] Updated weights for policy 0, policy_version 162304 (0.0026) [2025-01-04 09:35:18,761][134294] Updated weights for policy 0, policy_version 162314 (0.0024) [2025-01-04 09:35:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 13829.2). Total num frames: 664838144. Throughput: 0: 3391.7. Samples: 155374112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:18,968][134211] Avg episode reward: [(0, '9.198')] [2025-01-04 09:35:21,737][134294] Updated weights for policy 0, policy_version 162324 (0.0026) [2025-01-04 09:35:23,969][134211] Fps is (10 sec: 13515.8, 60 sec: 13926.2, 300 sec: 13843.0). Total num frames: 664907776. Throughput: 0: 3376.4. Samples: 155394492. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:23,969][134211] Avg episode reward: [(0, '8.505')] [2025-01-04 09:35:24,773][134294] Updated weights for policy 0, policy_version 162334 (0.0024) [2025-01-04 09:35:27,766][134294] Updated weights for policy 0, policy_version 162344 (0.0026) [2025-01-04 09:35:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 13843.1). Total num frames: 664973312. Throughput: 0: 3408.6. Samples: 155414680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:28,968][134211] Avg episode reward: [(0, '10.231')] [2025-01-04 09:35:30,941][134294] Updated weights for policy 0, policy_version 162354 (0.0025) [2025-01-04 09:35:33,858][134294] Updated weights for policy 0, policy_version 162364 (0.0023) [2025-01-04 09:35:33,968][134211] Fps is (10 sec: 13518.0, 60 sec: 13585.0, 300 sec: 13857.0). Total num frames: 665042944. Throughput: 0: 3418.7. Samples: 155424622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:33,968][134211] Avg episode reward: [(0, '10.535')] [2025-01-04 09:35:36,924][134294] Updated weights for policy 0, policy_version 162374 (0.0025) [2025-01-04 09:35:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.9, 300 sec: 13871.0). Total num frames: 665108480. Throughput: 0: 3358.0. Samples: 155445088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:38,968][134211] Avg episode reward: [(0, '9.298')] [2025-01-04 09:35:40,015][134294] Updated weights for policy 0, policy_version 162384 (0.0026) [2025-01-04 09:35:43,164][134294] Updated weights for policy 0, policy_version 162394 (0.0028) [2025-01-04 09:35:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 13787.5). Total num frames: 665174016. Throughput: 0: 3307.8. Samples: 155464794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:43,968][134211] Avg episode reward: [(0, '8.857')] [2025-01-04 09:35:45,754][134294] Updated weights for policy 0, policy_version 162404 (0.0019) [2025-01-04 09:35:47,739][134294] Updated weights for policy 0, policy_version 162414 (0.0013) [2025-01-04 09:35:48,967][134211] Fps is (10 sec: 16384.2, 60 sec: 14199.6, 300 sec: 13815.3). Total num frames: 665272320. Throughput: 0: 3390.8. Samples: 155477032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:48,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 09:35:49,613][134294] Updated weights for policy 0, policy_version 162424 (0.0012) [2025-01-04 09:35:51,511][134294] Updated weights for policy 0, policy_version 162434 (0.0015) [2025-01-04 09:35:53,876][134294] Updated weights for policy 0, policy_version 162444 (0.0020) [2025-01-04 09:35:53,968][134211] Fps is (10 sec: 19660.7, 60 sec: 14267.7, 300 sec: 13926.4). Total num frames: 665370624. Throughput: 0: 3664.7. Samples: 155509218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:53,968][134211] Avg episode reward: [(0, '8.517')] [2025-01-04 09:35:56,916][134294] Updated weights for policy 0, policy_version 162454 (0.0026) [2025-01-04 09:35:58,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14336.0, 300 sec: 13940.4). Total num frames: 665436160. Throughput: 0: 3668.8. Samples: 155529806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:35:58,968][134211] Avg episode reward: [(0, '9.099')] [2025-01-04 09:36:00,280][134294] Updated weights for policy 0, policy_version 162464 (0.0028) [2025-01-04 09:36:03,617][134294] Updated weights for policy 0, policy_version 162474 (0.0023) [2025-01-04 09:36:03,970][134211] Fps is (10 sec: 12285.5, 60 sec: 14268.0, 300 sec: 13912.4). Total num frames: 665493504. Throughput: 0: 3668.7. Samples: 155539210. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:36:03,971][134211] Avg episode reward: [(0, '8.749')] [2025-01-04 09:36:07,274][134294] Updated weights for policy 0, policy_version 162484 (0.0029) [2025-01-04 09:36:08,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13994.6, 300 sec: 13912.5). Total num frames: 665550848. Throughput: 0: 3595.4. Samples: 155556280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:36:08,968][134211] Avg episode reward: [(0, '9.338')] [2025-01-04 09:36:10,997][134294] Updated weights for policy 0, policy_version 162494 (0.0029) [2025-01-04 09:36:13,968][134211] Fps is (10 sec: 11471.2, 60 sec: 13926.4, 300 sec: 13884.7). Total num frames: 665608192. Throughput: 0: 3530.3. Samples: 155573544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:36:13,968][134211] Avg episode reward: [(0, '9.120')] [2025-01-04 09:36:14,484][134294] Updated weights for policy 0, policy_version 162504 (0.0026) [2025-01-04 09:36:17,722][134294] Updated weights for policy 0, policy_version 162514 (0.0025) [2025-01-04 09:36:18,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13858.1, 300 sec: 13870.9). Total num frames: 665669632. Throughput: 0: 3507.5. Samples: 155582460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:18,968][134211] Avg episode reward: [(0, '10.049')] [2025-01-04 09:36:20,750][134294] Updated weights for policy 0, policy_version 162524 (0.0024) [2025-01-04 09:36:23,737][134294] Updated weights for policy 0, policy_version 162534 (0.0026) [2025-01-04 09:36:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.6, 300 sec: 13884.7). Total num frames: 665743360. Throughput: 0: 3504.8. Samples: 155602806. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:23,968][134211] Avg episode reward: [(0, '8.999')] [2025-01-04 09:36:26,598][134294] Updated weights for policy 0, policy_version 162544 (0.0024) [2025-01-04 09:36:28,969][134211] Fps is (10 sec: 13924.5, 60 sec: 13926.1, 300 sec: 13884.7). Total num frames: 665808896. Throughput: 0: 3529.0. Samples: 155623604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:28,970][134211] Avg episode reward: [(0, '9.385')] [2025-01-04 09:36:29,602][134294] Updated weights for policy 0, policy_version 162554 (0.0023) [2025-01-04 09:36:32,535][134294] Updated weights for policy 0, policy_version 162564 (0.0023) [2025-01-04 09:36:33,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13926.3, 300 sec: 13857.0). Total num frames: 665878528. Throughput: 0: 3483.9. Samples: 155633810. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:33,969][134211] Avg episode reward: [(0, '8.827')] [2025-01-04 09:36:35,578][134294] Updated weights for policy 0, policy_version 162574 (0.0024) [2025-01-04 09:36:38,499][134294] Updated weights for policy 0, policy_version 162584 (0.0024) [2025-01-04 09:36:38,968][134211] Fps is (10 sec: 13928.1, 60 sec: 13994.6, 300 sec: 13870.9). Total num frames: 665948160. Throughput: 0: 3234.5. Samples: 155654770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:38,968][134211] Avg episode reward: [(0, '8.525')] [2025-01-04 09:36:41,853][134294] Updated weights for policy 0, policy_version 162594 (0.0028) [2025-01-04 09:36:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13926.4, 300 sec: 13843.1). Total num frames: 666009600. Throughput: 0: 3193.6. Samples: 155673520. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:43,968][134211] Avg episode reward: [(0, '8.564')] [2025-01-04 09:36:45,083][134294] Updated weights for policy 0, policy_version 162604 (0.0024) [2025-01-04 09:36:48,321][134294] Updated weights for policy 0, policy_version 162614 (0.0024) [2025-01-04 09:36:48,967][134211] Fps is (10 sec: 13107.5, 60 sec: 13448.5, 300 sec: 13718.1). Total num frames: 666079232. Throughput: 0: 3188.5. Samples: 155682686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:48,968][134211] Avg episode reward: [(0, '9.115')] [2025-01-04 09:36:50,204][134294] Updated weights for policy 0, policy_version 162624 (0.0013) [2025-01-04 09:36:52,137][134294] Updated weights for policy 0, policy_version 162634 (0.0012) [2025-01-04 09:36:53,942][134294] Updated weights for policy 0, policy_version 162644 (0.0013) [2025-01-04 09:36:53,968][134211] Fps is (10 sec: 18022.4, 60 sec: 13653.3, 300 sec: 13773.7). Total num frames: 666189824. Throughput: 0: 3436.1. Samples: 155710906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:53,968][134211] Avg episode reward: [(0, '9.227')] [2025-01-04 09:36:55,825][134294] Updated weights for policy 0, policy_version 162654 (0.0013) [2025-01-04 09:36:57,791][134294] Updated weights for policy 0, policy_version 162664 (0.0016) [2025-01-04 09:36:58,968][134211] Fps is (10 sec: 20889.0, 60 sec: 14199.5, 300 sec: 13912.5). Total num frames: 666288128. Throughput: 0: 3761.3. Samples: 155742802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:36:58,968][134211] Avg episode reward: [(0, '9.296')] [2025-01-04 09:37:00,657][134294] Updated weights for policy 0, policy_version 162674 (0.0025) [2025-01-04 09:37:03,923][134294] Updated weights for policy 0, policy_version 162684 (0.0028) [2025-01-04 09:37:03,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14336.5, 300 sec: 13940.3). Total num frames: 666353664. Throughput: 0: 3789.4. Samples: 155752984. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:37:03,969][134211] Avg episode reward: [(0, '9.072')] [2025-01-04 09:37:07,107][134294] Updated weights for policy 0, policy_version 162694 (0.0027) [2025-01-04 09:37:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14404.3, 300 sec: 13954.2). Total num frames: 666415104. Throughput: 0: 3764.9. Samples: 155772224. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:37:08,968][134211] Avg episode reward: [(0, '8.914')] [2025-01-04 09:37:10,241][134294] Updated weights for policy 0, policy_version 162704 (0.0028) [2025-01-04 09:37:13,337][134294] Updated weights for policy 0, policy_version 162714 (0.0025) [2025-01-04 09:37:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.0, 300 sec: 13981.9). Total num frames: 666484736. Throughput: 0: 3744.6. Samples: 155792106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:37:13,969][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 09:37:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000162716_666484736.pth... [2025-01-04 09:37:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000161888_663093248.pth [2025-01-04 09:37:16,490][134294] Updated weights for policy 0, policy_version 162724 (0.0024) [2025-01-04 09:37:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 13954.2). Total num frames: 666546176. Throughput: 0: 3724.2. Samples: 155801396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:37:18,968][134211] Avg episode reward: [(0, '9.420')] [2025-01-04 09:37:19,768][134294] Updated weights for policy 0, policy_version 162734 (0.0027) [2025-01-04 09:37:22,774][134294] Updated weights for policy 0, policy_version 162744 (0.0026) [2025-01-04 09:37:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14472.5, 300 sec: 13954.2). Total num frames: 666611712. 
Throughput: 0: 3695.3. Samples: 155821058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:23,968][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 09:37:25,761][134294] Updated weights for policy 0, policy_version 162754 (0.0024) [2025-01-04 09:37:28,678][134294] Updated weights for policy 0, policy_version 162764 (0.0025) [2025-01-04 09:37:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.4, 300 sec: 13981.9). Total num frames: 666685440. Throughput: 0: 3744.6. Samples: 155842028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:28,968][134211] Avg episode reward: [(0, '10.368')] [2025-01-04 09:37:31,546][134294] Updated weights for policy 0, policy_version 162774 (0.0025) [2025-01-04 09:37:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14540.9, 300 sec: 13995.8). Total num frames: 666750976. Throughput: 0: 3769.5. Samples: 155852314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:33,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 09:37:34,764][134294] Updated weights for policy 0, policy_version 162784 (0.0027) [2025-01-04 09:37:37,864][134294] Updated weights for policy 0, policy_version 162794 (0.0024) [2025-01-04 09:37:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 14009.7). Total num frames: 666816512. Throughput: 0: 3577.9. Samples: 155871910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:38,968][134211] Avg episode reward: [(0, '8.987')] [2025-01-04 09:37:40,908][134294] Updated weights for policy 0, policy_version 162804 (0.0026) [2025-01-04 09:37:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14023.6). Total num frames: 666882048. Throughput: 0: 3324.7. Samples: 155892416. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:43,969][134211] Avg episode reward: [(0, '10.136')] [2025-01-04 09:37:43,985][134294] Updated weights for policy 0, policy_version 162814 (0.0028) [2025-01-04 09:37:47,079][134294] Updated weights for policy 0, policy_version 162824 (0.0026) [2025-01-04 09:37:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14540.8, 300 sec: 13982.0). Total num frames: 666951680. Throughput: 0: 3310.5. Samples: 155901954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:48,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 09:37:50,128][134294] Updated weights for policy 0, policy_version 162834 (0.0024) [2025-01-04 09:37:52,979][134294] Updated weights for policy 0, policy_version 162844 (0.0026) [2025-01-04 09:37:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13858.1, 300 sec: 13981.9). Total num frames: 667021312. Throughput: 0: 3343.4. Samples: 155922678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:53,968][134211] Avg episode reward: [(0, '8.412')] [2025-01-04 09:37:55,946][134294] Updated weights for policy 0, policy_version 162854 (0.0021) [2025-01-04 09:37:58,871][134294] Updated weights for policy 0, policy_version 162864 (0.0024) [2025-01-04 09:37:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13380.3, 300 sec: 13995.8). Total num frames: 667090944. Throughput: 0: 3365.9. Samples: 155943572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:37:58,968][134211] Avg episode reward: [(0, '8.869')] [2025-01-04 09:38:01,909][134294] Updated weights for policy 0, policy_version 162874 (0.0024) [2025-01-04 09:38:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 13981.9). Total num frames: 667156480. Throughput: 0: 3388.4. Samples: 155953876. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:38:03,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 09:38:05,420][134294] Updated weights for policy 0, policy_version 162884 (0.0025) [2025-01-04 09:38:08,693][134294] Updated weights for policy 0, policy_version 162894 (0.0027) [2025-01-04 09:38:08,969][134211] Fps is (10 sec: 12287.0, 60 sec: 13311.8, 300 sec: 13981.9). Total num frames: 667213824. Throughput: 0: 3349.1. Samples: 155971770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:38:08,969][134211] Avg episode reward: [(0, '10.153')] [2025-01-04 09:38:12,280][134294] Updated weights for policy 0, policy_version 162904 (0.0024) [2025-01-04 09:38:13,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13107.2, 300 sec: 13870.8). Total num frames: 667271168. Throughput: 0: 3275.1. Samples: 155989406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:38:13,969][134211] Avg episode reward: [(0, '9.429')] [2025-01-04 09:38:15,404][134294] Updated weights for policy 0, policy_version 162914 (0.0022) [2025-01-04 09:38:17,416][134294] Updated weights for policy 0, policy_version 162924 (0.0012) [2025-01-04 09:38:18,968][134211] Fps is (10 sec: 15566.4, 60 sec: 13721.6, 300 sec: 13843.1). Total num frames: 667369472. Throughput: 0: 3307.5. Samples: 156001150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:38:18,968][134211] Avg episode reward: [(0, '8.719')] [2025-01-04 09:38:19,295][134294] Updated weights for policy 0, policy_version 162934 (0.0015) [2025-01-04 09:38:21,166][134294] Updated weights for policy 0, policy_version 162944 (0.0014) [2025-01-04 09:38:23,092][134294] Updated weights for policy 0, policy_version 162954 (0.0013) [2025-01-04 09:38:23,968][134211] Fps is (10 sec: 20480.5, 60 sec: 14404.3, 300 sec: 13940.3). Total num frames: 667475968. Throughput: 0: 3586.4. Samples: 156033298. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:38:23,968][134211] Avg episode reward: [(0, '10.241')] [2025-01-04 09:38:25,202][134294] Updated weights for policy 0, policy_version 162964 (0.0014) [2025-01-04 09:38:28,317][134294] Updated weights for policy 0, policy_version 162974 (0.0026) [2025-01-04 09:38:28,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14336.0, 300 sec: 13968.1). Total num frames: 667545600. Throughput: 0: 3680.7. Samples: 156058048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:28,969][134211] Avg episode reward: [(0, '10.472')] [2025-01-04 09:38:31,787][134294] Updated weights for policy 0, policy_version 162984 (0.0032) [2025-01-04 09:38:33,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14267.7, 300 sec: 13954.2). Total num frames: 667607040. Throughput: 0: 3665.7. Samples: 156066910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:33,969][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 09:38:35,054][134294] Updated weights for policy 0, policy_version 162994 (0.0030) [2025-01-04 09:38:38,106][134294] Updated weights for policy 0, policy_version 163004 (0.0027) [2025-01-04 09:38:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.7, 300 sec: 13968.1). Total num frames: 667672576. Throughput: 0: 3631.0. Samples: 156086074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:38,968][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 09:38:41,398][134294] Updated weights for policy 0, policy_version 163014 (0.0027) [2025-01-04 09:38:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14199.5, 300 sec: 13968.0). Total num frames: 667734016. Throughput: 0: 3582.9. Samples: 156104802. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:43,968][134211] Avg episode reward: [(0, '9.965')] [2025-01-04 09:38:44,855][134294] Updated weights for policy 0, policy_version 163024 (0.0028) [2025-01-04 09:38:48,150][134294] Updated weights for policy 0, policy_version 163034 (0.0024) [2025-01-04 09:38:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14062.9, 300 sec: 13954.2). Total num frames: 667795456. Throughput: 0: 3553.4. Samples: 156113780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:48,968][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 09:38:51,213][134294] Updated weights for policy 0, policy_version 163044 (0.0026) [2025-01-04 09:38:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13994.7, 300 sec: 13940.3). Total num frames: 667860992. Throughput: 0: 3594.2. Samples: 156133506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:53,968][134211] Avg episode reward: [(0, '9.017')] [2025-01-04 09:38:54,232][134294] Updated weights for policy 0, policy_version 163054 (0.0027) [2025-01-04 09:38:57,300][134294] Updated weights for policy 0, policy_version 163064 (0.0026) [2025-01-04 09:38:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 13954.2). Total num frames: 667930624. Throughput: 0: 3652.2. Samples: 156153754. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:38:58,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 09:39:00,306][134294] Updated weights for policy 0, policy_version 163074 (0.0026) [2025-01-04 09:39:03,226][134294] Updated weights for policy 0, policy_version 163084 (0.0024) [2025-01-04 09:39:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.7, 300 sec: 13968.1). Total num frames: 667996160. Throughput: 0: 3623.5. Samples: 156164210. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:03,968][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 09:39:06,204][134294] Updated weights for policy 0, policy_version 163094 (0.0024) [2025-01-04 09:39:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.7, 300 sec: 13995.8). Total num frames: 668065792. Throughput: 0: 3363.2. Samples: 156184642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:08,968][134211] Avg episode reward: [(0, '10.129')] [2025-01-04 09:39:09,290][134294] Updated weights for policy 0, policy_version 163104 (0.0027) [2025-01-04 09:39:12,513][134294] Updated weights for policy 0, policy_version 163114 (0.0026) [2025-01-04 09:39:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 13995.8). Total num frames: 668131328. Throughput: 0: 3242.3. Samples: 156203950. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:13,968][134211] Avg episode reward: [(0, '9.263')] [2025-01-04 09:39:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163118_668131328.pth... [2025-01-04 09:39:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000162298_664772608.pth [2025-01-04 09:39:15,705][134294] Updated weights for policy 0, policy_version 163124 (0.0022) [2025-01-04 09:39:18,830][134294] Updated weights for policy 0, policy_version 163134 (0.0026) [2025-01-04 09:39:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13789.8, 300 sec: 13981.9). Total num frames: 668196864. Throughput: 0: 3257.3. Samples: 156213486. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:18,968][134211] Avg episode reward: [(0, '9.824')] [2025-01-04 09:39:21,767][134294] Updated weights for policy 0, policy_version 163144 (0.0026) [2025-01-04 09:39:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13175.4, 300 sec: 13926.4). Total num frames: 668266496. 
Throughput: 0: 3288.0. Samples: 156234032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:23,968][134211] Avg episode reward: [(0, '8.975')] [2025-01-04 09:39:24,780][134294] Updated weights for policy 0, policy_version 163154 (0.0026) [2025-01-04 09:39:27,766][134294] Updated weights for policy 0, policy_version 163164 (0.0025) [2025-01-04 09:39:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13175.5, 300 sec: 13926.4). Total num frames: 668336128. Throughput: 0: 3329.3. Samples: 156254618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:39:28,969][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 09:39:30,717][134294] Updated weights for policy 0, policy_version 163174 (0.0026) [2025-01-04 09:39:33,634][134294] Updated weights for policy 0, policy_version 163184 (0.0021) [2025-01-04 09:39:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13312.0, 300 sec: 13926.4). Total num frames: 668405760. Throughput: 0: 3360.7. Samples: 156265012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:33,969][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 09:39:36,686][134294] Updated weights for policy 0, policy_version 163194 (0.0024) [2025-01-04 09:39:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13312.0, 300 sec: 13940.3). Total num frames: 668471296. Throughput: 0: 3376.4. Samples: 156285446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:38,968][134211] Avg episode reward: [(0, '9.984')] [2025-01-04 09:39:39,797][134294] Updated weights for policy 0, policy_version 163204 (0.0024) [2025-01-04 09:39:42,547][134294] Updated weights for policy 0, policy_version 163214 (0.0020) [2025-01-04 09:39:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13585.1, 300 sec: 13995.8). Total num frames: 668549120. Throughput: 0: 3418.7. Samples: 156307594. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:43,968][134211] Avg episode reward: [(0, '9.556')] [2025-01-04 09:39:44,917][134294] Updated weights for policy 0, policy_version 163224 (0.0016) [2025-01-04 09:39:48,081][134294] Updated weights for policy 0, policy_version 163234 (0.0025) [2025-01-04 09:39:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13653.3, 300 sec: 13898.6). Total num frames: 668614656. Throughput: 0: 3434.5. Samples: 156318764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:48,969][134211] Avg episode reward: [(0, '8.693')] [2025-01-04 09:39:51,008][134294] Updated weights for policy 0, policy_version 163244 (0.0025) [2025-01-04 09:39:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 13926.4). Total num frames: 668684288. Throughput: 0: 3431.2. Samples: 156339044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:53,968][134211] Avg episode reward: [(0, '8.981')] [2025-01-04 09:39:54,034][134294] Updated weights for policy 0, policy_version 163254 (0.0027) [2025-01-04 09:39:57,054][134294] Updated weights for policy 0, policy_version 163264 (0.0026) [2025-01-04 09:39:58,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13721.5, 300 sec: 13954.3). Total num frames: 668753920. Throughput: 0: 3452.1. Samples: 156359296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:39:58,969][134211] Avg episode reward: [(0, '9.004')] [2025-01-04 09:39:59,967][134294] Updated weights for policy 0, policy_version 163274 (0.0020) [2025-01-04 09:40:03,076][134294] Updated weights for policy 0, policy_version 163284 (0.0029) [2025-01-04 09:40:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.6, 300 sec: 13926.4). Total num frames: 668819456. Throughput: 0: 3473.5. Samples: 156369794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:03,968][134211] Avg episode reward: [(0, '9.208')] [2025-01-04 09:40:06,493][134294] Updated weights for policy 0, policy_version 163294 (0.0025) [2025-01-04 09:40:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13653.3, 300 sec: 13940.3). Total num frames: 668884992. Throughput: 0: 3425.4. Samples: 156388174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:08,968][134211] Avg episode reward: [(0, '9.197')] [2025-01-04 09:40:09,347][134294] Updated weights for policy 0, policy_version 163304 (0.0019) [2025-01-04 09:40:11,511][134294] Updated weights for policy 0, policy_version 163314 (0.0013) [2025-01-04 09:40:13,513][134294] Updated weights for policy 0, policy_version 163324 (0.0013) [2025-01-04 09:40:13,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14199.5, 300 sec: 14051.4). Total num frames: 668983296. Throughput: 0: 3572.6. Samples: 156415384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:13,968][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 09:40:15,466][134294] Updated weights for policy 0, policy_version 163334 (0.0014) [2025-01-04 09:40:17,904][134294] Updated weights for policy 0, policy_version 163344 (0.0019) [2025-01-04 09:40:18,968][134211] Fps is (10 sec: 18432.2, 60 sec: 14540.8, 300 sec: 14106.9). Total num frames: 669069312. Throughput: 0: 3685.9. Samples: 156430878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:18,969][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 09:40:21,130][134294] Updated weights for policy 0, policy_version 163354 (0.0027) [2025-01-04 09:40:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 669130752. Throughput: 0: 3673.7. Samples: 156450762. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:23,968][134211] Avg episode reward: [(0, '8.539')] [2025-01-04 09:40:24,603][134294] Updated weights for policy 0, policy_version 163364 (0.0026) [2025-01-04 09:40:27,938][134294] Updated weights for policy 0, policy_version 163374 (0.0027) [2025-01-04 09:40:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14267.7, 300 sec: 14065.2). Total num frames: 669192192. Throughput: 0: 3583.6. Samples: 156468854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:28,968][134211] Avg episode reward: [(0, '9.170')] [2025-01-04 09:40:31,387][134294] Updated weights for policy 0, policy_version 163384 (0.0026) [2025-01-04 09:40:33,968][134211] Fps is (10 sec: 11877.9, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 669249536. Throughput: 0: 3537.7. Samples: 156477962. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:40:33,969][134211] Avg episode reward: [(0, '8.983')] [2025-01-04 09:40:34,949][134294] Updated weights for policy 0, policy_version 163394 (0.0028) [2025-01-04 09:40:38,048][134294] Updated weights for policy 0, policy_version 163404 (0.0025) [2025-01-04 09:40:38,971][134211] Fps is (10 sec: 11874.8, 60 sec: 13993.9, 300 sec: 14023.5). Total num frames: 669310976. Throughput: 0: 3486.5. Samples: 156495948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:40:38,971][134211] Avg episode reward: [(0, '9.432')] [2025-01-04 09:40:41,257][134294] Updated weights for policy 0, policy_version 163414 (0.0026) [2025-01-04 09:40:43,968][134211] Fps is (10 sec: 12698.0, 60 sec: 13789.9, 300 sec: 13912.5). Total num frames: 669376512. Throughput: 0: 3457.8. Samples: 156514896. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:40:43,968][134211] Avg episode reward: [(0, '9.101')] [2025-01-04 09:40:44,754][134294] Updated weights for policy 0, policy_version 163424 (0.0026) [2025-01-04 09:40:47,916][134294] Updated weights for policy 0, policy_version 163434 (0.0026) [2025-01-04 09:40:48,968][134211] Fps is (10 sec: 12701.7, 60 sec: 13721.6, 300 sec: 13787.6). Total num frames: 669437952. Throughput: 0: 3427.7. Samples: 156524040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:40:48,968][134211] Avg episode reward: [(0, '9.429')] [2025-01-04 09:40:50,987][134294] Updated weights for policy 0, policy_version 163444 (0.0024) [2025-01-04 09:40:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13653.4, 300 sec: 13787.6). Total num frames: 669503488. Throughput: 0: 3461.9. Samples: 156543956. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:40:53,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 09:40:53,979][134294] Updated weights for policy 0, policy_version 163454 (0.0025) [2025-01-04 09:40:57,171][134294] Updated weights for policy 0, policy_version 163464 (0.0025) [2025-01-04 09:40:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13585.2, 300 sec: 13815.4). Total num frames: 669569024. Throughput: 0: 3292.9. Samples: 156563566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:40:58,968][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 09:41:00,200][134294] Updated weights for policy 0, policy_version 163474 (0.0024) [2025-01-04 09:41:03,204][134294] Updated weights for policy 0, policy_version 163484 (0.0023) [2025-01-04 09:41:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13721.6, 300 sec: 13870.9). Total num frames: 669642752. Throughput: 0: 3180.5. Samples: 156574000. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:03,968][134211] Avg episode reward: [(0, '8.681')] [2025-01-04 09:41:05,128][134294] Updated weights for policy 0, policy_version 163494 (0.0012) [2025-01-04 09:41:07,028][134294] Updated weights for policy 0, policy_version 163504 (0.0014) [2025-01-04 09:41:08,967][134294] Updated weights for policy 0, policy_version 163514 (0.0014) [2025-01-04 09:41:08,968][134211] Fps is (10 sec: 18432.3, 60 sec: 14472.7, 300 sec: 14051.4). Total num frames: 669753344. Throughput: 0: 3367.3. Samples: 156602292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:08,968][134211] Avg episode reward: [(0, '9.644')] [2025-01-04 09:41:10,908][134294] Updated weights for policy 0, policy_version 163524 (0.0013) [2025-01-04 09:41:13,076][134294] Updated weights for policy 0, policy_version 163534 (0.0016) [2025-01-04 09:41:13,968][134211] Fps is (10 sec: 20068.7, 60 sec: 14335.8, 300 sec: 14148.5). Total num frames: 669843456. Throughput: 0: 3633.5. Samples: 156632362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:13,969][134211] Avg episode reward: [(0, '10.065')] [2025-01-04 09:41:14,040][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163537_669847552.pth... [2025-01-04 09:41:14,115][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000162716_666484736.pth [2025-01-04 09:41:16,586][134294] Updated weights for policy 0, policy_version 163544 (0.0030) [2025-01-04 09:41:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13926.4, 300 sec: 14106.9). Total num frames: 669904896. Throughput: 0: 3624.6. Samples: 156641066. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:18,968][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 09:41:19,705][134294] Updated weights for policy 0, policy_version 163554 (0.0027) [2025-01-04 09:41:22,997][134294] Updated weights for policy 0, policy_version 163564 (0.0023) [2025-01-04 09:41:23,968][134211] Fps is (10 sec: 12698.3, 60 sec: 13994.6, 300 sec: 14107.0). Total num frames: 669970432. Throughput: 0: 3650.3. Samples: 156660200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:23,968][134211] Avg episode reward: [(0, '8.893')] [2025-01-04 09:41:26,018][134294] Updated weights for policy 0, policy_version 163574 (0.0025) [2025-01-04 09:41:28,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14062.7, 300 sec: 14093.0). Total num frames: 670035968. Throughput: 0: 3669.5. Samples: 156680028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:28,969][134211] Avg episode reward: [(0, '8.777')] [2025-01-04 09:41:29,164][134294] Updated weights for policy 0, policy_version 163584 (0.0027) [2025-01-04 09:41:32,156][134294] Updated weights for policy 0, policy_version 163594 (0.0026) [2025-01-04 09:41:33,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14199.6, 300 sec: 14079.1). Total num frames: 670101504. Throughput: 0: 3689.5. Samples: 156690066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:33,968][134211] Avg episode reward: [(0, '9.108')] [2025-01-04 09:41:35,303][134294] Updated weights for policy 0, policy_version 163604 (0.0024) [2025-01-04 09:41:38,290][134294] Updated weights for policy 0, policy_version 163614 (0.0023) [2025-01-04 09:41:38,968][134211] Fps is (10 sec: 13108.3, 60 sec: 14268.5, 300 sec: 14093.0). Total num frames: 670167040. Throughput: 0: 3693.9. Samples: 156710184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:38,968][134211] Avg episode reward: [(0, '9.603')] [2025-01-04 09:41:41,489][134294] Updated weights for policy 0, policy_version 163624 (0.0025) [2025-01-04 09:41:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 670232576. Throughput: 0: 3678.9. Samples: 156729116. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:43,969][134211] Avg episode reward: [(0, '8.697')] [2025-01-04 09:41:44,836][134294] Updated weights for policy 0, policy_version 163634 (0.0025) [2025-01-04 09:41:48,044][134294] Updated weights for policy 0, policy_version 163644 (0.0026) [2025-01-04 09:41:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14267.7, 300 sec: 13912.5). Total num frames: 670294016. Throughput: 0: 3652.5. Samples: 156738362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:48,968][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 09:41:51,072][134294] Updated weights for policy 0, policy_version 163654 (0.0025) [2025-01-04 09:41:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14267.7, 300 sec: 13801.4). Total num frames: 670359552. Throughput: 0: 3468.3. Samples: 156758368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:53,968][134211] Avg episode reward: [(0, '7.562')] [2025-01-04 09:41:54,415][134294] Updated weights for policy 0, policy_version 163664 (0.0026) [2025-01-04 09:41:57,618][134294] Updated weights for policy 0, policy_version 163674 (0.0023) [2025-01-04 09:41:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.7, 300 sec: 13801.4). Total num frames: 670425088. Throughput: 0: 3223.0. Samples: 156777396. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:41:58,968][134211] Avg episode reward: [(0, '8.566')] [2025-01-04 09:42:00,558][134294] Updated weights for policy 0, policy_version 163684 (0.0025) [2025-01-04 09:42:03,660][134294] Updated weights for policy 0, policy_version 163694 (0.0026) [2025-01-04 09:42:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 13829.2). Total num frames: 670494720. Throughput: 0: 3261.5. Samples: 156787834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:03,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 09:42:07,096][134294] Updated weights for policy 0, policy_version 163704 (0.0026) [2025-01-04 09:42:08,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13243.7, 300 sec: 13773.7). Total num frames: 670547968. Throughput: 0: 3244.5. Samples: 156806204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:08,968][134211] Avg episode reward: [(0, '7.893')] [2025-01-04 09:42:10,760][134294] Updated weights for policy 0, policy_version 163714 (0.0028) [2025-01-04 09:42:13,626][134294] Updated weights for policy 0, policy_version 163724 (0.0017) [2025-01-04 09:42:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12902.6, 300 sec: 13801.4). Total num frames: 670617600. Throughput: 0: 3210.1. Samples: 156824480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:13,968][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 09:42:15,648][134294] Updated weights for policy 0, policy_version 163734 (0.0014) [2025-01-04 09:42:17,656][134294] Updated weights for policy 0, policy_version 163744 (0.0014) [2025-01-04 09:42:18,967][134211] Fps is (10 sec: 17613.2, 60 sec: 13653.4, 300 sec: 13940.3). Total num frames: 670724096. Throughput: 0: 3321.9. Samples: 156839550. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:18,968][134211] Avg episode reward: [(0, '8.991')] [2025-01-04 09:42:19,533][134294] Updated weights for policy 0, policy_version 163754 (0.0013) [2025-01-04 09:42:22,186][134294] Updated weights for policy 0, policy_version 163764 (0.0024) [2025-01-04 09:42:23,968][134211] Fps is (10 sec: 18022.1, 60 sec: 13789.9, 300 sec: 13940.3). Total num frames: 670797824. Throughput: 0: 3497.9. Samples: 156867588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:23,969][134211] Avg episode reward: [(0, '8.048')] [2025-01-04 09:42:25,396][134294] Updated weights for policy 0, policy_version 163774 (0.0028) [2025-01-04 09:42:28,704][134294] Updated weights for policy 0, policy_version 163784 (0.0026) [2025-01-04 09:42:28,969][134211] Fps is (10 sec: 13514.9, 60 sec: 13721.5, 300 sec: 13926.3). Total num frames: 670859264. Throughput: 0: 3496.1. Samples: 156886446. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:28,970][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 09:42:32,111][134294] Updated weights for policy 0, policy_version 163794 (0.0025) [2025-01-04 09:42:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13653.3, 300 sec: 13912.5). Total num frames: 670920704. Throughput: 0: 3486.7. Samples: 156895266. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:33,968][134211] Avg episode reward: [(0, '10.045')] [2025-01-04 09:42:35,220][134294] Updated weights for policy 0, policy_version 163804 (0.0023) [2025-01-04 09:42:38,275][134294] Updated weights for policy 0, policy_version 163814 (0.0023) [2025-01-04 09:42:38,968][134211] Fps is (10 sec: 12699.3, 60 sec: 13653.4, 300 sec: 13912.5). Total num frames: 670986240. Throughput: 0: 3488.8. Samples: 156915364. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:42:38,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 09:42:41,576][134294] Updated weights for policy 0, policy_version 163824 (0.0025) [2025-01-04 09:42:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13653.4, 300 sec: 13898.6). Total num frames: 671051776. Throughput: 0: 3476.1. Samples: 156933822. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:42:43,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 09:42:44,919][134294] Updated weights for policy 0, policy_version 163834 (0.0023) [2025-01-04 09:42:48,121][134294] Updated weights for policy 0, policy_version 163844 (0.0026) [2025-01-04 09:42:48,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 671113216. Throughput: 0: 3444.1. Samples: 156942818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:42:48,968][134211] Avg episode reward: [(0, '9.805')] [2025-01-04 09:42:51,163][134294] Updated weights for policy 0, policy_version 163854 (0.0026) [2025-01-04 09:42:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13653.4, 300 sec: 13857.0). Total num frames: 671178752. Throughput: 0: 3485.2. Samples: 156963038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:42:53,968][134211] Avg episode reward: [(0, '8.628')] [2025-01-04 09:42:54,277][134294] Updated weights for policy 0, policy_version 163864 (0.0025) [2025-01-04 09:42:57,317][134294] Updated weights for policy 0, policy_version 163874 (0.0024) [2025-01-04 09:42:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13721.6, 300 sec: 13870.9). Total num frames: 671248384. Throughput: 0: 3523.4. Samples: 156983034. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:42:58,968][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 09:43:00,359][134294] Updated weights for policy 0, policy_version 163884 (0.0025) [2025-01-04 09:43:03,436][134294] Updated weights for policy 0, policy_version 163894 (0.0025) [2025-01-04 09:43:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13653.3, 300 sec: 13898.7). Total num frames: 671313920. Throughput: 0: 3412.0. Samples: 156993090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:03,969][134211] Avg episode reward: [(0, '9.037')] [2025-01-04 09:43:06,454][134294] Updated weights for policy 0, policy_version 163904 (0.0026) [2025-01-04 09:43:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13926.4, 300 sec: 13940.3). Total num frames: 671383552. Throughput: 0: 3242.7. Samples: 157013510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:08,968][134211] Avg episode reward: [(0, '9.904')] [2025-01-04 09:43:09,363][134294] Updated weights for policy 0, policy_version 163914 (0.0026) [2025-01-04 09:43:12,626][134294] Updated weights for policy 0, policy_version 163924 (0.0026) [2025-01-04 09:43:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13789.8, 300 sec: 13815.3). Total num frames: 671444992. Throughput: 0: 3256.5. Samples: 157032984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:13,968][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 09:43:14,010][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163928_671449088.pth... [2025-01-04 09:43:14,074][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163118_668131328.pth [2025-01-04 09:43:15,938][134294] Updated weights for policy 0, policy_version 163934 (0.0023) [2025-01-04 09:43:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 13676.5). Total num frames: 671510528. 
Throughput: 0: 3270.9. Samples: 157042458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:18,968][134211] Avg episode reward: [(0, '9.297')] [2025-01-04 09:43:18,977][134294] Updated weights for policy 0, policy_version 163944 (0.0024) [2025-01-04 09:43:21,305][134294] Updated weights for policy 0, policy_version 163954 (0.0018) [2025-01-04 09:43:23,591][134294] Updated weights for policy 0, policy_version 163964 (0.0019) [2025-01-04 09:43:23,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 671600640. Throughput: 0: 3367.5. Samples: 157066902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:23,968][134211] Avg episode reward: [(0, '8.613')] [2025-01-04 09:43:26,532][134294] Updated weights for policy 0, policy_version 163974 (0.0025) [2025-01-04 09:43:28,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13448.8, 300 sec: 13759.8). Total num frames: 671666176. Throughput: 0: 3418.9. Samples: 157087674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:28,968][134211] Avg episode reward: [(0, '9.072')] [2025-01-04 09:43:29,718][134294] Updated weights for policy 0, policy_version 163984 (0.0025) [2025-01-04 09:43:32,759][134294] Updated weights for policy 0, policy_version 163994 (0.0026) [2025-01-04 09:43:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 13759.8). Total num frames: 671731712. Throughput: 0: 3443.4. Samples: 157097772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:33,968][134211] Avg episode reward: [(0, '8.774')] [2025-01-04 09:43:35,752][134294] Updated weights for policy 0, policy_version 164004 (0.0027) [2025-01-04 09:43:38,944][134294] Updated weights for policy 0, policy_version 164014 (0.0023) [2025-01-04 09:43:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.0, 300 sec: 13787.6). Total num frames: 671801344. Throughput: 0: 3443.7. Samples: 157118004. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:38,968][134211] Avg episode reward: [(0, '9.756')] [2025-01-04 09:43:41,683][134294] Updated weights for policy 0, policy_version 164024 (0.0017) [2025-01-04 09:43:43,721][134294] Updated weights for policy 0, policy_version 164034 (0.0013) [2025-01-04 09:43:43,970][134211] Fps is (10 sec: 15561.7, 60 sec: 13925.9, 300 sec: 13870.8). Total num frames: 671887360. Throughput: 0: 3521.4. Samples: 157141502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:43:43,970][134211] Avg episode reward: [(0, '10.612')] [2025-01-04 09:43:45,689][134294] Updated weights for policy 0, policy_version 164044 (0.0013) [2025-01-04 09:43:47,712][134294] Updated weights for policy 0, policy_version 164054 (0.0015) [2025-01-04 09:43:48,968][134211] Fps is (10 sec: 18841.5, 60 sec: 14609.1, 300 sec: 13995.8). Total num frames: 671989760. Throughput: 0: 3631.8. Samples: 157156522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:43:48,968][134211] Avg episode reward: [(0, '8.191')] [2025-01-04 09:43:49,735][134294] Updated weights for policy 0, policy_version 164064 (0.0015) [2025-01-04 09:43:52,812][134294] Updated weights for policy 0, policy_version 164074 (0.0029) [2025-01-04 09:43:53,968][134211] Fps is (10 sec: 17206.3, 60 sec: 14677.3, 300 sec: 13995.8). Total num frames: 672059392. Throughput: 0: 3764.5. Samples: 157182914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:43:53,969][134211] Avg episode reward: [(0, '9.510')] [2025-01-04 09:43:55,995][134294] Updated weights for policy 0, policy_version 164084 (0.0024) [2025-01-04 09:43:58,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14540.8, 300 sec: 13981.9). Total num frames: 672120832. Throughput: 0: 3738.3. Samples: 157201208. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:43:58,968][134211] Avg episode reward: [(0, '8.693')] [2025-01-04 09:43:59,595][134294] Updated weights for policy 0, policy_version 164094 (0.0026) [2025-01-04 09:44:02,746][134294] Updated weights for policy 0, policy_version 164104 (0.0029) [2025-01-04 09:44:03,970][134211] Fps is (10 sec: 12285.5, 60 sec: 14472.0, 300 sec: 13954.1). Total num frames: 672182272. Throughput: 0: 3736.6. Samples: 157210612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:03,971][134211] Avg episode reward: [(0, '9.860')] [2025-01-04 09:44:06,197][134294] Updated weights for policy 0, policy_version 164114 (0.0029) [2025-01-04 09:44:08,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14267.7, 300 sec: 13926.4). Total num frames: 672239616. Throughput: 0: 3588.7. Samples: 157228394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:08,969][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 09:44:09,904][134294] Updated weights for policy 0, policy_version 164124 (0.0025) [2025-01-04 09:44:13,386][134294] Updated weights for policy 0, policy_version 164134 (0.0026) [2025-01-04 09:44:13,968][134211] Fps is (10 sec: 11471.1, 60 sec: 14199.5, 300 sec: 13898.6). Total num frames: 672296960. Throughput: 0: 3513.4. Samples: 157245776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:13,969][134211] Avg episode reward: [(0, '8.070')] [2025-01-04 09:44:16,702][134294] Updated weights for policy 0, policy_version 164144 (0.0025) [2025-01-04 09:44:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14199.5, 300 sec: 13884.7). Total num frames: 672362496. Throughput: 0: 3495.2. Samples: 157255058. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:18,968][134211] Avg episode reward: [(0, '9.582')] [2025-01-04 09:44:19,791][134294] Updated weights for policy 0, policy_version 164154 (0.0022) [2025-01-04 09:44:22,783][134294] Updated weights for policy 0, policy_version 164164 (0.0027) [2025-01-04 09:44:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13789.9, 300 sec: 13870.9). Total num frames: 672428032. Throughput: 0: 3490.9. Samples: 157275094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:23,968][134211] Avg episode reward: [(0, '9.437')] [2025-01-04 09:44:25,846][134294] Updated weights for policy 0, policy_version 164174 (0.0023) [2025-01-04 09:44:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13789.9, 300 sec: 13857.0). Total num frames: 672493568. Throughput: 0: 3407.1. Samples: 157294816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:28,968][134211] Avg episode reward: [(0, '8.329')] [2025-01-04 09:44:29,100][134294] Updated weights for policy 0, policy_version 164184 (0.0026) [2025-01-04 09:44:32,301][134294] Updated weights for policy 0, policy_version 164194 (0.0028) [2025-01-04 09:44:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13721.6, 300 sec: 13843.1). Total num frames: 672555008. Throughput: 0: 3283.3. Samples: 157304272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:33,968][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 09:44:35,600][134294] Updated weights for policy 0, policy_version 164204 (0.0027) [2025-01-04 09:44:38,877][134294] Updated weights for policy 0, policy_version 164214 (0.0023) [2025-01-04 09:44:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13653.3, 300 sec: 13801.4). Total num frames: 672620544. Throughput: 0: 3113.3. Samples: 157323012. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:38,968][134211] Avg episode reward: [(0, '9.393')] [2025-01-04 09:44:41,981][134294] Updated weights for policy 0, policy_version 164224 (0.0026) [2025-01-04 09:44:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13244.2, 300 sec: 13787.6). Total num frames: 672681984. Throughput: 0: 3128.4. Samples: 157341988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:43,968][134211] Avg episode reward: [(0, '9.950')] [2025-01-04 09:44:45,254][134294] Updated weights for policy 0, policy_version 164234 (0.0028) [2025-01-04 09:44:47,435][134294] Updated weights for policy 0, policy_version 164244 (0.0014) [2025-01-04 09:44:48,967][134211] Fps is (10 sec: 15155.5, 60 sec: 13039.0, 300 sec: 13857.0). Total num frames: 672772096. Throughput: 0: 3159.9. Samples: 157352798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:44:48,968][134211] Avg episode reward: [(0, '9.797')] [2025-01-04 09:44:49,403][134294] Updated weights for policy 0, policy_version 164254 (0.0014) [2025-01-04 09:44:51,283][134294] Updated weights for policy 0, policy_version 164264 (0.0013) [2025-01-04 09:44:53,210][134294] Updated weights for policy 0, policy_version 164274 (0.0014) [2025-01-04 09:44:53,968][134211] Fps is (10 sec: 19660.9, 60 sec: 13653.4, 300 sec: 13982.0). Total num frames: 672878592. Throughput: 0: 3470.1. Samples: 157384550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:44:53,968][134211] Avg episode reward: [(0, '9.323')] [2025-01-04 09:44:55,967][134294] Updated weights for policy 0, policy_version 164284 (0.0022) [2025-01-04 09:44:58,968][134211] Fps is (10 sec: 17203.0, 60 sec: 13721.6, 300 sec: 13981.9). Total num frames: 672944128. Throughput: 0: 3584.2. Samples: 157407062. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:44:58,968][134211] Avg episode reward: [(0, '8.548')] [2025-01-04 09:44:59,240][134294] Updated weights for policy 0, policy_version 164294 (0.0027) [2025-01-04 09:45:02,395][134294] Updated weights for policy 0, policy_version 164304 (0.0028) [2025-01-04 09:45:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13722.1, 300 sec: 13968.1). Total num frames: 673005568. Throughput: 0: 3587.2. Samples: 157416484. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:03,968][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 09:45:05,665][134294] Updated weights for policy 0, policy_version 164314 (0.0026) [2025-01-04 09:45:08,710][134294] Updated weights for policy 0, policy_version 164324 (0.0026) [2025-01-04 09:45:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 13870.9). Total num frames: 673075200. Throughput: 0: 3575.7. Samples: 157436002. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:08,968][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 09:45:11,813][134294] Updated weights for policy 0, policy_version 164334 (0.0024) [2025-01-04 09:45:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 13787.6). Total num frames: 673136640. Throughput: 0: 3565.9. Samples: 157455282. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:13,968][134211] Avg episode reward: [(0, '9.807')] [2025-01-04 09:45:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000164340_673136640.pth... 
[2025-01-04 09:45:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163537_669847552.pth [2025-01-04 09:45:15,222][134294] Updated weights for policy 0, policy_version 164344 (0.0024) [2025-01-04 09:45:18,424][134294] Updated weights for policy 0, policy_version 164354 (0.0024) [2025-01-04 09:45:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13926.4, 300 sec: 13787.5). Total num frames: 673198080. Throughput: 0: 3559.7. Samples: 157464460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:18,968][134211] Avg episode reward: [(0, '8.323')] [2025-01-04 09:45:21,371][134294] Updated weights for policy 0, policy_version 164364 (0.0024) [2025-01-04 09:45:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13994.6, 300 sec: 13815.3). Total num frames: 673267712. Throughput: 0: 3592.2. Samples: 157484662. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:23,968][134211] Avg episode reward: [(0, '8.135')] [2025-01-04 09:45:24,465][134294] Updated weights for policy 0, policy_version 164374 (0.0025) [2025-01-04 09:45:27,401][134294] Updated weights for policy 0, policy_version 164384 (0.0026) [2025-01-04 09:45:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14062.9, 300 sec: 13857.0). Total num frames: 673337344. Throughput: 0: 3622.0. Samples: 157504976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:28,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 09:45:30,465][134294] Updated weights for policy 0, policy_version 164394 (0.0026) [2025-01-04 09:45:33,641][134294] Updated weights for policy 0, policy_version 164404 (0.0025) [2025-01-04 09:45:33,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14062.8, 300 sec: 13857.1). Total num frames: 673398784. Throughput: 0: 3605.7. Samples: 157515056. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:33,969][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 09:45:36,801][134294] Updated weights for policy 0, policy_version 164414 (0.0025) [2025-01-04 09:45:38,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14062.8, 300 sec: 13857.0). Total num frames: 673464320. Throughput: 0: 3333.7. Samples: 157534568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:38,969][134211] Avg episode reward: [(0, '8.816')] [2025-01-04 09:45:40,071][134294] Updated weights for policy 0, policy_version 164424 (0.0027) [2025-01-04 09:45:43,304][134294] Updated weights for policy 0, policy_version 164434 (0.0026) [2025-01-04 09:45:43,968][134211] Fps is (10 sec: 12698.4, 60 sec: 14062.9, 300 sec: 13857.0). Total num frames: 673525760. Throughput: 0: 3248.9. Samples: 157553262. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:43,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 09:45:46,444][134294] Updated weights for policy 0, policy_version 164444 (0.0023) [2025-01-04 09:45:48,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13721.5, 300 sec: 13870.9). Total num frames: 673595392. Throughput: 0: 3255.2. Samples: 157562970. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:48,968][134211] Avg episode reward: [(0, '8.948')] [2025-01-04 09:45:49,677][134294] Updated weights for policy 0, policy_version 164454 (0.0027) [2025-01-04 09:45:52,586][134294] Updated weights for policy 0, policy_version 164464 (0.0027) [2025-01-04 09:45:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13038.9, 300 sec: 13870.9). Total num frames: 673660928. Throughput: 0: 3268.2. Samples: 157583070. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 09:45:53,968][134211] Avg episode reward: [(0, '9.634')] [2025-01-04 09:45:55,632][134294] Updated weights for policy 0, policy_version 164474 (0.0026) [2025-01-04 09:45:58,565][134294] Updated weights for policy 0, policy_version 164484 (0.0025) [2025-01-04 09:45:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13107.2, 300 sec: 13857.0). Total num frames: 673730560. Throughput: 0: 3292.7. Samples: 157603454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:45:58,968][134211] Avg episode reward: [(0, '7.994')] [2025-01-04 09:46:01,695][134294] Updated weights for policy 0, policy_version 164494 (0.0022) [2025-01-04 09:46:03,891][134294] Updated weights for policy 0, policy_version 164504 (0.0014) [2025-01-04 09:46:03,967][134211] Fps is (10 sec: 14746.0, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 673808384. Throughput: 0: 3296.1. Samples: 157612784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:03,968][134211] Avg episode reward: [(0, '8.588')] [2025-01-04 09:46:06,114][134294] Updated weights for policy 0, policy_version 164514 (0.0014) [2025-01-04 09:46:08,417][134294] Updated weights for policy 0, policy_version 164524 (0.0016) [2025-01-04 09:46:08,968][134211] Fps is (10 sec: 16384.0, 60 sec: 13653.4, 300 sec: 13732.1). Total num frames: 673894400. Throughput: 0: 3466.7. Samples: 157640662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:08,968][134211] Avg episode reward: [(0, '9.422')] [2025-01-04 09:46:11,783][134294] Updated weights for policy 0, policy_version 164534 (0.0026) [2025-01-04 09:46:13,968][134211] Fps is (10 sec: 14745.3, 60 sec: 13653.3, 300 sec: 13732.0). Total num frames: 673955840. Throughput: 0: 3448.3. Samples: 157660150. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:13,968][134211] Avg episode reward: [(0, '8.408')] [2025-01-04 09:46:15,283][134294] Updated weights for policy 0, policy_version 164544 (0.0028) [2025-01-04 09:46:18,481][134294] Updated weights for policy 0, policy_version 164554 (0.0026) [2025-01-04 09:46:18,968][134211] Fps is (10 sec: 12287.6, 60 sec: 13653.3, 300 sec: 13718.1). Total num frames: 674017280. Throughput: 0: 3422.9. Samples: 157669084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:18,969][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 09:46:21,471][134294] Updated weights for policy 0, policy_version 164564 (0.0025) [2025-01-04 09:46:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13585.0, 300 sec: 13718.2). Total num frames: 674082816. Throughput: 0: 3431.1. Samples: 157688968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:23,969][134211] Avg episode reward: [(0, '9.193')] [2025-01-04 09:46:24,687][134294] Updated weights for policy 0, policy_version 164574 (0.0024) [2025-01-04 09:46:27,715][134294] Updated weights for policy 0, policy_version 164584 (0.0026) [2025-01-04 09:46:28,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13585.0, 300 sec: 13732.0). Total num frames: 674152448. Throughput: 0: 3455.2. Samples: 157708744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:28,968][134211] Avg episode reward: [(0, '8.888')] [2025-01-04 09:46:30,686][134294] Updated weights for policy 0, policy_version 164594 (0.0024) [2025-01-04 09:46:33,817][134294] Updated weights for policy 0, policy_version 164604 (0.0027) [2025-01-04 09:46:33,969][134211] Fps is (10 sec: 13514.8, 60 sec: 13653.1, 300 sec: 13731.9). Total num frames: 674217984. Throughput: 0: 3469.5. Samples: 157719104. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:33,970][134211] Avg episode reward: [(0, '9.225')] [2025-01-04 09:46:36,836][134294] Updated weights for policy 0, policy_version 164614 (0.0025) [2025-01-04 09:46:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.4, 300 sec: 13732.0). Total num frames: 674283520. Throughput: 0: 3471.4. Samples: 157739284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:38,968][134211] Avg episode reward: [(0, '10.910')] [2025-01-04 09:46:39,007][134264] Saving new best policy, reward=10.910! [2025-01-04 09:46:40,086][134294] Updated weights for policy 0, policy_version 164624 (0.0027) [2025-01-04 09:46:43,395][134294] Updated weights for policy 0, policy_version 164634 (0.0023) [2025-01-04 09:46:43,968][134211] Fps is (10 sec: 12699.6, 60 sec: 13653.3, 300 sec: 13732.0). Total num frames: 674344960. Throughput: 0: 3429.0. Samples: 157757758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:43,968][134211] Avg episode reward: [(0, '7.983')] [2025-01-04 09:46:46,707][134294] Updated weights for policy 0, policy_version 164644 (0.0025) [2025-01-04 09:46:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13585.1, 300 sec: 13732.0). Total num frames: 674410496. Throughput: 0: 3429.3. Samples: 157767102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:48,968][134211] Avg episode reward: [(0, '9.885')] [2025-01-04 09:46:49,580][134294] Updated weights for policy 0, policy_version 164654 (0.0025) [2025-01-04 09:46:51,454][134294] Updated weights for policy 0, policy_version 164664 (0.0015) [2025-01-04 09:46:53,383][134294] Updated weights for policy 0, policy_version 164674 (0.0013) [2025-01-04 09:46:53,967][134211] Fps is (10 sec: 17203.7, 60 sec: 14267.8, 300 sec: 13870.9). Total num frames: 674516992. Throughput: 0: 3386.0. Samples: 157793030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:53,968][134211] Avg episode reward: [(0, '9.328')] [2025-01-04 09:46:55,225][134294] Updated weights for policy 0, policy_version 164684 (0.0013) [2025-01-04 09:46:57,204][134294] Updated weights for policy 0, policy_version 164694 (0.0016) [2025-01-04 09:46:58,968][134211] Fps is (10 sec: 20070.2, 60 sec: 14677.3, 300 sec: 13954.2). Total num frames: 674611200. Throughput: 0: 3632.5. Samples: 157823614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:46:58,968][134211] Avg episode reward: [(0, '8.624')] [2025-01-04 09:47:00,132][134294] Updated weights for policy 0, policy_version 164704 (0.0027) [2025-01-04 09:47:03,377][134294] Updated weights for policy 0, policy_version 164714 (0.0027) [2025-01-04 09:47:03,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14404.2, 300 sec: 13981.9). Total num frames: 674672640. Throughput: 0: 3651.8. Samples: 157833414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:03,968][134211] Avg episode reward: [(0, '8.260')] [2025-01-04 09:47:06,604][134294] Updated weights for policy 0, policy_version 164724 (0.0024) [2025-01-04 09:47:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14062.9, 300 sec: 13968.0). Total num frames: 674738176. Throughput: 0: 3635.5. Samples: 157852566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:08,968][134211] Avg episode reward: [(0, '9.650')] [2025-01-04 09:47:09,809][134294] Updated weights for policy 0, policy_version 164734 (0.0027) [2025-01-04 09:47:13,189][134294] Updated weights for policy 0, policy_version 164744 (0.0025) [2025-01-04 09:47:13,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14062.8, 300 sec: 13815.3). Total num frames: 674799616. Throughput: 0: 3609.0. Samples: 157871150. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:13,969][134211] Avg episode reward: [(0, '9.077')] [2025-01-04 09:47:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000164746_674799616.pth... [2025-01-04 09:47:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000163928_671449088.pth [2025-01-04 09:47:16,466][134294] Updated weights for policy 0, policy_version 164754 (0.0026) [2025-01-04 09:47:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 13787.5). Total num frames: 674865152. Throughput: 0: 3582.7. Samples: 157880318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:18,968][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 09:47:19,426][134294] Updated weights for policy 0, policy_version 164764 (0.0024) [2025-01-04 09:47:22,498][134294] Updated weights for policy 0, policy_version 164774 (0.0026) [2025-01-04 09:47:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14199.5, 300 sec: 13815.4). Total num frames: 674934784. Throughput: 0: 3587.8. Samples: 157900736. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:23,968][134211] Avg episode reward: [(0, '8.212')] [2025-01-04 09:47:25,428][134294] Updated weights for policy 0, policy_version 164784 (0.0025) [2025-01-04 09:47:28,330][134294] Updated weights for policy 0, policy_version 164794 (0.0023) [2025-01-04 09:47:28,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14131.2, 300 sec: 13829.2). Total num frames: 675000320. Throughput: 0: 3643.4. Samples: 157921712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:28,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 09:47:31,384][134294] Updated weights for policy 0, policy_version 164804 (0.0026) [2025-01-04 09:47:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.9, 300 sec: 13843.1). Total num frames: 675069952. 
Throughput: 0: 3664.0. Samples: 157931982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:33,968][134211] Avg episode reward: [(0, '9.373')] [2025-01-04 09:47:34,574][134294] Updated weights for policy 0, policy_version 164814 (0.0026) [2025-01-04 09:47:37,753][134294] Updated weights for policy 0, policy_version 164824 (0.0025) [2025-01-04 09:47:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 13829.2). Total num frames: 675131392. Throughput: 0: 3518.8. Samples: 157951376. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:38,968][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 09:47:40,982][134294] Updated weights for policy 0, policy_version 164834 (0.0026) [2025-01-04 09:47:43,970][134211] Fps is (10 sec: 12694.8, 60 sec: 14199.0, 300 sec: 13843.0). Total num frames: 675196928. Throughput: 0: 3266.9. Samples: 157970630. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:43,971][134211] Avg episode reward: [(0, '8.445')] [2025-01-04 09:47:44,025][134294] Updated weights for policy 0, policy_version 164844 (0.0025) [2025-01-04 09:47:47,291][134294] Updated weights for policy 0, policy_version 164854 (0.0026) [2025-01-04 09:47:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14199.4, 300 sec: 13843.1). Total num frames: 675262464. Throughput: 0: 3261.1. Samples: 157980164. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:48,968][134211] Avg episode reward: [(0, '8.850')] [2025-01-04 09:47:50,278][134294] Updated weights for policy 0, policy_version 164864 (0.0025) [2025-01-04 09:47:53,137][134294] Updated weights for policy 0, policy_version 164874 (0.0023) [2025-01-04 09:47:53,968][134211] Fps is (10 sec: 13519.4, 60 sec: 13585.0, 300 sec: 13843.1). Total num frames: 675332096. Throughput: 0: 3291.5. Samples: 158000686. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:53,969][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 09:47:56,158][134294] Updated weights for policy 0, policy_version 164884 (0.0023) [2025-01-04 09:47:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13175.5, 300 sec: 13857.0). Total num frames: 675401728. Throughput: 0: 3335.7. Samples: 158021254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:47:58,968][134211] Avg episode reward: [(0, '8.725')] [2025-01-04 09:47:59,252][134294] Updated weights for policy 0, policy_version 164894 (0.0027) [2025-01-04 09:48:02,313][134294] Updated weights for policy 0, policy_version 164904 (0.0025) [2025-01-04 09:48:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13243.7, 300 sec: 13843.1). Total num frames: 675467264. Throughput: 0: 3355.6. Samples: 158031318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:48:03,969][134211] Avg episode reward: [(0, '8.878')] [2025-01-04 09:48:05,566][134294] Updated weights for policy 0, policy_version 164914 (0.0028) [2025-01-04 09:48:07,801][134294] Updated weights for policy 0, policy_version 164924 (0.0013) [2025-01-04 09:48:08,967][134211] Fps is (10 sec: 14336.4, 60 sec: 13448.6, 300 sec: 13898.6). Total num frames: 675545088. Throughput: 0: 3374.7. Samples: 158052598. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:08,968][134211] Avg episode reward: [(0, '8.283')] [2025-01-04 09:48:10,167][134294] Updated weights for policy 0, policy_version 164934 (0.0014) [2025-01-04 09:48:13,468][134294] Updated weights for policy 0, policy_version 164944 (0.0024) [2025-01-04 09:48:13,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13585.2, 300 sec: 13912.5). Total num frames: 675614720. Throughput: 0: 3404.3. Samples: 158074906. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:13,969][134211] Avg episode reward: [(0, '9.643')] [2025-01-04 09:48:16,708][134294] Updated weights for policy 0, policy_version 164954 (0.0030) [2025-01-04 09:48:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13516.9, 300 sec: 13815.3). Total num frames: 675676160. Throughput: 0: 3387.1. Samples: 158084400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:18,968][134211] Avg episode reward: [(0, '7.706')] [2025-01-04 09:48:19,923][134294] Updated weights for policy 0, policy_version 164964 (0.0028) [2025-01-04 09:48:22,911][134294] Updated weights for policy 0, policy_version 164974 (0.0022) [2025-01-04 09:48:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13516.8, 300 sec: 13829.2). Total num frames: 675745792. Throughput: 0: 3394.7. Samples: 158104136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:23,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 09:48:25,882][134294] Updated weights for policy 0, policy_version 164984 (0.0022) [2025-01-04 09:48:28,968][134211] Fps is (10 sec: 13515.7, 60 sec: 13516.6, 300 sec: 13829.2). Total num frames: 675811328. Throughput: 0: 3417.7. Samples: 158124420. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:28,969][134211] Avg episode reward: [(0, '9.802')] [2025-01-04 09:48:29,017][134294] Updated weights for policy 0, policy_version 164994 (0.0025) [2025-01-04 09:48:32,230][134294] Updated weights for policy 0, policy_version 165004 (0.0024) [2025-01-04 09:48:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13815.3). Total num frames: 675876864. Throughput: 0: 3417.8. Samples: 158133964. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:33,968][134211] Avg episode reward: [(0, '10.270')] [2025-01-04 09:48:35,368][134294] Updated weights for policy 0, policy_version 165014 (0.0022) [2025-01-04 09:48:37,202][134294] Updated weights for policy 0, policy_version 165024 (0.0015) [2025-01-04 09:48:38,967][134211] Fps is (10 sec: 16385.6, 60 sec: 14063.0, 300 sec: 13857.1). Total num frames: 675975168. Throughput: 0: 3486.4. Samples: 158157572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:38,968][134211] Avg episode reward: [(0, '9.665')] [2025-01-04 09:48:39,141][134294] Updated weights for policy 0, policy_version 165034 (0.0012) [2025-01-04 09:48:41,105][134294] Updated weights for policy 0, policy_version 165044 (0.0014) [2025-01-04 09:48:43,192][134294] Updated weights for policy 0, policy_version 165054 (0.0014) [2025-01-04 09:48:43,968][134211] Fps is (10 sec: 19251.1, 60 sec: 14541.3, 300 sec: 13829.2). Total num frames: 676069376. Throughput: 0: 3712.4. Samples: 158188314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:43,969][134211] Avg episode reward: [(0, '8.388')] [2025-01-04 09:48:47,183][134294] Updated weights for policy 0, policy_version 165064 (0.0031) [2025-01-04 09:48:48,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14336.0, 300 sec: 13773.7). Total num frames: 676122624. Throughput: 0: 3661.3. Samples: 158196078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:48,968][134211] Avg episode reward: [(0, '8.818')] [2025-01-04 09:48:50,675][134294] Updated weights for policy 0, policy_version 165074 (0.0029) [2025-01-04 09:48:53,968][134211] Fps is (10 sec: 10649.8, 60 sec: 14063.0, 300 sec: 13745.9). Total num frames: 676175872. Throughput: 0: 3583.8. Samples: 158213870. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:53,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 09:48:54,433][134294] Updated weights for policy 0, policy_version 165084 (0.0032) [2025-01-04 09:48:58,050][134294] Updated weights for policy 0, policy_version 165094 (0.0031) [2025-01-04 09:48:58,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13858.1, 300 sec: 13732.1). Total num frames: 676233216. Throughput: 0: 3438.3. Samples: 158229630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:48:58,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 09:49:01,251][134294] Updated weights for policy 0, policy_version 165104 (0.0027) [2025-01-04 09:49:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.2, 300 sec: 13759.8). Total num frames: 676298752. Throughput: 0: 3442.2. Samples: 158239300. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:49:03,969][134211] Avg episode reward: [(0, '10.534')] [2025-01-04 09:49:04,669][134294] Updated weights for policy 0, policy_version 165114 (0.0027) [2025-01-04 09:49:07,613][134294] Updated weights for policy 0, policy_version 165124 (0.0024) [2025-01-04 09:49:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13585.0, 300 sec: 13773.7). Total num frames: 676360192. Throughput: 0: 3430.6. Samples: 158258512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 09:49:08,968][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 09:49:10,933][134294] Updated weights for policy 0, policy_version 165134 (0.0026) [2025-01-04 09:49:13,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13312.1, 300 sec: 13732.0). Total num frames: 676413440. Throughput: 0: 3319.3. Samples: 158273784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:13,968][134211] Avg episode reward: [(0, '9.127')] [2025-01-04 09:49:13,996][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165141_676417536.pth... 
[2025-01-04 09:49:14,044][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000164340_673136640.pth [2025-01-04 09:49:14,687][134294] Updated weights for policy 0, policy_version 165144 (0.0017) [2025-01-04 09:49:16,698][134294] Updated weights for policy 0, policy_version 165154 (0.0013) [2025-01-04 09:49:18,651][134294] Updated weights for policy 0, policy_version 165164 (0.0014) [2025-01-04 09:49:18,968][134211] Fps is (10 sec: 15563.8, 60 sec: 13994.5, 300 sec: 13856.9). Total num frames: 676515840. Throughput: 0: 3430.4. Samples: 158288332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:18,969][134211] Avg episode reward: [(0, '9.598')] [2025-01-04 09:49:20,534][134294] Updated weights for policy 0, policy_version 165174 (0.0013) [2025-01-04 09:49:22,462][134294] Updated weights for policy 0, policy_version 165184 (0.0013) [2025-01-04 09:49:23,968][134211] Fps is (10 sec: 21299.0, 60 sec: 14677.4, 300 sec: 14009.7). Total num frames: 676626432. Throughput: 0: 3614.3. Samples: 158320218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:23,968][134211] Avg episode reward: [(0, '9.117')] [2025-01-04 09:49:24,372][134294] Updated weights for policy 0, policy_version 165194 (0.0013) [2025-01-04 09:49:26,451][134294] Updated weights for policy 0, policy_version 165204 (0.0016) [2025-01-04 09:49:28,968][134211] Fps is (10 sec: 18842.1, 60 sec: 14882.2, 300 sec: 14065.2). Total num frames: 676704256. Throughput: 0: 3546.3. Samples: 158347898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:28,969][134211] Avg episode reward: [(0, '8.532')] [2025-01-04 09:49:29,625][134294] Updated weights for policy 0, policy_version 165214 (0.0026) [2025-01-04 09:49:33,342][134294] Updated weights for policy 0, policy_version 165224 (0.0031) [2025-01-04 09:49:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14037.5). Total num frames: 676761600. 
Throughput: 0: 3561.3. Samples: 158356336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:33,968][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 09:49:36,662][134294] Updated weights for policy 0, policy_version 165234 (0.0029) [2025-01-04 09:49:38,968][134211] Fps is (10 sec: 11878.9, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 676823040. Throughput: 0: 3566.2. Samples: 158374350. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:38,968][134211] Avg episode reward: [(0, '10.105')] [2025-01-04 09:49:40,176][134294] Updated weights for policy 0, policy_version 165244 (0.0027) [2025-01-04 09:49:43,468][134294] Updated weights for policy 0, policy_version 165254 (0.0029) [2025-01-04 09:49:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13585.1, 300 sec: 13940.3). Total num frames: 676884480. Throughput: 0: 3617.9. Samples: 158392436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:43,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 09:49:46,852][134294] Updated weights for policy 0, policy_version 165264 (0.0027) [2025-01-04 09:49:48,968][134211] Fps is (10 sec: 12287.3, 60 sec: 13721.5, 300 sec: 13787.5). Total num frames: 676945920. Throughput: 0: 3603.2. Samples: 158401446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:48,969][134211] Avg episode reward: [(0, '9.338')] [2025-01-04 09:49:50,078][134294] Updated weights for policy 0, policy_version 165274 (0.0030) [2025-01-04 09:49:53,135][134294] Updated weights for policy 0, policy_version 165284 (0.0026) [2025-01-04 09:49:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 13787.5). Total num frames: 677011456. Throughput: 0: 3606.2. Samples: 158420790. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:53,969][134211] Avg episode reward: [(0, '8.999')] [2025-01-04 09:49:56,311][134294] Updated weights for policy 0, policy_version 165294 (0.0026) [2025-01-04 09:49:58,968][134211] Fps is (10 sec: 12698.2, 60 sec: 13994.7, 300 sec: 13787.5). Total num frames: 677072896. Throughput: 0: 3691.5. Samples: 158439902. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:49:58,968][134211] Avg episode reward: [(0, '9.091')] [2025-01-04 09:49:59,695][134294] Updated weights for policy 0, policy_version 165304 (0.0027) [2025-01-04 09:50:03,094][134294] Updated weights for policy 0, policy_version 165314 (0.0029) [2025-01-04 09:50:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13926.4, 300 sec: 13759.8). Total num frames: 677134336. Throughput: 0: 3567.9. Samples: 158448884. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:50:03,968][134211] Avg episode reward: [(0, '8.342')] [2025-01-04 09:50:06,470][134294] Updated weights for policy 0, policy_version 165324 (0.0030) [2025-01-04 09:50:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13858.1, 300 sec: 13745.9). Total num frames: 677191680. Throughput: 0: 3258.5. Samples: 158466852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:50:08,969][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 09:50:10,229][134294] Updated weights for policy 0, policy_version 165334 (0.0025) [2025-01-04 09:50:13,769][134294] Updated weights for policy 0, policy_version 165344 (0.0028) [2025-01-04 09:50:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13926.3, 300 sec: 13732.0). Total num frames: 677249024. Throughput: 0: 3023.0. Samples: 158483934. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:50:13,968][134211] Avg episode reward: [(0, '8.845')] [2025-01-04 09:50:17,506][134294] Updated weights for policy 0, policy_version 165354 (0.0027) [2025-01-04 09:50:18,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13175.6, 300 sec: 13690.4). Total num frames: 677306368. Throughput: 0: 3014.4. Samples: 158491984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:18,968][134211] Avg episode reward: [(0, '8.823')] [2025-01-04 09:50:20,716][134294] Updated weights for policy 0, policy_version 165364 (0.0026) [2025-01-04 09:50:23,701][134294] Updated weights for policy 0, policy_version 165374 (0.0026) [2025-01-04 09:50:23,969][134211] Fps is (10 sec: 12696.4, 60 sec: 12492.6, 300 sec: 13690.3). Total num frames: 677376000. Throughput: 0: 3039.6. Samples: 158511136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:23,969][134211] Avg episode reward: [(0, '8.153')] [2025-01-04 09:50:26,723][134294] Updated weights for policy 0, policy_version 165384 (0.0024) [2025-01-04 09:50:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 12288.0, 300 sec: 13704.3). Total num frames: 677441536. Throughput: 0: 3081.8. Samples: 158531116. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:28,969][134211] Avg episode reward: [(0, '9.059')] [2025-01-04 09:50:29,853][134294] Updated weights for policy 0, policy_version 165394 (0.0025) [2025-01-04 09:50:32,938][134294] Updated weights for policy 0, policy_version 165404 (0.0026) [2025-01-04 09:50:33,968][134211] Fps is (10 sec: 13108.6, 60 sec: 12424.5, 300 sec: 13704.3). Total num frames: 677507072. Throughput: 0: 3098.4. Samples: 158540872. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:33,968][134211] Avg episode reward: [(0, '10.076')] [2025-01-04 09:50:36,045][134294] Updated weights for policy 0, policy_version 165414 (0.0025) [2025-01-04 09:50:38,968][134211] Fps is (10 sec: 13107.6, 60 sec: 12492.8, 300 sec: 13718.1). Total num frames: 677572608. Throughput: 0: 3106.2. Samples: 158560568. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:38,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 09:50:39,281][134294] Updated weights for policy 0, policy_version 165424 (0.0028) [2025-01-04 09:50:42,519][134294] Updated weights for policy 0, policy_version 165434 (0.0028) [2025-01-04 09:50:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12629.4, 300 sec: 13718.1). Total num frames: 677642240. Throughput: 0: 3110.8. Samples: 158579888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:43,968][134211] Avg episode reward: [(0, '9.720')] [2025-01-04 09:50:44,809][134294] Updated weights for policy 0, policy_version 165444 (0.0016) [2025-01-04 09:50:46,918][134294] Updated weights for policy 0, policy_version 165454 (0.0013) [2025-01-04 09:50:48,968][134211] Fps is (10 sec: 15155.0, 60 sec: 12970.8, 300 sec: 13773.7). Total num frames: 677724160. Throughput: 0: 3248.3. Samples: 158595058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:48,968][134211] Avg episode reward: [(0, '8.100')] [2025-01-04 09:50:50,166][134294] Updated weights for policy 0, policy_version 165464 (0.0028) [2025-01-04 09:50:53,481][134294] Updated weights for policy 0, policy_version 165474 (0.0029) [2025-01-04 09:50:53,968][134211] Fps is (10 sec: 14335.7, 60 sec: 12902.4, 300 sec: 13745.9). Total num frames: 677785600. Throughput: 0: 3279.2. Samples: 158614414. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:53,968][134211] Avg episode reward: [(0, '8.747')] [2025-01-04 09:50:56,512][134294] Updated weights for policy 0, policy_version 165484 (0.0026) [2025-01-04 09:50:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12970.7, 300 sec: 13704.2). Total num frames: 677851136. Throughput: 0: 3334.4. Samples: 158633982. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:50:58,968][134211] Avg episode reward: [(0, '9.220')] [2025-01-04 09:50:59,783][134294] Updated weights for policy 0, policy_version 165494 (0.0025) [2025-01-04 09:51:02,962][134294] Updated weights for policy 0, policy_version 165504 (0.0028) [2025-01-04 09:51:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 13634.8). Total num frames: 677916672. Throughput: 0: 3369.0. Samples: 158643590. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:03,968][134211] Avg episode reward: [(0, '10.222')] [2025-01-04 09:51:05,993][134294] Updated weights for policy 0, policy_version 165514 (0.0024) [2025-01-04 09:51:08,970][134211] Fps is (10 sec: 13104.5, 60 sec: 13175.0, 300 sec: 13648.6). Total num frames: 677982208. Throughput: 0: 3377.7. Samples: 158663136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:08,970][134211] Avg episode reward: [(0, '8.570')] [2025-01-04 09:51:09,292][134294] Updated weights for policy 0, policy_version 165524 (0.0026) [2025-01-04 09:51:12,541][134294] Updated weights for policy 0, policy_version 165534 (0.0026) [2025-01-04 09:51:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13243.7, 300 sec: 13648.7). Total num frames: 678043648. Throughput: 0: 3348.1. Samples: 158681780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:13,969][134211] Avg episode reward: [(0, '9.762')] [2025-01-04 09:51:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165538_678043648.pth... 
[2025-01-04 09:51:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000164746_674799616.pth [2025-01-04 09:51:15,965][134294] Updated weights for policy 0, policy_version 165544 (0.0027) [2025-01-04 09:51:18,968][134211] Fps is (10 sec: 12290.5, 60 sec: 13312.0, 300 sec: 13634.8). Total num frames: 678105088. Throughput: 0: 3333.2. Samples: 158690868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:18,968][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 09:51:19,170][134294] Updated weights for policy 0, policy_version 165554 (0.0022) [2025-01-04 09:51:22,215][134294] Updated weights for policy 0, policy_version 165564 (0.0025) [2025-01-04 09:51:23,968][134211] Fps is (10 sec: 13517.3, 60 sec: 13380.5, 300 sec: 13648.7). Total num frames: 678178816. Throughput: 0: 3333.8. Samples: 158710590. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:23,968][134211] Avg episode reward: [(0, '10.317')] [2025-01-04 09:51:24,489][134294] Updated weights for policy 0, policy_version 165574 (0.0016) [2025-01-04 09:51:27,275][134294] Updated weights for policy 0, policy_version 165584 (0.0024) [2025-01-04 09:51:28,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13516.8, 300 sec: 13676.6). Total num frames: 678252544. Throughput: 0: 3426.5. Samples: 158734082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:28,968][134211] Avg episode reward: [(0, '9.297')] [2025-01-04 09:51:30,409][134294] Updated weights for policy 0, policy_version 165594 (0.0027) [2025-01-04 09:51:33,372][134294] Updated weights for policy 0, policy_version 165604 (0.0025) [2025-01-04 09:51:33,968][134211] Fps is (10 sec: 13925.9, 60 sec: 13516.7, 300 sec: 13676.5). Total num frames: 678318080. Throughput: 0: 3314.3. Samples: 158744204. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:33,969][134211] Avg episode reward: [(0, '9.067')] [2025-01-04 09:51:36,618][134294] Updated weights for policy 0, policy_version 165614 (0.0025) [2025-01-04 09:51:38,969][134211] Fps is (10 sec: 13105.6, 60 sec: 13516.5, 300 sec: 13690.3). Total num frames: 678383616. Throughput: 0: 3319.6. Samples: 158763800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:38,970][134211] Avg episode reward: [(0, '9.093')] [2025-01-04 09:51:39,964][134294] Updated weights for policy 0, policy_version 165624 (0.0026) [2025-01-04 09:51:42,222][134294] Updated weights for policy 0, policy_version 165634 (0.0016) [2025-01-04 09:51:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13653.3, 300 sec: 13732.0). Total num frames: 678461440. Throughput: 0: 3375.1. Samples: 158785862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:43,968][134211] Avg episode reward: [(0, '8.656')] [2025-01-04 09:51:45,311][134294] Updated weights for policy 0, policy_version 165644 (0.0026) [2025-01-04 09:51:48,415][134294] Updated weights for policy 0, policy_version 165654 (0.0027) [2025-01-04 09:51:48,968][134211] Fps is (10 sec: 13928.3, 60 sec: 13312.0, 300 sec: 13579.3). Total num frames: 678522880. Throughput: 0: 3376.2. Samples: 158795520. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:48,968][134211] Avg episode reward: [(0, '9.419')] [2025-01-04 09:51:51,857][134294] Updated weights for policy 0, policy_version 165664 (0.0024) [2025-01-04 09:51:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13312.0, 300 sec: 13468.2). Total num frames: 678584320. Throughput: 0: 3353.8. Samples: 158814052. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:53,968][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 09:51:55,257][134294] Updated weights for policy 0, policy_version 165674 (0.0026) [2025-01-04 09:51:58,654][134294] Updated weights for policy 0, policy_version 165684 (0.0028) [2025-01-04 09:51:58,971][134211] Fps is (10 sec: 11874.6, 60 sec: 13174.8, 300 sec: 13454.2). Total num frames: 678641664. Throughput: 0: 3343.7. Samples: 158832256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:51:58,971][134211] Avg episode reward: [(0, '8.144')] [2025-01-04 09:52:01,071][134294] Updated weights for policy 0, policy_version 165694 (0.0017) [2025-01-04 09:52:03,095][134294] Updated weights for policy 0, policy_version 165704 (0.0012) [2025-01-04 09:52:03,968][134211] Fps is (10 sec: 15565.1, 60 sec: 13721.6, 300 sec: 13565.4). Total num frames: 678739968. Throughput: 0: 3412.4. Samples: 158844426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:52:03,968][134211] Avg episode reward: [(0, '8.362')] [2025-01-04 09:52:05,259][134294] Updated weights for policy 0, policy_version 165714 (0.0014) [2025-01-04 09:52:07,518][134294] Updated weights for policy 0, policy_version 165724 (0.0014) [2025-01-04 09:52:08,968][134211] Fps is (10 sec: 18436.5, 60 sec: 14063.3, 300 sec: 13648.7). Total num frames: 678825984. Throughput: 0: 3614.3. Samples: 158873236. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:52:08,969][134211] Avg episode reward: [(0, '8.671')] [2025-01-04 09:52:10,890][134294] Updated weights for policy 0, policy_version 165734 (0.0027) [2025-01-04 09:52:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13926.4, 300 sec: 13607.1). Total num frames: 678879232. Throughput: 0: 3489.7. Samples: 158891120. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:52:13,969][134211] Avg episode reward: [(0, '9.977')] [2025-01-04 09:52:14,659][134294] Updated weights for policy 0, policy_version 165744 (0.0029) [2025-01-04 09:52:18,136][134294] Updated weights for policy 0, policy_version 165754 (0.0028) [2025-01-04 09:52:18,968][134211] Fps is (10 sec: 11059.9, 60 sec: 13858.1, 300 sec: 13565.4). Total num frames: 678936576. Throughput: 0: 3451.9. Samples: 158899540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:52:18,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 09:52:21,308][134294] Updated weights for policy 0, policy_version 165764 (0.0027) [2025-01-04 09:52:23,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13653.2, 300 sec: 13551.5). Total num frames: 678998016. Throughput: 0: 3429.4. Samples: 158918120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:52:23,969][134211] Avg episode reward: [(0, '8.680')] [2025-01-04 09:52:24,749][134294] Updated weights for policy 0, policy_version 165774 (0.0028) [2025-01-04 09:52:27,814][134294] Updated weights for policy 0, policy_version 165784 (0.0028) [2025-01-04 09:52:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13516.8, 300 sec: 13537.6). Total num frames: 679063552. Throughput: 0: 3359.8. Samples: 158937052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:28,969][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 09:52:31,214][134294] Updated weights for policy 0, policy_version 165794 (0.0029) [2025-01-04 09:52:33,971][134211] Fps is (10 sec: 12694.2, 60 sec: 13447.9, 300 sec: 13537.5). Total num frames: 679124992. Throughput: 0: 3351.0. Samples: 158946326. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:33,971][134211] Avg episode reward: [(0, '8.395')] [2025-01-04 09:52:34,607][134294] Updated weights for policy 0, policy_version 165804 (0.0028) [2025-01-04 09:52:37,961][134294] Updated weights for policy 0, policy_version 165814 (0.0029) [2025-01-04 09:52:38,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13312.3, 300 sec: 13510.0). Total num frames: 679182336. Throughput: 0: 3341.9. Samples: 158964436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:38,968][134211] Avg episode reward: [(0, '9.450')] [2025-01-04 09:52:41,422][134294] Updated weights for policy 0, policy_version 165824 (0.0028) [2025-01-04 09:52:43,968][134211] Fps is (10 sec: 11472.1, 60 sec: 12970.6, 300 sec: 13482.1). Total num frames: 679239680. Throughput: 0: 3324.3. Samples: 158981838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:43,969][134211] Avg episode reward: [(0, '7.911')] [2025-01-04 09:52:45,135][134294] Updated weights for policy 0, policy_version 165834 (0.0028) [2025-01-04 09:52:48,771][134294] Updated weights for policy 0, policy_version 165844 (0.0024) [2025-01-04 09:52:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 12902.4, 300 sec: 13440.4). Total num frames: 679297024. Throughput: 0: 3236.2. Samples: 158990056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:48,968][134211] Avg episode reward: [(0, '8.755')] [2025-01-04 09:52:52,065][134294] Updated weights for policy 0, policy_version 165854 (0.0025) [2025-01-04 09:52:53,967][134211] Fps is (10 sec: 13107.6, 60 sec: 13107.3, 300 sec: 13454.3). Total num frames: 679370752. Throughput: 0: 2996.9. Samples: 159008092. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:53,968][134211] Avg episode reward: [(0, '9.655')] [2025-01-04 09:52:54,230][134294] Updated weights for policy 0, policy_version 165864 (0.0013) [2025-01-04 09:52:56,234][134294] Updated weights for policy 0, policy_version 165874 (0.0015) [2025-01-04 09:52:58,306][134294] Updated weights for policy 0, policy_version 165884 (0.0013) [2025-01-04 09:52:58,967][134211] Fps is (10 sec: 17613.3, 60 sec: 13858.9, 300 sec: 13579.3). Total num frames: 679473152. Throughput: 0: 3266.0. Samples: 159038088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:52:58,968][134211] Avg episode reward: [(0, '9.763')] [2025-01-04 09:53:00,737][134294] Updated weights for policy 0, policy_version 165894 (0.0018) [2025-01-04 09:53:03,968][134211] Fps is (10 sec: 16383.7, 60 sec: 13243.7, 300 sec: 13523.7). Total num frames: 679534592. Throughput: 0: 3343.0. Samples: 159049976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:03,968][134211] Avg episode reward: [(0, '8.838')] [2025-01-04 09:53:04,500][134294] Updated weights for policy 0, policy_version 165904 (0.0032) [2025-01-04 09:53:08,081][134294] Updated weights for policy 0, policy_version 165914 (0.0031) [2025-01-04 09:53:08,968][134211] Fps is (10 sec: 11877.3, 60 sec: 12765.9, 300 sec: 13482.1). Total num frames: 679591936. Throughput: 0: 3295.2. Samples: 159066406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:08,969][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 09:53:11,780][134294] Updated weights for policy 0, policy_version 165924 (0.0027) [2025-01-04 09:53:13,969][134211] Fps is (10 sec: 11058.2, 60 sec: 12765.7, 300 sec: 13454.3). Total num frames: 679645184. Throughput: 0: 3243.3. Samples: 159083002. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:13,969][134211] Avg episode reward: [(0, '8.604')] [2025-01-04 09:53:13,987][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165929_679645184.pth... [2025-01-04 09:53:14,120][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165141_676417536.pth [2025-01-04 09:53:15,692][134294] Updated weights for policy 0, policy_version 165934 (0.0029) [2025-01-04 09:53:18,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12765.8, 300 sec: 13412.6). Total num frames: 679702528. Throughput: 0: 3212.4. Samples: 159090878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:18,969][134211] Avg episode reward: [(0, '9.215')] [2025-01-04 09:53:18,973][134294] Updated weights for policy 0, policy_version 165944 (0.0025) [2025-01-04 09:53:22,099][134294] Updated weights for policy 0, policy_version 165954 (0.0029) [2025-01-04 09:53:23,968][134211] Fps is (10 sec: 12698.6, 60 sec: 12902.5, 300 sec: 13426.6). Total num frames: 679772160. Throughput: 0: 3243.4. Samples: 159110388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:23,968][134211] Avg episode reward: [(0, '9.315')] [2025-01-04 09:53:25,246][134294] Updated weights for policy 0, policy_version 165964 (0.0025) [2025-01-04 09:53:28,424][134294] Updated weights for policy 0, policy_version 165974 (0.0026) [2025-01-04 09:53:28,968][134211] Fps is (10 sec: 13107.7, 60 sec: 12834.1, 300 sec: 13412.7). Total num frames: 679833600. Throughput: 0: 3289.2. Samples: 159129850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:28,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 09:53:31,639][134294] Updated weights for policy 0, policy_version 165984 (0.0026) [2025-01-04 09:53:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12834.8, 300 sec: 13287.7). Total num frames: 679895040. 
Throughput: 0: 3318.7. Samples: 159139398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:33,968][134211] Avg episode reward: [(0, '9.983')] [2025-01-04 09:53:35,304][134294] Updated weights for policy 0, policy_version 165994 (0.0028) [2025-01-04 09:53:38,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12697.6, 300 sec: 13135.0). Total num frames: 679944192. Throughput: 0: 3280.1. Samples: 159155698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:38,969][134211] Avg episode reward: [(0, '9.777')] [2025-01-04 09:53:39,771][134294] Updated weights for policy 0, policy_version 166004 (0.0031) [2025-01-04 09:53:42,671][134294] Updated weights for policy 0, policy_version 166014 (0.0016) [2025-01-04 09:53:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12970.7, 300 sec: 13204.4). Total num frames: 680017920. Throughput: 0: 3021.1. Samples: 159174040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:43,968][134211] Avg episode reward: [(0, '10.142')] [2025-01-04 09:53:44,875][134294] Updated weights for policy 0, policy_version 166024 (0.0015) [2025-01-04 09:53:47,116][134294] Updated weights for policy 0, policy_version 166034 (0.0016) [2025-01-04 09:53:48,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13243.7, 300 sec: 13273.8). Total num frames: 680091648. Throughput: 0: 3060.8. Samples: 159187712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:48,969][134211] Avg episode reward: [(0, '9.512')] [2025-01-04 09:53:50,929][134294] Updated weights for policy 0, policy_version 166044 (0.0032) [2025-01-04 09:53:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12902.4, 300 sec: 13259.9). Total num frames: 680144896. Throughput: 0: 3090.8. Samples: 159205492. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:53,968][134211] Avg episode reward: [(0, '9.932')] [2025-01-04 09:53:54,417][134294] Updated weights for policy 0, policy_version 166054 (0.0027) [2025-01-04 09:53:56,875][134294] Updated weights for policy 0, policy_version 166064 (0.0017) [2025-01-04 09:53:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12424.5, 300 sec: 13287.7). Total num frames: 680218624. Throughput: 0: 3186.6. Samples: 159226398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:53:58,968][134211] Avg episode reward: [(0, '9.788')] [2025-01-04 09:54:00,682][134294] Updated weights for policy 0, policy_version 166074 (0.0035) [2025-01-04 09:54:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12288.0, 300 sec: 13259.9). Total num frames: 680271872. Throughput: 0: 3182.3. Samples: 159234078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:03,968][134211] Avg episode reward: [(0, '9.590')] [2025-01-04 09:54:04,798][134294] Updated weights for policy 0, policy_version 166084 (0.0033) [2025-01-04 09:54:07,758][134294] Updated weights for policy 0, policy_version 166094 (0.0023) [2025-01-04 09:54:08,967][134211] Fps is (10 sec: 12288.3, 60 sec: 12493.0, 300 sec: 13315.5). Total num frames: 680341504. Throughput: 0: 3123.9. Samples: 159250962. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:08,968][134211] Avg episode reward: [(0, '9.067')] [2025-01-04 09:54:10,065][134294] Updated weights for policy 0, policy_version 166104 (0.0014) [2025-01-04 09:54:12,291][134294] Updated weights for policy 0, policy_version 166114 (0.0014) [2025-01-04 09:54:13,968][134211] Fps is (10 sec: 15974.6, 60 sec: 13107.4, 300 sec: 13273.9). Total num frames: 680431616. Throughput: 0: 3288.2. Samples: 159277816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:13,968][134211] Avg episode reward: [(0, '8.566')] [2025-01-04 09:54:14,519][134294] Updated weights for policy 0, policy_version 166124 (0.0015) [2025-01-04 09:54:17,228][134294] Updated weights for policy 0, policy_version 166134 (0.0020) [2025-01-04 09:54:18,968][134211] Fps is (10 sec: 15564.2, 60 sec: 13243.8, 300 sec: 13121.1). Total num frames: 680497152. Throughput: 0: 3374.9. Samples: 159291268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:18,969][134211] Avg episode reward: [(0, '8.436')] [2025-01-04 09:54:21,431][134294] Updated weights for policy 0, policy_version 166144 (0.0033) [2025-01-04 09:54:23,968][134211] Fps is (10 sec: 11878.1, 60 sec: 12970.7, 300 sec: 13037.8). Total num frames: 680550400. Throughput: 0: 3348.4. Samples: 159306376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:23,969][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 09:54:25,642][134294] Updated weights for policy 0, policy_version 166154 (0.0036) [2025-01-04 09:54:28,968][134211] Fps is (10 sec: 10240.1, 60 sec: 12765.9, 300 sec: 13010.0). Total num frames: 680599552. Throughput: 0: 3272.3. Samples: 159321294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:28,969][134211] Avg episode reward: [(0, '9.106')] [2025-01-04 09:54:29,765][134294] Updated weights for policy 0, policy_version 166164 (0.0034) [2025-01-04 09:54:33,366][134294] Updated weights for policy 0, policy_version 166174 (0.0033) [2025-01-04 09:54:33,968][134211] Fps is (10 sec: 10239.8, 60 sec: 12629.3, 300 sec: 12982.2). Total num frames: 680652800. Throughput: 0: 3139.6. Samples: 159328996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:54:33,969][134211] Avg episode reward: [(0, '8.411')] [2025-01-04 09:54:37,052][134294] Updated weights for policy 0, policy_version 166184 (0.0034) [2025-01-04 09:54:38,968][134211] Fps is (10 sec: 10649.1, 60 sec: 12697.5, 300 sec: 12954.4). Total num frames: 680706048. Throughput: 0: 3117.0. Samples: 159345760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:54:38,970][134211] Avg episode reward: [(0, '8.675')] [2025-01-04 09:54:40,480][134294] Updated weights for policy 0, policy_version 166194 (0.0025) [2025-01-04 09:54:42,662][134294] Updated weights for policy 0, policy_version 166204 (0.0014) [2025-01-04 09:54:43,968][134211] Fps is (10 sec: 13927.0, 60 sec: 12902.4, 300 sec: 13037.8). Total num frames: 680792064. Throughput: 0: 3155.4. Samples: 159368392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:54:43,968][134211] Avg episode reward: [(0, '8.291')] [2025-01-04 09:54:44,937][134294] Updated weights for policy 0, policy_version 166214 (0.0016) [2025-01-04 09:54:47,156][134294] Updated weights for policy 0, policy_version 166224 (0.0012) [2025-01-04 09:54:48,968][134211] Fps is (10 sec: 17613.7, 60 sec: 13175.5, 300 sec: 13121.1). Total num frames: 680882176. Throughput: 0: 3292.4. Samples: 159382234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:54:48,968][134211] Avg episode reward: [(0, '8.663')] [2025-01-04 09:54:49,932][134294] Updated weights for policy 0, policy_version 166234 (0.0021) [2025-01-04 09:54:53,968][134211] Fps is (10 sec: 13925.8, 60 sec: 13107.1, 300 sec: 13079.4). Total num frames: 680931328. Throughput: 0: 3367.5. Samples: 159402502. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:54:53,969][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 09:54:53,986][134294] Updated weights for policy 0, policy_version 166244 (0.0035) [2025-01-04 09:54:58,034][134294] Updated weights for policy 0, policy_version 166254 (0.0032) [2025-01-04 09:54:58,968][134211] Fps is (10 sec: 10240.0, 60 sec: 12765.9, 300 sec: 13051.7). Total num frames: 680984576. Throughput: 0: 3107.7. Samples: 159417662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:54:58,968][134211] Avg episode reward: [(0, '9.250')] [2025-01-04 09:55:01,904][134294] Updated weights for policy 0, policy_version 166264 (0.0033) [2025-01-04 09:55:03,968][134211] Fps is (10 sec: 10649.9, 60 sec: 12765.9, 300 sec: 13037.8). Total num frames: 681037824. Throughput: 0: 2988.9. Samples: 159425768. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:03,968][134211] Avg episode reward: [(0, '9.277')] [2025-01-04 09:55:05,938][134294] Updated weights for policy 0, policy_version 166274 (0.0029) [2025-01-04 09:55:08,813][134294] Updated weights for policy 0, policy_version 166284 (0.0019) [2025-01-04 09:55:08,967][134211] Fps is (10 sec: 11469.0, 60 sec: 12629.3, 300 sec: 13051.7). Total num frames: 681099264. Throughput: 0: 2993.3. Samples: 159441072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:08,968][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 09:55:11,003][134294] Updated weights for policy 0, policy_version 166294 (0.0014) [2025-01-04 09:55:13,098][134294] Updated weights for policy 0, policy_version 166304 (0.0015) [2025-01-04 09:55:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 12697.6, 300 sec: 13176.6). Total num frames: 681193472. Throughput: 0: 3275.0. Samples: 159468668. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:13,968][134211] Avg episode reward: [(0, '8.310')] [2025-01-04 09:55:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000166308_681197568.pth... [2025-01-04 09:55:14,096][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165538_678043648.pth [2025-01-04 09:55:15,518][134294] Updated weights for policy 0, policy_version 166314 (0.0014) [2025-01-04 09:55:18,133][134294] Updated weights for policy 0, policy_version 166324 (0.0018) [2025-01-04 09:55:18,968][134211] Fps is (10 sec: 17202.7, 60 sec: 12902.4, 300 sec: 13204.4). Total num frames: 681271296. Throughput: 0: 3392.2. Samples: 159481646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:18,969][134211] Avg episode reward: [(0, '8.454')] [2025-01-04 09:55:22,749][134294] Updated weights for policy 0, policy_version 166334 (0.0032) [2025-01-04 09:55:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 12765.9, 300 sec: 13135.0). Total num frames: 681316352. Throughput: 0: 3375.1. Samples: 159497638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:23,969][134211] Avg episode reward: [(0, '9.082')] [2025-01-04 09:55:26,396][134294] Updated weights for policy 0, policy_version 166344 (0.0029) [2025-01-04 09:55:28,968][134211] Fps is (10 sec: 10240.0, 60 sec: 12902.4, 300 sec: 13107.2). Total num frames: 681373696. Throughput: 0: 3244.2. Samples: 159514384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:28,969][134211] Avg episode reward: [(0, '9.627')] [2025-01-04 09:55:30,024][134294] Updated weights for policy 0, policy_version 166354 (0.0027) [2025-01-04 09:55:33,658][134294] Updated weights for policy 0, policy_version 166364 (0.0026) [2025-01-04 09:55:33,969][134211] Fps is (10 sec: 11058.2, 60 sec: 12902.2, 300 sec: 13065.5). Total num frames: 681426944. 
Throughput: 0: 3130.2. Samples: 159523094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:33,969][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 09:55:37,242][134294] Updated weights for policy 0, policy_version 166374 (0.0029) [2025-01-04 09:55:38,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12970.8, 300 sec: 13023.9). Total num frames: 681484288. Throughput: 0: 3055.2. Samples: 159539986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 09:55:38,969][134211] Avg episode reward: [(0, '9.355')] [2025-01-04 09:55:40,960][134294] Updated weights for policy 0, policy_version 166384 (0.0029) [2025-01-04 09:55:43,968][134211] Fps is (10 sec: 11469.8, 60 sec: 12492.7, 300 sec: 12940.6). Total num frames: 681541632. Throughput: 0: 3084.7. Samples: 159556474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:55:43,969][134211] Avg episode reward: [(0, '9.480')] [2025-01-04 09:55:44,910][134294] Updated weights for policy 0, policy_version 166394 (0.0033) [2025-01-04 09:55:48,600][134294] Updated weights for policy 0, policy_version 166404 (0.0029) [2025-01-04 09:55:48,968][134211] Fps is (10 sec: 11059.2, 60 sec: 11878.4, 300 sec: 12912.8). Total num frames: 681594880. Throughput: 0: 3070.9. Samples: 159563960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:55:48,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 09:55:50,748][134294] Updated weights for policy 0, policy_version 166414 (0.0014) [2025-01-04 09:55:52,712][134294] Updated weights for policy 0, policy_version 166424 (0.0014) [2025-01-04 09:55:53,967][134211] Fps is (10 sec: 15565.4, 60 sec: 12766.0, 300 sec: 13037.8). Total num frames: 681697280. Throughput: 0: 3283.3. Samples: 159588820. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:55:53,968][134211] Avg episode reward: [(0, '9.544')] [2025-01-04 09:55:54,756][134294] Updated weights for policy 0, policy_version 166434 (0.0015) [2025-01-04 09:55:56,713][134294] Updated weights for policy 0, policy_version 166444 (0.0015) [2025-01-04 09:55:58,968][134211] Fps is (10 sec: 19251.2, 60 sec: 13380.2, 300 sec: 13121.1). Total num frames: 681787392. Throughput: 0: 3314.6. Samples: 159617826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:55:58,969][134211] Avg episode reward: [(0, '8.327')] [2025-01-04 09:55:59,701][134294] Updated weights for policy 0, policy_version 166454 (0.0028) [2025-01-04 09:56:03,667][134294] Updated weights for policy 0, policy_version 166464 (0.0032) [2025-01-04 09:56:03,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13312.0, 300 sec: 13065.6). Total num frames: 681836544. Throughput: 0: 3206.4. Samples: 159625934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:03,968][134211] Avg episode reward: [(0, '8.111')] [2025-01-04 09:56:07,840][134294] Updated weights for policy 0, policy_version 166474 (0.0035) [2025-01-04 09:56:08,968][134211] Fps is (10 sec: 9830.4, 60 sec: 13107.1, 300 sec: 13023.9). Total num frames: 681885696. Throughput: 0: 3183.5. Samples: 159640896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:08,969][134211] Avg episode reward: [(0, '8.628')] [2025-01-04 09:56:12,000][134294] Updated weights for policy 0, policy_version 166484 (0.0032) [2025-01-04 09:56:13,969][134211] Fps is (10 sec: 9829.3, 60 sec: 12356.0, 300 sec: 12982.2). Total num frames: 681934848. Throughput: 0: 3131.2. Samples: 159655290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:13,970][134211] Avg episode reward: [(0, '9.770')] [2025-01-04 09:56:16,460][134294] Updated weights for policy 0, policy_version 166494 (0.0037) [2025-01-04 09:56:18,968][134211] Fps is (10 sec: 9420.9, 60 sec: 11810.1, 300 sec: 12885.0). 
Total num frames: 681979904. Throughput: 0: 3098.0. Samples: 159662500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:18,969][134211] Avg episode reward: [(0, '9.900')] [2025-01-04 09:56:20,430][134294] Updated weights for policy 0, policy_version 166504 (0.0029) [2025-01-04 09:56:22,495][134294] Updated weights for policy 0, policy_version 166514 (0.0013) [2025-01-04 09:56:23,968][134211] Fps is (10 sec: 13518.5, 60 sec: 12561.1, 300 sec: 12940.6). Total num frames: 682070016. Throughput: 0: 3159.1. Samples: 159682146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:23,968][134211] Avg episode reward: [(0, '9.178')] [2025-01-04 09:56:24,658][134294] Updated weights for policy 0, policy_version 166524 (0.0015) [2025-01-04 09:56:28,243][134294] Updated weights for policy 0, policy_version 166534 (0.0031) [2025-01-04 09:56:28,968][134211] Fps is (10 sec: 14745.7, 60 sec: 12561.1, 300 sec: 12912.8). Total num frames: 682127360. Throughput: 0: 3275.0. Samples: 159703848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:28,968][134211] Avg episode reward: [(0, '9.087')] [2025-01-04 09:56:31,865][134294] Updated weights for policy 0, policy_version 166544 (0.0030) [2025-01-04 09:56:33,968][134211] Fps is (10 sec: 11468.5, 60 sec: 12629.5, 300 sec: 12885.1). Total num frames: 682184704. Throughput: 0: 3292.9. Samples: 159712140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:33,969][134211] Avg episode reward: [(0, '8.544')] [2025-01-04 09:56:35,781][134294] Updated weights for policy 0, policy_version 166554 (0.0030) [2025-01-04 09:56:38,969][134211] Fps is (10 sec: 11058.3, 60 sec: 12560.9, 300 sec: 12801.7). Total num frames: 682237952. Throughput: 0: 3107.5. Samples: 159728662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:38,969][134211] Avg episode reward: [(0, '8.484')] [2025-01-04 09:56:39,421][134294] Updated weights for policy 0, policy_version 166564 (0.0031) [2025-01-04 09:56:42,114][134294] Updated weights for policy 0, policy_version 166574 (0.0019) [2025-01-04 09:56:43,971][134211] Fps is (10 sec: 13103.4, 60 sec: 12901.8, 300 sec: 12857.1). Total num frames: 682315776. Throughput: 0: 2930.0. Samples: 159749686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 09:56:43,972][134211] Avg episode reward: [(0, '8.380')] [2025-01-04 09:56:44,391][134294] Updated weights for policy 0, policy_version 166584 (0.0018) [2025-01-04 09:56:46,642][134294] Updated weights for policy 0, policy_version 166594 (0.0015) [2025-01-04 09:56:48,780][134294] Updated weights for policy 0, policy_version 166604 (0.0014) [2025-01-04 09:56:48,968][134211] Fps is (10 sec: 17204.8, 60 sec: 13585.1, 300 sec: 12968.4). Total num frames: 682409984. Throughput: 0: 3057.5. Samples: 159763522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:56:48,968][134211] Avg episode reward: [(0, '8.975')] [2025-01-04 09:56:52,218][134294] Updated weights for policy 0, policy_version 166614 (0.0030) [2025-01-04 09:56:53,968][134211] Fps is (10 sec: 14749.7, 60 sec: 12765.8, 300 sec: 12954.6). Total num frames: 682463232. Throughput: 0: 3214.7. Samples: 159785556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:56:53,969][134211] Avg episode reward: [(0, '9.845')] [2025-01-04 09:56:56,403][134294] Updated weights for policy 0, policy_version 166624 (0.0035) [2025-01-04 09:56:58,968][134211] Fps is (10 sec: 10649.3, 60 sec: 12151.5, 300 sec: 12801.7). Total num frames: 682516480. Throughput: 0: 3230.3. Samples: 159800652. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:56:58,969][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 09:57:00,323][134294] Updated weights for policy 0, policy_version 166634 (0.0037) [2025-01-04 09:57:03,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12219.7, 300 sec: 12690.7). Total num frames: 682569728. Throughput: 0: 3245.4. Samples: 159808544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:03,969][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 09:57:04,070][134294] Updated weights for policy 0, policy_version 166644 (0.0034) [2025-01-04 09:57:07,626][134294] Updated weights for policy 0, policy_version 166654 (0.0032) [2025-01-04 09:57:08,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12356.3, 300 sec: 12704.5). Total num frames: 682627072. Throughput: 0: 3187.5. Samples: 159825584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:08,968][134211] Avg episode reward: [(0, '8.036')] [2025-01-04 09:57:11,174][134294] Updated weights for policy 0, policy_version 166664 (0.0028) [2025-01-04 09:57:13,968][134211] Fps is (10 sec: 11468.9, 60 sec: 12493.0, 300 sec: 12704.5). Total num frames: 682684416. Throughput: 0: 3090.7. Samples: 159842928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:13,968][134211] Avg episode reward: [(0, '10.218')] [2025-01-04 09:57:13,991][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000166671_682684416.pth... [2025-01-04 09:57:14,104][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000165929_679645184.pth [2025-01-04 09:57:14,991][134294] Updated weights for policy 0, policy_version 166674 (0.0030) [2025-01-04 09:57:18,720][134294] Updated weights for policy 0, policy_version 166684 (0.0026) [2025-01-04 09:57:18,967][134211] Fps is (10 sec: 11059.5, 60 sec: 12629.4, 300 sec: 12676.8). Total num frames: 682737664. 
Throughput: 0: 3068.3. Samples: 159850212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:18,968][134211] Avg episode reward: [(0, '9.277')] [2025-01-04 09:57:20,886][134294] Updated weights for policy 0, policy_version 166694 (0.0014) [2025-01-04 09:57:22,868][134294] Updated weights for policy 0, policy_version 166704 (0.0014) [2025-01-04 09:57:23,968][134211] Fps is (10 sec: 15565.1, 60 sec: 12834.1, 300 sec: 12801.7). Total num frames: 682840064. Throughput: 0: 3245.7. Samples: 159874714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:23,968][134211] Avg episode reward: [(0, '9.465')] [2025-01-04 09:57:24,898][134294] Updated weights for policy 0, policy_version 166714 (0.0013) [2025-01-04 09:57:27,711][134294] Updated weights for policy 0, policy_version 166724 (0.0025) [2025-01-04 09:57:28,968][134211] Fps is (10 sec: 17612.2, 60 sec: 13107.2, 300 sec: 12843.5). Total num frames: 682913792. Throughput: 0: 3333.1. Samples: 159899666. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:28,969][134211] Avg episode reward: [(0, '9.802')] [2025-01-04 09:57:31,567][134294] Updated weights for policy 0, policy_version 166734 (0.0032) [2025-01-04 09:57:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 12815.6). Total num frames: 682962944. Throughput: 0: 3204.9. Samples: 159907742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:33,968][134211] Avg episode reward: [(0, '8.837')] [2025-01-04 09:57:35,586][134294] Updated weights for policy 0, policy_version 166744 (0.0033) [2025-01-04 09:57:38,968][134211] Fps is (10 sec: 10240.0, 60 sec: 12970.8, 300 sec: 12801.7). Total num frames: 683016192. Throughput: 0: 3056.0. Samples: 159923074. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:38,969][134211] Avg episode reward: [(0, '8.908')] [2025-01-04 09:57:39,525][134294] Updated weights for policy 0, policy_version 166754 (0.0030) [2025-01-04 09:57:43,132][134294] Updated weights for policy 0, policy_version 166764 (0.0030) [2025-01-04 09:57:43,968][134211] Fps is (10 sec: 11059.1, 60 sec: 12630.0, 300 sec: 12801.7). Total num frames: 683073536. Throughput: 0: 3090.5. Samples: 159939724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 09:57:43,969][134211] Avg episode reward: [(0, '9.777')] [2025-01-04 09:57:46,143][134294] Updated weights for policy 0, policy_version 166774 (0.0024) [2025-01-04 09:57:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12151.4, 300 sec: 12774.0). Total num frames: 683139072. Throughput: 0: 3144.1. Samples: 159950030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:57:48,968][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 09:57:49,387][134294] Updated weights for policy 0, policy_version 166784 (0.0026) [2025-01-04 09:57:52,966][134294] Updated weights for policy 0, policy_version 166794 (0.0031) [2025-01-04 09:57:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12219.8, 300 sec: 12621.2). Total num frames: 683196416. Throughput: 0: 3164.8. Samples: 159968000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:57:53,969][134211] Avg episode reward: [(0, '9.320')] [2025-01-04 09:57:56,438][134294] Updated weights for policy 0, policy_version 166804 (0.0027) [2025-01-04 09:57:58,465][134294] Updated weights for policy 0, policy_version 166814 (0.0015) [2025-01-04 09:57:58,967][134211] Fps is (10 sec: 13926.8, 60 sec: 12697.7, 300 sec: 12690.7). Total num frames: 683278336. Throughput: 0: 3250.9. Samples: 159989216. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:57:58,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 09:58:00,830][134294] Updated weights for policy 0, policy_version 166824 (0.0019) [2025-01-04 09:58:03,968][134211] Fps is (10 sec: 14745.5, 60 sec: 12902.4, 300 sec: 12718.4). Total num frames: 683343872. Throughput: 0: 3374.7. Samples: 160002076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:03,969][134211] Avg episode reward: [(0, '9.280')] [2025-01-04 09:58:04,416][134294] Updated weights for policy 0, policy_version 166834 (0.0031) [2025-01-04 09:58:08,653][134294] Updated weights for policy 0, policy_version 166844 (0.0035) [2025-01-04 09:58:08,968][134211] Fps is (10 sec: 11468.5, 60 sec: 12765.9, 300 sec: 12704.6). Total num frames: 683393024. Throughput: 0: 3185.9. Samples: 160018082. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:08,968][134211] Avg episode reward: [(0, '8.844')] [2025-01-04 09:58:12,618][134294] Updated weights for policy 0, policy_version 166854 (0.0032) [2025-01-04 09:58:13,968][134211] Fps is (10 sec: 11059.5, 60 sec: 12834.2, 300 sec: 12718.5). Total num frames: 683454464. Throughput: 0: 2982.6. Samples: 160033884. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:13,968][134211] Avg episode reward: [(0, '9.007')] [2025-01-04 09:58:15,018][134294] Updated weights for policy 0, policy_version 166864 (0.0012) [2025-01-04 09:58:18,024][134294] Updated weights for policy 0, policy_version 166874 (0.0023) [2025-01-04 09:58:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13107.1, 300 sec: 12718.4). Total num frames: 683524096. Throughput: 0: 3093.0. Samples: 160046928. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:18,968][134211] Avg episode reward: [(0, '9.025')] [2025-01-04 09:58:21,954][134294] Updated weights for policy 0, policy_version 166884 (0.0036) [2025-01-04 09:58:23,968][134211] Fps is (10 sec: 12287.7, 60 sec: 12287.9, 300 sec: 12690.7). Total num frames: 683577344. Throughput: 0: 3107.7. Samples: 160062920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:23,969][134211] Avg episode reward: [(0, '8.706')] [2025-01-04 09:58:25,342][134294] Updated weights for policy 0, policy_version 166894 (0.0025) [2025-01-04 09:58:27,416][134294] Updated weights for policy 0, policy_version 166904 (0.0014) [2025-01-04 09:58:28,968][134211] Fps is (10 sec: 14336.2, 60 sec: 12561.1, 300 sec: 12787.9). Total num frames: 683667456. Throughput: 0: 3261.7. Samples: 160086498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:28,968][134211] Avg episode reward: [(0, '8.974')] [2025-01-04 09:58:29,794][134294] Updated weights for policy 0, policy_version 166914 (0.0020) [2025-01-04 09:58:33,609][134294] Updated weights for policy 0, policy_version 166924 (0.0031) [2025-01-04 09:58:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 12629.3, 300 sec: 12801.7). Total num frames: 683720704. Throughput: 0: 3270.8. Samples: 160097216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:33,969][134211] Avg episode reward: [(0, '8.308')] [2025-01-04 09:58:37,549][134294] Updated weights for policy 0, policy_version 166934 (0.0032) [2025-01-04 09:58:38,968][134211] Fps is (10 sec: 10648.9, 60 sec: 12629.2, 300 sec: 12732.3). Total num frames: 683773952. Throughput: 0: 3212.6. Samples: 160112570. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:38,969][134211] Avg episode reward: [(0, '9.379')] [2025-01-04 09:58:41,499][134294] Updated weights for policy 0, policy_version 166944 (0.0033) [2025-01-04 09:58:43,968][134211] Fps is (10 sec: 10240.1, 60 sec: 12492.8, 300 sec: 12649.0). Total num frames: 683823104. Throughput: 0: 3080.5. Samples: 160127838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:43,969][134211] Avg episode reward: [(0, '8.116')] [2025-01-04 09:58:45,129][134294] Updated weights for policy 0, policy_version 166954 (0.0026) [2025-01-04 09:58:47,434][134294] Updated weights for policy 0, policy_version 166964 (0.0012) [2025-01-04 09:58:48,968][134211] Fps is (10 sec: 13517.9, 60 sec: 12834.2, 300 sec: 12760.1). Total num frames: 683909120. Throughput: 0: 3041.4. Samples: 160138938. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:58:48,968][134211] Avg episode reward: [(0, '9.600')] [2025-01-04 09:58:49,647][134294] Updated weights for policy 0, policy_version 166974 (0.0013) [2025-01-04 09:58:51,736][134294] Updated weights for policy 0, policy_version 166984 (0.0016) [2025-01-04 09:58:53,874][134294] Updated weights for policy 0, policy_version 166994 (0.0016) [2025-01-04 09:58:53,968][134211] Fps is (10 sec: 18432.5, 60 sec: 13516.8, 300 sec: 12843.4). Total num frames: 684007424. Throughput: 0: 3305.7. Samples: 160166840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:58:53,968][134211] Avg episode reward: [(0, '10.618')] [2025-01-04 09:58:57,134][134294] Updated weights for policy 0, policy_version 167004 (0.0028) [2025-01-04 09:58:58,968][134211] Fps is (10 sec: 15564.2, 60 sec: 13107.1, 300 sec: 12857.3). Total num frames: 684064768. Throughput: 0: 3417.7. Samples: 160187682. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:58:58,969][134211] Avg episode reward: [(0, '8.531')] [2025-01-04 09:59:01,321][134294] Updated weights for policy 0, policy_version 167014 (0.0037) [2025-01-04 09:59:03,968][134211] Fps is (10 sec: 10239.9, 60 sec: 12765.9, 300 sec: 12774.0). Total num frames: 684109824. Throughput: 0: 3295.8. Samples: 160195238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:03,969][134211] Avg episode reward: [(0, '8.809')] [2025-01-04 09:59:05,670][134294] Updated weights for policy 0, policy_version 167024 (0.0036) [2025-01-04 09:59:08,968][134211] Fps is (10 sec: 10240.2, 60 sec: 12902.4, 300 sec: 12662.9). Total num frames: 684167168. Throughput: 0: 3267.3. Samples: 160209950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:08,968][134211] Avg episode reward: [(0, '10.121')] [2025-01-04 09:59:09,352][134294] Updated weights for policy 0, policy_version 167034 (0.0029) [2025-01-04 09:59:12,993][134294] Updated weights for policy 0, policy_version 167044 (0.0032) [2025-01-04 09:59:13,968][134211] Fps is (10 sec: 11059.0, 60 sec: 12765.8, 300 sec: 12621.2). Total num frames: 684220416. Throughput: 0: 3111.6. Samples: 160226520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:13,969][134211] Avg episode reward: [(0, '9.141')] [2025-01-04 09:59:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167046_684220416.pth... [2025-01-04 09:59:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000166308_681197568.pth [2025-01-04 09:59:16,542][134294] Updated weights for policy 0, policy_version 167054 (0.0028) [2025-01-04 09:59:18,797][134294] Updated weights for policy 0, policy_version 167064 (0.0015) [2025-01-04 09:59:18,967][134211] Fps is (10 sec: 12697.9, 60 sec: 12834.2, 300 sec: 12690.7). Total num frames: 684294144. 
Throughput: 0: 3057.9. Samples: 160234822. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:18,968][134211] Avg episode reward: [(0, '8.482')] [2025-01-04 09:59:21,784][134294] Updated weights for policy 0, policy_version 167074 (0.0024) [2025-01-04 09:59:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12970.7, 300 sec: 12732.3). Total num frames: 684355584. Throughput: 0: 3231.5. Samples: 160257986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:23,969][134211] Avg episode reward: [(0, '9.539')] [2025-01-04 09:59:25,697][134294] Updated weights for policy 0, policy_version 167084 (0.0033) [2025-01-04 09:59:28,969][134211] Fps is (10 sec: 11467.4, 60 sec: 12356.0, 300 sec: 12732.3). Total num frames: 684408832. Throughput: 0: 3240.3. Samples: 160273656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:28,970][134211] Avg episode reward: [(0, '9.032')] [2025-01-04 09:59:29,608][134294] Updated weights for policy 0, policy_version 167094 (0.0035) [2025-01-04 09:59:33,100][134294] Updated weights for policy 0, policy_version 167104 (0.0031) [2025-01-04 09:59:33,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12424.5, 300 sec: 12746.2). Total num frames: 684466176. Throughput: 0: 3180.5. Samples: 160282064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:33,969][134211] Avg episode reward: [(0, '9.315')] [2025-01-04 09:59:36,241][134294] Updated weights for policy 0, policy_version 167114 (0.0022) [2025-01-04 09:59:38,826][134294] Updated weights for policy 0, policy_version 167124 (0.0021) [2025-01-04 09:59:38,968][134211] Fps is (10 sec: 13108.5, 60 sec: 12766.0, 300 sec: 12704.5). Total num frames: 684539904. Throughput: 0: 3010.0. Samples: 160302292. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:38,969][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 09:59:42,897][134294] Updated weights for policy 0, policy_version 167134 (0.0030) [2025-01-04 09:59:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12765.9, 300 sec: 12565.7). Total num frames: 684589056. Throughput: 0: 2923.7. Samples: 160319246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:43,968][134211] Avg episode reward: [(0, '8.832')] [2025-01-04 09:59:46,275][134294] Updated weights for policy 0, policy_version 167144 (0.0023) [2025-01-04 09:59:48,590][134294] Updated weights for policy 0, policy_version 167154 (0.0013) [2025-01-04 09:59:48,968][134211] Fps is (10 sec: 12697.2, 60 sec: 12629.2, 300 sec: 12662.9). Total num frames: 684666880. Throughput: 0: 2956.1. Samples: 160328262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:48,969][134211] Avg episode reward: [(0, '10.572')] [2025-01-04 09:59:50,723][134294] Updated weights for policy 0, policy_version 167164 (0.0016) [2025-01-04 09:59:52,779][134294] Updated weights for policy 0, policy_version 167174 (0.0014) [2025-01-04 09:59:53,968][134211] Fps is (10 sec: 17613.0, 60 sec: 12629.3, 300 sec: 12815.6). Total num frames: 684765184. Throughput: 0: 3253.3. Samples: 160356350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 09:59:53,968][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 09:59:55,281][134294] Updated weights for policy 0, policy_version 167184 (0.0020) [2025-01-04 09:59:58,968][134211] Fps is (10 sec: 15565.4, 60 sec: 12629.4, 300 sec: 12829.5). Total num frames: 684822528. Throughput: 0: 3344.7. Samples: 160377032. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 09:59:58,968][134211] Avg episode reward: [(0, '10.631')] [2025-01-04 09:59:59,378][134294] Updated weights for policy 0, policy_version 167194 (0.0036) [2025-01-04 10:00:03,233][134294] Updated weights for policy 0, policy_version 167204 (0.0031) [2025-01-04 10:00:03,968][134211] Fps is (10 sec: 10649.3, 60 sec: 12697.6, 300 sec: 12787.8). Total num frames: 684871680. Throughput: 0: 3329.9. Samples: 160384670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:03,969][134211] Avg episode reward: [(0, '8.953')] [2025-01-04 10:00:07,373][134294] Updated weights for policy 0, policy_version 167214 (0.0033) [2025-01-04 10:00:08,968][134211] Fps is (10 sec: 9830.5, 60 sec: 12561.1, 300 sec: 12635.1). Total num frames: 684920832. Throughput: 0: 3152.5. Samples: 160399846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:08,968][134211] Avg episode reward: [(0, '8.808')] [2025-01-04 10:00:11,533][134294] Updated weights for policy 0, policy_version 167224 (0.0031) [2025-01-04 10:00:13,971][134211] Fps is (10 sec: 9827.4, 60 sec: 12492.2, 300 sec: 12537.8). Total num frames: 684969984. Throughput: 0: 3128.2. Samples: 160414434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:13,972][134211] Avg episode reward: [(0, '10.128')] [2025-01-04 10:00:15,831][134294] Updated weights for policy 0, policy_version 167234 (0.0039) [2025-01-04 10:00:18,805][134294] Updated weights for policy 0, policy_version 167244 (0.0018) [2025-01-04 10:00:18,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12288.0, 300 sec: 12593.5). Total num frames: 685031424. Throughput: 0: 3099.2. Samples: 160421526. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:18,968][134211] Avg episode reward: [(0, '9.961')] [2025-01-04 10:00:22,098][134294] Updated weights for policy 0, policy_version 167254 (0.0028) [2025-01-04 10:00:23,968][134211] Fps is (10 sec: 12291.8, 60 sec: 12288.0, 300 sec: 12607.3). Total num frames: 685092864. Throughput: 0: 3095.9. Samples: 160441608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:23,969][134211] Avg episode reward: [(0, '8.967')] [2025-01-04 10:00:25,616][134294] Updated weights for policy 0, policy_version 167264 (0.0029) [2025-01-04 10:00:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12356.5, 300 sec: 12621.3). Total num frames: 685150208. Throughput: 0: 3110.7. Samples: 160459226. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:28,968][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 10:00:29,036][134294] Updated weights for policy 0, policy_version 167274 (0.0028) [2025-01-04 10:00:32,444][134294] Updated weights for policy 0, policy_version 167284 (0.0027) [2025-01-04 10:00:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12424.6, 300 sec: 12635.1). Total num frames: 685211648. Throughput: 0: 3109.7. Samples: 160468196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:33,968][134211] Avg episode reward: [(0, '9.400')] [2025-01-04 10:00:35,107][134294] Updated weights for policy 0, policy_version 167294 (0.0019) [2025-01-04 10:00:37,229][134294] Updated weights for policy 0, policy_version 167304 (0.0014) [2025-01-04 10:00:38,968][134211] Fps is (10 sec: 15974.6, 60 sec: 12834.2, 300 sec: 12774.0). Total num frames: 685309952. Throughput: 0: 3023.9. Samples: 160492424. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:38,968][134211] Avg episode reward: [(0, '8.932')] [2025-01-04 10:00:39,378][134294] Updated weights for policy 0, policy_version 167314 (0.0015) [2025-01-04 10:00:41,479][134294] Updated weights for policy 0, policy_version 167324 (0.0015) [2025-01-04 10:00:43,969][134211] Fps is (10 sec: 18429.1, 60 sec: 13448.2, 300 sec: 12885.0). Total num frames: 685395968. Throughput: 0: 3180.3. Samples: 160520152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:43,970][134211] Avg episode reward: [(0, '9.378')] [2025-01-04 10:00:44,265][134294] Updated weights for policy 0, policy_version 167334 (0.0021) [2025-01-04 10:00:48,866][134294] Updated weights for policy 0, policy_version 167344 (0.0036) [2025-01-04 10:00:48,968][134211] Fps is (10 sec: 13106.6, 60 sec: 12902.4, 300 sec: 12690.6). Total num frames: 685441024. Throughput: 0: 3172.6. Samples: 160527436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:48,969][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 10:00:52,748][134294] Updated weights for policy 0, policy_version 167354 (0.0034) [2025-01-04 10:00:53,968][134211] Fps is (10 sec: 9831.9, 60 sec: 12151.4, 300 sec: 12565.7). Total num frames: 685494272. Throughput: 0: 3163.8. Samples: 160542216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:53,969][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 10:00:56,416][134294] Updated weights for policy 0, policy_version 167364 (0.0035) [2025-01-04 10:00:58,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12083.2, 300 sec: 12579.6). Total num frames: 685547520. Throughput: 0: 3203.7. Samples: 160558592. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:00:58,969][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 10:01:00,187][134294] Updated weights for policy 0, policy_version 167374 (0.0033) [2025-01-04 10:01:03,768][134294] Updated weights for policy 0, policy_version 167384 (0.0028) [2025-01-04 10:01:03,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12219.8, 300 sec: 12607.4). Total num frames: 685604864. Throughput: 0: 3231.3. Samples: 160566934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:03,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 10:01:07,262][134294] Updated weights for policy 0, policy_version 167394 (0.0031) [2025-01-04 10:01:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12424.5, 300 sec: 12649.0). Total num frames: 685666304. Throughput: 0: 3174.0. Samples: 160584438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:08,969][134211] Avg episode reward: [(0, '8.346')] [2025-01-04 10:01:10,781][134294] Updated weights for policy 0, policy_version 167404 (0.0026) [2025-01-04 10:01:13,968][134211] Fps is (10 sec: 11878.1, 60 sec: 12561.7, 300 sec: 12690.7). Total num frames: 685723648. Throughput: 0: 3185.2. Samples: 160602560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:13,968][134211] Avg episode reward: [(0, '9.919')] [2025-01-04 10:01:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167413_685723648.pth... 
[2025-01-04 10:01:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000166671_682684416.pth [2025-01-04 10:01:14,167][134294] Updated weights for policy 0, policy_version 167414 (0.0028) [2025-01-04 10:01:16,543][134294] Updated weights for policy 0, policy_version 167424 (0.0013) [2025-01-04 10:01:18,721][134294] Updated weights for policy 0, policy_version 167434 (0.0014) [2025-01-04 10:01:18,968][134211] Fps is (10 sec: 14746.1, 60 sec: 13038.9, 300 sec: 12690.7). Total num frames: 685813760. Throughput: 0: 3236.0. Samples: 160613814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:18,968][134211] Avg episode reward: [(0, '8.749')] [2025-01-04 10:01:20,791][134294] Updated weights for policy 0, policy_version 167444 (0.0014) [2025-01-04 10:01:23,913][134294] Updated weights for policy 0, policy_version 167454 (0.0028) [2025-01-04 10:01:23,968][134211] Fps is (10 sec: 16793.4, 60 sec: 13312.0, 300 sec: 12760.1). Total num frames: 685891584. Throughput: 0: 3305.2. Samples: 160641158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:23,969][134211] Avg episode reward: [(0, '9.326')] [2025-01-04 10:01:27,714][134294] Updated weights for policy 0, policy_version 167464 (0.0038) [2025-01-04 10:01:28,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13243.7, 300 sec: 12746.2). Total num frames: 685944832. Throughput: 0: 3051.9. Samples: 160657484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:28,968][134211] Avg episode reward: [(0, '9.107')] [2025-01-04 10:01:31,518][134294] Updated weights for policy 0, policy_version 167474 (0.0034) [2025-01-04 10:01:33,968][134211] Fps is (10 sec: 11059.5, 60 sec: 13175.5, 300 sec: 12760.1). Total num frames: 686002176. Throughput: 0: 3075.4. Samples: 160665826. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:33,968][134211] Avg episode reward: [(0, '9.698')] [2025-01-04 10:01:35,100][134294] Updated weights for policy 0, policy_version 167484 (0.0036) [2025-01-04 10:01:38,863][134294] Updated weights for policy 0, policy_version 167494 (0.0026) [2025-01-04 10:01:38,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12424.5, 300 sec: 12676.9). Total num frames: 686055424. Throughput: 0: 3115.0. Samples: 160682390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:38,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 10:01:42,370][134294] Updated weights for policy 0, policy_version 167504 (0.0031) [2025-01-04 10:01:43,968][134211] Fps is (10 sec: 11059.2, 60 sec: 11947.0, 300 sec: 12551.8). Total num frames: 686112768. Throughput: 0: 3130.3. Samples: 160699456. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:43,968][134211] Avg episode reward: [(0, '8.931')] [2025-01-04 10:01:45,721][134294] Updated weights for policy 0, policy_version 167514 (0.0024) [2025-01-04 10:01:48,357][134294] Updated weights for policy 0, policy_version 167524 (0.0023) [2025-01-04 10:01:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12356.3, 300 sec: 12607.4). Total num frames: 686182400. Throughput: 0: 3172.3. Samples: 160709690. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:48,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 10:01:51,966][134294] Updated weights for policy 0, policy_version 167534 (0.0028) [2025-01-04 10:01:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12424.6, 300 sec: 12621.2). Total num frames: 686239744. Throughput: 0: 3203.4. Samples: 160728590. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:53,968][134211] Avg episode reward: [(0, '9.482')] [2025-01-04 10:01:55,141][134294] Updated weights for policy 0, policy_version 167544 (0.0025) [2025-01-04 10:01:57,206][134294] Updated weights for policy 0, policy_version 167554 (0.0014) [2025-01-04 10:01:58,967][134211] Fps is (10 sec: 15155.7, 60 sec: 13107.3, 300 sec: 12760.1). Total num frames: 686333952. Throughput: 0: 3351.3. Samples: 160753366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:01:58,968][134211] Avg episode reward: [(0, '7.858')] [2025-01-04 10:01:59,245][134294] Updated weights for policy 0, policy_version 167564 (0.0012) [2025-01-04 10:02:01,278][134294] Updated weights for policy 0, policy_version 167574 (0.0015) [2025-01-04 10:02:03,418][134294] Updated weights for policy 0, policy_version 167584 (0.0016) [2025-01-04 10:02:03,968][134211] Fps is (10 sec: 19251.1, 60 sec: 13789.8, 300 sec: 12898.9). Total num frames: 686432256. Throughput: 0: 3433.8. Samples: 160768338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:02:03,968][134211] Avg episode reward: [(0, '10.387')] [2025-01-04 10:02:07,289][134294] Updated weights for policy 0, policy_version 167594 (0.0032) [2025-01-04 10:02:08,968][134211] Fps is (10 sec: 14335.5, 60 sec: 13516.8, 300 sec: 12857.3). Total num frames: 686477312. Throughput: 0: 3290.3. Samples: 160789220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:08,969][134211] Avg episode reward: [(0, '9.934')] [2025-01-04 10:02:11,997][134294] Updated weights for policy 0, policy_version 167604 (0.0033) [2025-01-04 10:02:13,968][134211] Fps is (10 sec: 9011.1, 60 sec: 13312.0, 300 sec: 12829.5). Total num frames: 686522368. Throughput: 0: 3215.5. Samples: 160802180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:13,969][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 10:02:16,285][134294] Updated weights for policy 0, policy_version 167614 (0.0031) [2025-01-04 10:02:18,968][134211] Fps is (10 sec: 9421.0, 60 sec: 12629.3, 300 sec: 12649.0). Total num frames: 686571520. Throughput: 0: 3195.3. Samples: 160809616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:18,968][134211] Avg episode reward: [(0, '9.543')] [2025-01-04 10:02:20,550][134294] Updated weights for policy 0, policy_version 167624 (0.0028) [2025-01-04 10:02:23,968][134211] Fps is (10 sec: 10240.1, 60 sec: 12219.8, 300 sec: 12579.6). Total num frames: 686624768. Throughput: 0: 3165.7. Samples: 160824846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:23,968][134211] Avg episode reward: [(0, '10.543')] [2025-01-04 10:02:24,151][134294] Updated weights for policy 0, policy_version 167634 (0.0031) [2025-01-04 10:02:27,488][134294] Updated weights for policy 0, policy_version 167644 (0.0030) [2025-01-04 10:02:28,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12356.3, 300 sec: 12621.2). Total num frames: 686686208. Throughput: 0: 3182.3. Samples: 160842658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:28,968][134211] Avg episode reward: [(0, '9.304')] [2025-01-04 10:02:30,914][134294] Updated weights for policy 0, policy_version 167654 (0.0028) [2025-01-04 10:02:33,058][134294] Updated weights for policy 0, policy_version 167664 (0.0014) [2025-01-04 10:02:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12629.3, 300 sec: 12690.7). Total num frames: 686759936. Throughput: 0: 3158.9. Samples: 160851842. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:33,968][134211] Avg episode reward: [(0, '10.307')] [2025-01-04 10:02:36,219][134294] Updated weights for policy 0, policy_version 167674 (0.0030) [2025-01-04 10:02:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12765.9, 300 sec: 12704.5). Total num frames: 686821376. Throughput: 0: 3230.8. Samples: 160873976. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:38,969][134211] Avg episode reward: [(0, '9.825')] [2025-01-04 10:02:40,138][134294] Updated weights for policy 0, policy_version 167684 (0.0032) [2025-01-04 10:02:43,969][134211] Fps is (10 sec: 11058.0, 60 sec: 12629.1, 300 sec: 12649.0). Total num frames: 686870528. Throughput: 0: 3025.4. Samples: 160889512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:43,969][134211] Avg episode reward: [(0, '9.743')] [2025-01-04 10:02:44,112][134294] Updated weights for policy 0, policy_version 167694 (0.0036) [2025-01-04 10:02:46,882][134294] Updated weights for policy 0, policy_version 167704 (0.0020) [2025-01-04 10:02:48,968][134211] Fps is (10 sec: 13107.5, 60 sec: 12834.2, 300 sec: 12732.3). Total num frames: 686952448. Throughput: 0: 2898.4. Samples: 160898764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:48,968][134211] Avg episode reward: [(0, '8.825')] [2025-01-04 10:02:49,134][134294] Updated weights for policy 0, policy_version 167714 (0.0015) [2025-01-04 10:02:51,196][134294] Updated weights for policy 0, policy_version 167724 (0.0014) [2025-01-04 10:02:53,248][134294] Updated weights for policy 0, policy_version 167734 (0.0014) [2025-01-04 10:02:53,968][134211] Fps is (10 sec: 18024.4, 60 sec: 13516.8, 300 sec: 12787.8). Total num frames: 687050752. Throughput: 0: 3077.7. Samples: 160927716. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:53,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 10:02:55,334][134294] Updated weights for policy 0, policy_version 167744 (0.0013) [2025-01-04 10:02:58,384][134294] Updated weights for policy 0, policy_version 167754 (0.0027) [2025-01-04 10:02:58,968][134211] Fps is (10 sec: 17202.7, 60 sec: 13175.4, 300 sec: 12815.6). Total num frames: 687124480. Throughput: 0: 3344.1. Samples: 160952666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:02:58,969][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 10:03:02,474][134294] Updated weights for policy 0, policy_version 167764 (0.0038) [2025-01-04 10:03:03,968][134211] Fps is (10 sec: 12287.1, 60 sec: 12356.1, 300 sec: 12815.6). Total num frames: 687173632. Throughput: 0: 3349.2. Samples: 160960334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:03,969][134211] Avg episode reward: [(0, '7.909')] [2025-01-04 10:03:06,197][134294] Updated weights for policy 0, policy_version 167774 (0.0031) [2025-01-04 10:03:08,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12561.1, 300 sec: 12801.7). Total num frames: 687230976. Throughput: 0: 3368.4. Samples: 160976424. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:08,969][134211] Avg episode reward: [(0, '10.665')] [2025-01-04 10:03:10,000][134294] Updated weights for policy 0, policy_version 167784 (0.0030) [2025-01-04 10:03:13,387][134294] Updated weights for policy 0, policy_version 167794 (0.0032) [2025-01-04 10:03:13,968][134211] Fps is (10 sec: 11469.5, 60 sec: 12765.9, 300 sec: 12760.1). Total num frames: 687288320. Throughput: 0: 3352.4. Samples: 160993516. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:13,968][134211] Avg episode reward: [(0, '9.683')] [2025-01-04 10:03:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167795_687288320.pth... 
[2025-01-04 10:03:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167046_684220416.pth [2025-01-04 10:03:17,421][134294] Updated weights for policy 0, policy_version 167804 (0.0030) [2025-01-04 10:03:18,968][134211] Fps is (10 sec: 10649.6, 60 sec: 12765.8, 300 sec: 12746.2). Total num frames: 687337472. Throughput: 0: 3322.3. Samples: 161001344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:18,969][134211] Avg episode reward: [(0, '8.811')] [2025-01-04 10:03:21,062][134294] Updated weights for policy 0, policy_version 167814 (0.0032) [2025-01-04 10:03:23,968][134211] Fps is (10 sec: 10649.6, 60 sec: 12834.1, 300 sec: 12635.1). Total num frames: 687394816. Throughput: 0: 3199.7. Samples: 161017964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:23,968][134211] Avg episode reward: [(0, '9.179')] [2025-01-04 10:03:24,579][134294] Updated weights for policy 0, policy_version 167824 (0.0032) [2025-01-04 10:03:26,583][134294] Updated weights for policy 0, policy_version 167834 (0.0013) [2025-01-04 10:03:28,600][134294] Updated weights for policy 0, policy_version 167844 (0.0013) [2025-01-04 10:03:28,968][134211] Fps is (10 sec: 15565.2, 60 sec: 13448.6, 300 sec: 12787.9). Total num frames: 687493120. Throughput: 0: 3418.6. Samples: 161043346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:28,968][134211] Avg episode reward: [(0, '8.698')] [2025-01-04 10:03:30,644][134294] Updated weights for policy 0, policy_version 167854 (0.0014) [2025-01-04 10:03:33,172][134294] Updated weights for policy 0, policy_version 167864 (0.0020) [2025-01-04 10:03:33,968][134211] Fps is (10 sec: 18432.1, 60 sec: 13653.3, 300 sec: 12899.0). Total num frames: 687579136. Throughput: 0: 3550.5. Samples: 161058536. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:33,968][134211] Avg episode reward: [(0, '9.259')] [2025-01-04 10:03:37,270][134294] Updated weights for policy 0, policy_version 167874 (0.0033) [2025-01-04 10:03:38,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13448.5, 300 sec: 12898.9). Total num frames: 687628288. Throughput: 0: 3297.4. Samples: 161076100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:38,969][134211] Avg episode reward: [(0, '9.055')] [2025-01-04 10:03:41,112][134294] Updated weights for policy 0, policy_version 167884 (0.0037) [2025-01-04 10:03:43,970][134211] Fps is (10 sec: 9418.3, 60 sec: 13379.9, 300 sec: 12760.0). Total num frames: 687673344. Throughput: 0: 3062.6. Samples: 161090492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:43,971][134211] Avg episode reward: [(0, '9.302')] [2025-01-04 10:03:45,718][134294] Updated weights for policy 0, policy_version 167894 (0.0030) [2025-01-04 10:03:48,055][134294] Updated weights for policy 0, policy_version 167904 (0.0014) [2025-01-04 10:03:48,968][134211] Fps is (10 sec: 11878.0, 60 sec: 13243.6, 300 sec: 12676.7). Total num frames: 687747072. Throughput: 0: 3071.7. Samples: 161098558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:48,969][134211] Avg episode reward: [(0, '9.073')] [2025-01-04 10:03:50,244][134294] Updated weights for policy 0, policy_version 167914 (0.0014) [2025-01-04 10:03:52,617][134294] Updated weights for policy 0, policy_version 167924 (0.0022) [2025-01-04 10:03:53,968][134211] Fps is (10 sec: 15569.0, 60 sec: 12970.6, 300 sec: 12760.1). Total num frames: 687828992. Throughput: 0: 3319.4. Samples: 161125796. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:53,968][134211] Avg episode reward: [(0, '10.358')] [2025-01-04 10:03:56,448][134294] Updated weights for policy 0, policy_version 167934 (0.0032) [2025-01-04 10:03:58,968][134211] Fps is (10 sec: 13517.3, 60 sec: 12629.3, 300 sec: 12787.8). Total num frames: 687882240. Throughput: 0: 3306.4. Samples: 161142306. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:03:58,969][134211] Avg episode reward: [(0, '8.501')] [2025-01-04 10:04:00,242][134294] Updated weights for policy 0, policy_version 167944 (0.0036) [2025-01-04 10:04:03,941][134294] Updated weights for policy 0, policy_version 167954 (0.0029) [2025-01-04 10:04:03,969][134211] Fps is (10 sec: 11058.1, 60 sec: 12765.8, 300 sec: 12787.8). Total num frames: 687939584. Throughput: 0: 3308.6. Samples: 161150236. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:04:03,969][134211] Avg episode reward: [(0, '9.043')] [2025-01-04 10:04:07,752][134294] Updated weights for policy 0, policy_version 167964 (0.0033) [2025-01-04 10:04:08,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12629.3, 300 sec: 12774.0). Total num frames: 687988736. Throughput: 0: 3311.6. Samples: 161166986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:04:08,968][134211] Avg episode reward: [(0, '9.116')] [2025-01-04 10:04:11,642][134294] Updated weights for policy 0, policy_version 167974 (0.0029) [2025-01-04 10:04:13,968][134211] Fps is (10 sec: 11879.7, 60 sec: 12834.2, 300 sec: 12760.1). Total num frames: 688058368. Throughput: 0: 3153.7. Samples: 161185262. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:04:13,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 10:04:14,023][134294] Updated weights for policy 0, policy_version 167984 (0.0013) [2025-01-04 10:04:16,210][134294] Updated weights for policy 0, policy_version 167994 (0.0013) [2025-01-04 10:04:18,484][134294] Updated weights for policy 0, policy_version 168004 (0.0014) [2025-01-04 10:04:18,968][134211] Fps is (10 sec: 16384.3, 60 sec: 13585.1, 300 sec: 12871.2). Total num frames: 688152576. Throughput: 0: 3115.2. Samples: 161198720. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:18,968][134211] Avg episode reward: [(0, '10.138')] [2025-01-04 10:04:21,026][134294] Updated weights for policy 0, policy_version 168014 (0.0021) [2025-01-04 10:04:23,968][134211] Fps is (10 sec: 15564.3, 60 sec: 13653.3, 300 sec: 12899.0). Total num frames: 688214016. Throughput: 0: 3258.3. Samples: 161222724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:23,969][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 10:04:24,967][134294] Updated weights for policy 0, policy_version 168024 (0.0033) [2025-01-04 10:04:28,968][134211] Fps is (10 sec: 11059.0, 60 sec: 12834.1, 300 sec: 12871.2). Total num frames: 688263168. Throughput: 0: 3273.8. Samples: 161237804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:28,968][134211] Avg episode reward: [(0, '9.477')] [2025-01-04 10:04:29,019][134294] Updated weights for policy 0, policy_version 168034 (0.0036) [2025-01-04 10:04:32,555][134294] Updated weights for policy 0, policy_version 168044 (0.0029) [2025-01-04 10:04:33,968][134211] Fps is (10 sec: 10649.8, 60 sec: 12356.3, 300 sec: 12815.6). Total num frames: 688320512. Throughput: 0: 3279.0. Samples: 161246112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:33,968][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 10:04:36,172][134294] Updated weights for policy 0, policy_version 168054 (0.0030) [2025-01-04 10:04:38,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12492.8, 300 sec: 12843.4). Total num frames: 688377856. Throughput: 0: 3055.7. Samples: 161263304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:38,969][134211] Avg episode reward: [(0, '10.266')] [2025-01-04 10:04:39,935][134294] Updated weights for policy 0, policy_version 168064 (0.0031) [2025-01-04 10:04:43,710][134294] Updated weights for policy 0, policy_version 168074 (0.0029) [2025-01-04 10:04:43,968][134211] Fps is (10 sec: 11059.0, 60 sec: 12629.9, 300 sec: 12760.1). Total num frames: 688431104. Throughput: 0: 3047.6. Samples: 161279446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:43,969][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 10:04:46,496][134294] Updated weights for policy 0, policy_version 168084 (0.0018) [2025-01-04 10:04:48,775][134294] Updated weights for policy 0, policy_version 168094 (0.0014) [2025-01-04 10:04:48,967][134211] Fps is (10 sec: 13517.2, 60 sec: 12766.0, 300 sec: 12704.5). Total num frames: 688513024. Throughput: 0: 3101.1. Samples: 161289782. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:48,968][134211] Avg episode reward: [(0, '10.378')] [2025-01-04 10:04:50,880][134294] Updated weights for policy 0, policy_version 168104 (0.0014) [2025-01-04 10:04:53,518][134294] Updated weights for policy 0, policy_version 168114 (0.0022) [2025-01-04 10:04:53,968][134211] Fps is (10 sec: 16793.8, 60 sec: 12834.1, 300 sec: 12801.7). Total num frames: 688599040. Throughput: 0: 3351.2. Samples: 161317792. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:53,968][134211] Avg episode reward: [(0, '9.950')] [2025-01-04 10:04:57,722][134294] Updated weights for policy 0, policy_version 168124 (0.0037) [2025-01-04 10:04:58,969][134211] Fps is (10 sec: 13515.2, 60 sec: 12765.7, 300 sec: 12801.7). Total num frames: 688648192. Throughput: 0: 3288.5. Samples: 161333246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:04:58,970][134211] Avg episode reward: [(0, '9.642')] [2025-01-04 10:05:01,574][134294] Updated weights for policy 0, policy_version 168134 (0.0032) [2025-01-04 10:05:03,968][134211] Fps is (10 sec: 9830.2, 60 sec: 12629.5, 300 sec: 12801.7). Total num frames: 688697344. Throughput: 0: 3167.8. Samples: 161341272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:05:03,969][134211] Avg episode reward: [(0, '10.013')] [2025-01-04 10:05:05,549][134294] Updated weights for policy 0, policy_version 168144 (0.0032) [2025-01-04 10:05:08,679][134294] Updated weights for policy 0, policy_version 168154 (0.0020) [2025-01-04 10:05:08,967][134211] Fps is (10 sec: 11470.1, 60 sec: 12902.4, 300 sec: 12857.4). Total num frames: 688762880. Throughput: 0: 2972.5. Samples: 161356484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:05:08,968][134211] Avg episode reward: [(0, '9.826')] [2025-01-04 10:05:11,637][134294] Updated weights for policy 0, policy_version 168164 (0.0022) [2025-01-04 10:05:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12697.5, 300 sec: 12843.4). Total num frames: 688820224. Throughput: 0: 3100.4. Samples: 161377322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:05:13,969][134211] Avg episode reward: [(0, '9.034')] [2025-01-04 10:05:14,036][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168170_688824320.pth... 
[2025-01-04 10:05:14,129][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167413_685723648.pth [2025-01-04 10:05:15,606][134294] Updated weights for policy 0, policy_version 168174 (0.0031) [2025-01-04 10:05:18,968][134211] Fps is (10 sec: 11058.9, 60 sec: 12014.9, 300 sec: 12815.6). Total num frames: 688873472. Throughput: 0: 3078.2. Samples: 161384630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 10:05:18,969][134211] Avg episode reward: [(0, '9.338')] [2025-01-04 10:05:19,669][134294] Updated weights for policy 0, policy_version 168184 (0.0029) [2025-01-04 10:05:21,798][134294] Updated weights for policy 0, policy_version 168194 (0.0014) [2025-01-04 10:05:23,816][134294] Updated weights for policy 0, policy_version 168204 (0.0015) [2025-01-04 10:05:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 12492.8, 300 sec: 12926.7). Total num frames: 688963584. Throughput: 0: 3166.9. Samples: 161405816. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:23,969][134211] Avg episode reward: [(0, '9.229')] [2025-01-04 10:05:25,907][134294] Updated weights for policy 0, policy_version 168214 (0.0016) [2025-01-04 10:05:27,960][134294] Updated weights for policy 0, policy_version 168224 (0.0014) [2025-01-04 10:05:28,968][134211] Fps is (10 sec: 18841.9, 60 sec: 13312.0, 300 sec: 13051.7). Total num frames: 689061888. Throughput: 0: 3472.2. Samples: 161435696. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:28,968][134211] Avg episode reward: [(0, '9.367')] [2025-01-04 10:05:30,694][134294] Updated weights for policy 0, policy_version 168234 (0.0024) [2025-01-04 10:05:33,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13312.0, 300 sec: 12912.8). Total num frames: 689119232. Throughput: 0: 3473.4. Samples: 161446086. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:33,968][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 10:05:34,627][134294] Updated weights for policy 0, policy_version 168244 (0.0038) [2025-01-04 10:05:38,566][134294] Updated weights for policy 0, policy_version 168254 (0.0037) [2025-01-04 10:05:38,968][134211] Fps is (10 sec: 10649.5, 60 sec: 13175.5, 300 sec: 12787.9). Total num frames: 689168384. Throughput: 0: 3203.9. Samples: 161461966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:38,968][134211] Avg episode reward: [(0, '9.106')] [2025-01-04 10:05:42,731][134294] Updated weights for policy 0, policy_version 168264 (0.0036) [2025-01-04 10:05:43,968][134211] Fps is (10 sec: 9830.3, 60 sec: 13107.2, 300 sec: 12801.7). Total num frames: 689217536. Throughput: 0: 3182.1. Samples: 161476436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:43,969][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 10:05:46,832][134294] Updated weights for policy 0, policy_version 168274 (0.0036) [2025-01-04 10:05:48,968][134211] Fps is (10 sec: 10239.9, 60 sec: 12629.3, 300 sec: 12801.7). Total num frames: 689270784. Throughput: 0: 3178.3. Samples: 161484296. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:48,969][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 10:05:50,619][134294] Updated weights for policy 0, policy_version 168284 (0.0030) [2025-01-04 10:05:53,965][134294] Updated weights for policy 0, policy_version 168294 (0.0022) [2025-01-04 10:05:53,968][134211] Fps is (10 sec: 11469.2, 60 sec: 12219.8, 300 sec: 12829.5). Total num frames: 689332224. Throughput: 0: 3194.0. Samples: 161500216. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:53,968][134211] Avg episode reward: [(0, '10.271')] [2025-01-04 10:05:56,163][134294] Updated weights for policy 0, policy_version 168304 (0.0015) [2025-01-04 10:05:58,727][134294] Updated weights for policy 0, policy_version 168314 (0.0024) [2025-01-04 10:05:58,969][134211] Fps is (10 sec: 14335.0, 60 sec: 12765.9, 300 sec: 12912.8). Total num frames: 689414144. Throughput: 0: 3280.3. Samples: 161524936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:05:58,969][134211] Avg episode reward: [(0, '9.352')] [2025-01-04 10:06:02,766][134294] Updated weights for policy 0, policy_version 168324 (0.0031) [2025-01-04 10:06:03,969][134211] Fps is (10 sec: 13514.8, 60 sec: 12833.9, 300 sec: 12885.0). Total num frames: 689467392. Throughput: 0: 3291.5. Samples: 161532752. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:06:03,970][134211] Avg episode reward: [(0, '9.298')] [2025-01-04 10:06:06,398][134294] Updated weights for policy 0, policy_version 168334 (0.0032) [2025-01-04 10:06:08,968][134211] Fps is (10 sec: 10240.9, 60 sec: 12561.1, 300 sec: 12857.3). Total num frames: 689516544. Throughput: 0: 3178.7. Samples: 161548856. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:06:08,968][134211] Avg episode reward: [(0, '9.158')] [2025-01-04 10:06:09,979][134294] Updated weights for policy 0, policy_version 168344 (0.0021) [2025-01-04 10:06:12,304][134294] Updated weights for policy 0, policy_version 168354 (0.0016) [2025-01-04 10:06:13,968][134211] Fps is (10 sec: 13518.9, 60 sec: 13039.0, 300 sec: 12843.4). Total num frames: 689602560. Throughput: 0: 3018.0. Samples: 161571508. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:06:13,968][134211] Avg episode reward: [(0, '10.653')] [2025-01-04 10:06:14,646][134294] Updated weights for policy 0, policy_version 168364 (0.0016) [2025-01-04 10:06:17,675][134294] Updated weights for policy 0, policy_version 168374 (0.0025) [2025-01-04 10:06:18,968][134211] Fps is (10 sec: 15154.7, 60 sec: 13243.7, 300 sec: 12801.7). Total num frames: 689668096. Throughput: 0: 3069.7. Samples: 161584222. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:06:18,969][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 10:06:21,597][134294] Updated weights for policy 0, policy_version 168384 (0.0036) [2025-01-04 10:06:23,969][134211] Fps is (10 sec: 11876.8, 60 sec: 12629.1, 300 sec: 12801.7). Total num frames: 689721344. Throughput: 0: 3060.0. Samples: 161599668. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:06:23,970][134211] Avg episode reward: [(0, '9.676')] [2025-01-04 10:06:25,565][134294] Updated weights for policy 0, policy_version 168394 (0.0032) [2025-01-04 10:06:28,968][134211] Fps is (10 sec: 11059.3, 60 sec: 11946.6, 300 sec: 12801.7). Total num frames: 689778688. Throughput: 0: 3108.4. Samples: 161616312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:28,969][134211] Avg episode reward: [(0, '9.393')] [2025-01-04 10:06:29,208][134294] Updated weights for policy 0, policy_version 168404 (0.0033) [2025-01-04 10:06:32,624][134294] Updated weights for policy 0, policy_version 168414 (0.0029) [2025-01-04 10:06:33,968][134211] Fps is (10 sec: 11470.3, 60 sec: 11946.7, 300 sec: 12815.6). Total num frames: 689836032. Throughput: 0: 3121.7. Samples: 161624774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:33,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 10:06:36,021][134294] Updated weights for policy 0, policy_version 168424 (0.0031) [2025-01-04 10:06:38,968][134211] Fps is (10 sec: 11469.0, 60 sec: 12083.2, 300 sec: 12815.6). Total num frames: 689893376. Throughput: 0: 3162.1. Samples: 161642512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:38,968][134211] Avg episode reward: [(0, '9.401')] [2025-01-04 10:06:39,742][134294] Updated weights for policy 0, policy_version 168434 (0.0030) [2025-01-04 10:06:43,490][134294] Updated weights for policy 0, policy_version 168444 (0.0031) [2025-01-04 10:06:43,968][134211] Fps is (10 sec: 11059.0, 60 sec: 12151.5, 300 sec: 12760.1). Total num frames: 689946624. Throughput: 0: 2978.9. Samples: 161658986. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:43,969][134211] Avg episode reward: [(0, '9.042')] [2025-01-04 10:06:46,232][134294] Updated weights for policy 0, policy_version 168454 (0.0019) [2025-01-04 10:06:48,562][134294] Updated weights for policy 0, policy_version 168464 (0.0015) [2025-01-04 10:06:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 12697.6, 300 sec: 12857.3). Total num frames: 690032640. Throughput: 0: 3048.9. Samples: 161669948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:48,968][134211] Avg episode reward: [(0, '9.746')] [2025-01-04 10:06:50,734][134294] Updated weights for policy 0, policy_version 168474 (0.0013) [2025-01-04 10:06:52,789][134294] Updated weights for policy 0, policy_version 168484 (0.0014) [2025-01-04 10:06:53,968][134211] Fps is (10 sec: 18022.7, 60 sec: 13243.7, 300 sec: 12857.3). Total num frames: 690126848. Throughput: 0: 3313.5. Samples: 161697962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:53,968][134211] Avg episode reward: [(0, '10.273')] [2025-01-04 10:06:55,646][134294] Updated weights for policy 0, policy_version 168494 (0.0026) [2025-01-04 10:06:58,968][134211] Fps is (10 sec: 15155.1, 60 sec: 12834.3, 300 sec: 12718.4). Total num frames: 690184192. Throughput: 0: 3246.0. Samples: 161717578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:06:58,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 10:06:59,579][134294] Updated weights for policy 0, policy_version 168504 (0.0032) [2025-01-04 10:07:03,411][134294] Updated weights for policy 0, policy_version 168514 (0.0033) [2025-01-04 10:07:03,968][134211] Fps is (10 sec: 11059.1, 60 sec: 12834.4, 300 sec: 12746.2). Total num frames: 690237440. Throughput: 0: 3137.6. Samples: 161725412. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:03,968][134211] Avg episode reward: [(0, '8.446')] [2025-01-04 10:07:07,182][134294] Updated weights for policy 0, policy_version 168524 (0.0034) [2025-01-04 10:07:08,968][134211] Fps is (10 sec: 10649.6, 60 sec: 12902.4, 300 sec: 12774.0). Total num frames: 690290688. Throughput: 0: 3157.4. Samples: 161741746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:08,968][134211] Avg episode reward: [(0, '9.238')] [2025-01-04 10:07:11,168][134294] Updated weights for policy 0, policy_version 168534 (0.0031) [2025-01-04 10:07:13,971][134211] Fps is (10 sec: 10236.7, 60 sec: 12287.3, 300 sec: 12773.8). Total num frames: 690339840. Throughput: 0: 3117.3. Samples: 161756602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:13,972][134211] Avg episode reward: [(0, '9.029')] [2025-01-04 10:07:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168540_690339840.pth... 
[2025-01-04 10:07:14,087][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000167795_687288320.pth [2025-01-04 10:07:15,021][134294] Updated weights for policy 0, policy_version 168544 (0.0026) [2025-01-04 10:07:17,208][134294] Updated weights for policy 0, policy_version 168554 (0.0015) [2025-01-04 10:07:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12629.4, 300 sec: 12885.1). Total num frames: 690425856. Throughput: 0: 3177.3. Samples: 161767750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:18,968][134211] Avg episode reward: [(0, '8.315')] [2025-01-04 10:07:19,415][134294] Updated weights for policy 0, policy_version 168564 (0.0015) [2025-01-04 10:07:21,552][134294] Updated weights for policy 0, policy_version 168574 (0.0017) [2025-01-04 10:07:23,968][134211] Fps is (10 sec: 16798.5, 60 sec: 13107.4, 300 sec: 12954.5). Total num frames: 690507776. Throughput: 0: 3388.7. Samples: 161795006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:23,969][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 10:07:25,136][134294] Updated weights for policy 0, policy_version 168584 (0.0035) [2025-01-04 10:07:28,925][134294] Updated weights for policy 0, policy_version 168594 (0.0031) [2025-01-04 10:07:28,969][134211] Fps is (10 sec: 13514.7, 60 sec: 13038.6, 300 sec: 12885.0). Total num frames: 690561024. Throughput: 0: 3387.2. Samples: 161811416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:28,970][134211] Avg episode reward: [(0, '10.891')] [2025-01-04 10:07:32,551][134294] Updated weights for policy 0, policy_version 168604 (0.0030) [2025-01-04 10:07:33,968][134211] Fps is (10 sec: 10649.8, 60 sec: 12970.6, 300 sec: 12857.3). Total num frames: 690614272. Throughput: 0: 3327.9. Samples: 161819706. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:33,969][134211] Avg episode reward: [(0, '10.032')] [2025-01-04 10:07:36,181][134294] Updated weights for policy 0, policy_version 168614 (0.0031) [2025-01-04 10:07:38,968][134211] Fps is (10 sec: 11060.6, 60 sec: 12970.6, 300 sec: 12885.1). Total num frames: 690671616. Throughput: 0: 3083.1. Samples: 161836704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:38,969][134211] Avg episode reward: [(0, '10.725')] [2025-01-04 10:07:40,071][134294] Updated weights for policy 0, policy_version 168624 (0.0030) [2025-01-04 10:07:43,858][134294] Updated weights for policy 0, policy_version 168634 (0.0034) [2025-01-04 10:07:43,968][134211] Fps is (10 sec: 11059.4, 60 sec: 12970.7, 300 sec: 12787.8). Total num frames: 690724864. Throughput: 0: 3000.6. Samples: 161852606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:43,968][134211] Avg episode reward: [(0, '9.881')] [2025-01-04 10:07:47,564][134294] Updated weights for policy 0, policy_version 168644 (0.0032) [2025-01-04 10:07:48,968][134211] Fps is (10 sec: 11878.7, 60 sec: 12629.3, 300 sec: 12676.8). Total num frames: 690790400. Throughput: 0: 3001.8. Samples: 161860494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:48,968][134211] Avg episode reward: [(0, '10.617')] [2025-01-04 10:07:49,811][134294] Updated weights for policy 0, policy_version 168654 (0.0015) [2025-01-04 10:07:51,849][134294] Updated weights for policy 0, policy_version 168664 (0.0015) [2025-01-04 10:07:53,938][134294] Updated weights for policy 0, policy_version 168674 (0.0016) [2025-01-04 10:07:53,970][134211] Fps is (10 sec: 16380.3, 60 sec: 12697.1, 300 sec: 12760.0). Total num frames: 690888704. Throughput: 0: 3223.9. Samples: 161886828. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:53,971][134211] Avg episode reward: [(0, '9.246')] [2025-01-04 10:07:57,315][134294] Updated weights for policy 0, policy_version 168684 (0.0028) [2025-01-04 10:07:58,968][134211] Fps is (10 sec: 15564.5, 60 sec: 12697.6, 300 sec: 12787.9). Total num frames: 690946048. Throughput: 0: 3362.0. Samples: 161907882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:07:58,968][134211] Avg episode reward: [(0, '8.920')] [2025-01-04 10:08:00,784][134294] Updated weights for policy 0, policy_version 168694 (0.0028) [2025-01-04 10:08:03,968][134211] Fps is (10 sec: 11470.8, 60 sec: 12765.8, 300 sec: 12787.8). Total num frames: 691003392. Throughput: 0: 3321.6. Samples: 161917222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:03,970][134211] Avg episode reward: [(0, '10.115')] [2025-01-04 10:08:04,621][134294] Updated weights for policy 0, policy_version 168704 (0.0033) [2025-01-04 10:08:08,279][134294] Updated weights for policy 0, policy_version 168714 (0.0028) [2025-01-04 10:08:08,971][134211] Fps is (10 sec: 11055.8, 60 sec: 12765.2, 300 sec: 12773.8). Total num frames: 691056640. Throughput: 0: 3070.7. Samples: 161933198. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:08,972][134211] Avg episode reward: [(0, '10.736')] [2025-01-04 10:08:12,307][134294] Updated weights for policy 0, policy_version 168724 (0.0028) [2025-01-04 10:08:13,968][134211] Fps is (10 sec: 10650.0, 60 sec: 12834.8, 300 sec: 12787.8). Total num frames: 691109888. Throughput: 0: 3051.8. Samples: 161948742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:13,969][134211] Avg episode reward: [(0, '8.377')] [2025-01-04 10:08:15,871][134294] Updated weights for policy 0, policy_version 168734 (0.0025) [2025-01-04 10:08:18,968][134211] Fps is (10 sec: 11062.6, 60 sec: 12356.2, 300 sec: 12787.9). Total num frames: 691167232. Throughput: 0: 3063.7. Samples: 161957572. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:18,968][134211] Avg episode reward: [(0, '9.225')] [2025-01-04 10:08:19,366][134294] Updated weights for policy 0, policy_version 168744 (0.0025) [2025-01-04 10:08:22,517][134294] Updated weights for policy 0, policy_version 168754 (0.0027) [2025-01-04 10:08:23,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12083.3, 300 sec: 12676.8). Total num frames: 691232768. Throughput: 0: 3098.7. Samples: 161976146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:23,968][134211] Avg episode reward: [(0, '9.048')] [2025-01-04 10:08:25,617][134294] Updated weights for policy 0, policy_version 168764 (0.0025) [2025-01-04 10:08:28,650][134294] Updated weights for policy 0, policy_version 168774 (0.0026) [2025-01-04 10:08:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12288.3, 300 sec: 12607.3). Total num frames: 691298304. Throughput: 0: 3193.2. Samples: 161996302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:28,968][134211] Avg episode reward: [(0, '9.571')] [2025-01-04 10:08:31,808][134294] Updated weights for policy 0, policy_version 168784 (0.0027) [2025-01-04 10:08:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 12492.8, 300 sec: 12662.9). Total num frames: 691363840. Throughput: 0: 3234.9. Samples: 162006064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:08:33,968][134211] Avg episode reward: [(0, '10.023')] [2025-01-04 10:08:35,008][134294] Updated weights for policy 0, policy_version 168794 (0.0025) [2025-01-04 10:08:38,029][134294] Updated weights for policy 0, policy_version 168804 (0.0026) [2025-01-04 10:08:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12629.3, 300 sec: 12732.4). Total num frames: 691429376. Throughput: 0: 3085.0. Samples: 162025646. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:08:38,969][134211] Avg episode reward: [(0, '9.417')] [2025-01-04 10:08:41,652][134294] Updated weights for policy 0, policy_version 168814 (0.0025) [2025-01-04 10:08:43,968][134211] Fps is (10 sec: 12697.9, 60 sec: 12765.9, 300 sec: 12690.7). Total num frames: 691490816. Throughput: 0: 3008.3. Samples: 162043256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:08:43,968][134211] Avg episode reward: [(0, '9.073')] [2025-01-04 10:08:44,421][134294] Updated weights for policy 0, policy_version 168824 (0.0020) [2025-01-04 10:08:46,630][134294] Updated weights for policy 0, policy_version 168834 (0.0013) [2025-01-04 10:08:48,717][134294] Updated weights for policy 0, policy_version 168844 (0.0012) [2025-01-04 10:08:48,968][134211] Fps is (10 sec: 15974.8, 60 sec: 13312.0, 300 sec: 12746.2). Total num frames: 691589120. Throughput: 0: 3110.8. Samples: 162057204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:08:48,968][134211] Avg episode reward: [(0, '9.421')] [2025-01-04 10:08:51,380][134294] Updated weights for policy 0, policy_version 168854 (0.0017) [2025-01-04 10:08:53,968][134211] Fps is (10 sec: 16382.9, 60 sec: 12766.2, 300 sec: 12787.8). Total num frames: 691654656. Throughput: 0: 3312.6. Samples: 162082256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:08:53,969][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 10:08:54,806][134294] Updated weights for policy 0, policy_version 168864 (0.0028) [2025-01-04 10:08:58,246][134294] Updated weights for policy 0, policy_version 168874 (0.0028) [2025-01-04 10:08:58,968][134211] Fps is (10 sec: 12287.8, 60 sec: 12765.9, 300 sec: 12787.9). Total num frames: 691712000. Throughput: 0: 3357.1. Samples: 162099812. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:08:58,968][134211] Avg episode reward: [(0, '10.414')] [2025-01-04 10:09:01,465][134294] Updated weights for policy 0, policy_version 168884 (0.0027) [2025-01-04 10:09:03,968][134211] Fps is (10 sec: 12288.6, 60 sec: 12902.5, 300 sec: 12843.4). Total num frames: 691777536. Throughput: 0: 3378.3. Samples: 162109596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:03,969][134211] Avg episode reward: [(0, '8.838')] [2025-01-04 10:09:04,793][134294] Updated weights for policy 0, policy_version 168894 (0.0027) [2025-01-04 10:09:07,853][134294] Updated weights for policy 0, policy_version 168904 (0.0024) [2025-01-04 10:09:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13107.9, 300 sec: 12829.5). Total num frames: 691843072. Throughput: 0: 3387.2. Samples: 162128570. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:08,968][134211] Avg episode reward: [(0, '9.119')] [2025-01-04 10:09:10,946][134294] Updated weights for policy 0, policy_version 168914 (0.0026) [2025-01-04 10:09:13,953][134294] Updated weights for policy 0, policy_version 168924 (0.0025) [2025-01-04 10:09:13,969][134211] Fps is (10 sec: 13514.6, 60 sec: 13379.9, 300 sec: 12746.1). Total num frames: 691912704. Throughput: 0: 3389.2. Samples: 162148822. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:13,971][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 10:09:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168924_691912704.pth... [2025-01-04 10:09:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168170_688824320.pth [2025-01-04 10:09:17,394][134294] Updated weights for policy 0, policy_version 168934 (0.0030) [2025-01-04 10:09:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13312.0, 300 sec: 12718.4). Total num frames: 691965952. 
Throughput: 0: 3371.4. Samples: 162157776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:18,969][134211] Avg episode reward: [(0, '8.375')] [2025-01-04 10:09:20,908][134294] Updated weights for policy 0, policy_version 168944 (0.0028) [2025-01-04 10:09:23,902][134294] Updated weights for policy 0, policy_version 168954 (0.0025) [2025-01-04 10:09:23,968][134211] Fps is (10 sec: 12290.2, 60 sec: 13380.3, 300 sec: 12787.9). Total num frames: 692035584. Throughput: 0: 3348.6. Samples: 162176332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:23,968][134211] Avg episode reward: [(0, '8.735')] [2025-01-04 10:09:26,911][134294] Updated weights for policy 0, policy_version 168964 (0.0029) [2025-01-04 10:09:28,968][134211] Fps is (10 sec: 13516.2, 60 sec: 13380.2, 300 sec: 12815.6). Total num frames: 692101120. Throughput: 0: 3398.0. Samples: 162196168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:28,969][134211] Avg episode reward: [(0, '9.597')] [2025-01-04 10:09:30,192][134294] Updated weights for policy 0, policy_version 168974 (0.0026) [2025-01-04 10:09:33,220][134294] Updated weights for policy 0, policy_version 168984 (0.0024) [2025-01-04 10:09:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 12843.4). Total num frames: 692166656. Throughput: 0: 3303.6. Samples: 162205866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:33,968][134211] Avg episode reward: [(0, '8.931')] [2025-01-04 10:09:36,332][134294] Updated weights for policy 0, policy_version 168994 (0.0028) [2025-01-04 10:09:38,970][134211] Fps is (10 sec: 13105.0, 60 sec: 13379.8, 300 sec: 12885.0). Total num frames: 692232192. Throughput: 0: 3186.2. Samples: 162225642. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:09:38,971][134211] Avg episode reward: [(0, '8.980')] [2025-01-04 10:09:39,671][134294] Updated weights for policy 0, policy_version 169004 (0.0024) [2025-01-04 10:09:42,765][134294] Updated weights for policy 0, policy_version 169014 (0.0024) [2025-01-04 10:09:43,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13380.2, 300 sec: 12815.6). Total num frames: 692293632. Throughput: 0: 3220.8. Samples: 162244750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:09:43,969][134211] Avg episode reward: [(0, '9.817')] [2025-01-04 10:09:46,022][134294] Updated weights for policy 0, policy_version 169024 (0.0030) [2025-01-04 10:09:48,968][134211] Fps is (10 sec: 12290.9, 60 sec: 12765.8, 300 sec: 12732.3). Total num frames: 692355072. Throughput: 0: 3218.0. Samples: 162254406. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:09:48,968][134211] Avg episode reward: [(0, '8.226')] [2025-01-04 10:09:49,411][134294] Updated weights for policy 0, policy_version 169034 (0.0027) [2025-01-04 10:09:52,636][134294] Updated weights for policy 0, policy_version 169044 (0.0028) [2025-01-04 10:09:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12766.0, 300 sec: 12787.9). Total num frames: 692420608. Throughput: 0: 3208.4. Samples: 162272948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:09:53,969][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 10:09:55,614][134294] Updated weights for policy 0, policy_version 169054 (0.0026) [2025-01-04 10:09:58,711][134294] Updated weights for policy 0, policy_version 169064 (0.0025) [2025-01-04 10:09:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 12902.4, 300 sec: 12843.4). Total num frames: 692486144. Throughput: 0: 3205.2. Samples: 162293052. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:09:58,968][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 10:10:02,100][134294] Updated weights for policy 0, policy_version 169074 (0.0029) [2025-01-04 10:10:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12765.9, 300 sec: 12815.6). Total num frames: 692543488. Throughput: 0: 3206.6. Samples: 162302074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:03,968][134211] Avg episode reward: [(0, '8.692')] [2025-01-04 10:10:05,933][134294] Updated weights for policy 0, policy_version 169084 (0.0029) [2025-01-04 10:10:08,967][134211] Fps is (10 sec: 11878.7, 60 sec: 12697.6, 300 sec: 12829.5). Total num frames: 692604928. Throughput: 0: 3160.0. Samples: 162318532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:08,970][134211] Avg episode reward: [(0, '9.351')] [2025-01-04 10:10:09,082][134294] Updated weights for policy 0, policy_version 169094 (0.0024) [2025-01-04 10:10:11,286][134294] Updated weights for policy 0, policy_version 169104 (0.0014) [2025-01-04 10:10:13,408][134294] Updated weights for policy 0, policy_version 169114 (0.0013) [2025-01-04 10:10:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13107.6, 300 sec: 12968.4). Total num frames: 692699136. Throughput: 0: 3302.8. Samples: 162344794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:13,968][134211] Avg episode reward: [(0, '9.897')] [2025-01-04 10:10:16,102][134294] Updated weights for policy 0, policy_version 169124 (0.0023) [2025-01-04 10:10:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13243.7, 300 sec: 12871.2). Total num frames: 692760576. Throughput: 0: 3341.2. Samples: 162356222. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:18,968][134211] Avg episode reward: [(0, '9.418')] [2025-01-04 10:10:19,896][134294] Updated weights for policy 0, policy_version 169134 (0.0028) [2025-01-04 10:10:23,196][134294] Updated weights for policy 0, policy_version 169144 (0.0032) [2025-01-04 10:10:23,969][134211] Fps is (10 sec: 11877.3, 60 sec: 13038.7, 300 sec: 12732.3). Total num frames: 692817920. Throughput: 0: 3280.7. Samples: 162373270. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:23,970][134211] Avg episode reward: [(0, '8.959')] [2025-01-04 10:10:26,650][134294] Updated weights for policy 0, policy_version 169154 (0.0029) [2025-01-04 10:10:28,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12902.5, 300 sec: 12732.3). Total num frames: 692875264. Throughput: 0: 3252.7. Samples: 162391120. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:28,969][134211] Avg episode reward: [(0, '9.763')] [2025-01-04 10:10:30,667][134294] Updated weights for policy 0, policy_version 169164 (0.0032) [2025-01-04 10:10:33,968][134211] Fps is (10 sec: 10650.5, 60 sec: 12629.3, 300 sec: 12732.3). Total num frames: 692924416. Throughput: 0: 3198.5. Samples: 162398338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:33,969][134211] Avg episode reward: [(0, '9.179')] [2025-01-04 10:10:34,834][134294] Updated weights for policy 0, policy_version 169174 (0.0029) [2025-01-04 10:10:37,032][134294] Updated weights for policy 0, policy_version 169184 (0.0014) [2025-01-04 10:10:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13039.4, 300 sec: 12871.2). Total num frames: 693014528. Throughput: 0: 3227.2. Samples: 162418170. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:38,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 10:10:39,128][134294] Updated weights for policy 0, policy_version 169194 (0.0014) [2025-01-04 10:10:41,145][134294] Updated weights for policy 0, policy_version 169204 (0.0013) [2025-01-04 10:10:43,237][134294] Updated weights for policy 0, policy_version 169214 (0.0015) [2025-01-04 10:10:43,968][134211] Fps is (10 sec: 18431.9, 60 sec: 13585.1, 300 sec: 13010.0). Total num frames: 693108736. Throughput: 0: 3448.7. Samples: 162448244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:10:43,969][134211] Avg episode reward: [(0, '8.984')] [2025-01-04 10:10:47,039][134294] Updated weights for policy 0, policy_version 169224 (0.0031) [2025-01-04 10:10:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13380.2, 300 sec: 12968.3). Total num frames: 693157888. Throughput: 0: 3438.8. Samples: 162456820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:10:48,969][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 10:10:50,794][134294] Updated weights for policy 0, policy_version 169234 (0.0033) [2025-01-04 10:10:53,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13312.0, 300 sec: 12899.0). Total num frames: 693219328. Throughput: 0: 3437.1. Samples: 162473204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:10:53,969][134211] Avg episode reward: [(0, '8.981')] [2025-01-04 10:10:53,989][134294] Updated weights for policy 0, policy_version 169244 (0.0028) [2025-01-04 10:10:57,182][134294] Updated weights for policy 0, policy_version 169254 (0.0026) [2025-01-04 10:10:58,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13312.0, 300 sec: 12940.6). Total num frames: 693284864. Throughput: 0: 3276.5. Samples: 162492236. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:10:58,968][134211] Avg episode reward: [(0, '9.152')] [2025-01-04 10:11:00,510][134294] Updated weights for policy 0, policy_version 169264 (0.0027) [2025-01-04 10:11:03,765][134294] Updated weights for policy 0, policy_version 169274 (0.0029) [2025-01-04 10:11:03,968][134211] Fps is (10 sec: 12697.1, 60 sec: 13380.2, 300 sec: 12982.2). Total num frames: 693346304. Throughput: 0: 3234.0. Samples: 162501752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:03,969][134211] Avg episode reward: [(0, '8.667')] [2025-01-04 10:11:06,934][134294] Updated weights for policy 0, policy_version 169284 (0.0027) [2025-01-04 10:11:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13448.5, 300 sec: 12912.8). Total num frames: 693411840. Throughput: 0: 3278.9. Samples: 162520818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:08,968][134211] Avg episode reward: [(0, '9.170')] [2025-01-04 10:11:10,136][134294] Updated weights for policy 0, policy_version 169294 (0.0026) [2025-01-04 10:11:13,373][134294] Updated weights for policy 0, policy_version 169304 (0.0025) [2025-01-04 10:11:13,968][134211] Fps is (10 sec: 12697.9, 60 sec: 12902.3, 300 sec: 12898.9). Total num frames: 693473280. Throughput: 0: 3307.5. Samples: 162539960. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:13,969][134211] Avg episode reward: [(0, '8.527')] [2025-01-04 10:11:14,030][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000169306_693477376.pth... [2025-01-04 10:11:14,103][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168540_690339840.pth [2025-01-04 10:11:16,807][134294] Updated weights for policy 0, policy_version 169314 (0.0023) [2025-01-04 10:11:18,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12902.4, 300 sec: 12926.8). Total num frames: 693534720. 
Throughput: 0: 3345.3. Samples: 162548878. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:18,968][134211] Avg episode reward: [(0, '8.124')] [2025-01-04 10:11:20,442][134294] Updated weights for policy 0, policy_version 169324 (0.0030) [2025-01-04 10:11:23,476][134294] Updated weights for policy 0, policy_version 169334 (0.0026) [2025-01-04 10:11:23,969][134211] Fps is (10 sec: 12286.9, 60 sec: 12970.6, 300 sec: 12940.5). Total num frames: 693596160. Throughput: 0: 3303.7. Samples: 162566842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:23,970][134211] Avg episode reward: [(0, '8.988')] [2025-01-04 10:11:26,532][134294] Updated weights for policy 0, policy_version 169344 (0.0023) [2025-01-04 10:11:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13107.2, 300 sec: 12968.4). Total num frames: 693661696. Throughput: 0: 3077.4. Samples: 162586726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:28,968][134211] Avg episode reward: [(0, '8.175')] [2025-01-04 10:11:29,808][134294] Updated weights for policy 0, policy_version 169354 (0.0026) [2025-01-04 10:11:33,072][134294] Updated weights for policy 0, policy_version 169364 (0.0026) [2025-01-04 10:11:33,968][134211] Fps is (10 sec: 12698.8, 60 sec: 13312.0, 300 sec: 12982.2). Total num frames: 693723136. Throughput: 0: 3090.7. Samples: 162595902. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:33,969][134211] Avg episode reward: [(0, '9.718')] [2025-01-04 10:11:36,624][134294] Updated weights for policy 0, policy_version 169374 (0.0028) [2025-01-04 10:11:38,968][134211] Fps is (10 sec: 11878.2, 60 sec: 12765.9, 300 sec: 12996.1). Total num frames: 693780480. Throughput: 0: 3124.3. Samples: 162613798. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:38,968][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 10:11:40,106][134294] Updated weights for policy 0, policy_version 169384 (0.0028) [2025-01-04 10:11:43,331][134294] Updated weights for policy 0, policy_version 169394 (0.0026) [2025-01-04 10:11:43,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12219.8, 300 sec: 12912.8). Total num frames: 693841920. Throughput: 0: 3105.9. Samples: 162632000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:43,968][134211] Avg episode reward: [(0, '9.287')] [2025-01-04 10:11:46,665][134294] Updated weights for policy 0, policy_version 169404 (0.0027) [2025-01-04 10:11:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12424.6, 300 sec: 12801.7). Total num frames: 693903360. Throughput: 0: 3101.0. Samples: 162641296. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:48,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 10:11:50,075][134294] Updated weights for policy 0, policy_version 169414 (0.0029) [2025-01-04 10:11:53,389][134294] Updated weights for policy 0, policy_version 169424 (0.0025) [2025-01-04 10:11:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12424.6, 300 sec: 12815.6). Total num frames: 693964800. Throughput: 0: 3086.0. Samples: 162659686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:53,968][134211] Avg episode reward: [(0, '9.399')] [2025-01-04 10:11:56,629][134294] Updated weights for policy 0, policy_version 169434 (0.0027) [2025-01-04 10:11:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12424.5, 300 sec: 12857.3). Total num frames: 694030336. Throughput: 0: 3079.6. Samples: 162678540. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:11:58,968][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 10:11:59,770][134294] Updated weights for policy 0, policy_version 169444 (0.0027) [2025-01-04 10:12:02,901][134294] Updated weights for policy 0, policy_version 169454 (0.0024) [2025-01-04 10:12:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12492.9, 300 sec: 12898.9). Total num frames: 694095872. Throughput: 0: 3099.2. Samples: 162688342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:03,968][134211] Avg episode reward: [(0, '8.754')] [2025-01-04 10:12:05,965][134294] Updated weights for policy 0, policy_version 169464 (0.0025) [2025-01-04 10:12:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12424.5, 300 sec: 12940.7). Total num frames: 694157312. Throughput: 0: 3144.0. Samples: 162708320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:08,968][134211] Avg episode reward: [(0, '8.464')] [2025-01-04 10:12:09,384][134294] Updated weights for policy 0, policy_version 169474 (0.0024) [2025-01-04 10:12:12,867][134294] Updated weights for policy 0, policy_version 169484 (0.0025) [2025-01-04 10:12:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12492.9, 300 sec: 12871.2). Total num frames: 694222848. Throughput: 0: 3095.0. Samples: 162726000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:13,968][134211] Avg episode reward: [(0, '8.756')] [2025-01-04 10:12:15,083][134294] Updated weights for policy 0, policy_version 169494 (0.0012) [2025-01-04 10:12:17,177][134294] Updated weights for policy 0, policy_version 169504 (0.0014) [2025-01-04 10:12:18,968][134211] Fps is (10 sec: 16384.1, 60 sec: 13107.2, 300 sec: 12926.7). Total num frames: 694321152. Throughput: 0: 3207.6. Samples: 162740244. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:18,968][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 10:12:19,275][134294] Updated weights for policy 0, policy_version 169514 (0.0013) [2025-01-04 10:12:21,300][134294] Updated weights for policy 0, policy_version 169524 (0.0014) [2025-01-04 10:12:23,151][134294] Updated weights for policy 0, policy_version 169534 (0.0014) [2025-01-04 10:12:23,968][134211] Fps is (10 sec: 20480.0, 60 sec: 13858.4, 300 sec: 13107.3). Total num frames: 694427648. Throughput: 0: 3488.6. Samples: 162770786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:23,968][134211] Avg episode reward: [(0, '8.417')] [2025-01-04 10:12:25,198][134294] Updated weights for policy 0, policy_version 169544 (0.0017) [2025-01-04 10:12:28,415][134294] Updated weights for policy 0, policy_version 169554 (0.0031) [2025-01-04 10:12:28,968][134211] Fps is (10 sec: 17611.8, 60 sec: 13926.3, 300 sec: 13162.7). Total num frames: 694497280. Throughput: 0: 3639.4. Samples: 162795774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:28,970][134211] Avg episode reward: [(0, '10.201')] [2025-01-04 10:12:31,734][134294] Updated weights for policy 0, policy_version 169564 (0.0029) [2025-01-04 10:12:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 13176.6). Total num frames: 694558720. Throughput: 0: 3638.6. Samples: 162805034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:33,968][134211] Avg episode reward: [(0, '9.241')] [2025-01-04 10:12:35,154][134294] Updated weights for policy 0, policy_version 169574 (0.0031) [2025-01-04 10:12:38,293][134294] Updated weights for policy 0, policy_version 169584 (0.0025) [2025-01-04 10:12:38,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14063.0, 300 sec: 13218.3). Total num frames: 694624256. Throughput: 0: 3639.7. Samples: 162823472. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:38,968][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 10:12:41,602][134294] Updated weights for policy 0, policy_version 169594 (0.0027) [2025-01-04 10:12:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13994.6, 300 sec: 13190.5). Total num frames: 694681600. Throughput: 0: 3628.3. Samples: 162841812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:43,969][134211] Avg episode reward: [(0, '10.257')] [2025-01-04 10:12:45,134][134294] Updated weights for policy 0, policy_version 169604 (0.0027) [2025-01-04 10:12:48,773][134294] Updated weights for policy 0, policy_version 169614 (0.0026) [2025-01-04 10:12:48,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13926.4, 300 sec: 13051.8). Total num frames: 694738944. Throughput: 0: 3609.2. Samples: 162850756. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:48,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 10:12:52,138][134294] Updated weights for policy 0, policy_version 169624 (0.0027) [2025-01-04 10:12:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13926.4, 300 sec: 13065.6). Total num frames: 694800384. Throughput: 0: 3557.3. Samples: 162868400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:12:53,968][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 10:12:55,161][134294] Updated weights for policy 0, policy_version 169634 (0.0024) [2025-01-04 10:12:58,169][134294] Updated weights for policy 0, policy_version 169644 (0.0025) [2025-01-04 10:12:58,969][134211] Fps is (10 sec: 13105.9, 60 sec: 13994.4, 300 sec: 13107.2). Total num frames: 694870016. Throughput: 0: 3614.1. Samples: 162888640. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:12:58,969][134211] Avg episode reward: [(0, '9.149')] [2025-01-04 10:13:01,362][134294] Updated weights for policy 0, policy_version 169654 (0.0026) [2025-01-04 10:13:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 13135.1). Total num frames: 694931456. Throughput: 0: 3509.7. Samples: 162898180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:03,968][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 10:13:04,741][134294] Updated weights for policy 0, policy_version 169664 (0.0025) [2025-01-04 10:13:07,863][134294] Updated weights for policy 0, policy_version 169674 (0.0025) [2025-01-04 10:13:08,968][134211] Fps is (10 sec: 12698.9, 60 sec: 13994.7, 300 sec: 13176.6). Total num frames: 694996992. Throughput: 0: 3253.0. Samples: 162917170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:08,968][134211] Avg episode reward: [(0, '9.099')] [2025-01-04 10:13:10,989][134294] Updated weights for policy 0, policy_version 169684 (0.0026) [2025-01-04 10:13:13,969][134211] Fps is (10 sec: 12696.3, 60 sec: 13926.1, 300 sec: 13190.5). Total num frames: 695058432. Throughput: 0: 3125.6. Samples: 162936428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:13,969][134211] Avg episode reward: [(0, '9.446')] [2025-01-04 10:13:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000169693_695062528.pth... [2025-01-04 10:13:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000168924_691912704.pth [2025-01-04 10:13:14,323][134294] Updated weights for policy 0, policy_version 169694 (0.0027) [2025-01-04 10:13:17,661][134294] Updated weights for policy 0, policy_version 169704 (0.0022) [2025-01-04 10:13:18,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13176.6). Total num frames: 695119872. 
Throughput: 0: 3119.1. Samples: 162945394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:18,968][134211] Avg episode reward: [(0, '9.634')] [2025-01-04 10:13:21,163][134294] Updated weights for policy 0, policy_version 169714 (0.0028) [2025-01-04 10:13:23,968][134211] Fps is (10 sec: 12289.3, 60 sec: 12561.0, 300 sec: 13162.7). Total num frames: 695181312. Throughput: 0: 3112.5. Samples: 162963534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:23,968][134211] Avg episode reward: [(0, '9.024')] [2025-01-04 10:13:24,417][134294] Updated weights for policy 0, policy_version 169724 (0.0024) [2025-01-04 10:13:27,412][134294] Updated weights for policy 0, policy_version 169734 (0.0025) [2025-01-04 10:13:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12492.9, 300 sec: 13162.7). Total num frames: 695246848. Throughput: 0: 3136.0. Samples: 162982932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:28,968][134211] Avg episode reward: [(0, '8.629')] [2025-01-04 10:13:30,600][134294] Updated weights for policy 0, policy_version 169744 (0.0026) [2025-01-04 10:13:33,628][134294] Updated weights for policy 0, policy_version 169754 (0.0022) [2025-01-04 10:13:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12561.0, 300 sec: 13162.7). Total num frames: 695312384. Throughput: 0: 3161.4. Samples: 162993020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:33,968][134211] Avg episode reward: [(0, '8.227')] [2025-01-04 10:13:36,704][134294] Updated weights for policy 0, policy_version 169764 (0.0025) [2025-01-04 10:13:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12629.3, 300 sec: 13190.5). Total num frames: 695382016. Throughput: 0: 3217.8. Samples: 163013200. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:38,968][134211] Avg episode reward: [(0, '9.047')] [2025-01-04 10:13:40,014][134294] Updated weights for policy 0, policy_version 169774 (0.0025) [2025-01-04 10:13:43,208][134294] Updated weights for policy 0, policy_version 169784 (0.0026) [2025-01-04 10:13:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12697.6, 300 sec: 13065.5). Total num frames: 695443456. Throughput: 0: 3182.2. Samples: 163031834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:43,968][134211] Avg episode reward: [(0, '9.707')] [2025-01-04 10:13:46,602][134294] Updated weights for policy 0, policy_version 169794 (0.0026) [2025-01-04 10:13:48,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12697.6, 300 sec: 13037.8). Total num frames: 695500800. Throughput: 0: 3170.3. Samples: 163040844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:48,968][134211] Avg episode reward: [(0, '9.434')] [2025-01-04 10:13:50,247][134294] Updated weights for policy 0, policy_version 169804 (0.0030) [2025-01-04 10:13:53,615][134294] Updated weights for policy 0, policy_version 169814 (0.0027) [2025-01-04 10:13:53,968][134211] Fps is (10 sec: 11878.1, 60 sec: 12697.5, 300 sec: 13051.6). Total num frames: 695562240. Throughput: 0: 3133.3. Samples: 163058170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:53,969][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 10:13:56,496][134294] Updated weights for policy 0, policy_version 169824 (0.0021) [2025-01-04 10:13:58,531][134294] Updated weights for policy 0, policy_version 169834 (0.0014) [2025-01-04 10:13:58,968][134211] Fps is (10 sec: 14745.9, 60 sec: 12970.9, 300 sec: 13121.1). Total num frames: 695648256. Throughput: 0: 3223.0. Samples: 163081458. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:13:58,968][134211] Avg episode reward: [(0, '8.946')] [2025-01-04 10:14:00,548][134294] Updated weights for policy 0, policy_version 169844 (0.0013) [2025-01-04 10:14:02,411][134294] Updated weights for policy 0, policy_version 169854 (0.0013) [2025-01-04 10:14:03,968][134211] Fps is (10 sec: 18432.5, 60 sec: 13585.1, 300 sec: 13232.2). Total num frames: 695746560. Throughput: 0: 3372.3. Samples: 163097146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:03,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 10:14:05,072][134294] Updated weights for policy 0, policy_version 169864 (0.0022) [2025-01-04 10:14:08,307][134294] Updated weights for policy 0, policy_version 169874 (0.0029) [2025-01-04 10:14:08,970][134211] Fps is (10 sec: 15971.2, 60 sec: 13516.4, 300 sec: 13204.4). Total num frames: 695808000. Throughput: 0: 3495.7. Samples: 163120848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:08,970][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 10:14:12,134][134294] Updated weights for policy 0, policy_version 169884 (0.0028) [2025-01-04 10:14:13,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13380.5, 300 sec: 13204.4). Total num frames: 695861248. Throughput: 0: 3429.0. Samples: 163137236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:13,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 10:14:15,802][134294] Updated weights for policy 0, policy_version 169894 (0.0032) [2025-01-04 10:14:18,968][134211] Fps is (10 sec: 11061.2, 60 sec: 13312.0, 300 sec: 13162.7). Total num frames: 695918592. Throughput: 0: 3396.0. Samples: 163145840. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:18,969][134211] Avg episode reward: [(0, '9.985')] [2025-01-04 10:14:19,396][134294] Updated weights for policy 0, policy_version 169904 (0.0024) [2025-01-04 10:14:22,678][134294] Updated weights for policy 0, policy_version 169914 (0.0027) [2025-01-04 10:14:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13380.3, 300 sec: 13162.8). Total num frames: 695984128. Throughput: 0: 3340.2. Samples: 163163510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:23,969][134211] Avg episode reward: [(0, '8.566')] [2025-01-04 10:14:25,692][134294] Updated weights for policy 0, policy_version 169924 (0.0025) [2025-01-04 10:14:28,697][134294] Updated weights for policy 0, policy_version 169934 (0.0026) [2025-01-04 10:14:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.2, 300 sec: 13162.7). Total num frames: 696049664. Throughput: 0: 3380.1. Samples: 163183938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:28,969][134211] Avg episode reward: [(0, '10.799')] [2025-01-04 10:14:31,768][134294] Updated weights for policy 0, policy_version 169944 (0.0028) [2025-01-04 10:14:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 13162.8). Total num frames: 696115200. Throughput: 0: 3399.4. Samples: 163193816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:33,968][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 10:14:34,918][134294] Updated weights for policy 0, policy_version 169954 (0.0028) [2025-01-04 10:14:38,051][134294] Updated weights for policy 0, policy_version 169964 (0.0028) [2025-01-04 10:14:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13312.0, 300 sec: 13176.6). Total num frames: 696180736. Throughput: 0: 3447.6. Samples: 163213310. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:38,969][134211] Avg episode reward: [(0, '8.622')] [2025-01-04 10:14:41,255][134294] Updated weights for policy 0, policy_version 169974 (0.0027) [2025-01-04 10:14:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.3, 300 sec: 13190.5). Total num frames: 696246272. Throughput: 0: 3359.3. Samples: 163232626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:43,968][134211] Avg episode reward: [(0, '9.446')] [2025-01-04 10:14:44,667][134294] Updated weights for policy 0, policy_version 169984 (0.0024) [2025-01-04 10:14:47,930][134294] Updated weights for policy 0, policy_version 169994 (0.0027) [2025-01-04 10:14:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13448.6, 300 sec: 13176.6). Total num frames: 696307712. Throughput: 0: 3210.7. Samples: 163241628. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:48,968][134211] Avg episode reward: [(0, '8.662')] [2025-01-04 10:14:51,309][134294] Updated weights for policy 0, policy_version 170004 (0.0025) [2025-01-04 10:14:53,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13380.3, 300 sec: 13148.8). Total num frames: 696365056. Throughput: 0: 3087.3. Samples: 163259772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:53,968][134211] Avg episode reward: [(0, '8.324')] [2025-01-04 10:14:55,112][134294] Updated weights for policy 0, policy_version 170014 (0.0030) [2025-01-04 10:14:58,257][134294] Updated weights for policy 0, policy_version 170024 (0.0026) [2025-01-04 10:14:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13038.9, 300 sec: 13176.6). Total num frames: 696430592. Throughput: 0: 3114.0. Samples: 163277368. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:14:58,968][134211] Avg episode reward: [(0, '8.790')] [2025-01-04 10:15:00,281][134294] Updated weights for policy 0, policy_version 170034 (0.0014) [2025-01-04 10:15:02,769][134294] Updated weights for policy 0, policy_version 170044 (0.0017) [2025-01-04 10:15:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 12697.6, 300 sec: 13232.1). Total num frames: 696508416. Throughput: 0: 3263.1. Samples: 163292678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:15:03,968][134211] Avg episode reward: [(0, '9.509')] [2025-01-04 10:15:06,494][134294] Updated weights for policy 0, policy_version 170054 (0.0031) [2025-01-04 10:15:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 12561.4, 300 sec: 13093.3). Total num frames: 696561664. Throughput: 0: 3254.0. Samples: 163309942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:08,969][134211] Avg episode reward: [(0, '9.333')] [2025-01-04 10:15:10,888][134294] Updated weights for policy 0, policy_version 170064 (0.0029) [2025-01-04 10:15:12,954][134294] Updated weights for policy 0, policy_version 170074 (0.0016) [2025-01-04 10:15:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 12970.7, 300 sec: 13148.9). Total num frames: 696639488. Throughput: 0: 3245.9. Samples: 163330002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:13,968][134211] Avg episode reward: [(0, '9.028')] [2025-01-04 10:15:14,019][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170079_696643584.pth... 
[2025-01-04 10:15:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000169306_693477376.pth [2025-01-04 10:15:14,994][134294] Updated weights for policy 0, policy_version 170084 (0.0013) [2025-01-04 10:15:16,963][134294] Updated weights for policy 0, policy_version 170094 (0.0013) [2025-01-04 10:15:18,967][134211] Fps is (10 sec: 18023.0, 60 sec: 13721.7, 300 sec: 13301.6). Total num frames: 696741888. Throughput: 0: 3368.0. Samples: 163345374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:18,968][134211] Avg episode reward: [(0, '9.215')] [2025-01-04 10:15:19,011][134294] Updated weights for policy 0, policy_version 170104 (0.0017) [2025-01-04 10:15:21,497][134294] Updated weights for policy 0, policy_version 170114 (0.0019) [2025-01-04 10:15:23,968][134211] Fps is (10 sec: 17612.3, 60 sec: 13858.1, 300 sec: 13357.1). Total num frames: 696815616. Throughput: 0: 3528.3. Samples: 163372082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:23,969][134211] Avg episode reward: [(0, '9.621')] [2025-01-04 10:15:24,786][134294] Updated weights for policy 0, policy_version 170124 (0.0030) [2025-01-04 10:15:28,092][134294] Updated weights for policy 0, policy_version 170134 (0.0028) [2025-01-04 10:15:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13789.9, 300 sec: 13398.8). Total num frames: 696877056. Throughput: 0: 3512.6. Samples: 163390694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:28,968][134211] Avg episode reward: [(0, '8.953')] [2025-01-04 10:15:31,230][134294] Updated weights for policy 0, policy_version 170144 (0.0025) [2025-01-04 10:15:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13721.6, 300 sec: 13301.6). Total num frames: 696938496. Throughput: 0: 3528.4. Samples: 163400406. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:33,968][134211] Avg episode reward: [(0, '9.309')] [2025-01-04 10:15:34,589][134294] Updated weights for policy 0, policy_version 170154 (0.0028) [2025-01-04 10:15:37,826][134294] Updated weights for policy 0, policy_version 170164 (0.0026) [2025-01-04 10:15:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13721.6, 300 sec: 13204.4). Total num frames: 697004032. Throughput: 0: 3539.2. Samples: 163419034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:38,968][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 10:15:41,211][134294] Updated weights for policy 0, policy_version 170174 (0.0026) [2025-01-04 10:15:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13585.1, 300 sec: 13232.2). Total num frames: 697061376. Throughput: 0: 3543.5. Samples: 163436826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:43,969][134211] Avg episode reward: [(0, '8.414')] [2025-01-04 10:15:44,909][134294] Updated weights for policy 0, policy_version 170184 (0.0029) [2025-01-04 10:15:48,142][134294] Updated weights for policy 0, policy_version 170194 (0.0029) [2025-01-04 10:15:48,969][134211] Fps is (10 sec: 11877.5, 60 sec: 13584.9, 300 sec: 13232.1). Total num frames: 697122816. Throughput: 0: 3398.4. Samples: 163445608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:48,969][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 10:15:51,560][134294] Updated weights for policy 0, policy_version 170204 (0.0028) [2025-01-04 10:15:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13653.4, 300 sec: 13218.3). Total num frames: 697184256. Throughput: 0: 3421.2. Samples: 163463896. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:53,968][134211] Avg episode reward: [(0, '8.555')] [2025-01-04 10:15:54,768][134294] Updated weights for policy 0, policy_version 170214 (0.0027) [2025-01-04 10:15:57,912][134294] Updated weights for policy 0, policy_version 170224 (0.0026) [2025-01-04 10:15:58,968][134211] Fps is (10 sec: 12698.4, 60 sec: 13653.3, 300 sec: 13232.2). Total num frames: 697249792. Throughput: 0: 3409.4. Samples: 163483428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:15:58,969][134211] Avg episode reward: [(0, '8.580')] [2025-01-04 10:16:00,953][134294] Updated weights for policy 0, policy_version 170234 (0.0024) [2025-01-04 10:16:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13232.2). Total num frames: 697315328. Throughput: 0: 3295.0. Samples: 163493648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:16:03,968][134211] Avg episode reward: [(0, '8.433')] [2025-01-04 10:16:03,987][134294] Updated weights for policy 0, policy_version 170244 (0.0026) [2025-01-04 10:16:07,025][134294] Updated weights for policy 0, policy_version 170254 (0.0028) [2025-01-04 10:16:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13653.4, 300 sec: 13246.1). Total num frames: 697380864. Throughput: 0: 3147.4. Samples: 163513714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 3.0) [2025-01-04 10:16:08,968][134211] Avg episode reward: [(0, '8.569')] [2025-01-04 10:16:10,403][134294] Updated weights for policy 0, policy_version 170264 (0.0028) [2025-01-04 10:16:13,953][134294] Updated weights for policy 0, policy_version 170274 (0.0026) [2025-01-04 10:16:13,969][134211] Fps is (10 sec: 12696.6, 60 sec: 13380.1, 300 sec: 13246.0). Total num frames: 697442304. Throughput: 0: 3132.0. Samples: 163531636. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:13,970][134211] Avg episode reward: [(0, '8.801')] [2025-01-04 10:16:17,562][134294] Updated weights for policy 0, policy_version 170284 (0.0027) [2025-01-04 10:16:18,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12561.0, 300 sec: 13218.3). Total num frames: 697495552. Throughput: 0: 3103.6. Samples: 163540068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:18,968][134211] Avg episode reward: [(0, '9.280')] [2025-01-04 10:16:21,211][134294] Updated weights for policy 0, policy_version 170294 (0.0025) [2025-01-04 10:16:23,968][134211] Fps is (10 sec: 11059.6, 60 sec: 12287.9, 300 sec: 13190.5). Total num frames: 697552896. Throughput: 0: 3063.5. Samples: 163556892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:23,969][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 10:16:24,972][134294] Updated weights for policy 0, policy_version 170304 (0.0031) [2025-01-04 10:16:28,221][134294] Updated weights for policy 0, policy_version 170314 (0.0027) [2025-01-04 10:16:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12288.0, 300 sec: 13190.5). Total num frames: 697614336. Throughput: 0: 3060.9. Samples: 163574564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:28,968][134211] Avg episode reward: [(0, '9.339')] [2025-01-04 10:16:31,333][134294] Updated weights for policy 0, policy_version 170324 (0.0024) [2025-01-04 10:16:33,968][134211] Fps is (10 sec: 12698.2, 60 sec: 12356.3, 300 sec: 13218.3). Total num frames: 697679872. Throughput: 0: 3086.1. Samples: 163584478. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:33,968][134211] Avg episode reward: [(0, '9.905')] [2025-01-04 10:16:34,551][134294] Updated weights for policy 0, policy_version 170334 (0.0025) [2025-01-04 10:16:37,620][134294] Updated weights for policy 0, policy_version 170344 (0.0026) [2025-01-04 10:16:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12288.0, 300 sec: 13218.3). Total num frames: 697741312. Throughput: 0: 3110.9. Samples: 163603888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:38,968][134211] Avg episode reward: [(0, '9.690')] [2025-01-04 10:16:40,942][134294] Updated weights for policy 0, policy_version 170354 (0.0030) [2025-01-04 10:16:43,960][134294] Updated weights for policy 0, policy_version 170364 (0.0024) [2025-01-04 10:16:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12492.8, 300 sec: 13246.0). Total num frames: 697810944. Throughput: 0: 3111.1. Samples: 163623426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:43,968][134211] Avg episode reward: [(0, '10.299')] [2025-01-04 10:16:45,988][134294] Updated weights for policy 0, policy_version 170374 (0.0015) [2025-01-04 10:16:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 12834.3, 300 sec: 13315.5). Total num frames: 697892864. Throughput: 0: 3187.5. Samples: 163637084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:48,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 10:16:48,972][134294] Updated weights for policy 0, policy_version 170384 (0.0025) [2025-01-04 10:16:52,420][134294] Updated weights for policy 0, policy_version 170394 (0.0025) [2025-01-04 10:16:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 12765.8, 300 sec: 13287.7). Total num frames: 697950208. Throughput: 0: 3160.8. Samples: 163655952. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:53,968][134211] Avg episode reward: [(0, '9.776')] [2025-01-04 10:16:55,478][134294] Updated weights for policy 0, policy_version 170404 (0.0026) [2025-01-04 10:16:58,580][134294] Updated weights for policy 0, policy_version 170414 (0.0025) [2025-01-04 10:16:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 12834.2, 300 sec: 13301.6). Total num frames: 698019840. Throughput: 0: 3204.9. Samples: 163675854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:16:58,968][134211] Avg episode reward: [(0, '9.455')] [2025-01-04 10:17:01,740][134294] Updated weights for policy 0, policy_version 170424 (0.0024) [2025-01-04 10:17:03,968][134211] Fps is (10 sec: 13106.9, 60 sec: 12765.8, 300 sec: 13301.6). Total num frames: 698081280. Throughput: 0: 3232.6. Samples: 163685538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:03,969][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 10:17:04,991][134294] Updated weights for policy 0, policy_version 170434 (0.0025) [2025-01-04 10:17:08,397][134294] Updated weights for policy 0, policy_version 170444 (0.0026) [2025-01-04 10:17:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12697.6, 300 sec: 13287.7). Total num frames: 698142720. Throughput: 0: 3275.1. Samples: 163704270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:08,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 10:17:11,523][134294] Updated weights for policy 0, policy_version 170454 (0.0028) [2025-01-04 10:17:13,968][134211] Fps is (10 sec: 12288.3, 60 sec: 12697.8, 300 sec: 13162.7). Total num frames: 698204160. Throughput: 0: 3292.9. Samples: 163722744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:13,968][134211] Avg episode reward: [(0, '8.947')] [2025-01-04 10:17:13,998][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170461_698208256.pth... 
[2025-01-04 10:17:14,071][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000169693_695062528.pth [2025-01-04 10:17:15,105][134294] Updated weights for policy 0, policy_version 170464 (0.0027) [2025-01-04 10:17:17,493][134294] Updated weights for policy 0, policy_version 170474 (0.0015) [2025-01-04 10:17:18,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13243.8, 300 sec: 13093.3). Total num frames: 698290176. Throughput: 0: 3273.7. Samples: 163731794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:18,968][134211] Avg episode reward: [(0, '9.301')] [2025-01-04 10:17:19,565][134294] Updated weights for policy 0, policy_version 170484 (0.0013) [2025-01-04 10:17:22,586][134294] Updated weights for policy 0, policy_version 170494 (0.0024) [2025-01-04 10:17:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13448.7, 300 sec: 13093.3). Total num frames: 698359808. Throughput: 0: 3421.0. Samples: 163757834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:23,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 10:17:25,897][134294] Updated weights for policy 0, policy_version 170504 (0.0029) [2025-01-04 10:17:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.6, 300 sec: 13093.3). Total num frames: 698421248. Throughput: 0: 3410.1. Samples: 163776882. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:28,968][134211] Avg episode reward: [(0, '9.356')] [2025-01-04 10:17:29,003][134294] Updated weights for policy 0, policy_version 170514 (0.0029) [2025-01-04 10:17:32,141][134294] Updated weights for policy 0, policy_version 170524 (0.0029) [2025-01-04 10:17:33,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13448.5, 300 sec: 13093.3). Total num frames: 698486784. Throughput: 0: 3321.4. Samples: 163786548. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:33,969][134211] Avg episode reward: [(0, '8.311')] [2025-01-04 10:17:35,263][134294] Updated weights for policy 0, policy_version 170534 (0.0027) [2025-01-04 10:17:38,248][134294] Updated weights for policy 0, policy_version 170544 (0.0025) [2025-01-04 10:17:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13516.8, 300 sec: 13121.1). Total num frames: 698552320. Throughput: 0: 3348.5. Samples: 163806632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:38,968][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 10:17:41,361][134294] Updated weights for policy 0, policy_version 170554 (0.0026) [2025-01-04 10:17:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13448.5, 300 sec: 13148.9). Total num frames: 698617856. Throughput: 0: 3333.5. Samples: 163825860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:43,969][134211] Avg episode reward: [(0, '9.271')] [2025-01-04 10:17:44,924][134294] Updated weights for policy 0, policy_version 170564 (0.0025) [2025-01-04 10:17:48,370][134294] Updated weights for policy 0, policy_version 170574 (0.0025) [2025-01-04 10:17:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13038.9, 300 sec: 13135.0). Total num frames: 698675200. Throughput: 0: 3308.1. Samples: 163834402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:48,968][134211] Avg episode reward: [(0, '8.605')] [2025-01-04 10:17:51,899][134294] Updated weights for policy 0, policy_version 170584 (0.0025) [2025-01-04 10:17:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 698736640. Throughput: 0: 3285.4. Samples: 163852114. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:53,968][134211] Avg episode reward: [(0, '9.216')] [2025-01-04 10:17:55,372][134294] Updated weights for policy 0, policy_version 170594 (0.0027) [2025-01-04 10:17:58,416][134294] Updated weights for policy 0, policy_version 170604 (0.0027) [2025-01-04 10:17:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12970.7, 300 sec: 13107.2). Total num frames: 698798080. Throughput: 0: 3293.2. Samples: 163870936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:17:58,968][134211] Avg episode reward: [(0, '9.419')] [2025-01-04 10:18:01,464][134294] Updated weights for policy 0, policy_version 170614 (0.0027) [2025-01-04 10:18:03,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13038.9, 300 sec: 13107.2). Total num frames: 698863616. Throughput: 0: 3314.1. Samples: 163880932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:18:03,969][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 10:18:04,763][134294] Updated weights for policy 0, policy_version 170624 (0.0029) [2025-01-04 10:18:07,854][134294] Updated weights for policy 0, policy_version 170634 (0.0024) [2025-01-04 10:18:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 13121.1). Total num frames: 698929152. Throughput: 0: 3165.9. Samples: 163900298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:18:08,968][134211] Avg episode reward: [(0, '9.458')] [2025-01-04 10:18:10,756][134294] Updated weights for policy 0, policy_version 170644 (0.0024) [2025-01-04 10:18:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13175.5, 300 sec: 13135.0). Total num frames: 698994688. Throughput: 0: 3174.0. Samples: 163919714. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:18:13,971][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 10:18:14,246][134294] Updated weights for policy 0, policy_version 170654 (0.0024) [2025-01-04 10:18:16,645][134294] Updated weights for policy 0, policy_version 170664 (0.0018) [2025-01-04 10:18:18,968][134211] Fps is (10 sec: 13516.0, 60 sec: 12902.2, 300 sec: 13162.7). Total num frames: 699064320. Throughput: 0: 3218.3. Samples: 163931372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:18:18,969][134211] Avg episode reward: [(0, '9.813')] [2025-01-04 10:18:20,244][134294] Updated weights for policy 0, policy_version 170674 (0.0027) [2025-01-04 10:18:23,095][134294] Updated weights for policy 0, policy_version 170684 (0.0019) [2025-01-04 10:18:23,968][134211] Fps is (10 sec: 14336.4, 60 sec: 12970.7, 300 sec: 13190.5). Total num frames: 699138048. Throughput: 0: 3162.2. Samples: 163948932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:23,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 10:18:25,050][134294] Updated weights for policy 0, policy_version 170694 (0.0015) [2025-01-04 10:18:26,970][134294] Updated weights for policy 0, policy_version 170704 (0.0013) [2025-01-04 10:18:28,895][134294] Updated weights for policy 0, policy_version 170714 (0.0015) [2025-01-04 10:18:28,967][134211] Fps is (10 sec: 18023.8, 60 sec: 13721.6, 300 sec: 13329.4). Total num frames: 699244544. Throughput: 0: 3436.0. Samples: 163980480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:28,968][134211] Avg episode reward: [(0, '9.552')] [2025-01-04 10:18:31,671][134294] Updated weights for policy 0, policy_version 170724 (0.0023) [2025-01-04 10:18:33,968][134211] Fps is (10 sec: 17612.5, 60 sec: 13789.9, 300 sec: 13329.4). Total num frames: 699314176. Throughput: 0: 3530.4. Samples: 163993272. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:33,968][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 10:18:34,965][134294] Updated weights for policy 0, policy_version 170734 (0.0030) [2025-01-04 10:18:38,408][134294] Updated weights for policy 0, policy_version 170744 (0.0029) [2025-01-04 10:18:38,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13653.3, 300 sec: 13315.5). Total num frames: 699371520. Throughput: 0: 3536.4. Samples: 164011254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:38,969][134211] Avg episode reward: [(0, '8.534')] [2025-01-04 10:18:41,808][134294] Updated weights for policy 0, policy_version 170754 (0.0027) [2025-01-04 10:18:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13585.1, 300 sec: 13329.4). Total num frames: 699432960. Throughput: 0: 3525.4. Samples: 164029578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:43,969][134211] Avg episode reward: [(0, '8.581')] [2025-01-04 10:18:45,160][134294] Updated weights for policy 0, policy_version 170764 (0.0025) [2025-01-04 10:18:48,668][134294] Updated weights for policy 0, policy_version 170774 (0.0026) [2025-01-04 10:18:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13653.3, 300 sec: 13329.4). Total num frames: 699494400. Throughput: 0: 3505.5. Samples: 164038680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:48,968][134211] Avg episode reward: [(0, '8.363')] [2025-01-04 10:18:52,432][134294] Updated weights for policy 0, policy_version 170784 (0.0026) [2025-01-04 10:18:53,968][134211] Fps is (10 sec: 11059.4, 60 sec: 13448.5, 300 sec: 13204.4). Total num frames: 699543552. Throughput: 0: 3443.2. Samples: 164055244. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:53,968][134211] Avg episode reward: [(0, '8.569')] [2025-01-04 10:18:56,379][134294] Updated weights for policy 0, policy_version 170794 (0.0030) [2025-01-04 10:18:58,968][134211] Fps is (10 sec: 10240.1, 60 sec: 13312.0, 300 sec: 13051.7). Total num frames: 699596800. Throughput: 0: 3362.4. Samples: 164071022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:18:58,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 10:19:00,439][134294] Updated weights for policy 0, policy_version 170804 (0.0033) [2025-01-04 10:19:03,565][134294] Updated weights for policy 0, policy_version 170814 (0.0019) [2025-01-04 10:19:03,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13243.8, 300 sec: 13051.8). Total num frames: 699658240. Throughput: 0: 3251.9. Samples: 164077704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:19:03,968][134211] Avg episode reward: [(0, '9.753')] [2025-01-04 10:19:05,555][134294] Updated weights for policy 0, policy_version 170824 (0.0013) [2025-01-04 10:19:07,613][134294] Updated weights for policy 0, policy_version 170834 (0.0013) [2025-01-04 10:19:08,967][134211] Fps is (10 sec: 16384.4, 60 sec: 13858.2, 300 sec: 13218.3). Total num frames: 699760640. Throughput: 0: 3473.7. Samples: 164105250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:19:08,968][134211] Avg episode reward: [(0, '8.228')] [2025-01-04 10:19:09,611][134294] Updated weights for policy 0, policy_version 170844 (0.0014) [2025-01-04 10:19:12,316][134294] Updated weights for policy 0, policy_version 170854 (0.0022) [2025-01-04 10:19:13,968][134211] Fps is (10 sec: 17611.9, 60 sec: 13994.6, 300 sec: 13273.8). Total num frames: 699834368. Throughput: 0: 3327.6. Samples: 164130224. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:19:13,969][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 10:19:14,028][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170859_699838464.pth... [2025-01-04 10:19:14,105][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170079_696643584.pth [2025-01-04 10:19:15,759][134294] Updated weights for policy 0, policy_version 170864 (0.0031) [2025-01-04 10:19:18,968][134211] Fps is (10 sec: 13515.5, 60 sec: 13858.1, 300 sec: 13259.9). Total num frames: 699895808. Throughput: 0: 3243.1. Samples: 164139214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:19:18,969][134211] Avg episode reward: [(0, '10.068')] [2025-01-04 10:19:19,177][134294] Updated weights for policy 0, policy_version 170874 (0.0027) [2025-01-04 10:19:22,661][134294] Updated weights for policy 0, policy_version 170884 (0.0024) [2025-01-04 10:19:23,968][134211] Fps is (10 sec: 11878.9, 60 sec: 13585.0, 300 sec: 13232.2). Total num frames: 699953152. Throughput: 0: 3236.8. Samples: 164156910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:19:23,968][134211] Avg episode reward: [(0, '8.418')] [2025-01-04 10:19:25,793][134294] Updated weights for policy 0, policy_version 170894 (0.0026) [2025-01-04 10:19:28,834][134294] Updated weights for policy 0, policy_version 170904 (0.0024) [2025-01-04 10:19:28,970][134211] Fps is (10 sec: 12695.8, 60 sec: 12970.2, 300 sec: 13246.0). Total num frames: 700022784. Throughput: 0: 3269.9. Samples: 164176728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:28,970][134211] Avg episode reward: [(0, '9.021')] [2025-01-04 10:19:31,927][134294] Updated weights for policy 0, policy_version 170914 (0.0028) [2025-01-04 10:19:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12902.4, 300 sec: 13246.1). Total num frames: 700088320. 
Throughput: 0: 3284.9. Samples: 164186498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:33,968][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 10:19:35,172][134294] Updated weights for policy 0, policy_version 170924 (0.0026) [2025-01-04 10:19:38,183][134294] Updated weights for policy 0, policy_version 170934 (0.0027) [2025-01-04 10:19:38,968][134211] Fps is (10 sec: 13109.9, 60 sec: 13038.9, 300 sec: 13246.0). Total num frames: 700153856. Throughput: 0: 3354.8. Samples: 164206212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:38,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 10:19:41,265][134294] Updated weights for policy 0, policy_version 170944 (0.0024) [2025-01-04 10:19:43,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13038.9, 300 sec: 13246.0). Total num frames: 700215296. Throughput: 0: 3433.4. Samples: 164225526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:43,969][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 10:19:44,784][134294] Updated weights for policy 0, policy_version 170954 (0.0024) [2025-01-04 10:19:48,248][134294] Updated weights for policy 0, policy_version 170964 (0.0026) [2025-01-04 10:19:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12970.7, 300 sec: 13246.1). Total num frames: 700272640. Throughput: 0: 3482.2. Samples: 164234404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:48,968][134211] Avg episode reward: [(0, '9.095')] [2025-01-04 10:19:51,740][134294] Updated weights for policy 0, policy_version 170974 (0.0027) [2025-01-04 10:19:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13175.4, 300 sec: 13232.2). Total num frames: 700334080. Throughput: 0: 3256.5. Samples: 164251794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:53,970][134211] Avg episode reward: [(0, '9.626')] [2025-01-04 10:19:55,038][134294] Updated weights for policy 0, policy_version 170984 (0.0028) [2025-01-04 10:19:57,962][134294] Updated weights for policy 0, policy_version 170994 (0.0026) [2025-01-04 10:19:58,970][134211] Fps is (10 sec: 13104.4, 60 sec: 13448.1, 300 sec: 13204.3). Total num frames: 700403712. Throughput: 0: 3144.1. Samples: 164271712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:19:58,970][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 10:20:00,976][134294] Updated weights for policy 0, policy_version 171004 (0.0024) [2025-01-04 10:20:03,969][134211] Fps is (10 sec: 13515.4, 60 sec: 13516.5, 300 sec: 13246.0). Total num frames: 700469248. Throughput: 0: 3172.4. Samples: 164281974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:03,969][134211] Avg episode reward: [(0, '9.985')] [2025-01-04 10:20:04,027][134294] Updated weights for policy 0, policy_version 171014 (0.0024) [2025-01-04 10:20:07,081][134294] Updated weights for policy 0, policy_version 171024 (0.0026) [2025-01-04 10:20:08,968][134211] Fps is (10 sec: 13109.9, 60 sec: 12902.3, 300 sec: 13204.4). Total num frames: 700534784. Throughput: 0: 3223.3. Samples: 164301958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:08,968][134211] Avg episode reward: [(0, '9.738')] [2025-01-04 10:20:10,162][134294] Updated weights for policy 0, policy_version 171034 (0.0025) [2025-01-04 10:20:13,621][134294] Updated weights for policy 0, policy_version 171044 (0.0032) [2025-01-04 10:20:13,968][134211] Fps is (10 sec: 12698.8, 60 sec: 12697.6, 300 sec: 13065.5). Total num frames: 700596224. Throughput: 0: 3205.1. Samples: 164320950. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:13,969][134211] Avg episode reward: [(0, '9.240')] [2025-01-04 10:20:17,084][134294] Updated weights for policy 0, policy_version 171054 (0.0030) [2025-01-04 10:20:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12629.5, 300 sec: 13010.0). Total num frames: 700653568. Throughput: 0: 3180.4. Samples: 164329616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:18,968][134211] Avg episode reward: [(0, '9.454')] [2025-01-04 10:20:20,763][134294] Updated weights for policy 0, policy_version 171064 (0.0027) [2025-01-04 10:20:23,586][134294] Updated weights for policy 0, policy_version 171074 (0.0019) [2025-01-04 10:20:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12834.1, 300 sec: 13037.8). Total num frames: 700723200. Throughput: 0: 3128.8. Samples: 164347010. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:23,968][134211] Avg episode reward: [(0, '9.114')] [2025-01-04 10:20:25,518][134294] Updated weights for policy 0, policy_version 171084 (0.0011) [2025-01-04 10:20:27,352][134294] Updated weights for policy 0, policy_version 171094 (0.0013) [2025-01-04 10:20:28,968][134211] Fps is (10 sec: 18022.1, 60 sec: 13517.2, 300 sec: 13204.4). Total num frames: 700833792. Throughput: 0: 3380.9. Samples: 164377668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:28,968][134211] Avg episode reward: [(0, '9.518')] [2025-01-04 10:20:29,296][134294] Updated weights for policy 0, policy_version 171104 (0.0013) [2025-01-04 10:20:31,579][134294] Updated weights for policy 0, policy_version 171114 (0.0019) [2025-01-04 10:20:33,968][134211] Fps is (10 sec: 18841.7, 60 sec: 13721.6, 300 sec: 13246.0). Total num frames: 700911616. Throughput: 0: 3518.5. Samples: 164392738. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:33,969][134211] Avg episode reward: [(0, '9.833')] [2025-01-04 10:20:34,770][134294] Updated weights for policy 0, policy_version 171124 (0.0031) [2025-01-04 10:20:38,181][134294] Updated weights for policy 0, policy_version 171134 (0.0027) [2025-01-04 10:20:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13653.3, 300 sec: 13259.9). Total num frames: 700973056. Throughput: 0: 3549.6. Samples: 164411524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:38,968][134211] Avg episode reward: [(0, '9.237')] [2025-01-04 10:20:41,253][134294] Updated weights for policy 0, policy_version 171144 (0.0025) [2025-01-04 10:20:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13653.3, 300 sec: 13260.0). Total num frames: 701034496. Throughput: 0: 3523.2. Samples: 164430248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:43,969][134211] Avg episode reward: [(0, '9.535')] [2025-01-04 10:20:44,909][134294] Updated weights for policy 0, policy_version 171154 (0.0028) [2025-01-04 10:20:47,965][134294] Updated weights for policy 0, policy_version 171164 (0.0026) [2025-01-04 10:20:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13789.9, 300 sec: 13273.8). Total num frames: 701100032. Throughput: 0: 3494.5. Samples: 164439222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:48,968][134211] Avg episode reward: [(0, '9.807')] [2025-01-04 10:20:51,256][134294] Updated weights for policy 0, policy_version 171174 (0.0029) [2025-01-04 10:20:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 13246.1). Total num frames: 701157376. Throughput: 0: 3466.0. Samples: 164457930. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:53,969][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 10:20:54,690][134294] Updated weights for policy 0, policy_version 171184 (0.0024) [2025-01-04 10:20:57,876][134294] Updated weights for policy 0, policy_version 171194 (0.0028) [2025-01-04 10:20:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13653.8, 300 sec: 13246.1). Total num frames: 701222912. Throughput: 0: 3465.3. Samples: 164476886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:20:58,969][134211] Avg episode reward: [(0, '9.768')] [2025-01-04 10:21:00,849][134294] Updated weights for policy 0, policy_version 171204 (0.0025) [2025-01-04 10:21:03,929][134294] Updated weights for policy 0, policy_version 171214 (0.0027) [2025-01-04 10:21:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.8, 300 sec: 13259.9). Total num frames: 701292544. Throughput: 0: 3500.2. Samples: 164487124. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:03,968][134211] Avg episode reward: [(0, '9.484')] [2025-01-04 10:21:06,949][134294] Updated weights for policy 0, policy_version 171224 (0.0024) [2025-01-04 10:21:08,969][134211] Fps is (10 sec: 13514.9, 60 sec: 13721.3, 300 sec: 13273.8). Total num frames: 701358080. Throughput: 0: 3562.5. Samples: 164507326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:08,970][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 10:21:10,180][134294] Updated weights for policy 0, policy_version 171234 (0.0025) [2025-01-04 10:21:13,116][134294] Updated weights for policy 0, policy_version 171244 (0.0024) [2025-01-04 10:21:13,970][134211] Fps is (10 sec: 13104.2, 60 sec: 13789.4, 300 sec: 13315.4). Total num frames: 701423616. Throughput: 0: 3320.6. Samples: 164527104. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:13,971][134211] Avg episode reward: [(0, '9.164')] [2025-01-04 10:21:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000171246_701423616.pth... [2025-01-04 10:21:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170461_698208256.pth [2025-01-04 10:21:16,610][134294] Updated weights for policy 0, policy_version 171254 (0.0026) [2025-01-04 10:21:18,968][134211] Fps is (10 sec: 12699.3, 60 sec: 13858.1, 300 sec: 13329.4). Total num frames: 701485056. Throughput: 0: 3184.0. Samples: 164536018. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:18,968][134211] Avg episode reward: [(0, '10.382')] [2025-01-04 10:21:19,799][134294] Updated weights for policy 0, policy_version 171264 (0.0027) [2025-01-04 10:21:23,102][134294] Updated weights for policy 0, policy_version 171274 (0.0025) [2025-01-04 10:21:23,968][134211] Fps is (10 sec: 12290.8, 60 sec: 13721.6, 300 sec: 13329.4). Total num frames: 701546496. Throughput: 0: 3189.1. Samples: 164555034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:23,968][134211] Avg episode reward: [(0, '8.854')] [2025-01-04 10:21:26,190][134294] Updated weights for policy 0, policy_version 171284 (0.0027) [2025-01-04 10:21:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 12970.7, 300 sec: 13329.4). Total num frames: 701612032. Throughput: 0: 3206.9. Samples: 164574560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:28,968][134211] Avg episode reward: [(0, '9.920')] [2025-01-04 10:21:29,323][134294] Updated weights for policy 0, policy_version 171294 (0.0024) [2025-01-04 10:21:32,341][134294] Updated weights for policy 0, policy_version 171304 (0.0025) [2025-01-04 10:21:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 12834.1, 300 sec: 13357.1). Total num frames: 701681664. 
Throughput: 0: 3226.7. Samples: 164584424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:21:33,968][134211] Avg episode reward: [(0, '8.968')] [2025-01-04 10:21:35,381][134294] Updated weights for policy 0, policy_version 171314 (0.0029) [2025-01-04 10:21:38,515][134294] Updated weights for policy 0, policy_version 171324 (0.0024) [2025-01-04 10:21:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 12902.4, 300 sec: 13343.2). Total num frames: 701747200. Throughput: 0: 3266.7. Samples: 164604932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:21:38,968][134211] Avg episode reward: [(0, '8.863')] [2025-01-04 10:21:41,816][134294] Updated weights for policy 0, policy_version 171334 (0.0027) [2025-01-04 10:21:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12902.4, 300 sec: 13273.8). Total num frames: 701808640. Throughput: 0: 3255.0. Samples: 164623362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:21:43,969][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 10:21:44,891][134294] Updated weights for policy 0, policy_version 171344 (0.0025) [2025-01-04 10:21:46,799][134294] Updated weights for policy 0, policy_version 171354 (0.0012) [2025-01-04 10:21:48,887][134294] Updated weights for policy 0, policy_version 171364 (0.0015) [2025-01-04 10:21:48,968][134211] Fps is (10 sec: 15974.3, 60 sec: 13448.5, 300 sec: 13412.7). Total num frames: 701906944. Throughput: 0: 3318.4. Samples: 164636450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:21:48,968][134211] Avg episode reward: [(0, '9.357')] [2025-01-04 10:21:52,311][134294] Updated weights for policy 0, policy_version 171374 (0.0026) [2025-01-04 10:21:53,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13380.3, 300 sec: 13357.1). Total num frames: 701960192. Throughput: 0: 3388.4. Samples: 164659798. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:21:53,969][134211] Avg episode reward: [(0, '8.835')] [2025-01-04 10:21:56,155][134294] Updated weights for policy 0, policy_version 171384 (0.0028) [2025-01-04 10:21:58,968][134211] Fps is (10 sec: 11059.0, 60 sec: 13243.7, 300 sec: 13343.3). Total num frames: 702017536. Throughput: 0: 3298.7. Samples: 164675538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:21:58,969][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 10:21:59,975][134294] Updated weights for policy 0, policy_version 171394 (0.0027) [2025-01-04 10:22:01,899][134294] Updated weights for policy 0, policy_version 171404 (0.0014) [2025-01-04 10:22:03,904][134294] Updated weights for policy 0, policy_version 171414 (0.0014) [2025-01-04 10:22:03,968][134211] Fps is (10 sec: 15155.4, 60 sec: 13653.3, 300 sec: 13454.3). Total num frames: 702111744. Throughput: 0: 3354.0. Samples: 164686948. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:03,968][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 10:22:06,152][134294] Updated weights for policy 0, policy_version 171424 (0.0014) [2025-01-04 10:22:08,968][134211] Fps is (10 sec: 17203.4, 60 sec: 13858.4, 300 sec: 13509.9). Total num frames: 702189568. Throughput: 0: 3582.2. Samples: 164716234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:08,968][134211] Avg episode reward: [(0, '9.281')] [2025-01-04 10:22:09,207][134294] Updated weights for policy 0, policy_version 171434 (0.0027) [2025-01-04 10:22:13,296][134294] Updated weights for policy 0, policy_version 171444 (0.0036) [2025-01-04 10:22:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13585.5, 300 sec: 13384.9). Total num frames: 702238720. Throughput: 0: 3488.0. Samples: 164731520. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:13,969][134211] Avg episode reward: [(0, '9.263')] [2025-01-04 10:22:17,154][134294] Updated weights for policy 0, policy_version 171454 (0.0029) [2025-01-04 10:22:18,968][134211] Fps is (10 sec: 10239.9, 60 sec: 13448.5, 300 sec: 13329.4). Total num frames: 702291968. Throughput: 0: 3443.7. Samples: 164739390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:18,969][134211] Avg episode reward: [(0, '10.104')] [2025-01-04 10:22:21,504][134294] Updated weights for policy 0, policy_version 171464 (0.0027) [2025-01-04 10:22:23,968][134211] Fps is (10 sec: 11469.1, 60 sec: 13448.5, 300 sec: 13329.4). Total num frames: 702353408. Throughput: 0: 3308.0. Samples: 164753790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:23,969][134211] Avg episode reward: [(0, '9.850')] [2025-01-04 10:22:24,140][134294] Updated weights for policy 0, policy_version 171474 (0.0016) [2025-01-04 10:22:26,274][134294] Updated weights for policy 0, policy_version 171484 (0.0015) [2025-01-04 10:22:28,159][134294] Updated weights for policy 0, policy_version 171494 (0.0013) [2025-01-04 10:22:28,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14062.9, 300 sec: 13454.3). Total num frames: 702455808. Throughput: 0: 3554.0. Samples: 164783292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:28,968][134211] Avg episode reward: [(0, '9.335')] [2025-01-04 10:22:30,097][134294] Updated weights for policy 0, policy_version 171504 (0.0016) [2025-01-04 10:22:32,536][134294] Updated weights for policy 0, policy_version 171514 (0.0021) [2025-01-04 10:22:33,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14267.7, 300 sec: 13509.9). Total num frames: 702537728. Throughput: 0: 3607.7. Samples: 164798796. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:33,969][134211] Avg episode reward: [(0, '8.653')] [2025-01-04 10:22:36,119][134294] Updated weights for policy 0, policy_version 171524 (0.0028) [2025-01-04 10:22:38,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14062.9, 300 sec: 13468.2). Total num frames: 702590976. Throughput: 0: 3479.0. Samples: 164816354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 10:22:38,968][134211] Avg episode reward: [(0, '8.496')] [2025-01-04 10:22:39,872][134294] Updated weights for policy 0, policy_version 171534 (0.0025) [2025-01-04 10:22:43,209][134294] Updated weights for policy 0, policy_version 171544 (0.0026) [2025-01-04 10:22:43,968][134211] Fps is (10 sec: 11469.1, 60 sec: 14063.0, 300 sec: 13482.1). Total num frames: 702652416. Throughput: 0: 3526.3. Samples: 164834220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:22:43,968][134211] Avg episode reward: [(0, '8.349')] [2025-01-04 10:22:46,350][134294] Updated weights for policy 0, policy_version 171554 (0.0026) [2025-01-04 10:22:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13448.5, 300 sec: 13482.1). Total num frames: 702713856. Throughput: 0: 3488.1. Samples: 164843912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:22:48,968][134211] Avg episode reward: [(0, '10.309')] [2025-01-04 10:22:49,943][134294] Updated weights for policy 0, policy_version 171564 (0.0027) [2025-01-04 10:22:53,755][134294] Updated weights for policy 0, policy_version 171574 (0.0030) [2025-01-04 10:22:53,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13448.6, 300 sec: 13454.3). Total num frames: 702767104. Throughput: 0: 3211.7. Samples: 164860762. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:22:53,968][134211] Avg episode reward: [(0, '8.790')] [2025-01-04 10:22:57,441][134294] Updated weights for policy 0, policy_version 171584 (0.0027) [2025-01-04 10:22:58,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13448.5, 300 sec: 13426.6). Total num frames: 702824448. Throughput: 0: 3238.1. Samples: 164877236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:22:58,969][134211] Avg episode reward: [(0, '9.552')] [2025-01-04 10:23:00,818][134294] Updated weights for policy 0, policy_version 171594 (0.0029) [2025-01-04 10:23:03,968][134211] Fps is (10 sec: 11468.8, 60 sec: 12834.1, 300 sec: 13398.8). Total num frames: 702881792. Throughput: 0: 3269.2. Samples: 164886504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:03,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 10:23:04,432][134294] Updated weights for policy 0, policy_version 171604 (0.0030) [2025-01-04 10:23:08,215][134294] Updated weights for policy 0, policy_version 171614 (0.0031) [2025-01-04 10:23:08,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12424.5, 300 sec: 13357.1). Total num frames: 702935040. Throughput: 0: 3314.7. Samples: 164902952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:08,968][134211] Avg episode reward: [(0, '9.705')] [2025-01-04 10:23:11,629][134294] Updated weights for policy 0, policy_version 171624 (0.0028) [2025-01-04 10:23:13,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12629.3, 300 sec: 13329.4). Total num frames: 702996480. Throughput: 0: 3046.5. Samples: 164920384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:13,968][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 10:23:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000171630_702996480.pth... 
[2025-01-04 10:23:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000170859_699838464.pth [2025-01-04 10:23:15,389][134294] Updated weights for policy 0, policy_version 171634 (0.0029) [2025-01-04 10:23:18,793][134294] Updated weights for policy 0, policy_version 171644 (0.0027) [2025-01-04 10:23:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12697.6, 300 sec: 13273.8). Total num frames: 703053824. Throughput: 0: 2883.6. Samples: 164928556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:18,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 10:23:22,110][134294] Updated weights for policy 0, policy_version 171654 (0.0032) [2025-01-04 10:23:23,968][134211] Fps is (10 sec: 11878.0, 60 sec: 12697.5, 300 sec: 13121.1). Total num frames: 703115264. Throughput: 0: 2903.1. Samples: 164946996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:23,969][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 10:23:25,371][134294] Updated weights for policy 0, policy_version 171664 (0.0025) [2025-01-04 10:23:27,432][134294] Updated weights for policy 0, policy_version 171674 (0.0013) [2025-01-04 10:23:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 12356.3, 300 sec: 13162.7). Total num frames: 703197184. Throughput: 0: 3031.4. Samples: 164970634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:28,968][134211] Avg episode reward: [(0, '9.994')] [2025-01-04 10:23:30,127][134294] Updated weights for policy 0, policy_version 171684 (0.0024) [2025-01-04 10:23:33,121][134294] Updated weights for policy 0, policy_version 171694 (0.0028) [2025-01-04 10:23:33,972][134211] Fps is (10 sec: 15149.5, 60 sec: 12150.6, 300 sec: 13204.2). Total num frames: 703266816. Throughput: 0: 3050.8. Samples: 164981210. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:33,973][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 10:23:36,266][134294] Updated weights for policy 0, policy_version 171704 (0.0026) [2025-01-04 10:23:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12356.3, 300 sec: 13218.3). Total num frames: 703332352. Throughput: 0: 3118.6. Samples: 165001100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:38,969][134211] Avg episode reward: [(0, '8.891')] [2025-01-04 10:23:39,407][134294] Updated weights for policy 0, policy_version 171714 (0.0024) [2025-01-04 10:23:42,574][134294] Updated weights for policy 0, policy_version 171724 (0.0025) [2025-01-04 10:23:43,968][134211] Fps is (10 sec: 13112.7, 60 sec: 12424.5, 300 sec: 13232.2). Total num frames: 703397888. Throughput: 0: 3180.1. Samples: 165020340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:23:43,968][134211] Avg episode reward: [(0, '9.803')] [2025-01-04 10:23:45,809][134294] Updated weights for policy 0, policy_version 171734 (0.0025) [2025-01-04 10:23:48,968][134211] Fps is (10 sec: 12697.1, 60 sec: 12424.4, 300 sec: 13273.8). Total num frames: 703459328. Throughput: 0: 3186.9. Samples: 165029914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:23:48,969][134211] Avg episode reward: [(0, '9.765')] [2025-01-04 10:23:49,087][134294] Updated weights for policy 0, policy_version 171744 (0.0026) [2025-01-04 10:23:52,462][134294] Updated weights for policy 0, policy_version 171754 (0.0026) [2025-01-04 10:23:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12561.1, 300 sec: 13301.6). Total num frames: 703520768. Throughput: 0: 3231.1. Samples: 165048352. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:23:53,968][134211] Avg episode reward: [(0, '10.146')] [2025-01-04 10:23:55,707][134294] Updated weights for policy 0, policy_version 171764 (0.0029) [2025-01-04 10:23:58,774][134294] Updated weights for policy 0, policy_version 171774 (0.0028) [2025-01-04 10:23:58,968][134211] Fps is (10 sec: 12698.4, 60 sec: 12697.6, 300 sec: 13315.5). Total num frames: 703586304. Throughput: 0: 3278.6. Samples: 165067922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:23:58,968][134211] Avg episode reward: [(0, '9.383')] [2025-01-04 10:24:01,926][134294] Updated weights for policy 0, policy_version 171784 (0.0027) [2025-01-04 10:24:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12834.1, 300 sec: 13190.5). Total num frames: 703651840. Throughput: 0: 3312.3. Samples: 165077610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:03,968][134211] Avg episode reward: [(0, '10.638')] [2025-01-04 10:24:05,316][134294] Updated weights for policy 0, policy_version 171794 (0.0028) [2025-01-04 10:24:08,101][134294] Updated weights for policy 0, policy_version 171804 (0.0019) [2025-01-04 10:24:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13175.5, 300 sec: 13190.5). Total num frames: 703725568. Throughput: 0: 3314.9. Samples: 165096164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:08,968][134211] Avg episode reward: [(0, '9.036')] [2025-01-04 10:24:10,047][134294] Updated weights for policy 0, policy_version 171814 (0.0013) [2025-01-04 10:24:12,130][134294] Updated weights for policy 0, policy_version 171824 (0.0013) [2025-01-04 10:24:13,968][134211] Fps is (10 sec: 17203.4, 60 sec: 13789.9, 300 sec: 13315.5). Total num frames: 703823872. Throughput: 0: 3444.1. Samples: 165125616. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:13,968][134211] Avg episode reward: [(0, '9.444')] [2025-01-04 10:24:14,399][134294] Updated weights for policy 0, policy_version 171834 (0.0015) [2025-01-04 10:24:18,063][134294] Updated weights for policy 0, policy_version 171844 (0.0034) [2025-01-04 10:24:18,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13721.6, 300 sec: 13301.6). Total num frames: 703877120. Throughput: 0: 3453.8. Samples: 165136618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:18,968][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 10:24:21,830][134294] Updated weights for policy 0, policy_version 171854 (0.0031) [2025-01-04 10:24:23,968][134211] Fps is (10 sec: 11059.0, 60 sec: 13653.4, 300 sec: 13260.0). Total num frames: 703934464. Throughput: 0: 3368.4. Samples: 165152680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:23,968][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 10:24:25,357][134294] Updated weights for policy 0, policy_version 171864 (0.0027) [2025-01-04 10:24:28,607][134294] Updated weights for policy 0, policy_version 171874 (0.0025) [2025-01-04 10:24:28,968][134211] Fps is (10 sec: 12287.1, 60 sec: 13380.1, 300 sec: 13259.9). Total num frames: 704000000. Throughput: 0: 3342.4. Samples: 165170752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:28,969][134211] Avg episode reward: [(0, '9.925')] [2025-01-04 10:24:31,794][134294] Updated weights for policy 0, policy_version 171884 (0.0026) [2025-01-04 10:24:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13244.7, 300 sec: 13246.1). Total num frames: 704061440. Throughput: 0: 3345.6. Samples: 165180462. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:33,968][134211] Avg episode reward: [(0, '9.856')] [2025-01-04 10:24:34,990][134294] Updated weights for policy 0, policy_version 171894 (0.0029) [2025-01-04 10:24:38,182][134294] Updated weights for policy 0, policy_version 171904 (0.0028) [2025-01-04 10:24:38,968][134211] Fps is (10 sec: 12698.4, 60 sec: 13243.7, 300 sec: 13259.9). Total num frames: 704126976. Throughput: 0: 3363.6. Samples: 165199716. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:38,968][134211] Avg episode reward: [(0, '9.747')] [2025-01-04 10:24:41,669][134294] Updated weights for policy 0, policy_version 171914 (0.0026) [2025-01-04 10:24:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13175.5, 300 sec: 13273.8). Total num frames: 704188416. Throughput: 0: 3334.9. Samples: 165217992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:43,968][134211] Avg episode reward: [(0, '10.207')] [2025-01-04 10:24:44,972][134294] Updated weights for policy 0, policy_version 171924 (0.0026) [2025-01-04 10:24:48,116][134294] Updated weights for policy 0, policy_version 171934 (0.0028) [2025-01-04 10:24:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13175.6, 300 sec: 13273.8). Total num frames: 704249856. Throughput: 0: 3317.4. Samples: 165226894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:48,968][134211] Avg episode reward: [(0, '9.298')] [2025-01-04 10:24:51,725][134294] Updated weights for policy 0, policy_version 171944 (0.0026) [2025-01-04 10:24:53,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13107.2, 300 sec: 13232.3). Total num frames: 704307200. Throughput: 0: 3308.3. Samples: 165245040. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:53,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 10:24:55,187][134294] Updated weights for policy 0, policy_version 171954 (0.0026) [2025-01-04 10:24:58,320][134294] Updated weights for policy 0, policy_version 171964 (0.0022) [2025-01-04 10:24:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 704372736. Throughput: 0: 3077.6. Samples: 165264110. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:24:58,968][134211] Avg episode reward: [(0, '8.726')] [2025-01-04 10:25:01,359][134294] Updated weights for policy 0, policy_version 171974 (0.0028) [2025-01-04 10:25:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 704438272. Throughput: 0: 3050.4. Samples: 165273886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:03,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 10:25:04,662][134294] Updated weights for policy 0, policy_version 171984 (0.0029) [2025-01-04 10:25:07,701][134294] Updated weights for policy 0, policy_version 171994 (0.0024) [2025-01-04 10:25:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12902.4, 300 sec: 13232.2). Total num frames: 704499712. Throughput: 0: 3123.0. Samples: 165293214. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:08,968][134211] Avg episode reward: [(0, '9.768')] [2025-01-04 10:25:11,124][134294] Updated weights for policy 0, policy_version 172004 (0.0024) [2025-01-04 10:25:13,575][134294] Updated weights for policy 0, policy_version 172014 (0.0016) [2025-01-04 10:25:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12492.8, 300 sec: 13287.7). Total num frames: 704573440. Throughput: 0: 3168.4. Samples: 165313326. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:13,968][134211] Avg episode reward: [(0, '9.847')] [2025-01-04 10:25:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172015_704573440.pth... [2025-01-04 10:25:14,028][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000171246_701423616.pth [2025-01-04 10:25:15,626][134294] Updated weights for policy 0, policy_version 172024 (0.0012) [2025-01-04 10:25:17,524][134294] Updated weights for policy 0, policy_version 172034 (0.0013) [2025-01-04 10:25:18,968][134211] Fps is (10 sec: 18022.8, 60 sec: 13380.3, 300 sec: 13412.7). Total num frames: 704679936. Throughput: 0: 3295.1. Samples: 165328740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:18,968][134211] Avg episode reward: [(0, '9.833')] [2025-01-04 10:25:19,842][134294] Updated weights for policy 0, policy_version 172044 (0.0018) [2025-01-04 10:25:23,641][134294] Updated weights for policy 0, policy_version 172054 (0.0030) [2025-01-04 10:25:23,968][134211] Fps is (10 sec: 15974.3, 60 sec: 13312.0, 300 sec: 13218.3). Total num frames: 704733184. Throughput: 0: 3394.2. Samples: 165352454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:23,968][134211] Avg episode reward: [(0, '9.223')] [2025-01-04 10:25:26,912][134294] Updated weights for policy 0, policy_version 172064 (0.0026) [2025-01-04 10:25:28,968][134211] Fps is (10 sec: 11878.1, 60 sec: 13312.1, 300 sec: 13176.6). Total num frames: 704798720. Throughput: 0: 3389.5. Samples: 165370522. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:28,968][134211] Avg episode reward: [(0, '8.538')] [2025-01-04 10:25:30,430][134294] Updated weights for policy 0, policy_version 172074 (0.0031) [2025-01-04 10:25:33,732][134294] Updated weights for policy 0, policy_version 172084 (0.0027) [2025-01-04 10:25:33,968][134211] Fps is (10 sec: 12287.5, 60 sec: 13243.6, 300 sec: 13162.7). Total num frames: 704856064. Throughput: 0: 3380.2. Samples: 165379004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:33,969][134211] Avg episode reward: [(0, '9.969')] [2025-01-04 10:25:37,796][134294] Updated weights for policy 0, policy_version 172094 (0.0037) [2025-01-04 10:25:38,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13038.9, 300 sec: 13135.0). Total num frames: 704909312. Throughput: 0: 3352.2. Samples: 165395888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:38,969][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 10:25:41,565][134294] Updated weights for policy 0, policy_version 172104 (0.0031) [2025-01-04 10:25:43,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12902.3, 300 sec: 13093.3). Total num frames: 704962560. Throughput: 0: 3288.2. Samples: 165412080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:43,969][134211] Avg episode reward: [(0, '9.612')] [2025-01-04 10:25:45,397][134294] Updated weights for policy 0, policy_version 172114 (0.0033) [2025-01-04 10:25:47,855][134294] Updated weights for policy 0, policy_version 172124 (0.0017) [2025-01-04 10:25:48,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13107.2, 300 sec: 13148.9). Total num frames: 705036288. Throughput: 0: 3249.7. Samples: 165420122. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:48,968][134211] Avg episode reward: [(0, '9.657')] [2025-01-04 10:25:50,510][134294] Updated weights for policy 0, policy_version 172134 (0.0021) [2025-01-04 10:25:53,824][134294] Updated weights for policy 0, policy_version 172144 (0.0026) [2025-01-04 10:25:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13243.7, 300 sec: 13148.8). Total num frames: 705101824. Throughput: 0: 3343.7. Samples: 165443680. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:25:53,969][134211] Avg episode reward: [(0, '9.143')] [2025-01-04 10:25:56,952][134294] Updated weights for policy 0, policy_version 172154 (0.0025) [2025-01-04 10:25:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13243.7, 300 sec: 13135.0). Total num frames: 705167360. Throughput: 0: 3321.7. Samples: 165462802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:25:58,969][134211] Avg episode reward: [(0, '9.923')] [2025-01-04 10:26:00,171][134294] Updated weights for policy 0, policy_version 172164 (0.0027) [2025-01-04 10:26:03,350][134294] Updated weights for policy 0, policy_version 172174 (0.0028) [2025-01-04 10:26:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13175.4, 300 sec: 13121.1). Total num frames: 705228800. Throughput: 0: 3191.5. Samples: 165472360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:03,969][134211] Avg episode reward: [(0, '9.644')] [2025-01-04 10:26:06,666][134294] Updated weights for policy 0, policy_version 172184 (0.0025) [2025-01-04 10:26:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13175.5, 300 sec: 13107.3). Total num frames: 705290240. Throughput: 0: 3079.1. Samples: 165491014. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:08,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 10:26:10,213][134294] Updated weights for policy 0, policy_version 172194 (0.0029) [2025-01-04 10:26:13,584][134294] Updated weights for policy 0, policy_version 172204 (0.0027) [2025-01-04 10:26:13,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 13107.2). Total num frames: 705351680. Throughput: 0: 3076.7. Samples: 165508974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:13,968][134211] Avg episode reward: [(0, '8.759')] [2025-01-04 10:26:16,851][134294] Updated weights for policy 0, policy_version 172214 (0.0024) [2025-01-04 10:26:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12356.2, 300 sec: 13135.0). Total num frames: 705421312. Throughput: 0: 3076.7. Samples: 165517456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:18,968][134211] Avg episode reward: [(0, '10.482')] [2025-01-04 10:26:19,638][134294] Updated weights for policy 0, policy_version 172224 (0.0019) [2025-01-04 10:26:23,451][134294] Updated weights for policy 0, policy_version 172234 (0.0028) [2025-01-04 10:26:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 12356.2, 300 sec: 13093.3). Total num frames: 705474560. Throughput: 0: 3143.0. Samples: 165537324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:23,969][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 10:26:26,224][134294] Updated weights for policy 0, policy_version 172244 (0.0019) [2025-01-04 10:26:28,191][134294] Updated weights for policy 0, policy_version 172254 (0.0016) [2025-01-04 10:26:28,968][134211] Fps is (10 sec: 14745.6, 60 sec: 12834.2, 300 sec: 13176.6). Total num frames: 705568768. Throughput: 0: 3317.5. Samples: 165561366. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:28,968][134211] Avg episode reward: [(0, '9.756')] [2025-01-04 10:26:30,060][134294] Updated weights for policy 0, policy_version 172264 (0.0013) [2025-01-04 10:26:31,953][134294] Updated weights for policy 0, policy_version 172274 (0.0015) [2025-01-04 10:26:33,868][134294] Updated weights for policy 0, policy_version 172284 (0.0014) [2025-01-04 10:26:33,968][134211] Fps is (10 sec: 20070.9, 60 sec: 13653.5, 300 sec: 13315.5). Total num frames: 705675264. Throughput: 0: 3498.3. Samples: 165577546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:33,968][134211] Avg episode reward: [(0, '9.414')] [2025-01-04 10:26:35,807][134294] Updated weights for policy 0, policy_version 172294 (0.0014) [2025-01-04 10:26:38,444][134294] Updated weights for policy 0, policy_version 172304 (0.0021) [2025-01-04 10:26:38,968][134211] Fps is (10 sec: 19250.9, 60 sec: 14199.5, 300 sec: 13398.8). Total num frames: 705761280. Throughput: 0: 3662.6. Samples: 165608498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:38,969][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 10:26:42,002][134294] Updated weights for policy 0, policy_version 172314 (0.0035) [2025-01-04 10:26:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14267.8, 300 sec: 13259.9). Total num frames: 705818624. Throughput: 0: 3629.3. Samples: 165626120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:43,968][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 10:26:45,476][134294] Updated weights for policy 0, policy_version 172324 (0.0028) [2025-01-04 10:26:48,573][134294] Updated weights for policy 0, policy_version 172334 (0.0025) [2025-01-04 10:26:48,970][134211] Fps is (10 sec: 11876.0, 60 sec: 14062.4, 300 sec: 13287.6). Total num frames: 705880064. Throughput: 0: 3617.8. Samples: 165635170. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:48,971][134211] Avg episode reward: [(0, '9.836')] [2025-01-04 10:26:52,007][134294] Updated weights for policy 0, policy_version 172344 (0.0025) [2025-01-04 10:26:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13994.7, 300 sec: 13301.6). Total num frames: 705941504. Throughput: 0: 3619.9. Samples: 165653910. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:53,969][134211] Avg episode reward: [(0, '9.586')] [2025-01-04 10:26:55,589][134294] Updated weights for policy 0, policy_version 172354 (0.0028) [2025-01-04 10:26:58,968][134211] Fps is (10 sec: 11880.8, 60 sec: 13858.1, 300 sec: 13176.6). Total num frames: 705998848. Throughput: 0: 3606.8. Samples: 165671280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:26:58,968][134211] Avg episode reward: [(0, '9.989')] [2025-01-04 10:26:59,097][134294] Updated weights for policy 0, policy_version 172364 (0.0028) [2025-01-04 10:27:02,662][134294] Updated weights for policy 0, policy_version 172374 (0.0027) [2025-01-04 10:27:03,968][134211] Fps is (10 sec: 11469.1, 60 sec: 13789.9, 300 sec: 13107.2). Total num frames: 706056192. Throughput: 0: 3611.0. Samples: 165679950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:03,968][134211] Avg episode reward: [(0, '8.819')] [2025-01-04 10:27:06,007][134294] Updated weights for policy 0, policy_version 172384 (0.0024) [2025-01-04 10:27:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13789.9, 300 sec: 13148.9). Total num frames: 706117632. Throughput: 0: 3566.9. Samples: 165697834. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:08,968][134211] Avg episode reward: [(0, '10.735')] [2025-01-04 10:27:09,404][134294] Updated weights for policy 0, policy_version 172394 (0.0028) [2025-01-04 10:27:12,500][134294] Updated weights for policy 0, policy_version 172404 (0.0027) [2025-01-04 10:27:13,968][134211] Fps is (10 sec: 12696.7, 60 sec: 13858.0, 300 sec: 13190.5). Total num frames: 706183168. Throughput: 0: 3460.9. Samples: 165717108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:13,969][134211] Avg episode reward: [(0, '10.361')] [2025-01-04 10:27:14,013][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172409_706187264.pth... [2025-01-04 10:27:14,081][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000171630_702996480.pth [2025-01-04 10:27:15,544][134294] Updated weights for policy 0, policy_version 172414 (0.0022) [2025-01-04 10:27:18,478][134294] Updated weights for policy 0, policy_version 172424 (0.0023) [2025-01-04 10:27:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13858.1, 300 sec: 13218.3). Total num frames: 706252800. Throughput: 0: 3329.6. Samples: 165727380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:18,968][134211] Avg episode reward: [(0, '9.171')] [2025-01-04 10:27:21,636][134294] Updated weights for policy 0, policy_version 172434 (0.0023) [2025-01-04 10:27:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14062.9, 300 sec: 13093.3). Total num frames: 706318336. Throughput: 0: 3086.7. Samples: 165747400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:23,968][134211] Avg episode reward: [(0, '9.759')] [2025-01-04 10:27:24,948][134294] Updated weights for policy 0, policy_version 172444 (0.0029) [2025-01-04 10:27:27,993][134294] Updated weights for policy 0, policy_version 172454 (0.0024) [2025-01-04 10:27:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13585.0, 300 sec: 13037.8). Total num frames: 706383872. Throughput: 0: 3126.9. Samples: 165766830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:28,968][134211] Avg episode reward: [(0, '9.494')] [2025-01-04 10:27:30,899][134294] Updated weights for policy 0, policy_version 172464 (0.0025) [2025-01-04 10:27:33,807][134294] Updated weights for policy 0, policy_version 172474 (0.0027) [2025-01-04 10:27:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 12970.7, 300 sec: 13093.3). Total num frames: 706453504. Throughput: 0: 3160.7. Samples: 165777394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:33,968][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 10:27:36,766][134294] Updated weights for policy 0, policy_version 172484 (0.0025) [2025-01-04 10:27:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 12697.6, 300 sec: 13121.1). Total num frames: 706523136. Throughput: 0: 3207.8. Samples: 165798262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:38,968][134211] Avg episode reward: [(0, '9.913')] [2025-01-04 10:27:39,889][134294] Updated weights for policy 0, policy_version 172494 (0.0024) [2025-01-04 10:27:42,908][134294] Updated weights for policy 0, policy_version 172504 (0.0025) [2025-01-04 10:27:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 12834.1, 300 sec: 13135.0). Total num frames: 706588672. Throughput: 0: 3267.8. Samples: 165818330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:43,968][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 10:27:45,860][134294] Updated weights for policy 0, policy_version 172514 (0.0025) [2025-01-04 10:27:48,894][134294] Updated weights for policy 0, policy_version 172524 (0.0026) [2025-01-04 10:27:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12971.1, 300 sec: 13190.5). Total num frames: 706658304. Throughput: 0: 3302.2. Samples: 165828550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:48,968][134211] Avg episode reward: [(0, '9.327')] [2025-01-04 10:27:52,011][134294] Updated weights for policy 0, policy_version 172534 (0.0022) [2025-01-04 10:27:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 13204.4). Total num frames: 706719744. Throughput: 0: 3349.9. Samples: 165848578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:53,968][134211] Avg episode reward: [(0, '8.616')] [2025-01-04 10:27:55,281][134294] Updated weights for policy 0, policy_version 172544 (0.0025) [2025-01-04 10:27:58,066][134294] Updated weights for policy 0, policy_version 172554 (0.0026) [2025-01-04 10:27:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 13246.1). Total num frames: 706789376. Throughput: 0: 3366.6. Samples: 165868604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:27:58,968][134211] Avg episode reward: [(0, '8.886')] [2025-01-04 10:28:01,057][134294] Updated weights for policy 0, policy_version 172564 (0.0023) [2025-01-04 10:28:03,968][134211] Fps is (10 sec: 13925.9, 60 sec: 13380.2, 300 sec: 13301.6). Total num frames: 706859008. Throughput: 0: 3369.7. Samples: 165879018. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:28:03,969][134211] Avg episode reward: [(0, '9.354')] [2025-01-04 10:28:04,194][134294] Updated weights for policy 0, policy_version 172574 (0.0027) [2025-01-04 10:28:07,183][134294] Updated weights for policy 0, policy_version 172584 (0.0027) [2025-01-04 10:28:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.5, 300 sec: 13315.5). Total num frames: 706924544. Throughput: 0: 3376.6. Samples: 165899346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:08,968][134211] Avg episode reward: [(0, '8.286')] [2025-01-04 10:28:10,153][134294] Updated weights for policy 0, policy_version 172594 (0.0025) [2025-01-04 10:28:12,992][134294] Updated weights for policy 0, policy_version 172604 (0.0024) [2025-01-04 10:28:13,968][134211] Fps is (10 sec: 13517.3, 60 sec: 13516.9, 300 sec: 13357.1). Total num frames: 706994176. Throughput: 0: 3408.2. Samples: 165920198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:13,968][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 10:28:16,411][134294] Updated weights for policy 0, policy_version 172614 (0.0022) [2025-01-04 10:28:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13380.3, 300 sec: 13357.1). Total num frames: 707055616. Throughput: 0: 3373.3. Samples: 165929192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:18,968][134211] Avg episode reward: [(0, '9.976')] [2025-01-04 10:28:20,021][134294] Updated weights for policy 0, policy_version 172624 (0.0026) [2025-01-04 10:28:23,364][134294] Updated weights for policy 0, policy_version 172634 (0.0024) [2025-01-04 10:28:23,970][134211] Fps is (10 sec: 11875.9, 60 sec: 13243.3, 300 sec: 13273.7). Total num frames: 707112960. Throughput: 0: 3297.0. Samples: 165946634. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:23,970][134211] Avg episode reward: [(0, '9.615')] [2025-01-04 10:28:26,727][134294] Updated weights for policy 0, policy_version 172644 (0.0025) [2025-01-04 10:28:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13243.8, 300 sec: 13260.1). Total num frames: 707178496. Throughput: 0: 3259.7. Samples: 165965016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:28,968][134211] Avg episode reward: [(0, '9.029')] [2025-01-04 10:28:29,559][134294] Updated weights for policy 0, policy_version 172654 (0.0020) [2025-01-04 10:28:31,482][134294] Updated weights for policy 0, policy_version 172664 (0.0012) [2025-01-04 10:28:33,395][134294] Updated weights for policy 0, policy_version 172674 (0.0015) [2025-01-04 10:28:33,968][134211] Fps is (10 sec: 16387.5, 60 sec: 13721.6, 300 sec: 13371.0). Total num frames: 707276800. Throughput: 0: 3356.5. Samples: 165979594. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:33,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 10:28:36,323][134294] Updated weights for policy 0, policy_version 172684 (0.0023) [2025-01-04 10:28:38,968][134211] Fps is (10 sec: 16793.0, 60 sec: 13721.6, 300 sec: 13384.9). Total num frames: 707346432. Throughput: 0: 3464.1. Samples: 166004462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:38,969][134211] Avg episode reward: [(0, '10.457')] [2025-01-04 10:28:39,523][134294] Updated weights for policy 0, policy_version 172694 (0.0026) [2025-01-04 10:28:42,536][134294] Updated weights for policy 0, policy_version 172704 (0.0026) [2025-01-04 10:28:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 13398.8). Total num frames: 707411968. Throughput: 0: 3455.3. Samples: 166024094. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:43,968][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 10:28:45,541][134294] Updated weights for policy 0, policy_version 172714 (0.0025) [2025-01-04 10:28:48,483][134294] Updated weights for policy 0, policy_version 172724 (0.0024) [2025-01-04 10:28:48,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13721.6, 300 sec: 13426.6). Total num frames: 707481600. Throughput: 0: 3456.0. Samples: 166034538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:48,968][134211] Avg episode reward: [(0, '9.891')] [2025-01-04 10:28:51,658][134294] Updated weights for policy 0, policy_version 172734 (0.0025) [2025-01-04 10:28:53,968][134211] Fps is (10 sec: 13106.5, 60 sec: 13721.5, 300 sec: 13412.6). Total num frames: 707543040. Throughput: 0: 3446.4. Samples: 166054438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:53,969][134211] Avg episode reward: [(0, '9.945')] [2025-01-04 10:28:55,142][134294] Updated weights for policy 0, policy_version 172744 (0.0029) [2025-01-04 10:28:58,394][134294] Updated weights for policy 0, policy_version 172754 (0.0023) [2025-01-04 10:28:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13585.1, 300 sec: 13398.8). Total num frames: 707604480. Throughput: 0: 3388.0. Samples: 166072658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:28:58,968][134211] Avg episode reward: [(0, '9.413')] [2025-01-04 10:29:01,393][134294] Updated weights for policy 0, policy_version 172764 (0.0025) [2025-01-04 10:29:03,968][134211] Fps is (10 sec: 13107.9, 60 sec: 13585.1, 300 sec: 13384.9). Total num frames: 707674112. Throughput: 0: 3413.1. Samples: 166082780. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:29:03,968][134211] Avg episode reward: [(0, '9.561')] [2025-01-04 10:29:04,485][134294] Updated weights for policy 0, policy_version 172774 (0.0026) [2025-01-04 10:29:07,528][134294] Updated weights for policy 0, policy_version 172784 (0.0025) [2025-01-04 10:29:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.0, 300 sec: 13273.8). Total num frames: 707739648. Throughput: 0: 3473.1. Samples: 166102918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:29:08,968][134211] Avg episode reward: [(0, '10.086')] [2025-01-04 10:29:10,449][134294] Updated weights for policy 0, policy_version 172794 (0.0023) [2025-01-04 10:29:13,390][134294] Updated weights for policy 0, policy_version 172804 (0.0027) [2025-01-04 10:29:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.0, 300 sec: 13329.3). Total num frames: 707809280. Throughput: 0: 3529.7. Samples: 166123852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:13,969][134211] Avg episode reward: [(0, '9.797')] [2025-01-04 10:29:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172805_707809280.pth... [2025-01-04 10:29:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172015_704573440.pth [2025-01-04 10:29:16,436][134294] Updated weights for policy 0, policy_version 172814 (0.0025) [2025-01-04 10:29:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13721.6, 300 sec: 13371.0). Total num frames: 707878912. Throughput: 0: 3425.2. Samples: 166133728. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:18,968][134211] Avg episode reward: [(0, '8.876')] [2025-01-04 10:29:19,550][134294] Updated weights for policy 0, policy_version 172824 (0.0025) [2025-01-04 10:29:22,689][134294] Updated weights for policy 0, policy_version 172834 (0.0025) [2025-01-04 10:29:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13790.4, 300 sec: 13357.2). Total num frames: 707940352. Throughput: 0: 3313.4. Samples: 166153566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:23,968][134211] Avg episode reward: [(0, '10.093')] [2025-01-04 10:29:25,906][134294] Updated weights for policy 0, policy_version 172844 (0.0027) [2025-01-04 10:29:28,800][134294] Updated weights for policy 0, policy_version 172854 (0.0026) [2025-01-04 10:29:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13384.9). Total num frames: 708009984. Throughput: 0: 3318.7. Samples: 166173436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:28,968][134211] Avg episode reward: [(0, '8.331')] [2025-01-04 10:29:31,816][134294] Updated weights for policy 0, policy_version 172864 (0.0023) [2025-01-04 10:29:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13380.3, 300 sec: 13398.8). Total num frames: 708079616. Throughput: 0: 3314.9. Samples: 166183708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:33,968][134211] Avg episode reward: [(0, '9.633')] [2025-01-04 10:29:34,853][134294] Updated weights for policy 0, policy_version 172874 (0.0027) [2025-01-04 10:29:37,858][134294] Updated weights for policy 0, policy_version 172884 (0.0029) [2025-01-04 10:29:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13312.0, 300 sec: 13412.7). Total num frames: 708145152. Throughput: 0: 3327.3. Samples: 166204166. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:38,968][134211] Avg episode reward: [(0, '9.781')] [2025-01-04 10:29:40,370][134294] Updated weights for policy 0, policy_version 172894 (0.0017) [2025-01-04 10:29:42,469][134294] Updated weights for policy 0, policy_version 172904 (0.0016) [2025-01-04 10:29:43,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13653.3, 300 sec: 13496.0). Total num frames: 708231168. Throughput: 0: 3476.8. Samples: 166229114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:43,968][134211] Avg episode reward: [(0, '9.794')] [2025-01-04 10:29:45,637][134294] Updated weights for policy 0, policy_version 172914 (0.0025) [2025-01-04 10:29:48,721][134294] Updated weights for policy 0, policy_version 172924 (0.0026) [2025-01-04 10:29:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13585.1, 300 sec: 13523.7). Total num frames: 708296704. Throughput: 0: 3474.4. Samples: 166239126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:48,968][134211] Avg episode reward: [(0, '8.689')] [2025-01-04 10:29:52,283][134294] Updated weights for policy 0, policy_version 172934 (0.0029) [2025-01-04 10:29:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13516.9, 300 sec: 13496.0). Total num frames: 708354048. Throughput: 0: 3433.6. Samples: 166257430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:53,968][134211] Avg episode reward: [(0, '9.351')] [2025-01-04 10:29:55,619][134294] Updated weights for policy 0, policy_version 172944 (0.0022) [2025-01-04 10:29:58,394][134294] Updated weights for policy 0, policy_version 172954 (0.0022) [2025-01-04 10:29:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13653.3, 300 sec: 13509.9). Total num frames: 708423680. Throughput: 0: 3410.8. Samples: 166277336. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:29:58,968][134211] Avg episode reward: [(0, '9.035')] [2025-01-04 10:30:01,557][134294] Updated weights for policy 0, policy_version 172964 (0.0026) [2025-01-04 10:30:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 13523.7). Total num frames: 708489216. Throughput: 0: 3408.1. Samples: 166287094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:30:03,968][134211] Avg episode reward: [(0, '8.619')] [2025-01-04 10:30:05,018][134294] Updated weights for policy 0, policy_version 172974 (0.0028) [2025-01-04 10:30:08,255][134294] Updated weights for policy 0, policy_version 172984 (0.0027) [2025-01-04 10:30:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13516.8, 300 sec: 13482.1). Total num frames: 708550656. Throughput: 0: 3371.6. Samples: 166305290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:30:08,968][134211] Avg episode reward: [(0, '8.991')] [2025-01-04 10:30:11,417][134294] Updated weights for policy 0, policy_version 172994 (0.0028) [2025-01-04 10:30:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13312.0, 300 sec: 13315.5). Total num frames: 708608000. Throughput: 0: 3342.4. Samples: 166323844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:30:13,968][134211] Avg episode reward: [(0, '9.643')] [2025-01-04 10:30:14,759][134294] Updated weights for policy 0, policy_version 173004 (0.0025) [2025-01-04 10:30:16,837][134294] Updated weights for policy 0, policy_version 173014 (0.0013) [2025-01-04 10:30:18,968][134211] Fps is (10 sec: 15155.4, 60 sec: 13721.6, 300 sec: 13454.3). Total num frames: 708702208. Throughput: 0: 3391.3. Samples: 166336314. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:18,968][134211] Avg episode reward: [(0, '8.591')] [2025-01-04 10:30:19,089][134294] Updated weights for policy 0, policy_version 173024 (0.0012) [2025-01-04 10:30:22,326][134294] Updated weights for policy 0, policy_version 173034 (0.0025) [2025-01-04 10:30:23,969][134211] Fps is (10 sec: 15563.0, 60 sec: 13721.3, 300 sec: 13440.4). Total num frames: 708763648. Throughput: 0: 3454.8. Samples: 166359634. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:23,970][134211] Avg episode reward: [(0, '9.144')] [2025-01-04 10:30:26,034][134294] Updated weights for policy 0, policy_version 173044 (0.0028) [2025-01-04 10:30:28,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13516.8, 300 sec: 13440.5). Total num frames: 708820992. Throughput: 0: 3280.1. Samples: 166376718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:28,968][134211] Avg episode reward: [(0, '8.830')] [2025-01-04 10:30:29,448][134294] Updated weights for policy 0, policy_version 173054 (0.0023) [2025-01-04 10:30:32,689][134294] Updated weights for policy 0, policy_version 173064 (0.0025) [2025-01-04 10:30:33,968][134211] Fps is (10 sec: 11879.7, 60 sec: 13380.3, 300 sec: 13468.2). Total num frames: 708882432. Throughput: 0: 3260.1. Samples: 166385830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:33,968][134211] Avg episode reward: [(0, '8.807')] [2025-01-04 10:30:35,850][134294] Updated weights for policy 0, policy_version 173074 (0.0028) [2025-01-04 10:30:38,770][134294] Updated weights for policy 0, policy_version 173084 (0.0024) [2025-01-04 10:30:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13448.5, 300 sec: 13523.8). Total num frames: 708952064. Throughput: 0: 3295.0. Samples: 166405706. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:38,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 10:30:42,096][134294] Updated weights for policy 0, policy_version 173094 (0.0025) [2025-01-04 10:30:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 13482.1). Total num frames: 709013504. Throughput: 0: 3280.0. Samples: 166424936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:43,968][134211] Avg episode reward: [(0, '9.021')] [2025-01-04 10:30:45,302][134294] Updated weights for policy 0, policy_version 173104 (0.0028) [2025-01-04 10:30:48,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12902.4, 300 sec: 13454.3). Total num frames: 709070848. Throughput: 0: 3274.8. Samples: 166434458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:48,968][134211] Avg episode reward: [(0, '9.186')] [2025-01-04 10:30:49,081][134294] Updated weights for policy 0, policy_version 173114 (0.0029) [2025-01-04 10:30:53,542][134294] Updated weights for policy 0, policy_version 173124 (0.0031) [2025-01-04 10:30:53,968][134211] Fps is (10 sec: 10239.9, 60 sec: 12697.6, 300 sec: 13384.9). Total num frames: 709115904. Throughput: 0: 3201.4. Samples: 166449354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:53,969][134211] Avg episode reward: [(0, '10.133')] [2025-01-04 10:30:56,350][134294] Updated weights for policy 0, policy_version 173134 (0.0018) [2025-01-04 10:30:58,495][134294] Updated weights for policy 0, policy_version 173144 (0.0016) [2025-01-04 10:30:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13039.0, 300 sec: 13482.1). Total num frames: 709206016. Throughput: 0: 3273.0. Samples: 166471128. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:30:58,968][134211] Avg episode reward: [(0, '10.311')] [2025-01-04 10:31:00,547][134294] Updated weights for policy 0, policy_version 173154 (0.0015) [2025-01-04 10:31:03,497][134294] Updated weights for policy 0, policy_version 173164 (0.0027) [2025-01-04 10:31:03,968][134211] Fps is (10 sec: 16793.5, 60 sec: 13243.7, 300 sec: 13537.6). Total num frames: 709283840. Throughput: 0: 3321.8. Samples: 166485796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:31:03,969][134211] Avg episode reward: [(0, '8.568')] [2025-01-04 10:31:06,756][134294] Updated weights for policy 0, policy_version 173174 (0.0028) [2025-01-04 10:31:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13243.8, 300 sec: 13537.6). Total num frames: 709345280. Throughput: 0: 3222.0. Samples: 166504618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:31:08,968][134211] Avg episode reward: [(0, '9.652')] [2025-01-04 10:31:10,237][134294] Updated weights for policy 0, policy_version 173184 (0.0029) [2025-01-04 10:31:13,521][134294] Updated weights for policy 0, policy_version 173194 (0.0027) [2025-01-04 10:31:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 13509.8). Total num frames: 709406720. Throughput: 0: 3248.8. Samples: 166522914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:31:13,969][134211] Avg episode reward: [(0, '9.142')] [2025-01-04 10:31:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173195_709406720.pth... [2025-01-04 10:31:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172409_706187264.pth [2025-01-04 10:31:16,763][134294] Updated weights for policy 0, policy_version 173204 (0.0029) [2025-01-04 10:31:18,969][134211] Fps is (10 sec: 12286.6, 60 sec: 12765.6, 300 sec: 13537.6). Total num frames: 709468160. 
Throughput: 0: 3253.7. Samples: 166532252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:31:18,970][134211] Avg episode reward: [(0, '9.594')] [2025-01-04 10:31:20,251][134294] Updated weights for policy 0, policy_version 173214 (0.0028) [2025-01-04 10:31:23,783][134294] Updated weights for policy 0, policy_version 173224 (0.0026) [2025-01-04 10:31:23,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12697.8, 300 sec: 13412.7). Total num frames: 709525504. Throughput: 0: 3203.6. Samples: 166549868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:23,969][134211] Avg episode reward: [(0, '9.253')] [2025-01-04 10:31:27,278][134294] Updated weights for policy 0, policy_version 173234 (0.0029) [2025-01-04 10:31:28,968][134211] Fps is (10 sec: 11879.9, 60 sec: 12765.9, 300 sec: 13259.9). Total num frames: 709586944. Throughput: 0: 3170.6. Samples: 166567610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:28,968][134211] Avg episode reward: [(0, '8.528')] [2025-01-04 10:31:30,385][134294] Updated weights for policy 0, policy_version 173244 (0.0026) [2025-01-04 10:31:33,297][134294] Updated weights for policy 0, policy_version 173254 (0.0027) [2025-01-04 10:31:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 12902.4, 300 sec: 13204.4). Total num frames: 709656576. Throughput: 0: 3190.2. Samples: 166578018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:33,968][134211] Avg episode reward: [(0, '9.864')] [2025-01-04 10:31:36,297][134294] Updated weights for policy 0, policy_version 173264 (0.0025) [2025-01-04 10:31:38,968][134211] Fps is (10 sec: 13516.5, 60 sec: 12834.1, 300 sec: 13232.2). Total num frames: 709722112. Throughput: 0: 3314.0. Samples: 166598484. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:38,968][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 10:31:39,486][134294] Updated weights for policy 0, policy_version 173274 (0.0026) [2025-01-04 10:31:42,486][134294] Updated weights for policy 0, policy_version 173284 (0.0024) [2025-01-04 10:31:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12902.4, 300 sec: 13246.1). Total num frames: 709787648. Throughput: 0: 3268.9. Samples: 166618228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:43,968][134211] Avg episode reward: [(0, '9.237')] [2025-01-04 10:31:45,490][134294] Updated weights for policy 0, policy_version 173294 (0.0024) [2025-01-04 10:31:48,715][134294] Updated weights for policy 0, policy_version 173304 (0.0025) [2025-01-04 10:31:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 13259.9). Total num frames: 709853184. Throughput: 0: 3170.4. Samples: 166628464. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:48,968][134211] Avg episode reward: [(0, '8.911')] [2025-01-04 10:31:51,929][134294] Updated weights for policy 0, policy_version 173314 (0.0028) [2025-01-04 10:31:53,967][134211] Fps is (10 sec: 13107.3, 60 sec: 13380.3, 300 sec: 13287.7). Total num frames: 709918720. Throughput: 0: 3173.4. Samples: 166647420. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:53,968][134211] Avg episode reward: [(0, '9.446')] [2025-01-04 10:31:54,719][134294] Updated weights for policy 0, policy_version 173324 (0.0018) [2025-01-04 10:31:56,646][134294] Updated weights for policy 0, policy_version 173334 (0.0013) [2025-01-04 10:31:58,516][134294] Updated weights for policy 0, policy_version 173344 (0.0013) [2025-01-04 10:31:58,968][134211] Fps is (10 sec: 17203.5, 60 sec: 13653.3, 300 sec: 13454.3). Total num frames: 710025216. Throughput: 0: 3391.5. Samples: 166675532. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:31:58,968][134211] Avg episode reward: [(0, '9.641')] [2025-01-04 10:32:00,550][134294] Updated weights for policy 0, policy_version 173354 (0.0013) [2025-01-04 10:32:02,697][134294] Updated weights for policy 0, policy_version 173364 (0.0016) [2025-01-04 10:32:03,969][134211] Fps is (10 sec: 19248.1, 60 sec: 13789.6, 300 sec: 13537.6). Total num frames: 710111232. Throughput: 0: 3525.2. Samples: 166690888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:32:03,970][134211] Avg episode reward: [(0, '8.197')] [2025-01-04 10:32:06,042][134294] Updated weights for policy 0, policy_version 173374 (0.0032) [2025-01-04 10:32:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13789.9, 300 sec: 13523.8). Total num frames: 710172672. Throughput: 0: 3595.3. Samples: 166711658. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:32:08,968][134211] Avg episode reward: [(0, '9.058')] [2025-01-04 10:32:09,666][134294] Updated weights for policy 0, policy_version 173384 (0.0027) [2025-01-04 10:32:12,871][134294] Updated weights for policy 0, policy_version 173394 (0.0024) [2025-01-04 10:32:13,968][134211] Fps is (10 sec: 12289.6, 60 sec: 13789.9, 300 sec: 13496.0). Total num frames: 710234112. Throughput: 0: 3600.4. Samples: 166729630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:32:13,970][134211] Avg episode reward: [(0, '8.439')] [2025-01-04 10:32:16,250][134294] Updated weights for policy 0, policy_version 173404 (0.0028) [2025-01-04 10:32:18,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13721.8, 300 sec: 13468.2). Total num frames: 710291456. Throughput: 0: 3567.7. Samples: 166738564. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:32:18,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 10:32:20,125][134294] Updated weights for policy 0, policy_version 173414 (0.0031) [2025-01-04 10:32:23,614][134294] Updated weights for policy 0, policy_version 173424 (0.0027) [2025-01-04 10:32:23,968][134211] Fps is (10 sec: 11058.7, 60 sec: 13653.2, 300 sec: 13426.5). Total num frames: 710344704. Throughput: 0: 3483.2. Samples: 166755230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:32:23,969][134211] Avg episode reward: [(0, '8.584')] [2025-01-04 10:32:26,954][134294] Updated weights for policy 0, policy_version 173434 (0.0024) [2025-01-04 10:32:28,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13721.6, 300 sec: 13412.7). Total num frames: 710410240. Throughput: 0: 3457.2. Samples: 166773804. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:28,968][134211] Avg episode reward: [(0, '9.665')] [2025-01-04 10:32:29,941][134294] Updated weights for policy 0, policy_version 173444 (0.0027) [2025-01-04 10:32:32,869][134294] Updated weights for policy 0, policy_version 173454 (0.0027) [2025-01-04 10:32:33,968][134211] Fps is (10 sec: 13517.5, 60 sec: 13721.6, 300 sec: 13412.7). Total num frames: 710479872. Throughput: 0: 3456.2. Samples: 166783992. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:33,968][134211] Avg episode reward: [(0, '9.854')] [2025-01-04 10:32:35,965][134294] Updated weights for policy 0, policy_version 173464 (0.0026) [2025-01-04 10:32:38,827][134294] Updated weights for policy 0, policy_version 173474 (0.0024) [2025-01-04 10:32:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13789.9, 300 sec: 13426.6). Total num frames: 710549504. Throughput: 0: 3496.4. Samples: 166804758. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:38,968][134211] Avg episode reward: [(0, '9.320')] [2025-01-04 10:32:41,842][134294] Updated weights for policy 0, policy_version 173484 (0.0028) [2025-01-04 10:32:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.8, 300 sec: 13412.7). Total num frames: 710615040. Throughput: 0: 3326.7. Samples: 166825236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:43,968][134211] Avg episode reward: [(0, '9.256')] [2025-01-04 10:32:44,966][134294] Updated weights for policy 0, policy_version 173494 (0.0024) [2025-01-04 10:32:48,007][134294] Updated weights for policy 0, policy_version 173504 (0.0025) [2025-01-04 10:32:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13789.9, 300 sec: 13426.6). Total num frames: 710680576. Throughput: 0: 3208.0. Samples: 166835244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:48,968][134211] Avg episode reward: [(0, '8.929')] [2025-01-04 10:32:51,495][134294] Updated weights for policy 0, policy_version 173514 (0.0025) [2025-01-04 10:32:53,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13721.5, 300 sec: 13398.8). Total num frames: 710742016. Throughput: 0: 3156.9. Samples: 166853718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:53,968][134211] Avg episode reward: [(0, '9.258')] [2025-01-04 10:32:54,829][134294] Updated weights for policy 0, policy_version 173524 (0.0024) [2025-01-04 10:32:58,024][134294] Updated weights for policy 0, policy_version 173534 (0.0026) [2025-01-04 10:32:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 13371.0). Total num frames: 710803456. Throughput: 0: 3173.1. Samples: 166872420. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:32:58,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 10:33:01,387][134294] Updated weights for policy 0, policy_version 173544 (0.0025) [2025-01-04 10:33:03,969][134211] Fps is (10 sec: 12286.6, 60 sec: 12561.1, 300 sec: 13357.1). Total num frames: 710864896. Throughput: 0: 3181.6. Samples: 166881740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:03,971][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 10:33:04,845][134294] Updated weights for policy 0, policy_version 173554 (0.0029) [2025-01-04 10:33:07,990][134294] Updated weights for policy 0, policy_version 173564 (0.0024) [2025-01-04 10:33:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12561.0, 300 sec: 13329.4). Total num frames: 710926336. Throughput: 0: 3217.5. Samples: 166900018. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:08,968][134211] Avg episode reward: [(0, '9.366')] [2025-01-04 10:33:11,346][134294] Updated weights for policy 0, policy_version 173574 (0.0028) [2025-01-04 10:33:13,968][134211] Fps is (10 sec: 12289.6, 60 sec: 12561.1, 300 sec: 13329.4). Total num frames: 710987776. Throughput: 0: 3209.2. Samples: 166918220. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:13,968][134211] Avg episode reward: [(0, '8.081')] [2025-01-04 10:33:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173581_710987776.pth... [2025-01-04 10:33:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000172805_707809280.pth [2025-01-04 10:33:14,700][134294] Updated weights for policy 0, policy_version 173584 (0.0026) [2025-01-04 10:33:17,151][134294] Updated weights for policy 0, policy_version 173594 (0.0015) [2025-01-04 10:33:18,968][134211] Fps is (10 sec: 15155.6, 60 sec: 13107.3, 300 sec: 13440.5). Total num frames: 711077888. 
Throughput: 0: 3206.6. Samples: 166928290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:18,968][134211] Avg episode reward: [(0, '9.972')] [2025-01-04 10:33:19,146][134294] Updated weights for policy 0, policy_version 173604 (0.0014) [2025-01-04 10:33:21,119][134294] Updated weights for policy 0, policy_version 173614 (0.0013) [2025-01-04 10:33:23,066][134294] Updated weights for policy 0, policy_version 173624 (0.0016) [2025-01-04 10:33:23,968][134211] Fps is (10 sec: 18841.6, 60 sec: 13858.3, 300 sec: 13551.5). Total num frames: 711176192. Throughput: 0: 3436.8. Samples: 166959414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:23,968][134211] Avg episode reward: [(0, '9.914')] [2025-01-04 10:33:26,384][134294] Updated weights for policy 0, policy_version 173634 (0.0028) [2025-01-04 10:33:28,968][134211] Fps is (10 sec: 15154.7, 60 sec: 13653.3, 300 sec: 13398.8). Total num frames: 711229440. Throughput: 0: 3420.1. Samples: 166979140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:33:28,969][134211] Avg episode reward: [(0, '9.746')] [2025-01-04 10:33:30,098][134294] Updated weights for policy 0, policy_version 173644 (0.0031) [2025-01-04 10:33:33,968][134211] Fps is (10 sec: 10649.3, 60 sec: 13380.2, 300 sec: 13343.2). Total num frames: 711282688. Throughput: 0: 3378.8. Samples: 166987290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:33,969][134211] Avg episode reward: [(0, '9.466')] [2025-01-04 10:33:34,332][134294] Updated weights for policy 0, policy_version 173654 (0.0033) [2025-01-04 10:33:37,582][134294] Updated weights for policy 0, policy_version 173664 (0.0027) [2025-01-04 10:33:38,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13175.4, 300 sec: 13315.5). Total num frames: 711340032. Throughput: 0: 3334.2. Samples: 167003756. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:38,968][134211] Avg episode reward: [(0, '9.092')] [2025-01-04 10:33:41,023][134294] Updated weights for policy 0, policy_version 173674 (0.0028) [2025-01-04 10:33:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13175.5, 300 sec: 13301.6). Total num frames: 711405568. Throughput: 0: 3327.7. Samples: 167022166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:43,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 10:33:44,334][134294] Updated weights for policy 0, policy_version 173684 (0.0027) [2025-01-04 10:33:47,655][134294] Updated weights for policy 0, policy_version 173694 (0.0028) [2025-01-04 10:33:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13039.0, 300 sec: 13287.7). Total num frames: 711462912. Throughput: 0: 3322.5. Samples: 167031248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:48,968][134211] Avg episode reward: [(0, '8.567')] [2025-01-04 10:33:50,957][134294] Updated weights for policy 0, policy_version 173704 (0.0025) [2025-01-04 10:33:53,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13107.2, 300 sec: 13301.6). Total num frames: 711528448. Throughput: 0: 3332.4. Samples: 167049974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:53,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 10:33:54,190][134294] Updated weights for policy 0, policy_version 173714 (0.0027) [2025-01-04 10:33:57,431][134294] Updated weights for policy 0, policy_version 173724 (0.0027) [2025-01-04 10:33:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13107.2, 300 sec: 13273.8). Total num frames: 711589888. Throughput: 0: 3348.0. Samples: 167068880. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:33:58,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 10:34:00,573][134294] Updated weights for policy 0, policy_version 173734 (0.0025) [2025-01-04 10:34:03,599][134294] Updated weights for policy 0, policy_version 173744 (0.0027) [2025-01-04 10:34:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13244.0, 300 sec: 13287.7). Total num frames: 711659520. Throughput: 0: 3345.9. Samples: 167078858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:03,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 10:34:06,587][134294] Updated weights for policy 0, policy_version 173754 (0.0028) [2025-01-04 10:34:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13312.0, 300 sec: 13273.8). Total num frames: 711725056. Throughput: 0: 3103.7. Samples: 167099080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:08,968][134211] Avg episode reward: [(0, '9.555')] [2025-01-04 10:34:09,792][134294] Updated weights for policy 0, policy_version 173764 (0.0027) [2025-01-04 10:34:12,812][134294] Updated weights for policy 0, policy_version 173774 (0.0025) [2025-01-04 10:34:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.2, 300 sec: 13259.9). Total num frames: 711790592. Throughput: 0: 3107.6. Samples: 167118984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:13,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 10:34:16,330][134294] Updated weights for policy 0, policy_version 173784 (0.0028) [2025-01-04 10:34:18,955][134294] Updated weights for policy 0, policy_version 173794 (0.0015) [2025-01-04 10:34:18,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13038.8, 300 sec: 13287.7). Total num frames: 711860224. Throughput: 0: 3119.9. Samples: 167127684. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:18,969][134211] Avg episode reward: [(0, '10.366')] [2025-01-04 10:34:21,251][134294] Updated weights for policy 0, policy_version 173804 (0.0015) [2025-01-04 10:34:23,457][134294] Updated weights for policy 0, policy_version 173814 (0.0018) [2025-01-04 10:34:23,969][134211] Fps is (10 sec: 15563.6, 60 sec: 12833.9, 300 sec: 13343.2). Total num frames: 711946240. Throughput: 0: 3320.3. Samples: 167153174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:23,969][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 10:34:26,939][134294] Updated weights for policy 0, policy_version 173824 (0.0031) [2025-01-04 10:34:28,968][134211] Fps is (10 sec: 14336.4, 60 sec: 12902.4, 300 sec: 13301.6). Total num frames: 712003584. Throughput: 0: 3342.8. Samples: 167172592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:28,968][134211] Avg episode reward: [(0, '10.101')] [2025-01-04 10:34:30,375][134294] Updated weights for policy 0, policy_version 173834 (0.0027) [2025-01-04 10:34:33,706][134294] Updated weights for policy 0, policy_version 173844 (0.0029) [2025-01-04 10:34:33,968][134211] Fps is (10 sec: 11879.4, 60 sec: 13039.0, 300 sec: 13287.7). Total num frames: 712065024. Throughput: 0: 3341.8. Samples: 167181628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:33,968][134211] Avg episode reward: [(0, '8.524')] [2025-01-04 10:34:36,814][134294] Updated weights for policy 0, policy_version 173854 (0.0024) [2025-01-04 10:34:38,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13175.5, 300 sec: 13218.3). Total num frames: 712130560. Throughput: 0: 3352.6. Samples: 167200840. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:38,968][134211] Avg episode reward: [(0, '10.181')] [2025-01-04 10:34:40,293][134294] Updated weights for policy 0, policy_version 173864 (0.0029) [2025-01-04 10:34:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12970.6, 300 sec: 13176.6). Total num frames: 712183808. Throughput: 0: 3299.5. Samples: 167217356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:43,969][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 10:34:44,066][134294] Updated weights for policy 0, policy_version 173874 (0.0033) [2025-01-04 10:34:47,695][134294] Updated weights for policy 0, policy_version 173884 (0.0029) [2025-01-04 10:34:48,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12970.7, 300 sec: 13176.6). Total num frames: 712241152. Throughput: 0: 3271.3. Samples: 167226066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:48,968][134211] Avg episode reward: [(0, '9.690')] [2025-01-04 10:34:51,427][134294] Updated weights for policy 0, policy_version 173894 (0.0029) [2025-01-04 10:34:53,908][134294] Updated weights for policy 0, policy_version 173904 (0.0016) [2025-01-04 10:34:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13039.0, 300 sec: 13176.6). Total num frames: 712310784. Throughput: 0: 3195.9. Samples: 167242894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:53,968][134211] Avg episode reward: [(0, '9.743')] [2025-01-04 10:34:56,111][134294] Updated weights for policy 0, policy_version 173914 (0.0013) [2025-01-04 10:34:58,186][134294] Updated weights for policy 0, policy_version 173924 (0.0013) [2025-01-04 10:34:58,968][134211] Fps is (10 sec: 16384.1, 60 sec: 13585.1, 300 sec: 13273.8). Total num frames: 712404992. Throughput: 0: 3392.1. Samples: 167271628. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:34:58,968][134211] Avg episode reward: [(0, '9.111')] [2025-01-04 10:35:00,391][134294] Updated weights for policy 0, policy_version 173934 (0.0014) [2025-01-04 10:35:03,968][134211] Fps is (10 sec: 15564.3, 60 sec: 13448.5, 300 sec: 13273.8). Total num frames: 712466432. Throughput: 0: 3478.5. Samples: 167284216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:03,969][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 10:35:04,917][134294] Updated weights for policy 0, policy_version 173944 (0.0034) [2025-01-04 10:35:08,968][134211] Fps is (10 sec: 10649.4, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 712511488. Throughput: 0: 3202.9. Samples: 167297304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:08,969][134211] Avg episode reward: [(0, '9.349')] [2025-01-04 10:35:09,225][134294] Updated weights for policy 0, policy_version 173954 (0.0030) [2025-01-04 10:35:12,501][134294] Updated weights for policy 0, policy_version 173964 (0.0025) [2025-01-04 10:35:13,968][134211] Fps is (10 sec: 11059.5, 60 sec: 13107.2, 300 sec: 13135.0). Total num frames: 712577024. Throughput: 0: 3167.2. Samples: 167315116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:13,968][134211] Avg episode reward: [(0, '10.097')] [2025-01-04 10:35:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173970_712581120.pth... [2025-01-04 10:35:14,034][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173195_709406720.pth [2025-01-04 10:35:14,937][134294] Updated weights for policy 0, policy_version 173974 (0.0015) [2025-01-04 10:35:18,570][134294] Updated weights for policy 0, policy_version 173984 (0.0027) [2025-01-04 10:35:18,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13038.9, 300 sec: 13148.9). Total num frames: 712642560. 
Throughput: 0: 3222.2. Samples: 167326630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:18,969][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 10:35:22,253][134294] Updated weights for policy 0, policy_version 173994 (0.0027) [2025-01-04 10:35:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12493.0, 300 sec: 13135.0). Total num frames: 712695808. Throughput: 0: 3155.3. Samples: 167342828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:23,968][134211] Avg episode reward: [(0, '8.679')] [2025-01-04 10:35:25,882][134294] Updated weights for policy 0, policy_version 174004 (0.0029) [2025-01-04 10:35:28,968][134211] Fps is (10 sec: 11059.5, 60 sec: 12492.8, 300 sec: 13121.1). Total num frames: 712753152. Throughput: 0: 3174.5. Samples: 167360208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:28,969][134211] Avg episode reward: [(0, '9.326')] [2025-01-04 10:35:29,567][134294] Updated weights for policy 0, policy_version 174014 (0.0030) [2025-01-04 10:35:33,095][134294] Updated weights for policy 0, policy_version 174024 (0.0030) [2025-01-04 10:35:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12561.1, 300 sec: 13107.2). Total num frames: 712818688. Throughput: 0: 3154.2. Samples: 167368006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:33,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 10:35:35,341][134294] Updated weights for policy 0, policy_version 174034 (0.0013) [2025-01-04 10:35:37,425][134294] Updated weights for policy 0, policy_version 174044 (0.0015) [2025-01-04 10:35:38,968][134211] Fps is (10 sec: 15564.8, 60 sec: 12970.7, 300 sec: 13204.4). Total num frames: 712908800. Throughput: 0: 3336.8. Samples: 167393050. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:35:38,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 10:35:39,808][134294] Updated weights for policy 0, policy_version 174054 (0.0020) [2025-01-04 10:35:43,015][134294] Updated weights for policy 0, policy_version 174064 (0.0028) [2025-01-04 10:35:43,968][134211] Fps is (10 sec: 15564.2, 60 sec: 13175.4, 300 sec: 13232.2). Total num frames: 712974336. Throughput: 0: 3191.0. Samples: 167415222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:35:43,969][134211] Avg episode reward: [(0, '8.973')] [2025-01-04 10:35:46,212][134294] Updated weights for policy 0, policy_version 174074 (0.0027) [2025-01-04 10:35:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.7, 300 sec: 13287.7). Total num frames: 713035776. Throughput: 0: 3126.6. Samples: 167424914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:35:48,969][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 10:35:49,903][134294] Updated weights for policy 0, policy_version 174084 (0.0029) [2025-01-04 10:35:53,197][134294] Updated weights for policy 0, policy_version 174094 (0.0027) [2025-01-04 10:35:53,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13107.2, 300 sec: 13190.5). Total num frames: 713097216. Throughput: 0: 3218.3. Samples: 167442128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:35:53,968][134211] Avg episode reward: [(0, '9.487')] [2025-01-04 10:35:56,813][134294] Updated weights for policy 0, policy_version 174104 (0.0025) [2025-01-04 10:35:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 13121.1). Total num frames: 713154560. Throughput: 0: 3213.1. Samples: 167459706. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:35:58,969][134211] Avg episode reward: [(0, '9.090')] [2025-01-04 10:36:00,302][134294] Updated weights for policy 0, policy_version 174114 (0.0029) [2025-01-04 10:36:03,457][134294] Updated weights for policy 0, policy_version 174124 (0.0027) [2025-01-04 10:36:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12492.8, 300 sec: 13121.1). Total num frames: 713216000. Throughput: 0: 3160.2. Samples: 167468838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:03,968][134211] Avg episode reward: [(0, '10.082')] [2025-01-04 10:36:06,557][134294] Updated weights for policy 0, policy_version 174134 (0.0030) [2025-01-04 10:36:08,969][134211] Fps is (10 sec: 12695.6, 60 sec: 12833.8, 300 sec: 13134.9). Total num frames: 713281536. Throughput: 0: 3235.7. Samples: 167488438. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:08,970][134211] Avg episode reward: [(0, '8.777')] [2025-01-04 10:36:09,705][134294] Updated weights for policy 0, policy_version 174144 (0.0025) [2025-01-04 10:36:12,962][134294] Updated weights for policy 0, policy_version 174154 (0.0026) [2025-01-04 10:36:13,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12834.1, 300 sec: 13148.9). Total num frames: 713347072. Throughput: 0: 3275.2. Samples: 167507592. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:13,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 10:36:16,349][134294] Updated weights for policy 0, policy_version 174164 (0.0028) [2025-01-04 10:36:18,968][134211] Fps is (10 sec: 11880.3, 60 sec: 12629.4, 300 sec: 13135.0). Total num frames: 713400320. Throughput: 0: 3302.1. Samples: 167516602. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:18,968][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 10:36:20,040][134294] Updated weights for policy 0, policy_version 174174 (0.0028) [2025-01-04 10:36:23,671][134294] Updated weights for policy 0, policy_version 174184 (0.0028) [2025-01-04 10:36:23,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12697.6, 300 sec: 13121.1). Total num frames: 713457664. Throughput: 0: 3117.1. Samples: 167533318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:23,968][134211] Avg episode reward: [(0, '8.398')] [2025-01-04 10:36:27,149][134294] Updated weights for policy 0, policy_version 174194 (0.0028) [2025-01-04 10:36:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12765.9, 300 sec: 13093.3). Total num frames: 713519104. Throughput: 0: 3024.4. Samples: 167551318. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:28,968][134211] Avg episode reward: [(0, '9.972')] [2025-01-04 10:36:30,272][134294] Updated weights for policy 0, policy_version 174204 (0.0027) [2025-01-04 10:36:33,263][134294] Updated weights for policy 0, policy_version 174214 (0.0026) [2025-01-04 10:36:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 12765.8, 300 sec: 13093.3). Total num frames: 713584640. Throughput: 0: 3028.3. Samples: 167561186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:33,968][134211] Avg episode reward: [(0, '9.957')] [2025-01-04 10:36:36,355][134294] Updated weights for policy 0, policy_version 174224 (0.0026) [2025-01-04 10:36:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12424.5, 300 sec: 13107.2). Total num frames: 713654272. Throughput: 0: 3090.1. Samples: 167581184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:38,968][134211] Avg episode reward: [(0, '8.932')] [2025-01-04 10:36:39,555][134294] Updated weights for policy 0, policy_version 174234 (0.0028) [2025-01-04 10:36:42,621][134294] Updated weights for policy 0, policy_version 174244 (0.0025) [2025-01-04 10:36:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 12424.6, 300 sec: 13107.2). Total num frames: 713719808. Throughput: 0: 3136.4. Samples: 167600842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:43,968][134211] Avg episode reward: [(0, '8.696')] [2025-01-04 10:36:45,673][134294] Updated weights for policy 0, policy_version 174254 (0.0028) [2025-01-04 10:36:48,732][134294] Updated weights for policy 0, policy_version 174264 (0.0025) [2025-01-04 10:36:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12561.1, 300 sec: 13121.1). Total num frames: 713789440. Throughput: 0: 3160.5. Samples: 167611060. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:48,968][134211] Avg episode reward: [(0, '8.502')] [2025-01-04 10:36:51,968][134294] Updated weights for policy 0, policy_version 174274 (0.0028) [2025-01-04 10:36:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 12492.8, 300 sec: 12954.5). Total num frames: 713846784. Throughput: 0: 3154.3. Samples: 167630378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:53,969][134211] Avg episode reward: [(0, '8.217')] [2025-01-04 10:36:55,235][134294] Updated weights for policy 0, policy_version 174284 (0.0028) [2025-01-04 10:36:58,494][134294] Updated weights for policy 0, policy_version 174294 (0.0026) [2025-01-04 10:36:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12629.3, 300 sec: 12885.1). Total num frames: 713912320. Throughput: 0: 3148.4. Samples: 167649270. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:36:58,968][134211] Avg episode reward: [(0, '8.767')] [2025-01-04 10:37:01,927][134294] Updated weights for policy 0, policy_version 174304 (0.0025) [2025-01-04 10:37:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 12765.9, 300 sec: 12912.8). Total num frames: 713981952. Throughput: 0: 3149.7. Samples: 167658340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:03,968][134211] Avg episode reward: [(0, '10.185')] [2025-01-04 10:37:04,262][134294] Updated weights for policy 0, policy_version 174314 (0.0016) [2025-01-04 10:37:07,148][134294] Updated weights for policy 0, policy_version 174324 (0.0023) [2025-01-04 10:37:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 12834.5, 300 sec: 12940.6). Total num frames: 714051584. Throughput: 0: 3286.4. Samples: 167681208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:08,968][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 10:37:10,321][134294] Updated weights for policy 0, policy_version 174334 (0.0027) [2025-01-04 10:37:13,343][134294] Updated weights for policy 0, policy_version 174344 (0.0026) [2025-01-04 10:37:13,968][134211] Fps is (10 sec: 13516.4, 60 sec: 12834.1, 300 sec: 12968.3). Total num frames: 714117120. Throughput: 0: 3322.6. Samples: 167700834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:13,969][134211] Avg episode reward: [(0, '8.705')] [2025-01-04 10:37:14,030][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000174346_714121216.pth... [2025-01-04 10:37:14,103][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173581_710987776.pth [2025-01-04 10:37:16,496][134294] Updated weights for policy 0, policy_version 174354 (0.0028) [2025-01-04 10:37:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13038.9, 300 sec: 13010.0). Total num frames: 714182656. 
Throughput: 0: 3317.4. Samples: 167710470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:18,968][134211] Avg episode reward: [(0, '8.989')] [2025-01-04 10:37:19,671][134294] Updated weights for policy 0, policy_version 174364 (0.0028) [2025-01-04 10:37:23,064][134294] Updated weights for policy 0, policy_version 174374 (0.0027) [2025-01-04 10:37:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13107.1, 300 sec: 12996.1). Total num frames: 714244096. Throughput: 0: 3300.5. Samples: 167729706. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:23,969][134211] Avg episode reward: [(0, '9.459')] [2025-01-04 10:37:26,494][134294] Updated weights for policy 0, policy_version 174384 (0.0028) [2025-01-04 10:37:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13175.5, 300 sec: 12982.2). Total num frames: 714309632. Throughput: 0: 3262.3. Samples: 167747646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:28,968][134211] Avg episode reward: [(0, '9.646')] [2025-01-04 10:37:29,238][134294] Updated weights for policy 0, policy_version 174394 (0.0018) [2025-01-04 10:37:31,252][134294] Updated weights for policy 0, policy_version 174404 (0.0014) [2025-01-04 10:37:33,127][134294] Updated weights for policy 0, policy_version 174414 (0.0013) [2025-01-04 10:37:33,968][134211] Fps is (10 sec: 17203.8, 60 sec: 13858.2, 300 sec: 13107.2). Total num frames: 714416128. Throughput: 0: 3370.2. Samples: 167762718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:33,968][134211] Avg episode reward: [(0, '9.568')] [2025-01-04 10:37:35,066][134294] Updated weights for policy 0, policy_version 174424 (0.0013) [2025-01-04 10:37:36,943][134294] Updated weights for policy 0, policy_version 174434 (0.0011) [2025-01-04 10:37:38,968][134211] Fps is (10 sec: 20070.1, 60 sec: 14267.8, 300 sec: 13204.4). Total num frames: 714510336. Throughput: 0: 3654.7. Samples: 167794838. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:38,968][134211] Avg episode reward: [(0, '9.613')] [2025-01-04 10:37:39,764][134294] Updated weights for policy 0, policy_version 174444 (0.0026) [2025-01-04 10:37:43,311][134294] Updated weights for policy 0, policy_version 174454 (0.0029) [2025-01-04 10:37:43,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14131.2, 300 sec: 13176.6). Total num frames: 714567680. Throughput: 0: 3653.7. Samples: 167813686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:43,969][134211] Avg episode reward: [(0, '8.477')] [2025-01-04 10:37:46,616][134294] Updated weights for policy 0, policy_version 174464 (0.0028) [2025-01-04 10:37:48,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13994.7, 300 sec: 13176.6). Total num frames: 714629120. Throughput: 0: 3655.4. Samples: 167822832. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:37:48,968][134211] Avg episode reward: [(0, '10.215')] [2025-01-04 10:37:50,040][134294] Updated weights for policy 0, policy_version 174474 (0.0027) [2025-01-04 10:37:53,412][134294] Updated weights for policy 0, policy_version 174484 (0.0029) [2025-01-04 10:37:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14062.9, 300 sec: 13176.6). Total num frames: 714690560. Throughput: 0: 3555.7. Samples: 167841216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:37:53,968][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 10:37:56,732][134294] Updated weights for policy 0, policy_version 174494 (0.0028) [2025-01-04 10:37:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13994.7, 300 sec: 13176.7). Total num frames: 714752000. Throughput: 0: 3527.4. Samples: 167859568. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:37:58,968][134211] Avg episode reward: [(0, '9.193')] [2025-01-04 10:38:00,154][134294] Updated weights for policy 0, policy_version 174504 (0.0026) [2025-01-04 10:38:03,323][134294] Updated weights for policy 0, policy_version 174514 (0.0025) [2025-01-04 10:38:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13858.1, 300 sec: 13176.6). Total num frames: 714813440. Throughput: 0: 3514.3. Samples: 167868612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:03,968][134211] Avg episode reward: [(0, '9.175')] [2025-01-04 10:38:06,508][134294] Updated weights for policy 0, policy_version 174524 (0.0027) [2025-01-04 10:38:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13789.9, 300 sec: 13190.5). Total num frames: 714878976. Throughput: 0: 3511.1. Samples: 167887706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:08,968][134211] Avg episode reward: [(0, '8.587')] [2025-01-04 10:38:09,851][134294] Updated weights for policy 0, policy_version 174534 (0.0027) [2025-01-04 10:38:13,446][134294] Updated weights for policy 0, policy_version 174544 (0.0029) [2025-01-04 10:38:13,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13585.1, 300 sec: 13065.5). Total num frames: 714932224. Throughput: 0: 3507.4. Samples: 167905482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:13,971][134211] Avg episode reward: [(0, '10.121')] [2025-01-04 10:38:17,576][134294] Updated weights for policy 0, policy_version 174554 (0.0030) [2025-01-04 10:38:18,968][134211] Fps is (10 sec: 11059.0, 60 sec: 13448.5, 300 sec: 12926.7). Total num frames: 714989568. Throughput: 0: 3334.2. Samples: 167912760. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:18,969][134211] Avg episode reward: [(0, '9.391')] [2025-01-04 10:38:20,944][134294] Updated weights for policy 0, policy_version 174564 (0.0027) [2025-01-04 10:38:23,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13312.0, 300 sec: 12926.7). Total num frames: 715042816. Throughput: 0: 3005.8. Samples: 167930098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:23,969][134211] Avg episode reward: [(0, '8.987')] [2025-01-04 10:38:24,652][134294] Updated weights for policy 0, policy_version 174574 (0.0027) [2025-01-04 10:38:26,686][134294] Updated weights for policy 0, policy_version 174584 (0.0014) [2025-01-04 10:38:28,816][134294] Updated weights for policy 0, policy_version 174594 (0.0014) [2025-01-04 10:38:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13789.8, 300 sec: 13065.5). Total num frames: 715137024. Throughput: 0: 3133.6. Samples: 167954700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:28,969][134211] Avg episode reward: [(0, '8.515')] [2025-01-04 10:38:30,969][134294] Updated weights for policy 0, policy_version 174604 (0.0014) [2025-01-04 10:38:33,020][134294] Updated weights for policy 0, policy_version 174614 (0.0013) [2025-01-04 10:38:33,968][134211] Fps is (10 sec: 19251.4, 60 sec: 13653.3, 300 sec: 13204.4). Total num frames: 715235328. Throughput: 0: 3243.9. Samples: 167968810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:33,968][134211] Avg episode reward: [(0, '8.191')] [2025-01-04 10:38:36,016][134294] Updated weights for policy 0, policy_version 174624 (0.0025) [2025-01-04 10:38:38,971][134211] Fps is (10 sec: 15150.6, 60 sec: 12970.0, 300 sec: 13162.6). Total num frames: 715288576. Throughput: 0: 3333.4. Samples: 167991230. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:38,972][134211] Avg episode reward: [(0, '9.114')] [2025-01-04 10:38:39,960][134294] Updated weights for policy 0, policy_version 174634 (0.0033) [2025-01-04 10:38:43,468][134294] Updated weights for policy 0, policy_version 174644 (0.0030) [2025-01-04 10:38:43,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12970.7, 300 sec: 13162.7). Total num frames: 715345920. Throughput: 0: 3295.1. Samples: 168007846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:43,968][134211] Avg episode reward: [(0, '8.985')] [2025-01-04 10:38:47,007][134294] Updated weights for policy 0, policy_version 174654 (0.0030) [2025-01-04 10:38:48,968][134211] Fps is (10 sec: 11062.8, 60 sec: 12834.1, 300 sec: 13121.1). Total num frames: 715399168. Throughput: 0: 3281.6. Samples: 168016282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:48,968][134211] Avg episode reward: [(0, '8.262')] [2025-01-04 10:38:50,527][134294] Updated weights for policy 0, policy_version 174664 (0.0032) [2025-01-04 10:38:53,797][134294] Updated weights for policy 0, policy_version 174674 (0.0025) [2025-01-04 10:38:53,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12902.4, 300 sec: 13135.0). Total num frames: 715464704. Throughput: 0: 3262.3. Samples: 168034512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:53,969][134211] Avg episode reward: [(0, '9.275')] [2025-01-04 10:38:57,133][134294] Updated weights for policy 0, policy_version 174684 (0.0027) [2025-01-04 10:38:58,968][134211] Fps is (10 sec: 12696.9, 60 sec: 12902.3, 300 sec: 13107.2). Total num frames: 715526144. Throughput: 0: 3267.4. Samples: 168052518. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:38:58,969][134211] Avg episode reward: [(0, '10.467')] [2025-01-04 10:39:00,698][134294] Updated weights for policy 0, policy_version 174694 (0.0026) [2025-01-04 10:39:03,968][134211] Fps is (10 sec: 11468.8, 60 sec: 12765.8, 300 sec: 13065.5). Total num frames: 715579392. Throughput: 0: 3289.7. Samples: 168060798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:03,969][134211] Avg episode reward: [(0, '10.050')] [2025-01-04 10:39:04,557][134294] Updated weights for policy 0, policy_version 174704 (0.0029) [2025-01-04 10:39:08,350][134294] Updated weights for policy 0, policy_version 174714 (0.0027) [2025-01-04 10:39:08,967][134211] Fps is (10 sec: 11060.1, 60 sec: 12629.4, 300 sec: 13037.8). Total num frames: 715636736. Throughput: 0: 3266.3. Samples: 168077082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:08,968][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 10:39:10,428][134294] Updated weights for policy 0, policy_version 174724 (0.0015) [2025-01-04 10:39:13,345][134294] Updated weights for policy 0, policy_version 174734 (0.0025) [2025-01-04 10:39:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 715718656. Throughput: 0: 3242.6. Samples: 168100618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:13,968][134211] Avg episode reward: [(0, '9.824')] [2025-01-04 10:39:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000174736_715718656.pth... [2025-01-04 10:39:14,070][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000173970_712581120.pth [2025-01-04 10:39:16,597][134294] Updated weights for policy 0, policy_version 174744 (0.0024) [2025-01-04 10:39:18,968][134211] Fps is (10 sec: 14335.5, 60 sec: 13175.5, 300 sec: 12996.2). Total num frames: 715780096. 
Throughput: 0: 3128.8. Samples: 168109608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:18,968][134211] Avg episode reward: [(0, '9.839')] [2025-01-04 10:39:19,818][134294] Updated weights for policy 0, policy_version 174754 (0.0028) [2025-01-04 10:39:23,099][134294] Updated weights for policy 0, policy_version 174764 (0.0026) [2025-01-04 10:39:23,970][134211] Fps is (10 sec: 11875.9, 60 sec: 13243.3, 300 sec: 12996.0). Total num frames: 715837440. Throughput: 0: 3052.7. Samples: 168128600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:23,971][134211] Avg episode reward: [(0, '9.921')] [2025-01-04 10:39:26,798][134294] Updated weights for policy 0, policy_version 174774 (0.0029) [2025-01-04 10:39:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12765.9, 300 sec: 13010.0). Total num frames: 715902976. Throughput: 0: 3078.0. Samples: 168146356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:28,968][134211] Avg episode reward: [(0, '8.969')] [2025-01-04 10:39:29,962][134294] Updated weights for policy 0, policy_version 174784 (0.0030) [2025-01-04 10:39:33,354][134294] Updated weights for policy 0, policy_version 174794 (0.0027) [2025-01-04 10:39:33,968][134211] Fps is (10 sec: 12290.7, 60 sec: 12083.2, 300 sec: 12982.2). Total num frames: 715960320. Throughput: 0: 3089.0. Samples: 168155286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:33,968][134211] Avg episode reward: [(0, '9.763')] [2025-01-04 10:39:36,288][134294] Updated weights for policy 0, policy_version 174804 (0.0025) [2025-01-04 10:39:38,195][134294] Updated weights for policy 0, policy_version 174814 (0.0014) [2025-01-04 10:39:38,967][134211] Fps is (10 sec: 15155.6, 60 sec: 12766.6, 300 sec: 13121.1). Total num frames: 716054528. Throughput: 0: 3170.2. Samples: 168177170. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:38,968][134211] Avg episode reward: [(0, '9.982')] [2025-01-04 10:39:40,289][134294] Updated weights for policy 0, policy_version 174824 (0.0015) [2025-01-04 10:39:43,294][134294] Updated weights for policy 0, policy_version 174834 (0.0024) [2025-01-04 10:39:43,968][134211] Fps is (10 sec: 16793.1, 60 sec: 13038.9, 300 sec: 13176.6). Total num frames: 716128256. Throughput: 0: 3340.1. Samples: 168202820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:43,969][134211] Avg episode reward: [(0, '8.371')] [2025-01-04 10:39:46,574][134294] Updated weights for policy 0, policy_version 174844 (0.0027) [2025-01-04 10:39:48,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13175.5, 300 sec: 13148.8). Total num frames: 716189696. Throughput: 0: 3365.2. Samples: 168212232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:48,968][134211] Avg episode reward: [(0, '8.490')] [2025-01-04 10:39:49,745][134294] Updated weights for policy 0, policy_version 174854 (0.0026) [2025-01-04 10:39:53,173][134294] Updated weights for policy 0, policy_version 174864 (0.0028) [2025-01-04 10:39:53,968][134211] Fps is (10 sec: 12288.4, 60 sec: 13107.2, 300 sec: 13037.8). Total num frames: 716251136. Throughput: 0: 3424.3. Samples: 168231178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:53,968][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 10:39:56,424][134294] Updated weights for policy 0, policy_version 174874 (0.0026) [2025-01-04 10:39:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13107.3, 300 sec: 13037.8). Total num frames: 716312576. Throughput: 0: 3314.1. Samples: 168249754. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:39:58,968][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 10:39:59,667][134294] Updated weights for policy 0, policy_version 174884 (0.0027) [2025-01-04 10:40:02,962][134294] Updated weights for policy 0, policy_version 174894 (0.0028) [2025-01-04 10:40:03,969][134211] Fps is (10 sec: 12696.4, 60 sec: 13311.8, 300 sec: 13107.2). Total num frames: 716378112. Throughput: 0: 3319.8. Samples: 168259002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:03,970][134211] Avg episode reward: [(0, '10.759')] [2025-01-04 10:40:05,887][134294] Updated weights for policy 0, policy_version 174904 (0.0023) [2025-01-04 10:40:08,968][134211] Fps is (10 sec: 13106.2, 60 sec: 13448.3, 300 sec: 13107.2). Total num frames: 716443648. Throughput: 0: 3343.0. Samples: 168279030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:08,969][134211] Avg episode reward: [(0, '9.188')] [2025-01-04 10:40:09,242][134294] Updated weights for policy 0, policy_version 174914 (0.0029) [2025-01-04 10:40:12,554][134294] Updated weights for policy 0, policy_version 174924 (0.0026) [2025-01-04 10:40:13,968][134211] Fps is (10 sec: 12288.9, 60 sec: 13038.9, 300 sec: 13079.4). Total num frames: 716500992. Throughput: 0: 3351.9. Samples: 168297190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:13,969][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 10:40:16,294][134294] Updated weights for policy 0, policy_version 174934 (0.0029) [2025-01-04 10:40:18,968][134211] Fps is (10 sec: 11469.7, 60 sec: 12970.7, 300 sec: 13093.3). Total num frames: 716558336. Throughput: 0: 3331.2. Samples: 168305190. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:18,968][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 10:40:20,089][134294] Updated weights for policy 0, policy_version 174944 (0.0028) [2025-01-04 10:40:23,054][134294] Updated weights for policy 0, policy_version 174954 (0.0021) [2025-01-04 10:40:23,967][134211] Fps is (10 sec: 12288.4, 60 sec: 13107.7, 300 sec: 13121.1). Total num frames: 716623872. Throughput: 0: 3221.1. Samples: 168322120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:23,968][134211] Avg episode reward: [(0, '9.467')] [2025-01-04 10:40:25,225][134294] Updated weights for policy 0, policy_version 174964 (0.0013) [2025-01-04 10:40:27,112][134294] Updated weights for policy 0, policy_version 174974 (0.0012) [2025-01-04 10:40:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 13789.9, 300 sec: 13259.9). Total num frames: 716730368. Throughput: 0: 3321.8. Samples: 168352298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:28,968][134211] Avg episode reward: [(0, '9.686')] [2025-01-04 10:40:29,034][134294] Updated weights for policy 0, policy_version 174984 (0.0016) [2025-01-04 10:40:32,100][134294] Updated weights for policy 0, policy_version 174994 (0.0027) [2025-01-04 10:40:33,968][134211] Fps is (10 sec: 17202.6, 60 sec: 13926.4, 300 sec: 13176.6). Total num frames: 716795904. Throughput: 0: 3375.5. Samples: 168364128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:33,969][134211] Avg episode reward: [(0, '10.354')] [2025-01-04 10:40:35,430][134294] Updated weights for policy 0, policy_version 175004 (0.0029) [2025-01-04 10:40:38,524][134294] Updated weights for policy 0, policy_version 175014 (0.0026) [2025-01-04 10:40:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13448.5, 300 sec: 13176.6). Total num frames: 716861440. Throughput: 0: 3379.9. Samples: 168383274. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:38,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 10:40:41,672][134294] Updated weights for policy 0, policy_version 175024 (0.0029) [2025-01-04 10:40:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13312.0, 300 sec: 13190.5). Total num frames: 716926976. Throughput: 0: 3397.1. Samples: 168402624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:43,968][134211] Avg episode reward: [(0, '8.709')] [2025-01-04 10:40:44,927][134294] Updated weights for policy 0, policy_version 175034 (0.0028) [2025-01-04 10:40:48,062][134294] Updated weights for policy 0, policy_version 175044 (0.0024) [2025-01-04 10:40:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13312.0, 300 sec: 13190.5). Total num frames: 716988416. Throughput: 0: 3406.7. Samples: 168412302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:48,968][134211] Avg episode reward: [(0, '8.943')] [2025-01-04 10:40:51,596][134294] Updated weights for policy 0, policy_version 175054 (0.0031) [2025-01-04 10:40:53,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13243.7, 300 sec: 13190.5). Total num frames: 717045760. Throughput: 0: 3357.1. Samples: 168430098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:53,968][134211] Avg episode reward: [(0, '8.696')] [2025-01-04 10:40:55,120][134294] Updated weights for policy 0, policy_version 175064 (0.0028) [2025-01-04 10:40:58,371][134294] Updated weights for policy 0, policy_version 175074 (0.0026) [2025-01-04 10:40:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13243.7, 300 sec: 13190.5). Total num frames: 717107200. Throughput: 0: 3357.7. Samples: 168448286. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:40:58,968][134211] Avg episode reward: [(0, '8.561')] [2025-01-04 10:41:01,684][134294] Updated weights for policy 0, policy_version 175084 (0.0028) [2025-01-04 10:41:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13175.6, 300 sec: 13176.7). Total num frames: 717168640. Throughput: 0: 3387.6. Samples: 168457634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:41:03,968][134211] Avg episode reward: [(0, '10.266')] [2025-01-04 10:41:04,986][134294] Updated weights for policy 0, policy_version 175094 (0.0025) [2025-01-04 10:41:08,108][134294] Updated weights for policy 0, policy_version 175104 (0.0026) [2025-01-04 10:41:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13175.6, 300 sec: 13176.6). Total num frames: 717234176. Throughput: 0: 3431.1. Samples: 168476520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:08,968][134211] Avg episode reward: [(0, '9.362')] [2025-01-04 10:41:11,302][134294] Updated weights for policy 0, policy_version 175114 (0.0025) [2025-01-04 10:41:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13312.0, 300 sec: 13218.3). Total num frames: 717299712. Throughput: 0: 3193.1. Samples: 168495988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:13,969][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 10:41:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175122_717299712.pth... [2025-01-04 10:41:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000174346_714121216.pth [2025-01-04 10:41:14,448][134294] Updated weights for policy 0, policy_version 175124 (0.0024) [2025-01-04 10:41:18,013][134294] Updated weights for policy 0, policy_version 175134 (0.0027) [2025-01-04 10:41:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13312.0, 300 sec: 13218.3). Total num frames: 717357056. 
Throughput: 0: 3134.6. Samples: 168505184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:18,968][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 10:41:21,560][134294] Updated weights for policy 0, policy_version 175144 (0.0026) [2025-01-04 10:41:23,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13175.4, 300 sec: 13204.4). Total num frames: 717414400. Throughput: 0: 3088.7. Samples: 168522264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:23,968][134211] Avg episode reward: [(0, '8.944')] [2025-01-04 10:41:25,271][134294] Updated weights for policy 0, policy_version 175154 (0.0029) [2025-01-04 10:41:28,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12288.0, 300 sec: 13162.7). Total num frames: 717467648. Throughput: 0: 3025.7. Samples: 168538780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:28,969][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 10:41:29,044][134294] Updated weights for policy 0, policy_version 175164 (0.0030) [2025-01-04 10:41:31,353][134294] Updated weights for policy 0, policy_version 175174 (0.0013) [2025-01-04 10:41:33,533][134294] Updated weights for policy 0, policy_version 175184 (0.0014) [2025-01-04 10:41:33,968][134211] Fps is (10 sec: 14745.6, 60 sec: 12765.9, 300 sec: 13246.0). Total num frames: 717561856. Throughput: 0: 3073.1. Samples: 168550590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:33,968][134211] Avg episode reward: [(0, '10.418')] [2025-01-04 10:41:35,761][134294] Updated weights for policy 0, policy_version 175194 (0.0014) [2025-01-04 10:41:38,968][134211] Fps is (10 sec: 16384.2, 60 sec: 12834.1, 300 sec: 13259.9). Total num frames: 717631488. Throughput: 0: 3255.6. Samples: 168576602. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:38,968][134211] Avg episode reward: [(0, '10.785')] [2025-01-04 10:41:39,366][134294] Updated weights for policy 0, policy_version 175204 (0.0033) [2025-01-04 10:41:43,124][134294] Updated weights for policy 0, policy_version 175214 (0.0033) [2025-01-04 10:41:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12629.3, 300 sec: 13204.4). Total num frames: 717684736. Throughput: 0: 3199.3. Samples: 168592254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:43,969][134211] Avg episode reward: [(0, '9.272')] [2025-01-04 10:41:46,748][134294] Updated weights for policy 0, policy_version 175224 (0.0028) [2025-01-04 10:41:48,968][134211] Fps is (10 sec: 10649.6, 60 sec: 12492.8, 300 sec: 13190.5). Total num frames: 717737984. Throughput: 0: 3183.3. Samples: 168600884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:48,968][134211] Avg episode reward: [(0, '8.927')] [2025-01-04 10:41:50,535][134294] Updated weights for policy 0, policy_version 175234 (0.0025) [2025-01-04 10:41:53,968][134211] Fps is (10 sec: 11059.3, 60 sec: 12492.8, 300 sec: 13162.7). Total num frames: 717795328. Throughput: 0: 3133.1. Samples: 168617510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:53,969][134211] Avg episode reward: [(0, '9.377')] [2025-01-04 10:41:54,196][134294] Updated weights for policy 0, policy_version 175244 (0.0028) [2025-01-04 10:41:57,771][134294] Updated weights for policy 0, policy_version 175254 (0.0025) [2025-01-04 10:41:58,968][134211] Fps is (10 sec: 11468.9, 60 sec: 12424.6, 300 sec: 13121.1). Total num frames: 717852672. Throughput: 0: 3076.9. Samples: 168634446. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:41:58,968][134211] Avg episode reward: [(0, '8.767')] [2025-01-04 10:42:00,380][134294] Updated weights for policy 0, policy_version 175264 (0.0017) [2025-01-04 10:42:02,605][134294] Updated weights for policy 0, policy_version 175274 (0.0013) [2025-01-04 10:42:03,967][134211] Fps is (10 sec: 15155.7, 60 sec: 12970.7, 300 sec: 13204.4). Total num frames: 717946880. Throughput: 0: 3160.0. Samples: 168647384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:03,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 10:42:05,055][134294] Updated weights for policy 0, policy_version 175284 (0.0015) [2025-01-04 10:42:08,924][134294] Updated weights for policy 0, policy_version 175294 (0.0031) [2025-01-04 10:42:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 12834.1, 300 sec: 13176.6). Total num frames: 718004224. Throughput: 0: 3282.2. Samples: 168669964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:08,969][134211] Avg episode reward: [(0, '9.458')] [2025-01-04 10:42:12,981][134294] Updated weights for policy 0, policy_version 175304 (0.0032) [2025-01-04 10:42:13,968][134211] Fps is (10 sec: 10649.2, 60 sec: 12561.1, 300 sec: 13121.1). Total num frames: 718053376. Throughput: 0: 3247.4. Samples: 168684912. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:13,969][134211] Avg episode reward: [(0, '10.066')] [2025-01-04 10:42:16,569][134294] Updated weights for policy 0, policy_version 175314 (0.0030) [2025-01-04 10:42:18,968][134211] Fps is (10 sec: 10649.7, 60 sec: 12561.1, 300 sec: 13107.2). Total num frames: 718110720. Throughput: 0: 3178.8. Samples: 168693636. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:18,968][134211] Avg episode reward: [(0, '10.090')] [2025-01-04 10:42:19,920][134294] Updated weights for policy 0, policy_version 175324 (0.0021) [2025-01-04 10:42:22,202][134294] Updated weights for policy 0, policy_version 175334 (0.0013) [2025-01-04 10:42:23,968][134211] Fps is (10 sec: 14746.0, 60 sec: 13107.2, 300 sec: 13190.5). Total num frames: 718200832. Throughput: 0: 3077.7. Samples: 168715096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:23,968][134211] Avg episode reward: [(0, '10.123')] [2025-01-04 10:42:24,397][134294] Updated weights for policy 0, policy_version 175344 (0.0014) [2025-01-04 10:42:26,423][134294] Updated weights for policy 0, policy_version 175354 (0.0013) [2025-01-04 10:42:28,664][134294] Updated weights for policy 0, policy_version 175364 (0.0013) [2025-01-04 10:42:28,969][134211] Fps is (10 sec: 18430.1, 60 sec: 13789.6, 300 sec: 13148.8). Total num frames: 718295040. Throughput: 0: 3377.2. Samples: 168744230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:28,969][134211] Avg episode reward: [(0, '9.507')] [2025-01-04 10:42:32,538][134294] Updated weights for policy 0, policy_version 175374 (0.0029) [2025-01-04 10:42:33,968][134211] Fps is (10 sec: 14335.6, 60 sec: 13038.9, 300 sec: 12996.1). Total num frames: 718344192. Throughput: 0: 3374.9. Samples: 168752754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:33,969][134211] Avg episode reward: [(0, '9.716')] [2025-01-04 10:42:36,277][134294] Updated weights for policy 0, policy_version 175384 (0.0030) [2025-01-04 10:42:38,968][134211] Fps is (10 sec: 10650.7, 60 sec: 12834.1, 300 sec: 12996.1). Total num frames: 718401536. Throughput: 0: 3362.5. Samples: 168768824. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:38,968][134211] Avg episode reward: [(0, '9.612')] [2025-01-04 10:42:39,778][134294] Updated weights for policy 0, policy_version 175394 (0.0026) [2025-01-04 10:42:42,819][134294] Updated weights for policy 0, policy_version 175404 (0.0025) [2025-01-04 10:42:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13039.0, 300 sec: 13010.0). Total num frames: 718467072. Throughput: 0: 3410.7. Samples: 168787926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:43,968][134211] Avg episode reward: [(0, '9.092')] [2025-01-04 10:42:45,977][134294] Updated weights for policy 0, policy_version 175414 (0.0026) [2025-01-04 10:42:48,891][134294] Updated weights for policy 0, policy_version 175424 (0.0026) [2025-01-04 10:42:48,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13311.8, 300 sec: 13037.8). Total num frames: 718536704. Throughput: 0: 3345.8. Samples: 168797948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:48,969][134211] Avg episode reward: [(0, '8.200')] [2025-01-04 10:42:52,167][134294] Updated weights for policy 0, policy_version 175434 (0.0030) [2025-01-04 10:42:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13380.3, 300 sec: 13037.8). Total num frames: 718598144. Throughput: 0: 3286.4. Samples: 168817850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:53,969][134211] Avg episode reward: [(0, '9.694')] [2025-01-04 10:42:55,417][134294] Updated weights for policy 0, policy_version 175444 (0.0029) [2025-01-04 10:42:58,721][134294] Updated weights for policy 0, policy_version 175454 (0.0026) [2025-01-04 10:42:58,969][134211] Fps is (10 sec: 12287.4, 60 sec: 13448.2, 300 sec: 13037.7). Total num frames: 718659584. Throughput: 0: 3363.8. Samples: 168836288. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:42:58,970][134211] Avg episode reward: [(0, '8.857')] [2025-01-04 10:43:01,985][134294] Updated weights for policy 0, policy_version 175464 (0.0027) [2025-01-04 10:43:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12902.4, 300 sec: 13023.9). Total num frames: 718721024. Throughput: 0: 3380.3. Samples: 168845748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:43:03,968][134211] Avg episode reward: [(0, '9.175')] [2025-01-04 10:43:05,258][134294] Updated weights for policy 0, policy_version 175474 (0.0027) [2025-01-04 10:43:08,331][134294] Updated weights for policy 0, policy_version 175484 (0.0025) [2025-01-04 10:43:08,968][134211] Fps is (10 sec: 13108.7, 60 sec: 13107.2, 300 sec: 13079.4). Total num frames: 718790656. Throughput: 0: 3336.3. Samples: 168865230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:43:08,968][134211] Avg episode reward: [(0, '8.947')] [2025-01-04 10:43:11,372][134294] Updated weights for policy 0, policy_version 175494 (0.0027) [2025-01-04 10:43:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13380.3, 300 sec: 13107.2). Total num frames: 718856192. Throughput: 0: 3127.9. Samples: 168884982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:43:13,969][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 10:43:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175502_718856192.pth... [2025-01-04 10:43:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000174736_715718656.pth [2025-01-04 10:43:14,499][134294] Updated weights for policy 0, policy_version 175504 (0.0027) [2025-01-04 10:43:17,700][134294] Updated weights for policy 0, policy_version 175514 (0.0024) [2025-01-04 10:43:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13448.6, 300 sec: 13135.0). Total num frames: 718917632. 
Throughput: 0: 3153.7. Samples: 168894670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:18,968][134211] Avg episode reward: [(0, '8.465')] [2025-01-04 10:43:21,249][134294] Updated weights for policy 0, policy_version 175524 (0.0026) [2025-01-04 10:43:23,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12902.3, 300 sec: 13010.0). Total num frames: 718974976. Throughput: 0: 3195.1. Samples: 168912606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:23,969][134211] Avg episode reward: [(0, '9.290')] [2025-01-04 10:43:24,618][134294] Updated weights for policy 0, policy_version 175534 (0.0027) [2025-01-04 10:43:27,875][134294] Updated weights for policy 0, policy_version 175544 (0.0028) [2025-01-04 10:43:28,969][134211] Fps is (10 sec: 12286.9, 60 sec: 12424.6, 300 sec: 12898.9). Total num frames: 719040512. Throughput: 0: 3183.8. Samples: 168931200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:28,969][134211] Avg episode reward: [(0, '8.925')] [2025-01-04 10:43:30,878][134294] Updated weights for policy 0, policy_version 175554 (0.0026) [2025-01-04 10:43:33,968][134211] Fps is (10 sec: 13107.6, 60 sec: 12697.6, 300 sec: 12940.7). Total num frames: 719106048. Throughput: 0: 3188.5. Samples: 168941428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:33,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 10:43:34,018][134294] Updated weights for policy 0, policy_version 175564 (0.0028) [2025-01-04 10:43:37,063][134294] Updated weights for policy 0, policy_version 175574 (0.0024) [2025-01-04 10:43:38,968][134211] Fps is (10 sec: 13108.2, 60 sec: 12834.1, 300 sec: 12968.4). Total num frames: 719171584. Throughput: 0: 3191.6. Samples: 168961470. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:38,969][134211] Avg episode reward: [(0, '8.872')] [2025-01-04 10:43:40,271][134294] Updated weights for policy 0, policy_version 175584 (0.0028) [2025-01-04 10:43:43,218][134294] Updated weights for policy 0, policy_version 175594 (0.0022) [2025-01-04 10:43:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12902.4, 300 sec: 13023.9). Total num frames: 719241216. Throughput: 0: 3223.5. Samples: 168981340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:43,968][134211] Avg episode reward: [(0, '8.446')] [2025-01-04 10:43:46,173][134294] Updated weights for policy 0, policy_version 175604 (0.0026) [2025-01-04 10:43:48,969][134211] Fps is (10 sec: 13106.0, 60 sec: 12765.8, 300 sec: 13010.0). Total num frames: 719302656. Throughput: 0: 3240.6. Samples: 168991578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:48,969][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 10:43:49,665][134294] Updated weights for policy 0, policy_version 175614 (0.0027) [2025-01-04 10:43:53,108][134294] Updated weights for policy 0, policy_version 175624 (0.0026) [2025-01-04 10:43:53,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12765.9, 300 sec: 13010.0). Total num frames: 719364096. Throughput: 0: 3203.1. Samples: 169009368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:53,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 10:43:56,342][134294] Updated weights for policy 0, policy_version 175634 (0.0025) [2025-01-04 10:43:58,968][134211] Fps is (10 sec: 12289.2, 60 sec: 12766.1, 300 sec: 13037.8). Total num frames: 719425536. Throughput: 0: 3179.0. Samples: 169028036. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:43:58,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 10:43:59,556][134294] Updated weights for policy 0, policy_version 175644 (0.0031) [2025-01-04 10:44:02,939][134294] Updated weights for policy 0, policy_version 175654 (0.0029) [2025-01-04 10:44:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12902.4, 300 sec: 13079.4). Total num frames: 719495168. Throughput: 0: 3166.2. Samples: 169037148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:44:03,968][134211] Avg episode reward: [(0, '8.466')] [2025-01-04 10:44:05,033][134294] Updated weights for policy 0, policy_version 175664 (0.0015) [2025-01-04 10:44:06,998][134294] Updated weights for policy 0, policy_version 175674 (0.0017) [2025-01-04 10:44:08,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13243.7, 300 sec: 13107.2). Total num frames: 719585280. Throughput: 0: 3366.0. Samples: 169064074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:44:08,968][134211] Avg episode reward: [(0, '9.615')] [2025-01-04 10:44:09,943][134294] Updated weights for policy 0, policy_version 175684 (0.0023) [2025-01-04 10:44:12,920][134294] Updated weights for policy 0, policy_version 175694 (0.0023) [2025-01-04 10:44:13,968][134211] Fps is (10 sec: 15974.2, 60 sec: 13312.1, 300 sec: 13135.0). Total num frames: 719654912. Throughput: 0: 3410.8. Samples: 169084682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:44:13,968][134211] Avg episode reward: [(0, '10.944')] [2025-01-04 10:44:13,978][134264] Saving new best policy, reward=10.944! [2025-01-04 10:44:16,036][134294] Updated weights for policy 0, policy_version 175704 (0.0027) [2025-01-04 10:44:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 13149.0). Total num frames: 719716352. Throughput: 0: 3400.1. Samples: 169094432. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:44:18,968][134211] Avg episode reward: [(0, '10.658')] [2025-01-04 10:44:19,567][134294] Updated weights for policy 0, policy_version 175714 (0.0027) [2025-01-04 10:44:23,172][134294] Updated weights for policy 0, policy_version 175724 (0.0031) [2025-01-04 10:44:23,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13312.0, 300 sec: 13121.1). Total num frames: 719773696. Throughput: 0: 3341.1. Samples: 169111818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:23,969][134211] Avg episode reward: [(0, '9.418')] [2025-01-04 10:44:26,697][134294] Updated weights for policy 0, policy_version 175734 (0.0025) [2025-01-04 10:44:28,968][134211] Fps is (10 sec: 11468.4, 60 sec: 13175.6, 300 sec: 13121.1). Total num frames: 719831040. Throughput: 0: 3289.7. Samples: 169129378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:28,969][134211] Avg episode reward: [(0, '10.504')] [2025-01-04 10:44:29,934][134294] Updated weights for policy 0, policy_version 175744 (0.0025) [2025-01-04 10:44:33,178][134294] Updated weights for policy 0, policy_version 175754 (0.0027) [2025-01-04 10:44:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13175.4, 300 sec: 13023.9). Total num frames: 719896576. Throughput: 0: 3279.1. Samples: 169139136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:33,968][134211] Avg episode reward: [(0, '9.263')] [2025-01-04 10:44:36,194][134294] Updated weights for policy 0, policy_version 175764 (0.0024) [2025-01-04 10:44:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13175.5, 300 sec: 12996.1). Total num frames: 719962112. Throughput: 0: 3317.7. Samples: 169158666. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:38,969][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 10:44:39,452][134294] Updated weights for policy 0, policy_version 175774 (0.0028) [2025-01-04 10:44:42,736][134294] Updated weights for policy 0, policy_version 175784 (0.0028) [2025-01-04 10:44:43,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 13010.0). Total num frames: 720027648. Throughput: 0: 3317.0. Samples: 169177300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:43,968][134211] Avg episode reward: [(0, '8.919')] [2025-01-04 10:44:44,932][134294] Updated weights for policy 0, policy_version 175794 (0.0017) [2025-01-04 10:44:47,640][134294] Updated weights for policy 0, policy_version 175804 (0.0020) [2025-01-04 10:44:48,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13448.7, 300 sec: 13079.4). Total num frames: 720109568. Throughput: 0: 3430.2. Samples: 169191506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:48,968][134211] Avg episode reward: [(0, '10.138')] [2025-01-04 10:44:50,773][134294] Updated weights for policy 0, policy_version 175814 (0.0024) [2025-01-04 10:44:53,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13448.5, 300 sec: 13079.4). Total num frames: 720171008. Throughput: 0: 3269.2. Samples: 169211190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:53,968][134211] Avg episode reward: [(0, '9.780')] [2025-01-04 10:44:54,021][134294] Updated weights for policy 0, policy_version 175824 (0.0028) [2025-01-04 10:44:57,353][134294] Updated weights for policy 0, policy_version 175834 (0.0027) [2025-01-04 10:44:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13516.8, 300 sec: 13079.5). Total num frames: 720236544. Throughput: 0: 3226.5. Samples: 169229874. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:44:58,968][134211] Avg episode reward: [(0, '10.199')] [2025-01-04 10:45:00,463][134294] Updated weights for policy 0, policy_version 175844 (0.0027) [2025-01-04 10:45:03,876][134294] Updated weights for policy 0, policy_version 175854 (0.0026) [2025-01-04 10:45:03,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13380.0, 300 sec: 13065.5). Total num frames: 720297984. Throughput: 0: 3224.5. Samples: 169239540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:45:03,970][134211] Avg episode reward: [(0, '10.000')] [2025-01-04 10:45:06,995][134294] Updated weights for policy 0, policy_version 175864 (0.0024) [2025-01-04 10:45:08,944][134294] Updated weights for policy 0, policy_version 175874 (0.0014) [2025-01-04 10:45:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13243.8, 300 sec: 13148.9). Total num frames: 720379904. Throughput: 0: 3267.5. Samples: 169258856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:45:08,968][134211] Avg episode reward: [(0, '9.951')] [2025-01-04 10:45:10,942][134294] Updated weights for policy 0, policy_version 175884 (0.0015) [2025-01-04 10:45:13,968][134211] Fps is (10 sec: 15976.2, 60 sec: 13380.2, 300 sec: 13218.3). Total num frames: 720457728. Throughput: 0: 3487.5. Samples: 169286316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:45:13,968][134211] Avg episode reward: [(0, '9.879')] [2025-01-04 10:45:14,014][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175894_720461824.pth... 
[2025-01-04 10:45:14,020][134294] Updated weights for policy 0, policy_version 175894 (0.0027) [2025-01-04 10:45:14,091][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175122_717299712.pth [2025-01-04 10:45:17,313][134294] Updated weights for policy 0, policy_version 175904 (0.0027) [2025-01-04 10:45:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13380.3, 300 sec: 13204.4). Total num frames: 720519168. Throughput: 0: 3468.2. Samples: 169295206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:45:18,968][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 10:45:20,564][134294] Updated weights for policy 0, policy_version 175914 (0.0028) [2025-01-04 10:45:23,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13380.2, 300 sec: 13037.8). Total num frames: 720576512. Throughput: 0: 3435.4. Samples: 169313260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 10:45:23,969][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 10:45:24,653][134294] Updated weights for policy 0, policy_version 175924 (0.0036) [2025-01-04 10:45:28,572][134294] Updated weights for policy 0, policy_version 175934 (0.0030) [2025-01-04 10:45:28,968][134211] Fps is (10 sec: 11059.1, 60 sec: 13312.1, 300 sec: 12996.1). Total num frames: 720629760. Throughput: 0: 3363.0. Samples: 169328634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:28,968][134211] Avg episode reward: [(0, '9.658')] [2025-01-04 10:45:30,902][134294] Updated weights for policy 0, policy_version 175944 (0.0015) [2025-01-04 10:45:32,921][134294] Updated weights for policy 0, policy_version 175954 (0.0013) [2025-01-04 10:45:33,967][134211] Fps is (10 sec: 15155.8, 60 sec: 13858.2, 300 sec: 13107.2). Total num frames: 720728064. Throughput: 0: 3325.7. Samples: 169341164. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:33,968][134211] Avg episode reward: [(0, '9.422')] [2025-01-04 10:45:34,851][134294] Updated weights for policy 0, policy_version 175964 (0.0012) [2025-01-04 10:45:36,857][134294] Updated weights for policy 0, policy_version 175974 (0.0018) [2025-01-04 10:45:38,969][134211] Fps is (10 sec: 18430.3, 60 sec: 14199.3, 300 sec: 13176.6). Total num frames: 720814080. Throughput: 0: 3561.5. Samples: 169371462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:38,969][134211] Avg episode reward: [(0, '9.860')] [2025-01-04 10:45:40,255][134294] Updated weights for policy 0, policy_version 175984 (0.0026) [2025-01-04 10:45:43,428][134294] Updated weights for policy 0, policy_version 175994 (0.0029) [2025-01-04 10:45:43,969][134211] Fps is (10 sec: 14744.1, 60 sec: 14131.0, 300 sec: 13176.6). Total num frames: 720875520. Throughput: 0: 3558.1. Samples: 169389990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:43,969][134211] Avg episode reward: [(0, '9.466')] [2025-01-04 10:45:46,649][134294] Updated weights for policy 0, policy_version 176004 (0.0026) [2025-01-04 10:45:48,968][134211] Fps is (10 sec: 12698.9, 60 sec: 13858.2, 300 sec: 13204.4). Total num frames: 720941056. Throughput: 0: 3561.5. Samples: 169399804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:48,968][134211] Avg episode reward: [(0, '9.558')] [2025-01-04 10:45:50,026][134294] Updated weights for policy 0, policy_version 176014 (0.0026) [2025-01-04 10:45:53,537][134294] Updated weights for policy 0, policy_version 176024 (0.0028) [2025-01-04 10:45:53,968][134211] Fps is (10 sec: 12288.9, 60 sec: 13789.9, 300 sec: 13190.5). Total num frames: 720998400. Throughput: 0: 3531.8. Samples: 169417786. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:53,969][134211] Avg episode reward: [(0, '10.669')] [2025-01-04 10:45:56,873][134294] Updated weights for policy 0, policy_version 176034 (0.0028) [2025-01-04 10:45:58,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13721.6, 300 sec: 13190.5). Total num frames: 721059840. Throughput: 0: 3326.6. Samples: 169436014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:45:58,969][134211] Avg episode reward: [(0, '10.262')] [2025-01-04 10:46:00,139][134294] Updated weights for policy 0, policy_version 176044 (0.0025) [2025-01-04 10:46:03,315][134294] Updated weights for policy 0, policy_version 176054 (0.0023) [2025-01-04 10:46:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13721.9, 300 sec: 13176.6). Total num frames: 721121280. Throughput: 0: 3345.9. Samples: 169445770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:03,968][134211] Avg episode reward: [(0, '10.072')] [2025-01-04 10:46:06,518][134294] Updated weights for policy 0, policy_version 176064 (0.0024) [2025-01-04 10:46:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13448.5, 300 sec: 13176.6). Total num frames: 721186816. Throughput: 0: 3369.1. Samples: 169464870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:08,968][134211] Avg episode reward: [(0, '9.423')] [2025-01-04 10:46:09,793][134294] Updated weights for policy 0, policy_version 176074 (0.0026) [2025-01-04 10:46:12,688][134294] Updated weights for policy 0, policy_version 176084 (0.0025) [2025-01-04 10:46:13,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13243.7, 300 sec: 13204.4). Total num frames: 721252352. Throughput: 0: 3466.5. Samples: 169484626. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:13,969][134211] Avg episode reward: [(0, '9.543')] [2025-01-04 10:46:15,860][134294] Updated weights for policy 0, policy_version 176094 (0.0026) [2025-01-04 10:46:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13312.0, 300 sec: 13232.2). Total num frames: 721317888. Throughput: 0: 3406.9. Samples: 169494474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:18,968][134211] Avg episode reward: [(0, '8.904')] [2025-01-04 10:46:19,301][134294] Updated weights for policy 0, policy_version 176104 (0.0025) [2025-01-04 10:46:22,707][134294] Updated weights for policy 0, policy_version 176114 (0.0025) [2025-01-04 10:46:23,968][134211] Fps is (10 sec: 12288.4, 60 sec: 13312.0, 300 sec: 13246.0). Total num frames: 721375232. Throughput: 0: 3128.5. Samples: 169512240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:23,968][134211] Avg episode reward: [(0, '9.506')] [2025-01-04 10:46:26,253][134294] Updated weights for policy 0, policy_version 176124 (0.0028) [2025-01-04 10:46:28,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13380.3, 300 sec: 13121.1). Total num frames: 721432576. Throughput: 0: 3109.7. Samples: 169529926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:46:28,968][134211] Avg episode reward: [(0, '10.411')] [2025-01-04 10:46:29,567][134294] Updated weights for policy 0, policy_version 176134 (0.0026) [2025-01-04 10:46:32,676][134294] Updated weights for policy 0, policy_version 176144 (0.0023) [2025-01-04 10:46:33,969][134211] Fps is (10 sec: 12287.1, 60 sec: 12833.9, 300 sec: 13107.2). Total num frames: 721498112. Throughput: 0: 3108.7. Samples: 169539698. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:33,969][134211] Avg episode reward: [(0, '9.171')] [2025-01-04 10:46:35,782][134294] Updated weights for policy 0, policy_version 176154 (0.0026) [2025-01-04 10:46:38,708][134294] Updated weights for policy 0, policy_version 176164 (0.0024) [2025-01-04 10:46:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12561.3, 300 sec: 13162.7). Total num frames: 721567744. Throughput: 0: 3157.2. Samples: 169559858. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:38,968][134211] Avg episode reward: [(0, '9.659')] [2025-01-04 10:46:41,855][134294] Updated weights for policy 0, policy_version 176174 (0.0023) [2025-01-04 10:46:43,968][134211] Fps is (10 sec: 13517.8, 60 sec: 12629.5, 300 sec: 13204.4). Total num frames: 721633280. Throughput: 0: 3191.8. Samples: 169579644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:43,968][134211] Avg episode reward: [(0, '8.760')] [2025-01-04 10:46:45,023][134294] Updated weights for policy 0, policy_version 176184 (0.0025) [2025-01-04 10:46:47,939][134294] Updated weights for policy 0, policy_version 176194 (0.0026) [2025-01-04 10:46:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12697.6, 300 sec: 13246.1). Total num frames: 721702912. Throughput: 0: 3195.9. Samples: 169589586. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:48,968][134211] Avg episode reward: [(0, '9.441')] [2025-01-04 10:46:51,160][134294] Updated weights for policy 0, policy_version 176204 (0.0022) [2025-01-04 10:46:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12765.9, 300 sec: 13259.9). Total num frames: 721764352. Throughput: 0: 3203.0. Samples: 169609004. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:53,969][134211] Avg episode reward: [(0, '9.750')] [2025-01-04 10:46:54,590][134294] Updated weights for policy 0, policy_version 176214 (0.0027) [2025-01-04 10:46:57,792][134294] Updated weights for policy 0, policy_version 176224 (0.0025) [2025-01-04 10:46:58,967][134211] Fps is (10 sec: 13107.4, 60 sec: 12902.5, 300 sec: 13176.6). Total num frames: 721833984. Throughput: 0: 3180.4. Samples: 169627742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:46:58,968][134211] Avg episode reward: [(0, '8.705')] [2025-01-04 10:46:59,944][134294] Updated weights for policy 0, policy_version 176234 (0.0013) [2025-01-04 10:47:02,772][134294] Updated weights for policy 0, policy_version 176244 (0.0027) [2025-01-04 10:47:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13107.2, 300 sec: 13232.2). Total num frames: 721907712. Throughput: 0: 3274.3. Samples: 169641818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:03,968][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 10:47:06,167][134294] Updated weights for policy 0, policy_version 176254 (0.0030) [2025-01-04 10:47:08,970][134211] Fps is (10 sec: 13513.9, 60 sec: 13038.5, 300 sec: 13273.7). Total num frames: 721969152. Throughput: 0: 3300.3. Samples: 169660760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:08,970][134211] Avg episode reward: [(0, '9.046')] [2025-01-04 10:47:09,262][134294] Updated weights for policy 0, policy_version 176264 (0.0025) [2025-01-04 10:47:12,324][134294] Updated weights for policy 0, policy_version 176274 (0.0024) [2025-01-04 10:47:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13039.0, 300 sec: 13301.6). Total num frames: 722034688. Throughput: 0: 3340.9. Samples: 169680268. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:13,968][134211] Avg episode reward: [(0, '9.011')] [2025-01-04 10:47:14,011][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000176279_722038784.pth... [2025-01-04 10:47:14,077][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175502_718856192.pth [2025-01-04 10:47:15,562][134294] Updated weights for policy 0, policy_version 176284 (0.0026) [2025-01-04 10:47:18,589][134294] Updated weights for policy 0, policy_version 176294 (0.0027) [2025-01-04 10:47:18,968][134211] Fps is (10 sec: 13109.9, 60 sec: 13039.0, 300 sec: 13218.3). Total num frames: 722100224. Throughput: 0: 3344.6. Samples: 169690202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:18,968][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 10:47:21,937][134294] Updated weights for policy 0, policy_version 176304 (0.0026) [2025-01-04 10:47:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13175.5, 300 sec: 13121.1). Total num frames: 722165760. Throughput: 0: 3322.4. Samples: 169709366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:23,968][134211] Avg episode reward: [(0, '10.152')] [2025-01-04 10:47:25,267][134294] Updated weights for policy 0, policy_version 176314 (0.0025) [2025-01-04 10:47:28,418][134294] Updated weights for policy 0, policy_version 176324 (0.0024) [2025-01-04 10:47:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13243.8, 300 sec: 13162.8). Total num frames: 722227200. Throughput: 0: 3299.0. Samples: 169728098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:28,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 10:47:31,682][134294] Updated weights for policy 0, policy_version 176334 (0.0027) [2025-01-04 10:47:33,968][134211] Fps is (10 sec: 12287.5, 60 sec: 13175.5, 300 sec: 13176.6). Total num frames: 722288640. 
Throughput: 0: 3290.4. Samples: 169737654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:47:33,968][134211] Avg episode reward: [(0, '8.543')] [2025-01-04 10:47:34,699][134294] Updated weights for policy 0, policy_version 176344 (0.0023) [2025-01-04 10:47:36,694][134294] Updated weights for policy 0, policy_version 176354 (0.0014) [2025-01-04 10:47:38,637][134294] Updated weights for policy 0, policy_version 176364 (0.0013) [2025-01-04 10:47:38,968][134211] Fps is (10 sec: 16793.6, 60 sec: 13789.9, 300 sec: 13315.5). Total num frames: 722395136. Throughput: 0: 3403.8. Samples: 169762176. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:47:38,968][134211] Avg episode reward: [(0, '9.090')] [2025-01-04 10:47:40,543][134294] Updated weights for policy 0, policy_version 176374 (0.0013) [2025-01-04 10:47:43,026][134294] Updated weights for policy 0, policy_version 176384 (0.0021) [2025-01-04 10:47:43,968][134211] Fps is (10 sec: 18842.5, 60 sec: 14063.0, 300 sec: 13357.2). Total num frames: 722477056. Throughput: 0: 3627.8. Samples: 169790994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:47:43,968][134211] Avg episode reward: [(0, '10.444')] [2025-01-04 10:47:46,133][134294] Updated weights for policy 0, policy_version 176394 (0.0031) [2025-01-04 10:47:48,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13926.4, 300 sec: 13357.1). Total num frames: 722538496. Throughput: 0: 3529.5. Samples: 169800644. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:47:48,968][134211] Avg episode reward: [(0, '8.817')] [2025-01-04 10:47:49,706][134294] Updated weights for policy 0, policy_version 176404 (0.0027) [2025-01-04 10:47:53,174][134294] Updated weights for policy 0, policy_version 176414 (0.0023) [2025-01-04 10:47:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13926.4, 300 sec: 13357.2). Total num frames: 722599936. Throughput: 0: 3501.2. Samples: 169818306. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:47:53,970][134211] Avg episode reward: [(0, '8.805')] [2025-01-04 10:47:56,567][134294] Updated weights for policy 0, policy_version 176424 (0.0026) [2025-01-04 10:47:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 13371.0). Total num frames: 722665472. Throughput: 0: 3475.5. Samples: 169836666. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:47:58,968][134211] Avg episode reward: [(0, '8.049')] [2025-01-04 10:47:59,715][134294] Updated weights for policy 0, policy_version 176434 (0.0024) [2025-01-04 10:48:03,155][134294] Updated weights for policy 0, policy_version 176444 (0.0028) [2025-01-04 10:48:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13585.0, 300 sec: 13329.3). Total num frames: 722722816. Throughput: 0: 3460.7. Samples: 169845936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:03,969][134211] Avg episode reward: [(0, '8.416')] [2025-01-04 10:48:06,411][134294] Updated weights for policy 0, policy_version 176454 (0.0026) [2025-01-04 10:48:08,968][134211] Fps is (10 sec: 12287.6, 60 sec: 13653.7, 300 sec: 13329.3). Total num frames: 722788352. Throughput: 0: 3448.3. Samples: 169864540. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:08,969][134211] Avg episode reward: [(0, '8.646')] [2025-01-04 10:48:09,417][134294] Updated weights for policy 0, policy_version 176464 (0.0025) [2025-01-04 10:48:12,543][134294] Updated weights for policy 0, policy_version 176474 (0.0024) [2025-01-04 10:48:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13653.3, 300 sec: 13343.2). Total num frames: 722853888. Throughput: 0: 3478.2. Samples: 169884618. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:13,969][134211] Avg episode reward: [(0, '9.524')] [2025-01-04 10:48:15,482][134294] Updated weights for policy 0, policy_version 176484 (0.0024) [2025-01-04 10:48:18,740][134294] Updated weights for policy 0, policy_version 176494 (0.0028) [2025-01-04 10:48:18,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13653.3, 300 sec: 13371.0). Total num frames: 722919424. Throughput: 0: 3502.7. Samples: 169895276. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:18,968][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 10:48:22,156][134294] Updated weights for policy 0, policy_version 176504 (0.0022) [2025-01-04 10:48:23,968][134211] Fps is (10 sec: 12288.5, 60 sec: 13516.8, 300 sec: 13343.3). Total num frames: 722976768. Throughput: 0: 3354.5. Samples: 169913130. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:23,968][134211] Avg episode reward: [(0, '9.581')] [2025-01-04 10:48:25,650][134294] Updated weights for policy 0, policy_version 176514 (0.0025) [2025-01-04 10:48:28,821][134294] Updated weights for policy 0, policy_version 176524 (0.0024) [2025-01-04 10:48:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13585.0, 300 sec: 13343.2). Total num frames: 723042304. Throughput: 0: 3125.6. Samples: 169931648. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:28,968][134211] Avg episode reward: [(0, '9.437')] [2025-01-04 10:48:31,730][134294] Updated weights for policy 0, policy_version 176534 (0.0027) [2025-01-04 10:48:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13653.4, 300 sec: 13343.2). Total num frames: 723107840. Throughput: 0: 3140.0. Samples: 169941944. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:33,968][134211] Avg episode reward: [(0, '9.426')] [2025-01-04 10:48:34,939][134294] Updated weights for policy 0, policy_version 176544 (0.0026) [2025-01-04 10:48:38,017][134294] Updated weights for policy 0, policy_version 176554 (0.0024) [2025-01-04 10:48:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13038.9, 300 sec: 13343.2). Total num frames: 723177472. Throughput: 0: 3188.8. Samples: 169961800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:38,968][134211] Avg episode reward: [(0, '9.685')] [2025-01-04 10:48:41,008][134294] Updated weights for policy 0, policy_version 176564 (0.0021) [2025-01-04 10:48:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12765.9, 300 sec: 13357.2). Total num frames: 723243008. Throughput: 0: 3232.2. Samples: 169982114. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 10:48:43,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 10:48:44,122][134294] Updated weights for policy 0, policy_version 176574 (0.0026) [2025-01-04 10:48:46,959][134294] Updated weights for policy 0, policy_version 176584 (0.0023) [2025-01-04 10:48:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12902.4, 300 sec: 13384.9). Total num frames: 723312640. Throughput: 0: 3252.6. Samples: 169992304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:48:48,968][134211] Avg episode reward: [(0, '8.452')] [2025-01-04 10:48:50,207][134294] Updated weights for policy 0, policy_version 176594 (0.0026) [2025-01-04 10:48:53,371][134294] Updated weights for policy 0, policy_version 176604 (0.0027) [2025-01-04 10:48:53,968][134211] Fps is (10 sec: 13516.2, 60 sec: 12970.6, 300 sec: 13398.8). Total num frames: 723378176. Throughput: 0: 3278.0. Samples: 170012052. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:48:53,969][134211] Avg episode reward: [(0, '10.368')] [2025-01-04 10:48:56,528][134294] Updated weights for policy 0, policy_version 176614 (0.0024) [2025-01-04 10:48:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 13384.9). Total num frames: 723443712. Throughput: 0: 3267.8. Samples: 170031670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:48:58,968][134211] Avg episode reward: [(0, '10.417')] [2025-01-04 10:48:59,545][134294] Updated weights for policy 0, policy_version 176624 (0.0028) [2025-01-04 10:49:02,719][134294] Updated weights for policy 0, policy_version 176634 (0.0029) [2025-01-04 10:49:03,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13039.0, 300 sec: 13287.7). Total num frames: 723505152. Throughput: 0: 3248.3. Samples: 170041448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:03,968][134211] Avg episode reward: [(0, '9.593')] [2025-01-04 10:49:05,928][134294] Updated weights for policy 0, policy_version 176644 (0.0029) [2025-01-04 10:49:08,969][134211] Fps is (10 sec: 12696.5, 60 sec: 13038.8, 300 sec: 13273.8). Total num frames: 723570688. Throughput: 0: 3279.8. Samples: 170060722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:08,969][134211] Avg episode reward: [(0, '9.254')] [2025-01-04 10:49:09,123][134294] Updated weights for policy 0, policy_version 176654 (0.0027) [2025-01-04 10:49:11,382][134294] Updated weights for policy 0, policy_version 176664 (0.0015) [2025-01-04 10:49:13,276][134294] Updated weights for policy 0, policy_version 176674 (0.0012) [2025-01-04 10:49:13,968][134211] Fps is (10 sec: 16384.4, 60 sec: 13585.2, 300 sec: 13398.8). Total num frames: 723668992. Throughput: 0: 3452.0. Samples: 170086986. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:13,968][134211] Avg episode reward: [(0, '9.573')] [2025-01-04 10:49:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000176678_723673088.pth... [2025-01-04 10:49:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000175894_720461824.pth [2025-01-04 10:49:15,150][134294] Updated weights for policy 0, policy_version 176684 (0.0013) [2025-01-04 10:49:17,024][134294] Updated weights for policy 0, policy_version 176694 (0.0015) [2025-01-04 10:49:18,968][134211] Fps is (10 sec: 19662.5, 60 sec: 14131.2, 300 sec: 13537.6). Total num frames: 723767296. Throughput: 0: 3585.6. Samples: 170103294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:18,968][134211] Avg episode reward: [(0, '9.482')] [2025-01-04 10:49:19,910][134294] Updated weights for policy 0, policy_version 176704 (0.0021) [2025-01-04 10:49:23,393][134294] Updated weights for policy 0, policy_version 176714 (0.0030) [2025-01-04 10:49:23,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14131.2, 300 sec: 13537.6). Total num frames: 723824640. Throughput: 0: 3625.7. Samples: 170124956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:23,968][134211] Avg episode reward: [(0, '9.127')] [2025-01-04 10:49:26,794][134294] Updated weights for policy 0, policy_version 176724 (0.0027) [2025-01-04 10:49:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14062.9, 300 sec: 13523.7). Total num frames: 723886080. Throughput: 0: 3566.4. Samples: 170142602. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:28,968][134211] Avg episode reward: [(0, '10.503')] [2025-01-04 10:49:30,136][134294] Updated weights for policy 0, policy_version 176734 (0.0027) [2025-01-04 10:49:33,138][134294] Updated weights for policy 0, policy_version 176744 (0.0025) [2025-01-04 10:49:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14062.9, 300 sec: 13523.7). Total num frames: 723951616. Throughput: 0: 3561.6. Samples: 170152576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:33,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 10:49:36,383][134294] Updated weights for policy 0, policy_version 176754 (0.0026) [2025-01-04 10:49:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 13509.9). Total num frames: 724013056. Throughput: 0: 3542.7. Samples: 170171474. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:38,968][134211] Avg episode reward: [(0, '9.967')] [2025-01-04 10:49:39,742][134294] Updated weights for policy 0, policy_version 176764 (0.0027) [2025-01-04 10:49:42,610][134294] Updated weights for policy 0, policy_version 176774 (0.0026) [2025-01-04 10:49:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 13468.2). Total num frames: 724082688. Throughput: 0: 3548.2. Samples: 170191340. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:43,968][134211] Avg episode reward: [(0, '9.815')] [2025-01-04 10:49:45,647][134294] Updated weights for policy 0, policy_version 176784 (0.0026) [2025-01-04 10:49:48,898][134294] Updated weights for policy 0, policy_version 176794 (0.0024) [2025-01-04 10:49:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 13482.1). Total num frames: 724148224. Throughput: 0: 3564.1. Samples: 170201834. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:49:48,968][134211] Avg episode reward: [(0, '9.758')] [2025-01-04 10:49:52,174][134294] Updated weights for policy 0, policy_version 176804 (0.0026) [2025-01-04 10:49:53,969][134211] Fps is (10 sec: 12696.5, 60 sec: 13858.0, 300 sec: 13468.2). Total num frames: 724209664. Throughput: 0: 3548.0. Samples: 170220380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:49:53,969][134211] Avg episode reward: [(0, '10.041')] [2025-01-04 10:49:55,530][134294] Updated weights for policy 0, policy_version 176814 (0.0028) [2025-01-04 10:49:58,565][134294] Updated weights for policy 0, policy_version 176824 (0.0024) [2025-01-04 10:49:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.2, 300 sec: 13482.1). Total num frames: 724275200. Throughput: 0: 3398.9. Samples: 170239936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:49:58,969][134211] Avg episode reward: [(0, '8.845')] [2025-01-04 10:50:01,443][134294] Updated weights for policy 0, policy_version 176834 (0.0024) [2025-01-04 10:50:03,968][134211] Fps is (10 sec: 13108.2, 60 sec: 13926.4, 300 sec: 13426.5). Total num frames: 724340736. Throughput: 0: 3264.4. Samples: 170250194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:03,969][134211] Avg episode reward: [(0, '9.805')] [2025-01-04 10:50:04,721][134294] Updated weights for policy 0, policy_version 176844 (0.0027) [2025-01-04 10:50:07,758][134294] Updated weights for policy 0, policy_version 176854 (0.0023) [2025-01-04 10:50:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.6, 300 sec: 13384.9). Total num frames: 724406272. Throughput: 0: 3217.2. Samples: 170269730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:08,968][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 10:50:10,716][134294] Updated weights for policy 0, policy_version 176864 (0.0021) [2025-01-04 10:50:13,701][134294] Updated weights for policy 0, policy_version 176874 (0.0025) [2025-01-04 10:50:13,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13516.8, 300 sec: 13426.6). Total num frames: 724480000. Throughput: 0: 3287.6. Samples: 170290544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:13,968][134211] Avg episode reward: [(0, '8.642')] [2025-01-04 10:50:16,593][134294] Updated weights for policy 0, policy_version 176884 (0.0024) [2025-01-04 10:50:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 12902.4, 300 sec: 13440.4). Total num frames: 724541440. Throughput: 0: 3291.6. Samples: 170300698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:18,968][134211] Avg episode reward: [(0, '8.389')] [2025-01-04 10:50:20,245][134294] Updated weights for policy 0, policy_version 176894 (0.0026) [2025-01-04 10:50:23,653][134294] Updated weights for policy 0, policy_version 176904 (0.0028) [2025-01-04 10:50:23,971][134211] Fps is (10 sec: 11876.6, 60 sec: 12902.1, 300 sec: 13454.3). Total num frames: 724598784. Throughput: 0: 3265.7. Samples: 170318436. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:23,972][134211] Avg episode reward: [(0, '9.633')] [2025-01-04 10:50:27,226][134294] Updated weights for policy 0, policy_version 176914 (0.0022) [2025-01-04 10:50:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12902.4, 300 sec: 13329.3). Total num frames: 724660224. Throughput: 0: 3220.4. Samples: 170336256. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:28,968][134211] Avg episode reward: [(0, '9.764')] [2025-01-04 10:50:30,028][134294] Updated weights for policy 0, policy_version 176924 (0.0022) [2025-01-04 10:50:32,030][134294] Updated weights for policy 0, policy_version 176934 (0.0014) [2025-01-04 10:50:33,968][134211] Fps is (10 sec: 15157.3, 60 sec: 13312.0, 300 sec: 13343.3). Total num frames: 724750336. Throughput: 0: 3283.8. Samples: 170349606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:33,968][134211] Avg episode reward: [(0, '8.779')] [2025-01-04 10:50:34,883][134294] Updated weights for policy 0, policy_version 176944 (0.0025) [2025-01-04 10:50:37,888][134294] Updated weights for policy 0, policy_version 176954 (0.0025) [2025-01-04 10:50:38,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13380.3, 300 sec: 13357.2). Total num frames: 724815872. Throughput: 0: 3364.3. Samples: 170371770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:38,968][134211] Avg episode reward: [(0, '10.026')] [2025-01-04 10:50:41,149][134294] Updated weights for policy 0, policy_version 176964 (0.0024) [2025-01-04 10:50:43,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13243.7, 300 sec: 13343.2). Total num frames: 724877312. Throughput: 0: 3349.7. Samples: 170390672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:43,968][134211] Avg episode reward: [(0, '8.794')] [2025-01-04 10:50:44,408][134294] Updated weights for policy 0, policy_version 176974 (0.0026) [2025-01-04 10:50:47,333][134294] Updated weights for policy 0, policy_version 176984 (0.0027) [2025-01-04 10:50:48,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.7, 300 sec: 13371.0). Total num frames: 724942848. Throughput: 0: 3344.7. Samples: 170400706. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:48,968][134211] Avg episode reward: [(0, '10.022')] [2025-01-04 10:50:50,742][134294] Updated weights for policy 0, policy_version 176994 (0.0025) [2025-01-04 10:50:53,872][134294] Updated weights for policy 0, policy_version 177004 (0.0028) [2025-01-04 10:50:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13312.2, 300 sec: 13384.9). Total num frames: 725008384. Throughput: 0: 3336.4. Samples: 170419868. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:50:53,968][134211] Avg episode reward: [(0, '9.839')] [2025-01-04 10:50:57,222][134294] Updated weights for policy 0, policy_version 177014 (0.0028) [2025-01-04 10:50:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.7, 300 sec: 13384.9). Total num frames: 725069824. Throughput: 0: 3287.3. Samples: 170438472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:50:58,968][134211] Avg episode reward: [(0, '8.392')] [2025-01-04 10:51:00,259][134294] Updated weights for policy 0, policy_version 177024 (0.0028) [2025-01-04 10:51:03,464][134294] Updated weights for policy 0, policy_version 177034 (0.0024) [2025-01-04 10:51:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.8, 300 sec: 13384.9). Total num frames: 725135360. Throughput: 0: 3290.4. Samples: 170448766. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:03,968][134211] Avg episode reward: [(0, '9.721')] [2025-01-04 10:51:06,166][134294] Updated weights for policy 0, policy_version 177044 (0.0017) [2025-01-04 10:51:08,348][134294] Updated weights for policy 0, policy_version 177054 (0.0018) [2025-01-04 10:51:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13516.8, 300 sec: 13440.5). Total num frames: 725217280. Throughput: 0: 3406.7. Samples: 170471734. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:08,968][134211] Avg episode reward: [(0, '9.424')] [2025-01-04 10:51:11,472][134294] Updated weights for policy 0, policy_version 177064 (0.0028) [2025-01-04 10:51:13,968][134211] Fps is (10 sec: 14745.1, 60 sec: 13380.2, 300 sec: 13440.4). Total num frames: 725282816. Throughput: 0: 3463.2. Samples: 170492100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:13,969][134211] Avg episode reward: [(0, '10.072')] [2025-01-04 10:51:14,039][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177072_725286912.pth... [2025-01-04 10:51:14,109][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000176279_722038784.pth [2025-01-04 10:51:14,673][134294] Updated weights for policy 0, policy_version 177074 (0.0029) [2025-01-04 10:51:17,889][134294] Updated weights for policy 0, policy_version 177084 (0.0027) [2025-01-04 10:51:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13468.2). Total num frames: 725348352. Throughput: 0: 3380.4. Samples: 170501724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:18,968][134211] Avg episode reward: [(0, '9.589')] [2025-01-04 10:51:21,149][134294] Updated weights for policy 0, policy_version 177094 (0.0028) [2025-01-04 10:51:23,969][134211] Fps is (10 sec: 12696.8, 60 sec: 13516.9, 300 sec: 13482.0). Total num frames: 725409792. Throughput: 0: 3305.9. Samples: 170520540. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:23,969][134211] Avg episode reward: [(0, '8.666')] [2025-01-04 10:51:24,144][134294] Updated weights for policy 0, policy_version 177104 (0.0021) [2025-01-04 10:51:26,222][134294] Updated weights for policy 0, policy_version 177114 (0.0013) [2025-01-04 10:51:28,128][134294] Updated weights for policy 0, policy_version 177124 (0.0013) [2025-01-04 10:51:28,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14267.8, 300 sec: 13621.0). Total num frames: 725516288. Throughput: 0: 3505.4. Samples: 170548416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:28,968][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 10:51:30,050][134294] Updated weights for policy 0, policy_version 177134 (0.0013) [2025-01-04 10:51:32,376][134294] Updated weights for policy 0, policy_version 177144 (0.0019) [2025-01-04 10:51:33,968][134211] Fps is (10 sec: 18842.9, 60 sec: 14131.2, 300 sec: 13662.6). Total num frames: 725598208. Throughput: 0: 3634.7. Samples: 170564270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:33,969][134211] Avg episode reward: [(0, '10.378')] [2025-01-04 10:51:35,925][134294] Updated weights for policy 0, policy_version 177154 (0.0029) [2025-01-04 10:51:38,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14062.9, 300 sec: 13648.7). Total num frames: 725659648. Throughput: 0: 3617.2. Samples: 170582642. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:38,968][134211] Avg episode reward: [(0, '9.347')] [2025-01-04 10:51:39,329][134294] Updated weights for policy 0, policy_version 177164 (0.0027) [2025-01-04 10:51:42,497][134294] Updated weights for policy 0, policy_version 177174 (0.0026) [2025-01-04 10:51:43,968][134211] Fps is (10 sec: 12288.3, 60 sec: 14062.9, 300 sec: 13620.9). Total num frames: 725721088. Throughput: 0: 3619.8. Samples: 170601362. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:43,968][134211] Avg episode reward: [(0, '9.162')] [2025-01-04 10:51:45,645][134294] Updated weights for policy 0, policy_version 177184 (0.0025) [2025-01-04 10:51:48,942][134294] Updated weights for policy 0, policy_version 177194 (0.0028) [2025-01-04 10:51:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14062.9, 300 sec: 13634.8). Total num frames: 725786624. Throughput: 0: 3611.2. Samples: 170611270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:48,969][134211] Avg episode reward: [(0, '9.526')] [2025-01-04 10:51:52,539][134294] Updated weights for policy 0, policy_version 177204 (0.0022) [2025-01-04 10:51:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13926.4, 300 sec: 13593.2). Total num frames: 725843968. Throughput: 0: 3500.8. Samples: 170629272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:53,969][134211] Avg episode reward: [(0, '9.024')] [2025-01-04 10:51:55,960][134294] Updated weights for policy 0, policy_version 177214 (0.0026) [2025-01-04 10:51:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13926.4, 300 sec: 13551.5). Total num frames: 725905408. Throughput: 0: 3453.2. Samples: 170647492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:51:58,968][134211] Avg episode reward: [(0, '8.165')] [2025-01-04 10:51:59,197][134294] Updated weights for policy 0, policy_version 177224 (0.0026) [2025-01-04 10:52:02,532][134294] Updated weights for policy 0, policy_version 177234 (0.0032) [2025-01-04 10:52:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13858.1, 300 sec: 13551.6). Total num frames: 725966848. Throughput: 0: 3443.4. Samples: 170656676. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:03,968][134211] Avg episode reward: [(0, '8.659')] [2025-01-04 10:52:05,867][134294] Updated weights for policy 0, policy_version 177244 (0.0028) [2025-01-04 10:52:08,927][134294] Updated weights for policy 0, policy_version 177254 (0.0029) [2025-01-04 10:52:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13585.1, 300 sec: 13551.5). Total num frames: 726032384. Throughput: 0: 3444.3. Samples: 170675532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:08,968][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 10:52:12,134][134294] Updated weights for policy 0, policy_version 177264 (0.0028) [2025-01-04 10:52:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13516.8, 300 sec: 13537.6). Total num frames: 726093824. Throughput: 0: 3255.1. Samples: 170694898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:13,968][134211] Avg episode reward: [(0, '10.233')] [2025-01-04 10:52:15,211][134294] Updated weights for policy 0, policy_version 177274 (0.0024) [2025-01-04 10:52:18,342][134294] Updated weights for policy 0, policy_version 177284 (0.0028) [2025-01-04 10:52:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13516.8, 300 sec: 13537.6). Total num frames: 726159360. Throughput: 0: 3130.8. Samples: 170705156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:18,968][134211] Avg episode reward: [(0, '9.790')] [2025-01-04 10:52:21,805][134294] Updated weights for policy 0, policy_version 177294 (0.0026) [2025-01-04 10:52:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13517.0, 300 sec: 13537.6). Total num frames: 726220800. Throughput: 0: 3124.0. Samples: 170723220. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:23,968][134211] Avg episode reward: [(0, '9.923')] [2025-01-04 10:52:25,398][134294] Updated weights for policy 0, policy_version 177304 (0.0028) [2025-01-04 10:52:28,908][134294] Updated weights for policy 0, policy_version 177314 (0.0026) [2025-01-04 10:52:28,968][134211] Fps is (10 sec: 11878.2, 60 sec: 12697.5, 300 sec: 13523.8). Total num frames: 726278144. Throughput: 0: 3095.8. Samples: 170740672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:28,968][134211] Avg episode reward: [(0, '9.387')] [2025-01-04 10:52:31,464][134294] Updated weights for policy 0, policy_version 177324 (0.0019) [2025-01-04 10:52:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 12629.4, 300 sec: 13426.5). Total num frames: 726355968. Throughput: 0: 3118.5. Samples: 170751604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:33,968][134211] Avg episode reward: [(0, '9.283')] [2025-01-04 10:52:34,338][134294] Updated weights for policy 0, policy_version 177334 (0.0022) [2025-01-04 10:52:37,829][134294] Updated weights for policy 0, policy_version 177344 (0.0025) [2025-01-04 10:52:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12561.1, 300 sec: 13343.2). Total num frames: 726413312. Throughput: 0: 3161.8. Samples: 170771554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:38,968][134211] Avg episode reward: [(0, '9.559')] [2025-01-04 10:52:41,010][134294] Updated weights for policy 0, policy_version 177354 (0.0023) [2025-01-04 10:52:43,968][134211] Fps is (10 sec: 12287.7, 60 sec: 12629.3, 300 sec: 13357.1). Total num frames: 726478848. Throughput: 0: 3183.6. Samples: 170790756. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:43,969][134211] Avg episode reward: [(0, '9.185')] [2025-01-04 10:52:44,135][134294] Updated weights for policy 0, policy_version 177364 (0.0026) [2025-01-04 10:52:47,133][134294] Updated weights for policy 0, policy_version 177374 (0.0024) [2025-01-04 10:52:48,968][134211] Fps is (10 sec: 13106.8, 60 sec: 12629.3, 300 sec: 13371.0). Total num frames: 726544384. Throughput: 0: 3202.5. Samples: 170800792. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:48,969][134211] Avg episode reward: [(0, '8.321')] [2025-01-04 10:52:50,251][134294] Updated weights for policy 0, policy_version 177384 (0.0026) [2025-01-04 10:52:53,239][134294] Updated weights for policy 0, policy_version 177394 (0.0024) [2025-01-04 10:52:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12834.1, 300 sec: 13384.9). Total num frames: 726614016. Throughput: 0: 3242.3. Samples: 170821436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:53,969][134211] Avg episode reward: [(0, '8.973')] [2025-01-04 10:52:56,604][134294] Updated weights for policy 0, policy_version 177404 (0.0025) [2025-01-04 10:52:58,940][134294] Updated weights for policy 0, policy_version 177414 (0.0014) [2025-01-04 10:52:58,968][134211] Fps is (10 sec: 14336.8, 60 sec: 13039.0, 300 sec: 13440.5). Total num frames: 726687744. Throughput: 0: 3250.2. Samples: 170841156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:52:58,968][134211] Avg episode reward: [(0, '8.945')] [2025-01-04 10:53:00,836][134294] Updated weights for policy 0, policy_version 177424 (0.0013) [2025-01-04 10:53:02,739][134294] Updated weights for policy 0, policy_version 177434 (0.0013) [2025-01-04 10:53:03,968][134211] Fps is (10 sec: 18022.2, 60 sec: 13789.8, 300 sec: 13579.3). Total num frames: 726794240. Throughput: 0: 3382.2. Samples: 170857358. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 10:53:03,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 10:53:04,684][134294] Updated weights for policy 0, policy_version 177444 (0.0012) [2025-01-04 10:53:07,383][134294] Updated weights for policy 0, policy_version 177454 (0.0023) [2025-01-04 10:53:08,968][134211] Fps is (10 sec: 18022.2, 60 sec: 13926.4, 300 sec: 13607.1). Total num frames: 726867968. Throughput: 0: 3599.6. Samples: 170885202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:08,968][134211] Avg episode reward: [(0, '8.192')] [2025-01-04 10:53:10,797][134294] Updated weights for policy 0, policy_version 177464 (0.0029) [2025-01-04 10:53:13,787][134294] Updated weights for policy 0, policy_version 177474 (0.0025) [2025-01-04 10:53:13,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13994.7, 300 sec: 13607.1). Total num frames: 726933504. Throughput: 0: 3639.4. Samples: 170904446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:13,968][134211] Avg episode reward: [(0, '9.891')] [2025-01-04 10:53:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177474_726933504.pth... [2025-01-04 10:53:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000176678_723673088.pth [2025-01-04 10:53:16,993][134294] Updated weights for policy 0, policy_version 177484 (0.0025) [2025-01-04 10:53:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 13634.8). Total num frames: 726999040. Throughput: 0: 3609.5. Samples: 170914030. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:18,968][134211] Avg episode reward: [(0, '9.778')] [2025-01-04 10:53:20,057][134294] Updated weights for policy 0, policy_version 177494 (0.0024) [2025-01-04 10:53:23,136][134294] Updated weights for policy 0, policy_version 177504 (0.0025) [2025-01-04 10:53:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14062.9, 300 sec: 13634.8). Total num frames: 727064576. Throughput: 0: 3617.4. Samples: 170934336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:23,968][134211] Avg episode reward: [(0, '9.355')] [2025-01-04 10:53:26,343][134294] Updated weights for policy 0, policy_version 177514 (0.0023) [2025-01-04 10:53:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.5, 300 sec: 13634.8). Total num frames: 727130112. Throughput: 0: 3614.1. Samples: 170953388. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:28,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 10:53:29,514][134294] Updated weights for policy 0, policy_version 177524 (0.0026) [2025-01-04 10:53:32,634][134294] Updated weights for policy 0, policy_version 177534 (0.0028) [2025-01-04 10:53:33,969][134211] Fps is (10 sec: 13106.0, 60 sec: 13994.4, 300 sec: 13620.9). Total num frames: 727195648. Throughput: 0: 3613.7. Samples: 170963412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:33,969][134211] Avg episode reward: [(0, '10.328')] [2025-01-04 10:53:35,799][134294] Updated weights for policy 0, policy_version 177544 (0.0023) [2025-01-04 10:53:38,696][134294] Updated weights for policy 0, policy_version 177554 (0.0024) [2025-01-04 10:53:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14131.2, 300 sec: 13620.9). Total num frames: 727261184. Throughput: 0: 3592.9. Samples: 170983118. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:38,968][134211] Avg episode reward: [(0, '8.556')] [2025-01-04 10:53:41,718][134294] Updated weights for policy 0, policy_version 177564 (0.0025) [2025-01-04 10:53:43,968][134211] Fps is (10 sec: 13518.3, 60 sec: 14199.5, 300 sec: 13620.9). Total num frames: 727330816. Throughput: 0: 3609.9. Samples: 171003604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:43,968][134211] Avg episode reward: [(0, '10.175')] [2025-01-04 10:53:44,785][134294] Updated weights for policy 0, policy_version 177574 (0.0026) [2025-01-04 10:53:47,768][134294] Updated weights for policy 0, policy_version 177584 (0.0023) [2025-01-04 10:53:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.6, 300 sec: 13621.0). Total num frames: 727396352. Throughput: 0: 3472.3. Samples: 171013610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:48,968][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 10:53:51,113][134294] Updated weights for policy 0, policy_version 177594 (0.0028) [2025-01-04 10:53:53,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14062.9, 300 sec: 13607.0). Total num frames: 727457792. Throughput: 0: 3278.7. Samples: 171032746. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:53,968][134211] Avg episode reward: [(0, '9.478')] [2025-01-04 10:53:54,547][134294] Updated weights for policy 0, policy_version 177604 (0.0025) [2025-01-04 10:53:57,601][134294] Updated weights for policy 0, policy_version 177614 (0.0028) [2025-01-04 10:53:58,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 13620.9). Total num frames: 727523328. Throughput: 0: 3273.7. Samples: 171051762. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:53:58,968][134211] Avg episode reward: [(0, '8.454')] [2025-01-04 10:54:00,660][134294] Updated weights for policy 0, policy_version 177624 (0.0025) [2025-01-04 10:54:03,630][134294] Updated weights for policy 0, policy_version 177634 (0.0026) [2025-01-04 10:54:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13243.8, 300 sec: 13621.0). Total num frames: 727588864. Throughput: 0: 3292.2. Samples: 171062178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:03,968][134211] Avg episode reward: [(0, '10.553')] [2025-01-04 10:54:06,798][134294] Updated weights for policy 0, policy_version 177644 (0.0022) [2025-01-04 10:54:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.4, 300 sec: 13523.7). Total num frames: 727658496. Throughput: 0: 3281.4. Samples: 171081998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:08,968][134211] Avg episode reward: [(0, '9.188')] [2025-01-04 10:54:09,825][134294] Updated weights for policy 0, policy_version 177654 (0.0025) [2025-01-04 10:54:12,874][134294] Updated weights for policy 0, policy_version 177664 (0.0025) [2025-01-04 10:54:13,969][134211] Fps is (10 sec: 13515.2, 60 sec: 13175.2, 300 sec: 13412.6). Total num frames: 727724032. Throughput: 0: 3300.1. Samples: 171101898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:13,969][134211] Avg episode reward: [(0, '8.981')] [2025-01-04 10:54:16,183][134294] Updated weights for policy 0, policy_version 177674 (0.0025) [2025-01-04 10:54:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13038.9, 300 sec: 13412.7). Total num frames: 727781376. Throughput: 0: 3282.8. Samples: 171111134. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:18,968][134211] Avg episode reward: [(0, '8.680')] [2025-01-04 10:54:19,798][134294] Updated weights for policy 0, policy_version 177684 (0.0025) [2025-01-04 10:54:22,912][134294] Updated weights for policy 0, policy_version 177694 (0.0021) [2025-01-04 10:54:23,968][134211] Fps is (10 sec: 12699.3, 60 sec: 13107.3, 300 sec: 13440.4). Total num frames: 727851008. Throughput: 0: 3235.1. Samples: 171128698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:23,968][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 10:54:25,118][134294] Updated weights for policy 0, policy_version 177704 (0.0013) [2025-01-04 10:54:27,135][134294] Updated weights for policy 0, policy_version 177714 (0.0013) [2025-01-04 10:54:28,968][134211] Fps is (10 sec: 17203.3, 60 sec: 13721.6, 300 sec: 13565.4). Total num frames: 727953408. Throughput: 0: 3425.2. Samples: 171157740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:28,968][134211] Avg episode reward: [(0, '10.095')] [2025-01-04 10:54:29,119][134294] Updated weights for policy 0, policy_version 177724 (0.0013) [2025-01-04 10:54:31,961][134294] Updated weights for policy 0, policy_version 177734 (0.0025) [2025-01-04 10:54:33,968][134211] Fps is (10 sec: 17203.0, 60 sec: 13790.1, 300 sec: 13593.2). Total num frames: 728023040. Throughput: 0: 3481.2. Samples: 171170266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:33,968][134211] Avg episode reward: [(0, '9.626')] [2025-01-04 10:54:35,203][134294] Updated weights for policy 0, policy_version 177744 (0.0028) [2025-01-04 10:54:38,194][134294] Updated weights for policy 0, policy_version 177754 (0.0022) [2025-01-04 10:54:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13721.6, 300 sec: 13565.4). Total num frames: 728084480. Throughput: 0: 3488.2. Samples: 171189714. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:38,968][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 10:54:41,319][134294] Updated weights for policy 0, policy_version 177764 (0.0026) [2025-01-04 10:54:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13721.6, 300 sec: 13579.3). Total num frames: 728154112. Throughput: 0: 3505.5. Samples: 171209508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:43,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 10:54:44,445][134294] Updated weights for policy 0, policy_version 177774 (0.0026) [2025-01-04 10:54:47,509][134294] Updated weights for policy 0, policy_version 177784 (0.0026) [2025-01-04 10:54:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13593.2). Total num frames: 728219648. Throughput: 0: 3497.5. Samples: 171219566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:48,968][134211] Avg episode reward: [(0, '10.791')] [2025-01-04 10:54:50,432][134294] Updated weights for policy 0, policy_version 177794 (0.0024) [2025-01-04 10:54:53,501][134294] Updated weights for policy 0, policy_version 177804 (0.0026) [2025-01-04 10:54:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 13607.0). Total num frames: 728289280. Throughput: 0: 3519.0. Samples: 171240352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:53,968][134211] Avg episode reward: [(0, '10.037')] [2025-01-04 10:54:56,721][134294] Updated weights for policy 0, policy_version 177814 (0.0023) [2025-01-04 10:54:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.9, 300 sec: 13593.2). Total num frames: 728350720. Throughput: 0: 3492.7. Samples: 171259064. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:54:58,968][134211] Avg episode reward: [(0, '9.560')] [2025-01-04 10:54:59,940][134294] Updated weights for policy 0, policy_version 177824 (0.0024) [2025-01-04 10:55:02,989][134294] Updated weights for policy 0, policy_version 177834 (0.0026) [2025-01-04 10:55:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13607.0). Total num frames: 728420352. Throughput: 0: 3507.3. Samples: 171268962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:03,968][134211] Avg episode reward: [(0, '8.991')] [2025-01-04 10:55:06,147][134294] Updated weights for policy 0, policy_version 177844 (0.0027) [2025-01-04 10:55:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.9, 300 sec: 13579.3). Total num frames: 728485888. Throughput: 0: 3561.7. Samples: 171288976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:08,968][134211] Avg episode reward: [(0, '8.364')] [2025-01-04 10:55:09,094][134294] Updated weights for policy 0, policy_version 177854 (0.0025) [2025-01-04 10:55:12,015][134294] Updated weights for policy 0, policy_version 177864 (0.0025) [2025-01-04 10:55:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.4, 300 sec: 13607.1). Total num frames: 728555520. Throughput: 0: 3377.9. Samples: 171309746. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:13,968][134211] Avg episode reward: [(0, '10.007')] [2025-01-04 10:55:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177870_728555520.pth... 
[2025-01-04 10:55:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177072_725286912.pth [2025-01-04 10:55:15,233][134294] Updated weights for policy 0, policy_version 177874 (0.0024) [2025-01-04 10:55:18,152][134294] Updated weights for policy 0, policy_version 177884 (0.0025) [2025-01-04 10:55:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 13634.9). Total num frames: 728621056. Throughput: 0: 3314.2. Samples: 171319406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:18,968][134211] Avg episode reward: [(0, '8.682')] [2025-01-04 10:55:21,413][134294] Updated weights for policy 0, policy_version 177894 (0.0026) [2025-01-04 10:55:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 13634.8). Total num frames: 728682496. Throughput: 0: 3318.4. Samples: 171339042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:23,968][134211] Avg episode reward: [(0, '8.380')] [2025-01-04 10:55:24,728][134294] Updated weights for policy 0, policy_version 177904 (0.0026) [2025-01-04 10:55:27,854][134294] Updated weights for policy 0, policy_version 177914 (0.0022) [2025-01-04 10:55:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.7, 300 sec: 13551.5). Total num frames: 728748032. Throughput: 0: 3301.0. Samples: 171358054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:28,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 10:55:30,777][134294] Updated weights for policy 0, policy_version 177924 (0.0027) [2025-01-04 10:55:33,790][134294] Updated weights for policy 0, policy_version 177934 (0.0024) [2025-01-04 10:55:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13243.7, 300 sec: 13565.4). Total num frames: 728817664. Throughput: 0: 3307.6. Samples: 171368408. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:33,968][134211] Avg episode reward: [(0, '8.721')] [2025-01-04 10:55:36,915][134294] Updated weights for policy 0, policy_version 177944 (0.0031) [2025-01-04 10:55:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13312.0, 300 sec: 13579.3). Total num frames: 728883200. Throughput: 0: 3293.6. Samples: 171388562. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:38,968][134211] Avg episode reward: [(0, '8.724')] [2025-01-04 10:55:40,111][134294] Updated weights for policy 0, policy_version 177954 (0.0024) [2025-01-04 10:55:43,154][134294] Updated weights for policy 0, policy_version 177964 (0.0030) [2025-01-04 10:55:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 13579.3). Total num frames: 728948736. Throughput: 0: 3318.4. Samples: 171408394. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:43,968][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 10:55:46,405][134294] Updated weights for policy 0, policy_version 177974 (0.0025) [2025-01-04 10:55:48,742][134294] Updated weights for policy 0, policy_version 177984 (0.0016) [2025-01-04 10:55:48,967][134211] Fps is (10 sec: 13926.8, 60 sec: 13380.3, 300 sec: 13607.1). Total num frames: 729022464. Throughput: 0: 3313.0. Samples: 171418044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:48,968][134211] Avg episode reward: [(0, '9.029')] [2025-01-04 10:55:50,919][134294] Updated weights for policy 0, policy_version 177994 (0.0012) [2025-01-04 10:55:53,004][134294] Updated weights for policy 0, policy_version 178004 (0.0015) [2025-01-04 10:55:53,968][134211] Fps is (10 sec: 16793.9, 60 sec: 13789.9, 300 sec: 13718.1). Total num frames: 729116672. Throughput: 0: 3469.2. Samples: 171445090. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:53,968][134211] Avg episode reward: [(0, '9.223')] [2025-01-04 10:55:55,932][134294] Updated weights for policy 0, policy_version 178014 (0.0020) [2025-01-04 10:55:58,968][134211] Fps is (10 sec: 15564.4, 60 sec: 13789.8, 300 sec: 13704.2). Total num frames: 729178112. Throughput: 0: 3467.4. Samples: 171465780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:55:58,968][134211] Avg episode reward: [(0, '10.181')] [2025-01-04 10:55:59,543][134294] Updated weights for policy 0, policy_version 178024 (0.0024) [2025-01-04 10:56:02,660][134294] Updated weights for policy 0, policy_version 178034 (0.0025) [2025-01-04 10:56:03,968][134211] Fps is (10 sec: 12697.0, 60 sec: 13721.5, 300 sec: 13648.7). Total num frames: 729243648. Throughput: 0: 3460.7. Samples: 171475138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:03,969][134211] Avg episode reward: [(0, '9.998')] [2025-01-04 10:56:05,859][134294] Updated weights for policy 0, policy_version 178044 (0.0026) [2025-01-04 10:56:08,841][134294] Updated weights for policy 0, policy_version 178054 (0.0026) [2025-01-04 10:56:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13721.5, 300 sec: 13648.7). Total num frames: 729309184. Throughput: 0: 3461.1. Samples: 171494790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:08,968][134211] Avg episode reward: [(0, '8.465')] [2025-01-04 10:56:11,869][134294] Updated weights for policy 0, policy_version 178064 (0.0026) [2025-01-04 10:56:13,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13653.3, 300 sec: 13648.7). Total num frames: 729374720. Throughput: 0: 3490.0. Samples: 171515102. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:13,968][134211] Avg episode reward: [(0, '9.002')] [2025-01-04 10:56:14,962][134294] Updated weights for policy 0, policy_version 178074 (0.0026) [2025-01-04 10:56:18,009][134294] Updated weights for policy 0, policy_version 178084 (0.0023) [2025-01-04 10:56:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13653.3, 300 sec: 13662.6). Total num frames: 729440256. Throughput: 0: 3481.6. Samples: 171525080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:18,968][134211] Avg episode reward: [(0, '8.909')] [2025-01-04 10:56:21,386][134294] Updated weights for policy 0, policy_version 178094 (0.0025) [2025-01-04 10:56:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13653.3, 300 sec: 13509.8). Total num frames: 729501696. Throughput: 0: 3448.4. Samples: 171543742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:23,968][134211] Avg episode reward: [(0, '9.534')] [2025-01-04 10:56:25,010][134294] Updated weights for policy 0, policy_version 178104 (0.0025) [2025-01-04 10:56:28,436][134294] Updated weights for policy 0, policy_version 178114 (0.0028) [2025-01-04 10:56:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13516.8, 300 sec: 13426.6). Total num frames: 729559040. Throughput: 0: 3394.8. Samples: 171561158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:28,968][134211] Avg episode reward: [(0, '9.309')] [2025-01-04 10:56:31,566][134294] Updated weights for policy 0, policy_version 178124 (0.0028) [2025-01-04 10:56:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13448.6, 300 sec: 13440.4). Total num frames: 729624576. Throughput: 0: 3395.2. Samples: 171570830. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:33,968][134211] Avg episode reward: [(0, '9.861')] [2025-01-04 10:56:34,800][134294] Updated weights for policy 0, policy_version 178134 (0.0026) [2025-01-04 10:56:37,801][134294] Updated weights for policy 0, policy_version 178144 (0.0024) [2025-01-04 10:56:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 13454.3). Total num frames: 729690112. Throughput: 0: 3225.7. Samples: 171590248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:38,969][134211] Avg episode reward: [(0, '9.868')] [2025-01-04 10:56:40,895][134294] Updated weights for policy 0, policy_version 178154 (0.0027) [2025-01-04 10:56:43,768][134294] Updated weights for policy 0, policy_version 178164 (0.0024) [2025-01-04 10:56:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13516.8, 300 sec: 13468.2). Total num frames: 729759744. Throughput: 0: 3228.0. Samples: 171611040. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:43,968][134211] Avg episode reward: [(0, '10.047')] [2025-01-04 10:56:46,793][134294] Updated weights for policy 0, policy_version 178174 (0.0025) [2025-01-04 10:56:48,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13448.4, 300 sec: 13509.9). Total num frames: 729829376. Throughput: 0: 3246.6. Samples: 171621236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:48,969][134211] Avg episode reward: [(0, '9.059')] [2025-01-04 10:56:49,884][134294] Updated weights for policy 0, policy_version 178184 (0.0027) [2025-01-04 10:56:52,991][134294] Updated weights for policy 0, policy_version 178194 (0.0024) [2025-01-04 10:56:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12902.4, 300 sec: 13509.9). Total num frames: 729890816. Throughput: 0: 3252.1. Samples: 171641136. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:53,968][134211] Avg episode reward: [(0, '8.979')] [2025-01-04 10:56:56,272][134294] Updated weights for policy 0, policy_version 178204 (0.0026) [2025-01-04 10:56:58,968][134211] Fps is (10 sec: 12697.9, 60 sec: 12970.7, 300 sec: 13523.7). Total num frames: 729956352. Throughput: 0: 3219.9. Samples: 171659996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:56:58,968][134211] Avg episode reward: [(0, '9.899')] [2025-01-04 10:56:59,479][134294] Updated weights for policy 0, policy_version 178214 (0.0026) [2025-01-04 10:57:02,460][134294] Updated weights for policy 0, policy_version 178224 (0.0029) [2025-01-04 10:57:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 12970.7, 300 sec: 13523.7). Total num frames: 730021888. Throughput: 0: 3226.7. Samples: 171670280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:57:03,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 10:57:05,597][134294] Updated weights for policy 0, policy_version 178234 (0.0025) [2025-01-04 10:57:08,670][134294] Updated weights for policy 0, policy_version 178244 (0.0026) [2025-01-04 10:57:08,967][134211] Fps is (10 sec: 13107.6, 60 sec: 12970.8, 300 sec: 13537.6). Total num frames: 730087424. Throughput: 0: 3249.2. Samples: 171689954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:57:08,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 10:57:10,657][134294] Updated weights for policy 0, policy_version 178254 (0.0013) [2025-01-04 10:57:13,339][134294] Updated weights for policy 0, policy_version 178264 (0.0023) [2025-01-04 10:57:13,969][134211] Fps is (10 sec: 15563.6, 60 sec: 13380.1, 300 sec: 13620.9). Total num frames: 730177536. Throughput: 0: 3417.2. Samples: 171714936. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:57:13,969][134211] Avg episode reward: [(0, '8.913')] [2025-01-04 10:57:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000178266_730177536.pth... [2025-01-04 10:57:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177474_726933504.pth [2025-01-04 10:57:16,542][134294] Updated weights for policy 0, policy_version 178274 (0.0027) [2025-01-04 10:57:18,968][134211] Fps is (10 sec: 15155.0, 60 sec: 13312.0, 300 sec: 13620.9). Total num frames: 730238976. Throughput: 0: 3417.5. Samples: 171724618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:57:18,968][134211] Avg episode reward: [(0, '10.110')] [2025-01-04 10:57:19,816][134294] Updated weights for policy 0, policy_version 178284 (0.0025) [2025-01-04 10:57:23,053][134294] Updated weights for policy 0, policy_version 178294 (0.0027) [2025-01-04 10:57:23,968][134211] Fps is (10 sec: 12288.7, 60 sec: 13311.9, 300 sec: 13634.8). Total num frames: 730300416. Throughput: 0: 3403.4. Samples: 171743404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 10:57:23,969][134211] Avg episode reward: [(0, '9.575')] [2025-01-04 10:57:26,384][134294] Updated weights for policy 0, policy_version 178304 (0.0028) [2025-01-04 10:57:28,967][134211] Fps is (10 sec: 13107.4, 60 sec: 13516.9, 300 sec: 13607.1). Total num frames: 730370048. Throughput: 0: 3361.3. Samples: 171762296. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:28,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 10:57:29,028][134294] Updated weights for policy 0, policy_version 178314 (0.0017) [2025-01-04 10:57:30,905][134294] Updated weights for policy 0, policy_version 178324 (0.0012) [2025-01-04 10:57:33,090][134294] Updated weights for policy 0, policy_version 178334 (0.0017) [2025-01-04 10:57:33,968][134211] Fps is (10 sec: 16384.6, 60 sec: 13994.7, 300 sec: 13732.0). Total num frames: 730464256. Throughput: 0: 3492.6. Samples: 171778400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:33,968][134211] Avg episode reward: [(0, '8.546')] [2025-01-04 10:57:36,242][134294] Updated weights for policy 0, policy_version 178344 (0.0028) [2025-01-04 10:57:38,968][134211] Fps is (10 sec: 15973.9, 60 sec: 13994.7, 300 sec: 13732.0). Total num frames: 730529792. Throughput: 0: 3537.5. Samples: 171800322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:38,968][134211] Avg episode reward: [(0, '7.679')] [2025-01-04 10:57:39,511][134294] Updated weights for policy 0, policy_version 178354 (0.0027) [2025-01-04 10:57:42,593][134294] Updated weights for policy 0, policy_version 178364 (0.0024) [2025-01-04 10:57:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 13732.0). Total num frames: 730595328. Throughput: 0: 3553.2. Samples: 171819888. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:43,968][134211] Avg episode reward: [(0, '9.172')] [2025-01-04 10:57:45,600][134294] Updated weights for policy 0, policy_version 178374 (0.0025) [2025-01-04 10:57:48,646][134294] Updated weights for policy 0, policy_version 178384 (0.0024) [2025-01-04 10:57:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 13732.0). Total num frames: 730664960. Throughput: 0: 3552.2. Samples: 171830130. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:48,968][134211] Avg episode reward: [(0, '8.699')] [2025-01-04 10:57:51,868][134294] Updated weights for policy 0, policy_version 178394 (0.0023) [2025-01-04 10:57:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 13690.3). Total num frames: 730726400. Throughput: 0: 3545.9. Samples: 171849520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:53,968][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 10:57:55,457][134294] Updated weights for policy 0, policy_version 178404 (0.0026) [2025-01-04 10:57:58,586][134294] Updated weights for policy 0, policy_version 178414 (0.0026) [2025-01-04 10:57:58,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13858.1, 300 sec: 13537.6). Total num frames: 730787840. Throughput: 0: 3398.5. Samples: 171867864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:57:58,968][134211] Avg episode reward: [(0, '9.471')] [2025-01-04 10:58:01,549][134294] Updated weights for policy 0, policy_version 178424 (0.0025) [2025-01-04 10:58:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 13509.8). Total num frames: 730853376. Throughput: 0: 3410.0. Samples: 171878070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:03,968][134211] Avg episode reward: [(0, '9.951')] [2025-01-04 10:58:04,685][134294] Updated weights for policy 0, policy_version 178434 (0.0025) [2025-01-04 10:58:07,825][134294] Updated weights for policy 0, policy_version 178444 (0.0027) [2025-01-04 10:58:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13858.1, 300 sec: 13509.9). Total num frames: 730918912. Throughput: 0: 3426.6. Samples: 171897598. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:08,968][134211] Avg episode reward: [(0, '9.790')] [2025-01-04 10:58:10,768][134294] Updated weights for policy 0, policy_version 178454 (0.0023) [2025-01-04 10:58:13,707][134294] Updated weights for policy 0, policy_version 178464 (0.0026) [2025-01-04 10:58:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13517.0, 300 sec: 13523.7). Total num frames: 730988544. Throughput: 0: 3473.6. Samples: 171918608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:13,968][134211] Avg episode reward: [(0, '9.282')] [2025-01-04 10:58:16,688][134294] Updated weights for policy 0, policy_version 178474 (0.0023) [2025-01-04 10:58:18,971][134211] Fps is (10 sec: 13921.9, 60 sec: 13652.6, 300 sec: 13537.5). Total num frames: 731058176. Throughput: 0: 3343.8. Samples: 171928884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:18,972][134211] Avg episode reward: [(0, '10.456')] [2025-01-04 10:58:19,918][134294] Updated weights for policy 0, policy_version 178484 (0.0025) [2025-01-04 10:58:23,526][134294] Updated weights for policy 0, policy_version 178494 (0.0029) [2025-01-04 10:58:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13585.1, 300 sec: 13509.9). Total num frames: 731115520. Throughput: 0: 3268.0. Samples: 171947380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:23,968][134211] Avg episode reward: [(0, '9.953')] [2025-01-04 10:58:26,339][134294] Updated weights for policy 0, policy_version 178504 (0.0019) [2025-01-04 10:58:28,492][134294] Updated weights for policy 0, policy_version 178514 (0.0012) [2025-01-04 10:58:28,968][134211] Fps is (10 sec: 14340.8, 60 sec: 13858.1, 300 sec: 13579.3). Total num frames: 731201536. Throughput: 0: 3339.4. Samples: 171970160. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:58:28,968][134211] Avg episode reward: [(0, '9.724')] [2025-01-04 10:58:30,472][134294] Updated weights for policy 0, policy_version 178524 (0.0011) [2025-01-04 10:58:32,375][134294] Updated weights for policy 0, policy_version 178534 (0.0013) [2025-01-04 10:58:33,968][134211] Fps is (10 sec: 19251.6, 60 sec: 14062.9, 300 sec: 13718.1). Total num frames: 731308032. Throughput: 0: 3458.0. Samples: 171985740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:33,968][134211] Avg episode reward: [(0, '10.811')] [2025-01-04 10:58:34,232][134294] Updated weights for policy 0, policy_version 178544 (0.0014) [2025-01-04 10:58:37,119][134294] Updated weights for policy 0, policy_version 178554 (0.0024) [2025-01-04 10:58:38,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14131.2, 300 sec: 13718.1). Total num frames: 731377664. Throughput: 0: 3626.7. Samples: 172012722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:38,969][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 10:58:40,472][134294] Updated weights for policy 0, policy_version 178564 (0.0027) [2025-01-04 10:58:43,580][134294] Updated weights for policy 0, policy_version 178574 (0.0028) [2025-01-04 10:58:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13718.1). Total num frames: 731443200. Throughput: 0: 3642.4. Samples: 172031770. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:43,968][134211] Avg episode reward: [(0, '9.390')] [2025-01-04 10:58:46,645][134294] Updated weights for policy 0, policy_version 178584 (0.0025) [2025-01-04 10:58:48,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14062.8, 300 sec: 13732.0). Total num frames: 731508736. Throughput: 0: 3636.2. Samples: 172041700. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:48,969][134211] Avg episode reward: [(0, '9.659')] [2025-01-04 10:58:49,852][134294] Updated weights for policy 0, policy_version 178594 (0.0030) [2025-01-04 10:58:53,189][134294] Updated weights for policy 0, policy_version 178604 (0.0028) [2025-01-04 10:58:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14063.0, 300 sec: 13718.1). Total num frames: 731570176. Throughput: 0: 3621.4. Samples: 172060562. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:53,968][134211] Avg episode reward: [(0, '9.125')] [2025-01-04 10:58:56,420][134294] Updated weights for policy 0, policy_version 178614 (0.0027) [2025-01-04 10:58:58,968][134211] Fps is (10 sec: 12288.6, 60 sec: 14062.9, 300 sec: 13704.2). Total num frames: 731631616. Throughput: 0: 3570.3. Samples: 172079270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:58:58,968][134211] Avg episode reward: [(0, '9.035')] [2025-01-04 10:58:59,718][134294] Updated weights for policy 0, policy_version 178624 (0.0027) [2025-01-04 10:59:02,895][134294] Updated weights for policy 0, policy_version 178634 (0.0025) [2025-01-04 10:59:03,968][134211] Fps is (10 sec: 12696.9, 60 sec: 14062.8, 300 sec: 13690.3). Total num frames: 731697152. Throughput: 0: 3559.8. Samples: 172089066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:03,969][134211] Avg episode reward: [(0, '10.399')] [2025-01-04 10:59:05,867][134294] Updated weights for policy 0, policy_version 178644 (0.0026) [2025-01-04 10:59:08,874][134294] Updated weights for policy 0, policy_version 178654 (0.0025) [2025-01-04 10:59:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13704.3). Total num frames: 731766784. Throughput: 0: 3588.0. Samples: 172108838. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:08,969][134211] Avg episode reward: [(0, '8.727')] [2025-01-04 10:59:11,908][134294] Updated weights for policy 0, policy_version 178664 (0.0026) [2025-01-04 10:59:13,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14063.0, 300 sec: 13732.0). Total num frames: 731832320. Throughput: 0: 3524.4. Samples: 172128758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:13,968][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 10:59:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000178670_731832320.pth... [2025-01-04 10:59:14,044][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000177870_728555520.pth [2025-01-04 10:59:15,322][134294] Updated weights for policy 0, policy_version 178674 (0.0026) [2025-01-04 10:59:18,379][134294] Updated weights for policy 0, policy_version 178684 (0.0027) [2025-01-04 10:59:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13927.1, 300 sec: 13704.2). Total num frames: 731893760. Throughput: 0: 3391.5. Samples: 172138356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:18,968][134211] Avg episode reward: [(0, '8.778')] [2025-01-04 10:59:21,594][134294] Updated weights for policy 0, policy_version 178694 (0.0029) [2025-01-04 10:59:23,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14063.0, 300 sec: 13579.3). Total num frames: 731959296. Throughput: 0: 3218.3. Samples: 172157546. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:23,968][134211] Avg episode reward: [(0, '8.339')] [2025-01-04 10:59:24,913][134294] Updated weights for policy 0, policy_version 178704 (0.0026) [2025-01-04 10:59:28,234][134294] Updated weights for policy 0, policy_version 178714 (0.0026) [2025-01-04 10:59:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13653.3, 300 sec: 13551.5). Total num frames: 732020736. 
Throughput: 0: 3209.9. Samples: 172176216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:28,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 10:59:31,164][134294] Updated weights for policy 0, policy_version 178724 (0.0027) [2025-01-04 10:59:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 12970.6, 300 sec: 13565.4). Total num frames: 732086272. Throughput: 0: 3219.7. Samples: 172186584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 10:59:33,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 10:59:34,254][134294] Updated weights for policy 0, policy_version 178734 (0.0026) [2025-01-04 10:59:37,414][134294] Updated weights for policy 0, policy_version 178744 (0.0025) [2025-01-04 10:59:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12902.4, 300 sec: 13551.5). Total num frames: 732151808. Throughput: 0: 3237.6. Samples: 172206252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:59:38,968][134211] Avg episode reward: [(0, '9.599')] [2025-01-04 10:59:40,465][134294] Updated weights for policy 0, policy_version 178754 (0.0025) [2025-01-04 10:59:43,378][134294] Updated weights for policy 0, policy_version 178764 (0.0026) [2025-01-04 10:59:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12970.6, 300 sec: 13565.4). Total num frames: 732221440. Throughput: 0: 3278.2. Samples: 172226788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:59:43,969][134211] Avg episode reward: [(0, '8.800')] [2025-01-04 10:59:46,378][134294] Updated weights for policy 0, policy_version 178774 (0.0025) [2025-01-04 10:59:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13039.0, 300 sec: 13565.4). Total num frames: 732291072. Throughput: 0: 3293.0. Samples: 172237250. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:59:48,968][134211] Avg episode reward: [(0, '10.185')] [2025-01-04 10:59:49,501][134294] Updated weights for policy 0, policy_version 178784 (0.0030) [2025-01-04 10:59:52,806][134294] Updated weights for policy 0, policy_version 178794 (0.0025) [2025-01-04 10:59:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13038.9, 300 sec: 13565.4). Total num frames: 732352512. Throughput: 0: 3275.6. Samples: 172256238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:59:53,968][134211] Avg episode reward: [(0, '8.622')] [2025-01-04 10:59:56,112][134294] Updated weights for policy 0, policy_version 178804 (0.0024) [2025-01-04 10:59:58,117][134294] Updated weights for policy 0, policy_version 178814 (0.0013) [2025-01-04 10:59:58,967][134211] Fps is (10 sec: 14746.0, 60 sec: 13448.6, 300 sec: 13621.0). Total num frames: 732438528. Throughput: 0: 3338.3. Samples: 172278980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 10:59:58,968][134211] Avg episode reward: [(0, '8.759')] [2025-01-04 10:59:59,991][134294] Updated weights for policy 0, policy_version 178824 (0.0013) [2025-01-04 11:00:02,935][134294] Updated weights for policy 0, policy_version 178834 (0.0021) [2025-01-04 11:00:03,968][134211] Fps is (10 sec: 15974.2, 60 sec: 13585.2, 300 sec: 13648.7). Total num frames: 732512256. Throughput: 0: 3457.0. Samples: 172293922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:03,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 11:00:06,403][134294] Updated weights for policy 0, policy_version 178844 (0.0029) [2025-01-04 11:00:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13380.3, 300 sec: 13607.0). Total num frames: 732569600. Throughput: 0: 3414.8. Samples: 172311212. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:08,969][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 11:00:10,427][134294] Updated weights for policy 0, policy_version 178854 (0.0027) [2025-01-04 11:00:13,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13175.4, 300 sec: 13565.4). Total num frames: 732622848. Throughput: 0: 3366.2. Samples: 172327696. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:13,969][134211] Avg episode reward: [(0, '8.980')] [2025-01-04 11:00:13,997][134294] Updated weights for policy 0, policy_version 178864 (0.0026) [2025-01-04 11:00:16,671][134294] Updated weights for policy 0, policy_version 178874 (0.0022) [2025-01-04 11:00:18,693][134294] Updated weights for policy 0, policy_version 178884 (0.0014) [2025-01-04 11:00:18,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13653.3, 300 sec: 13662.6). Total num frames: 732712960. Throughput: 0: 3358.6. Samples: 172337722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:18,968][134211] Avg episode reward: [(0, '8.370')] [2025-01-04 11:00:20,859][134294] Updated weights for policy 0, policy_version 178894 (0.0013) [2025-01-04 11:00:23,402][134294] Updated weights for policy 0, policy_version 178904 (0.0019) [2025-01-04 11:00:23,968][134211] Fps is (10 sec: 17203.2, 60 sec: 13926.4, 300 sec: 13718.1). Total num frames: 732794880. Throughput: 0: 3567.1. Samples: 172366774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:23,969][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 11:00:27,299][134294] Updated weights for policy 0, policy_version 178914 (0.0027) [2025-01-04 11:00:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13789.9, 300 sec: 13662.6). Total num frames: 732848128. Throughput: 0: 3483.3. Samples: 172383536. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:28,968][134211] Avg episode reward: [(0, '10.484')] [2025-01-04 11:00:30,832][134294] Updated weights for policy 0, policy_version 178924 (0.0030) [2025-01-04 11:00:33,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13653.3, 300 sec: 13634.8). Total num frames: 732905472. Throughput: 0: 3446.3. Samples: 172392332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:33,969][134211] Avg episode reward: [(0, '9.397')] [2025-01-04 11:00:34,288][134294] Updated weights for policy 0, policy_version 178934 (0.0029) [2025-01-04 11:00:37,838][134294] Updated weights for policy 0, policy_version 178944 (0.0027) [2025-01-04 11:00:38,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13585.1, 300 sec: 13620.9). Total num frames: 732966912. Throughput: 0: 3415.6. Samples: 172409938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:00:38,968][134211] Avg episode reward: [(0, '9.528')] [2025-01-04 11:00:41,413][134294] Updated weights for policy 0, policy_version 178954 (0.0027) [2025-01-04 11:00:43,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13380.3, 300 sec: 13565.4). Total num frames: 733024256. Throughput: 0: 3293.2. Samples: 172427174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:00:43,969][134211] Avg episode reward: [(0, '9.900')] [2025-01-04 11:00:44,994][134294] Updated weights for policy 0, policy_version 178964 (0.0025) [2025-01-04 11:00:48,395][134294] Updated weights for policy 0, policy_version 178974 (0.0027) [2025-01-04 11:00:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13175.5, 300 sec: 13440.4). Total num frames: 733081600. Throughput: 0: 3152.4. Samples: 172435778. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:00:48,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 11:00:51,284][134294] Updated weights for policy 0, policy_version 178984 (0.0019) [2025-01-04 11:00:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13380.3, 300 sec: 13482.1). Total num frames: 733155328. Throughput: 0: 3240.0. Samples: 172457012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:00:53,968][134211] Avg episode reward: [(0, '10.015')] [2025-01-04 11:00:54,018][134294] Updated weights for policy 0, policy_version 178994 (0.0019) [2025-01-04 11:00:57,558][134294] Updated weights for policy 0, policy_version 179004 (0.0028) [2025-01-04 11:00:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12902.3, 300 sec: 13454.3). Total num frames: 733212672. Throughput: 0: 3272.4. Samples: 172474956. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:00:58,970][134211] Avg episode reward: [(0, '9.245')] [2025-01-04 11:01:01,070][134294] Updated weights for policy 0, policy_version 179014 (0.0026) [2025-01-04 11:01:03,973][134211] Fps is (10 sec: 11463.3, 60 sec: 12628.4, 300 sec: 13426.3). Total num frames: 733270016. Throughput: 0: 3248.3. Samples: 172483910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:03,973][134211] Avg episode reward: [(0, '8.873')] [2025-01-04 11:01:04,782][134294] Updated weights for policy 0, policy_version 179024 (0.0024) [2025-01-04 11:01:07,264][134294] Updated weights for policy 0, policy_version 179034 (0.0015) [2025-01-04 11:01:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13107.2, 300 sec: 13496.0). Total num frames: 733356032. Throughput: 0: 3046.9. Samples: 172503884. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:08,968][134211] Avg episode reward: [(0, '8.457')] [2025-01-04 11:01:09,330][134294] Updated weights for policy 0, policy_version 179044 (0.0013) [2025-01-04 11:01:11,355][134294] Updated weights for policy 0, policy_version 179054 (0.0013) [2025-01-04 11:01:13,337][134294] Updated weights for policy 0, policy_version 179064 (0.0014) [2025-01-04 11:01:13,968][134211] Fps is (10 sec: 18850.7, 60 sec: 13926.4, 300 sec: 13620.9). Total num frames: 733458432. Throughput: 0: 3339.1. Samples: 172533796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:13,968][134211] Avg episode reward: [(0, '8.773')] [2025-01-04 11:01:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179067_733458432.pth... [2025-01-04 11:01:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000178266_730177536.pth [2025-01-04 11:01:15,398][134294] Updated weights for policy 0, policy_version 179074 (0.0016) [2025-01-04 11:01:18,321][134294] Updated weights for policy 0, policy_version 179084 (0.0023) [2025-01-04 11:01:18,968][134211] Fps is (10 sec: 17612.4, 60 sec: 13653.3, 300 sec: 13662.6). Total num frames: 733532160. Throughput: 0: 3469.6. Samples: 172548464. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:18,969][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 11:01:21,949][134294] Updated weights for policy 0, policy_version 179094 (0.0024) [2025-01-04 11:01:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13243.8, 300 sec: 13662.6). Total num frames: 733589504. Throughput: 0: 3468.3. Samples: 172566012. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:23,968][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 11:01:25,671][134294] Updated weights for policy 0, policy_version 179104 (0.0028) [2025-01-04 11:01:28,969][134211] Fps is (10 sec: 11466.9, 60 sec: 13311.6, 300 sec: 13634.7). Total num frames: 733646848. Throughput: 0: 3461.7. Samples: 172582954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:28,970][134211] Avg episode reward: [(0, '9.410')] [2025-01-04 11:01:29,231][134294] Updated weights for policy 0, policy_version 179114 (0.0027) [2025-01-04 11:01:32,519][134294] Updated weights for policy 0, policy_version 179124 (0.0025) [2025-01-04 11:01:33,968][134211] Fps is (10 sec: 11878.0, 60 sec: 13380.3, 300 sec: 13620.9). Total num frames: 733708288. Throughput: 0: 3470.0. Samples: 172591928. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:33,969][134211] Avg episode reward: [(0, '8.102')] [2025-01-04 11:01:35,887][134294] Updated weights for policy 0, policy_version 179134 (0.0026) [2025-01-04 11:01:38,968][134211] Fps is (10 sec: 12290.2, 60 sec: 13380.3, 300 sec: 13593.2). Total num frames: 733769728. Throughput: 0: 3410.8. Samples: 172610496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:38,968][134211] Avg episode reward: [(0, '9.338')] [2025-01-04 11:01:39,244][134294] Updated weights for policy 0, policy_version 179144 (0.0024) [2025-01-04 11:01:42,651][134294] Updated weights for policy 0, policy_version 179154 (0.0024) [2025-01-04 11:01:43,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13448.6, 300 sec: 13565.4). Total num frames: 733831168. Throughput: 0: 3414.5. Samples: 172628606. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:43,968][134211] Avg episode reward: [(0, '10.067')] [2025-01-04 11:01:45,886][134294] Updated weights for policy 0, policy_version 179164 (0.0032) [2025-01-04 11:01:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13516.8, 300 sec: 13565.4). Total num frames: 733892608. Throughput: 0: 3427.7. Samples: 172638142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:01:48,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 11:01:49,040][134294] Updated weights for policy 0, policy_version 179174 (0.0022) [2025-01-04 11:01:52,492][134294] Updated weights for policy 0, policy_version 179184 (0.0022) [2025-01-04 11:01:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13312.0, 300 sec: 13551.5). Total num frames: 733954048. Throughput: 0: 3396.8. Samples: 172656742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:01:53,968][134211] Avg episode reward: [(0, '10.128')] [2025-01-04 11:01:55,891][134294] Updated weights for policy 0, policy_version 179194 (0.0026) [2025-01-04 11:01:58,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13312.0, 300 sec: 13523.7). Total num frames: 734011392. Throughput: 0: 3134.5. Samples: 172674850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:01:58,968][134211] Avg episode reward: [(0, '9.997')] [2025-01-04 11:01:59,342][134294] Updated weights for policy 0, policy_version 179204 (0.0025) [2025-01-04 11:02:02,732][134294] Updated weights for policy 0, policy_version 179214 (0.0024) [2025-01-04 11:02:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13381.3, 300 sec: 13509.8). Total num frames: 734072832. Throughput: 0: 3005.3. Samples: 172683702. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:03,969][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 11:02:05,983][134294] Updated weights for policy 0, policy_version 179224 (0.0029) [2025-01-04 11:02:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 13412.7). Total num frames: 734134272. Throughput: 0: 3028.8. Samples: 172702308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:08,968][134211] Avg episode reward: [(0, '8.922')] [2025-01-04 11:02:09,482][134294] Updated weights for policy 0, policy_version 179234 (0.0027) [2025-01-04 11:02:13,192][134294] Updated weights for policy 0, policy_version 179244 (0.0028) [2025-01-04 11:02:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12219.7, 300 sec: 13398.8). Total num frames: 734191616. Throughput: 0: 3025.0. Samples: 172719074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:13,969][134211] Avg episode reward: [(0, '9.438')] [2025-01-04 11:02:16,521][134294] Updated weights for policy 0, policy_version 179254 (0.0027) [2025-01-04 11:02:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12014.9, 300 sec: 13398.8). Total num frames: 734253056. Throughput: 0: 3028.4. Samples: 172728204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:18,968][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 11:02:19,632][134294] Updated weights for policy 0, policy_version 179264 (0.0023) [2025-01-04 11:02:22,127][134294] Updated weights for policy 0, policy_version 179274 (0.0018) [2025-01-04 11:02:23,970][134211] Fps is (10 sec: 13104.5, 60 sec: 12219.3, 300 sec: 13398.7). Total num frames: 734322688. Throughput: 0: 3091.6. Samples: 172749624. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:23,971][134211] Avg episode reward: [(0, '9.697')] [2025-01-04 11:02:25,592][134294] Updated weights for policy 0, policy_version 179284 (0.0027) [2025-01-04 11:02:28,848][134294] Updated weights for policy 0, policy_version 179294 (0.0021) [2025-01-04 11:02:28,967][134211] Fps is (10 sec: 13517.2, 60 sec: 12356.7, 300 sec: 13301.6). Total num frames: 734388224. Throughput: 0: 3085.6. Samples: 172767456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:28,968][134211] Avg episode reward: [(0, '9.000')] [2025-01-04 11:02:31,043][134294] Updated weights for policy 0, policy_version 179304 (0.0014) [2025-01-04 11:02:33,183][134294] Updated weights for policy 0, policy_version 179314 (0.0013) [2025-01-04 11:02:33,968][134211] Fps is (10 sec: 15978.1, 60 sec: 12902.5, 300 sec: 13398.8). Total num frames: 734482432. Throughput: 0: 3179.7. Samples: 172781230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:33,968][134211] Avg episode reward: [(0, '9.760')] [2025-01-04 11:02:36,102][134294] Updated weights for policy 0, policy_version 179324 (0.0023) [2025-01-04 11:02:38,968][134211] Fps is (10 sec: 15155.0, 60 sec: 12834.1, 300 sec: 13371.0). Total num frames: 734539776. Throughput: 0: 3266.5. Samples: 172803734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:38,968][134211] Avg episode reward: [(0, '9.663')] [2025-01-04 11:02:39,923][134294] Updated weights for policy 0, policy_version 179334 (0.0031) [2025-01-04 11:02:43,388][134294] Updated weights for policy 0, policy_version 179344 (0.0028) [2025-01-04 11:02:43,968][134211] Fps is (10 sec: 11468.6, 60 sec: 12765.8, 300 sec: 13329.4). Total num frames: 734597120. Throughput: 0: 3243.0. Samples: 172820786. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:43,968][134211] Avg episode reward: [(0, '9.691')] [2025-01-04 11:02:46,847][134294] Updated weights for policy 0, policy_version 179354 (0.0027) [2025-01-04 11:02:48,968][134211] Fps is (10 sec: 11878.2, 60 sec: 12765.8, 300 sec: 13329.4). Total num frames: 734658560. Throughput: 0: 3245.4. Samples: 172829744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:48,968][134211] Avg episode reward: [(0, '9.439')] [2025-01-04 11:02:50,289][134294] Updated weights for policy 0, policy_version 179364 (0.0026) [2025-01-04 11:02:53,677][134294] Updated weights for policy 0, policy_version 179374 (0.0026) [2025-01-04 11:02:53,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12697.6, 300 sec: 13315.5). Total num frames: 734715904. Throughput: 0: 3229.4. Samples: 172847630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:53,973][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 11:02:57,243][134294] Updated weights for policy 0, policy_version 179384 (0.0026) [2025-01-04 11:02:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12765.9, 300 sec: 13301.6). Total num frames: 734777344. Throughput: 0: 3245.7. Samples: 172865132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:02:58,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 11:03:00,675][134294] Updated weights for policy 0, policy_version 179394 (0.0025) [2025-01-04 11:03:03,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12697.6, 300 sec: 13273.8). Total num frames: 734834688. Throughput: 0: 3244.0. Samples: 172874184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:03,968][134211] Avg episode reward: [(0, '8.714')] [2025-01-04 11:03:04,167][134294] Updated weights for policy 0, policy_version 179404 (0.0025) [2025-01-04 11:03:06,623][134294] Updated weights for policy 0, policy_version 179414 (0.0017) [2025-01-04 11:03:08,823][134294] Updated weights for policy 0, policy_version 179424 (0.0013) [2025-01-04 11:03:08,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13107.2, 300 sec: 13329.4). Total num frames: 734920704. Throughput: 0: 3248.4. Samples: 172895794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:08,968][134211] Avg episode reward: [(0, '9.215')] [2025-01-04 11:03:12,067][134294] Updated weights for policy 0, policy_version 179434 (0.0026) [2025-01-04 11:03:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13175.5, 300 sec: 13301.7). Total num frames: 734982144. Throughput: 0: 3312.3. Samples: 172916512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:13,969][134211] Avg episode reward: [(0, '9.876')] [2025-01-04 11:03:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179439_734982144.pth... [2025-01-04 11:03:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000178670_731832320.pth [2025-01-04 11:03:15,611][134294] Updated weights for policy 0, policy_version 179444 (0.0030) [2025-01-04 11:03:18,970][134211] Fps is (10 sec: 11875.7, 60 sec: 13106.8, 300 sec: 13301.5). Total num frames: 735039488. Throughput: 0: 3201.6. Samples: 172925308. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:18,972][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 11:03:19,094][134294] Updated weights for policy 0, policy_version 179454 (0.0027) [2025-01-04 11:03:22,422][134294] Updated weights for policy 0, policy_version 179464 (0.0024) [2025-01-04 11:03:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12971.1, 300 sec: 13218.3). Total num frames: 735100928. Throughput: 0: 3109.4. Samples: 172943656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:23,969][134211] Avg episode reward: [(0, '8.856')] [2025-01-04 11:03:25,861][134294] Updated weights for policy 0, policy_version 179474 (0.0027) [2025-01-04 11:03:28,968][134211] Fps is (10 sec: 12290.6, 60 sec: 12902.3, 300 sec: 13065.5). Total num frames: 735162368. Throughput: 0: 3129.9. Samples: 172961632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:28,968][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 11:03:29,214][134294] Updated weights for policy 0, policy_version 179484 (0.0029) [2025-01-04 11:03:32,618][134294] Updated weights for policy 0, policy_version 179494 (0.0027) [2025-01-04 11:03:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12356.2, 300 sec: 13037.8). Total num frames: 735223808. Throughput: 0: 3135.4. Samples: 172970838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:33,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 11:03:35,889][134294] Updated weights for policy 0, policy_version 179504 (0.0024) [2025-01-04 11:03:38,278][134294] Updated weights for policy 0, policy_version 179514 (0.0015) [2025-01-04 11:03:38,967][134211] Fps is (10 sec: 13926.7, 60 sec: 12697.6, 300 sec: 13079.4). Total num frames: 735301632. Throughput: 0: 3170.9. Samples: 172990318. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:38,968][134211] Avg episode reward: [(0, '9.453')] [2025-01-04 11:03:40,456][134294] Updated weights for policy 0, policy_version 179524 (0.0014) [2025-01-04 11:03:42,508][134294] Updated weights for policy 0, policy_version 179534 (0.0012) [2025-01-04 11:03:43,968][134211] Fps is (10 sec: 16793.8, 60 sec: 13243.8, 300 sec: 13162.8). Total num frames: 735391744. Throughput: 0: 3422.0. Samples: 173019122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:43,968][134211] Avg episode reward: [(0, '10.051')] [2025-01-04 11:03:45,362][134294] Updated weights for policy 0, policy_version 179544 (0.0019) [2025-01-04 11:03:48,906][134294] Updated weights for policy 0, policy_version 179554 (0.0030) [2025-01-04 11:03:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 13243.7, 300 sec: 13162.7). Total num frames: 735453184. Throughput: 0: 3436.4. Samples: 173028824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:48,968][134211] Avg episode reward: [(0, '9.168')] [2025-01-04 11:03:52,257][134294] Updated weights for policy 0, policy_version 179564 (0.0025) [2025-01-04 11:03:53,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13243.7, 300 sec: 13148.9). Total num frames: 735510528. Throughput: 0: 3346.6. Samples: 173046394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:53,969][134211] Avg episode reward: [(0, '10.122')] [2025-01-04 11:03:55,846][134294] Updated weights for policy 0, policy_version 179574 (0.0025) [2025-01-04 11:03:58,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13175.5, 300 sec: 13121.1). Total num frames: 735567872. Throughput: 0: 3275.5. Samples: 173063908. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:03:58,968][134211] Avg episode reward: [(0, '10.226')] [2025-01-04 11:03:59,360][134294] Updated weights for policy 0, policy_version 179584 (0.0028) [2025-01-04 11:04:02,733][134294] Updated weights for policy 0, policy_version 179594 (0.0026) [2025-01-04 11:04:03,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13243.7, 300 sec: 13093.3). Total num frames: 735629312. Throughput: 0: 3274.7. Samples: 173072662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:03,969][134211] Avg episode reward: [(0, '9.338')] [2025-01-04 11:04:06,175][134294] Updated weights for policy 0, policy_version 179604 (0.0028) [2025-01-04 11:04:08,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12765.8, 300 sec: 13065.5). Total num frames: 735686656. Throughput: 0: 3264.0. Samples: 173090536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:08,969][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 11:04:09,733][134294] Updated weights for policy 0, policy_version 179614 (0.0025) [2025-01-04 11:04:13,246][134294] Updated weights for policy 0, policy_version 179624 (0.0026) [2025-01-04 11:04:13,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12765.9, 300 sec: 13065.5). Total num frames: 735748096. Throughput: 0: 3255.4. Samples: 173108126. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:13,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 11:04:16,510][134294] Updated weights for policy 0, policy_version 179634 (0.0023) [2025-01-04 11:04:18,968][134211] Fps is (10 sec: 12288.3, 60 sec: 12834.6, 300 sec: 13051.7). Total num frames: 735809536. Throughput: 0: 3259.4. Samples: 173117510. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:18,968][134211] Avg episode reward: [(0, '10.496')] [2025-01-04 11:04:19,768][134294] Updated weights for policy 0, policy_version 179644 (0.0027) [2025-01-04 11:04:23,288][134294] Updated weights for policy 0, policy_version 179654 (0.0025) [2025-01-04 11:04:23,968][134211] Fps is (10 sec: 11878.2, 60 sec: 12765.9, 300 sec: 13037.8). Total num frames: 735866880. Throughput: 0: 3234.1. Samples: 173135852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:23,968][134211] Avg episode reward: [(0, '9.293')] [2025-01-04 11:04:26,685][134294] Updated weights for policy 0, policy_version 179664 (0.0025) [2025-01-04 11:04:28,936][134294] Updated weights for policy 0, policy_version 179674 (0.0011) [2025-01-04 11:04:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13039.0, 300 sec: 13079.4). Total num frames: 735944704. Throughput: 0: 3038.5. Samples: 173155854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:28,968][134211] Avg episode reward: [(0, '10.707')] [2025-01-04 11:04:31,463][134294] Updated weights for policy 0, policy_version 179684 (0.0017) [2025-01-04 11:04:33,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13175.5, 300 sec: 13093.3). Total num frames: 736014336. Throughput: 0: 3106.9. Samples: 173168636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:33,968][134211] Avg episode reward: [(0, '8.541')] [2025-01-04 11:04:35,042][134294] Updated weights for policy 0, policy_version 179694 (0.0025) [2025-01-04 11:04:38,327][134294] Updated weights for policy 0, policy_version 179704 (0.0027) [2025-01-04 11:04:38,969][134211] Fps is (10 sec: 12695.9, 60 sec: 12833.8, 300 sec: 13051.6). Total num frames: 736071680. Throughput: 0: 3116.7. Samples: 173186650. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:38,970][134211] Avg episode reward: [(0, '10.806')] [2025-01-04 11:04:41,778][134294] Updated weights for policy 0, policy_version 179714 (0.0028) [2025-01-04 11:04:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12356.2, 300 sec: 13023.9). Total num frames: 736133120. Throughput: 0: 3125.7. Samples: 173204566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:43,969][134211] Avg episode reward: [(0, '8.919')] [2025-01-04 11:04:45,184][134294] Updated weights for policy 0, policy_version 179724 (0.0025) [2025-01-04 11:04:48,521][134294] Updated weights for policy 0, policy_version 179734 (0.0027) [2025-01-04 11:04:48,968][134211] Fps is (10 sec: 12289.4, 60 sec: 12356.3, 300 sec: 13023.9). Total num frames: 736194560. Throughput: 0: 3132.4. Samples: 173213618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:48,969][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 11:04:51,724][134294] Updated weights for policy 0, policy_version 179744 (0.0025) [2025-01-04 11:04:53,968][134211] Fps is (10 sec: 12287.4, 60 sec: 12424.4, 300 sec: 12940.5). Total num frames: 736256000. Throughput: 0: 3153.7. Samples: 173232454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:53,969][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 11:04:54,881][134294] Updated weights for policy 0, policy_version 179754 (0.0025) [2025-01-04 11:04:57,377][134294] Updated weights for policy 0, policy_version 179764 (0.0020) [2025-01-04 11:04:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 12765.9, 300 sec: 12954.5). Total num frames: 736333824. Throughput: 0: 3246.4. Samples: 173254214. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:04:58,968][134211] Avg episode reward: [(0, '9.440')] [2025-01-04 11:05:00,667][134294] Updated weights for policy 0, policy_version 179774 (0.0027) [2025-01-04 11:05:03,937][134294] Updated weights for policy 0, policy_version 179784 (0.0029) [2025-01-04 11:05:03,968][134211] Fps is (10 sec: 13927.1, 60 sec: 12765.9, 300 sec: 12968.3). Total num frames: 736395264. Throughput: 0: 3246.3. Samples: 173263596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:03,969][134211] Avg episode reward: [(0, '8.926')] [2025-01-04 11:05:07,350][134294] Updated weights for policy 0, policy_version 179794 (0.0026) [2025-01-04 11:05:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12765.9, 300 sec: 12982.2). Total num frames: 736452608. Throughput: 0: 3247.9. Samples: 173282008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:08,968][134211] Avg episode reward: [(0, '9.056')] [2025-01-04 11:05:10,248][134294] Updated weights for policy 0, policy_version 179804 (0.0021) [2025-01-04 11:05:12,318][134294] Updated weights for policy 0, policy_version 179814 (0.0013) [2025-01-04 11:05:13,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13312.0, 300 sec: 12996.1). Total num frames: 736546816. Throughput: 0: 3356.3. Samples: 173306890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:13,968][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 11:05:14,002][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179822_736550912.pth... 
[2025-01-04 11:05:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179067_733458432.pth [2025-01-04 11:05:14,427][134294] Updated weights for policy 0, policy_version 179824 (0.0012) [2025-01-04 11:05:16,472][134294] Updated weights for policy 0, policy_version 179834 (0.0014) [2025-01-04 11:05:18,968][134211] Fps is (10 sec: 17612.6, 60 sec: 13653.3, 300 sec: 12996.1). Total num frames: 736628736. Throughput: 0: 3402.0. Samples: 173321726. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:18,968][134211] Avg episode reward: [(0, '10.352')] [2025-01-04 11:05:19,692][134294] Updated weights for policy 0, policy_version 179844 (0.0028) [2025-01-04 11:05:23,313][134294] Updated weights for policy 0, policy_version 179854 (0.0028) [2025-01-04 11:05:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13721.6, 300 sec: 13023.9). Total num frames: 736690176. Throughput: 0: 3418.6. Samples: 173340484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:23,968][134211] Avg episode reward: [(0, '10.293')] [2025-01-04 11:05:26,754][134294] Updated weights for policy 0, policy_version 179864 (0.0027) [2025-01-04 11:05:28,969][134211] Fps is (10 sec: 11877.5, 60 sec: 13380.1, 300 sec: 13023.9). Total num frames: 736747520. Throughput: 0: 3404.9. Samples: 173357788. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:28,969][134211] Avg episode reward: [(0, '10.182')] [2025-01-04 11:05:30,476][134294] Updated weights for policy 0, policy_version 179874 (0.0030) [2025-01-04 11:05:33,915][134294] Updated weights for policy 0, policy_version 179884 (0.0027) [2025-01-04 11:05:33,969][134211] Fps is (10 sec: 11467.8, 60 sec: 13175.3, 300 sec: 13010.0). Total num frames: 736804864. Throughput: 0: 3389.9. Samples: 173366166. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:33,969][134211] Avg episode reward: [(0, '8.356')] [2025-01-04 11:05:37,106][134294] Updated weights for policy 0, policy_version 179894 (0.0024) [2025-01-04 11:05:38,968][134211] Fps is (10 sec: 11879.3, 60 sec: 13244.0, 300 sec: 13023.9). Total num frames: 736866304. Throughput: 0: 3387.5. Samples: 173384890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:38,968][134211] Avg episode reward: [(0, '10.026')] [2025-01-04 11:05:40,493][134294] Updated weights for policy 0, policy_version 179904 (0.0025) [2025-01-04 11:05:43,737][134294] Updated weights for policy 0, policy_version 179914 (0.0026) [2025-01-04 11:05:43,968][134211] Fps is (10 sec: 12289.3, 60 sec: 13243.8, 300 sec: 13037.8). Total num frames: 736927744. Throughput: 0: 3315.6. Samples: 173403414. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:43,968][134211] Avg episode reward: [(0, '8.544')] [2025-01-04 11:05:47,048][134294] Updated weights for policy 0, policy_version 179924 (0.0023) [2025-01-04 11:05:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13243.8, 300 sec: 12996.1). Total num frames: 736989184. Throughput: 0: 3315.4. Samples: 173412786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:48,968][134211] Avg episode reward: [(0, '9.508')] [2025-01-04 11:05:50,590][134294] Updated weights for policy 0, policy_version 179934 (0.0028) [2025-01-04 11:05:53,799][134294] Updated weights for policy 0, policy_version 179944 (0.0029) [2025-01-04 11:05:53,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13243.9, 300 sec: 13010.0). Total num frames: 737050624. Throughput: 0: 3308.4. Samples: 173430886. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:53,968][134211] Avg episode reward: [(0, '8.663')] [2025-01-04 11:05:57,126][134294] Updated weights for policy 0, policy_version 179954 (0.0022) [2025-01-04 11:05:58,968][134211] Fps is (10 sec: 12287.9, 60 sec: 12970.6, 300 sec: 13024.1). Total num frames: 737112064. Throughput: 0: 3159.1. Samples: 173449048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:05:58,968][134211] Avg episode reward: [(0, '10.662')] [2025-01-04 11:06:00,529][134294] Updated weights for policy 0, policy_version 179964 (0.0026) [2025-01-04 11:06:02,846][134294] Updated weights for policy 0, policy_version 179974 (0.0014) [2025-01-04 11:06:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.5, 300 sec: 12982.2). Total num frames: 737185792. Throughput: 0: 3052.8. Samples: 173459104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:06:03,969][134211] Avg episode reward: [(0, '9.124')] [2025-01-04 11:06:06,139][134294] Updated weights for policy 0, policy_version 179984 (0.0023) [2025-01-04 11:06:08,970][134211] Fps is (10 sec: 13514.2, 60 sec: 13243.3, 300 sec: 12843.3). Total num frames: 737247232. Throughput: 0: 3097.2. Samples: 173479866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:06:08,970][134211] Avg episode reward: [(0, '9.838')] [2025-01-04 11:06:09,611][134294] Updated weights for policy 0, policy_version 179994 (0.0025) [2025-01-04 11:06:12,157][134294] Updated weights for policy 0, policy_version 180004 (0.0019) [2025-01-04 11:06:13,967][134211] Fps is (10 sec: 14336.4, 60 sec: 13039.0, 300 sec: 12871.2). Total num frames: 737329152. Throughput: 0: 3211.9. Samples: 173502322. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:13,968][134211] Avg episode reward: [(0, '9.326')] [2025-01-04 11:06:14,224][134294] Updated weights for policy 0, policy_version 180014 (0.0013) [2025-01-04 11:06:16,267][134294] Updated weights for policy 0, policy_version 180024 (0.0012) [2025-01-04 11:06:18,968][134211] Fps is (10 sec: 16797.0, 60 sec: 13107.2, 300 sec: 12968.3). Total num frames: 737415168. Throughput: 0: 3358.3. Samples: 173517286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:18,968][134211] Avg episode reward: [(0, '9.004')] [2025-01-04 11:06:19,249][134294] Updated weights for policy 0, policy_version 180034 (0.0023) [2025-01-04 11:06:22,880][134294] Updated weights for policy 0, policy_version 180044 (0.0026) [2025-01-04 11:06:23,968][134211] Fps is (10 sec: 13926.0, 60 sec: 12970.7, 300 sec: 12954.5). Total num frames: 737468416. Throughput: 0: 3365.5. Samples: 173536340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:23,969][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 11:06:26,443][134294] Updated weights for policy 0, policy_version 180054 (0.0026) [2025-01-04 11:06:28,968][134211] Fps is (10 sec: 11058.7, 60 sec: 12970.8, 300 sec: 12940.6). Total num frames: 737525760. Throughput: 0: 3327.9. Samples: 173553170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:28,969][134211] Avg episode reward: [(0, '10.668')] [2025-01-04 11:06:30,192][134294] Updated weights for policy 0, policy_version 180064 (0.0023) [2025-01-04 11:06:33,712][134294] Updated weights for policy 0, policy_version 180074 (0.0028) [2025-01-04 11:06:33,968][134211] Fps is (10 sec: 11469.0, 60 sec: 12970.9, 300 sec: 12926.7). Total num frames: 737583104. Throughput: 0: 3307.9. Samples: 173561642. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:33,968][134211] Avg episode reward: [(0, '9.626')] [2025-01-04 11:06:37,193][134294] Updated weights for policy 0, policy_version 180084 (0.0026) [2025-01-04 11:06:38,968][134211] Fps is (10 sec: 11469.2, 60 sec: 12902.4, 300 sec: 12912.8). Total num frames: 737640448. Throughput: 0: 3294.3. Samples: 173579130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:38,968][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 11:06:40,678][134294] Updated weights for policy 0, policy_version 180094 (0.0024) [2025-01-04 11:06:43,902][134294] Updated weights for policy 0, policy_version 180104 (0.0025) [2025-01-04 11:06:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 12926.7). Total num frames: 737705984. Throughput: 0: 3297.6. Samples: 173597438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:43,968][134211] Avg episode reward: [(0, '9.973')] [2025-01-04 11:06:47,224][134294] Updated weights for policy 0, policy_version 180114 (0.0030) [2025-01-04 11:06:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12902.4, 300 sec: 12912.8). Total num frames: 737763328. Throughput: 0: 3285.3. Samples: 173606942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:48,968][134211] Avg episode reward: [(0, '10.708')] [2025-01-04 11:06:50,454][134294] Updated weights for policy 0, policy_version 180124 (0.0029) [2025-01-04 11:06:52,613][134294] Updated weights for policy 0, policy_version 180134 (0.0014) [2025-01-04 11:06:53,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13380.3, 300 sec: 13023.9). Total num frames: 737853440. Throughput: 0: 3301.2. Samples: 173628412. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:53,968][134211] Avg episode reward: [(0, '8.453')] [2025-01-04 11:06:54,769][134294] Updated weights for policy 0, policy_version 180144 (0.0014) [2025-01-04 11:06:58,047][134294] Updated weights for policy 0, policy_version 180154 (0.0027) [2025-01-04 11:06:58,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13448.5, 300 sec: 13037.8). Total num frames: 737918976. Throughput: 0: 3305.4. Samples: 173651068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:06:58,968][134211] Avg episode reward: [(0, '10.133')] [2025-01-04 11:07:01,594][134294] Updated weights for policy 0, policy_version 180164 (0.0025) [2025-01-04 11:07:03,968][134211] Fps is (10 sec: 12287.7, 60 sec: 13175.5, 300 sec: 13023.9). Total num frames: 737976320. Throughput: 0: 3166.7. Samples: 173659786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:07:03,969][134211] Avg episode reward: [(0, '10.319')] [2025-01-04 11:07:05,071][134294] Updated weights for policy 0, policy_version 180174 (0.0026) [2025-01-04 11:07:08,458][134294] Updated weights for policy 0, policy_version 180184 (0.0026) [2025-01-04 11:07:08,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13175.9, 300 sec: 13037.8). Total num frames: 738037760. Throughput: 0: 3144.8. Samples: 173677854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:07:08,968][134211] Avg episode reward: [(0, '10.937')] [2025-01-04 11:07:11,834][134294] Updated weights for policy 0, policy_version 180194 (0.0027) [2025-01-04 11:07:13,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12834.1, 300 sec: 13037.8). Total num frames: 738099200. Throughput: 0: 3169.1. Samples: 173695778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:07:13,968][134211] Avg episode reward: [(0, '10.727')] [2025-01-04 11:07:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180200_738099200.pth... 
[2025-01-04 11:07:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179439_734982144.pth [2025-01-04 11:07:15,326][134294] Updated weights for policy 0, policy_version 180204 (0.0025) [2025-01-04 11:07:18,166][134294] Updated weights for policy 0, policy_version 180214 (0.0019) [2025-01-04 11:07:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12561.1, 300 sec: 13037.9). Total num frames: 738168832. Throughput: 0: 3176.6. Samples: 173704590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:18,968][134211] Avg episode reward: [(0, '9.428')] [2025-01-04 11:07:20,199][134294] Updated weights for policy 0, policy_version 180224 (0.0012) [2025-01-04 11:07:22,271][134294] Updated weights for policy 0, policy_version 180234 (0.0015) [2025-01-04 11:07:23,968][134211] Fps is (10 sec: 17203.5, 60 sec: 13380.3, 300 sec: 13162.7). Total num frames: 738271232. Throughput: 0: 3409.8. Samples: 173732570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:23,968][134211] Avg episode reward: [(0, '9.253')] [2025-01-04 11:07:24,428][134294] Updated weights for policy 0, policy_version 180244 (0.0017) [2025-01-04 11:07:27,890][134294] Updated weights for policy 0, policy_version 180254 (0.0026) [2025-01-04 11:07:28,968][134211] Fps is (10 sec: 15974.0, 60 sec: 13380.3, 300 sec: 13037.8). Total num frames: 738328576. Throughput: 0: 3483.7. Samples: 173754206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:28,969][134211] Avg episode reward: [(0, '9.922')] [2025-01-04 11:07:31,548][134294] Updated weights for policy 0, policy_version 180264 (0.0028) [2025-01-04 11:07:33,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13448.5, 300 sec: 13051.7). Total num frames: 738390016. Throughput: 0: 3462.3. Samples: 173762746. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:33,968][134211] Avg episode reward: [(0, '10.250')] [2025-01-04 11:07:34,999][134294] Updated weights for policy 0, policy_version 180274 (0.0024) [2025-01-04 11:07:38,436][134294] Updated weights for policy 0, policy_version 180284 (0.0028) [2025-01-04 11:07:38,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13448.6, 300 sec: 13051.7). Total num frames: 738447360. Throughput: 0: 3374.2. Samples: 173780250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:38,968][134211] Avg episode reward: [(0, '9.222')] [2025-01-04 11:07:41,912][134294] Updated weights for policy 0, policy_version 180294 (0.0029) [2025-01-04 11:07:43,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13380.3, 300 sec: 13051.7). Total num frames: 738508800. Throughput: 0: 3268.0. Samples: 173798130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:43,968][134211] Avg episode reward: [(0, '9.778')] [2025-01-04 11:07:45,334][134294] Updated weights for policy 0, policy_version 180304 (0.0028) [2025-01-04 11:07:48,586][134294] Updated weights for policy 0, policy_version 180314 (0.0023) [2025-01-04 11:07:48,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13448.5, 300 sec: 13065.5). Total num frames: 738570240. Throughput: 0: 3276.2. Samples: 173807216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:48,968][134211] Avg episode reward: [(0, '8.258')] [2025-01-04 11:07:51,754][134294] Updated weights for policy 0, policy_version 180324 (0.0024) [2025-01-04 11:07:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12970.6, 300 sec: 13065.5). Total num frames: 738631680. Throughput: 0: 3290.9. Samples: 173825946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:53,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 11:07:55,415][134294] Updated weights for policy 0, policy_version 180334 (0.0027) [2025-01-04 11:07:58,872][134294] Updated weights for policy 0, policy_version 180344 (0.0027) [2025-01-04 11:07:58,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12834.2, 300 sec: 13065.6). Total num frames: 738689024. Throughput: 0: 3277.9. Samples: 173843284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:07:58,968][134211] Avg episode reward: [(0, '9.342')] [2025-01-04 11:08:01,298][134294] Updated weights for policy 0, policy_version 180354 (0.0016) [2025-01-04 11:08:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 13023.9). Total num frames: 738762752. Throughput: 0: 3347.9. Samples: 173855246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:03,969][134211] Avg episode reward: [(0, '10.206')] [2025-01-04 11:08:04,583][134294] Updated weights for policy 0, policy_version 180364 (0.0023) [2025-01-04 11:08:07,998][134294] Updated weights for policy 0, policy_version 180374 (0.0032) [2025-01-04 11:08:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13038.9, 300 sec: 13010.0). Total num frames: 738820096. Throughput: 0: 3135.8. Samples: 173873680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:08,969][134211] Avg episode reward: [(0, '8.892')] [2025-01-04 11:08:11,359][134294] Updated weights for policy 0, policy_version 180384 (0.0023) [2025-01-04 11:08:13,574][134294] Updated weights for policy 0, policy_version 180394 (0.0016) [2025-01-04 11:08:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13312.0, 300 sec: 13079.5). Total num frames: 738897920. Throughput: 0: 3130.5. Samples: 173895078. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:13,972][134211] Avg episode reward: [(0, '9.490')] [2025-01-04 11:08:16,883][134294] Updated weights for policy 0, policy_version 180404 (0.0024) [2025-01-04 11:08:18,971][134211] Fps is (10 sec: 13512.7, 60 sec: 13106.5, 300 sec: 13065.4). Total num frames: 738955264. Throughput: 0: 3156.4. Samples: 173904794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:18,971][134211] Avg episode reward: [(0, '8.513')] [2025-01-04 11:08:20,399][134294] Updated weights for policy 0, policy_version 180414 (0.0030) [2025-01-04 11:08:23,937][134294] Updated weights for policy 0, policy_version 180424 (0.0026) [2025-01-04 11:08:23,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12424.5, 300 sec: 13065.5). Total num frames: 739016704. Throughput: 0: 3156.9. Samples: 173922312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:23,968][134211] Avg episode reward: [(0, '9.199')] [2025-01-04 11:08:27,145][134294] Updated weights for policy 0, policy_version 180434 (0.0022) [2025-01-04 11:08:28,968][134211] Fps is (10 sec: 13521.2, 60 sec: 12697.6, 300 sec: 13107.2). Total num frames: 739090432. Throughput: 0: 3207.6. Samples: 173942470. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:28,968][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 11:08:29,287][134294] Updated weights for policy 0, policy_version 180444 (0.0015) [2025-01-04 11:08:32,025][134294] Updated weights for policy 0, policy_version 180454 (0.0019) [2025-01-04 11:08:33,969][134211] Fps is (10 sec: 13924.5, 60 sec: 12765.6, 300 sec: 13065.5). Total num frames: 739155968. Throughput: 0: 3290.7. Samples: 173955300. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:33,970][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 11:08:35,908][134294] Updated weights for policy 0, policy_version 180464 (0.0030) [2025-01-04 11:08:38,968][134211] Fps is (10 sec: 12287.8, 60 sec: 12765.9, 300 sec: 12954.5). Total num frames: 739213312. Throughput: 0: 3247.9. Samples: 173972100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:38,968][134211] Avg episode reward: [(0, '8.923')] [2025-01-04 11:08:39,337][134294] Updated weights for policy 0, policy_version 180474 (0.0026) [2025-01-04 11:08:42,880][134294] Updated weights for policy 0, policy_version 180484 (0.0025) [2025-01-04 11:08:43,969][134211] Fps is (10 sec: 11879.0, 60 sec: 12765.7, 300 sec: 12954.4). Total num frames: 739274752. Throughput: 0: 3252.2. Samples: 173989634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:43,969][134211] Avg episode reward: [(0, '9.168')] [2025-01-04 11:08:46,076][134294] Updated weights for policy 0, policy_version 180494 (0.0024) [2025-01-04 11:08:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12765.9, 300 sec: 12968.4). Total num frames: 739336192. Throughput: 0: 3199.5. Samples: 173999224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:48,968][134211] Avg episode reward: [(0, '9.784')] [2025-01-04 11:08:49,500][134294] Updated weights for policy 0, policy_version 180504 (0.0027) [2025-01-04 11:08:52,180][134294] Updated weights for policy 0, policy_version 180514 (0.0016) [2025-01-04 11:08:53,968][134211] Fps is (10 sec: 13927.5, 60 sec: 13038.9, 300 sec: 13037.8). Total num frames: 739414016. Throughput: 0: 3244.8. Samples: 174019698. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:53,968][134211] Avg episode reward: [(0, '9.376')] [2025-01-04 11:08:54,743][134294] Updated weights for policy 0, policy_version 180524 (0.0020) [2025-01-04 11:08:58,171][134294] Updated weights for policy 0, policy_version 180534 (0.0025) [2025-01-04 11:08:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13107.2, 300 sec: 13037.8). Total num frames: 739475456. Throughput: 0: 3222.4. Samples: 174040086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:08:58,968][134211] Avg episode reward: [(0, '9.333')] [2025-01-04 11:09:01,586][134294] Updated weights for policy 0, policy_version 180544 (0.0025) [2025-01-04 11:09:03,968][134211] Fps is (10 sec: 11878.3, 60 sec: 12834.1, 300 sec: 13037.8). Total num frames: 739532800. Throughput: 0: 3205.1. Samples: 174049014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:09:03,968][134211] Avg episode reward: [(0, '8.232')] [2025-01-04 11:09:05,054][134294] Updated weights for policy 0, policy_version 180554 (0.0032) [2025-01-04 11:09:08,401][134294] Updated weights for policy 0, policy_version 180564 (0.0027) [2025-01-04 11:09:08,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12902.4, 300 sec: 13037.8). Total num frames: 739594240. Throughput: 0: 3222.3. Samples: 174067314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:09:08,968][134211] Avg episode reward: [(0, '10.683')] [2025-01-04 11:09:10,900][134294] Updated weights for policy 0, policy_version 180574 (0.0015) [2025-01-04 11:09:12,915][134294] Updated weights for policy 0, policy_version 180584 (0.0015) [2025-01-04 11:09:13,968][134211] Fps is (10 sec: 15974.6, 60 sec: 13243.7, 300 sec: 13162.7). Total num frames: 739692544. Throughput: 0: 3333.9. Samples: 174092494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:09:13,968][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 11:09:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180589_739692544.pth... [2025-01-04 11:09:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000179822_736550912.pth [2025-01-04 11:09:14,969][134294] Updated weights for policy 0, policy_version 180594 (0.0014) [2025-01-04 11:09:17,401][134294] Updated weights for policy 0, policy_version 180604 (0.0017) [2025-01-04 11:09:18,968][134211] Fps is (10 sec: 17612.7, 60 sec: 13585.8, 300 sec: 13232.2). Total num frames: 739770368. Throughput: 0: 3376.8. Samples: 174107252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:09:18,968][134211] Avg episode reward: [(0, '9.445')] [2025-01-04 11:09:20,921][134294] Updated weights for policy 0, policy_version 180614 (0.0026) [2025-01-04 11:09:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.8, 300 sec: 13162.7). Total num frames: 739827712. Throughput: 0: 3412.7. Samples: 174125670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:09:23,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 11:09:24,302][134294] Updated weights for policy 0, policy_version 180624 (0.0031) [2025-01-04 11:09:27,828][134294] Updated weights for policy 0, policy_version 180634 (0.0024) [2025-01-04 11:09:28,971][134211] Fps is (10 sec: 11874.0, 60 sec: 13311.2, 300 sec: 13134.8). Total num frames: 739889152. Throughput: 0: 3416.8. Samples: 174143398. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:28,972][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 11:09:31,350][134294] Updated weights for policy 0, policy_version 180644 (0.0027) [2025-01-04 11:09:33,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13175.8, 300 sec: 13135.0). Total num frames: 739946496. 
Throughput: 0: 3399.1. Samples: 174152182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:33,968][134211] Avg episode reward: [(0, '9.404')] [2025-01-04 11:09:34,768][134294] Updated weights for policy 0, policy_version 180654 (0.0030) [2025-01-04 11:09:38,095][134294] Updated weights for policy 0, policy_version 180664 (0.0027) [2025-01-04 11:09:38,968][134211] Fps is (10 sec: 11882.6, 60 sec: 13243.7, 300 sec: 13135.0). Total num frames: 740007936. Throughput: 0: 3350.9. Samples: 174170490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:38,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 11:09:41,361][134294] Updated weights for policy 0, policy_version 180674 (0.0027) [2025-01-04 11:09:43,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13243.9, 300 sec: 13135.0). Total num frames: 740069376. Throughput: 0: 3304.2. Samples: 174188774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:43,968][134211] Avg episode reward: [(0, '10.678')] [2025-01-04 11:09:44,762][134294] Updated weights for policy 0, policy_version 180684 (0.0027) [2025-01-04 11:09:48,113][134294] Updated weights for policy 0, policy_version 180694 (0.0028) [2025-01-04 11:09:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13243.7, 300 sec: 13135.0). Total num frames: 740130816. Throughput: 0: 3311.2. Samples: 174198016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:48,968][134211] Avg episode reward: [(0, '8.536')] [2025-01-04 11:09:51,214][134294] Updated weights for policy 0, policy_version 180704 (0.0026) [2025-01-04 11:09:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12970.7, 300 sec: 13079.4). Total num frames: 740192256. Throughput: 0: 3319.0. Samples: 174216668. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:53,968][134211] Avg episode reward: [(0, '10.237')] [2025-01-04 11:09:54,798][134294] Updated weights for policy 0, policy_version 180714 (0.0026) [2025-01-04 11:09:58,192][134294] Updated weights for policy 0, policy_version 180724 (0.0026) [2025-01-04 11:09:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 12970.7, 300 sec: 13079.4). Total num frames: 740253696. Throughput: 0: 3153.5. Samples: 174234402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:09:58,968][134211] Avg episode reward: [(0, '8.410')] [2025-01-04 11:10:01,304][134294] Updated weights for policy 0, policy_version 180734 (0.0019) [2025-01-04 11:10:03,517][134294] Updated weights for policy 0, policy_version 180744 (0.0013) [2025-01-04 11:10:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13312.1, 300 sec: 13148.9). Total num frames: 740331520. Throughput: 0: 3045.4. Samples: 174244294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:03,968][134211] Avg episode reward: [(0, '10.179')] [2025-01-04 11:10:05,612][134294] Updated weights for policy 0, policy_version 180754 (0.0012) [2025-01-04 11:10:07,674][134294] Updated weights for policy 0, policy_version 180764 (0.0014) [2025-01-04 11:10:08,968][134211] Fps is (10 sec: 18022.5, 60 sec: 13994.7, 300 sec: 13176.6). Total num frames: 740433920. Throughput: 0: 3278.3. Samples: 174273194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:08,968][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 11:10:09,937][134294] Updated weights for policy 0, policy_version 180774 (0.0017) [2025-01-04 11:10:13,549][134294] Updated weights for policy 0, policy_version 180784 (0.0032) [2025-01-04 11:10:13,968][134211] Fps is (10 sec: 15974.2, 60 sec: 13312.0, 300 sec: 13093.3). Total num frames: 740491264. Throughput: 0: 3366.5. Samples: 174294880. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:13,968][134211] Avg episode reward: [(0, '9.680')] [2025-01-04 11:10:16,968][134294] Updated weights for policy 0, policy_version 180794 (0.0026) [2025-01-04 11:10:18,968][134211] Fps is (10 sec: 11877.4, 60 sec: 13038.8, 300 sec: 13093.3). Total num frames: 740552704. Throughput: 0: 3370.2. Samples: 174303842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:18,969][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 11:10:20,398][134294] Updated weights for policy 0, policy_version 180804 (0.0030) [2025-01-04 11:10:23,960][134294] Updated weights for policy 0, policy_version 180814 (0.0026) [2025-01-04 11:10:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13107.2, 300 sec: 13107.2). Total num frames: 740614144. Throughput: 0: 3364.3. Samples: 174321884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:23,969][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 11:10:27,469][134294] Updated weights for policy 0, policy_version 180824 (0.0023) [2025-01-04 11:10:28,968][134211] Fps is (10 sec: 11879.3, 60 sec: 13039.7, 300 sec: 13107.2). Total num frames: 740671488. Throughput: 0: 3341.4. Samples: 174339138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:10:28,968][134211] Avg episode reward: [(0, '9.069')] [2025-01-04 11:10:30,853][134294] Updated weights for policy 0, policy_version 180834 (0.0028) [2025-01-04 11:10:33,968][134211] Fps is (10 sec: 11469.0, 60 sec: 13038.9, 300 sec: 13093.3). Total num frames: 740728832. Throughput: 0: 3335.8. Samples: 174348126. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:33,968][134211] Avg episode reward: [(0, '10.187')] [2025-01-04 11:10:34,394][134294] Updated weights for policy 0, policy_version 180844 (0.0030) [2025-01-04 11:10:37,745][134294] Updated weights for policy 0, policy_version 180854 (0.0024) [2025-01-04 11:10:38,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13038.9, 300 sec: 13093.3). Total num frames: 740790272. Throughput: 0: 3321.0. Samples: 174366112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:38,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 11:10:41,136][134294] Updated weights for policy 0, policy_version 180864 (0.0026) [2025-01-04 11:10:43,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13038.9, 300 sec: 13093.3). Total num frames: 740851712. Throughput: 0: 3322.7. Samples: 174383926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:43,968][134211] Avg episode reward: [(0, '8.579')] [2025-01-04 11:10:44,618][134294] Updated weights for policy 0, policy_version 180874 (0.0024) [2025-01-04 11:10:47,893][134294] Updated weights for policy 0, policy_version 180884 (0.0024) [2025-01-04 11:10:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13039.0, 300 sec: 13093.3). Total num frames: 740913152. Throughput: 0: 3306.8. Samples: 174393102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:48,968][134211] Avg episode reward: [(0, '9.264')] [2025-01-04 11:10:51,106][134294] Updated weights for policy 0, policy_version 180894 (0.0026) [2025-01-04 11:10:53,405][134294] Updated weights for policy 0, policy_version 180904 (0.0014) [2025-01-04 11:10:53,967][134211] Fps is (10 sec: 13926.8, 60 sec: 13312.0, 300 sec: 13148.9). Total num frames: 740990976. Throughput: 0: 3104.2. Samples: 174412882. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:53,968][134211] Avg episode reward: [(0, '9.683')] [2025-01-04 11:10:55,608][134294] Updated weights for policy 0, policy_version 180914 (0.0014) [2025-01-04 11:10:58,824][134294] Updated weights for policy 0, policy_version 180924 (0.0027) [2025-01-04 11:10:58,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13516.8, 300 sec: 13148.9). Total num frames: 741064704. Throughput: 0: 3170.7. Samples: 174437560. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:10:58,968][134211] Avg episode reward: [(0, '10.597')] [2025-01-04 11:11:02,412][134294] Updated weights for policy 0, policy_version 180934 (0.0028) [2025-01-04 11:11:03,968][134211] Fps is (10 sec: 13106.4, 60 sec: 13175.4, 300 sec: 13135.0). Total num frames: 741122048. Throughput: 0: 3163.4. Samples: 174446194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:03,969][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 11:11:05,927][134294] Updated weights for policy 0, policy_version 180944 (0.0026) [2025-01-04 11:11:08,968][134211] Fps is (10 sec: 11468.7, 60 sec: 12424.5, 300 sec: 13051.6). Total num frames: 741179392. Throughput: 0: 3150.9. Samples: 174463674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:08,968][134211] Avg episode reward: [(0, '9.405')] [2025-01-04 11:11:09,185][134294] Updated weights for policy 0, policy_version 180954 (0.0027) [2025-01-04 11:11:12,610][134294] Updated weights for policy 0, policy_version 180964 (0.0028) [2025-01-04 11:11:13,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12492.7, 300 sec: 12968.3). Total num frames: 741240832. Throughput: 0: 3173.0. Samples: 174481926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:13,969][134211] Avg episode reward: [(0, '9.910')] [2025-01-04 11:11:14,023][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180968_741244928.pth... 
[2025-01-04 11:11:14,098][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180200_738099200.pth [2025-01-04 11:11:16,033][134294] Updated weights for policy 0, policy_version 180974 (0.0027) [2025-01-04 11:11:18,970][134211] Fps is (10 sec: 12695.0, 60 sec: 12560.7, 300 sec: 13009.9). Total num frames: 741306368. Throughput: 0: 3172.1. Samples: 174490878. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:18,971][134211] Avg episode reward: [(0, '9.635')] [2025-01-04 11:11:19,072][134294] Updated weights for policy 0, policy_version 180984 (0.0024) [2025-01-04 11:11:22,086][134294] Updated weights for policy 0, policy_version 180994 (0.0024) [2025-01-04 11:11:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 12629.3, 300 sec: 13037.8). Total num frames: 741371904. Throughput: 0: 3227.6. Samples: 174511352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:23,968][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 11:11:25,205][134294] Updated weights for policy 0, policy_version 181004 (0.0027) [2025-01-04 11:11:28,600][134294] Updated weights for policy 0, policy_version 181014 (0.0022) [2025-01-04 11:11:28,968][134211] Fps is (10 sec: 13110.0, 60 sec: 12765.8, 300 sec: 13065.5). Total num frames: 741437440. Throughput: 0: 3258.0. Samples: 174530536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:28,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 11:11:31,718][134294] Updated weights for policy 0, policy_version 181024 (0.0021) [2025-01-04 11:11:33,821][134294] Updated weights for policy 0, policy_version 181034 (0.0014) [2025-01-04 11:11:33,967][134211] Fps is (10 sec: 14746.1, 60 sec: 13175.5, 300 sec: 13148.9). Total num frames: 741519360. Throughput: 0: 3254.5. Samples: 174539556. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:33,968][134211] Avg episode reward: [(0, '10.282')] [2025-01-04 11:11:35,800][134294] Updated weights for policy 0, policy_version 181044 (0.0013) [2025-01-04 11:11:38,154][134294] Updated weights for policy 0, policy_version 181054 (0.0016) [2025-01-04 11:11:38,969][134211] Fps is (10 sec: 16792.0, 60 sec: 13584.8, 300 sec: 13218.2). Total num frames: 741605376. Throughput: 0: 3472.4. Samples: 174569144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:38,969][134211] Avg episode reward: [(0, '10.094')] [2025-01-04 11:11:41,671][134294] Updated weights for policy 0, policy_version 181064 (0.0027) [2025-01-04 11:11:43,968][134211] Fps is (10 sec: 14335.5, 60 sec: 13516.8, 300 sec: 13218.3). Total num frames: 741662720. Throughput: 0: 3316.5. Samples: 174586802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:43,970][134211] Avg episode reward: [(0, '10.389')] [2025-01-04 11:11:45,216][134294] Updated weights for policy 0, policy_version 181074 (0.0029) [2025-01-04 11:11:48,639][134294] Updated weights for policy 0, policy_version 181084 (0.0025) [2025-01-04 11:11:48,968][134211] Fps is (10 sec: 11879.0, 60 sec: 13516.7, 300 sec: 13121.1). Total num frames: 741724160. Throughput: 0: 3323.9. Samples: 174595770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:48,969][134211] Avg episode reward: [(0, '10.218')] [2025-01-04 11:11:51,654][134294] Updated weights for policy 0, policy_version 181094 (0.0025) [2025-01-04 11:11:53,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13243.7, 300 sec: 13107.2). Total num frames: 741785600. Throughput: 0: 3359.9. Samples: 174614868. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:53,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 11:11:55,197][134294] Updated weights for policy 0, policy_version 181104 (0.0028) [2025-01-04 11:11:58,230][134294] Updated weights for policy 0, policy_version 181114 (0.0026) [2025-01-04 11:11:58,968][134211] Fps is (10 sec: 12288.5, 60 sec: 13038.9, 300 sec: 13121.1). Total num frames: 741847040. Throughput: 0: 3370.0. Samples: 174633576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:11:58,968][134211] Avg episode reward: [(0, '9.945')] [2025-01-04 11:12:01,543][134294] Updated weights for policy 0, policy_version 181124 (0.0028) [2025-01-04 11:12:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13175.5, 300 sec: 13135.0). Total num frames: 741912576. Throughput: 0: 3380.8. Samples: 174643006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:03,968][134211] Avg episode reward: [(0, '9.248')] [2025-01-04 11:12:04,760][134294] Updated weights for policy 0, policy_version 181134 (0.0023) [2025-01-04 11:12:07,725][134294] Updated weights for policy 0, policy_version 181144 (0.0024) [2025-01-04 11:12:08,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13312.1, 300 sec: 13148.9). Total num frames: 741978112. Throughput: 0: 3363.7. Samples: 174662718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:08,968][134211] Avg episode reward: [(0, '9.036')] [2025-01-04 11:12:11,162][134294] Updated weights for policy 0, policy_version 181154 (0.0025) [2025-01-04 11:12:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13312.0, 300 sec: 13121.1). Total num frames: 742039552. Throughput: 0: 3342.3. Samples: 174680940. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:13,968][134211] Avg episode reward: [(0, '8.283')] [2025-01-04 11:12:14,493][134294] Updated weights for policy 0, policy_version 181164 (0.0027) [2025-01-04 11:12:17,506][134294] Updated weights for policy 0, policy_version 181174 (0.0022) [2025-01-04 11:12:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13380.8, 300 sec: 13010.0). Total num frames: 742109184. Throughput: 0: 3359.9. Samples: 174690754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:18,968][134211] Avg episode reward: [(0, '9.098')] [2025-01-04 11:12:19,691][134294] Updated weights for policy 0, policy_version 181184 (0.0016) [2025-01-04 11:12:22,206][134294] Updated weights for policy 0, policy_version 181194 (0.0018) [2025-01-04 11:12:23,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13585.1, 300 sec: 13079.4). Total num frames: 742187008. Throughput: 0: 3258.1. Samples: 174715754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:23,968][134211] Avg episode reward: [(0, '9.371')] [2025-01-04 11:12:25,663][134294] Updated weights for policy 0, policy_version 181204 (0.0027) [2025-01-04 11:12:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13516.8, 300 sec: 13079.4). Total num frames: 742248448. Throughput: 0: 3257.1. Samples: 174733370. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:28,968][134211] Avg episode reward: [(0, '9.854')] [2025-01-04 11:12:29,205][134294] Updated weights for policy 0, policy_version 181214 (0.0026) [2025-01-04 11:12:32,674][134294] Updated weights for policy 0, policy_version 181224 (0.0025) [2025-01-04 11:12:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13175.4, 300 sec: 13093.3). Total num frames: 742309888. Throughput: 0: 3252.3. Samples: 174742122. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:33,968][134211] Avg episode reward: [(0, '8.964')] [2025-01-04 11:12:34,985][134294] Updated weights for policy 0, policy_version 181234 (0.0016) [2025-01-04 11:12:36,915][134294] Updated weights for policy 0, policy_version 181244 (0.0014) [2025-01-04 11:12:38,773][134294] Updated weights for policy 0, policy_version 181254 (0.0015) [2025-01-04 11:12:38,967][134211] Fps is (10 sec: 17203.4, 60 sec: 13585.4, 300 sec: 13259.9). Total num frames: 742420480. Throughput: 0: 3416.6. Samples: 174768612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:38,968][134211] Avg episode reward: [(0, '8.565')] [2025-01-04 11:12:40,689][134294] Updated weights for policy 0, policy_version 181264 (0.0014) [2025-01-04 11:12:42,634][134294] Updated weights for policy 0, policy_version 181274 (0.0013) [2025-01-04 11:12:43,968][134211] Fps is (10 sec: 21708.4, 60 sec: 14404.3, 300 sec: 13412.7). Total num frames: 742526976. Throughput: 0: 3721.5. Samples: 174801044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:12:43,968][134211] Avg episode reward: [(0, '9.596')] [2025-01-04 11:12:44,460][134294] Updated weights for policy 0, policy_version 181284 (0.0013) [2025-01-04 11:12:47,238][134294] Updated weights for policy 0, policy_version 181294 (0.0024) [2025-01-04 11:12:48,968][134211] Fps is (10 sec: 18021.2, 60 sec: 14609.1, 300 sec: 13454.3). Total num frames: 742600704. Throughput: 0: 3820.7. Samples: 174814938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:12:48,969][134211] Avg episode reward: [(0, '9.306')] [2025-01-04 11:12:50,459][134294] Updated weights for policy 0, policy_version 181304 (0.0029) [2025-01-04 11:12:53,487][134294] Updated weights for policy 0, policy_version 181314 (0.0026) [2025-01-04 11:12:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 13482.1). Total num frames: 742666240. Throughput: 0: 3815.0. Samples: 174834394. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:12:53,969][134211] Avg episode reward: [(0, '8.957')] [2025-01-04 11:12:56,565][134294] Updated weights for policy 0, policy_version 181324 (0.0027) [2025-01-04 11:12:58,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14745.6, 300 sec: 13454.3). Total num frames: 742731776. Throughput: 0: 3840.9. Samples: 174853778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:12:58,968][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 11:12:59,974][134294] Updated weights for policy 0, policy_version 181334 (0.0029) [2025-01-04 11:13:03,513][134294] Updated weights for policy 0, policy_version 181344 (0.0029) [2025-01-04 11:13:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.1, 300 sec: 13454.3). Total num frames: 742789120. Throughput: 0: 3807.7. Samples: 174862102. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:03,968][134211] Avg episode reward: [(0, '9.183')] [2025-01-04 11:13:06,726][134294] Updated weights for policy 0, policy_version 181354 (0.0025) [2025-01-04 11:13:08,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.0, 300 sec: 13412.7). Total num frames: 742854656. Throughput: 0: 3680.2. Samples: 174881362. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:08,968][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 11:13:09,772][134294] Updated weights for policy 0, policy_version 181364 (0.0026) [2025-01-04 11:13:13,100][134294] Updated weights for policy 0, policy_version 181374 (0.0029) [2025-01-04 11:13:13,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14609.1, 300 sec: 13426.7). Total num frames: 742916096. Throughput: 0: 3708.4. Samples: 174900248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:13,968][134211] Avg episode reward: [(0, '8.440')] [2025-01-04 11:13:14,024][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000181377_742920192.pth... 
[2025-01-04 11:13:14,089][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180589_739692544.pth [2025-01-04 11:13:16,236][134294] Updated weights for policy 0, policy_version 181384 (0.0024) [2025-01-04 11:13:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 13454.3). Total num frames: 742985728. Throughput: 0: 3736.4. Samples: 174910262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:18,968][134211] Avg episode reward: [(0, '8.953')] [2025-01-04 11:13:19,185][134294] Updated weights for policy 0, policy_version 181394 (0.0024) [2025-01-04 11:13:22,267][134294] Updated weights for policy 0, policy_version 181404 (0.0022) [2025-01-04 11:13:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14404.3, 300 sec: 13426.5). Total num frames: 743051264. Throughput: 0: 3602.5. Samples: 174930724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:23,968][134211] Avg episode reward: [(0, '10.329')] [2025-01-04 11:13:25,456][134294] Updated weights for policy 0, policy_version 181414 (0.0026) [2025-01-04 11:13:28,754][134294] Updated weights for policy 0, policy_version 181424 (0.0029) [2025-01-04 11:13:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14404.3, 300 sec: 13412.7). Total num frames: 743112704. Throughput: 0: 3305.1. Samples: 174949774. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:28,968][134211] Avg episode reward: [(0, '8.034')] [2025-01-04 11:13:31,975][134294] Updated weights for policy 0, policy_version 181434 (0.0028) [2025-01-04 11:13:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14404.2, 300 sec: 13426.6). Total num frames: 743174144. Throughput: 0: 3204.7. Samples: 174959146. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:33,968][134211] Avg episode reward: [(0, '9.503')] [2025-01-04 11:13:35,238][134294] Updated weights for policy 0, policy_version 181444 (0.0027) [2025-01-04 11:13:38,111][134294] Updated weights for policy 0, policy_version 181454 (0.0022) [2025-01-04 11:13:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13721.5, 300 sec: 13454.4). Total num frames: 743243776. Throughput: 0: 3208.9. Samples: 174978792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:38,968][134211] Avg episode reward: [(0, '9.722')] [2025-01-04 11:13:41,233][134294] Updated weights for policy 0, policy_version 181464 (0.0023) [2025-01-04 11:13:43,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13038.9, 300 sec: 13468.2). Total num frames: 743309312. Throughput: 0: 3217.1. Samples: 174998550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:43,971][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 11:13:44,364][134294] Updated weights for policy 0, policy_version 181474 (0.0028) [2025-01-04 11:13:47,332][134294] Updated weights for policy 0, policy_version 181484 (0.0023) [2025-01-04 11:13:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 12970.8, 300 sec: 13440.4). Total num frames: 743378944. Throughput: 0: 3260.4. Samples: 175008820. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:48,968][134211] Avg episode reward: [(0, '9.829')] [2025-01-04 11:13:50,278][134294] Updated weights for policy 0, policy_version 181494 (0.0024) [2025-01-04 11:13:53,253][134294] Updated weights for policy 0, policy_version 181504 (0.0025) [2025-01-04 11:13:53,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13039.0, 300 sec: 13468.2). Total num frames: 743448576. Throughput: 0: 3299.1. Samples: 175029822. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:53,968][134211] Avg episode reward: [(0, '10.065')] [2025-01-04 11:13:56,551][134294] Updated weights for policy 0, policy_version 181514 (0.0028) [2025-01-04 11:13:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12902.4, 300 sec: 13468.2). Total num frames: 743505920. Throughput: 0: 3288.1. Samples: 175048210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:13:58,968][134211] Avg episode reward: [(0, '9.149')] [2025-01-04 11:14:00,127][134294] Updated weights for policy 0, policy_version 181524 (0.0028) [2025-01-04 11:14:03,520][134294] Updated weights for policy 0, policy_version 181534 (0.0026) [2025-01-04 11:14:03,968][134211] Fps is (10 sec: 11877.8, 60 sec: 12970.6, 300 sec: 13468.2). Total num frames: 743567360. Throughput: 0: 3258.9. Samples: 175056916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:03,969][134211] Avg episode reward: [(0, '8.451')] [2025-01-04 11:14:06,709][134294] Updated weights for policy 0, policy_version 181544 (0.0024) [2025-01-04 11:14:08,967][134211] Fps is (10 sec: 13517.0, 60 sec: 13107.2, 300 sec: 13384.9). Total num frames: 743641088. Throughput: 0: 3223.0. Samples: 175075758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:08,968][134211] Avg episode reward: [(0, '8.948')] [2025-01-04 11:14:09,062][134294] Updated weights for policy 0, policy_version 181554 (0.0014) [2025-01-04 11:14:10,988][134294] Updated weights for policy 0, policy_version 181564 (0.0013) [2025-01-04 11:14:12,963][134294] Updated weights for policy 0, policy_version 181574 (0.0012) [2025-01-04 11:14:13,968][134211] Fps is (10 sec: 18022.2, 60 sec: 13858.0, 300 sec: 13482.1). Total num frames: 743747584. Throughput: 0: 3477.1. Samples: 175106248. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:13,969][134211] Avg episode reward: [(0, '9.830')] [2025-01-04 11:14:14,846][134294] Updated weights for policy 0, policy_version 181584 (0.0014) [2025-01-04 11:14:17,783][134294] Updated weights for policy 0, policy_version 181594 (0.0027) [2025-01-04 11:14:18,968][134211] Fps is (10 sec: 18021.8, 60 sec: 13926.4, 300 sec: 13537.6). Total num frames: 743821312. Throughput: 0: 3584.3. Samples: 175120438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:18,968][134211] Avg episode reward: [(0, '8.888')] [2025-01-04 11:14:20,887][134294] Updated weights for policy 0, policy_version 181604 (0.0027) [2025-01-04 11:14:23,968][134211] Fps is (10 sec: 13927.0, 60 sec: 13926.3, 300 sec: 13551.7). Total num frames: 743886848. Throughput: 0: 3583.1. Samples: 175140030. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:23,969][134211] Avg episode reward: [(0, '9.255')] [2025-01-04 11:14:24,174][134294] Updated weights for policy 0, policy_version 181614 (0.0027) [2025-01-04 11:14:27,816][134294] Updated weights for policy 0, policy_version 181624 (0.0030) [2025-01-04 11:14:28,969][134211] Fps is (10 sec: 12286.5, 60 sec: 13857.8, 300 sec: 13551.5). Total num frames: 743944192. Throughput: 0: 3523.2. Samples: 175157096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:28,970][134211] Avg episode reward: [(0, '10.270')] [2025-01-04 11:14:31,297][134294] Updated weights for policy 0, policy_version 181634 (0.0026) [2025-01-04 11:14:33,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13789.8, 300 sec: 13537.6). Total num frames: 744001536. Throughput: 0: 3492.5. Samples: 175165982. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:33,968][134211] Avg episode reward: [(0, '9.568')] [2025-01-04 11:14:34,811][134294] Updated weights for policy 0, policy_version 181644 (0.0027) [2025-01-04 11:14:37,801][134294] Updated weights for policy 0, policy_version 181654 (0.0026) [2025-01-04 11:14:38,968][134211] Fps is (10 sec: 12289.7, 60 sec: 13721.6, 300 sec: 13551.5). Total num frames: 744067072. Throughput: 0: 3446.3. Samples: 175184904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:38,968][134211] Avg episode reward: [(0, '8.984')] [2025-01-04 11:14:40,789][134294] Updated weights for policy 0, policy_version 181664 (0.0023) [2025-01-04 11:14:43,830][134294] Updated weights for policy 0, policy_version 181674 (0.0025) [2025-01-04 11:14:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.9, 300 sec: 13579.3). Total num frames: 744136704. Throughput: 0: 3490.6. Samples: 175205286. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:43,968][134211] Avg episode reward: [(0, '8.820')] [2025-01-04 11:14:46,751][134294] Updated weights for policy 0, policy_version 181684 (0.0022) [2025-01-04 11:14:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13789.8, 300 sec: 13607.0). Total num frames: 744206336. Throughput: 0: 3525.5. Samples: 175215560. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:48,968][134211] Avg episode reward: [(0, '8.785')] [2025-01-04 11:14:49,809][134294] Updated weights for policy 0, policy_version 181694 (0.0024) [2025-01-04 11:14:52,796][134294] Updated weights for policy 0, policy_version 181704 (0.0026) [2025-01-04 11:14:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13620.9). Total num frames: 744271872. Throughput: 0: 3560.2. Samples: 175235970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:14:53,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 11:14:55,726][134294] Updated weights for policy 0, policy_version 181714 (0.0024) [2025-01-04 11:14:58,799][134294] Updated weights for policy 0, policy_version 181724 (0.0027) [2025-01-04 11:14:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13926.4, 300 sec: 13593.2). Total num frames: 744341504. Throughput: 0: 3340.1. Samples: 175256548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:14:58,968][134211] Avg episode reward: [(0, '10.599')] [2025-01-04 11:15:02,042][134294] Updated weights for policy 0, policy_version 181734 (0.0025) [2025-01-04 11:15:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.8, 300 sec: 13468.2). Total num frames: 744407040. Throughput: 0: 3231.6. Samples: 175265862. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:03,968][134211] Avg episode reward: [(0, '8.820')] [2025-01-04 11:15:05,204][134294] Updated weights for policy 0, policy_version 181744 (0.0025) [2025-01-04 11:15:08,127][134294] Updated weights for policy 0, policy_version 181754 (0.0022) [2025-01-04 11:15:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 13496.0). Total num frames: 744472576. Throughput: 0: 3243.3. Samples: 175285978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:08,968][134211] Avg episode reward: [(0, '9.769')] [2025-01-04 11:15:11,262][134294] Updated weights for policy 0, policy_version 181764 (0.0026) [2025-01-04 11:15:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13175.5, 300 sec: 13509.9). Total num frames: 744538112. Throughput: 0: 3298.1. Samples: 175305508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:13,969][134211] Avg episode reward: [(0, '8.850')] [2025-01-04 11:15:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000181772_744538112.pth... 
[2025-01-04 11:15:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000180968_741244928.pth [2025-01-04 11:15:14,495][134294] Updated weights for policy 0, policy_version 181774 (0.0026) [2025-01-04 11:15:17,579][134294] Updated weights for policy 0, policy_version 181784 (0.0026) [2025-01-04 11:15:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13039.0, 300 sec: 13523.8). Total num frames: 744603648. Throughput: 0: 3319.6. Samples: 175315364. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:18,968][134211] Avg episode reward: [(0, '8.578')] [2025-01-04 11:15:20,452][134294] Updated weights for policy 0, policy_version 181794 (0.0025) [2025-01-04 11:15:23,026][134294] Updated weights for policy 0, policy_version 181804 (0.0020) [2025-01-04 11:15:23,968][134211] Fps is (10 sec: 14746.2, 60 sec: 13312.1, 300 sec: 13607.1). Total num frames: 744685568. Throughput: 0: 3369.6. Samples: 175336536. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:23,968][134211] Avg episode reward: [(0, '9.577')] [2025-01-04 11:15:25,184][134294] Updated weights for policy 0, policy_version 181814 (0.0018) [2025-01-04 11:15:28,108][134294] Updated weights for policy 0, policy_version 181824 (0.0025) [2025-01-04 11:15:28,968][134211] Fps is (10 sec: 15564.7, 60 sec: 13585.4, 300 sec: 13662.6). Total num frames: 744759296. Throughput: 0: 3457.8. Samples: 175360886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:28,968][134211] Avg episode reward: [(0, '10.359')] [2025-01-04 11:15:31,466][134294] Updated weights for policy 0, policy_version 181834 (0.0027) [2025-01-04 11:15:33,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13721.6, 300 sec: 13676.5). Total num frames: 744824832. Throughput: 0: 3440.8. Samples: 175370394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:33,968][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 11:15:34,545][134294] Updated weights for policy 0, policy_version 181844 (0.0025) [2025-01-04 11:15:37,552][134294] Updated weights for policy 0, policy_version 181854 (0.0023) [2025-01-04 11:15:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.6, 300 sec: 13690.4). Total num frames: 744890368. Throughput: 0: 3432.5. Samples: 175390434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:38,969][134211] Avg episode reward: [(0, '10.024')] [2025-01-04 11:15:40,566][134294] Updated weights for policy 0, policy_version 181864 (0.0025) [2025-01-04 11:15:43,627][134294] Updated weights for policy 0, policy_version 181874 (0.0025) [2025-01-04 11:15:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13653.3, 300 sec: 13704.2). Total num frames: 744955904. Throughput: 0: 3426.9. Samples: 175410760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:43,968][134211] Avg episode reward: [(0, '9.636')] [2025-01-04 11:15:46,594][134294] Updated weights for policy 0, policy_version 181884 (0.0021) [2025-01-04 11:15:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13653.3, 300 sec: 13676.5). Total num frames: 745025536. Throughput: 0: 3445.7. Samples: 175420920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:48,968][134211] Avg episode reward: [(0, '9.032')] [2025-01-04 11:15:49,615][134294] Updated weights for policy 0, policy_version 181894 (0.0025) [2025-01-04 11:15:52,627][134294] Updated weights for policy 0, policy_version 181904 (0.0022) [2025-01-04 11:15:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13653.4, 300 sec: 13648.7). Total num frames: 745091072. Throughput: 0: 3452.8. Samples: 175441352. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:53,968][134211] Avg episode reward: [(0, '10.357')] [2025-01-04 11:15:55,537][134294] Updated weights for policy 0, policy_version 181914 (0.0024) [2025-01-04 11:15:57,618][134294] Updated weights for policy 0, policy_version 181924 (0.0014) [2025-01-04 11:15:58,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14063.0, 300 sec: 13773.7). Total num frames: 745185280. Throughput: 0: 3567.3. Samples: 175466034. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:15:58,968][134211] Avg episode reward: [(0, '9.420')] [2025-01-04 11:15:59,715][134294] Updated weights for policy 0, policy_version 181934 (0.0011) [2025-01-04 11:16:01,753][134294] Updated weights for policy 0, policy_version 181944 (0.0013) [2025-01-04 11:16:03,638][134294] Updated weights for policy 0, policy_version 181954 (0.0012) [2025-01-04 11:16:03,968][134211] Fps is (10 sec: 19661.0, 60 sec: 14677.4, 300 sec: 13926.4). Total num frames: 745287680. Throughput: 0: 3681.6. Samples: 175481034. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:03,968][134211] Avg episode reward: [(0, '8.686')] [2025-01-04 11:16:05,518][134294] Updated weights for policy 0, policy_version 181964 (0.0015) [2025-01-04 11:16:08,453][134294] Updated weights for policy 0, policy_version 181974 (0.0023) [2025-01-04 11:16:08,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14950.4, 300 sec: 13995.8). Total num frames: 745369600. Throughput: 0: 3871.2. Samples: 175510742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:08,968][134211] Avg episode reward: [(0, '9.898')] [2025-01-04 11:16:12,110][134294] Updated weights for policy 0, policy_version 181984 (0.0029) [2025-01-04 11:16:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14814.0, 300 sec: 13968.2). Total num frames: 745426944. Throughput: 0: 3706.6. Samples: 175527684. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:13,968][134211] Avg episode reward: [(0, '8.288')] [2025-01-04 11:16:15,657][134294] Updated weights for policy 0, policy_version 181994 (0.0031) [2025-01-04 11:16:18,743][134294] Updated weights for policy 0, policy_version 182004 (0.0025) [2025-01-04 11:16:18,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14745.6, 300 sec: 13954.2). Total num frames: 745488384. Throughput: 0: 3698.6. Samples: 175536830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:18,968][134211] Avg episode reward: [(0, '10.139')] [2025-01-04 11:16:21,711][134294] Updated weights for policy 0, policy_version 182014 (0.0028) [2025-01-04 11:16:23,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14472.5, 300 sec: 13954.2). Total num frames: 745553920. Throughput: 0: 3699.5. Samples: 175556912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:23,969][134211] Avg episode reward: [(0, '8.265')] [2025-01-04 11:16:25,196][134294] Updated weights for policy 0, policy_version 182024 (0.0026) [2025-01-04 11:16:28,729][134294] Updated weights for policy 0, policy_version 182034 (0.0028) [2025-01-04 11:16:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 745611264. Throughput: 0: 3637.7. Samples: 175574454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:28,968][134211] Avg episode reward: [(0, '9.167')] [2025-01-04 11:16:32,252][134294] Updated weights for policy 0, policy_version 182044 (0.0021) [2025-01-04 11:16:33,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14131.2, 300 sec: 13787.6). Total num frames: 745672704. Throughput: 0: 3607.6. Samples: 175583260. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:33,968][134211] Avg episode reward: [(0, '10.733')] [2025-01-04 11:16:35,471][134294] Updated weights for policy 0, policy_version 182054 (0.0027) [2025-01-04 11:16:38,457][134294] Updated weights for policy 0, policy_version 182064 (0.0025) [2025-01-04 11:16:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 13815.3). Total num frames: 745738240. Throughput: 0: 3581.0. Samples: 175602496. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:38,969][134211] Avg episode reward: [(0, '8.641')] [2025-01-04 11:16:41,412][134294] Updated weights for policy 0, policy_version 182074 (0.0025) [2025-01-04 11:16:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 13829.2). Total num frames: 745803776. Throughput: 0: 3481.6. Samples: 175622708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:43,968][134211] Avg episode reward: [(0, '10.583')] [2025-01-04 11:16:44,550][134294] Updated weights for policy 0, policy_version 182084 (0.0024) [2025-01-04 11:16:47,602][134294] Updated weights for policy 0, policy_version 182094 (0.0024) [2025-01-04 11:16:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14131.2, 300 sec: 13857.0). Total num frames: 745873408. Throughput: 0: 3366.0. Samples: 175632506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:48,968][134211] Avg episode reward: [(0, '8.307')] [2025-01-04 11:16:50,588][134294] Updated weights for policy 0, policy_version 182104 (0.0025) [2025-01-04 11:16:53,415][134294] Updated weights for policy 0, policy_version 182114 (0.0025) [2025-01-04 11:16:53,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14199.3, 300 sec: 13884.7). Total num frames: 745943040. Throughput: 0: 3172.6. Samples: 175653512. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:53,969][134211] Avg episode reward: [(0, '10.659')] [2025-01-04 11:16:56,358][134294] Updated weights for policy 0, policy_version 182124 (0.0022) [2025-01-04 11:16:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13721.5, 300 sec: 13884.7). Total num frames: 746008576. Throughput: 0: 3247.5. Samples: 175673822. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:16:58,968][134211] Avg episode reward: [(0, '9.178')] [2025-01-04 11:16:59,622][134294] Updated weights for policy 0, policy_version 182134 (0.0024) [2025-01-04 11:17:02,917][134294] Updated weights for policy 0, policy_version 182144 (0.0026) [2025-01-04 11:17:03,968][134211] Fps is (10 sec: 13107.9, 60 sec: 13107.2, 300 sec: 13884.7). Total num frames: 746074112. Throughput: 0: 3252.9. Samples: 175683210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:17:03,968][134211] Avg episode reward: [(0, '10.067')] [2025-01-04 11:17:05,842][134294] Updated weights for policy 0, policy_version 182154 (0.0026) [2025-01-04 11:17:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 12902.4, 300 sec: 13912.5). Total num frames: 746143744. Throughput: 0: 3254.4. Samples: 175703358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:08,969][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 11:17:08,969][134294] Updated weights for policy 0, policy_version 182164 (0.0025) [2025-01-04 11:17:12,102][134294] Updated weights for policy 0, policy_version 182174 (0.0024) [2025-01-04 11:17:13,968][134211] Fps is (10 sec: 13515.7, 60 sec: 13038.8, 300 sec: 13898.6). Total num frames: 746209280. Throughput: 0: 3306.3. Samples: 175723242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:13,969][134211] Avg episode reward: [(0, '9.669')] [2025-01-04 11:17:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000182180_746209280.pth... 
[2025-01-04 11:17:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000181377_742920192.pth [2025-01-04 11:17:15,136][134294] Updated weights for policy 0, policy_version 182184 (0.0026) [2025-01-04 11:17:18,040][134294] Updated weights for policy 0, policy_version 182194 (0.0026) [2025-01-04 11:17:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13175.5, 300 sec: 13870.9). Total num frames: 746278912. Throughput: 0: 3332.2. Samples: 175733208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:18,968][134211] Avg episode reward: [(0, '8.244')] [2025-01-04 11:17:20,979][134294] Updated weights for policy 0, policy_version 182204 (0.0023) [2025-01-04 11:17:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13175.5, 300 sec: 13884.7). Total num frames: 746344448. Throughput: 0: 3372.6. Samples: 175754262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:23,969][134211] Avg episode reward: [(0, '9.066')] [2025-01-04 11:17:24,022][134294] Updated weights for policy 0, policy_version 182214 (0.0024) [2025-01-04 11:17:27,206][134294] Updated weights for policy 0, policy_version 182224 (0.0024) [2025-01-04 11:17:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13312.0, 300 sec: 13898.6). Total num frames: 746409984. Throughput: 0: 3348.7. Samples: 175773400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:28,972][134211] Avg episode reward: [(0, '10.634')] [2025-01-04 11:17:30,213][134294] Updated weights for policy 0, policy_version 182234 (0.0022) [2025-01-04 11:17:32,188][134294] Updated weights for policy 0, policy_version 182244 (0.0015) [2025-01-04 11:17:33,968][134211] Fps is (10 sec: 16384.5, 60 sec: 13926.4, 300 sec: 13857.0). Total num frames: 746508288. Throughput: 0: 3406.9. Samples: 175785816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:33,968][134211] Avg episode reward: [(0, '10.057')] [2025-01-04 11:17:34,056][134294] Updated weights for policy 0, policy_version 182254 (0.0013) [2025-01-04 11:17:35,976][134294] Updated weights for policy 0, policy_version 182264 (0.0014) [2025-01-04 11:17:38,778][134294] Updated weights for policy 0, policy_version 182274 (0.0027) [2025-01-04 11:17:38,968][134211] Fps is (10 sec: 18431.9, 60 sec: 14267.7, 300 sec: 13787.6). Total num frames: 746594304. Throughput: 0: 3621.9. Samples: 175816496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:38,970][134211] Avg episode reward: [(0, '9.717')] [2025-01-04 11:17:42,050][134294] Updated weights for policy 0, policy_version 182284 (0.0026) [2025-01-04 11:17:43,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14199.5, 300 sec: 13745.9). Total num frames: 746655744. Throughput: 0: 3591.1. Samples: 175835420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:43,968][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 11:17:45,315][134294] Updated weights for policy 0, policy_version 182294 (0.0029) [2025-01-04 11:17:48,377][134294] Updated weights for policy 0, policy_version 182304 (0.0024) [2025-01-04 11:17:48,968][134211] Fps is (10 sec: 13106.7, 60 sec: 14199.3, 300 sec: 13759.8). Total num frames: 746725376. Throughput: 0: 3601.4. Samples: 175845274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:48,969][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 11:17:51,322][134294] Updated weights for policy 0, policy_version 182314 (0.0026) [2025-01-04 11:17:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14131.2, 300 sec: 13759.8). Total num frames: 746790912. Throughput: 0: 3605.7. Samples: 175865616. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:53,969][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 11:17:54,609][134294] Updated weights for policy 0, policy_version 182324 (0.0025) [2025-01-04 11:17:57,976][134294] Updated weights for policy 0, policy_version 182334 (0.0022) [2025-01-04 11:17:58,968][134211] Fps is (10 sec: 12288.5, 60 sec: 13994.7, 300 sec: 13759.8). Total num frames: 746848256. Throughput: 0: 3564.8. Samples: 175883656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:17:58,969][134211] Avg episode reward: [(0, '9.278')] [2025-01-04 11:18:01,156][134294] Updated weights for policy 0, policy_version 182344 (0.0023) [2025-01-04 11:18:03,970][134211] Fps is (10 sec: 12285.6, 60 sec: 13994.1, 300 sec: 13759.7). Total num frames: 746913792. Throughput: 0: 3557.9. Samples: 175893322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:18:03,971][134211] Avg episode reward: [(0, '9.545')] [2025-01-04 11:18:04,512][134294] Updated weights for policy 0, policy_version 182354 (0.0027) [2025-01-04 11:18:07,606][134294] Updated weights for policy 0, policy_version 182364 (0.0024) [2025-01-04 11:18:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13858.2, 300 sec: 13759.8). Total num frames: 746975232. Throughput: 0: 3520.0. Samples: 175912660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:18:08,968][134211] Avg episode reward: [(0, '8.474')] [2025-01-04 11:18:10,758][134294] Updated weights for policy 0, policy_version 182374 (0.0025) [2025-01-04 11:18:13,936][134294] Updated weights for policy 0, policy_version 182384 (0.0023) [2025-01-04 11:18:13,968][134211] Fps is (10 sec: 13110.2, 60 sec: 13926.6, 300 sec: 13759.8). Total num frames: 747044864. Throughput: 0: 3527.2. Samples: 175932122. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:13,968][134211] Avg episode reward: [(0, '8.148')] [2025-01-04 11:18:17,041][134294] Updated weights for policy 0, policy_version 182394 (0.0026) [2025-01-04 11:18:18,967][134211] Fps is (10 sec: 14336.2, 60 sec: 13994.7, 300 sec: 13787.6). Total num frames: 747118592. Throughput: 0: 3464.5. Samples: 175941718. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:18,968][134211] Avg episode reward: [(0, '9.778')] [2025-01-04 11:18:19,181][134294] Updated weights for policy 0, policy_version 182404 (0.0015) [2025-01-04 11:18:21,638][134294] Updated weights for policy 0, policy_version 182414 (0.0023) [2025-01-04 11:18:23,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14199.5, 300 sec: 13843.1). Total num frames: 747196416. Throughput: 0: 3347.2. Samples: 175967122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:23,969][134211] Avg episode reward: [(0, '9.409')] [2025-01-04 11:18:24,790][134294] Updated weights for policy 0, policy_version 182424 (0.0027) [2025-01-04 11:18:28,291][134294] Updated weights for policy 0, policy_version 182434 (0.0028) [2025-01-04 11:18:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14062.9, 300 sec: 13829.2). Total num frames: 747253760. Throughput: 0: 3331.3. Samples: 175985328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:28,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 11:18:31,486][134294] Updated weights for policy 0, policy_version 182444 (0.0023) [2025-01-04 11:18:33,554][134294] Updated weights for policy 0, policy_version 182454 (0.0015) [2025-01-04 11:18:33,967][134211] Fps is (10 sec: 13926.9, 60 sec: 13789.9, 300 sec: 13870.9). Total num frames: 747335680. Throughput: 0: 3314.1. Samples: 175994406. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:33,968][134211] Avg episode reward: [(0, '9.903')] [2025-01-04 11:18:36,005][134294] Updated weights for policy 0, policy_version 182464 (0.0020) [2025-01-04 11:18:38,968][134211] Fps is (10 sec: 15564.6, 60 sec: 13585.1, 300 sec: 13898.6). Total num frames: 747409408. Throughput: 0: 3443.3. Samples: 176020564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:38,969][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 11:18:38,993][134294] Updated weights for policy 0, policy_version 182474 (0.0024) [2025-01-04 11:18:42,158][134294] Updated weights for policy 0, policy_version 182484 (0.0027) [2025-01-04 11:18:43,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13653.3, 300 sec: 13884.7). Total num frames: 747474944. Throughput: 0: 3482.3. Samples: 176040358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:43,968][134211] Avg episode reward: [(0, '8.992')] [2025-01-04 11:18:45,284][134294] Updated weights for policy 0, policy_version 182494 (0.0029) [2025-01-04 11:18:48,303][134294] Updated weights for policy 0, policy_version 182504 (0.0026) [2025-01-04 11:18:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13653.4, 300 sec: 13884.7). Total num frames: 747544576. Throughput: 0: 3483.0. Samples: 176050050. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:48,968][134211] Avg episode reward: [(0, '10.103')] [2025-01-04 11:18:51,255][134294] Updated weights for policy 0, policy_version 182514 (0.0025) [2025-01-04 11:18:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13721.7, 300 sec: 13926.4). Total num frames: 747614208. Throughput: 0: 3512.9. Samples: 176070740. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:53,968][134211] Avg episode reward: [(0, '10.316')] [2025-01-04 11:18:54,319][134294] Updated weights for policy 0, policy_version 182524 (0.0026) [2025-01-04 11:18:57,353][134294] Updated weights for policy 0, policy_version 182534 (0.0025) [2025-01-04 11:18:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.2, 300 sec: 13940.3). Total num frames: 747679744. Throughput: 0: 3522.7. Samples: 176090642. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:18:58,968][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 11:19:00,530][134294] Updated weights for policy 0, policy_version 182544 (0.0025) [2025-01-04 11:19:03,713][134294] Updated weights for policy 0, policy_version 182554 (0.0026) [2025-01-04 11:19:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13858.7, 300 sec: 13912.5). Total num frames: 747745280. Throughput: 0: 3522.0. Samples: 176100210. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:19:03,968][134211] Avg episode reward: [(0, '9.678')] [2025-01-04 11:19:06,642][134294] Updated weights for policy 0, policy_version 182564 (0.0025) [2025-01-04 11:19:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13926.4, 300 sec: 13773.7). Total num frames: 747810816. Throughput: 0: 3406.3. Samples: 176120404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:19:08,968][134211] Avg episode reward: [(0, '8.981')] [2025-01-04 11:19:09,736][134294] Updated weights for policy 0, policy_version 182574 (0.0027) [2025-01-04 11:19:12,791][134294] Updated weights for policy 0, policy_version 182584 (0.0028) [2025-01-04 11:19:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13858.1, 300 sec: 13745.9). Total num frames: 747876352. Throughput: 0: 3446.7. Samples: 176140432. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:19:13,968][134211] Avg episode reward: [(0, '8.827')] [2025-01-04 11:19:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000182587_747876352.pth... [2025-01-04 11:19:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000181772_744538112.pth [2025-01-04 11:19:15,901][134294] Updated weights for policy 0, policy_version 182594 (0.0026) [2025-01-04 11:19:18,865][134294] Updated weights for policy 0, policy_version 182604 (0.0024) [2025-01-04 11:19:18,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13789.9, 300 sec: 13759.8). Total num frames: 747945984. Throughput: 0: 3464.0. Samples: 176150284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:18,968][134211] Avg episode reward: [(0, '9.457')] [2025-01-04 11:19:21,864][134294] Updated weights for policy 0, policy_version 182614 (0.0023) [2025-01-04 11:19:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 13787.6). Total num frames: 748011520. Throughput: 0: 3340.5. Samples: 176170886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:23,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 11:19:25,350][134294] Updated weights for policy 0, policy_version 182624 (0.0025) [2025-01-04 11:19:27,700][134294] Updated weights for policy 0, policy_version 182634 (0.0015) [2025-01-04 11:19:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13858.1, 300 sec: 13843.1). Total num frames: 748085248. Throughput: 0: 3376.3. Samples: 176192292. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:28,968][134211] Avg episode reward: [(0, '8.352')] [2025-01-04 11:19:30,852][134294] Updated weights for policy 0, policy_version 182644 (0.0022) [2025-01-04 11:19:33,963][134294] Updated weights for policy 0, policy_version 182654 (0.0024) [2025-01-04 11:19:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13585.0, 300 sec: 13843.1). Total num frames: 748150784. Throughput: 0: 3369.4. Samples: 176201674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:33,968][134211] Avg episode reward: [(0, '10.352')] [2025-01-04 11:19:37,033][134294] Updated weights for policy 0, policy_version 182664 (0.0028) [2025-01-04 11:19:38,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13448.5, 300 sec: 13829.2). Total num frames: 748216320. Throughput: 0: 3358.1. Samples: 176221856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:38,969][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 11:19:40,003][134294] Updated weights for policy 0, policy_version 182674 (0.0025) [2025-01-04 11:19:42,034][134294] Updated weights for policy 0, policy_version 182684 (0.0013) [2025-01-04 11:19:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 13926.4, 300 sec: 13912.5). Total num frames: 748310528. Throughput: 0: 3484.8. Samples: 176247456. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:43,968][134211] Avg episode reward: [(0, '9.272')] [2025-01-04 11:19:43,991][134294] Updated weights for policy 0, policy_version 182694 (0.0014) [2025-01-04 11:19:45,992][134294] Updated weights for policy 0, policy_version 182704 (0.0014) [2025-01-04 11:19:47,920][134294] Updated weights for policy 0, policy_version 182714 (0.0013) [2025-01-04 11:19:48,968][134211] Fps is (10 sec: 20071.1, 60 sec: 14540.8, 300 sec: 14051.4). Total num frames: 748417024. Throughput: 0: 3621.4. Samples: 176263174. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:48,968][134211] Avg episode reward: [(0, '10.059')] [2025-01-04 11:19:49,900][134294] Updated weights for policy 0, policy_version 182724 (0.0015) [2025-01-04 11:19:53,118][134294] Updated weights for policy 0, policy_version 182734 (0.0024) [2025-01-04 11:19:53,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14540.8, 300 sec: 14051.4). Total num frames: 748486656. Throughput: 0: 3772.0. Samples: 176290146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:53,968][134211] Avg episode reward: [(0, '9.064')] [2025-01-04 11:19:56,447][134294] Updated weights for policy 0, policy_version 182744 (0.0027) [2025-01-04 11:19:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14472.5, 300 sec: 14037.5). Total num frames: 748548096. Throughput: 0: 3725.6. Samples: 176308082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:19:58,968][134211] Avg episode reward: [(0, '9.924')] [2025-01-04 11:20:00,021][134294] Updated weights for policy 0, policy_version 182754 (0.0030) [2025-01-04 11:20:03,337][134294] Updated weights for policy 0, policy_version 182764 (0.0025) [2025-01-04 11:20:03,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14335.9, 300 sec: 14009.7). Total num frames: 748605440. Throughput: 0: 3697.1. Samples: 176316654. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:20:03,969][134211] Avg episode reward: [(0, '9.750')] [2025-01-04 11:20:06,441][134294] Updated weights for policy 0, policy_version 182774 (0.0024) [2025-01-04 11:20:08,969][134211] Fps is (10 sec: 12696.2, 60 sec: 14404.0, 300 sec: 14023.6). Total num frames: 748675072. Throughput: 0: 3672.0. Samples: 176336128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:20:08,969][134211] Avg episode reward: [(0, '9.403')] [2025-01-04 11:20:09,664][134294] Updated weights for policy 0, policy_version 182784 (0.0026) [2025-01-04 11:20:12,682][134294] Updated weights for policy 0, policy_version 182794 (0.0024) [2025-01-04 11:20:13,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14336.0, 300 sec: 14009.7). Total num frames: 748736512. Throughput: 0: 3628.8. Samples: 176355590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:20:13,969][134211] Avg episode reward: [(0, '9.659')] [2025-01-04 11:20:15,904][134294] Updated weights for policy 0, policy_version 182804 (0.0023) [2025-01-04 11:20:18,968][134211] Fps is (10 sec: 13108.8, 60 sec: 14336.0, 300 sec: 13968.1). Total num frames: 748806144. Throughput: 0: 3636.7. Samples: 176365326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:20:18,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 11:20:18,972][134294] Updated weights for policy 0, policy_version 182814 (0.0022) [2025-01-04 11:20:21,953][134294] Updated weights for policy 0, policy_version 182824 (0.0023) [2025-01-04 11:20:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.1, 300 sec: 13940.3). Total num frames: 748871680. Throughput: 0: 3640.2. Samples: 176385662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:20:23,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 11:20:25,380][134294] Updated weights for policy 0, policy_version 182834 (0.0026) [2025-01-04 11:20:28,891][134294] Updated weights for policy 0, policy_version 182844 (0.0025) [2025-01-04 11:20:28,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14062.9, 300 sec: 13912.5). Total num frames: 748929024. Throughput: 0: 3467.0. Samples: 176403472. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:28,968][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 11:20:32,339][134294] Updated weights for policy 0, policy_version 182854 (0.0024) [2025-01-04 11:20:33,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13926.4, 300 sec: 13884.7). Total num frames: 748986368. Throughput: 0: 3316.6. Samples: 176412420. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:33,968][134211] Avg episode reward: [(0, '9.582')] [2025-01-04 11:20:35,612][134294] Updated weights for policy 0, policy_version 182864 (0.0028) [2025-01-04 11:20:38,746][134294] Updated weights for policy 0, policy_version 182874 (0.0025) [2025-01-04 11:20:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.7, 300 sec: 13898.6). Total num frames: 749056000. Throughput: 0: 3134.3. Samples: 176431190. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:38,969][134211] Avg episode reward: [(0, '9.505')] [2025-01-04 11:20:41,690][134294] Updated weights for policy 0, policy_version 182884 (0.0023) [2025-01-04 11:20:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 13884.8). Total num frames: 749121536. Throughput: 0: 3184.3. Samples: 176451376. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:43,968][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 11:20:44,830][134294] Updated weights for policy 0, policy_version 182894 (0.0024) [2025-01-04 11:20:47,974][134294] Updated weights for policy 0, policy_version 182904 (0.0023) [2025-01-04 11:20:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 13884.7). Total num frames: 749187072. Throughput: 0: 3203.9. Samples: 176460830. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:48,968][134211] Avg episode reward: [(0, '10.844')] [2025-01-04 11:20:51,002][134294] Updated weights for policy 0, policy_version 182914 (0.0023) [2025-01-04 11:20:53,968][134211] Fps is (10 sec: 13106.6, 60 sec: 12765.8, 300 sec: 13787.5). Total num frames: 749252608. Throughput: 0: 3217.7. Samples: 176480920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:53,969][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 11:20:54,139][134294] Updated weights for policy 0, policy_version 182924 (0.0026) [2025-01-04 11:20:57,099][134294] Updated weights for policy 0, policy_version 182934 (0.0022) [2025-01-04 11:20:58,967][134211] Fps is (10 sec: 14336.3, 60 sec: 13039.0, 300 sec: 13704.2). Total num frames: 749330432. Throughput: 0: 3268.9. Samples: 176502688. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:20:58,968][134211] Avg episode reward: [(0, '9.152')] [2025-01-04 11:20:59,223][134294] Updated weights for policy 0, policy_version 182944 (0.0013) [2025-01-04 11:21:01,209][134294] Updated weights for policy 0, policy_version 182954 (0.0013) [2025-01-04 11:21:03,106][134294] Updated weights for policy 0, policy_version 182964 (0.0013) [2025-01-04 11:21:03,967][134211] Fps is (10 sec: 18433.1, 60 sec: 13858.3, 300 sec: 13787.6). Total num frames: 749436928. Throughput: 0: 3396.7. Samples: 176518176. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:03,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 11:21:04,992][134294] Updated weights for policy 0, policy_version 182974 (0.0012) [2025-01-04 11:21:07,224][134294] Updated weights for policy 0, policy_version 182984 (0.0019) [2025-01-04 11:21:08,968][134211] Fps is (10 sec: 18841.1, 60 sec: 14063.2, 300 sec: 13870.9). Total num frames: 749518848. Throughput: 0: 3621.4. Samples: 176548626. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:08,969][134211] Avg episode reward: [(0, '8.733')] [2025-01-04 11:21:10,781][134294] Updated weights for policy 0, policy_version 182994 (0.0027) [2025-01-04 11:21:13,964][134294] Updated weights for policy 0, policy_version 183004 (0.0026) [2025-01-04 11:21:13,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14131.2, 300 sec: 13884.7). Total num frames: 749584384. Throughput: 0: 3629.8. Samples: 176566812. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:13,968][134211] Avg episode reward: [(0, '9.950')] [2025-01-04 11:21:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183004_749584384.pth... [2025-01-04 11:21:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000182180_746209280.pth [2025-01-04 11:21:17,275][134294] Updated weights for policy 0, policy_version 183014 (0.0027) [2025-01-04 11:21:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.6, 300 sec: 13870.9). Total num frames: 749645824. Throughput: 0: 3634.0. Samples: 176575950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:18,968][134211] Avg episode reward: [(0, '9.443')] [2025-01-04 11:21:20,337][134294] Updated weights for policy 0, policy_version 183024 (0.0024) [2025-01-04 11:21:23,208][134294] Updated weights for policy 0, policy_version 183034 (0.0025) [2025-01-04 11:21:23,968][134211] Fps is (10 sec: 12696.8, 60 sec: 13994.5, 300 sec: 13898.6). Total num frames: 749711360. Throughput: 0: 3671.7. Samples: 176596418. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:23,969][134211] Avg episode reward: [(0, '9.086')] [2025-01-04 11:21:26,635][134294] Updated weights for policy 0, policy_version 183044 (0.0027) [2025-01-04 11:21:28,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14062.8, 300 sec: 13898.6). Total num frames: 749772800. 
Throughput: 0: 3633.6. Samples: 176614890. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:21:28,969][134211] Avg episode reward: [(0, '9.435')] [2025-01-04 11:21:30,011][134294] Updated weights for policy 0, policy_version 183054 (0.0026) [2025-01-04 11:21:33,177][134294] Updated weights for policy 0, policy_version 183064 (0.0026) [2025-01-04 11:21:33,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14199.4, 300 sec: 13898.6). Total num frames: 749838336. Throughput: 0: 3629.0. Samples: 176624134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:33,968][134211] Avg episode reward: [(0, '10.137')] [2025-01-04 11:21:36,118][134294] Updated weights for policy 0, policy_version 183074 (0.0027) [2025-01-04 11:21:38,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14199.4, 300 sec: 13912.5). Total num frames: 749907968. Throughput: 0: 3637.8. Samples: 176644618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:38,968][134211] Avg episode reward: [(0, '9.494')] [2025-01-04 11:21:39,213][134294] Updated weights for policy 0, policy_version 183084 (0.0028) [2025-01-04 11:21:42,167][134294] Updated weights for policy 0, policy_version 183094 (0.0025) [2025-01-04 11:21:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 13898.6). Total num frames: 749973504. Throughput: 0: 3601.0. Samples: 176664734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:43,968][134211] Avg episode reward: [(0, '9.804')] [2025-01-04 11:21:45,237][134294] Updated weights for policy 0, policy_version 183104 (0.0024) [2025-01-04 11:21:48,524][134294] Updated weights for policy 0, policy_version 183114 (0.0025) [2025-01-04 11:21:48,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.5, 300 sec: 13884.8). Total num frames: 750039040. Throughput: 0: 3477.7. Samples: 176674672. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:48,968][134211] Avg episode reward: [(0, '9.069')] [2025-01-04 11:21:51,683][134294] Updated weights for policy 0, policy_version 183124 (0.0029) [2025-01-04 11:21:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.3, 300 sec: 13870.9). Total num frames: 750100480. Throughput: 0: 3216.2. Samples: 176693354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:53,969][134211] Avg episode reward: [(0, '8.799')] [2025-01-04 11:21:55,245][134294] Updated weights for policy 0, policy_version 183134 (0.0027) [2025-01-04 11:21:58,688][134294] Updated weights for policy 0, policy_version 183144 (0.0025) [2025-01-04 11:21:58,969][134211] Fps is (10 sec: 11877.4, 60 sec: 13789.6, 300 sec: 13843.0). Total num frames: 750157824. Throughput: 0: 3206.8. Samples: 176711120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:21:58,969][134211] Avg episode reward: [(0, '10.319')] [2025-01-04 11:22:02,237][134294] Updated weights for policy 0, policy_version 183154 (0.0024) [2025-01-04 11:22:03,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13107.2, 300 sec: 13829.2). Total num frames: 750223360. Throughput: 0: 3200.2. Samples: 176719960. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:03,968][134211] Avg episode reward: [(0, '9.061')] [2025-01-04 11:22:04,635][134294] Updated weights for policy 0, policy_version 183164 (0.0016) [2025-01-04 11:22:06,685][134294] Updated weights for policy 0, policy_version 183174 (0.0013) [2025-01-04 11:22:08,622][134294] Updated weights for policy 0, policy_version 183184 (0.0013) [2025-01-04 11:22:08,967][134211] Fps is (10 sec: 16795.3, 60 sec: 13448.6, 300 sec: 13954.2). Total num frames: 750325760. Throughput: 0: 3324.6. Samples: 176746024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:08,968][134211] Avg episode reward: [(0, '8.254')] [2025-01-04 11:22:11,455][134294] Updated weights for policy 0, policy_version 183194 (0.0023) [2025-01-04 11:22:13,968][134211] Fps is (10 sec: 17203.2, 60 sec: 13516.8, 300 sec: 13954.2). Total num frames: 750395392. Throughput: 0: 3437.7. Samples: 176769584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:13,968][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 11:22:14,619][134294] Updated weights for policy 0, policy_version 183204 (0.0027) [2025-01-04 11:22:17,888][134294] Updated weights for policy 0, policy_version 183214 (0.0030) [2025-01-04 11:22:18,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13516.8, 300 sec: 13940.3). Total num frames: 750456832. Throughput: 0: 3437.1. Samples: 176778804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:18,969][134211] Avg episode reward: [(0, '9.973')] [2025-01-04 11:22:20,894][134294] Updated weights for policy 0, policy_version 183224 (0.0026) [2025-01-04 11:22:23,798][134294] Updated weights for policy 0, policy_version 183234 (0.0028) [2025-01-04 11:22:23,969][134211] Fps is (10 sec: 13105.5, 60 sec: 13584.9, 300 sec: 13954.1). Total num frames: 750526464. Throughput: 0: 3434.2. Samples: 176799160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:23,970][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 11:22:27,239][134294] Updated weights for policy 0, policy_version 183244 (0.0026) [2025-01-04 11:22:28,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13516.9, 300 sec: 13815.3). Total num frames: 750583808. Throughput: 0: 3395.5. Samples: 176817530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:28,968][134211] Avg episode reward: [(0, '8.661')] [2025-01-04 11:22:30,812][134294] Updated weights for policy 0, policy_version 183254 (0.0027) [2025-01-04 11:22:33,968][134211] Fps is (10 sec: 11880.0, 60 sec: 13448.6, 300 sec: 13732.0). Total num frames: 750645248. Throughput: 0: 3372.6. Samples: 176826440. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:22:33,968][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 11:22:34,055][134294] Updated weights for policy 0, policy_version 183264 (0.0023) [2025-01-04 11:22:37,198][134294] Updated weights for policy 0, policy_version 183274 (0.0025) [2025-01-04 11:22:38,968][134211] Fps is (10 sec: 12697.3, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 750710784. Throughput: 0: 3386.4. Samples: 176845744. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:22:38,969][134211] Avg episode reward: [(0, '10.407')] [2025-01-04 11:22:40,165][134294] Updated weights for policy 0, policy_version 183284 (0.0024) [2025-01-04 11:22:43,066][134294] Updated weights for policy 0, policy_version 183294 (0.0025) [2025-01-04 11:22:43,967][134211] Fps is (10 sec: 14336.2, 60 sec: 13585.1, 300 sec: 13773.7). Total num frames: 750788608. Throughput: 0: 3461.7. Samples: 176866894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:22:43,968][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 11:22:44,959][134294] Updated weights for policy 0, policy_version 183304 (0.0014) [2025-01-04 11:22:46,900][134294] Updated weights for policy 0, policy_version 183314 (0.0012) [2025-01-04 11:22:48,790][134294] Updated weights for policy 0, policy_version 183324 (0.0013) [2025-01-04 11:22:48,968][134211] Fps is (10 sec: 18842.2, 60 sec: 14336.0, 300 sec: 13926.4). Total num frames: 750899200. Throughput: 0: 3622.5. Samples: 176882970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:22:48,968][134211] Avg episode reward: [(0, '9.804')] [2025-01-04 11:22:50,772][134294] Updated weights for policy 0, policy_version 183334 (0.0015) [2025-01-04 11:22:53,749][134294] Updated weights for policy 0, policy_version 183344 (0.0026) [2025-01-04 11:22:53,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14609.1, 300 sec: 13995.8). Total num frames: 750977024. Throughput: 0: 3694.8. Samples: 176912290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:22:53,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 11:22:56,786][134294] Updated weights for policy 0, policy_version 183354 (0.0028) [2025-01-04 11:22:58,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14745.8, 300 sec: 13995.9). Total num frames: 751042560. Throughput: 0: 3603.3. Samples: 176931734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:22:58,968][134211] Avg episode reward: [(0, '9.139')] [2025-01-04 11:23:00,323][134294] Updated weights for policy 0, policy_version 183364 (0.0028) [2025-01-04 11:23:03,556][134294] Updated weights for policy 0, policy_version 183374 (0.0027) [2025-01-04 11:23:03,969][134211] Fps is (10 sec: 12696.0, 60 sec: 14677.0, 300 sec: 13995.8). Total num frames: 751104000. Throughput: 0: 3594.1. Samples: 176940544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:03,970][134211] Avg episode reward: [(0, '9.762')] [2025-01-04 11:23:06,595][134294] Updated weights for policy 0, policy_version 183384 (0.0024) [2025-01-04 11:23:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14062.9, 300 sec: 13981.9). Total num frames: 751169536. Throughput: 0: 3578.9. Samples: 176960206. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:08,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 11:23:09,833][134294] Updated weights for policy 0, policy_version 183394 (0.0025) [2025-01-04 11:23:12,855][134294] Updated weights for policy 0, policy_version 183404 (0.0022) [2025-01-04 11:23:13,968][134211] Fps is (10 sec: 13108.6, 60 sec: 13994.6, 300 sec: 13954.2). Total num frames: 751235072. Throughput: 0: 3610.4. Samples: 176980000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:13,968][134211] Avg episode reward: [(0, '10.089')] [2025-01-04 11:23:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183407_751235072.pth... [2025-01-04 11:23:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000182587_747876352.pth [2025-01-04 11:23:16,105][134294] Updated weights for policy 0, policy_version 183414 (0.0026) [2025-01-04 11:23:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14063.0, 300 sec: 13912.5). Total num frames: 751300608. Throughput: 0: 3622.8. Samples: 176989468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:18,968][134211] Avg episode reward: [(0, '9.355')] [2025-01-04 11:23:19,051][134294] Updated weights for policy 0, policy_version 183424 (0.0028) [2025-01-04 11:23:22,116][134294] Updated weights for policy 0, policy_version 183434 (0.0026) [2025-01-04 11:23:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14063.2, 300 sec: 13954.2). Total num frames: 751370240. Throughput: 0: 3650.8. Samples: 177010028. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:23,968][134211] Avg episode reward: [(0, '9.916')] [2025-01-04 11:23:25,002][134294] Updated weights for policy 0, policy_version 183444 (0.0025) [2025-01-04 11:23:28,062][134294] Updated weights for policy 0, policy_version 183454 (0.0024) [2025-01-04 11:23:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 13898.6). Total num frames: 751435776. Throughput: 0: 3636.4. Samples: 177030534. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:28,968][134211] Avg episode reward: [(0, '9.857')] [2025-01-04 11:23:31,184][134294] Updated weights for policy 0, policy_version 183464 (0.0026) [2025-01-04 11:23:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14267.7, 300 sec: 13870.9). Total num frames: 751501312. Throughput: 0: 3495.2. Samples: 177040254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:33,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 11:23:34,374][134294] Updated weights for policy 0, policy_version 183474 (0.0022) [2025-01-04 11:23:37,605][134294] Updated weights for policy 0, policy_version 183484 (0.0026) [2025-01-04 11:23:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14267.8, 300 sec: 13870.9). Total num frames: 751566848. Throughput: 0: 3272.1. Samples: 177059536. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:38,968][134211] Avg episode reward: [(0, '9.281')] [2025-01-04 11:23:40,596][134294] Updated weights for policy 0, policy_version 183494 (0.0024) [2025-01-04 11:23:43,426][134294] Updated weights for policy 0, policy_version 183504 (0.0021) [2025-01-04 11:23:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.1, 300 sec: 13870.9). Total num frames: 751636480. Throughput: 0: 3302.8. Samples: 177080358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:43,968][134211] Avg episode reward: [(0, '8.775')] [2025-01-04 11:23:46,710][134294] Updated weights for policy 0, policy_version 183514 (0.0024) [2025-01-04 11:23:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13380.2, 300 sec: 13857.0). Total num frames: 751702016. Throughput: 0: 3322.1. Samples: 177090034. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:48,968][134211] Avg episode reward: [(0, '9.130')] [2025-01-04 11:23:49,686][134294] Updated weights for policy 0, policy_version 183524 (0.0024) [2025-01-04 11:23:52,726][134294] Updated weights for policy 0, policy_version 183534 (0.0025) [2025-01-04 11:23:53,968][134211] Fps is (10 sec: 13516.1, 60 sec: 13243.6, 300 sec: 13870.8). Total num frames: 751771648. Throughput: 0: 3332.5. Samples: 177110172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:53,969][134211] Avg episode reward: [(0, '9.801')] [2025-01-04 11:23:55,699][134294] Updated weights for policy 0, policy_version 183544 (0.0024) [2025-01-04 11:23:58,762][134294] Updated weights for policy 0, policy_version 183554 (0.0023) [2025-01-04 11:23:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 13870.9). Total num frames: 751837184. Throughput: 0: 3348.3. Samples: 177130672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:23:58,968][134211] Avg episode reward: [(0, '9.324')] [2025-01-04 11:24:01,509][134294] Updated weights for policy 0, policy_version 183564 (0.0018) [2025-01-04 11:24:03,439][134294] Updated weights for policy 0, policy_version 183574 (0.0015) [2025-01-04 11:24:03,968][134211] Fps is (10 sec: 15565.9, 60 sec: 13721.9, 300 sec: 13954.2). Total num frames: 751927296. Throughput: 0: 3371.8. Samples: 177141200. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:03,968][134211] Avg episode reward: [(0, '8.764')] [2025-01-04 11:24:05,310][134294] Updated weights for policy 0, policy_version 183584 (0.0014) [2025-01-04 11:24:07,189][134294] Updated weights for policy 0, policy_version 183594 (0.0013) [2025-01-04 11:24:08,968][134211] Fps is (10 sec: 20070.9, 60 sec: 14472.6, 300 sec: 14106.9). Total num frames: 752037888. Throughput: 0: 3629.5. Samples: 177173354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:08,968][134211] Avg episode reward: [(0, '8.497')] [2025-01-04 11:24:09,108][134294] Updated weights for policy 0, policy_version 183604 (0.0011) [2025-01-04 11:24:10,964][134294] Updated weights for policy 0, policy_version 183614 (0.0013) [2025-01-04 11:24:13,247][134294] Updated weights for policy 0, policy_version 183624 (0.0020) [2025-01-04 11:24:13,968][134211] Fps is (10 sec: 20479.5, 60 sec: 14950.4, 300 sec: 14190.2). Total num frames: 752132096. Throughput: 0: 3851.1. Samples: 177203834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:13,969][134211] Avg episode reward: [(0, '9.952')] [2025-01-04 11:24:16,730][134294] Updated weights for policy 0, policy_version 183634 (0.0027) [2025-01-04 11:24:18,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14813.9, 300 sec: 14162.4). Total num frames: 752189440. Throughput: 0: 3831.1. Samples: 177212654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:18,969][134211] Avg episode reward: [(0, '9.459')] [2025-01-04 11:24:20,106][134294] Updated weights for policy 0, policy_version 183644 (0.0027) [2025-01-04 11:24:23,250][134294] Updated weights for policy 0, policy_version 183654 (0.0026) [2025-01-04 11:24:23,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14745.6, 300 sec: 14134.7). Total num frames: 752254976. Throughput: 0: 3821.1. Samples: 177231484. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:23,968][134211] Avg episode reward: [(0, '8.325')] [2025-01-04 11:24:26,544][134294] Updated weights for policy 0, policy_version 183664 (0.0028) [2025-01-04 11:24:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.1, 300 sec: 14106.9). Total num frames: 752312320. Throughput: 0: 3759.5. Samples: 177249536. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:28,968][134211] Avg episode reward: [(0, '9.655')] [2025-01-04 11:24:30,330][134294] Updated weights for policy 0, policy_version 183674 (0.0023) [2025-01-04 11:24:33,740][134294] Updated weights for policy 0, policy_version 183684 (0.0027) [2025-01-04 11:24:33,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14472.5, 300 sec: 14079.1). Total num frames: 752369664. Throughput: 0: 3733.4. Samples: 177258038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:33,968][134211] Avg episode reward: [(0, '9.322')] [2025-01-04 11:24:36,955][134294] Updated weights for policy 0, policy_version 183694 (0.0030) [2025-01-04 11:24:38,968][134211] Fps is (10 sec: 12287.5, 60 sec: 14472.5, 300 sec: 13981.9). Total num frames: 752435200. Throughput: 0: 3690.6. Samples: 177276248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:38,969][134211] Avg episode reward: [(0, '10.732')] [2025-01-04 11:24:40,077][134294] Updated weights for policy 0, policy_version 183704 (0.0025) [2025-01-04 11:24:43,287][134294] Updated weights for policy 0, policy_version 183714 (0.0024) [2025-01-04 11:24:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 13843.1). Total num frames: 752500736. Throughput: 0: 3674.3. Samples: 177296016. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:24:43,968][134211] Avg episode reward: [(0, '10.395')] [2025-01-04 11:24:46,406][134294] Updated weights for policy 0, policy_version 183724 (0.0025) [2025-01-04 11:24:48,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14404.3, 300 sec: 13829.2). Total num frames: 752566272. Throughput: 0: 3660.5. Samples: 177305922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:24:48,968][134211] Avg episode reward: [(0, '10.583')] [2025-01-04 11:24:49,549][134294] Updated weights for policy 0, policy_version 183734 (0.0023) [2025-01-04 11:24:52,782][134294] Updated weights for policy 0, policy_version 183744 (0.0024) [2025-01-04 11:24:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.9, 300 sec: 13829.2). Total num frames: 752627712. Throughput: 0: 3375.1. Samples: 177325234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:24:53,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 11:24:56,148][134294] Updated weights for policy 0, policy_version 183754 (0.0024) [2025-01-04 11:24:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14131.2, 300 sec: 13829.2). Total num frames: 752685056. Throughput: 0: 3093.3. Samples: 177343032. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:24:58,968][134211] Avg episode reward: [(0, '10.504')] [2025-01-04 11:24:59,685][134294] Updated weights for policy 0, policy_version 183764 (0.0026) [2025-01-04 11:25:02,841][134294] Updated weights for policy 0, policy_version 183774 (0.0026) [2025-01-04 11:25:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 13815.4). Total num frames: 752750592. Throughput: 0: 3106.2. Samples: 177352432. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:03,968][134211] Avg episode reward: [(0, '9.547')] [2025-01-04 11:25:05,865][134294] Updated weights for policy 0, policy_version 183784 (0.0024) [2025-01-04 11:25:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12902.4, 300 sec: 13815.3). Total num frames: 752812032. Throughput: 0: 3141.3. Samples: 177372840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:08,968][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 11:25:09,930][134294] Updated weights for policy 0, policy_version 183794 (0.0023) [2025-01-04 11:25:11,888][134294] Updated weights for policy 0, policy_version 183804 (0.0014) [2025-01-04 11:25:13,771][134294] Updated weights for policy 0, policy_version 183814 (0.0013) [2025-01-04 11:25:13,968][134211] Fps is (10 sec: 15155.4, 60 sec: 12834.2, 300 sec: 13884.7). Total num frames: 752902144. Throughput: 0: 3244.6. Samples: 177395542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:13,968][134211] Avg episode reward: [(0, '9.328')] [2025-01-04 11:25:13,990][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183815_752906240.pth... [2025-01-04 11:25:14,029][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183004_749584384.pth [2025-01-04 11:25:15,854][134294] Updated weights for policy 0, policy_version 183824 (0.0012) [2025-01-04 11:25:18,152][134294] Updated weights for policy 0, policy_version 183834 (0.0015) [2025-01-04 11:25:18,968][134211] Fps is (10 sec: 18022.4, 60 sec: 13380.3, 300 sec: 13968.1). Total num frames: 752992256. Throughput: 0: 3381.3. Samples: 177410194. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:18,968][134211] Avg episode reward: [(0, '8.684')] [2025-01-04 11:25:21,296][134294] Updated weights for policy 0, policy_version 183844 (0.0025) [2025-01-04 11:25:23,968][134211] Fps is (10 sec: 14745.3, 60 sec: 13243.7, 300 sec: 13968.1). Total num frames: 753049600. Throughput: 0: 3453.4. Samples: 177431648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:23,969][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 11:25:25,730][134294] Updated weights for policy 0, policy_version 183854 (0.0036) [2025-01-04 11:25:28,968][134211] Fps is (10 sec: 10239.7, 60 sec: 13038.9, 300 sec: 13926.4). Total num frames: 753094656. Throughput: 0: 3331.9. Samples: 177445952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:28,969][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 11:25:29,791][134294] Updated weights for policy 0, policy_version 183864 (0.0031) [2025-01-04 11:25:33,302][134294] Updated weights for policy 0, policy_version 183874 (0.0030) [2025-01-04 11:25:33,968][134211] Fps is (10 sec: 10649.8, 60 sec: 13107.2, 300 sec: 13898.6). Total num frames: 753156096. Throughput: 0: 3287.1. Samples: 177453840. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:33,968][134211] Avg episode reward: [(0, '9.062')] [2025-01-04 11:25:35,339][134294] Updated weights for policy 0, policy_version 183884 (0.0014) [2025-01-04 11:25:38,248][134294] Updated weights for policy 0, policy_version 183894 (0.0024) [2025-01-04 11:25:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13380.3, 300 sec: 13954.2). Total num frames: 753238016. Throughput: 0: 3376.1. Samples: 177477160. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:38,968][134211] Avg episode reward: [(0, '7.917')] [2025-01-04 11:25:41,317][134294] Updated weights for policy 0, policy_version 183904 (0.0026) [2025-01-04 11:25:43,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13380.3, 300 sec: 13954.2). Total num frames: 753303552. Throughput: 0: 3418.5. Samples: 177496866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:43,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 11:25:44,518][134294] Updated weights for policy 0, policy_version 183914 (0.0025) [2025-01-04 11:25:47,640][134294] Updated weights for policy 0, policy_version 183924 (0.0025) [2025-01-04 11:25:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13380.2, 300 sec: 13954.2). Total num frames: 753369088. Throughput: 0: 3427.1. Samples: 177506652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:25:48,969][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 11:25:50,698][134294] Updated weights for policy 0, policy_version 183934 (0.0025) [2025-01-04 11:25:53,610][134294] Updated weights for policy 0, policy_version 183944 (0.0025) [2025-01-04 11:25:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13448.5, 300 sec: 13912.5). Total num frames: 753434624. Throughput: 0: 3428.0. Samples: 177527102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:25:53,969][134211] Avg episode reward: [(0, '9.012')] [2025-01-04 11:25:56,931][134294] Updated weights for policy 0, policy_version 183954 (0.0026) [2025-01-04 11:25:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13516.8, 300 sec: 13759.8). Total num frames: 753496064. Throughput: 0: 3339.8. Samples: 177545836. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:25:58,969][134211] Avg episode reward: [(0, '9.426')] [2025-01-04 11:26:00,360][134294] Updated weights for policy 0, policy_version 183964 (0.0029) [2025-01-04 11:26:03,637][134294] Updated weights for policy 0, policy_version 183974 (0.0030) [2025-01-04 11:26:03,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13516.8, 300 sec: 13704.2). Total num frames: 753561600. Throughput: 0: 3210.3. Samples: 177554658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:03,968][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 11:26:06,596][134294] Updated weights for policy 0, policy_version 183984 (0.0026) [2025-01-04 11:26:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13585.0, 300 sec: 13704.2). Total num frames: 753627136. Throughput: 0: 3180.2. Samples: 177574756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:08,968][134211] Avg episode reward: [(0, '9.654')] [2025-01-04 11:26:09,786][134294] Updated weights for policy 0, policy_version 183994 (0.0025) [2025-01-04 11:26:13,069][134294] Updated weights for policy 0, policy_version 184004 (0.0024) [2025-01-04 11:26:13,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13106.9, 300 sec: 13704.2). Total num frames: 753688576. Throughput: 0: 3284.6. Samples: 177593764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:13,969][134211] Avg episode reward: [(0, '10.108')] [2025-01-04 11:26:16,198][134294] Updated weights for policy 0, policy_version 184014 (0.0026) [2025-01-04 11:26:18,684][134294] Updated weights for policy 0, policy_version 184024 (0.0015) [2025-01-04 11:26:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 12902.3, 300 sec: 13745.9). Total num frames: 753766400. Throughput: 0: 3327.3. Samples: 177603568. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:18,968][134211] Avg episode reward: [(0, '9.007')] [2025-01-04 11:26:20,759][134294] Updated weights for policy 0, policy_version 184034 (0.0015) [2025-01-04 11:26:23,908][134294] Updated weights for policy 0, policy_version 184044 (0.0025) [2025-01-04 11:26:23,968][134211] Fps is (10 sec: 15566.4, 60 sec: 13243.7, 300 sec: 13801.5). Total num frames: 753844224. Throughput: 0: 3374.5. Samples: 177629012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:23,968][134211] Avg episode reward: [(0, '9.088')] [2025-01-04 11:26:27,527][134294] Updated weights for policy 0, policy_version 184054 (0.0030) [2025-01-04 11:26:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13448.6, 300 sec: 13773.7). Total num frames: 753901568. Throughput: 0: 3323.1. Samples: 177646404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:28,968][134211] Avg episode reward: [(0, '10.240')] [2025-01-04 11:26:31,024][134294] Updated weights for policy 0, policy_version 184064 (0.0029) [2025-01-04 11:26:33,856][134294] Updated weights for policy 0, policy_version 184074 (0.0018) [2025-01-04 11:26:33,967][134211] Fps is (10 sec: 12288.4, 60 sec: 13516.8, 300 sec: 13759.8). Total num frames: 753967104. Throughput: 0: 3300.4. Samples: 177655170. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:33,968][134211] Avg episode reward: [(0, '9.268')] [2025-01-04 11:26:35,749][134294] Updated weights for policy 0, policy_version 184084 (0.0012) [2025-01-04 11:26:37,617][134294] Updated weights for policy 0, policy_version 184094 (0.0013) [2025-01-04 11:26:38,968][134211] Fps is (10 sec: 17613.1, 60 sec: 13994.7, 300 sec: 13912.5). Total num frames: 754077696. Throughput: 0: 3466.0. Samples: 177683070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:38,968][134211] Avg episode reward: [(0, '11.125')] [2025-01-04 11:26:38,969][134264] Saving new best policy, reward=11.125! 
[2025-01-04 11:26:39,546][134294] Updated weights for policy 0, policy_version 184104 (0.0013) [2025-01-04 11:26:41,436][134294] Updated weights for policy 0, policy_version 184114 (0.0013) [2025-01-04 11:26:43,347][134294] Updated weights for policy 0, policy_version 184124 (0.0016) [2025-01-04 11:26:43,968][134211] Fps is (10 sec: 21708.4, 60 sec: 14677.3, 300 sec: 14051.4). Total num frames: 754184192. Throughput: 0: 3773.4. Samples: 177715636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:43,968][134211] Avg episode reward: [(0, '9.581')] [2025-01-04 11:26:45,780][134294] Updated weights for policy 0, policy_version 184134 (0.0023) [2025-01-04 11:26:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14677.4, 300 sec: 14065.3). Total num frames: 754249728. Throughput: 0: 3837.9. Samples: 177727364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:48,968][134211] Avg episode reward: [(0, '9.667')] [2025-01-04 11:26:49,134][134294] Updated weights for policy 0, policy_version 184144 (0.0028) [2025-01-04 11:26:52,337][134294] Updated weights for policy 0, policy_version 184154 (0.0027) [2025-01-04 11:26:53,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14609.0, 300 sec: 14079.2). Total num frames: 754311168. Throughput: 0: 3811.1. Samples: 177746256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:26:53,969][134211] Avg episode reward: [(0, '9.684')] [2025-01-04 11:26:55,509][134294] Updated weights for policy 0, policy_version 184164 (0.0029) [2025-01-04 11:26:58,670][134294] Updated weights for policy 0, policy_version 184174 (0.0026) [2025-01-04 11:26:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14745.6, 300 sec: 14093.0). Total num frames: 754380800. Throughput: 0: 3826.2. Samples: 177765940. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:26:58,969][134211] Avg episode reward: [(0, '8.236')] [2025-01-04 11:27:02,125][134294] Updated weights for policy 0, policy_version 184184 (0.0025) [2025-01-04 11:27:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14609.1, 300 sec: 13940.3). Total num frames: 754438144. Throughput: 0: 3803.9. Samples: 177774744. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:03,968][134211] Avg episode reward: [(0, '9.178')] [2025-01-04 11:27:05,178][134294] Updated weights for policy 0, policy_version 184194 (0.0027) [2025-01-04 11:27:08,193][134294] Updated weights for policy 0, policy_version 184204 (0.0027) [2025-01-04 11:27:08,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14677.2, 300 sec: 13940.3). Total num frames: 754507776. Throughput: 0: 3680.5. Samples: 177794636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:08,969][134211] Avg episode reward: [(0, '9.099')] [2025-01-04 11:27:11,272][134294] Updated weights for policy 0, policy_version 184214 (0.0025) [2025-01-04 11:27:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14745.8, 300 sec: 13954.2). Total num frames: 754573312. Throughput: 0: 3738.5. Samples: 177814638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:13,969][134211] Avg episode reward: [(0, '9.155')] [2025-01-04 11:27:14,053][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000184223_754577408.pth... [2025-01-04 11:27:14,126][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183407_751235072.pth [2025-01-04 11:27:14,373][134294] Updated weights for policy 0, policy_version 184224 (0.0026) [2025-01-04 11:27:17,569][134294] Updated weights for policy 0, policy_version 184234 (0.0023) [2025-01-04 11:27:18,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14540.8, 300 sec: 13940.3). Total num frames: 754638848. 
Throughput: 0: 3760.1. Samples: 177824374. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:18,968][134211] Avg episode reward: [(0, '10.954')] [2025-01-04 11:27:20,630][134294] Updated weights for policy 0, policy_version 184244 (0.0026) [2025-01-04 11:27:23,536][134294] Updated weights for policy 0, policy_version 184254 (0.0024) [2025-01-04 11:27:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 13981.9). Total num frames: 754708480. Throughput: 0: 3593.4. Samples: 177844772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:23,968][134211] Avg episode reward: [(0, '10.090')] [2025-01-04 11:27:26,420][134294] Updated weights for policy 0, policy_version 184264 (0.0022) [2025-01-04 11:27:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14540.8, 300 sec: 13995.8). Total num frames: 754774016. Throughput: 0: 3318.6. Samples: 177864974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:28,968][134211] Avg episode reward: [(0, '10.211')] [2025-01-04 11:27:29,821][134294] Updated weights for policy 0, policy_version 184274 (0.0027) [2025-01-04 11:27:32,843][134294] Updated weights for policy 0, policy_version 184284 (0.0023) [2025-01-04 11:27:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.7, 300 sec: 13995.8). Total num frames: 754839552. Throughput: 0: 3264.9. Samples: 177874286. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:33,968][134211] Avg episode reward: [(0, '9.992')] [2025-01-04 11:27:35,883][134294] Updated weights for policy 0, policy_version 184294 (0.0023) [2025-01-04 11:27:38,835][134294] Updated weights for policy 0, policy_version 184304 (0.0023) [2025-01-04 11:27:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 13968.0). Total num frames: 754909184. Throughput: 0: 3301.9. Samples: 177894840. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:38,968][134211] Avg episode reward: [(0, '8.597')] [2025-01-04 11:27:41,845][134294] Updated weights for policy 0, policy_version 184314 (0.0020) [2025-01-04 11:27:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13243.7, 300 sec: 13829.2). Total num frames: 754978816. Throughput: 0: 3321.2. Samples: 177915392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:43,968][134211] Avg episode reward: [(0, '8.927')] [2025-01-04 11:27:44,806][134294] Updated weights for policy 0, policy_version 184324 (0.0025) [2025-01-04 11:27:47,872][134294] Updated weights for policy 0, policy_version 184334 (0.0023) [2025-01-04 11:27:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13243.8, 300 sec: 13787.6). Total num frames: 755044352. Throughput: 0: 3349.6. Samples: 177925476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:48,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 11:27:50,902][134294] Updated weights for policy 0, policy_version 184344 (0.0023) [2025-01-04 11:27:53,778][134294] Updated weights for policy 0, policy_version 184354 (0.0024) [2025-01-04 11:27:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13380.3, 300 sec: 13801.4). Total num frames: 755113984. Throughput: 0: 3363.1. Samples: 177945974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:53,969][134211] Avg episode reward: [(0, '9.065')] [2025-01-04 11:27:57,099][134294] Updated weights for policy 0, policy_version 184364 (0.0030) [2025-01-04 11:27:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13243.8, 300 sec: 13801.5). Total num frames: 755175424. Throughput: 0: 3347.9. Samples: 177965292. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:27:58,968][134211] Avg episode reward: [(0, '9.200')] [2025-01-04 11:28:00,391][134294] Updated weights for policy 0, policy_version 184374 (0.0026) [2025-01-04 11:28:03,781][134294] Updated weights for policy 0, policy_version 184384 (0.0023) [2025-01-04 11:28:03,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13312.0, 300 sec: 13787.6). Total num frames: 755236864. Throughput: 0: 3336.9. Samples: 177974534. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:28:03,968][134211] Avg episode reward: [(0, '8.197')] [2025-01-04 11:28:06,944][134294] Updated weights for policy 0, policy_version 184394 (0.0026) [2025-01-04 11:28:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13243.8, 300 sec: 13787.6). Total num frames: 755302400. Throughput: 0: 3297.7. Samples: 177993168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:08,968][134211] Avg episode reward: [(0, '8.963')] [2025-01-04 11:28:10,349][134294] Updated weights for policy 0, policy_version 184404 (0.0025) [2025-01-04 11:28:12,846][134294] Updated weights for policy 0, policy_version 184414 (0.0016) [2025-01-04 11:28:13,967][134211] Fps is (10 sec: 14336.2, 60 sec: 13448.6, 300 sec: 13829.2). Total num frames: 755380224. Throughput: 0: 3332.5. Samples: 178014936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:13,968][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 11:28:14,900][134294] Updated weights for policy 0, policy_version 184424 (0.0013) [2025-01-04 11:28:17,079][134294] Updated weights for policy 0, policy_version 184434 (0.0013) [2025-01-04 11:28:18,968][134211] Fps is (10 sec: 16793.5, 60 sec: 13858.2, 300 sec: 13898.6). Total num frames: 755470336. Throughput: 0: 3447.6. Samples: 178029426. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:18,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 11:28:19,793][134294] Updated weights for policy 0, policy_version 184444 (0.0023) [2025-01-04 11:28:23,514][134294] Updated weights for policy 0, policy_version 184454 (0.0028) [2025-01-04 11:28:23,968][134211] Fps is (10 sec: 14745.2, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 755527680. Throughput: 0: 3459.8. Samples: 178050532. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:23,969][134211] Avg episode reward: [(0, '8.472')] [2025-01-04 11:28:27,126][134294] Updated weights for policy 0, policy_version 184464 (0.0027) [2025-01-04 11:28:28,968][134211] Fps is (10 sec: 11059.1, 60 sec: 13448.5, 300 sec: 13829.2). Total num frames: 755580928. Throughput: 0: 3372.6. Samples: 178067160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:28,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 11:28:30,753][134294] Updated weights for policy 0, policy_version 184474 (0.0028) [2025-01-04 11:28:33,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13380.2, 300 sec: 13815.3). Total num frames: 755642368. Throughput: 0: 3342.8. Samples: 178075902. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:33,969][134211] Avg episode reward: [(0, '8.875')] [2025-01-04 11:28:34,247][134294] Updated weights for policy 0, policy_version 184484 (0.0029) [2025-01-04 11:28:37,677][134294] Updated weights for policy 0, policy_version 184494 (0.0024) [2025-01-04 11:28:38,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13175.5, 300 sec: 13773.7). Total num frames: 755699712. Throughput: 0: 3280.3. Samples: 178093588. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:38,968][134211] Avg episode reward: [(0, '9.640')] [2025-01-04 11:28:40,258][134294] Updated weights for policy 0, policy_version 184504 (0.0016) [2025-01-04 11:28:42,160][134294] Updated weights for policy 0, policy_version 184514 (0.0013) [2025-01-04 11:28:43,938][134294] Updated weights for policy 0, policy_version 184524 (0.0011) [2025-01-04 11:28:43,968][134211] Fps is (10 sec: 16794.0, 60 sec: 13858.2, 300 sec: 13926.4). Total num frames: 755810304. Throughput: 0: 3469.6. Samples: 178121426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:43,968][134211] Avg episode reward: [(0, '8.897')] [2025-01-04 11:28:45,894][134294] Updated weights for policy 0, policy_version 184534 (0.0013) [2025-01-04 11:28:47,795][134294] Updated weights for policy 0, policy_version 184544 (0.0013) [2025-01-04 11:28:48,968][134211] Fps is (10 sec: 20889.7, 60 sec: 14404.3, 300 sec: 14023.6). Total num frames: 755908608. Throughput: 0: 3628.0. Samples: 178137794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:48,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 11:28:50,650][134294] Updated weights for policy 0, policy_version 184554 (0.0024) [2025-01-04 11:28:53,770][134294] Updated weights for policy 0, policy_version 184564 (0.0027) [2025-01-04 11:28:53,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14336.0, 300 sec: 14023.6). Total num frames: 755974144. Throughput: 0: 3742.7. Samples: 178161590. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:53,968][134211] Avg episode reward: [(0, '8.879')] [2025-01-04 11:28:56,970][134294] Updated weights for policy 0, policy_version 184574 (0.0027) [2025-01-04 11:28:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 13926.4). Total num frames: 756035584. Throughput: 0: 3683.8. Samples: 178180706. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:28:58,968][134211] Avg episode reward: [(0, '10.071')] [2025-01-04 11:29:00,355][134294] Updated weights for policy 0, policy_version 184584 (0.0026) [2025-01-04 11:29:03,868][134294] Updated weights for policy 0, policy_version 184594 (0.0025) [2025-01-04 11:29:03,968][134211] Fps is (10 sec: 12287.5, 60 sec: 14335.9, 300 sec: 13759.8). Total num frames: 756097024. Throughput: 0: 3559.7. Samples: 178189614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:29:03,969][134211] Avg episode reward: [(0, '9.101')] [2025-01-04 11:29:07,264][134294] Updated weights for policy 0, policy_version 184604 (0.0024) [2025-01-04 11:29:08,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14199.4, 300 sec: 13634.8). Total num frames: 756154368. Throughput: 0: 3488.5. Samples: 178207514. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:29:08,968][134211] Avg episode reward: [(0, '9.735')] [2025-01-04 11:29:10,656][134294] Updated weights for policy 0, policy_version 184614 (0.0024) [2025-01-04 11:29:13,840][134294] Updated weights for policy 0, policy_version 184624 (0.0023) [2025-01-04 11:29:13,968][134211] Fps is (10 sec: 12288.4, 60 sec: 13994.6, 300 sec: 13662.6). Total num frames: 756219904. Throughput: 0: 3533.7. Samples: 178226176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:13,969][134211] Avg episode reward: [(0, '9.749')] [2025-01-04 11:29:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000184624_756219904.pth... [2025-01-04 11:29:14,045][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000183815_752906240.pth [2025-01-04 11:29:16,853][134294] Updated weights for policy 0, policy_version 184634 (0.0025) [2025-01-04 11:29:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 13662.6). Total num frames: 756285440. 
Throughput: 0: 3559.4. Samples: 178236074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:18,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 11:29:20,098][134294] Updated weights for policy 0, policy_version 184644 (0.0026) [2025-01-04 11:29:23,053][134294] Updated weights for policy 0, policy_version 184654 (0.0025) [2025-01-04 11:29:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13721.6, 300 sec: 13690.4). Total num frames: 756350976. Throughput: 0: 3605.9. Samples: 178255852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:23,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 11:29:26,208][134294] Updated weights for policy 0, policy_version 184664 (0.0024) [2025-01-04 11:29:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 13718.1). Total num frames: 756416512. Throughput: 0: 3420.5. Samples: 178275350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:28,969][134211] Avg episode reward: [(0, '9.835')] [2025-01-04 11:29:29,490][134294] Updated weights for policy 0, policy_version 184674 (0.0027) [2025-01-04 11:29:32,802][134294] Updated weights for policy 0, policy_version 184684 (0.0025) [2025-01-04 11:29:33,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13926.5, 300 sec: 13704.3). Total num frames: 756477952. Throughput: 0: 3263.8. Samples: 178284664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:33,968][134211] Avg episode reward: [(0, '10.058')] [2025-01-04 11:29:36,016][134294] Updated weights for policy 0, policy_version 184694 (0.0025) [2025-01-04 11:29:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14062.9, 300 sec: 13704.2). Total num frames: 756543488. Throughput: 0: 3159.6. Samples: 178303774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:38,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 11:29:39,081][134294] Updated weights for policy 0, policy_version 184704 (0.0027) [2025-01-04 11:29:42,087][134294] Updated weights for policy 0, policy_version 184714 (0.0025) [2025-01-04 11:29:43,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13380.1, 300 sec: 13718.1). Total num frames: 756613120. Throughput: 0: 3183.6. Samples: 178323968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:43,969][134211] Avg episode reward: [(0, '10.249')] [2025-01-04 11:29:45,092][134294] Updated weights for policy 0, policy_version 184724 (0.0023) [2025-01-04 11:29:48,184][134294] Updated weights for policy 0, policy_version 184734 (0.0029) [2025-01-04 11:29:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 12834.1, 300 sec: 13732.0). Total num frames: 756678656. Throughput: 0: 3216.6. Samples: 178334360. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:48,968][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 11:29:51,146][134294] Updated weights for policy 0, policy_version 184744 (0.0024) [2025-01-04 11:29:53,968][134211] Fps is (10 sec: 13517.4, 60 sec: 12902.4, 300 sec: 13773.7). Total num frames: 756748288. Throughput: 0: 3270.3. Samples: 178354678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:53,968][134211] Avg episode reward: [(0, '9.916')] [2025-01-04 11:29:54,187][134294] Updated weights for policy 0, policy_version 184754 (0.0023) [2025-01-04 11:29:57,411][134294] Updated weights for policy 0, policy_version 184764 (0.0022) [2025-01-04 11:29:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12902.4, 300 sec: 13759.8). Total num frames: 756809728. Throughput: 0: 3280.8. Samples: 178373810. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:29:58,968][134211] Avg episode reward: [(0, '8.837')] [2025-01-04 11:30:00,702][134294] Updated weights for policy 0, policy_version 184774 (0.0025) [2025-01-04 11:30:03,847][134294] Updated weights for policy 0, policy_version 184784 (0.0026) [2025-01-04 11:30:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12970.8, 300 sec: 13773.7). Total num frames: 756875264. Throughput: 0: 3276.5. Samples: 178383516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:30:03,968][134211] Avg episode reward: [(0, '10.178')] [2025-01-04 11:30:06,168][134294] Updated weights for policy 0, policy_version 184794 (0.0017) [2025-01-04 11:30:07,992][134294] Updated weights for policy 0, policy_version 184804 (0.0013) [2025-01-04 11:30:08,968][134211] Fps is (10 sec: 16793.3, 60 sec: 13721.6, 300 sec: 13815.3). Total num frames: 756977664. Throughput: 0: 3382.5. Samples: 178408064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:30:08,968][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 11:30:09,907][134294] Updated weights for policy 0, policy_version 184814 (0.0015) [2025-01-04 11:30:11,905][134294] Updated weights for policy 0, policy_version 184824 (0.0015) [2025-01-04 11:30:13,968][134211] Fps is (10 sec: 18841.6, 60 sec: 14063.0, 300 sec: 13801.4). Total num frames: 757063680. Throughput: 0: 3604.0. Samples: 178437530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:30:13,968][134211] Avg episode reward: [(0, '9.075')] [2025-01-04 11:30:15,029][134294] Updated weights for policy 0, policy_version 184834 (0.0027) [2025-01-04 11:30:18,438][134294] Updated weights for policy 0, policy_version 184844 (0.0033) [2025-01-04 11:30:18,968][134211] Fps is (10 sec: 14746.0, 60 sec: 13994.7, 300 sec: 13815.3). Total num frames: 757125120. Throughput: 0: 3599.3. Samples: 178446634. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:18,968][134211] Avg episode reward: [(0, '9.921')] [2025-01-04 11:30:21,640][134294] Updated weights for policy 0, policy_version 184854 (0.0026) [2025-01-04 11:30:23,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.7, 300 sec: 13884.7). Total num frames: 757190656. Throughput: 0: 3592.8. Samples: 178465452. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:23,969][134211] Avg episode reward: [(0, '10.201')] [2025-01-04 11:30:24,834][134294] Updated weights for policy 0, policy_version 184864 (0.0029) [2025-01-04 11:30:28,498][134294] Updated weights for policy 0, policy_version 184874 (0.0026) [2025-01-04 11:30:28,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13858.2, 300 sec: 13870.9). Total num frames: 757248000. Throughput: 0: 3536.9. Samples: 178483128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:28,968][134211] Avg episode reward: [(0, '8.999')] [2025-01-04 11:30:32,045][134294] Updated weights for policy 0, policy_version 184884 (0.0026) [2025-01-04 11:30:33,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13789.8, 300 sec: 13787.5). Total num frames: 757305344. Throughput: 0: 3499.8. Samples: 178491852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:33,969][134211] Avg episode reward: [(0, '9.956')] [2025-01-04 11:30:35,470][134294] Updated weights for policy 0, policy_version 184894 (0.0027) [2025-01-04 11:30:37,504][134294] Updated weights for policy 0, policy_version 184904 (0.0013) [2025-01-04 11:30:38,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14199.5, 300 sec: 13870.9). Total num frames: 757395456. Throughput: 0: 3524.2. Samples: 178513264. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:38,968][134211] Avg episode reward: [(0, '10.125')] [2025-01-04 11:30:39,454][134294] Updated weights for policy 0, policy_version 184914 (0.0012) [2025-01-04 11:30:41,428][134294] Updated weights for policy 0, policy_version 184924 (0.0014) [2025-01-04 11:30:43,414][134294] Updated weights for policy 0, policy_version 184934 (0.0014) [2025-01-04 11:30:43,968][134211] Fps is (10 sec: 18841.9, 60 sec: 14677.5, 300 sec: 13981.9). Total num frames: 757493760. Throughput: 0: 3793.2. Samples: 178544502. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:43,968][134211] Avg episode reward: [(0, '9.464')] [2025-01-04 11:30:46,597][134294] Updated weights for policy 0, policy_version 184944 (0.0028) [2025-01-04 11:30:48,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14609.1, 300 sec: 13968.1). Total num frames: 757555200. Throughput: 0: 3802.4. Samples: 178554626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:48,968][134211] Avg episode reward: [(0, '9.805')] [2025-01-04 11:30:49,993][134294] Updated weights for policy 0, policy_version 184954 (0.0029) [2025-01-04 11:30:53,290][134294] Updated weights for policy 0, policy_version 184964 (0.0027) [2025-01-04 11:30:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 13982.0). Total num frames: 757620736. Throughput: 0: 3663.6. Samples: 178572926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:53,968][134211] Avg episode reward: [(0, '9.824')] [2025-01-04 11:30:56,361][134294] Updated weights for policy 0, policy_version 184974 (0.0025) [2025-01-04 11:30:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14540.8, 300 sec: 13968.1). Total num frames: 757682176. Throughput: 0: 3435.2. Samples: 178592112. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:30:58,968][134211] Avg episode reward: [(0, '8.470')] [2025-01-04 11:30:59,638][134294] Updated weights for policy 0, policy_version 184984 (0.0026) [2025-01-04 11:31:03,125][134294] Updated weights for policy 0, policy_version 184994 (0.0025) [2025-01-04 11:31:03,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.5, 300 sec: 13954.2). Total num frames: 757743616. Throughput: 0: 3432.8. Samples: 178601112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:31:03,970][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 11:31:06,127][134294] Updated weights for policy 0, policy_version 185004 (0.0027) [2025-01-04 11:31:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.2, 300 sec: 13968.1). Total num frames: 757809152. Throughput: 0: 3447.1. Samples: 178620570. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:31:08,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 11:31:09,461][134294] Updated weights for policy 0, policy_version 185014 (0.0023) [2025-01-04 11:31:12,592][134294] Updated weights for policy 0, policy_version 185024 (0.0025) [2025-01-04 11:31:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13516.8, 300 sec: 13926.4). Total num frames: 757874688. Throughput: 0: 3477.4. Samples: 178639614. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:31:13,968][134211] Avg episode reward: [(0, '9.573')] [2025-01-04 11:31:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185028_757874688.pth... [2025-01-04 11:31:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000184223_754577408.pth [2025-01-04 11:31:15,935][134294] Updated weights for policy 0, policy_version 185034 (0.0025) [2025-01-04 11:31:18,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13516.8, 300 sec: 13870.9). Total num frames: 757936128. 
Throughput: 0: 3493.9. Samples: 178649078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:31:18,969][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 11:31:18,995][134294] Updated weights for policy 0, policy_version 185044 (0.0024) [2025-01-04 11:31:22,115][134294] Updated weights for policy 0, policy_version 185054 (0.0025) [2025-01-04 11:31:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13585.1, 300 sec: 13912.5). Total num frames: 758005760. Throughput: 0: 3455.7. Samples: 178668770. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:23,968][134211] Avg episode reward: [(0, '8.179')] [2025-01-04 11:31:25,170][134294] Updated weights for policy 0, policy_version 185064 (0.0025) [2025-01-04 11:31:28,267][134294] Updated weights for policy 0, policy_version 185074 (0.0025) [2025-01-04 11:31:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.6, 300 sec: 13912.5). Total num frames: 758071296. Throughput: 0: 3207.6. Samples: 178688846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:28,968][134211] Avg episode reward: [(0, '9.812')] [2025-01-04 11:31:31,600][134294] Updated weights for policy 0, policy_version 185084 (0.0026) [2025-01-04 11:31:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13789.9, 300 sec: 13745.9). Total num frames: 758132736. Throughput: 0: 3189.6. Samples: 178698158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:33,968][134211] Avg episode reward: [(0, '9.693')] [2025-01-04 11:31:34,380][134294] Updated weights for policy 0, policy_version 185094 (0.0022) [2025-01-04 11:31:36,617][134294] Updated weights for policy 0, policy_version 185104 (0.0017) [2025-01-04 11:31:38,970][134211] Fps is (10 sec: 14332.3, 60 sec: 13652.7, 300 sec: 13662.5). Total num frames: 758214656. Throughput: 0: 3305.0. Samples: 178721658. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:38,971][134211] Avg episode reward: [(0, '9.395')] [2025-01-04 11:31:39,624][134294] Updated weights for policy 0, policy_version 185114 (0.0025) [2025-01-04 11:31:42,776][134294] Updated weights for policy 0, policy_version 185124 (0.0024) [2025-01-04 11:31:43,970][134211] Fps is (10 sec: 14742.7, 60 sec: 13106.8, 300 sec: 13662.5). Total num frames: 758280192. Throughput: 0: 3317.7. Samples: 178741414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:43,970][134211] Avg episode reward: [(0, '9.318')] [2025-01-04 11:31:45,814][134294] Updated weights for policy 0, policy_version 185134 (0.0025) [2025-01-04 11:31:48,935][134294] Updated weights for policy 0, policy_version 185144 (0.0026) [2025-01-04 11:31:48,968][134211] Fps is (10 sec: 13520.3, 60 sec: 13243.7, 300 sec: 13690.4). Total num frames: 758349824. Throughput: 0: 3345.9. Samples: 178751676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:48,968][134211] Avg episode reward: [(0, '10.441')] [2025-01-04 11:31:51,999][134294] Updated weights for policy 0, policy_version 185154 (0.0023) [2025-01-04 11:31:53,970][134211] Fps is (10 sec: 13107.2, 60 sec: 13175.0, 300 sec: 13662.5). Total num frames: 758411264. Throughput: 0: 3354.8. Samples: 178771544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:53,970][134211] Avg episode reward: [(0, '8.137')] [2025-01-04 11:31:55,225][134294] Updated weights for policy 0, policy_version 185164 (0.0024) [2025-01-04 11:31:57,153][134294] Updated weights for policy 0, policy_version 185174 (0.0014) [2025-01-04 11:31:58,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13721.6, 300 sec: 13787.6). Total num frames: 758505472. Throughput: 0: 3485.5. Samples: 178796460. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:31:58,968][134211] Avg episode reward: [(0, '9.390')] [2025-01-04 11:31:59,166][134294] Updated weights for policy 0, policy_version 185184 (0.0013) [2025-01-04 11:32:01,188][134294] Updated weights for policy 0, policy_version 185194 (0.0013) [2025-01-04 11:32:03,508][134294] Updated weights for policy 0, policy_version 185204 (0.0017) [2025-01-04 11:32:03,968][134211] Fps is (10 sec: 18845.0, 60 sec: 14267.7, 300 sec: 13870.9). Total num frames: 758599680. Throughput: 0: 3613.1. Samples: 178811668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:32:03,968][134211] Avg episode reward: [(0, '9.997')] [2025-01-04 11:32:06,855][134294] Updated weights for policy 0, policy_version 185214 (0.0028) [2025-01-04 11:32:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14131.2, 300 sec: 13843.1). Total num frames: 758657024. Throughput: 0: 3656.3. Samples: 178833302. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:32:08,968][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 11:32:10,477][134294] Updated weights for policy 0, policy_version 185224 (0.0027) [2025-01-04 11:32:13,580][134294] Updated weights for policy 0, policy_version 185234 (0.0026) [2025-01-04 11:32:13,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14131.2, 300 sec: 13843.1). Total num frames: 758722560. Throughput: 0: 3613.6. Samples: 178851456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:32:13,968][134211] Avg episode reward: [(0, '10.163')] [2025-01-04 11:32:16,637][134294] Updated weights for policy 0, policy_version 185244 (0.0027) [2025-01-04 11:32:18,968][134211] Fps is (10 sec: 12697.0, 60 sec: 14131.1, 300 sec: 13815.3). Total num frames: 758784000. Throughput: 0: 3628.9. Samples: 178861462. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:32:18,969][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 11:32:19,987][134294] Updated weights for policy 0, policy_version 185254 (0.0027) [2025-01-04 11:32:22,973][134294] Updated weights for policy 0, policy_version 185264 (0.0023) [2025-01-04 11:32:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14131.2, 300 sec: 13829.2). Total num frames: 758853632. Throughput: 0: 3532.5. Samples: 178880610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:32:23,968][134211] Avg episode reward: [(0, '9.105')] [2025-01-04 11:32:26,042][134294] Updated weights for policy 0, policy_version 185274 (0.0025) [2025-01-04 11:32:28,968][134211] Fps is (10 sec: 13108.0, 60 sec: 14063.0, 300 sec: 13815.3). Total num frames: 758915072. Throughput: 0: 3523.0. Samples: 178899944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:28,968][134211] Avg episode reward: [(0, '8.276')] [2025-01-04 11:32:29,583][134294] Updated weights for policy 0, policy_version 185284 (0.0030) [2025-01-04 11:32:33,101][134294] Updated weights for policy 0, policy_version 185294 (0.0027) [2025-01-04 11:32:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13994.6, 300 sec: 13773.7). Total num frames: 758972416. Throughput: 0: 3486.3. Samples: 178908560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:33,968][134211] Avg episode reward: [(0, '8.393')] [2025-01-04 11:32:36,397][134294] Updated weights for policy 0, policy_version 185304 (0.0026) [2025-01-04 11:32:38,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13653.9, 300 sec: 13745.9). Total num frames: 759033856. Throughput: 0: 3449.1. Samples: 178926746. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:38,968][134211] Avg episode reward: [(0, '10.231')] [2025-01-04 11:32:39,645][134294] Updated weights for policy 0, policy_version 185314 (0.0024) [2025-01-04 11:32:42,586][134294] Updated weights for policy 0, policy_version 185324 (0.0025) [2025-01-04 11:32:43,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13722.0, 300 sec: 13759.8). Total num frames: 759103488. Throughput: 0: 3341.7. Samples: 178946838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:43,968][134211] Avg episode reward: [(0, '9.181')] [2025-01-04 11:32:45,580][134294] Updated weights for policy 0, policy_version 185334 (0.0026) [2025-01-04 11:32:48,087][134294] Updated weights for policy 0, policy_version 185344 (0.0019) [2025-01-04 11:32:48,968][134211] Fps is (10 sec: 15154.7, 60 sec: 13926.3, 300 sec: 13801.4). Total num frames: 759185408. Throughput: 0: 3239.5. Samples: 178957448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:48,969][134211] Avg episode reward: [(0, '9.726')] [2025-01-04 11:32:50,478][134294] Updated weights for policy 0, policy_version 185354 (0.0018) [2025-01-04 11:32:53,371][134294] Updated weights for policy 0, policy_version 185364 (0.0024) [2025-01-04 11:32:53,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14063.4, 300 sec: 13829.2). Total num frames: 759255040. Throughput: 0: 3299.9. Samples: 178981798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:53,968][134211] Avg episode reward: [(0, '8.336')] [2025-01-04 11:32:56,385][134294] Updated weights for policy 0, policy_version 185374 (0.0028) [2025-01-04 11:32:58,968][134211] Fps is (10 sec: 13926.9, 60 sec: 13653.3, 300 sec: 13857.0). Total num frames: 759324672. Throughput: 0: 3348.5. Samples: 179002138. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:32:58,968][134211] Avg episode reward: [(0, '8.842')] [2025-01-04 11:32:59,548][134294] Updated weights for policy 0, policy_version 185384 (0.0024) [2025-01-04 11:33:02,759][134294] Updated weights for policy 0, policy_version 185394 (0.0025) [2025-01-04 11:33:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13843.1). Total num frames: 759386112. Throughput: 0: 3335.3. Samples: 179011548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:03,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 11:33:05,740][134294] Updated weights for policy 0, policy_version 185404 (0.0028) [2025-01-04 11:33:08,708][134294] Updated weights for policy 0, policy_version 185414 (0.0026) [2025-01-04 11:33:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13312.0, 300 sec: 13815.3). Total num frames: 759455744. Throughput: 0: 3361.8. Samples: 179031890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:08,968][134211] Avg episode reward: [(0, '8.837')] [2025-01-04 11:33:11,948][134294] Updated weights for policy 0, policy_version 185424 (0.0026) [2025-01-04 11:33:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13448.5, 300 sec: 13759.8). Total num frames: 759529472. Throughput: 0: 3377.3. Samples: 179051924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:13,968][134211] Avg episode reward: [(0, '8.517')] [2025-01-04 11:33:14,010][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185433_759533568.pth... 
[2025-01-04 11:33:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000184624_756219904.pth [2025-01-04 11:33:14,271][134294] Updated weights for policy 0, policy_version 185434 (0.0014) [2025-01-04 11:33:16,301][134294] Updated weights for policy 0, policy_version 185444 (0.0014) [2025-01-04 11:33:18,402][134294] Updated weights for policy 0, policy_version 185454 (0.0012) [2025-01-04 11:33:18,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14063.1, 300 sec: 13898.6). Total num frames: 759627776. Throughput: 0: 3524.0. Samples: 179067138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:18,968][134211] Avg episode reward: [(0, '9.668')] [2025-01-04 11:33:21,060][134294] Updated weights for policy 0, policy_version 185464 (0.0022) [2025-01-04 11:33:23,968][134211] Fps is (10 sec: 16383.9, 60 sec: 13994.7, 300 sec: 13940.3). Total num frames: 759693312. Throughput: 0: 3660.6. Samples: 179091474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:23,969][134211] Avg episode reward: [(0, '8.291')] [2025-01-04 11:33:24,745][134294] Updated weights for policy 0, policy_version 185474 (0.0026) [2025-01-04 11:33:28,400][134294] Updated weights for policy 0, policy_version 185484 (0.0027) [2025-01-04 11:33:28,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13858.1, 300 sec: 13912.5). Total num frames: 759746560. Throughput: 0: 3589.0. Samples: 179108344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:28,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 11:33:31,889][134294] Updated weights for policy 0, policy_version 185494 (0.0025) [2025-01-04 11:33:33,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13926.4, 300 sec: 13926.4). Total num frames: 759808000. Throughput: 0: 3546.5. Samples: 179117040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:33:33,969][134211] Avg episode reward: [(0, '8.713')] [2025-01-04 11:33:35,494][134294] Updated weights for policy 0, policy_version 185504 (0.0027) [2025-01-04 11:33:38,769][134294] Updated weights for policy 0, policy_version 185514 (0.0027) [2025-01-04 11:33:38,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13858.1, 300 sec: 13745.9). Total num frames: 759865344. Throughput: 0: 3393.2. Samples: 179134492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:33:38,968][134211] Avg episode reward: [(0, '9.250')] [2025-01-04 11:33:42,087][134294] Updated weights for policy 0, policy_version 185524 (0.0027) [2025-01-04 11:33:43,968][134211] Fps is (10 sec: 11878.3, 60 sec: 13721.6, 300 sec: 13620.9). Total num frames: 759926784. Throughput: 0: 3352.6. Samples: 179153006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:33:43,969][134211] Avg episode reward: [(0, '8.055')] [2025-01-04 11:33:45,391][134294] Updated weights for policy 0, policy_version 185534 (0.0027) [2025-01-04 11:33:48,681][134294] Updated weights for policy 0, policy_version 185544 (0.0029) [2025-01-04 11:33:48,971][134211] Fps is (10 sec: 12284.3, 60 sec: 13379.7, 300 sec: 13606.9). Total num frames: 759988224. Throughput: 0: 3355.4. Samples: 179162550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:33:48,971][134211] Avg episode reward: [(0, '8.365')] [2025-01-04 11:33:51,895][134294] Updated weights for policy 0, policy_version 185554 (0.0027) [2025-01-04 11:33:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13312.0, 300 sec: 13620.9). Total num frames: 760053760. Throughput: 0: 3318.3. Samples: 179181216. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:33:53,968][134211] Avg episode reward: [(0, '9.657')] [2025-01-04 11:33:55,230][134294] Updated weights for policy 0, policy_version 185564 (0.0024) [2025-01-04 11:33:57,838][134294] Updated weights for policy 0, policy_version 185574 (0.0018) [2025-01-04 11:33:58,968][134211] Fps is (10 sec: 14340.5, 60 sec: 13448.6, 300 sec: 13676.5). Total num frames: 760131584. Throughput: 0: 3348.1. Samples: 179202588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:33:58,968][134211] Avg episode reward: [(0, '9.772')] [2025-01-04 11:33:59,985][134294] Updated weights for policy 0, policy_version 185584 (0.0013) [2025-01-04 11:34:02,129][134294] Updated weights for policy 0, policy_version 185594 (0.0014) [2025-01-04 11:34:03,968][134211] Fps is (10 sec: 16793.8, 60 sec: 13926.4, 300 sec: 13787.6). Total num frames: 760221696. Throughput: 0: 3330.0. Samples: 179216990. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:03,968][134211] Avg episode reward: [(0, '9.036')] [2025-01-04 11:34:04,600][134294] Updated weights for policy 0, policy_version 185604 (0.0018) [2025-01-04 11:34:08,028][134294] Updated weights for policy 0, policy_version 185614 (0.0027) [2025-01-04 11:34:08,968][134211] Fps is (10 sec: 15155.0, 60 sec: 13789.8, 300 sec: 13773.7). Total num frames: 760283136. Throughput: 0: 3286.2. Samples: 179239354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:08,968][134211] Avg episode reward: [(0, '9.058')] [2025-01-04 11:34:11,493][134294] Updated weights for policy 0, policy_version 185624 (0.0030) [2025-01-04 11:34:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13516.8, 300 sec: 13745.9). Total num frames: 760340480. Throughput: 0: 3295.5. Samples: 179256642. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:13,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 11:34:15,170][134294] Updated weights for policy 0, policy_version 185634 (0.0030) [2025-01-04 11:34:18,250][134294] Updated weights for policy 0, policy_version 185644 (0.0025) [2025-01-04 11:34:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12970.6, 300 sec: 13745.9). Total num frames: 760406016. Throughput: 0: 3300.7. Samples: 179265570. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:18,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 11:34:21,332][134294] Updated weights for policy 0, policy_version 185654 (0.0026) [2025-01-04 11:34:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12970.7, 300 sec: 13745.9). Total num frames: 760471552. Throughput: 0: 3354.4. Samples: 179285440. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:23,968][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 11:34:24,522][134294] Updated weights for policy 0, policy_version 185664 (0.0029) [2025-01-04 11:34:27,751][134294] Updated weights for policy 0, policy_version 185674 (0.0026) [2025-01-04 11:34:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 13745.9). Total num frames: 760532992. Throughput: 0: 3369.0. Samples: 179304612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:28,968][134211] Avg episode reward: [(0, '9.306')] [2025-01-04 11:34:31,147][134294] Updated weights for policy 0, policy_version 185684 (0.0025) [2025-01-04 11:34:33,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13038.9, 300 sec: 13718.1). Total num frames: 760590336. Throughput: 0: 3355.5. Samples: 179313536. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:33,969][134211] Avg episode reward: [(0, '10.177')] [2025-01-04 11:34:34,654][134294] Updated weights for policy 0, policy_version 185694 (0.0023) [2025-01-04 11:34:36,843][134294] Updated weights for policy 0, policy_version 185704 (0.0014) [2025-01-04 11:34:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 13759.8). Total num frames: 760672256. Throughput: 0: 3425.7. Samples: 179335374. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:34:38,969][134211] Avg episode reward: [(0, '9.550')] [2025-01-04 11:34:39,889][134294] Updated weights for policy 0, policy_version 185714 (0.0026) [2025-01-04 11:34:42,878][134294] Updated weights for policy 0, policy_version 185724 (0.0028) [2025-01-04 11:34:43,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13516.8, 300 sec: 13759.8). Total num frames: 760737792. Throughput: 0: 3395.1. Samples: 179355368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:34:43,969][134211] Avg episode reward: [(0, '9.238')] [2025-01-04 11:34:45,843][134294] Updated weights for policy 0, policy_version 185734 (0.0025) [2025-01-04 11:34:48,872][134294] Updated weights for policy 0, policy_version 185744 (0.0024) [2025-01-04 11:34:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13654.0, 300 sec: 13759.8). Total num frames: 760807424. Throughput: 0: 3308.6. Samples: 179365876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:34:48,969][134211] Avg episode reward: [(0, '9.819')] [2025-01-04 11:34:51,849][134294] Updated weights for policy 0, policy_version 185754 (0.0025) [2025-01-04 11:34:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13653.4, 300 sec: 13773.7). Total num frames: 760872960. Throughput: 0: 3265.2. Samples: 179386286. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:34:53,968][134211] Avg episode reward: [(0, '9.636')] [2025-01-04 11:34:55,009][134294] Updated weights for policy 0, policy_version 185764 (0.0024) [2025-01-04 11:34:58,046][134294] Updated weights for policy 0, policy_version 185774 (0.0026) [2025-01-04 11:34:58,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13448.5, 300 sec: 13773.7). Total num frames: 760938496. Throughput: 0: 3326.4. Samples: 179406332. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:34:58,969][134211] Avg episode reward: [(0, '8.702')] [2025-01-04 11:35:01,129][134294] Updated weights for policy 0, policy_version 185784 (0.0027) [2025-01-04 11:35:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13038.9, 300 sec: 13648.7). Total num frames: 761004032. Throughput: 0: 3341.2. Samples: 179415924. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:03,968][134211] Avg episode reward: [(0, '9.223')] [2025-01-04 11:35:04,475][134294] Updated weights for policy 0, policy_version 185794 (0.0025) [2025-01-04 11:35:07,439][134294] Updated weights for policy 0, policy_version 185804 (0.0027) [2025-01-04 11:35:08,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13175.5, 300 sec: 13593.2). Total num frames: 761073664. Throughput: 0: 3337.8. Samples: 179435640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:08,968][134211] Avg episode reward: [(0, '10.129')] [2025-01-04 11:35:09,996][134294] Updated weights for policy 0, policy_version 185814 (0.0018) [2025-01-04 11:35:11,930][134294] Updated weights for policy 0, policy_version 185824 (0.0013) [2025-01-04 11:35:13,762][134294] Updated weights for policy 0, policy_version 185834 (0.0014) [2025-01-04 11:35:13,968][134211] Fps is (10 sec: 17613.2, 60 sec: 13994.7, 300 sec: 13745.9). Total num frames: 761180160. Throughput: 0: 3542.7. Samples: 179464032. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:13,968][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 11:35:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185835_761180160.pth... [2025-01-04 11:35:14,022][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185028_757874688.pth [2025-01-04 11:35:15,679][134294] Updated weights for policy 0, policy_version 185844 (0.0014) [2025-01-04 11:35:17,899][134294] Updated weights for policy 0, policy_version 185854 (0.0019) [2025-01-04 11:35:18,968][134211] Fps is (10 sec: 19660.6, 60 sec: 14404.3, 300 sec: 13829.2). Total num frames: 761270272. Throughput: 0: 3702.3. Samples: 179480140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:18,969][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 11:35:21,048][134294] Updated weights for policy 0, policy_version 185864 (0.0028) [2025-01-04 11:35:23,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14336.0, 300 sec: 13843.1). Total num frames: 761331712. Throughput: 0: 3685.3. Samples: 179501212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:23,969][134211] Avg episode reward: [(0, '9.920')] [2025-01-04 11:35:24,506][134294] Updated weights for policy 0, policy_version 185874 (0.0027) [2025-01-04 11:35:27,833][134294] Updated weights for policy 0, policy_version 185884 (0.0026) [2025-01-04 11:35:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 13857.0). Total num frames: 761393152. Throughput: 0: 3640.5. Samples: 179519192. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:28,968][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 11:35:31,178][134294] Updated weights for policy 0, policy_version 185894 (0.0026) [2025-01-04 11:35:33,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14404.3, 300 sec: 13759.8). Total num frames: 761454592. 
Throughput: 0: 3615.6. Samples: 179528578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:33,968][134211] Avg episode reward: [(0, '8.960')] [2025-01-04 11:35:34,515][134294] Updated weights for policy 0, policy_version 185904 (0.0027) [2025-01-04 11:35:37,705][134294] Updated weights for policy 0, policy_version 185914 (0.0028) [2025-01-04 11:35:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14131.3, 300 sec: 13648.7). Total num frames: 761520128. Throughput: 0: 3585.6. Samples: 179547636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:38,968][134211] Avg episode reward: [(0, '9.618')] [2025-01-04 11:35:40,730][134294] Updated weights for policy 0, policy_version 185924 (0.0026) [2025-01-04 11:35:43,572][134294] Updated weights for policy 0, policy_version 185934 (0.0022) [2025-01-04 11:35:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 13676.5). Total num frames: 761589760. Throughput: 0: 3595.8. Samples: 179568144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:43,968][134211] Avg episode reward: [(0, '9.254')] [2025-01-04 11:35:46,606][134294] Updated weights for policy 0, policy_version 185944 (0.0028) [2025-01-04 11:35:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 13676.5). Total num frames: 761655296. Throughput: 0: 3613.0. Samples: 179578510. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:48,968][134211] Avg episode reward: [(0, '9.172')] [2025-01-04 11:35:49,872][134294] Updated weights for policy 0, policy_version 185954 (0.0026) [2025-01-04 11:35:52,799][134294] Updated weights for policy 0, policy_version 185964 (0.0024) [2025-01-04 11:35:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.4, 300 sec: 13704.2). Total num frames: 761724928. Throughput: 0: 3611.9. Samples: 179598174. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:53,969][134211] Avg episode reward: [(0, '9.766')] [2025-01-04 11:35:55,927][134294] Updated weights for policy 0, policy_version 185974 (0.0028) [2025-01-04 11:35:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.3, 300 sec: 13704.2). Total num frames: 761786368. Throughput: 0: 3419.9. Samples: 179617926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:35:58,968][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 11:35:59,098][134294] Updated weights for policy 0, policy_version 185984 (0.0025) [2025-01-04 11:36:02,343][134294] Updated weights for policy 0, policy_version 185994 (0.0029) [2025-01-04 11:36:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14062.9, 300 sec: 13690.4). Total num frames: 761847808. Throughput: 0: 3269.4. Samples: 179627262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:03,968][134211] Avg episode reward: [(0, '10.674')] [2025-01-04 11:36:05,452][134294] Updated weights for policy 0, policy_version 186004 (0.0021) [2025-01-04 11:36:08,412][134294] Updated weights for policy 0, policy_version 186014 (0.0024) [2025-01-04 11:36:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14131.2, 300 sec: 13718.1). Total num frames: 761921536. Throughput: 0: 3247.0. Samples: 179647326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:08,968][134211] Avg episode reward: [(0, '9.224')] [2025-01-04 11:36:11,332][134294] Updated weights for policy 0, policy_version 186024 (0.0026) [2025-01-04 11:36:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 13732.0). Total num frames: 761987072. Throughput: 0: 3309.2. Samples: 179668104. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:13,968][134211] Avg episode reward: [(0, '10.221')] [2025-01-04 11:36:14,396][134294] Updated weights for policy 0, policy_version 186034 (0.0025) [2025-01-04 11:36:17,682][134294] Updated weights for policy 0, policy_version 186044 (0.0027) [2025-01-04 11:36:18,967][134211] Fps is (10 sec: 13107.4, 60 sec: 13039.0, 300 sec: 13718.1). Total num frames: 762052608. Throughput: 0: 3306.0. Samples: 179677346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:18,968][134211] Avg episode reward: [(0, '9.970')] [2025-01-04 11:36:20,026][134294] Updated weights for policy 0, policy_version 186054 (0.0018) [2025-01-04 11:36:21,879][134294] Updated weights for policy 0, policy_version 186064 (0.0012) [2025-01-04 11:36:23,771][134294] Updated weights for policy 0, policy_version 186074 (0.0014) [2025-01-04 11:36:23,971][134211] Fps is (10 sec: 17607.7, 60 sec: 13857.5, 300 sec: 13870.7). Total num frames: 762163200. Throughput: 0: 3483.5. Samples: 179704404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:23,971][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 11:36:26,174][134294] Updated weights for policy 0, policy_version 186084 (0.0019) [2025-01-04 11:36:28,968][134211] Fps is (10 sec: 17612.0, 60 sec: 13926.3, 300 sec: 13884.7). Total num frames: 762228736. Throughput: 0: 3578.7. Samples: 179729188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:28,969][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 11:36:29,786][134294] Updated weights for policy 0, policy_version 186094 (0.0026) [2025-01-04 11:36:33,380][134294] Updated weights for policy 0, policy_version 186104 (0.0025) [2025-01-04 11:36:33,971][134211] Fps is (10 sec: 12288.2, 60 sec: 13857.5, 300 sec: 13801.4). Total num frames: 762286080. Throughput: 0: 3529.2. Samples: 179737336. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:33,971][134211] Avg episode reward: [(0, '9.293')] [2025-01-04 11:36:36,854][134294] Updated weights for policy 0, policy_version 186114 (0.0027) [2025-01-04 11:36:38,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13789.8, 300 sec: 13787.6). Total num frames: 762347520. Throughput: 0: 3481.7. Samples: 179754850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:38,968][134211] Avg episode reward: [(0, '9.081')] [2025-01-04 11:36:40,242][134294] Updated weights for policy 0, policy_version 186124 (0.0028) [2025-01-04 11:36:43,180][134294] Updated weights for policy 0, policy_version 186134 (0.0025) [2025-01-04 11:36:43,967][134211] Fps is (10 sec: 13520.9, 60 sec: 13858.2, 300 sec: 13801.4). Total num frames: 762421248. Throughput: 0: 3480.3. Samples: 179774540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:43,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 11:36:45,132][134294] Updated weights for policy 0, policy_version 186144 (0.0016) [2025-01-04 11:36:47,897][134294] Updated weights for policy 0, policy_version 186154 (0.0025) [2025-01-04 11:36:48,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14062.9, 300 sec: 13857.1). Total num frames: 762499072. Throughput: 0: 3590.1. Samples: 179788818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:36:48,968][134211] Avg episode reward: [(0, '9.921')] [2025-01-04 11:36:51,103][134294] Updated weights for policy 0, policy_version 186164 (0.0026) [2025-01-04 11:36:53,968][134211] Fps is (10 sec: 14335.6, 60 sec: 13994.7, 300 sec: 13759.8). Total num frames: 762564608. Throughput: 0: 3584.3. Samples: 179808622. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:36:53,968][134211] Avg episode reward: [(0, '9.682')] [2025-01-04 11:36:54,267][134294] Updated weights for policy 0, policy_version 186174 (0.0025) [2025-01-04 11:36:57,258][134294] Updated weights for policy 0, policy_version 186184 (0.0025) [2025-01-04 11:36:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14063.0, 300 sec: 13662.6). Total num frames: 762630144. Throughput: 0: 3567.7. Samples: 179828650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:36:58,968][134211] Avg episode reward: [(0, '9.606')] [2025-01-04 11:37:00,383][134294] Updated weights for policy 0, policy_version 186194 (0.0025) [2025-01-04 11:37:03,596][134294] Updated weights for policy 0, policy_version 186204 (0.0027) [2025-01-04 11:37:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 13690.4). Total num frames: 762695680. Throughput: 0: 3579.8. Samples: 179838438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:03,968][134211] Avg episode reward: [(0, '9.681')] [2025-01-04 11:37:06,435][134294] Updated weights for policy 0, policy_version 186214 (0.0024) [2025-01-04 11:37:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.7, 300 sec: 13690.4). Total num frames: 762761216. Throughput: 0: 3427.4. Samples: 179858626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:08,968][134211] Avg episode reward: [(0, '8.876')] [2025-01-04 11:37:09,659][134294] Updated weights for policy 0, policy_version 186224 (0.0026) [2025-01-04 11:37:12,336][134294] Updated weights for policy 0, policy_version 186234 (0.0019) [2025-01-04 11:37:13,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14336.0, 300 sec: 13773.7). Total num frames: 762847232. Throughput: 0: 3383.2. Samples: 179881432. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:13,968][134211] Avg episode reward: [(0, '8.417')] [2025-01-04 11:37:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000186242_762847232.pth... [2025-01-04 11:37:14,021][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185433_759533568.pth [2025-01-04 11:37:14,309][134294] Updated weights for policy 0, policy_version 186244 (0.0014) [2025-01-04 11:37:16,170][134294] Updated weights for policy 0, policy_version 186254 (0.0014) [2025-01-04 11:37:18,045][134294] Updated weights for policy 0, policy_version 186264 (0.0012) [2025-01-04 11:37:18,967][134211] Fps is (10 sec: 19251.5, 60 sec: 15018.7, 300 sec: 13898.6). Total num frames: 762953728. Throughput: 0: 3556.1. Samples: 179897348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:18,968][134211] Avg episode reward: [(0, '9.770')] [2025-01-04 11:37:20,039][134294] Updated weights for policy 0, policy_version 186274 (0.0013) [2025-01-04 11:37:22,363][134294] Updated weights for policy 0, policy_version 186284 (0.0019) [2025-01-04 11:37:23,968][134211] Fps is (10 sec: 18841.3, 60 sec: 14541.5, 300 sec: 13968.0). Total num frames: 763035648. Throughput: 0: 3832.4. Samples: 179927306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:23,969][134211] Avg episode reward: [(0, '8.914')] [2025-01-04 11:37:25,733][134294] Updated weights for policy 0, policy_version 186294 (0.0031) [2025-01-04 11:37:28,918][134294] Updated weights for policy 0, policy_version 186304 (0.0026) [2025-01-04 11:37:28,968][134211] Fps is (10 sec: 14745.0, 60 sec: 14540.8, 300 sec: 13995.8). Total num frames: 763101184. Throughput: 0: 3820.1. Samples: 179946444. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:28,969][134211] Avg episode reward: [(0, '9.765')] [2025-01-04 11:37:32,158][134294] Updated weights for policy 0, policy_version 186314 (0.0028) [2025-01-04 11:37:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14609.7, 300 sec: 13995.8). Total num frames: 763162624. Throughput: 0: 3705.5. Samples: 179955566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:33,970][134211] Avg episode reward: [(0, '9.891')] [2025-01-04 11:37:35,324][134294] Updated weights for policy 0, policy_version 186324 (0.0028) [2025-01-04 11:37:38,239][134294] Updated weights for policy 0, policy_version 186334 (0.0024) [2025-01-04 11:37:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14745.6, 300 sec: 13995.8). Total num frames: 763232256. Throughput: 0: 3708.9. Samples: 179975520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:38,968][134211] Avg episode reward: [(0, '9.128')] [2025-01-04 11:37:41,312][134294] Updated weights for policy 0, policy_version 186344 (0.0026) [2025-01-04 11:37:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.0, 300 sec: 13940.3). Total num frames: 763297792. Throughput: 0: 3707.8. Samples: 179995504. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:43,968][134211] Avg episode reward: [(0, '10.034')] [2025-01-04 11:37:44,410][134294] Updated weights for policy 0, policy_version 186354 (0.0026) [2025-01-04 11:37:47,380][134294] Updated weights for policy 0, policy_version 186364 (0.0026) [2025-01-04 11:37:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 13940.3). Total num frames: 763367424. Throughput: 0: 3715.7. Samples: 180005646. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:48,968][134211] Avg episode reward: [(0, '9.387')] [2025-01-04 11:37:50,608][134294] Updated weights for policy 0, policy_version 186374 (0.0024) [2025-01-04 11:37:53,779][134294] Updated weights for policy 0, policy_version 186384 (0.0027) [2025-01-04 11:37:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 13912.5). Total num frames: 763428864. Throughput: 0: 3704.4. Samples: 180025324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:37:53,968][134211] Avg episode reward: [(0, '9.646')] [2025-01-04 11:37:57,173][134294] Updated weights for policy 0, policy_version 186394 (0.0027) [2025-01-04 11:37:58,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 13912.5). Total num frames: 763490304. Throughput: 0: 3604.8. Samples: 180043648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:37:58,968][134211] Avg episode reward: [(0, '9.160')] [2025-01-04 11:38:00,473][134294] Updated weights for policy 0, policy_version 186404 (0.0027) [2025-01-04 11:38:03,845][134294] Updated weights for policy 0, policy_version 186414 (0.0026) [2025-01-04 11:38:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14267.8, 300 sec: 13884.7). Total num frames: 763551744. Throughput: 0: 3460.2. Samples: 180053058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:03,968][134211] Avg episode reward: [(0, '10.817')] [2025-01-04 11:38:06,756][134294] Updated weights for policy 0, policy_version 186424 (0.0028) [2025-01-04 11:38:08,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14335.9, 300 sec: 13870.9). Total num frames: 763621376. Throughput: 0: 3230.6. Samples: 180072684. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:08,968][134211] Avg episode reward: [(0, '9.624')] [2025-01-04 11:38:09,790][134294] Updated weights for policy 0, policy_version 186434 (0.0025) [2025-01-04 11:38:12,848][134294] Updated weights for policy 0, policy_version 186444 (0.0026) [2025-01-04 11:38:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 13759.8). Total num frames: 763686912. Throughput: 0: 3255.4. Samples: 180092936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:13,968][134211] Avg episode reward: [(0, '9.948')] [2025-01-04 11:38:15,815][134294] Updated weights for policy 0, policy_version 186454 (0.0025) [2025-01-04 11:38:18,670][134294] Updated weights for policy 0, policy_version 186464 (0.0025) [2025-01-04 11:38:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13448.5, 300 sec: 13787.6). Total num frames: 763760640. Throughput: 0: 3286.7. Samples: 180103466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:18,968][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 11:38:21,509][134294] Updated weights for policy 0, policy_version 186474 (0.0025) [2025-01-04 11:38:23,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13175.4, 300 sec: 13829.2). Total num frames: 763826176. Throughput: 0: 3313.0. Samples: 180124606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:23,969][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 11:38:24,616][134294] Updated weights for policy 0, policy_version 186484 (0.0026) [2025-01-04 11:38:27,982][134294] Updated weights for policy 0, policy_version 186494 (0.0030) [2025-01-04 11:38:28,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13175.5, 300 sec: 13843.1). Total num frames: 763891712. Throughput: 0: 3293.6. Samples: 180143714. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:28,968][134211] Avg episode reward: [(0, '10.656')] [2025-01-04 11:38:30,929][134294] Updated weights for policy 0, policy_version 186504 (0.0021) [2025-01-04 11:38:33,069][134294] Updated weights for policy 0, policy_version 186514 (0.0014) [2025-01-04 11:38:33,968][134211] Fps is (10 sec: 15156.3, 60 sec: 13585.2, 300 sec: 13940.3). Total num frames: 763977728. Throughput: 0: 3309.5. Samples: 180154572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:33,968][134211] Avg episode reward: [(0, '9.379')] [2025-01-04 11:38:35,162][134294] Updated weights for policy 0, policy_version 186524 (0.0012) [2025-01-04 11:38:37,031][134294] Updated weights for policy 0, policy_version 186534 (0.0015) [2025-01-04 11:38:38,968][134211] Fps is (10 sec: 18022.4, 60 sec: 13994.7, 300 sec: 14051.4). Total num frames: 764071936. Throughput: 0: 3540.6. Samples: 180184652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:38,968][134211] Avg episode reward: [(0, '8.688')] [2025-01-04 11:38:39,925][134294] Updated weights for policy 0, policy_version 186544 (0.0024) [2025-01-04 11:38:42,989][134294] Updated weights for policy 0, policy_version 186554 (0.0028) [2025-01-04 11:38:43,968][134211] Fps is (10 sec: 15973.9, 60 sec: 13994.7, 300 sec: 14065.4). Total num frames: 764137472. Throughput: 0: 3588.1. Samples: 180205114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:43,969][134211] Avg episode reward: [(0, '9.434')] [2025-01-04 11:38:45,980][134294] Updated weights for policy 0, policy_version 186564 (0.0026) [2025-01-04 11:38:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 14065.3). Total num frames: 764203008. Throughput: 0: 3604.9. Samples: 180215280. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:48,968][134211] Avg episode reward: [(0, '10.498')] [2025-01-04 11:38:49,028][134294] Updated weights for policy 0, policy_version 186574 (0.0025) [2025-01-04 11:38:52,060][134294] Updated weights for policy 0, policy_version 186584 (0.0026) [2025-01-04 11:38:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14062.9, 300 sec: 14037.5). Total num frames: 764272640. Throughput: 0: 3623.4. Samples: 180235738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:53,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 11:38:55,112][134294] Updated weights for policy 0, policy_version 186594 (0.0027) [2025-01-04 11:38:58,426][134294] Updated weights for policy 0, policy_version 186604 (0.0025) [2025-01-04 11:38:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 764334080. Throughput: 0: 3601.7. Samples: 180255014. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:38:58,968][134211] Avg episode reward: [(0, '9.512')] [2025-01-04 11:39:01,569][134294] Updated weights for policy 0, policy_version 186614 (0.0023) [2025-01-04 11:39:03,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 764395520. Throughput: 0: 3582.4. Samples: 180264674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:03,968][134211] Avg episode reward: [(0, '9.188')] [2025-01-04 11:39:04,939][134294] Updated weights for policy 0, policy_version 186624 (0.0026) [2025-01-04 11:39:07,906][134294] Updated weights for policy 0, policy_version 186634 (0.0024) [2025-01-04 11:39:08,968][134211] Fps is (10 sec: 13106.5, 60 sec: 14062.8, 300 sec: 13981.9). Total num frames: 764465152. Throughput: 0: 3540.0. Samples: 180283908. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:08,969][134211] Avg episode reward: [(0, '8.212')] [2025-01-04 11:39:10,892][134294] Updated weights for policy 0, policy_version 186644 (0.0024) [2025-01-04 11:39:13,865][134294] Updated weights for policy 0, policy_version 186654 (0.0026) [2025-01-04 11:39:13,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14131.2, 300 sec: 13995.8). Total num frames: 764534784. Throughput: 0: 3576.4. Samples: 180304652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:13,969][134211] Avg episode reward: [(0, '9.862')] [2025-01-04 11:39:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000186654_764534784.pth... [2025-01-04 11:39:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000185835_761180160.pth [2025-01-04 11:39:16,815][134294] Updated weights for policy 0, policy_version 186664 (0.0027) [2025-01-04 11:39:18,968][134211] Fps is (10 sec: 13517.7, 60 sec: 13994.7, 300 sec: 13995.8). Total num frames: 764600320. Throughput: 0: 3560.7. Samples: 180314802. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:18,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 11:39:19,896][134294] Updated weights for policy 0, policy_version 186674 (0.0023) [2025-01-04 11:39:22,781][134294] Updated weights for policy 0, policy_version 186684 (0.0024) [2025-01-04 11:39:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14063.0, 300 sec: 14023.6). Total num frames: 764669952. Throughput: 0: 3346.8. Samples: 180335260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:23,968][134211] Avg episode reward: [(0, '9.549')] [2025-01-04 11:39:25,989][134294] Updated weights for policy 0, policy_version 186694 (0.0027) [2025-01-04 11:39:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13994.6, 300 sec: 14037.5). Total num frames: 764731392. 
Throughput: 0: 3313.6. Samples: 180354226. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:28,968][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 11:39:29,575][134294] Updated weights for policy 0, policy_version 186704 (0.0029) [2025-01-04 11:39:31,853][134294] Updated weights for policy 0, policy_version 186714 (0.0016) [2025-01-04 11:39:33,858][134294] Updated weights for policy 0, policy_version 186724 (0.0014) [2025-01-04 11:39:33,967][134211] Fps is (10 sec: 15155.7, 60 sec: 14062.9, 300 sec: 14065.3). Total num frames: 764821504. Throughput: 0: 3329.2. Samples: 180365092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:33,968][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 11:39:35,675][134294] Updated weights for policy 0, policy_version 186734 (0.0014) [2025-01-04 11:39:37,665][134294] Updated weights for policy 0, policy_version 186744 (0.0014) [2025-01-04 11:39:38,968][134211] Fps is (10 sec: 18841.8, 60 sec: 14131.2, 300 sec: 14176.3). Total num frames: 764919808. Throughput: 0: 3571.8. Samples: 180396470. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:38,968][134211] Avg episode reward: [(0, '10.196')] [2025-01-04 11:39:40,657][134294] Updated weights for policy 0, policy_version 186754 (0.0027) [2025-01-04 11:39:43,773][134294] Updated weights for policy 0, policy_version 186764 (0.0027) [2025-01-04 11:39:43,968][134211] Fps is (10 sec: 16383.7, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 764985344. Throughput: 0: 3615.2. Samples: 180417700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:43,968][134211] Avg episode reward: [(0, '10.046')] [2025-01-04 11:39:46,904][134294] Updated weights for policy 0, policy_version 186774 (0.0027) [2025-01-04 11:39:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 765050880. Throughput: 0: 3615.7. Samples: 180427382. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:48,969][134211] Avg episode reward: [(0, '8.970')] [2025-01-04 11:39:50,069][134294] Updated weights for policy 0, policy_version 186784 (0.0027) [2025-01-04 11:39:53,177][134294] Updated weights for policy 0, policy_version 186794 (0.0025) [2025-01-04 11:39:53,968][134211] Fps is (10 sec: 13106.4, 60 sec: 14062.8, 300 sec: 14162.4). Total num frames: 765116416. Throughput: 0: 3629.5. Samples: 180447234. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:53,969][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 11:39:56,453][134294] Updated weights for policy 0, policy_version 186804 (0.0026) [2025-01-04 11:39:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14062.9, 300 sec: 14148.6). Total num frames: 765177856. Throughput: 0: 3577.7. Samples: 180465646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:39:58,968][134211] Avg episode reward: [(0, '8.241')] [2025-01-04 11:39:59,914][134294] Updated weights for policy 0, policy_version 186814 (0.0024) [2025-01-04 11:40:03,317][134294] Updated weights for policy 0, policy_version 186824 (0.0023) [2025-01-04 11:40:03,968][134211] Fps is (10 sec: 11878.9, 60 sec: 13994.6, 300 sec: 14106.9). Total num frames: 765235200. Throughput: 0: 3548.7. Samples: 180474496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:03,968][134211] Avg episode reward: [(0, '7.488')] [2025-01-04 11:40:06,337][134294] Updated weights for policy 0, policy_version 186834 (0.0025) [2025-01-04 11:40:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.8, 300 sec: 13981.9). Total num frames: 765304832. Throughput: 0: 3525.3. Samples: 180493898. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:08,968][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 11:40:09,454][134294] Updated weights for policy 0, policy_version 186844 (0.0024) [2025-01-04 11:40:12,386][134294] Updated weights for policy 0, policy_version 186854 (0.0024) [2025-01-04 11:40:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13994.7, 300 sec: 13912.5). Total num frames: 765374464. Throughput: 0: 3558.4. Samples: 180514352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:13,968][134211] Avg episode reward: [(0, '9.848')] [2025-01-04 11:40:15,416][134294] Updated weights for policy 0, policy_version 186864 (0.0025) [2025-01-04 11:40:18,236][134294] Updated weights for policy 0, policy_version 186874 (0.0024) [2025-01-04 11:40:18,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14062.8, 300 sec: 13940.3). Total num frames: 765444096. Throughput: 0: 3550.2. Samples: 180524854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:18,969][134211] Avg episode reward: [(0, '10.662')] [2025-01-04 11:40:21,196][134294] Updated weights for policy 0, policy_version 186884 (0.0024) [2025-01-04 11:40:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 13954.2). Total num frames: 765509632. Throughput: 0: 3315.1. Samples: 180545648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:23,968][134211] Avg episode reward: [(0, '10.428')] [2025-01-04 11:40:24,271][134294] Updated weights for policy 0, policy_version 186894 (0.0025) [2025-01-04 11:40:27,690][134294] Updated weights for policy 0, policy_version 186904 (0.0027) [2025-01-04 11:40:28,968][134211] Fps is (10 sec: 12698.2, 60 sec: 13994.7, 300 sec: 13954.2). Total num frames: 765571072. Throughput: 0: 3258.8. Samples: 180564348. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:28,968][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 11:40:31,018][134294] Updated weights for policy 0, policy_version 186914 (0.0023) [2025-01-04 11:40:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13516.8, 300 sec: 13940.3). Total num frames: 765632512. Throughput: 0: 3249.3. Samples: 180573600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:33,968][134211] Avg episode reward: [(0, '10.015')] [2025-01-04 11:40:34,265][134294] Updated weights for policy 0, policy_version 186924 (0.0027) [2025-01-04 11:40:36,319][134294] Updated weights for policy 0, policy_version 186934 (0.0014) [2025-01-04 11:40:38,203][134294] Updated weights for policy 0, policy_version 186944 (0.0013) [2025-01-04 11:40:38,968][134211] Fps is (10 sec: 16794.0, 60 sec: 13653.4, 300 sec: 14065.3). Total num frames: 765739008. Throughput: 0: 3356.7. Samples: 180598282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:38,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 11:40:40,138][134294] Updated weights for policy 0, policy_version 186954 (0.0013) [2025-01-04 11:40:41,958][134294] Updated weights for policy 0, policy_version 186964 (0.0013) [2025-01-04 11:40:43,968][134211] Fps is (10 sec: 20889.4, 60 sec: 14267.7, 300 sec: 14190.2). Total num frames: 765841408. Throughput: 0: 3664.8. Samples: 180630562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:43,968][134211] Avg episode reward: [(0, '9.662')] [2025-01-04 11:40:44,210][134294] Updated weights for policy 0, policy_version 186974 (0.0017) [2025-01-04 11:40:47,338][134294] Updated weights for policy 0, policy_version 186984 (0.0029) [2025-01-04 11:40:48,969][134211] Fps is (10 sec: 16381.2, 60 sec: 14199.1, 300 sec: 14162.4). Total num frames: 765902848. Throughput: 0: 3704.0. Samples: 180641180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:48,970][134211] Avg episode reward: [(0, '9.480')] [2025-01-04 11:40:50,855][134294] Updated weights for policy 0, policy_version 186994 (0.0026) [2025-01-04 11:40:53,968][134211] Fps is (10 sec: 12287.4, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 765964288. Throughput: 0: 3670.9. Samples: 180659092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:53,969][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 11:40:54,228][134294] Updated weights for policy 0, policy_version 187004 (0.0028) [2025-01-04 11:40:57,338][134294] Updated weights for policy 0, policy_version 187014 (0.0026) [2025-01-04 11:40:58,968][134211] Fps is (10 sec: 12289.8, 60 sec: 14131.2, 300 sec: 14162.4). Total num frames: 766025728. Throughput: 0: 3637.3. Samples: 180678030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:40:58,968][134211] Avg episode reward: [(0, '10.075')] [2025-01-04 11:41:00,784][134294] Updated weights for policy 0, policy_version 187024 (0.0028) [2025-01-04 11:41:03,932][134294] Updated weights for policy 0, policy_version 187034 (0.0025) [2025-01-04 11:41:03,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 766091264. Throughput: 0: 3605.5. Samples: 180687100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:03,968][134211] Avg episode reward: [(0, '9.050')] [2025-01-04 11:41:06,860][134294] Updated weights for policy 0, policy_version 187044 (0.0025) [2025-01-04 11:41:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14199.4, 300 sec: 14134.7). Total num frames: 766156800. Throughput: 0: 3587.6. Samples: 180707090. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:08,969][134211] Avg episode reward: [(0, '8.517')] [2025-01-04 11:41:10,193][134294] Updated weights for policy 0, policy_version 187054 (0.0025) [2025-01-04 11:41:13,230][134294] Updated weights for policy 0, policy_version 187064 (0.0026) [2025-01-04 11:41:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14131.2, 300 sec: 14134.7). Total num frames: 766222336. Throughput: 0: 3602.9. Samples: 180726480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:13,968][134211] Avg episode reward: [(0, '10.804')] [2025-01-04 11:41:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187066_766222336.pth... [2025-01-04 11:41:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000186242_762847232.pth [2025-01-04 11:41:16,556][134294] Updated weights for policy 0, policy_version 187074 (0.0026) [2025-01-04 11:41:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.7, 300 sec: 13968.2). Total num frames: 766283776. Throughput: 0: 3603.2. Samples: 180735744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:18,969][134211] Avg episode reward: [(0, '10.342')] [2025-01-04 11:41:19,874][134294] Updated weights for policy 0, policy_version 187084 (0.0026) [2025-01-04 11:41:22,965][134294] Updated weights for policy 0, policy_version 187094 (0.0027) [2025-01-04 11:41:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13994.6, 300 sec: 13968.1). Total num frames: 766349312. Throughput: 0: 3477.8. Samples: 180754784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:23,968][134211] Avg episode reward: [(0, '8.405')] [2025-01-04 11:41:26,205][134294] Updated weights for policy 0, policy_version 187104 (0.0028) [2025-01-04 11:41:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13926.4, 300 sec: 13968.2). Total num frames: 766406656. 
Throughput: 0: 3174.0. Samples: 180773394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:28,968][134211] Avg episode reward: [(0, '10.677')] [2025-01-04 11:41:29,656][134294] Updated weights for policy 0, policy_version 187114 (0.0025) [2025-01-04 11:41:32,550][134294] Updated weights for policy 0, policy_version 187124 (0.0019) [2025-01-04 11:41:33,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14267.8, 300 sec: 14037.5). Total num frames: 766488576. Throughput: 0: 3141.9. Samples: 180782558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:33,968][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 11:41:34,467][134294] Updated weights for policy 0, policy_version 187134 (0.0013) [2025-01-04 11:41:36,526][134294] Updated weights for policy 0, policy_version 187144 (0.0017) [2025-01-04 11:41:38,968][134211] Fps is (10 sec: 16793.6, 60 sec: 13926.3, 300 sec: 14079.1). Total num frames: 766574592. Throughput: 0: 3380.4. Samples: 180811208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:38,968][134211] Avg episode reward: [(0, '10.135')] [2025-01-04 11:41:39,471][134294] Updated weights for policy 0, policy_version 187154 (0.0026) [2025-01-04 11:41:42,665][134294] Updated weights for policy 0, policy_version 187164 (0.0028) [2025-01-04 11:41:43,968][134211] Fps is (10 sec: 15154.8, 60 sec: 13312.0, 300 sec: 14037.5). Total num frames: 766640128. Throughput: 0: 3396.1. Samples: 180830854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:43,968][134211] Avg episode reward: [(0, '8.653')] [2025-01-04 11:41:45,645][134294] Updated weights for policy 0, policy_version 187174 (0.0026) [2025-01-04 11:41:48,692][134294] Updated weights for policy 0, policy_version 187184 (0.0022) [2025-01-04 11:41:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.9, 300 sec: 14051.4). Total num frames: 766709760. Throughput: 0: 3424.0. Samples: 180841182. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:48,968][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 11:41:51,514][134294] Updated weights for policy 0, policy_version 187194 (0.0026) [2025-01-04 11:41:53,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13516.8, 300 sec: 14051.3). Total num frames: 766775296. Throughput: 0: 3435.1. Samples: 180861668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:53,969][134211] Avg episode reward: [(0, '9.494')] [2025-01-04 11:41:54,700][134294] Updated weights for policy 0, policy_version 187204 (0.0026) [2025-01-04 11:41:58,165][134294] Updated weights for policy 0, policy_version 187214 (0.0024) [2025-01-04 11:41:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13516.8, 300 sec: 14037.5). Total num frames: 766836736. Throughput: 0: 3420.2. Samples: 180880388. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:41:58,968][134211] Avg episode reward: [(0, '9.603')] [2025-01-04 11:42:01,350][134294] Updated weights for policy 0, policy_version 187224 (0.0025) [2025-01-04 11:42:03,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13448.5, 300 sec: 14023.6). Total num frames: 766898176. Throughput: 0: 3426.3. Samples: 180889926. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:42:03,968][134211] Avg episode reward: [(0, '8.882')] [2025-01-04 11:42:04,822][134294] Updated weights for policy 0, policy_version 187234 (0.0026) [2025-01-04 11:42:07,686][134294] Updated weights for policy 0, policy_version 187244 (0.0024) [2025-01-04 11:42:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13653.4, 300 sec: 13995.8). Total num frames: 766976000. Throughput: 0: 3427.5. Samples: 180909022. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:42:08,968][134211] Avg episode reward: [(0, '9.239')] [2025-01-04 11:42:09,697][134294] Updated weights for policy 0, policy_version 187254 (0.0014) [2025-01-04 11:42:11,544][134294] Updated weights for policy 0, policy_version 187264 (0.0014) [2025-01-04 11:42:13,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 767066112. Throughput: 0: 3649.4. Samples: 180937616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:42:13,968][134211] Avg episode reward: [(0, '9.275')] [2025-01-04 11:42:14,399][134294] Updated weights for policy 0, policy_version 187274 (0.0026) [2025-01-04 11:42:17,470][134294] Updated weights for policy 0, policy_version 187284 (0.0028) [2025-01-04 11:42:18,969][134211] Fps is (10 sec: 15562.5, 60 sec: 14130.9, 300 sec: 13884.7). Total num frames: 767131648. Throughput: 0: 3667.1. Samples: 180947582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:42:18,970][134211] Avg episode reward: [(0, '10.090')] [2025-01-04 11:42:20,587][134294] Updated weights for policy 0, policy_version 187294 (0.0024) [2025-01-04 11:42:23,446][134294] Updated weights for policy 0, policy_version 187304 (0.0026) [2025-01-04 11:42:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.5, 300 sec: 13898.6). Total num frames: 767201280. Throughput: 0: 3484.9. Samples: 180968028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:23,968][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 11:42:26,781][134294] Updated weights for policy 0, policy_version 187314 (0.0024) [2025-01-04 11:42:28,968][134211] Fps is (10 sec: 12699.2, 60 sec: 14199.5, 300 sec: 13884.8). Total num frames: 767258624. Throughput: 0: 3459.9. Samples: 180986550. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:28,968][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 11:42:30,474][134294] Updated weights for policy 0, policy_version 187324 (0.0031) [2025-01-04 11:42:33,845][134294] Updated weights for policy 0, policy_version 187334 (0.0023) [2025-01-04 11:42:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13858.1, 300 sec: 13857.0). Total num frames: 767320064. Throughput: 0: 3422.3. Samples: 180995186. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:33,968][134211] Avg episode reward: [(0, '8.042')] [2025-01-04 11:42:36,089][134294] Updated weights for policy 0, policy_version 187344 (0.0013) [2025-01-04 11:42:38,162][134294] Updated weights for policy 0, policy_version 187354 (0.0014) [2025-01-04 11:42:38,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14062.9, 300 sec: 13968.1). Total num frames: 767418368. Throughput: 0: 3488.9. Samples: 181018666. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:38,968][134211] Avg episode reward: [(0, '9.826')] [2025-01-04 11:42:40,093][134294] Updated weights for policy 0, policy_version 187364 (0.0013) [2025-01-04 11:42:41,979][134294] Updated weights for policy 0, policy_version 187374 (0.0014) [2025-01-04 11:42:43,968][134211] Fps is (10 sec: 20070.2, 60 sec: 14677.3, 300 sec: 14079.1). Total num frames: 767520768. Throughput: 0: 3778.3. Samples: 181050412. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:43,968][134211] Avg episode reward: [(0, '10.388')] [2025-01-04 11:42:44,154][134294] Updated weights for policy 0, policy_version 187384 (0.0020) [2025-01-04 11:42:47,307][134294] Updated weights for policy 0, policy_version 187394 (0.0027) [2025-01-04 11:42:48,968][134211] Fps is (10 sec: 16793.0, 60 sec: 14609.0, 300 sec: 14093.0). Total num frames: 767586304. Throughput: 0: 3803.5. Samples: 181061086. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:48,969][134211] Avg episode reward: [(0, '8.842')] [2025-01-04 11:42:50,506][134294] Updated weights for policy 0, policy_version 187404 (0.0026) [2025-01-04 11:42:53,463][134294] Updated weights for policy 0, policy_version 187414 (0.0028) [2025-01-04 11:42:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 14106.9). Total num frames: 767651840. Throughput: 0: 3818.4. Samples: 181080850. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:53,968][134211] Avg episode reward: [(0, '9.505')] [2025-01-04 11:42:56,612][134294] Updated weights for policy 0, policy_version 187424 (0.0024) [2025-01-04 11:42:58,970][134211] Fps is (10 sec: 13107.6, 60 sec: 14677.3, 300 sec: 14120.8). Total num frames: 767717376. Throughput: 0: 3613.9. Samples: 181100240. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:42:58,970][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 11:42:59,806][134294] Updated weights for policy 0, policy_version 187434 (0.0025) [2025-01-04 11:43:03,218][134294] Updated weights for policy 0, policy_version 187444 (0.0025) [2025-01-04 11:43:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14677.3, 300 sec: 14093.0). Total num frames: 767778816. Throughput: 0: 3600.5. Samples: 181109598. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:43:03,968][134211] Avg episode reward: [(0, '11.215')] [2025-01-04 11:43:03,976][134264] Saving new best policy, reward=11.215! [2025-01-04 11:43:06,329][134294] Updated weights for policy 0, policy_version 187454 (0.0026) [2025-01-04 11:43:08,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14404.2, 300 sec: 14079.1). Total num frames: 767840256. Throughput: 0: 3569.6. Samples: 181128658. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:43:08,968][134211] Avg episode reward: [(0, '11.238')] [2025-01-04 11:43:08,972][134264] Saving new best policy, reward=11.238! 
[2025-01-04 11:43:09,761][134294] Updated weights for policy 0, policy_version 187464 (0.0027) [2025-01-04 11:43:12,969][134294] Updated weights for policy 0, policy_version 187474 (0.0026) [2025-01-04 11:43:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.7, 300 sec: 14051.4). Total num frames: 767905792. Throughput: 0: 3571.3. Samples: 181147260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:43:13,968][134211] Avg episode reward: [(0, '8.745')] [2025-01-04 11:43:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187477_767905792.pth... [2025-01-04 11:43:14,045][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000186654_764534784.pth [2025-01-04 11:43:16,027][134294] Updated weights for policy 0, policy_version 187484 (0.0027) [2025-01-04 11:43:18,862][134294] Updated weights for policy 0, policy_version 187494 (0.0026) [2025-01-04 11:43:18,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14063.2, 300 sec: 14065.3). Total num frames: 767975424. Throughput: 0: 3606.6. Samples: 181157486. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:43:18,969][134211] Avg episode reward: [(0, '10.430')] [2025-01-04 11:43:21,868][134294] Updated weights for policy 0, policy_version 187504 (0.0025) [2025-01-04 11:43:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 14065.2). Total num frames: 768040960. Throughput: 0: 3546.0. Samples: 181178236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 11:43:23,968][134211] Avg episode reward: [(0, '8.604')] [2025-01-04 11:43:25,023][134294] Updated weights for policy 0, policy_version 187514 (0.0025) [2025-01-04 11:43:28,324][134294] Updated weights for policy 0, policy_version 187524 (0.0025) [2025-01-04 11:43:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14062.9, 300 sec: 13981.9). Total num frames: 768102400. Throughput: 0: 3262.8. 
Samples: 181197236. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:28,968][134211] Avg episode reward: [(0, '9.017')] [2025-01-04 11:43:31,674][134294] Updated weights for policy 0, policy_version 187534 (0.0025) [2025-01-04 11:43:33,968][134211] Fps is (10 sec: 12697.3, 60 sec: 14131.1, 300 sec: 13884.7). Total num frames: 768167936. Throughput: 0: 3235.9. Samples: 181206702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:33,969][134211] Avg episode reward: [(0, '8.432')] [2025-01-04 11:43:34,892][134294] Updated weights for policy 0, policy_version 187544 (0.0025) [2025-01-04 11:43:37,832][134294] Updated weights for policy 0, policy_version 187554 (0.0024) [2025-01-04 11:43:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 13884.8). Total num frames: 768233472. Throughput: 0: 3228.0. Samples: 181226112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:38,968][134211] Avg episode reward: [(0, '9.164')] [2025-01-04 11:43:40,884][134294] Updated weights for policy 0, policy_version 187564 (0.0025) [2025-01-04 11:43:43,857][134294] Updated weights for policy 0, policy_version 187574 (0.0024) [2025-01-04 11:43:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13038.9, 300 sec: 13898.6). Total num frames: 768303104. Throughput: 0: 3256.0. Samples: 181246760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:43,968][134211] Avg episode reward: [(0, '9.663')] [2025-01-04 11:43:46,772][134294] Updated weights for policy 0, policy_version 187584 (0.0024) [2025-01-04 11:43:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13107.3, 300 sec: 13898.6). Total num frames: 768372736. Throughput: 0: 3277.8. Samples: 181257100. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:48,968][134211] Avg episode reward: [(0, '9.769')] [2025-01-04 11:43:49,903][134294] Updated weights for policy 0, policy_version 187594 (0.0025) [2025-01-04 11:43:52,747][134294] Updated weights for policy 0, policy_version 187604 (0.0024) [2025-01-04 11:43:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13107.2, 300 sec: 13912.5). Total num frames: 768438272. Throughput: 0: 3311.2. Samples: 181277660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:53,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 11:43:55,787][134294] Updated weights for policy 0, policy_version 187614 (0.0026) [2025-01-04 11:43:58,864][134294] Updated weights for policy 0, policy_version 187624 (0.0024) [2025-01-04 11:43:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13175.5, 300 sec: 13940.3). Total num frames: 768507904. Throughput: 0: 3351.3. Samples: 181298070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:43:58,968][134211] Avg episode reward: [(0, '10.268')] [2025-01-04 11:44:01,331][134294] Updated weights for policy 0, policy_version 187634 (0.0017) [2025-01-04 11:44:03,379][134294] Updated weights for policy 0, policy_version 187644 (0.0013) [2025-01-04 11:44:03,968][134211] Fps is (10 sec: 15974.5, 60 sec: 13653.4, 300 sec: 14009.7). Total num frames: 768598016. Throughput: 0: 3379.7. Samples: 181309570. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:03,968][134211] Avg episode reward: [(0, '9.696')] [2025-01-04 11:44:06,161][134294] Updated weights for policy 0, policy_version 187654 (0.0023) [2025-01-04 11:44:08,968][134211] Fps is (10 sec: 15974.4, 60 sec: 13789.8, 300 sec: 14009.7). Total num frames: 768667648. Throughput: 0: 3468.4. Samples: 181334316. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:08,968][134211] Avg episode reward: [(0, '9.389')] [2025-01-04 11:44:09,254][134294] Updated weights for policy 0, policy_version 187664 (0.0025) [2025-01-04 11:44:12,300][134294] Updated weights for policy 0, policy_version 187674 (0.0026) [2025-01-04 11:44:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13789.9, 300 sec: 14009.7). Total num frames: 768733184. Throughput: 0: 3492.9. Samples: 181354418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:13,968][134211] Avg episode reward: [(0, '9.487')] [2025-01-04 11:44:15,225][134294] Updated weights for policy 0, policy_version 187684 (0.0023) [2025-01-04 11:44:18,275][134294] Updated weights for policy 0, policy_version 187694 (0.0026) [2025-01-04 11:44:18,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13789.8, 300 sec: 14009.7). Total num frames: 768802816. Throughput: 0: 3510.2. Samples: 181364662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:18,969][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 11:44:21,260][134294] Updated weights for policy 0, policy_version 187704 (0.0024) [2025-01-04 11:44:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13789.9, 300 sec: 14023.6). Total num frames: 768868352. Throughput: 0: 3532.9. Samples: 181385094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:23,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 11:44:24,399][134294] Updated weights for policy 0, policy_version 187714 (0.0028) [2025-01-04 11:44:27,691][134294] Updated weights for policy 0, policy_version 187724 (0.0028) [2025-01-04 11:44:28,968][134211] Fps is (10 sec: 12698.1, 60 sec: 13789.9, 300 sec: 13926.4). Total num frames: 768929792. Throughput: 0: 3494.3. Samples: 181404004. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:28,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 11:44:30,964][134294] Updated weights for policy 0, policy_version 187734 (0.0025) [2025-01-04 11:44:33,637][134294] Updated weights for policy 0, policy_version 187744 (0.0018) [2025-01-04 11:44:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13926.5, 300 sec: 13843.1). Total num frames: 769003520. Throughput: 0: 3474.1. Samples: 181413436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:44:33,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 11:44:35,907][134294] Updated weights for policy 0, policy_version 187754 (0.0014) [2025-01-04 11:44:37,874][134294] Updated weights for policy 0, policy_version 187764 (0.0013) [2025-01-04 11:44:38,967][134211] Fps is (10 sec: 17203.6, 60 sec: 14472.6, 300 sec: 13954.2). Total num frames: 769101824. Throughput: 0: 3598.9. Samples: 181439612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:44:38,968][134211] Avg episode reward: [(0, '9.438')] [2025-01-04 11:44:39,791][134294] Updated weights for policy 0, policy_version 187774 (0.0011) [2025-01-04 11:44:41,709][134294] Updated weights for policy 0, policy_version 187784 (0.0014) [2025-01-04 11:44:43,968][134211] Fps is (10 sec: 19250.6, 60 sec: 14882.1, 300 sec: 14051.4). Total num frames: 769196032. Throughput: 0: 3827.9. Samples: 181470326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:44:43,969][134211] Avg episode reward: [(0, '8.963')] [2025-01-04 11:44:44,414][134294] Updated weights for policy 0, policy_version 187794 (0.0022) [2025-01-04 11:44:47,565][134294] Updated weights for policy 0, policy_version 187804 (0.0028) [2025-01-04 11:44:48,968][134211] Fps is (10 sec: 15973.0, 60 sec: 14813.7, 300 sec: 14051.4). Total num frames: 769261568. Throughput: 0: 3791.2. Samples: 181480176. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:44:48,969][134211] Avg episode reward: [(0, '11.362')] [2025-01-04 11:44:48,970][134264] Saving new best policy, reward=11.362! [2025-01-04 11:44:50,711][134294] Updated weights for policy 0, policy_version 187814 (0.0025) [2025-01-04 11:44:53,768][134294] Updated weights for policy 0, policy_version 187824 (0.0026) [2025-01-04 11:44:53,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14813.9, 300 sec: 14065.3). Total num frames: 769327104. Throughput: 0: 3675.0. Samples: 181499692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:44:53,968][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 11:44:56,828][134294] Updated weights for policy 0, policy_version 187834 (0.0025) [2025-01-04 11:44:58,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14745.5, 300 sec: 14093.0). Total num frames: 769392640. Throughput: 0: 3665.1. Samples: 181519348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:44:58,969][134211] Avg episode reward: [(0, '9.535')] [2025-01-04 11:45:00,147][134294] Updated weights for policy 0, policy_version 187844 (0.0025) [2025-01-04 11:45:03,409][134294] Updated weights for policy 0, policy_version 187854 (0.0025) [2025-01-04 11:45:03,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14267.7, 300 sec: 14065.2). Total num frames: 769454080. Throughput: 0: 3642.9. Samples: 181528592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:03,968][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 11:45:06,521][134294] Updated weights for policy 0, policy_version 187864 (0.0025) [2025-01-04 11:45:08,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14199.5, 300 sec: 14051.4). Total num frames: 769519616. Throughput: 0: 3621.0. Samples: 181548040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:08,968][134211] Avg episode reward: [(0, '9.826')] [2025-01-04 11:45:09,670][134294] Updated weights for policy 0, policy_version 187874 (0.0024) [2025-01-04 11:45:12,812][134294] Updated weights for policy 0, policy_version 187884 (0.0025) [2025-01-04 11:45:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.4, 300 sec: 14037.5). Total num frames: 769585152. Throughput: 0: 3636.2. Samples: 181567632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:13,969][134211] Avg episode reward: [(0, '9.231')] [2025-01-04 11:45:14,048][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187888_769589248.pth... [2025-01-04 11:45:14,116][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187066_766222336.pth [2025-01-04 11:45:15,806][134294] Updated weights for policy 0, policy_version 187894 (0.0028) [2025-01-04 11:45:18,673][134294] Updated weights for policy 0, policy_version 187904 (0.0025) [2025-01-04 11:45:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.9, 300 sec: 14065.3). Total num frames: 769658880. Throughput: 0: 3657.4. Samples: 181578020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:18,968][134211] Avg episode reward: [(0, '9.356')] [2025-01-04 11:45:21,686][134294] Updated weights for policy 0, policy_version 187914 (0.0024) [2025-01-04 11:45:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 769724416. Throughput: 0: 3540.2. Samples: 181598920. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:23,968][134211] Avg episode reward: [(0, '9.324')] [2025-01-04 11:45:24,660][134294] Updated weights for policy 0, policy_version 187924 (0.0027) [2025-01-04 11:45:28,061][134294] Updated weights for policy 0, policy_version 187934 (0.0025) [2025-01-04 11:45:28,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.0, 300 sec: 14093.0). Total num frames: 769789952. Throughput: 0: 3287.5. Samples: 181618264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:28,968][134211] Avg episode reward: [(0, '8.615')] [2025-01-04 11:45:31,156][134294] Updated weights for policy 0, policy_version 187944 (0.0024) [2025-01-04 11:45:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.1, 300 sec: 13940.3). Total num frames: 769851392. Throughput: 0: 3281.6. Samples: 181627848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:33,968][134211] Avg episode reward: [(0, '9.409')] [2025-01-04 11:45:34,598][134294] Updated weights for policy 0, policy_version 187954 (0.0029) [2025-01-04 11:45:37,537][134294] Updated weights for policy 0, policy_version 187964 (0.0025) [2025-01-04 11:45:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13585.0, 300 sec: 13815.3). Total num frames: 769916928. Throughput: 0: 3274.4. Samples: 181647040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:38,969][134211] Avg episode reward: [(0, '9.440')] [2025-01-04 11:45:40,577][134294] Updated weights for policy 0, policy_version 187974 (0.0027) [2025-01-04 11:45:43,475][134294] Updated weights for policy 0, policy_version 187984 (0.0026) [2025-01-04 11:45:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13175.5, 300 sec: 13843.2). Total num frames: 769986560. Throughput: 0: 3297.4. Samples: 181667730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:43,969][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 11:45:46,445][134294] Updated weights for policy 0, policy_version 187994 (0.0027) [2025-01-04 11:45:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13243.9, 300 sec: 13870.9). Total num frames: 770056192. Throughput: 0: 3322.9. Samples: 181678122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:48,968][134211] Avg episode reward: [(0, '9.672')] [2025-01-04 11:45:49,480][134294] Updated weights for policy 0, policy_version 188004 (0.0024) [2025-01-04 11:45:52,414][134294] Updated weights for policy 0, policy_version 188014 (0.0024) [2025-01-04 11:45:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13312.0, 300 sec: 13898.6). Total num frames: 770125824. Throughput: 0: 3349.2. Samples: 181698756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:53,968][134211] Avg episode reward: [(0, '11.911')] [2025-01-04 11:45:53,977][134264] Saving new best policy, reward=11.911! [2025-01-04 11:45:55,398][134294] Updated weights for policy 0, policy_version 188024 (0.0024) [2025-01-04 11:45:57,986][134294] Updated weights for policy 0, policy_version 188034 (0.0020) [2025-01-04 11:45:58,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13516.9, 300 sec: 13940.3). Total num frames: 770203648. Throughput: 0: 3407.8. Samples: 181720984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:45:58,968][134211] Avg episode reward: [(0, '9.782')] [2025-01-04 11:46:00,444][134294] Updated weights for policy 0, policy_version 188044 (0.0018) [2025-01-04 11:46:03,757][134294] Updated weights for policy 0, policy_version 188054 (0.0025) [2025-01-04 11:46:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13585.1, 300 sec: 13940.3). Total num frames: 770269184. Throughput: 0: 3439.0. Samples: 181732774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:03,968][134211] Avg episode reward: [(0, '10.390')] [2025-01-04 11:46:06,706][134294] Updated weights for policy 0, policy_version 188064 (0.0023) [2025-01-04 11:46:08,641][134294] Updated weights for policy 0, policy_version 188074 (0.0013) [2025-01-04 11:46:08,967][134211] Fps is (10 sec: 15155.3, 60 sec: 13926.4, 300 sec: 14009.7). Total num frames: 770355200. Throughput: 0: 3425.7. Samples: 181753074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:08,968][134211] Avg episode reward: [(0, '9.835')] [2025-01-04 11:46:10,550][134294] Updated weights for policy 0, policy_version 188084 (0.0014) [2025-01-04 11:46:12,422][134294] Updated weights for policy 0, policy_version 188094 (0.0013) [2025-01-04 11:46:13,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14677.4, 300 sec: 14176.3). Total num frames: 770465792. Throughput: 0: 3719.2. Samples: 181785626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:13,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 11:46:14,326][134294] Updated weights for policy 0, policy_version 188104 (0.0014) [2025-01-04 11:46:16,317][134294] Updated weights for policy 0, policy_version 188114 (0.0014) [2025-01-04 11:46:18,968][134211] Fps is (10 sec: 19250.7, 60 sec: 14813.8, 300 sec: 14231.9). Total num frames: 770547712. Throughput: 0: 3855.9. Samples: 181801364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:18,969][134211] Avg episode reward: [(0, '10.098')] [2025-01-04 11:46:19,529][134294] Updated weights for policy 0, policy_version 188124 (0.0026) [2025-01-04 11:46:22,845][134294] Updated weights for policy 0, policy_version 188134 (0.0025) [2025-01-04 11:46:23,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14745.6, 300 sec: 14245.7). Total num frames: 770609152. Throughput: 0: 3851.4. Samples: 181820354. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:23,968][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 11:46:25,966][134294] Updated weights for policy 0, policy_version 188144 (0.0025) [2025-01-04 11:46:28,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14190.2). Total num frames: 770674688. Throughput: 0: 3819.5. Samples: 181839608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:28,968][134211] Avg episode reward: [(0, '8.377')] [2025-01-04 11:46:29,241][134294] Updated weights for policy 0, policy_version 188154 (0.0027) [2025-01-04 11:46:32,737][134294] Updated weights for policy 0, policy_version 188164 (0.0025) [2025-01-04 11:46:33,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14677.4, 300 sec: 14093.0). Total num frames: 770732032. Throughput: 0: 3785.0. Samples: 181848448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:33,968][134211] Avg episode reward: [(0, '8.676')] [2025-01-04 11:46:36,295][134294] Updated weights for policy 0, policy_version 188174 (0.0028) [2025-01-04 11:46:38,968][134211] Fps is (10 sec: 11468.7, 60 sec: 14540.8, 300 sec: 14065.2). Total num frames: 770789376. Throughput: 0: 3717.8. Samples: 181866058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:38,968][134211] Avg episode reward: [(0, '9.837')] [2025-01-04 11:46:39,705][134294] Updated weights for policy 0, policy_version 188184 (0.0023) [2025-01-04 11:46:43,043][134294] Updated weights for policy 0, policy_version 188194 (0.0032) [2025-01-04 11:46:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14472.6, 300 sec: 14051.4). Total num frames: 770854912. Throughput: 0: 3633.7. Samples: 181884502. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:46:43,968][134211] Avg episode reward: [(0, '9.037')] [2025-01-04 11:46:45,927][134294] Updated weights for policy 0, policy_version 188204 (0.0025) [2025-01-04 11:46:48,858][134294] Updated weights for policy 0, policy_version 188214 (0.0025) [2025-01-04 11:46:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14065.3). Total num frames: 770924544. Throughput: 0: 3606.5. Samples: 181895068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:46:48,968][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 11:46:51,710][134294] Updated weights for policy 0, policy_version 188224 (0.0024) [2025-01-04 11:46:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14093.0). Total num frames: 770994176. Throughput: 0: 3623.1. Samples: 181916114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:46:53,968][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 11:46:54,759][134294] Updated weights for policy 0, policy_version 188234 (0.0028) [2025-01-04 11:46:57,842][134294] Updated weights for policy 0, policy_version 188244 (0.0027) [2025-01-04 11:46:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14106.9). Total num frames: 771059712. Throughput: 0: 3341.6. Samples: 181935998. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:46:58,968][134211] Avg episode reward: [(0, '9.446')] [2025-01-04 11:47:00,969][134294] Updated weights for policy 0, policy_version 188254 (0.0026) [2025-01-04 11:47:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14267.7, 300 sec: 14065.2). Total num frames: 771125248. Throughput: 0: 3215.6. Samples: 181946066. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:03,968][134211] Avg episode reward: [(0, '9.364')] [2025-01-04 11:47:04,135][134294] Updated weights for policy 0, policy_version 188264 (0.0024) [2025-01-04 11:47:07,201][134294] Updated weights for policy 0, policy_version 188274 (0.0026) [2025-01-04 11:47:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 13981.9). Total num frames: 771190784. Throughput: 0: 3234.0. Samples: 181965882. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:08,968][134211] Avg episode reward: [(0, '10.138')] [2025-01-04 11:47:10,233][134294] Updated weights for policy 0, policy_version 188284 (0.0026) [2025-01-04 11:47:13,134][134294] Updated weights for policy 0, policy_version 188294 (0.0024) [2025-01-04 11:47:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13243.7, 300 sec: 13995.9). Total num frames: 771260416. Throughput: 0: 3259.3. Samples: 181986278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:13,969][134211] Avg episode reward: [(0, '9.783')] [2025-01-04 11:47:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000188296_771260416.pth... [2025-01-04 11:47:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187477_767905792.pth [2025-01-04 11:47:16,244][134294] Updated weights for policy 0, policy_version 188304 (0.0023) [2025-01-04 11:47:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13039.0, 300 sec: 13995.8). Total num frames: 771330048. Throughput: 0: 3284.8. Samples: 181996266. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:18,968][134211] Avg episode reward: [(0, '9.584')] [2025-01-04 11:47:19,150][134294] Updated weights for policy 0, policy_version 188314 (0.0025) [2025-01-04 11:47:22,132][134294] Updated weights for policy 0, policy_version 188324 (0.0026) [2025-01-04 11:47:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13175.5, 300 sec: 14037.5). Total num frames: 771399680. Throughput: 0: 3357.5. Samples: 182017144. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:23,968][134211] Avg episode reward: [(0, '10.521')] [2025-01-04 11:47:25,064][134294] Updated weights for policy 0, policy_version 188334 (0.0026) [2025-01-04 11:47:28,121][134294] Updated weights for policy 0, policy_version 188344 (0.0025) [2025-01-04 11:47:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13175.5, 300 sec: 14051.4). Total num frames: 771465216. Throughput: 0: 3401.6. Samples: 182037574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:28,968][134211] Avg episode reward: [(0, '9.570')] [2025-01-04 11:47:30,457][134294] Updated weights for policy 0, policy_version 188354 (0.0016) [2025-01-04 11:47:33,214][134294] Updated weights for policy 0, policy_version 188364 (0.0022) [2025-01-04 11:47:33,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13585.1, 300 sec: 13995.8). Total num frames: 771547136. Throughput: 0: 3466.8. Samples: 182051074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:33,968][134211] Avg episode reward: [(0, '9.984')] [2025-01-04 11:47:36,323][134294] Updated weights for policy 0, policy_version 188374 (0.0022) [2025-01-04 11:47:38,968][134211] Fps is (10 sec: 14745.3, 60 sec: 13721.6, 300 sec: 13870.9). Total num frames: 771612672. Throughput: 0: 3438.5. Samples: 182070848. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:38,968][134211] Avg episode reward: [(0, '9.295')] [2025-01-04 11:47:39,406][134294] Updated weights for policy 0, policy_version 188384 (0.0022) [2025-01-04 11:47:41,362][134294] Updated weights for policy 0, policy_version 188394 (0.0014) [2025-01-04 11:47:43,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 771698688. Throughput: 0: 3547.3. Samples: 182095626. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:43,968][134211] Avg episode reward: [(0, '8.523')] [2025-01-04 11:47:44,060][134294] Updated weights for policy 0, policy_version 188404 (0.0021) [2025-01-04 11:47:47,159][134294] Updated weights for policy 0, policy_version 188414 (0.0026) [2025-01-04 11:47:48,968][134211] Fps is (10 sec: 15154.8, 60 sec: 13994.6, 300 sec: 13940.3). Total num frames: 771764224. Throughput: 0: 3546.4. Samples: 182105654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:47:48,969][134211] Avg episode reward: [(0, '9.041')] [2025-01-04 11:47:50,205][134294] Updated weights for policy 0, policy_version 188424 (0.0026) [2025-01-04 11:47:53,179][134294] Updated weights for policy 0, policy_version 188434 (0.0023) [2025-01-04 11:47:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.7, 300 sec: 13954.2). Total num frames: 771833856. Throughput: 0: 3567.7. Samples: 182126430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:47:53,968][134211] Avg episode reward: [(0, '9.086')] [2025-01-04 11:47:56,103][134294] Updated weights for policy 0, policy_version 188444 (0.0025) [2025-01-04 11:47:58,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13994.6, 300 sec: 13968.1). Total num frames: 771899392. Throughput: 0: 3544.6. Samples: 182145786. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:47:58,968][134211] Avg episode reward: [(0, '9.532')] [2025-01-04 11:47:59,720][134294] Updated weights for policy 0, policy_version 188454 (0.0027) [2025-01-04 11:48:02,542][134294] Updated weights for policy 0, policy_version 188464 (0.0020) [2025-01-04 11:48:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14199.5, 300 sec: 14023.6). Total num frames: 771977216. Throughput: 0: 3518.2. Samples: 182154586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:03,968][134211] Avg episode reward: [(0, '9.741')] [2025-01-04 11:48:04,579][134294] Updated weights for policy 0, policy_version 188474 (0.0013) [2025-01-04 11:48:06,979][134294] Updated weights for policy 0, policy_version 188484 (0.0019) [2025-01-04 11:48:08,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14404.3, 300 sec: 14065.3). Total num frames: 772055040. Throughput: 0: 3653.6. Samples: 182181554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:08,968][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 11:48:10,036][134294] Updated weights for policy 0, policy_version 188494 (0.0026) [2025-01-04 11:48:13,116][134294] Updated weights for policy 0, policy_version 188504 (0.0024) [2025-01-04 11:48:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.1, 300 sec: 14051.4). Total num frames: 772120576. Throughput: 0: 3642.2. Samples: 182201472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:13,968][134211] Avg episode reward: [(0, '9.630')] [2025-01-04 11:48:16,343][134294] Updated weights for policy 0, policy_version 188514 (0.0025) [2025-01-04 11:48:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14267.7, 300 sec: 14051.4). Total num frames: 772186112. Throughput: 0: 3552.5. Samples: 182210938. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:18,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 11:48:19,543][134294] Updated weights for policy 0, policy_version 188524 (0.0024) [2025-01-04 11:48:22,525][134294] Updated weights for policy 0, policy_version 188534 (0.0025) [2025-01-04 11:48:23,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14199.5, 300 sec: 14065.2). Total num frames: 772251648. Throughput: 0: 3556.8. Samples: 182230904. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:23,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 11:48:25,522][134294] Updated weights for policy 0, policy_version 188544 (0.0027) [2025-01-04 11:48:28,926][134294] Updated weights for policy 0, policy_version 188554 (0.0024) [2025-01-04 11:48:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.4, 300 sec: 14065.3). Total num frames: 772317184. Throughput: 0: 3438.2. Samples: 182250344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:28,968][134211] Avg episode reward: [(0, '8.693')] [2025-01-04 11:48:32,369][134294] Updated weights for policy 0, policy_version 188564 (0.0025) [2025-01-04 11:48:33,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13789.8, 300 sec: 14037.5). Total num frames: 772374528. Throughput: 0: 3413.0. Samples: 182259238. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:33,968][134211] Avg episode reward: [(0, '9.787')] [2025-01-04 11:48:35,249][134294] Updated weights for policy 0, policy_version 188574 (0.0019) [2025-01-04 11:48:37,325][134294] Updated weights for policy 0, policy_version 188584 (0.0013) [2025-01-04 11:48:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14336.0, 300 sec: 14134.7). Total num frames: 772472832. Throughput: 0: 3471.3. Samples: 182282636. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:38,968][134211] Avg episode reward: [(0, '9.061')] [2025-01-04 11:48:39,243][134294] Updated weights for policy 0, policy_version 188594 (0.0012) [2025-01-04 11:48:41,107][134294] Updated weights for policy 0, policy_version 188604 (0.0014) [2025-01-04 11:48:43,002][134294] Updated weights for policy 0, policy_version 188614 (0.0013) [2025-01-04 11:48:43,968][134211] Fps is (10 sec: 20480.5, 60 sec: 14677.4, 300 sec: 14259.6). Total num frames: 772579328. Throughput: 0: 3757.8. Samples: 182314888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:43,968][134211] Avg episode reward: [(0, '8.775')] [2025-01-04 11:48:44,900][134294] Updated weights for policy 0, policy_version 188624 (0.0014) [2025-01-04 11:48:47,538][134294] Updated weights for policy 0, policy_version 188634 (0.0023) [2025-01-04 11:48:48,968][134211] Fps is (10 sec: 18841.2, 60 sec: 14950.5, 300 sec: 14315.2). Total num frames: 772661248. Throughput: 0: 3897.4. Samples: 182329968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:48,968][134211] Avg episode reward: [(0, '9.034')] [2025-01-04 11:48:50,843][134294] Updated weights for policy 0, policy_version 188644 (0.0031) [2025-01-04 11:48:53,960][134294] Updated weights for policy 0, policy_version 188654 (0.0027) [2025-01-04 11:48:53,969][134211] Fps is (10 sec: 14743.6, 60 sec: 14881.9, 300 sec: 14301.2). Total num frames: 772726784. Throughput: 0: 3720.6. Samples: 182348984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 11:48:53,970][134211] Avg episode reward: [(0, '10.086')] [2025-01-04 11:48:57,028][134294] Updated weights for policy 0, policy_version 188664 (0.0027) [2025-01-04 11:48:58,969][134211] Fps is (10 sec: 13105.8, 60 sec: 14881.9, 300 sec: 14217.9). Total num frames: 772792320. Throughput: 0: 3720.3. Samples: 182368888. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:48:58,969][134211] Avg episode reward: [(0, '8.894')] [2025-01-04 11:49:00,253][134294] Updated weights for policy 0, policy_version 188674 (0.0027) [2025-01-04 11:49:03,636][134294] Updated weights for policy 0, policy_version 188684 (0.0029) [2025-01-04 11:49:03,968][134211] Fps is (10 sec: 12699.0, 60 sec: 14609.0, 300 sec: 14190.2). Total num frames: 772853760. Throughput: 0: 3718.2. Samples: 182378258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:03,968][134211] Avg episode reward: [(0, '10.001')] [2025-01-04 11:49:06,638][134294] Updated weights for policy 0, policy_version 188694 (0.0024) [2025-01-04 11:49:08,969][134211] Fps is (10 sec: 12697.6, 60 sec: 14404.0, 300 sec: 14190.2). Total num frames: 772919296. Throughput: 0: 3705.5. Samples: 182397654. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:08,969][134211] Avg episode reward: [(0, '9.138')] [2025-01-04 11:49:09,872][134294] Updated weights for policy 0, policy_version 188704 (0.0027) [2025-01-04 11:49:12,789][134294] Updated weights for policy 0, policy_version 188714 (0.0025) [2025-01-04 11:49:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.2, 300 sec: 14176.3). Total num frames: 772984832. Throughput: 0: 3713.4. Samples: 182417446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:13,968][134211] Avg episode reward: [(0, '9.658')] [2025-01-04 11:49:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000188717_772984832.pth... 
[2025-01-04 11:49:14,068][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000187888_769589248.pth [2025-01-04 11:49:15,881][134294] Updated weights for policy 0, policy_version 188724 (0.0029) [2025-01-04 11:49:18,857][134294] Updated weights for policy 0, policy_version 188734 (0.0025) [2025-01-04 11:49:18,968][134211] Fps is (10 sec: 13518.3, 60 sec: 14472.6, 300 sec: 14190.2). Total num frames: 773054464. Throughput: 0: 3738.7. Samples: 182427478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:18,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 11:49:21,861][134294] Updated weights for policy 0, policy_version 188744 (0.0026) [2025-01-04 11:49:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14218.0). Total num frames: 773124096. Throughput: 0: 3680.8. Samples: 182448274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:23,968][134211] Avg episode reward: [(0, '9.099')] [2025-01-04 11:49:24,891][134294] Updated weights for policy 0, policy_version 188754 (0.0025) [2025-01-04 11:49:28,094][134294] Updated weights for policy 0, policy_version 188764 (0.0025) [2025-01-04 11:49:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14176.3). Total num frames: 773185536. Throughput: 0: 3397.1. Samples: 182467758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:28,968][134211] Avg episode reward: [(0, '10.385')] [2025-01-04 11:49:31,482][134294] Updated weights for policy 0, policy_version 188774 (0.0025) [2025-01-04 11:49:33,968][134211] Fps is (10 sec: 12287.4, 60 sec: 14540.7, 300 sec: 14051.3). Total num frames: 773246976. Throughput: 0: 3266.3. Samples: 182476952. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:33,970][134211] Avg episode reward: [(0, '9.623')] [2025-01-04 11:49:34,774][134294] Updated weights for policy 0, policy_version 188784 (0.0025) [2025-01-04 11:49:37,828][134294] Updated weights for policy 0, policy_version 188794 (0.0026) [2025-01-04 11:49:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13994.7, 300 sec: 13954.2). Total num frames: 773312512. Throughput: 0: 3272.1. Samples: 182496224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:38,968][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 11:49:40,892][134294] Updated weights for policy 0, policy_version 188804 (0.0025) [2025-01-04 11:49:43,968][134211] Fps is (10 sec: 13107.8, 60 sec: 13311.9, 300 sec: 13954.2). Total num frames: 773378048. Throughput: 0: 3273.1. Samples: 182516176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:43,968][134211] Avg episode reward: [(0, '9.317')] [2025-01-04 11:49:44,041][134294] Updated weights for policy 0, policy_version 188814 (0.0026) [2025-01-04 11:49:47,096][134294] Updated weights for policy 0, policy_version 188824 (0.0023) [2025-01-04 11:49:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13107.2, 300 sec: 13968.1). Total num frames: 773447680. Throughput: 0: 3278.9. Samples: 182525806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:48,968][134211] Avg episode reward: [(0, '9.373')] [2025-01-04 11:49:50,101][134294] Updated weights for policy 0, policy_version 188834 (0.0025) [2025-01-04 11:49:52,949][134294] Updated weights for policy 0, policy_version 188844 (0.0026) [2025-01-04 11:49:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13175.7, 300 sec: 13982.0). Total num frames: 773517312. Throughput: 0: 3313.2. Samples: 182546744. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:53,968][134211] Avg episode reward: [(0, '9.930')] [2025-01-04 11:49:55,885][134294] Updated weights for policy 0, policy_version 188854 (0.0027) [2025-01-04 11:49:58,935][134294] Updated weights for policy 0, policy_version 188864 (0.0026) [2025-01-04 11:49:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13244.0, 300 sec: 14009.7). Total num frames: 773586944. Throughput: 0: 3336.5. Samples: 182567588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:49:58,968][134211] Avg episode reward: [(0, '9.259')] [2025-01-04 11:50:02,277][134294] Updated weights for policy 0, policy_version 188874 (0.0026) [2025-01-04 11:50:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13243.7, 300 sec: 13995.8). Total num frames: 773648384. Throughput: 0: 3320.8. Samples: 182576914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:50:03,968][134211] Avg episode reward: [(0, '10.389')] [2025-01-04 11:50:05,053][134294] Updated weights for policy 0, policy_version 188884 (0.0022) [2025-01-04 11:50:06,932][134294] Updated weights for policy 0, policy_version 188894 (0.0014) [2025-01-04 11:50:08,795][134294] Updated weights for policy 0, policy_version 188904 (0.0013) [2025-01-04 11:50:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 13858.4, 300 sec: 14120.8). Total num frames: 773750784. Throughput: 0: 3418.9. Samples: 182602124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:08,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 11:50:10,697][134294] Updated weights for policy 0, policy_version 188914 (0.0012) [2025-01-04 11:50:12,548][134294] Updated weights for policy 0, policy_version 188924 (0.0013) [2025-01-04 11:50:13,968][134211] Fps is (10 sec: 21298.5, 60 sec: 14609.0, 300 sec: 14245.7). Total num frames: 773861376. Throughput: 0: 3710.4. Samples: 182634728. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:13,969][134211] Avg episode reward: [(0, '9.274')] [2025-01-04 11:50:14,672][134294] Updated weights for policy 0, policy_version 188934 (0.0016) [2025-01-04 11:50:17,768][134294] Updated weights for policy 0, policy_version 188944 (0.0027) [2025-01-04 11:50:18,969][134211] Fps is (10 sec: 17609.4, 60 sec: 14540.4, 300 sec: 14245.7). Total num frames: 773926912. Throughput: 0: 3773.4. Samples: 182646760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:18,970][134211] Avg episode reward: [(0, '9.966')] [2025-01-04 11:50:20,930][134294] Updated weights for policy 0, policy_version 188954 (0.0026) [2025-01-04 11:50:23,970][134211] Fps is (10 sec: 13514.3, 60 sec: 14540.3, 300 sec: 14259.5). Total num frames: 773996544. Throughput: 0: 3777.0. Samples: 182666196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:23,971][134211] Avg episode reward: [(0, '9.442')] [2025-01-04 11:50:23,973][134294] Updated weights for policy 0, policy_version 188964 (0.0026) [2025-01-04 11:50:27,141][134294] Updated weights for policy 0, policy_version 188974 (0.0026) [2025-01-04 11:50:28,968][134211] Fps is (10 sec: 12699.1, 60 sec: 14472.4, 300 sec: 14245.7). Total num frames: 774053888. Throughput: 0: 3761.5. Samples: 182685444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:28,969][134211] Avg episode reward: [(0, '9.752')] [2025-01-04 11:50:30,684][134294] Updated weights for policy 0, policy_version 188984 (0.0028) [2025-01-04 11:50:33,968][134211] Fps is (10 sec: 11880.5, 60 sec: 14472.6, 300 sec: 14231.8). Total num frames: 774115328. Throughput: 0: 3743.5. Samples: 182694266. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:33,969][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 11:50:34,189][134294] Updated weights for policy 0, policy_version 188994 (0.0023) [2025-01-04 11:50:37,529][134294] Updated weights for policy 0, policy_version 189004 (0.0026) [2025-01-04 11:50:38,968][134211] Fps is (10 sec: 12288.7, 60 sec: 14404.2, 300 sec: 14204.1). Total num frames: 774176768. Throughput: 0: 3676.5. Samples: 182712186. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:38,968][134211] Avg episode reward: [(0, '8.113')] [2025-01-04 11:50:40,828][134294] Updated weights for policy 0, policy_version 189014 (0.0022) [2025-01-04 11:50:43,946][134294] Updated weights for policy 0, policy_version 189024 (0.0028) [2025-01-04 11:50:43,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14404.3, 300 sec: 14190.2). Total num frames: 774242304. Throughput: 0: 3642.8. Samples: 182731516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:43,968][134211] Avg episode reward: [(0, '9.459')] [2025-01-04 11:50:46,776][134294] Updated weights for policy 0, policy_version 189034 (0.0026) [2025-01-04 11:50:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14176.3). Total num frames: 774307840. Throughput: 0: 3662.4. Samples: 182741722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:48,968][134211] Avg episode reward: [(0, '9.127')] [2025-01-04 11:50:49,899][134294] Updated weights for policy 0, policy_version 189044 (0.0025) [2025-01-04 11:50:52,833][134294] Updated weights for policy 0, policy_version 189054 (0.0026) [2025-01-04 11:50:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14336.0, 300 sec: 14148.5). Total num frames: 774377472. Throughput: 0: 3557.0. Samples: 182762190. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:53,969][134211] Avg episode reward: [(0, '10.058')] [2025-01-04 11:50:55,869][134294] Updated weights for policy 0, policy_version 189064 (0.0025) [2025-01-04 11:50:58,793][134294] Updated weights for policy 0, policy_version 189074 (0.0023) [2025-01-04 11:50:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 774447104. Throughput: 0: 3297.4. Samples: 182783108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:50:58,968][134211] Avg episode reward: [(0, '9.060')] [2025-01-04 11:51:02,007][134294] Updated weights for policy 0, policy_version 189084 (0.0028) [2025-01-04 11:51:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14079.1). Total num frames: 774508544. Throughput: 0: 3239.2. Samples: 182792520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:51:03,968][134211] Avg episode reward: [(0, '8.246')] [2025-01-04 11:51:05,249][134294] Updated weights for policy 0, policy_version 189094 (0.0025) [2025-01-04 11:51:08,091][134294] Updated weights for policy 0, policy_version 189104 (0.0024) [2025-01-04 11:51:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13789.8, 300 sec: 13940.3). Total num frames: 774578176. Throughput: 0: 3250.5. Samples: 182812460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 11:51:08,968][134211] Avg episode reward: [(0, '9.508')] [2025-01-04 11:51:11,119][134294] Updated weights for policy 0, policy_version 189114 (0.0026) [2025-01-04 11:51:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13107.3, 300 sec: 13898.6). Total num frames: 774647808. Throughput: 0: 3273.2. Samples: 182832738. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:13,968][134211] Avg episode reward: [(0, '10.483')] [2025-01-04 11:51:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189123_774647808.pth... 
[2025-01-04 11:51:14,072][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000188296_771260416.pth [2025-01-04 11:51:14,269][134294] Updated weights for policy 0, policy_version 189124 (0.0022) [2025-01-04 11:51:17,364][134294] Updated weights for policy 0, policy_version 189134 (0.0025) [2025-01-04 11:51:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13107.6, 300 sec: 13912.5). Total num frames: 774713344. Throughput: 0: 3291.9. Samples: 182842400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:18,968][134211] Avg episode reward: [(0, '9.068')] [2025-01-04 11:51:20,404][134294] Updated weights for policy 0, policy_version 189144 (0.0026) [2025-01-04 11:51:22,449][134294] Updated weights for policy 0, policy_version 189154 (0.0013) [2025-01-04 11:51:23,968][134211] Fps is (10 sec: 14745.1, 60 sec: 13312.4, 300 sec: 13968.0). Total num frames: 774795264. Throughput: 0: 3412.0. Samples: 182865728. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:23,970][134211] Avg episode reward: [(0, '8.320')] [2025-01-04 11:51:25,185][134294] Updated weights for policy 0, policy_version 189164 (0.0022) [2025-01-04 11:51:28,195][134294] Updated weights for policy 0, policy_version 189174 (0.0025) [2025-01-04 11:51:28,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13516.9, 300 sec: 14009.7). Total num frames: 774864896. Throughput: 0: 3466.9. Samples: 182887526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:28,968][134211] Avg episode reward: [(0, '9.141')] [2025-01-04 11:51:31,416][134294] Updated weights for policy 0, policy_version 189184 (0.0026) [2025-01-04 11:51:33,968][134211] Fps is (10 sec: 13107.7, 60 sec: 13516.9, 300 sec: 14023.6). Total num frames: 774926336. Throughput: 0: 3447.5. Samples: 182896858. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:33,969][134211] Avg episode reward: [(0, '8.791')] [2025-01-04 11:51:34,805][134294] Updated weights for policy 0, policy_version 189194 (0.0026) [2025-01-04 11:51:37,661][134294] Updated weights for policy 0, policy_version 189204 (0.0024) [2025-01-04 11:51:38,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13789.9, 300 sec: 14065.3). Total num frames: 775004160. Throughput: 0: 3423.5. Samples: 182916248. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:38,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 11:51:39,714][134294] Updated weights for policy 0, policy_version 189214 (0.0013) [2025-01-04 11:51:41,613][134294] Updated weights for policy 0, policy_version 189224 (0.0013) [2025-01-04 11:51:43,515][134294] Updated weights for policy 0, policy_version 189234 (0.0014) [2025-01-04 11:51:43,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14472.5, 300 sec: 14190.2). Total num frames: 775110656. Throughput: 0: 3648.2. Samples: 182947280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:43,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 11:51:45,528][134294] Updated weights for policy 0, policy_version 189244 (0.0014) [2025-01-04 11:51:48,780][134294] Updated weights for policy 0, policy_version 189254 (0.0026) [2025-01-04 11:51:48,968][134211] Fps is (10 sec: 18022.0, 60 sec: 14609.0, 300 sec: 14204.1). Total num frames: 775184384. Throughput: 0: 3759.7. Samples: 182961708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:48,969][134211] Avg episode reward: [(0, '9.962')] [2025-01-04 11:51:52,477][134294] Updated weights for policy 0, policy_version 189264 (0.0027) [2025-01-04 11:51:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14404.2, 300 sec: 14176.3). Total num frames: 775241728. Throughput: 0: 3693.0. Samples: 182978648. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:53,969][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 11:51:55,880][134294] Updated weights for policy 0, policy_version 189274 (0.0027) [2025-01-04 11:51:58,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14267.7, 300 sec: 14162.4). Total num frames: 775303168. Throughput: 0: 3652.6. Samples: 182997106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:51:58,968][134211] Avg episode reward: [(0, '9.316')] [2025-01-04 11:51:59,159][134294] Updated weights for policy 0, policy_version 189284 (0.0025) [2025-01-04 11:52:02,736][134294] Updated weights for policy 0, policy_version 189294 (0.0026) [2025-01-04 11:52:03,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14199.5, 300 sec: 14134.7). Total num frames: 775360512. Throughput: 0: 3630.3. Samples: 183005764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:52:03,969][134211] Avg episode reward: [(0, '9.632')] [2025-01-04 11:52:06,171][134294] Updated weights for policy 0, policy_version 189304 (0.0027) [2025-01-04 11:52:08,968][134211] Fps is (10 sec: 11878.6, 60 sec: 14062.9, 300 sec: 14106.9). Total num frames: 775421952. Throughput: 0: 3510.0. Samples: 183023676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:52:08,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 11:52:09,454][134294] Updated weights for policy 0, policy_version 189314 (0.0026) [2025-01-04 11:52:12,581][134294] Updated weights for policy 0, policy_version 189324 (0.0028) [2025-01-04 11:52:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 775487488. Throughput: 0: 3453.5. Samples: 183042932. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:52:13,968][134211] Avg episode reward: [(0, '9.248')] [2025-01-04 11:52:15,508][134294] Updated weights for policy 0, policy_version 189334 (0.0024) [2025-01-04 11:52:18,456][134294] Updated weights for policy 0, policy_version 189344 (0.0026) [2025-01-04 11:52:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14063.0, 300 sec: 14093.0). Total num frames: 775557120. Throughput: 0: 3480.9. Samples: 183053496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:18,968][134211] Avg episode reward: [(0, '9.084')] [2025-01-04 11:52:21,398][134294] Updated weights for policy 0, policy_version 189354 (0.0026) [2025-01-04 11:52:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.2, 300 sec: 14106.9). Total num frames: 775626752. Throughput: 0: 3509.5. Samples: 183074176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:23,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 11:52:24,490][134294] Updated weights for policy 0, policy_version 189364 (0.0026) [2025-01-04 11:52:27,274][134294] Updated weights for policy 0, policy_version 189374 (0.0022) [2025-01-04 11:52:28,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14063.0, 300 sec: 14106.9). Total num frames: 775708672. Throughput: 0: 3324.7. Samples: 183096890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:28,968][134211] Avg episode reward: [(0, '8.971')] [2025-01-04 11:52:29,218][134294] Updated weights for policy 0, policy_version 189384 (0.0011) [2025-01-04 11:52:31,361][134294] Updated weights for policy 0, policy_version 189394 (0.0014) [2025-01-04 11:52:33,512][134294] Updated weights for policy 0, policy_version 189404 (0.0014) [2025-01-04 11:52:33,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14609.1, 300 sec: 14204.1). Total num frames: 775802880. Throughput: 0: 3336.3. Samples: 183111840. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:33,968][134211] Avg episode reward: [(0, '10.389')] [2025-01-04 11:52:36,764][134294] Updated weights for policy 0, policy_version 189414 (0.0025) [2025-01-04 11:52:38,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 775864320. Throughput: 0: 3457.5. Samples: 183134236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:38,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 11:52:40,237][134294] Updated weights for policy 0, policy_version 189424 (0.0028) [2025-01-04 11:52:43,337][134294] Updated weights for policy 0, policy_version 189434 (0.0026) [2025-01-04 11:52:43,970][134211] Fps is (10 sec: 12285.3, 60 sec: 13584.6, 300 sec: 14106.8). Total num frames: 775925760. Throughput: 0: 3464.8. Samples: 183153030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:43,970][134211] Avg episode reward: [(0, '9.105')] [2025-01-04 11:52:46,386][134294] Updated weights for policy 0, policy_version 189444 (0.0026) [2025-01-04 11:52:48,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13448.3, 300 sec: 14093.0). Total num frames: 775991296. Throughput: 0: 3495.4. Samples: 183163062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:48,970][134211] Avg episode reward: [(0, '10.276')] [2025-01-04 11:52:49,565][134294] Updated weights for policy 0, policy_version 189454 (0.0027) [2025-01-04 11:52:52,615][134294] Updated weights for policy 0, policy_version 189464 (0.0026) [2025-01-04 11:52:53,968][134211] Fps is (10 sec: 13519.5, 60 sec: 13653.4, 300 sec: 14106.9). Total num frames: 776060928. Throughput: 0: 3537.7. Samples: 183182872. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:53,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 11:52:55,637][134294] Updated weights for policy 0, policy_version 189474 (0.0026) [2025-01-04 11:52:58,644][134294] Updated weights for policy 0, policy_version 189484 (0.0023) [2025-01-04 11:52:58,968][134211] Fps is (10 sec: 13518.3, 60 sec: 13721.6, 300 sec: 14065.2). Total num frames: 776126464. Throughput: 0: 3563.6. Samples: 183203292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:52:58,968][134211] Avg episode reward: [(0, '8.862')] [2025-01-04 11:53:01,940][134294] Updated weights for policy 0, policy_version 189494 (0.0028) [2025-01-04 11:53:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13789.9, 300 sec: 14009.7). Total num frames: 776187904. Throughput: 0: 3531.7. Samples: 183212422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:53:03,968][134211] Avg episode reward: [(0, '8.460')] [2025-01-04 11:53:05,350][134294] Updated weights for policy 0, policy_version 189504 (0.0027) [2025-01-04 11:53:08,374][134294] Updated weights for policy 0, policy_version 189514 (0.0023) [2025-01-04 11:53:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 14009.7). Total num frames: 776253440. Throughput: 0: 3495.0. Samples: 183231452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:53:08,968][134211] Avg episode reward: [(0, '9.316')] [2025-01-04 11:53:11,419][134294] Updated weights for policy 0, policy_version 189524 (0.0025) [2025-01-04 11:53:13,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13926.3, 300 sec: 14023.6). Total num frames: 776323072. Throughput: 0: 3431.2. Samples: 183251298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:53:13,969][134211] Avg episode reward: [(0, '8.857')] [2025-01-04 11:53:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189532_776323072.pth... 
[2025-01-04 11:53:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000188717_772984832.pth [2025-01-04 11:53:14,672][134294] Updated weights for policy 0, policy_version 189534 (0.0026) [2025-01-04 11:53:17,640][134294] Updated weights for policy 0, policy_version 189544 (0.0024) [2025-01-04 11:53:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.1, 300 sec: 14023.6). Total num frames: 776388608. Throughput: 0: 3320.2. Samples: 183261248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 11:53:18,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 11:53:20,629][134294] Updated weights for policy 0, policy_version 189554 (0.0026) [2025-01-04 11:53:23,545][134294] Updated weights for policy 0, policy_version 189564 (0.0024) [2025-01-04 11:53:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13858.1, 300 sec: 14037.5). Total num frames: 776458240. Throughput: 0: 3285.6. Samples: 183282088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:23,968][134211] Avg episode reward: [(0, '11.094')] [2025-01-04 11:53:26,477][134294] Updated weights for policy 0, policy_version 189574 (0.0024) [2025-01-04 11:53:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13585.0, 300 sec: 14065.2). Total num frames: 776523776. Throughput: 0: 3316.6. Samples: 183302272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:28,968][134211] Avg episode reward: [(0, '9.219')] [2025-01-04 11:53:29,913][134294] Updated weights for policy 0, policy_version 189584 (0.0026) [2025-01-04 11:53:32,087][134294] Updated weights for policy 0, policy_version 189594 (0.0014) [2025-01-04 11:53:33,967][134211] Fps is (10 sec: 15565.2, 60 sec: 13516.8, 300 sec: 14037.5). Total num frames: 776613888. Throughput: 0: 3328.4. Samples: 183312834. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:33,968][134211] Avg episode reward: [(0, '9.507')] [2025-01-04 11:53:34,114][134294] Updated weights for policy 0, policy_version 189604 (0.0012) [2025-01-04 11:53:36,054][134294] Updated weights for policy 0, policy_version 189614 (0.0014) [2025-01-04 11:53:37,954][134294] Updated weights for policy 0, policy_version 189624 (0.0013) [2025-01-04 11:53:38,967][134211] Fps is (10 sec: 19661.4, 60 sec: 14267.8, 300 sec: 14037.5). Total num frames: 776720384. Throughput: 0: 3581.1. Samples: 183344022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:38,968][134211] Avg episode reward: [(0, '8.863')] [2025-01-04 11:53:39,949][134294] Updated weights for policy 0, policy_version 189634 (0.0015) [2025-01-04 11:53:43,055][134294] Updated weights for policy 0, policy_version 189644 (0.0027) [2025-01-04 11:53:43,968][134211] Fps is (10 sec: 17612.1, 60 sec: 14404.7, 300 sec: 13995.8). Total num frames: 776790016. Throughput: 0: 3685.8. Samples: 183369152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:43,969][134211] Avg episode reward: [(0, '9.238')] [2025-01-04 11:53:46,216][134294] Updated weights for policy 0, policy_version 189654 (0.0026) [2025-01-04 11:53:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.6, 300 sec: 13995.9). Total num frames: 776855552. Throughput: 0: 3699.6. Samples: 183378902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:48,968][134211] Avg episode reward: [(0, '9.098')] [2025-01-04 11:53:49,424][134294] Updated weights for policy 0, policy_version 189664 (0.0027) [2025-01-04 11:53:52,430][134294] Updated weights for policy 0, policy_version 189674 (0.0025) [2025-01-04 11:53:53,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 13995.9). Total num frames: 776921088. Throughput: 0: 3709.9. Samples: 183398400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:53,969][134211] Avg episode reward: [(0, '9.712')] [2025-01-04 11:53:55,519][134294] Updated weights for policy 0, policy_version 189684 (0.0027) [2025-01-04 11:53:58,694][134294] Updated weights for policy 0, policy_version 189694 (0.0026) [2025-01-04 11:53:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.0, 300 sec: 14009.7). Total num frames: 776986624. Throughput: 0: 3709.9. Samples: 183418242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:53:58,968][134211] Avg episode reward: [(0, '10.237')] [2025-01-04 11:54:02,219][134294] Updated weights for policy 0, policy_version 189704 (0.0025) [2025-01-04 11:54:03,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14267.7, 300 sec: 13982.0). Total num frames: 777043968. Throughput: 0: 3686.7. Samples: 183427152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:03,969][134211] Avg episode reward: [(0, '9.938')] [2025-01-04 11:54:05,520][134294] Updated weights for policy 0, policy_version 189714 (0.0025) [2025-01-04 11:54:08,439][134294] Updated weights for policy 0, policy_version 189724 (0.0024) [2025-01-04 11:54:08,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14336.0, 300 sec: 13995.8). Total num frames: 777113600. Throughput: 0: 3644.2. Samples: 183446078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:08,968][134211] Avg episode reward: [(0, '10.715')] [2025-01-04 11:54:11,402][134294] Updated weights for policy 0, policy_version 189734 (0.0023) [2025-01-04 11:54:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.1, 300 sec: 13995.8). Total num frames: 777183232. Throughput: 0: 3653.5. Samples: 183466678. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:13,969][134211] Avg episode reward: [(0, '9.856')] [2025-01-04 11:54:14,473][134294] Updated weights for policy 0, policy_version 189744 (0.0025) [2025-01-04 11:54:17,476][134294] Updated weights for policy 0, policy_version 189754 (0.0027) [2025-01-04 11:54:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14404.2, 300 sec: 13995.8). Total num frames: 777252864. Throughput: 0: 3643.8. Samples: 183476806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:18,968][134211] Avg episode reward: [(0, '9.884')] [2025-01-04 11:54:20,471][134294] Updated weights for policy 0, policy_version 189764 (0.0026) [2025-01-04 11:54:23,398][134294] Updated weights for policy 0, policy_version 189774 (0.0025) [2025-01-04 11:54:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14404.3, 300 sec: 14023.6). Total num frames: 777322496. Throughput: 0: 3417.2. Samples: 183497798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:23,968][134211] Avg episode reward: [(0, '9.570')] [2025-01-04 11:54:26,195][134294] Updated weights for policy 0, policy_version 189784 (0.0027) [2025-01-04 11:54:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14051.4). Total num frames: 777392128. Throughput: 0: 3323.1. Samples: 183518692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:54:28,968][134211] Avg episode reward: [(0, '9.420')] [2025-01-04 11:54:29,303][134294] Updated weights for policy 0, policy_version 189794 (0.0026) [2025-01-04 11:54:32,814][134294] Updated weights for policy 0, policy_version 189804 (0.0023) [2025-01-04 11:54:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.3, 300 sec: 14023.6). Total num frames: 777449472. Throughput: 0: 3313.4. Samples: 183528006. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:33,968][134211] Avg episode reward: [(0, '8.933')] [2025-01-04 11:54:36,109][134294] Updated weights for policy 0, policy_version 189814 (0.0024) [2025-01-04 11:54:38,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13107.1, 300 sec: 13995.8). Total num frames: 777506816. Throughput: 0: 3273.4. Samples: 183545704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:38,968][134211] Avg episode reward: [(0, '10.044')] [2025-01-04 11:54:39,738][134294] Updated weights for policy 0, policy_version 189824 (0.0027) [2025-01-04 11:54:42,093][134294] Updated weights for policy 0, policy_version 189834 (0.0018) [2025-01-04 11:54:43,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13448.5, 300 sec: 14065.2). Total num frames: 777596928. Throughput: 0: 3342.6. Samples: 183568660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:43,968][134211] Avg episode reward: [(0, '8.386')] [2025-01-04 11:54:44,072][134294] Updated weights for policy 0, policy_version 189844 (0.0012) [2025-01-04 11:54:45,986][134294] Updated weights for policy 0, policy_version 189854 (0.0015) [2025-01-04 11:54:48,441][134294] Updated weights for policy 0, policy_version 189864 (0.0019) [2025-01-04 11:54:48,969][134211] Fps is (10 sec: 18020.6, 60 sec: 13857.9, 300 sec: 14134.6). Total num frames: 777687040. Throughput: 0: 3501.8. Samples: 183584736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:48,969][134211] Avg episode reward: [(0, '8.796')] [2025-01-04 11:54:51,652][134294] Updated weights for policy 0, policy_version 189874 (0.0031) [2025-01-04 11:54:53,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13858.1, 300 sec: 14120.8). Total num frames: 777752576. Throughput: 0: 3551.5. Samples: 183605896. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:53,968][134211] Avg episode reward: [(0, '9.257')] [2025-01-04 11:54:54,829][134294] Updated weights for policy 0, policy_version 189884 (0.0026) [2025-01-04 11:54:57,940][134294] Updated weights for policy 0, policy_version 189894 (0.0024) [2025-01-04 11:54:58,968][134211] Fps is (10 sec: 13108.5, 60 sec: 13858.1, 300 sec: 14134.7). Total num frames: 777818112. Throughput: 0: 3529.5. Samples: 183625506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:54:58,968][134211] Avg episode reward: [(0, '9.224')] [2025-01-04 11:55:01,100][134294] Updated weights for policy 0, policy_version 189904 (0.0027) [2025-01-04 11:55:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13926.4, 300 sec: 13995.8). Total num frames: 777879552. Throughput: 0: 3515.5. Samples: 183635004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:03,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 11:55:04,630][134294] Updated weights for policy 0, policy_version 189914 (0.0024) [2025-01-04 11:55:07,937][134294] Updated weights for policy 0, policy_version 189924 (0.0027) [2025-01-04 11:55:08,968][134211] Fps is (10 sec: 12287.2, 60 sec: 13789.7, 300 sec: 13829.2). Total num frames: 777940992. Throughput: 0: 3446.5. Samples: 183652892. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:08,969][134211] Avg episode reward: [(0, '9.903')] [2025-01-04 11:55:11,003][134294] Updated weights for policy 0, policy_version 189934 (0.0026) [2025-01-04 11:55:13,969][134211] Fps is (10 sec: 12696.6, 60 sec: 13721.4, 300 sec: 13829.2). Total num frames: 778006528. Throughput: 0: 3428.2. Samples: 183672964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:13,969][134211] Avg episode reward: [(0, '9.716')] [2025-01-04 11:55:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189943_778006528.pth... 
[2025-01-04 11:55:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189123_774647808.pth [2025-01-04 11:55:14,139][134294] Updated weights for policy 0, policy_version 189944 (0.0024) [2025-01-04 11:55:17,228][134294] Updated weights for policy 0, policy_version 189954 (0.0026) [2025-01-04 11:55:18,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13653.3, 300 sec: 13815.4). Total num frames: 778072064. Throughput: 0: 3432.6. Samples: 183682474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:18,968][134211] Avg episode reward: [(0, '9.375')] [2025-01-04 11:55:20,291][134294] Updated weights for policy 0, policy_version 189964 (0.0026) [2025-01-04 11:55:23,151][134294] Updated weights for policy 0, policy_version 189974 (0.0023) [2025-01-04 11:55:23,968][134211] Fps is (10 sec: 13518.0, 60 sec: 13653.3, 300 sec: 13857.0). Total num frames: 778141696. Throughput: 0: 3497.4. Samples: 183703086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:23,968][134211] Avg episode reward: [(0, '9.452')] [2025-01-04 11:55:26,153][134294] Updated weights for policy 0, policy_version 189984 (0.0027) [2025-01-04 11:55:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13653.3, 300 sec: 13884.8). Total num frames: 778211328. Throughput: 0: 3447.0. Samples: 183723776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:28,969][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 11:55:29,203][134294] Updated weights for policy 0, policy_version 189994 (0.0028) [2025-01-04 11:55:32,105][134294] Updated weights for policy 0, policy_version 190004 (0.0020) [2025-01-04 11:55:33,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14063.0, 300 sec: 13954.2). Total num frames: 778293248. Throughput: 0: 3307.4. Samples: 183733566. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:55:33,968][134211] Avg episode reward: [(0, '9.747')] [2025-01-04 11:55:34,162][134294] Updated weights for policy 0, policy_version 190014 (0.0013) [2025-01-04 11:55:36,202][134294] Updated weights for policy 0, policy_version 190024 (0.0013) [2025-01-04 11:55:38,189][134294] Updated weights for policy 0, policy_version 190034 (0.0014) [2025-01-04 11:55:38,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14745.6, 300 sec: 14065.3). Total num frames: 778391552. Throughput: 0: 3482.7. Samples: 183762618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:55:38,968][134211] Avg episode reward: [(0, '9.499')] [2025-01-04 11:55:40,924][134294] Updated weights for policy 0, policy_version 190044 (0.0023) [2025-01-04 11:55:43,970][134211] Fps is (10 sec: 16380.8, 60 sec: 14335.6, 300 sec: 14065.2). Total num frames: 778457088. Throughput: 0: 3561.6. Samples: 183785786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:55:43,970][134211] Avg episode reward: [(0, '8.334')] [2025-01-04 11:55:43,991][134294] Updated weights for policy 0, policy_version 190054 (0.0028) [2025-01-04 11:55:47,240][134294] Updated weights for policy 0, policy_version 190064 (0.0027) [2025-01-04 11:55:48,968][134211] Fps is (10 sec: 13106.5, 60 sec: 13926.5, 300 sec: 14051.3). Total num frames: 778522624. Throughput: 0: 3564.9. Samples: 183795428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:55:48,969][134211] Avg episode reward: [(0, '8.222')] [2025-01-04 11:55:50,229][134294] Updated weights for policy 0, policy_version 190074 (0.0023) [2025-01-04 11:55:53,240][134294] Updated weights for policy 0, policy_version 190084 (0.0025) [2025-01-04 11:55:53,968][134211] Fps is (10 sec: 13518.9, 60 sec: 13994.6, 300 sec: 14051.3). Total num frames: 778592256. Throughput: 0: 3620.2. Samples: 183815800. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:55:53,969][134211] Avg episode reward: [(0, '9.830')] [2025-01-04 11:55:56,308][134294] Updated weights for policy 0, policy_version 190094 (0.0028) [2025-01-04 11:55:58,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13994.7, 300 sec: 14065.3). Total num frames: 778657792. Throughput: 0: 3613.9. Samples: 183835588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:55:58,968][134211] Avg episode reward: [(0, '9.323')] [2025-01-04 11:55:59,501][134294] Updated weights for policy 0, policy_version 190104 (0.0026) [2025-01-04 11:56:02,897][134294] Updated weights for policy 0, policy_version 190114 (0.0023) [2025-01-04 11:56:03,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13994.7, 300 sec: 14037.5). Total num frames: 778719232. Throughput: 0: 3606.6. Samples: 183844770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:03,968][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 11:56:06,292][134294] Updated weights for policy 0, policy_version 190124 (0.0023) [2025-01-04 11:56:08,969][134211] Fps is (10 sec: 12286.6, 60 sec: 13994.6, 300 sec: 14009.7). Total num frames: 778780672. Throughput: 0: 3545.1. Samples: 183862620. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:08,969][134211] Avg episode reward: [(0, '9.350')] [2025-01-04 11:56:09,671][134294] Updated weights for policy 0, policy_version 190134 (0.0025) [2025-01-04 11:56:12,661][134294] Updated weights for policy 0, policy_version 190144 (0.0022) [2025-01-04 11:56:13,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13994.8, 300 sec: 14009.7). Total num frames: 778846208. Throughput: 0: 3525.1. Samples: 183882404. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:13,969][134211] Avg episode reward: [(0, '8.479')] [2025-01-04 11:56:15,637][134294] Updated weights for policy 0, policy_version 190154 (0.0026) [2025-01-04 11:56:18,523][134294] Updated weights for policy 0, policy_version 190164 (0.0024) [2025-01-04 11:56:18,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14062.8, 300 sec: 13968.0). Total num frames: 778915840. Throughput: 0: 3543.2. Samples: 183893012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:18,970][134211] Avg episode reward: [(0, '9.478')] [2025-01-04 11:56:21,456][134294] Updated weights for policy 0, policy_version 190174 (0.0026) [2025-01-04 11:56:23,969][134211] Fps is (10 sec: 13515.7, 60 sec: 13994.4, 300 sec: 13954.1). Total num frames: 778981376. Throughput: 0: 3353.8. Samples: 183913544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:23,969][134211] Avg episode reward: [(0, '8.422')] [2025-01-04 11:56:24,931][134294] Updated weights for policy 0, policy_version 190184 (0.0027) [2025-01-04 11:56:27,915][134294] Updated weights for policy 0, policy_version 190194 (0.0024) [2025-01-04 11:56:28,968][134211] Fps is (10 sec: 13107.9, 60 sec: 13926.4, 300 sec: 13968.1). Total num frames: 779046912. Throughput: 0: 3260.5. Samples: 183932504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:28,968][134211] Avg episode reward: [(0, '9.348')] [2025-01-04 11:56:30,682][134294] Updated weights for policy 0, policy_version 190204 (0.0020) [2025-01-04 11:56:32,743][134294] Updated weights for policy 0, policy_version 190214 (0.0015) [2025-01-04 11:56:33,968][134211] Fps is (10 sec: 15566.5, 60 sec: 14062.9, 300 sec: 14009.7). Total num frames: 779137024. Throughput: 0: 3314.9. Samples: 183944596. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:33,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 11:56:35,579][134294] Updated weights for policy 0, policy_version 190224 (0.0022) [2025-01-04 11:56:38,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13380.2, 300 sec: 13843.1). Total num frames: 779194368. Throughput: 0: 3363.3. Samples: 183967146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 11:56:38,969][134211] Avg episode reward: [(0, '9.781')] [2025-01-04 11:56:39,179][134294] Updated weights for policy 0, policy_version 190234 (0.0030) [2025-01-04 11:56:42,442][134294] Updated weights for policy 0, policy_version 190244 (0.0028) [2025-01-04 11:56:43,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13380.7, 300 sec: 13815.3). Total num frames: 779259904. Throughput: 0: 3320.4. Samples: 183985004. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:56:43,968][134211] Avg episode reward: [(0, '9.860')] [2025-01-04 11:56:44,737][134294] Updated weights for policy 0, policy_version 190254 (0.0013) [2025-01-04 11:56:47,373][134294] Updated weights for policy 0, policy_version 190264 (0.0024) [2025-01-04 11:56:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13585.2, 300 sec: 13884.8). Total num frames: 779337728. Throughput: 0: 3431.9. Samples: 183999206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:56:48,968][134211] Avg episode reward: [(0, '8.941')] [2025-01-04 11:56:50,434][134294] Updated weights for policy 0, policy_version 190274 (0.0023) [2025-01-04 11:56:53,472][134294] Updated weights for policy 0, policy_version 190284 (0.0026) [2025-01-04 11:56:53,968][134211] Fps is (10 sec: 14745.2, 60 sec: 13585.1, 300 sec: 13912.5). Total num frames: 779407360. Throughput: 0: 3494.5. Samples: 184019870. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:56:53,968][134211] Avg episode reward: [(0, '9.586')] [2025-01-04 11:56:56,561][134294] Updated weights for policy 0, policy_version 190294 (0.0026) [2025-01-04 11:56:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.0, 300 sec: 13940.3). Total num frames: 779472896. Throughput: 0: 3486.5. Samples: 184039298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:56:58,968][134211] Avg episode reward: [(0, '9.474')] [2025-01-04 11:56:59,751][134294] Updated weights for policy 0, policy_version 190304 (0.0026) [2025-01-04 11:57:03,152][134294] Updated weights for policy 0, policy_version 190314 (0.0029) [2025-01-04 11:57:03,968][134211] Fps is (10 sec: 12696.9, 60 sec: 13585.0, 300 sec: 13940.3). Total num frames: 779534336. Throughput: 0: 3463.7. Samples: 184048880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:03,969][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 11:57:06,407][134294] Updated weights for policy 0, policy_version 190324 (0.0028) [2025-01-04 11:57:08,970][134211] Fps is (10 sec: 12285.0, 60 sec: 13584.7, 300 sec: 13926.3). Total num frames: 779595776. Throughput: 0: 3413.0. Samples: 184067132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:08,971][134211] Avg episode reward: [(0, '9.120')] [2025-01-04 11:57:09,826][134294] Updated weights for policy 0, policy_version 190334 (0.0025) [2025-01-04 11:57:12,130][134294] Updated weights for policy 0, policy_version 190344 (0.0016) [2025-01-04 11:57:13,968][134211] Fps is (10 sec: 15156.4, 60 sec: 13994.8, 300 sec: 13995.8). Total num frames: 779685888. Throughput: 0: 3524.3. Samples: 184091096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:13,968][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 11:57:14,038][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000190354_779689984.pth... 
[2025-01-04 11:57:14,040][134294] Updated weights for policy 0, policy_version 190354 (0.0013) [2025-01-04 11:57:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189532_776323072.pth [2025-01-04 11:57:16,028][134294] Updated weights for policy 0, policy_version 190364 (0.0015) [2025-01-04 11:57:17,922][134294] Updated weights for policy 0, policy_version 190374 (0.0012) [2025-01-04 11:57:18,968][134211] Fps is (10 sec: 19666.0, 60 sec: 14609.2, 300 sec: 14120.8). Total num frames: 779792384. Throughput: 0: 3606.8. Samples: 184106900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:18,968][134211] Avg episode reward: [(0, '10.058')] [2025-01-04 11:57:19,875][134294] Updated weights for policy 0, policy_version 190384 (0.0013) [2025-01-04 11:57:22,777][134294] Updated weights for policy 0, policy_version 190394 (0.0022) [2025-01-04 11:57:23,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14745.9, 300 sec: 14093.0). Total num frames: 779866112. Throughput: 0: 3731.5. Samples: 184135064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:23,968][134211] Avg episode reward: [(0, '9.532')] [2025-01-04 11:57:26,042][134294] Updated weights for policy 0, policy_version 190404 (0.0026) [2025-01-04 11:57:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 13981.9). Total num frames: 779927552. Throughput: 0: 3734.3. Samples: 184153048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:28,968][134211] Avg episode reward: [(0, '8.345')] [2025-01-04 11:57:29,635][134294] Updated weights for policy 0, policy_version 190414 (0.0031) [2025-01-04 11:57:33,047][134294] Updated weights for policy 0, policy_version 190424 (0.0025) [2025-01-04 11:57:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14131.2, 300 sec: 13968.1). Total num frames: 779984896. Throughput: 0: 3615.4. Samples: 184161898. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:33,968][134211] Avg episode reward: [(0, '9.385')] [2025-01-04 11:57:36,354][134294] Updated weights for policy 0, policy_version 190434 (0.0028) [2025-01-04 11:57:38,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14199.5, 300 sec: 13968.2). Total num frames: 780046336. Throughput: 0: 3554.4. Samples: 184179818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:38,968][134211] Avg episode reward: [(0, '8.937')] [2025-01-04 11:57:40,070][134294] Updated weights for policy 0, policy_version 190444 (0.0025) [2025-01-04 11:57:43,191][134294] Updated weights for policy 0, policy_version 190454 (0.0024) [2025-01-04 11:57:43,968][134211] Fps is (10 sec: 12287.5, 60 sec: 14131.1, 300 sec: 13954.2). Total num frames: 780107776. Throughput: 0: 3529.0. Samples: 184198106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:43,970][134211] Avg episode reward: [(0, '9.254')] [2025-01-04 11:57:46,247][134294] Updated weights for policy 0, policy_version 190464 (0.0023) [2025-01-04 11:57:48,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13926.4, 300 sec: 13940.3). Total num frames: 780173312. Throughput: 0: 3538.1. Samples: 184208092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:48,968][134211] Avg episode reward: [(0, '7.815')] [2025-01-04 11:57:49,397][134294] Updated weights for policy 0, policy_version 190474 (0.0023) [2025-01-04 11:57:52,412][134294] Updated weights for policy 0, policy_version 190484 (0.0023) [2025-01-04 11:57:53,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13858.1, 300 sec: 13940.3). Total num frames: 780238848. Throughput: 0: 3576.1. Samples: 184228048. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:53,968][134211] Avg episode reward: [(0, '9.930')] [2025-01-04 11:57:55,682][134294] Updated weights for policy 0, policy_version 190494 (0.0026) [2025-01-04 11:57:58,719][134294] Updated weights for policy 0, policy_version 190504 (0.0024) [2025-01-04 11:57:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13954.2). Total num frames: 780304384. Throughput: 0: 3479.8. Samples: 184247688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:57:58,968][134211] Avg episode reward: [(0, '9.615')] [2025-01-04 11:58:02,092][134294] Updated weights for policy 0, policy_version 190514 (0.0025) [2025-01-04 11:58:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.2, 300 sec: 13940.3). Total num frames: 780365824. Throughput: 0: 3331.0. Samples: 184256794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:03,969][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 11:58:05,521][134294] Updated weights for policy 0, policy_version 190524 (0.0025) [2025-01-04 11:58:08,785][134294] Updated weights for policy 0, policy_version 190534 (0.0025) [2025-01-04 11:58:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13858.7, 300 sec: 13912.5). Total num frames: 780427264. Throughput: 0: 3110.2. Samples: 184275024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:08,968][134211] Avg episode reward: [(0, '8.597')] [2025-01-04 11:58:11,686][134294] Updated weights for policy 0, policy_version 190544 (0.0025) [2025-01-04 11:58:13,969][134211] Fps is (10 sec: 13105.2, 60 sec: 13516.4, 300 sec: 13926.3). Total num frames: 780496896. Throughput: 0: 3161.2. Samples: 184295306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:13,970][134211] Avg episode reward: [(0, '8.801')] [2025-01-04 11:58:14,742][134294] Updated weights for policy 0, policy_version 190554 (0.0026) [2025-01-04 11:58:17,745][134294] Updated weights for policy 0, policy_version 190564 (0.0024) [2025-01-04 11:58:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 12902.4, 300 sec: 13926.4). Total num frames: 780566528. Throughput: 0: 3188.0. Samples: 184305356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:18,968][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 11:58:20,688][134294] Updated weights for policy 0, policy_version 190574 (0.0024) [2025-01-04 11:58:22,728][134294] Updated weights for policy 0, policy_version 190584 (0.0016) [2025-01-04 11:58:23,968][134211] Fps is (10 sec: 15157.6, 60 sec: 13038.9, 300 sec: 13981.9). Total num frames: 780648448. Throughput: 0: 3295.9. Samples: 184328132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:23,968][134211] Avg episode reward: [(0, '9.909')] [2025-01-04 11:58:25,387][134294] Updated weights for policy 0, policy_version 190594 (0.0021) [2025-01-04 11:58:28,352][134294] Updated weights for policy 0, policy_version 190604 (0.0026) [2025-01-04 11:58:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13243.7, 300 sec: 13926.4). Total num frames: 780722176. Throughput: 0: 3401.9. Samples: 184351188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:28,968][134211] Avg episode reward: [(0, '8.471')] [2025-01-04 11:58:31,366][134294] Updated weights for policy 0, policy_version 190614 (0.0027) [2025-01-04 11:58:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13312.0, 300 sec: 13773.7). Total num frames: 780783616. Throughput: 0: 3404.2. Samples: 184361282. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:33,968][134211] Avg episode reward: [(0, '9.170')] [2025-01-04 11:58:35,030][134294] Updated weights for policy 0, policy_version 190624 (0.0023) [2025-01-04 11:58:37,335][134294] Updated weights for policy 0, policy_version 190634 (0.0014) [2025-01-04 11:58:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13653.4, 300 sec: 13815.3). Total num frames: 780865536. Throughput: 0: 3413.5. Samples: 184381656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:38,968][134211] Avg episode reward: [(0, '9.233')] [2025-01-04 11:58:39,441][134294] Updated weights for policy 0, policy_version 190644 (0.0014) [2025-01-04 11:58:41,408][134294] Updated weights for policy 0, policy_version 190654 (0.0014) [2025-01-04 11:58:43,635][134294] Updated weights for policy 0, policy_version 190664 (0.0022) [2025-01-04 11:58:43,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14199.5, 300 sec: 13912.5). Total num frames: 780959744. Throughput: 0: 3638.4. Samples: 184411414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:43,968][134211] Avg episode reward: [(0, '9.115')] [2025-01-04 11:58:46,755][134294] Updated weights for policy 0, policy_version 190674 (0.0026) [2025-01-04 11:58:48,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14199.5, 300 sec: 13912.5). Total num frames: 781025280. Throughput: 0: 3658.6. Samples: 184421432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:48,968][134211] Avg episode reward: [(0, '10.054')] [2025-01-04 11:58:50,010][134294] Updated weights for policy 0, policy_version 190684 (0.0023) [2025-01-04 11:58:53,049][134294] Updated weights for policy 0, policy_version 190694 (0.0026) [2025-01-04 11:58:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.8, 300 sec: 13926.4). Total num frames: 781094912. Throughput: 0: 3684.2. Samples: 184440812. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 11:58:53,968][134211] Avg episode reward: [(0, '9.370')] [2025-01-04 11:58:56,090][134294] Updated weights for policy 0, policy_version 190704 (0.0025) [2025-01-04 11:58:58,968][134211] Fps is (10 sec: 13516.1, 60 sec: 14267.6, 300 sec: 13954.2). Total num frames: 781160448. Throughput: 0: 3679.7. Samples: 184460888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:58:58,969][134211] Avg episode reward: [(0, '10.116')] [2025-01-04 11:58:59,227][134294] Updated weights for policy 0, policy_version 190714 (0.0026) [2025-01-04 11:59:02,451][134294] Updated weights for policy 0, policy_version 190724 (0.0026) [2025-01-04 11:59:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14267.8, 300 sec: 13926.4). Total num frames: 781221888. Throughput: 0: 3673.6. Samples: 184470666. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:03,968][134211] Avg episode reward: [(0, '8.611')] [2025-01-04 11:59:05,600][134294] Updated weights for policy 0, policy_version 190734 (0.0023) [2025-01-04 11:59:08,895][134294] Updated weights for policy 0, policy_version 190744 (0.0027) [2025-01-04 11:59:08,968][134211] Fps is (10 sec: 12698.2, 60 sec: 14336.0, 300 sec: 13912.5). Total num frames: 781287424. Throughput: 0: 3585.4. Samples: 184489476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:08,968][134211] Avg episode reward: [(0, '8.794')] [2025-01-04 11:59:11,876][134294] Updated weights for policy 0, policy_version 190754 (0.0023) [2025-01-04 11:59:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14268.1, 300 sec: 13898.6). Total num frames: 781352960. Throughput: 0: 3511.9. Samples: 184509222. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:13,968][134211] Avg episode reward: [(0, '9.452')] [2025-01-04 11:59:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000190760_781352960.pth... 
[2025-01-04 11:59:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000189943_778006528.pth [2025-01-04 11:59:15,187][134294] Updated weights for policy 0, policy_version 190764 (0.0024) [2025-01-04 11:59:18,253][134294] Updated weights for policy 0, policy_version 190774 (0.0025) [2025-01-04 11:59:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14199.5, 300 sec: 13884.7). Total num frames: 781418496. Throughput: 0: 3496.8. Samples: 184518638. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:18,968][134211] Avg episode reward: [(0, '8.130')] [2025-01-04 11:59:21,199][134294] Updated weights for policy 0, policy_version 190784 (0.0024) [2025-01-04 11:59:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.7, 300 sec: 13884.7). Total num frames: 781488128. Throughput: 0: 3505.0. Samples: 184539380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:23,970][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 11:59:24,276][134294] Updated weights for policy 0, policy_version 190794 (0.0027) [2025-01-04 11:59:27,280][134294] Updated weights for policy 0, policy_version 190804 (0.0023) [2025-01-04 11:59:28,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13858.0, 300 sec: 13912.5). Total num frames: 781553664. Throughput: 0: 3296.3. Samples: 184559750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:28,969][134211] Avg episode reward: [(0, '9.293')] [2025-01-04 11:59:30,152][134294] Updated weights for policy 0, policy_version 190814 (0.0027) [2025-01-04 11:59:33,249][134294] Updated weights for policy 0, policy_version 190824 (0.0026) [2025-01-04 11:59:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13926.4, 300 sec: 13940.3). Total num frames: 781619200. Throughput: 0: 3304.0. Samples: 184570114. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:33,968][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 11:59:36,554][134294] Updated weights for policy 0, policy_version 190834 (0.0022) [2025-01-04 11:59:38,688][134294] Updated weights for policy 0, policy_version 190844 (0.0014) [2025-01-04 11:59:38,967][134211] Fps is (10 sec: 14746.5, 60 sec: 13926.4, 300 sec: 13912.5). Total num frames: 781701120. Throughput: 0: 3310.4. Samples: 184589780. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:38,968][134211] Avg episode reward: [(0, '9.060')] [2025-01-04 11:59:40,604][134294] Updated weights for policy 0, policy_version 190854 (0.0013) [2025-01-04 11:59:42,505][134294] Updated weights for policy 0, policy_version 190864 (0.0012) [2025-01-04 11:59:43,968][134211] Fps is (10 sec: 18022.4, 60 sec: 13994.7, 300 sec: 13940.3). Total num frames: 781799424. Throughput: 0: 3555.6. Samples: 184620888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:43,968][134211] Avg episode reward: [(0, '8.720')] [2025-01-04 11:59:45,221][134294] Updated weights for policy 0, policy_version 190874 (0.0024) [2025-01-04 11:59:48,601][134294] Updated weights for policy 0, policy_version 190884 (0.0027) [2025-01-04 11:59:48,968][134211] Fps is (10 sec: 16383.6, 60 sec: 13994.6, 300 sec: 13940.3). Total num frames: 781864960. Throughput: 0: 3566.6. Samples: 184631162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:48,968][134211] Avg episode reward: [(0, '10.769')] [2025-01-04 11:59:51,902][134294] Updated weights for policy 0, policy_version 190894 (0.0025) [2025-01-04 11:59:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13858.1, 300 sec: 13926.4). Total num frames: 781926400. Throughput: 0: 3556.3. Samples: 184649510. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:53,968][134211] Avg episode reward: [(0, '9.178')] [2025-01-04 11:59:55,157][134294] Updated weights for policy 0, policy_version 190904 (0.0026) [2025-01-04 11:59:57,981][134294] Updated weights for policy 0, policy_version 190914 (0.0023) [2025-01-04 11:59:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.3, 300 sec: 13940.3). Total num frames: 781991936. Throughput: 0: 3562.2. Samples: 184669522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 11:59:58,968][134211] Avg episode reward: [(0, '9.294')] [2025-01-04 12:00:01,289][134294] Updated weights for policy 0, policy_version 190924 (0.0028) [2025-01-04 12:00:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13858.1, 300 sec: 13940.3). Total num frames: 782053376. Throughput: 0: 3560.4. Samples: 184678858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:03,969][134211] Avg episode reward: [(0, '9.943')] [2025-01-04 12:00:05,036][134294] Updated weights for policy 0, policy_version 190934 (0.0034) [2025-01-04 12:00:08,131][134294] Updated weights for policy 0, policy_version 190944 (0.0024) [2025-01-04 12:00:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 13954.2). Total num frames: 782123008. Throughput: 0: 3476.8. Samples: 184695834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:08,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 12:00:10,015][134294] Updated weights for policy 0, policy_version 190954 (0.0014) [2025-01-04 12:00:11,837][134294] Updated weights for policy 0, policy_version 190964 (0.0013) [2025-01-04 12:00:13,741][134294] Updated weights for policy 0, policy_version 190974 (0.0014) [2025-01-04 12:00:13,968][134211] Fps is (10 sec: 18022.9, 60 sec: 14677.4, 300 sec: 14106.9). Total num frames: 782233600. Throughput: 0: 3730.0. Samples: 184727598. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:13,968][134211] Avg episode reward: [(0, '9.444')] [2025-01-04 12:00:15,615][134294] Updated weights for policy 0, policy_version 190984 (0.0013) [2025-01-04 12:00:17,857][134294] Updated weights for policy 0, policy_version 190994 (0.0016) [2025-01-04 12:00:18,968][134211] Fps is (10 sec: 20479.8, 60 sec: 15155.2, 300 sec: 14190.2). Total num frames: 782327808. Throughput: 0: 3863.4. Samples: 184743966. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:18,968][134211] Avg episode reward: [(0, '9.542')] [2025-01-04 12:00:20,819][134294] Updated weights for policy 0, policy_version 191004 (0.0028) [2025-01-04 12:00:23,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15018.7, 300 sec: 14162.4). Total num frames: 782389248. Throughput: 0: 3905.1. Samples: 184765510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:23,968][134211] Avg episode reward: [(0, '8.949')] [2025-01-04 12:00:24,141][134294] Updated weights for policy 0, policy_version 191014 (0.0026) [2025-01-04 12:00:27,469][134294] Updated weights for policy 0, policy_version 191024 (0.0028) [2025-01-04 12:00:28,968][134211] Fps is (10 sec: 12287.4, 60 sec: 14950.4, 300 sec: 14093.0). Total num frames: 782450688. Throughput: 0: 3622.0. Samples: 184783880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:28,969][134211] Avg episode reward: [(0, '9.719')] [2025-01-04 12:00:30,684][134294] Updated weights for policy 0, policy_version 191034 (0.0029) [2025-01-04 12:00:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 13968.0). Total num frames: 782512128. Throughput: 0: 3610.4. Samples: 184793628. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:33,968][134211] Avg episode reward: [(0, '10.718')] [2025-01-04 12:00:34,162][134294] Updated weights for policy 0, policy_version 191044 (0.0027) [2025-01-04 12:00:37,567][134294] Updated weights for policy 0, policy_version 191054 (0.0025) [2025-01-04 12:00:38,968][134211] Fps is (10 sec: 11878.9, 60 sec: 14472.5, 300 sec: 13940.4). Total num frames: 782569472. Throughput: 0: 3593.6. Samples: 184811220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:38,968][134211] Avg episode reward: [(0, '9.704')] [2025-01-04 12:00:41,060][134294] Updated weights for policy 0, policy_version 191064 (0.0025) [2025-01-04 12:00:43,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13789.8, 300 sec: 13912.5). Total num frames: 782626816. Throughput: 0: 3528.6. Samples: 184828308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:43,969][134211] Avg episode reward: [(0, '9.084')] [2025-01-04 12:00:44,841][134294] Updated weights for policy 0, policy_version 191074 (0.0029) [2025-01-04 12:00:48,407][134294] Updated weights for policy 0, policy_version 191084 (0.0025) [2025-01-04 12:00:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 782684160. Throughput: 0: 3511.3. Samples: 184836866. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:48,968][134211] Avg episode reward: [(0, '8.847')] [2025-01-04 12:00:51,873][134294] Updated weights for policy 0, policy_version 191094 (0.0028) [2025-01-04 12:00:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13653.3, 300 sec: 13857.0). Total num frames: 782745600. Throughput: 0: 3522.1. Samples: 184854328. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:53,968][134211] Avg episode reward: [(0, '10.863')] [2025-01-04 12:00:55,492][134294] Updated weights for policy 0, policy_version 191104 (0.0026) [2025-01-04 12:00:58,343][134294] Updated weights for policy 0, policy_version 191114 (0.0016) [2025-01-04 12:00:58,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13653.3, 300 sec: 13870.9). Total num frames: 782811136. Throughput: 0: 3236.4. Samples: 184873234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:00:58,968][134211] Avg episode reward: [(0, '10.015')] [2025-01-04 12:01:00,505][134294] Updated weights for policy 0, policy_version 191124 (0.0013) [2025-01-04 12:01:03,693][134294] Updated weights for policy 0, policy_version 191134 (0.0024) [2025-01-04 12:01:03,968][134211] Fps is (10 sec: 13925.5, 60 sec: 13858.0, 300 sec: 13912.5). Total num frames: 782884864. Throughput: 0: 3180.3. Samples: 184887082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:03,969][134211] Avg episode reward: [(0, '8.657')] [2025-01-04 12:01:07,271][134294] Updated weights for policy 0, policy_version 191144 (0.0026) [2025-01-04 12:01:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13653.3, 300 sec: 13884.8). Total num frames: 782942208. Throughput: 0: 3089.4. Samples: 184904532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:08,968][134211] Avg episode reward: [(0, '10.678')] [2025-01-04 12:01:10,779][134294] Updated weights for policy 0, policy_version 191154 (0.0028) [2025-01-04 12:01:13,968][134211] Fps is (10 sec: 11878.9, 60 sec: 12834.1, 300 sec: 13857.0). Total num frames: 783003648. Throughput: 0: 3072.7. Samples: 184922150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:13,969][134211] Avg episode reward: [(0, '9.349')] [2025-01-04 12:01:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191163_783003648.pth... 
[2025-01-04 12:01:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000190354_779689984.pth [2025-01-04 12:01:14,245][134294] Updated weights for policy 0, policy_version 191164 (0.0025) [2025-01-04 12:01:16,590][134294] Updated weights for policy 0, policy_version 191174 (0.0014) [2025-01-04 12:01:18,967][134294] Updated weights for policy 0, policy_version 191184 (0.0016) [2025-01-04 12:01:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 12697.6, 300 sec: 13926.5). Total num frames: 783089664. Throughput: 0: 3102.2. Samples: 184933228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:18,968][134211] Avg episode reward: [(0, '9.823')] [2025-01-04 12:01:22,475][134294] Updated weights for policy 0, policy_version 191194 (0.0025) [2025-01-04 12:01:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 12561.1, 300 sec: 13884.7). Total num frames: 783142912. Throughput: 0: 3191.0. Samples: 184954816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:23,969][134211] Avg episode reward: [(0, '10.337')] [2025-01-04 12:01:26,128][134294] Updated weights for policy 0, policy_version 191204 (0.0028) [2025-01-04 12:01:28,968][134211] Fps is (10 sec: 11878.5, 60 sec: 12629.5, 300 sec: 13801.4). Total num frames: 783208448. Throughput: 0: 3199.8. Samples: 184972298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:28,968][134211] Avg episode reward: [(0, '10.432')] [2025-01-04 12:01:29,101][134294] Updated weights for policy 0, policy_version 191214 (0.0018) [2025-01-04 12:01:31,256][134294] Updated weights for policy 0, policy_version 191224 (0.0013) [2025-01-04 12:01:33,357][134294] Updated weights for policy 0, policy_version 191234 (0.0013) [2025-01-04 12:01:33,968][134211] Fps is (10 sec: 15974.8, 60 sec: 13175.5, 300 sec: 13926.4). Total num frames: 783302656. Throughput: 0: 3320.5. Samples: 184986290. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:33,968][134211] Avg episode reward: [(0, '8.459')] [2025-01-04 12:01:35,563][134294] Updated weights for policy 0, policy_version 191244 (0.0014) [2025-01-04 12:01:38,536][134294] Updated weights for policy 0, policy_version 191254 (0.0024) [2025-01-04 12:01:38,968][134211] Fps is (10 sec: 17202.8, 60 sec: 13516.8, 300 sec: 13968.0). Total num frames: 783380480. Throughput: 0: 3534.5. Samples: 185013382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:38,969][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 12:01:42,484][134294] Updated weights for policy 0, policy_version 191264 (0.0030) [2025-01-04 12:01:43,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13448.5, 300 sec: 13884.7). Total num frames: 783433728. Throughput: 0: 3470.9. Samples: 185029426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:43,969][134211] Avg episode reward: [(0, '9.365')] [2025-01-04 12:01:45,959][134294] Updated weights for policy 0, policy_version 191274 (0.0026) [2025-01-04 12:01:48,968][134211] Fps is (10 sec: 11058.7, 60 sec: 13448.4, 300 sec: 13843.1). Total num frames: 783491072. Throughput: 0: 3358.3. Samples: 185038206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:48,969][134211] Avg episode reward: [(0, '9.361')] [2025-01-04 12:01:49,571][134294] Updated weights for policy 0, policy_version 191284 (0.0025) [2025-01-04 12:01:53,201][134294] Updated weights for policy 0, policy_version 191294 (0.0025) [2025-01-04 12:01:53,968][134211] Fps is (10 sec: 11468.9, 60 sec: 13380.2, 300 sec: 13815.3). Total num frames: 783548416. Throughput: 0: 3354.5. Samples: 185055486. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:53,968][134211] Avg episode reward: [(0, '10.365')] [2025-01-04 12:01:56,640][134294] Updated weights for policy 0, policy_version 191304 (0.0025) [2025-01-04 12:01:58,968][134211] Fps is (10 sec: 11469.4, 60 sec: 13243.7, 300 sec: 13801.5). Total num frames: 783605760. Throughput: 0: 3347.2. Samples: 185072774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:01:58,968][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 12:02:00,215][134294] Updated weights for policy 0, policy_version 191314 (0.0029) [2025-01-04 12:02:03,441][134294] Updated weights for policy 0, policy_version 191324 (0.0026) [2025-01-04 12:02:03,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13039.1, 300 sec: 13801.6). Total num frames: 783667200. Throughput: 0: 3301.9. Samples: 185081812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:02:03,968][134211] Avg episode reward: [(0, '8.890')] [2025-01-04 12:02:06,761][134294] Updated weights for policy 0, policy_version 191334 (0.0024) [2025-01-04 12:02:08,970][134211] Fps is (10 sec: 11876.2, 60 sec: 13038.5, 300 sec: 13690.3). Total num frames: 783724544. Throughput: 0: 3233.6. Samples: 185100332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:02:08,970][134211] Avg episode reward: [(0, '10.575')] [2025-01-04 12:02:10,523][134294] Updated weights for policy 0, policy_version 191344 (0.0031) [2025-01-04 12:02:12,678][134294] Updated weights for policy 0, policy_version 191354 (0.0014) [2025-01-04 12:02:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13380.4, 300 sec: 13607.1). Total num frames: 783806464. Throughput: 0: 3323.6. Samples: 185121858. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:13,968][134211] Avg episode reward: [(0, '10.171')] [2025-01-04 12:02:14,865][134294] Updated weights for policy 0, policy_version 191364 (0.0017) [2025-01-04 12:02:17,944][134294] Updated weights for policy 0, policy_version 191374 (0.0024) [2025-01-04 12:02:18,968][134211] Fps is (10 sec: 15158.0, 60 sec: 13107.2, 300 sec: 13593.2). Total num frames: 783876096. Throughput: 0: 3297.3. Samples: 185134668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:18,968][134211] Avg episode reward: [(0, '9.949')] [2025-01-04 12:02:21,420][134294] Updated weights for policy 0, policy_version 191384 (0.0024) [2025-01-04 12:02:23,968][134211] Fps is (10 sec: 13106.7, 60 sec: 13243.7, 300 sec: 13593.2). Total num frames: 783937536. Throughput: 0: 3092.0. Samples: 185152520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:23,969][134211] Avg episode reward: [(0, '10.148')] [2025-01-04 12:02:25,063][134294] Updated weights for policy 0, policy_version 191394 (0.0026) [2025-01-04 12:02:28,378][134294] Updated weights for policy 0, policy_version 191404 (0.0025) [2025-01-04 12:02:28,968][134211] Fps is (10 sec: 11878.2, 60 sec: 13107.1, 300 sec: 13593.2). Total num frames: 783994880. Throughput: 0: 3128.1. Samples: 185170190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:28,969][134211] Avg episode reward: [(0, '9.710')] [2025-01-04 12:02:31,540][134294] Updated weights for policy 0, policy_version 191414 (0.0022) [2025-01-04 12:02:33,619][134294] Updated weights for policy 0, policy_version 191424 (0.0014) [2025-01-04 12:02:33,968][134211] Fps is (10 sec: 13926.9, 60 sec: 12902.4, 300 sec: 13662.6). Total num frames: 784076800. Throughput: 0: 3142.5. Samples: 185179614. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:33,968][134211] Avg episode reward: [(0, '8.230')] [2025-01-04 12:02:36,330][134294] Updated weights for policy 0, policy_version 191434 (0.0019) [2025-01-04 12:02:38,968][134211] Fps is (10 sec: 14745.8, 60 sec: 12697.6, 300 sec: 13676.5). Total num frames: 784142336. Throughput: 0: 3303.6. Samples: 185204146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:38,968][134211] Avg episode reward: [(0, '10.173')] [2025-01-04 12:02:39,969][134294] Updated weights for policy 0, policy_version 191444 (0.0028) [2025-01-04 12:02:43,514][134294] Updated weights for policy 0, policy_version 191454 (0.0028) [2025-01-04 12:02:43,968][134211] Fps is (10 sec: 11878.1, 60 sec: 12697.6, 300 sec: 13634.8). Total num frames: 784195584. Throughput: 0: 3294.4. Samples: 185221020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:43,969][134211] Avg episode reward: [(0, '9.164')] [2025-01-04 12:02:47,145][134294] Updated weights for policy 0, policy_version 191464 (0.0030) [2025-01-04 12:02:48,968][134211] Fps is (10 sec: 12288.3, 60 sec: 12902.6, 300 sec: 13648.7). Total num frames: 784265216. Throughput: 0: 3284.5. Samples: 185229612. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:48,968][134211] Avg episode reward: [(0, '9.436')] [2025-01-04 12:02:49,572][134294] Updated weights for policy 0, policy_version 191474 (0.0017) [2025-01-04 12:02:52,860][134294] Updated weights for policy 0, policy_version 191484 (0.0027) [2025-01-04 12:02:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13038.9, 300 sec: 13648.7). Total num frames: 784330752. Throughput: 0: 3342.1. Samples: 185250720. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:53,969][134211] Avg episode reward: [(0, '10.987')] [2025-01-04 12:02:56,130][134294] Updated weights for policy 0, policy_version 191494 (0.0024) [2025-01-04 12:02:58,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13175.5, 300 sec: 13662.6). Total num frames: 784396288. Throughput: 0: 3288.6. Samples: 185269844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:02:58,968][134211] Avg episode reward: [(0, '8.589')] [2025-01-04 12:02:59,188][134294] Updated weights for policy 0, policy_version 191504 (0.0027) [2025-01-04 12:03:02,257][134294] Updated weights for policy 0, policy_version 191514 (0.0024) [2025-01-04 12:03:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 13676.5). Total num frames: 784461824. Throughput: 0: 3228.6. Samples: 185279954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:03:03,968][134211] Avg episode reward: [(0, '10.246')] [2025-01-04 12:03:05,251][134294] Updated weights for policy 0, policy_version 191524 (0.0025) [2025-01-04 12:03:08,344][134294] Updated weights for policy 0, policy_version 191534 (0.0022) [2025-01-04 12:03:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.9, 300 sec: 13676.6). Total num frames: 784531456. Throughput: 0: 3287.8. Samples: 185300470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:03:08,968][134211] Avg episode reward: [(0, '10.031')] [2025-01-04 12:03:11,274][134294] Updated weights for policy 0, policy_version 191544 (0.0029) [2025-01-04 12:03:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13175.4, 300 sec: 13662.6). Total num frames: 784596992. Throughput: 0: 3343.3. Samples: 185320636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:03:13,968][134211] Avg episode reward: [(0, '10.081')] [2025-01-04 12:03:13,994][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191553_784601088.pth... 
[2025-01-04 12:03:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000190760_781352960.pth [2025-01-04 12:03:14,382][134294] Updated weights for policy 0, policy_version 191554 (0.0024) [2025-01-04 12:03:17,570][134294] Updated weights for policy 0, policy_version 191564 (0.0028) [2025-01-04 12:03:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 13607.1). Total num frames: 784662528. Throughput: 0: 3351.6. Samples: 185330436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:03:18,968][134211] Avg episode reward: [(0, '9.366')] [2025-01-04 12:03:20,906][134294] Updated weights for policy 0, policy_version 191574 (0.0026) [2025-01-04 12:03:23,477][134294] Updated weights for policy 0, policy_version 191584 (0.0017) [2025-01-04 12:03:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13312.1, 300 sec: 13607.1). Total num frames: 784736256. Throughput: 0: 3219.3. Samples: 185349012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:23,968][134211] Avg episode reward: [(0, '9.958')] [2025-01-04 12:03:25,520][134294] Updated weights for policy 0, policy_version 191594 (0.0012) [2025-01-04 12:03:27,495][134294] Updated weights for policy 0, policy_version 191604 (0.0014) [2025-01-04 12:03:28,967][134211] Fps is (10 sec: 17612.9, 60 sec: 14063.0, 300 sec: 13745.9). Total num frames: 784838656. Throughput: 0: 3522.1. Samples: 185379514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:28,968][134211] Avg episode reward: [(0, '9.905')] [2025-01-04 12:03:29,372][134294] Updated weights for policy 0, policy_version 191614 (0.0013) [2025-01-04 12:03:31,261][134294] Updated weights for policy 0, policy_version 191624 (0.0014) [2025-01-04 12:03:33,155][134294] Updated weights for policy 0, policy_version 191634 (0.0013) [2025-01-04 12:03:33,968][134211] Fps is (10 sec: 20888.6, 60 sec: 14472.4, 300 sec: 13829.2). Total num frames: 784945152. 
Throughput: 0: 3690.8. Samples: 185395698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:33,969][134211] Avg episode reward: [(0, '10.195')] [2025-01-04 12:03:35,937][134294] Updated weights for policy 0, policy_version 191644 (0.0026) [2025-01-04 12:03:38,968][134211] Fps is (10 sec: 17202.7, 60 sec: 14472.5, 300 sec: 13732.0). Total num frames: 785010688. Throughput: 0: 3775.7. Samples: 185420624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:38,969][134211] Avg episode reward: [(0, '8.780')] [2025-01-04 12:03:39,284][134294] Updated weights for policy 0, policy_version 191654 (0.0023) [2025-01-04 12:03:42,452][134294] Updated weights for policy 0, policy_version 191664 (0.0024) [2025-01-04 12:03:43,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14609.1, 300 sec: 13718.1). Total num frames: 785072128. Throughput: 0: 3764.7. Samples: 185439256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:43,968][134211] Avg episode reward: [(0, '9.003')] [2025-01-04 12:03:45,529][134294] Updated weights for policy 0, policy_version 191674 (0.0026) [2025-01-04 12:03:48,577][134294] Updated weights for policy 0, policy_version 191684 (0.0024) [2025-01-04 12:03:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.0, 300 sec: 13718.1). Total num frames: 785141760. Throughput: 0: 3767.9. Samples: 185449510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:48,968][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 12:03:51,590][134294] Updated weights for policy 0, policy_version 191694 (0.0028) [2025-01-04 12:03:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 13718.1). Total num frames: 785207296. Throughput: 0: 3763.1. Samples: 185469810. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:53,968][134211] Avg episode reward: [(0, '10.104')] [2025-01-04 12:03:54,910][134294] Updated weights for policy 0, policy_version 191704 (0.0027) [2025-01-04 12:03:58,051][134294] Updated weights for policy 0, policy_version 191714 (0.0025) [2025-01-04 12:03:58,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14609.0, 300 sec: 13732.0). Total num frames: 785272832. Throughput: 0: 3732.8. Samples: 185488614. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:03:58,969][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 12:04:01,103][134294] Updated weights for policy 0, policy_version 191724 (0.0026) [2025-01-04 12:04:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 13732.0). Total num frames: 785338368. Throughput: 0: 3743.9. Samples: 185498914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:04:03,968][134211] Avg episode reward: [(0, '9.365')] [2025-01-04 12:04:04,176][134294] Updated weights for policy 0, policy_version 191734 (0.0026) [2025-01-04 12:04:07,265][134294] Updated weights for policy 0, policy_version 191744 (0.0023) [2025-01-04 12:04:08,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14540.8, 300 sec: 13732.0). Total num frames: 785403904. Throughput: 0: 3774.8. Samples: 185518878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:04:08,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 12:04:10,402][134294] Updated weights for policy 0, policy_version 191754 (0.0026) [2025-01-04 12:04:13,383][134294] Updated weights for policy 0, policy_version 191764 (0.0025) [2025-01-04 12:04:13,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14609.0, 300 sec: 13745.9). Total num frames: 785473536. Throughput: 0: 3542.4. Samples: 185538922. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:04:13,969][134211] Avg episode reward: [(0, '8.536')] [2025-01-04 12:04:16,311][134294] Updated weights for policy 0, policy_version 191774 (0.0020) [2025-01-04 12:04:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 13732.0). Total num frames: 785539072. Throughput: 0: 3412.8. Samples: 185549274. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:04:18,968][134211] Avg episode reward: [(0, '9.154')] [2025-01-04 12:04:19,419][134294] Updated weights for policy 0, policy_version 191784 (0.0026) [2025-01-04 12:04:22,577][134294] Updated weights for policy 0, policy_version 191794 (0.0027) [2025-01-04 12:04:23,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14472.5, 300 sec: 13732.0). Total num frames: 785604608. Throughput: 0: 3295.8. Samples: 185568936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:04:23,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 12:04:25,788][134294] Updated weights for policy 0, policy_version 191804 (0.0028) [2025-01-04 12:04:28,846][134294] Updated weights for policy 0, policy_version 191814 (0.0026) [2025-01-04 12:04:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13858.1, 300 sec: 13732.0). Total num frames: 785670144. Throughput: 0: 3318.2. Samples: 185588574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:28,968][134211] Avg episode reward: [(0, '10.394')] [2025-01-04 12:04:31,812][134294] Updated weights for policy 0, policy_version 191824 (0.0025) [2025-01-04 12:04:33,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13175.5, 300 sec: 13676.4). Total num frames: 785735680. Throughput: 0: 3317.3. Samples: 185598788. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:33,969][134211] Avg episode reward: [(0, '10.836')] [2025-01-04 12:04:35,048][134294] Updated weights for policy 0, policy_version 191834 (0.0027) [2025-01-04 12:04:38,503][134294] Updated weights for policy 0, policy_version 191844 (0.0028) [2025-01-04 12:04:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13107.2, 300 sec: 13551.5). Total num frames: 785797120. Throughput: 0: 3285.0. Samples: 185617636. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:38,968][134211] Avg episode reward: [(0, '10.052')] [2025-01-04 12:04:41,856][134294] Updated weights for policy 0, policy_version 191854 (0.0025) [2025-01-04 12:04:43,967][134211] Fps is (10 sec: 12288.8, 60 sec: 13107.3, 300 sec: 13537.6). Total num frames: 785858560. Throughput: 0: 3263.3. Samples: 185635460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:43,968][134211] Avg episode reward: [(0, '10.409')] [2025-01-04 12:04:44,747][134294] Updated weights for policy 0, policy_version 191864 (0.0019) [2025-01-04 12:04:46,729][134294] Updated weights for policy 0, policy_version 191874 (0.0012) [2025-01-04 12:04:48,832][134294] Updated weights for policy 0, policy_version 191884 (0.0016) [2025-01-04 12:04:48,968][134211] Fps is (10 sec: 15974.3, 60 sec: 13585.0, 300 sec: 13662.6). Total num frames: 785956864. Throughput: 0: 3342.8. Samples: 185649340. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:48,968][134211] Avg episode reward: [(0, '9.754')] [2025-01-04 12:04:52,028][134294] Updated weights for policy 0, policy_version 191894 (0.0026) [2025-01-04 12:04:53,968][134211] Fps is (10 sec: 15973.9, 60 sec: 13516.8, 300 sec: 13648.7). Total num frames: 786018304. Throughput: 0: 3427.7. Samples: 185673124. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:53,969][134211] Avg episode reward: [(0, '9.765')] [2025-01-04 12:04:55,338][134294] Updated weights for policy 0, policy_version 191904 (0.0025) [2025-01-04 12:04:58,499][134294] Updated weights for policy 0, policy_version 191914 (0.0026) [2025-01-04 12:04:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13516.8, 300 sec: 13662.6). Total num frames: 786083840. Throughput: 0: 3404.5. Samples: 185692122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:04:58,969][134211] Avg episode reward: [(0, '10.552')] [2025-01-04 12:05:01,456][134294] Updated weights for policy 0, policy_version 191924 (0.0025) [2025-01-04 12:05:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 13662.6). Total num frames: 786153472. Throughput: 0: 3401.4. Samples: 185702336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:03,968][134211] Avg episode reward: [(0, '10.028')] [2025-01-04 12:05:04,753][134294] Updated weights for policy 0, policy_version 191934 (0.0023) [2025-01-04 12:05:07,654][134294] Updated weights for policy 0, policy_version 191944 (0.0024) [2025-01-04 12:05:08,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13516.8, 300 sec: 13496.0). Total num frames: 786214912. Throughput: 0: 3407.8. Samples: 185722288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:08,968][134211] Avg episode reward: [(0, '10.199')] [2025-01-04 12:05:10,708][134294] Updated weights for policy 0, policy_version 191954 (0.0025) [2025-01-04 12:05:13,577][134294] Updated weights for policy 0, policy_version 191964 (0.0023) [2025-01-04 12:05:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.1, 300 sec: 13426.5). Total num frames: 786288640. Throughput: 0: 3428.6. Samples: 185742860. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:13,969][134211] Avg episode reward: [(0, '8.877')] [2025-01-04 12:05:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191965_786288640.pth... [2025-01-04 12:05:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191163_783003648.pth [2025-01-04 12:05:16,685][134294] Updated weights for policy 0, policy_version 191974 (0.0024) [2025-01-04 12:05:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13585.1, 300 sec: 13440.4). Total num frames: 786354176. Throughput: 0: 3424.9. Samples: 185752908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:18,968][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 12:05:19,897][134294] Updated weights for policy 0, policy_version 191984 (0.0029) [2025-01-04 12:05:23,682][134294] Updated weights for policy 0, policy_version 191994 (0.0018) [2025-01-04 12:05:23,968][134211] Fps is (10 sec: 12288.4, 60 sec: 13448.6, 300 sec: 13426.6). Total num frames: 786411520. Throughput: 0: 3345.0. Samples: 185768160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:23,968][134211] Avg episode reward: [(0, '9.331')] [2025-01-04 12:05:25,824][134294] Updated weights for policy 0, policy_version 192004 (0.0015) [2025-01-04 12:05:27,736][134294] Updated weights for policy 0, policy_version 192014 (0.0012) [2025-01-04 12:05:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13994.7, 300 sec: 13551.5). Total num frames: 786509824. Throughput: 0: 3604.1. Samples: 185797644. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 12:05:28,968][134211] Avg episode reward: [(0, '10.661')] [2025-01-04 12:05:30,621][134294] Updated weights for policy 0, policy_version 192024 (0.0022) [2025-01-04 12:05:32,823][134294] Updated weights for policy 0, policy_version 192034 (0.0017) [2025-01-04 12:05:33,968][134211] Fps is (10 sec: 17202.3, 60 sec: 14131.2, 300 sec: 13607.0). Total num frames: 786583552. Throughput: 0: 3530.1. Samples: 185808194. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:33,969][134211] Avg episode reward: [(0, '9.792')] [2025-01-04 12:05:35,937][134294] Updated weights for policy 0, policy_version 192044 (0.0027) [2025-01-04 12:05:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14199.5, 300 sec: 13634.8). Total num frames: 786649088. Throughput: 0: 3492.5. Samples: 185830286. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:38,968][134211] Avg episode reward: [(0, '8.934')] [2025-01-04 12:05:39,335][134294] Updated weights for policy 0, policy_version 192054 (0.0025) [2025-01-04 12:05:42,758][134294] Updated weights for policy 0, policy_version 192064 (0.0028) [2025-01-04 12:05:43,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14131.1, 300 sec: 13634.8). Total num frames: 786706432. Throughput: 0: 3460.4. Samples: 185847838. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:43,969][134211] Avg episode reward: [(0, '9.322')] [2025-01-04 12:05:46,223][134294] Updated weights for policy 0, policy_version 192074 (0.0027) [2025-01-04 12:05:48,968][134211] Fps is (10 sec: 11468.8, 60 sec: 13448.5, 300 sec: 13620.9). Total num frames: 786763776. Throughput: 0: 3435.3. Samples: 185856924. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:48,969][134211] Avg episode reward: [(0, '10.024')] [2025-01-04 12:05:49,743][134294] Updated weights for policy 0, policy_version 192084 (0.0028) [2025-01-04 12:05:53,178][134294] Updated weights for policy 0, policy_version 192094 (0.0027) [2025-01-04 12:05:53,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13448.5, 300 sec: 13607.0). Total num frames: 786825216. Throughput: 0: 3383.0. Samples: 185874524. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:53,969][134211] Avg episode reward: [(0, '9.853')] [2025-01-04 12:05:56,638][134294] Updated weights for policy 0, policy_version 192104 (0.0027) [2025-01-04 12:05:58,780][134294] Updated weights for policy 0, policy_version 192114 (0.0016) [2025-01-04 12:05:58,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13585.0, 300 sec: 13607.0). Total num frames: 786898944. Throughput: 0: 3391.2. Samples: 185895464. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:05:58,969][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 12:06:01,997][134294] Updated weights for policy 0, policy_version 192124 (0.0025) [2025-01-04 12:06:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13448.6, 300 sec: 13620.9). Total num frames: 786960384. Throughput: 0: 3394.9. Samples: 185905678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:03,968][134211] Avg episode reward: [(0, '8.936')] [2025-01-04 12:06:05,379][134294] Updated weights for policy 0, policy_version 192134 (0.0027) [2025-01-04 12:06:08,929][134294] Updated weights for policy 0, policy_version 192144 (0.0028) [2025-01-04 12:06:08,968][134211] Fps is (10 sec: 12288.8, 60 sec: 13448.5, 300 sec: 13620.9). Total num frames: 787021824. Throughput: 0: 3461.6. Samples: 185923934. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:08,969][134211] Avg episode reward: [(0, '9.357')] [2025-01-04 12:06:11,680][134294] Updated weights for policy 0, policy_version 192154 (0.0019) [2025-01-04 12:06:13,743][134294] Updated weights for policy 0, policy_version 192164 (0.0014) [2025-01-04 12:06:13,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13653.3, 300 sec: 13620.9). Total num frames: 787107840. Throughput: 0: 3299.9. Samples: 185946142. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:13,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 12:06:15,808][134294] Updated weights for policy 0, policy_version 192174 (0.0013) [2025-01-04 12:06:18,492][134294] Updated weights for policy 0, policy_version 192184 (0.0015) [2025-01-04 12:06:18,968][134211] Fps is (10 sec: 16793.3, 60 sec: 13926.3, 300 sec: 13718.1). Total num frames: 787189760. Throughput: 0: 3400.0. Samples: 185961192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:18,969][134211] Avg episode reward: [(0, '9.113')] [2025-01-04 12:06:22,670][134294] Updated weights for policy 0, policy_version 192194 (0.0031) [2025-01-04 12:06:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.8, 300 sec: 13662.6). Total num frames: 787238912. Throughput: 0: 3296.7. Samples: 185978636. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:23,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 12:06:26,555][134294] Updated weights for policy 0, policy_version 192204 (0.0033) [2025-01-04 12:06:28,968][134211] Fps is (10 sec: 10240.2, 60 sec: 13038.9, 300 sec: 13523.7). Total num frames: 787292160. Throughput: 0: 3252.2. Samples: 185994186. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:28,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 12:06:30,568][134294] Updated weights for policy 0, policy_version 192214 (0.0029) [2025-01-04 12:06:32,910][134294] Updated weights for policy 0, policy_version 192224 (0.0015) [2025-01-04 12:06:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13039.0, 300 sec: 13509.9). Total num frames: 787365888. Throughput: 0: 3233.4. Samples: 186002426. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:33,968][134211] Avg episode reward: [(0, '9.013')] [2025-01-04 12:06:35,114][134294] Updated weights for policy 0, policy_version 192234 (0.0013) [2025-01-04 12:06:37,388][134294] Updated weights for policy 0, policy_version 192244 (0.0014) [2025-01-04 12:06:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13312.0, 300 sec: 13607.1). Total num frames: 787447808. Throughput: 0: 3461.2. Samples: 186030276. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 12:06:38,968][134211] Avg episode reward: [(0, '9.062')] [2025-01-04 12:06:40,993][134294] Updated weights for policy 0, policy_version 192254 (0.0028) [2025-01-04 12:06:43,969][134211] Fps is (10 sec: 13515.4, 60 sec: 13243.6, 300 sec: 13593.2). Total num frames: 787501056. Throughput: 0: 3354.5. Samples: 186046418. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:06:43,969][134211] Avg episode reward: [(0, '9.402')] [2025-01-04 12:06:45,157][134294] Updated weights for policy 0, policy_version 192264 (0.0029) [2025-01-04 12:06:48,920][134294] Updated weights for policy 0, policy_version 192274 (0.0031) [2025-01-04 12:06:48,968][134211] Fps is (10 sec: 10649.7, 60 sec: 13175.5, 300 sec: 13579.3). Total num frames: 787554304. Throughput: 0: 3300.0. Samples: 186054180. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:06:48,968][134211] Avg episode reward: [(0, '8.794')] [2025-01-04 12:06:52,836][134294] Updated weights for policy 0, policy_version 192284 (0.0030) [2025-01-04 12:06:53,968][134211] Fps is (10 sec: 10650.4, 60 sec: 13038.9, 300 sec: 13565.4). Total num frames: 787607552. Throughput: 0: 3256.5. Samples: 186070478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:06:53,969][134211] Avg episode reward: [(0, '9.779')] [2025-01-04 12:06:55,768][134294] Updated weights for policy 0, policy_version 192294 (0.0017) [2025-01-04 12:06:58,039][134294] Updated weights for policy 0, policy_version 192304 (0.0014) [2025-01-04 12:06:58,967][134211] Fps is (10 sec: 13517.0, 60 sec: 13175.7, 300 sec: 13634.8). Total num frames: 787689472. Throughput: 0: 3265.9. Samples: 186093108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:06:58,968][134211] Avg episode reward: [(0, '8.387')] [2025-01-04 12:07:00,386][134294] Updated weights for policy 0, policy_version 192314 (0.0014) [2025-01-04 12:07:02,704][134294] Updated weights for policy 0, policy_version 192324 (0.0014) [2025-01-04 12:07:03,968][134211] Fps is (10 sec: 16793.9, 60 sec: 13585.1, 300 sec: 13732.1). Total num frames: 787775488. Throughput: 0: 3227.8. Samples: 186106444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:03,968][134211] Avg episode reward: [(0, '10.206')] [2025-01-04 12:07:06,264][134294] Updated weights for policy 0, policy_version 192334 (0.0028) [2025-01-04 12:07:08,968][134211] Fps is (10 sec: 13516.4, 60 sec: 13380.3, 300 sec: 13620.9). Total num frames: 787824640. Throughput: 0: 3254.0. Samples: 186125066. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:08,969][134211] Avg episode reward: [(0, '9.999')] [2025-01-04 12:07:10,616][134294] Updated weights for policy 0, policy_version 192344 (0.0030) [2025-01-04 12:07:13,969][134211] Fps is (10 sec: 9829.6, 60 sec: 12765.7, 300 sec: 13551.5). Total num frames: 787873792. Throughput: 0: 3241.8. Samples: 186140068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:13,969][134211] Avg episode reward: [(0, '10.320')] [2025-01-04 12:07:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000192352_787873792.pth... [2025-01-04 12:07:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191553_784601088.pth [2025-01-04 12:07:14,643][134294] Updated weights for policy 0, policy_version 192354 (0.0030) [2025-01-04 12:07:17,906][134294] Updated weights for policy 0, policy_version 192364 (0.0020) [2025-01-04 12:07:18,968][134211] Fps is (10 sec: 11469.0, 60 sec: 12492.9, 300 sec: 13565.4). Total num frames: 787939328. Throughput: 0: 3231.6. Samples: 186147848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:18,968][134211] Avg episode reward: [(0, '9.062')] [2025-01-04 12:07:20,185][134294] Updated weights for policy 0, policy_version 192374 (0.0013) [2025-01-04 12:07:23,398][134294] Updated weights for policy 0, policy_version 192384 (0.0025) [2025-01-04 12:07:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 12834.1, 300 sec: 13607.1). Total num frames: 788008960. Throughput: 0: 3132.7. Samples: 186171248. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:23,969][134211] Avg episode reward: [(0, '9.034')] [2025-01-04 12:07:27,040][134294] Updated weights for policy 0, policy_version 192394 (0.0028) [2025-01-04 12:07:28,968][134211] Fps is (10 sec: 12287.8, 60 sec: 12834.1, 300 sec: 13509.8). Total num frames: 788062208. 
Throughput: 0: 3136.1. Samples: 186187540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:28,969][134211] Avg episode reward: [(0, '9.290')] [2025-01-04 12:07:30,752][134294] Updated weights for policy 0, policy_version 192404 (0.0028) [2025-01-04 12:07:33,968][134211] Fps is (10 sec: 11059.2, 60 sec: 12561.0, 300 sec: 13482.1). Total num frames: 788119552. Throughput: 0: 3151.5. Samples: 186195996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:33,969][134211] Avg episode reward: [(0, '9.915')] [2025-01-04 12:07:34,586][134294] Updated weights for policy 0, policy_version 192414 (0.0028) [2025-01-04 12:07:38,748][134294] Updated weights for policy 0, policy_version 192424 (0.0031) [2025-01-04 12:07:38,969][134211] Fps is (10 sec: 10648.7, 60 sec: 12014.8, 300 sec: 13468.2). Total num frames: 788168704. Throughput: 0: 3144.3. Samples: 186211974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:38,969][134211] Avg episode reward: [(0, '9.380')] [2025-01-04 12:07:41,608][134294] Updated weights for policy 0, policy_version 192434 (0.0015) [2025-01-04 12:07:43,968][134211] Fps is (10 sec: 11878.7, 60 sec: 12288.2, 300 sec: 13468.2). Total num frames: 788238336. Throughput: 0: 3080.2. Samples: 186231718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:43,968][134211] Avg episode reward: [(0, '9.159')] [2025-01-04 12:07:44,925][134294] Updated weights for policy 0, policy_version 192444 (0.0020) [2025-01-04 12:07:48,968][134211] Fps is (10 sec: 11060.3, 60 sec: 12083.2, 300 sec: 13384.9). Total num frames: 788279296. Throughput: 0: 2910.4. Samples: 186237412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:48,968][134211] Avg episode reward: [(0, '9.884')] [2025-01-04 12:07:50,162][134294] Updated weights for policy 0, policy_version 192454 (0.0019) [2025-01-04 12:07:53,969][134211] Fps is (10 sec: 8191.2, 60 sec: 11878.3, 300 sec: 13301.5). Total num frames: 788320256. 
Throughput: 0: 2780.0. Samples: 186250170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:53,970][134211] Avg episode reward: [(0, '10.590')] [2025-01-04 12:07:55,891][134294] Updated weights for policy 0, policy_version 192464 (0.0026) [2025-01-04 12:07:58,968][134211] Fps is (10 sec: 8601.6, 60 sec: 11264.0, 300 sec: 13232.2). Total num frames: 788365312. Throughput: 0: 2729.5. Samples: 186262894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:07:58,968][134211] Avg episode reward: [(0, '9.862')] [2025-01-04 12:07:59,362][134294] Updated weights for policy 0, policy_version 192474 (0.0024) [2025-01-04 12:08:02,580][134294] Updated weights for policy 0, policy_version 192484 (0.0024) [2025-01-04 12:08:03,968][134211] Fps is (10 sec: 11060.1, 60 sec: 10922.7, 300 sec: 13218.3). Total num frames: 788430848. Throughput: 0: 2769.0. Samples: 186272454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:03,968][134211] Avg episode reward: [(0, '9.168')] [2025-01-04 12:08:05,442][134294] Updated weights for policy 0, policy_version 192494 (0.0023) [2025-01-04 12:08:07,906][134294] Updated weights for policy 0, policy_version 192504 (0.0022) [2025-01-04 12:08:08,968][134211] Fps is (10 sec: 15155.3, 60 sec: 11537.1, 300 sec: 13287.7). Total num frames: 788516864. Throughput: 0: 2730.5. Samples: 186294120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:08,968][134211] Avg episode reward: [(0, '9.830')] [2025-01-04 12:08:10,121][134294] Updated weights for policy 0, policy_version 192514 (0.0017) [2025-01-04 12:08:13,299][134294] Updated weights for policy 0, policy_version 192524 (0.0026) [2025-01-04 12:08:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 11878.6, 300 sec: 13301.6). Total num frames: 788586496. Throughput: 0: 2886.7. Samples: 186317442. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:13,968][134211] Avg episode reward: [(0, '8.878')] [2025-01-04 12:08:16,400][134294] Updated weights for policy 0, policy_version 192534 (0.0029) [2025-01-04 12:08:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 11810.1, 300 sec: 13259.9). Total num frames: 788647936. Throughput: 0: 2910.9. Samples: 186326986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:18,968][134211] Avg episode reward: [(0, '9.302')] [2025-01-04 12:08:19,718][134294] Updated weights for policy 0, policy_version 192544 (0.0028) [2025-01-04 12:08:22,840][134294] Updated weights for policy 0, policy_version 192554 (0.0023) [2025-01-04 12:08:23,968][134211] Fps is (10 sec: 12697.1, 60 sec: 11741.8, 300 sec: 13134.9). Total num frames: 788713472. Throughput: 0: 2985.9. Samples: 186346338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:23,969][134211] Avg episode reward: [(0, '9.819')] [2025-01-04 12:08:25,876][134294] Updated weights for policy 0, policy_version 192564 (0.0027) [2025-01-04 12:08:28,703][134294] Updated weights for policy 0, policy_version 192574 (0.0023) [2025-01-04 12:08:28,968][134211] Fps is (10 sec: 13516.2, 60 sec: 12014.9, 300 sec: 13010.0). Total num frames: 788783104. Throughput: 0: 3005.7. Samples: 186366978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:28,969][134211] Avg episode reward: [(0, '10.750')] [2025-01-04 12:08:31,737][134294] Updated weights for policy 0, policy_version 192584 (0.0024) [2025-01-04 12:08:33,969][134211] Fps is (10 sec: 13925.3, 60 sec: 12219.5, 300 sec: 13023.8). Total num frames: 788852736. Throughput: 0: 3106.6. Samples: 186377212. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:33,969][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 12:08:34,871][134294] Updated weights for policy 0, policy_version 192594 (0.0022) [2025-01-04 12:08:37,858][134294] Updated weights for policy 0, policy_version 192604 (0.0023) [2025-01-04 12:08:38,968][134211] Fps is (10 sec: 13517.3, 60 sec: 12493.0, 300 sec: 13037.8). Total num frames: 788918272. Throughput: 0: 3271.0. Samples: 186397364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:38,968][134211] Avg episode reward: [(0, '10.571')] [2025-01-04 12:08:41,088][134294] Updated weights for policy 0, policy_version 192614 (0.0028) [2025-01-04 12:08:43,968][134211] Fps is (10 sec: 13108.8, 60 sec: 12424.5, 300 sec: 13023.9). Total num frames: 788983808. Throughput: 0: 3417.9. Samples: 186416698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:43,968][134211] Avg episode reward: [(0, '10.127')] [2025-01-04 12:08:44,203][134294] Updated weights for policy 0, policy_version 192624 (0.0027) [2025-01-04 12:08:47,188][134294] Updated weights for policy 0, policy_version 192634 (0.0022) [2025-01-04 12:08:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 789049344. Throughput: 0: 3430.5. Samples: 186426828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:08:48,968][134211] Avg episode reward: [(0, '9.277')] [2025-01-04 12:08:50,141][134294] Updated weights for policy 0, policy_version 192644 (0.0026) [2025-01-04 12:08:53,029][134294] Updated weights for policy 0, policy_version 192654 (0.0025) [2025-01-04 12:08:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 13380.4, 300 sec: 13051.7). Total num frames: 789123072. Throughput: 0: 3421.2. Samples: 186448074. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:08:53,968][134211] Avg episode reward: [(0, '10.042')] [2025-01-04 12:08:55,866][134294] Updated weights for policy 0, policy_version 192664 (0.0027) [2025-01-04 12:08:58,743][134294] Updated weights for policy 0, policy_version 192674 (0.0026) [2025-01-04 12:08:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13789.8, 300 sec: 13065.5). Total num frames: 789192704. Throughput: 0: 3376.7. Samples: 186469392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:08:58,968][134211] Avg episode reward: [(0, '8.684')] [2025-01-04 12:09:01,598][134294] Updated weights for policy 0, policy_version 192684 (0.0026) [2025-01-04 12:09:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13858.2, 300 sec: 13079.4). Total num frames: 789262336. Throughput: 0: 3397.9. Samples: 186479890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:03,968][134211] Avg episode reward: [(0, '8.804')] [2025-01-04 12:09:04,722][134294] Updated weights for policy 0, policy_version 192694 (0.0025) [2025-01-04 12:09:07,691][134294] Updated weights for policy 0, policy_version 192704 (0.0024) [2025-01-04 12:09:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13585.0, 300 sec: 13079.4). Total num frames: 789331968. Throughput: 0: 3425.1. Samples: 186500466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:08,969][134211] Avg episode reward: [(0, '7.872')] [2025-01-04 12:09:10,518][134294] Updated weights for policy 0, policy_version 192714 (0.0025) [2025-01-04 12:09:13,368][134294] Updated weights for policy 0, policy_version 192724 (0.0027) [2025-01-04 12:09:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13653.3, 300 sec: 13107.2). Total num frames: 789405696. Throughput: 0: 3445.0. Samples: 186522000. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:13,968][134211] Avg episode reward: [(0, '10.600')] [2025-01-04 12:09:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000192726_789405696.pth... [2025-01-04 12:09:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000191965_786288640.pth [2025-01-04 12:09:16,215][134294] Updated weights for policy 0, policy_version 192734 (0.0026) [2025-01-04 12:09:18,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13789.8, 300 sec: 13121.1). Total num frames: 789475328. Throughput: 0: 3450.4. Samples: 186532474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:18,968][134211] Avg episode reward: [(0, '9.661')] [2025-01-04 12:09:19,244][134294] Updated weights for policy 0, policy_version 192744 (0.0023) [2025-01-04 12:09:22,194][134294] Updated weights for policy 0, policy_version 192754 (0.0024) [2025-01-04 12:09:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13858.2, 300 sec: 13135.0). Total num frames: 789544960. Throughput: 0: 3466.3. Samples: 186553350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:23,968][134211] Avg episode reward: [(0, '10.180')] [2025-01-04 12:09:25,042][134294] Updated weights for policy 0, policy_version 192764 (0.0024) [2025-01-04 12:09:27,822][134294] Updated weights for policy 0, policy_version 192774 (0.0023) [2025-01-04 12:09:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.2, 300 sec: 13148.9). Total num frames: 789614592. Throughput: 0: 3513.0. Samples: 186574782. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:28,968][134211] Avg episode reward: [(0, '9.878')] [2025-01-04 12:09:30,727][134294] Updated weights for policy 0, policy_version 192784 (0.0023) [2025-01-04 12:09:33,532][134294] Updated weights for policy 0, policy_version 192794 (0.0025) [2025-01-04 12:09:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13926.6, 300 sec: 13190.5). Total num frames: 789688320. Throughput: 0: 3528.5. Samples: 186585612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:33,968][134211] Avg episode reward: [(0, '9.608')] [2025-01-04 12:09:36,450][134294] Updated weights for policy 0, policy_version 192804 (0.0023) [2025-01-04 12:09:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13994.7, 300 sec: 13218.3). Total num frames: 789757952. Throughput: 0: 3530.2. Samples: 186606932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:38,968][134211] Avg episode reward: [(0, '10.420')] [2025-01-04 12:09:39,492][134294] Updated weights for policy 0, policy_version 192814 (0.0022) [2025-01-04 12:09:42,437][134294] Updated weights for policy 0, policy_version 192824 (0.0026) [2025-01-04 12:09:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 13121.1). Total num frames: 789827584. Throughput: 0: 3517.2. Samples: 186627664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:43,968][134211] Avg episode reward: [(0, '10.948')] [2025-01-04 12:09:45,268][134294] Updated weights for policy 0, policy_version 192834 (0.0026) [2025-01-04 12:09:48,102][134294] Updated weights for policy 0, policy_version 192844 (0.0025) [2025-01-04 12:09:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14131.2, 300 sec: 13148.9). Total num frames: 789897216. Throughput: 0: 3525.2. Samples: 186638522. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:48,968][134211] Avg episode reward: [(0, '9.864')] [2025-01-04 12:09:51,023][134294] Updated weights for policy 0, policy_version 192854 (0.0023) [2025-01-04 12:09:53,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14062.7, 300 sec: 13162.7). Total num frames: 789966848. Throughput: 0: 3540.3. Samples: 186659782. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:09:53,970][134211] Avg episode reward: [(0, '10.145')] [2025-01-04 12:09:53,980][134294] Updated weights for policy 0, policy_version 192864 (0.0026) [2025-01-04 12:09:56,931][134294] Updated weights for policy 0, policy_version 192874 (0.0025) [2025-01-04 12:09:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14063.0, 300 sec: 13162.7). Total num frames: 790036480. Throughput: 0: 3522.1. Samples: 186680496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:09:58,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 12:09:59,976][134294] Updated weights for policy 0, policy_version 192884 (0.0024) [2025-01-04 12:10:02,931][134294] Updated weights for policy 0, policy_version 192894 (0.0025) [2025-01-04 12:10:03,968][134211] Fps is (10 sec: 13928.1, 60 sec: 14062.9, 300 sec: 13190.5). Total num frames: 790106112. Throughput: 0: 3513.9. Samples: 186690598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:03,968][134211] Avg episode reward: [(0, '9.284')] [2025-01-04 12:10:05,868][134294] Updated weights for policy 0, policy_version 192904 (0.0025) [2025-01-04 12:10:08,613][134294] Updated weights for policy 0, policy_version 192914 (0.0024) [2025-01-04 12:10:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.3, 300 sec: 13190.5). Total num frames: 790179840. Throughput: 0: 3523.7. Samples: 186711914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:08,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 12:10:11,511][134294] Updated weights for policy 0, policy_version 192924 (0.0025) [2025-01-04 12:10:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14062.9, 300 sec: 13204.4). Total num frames: 790249472. Throughput: 0: 3515.7. Samples: 186732988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:13,968][134211] Avg episode reward: [(0, '10.789')] [2025-01-04 12:10:14,569][134294] Updated weights for policy 0, policy_version 192934 (0.0025) [2025-01-04 12:10:16,648][134294] Updated weights for policy 0, policy_version 192944 (0.0016) [2025-01-04 12:10:18,969][134211] Fps is (10 sec: 15563.5, 60 sec: 14335.8, 300 sec: 13301.5). Total num frames: 790335488. Throughput: 0: 3550.9. Samples: 186745406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:18,969][134211] Avg episode reward: [(0, '9.536')] [2025-01-04 12:10:19,046][134294] Updated weights for policy 0, policy_version 192954 (0.0022) [2025-01-04 12:10:22,004][134294] Updated weights for policy 0, policy_version 192964 (0.0022) [2025-01-04 12:10:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14336.0, 300 sec: 13204.4). Total num frames: 790405120. Throughput: 0: 3604.8. Samples: 186769146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:23,968][134211] Avg episode reward: [(0, '8.661')] [2025-01-04 12:10:24,969][134294] Updated weights for policy 0, policy_version 192974 (0.0023) [2025-01-04 12:10:27,785][134294] Updated weights for policy 0, policy_version 192984 (0.0025) [2025-01-04 12:10:28,968][134211] Fps is (10 sec: 14337.2, 60 sec: 14404.3, 300 sec: 13204.4). Total num frames: 790478848. Throughput: 0: 3613.3. Samples: 186790264. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:28,968][134211] Avg episode reward: [(0, '8.341')] [2025-01-04 12:10:30,652][134294] Updated weights for policy 0, policy_version 192994 (0.0024) [2025-01-04 12:10:33,452][134294] Updated weights for policy 0, policy_version 193004 (0.0026) [2025-01-04 12:10:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14336.0, 300 sec: 13218.3). Total num frames: 790548480. Throughput: 0: 3611.5. Samples: 186801040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:33,968][134211] Avg episode reward: [(0, '10.471')] [2025-01-04 12:10:36,325][134294] Updated weights for policy 0, policy_version 193014 (0.0024) [2025-01-04 12:10:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 13259.9). Total num frames: 790618112. Throughput: 0: 3612.8. Samples: 186822354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:38,968][134211] Avg episode reward: [(0, '9.262')] [2025-01-04 12:10:39,370][134294] Updated weights for policy 0, policy_version 193024 (0.0025) [2025-01-04 12:10:42,295][134294] Updated weights for policy 0, policy_version 193034 (0.0023) [2025-01-04 12:10:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 13301.6). Total num frames: 790687744. Throughput: 0: 3612.1. Samples: 186843040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:43,968][134211] Avg episode reward: [(0, '10.388')] [2025-01-04 12:10:45,259][134294] Updated weights for policy 0, policy_version 193044 (0.0022) [2025-01-04 12:10:48,086][134294] Updated weights for policy 0, policy_version 193054 (0.0022) [2025-01-04 12:10:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14404.3, 300 sec: 13343.3). Total num frames: 790761472. Throughput: 0: 3628.1. Samples: 186853862. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:48,968][134211] Avg episode reward: [(0, '9.565')] [2025-01-04 12:10:50,986][134294] Updated weights for policy 0, policy_version 193064 (0.0025) [2025-01-04 12:10:53,902][134294] Updated weights for policy 0, policy_version 193074 (0.0025) [2025-01-04 12:10:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.5, 300 sec: 13329.4). Total num frames: 790831104. Throughput: 0: 3623.0. Samples: 186874948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:53,968][134211] Avg episode reward: [(0, '9.782')] [2025-01-04 12:10:56,786][134294] Updated weights for policy 0, policy_version 193084 (0.0024) [2025-01-04 12:10:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 13357.1). Total num frames: 790900736. Throughput: 0: 3620.3. Samples: 186895900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:10:58,968][134211] Avg episode reward: [(0, '9.181')] [2025-01-04 12:10:59,859][134294] Updated weights for policy 0, policy_version 193094 (0.0024) [2025-01-04 12:11:02,504][134294] Updated weights for policy 0, policy_version 193104 (0.0021) [2025-01-04 12:11:03,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.1, 300 sec: 13426.6). Total num frames: 790982656. Throughput: 0: 3571.4. Samples: 186906116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:03,968][134211] Avg episode reward: [(0, '10.092')] [2025-01-04 12:11:04,466][134294] Updated weights for policy 0, policy_version 193114 (0.0015) [2025-01-04 12:11:07,154][134294] Updated weights for policy 0, policy_version 193124 (0.0022) [2025-01-04 12:11:08,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14677.3, 300 sec: 13398.8). Total num frames: 791060480. Throughput: 0: 3623.8. Samples: 186932218. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:08,968][134211] Avg episode reward: [(0, '10.387')] [2025-01-04 12:11:10,140][134294] Updated weights for policy 0, policy_version 193134 (0.0026) [2025-01-04 12:11:12,983][134294] Updated weights for policy 0, policy_version 193144 (0.0024) [2025-01-04 12:11:13,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14677.3, 300 sec: 13357.1). Total num frames: 791130112. Throughput: 0: 3627.1. Samples: 186953484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:13,969][134211] Avg episode reward: [(0, '10.317')] [2025-01-04 12:11:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000193147_791130112.pth... [2025-01-04 12:11:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000192352_787873792.pth [2025-01-04 12:11:15,958][134294] Updated weights for policy 0, policy_version 193154 (0.0027) [2025-01-04 12:11:18,754][134294] Updated weights for policy 0, policy_version 193164 (0.0023) [2025-01-04 12:11:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.5, 300 sec: 13426.6). Total num frames: 791199744. Throughput: 0: 3617.3. Samples: 186963818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:18,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 12:11:21,597][134294] Updated weights for policy 0, policy_version 193174 (0.0024) [2025-01-04 12:11:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.3, 300 sec: 13482.1). Total num frames: 791269376. Throughput: 0: 3620.7. Samples: 186985284. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:23,968][134211] Avg episode reward: [(0, '8.608')] [2025-01-04 12:11:24,596][134294] Updated weights for policy 0, policy_version 193184 (0.0025) [2025-01-04 12:11:27,534][134294] Updated weights for policy 0, policy_version 193194 (0.0023) [2025-01-04 12:11:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 13468.2). Total num frames: 791339008. Throughput: 0: 3621.2. Samples: 187005992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:28,968][134211] Avg episode reward: [(0, '9.238')] [2025-01-04 12:11:30,486][134294] Updated weights for policy 0, policy_version 193204 (0.0026) [2025-01-04 12:11:33,268][134294] Updated weights for policy 0, policy_version 193214 (0.0024) [2025-01-04 12:11:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14404.3, 300 sec: 13440.4). Total num frames: 791412736. Throughput: 0: 3618.8. Samples: 187016710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:33,969][134211] Avg episode reward: [(0, '10.462')] [2025-01-04 12:11:36,178][134294] Updated weights for policy 0, policy_version 193224 (0.0022) [2025-01-04 12:11:38,969][134211] Fps is (10 sec: 14334.2, 60 sec: 14404.0, 300 sec: 13496.0). Total num frames: 791482368. Throughput: 0: 3623.1. Samples: 187037992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:38,969][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 12:11:39,126][134294] Updated weights for policy 0, policy_version 193234 (0.0024) [2025-01-04 12:11:41,985][134294] Updated weights for policy 0, policy_version 193244 (0.0027) [2025-01-04 12:11:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14404.3, 300 sec: 13551.5). Total num frames: 791552000. Throughput: 0: 3623.5. Samples: 187058956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:43,968][134211] Avg episode reward: [(0, '9.369')] [2025-01-04 12:11:44,958][134294] Updated weights for policy 0, policy_version 193254 (0.0022) [2025-01-04 12:11:47,137][134294] Updated weights for policy 0, policy_version 193264 (0.0016) [2025-01-04 12:11:48,968][134211] Fps is (10 sec: 16386.2, 60 sec: 14745.6, 300 sec: 13690.4). Total num frames: 791646208. Throughput: 0: 3647.9. Samples: 187070270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:48,968][134211] Avg episode reward: [(0, '9.828')] [2025-01-04 12:11:49,043][134294] Updated weights for policy 0, policy_version 193274 (0.0012) [2025-01-04 12:11:50,936][134294] Updated weights for policy 0, policy_version 193284 (0.0013) [2025-01-04 12:11:52,858][134294] Updated weights for policy 0, policy_version 193294 (0.0013) [2025-01-04 12:11:53,968][134211] Fps is (10 sec: 20070.7, 60 sec: 15360.0, 300 sec: 13773.7). Total num frames: 791752704. Throughput: 0: 3791.7. Samples: 187102844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:53,968][134211] Avg episode reward: [(0, '9.498')] [2025-01-04 12:11:54,950][134294] Updated weights for policy 0, policy_version 193304 (0.0018) [2025-01-04 12:11:58,030][134294] Updated weights for policy 0, policy_version 193314 (0.0026) [2025-01-04 12:11:58,968][134211] Fps is (10 sec: 18021.5, 60 sec: 15428.2, 300 sec: 13732.0). Total num frames: 791826432. Throughput: 0: 3863.7. Samples: 187127352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:11:58,969][134211] Avg episode reward: [(0, '9.866')] [2025-01-04 12:12:01,062][134294] Updated weights for policy 0, policy_version 193324 (0.0027) [2025-01-04 12:12:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15155.2, 300 sec: 13787.6). Total num frames: 791891968. Throughput: 0: 3856.5. Samples: 187137360. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:12:03,968][134211] Avg episode reward: [(0, '9.828')] [2025-01-04 12:12:04,285][134294] Updated weights for policy 0, policy_version 193334 (0.0027) [2025-01-04 12:12:07,201][134294] Updated weights for policy 0, policy_version 193344 (0.0027) [2025-01-04 12:12:08,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14950.4, 300 sec: 13843.1). Total num frames: 791957504. Throughput: 0: 3821.1. Samples: 187157232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:12:08,968][134211] Avg episode reward: [(0, '9.996')] [2025-01-04 12:12:10,451][134294] Updated weights for policy 0, policy_version 193354 (0.0030) [2025-01-04 12:12:13,278][134294] Updated weights for policy 0, policy_version 193364 (0.0025) [2025-01-04 12:12:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.5, 300 sec: 13857.0). Total num frames: 792027136. Throughput: 0: 3810.5. Samples: 187177464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:13,968][134211] Avg episode reward: [(0, '10.310')] [2025-01-04 12:12:16,235][134294] Updated weights for policy 0, policy_version 193374 (0.0025) [2025-01-04 12:12:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 13857.0). Total num frames: 792096768. Throughput: 0: 3805.3. Samples: 187187946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:18,968][134211] Avg episode reward: [(0, '9.966')] [2025-01-04 12:12:19,259][134294] Updated weights for policy 0, policy_version 193384 (0.0024) [2025-01-04 12:12:22,202][134294] Updated weights for policy 0, policy_version 193394 (0.0022) [2025-01-04 12:12:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 13912.5). Total num frames: 792166400. Throughput: 0: 3794.6. Samples: 187208744. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:23,968][134211] Avg episode reward: [(0, '10.380')] [2025-01-04 12:12:25,101][134294] Updated weights for policy 0, policy_version 193404 (0.0025) [2025-01-04 12:12:27,824][134294] Updated weights for policy 0, policy_version 193414 (0.0025) [2025-01-04 12:12:28,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14950.3, 300 sec: 13954.2). Total num frames: 792236032. Throughput: 0: 3804.0. Samples: 187230140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:28,969][134211] Avg episode reward: [(0, '10.135')] [2025-01-04 12:12:30,781][134294] Updated weights for policy 0, policy_version 193424 (0.0025) [2025-01-04 12:12:33,627][134294] Updated weights for policy 0, policy_version 193434 (0.0023) [2025-01-04 12:12:33,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14950.4, 300 sec: 14037.5). Total num frames: 792309760. Throughput: 0: 3789.5. Samples: 187240796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:33,968][134211] Avg episode reward: [(0, '8.981')] [2025-01-04 12:12:36,541][134294] Updated weights for policy 0, policy_version 193444 (0.0026) [2025-01-04 12:12:38,971][134211] Fps is (10 sec: 14332.6, 60 sec: 14950.0, 300 sec: 14037.3). Total num frames: 792379392. Throughput: 0: 3540.5. Samples: 187262178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:38,971][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 12:12:39,398][134294] Updated weights for policy 0, policy_version 193454 (0.0024) [2025-01-04 12:12:42,445][134294] Updated weights for policy 0, policy_version 193464 (0.0024) [2025-01-04 12:12:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14134.7). Total num frames: 792449024. Throughput: 0: 3458.5. Samples: 187282982. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:43,968][134211] Avg episode reward: [(0, '9.451')] [2025-01-04 12:12:45,284][134294] Updated weights for policy 0, policy_version 193474 (0.0024) [2025-01-04 12:12:48,075][134294] Updated weights for policy 0, policy_version 193484 (0.0022) [2025-01-04 12:12:48,968][134211] Fps is (10 sec: 14340.2, 60 sec: 14609.0, 300 sec: 14245.8). Total num frames: 792522752. Throughput: 0: 3478.5. Samples: 187293892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:48,968][134211] Avg episode reward: [(0, '9.789')] [2025-01-04 12:12:50,982][134294] Updated weights for policy 0, policy_version 193494 (0.0026) [2025-01-04 12:12:53,768][134294] Updated weights for policy 0, policy_version 193504 (0.0025) [2025-01-04 12:12:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.7, 300 sec: 14329.1). Total num frames: 792592384. Throughput: 0: 3515.7. Samples: 187315438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:53,968][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 12:12:56,657][134294] Updated weights for policy 0, policy_version 193514 (0.0026) [2025-01-04 12:12:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.5, 300 sec: 14342.9). Total num frames: 792662016. Throughput: 0: 3539.1. Samples: 187336722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:12:58,968][134211] Avg episode reward: [(0, '9.554')] [2025-01-04 12:12:59,701][134294] Updated weights for policy 0, policy_version 193524 (0.0025) [2025-01-04 12:13:02,638][134294] Updated weights for policy 0, policy_version 193534 (0.0026) [2025-01-04 12:13:03,968][134211] Fps is (10 sec: 13925.5, 60 sec: 13994.5, 300 sec: 14287.4). Total num frames: 792731648. Throughput: 0: 3537.2. Samples: 187347120. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:13:03,969][134211] Avg episode reward: [(0, '9.799')] [2025-01-04 12:13:05,695][134294] Updated weights for policy 0, policy_version 193544 (0.0027) [2025-01-04 12:13:08,449][134294] Updated weights for policy 0, policy_version 193554 (0.0024) [2025-01-04 12:13:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14287.4). Total num frames: 792801280. Throughput: 0: 3536.1. Samples: 187367868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:13:08,968][134211] Avg episode reward: [(0, '10.205')] [2025-01-04 12:13:11,324][134294] Updated weights for policy 0, policy_version 193564 (0.0025) [2025-01-04 12:13:13,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14062.9, 300 sec: 14315.2). Total num frames: 792870912. Throughput: 0: 3531.7. Samples: 187389064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 12:13:13,968][134211] Avg episode reward: [(0, '10.574')] [2025-01-04 12:13:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000193572_792870912.pth... [2025-01-04 12:13:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000192726_789405696.pth [2025-01-04 12:13:14,293][134294] Updated weights for policy 0, policy_version 193574 (0.0023) [2025-01-04 12:13:16,404][134294] Updated weights for policy 0, policy_version 193584 (0.0014) [2025-01-04 12:13:18,271][134294] Updated weights for policy 0, policy_version 193594 (0.0014) [2025-01-04 12:13:18,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14609.1, 300 sec: 14440.2). Total num frames: 792973312. Throughput: 0: 3586.1. Samples: 187402170. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:18,969][134211] Avg episode reward: [(0, '8.850')] [2025-01-04 12:13:20,180][134294] Updated weights for policy 0, policy_version 193604 (0.0013) [2025-01-04 12:13:21,998][134294] Updated weights for policy 0, policy_version 193614 (0.0013) [2025-01-04 12:13:23,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15155.2, 300 sec: 14551.2). Total num frames: 793075712. Throughput: 0: 3839.4. Samples: 187434938. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:23,968][134211] Avg episode reward: [(0, '9.199')] [2025-01-04 12:13:24,418][134294] Updated weights for policy 0, policy_version 193624 (0.0020) [2025-01-04 12:13:27,382][134294] Updated weights for policy 0, policy_version 193634 (0.0025) [2025-01-04 12:13:28,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15087.1, 300 sec: 14537.4). Total num frames: 793141248. Throughput: 0: 3862.2. Samples: 187456782. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:28,968][134211] Avg episode reward: [(0, '9.523')] [2025-01-04 12:13:30,509][134294] Updated weights for policy 0, policy_version 193644 (0.0025) [2025-01-04 12:13:33,556][134294] Updated weights for policy 0, policy_version 193654 (0.0027) [2025-01-04 12:13:33,969][134211] Fps is (10 sec: 13514.6, 60 sec: 15018.3, 300 sec: 14551.1). Total num frames: 793210880. Throughput: 0: 3844.4. Samples: 187466898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:33,970][134211] Avg episode reward: [(0, '9.909')] [2025-01-04 12:13:36,427][134294] Updated weights for policy 0, policy_version 193664 (0.0024) [2025-01-04 12:13:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15019.4, 300 sec: 14565.1). Total num frames: 793280512. Throughput: 0: 3825.1. Samples: 187487568. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:38,968][134211] Avg episode reward: [(0, '10.753')] [2025-01-04 12:13:39,549][134294] Updated weights for policy 0, policy_version 193674 (0.0027) [2025-01-04 12:13:42,502][134294] Updated weights for policy 0, policy_version 193684 (0.0026) [2025-01-04 12:13:43,968][134211] Fps is (10 sec: 13518.9, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 793346048. Throughput: 0: 3803.2. Samples: 187507868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:43,968][134211] Avg episode reward: [(0, '9.608')] [2025-01-04 12:13:45,408][134294] Updated weights for policy 0, policy_version 193694 (0.0025) [2025-01-04 12:13:48,284][134294] Updated weights for policy 0, policy_version 193704 (0.0025) [2025-01-04 12:13:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 793419776. Throughput: 0: 3811.8. Samples: 187518650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:48,968][134211] Avg episode reward: [(0, '8.938')] [2025-01-04 12:13:51,186][134294] Updated weights for policy 0, policy_version 193714 (0.0022) [2025-01-04 12:13:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 793489408. Throughput: 0: 3820.8. Samples: 187539802. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:53,968][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 12:13:54,243][134294] Updated weights for policy 0, policy_version 193724 (0.0025) [2025-01-04 12:13:57,051][134294] Updated weights for policy 0, policy_version 193734 (0.0023) [2025-01-04 12:13:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 793559040. Throughput: 0: 3812.8. Samples: 187560640. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:13:58,968][134211] Avg episode reward: [(0, '8.136')] [2025-01-04 12:13:59,949][134294] Updated weights for policy 0, policy_version 193744 (0.0030) [2025-01-04 12:14:02,825][134294] Updated weights for policy 0, policy_version 193754 (0.0023) [2025-01-04 12:14:03,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 793628672. Throughput: 0: 3762.2. Samples: 187571474. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:14:03,969][134211] Avg episode reward: [(0, '10.426')] [2025-01-04 12:14:05,815][134294] Updated weights for policy 0, policy_version 193764 (0.0023) [2025-01-04 12:14:08,597][134294] Updated weights for policy 0, policy_version 193774 (0.0024) [2025-01-04 12:14:08,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15018.7, 300 sec: 14565.1). Total num frames: 793702400. Throughput: 0: 3506.8. Samples: 187592744. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:14:08,968][134211] Avg episode reward: [(0, '8.835')] [2025-01-04 12:14:11,484][134294] Updated weights for policy 0, policy_version 193784 (0.0025) [2025-01-04 12:14:13,969][134211] Fps is (10 sec: 14334.4, 60 sec: 15018.3, 300 sec: 14565.0). Total num frames: 793772032. Throughput: 0: 3492.6. Samples: 187613956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:14:13,970][134211] Avg episode reward: [(0, '8.655')] [2025-01-04 12:14:14,469][134294] Updated weights for policy 0, policy_version 193794 (0.0023) [2025-01-04 12:14:17,339][134294] Updated weights for policy 0, policy_version 193804 (0.0026) [2025-01-04 12:14:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14565.1). Total num frames: 793841664. Throughput: 0: 3496.7. Samples: 187624244. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:14:18,968][134211] Avg episode reward: [(0, '10.579')] [2025-01-04 12:14:20,294][134294] Updated weights for policy 0, policy_version 193814 (0.0023) [2025-01-04 12:14:23,042][134294] Updated weights for policy 0, policy_version 193824 (0.0025) [2025-01-04 12:14:23,968][134211] Fps is (10 sec: 13928.9, 60 sec: 13926.4, 300 sec: 14565.1). Total num frames: 793911296. Throughput: 0: 3516.6. Samples: 187645814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:14:23,968][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 12:14:25,944][134294] Updated weights for policy 0, policy_version 193834 (0.0025) [2025-01-04 12:14:28,730][134294] Updated weights for policy 0, policy_version 193844 (0.0024) [2025-01-04 12:14:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14062.9, 300 sec: 14565.1). Total num frames: 793985024. Throughput: 0: 3546.2. Samples: 187667448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:28,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 12:14:31,624][134294] Updated weights for policy 0, policy_version 193854 (0.0024) [2025-01-04 12:14:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14063.3, 300 sec: 14565.1). Total num frames: 794054656. Throughput: 0: 3540.6. Samples: 187677976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:33,968][134211] Avg episode reward: [(0, '9.437')] [2025-01-04 12:14:34,633][134294] Updated weights for policy 0, policy_version 193864 (0.0025) [2025-01-04 12:14:37,604][134294] Updated weights for policy 0, policy_version 193874 (0.0027) [2025-01-04 12:14:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14063.0, 300 sec: 14565.1). Total num frames: 794124288. Throughput: 0: 3534.1. Samples: 187698838. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:38,968][134211] Avg episode reward: [(0, '10.171')] [2025-01-04 12:14:40,496][134294] Updated weights for policy 0, policy_version 193884 (0.0025) [2025-01-04 12:14:42,674][134294] Updated weights for policy 0, policy_version 193894 (0.0016) [2025-01-04 12:14:43,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14404.3, 300 sec: 14620.6). Total num frames: 794210304. Throughput: 0: 3617.5. Samples: 187723428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:43,968][134211] Avg episode reward: [(0, '10.121')] [2025-01-04 12:14:45,030][134294] Updated weights for policy 0, policy_version 193904 (0.0020) [2025-01-04 12:14:47,848][134294] Updated weights for policy 0, policy_version 193914 (0.0026) [2025-01-04 12:14:48,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14404.3, 300 sec: 14634.6). Total num frames: 794284032. Throughput: 0: 3636.1. Samples: 187735098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:48,968][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 12:14:50,764][134294] Updated weights for policy 0, policy_version 193924 (0.0024) [2025-01-04 12:14:53,570][134294] Updated weights for policy 0, policy_version 193934 (0.0020) [2025-01-04 12:14:53,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14404.2, 300 sec: 14634.5). Total num frames: 794353664. Throughput: 0: 3640.7. Samples: 187756578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:53,969][134211] Avg episode reward: [(0, '9.253')] [2025-01-04 12:14:56,547][134294] Updated weights for policy 0, policy_version 193944 (0.0024) [2025-01-04 12:14:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 794427392. Throughput: 0: 3634.5. Samples: 187777504. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:14:58,968][134211] Avg episode reward: [(0, '8.822')] [2025-01-04 12:14:59,600][134294] Updated weights for policy 0, policy_version 193954 (0.0024) [2025-01-04 12:15:02,589][134294] Updated weights for policy 0, policy_version 193964 (0.0024) [2025-01-04 12:15:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.4, 300 sec: 14620.6). Total num frames: 794492928. Throughput: 0: 3632.1. Samples: 187787690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:03,968][134211] Avg episode reward: [(0, '9.191')] [2025-01-04 12:15:05,428][134294] Updated weights for policy 0, policy_version 193974 (0.0023) [2025-01-04 12:15:08,106][134294] Updated weights for policy 0, policy_version 193984 (0.0024) [2025-01-04 12:15:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 794574848. Throughput: 0: 3627.0. Samples: 187809030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:08,968][134211] Avg episode reward: [(0, '9.115')] [2025-01-04 12:15:10,024][134294] Updated weights for policy 0, policy_version 193994 (0.0013) [2025-01-04 12:15:11,898][134294] Updated weights for policy 0, policy_version 194004 (0.0013) [2025-01-04 12:15:13,792][134294] Updated weights for policy 0, policy_version 194014 (0.0014) [2025-01-04 12:15:13,967][134211] Fps is (10 sec: 18842.0, 60 sec: 15155.7, 300 sec: 14731.8). Total num frames: 794681344. Throughput: 0: 3843.3. Samples: 187840398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:13,968][134211] Avg episode reward: [(0, '8.351')] [2025-01-04 12:15:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194015_794685440.pth... 
[2025-01-04 12:15:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000193147_791130112.pth [2025-01-04 12:15:15,706][134294] Updated weights for policy 0, policy_version 194024 (0.0013) [2025-01-04 12:15:18,422][134294] Updated weights for policy 0, policy_version 194034 (0.0020) [2025-01-04 12:15:18,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15496.4, 300 sec: 14801.1). Total num frames: 794771456. Throughput: 0: 3956.6. Samples: 187856024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:18,969][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 12:15:21,461][134294] Updated weights for policy 0, policy_version 194044 (0.0029) [2025-01-04 12:15:23,971][134211] Fps is (10 sec: 15559.5, 60 sec: 15427.4, 300 sec: 14773.2). Total num frames: 794836992. Throughput: 0: 3950.6. Samples: 187876630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:23,972][134211] Avg episode reward: [(0, '9.694')] [2025-01-04 12:15:24,657][134294] Updated weights for policy 0, policy_version 194054 (0.0028) [2025-01-04 12:15:27,688][134294] Updated weights for policy 0, policy_version 194064 (0.0026) [2025-01-04 12:15:28,968][134211] Fps is (10 sec: 13107.9, 60 sec: 15291.7, 300 sec: 14759.5). Total num frames: 794902528. Throughput: 0: 3842.0. Samples: 187896318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:28,968][134211] Avg episode reward: [(0, '9.714')] [2025-01-04 12:15:30,692][134294] Updated weights for policy 0, policy_version 194074 (0.0027) [2025-01-04 12:15:33,562][134294] Updated weights for policy 0, policy_version 194084 (0.0024) [2025-01-04 12:15:33,968][134211] Fps is (10 sec: 13521.0, 60 sec: 15291.7, 300 sec: 14759.5). Total num frames: 794972160. Throughput: 0: 3816.5. Samples: 187906842. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:33,968][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 12:15:36,548][134294] Updated weights for policy 0, policy_version 194094 (0.0027) [2025-01-04 12:15:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15291.7, 300 sec: 14759.5). Total num frames: 795041792. Throughput: 0: 3803.9. Samples: 187927752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:38,968][134211] Avg episode reward: [(0, '9.425')] [2025-01-04 12:15:39,481][134294] Updated weights for policy 0, policy_version 194104 (0.0023) [2025-01-04 12:15:42,441][134294] Updated weights for policy 0, policy_version 194114 (0.0024) [2025-01-04 12:15:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14745.6). Total num frames: 795111424. Throughput: 0: 3796.9. Samples: 187948366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:43,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 12:15:45,389][134294] Updated weights for policy 0, policy_version 194124 (0.0024) [2025-01-04 12:15:48,118][134294] Updated weights for policy 0, policy_version 194134 (0.0022) [2025-01-04 12:15:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14745.6). Total num frames: 795181056. Throughput: 0: 3813.1. Samples: 187959280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:48,968][134211] Avg episode reward: [(0, '9.733')] [2025-01-04 12:15:51,085][134294] Updated weights for policy 0, policy_version 194144 (0.0023) [2025-01-04 12:15:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.5, 300 sec: 14745.6). Total num frames: 795250688. Throughput: 0: 3809.3. Samples: 187980450. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:53,968][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 12:15:54,043][134294] Updated weights for policy 0, policy_version 194154 (0.0022) [2025-01-04 12:15:56,991][134294] Updated weights for policy 0, policy_version 194164 (0.0023) [2025-01-04 12:15:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14703.9). Total num frames: 795320320. Throughput: 0: 3576.3. Samples: 188001332. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:15:58,968][134211] Avg episode reward: [(0, '9.753')] [2025-01-04 12:15:59,934][134294] Updated weights for policy 0, policy_version 194174 (0.0025) [2025-01-04 12:16:02,663][134294] Updated weights for policy 0, policy_version 194184 (0.0023) [2025-01-04 12:16:03,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15018.6, 300 sec: 14690.0). Total num frames: 795394048. Throughput: 0: 3470.6. Samples: 188012200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:03,969][134211] Avg episode reward: [(0, '9.667')] [2025-01-04 12:16:05,608][134294] Updated weights for policy 0, policy_version 194194 (0.0024) [2025-01-04 12:16:08,491][134294] Updated weights for policy 0, policy_version 194204 (0.0024) [2025-01-04 12:16:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 795463680. Throughput: 0: 3487.9. Samples: 188033576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:08,968][134211] Avg episode reward: [(0, '10.412')] [2025-01-04 12:16:11,308][134294] Updated weights for policy 0, policy_version 194214 (0.0027) [2025-01-04 12:16:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14267.7, 300 sec: 14703.9). Total num frames: 795537408. Throughput: 0: 3530.5. Samples: 188055192. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:13,968][134211] Avg episode reward: [(0, '10.155')] [2025-01-04 12:16:14,100][134294] Updated weights for policy 0, policy_version 194224 (0.0024) [2025-01-04 12:16:17,027][134294] Updated weights for policy 0, policy_version 194234 (0.0023) [2025-01-04 12:16:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13926.5, 300 sec: 14703.9). Total num frames: 795607040. Throughput: 0: 3532.1. Samples: 188065788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:18,968][134211] Avg episode reward: [(0, '9.277')] [2025-01-04 12:16:19,919][134294] Updated weights for policy 0, policy_version 194244 (0.0027) [2025-01-04 12:16:22,784][134294] Updated weights for policy 0, policy_version 194254 (0.0023) [2025-01-04 12:16:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13995.4, 300 sec: 14703.9). Total num frames: 795676672. Throughput: 0: 3544.4. Samples: 188087248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:23,968][134211] Avg episode reward: [(0, '10.255')] [2025-01-04 12:16:25,698][134294] Updated weights for policy 0, policy_version 194264 (0.0024) [2025-01-04 12:16:28,485][134294] Updated weights for policy 0, policy_version 194274 (0.0025) [2025-01-04 12:16:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14131.2, 300 sec: 14704.0). Total num frames: 795750400. Throughput: 0: 3563.2. Samples: 188108710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:28,968][134211] Avg episode reward: [(0, '10.722')] [2025-01-04 12:16:31,378][134294] Updated weights for policy 0, policy_version 194284 (0.0026) [2025-01-04 12:16:33,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14199.5, 300 sec: 14717.9). Total num frames: 795824128. Throughput: 0: 3557.4. Samples: 188119364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:16:33,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 12:16:34,063][134294] Updated weights for policy 0, policy_version 194294 (0.0021) [2025-01-04 12:16:36,007][134294] Updated weights for policy 0, policy_version 194304 (0.0011) [2025-01-04 12:16:37,845][134294] Updated weights for policy 0, policy_version 194314 (0.0011) [2025-01-04 12:16:38,967][134211] Fps is (10 sec: 18432.1, 60 sec: 14882.2, 300 sec: 14856.7). Total num frames: 795934720. Throughput: 0: 3702.9. Samples: 188147080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:16:38,968][134211] Avg episode reward: [(0, '9.888')] [2025-01-04 12:16:39,714][134294] Updated weights for policy 0, policy_version 194324 (0.0012) [2025-01-04 12:16:41,583][134294] Updated weights for policy 0, policy_version 194334 (0.0013) [2025-01-04 12:16:43,464][134294] Updated weights for policy 0, policy_version 194344 (0.0014) [2025-01-04 12:16:43,968][134211] Fps is (10 sec: 21708.7, 60 sec: 15496.6, 300 sec: 14898.3). Total num frames: 796041216. Throughput: 0: 3969.6. Samples: 188179964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:16:43,968][134211] Avg episode reward: [(0, '10.372')] [2025-01-04 12:16:45,369][134294] Updated weights for policy 0, policy_version 194354 (0.0015) [2025-01-04 12:16:48,130][134294] Updated weights for policy 0, policy_version 194364 (0.0022) [2025-01-04 12:16:48,968][134211] Fps is (10 sec: 18841.1, 60 sec: 15701.3, 300 sec: 14815.0). Total num frames: 796123136. Throughput: 0: 4063.8. Samples: 188195068. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:16:48,968][134211] Avg episode reward: [(0, '9.206')] [2025-01-04 12:16:51,309][134294] Updated weights for policy 0, policy_version 194374 (0.0027) [2025-01-04 12:16:53,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15633.1, 300 sec: 14787.3). Total num frames: 796188672. Throughput: 0: 4026.5. Samples: 188214766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:16:53,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 12:16:54,509][134294] Updated weights for policy 0, policy_version 194384 (0.0031) [2025-01-04 12:16:57,539][134294] Updated weights for policy 0, policy_version 194394 (0.0027) [2025-01-04 12:16:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15564.8, 300 sec: 14787.3). Total num frames: 796254208. Throughput: 0: 3989.1. Samples: 188234702. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:16:58,968][134211] Avg episode reward: [(0, '9.189')] [2025-01-04 12:17:00,491][134294] Updated weights for policy 0, policy_version 194404 (0.0026) [2025-01-04 12:17:03,484][134294] Updated weights for policy 0, policy_version 194414 (0.0023) [2025-01-04 12:17:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15496.6, 300 sec: 14801.1). Total num frames: 796323840. Throughput: 0: 3987.7. Samples: 188245234. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:03,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 12:17:06,399][134294] Updated weights for policy 0, policy_version 194424 (0.0026) [2025-01-04 12:17:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15496.6, 300 sec: 14801.1). Total num frames: 796393472. Throughput: 0: 3971.2. Samples: 188265950. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:08,968][134211] Avg episode reward: [(0, '9.067')] [2025-01-04 12:17:09,580][134294] Updated weights for policy 0, policy_version 194434 (0.0026) [2025-01-04 12:17:12,855][134294] Updated weights for policy 0, policy_version 194444 (0.0026) [2025-01-04 12:17:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15291.7, 300 sec: 14773.4). Total num frames: 796454912. Throughput: 0: 3916.3. Samples: 188284944. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:13,968][134211] Avg episode reward: [(0, '8.770')] [2025-01-04 12:17:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194447_796454912.pth... [2025-01-04 12:17:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000193572_792870912.pth [2025-01-04 12:17:15,906][134294] Updated weights for policy 0, policy_version 194454 (0.0027) [2025-01-04 12:17:18,707][134294] Updated weights for policy 0, policy_version 194464 (0.0024) [2025-01-04 12:17:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15291.7, 300 sec: 14773.4). Total num frames: 796524544. Throughput: 0: 3906.8. Samples: 188295172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:18,968][134211] Avg episode reward: [(0, '10.736')] [2025-01-04 12:17:21,562][134294] Updated weights for policy 0, policy_version 194474 (0.0025) [2025-01-04 12:17:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15291.7, 300 sec: 14773.4). Total num frames: 796594176. Throughput: 0: 3766.6. Samples: 188316578. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:23,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 12:17:24,650][134294] Updated weights for policy 0, policy_version 194484 (0.0026) [2025-01-04 12:17:27,670][134294] Updated weights for policy 0, policy_version 194494 (0.0027) [2025-01-04 12:17:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.4, 300 sec: 14759.5). Total num frames: 796663808. Throughput: 0: 3486.5. Samples: 188336856. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:28,968][134211] Avg episode reward: [(0, '10.474')] [2025-01-04 12:17:30,508][134294] Updated weights for policy 0, policy_version 194504 (0.0026) [2025-01-04 12:17:33,365][134294] Updated weights for policy 0, policy_version 194514 (0.0024) [2025-01-04 12:17:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15223.4, 300 sec: 14773.5). Total num frames: 796737536. Throughput: 0: 3393.2. Samples: 188347760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:33,968][134211] Avg episode reward: [(0, '10.058')] [2025-01-04 12:17:36,097][134294] Updated weights for policy 0, policy_version 194524 (0.0027) [2025-01-04 12:17:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14540.8, 300 sec: 14773.4). Total num frames: 796807168. Throughput: 0: 3438.0. Samples: 188369478. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:38,968][134211] Avg episode reward: [(0, '8.682')] [2025-01-04 12:17:39,018][134294] Updated weights for policy 0, policy_version 194534 (0.0024) [2025-01-04 12:17:41,893][134294] Updated weights for policy 0, policy_version 194544 (0.0025) [2025-01-04 12:17:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13926.4, 300 sec: 14759.5). Total num frames: 796876800. Throughput: 0: 3462.7. Samples: 188390522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:43,968][134211] Avg episode reward: [(0, '9.385')] [2025-01-04 12:17:44,927][134294] Updated weights for policy 0, policy_version 194554 (0.0023) [2025-01-04 12:17:47,807][134294] Updated weights for policy 0, policy_version 194564 (0.0022) [2025-01-04 12:17:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13721.6, 300 sec: 14759.5). Total num frames: 796946432. Throughput: 0: 3456.5. Samples: 188400776. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:48,968][134211] Avg episode reward: [(0, '10.171')] [2025-01-04 12:17:50,787][134294] Updated weights for policy 0, policy_version 194574 (0.0026) [2025-01-04 12:17:53,523][134294] Updated weights for policy 0, policy_version 194584 (0.0023) [2025-01-04 12:17:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13858.1, 300 sec: 14773.4). Total num frames: 797020160. Throughput: 0: 3474.4. Samples: 188422296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:53,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 12:17:56,439][134294] Updated weights for policy 0, policy_version 194594 (0.0024) [2025-01-04 12:17:58,968][134211] Fps is (10 sec: 14335.8, 60 sec: 13926.4, 300 sec: 14773.4). Total num frames: 797089792. Throughput: 0: 3521.6. Samples: 188443418. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:17:58,968][134211] Avg episode reward: [(0, '9.734')] [2025-01-04 12:17:59,461][134294] Updated weights for policy 0, policy_version 194604 (0.0025) [2025-01-04 12:18:02,392][134294] Updated weights for policy 0, policy_version 194614 (0.0024) [2025-01-04 12:18:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13926.4, 300 sec: 14773.4). Total num frames: 797159424. Throughput: 0: 3523.7. Samples: 188453740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:03,968][134211] Avg episode reward: [(0, '10.228')] [2025-01-04 12:18:05,321][134294] Updated weights for policy 0, policy_version 194624 (0.0026) [2025-01-04 12:18:08,163][134294] Updated weights for policy 0, policy_version 194634 (0.0024) [2025-01-04 12:18:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13926.4, 300 sec: 14773.4). Total num frames: 797229056. Throughput: 0: 3522.9. Samples: 188475106. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:08,968][134211] Avg episode reward: [(0, '9.603')] [2025-01-04 12:18:10,949][134294] Updated weights for policy 0, policy_version 194644 (0.0021) [2025-01-04 12:18:13,806][134294] Updated weights for policy 0, policy_version 194654 (0.0023) [2025-01-04 12:18:13,974][134211] Fps is (10 sec: 14327.4, 60 sec: 14129.8, 300 sec: 14675.9). Total num frames: 797302784. Throughput: 0: 3553.7. Samples: 188496792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:13,975][134211] Avg episode reward: [(0, '9.479')] [2025-01-04 12:18:15,998][134294] Updated weights for policy 0, policy_version 194664 (0.0015) [2025-01-04 12:18:18,304][134294] Updated weights for policy 0, policy_version 194674 (0.0020) [2025-01-04 12:18:18,969][134211] Fps is (10 sec: 16382.3, 60 sec: 14472.3, 300 sec: 14634.5). Total num frames: 797392896. Throughput: 0: 3609.5. Samples: 188510192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:18,969][134211] Avg episode reward: [(0, '9.776')] [2025-01-04 12:18:21,238][134294] Updated weights for policy 0, policy_version 194684 (0.0026) [2025-01-04 12:18:23,968][134211] Fps is (10 sec: 15983.9, 60 sec: 14472.6, 300 sec: 14648.4). Total num frames: 797462528. Throughput: 0: 3628.4. Samples: 188532758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:23,968][134211] Avg episode reward: [(0, '11.248')] [2025-01-04 12:18:24,317][134294] Updated weights for policy 0, policy_version 194694 (0.0023) [2025-01-04 12:18:27,188][134294] Updated weights for policy 0, policy_version 194704 (0.0027) [2025-01-04 12:18:28,968][134211] Fps is (10 sec: 13518.2, 60 sec: 14404.3, 300 sec: 14634.6). Total num frames: 797528064. Throughput: 0: 3620.8. Samples: 188553458. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:28,968][134211] Avg episode reward: [(0, '8.727')] [2025-01-04 12:18:30,138][134294] Updated weights for policy 0, policy_version 194714 (0.0024) [2025-01-04 12:18:32,990][134294] Updated weights for policy 0, policy_version 194724 (0.0026) [2025-01-04 12:18:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 797601792. Throughput: 0: 3630.5. Samples: 188564148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:33,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 12:18:35,898][134294] Updated weights for policy 0, policy_version 194734 (0.0021) [2025-01-04 12:18:37,775][134294] Updated weights for policy 0, policy_version 194744 (0.0014) [2025-01-04 12:18:38,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14745.6, 300 sec: 14731.7). Total num frames: 797691904. Throughput: 0: 3677.6. Samples: 188587788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:38,968][134211] Avg episode reward: [(0, '10.387')] [2025-01-04 12:18:40,217][134294] Updated weights for policy 0, policy_version 194754 (0.0020) [2025-01-04 12:18:43,064][134294] Updated weights for policy 0, policy_version 194764 (0.0023) [2025-01-04 12:18:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 797761536. Throughput: 0: 3741.2. Samples: 188611770. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:43,968][134211] Avg episode reward: [(0, '10.715')] [2025-01-04 12:18:46,007][134294] Updated weights for policy 0, policy_version 194774 (0.0024) [2025-01-04 12:18:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 797831168. Throughput: 0: 3746.8. Samples: 188622348. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:18:48,968][134211] Avg episode reward: [(0, '9.983')] [2025-01-04 12:18:49,001][134294] Updated weights for policy 0, policy_version 194784 (0.0028) [2025-01-04 12:18:51,945][134294] Updated weights for policy 0, policy_version 194794 (0.0028) [2025-01-04 12:18:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14717.8). Total num frames: 797900800. Throughput: 0: 3733.2. Samples: 188643100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:18:53,968][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 12:18:54,908][134294] Updated weights for policy 0, policy_version 194804 (0.0026) [2025-01-04 12:18:57,696][134294] Updated weights for policy 0, policy_version 194814 (0.0022) [2025-01-04 12:18:58,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14882.2, 300 sec: 14759.5). Total num frames: 797982720. Throughput: 0: 3742.9. Samples: 188665198. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:18:58,968][134211] Avg episode reward: [(0, '8.146')] [2025-01-04 12:18:59,662][134294] Updated weights for policy 0, policy_version 194824 (0.0012) [2025-01-04 12:19:01,488][134294] Updated weights for policy 0, policy_version 194834 (0.0016) [2025-01-04 12:19:03,365][134294] Updated weights for policy 0, policy_version 194844 (0.0012) [2025-01-04 12:19:03,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15564.8, 300 sec: 14884.5). Total num frames: 798093312. Throughput: 0: 3806.7. Samples: 188681488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:03,968][134211] Avg episode reward: [(0, '9.875')] [2025-01-04 12:19:05,268][134294] Updated weights for policy 0, policy_version 194854 (0.0012) [2025-01-04 12:19:07,987][134294] Updated weights for policy 0, policy_version 194864 (0.0024) [2025-01-04 12:19:08,968][134211] Fps is (10 sec: 18841.3, 60 sec: 15701.3, 300 sec: 14912.3). Total num frames: 798171136. Throughput: 0: 3963.8. Samples: 188711128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:08,968][134211] Avg episode reward: [(0, '9.286')] [2025-01-04 12:19:11,178][134294] Updated weights for policy 0, policy_version 194874 (0.0028) [2025-01-04 12:19:13,968][134211] Fps is (10 sec: 14335.6, 60 sec: 15566.3, 300 sec: 14898.3). Total num frames: 798236672. Throughput: 0: 3935.8. Samples: 188730570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:13,969][134211] Avg episode reward: [(0, '8.640')] [2025-01-04 12:19:14,026][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194883_798240768.pth... [2025-01-04 12:19:14,101][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194015_794685440.pth [2025-01-04 12:19:14,348][134294] Updated weights for policy 0, policy_version 194884 (0.0030) [2025-01-04 12:19:17,463][134294] Updated weights for policy 0, policy_version 194894 (0.0024) [2025-01-04 12:19:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.5, 300 sec: 14884.4). Total num frames: 798302208. Throughput: 0: 3914.1. Samples: 188740282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:18,968][134211] Avg episode reward: [(0, '10.793')] [2025-01-04 12:19:20,454][134294] Updated weights for policy 0, policy_version 194904 (0.0027) [2025-01-04 12:19:23,266][134294] Updated weights for policy 0, policy_version 194914 (0.0024) [2025-01-04 12:19:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15223.4, 300 sec: 14884.4). Total num frames: 798375936. Throughput: 0: 3854.0. Samples: 188761218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:23,969][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 12:19:26,190][134294] Updated weights for policy 0, policy_version 194924 (0.0025) [2025-01-04 12:19:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15291.7, 300 sec: 14884.5). Total num frames: 798445568. 
Throughput: 0: 3787.0. Samples: 188782186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:28,968][134211] Avg episode reward: [(0, '8.623')] [2025-01-04 12:19:29,176][134294] Updated weights for policy 0, policy_version 194934 (0.0027) [2025-01-04 12:19:32,134][134294] Updated weights for policy 0, policy_version 194944 (0.0026) [2025-01-04 12:19:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15223.5, 300 sec: 14884.4). Total num frames: 798515200. Throughput: 0: 3780.0. Samples: 188792448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:33,968][134211] Avg episode reward: [(0, '9.195')] [2025-01-04 12:19:34,959][134294] Updated weights for policy 0, policy_version 194954 (0.0024) [2025-01-04 12:19:37,877][134294] Updated weights for policy 0, policy_version 194964 (0.0024) [2025-01-04 12:19:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 798584832. Throughput: 0: 3798.2. Samples: 188814018. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:38,968][134211] Avg episode reward: [(0, '10.371')] [2025-01-04 12:19:40,776][134294] Updated weights for policy 0, policy_version 194974 (0.0023) [2025-01-04 12:19:43,574][134294] Updated weights for policy 0, policy_version 194984 (0.0024) [2025-01-04 12:19:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 798658560. Throughput: 0: 3783.4. Samples: 188835450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:43,968][134211] Avg episode reward: [(0, '9.752')] [2025-01-04 12:19:46,440][134294] Updated weights for policy 0, policy_version 194994 (0.0025) [2025-01-04 12:19:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 798728192. Throughput: 0: 3657.3. Samples: 188846068. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:48,968][134211] Avg episode reward: [(0, '9.374')] [2025-01-04 12:19:49,446][134294] Updated weights for policy 0, policy_version 195004 (0.0023) [2025-01-04 12:19:52,363][134294] Updated weights for policy 0, policy_version 195014 (0.0025) [2025-01-04 12:19:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 798797824. Throughput: 0: 3460.7. Samples: 188866858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:53,968][134211] Avg episode reward: [(0, '9.376')] [2025-01-04 12:19:55,284][134294] Updated weights for policy 0, policy_version 195024 (0.0024) [2025-01-04 12:19:58,130][134294] Updated weights for policy 0, policy_version 195034 (0.0027) [2025-01-04 12:19:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 798867456. Throughput: 0: 3502.8. Samples: 188888196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:19:58,968][134211] Avg episode reward: [(0, '8.681')] [2025-01-04 12:20:00,978][134294] Updated weights for policy 0, policy_version 195044 (0.0022) [2025-01-04 12:20:03,880][134294] Updated weights for policy 0, policy_version 195054 (0.0025) [2025-01-04 12:20:03,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14131.2, 300 sec: 14801.1). Total num frames: 798941184. Throughput: 0: 3526.3. Samples: 188898968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:03,968][134211] Avg episode reward: [(0, '10.481')] [2025-01-04 12:20:06,744][134294] Updated weights for policy 0, policy_version 195064 (0.0024) [2025-01-04 12:20:08,971][134211] Fps is (10 sec: 14331.6, 60 sec: 13993.9, 300 sec: 14676.0). Total num frames: 799010816. Throughput: 0: 3531.3. Samples: 188920138. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:08,971][134211] Avg episode reward: [(0, '8.661')] [2025-01-04 12:20:09,794][134294] Updated weights for policy 0, policy_version 195074 (0.0023) [2025-01-04 12:20:12,409][134294] Updated weights for policy 0, policy_version 195084 (0.0020) [2025-01-04 12:20:13,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14336.1, 300 sec: 14662.3). Total num frames: 799096832. Throughput: 0: 3583.8. Samples: 188943458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:13,968][134211] Avg episode reward: [(0, '10.302')] [2025-01-04 12:20:14,311][134294] Updated weights for policy 0, policy_version 195094 (0.0013) [2025-01-04 12:20:16,118][134294] Updated weights for policy 0, policy_version 195104 (0.0012) [2025-01-04 12:20:17,990][134294] Updated weights for policy 0, policy_version 195114 (0.0015) [2025-01-04 12:20:18,968][134211] Fps is (10 sec: 19257.1, 60 sec: 15018.6, 300 sec: 14801.3). Total num frames: 799203328. Throughput: 0: 3719.8. Samples: 188959838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:18,968][134211] Avg episode reward: [(0, '8.214')] [2025-01-04 12:20:19,909][134294] Updated weights for policy 0, policy_version 195124 (0.0014) [2025-01-04 12:20:21,766][134294] Updated weights for policy 0, policy_version 195134 (0.0015) [2025-01-04 12:20:23,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15428.3, 300 sec: 14912.2). Total num frames: 799301632. Throughput: 0: 3964.1. Samples: 188992402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:23,968][134211] Avg episode reward: [(0, '10.418')] [2025-01-04 12:20:24,480][134294] Updated weights for policy 0, policy_version 195144 (0.0023) [2025-01-04 12:20:27,622][134294] Updated weights for policy 0, policy_version 195154 (0.0027) [2025-01-04 12:20:28,968][134211] Fps is (10 sec: 16383.9, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 799367168. Throughput: 0: 3938.0. Samples: 189012662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:28,968][134211] Avg episode reward: [(0, '10.650')] [2025-01-04 12:20:30,682][134294] Updated weights for policy 0, policy_version 195164 (0.0026) [2025-01-04 12:20:33,616][134294] Updated weights for policy 0, policy_version 195174 (0.0024) [2025-01-04 12:20:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15360.1, 300 sec: 14898.3). Total num frames: 799436800. Throughput: 0: 3929.0. Samples: 189022874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:33,968][134211] Avg episode reward: [(0, '8.451')] [2025-01-04 12:20:36,653][134294] Updated weights for policy 0, policy_version 195184 (0.0025) [2025-01-04 12:20:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15291.7, 300 sec: 14884.5). Total num frames: 799502336. Throughput: 0: 3924.6. Samples: 189043464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:38,970][134211] Avg episode reward: [(0, '9.880')] [2025-01-04 12:20:39,742][134294] Updated weights for policy 0, policy_version 195194 (0.0027) [2025-01-04 12:20:42,723][134294] Updated weights for policy 0, policy_version 195204 (0.0025) [2025-01-04 12:20:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.5, 300 sec: 14884.5). Total num frames: 799571968. Throughput: 0: 3899.7. Samples: 189063682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:43,968][134211] Avg episode reward: [(0, '8.809')] [2025-01-04 12:20:45,639][134294] Updated weights for policy 0, policy_version 195214 (0.0026) [2025-01-04 12:20:48,491][134294] Updated weights for policy 0, policy_version 195224 (0.0022) [2025-01-04 12:20:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15223.5, 300 sec: 14884.4). Total num frames: 799641600. Throughput: 0: 3899.5. Samples: 189074446. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:48,968][134211] Avg episode reward: [(0, '10.295')] [2025-01-04 12:20:51,322][134294] Updated weights for policy 0, policy_version 195234 (0.0026) [2025-01-04 12:20:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15223.4, 300 sec: 14884.4). Total num frames: 799711232. Throughput: 0: 3902.3. Samples: 189095728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:53,968][134211] Avg episode reward: [(0, '9.937')] [2025-01-04 12:20:54,272][134294] Updated weights for policy 0, policy_version 195244 (0.0025) [2025-01-04 12:20:57,234][134294] Updated weights for policy 0, policy_version 195254 (0.0024) [2025-01-04 12:20:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15223.5, 300 sec: 14870.6). Total num frames: 799780864. Throughput: 0: 3841.2. Samples: 189116314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:20:58,968][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 12:21:00,166][134294] Updated weights for policy 0, policy_version 195264 (0.0026) [2025-01-04 12:21:03,101][134294] Updated weights for policy 0, policy_version 195274 (0.0023) [2025-01-04 12:21:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.2, 300 sec: 14870.6). Total num frames: 799850496. Throughput: 0: 3718.1. Samples: 189127154. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:03,968][134211] Avg episode reward: [(0, '10.344')] [2025-01-04 12:21:05,924][134294] Updated weights for policy 0, policy_version 195284 (0.0023) [2025-01-04 12:21:08,692][134294] Updated weights for policy 0, policy_version 195294 (0.0023) [2025-01-04 12:21:08,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15224.2, 300 sec: 14870.6). Total num frames: 799924224. Throughput: 0: 3470.5. Samples: 189148574. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:08,968][134211] Avg episode reward: [(0, '9.739')] [2025-01-04 12:21:11,639][134294] Updated weights for policy 0, policy_version 195304 (0.0025) [2025-01-04 12:21:13,968][134211] Fps is (10 sec: 14745.5, 60 sec: 15018.6, 300 sec: 14884.4). Total num frames: 799997952. Throughput: 0: 3493.3. Samples: 189169862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:13,968][134211] Avg episode reward: [(0, '7.935')] [2025-01-04 12:21:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000195312_799997952.pth... [2025-01-04 12:21:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194447_796454912.pth [2025-01-04 12:21:14,712][134294] Updated weights for policy 0, policy_version 195314 (0.0025) [2025-01-04 12:21:17,569][134294] Updated weights for policy 0, policy_version 195324 (0.0025) [2025-01-04 12:21:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14870.6). Total num frames: 800063488. Throughput: 0: 3492.4. Samples: 189180032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:18,968][134211] Avg episode reward: [(0, '9.279')] [2025-01-04 12:21:20,457][134294] Updated weights for policy 0, policy_version 195334 (0.0022) [2025-01-04 12:21:23,208][134294] Updated weights for policy 0, policy_version 195344 (0.0024) [2025-01-04 12:21:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.4, 300 sec: 14870.6). Total num frames: 800137216. Throughput: 0: 3514.7. Samples: 189201624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:23,968][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 12:21:26,223][134294] Updated weights for policy 0, policy_version 195354 (0.0022) [2025-01-04 12:21:28,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.7, 300 sec: 14856.7). Total num frames: 800206848. 
Throughput: 0: 3535.1. Samples: 189222762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:28,968][134211] Avg episode reward: [(0, '10.573')] [2025-01-04 12:21:29,103][134294] Updated weights for policy 0, policy_version 195364 (0.0025) [2025-01-04 12:21:32,083][134294] Updated weights for policy 0, policy_version 195374 (0.0023) [2025-01-04 12:21:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14717.8). Total num frames: 800276480. Throughput: 0: 3522.3. Samples: 189232950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:33,968][134211] Avg episode reward: [(0, '10.400')] [2025-01-04 12:21:34,984][134294] Updated weights for policy 0, policy_version 195384 (0.0023) [2025-01-04 12:21:37,839][134294] Updated weights for policy 0, policy_version 195394 (0.0024) [2025-01-04 12:21:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14062.9, 300 sec: 14592.9). Total num frames: 800346112. Throughput: 0: 3527.5. Samples: 189254464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:38,968][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 12:21:40,402][134294] Updated weights for policy 0, policy_version 195404 (0.0019) [2025-01-04 12:21:42,379][134294] Updated weights for policy 0, policy_version 195414 (0.0015) [2025-01-04 12:21:43,968][134211] Fps is (10 sec: 15973.1, 60 sec: 14404.1, 300 sec: 14620.6). Total num frames: 800436224. Throughput: 0: 3642.3. Samples: 189280222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:43,969][134211] Avg episode reward: [(0, '9.669')] [2025-01-04 12:21:45,119][134294] Updated weights for policy 0, policy_version 195424 (0.0025) [2025-01-04 12:21:47,919][134294] Updated weights for policy 0, policy_version 195434 (0.0024) [2025-01-04 12:21:48,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 800509952. Throughput: 0: 3646.4. Samples: 189291240. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:48,968][134211] Avg episode reward: [(0, '9.635')] [2025-01-04 12:21:50,875][134294] Updated weights for policy 0, policy_version 195444 (0.0021) [2025-01-04 12:21:53,708][134294] Updated weights for policy 0, policy_version 195454 (0.0024) [2025-01-04 12:21:53,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14472.4, 300 sec: 14662.3). Total num frames: 800579584. Throughput: 0: 3643.7. Samples: 189312544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:53,969][134211] Avg episode reward: [(0, '9.762')] [2025-01-04 12:21:56,630][134294] Updated weights for policy 0, policy_version 195464 (0.0026) [2025-01-04 12:21:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 800649216. Throughput: 0: 3638.7. Samples: 189333604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:21:58,968][134211] Avg episode reward: [(0, '9.696')] [2025-01-04 12:21:59,609][134294] Updated weights for policy 0, policy_version 195474 (0.0023) [2025-01-04 12:22:02,621][134294] Updated weights for policy 0, policy_version 195484 (0.0021) [2025-01-04 12:22:03,968][134211] Fps is (10 sec: 14746.5, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 800727040. Throughput: 0: 3645.1. Samples: 189344060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:22:03,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 12:22:04,636][134294] Updated weights for policy 0, policy_version 195494 (0.0014) [2025-01-04 12:22:06,537][134294] Updated weights for policy 0, policy_version 195504 (0.0012) [2025-01-04 12:22:08,410][134294] Updated weights for policy 0, policy_version 195514 (0.0013) [2025-01-04 12:22:08,968][134211] Fps is (10 sec: 18432.2, 60 sec: 15155.2, 300 sec: 14842.8). Total num frames: 800833536. Throughput: 0: 3807.4. Samples: 189372956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:22:08,968][134211] Avg episode reward: [(0, '9.108')] [2025-01-04 12:22:10,283][134294] Updated weights for policy 0, policy_version 195524 (0.0014) [2025-01-04 12:22:12,664][134294] Updated weights for policy 0, policy_version 195534 (0.0020) [2025-01-04 12:22:13,969][134211] Fps is (10 sec: 19249.3, 60 sec: 15359.8, 300 sec: 14898.3). Total num frames: 800919552. Throughput: 0: 3974.1. Samples: 189401602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:13,969][134211] Avg episode reward: [(0, '8.738')] [2025-01-04 12:22:15,848][134294] Updated weights for policy 0, policy_version 195544 (0.0028) [2025-01-04 12:22:18,889][134294] Updated weights for policy 0, policy_version 195554 (0.0027) [2025-01-04 12:22:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15428.2, 300 sec: 14898.3). Total num frames: 800989184. Throughput: 0: 3966.6. Samples: 189411448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:18,968][134211] Avg episode reward: [(0, '8.696')] [2025-01-04 12:22:21,801][134294] Updated weights for policy 0, policy_version 195564 (0.0025) [2025-01-04 12:22:23,968][134211] Fps is (10 sec: 13927.5, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 801058816. Throughput: 0: 3944.7. Samples: 189431976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:23,968][134211] Avg episode reward: [(0, '9.768')] [2025-01-04 12:22:24,921][134294] Updated weights for policy 0, policy_version 195574 (0.0025) [2025-01-04 12:22:27,926][134294] Updated weights for policy 0, policy_version 195584 (0.0024) [2025-01-04 12:22:28,968][134211] Fps is (10 sec: 13516.0, 60 sec: 15291.6, 300 sec: 14870.5). Total num frames: 801124352. Throughput: 0: 3817.8. Samples: 189452024. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:28,969][134211] Avg episode reward: [(0, '9.604')] [2025-01-04 12:22:30,892][134294] Updated weights for policy 0, policy_version 195594 (0.0025) [2025-01-04 12:22:33,659][134294] Updated weights for policy 0, policy_version 195604 (0.0021) [2025-01-04 12:22:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15360.0, 300 sec: 14884.4). Total num frames: 801198080. Throughput: 0: 3810.5. Samples: 189462714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:33,968][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 12:22:36,591][134294] Updated weights for policy 0, policy_version 195614 (0.0024) [2025-01-04 12:22:38,968][134211] Fps is (10 sec: 13927.3, 60 sec: 15291.7, 300 sec: 14870.6). Total num frames: 801263616. Throughput: 0: 3809.5. Samples: 189483968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:38,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 12:22:39,682][134294] Updated weights for policy 0, policy_version 195624 (0.0024) [2025-01-04 12:22:42,447][134294] Updated weights for policy 0, policy_version 195634 (0.0025) [2025-01-04 12:22:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.6, 300 sec: 14870.6). Total num frames: 801333248. Throughput: 0: 3802.2. Samples: 189504704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:43,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 12:22:45,408][134294] Updated weights for policy 0, policy_version 195644 (0.0025) [2025-01-04 12:22:48,240][134294] Updated weights for policy 0, policy_version 195654 (0.0026) [2025-01-04 12:22:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14950.4, 300 sec: 14870.6). Total num frames: 801406976. Throughput: 0: 3809.1. Samples: 189515472. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:48,968][134211] Avg episode reward: [(0, '9.893')] [2025-01-04 12:22:51,072][134294] Updated weights for policy 0, policy_version 195664 (0.0028) [2025-01-04 12:22:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14950.5, 300 sec: 14870.6). Total num frames: 801476608. Throughput: 0: 3642.4. Samples: 189536864. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:53,968][134211] Avg episode reward: [(0, '8.617')] [2025-01-04 12:22:54,143][134294] Updated weights for policy 0, policy_version 195674 (0.0022) [2025-01-04 12:22:56,962][134294] Updated weights for policy 0, policy_version 195684 (0.0027) [2025-01-04 12:22:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14870.6). Total num frames: 801546240. Throughput: 0: 3468.9. Samples: 189557700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:22:58,968][134211] Avg episode reward: [(0, '10.516')] [2025-01-04 12:22:59,907][134294] Updated weights for policy 0, policy_version 195694 (0.0025) [2025-01-04 12:23:02,810][134294] Updated weights for policy 0, policy_version 195704 (0.0023) [2025-01-04 12:23:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.8, 300 sec: 14870.6). Total num frames: 801615872. Throughput: 0: 3491.0. Samples: 189568542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:23:03,968][134211] Avg episode reward: [(0, '8.627')] [2025-01-04 12:23:05,652][134294] Updated weights for policy 0, policy_version 195714 (0.0023) [2025-01-04 12:23:08,469][134294] Updated weights for policy 0, policy_version 195724 (0.0024) [2025-01-04 12:23:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14267.7, 300 sec: 14870.9). Total num frames: 801689600. Throughput: 0: 3510.6. Samples: 189589952. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:23:08,968][134211] Avg episode reward: [(0, '9.339')] [2025-01-04 12:23:11,366][134294] Updated weights for policy 0, policy_version 195734 (0.0026) [2025-01-04 12:23:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.8, 300 sec: 14801.2). Total num frames: 801759232. Throughput: 0: 3534.6. Samples: 189611078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:23:13,968][134211] Avg episode reward: [(0, '9.788')] [2025-01-04 12:23:14,021][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000195743_801763328.pth... [2025-01-04 12:23:14,091][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000194883_798240768.pth [2025-01-04 12:23:14,415][134294] Updated weights for policy 0, policy_version 195744 (0.0026) [2025-01-04 12:23:16,359][134294] Updated weights for policy 0, policy_version 195754 (0.0015) [2025-01-04 12:23:18,909][134294] Updated weights for policy 0, policy_version 195764 (0.0023) [2025-01-04 12:23:18,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14335.9, 300 sec: 14870.5). Total num frames: 801849344. Throughput: 0: 3596.1. Samples: 189624540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:18,969][134211] Avg episode reward: [(0, '10.333')] [2025-01-04 12:23:21,873][134294] Updated weights for policy 0, policy_version 195774 (0.0025) [2025-01-04 12:23:23,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14267.7, 300 sec: 14870.6). Total num frames: 801914880. Throughput: 0: 3617.1. Samples: 189646736. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:23,968][134211] Avg episode reward: [(0, '9.592')] [2025-01-04 12:23:24,976][134294] Updated weights for policy 0, policy_version 195784 (0.0025) [2025-01-04 12:23:27,969][134294] Updated weights for policy 0, policy_version 195794 (0.0027) [2025-01-04 12:23:28,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14336.2, 300 sec: 14856.7). Total num frames: 801984512. Throughput: 0: 3611.0. Samples: 189667200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:28,968][134211] Avg episode reward: [(0, '10.624')] [2025-01-04 12:23:30,433][134294] Updated weights for policy 0, policy_version 195804 (0.0022) [2025-01-04 12:23:32,425][134294] Updated weights for policy 0, policy_version 195814 (0.0015) [2025-01-04 12:23:33,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14609.1, 300 sec: 14856.7). Total num frames: 802074624. Throughput: 0: 3670.0. Samples: 189680622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:33,968][134211] Avg episode reward: [(0, '8.889')] [2025-01-04 12:23:35,212][134294] Updated weights for policy 0, policy_version 195824 (0.0025) [2025-01-04 12:23:38,023][134294] Updated weights for policy 0, policy_version 195834 (0.0023) [2025-01-04 12:23:38,968][134211] Fps is (10 sec: 16383.4, 60 sec: 14745.5, 300 sec: 14870.5). Total num frames: 802148352. Throughput: 0: 3718.2. Samples: 189704186. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:38,969][134211] Avg episode reward: [(0, '8.509')] [2025-01-04 12:23:40,977][134294] Updated weights for policy 0, policy_version 195844 (0.0024) [2025-01-04 12:23:43,854][134294] Updated weights for policy 0, policy_version 195854 (0.0025) [2025-01-04 12:23:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14870.6). Total num frames: 802217984. Throughput: 0: 3729.3. Samples: 189725518. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:43,968][134211] Avg episode reward: [(0, '10.281')] [2025-01-04 12:23:46,711][134294] Updated weights for policy 0, policy_version 195864 (0.0027) [2025-01-04 12:23:48,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14677.4, 300 sec: 14870.6). Total num frames: 802287616. Throughput: 0: 3721.4. Samples: 189736002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:48,968][134211] Avg episode reward: [(0, '9.851')] [2025-01-04 12:23:49,758][134294] Updated weights for policy 0, policy_version 195874 (0.0023) [2025-01-04 12:23:52,679][134294] Updated weights for policy 0, policy_version 195884 (0.0024) [2025-01-04 12:23:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 802357248. Throughput: 0: 3707.4. Samples: 189756786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:53,968][134211] Avg episode reward: [(0, '9.537')] [2025-01-04 12:23:55,577][134294] Updated weights for policy 0, policy_version 195894 (0.0024) [2025-01-04 12:23:58,283][134294] Updated weights for policy 0, policy_version 195904 (0.0024) [2025-01-04 12:23:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14690.1). Total num frames: 802426880. Throughput: 0: 3717.0. Samples: 189778344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:23:58,968][134211] Avg episode reward: [(0, '10.216')] [2025-01-04 12:24:01,234][134294] Updated weights for policy 0, policy_version 195914 (0.0025) [2025-01-04 12:24:03,156][134294] Updated weights for policy 0, policy_version 195924 (0.0014) [2025-01-04 12:24:03,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15087.0, 300 sec: 14745.6). Total num frames: 802521088. Throughput: 0: 3655.4. Samples: 189789032. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:24:03,968][134211] Avg episode reward: [(0, '9.401')] [2025-01-04 12:24:05,089][134294] Updated weights for policy 0, policy_version 195934 (0.0015) [2025-01-04 12:24:07,460][134294] Updated weights for policy 0, policy_version 195944 (0.0020) [2025-01-04 12:24:08,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15223.5, 300 sec: 14801.1). Total num frames: 802603008. Throughput: 0: 3823.7. Samples: 189818804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:24:08,968][134211] Avg episode reward: [(0, '10.033')] [2025-01-04 12:24:10,530][134294] Updated weights for policy 0, policy_version 195954 (0.0026) [2025-01-04 12:24:13,424][134294] Updated weights for policy 0, policy_version 195964 (0.0024) [2025-01-04 12:24:13,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15223.5, 300 sec: 14815.0). Total num frames: 802672640. Throughput: 0: 3829.1. Samples: 189839510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:24:13,968][134211] Avg episode reward: [(0, '10.463')] [2025-01-04 12:24:16,452][134294] Updated weights for policy 0, policy_version 195974 (0.0025) [2025-01-04 12:24:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.2, 300 sec: 14801.2). Total num frames: 802742272. Throughput: 0: 3762.7. Samples: 189849942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:24:18,968][134211] Avg episode reward: [(0, '11.145')] [2025-01-04 12:24:19,349][134294] Updated weights for policy 0, policy_version 195984 (0.0024) [2025-01-04 12:24:22,355][134294] Updated weights for policy 0, policy_version 195994 (0.0023) [2025-01-04 12:24:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 802811904. Throughput: 0: 3696.7. Samples: 189870536. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:23,968][134211] Avg episode reward: [(0, '8.862')] [2025-01-04 12:24:25,267][134294] Updated weights for policy 0, policy_version 196004 (0.0026) [2025-01-04 12:24:28,073][134294] Updated weights for policy 0, policy_version 196014 (0.0022) [2025-01-04 12:24:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 802885632. Throughput: 0: 3696.4. Samples: 189891856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:28,968][134211] Avg episode reward: [(0, '10.457')] [2025-01-04 12:24:30,989][134294] Updated weights for policy 0, policy_version 196024 (0.0023) [2025-01-04 12:24:33,879][134294] Updated weights for policy 0, policy_version 196034 (0.0027) [2025-01-04 12:24:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 802955264. Throughput: 0: 3701.0. Samples: 189902550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:33,968][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 12:24:36,850][134294] Updated weights for policy 0, policy_version 196044 (0.0026) [2025-01-04 12:24:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14801.1). Total num frames: 803024896. Throughput: 0: 3702.8. Samples: 189923414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:38,968][134211] Avg episode reward: [(0, '9.274')] [2025-01-04 12:24:39,831][134294] Updated weights for policy 0, policy_version 196054 (0.0024) [2025-01-04 12:24:42,656][134294] Updated weights for policy 0, policy_version 196064 (0.0026) [2025-01-04 12:24:43,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14608.9, 300 sec: 14801.1). Total num frames: 803094528. Throughput: 0: 3688.2. Samples: 189944316. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:43,969][134211] Avg episode reward: [(0, '9.936')] [2025-01-04 12:24:45,163][134294] Updated weights for policy 0, policy_version 196074 (0.0021) [2025-01-04 12:24:47,237][134294] Updated weights for policy 0, policy_version 196084 (0.0016) [2025-01-04 12:24:48,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 803180544. Throughput: 0: 3765.2. Samples: 189958468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:48,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 12:24:50,180][134294] Updated weights for policy 0, policy_version 196094 (0.0024) [2025-01-04 12:24:53,076][134294] Updated weights for policy 0, policy_version 196104 (0.0024) [2025-01-04 12:24:53,968][134211] Fps is (10 sec: 15565.6, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 803250176. Throughput: 0: 3598.0. Samples: 189980714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:53,968][134211] Avg episode reward: [(0, '10.292')] [2025-01-04 12:24:55,941][134294] Updated weights for policy 0, policy_version 196114 (0.0025) [2025-01-04 12:24:58,796][134294] Updated weights for policy 0, policy_version 196124 (0.0023) [2025-01-04 12:24:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 803323904. Throughput: 0: 3613.1. Samples: 190002100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:24:58,968][134211] Avg episode reward: [(0, '8.530')] [2025-01-04 12:25:01,693][134294] Updated weights for policy 0, policy_version 196134 (0.0027) [2025-01-04 12:25:03,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14856.8). Total num frames: 803393536. Throughput: 0: 3612.4. Samples: 190012502. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:03,968][134211] Avg episode reward: [(0, '8.788')] [2025-01-04 12:25:04,738][134294] Updated weights for policy 0, policy_version 196144 (0.0025) [2025-01-04 12:25:06,985][134294] Updated weights for policy 0, policy_version 196154 (0.0016) [2025-01-04 12:25:08,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14609.1, 300 sec: 14856.7). Total num frames: 803479552. Throughput: 0: 3676.3. Samples: 190035968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:08,968][134211] Avg episode reward: [(0, '9.026')] [2025-01-04 12:25:09,390][134294] Updated weights for policy 0, policy_version 196164 (0.0018) [2025-01-04 12:25:12,176][134294] Updated weights for policy 0, policy_version 196174 (0.0025) [2025-01-04 12:25:13,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.3, 300 sec: 14745.6). Total num frames: 803553280. Throughput: 0: 3705.1. Samples: 190058588. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:13,968][134211] Avg episode reward: [(0, '7.947')] [2025-01-04 12:25:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000196180_803553280.pth... [2025-01-04 12:25:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000195312_799997952.pth [2025-01-04 12:25:15,218][134294] Updated weights for policy 0, policy_version 196184 (0.0024) [2025-01-04 12:25:18,018][134294] Updated weights for policy 0, policy_version 196194 (0.0026) [2025-01-04 12:25:18,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14677.3, 300 sec: 14648.4). Total num frames: 803622912. Throughput: 0: 3701.7. Samples: 190069128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:18,968][134211] Avg episode reward: [(0, '9.445')] [2025-01-04 12:25:20,984][134294] Updated weights for policy 0, policy_version 196204 (0.0021) [2025-01-04 12:25:23,792][134294] Updated weights for policy 0, policy_version 196214 (0.0024) [2025-01-04 12:25:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 803692544. Throughput: 0: 3713.6. Samples: 190090524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:23,968][134211] Avg episode reward: [(0, '8.921')] [2025-01-04 12:25:26,758][134294] Updated weights for policy 0, policy_version 196224 (0.0023) [2025-01-04 12:25:28,970][134211] Fps is (10 sec: 13923.9, 60 sec: 14608.6, 300 sec: 14662.2). Total num frames: 803762176. Throughput: 0: 3719.6. Samples: 190111704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:25:28,970][134211] Avg episode reward: [(0, '10.157')] [2025-01-04 12:25:29,457][134294] Updated weights for policy 0, policy_version 196234 (0.0021) [2025-01-04 12:25:31,467][134294] Updated weights for policy 0, policy_version 196244 (0.0013) [2025-01-04 12:25:33,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14882.2, 300 sec: 14731.7). Total num frames: 803848192. Throughput: 0: 3718.4. Samples: 190125796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:33,968][134211] Avg episode reward: [(0, '9.773')] [2025-01-04 12:25:34,244][134294] Updated weights for policy 0, policy_version 196254 (0.0026) [2025-01-04 12:25:37,167][134294] Updated weights for policy 0, policy_version 196264 (0.0024) [2025-01-04 12:25:38,968][134211] Fps is (10 sec: 15977.5, 60 sec: 14950.4, 300 sec: 14745.6). Total num frames: 803921920. Throughput: 0: 3706.1. Samples: 190147488. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:38,968][134211] Avg episode reward: [(0, '9.640')] [2025-01-04 12:25:40,160][134294] Updated weights for policy 0, policy_version 196274 (0.0025) [2025-01-04 12:25:42,990][134294] Updated weights for policy 0, policy_version 196284 (0.0024) [2025-01-04 12:25:43,969][134211] Fps is (10 sec: 14334.7, 60 sec: 14950.3, 300 sec: 14745.6). Total num frames: 803991552. Throughput: 0: 3701.8. Samples: 190168684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:43,969][134211] Avg episode reward: [(0, '10.087')] [2025-01-04 12:25:45,851][134294] Updated weights for policy 0, policy_version 196294 (0.0024) [2025-01-04 12:25:48,652][134294] Updated weights for policy 0, policy_version 196304 (0.0024) [2025-01-04 12:25:48,969][134211] Fps is (10 sec: 14333.8, 60 sec: 14745.3, 300 sec: 14759.4). Total num frames: 804065280. Throughput: 0: 3711.4. Samples: 190179518. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:48,970][134211] Avg episode reward: [(0, '10.462')] [2025-01-04 12:25:51,591][134294] Updated weights for policy 0, policy_version 196314 (0.0024) [2025-01-04 12:25:53,969][134211] Fps is (10 sec: 14336.1, 60 sec: 14745.4, 300 sec: 14759.4). Total num frames: 804134912. Throughput: 0: 3664.4. Samples: 190200870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:53,969][134211] Avg episode reward: [(0, '9.306')] [2025-01-04 12:25:54,512][134294] Updated weights for policy 0, policy_version 196324 (0.0023) [2025-01-04 12:25:57,438][134294] Updated weights for policy 0, policy_version 196334 (0.0027) [2025-01-04 12:25:58,968][134211] Fps is (10 sec: 14747.8, 60 sec: 14813.9, 300 sec: 14787.3). Total num frames: 804212736. Throughput: 0: 3641.7. Samples: 190222462. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:25:58,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 12:25:59,462][134294] Updated weights for policy 0, policy_version 196344 (0.0011) [2025-01-04 12:26:02,039][134294] Updated weights for policy 0, policy_version 196354 (0.0020) [2025-01-04 12:26:03,968][134211] Fps is (10 sec: 15566.0, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 804290560. Throughput: 0: 3720.8. Samples: 190236564. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:03,968][134211] Avg episode reward: [(0, '9.927')] [2025-01-04 12:26:05,041][134294] Updated weights for policy 0, policy_version 196364 (0.0027) [2025-01-04 12:26:07,840][134294] Updated weights for policy 0, policy_version 196374 (0.0023) [2025-01-04 12:26:08,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 804360192. Throughput: 0: 3721.0. Samples: 190257968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:08,968][134211] Avg episode reward: [(0, '9.205')] [2025-01-04 12:26:10,703][134294] Updated weights for policy 0, policy_version 196384 (0.0023) [2025-01-04 12:26:13,571][134294] Updated weights for policy 0, policy_version 196394 (0.0021) [2025-01-04 12:26:13,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14677.4, 300 sec: 14815.0). Total num frames: 804433920. Throughput: 0: 3724.8. Samples: 190279314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:13,968][134211] Avg episode reward: [(0, '10.666')] [2025-01-04 12:26:16,452][134294] Updated weights for policy 0, policy_version 196404 (0.0026) [2025-01-04 12:26:18,947][134294] Updated weights for policy 0, policy_version 196414 (0.0018) [2025-01-04 12:26:18,967][134211] Fps is (10 sec: 15155.4, 60 sec: 14813.9, 300 sec: 14828.9). Total num frames: 804511744. Throughput: 0: 3647.1. Samples: 190289916. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:18,968][134211] Avg episode reward: [(0, '9.654')] [2025-01-04 12:26:21,141][134294] Updated weights for policy 0, policy_version 196424 (0.0018) [2025-01-04 12:26:23,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 804589568. Throughput: 0: 3726.8. Samples: 190315194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:23,968][134211] Avg episode reward: [(0, '9.139')] [2025-01-04 12:26:23,993][134294] Updated weights for policy 0, policy_version 196434 (0.0022) [2025-01-04 12:26:26,916][134294] Updated weights for policy 0, policy_version 196444 (0.0026) [2025-01-04 12:26:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15019.1, 300 sec: 14870.6). Total num frames: 804663296. Throughput: 0: 3721.5. Samples: 190336146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:28,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 12:26:29,837][134294] Updated weights for policy 0, policy_version 196454 (0.0026) [2025-01-04 12:26:32,837][134294] Updated weights for policy 0, policy_version 196464 (0.0027) [2025-01-04 12:26:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14745.6, 300 sec: 14870.6). Total num frames: 804732928. Throughput: 0: 3710.6. Samples: 190346488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:33,968][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 12:26:35,738][134294] Updated weights for policy 0, policy_version 196474 (0.0021) [2025-01-04 12:26:38,573][134294] Updated weights for policy 0, policy_version 196484 (0.0022) [2025-01-04 12:26:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14801.2). Total num frames: 804802560. Throughput: 0: 3713.1. Samples: 190367958. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:38,968][134211] Avg episode reward: [(0, '9.863')] [2025-01-04 12:26:41,386][134294] Updated weights for policy 0, policy_version 196494 (0.0023) [2025-01-04 12:26:43,320][134294] Updated weights for policy 0, policy_version 196504 (0.0014) [2025-01-04 12:26:43,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14950.6, 300 sec: 14842.8). Total num frames: 804888576. Throughput: 0: 3778.6. Samples: 190392498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:43,968][134211] Avg episode reward: [(0, '10.308')] [2025-01-04 12:26:45,865][134294] Updated weights for policy 0, policy_version 196514 (0.0022) [2025-01-04 12:26:48,697][134294] Updated weights for policy 0, policy_version 196524 (0.0024) [2025-01-04 12:26:48,968][134211] Fps is (10 sec: 16384.0, 60 sec: 15019.0, 300 sec: 14870.6). Total num frames: 804966400. Throughput: 0: 3737.3. Samples: 190404742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:48,968][134211] Avg episode reward: [(0, '9.807')] [2025-01-04 12:26:51,637][134294] Updated weights for policy 0, policy_version 196534 (0.0025) [2025-01-04 12:26:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14950.6, 300 sec: 14856.7). Total num frames: 805031936. Throughput: 0: 3732.2. Samples: 190425920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:53,968][134211] Avg episode reward: [(0, '9.125')] [2025-01-04 12:26:54,646][134294] Updated weights for policy 0, policy_version 196544 (0.0024) [2025-01-04 12:26:57,570][134294] Updated weights for policy 0, policy_version 196554 (0.0022) [2025-01-04 12:26:58,969][134211] Fps is (10 sec: 13515.5, 60 sec: 14813.6, 300 sec: 14828.9). Total num frames: 805101568. Throughput: 0: 3716.0. Samples: 190446538. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:26:58,969][134211] Avg episode reward: [(0, '10.063')] [2025-01-04 12:27:00,487][134294] Updated weights for policy 0, policy_version 196564 (0.0022) [2025-01-04 12:27:03,337][134294] Updated weights for policy 0, policy_version 196574 (0.0023) [2025-01-04 12:27:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 805175296. Throughput: 0: 3720.8. Samples: 190457352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:03,968][134211] Avg episode reward: [(0, '9.903')] [2025-01-04 12:27:06,262][134294] Updated weights for policy 0, policy_version 196584 (0.0023) [2025-01-04 12:27:08,968][134211] Fps is (10 sec: 13927.4, 60 sec: 14677.3, 300 sec: 14648.4). Total num frames: 805240832. Throughput: 0: 3625.9. Samples: 190478358. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:08,969][134211] Avg episode reward: [(0, '8.685')] [2025-01-04 12:27:09,395][134294] Updated weights for policy 0, policy_version 196594 (0.0026) [2025-01-04 12:27:11,588][134294] Updated weights for policy 0, policy_version 196604 (0.0015) [2025-01-04 12:27:13,467][134294] Updated weights for policy 0, policy_version 196614 (0.0011) [2025-01-04 12:27:13,968][134211] Fps is (10 sec: 16383.1, 60 sec: 15086.8, 300 sec: 14745.6). Total num frames: 805339136. Throughput: 0: 3741.3. Samples: 190504506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:13,969][134211] Avg episode reward: [(0, '9.520')] [2025-01-04 12:27:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000196616_805339136.pth... 
[2025-01-04 12:27:14,023][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000195743_801763328.pth [2025-01-04 12:27:15,328][134294] Updated weights for policy 0, policy_version 196624 (0.0013) [2025-01-04 12:27:17,233][134294] Updated weights for policy 0, policy_version 196634 (0.0013) [2025-01-04 12:27:18,968][134211] Fps is (10 sec: 20890.1, 60 sec: 15633.0, 300 sec: 14884.5). Total num frames: 805449728. Throughput: 0: 3875.0. Samples: 190520860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:18,968][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 12:27:19,095][134294] Updated weights for policy 0, policy_version 196644 (0.0014) [2025-01-04 12:27:21,298][134294] Updated weights for policy 0, policy_version 196654 (0.0018) [2025-01-04 12:27:23,968][134211] Fps is (10 sec: 18842.6, 60 sec: 15633.0, 300 sec: 14926.1). Total num frames: 805527552. Throughput: 0: 4041.1. Samples: 190549808. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:23,969][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 12:27:24,404][134294] Updated weights for policy 0, policy_version 196664 (0.0028) [2025-01-04 12:27:27,525][134294] Updated weights for policy 0, policy_version 196674 (0.0025) [2025-01-04 12:27:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15496.5, 300 sec: 14898.3). Total num frames: 805593088. Throughput: 0: 3933.8. Samples: 190569520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:28,968][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 12:27:30,542][134294] Updated weights for policy 0, policy_version 196684 (0.0024) [2025-01-04 12:27:33,496][134294] Updated weights for policy 0, policy_version 196694 (0.0025) [2025-01-04 12:27:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15496.6, 300 sec: 14912.2). Total num frames: 805662720. Throughput: 0: 3892.4. Samples: 190579900. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:33,968][134211] Avg episode reward: [(0, '10.157')] [2025-01-04 12:27:36,367][134294] Updated weights for policy 0, policy_version 196704 (0.0027) [2025-01-04 12:27:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15496.5, 300 sec: 14912.2). Total num frames: 805732352. Throughput: 0: 3882.7. Samples: 190600640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:27:38,968][134211] Avg episode reward: [(0, '10.969')] [2025-01-04 12:27:39,498][134294] Updated weights for policy 0, policy_version 196714 (0.0024) [2025-01-04 12:27:42,413][134294] Updated weights for policy 0, policy_version 196724 (0.0023) [2025-01-04 12:27:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.4, 300 sec: 14898.3). Total num frames: 805801984. Throughput: 0: 3880.6. Samples: 190621164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:27:43,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 12:27:45,377][134294] Updated weights for policy 0, policy_version 196734 (0.0027) [2025-01-04 12:27:48,176][134294] Updated weights for policy 0, policy_version 196744 (0.0024) [2025-01-04 12:27:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14898.3). Total num frames: 805871616. Throughput: 0: 3878.0. Samples: 190631862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:27:48,968][134211] Avg episode reward: [(0, '9.761')] [2025-01-04 12:27:51,150][134294] Updated weights for policy 0, policy_version 196754 (0.0026) [2025-01-04 12:27:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.2, 300 sec: 14898.3). Total num frames: 805941248. Throughput: 0: 3880.1. Samples: 190652962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:27:53,968][134211] Avg episode reward: [(0, '10.221')] [2025-01-04 12:27:54,029][134294] Updated weights for policy 0, policy_version 196764 (0.0025) [2025-01-04 12:27:56,959][134294] Updated weights for policy 0, policy_version 196774 (0.0023) [2025-01-04 12:27:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.4, 300 sec: 14898.3). Total num frames: 806010880. Throughput: 0: 3762.7. Samples: 190673826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:27:58,968][134211] Avg episode reward: [(0, '10.833')] [2025-01-04 12:27:59,910][134294] Updated weights for policy 0, policy_version 196784 (0.0023) [2025-01-04 12:28:02,794][134294] Updated weights for policy 0, policy_version 196794 (0.0022) [2025-01-04 12:28:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15086.9, 300 sec: 14884.4). Total num frames: 806080512. Throughput: 0: 3638.2. Samples: 190684578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:03,968][134211] Avg episode reward: [(0, '9.911')] [2025-01-04 12:28:05,749][134294] Updated weights for policy 0, policy_version 196804 (0.0022) [2025-01-04 12:28:08,500][134294] Updated weights for policy 0, policy_version 196814 (0.0025) [2025-01-04 12:28:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15223.5, 300 sec: 14898.3). Total num frames: 806154240. Throughput: 0: 3472.5. Samples: 190706068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:08,968][134211] Avg episode reward: [(0, '8.716')] [2025-01-04 12:28:11,419][134294] Updated weights for policy 0, policy_version 196824 (0.0024) [2025-01-04 12:28:13,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.8, 300 sec: 14828.9). Total num frames: 806223872. Throughput: 0: 3503.0. Samples: 190727156. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:13,968][134211] Avg episode reward: [(0, '9.973')] [2025-01-04 12:28:14,393][134294] Updated weights for policy 0, policy_version 196834 (0.0024) [2025-01-04 12:28:17,282][134294] Updated weights for policy 0, policy_version 196844 (0.0025) [2025-01-04 12:28:18,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14062.8, 300 sec: 14842.8). Total num frames: 806293504. Throughput: 0: 3502.6. Samples: 190737518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:18,969][134211] Avg episode reward: [(0, '8.781')] [2025-01-04 12:28:20,203][134294] Updated weights for policy 0, policy_version 196854 (0.0025) [2025-01-04 12:28:23,046][134294] Updated weights for policy 0, policy_version 196864 (0.0023) [2025-01-04 12:28:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13994.7, 300 sec: 14856.7). Total num frames: 806367232. Throughput: 0: 3519.4. Samples: 190759014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:23,968][134211] Avg episode reward: [(0, '10.181')] [2025-01-04 12:28:25,882][134294] Updated weights for policy 0, policy_version 196874 (0.0025) [2025-01-04 12:28:28,744][134294] Updated weights for policy 0, policy_version 196884 (0.0022) [2025-01-04 12:28:28,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14063.0, 300 sec: 14787.3). Total num frames: 806436864. Throughput: 0: 3542.0. Samples: 190780554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:28,968][134211] Avg episode reward: [(0, '9.458')] [2025-01-04 12:28:31,546][134294] Updated weights for policy 0, policy_version 196894 (0.0026) [2025-01-04 12:28:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14131.2, 300 sec: 14787.3). Total num frames: 806510592. Throughput: 0: 3539.8. Samples: 190791154. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:33,968][134211] Avg episode reward: [(0, '10.309')] [2025-01-04 12:28:34,382][134294] Updated weights for policy 0, policy_version 196904 (0.0022) [2025-01-04 12:28:37,317][134294] Updated weights for policy 0, policy_version 196914 (0.0024) [2025-01-04 12:28:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14131.2, 300 sec: 14787.3). Total num frames: 806580224. Throughput: 0: 3547.7. Samples: 190812608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:38,968][134211] Avg episode reward: [(0, '9.624')] [2025-01-04 12:28:40,192][134294] Updated weights for policy 0, policy_version 196924 (0.0022) [2025-01-04 12:28:43,048][134294] Updated weights for policy 0, policy_version 196934 (0.0025) [2025-01-04 12:28:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14199.5, 300 sec: 14801.1). Total num frames: 806653952. Throughput: 0: 3562.7. Samples: 190834148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:43,968][134211] Avg episode reward: [(0, '10.047')] [2025-01-04 12:28:45,218][134294] Updated weights for policy 0, policy_version 196944 (0.0016) [2025-01-04 12:28:47,088][134294] Updated weights for policy 0, policy_version 196954 (0.0013) [2025-01-04 12:28:48,915][134294] Updated weights for policy 0, policy_version 196964 (0.0013) [2025-01-04 12:28:48,967][134211] Fps is (10 sec: 18432.5, 60 sec: 14882.2, 300 sec: 14940.0). Total num frames: 806764544. Throughput: 0: 3659.2. Samples: 190849242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:48,968][134211] Avg episode reward: [(0, '10.081')] [2025-01-04 12:28:50,822][134294] Updated weights for policy 0, policy_version 196974 (0.0014) [2025-01-04 12:28:52,948][134294] Updated weights for policy 0, policy_version 196984 (0.0016) [2025-01-04 12:28:53,968][134211] Fps is (10 sec: 20479.6, 60 sec: 15291.7, 300 sec: 15023.3). Total num frames: 806858752. Throughput: 0: 3900.5. Samples: 190881592. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:53,968][134211] Avg episode reward: [(0, '9.550')] [2025-01-04 12:28:56,057][134294] Updated weights for policy 0, policy_version 196994 (0.0026) [2025-01-04 12:28:58,968][134211] Fps is (10 sec: 15974.0, 60 sec: 15223.5, 300 sec: 14926.1). Total num frames: 806924288. Throughput: 0: 3882.6. Samples: 190901874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:28:58,968][134211] Avg episode reward: [(0, '10.903')] [2025-01-04 12:28:59,207][134294] Updated weights for policy 0, policy_version 197004 (0.0026) [2025-01-04 12:29:02,319][134294] Updated weights for policy 0, policy_version 197014 (0.0025) [2025-01-04 12:29:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.2, 300 sec: 14870.6). Total num frames: 806989824. Throughput: 0: 3871.6. Samples: 190911738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:03,968][134211] Avg episode reward: [(0, '9.889')] [2025-01-04 12:29:05,317][134294] Updated weights for policy 0, policy_version 197024 (0.0026) [2025-01-04 12:29:08,199][134294] Updated weights for policy 0, policy_version 197034 (0.0025) [2025-01-04 12:29:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 807059456. Throughput: 0: 3853.2. Samples: 190932406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:08,968][134211] Avg episode reward: [(0, '9.605')] [2025-01-04 12:29:11,166][134294] Updated weights for policy 0, policy_version 197044 (0.0024) [2025-01-04 12:29:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 807129088. Throughput: 0: 3837.5. Samples: 190953242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:13,968][134211] Avg episode reward: [(0, '9.730')] [2025-01-04 12:29:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197053_807129088.pth... 
[2025-01-04 12:29:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000196180_803553280.pth [2025-01-04 12:29:14,147][134294] Updated weights for policy 0, policy_version 197054 (0.0024) [2025-01-04 12:29:17,104][134294] Updated weights for policy 0, policy_version 197064 (0.0025) [2025-01-04 12:29:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15087.1, 300 sec: 14870.6). Total num frames: 807198720. Throughput: 0: 3828.4. Samples: 190963430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:18,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 12:29:20,007][134294] Updated weights for policy 0, policy_version 197074 (0.0022) [2025-01-04 12:29:22,859][134294] Updated weights for policy 0, policy_version 197084 (0.0024) [2025-01-04 12:29:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14856.7). Total num frames: 807268352. Throughput: 0: 3829.9. Samples: 190984952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:23,968][134211] Avg episode reward: [(0, '9.249')] [2025-01-04 12:29:25,752][134294] Updated weights for policy 0, policy_version 197094 (0.0024) [2025-01-04 12:29:28,600][134294] Updated weights for policy 0, policy_version 197104 (0.0025) [2025-01-04 12:29:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 807342080. Throughput: 0: 3827.1. Samples: 191006368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:28,968][134211] Avg episode reward: [(0, '10.084')] [2025-01-04 12:29:31,348][134294] Updated weights for policy 0, policy_version 197114 (0.0025) [2025-01-04 12:29:33,968][134211] Fps is (10 sec: 14745.7, 60 sec: 15086.9, 300 sec: 14884.4). Total num frames: 807415808. Throughput: 0: 3730.8. Samples: 191017128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:33,968][134211] Avg episode reward: [(0, '9.856')] [2025-01-04 12:29:34,314][134294] Updated weights for policy 0, policy_version 197124 (0.0020) [2025-01-04 12:29:37,099][134294] Updated weights for policy 0, policy_version 197134 (0.0023) [2025-01-04 12:29:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 807481344. Throughput: 0: 3486.9. Samples: 191038504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:38,968][134211] Avg episode reward: [(0, '9.284')] [2025-01-04 12:29:40,025][134294] Updated weights for policy 0, policy_version 197144 (0.0024) [2025-01-04 12:29:42,893][134294] Updated weights for policy 0, policy_version 197154 (0.0023) [2025-01-04 12:29:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14828.9). Total num frames: 807555072. Throughput: 0: 3512.9. Samples: 191059956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:43,968][134211] Avg episode reward: [(0, '9.253')] [2025-01-04 12:29:45,572][134294] Updated weights for policy 0, policy_version 197164 (0.0022) [2025-01-04 12:29:47,481][134294] Updated weights for policy 0, policy_version 197174 (0.0013) [2025-01-04 12:29:48,968][134211] Fps is (10 sec: 16383.2, 60 sec: 14677.2, 300 sec: 14898.3). Total num frames: 807645184. Throughput: 0: 3575.1. Samples: 191072618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:48,969][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 12:29:50,172][134294] Updated weights for policy 0, policy_version 197184 (0.0021) [2025-01-04 12:29:53,020][134294] Updated weights for policy 0, policy_version 197194 (0.0022) [2025-01-04 12:29:53,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14336.0, 300 sec: 14898.3). Total num frames: 807718912. Throughput: 0: 3657.8. Samples: 191097008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:53,968][134211] Avg episode reward: [(0, '10.152')] [2025-01-04 12:29:56,027][134294] Updated weights for policy 0, policy_version 197204 (0.0025) [2025-01-04 12:29:58,913][134294] Updated weights for policy 0, policy_version 197214 (0.0025) [2025-01-04 12:29:58,968][134211] Fps is (10 sec: 14336.8, 60 sec: 14404.3, 300 sec: 14898.3). Total num frames: 807788544. Throughput: 0: 3655.7. Samples: 191117748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:29:58,968][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 12:30:01,892][134294] Updated weights for policy 0, policy_version 197224 (0.0024) [2025-01-04 12:30:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14828.9). Total num frames: 807854080. Throughput: 0: 3658.5. Samples: 191128064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:03,968][134211] Avg episode reward: [(0, '8.790')] [2025-01-04 12:30:04,865][134294] Updated weights for policy 0, policy_version 197234 (0.0024) [2025-01-04 12:30:07,395][134294] Updated weights for policy 0, policy_version 197244 (0.0018) [2025-01-04 12:30:08,967][134211] Fps is (10 sec: 15565.0, 60 sec: 14745.7, 300 sec: 14884.5). Total num frames: 807944192. Throughput: 0: 3663.9. Samples: 191149826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:08,968][134211] Avg episode reward: [(0, '8.536')] [2025-01-04 12:30:09,288][134294] Updated weights for policy 0, policy_version 197254 (0.0013) [2025-01-04 12:30:11,173][134294] Updated weights for policy 0, policy_version 197264 (0.0015) [2025-01-04 12:30:13,069][134294] Updated weights for policy 0, policy_version 197274 (0.0012) [2025-01-04 12:30:13,967][134211] Fps is (10 sec: 20071.0, 60 sec: 15428.4, 300 sec: 15023.3). Total num frames: 808054784. Throughput: 0: 3914.2. Samples: 191182504. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:13,968][134211] Avg episode reward: [(0, '9.318')] [2025-01-04 12:30:14,956][134294] Updated weights for policy 0, policy_version 197284 (0.0014) [2025-01-04 12:30:16,968][134294] Updated weights for policy 0, policy_version 197294 (0.0015) [2025-01-04 12:30:18,968][134211] Fps is (10 sec: 19660.3, 60 sec: 15701.3, 300 sec: 15078.8). Total num frames: 808140800. Throughput: 0: 4039.9. Samples: 191198922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:18,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 12:30:19,959][134294] Updated weights for policy 0, policy_version 197304 (0.0026) [2025-01-04 12:30:22,931][134294] Updated weights for policy 0, policy_version 197314 (0.0027) [2025-01-04 12:30:23,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15701.3, 300 sec: 15078.9). Total num frames: 808210432. Throughput: 0: 4036.6. Samples: 191220152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:23,968][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 12:30:26,052][134294] Updated weights for policy 0, policy_version 197324 (0.0026) [2025-01-04 12:30:28,971][134211] Fps is (10 sec: 13512.5, 60 sec: 15564.0, 300 sec: 15009.2). Total num frames: 808275968. Throughput: 0: 3998.6. Samples: 191239906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:28,972][134211] Avg episode reward: [(0, '9.027')] [2025-01-04 12:30:29,186][134294] Updated weights for policy 0, policy_version 197334 (0.0025) [2025-01-04 12:30:32,193][134294] Updated weights for policy 0, policy_version 197344 (0.0023) [2025-01-04 12:30:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15428.3, 300 sec: 14981.6). Total num frames: 808341504. Throughput: 0: 3938.7. Samples: 191249860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:33,968][134211] Avg episode reward: [(0, '8.859')] [2025-01-04 12:30:35,168][134294] Updated weights for policy 0, policy_version 197354 (0.0023) [2025-01-04 12:30:37,957][134294] Updated weights for policy 0, policy_version 197364 (0.0025) [2025-01-04 12:30:38,968][134211] Fps is (10 sec: 13930.9, 60 sec: 15564.8, 300 sec: 14995.6). Total num frames: 808415232. Throughput: 0: 3869.3. Samples: 191271128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:38,968][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 12:30:40,977][134294] Updated weights for policy 0, policy_version 197374 (0.0022) [2025-01-04 12:30:43,817][134294] Updated weights for policy 0, policy_version 197384 (0.0023) [2025-01-04 12:30:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15496.5, 300 sec: 14981.7). Total num frames: 808484864. Throughput: 0: 3878.6. Samples: 191292286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:43,968][134211] Avg episode reward: [(0, '10.032')] [2025-01-04 12:30:46,701][134294] Updated weights for policy 0, policy_version 197394 (0.0024) [2025-01-04 12:30:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.3, 300 sec: 14981.7). Total num frames: 808554496. Throughput: 0: 3880.5. Samples: 191302688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:48,968][134211] Avg episode reward: [(0, '9.405')] [2025-01-04 12:30:49,721][134294] Updated weights for policy 0, policy_version 197404 (0.0026) [2025-01-04 12:30:52,666][134294] Updated weights for policy 0, policy_version 197414 (0.0022) [2025-01-04 12:30:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14953.9). Total num frames: 808624128. Throughput: 0: 3856.7. Samples: 191323378. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:53,968][134211] Avg episode reward: [(0, '9.066')] [2025-01-04 12:30:55,614][134294] Updated weights for policy 0, policy_version 197424 (0.0023) [2025-01-04 12:30:58,411][134294] Updated weights for policy 0, policy_version 197434 (0.0024) [2025-01-04 12:30:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15086.9, 300 sec: 14926.1). Total num frames: 808693760. Throughput: 0: 3603.0. Samples: 191344638. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:30:58,968][134211] Avg episode reward: [(0, '7.926')] [2025-01-04 12:31:01,358][134294] Updated weights for policy 0, policy_version 197444 (0.0028) [2025-01-04 12:31:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 14926.1). Total num frames: 808763392. Throughput: 0: 3471.8. Samples: 191355152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:03,968][134211] Avg episode reward: [(0, '9.178')] [2025-01-04 12:31:04,429][134294] Updated weights for policy 0, policy_version 197454 (0.0023) [2025-01-04 12:31:07,377][134294] Updated weights for policy 0, policy_version 197464 (0.0024) [2025-01-04 12:31:08,968][134211] Fps is (10 sec: 13925.4, 60 sec: 14813.7, 300 sec: 14912.2). Total num frames: 808833024. Throughput: 0: 3456.2. Samples: 191375684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:08,969][134211] Avg episode reward: [(0, '9.917')] [2025-01-04 12:31:10,344][134294] Updated weights for policy 0, policy_version 197474 (0.0025) [2025-01-04 12:31:13,208][134294] Updated weights for policy 0, policy_version 197484 (0.0024) [2025-01-04 12:31:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14131.1, 300 sec: 14884.4). Total num frames: 808902656. Throughput: 0: 3488.1. Samples: 191396862. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:13,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 12:31:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197486_808902656.pth... [2025-01-04 12:31:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000196616_805339136.pth [2025-01-04 12:31:16,061][134294] Updated weights for policy 0, policy_version 197494 (0.0020) [2025-01-04 12:31:17,929][134294] Updated weights for policy 0, policy_version 197504 (0.0013) [2025-01-04 12:31:18,967][134211] Fps is (10 sec: 16385.5, 60 sec: 14267.8, 300 sec: 14940.0). Total num frames: 808996864. Throughput: 0: 3510.8. Samples: 191407844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:18,968][134211] Avg episode reward: [(0, '9.758')] [2025-01-04 12:31:19,804][134294] Updated weights for policy 0, policy_version 197514 (0.0012) [2025-01-04 12:31:21,667][134294] Updated weights for policy 0, policy_version 197524 (0.0015) [2025-01-04 12:31:23,561][134294] Updated weights for policy 0, policy_version 197534 (0.0014) [2025-01-04 12:31:23,968][134211] Fps is (10 sec: 20480.6, 60 sec: 14950.5, 300 sec: 15065.0). Total num frames: 809107456. Throughput: 0: 3768.0. Samples: 191440688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:23,968][134211] Avg episode reward: [(0, '8.652')] [2025-01-04 12:31:26,176][134294] Updated weights for policy 0, policy_version 197544 (0.0021) [2025-01-04 12:31:28,968][134211] Fps is (10 sec: 18021.9, 60 sec: 15019.5, 300 sec: 15065.0). Total num frames: 809177088. Throughput: 0: 3839.4. Samples: 191465058. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:28,968][134211] Avg episode reward: [(0, '10.145')] [2025-01-04 12:31:29,248][134294] Updated weights for policy 0, policy_version 197554 (0.0026) [2025-01-04 12:31:32,453][134294] Updated weights for policy 0, policy_version 197564 (0.0028) [2025-01-04 12:31:33,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14950.4, 300 sec: 15037.2). Total num frames: 809238528. Throughput: 0: 3820.3. Samples: 191474600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:33,968][134211] Avg episode reward: [(0, '9.010')] [2025-01-04 12:31:35,414][134294] Updated weights for policy 0, policy_version 197574 (0.0025) [2025-01-04 12:31:38,400][134294] Updated weights for policy 0, policy_version 197584 (0.0025) [2025-01-04 12:31:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14981.6). Total num frames: 809308160. Throughput: 0: 3819.7. Samples: 191495262. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:38,968][134211] Avg episode reward: [(0, '9.714')] [2025-01-04 12:31:41,251][134294] Updated weights for policy 0, policy_version 197594 (0.0024) [2025-01-04 12:31:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.2, 300 sec: 14953.9). Total num frames: 809377792. Throughput: 0: 3797.8. Samples: 191515540. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:43,968][134211] Avg episode reward: [(0, '9.021')] [2025-01-04 12:31:44,371][134294] Updated weights for policy 0, policy_version 197604 (0.0026) [2025-01-04 12:31:47,404][134294] Updated weights for policy 0, policy_version 197614 (0.0025) [2025-01-04 12:31:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.2, 300 sec: 14967.8). Total num frames: 809447424. Throughput: 0: 3788.2. Samples: 191525620. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:48,968][134211] Avg episode reward: [(0, '10.294')] [2025-01-04 12:31:50,337][134294] Updated weights for policy 0, policy_version 197624 (0.0023) [2025-01-04 12:31:53,173][134294] Updated weights for policy 0, policy_version 197634 (0.0022) [2025-01-04 12:31:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.2, 300 sec: 14967.8). Total num frames: 809517056. Throughput: 0: 3804.1. Samples: 191546866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:53,968][134211] Avg episode reward: [(0, '9.229')] [2025-01-04 12:31:56,070][134294] Updated weights for policy 0, policy_version 197644 (0.0023) [2025-01-04 12:31:58,969][134211] Fps is (10 sec: 13923.8, 60 sec: 14881.7, 300 sec: 14953.8). Total num frames: 809586688. Throughput: 0: 3798.8. Samples: 191567816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:31:58,970][134211] Avg episode reward: [(0, '9.619')] [2025-01-04 12:31:59,071][134294] Updated weights for policy 0, policy_version 197654 (0.0024) [2025-01-04 12:32:02,066][134294] Updated weights for policy 0, policy_version 197664 (0.0023) [2025-01-04 12:32:03,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14882.1, 300 sec: 14967.7). Total num frames: 809656320. Throughput: 0: 3781.0. Samples: 191577992. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:32:03,969][134211] Avg episode reward: [(0, '9.749')] [2025-01-04 12:32:05,041][134294] Updated weights for policy 0, policy_version 197674 (0.0026) [2025-01-04 12:32:08,017][134294] Updated weights for policy 0, policy_version 197684 (0.0024) [2025-01-04 12:32:08,968][134211] Fps is (10 sec: 13928.6, 60 sec: 14882.3, 300 sec: 14870.6). Total num frames: 809725952. Throughput: 0: 3519.9. Samples: 191599086. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:32:08,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 12:32:10,817][134294] Updated weights for policy 0, policy_version 197694 (0.0023) [2025-01-04 12:32:13,758][134294] Updated weights for policy 0, policy_version 197704 (0.0023) [2025-01-04 12:32:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.2, 300 sec: 14731.7). Total num frames: 809795584. Throughput: 0: 3443.8. Samples: 191620028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:13,968][134211] Avg episode reward: [(0, '9.046')] [2025-01-04 12:32:16,659][134294] Updated weights for policy 0, policy_version 197714 (0.0025) [2025-01-04 12:32:18,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14472.5, 300 sec: 14704.0). Total num frames: 809865216. Throughput: 0: 3468.5. Samples: 191630684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:18,969][134211] Avg episode reward: [(0, '8.841')] [2025-01-04 12:32:19,615][134294] Updated weights for policy 0, policy_version 197724 (0.0023) [2025-01-04 12:32:22,586][134294] Updated weights for policy 0, policy_version 197734 (0.0024) [2025-01-04 12:32:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13789.8, 300 sec: 14717.8). Total num frames: 809934848. Throughput: 0: 3472.6. Samples: 191651530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:23,968][134211] Avg episode reward: [(0, '10.713')] [2025-01-04 12:32:25,405][134294] Updated weights for policy 0, policy_version 197744 (0.0022) [2025-01-04 12:32:28,280][134294] Updated weights for policy 0, policy_version 197754 (0.0022) [2025-01-04 12:32:28,967][134211] Fps is (10 sec: 14745.8, 60 sec: 13926.5, 300 sec: 14745.6). Total num frames: 810012672. Throughput: 0: 3500.1. Samples: 191673042. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:28,968][134211] Avg episode reward: [(0, '9.155')] [2025-01-04 12:32:30,171][134294] Updated weights for policy 0, policy_version 197764 (0.0013) [2025-01-04 12:32:31,987][134294] Updated weights for policy 0, policy_version 197774 (0.0012) [2025-01-04 12:32:33,909][134294] Updated weights for policy 0, policy_version 197784 (0.0013) [2025-01-04 12:32:33,968][134211] Fps is (10 sec: 18841.4, 60 sec: 14745.6, 300 sec: 14884.4). Total num frames: 810123264. Throughput: 0: 3637.9. Samples: 191689328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:33,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 12:32:35,732][134294] Updated weights for policy 0, policy_version 197794 (0.0014) [2025-01-04 12:32:38,284][134294] Updated weights for policy 0, policy_version 197804 (0.0021) [2025-01-04 12:32:38,968][134211] Fps is (10 sec: 19660.6, 60 sec: 15018.7, 300 sec: 14940.0). Total num frames: 810209280. Throughput: 0: 3861.1. Samples: 191720616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:38,968][134211] Avg episode reward: [(0, '9.510')] [2025-01-04 12:32:41,321][134294] Updated weights for policy 0, policy_version 197814 (0.0027) [2025-01-04 12:32:43,968][134211] Fps is (10 sec: 15565.1, 60 sec: 15018.7, 300 sec: 14940.0). Total num frames: 810278912. Throughput: 0: 3839.7. Samples: 191740596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:43,968][134211] Avg episode reward: [(0, '9.347')] [2025-01-04 12:32:44,534][134294] Updated weights for policy 0, policy_version 197824 (0.0024) [2025-01-04 12:32:47,539][134294] Updated weights for policy 0, policy_version 197834 (0.0025) [2025-01-04 12:32:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.4, 300 sec: 14926.1). Total num frames: 810344448. Throughput: 0: 3833.9. Samples: 191750518. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:48,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 12:32:50,480][134294] Updated weights for policy 0, policy_version 197844 (0.0027) [2025-01-04 12:32:53,395][134294] Updated weights for policy 0, policy_version 197854 (0.0026) [2025-01-04 12:32:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14950.4, 300 sec: 14926.1). Total num frames: 810414080. Throughput: 0: 3827.3. Samples: 191771314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:53,968][134211] Avg episode reward: [(0, '9.801')] [2025-01-04 12:32:56,340][134294] Updated weights for policy 0, policy_version 197864 (0.0025) [2025-01-04 12:32:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.8, 300 sec: 14926.1). Total num frames: 810483712. Throughput: 0: 3823.4. Samples: 191792080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:32:58,968][134211] Avg episode reward: [(0, '11.001')] [2025-01-04 12:32:59,366][134294] Updated weights for policy 0, policy_version 197874 (0.0024) [2025-01-04 12:33:02,315][134294] Updated weights for policy 0, policy_version 197884 (0.0025) [2025-01-04 12:33:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.5, 300 sec: 14912.2). Total num frames: 810553344. Throughput: 0: 3817.6. Samples: 191802474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:03,968][134211] Avg episode reward: [(0, '8.956')] [2025-01-04 12:33:05,257][134294] Updated weights for policy 0, policy_version 197894 (0.0023) [2025-01-04 12:33:08,058][134294] Updated weights for policy 0, policy_version 197904 (0.0022) [2025-01-04 12:33:08,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 810627072. Throughput: 0: 3825.9. Samples: 191823696. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:08,968][134211] Avg episode reward: [(0, '9.565')] [2025-01-04 12:33:10,993][134294] Updated weights for policy 0, policy_version 197914 (0.0024) [2025-01-04 12:33:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 810696704. Throughput: 0: 3813.4. Samples: 191844644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:13,968][134211] Avg episode reward: [(0, '9.991')] [2025-01-04 12:33:13,969][134294] Updated weights for policy 0, policy_version 197924 (0.0023) [2025-01-04 12:33:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197924_810696704.pth... [2025-01-04 12:33:14,043][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197053_807129088.pth [2025-01-04 12:33:16,911][134294] Updated weights for policy 0, policy_version 197934 (0.0022) [2025-01-04 12:33:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14898.3). Total num frames: 810762240. Throughput: 0: 3679.8. Samples: 191854920. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:18,968][134211] Avg episode reward: [(0, '10.504')] [2025-01-04 12:33:19,965][134294] Updated weights for policy 0, policy_version 197944 (0.0025) [2025-01-04 12:33:22,788][134294] Updated weights for policy 0, policy_version 197954 (0.0026) [2025-01-04 12:33:23,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14950.4, 300 sec: 14898.3). Total num frames: 810831872. Throughput: 0: 3449.0. Samples: 191875822. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:23,968][134211] Avg episode reward: [(0, '10.143')] [2025-01-04 12:33:25,747][134294] Updated weights for policy 0, policy_version 197964 (0.0023) [2025-01-04 12:33:28,556][134294] Updated weights for policy 0, policy_version 197974 (0.0026) [2025-01-04 12:33:28,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14950.3, 300 sec: 14912.2). Total num frames: 810909696. Throughput: 0: 3482.1. Samples: 191897290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:28,968][134211] Avg episode reward: [(0, '9.902')] [2025-01-04 12:33:30,388][134294] Updated weights for policy 0, policy_version 197984 (0.0012) [2025-01-04 12:33:32,289][134294] Updated weights for policy 0, policy_version 197994 (0.0013) [2025-01-04 12:33:33,968][134211] Fps is (10 sec: 18841.7, 60 sec: 14950.5, 300 sec: 15051.1). Total num frames: 811020288. Throughput: 0: 3611.7. Samples: 191913044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:33,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 12:33:34,140][134294] Updated weights for policy 0, policy_version 198004 (0.0013) [2025-01-04 12:33:36,010][134294] Updated weights for policy 0, policy_version 198014 (0.0014) [2025-01-04 12:33:38,660][134294] Updated weights for policy 0, policy_version 198024 (0.0023) [2025-01-04 12:33:38,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14950.4, 300 sec: 15092.7). Total num frames: 811106304. Throughput: 0: 3854.2. Samples: 191944752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:38,968][134211] Avg episode reward: [(0, '8.519')] [2025-01-04 12:33:41,809][134294] Updated weights for policy 0, policy_version 198034 (0.0029) [2025-01-04 12:33:43,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14882.1, 300 sec: 14940.0). Total num frames: 811171840. Throughput: 0: 3828.5. Samples: 191964364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:43,968][134211] Avg episode reward: [(0, '8.642')] [2025-01-04 12:33:45,022][134294] Updated weights for policy 0, policy_version 198044 (0.0026) [2025-01-04 12:33:48,006][134294] Updated weights for policy 0, policy_version 198054 (0.0026) [2025-01-04 12:33:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 811241472. Throughput: 0: 3815.0. Samples: 191974148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:48,968][134211] Avg episode reward: [(0, '9.477')] [2025-01-04 12:33:51,011][134294] Updated weights for policy 0, policy_version 198064 (0.0026) [2025-01-04 12:33:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 811307008. Throughput: 0: 3802.0. Samples: 191994786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:53,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 12:33:54,071][134294] Updated weights for policy 0, policy_version 198074 (0.0028) [2025-01-04 12:33:56,994][134294] Updated weights for policy 0, policy_version 198084 (0.0021) [2025-01-04 12:33:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.2, 300 sec: 14870.6). Total num frames: 811376640. Throughput: 0: 3793.7. Samples: 192015358. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:33:58,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 12:33:59,978][134294] Updated weights for policy 0, policy_version 198094 (0.0024) [2025-01-04 12:34:02,846][134294] Updated weights for policy 0, policy_version 198104 (0.0026) [2025-01-04 12:34:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.1, 300 sec: 14870.6). Total num frames: 811446272. Throughput: 0: 3803.0. Samples: 192026056. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:03,968][134211] Avg episode reward: [(0, '11.045')] [2025-01-04 12:34:05,701][134294] Updated weights for policy 0, policy_version 198114 (0.0027) [2025-01-04 12:34:08,531][134294] Updated weights for policy 0, policy_version 198124 (0.0025) [2025-01-04 12:34:08,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.1, 300 sec: 14884.5). Total num frames: 811520000. Throughput: 0: 3811.6. Samples: 192047344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:08,968][134211] Avg episode reward: [(0, '8.669')] [2025-01-04 12:34:11,439][134294] Updated weights for policy 0, policy_version 198134 (0.0027) [2025-01-04 12:34:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.1, 300 sec: 14884.4). Total num frames: 811589632. Throughput: 0: 3802.6. Samples: 192068408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:13,968][134211] Avg episode reward: [(0, '9.602')] [2025-01-04 12:34:14,480][134294] Updated weights for policy 0, policy_version 198144 (0.0025) [2025-01-04 12:34:17,404][134294] Updated weights for policy 0, policy_version 198154 (0.0026) [2025-01-04 12:34:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14884.5). Total num frames: 811659264. Throughput: 0: 3683.2. Samples: 192078788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:18,968][134211] Avg episode reward: [(0, '9.116')] [2025-01-04 12:34:20,201][134294] Updated weights for policy 0, policy_version 198164 (0.0026) [2025-01-04 12:34:23,054][134294] Updated weights for policy 0, policy_version 198174 (0.0024) [2025-01-04 12:34:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15018.6, 300 sec: 14884.4). Total num frames: 811732992. Throughput: 0: 3457.3. Samples: 192100330. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:23,968][134211] Avg episode reward: [(0, '11.435')] [2025-01-04 12:34:25,897][134294] Updated weights for policy 0, policy_version 198184 (0.0025) [2025-01-04 12:34:28,757][134294] Updated weights for policy 0, policy_version 198194 (0.0022) [2025-01-04 12:34:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.2, 300 sec: 14870.6). Total num frames: 811802624. Throughput: 0: 3499.6. Samples: 192121846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:28,968][134211] Avg episode reward: [(0, '10.451')] [2025-01-04 12:34:31,620][134294] Updated weights for policy 0, policy_version 198204 (0.0023) [2025-01-04 12:34:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.4, 300 sec: 14884.4). Total num frames: 811872256. Throughput: 0: 3518.4. Samples: 192132474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:33,968][134211] Avg episode reward: [(0, '9.314')] [2025-01-04 12:34:34,613][134294] Updated weights for policy 0, policy_version 198214 (0.0024) [2025-01-04 12:34:37,277][134294] Updated weights for policy 0, policy_version 198224 (0.0020) [2025-01-04 12:34:38,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.5, 300 sec: 14926.1). Total num frames: 811958272. Throughput: 0: 3540.9. Samples: 192154126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:38,968][134211] Avg episode reward: [(0, '10.535')] [2025-01-04 12:34:39,159][134294] Updated weights for policy 0, policy_version 198234 (0.0013) [2025-01-04 12:34:41,018][134294] Updated weights for policy 0, policy_version 198244 (0.0012) [2025-01-04 12:34:42,903][134294] Updated weights for policy 0, policy_version 198254 (0.0013) [2025-01-04 12:34:43,968][134211] Fps is (10 sec: 19661.2, 60 sec: 14950.5, 300 sec: 14995.6). Total num frames: 812068864. Throughput: 0: 3812.3. Samples: 192186910. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:43,968][134211] Avg episode reward: [(0, '11.138')] [2025-01-04 12:34:44,816][134294] Updated weights for policy 0, policy_version 198264 (0.0016) [2025-01-04 12:34:47,673][134294] Updated weights for policy 0, policy_version 198274 (0.0024) [2025-01-04 12:34:48,968][134211] Fps is (10 sec: 18841.4, 60 sec: 15087.0, 300 sec: 15009.4). Total num frames: 812146688. Throughput: 0: 3885.4. Samples: 192200900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:48,968][134211] Avg episode reward: [(0, '10.235')] [2025-01-04 12:34:50,733][134294] Updated weights for policy 0, policy_version 198284 (0.0027) [2025-01-04 12:34:53,708][134294] Updated weights for policy 0, policy_version 198294 (0.0028) [2025-01-04 12:34:53,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15155.2, 300 sec: 15009.4). Total num frames: 812216320. Throughput: 0: 3863.0. Samples: 192221180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:53,968][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 12:34:56,702][134294] Updated weights for policy 0, policy_version 198304 (0.0028) [2025-01-04 12:34:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15086.9, 300 sec: 15009.4). Total num frames: 812281856. Throughput: 0: 3846.7. Samples: 192241510. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:34:58,968][134211] Avg episode reward: [(0, '10.475')] [2025-01-04 12:34:59,708][134294] Updated weights for policy 0, policy_version 198314 (0.0026) [2025-01-04 12:35:02,822][134294] Updated weights for policy 0, policy_version 198324 (0.0029) [2025-01-04 12:35:03,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 812347392. Throughput: 0: 3838.6. Samples: 192251524. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:03,969][134211] Avg episode reward: [(0, '9.133')] [2025-01-04 12:35:05,722][134294] Updated weights for policy 0, policy_version 198334 (0.0024) [2025-01-04 12:35:08,572][134294] Updated weights for policy 0, policy_version 198344 (0.0026) [2025-01-04 12:35:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14801.1). Total num frames: 812421120. Throughput: 0: 3827.1. Samples: 192272548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:08,968][134211] Avg episode reward: [(0, '9.906')] [2025-01-04 12:35:11,421][134294] Updated weights for policy 0, policy_version 198354 (0.0025) [2025-01-04 12:35:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 812490752. Throughput: 0: 3818.0. Samples: 192293658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:13,968][134211] Avg episode reward: [(0, '11.448')] [2025-01-04 12:35:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000198362_812490752.pth... [2025-01-04 12:35:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197486_808902656.pth [2025-01-04 12:35:14,458][134294] Updated weights for policy 0, policy_version 198364 (0.0026) [2025-01-04 12:35:17,376][134294] Updated weights for policy 0, policy_version 198374 (0.0026) [2025-01-04 12:35:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 14745.6). Total num frames: 812560384. Throughput: 0: 3807.6. Samples: 192303816. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:18,968][134211] Avg episode reward: [(0, '8.495')] [2025-01-04 12:35:20,268][134294] Updated weights for policy 0, policy_version 198384 (0.0024) [2025-01-04 12:35:23,187][134294] Updated weights for policy 0, policy_version 198394 (0.0026) [2025-01-04 12:35:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14759.6). Total num frames: 812630016. Throughput: 0: 3802.1. Samples: 192325222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:23,968][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 12:35:26,012][134294] Updated weights for policy 0, policy_version 198404 (0.0025) [2025-01-04 12:35:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 812699648. Throughput: 0: 3538.8. Samples: 192346158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:35:28,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 12:35:29,077][134294] Updated weights for policy 0, policy_version 198414 (0.0023) [2025-01-04 12:35:31,887][134294] Updated weights for policy 0, policy_version 198424 (0.0024) [2025-01-04 12:35:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 812769280. Throughput: 0: 3458.9. Samples: 192356550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:33,968][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 12:35:34,929][134294] Updated weights for policy 0, policy_version 198434 (0.0026) [2025-01-04 12:35:37,777][134294] Updated weights for policy 0, policy_version 198444 (0.0024) [2025-01-04 12:35:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 812843008. Throughput: 0: 3474.9. Samples: 192377552. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:38,968][134211] Avg episode reward: [(0, '8.841')] [2025-01-04 12:35:40,722][134294] Updated weights for policy 0, policy_version 198454 (0.0026) [2025-01-04 12:35:43,551][134294] Updated weights for policy 0, policy_version 198464 (0.0025) [2025-01-04 12:35:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14062.9, 300 sec: 14773.4). Total num frames: 812912640. Throughput: 0: 3503.2. Samples: 192399152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:43,968][134211] Avg episode reward: [(0, '9.582')] [2025-01-04 12:35:46,377][134294] Updated weights for policy 0, policy_version 198474 (0.0024) [2025-01-04 12:35:48,556][134294] Updated weights for policy 0, policy_version 198484 (0.0018) [2025-01-04 12:35:48,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14131.2, 300 sec: 14815.0). Total num frames: 812994560. Throughput: 0: 3515.8. Samples: 192409734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:48,968][134211] Avg episode reward: [(0, '9.575')] [2025-01-04 12:35:50,489][134294] Updated weights for policy 0, policy_version 198494 (0.0012) [2025-01-04 12:35:52,397][134294] Updated weights for policy 0, policy_version 198504 (0.0014) [2025-01-04 12:35:53,968][134211] Fps is (10 sec: 19251.2, 60 sec: 14813.9, 300 sec: 14953.9). Total num frames: 813105152. Throughput: 0: 3724.1. Samples: 192440134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:53,968][134211] Avg episode reward: [(0, '9.254')] [2025-01-04 12:35:54,275][134294] Updated weights for policy 0, policy_version 198514 (0.0012) [2025-01-04 12:35:56,188][134294] Updated weights for policy 0, policy_version 198524 (0.0013) [2025-01-04 12:35:58,467][134294] Updated weights for policy 0, policy_version 198534 (0.0019) [2025-01-04 12:35:58,968][134211] Fps is (10 sec: 20479.5, 60 sec: 15291.7, 300 sec: 15037.2). Total num frames: 813199360. Throughput: 0: 3938.1. Samples: 192470872. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:35:58,968][134211] Avg episode reward: [(0, '9.149')] [2025-01-04 12:36:01,770][134294] Updated weights for policy 0, policy_version 198544 (0.0025) [2025-01-04 12:36:03,968][134211] Fps is (10 sec: 15564.5, 60 sec: 15223.5, 300 sec: 15009.4). Total num frames: 813260800. Throughput: 0: 3924.7. Samples: 192480426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:03,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 12:36:04,979][134294] Updated weights for policy 0, policy_version 198554 (0.0026) [2025-01-04 12:36:08,009][134294] Updated weights for policy 0, policy_version 198564 (0.0024) [2025-01-04 12:36:08,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15155.2, 300 sec: 15009.4). Total num frames: 813330432. Throughput: 0: 3884.2. Samples: 192500010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:08,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 12:36:11,007][134294] Updated weights for policy 0, policy_version 198574 (0.0027) [2025-01-04 12:36:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15087.0, 300 sec: 14912.2). Total num frames: 813395968. Throughput: 0: 3864.7. Samples: 192520068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:13,968][134211] Avg episode reward: [(0, '8.604')] [2025-01-04 12:36:14,082][134294] Updated weights for policy 0, policy_version 198584 (0.0024) [2025-01-04 12:36:17,014][134294] Updated weights for policy 0, policy_version 198594 (0.0023) [2025-01-04 12:36:18,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 813465600. Throughput: 0: 3863.1. Samples: 192530390. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:18,968][134211] Avg episode reward: [(0, '9.518')] [2025-01-04 12:36:19,947][134294] Updated weights for policy 0, policy_version 198604 (0.0030) [2025-01-04 12:36:22,836][134294] Updated weights for policy 0, policy_version 198614 (0.0022) [2025-01-04 12:36:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 813535232. Throughput: 0: 3871.7. Samples: 192551778. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:23,968][134211] Avg episode reward: [(0, '8.715')] [2025-01-04 12:36:25,705][134294] Updated weights for policy 0, policy_version 198624 (0.0025) [2025-01-04 12:36:28,534][134294] Updated weights for policy 0, policy_version 198634 (0.0023) [2025-01-04 12:36:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15155.2, 300 sec: 14815.0). Total num frames: 813608960. Throughput: 0: 3867.9. Samples: 192573206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:28,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 12:36:31,427][134294] Updated weights for policy 0, policy_version 198644 (0.0026) [2025-01-04 12:36:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15155.2, 300 sec: 14815.0). Total num frames: 813678592. Throughput: 0: 3868.3. Samples: 192583808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:36:33,968][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 12:36:34,464][134294] Updated weights for policy 0, policy_version 198654 (0.0025) [2025-01-04 12:36:37,322][134294] Updated weights for policy 0, policy_version 198664 (0.0023) [2025-01-04 12:36:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15087.0, 300 sec: 14815.0). Total num frames: 813748224. Throughput: 0: 3654.1. Samples: 192604568. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:36:38,968][134211] Avg episode reward: [(0, '11.435')] [2025-01-04 12:36:40,290][134294] Updated weights for policy 0, policy_version 198674 (0.0027) [2025-01-04 12:36:43,647][134294] Updated weights for policy 0, policy_version 198684 (0.0026) [2025-01-04 12:36:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.3, 300 sec: 14787.2). Total num frames: 813809664. Throughput: 0: 3406.1. Samples: 192624148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:36:43,968][134211] Avg episode reward: [(0, '8.717')] [2025-01-04 12:36:46,830][134294] Updated weights for policy 0, policy_version 198694 (0.0024) [2025-01-04 12:36:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14609.0, 300 sec: 14759.5). Total num frames: 813871104. Throughput: 0: 3403.4. Samples: 192633580. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:36:48,968][134211] Avg episode reward: [(0, '9.567')] [2025-01-04 12:36:50,306][134294] Updated weights for policy 0, policy_version 198704 (0.0028) [2025-01-04 12:36:53,177][134294] Updated weights for policy 0, policy_version 198714 (0.0018) [2025-01-04 12:36:53,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13994.7, 300 sec: 14773.5). Total num frames: 813944832. Throughput: 0: 3376.6. Samples: 192651958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:36:53,968][134211] Avg episode reward: [(0, '9.970')] [2025-01-04 12:36:55,212][134294] Updated weights for policy 0, policy_version 198724 (0.0015) [2025-01-04 12:36:57,225][134294] Updated weights for policy 0, policy_version 198734 (0.0014) [2025-01-04 12:36:58,967][134211] Fps is (10 sec: 17613.1, 60 sec: 14131.3, 300 sec: 14884.5). Total num frames: 814047232. Throughput: 0: 3593.4. Samples: 192681772. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:36:58,968][134211] Avg episode reward: [(0, '10.510')] [2025-01-04 12:36:59,225][134294] Updated weights for policy 0, policy_version 198744 (0.0012) [2025-01-04 12:37:01,992][134294] Updated weights for policy 0, policy_version 198754 (0.0022) [2025-01-04 12:37:03,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14336.0, 300 sec: 14898.3). Total num frames: 814120960. Throughput: 0: 3651.1. Samples: 192694688. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:03,968][134211] Avg episode reward: [(0, '8.754')] [2025-01-04 12:37:05,219][134294] Updated weights for policy 0, policy_version 198764 (0.0025) [2025-01-04 12:37:08,178][134294] Updated weights for policy 0, policy_version 198774 (0.0024) [2025-01-04 12:37:08,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14267.7, 300 sec: 14884.4). Total num frames: 814186496. Throughput: 0: 3619.6. Samples: 192714660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:08,968][134211] Avg episode reward: [(0, '9.203')] [2025-01-04 12:37:11,301][134294] Updated weights for policy 0, policy_version 198784 (0.0026) [2025-01-04 12:37:13,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14267.7, 300 sec: 14870.5). Total num frames: 814252032. Throughput: 0: 3578.6. Samples: 192734242. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:13,969][134211] Avg episode reward: [(0, '9.925')] [2025-01-04 12:37:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000198792_814252032.pth... 
[2025-01-04 12:37:14,063][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000197924_810696704.pth [2025-01-04 12:37:14,490][134294] Updated weights for policy 0, policy_version 198794 (0.0028) [2025-01-04 12:37:17,377][134294] Updated weights for policy 0, policy_version 198804 (0.0025) [2025-01-04 12:37:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14870.6). Total num frames: 814321664. Throughput: 0: 3564.6. Samples: 192744216. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:18,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 12:37:20,292][134294] Updated weights for policy 0, policy_version 198814 (0.0023) [2025-01-04 12:37:23,166][134294] Updated weights for policy 0, policy_version 198824 (0.0023) [2025-01-04 12:37:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14267.7, 300 sec: 14842.8). Total num frames: 814391296. Throughput: 0: 3577.5. Samples: 192765554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:23,968][134211] Avg episode reward: [(0, '8.922')] [2025-01-04 12:37:25,998][134294] Updated weights for policy 0, policy_version 198834 (0.0024) [2025-01-04 12:37:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 14704.0). Total num frames: 814460928. Throughput: 0: 3601.8. Samples: 192786230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:28,968][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 12:37:29,166][134294] Updated weights for policy 0, policy_version 198844 (0.0024) [2025-01-04 12:37:32,040][134294] Updated weights for policy 0, policy_version 198854 (0.0025) [2025-01-04 12:37:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14648.4). Total num frames: 814530560. Throughput: 0: 3621.1. Samples: 192796530. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:33,968][134211] Avg episode reward: [(0, '9.561')] [2025-01-04 12:37:34,970][134294] Updated weights for policy 0, policy_version 198864 (0.0023) [2025-01-04 12:37:37,736][134294] Updated weights for policy 0, policy_version 198874 (0.0026) [2025-01-04 12:37:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14648.4). Total num frames: 814600192. Throughput: 0: 3691.9. Samples: 192818094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:37:38,968][134211] Avg episode reward: [(0, '10.626')] [2025-01-04 12:37:40,782][134294] Updated weights for policy 0, policy_version 198884 (0.0024) [2025-01-04 12:37:43,772][134294] Updated weights for policy 0, policy_version 198894 (0.0026) [2025-01-04 12:37:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 814669824. Throughput: 0: 3487.0. Samples: 192838690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:37:43,968][134211] Avg episode reward: [(0, '9.453')] [2025-01-04 12:37:46,644][134294] Updated weights for policy 0, policy_version 198904 (0.0020) [2025-01-04 12:37:48,565][134294] Updated weights for policy 0, policy_version 198914 (0.0013) [2025-01-04 12:37:48,969][134211] Fps is (10 sec: 15563.4, 60 sec: 14745.4, 300 sec: 14717.8). Total num frames: 814755840. Throughput: 0: 3433.5. Samples: 192849198. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:37:48,969][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 12:37:51,249][134294] Updated weights for policy 0, policy_version 198924 (0.0022) [2025-01-04 12:37:53,970][134211] Fps is (10 sec: 15971.0, 60 sec: 14745.0, 300 sec: 14731.6). Total num frames: 814829568. Throughput: 0: 3559.6. Samples: 192874848. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:37:53,970][134211] Avg episode reward: [(0, '8.694')] [2025-01-04 12:37:54,067][134294] Updated weights for policy 0, policy_version 198934 (0.0023) [2025-01-04 12:37:56,970][134294] Updated weights for policy 0, policy_version 198944 (0.0025) [2025-01-04 12:37:58,968][134211] Fps is (10 sec: 14337.3, 60 sec: 14199.4, 300 sec: 14731.7). Total num frames: 814899200. Throughput: 0: 3593.2. Samples: 192895936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:37:58,968][134211] Avg episode reward: [(0, '9.063')] [2025-01-04 12:37:59,983][134294] Updated weights for policy 0, policy_version 198954 (0.0021) [2025-01-04 12:38:02,804][134294] Updated weights for policy 0, policy_version 198964 (0.0025) [2025-01-04 12:38:03,968][134211] Fps is (10 sec: 13929.5, 60 sec: 14131.2, 300 sec: 14717.8). Total num frames: 814968832. Throughput: 0: 3609.9. Samples: 192906662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:03,968][134211] Avg episode reward: [(0, '9.859')] [2025-01-04 12:38:05,696][134294] Updated weights for policy 0, policy_version 198974 (0.0026) [2025-01-04 12:38:08,573][134294] Updated weights for policy 0, policy_version 198984 (0.0023) [2025-01-04 12:38:08,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14267.7, 300 sec: 14731.7). Total num frames: 815042560. Throughput: 0: 3609.6. Samples: 192927986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:08,969][134211] Avg episode reward: [(0, '10.456')] [2025-01-04 12:38:11,440][134294] Updated weights for policy 0, policy_version 198994 (0.0025) [2025-01-04 12:38:13,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14336.1, 300 sec: 14745.6). Total num frames: 815112192. Throughput: 0: 3618.9. Samples: 192949080. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:13,968][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 12:38:14,312][134294] Updated weights for policy 0, policy_version 199004 (0.0020) [2025-01-04 12:38:16,209][134294] Updated weights for policy 0, policy_version 199014 (0.0013) [2025-01-04 12:38:18,876][134294] Updated weights for policy 0, policy_version 199024 (0.0024) [2025-01-04 12:38:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 815202304. Throughput: 0: 3707.9. Samples: 192963384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:18,968][134211] Avg episode reward: [(0, '10.098')] [2025-01-04 12:38:21,849][134294] Updated weights for policy 0, policy_version 199034 (0.0027) [2025-01-04 12:38:23,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 815271936. Throughput: 0: 3711.7. Samples: 192985122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:23,968][134211] Avg episode reward: [(0, '9.502')] [2025-01-04 12:38:24,815][134294] Updated weights for policy 0, policy_version 199044 (0.0024) [2025-01-04 12:38:27,813][134294] Updated weights for policy 0, policy_version 199054 (0.0028) [2025-01-04 12:38:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14634.5). Total num frames: 815337472. Throughput: 0: 3710.7. Samples: 193005672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:28,968][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 12:38:30,736][134294] Updated weights for policy 0, policy_version 199064 (0.0023) [2025-01-04 12:38:33,537][134294] Updated weights for policy 0, policy_version 199074 (0.0022) [2025-01-04 12:38:33,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14677.0, 300 sec: 14592.8). Total num frames: 815411200. Throughput: 0: 3716.6. Samples: 193016446. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:33,969][134211] Avg episode reward: [(0, '9.305')] [2025-01-04 12:38:36,450][134294] Updated weights for policy 0, policy_version 199084 (0.0027) [2025-01-04 12:38:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.4, 300 sec: 14606.8). Total num frames: 815480832. Throughput: 0: 3622.0. Samples: 193037832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:38,968][134211] Avg episode reward: [(0, '8.783')] [2025-01-04 12:38:39,380][134294] Updated weights for policy 0, policy_version 199094 (0.0024) [2025-01-04 12:38:41,379][134294] Updated weights for policy 0, policy_version 199104 (0.0013) [2025-01-04 12:38:43,888][134294] Updated weights for policy 0, policy_version 199114 (0.0020) [2025-01-04 12:38:43,968][134211] Fps is (10 sec: 15976.3, 60 sec: 15018.7, 300 sec: 14676.2). Total num frames: 815570944. Throughput: 0: 3719.5. Samples: 193063314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:43,969][134211] Avg episode reward: [(0, '10.419')] [2025-01-04 12:38:46,883][134294] Updated weights for policy 0, policy_version 199124 (0.0020) [2025-01-04 12:38:48,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14745.8, 300 sec: 14690.1). Total num frames: 815640576. Throughput: 0: 3712.3. Samples: 193073718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:38:48,968][134211] Avg episode reward: [(0, '8.939')] [2025-01-04 12:38:49,876][134294] Updated weights for policy 0, policy_version 199134 (0.0027) [2025-01-04 12:38:52,846][134294] Updated weights for policy 0, policy_version 199144 (0.0026) [2025-01-04 12:38:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.6, 300 sec: 14676.2). Total num frames: 815706112. Throughput: 0: 3696.3. Samples: 193094318. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:38:53,968][134211] Avg episode reward: [(0, '10.003')] [2025-01-04 12:38:55,475][134294] Updated weights for policy 0, policy_version 199154 (0.0021) [2025-01-04 12:38:57,348][134294] Updated weights for policy 0, policy_version 199164 (0.0013) [2025-01-04 12:38:58,968][134211] Fps is (10 sec: 16793.8, 60 sec: 15155.2, 300 sec: 14787.3). Total num frames: 815808512. Throughput: 0: 3834.5. Samples: 193121634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:38:58,968][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 12:38:59,273][134294] Updated weights for policy 0, policy_version 199174 (0.0014) [2025-01-04 12:39:01,902][134294] Updated weights for policy 0, policy_version 199184 (0.0021) [2025-01-04 12:39:03,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15223.4, 300 sec: 14787.2). Total num frames: 815882240. Throughput: 0: 3812.2. Samples: 193134932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:03,969][134211] Avg episode reward: [(0, '10.067')] [2025-01-04 12:39:05,118][134294] Updated weights for policy 0, policy_version 199194 (0.0027) [2025-01-04 12:39:08,108][134294] Updated weights for policy 0, policy_version 199204 (0.0024) [2025-01-04 12:39:08,970][134211] Fps is (10 sec: 13923.3, 60 sec: 15086.4, 300 sec: 14773.3). Total num frames: 815947776. Throughput: 0: 3772.7. Samples: 193154902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:08,970][134211] Avg episode reward: [(0, '10.010')] [2025-01-04 12:39:11,017][134294] Updated weights for policy 0, policy_version 199214 (0.0024) [2025-01-04 12:39:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 816017408. Throughput: 0: 3773.8. Samples: 193175494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:13,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 12:39:14,043][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000199224_816021504.pth... [2025-01-04 12:39:14,046][134294] Updated weights for policy 0, policy_version 199224 (0.0028) [2025-01-04 12:39:14,109][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000198362_812490752.pth [2025-01-04 12:39:17,076][134294] Updated weights for policy 0, policy_version 199234 (0.0025) [2025-01-04 12:39:18,968][134211] Fps is (10 sec: 13929.2, 60 sec: 14745.6, 300 sec: 14759.5). Total num frames: 816087040. Throughput: 0: 3760.9. Samples: 193185682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:18,968][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 12:39:20,040][134294] Updated weights for policy 0, policy_version 199244 (0.0023) [2025-01-04 12:39:22,873][134294] Updated weights for policy 0, policy_version 199254 (0.0025) [2025-01-04 12:39:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14759.5). Total num frames: 816156672. Throughput: 0: 3757.7. Samples: 193206930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:23,969][134211] Avg episode reward: [(0, '9.284')] [2025-01-04 12:39:25,758][134294] Updated weights for policy 0, policy_version 199264 (0.0024) [2025-01-04 12:39:28,637][134294] Updated weights for policy 0, policy_version 199274 (0.0025) [2025-01-04 12:39:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 816230400. Throughput: 0: 3666.9. Samples: 193228324. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:28,968][134211] Avg episode reward: [(0, '9.360')] [2025-01-04 12:39:31,506][134294] Updated weights for policy 0, policy_version 199284 (0.0025) [2025-01-04 12:39:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14814.1, 300 sec: 14717.8). Total num frames: 816300032. Throughput: 0: 3670.8. Samples: 193238906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:33,969][134211] Avg episode reward: [(0, '8.816')] [2025-01-04 12:39:34,546][134294] Updated weights for policy 0, policy_version 199294 (0.0026) [2025-01-04 12:39:37,360][134294] Updated weights for policy 0, policy_version 199304 (0.0023) [2025-01-04 12:39:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.8, 300 sec: 14579.0). Total num frames: 816369664. Throughput: 0: 3676.2. Samples: 193259746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:38,968][134211] Avg episode reward: [(0, '9.137')] [2025-01-04 12:39:40,251][134294] Updated weights for policy 0, policy_version 199314 (0.0026) [2025-01-04 12:39:43,141][134294] Updated weights for policy 0, policy_version 199324 (0.0027) [2025-01-04 12:39:43,968][134211] Fps is (10 sec: 14336.5, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 816443392. Throughput: 0: 3544.5. Samples: 193281136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:43,968][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 12:39:45,301][134294] Updated weights for policy 0, policy_version 199334 (0.0017) [2025-01-04 12:39:47,532][134294] Updated weights for policy 0, policy_version 199344 (0.0020) [2025-01-04 12:39:48,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14813.9, 300 sec: 14620.6). Total num frames: 816529408. Throughput: 0: 3576.1. Samples: 193295854. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:48,968][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 12:39:50,550][134294] Updated weights for policy 0, policy_version 199354 (0.0024) [2025-01-04 12:39:53,390][134294] Updated weights for policy 0, policy_version 199364 (0.0025) [2025-01-04 12:39:53,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14634.5). Total num frames: 816599040. Throughput: 0: 3618.9. Samples: 193317744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:53,968][134211] Avg episode reward: [(0, '9.628')] [2025-01-04 12:39:56,327][134294] Updated weights for policy 0, policy_version 199374 (0.0022) [2025-01-04 12:39:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14335.9, 300 sec: 14648.4). Total num frames: 816668672. Throughput: 0: 3623.3. Samples: 193338542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:39:58,968][134211] Avg episode reward: [(0, '9.929')] [2025-01-04 12:39:59,255][134294] Updated weights for policy 0, policy_version 199384 (0.0024) [2025-01-04 12:40:02,206][134294] Updated weights for policy 0, policy_version 199394 (0.0020) [2025-01-04 12:40:03,968][134211] Fps is (10 sec: 15155.6, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 816750592. Throughput: 0: 3628.7. Samples: 193348972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:03,968][134211] Avg episode reward: [(0, '9.059')] [2025-01-04 12:40:04,224][134294] Updated weights for policy 0, policy_version 199404 (0.0015) [2025-01-04 12:40:06,837][134294] Updated weights for policy 0, policy_version 199414 (0.0022) [2025-01-04 12:40:08,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14677.8, 300 sec: 14704.0). Total num frames: 816828416. Throughput: 0: 3720.1. Samples: 193374336. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:08,968][134211] Avg episode reward: [(0, '10.871')] [2025-01-04 12:40:09,855][134294] Updated weights for policy 0, policy_version 199424 (0.0028) [2025-01-04 12:40:12,780][134294] Updated weights for policy 0, policy_version 199434 (0.0024) [2025-01-04 12:40:13,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 816893952. Throughput: 0: 3697.7. Samples: 193394722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:13,968][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 12:40:15,722][134294] Updated weights for policy 0, policy_version 199444 (0.0027) [2025-01-04 12:40:18,573][134294] Updated weights for policy 0, policy_version 199454 (0.0022) [2025-01-04 12:40:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14703.9). Total num frames: 816967680. Throughput: 0: 3702.4. Samples: 193405512. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:18,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 12:40:21,448][134294] Updated weights for policy 0, policy_version 199464 (0.0027) [2025-01-04 12:40:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.3, 300 sec: 14703.9). Total num frames: 817037312. Throughput: 0: 3713.3. Samples: 193426846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:23,968][134211] Avg episode reward: [(0, '9.419')] [2025-01-04 12:40:24,477][134294] Updated weights for policy 0, policy_version 199474 (0.0024) [2025-01-04 12:40:27,350][134294] Updated weights for policy 0, policy_version 199484 (0.0025) [2025-01-04 12:40:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14703.9). Total num frames: 817106944. Throughput: 0: 3698.0. Samples: 193447546. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:28,968][134211] Avg episode reward: [(0, '8.631')] [2025-01-04 12:40:30,138][134294] Updated weights for policy 0, policy_version 199494 (0.0024) [2025-01-04 12:40:31,987][134294] Updated weights for policy 0, policy_version 199504 (0.0014) [2025-01-04 12:40:33,838][134294] Updated weights for policy 0, policy_version 199514 (0.0014) [2025-01-04 12:40:33,967][134211] Fps is (10 sec: 17203.6, 60 sec: 15155.3, 300 sec: 14801.2). Total num frames: 817209344. Throughput: 0: 3669.9. Samples: 193461000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:33,968][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 12:40:35,716][134294] Updated weights for policy 0, policy_version 199524 (0.0013) [2025-01-04 12:40:37,577][134294] Updated weights for policy 0, policy_version 199534 (0.0013) [2025-01-04 12:40:38,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15701.4, 300 sec: 14912.2). Total num frames: 817311744. Throughput: 0: 3915.7. Samples: 193493950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:38,968][134211] Avg episode reward: [(0, '9.885')] [2025-01-04 12:40:40,293][134294] Updated weights for policy 0, policy_version 199544 (0.0025) [2025-01-04 12:40:43,229][134294] Updated weights for policy 0, policy_version 199554 (0.0030) [2025-01-04 12:40:43,968][134211] Fps is (10 sec: 16792.9, 60 sec: 15564.7, 300 sec: 14856.7). Total num frames: 817377280. Throughput: 0: 3947.6. Samples: 193516184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:43,969][134211] Avg episode reward: [(0, '9.208')] [2025-01-04 12:40:46,408][134294] Updated weights for policy 0, policy_version 199564 (0.0026) [2025-01-04 12:40:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.8, 300 sec: 14717.8). Total num frames: 817446912. Throughput: 0: 3934.8. Samples: 193526040. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:48,968][134211] Avg episode reward: [(0, '10.074')] [2025-01-04 12:40:49,504][134294] Updated weights for policy 0, policy_version 199574 (0.0024) [2025-01-04 12:40:52,447][134294] Updated weights for policy 0, policy_version 199584 (0.0024) [2025-01-04 12:40:53,968][134211] Fps is (10 sec: 13926.8, 60 sec: 15291.8, 300 sec: 14634.5). Total num frames: 817516544. Throughput: 0: 3820.4. Samples: 193546254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:53,968][134211] Avg episode reward: [(0, '8.904')] [2025-01-04 12:40:55,480][134294] Updated weights for policy 0, policy_version 199594 (0.0026) [2025-01-04 12:40:58,352][134294] Updated weights for policy 0, policy_version 199604 (0.0028) [2025-01-04 12:40:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15291.8, 300 sec: 14662.3). Total num frames: 817586176. Throughput: 0: 3835.5. Samples: 193567318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:40:58,969][134211] Avg episode reward: [(0, '9.026')] [2025-01-04 12:41:01,207][134294] Updated weights for policy 0, policy_version 199614 (0.0025) [2025-01-04 12:41:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15086.9, 300 sec: 14662.3). Total num frames: 817655808. Throughput: 0: 3828.4. Samples: 193577790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:03,968][134211] Avg episode reward: [(0, '8.917')] [2025-01-04 12:41:04,406][134294] Updated weights for policy 0, policy_version 199624 (0.0024) [2025-01-04 12:41:07,179][134294] Updated weights for policy 0, policy_version 199634 (0.0025) [2025-01-04 12:41:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14676.2). Total num frames: 817725440. Throughput: 0: 3811.1. Samples: 193598346. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:08,968][134211] Avg episode reward: [(0, '9.320')] [2025-01-04 12:41:10,137][134294] Updated weights for policy 0, policy_version 199644 (0.0025) [2025-01-04 12:41:12,984][134294] Updated weights for policy 0, policy_version 199654 (0.0025) [2025-01-04 12:41:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 14676.2). Total num frames: 817795072. Throughput: 0: 3822.1. Samples: 193619542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:13,968][134211] Avg episode reward: [(0, '8.982')] [2025-01-04 12:41:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000199657_817795072.pth... [2025-01-04 12:41:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000198792_814252032.pth [2025-01-04 12:41:15,962][134294] Updated weights for policy 0, policy_version 199664 (0.0022) [2025-01-04 12:41:18,794][134294] Updated weights for policy 0, policy_version 199674 (0.0025) [2025-01-04 12:41:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14676.2). Total num frames: 817864704. Throughput: 0: 3753.9. Samples: 193629924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:18,968][134211] Avg episode reward: [(0, '9.630')] [2025-01-04 12:41:21,650][134294] Updated weights for policy 0, policy_version 199684 (0.0025) [2025-01-04 12:41:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 817934336. Throughput: 0: 3497.9. Samples: 193651356. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:23,968][134211] Avg episode reward: [(0, '8.782')] [2025-01-04 12:41:24,692][134294] Updated weights for policy 0, policy_version 199694 (0.0026) [2025-01-04 12:41:27,599][134294] Updated weights for policy 0, policy_version 199704 (0.0025) [2025-01-04 12:41:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14662.3). Total num frames: 818003968. Throughput: 0: 3462.2. Samples: 193671984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:28,968][134211] Avg episode reward: [(0, '9.246')] [2025-01-04 12:41:30,515][134294] Updated weights for policy 0, policy_version 199714 (0.0026) [2025-01-04 12:41:33,352][134294] Updated weights for policy 0, policy_version 199724 (0.0024) [2025-01-04 12:41:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14676.2). Total num frames: 818077696. Throughput: 0: 3486.7. Samples: 193682942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:33,968][134211] Avg episode reward: [(0, '8.824')] [2025-01-04 12:41:36,165][134294] Updated weights for policy 0, policy_version 199734 (0.0026) [2025-01-04 12:41:38,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13926.4, 300 sec: 14704.0). Total num frames: 818147328. Throughput: 0: 3517.8. Samples: 193704554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:38,968][134211] Avg episode reward: [(0, '8.607')] [2025-01-04 12:41:38,993][134294] Updated weights for policy 0, policy_version 199744 (0.0024) [2025-01-04 12:41:41,856][134294] Updated weights for policy 0, policy_version 199754 (0.0022) [2025-01-04 12:41:43,868][134294] Updated weights for policy 0, policy_version 199764 (0.0014) [2025-01-04 12:41:43,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14267.8, 300 sec: 14787.3). Total num frames: 818233344. Throughput: 0: 3563.2. Samples: 193727664. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:43,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 12:41:46,573][134294] Updated weights for policy 0, policy_version 199774 (0.0023) [2025-01-04 12:41:48,968][134211] Fps is (10 sec: 15973.8, 60 sec: 14335.9, 300 sec: 14787.2). Total num frames: 818307072. Throughput: 0: 3614.4. Samples: 193740438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:48,969][134211] Avg episode reward: [(0, '9.885')] [2025-01-04 12:41:49,434][134294] Updated weights for policy 0, policy_version 199784 (0.0023) [2025-01-04 12:41:52,444][134294] Updated weights for policy 0, policy_version 199794 (0.0026) [2025-01-04 12:41:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14267.7, 300 sec: 14662.3). Total num frames: 818372608. Throughput: 0: 3623.9. Samples: 193761424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:53,969][134211] Avg episode reward: [(0, '9.356')] [2025-01-04 12:41:55,566][134294] Updated weights for policy 0, policy_version 199804 (0.0027) [2025-01-04 12:41:57,850][134294] Updated weights for policy 0, policy_version 199814 (0.0017) [2025-01-04 12:41:58,968][134211] Fps is (10 sec: 15156.0, 60 sec: 14540.8, 300 sec: 14704.0). Total num frames: 818458624. Throughput: 0: 3669.3. Samples: 193784660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:41:58,968][134211] Avg episode reward: [(0, '9.779')] [2025-01-04 12:41:59,737][134294] Updated weights for policy 0, policy_version 199824 (0.0014) [2025-01-04 12:42:01,647][134294] Updated weights for policy 0, policy_version 199834 (0.0012) [2025-01-04 12:42:03,520][134294] Updated weights for policy 0, policy_version 199844 (0.0013) [2025-01-04 12:42:03,968][134211] Fps is (10 sec: 19661.3, 60 sec: 15223.5, 300 sec: 14856.7). Total num frames: 818569216. Throughput: 0: 3800.8. Samples: 193800960. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:03,968][134211] Avg episode reward: [(0, '10.228')] [2025-01-04 12:42:05,367][134294] Updated weights for policy 0, policy_version 199854 (0.0013) [2025-01-04 12:42:07,777][134294] Updated weights for policy 0, policy_version 199864 (0.0019) [2025-01-04 12:42:08,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15496.5, 300 sec: 14926.1). Total num frames: 818655232. Throughput: 0: 4013.3. Samples: 193831956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:08,969][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 12:42:11,036][134294] Updated weights for policy 0, policy_version 199874 (0.0028) [2025-01-04 12:42:13,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15428.3, 300 sec: 14912.2). Total num frames: 818720768. Throughput: 0: 3980.6. Samples: 193851112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:13,968][134211] Avg episode reward: [(0, '9.418')] [2025-01-04 12:42:14,217][134294] Updated weights for policy 0, policy_version 199884 (0.0031) [2025-01-04 12:42:17,297][134294] Updated weights for policy 0, policy_version 199894 (0.0025) [2025-01-04 12:42:18,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15359.8, 300 sec: 14898.3). Total num frames: 818786304. Throughput: 0: 3955.7. Samples: 193860950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:18,969][134211] Avg episode reward: [(0, '9.094')] [2025-01-04 12:42:20,143][134294] Updated weights for policy 0, policy_version 199904 (0.0027) [2025-01-04 12:42:23,123][134294] Updated weights for policy 0, policy_version 199914 (0.0027) [2025-01-04 12:42:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 818855936. Throughput: 0: 3941.2. Samples: 193881908. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:23,968][134211] Avg episode reward: [(0, '9.811')] [2025-01-04 12:42:25,998][134294] Updated weights for policy 0, policy_version 199924 (0.0023) [2025-01-04 12:42:28,968][134211] Fps is (10 sec: 13927.4, 60 sec: 15360.0, 300 sec: 14898.3). Total num frames: 818925568. Throughput: 0: 3889.6. Samples: 193902694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:28,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 12:42:28,973][134294] Updated weights for policy 0, policy_version 199934 (0.0024) [2025-01-04 12:42:31,962][134294] Updated weights for policy 0, policy_version 199944 (0.0027) [2025-01-04 12:42:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15291.7, 300 sec: 14898.3). Total num frames: 818995200. Throughput: 0: 3836.6. Samples: 193913082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:33,968][134211] Avg episode reward: [(0, '9.342')] [2025-01-04 12:42:34,953][134294] Updated weights for policy 0, policy_version 199954 (0.0024) [2025-01-04 12:42:37,713][134294] Updated weights for policy 0, policy_version 199964 (0.0023) [2025-01-04 12:42:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15360.0, 300 sec: 14912.2). Total num frames: 819068928. Throughput: 0: 3846.1. Samples: 193934498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:38,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 12:42:40,571][134294] Updated weights for policy 0, policy_version 199974 (0.0023) [2025-01-04 12:42:43,410][134294] Updated weights for policy 0, policy_version 199984 (0.0022) [2025-01-04 12:42:43,971][134211] Fps is (10 sec: 14331.6, 60 sec: 15086.1, 300 sec: 14856.6). Total num frames: 819138560. Throughput: 0: 3807.6. Samples: 193956016. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:43,971][134211] Avg episode reward: [(0, '9.188')] [2025-01-04 12:42:46,268][134294] Updated weights for policy 0, policy_version 199994 (0.0023) [2025-01-04 12:42:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15087.0, 300 sec: 14856.8). Total num frames: 819212288. Throughput: 0: 3683.9. Samples: 193966734. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:48,968][134211] Avg episode reward: [(0, '9.922')] [2025-01-04 12:42:49,289][134294] Updated weights for policy 0, policy_version 200004 (0.0027) [2025-01-04 12:42:52,186][134294] Updated weights for policy 0, policy_version 200014 (0.0027) [2025-01-04 12:42:53,968][134211] Fps is (10 sec: 14340.5, 60 sec: 15155.2, 300 sec: 14856.7). Total num frames: 819281920. Throughput: 0: 3458.9. Samples: 193987608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:53,968][134211] Avg episode reward: [(0, '10.477')] [2025-01-04 12:42:55,053][134294] Updated weights for policy 0, policy_version 200024 (0.0024) [2025-01-04 12:42:57,938][134294] Updated weights for policy 0, policy_version 200034 (0.0026) [2025-01-04 12:42:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 819351552. Throughput: 0: 3506.1. Samples: 194008886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:42:58,968][134211] Avg episode reward: [(0, '9.709')] [2025-01-04 12:43:00,774][134294] Updated weights for policy 0, policy_version 200044 (0.0025) [2025-01-04 12:43:03,727][134294] Updated weights for policy 0, policy_version 200054 (0.0022) [2025-01-04 12:43:03,970][134211] Fps is (10 sec: 14333.2, 60 sec: 14267.2, 300 sec: 14856.6). Total num frames: 819425280. Throughput: 0: 3528.9. Samples: 194019756. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:43:03,970][134211] Avg episode reward: [(0, '9.853')] [2025-01-04 12:43:06,615][134294] Updated weights for policy 0, policy_version 200064 (0.0026) [2025-01-04 12:43:08,970][134211] Fps is (10 sec: 13923.5, 60 sec: 13926.0, 300 sec: 14842.7). Total num frames: 819490816. Throughput: 0: 3523.5. Samples: 194040472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:43:08,970][134211] Avg episode reward: [(0, '9.206')] [2025-01-04 12:43:09,630][134294] Updated weights for policy 0, policy_version 200074 (0.0026) [2025-01-04 12:43:12,619][134294] Updated weights for policy 0, policy_version 200084 (0.0025) [2025-01-04 12:43:13,968][134211] Fps is (10 sec: 13519.5, 60 sec: 13994.7, 300 sec: 14773.4). Total num frames: 819560448. Throughput: 0: 3519.5. Samples: 194061074. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:43:13,968][134211] Avg episode reward: [(0, '10.388')] [2025-01-04 12:43:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200088_819560448.pth... [2025-01-04 12:43:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000199224_816021504.pth [2025-01-04 12:43:15,592][134294] Updated weights for policy 0, policy_version 200094 (0.0023) [2025-01-04 12:43:17,937][134294] Updated weights for policy 0, policy_version 200104 (0.0018) [2025-01-04 12:43:18,968][134211] Fps is (10 sec: 15567.9, 60 sec: 14336.1, 300 sec: 14828.9). Total num frames: 819646464. Throughput: 0: 3525.8. Samples: 194071742. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:18,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 12:43:19,886][134294] Updated weights for policy 0, policy_version 200114 (0.0012) [2025-01-04 12:43:21,859][134294] Updated weights for policy 0, policy_version 200124 (0.0014) [2025-01-04 12:43:23,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14677.3, 300 sec: 14912.2). Total num frames: 819736576. Throughput: 0: 3709.9. Samples: 194101444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:23,968][134211] Avg episode reward: [(0, '9.453')] [2025-01-04 12:43:24,689][134294] Updated weights for policy 0, policy_version 200134 (0.0025) [2025-01-04 12:43:27,758][134294] Updated weights for policy 0, policy_version 200144 (0.0029) [2025-01-04 12:43:28,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14609.0, 300 sec: 14884.5). Total num frames: 819802112. Throughput: 0: 3686.1. Samples: 194121880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:28,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 12:43:30,727][134294] Updated weights for policy 0, policy_version 200154 (0.0024) [2025-01-04 12:43:33,599][134294] Updated weights for policy 0, policy_version 200164 (0.0026) [2025-01-04 12:43:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.3, 300 sec: 14898.3). Total num frames: 819875840. Throughput: 0: 3683.3. Samples: 194132482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:33,968][134211] Avg episode reward: [(0, '8.951')] [2025-01-04 12:43:36,586][134294] Updated weights for policy 0, policy_version 200174 (0.0026) [2025-01-04 12:43:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14609.1, 300 sec: 14828.9). Total num frames: 819945472. Throughput: 0: 3687.7. Samples: 194153554. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:38,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 12:43:39,585][134294] Updated weights for policy 0, policy_version 200184 (0.0027) [2025-01-04 12:43:42,457][134294] Updated weights for policy 0, policy_version 200194 (0.0023) [2025-01-04 12:43:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14541.5, 300 sec: 14815.0). Total num frames: 820011008. Throughput: 0: 3672.3. Samples: 194174138. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:43,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 12:43:45,369][134294] Updated weights for policy 0, policy_version 200204 (0.0025) [2025-01-04 12:43:48,266][134294] Updated weights for policy 0, policy_version 200214 (0.0024) [2025-01-04 12:43:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14540.8, 300 sec: 14842.8). Total num frames: 820084736. Throughput: 0: 3671.5. Samples: 194184964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:48,968][134211] Avg episode reward: [(0, '10.117')] [2025-01-04 12:43:51,092][134294] Updated weights for policy 0, policy_version 200224 (0.0023) [2025-01-04 12:43:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14540.8, 300 sec: 14731.7). Total num frames: 820154368. Throughput: 0: 3685.2. Samples: 194206300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:53,968][134211] Avg episode reward: [(0, '10.103')] [2025-01-04 12:43:54,113][134294] Updated weights for policy 0, policy_version 200234 (0.0026) [2025-01-04 12:43:56,233][134294] Updated weights for policy 0, policy_version 200244 (0.0016) [2025-01-04 12:43:58,107][134294] Updated weights for policy 0, policy_version 200254 (0.0012) [2025-01-04 12:43:58,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 820256768. Throughput: 0: 3839.6. Samples: 194233856. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:43:58,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 12:43:59,979][134294] Updated weights for policy 0, policy_version 200264 (0.0012) [2025-01-04 12:44:01,850][134294] Updated weights for policy 0, policy_version 200274 (0.0013) [2025-01-04 12:44:03,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15565.3, 300 sec: 14954.0). Total num frames: 820359168. Throughput: 0: 3969.4. Samples: 194250366. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:44:03,968][134211] Avg episode reward: [(0, '10.285')] [2025-01-04 12:44:04,045][134294] Updated weights for policy 0, policy_version 200284 (0.0018) [2025-01-04 12:44:07,077][134294] Updated weights for policy 0, policy_version 200294 (0.0026) [2025-01-04 12:44:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15565.3, 300 sec: 14940.0). Total num frames: 820424704. Throughput: 0: 3846.5. Samples: 194274538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:44:08,968][134211] Avg episode reward: [(0, '10.292')] [2025-01-04 12:44:10,326][134294] Updated weights for policy 0, policy_version 200304 (0.0024) [2025-01-04 12:44:13,158][134294] Updated weights for policy 0, policy_version 200314 (0.0025) [2025-01-04 12:44:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15564.8, 300 sec: 14940.0). Total num frames: 820494336. Throughput: 0: 3840.6. Samples: 194294706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:44:13,968][134211] Avg episode reward: [(0, '10.460')] [2025-01-04 12:44:16,112][134294] Updated weights for policy 0, policy_version 200324 (0.0025) [2025-01-04 12:44:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15291.8, 300 sec: 14940.0). Total num frames: 820563968. Throughput: 0: 3834.7. Samples: 194305044. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:44:18,968][134211] Avg episode reward: [(0, '9.282')] [2025-01-04 12:44:19,232][134294] Updated weights for policy 0, policy_version 200334 (0.0024) [2025-01-04 12:44:22,170][134294] Updated weights for policy 0, policy_version 200344 (0.0024) [2025-01-04 12:44:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.2, 300 sec: 14912.2). Total num frames: 820629504. Throughput: 0: 3819.2. Samples: 194325420. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:44:23,968][134211] Avg episode reward: [(0, '10.414')] [2025-01-04 12:44:25,161][134294] Updated weights for policy 0, policy_version 200354 (0.0022) [2025-01-04 12:44:27,986][134294] Updated weights for policy 0, policy_version 200364 (0.0023) [2025-01-04 12:44:28,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 820703232. Throughput: 0: 3833.8. Samples: 194346660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:28,969][134211] Avg episode reward: [(0, '9.915')] [2025-01-04 12:44:30,836][134294] Updated weights for policy 0, policy_version 200374 (0.0024) [2025-01-04 12:44:33,746][134294] Updated weights for policy 0, policy_version 200384 (0.0027) [2025-01-04 12:44:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14950.4, 300 sec: 14926.1). Total num frames: 820772864. Throughput: 0: 3830.4. Samples: 194357334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:33,969][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 12:44:36,606][134294] Updated weights for policy 0, policy_version 200394 (0.0028) [2025-01-04 12:44:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 820846592. Throughput: 0: 3830.2. Samples: 194378658. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:38,969][134211] Avg episode reward: [(0, '8.831')] [2025-01-04 12:44:39,612][134294] Updated weights for policy 0, policy_version 200404 (0.0024) [2025-01-04 12:44:42,537][134294] Updated weights for policy 0, policy_version 200414 (0.0027) [2025-01-04 12:44:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 820916224. Throughput: 0: 3678.6. Samples: 194399394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:43,968][134211] Avg episode reward: [(0, '10.229')] [2025-01-04 12:44:45,390][134294] Updated weights for policy 0, policy_version 200424 (0.0022) [2025-01-04 12:44:48,245][134294] Updated weights for policy 0, policy_version 200434 (0.0026) [2025-01-04 12:44:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14870.6). Total num frames: 820985856. Throughput: 0: 3552.7. Samples: 194410238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:48,968][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 12:44:51,106][134294] Updated weights for policy 0, policy_version 200444 (0.0024) [2025-01-04 12:44:53,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15018.6, 300 sec: 14870.6). Total num frames: 821055488. Throughput: 0: 3487.5. Samples: 194431478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:53,969][134211] Avg episode reward: [(0, '9.080')] [2025-01-04 12:44:54,091][134294] Updated weights for policy 0, policy_version 200454 (0.0025) [2025-01-04 12:44:57,000][134294] Updated weights for policy 0, policy_version 200464 (0.0029) [2025-01-04 12:44:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 821125120. Throughput: 0: 3503.4. Samples: 194452358. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:44:58,968][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 12:44:59,899][134294] Updated weights for policy 0, policy_version 200474 (0.0025) [2025-01-04 12:45:02,776][134294] Updated weights for policy 0, policy_version 200484 (0.0025) [2025-01-04 12:45:03,968][134211] Fps is (10 sec: 14336.6, 60 sec: 13994.7, 300 sec: 14815.0). Total num frames: 821198848. Throughput: 0: 3516.8. Samples: 194463302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:03,968][134211] Avg episode reward: [(0, '9.227')] [2025-01-04 12:45:05,642][134294] Updated weights for policy 0, policy_version 200494 (0.0024) [2025-01-04 12:45:07,690][134294] Updated weights for policy 0, policy_version 200504 (0.0014) [2025-01-04 12:45:08,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14404.3, 300 sec: 14898.3). Total num frames: 821288960. Throughput: 0: 3580.8. Samples: 194486556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:08,968][134211] Avg episode reward: [(0, '9.457')] [2025-01-04 12:45:09,586][134294] Updated weights for policy 0, policy_version 200514 (0.0012) [2025-01-04 12:45:11,414][134294] Updated weights for policy 0, policy_version 200524 (0.0014) [2025-01-04 12:45:13,324][134294] Updated weights for policy 0, policy_version 200534 (0.0013) [2025-01-04 12:45:13,968][134211] Fps is (10 sec: 20070.5, 60 sec: 15087.0, 300 sec: 15023.3). Total num frames: 821399552. Throughput: 0: 3834.8. Samples: 194519224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:13,968][134211] Avg episode reward: [(0, '8.423')] [2025-01-04 12:45:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200537_821399552.pth... 
[2025-01-04 12:45:14,029][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000199657_817795072.pth [2025-01-04 12:45:15,717][134294] Updated weights for policy 0, policy_version 200544 (0.0021) [2025-01-04 12:45:18,810][134294] Updated weights for policy 0, policy_version 200554 (0.0029) [2025-01-04 12:45:18,968][134211] Fps is (10 sec: 18022.0, 60 sec: 15086.9, 300 sec: 15023.3). Total num frames: 821469184. Throughput: 0: 3875.0. Samples: 194531710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:18,968][134211] Avg episode reward: [(0, '8.294')] [2025-01-04 12:45:21,711][134294] Updated weights for policy 0, policy_version 200564 (0.0023) [2025-01-04 12:45:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15155.2, 300 sec: 15023.3). Total num frames: 821538816. Throughput: 0: 3854.8. Samples: 194552122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:23,969][134211] Avg episode reward: [(0, '8.828')] [2025-01-04 12:45:24,838][134294] Updated weights for policy 0, policy_version 200574 (0.0030) [2025-01-04 12:45:27,862][134294] Updated weights for policy 0, policy_version 200584 (0.0024) [2025-01-04 12:45:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15018.7, 300 sec: 14898.3). Total num frames: 821604352. Throughput: 0: 3843.8. Samples: 194572366. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:45:28,968][134211] Avg episode reward: [(0, '8.523')] [2025-01-04 12:45:30,813][134294] Updated weights for policy 0, policy_version 200594 (0.0021) [2025-01-04 12:45:33,724][134294] Updated weights for policy 0, policy_version 200604 (0.0023) [2025-01-04 12:45:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14787.2). Total num frames: 821673984. Throughput: 0: 3831.1. Samples: 194582636. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:33,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 12:45:36,603][134294] Updated weights for policy 0, policy_version 200614 (0.0026) [2025-01-04 12:45:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 821743616. Throughput: 0: 3829.2. Samples: 194603790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:38,968][134211] Avg episode reward: [(0, '9.385')] [2025-01-04 12:45:39,634][134294] Updated weights for policy 0, policy_version 200624 (0.0022) [2025-01-04 12:45:42,616][134294] Updated weights for policy 0, policy_version 200634 (0.0023) [2025-01-04 12:45:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14801.1). Total num frames: 821813248. Throughput: 0: 3822.5. Samples: 194624370. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:43,968][134211] Avg episode reward: [(0, '9.346')] [2025-01-04 12:45:45,449][134294] Updated weights for policy 0, policy_version 200644 (0.0026) [2025-01-04 12:45:48,348][134294] Updated weights for policy 0, policy_version 200654 (0.0022) [2025-01-04 12:45:48,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 821886976. Throughput: 0: 3818.8. Samples: 194635150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:48,969][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 12:45:51,194][134294] Updated weights for policy 0, policy_version 200664 (0.0026) [2025-01-04 12:45:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15018.7, 300 sec: 14815.0). Total num frames: 821956608. Throughput: 0: 3777.0. Samples: 194656524. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:53,968][134211] Avg episode reward: [(0, '11.275')] [2025-01-04 12:45:54,170][134294] Updated weights for policy 0, policy_version 200674 (0.0025) [2025-01-04 12:45:57,065][134294] Updated weights for policy 0, policy_version 200684 (0.0023) [2025-01-04 12:45:58,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 822026240. Throughput: 0: 3514.8. Samples: 194677390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:45:58,969][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 12:45:59,986][134294] Updated weights for policy 0, policy_version 200694 (0.0021) [2025-01-04 12:46:02,835][134294] Updated weights for policy 0, policy_version 200704 (0.0022) [2025-01-04 12:46:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 822095872. Throughput: 0: 3480.1. Samples: 194688316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:03,968][134211] Avg episode reward: [(0, '9.831')] [2025-01-04 12:46:05,767][134294] Updated weights for policy 0, policy_version 200714 (0.0023) [2025-01-04 12:46:08,559][134294] Updated weights for policy 0, policy_version 200724 (0.0023) [2025-01-04 12:46:08,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 822169600. Throughput: 0: 3499.7. Samples: 194709610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:08,968][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 12:46:11,430][134294] Updated weights for policy 0, policy_version 200734 (0.0025) [2025-01-04 12:46:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.6, 300 sec: 14828.9). Total num frames: 822239232. Throughput: 0: 3518.9. Samples: 194730718. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:13,968][134211] Avg episode reward: [(0, '8.719')] [2025-01-04 12:46:14,438][134294] Updated weights for policy 0, policy_version 200744 (0.0023) [2025-01-04 12:46:17,178][134294] Updated weights for policy 0, policy_version 200754 (0.0023) [2025-01-04 12:46:18,967][134211] Fps is (10 sec: 15565.0, 60 sec: 14267.8, 300 sec: 14884.5). Total num frames: 822325248. Throughput: 0: 3518.7. Samples: 194740978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:18,968][134211] Avg episode reward: [(0, '8.980')] [2025-01-04 12:46:19,041][134294] Updated weights for policy 0, policy_version 200764 (0.0012) [2025-01-04 12:46:20,953][134294] Updated weights for policy 0, policy_version 200774 (0.0012) [2025-01-04 12:46:22,799][134294] Updated weights for policy 0, policy_version 200784 (0.0013) [2025-01-04 12:46:23,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14950.4, 300 sec: 15023.3). Total num frames: 822435840. Throughput: 0: 3752.4. Samples: 194772646. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:23,968][134211] Avg episode reward: [(0, '10.127')] [2025-01-04 12:46:24,636][134294] Updated weights for policy 0, policy_version 200794 (0.0012) [2025-01-04 12:46:26,547][134294] Updated weights for policy 0, policy_version 200804 (0.0013) [2025-01-04 12:46:28,443][134294] Updated weights for policy 0, policy_version 200814 (0.0015) [2025-01-04 12:46:28,968][134211] Fps is (10 sec: 21708.4, 60 sec: 15633.1, 300 sec: 15134.4). Total num frames: 822542336. Throughput: 0: 4024.5. Samples: 194805470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:28,968][134211] Avg episode reward: [(0, '9.098')] [2025-01-04 12:46:31,341][134294] Updated weights for policy 0, policy_version 200824 (0.0024) [2025-01-04 12:46:33,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15564.8, 300 sec: 15120.5). Total num frames: 822607872. Throughput: 0: 4033.5. Samples: 194816656. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:46:33,968][134211] Avg episode reward: [(0, '9.701')] [2025-01-04 12:46:34,553][134294] Updated weights for policy 0, policy_version 200834 (0.0029) [2025-01-04 12:46:37,550][134294] Updated weights for policy 0, policy_version 200844 (0.0024) [2025-01-04 12:46:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15496.5, 300 sec: 15051.1). Total num frames: 822673408. Throughput: 0: 3989.5. Samples: 194836052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:46:38,968][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 12:46:40,553][134294] Updated weights for policy 0, policy_version 200854 (0.0027) [2025-01-04 12:46:43,507][134294] Updated weights for policy 0, policy_version 200864 (0.0028) [2025-01-04 12:46:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15496.6, 300 sec: 15037.2). Total num frames: 822743040. Throughput: 0: 3986.9. Samples: 194856800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:46:43,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 12:46:46,455][134294] Updated weights for policy 0, policy_version 200874 (0.0026) [2025-01-04 12:46:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15428.3, 300 sec: 15051.1). Total num frames: 822812672. Throughput: 0: 3974.3. Samples: 194867162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:46:48,968][134211] Avg episode reward: [(0, '10.177')] [2025-01-04 12:46:49,516][134294] Updated weights for policy 0, policy_version 200884 (0.0023) [2025-01-04 12:46:52,537][134294] Updated weights for policy 0, policy_version 200894 (0.0023) [2025-01-04 12:46:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15360.0, 300 sec: 14981.6). Total num frames: 822878208. Throughput: 0: 3954.4. Samples: 194887560. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:46:53,968][134211] Avg episode reward: [(0, '9.273')] [2025-01-04 12:46:55,415][134294] Updated weights for policy 0, policy_version 200904 (0.0025) [2025-01-04 12:46:58,288][134294] Updated weights for policy 0, policy_version 200914 (0.0023) [2025-01-04 12:46:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15428.4, 300 sec: 14856.7). Total num frames: 822951936. Throughput: 0: 3958.0. Samples: 194908828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:46:58,968][134211] Avg episode reward: [(0, '9.487')] [2025-01-04 12:47:01,154][134294] Updated weights for policy 0, policy_version 200924 (0.0022) [2025-01-04 12:47:03,970][134211] Fps is (10 sec: 14333.1, 60 sec: 15427.7, 300 sec: 14801.0). Total num frames: 823021568. Throughput: 0: 3965.0. Samples: 194919410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:03,970][134211] Avg episode reward: [(0, '9.221')] [2025-01-04 12:47:04,223][134294] Updated weights for policy 0, policy_version 200934 (0.0024) [2025-01-04 12:47:07,141][134294] Updated weights for policy 0, policy_version 200944 (0.0026) [2025-01-04 12:47:08,969][134211] Fps is (10 sec: 13515.6, 60 sec: 15291.5, 300 sec: 14801.1). Total num frames: 823087104. Throughput: 0: 3720.5. Samples: 194940070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:08,969][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 12:47:10,191][134294] Updated weights for policy 0, policy_version 200954 (0.0025) [2025-01-04 12:47:13,052][134294] Updated weights for policy 0, policy_version 200964 (0.0021) [2025-01-04 12:47:13,968][134211] Fps is (10 sec: 13519.6, 60 sec: 15291.7, 300 sec: 14815.1). Total num frames: 823156736. Throughput: 0: 3446.9. Samples: 194960582. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:13,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 12:47:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200967_823160832.pth... [2025-01-04 12:47:14,042][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200088_819560448.pth [2025-01-04 12:47:16,021][134294] Updated weights for policy 0, policy_version 200974 (0.0025) [2025-01-04 12:47:18,893][134294] Updated weights for policy 0, policy_version 200984 (0.0024) [2025-01-04 12:47:18,968][134211] Fps is (10 sec: 14337.2, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 823230464. Throughput: 0: 3432.6. Samples: 194971124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:18,968][134211] Avg episode reward: [(0, '8.724')] [2025-01-04 12:47:21,804][134294] Updated weights for policy 0, policy_version 200994 (0.0022) [2025-01-04 12:47:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14404.2, 300 sec: 14828.9). Total num frames: 823300096. Throughput: 0: 3477.8. Samples: 194992552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:23,968][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 12:47:24,732][134294] Updated weights for policy 0, policy_version 201004 (0.0026) [2025-01-04 12:47:27,686][134294] Updated weights for policy 0, policy_version 201014 (0.0024) [2025-01-04 12:47:28,968][134211] Fps is (10 sec: 13925.9, 60 sec: 13789.8, 300 sec: 14828.9). Total num frames: 823369728. Throughput: 0: 3473.1. Samples: 195013092. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:28,969][134211] Avg episode reward: [(0, '9.282')] [2025-01-04 12:47:30,533][134294] Updated weights for policy 0, policy_version 201024 (0.0022) [2025-01-04 12:47:33,382][134294] Updated weights for policy 0, policy_version 201034 (0.0024) [2025-01-04 12:47:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13926.4, 300 sec: 14828.9). Total num frames: 823443456. Throughput: 0: 3486.7. Samples: 195024062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:33,968][134211] Avg episode reward: [(0, '9.507')] [2025-01-04 12:47:36,230][134294] Updated weights for policy 0, policy_version 201044 (0.0021) [2025-01-04 12:47:38,969][134211] Fps is (10 sec: 14335.2, 60 sec: 13994.5, 300 sec: 14829.0). Total num frames: 823513088. Throughput: 0: 3507.2. Samples: 195045386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:38,969][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 12:47:39,253][134294] Updated weights for policy 0, policy_version 201054 (0.0024) [2025-01-04 12:47:42,166][134294] Updated weights for policy 0, policy_version 201064 (0.0025) [2025-01-04 12:47:43,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14199.5, 300 sec: 14856.7). Total num frames: 823595008. Throughput: 0: 3534.9. Samples: 195067896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:47:43,968][134211] Avg episode reward: [(0, '8.103')] [2025-01-04 12:47:44,115][134294] Updated weights for policy 0, policy_version 201074 (0.0012) [2025-01-04 12:47:45,996][134294] Updated weights for policy 0, policy_version 201084 (0.0013) [2025-01-04 12:47:47,891][134294] Updated weights for policy 0, policy_version 201094 (0.0012) [2025-01-04 12:47:48,968][134211] Fps is (10 sec: 18843.6, 60 sec: 14813.9, 300 sec: 14981.7). Total num frames: 823701504. Throughput: 0: 3663.0. Samples: 195084236. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:47:48,968][134211] Avg episode reward: [(0, '8.747')] [2025-01-04 12:47:49,760][134294] Updated weights for policy 0, policy_version 201104 (0.0012) [2025-01-04 12:47:51,620][134294] Updated weights for policy 0, policy_version 201114 (0.0011) [2025-01-04 12:47:53,968][134211] Fps is (10 sec: 20479.5, 60 sec: 15360.0, 300 sec: 15078.8). Total num frames: 823799808. Throughput: 0: 3930.6. Samples: 195116946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:47:53,968][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 12:47:54,057][134294] Updated weights for policy 0, policy_version 201124 (0.0023) [2025-01-04 12:47:57,300][134294] Updated weights for policy 0, policy_version 201134 (0.0025) [2025-01-04 12:47:58,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15155.2, 300 sec: 15037.3). Total num frames: 823861248. Throughput: 0: 3923.5. Samples: 195137140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:47:58,968][134211] Avg episode reward: [(0, '8.987')] [2025-01-04 12:48:00,448][134294] Updated weights for policy 0, policy_version 201144 (0.0025) [2025-01-04 12:48:03,403][134294] Updated weights for policy 0, policy_version 201154 (0.0026) [2025-01-04 12:48:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.7, 300 sec: 15051.2). Total num frames: 823930880. Throughput: 0: 3916.4. Samples: 195147362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:03,968][134211] Avg episode reward: [(0, '9.818')] [2025-01-04 12:48:06,404][134294] Updated weights for policy 0, policy_version 201164 (0.0027) [2025-01-04 12:48:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.7, 300 sec: 15051.1). Total num frames: 824000512. Throughput: 0: 3894.1. Samples: 195167788. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:08,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 12:48:09,420][134294] Updated weights for policy 0, policy_version 201174 (0.0023) [2025-01-04 12:48:12,348][134294] Updated weights for policy 0, policy_version 201184 (0.0027) [2025-01-04 12:48:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15223.5, 300 sec: 14995.5). Total num frames: 824070144. Throughput: 0: 3893.2. Samples: 195188286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:13,968][134211] Avg episode reward: [(0, '8.727')] [2025-01-04 12:48:15,348][134294] Updated weights for policy 0, policy_version 201194 (0.0022) [2025-01-04 12:48:18,239][134294] Updated weights for policy 0, policy_version 201204 (0.0023) [2025-01-04 12:48:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 14926.1). Total num frames: 824139776. Throughput: 0: 3883.9. Samples: 195198838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:18,968][134211] Avg episode reward: [(0, '8.389')] [2025-01-04 12:48:21,160][134294] Updated weights for policy 0, policy_version 201214 (0.0024) [2025-01-04 12:48:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 14940.0). Total num frames: 824209408. Throughput: 0: 3880.6. Samples: 195220010. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:23,968][134211] Avg episode reward: [(0, '9.565')] [2025-01-04 12:48:24,114][134294] Updated weights for policy 0, policy_version 201224 (0.0023) [2025-01-04 12:48:27,015][134294] Updated weights for policy 0, policy_version 201234 (0.0026) [2025-01-04 12:48:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.3, 300 sec: 14926.1). Total num frames: 824279040. Throughput: 0: 3843.1. Samples: 195240836. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:28,968][134211] Avg episode reward: [(0, '9.852')] [2025-01-04 12:48:29,964][134294] Updated weights for policy 0, policy_version 201244 (0.0023) [2025-01-04 12:48:32,727][134294] Updated weights for policy 0, policy_version 201254 (0.0025) [2025-01-04 12:48:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15155.2, 300 sec: 14940.0). Total num frames: 824352768. Throughput: 0: 3721.8. Samples: 195251718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:33,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 12:48:35,624][134294] Updated weights for policy 0, policy_version 201264 (0.0025) [2025-01-04 12:48:38,530][134294] Updated weights for policy 0, policy_version 201274 (0.0024) [2025-01-04 12:48:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15155.4, 300 sec: 14953.9). Total num frames: 824422400. Throughput: 0: 3472.1. Samples: 195273192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:38,968][134211] Avg episode reward: [(0, '9.313')] [2025-01-04 12:48:41,331][134294] Updated weights for policy 0, policy_version 201284 (0.0023) [2025-01-04 12:48:43,968][134211] Fps is (10 sec: 14335.5, 60 sec: 15018.5, 300 sec: 14953.8). Total num frames: 824496128. Throughput: 0: 3504.8. Samples: 195294856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:43,969][134211] Avg episode reward: [(0, '10.099')] [2025-01-04 12:48:44,080][134294] Updated weights for policy 0, policy_version 201294 (0.0024) [2025-01-04 12:48:47,046][134294] Updated weights for policy 0, policy_version 201304 (0.0025) [2025-01-04 12:48:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14953.9). Total num frames: 824565760. Throughput: 0: 3508.1. Samples: 195305224. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:48:48,968][134211] Avg episode reward: [(0, '8.949')] [2025-01-04 12:48:49,963][134294] Updated weights for policy 0, policy_version 201314 (0.0028) [2025-01-04 12:48:52,850][134294] Updated weights for policy 0, policy_version 201324 (0.0023) [2025-01-04 12:48:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.3, 300 sec: 14842.8). Total num frames: 824635392. Throughput: 0: 3531.7. Samples: 195326714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:48:53,969][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 12:48:55,727][134294] Updated weights for policy 0, policy_version 201334 (0.0022) [2025-01-04 12:48:58,483][134294] Updated weights for policy 0, policy_version 201344 (0.0023) [2025-01-04 12:48:58,970][134211] Fps is (10 sec: 14332.9, 60 sec: 14130.7, 300 sec: 14745.5). Total num frames: 824709120. Throughput: 0: 3556.7. Samples: 195348344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:48:58,970][134211] Avg episode reward: [(0, '9.002')] [2025-01-04 12:49:01,396][134294] Updated weights for policy 0, policy_version 201354 (0.0023) [2025-01-04 12:49:03,968][134211] Fps is (10 sec: 14336.4, 60 sec: 14131.2, 300 sec: 14759.5). Total num frames: 824778752. Throughput: 0: 3560.2. Samples: 195359046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:03,968][134211] Avg episode reward: [(0, '9.059')] [2025-01-04 12:49:04,338][134294] Updated weights for policy 0, policy_version 201364 (0.0025) [2025-01-04 12:49:06,469][134294] Updated weights for policy 0, policy_version 201374 (0.0017) [2025-01-04 12:49:08,920][134294] Updated weights for policy 0, policy_version 201384 (0.0019) [2025-01-04 12:49:08,968][134211] Fps is (10 sec: 15977.7, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 824868864. Throughput: 0: 3638.1. Samples: 195383726. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:08,968][134211] Avg episode reward: [(0, '9.233')] [2025-01-04 12:49:11,846][134294] Updated weights for policy 0, policy_version 201394 (0.0029) [2025-01-04 12:49:13,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14472.5, 300 sec: 14828.9). Total num frames: 824938496. Throughput: 0: 3651.2. Samples: 195405142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:13,968][134211] Avg episode reward: [(0, '10.186')] [2025-01-04 12:49:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000201401_824938496.pth... [2025-01-04 12:49:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200537_821399552.pth [2025-01-04 12:49:14,890][134294] Updated weights for policy 0, policy_version 201404 (0.0027) [2025-01-04 12:49:17,870][134294] Updated weights for policy 0, policy_version 201414 (0.0024) [2025-01-04 12:49:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14540.8, 300 sec: 14856.7). Total num frames: 825012224. Throughput: 0: 3631.7. Samples: 195415144. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:18,968][134211] Avg episode reward: [(0, '9.590')] [2025-01-04 12:49:19,898][134294] Updated weights for policy 0, policy_version 201424 (0.0014) [2025-01-04 12:49:22,418][134294] Updated weights for policy 0, policy_version 201434 (0.0023) [2025-01-04 12:49:23,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14677.3, 300 sec: 14870.6). Total num frames: 825090048. Throughput: 0: 3721.0. Samples: 195440638. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:23,968][134211] Avg episode reward: [(0, '10.244')] [2025-01-04 12:49:25,420][134294] Updated weights for policy 0, policy_version 201444 (0.0025) [2025-01-04 12:49:28,285][134294] Updated weights for policy 0, policy_version 201454 (0.0024) [2025-01-04 12:49:28,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14745.6, 300 sec: 14884.4). Total num frames: 825163776. Throughput: 0: 3706.9. Samples: 195461664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:28,968][134211] Avg episode reward: [(0, '8.866')] [2025-01-04 12:49:31,248][134294] Updated weights for policy 0, policy_version 201464 (0.0025) [2025-01-04 12:49:33,969][134211] Fps is (10 sec: 14334.7, 60 sec: 14677.1, 300 sec: 14870.5). Total num frames: 825233408. Throughput: 0: 3711.2. Samples: 195472230. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:33,969][134211] Avg episode reward: [(0, '9.930')] [2025-01-04 12:49:34,140][134294] Updated weights for policy 0, policy_version 201474 (0.0022) [2025-01-04 12:49:36,955][134294] Updated weights for policy 0, policy_version 201484 (0.0022) [2025-01-04 12:49:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14882.2, 300 sec: 14912.2). Total num frames: 825315328. Throughput: 0: 3720.5. Samples: 195494134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:38,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 12:49:38,987][134294] Updated weights for policy 0, policy_version 201494 (0.0015) [2025-01-04 12:49:41,725][134294] Updated weights for policy 0, policy_version 201504 (0.0022) [2025-01-04 12:49:43,968][134211] Fps is (10 sec: 15566.2, 60 sec: 14882.2, 300 sec: 14926.1). Total num frames: 825389056. Throughput: 0: 3785.9. Samples: 195518702. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:43,968][134211] Avg episode reward: [(0, '9.581')] [2025-01-04 12:49:44,552][134294] Updated weights for policy 0, policy_version 201514 (0.0022) [2025-01-04 12:49:47,490][134294] Updated weights for policy 0, policy_version 201524 (0.0026) [2025-01-04 12:49:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14882.1, 300 sec: 14926.1). Total num frames: 825458688. Throughput: 0: 3781.0. Samples: 195529192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:48,968][134211] Avg episode reward: [(0, '9.850')] [2025-01-04 12:49:50,390][134294] Updated weights for policy 0, policy_version 201534 (0.0025) [2025-01-04 12:49:53,288][134294] Updated weights for policy 0, policy_version 201544 (0.0024) [2025-01-04 12:49:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14950.5, 300 sec: 14940.0). Total num frames: 825532416. Throughput: 0: 3707.6. Samples: 195550568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:53,968][134211] Avg episode reward: [(0, '9.628')] [2025-01-04 12:49:56,200][134294] Updated weights for policy 0, policy_version 201554 (0.0025) [2025-01-04 12:49:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14882.7, 300 sec: 14926.1). Total num frames: 825602048. Throughput: 0: 3694.9. Samples: 195571410. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:49:58,968][134211] Avg episode reward: [(0, '10.988')] [2025-01-04 12:49:59,217][134294] Updated weights for policy 0, policy_version 201564 (0.0023) [2025-01-04 12:50:02,065][134294] Updated weights for policy 0, policy_version 201574 (0.0025) [2025-01-04 12:50:03,967][134211] Fps is (10 sec: 15155.7, 60 sec: 15087.0, 300 sec: 14898.3). Total num frames: 825683968. Throughput: 0: 3700.2. Samples: 195581654. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:03,968][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 12:50:04,000][134294] Updated weights for policy 0, policy_version 201584 (0.0013) [2025-01-04 12:50:05,859][134294] Updated weights for policy 0, policy_version 201594 (0.0014) [2025-01-04 12:50:08,390][134294] Updated weights for policy 0, policy_version 201604 (0.0022) [2025-01-04 12:50:08,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 825774080. Throughput: 0: 3791.2. Samples: 195611242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:08,969][134211] Avg episode reward: [(0, '10.115')] [2025-01-04 12:50:11,487][134294] Updated weights for policy 0, policy_version 201614 (0.0027) [2025-01-04 12:50:13,968][134211] Fps is (10 sec: 15973.9, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 825843712. Throughput: 0: 3774.7. Samples: 195631524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:13,968][134211] Avg episode reward: [(0, '9.308')] [2025-01-04 12:50:14,549][134294] Updated weights for policy 0, policy_version 201624 (0.0028) [2025-01-04 12:50:17,445][134294] Updated weights for policy 0, policy_version 201634 (0.0024) [2025-01-04 12:50:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 825909248. Throughput: 0: 3768.3. Samples: 195641798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:18,968][134211] Avg episode reward: [(0, '8.934')] [2025-01-04 12:50:20,413][134294] Updated weights for policy 0, policy_version 201644 (0.0023) [2025-01-04 12:50:23,309][134294] Updated weights for policy 0, policy_version 201654 (0.0026) [2025-01-04 12:50:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14842.8). Total num frames: 825982976. Throughput: 0: 3753.8. Samples: 195663054. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:23,968][134211] Avg episode reward: [(0, '10.665')] [2025-01-04 12:50:26,161][134294] Updated weights for policy 0, policy_version 201664 (0.0024) [2025-01-04 12:50:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14813.9, 300 sec: 14842.8). Total num frames: 826052608. Throughput: 0: 3671.9. Samples: 195683936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:28,968][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 12:50:29,228][134294] Updated weights for policy 0, policy_version 201674 (0.0028) [2025-01-04 12:50:32,124][134294] Updated weights for policy 0, policy_version 201684 (0.0024) [2025-01-04 12:50:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14814.1, 300 sec: 14842.8). Total num frames: 826122240. Throughput: 0: 3668.9. Samples: 195694294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:33,968][134211] Avg episode reward: [(0, '9.698')] [2025-01-04 12:50:35,043][134294] Updated weights for policy 0, policy_version 201694 (0.0026) [2025-01-04 12:50:37,857][134294] Updated weights for policy 0, policy_version 201704 (0.0025) [2025-01-04 12:50:38,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14608.9, 300 sec: 14842.8). Total num frames: 826191872. Throughput: 0: 3672.7. Samples: 195715840. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:38,969][134211] Avg episode reward: [(0, '9.866')] [2025-01-04 12:50:40,657][134294] Updated weights for policy 0, policy_version 201714 (0.0026) [2025-01-04 12:50:43,583][134294] Updated weights for policy 0, policy_version 201724 (0.0025) [2025-01-04 12:50:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14609.1, 300 sec: 14842.8). Total num frames: 826265600. Throughput: 0: 3685.1. Samples: 195737238. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:43,968][134211] Avg episode reward: [(0, '9.733')] [2025-01-04 12:50:46,264][134294] Updated weights for policy 0, policy_version 201734 (0.0023) [2025-01-04 12:50:48,138][134294] Updated weights for policy 0, policy_version 201744 (0.0013) [2025-01-04 12:50:48,968][134211] Fps is (10 sec: 16794.5, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 826359808. Throughput: 0: 3704.8. Samples: 195748370. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:48,968][134211] Avg episode reward: [(0, '8.921')] [2025-01-04 12:50:50,005][134294] Updated weights for policy 0, policy_version 201754 (0.0014) [2025-01-04 12:50:52,575][134294] Updated weights for policy 0, policy_version 201764 (0.0022) [2025-01-04 12:50:53,968][134211] Fps is (10 sec: 17612.7, 60 sec: 15155.2, 300 sec: 14967.8). Total num frames: 826441728. Throughput: 0: 3712.8. Samples: 195778320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:53,968][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 12:50:55,625][134294] Updated weights for policy 0, policy_version 201774 (0.0025) [2025-01-04 12:50:58,548][134294] Updated weights for policy 0, policy_version 201784 (0.0027) [2025-01-04 12:50:58,969][134211] Fps is (10 sec: 15152.9, 60 sec: 15154.8, 300 sec: 14967.7). Total num frames: 826511360. Throughput: 0: 3719.0. Samples: 195798884. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:50:58,970][134211] Avg episode reward: [(0, '10.064')] [2025-01-04 12:51:01,467][134294] Updated weights for policy 0, policy_version 201794 (0.0024) [2025-01-04 12:51:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 14940.0). Total num frames: 826576896. Throughput: 0: 3719.8. Samples: 195809188. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:51:03,969][134211] Avg episode reward: [(0, '10.512')] [2025-01-04 12:51:04,545][134294] Updated weights for policy 0, policy_version 201804 (0.0027) [2025-01-04 12:51:07,494][134294] Updated weights for policy 0, policy_version 201814 (0.0026) [2025-01-04 12:51:08,968][134211] Fps is (10 sec: 13518.7, 60 sec: 14540.8, 300 sec: 14940.0). Total num frames: 826646528. Throughput: 0: 3700.0. Samples: 195829556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:08,968][134211] Avg episode reward: [(0, '10.011')] [2025-01-04 12:51:10,440][134294] Updated weights for policy 0, policy_version 201824 (0.0026) [2025-01-04 12:51:13,281][134294] Updated weights for policy 0, policy_version 201834 (0.0026) [2025-01-04 12:51:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14609.1, 300 sec: 14898.3). Total num frames: 826720256. Throughput: 0: 3710.2. Samples: 195850894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:13,969][134211] Avg episode reward: [(0, '9.884')] [2025-01-04 12:51:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000201836_826720256.pth... [2025-01-04 12:51:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000200967_823160832.pth [2025-01-04 12:51:16,239][134294] Updated weights for policy 0, policy_version 201844 (0.0027) [2025-01-04 12:51:18,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.3, 300 sec: 14759.5). Total num frames: 826789888. Throughput: 0: 3711.5. Samples: 195861312. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:18,968][134211] Avg episode reward: [(0, '9.514')] [2025-01-04 12:51:19,230][134294] Updated weights for policy 0, policy_version 201854 (0.0025) [2025-01-04 12:51:22,093][134294] Updated weights for policy 0, policy_version 201864 (0.0027) [2025-01-04 12:51:23,967][134211] Fps is (10 sec: 14746.1, 60 sec: 14745.7, 300 sec: 14662.3). Total num frames: 826867712. Throughput: 0: 3695.6. Samples: 195882138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:23,968][134211] Avg episode reward: [(0, '9.653')] [2025-01-04 12:51:24,188][134294] Updated weights for policy 0, policy_version 201874 (0.0013) [2025-01-04 12:51:26,025][134294] Updated weights for policy 0, policy_version 201884 (0.0013) [2025-01-04 12:51:28,324][134294] Updated weights for policy 0, policy_version 201894 (0.0017) [2025-01-04 12:51:28,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15223.4, 300 sec: 14773.4). Total num frames: 826966016. Throughput: 0: 3884.0. Samples: 195912018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:28,968][134211] Avg episode reward: [(0, '9.152')] [2025-01-04 12:51:31,392][134294] Updated weights for policy 0, policy_version 201904 (0.0027) [2025-01-04 12:51:33,968][134211] Fps is (10 sec: 16383.5, 60 sec: 15155.2, 300 sec: 14773.4). Total num frames: 827031552. Throughput: 0: 3860.9. Samples: 195922110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:33,969][134211] Avg episode reward: [(0, '9.464')] [2025-01-04 12:51:34,443][134294] Updated weights for policy 0, policy_version 201914 (0.0022) [2025-01-04 12:51:37,485][134294] Updated weights for policy 0, policy_version 201924 (0.0027) [2025-01-04 12:51:38,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15087.1, 300 sec: 14759.5). Total num frames: 827097088. Throughput: 0: 3644.0. Samples: 195942298. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:38,968][134211] Avg episode reward: [(0, '10.045')] [2025-01-04 12:51:40,426][134294] Updated weights for policy 0, policy_version 201934 (0.0027) [2025-01-04 12:51:43,338][134294] Updated weights for policy 0, policy_version 201944 (0.0025) [2025-01-04 12:51:43,968][134211] Fps is (10 sec: 13517.1, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 827166720. Throughput: 0: 3656.1. Samples: 195963404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:43,968][134211] Avg episode reward: [(0, '8.876')] [2025-01-04 12:51:46,256][134294] Updated weights for policy 0, policy_version 201954 (0.0025) [2025-01-04 12:51:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 827240448. Throughput: 0: 3662.0. Samples: 195973976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:48,968][134211] Avg episode reward: [(0, '9.478')] [2025-01-04 12:51:49,136][134294] Updated weights for policy 0, policy_version 201964 (0.0025) [2025-01-04 12:51:52,089][134294] Updated weights for policy 0, policy_version 201974 (0.0024) [2025-01-04 12:51:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.6, 300 sec: 14773.4). Total num frames: 827310080. Throughput: 0: 3672.3. Samples: 195994810. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:53,968][134211] Avg episode reward: [(0, '9.019')] [2025-01-04 12:51:55,078][134294] Updated weights for policy 0, policy_version 201984 (0.0020) [2025-01-04 12:51:57,022][134294] Updated weights for policy 0, policy_version 201994 (0.0014) [2025-01-04 12:51:58,970][134211] Fps is (10 sec: 15971.0, 60 sec: 14813.7, 300 sec: 14842.8). Total num frames: 827400192. Throughput: 0: 3767.8. Samples: 196020454. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:51:58,970][134211] Avg episode reward: [(0, '9.717')] [2025-01-04 12:51:59,517][134294] Updated weights for policy 0, policy_version 202004 (0.0020) [2025-01-04 12:52:02,328][134294] Updated weights for policy 0, policy_version 202014 (0.0026) [2025-01-04 12:52:03,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 827469824. Throughput: 0: 3782.6. Samples: 196031530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:52:03,968][134211] Avg episode reward: [(0, '9.284')] [2025-01-04 12:52:05,322][134294] Updated weights for policy 0, policy_version 202024 (0.0025) [2025-01-04 12:52:08,233][134294] Updated weights for policy 0, policy_version 202034 (0.0026) [2025-01-04 12:52:08,968][134211] Fps is (10 sec: 13929.2, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 827539456. Throughput: 0: 3786.8. Samples: 196052546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 12:52:08,969][134211] Avg episode reward: [(0, '9.410')] [2025-01-04 12:52:11,067][134294] Updated weights for policy 0, policy_version 202044 (0.0025) [2025-01-04 12:52:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14842.8). Total num frames: 827609088. Throughput: 0: 3582.5. Samples: 196073228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:13,968][134211] Avg episode reward: [(0, '10.013')] [2025-01-04 12:52:14,197][134294] Updated weights for policy 0, policy_version 202054 (0.0030) [2025-01-04 12:52:16,206][134294] Updated weights for policy 0, policy_version 202064 (0.0014) [2025-01-04 12:52:18,115][134294] Updated weights for policy 0, policy_version 202074 (0.0014) [2025-01-04 12:52:18,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15360.0, 300 sec: 14953.9). Total num frames: 827711488. Throughput: 0: 3652.5. Samples: 196086470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:18,968][134211] Avg episode reward: [(0, '9.676')] [2025-01-04 12:52:20,366][134294] Updated weights for policy 0, policy_version 202084 (0.0020) [2025-01-04 12:52:23,302][134294] Updated weights for policy 0, policy_version 202094 (0.0025) [2025-01-04 12:52:23,968][134211] Fps is (10 sec: 17203.0, 60 sec: 15223.4, 300 sec: 14953.9). Total num frames: 827781120. Throughput: 0: 3808.1. Samples: 196113664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:23,968][134211] Avg episode reward: [(0, '9.695')] [2025-01-04 12:52:26,308][134294] Updated weights for policy 0, policy_version 202104 (0.0027) [2025-01-04 12:52:28,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 14953.9). Total num frames: 827854848. Throughput: 0: 3791.0. Samples: 196133998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:28,968][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 12:52:29,304][134294] Updated weights for policy 0, policy_version 202114 (0.0023) [2025-01-04 12:52:32,301][134294] Updated weights for policy 0, policy_version 202124 (0.0025) [2025-01-04 12:52:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14940.0). Total num frames: 827920384. Throughput: 0: 3782.9. Samples: 196144206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:33,968][134211] Avg episode reward: [(0, '9.521')] [2025-01-04 12:52:35,263][134294] Updated weights for policy 0, policy_version 202134 (0.0024) [2025-01-04 12:52:38,109][134294] Updated weights for policy 0, policy_version 202144 (0.0023) [2025-01-04 12:52:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.1, 300 sec: 14898.3). Total num frames: 827990016. Throughput: 0: 3792.1. Samples: 196165454. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:38,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 12:52:41,022][134294] Updated weights for policy 0, policy_version 202154 (0.0023) [2025-01-04 12:52:43,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14950.4, 300 sec: 14787.2). Total num frames: 828063744. Throughput: 0: 3687.2. Samples: 196186372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:43,968][134211] Avg episode reward: [(0, '9.531')] [2025-01-04 12:52:43,971][134294] Updated weights for policy 0, policy_version 202164 (0.0027) [2025-01-04 12:52:46,906][134294] Updated weights for policy 0, policy_version 202174 (0.0023) [2025-01-04 12:52:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 828133376. Throughput: 0: 3670.9. Samples: 196196722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:48,968][134211] Avg episode reward: [(0, '10.574')] [2025-01-04 12:52:49,831][134294] Updated weights for policy 0, policy_version 202184 (0.0022) [2025-01-04 12:52:52,277][134294] Updated weights for policy 0, policy_version 202194 (0.0019) [2025-01-04 12:52:53,968][134211] Fps is (10 sec: 15974.6, 60 sec: 15223.5, 300 sec: 14787.3). Total num frames: 828223488. Throughput: 0: 3703.4. Samples: 196219200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:53,968][134211] Avg episode reward: [(0, '9.590')] [2025-01-04 12:52:54,186][134294] Updated weights for policy 0, policy_version 202204 (0.0012) [2025-01-04 12:52:56,022][134294] Updated weights for policy 0, policy_version 202214 (0.0013) [2025-01-04 12:52:57,919][134294] Updated weights for policy 0, policy_version 202224 (0.0013) [2025-01-04 12:52:58,968][134211] Fps is (10 sec: 19251.2, 60 sec: 15428.8, 300 sec: 14898.3). Total num frames: 828325888. Throughput: 0: 3972.7. Samples: 196251998. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:52:58,968][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 12:53:00,554][134294] Updated weights for policy 0, policy_version 202234 (0.0024) [2025-01-04 12:53:03,652][134294] Updated weights for policy 0, policy_version 202244 (0.0026) [2025-01-04 12:53:03,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15428.3, 300 sec: 14898.3). Total num frames: 828395520. Throughput: 0: 3921.3. Samples: 196262928. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:53:03,968][134211] Avg episode reward: [(0, '10.761')] [2025-01-04 12:53:06,723][134294] Updated weights for policy 0, policy_version 202254 (0.0026) [2025-01-04 12:53:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15360.0, 300 sec: 14884.4). Total num frames: 828461056. Throughput: 0: 3758.7. Samples: 196282806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:53:08,968][134211] Avg episode reward: [(0, '9.128')] [2025-01-04 12:53:09,771][134294] Updated weights for policy 0, policy_version 202264 (0.0025) [2025-01-04 12:53:12,841][134294] Updated weights for policy 0, policy_version 202274 (0.0024) [2025-01-04 12:53:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15360.0, 300 sec: 14884.4). Total num frames: 828530688. Throughput: 0: 3754.2. Samples: 196302938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:53:13,968][134211] Avg episode reward: [(0, '10.712')] [2025-01-04 12:53:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000202278_828530688.pth... 
[2025-01-04 12:53:14,043][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000201401_824938496.pth [2025-01-04 12:53:15,763][134294] Updated weights for policy 0, policy_version 202284 (0.0024) [2025-01-04 12:53:18,699][134294] Updated weights for policy 0, policy_version 202294 (0.0023) [2025-01-04 12:53:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14870.6). Total num frames: 828596224. Throughput: 0: 3756.9. Samples: 196313266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:53:18,968][134211] Avg episode reward: [(0, '9.070')] [2025-01-04 12:53:21,553][134294] Updated weights for policy 0, policy_version 202304 (0.0026) [2025-01-04 12:53:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14813.9, 300 sec: 14884.4). Total num frames: 828669952. Throughput: 0: 3757.3. Samples: 196334530. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:23,968][134211] Avg episode reward: [(0, '9.929')] [2025-01-04 12:53:24,661][134294] Updated weights for policy 0, policy_version 202314 (0.0024) [2025-01-04 12:53:27,561][134294] Updated weights for policy 0, policy_version 202324 (0.0027) [2025-01-04 12:53:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.4, 300 sec: 14856.7). Total num frames: 828735488. Throughput: 0: 3747.1. Samples: 196354990. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:28,968][134211] Avg episode reward: [(0, '9.494')] [2025-01-04 12:53:30,440][134294] Updated weights for policy 0, policy_version 202334 (0.0025) [2025-01-04 12:53:33,321][134294] Updated weights for policy 0, policy_version 202344 (0.0022) [2025-01-04 12:53:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.9, 300 sec: 14870.6). Total num frames: 828809216. Throughput: 0: 3758.6. Samples: 196365858. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:33,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 12:53:36,156][134294] Updated weights for policy 0, policy_version 202354 (0.0024) [2025-01-04 12:53:38,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 828878848. Throughput: 0: 3730.2. Samples: 196387058. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:38,968][134211] Avg episode reward: [(0, '10.419')] [2025-01-04 12:53:39,253][134294] Updated weights for policy 0, policy_version 202364 (0.0026) [2025-01-04 12:53:42,009][134294] Updated weights for policy 0, policy_version 202374 (0.0024) [2025-01-04 12:53:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14745.6, 300 sec: 14856.7). Total num frames: 828948480. Throughput: 0: 3467.1. Samples: 196408016. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:43,968][134211] Avg episode reward: [(0, '10.340')] [2025-01-04 12:53:44,990][134294] Updated weights for policy 0, policy_version 202384 (0.0025) [2025-01-04 12:53:46,968][134294] Updated weights for policy 0, policy_version 202394 (0.0014) [2025-01-04 12:53:48,968][134211] Fps is (10 sec: 15973.6, 60 sec: 15086.8, 300 sec: 14926.1). Total num frames: 829038592. Throughput: 0: 3500.0. Samples: 196420430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:48,969][134211] Avg episode reward: [(0, '8.743')] [2025-01-04 12:53:49,457][134294] Updated weights for policy 0, policy_version 202404 (0.0022) [2025-01-04 12:53:52,257][134294] Updated weights for policy 0, policy_version 202414 (0.0026) [2025-01-04 12:53:53,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14745.6, 300 sec: 14912.3). Total num frames: 829108224. Throughput: 0: 3601.0. Samples: 196444850. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:53,968][134211] Avg episode reward: [(0, '10.783')] [2025-01-04 12:53:55,205][134294] Updated weights for policy 0, policy_version 202424 (0.0026) [2025-01-04 12:53:58,054][134294] Updated weights for policy 0, policy_version 202434 (0.0024) [2025-01-04 12:53:58,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14267.7, 300 sec: 14926.1). Total num frames: 829181952. Throughput: 0: 3626.9. Samples: 196466146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:53:58,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 12:54:00,944][134294] Updated weights for policy 0, policy_version 202444 (0.0024) [2025-01-04 12:54:03,865][134294] Updated weights for policy 0, policy_version 202454 (0.0025) [2025-01-04 12:54:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14267.7, 300 sec: 14856.7). Total num frames: 829251584. Throughput: 0: 3634.5. Samples: 196476818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:54:03,968][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 12:54:06,712][134294] Updated weights for policy 0, policy_version 202464 (0.0021) [2025-01-04 12:54:08,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14472.6, 300 sec: 14884.5). Total num frames: 829329408. Throughput: 0: 3632.7. Samples: 196498002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:54:08,968][134211] Avg episode reward: [(0, '9.383')] [2025-01-04 12:54:09,015][134294] Updated weights for policy 0, policy_version 202474 (0.0016) [2025-01-04 12:54:11,424][134294] Updated weights for policy 0, policy_version 202484 (0.0022) [2025-01-04 12:54:13,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14677.4, 300 sec: 14912.2). Total num frames: 829411328. Throughput: 0: 3739.1. Samples: 196523250. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:54:13,968][134211] Avg episode reward: [(0, '8.784')] [2025-01-04 12:54:14,320][134294] Updated weights for policy 0, policy_version 202494 (0.0023) [2025-01-04 12:54:17,259][134294] Updated weights for policy 0, policy_version 202504 (0.0026) [2025-01-04 12:54:18,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14677.3, 300 sec: 14870.6). Total num frames: 829476864. Throughput: 0: 3729.7. Samples: 196533696. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:54:18,968][134211] Avg episode reward: [(0, '9.510')] [2025-01-04 12:54:20,121][134294] Updated weights for policy 0, policy_version 202514 (0.0023) [2025-01-04 12:54:22,956][134294] Updated weights for policy 0, policy_version 202524 (0.0024) [2025-01-04 12:54:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14870.6). Total num frames: 829550592. Throughput: 0: 3732.8. Samples: 196555036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:54:23,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 12:54:25,835][134294] Updated weights for policy 0, policy_version 202534 (0.0024) [2025-01-04 12:54:28,547][134294] Updated weights for policy 0, policy_version 202544 (0.0020) [2025-01-04 12:54:28,967][134211] Fps is (10 sec: 15155.6, 60 sec: 14882.2, 300 sec: 14898.4). Total num frames: 829628416. Throughput: 0: 3743.4. Samples: 196576468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:28,968][134211] Avg episode reward: [(0, '10.211')] [2025-01-04 12:54:30,540][134294] Updated weights for policy 0, policy_version 202554 (0.0016) [2025-01-04 12:54:33,190][134294] Updated weights for policy 0, policy_version 202564 (0.0025) [2025-01-04 12:54:33,968][134211] Fps is (10 sec: 15974.3, 60 sec: 15018.7, 300 sec: 14898.3). Total num frames: 829710336. Throughput: 0: 3797.7. Samples: 196591326. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:33,968][134211] Avg episode reward: [(0, '10.204')] [2025-01-04 12:54:36,150][134294] Updated weights for policy 0, policy_version 202574 (0.0024) [2025-01-04 12:54:38,968][134211] Fps is (10 sec: 15154.8, 60 sec: 15018.7, 300 sec: 14884.4). Total num frames: 829779968. Throughput: 0: 3726.4. Samples: 196612538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:38,968][134211] Avg episode reward: [(0, '9.081')] [2025-01-04 12:54:39,133][134294] Updated weights for policy 0, policy_version 202584 (0.0024) [2025-01-04 12:54:42,119][134294] Updated weights for policy 0, policy_version 202594 (0.0025) [2025-01-04 12:54:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14884.4). Total num frames: 829849600. Throughput: 0: 3713.4. Samples: 196633250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:43,968][134211] Avg episode reward: [(0, '10.567')] [2025-01-04 12:54:44,997][134294] Updated weights for policy 0, policy_version 202604 (0.0027) [2025-01-04 12:54:47,836][134294] Updated weights for policy 0, policy_version 202614 (0.0022) [2025-01-04 12:54:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.5, 300 sec: 14870.6). Total num frames: 829919232. Throughput: 0: 3715.4. Samples: 196644010. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:48,968][134211] Avg episode reward: [(0, '8.931')] [2025-01-04 12:54:50,758][134294] Updated weights for policy 0, policy_version 202624 (0.0025) [2025-01-04 12:54:53,530][134294] Updated weights for policy 0, policy_version 202634 (0.0026) [2025-01-04 12:54:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14745.6, 300 sec: 14884.4). Total num frames: 829992960. Throughput: 0: 3722.4. Samples: 196665512. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:53,968][134211] Avg episode reward: [(0, '9.526')] [2025-01-04 12:54:56,423][134294] Updated weights for policy 0, policy_version 202644 (0.0023) [2025-01-04 12:54:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14677.4, 300 sec: 14842.8). Total num frames: 830062592. Throughput: 0: 3630.6. Samples: 196686628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:54:58,968][134211] Avg episode reward: [(0, '10.671')] [2025-01-04 12:54:59,472][134294] Updated weights for policy 0, policy_version 202654 (0.0021) [2025-01-04 12:55:01,838][134294] Updated weights for policy 0, policy_version 202664 (0.0018) [2025-01-04 12:55:03,774][134294] Updated weights for policy 0, policy_version 202674 (0.0013) [2025-01-04 12:55:03,967][134211] Fps is (10 sec: 16384.2, 60 sec: 15087.0, 300 sec: 14856.7). Total num frames: 830156800. Throughput: 0: 3643.8. Samples: 196697666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:03,968][134211] Avg episode reward: [(0, '10.694')] [2025-01-04 12:55:05,789][134294] Updated weights for policy 0, policy_version 202684 (0.0015) [2025-01-04 12:55:08,756][134294] Updated weights for policy 0, policy_version 202694 (0.0024) [2025-01-04 12:55:08,969][134211] Fps is (10 sec: 17201.5, 60 sec: 15086.7, 300 sec: 14884.4). Total num frames: 830234624. Throughput: 0: 3810.6. Samples: 196726516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:08,969][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 12:55:11,705][134294] Updated weights for policy 0, policy_version 202704 (0.0024) [2025-01-04 12:55:13,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14882.1, 300 sec: 14898.3). Total num frames: 830304256. Throughput: 0: 3789.5. Samples: 196746996. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:13,968][134211] Avg episode reward: [(0, '10.034')] [2025-01-04 12:55:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000202711_830304256.pth... [2025-01-04 12:55:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000201836_826720256.pth [2025-01-04 12:55:14,743][134294] Updated weights for policy 0, policy_version 202714 (0.0023) [2025-01-04 12:55:17,753][134294] Updated weights for policy 0, policy_version 202724 (0.0025) [2025-01-04 12:55:18,968][134211] Fps is (10 sec: 13518.1, 60 sec: 14882.2, 300 sec: 14870.6). Total num frames: 830369792. Throughput: 0: 3681.4. Samples: 196756990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:18,968][134211] Avg episode reward: [(0, '9.566')] [2025-01-04 12:55:20,713][134294] Updated weights for policy 0, policy_version 202734 (0.0024) [2025-01-04 12:55:23,502][134294] Updated weights for policy 0, policy_version 202744 (0.0024) [2025-01-04 12:55:23,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14882.0, 300 sec: 14884.4). Total num frames: 830443520. Throughput: 0: 3681.3. Samples: 196778196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:23,969][134211] Avg episode reward: [(0, '9.274')] [2025-01-04 12:55:26,454][134294] Updated weights for policy 0, policy_version 202754 (0.0025) [2025-01-04 12:55:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14745.6, 300 sec: 14884.5). Total num frames: 830513152. Throughput: 0: 3685.8. Samples: 196799110. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:28,968][134211] Avg episode reward: [(0, '8.594')] [2025-01-04 12:55:29,431][134294] Updated weights for policy 0, policy_version 202764 (0.0023) [2025-01-04 12:55:32,348][134294] Updated weights for policy 0, policy_version 202774 (0.0024) [2025-01-04 12:55:33,969][134211] Fps is (10 sec: 13926.0, 60 sec: 14540.6, 300 sec: 14884.4). Total num frames: 830582784. Throughput: 0: 3678.4. Samples: 196809542. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:55:33,971][134211] Avg episode reward: [(0, '9.549')] [2025-01-04 12:55:35,261][134294] Updated weights for policy 0, policy_version 202784 (0.0025) [2025-01-04 12:55:37,572][134294] Updated weights for policy 0, policy_version 202794 (0.0017) [2025-01-04 12:55:38,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14813.9, 300 sec: 14926.1). Total num frames: 830668800. Throughput: 0: 3704.8. Samples: 196832230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:55:38,969][134211] Avg episode reward: [(0, '10.242')] [2025-01-04 12:55:39,868][134294] Updated weights for policy 0, policy_version 202804 (0.0017) [2025-01-04 12:55:42,650][134294] Updated weights for policy 0, policy_version 202814 (0.0022) [2025-01-04 12:55:43,968][134211] Fps is (10 sec: 15975.6, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 830742528. Throughput: 0: 3773.5. Samples: 196856436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:55:43,968][134211] Avg episode reward: [(0, '9.681')] [2025-01-04 12:55:45,624][134294] Updated weights for policy 0, policy_version 202824 (0.0027) [2025-01-04 12:55:48,478][134294] Updated weights for policy 0, policy_version 202834 (0.0023) [2025-01-04 12:55:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14882.2, 300 sec: 14815.0). Total num frames: 830812160. Throughput: 0: 3765.1. Samples: 196867096. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:55:48,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 12:55:51,336][134294] Updated weights for policy 0, policy_version 202844 (0.0027) [2025-01-04 12:55:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.8, 300 sec: 14815.1). Total num frames: 830881792. Throughput: 0: 3596.1. Samples: 196888336. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:55:53,968][134211] Avg episode reward: [(0, '8.788')] [2025-01-04 12:55:54,281][134294] Updated weights for policy 0, policy_version 202854 (0.0028) [2025-01-04 12:55:57,260][134294] Updated weights for policy 0, policy_version 202864 (0.0025) [2025-01-04 12:55:58,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 830959616. Throughput: 0: 3609.6. Samples: 196909426. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:55:58,968][134211] Avg episode reward: [(0, '9.728')] [2025-01-04 12:55:59,433][134294] Updated weights for policy 0, policy_version 202874 (0.0016) [2025-01-04 12:56:01,861][134294] Updated weights for policy 0, policy_version 202884 (0.0021) [2025-01-04 12:56:03,968][134211] Fps is (10 sec: 15973.5, 60 sec: 14745.4, 300 sec: 14898.3). Total num frames: 831041536. Throughput: 0: 3715.5. Samples: 196924190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:03,969][134211] Avg episode reward: [(0, '8.016')] [2025-01-04 12:56:04,865][134294] Updated weights for policy 0, policy_version 202894 (0.0026) [2025-01-04 12:56:07,748][134294] Updated weights for policy 0, policy_version 202904 (0.0023) [2025-01-04 12:56:08,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14609.3, 300 sec: 14884.5). Total num frames: 831111168. Throughput: 0: 3701.5. Samples: 196944760. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:08,968][134211] Avg episode reward: [(0, '9.740')] [2025-01-04 12:56:10,662][134294] Updated weights for policy 0, policy_version 202914 (0.0024) [2025-01-04 12:56:13,559][134294] Updated weights for policy 0, policy_version 202924 (0.0024) [2025-01-04 12:56:13,968][134211] Fps is (10 sec: 13927.5, 60 sec: 14609.1, 300 sec: 14884.5). Total num frames: 831180800. Throughput: 0: 3711.6. Samples: 196966134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:13,968][134211] Avg episode reward: [(0, '11.392')] [2025-01-04 12:56:16,398][134294] Updated weights for policy 0, policy_version 202934 (0.0023) [2025-01-04 12:56:18,357][134294] Updated weights for policy 0, policy_version 202944 (0.0013) [2025-01-04 12:56:18,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.4, 300 sec: 14912.2). Total num frames: 831266816. Throughput: 0: 3716.1. Samples: 196976764. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:18,968][134211] Avg episode reward: [(0, '9.805')] [2025-01-04 12:56:20,972][134294] Updated weights for policy 0, policy_version 202954 (0.0023) [2025-01-04 12:56:23,876][134294] Updated weights for policy 0, policy_version 202964 (0.0022) [2025-01-04 12:56:23,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14950.5, 300 sec: 14828.9). Total num frames: 831340544. Throughput: 0: 3779.3. Samples: 197002296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:23,968][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 12:56:26,836][134294] Updated weights for policy 0, policy_version 202974 (0.0026) [2025-01-04 12:56:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14950.4, 300 sec: 14842.8). Total num frames: 831410176. Throughput: 0: 3709.6. Samples: 197023370. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:28,969][134211] Avg episode reward: [(0, '8.399')] [2025-01-04 12:56:29,784][134294] Updated weights for policy 0, policy_version 202984 (0.0024) [2025-01-04 12:56:32,719][134294] Updated weights for policy 0, policy_version 202994 (0.0025) [2025-01-04 12:56:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.6, 300 sec: 14856.7). Total num frames: 831479808. Throughput: 0: 3699.5. Samples: 197033574. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:33,968][134211] Avg episode reward: [(0, '9.313')] [2025-01-04 12:56:35,312][134294] Updated weights for policy 0, policy_version 203004 (0.0017) [2025-01-04 12:56:37,337][134294] Updated weights for policy 0, policy_version 203014 (0.0015) [2025-01-04 12:56:38,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14950.4, 300 sec: 14912.2). Total num frames: 831565824. Throughput: 0: 3789.8. Samples: 197058876. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 12:56:38,968][134211] Avg episode reward: [(0, '9.106')] [2025-01-04 12:56:40,211][134294] Updated weights for policy 0, policy_version 203024 (0.0026) [2025-01-04 12:56:43,007][134294] Updated weights for policy 0, policy_version 203034 (0.0022) [2025-01-04 12:56:43,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14882.1, 300 sec: 14898.3). Total num frames: 831635456. Throughput: 0: 3798.2. Samples: 197080348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:56:43,968][134211] Avg episode reward: [(0, '10.433')] [2025-01-04 12:56:46,062][134294] Updated weights for policy 0, policy_version 203044 (0.0025) [2025-01-04 12:56:48,956][134294] Updated weights for policy 0, policy_version 203054 (0.0024) [2025-01-04 12:56:48,969][134211] Fps is (10 sec: 14333.8, 60 sec: 14950.0, 300 sec: 14912.1). Total num frames: 831709184. Throughput: 0: 3704.1. Samples: 197090876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:56:48,970][134211] Avg episode reward: [(0, '9.924')] [2025-01-04 12:56:51,914][134294] Updated weights for policy 0, policy_version 203064 (0.0027) [2025-01-04 12:56:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14829.0). Total num frames: 831774720. Throughput: 0: 3710.0. Samples: 197111710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:56:53,968][134211] Avg episode reward: [(0, '9.827')] [2025-01-04 12:56:54,862][134294] Updated weights for policy 0, policy_version 203074 (0.0025) [2025-01-04 12:56:57,846][134294] Updated weights for policy 0, policy_version 203084 (0.0026) [2025-01-04 12:56:58,968][134211] Fps is (10 sec: 14338.4, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 831852544. Throughput: 0: 3696.7. Samples: 197132486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:56:58,968][134211] Avg episode reward: [(0, '9.815')] [2025-01-04 12:56:59,837][134294] Updated weights for policy 0, policy_version 203094 (0.0014) [2025-01-04 12:57:01,741][134294] Updated weights for policy 0, policy_version 203104 (0.0013) [2025-01-04 12:57:03,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15155.4, 300 sec: 14953.9). Total num frames: 831950848. Throughput: 0: 3826.4. Samples: 197148954. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:03,968][134211] Avg episode reward: [(0, '10.038')] [2025-01-04 12:57:04,045][134294] Updated weights for policy 0, policy_version 203114 (0.0021) [2025-01-04 12:57:07,203][134294] Updated weights for policy 0, policy_version 203124 (0.0024) [2025-01-04 12:57:08,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15086.9, 300 sec: 14940.0). Total num frames: 832016384. Throughput: 0: 3777.1. Samples: 197172264. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:08,968][134211] Avg episode reward: [(0, '9.373')] [2025-01-04 12:57:10,394][134294] Updated weights for policy 0, policy_version 203134 (0.0024) [2025-01-04 12:57:13,233][134294] Updated weights for policy 0, policy_version 203144 (0.0026) [2025-01-04 12:57:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 14828.9). Total num frames: 832086016. Throughput: 0: 3754.8. Samples: 197192336. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:13,969][134211] Avg episode reward: [(0, '9.252')] [2025-01-04 12:57:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000203146_832086016.pth... [2025-01-04 12:57:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000202278_828530688.pth [2025-01-04 12:57:16,286][134294] Updated weights for policy 0, policy_version 203154 (0.0022) [2025-01-04 12:57:18,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14813.8, 300 sec: 14828.9). Total num frames: 832155648. Throughput: 0: 3755.1. Samples: 197202554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:18,969][134211] Avg episode reward: [(0, '9.591')] [2025-01-04 12:57:19,197][134294] Updated weights for policy 0, policy_version 203164 (0.0026) [2025-01-04 12:57:22,072][134294] Updated weights for policy 0, policy_version 203174 (0.0026) [2025-01-04 12:57:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 832225280. Throughput: 0: 3663.5. Samples: 197223732. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:23,968][134211] Avg episode reward: [(0, '10.260')] [2025-01-04 12:57:24,993][134294] Updated weights for policy 0, policy_version 203184 (0.0027) [2025-01-04 12:57:27,809][134294] Updated weights for policy 0, policy_version 203194 (0.0025) [2025-01-04 12:57:28,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 832294912. Throughput: 0: 3658.9. Samples: 197245000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:28,968][134211] Avg episode reward: [(0, '8.958')] [2025-01-04 12:57:30,696][134294] Updated weights for policy 0, policy_version 203204 (0.0025) [2025-01-04 12:57:33,563][134294] Updated weights for policy 0, policy_version 203214 (0.0026) [2025-01-04 12:57:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14813.9, 300 sec: 14842.8). Total num frames: 832368640. Throughput: 0: 3666.3. Samples: 197255854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:33,968][134211] Avg episode reward: [(0, '9.461')] [2025-01-04 12:57:36,454][134294] Updated weights for policy 0, policy_version 203224 (0.0025) [2025-01-04 12:57:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14540.8, 300 sec: 14828.9). Total num frames: 832438272. Throughput: 0: 3676.4. Samples: 197277146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:38,968][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 12:57:39,287][134294] Updated weights for policy 0, policy_version 203234 (0.0023) [2025-01-04 12:57:41,173][134294] Updated weights for policy 0, policy_version 203244 (0.0013) [2025-01-04 12:57:43,045][134294] Updated weights for policy 0, policy_version 203254 (0.0012) [2025-01-04 12:57:43,967][134211] Fps is (10 sec: 17613.0, 60 sec: 15155.3, 300 sec: 14953.9). Total num frames: 832544768. Throughput: 0: 3857.2. Samples: 197306062. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 12:57:43,968][134211] Avg episode reward: [(0, '10.504')] [2025-01-04 12:57:44,923][134294] Updated weights for policy 0, policy_version 203264 (0.0013) [2025-01-04 12:57:46,803][134294] Updated weights for policy 0, policy_version 203274 (0.0013) [2025-01-04 12:57:48,730][134294] Updated weights for policy 0, policy_version 203284 (0.0014) [2025-01-04 12:57:48,968][134211] Fps is (10 sec: 21708.9, 60 sec: 15770.0, 300 sec: 15023.3). Total num frames: 832655360. Throughput: 0: 3855.1. Samples: 197322434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:57:48,968][134211] Avg episode reward: [(0, '9.929')] [2025-01-04 12:57:51,436][134294] Updated weights for policy 0, policy_version 203294 (0.0023) [2025-01-04 12:57:53,971][134211] Fps is (10 sec: 18016.4, 60 sec: 15837.1, 300 sec: 14912.1). Total num frames: 832724992. Throughput: 0: 3923.0. Samples: 197348810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:57:53,972][134211] Avg episode reward: [(0, '9.858')] [2025-01-04 12:57:54,563][134294] Updated weights for policy 0, policy_version 203304 (0.0028) [2025-01-04 12:57:57,619][134294] Updated weights for policy 0, policy_version 203314 (0.0029) [2025-01-04 12:57:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15633.0, 300 sec: 14898.3). Total num frames: 832790528. Throughput: 0: 3916.6. Samples: 197368584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:57:58,968][134211] Avg episode reward: [(0, '8.839')] [2025-01-04 12:58:00,667][134294] Updated weights for policy 0, policy_version 203324 (0.0025) [2025-01-04 12:58:03,552][134294] Updated weights for policy 0, policy_version 203334 (0.0027) [2025-01-04 12:58:03,968][134211] Fps is (10 sec: 13521.2, 60 sec: 15155.2, 300 sec: 14912.2). Total num frames: 832860160. Throughput: 0: 3919.8. Samples: 197378942. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:03,968][134211] Avg episode reward: [(0, '10.756')] [2025-01-04 12:58:06,546][134294] Updated weights for policy 0, policy_version 203344 (0.0027) [2025-01-04 12:58:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 14898.3). Total num frames: 832925696. Throughput: 0: 3907.6. Samples: 197399574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:08,969][134211] Avg episode reward: [(0, '10.314')] [2025-01-04 12:58:09,594][134294] Updated weights for policy 0, policy_version 203354 (0.0026) [2025-01-04 12:58:12,534][134294] Updated weights for policy 0, policy_version 203364 (0.0025) [2025-01-04 12:58:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15155.2, 300 sec: 14912.2). Total num frames: 832995328. Throughput: 0: 3890.0. Samples: 197420052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:13,968][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 12:58:15,385][134294] Updated weights for policy 0, policy_version 203374 (0.0025) [2025-01-04 12:58:18,292][134294] Updated weights for policy 0, policy_version 203384 (0.0023) [2025-01-04 12:58:18,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15223.5, 300 sec: 14912.2). Total num frames: 833069056. Throughput: 0: 3887.1. Samples: 197430776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:18,968][134211] Avg episode reward: [(0, '9.638')] [2025-01-04 12:58:21,157][134294] Updated weights for policy 0, policy_version 203394 (0.0025) [2025-01-04 12:58:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15223.4, 300 sec: 14926.1). Total num frames: 833138688. Throughput: 0: 3888.2. Samples: 197452114. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:23,968][134211] Avg episode reward: [(0, '10.527')] [2025-01-04 12:58:24,143][134294] Updated weights for policy 0, policy_version 203404 (0.0023) [2025-01-04 12:58:27,051][134294] Updated weights for policy 0, policy_version 203414 (0.0027) [2025-01-04 12:58:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15223.5, 300 sec: 14912.2). Total num frames: 833208320. Throughput: 0: 3705.7. Samples: 197472820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:28,968][134211] Avg episode reward: [(0, '9.184')] [2025-01-04 12:58:30,026][134294] Updated weights for policy 0, policy_version 203424 (0.0023) [2025-01-04 12:58:32,888][134294] Updated weights for policy 0, policy_version 203434 (0.0026) [2025-01-04 12:58:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15223.4, 300 sec: 14926.1). Total num frames: 833282048. Throughput: 0: 3583.0. Samples: 197483668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:33,969][134211] Avg episode reward: [(0, '9.309')] [2025-01-04 12:58:35,668][134294] Updated weights for policy 0, policy_version 203444 (0.0025) [2025-01-04 12:58:38,571][134294] Updated weights for policy 0, policy_version 203454 (0.0027) [2025-01-04 12:58:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15223.4, 300 sec: 14926.1). Total num frames: 833351680. Throughput: 0: 3477.6. Samples: 197505292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:38,969][134211] Avg episode reward: [(0, '10.500')] [2025-01-04 12:58:41,433][134294] Updated weights for policy 0, policy_version 203464 (0.0023) [2025-01-04 12:58:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.0, 300 sec: 14856.7). Total num frames: 833421312. Throughput: 0: 3506.5. Samples: 197526378. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:43,969][134211] Avg episode reward: [(0, '9.826')] [2025-01-04 12:58:44,386][134294] Updated weights for policy 0, policy_version 203474 (0.0025) [2025-01-04 12:58:47,358][134294] Updated weights for policy 0, policy_version 203484 (0.0024) [2025-01-04 12:58:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13926.4, 300 sec: 14856.7). Total num frames: 833490944. Throughput: 0: 3505.3. Samples: 197536680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:48,968][134211] Avg episode reward: [(0, '9.246')] [2025-01-04 12:58:50,263][134294] Updated weights for policy 0, policy_version 203494 (0.0026) [2025-01-04 12:58:53,070][134294] Updated weights for policy 0, policy_version 203504 (0.0024) [2025-01-04 12:58:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13995.4, 300 sec: 14856.7). Total num frames: 833564672. Throughput: 0: 3524.8. Samples: 197558190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 12:58:53,968][134211] Avg episode reward: [(0, '10.836')] [2025-01-04 12:58:55,821][134294] Updated weights for policy 0, policy_version 203514 (0.0022) [2025-01-04 12:58:57,668][134294] Updated weights for policy 0, policy_version 203524 (0.0013) [2025-01-04 12:58:58,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14472.6, 300 sec: 14940.0). Total num frames: 833658880. Throughput: 0: 3658.8. Samples: 197584698. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:58:58,968][134211] Avg episode reward: [(0, '8.401')] [2025-01-04 12:58:59,555][134294] Updated weights for policy 0, policy_version 203534 (0.0013) [2025-01-04 12:59:01,401][134294] Updated weights for policy 0, policy_version 203544 (0.0013) [2025-01-04 12:59:03,345][134294] Updated weights for policy 0, policy_version 203554 (0.0015) [2025-01-04 12:59:03,968][134211] Fps is (10 sec: 20070.8, 60 sec: 15086.9, 300 sec: 15037.2). Total num frames: 833765376. Throughput: 0: 3785.3. Samples: 197601112. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:03,968][134211] Avg episode reward: [(0, '8.361')] [2025-01-04 12:59:05,945][134294] Updated weights for policy 0, policy_version 203564 (0.0022) [2025-01-04 12:59:08,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15155.2, 300 sec: 14995.5). Total num frames: 833835008. Throughput: 0: 3882.7. Samples: 197626834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:08,968][134211] Avg episode reward: [(0, '10.653')] [2025-01-04 12:59:09,086][134294] Updated weights for policy 0, policy_version 203574 (0.0028) [2025-01-04 12:59:12,133][134294] Updated weights for policy 0, policy_version 203584 (0.0027) [2025-01-04 12:59:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15086.9, 300 sec: 14995.5). Total num frames: 833900544. Throughput: 0: 3863.4. Samples: 197646674. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:13,968][134211] Avg episode reward: [(0, '9.079')] [2025-01-04 12:59:14,039][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000203590_833904640.pth... [2025-01-04 12:59:14,111][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000202711_830304256.pth [2025-01-04 12:59:15,197][134294] Updated weights for policy 0, policy_version 203594 (0.0025) [2025-01-04 12:59:18,011][134294] Updated weights for policy 0, policy_version 203604 (0.0024) [2025-01-04 12:59:18,968][134211] Fps is (10 sec: 13925.8, 60 sec: 15086.8, 300 sec: 14995.5). Total num frames: 833974272. Throughput: 0: 3848.8. Samples: 197656866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:18,969][134211] Avg episode reward: [(0, '9.925')] [2025-01-04 12:59:21,080][134294] Updated weights for policy 0, policy_version 203614 (0.0027) [2025-01-04 12:59:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 834039808. 
Throughput: 0: 3832.2. Samples: 197677740. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:23,968][134211] Avg episode reward: [(0, '9.408')] [2025-01-04 12:59:24,041][134294] Updated weights for policy 0, policy_version 203624 (0.0027) [2025-01-04 12:59:26,987][134294] Updated weights for policy 0, policy_version 203634 (0.0024) [2025-01-04 12:59:28,968][134211] Fps is (10 sec: 13517.3, 60 sec: 15018.6, 300 sec: 14912.2). Total num frames: 834109440. Throughput: 0: 3824.0. Samples: 197698460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:28,968][134211] Avg episode reward: [(0, '8.627')] [2025-01-04 12:59:29,949][134294] Updated weights for policy 0, policy_version 203644 (0.0024) [2025-01-04 12:59:32,765][134294] Updated weights for policy 0, policy_version 203654 (0.0020) [2025-01-04 12:59:33,968][134211] Fps is (10 sec: 14336.3, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 834183168. Throughput: 0: 3833.8. Samples: 197709200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:33,968][134211] Avg episode reward: [(0, '10.789')] [2025-01-04 12:59:35,609][134294] Updated weights for policy 0, policy_version 203664 (0.0027) [2025-01-04 12:59:38,453][134294] Updated weights for policy 0, policy_version 203674 (0.0025) [2025-01-04 12:59:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 834252800. Throughput: 0: 3838.1. Samples: 197730904. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:38,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 12:59:41,318][134294] Updated weights for policy 0, policy_version 203684 (0.0026) [2025-01-04 12:59:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15018.6, 300 sec: 14926.1). Total num frames: 834322432. Throughput: 0: 3717.9. Samples: 197752004. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:43,968][134211] Avg episode reward: [(0, '9.774')] [2025-01-04 12:59:44,281][134294] Updated weights for policy 0, policy_version 203694 (0.0023) [2025-01-04 12:59:47,134][134294] Updated weights for policy 0, policy_version 203704 (0.0022) [2025-01-04 12:59:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 14912.2). Total num frames: 834392064. Throughput: 0: 3583.0. Samples: 197762346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:48,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 12:59:50,085][134294] Updated weights for policy 0, policy_version 203714 (0.0027) [2025-01-04 12:59:52,901][134294] Updated weights for policy 0, policy_version 203724 (0.0023) [2025-01-04 12:59:53,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15018.7, 300 sec: 14926.1). Total num frames: 834465792. Throughput: 0: 3491.8. Samples: 197783964. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:53,968][134211] Avg episode reward: [(0, '9.656')] [2025-01-04 12:59:55,852][134294] Updated weights for policy 0, policy_version 203734 (0.0023) [2025-01-04 12:59:58,589][134294] Updated weights for policy 0, policy_version 203744 (0.0025) [2025-01-04 12:59:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14609.0, 300 sec: 14842.8). Total num frames: 834535424. Throughput: 0: 3528.6. Samples: 197805462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 12:59:58,968][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 13:00:01,521][134294] Updated weights for policy 0, policy_version 203754 (0.0024) [2025-01-04 13:00:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 14829.0). Total num frames: 834609152. Throughput: 0: 3538.0. Samples: 197816076. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:03,968][134211] Avg episode reward: [(0, '10.535')] [2025-01-04 13:00:04,568][134294] Updated weights for policy 0, policy_version 203764 (0.0023) [2025-01-04 13:00:06,529][134294] Updated weights for policy 0, policy_version 203774 (0.0013) [2025-01-04 13:00:08,415][134294] Updated weights for policy 0, policy_version 203784 (0.0014) [2025-01-04 13:00:08,967][134211] Fps is (10 sec: 17203.4, 60 sec: 14540.8, 300 sec: 14926.1). Total num frames: 834707456. Throughput: 0: 3635.4. Samples: 197841332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:08,968][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 13:00:10,748][134294] Updated weights for policy 0, policy_version 203794 (0.0023) [2025-01-04 13:00:13,699][134294] Updated weights for policy 0, policy_version 203804 (0.0027) [2025-01-04 13:00:13,968][134211] Fps is (10 sec: 17202.8, 60 sec: 14677.3, 300 sec: 14953.9). Total num frames: 834781184. Throughput: 0: 3739.3. Samples: 197866728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:13,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 13:00:16,690][134294] Updated weights for policy 0, policy_version 203814 (0.0023) [2025-01-04 13:00:18,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14609.2, 300 sec: 14940.0). Total num frames: 834850816. Throughput: 0: 3725.9. Samples: 197876866. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:18,968][134211] Avg episode reward: [(0, '9.305')] [2025-01-04 13:00:19,769][134294] Updated weights for policy 0, policy_version 203824 (0.0025) [2025-01-04 13:00:22,688][134294] Updated weights for policy 0, policy_version 203834 (0.0024) [2025-01-04 13:00:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 834920448. Throughput: 0: 3698.7. Samples: 197897346. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:23,968][134211] Avg episode reward: [(0, '8.758')] [2025-01-04 13:00:25,645][134294] Updated weights for policy 0, policy_version 203844 (0.0025) [2025-01-04 13:00:28,458][134294] Updated weights for policy 0, policy_version 203854 (0.0023) [2025-01-04 13:00:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14940.0). Total num frames: 834990080. Throughput: 0: 3703.7. Samples: 197918670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:28,968][134211] Avg episode reward: [(0, '10.410')] [2025-01-04 13:00:31,438][134294] Updated weights for policy 0, policy_version 203864 (0.0024) [2025-01-04 13:00:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.0, 300 sec: 14884.4). Total num frames: 835059712. Throughput: 0: 3707.9. Samples: 197929200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:33,968][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 13:00:34,281][134294] Updated weights for policy 0, policy_version 203874 (0.0024) [2025-01-04 13:00:37,288][134294] Updated weights for policy 0, policy_version 203884 (0.0026) [2025-01-04 13:00:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14609.1, 300 sec: 14870.6). Total num frames: 835129344. Throughput: 0: 3690.9. Samples: 197950056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:38,968][134211] Avg episode reward: [(0, '9.492')] [2025-01-04 13:00:40,186][134294] Updated weights for policy 0, policy_version 203894 (0.0024) [2025-01-04 13:00:42,157][134294] Updated weights for policy 0, policy_version 203904 (0.0014) [2025-01-04 13:00:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 835219456. Throughput: 0: 3787.3. Samples: 197975892. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:43,968][134211] Avg episode reward: [(0, '9.940')] [2025-01-04 13:00:44,512][134294] Updated weights for policy 0, policy_version 203914 (0.0020) [2025-01-04 13:00:47,345][134294] Updated weights for policy 0, policy_version 203924 (0.0025) [2025-01-04 13:00:48,968][134211] Fps is (10 sec: 16383.8, 60 sec: 15018.7, 300 sec: 14953.9). Total num frames: 835293184. Throughput: 0: 3801.3. Samples: 197987136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:48,968][134211] Avg episode reward: [(0, '10.790')] [2025-01-04 13:00:50,331][134294] Updated weights for policy 0, policy_version 203934 (0.0025) [2025-01-04 13:00:53,169][134294] Updated weights for policy 0, policy_version 203944 (0.0022) [2025-01-04 13:00:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14950.4, 300 sec: 14926.1). Total num frames: 835362816. Throughput: 0: 3710.2. Samples: 198008292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:53,968][134211] Avg episode reward: [(0, '9.279')] [2025-01-04 13:00:56,151][134294] Updated weights for policy 0, policy_version 203954 (0.0023) [2025-01-04 13:00:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14884.5). Total num frames: 835432448. Throughput: 0: 3610.3. Samples: 198029192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:00:58,968][134211] Avg episode reward: [(0, '9.200')] [2025-01-04 13:00:59,041][134294] Updated weights for policy 0, policy_version 203964 (0.0026) [2025-01-04 13:01:02,012][134294] Updated weights for policy 0, policy_version 203974 (0.0025) [2025-01-04 13:01:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14884.4). Total num frames: 835502080. Throughput: 0: 3616.9. Samples: 198039628. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:01:03,968][134211] Avg episode reward: [(0, '11.169')] [2025-01-04 13:01:04,905][134294] Updated weights for policy 0, policy_version 203984 (0.0021) [2025-01-04 13:01:06,832][134294] Updated weights for policy 0, policy_version 203994 (0.0012) [2025-01-04 13:01:08,667][134294] Updated weights for policy 0, policy_version 204004 (0.0012) [2025-01-04 13:01:08,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14950.4, 300 sec: 14995.5). Total num frames: 835604480. Throughput: 0: 3726.3. Samples: 198065028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:01:08,968][134211] Avg episode reward: [(0, '8.911')] [2025-01-04 13:01:11,061][134294] Updated weights for policy 0, policy_version 204014 (0.0019) [2025-01-04 13:01:13,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14950.4, 300 sec: 14953.9). Total num frames: 835678208. Throughput: 0: 3830.1. Samples: 198091026. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:13,968][134211] Avg episode reward: [(0, '10.225')] [2025-01-04 13:01:14,001][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204024_835682304.pth... [2025-01-04 13:01:14,005][134294] Updated weights for policy 0, policy_version 204024 (0.0025) [2025-01-04 13:01:14,078][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000203146_832086016.pth [2025-01-04 13:01:17,078][134294] Updated weights for policy 0, policy_version 204034 (0.0027) [2025-01-04 13:01:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 835747840. Throughput: 0: 3815.4. Samples: 198100894. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:18,968][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 13:01:20,124][134294] Updated weights for policy 0, policy_version 204044 (0.0026) [2025-01-04 13:01:22,881][134294] Updated weights for policy 0, policy_version 204054 (0.0026) [2025-01-04 13:01:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 835817472. Throughput: 0: 3818.6. Samples: 198121892. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:23,968][134211] Avg episode reward: [(0, '9.867')] [2025-01-04 13:01:25,804][134294] Updated weights for policy 0, policy_version 204064 (0.0024) [2025-01-04 13:01:28,686][134294] Updated weights for policy 0, policy_version 204074 (0.0023) [2025-01-04 13:01:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14940.0). Total num frames: 835887104. Throughput: 0: 3717.9. Samples: 198143198. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:28,968][134211] Avg episode reward: [(0, '9.909')] [2025-01-04 13:01:31,566][134294] Updated weights for policy 0, policy_version 204084 (0.0024) [2025-01-04 13:01:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14884.4). Total num frames: 835956736. Throughput: 0: 3702.4. Samples: 198153742. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:33,968][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 13:01:34,621][134294] Updated weights for policy 0, policy_version 204094 (0.0025) [2025-01-04 13:01:37,463][134294] Updated weights for policy 0, policy_version 204104 (0.0026) [2025-01-04 13:01:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14884.5). Total num frames: 836026368. Throughput: 0: 3695.7. Samples: 198174596. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:38,968][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 13:01:40,472][134294] Updated weights for policy 0, policy_version 204114 (0.0020) [2025-01-04 13:01:43,230][134294] Updated weights for policy 0, policy_version 204124 (0.0025) [2025-01-04 13:01:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14677.3, 300 sec: 14884.5). Total num frames: 836100096. Throughput: 0: 3707.7. Samples: 198196040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:43,968][134211] Avg episode reward: [(0, '9.495')] [2025-01-04 13:01:46,110][134294] Updated weights for policy 0, policy_version 204134 (0.0025) [2025-01-04 13:01:48,278][134294] Updated weights for policy 0, policy_version 204144 (0.0016) [2025-01-04 13:01:48,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 836186112. Throughput: 0: 3713.0. Samples: 198206712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:48,968][134211] Avg episode reward: [(0, '8.809')] [2025-01-04 13:01:50,202][134294] Updated weights for policy 0, policy_version 204154 (0.0012) [2025-01-04 13:01:52,531][134294] Updated weights for policy 0, policy_version 204164 (0.0022) [2025-01-04 13:01:53,968][134211] Fps is (10 sec: 17203.5, 60 sec: 15155.2, 300 sec: 14981.6). Total num frames: 836272128. Throughput: 0: 3798.8. Samples: 198235974. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:53,968][134211] Avg episode reward: [(0, '9.286')] [2025-01-04 13:01:55,561][134294] Updated weights for policy 0, policy_version 204174 (0.0025) [2025-01-04 13:01:58,460][134294] Updated weights for policy 0, policy_version 204184 (0.0022) [2025-01-04 13:01:58,968][134211] Fps is (10 sec: 15564.8, 60 sec: 15155.2, 300 sec: 14884.4). Total num frames: 836341760. Throughput: 0: 3679.9. Samples: 198256620. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:01:58,968][134211] Avg episode reward: [(0, '9.146')] [2025-01-04 13:02:01,403][134294] Updated weights for policy 0, policy_version 204194 (0.0026) [2025-01-04 13:02:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15155.2, 300 sec: 14898.3). Total num frames: 836411392. Throughput: 0: 3690.7. Samples: 198266976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:02:03,968][134211] Avg episode reward: [(0, '9.203')] [2025-01-04 13:02:04,448][134294] Updated weights for policy 0, policy_version 204204 (0.0023) [2025-01-04 13:02:07,411][134294] Updated weights for policy 0, policy_version 204214 (0.0025) [2025-01-04 13:02:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14884.5). Total num frames: 836476928. Throughput: 0: 3678.8. Samples: 198287438. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:02:08,968][134211] Avg episode reward: [(0, '8.877')] [2025-01-04 13:02:10,409][134294] Updated weights for policy 0, policy_version 204224 (0.0022) [2025-01-04 13:02:13,198][134294] Updated weights for policy 0, policy_version 204234 (0.0022) [2025-01-04 13:02:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14898.3). Total num frames: 836550656. Throughput: 0: 3675.9. Samples: 198308616. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 13:02:13,968][134211] Avg episode reward: [(0, '8.698')] [2025-01-04 13:02:16,168][134294] Updated weights for policy 0, policy_version 204244 (0.0025) [2025-01-04 13:02:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14540.8, 300 sec: 14898.3). Total num frames: 836620288. Throughput: 0: 3676.9. Samples: 198319202. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:18,968][134211] Avg episode reward: [(0, '9.091')] [2025-01-04 13:02:19,073][134294] Updated weights for policy 0, policy_version 204254 (0.0026) [2025-01-04 13:02:21,987][134294] Updated weights for policy 0, policy_version 204264 (0.0023) [2025-01-04 13:02:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14898.3). Total num frames: 836689920. Throughput: 0: 3680.0. Samples: 198340196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:23,968][134211] Avg episode reward: [(0, '9.823')] [2025-01-04 13:02:24,958][134294] Updated weights for policy 0, policy_version 204274 (0.0024) [2025-01-04 13:02:27,359][134294] Updated weights for policy 0, policy_version 204284 (0.0019) [2025-01-04 13:02:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14882.1, 300 sec: 14953.9). Total num frames: 836780032. Throughput: 0: 3745.9. Samples: 198364606. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:28,968][134211] Avg episode reward: [(0, '9.252')] [2025-01-04 13:02:29,244][134294] Updated weights for policy 0, policy_version 204294 (0.0011) [2025-01-04 13:02:31,126][134294] Updated weights for policy 0, policy_version 204304 (0.0013) [2025-01-04 13:02:32,966][134294] Updated weights for policy 0, policy_version 204314 (0.0013) [2025-01-04 13:02:33,968][134211] Fps is (10 sec: 20070.9, 60 sec: 15564.9, 300 sec: 15092.7). Total num frames: 836890624. Throughput: 0: 3873.7. Samples: 198381026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:33,968][134211] Avg episode reward: [(0, '9.245')] [2025-01-04 13:02:34,808][134294] Updated weights for policy 0, policy_version 204324 (0.0014) [2025-01-04 13:02:36,807][134294] Updated weights for policy 0, policy_version 204334 (0.0015) [2025-01-04 13:02:38,968][134211] Fps is (10 sec: 19250.8, 60 sec: 15769.6, 300 sec: 15009.4). Total num frames: 836972544. Throughput: 0: 3925.0. Samples: 198412600. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:38,968][134211] Avg episode reward: [(0, '9.070')] [2025-01-04 13:02:40,895][134294] Updated weights for policy 0, policy_version 204344 (0.0021) [2025-01-04 13:02:43,968][134211] Fps is (10 sec: 13925.2, 60 sec: 15496.4, 300 sec: 14828.9). Total num frames: 837029888. Throughput: 0: 3829.8. Samples: 198428964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:43,969][134211] Avg episode reward: [(0, '9.533')] [2025-01-04 13:02:44,351][134294] Updated weights for policy 0, policy_version 204354 (0.0030) [2025-01-04 13:02:48,458][134294] Updated weights for policy 0, policy_version 204364 (0.0035) [2025-01-04 13:02:48,968][134211] Fps is (10 sec: 10649.3, 60 sec: 14882.1, 300 sec: 14759.6). Total num frames: 837079040. Throughput: 0: 3760.8. Samples: 198436212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:48,969][134211] Avg episode reward: [(0, '9.928')] [2025-01-04 13:02:51,925][134294] Updated weights for policy 0, policy_version 204374 (0.0031) [2025-01-04 13:02:53,968][134211] Fps is (10 sec: 10649.6, 60 sec: 14404.1, 300 sec: 14731.7). Total num frames: 837136384. Throughput: 0: 3675.7. Samples: 198452846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:53,969][134211] Avg episode reward: [(0, '8.504')] [2025-01-04 13:02:56,034][134294] Updated weights for policy 0, policy_version 204384 (0.0029) [2025-01-04 13:02:58,189][134294] Updated weights for policy 0, policy_version 204394 (0.0014) [2025-01-04 13:02:58,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14540.8, 300 sec: 14759.5). Total num frames: 837214208. Throughput: 0: 3649.0. Samples: 198472822. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:02:58,968][134211] Avg episode reward: [(0, '9.243')] [2025-01-04 13:03:00,117][134294] Updated weights for policy 0, policy_version 204404 (0.0013) [2025-01-04 13:03:02,012][134294] Updated weights for policy 0, policy_version 204414 (0.0013) [2025-01-04 13:03:03,968][134211] Fps is (10 sec: 17613.8, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 837312512. Throughput: 0: 3765.0. Samples: 198488626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:03:03,969][134211] Avg episode reward: [(0, '11.187')] [2025-01-04 13:03:04,654][134294] Updated weights for policy 0, policy_version 204424 (0.0021) [2025-01-04 13:03:08,111][134294] Updated weights for policy 0, policy_version 204434 (0.0028) [2025-01-04 13:03:08,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14882.1, 300 sec: 14828.9). Total num frames: 837369856. Throughput: 0: 3793.7. Samples: 198510914. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:03:08,968][134211] Avg episode reward: [(0, '9.125')] [2025-01-04 13:03:11,406][134294] Updated weights for policy 0, policy_version 204444 (0.0027) [2025-01-04 13:03:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 837435392. Throughput: 0: 3662.5. Samples: 198529420. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:03:13,969][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 13:03:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204452_837435392.pth... 
[2025-01-04 13:03:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000203590_833904640.pth [2025-01-04 13:03:14,730][134294] Updated weights for policy 0, policy_version 204454 (0.0024) [2025-01-04 13:03:18,179][134294] Updated weights for policy 0, policy_version 204464 (0.0030) [2025-01-04 13:03:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14540.8, 300 sec: 14759.5). Total num frames: 837492736. Throughput: 0: 3495.1. Samples: 198538308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:03:18,968][134211] Avg episode reward: [(0, '10.208')] [2025-01-04 13:03:21,266][134294] Updated weights for policy 0, policy_version 204474 (0.0027) [2025-01-04 13:03:23,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 837554176. Throughput: 0: 3217.7. Samples: 198557398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:23,968][134211] Avg episode reward: [(0, '10.465')] [2025-01-04 13:03:24,666][134294] Updated weights for policy 0, policy_version 204484 (0.0026) [2025-01-04 13:03:27,825][134294] Updated weights for policy 0, policy_version 204494 (0.0031) [2025-01-04 13:03:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13994.6, 300 sec: 14704.0). Total num frames: 837619712. Throughput: 0: 3269.0. Samples: 198576068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:28,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 13:03:30,857][134294] Updated weights for policy 0, policy_version 204504 (0.0024) [2025-01-04 13:03:33,933][134294] Updated weights for policy 0, policy_version 204514 (0.0026) [2025-01-04 13:03:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13312.0, 300 sec: 14703.9). Total num frames: 837689344. Throughput: 0: 3337.5. Samples: 198586400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:33,969][134211] Avg episode reward: [(0, '8.562')] [2025-01-04 13:03:37,218][134294] Updated weights for policy 0, policy_version 204524 (0.0024) [2025-01-04 13:03:38,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13175.5, 300 sec: 14717.8). Total num frames: 837763072. Throughput: 0: 3393.7. Samples: 198605560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:38,968][134211] Avg episode reward: [(0, '9.770')] [2025-01-04 13:03:39,154][134294] Updated weights for policy 0, policy_version 204534 (0.0015) [2025-01-04 13:03:41,206][134294] Updated weights for policy 0, policy_version 204544 (0.0014) [2025-01-04 13:03:43,208][134294] Updated weights for policy 0, policy_version 204554 (0.0015) [2025-01-04 13:03:43,968][134211] Fps is (10 sec: 17203.2, 60 sec: 13858.3, 300 sec: 14815.0). Total num frames: 837861376. Throughput: 0: 3638.5. Samples: 198636556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:43,968][134211] Avg episode reward: [(0, '9.233')] [2025-01-04 13:03:46,818][134294] Updated weights for policy 0, policy_version 204564 (0.0033) [2025-01-04 13:03:48,968][134211] Fps is (10 sec: 15154.4, 60 sec: 13926.4, 300 sec: 14745.6). Total num frames: 837914624. Throughput: 0: 3482.0. Samples: 198645316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:48,969][134211] Avg episode reward: [(0, '8.405')] [2025-01-04 13:03:50,538][134294] Updated weights for policy 0, policy_version 204574 (0.0027) [2025-01-04 13:03:53,920][134294] Updated weights for policy 0, policy_version 204584 (0.0029) [2025-01-04 13:03:53,968][134211] Fps is (10 sec: 11468.7, 60 sec: 13994.8, 300 sec: 14634.5). Total num frames: 837976064. Throughput: 0: 3349.2. Samples: 198661630. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:53,968][134211] Avg episode reward: [(0, '9.463')] [2025-01-04 13:03:57,444][134294] Updated weights for policy 0, policy_version 204594 (0.0030) [2025-01-04 13:03:58,968][134211] Fps is (10 sec: 11878.8, 60 sec: 13653.3, 300 sec: 14467.9). Total num frames: 838033408. Throughput: 0: 3337.3. Samples: 198679596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:03:58,968][134211] Avg episode reward: [(0, '9.464')] [2025-01-04 13:04:00,753][134294] Updated weights for policy 0, policy_version 204604 (0.0027) [2025-01-04 13:04:03,797][134294] Updated weights for policy 0, policy_version 204614 (0.0024) [2025-01-04 13:04:03,967][134211] Fps is (10 sec: 12288.3, 60 sec: 13107.3, 300 sec: 14454.0). Total num frames: 838098944. Throughput: 0: 3345.7. Samples: 198688862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:03,968][134211] Avg episode reward: [(0, '8.457')] [2025-01-04 13:04:05,882][134294] Updated weights for policy 0, policy_version 204624 (0.0017) [2025-01-04 13:04:08,858][134294] Updated weights for policy 0, policy_version 204634 (0.0024) [2025-01-04 13:04:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 13516.8, 300 sec: 14509.6). Total num frames: 838180864. Throughput: 0: 3454.0. Samples: 198712830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:08,968][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 13:04:12,338][134294] Updated weights for policy 0, policy_version 204644 (0.0029) [2025-01-04 13:04:13,967][134211] Fps is (10 sec: 14336.0, 60 sec: 13448.6, 300 sec: 14467.9). Total num frames: 838242304. Throughput: 0: 3449.8. Samples: 198731310. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:13,968][134211] Avg episode reward: [(0, '8.502')] [2025-01-04 13:04:14,827][134294] Updated weights for policy 0, policy_version 204654 (0.0018) [2025-01-04 13:04:16,800][134294] Updated weights for policy 0, policy_version 204664 (0.0014) [2025-01-04 13:04:18,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14131.2, 300 sec: 14579.0). Total num frames: 838340608. Throughput: 0: 3544.9. Samples: 198745922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:18,968][134211] Avg episode reward: [(0, '9.758')] [2025-01-04 13:04:19,023][134294] Updated weights for policy 0, policy_version 204674 (0.0023) [2025-01-04 13:04:22,213][134294] Updated weights for policy 0, policy_version 204684 (0.0029) [2025-01-04 13:04:23,968][134211] Fps is (10 sec: 16383.5, 60 sec: 14199.5, 300 sec: 14565.1). Total num frames: 838406144. Throughput: 0: 3637.4. Samples: 198769244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:23,968][134211] Avg episode reward: [(0, '9.288')] [2025-01-04 13:04:25,443][134294] Updated weights for policy 0, policy_version 204694 (0.0025) [2025-01-04 13:04:28,637][134294] Updated weights for policy 0, policy_version 204704 (0.0027) [2025-01-04 13:04:28,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14131.2, 300 sec: 14523.4). Total num frames: 838467584. Throughput: 0: 3384.5. Samples: 198788858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:04:28,968][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 13:04:31,993][134294] Updated weights for policy 0, policy_version 204714 (0.0027) [2025-01-04 13:04:33,968][134211] Fps is (10 sec: 12288.1, 60 sec: 13994.7, 300 sec: 14495.7). Total num frames: 838529024. Throughput: 0: 3390.8. Samples: 198797900. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:33,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 13:04:35,407][134294] Updated weights for policy 0, policy_version 204724 (0.0029) [2025-01-04 13:04:38,552][134294] Updated weights for policy 0, policy_version 204734 (0.0025) [2025-01-04 13:04:38,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13789.8, 300 sec: 14467.9). Total num frames: 838590464. Throughput: 0: 3434.5. Samples: 198816184. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:38,968][134211] Avg episode reward: [(0, '9.923')] [2025-01-04 13:04:40,823][134294] Updated weights for policy 0, policy_version 204744 (0.0016) [2025-01-04 13:04:43,790][134294] Updated weights for policy 0, policy_version 204754 (0.0024) [2025-01-04 13:04:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13516.8, 300 sec: 14509.6). Total num frames: 838672384. Throughput: 0: 3555.2. Samples: 198839582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:43,968][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 13:04:46,864][134294] Updated weights for policy 0, policy_version 204764 (0.0026) [2025-01-04 13:04:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13653.4, 300 sec: 14467.9). Total num frames: 838733824. Throughput: 0: 3565.9. Samples: 198849328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:48,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 13:04:50,390][134294] Updated weights for policy 0, policy_version 204774 (0.0029) [2025-01-04 13:04:53,181][134294] Updated weights for policy 0, policy_version 204784 (0.0020) [2025-01-04 13:04:53,967][134211] Fps is (10 sec: 13926.8, 60 sec: 13926.5, 300 sec: 14495.7). Total num frames: 838811648. Throughput: 0: 3434.8. Samples: 198867394. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:53,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 13:04:55,125][134294] Updated weights for policy 0, policy_version 204794 (0.0014) [2025-01-04 13:04:57,062][134294] Updated weights for policy 0, policy_version 204804 (0.0014) [2025-01-04 13:04:58,968][134211] Fps is (10 sec: 18022.8, 60 sec: 14677.4, 300 sec: 14592.9). Total num frames: 838914048. Throughput: 0: 3708.6. Samples: 198898196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:04:58,968][134211] Avg episode reward: [(0, '10.411')] [2025-01-04 13:04:59,034][134294] Updated weights for policy 0, policy_version 204814 (0.0015) [2025-01-04 13:05:01,535][134294] Updated weights for policy 0, policy_version 204824 (0.0021) [2025-01-04 13:05:03,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14813.8, 300 sec: 14509.5). Total num frames: 838987776. Throughput: 0: 3687.6. Samples: 198911864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:03,968][134211] Avg episode reward: [(0, '8.498')] [2025-01-04 13:05:04,987][134294] Updated weights for policy 0, policy_version 204834 (0.0031) [2025-01-04 13:05:08,547][134294] Updated weights for policy 0, policy_version 204844 (0.0031) [2025-01-04 13:05:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 839045120. Throughput: 0: 3560.0. Samples: 198929442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:08,968][134211] Avg episode reward: [(0, '10.204')] [2025-01-04 13:05:11,525][134294] Updated weights for policy 0, policy_version 204854 (0.0026) [2025-01-04 13:05:13,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 839110656. Throughput: 0: 3555.7. Samples: 198948864. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:13,968][134211] Avg episode reward: [(0, '9.526')] [2025-01-04 13:05:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204861_839110656.pth... [2025-01-04 13:05:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204024_835682304.pth [2025-01-04 13:05:14,810][134294] Updated weights for policy 0, policy_version 204864 (0.0023) [2025-01-04 13:05:17,774][134294] Updated weights for policy 0, policy_version 204874 (0.0025) [2025-01-04 13:05:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13926.4, 300 sec: 14426.3). Total num frames: 839176192. Throughput: 0: 3568.4. Samples: 198958476. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:18,968][134211] Avg episode reward: [(0, '8.952')] [2025-01-04 13:05:20,861][134294] Updated weights for policy 0, policy_version 204884 (0.0024) [2025-01-04 13:05:23,969][134211] Fps is (10 sec: 13106.4, 60 sec: 13926.2, 300 sec: 14412.3). Total num frames: 839241728. Throughput: 0: 3610.6. Samples: 198978662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:23,969][134211] Avg episode reward: [(0, '8.782')] [2025-01-04 13:05:24,047][134294] Updated weights for policy 0, policy_version 204894 (0.0027) [2025-01-04 13:05:27,591][134294] Updated weights for policy 0, policy_version 204904 (0.0031) [2025-01-04 13:05:28,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13858.1, 300 sec: 14370.7). Total num frames: 839299072. Throughput: 0: 3489.2. Samples: 198996594. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:28,969][134211] Avg episode reward: [(0, '9.465')] [2025-01-04 13:05:31,071][134294] Updated weights for policy 0, policy_version 204914 (0.0031) [2025-01-04 13:05:33,868][134294] Updated weights for policy 0, policy_version 204924 (0.0017) [2025-01-04 13:05:33,967][134211] Fps is (10 sec: 12698.8, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 839368704. Throughput: 0: 3466.3. Samples: 199005312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:05:33,968][134211] Avg episode reward: [(0, '9.983')] [2025-01-04 13:05:35,965][134294] Updated weights for policy 0, policy_version 204934 (0.0012) [2025-01-04 13:05:37,975][134294] Updated weights for policy 0, policy_version 204944 (0.0013) [2025-01-04 13:05:38,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14677.4, 300 sec: 14412.4). Total num frames: 839471104. Throughput: 0: 3649.0. Samples: 199031598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:05:38,968][134211] Avg episode reward: [(0, '9.737')] [2025-01-04 13:05:40,365][134294] Updated weights for policy 0, policy_version 204954 (0.0019) [2025-01-04 13:05:43,565][134294] Updated weights for policy 0, policy_version 204964 (0.0027) [2025-01-04 13:05:43,968][134211] Fps is (10 sec: 16793.0, 60 sec: 14404.2, 300 sec: 14384.6). Total num frames: 839536640. Throughput: 0: 3494.1. Samples: 199055432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:05:43,969][134211] Avg episode reward: [(0, '9.689')] [2025-01-04 13:05:47,312][134294] Updated weights for policy 0, policy_version 204974 (0.0033) [2025-01-04 13:05:48,968][134211] Fps is (10 sec: 11878.3, 60 sec: 14267.7, 300 sec: 14329.1). Total num frames: 839589888. Throughput: 0: 3368.0. Samples: 199063426. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:05:48,968][134211] Avg episode reward: [(0, '10.905')] [2025-01-04 13:05:50,604][134294] Updated weights for policy 0, policy_version 204984 (0.0030) [2025-01-04 13:05:53,609][134294] Updated weights for policy 0, policy_version 204994 (0.0026) [2025-01-04 13:05:53,969][134211] Fps is (10 sec: 11877.5, 60 sec: 14062.7, 300 sec: 14315.1). Total num frames: 839655424. Throughput: 0: 3394.7. Samples: 199082208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:05:53,969][134211] Avg episode reward: [(0, '9.413')] [2025-01-04 13:05:56,887][134294] Updated weights for policy 0, policy_version 205004 (0.0029) [2025-01-04 13:05:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13448.5, 300 sec: 14301.3). Total num frames: 839720960. Throughput: 0: 3380.0. Samples: 199100962. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:05:58,968][134211] Avg episode reward: [(0, '8.769')] [2025-01-04 13:06:00,324][134294] Updated weights for policy 0, policy_version 205014 (0.0028) [2025-01-04 13:06:03,480][134294] Updated weights for policy 0, policy_version 205024 (0.0032) [2025-01-04 13:06:03,968][134211] Fps is (10 sec: 12698.6, 60 sec: 13243.7, 300 sec: 14162.4). Total num frames: 839782400. Throughput: 0: 3377.7. Samples: 199110474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:03,969][134211] Avg episode reward: [(0, '8.870')] [2025-01-04 13:06:06,666][134294] Updated weights for policy 0, policy_version 205034 (0.0024) [2025-01-04 13:06:08,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13312.0, 300 sec: 14120.8). Total num frames: 839843840. Throughput: 0: 3352.9. Samples: 199129538. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:08,969][134211] Avg episode reward: [(0, '10.098')] [2025-01-04 13:06:10,125][134294] Updated weights for policy 0, policy_version 205044 (0.0025) [2025-01-04 13:06:13,426][134294] Updated weights for policy 0, policy_version 205054 (0.0027) [2025-01-04 13:06:13,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13243.7, 300 sec: 14093.0). Total num frames: 839905280. Throughput: 0: 3360.1. Samples: 199147798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:13,969][134211] Avg episode reward: [(0, '9.273')] [2025-01-04 13:06:16,255][134294] Updated weights for policy 0, policy_version 205064 (0.0020) [2025-01-04 13:06:18,287][134294] Updated weights for policy 0, policy_version 205074 (0.0012) [2025-01-04 13:06:18,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13653.3, 300 sec: 14162.4). Total num frames: 839995392. Throughput: 0: 3403.9. Samples: 199158488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:18,968][134211] Avg episode reward: [(0, '8.936')] [2025-01-04 13:06:20,827][134294] Updated weights for policy 0, policy_version 205084 (0.0021) [2025-01-04 13:06:23,957][134294] Updated weights for policy 0, policy_version 205094 (0.0029) [2025-01-04 13:06:23,968][134211] Fps is (10 sec: 15974.4, 60 sec: 13721.8, 300 sec: 14162.4). Total num frames: 840065024. Throughput: 0: 3379.1. Samples: 199183656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:23,969][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 13:06:26,982][134294] Updated weights for policy 0, policy_version 205104 (0.0026) [2025-01-04 13:06:28,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13858.2, 300 sec: 14148.6). Total num frames: 840130560. Throughput: 0: 3293.6. Samples: 199203644. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:28,968][134211] Avg episode reward: [(0, '9.588')] [2025-01-04 13:06:30,012][134294] Updated weights for policy 0, policy_version 205114 (0.0025) [2025-01-04 13:06:32,956][134294] Updated weights for policy 0, policy_version 205124 (0.0024) [2025-01-04 13:06:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14148.5). Total num frames: 840200192. Throughput: 0: 3348.0. Samples: 199214086. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:33,968][134211] Avg episode reward: [(0, '9.879')] [2025-01-04 13:06:35,858][134294] Updated weights for policy 0, policy_version 205134 (0.0026) [2025-01-04 13:06:38,819][134294] Updated weights for policy 0, policy_version 205144 (0.0022) [2025-01-04 13:06:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13312.0, 300 sec: 14134.7). Total num frames: 840269824. Throughput: 0: 3395.0. Samples: 199234980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:38,968][134211] Avg episode reward: [(0, '8.505')] [2025-01-04 13:06:41,858][134294] Updated weights for policy 0, policy_version 205154 (0.0024) [2025-01-04 13:06:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13380.3, 300 sec: 14079.1). Total num frames: 840339456. Throughput: 0: 3426.9. Samples: 199255174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:43,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 13:06:44,954][134294] Updated weights for policy 0, policy_version 205164 (0.0026) [2025-01-04 13:06:47,904][134294] Updated weights for policy 0, policy_version 205174 (0.0026) [2025-01-04 13:06:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.1, 300 sec: 14009.7). Total num frames: 840404992. Throughput: 0: 3438.7. Samples: 199265216. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:48,968][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 13:06:50,903][134294] Updated weights for policy 0, policy_version 205184 (0.0026) [2025-01-04 13:06:53,943][134294] Updated weights for policy 0, policy_version 205194 (0.0024) [2025-01-04 13:06:53,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13653.5, 300 sec: 14009.7). Total num frames: 840474624. Throughput: 0: 3469.9. Samples: 199285684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:53,969][134211] Avg episode reward: [(0, '10.671')] [2025-01-04 13:06:56,964][134294] Updated weights for policy 0, policy_version 205204 (0.0026) [2025-01-04 13:06:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.3, 300 sec: 13995.8). Total num frames: 840540160. Throughput: 0: 3508.5. Samples: 199305678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:06:58,970][134211] Avg episode reward: [(0, '8.708')] [2025-01-04 13:07:00,137][134294] Updated weights for policy 0, policy_version 205214 (0.0026) [2025-01-04 13:07:03,178][134294] Updated weights for policy 0, policy_version 205224 (0.0022) [2025-01-04 13:07:03,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13721.6, 300 sec: 13995.8). Total num frames: 840605696. Throughput: 0: 3493.8. Samples: 199315708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:03,968][134211] Avg episode reward: [(0, '9.974')] [2025-01-04 13:07:06,231][134294] Updated weights for policy 0, policy_version 205234 (0.0025) [2025-01-04 13:07:08,163][134294] Updated weights for policy 0, policy_version 205244 (0.0015) [2025-01-04 13:07:08,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 840691712. Throughput: 0: 3427.2. Samples: 199337882. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:08,968][134211] Avg episode reward: [(0, '9.799')] [2025-01-04 13:07:11,049][134294] Updated weights for policy 0, policy_version 205254 (0.0026) [2025-01-04 13:07:13,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14199.5, 300 sec: 14023.6). Total num frames: 840757248. Throughput: 0: 3480.4. Samples: 199360264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:13,969][134211] Avg episode reward: [(0, '9.264')] [2025-01-04 13:07:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000205263_840757248.pth... [2025-01-04 13:07:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204452_837435392.pth [2025-01-04 13:07:14,199][134294] Updated weights for policy 0, policy_version 205264 (0.0025) [2025-01-04 13:07:17,212][134294] Updated weights for policy 0, policy_version 205274 (0.0026) [2025-01-04 13:07:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13789.9, 300 sec: 14009.7). Total num frames: 840822784. Throughput: 0: 3468.8. Samples: 199370182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:18,968][134211] Avg episode reward: [(0, '9.645')] [2025-01-04 13:07:20,178][134294] Updated weights for policy 0, policy_version 205284 (0.0027) [2025-01-04 13:07:23,063][134294] Updated weights for policy 0, policy_version 205294 (0.0024) [2025-01-04 13:07:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13858.1, 300 sec: 13954.2). Total num frames: 840896512. Throughput: 0: 3470.7. Samples: 199391160. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:23,968][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 13:07:26,032][134294] Updated weights for policy 0, policy_version 205304 (0.0025) [2025-01-04 13:07:28,580][134294] Updated weights for policy 0, policy_version 205314 (0.0018) [2025-01-04 13:07:28,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14062.9, 300 sec: 13843.1). Total num frames: 840974336. Throughput: 0: 3500.5. Samples: 199412698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:28,968][134211] Avg episode reward: [(0, '9.676')] [2025-01-04 13:07:30,792][134294] Updated weights for policy 0, policy_version 205324 (0.0016) [2025-01-04 13:07:33,749][134294] Updated weights for policy 0, policy_version 205334 (0.0023) [2025-01-04 13:07:33,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14131.2, 300 sec: 13815.3). Total num frames: 841048064. Throughput: 0: 3576.4. Samples: 199426152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:33,968][134211] Avg episode reward: [(0, '9.910')] [2025-01-04 13:07:36,764][134294] Updated weights for policy 0, policy_version 205344 (0.0025) [2025-01-04 13:07:38,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14131.2, 300 sec: 13857.0). Total num frames: 841117696. Throughput: 0: 3580.5. Samples: 199446804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:38,968][134211] Avg episode reward: [(0, '10.064')] [2025-01-04 13:07:39,888][134294] Updated weights for policy 0, policy_version 205354 (0.0027) [2025-01-04 13:07:42,777][134294] Updated weights for policy 0, policy_version 205364 (0.0026) [2025-01-04 13:07:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.0, 300 sec: 13912.5). Total num frames: 841183232. Throughput: 0: 3587.7. Samples: 199467126. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:43,968][134211] Avg episode reward: [(0, '10.435')] [2025-01-04 13:07:45,869][134294] Updated weights for policy 0, policy_version 205374 (0.0026) [2025-01-04 13:07:48,657][134294] Updated weights for policy 0, policy_version 205384 (0.0025) [2025-01-04 13:07:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 13954.2). Total num frames: 841252864. Throughput: 0: 3593.8. Samples: 199477428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:07:48,968][134211] Avg episode reward: [(0, '9.751')] [2025-01-04 13:07:51,681][134294] Updated weights for policy 0, policy_version 205394 (0.0025) [2025-01-04 13:07:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.3, 300 sec: 13926.4). Total num frames: 841322496. Throughput: 0: 3565.4. Samples: 199498324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:07:53,968][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 13:07:54,694][134294] Updated weights for policy 0, policy_version 205404 (0.0022) [2025-01-04 13:07:57,545][134294] Updated weights for policy 0, policy_version 205414 (0.0022) [2025-01-04 13:07:58,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14404.3, 300 sec: 13870.9). Total num frames: 841404416. Throughput: 0: 3558.6. Samples: 199520402. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:07:58,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 13:07:59,463][134294] Updated weights for policy 0, policy_version 205424 (0.0013) [2025-01-04 13:08:01,331][134294] Updated weights for policy 0, policy_version 205434 (0.0013) [2025-01-04 13:08:03,264][134294] Updated weights for policy 0, policy_version 205444 (0.0013) [2025-01-04 13:08:03,967][134211] Fps is (10 sec: 18841.9, 60 sec: 15087.0, 300 sec: 14037.5). Total num frames: 841510912. Throughput: 0: 3701.5. Samples: 199536750. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:03,968][134211] Avg episode reward: [(0, '10.110')] [2025-01-04 13:08:05,165][134294] Updated weights for policy 0, policy_version 205454 (0.0014) [2025-01-04 13:08:08,176][134294] Updated weights for policy 0, policy_version 205464 (0.0029) [2025-01-04 13:08:08,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14950.4, 300 sec: 14079.1). Total num frames: 841588736. Throughput: 0: 3873.8. Samples: 199565480. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:08,969][134211] Avg episode reward: [(0, '10.455')] [2025-01-04 13:08:11,294][134294] Updated weights for policy 0, policy_version 205474 (0.0027) [2025-01-04 13:08:13,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14950.4, 300 sec: 14106.9). Total num frames: 841654272. Throughput: 0: 3813.9. Samples: 199584324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:13,969][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 13:08:14,692][134294] Updated weights for policy 0, policy_version 205484 (0.0030) [2025-01-04 13:08:17,754][134294] Updated weights for policy 0, policy_version 205494 (0.0027) [2025-01-04 13:08:18,968][134211] Fps is (10 sec: 12697.8, 60 sec: 14882.1, 300 sec: 14106.9). Total num frames: 841715712. Throughput: 0: 3725.4. Samples: 199593794. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:18,968][134211] Avg episode reward: [(0, '8.651')] [2025-01-04 13:08:20,781][134294] Updated weights for policy 0, policy_version 205504 (0.0027) [2025-01-04 13:08:23,784][134294] Updated weights for policy 0, policy_version 205514 (0.0024) [2025-01-04 13:08:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 14120.8). Total num frames: 841785344. Throughput: 0: 3720.6. Samples: 199614230. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:23,968][134211] Avg episode reward: [(0, '9.480')] [2025-01-04 13:08:26,633][134294] Updated weights for policy 0, policy_version 205524 (0.0025) [2025-01-04 13:08:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14120.8). Total num frames: 841854976. Throughput: 0: 3720.0. Samples: 199634524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:28,968][134211] Avg episode reward: [(0, '8.834')] [2025-01-04 13:08:29,866][134294] Updated weights for policy 0, policy_version 205534 (0.0023) [2025-01-04 13:08:32,789][134294] Updated weights for policy 0, policy_version 205544 (0.0027) [2025-01-04 13:08:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14093.0). Total num frames: 841920512. Throughput: 0: 3714.4. Samples: 199644576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:33,968][134211] Avg episode reward: [(0, '9.929')] [2025-01-04 13:08:35,770][134294] Updated weights for policy 0, policy_version 205554 (0.0024) [2025-01-04 13:08:38,728][134294] Updated weights for policy 0, policy_version 205564 (0.0025) [2025-01-04 13:08:38,968][134211] Fps is (10 sec: 13925.3, 60 sec: 14608.9, 300 sec: 14009.7). Total num frames: 841994240. Throughput: 0: 3714.3. Samples: 199665468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:38,969][134211] Avg episode reward: [(0, '9.947')] [2025-01-04 13:08:41,636][134294] Updated weights for policy 0, policy_version 205574 (0.0022) [2025-01-04 13:08:43,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14608.9, 300 sec: 14051.4). Total num frames: 842059776. Throughput: 0: 3680.4. Samples: 199686024. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:43,969][134211] Avg episode reward: [(0, '10.977')] [2025-01-04 13:08:44,739][134294] Updated weights for policy 0, policy_version 205584 (0.0025) [2025-01-04 13:08:47,732][134294] Updated weights for policy 0, policy_version 205594 (0.0025) [2025-01-04 13:08:48,968][134211] Fps is (10 sec: 13108.2, 60 sec: 14540.8, 300 sec: 14065.3). Total num frames: 842125312. Throughput: 0: 3542.7. Samples: 199696170. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:48,968][134211] Avg episode reward: [(0, '9.673')] [2025-01-04 13:08:50,699][134294] Updated weights for policy 0, policy_version 205604 (0.0024) [2025-01-04 13:08:53,556][134294] Updated weights for policy 0, policy_version 205614 (0.0026) [2025-01-04 13:08:53,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14609.0, 300 sec: 14120.8). Total num frames: 842199040. Throughput: 0: 3370.3. Samples: 199717144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:53,968][134211] Avg episode reward: [(0, '10.664')] [2025-01-04 13:08:56,492][134294] Updated weights for policy 0, policy_version 205624 (0.0024) [2025-01-04 13:08:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14404.2, 300 sec: 14134.7). Total num frames: 842268672. Throughput: 0: 3408.6. Samples: 199737712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:08:58,968][134211] Avg episode reward: [(0, '10.195')] [2025-01-04 13:08:59,564][134294] Updated weights for policy 0, policy_version 205634 (0.0026) [2025-01-04 13:09:02,584][134294] Updated weights for policy 0, policy_version 205644 (0.0028) [2025-01-04 13:09:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13721.5, 300 sec: 14079.1). Total num frames: 842334208. Throughput: 0: 3424.8. Samples: 199747910. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:03,968][134211] Avg episode reward: [(0, '7.931')] [2025-01-04 13:09:05,596][134294] Updated weights for policy 0, policy_version 205654 (0.0025) [2025-01-04 13:09:08,440][134294] Updated weights for policy 0, policy_version 205664 (0.0026) [2025-01-04 13:09:08,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13584.9, 300 sec: 14106.9). Total num frames: 842403840. Throughput: 0: 3435.5. Samples: 199768832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:08,969][134211] Avg episode reward: [(0, '8.821')] [2025-01-04 13:09:11,316][134294] Updated weights for policy 0, policy_version 205674 (0.0022) [2025-01-04 13:09:13,462][134294] Updated weights for policy 0, policy_version 205684 (0.0013) [2025-01-04 13:09:13,968][134211] Fps is (10 sec: 15565.0, 60 sec: 13926.4, 300 sec: 14065.2). Total num frames: 842489856. Throughput: 0: 3504.2. Samples: 199792212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:13,968][134211] Avg episode reward: [(0, '9.654')] [2025-01-04 13:09:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000205686_842489856.pth... [2025-01-04 13:09:14,041][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000204861_839110656.pth [2025-01-04 13:09:16,219][134294] Updated weights for policy 0, policy_version 205694 (0.0023) [2025-01-04 13:09:18,968][134211] Fps is (10 sec: 15565.6, 60 sec: 14062.9, 300 sec: 14079.1). Total num frames: 842559488. Throughput: 0: 3537.0. Samples: 199803740. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:18,968][134211] Avg episode reward: [(0, '9.273')] [2025-01-04 13:09:19,106][134294] Updated weights for policy 0, policy_version 205704 (0.0025) [2025-01-04 13:09:22,169][134294] Updated weights for policy 0, policy_version 205714 (0.0026) [2025-01-04 13:09:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 14093.0). Total num frames: 842625024. Throughput: 0: 3534.3. Samples: 199824510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:23,968][134211] Avg episode reward: [(0, '9.163')] [2025-01-04 13:09:25,156][134294] Updated weights for policy 0, policy_version 205724 (0.0024) [2025-01-04 13:09:28,043][134294] Updated weights for policy 0, policy_version 205734 (0.0021) [2025-01-04 13:09:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14062.9, 300 sec: 14134.7). Total num frames: 842698752. Throughput: 0: 3539.8. Samples: 199845312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:28,968][134211] Avg episode reward: [(0, '10.166')] [2025-01-04 13:09:30,985][134294] Updated weights for policy 0, policy_version 205744 (0.0025) [2025-01-04 13:09:33,368][134294] Updated weights for policy 0, policy_version 205754 (0.0016) [2025-01-04 13:09:33,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14336.0, 300 sec: 14204.1). Total num frames: 842780672. Throughput: 0: 3545.1. Samples: 199855698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:33,968][134211] Avg episode reward: [(0, '10.869')] [2025-01-04 13:09:35,242][134294] Updated weights for policy 0, policy_version 205764 (0.0014) [2025-01-04 13:09:37,422][134294] Updated weights for policy 0, policy_version 205774 (0.0017) [2025-01-04 13:09:38,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14609.2, 300 sec: 14231.9). Total num frames: 842870784. Throughput: 0: 3730.4. Samples: 199885012. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:38,968][134211] Avg episode reward: [(0, '10.393')] [2025-01-04 13:09:40,437][134294] Updated weights for policy 0, policy_version 205784 (0.0028) [2025-01-04 13:09:43,491][134294] Updated weights for policy 0, policy_version 205794 (0.0023) [2025-01-04 13:09:43,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14609.2, 300 sec: 14245.7). Total num frames: 842936320. Throughput: 0: 3725.8. Samples: 199905374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:43,968][134211] Avg episode reward: [(0, '9.585')] [2025-01-04 13:09:46,488][134294] Updated weights for policy 0, policy_version 205804 (0.0024) [2025-01-04 13:09:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14218.0). Total num frames: 843005952. Throughput: 0: 3723.7. Samples: 199915476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:48,969][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 13:09:49,594][134294] Updated weights for policy 0, policy_version 205814 (0.0024) [2025-01-04 13:09:52,581][134294] Updated weights for policy 0, policy_version 205824 (0.0028) [2025-01-04 13:09:53,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14540.7, 300 sec: 14093.0). Total num frames: 843071488. Throughput: 0: 3708.6. Samples: 199935718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:53,969][134211] Avg episode reward: [(0, '8.456')] [2025-01-04 13:09:55,516][134294] Updated weights for policy 0, policy_version 205834 (0.0022) [2025-01-04 13:09:58,430][134294] Updated weights for policy 0, policy_version 205844 (0.0025) [2025-01-04 13:09:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14079.1). Total num frames: 843141120. Throughput: 0: 3652.8. Samples: 199956590. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:09:58,968][134211] Avg episode reward: [(0, '9.428')] [2025-01-04 13:10:01,447][134294] Updated weights for policy 0, policy_version 205854 (0.0025) [2025-01-04 13:10:03,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14609.1, 300 sec: 14120.8). Total num frames: 843210752. Throughput: 0: 3623.8. Samples: 199966810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 13:10:03,968][134211] Avg episode reward: [(0, '9.167')] [2025-01-04 13:10:04,595][134294] Updated weights for policy 0, policy_version 205864 (0.0026) [2025-01-04 13:10:07,466][134294] Updated weights for policy 0, policy_version 205874 (0.0024) [2025-01-04 13:10:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.9, 300 sec: 14120.8). Total num frames: 843276288. Throughput: 0: 3612.8. Samples: 199987084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:08,968][134211] Avg episode reward: [(0, '9.419')] [2025-01-04 13:10:10,419][134294] Updated weights for policy 0, policy_version 205884 (0.0023) [2025-01-04 13:10:13,289][134294] Updated weights for policy 0, policy_version 205894 (0.0023) [2025-01-04 13:10:13,969][134211] Fps is (10 sec: 13925.4, 60 sec: 14335.8, 300 sec: 14148.5). Total num frames: 843350016. Throughput: 0: 3620.6. Samples: 200008244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:13,969][134211] Avg episode reward: [(0, '10.531')] [2025-01-04 13:10:16,263][134294] Updated weights for policy 0, policy_version 205904 (0.0024) [2025-01-04 13:10:18,220][134294] Updated weights for policy 0, policy_version 205914 (0.0015) [2025-01-04 13:10:18,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14609.1, 300 sec: 14218.0). Total num frames: 843436032. Throughput: 0: 3619.0. Samples: 200018554. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:18,968][134211] Avg episode reward: [(0, '9.452')] [2025-01-04 13:10:20,938][134294] Updated weights for policy 0, policy_version 205924 (0.0025) [2025-01-04 13:10:23,862][134294] Updated weights for policy 0, policy_version 205934 (0.0027) [2025-01-04 13:10:23,968][134211] Fps is (10 sec: 15566.1, 60 sec: 14677.4, 300 sec: 14259.6). Total num frames: 843505664. Throughput: 0: 3527.0. Samples: 200043728. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:23,968][134211] Avg episode reward: [(0, '9.467')] [2025-01-04 13:10:26,879][134294] Updated weights for policy 0, policy_version 205944 (0.0024) [2025-01-04 13:10:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 843571200. Throughput: 0: 3530.9. Samples: 200064266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:28,968][134211] Avg episode reward: [(0, '10.433')] [2025-01-04 13:10:29,910][134294] Updated weights for policy 0, policy_version 205954 (0.0025) [2025-01-04 13:10:32,910][134294] Updated weights for policy 0, policy_version 205964 (0.0025) [2025-01-04 13:10:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14134.7). Total num frames: 843640832. Throughput: 0: 3528.2. Samples: 200074244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:33,968][134211] Avg episode reward: [(0, '8.956')] [2025-01-04 13:10:35,911][134294] Updated weights for policy 0, policy_version 205974 (0.0025) [2025-01-04 13:10:38,738][134294] Updated weights for policy 0, policy_version 205984 (0.0024) [2025-01-04 13:10:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13994.6, 300 sec: 14148.6). Total num frames: 843710464. Throughput: 0: 3545.2. Samples: 200095252. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:38,969][134211] Avg episode reward: [(0, '9.812')] [2025-01-04 13:10:41,680][134294] Updated weights for policy 0, policy_version 205994 (0.0024) [2025-01-04 13:10:43,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14062.9, 300 sec: 14204.1). Total num frames: 843780096. Throughput: 0: 3541.5. Samples: 200115958. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:43,968][134211] Avg episode reward: [(0, '9.883')] [2025-01-04 13:10:44,751][134294] Updated weights for policy 0, policy_version 206004 (0.0026) [2025-01-04 13:10:46,657][134294] Updated weights for policy 0, policy_version 206014 (0.0013) [2025-01-04 13:10:48,513][134294] Updated weights for policy 0, policy_version 206024 (0.0013) [2025-01-04 13:10:48,968][134211] Fps is (10 sec: 17203.8, 60 sec: 14609.1, 300 sec: 14329.1). Total num frames: 843882496. Throughput: 0: 3606.0. Samples: 200129078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:48,968][134211] Avg episode reward: [(0, '9.794')] [2025-01-04 13:10:50,394][134294] Updated weights for policy 0, policy_version 206034 (0.0013) [2025-01-04 13:10:52,266][134294] Updated weights for policy 0, policy_version 206044 (0.0014) [2025-01-04 13:10:53,968][134211] Fps is (10 sec: 21299.8, 60 sec: 15360.2, 300 sec: 14481.8). Total num frames: 843993088. Throughput: 0: 3884.7. Samples: 200161894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:53,968][134211] Avg episode reward: [(0, '9.719')] [2025-01-04 13:10:54,135][134294] Updated weights for policy 0, policy_version 206054 (0.0013) [2025-01-04 13:10:55,989][134294] Updated weights for policy 0, policy_version 206064 (0.0015) [2025-01-04 13:10:58,839][134294] Updated weights for policy 0, policy_version 206074 (0.0029) [2025-01-04 13:10:58,969][134211] Fps is (10 sec: 19658.3, 60 sec: 15632.8, 300 sec: 14565.0). Total num frames: 844079104. Throughput: 0: 4061.5. Samples: 200191012. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:10:58,970][134211] Avg episode reward: [(0, '10.354')] [2025-01-04 13:11:02,080][134294] Updated weights for policy 0, policy_version 206084 (0.0034) [2025-01-04 13:11:03,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15496.5, 300 sec: 14565.1). Total num frames: 844140544. Throughput: 0: 4037.4. Samples: 200200236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:11:03,968][134211] Avg episode reward: [(0, '9.508')] [2025-01-04 13:11:05,262][134294] Updated weights for policy 0, policy_version 206094 (0.0029) [2025-01-04 13:11:08,286][134294] Updated weights for policy 0, policy_version 206104 (0.0028) [2025-01-04 13:11:08,968][134211] Fps is (10 sec: 13108.6, 60 sec: 15564.8, 300 sec: 14592.9). Total num frames: 844210176. Throughput: 0: 3915.6. Samples: 200219932. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:11:08,968][134211] Avg episode reward: [(0, '9.509')] [2025-01-04 13:11:11,315][134294] Updated weights for policy 0, policy_version 206114 (0.0027) [2025-01-04 13:11:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15428.4, 300 sec: 14509.6). Total num frames: 844275712. Throughput: 0: 3906.2. Samples: 200240046. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:11:13,969][134211] Avg episode reward: [(0, '10.033')] [2025-01-04 13:11:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206122_844275712.pth... [2025-01-04 13:11:14,061][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000205263_840757248.pth [2025-01-04 13:11:14,404][134294] Updated weights for policy 0, policy_version 206124 (0.0028) [2025-01-04 13:11:17,483][134294] Updated weights for policy 0, policy_version 206134 (0.0024) [2025-01-04 13:11:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15086.9, 300 sec: 14495.7). Total num frames: 844341248. 
Throughput: 0: 3905.0. Samples: 200249970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:18,968][134211] Avg episode reward: [(0, '10.601')] [2025-01-04 13:11:20,474][134294] Updated weights for policy 0, policy_version 206144 (0.0025) [2025-01-04 13:11:23,412][134294] Updated weights for policy 0, policy_version 206154 (0.0025) [2025-01-04 13:11:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15086.9, 300 sec: 14509.5). Total num frames: 844410880. Throughput: 0: 3898.9. Samples: 200270704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:23,968][134211] Avg episode reward: [(0, '9.571')] [2025-01-04 13:11:26,343][134294] Updated weights for policy 0, policy_version 206164 (0.0026) [2025-01-04 13:11:28,968][134211] Fps is (10 sec: 13926.1, 60 sec: 15155.1, 300 sec: 14509.6). Total num frames: 844480512. Throughput: 0: 3893.2. Samples: 200291152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:28,969][134211] Avg episode reward: [(0, '10.408')] [2025-01-04 13:11:29,468][134294] Updated weights for policy 0, policy_version 206174 (0.0027) [2025-01-04 13:11:32,427][134294] Updated weights for policy 0, policy_version 206184 (0.0027) [2025-01-04 13:11:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15086.9, 300 sec: 14495.7). Total num frames: 844546048. Throughput: 0: 3829.1. Samples: 200301388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:33,968][134211] Avg episode reward: [(0, '11.966')] [2025-01-04 13:11:33,993][134264] Saving new best policy, reward=11.966! [2025-01-04 13:11:35,495][134294] Updated weights for policy 0, policy_version 206194 (0.0024) [2025-01-04 13:11:38,296][134294] Updated weights for policy 0, policy_version 206204 (0.0024) [2025-01-04 13:11:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15087.0, 300 sec: 14495.7). Total num frames: 844615680. Throughput: 0: 3557.8. Samples: 200321994. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:38,968][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 13:11:41,295][134294] Updated weights for policy 0, policy_version 206214 (0.0027) [2025-01-04 13:11:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14509.6). Total num frames: 844685312. Throughput: 0: 3368.6. Samples: 200342596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:43,968][134211] Avg episode reward: [(0, '9.002')] [2025-01-04 13:11:44,401][134294] Updated weights for policy 0, policy_version 206224 (0.0028) [2025-01-04 13:11:47,323][134294] Updated weights for policy 0, policy_version 206234 (0.0025) [2025-01-04 13:11:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14509.6). Total num frames: 844754944. Throughput: 0: 3391.1. Samples: 200352836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:48,968][134211] Avg episode reward: [(0, '8.799')] [2025-01-04 13:11:50,253][134294] Updated weights for policy 0, policy_version 206244 (0.0026) [2025-01-04 13:11:53,111][134294] Updated weights for policy 0, policy_version 206254 (0.0026) [2025-01-04 13:11:53,970][134211] Fps is (10 sec: 13923.6, 60 sec: 13857.6, 300 sec: 14523.3). Total num frames: 844824576. Throughput: 0: 3423.7. Samples: 200374006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:53,970][134211] Avg episode reward: [(0, '7.946')] [2025-01-04 13:11:56,039][134294] Updated weights for policy 0, policy_version 206264 (0.0026) [2025-01-04 13:11:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13585.3, 300 sec: 14537.3). Total num frames: 844894208. Throughput: 0: 3432.2. Samples: 200394494. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:11:58,968][134211] Avg episode reward: [(0, '8.314')] [2025-01-04 13:11:59,179][134294] Updated weights for policy 0, policy_version 206274 (0.0025) [2025-01-04 13:12:02,309][134294] Updated weights for policy 0, policy_version 206284 (0.0026) [2025-01-04 13:12:03,968][134211] Fps is (10 sec: 13109.8, 60 sec: 13585.1, 300 sec: 14454.0). Total num frames: 844955648. Throughput: 0: 3438.5. Samples: 200404702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:12:03,969][134211] Avg episode reward: [(0, '9.776')] [2025-01-04 13:12:05,981][134294] Updated weights for policy 0, policy_version 206294 (0.0030) [2025-01-04 13:12:08,625][134294] Updated weights for policy 0, policy_version 206304 (0.0021) [2025-01-04 13:12:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13585.0, 300 sec: 14467.9). Total num frames: 845025280. Throughput: 0: 3359.4. Samples: 200421878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:12:08,968][134211] Avg episode reward: [(0, '9.900')] [2025-01-04 13:12:10,664][134294] Updated weights for policy 0, policy_version 206314 (0.0012) [2025-01-04 13:12:12,569][134294] Updated weights for policy 0, policy_version 206324 (0.0013) [2025-01-04 13:12:13,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14199.5, 300 sec: 14592.9). Total num frames: 845127680. Throughput: 0: 3575.9. Samples: 200452066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:12:13,968][134211] Avg episode reward: [(0, '9.126')] [2025-01-04 13:12:15,086][134294] Updated weights for policy 0, policy_version 206334 (0.0021) [2025-01-04 13:12:18,007][134294] Updated weights for policy 0, policy_version 206344 (0.0024) [2025-01-04 13:12:18,968][134211] Fps is (10 sec: 17203.7, 60 sec: 14267.7, 300 sec: 14579.0). Total num frames: 845197312. Throughput: 0: 3592.8. Samples: 200463062. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 3.0) [2025-01-04 13:12:18,969][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 13:12:21,024][134294] Updated weights for policy 0, policy_version 206354 (0.0025) [2025-01-04 13:12:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14537.3). Total num frames: 845262848. Throughput: 0: 3589.2. Samples: 200483510. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:23,968][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 13:12:24,212][134294] Updated weights for policy 0, policy_version 206364 (0.0026) [2025-01-04 13:12:27,002][134294] Updated weights for policy 0, policy_version 206374 (0.0024) [2025-01-04 13:12:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.5, 300 sec: 14523.4). Total num frames: 845332480. Throughput: 0: 3585.8. Samples: 200503954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:28,968][134211] Avg episode reward: [(0, '8.517')] [2025-01-04 13:12:29,979][134294] Updated weights for policy 0, policy_version 206384 (0.0024) [2025-01-04 13:12:32,884][134294] Updated weights for policy 0, policy_version 206394 (0.0023) [2025-01-04 13:12:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14267.8, 300 sec: 14523.4). Total num frames: 845402112. Throughput: 0: 3595.5. Samples: 200514632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:33,968][134211] Avg episode reward: [(0, '9.297')] [2025-01-04 13:12:35,820][134294] Updated weights for policy 0, policy_version 206404 (0.0024) [2025-01-04 13:12:38,613][134294] Updated weights for policy 0, policy_version 206414 (0.0021) [2025-01-04 13:12:38,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.0, 300 sec: 14551.2). Total num frames: 845475840. Throughput: 0: 3599.1. Samples: 200535958. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:38,968][134211] Avg episode reward: [(0, '10.216')] [2025-01-04 13:12:41,594][134294] Updated weights for policy 0, policy_version 206424 (0.0024) [2025-01-04 13:12:43,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14267.7, 300 sec: 14537.3). Total num frames: 845541376. Throughput: 0: 3603.4. Samples: 200556646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:43,969][134211] Avg episode reward: [(0, '9.791')] [2025-01-04 13:12:44,593][134294] Updated weights for policy 0, policy_version 206434 (0.0026) [2025-01-04 13:12:47,525][134294] Updated weights for policy 0, policy_version 206444 (0.0025) [2025-01-04 13:12:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14537.3). Total num frames: 845611008. Throughput: 0: 3607.3. Samples: 200567030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:48,968][134211] Avg episode reward: [(0, '9.573')] [2025-01-04 13:12:50,475][134294] Updated weights for policy 0, policy_version 206454 (0.0024) [2025-01-04 13:12:53,286][134294] Updated weights for policy 0, policy_version 206464 (0.0024) [2025-01-04 13:12:53,969][134211] Fps is (10 sec: 14334.8, 60 sec: 14336.2, 300 sec: 14509.5). Total num frames: 845684736. Throughput: 0: 3702.5. Samples: 200588494. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:53,969][134211] Avg episode reward: [(0, '10.897')] [2025-01-04 13:12:56,149][134294] Updated weights for policy 0, policy_version 206474 (0.0023) [2025-01-04 13:12:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14336.0, 300 sec: 14384.6). Total num frames: 845754368. Throughput: 0: 3501.1. Samples: 200609616. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:12:58,968][134211] Avg episode reward: [(0, '10.212')] [2025-01-04 13:12:59,083][134294] Updated weights for policy 0, policy_version 206484 (0.0026) [2025-01-04 13:13:02,013][134294] Updated weights for policy 0, policy_version 206494 (0.0026) [2025-01-04 13:13:03,969][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.3, 300 sec: 14356.8). Total num frames: 845824000. Throughput: 0: 3486.8. Samples: 200619974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:03,970][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 13:13:04,974][134294] Updated weights for policy 0, policy_version 206504 (0.0026) [2025-01-04 13:13:07,801][134294] Updated weights for policy 0, policy_version 206514 (0.0025) [2025-01-04 13:13:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.6, 300 sec: 14370.7). Total num frames: 845893632. Throughput: 0: 3508.9. Samples: 200641410. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:08,968][134211] Avg episode reward: [(0, '9.044')] [2025-01-04 13:13:10,686][134294] Updated weights for policy 0, policy_version 206524 (0.0025) [2025-01-04 13:13:13,437][134294] Updated weights for policy 0, policy_version 206534 (0.0027) [2025-01-04 13:13:13,968][134211] Fps is (10 sec: 14747.5, 60 sec: 14063.0, 300 sec: 14426.3). Total num frames: 845971456. Throughput: 0: 3532.6. Samples: 200662920. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:13,968][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 13:13:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206536_845971456.pth... 
[2025-01-04 13:13:14,020][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000205686_842489856.pth [2025-01-04 13:13:15,471][134294] Updated weights for policy 0, policy_version 206544 (0.0011) [2025-01-04 13:13:17,364][134294] Updated weights for policy 0, policy_version 206554 (0.0012) [2025-01-04 13:13:18,968][134211] Fps is (10 sec: 18432.1, 60 sec: 14677.4, 300 sec: 14551.2). Total num frames: 846077952. Throughput: 0: 3634.6. Samples: 200678188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:18,968][134211] Avg episode reward: [(0, '10.366')] [2025-01-04 13:13:19,226][134294] Updated weights for policy 0, policy_version 206564 (0.0013) [2025-01-04 13:13:21,135][134294] Updated weights for policy 0, policy_version 206574 (0.0013) [2025-01-04 13:13:22,963][134294] Updated weights for policy 0, policy_version 206584 (0.0013) [2025-01-04 13:13:23,968][134211] Fps is (10 sec: 21297.6, 60 sec: 15359.9, 300 sec: 14676.1). Total num frames: 846184448. Throughput: 0: 3886.7. Samples: 200710864. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:23,969][134211] Avg episode reward: [(0, '8.844')] [2025-01-04 13:13:25,673][134294] Updated weights for policy 0, policy_version 206594 (0.0022) [2025-01-04 13:13:28,705][134294] Updated weights for policy 0, policy_version 206604 (0.0029) [2025-01-04 13:13:28,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15291.7, 300 sec: 14676.2). Total num frames: 846249984. Throughput: 0: 3943.0. Samples: 200734078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:13:28,968][134211] Avg episode reward: [(0, '8.778')] [2025-01-04 13:13:31,709][134294] Updated weights for policy 0, policy_version 206614 (0.0027) [2025-01-04 13:13:33,968][134211] Fps is (10 sec: 13517.6, 60 sec: 15291.7, 300 sec: 14662.3). Total num frames: 846319616. Throughput: 0: 3931.3. Samples: 200743938. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:33,968][134211] Avg episode reward: [(0, '8.925')] [2025-01-04 13:13:34,858][134294] Updated weights for policy 0, policy_version 206624 (0.0024) [2025-01-04 13:13:37,867][134294] Updated weights for policy 0, policy_version 206634 (0.0028) [2025-01-04 13:13:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 14662.3). Total num frames: 846385152. Throughput: 0: 3903.5. Samples: 200764146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:38,968][134211] Avg episode reward: [(0, '9.776')] [2025-01-04 13:13:40,808][134294] Updated weights for policy 0, policy_version 206644 (0.0022) [2025-01-04 13:13:43,693][134294] Updated weights for policy 0, policy_version 206654 (0.0023) [2025-01-04 13:13:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15291.8, 300 sec: 14690.1). Total num frames: 846458880. Throughput: 0: 3902.9. Samples: 200785248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:43,968][134211] Avg episode reward: [(0, '9.100')] [2025-01-04 13:13:46,535][134294] Updated weights for policy 0, policy_version 206664 (0.0024) [2025-01-04 13:13:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15291.8, 300 sec: 14676.2). Total num frames: 846528512. Throughput: 0: 3906.4. Samples: 200795756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:48,968][134211] Avg episode reward: [(0, '9.351')] [2025-01-04 13:13:49,571][134294] Updated weights for policy 0, policy_version 206674 (0.0025) [2025-01-04 13:13:52,483][134294] Updated weights for policy 0, policy_version 206684 (0.0023) [2025-01-04 13:13:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15155.5, 300 sec: 14662.3). Total num frames: 846594048. Throughput: 0: 3891.6. Samples: 200816530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:53,968][134211] Avg episode reward: [(0, '10.070')] [2025-01-04 13:13:55,432][134294] Updated weights for policy 0, policy_version 206694 (0.0022) [2025-01-04 13:13:58,156][134294] Updated weights for policy 0, policy_version 206704 (0.0024) [2025-01-04 13:13:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.5, 300 sec: 14690.1). Total num frames: 846667776. Throughput: 0: 3890.6. Samples: 200837998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:13:58,968][134211] Avg episode reward: [(0, '10.034')] [2025-01-04 13:14:01,123][134294] Updated weights for policy 0, policy_version 206714 (0.0026) [2025-01-04 13:14:03,969][134211] Fps is (10 sec: 14334.3, 60 sec: 15223.5, 300 sec: 14690.0). Total num frames: 846737408. Throughput: 0: 3786.8. Samples: 200848600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:03,969][134211] Avg episode reward: [(0, '9.694')] [2025-01-04 13:14:04,151][134294] Updated weights for policy 0, policy_version 206724 (0.0025) [2025-01-04 13:14:07,034][134294] Updated weights for policy 0, policy_version 206734 (0.0022) [2025-01-04 13:14:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15223.4, 300 sec: 14634.5). Total num frames: 846807040. Throughput: 0: 3518.7. Samples: 200869202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:08,968][134211] Avg episode reward: [(0, '8.578')] [2025-01-04 13:14:10,022][134294] Updated weights for policy 0, policy_version 206744 (0.0025) [2025-01-04 13:14:12,768][134294] Updated weights for policy 0, policy_version 206754 (0.0023) [2025-01-04 13:14:13,968][134211] Fps is (10 sec: 13928.0, 60 sec: 15086.9, 300 sec: 14634.5). Total num frames: 846876672. Throughput: 0: 3478.0. Samples: 200890588. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:13,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 13:14:15,660][134294] Updated weights for policy 0, policy_version 206764 (0.0027) [2025-01-04 13:14:18,424][134294] Updated weights for policy 0, policy_version 206774 (0.0027) [2025-01-04 13:14:18,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 846950400. Throughput: 0: 3499.8. Samples: 200901428. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:18,968][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 13:14:21,394][134294] Updated weights for policy 0, policy_version 206784 (0.0025) [2025-01-04 13:14:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13926.5, 300 sec: 14648.4). Total num frames: 847020032. Throughput: 0: 3524.3. Samples: 200922740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:23,968][134211] Avg episode reward: [(0, '8.762')] [2025-01-04 13:14:24,273][134294] Updated weights for policy 0, policy_version 206794 (0.0025) [2025-01-04 13:14:27,234][134294] Updated weights for policy 0, policy_version 206804 (0.0024) [2025-01-04 13:14:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14606.7). Total num frames: 847089664. Throughput: 0: 3518.2. Samples: 200943566. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:28,968][134211] Avg episode reward: [(0, '10.151')] [2025-01-04 13:14:30,193][134294] Updated weights for policy 0, policy_version 206814 (0.0025) [2025-01-04 13:14:33,025][134294] Updated weights for policy 0, policy_version 206824 (0.0024) [2025-01-04 13:14:33,971][134211] Fps is (10 sec: 14331.6, 60 sec: 14062.2, 300 sec: 14551.1). Total num frames: 847163392. Throughput: 0: 3527.3. Samples: 200954494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:14:33,971][134211] Avg episode reward: [(0, '8.636')] [2025-01-04 13:14:35,862][134294] Updated weights for policy 0, policy_version 206834 (0.0023) [2025-01-04 13:14:38,594][134294] Updated weights for policy 0, policy_version 206844 (0.0024) [2025-01-04 13:14:38,968][134211] Fps is (10 sec: 14744.5, 60 sec: 14199.3, 300 sec: 14579.0). Total num frames: 847237120. Throughput: 0: 3545.7. Samples: 200976090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:14:38,969][134211] Avg episode reward: [(0, '10.090')] [2025-01-04 13:14:41,496][134294] Updated weights for policy 0, policy_version 206854 (0.0024) [2025-01-04 13:14:43,968][134211] Fps is (10 sec: 13930.8, 60 sec: 14063.0, 300 sec: 14565.1). Total num frames: 847302656. Throughput: 0: 3539.7. Samples: 200997284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:14:43,968][134211] Avg episode reward: [(0, '9.118')] [2025-01-04 13:14:44,513][134294] Updated weights for policy 0, policy_version 206864 (0.0020) [2025-01-04 13:14:47,115][134294] Updated weights for policy 0, policy_version 206874 (0.0020) [2025-01-04 13:14:48,967][134211] Fps is (10 sec: 15566.1, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 847392768. Throughput: 0: 3534.1. Samples: 201007628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:14:48,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 13:14:48,976][134294] Updated weights for policy 0, policy_version 206884 (0.0014) [2025-01-04 13:14:50,822][134294] Updated weights for policy 0, policy_version 206894 (0.0013) [2025-01-04 13:14:52,733][134294] Updated weights for policy 0, policy_version 206904 (0.0013) [2025-01-04 13:14:53,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15155.2, 300 sec: 14787.3). Total num frames: 847503360. Throughput: 0: 3794.8. Samples: 201039966. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:14:53,968][134211] Avg episode reward: [(0, '10.324')] [2025-01-04 13:14:54,628][134294] Updated weights for policy 0, policy_version 206914 (0.0012) [2025-01-04 13:14:57,511][134294] Updated weights for policy 0, policy_version 206924 (0.0025) [2025-01-04 13:14:58,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15155.2, 300 sec: 14801.1). Total num frames: 847577088. Throughput: 0: 3887.2. Samples: 201065514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:14:58,968][134211] Avg episode reward: [(0, '10.044')] [2025-01-04 13:15:00,536][134294] Updated weights for policy 0, policy_version 206934 (0.0031) [2025-01-04 13:15:03,594][134294] Updated weights for policy 0, policy_version 206944 (0.0027) [2025-01-04 13:15:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15155.5, 300 sec: 14815.0). Total num frames: 847646720. Throughput: 0: 3876.5. Samples: 201075872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:03,968][134211] Avg episode reward: [(0, '10.421')] [2025-01-04 13:15:06,482][134294] Updated weights for policy 0, policy_version 206954 (0.0027) [2025-01-04 13:15:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15155.2, 300 sec: 14801.2). Total num frames: 847716352. Throughput: 0: 3855.6. Samples: 201096240. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:08,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 13:15:09,598][134294] Updated weights for policy 0, policy_version 206964 (0.0029) [2025-01-04 13:15:12,549][134294] Updated weights for policy 0, policy_version 206974 (0.0024) [2025-01-04 13:15:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 14731.7). Total num frames: 847781888. Throughput: 0: 3847.6. Samples: 201116710. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:13,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 13:15:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206978_847781888.pth... [2025-01-04 13:15:14,048][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206122_844275712.pth [2025-01-04 13:15:15,530][134294] Updated weights for policy 0, policy_version 206984 (0.0024) [2025-01-04 13:15:18,368][134294] Updated weights for policy 0, policy_version 206994 (0.0023) [2025-01-04 13:15:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15087.0, 300 sec: 14745.6). Total num frames: 847855616. Throughput: 0: 3835.9. Samples: 201127098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:18,968][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 13:15:21,230][134294] Updated weights for policy 0, policy_version 207004 (0.0023) [2025-01-04 13:15:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 847921152. Throughput: 0: 3828.9. Samples: 201148386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:23,968][134211] Avg episode reward: [(0, '9.916')] [2025-01-04 13:15:24,240][134294] Updated weights for policy 0, policy_version 207014 (0.0028) [2025-01-04 13:15:27,176][134294] Updated weights for policy 0, policy_version 207024 (0.0025) [2025-01-04 13:15:28,968][134211] Fps is (10 sec: 13925.6, 60 sec: 15086.8, 300 sec: 14759.5). Total num frames: 847994880. Throughput: 0: 3819.7. Samples: 201169174. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:28,969][134211] Avg episode reward: [(0, '9.244')] [2025-01-04 13:15:30,152][134294] Updated weights for policy 0, policy_version 207034 (0.0028) [2025-01-04 13:15:32,929][134294] Updated weights for policy 0, policy_version 207044 (0.0024) [2025-01-04 13:15:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15019.4, 300 sec: 14759.5). Total num frames: 848064512. Throughput: 0: 3830.4. Samples: 201179998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:33,968][134211] Avg episode reward: [(0, '11.299')] [2025-01-04 13:15:35,790][134294] Updated weights for policy 0, policy_version 207054 (0.0025) [2025-01-04 13:15:38,616][134294] Updated weights for policy 0, policy_version 207064 (0.0022) [2025-01-04 13:15:38,968][134211] Fps is (10 sec: 14336.7, 60 sec: 15018.8, 300 sec: 14773.4). Total num frames: 848138240. Throughput: 0: 3591.5. Samples: 201201584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:38,968][134211] Avg episode reward: [(0, '9.666')] [2025-01-04 13:15:41,553][134294] Updated weights for policy 0, policy_version 207074 (0.0024) [2025-01-04 13:15:43,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15086.9, 300 sec: 14662.3). Total num frames: 848207872. Throughput: 0: 3494.2. Samples: 201222754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:15:43,968][134211] Avg episode reward: [(0, '9.937')] [2025-01-04 13:15:44,535][134294] Updated weights for policy 0, policy_version 207084 (0.0023) [2025-01-04 13:15:47,404][134294] Updated weights for policy 0, policy_version 207094 (0.0026) [2025-01-04 13:15:48,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14745.4, 300 sec: 14523.4). Total num frames: 848277504. Throughput: 0: 3493.6. Samples: 201233084. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:15:48,969][134211] Avg episode reward: [(0, '10.281')] [2025-01-04 13:15:50,281][134294] Updated weights for policy 0, policy_version 207104 (0.0024) [2025-01-04 13:15:53,094][134294] Updated weights for policy 0, policy_version 207114 (0.0026) [2025-01-04 13:15:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14062.9, 300 sec: 14468.0). Total num frames: 848347136. Throughput: 0: 3521.0. Samples: 201254686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:15:53,968][134211] Avg episode reward: [(0, '10.365')] [2025-01-04 13:15:55,987][134294] Updated weights for policy 0, policy_version 207124 (0.0025) [2025-01-04 13:15:57,950][134294] Updated weights for policy 0, policy_version 207134 (0.0016) [2025-01-04 13:15:58,968][134211] Fps is (10 sec: 15975.3, 60 sec: 14336.0, 300 sec: 14565.1). Total num frames: 848437248. Throughput: 0: 3617.0. Samples: 201279474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:15:58,968][134211] Avg episode reward: [(0, '8.775')] [2025-01-04 13:16:00,560][134294] Updated weights for policy 0, policy_version 207144 (0.0021) [2025-01-04 13:16:03,443][134294] Updated weights for policy 0, policy_version 207154 (0.0026) [2025-01-04 13:16:03,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14336.0, 300 sec: 14565.1). Total num frames: 848506880. Throughput: 0: 3638.3. Samples: 201290820. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:03,968][134211] Avg episode reward: [(0, '8.202')] [2025-01-04 13:16:06,372][134294] Updated weights for policy 0, policy_version 207164 (0.0025) [2025-01-04 13:16:08,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14579.0). Total num frames: 848576512. Throughput: 0: 3632.1. Samples: 201311830. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:08,968][134211] Avg episode reward: [(0, '8.895')] [2025-01-04 13:16:09,333][134294] Updated weights for policy 0, policy_version 207174 (0.0024) [2025-01-04 13:16:12,189][134294] Updated weights for policy 0, policy_version 207184 (0.0027) [2025-01-04 13:16:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14677.4, 300 sec: 14648.4). Total num frames: 848662528. Throughput: 0: 3680.1. Samples: 201334776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:13,968][134211] Avg episode reward: [(0, '9.874')] [2025-01-04 13:16:14,140][134294] Updated weights for policy 0, policy_version 207194 (0.0013) [2025-01-04 13:16:16,820][134294] Updated weights for policy 0, policy_version 207204 (0.0024) [2025-01-04 13:16:18,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14609.0, 300 sec: 14648.4). Total num frames: 848732160. Throughput: 0: 3729.5. Samples: 201347826. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:18,968][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 13:16:19,897][134294] Updated weights for policy 0, policy_version 207214 (0.0028) [2025-01-04 13:16:22,754][134294] Updated weights for policy 0, policy_version 207224 (0.0027) [2025-01-04 13:16:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14648.4). Total num frames: 848801792. Throughput: 0: 3709.0. Samples: 201368490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:23,968][134211] Avg episode reward: [(0, '11.241')] [2025-01-04 13:16:25,691][134294] Updated weights for policy 0, policy_version 207234 (0.0025) [2025-01-04 13:16:28,474][134294] Updated weights for policy 0, policy_version 207244 (0.0023) [2025-01-04 13:16:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14677.5, 300 sec: 14676.2). Total num frames: 848875520. Throughput: 0: 3715.2. Samples: 201389938. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:28,968][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 13:16:31,402][134294] Updated weights for policy 0, policy_version 207254 (0.0023) [2025-01-04 13:16:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14677.3, 300 sec: 14676.2). Total num frames: 848945152. Throughput: 0: 3719.3. Samples: 201400450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:33,968][134211] Avg episode reward: [(0, '9.620')] [2025-01-04 13:16:34,339][134294] Updated weights for policy 0, policy_version 207264 (0.0026) [2025-01-04 13:16:37,309][134294] Updated weights for policy 0, policy_version 207274 (0.0024) [2025-01-04 13:16:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14676.2). Total num frames: 849014784. Throughput: 0: 3704.3. Samples: 201421380. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:38,968][134211] Avg episode reward: [(0, '10.100')] [2025-01-04 13:16:40,189][134294] Updated weights for policy 0, policy_version 207284 (0.0024) [2025-01-04 13:16:42,264][134294] Updated weights for policy 0, policy_version 207294 (0.0014) [2025-01-04 13:16:43,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14950.4, 300 sec: 14745.6). Total num frames: 849104896. Throughput: 0: 3713.1. Samples: 201446562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:43,968][134211] Avg episode reward: [(0, '9.838')] [2025-01-04 13:16:44,725][134294] Updated weights for policy 0, policy_version 207304 (0.0020) [2025-01-04 13:16:47,605][134294] Updated weights for policy 0, policy_version 207314 (0.0024) [2025-01-04 13:16:48,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14950.6, 300 sec: 14745.7). Total num frames: 849174528. Throughput: 0: 3710.5. Samples: 201457792. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:16:48,968][134211] Avg episode reward: [(0, '8.765')] [2025-01-04 13:16:50,515][134294] Updated weights for policy 0, policy_version 207324 (0.0025) [2025-01-04 13:16:53,326][134294] Updated weights for policy 0, policy_version 207334 (0.0024) [2025-01-04 13:16:53,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 849248256. Throughput: 0: 3721.1. Samples: 201479280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:16:53,968][134211] Avg episode reward: [(0, '9.634')] [2025-01-04 13:16:56,143][134294] Updated weights for policy 0, policy_version 207344 (0.0026) [2025-01-04 13:16:58,959][134294] Updated weights for policy 0, policy_version 207354 (0.0023) [2025-01-04 13:16:58,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 849321984. Throughput: 0: 3690.1. Samples: 201500830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:16:58,968][134211] Avg episode reward: [(0, '9.763')] [2025-01-04 13:17:01,928][134294] Updated weights for policy 0, policy_version 207364 (0.0023) [2025-01-04 13:17:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 849387520. Throughput: 0: 3630.3. Samples: 201511190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:03,968][134211] Avg episode reward: [(0, '9.023')] [2025-01-04 13:17:04,702][134294] Updated weights for policy 0, policy_version 207374 (0.0022) [2025-01-04 13:17:06,605][134294] Updated weights for policy 0, policy_version 207384 (0.0013) [2025-01-04 13:17:08,582][134294] Updated weights for policy 0, policy_version 207394 (0.0013) [2025-01-04 13:17:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 15223.5, 300 sec: 14787.3). Total num frames: 849489920. Throughput: 0: 3756.3. Samples: 201537524. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:08,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 13:17:11,530][134294] Updated weights for policy 0, policy_version 207404 (0.0026) [2025-01-04 13:17:13,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14950.3, 300 sec: 14787.2). Total num frames: 849559552. Throughput: 0: 3789.2. Samples: 201560452. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:13,969][134211] Avg episode reward: [(0, '9.498')] [2025-01-04 13:17:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000207412_849559552.pth... [2025-01-04 13:17:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206536_845971456.pth [2025-01-04 13:17:14,663][134294] Updated weights for policy 0, policy_version 207414 (0.0023) [2025-01-04 13:17:17,682][134294] Updated weights for policy 0, policy_version 207424 (0.0025) [2025-01-04 13:17:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.2, 300 sec: 14787.3). Total num frames: 849625088. Throughput: 0: 3775.8. Samples: 201570362. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:18,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 13:17:20,519][134294] Updated weights for policy 0, policy_version 207434 (0.0026) [2025-01-04 13:17:23,469][134294] Updated weights for policy 0, policy_version 207444 (0.0021) [2025-01-04 13:17:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.1, 300 sec: 14787.2). Total num frames: 849694720. Throughput: 0: 3780.6. Samples: 201591508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:23,969][134211] Avg episode reward: [(0, '9.482')] [2025-01-04 13:17:26,574][134294] Updated weights for policy 0, policy_version 207454 (0.0025) [2025-01-04 13:17:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 849760256. 
Throughput: 0: 3652.3. Samples: 201610916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:28,968][134211] Avg episode reward: [(0, '10.004')] [2025-01-04 13:17:29,931][134294] Updated weights for policy 0, policy_version 207464 (0.0024) [2025-01-04 13:17:32,837][134294] Updated weights for policy 0, policy_version 207474 (0.0027) [2025-01-04 13:17:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14677.4, 300 sec: 14745.6). Total num frames: 849825792. Throughput: 0: 3621.8. Samples: 201620774. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:33,968][134211] Avg episode reward: [(0, '8.734')] [2025-01-04 13:17:35,680][134294] Updated weights for policy 0, policy_version 207484 (0.0025) [2025-01-04 13:17:38,512][134294] Updated weights for policy 0, policy_version 207494 (0.0023) [2025-01-04 13:17:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 849899520. Throughput: 0: 3621.9. Samples: 201642264. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:38,968][134211] Avg episode reward: [(0, '10.803')] [2025-01-04 13:17:41,428][134294] Updated weights for policy 0, policy_version 207504 (0.0024) [2025-01-04 13:17:43,493][134294] Updated weights for policy 0, policy_version 207514 (0.0015) [2025-01-04 13:17:43,968][134211] Fps is (10 sec: 15973.7, 60 sec: 14677.2, 300 sec: 14828.9). Total num frames: 849985536. Throughput: 0: 3667.9. Samples: 201665886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:43,969][134211] Avg episode reward: [(0, '9.366')] [2025-01-04 13:17:46,123][134294] Updated weights for policy 0, policy_version 207524 (0.0023) [2025-01-04 13:17:48,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14677.3, 300 sec: 14815.1). Total num frames: 850055168. Throughput: 0: 3706.2. Samples: 201677970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:48,968][134211] Avg episode reward: [(0, '10.107')] [2025-01-04 13:17:48,974][134294] Updated weights for policy 0, policy_version 207534 (0.0024) [2025-01-04 13:17:51,855][134294] Updated weights for policy 0, policy_version 207544 (0.0023) [2025-01-04 13:17:53,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 850128896. Throughput: 0: 3596.1. Samples: 201699350. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:53,968][134211] Avg episode reward: [(0, '9.923')] [2025-01-04 13:17:54,786][134294] Updated weights for policy 0, policy_version 207554 (0.0028) [2025-01-04 13:17:57,551][134294] Updated weights for policy 0, policy_version 207564 (0.0024) [2025-01-04 13:17:58,967][134211] Fps is (10 sec: 15565.2, 60 sec: 14813.9, 300 sec: 14870.6). Total num frames: 850210816. Throughput: 0: 3591.7. Samples: 201722076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:17:58,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 13:17:59,436][134294] Updated weights for policy 0, policy_version 207574 (0.0015) [2025-01-04 13:18:01,293][134294] Updated weights for policy 0, policy_version 207584 (0.0013) [2025-01-04 13:18:03,208][134294] Updated weights for policy 0, policy_version 207594 (0.0014) [2025-01-04 13:18:03,967][134211] Fps is (10 sec: 19251.5, 60 sec: 15564.9, 300 sec: 15009.4). Total num frames: 850321408. Throughput: 0: 3738.2. Samples: 201738580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:03,968][134211] Avg episode reward: [(0, '9.586')] [2025-01-04 13:18:05,071][134294] Updated weights for policy 0, policy_version 207604 (0.0013) [2025-01-04 13:18:06,925][134294] Updated weights for policy 0, policy_version 207614 (0.0014) [2025-01-04 13:18:08,970][134211] Fps is (10 sec: 20884.8, 60 sec: 15496.0, 300 sec: 15078.7). Total num frames: 850419712. Throughput: 0: 3992.6. Samples: 201771184. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:08,970][134211] Avg episode reward: [(0, '9.037')] [2025-01-04 13:18:09,695][134294] Updated weights for policy 0, policy_version 207624 (0.0023) [2025-01-04 13:18:12,991][134294] Updated weights for policy 0, policy_version 207634 (0.0029) [2025-01-04 13:18:13,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15360.1, 300 sec: 14926.1). Total num frames: 850481152. Throughput: 0: 4007.8. Samples: 201791268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:13,968][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 13:18:15,861][134294] Updated weights for policy 0, policy_version 207644 (0.0026) [2025-01-04 13:18:18,968][134211] Fps is (10 sec: 13109.8, 60 sec: 15428.2, 300 sec: 14801.2). Total num frames: 850550784. Throughput: 0: 4016.7. Samples: 201801528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:18,968][134211] Avg episode reward: [(0, '9.766')] [2025-01-04 13:18:18,970][134294] Updated weights for policy 0, policy_version 207654 (0.0027) [2025-01-04 13:18:21,803][134294] Updated weights for policy 0, policy_version 207664 (0.0026) [2025-01-04 13:18:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15360.0, 300 sec: 14801.1). Total num frames: 850616320. Throughput: 0: 3997.6. Samples: 201822156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:23,968][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 13:18:24,983][134294] Updated weights for policy 0, policy_version 207674 (0.0026) [2025-01-04 13:18:27,791][134294] Updated weights for policy 0, policy_version 207684 (0.0025) [2025-01-04 13:18:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15428.3, 300 sec: 14801.1). Total num frames: 850685952. Throughput: 0: 3929.7. Samples: 201842722. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:28,968][134211] Avg episode reward: [(0, '10.381')] [2025-01-04 13:18:30,785][134294] Updated weights for policy 0, policy_version 207694 (0.0027) [2025-01-04 13:18:33,686][134294] Updated weights for policy 0, policy_version 207704 (0.0025) [2025-01-04 13:18:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15564.8, 300 sec: 14828.9). Total num frames: 850759680. Throughput: 0: 3896.0. Samples: 201853292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:33,968][134211] Avg episode reward: [(0, '10.501')] [2025-01-04 13:18:36,515][134294] Updated weights for policy 0, policy_version 207714 (0.0023) [2025-01-04 13:18:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15496.5, 300 sec: 14815.0). Total num frames: 850829312. Throughput: 0: 3893.8. Samples: 201874572. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:38,968][134211] Avg episode reward: [(0, '10.524')] [2025-01-04 13:18:39,496][134294] Updated weights for policy 0, policy_version 207724 (0.0023) [2025-01-04 13:18:42,466][134294] Updated weights for policy 0, policy_version 207734 (0.0024) [2025-01-04 13:18:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15223.6, 300 sec: 14815.0). Total num frames: 850898944. Throughput: 0: 3848.9. Samples: 201895276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:43,968][134211] Avg episode reward: [(0, '10.392')] [2025-01-04 13:18:45,341][134294] Updated weights for policy 0, policy_version 207744 (0.0022) [2025-01-04 13:18:48,162][134294] Updated weights for policy 0, policy_version 207754 (0.0022) [2025-01-04 13:18:48,969][134211] Fps is (10 sec: 13924.6, 60 sec: 15223.1, 300 sec: 14828.8). Total num frames: 850968576. Throughput: 0: 3725.8. Samples: 201906248. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:48,970][134211] Avg episode reward: [(0, '10.076')] [2025-01-04 13:18:50,995][134294] Updated weights for policy 0, policy_version 207764 (0.0026) [2025-01-04 13:18:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15155.1, 300 sec: 14815.0). Total num frames: 851038208. Throughput: 0: 3473.8. Samples: 201927498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:53,969][134211] Avg episode reward: [(0, '10.026')] [2025-01-04 13:18:54,054][134294] Updated weights for policy 0, policy_version 207774 (0.0023) [2025-01-04 13:18:56,897][134294] Updated weights for policy 0, policy_version 207784 (0.0025) [2025-01-04 13:18:58,968][134211] Fps is (10 sec: 13928.2, 60 sec: 14950.3, 300 sec: 14815.1). Total num frames: 851107840. Throughput: 0: 3488.0. Samples: 201948230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:18:58,968][134211] Avg episode reward: [(0, '10.873')] [2025-01-04 13:18:59,947][134294] Updated weights for policy 0, policy_version 207794 (0.0025) [2025-01-04 13:19:02,933][134294] Updated weights for policy 0, policy_version 207804 (0.0024) [2025-01-04 13:19:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14267.7, 300 sec: 14815.0). Total num frames: 851177472. Throughput: 0: 3490.8. Samples: 201958612. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:03,968][134211] Avg episode reward: [(0, '10.314')] [2025-01-04 13:19:05,817][134294] Updated weights for policy 0, policy_version 207814 (0.0022) [2025-01-04 13:19:08,644][134294] Updated weights for policy 0, policy_version 207824 (0.0023) [2025-01-04 13:19:08,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13858.6, 300 sec: 14828.9). Total num frames: 851251200. Throughput: 0: 3503.9. Samples: 201979832. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:08,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 13:19:11,513][134294] Updated weights for policy 0, policy_version 207834 (0.0023) [2025-01-04 13:19:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.7, 300 sec: 14815.0). Total num frames: 851320832. Throughput: 0: 3514.8. Samples: 202000888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:13,968][134211] Avg episode reward: [(0, '8.712')] [2025-01-04 13:19:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000207842_851320832.pth... [2025-01-04 13:19:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000206978_847781888.pth [2025-01-04 13:19:14,490][134294] Updated weights for policy 0, policy_version 207844 (0.0022) [2025-01-04 13:19:17,466][134294] Updated weights for policy 0, policy_version 207854 (0.0024) [2025-01-04 13:19:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14801.1). Total num frames: 851386368. Throughput: 0: 3508.9. Samples: 202011192. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:18,968][134211] Avg episode reward: [(0, '9.142')] [2025-01-04 13:19:20,362][134294] Updated weights for policy 0, policy_version 207864 (0.0025) [2025-01-04 13:19:23,049][134294] Updated weights for policy 0, policy_version 207874 (0.0021) [2025-01-04 13:19:23,967][134211] Fps is (10 sec: 14745.9, 60 sec: 14199.5, 300 sec: 14842.8). Total num frames: 851468288. Throughput: 0: 3513.9. Samples: 202032696. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:23,968][134211] Avg episode reward: [(0, '9.688')] [2025-01-04 13:19:24,977][134294] Updated weights for policy 0, policy_version 207884 (0.0012) [2025-01-04 13:19:26,830][134294] Updated weights for policy 0, policy_version 207894 (0.0012) [2025-01-04 13:19:28,710][134294] Updated weights for policy 0, policy_version 207904 (0.0012) [2025-01-04 13:19:28,968][134211] Fps is (10 sec: 19251.5, 60 sec: 14882.1, 300 sec: 14967.9). Total num frames: 851578880. Throughput: 0: 3753.8. Samples: 202064194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:28,968][134211] Avg episode reward: [(0, '10.564')] [2025-01-04 13:19:30,549][134294] Updated weights for policy 0, policy_version 207914 (0.0013) [2025-01-04 13:19:33,208][134294] Updated weights for policy 0, policy_version 207924 (0.0024) [2025-01-04 13:19:33,969][134211] Fps is (10 sec: 19658.1, 60 sec: 15086.7, 300 sec: 15009.4). Total num frames: 851664896. Throughput: 0: 3863.2. Samples: 202080090. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:33,969][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 13:19:36,244][134294] Updated weights for policy 0, policy_version 207934 (0.0026) [2025-01-04 13:19:38,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15018.7, 300 sec: 15009.4). Total num frames: 851730432. Throughput: 0: 3846.8. Samples: 202100604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:38,968][134211] Avg episode reward: [(0, '9.755')] [2025-01-04 13:19:39,519][134294] Updated weights for policy 0, policy_version 207944 (0.0024) [2025-01-04 13:19:42,723][134294] Updated weights for policy 0, policy_version 207954 (0.0026) [2025-01-04 13:19:43,968][134211] Fps is (10 sec: 13108.8, 60 sec: 14950.4, 300 sec: 14926.1). Total num frames: 851795968. Throughput: 0: 3808.9. Samples: 202119628. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:43,968][134211] Avg episode reward: [(0, '10.362')] [2025-01-04 13:19:45,694][134294] Updated weights for policy 0, policy_version 207964 (0.0025) [2025-01-04 13:19:48,585][134294] Updated weights for policy 0, policy_version 207974 (0.0026) [2025-01-04 13:19:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.7, 300 sec: 14787.3). Total num frames: 851865600. Throughput: 0: 3812.3. Samples: 202130168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:48,968][134211] Avg episode reward: [(0, '10.800')] [2025-01-04 13:19:51,457][134294] Updated weights for policy 0, policy_version 207984 (0.0024) [2025-01-04 13:19:53,969][134211] Fps is (10 sec: 13924.8, 60 sec: 14950.2, 300 sec: 14773.3). Total num frames: 851935232. Throughput: 0: 3810.1. Samples: 202151290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:53,969][134211] Avg episode reward: [(0, '9.112')] [2025-01-04 13:19:54,524][134294] Updated weights for policy 0, policy_version 207994 (0.0024) [2025-01-04 13:19:57,470][134294] Updated weights for policy 0, policy_version 208004 (0.0024) [2025-01-04 13:19:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 852004864. Throughput: 0: 3801.4. Samples: 202171952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:19:58,968][134211] Avg episode reward: [(0, '10.160')] [2025-01-04 13:20:00,351][134294] Updated weights for policy 0, policy_version 208014 (0.0025) [2025-01-04 13:20:03,147][134294] Updated weights for policy 0, policy_version 208024 (0.0021) [2025-01-04 13:20:03,968][134211] Fps is (10 sec: 13927.9, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 852074496. Throughput: 0: 3810.6. Samples: 202182670. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:03,968][134211] Avg episode reward: [(0, '9.866')] [2025-01-04 13:20:06,087][134294] Updated weights for policy 0, policy_version 208034 (0.0026) [2025-01-04 13:20:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14787.3). Total num frames: 852144128. Throughput: 0: 3801.5. Samples: 202203766. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:08,968][134211] Avg episode reward: [(0, '8.977')] [2025-01-04 13:20:09,029][134294] Updated weights for policy 0, policy_version 208044 (0.0027) [2025-01-04 13:20:12,007][134294] Updated weights for policy 0, policy_version 208054 (0.0023) [2025-01-04 13:20:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 852213760. Throughput: 0: 3564.4. Samples: 202224592. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:13,969][134211] Avg episode reward: [(0, '9.991')] [2025-01-04 13:20:14,891][134294] Updated weights for policy 0, policy_version 208064 (0.0025) [2025-01-04 13:20:17,695][134294] Updated weights for policy 0, policy_version 208074 (0.0024) [2025-01-04 13:20:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15018.7, 300 sec: 14801.1). Total num frames: 852287488. Throughput: 0: 3451.3. Samples: 202235392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:18,968][134211] Avg episode reward: [(0, '10.943')] [2025-01-04 13:20:20,625][134294] Updated weights for policy 0, policy_version 208084 (0.0024) [2025-01-04 13:20:23,459][134294] Updated weights for policy 0, policy_version 208094 (0.0024) [2025-01-04 13:20:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14813.8, 300 sec: 14787.3). Total num frames: 852357120. Throughput: 0: 3477.7. Samples: 202257102. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:23,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 13:20:26,339][134294] Updated weights for policy 0, policy_version 208104 (0.0022) [2025-01-04 13:20:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14787.3). Total num frames: 852426752. Throughput: 0: 3522.4. Samples: 202278136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:28,968][134211] Avg episode reward: [(0, '10.445')] [2025-01-04 13:20:29,234][134294] Updated weights for policy 0, policy_version 208114 (0.0022) [2025-01-04 13:20:31,465][134294] Updated weights for policy 0, policy_version 208124 (0.0016) [2025-01-04 13:20:33,348][134294] Updated weights for policy 0, policy_version 208134 (0.0013) [2025-01-04 13:20:33,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14404.6, 300 sec: 14884.5). Total num frames: 852529152. Throughput: 0: 3565.7. Samples: 202290624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:33,968][134211] Avg episode reward: [(0, '9.691')] [2025-01-04 13:20:35,207][134294] Updated weights for policy 0, policy_version 208144 (0.0014) [2025-01-04 13:20:37,049][134294] Updated weights for policy 0, policy_version 208154 (0.0012) [2025-01-04 13:20:38,968][134211] Fps is (10 sec: 20889.6, 60 sec: 15087.0, 300 sec: 15009.4). Total num frames: 852635648. Throughput: 0: 3825.8. Samples: 202323446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:38,968][134211] Avg episode reward: [(0, '10.425')] [2025-01-04 13:20:39,172][134294] Updated weights for policy 0, policy_version 208164 (0.0018) [2025-01-04 13:20:42,184][134294] Updated weights for policy 0, policy_version 208174 (0.0025) [2025-01-04 13:20:43,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15086.9, 300 sec: 14995.5). Total num frames: 852701184. Throughput: 0: 3878.7. Samples: 202346494. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:43,969][134211] Avg episode reward: [(0, '9.605')] [2025-01-04 13:20:45,331][134294] Updated weights for policy 0, policy_version 208184 (0.0028) [2025-01-04 13:20:48,210][134294] Updated weights for policy 0, policy_version 208194 (0.0026) [2025-01-04 13:20:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 14995.5). Total num frames: 852770816. Throughput: 0: 3867.5. Samples: 202356708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:48,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 13:20:51,235][134294] Updated weights for policy 0, policy_version 208204 (0.0023) [2025-01-04 13:20:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15087.2, 300 sec: 14926.1). Total num frames: 852840448. Throughput: 0: 3854.7. Samples: 202377226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:53,968][134211] Avg episode reward: [(0, '10.016')] [2025-01-04 13:20:54,327][134294] Updated weights for policy 0, policy_version 208214 (0.0026) [2025-01-04 13:20:57,259][134294] Updated weights for policy 0, policy_version 208224 (0.0025) [2025-01-04 13:20:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14912.2). Total num frames: 852905984. Throughput: 0: 3847.0. Samples: 202397706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:20:58,968][134211] Avg episode reward: [(0, '9.507')] [2025-01-04 13:21:00,172][134294] Updated weights for policy 0, policy_version 208234 (0.0025) [2025-01-04 13:21:03,027][134294] Updated weights for policy 0, policy_version 208244 (0.0022) [2025-01-04 13:21:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14926.1). Total num frames: 852979712. Throughput: 0: 3846.3. Samples: 202408478. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:21:03,968][134211] Avg episode reward: [(0, '10.523')] [2025-01-04 13:21:05,923][134294] Updated weights for policy 0, policy_version 208254 (0.0023) [2025-01-04 13:21:08,772][134294] Updated weights for policy 0, policy_version 208264 (0.0024) [2025-01-04 13:21:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 853049344. Throughput: 0: 3840.1. Samples: 202429906. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:21:08,968][134211] Avg episode reward: [(0, '10.266')] [2025-01-04 13:21:11,610][134294] Updated weights for policy 0, policy_version 208274 (0.0024) [2025-01-04 13:21:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 853118976. Throughput: 0: 3842.9. Samples: 202451066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:21:13,968][134211] Avg episode reward: [(0, '9.287')] [2025-01-04 13:21:13,991][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000208282_853123072.pth... [2025-01-04 13:21:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000207412_849559552.pth [2025-01-04 13:21:14,574][134294] Updated weights for policy 0, policy_version 208284 (0.0028) [2025-01-04 13:21:17,626][134294] Updated weights for policy 0, policy_version 208294 (0.0026) [2025-01-04 13:21:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14870.6). Total num frames: 853188608. Throughput: 0: 3790.2. Samples: 202461182. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:21:18,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 13:21:20,494][134294] Updated weights for policy 0, policy_version 208304 (0.0025) [2025-01-04 13:21:23,308][134294] Updated weights for policy 0, policy_version 208314 (0.0022) [2025-01-04 13:21:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 853262336. Throughput: 0: 3539.0. Samples: 202482700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:23,968][134211] Avg episode reward: [(0, '10.025')] [2025-01-04 13:21:26,226][134294] Updated weights for policy 0, policy_version 208324 (0.0026) [2025-01-04 13:21:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 853331968. Throughput: 0: 3498.3. Samples: 202503916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:28,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 13:21:29,104][134294] Updated weights for policy 0, policy_version 208334 (0.0024) [2025-01-04 13:21:32,065][134294] Updated weights for policy 0, policy_version 208344 (0.0025) [2025-01-04 13:21:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14540.8, 300 sec: 14870.6). Total num frames: 853401600. Throughput: 0: 3501.9. Samples: 202514292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:33,969][134211] Avg episode reward: [(0, '9.926')] [2025-01-04 13:21:34,933][134294] Updated weights for policy 0, policy_version 208354 (0.0025) [2025-01-04 13:21:37,750][134294] Updated weights for policy 0, policy_version 208364 (0.0025) [2025-01-04 13:21:38,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13994.7, 300 sec: 14815.0). Total num frames: 853475328. Throughput: 0: 3527.1. Samples: 202535944. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:38,968][134211] Avg episode reward: [(0, '10.328')] [2025-01-04 13:21:40,575][134294] Updated weights for policy 0, policy_version 208374 (0.0024) [2025-01-04 13:21:42,477][134294] Updated weights for policy 0, policy_version 208384 (0.0013) [2025-01-04 13:21:43,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14404.3, 300 sec: 14884.4). Total num frames: 853565440. Throughput: 0: 3642.5. Samples: 202561620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:43,968][134211] Avg episode reward: [(0, '8.451')] [2025-01-04 13:21:44,995][134294] Updated weights for policy 0, policy_version 208394 (0.0020) [2025-01-04 13:21:47,809][134294] Updated weights for policy 0, policy_version 208404 (0.0022) [2025-01-04 13:21:48,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14404.3, 300 sec: 14870.6). Total num frames: 853635072. Throughput: 0: 3652.9. Samples: 202572858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:48,968][134211] Avg episode reward: [(0, '8.837')] [2025-01-04 13:21:50,739][134294] Updated weights for policy 0, policy_version 208414 (0.0022) [2025-01-04 13:21:53,611][134294] Updated weights for policy 0, policy_version 208424 (0.0026) [2025-01-04 13:21:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14472.5, 300 sec: 14870.6). Total num frames: 853708800. Throughput: 0: 3654.4. Samples: 202594354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:53,968][134211] Avg episode reward: [(0, '10.336')] [2025-01-04 13:21:56,597][134294] Updated weights for policy 0, policy_version 208434 (0.0023) [2025-01-04 13:21:58,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14540.8, 300 sec: 14884.5). Total num frames: 853778432. Throughput: 0: 3645.4. Samples: 202615108. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:21:58,968][134211] Avg episode reward: [(0, '9.664')] [2025-01-04 13:21:59,515][134294] Updated weights for policy 0, policy_version 208444 (0.0023) [2025-01-04 13:22:01,963][134294] Updated weights for policy 0, policy_version 208454 (0.0019) [2025-01-04 13:22:03,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14677.3, 300 sec: 14815.0). Total num frames: 853860352. Throughput: 0: 3660.4. Samples: 202625900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:22:03,968][134211] Avg episode reward: [(0, '9.239')] [2025-01-04 13:22:04,219][134294] Updated weights for policy 0, policy_version 208464 (0.0020) [2025-01-04 13:22:07,092][134294] Updated weights for policy 0, policy_version 208474 (0.0027) [2025-01-04 13:22:08,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 853934080. Throughput: 0: 3736.4. Samples: 202650836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:22:08,968][134211] Avg episode reward: [(0, '10.474')] [2025-01-04 13:22:10,088][134294] Updated weights for policy 0, policy_version 208484 (0.0025) [2025-01-04 13:22:12,979][134294] Updated weights for policy 0, policy_version 208494 (0.0023) [2025-01-04 13:22:13,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.6, 300 sec: 14842.8). Total num frames: 854003712. Throughput: 0: 3732.5. Samples: 202671880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:22:13,968][134211] Avg episode reward: [(0, '10.026')] [2025-01-04 13:22:15,802][134294] Updated weights for policy 0, policy_version 208504 (0.0022) [2025-01-04 13:22:18,683][134294] Updated weights for policy 0, policy_version 208514 (0.0021) [2025-01-04 13:22:18,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14813.8, 300 sec: 14856.7). Total num frames: 854077440. Throughput: 0: 3740.5. Samples: 202682614. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:22:18,969][134211] Avg episode reward: [(0, '10.518')] [2025-01-04 13:22:21,483][134294] Updated weights for policy 0, policy_version 208524 (0.0025) [2025-01-04 13:22:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14745.6, 300 sec: 14870.6). Total num frames: 854147072. Throughput: 0: 3737.0. Samples: 202704108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:22:23,968][134211] Avg episode reward: [(0, '9.920')] [2025-01-04 13:22:24,334][134294] Updated weights for policy 0, policy_version 208534 (0.0024) [2025-01-04 13:22:26,627][134294] Updated weights for policy 0, policy_version 208544 (0.0017) [2025-01-04 13:22:28,503][134294] Updated weights for policy 0, policy_version 208554 (0.0013) [2025-01-04 13:22:28,968][134211] Fps is (10 sec: 16793.5, 60 sec: 15223.4, 300 sec: 14981.6). Total num frames: 854245376. Throughput: 0: 3756.1. Samples: 202730646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:28,968][134211] Avg episode reward: [(0, '10.338')] [2025-01-04 13:22:30,617][134294] Updated weights for policy 0, policy_version 208564 (0.0015) [2025-01-04 13:22:33,512][134294] Updated weights for policy 0, policy_version 208574 (0.0026) [2025-01-04 13:22:33,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15360.0, 300 sec: 14995.5). Total num frames: 854323200. Throughput: 0: 3823.5. Samples: 202744914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:33,968][134211] Avg episode reward: [(0, '10.864')] [2025-01-04 13:22:36,481][134294] Updated weights for policy 0, policy_version 208584 (0.0024) [2025-01-04 13:22:38,968][134211] Fps is (10 sec: 14746.0, 60 sec: 15291.7, 300 sec: 14940.0). Total num frames: 854392832. Throughput: 0: 3800.4. Samples: 202765374. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:38,968][134211] Avg episode reward: [(0, '9.706')] [2025-01-04 13:22:39,634][134294] Updated weights for policy 0, policy_version 208594 (0.0026) [2025-01-04 13:22:42,574][134294] Updated weights for policy 0, policy_version 208604 (0.0027) [2025-01-04 13:22:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14882.1, 300 sec: 14926.1). Total num frames: 854458368. Throughput: 0: 3791.9. Samples: 202785744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:43,968][134211] Avg episode reward: [(0, '11.147')] [2025-01-04 13:22:45,510][134294] Updated weights for policy 0, policy_version 208614 (0.0025) [2025-01-04 13:22:48,298][134294] Updated weights for policy 0, policy_version 208624 (0.0024) [2025-01-04 13:22:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.3, 300 sec: 14926.1). Total num frames: 854532096. Throughput: 0: 3792.1. Samples: 202796544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:48,968][134211] Avg episode reward: [(0, '10.553')] [2025-01-04 13:22:51,256][134294] Updated weights for policy 0, policy_version 208634 (0.0024) [2025-01-04 13:22:53,970][134211] Fps is (10 sec: 14332.4, 60 sec: 14881.5, 300 sec: 14884.3). Total num frames: 854601728. Throughput: 0: 3708.6. Samples: 202817732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:53,971][134211] Avg episode reward: [(0, '10.136')] [2025-01-04 13:22:54,177][134294] Updated weights for policy 0, policy_version 208644 (0.0022) [2025-01-04 13:22:57,135][134294] Updated weights for policy 0, policy_version 208654 (0.0025) [2025-01-04 13:22:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 854671360. Throughput: 0: 3703.2. Samples: 202838524. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:22:58,968][134211] Avg episode reward: [(0, '9.705')] [2025-01-04 13:23:00,054][134294] Updated weights for policy 0, policy_version 208664 (0.0025) [2025-01-04 13:23:02,894][134294] Updated weights for policy 0, policy_version 208674 (0.0026) [2025-01-04 13:23:03,968][134211] Fps is (10 sec: 13929.9, 60 sec: 14677.4, 300 sec: 14648.5). Total num frames: 854740992. Throughput: 0: 3706.1. Samples: 202849386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:03,968][134211] Avg episode reward: [(0, '9.909')] [2025-01-04 13:23:05,212][134294] Updated weights for policy 0, policy_version 208684 (0.0017) [2025-01-04 13:23:07,080][134294] Updated weights for policy 0, policy_version 208694 (0.0014) [2025-01-04 13:23:08,967][134294] Updated weights for policy 0, policy_version 208704 (0.0012) [2025-01-04 13:23:08,968][134211] Fps is (10 sec: 18022.3, 60 sec: 15291.7, 300 sec: 14815.0). Total num frames: 854851584. Throughput: 0: 3831.5. Samples: 202876526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:08,968][134211] Avg episode reward: [(0, '10.976')] [2025-01-04 13:23:10,859][134294] Updated weights for policy 0, policy_version 208714 (0.0013) [2025-01-04 13:23:13,322][134294] Updated weights for policy 0, policy_version 208724 (0.0023) [2025-01-04 13:23:13,968][134211] Fps is (10 sec: 20070.0, 60 sec: 15633.0, 300 sec: 14884.4). Total num frames: 854941696. Throughput: 0: 3906.5. Samples: 202906438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:13,969][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 13:23:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000208726_854941696.pth... 
[2025-01-04 13:23:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000207842_851320832.pth [2025-01-04 13:23:16,487][134294] Updated weights for policy 0, policy_version 208734 (0.0026) [2025-01-04 13:23:18,968][134211] Fps is (10 sec: 15155.5, 60 sec: 15428.3, 300 sec: 14870.6). Total num frames: 855003136. Throughput: 0: 3798.5. Samples: 202915848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:18,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 13:23:19,668][134294] Updated weights for policy 0, policy_version 208744 (0.0027) [2025-01-04 13:23:22,796][134294] Updated weights for policy 0, policy_version 208754 (0.0029) [2025-01-04 13:23:23,968][134211] Fps is (10 sec: 12697.9, 60 sec: 15360.0, 300 sec: 14856.7). Total num frames: 855068672. Throughput: 0: 3780.2. Samples: 202935482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:23,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 13:23:25,739][134294] Updated weights for policy 0, policy_version 208764 (0.0024) [2025-01-04 13:23:28,558][134294] Updated weights for policy 0, policy_version 208774 (0.0024) [2025-01-04 13:23:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.5, 300 sec: 14856.7). Total num frames: 855142400. Throughput: 0: 3789.1. Samples: 202956254. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:28,968][134211] Avg episode reward: [(0, '10.625')] [2025-01-04 13:23:31,501][134294] Updated weights for policy 0, policy_version 208784 (0.0026) [2025-01-04 13:23:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 855212032. Throughput: 0: 3781.2. Samples: 202966698. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:33,968][134211] Avg episode reward: [(0, '9.785')] [2025-01-04 13:23:34,580][134294] Updated weights for policy 0, policy_version 208794 (0.0024) [2025-01-04 13:23:37,492][134294] Updated weights for policy 0, policy_version 208804 (0.0021) [2025-01-04 13:23:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 855281664. Throughput: 0: 3770.3. Samples: 202987386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:38,968][134211] Avg episode reward: [(0, '10.049')] [2025-01-04 13:23:40,383][134294] Updated weights for policy 0, policy_version 208814 (0.0024) [2025-01-04 13:23:43,205][134294] Updated weights for policy 0, policy_version 208824 (0.0025) [2025-01-04 13:23:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 855351296. Throughput: 0: 3786.1. Samples: 203008896. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:43,968][134211] Avg episode reward: [(0, '10.174')] [2025-01-04 13:23:46,091][134294] Updated weights for policy 0, policy_version 208834 (0.0026) [2025-01-04 13:23:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 855420928. Throughput: 0: 3779.7. Samples: 203019472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:48,968][134211] Avg episode reward: [(0, '10.734')] [2025-01-04 13:23:49,085][134294] Updated weights for policy 0, policy_version 208844 (0.0025) [2025-01-04 13:23:51,975][134294] Updated weights for policy 0, policy_version 208854 (0.0026) [2025-01-04 13:23:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14814.5, 300 sec: 14856.7). Total num frames: 855490560. Throughput: 0: 3642.7. Samples: 203040446. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:53,968][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 13:23:54,946][134294] Updated weights for policy 0, policy_version 208864 (0.0027) [2025-01-04 13:23:57,780][134294] Updated weights for policy 0, policy_version 208874 (0.0025) [2025-01-04 13:23:58,970][134211] Fps is (10 sec: 14332.4, 60 sec: 14881.6, 300 sec: 14870.4). Total num frames: 855564288. Throughput: 0: 3451.7. Samples: 203061772. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:23:58,971][134211] Avg episode reward: [(0, '9.307')] [2025-01-04 13:24:00,554][134294] Updated weights for policy 0, policy_version 208884 (0.0025) [2025-01-04 13:24:03,447][134294] Updated weights for policy 0, policy_version 208894 (0.0026) [2025-01-04 13:24:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14882.1, 300 sec: 14856.7). Total num frames: 855633920. Throughput: 0: 3483.4. Samples: 203072600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:03,968][134211] Avg episode reward: [(0, '9.733')] [2025-01-04 13:24:06,040][134294] Updated weights for policy 0, policy_version 208904 (0.0020) [2025-01-04 13:24:07,926][134294] Updated weights for policy 0, policy_version 208914 (0.0013) [2025-01-04 13:24:08,967][134211] Fps is (10 sec: 16798.1, 60 sec: 14677.5, 300 sec: 14953.9). Total num frames: 855732224. Throughput: 0: 3592.7. Samples: 203097152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:08,968][134211] Avg episode reward: [(0, '10.742')] [2025-01-04 13:24:09,824][134294] Updated weights for policy 0, policy_version 208924 (0.0014) [2025-01-04 13:24:12,548][134294] Updated weights for policy 0, policy_version 208934 (0.0023) [2025-01-04 13:24:13,968][134211] Fps is (10 sec: 17611.9, 60 sec: 14472.4, 300 sec: 14995.5). Total num frames: 855810048. Throughput: 0: 3722.8. Samples: 203123782. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:13,970][134211] Avg episode reward: [(0, '9.708')] [2025-01-04 13:24:15,457][134294] Updated weights for policy 0, policy_version 208944 (0.0027) [2025-01-04 13:24:18,316][134294] Updated weights for policy 0, policy_version 208954 (0.0024) [2025-01-04 13:24:18,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14677.4, 300 sec: 14967.7). Total num frames: 855883776. Throughput: 0: 3726.7. Samples: 203134398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:18,968][134211] Avg episode reward: [(0, '8.529')] [2025-01-04 13:24:21,333][134294] Updated weights for policy 0, policy_version 208964 (0.0024) [2025-01-04 13:24:23,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 855953408. Throughput: 0: 3734.3. Samples: 203155430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:23,968][134211] Avg episode reward: [(0, '9.280')] [2025-01-04 13:24:24,299][134294] Updated weights for policy 0, policy_version 208974 (0.0024) [2025-01-04 13:24:27,170][134294] Updated weights for policy 0, policy_version 208984 (0.0024) [2025-01-04 13:24:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.4, 300 sec: 14773.4). Total num frames: 856023040. Throughput: 0: 3720.2. Samples: 203176302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:28,968][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 13:24:30,095][134294] Updated weights for policy 0, policy_version 208994 (0.0025) [2025-01-04 13:24:32,921][134294] Updated weights for policy 0, policy_version 209004 (0.0025) [2025-01-04 13:24:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14787.3). Total num frames: 856092672. Throughput: 0: 3725.8. Samples: 203187132. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:33,968][134211] Avg episode reward: [(0, '8.628')] [2025-01-04 13:24:35,779][134294] Updated weights for policy 0, policy_version 209014 (0.0022) [2025-01-04 13:24:38,553][134294] Updated weights for policy 0, policy_version 209024 (0.0025) [2025-01-04 13:24:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 856166400. Throughput: 0: 3741.1. Samples: 203208794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:38,968][134211] Avg episode reward: [(0, '11.051')] [2025-01-04 13:24:41,441][134294] Updated weights for policy 0, policy_version 209034 (0.0026) [2025-01-04 13:24:43,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 856236032. Throughput: 0: 3736.7. Samples: 203229916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:43,968][134211] Avg episode reward: [(0, '10.339')] [2025-01-04 13:24:44,483][134294] Updated weights for policy 0, policy_version 209044 (0.0023) [2025-01-04 13:24:47,316][134294] Updated weights for policy 0, policy_version 209054 (0.0024) [2025-01-04 13:24:48,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14950.4, 300 sec: 14856.7). Total num frames: 856317952. Throughput: 0: 3725.2. Samples: 203240232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:48,968][134211] Avg episode reward: [(0, '9.693')] [2025-01-04 13:24:49,242][134294] Updated weights for policy 0, policy_version 209064 (0.0013) [2025-01-04 13:24:51,918][134294] Updated weights for policy 0, policy_version 209074 (0.0021) [2025-01-04 13:24:53,968][134211] Fps is (10 sec: 15564.7, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 856391680. Throughput: 0: 3750.2. Samples: 203265910. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:53,968][134211] Avg episode reward: [(0, '10.263')] [2025-01-04 13:24:54,929][134294] Updated weights for policy 0, policy_version 209084 (0.0027) [2025-01-04 13:24:57,813][134294] Updated weights for policy 0, policy_version 209094 (0.0024) [2025-01-04 13:24:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14951.0, 300 sec: 14870.6). Total num frames: 856461312. Throughput: 0: 3618.6. Samples: 203286616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:24:58,968][134211] Avg episode reward: [(0, '10.581')] [2025-01-04 13:25:00,804][134294] Updated weights for policy 0, policy_version 209104 (0.0023) [2025-01-04 13:25:03,654][134294] Updated weights for policy 0, policy_version 209114 (0.0023) [2025-01-04 13:25:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15018.7, 300 sec: 14884.5). Total num frames: 856535040. Throughput: 0: 3619.6. Samples: 203297278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:03,968][134211] Avg episode reward: [(0, '10.784')] [2025-01-04 13:25:06,557][134294] Updated weights for policy 0, policy_version 209124 (0.0022) [2025-01-04 13:25:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14540.8, 300 sec: 14884.5). Total num frames: 856604672. Throughput: 0: 3620.7. Samples: 203318362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:08,968][134211] Avg episode reward: [(0, '9.831')] [2025-01-04 13:25:09,509][134294] Updated weights for policy 0, policy_version 209134 (0.0026) [2025-01-04 13:25:12,219][134294] Updated weights for policy 0, policy_version 209144 (0.0019) [2025-01-04 13:25:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14677.5, 300 sec: 14926.1). Total num frames: 856690688. Throughput: 0: 3681.2. Samples: 203341956. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:13,968][134211] Avg episode reward: [(0, '10.391')] [2025-01-04 13:25:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209153_856690688.pth... [2025-01-04 13:25:14,017][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000208282_853123072.pth [2025-01-04 13:25:14,133][134294] Updated weights for policy 0, policy_version 209154 (0.0013) [2025-01-04 13:25:16,075][134294] Updated weights for policy 0, policy_version 209164 (0.0014) [2025-01-04 13:25:18,748][134294] Updated weights for policy 0, policy_version 209174 (0.0024) [2025-01-04 13:25:18,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14882.1, 300 sec: 14981.6). Total num frames: 856776704. Throughput: 0: 3789.4. Samples: 203357656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:18,968][134211] Avg episode reward: [(0, '10.868')] [2025-01-04 13:25:21,813][134294] Updated weights for policy 0, policy_version 209184 (0.0026) [2025-01-04 13:25:23,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14882.1, 300 sec: 14981.6). Total num frames: 856846336. Throughput: 0: 3782.6. Samples: 203379012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:23,968][134211] Avg episode reward: [(0, '10.781')] [2025-01-04 13:25:24,804][134294] Updated weights for policy 0, policy_version 209194 (0.0024) [2025-01-04 13:25:27,766][134294] Updated weights for policy 0, policy_version 209204 (0.0024) [2025-01-04 13:25:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14870.6). Total num frames: 856915968. Throughput: 0: 3771.4. Samples: 203399630. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:28,968][134211] Avg episode reward: [(0, '10.273')] [2025-01-04 13:25:30,693][134294] Updated weights for policy 0, policy_version 209214 (0.0027) [2025-01-04 13:25:33,416][134294] Updated weights for policy 0, policy_version 209224 (0.0023) [2025-01-04 13:25:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.2, 300 sec: 14745.6). Total num frames: 856985600. Throughput: 0: 3777.2. Samples: 203410204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:33,968][134211] Avg episode reward: [(0, '10.175')] [2025-01-04 13:25:36,358][134294] Updated weights for policy 0, policy_version 209234 (0.0023) [2025-01-04 13:25:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 857055232. Throughput: 0: 3680.2. Samples: 203431520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:38,968][134211] Avg episode reward: [(0, '9.441')] [2025-01-04 13:25:39,422][134294] Updated weights for policy 0, policy_version 209244 (0.0026) [2025-01-04 13:25:42,293][134294] Updated weights for policy 0, policy_version 209254 (0.0026) [2025-01-04 13:25:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 857124864. Throughput: 0: 3681.9. Samples: 203452302. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:43,968][134211] Avg episode reward: [(0, '9.245')] [2025-01-04 13:25:45,171][134294] Updated weights for policy 0, policy_version 209264 (0.0025) [2025-01-04 13:25:48,135][134294] Updated weights for policy 0, policy_version 209274 (0.0024) [2025-01-04 13:25:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.0, 300 sec: 14759.5). Total num frames: 857194496. Throughput: 0: 3684.4. Samples: 203463078. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:25:48,969][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 13:25:51,234][134294] Updated weights for policy 0, policy_version 209284 (0.0023) [2025-01-04 13:25:53,891][134294] Updated weights for policy 0, policy_version 209294 (0.0020) [2025-01-04 13:25:53,968][134211] Fps is (10 sec: 14335.4, 60 sec: 14609.0, 300 sec: 14787.2). Total num frames: 857268224. Throughput: 0: 3655.4. Samples: 203482856. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:25:53,969][134211] Avg episode reward: [(0, '9.509')] [2025-01-04 13:25:56,011][134294] Updated weights for policy 0, policy_version 209304 (0.0014) [2025-01-04 13:25:58,030][134294] Updated weights for policy 0, policy_version 209314 (0.0015) [2025-01-04 13:25:58,968][134211] Fps is (10 sec: 17203.4, 60 sec: 15086.9, 300 sec: 14870.6). Total num frames: 857366528. Throughput: 0: 3771.7. Samples: 203511684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:25:58,968][134211] Avg episode reward: [(0, '9.780')] [2025-01-04 13:26:00,289][134294] Updated weights for policy 0, policy_version 209324 (0.0022) [2025-01-04 13:26:03,595][134294] Updated weights for policy 0, policy_version 209334 (0.0026) [2025-01-04 13:26:03,968][134211] Fps is (10 sec: 16794.3, 60 sec: 15018.7, 300 sec: 14870.6). Total num frames: 857436160. Throughput: 0: 3689.0. Samples: 203523660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:03,968][134211] Avg episode reward: [(0, '10.453')] [2025-01-04 13:26:06,734][134294] Updated weights for policy 0, policy_version 209344 (0.0025) [2025-01-04 13:26:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14882.1, 300 sec: 14842.8). Total num frames: 857497600. Throughput: 0: 3633.7. Samples: 203542528. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:08,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 13:26:10,077][134294] Updated weights for policy 0, policy_version 209354 (0.0028) [2025-01-04 13:26:13,175][134294] Updated weights for policy 0, policy_version 209364 (0.0024) [2025-01-04 13:26:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14828.9). Total num frames: 857563136. Throughput: 0: 3609.1. Samples: 203562038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:13,968][134211] Avg episode reward: [(0, '10.260')] [2025-01-04 13:26:16,612][134294] Updated weights for policy 0, policy_version 209374 (0.0028) [2025-01-04 13:26:18,969][134211] Fps is (10 sec: 12696.2, 60 sec: 14130.9, 300 sec: 14787.2). Total num frames: 857624576. Throughput: 0: 3575.9. Samples: 203571122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:18,969][134211] Avg episode reward: [(0, '9.774')] [2025-01-04 13:26:19,981][134294] Updated weights for policy 0, policy_version 209384 (0.0027) [2025-01-04 13:26:22,939][134294] Updated weights for policy 0, policy_version 209394 (0.0029) [2025-01-04 13:26:23,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14063.0, 300 sec: 14773.4). Total num frames: 857690112. Throughput: 0: 3516.2. Samples: 203589750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:23,968][134211] Avg episode reward: [(0, '10.546')] [2025-01-04 13:26:25,958][134294] Updated weights for policy 0, policy_version 209404 (0.0023) [2025-01-04 13:26:28,020][134294] Updated weights for policy 0, policy_version 209414 (0.0016) [2025-01-04 13:26:28,968][134211] Fps is (10 sec: 14747.4, 60 sec: 14267.8, 300 sec: 14815.0). Total num frames: 857772032. Throughput: 0: 3590.9. Samples: 203613890. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:28,968][134211] Avg episode reward: [(0, '9.967')] [2025-01-04 13:26:31,172][134294] Updated weights for policy 0, policy_version 209424 (0.0026) [2025-01-04 13:26:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14131.2, 300 sec: 14773.4). Total num frames: 857833472. Throughput: 0: 3565.9. Samples: 203623542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:33,969][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 13:26:34,436][134294] Updated weights for policy 0, policy_version 209434 (0.0026) [2025-01-04 13:26:36,491][134294] Updated weights for policy 0, policy_version 209444 (0.0015) [2025-01-04 13:26:38,971][134211] Fps is (10 sec: 13922.0, 60 sec: 14267.0, 300 sec: 14731.6). Total num frames: 857911296. Throughput: 0: 3627.9. Samples: 203646122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:38,971][134211] Avg episode reward: [(0, '10.848')] [2025-01-04 13:26:40,189][134294] Updated weights for policy 0, policy_version 209454 (0.0032) [2025-01-04 13:26:42,637][134294] Updated weights for policy 0, policy_version 209464 (0.0020) [2025-01-04 13:26:43,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14404.3, 300 sec: 14759.5). Total num frames: 857989120. Throughput: 0: 3450.5. Samples: 203666958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:43,968][134211] Avg episode reward: [(0, '10.175')] [2025-01-04 13:26:44,761][134294] Updated weights for policy 0, policy_version 209474 (0.0016) [2025-01-04 13:26:46,856][134294] Updated weights for policy 0, policy_version 209484 (0.0014) [2025-01-04 13:26:48,913][134294] Updated weights for policy 0, policy_version 209494 (0.0012) [2025-01-04 13:26:48,969][134211] Fps is (10 sec: 17616.3, 60 sec: 14881.9, 300 sec: 14842.7). Total num frames: 858087424. Throughput: 0: 3512.6. Samples: 203681730. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:48,969][134211] Avg episode reward: [(0, '10.970')] [2025-01-04 13:26:50,824][134294] Updated weights for policy 0, policy_version 209504 (0.0013) [2025-01-04 13:26:52,744][134294] Updated weights for policy 0, policy_version 209514 (0.0012) [2025-01-04 13:26:53,968][134211] Fps is (10 sec: 20479.9, 60 sec: 15428.4, 300 sec: 14967.8). Total num frames: 858193920. Throughput: 0: 3780.4. Samples: 203712648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:53,968][134211] Avg episode reward: [(0, '9.550')] [2025-01-04 13:26:54,822][134294] Updated weights for policy 0, policy_version 209524 (0.0019) [2025-01-04 13:26:58,208][134294] Updated weights for policy 0, policy_version 209534 (0.0033) [2025-01-04 13:26:58,968][134211] Fps is (10 sec: 17204.9, 60 sec: 14882.1, 300 sec: 14912.2). Total num frames: 858259456. Throughput: 0: 3864.7. Samples: 203735952. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:26:58,969][134211] Avg episode reward: [(0, '9.378')] [2025-01-04 13:27:01,550][134294] Updated weights for policy 0, policy_version 209544 (0.0026) [2025-01-04 13:27:03,968][134211] Fps is (10 sec: 12287.8, 60 sec: 14677.3, 300 sec: 14856.7). Total num frames: 858316800. Throughput: 0: 3866.6. Samples: 203745114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:03,969][134211] Avg episode reward: [(0, '9.800')] [2025-01-04 13:27:05,146][134294] Updated weights for policy 0, policy_version 209554 (0.0029) [2025-01-04 13:27:08,330][134294] Updated weights for policy 0, policy_version 209564 (0.0029) [2025-01-04 13:27:08,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14677.3, 300 sec: 14828.9). Total num frames: 858378240. Throughput: 0: 3854.5. Samples: 203763202. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:08,968][134211] Avg episode reward: [(0, '9.649')] [2025-01-04 13:27:11,642][134294] Updated weights for policy 0, policy_version 209574 (0.0029) [2025-01-04 13:27:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14677.3, 300 sec: 14801.1). Total num frames: 858443776. Throughput: 0: 3728.9. Samples: 203781692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:13,968][134211] Avg episode reward: [(0, '10.139')] [2025-01-04 13:27:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209581_858443776.pth... [2025-01-04 13:27:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000208726_854941696.pth [2025-01-04 13:27:14,911][134294] Updated weights for policy 0, policy_version 209584 (0.0024) [2025-01-04 13:27:17,931][134294] Updated weights for policy 0, policy_version 209594 (0.0027) [2025-01-04 13:27:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.9, 300 sec: 14787.2). Total num frames: 858509312. Throughput: 0: 3729.8. Samples: 203791382. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:18,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 13:27:20,942][134294] Updated weights for policy 0, policy_version 209604 (0.0025) [2025-01-04 13:27:23,858][134294] Updated weights for policy 0, policy_version 209614 (0.0025) [2025-01-04 13:27:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 858578944. Throughput: 0: 3687.2. Samples: 203812036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:23,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 13:27:26,957][134294] Updated weights for policy 0, policy_version 209624 (0.0026) [2025-01-04 13:27:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 858644480. 
Throughput: 0: 3670.1. Samples: 203832112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:28,968][134211] Avg episode reward: [(0, '10.615')] [2025-01-04 13:27:29,940][134294] Updated weights for policy 0, policy_version 209634 (0.0023) [2025-01-04 13:27:32,878][134294] Updated weights for policy 0, policy_version 209644 (0.0023) [2025-01-04 13:27:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14648.4). Total num frames: 858714112. Throughput: 0: 3577.9. Samples: 203842734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:33,968][134211] Avg episode reward: [(0, '9.557')] [2025-01-04 13:27:35,901][134294] Updated weights for policy 0, policy_version 209654 (0.0025) [2025-01-04 13:27:38,850][134294] Updated weights for policy 0, policy_version 209664 (0.0029) [2025-01-04 13:27:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14541.5, 300 sec: 14662.3). Total num frames: 858783744. Throughput: 0: 3355.5. Samples: 203863646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:38,968][134211] Avg episode reward: [(0, '8.898')] [2025-01-04 13:27:41,925][134294] Updated weights for policy 0, policy_version 209674 (0.0024) [2025-01-04 13:27:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 858853376. Throughput: 0: 3281.5. Samples: 203883618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:43,968][134211] Avg episode reward: [(0, '9.626')] [2025-01-04 13:27:44,949][134294] Updated weights for policy 0, policy_version 209684 (0.0025) [2025-01-04 13:27:47,906][134294] Updated weights for policy 0, policy_version 209694 (0.0024) [2025-01-04 13:27:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.4, 300 sec: 14634.6). Total num frames: 858918912. Throughput: 0: 3301.5. Samples: 203893682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:48,968][134211] Avg episode reward: [(0, '10.512')] [2025-01-04 13:27:50,914][134294] Updated weights for policy 0, policy_version 209704 (0.0028) [2025-01-04 13:27:53,760][134294] Updated weights for policy 0, policy_version 209714 (0.0024) [2025-01-04 13:27:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 14634.5). Total num frames: 858988544. Throughput: 0: 3366.6. Samples: 203914700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:53,968][134211] Avg episode reward: [(0, '10.003')] [2025-01-04 13:27:56,776][134294] Updated weights for policy 0, policy_version 209724 (0.0025) [2025-01-04 13:27:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13312.0, 300 sec: 14634.5). Total num frames: 859058176. Throughput: 0: 3413.7. Samples: 203935310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:27:58,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 13:27:59,943][134294] Updated weights for policy 0, policy_version 209734 (0.0024) [2025-01-04 13:28:03,393][134294] Updated weights for policy 0, policy_version 209744 (0.0028) [2025-01-04 13:28:03,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13312.0, 300 sec: 14454.0). Total num frames: 859115520. Throughput: 0: 3397.9. Samples: 203944288. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:28:03,969][134211] Avg episode reward: [(0, '9.918')] [2025-01-04 13:28:06,661][134294] Updated weights for policy 0, policy_version 209754 (0.0027) [2025-01-04 13:28:08,967][134211] Fps is (10 sec: 13107.5, 60 sec: 13516.9, 300 sec: 14398.5). Total num frames: 859189248. Throughput: 0: 3344.3. Samples: 203962530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:08,968][134211] Avg episode reward: [(0, '9.157')] [2025-01-04 13:28:09,042][134294] Updated weights for policy 0, policy_version 209764 (0.0016) [2025-01-04 13:28:11,006][134294] Updated weights for policy 0, policy_version 209774 (0.0014) [2025-01-04 13:28:12,961][134294] Updated weights for policy 0, policy_version 209784 (0.0013) [2025-01-04 13:28:13,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14199.5, 300 sec: 14551.2). Total num frames: 859295744. Throughput: 0: 3578.8. Samples: 203993160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:13,968][134211] Avg episode reward: [(0, '9.470')] [2025-01-04 13:28:14,906][134294] Updated weights for policy 0, policy_version 209794 (0.0014) [2025-01-04 13:28:17,023][134294] Updated weights for policy 0, policy_version 209804 (0.0016) [2025-01-04 13:28:18,968][134211] Fps is (10 sec: 18431.1, 60 sec: 14404.2, 300 sec: 14592.8). Total num frames: 859373568. Throughput: 0: 3692.3. Samples: 204008890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:18,969][134211] Avg episode reward: [(0, '10.438')] [2025-01-04 13:28:21,232][134294] Updated weights for policy 0, policy_version 209814 (0.0036) [2025-01-04 13:28:23,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14131.1, 300 sec: 14523.4). Total num frames: 859426816. Throughput: 0: 3586.6. Samples: 204025046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:23,969][134211] Avg episode reward: [(0, '10.331')] [2025-01-04 13:28:24,865][134294] Updated weights for policy 0, policy_version 209824 (0.0028) [2025-01-04 13:28:28,033][134294] Updated weights for policy 0, policy_version 209834 (0.0025) [2025-01-04 13:28:28,968][134211] Fps is (10 sec: 11469.2, 60 sec: 14063.0, 300 sec: 14495.7). Total num frames: 859488256. Throughput: 0: 3560.4. Samples: 204043834. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:28,968][134211] Avg episode reward: [(0, '10.326')] [2025-01-04 13:28:30,992][134294] Updated weights for policy 0, policy_version 209844 (0.0025) [2025-01-04 13:28:33,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14062.9, 300 sec: 14495.7). Total num frames: 859557888. Throughput: 0: 3565.3. Samples: 204054120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:33,969][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 13:28:34,146][134294] Updated weights for policy 0, policy_version 209854 (0.0026) [2025-01-04 13:28:37,138][134294] Updated weights for policy 0, policy_version 209864 (0.0027) [2025-01-04 13:28:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.7, 300 sec: 14481.8). Total num frames: 859623424. Throughput: 0: 3536.6. Samples: 204073848. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:38,968][134211] Avg episode reward: [(0, '9.556')] [2025-01-04 13:28:40,336][134294] Updated weights for policy 0, policy_version 209874 (0.0024) [2025-01-04 13:28:43,305][134294] Updated weights for policy 0, policy_version 209884 (0.0023) [2025-01-04 13:28:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.6, 300 sec: 14481.8). Total num frames: 859693056. Throughput: 0: 3526.3. Samples: 204093992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:43,968][134211] Avg episode reward: [(0, '9.272')] [2025-01-04 13:28:46,211][134294] Updated weights for policy 0, policy_version 209894 (0.0022) [2025-01-04 13:28:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 14467.9). Total num frames: 859758592. Throughput: 0: 3557.9. Samples: 204104394. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:48,971][134211] Avg episode reward: [(0, '9.387')] [2025-01-04 13:28:49,345][134294] Updated weights for policy 0, policy_version 209904 (0.0026) [2025-01-04 13:28:52,334][134294] Updated weights for policy 0, policy_version 209914 (0.0026) [2025-01-04 13:28:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13994.7, 300 sec: 14454.1). Total num frames: 859828224. Throughput: 0: 3600.0. Samples: 204124530. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:53,968][134211] Avg episode reward: [(0, '9.885')] [2025-01-04 13:28:55,260][134294] Updated weights for policy 0, policy_version 209924 (0.0024) [2025-01-04 13:28:58,217][134294] Updated weights for policy 0, policy_version 209934 (0.0024) [2025-01-04 13:28:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14454.0). Total num frames: 859897856. Throughput: 0: 3387.5. Samples: 204145598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:28:58,968][134211] Avg episode reward: [(0, '9.898')] [2025-01-04 13:29:01,148][134294] Updated weights for policy 0, policy_version 209944 (0.0024) [2025-01-04 13:29:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14131.2, 300 sec: 14342.9). Total num frames: 859963392. Throughput: 0: 3271.8. Samples: 204156122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:29:03,968][134211] Avg episode reward: [(0, '9.526')] [2025-01-04 13:29:04,332][134294] Updated weights for policy 0, policy_version 209954 (0.0027) [2025-01-04 13:29:07,444][134294] Updated weights for policy 0, policy_version 209964 (0.0026) [2025-01-04 13:29:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13994.6, 300 sec: 14301.3). Total num frames: 860028928. Throughput: 0: 3340.0. Samples: 204175344. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:29:08,968][134211] Avg episode reward: [(0, '9.840')] [2025-01-04 13:29:10,410][134294] Updated weights for policy 0, policy_version 209974 (0.0024) [2025-01-04 13:29:12,745][134294] Updated weights for policy 0, policy_version 209984 (0.0015) [2025-01-04 13:29:13,968][134211] Fps is (10 sec: 15565.2, 60 sec: 13721.6, 300 sec: 14356.8). Total num frames: 860119040. Throughput: 0: 3450.7. Samples: 204199114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:29:13,968][134211] Avg episode reward: [(0, '10.292')] [2025-01-04 13:29:13,972][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209990_860119040.pth... [2025-01-04 13:29:14,026][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209153_856690688.pth [2025-01-04 13:29:14,673][134294] Updated weights for policy 0, policy_version 209994 (0.0011) [2025-01-04 13:29:16,636][134294] Updated weights for policy 0, policy_version 210004 (0.0012) [2025-01-04 13:29:18,608][134294] Updated weights for policy 0, policy_version 210014 (0.0013) [2025-01-04 13:29:18,968][134211] Fps is (10 sec: 19251.4, 60 sec: 14131.3, 300 sec: 14467.9). Total num frames: 860221440. Throughput: 0: 3576.0. Samples: 204215038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:18,968][134211] Avg episode reward: [(0, '10.248')] [2025-01-04 13:29:20,567][134294] Updated weights for policy 0, policy_version 210024 (0.0016) [2025-01-04 13:29:23,219][134294] Updated weights for policy 0, policy_version 210034 (0.0024) [2025-01-04 13:29:23,968][134211] Fps is (10 sec: 18431.5, 60 sec: 14609.2, 300 sec: 14509.5). Total num frames: 860303360. Throughput: 0: 3793.3. Samples: 204244548. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:23,969][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 13:29:26,761][134294] Updated weights for policy 0, policy_version 210044 (0.0030) [2025-01-04 13:29:28,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14609.0, 300 sec: 14481.8). Total num frames: 860364800. Throughput: 0: 3743.6. Samples: 204262454. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:28,968][134211] Avg episode reward: [(0, '10.936')] [2025-01-04 13:29:30,074][134294] Updated weights for policy 0, policy_version 210054 (0.0030) [2025-01-04 13:29:33,231][134294] Updated weights for policy 0, policy_version 210064 (0.0026) [2025-01-04 13:29:33,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 860430336. Throughput: 0: 3724.8. Samples: 204272012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:33,969][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 13:29:36,204][134294] Updated weights for policy 0, policy_version 210074 (0.0027) [2025-01-04 13:29:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 860495872. Throughput: 0: 3722.8. Samples: 204292056. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:38,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 13:29:39,372][134294] Updated weights for policy 0, policy_version 210084 (0.0027) [2025-01-04 13:29:42,364][134294] Updated weights for policy 0, policy_version 210094 (0.0026) [2025-01-04 13:29:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14540.8, 300 sec: 14398.5). Total num frames: 860565504. Throughput: 0: 3698.3. Samples: 204312020. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:43,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 13:29:45,440][134294] Updated weights for policy 0, policy_version 210104 (0.0026) [2025-01-04 13:29:48,517][134294] Updated weights for policy 0, policy_version 210114 (0.0025) [2025-01-04 13:29:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.8, 300 sec: 14370.7). Total num frames: 860631040. Throughput: 0: 3695.0. Samples: 204322398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:48,968][134211] Avg episode reward: [(0, '8.839')] [2025-01-04 13:29:51,546][134294] Updated weights for policy 0, policy_version 210124 (0.0026) [2025-01-04 13:29:53,971][134211] Fps is (10 sec: 13103.0, 60 sec: 14471.7, 300 sec: 14356.7). Total num frames: 860696576. Throughput: 0: 3706.7. Samples: 204342156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:53,972][134211] Avg episode reward: [(0, '9.703')] [2025-01-04 13:29:54,690][134294] Updated weights for policy 0, policy_version 210134 (0.0026) [2025-01-04 13:29:57,744][134294] Updated weights for policy 0, policy_version 210144 (0.0025) [2025-01-04 13:29:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14472.5, 300 sec: 14342.9). Total num frames: 860766208. Throughput: 0: 3622.4. Samples: 204362122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:29:58,968][134211] Avg episode reward: [(0, '10.643')] [2025-01-04 13:30:00,661][134294] Updated weights for policy 0, policy_version 210154 (0.0022) [2025-01-04 13:30:03,713][134294] Updated weights for policy 0, policy_version 210164 (0.0025) [2025-01-04 13:30:03,969][134211] Fps is (10 sec: 13520.0, 60 sec: 14472.4, 300 sec: 14329.0). Total num frames: 860831744. Throughput: 0: 3502.0. Samples: 204372632. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:30:03,970][134211] Avg episode reward: [(0, '10.245')] [2025-01-04 13:30:06,699][134294] Updated weights for policy 0, policy_version 210174 (0.0026) [2025-01-04 13:30:08,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14540.8, 300 sec: 14273.5). Total num frames: 860901376. Throughput: 0: 3296.0. Samples: 204392868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:30:08,968][134211] Avg episode reward: [(0, '8.370')] [2025-01-04 13:30:09,802][134294] Updated weights for policy 0, policy_version 210184 (0.0026) [2025-01-04 13:30:12,983][134294] Updated weights for policy 0, policy_version 210194 (0.0029) [2025-01-04 13:30:13,968][134211] Fps is (10 sec: 13517.8, 60 sec: 14131.1, 300 sec: 14204.1). Total num frames: 860966912. Throughput: 0: 3331.9. Samples: 204412388. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:30:13,969][134211] Avg episode reward: [(0, '9.458')] [2025-01-04 13:30:16,274][134294] Updated weights for policy 0, policy_version 210204 (0.0028) [2025-01-04 13:30:18,847][134294] Updated weights for policy 0, policy_version 210214 (0.0020) [2025-01-04 13:30:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.0, 300 sec: 14204.1). Total num frames: 861036544. Throughput: 0: 3331.7. Samples: 204421940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 13:30:18,968][134211] Avg episode reward: [(0, '8.903')] [2025-01-04 13:30:21,351][134294] Updated weights for policy 0, policy_version 210224 (0.0021) [2025-01-04 13:30:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13448.5, 300 sec: 14218.0). Total num frames: 861110272. Throughput: 0: 3404.7. Samples: 204445270. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:23,969][134211] Avg episode reward: [(0, '10.137')] [2025-01-04 13:30:24,636][134294] Updated weights for policy 0, policy_version 210234 (0.0027) [2025-01-04 13:30:27,177][134294] Updated weights for policy 0, policy_version 210244 (0.0019) [2025-01-04 13:30:28,969][134211] Fps is (10 sec: 15153.9, 60 sec: 13721.4, 300 sec: 14245.7). Total num frames: 861188096. Throughput: 0: 3466.2. Samples: 204468004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:28,969][134211] Avg episode reward: [(0, '10.433')] [2025-01-04 13:30:29,574][134294] Updated weights for policy 0, policy_version 210254 (0.0020) [2025-01-04 13:30:32,653][134294] Updated weights for policy 0, policy_version 210264 (0.0027) [2025-01-04 13:30:33,968][134211] Fps is (10 sec: 14745.7, 60 sec: 13789.9, 300 sec: 14245.7). Total num frames: 861257728. Throughput: 0: 3475.2. Samples: 204478784. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:33,968][134211] Avg episode reward: [(0, '11.082')] [2025-01-04 13:30:35,680][134294] Updated weights for policy 0, policy_version 210274 (0.0028) [2025-01-04 13:30:38,582][134294] Updated weights for policy 0, policy_version 210284 (0.0026) [2025-01-04 13:30:38,968][134211] Fps is (10 sec: 13927.8, 60 sec: 13858.1, 300 sec: 14245.7). Total num frames: 861327360. Throughput: 0: 3488.5. Samples: 204499126. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:38,968][134211] Avg episode reward: [(0, '10.251')] [2025-01-04 13:30:41,787][134294] Updated weights for policy 0, policy_version 210294 (0.0024) [2025-01-04 13:30:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13653.3, 300 sec: 14204.1). Total num frames: 861384704. Throughput: 0: 3463.8. Samples: 204517996. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:43,969][134211] Avg episode reward: [(0, '9.729')] [2025-01-04 13:30:45,745][134294] Updated weights for policy 0, policy_version 210304 (0.0032) [2025-01-04 13:30:47,995][134294] Updated weights for policy 0, policy_version 210314 (0.0014) [2025-01-04 13:30:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.4, 300 sec: 14231.9). Total num frames: 861466624. Throughput: 0: 3401.4. Samples: 204525694. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:48,968][134211] Avg episode reward: [(0, '9.147')] [2025-01-04 13:30:49,933][134294] Updated weights for policy 0, policy_version 210324 (0.0013) [2025-01-04 13:30:51,799][134294] Updated weights for policy 0, policy_version 210334 (0.0013) [2025-01-04 13:30:53,862][134294] Updated weights for policy 0, policy_version 210344 (0.0014) [2025-01-04 13:30:53,968][134211] Fps is (10 sec: 18432.6, 60 sec: 14541.6, 300 sec: 14245.8). Total num frames: 861569024. Throughput: 0: 3650.9. Samples: 204557156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:53,969][134211] Avg episode reward: [(0, '10.478')] [2025-01-04 13:30:55,814][134294] Updated weights for policy 0, policy_version 210354 (0.0013) [2025-01-04 13:30:57,779][134294] Updated weights for policy 0, policy_version 210364 (0.0014) [2025-01-04 13:30:58,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15018.5, 300 sec: 14342.9). Total num frames: 861667328. Throughput: 0: 3893.8. Samples: 204587610. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:30:58,969][134211] Avg episode reward: [(0, '10.020')] [2025-01-04 13:31:00,664][134294] Updated weights for policy 0, policy_version 210374 (0.0029) [2025-01-04 13:31:03,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14950.6, 300 sec: 14342.9). Total num frames: 861728768. Throughput: 0: 3901.0. Samples: 204597486. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:03,969][134211] Avg episode reward: [(0, '10.074')] [2025-01-04 13:31:04,057][134294] Updated weights for policy 0, policy_version 210384 (0.0028) [2025-01-04 13:31:07,259][134294] Updated weights for policy 0, policy_version 210394 (0.0027) [2025-01-04 13:31:08,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14882.2, 300 sec: 14342.9). Total num frames: 861794304. Throughput: 0: 3803.6. Samples: 204616432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:08,968][134211] Avg episode reward: [(0, '9.343')] [2025-01-04 13:31:10,486][134294] Updated weights for policy 0, policy_version 210404 (0.0031) [2025-01-04 13:31:13,638][134294] Updated weights for policy 0, policy_version 210414 (0.0026) [2025-01-04 13:31:13,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14813.9, 300 sec: 14343.0). Total num frames: 861855744. Throughput: 0: 3725.9. Samples: 204635664. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:13,968][134211] Avg episode reward: [(0, '8.902')] [2025-01-04 13:31:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000210415_861859840.pth... [2025-01-04 13:31:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209581_858443776.pth [2025-01-04 13:31:16,754][134294] Updated weights for policy 0, policy_version 210424 (0.0027) [2025-01-04 13:31:18,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14813.9, 300 sec: 14356.8). Total num frames: 861925376. Throughput: 0: 3701.3. Samples: 204645344. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:18,968][134211] Avg episode reward: [(0, '9.708')] [2025-01-04 13:31:19,917][134294] Updated weights for policy 0, policy_version 210434 (0.0026) [2025-01-04 13:31:22,813][134294] Updated weights for policy 0, policy_version 210444 (0.0022) [2025-01-04 13:31:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.3, 300 sec: 14301.3). Total num frames: 861990912. Throughput: 0: 3698.1. Samples: 204665542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:23,968][134211] Avg episode reward: [(0, '9.362')] [2025-01-04 13:31:25,924][134294] Updated weights for policy 0, policy_version 210454 (0.0026) [2025-01-04 13:31:28,960][134294] Updated weights for policy 0, policy_version 210464 (0.0028) [2025-01-04 13:31:28,980][134211] Fps is (10 sec: 13500.4, 60 sec: 14538.1, 300 sec: 14328.5). Total num frames: 862060544. Throughput: 0: 3731.6. Samples: 204685962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:31:28,980][134211] Avg episode reward: [(0, '9.946')] [2025-01-04 13:31:32,011][134294] Updated weights for policy 0, policy_version 210474 (0.0025) [2025-01-04 13:31:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14287.5). Total num frames: 862126080. Throughput: 0: 3781.3. Samples: 204695852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:33,968][134211] Avg episode reward: [(0, '9.334')] [2025-01-04 13:31:34,926][134294] Updated weights for policy 0, policy_version 210484 (0.0024) [2025-01-04 13:31:37,898][134294] Updated weights for policy 0, policy_version 210494 (0.0025) [2025-01-04 13:31:38,968][134211] Fps is (10 sec: 13533.2, 60 sec: 14472.5, 300 sec: 14259.6). Total num frames: 862195712. Throughput: 0: 3542.8. Samples: 204716584. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:38,968][134211] Avg episode reward: [(0, '9.799')] [2025-01-04 13:31:40,950][134294] Updated weights for policy 0, policy_version 210504 (0.0025) [2025-01-04 13:31:43,926][134294] Updated weights for policy 0, policy_version 210514 (0.0027) [2025-01-04 13:31:43,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14677.4, 300 sec: 14162.5). Total num frames: 862265344. Throughput: 0: 3323.9. Samples: 204737184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:43,968][134211] Avg episode reward: [(0, '9.432')] [2025-01-04 13:31:46,889][134294] Updated weights for policy 0, policy_version 210524 (0.0022) [2025-01-04 13:31:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14023.6). Total num frames: 862330880. Throughput: 0: 3330.6. Samples: 204747360. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:48,968][134211] Avg episode reward: [(0, '9.203')] [2025-01-04 13:31:49,984][134294] Updated weights for policy 0, policy_version 210534 (0.0027) [2025-01-04 13:31:53,013][134294] Updated weights for policy 0, policy_version 210544 (0.0024) [2025-01-04 13:31:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 13789.8, 300 sec: 14023.6). Total num frames: 862396416. Throughput: 0: 3353.3. Samples: 204767332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:53,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 13:31:56,003][134294] Updated weights for policy 0, policy_version 210554 (0.0025) [2025-01-04 13:31:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13312.1, 300 sec: 14065.3). Total num frames: 862466048. Throughput: 0: 3374.1. Samples: 204787496. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:31:58,968][134211] Avg episode reward: [(0, '10.462')] [2025-01-04 13:31:59,250][134294] Updated weights for policy 0, policy_version 210564 (0.0025) [2025-01-04 13:32:02,291][134294] Updated weights for policy 0, policy_version 210574 (0.0023) [2025-01-04 13:32:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13380.3, 300 sec: 14079.1). Total num frames: 862531584. Throughput: 0: 3376.7. Samples: 204797294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:03,968][134211] Avg episode reward: [(0, '9.931')] [2025-01-04 13:32:05,442][134294] Updated weights for policy 0, policy_version 210584 (0.0028) [2025-01-04 13:32:07,897][134294] Updated weights for policy 0, policy_version 210594 (0.0020) [2025-01-04 13:32:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13653.3, 300 sec: 14134.7). Total num frames: 862613504. Throughput: 0: 3379.9. Samples: 204817638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:08,968][134211] Avg episode reward: [(0, '8.949')] [2025-01-04 13:32:10,253][134294] Updated weights for policy 0, policy_version 210604 (0.0018) [2025-01-04 13:32:13,101][134294] Updated weights for policy 0, policy_version 210614 (0.0024) [2025-01-04 13:32:13,969][134211] Fps is (10 sec: 15153.6, 60 sec: 13789.6, 300 sec: 14148.5). Total num frames: 862683136. Throughput: 0: 3470.8. Samples: 204842110. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:13,969][134211] Avg episode reward: [(0, '8.753')] [2025-01-04 13:32:16,158][134294] Updated weights for policy 0, policy_version 210624 (0.0023) [2025-01-04 13:32:18,968][134211] Fps is (10 sec: 13925.5, 60 sec: 13789.7, 300 sec: 14148.5). Total num frames: 862752768. Throughput: 0: 3479.7. Samples: 204852438. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:18,969][134211] Avg episode reward: [(0, '9.657')] [2025-01-04 13:32:19,021][134294] Updated weights for policy 0, policy_version 210634 (0.0026) [2025-01-04 13:32:22,330][134294] Updated weights for policy 0, policy_version 210644 (0.0028) [2025-01-04 13:32:23,968][134211] Fps is (10 sec: 13108.6, 60 sec: 13721.6, 300 sec: 14134.7). Total num frames: 862814208. Throughput: 0: 3451.3. Samples: 204871894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:23,968][134211] Avg episode reward: [(0, '9.411')] [2025-01-04 13:32:25,123][134294] Updated weights for policy 0, policy_version 210654 (0.0020) [2025-01-04 13:32:27,217][134294] Updated weights for policy 0, policy_version 210664 (0.0017) [2025-01-04 13:32:28,970][134211] Fps is (10 sec: 14743.3, 60 sec: 13997.0, 300 sec: 14190.1). Total num frames: 862900224. Throughput: 0: 3534.0. Samples: 204896222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:28,971][134211] Avg episode reward: [(0, '9.070')] [2025-01-04 13:32:30,572][134294] Updated weights for policy 0, policy_version 210674 (0.0025) [2025-01-04 13:32:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13858.1, 300 sec: 14148.6). Total num frames: 862957568. Throughput: 0: 3507.8. Samples: 204905210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:32:33,968][134211] Avg episode reward: [(0, '8.898')] [2025-01-04 13:32:34,013][134294] Updated weights for policy 0, policy_version 210684 (0.0029) [2025-01-04 13:32:36,924][134294] Updated weights for policy 0, policy_version 210694 (0.0026) [2025-01-04 13:32:38,967][134211] Fps is (10 sec: 13110.3, 60 sec: 13926.4, 300 sec: 14162.4). Total num frames: 863031296. Throughput: 0: 3499.0. Samples: 204924786. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:32:38,968][134211] Avg episode reward: [(0, '9.537')] [2025-01-04 13:32:39,477][134294] Updated weights for policy 0, policy_version 210704 (0.0019) [2025-01-04 13:32:41,415][134294] Updated weights for policy 0, policy_version 210714 (0.0013) [2025-01-04 13:32:43,414][134294] Updated weights for policy 0, policy_version 210724 (0.0015) [2025-01-04 13:32:43,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 863133696. Throughput: 0: 3693.0. Samples: 204953682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:32:43,969][134211] Avg episode reward: [(0, '9.108')] [2025-01-04 13:32:45,783][134294] Updated weights for policy 0, policy_version 210734 (0.0013) [2025-01-04 13:32:48,968][134211] Fps is (10 sec: 16793.0, 60 sec: 14472.5, 300 sec: 14273.5). Total num frames: 863199232. Throughput: 0: 3752.0. Samples: 204966136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:32:48,969][134211] Avg episode reward: [(0, '9.652')] [2025-01-04 13:32:49,378][134294] Updated weights for policy 0, policy_version 210744 (0.0032) [2025-01-04 13:32:52,769][134294] Updated weights for policy 0, policy_version 210754 (0.0028) [2025-01-04 13:32:53,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14404.3, 300 sec: 14245.7). Total num frames: 863260672. Throughput: 0: 3686.7. Samples: 204983540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:32:53,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 13:32:55,806][134294] Updated weights for policy 0, policy_version 210764 (0.0027) [2025-01-04 13:32:58,964][134294] Updated weights for policy 0, policy_version 210774 (0.0025) [2025-01-04 13:32:58,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 863330304. Throughput: 0: 3578.5. Samples: 205003136. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:32:58,968][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 13:33:02,115][134294] Updated weights for policy 0, policy_version 210784 (0.0026) [2025-01-04 13:33:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14336.0, 300 sec: 14245.7). Total num frames: 863391744. Throughput: 0: 3565.6. Samples: 205012890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:03,969][134211] Avg episode reward: [(0, '9.270')] [2025-01-04 13:33:05,252][134294] Updated weights for policy 0, policy_version 210794 (0.0028) [2025-01-04 13:33:08,115][134294] Updated weights for policy 0, policy_version 210804 (0.0027) [2025-01-04 13:33:08,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14131.2, 300 sec: 14120.8). Total num frames: 863461376. Throughput: 0: 3585.4. Samples: 205033236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:08,968][134211] Avg episode reward: [(0, '10.006')] [2025-01-04 13:33:11,147][134294] Updated weights for policy 0, policy_version 210814 (0.0023) [2025-01-04 13:33:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14131.4, 300 sec: 14093.0). Total num frames: 863531008. Throughput: 0: 3491.9. Samples: 205053352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:13,969][134211] Avg episode reward: [(0, '8.940')] [2025-01-04 13:33:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000210823_863531008.pth... [2025-01-04 13:33:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000209990_860119040.pth [2025-01-04 13:33:14,297][134294] Updated weights for policy 0, policy_version 210824 (0.0027) [2025-01-04 13:33:17,407][134294] Updated weights for policy 0, policy_version 210834 (0.0026) [2025-01-04 13:33:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.1, 300 sec: 14134.7). Total num frames: 863596544. 
Throughput: 0: 3508.6. Samples: 205063098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:18,968][134211] Avg episode reward: [(0, '8.944')] [2025-01-04 13:33:20,364][134294] Updated weights for policy 0, policy_version 210844 (0.0028) [2025-01-04 13:33:23,290][134294] Updated weights for policy 0, policy_version 210854 (0.0024) [2025-01-04 13:33:23,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14130.9, 300 sec: 14148.5). Total num frames: 863662080. Throughput: 0: 3532.6. Samples: 205083760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:23,970][134211] Avg episode reward: [(0, '9.187')] [2025-01-04 13:33:26,407][134294] Updated weights for policy 0, policy_version 210864 (0.0025) [2025-01-04 13:33:28,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13790.4, 300 sec: 14134.7). Total num frames: 863727616. Throughput: 0: 3325.8. Samples: 205103342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:28,968][134211] Avg episode reward: [(0, '9.502')] [2025-01-04 13:33:29,691][134294] Updated weights for policy 0, policy_version 210874 (0.0028) [2025-01-04 13:33:32,668][134294] Updated weights for policy 0, policy_version 210884 (0.0026) [2025-01-04 13:33:33,968][134211] Fps is (10 sec: 13518.3, 60 sec: 13994.7, 300 sec: 14148.6). Total num frames: 863797248. Throughput: 0: 3266.4. Samples: 205113124. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:33,971][134211] Avg episode reward: [(0, '9.981')] [2025-01-04 13:33:35,697][134294] Updated weights for policy 0, policy_version 210894 (0.0024) [2025-01-04 13:33:38,809][134294] Updated weights for policy 0, policy_version 210904 (0.0023) [2025-01-04 13:33:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14134.7). Total num frames: 863862784. Throughput: 0: 3333.4. Samples: 205133544. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:38,968][134211] Avg episode reward: [(0, '10.019')] [2025-01-04 13:33:41,841][134294] Updated weights for policy 0, policy_version 210914 (0.0025) [2025-01-04 13:33:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13243.8, 300 sec: 14134.7). Total num frames: 863928320. Throughput: 0: 3334.5. Samples: 205153188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:33:43,968][134211] Avg episode reward: [(0, '10.588')] [2025-01-04 13:33:44,867][134294] Updated weights for policy 0, policy_version 210924 (0.0024) [2025-01-04 13:33:46,907][134294] Updated weights for policy 0, policy_version 210934 (0.0015) [2025-01-04 13:33:48,968][134211] Fps is (10 sec: 15564.1, 60 sec: 13653.3, 300 sec: 14204.1). Total num frames: 864018432. Throughput: 0: 3401.2. Samples: 205165946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:33:48,969][134211] Avg episode reward: [(0, '10.603')] [2025-01-04 13:33:49,631][134294] Updated weights for policy 0, policy_version 210944 (0.0020) [2025-01-04 13:33:52,971][134294] Updated weights for policy 0, policy_version 210954 (0.0030) [2025-01-04 13:33:53,969][134211] Fps is (10 sec: 14743.8, 60 sec: 13584.8, 300 sec: 14162.4). Total num frames: 864075776. Throughput: 0: 3431.1. Samples: 205187640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:33:53,970][134211] Avg episode reward: [(0, '9.598')] [2025-01-04 13:33:56,122][134294] Updated weights for policy 0, policy_version 210964 (0.0027) [2025-01-04 13:33:58,107][134294] Updated weights for policy 0, policy_version 210974 (0.0015) [2025-01-04 13:33:58,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13926.3, 300 sec: 14245.7). Total num frames: 864165888. Throughput: 0: 3503.9. Samples: 205211028. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:33:58,969][134211] Avg episode reward: [(0, '9.480')] [2025-01-04 13:34:00,030][134294] Updated weights for policy 0, policy_version 210984 (0.0013) [2025-01-04 13:34:01,946][134294] Updated weights for policy 0, policy_version 210994 (0.0013) [2025-01-04 13:34:03,879][134294] Updated weights for policy 0, policy_version 211004 (0.0014) [2025-01-04 13:34:03,968][134211] Fps is (10 sec: 19663.2, 60 sec: 14677.4, 300 sec: 14384.6). Total num frames: 864272384. Throughput: 0: 3640.9. Samples: 205226940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:03,968][134211] Avg episode reward: [(0, '9.511')] [2025-01-04 13:34:05,788][134294] Updated weights for policy 0, policy_version 211014 (0.0014) [2025-01-04 13:34:07,846][134294] Updated weights for policy 0, policy_version 211024 (0.0015) [2025-01-04 13:34:08,968][134211] Fps is (10 sec: 20070.9, 60 sec: 15086.9, 300 sec: 14398.5). Total num frames: 864366592. Throughput: 0: 3891.4. Samples: 205258870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:08,969][134211] Avg episode reward: [(0, '9.173')] [2025-01-04 13:34:11,206][134294] Updated weights for policy 0, policy_version 211034 (0.0032) [2025-01-04 13:34:13,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14950.5, 300 sec: 14259.6). Total num frames: 864428032. Throughput: 0: 3878.0. Samples: 205277852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:13,968][134211] Avg episode reward: [(0, '10.059')] [2025-01-04 13:34:14,598][134294] Updated weights for policy 0, policy_version 211044 (0.0029) [2025-01-04 13:34:17,703][134294] Updated weights for policy 0, policy_version 211054 (0.0026) [2025-01-04 13:34:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14882.1, 300 sec: 14190.2). Total num frames: 864489472. Throughput: 0: 3871.3. Samples: 205287334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:18,968][134211] Avg episode reward: [(0, '10.095')] [2025-01-04 13:34:20,854][134294] Updated weights for policy 0, policy_version 211064 (0.0028) [2025-01-04 13:34:23,938][134294] Updated weights for policy 0, policy_version 211074 (0.0025) [2025-01-04 13:34:23,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14950.7, 300 sec: 14218.0). Total num frames: 864559104. Throughput: 0: 3863.6. Samples: 205307406. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:23,969][134211] Avg episode reward: [(0, '8.763')] [2025-01-04 13:34:27,009][134294] Updated weights for policy 0, policy_version 211084 (0.0024) [2025-01-04 13:34:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14218.0). Total num frames: 864624640. Throughput: 0: 3865.2. Samples: 205327124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:28,968][134211] Avg episode reward: [(0, '10.605')] [2025-01-04 13:34:29,999][134294] Updated weights for policy 0, policy_version 211094 (0.0025) [2025-01-04 13:34:33,237][134294] Updated weights for policy 0, policy_version 211104 (0.0028) [2025-01-04 13:34:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 14882.2, 300 sec: 14218.0). Total num frames: 864690176. Throughput: 0: 3803.9. Samples: 205337120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:33,968][134211] Avg episode reward: [(0, '9.801')] [2025-01-04 13:34:36,557][134294] Updated weights for policy 0, policy_version 211114 (0.0031) [2025-01-04 13:34:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14813.8, 300 sec: 14190.2). Total num frames: 864751616. Throughput: 0: 3739.6. Samples: 205355918. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:38,969][134211] Avg episode reward: [(0, '9.045')] [2025-01-04 13:34:40,038][134294] Updated weights for policy 0, policy_version 211124 (0.0026) [2025-01-04 13:34:43,971][134211] Fps is (10 sec: 11056.6, 60 sec: 14540.2, 300 sec: 14134.6). Total num frames: 864800768. Throughput: 0: 3585.4. Samples: 205372378. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:43,971][134211] Avg episode reward: [(0, '10.149')] [2025-01-04 13:34:44,112][134294] Updated weights for policy 0, policy_version 211134 (0.0032) [2025-01-04 13:34:47,849][134294] Updated weights for policy 0, policy_version 211144 (0.0030) [2025-01-04 13:34:48,968][134211] Fps is (10 sec: 10239.9, 60 sec: 13926.4, 300 sec: 14093.2). Total num frames: 864854016. Throughput: 0: 3403.6. Samples: 205380102. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:34:48,969][134211] Avg episode reward: [(0, '10.174')] [2025-01-04 13:34:51,535][134294] Updated weights for policy 0, policy_version 211154 (0.0031) [2025-01-04 13:34:53,968][134211] Fps is (10 sec: 11881.2, 60 sec: 14063.2, 300 sec: 14079.1). Total num frames: 864919552. Throughput: 0: 3072.4. Samples: 205397128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:34:53,968][134211] Avg episode reward: [(0, '9.588')] [2025-01-04 13:34:54,523][134294] Updated weights for policy 0, policy_version 211164 (0.0024) [2025-01-04 13:34:56,436][134294] Updated weights for policy 0, policy_version 211174 (0.0012) [2025-01-04 13:34:58,389][134294] Updated weights for policy 0, policy_version 211184 (0.0012) [2025-01-04 13:34:58,968][134211] Fps is (10 sec: 16384.6, 60 sec: 14199.6, 300 sec: 14190.3). Total num frames: 865017856. Throughput: 0: 3262.9. Samples: 205424682. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:34:58,968][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 13:35:00,317][134294] Updated weights for policy 0, policy_version 211194 (0.0013) [2025-01-04 13:35:02,311][134294] Updated weights for policy 0, policy_version 211204 (0.0014) [2025-01-04 13:35:03,968][134211] Fps is (10 sec: 20480.0, 60 sec: 14199.4, 300 sec: 14315.2). Total num frames: 865124352. Throughput: 0: 3399.6. Samples: 205440316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:03,968][134211] Avg episode reward: [(0, '10.451')] [2025-01-04 13:35:04,339][134294] Updated weights for policy 0, policy_version 211214 (0.0017) [2025-01-04 13:35:07,557][134294] Updated weights for policy 0, policy_version 211224 (0.0027) [2025-01-04 13:35:08,968][134211] Fps is (10 sec: 16793.1, 60 sec: 13653.3, 300 sec: 14301.3). Total num frames: 865185792. Throughput: 0: 3511.9. Samples: 205465442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:08,969][134211] Avg episode reward: [(0, '11.306')] [2025-01-04 13:35:11,072][134294] Updated weights for policy 0, policy_version 211234 (0.0031) [2025-01-04 13:35:13,968][134211] Fps is (10 sec: 11878.1, 60 sec: 13585.0, 300 sec: 14259.6). Total num frames: 865243136. Throughput: 0: 3450.1. Samples: 205482378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:13,969][134211] Avg episode reward: [(0, '9.085')] [2025-01-04 13:35:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000211241_865243136.pth... 
[2025-01-04 13:35:14,069][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000210415_861859840.pth [2025-01-04 13:35:14,833][134294] Updated weights for policy 0, policy_version 211244 (0.0032) [2025-01-04 13:35:18,151][134294] Updated weights for policy 0, policy_version 211254 (0.0026) [2025-01-04 13:35:18,968][134211] Fps is (10 sec: 11878.4, 60 sec: 13585.0, 300 sec: 14218.0). Total num frames: 865304576. Throughput: 0: 3417.3. Samples: 205490898. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:18,969][134211] Avg episode reward: [(0, '9.484')] [2025-01-04 13:35:21,648][134294] Updated weights for policy 0, policy_version 211264 (0.0025) [2025-01-04 13:35:23,968][134211] Fps is (10 sec: 12288.3, 60 sec: 13448.6, 300 sec: 14162.5). Total num frames: 865366016. Throughput: 0: 3406.5. Samples: 205509212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:23,968][134211] Avg episode reward: [(0, '8.847')] [2025-01-04 13:35:24,837][134294] Updated weights for policy 0, policy_version 211274 (0.0025) [2025-01-04 13:35:27,743][134294] Updated weights for policy 0, policy_version 211284 (0.0026) [2025-01-04 13:35:28,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13448.5, 300 sec: 14148.6). Total num frames: 865431552. Throughput: 0: 3485.2. Samples: 205529204. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:28,968][134211] Avg episode reward: [(0, '9.788')] [2025-01-04 13:35:30,926][134294] Updated weights for policy 0, policy_version 211294 (0.0027) [2025-01-04 13:35:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13448.5, 300 sec: 14134.7). Total num frames: 865497088. Throughput: 0: 3536.9. Samples: 205539262. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:33,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 13:35:34,077][134294] Updated weights for policy 0, policy_version 211304 (0.0027) [2025-01-04 13:35:37,018][134294] Updated weights for policy 0, policy_version 211314 (0.0026) [2025-01-04 13:35:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.1, 300 sec: 14176.3). Total num frames: 865566720. Throughput: 0: 3606.6. Samples: 205559426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:38,968][134211] Avg episode reward: [(0, '11.632')] [2025-01-04 13:35:40,218][134294] Updated weights for policy 0, policy_version 211324 (0.0031) [2025-01-04 13:35:43,178][134294] Updated weights for policy 0, policy_version 211334 (0.0025) [2025-01-04 13:35:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13858.6, 300 sec: 14120.8). Total num frames: 865632256. Throughput: 0: 3433.6. Samples: 205579194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:43,969][134211] Avg episode reward: [(0, '9.308')] [2025-01-04 13:35:46,228][134294] Updated weights for policy 0, policy_version 211344 (0.0026) [2025-01-04 13:35:48,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13926.4, 300 sec: 13968.0). Total num frames: 865689600. Throughput: 0: 3308.1. Samples: 205589180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:48,968][134211] Avg episode reward: [(0, '10.117')] [2025-01-04 13:35:50,248][134294] Updated weights for policy 0, policy_version 211354 (0.0034) [2025-01-04 13:35:52,770][134294] Updated weights for policy 0, policy_version 211364 (0.0019) [2025-01-04 13:35:53,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14199.5, 300 sec: 13912.5). Total num frames: 865771520. Throughput: 0: 3141.3. Samples: 205606800. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:53,968][134211] Avg episode reward: [(0, '8.867')] [2025-01-04 13:35:54,748][134294] Updated weights for policy 0, policy_version 211374 (0.0015) [2025-01-04 13:35:56,710][134294] Updated weights for policy 0, policy_version 211384 (0.0014) [2025-01-04 13:35:58,683][134294] Updated weights for policy 0, policy_version 211394 (0.0015) [2025-01-04 13:35:58,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14267.7, 300 sec: 14051.4). Total num frames: 865873920. Throughput: 0: 3462.1. Samples: 205638170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:35:58,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 13:36:00,545][134294] Updated weights for policy 0, policy_version 211404 (0.0015) [2025-01-04 13:36:03,167][134294] Updated weights for policy 0, policy_version 211414 (0.0023) [2025-01-04 13:36:03,968][134211] Fps is (10 sec: 18841.1, 60 sec: 13926.3, 300 sec: 14120.8). Total num frames: 865959936. Throughput: 0: 3626.3. Samples: 205654082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:03,969][134211] Avg episode reward: [(0, '10.094')] [2025-01-04 13:36:06,275][134294] Updated weights for policy 0, policy_version 211424 (0.0028) [2025-01-04 13:36:08,968][134211] Fps is (10 sec: 15154.9, 60 sec: 13994.7, 300 sec: 14134.7). Total num frames: 866025472. Throughput: 0: 3664.7. Samples: 205674122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:08,968][134211] Avg episode reward: [(0, '10.429')] [2025-01-04 13:36:09,577][134294] Updated weights for policy 0, policy_version 211434 (0.0027) [2025-01-04 13:36:12,734][134294] Updated weights for policy 0, policy_version 211444 (0.0020) [2025-01-04 13:36:13,968][134211] Fps is (10 sec: 12697.2, 60 sec: 14062.9, 300 sec: 14106.9). Total num frames: 866086912. Throughput: 0: 3644.2. Samples: 205693194. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:13,969][134211] Avg episode reward: [(0, '8.601')] [2025-01-04 13:36:15,807][134294] Updated weights for policy 0, policy_version 211454 (0.0026) [2025-01-04 13:36:18,811][134294] Updated weights for policy 0, policy_version 211464 (0.0026) [2025-01-04 13:36:18,969][134211] Fps is (10 sec: 13105.7, 60 sec: 14199.2, 300 sec: 14120.7). Total num frames: 866156544. Throughput: 0: 3645.3. Samples: 205703306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:18,970][134211] Avg episode reward: [(0, '9.519')] [2025-01-04 13:36:21,954][134294] Updated weights for policy 0, policy_version 211474 (0.0025) [2025-01-04 13:36:23,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14199.4, 300 sec: 14093.6). Total num frames: 866217984. Throughput: 0: 3639.8. Samples: 205723216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:23,969][134211] Avg episode reward: [(0, '9.232')] [2025-01-04 13:36:25,319][134294] Updated weights for policy 0, policy_version 211484 (0.0028) [2025-01-04 13:36:28,447][134294] Updated weights for policy 0, policy_version 211494 (0.0025) [2025-01-04 13:36:28,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14199.4, 300 sec: 14093.0). Total num frames: 866283520. Throughput: 0: 3623.8. Samples: 205742266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:28,969][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 13:36:31,434][134294] Updated weights for policy 0, policy_version 211504 (0.0028) [2025-01-04 13:36:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.5, 300 sec: 14079.1). Total num frames: 866349056. Throughput: 0: 3622.8. Samples: 205752208. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:33,968][134211] Avg episode reward: [(0, '9.585')] [2025-01-04 13:36:34,698][134294] Updated weights for policy 0, policy_version 211514 (0.0027) [2025-01-04 13:36:37,821][134294] Updated weights for policy 0, policy_version 211524 (0.0027) [2025-01-04 13:36:38,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14131.2, 300 sec: 14065.2). Total num frames: 866414592. Throughput: 0: 3661.8. Samples: 205771580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:38,968][134211] Avg episode reward: [(0, '9.936')] [2025-01-04 13:36:40,795][134294] Updated weights for policy 0, policy_version 211534 (0.0027) [2025-01-04 13:36:43,762][134294] Updated weights for policy 0, policy_version 211544 (0.0024) [2025-01-04 13:36:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14199.6, 300 sec: 14079.1). Total num frames: 866484224. Throughput: 0: 3428.1. Samples: 205792434. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:43,968][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 13:36:46,750][134294] Updated weights for policy 0, policy_version 211554 (0.0028) [2025-01-04 13:36:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 866553856. Throughput: 0: 3299.6. Samples: 205802562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:48,968][134211] Avg episode reward: [(0, '9.481')] [2025-01-04 13:36:49,849][134294] Updated weights for policy 0, policy_version 211564 (0.0026) [2025-01-04 13:36:52,755][134294] Updated weights for policy 0, policy_version 211574 (0.0024) [2025-01-04 13:36:53,970][134211] Fps is (10 sec: 13923.3, 60 sec: 14198.9, 300 sec: 14092.9). Total num frames: 866623488. Throughput: 0: 3306.3. Samples: 205822910. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:53,970][134211] Avg episode reward: [(0, '10.217')] [2025-01-04 13:36:55,826][134294] Updated weights for policy 0, policy_version 211584 (0.0027) [2025-01-04 13:36:58,677][134294] Updated weights for policy 0, policy_version 211594 (0.0025) [2025-01-04 13:36:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13653.3, 300 sec: 14106.9). Total num frames: 866693120. Throughput: 0: 3345.3. Samples: 205843732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:36:58,968][134211] Avg episode reward: [(0, '8.653')] [2025-01-04 13:37:01,651][134294] Updated weights for policy 0, policy_version 211604 (0.0024) [2025-01-04 13:37:03,968][134211] Fps is (10 sec: 13519.6, 60 sec: 13312.0, 300 sec: 14051.4). Total num frames: 866758656. Throughput: 0: 3350.4. Samples: 205854068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 13:37:03,968][134211] Avg episode reward: [(0, '9.804')] [2025-01-04 13:37:04,790][134294] Updated weights for policy 0, policy_version 211614 (0.0026) [2025-01-04 13:37:07,880][134294] Updated weights for policy 0, policy_version 211624 (0.0027) [2025-01-04 13:37:08,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13312.0, 300 sec: 14037.5). Total num frames: 866824192. Throughput: 0: 3351.0. Samples: 205874012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:08,968][134211] Avg episode reward: [(0, '9.403')] [2025-01-04 13:37:11,082][134294] Updated weights for policy 0, policy_version 211634 (0.0024) [2025-01-04 13:37:13,054][134294] Updated weights for policy 0, policy_version 211644 (0.0013) [2025-01-04 13:37:13,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13653.4, 300 sec: 14079.2). Total num frames: 866906112. Throughput: 0: 3444.3. Samples: 205897256. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:13,968][134211] Avg episode reward: [(0, '9.084')] [2025-01-04 13:37:14,023][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000211648_866910208.pth... [2025-01-04 13:37:14,087][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000210823_863531008.pth [2025-01-04 13:37:15,785][134294] Updated weights for policy 0, policy_version 211654 (0.0023) [2025-01-04 13:37:18,856][134294] Updated weights for policy 0, policy_version 211664 (0.0027) [2025-01-04 13:37:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13653.6, 300 sec: 14106.9). Total num frames: 866975744. Throughput: 0: 3464.7. Samples: 205908120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:18,968][134211] Avg episode reward: [(0, '9.765')] [2025-01-04 13:37:21,918][134294] Updated weights for policy 0, policy_version 211674 (0.0027) [2025-01-04 13:37:23,969][134211] Fps is (10 sec: 13106.0, 60 sec: 13653.2, 300 sec: 14023.6). Total num frames: 867037184. Throughput: 0: 3476.5. Samples: 205928026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:23,969][134211] Avg episode reward: [(0, '9.515')] [2025-01-04 13:37:25,070][134294] Updated weights for policy 0, policy_version 211684 (0.0022) [2025-01-04 13:37:27,166][134294] Updated weights for policy 0, policy_version 211694 (0.0014) [2025-01-04 13:37:28,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14199.6, 300 sec: 14162.5). Total num frames: 867135488. Throughput: 0: 3580.8. Samples: 205953568. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:28,968][134211] Avg episode reward: [(0, '10.010')] [2025-01-04 13:37:28,985][134294] Updated weights for policy 0, policy_version 211704 (0.0015) [2025-01-04 13:37:30,908][134294] Updated weights for policy 0, policy_version 211714 (0.0013) [2025-01-04 13:37:32,980][134294] Updated weights for policy 0, policy_version 211724 (0.0017) [2025-01-04 13:37:33,968][134211] Fps is (10 sec: 19662.4, 60 sec: 14745.6, 300 sec: 14245.7). Total num frames: 867233792. Throughput: 0: 3720.2. Samples: 205969972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:33,969][134211] Avg episode reward: [(0, '9.012')] [2025-01-04 13:37:36,442][134294] Updated weights for policy 0, policy_version 211734 (0.0028) [2025-01-04 13:37:38,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14540.8, 300 sec: 14079.1). Total num frames: 867287040. Throughput: 0: 3728.0. Samples: 205990664. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:38,969][134211] Avg episode reward: [(0, '9.761')] [2025-01-04 13:37:40,485][134294] Updated weights for policy 0, policy_version 211744 (0.0034) [2025-01-04 13:37:43,627][134294] Updated weights for policy 0, policy_version 211754 (0.0024) [2025-01-04 13:37:43,968][134211] Fps is (10 sec: 11469.0, 60 sec: 14404.2, 300 sec: 14065.3). Total num frames: 867348480. Throughput: 0: 3648.1. Samples: 206007898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:43,968][134211] Avg episode reward: [(0, '10.265')] [2025-01-04 13:37:46,695][134294] Updated weights for policy 0, policy_version 211764 (0.0025) [2025-01-04 13:37:48,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14267.7, 300 sec: 14065.3). Total num frames: 867409920. Throughput: 0: 3635.7. Samples: 206017672. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:48,968][134211] Avg episode reward: [(0, '10.160')] [2025-01-04 13:37:49,909][134294] Updated weights for policy 0, policy_version 211774 (0.0028) [2025-01-04 13:37:52,872][134294] Updated weights for policy 0, policy_version 211784 (0.0025) [2025-01-04 13:37:53,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14268.2, 300 sec: 14065.2). Total num frames: 867479552. Throughput: 0: 3635.5. Samples: 206037608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:53,969][134211] Avg episode reward: [(0, '9.570')] [2025-01-04 13:37:55,937][134294] Updated weights for policy 0, policy_version 211794 (0.0025) [2025-01-04 13:37:58,785][134294] Updated weights for policy 0, policy_version 211804 (0.0025) [2025-01-04 13:37:58,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14267.7, 300 sec: 14093.0). Total num frames: 867549184. Throughput: 0: 3579.8. Samples: 206058346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:37:58,969][134211] Avg episode reward: [(0, '10.520')] [2025-01-04 13:38:01,863][134294] Updated weights for policy 0, policy_version 211814 (0.0023) [2025-01-04 13:38:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.7, 300 sec: 14079.1). Total num frames: 867614720. Throughput: 0: 3564.1. Samples: 206068504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:03,969][134211] Avg episode reward: [(0, '8.918')] [2025-01-04 13:38:04,955][134294] Updated weights for policy 0, policy_version 211824 (0.0025) [2025-01-04 13:38:07,945][134294] Updated weights for policy 0, policy_version 211834 (0.0024) [2025-01-04 13:38:08,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14336.0, 300 sec: 14079.1). Total num frames: 867684352. Throughput: 0: 3565.9. Samples: 206088486. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:08,968][134211] Avg episode reward: [(0, '8.975')] [2025-01-04 13:38:11,025][134294] Updated weights for policy 0, policy_version 211844 (0.0022) [2025-01-04 13:38:13,963][134294] Updated weights for policy 0, policy_version 211854 (0.0023) [2025-01-04 13:38:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 14093.0). Total num frames: 867753984. Throughput: 0: 3449.9. Samples: 206108814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:13,968][134211] Avg episode reward: [(0, '9.231')] [2025-01-04 13:38:16,998][134294] Updated weights for policy 0, policy_version 211864 (0.0024) [2025-01-04 13:38:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.0, 300 sec: 14093.1). Total num frames: 867819520. Throughput: 0: 3311.5. Samples: 206118990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:18,968][134211] Avg episode reward: [(0, '10.357')] [2025-01-04 13:38:19,991][134294] Updated weights for policy 0, policy_version 211874 (0.0024) [2025-01-04 13:38:22,865][134294] Updated weights for policy 0, policy_version 211884 (0.0024) [2025-01-04 13:38:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.7, 300 sec: 14106.9). Total num frames: 867889152. Throughput: 0: 3317.9. Samples: 206139970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:23,969][134211] Avg episode reward: [(0, '10.184')] [2025-01-04 13:38:25,869][134294] Updated weights for policy 0, policy_version 211894 (0.0025) [2025-01-04 13:38:28,776][134294] Updated weights for policy 0, policy_version 211904 (0.0024) [2025-01-04 13:38:28,967][134211] Fps is (10 sec: 14336.2, 60 sec: 13789.9, 300 sec: 14120.8). Total num frames: 867962880. Throughput: 0: 3401.1. Samples: 206160946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:28,968][134211] Avg episode reward: [(0, '10.048')] [2025-01-04 13:38:30,655][134294] Updated weights for policy 0, policy_version 211914 (0.0012) [2025-01-04 13:38:32,560][134294] Updated weights for policy 0, policy_version 211924 (0.0012) [2025-01-04 13:38:33,967][134211] Fps is (10 sec: 18023.0, 60 sec: 13926.5, 300 sec: 14259.6). Total num frames: 868069376. Throughput: 0: 3518.7. Samples: 206176014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:33,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 13:38:34,416][134294] Updated weights for policy 0, policy_version 211934 (0.0013) [2025-01-04 13:38:36,306][134294] Updated weights for policy 0, policy_version 211944 (0.0012) [2025-01-04 13:38:38,952][134294] Updated weights for policy 0, policy_version 211954 (0.0021) [2025-01-04 13:38:38,969][134211] Fps is (10 sec: 20068.2, 60 sec: 14608.9, 300 sec: 14356.8). Total num frames: 868163584. Throughput: 0: 3784.1. Samples: 206207894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:38,969][134211] Avg episode reward: [(0, '10.029')] [2025-01-04 13:38:42,271][134294] Updated weights for policy 0, policy_version 211964 (0.0028) [2025-01-04 13:38:43,968][134211] Fps is (10 sec: 15563.8, 60 sec: 14609.0, 300 sec: 14259.6). Total num frames: 868225024. Throughput: 0: 3750.7. Samples: 206227130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:43,969][134211] Avg episode reward: [(0, '9.635')] [2025-01-04 13:38:45,486][134294] Updated weights for policy 0, policy_version 211974 (0.0031) [2025-01-04 13:38:48,385][134294] Updated weights for policy 0, policy_version 211984 (0.0025) [2025-01-04 13:38:48,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14677.3, 300 sec: 14287.5). Total num frames: 868290560. Throughput: 0: 3746.6. Samples: 206237102. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:48,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 13:38:51,530][134294] Updated weights for policy 0, policy_version 211994 (0.0024) [2025-01-04 13:38:53,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14609.1, 300 sec: 14204.1). Total num frames: 868356096. Throughput: 0: 3746.5. Samples: 206257080. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:53,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 13:38:54,707][134294] Updated weights for policy 0, policy_version 212004 (0.0026) [2025-01-04 13:38:57,710][134294] Updated weights for policy 0, policy_version 212014 (0.0025) [2025-01-04 13:38:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14065.2). Total num frames: 868421632. Throughput: 0: 3739.5. Samples: 206277092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:38:58,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 13:39:00,734][134294] Updated weights for policy 0, policy_version 212024 (0.0028) [2025-01-04 13:39:03,534][134294] Updated weights for policy 0, policy_version 212034 (0.0024) [2025-01-04 13:39:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 13995.8). Total num frames: 868495360. Throughput: 0: 3747.0. Samples: 206287604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:03,968][134211] Avg episode reward: [(0, '9.584')] [2025-01-04 13:39:06,638][134294] Updated weights for policy 0, policy_version 212044 (0.0025) [2025-01-04 13:39:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.1, 300 sec: 14009.7). Total num frames: 868560896. Throughput: 0: 3736.5. Samples: 206308112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:08,968][134211] Avg episode reward: [(0, '8.452')] [2025-01-04 13:39:09,601][134294] Updated weights for policy 0, policy_version 212054 (0.0025) [2025-01-04 13:39:12,631][134294] Updated weights for policy 0, policy_version 212064 (0.0025) [2025-01-04 13:39:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14037.5). Total num frames: 868630528. Throughput: 0: 3722.0. Samples: 206328438. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:13,968][134211] Avg episode reward: [(0, '10.824')] [2025-01-04 13:39:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212068_868630528.pth... [2025-01-04 13:39:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000211241_865243136.pth [2025-01-04 13:39:15,687][134294] Updated weights for policy 0, policy_version 212074 (0.0026) [2025-01-04 13:39:18,539][134294] Updated weights for policy 0, policy_version 212084 (0.0026) [2025-01-04 13:39:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14037.5). Total num frames: 868700160. Throughput: 0: 3615.2. Samples: 206338698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:18,968][134211] Avg episode reward: [(0, '8.743')] [2025-01-04 13:39:21,548][134294] Updated weights for policy 0, policy_version 212094 (0.0022) [2025-01-04 13:39:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.4, 300 sec: 14051.4). Total num frames: 868769792. Throughput: 0: 3372.7. Samples: 206359662. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:23,968][134211] Avg episode reward: [(0, '9.918')] [2025-01-04 13:39:24,629][134294] Updated weights for policy 0, policy_version 212104 (0.0026) [2025-01-04 13:39:27,501][134294] Updated weights for policy 0, policy_version 212114 (0.0024) [2025-01-04 13:39:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14540.8, 300 sec: 14051.4). Total num frames: 868835328. Throughput: 0: 3397.9. Samples: 206380032. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:28,968][134211] Avg episode reward: [(0, '9.604')] [2025-01-04 13:39:30,457][134294] Updated weights for policy 0, policy_version 212124 (0.0025) [2025-01-04 13:39:33,389][134294] Updated weights for policy 0, policy_version 212134 (0.0025) [2025-01-04 13:39:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13926.4, 300 sec: 14079.1). Total num frames: 868904960. Throughput: 0: 3410.7. Samples: 206390582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:33,968][134211] Avg episode reward: [(0, '8.690')] [2025-01-04 13:39:36,410][134294] Updated weights for policy 0, policy_version 212144 (0.0028) [2025-01-04 13:39:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13517.0, 300 sec: 14148.7). Total num frames: 868974592. Throughput: 0: 3428.6. Samples: 206411368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:38,968][134211] Avg episode reward: [(0, '10.131')] [2025-01-04 13:39:39,411][134294] Updated weights for policy 0, policy_version 212154 (0.0027) [2025-01-04 13:39:42,377][134294] Updated weights for policy 0, policy_version 212164 (0.0028) [2025-01-04 13:39:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13653.4, 300 sec: 14204.1). Total num frames: 869044224. Throughput: 0: 3437.2. Samples: 206431768. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:43,968][134211] Avg episode reward: [(0, '9.672')] [2025-01-04 13:39:44,925][134294] Updated weights for policy 0, policy_version 212174 (0.0020) [2025-01-04 13:39:46,837][134294] Updated weights for policy 0, policy_version 212184 (0.0013) [2025-01-04 13:39:48,730][134294] Updated weights for policy 0, policy_version 212194 (0.0013) [2025-01-04 13:39:48,968][134211] Fps is (10 sec: 17613.2, 60 sec: 14336.1, 300 sec: 14342.9). Total num frames: 869150720. Throughput: 0: 3522.5. Samples: 206446114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:48,968][134211] Avg episode reward: [(0, '8.902')] [2025-01-04 13:39:50,620][134294] Updated weights for policy 0, policy_version 212204 (0.0015) [2025-01-04 13:39:52,594][134294] Updated weights for policy 0, policy_version 212214 (0.0014) [2025-01-04 13:39:53,968][134211] Fps is (10 sec: 20070.5, 60 sec: 14813.9, 300 sec: 14329.0). Total num frames: 869244928. Throughput: 0: 3787.3. Samples: 206478540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:53,968][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 13:39:55,583][134294] Updated weights for policy 0, policy_version 212224 (0.0030) [2025-01-04 13:39:58,652][134294] Updated weights for policy 0, policy_version 212234 (0.0028) [2025-01-04 13:39:58,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14882.2, 300 sec: 14204.1). Total num frames: 869314560. Throughput: 0: 3796.5. Samples: 206499280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:39:58,968][134211] Avg episode reward: [(0, '11.456')] [2025-01-04 13:40:01,822][134294] Updated weights for policy 0, policy_version 212244 (0.0029) [2025-01-04 13:40:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.4, 300 sec: 14204.1). Total num frames: 869376000. Throughput: 0: 3783.9. Samples: 206508974. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:03,968][134211] Avg episode reward: [(0, '8.950')] [2025-01-04 13:40:04,997][134294] Updated weights for policy 0, policy_version 212254 (0.0026) [2025-01-04 13:40:08,097][134294] Updated weights for policy 0, policy_version 212264 (0.0025) [2025-01-04 13:40:08,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14677.3, 300 sec: 14231.9). Total num frames: 869441536. Throughput: 0: 3749.7. Samples: 206528400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:08,968][134211] Avg episode reward: [(0, '8.600')] [2025-01-04 13:40:11,121][134294] Updated weights for policy 0, policy_version 212274 (0.0023) [2025-01-04 13:40:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14259.6). Total num frames: 869511168. Throughput: 0: 3742.7. Samples: 206548452. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:13,968][134211] Avg episode reward: [(0, '8.962')] [2025-01-04 13:40:14,244][134294] Updated weights for policy 0, policy_version 212284 (0.0024) [2025-01-04 13:40:17,234][134294] Updated weights for policy 0, policy_version 212294 (0.0024) [2025-01-04 13:40:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 14273.5). Total num frames: 869576704. Throughput: 0: 3733.7. Samples: 206558598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:18,968][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 13:40:20,145][134294] Updated weights for policy 0, policy_version 212304 (0.0027) [2025-01-04 13:40:23,118][134294] Updated weights for policy 0, policy_version 212314 (0.0026) [2025-01-04 13:40:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14287.4). Total num frames: 869646336. Throughput: 0: 3740.0. Samples: 206579668. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:23,968][134211] Avg episode reward: [(0, '9.474')] [2025-01-04 13:40:26,013][134294] Updated weights for policy 0, policy_version 212324 (0.0025) [2025-01-04 13:40:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.3, 300 sec: 14301.3). Total num frames: 869715968. Throughput: 0: 3741.0. Samples: 206600114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:28,968][134211] Avg episode reward: [(0, '10.161')] [2025-01-04 13:40:29,108][134294] Updated weights for policy 0, policy_version 212334 (0.0028) [2025-01-04 13:40:32,046][134294] Updated weights for policy 0, policy_version 212344 (0.0024) [2025-01-04 13:40:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.3, 300 sec: 14301.3). Total num frames: 869785600. Throughput: 0: 3649.5. Samples: 206610342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:33,968][134211] Avg episode reward: [(0, '9.898')] [2025-01-04 13:40:35,011][134294] Updated weights for policy 0, policy_version 212354 (0.0023) [2025-01-04 13:40:37,971][134294] Updated weights for policy 0, policy_version 212364 (0.0023) [2025-01-04 13:40:38,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14677.3, 300 sec: 14315.2). Total num frames: 869855232. Throughput: 0: 3394.6. Samples: 206631296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:38,969][134211] Avg episode reward: [(0, '9.359')] [2025-01-04 13:40:40,956][134294] Updated weights for policy 0, policy_version 212374 (0.0026) [2025-01-04 13:40:43,815][134294] Updated weights for policy 0, policy_version 212384 (0.0027) [2025-01-04 13:40:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.4, 300 sec: 14356.8). Total num frames: 869924864. Throughput: 0: 3402.0. Samples: 206652368. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:43,968][134211] Avg episode reward: [(0, '10.051')] [2025-01-04 13:40:46,732][134294] Updated weights for policy 0, policy_version 212394 (0.0026) [2025-01-04 13:40:48,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14062.9, 300 sec: 14315.2). Total num frames: 869994496. Throughput: 0: 3416.1. Samples: 206662700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:48,968][134211] Avg episode reward: [(0, '10.516')] [2025-01-04 13:40:49,800][134294] Updated weights for policy 0, policy_version 212404 (0.0028) [2025-01-04 13:40:52,820][134294] Updated weights for policy 0, policy_version 212414 (0.0024) [2025-01-04 13:40:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13585.0, 300 sec: 14190.2). Total num frames: 870060032. Throughput: 0: 3430.7. Samples: 206682780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:53,969][134211] Avg episode reward: [(0, '9.202')] [2025-01-04 13:40:55,860][134294] Updated weights for policy 0, policy_version 212424 (0.0026) [2025-01-04 13:40:58,424][134294] Updated weights for policy 0, policy_version 212434 (0.0021) [2025-01-04 13:40:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13721.6, 300 sec: 14162.5). Total num frames: 870137856. Throughput: 0: 3467.6. Samples: 206704492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:40:58,968][134211] Avg episode reward: [(0, '9.318')] [2025-01-04 13:41:00,574][134294] Updated weights for policy 0, policy_version 212444 (0.0019) [2025-01-04 13:41:03,508][134294] Updated weights for policy 0, policy_version 212454 (0.0024) [2025-01-04 13:41:03,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13994.6, 300 sec: 14204.1). Total num frames: 870215680. Throughput: 0: 3548.0. Samples: 206718260. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:03,969][134211] Avg episode reward: [(0, '10.665')] [2025-01-04 13:41:06,472][134294] Updated weights for policy 0, policy_version 212464 (0.0025) [2025-01-04 13:41:08,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14062.9, 300 sec: 14231.9). Total num frames: 870285312. Throughput: 0: 3533.9. Samples: 206738692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:08,968][134211] Avg episode reward: [(0, '9.799')] [2025-01-04 13:41:09,623][134294] Updated weights for policy 0, policy_version 212474 (0.0025) [2025-01-04 13:41:12,540][134294] Updated weights for policy 0, policy_version 212484 (0.0026) [2025-01-04 13:41:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.6, 300 sec: 14218.0). Total num frames: 870350848. Throughput: 0: 3526.5. Samples: 206758808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:13,969][134211] Avg episode reward: [(0, '9.054')] [2025-01-04 13:41:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212488_870350848.pth... [2025-01-04 13:41:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000211648_866910208.pth [2025-01-04 13:41:15,605][134294] Updated weights for policy 0, policy_version 212494 (0.0027) [2025-01-04 13:41:17,703][134294] Updated weights for policy 0, policy_version 212504 (0.0014) [2025-01-04 13:41:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 870436864. Throughput: 0: 3529.8. Samples: 206769184. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:18,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 13:41:20,231][134294] Updated weights for policy 0, policy_version 212514 (0.0019) [2025-01-04 13:41:23,217][134294] Updated weights for policy 0, policy_version 212524 (0.0025) [2025-01-04 13:41:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 870506496. Throughput: 0: 3625.5. Samples: 206794444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:23,968][134211] Avg episode reward: [(0, '10.204')] [2025-01-04 13:41:26,049][134294] Updated weights for policy 0, policy_version 212534 (0.0024) [2025-01-04 13:41:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14329.1). Total num frames: 870576128. Throughput: 0: 3607.4. Samples: 206814702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:28,968][134211] Avg episode reward: [(0, '9.934')] [2025-01-04 13:41:29,244][134294] Updated weights for policy 0, policy_version 212544 (0.0023) [2025-01-04 13:41:32,161][134294] Updated weights for policy 0, policy_version 212554 (0.0025) [2025-01-04 13:41:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.7, 300 sec: 14329.1). Total num frames: 870641664. Throughput: 0: 3605.7. Samples: 206824956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:41:33,968][134211] Avg episode reward: [(0, '9.339')] [2025-01-04 13:41:35,188][134294] Updated weights for policy 0, policy_version 212564 (0.0026) [2025-01-04 13:41:38,069][134294] Updated weights for policy 0, policy_version 212574 (0.0026) [2025-01-04 13:41:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14329.1). Total num frames: 870711296. Throughput: 0: 3624.5. Samples: 206845880. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:41:38,968][134211] Avg episode reward: [(0, '9.455')] [2025-01-04 13:41:41,002][134294] Updated weights for policy 0, policy_version 212584 (0.0024) [2025-01-04 13:41:43,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14267.6, 300 sec: 14329.0). Total num frames: 870780928. Throughput: 0: 3597.9. Samples: 206866400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:41:43,969][134211] Avg episode reward: [(0, '10.417')] [2025-01-04 13:41:44,141][134294] Updated weights for policy 0, policy_version 212594 (0.0028) [2025-01-04 13:41:46,589][134294] Updated weights for policy 0, policy_version 212604 (0.0018) [2025-01-04 13:41:48,436][134294] Updated weights for policy 0, policy_version 212614 (0.0014) [2025-01-04 13:41:48,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14677.4, 300 sec: 14412.5). Total num frames: 870875136. Throughput: 0: 3540.9. Samples: 206877598. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:41:48,968][134211] Avg episode reward: [(0, '9.779')] [2025-01-04 13:41:50,354][134294] Updated weights for policy 0, policy_version 212624 (0.0014) [2025-01-04 13:41:52,225][134294] Updated weights for policy 0, policy_version 212634 (0.0013) [2025-01-04 13:41:53,968][134211] Fps is (10 sec: 20480.7, 60 sec: 15428.3, 300 sec: 14551.2). Total num frames: 870985728. Throughput: 0: 3814.4. Samples: 206910342. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:41:53,968][134211] Avg episode reward: [(0, '9.597')] [2025-01-04 13:41:54,258][134294] Updated weights for policy 0, policy_version 212644 (0.0017) [2025-01-04 13:41:57,327][134294] Updated weights for policy 0, policy_version 212654 (0.0030) [2025-01-04 13:41:58,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15155.1, 300 sec: 14537.3). Total num frames: 871047168. Throughput: 0: 3882.9. Samples: 206933538. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:41:58,968][134211] Avg episode reward: [(0, '9.690')] [2025-01-04 13:42:00,541][134294] Updated weights for policy 0, policy_version 212664 (0.0030) [2025-01-04 13:42:03,630][134294] Updated weights for policy 0, policy_version 212674 (0.0029) [2025-01-04 13:42:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15018.7, 300 sec: 14551.2). Total num frames: 871116800. Throughput: 0: 3867.9. Samples: 206943238. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:03,969][134211] Avg episode reward: [(0, '10.212')] [2025-01-04 13:42:06,714][134294] Updated weights for policy 0, policy_version 212684 (0.0026) [2025-01-04 13:42:08,982][134211] Fps is (10 sec: 13089.1, 60 sec: 14878.7, 300 sec: 14481.1). Total num frames: 871178240. Throughput: 0: 3745.5. Samples: 206963044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:08,982][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 13:42:10,013][134294] Updated weights for policy 0, policy_version 212694 (0.0030) [2025-01-04 13:42:12,978][134294] Updated weights for policy 0, policy_version 212704 (0.0024) [2025-01-04 13:42:13,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14882.2, 300 sec: 14467.9). Total num frames: 871243776. Throughput: 0: 3732.3. Samples: 206982656. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:13,968][134211] Avg episode reward: [(0, '9.663')] [2025-01-04 13:42:15,897][134294] Updated weights for policy 0, policy_version 212714 (0.0026) [2025-01-04 13:42:18,889][134294] Updated weights for policy 0, policy_version 212724 (0.0024) [2025-01-04 13:42:18,968][134211] Fps is (10 sec: 13945.7, 60 sec: 14677.3, 300 sec: 14509.6). Total num frames: 871317504. Throughput: 0: 3733.7. Samples: 206992974. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:18,968][134211] Avg episode reward: [(0, '10.773')] [2025-01-04 13:42:21,852][134294] Updated weights for policy 0, policy_version 212734 (0.0024) [2025-01-04 13:42:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 871383040. Throughput: 0: 3730.9. Samples: 207013770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:23,968][134211] Avg episode reward: [(0, '9.689')] [2025-01-04 13:42:24,947][134294] Updated weights for policy 0, policy_version 212744 (0.0022) [2025-01-04 13:42:27,935][134294] Updated weights for policy 0, policy_version 212754 (0.0024) [2025-01-04 13:42:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14301.3). Total num frames: 871452672. Throughput: 0: 3726.0. Samples: 207034070. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:28,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 13:42:30,989][134294] Updated weights for policy 0, policy_version 212764 (0.0026) [2025-01-04 13:42:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14343.0). Total num frames: 871518208. Throughput: 0: 3703.5. Samples: 207044256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:33,968][134211] Avg episode reward: [(0, '10.180')] [2025-01-04 13:42:34,045][134294] Updated weights for policy 0, policy_version 212774 (0.0024) [2025-01-04 13:42:36,990][134294] Updated weights for policy 0, policy_version 212784 (0.0027) [2025-01-04 13:42:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 871587840. Throughput: 0: 3427.5. Samples: 207064580. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:38,968][134211] Avg episode reward: [(0, '8.529')] [2025-01-04 13:42:39,960][134294] Updated weights for policy 0, policy_version 212794 (0.0025) [2025-01-04 13:42:42,921][134294] Updated weights for policy 0, policy_version 212804 (0.0024) [2025-01-04 13:42:43,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 871657472. Throughput: 0: 3374.6. Samples: 207085394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:42:43,969][134211] Avg episode reward: [(0, '9.374')] [2025-01-04 13:42:45,797][134294] Updated weights for policy 0, policy_version 212814 (0.0025) [2025-01-04 13:42:48,699][134294] Updated weights for policy 0, policy_version 212824 (0.0024) [2025-01-04 13:42:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14199.4, 300 sec: 14398.5). Total num frames: 871727104. Throughput: 0: 3394.1. Samples: 207095974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:42:48,969][134211] Avg episode reward: [(0, '9.299')] [2025-01-04 13:42:52,223][134294] Updated weights for policy 0, policy_version 212834 (0.0033) [2025-01-04 13:42:53,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13311.9, 300 sec: 14356.8). Total num frames: 871784448. Throughput: 0: 3377.3. Samples: 207114976. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:42:53,969][134211] Avg episode reward: [(0, '9.204')] [2025-01-04 13:42:55,158][134294] Updated weights for policy 0, policy_version 212844 (0.0020) [2025-01-04 13:42:57,178][134294] Updated weights for policy 0, policy_version 212854 (0.0014) [2025-01-04 13:42:58,968][134211] Fps is (10 sec: 15974.7, 60 sec: 13994.7, 300 sec: 14481.8). Total num frames: 871886848. Throughput: 0: 3512.9. Samples: 207140738. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:42:58,968][134211] Avg episode reward: [(0, '9.759')] [2025-01-04 13:42:59,175][134294] Updated weights for policy 0, policy_version 212864 (0.0012) [2025-01-04 13:43:01,160][134294] Updated weights for policy 0, policy_version 212874 (0.0012) [2025-01-04 13:43:03,968][134211] Fps is (10 sec: 18432.6, 60 sec: 14199.5, 300 sec: 14523.4). Total num frames: 871968768. Throughput: 0: 3631.3. Samples: 207156380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:03,968][134211] Avg episode reward: [(0, '9.431')] [2025-01-04 13:43:04,074][134294] Updated weights for policy 0, policy_version 212884 (0.0027) [2025-01-04 13:43:07,239][134294] Updated weights for policy 0, policy_version 212894 (0.0025) [2025-01-04 13:43:08,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14271.1, 300 sec: 14509.6). Total num frames: 872034304. Throughput: 0: 3616.8. Samples: 207176526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:08,968][134211] Avg episode reward: [(0, '9.438')] [2025-01-04 13:43:10,238][134294] Updated weights for policy 0, policy_version 212904 (0.0027) [2025-01-04 13:43:13,102][134294] Updated weights for policy 0, policy_version 212914 (0.0025) [2025-01-04 13:43:13,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14335.9, 300 sec: 14523.4). Total num frames: 872103936. Throughput: 0: 3623.8. Samples: 207197142. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:13,969][134211] Avg episode reward: [(0, '10.879')] [2025-01-04 13:43:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212917_872108032.pth... 
[2025-01-04 13:43:14,115][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212068_868630528.pth [2025-01-04 13:43:16,169][134294] Updated weights for policy 0, policy_version 212924 (0.0023) [2025-01-04 13:43:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14267.7, 300 sec: 14523.4). Total num frames: 872173568. Throughput: 0: 3625.4. Samples: 207207398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:18,968][134211] Avg episode reward: [(0, '9.777')] [2025-01-04 13:43:18,999][134294] Updated weights for policy 0, policy_version 212934 (0.0023) [2025-01-04 13:43:21,897][134294] Updated weights for policy 0, policy_version 212944 (0.0021) [2025-01-04 13:43:23,968][134211] Fps is (10 sec: 13927.2, 60 sec: 14336.0, 300 sec: 14509.5). Total num frames: 872243200. Throughput: 0: 3646.7. Samples: 207228680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:23,968][134211] Avg episode reward: [(0, '9.066')] [2025-01-04 13:43:24,927][134294] Updated weights for policy 0, policy_version 212954 (0.0023) [2025-01-04 13:43:27,867][134294] Updated weights for policy 0, policy_version 212964 (0.0024) [2025-01-04 13:43:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14384.6). Total num frames: 872312832. Throughput: 0: 3645.9. Samples: 207249460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:28,968][134211] Avg episode reward: [(0, '10.139')] [2025-01-04 13:43:30,748][134294] Updated weights for policy 0, policy_version 212974 (0.0026) [2025-01-04 13:43:33,618][134294] Updated weights for policy 0, policy_version 212984 (0.0022) [2025-01-04 13:43:33,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14472.4, 300 sec: 14315.2). Total num frames: 872386560. Throughput: 0: 3644.4. Samples: 207259972. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:33,969][134211] Avg episode reward: [(0, '10.597')] [2025-01-04 13:43:36,535][134294] Updated weights for policy 0, policy_version 212994 (0.0023) [2025-01-04 13:43:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14472.5, 300 sec: 14343.0). Total num frames: 872456192. Throughput: 0: 3697.1. Samples: 207281344. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:38,968][134211] Avg episode reward: [(0, '9.296')] [2025-01-04 13:43:39,534][134294] Updated weights for policy 0, policy_version 213004 (0.0024) [2025-01-04 13:43:42,399][134294] Updated weights for policy 0, policy_version 213014 (0.0022) [2025-01-04 13:43:43,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14472.6, 300 sec: 14356.8). Total num frames: 872525824. Throughput: 0: 3587.3. Samples: 207302168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:43,968][134211] Avg episode reward: [(0, '9.257')] [2025-01-04 13:43:45,317][134294] Updated weights for policy 0, policy_version 213024 (0.0024) [2025-01-04 13:43:48,131][134294] Updated weights for policy 0, policy_version 213034 (0.0025) [2025-01-04 13:43:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.6, 300 sec: 14370.7). Total num frames: 872595456. Throughput: 0: 3481.1. Samples: 207313030. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:43:48,968][134211] Avg episode reward: [(0, '10.091')] [2025-01-04 13:43:51,061][134294] Updated weights for policy 0, policy_version 213044 (0.0021) [2025-01-04 13:43:53,888][134294] Updated weights for policy 0, policy_version 213054 (0.0023) [2025-01-04 13:43:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14398.5). Total num frames: 872669184. Throughput: 0: 3506.1. Samples: 207334302. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:43:53,968][134211] Avg episode reward: [(0, '9.920')] [2025-01-04 13:43:56,935][134294] Updated weights for policy 0, policy_version 213064 (0.0028) [2025-01-04 13:43:58,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14131.1, 300 sec: 14370.7). Total num frames: 872734720. Throughput: 0: 3509.4. Samples: 207355064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:43:58,968][134211] Avg episode reward: [(0, '10.657')] [2025-01-04 13:43:59,879][134294] Updated weights for policy 0, policy_version 213074 (0.0025) [2025-01-04 13:44:02,776][134294] Updated weights for policy 0, policy_version 213084 (0.0025) [2025-01-04 13:44:03,970][134211] Fps is (10 sec: 13513.9, 60 sec: 13925.9, 300 sec: 14384.5). Total num frames: 872804352. Throughput: 0: 3512.6. Samples: 207365472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:03,970][134211] Avg episode reward: [(0, '9.616')] [2025-01-04 13:44:05,773][134294] Updated weights for policy 0, policy_version 213094 (0.0025) [2025-01-04 13:44:07,913][134294] Updated weights for policy 0, policy_version 213104 (0.0015) [2025-01-04 13:44:08,969][134211] Fps is (10 sec: 15973.4, 60 sec: 14335.8, 300 sec: 14454.0). Total num frames: 872894464. Throughput: 0: 3535.6. Samples: 207387784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:08,969][134211] Avg episode reward: [(0, '9.383')] [2025-01-04 13:44:10,291][134294] Updated weights for policy 0, policy_version 213114 (0.0020) [2025-01-04 13:44:13,113][134294] Updated weights for policy 0, policy_version 213124 (0.0025) [2025-01-04 13:44:13,968][134211] Fps is (10 sec: 15977.9, 60 sec: 14336.1, 300 sec: 14454.0). Total num frames: 872964096. Throughput: 0: 3619.3. Samples: 207412330. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:13,968][134211] Avg episode reward: [(0, '10.506')] [2025-01-04 13:44:16,015][134294] Updated weights for policy 0, policy_version 213134 (0.0024) [2025-01-04 13:44:18,968][134211] Fps is (10 sec: 13927.5, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 873033728. Throughput: 0: 3621.1. Samples: 207422920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:18,968][134211] Avg episode reward: [(0, '10.299')] [2025-01-04 13:44:19,086][134294] Updated weights for policy 0, policy_version 213144 (0.0024) [2025-01-04 13:44:21,952][134294] Updated weights for policy 0, policy_version 213154 (0.0023) [2025-01-04 13:44:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14336.0, 300 sec: 14467.9). Total num frames: 873103360. Throughput: 0: 3605.9. Samples: 207443610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:23,968][134211] Avg episode reward: [(0, '8.441')] [2025-01-04 13:44:24,986][134294] Updated weights for policy 0, policy_version 213164 (0.0023) [2025-01-04 13:44:27,854][134294] Updated weights for policy 0, policy_version 213174 (0.0022) [2025-01-04 13:44:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14336.0, 300 sec: 14467.9). Total num frames: 873172992. Throughput: 0: 3607.2. Samples: 207464494. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:28,968][134211] Avg episode reward: [(0, '9.709')] [2025-01-04 13:44:30,674][134294] Updated weights for policy 0, policy_version 213184 (0.0025) [2025-01-04 13:44:33,569][134294] Updated weights for policy 0, policy_version 213194 (0.0024) [2025-01-04 13:44:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.1, 300 sec: 14481.8). Total num frames: 873246720. Throughput: 0: 3605.8. Samples: 207475292. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:33,968][134211] Avg episode reward: [(0, '9.325')] [2025-01-04 13:44:36,467][134294] Updated weights for policy 0, policy_version 213204 (0.0023) [2025-01-04 13:44:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14336.0, 300 sec: 14481.8). Total num frames: 873316352. Throughput: 0: 3608.7. Samples: 207496694. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:38,968][134211] Avg episode reward: [(0, '9.926')] [2025-01-04 13:44:39,424][134294] Updated weights for policy 0, policy_version 213214 (0.0027) [2025-01-04 13:44:41,573][134294] Updated weights for policy 0, policy_version 213224 (0.0013) [2025-01-04 13:44:43,426][134294] Updated weights for policy 0, policy_version 213234 (0.0012) [2025-01-04 13:44:43,968][134211] Fps is (10 sec: 16793.9, 60 sec: 14813.9, 300 sec: 14454.0). Total num frames: 873414656. Throughput: 0: 3740.4. Samples: 207523380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:43,968][134211] Avg episode reward: [(0, '10.260')] [2025-01-04 13:44:45,350][134294] Updated weights for policy 0, policy_version 213244 (0.0012) [2025-01-04 13:44:47,190][134294] Updated weights for policy 0, policy_version 213254 (0.0014) [2025-01-04 13:44:48,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15360.0, 300 sec: 14481.8). Total num frames: 873517056. Throughput: 0: 3874.8. Samples: 207539830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:48,968][134211] Avg episode reward: [(0, '9.693')] [2025-01-04 13:44:49,838][134294] Updated weights for policy 0, policy_version 213264 (0.0022) [2025-01-04 13:44:52,885][134294] Updated weights for policy 0, policy_version 213274 (0.0029) [2025-01-04 13:44:53,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15223.5, 300 sec: 14467.9). Total num frames: 873582592. Throughput: 0: 3908.6. Samples: 207563668. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:53,968][134211] Avg episode reward: [(0, '9.531')] [2025-01-04 13:44:55,918][134294] Updated weights for policy 0, policy_version 213284 (0.0024) [2025-01-04 13:44:58,840][134294] Updated weights for policy 0, policy_version 213294 (0.0027) [2025-01-04 13:44:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.8, 300 sec: 14495.7). Total num frames: 873652224. Throughput: 0: 3817.3. Samples: 207584106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:44:58,968][134211] Avg episode reward: [(0, '9.726')] [2025-01-04 13:45:01,800][134294] Updated weights for policy 0, policy_version 213304 (0.0025) [2025-01-04 13:45:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15224.1, 300 sec: 14495.7). Total num frames: 873717760. Throughput: 0: 3807.3. Samples: 207594248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:03,968][134211] Avg episode reward: [(0, '10.023')] [2025-01-04 13:45:04,930][134294] Updated weights for policy 0, policy_version 213314 (0.0026) [2025-01-04 13:45:07,843][134294] Updated weights for policy 0, policy_version 213324 (0.0027) [2025-01-04 13:45:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14882.4, 300 sec: 14495.7). Total num frames: 873787392. Throughput: 0: 3798.9. Samples: 207614562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:08,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 13:45:10,825][134294] Updated weights for policy 0, policy_version 213334 (0.0025) [2025-01-04 13:45:13,623][134294] Updated weights for policy 0, policy_version 213344 (0.0023) [2025-01-04 13:45:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14882.1, 300 sec: 14509.6). Total num frames: 873857024. Throughput: 0: 3807.1. Samples: 207635814. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:13,968][134211] Avg episode reward: [(0, '10.417')] [2025-01-04 13:45:14,007][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000213345_873861120.pth... [2025-01-04 13:45:14,082][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212488_870350848.pth [2025-01-04 13:45:16,635][134294] Updated weights for policy 0, policy_version 213354 (0.0026) [2025-01-04 13:45:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14509.6). Total num frames: 873926656. Throughput: 0: 3794.1. Samples: 207646026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:18,968][134211] Avg episode reward: [(0, '8.773')] [2025-01-04 13:45:19,586][134294] Updated weights for policy 0, policy_version 213364 (0.0024) [2025-01-04 13:45:22,542][134294] Updated weights for policy 0, policy_version 213374 (0.0024) [2025-01-04 13:45:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14882.2, 300 sec: 14509.6). Total num frames: 873996288. Throughput: 0: 3783.8. Samples: 207666964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:23,968][134211] Avg episode reward: [(0, '9.045')] [2025-01-04 13:45:25,419][134294] Updated weights for policy 0, policy_version 213384 (0.0023) [2025-01-04 13:45:28,249][134294] Updated weights for policy 0, policy_version 213394 (0.0023) [2025-01-04 13:45:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14950.4, 300 sec: 14523.4). Total num frames: 874070016. Throughput: 0: 3666.8. Samples: 207688388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:28,968][134211] Avg episode reward: [(0, '9.077')] [2025-01-04 13:45:31,153][134294] Updated weights for policy 0, policy_version 213404 (0.0025) [2025-01-04 13:45:33,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14882.1, 300 sec: 14523.4). Total num frames: 874139648. 
Throughput: 0: 3537.6. Samples: 207699024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:33,968][134211] Avg episode reward: [(0, '10.248')] [2025-01-04 13:45:34,072][134294] Updated weights for policy 0, policy_version 213414 (0.0025) [2025-01-04 13:45:37,000][134294] Updated weights for policy 0, policy_version 213424 (0.0025) [2025-01-04 13:45:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14523.4). Total num frames: 874209280. Throughput: 0: 3473.7. Samples: 207719984. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:38,969][134211] Avg episode reward: [(0, '9.554')] [2025-01-04 13:45:39,964][134294] Updated weights for policy 0, policy_version 213434 (0.0024) [2025-01-04 13:45:42,791][134294] Updated weights for policy 0, policy_version 213444 (0.0026) [2025-01-04 13:45:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14537.3). Total num frames: 874283008. Throughput: 0: 3492.1. Samples: 207741252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:43,968][134211] Avg episode reward: [(0, '8.887')] [2025-01-04 13:45:45,598][134294] Updated weights for policy 0, policy_version 213454 (0.0022) [2025-01-04 13:45:48,371][134294] Updated weights for policy 0, policy_version 213464 (0.0023) [2025-01-04 13:45:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.4, 300 sec: 14551.2). Total num frames: 874352640. Throughput: 0: 3507.9. Samples: 207752102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:48,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 13:45:51,365][134294] Updated weights for policy 0, policy_version 213474 (0.0024) [2025-01-04 13:45:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 14523.4). Total num frames: 874422272. Throughput: 0: 3529.5. Samples: 207773388. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:53,968][134211] Avg episode reward: [(0, '10.067')] [2025-01-04 13:45:54,310][134294] Updated weights for policy 0, policy_version 213484 (0.0024) [2025-01-04 13:45:57,284][134294] Updated weights for policy 0, policy_version 213494 (0.0025) [2025-01-04 13:45:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14495.7). Total num frames: 874491904. Throughput: 0: 3519.8. Samples: 207794204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:45:58,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 13:46:00,134][134294] Updated weights for policy 0, policy_version 213504 (0.0022) [2025-01-04 13:46:02,946][134294] Updated weights for policy 0, policy_version 213514 (0.0023) [2025-01-04 13:46:03,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14131.2, 300 sec: 14509.6). Total num frames: 874565632. Throughput: 0: 3535.7. Samples: 207805134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:46:03,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 13:46:05,884][134294] Updated weights for policy 0, policy_version 213524 (0.0025) [2025-01-04 13:46:08,683][134294] Updated weights for policy 0, policy_version 213534 (0.0024) [2025-01-04 13:46:08,969][134211] Fps is (10 sec: 14334.4, 60 sec: 14130.9, 300 sec: 14523.4). Total num frames: 874635264. Throughput: 0: 3546.9. Samples: 207826580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:46:08,969][134211] Avg episode reward: [(0, '9.952')] [2025-01-04 13:46:11,397][134294] Updated weights for policy 0, policy_version 213544 (0.0022) [2025-01-04 13:46:13,362][134294] Updated weights for policy 0, policy_version 213554 (0.0015) [2025-01-04 13:46:13,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14472.6, 300 sec: 14537.3). Total num frames: 874725376. Throughput: 0: 3634.3. Samples: 207851930. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:13,968][134211] Avg episode reward: [(0, '9.702')] [2025-01-04 13:46:15,986][134294] Updated weights for policy 0, policy_version 213564 (0.0022) [2025-01-04 13:46:18,877][134294] Updated weights for policy 0, policy_version 213574 (0.0023) [2025-01-04 13:46:18,968][134211] Fps is (10 sec: 16385.9, 60 sec: 14540.8, 300 sec: 14551.2). Total num frames: 874799104. Throughput: 0: 3653.5. Samples: 207863430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:18,968][134211] Avg episode reward: [(0, '10.581')] [2025-01-04 13:46:21,716][134294] Updated weights for policy 0, policy_version 213584 (0.0021) [2025-01-04 13:46:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14551.2). Total num frames: 874868736. Throughput: 0: 3659.2. Samples: 207884646. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:23,968][134211] Avg episode reward: [(0, '9.442')] [2025-01-04 13:46:24,691][134294] Updated weights for policy 0, policy_version 213594 (0.0025) [2025-01-04 13:46:27,717][134294] Updated weights for policy 0, policy_version 213604 (0.0026) [2025-01-04 13:46:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 874938368. Throughput: 0: 3644.3. Samples: 207905244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:28,968][134211] Avg episode reward: [(0, '9.201')] [2025-01-04 13:46:30,622][134294] Updated weights for policy 0, policy_version 213614 (0.0022) [2025-01-04 13:46:33,366][134294] Updated weights for policy 0, policy_version 213624 (0.0021) [2025-01-04 13:46:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 875008000. Throughput: 0: 3643.7. Samples: 207916070. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:33,968][134211] Avg episode reward: [(0, '10.950')] [2025-01-04 13:46:35,694][134294] Updated weights for policy 0, policy_version 213634 (0.0020) [2025-01-04 13:46:37,559][134294] Updated weights for policy 0, policy_version 213644 (0.0013) [2025-01-04 13:46:38,968][134211] Fps is (10 sec: 17612.8, 60 sec: 15087.0, 300 sec: 14690.1). Total num frames: 875114496. Throughput: 0: 3760.6. Samples: 207942616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:38,968][134211] Avg episode reward: [(0, '9.691')] [2025-01-04 13:46:39,591][134294] Updated weights for policy 0, policy_version 213654 (0.0016) [2025-01-04 13:46:42,486][134294] Updated weights for policy 0, policy_version 213664 (0.0025) [2025-01-04 13:46:43,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15018.6, 300 sec: 14606.7). Total num frames: 875184128. Throughput: 0: 3848.3. Samples: 207967378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:43,968][134211] Avg episode reward: [(0, '10.023')] [2025-01-04 13:46:45,486][134294] Updated weights for policy 0, policy_version 213674 (0.0025) [2025-01-04 13:46:48,481][134294] Updated weights for policy 0, policy_version 213684 (0.0022) [2025-01-04 13:46:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.7, 300 sec: 14467.9). Total num frames: 875253760. Throughput: 0: 3838.0. Samples: 207977844. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:48,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 13:46:51,319][134294] Updated weights for policy 0, policy_version 213694 (0.0026) [2025-01-04 13:46:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14495.7). Total num frames: 875323392. Throughput: 0: 3820.9. Samples: 207998518. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:53,968][134211] Avg episode reward: [(0, '10.242')] [2025-01-04 13:46:54,408][134294] Updated weights for policy 0, policy_version 213704 (0.0024) [2025-01-04 13:46:57,433][134294] Updated weights for policy 0, policy_version 213714 (0.0023) [2025-01-04 13:46:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14495.7). Total num frames: 875393024. Throughput: 0: 3713.5. Samples: 208019038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:46:58,968][134211] Avg episode reward: [(0, '9.181')] [2025-01-04 13:47:00,291][134294] Updated weights for policy 0, policy_version 213724 (0.0023) [2025-01-04 13:47:03,103][134294] Updated weights for policy 0, policy_version 213734 (0.0024) [2025-01-04 13:47:03,971][134211] Fps is (10 sec: 13922.1, 60 sec: 14949.6, 300 sec: 14524.0). Total num frames: 875462656. Throughput: 0: 3696.8. Samples: 208029798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:47:03,971][134211] Avg episode reward: [(0, '11.350')] [2025-01-04 13:47:05,991][134294] Updated weights for policy 0, policy_version 213744 (0.0026) [2025-01-04 13:47:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.7, 300 sec: 14537.3). Total num frames: 875532288. Throughput: 0: 3693.4. Samples: 208050850. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:47:08,968][134211] Avg episode reward: [(0, '10.217')] [2025-01-04 13:47:09,155][134294] Updated weights for policy 0, policy_version 213754 (0.0024) [2025-01-04 13:47:12,156][134294] Updated weights for policy 0, policy_version 213764 (0.0025) [2025-01-04 13:47:13,968][134211] Fps is (10 sec: 13930.6, 60 sec: 14609.0, 300 sec: 14523.4). Total num frames: 875601920. Throughput: 0: 3681.7. Samples: 208070922. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:47:13,969][134211] Avg episode reward: [(0, '10.480')] [2025-01-04 13:47:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000213770_875601920.pth... [2025-01-04 13:47:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000212917_872108032.pth [2025-01-04 13:47:15,149][134294] Updated weights for policy 0, policy_version 213774 (0.0024) [2025-01-04 13:47:18,006][134294] Updated weights for policy 0, policy_version 213784 (0.0022) [2025-01-04 13:47:18,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14540.8, 300 sec: 14537.3). Total num frames: 875671552. Throughput: 0: 3678.8. Samples: 208081618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:18,968][134211] Avg episode reward: [(0, '10.499')] [2025-01-04 13:47:20,829][134294] Updated weights for policy 0, policy_version 213794 (0.0025) [2025-01-04 13:47:23,681][134294] Updated weights for policy 0, policy_version 213804 (0.0024) [2025-01-04 13:47:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14540.8, 300 sec: 14537.3). Total num frames: 875741184. Throughput: 0: 3565.4. Samples: 208103060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:23,968][134211] Avg episode reward: [(0, '10.415')] [2025-01-04 13:47:26,197][134294] Updated weights for policy 0, policy_version 213814 (0.0022) [2025-01-04 13:47:28,214][134294] Updated weights for policy 0, policy_version 213824 (0.0016) [2025-01-04 13:47:28,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14882.1, 300 sec: 14620.6). Total num frames: 875831296. Throughput: 0: 3587.0. Samples: 208128792. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:28,968][134211] Avg episode reward: [(0, '9.923')] [2025-01-04 13:47:30,986][134294] Updated weights for policy 0, policy_version 213834 (0.0025) [2025-01-04 13:47:33,796][134294] Updated weights for policy 0, policy_version 213844 (0.0022) [2025-01-04 13:47:33,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14950.4, 300 sec: 14634.5). Total num frames: 875905024. Throughput: 0: 3602.9. Samples: 208139974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:33,968][134211] Avg episode reward: [(0, '10.235')] [2025-01-04 13:47:36,775][134294] Updated weights for policy 0, policy_version 213854 (0.0025) [2025-01-04 13:47:38,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 875974656. Throughput: 0: 3614.8. Samples: 208161184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:38,968][134211] Avg episode reward: [(0, '10.401')] [2025-01-04 13:47:39,709][134294] Updated weights for policy 0, policy_version 213864 (0.0026) [2025-01-04 13:47:42,677][134294] Updated weights for policy 0, policy_version 213874 (0.0024) [2025-01-04 13:47:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14634.5). Total num frames: 876044288. Throughput: 0: 3616.5. Samples: 208181780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:43,968][134211] Avg episode reward: [(0, '10.083')] [2025-01-04 13:47:45,549][134294] Updated weights for policy 0, policy_version 213884 (0.0023) [2025-01-04 13:47:48,402][134294] Updated weights for policy 0, policy_version 213894 (0.0023) [2025-01-04 13:47:48,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14404.2, 300 sec: 14690.1). Total num frames: 876118016. Throughput: 0: 3619.1. Samples: 208192648. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:48,968][134211] Avg episode reward: [(0, '10.001')] [2025-01-04 13:47:51,196][134294] Updated weights for policy 0, policy_version 213904 (0.0024) [2025-01-04 13:47:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14404.3, 300 sec: 14579.0). Total num frames: 876187648. Throughput: 0: 3633.3. Samples: 208214348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:53,968][134211] Avg episode reward: [(0, '10.268')] [2025-01-04 13:47:54,048][134294] Updated weights for policy 0, policy_version 213914 (0.0025) [2025-01-04 13:47:56,655][134294] Updated weights for policy 0, policy_version 213924 (0.0019) [2025-01-04 13:47:58,636][134294] Updated weights for policy 0, policy_version 213934 (0.0014) [2025-01-04 13:47:58,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 876277760. Throughput: 0: 3749.2. Samples: 208239634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:47:58,968][134211] Avg episode reward: [(0, '9.862')] [2025-01-04 13:48:01,384][134294] Updated weights for policy 0, policy_version 213944 (0.0024) [2025-01-04 13:48:03,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14746.3, 300 sec: 14620.6). Total num frames: 876347392. Throughput: 0: 3767.6. Samples: 208251160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:03,968][134211] Avg episode reward: [(0, '10.425')] [2025-01-04 13:48:04,313][134294] Updated weights for policy 0, policy_version 213954 (0.0026) [2025-01-04 13:48:07,226][134294] Updated weights for policy 0, policy_version 213964 (0.0025) [2025-01-04 13:48:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14620.7). Total num frames: 876417024. Throughput: 0: 3758.0. Samples: 208272170. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:08,968][134211] Avg episode reward: [(0, '9.748')] [2025-01-04 13:48:10,224][134294] Updated weights for policy 0, policy_version 213974 (0.0026) [2025-01-04 13:48:12,983][134294] Updated weights for policy 0, policy_version 213984 (0.0024) [2025-01-04 13:48:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 876490752. Throughput: 0: 3657.5. Samples: 208293378. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:13,968][134211] Avg episode reward: [(0, '9.566')] [2025-01-04 13:48:15,893][134294] Updated weights for policy 0, policy_version 213994 (0.0023) [2025-01-04 13:48:18,733][134294] Updated weights for policy 0, policy_version 214004 (0.0022) [2025-01-04 13:48:18,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 876560384. Throughput: 0: 3647.9. Samples: 208304128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:18,968][134211] Avg episode reward: [(0, '10.711')] [2025-01-04 13:48:21,527][134294] Updated weights for policy 0, policy_version 214014 (0.0022) [2025-01-04 13:48:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14882.2, 300 sec: 14648.4). Total num frames: 876634112. Throughput: 0: 3656.1. Samples: 208325710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:23,968][134211] Avg episode reward: [(0, '10.617')] [2025-01-04 13:48:24,403][134294] Updated weights for policy 0, policy_version 214024 (0.0023) [2025-01-04 13:48:27,353][134294] Updated weights for policy 0, policy_version 214034 (0.0026) [2025-01-04 13:48:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14540.8, 300 sec: 14634.5). Total num frames: 876703744. Throughput: 0: 3671.0. Samples: 208346976. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:28,968][134211] Avg episode reward: [(0, '10.124')] [2025-01-04 13:48:30,258][134294] Updated weights for policy 0, policy_version 214044 (0.0023) [2025-01-04 13:48:33,093][134294] Updated weights for policy 0, policy_version 214054 (0.0023) [2025-01-04 13:48:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 876773376. Throughput: 0: 3663.5. Samples: 208357504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:33,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 13:48:35,969][134294] Updated weights for policy 0, policy_version 214064 (0.0022) [2025-01-04 13:48:38,848][134294] Updated weights for policy 0, policy_version 214074 (0.0025) [2025-01-04 13:48:38,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 876847104. Throughput: 0: 3662.0. Samples: 208379136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:38,968][134211] Avg episode reward: [(0, '9.120')] [2025-01-04 13:48:41,637][134294] Updated weights for policy 0, policy_version 214084 (0.0026) [2025-01-04 13:48:43,622][134294] Updated weights for policy 0, policy_version 214094 (0.0013) [2025-01-04 13:48:43,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14813.9, 300 sec: 14703.9). Total num frames: 876933120. Throughput: 0: 3632.8. Samples: 208403108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:43,968][134211] Avg episode reward: [(0, '9.814')] [2025-01-04 13:48:46,109][134294] Updated weights for policy 0, policy_version 214104 (0.0021) [2025-01-04 13:48:48,914][134294] Updated weights for policy 0, policy_version 214114 (0.0026) [2025-01-04 13:48:48,968][134211] Fps is (10 sec: 16384.0, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 877010944. Throughput: 0: 3664.3. Samples: 208416052. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:48,968][134211] Avg episode reward: [(0, '10.943')] [2025-01-04 13:48:51,876][134294] Updated weights for policy 0, policy_version 214124 (0.0025) [2025-01-04 13:48:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 877076480. Throughput: 0: 3668.0. Samples: 208437232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:53,968][134211] Avg episode reward: [(0, '9.828')] [2025-01-04 13:48:54,906][134294] Updated weights for policy 0, policy_version 214134 (0.0024) [2025-01-04 13:48:57,839][134294] Updated weights for policy 0, policy_version 214144 (0.0024) [2025-01-04 13:48:58,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14472.4, 300 sec: 14717.9). Total num frames: 877146112. Throughput: 0: 3656.1. Samples: 208457904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:48:58,969][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 13:49:00,796][134294] Updated weights for policy 0, policy_version 214154 (0.0022) [2025-01-04 13:49:03,735][134294] Updated weights for policy 0, policy_version 214164 (0.0027) [2025-01-04 13:49:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 877215744. Throughput: 0: 3653.7. Samples: 208468546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:03,968][134211] Avg episode reward: [(0, '8.813')] [2025-01-04 13:49:06,604][134294] Updated weights for policy 0, policy_version 214174 (0.0024) [2025-01-04 13:49:08,534][134294] Updated weights for policy 0, policy_version 214184 (0.0012) [2025-01-04 13:49:08,967][134211] Fps is (10 sec: 15975.9, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 877305856. Throughput: 0: 3660.3. Samples: 208490422. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:08,968][134211] Avg episode reward: [(0, '9.474')] [2025-01-04 13:49:10,366][134294] Updated weights for policy 0, policy_version 214194 (0.0015) [2025-01-04 13:49:12,335][134294] Updated weights for policy 0, policy_version 214204 (0.0013) [2025-01-04 13:49:13,967][134211] Fps is (10 sec: 19661.3, 60 sec: 15360.0, 300 sec: 14842.8). Total num frames: 877412352. Throughput: 0: 3908.0. Samples: 208522834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:13,968][134211] Avg episode reward: [(0, '9.310')] [2025-01-04 13:49:14,018][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000214213_877416448.pth... [2025-01-04 13:49:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000213345_873861120.pth [2025-01-04 13:49:14,248][134294] Updated weights for policy 0, policy_version 214214 (0.0013) [2025-01-04 13:49:16,097][134294] Updated weights for policy 0, policy_version 214224 (0.0013) [2025-01-04 13:49:18,013][134294] Updated weights for policy 0, policy_version 214234 (0.0014) [2025-01-04 13:49:18,968][134211] Fps is (10 sec: 21298.7, 60 sec: 15974.4, 300 sec: 14967.8). Total num frames: 877518848. Throughput: 0: 4035.7. Samples: 208539112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:18,968][134211] Avg episode reward: [(0, '9.864')] [2025-01-04 13:49:20,727][134294] Updated weights for policy 0, policy_version 214244 (0.0023) [2025-01-04 13:49:23,790][134294] Updated weights for policy 0, policy_version 214254 (0.0028) [2025-01-04 13:49:23,968][134211] Fps is (10 sec: 17202.9, 60 sec: 15837.9, 300 sec: 14953.9). Total num frames: 877584384. Throughput: 0: 4106.7. Samples: 208563938. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:23,968][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 13:49:26,946][134294] Updated weights for policy 0, policy_version 214264 (0.0028) [2025-01-04 13:49:28,968][134211] Fps is (10 sec: 13106.6, 60 sec: 15769.5, 300 sec: 14926.1). Total num frames: 877649920. Throughput: 0: 4011.1. Samples: 208583610. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:49:28,969][134211] Avg episode reward: [(0, '11.344')] [2025-01-04 13:49:30,010][134294] Updated weights for policy 0, policy_version 214274 (0.0020) [2025-01-04 13:49:32,986][134294] Updated weights for policy 0, policy_version 214284 (0.0026) [2025-01-04 13:49:33,968][134211] Fps is (10 sec: 13515.8, 60 sec: 15769.4, 300 sec: 14926.1). Total num frames: 877719552. Throughput: 0: 3947.4. Samples: 208593686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:33,969][134211] Avg episode reward: [(0, '9.982')] [2025-01-04 13:49:35,944][134294] Updated weights for policy 0, policy_version 214294 (0.0025) [2025-01-04 13:49:38,789][134294] Updated weights for policy 0, policy_version 214304 (0.0022) [2025-01-04 13:49:38,968][134211] Fps is (10 sec: 13927.0, 60 sec: 15701.3, 300 sec: 14828.9). Total num frames: 877789184. Throughput: 0: 3943.3. Samples: 208614678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:38,968][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 13:49:41,717][134294] Updated weights for policy 0, policy_version 214314 (0.0026) [2025-01-04 13:49:43,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15428.1, 300 sec: 14717.8). Total num frames: 877858816. Throughput: 0: 3950.0. Samples: 208635654. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:43,969][134211] Avg episode reward: [(0, '10.144')] [2025-01-04 13:49:44,778][134294] Updated weights for policy 0, policy_version 214324 (0.0023) [2025-01-04 13:49:47,707][134294] Updated weights for policy 0, policy_version 214334 (0.0025) [2025-01-04 13:49:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15291.7, 300 sec: 14731.7). Total num frames: 877928448. Throughput: 0: 3942.0. Samples: 208645934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:48,968][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 13:49:50,597][134294] Updated weights for policy 0, policy_version 214344 (0.0025) [2025-01-04 13:49:53,317][134294] Updated weights for policy 0, policy_version 214354 (0.0024) [2025-01-04 13:49:53,968][134211] Fps is (10 sec: 13927.2, 60 sec: 15360.0, 300 sec: 14731.7). Total num frames: 877998080. Throughput: 0: 3933.5. Samples: 208667430. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:53,968][134211] Avg episode reward: [(0, '9.068')] [2025-01-04 13:49:56,228][134294] Updated weights for policy 0, policy_version 214364 (0.0024) [2025-01-04 13:49:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 15428.4, 300 sec: 14759.5). Total num frames: 878071808. Throughput: 0: 3680.7. Samples: 208688466. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:49:58,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 13:49:59,263][134294] Updated weights for policy 0, policy_version 214374 (0.0027) [2025-01-04 13:50:02,129][134294] Updated weights for policy 0, policy_version 214384 (0.0024) [2025-01-04 13:50:03,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15428.3, 300 sec: 14759.5). Total num frames: 878141440. Throughput: 0: 3549.5. Samples: 208698840. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:03,968][134211] Avg episode reward: [(0, '9.444')] [2025-01-04 13:50:05,132][134294] Updated weights for policy 0, policy_version 214394 (0.0025) [2025-01-04 13:50:07,974][134294] Updated weights for policy 0, policy_version 214404 (0.0025) [2025-01-04 13:50:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15086.9, 300 sec: 14759.5). Total num frames: 878211072. Throughput: 0: 3470.2. Samples: 208720098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:08,968][134211] Avg episode reward: [(0, '9.183')] [2025-01-04 13:50:10,822][134294] Updated weights for policy 0, policy_version 214414 (0.0025) [2025-01-04 13:50:13,671][134294] Updated weights for policy 0, policy_version 214424 (0.0025) [2025-01-04 13:50:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14759.5). Total num frames: 878280704. Throughput: 0: 3513.4. Samples: 208741712. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:13,968][134211] Avg episode reward: [(0, '9.599')] [2025-01-04 13:50:16,578][134294] Updated weights for policy 0, policy_version 214434 (0.0024) [2025-01-04 13:50:18,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13926.4, 300 sec: 14773.4). Total num frames: 878354432. Throughput: 0: 3522.6. Samples: 208752200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:18,968][134211] Avg episode reward: [(0, '9.589')] [2025-01-04 13:50:19,558][134294] Updated weights for policy 0, policy_version 214444 (0.0024) [2025-01-04 13:50:22,481][134294] Updated weights for policy 0, policy_version 214454 (0.0024) [2025-01-04 13:50:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.4, 300 sec: 14745.6). Total num frames: 878419968. Throughput: 0: 3518.2. Samples: 208772996. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:23,968][134211] Avg episode reward: [(0, '9.128')] [2025-01-04 13:50:25,331][134294] Updated weights for policy 0, policy_version 214464 (0.0023) [2025-01-04 13:50:28,112][134294] Updated weights for policy 0, policy_version 214474 (0.0024) [2025-01-04 13:50:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14063.0, 300 sec: 14759.5). Total num frames: 878493696. Throughput: 0: 3532.3. Samples: 208794608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:28,969][134211] Avg episode reward: [(0, '10.337')] [2025-01-04 13:50:31,025][134294] Updated weights for policy 0, policy_version 214484 (0.0024) [2025-01-04 13:50:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14063.1, 300 sec: 14759.5). Total num frames: 878563328. Throughput: 0: 3540.6. Samples: 208805260. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:33,968][134211] Avg episode reward: [(0, '9.663')] [2025-01-04 13:50:34,026][134294] Updated weights for policy 0, policy_version 214494 (0.0025) [2025-01-04 13:50:36,955][134294] Updated weights for policy 0, policy_version 214504 (0.0024) [2025-01-04 13:50:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14062.9, 300 sec: 14745.6). Total num frames: 878632960. Throughput: 0: 3528.5. Samples: 208826212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 13:50:38,968][134211] Avg episode reward: [(0, '9.311')] [2025-01-04 13:50:39,838][134294] Updated weights for policy 0, policy_version 214514 (0.0026) [2025-01-04 13:50:42,553][134294] Updated weights for policy 0, policy_version 214524 (0.0021) [2025-01-04 13:50:43,968][134211] Fps is (10 sec: 14745.7, 60 sec: 14199.6, 300 sec: 14773.4). Total num frames: 878710784. Throughput: 0: 3542.4. Samples: 208847876. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:50:43,968][134211] Avg episode reward: [(0, '10.079')] [2025-01-04 13:50:45,329][134294] Updated weights for policy 0, policy_version 214534 (0.0022) [2025-01-04 13:50:48,202][134294] Updated weights for policy 0, policy_version 214544 (0.0026) [2025-01-04 13:50:48,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14199.5, 300 sec: 14773.4). Total num frames: 878780416. Throughput: 0: 3559.3. Samples: 208859006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:50:48,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 13:50:51,061][134294] Updated weights for policy 0, policy_version 214554 (0.0025) [2025-01-04 13:50:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.5, 300 sec: 14773.4). Total num frames: 878850048. Throughput: 0: 3559.9. Samples: 208880294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:50:53,968][134211] Avg episode reward: [(0, '10.417')] [2025-01-04 13:50:54,037][134294] Updated weights for policy 0, policy_version 214564 (0.0022) [2025-01-04 13:50:56,802][134294] Updated weights for policy 0, policy_version 214574 (0.0024) [2025-01-04 13:50:58,690][134294] Updated weights for policy 0, policy_version 214584 (0.0013) [2025-01-04 13:50:58,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14472.6, 300 sec: 14828.9). Total num frames: 878940160. Throughput: 0: 3613.9. Samples: 208904338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:50:58,968][134211] Avg episode reward: [(0, '10.646')] [2025-01-04 13:51:00,606][134294] Updated weights for policy 0, policy_version 214594 (0.0014) [2025-01-04 13:51:02,454][134294] Updated weights for policy 0, policy_version 214604 (0.0013) [2025-01-04 13:51:03,967][134211] Fps is (10 sec: 19661.1, 60 sec: 15087.0, 300 sec: 14953.9). Total num frames: 879046656. Throughput: 0: 3744.6. Samples: 208920704. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:03,968][134211] Avg episode reward: [(0, '7.889')] [2025-01-04 13:51:04,442][134294] Updated weights for policy 0, policy_version 214614 (0.0013) [2025-01-04 13:51:07,144][134294] Updated weights for policy 0, policy_version 214624 (0.0025) [2025-01-04 13:51:08,971][134211] Fps is (10 sec: 18016.8, 60 sec: 15154.5, 300 sec: 14898.2). Total num frames: 879120384. Throughput: 0: 3889.6. Samples: 208948038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:08,971][134211] Avg episode reward: [(0, '9.399')] [2025-01-04 13:51:10,249][134294] Updated weights for policy 0, policy_version 214634 (0.0028) [2025-01-04 13:51:13,243][134294] Updated weights for policy 0, policy_version 214644 (0.0027) [2025-01-04 13:51:13,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15155.2, 300 sec: 14884.4). Total num frames: 879190016. Throughput: 0: 3865.7. Samples: 208968562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:13,968][134211] Avg episode reward: [(0, '10.349')] [2025-01-04 13:51:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000214646_879190016.pth... [2025-01-04 13:51:14,051][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000213770_875601920.pth [2025-01-04 13:51:16,316][134294] Updated weights for policy 0, policy_version 214654 (0.0024) [2025-01-04 13:51:18,968][134211] Fps is (10 sec: 13930.4, 60 sec: 15086.9, 300 sec: 14884.4). Total num frames: 879259648. Throughput: 0: 3853.0. Samples: 208978644. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:18,968][134211] Avg episode reward: [(0, '9.319')] [2025-01-04 13:51:19,148][134294] Updated weights for policy 0, policy_version 214664 (0.0025) [2025-01-04 13:51:22,057][134294] Updated weights for policy 0, policy_version 214674 (0.0024) [2025-01-04 13:51:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15155.2, 300 sec: 14884.4). Total num frames: 879329280. Throughput: 0: 3852.4. Samples: 208999572. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:23,968][134211] Avg episode reward: [(0, '10.014')] [2025-01-04 13:51:25,053][134294] Updated weights for policy 0, policy_version 214684 (0.0023) [2025-01-04 13:51:27,929][134294] Updated weights for policy 0, policy_version 214694 (0.0022) [2025-01-04 13:51:28,969][134211] Fps is (10 sec: 13925.0, 60 sec: 15086.7, 300 sec: 14884.4). Total num frames: 879398912. Throughput: 0: 3843.6. Samples: 209020842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:28,969][134211] Avg episode reward: [(0, '8.402')] [2025-01-04 13:51:30,845][134294] Updated weights for policy 0, policy_version 214704 (0.0026) [2025-01-04 13:51:33,661][134294] Updated weights for policy 0, policy_version 214714 (0.0024) [2025-01-04 13:51:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15155.2, 300 sec: 14773.4). Total num frames: 879472640. Throughput: 0: 3831.3. Samples: 209031416. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:33,968][134211] Avg episode reward: [(0, '9.878')] [2025-01-04 13:51:36,489][134294] Updated weights for policy 0, policy_version 214724 (0.0021) [2025-01-04 13:51:38,968][134211] Fps is (10 sec: 14337.6, 60 sec: 15155.2, 300 sec: 14773.4). Total num frames: 879542272. Throughput: 0: 3835.0. Samples: 209052868. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:38,968][134211] Avg episode reward: [(0, '10.263')] [2025-01-04 13:51:39,519][134294] Updated weights for policy 0, policy_version 214734 (0.0024) [2025-01-04 13:51:42,427][134294] Updated weights for policy 0, policy_version 214744 (0.0023) [2025-01-04 13:51:43,968][134211] Fps is (10 sec: 13925.5, 60 sec: 15018.5, 300 sec: 14773.3). Total num frames: 879611904. Throughput: 0: 3761.0. Samples: 209073584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:51:43,969][134211] Avg episode reward: [(0, '8.756')] [2025-01-04 13:51:45,303][134294] Updated weights for policy 0, policy_version 214754 (0.0026) [2025-01-04 13:51:48,210][134294] Updated weights for policy 0, policy_version 214764 (0.0025) [2025-01-04 13:51:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 879681536. Throughput: 0: 3639.6. Samples: 209084488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:51:48,968][134211] Avg episode reward: [(0, '10.236')] [2025-01-04 13:51:51,070][134294] Updated weights for policy 0, policy_version 214774 (0.0024) [2025-01-04 13:51:53,918][134294] Updated weights for policy 0, policy_version 214784 (0.0024) [2025-01-04 13:51:53,968][134211] Fps is (10 sec: 14336.8, 60 sec: 15086.9, 300 sec: 14787.2). Total num frames: 879755264. Throughput: 0: 3505.5. Samples: 209105774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:51:53,968][134211] Avg episode reward: [(0, '10.117')] [2025-01-04 13:51:56,876][134294] Updated weights for policy 0, policy_version 214794 (0.0025) [2025-01-04 13:51:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.5, 300 sec: 14787.4). Total num frames: 879824896. Throughput: 0: 3507.2. Samples: 209126388. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:51:58,968][134211] Avg episode reward: [(0, '9.144')] [2025-01-04 13:51:59,902][134294] Updated weights for policy 0, policy_version 214804 (0.0025) [2025-01-04 13:52:02,790][134294] Updated weights for policy 0, policy_version 214814 (0.0025) [2025-01-04 13:52:03,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14062.7, 300 sec: 14773.3). Total num frames: 879890432. Throughput: 0: 3514.7. Samples: 209136806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:03,969][134211] Avg episode reward: [(0, '8.508')] [2025-01-04 13:52:05,756][134294] Updated weights for policy 0, policy_version 214824 (0.0026) [2025-01-04 13:52:08,642][134294] Updated weights for policy 0, policy_version 214834 (0.0025) [2025-01-04 13:52:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13995.3, 300 sec: 14773.4). Total num frames: 879960064. Throughput: 0: 3524.0. Samples: 209158152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:08,969][134211] Avg episode reward: [(0, '10.189')] [2025-01-04 13:52:11,570][134294] Updated weights for policy 0, policy_version 214844 (0.0024) [2025-01-04 13:52:13,968][134211] Fps is (10 sec: 14337.0, 60 sec: 14062.9, 300 sec: 14787.3). Total num frames: 880033792. Throughput: 0: 3511.8. Samples: 209178870. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:13,968][134211] Avg episode reward: [(0, '9.410')] [2025-01-04 13:52:14,633][134294] Updated weights for policy 0, policy_version 214854 (0.0024) [2025-01-04 13:52:17,528][134294] Updated weights for policy 0, policy_version 214864 (0.0026) [2025-01-04 13:52:18,967][134211] Fps is (10 sec: 14746.3, 60 sec: 14131.3, 300 sec: 14801.2). Total num frames: 880107520. Throughput: 0: 3508.6. Samples: 209189302. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:18,968][134211] Avg episode reward: [(0, '9.286')] [2025-01-04 13:52:19,616][134294] Updated weights for policy 0, policy_version 214874 (0.0012) [2025-01-04 13:52:22,073][134294] Updated weights for policy 0, policy_version 214884 (0.0022) [2025-01-04 13:52:23,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14336.0, 300 sec: 14773.4). Total num frames: 880189440. Throughput: 0: 3602.3. Samples: 209214972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:23,968][134211] Avg episode reward: [(0, '9.703')] [2025-01-04 13:52:24,920][134294] Updated weights for policy 0, policy_version 214894 (0.0024) [2025-01-04 13:52:27,870][134294] Updated weights for policy 0, policy_version 214904 (0.0026) [2025-01-04 13:52:28,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14404.6, 300 sec: 14773.4). Total num frames: 880263168. Throughput: 0: 3613.0. Samples: 209236168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:28,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 13:52:30,683][134294] Updated weights for policy 0, policy_version 214914 (0.0026) [2025-01-04 13:52:33,515][134294] Updated weights for policy 0, policy_version 214924 (0.0022) [2025-01-04 13:52:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.0, 300 sec: 14773.4). Total num frames: 880332800. Throughput: 0: 3610.9. Samples: 209246980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:33,968][134211] Avg episode reward: [(0, '9.914')] [2025-01-04 13:52:36,482][134294] Updated weights for policy 0, policy_version 214934 (0.0027) [2025-01-04 13:52:38,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14336.0, 300 sec: 14773.4). Total num frames: 880402432. Throughput: 0: 3609.4. Samples: 209268198. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:38,968][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 13:52:39,365][134294] Updated weights for policy 0, policy_version 214944 (0.0027) [2025-01-04 13:52:42,089][134294] Updated weights for policy 0, policy_version 214954 (0.0021) [2025-01-04 13:52:43,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14541.0, 300 sec: 14801.1). Total num frames: 880484352. Throughput: 0: 3682.8. Samples: 209292112. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:43,968][134211] Avg episode reward: [(0, '9.063')] [2025-01-04 13:52:44,087][134294] Updated weights for policy 0, policy_version 214964 (0.0014) [2025-01-04 13:52:46,842][134294] Updated weights for policy 0, policy_version 214974 (0.0025) [2025-01-04 13:52:48,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14609.0, 300 sec: 14815.0). Total num frames: 880558080. Throughput: 0: 3716.3. Samples: 209304036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:48,968][134211] Avg episode reward: [(0, '9.128')] [2025-01-04 13:52:49,929][134294] Updated weights for policy 0, policy_version 214984 (0.0028) [2025-01-04 13:52:52,806][134294] Updated weights for policy 0, policy_version 214994 (0.0027) [2025-01-04 13:52:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14540.8, 300 sec: 14745.6). Total num frames: 880627712. Throughput: 0: 3702.7. Samples: 209324774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:52:53,968][134211] Avg episode reward: [(0, '9.559')] [2025-01-04 13:52:55,738][134294] Updated weights for policy 0, policy_version 215004 (0.0023) [2025-01-04 13:52:58,549][134294] Updated weights for policy 0, policy_version 215014 (0.0026) [2025-01-04 13:52:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 880701440. Throughput: 0: 3720.0. Samples: 209346270. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:52:58,968][134211] Avg episode reward: [(0, '8.985')] [2025-01-04 13:53:01,476][134294] Updated weights for policy 0, policy_version 215024 (0.0024) [2025-01-04 13:53:03,969][134211] Fps is (10 sec: 14334.8, 60 sec: 14677.3, 300 sec: 14759.4). Total num frames: 880771072. Throughput: 0: 3722.0. Samples: 209356798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:03,969][134211] Avg episode reward: [(0, '10.028')] [2025-01-04 13:53:04,528][134294] Updated weights for policy 0, policy_version 215034 (0.0022) [2025-01-04 13:53:07,437][134294] Updated weights for policy 0, policy_version 215044 (0.0026) [2025-01-04 13:53:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.4, 300 sec: 14745.6). Total num frames: 880840704. Throughput: 0: 3607.1. Samples: 209377290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:08,968][134211] Avg episode reward: [(0, '10.431')] [2025-01-04 13:53:10,361][134294] Updated weights for policy 0, policy_version 215054 (0.0026) [2025-01-04 13:53:12,833][134294] Updated weights for policy 0, policy_version 215064 (0.0020) [2025-01-04 13:53:13,968][134211] Fps is (10 sec: 15156.7, 60 sec: 14813.9, 300 sec: 14787.3). Total num frames: 880922624. Throughput: 0: 3656.1. Samples: 209400692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:13,968][134211] Avg episode reward: [(0, '8.847')] [2025-01-04 13:53:13,987][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215070_880926720.pth... 
[2025-01-04 13:53:14,037][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000214213_877416448.pth [2025-01-04 13:53:14,961][134294] Updated weights for policy 0, policy_version 215074 (0.0017) [2025-01-04 13:53:17,767][134294] Updated weights for policy 0, policy_version 215084 (0.0026) [2025-01-04 13:53:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14882.1, 300 sec: 14801.1). Total num frames: 881000448. Throughput: 0: 3697.9. Samples: 209413384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:18,968][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 13:53:20,783][134294] Updated weights for policy 0, policy_version 215094 (0.0025) [2025-01-04 13:53:23,537][134294] Updated weights for policy 0, policy_version 215104 (0.0022) [2025-01-04 13:53:23,968][134211] Fps is (10 sec: 14744.3, 60 sec: 14677.2, 300 sec: 14801.1). Total num frames: 881070080. Throughput: 0: 3700.3. Samples: 209434716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:23,969][134211] Avg episode reward: [(0, '10.808')] [2025-01-04 13:53:26,500][134294] Updated weights for policy 0, policy_version 215114 (0.0025) [2025-01-04 13:53:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.0, 300 sec: 14801.1). Total num frames: 881139712. Throughput: 0: 3637.2. Samples: 209455784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:28,968][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 13:53:29,479][134294] Updated weights for policy 0, policy_version 215124 (0.0024) [2025-01-04 13:53:32,376][134294] Updated weights for policy 0, policy_version 215134 (0.0025) [2025-01-04 13:53:33,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14609.1, 300 sec: 14787.3). Total num frames: 881209344. Throughput: 0: 3600.3. Samples: 209466048. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:33,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 13:53:35,332][134294] Updated weights for policy 0, policy_version 215144 (0.0024) [2025-01-04 13:53:38,153][134294] Updated weights for policy 0, policy_version 215154 (0.0022) [2025-01-04 13:53:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14609.1, 300 sec: 14731.7). Total num frames: 881278976. Throughput: 0: 3618.0. Samples: 209487586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:38,968][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 13:53:41,037][134294] Updated weights for policy 0, policy_version 215164 (0.0024) [2025-01-04 13:53:43,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 881352704. Throughput: 0: 3608.0. Samples: 209508632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:43,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 13:53:43,971][134294] Updated weights for policy 0, policy_version 215174 (0.0027) [2025-01-04 13:53:46,842][134294] Updated weights for policy 0, policy_version 215184 (0.0022) [2025-01-04 13:53:48,787][134294] Updated weights for policy 0, policy_version 215194 (0.0012) [2025-01-04 13:53:48,967][134211] Fps is (10 sec: 15974.6, 60 sec: 14677.4, 300 sec: 14787.3). Total num frames: 881438720. Throughput: 0: 3606.6. Samples: 209519092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:48,968][134211] Avg episode reward: [(0, '10.016')] [2025-01-04 13:53:50,622][134294] Updated weights for policy 0, policy_version 215204 (0.0012) [2025-01-04 13:53:53,171][134294] Updated weights for policy 0, policy_version 215214 (0.0023) [2025-01-04 13:53:53,968][134211] Fps is (10 sec: 17203.3, 60 sec: 14950.4, 300 sec: 14842.8). Total num frames: 881524736. Throughput: 0: 3810.4. Samples: 209548760. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:53,968][134211] Avg episode reward: [(0, '10.412')] [2025-01-04 13:53:56,234][134294] Updated weights for policy 0, policy_version 215224 (0.0026) [2025-01-04 13:53:58,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14813.9, 300 sec: 14828.9). Total num frames: 881590272. Throughput: 0: 3743.0. Samples: 209569130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:53:58,968][134211] Avg episode reward: [(0, '10.062')] [2025-01-04 13:53:59,267][134294] Updated weights for policy 0, policy_version 215234 (0.0027) [2025-01-04 13:54:02,289][134294] Updated weights for policy 0, policy_version 215244 (0.0028) [2025-01-04 13:54:03,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14814.0, 300 sec: 14759.5). Total num frames: 881659904. Throughput: 0: 3687.8. Samples: 209579338. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:54:03,969][134211] Avg episode reward: [(0, '10.677')] [2025-01-04 13:54:05,213][134294] Updated weights for policy 0, policy_version 215254 (0.0022) [2025-01-04 13:54:08,091][134294] Updated weights for policy 0, policy_version 215264 (0.0025) [2025-01-04 13:54:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 881729536. Throughput: 0: 3681.0. Samples: 209600358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:08,968][134211] Avg episode reward: [(0, '10.078')] [2025-01-04 13:54:10,987][134294] Updated weights for policy 0, policy_version 215274 (0.0026) [2025-01-04 13:54:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.0, 300 sec: 14509.6). Total num frames: 881799168. Throughput: 0: 3676.8. Samples: 209621240. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:13,968][134211] Avg episode reward: [(0, '10.529')] [2025-01-04 13:54:13,979][134294] Updated weights for policy 0, policy_version 215284 (0.0022) [2025-01-04 13:54:16,911][134294] Updated weights for policy 0, policy_version 215294 (0.0024) [2025-01-04 13:54:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.5, 300 sec: 14523.4). Total num frames: 881868800. Throughput: 0: 3680.0. Samples: 209631648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:18,968][134211] Avg episode reward: [(0, '10.716')] [2025-01-04 13:54:20,098][134294] Updated weights for policy 0, policy_version 215304 (0.0023) [2025-01-04 13:54:22,128][134294] Updated weights for policy 0, policy_version 215314 (0.0013) [2025-01-04 13:54:23,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14882.3, 300 sec: 14620.7). Total num frames: 881963008. Throughput: 0: 3720.5. Samples: 209655010. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:23,968][134211] Avg episode reward: [(0, '9.703')] [2025-01-04 13:54:24,045][134294] Updated weights for policy 0, policy_version 215324 (0.0013) [2025-01-04 13:54:26,189][134294] Updated weights for policy 0, policy_version 215334 (0.0014) [2025-01-04 13:54:28,294][134294] Updated weights for policy 0, policy_version 215344 (0.0017) [2025-01-04 13:54:28,968][134211] Fps is (10 sec: 18841.8, 60 sec: 15291.7, 300 sec: 14704.0). Total num frames: 882057216. Throughput: 0: 3921.6. Samples: 209685102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:28,968][134211] Avg episode reward: [(0, '9.629')] [2025-01-04 13:54:31,957][134294] Updated weights for policy 0, policy_version 215354 (0.0031) [2025-01-04 13:54:33,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15018.7, 300 sec: 14648.4). Total num frames: 882110464. Throughput: 0: 3884.8. Samples: 209693910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:33,969][134211] Avg episode reward: [(0, '9.042')] [2025-01-04 13:54:35,728][134294] Updated weights for policy 0, policy_version 215364 (0.0031) [2025-01-04 13:54:38,968][134211] Fps is (10 sec: 10649.5, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 882163712. Throughput: 0: 3585.4. Samples: 209710104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:38,968][134211] Avg episode reward: [(0, '9.786')] [2025-01-04 13:54:39,711][134294] Updated weights for policy 0, policy_version 215374 (0.0030) [2025-01-04 13:54:42,311][134294] Updated weights for policy 0, policy_version 215384 (0.0017) [2025-01-04 13:54:43,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14882.2, 300 sec: 14634.5). Total num frames: 882245632. Throughput: 0: 3597.6. Samples: 209731022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:43,968][134211] Avg episode reward: [(0, '9.567')] [2025-01-04 13:54:44,231][134294] Updated weights for policy 0, policy_version 215394 (0.0013) [2025-01-04 13:54:46,139][134294] Updated weights for policy 0, policy_version 215404 (0.0012) [2025-01-04 13:54:47,988][134294] Updated weights for policy 0, policy_version 215414 (0.0013) [2025-01-04 13:54:48,968][134211] Fps is (10 sec: 18841.7, 60 sec: 15223.4, 300 sec: 14759.5). Total num frames: 882352128. Throughput: 0: 3732.8. Samples: 209747314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:48,968][134211] Avg episode reward: [(0, '8.431')] [2025-01-04 13:54:50,590][134294] Updated weights for policy 0, policy_version 215424 (0.0020) [2025-01-04 13:54:53,649][134294] Updated weights for policy 0, policy_version 215434 (0.0028) [2025-01-04 13:54:53,968][134211] Fps is (10 sec: 17202.6, 60 sec: 14882.1, 300 sec: 14731.7). Total num frames: 882417664. Throughput: 0: 3826.9. Samples: 209772570. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:53,969][134211] Avg episode reward: [(0, '9.952')] [2025-01-04 13:54:56,970][134294] Updated weights for policy 0, policy_version 215444 (0.0025) [2025-01-04 13:54:58,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14813.9, 300 sec: 14704.0). Total num frames: 882479104. Throughput: 0: 3782.7. Samples: 209791460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:54:58,968][134211] Avg episode reward: [(0, '8.884')] [2025-01-04 13:55:00,284][134294] Updated weights for policy 0, policy_version 215454 (0.0026) [2025-01-04 13:55:03,296][134294] Updated weights for policy 0, policy_version 215464 (0.0028) [2025-01-04 13:55:03,969][134211] Fps is (10 sec: 12695.7, 60 sec: 14745.2, 300 sec: 14690.0). Total num frames: 882544640. Throughput: 0: 3769.5. Samples: 209801280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:55:03,970][134211] Avg episode reward: [(0, '9.982')] [2025-01-04 13:55:06,413][134294] Updated weights for policy 0, policy_version 215474 (0.0022) [2025-01-04 13:55:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 882614272. Throughput: 0: 3693.1. Samples: 209821198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:55:08,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 13:55:09,541][134294] Updated weights for policy 0, policy_version 215484 (0.0025) [2025-01-04 13:55:12,683][134294] Updated weights for policy 0, policy_version 215494 (0.0026) [2025-01-04 13:55:13,968][134211] Fps is (10 sec: 13519.2, 60 sec: 14677.4, 300 sec: 14662.3). Total num frames: 882679808. Throughput: 0: 3458.8. Samples: 209840750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:13,968][134211] Avg episode reward: [(0, '10.049')] [2025-01-04 13:55:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215498_882679808.pth... 
[2025-01-04 13:55:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000214646_879190016.pth [2025-01-04 13:55:15,722][134294] Updated weights for policy 0, policy_version 215504 (0.0025) [2025-01-04 13:55:18,620][134294] Updated weights for policy 0, policy_version 215514 (0.0022) [2025-01-04 13:55:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14676.2). Total num frames: 882749440. Throughput: 0: 3491.5. Samples: 209851026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:18,968][134211] Avg episode reward: [(0, '9.028')] [2025-01-04 13:55:21,674][134294] Updated weights for policy 0, policy_version 215524 (0.0026) [2025-01-04 13:55:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.4, 300 sec: 14648.4). Total num frames: 882814976. Throughput: 0: 3586.3. Samples: 209871486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:23,969][134211] Avg episode reward: [(0, '8.901')] [2025-01-04 13:55:25,029][134294] Updated weights for policy 0, policy_version 215534 (0.0027) [2025-01-04 13:55:28,329][134294] Updated weights for policy 0, policy_version 215544 (0.0028) [2025-01-04 13:55:28,971][134211] Fps is (10 sec: 12693.7, 60 sec: 13652.6, 300 sec: 14620.5). Total num frames: 882876416. Throughput: 0: 3528.1. Samples: 209889800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:28,971][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 13:55:31,452][134294] Updated weights for policy 0, policy_version 215554 (0.0027) [2025-01-04 13:55:33,968][134211] Fps is (10 sec: 12287.6, 60 sec: 13789.8, 300 sec: 14592.8). Total num frames: 882937856. Throughput: 0: 3381.4. Samples: 209899478. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:33,969][134211] Avg episode reward: [(0, '10.349')] [2025-01-04 13:55:34,759][134294] Updated weights for policy 0, policy_version 215564 (0.0030) [2025-01-04 13:55:37,774][134294] Updated weights for policy 0, policy_version 215574 (0.0025) [2025-01-04 13:55:38,968][134211] Fps is (10 sec: 12701.5, 60 sec: 13994.7, 300 sec: 14551.2). Total num frames: 883003392. Throughput: 0: 3250.2. Samples: 209918828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:38,968][134211] Avg episode reward: [(0, '9.701')] [2025-01-04 13:55:40,843][134294] Updated weights for policy 0, policy_version 215584 (0.0026) [2025-01-04 13:55:43,725][134294] Updated weights for policy 0, policy_version 215594 (0.0025) [2025-01-04 13:55:43,968][134211] Fps is (10 sec: 13517.2, 60 sec: 13789.8, 300 sec: 14551.2). Total num frames: 883073024. Throughput: 0: 3293.8. Samples: 209939682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:43,968][134211] Avg episode reward: [(0, '8.736')] [2025-01-04 13:55:46,745][134294] Updated weights for policy 0, policy_version 215604 (0.0023) [2025-01-04 13:55:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13107.2, 300 sec: 14537.3). Total num frames: 883138560. Throughput: 0: 3300.7. Samples: 209949808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:48,968][134211] Avg episode reward: [(0, '9.117')] [2025-01-04 13:55:50,013][134294] Updated weights for policy 0, policy_version 215614 (0.0028) [2025-01-04 13:55:52,462][134294] Updated weights for policy 0, policy_version 215624 (0.0018) [2025-01-04 13:55:53,967][134211] Fps is (10 sec: 15155.6, 60 sec: 13448.6, 300 sec: 14523.4). Total num frames: 883224576. Throughput: 0: 3323.6. Samples: 209970758. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:53,968][134211] Avg episode reward: [(0, '9.690')] [2025-01-04 13:55:54,397][134294] Updated weights for policy 0, policy_version 215634 (0.0013) [2025-01-04 13:55:56,352][134294] Updated weights for policy 0, policy_version 215644 (0.0012) [2025-01-04 13:55:58,567][134294] Updated weights for policy 0, policy_version 215654 (0.0014) [2025-01-04 13:55:58,968][134211] Fps is (10 sec: 18432.4, 60 sec: 14063.0, 300 sec: 14495.7). Total num frames: 883322880. Throughput: 0: 3559.4. Samples: 210000924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:55:58,968][134211] Avg episode reward: [(0, '9.534')] [2025-01-04 13:56:00,662][134294] Updated weights for policy 0, policy_version 215664 (0.0014) [2025-01-04 13:56:02,849][134294] Updated weights for policy 0, policy_version 215674 (0.0019) [2025-01-04 13:56:03,968][134211] Fps is (10 sec: 18840.7, 60 sec: 14472.9, 300 sec: 14551.3). Total num frames: 883412992. Throughput: 0: 3662.6. Samples: 210015842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:56:03,969][134211] Avg episode reward: [(0, '9.252')] [2025-01-04 13:56:06,520][134294] Updated weights for policy 0, policy_version 215684 (0.0032) [2025-01-04 13:56:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14267.7, 300 sec: 14509.6). Total num frames: 883470336. Throughput: 0: 3645.1. Samples: 210035514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:56:08,968][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 13:56:09,932][134294] Updated weights for policy 0, policy_version 215694 (0.0027) [2025-01-04 13:56:13,248][134294] Updated weights for policy 0, policy_version 215704 (0.0028) [2025-01-04 13:56:13,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14199.3, 300 sec: 14481.8). Total num frames: 883531776. Throughput: 0: 3650.7. Samples: 210054070. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:56:13,969][134211] Avg episode reward: [(0, '10.040')] [2025-01-04 13:56:16,432][134294] Updated weights for policy 0, policy_version 215714 (0.0026) [2025-01-04 13:56:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.2, 300 sec: 14467.9). Total num frames: 883597312. Throughput: 0: 3649.7. Samples: 210063712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 13:56:18,969][134211] Avg episode reward: [(0, '9.326')] [2025-01-04 13:56:19,686][134294] Updated weights for policy 0, policy_version 215724 (0.0028) [2025-01-04 13:56:22,768][134294] Updated weights for policy 0, policy_version 215734 (0.0025) [2025-01-04 13:56:23,968][134211] Fps is (10 sec: 12698.1, 60 sec: 14062.9, 300 sec: 14440.2). Total num frames: 883658752. Throughput: 0: 3644.0. Samples: 210082806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:23,968][134211] Avg episode reward: [(0, '9.445')] [2025-01-04 13:56:25,794][134294] Updated weights for policy 0, policy_version 215744 (0.0025) [2025-01-04 13:56:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14131.9, 300 sec: 14412.4). Total num frames: 883724288. Throughput: 0: 3627.2. Samples: 210102906. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:28,969][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 13:56:29,014][134294] Updated weights for policy 0, policy_version 215754 (0.0027) [2025-01-04 13:56:32,092][134294] Updated weights for policy 0, policy_version 215764 (0.0027) [2025-01-04 13:56:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.8, 300 sec: 14412.4). Total num frames: 883793920. Throughput: 0: 3614.6. Samples: 210112464. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:33,968][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 13:56:35,048][134294] Updated weights for policy 0, policy_version 215774 (0.0028) [2025-01-04 13:56:38,365][134294] Updated weights for policy 0, policy_version 215784 (0.0028) [2025-01-04 13:56:38,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14199.4, 300 sec: 14384.6). Total num frames: 883855360. Throughput: 0: 3596.6. Samples: 210132606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:38,969][134211] Avg episode reward: [(0, '10.059')] [2025-01-04 13:56:41,536][134294] Updated weights for policy 0, policy_version 215794 (0.0028) [2025-01-04 13:56:43,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.2, 300 sec: 14370.7). Total num frames: 883920896. Throughput: 0: 3346.8. Samples: 210151530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:43,968][134211] Avg episode reward: [(0, '9.498')] [2025-01-04 13:56:44,658][134294] Updated weights for policy 0, policy_version 215804 (0.0028) [2025-01-04 13:56:48,178][134294] Updated weights for policy 0, policy_version 215814 (0.0029) [2025-01-04 13:56:48,968][134211] Fps is (10 sec: 12698.0, 60 sec: 14062.9, 300 sec: 14329.1). Total num frames: 883982336. Throughput: 0: 3224.5. Samples: 210160946. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:48,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 13:56:51,521][134294] Updated weights for policy 0, policy_version 215824 (0.0027) [2025-01-04 13:56:53,573][134294] Updated weights for policy 0, policy_version 215834 (0.0014) [2025-01-04 13:56:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.3, 300 sec: 14356.8). Total num frames: 884060160. Throughput: 0: 3211.1. Samples: 210180014. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:53,968][134211] Avg episode reward: [(0, '10.298')] [2025-01-04 13:56:56,499][134294] Updated weights for policy 0, policy_version 215844 (0.0026) [2025-01-04 13:56:58,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13380.2, 300 sec: 14356.9). Total num frames: 884125696. Throughput: 0: 3301.4. Samples: 210202632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:56:58,968][134211] Avg episode reward: [(0, '9.529')] [2025-01-04 13:56:59,795][134294] Updated weights for policy 0, policy_version 215854 (0.0026) [2025-01-04 13:57:03,032][134294] Updated weights for policy 0, policy_version 215864 (0.0029) [2025-01-04 13:57:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12902.4, 300 sec: 14329.1). Total num frames: 884187136. Throughput: 0: 3297.9. Samples: 210212116. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:57:03,969][134211] Avg episode reward: [(0, '11.816')] [2025-01-04 13:57:06,419][134294] Updated weights for policy 0, policy_version 215874 (0.0030) [2025-01-04 13:57:08,710][134294] Updated weights for policy 0, policy_version 215884 (0.0013) [2025-01-04 13:57:08,967][134211] Fps is (10 sec: 13926.8, 60 sec: 13243.8, 300 sec: 14343.0). Total num frames: 884264960. Throughput: 0: 3287.5. Samples: 210230744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:57:08,968][134211] Avg episode reward: [(0, '9.778')] [2025-01-04 13:57:10,772][134294] Updated weights for policy 0, policy_version 215894 (0.0014) [2025-01-04 13:57:12,910][134294] Updated weights for policy 0, policy_version 215904 (0.0015) [2025-01-04 13:57:13,968][134211] Fps is (10 sec: 16793.5, 60 sec: 13721.7, 300 sec: 14398.5). Total num frames: 884355072. Throughput: 0: 3490.8. Samples: 210259994. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:57:13,969][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 13:57:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215908_884359168.pth... [2025-01-04 13:57:14,140][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215070_880926720.pth [2025-01-04 13:57:16,285][134294] Updated weights for policy 0, policy_version 215914 (0.0029) [2025-01-04 13:57:18,968][134211] Fps is (10 sec: 14745.0, 60 sec: 13585.0, 300 sec: 14315.2). Total num frames: 884412416. Throughput: 0: 3475.3. Samples: 210268854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:57:18,969][134211] Avg episode reward: [(0, '10.788')] [2025-01-04 13:57:19,773][134294] Updated weights for policy 0, policy_version 215924 (0.0030) [2025-01-04 13:57:23,162][134294] Updated weights for policy 0, policy_version 215934 (0.0027) [2025-01-04 13:57:23,968][134211] Fps is (10 sec: 11878.5, 60 sec: 13585.0, 300 sec: 14273.5). Total num frames: 884473856. Throughput: 0: 3418.6. Samples: 210286442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 13:57:23,969][134211] Avg episode reward: [(0, '10.742')] [2025-01-04 13:57:26,223][134294] Updated weights for policy 0, policy_version 215944 (0.0025) [2025-01-04 13:57:28,968][134211] Fps is (10 sec: 12287.4, 60 sec: 13516.7, 300 sec: 14245.7). Total num frames: 884535296. Throughput: 0: 3423.8. Samples: 210305604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:28,969][134211] Avg episode reward: [(0, '10.363')] [2025-01-04 13:57:29,657][134294] Updated weights for policy 0, policy_version 215954 (0.0029) [2025-01-04 13:57:33,109][134294] Updated weights for policy 0, policy_version 215964 (0.0025) [2025-01-04 13:57:33,968][134211] Fps is (10 sec: 12697.9, 60 sec: 13448.5, 300 sec: 14231.9). Total num frames: 884600832. 
Throughput: 0: 3419.0. Samples: 210314800. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:33,968][134211] Avg episode reward: [(0, '10.321')] [2025-01-04 13:57:35,242][134294] Updated weights for policy 0, policy_version 215974 (0.0015) [2025-01-04 13:57:37,297][134294] Updated weights for policy 0, policy_version 215984 (0.0013) [2025-01-04 13:57:38,968][134211] Fps is (10 sec: 16794.8, 60 sec: 14131.3, 300 sec: 14301.3). Total num frames: 884703232. Throughput: 0: 3561.9. Samples: 210340300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:38,968][134211] Avg episode reward: [(0, '10.042')] [2025-01-04 13:57:39,352][134294] Updated weights for policy 0, policy_version 215994 (0.0015) [2025-01-04 13:57:41,329][134294] Updated weights for policy 0, policy_version 216004 (0.0017) [2025-01-04 13:57:43,968][134211] Fps is (10 sec: 18431.4, 60 sec: 14404.2, 300 sec: 14329.1). Total num frames: 884785152. Throughput: 0: 3659.2. Samples: 210367296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:43,969][134211] Avg episode reward: [(0, '10.296')] [2025-01-04 13:57:44,612][134294] Updated weights for policy 0, policy_version 216014 (0.0025) [2025-01-04 13:57:48,176][134294] Updated weights for policy 0, policy_version 216024 (0.0034) [2025-01-04 13:57:48,968][134211] Fps is (10 sec: 13925.3, 60 sec: 14335.8, 300 sec: 14287.4). Total num frames: 884842496. Throughput: 0: 3639.5. Samples: 210375898. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:48,969][134211] Avg episode reward: [(0, '9.192')] [2025-01-04 13:57:51,522][134294] Updated weights for policy 0, policy_version 216034 (0.0028) [2025-01-04 13:57:53,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14062.9, 300 sec: 14245.7). Total num frames: 884903936. Throughput: 0: 3627.2. Samples: 210393970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:53,969][134211] Avg episode reward: [(0, '11.075')] [2025-01-04 13:57:54,897][134294] Updated weights for policy 0, policy_version 216044 (0.0029) [2025-01-04 13:57:58,680][134294] Updated weights for policy 0, policy_version 216054 (0.0030) [2025-01-04 13:57:58,968][134211] Fps is (10 sec: 11469.5, 60 sec: 13858.1, 300 sec: 14190.2). Total num frames: 884957184. Throughput: 0: 3360.6. Samples: 210411222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:57:58,969][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 13:58:03,177][134294] Updated weights for policy 0, policy_version 216064 (0.0039) [2025-01-04 13:58:03,969][134211] Fps is (10 sec: 9829.4, 60 sec: 13584.8, 300 sec: 14106.8). Total num frames: 885002240. Throughput: 0: 3316.8. Samples: 210418114. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:03,970][134211] Avg episode reward: [(0, '10.022')] [2025-01-04 13:58:05,693][134294] Updated weights for policy 0, policy_version 216074 (0.0019) [2025-01-04 13:58:07,720][134294] Updated weights for policy 0, policy_version 216084 (0.0014) [2025-01-04 13:58:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13858.0, 300 sec: 14148.5). Total num frames: 885096448. Throughput: 0: 3421.0. Samples: 210440386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:08,969][134211] Avg episode reward: [(0, '8.271')] [2025-01-04 13:58:10,825][134294] Updated weights for policy 0, policy_version 216094 (0.0028) [2025-01-04 13:58:13,968][134211] Fps is (10 sec: 15566.8, 60 sec: 13380.3, 300 sec: 14093.0). Total num frames: 885157888. Throughput: 0: 3437.5. Samples: 210460288. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:13,969][134211] Avg episode reward: [(0, '9.674')] [2025-01-04 13:58:14,298][134294] Updated weights for policy 0, policy_version 216104 (0.0030) [2025-01-04 13:58:17,329][134294] Updated weights for policy 0, policy_version 216114 (0.0029) [2025-01-04 13:58:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13516.8, 300 sec: 14079.2). Total num frames: 885223424. Throughput: 0: 3447.0. Samples: 210469916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:18,969][134211] Avg episode reward: [(0, '10.355')] [2025-01-04 13:58:20,471][134294] Updated weights for policy 0, policy_version 216124 (0.0025) [2025-01-04 13:58:23,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13448.5, 300 sec: 14037.5). Total num frames: 885280768. Throughput: 0: 3304.6. Samples: 210489008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:23,969][134211] Avg episode reward: [(0, '9.628')] [2025-01-04 13:58:24,313][134294] Updated weights for policy 0, policy_version 216134 (0.0032) [2025-01-04 13:58:28,185][134294] Updated weights for policy 0, policy_version 216144 (0.0033) [2025-01-04 13:58:28,968][134211] Fps is (10 sec: 11059.5, 60 sec: 13312.2, 300 sec: 13982.0). Total num frames: 885334016. Throughput: 0: 3046.6. Samples: 210504392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:28,968][134211] Avg episode reward: [(0, '10.190')] [2025-01-04 13:58:30,580][134294] Updated weights for policy 0, policy_version 216154 (0.0013) [2025-01-04 13:58:32,536][134294] Updated weights for policy 0, policy_version 216164 (0.0015) [2025-01-04 13:58:33,968][134211] Fps is (10 sec: 15155.7, 60 sec: 13858.1, 300 sec: 14079.1). Total num frames: 885432320. Throughput: 0: 3159.8. Samples: 210518086. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:33,968][134211] Avg episode reward: [(0, '10.373')] [2025-01-04 13:58:34,686][134294] Updated weights for policy 0, policy_version 216174 (0.0011) [2025-01-04 13:58:36,685][134294] Updated weights for policy 0, policy_version 216184 (0.0014) [2025-01-04 13:58:38,968][134211] Fps is (10 sec: 19249.6, 60 sec: 13721.4, 300 sec: 14148.5). Total num frames: 885526528. Throughput: 0: 3426.0. Samples: 210548142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:38,969][134211] Avg episode reward: [(0, '9.804')] [2025-01-04 13:58:39,385][134294] Updated weights for policy 0, policy_version 216194 (0.0021) [2025-01-04 13:58:43,321][134294] Updated weights for policy 0, policy_version 216204 (0.0034) [2025-01-04 13:58:43,969][134211] Fps is (10 sec: 14334.5, 60 sec: 13175.3, 300 sec: 14023.5). Total num frames: 885575680. Throughput: 0: 3432.7. Samples: 210565698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:43,970][134211] Avg episode reward: [(0, '9.928')] [2025-01-04 13:58:47,113][134294] Updated weights for policy 0, policy_version 216214 (0.0035) [2025-01-04 13:58:48,968][134211] Fps is (10 sec: 10650.3, 60 sec: 13175.6, 300 sec: 13926.4). Total num frames: 885633024. Throughput: 0: 3456.6. Samples: 210573656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:48,968][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 13:58:50,818][134294] Updated weights for policy 0, policy_version 216224 (0.0035) [2025-01-04 13:58:53,968][134211] Fps is (10 sec: 11469.7, 60 sec: 13107.2, 300 sec: 13898.6). Total num frames: 885690368. Throughput: 0: 3345.8. Samples: 210590946. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:53,970][134211] Avg episode reward: [(0, '9.467')] [2025-01-04 13:58:54,137][134294] Updated weights for policy 0, policy_version 216234 (0.0027) [2025-01-04 13:58:57,845][134294] Updated weights for policy 0, policy_version 216244 (0.0030) [2025-01-04 13:58:58,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13175.5, 300 sec: 13857.0). Total num frames: 885747712. Throughput: 0: 3285.7. Samples: 210608144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:58:58,969][134211] Avg episode reward: [(0, '10.261')] [2025-01-04 13:59:01,384][134294] Updated weights for policy 0, policy_version 216254 (0.0028) [2025-01-04 13:59:03,587][134294] Updated weights for policy 0, policy_version 216264 (0.0014) [2025-01-04 13:59:03,968][134211] Fps is (10 sec: 13107.6, 60 sec: 13653.7, 300 sec: 13870.9). Total num frames: 885821440. Throughput: 0: 3258.5. Samples: 210616546. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:03,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 13:59:05,651][134294] Updated weights for policy 0, policy_version 216274 (0.0015) [2025-01-04 13:59:07,807][134294] Updated weights for policy 0, policy_version 216284 (0.0014) [2025-01-04 13:59:08,970][134211] Fps is (10 sec: 16380.8, 60 sec: 13584.6, 300 sec: 13940.2). Total num frames: 885911552. Throughput: 0: 3467.8. Samples: 210645066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:08,971][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 13:59:11,815][134294] Updated weights for policy 0, policy_version 216294 (0.0033) [2025-01-04 13:59:13,968][134211] Fps is (10 sec: 13516.3, 60 sec: 13312.0, 300 sec: 13857.0). Total num frames: 885956608. Throughput: 0: 3486.0. Samples: 210661264. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:13,969][134211] Avg episode reward: [(0, '8.366')] [2025-01-04 13:59:14,004][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000216299_885960704.pth... [2025-01-04 13:59:14,114][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215498_882679808.pth [2025-01-04 13:59:16,465][134294] Updated weights for policy 0, policy_version 216304 (0.0036) [2025-01-04 13:59:18,968][134211] Fps is (10 sec: 10242.4, 60 sec: 13175.5, 300 sec: 13732.0). Total num frames: 886013952. Throughput: 0: 3329.2. Samples: 210667900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:18,968][134211] Avg episode reward: [(0, '9.399')] [2025-01-04 13:59:19,189][134294] Updated weights for policy 0, policy_version 216314 (0.0016) [2025-01-04 13:59:21,668][134294] Updated weights for policy 0, policy_version 216324 (0.0018) [2025-01-04 13:59:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13585.1, 300 sec: 13690.3). Total num frames: 886095872. Throughput: 0: 3178.9. Samples: 210691190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:23,969][134211] Avg episode reward: [(0, '9.729')] [2025-01-04 13:59:24,712][134294] Updated weights for policy 0, policy_version 216334 (0.0022) [2025-01-04 13:59:27,589][134294] Updated weights for policy 0, policy_version 216344 (0.0021) [2025-01-04 13:59:28,968][134211] Fps is (10 sec: 14744.9, 60 sec: 13789.8, 300 sec: 13732.0). Total num frames: 886161408. Throughput: 0: 3237.5. Samples: 210711384. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:28,969][134211] Avg episode reward: [(0, '9.759')] [2025-01-04 13:59:31,374][134294] Updated weights for policy 0, policy_version 216354 (0.0031) [2025-01-04 13:59:33,969][134211] Fps is (10 sec: 11058.4, 60 sec: 12902.2, 300 sec: 13704.2). Total num frames: 886206464. 
Throughput: 0: 3230.6. Samples: 210719038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:33,970][134211] Avg episode reward: [(0, '9.063')] [2025-01-04 13:59:35,507][134294] Updated weights for policy 0, policy_version 216364 (0.0034) [2025-01-04 13:59:37,627][134294] Updated weights for policy 0, policy_version 216374 (0.0013) [2025-01-04 13:59:38,968][134211] Fps is (10 sec: 12698.1, 60 sec: 12697.8, 300 sec: 13704.2). Total num frames: 886288384. Throughput: 0: 3271.4. Samples: 210738160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:38,968][134211] Avg episode reward: [(0, '10.823')] [2025-01-04 13:59:39,834][134294] Updated weights for policy 0, policy_version 216384 (0.0013) [2025-01-04 13:59:43,191][134294] Updated weights for policy 0, policy_version 216394 (0.0028) [2025-01-04 13:59:43,968][134211] Fps is (10 sec: 14746.8, 60 sec: 12970.8, 300 sec: 13565.4). Total num frames: 886353920. Throughput: 0: 3386.2. Samples: 210760522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 13:59:43,969][134211] Avg episode reward: [(0, '10.105')] [2025-01-04 13:59:47,631][134294] Updated weights for policy 0, policy_version 216404 (0.0039) [2025-01-04 13:59:48,968][134211] Fps is (10 sec: 11468.5, 60 sec: 12834.1, 300 sec: 13509.9). Total num frames: 886403072. Throughput: 0: 3356.1. Samples: 210767572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:59:48,969][134211] Avg episode reward: [(0, '10.506')] [2025-01-04 13:59:51,148][134294] Updated weights for policy 0, policy_version 216414 (0.0028) [2025-01-04 13:59:53,673][134294] Updated weights for policy 0, policy_version 216424 (0.0020) [2025-01-04 13:59:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 13107.2, 300 sec: 13551.5). Total num frames: 886476800. Throughput: 0: 3131.7. Samples: 210785984. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:59:53,968][134211] Avg episode reward: [(0, '10.111')] [2025-01-04 13:59:56,622][134294] Updated weights for policy 0, policy_version 216434 (0.0026) [2025-01-04 13:59:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13243.8, 300 sec: 13551.6). Total num frames: 886542336. Throughput: 0: 3226.0. Samples: 210806432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 13:59:58,968][134211] Avg episode reward: [(0, '9.771')] [2025-01-04 13:59:59,938][134294] Updated weights for policy 0, policy_version 216444 (0.0026) [2025-01-04 14:00:02,888][134294] Updated weights for policy 0, policy_version 216454 (0.0025) [2025-01-04 14:00:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 13537.6). Total num frames: 886607872. Throughput: 0: 3296.0. Samples: 210816220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:03,968][134211] Avg episode reward: [(0, '9.735')] [2025-01-04 14:00:06,006][134294] Updated weights for policy 0, policy_version 216464 (0.0026) [2025-01-04 14:00:08,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12698.1, 300 sec: 13537.6). Total num frames: 886673408. Throughput: 0: 3224.5. Samples: 210836292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:08,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 14:00:09,075][134294] Updated weights for policy 0, policy_version 216474 (0.0026) [2025-01-04 14:00:12,024][134294] Updated weights for policy 0, policy_version 216484 (0.0027) [2025-01-04 14:00:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13107.2, 300 sec: 13537.6). Total num frames: 886743040. Throughput: 0: 3221.8. Samples: 210856364. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:13,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 14:00:15,097][134294] Updated weights for policy 0, policy_version 216494 (0.0027) [2025-01-04 14:00:18,110][134294] Updated weights for policy 0, policy_version 216504 (0.0026) [2025-01-04 14:00:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13243.7, 300 sec: 13537.6). Total num frames: 886808576. Throughput: 0: 3284.6. Samples: 210866840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:18,968][134211] Avg episode reward: [(0, '10.029')] [2025-01-04 14:00:21,317][134294] Updated weights for policy 0, policy_version 216514 (0.0028) [2025-01-04 14:00:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 12970.7, 300 sec: 13551.7). Total num frames: 886874112. Throughput: 0: 3290.8. Samples: 210886246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:23,968][134211] Avg episode reward: [(0, '9.294')] [2025-01-04 14:00:24,470][134294] Updated weights for policy 0, policy_version 216524 (0.0028) [2025-01-04 14:00:27,507][134294] Updated weights for policy 0, policy_version 216534 (0.0028) [2025-01-04 14:00:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 12970.7, 300 sec: 13565.4). Total num frames: 886939648. Throughput: 0: 3237.7. Samples: 210906216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:28,968][134211] Avg episode reward: [(0, '9.612')] [2025-01-04 14:00:30,469][134294] Updated weights for policy 0, policy_version 216544 (0.0023) [2025-01-04 14:00:33,411][134294] Updated weights for policy 0, policy_version 216554 (0.0027) [2025-01-04 14:00:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13448.7, 300 sec: 13593.2). Total num frames: 887013376. Throughput: 0: 3315.8. Samples: 210916782. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:33,968][134211] Avg episode reward: [(0, '8.219')] [2025-01-04 14:00:36,342][134294] Updated weights for policy 0, policy_version 216564 (0.0025) [2025-01-04 14:00:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13175.4, 300 sec: 13579.3). Total num frames: 887078912. Throughput: 0: 3370.9. Samples: 210937676. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:38,968][134211] Avg episode reward: [(0, '9.783')] [2025-01-04 14:00:39,336][134294] Updated weights for policy 0, policy_version 216574 (0.0028) [2025-01-04 14:00:42,360][134294] Updated weights for policy 0, policy_version 216584 (0.0023) [2025-01-04 14:00:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13243.8, 300 sec: 13593.2). Total num frames: 887148544. Throughput: 0: 3368.4. Samples: 210958008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:43,968][134211] Avg episode reward: [(0, '10.745')] [2025-01-04 14:00:45,322][134294] Updated weights for policy 0, policy_version 216594 (0.0025) [2025-01-04 14:00:48,211][134294] Updated weights for policy 0, policy_version 216604 (0.0026) [2025-01-04 14:00:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13585.1, 300 sec: 13537.6). Total num frames: 887218176. Throughput: 0: 3384.8. Samples: 210968534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:00:48,968][134211] Avg episode reward: [(0, '10.431')] [2025-01-04 14:00:51,138][134294] Updated weights for policy 0, policy_version 216614 (0.0022) [2025-01-04 14:00:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.8, 300 sec: 13440.4). Total num frames: 887287808. Throughput: 0: 3400.6. Samples: 210989318. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:00:53,968][134211] Avg episode reward: [(0, '10.031')] [2025-01-04 14:00:54,210][134294] Updated weights for policy 0, policy_version 216624 (0.0024) [2025-01-04 14:00:57,179][134294] Updated weights for policy 0, policy_version 216634 (0.0025) [2025-01-04 14:00:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 13357.1). Total num frames: 887353344. Throughput: 0: 3406.4. Samples: 211009652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:00:58,968][134211] Avg episode reward: [(0, '10.611')] [2025-01-04 14:01:00,170][134294] Updated weights for policy 0, policy_version 216644 (0.0026) [2025-01-04 14:01:03,152][134294] Updated weights for policy 0, policy_version 216654 (0.0025) [2025-01-04 14:01:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.0, 300 sec: 13398.8). Total num frames: 887422976. Throughput: 0: 3409.2. Samples: 211020256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:03,969][134211] Avg episode reward: [(0, '10.714')] [2025-01-04 14:01:06,044][134294] Updated weights for policy 0, policy_version 216664 (0.0028) [2025-01-04 14:01:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13653.3, 300 sec: 13426.6). Total num frames: 887492608. Throughput: 0: 3436.6. Samples: 211040894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:08,968][134211] Avg episode reward: [(0, '9.212')] [2025-01-04 14:01:09,141][134294] Updated weights for policy 0, policy_version 216674 (0.0026) [2025-01-04 14:01:12,039][134294] Updated weights for policy 0, policy_version 216684 (0.0025) [2025-01-04 14:01:13,971][134211] Fps is (10 sec: 13922.4, 60 sec: 13652.7, 300 sec: 13440.3). Total num frames: 887562240. Throughput: 0: 3447.7. Samples: 211061372. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:13,971][134211] Avg episode reward: [(0, '10.132')] [2025-01-04 14:01:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000216690_887562240.pth... [2025-01-04 14:01:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000215908_884359168.pth [2025-01-04 14:01:15,098][134294] Updated weights for policy 0, policy_version 216694 (0.0026) [2025-01-04 14:01:17,354][134294] Updated weights for policy 0, policy_version 216704 (0.0017) [2025-01-04 14:01:18,968][134211] Fps is (10 sec: 15564.7, 60 sec: 13994.7, 300 sec: 13523.7). Total num frames: 887648256. Throughput: 0: 3443.7. Samples: 211071748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:18,968][134211] Avg episode reward: [(0, '10.001')] [2025-01-04 14:01:19,602][134294] Updated weights for policy 0, policy_version 216714 (0.0019) [2025-01-04 14:01:22,553][134294] Updated weights for policy 0, policy_version 216724 (0.0026) [2025-01-04 14:01:23,968][134211] Fps is (10 sec: 15569.4, 60 sec: 14062.9, 300 sec: 13537.6). Total num frames: 887717888. Throughput: 0: 3549.3. Samples: 211097396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:23,968][134211] Avg episode reward: [(0, '10.520')] [2025-01-04 14:01:25,469][134294] Updated weights for policy 0, policy_version 216734 (0.0025) [2025-01-04 14:01:28,418][134294] Updated weights for policy 0, policy_version 216744 (0.0030) [2025-01-04 14:01:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14131.2, 300 sec: 13537.6). Total num frames: 887787520. Throughput: 0: 3562.2. Samples: 211118308. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:28,968][134211] Avg episode reward: [(0, '10.611')] [2025-01-04 14:01:31,358][134294] Updated weights for policy 0, policy_version 216754 (0.0025) [2025-01-04 14:01:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14063.0, 300 sec: 13565.4). Total num frames: 887857152. Throughput: 0: 3558.7. Samples: 211128674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:33,968][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 14:01:34,465][134294] Updated weights for policy 0, policy_version 216764 (0.0027) [2025-01-04 14:01:37,403][134294] Updated weights for policy 0, policy_version 216774 (0.0023) [2025-01-04 14:01:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14062.9, 300 sec: 13565.4). Total num frames: 887922688. Throughput: 0: 3546.8. Samples: 211148924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:38,969][134211] Avg episode reward: [(0, '9.210')] [2025-01-04 14:01:40,362][134294] Updated weights for policy 0, policy_version 216784 (0.0027) [2025-01-04 14:01:43,302][134294] Updated weights for policy 0, policy_version 216794 (0.0025) [2025-01-04 14:01:43,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14131.0, 300 sec: 13607.0). Total num frames: 887996416. Throughput: 0: 3561.9. Samples: 211169942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:43,969][134211] Avg episode reward: [(0, '10.130')] [2025-01-04 14:01:46,275][134294] Updated weights for policy 0, policy_version 216804 (0.0024) [2025-01-04 14:01:48,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14131.2, 300 sec: 13579.3). Total num frames: 888066048. Throughput: 0: 3557.8. Samples: 211180356. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:48,968][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 14:01:49,289][134294] Updated weights for policy 0, policy_version 216814 (0.0025) [2025-01-04 14:01:52,212][134294] Updated weights for policy 0, policy_version 216824 (0.0022) [2025-01-04 14:01:53,968][134211] Fps is (10 sec: 13518.1, 60 sec: 14062.9, 300 sec: 13579.3). Total num frames: 888131584. Throughput: 0: 3550.2. Samples: 211200654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:53,968][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 14:01:55,236][134294] Updated weights for policy 0, policy_version 216834 (0.0023) [2025-01-04 14:01:58,119][134294] Updated weights for policy 0, policy_version 216844 (0.0026) [2025-01-04 14:01:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14131.2, 300 sec: 13607.1). Total num frames: 888201216. Throughput: 0: 3561.2. Samples: 211221614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-04 14:01:58,968][134211] Avg episode reward: [(0, '9.586')] [2025-01-04 14:02:01,066][134294] Updated weights for policy 0, policy_version 216854 (0.0025) [2025-01-04 14:02:03,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14131.1, 300 sec: 13579.2). Total num frames: 888270848. Throughput: 0: 3562.9. Samples: 211232080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:03,969][134211] Avg episode reward: [(0, '10.427')] [2025-01-04 14:02:04,193][134294] Updated weights for policy 0, policy_version 216864 (0.0025) [2025-01-04 14:02:06,474][134294] Updated weights for policy 0, policy_version 216874 (0.0015) [2025-01-04 14:02:08,385][134294] Updated weights for policy 0, policy_version 216884 (0.0013) [2025-01-04 14:02:08,968][134211] Fps is (10 sec: 16793.7, 60 sec: 14609.1, 300 sec: 13607.1). Total num frames: 888369152. Throughput: 0: 3525.7. Samples: 211256050. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:08,968][134211] Avg episode reward: [(0, '9.496')] [2025-01-04 14:02:10,270][134294] Updated weights for policy 0, policy_version 216894 (0.0013) [2025-01-04 14:02:12,121][134294] Updated weights for policy 0, policy_version 216904 (0.0015) [2025-01-04 14:02:13,968][134211] Fps is (10 sec: 20480.9, 60 sec: 15224.2, 300 sec: 13773.7). Total num frames: 888475648. Throughput: 0: 3784.5. Samples: 211288608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:13,968][134211] Avg episode reward: [(0, '9.230')] [2025-01-04 14:02:14,008][134294] Updated weights for policy 0, policy_version 216914 (0.0012) [2025-01-04 14:02:15,887][134294] Updated weights for policy 0, policy_version 216924 (0.0013) [2025-01-04 14:02:18,586][134294] Updated weights for policy 0, policy_version 216934 (0.0021) [2025-01-04 14:02:18,968][134211] Fps is (10 sec: 19660.4, 60 sec: 15291.7, 300 sec: 13870.9). Total num frames: 888565760. Throughput: 0: 3915.6. Samples: 211304878. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:18,968][134211] Avg episode reward: [(0, '9.368')] [2025-01-04 14:02:21,869][134294] Updated weights for policy 0, policy_version 216944 (0.0029) [2025-01-04 14:02:23,968][134211] Fps is (10 sec: 15154.9, 60 sec: 15155.2, 300 sec: 13870.9). Total num frames: 888627200. Throughput: 0: 3909.3. Samples: 211324842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:23,969][134211] Avg episode reward: [(0, '9.739')] [2025-01-04 14:02:25,116][134294] Updated weights for policy 0, policy_version 216954 (0.0027) [2025-01-04 14:02:28,224][134294] Updated weights for policy 0, policy_version 216964 (0.0025) [2025-01-04 14:02:28,968][134211] Fps is (10 sec: 12697.6, 60 sec: 15086.9, 300 sec: 13870.9). Total num frames: 888692736. Throughput: 0: 3872.4. Samples: 211344198. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:28,968][134211] Avg episode reward: [(0, '9.104')] [2025-01-04 14:02:31,261][134294] Updated weights for policy 0, policy_version 216974 (0.0024) [2025-01-04 14:02:33,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15018.7, 300 sec: 13745.9). Total num frames: 888758272. Throughput: 0: 3868.5. Samples: 211354436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:33,968][134211] Avg episode reward: [(0, '10.177')] [2025-01-04 14:02:34,411][134294] Updated weights for policy 0, policy_version 216984 (0.0027) [2025-01-04 14:02:37,315][134294] Updated weights for policy 0, policy_version 216994 (0.0026) [2025-01-04 14:02:38,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15018.7, 300 sec: 13690.4). Total num frames: 888823808. Throughput: 0: 3861.0. Samples: 211374400. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:38,968][134211] Avg episode reward: [(0, '9.920')] [2025-01-04 14:02:40,376][134294] Updated weights for policy 0, policy_version 217004 (0.0026) [2025-01-04 14:02:43,315][134294] Updated weights for policy 0, policy_version 217014 (0.0028) [2025-01-04 14:02:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15018.9, 300 sec: 13745.9). Total num frames: 888897536. Throughput: 0: 3854.4. Samples: 211395064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:43,968][134211] Avg episode reward: [(0, '9.685')] [2025-01-04 14:02:46,296][134294] Updated weights for policy 0, policy_version 217024 (0.0022) [2025-01-04 14:02:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 13759.8). Total num frames: 888963072. Throughput: 0: 3851.9. Samples: 211405416. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:48,969][134211] Avg episode reward: [(0, '9.324')] [2025-01-04 14:02:49,371][134294] Updated weights for policy 0, policy_version 217034 (0.0027) [2025-01-04 14:02:52,359][134294] Updated weights for policy 0, policy_version 217044 (0.0026) [2025-01-04 14:02:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15018.7, 300 sec: 13815.3). Total num frames: 889032704. Throughput: 0: 3771.9. Samples: 211425786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:53,968][134211] Avg episode reward: [(0, '9.868')] [2025-01-04 14:02:55,296][134294] Updated weights for policy 0, policy_version 217054 (0.0026) [2025-01-04 14:02:58,234][134294] Updated weights for policy 0, policy_version 217064 (0.0028) [2025-01-04 14:02:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 13898.7). Total num frames: 889102336. Throughput: 0: 3512.7. Samples: 211446682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:02:58,968][134211] Avg episode reward: [(0, '10.110')] [2025-01-04 14:03:01,246][134294] Updated weights for policy 0, policy_version 217074 (0.0025) [2025-01-04 14:03:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.8, 300 sec: 13815.3). Total num frames: 889171968. Throughput: 0: 3379.8. Samples: 211456970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:03,968][134211] Avg episode reward: [(0, '9.550')] [2025-01-04 14:03:04,289][134294] Updated weights for policy 0, policy_version 217084 (0.0025) [2025-01-04 14:03:07,207][134294] Updated weights for policy 0, policy_version 217094 (0.0025) [2025-01-04 14:03:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14472.4, 300 sec: 13829.2). Total num frames: 889237504. Throughput: 0: 3388.5. Samples: 211477326. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:08,969][134211] Avg episode reward: [(0, '9.786')] [2025-01-04 14:03:10,163][134294] Updated weights for policy 0, policy_version 217104 (0.0023) [2025-01-04 14:03:13,089][134294] Updated weights for policy 0, policy_version 217114 (0.0025) [2025-01-04 14:03:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 13843.1). Total num frames: 889307136. Throughput: 0: 3420.9. Samples: 211498140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:13,968][134211] Avg episode reward: [(0, '9.744')] [2025-01-04 14:03:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217116_889307136.pth... [2025-01-04 14:03:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000216299_885960704.pth [2025-01-04 14:03:16,154][134294] Updated weights for policy 0, policy_version 217124 (0.0024) [2025-01-04 14:03:18,968][134211] Fps is (10 sec: 13926.9, 60 sec: 13516.8, 300 sec: 13884.8). Total num frames: 889376768. Throughput: 0: 3419.0. Samples: 211508290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:18,968][134211] Avg episode reward: [(0, '9.619')] [2025-01-04 14:03:18,984][134294] Updated weights for policy 0, policy_version 217134 (0.0026) [2025-01-04 14:03:22,126][134294] Updated weights for policy 0, policy_version 217144 (0.0024) [2025-01-04 14:03:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13653.3, 300 sec: 13940.3). Total num frames: 889446400. Throughput: 0: 3433.2. Samples: 211528894. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:23,968][134211] Avg episode reward: [(0, '9.774')] [2025-01-04 14:03:25,078][134294] Updated weights for policy 0, policy_version 217154 (0.0025) [2025-01-04 14:03:27,959][134294] Updated weights for policy 0, policy_version 217164 (0.0025) [2025-01-04 14:03:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13721.6, 300 sec: 13843.1). Total num frames: 889516032. Throughput: 0: 3439.0. Samples: 211549818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:28,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 14:03:30,905][134294] Updated weights for policy 0, policy_version 217174 (0.0022) [2025-01-04 14:03:33,799][134294] Updated weights for policy 0, policy_version 217184 (0.0023) [2025-01-04 14:03:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13789.8, 300 sec: 13759.8). Total num frames: 889585664. Throughput: 0: 3440.8. Samples: 211560252. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:33,969][134211] Avg episode reward: [(0, '9.000')] [2025-01-04 14:03:36,817][134294] Updated weights for policy 0, policy_version 217194 (0.0021) [2025-01-04 14:03:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.1, 300 sec: 13829.3). Total num frames: 889655296. Throughput: 0: 3454.1. Samples: 211581220. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:38,968][134211] Avg episode reward: [(0, '10.754')] [2025-01-04 14:03:39,829][134294] Updated weights for policy 0, policy_version 217204 (0.0026) [2025-01-04 14:03:42,761][134294] Updated weights for policy 0, policy_version 217214 (0.0022) [2025-01-04 14:03:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13721.6, 300 sec: 13857.0). Total num frames: 889720832. Throughput: 0: 3443.9. Samples: 211601658. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:43,968][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 14:03:45,731][134294] Updated weights for policy 0, policy_version 217224 (0.0026) [2025-01-04 14:03:48,671][134294] Updated weights for policy 0, policy_version 217234 (0.0024) [2025-01-04 14:03:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13858.2, 300 sec: 13912.5). Total num frames: 889794560. Throughput: 0: 3445.9. Samples: 211612036. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:48,968][134211] Avg episode reward: [(0, '9.210')] [2025-01-04 14:03:51,657][134294] Updated weights for policy 0, policy_version 217244 (0.0023) [2025-01-04 14:03:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13789.9, 300 sec: 13940.3). Total num frames: 889860096. Throughput: 0: 3457.3. Samples: 211632902. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:53,968][134211] Avg episode reward: [(0, '10.003')] [2025-01-04 14:03:54,731][134294] Updated weights for policy 0, policy_version 217254 (0.0026) [2025-01-04 14:03:57,614][134294] Updated weights for policy 0, policy_version 217264 (0.0021) [2025-01-04 14:03:58,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13994.7, 300 sec: 13968.1). Total num frames: 889942016. Throughput: 0: 3479.3. Samples: 211654708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:03:58,968][134211] Avg episode reward: [(0, '9.397')] [2025-01-04 14:03:59,526][134294] Updated weights for policy 0, policy_version 217274 (0.0015) [2025-01-04 14:04:02,323][134294] Updated weights for policy 0, policy_version 217284 (0.0022) [2025-01-04 14:04:03,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14062.9, 300 sec: 13912.6). Total num frames: 890015744. Throughput: 0: 3549.7. Samples: 211668026. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:03,968][134211] Avg episode reward: [(0, '9.850')] [2025-01-04 14:04:05,378][134294] Updated weights for policy 0, policy_version 217294 (0.0026) [2025-01-04 14:04:08,291][134294] Updated weights for policy 0, policy_version 217304 (0.0030) [2025-01-04 14:04:08,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14131.3, 300 sec: 13995.8). Total num frames: 890085376. Throughput: 0: 3551.5. Samples: 211688710. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:08,968][134211] Avg episode reward: [(0, '9.753')] [2025-01-04 14:04:11,251][134294] Updated weights for policy 0, policy_version 217314 (0.0024) [2025-01-04 14:04:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14131.2, 300 sec: 14037.5). Total num frames: 890155008. Throughput: 0: 3542.0. Samples: 211709208. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:13,968][134211] Avg episode reward: [(0, '8.728')] [2025-01-04 14:04:14,320][134294] Updated weights for policy 0, policy_version 217324 (0.0025) [2025-01-04 14:04:17,223][134294] Updated weights for policy 0, policy_version 217334 (0.0025) [2025-01-04 14:04:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14063.0, 300 sec: 13982.0). Total num frames: 890220544. Throughput: 0: 3536.3. Samples: 211719386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:18,968][134211] Avg episode reward: [(0, '9.777')] [2025-01-04 14:04:20,217][134294] Updated weights for policy 0, policy_version 217344 (0.0024) [2025-01-04 14:04:22,701][134294] Updated weights for policy 0, policy_version 217354 (0.0019) [2025-01-04 14:04:23,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14335.9, 300 sec: 14051.4). Total num frames: 890306560. Throughput: 0: 3549.1. Samples: 211740930. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:23,969][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 14:04:24,899][134294] Updated weights for policy 0, policy_version 217364 (0.0016) [2025-01-04 14:04:27,796][134294] Updated weights for policy 0, policy_version 217374 (0.0027) [2025-01-04 14:04:28,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14336.0, 300 sec: 14134.7). Total num frames: 890376192. Throughput: 0: 3636.9. Samples: 211765320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:28,968][134211] Avg episode reward: [(0, '9.391')] [2025-01-04 14:04:30,850][134294] Updated weights for policy 0, policy_version 217384 (0.0026) [2025-01-04 14:04:33,754][134294] Updated weights for policy 0, policy_version 217394 (0.0025) [2025-01-04 14:04:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14336.0, 300 sec: 14093.0). Total num frames: 890445824. Throughput: 0: 3636.1. Samples: 211775660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:33,968][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 14:04:36,779][134294] Updated weights for policy 0, policy_version 217404 (0.0026) [2025-01-04 14:04:38,969][134211] Fps is (10 sec: 13924.0, 60 sec: 14335.6, 300 sec: 14106.8). Total num frames: 890515456. Throughput: 0: 3635.2. Samples: 211796494. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:38,970][134211] Avg episode reward: [(0, '9.759')] [2025-01-04 14:04:39,707][134294] Updated weights for policy 0, policy_version 217414 (0.0026) [2025-01-04 14:04:42,827][134294] Updated weights for policy 0, policy_version 217424 (0.0026) [2025-01-04 14:04:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 890580992. Throughput: 0: 3597.1. Samples: 211816576. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:43,968][134211] Avg episode reward: [(0, '9.905')] [2025-01-04 14:04:45,766][134294] Updated weights for policy 0, policy_version 217434 (0.0026) [2025-01-04 14:04:48,665][134294] Updated weights for policy 0, policy_version 217444 (0.0026) [2025-01-04 14:04:48,968][134211] Fps is (10 sec: 13928.9, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 890654720. Throughput: 0: 3534.5. Samples: 211827080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:48,968][134211] Avg episode reward: [(0, '10.228')] [2025-01-04 14:04:51,511][134294] Updated weights for policy 0, policy_version 217454 (0.0025) [2025-01-04 14:04:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 890720256. Throughput: 0: 3543.5. Samples: 211848170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:53,968][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 14:04:54,625][134294] Updated weights for policy 0, policy_version 217464 (0.0026) [2025-01-04 14:04:57,232][134294] Updated weights for policy 0, policy_version 217474 (0.0021) [2025-01-04 14:04:58,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14472.5, 300 sec: 14245.7). Total num frames: 890810368. Throughput: 0: 3611.4. Samples: 211871720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:04:58,968][134211] Avg episode reward: [(0, '9.453')] [2025-01-04 14:04:59,138][134294] Updated weights for policy 0, policy_version 217484 (0.0013) [2025-01-04 14:05:01,008][134294] Updated weights for policy 0, policy_version 217494 (0.0013) [2025-01-04 14:05:02,929][134294] Updated weights for policy 0, policy_version 217504 (0.0014) [2025-01-04 14:05:03,968][134211] Fps is (10 sec: 19660.1, 60 sec: 15018.5, 300 sec: 14384.6). Total num frames: 890916864. Throughput: 0: 3747.7. Samples: 211888036. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:05:03,969][134211] Avg episode reward: [(0, '10.057')] [2025-01-04 14:05:04,951][134294] Updated weights for policy 0, policy_version 217514 (0.0016) [2025-01-04 14:05:08,039][134294] Updated weights for policy 0, policy_version 217524 (0.0026) [2025-01-04 14:05:08,968][134211] Fps is (10 sec: 17613.0, 60 sec: 15018.6, 300 sec: 14384.6). Total num frames: 890986496. Throughput: 0: 3869.9. Samples: 211915072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:05:08,968][134211] Avg episode reward: [(0, '9.748')] [2025-01-04 14:05:11,208][134294] Updated weights for policy 0, policy_version 217534 (0.0027) [2025-01-04 14:05:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.3, 300 sec: 14384.6). Total num frames: 891052032. Throughput: 0: 3754.6. Samples: 211934276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:05:13,969][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 14:05:14,047][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217543_891056128.pth... [2025-01-04 14:05:14,123][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000216690_887562240.pth [2025-01-04 14:05:14,501][134294] Updated weights for policy 0, policy_version 217544 (0.0031) [2025-01-04 14:05:17,621][134294] Updated weights for policy 0, policy_version 217554 (0.0026) [2025-01-04 14:05:18,969][134211] Fps is (10 sec: 13106.0, 60 sec: 14950.1, 300 sec: 14384.6). Total num frames: 891117568. Throughput: 0: 3738.8. Samples: 211943910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:05:18,969][134211] Avg episode reward: [(0, '9.683')] [2025-01-04 14:05:20,540][134294] Updated weights for policy 0, policy_version 217564 (0.0027) [2025-01-04 14:05:23,946][134294] Updated weights for policy 0, policy_version 217574 (0.0026) [2025-01-04 14:05:23,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14609.1, 300 sec: 14384.6). Total num frames: 891183104. Throughput: 0: 3725.9. Samples: 211964154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:05:23,969][134211] Avg episode reward: [(0, '9.817')] [2025-01-04 14:05:27,435][134294] Updated weights for policy 0, policy_version 217584 (0.0029) [2025-01-04 14:05:28,968][134211] Fps is (10 sec: 12289.0, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 891240448. Throughput: 0: 3657.0. Samples: 211981142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:28,969][134211] Avg episode reward: [(0, '8.425')] [2025-01-04 14:05:30,936][134294] Updated weights for policy 0, policy_version 217594 (0.0030) [2025-01-04 14:05:33,968][134211] Fps is (10 sec: 11878.5, 60 sec: 14267.7, 300 sec: 14315.2). Total num frames: 891301888. Throughput: 0: 3626.9. Samples: 211990290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:33,968][134211] Avg episode reward: [(0, '9.545')] [2025-01-04 14:05:34,098][134294] Updated weights for policy 0, policy_version 217604 (0.0027) [2025-01-04 14:05:36,949][134294] Updated weights for policy 0, policy_version 217614 (0.0023) [2025-01-04 14:05:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14268.2, 300 sec: 14315.2). Total num frames: 891371520. Throughput: 0: 3613.9. Samples: 212010796. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:38,968][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 14:05:39,924][134294] Updated weights for policy 0, policy_version 217624 (0.0024) [2025-01-04 14:05:42,786][134294] Updated weights for policy 0, policy_version 217634 (0.0023) [2025-01-04 14:05:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14336.0, 300 sec: 14315.2). Total num frames: 891441152. Throughput: 0: 3551.7. Samples: 212031546. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:43,968][134211] Avg episode reward: [(0, '9.196')] [2025-01-04 14:05:45,737][134294] Updated weights for policy 0, policy_version 217644 (0.0024) [2025-01-04 14:05:48,541][134294] Updated weights for policy 0, policy_version 217654 (0.0026) [2025-01-04 14:05:48,969][134211] Fps is (10 sec: 14334.3, 60 sec: 14335.7, 300 sec: 14329.0). Total num frames: 891514880. Throughput: 0: 3427.9. Samples: 212042296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:48,970][134211] Avg episode reward: [(0, '10.248')] [2025-01-04 14:05:51,485][134294] Updated weights for policy 0, policy_version 217664 (0.0023) [2025-01-04 14:05:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.3, 300 sec: 14342.9). Total num frames: 891584512. Throughput: 0: 3301.0. Samples: 212063618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:53,968][134211] Avg episode reward: [(0, '9.435')] [2025-01-04 14:05:54,467][134294] Updated weights for policy 0, policy_version 217674 (0.0022) [2025-01-04 14:05:56,459][134294] Updated weights for policy 0, policy_version 217684 (0.0014) [2025-01-04 14:05:58,940][134294] Updated weights for policy 0, policy_version 217694 (0.0023) [2025-01-04 14:05:58,968][134211] Fps is (10 sec: 15976.5, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 891674624. Throughput: 0: 3439.7. Samples: 212089062. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:05:58,968][134211] Avg episode reward: [(0, '10.053')] [2025-01-04 14:06:01,844][134294] Updated weights for policy 0, policy_version 217704 (0.0026) [2025-01-04 14:06:03,969][134211] Fps is (10 sec: 15562.7, 60 sec: 13721.4, 300 sec: 14398.4). Total num frames: 891740160. Throughput: 0: 3461.9. Samples: 212099696. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:03,970][134211] Avg episode reward: [(0, '9.573')] [2025-01-04 14:06:04,914][134294] Updated weights for policy 0, policy_version 217714 (0.0029) [2025-01-04 14:06:07,944][134294] Updated weights for policy 0, policy_version 217724 (0.0026) [2025-01-04 14:06:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13721.6, 300 sec: 14398.6). Total num frames: 891809792. Throughput: 0: 3464.8. Samples: 212120070. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:08,968][134211] Avg episode reward: [(0, '11.131')] [2025-01-04 14:06:10,762][134294] Updated weights for policy 0, policy_version 217734 (0.0023) [2025-01-04 14:06:13,638][134294] Updated weights for policy 0, policy_version 217744 (0.0023) [2025-01-04 14:06:13,968][134211] Fps is (10 sec: 14338.1, 60 sec: 13858.2, 300 sec: 14356.8). Total num frames: 891883520. Throughput: 0: 3564.2. Samples: 212141530. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:13,968][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 14:06:16,479][134294] Updated weights for policy 0, policy_version 217754 (0.0024) [2025-01-04 14:06:18,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13926.6, 300 sec: 14356.8). Total num frames: 891953152. Throughput: 0: 3596.9. Samples: 212152152. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:18,968][134211] Avg episode reward: [(0, '9.647')] [2025-01-04 14:06:19,495][134294] Updated weights for policy 0, policy_version 217764 (0.0023) [2025-01-04 14:06:22,330][134294] Updated weights for policy 0, policy_version 217774 (0.0023) [2025-01-04 14:06:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 892022784. Throughput: 0: 3608.7. Samples: 212173186. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:23,969][134211] Avg episode reward: [(0, '11.011')] [2025-01-04 14:06:25,248][134294] Updated weights for policy 0, policy_version 217784 (0.0025) [2025-01-04 14:06:28,067][134294] Updated weights for policy 0, policy_version 217794 (0.0025) [2025-01-04 14:06:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14356.8). Total num frames: 892092416. Throughput: 0: 3623.6. Samples: 212194610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:06:28,968][134211] Avg episode reward: [(0, '9.025')] [2025-01-04 14:06:30,979][134294] Updated weights for policy 0, policy_version 217804 (0.0024) [2025-01-04 14:06:33,785][134294] Updated weights for policy 0, policy_version 217814 (0.0026) [2025-01-04 14:06:33,969][134211] Fps is (10 sec: 14334.3, 60 sec: 14404.0, 300 sec: 14384.5). Total num frames: 892166144. Throughput: 0: 3622.1. Samples: 212205292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:33,970][134211] Avg episode reward: [(0, '10.144')] [2025-01-04 14:06:36,620][134294] Updated weights for policy 0, policy_version 217824 (0.0023) [2025-01-04 14:06:38,970][134211] Fps is (10 sec: 14333.0, 60 sec: 14403.8, 300 sec: 14370.7). Total num frames: 892235776. Throughput: 0: 3625.2. Samples: 212226760. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:38,970][134211] Avg episode reward: [(0, '8.954')] [2025-01-04 14:06:39,654][134294] Updated weights for policy 0, policy_version 217834 (0.0023) [2025-01-04 14:06:42,601][134294] Updated weights for policy 0, policy_version 217844 (0.0025) [2025-01-04 14:06:43,968][134211] Fps is (10 sec: 13928.2, 60 sec: 14404.3, 300 sec: 14370.7). Total num frames: 892305408. Throughput: 0: 3519.4. Samples: 212247434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:43,968][134211] Avg episode reward: [(0, '9.244')] [2025-01-04 14:06:45,259][134294] Updated weights for policy 0, policy_version 217854 (0.0020) [2025-01-04 14:06:47,138][134294] Updated weights for policy 0, policy_version 217864 (0.0013) [2025-01-04 14:06:48,937][134294] Updated weights for policy 0, policy_version 217874 (0.0013) [2025-01-04 14:06:48,968][134211] Fps is (10 sec: 17616.8, 60 sec: 14950.7, 300 sec: 14509.6). Total num frames: 892411904. Throughput: 0: 3582.2. Samples: 212260890. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:48,968][134211] Avg episode reward: [(0, '9.555')] [2025-01-04 14:06:50,886][134294] Updated weights for policy 0, policy_version 217884 (0.0014) [2025-01-04 14:06:52,751][134294] Updated weights for policy 0, policy_version 217894 (0.0014) [2025-01-04 14:06:53,968][134211] Fps is (10 sec: 20889.4, 60 sec: 15496.5, 300 sec: 14620.6). Total num frames: 892514304. Throughput: 0: 3857.9. Samples: 212293678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:53,969][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 14:06:55,487][134294] Updated weights for policy 0, policy_version 217904 (0.0026) [2025-01-04 14:06:58,532][134294] Updated weights for policy 0, policy_version 217914 (0.0026) [2025-01-04 14:06:58,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15086.9, 300 sec: 14606.8). Total num frames: 892579840. Throughput: 0: 3878.6. Samples: 212316066. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:06:58,968][134211] Avg episode reward: [(0, '9.072')] [2025-01-04 14:07:01,622][134294] Updated weights for policy 0, policy_version 217924 (0.0028) [2025-01-04 14:07:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15087.3, 300 sec: 14495.7). Total num frames: 892645376. Throughput: 0: 3858.7. Samples: 212325796. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:03,968][134211] Avg episode reward: [(0, '8.709')] [2025-01-04 14:07:04,811][134294] Updated weights for policy 0, policy_version 217934 (0.0024) [2025-01-04 14:07:07,779][134294] Updated weights for policy 0, policy_version 217944 (0.0026) [2025-01-04 14:07:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.6, 300 sec: 14356.8). Total num frames: 892710912. Throughput: 0: 3834.3. Samples: 212345732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:08,971][134211] Avg episode reward: [(0, '10.545')] [2025-01-04 14:07:10,942][134294] Updated weights for policy 0, policy_version 217954 (0.0026) [2025-01-04 14:07:13,809][134294] Updated weights for policy 0, policy_version 217964 (0.0024) [2025-01-04 14:07:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14287.4). Total num frames: 892780544. Throughput: 0: 3812.0. Samples: 212366150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:13,968][134211] Avg episode reward: [(0, '9.087')] [2025-01-04 14:07:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217964_892780544.pth... [2025-01-04 14:07:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217116_889307136.pth [2025-01-04 14:07:16,803][134294] Updated weights for policy 0, policy_version 217974 (0.0026) [2025-01-04 14:07:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14315.2). Total num frames: 892850176. 
Throughput: 0: 3801.3. Samples: 212376346. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:18,968][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 14:07:19,765][134294] Updated weights for policy 0, policy_version 217984 (0.0026) [2025-01-04 14:07:22,818][134294] Updated weights for policy 0, policy_version 217994 (0.0022) [2025-01-04 14:07:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14329.1). Total num frames: 892919808. Throughput: 0: 3783.6. Samples: 212397016. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:23,968][134211] Avg episode reward: [(0, '9.325')] [2025-01-04 14:07:25,685][134294] Updated weights for policy 0, policy_version 218004 (0.0023) [2025-01-04 14:07:28,488][134294] Updated weights for policy 0, policy_version 218014 (0.0025) [2025-01-04 14:07:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14342.9). Total num frames: 892989440. Throughput: 0: 3802.1. Samples: 212418528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:28,968][134211] Avg episode reward: [(0, '9.333')] [2025-01-04 14:07:31,310][134294] Updated weights for policy 0, policy_version 218024 (0.0023) [2025-01-04 14:07:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.4, 300 sec: 14356.8). Total num frames: 893059072. Throughput: 0: 3738.7. Samples: 212429134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:33,968][134211] Avg episode reward: [(0, '8.885')] [2025-01-04 14:07:34,382][134294] Updated weights for policy 0, policy_version 218034 (0.0024) [2025-01-04 14:07:37,267][134294] Updated weights for policy 0, policy_version 218044 (0.0024) [2025-01-04 14:07:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.7, 300 sec: 14343.0). Total num frames: 893128704. Throughput: 0: 3473.8. Samples: 212450000. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:38,968][134211] Avg episode reward: [(0, '9.946')] [2025-01-04 14:07:40,178][134294] Updated weights for policy 0, policy_version 218054 (0.0023) [2025-01-04 14:07:43,009][134294] Updated weights for policy 0, policy_version 218064 (0.0026) [2025-01-04 14:07:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14950.3, 300 sec: 14370.7). Total num frames: 893202432. Throughput: 0: 3450.0. Samples: 212471316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:43,969][134211] Avg episode reward: [(0, '10.392')] [2025-01-04 14:07:45,855][134294] Updated weights for policy 0, policy_version 218074 (0.0024) [2025-01-04 14:07:48,740][134294] Updated weights for policy 0, policy_version 218084 (0.0025) [2025-01-04 14:07:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14336.0, 300 sec: 14370.7). Total num frames: 893272064. Throughput: 0: 3473.3. Samples: 212482096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:48,968][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 14:07:51,604][134294] Updated weights for policy 0, policy_version 218094 (0.0023) [2025-01-04 14:07:53,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13789.9, 300 sec: 14370.7). Total num frames: 893341696. Throughput: 0: 3501.4. Samples: 212503294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:53,968][134211] Avg episode reward: [(0, '10.572')] [2025-01-04 14:07:54,680][134294] Updated weights for policy 0, policy_version 218104 (0.0025) [2025-01-04 14:07:57,556][134294] Updated weights for policy 0, policy_version 218114 (0.0024) [2025-01-04 14:07:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13858.2, 300 sec: 14370.7). Total num frames: 893411328. Throughput: 0: 3507.6. Samples: 212523990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:07:58,968][134211] Avg episode reward: [(0, '9.999')] [2025-01-04 14:08:00,429][134294] Updated weights for policy 0, policy_version 218124 (0.0023) [2025-01-04 14:08:03,307][134294] Updated weights for policy 0, policy_version 218134 (0.0027) [2025-01-04 14:08:03,969][134211] Fps is (10 sec: 14334.9, 60 sec: 13994.5, 300 sec: 14398.5). Total num frames: 893485056. Throughput: 0: 3525.0. Samples: 212534972. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:03,969][134211] Avg episode reward: [(0, '9.053')] [2025-01-04 14:08:06,264][134294] Updated weights for policy 0, policy_version 218144 (0.0023) [2025-01-04 14:08:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13994.7, 300 sec: 14384.6). Total num frames: 893550592. Throughput: 0: 3528.8. Samples: 212555812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:08,968][134211] Avg episode reward: [(0, '9.242')] [2025-01-04 14:08:09,324][134294] Updated weights for policy 0, policy_version 218154 (0.0027) [2025-01-04 14:08:12,143][134294] Updated weights for policy 0, policy_version 218164 (0.0021) [2025-01-04 14:08:13,968][134211] Fps is (10 sec: 14747.1, 60 sec: 14199.5, 300 sec: 14426.3). Total num frames: 893632512. Throughput: 0: 3557.6. Samples: 212578618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:13,968][134211] Avg episode reward: [(0, '9.456')] [2025-01-04 14:08:14,153][134294] Updated weights for policy 0, policy_version 218174 (0.0015) [2025-01-04 14:08:16,296][134294] Updated weights for policy 0, policy_version 218184 (0.0014) [2025-01-04 14:08:18,225][134294] Updated weights for policy 0, policy_version 218194 (0.0013) [2025-01-04 14:08:18,967][134211] Fps is (10 sec: 18432.6, 60 sec: 14745.7, 300 sec: 14537.3). Total num frames: 893734912. Throughput: 0: 3645.4. Samples: 212593176. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:18,968][134211] Avg episode reward: [(0, '9.885')] [2025-01-04 14:08:20,164][134294] Updated weights for policy 0, policy_version 218204 (0.0013) [2025-01-04 14:08:22,026][134294] Updated weights for policy 0, policy_version 218214 (0.0012) [2025-01-04 14:08:23,968][134211] Fps is (10 sec: 20069.6, 60 sec: 15223.4, 300 sec: 14634.5). Total num frames: 893833216. Throughput: 0: 3892.2. Samples: 212625152. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:23,969][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 14:08:24,989][134294] Updated weights for policy 0, policy_version 218224 (0.0028) [2025-01-04 14:08:28,166][134294] Updated weights for policy 0, policy_version 218234 (0.0028) [2025-01-04 14:08:28,968][134211] Fps is (10 sec: 15974.1, 60 sec: 15086.9, 300 sec: 14606.8). Total num frames: 893894656. Throughput: 0: 3855.3. Samples: 212644804. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:28,968][134211] Avg episode reward: [(0, '9.791')] [2025-01-04 14:08:31,448][134294] Updated weights for policy 0, policy_version 218244 (0.0027) [2025-01-04 14:08:33,968][134211] Fps is (10 sec: 12698.0, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 893960192. Throughput: 0: 3828.0. Samples: 212654358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:33,968][134211] Avg episode reward: [(0, '9.691')] [2025-01-04 14:08:34,732][134294] Updated weights for policy 0, policy_version 218254 (0.0030) [2025-01-04 14:08:37,961][134294] Updated weights for policy 0, policy_version 218264 (0.0026) [2025-01-04 14:08:38,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14882.1, 300 sec: 14579.0). Total num frames: 894021632. Throughput: 0: 3772.0. Samples: 212673034. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:38,968][134211] Avg episode reward: [(0, '9.600')] [2025-01-04 14:08:41,093][134294] Updated weights for policy 0, policy_version 218274 (0.0028) [2025-01-04 14:08:43,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14745.7, 300 sec: 14551.2). Total num frames: 894087168. Throughput: 0: 3744.8. Samples: 212692504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:08:43,968][134211] Avg episode reward: [(0, '10.595')] [2025-01-04 14:08:44,214][134294] Updated weights for policy 0, policy_version 218284 (0.0027) [2025-01-04 14:08:47,154][134294] Updated weights for policy 0, policy_version 218294 (0.0025) [2025-01-04 14:08:48,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14677.2, 300 sec: 14551.2). Total num frames: 894152704. Throughput: 0: 3727.2. Samples: 212702692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:08:48,969][134211] Avg episode reward: [(0, '9.825')] [2025-01-04 14:08:50,220][134294] Updated weights for policy 0, policy_version 218304 (0.0022) [2025-01-04 14:08:53,150][134294] Updated weights for policy 0, policy_version 218314 (0.0024) [2025-01-04 14:08:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14677.3, 300 sec: 14509.6). Total num frames: 894222336. Throughput: 0: 3728.5. Samples: 212723596. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:08:53,968][134211] Avg episode reward: [(0, '9.465')] [2025-01-04 14:08:56,059][134294] Updated weights for policy 0, policy_version 218324 (0.0023) [2025-01-04 14:08:58,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 894291968. Throughput: 0: 3677.7. Samples: 212744116. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:08:58,968][134211] Avg episode reward: [(0, '10.048')] [2025-01-04 14:08:59,089][134294] Updated weights for policy 0, policy_version 218334 (0.0026) [2025-01-04 14:09:02,099][134294] Updated weights for policy 0, policy_version 218344 (0.0028) [2025-01-04 14:09:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14541.0, 300 sec: 14481.8). Total num frames: 894357504. Throughput: 0: 3582.4. Samples: 212754386. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:03,968][134211] Avg episode reward: [(0, '10.267')] [2025-01-04 14:09:05,089][134294] Updated weights for policy 0, policy_version 218354 (0.0026) [2025-01-04 14:09:08,021][134294] Updated weights for policy 0, policy_version 218364 (0.0023) [2025-01-04 14:09:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 894431232. Throughput: 0: 3329.2. Samples: 212774966. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:08,968][134211] Avg episode reward: [(0, '9.932')] [2025-01-04 14:09:10,943][134294] Updated weights for policy 0, policy_version 218374 (0.0024) [2025-01-04 14:09:13,804][134294] Updated weights for policy 0, policy_version 218384 (0.0026) [2025-01-04 14:09:13,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.5, 300 sec: 14509.6). Total num frames: 894500864. Throughput: 0: 3362.2. Samples: 212796102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:13,968][134211] Avg episode reward: [(0, '9.672')] [2025-01-04 14:09:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000218384_894500864.pth... 
[2025-01-04 14:09:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217543_891056128.pth [2025-01-04 14:09:16,882][134294] Updated weights for policy 0, policy_version 218394 (0.0025) [2025-01-04 14:09:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.1, 300 sec: 14440.2). Total num frames: 894566400. Throughput: 0: 3372.3. Samples: 212806110. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:18,968][134211] Avg episode reward: [(0, '8.986')] [2025-01-04 14:09:19,993][134294] Updated weights for policy 0, policy_version 218404 (0.0023) [2025-01-04 14:09:22,945][134294] Updated weights for policy 0, policy_version 218414 (0.0024) [2025-01-04 14:09:23,969][134211] Fps is (10 sec: 13515.3, 60 sec: 13380.1, 300 sec: 14440.1). Total num frames: 894636032. Throughput: 0: 3413.1. Samples: 212826628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:23,969][134211] Avg episode reward: [(0, '9.427')] [2025-01-04 14:09:25,887][134294] Updated weights for policy 0, policy_version 218424 (0.0027) [2025-01-04 14:09:28,798][134294] Updated weights for policy 0, policy_version 218434 (0.0025) [2025-01-04 14:09:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13516.8, 300 sec: 14440.1). Total num frames: 894705664. Throughput: 0: 3445.5. Samples: 212847550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:28,968][134211] Avg episode reward: [(0, '10.262')] [2025-01-04 14:09:31,680][134294] Updated weights for policy 0, policy_version 218444 (0.0025) [2025-01-04 14:09:33,969][134211] Fps is (10 sec: 13926.8, 60 sec: 13584.9, 300 sec: 14440.2). Total num frames: 894775296. Throughput: 0: 3448.7. Samples: 212857886. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:33,969][134211] Avg episode reward: [(0, '9.230')] [2025-01-04 14:09:34,867][134294] Updated weights for policy 0, policy_version 218454 (0.0026) [2025-01-04 14:09:37,831][134294] Updated weights for policy 0, policy_version 218464 (0.0025) [2025-01-04 14:09:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13653.3, 300 sec: 14440.1). Total num frames: 894840832. Throughput: 0: 3436.9. Samples: 212878254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:38,968][134211] Avg episode reward: [(0, '8.995')] [2025-01-04 14:09:40,754][134294] Updated weights for policy 0, policy_version 218474 (0.0022) [2025-01-04 14:09:42,767][134294] Updated weights for policy 0, policy_version 218484 (0.0013) [2025-01-04 14:09:43,968][134211] Fps is (10 sec: 15975.9, 60 sec: 14131.2, 300 sec: 14509.6). Total num frames: 894935040. Throughput: 0: 3534.8. Samples: 212903182. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:43,968][134211] Avg episode reward: [(0, '9.682')] [2025-01-04 14:09:44,628][134294] Updated weights for policy 0, policy_version 218494 (0.0015) [2025-01-04 14:09:46,536][134294] Updated weights for policy 0, policy_version 218504 (0.0015) [2025-01-04 14:09:48,426][134294] Updated weights for policy 0, policy_version 218514 (0.0014) [2025-01-04 14:09:48,968][134211] Fps is (10 sec: 20070.5, 60 sec: 14814.0, 300 sec: 14648.4). Total num frames: 895041536. Throughput: 0: 3667.4. Samples: 212919418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:48,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 14:09:50,811][134294] Updated weights for policy 0, policy_version 218524 (0.0019) [2025-01-04 14:09:53,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 895111168. Throughput: 0: 3806.6. Samples: 212946262. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:09:53,969][134211] Avg episode reward: [(0, '10.034')] [2025-01-04 14:09:54,026][134294] Updated weights for policy 0, policy_version 218534 (0.0028) [2025-01-04 14:09:57,234][134294] Updated weights for policy 0, policy_version 218544 (0.0025) [2025-01-04 14:09:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14440.2). Total num frames: 895176704. Throughput: 0: 3762.6. Samples: 212965420. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:09:58,968][134211] Avg episode reward: [(0, '10.621')] [2025-01-04 14:10:00,357][134294] Updated weights for policy 0, policy_version 218554 (0.0026) [2025-01-04 14:10:03,245][134294] Updated weights for policy 0, policy_version 218564 (0.0026) [2025-01-04 14:10:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 14426.2). Total num frames: 895242240. Throughput: 0: 3765.9. Samples: 212975576. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:03,968][134211] Avg episode reward: [(0, '8.949')] [2025-01-04 14:10:06,370][134294] Updated weights for policy 0, policy_version 218574 (0.0029) [2025-01-04 14:10:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14440.2). Total num frames: 895311872. Throughput: 0: 3755.1. Samples: 212995604. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:08,968][134211] Avg episode reward: [(0, '8.050')] [2025-01-04 14:10:09,438][134294] Updated weights for policy 0, policy_version 218584 (0.0024) [2025-01-04 14:10:12,500][134294] Updated weights for policy 0, policy_version 218594 (0.0026) [2025-01-04 14:10:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14440.2). Total num frames: 895377408. Throughput: 0: 3738.8. Samples: 213015794. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:13,969][134211] Avg episode reward: [(0, '9.969')] [2025-01-04 14:10:15,427][134294] Updated weights for policy 0, policy_version 218604 (0.0027) [2025-01-04 14:10:18,327][134294] Updated weights for policy 0, policy_version 218614 (0.0025) [2025-01-04 14:10:18,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14467.9). Total num frames: 895451136. Throughput: 0: 3743.9. Samples: 213026360. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:18,968][134211] Avg episode reward: [(0, '8.813')] [2025-01-04 14:10:21,265][134294] Updated weights for policy 0, policy_version 218624 (0.0023) [2025-01-04 14:10:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.6, 300 sec: 14495.7). Total num frames: 895516672. Throughput: 0: 3751.5. Samples: 213047072. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:23,968][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 14:10:24,395][134294] Updated weights for policy 0, policy_version 218634 (0.0026) [2025-01-04 14:10:27,249][134294] Updated weights for policy 0, policy_version 218644 (0.0025) [2025-01-04 14:10:28,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14523.4). Total num frames: 895586304. Throughput: 0: 3651.5. Samples: 213067498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:28,968][134211] Avg episode reward: [(0, '9.768')] [2025-01-04 14:10:30,304][134294] Updated weights for policy 0, policy_version 218654 (0.0025) [2025-01-04 14:10:33,248][134294] Updated weights for policy 0, policy_version 218664 (0.0024) [2025-01-04 14:10:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.5, 300 sec: 14523.4). Total num frames: 895655936. Throughput: 0: 3523.8. Samples: 213077988. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:33,968][134211] Avg episode reward: [(0, '9.658')] [2025-01-04 14:10:36,109][134294] Updated weights for policy 0, policy_version 218674 (0.0026) [2025-01-04 14:10:38,970][134211] Fps is (10 sec: 13923.5, 60 sec: 14745.1, 300 sec: 14523.3). Total num frames: 895725568. Throughput: 0: 3391.7. Samples: 213098894. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:38,970][134211] Avg episode reward: [(0, '9.707')] [2025-01-04 14:10:39,157][134294] Updated weights for policy 0, policy_version 218684 (0.0026) [2025-01-04 14:10:42,072][134294] Updated weights for policy 0, policy_version 218694 (0.0022) [2025-01-04 14:10:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14509.6). Total num frames: 895795200. Throughput: 0: 3419.3. Samples: 213119290. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:43,968][134211] Avg episode reward: [(0, '10.252')] [2025-01-04 14:10:45,144][134294] Updated weights for policy 0, policy_version 218704 (0.0023) [2025-01-04 14:10:48,006][134294] Updated weights for policy 0, policy_version 218714 (0.0023) [2025-01-04 14:10:48,968][134211] Fps is (10 sec: 13929.5, 60 sec: 13721.6, 300 sec: 14509.6). Total num frames: 895864832. Throughput: 0: 3429.3. Samples: 213129894. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:48,968][134211] Avg episode reward: [(0, '9.799')] [2025-01-04 14:10:50,962][134294] Updated weights for policy 0, policy_version 218724 (0.0024) [2025-01-04 14:10:53,892][134294] Updated weights for policy 0, policy_version 218734 (0.0027) [2025-01-04 14:10:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13721.6, 300 sec: 14440.1). Total num frames: 895934464. Throughput: 0: 3451.3. Samples: 213150914. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:53,968][134211] Avg episode reward: [(0, '8.720')] [2025-01-04 14:10:56,791][134294] Updated weights for policy 0, policy_version 218744 (0.0024) [2025-01-04 14:10:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13789.9, 300 sec: 14454.1). Total num frames: 896004096. Throughput: 0: 3465.2. Samples: 213171726. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:10:58,968][134211] Avg episode reward: [(0, '10.084')] [2025-01-04 14:10:59,860][134294] Updated weights for policy 0, policy_version 218754 (0.0025) [2025-01-04 14:11:02,765][134294] Updated weights for policy 0, policy_version 218764 (0.0024) [2025-01-04 14:11:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13789.9, 300 sec: 14440.1). Total num frames: 896069632. Throughput: 0: 3455.2. Samples: 213181844. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:11:03,968][134211] Avg episode reward: [(0, '10.901')] [2025-01-04 14:11:05,818][134294] Updated weights for policy 0, policy_version 218774 (0.0025) [2025-01-04 14:11:07,716][134294] Updated weights for policy 0, policy_version 218784 (0.0013) [2025-01-04 14:11:08,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14199.5, 300 sec: 14509.6). Total num frames: 896163840. Throughput: 0: 3515.1. Samples: 213205252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:08,968][134211] Avg episode reward: [(0, '9.800')] [2025-01-04 14:11:09,589][134294] Updated weights for policy 0, policy_version 218794 (0.0012) [2025-01-04 14:11:11,483][134294] Updated weights for policy 0, policy_version 218804 (0.0014) [2025-01-04 14:11:13,302][134294] Updated weights for policy 0, policy_version 218814 (0.0012) [2025-01-04 14:11:13,968][134211] Fps is (10 sec: 20480.0, 60 sec: 14950.4, 300 sec: 14648.4). Total num frames: 896274432. Throughput: 0: 3787.8. Samples: 213237948. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:13,968][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 14:11:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000218817_896274432.pth... [2025-01-04 14:11:14,036][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000217964_892780544.pth [2025-01-04 14:11:15,954][134294] Updated weights for policy 0, policy_version 218824 (0.0027) [2025-01-04 14:11:18,968][134211] Fps is (10 sec: 17612.4, 60 sec: 14813.8, 300 sec: 14634.5). Total num frames: 896339968. Throughput: 0: 3812.3. Samples: 213249540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:18,968][134211] Avg episode reward: [(0, '10.508')] [2025-01-04 14:11:19,122][134294] Updated weights for policy 0, policy_version 218834 (0.0024) [2025-01-04 14:11:22,210][134294] Updated weights for policy 0, policy_version 218844 (0.0027) [2025-01-04 14:11:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 14620.6). Total num frames: 896405504. Throughput: 0: 3784.6. Samples: 213269192. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:23,968][134211] Avg episode reward: [(0, '9.040')] [2025-01-04 14:11:25,345][134294] Updated weights for policy 0, policy_version 218854 (0.0030) [2025-01-04 14:11:28,358][134294] Updated weights for policy 0, policy_version 218864 (0.0025) [2025-01-04 14:11:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.9, 300 sec: 14606.8). Total num frames: 896475136. Throughput: 0: 3779.2. Samples: 213289354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:28,968][134211] Avg episode reward: [(0, '10.099')] [2025-01-04 14:11:31,343][134294] Updated weights for policy 0, policy_version 218874 (0.0026) [2025-01-04 14:11:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14593.0). Total num frames: 896540672. 
Throughput: 0: 3772.0. Samples: 213299636. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:33,968][134211] Avg episode reward: [(0, '9.853')] [2025-01-04 14:11:34,490][134294] Updated weights for policy 0, policy_version 218884 (0.0026) [2025-01-04 14:11:37,445][134294] Updated weights for policy 0, policy_version 218894 (0.0024) [2025-01-04 14:11:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14677.8, 300 sec: 14579.0). Total num frames: 896606208. Throughput: 0: 3750.9. Samples: 213319706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:38,968][134211] Avg episode reward: [(0, '10.190')] [2025-01-04 14:11:40,460][134294] Updated weights for policy 0, policy_version 218904 (0.0025) [2025-01-04 14:11:43,345][134294] Updated weights for policy 0, policy_version 218914 (0.0024) [2025-01-04 14:11:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.3, 300 sec: 14454.0). Total num frames: 896675840. Throughput: 0: 3751.7. Samples: 213340554. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:43,968][134211] Avg episode reward: [(0, '11.777')] [2025-01-04 14:11:46,318][134294] Updated weights for policy 0, policy_version 218924 (0.0027) [2025-01-04 14:11:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.3, 300 sec: 14342.9). Total num frames: 896745472. Throughput: 0: 3754.8. Samples: 213350808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:48,968][134211] Avg episode reward: [(0, '9.945')] [2025-01-04 14:11:49,379][134294] Updated weights for policy 0, policy_version 218934 (0.0024) [2025-01-04 14:11:52,387][134294] Updated weights for policy 0, policy_version 218944 (0.0025) [2025-01-04 14:11:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.0, 300 sec: 14342.9). Total num frames: 896811008. Throughput: 0: 3686.2. Samples: 213371134. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:53,968][134211] Avg episode reward: [(0, '9.229')] [2025-01-04 14:11:55,310][134294] Updated weights for policy 0, policy_version 218954 (0.0023) [2025-01-04 14:11:58,221][134294] Updated weights for policy 0, policy_version 218964 (0.0024) [2025-01-04 14:11:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14370.7). Total num frames: 896884736. Throughput: 0: 3426.2. Samples: 213392126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:11:58,968][134211] Avg episode reward: [(0, '9.262')] [2025-01-04 14:12:01,113][134294] Updated weights for policy 0, policy_version 218974 (0.0025) [2025-01-04 14:12:03,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14745.6, 300 sec: 14384.6). Total num frames: 896954368. Throughput: 0: 3402.3. Samples: 213402644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:03,969][134211] Avg episode reward: [(0, '9.688')] [2025-01-04 14:12:04,258][134294] Updated weights for policy 0, policy_version 218984 (0.0025) [2025-01-04 14:12:07,328][134294] Updated weights for policy 0, policy_version 218994 (0.0023) [2025-01-04 14:12:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14370.7). Total num frames: 897019904. Throughput: 0: 3409.6. Samples: 213422624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:08,968][134211] Avg episode reward: [(0, '10.034')] [2025-01-04 14:12:10,313][134294] Updated weights for policy 0, policy_version 219004 (0.0022) [2025-01-04 14:12:13,201][134294] Updated weights for policy 0, policy_version 219014 (0.0026) [2025-01-04 14:12:13,969][134211] Fps is (10 sec: 13515.2, 60 sec: 13584.8, 300 sec: 14370.6). Total num frames: 897089536. Throughput: 0: 3422.7. Samples: 213443378. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:13,970][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 14:12:16,185][134294] Updated weights for policy 0, policy_version 219024 (0.0024) [2025-01-04 14:12:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13653.3, 300 sec: 14370.7). Total num frames: 897159168. Throughput: 0: 3425.8. Samples: 213453796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:18,968][134211] Avg episode reward: [(0, '8.150')] [2025-01-04 14:12:19,268][134294] Updated weights for policy 0, policy_version 219034 (0.0028) [2025-01-04 14:12:21,922][134294] Updated weights for policy 0, policy_version 219044 (0.0018) [2025-01-04 14:12:23,785][134294] Updated weights for policy 0, policy_version 219054 (0.0013) [2025-01-04 14:12:23,968][134211] Fps is (10 sec: 15567.1, 60 sec: 13994.7, 300 sec: 14426.3). Total num frames: 897245184. Throughput: 0: 3468.5. Samples: 213475790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:23,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 14:12:25,726][134294] Updated weights for policy 0, policy_version 219064 (0.0013) [2025-01-04 14:12:27,555][134294] Updated weights for policy 0, policy_version 219074 (0.0013) [2025-01-04 14:12:28,967][134211] Fps is (10 sec: 19661.2, 60 sec: 14677.4, 300 sec: 14565.1). Total num frames: 897355776. Throughput: 0: 3726.8. Samples: 213508260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:28,968][134211] Avg episode reward: [(0, '8.691')] [2025-01-04 14:12:29,517][134294] Updated weights for policy 0, policy_version 219084 (0.0017) [2025-01-04 14:12:32,540][134294] Updated weights for policy 0, policy_version 219094 (0.0026) [2025-01-04 14:12:33,968][134211] Fps is (10 sec: 18022.1, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 897425408. Throughput: 0: 3784.4. Samples: 213521106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:33,968][134211] Avg episode reward: [(0, '9.384')] [2025-01-04 14:12:35,622][134294] Updated weights for policy 0, policy_version 219104 (0.0028) [2025-01-04 14:12:38,651][134294] Updated weights for policy 0, policy_version 219114 (0.0026) [2025-01-04 14:12:38,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 897490944. Throughput: 0: 3775.1. Samples: 213541014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:38,968][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 14:12:41,757][134294] Updated weights for policy 0, policy_version 219124 (0.0028) [2025-01-04 14:12:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 897560576. Throughput: 0: 3751.2. Samples: 213560930. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:43,968][134211] Avg episode reward: [(0, '9.659')] [2025-01-04 14:12:44,807][134294] Updated weights for policy 0, policy_version 219134 (0.0027) [2025-01-04 14:12:47,844][134294] Updated weights for policy 0, policy_version 219144 (0.0023) [2025-01-04 14:12:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14677.3, 300 sec: 14523.4). Total num frames: 897626112. Throughput: 0: 3737.2. Samples: 213570816. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:48,968][134211] Avg episode reward: [(0, '9.498')] [2025-01-04 14:12:50,869][134294] Updated weights for policy 0, policy_version 219154 (0.0028) [2025-01-04 14:12:53,857][134294] Updated weights for policy 0, policy_version 219164 (0.0025) [2025-01-04 14:12:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14523.4). Total num frames: 897695744. Throughput: 0: 3753.9. Samples: 213591552. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:53,968][134211] Avg episode reward: [(0, '9.213')] [2025-01-04 14:12:56,783][134294] Updated weights for policy 0, policy_version 219174 (0.0024) [2025-01-04 14:12:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14509.6). Total num frames: 897765376. Throughput: 0: 3747.8. Samples: 213612024. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:12:58,968][134211] Avg episode reward: [(0, '8.701')] [2025-01-04 14:12:59,781][134294] Updated weights for policy 0, policy_version 219184 (0.0024) [2025-01-04 14:13:02,790][134294] Updated weights for policy 0, policy_version 219194 (0.0024) [2025-01-04 14:13:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.1, 300 sec: 14509.6). Total num frames: 897830912. Throughput: 0: 3742.5. Samples: 213622210. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:13:03,968][134211] Avg episode reward: [(0, '10.336')] [2025-01-04 14:13:05,819][134294] Updated weights for policy 0, policy_version 219204 (0.0025) [2025-01-04 14:13:08,771][134294] Updated weights for policy 0, policy_version 219214 (0.0023) [2025-01-04 14:13:08,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14677.2, 300 sec: 14467.9). Total num frames: 897900544. Throughput: 0: 3714.3. Samples: 213642938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:13:08,969][134211] Avg episode reward: [(0, '9.398')] [2025-01-04 14:13:11,856][134294] Updated weights for policy 0, policy_version 219224 (0.0027) [2025-01-04 14:13:13,968][134211] Fps is (10 sec: 13515.7, 60 sec: 14609.2, 300 sec: 14342.9). Total num frames: 897966080. Throughput: 0: 3437.4. Samples: 213662948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:13:13,969][134211] Avg episode reward: [(0, '8.742')] [2025-01-04 14:13:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000219231_897970176.pth... 
[2025-01-04 14:13:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000218384_894500864.pth [2025-01-04 14:13:14,960][134294] Updated weights for policy 0, policy_version 219234 (0.0025) [2025-01-04 14:13:17,924][134294] Updated weights for policy 0, policy_version 219244 (0.0023) [2025-01-04 14:13:18,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14609.1, 300 sec: 14245.8). Total num frames: 898035712. Throughput: 0: 3373.0. Samples: 213672892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:13:18,968][134211] Avg episode reward: [(0, '9.349')] [2025-01-04 14:13:20,953][134294] Updated weights for policy 0, policy_version 219254 (0.0024) [2025-01-04 14:13:23,750][134294] Updated weights for policy 0, policy_version 219264 (0.0024) [2025-01-04 14:13:23,968][134211] Fps is (10 sec: 13927.5, 60 sec: 14336.0, 300 sec: 14273.5). Total num frames: 898105344. Throughput: 0: 3396.0. Samples: 213693834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:23,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 14:13:26,795][134294] Updated weights for policy 0, policy_version 219274 (0.0028) [2025-01-04 14:13:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13653.3, 300 sec: 14287.4). Total num frames: 898174976. Throughput: 0: 3411.8. Samples: 213714460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:28,968][134211] Avg episode reward: [(0, '9.197')] [2025-01-04 14:13:29,808][134294] Updated weights for policy 0, policy_version 219284 (0.0024) [2025-01-04 14:13:32,775][134294] Updated weights for policy 0, policy_version 219294 (0.0024) [2025-01-04 14:13:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 14301.3). Total num frames: 898240512. Throughput: 0: 3419.0. Samples: 213724672. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:33,968][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 14:13:35,702][134294] Updated weights for policy 0, policy_version 219304 (0.0027) [2025-01-04 14:13:38,672][134294] Updated weights for policy 0, policy_version 219314 (0.0024) [2025-01-04 14:13:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13721.6, 300 sec: 14329.1). Total num frames: 898314240. Throughput: 0: 3425.1. Samples: 213745682. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:38,971][134211] Avg episode reward: [(0, '9.348')] [2025-01-04 14:13:41,663][134294] Updated weights for policy 0, policy_version 219324 (0.0025) [2025-01-04 14:13:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.1, 300 sec: 14315.2). Total num frames: 898375680. Throughput: 0: 3406.1. Samples: 213765296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:43,968][134211] Avg episode reward: [(0, '8.494')] [2025-01-04 14:13:44,850][134294] Updated weights for policy 0, policy_version 219334 (0.0024) [2025-01-04 14:13:46,978][134294] Updated weights for policy 0, policy_version 219344 (0.0015) [2025-01-04 14:13:48,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14063.0, 300 sec: 14398.5). Total num frames: 898469888. Throughput: 0: 3455.3. Samples: 213777700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:48,968][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 14:13:49,010][134294] Updated weights for policy 0, policy_version 219354 (0.0015) [2025-01-04 14:13:50,991][134294] Updated weights for policy 0, policy_version 219364 (0.0015) [2025-01-04 14:13:52,873][134294] Updated weights for policy 0, policy_version 219374 (0.0013) [2025-01-04 14:13:53,968][134211] Fps is (10 sec: 20070.7, 60 sec: 14677.4, 300 sec: 14523.4). Total num frames: 898576384. Throughput: 0: 3677.1. Samples: 213808404. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:53,968][134211] Avg episode reward: [(0, '8.793')] [2025-01-04 14:13:54,761][134294] Updated weights for policy 0, policy_version 219384 (0.0013) [2025-01-04 14:13:57,540][134294] Updated weights for policy 0, policy_version 219394 (0.0023) [2025-01-04 14:13:58,968][134211] Fps is (10 sec: 18431.6, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 898654208. Throughput: 0: 3817.3. Samples: 213834724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:13:58,968][134211] Avg episode reward: [(0, '9.132')] [2025-01-04 14:14:00,721][134294] Updated weights for policy 0, policy_version 219404 (0.0029) [2025-01-04 14:14:03,804][134294] Updated weights for policy 0, policy_version 219414 (0.0025) [2025-01-04 14:14:03,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14813.8, 300 sec: 14537.3). Total num frames: 898719744. Throughput: 0: 3816.2. Samples: 213844622. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:03,969][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 14:14:06,666][134294] Updated weights for policy 0, policy_version 219424 (0.0024) [2025-01-04 14:14:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14814.0, 300 sec: 14537.3). Total num frames: 898789376. Throughput: 0: 3804.6. Samples: 213865044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:08,969][134211] Avg episode reward: [(0, '9.833')] [2025-01-04 14:14:09,746][134294] Updated weights for policy 0, policy_version 219434 (0.0025) [2025-01-04 14:14:12,783][134294] Updated weights for policy 0, policy_version 219444 (0.0021) [2025-01-04 14:14:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.3, 300 sec: 14551.2). Total num frames: 898859008. Throughput: 0: 3794.5. Samples: 213885212. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:13,968][134211] Avg episode reward: [(0, '9.782')] [2025-01-04 14:14:15,612][134294] Updated weights for policy 0, policy_version 219454 (0.0023) [2025-01-04 14:14:18,530][134294] Updated weights for policy 0, policy_version 219464 (0.0024) [2025-01-04 14:14:18,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14882.2, 300 sec: 14551.3). Total num frames: 898928640. Throughput: 0: 3806.0. Samples: 213895944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:18,968][134211] Avg episode reward: [(0, '9.431')] [2025-01-04 14:14:21,391][134294] Updated weights for policy 0, policy_version 219474 (0.0025) [2025-01-04 14:14:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.1, 300 sec: 14551.2). Total num frames: 898998272. Throughput: 0: 3811.2. Samples: 213917188. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:23,968][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 14:14:24,379][134294] Updated weights for policy 0, policy_version 219484 (0.0028) [2025-01-04 14:14:27,312][134294] Updated weights for policy 0, policy_version 219494 (0.0025) [2025-01-04 14:14:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14551.3). Total num frames: 899067904. Throughput: 0: 3836.9. Samples: 213937956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:14:28,968][134211] Avg episode reward: [(0, '9.510')] [2025-01-04 14:14:30,258][134294] Updated weights for policy 0, policy_version 219504 (0.0023) [2025-01-04 14:14:33,065][134294] Updated weights for policy 0, policy_version 219514 (0.0023) [2025-01-04 14:14:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15018.6, 300 sec: 14579.0). Total num frames: 899141632. Throughput: 0: 3799.7. Samples: 213948686. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:33,968][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 14:14:36,012][134294] Updated weights for policy 0, policy_version 219524 (0.0022) [2025-01-04 14:14:38,935][134294] Updated weights for policy 0, policy_version 219534 (0.0025) [2025-01-04 14:14:38,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14950.4, 300 sec: 14495.7). Total num frames: 899211264. Throughput: 0: 3590.4. Samples: 213969974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:38,968][134211] Avg episode reward: [(0, '10.787')] [2025-01-04 14:14:41,889][134294] Updated weights for policy 0, policy_version 219544 (0.0027) [2025-01-04 14:14:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 14370.7). Total num frames: 899280896. Throughput: 0: 3468.2. Samples: 213990794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:43,968][134211] Avg episode reward: [(0, '10.512')] [2025-01-04 14:14:44,848][134294] Updated weights for policy 0, policy_version 219554 (0.0026) [2025-01-04 14:14:47,846][134294] Updated weights for policy 0, policy_version 219564 (0.0025) [2025-01-04 14:14:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14356.8). Total num frames: 899346432. Throughput: 0: 3476.5. Samples: 214001062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:48,968][134211] Avg episode reward: [(0, '9.245')] [2025-01-04 14:14:50,736][134294] Updated weights for policy 0, policy_version 219574 (0.0024) [2025-01-04 14:14:53,493][134294] Updated weights for policy 0, policy_version 219584 (0.0021) [2025-01-04 14:14:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14384.6). Total num frames: 899420160. Throughput: 0: 3500.1. Samples: 214022546. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:53,968][134211] Avg episode reward: [(0, '10.671')] [2025-01-04 14:14:56,370][134294] Updated weights for policy 0, policy_version 219594 (0.0024) [2025-01-04 14:14:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13926.4, 300 sec: 14398.5). Total num frames: 899489792. Throughput: 0: 3524.3. Samples: 214043806. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:14:58,968][134211] Avg episode reward: [(0, '9.605')] [2025-01-04 14:14:59,301][134294] Updated weights for policy 0, policy_version 219604 (0.0026) [2025-01-04 14:15:02,284][134294] Updated weights for policy 0, policy_version 219614 (0.0025) [2025-01-04 14:15:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13994.7, 300 sec: 14398.5). Total num frames: 899559424. Throughput: 0: 3515.3. Samples: 214054134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:03,968][134211] Avg episode reward: [(0, '9.549')] [2025-01-04 14:15:05,123][134294] Updated weights for policy 0, policy_version 219624 (0.0024) [2025-01-04 14:15:08,010][134294] Updated weights for policy 0, policy_version 219634 (0.0022) [2025-01-04 14:15:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14063.0, 300 sec: 14426.3). Total num frames: 899633152. Throughput: 0: 3518.0. Samples: 214075498. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:08,968][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 14:15:10,891][134294] Updated weights for policy 0, policy_version 219644 (0.0026) [2025-01-04 14:15:13,711][134294] Updated weights for policy 0, policy_version 219654 (0.0025) [2025-01-04 14:15:13,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 14412.4). Total num frames: 899702784. Throughput: 0: 3536.3. Samples: 214097090. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:13,968][134211] Avg episode reward: [(0, '9.560')] [2025-01-04 14:15:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000219655_899706880.pth... [2025-01-04 14:15:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000218817_896274432.pth [2025-01-04 14:15:16,616][134294] Updated weights for policy 0, policy_version 219664 (0.0026) [2025-01-04 14:15:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 899772416. Throughput: 0: 3526.6. Samples: 214107384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:18,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 14:15:19,581][134294] Updated weights for policy 0, policy_version 219674 (0.0026) [2025-01-04 14:15:22,557][134294] Updated weights for policy 0, policy_version 219684 (0.0023) [2025-01-04 14:15:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 14426.2). Total num frames: 899842048. Throughput: 0: 3517.5. Samples: 214128260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:23,968][134211] Avg episode reward: [(0, '8.683')] [2025-01-04 14:15:25,428][134294] Updated weights for policy 0, policy_version 219694 (0.0023) [2025-01-04 14:15:28,300][134294] Updated weights for policy 0, policy_version 219704 (0.0025) [2025-01-04 14:15:28,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14131.2, 300 sec: 14440.1). Total num frames: 899915776. Throughput: 0: 3533.2. Samples: 214149788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:28,968][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 14:15:31,142][134294] Updated weights for policy 0, policy_version 219714 (0.0023) [2025-01-04 14:15:33,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14062.9, 300 sec: 14440.2). Total num frames: 899985408. 
Throughput: 0: 3540.6. Samples: 214160390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:15:33,968][134211] Avg episode reward: [(0, '10.345')] [2025-01-04 14:15:34,112][134294] Updated weights for policy 0, policy_version 219724 (0.0024) [2025-01-04 14:15:36,763][134294] Updated weights for policy 0, policy_version 219734 (0.0021) [2025-01-04 14:15:38,671][134294] Updated weights for policy 0, policy_version 219744 (0.0013) [2025-01-04 14:15:38,967][134211] Fps is (10 sec: 15974.5, 60 sec: 14404.3, 300 sec: 14509.6). Total num frames: 900075520. Throughput: 0: 3568.9. Samples: 214183144. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:15:38,968][134211] Avg episode reward: [(0, '10.515')] [2025-01-04 14:15:40,506][134294] Updated weights for policy 0, policy_version 219754 (0.0015) [2025-01-04 14:15:43,174][134294] Updated weights for policy 0, policy_version 219764 (0.0023) [2025-01-04 14:15:43,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 900161536. Throughput: 0: 3728.5. Samples: 214211588. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:15:43,968][134211] Avg episode reward: [(0, '8.674')] [2025-01-04 14:15:46,100][134294] Updated weights for policy 0, policy_version 219774 (0.0027) [2025-01-04 14:15:48,969][134211] Fps is (10 sec: 15562.9, 60 sec: 14745.4, 300 sec: 14565.0). Total num frames: 900231168. Throughput: 0: 3725.6. Samples: 214221792. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:15:48,969][134211] Avg episode reward: [(0, '9.974')] [2025-01-04 14:15:49,239][134294] Updated weights for policy 0, policy_version 219784 (0.0027) [2025-01-04 14:15:52,206][134294] Updated weights for policy 0, policy_version 219794 (0.0027) [2025-01-04 14:15:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14551.2). Total num frames: 900296704. Throughput: 0: 3703.6. Samples: 214242158. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:15:53,968][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 14:15:55,163][134294] Updated weights for policy 0, policy_version 219804 (0.0024) [2025-01-04 14:15:58,105][134294] Updated weights for policy 0, policy_version 219814 (0.0024) [2025-01-04 14:15:58,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14609.0, 300 sec: 14565.1). Total num frames: 900366336. Throughput: 0: 3687.5. Samples: 214263030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:15:58,969][134211] Avg episode reward: [(0, '9.773')] [2025-01-04 14:16:00,941][134294] Updated weights for policy 0, policy_version 219824 (0.0025) [2025-01-04 14:16:03,859][134294] Updated weights for policy 0, policy_version 219834 (0.0028) [2025-01-04 14:16:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.3, 300 sec: 14495.7). Total num frames: 900440064. Throughput: 0: 3696.8. Samples: 214273738. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:03,968][134211] Avg episode reward: [(0, '9.520')] [2025-01-04 14:16:06,747][134294] Updated weights for policy 0, policy_version 219844 (0.0025) [2025-01-04 14:16:08,968][134211] Fps is (10 sec: 14336.5, 60 sec: 14609.1, 300 sec: 14356.8). Total num frames: 900509696. Throughput: 0: 3705.6. Samples: 214295014. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:08,968][134211] Avg episode reward: [(0, '9.156')] [2025-01-04 14:16:09,728][134294] Updated weights for policy 0, policy_version 219854 (0.0023) [2025-01-04 14:16:12,644][134294] Updated weights for policy 0, policy_version 219864 (0.0022) [2025-01-04 14:16:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14609.1, 300 sec: 14370.7). Total num frames: 900579328. Throughput: 0: 3687.7. Samples: 214315734. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:13,968][134211] Avg episode reward: [(0, '10.076')] [2025-01-04 14:16:15,533][134294] Updated weights for policy 0, policy_version 219874 (0.0023) [2025-01-04 14:16:18,351][134294] Updated weights for policy 0, policy_version 219884 (0.0026) [2025-01-04 14:16:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14384.6). Total num frames: 900648960. Throughput: 0: 3691.9. Samples: 214326526. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:18,968][134211] Avg episode reward: [(0, '9.548')] [2025-01-04 14:16:21,177][134294] Updated weights for policy 0, policy_version 219894 (0.0026) [2025-01-04 14:16:23,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.4, 300 sec: 14398.5). Total num frames: 900722688. Throughput: 0: 3660.9. Samples: 214347884. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:23,968][134211] Avg episode reward: [(0, '10.569')] [2025-01-04 14:16:24,163][134294] Updated weights for policy 0, policy_version 219904 (0.0027) [2025-01-04 14:16:27,076][134294] Updated weights for policy 0, policy_version 219914 (0.0024) [2025-01-04 14:16:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 900792320. Throughput: 0: 3496.0. Samples: 214368906. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:28,969][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 14:16:29,949][134294] Updated weights for policy 0, policy_version 219924 (0.0026) [2025-01-04 14:16:32,260][134294] Updated weights for policy 0, policy_version 219934 (0.0018) [2025-01-04 14:16:33,968][134211] Fps is (10 sec: 15564.1, 60 sec: 14882.1, 300 sec: 14481.8). Total num frames: 900878336. Throughput: 0: 3513.0. Samples: 214379876. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:33,969][134211] Avg episode reward: [(0, '9.665')] [2025-01-04 14:16:34,521][134294] Updated weights for policy 0, policy_version 219944 (0.0017) [2025-01-04 14:16:37,340][134294] Updated weights for policy 0, policy_version 219954 (0.0024) [2025-01-04 14:16:38,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14609.0, 300 sec: 14495.7). Total num frames: 900952064. Throughput: 0: 3631.1. Samples: 214405558. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:38,968][134211] Avg episode reward: [(0, '9.790')] [2025-01-04 14:16:40,320][134294] Updated weights for policy 0, policy_version 219964 (0.0029) [2025-01-04 14:16:43,136][134294] Updated weights for policy 0, policy_version 219974 (0.0023) [2025-01-04 14:16:43,968][134211] Fps is (10 sec: 14336.5, 60 sec: 14336.0, 300 sec: 14495.7). Total num frames: 901021696. Throughput: 0: 3638.3. Samples: 214426752. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:16:43,968][134211] Avg episode reward: [(0, '8.901')] [2025-01-04 14:16:45,952][134294] Updated weights for policy 0, policy_version 219984 (0.0024) [2025-01-04 14:16:48,826][134294] Updated weights for policy 0, policy_version 219994 (0.0023) [2025-01-04 14:16:48,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14404.5, 300 sec: 14523.5). Total num frames: 901095424. Throughput: 0: 3638.9. Samples: 214437490. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:16:48,968][134211] Avg episode reward: [(0, '9.734')] [2025-01-04 14:16:51,694][134294] Updated weights for policy 0, policy_version 220004 (0.0021) [2025-01-04 14:16:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.5, 300 sec: 14509.6). Total num frames: 901165056. Throughput: 0: 3642.7. Samples: 214458936. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:16:53,968][134211] Avg episode reward: [(0, '9.100')] [2025-01-04 14:16:54,721][134294] Updated weights for policy 0, policy_version 220014 (0.0025) [2025-01-04 14:16:56,614][134294] Updated weights for policy 0, policy_version 220024 (0.0014) [2025-01-04 14:16:58,502][134294] Updated weights for policy 0, policy_version 220034 (0.0012) [2025-01-04 14:16:58,967][134211] Fps is (10 sec: 17203.6, 60 sec: 15018.8, 300 sec: 14620.7). Total num frames: 901267456. Throughput: 0: 3793.5. Samples: 214486440. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:16:58,968][134211] Avg episode reward: [(0, '8.557')] [2025-01-04 14:17:00,356][134294] Updated weights for policy 0, policy_version 220044 (0.0013) [2025-01-04 14:17:02,251][134294] Updated weights for policy 0, policy_version 220054 (0.0014) [2025-01-04 14:17:03,968][134211] Fps is (10 sec: 20480.2, 60 sec: 15496.6, 300 sec: 14745.6). Total num frames: 901369856. Throughput: 0: 3918.8. Samples: 214502870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:03,968][134211] Avg episode reward: [(0, '9.142')] [2025-01-04 14:17:04,733][134294] Updated weights for policy 0, policy_version 220064 (0.0021) [2025-01-04 14:17:07,928][134294] Updated weights for policy 0, policy_version 220074 (0.0027) [2025-01-04 14:17:08,968][134211] Fps is (10 sec: 16793.0, 60 sec: 15428.3, 300 sec: 14731.8). Total num frames: 901435392. Throughput: 0: 3974.8. Samples: 214526750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:08,969][134211] Avg episode reward: [(0, '9.534')] [2025-01-04 14:17:11,151][134294] Updated weights for policy 0, policy_version 220084 (0.0027) [2025-01-04 14:17:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15360.0, 300 sec: 14717.8). Total num frames: 901500928. Throughput: 0: 3933.1. Samples: 214545898. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:13,968][134211] Avg episode reward: [(0, '10.201')] [2025-01-04 14:17:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220093_901500928.pth... [2025-01-04 14:17:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000219231_897970176.pth [2025-01-04 14:17:14,292][134294] Updated weights for policy 0, policy_version 220094 (0.0027) [2025-01-04 14:17:17,356][134294] Updated weights for policy 0, policy_version 220104 (0.0025) [2025-01-04 14:17:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15291.7, 300 sec: 14648.4). Total num frames: 901566464. Throughput: 0: 3909.1. Samples: 214555786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:18,969][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 14:17:20,274][134294] Updated weights for policy 0, policy_version 220114 (0.0026) [2025-01-04 14:17:23,175][134294] Updated weights for policy 0, policy_version 220124 (0.0026) [2025-01-04 14:17:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.4, 300 sec: 14509.5). Total num frames: 901636096. Throughput: 0: 3808.5. Samples: 214576942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:23,968][134211] Avg episode reward: [(0, '9.958')] [2025-01-04 14:17:26,077][134294] Updated weights for policy 0, policy_version 220134 (0.0025) [2025-01-04 14:17:28,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15223.5, 300 sec: 14509.6). Total num frames: 901705728. Throughput: 0: 3799.7. Samples: 214597740. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:28,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 14:17:28,996][134294] Updated weights for policy 0, policy_version 220144 (0.0023) [2025-01-04 14:17:31,903][134294] Updated weights for policy 0, policy_version 220154 (0.0023) [2025-01-04 14:17:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 15018.8, 300 sec: 14537.3). Total num frames: 901779456. Throughput: 0: 3794.8. Samples: 214608258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:33,969][134211] Avg episode reward: [(0, '10.286')] [2025-01-04 14:17:34,843][134294] Updated weights for policy 0, policy_version 220164 (0.0024) [2025-01-04 14:17:37,833][134294] Updated weights for policy 0, policy_version 220174 (0.0024) [2025-01-04 14:17:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14950.4, 300 sec: 14537.3). Total num frames: 901849088. Throughput: 0: 3782.8. Samples: 214629162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:38,968][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 14:17:40,676][134294] Updated weights for policy 0, policy_version 220184 (0.0023) [2025-01-04 14:17:43,567][134294] Updated weights for policy 0, policy_version 220194 (0.0026) [2025-01-04 14:17:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14551.2). Total num frames: 901918720. Throughput: 0: 3647.9. Samples: 214650596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:43,968][134211] Avg episode reward: [(0, '9.371')] [2025-01-04 14:17:46,395][134294] Updated weights for policy 0, policy_version 220204 (0.0022) [2025-01-04 14:17:48,970][134211] Fps is (10 sec: 13923.5, 60 sec: 14881.6, 300 sec: 14551.1). Total num frames: 901988352. Throughput: 0: 3517.7. Samples: 214661174. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:48,970][134211] Avg episode reward: [(0, '8.861')] [2025-01-04 14:17:49,430][134294] Updated weights for policy 0, policy_version 220214 (0.0025) [2025-01-04 14:17:52,385][134294] Updated weights for policy 0, policy_version 220224 (0.0024) [2025-01-04 14:17:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14551.2). Total num frames: 902057984. Throughput: 0: 3451.2. Samples: 214682052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:53,968][134211] Avg episode reward: [(0, '9.598')] [2025-01-04 14:17:55,217][134294] Updated weights for policy 0, policy_version 220234 (0.0023) [2025-01-04 14:17:58,091][134294] Updated weights for policy 0, policy_version 220244 (0.0025) [2025-01-04 14:17:58,968][134211] Fps is (10 sec: 13929.2, 60 sec: 14335.9, 300 sec: 14565.1). Total num frames: 902127616. Throughput: 0: 3501.7. Samples: 214703474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:17:58,969][134211] Avg episode reward: [(0, '8.088')] [2025-01-04 14:18:00,996][134294] Updated weights for policy 0, policy_version 220254 (0.0023) [2025-01-04 14:18:03,933][134294] Updated weights for policy 0, policy_version 220264 (0.0023) [2025-01-04 14:18:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13858.1, 300 sec: 14579.0). Total num frames: 902201344. Throughput: 0: 3517.1. Samples: 214714054. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:03,968][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 14:18:06,837][134294] Updated weights for policy 0, policy_version 220274 (0.0023) [2025-01-04 14:18:08,968][134211] Fps is (10 sec: 14335.0, 60 sec: 13926.2, 300 sec: 14592.9). Total num frames: 902270976. Throughput: 0: 3508.0. Samples: 214734804. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:08,969][134211] Avg episode reward: [(0, '9.295')] [2025-01-04 14:18:09,905][134294] Updated weights for policy 0, policy_version 220284 (0.0024) [2025-01-04 14:18:12,579][134294] Updated weights for policy 0, policy_version 220294 (0.0021) [2025-01-04 14:18:13,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14131.3, 300 sec: 14620.6). Total num frames: 902348800. Throughput: 0: 3543.2. Samples: 214757184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:13,968][134211] Avg episode reward: [(0, '10.067')] [2025-01-04 14:18:14,645][134294] Updated weights for policy 0, policy_version 220304 (0.0014) [2025-01-04 14:18:17,370][134294] Updated weights for policy 0, policy_version 220314 (0.0025) [2025-01-04 14:18:18,968][134211] Fps is (10 sec: 15566.0, 60 sec: 14336.0, 300 sec: 14648.4). Total num frames: 902426624. Throughput: 0: 3608.2. Samples: 214770628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:18,968][134211] Avg episode reward: [(0, '10.544')] [2025-01-04 14:18:20,259][134294] Updated weights for policy 0, policy_version 220324 (0.0024) [2025-01-04 14:18:23,145][134294] Updated weights for policy 0, policy_version 220334 (0.0025) [2025-01-04 14:18:23,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14336.0, 300 sec: 14648.4). Total num frames: 902496256. Throughput: 0: 3619.7. Samples: 214792048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:23,968][134211] Avg episode reward: [(0, '10.282')] [2025-01-04 14:18:25,983][134294] Updated weights for policy 0, policy_version 220344 (0.0023) [2025-01-04 14:18:28,749][134294] Updated weights for policy 0, policy_version 220354 (0.0024) [2025-01-04 14:18:28,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.2, 300 sec: 14676.2). Total num frames: 902569984. Throughput: 0: 3622.8. Samples: 214813620. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:28,968][134211] Avg episode reward: [(0, '9.622')] [2025-01-04 14:18:31,736][134294] Updated weights for policy 0, policy_version 220364 (0.0024) [2025-01-04 14:18:33,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14336.0, 300 sec: 14662.3). Total num frames: 902639616. Throughput: 0: 3618.3. Samples: 214823990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:33,969][134211] Avg episode reward: [(0, '10.117')] [2025-01-04 14:18:34,620][134294] Updated weights for policy 0, policy_version 220374 (0.0024) [2025-01-04 14:18:37,576][134294] Updated weights for policy 0, policy_version 220384 (0.0025) [2025-01-04 14:18:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14690.1). Total num frames: 902709248. Throughput: 0: 3621.1. Samples: 214845002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:38,968][134211] Avg episode reward: [(0, '8.588')] [2025-01-04 14:18:40,501][134294] Updated weights for policy 0, policy_version 220394 (0.0024) [2025-01-04 14:18:43,283][134294] Updated weights for policy 0, policy_version 220404 (0.0021) [2025-01-04 14:18:43,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14620.6). Total num frames: 902782976. Throughput: 0: 3624.8. Samples: 214866592. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:43,968][134211] Avg episode reward: [(0, '10.633')] [2025-01-04 14:18:46,075][134294] Updated weights for policy 0, policy_version 220414 (0.0022) [2025-01-04 14:18:47,972][134294] Updated weights for policy 0, policy_version 220424 (0.0014) [2025-01-04 14:18:48,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14677.9, 300 sec: 14551.2). Total num frames: 902868992. Throughput: 0: 3634.6. Samples: 214877612. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:48,968][134211] Avg episode reward: [(0, '10.165')] [2025-01-04 14:18:50,604][134294] Updated weights for policy 0, policy_version 220434 (0.0022) [2025-01-04 14:18:53,442][134294] Updated weights for policy 0, policy_version 220444 (0.0025) [2025-01-04 14:18:53,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 902942720. Throughput: 0: 3749.3. Samples: 214903518. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:53,968][134211] Avg episode reward: [(0, '10.248')] [2025-01-04 14:18:56,187][134294] Updated weights for policy 0, policy_version 220454 (0.0023) [2025-01-04 14:18:58,968][134211] Fps is (10 sec: 14744.9, 60 sec: 14813.8, 300 sec: 14565.1). Total num frames: 903016448. Throughput: 0: 3736.4. Samples: 214925322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:18:58,969][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 14:18:59,068][134294] Updated weights for policy 0, policy_version 220464 (0.0025) [2025-01-04 14:19:01,925][134294] Updated weights for policy 0, policy_version 220474 (0.0025) [2025-01-04 14:19:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 903086080. Throughput: 0: 3670.6. Samples: 214935804. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:03,968][134211] Avg episode reward: [(0, '10.205')] [2025-01-04 14:19:05,044][134294] Updated weights for policy 0, policy_version 220484 (0.0026) [2025-01-04 14:19:07,904][134294] Updated weights for policy 0, policy_version 220494 (0.0022) [2025-01-04 14:19:08,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14745.8, 300 sec: 14565.1). Total num frames: 903155712. Throughput: 0: 3650.8. Samples: 214956332. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:08,968][134211] Avg episode reward: [(0, '9.942')] [2025-01-04 14:19:10,835][134294] Updated weights for policy 0, policy_version 220504 (0.0025) [2025-01-04 14:19:13,648][134294] Updated weights for policy 0, policy_version 220514 (0.0026) [2025-01-04 14:19:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14677.2, 300 sec: 14579.0). Total num frames: 903229440. Throughput: 0: 3651.0. Samples: 214977914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:13,969][134211] Avg episode reward: [(0, '9.633')] [2025-01-04 14:19:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220515_903229440.pth... [2025-01-04 14:19:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000219655_899706880.pth [2025-01-04 14:19:16,227][134294] Updated weights for policy 0, policy_version 220524 (0.0019) [2025-01-04 14:19:18,271][134294] Updated weights for policy 0, policy_version 220534 (0.0016) [2025-01-04 14:19:18,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 903315456. Throughput: 0: 3680.5. Samples: 214989612. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:18,968][134211] Avg episode reward: [(0, '9.357')] [2025-01-04 14:19:21,094][134294] Updated weights for policy 0, policy_version 220544 (0.0024) [2025-01-04 14:19:23,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 903385088. Throughput: 0: 3754.1. Samples: 215013934. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:23,969][134211] Avg episode reward: [(0, '10.055')] [2025-01-04 14:19:24,172][134294] Updated weights for policy 0, policy_version 220554 (0.0021) [2025-01-04 14:19:27,112][134294] Updated weights for policy 0, policy_version 220564 (0.0023) [2025-01-04 14:19:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14620.6). Total num frames: 903454720. Throughput: 0: 3733.4. Samples: 215034594. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:28,968][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 14:19:29,918][134294] Updated weights for policy 0, policy_version 220574 (0.0025) [2025-01-04 14:19:32,794][134294] Updated weights for policy 0, policy_version 220584 (0.0023) [2025-01-04 14:19:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.7, 300 sec: 14620.6). Total num frames: 903524352. Throughput: 0: 3729.7. Samples: 215045450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:33,968][134211] Avg episode reward: [(0, '11.723')] [2025-01-04 14:19:35,714][134294] Updated weights for policy 0, policy_version 220594 (0.0025) [2025-01-04 14:19:38,521][134294] Updated weights for policy 0, policy_version 220604 (0.0022) [2025-01-04 14:19:38,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14813.8, 300 sec: 14634.5). Total num frames: 903598080. Throughput: 0: 3630.2. Samples: 215066876. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:38,969][134211] Avg episode reward: [(0, '9.115')] [2025-01-04 14:19:41,385][134294] Updated weights for policy 0, policy_version 220614 (0.0021) [2025-01-04 14:19:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 903667712. Throughput: 0: 3614.9. Samples: 215087992. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:43,968][134211] Avg episode reward: [(0, '8.704')] [2025-01-04 14:19:44,293][134294] Updated weights for policy 0, policy_version 220624 (0.0026) [2025-01-04 14:19:46,257][134294] Updated weights for policy 0, policy_version 220634 (0.0014) [2025-01-04 14:19:48,910][134294] Updated weights for policy 0, policy_version 220644 (0.0023) [2025-01-04 14:19:48,968][134211] Fps is (10 sec: 15974.8, 60 sec: 14813.9, 300 sec: 14704.0). Total num frames: 903757824. Throughput: 0: 3698.3. Samples: 215102226. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:48,968][134211] Avg episode reward: [(0, '9.991')] [2025-01-04 14:19:51,895][134294] Updated weights for policy 0, policy_version 220654 (0.0027) [2025-01-04 14:19:53,968][134211] Fps is (10 sec: 15973.9, 60 sec: 14745.5, 300 sec: 14703.9). Total num frames: 903827456. Throughput: 0: 3721.4. Samples: 215123798. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:53,969][134211] Avg episode reward: [(0, '9.916')] [2025-01-04 14:19:54,899][134294] Updated weights for policy 0, policy_version 220664 (0.0025) [2025-01-04 14:19:57,794][134294] Updated weights for policy 0, policy_version 220674 (0.0024) [2025-01-04 14:19:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 903892992. Throughput: 0: 3706.4. Samples: 215144700. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:19:58,968][134211] Avg episode reward: [(0, '8.882')] [2025-01-04 14:20:00,708][134294] Updated weights for policy 0, policy_version 220684 (0.0022) [2025-01-04 14:20:03,530][134294] Updated weights for policy 0, policy_version 220694 (0.0023) [2025-01-04 14:20:03,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14677.3, 300 sec: 14690.1). Total num frames: 903966720. Throughput: 0: 3686.9. Samples: 215155524. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:20:03,968][134211] Avg episode reward: [(0, '8.975')] [2025-01-04 14:20:06,429][134294] Updated weights for policy 0, policy_version 220704 (0.0024) [2025-01-04 14:20:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.3, 300 sec: 14690.1). Total num frames: 904036352. Throughput: 0: 3616.9. Samples: 215176696. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:20:08,968][134211] Avg episode reward: [(0, '9.882')] [2025-01-04 14:20:09,403][134294] Updated weights for policy 0, policy_version 220714 (0.0024) [2025-01-04 14:20:12,341][134294] Updated weights for policy 0, policy_version 220724 (0.0024) [2025-01-04 14:20:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 904105984. Throughput: 0: 3618.0. Samples: 215197406. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:13,969][134211] Avg episode reward: [(0, '9.589')] [2025-01-04 14:20:15,284][134294] Updated weights for policy 0, policy_version 220734 (0.0025) [2025-01-04 14:20:17,254][134294] Updated weights for policy 0, policy_version 220744 (0.0014) [2025-01-04 14:20:18,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14677.3, 300 sec: 14759.5). Total num frames: 904196096. Throughput: 0: 3635.4. Samples: 215209042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:18,968][134211] Avg episode reward: [(0, '8.757')] [2025-01-04 14:20:19,686][134294] Updated weights for policy 0, policy_version 220754 (0.0020) [2025-01-04 14:20:22,473][134294] Updated weights for policy 0, policy_version 220764 (0.0029) [2025-01-04 14:20:23,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14677.3, 300 sec: 14745.6). Total num frames: 904265728. Throughput: 0: 3724.9. Samples: 215234494. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:23,968][134211] Avg episode reward: [(0, '9.297')] [2025-01-04 14:20:25,386][134294] Updated weights for policy 0, policy_version 220774 (0.0024) [2025-01-04 14:20:28,267][134294] Updated weights for policy 0, policy_version 220784 (0.0026) [2025-01-04 14:20:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14745.6, 300 sec: 14759.5). Total num frames: 904339456. Throughput: 0: 3731.1. Samples: 215255892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:28,968][134211] Avg episode reward: [(0, '8.070')] [2025-01-04 14:20:31,016][134294] Updated weights for policy 0, policy_version 220794 (0.0025) [2025-01-04 14:20:33,916][134294] Updated weights for policy 0, policy_version 220804 (0.0023) [2025-01-04 14:20:33,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14813.8, 300 sec: 14703.9). Total num frames: 904413184. Throughput: 0: 3651.0. Samples: 215266522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:33,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 14:20:36,747][134294] Updated weights for policy 0, policy_version 220814 (0.0022) [2025-01-04 14:20:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14745.7, 300 sec: 14648.4). Total num frames: 904482816. Throughput: 0: 3649.4. Samples: 215288018. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:38,968][134211] Avg episode reward: [(0, '9.196')] [2025-01-04 14:20:39,715][134294] Updated weights for policy 0, policy_version 220824 (0.0025) [2025-01-04 14:20:42,695][134294] Updated weights for policy 0, policy_version 220834 (0.0024) [2025-01-04 14:20:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14813.8, 300 sec: 14662.3). Total num frames: 904556544. Throughput: 0: 3647.3. Samples: 215308828. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:43,968][134211] Avg episode reward: [(0, '9.695')] [2025-01-04 14:20:44,787][134294] Updated weights for policy 0, policy_version 220844 (0.0017) [2025-01-04 14:20:46,649][134294] Updated weights for policy 0, policy_version 220854 (0.0014) [2025-01-04 14:20:48,903][134294] Updated weights for policy 0, policy_version 220864 (0.0017) [2025-01-04 14:20:48,968][134211] Fps is (10 sec: 17612.1, 60 sec: 15018.6, 300 sec: 14787.2). Total num frames: 904658944. Throughput: 0: 3767.8. Samples: 215325078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:48,969][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 14:20:51,938][134294] Updated weights for policy 0, policy_version 220874 (0.0027) [2025-01-04 14:20:53,969][134211] Fps is (10 sec: 16792.1, 60 sec: 14950.2, 300 sec: 14773.3). Total num frames: 904724480. Throughput: 0: 3828.3. Samples: 215348972. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:53,969][134211] Avg episode reward: [(0, '10.559')] [2025-01-04 14:20:55,033][134294] Updated weights for policy 0, policy_version 220884 (0.0027) [2025-01-04 14:20:57,984][134294] Updated weights for policy 0, policy_version 220894 (0.0022) [2025-01-04 14:20:58,968][134211] Fps is (10 sec: 13517.3, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 904794112. Throughput: 0: 3820.5. Samples: 215369328. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:20:58,968][134211] Avg episode reward: [(0, '11.037')] [2025-01-04 14:21:00,942][134294] Updated weights for policy 0, policy_version 220904 (0.0026) [2025-01-04 14:21:03,830][134294] Updated weights for policy 0, policy_version 220914 (0.0025) [2025-01-04 14:21:03,968][134211] Fps is (10 sec: 13927.8, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 904863744. Throughput: 0: 3792.5. Samples: 215379704. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:21:03,968][134211] Avg episode reward: [(0, '10.277')] [2025-01-04 14:21:06,786][134294] Updated weights for policy 0, policy_version 220924 (0.0025) [2025-01-04 14:21:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 904933376. Throughput: 0: 3695.1. Samples: 215400774. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:21:08,968][134211] Avg episode reward: [(0, '8.232')] [2025-01-04 14:21:09,795][134294] Updated weights for policy 0, policy_version 220934 (0.0025) [2025-01-04 14:21:12,667][134294] Updated weights for policy 0, policy_version 220944 (0.0025) [2025-01-04 14:21:13,969][134211] Fps is (10 sec: 13925.0, 60 sec: 14950.2, 300 sec: 14759.4). Total num frames: 905003008. Throughput: 0: 3685.4. Samples: 215421738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:21:13,969][134211] Avg episode reward: [(0, '9.097')] [2025-01-04 14:21:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220948_905003008.pth... [2025-01-04 14:21:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220093_901500928.pth [2025-01-04 14:21:15,683][134294] Updated weights for policy 0, policy_version 220954 (0.0023) [2025-01-04 14:21:18,377][134294] Updated weights for policy 0, policy_version 220964 (0.0023) [2025-01-04 14:21:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.1, 300 sec: 14745.6). Total num frames: 905072640. Throughput: 0: 3679.6. Samples: 215432106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:21:18,968][134211] Avg episode reward: [(0, '9.629')] [2025-01-04 14:21:21,362][134294] Updated weights for policy 0, policy_version 220974 (0.0024) [2025-01-04 14:21:23,968][134211] Fps is (10 sec: 13927.9, 60 sec: 14609.1, 300 sec: 14745.6). Total num frames: 905142272. 
Throughput: 0: 3677.4. Samples: 215453500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:23,968][134211] Avg episode reward: [(0, '9.835')] [2025-01-04 14:21:24,254][134294] Updated weights for policy 0, policy_version 220984 (0.0025) [2025-01-04 14:21:27,214][134294] Updated weights for policy 0, policy_version 220994 (0.0023) [2025-01-04 14:21:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14609.1, 300 sec: 14704.0). Total num frames: 905216000. Throughput: 0: 3678.3. Samples: 215474352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:28,968][134211] Avg episode reward: [(0, '10.604')] [2025-01-04 14:21:30,123][134294] Updated weights for policy 0, policy_version 221004 (0.0024) [2025-01-04 14:21:32,345][134294] Updated weights for policy 0, policy_version 221014 (0.0018) [2025-01-04 14:21:33,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14882.2, 300 sec: 14759.5). Total num frames: 905306112. Throughput: 0: 3561.4. Samples: 215485338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:33,968][134211] Avg episode reward: [(0, '10.635')] [2025-01-04 14:21:34,272][134294] Updated weights for policy 0, policy_version 221024 (0.0014) [2025-01-04 14:21:36,388][134294] Updated weights for policy 0, policy_version 221034 (0.0019) [2025-01-04 14:21:38,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15155.2, 300 sec: 14815.0). Total num frames: 905392128. Throughput: 0: 3691.5. Samples: 215515084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:38,968][134211] Avg episode reward: [(0, '9.821')] [2025-01-04 14:21:39,243][134294] Updated weights for policy 0, policy_version 221044 (0.0026) [2025-01-04 14:21:42,270][134294] Updated weights for policy 0, policy_version 221054 (0.0028) [2025-01-04 14:21:43,968][134211] Fps is (10 sec: 15155.0, 60 sec: 15018.7, 300 sec: 14787.3). Total num frames: 905457664. Throughput: 0: 3693.0. Samples: 215535514. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:43,968][134211] Avg episode reward: [(0, '10.377')] [2025-01-04 14:21:45,248][134294] Updated weights for policy 0, policy_version 221064 (0.0027) [2025-01-04 14:21:48,136][134294] Updated weights for policy 0, policy_version 221074 (0.0029) [2025-01-04 14:21:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.6, 300 sec: 14787.3). Total num frames: 905527296. Throughput: 0: 3697.0. Samples: 215546068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:48,968][134211] Avg episode reward: [(0, '10.542')] [2025-01-04 14:21:51,110][134294] Updated weights for policy 0, policy_version 221084 (0.0024) [2025-01-04 14:21:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14541.1, 300 sec: 14676.2). Total num frames: 905596928. Throughput: 0: 3695.3. Samples: 215567064. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:53,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 14:21:53,991][134294] Updated weights for policy 0, policy_version 221094 (0.0025) [2025-01-04 14:21:56,899][134294] Updated weights for policy 0, policy_version 221104 (0.0025) [2025-01-04 14:21:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 905666560. Throughput: 0: 3694.1. Samples: 215587968. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:21:58,968][134211] Avg episode reward: [(0, '9.876')] [2025-01-04 14:21:59,978][134294] Updated weights for policy 0, policy_version 221114 (0.0024) [2025-01-04 14:22:02,882][134294] Updated weights for policy 0, policy_version 221124 (0.0026) [2025-01-04 14:22:03,971][134211] Fps is (10 sec: 13922.0, 60 sec: 14540.1, 300 sec: 14578.8). Total num frames: 905736192. Throughput: 0: 3686.7. Samples: 215598020. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:22:03,971][134211] Avg episode reward: [(0, '9.589')] [2025-01-04 14:22:05,875][134294] Updated weights for policy 0, policy_version 221134 (0.0026) [2025-01-04 14:22:08,748][134294] Updated weights for policy 0, policy_version 221144 (0.0026) [2025-01-04 14:22:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14592.9). Total num frames: 905805824. Throughput: 0: 3681.9. Samples: 215619184. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:22:08,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 14:22:11,923][134294] Updated weights for policy 0, policy_version 221154 (0.0025) [2025-01-04 14:22:13,967][134211] Fps is (10 sec: 14340.8, 60 sec: 14609.4, 300 sec: 14620.6). Total num frames: 905879552. Throughput: 0: 3660.9. Samples: 215639094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:22:13,968][134211] Avg episode reward: [(0, '10.794')] [2025-01-04 14:22:14,303][134294] Updated weights for policy 0, policy_version 221164 (0.0015) [2025-01-04 14:22:16,227][134294] Updated weights for policy 0, policy_version 221174 (0.0015) [2025-01-04 14:22:18,066][134294] Updated weights for policy 0, policy_version 221184 (0.0014) [2025-01-04 14:22:18,968][134211] Fps is (10 sec: 18022.8, 60 sec: 15223.5, 300 sec: 14745.6). Total num frames: 905986048. Throughput: 0: 3777.2. Samples: 215655314. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:22:18,968][134211] Avg episode reward: [(0, '10.038')] [2025-01-04 14:22:19,917][134294] Updated weights for policy 0, policy_version 221194 (0.0015) [2025-01-04 14:22:21,813][134294] Updated weights for policy 0, policy_version 221204 (0.0014) [2025-01-04 14:22:23,968][134211] Fps is (10 sec: 20889.1, 60 sec: 15769.6, 300 sec: 14856.7). Total num frames: 906088448. Throughput: 0: 3845.0. Samples: 215688108. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:22:23,968][134211] Avg episode reward: [(0, '10.229')] [2025-01-04 14:22:24,123][134294] Updated weights for policy 0, policy_version 221214 (0.0021) [2025-01-04 14:22:27,313][134294] Updated weights for policy 0, policy_version 221224 (0.0025) [2025-01-04 14:22:28,968][134211] Fps is (10 sec: 16383.7, 60 sec: 15564.8, 300 sec: 14815.0). Total num frames: 906149888. Throughput: 0: 3861.5. Samples: 215709284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:28,968][134211] Avg episode reward: [(0, '9.702')] [2025-01-04 14:22:30,405][134294] Updated weights for policy 0, policy_version 221234 (0.0026) [2025-01-04 14:22:33,401][134294] Updated weights for policy 0, policy_version 221244 (0.0026) [2025-01-04 14:22:33,968][134211] Fps is (10 sec: 13516.3, 60 sec: 15291.6, 300 sec: 14828.9). Total num frames: 906223616. Throughput: 0: 3855.8. Samples: 215719580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:33,969][134211] Avg episode reward: [(0, '9.784')] [2025-01-04 14:22:36,371][134294] Updated weights for policy 0, policy_version 221254 (0.0024) [2025-01-04 14:22:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 906289152. Throughput: 0: 3845.5. Samples: 215740110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:38,968][134211] Avg episode reward: [(0, '11.042')] [2025-01-04 14:22:39,439][134294] Updated weights for policy 0, policy_version 221264 (0.0027) [2025-01-04 14:22:42,411][134294] Updated weights for policy 0, policy_version 221274 (0.0024) [2025-01-04 14:22:43,968][134211] Fps is (10 sec: 13517.2, 60 sec: 15018.6, 300 sec: 14815.1). Total num frames: 906358784. Throughput: 0: 3836.0. Samples: 215760586. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:43,968][134211] Avg episode reward: [(0, '10.033')] [2025-01-04 14:22:45,276][134294] Updated weights for policy 0, policy_version 221284 (0.0025) [2025-01-04 14:22:48,180][134294] Updated weights for policy 0, policy_version 221294 (0.0026) [2025-01-04 14:22:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 906428416. Throughput: 0: 3845.2. Samples: 215771044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:48,968][134211] Avg episode reward: [(0, '10.672')] [2025-01-04 14:22:51,088][134294] Updated weights for policy 0, policy_version 221304 (0.0023) [2025-01-04 14:22:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 906498048. Throughput: 0: 3843.4. Samples: 215792136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:53,968][134211] Avg episode reward: [(0, '8.856')] [2025-01-04 14:22:54,125][134294] Updated weights for policy 0, policy_version 221314 (0.0026) [2025-01-04 14:22:57,002][134294] Updated weights for policy 0, policy_version 221324 (0.0027) [2025-01-04 14:22:58,968][134211] Fps is (10 sec: 13925.7, 60 sec: 15018.5, 300 sec: 14801.1). Total num frames: 906567680. Throughput: 0: 3865.1. Samples: 215813026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:22:58,969][134211] Avg episode reward: [(0, '9.436')] [2025-01-04 14:22:59,941][134294] Updated weights for policy 0, policy_version 221334 (0.0023) [2025-01-04 14:23:02,829][134294] Updated weights for policy 0, policy_version 221344 (0.0025) [2025-01-04 14:23:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15019.5, 300 sec: 14801.2). Total num frames: 906637312. Throughput: 0: 3745.6. Samples: 215823864. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:03,968][134211] Avg episode reward: [(0, '9.840')] [2025-01-04 14:23:05,687][134294] Updated weights for policy 0, policy_version 221354 (0.0025) [2025-01-04 14:23:08,523][134294] Updated weights for policy 0, policy_version 221364 (0.0024) [2025-01-04 14:23:08,968][134211] Fps is (10 sec: 14336.8, 60 sec: 15086.9, 300 sec: 14787.2). Total num frames: 906711040. Throughput: 0: 3489.8. Samples: 215845150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:08,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 14:23:11,391][134294] Updated weights for policy 0, policy_version 221374 (0.0026) [2025-01-04 14:23:13,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15018.6, 300 sec: 14759.5). Total num frames: 906780672. Throughput: 0: 3490.8. Samples: 215866370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:13,968][134211] Avg episode reward: [(0, '9.900')] [2025-01-04 14:23:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000221382_906780672.pth... [2025-01-04 14:23:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220515_903229440.pth [2025-01-04 14:23:14,348][134294] Updated weights for policy 0, policy_version 221384 (0.0024) [2025-01-04 14:23:17,268][134294] Updated weights for policy 0, policy_version 221394 (0.0023) [2025-01-04 14:23:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14404.2, 300 sec: 14759.5). Total num frames: 906850304. Throughput: 0: 3490.7. Samples: 215876662. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:18,968][134211] Avg episode reward: [(0, '8.948')] [2025-01-04 14:23:20,190][134294] Updated weights for policy 0, policy_version 221404 (0.0023) [2025-01-04 14:23:23,051][134294] Updated weights for policy 0, policy_version 221414 (0.0025) [2025-01-04 14:23:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13926.4, 300 sec: 14759.5). Total num frames: 906924032. Throughput: 0: 3512.4. Samples: 215898170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:23,968][134211] Avg episode reward: [(0, '10.274')] [2025-01-04 14:23:25,889][134294] Updated weights for policy 0, policy_version 221424 (0.0024) [2025-01-04 14:23:28,719][134294] Updated weights for policy 0, policy_version 221434 (0.0026) [2025-01-04 14:23:28,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14062.9, 300 sec: 14759.5). Total num frames: 906993664. Throughput: 0: 3536.4. Samples: 215919722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:28,968][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 14:23:31,548][134294] Updated weights for policy 0, policy_version 221444 (0.0027) [2025-01-04 14:23:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.7, 300 sec: 14759.5). Total num frames: 907063296. Throughput: 0: 3539.0. Samples: 215930300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:23:33,968][134211] Avg episode reward: [(0, '10.124')] [2025-01-04 14:23:34,619][134294] Updated weights for policy 0, policy_version 221454 (0.0026) [2025-01-04 14:23:37,540][134294] Updated weights for policy 0, policy_version 221464 (0.0023) [2025-01-04 14:23:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14063.0, 300 sec: 14745.6). Total num frames: 907132928. Throughput: 0: 3533.8. Samples: 215951158. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:23:38,968][134211] Avg episode reward: [(0, '9.871')] [2025-01-04 14:23:40,401][134294] Updated weights for policy 0, policy_version 221474 (0.0025) [2025-01-04 14:23:43,236][134294] Updated weights for policy 0, policy_version 221484 (0.0027) [2025-01-04 14:23:43,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14703.9). Total num frames: 907206656. Throughput: 0: 3546.7. Samples: 215972628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:23:43,968][134211] Avg episode reward: [(0, '10.373')] [2025-01-04 14:23:46,117][134294] Updated weights for policy 0, policy_version 221494 (0.0022) [2025-01-04 14:23:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14131.2, 300 sec: 14690.1). Total num frames: 907276288. Throughput: 0: 3542.5. Samples: 215983278. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:23:48,968][134211] Avg episode reward: [(0, '9.986')] [2025-01-04 14:23:49,036][134294] Updated weights for policy 0, policy_version 221504 (0.0024) [2025-01-04 14:23:51,935][134294] Updated weights for policy 0, policy_version 221514 (0.0024) [2025-01-04 14:23:53,866][134294] Updated weights for policy 0, policy_version 221524 (0.0013) [2025-01-04 14:23:53,968][134211] Fps is (10 sec: 15564.1, 60 sec: 14404.1, 300 sec: 14731.7). Total num frames: 907362304. Throughput: 0: 3547.1. Samples: 216004770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:23:53,969][134211] Avg episode reward: [(0, '8.898')] [2025-01-04 14:23:56,497][134294] Updated weights for policy 0, policy_version 221534 (0.0021) [2025-01-04 14:23:58,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14472.7, 300 sec: 14745.6). Total num frames: 907436032. Throughput: 0: 3640.2. Samples: 216030180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:23:58,968][134211] Avg episode reward: [(0, '10.288')] [2025-01-04 14:23:59,322][134294] Updated weights for policy 0, policy_version 221544 (0.0023) [2025-01-04 14:24:02,287][134294] Updated weights for policy 0, policy_version 221554 (0.0025) [2025-01-04 14:24:03,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14472.5, 300 sec: 14745.6). Total num frames: 907505664. Throughput: 0: 3644.3. Samples: 216040654. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:03,968][134211] Avg episode reward: [(0, '9.440')] [2025-01-04 14:24:05,238][134294] Updated weights for policy 0, policy_version 221564 (0.0022) [2025-01-04 14:24:08,141][134294] Updated weights for policy 0, policy_version 221574 (0.0026) [2025-01-04 14:24:08,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14404.0, 300 sec: 14731.7). Total num frames: 907575296. Throughput: 0: 3638.1. Samples: 216061886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:08,969][134211] Avg episode reward: [(0, '10.198')] [2025-01-04 14:24:10,946][134294] Updated weights for policy 0, policy_version 221584 (0.0025) [2025-01-04 14:24:13,859][134294] Updated weights for policy 0, policy_version 221594 (0.0023) [2025-01-04 14:24:13,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 907649024. Throughput: 0: 3634.9. Samples: 216083292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:13,968][134211] Avg episode reward: [(0, '9.906')] [2025-01-04 14:24:16,641][134294] Updated weights for policy 0, policy_version 221604 (0.0020) [2025-01-04 14:24:18,968][134211] Fps is (10 sec: 14337.7, 60 sec: 14472.6, 300 sec: 14690.1). Total num frames: 907718656. Throughput: 0: 3634.0. Samples: 216093828. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:18,968][134211] Avg episode reward: [(0, '9.683')] [2025-01-04 14:24:19,730][134294] Updated weights for policy 0, policy_version 221614 (0.0026) [2025-01-04 14:24:22,393][134294] Updated weights for policy 0, policy_version 221624 (0.0019) [2025-01-04 14:24:23,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14609.1, 300 sec: 14731.7). Total num frames: 907800576. Throughput: 0: 3648.0. Samples: 216115320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:23,968][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 14:24:24,539][134294] Updated weights for policy 0, policy_version 221634 (0.0012) [2025-01-04 14:24:26,548][134294] Updated weights for policy 0, policy_version 221644 (0.0013) [2025-01-04 14:24:28,541][134294] Updated weights for policy 0, policy_version 221654 (0.0014) [2025-01-04 14:24:28,968][134211] Fps is (10 sec: 18022.4, 60 sec: 15087.0, 300 sec: 14828.9). Total num frames: 907898880. Throughput: 0: 3840.7. Samples: 216145458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:28,968][134211] Avg episode reward: [(0, '9.637')] [2025-01-04 14:24:31,621][134294] Updated weights for policy 0, policy_version 221664 (0.0025) [2025-01-04 14:24:33,969][134211] Fps is (10 sec: 15562.3, 60 sec: 14881.8, 300 sec: 14773.3). Total num frames: 907956224. Throughput: 0: 3838.4. Samples: 216156012. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:33,970][134211] Avg episode reward: [(0, '8.521')] [2025-01-04 14:24:35,984][134294] Updated weights for policy 0, policy_version 221674 (0.0040) [2025-01-04 14:24:38,968][134211] Fps is (10 sec: 10649.6, 60 sec: 14540.8, 300 sec: 14704.0). Total num frames: 908005376. Throughput: 0: 3657.0. Samples: 216169334. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:24:38,968][134211] Avg episode reward: [(0, '10.293')] [2025-01-04 14:24:39,434][134294] Updated weights for policy 0, policy_version 221684 (0.0023) [2025-01-04 14:24:41,548][134294] Updated weights for policy 0, policy_version 221694 (0.0014) [2025-01-04 14:24:43,917][134294] Updated weights for policy 0, policy_version 221704 (0.0017) [2025-01-04 14:24:43,968][134211] Fps is (10 sec: 14338.0, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 908099584. Throughput: 0: 3663.7. Samples: 216195048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:24:43,969][134211] Avg episode reward: [(0, '8.980')] [2025-01-04 14:24:47,295][134294] Updated weights for policy 0, policy_version 221714 (0.0028) [2025-01-04 14:24:48,968][134211] Fps is (10 sec: 15154.1, 60 sec: 14677.2, 300 sec: 14676.2). Total num frames: 908156928. Throughput: 0: 3646.4. Samples: 216204746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:24:48,969][134211] Avg episode reward: [(0, '10.024')] [2025-01-04 14:24:50,528][134294] Updated weights for policy 0, policy_version 221724 (0.0030) [2025-01-04 14:24:53,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14267.8, 300 sec: 14662.3). Total num frames: 908218368. Throughput: 0: 3591.0. Samples: 216223478. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:24:53,970][134211] Avg episode reward: [(0, '9.428')] [2025-01-04 14:24:54,287][134294] Updated weights for policy 0, policy_version 221734 (0.0028) [2025-01-04 14:24:58,000][134294] Updated weights for policy 0, policy_version 221744 (0.0030) [2025-01-04 14:24:58,968][134211] Fps is (10 sec: 11469.1, 60 sec: 13926.3, 300 sec: 14592.9). Total num frames: 908271616. Throughput: 0: 3468.3. Samples: 216239368. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:24:58,969][134211] Avg episode reward: [(0, '9.434')] [2025-01-04 14:25:01,689][134294] Updated weights for policy 0, policy_version 221754 (0.0028) [2025-01-04 14:25:03,968][134211] Fps is (10 sec: 11059.2, 60 sec: 13721.5, 300 sec: 14551.2). Total num frames: 908328960. Throughput: 0: 3426.9. Samples: 216248038. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:03,969][134211] Avg episode reward: [(0, '9.171')] [2025-01-04 14:25:04,856][134294] Updated weights for policy 0, policy_version 221764 (0.0019) [2025-01-04 14:25:07,150][134294] Updated weights for policy 0, policy_version 221774 (0.0017) [2025-01-04 14:25:08,968][134211] Fps is (10 sec: 14746.4, 60 sec: 14063.2, 300 sec: 14620.7). Total num frames: 908419072. Throughput: 0: 3441.6. Samples: 216270194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:08,968][134211] Avg episode reward: [(0, '9.260')] [2025-01-04 14:25:09,232][134294] Updated weights for policy 0, policy_version 221784 (0.0015) [2025-01-04 14:25:11,322][134294] Updated weights for policy 0, policy_version 221794 (0.0014) [2025-01-04 14:25:13,622][134294] Updated weights for policy 0, policy_version 221804 (0.0017) [2025-01-04 14:25:13,968][134211] Fps is (10 sec: 18022.7, 60 sec: 14335.9, 300 sec: 14620.6). Total num frames: 908509184. Throughput: 0: 3406.9. Samples: 216298770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:13,969][134211] Avg episode reward: [(0, '10.260')] [2025-01-04 14:25:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000221804_908509184.pth... [2025-01-04 14:25:14,092][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000220948_905003008.pth [2025-01-04 14:25:18,968][134211] Fps is (10 sec: 12697.2, 60 sec: 13789.8, 300 sec: 14509.6). Total num frames: 908546048. 
Throughput: 0: 3313.0. Samples: 216305092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:18,969][134211] Avg episode reward: [(0, '10.151')] [2025-01-04 14:25:19,232][134294] Updated weights for policy 0, policy_version 221814 (0.0047) [2025-01-04 14:25:23,437][134294] Updated weights for policy 0, policy_version 221824 (0.0037) [2025-01-04 14:25:23,968][134211] Fps is (10 sec: 8601.5, 60 sec: 13243.7, 300 sec: 14426.2). Total num frames: 908595200. Throughput: 0: 3307.4. Samples: 216318168. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:23,969][134211] Avg episode reward: [(0, '9.479')] [2025-01-04 14:25:27,027][134294] Updated weights for policy 0, policy_version 221834 (0.0024) [2025-01-04 14:25:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 12834.1, 300 sec: 14426.3). Total num frames: 908668928. Throughput: 0: 3152.9. Samples: 216336926. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:28,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 14:25:29,216][134294] Updated weights for policy 0, policy_version 221844 (0.0015) [2025-01-04 14:25:31,348][134294] Updated weights for policy 0, policy_version 221854 (0.0014) [2025-01-04 14:25:33,968][134211] Fps is (10 sec: 15564.1, 60 sec: 13243.9, 300 sec: 14467.9). Total num frames: 908750848. Throughput: 0: 3248.2. Samples: 216350914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:33,970][134211] Avg episode reward: [(0, '8.763')] [2025-01-04 14:25:34,263][134294] Updated weights for policy 0, policy_version 221864 (0.0026) [2025-01-04 14:25:38,613][134294] Updated weights for policy 0, policy_version 221874 (0.0037) [2025-01-04 14:25:38,968][134211] Fps is (10 sec: 13107.0, 60 sec: 13243.7, 300 sec: 14384.6). Total num frames: 908800000. Throughput: 0: 3222.0. Samples: 216368466. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:38,968][134211] Avg episode reward: [(0, '9.014')] [2025-01-04 14:25:41,778][134294] Updated weights for policy 0, policy_version 221884 (0.0029) [2025-01-04 14:25:43,968][134211] Fps is (10 sec: 11059.8, 60 sec: 12697.6, 300 sec: 14245.8). Total num frames: 908861440. Throughput: 0: 3279.7. Samples: 216386952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:43,968][134211] Avg episode reward: [(0, '9.787')] [2025-01-04 14:25:45,128][134294] Updated weights for policy 0, policy_version 221894 (0.0026) [2025-01-04 14:25:48,322][134294] Updated weights for policy 0, policy_version 221904 (0.0030) [2025-01-04 14:25:48,968][134211] Fps is (10 sec: 12288.1, 60 sec: 12766.0, 300 sec: 14231.9). Total num frames: 908922880. Throughput: 0: 3286.6. Samples: 216395932. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:48,968][134211] Avg episode reward: [(0, '8.487')] [2025-01-04 14:25:51,535][134294] Updated weights for policy 0, policy_version 221914 (0.0029) [2025-01-04 14:25:53,968][134211] Fps is (10 sec: 12697.8, 60 sec: 12834.2, 300 sec: 14218.0). Total num frames: 908988416. Throughput: 0: 3218.1. Samples: 216415010. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:53,968][134211] Avg episode reward: [(0, '8.913')] [2025-01-04 14:25:54,746][134294] Updated weights for policy 0, policy_version 221924 (0.0024) [2025-01-04 14:25:58,169][134294] Updated weights for policy 0, policy_version 221934 (0.0028) [2025-01-04 14:25:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 12970.7, 300 sec: 14190.2). Total num frames: 909049856. Throughput: 0: 2999.1. Samples: 216433728. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:25:58,969][134211] Avg episode reward: [(0, '11.003')] [2025-01-04 14:26:01,952][134294] Updated weights for policy 0, policy_version 221944 (0.0030) [2025-01-04 14:26:03,968][134211] Fps is (10 sec: 11468.6, 60 sec: 12902.4, 300 sec: 14134.7). Total num frames: 909103104. Throughput: 0: 3040.0. Samples: 216441892. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:03,969][134211] Avg episode reward: [(0, '9.582')] [2025-01-04 14:26:05,550][134294] Updated weights for policy 0, policy_version 221954 (0.0025) [2025-01-04 14:26:07,958][134294] Updated weights for policy 0, policy_version 221964 (0.0017) [2025-01-04 14:26:08,967][134211] Fps is (10 sec: 13517.3, 60 sec: 12765.9, 300 sec: 14176.4). Total num frames: 909185024. Throughput: 0: 3156.9. Samples: 216460228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:08,968][134211] Avg episode reward: [(0, '9.851')] [2025-01-04 14:26:09,923][134294] Updated weights for policy 0, policy_version 221974 (0.0011) [2025-01-04 14:26:12,463][134294] Updated weights for policy 0, policy_version 221984 (0.0022) [2025-01-04 14:26:13,968][134211] Fps is (10 sec: 15974.3, 60 sec: 12561.1, 300 sec: 14204.1). Total num frames: 909262848. Throughput: 0: 3332.4. Samples: 216486886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:13,968][134211] Avg episode reward: [(0, '9.730')] [2025-01-04 14:26:15,881][134294] Updated weights for policy 0, policy_version 221994 (0.0031) [2025-01-04 14:26:18,843][134294] Updated weights for policy 0, policy_version 222004 (0.0025) [2025-01-04 14:26:18,968][134211] Fps is (10 sec: 14335.6, 60 sec: 13039.0, 300 sec: 14190.2). Total num frames: 909328384. Throughput: 0: 3226.9. Samples: 216496122. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:18,968][134211] Avg episode reward: [(0, '9.856')] [2025-01-04 14:26:21,979][134294] Updated weights for policy 0, policy_version 222014 (0.0029) [2025-01-04 14:26:23,968][134211] Fps is (10 sec: 12697.8, 60 sec: 13243.8, 300 sec: 14148.5). Total num frames: 909389824. Throughput: 0: 3280.5. Samples: 216516088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:23,968][134211] Avg episode reward: [(0, '10.051')] [2025-01-04 14:26:25,775][134294] Updated weights for policy 0, policy_version 222024 (0.0028) [2025-01-04 14:26:28,968][134211] Fps is (10 sec: 11878.4, 60 sec: 12970.6, 300 sec: 14037.5). Total num frames: 909447168. Throughput: 0: 3238.5. Samples: 216532684. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:28,968][134211] Avg episode reward: [(0, '9.342')] [2025-01-04 14:26:29,250][134294] Updated weights for policy 0, policy_version 222034 (0.0031) [2025-01-04 14:26:33,234][134294] Updated weights for policy 0, policy_version 222044 (0.0028) [2025-01-04 14:26:33,968][134211] Fps is (10 sec: 11878.6, 60 sec: 12629.5, 300 sec: 13954.2). Total num frames: 909508608. Throughput: 0: 3225.2. Samples: 216541064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:33,968][134211] Avg episode reward: [(0, '9.712')] [2025-01-04 14:26:35,115][134294] Updated weights for policy 0, policy_version 222054 (0.0013) [2025-01-04 14:26:37,141][134294] Updated weights for policy 0, policy_version 222064 (0.0013) [2025-01-04 14:26:38,968][134211] Fps is (10 sec: 16384.4, 60 sec: 13516.9, 300 sec: 14079.1). Total num frames: 909611008. Throughput: 0: 3374.9. Samples: 216566880. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:38,968][134211] Avg episode reward: [(0, '9.561')] [2025-01-04 14:26:39,169][134294] Updated weights for policy 0, policy_version 222074 (0.0013) [2025-01-04 14:26:41,289][134294] Updated weights for policy 0, policy_version 222084 (0.0016) [2025-01-04 14:26:43,968][134211] Fps is (10 sec: 17202.5, 60 sec: 13653.3, 300 sec: 14079.1). Total num frames: 909680640. Throughput: 0: 3520.1. Samples: 216592134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:43,969][134211] Avg episode reward: [(0, '10.213')] [2025-01-04 14:26:45,444][134294] Updated weights for policy 0, policy_version 222094 (0.0035) [2025-01-04 14:26:48,968][134211] Fps is (10 sec: 12287.7, 60 sec: 13516.8, 300 sec: 14023.6). Total num frames: 909733888. Throughput: 0: 3495.8. Samples: 216599202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:48,968][134211] Avg episode reward: [(0, '9.774')] [2025-01-04 14:26:49,106][134294] Updated weights for policy 0, policy_version 222104 (0.0033) [2025-01-04 14:26:52,290][134294] Updated weights for policy 0, policy_version 222114 (0.0029) [2025-01-04 14:26:53,968][134211] Fps is (10 sec: 11878.6, 60 sec: 13516.8, 300 sec: 14009.7). Total num frames: 909799424. Throughput: 0: 3495.1. Samples: 216617508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:53,968][134211] Avg episode reward: [(0, '9.252')] [2025-01-04 14:26:55,496][134294] Updated weights for policy 0, policy_version 222124 (0.0025) [2025-01-04 14:26:58,518][134294] Updated weights for policy 0, policy_version 222134 (0.0026) [2025-01-04 14:26:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13585.1, 300 sec: 13996.0). Total num frames: 909864960. Throughput: 0: 3345.0. Samples: 216637410. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:26:58,968][134211] Avg episode reward: [(0, '9.478')] [2025-01-04 14:27:01,771][134294] Updated weights for policy 0, policy_version 222144 (0.0026) [2025-01-04 14:27:03,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13721.6, 300 sec: 13968.1). Total num frames: 909926400. Throughput: 0: 3347.9. Samples: 216646776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:03,968][134211] Avg episode reward: [(0, '10.189')] [2025-01-04 14:27:05,146][134294] Updated weights for policy 0, policy_version 222154 (0.0029) [2025-01-04 14:27:08,193][134294] Updated weights for policy 0, policy_version 222164 (0.0027) [2025-01-04 14:27:08,968][134211] Fps is (10 sec: 12697.6, 60 sec: 13448.5, 300 sec: 13940.3). Total num frames: 909991936. Throughput: 0: 3323.6. Samples: 216665648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:08,968][134211] Avg episode reward: [(0, '9.401')] [2025-01-04 14:27:11,204][134294] Updated weights for policy 0, policy_version 222174 (0.0025) [2025-01-04 14:27:13,968][134211] Fps is (10 sec: 13106.6, 60 sec: 13243.6, 300 sec: 13801.4). Total num frames: 910057472. Throughput: 0: 3397.0. Samples: 216685552. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:13,969][134211] Avg episode reward: [(0, '9.569')] [2025-01-04 14:27:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000222182_910057472.pth... [2025-01-04 14:27:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000221382_906780672.pth [2025-01-04 14:27:14,437][134294] Updated weights for policy 0, policy_version 222184 (0.0026) [2025-01-04 14:27:17,436][134294] Updated weights for policy 0, policy_version 222194 (0.0027) [2025-01-04 14:27:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13243.7, 300 sec: 13676.5). Total num frames: 910123008. 
Throughput: 0: 3429.2. Samples: 216695378. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:18,968][134211] Avg episode reward: [(0, '9.782')] [2025-01-04 14:27:20,467][134294] Updated weights for policy 0, policy_version 222204 (0.0025) [2025-01-04 14:27:23,434][134294] Updated weights for policy 0, policy_version 222214 (0.0024) [2025-01-04 14:27:23,968][134211] Fps is (10 sec: 13517.6, 60 sec: 13380.3, 300 sec: 13704.2). Total num frames: 910192640. Throughput: 0: 3315.9. Samples: 216716094. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:23,968][134211] Avg episode reward: [(0, '9.397')] [2025-01-04 14:27:26,400][134294] Updated weights for policy 0, policy_version 222224 (0.0025) [2025-01-04 14:27:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13585.1, 300 sec: 13690.4). Total num frames: 910262272. Throughput: 0: 3205.7. Samples: 216736388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:28,968][134211] Avg episode reward: [(0, '8.994')] [2025-01-04 14:27:29,525][134294] Updated weights for policy 0, policy_version 222234 (0.0026) [2025-01-04 14:27:31,778][134294] Updated weights for policy 0, policy_version 222244 (0.0018) [2025-01-04 14:27:33,648][134294] Updated weights for policy 0, policy_version 222254 (0.0014) [2025-01-04 14:27:33,968][134211] Fps is (10 sec: 16383.3, 60 sec: 14131.1, 300 sec: 13787.5). Total num frames: 910356480. Throughput: 0: 3306.0. Samples: 216747974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:33,968][134211] Avg episode reward: [(0, '9.488')] [2025-01-04 14:27:35,909][134294] Updated weights for policy 0, policy_version 222264 (0.0019) [2025-01-04 14:27:38,883][134294] Updated weights for policy 0, policy_version 222274 (0.0025) [2025-01-04 14:27:38,968][134211] Fps is (10 sec: 17203.2, 60 sec: 13721.5, 300 sec: 13815.3). Total num frames: 910434304. Throughput: 0: 3520.2. Samples: 216775916. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:38,968][134211] Avg episode reward: [(0, '10.136')] [2025-01-04 14:27:42,028][134294] Updated weights for policy 0, policy_version 222284 (0.0024) [2025-01-04 14:27:43,968][134211] Fps is (10 sec: 14336.6, 60 sec: 13653.4, 300 sec: 13801.4). Total num frames: 910499840. Throughput: 0: 3519.0. Samples: 216795766. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:43,968][134211] Avg episode reward: [(0, '9.332')] [2025-01-04 14:27:45,190][134294] Updated weights for policy 0, policy_version 222294 (0.0029) [2025-01-04 14:27:48,073][134294] Updated weights for policy 0, policy_version 222304 (0.0026) [2025-01-04 14:27:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13787.6). Total num frames: 910565376. Throughput: 0: 3537.6. Samples: 216805968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:48,968][134211] Avg episode reward: [(0, '10.390')] [2025-01-04 14:27:51,049][134294] Updated weights for policy 0, policy_version 222314 (0.0027) [2025-01-04 14:27:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 13773.7). Total num frames: 910630912. Throughput: 0: 3569.9. Samples: 216826294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:53,968][134211] Avg episode reward: [(0, '10.329')] [2025-01-04 14:27:54,443][134294] Updated weights for policy 0, policy_version 222324 (0.0028) [2025-01-04 14:27:57,424][134294] Updated weights for policy 0, policy_version 222334 (0.0022) [2025-01-04 14:27:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13759.8). Total num frames: 910696448. Throughput: 0: 3555.8. Samples: 216845562. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:27:58,968][134211] Avg episode reward: [(0, '10.460')] [2025-01-04 14:28:00,515][134294] Updated weights for policy 0, policy_version 222344 (0.0028) [2025-01-04 14:28:03,433][134294] Updated weights for policy 0, policy_version 222354 (0.0024) [2025-01-04 14:28:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.7, 300 sec: 13745.9). Total num frames: 910766080. Throughput: 0: 3566.3. Samples: 216855862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:03,969][134211] Avg episode reward: [(0, '9.878')] [2025-01-04 14:28:06,515][134294] Updated weights for policy 0, policy_version 222364 (0.0026) [2025-01-04 14:28:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14063.0, 300 sec: 13745.9). Total num frames: 910835712. Throughput: 0: 3556.8. Samples: 216876150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:08,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 14:28:09,649][134294] Updated weights for policy 0, policy_version 222374 (0.0025) [2025-01-04 14:28:12,622][134294] Updated weights for policy 0, policy_version 222384 (0.0024) [2025-01-04 14:28:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14063.1, 300 sec: 13732.0). Total num frames: 910901248. Throughput: 0: 3551.5. Samples: 216896204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:13,968][134211] Avg episode reward: [(0, '9.424')] [2025-01-04 14:28:15,552][134294] Updated weights for policy 0, policy_version 222394 (0.0026) [2025-01-04 14:28:18,454][134294] Updated weights for policy 0, policy_version 222404 (0.0027) [2025-01-04 14:28:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 13718.1). Total num frames: 910970880. Throughput: 0: 3529.5. Samples: 216906798. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:18,968][134211] Avg episode reward: [(0, '9.757')] [2025-01-04 14:28:21,468][134294] Updated weights for policy 0, policy_version 222414 (0.0024) [2025-01-04 14:28:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14131.2, 300 sec: 13718.1). Total num frames: 911040512. Throughput: 0: 3370.5. Samples: 216927590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:23,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 14:28:24,542][134294] Updated weights for policy 0, policy_version 222424 (0.0028) [2025-01-04 14:28:27,586][134294] Updated weights for policy 0, policy_version 222434 (0.0028) [2025-01-04 14:28:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14062.9, 300 sec: 13704.2). Total num frames: 911106048. Throughput: 0: 3375.3. Samples: 216947656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:28,968][134211] Avg episode reward: [(0, '9.918')] [2025-01-04 14:28:30,537][134294] Updated weights for policy 0, policy_version 222444 (0.0023) [2025-01-04 14:28:32,482][134294] Updated weights for policy 0, policy_version 222454 (0.0013) [2025-01-04 14:28:33,967][134211] Fps is (10 sec: 15974.9, 60 sec: 14063.1, 300 sec: 13787.6). Total num frames: 911200256. Throughput: 0: 3403.6. Samples: 216959130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:33,968][134211] Avg episode reward: [(0, '9.563')] [2025-01-04 14:28:34,337][134294] Updated weights for policy 0, policy_version 222464 (0.0012) [2025-01-04 14:28:36,881][134294] Updated weights for policy 0, policy_version 222474 (0.0023) [2025-01-04 14:28:38,968][134211] Fps is (10 sec: 17203.2, 60 sec: 14062.9, 300 sec: 13801.4). Total num frames: 911278080. Throughput: 0: 3584.1. Samples: 216987578. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:38,968][134211] Avg episode reward: [(0, '9.293')] [2025-01-04 14:28:40,034][134294] Updated weights for policy 0, policy_version 222484 (0.0026) [2025-01-04 14:28:43,082][134294] Updated weights for policy 0, policy_version 222494 (0.0028) [2025-01-04 14:28:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14062.9, 300 sec: 13787.6). Total num frames: 911343616. Throughput: 0: 3596.3. Samples: 217007394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:43,968][134211] Avg episode reward: [(0, '9.130')] [2025-01-04 14:28:46,150][134294] Updated weights for policy 0, policy_version 222504 (0.0027) [2025-01-04 14:28:48,969][134211] Fps is (10 sec: 13105.8, 60 sec: 14062.7, 300 sec: 13718.1). Total num frames: 911409152. Throughput: 0: 3591.3. Samples: 217017472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:48,969][134211] Avg episode reward: [(0, '9.779')] [2025-01-04 14:28:49,216][134294] Updated weights for policy 0, policy_version 222514 (0.0026) [2025-01-04 14:28:52,230][134294] Updated weights for policy 0, policy_version 222524 (0.0026) [2025-01-04 14:28:53,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14131.1, 300 sec: 13704.2). Total num frames: 911478784. Throughput: 0: 3586.2. Samples: 217037532. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:53,969][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 14:28:55,283][134294] Updated weights for policy 0, policy_version 222534 (0.0026) [2025-01-04 14:28:58,222][134294] Updated weights for policy 0, policy_version 222544 (0.0026) [2025-01-04 14:28:58,968][134211] Fps is (10 sec: 13927.9, 60 sec: 14199.5, 300 sec: 13704.2). Total num frames: 911548416. Throughput: 0: 3598.8. Samples: 217058152. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:28:58,968][134211] Avg episode reward: [(0, '9.083')] [2025-01-04 14:29:01,206][134294] Updated weights for policy 0, policy_version 222554 (0.0027) [2025-01-04 14:29:03,968][134211] Fps is (10 sec: 13517.6, 60 sec: 14131.2, 300 sec: 13690.4). Total num frames: 911613952. Throughput: 0: 3593.1. Samples: 217068488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:03,968][134211] Avg episode reward: [(0, '8.806')] [2025-01-04 14:29:04,320][134294] Updated weights for policy 0, policy_version 222564 (0.0027) [2025-01-04 14:29:07,357][134294] Updated weights for policy 0, policy_version 222574 (0.0023) [2025-01-04 14:29:08,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14131.1, 300 sec: 13676.5). Total num frames: 911683584. Throughput: 0: 3575.6. Samples: 217088492. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:08,969][134211] Avg episode reward: [(0, '9.910')] [2025-01-04 14:29:10,339][134294] Updated weights for policy 0, policy_version 222584 (0.0026) [2025-01-04 14:29:13,276][134294] Updated weights for policy 0, policy_version 222594 (0.0024) [2025-01-04 14:29:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14199.4, 300 sec: 13676.5). Total num frames: 911753216. Throughput: 0: 3589.7. Samples: 217109194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:13,969][134211] Avg episode reward: [(0, '12.320')] [2025-01-04 14:29:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000222596_911753216.pth... [2025-01-04 14:29:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000221804_908509184.pth [2025-01-04 14:29:14,057][134264] Saving new best policy, reward=12.320! 
[2025-01-04 14:29:16,499][134294] Updated weights for policy 0, policy_version 222604 (0.0027) [2025-01-04 14:29:18,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14199.3, 300 sec: 13634.8). Total num frames: 911822848. Throughput: 0: 3542.9. Samples: 217118562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:18,969][134211] Avg episode reward: [(0, '9.537')] [2025-01-04 14:29:19,131][134294] Updated weights for policy 0, policy_version 222614 (0.0015) [2025-01-04 14:29:21,196][134294] Updated weights for policy 0, policy_version 222624 (0.0013) [2025-01-04 14:29:23,127][134294] Updated weights for policy 0, policy_version 222634 (0.0014) [2025-01-04 14:29:23,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14745.6, 300 sec: 13648.7). Total num frames: 911925248. Throughput: 0: 3506.5. Samples: 217145372. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:23,968][134211] Avg episode reward: [(0, '9.677')] [2025-01-04 14:29:25,178][134294] Updated weights for policy 0, policy_version 222644 (0.0015) [2025-01-04 14:29:28,725][134294] Updated weights for policy 0, policy_version 222654 (0.0032) [2025-01-04 14:29:28,971][134211] Fps is (10 sec: 16789.4, 60 sec: 14744.8, 300 sec: 13676.4). Total num frames: 911990784. Throughput: 0: 3596.5. Samples: 217169250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:28,973][134211] Avg episode reward: [(0, '10.813')] [2025-01-04 14:29:32,614][134294] Updated weights for policy 0, policy_version 222664 (0.0032) [2025-01-04 14:29:33,968][134211] Fps is (10 sec: 11877.9, 60 sec: 14062.7, 300 sec: 13690.3). Total num frames: 912044032. Throughput: 0: 3545.1. Samples: 217177000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:33,969][134211] Avg episode reward: [(0, '10.087')] [2025-01-04 14:29:36,618][134294] Updated weights for policy 0, policy_version 222674 (0.0035) [2025-01-04 14:29:38,968][134211] Fps is (10 sec: 10243.5, 60 sec: 13585.1, 300 sec: 13537.6). 
Total num frames: 912093184. Throughput: 0: 3446.6. Samples: 217192626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:38,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 14:29:39,921][134294] Updated weights for policy 0, policy_version 222684 (0.0023) [2025-01-04 14:29:41,960][134294] Updated weights for policy 0, policy_version 222694 (0.0013) [2025-01-04 14:29:43,968][134211] Fps is (10 sec: 14746.7, 60 sec: 14131.2, 300 sec: 13676.5). Total num frames: 912191488. Throughput: 0: 3554.5. Samples: 217218106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:43,968][134211] Avg episode reward: [(0, '9.854')] [2025-01-04 14:29:43,979][134294] Updated weights for policy 0, policy_version 222704 (0.0014) [2025-01-04 14:29:46,050][134294] Updated weights for policy 0, policy_version 222714 (0.0014) [2025-01-04 14:29:48,031][134294] Updated weights for policy 0, policy_version 222724 (0.0016) [2025-01-04 14:29:48,968][134211] Fps is (10 sec: 20070.3, 60 sec: 14745.9, 300 sec: 13815.3). Total num frames: 912293888. Throughput: 0: 3657.5. Samples: 217233076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:48,968][134211] Avg episode reward: [(0, '9.533')] [2025-01-04 14:29:50,118][134294] Updated weights for policy 0, policy_version 222734 (0.0016) [2025-01-04 14:29:52,384][134294] Updated weights for policy 0, policy_version 222744 (0.0017) [2025-01-04 14:29:53,968][134211] Fps is (10 sec: 18431.0, 60 sec: 14950.4, 300 sec: 13912.5). Total num frames: 912375808. Throughput: 0: 3868.7. Samples: 217262584. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:53,970][134211] Avg episode reward: [(0, '10.415')] [2025-01-04 14:29:56,381][134294] Updated weights for policy 0, policy_version 222754 (0.0036) [2025-01-04 14:29:58,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14609.1, 300 sec: 13884.8). Total num frames: 912424960. Throughput: 0: 3750.1. Samples: 217277946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:29:58,969][134211] Avg episode reward: [(0, '10.348')] [2025-01-04 14:30:00,599][134294] Updated weights for policy 0, policy_version 222764 (0.0037) [2025-01-04 14:30:03,969][134211] Fps is (10 sec: 10239.5, 60 sec: 14404.0, 300 sec: 13759.7). Total num frames: 912478208. Throughput: 0: 3696.5. Samples: 217284904. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:30:03,969][134211] Avg episode reward: [(0, '8.642')] [2025-01-04 14:30:04,471][134294] Updated weights for policy 0, policy_version 222774 (0.0032) [2025-01-04 14:30:08,464][134294] Updated weights for policy 0, policy_version 222784 (0.0034) [2025-01-04 14:30:08,968][134211] Fps is (10 sec: 10240.0, 60 sec: 14063.0, 300 sec: 13620.9). Total num frames: 912527360. Throughput: 0: 3451.7. Samples: 217300698. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:30:08,968][134211] Avg episode reward: [(0, '8.967')] [2025-01-04 14:30:11,897][134294] Updated weights for policy 0, policy_version 222794 (0.0030) [2025-01-04 14:30:13,968][134211] Fps is (10 sec: 10650.4, 60 sec: 13858.1, 300 sec: 13690.4). Total num frames: 912584704. Throughput: 0: 3298.6. Samples: 217317678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:30:13,969][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 14:30:15,756][134294] Updated weights for policy 0, policy_version 222804 (0.0029) [2025-01-04 14:30:18,968][134211] Fps is (10 sec: 11059.4, 60 sec: 13585.3, 300 sec: 13704.3). Total num frames: 912637952. Throughput: 0: 3299.4. Samples: 217325472. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:30:18,968][134211] Avg episode reward: [(0, '9.430')] [2025-01-04 14:30:19,238][134294] Updated weights for policy 0, policy_version 222814 (0.0023) [2025-01-04 14:30:21,295][134294] Updated weights for policy 0, policy_version 222824 (0.0016) [2025-01-04 14:30:23,501][134294] Updated weights for policy 0, policy_version 222834 (0.0017) [2025-01-04 14:30:23,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13448.6, 300 sec: 13773.7). Total num frames: 912732160. Throughput: 0: 3484.3. Samples: 217349422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:30:23,968][134211] Avg episode reward: [(0, '9.706')] [2025-01-04 14:30:27,192][134294] Updated weights for policy 0, policy_version 222844 (0.0030) [2025-01-04 14:30:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13244.5, 300 sec: 13676.5). Total num frames: 912785408. Throughput: 0: 3334.5. Samples: 217368160. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:28,968][134211] Avg episode reward: [(0, '9.496')] [2025-01-04 14:30:30,867][134294] Updated weights for policy 0, policy_version 222854 (0.0032) [2025-01-04 14:30:33,968][134211] Fps is (10 sec: 11468.6, 60 sec: 13380.4, 300 sec: 13718.1). Total num frames: 912846848. Throughput: 0: 3195.0. Samples: 217376852. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:33,968][134211] Avg episode reward: [(0, '10.209')] [2025-01-04 14:30:34,005][134294] Updated weights for policy 0, policy_version 222864 (0.0029) [2025-01-04 14:30:37,853][134294] Updated weights for policy 0, policy_version 222874 (0.0027) [2025-01-04 14:30:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13653.3, 300 sec: 13732.0). Total num frames: 912912384. Throughput: 0: 2909.7. Samples: 217393518. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:38,969][134211] Avg episode reward: [(0, '9.659')] [2025-01-04 14:30:39,986][134294] Updated weights for policy 0, policy_version 222884 (0.0014) [2025-01-04 14:30:42,040][134294] Updated weights for policy 0, policy_version 222894 (0.0014) [2025-01-04 14:30:43,968][134211] Fps is (10 sec: 16384.3, 60 sec: 13653.3, 300 sec: 13857.0). Total num frames: 913010688. Throughput: 0: 3203.5. Samples: 217422104. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:43,968][134211] Avg episode reward: [(0, '10.274')] [2025-01-04 14:30:44,114][134294] Updated weights for policy 0, policy_version 222904 (0.0013) [2025-01-04 14:30:46,288][134294] Updated weights for policy 0, policy_version 222914 (0.0016) [2025-01-04 14:30:48,968][134211] Fps is (10 sec: 17203.0, 60 sec: 13175.4, 300 sec: 13884.7). Total num frames: 913084416. Throughput: 0: 3374.5. Samples: 217436752. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:48,969][134211] Avg episode reward: [(0, '10.041')] [2025-01-04 14:30:50,105][134294] Updated weights for policy 0, policy_version 222924 (0.0039) [2025-01-04 14:30:53,969][134211] Fps is (10 sec: 12286.6, 60 sec: 12629.2, 300 sec: 13843.0). Total num frames: 913133568. Throughput: 0: 3378.2. Samples: 217452720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:53,970][134211] Avg episode reward: [(0, '9.836')] [2025-01-04 14:30:54,216][134294] Updated weights for policy 0, policy_version 222934 (0.0038) [2025-01-04 14:30:58,968][134211] Fps is (10 sec: 9420.7, 60 sec: 12561.0, 300 sec: 13815.3). Total num frames: 913178624. Throughput: 0: 3304.6. Samples: 217466384. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:30:58,969][134211] Avg episode reward: [(0, '10.085')] [2025-01-04 14:30:58,969][134294] Updated weights for policy 0, policy_version 222944 (0.0034) [2025-01-04 14:31:02,149][134294] Updated weights for policy 0, policy_version 222954 (0.0021) [2025-01-04 14:31:03,968][134211] Fps is (10 sec: 11470.1, 60 sec: 12834.3, 300 sec: 13773.7). Total num frames: 913248256. Throughput: 0: 3300.2. Samples: 217473980. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:03,968][134211] Avg episode reward: [(0, '10.124')] [2025-01-04 14:31:04,432][134294] Updated weights for policy 0, policy_version 222964 (0.0014) [2025-01-04 14:31:06,474][134294] Updated weights for policy 0, policy_version 222974 (0.0016) [2025-01-04 14:31:08,430][134294] Updated weights for policy 0, policy_version 222984 (0.0014) [2025-01-04 14:31:08,967][134211] Fps is (10 sec: 17204.1, 60 sec: 13721.7, 300 sec: 13857.0). Total num frames: 913350656. Throughput: 0: 3392.8. Samples: 217502098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:08,968][134211] Avg episode reward: [(0, '9.278')] [2025-01-04 14:31:10,484][134294] Updated weights for policy 0, policy_version 222994 (0.0013) [2025-01-04 14:31:13,968][134211] Fps is (10 sec: 17202.9, 60 sec: 13926.4, 300 sec: 13870.9). Total num frames: 913420288. Throughput: 0: 3527.4. Samples: 217526896. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:13,969][134211] Avg episode reward: [(0, '9.143')] [2025-01-04 14:31:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223003_913420288.pth... 
[2025-01-04 14:31:14,080][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000222182_910057472.pth [2025-01-04 14:31:14,350][134294] Updated weights for policy 0, policy_version 223004 (0.0030) [2025-01-04 14:31:18,082][134294] Updated weights for policy 0, policy_version 223014 (0.0036) [2025-01-04 14:31:18,968][134211] Fps is (10 sec: 12287.8, 60 sec: 13926.4, 300 sec: 13843.1). Total num frames: 913473536. Throughput: 0: 3487.2. Samples: 217533776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:18,968][134211] Avg episode reward: [(0, '9.047')] [2025-01-04 14:31:21,367][134294] Updated weights for policy 0, policy_version 223024 (0.0027) [2025-01-04 14:31:23,968][134211] Fps is (10 sec: 11468.3, 60 sec: 13380.1, 300 sec: 13857.0). Total num frames: 913534976. Throughput: 0: 3525.4. Samples: 217552162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:23,969][134211] Avg episode reward: [(0, '10.113')] [2025-01-04 14:31:24,984][134294] Updated weights for policy 0, policy_version 223034 (0.0026) [2025-01-04 14:31:28,096][134294] Updated weights for policy 0, policy_version 223044 (0.0028) [2025-01-04 14:31:28,968][134211] Fps is (10 sec: 12287.9, 60 sec: 13516.8, 300 sec: 13857.0). Total num frames: 913596416. Throughput: 0: 3296.8. Samples: 217570460. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:28,968][134211] Avg episode reward: [(0, '10.352')] [2025-01-04 14:31:31,252][134294] Updated weights for policy 0, policy_version 223054 (0.0027) [2025-01-04 14:31:33,968][134211] Fps is (10 sec: 12698.2, 60 sec: 13585.1, 300 sec: 13732.0). Total num frames: 913661952. Throughput: 0: 3189.6. Samples: 217580286. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:31:33,968][134211] Avg episode reward: [(0, '10.035')] [2025-01-04 14:31:34,548][134294] Updated weights for policy 0, policy_version 223064 (0.0024) [2025-01-04 14:31:37,517][134294] Updated weights for policy 0, policy_version 223074 (0.0026) [2025-01-04 14:31:38,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13516.8, 300 sec: 13704.3). Total num frames: 913723392. Throughput: 0: 3269.0. Samples: 217599820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:31:38,969][134211] Avg episode reward: [(0, '8.899')] [2025-01-04 14:31:40,960][134294] Updated weights for policy 0, policy_version 223084 (0.0028) [2025-01-04 14:31:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 12970.6, 300 sec: 13745.9). Total num frames: 913788928. Throughput: 0: 3370.3. Samples: 217618048. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:31:43,969][134211] Avg episode reward: [(0, '11.373')] [2025-01-04 14:31:44,306][134294] Updated weights for policy 0, policy_version 223094 (0.0027) [2025-01-04 14:31:47,451][134294] Updated weights for policy 0, policy_version 223104 (0.0026) [2025-01-04 14:31:48,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12765.9, 300 sec: 13732.0). Total num frames: 913850368. Throughput: 0: 3413.1. Samples: 217627568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:31:48,968][134211] Avg episode reward: [(0, '9.457')] [2025-01-04 14:31:50,560][134294] Updated weights for policy 0, policy_version 223114 (0.0023) [2025-01-04 14:31:53,650][134294] Updated weights for policy 0, policy_version 223124 (0.0027) [2025-01-04 14:31:53,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13107.4, 300 sec: 13745.9). Total num frames: 913920000. Throughput: 0: 3226.8. Samples: 217647304. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:31:53,968][134211] Avg episode reward: [(0, '10.212')] [2025-01-04 14:31:56,722][134294] Updated weights for policy 0, policy_version 223134 (0.0027) [2025-01-04 14:31:58,968][134211] Fps is (10 sec: 13106.8, 60 sec: 13380.3, 300 sec: 13745.9). Total num frames: 913981440. Throughput: 0: 3110.7. Samples: 217666878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:31:58,969][134211] Avg episode reward: [(0, '10.239')] [2025-01-04 14:31:59,993][134294] Updated weights for policy 0, policy_version 223144 (0.0030) [2025-01-04 14:32:03,076][134294] Updated weights for policy 0, policy_version 223154 (0.0026) [2025-01-04 14:32:03,968][134211] Fps is (10 sec: 12697.5, 60 sec: 13312.0, 300 sec: 13745.9). Total num frames: 914046976. Throughput: 0: 3174.0. Samples: 217676606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:03,969][134211] Avg episode reward: [(0, '10.446')] [2025-01-04 14:32:06,120][134294] Updated weights for policy 0, policy_version 223164 (0.0023) [2025-01-04 14:32:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 12629.2, 300 sec: 13732.0). Total num frames: 914108416. Throughput: 0: 3205.5. Samples: 217696408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:08,969][134211] Avg episode reward: [(0, '10.117')] [2025-01-04 14:32:10,132][134294] Updated weights for policy 0, policy_version 223174 (0.0030) [2025-01-04 14:32:13,402][134294] Updated weights for policy 0, policy_version 223184 (0.0025) [2025-01-04 14:32:13,967][134211] Fps is (10 sec: 12698.1, 60 sec: 12561.1, 300 sec: 13732.0). Total num frames: 914173952. Throughput: 0: 3156.1. Samples: 217712482. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:13,968][134211] Avg episode reward: [(0, '10.112')] [2025-01-04 14:32:15,359][134294] Updated weights for policy 0, policy_version 223194 (0.0013) [2025-01-04 14:32:17,271][134294] Updated weights for policy 0, policy_version 223204 (0.0013) [2025-01-04 14:32:18,968][134211] Fps is (10 sec: 15974.4, 60 sec: 13243.7, 300 sec: 13815.3). Total num frames: 914268160. Throughput: 0: 3293.1. Samples: 217728476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:18,969][134211] Avg episode reward: [(0, '9.931')] [2025-01-04 14:32:20,070][134294] Updated weights for policy 0, policy_version 223214 (0.0024) [2025-01-04 14:32:23,069][134294] Updated weights for policy 0, policy_version 223224 (0.0027) [2025-01-04 14:32:23,968][134211] Fps is (10 sec: 15973.8, 60 sec: 13312.1, 300 sec: 13801.4). Total num frames: 914333696. Throughput: 0: 3373.6. Samples: 217751634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:23,969][134211] Avg episode reward: [(0, '9.489')] [2025-01-04 14:32:26,074][134294] Updated weights for policy 0, policy_version 223234 (0.0025) [2025-01-04 14:32:28,968][134211] Fps is (10 sec: 13107.5, 60 sec: 13380.3, 300 sec: 13704.3). Total num frames: 914399232. Throughput: 0: 3408.4. Samples: 217771424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:28,968][134211] Avg episode reward: [(0, '10.121')] [2025-01-04 14:32:29,301][134294] Updated weights for policy 0, policy_version 223244 (0.0028) [2025-01-04 14:32:32,391][134294] Updated weights for policy 0, policy_version 223254 (0.0027) [2025-01-04 14:32:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.5, 300 sec: 13676.5). Total num frames: 914468864. Throughput: 0: 3415.4. Samples: 217781262. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:33,968][134211] Avg episode reward: [(0, '9.720')] [2025-01-04 14:32:35,381][134294] Updated weights for policy 0, policy_version 223264 (0.0025) [2025-01-04 14:32:38,337][134294] Updated weights for policy 0, policy_version 223274 (0.0022) [2025-01-04 14:32:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 13676.5). Total num frames: 914534400. Throughput: 0: 3436.4. Samples: 217801940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:32:38,968][134211] Avg episode reward: [(0, '9.294')] [2025-01-04 14:32:41,450][134294] Updated weights for policy 0, policy_version 223284 (0.0028) [2025-01-04 14:32:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.0, 300 sec: 13690.4). Total num frames: 914604032. Throughput: 0: 3442.6. Samples: 217821794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:32:43,969][134211] Avg episode reward: [(0, '9.689')] [2025-01-04 14:32:44,497][134294] Updated weights for policy 0, policy_version 223294 (0.0023) [2025-01-04 14:32:47,537][134294] Updated weights for policy 0, policy_version 223304 (0.0026) [2025-01-04 14:32:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.3, 300 sec: 13690.4). Total num frames: 914669568. Throughput: 0: 3448.1. Samples: 217831768. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:32:48,968][134211] Avg episode reward: [(0, '8.722')] [2025-01-04 14:32:50,621][134294] Updated weights for policy 0, policy_version 223314 (0.0027) [2025-01-04 14:32:53,532][134294] Updated weights for policy 0, policy_version 223324 (0.0028) [2025-01-04 14:32:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13653.3, 300 sec: 13704.2). Total num frames: 914739200. Throughput: 0: 3464.0. Samples: 217852288. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:32:53,968][134211] Avg episode reward: [(0, '10.255')] [2025-01-04 14:32:56,580][134294] Updated weights for policy 0, policy_version 223334 (0.0028) [2025-01-04 14:32:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13721.7, 300 sec: 13690.4). Total num frames: 914804736. Throughput: 0: 3557.0. Samples: 217872548. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:32:58,968][134211] Avg episode reward: [(0, '9.600')] [2025-01-04 14:32:59,662][134294] Updated weights for policy 0, policy_version 223344 (0.0028) [2025-01-04 14:33:02,788][134294] Updated weights for policy 0, policy_version 223354 (0.0025) [2025-01-04 14:33:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13789.9, 300 sec: 13690.4). Total num frames: 914874368. Throughput: 0: 3423.4. Samples: 217882526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:03,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 14:33:05,775][134294] Updated weights for policy 0, policy_version 223364 (0.0027) [2025-01-04 14:33:08,649][134294] Updated weights for policy 0, policy_version 223374 (0.0025) [2025-01-04 14:33:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.2, 300 sec: 13690.4). Total num frames: 914939904. Throughput: 0: 3365.1. Samples: 217903062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:08,968][134211] Avg episode reward: [(0, '9.816')] [2025-01-04 14:33:11,676][134294] Updated weights for policy 0, policy_version 223384 (0.0027) [2025-01-04 14:33:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.3, 300 sec: 13690.4). Total num frames: 915009536. Throughput: 0: 3377.2. Samples: 217923396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:13,968][134211] Avg episode reward: [(0, '9.347')] [2025-01-04 14:33:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223391_915009536.pth... 
[2025-01-04 14:33:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000222596_911753216.pth [2025-01-04 14:33:14,663][134294] Updated weights for policy 0, policy_version 223394 (0.0021) [2025-01-04 14:33:16,574][134294] Updated weights for policy 0, policy_version 223404 (0.0013) [2025-01-04 14:33:18,490][134294] Updated weights for policy 0, policy_version 223414 (0.0013) [2025-01-04 14:33:18,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14063.0, 300 sec: 13801.5). Total num frames: 915111936. Throughput: 0: 3455.2. Samples: 217936744. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:18,968][134211] Avg episode reward: [(0, '10.018')] [2025-01-04 14:33:20,394][134294] Updated weights for policy 0, policy_version 223424 (0.0013) [2025-01-04 14:33:22,257][134294] Updated weights for policy 0, policy_version 223434 (0.0014) [2025-01-04 14:33:23,968][134211] Fps is (10 sec: 20889.9, 60 sec: 14745.7, 300 sec: 13940.3). Total num frames: 915218432. Throughput: 0: 3719.2. Samples: 217969304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:23,968][134211] Avg episode reward: [(0, '10.052')] [2025-01-04 14:33:24,173][134294] Updated weights for policy 0, policy_version 223444 (0.0015) [2025-01-04 14:33:27,141][134294] Updated weights for policy 0, policy_version 223454 (0.0026) [2025-01-04 14:33:28,968][134211] Fps is (10 sec: 17612.5, 60 sec: 14813.9, 300 sec: 13857.0). Total num frames: 915288064. Throughput: 0: 3808.4. Samples: 217993172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:28,968][134211] Avg episode reward: [(0, '9.219')] [2025-01-04 14:33:30,495][134294] Updated weights for policy 0, policy_version 223464 (0.0029) [2025-01-04 14:33:33,594][134294] Updated weights for policy 0, policy_version 223474 (0.0025) [2025-01-04 14:33:33,968][134211] Fps is (10 sec: 13515.8, 60 sec: 14745.5, 300 sec: 13815.3). Total num frames: 915353600. 
Throughput: 0: 3795.8. Samples: 218002580. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:33,969][134211] Avg episode reward: [(0, '10.381')] [2025-01-04 14:33:36,726][134294] Updated weights for policy 0, policy_version 223484 (0.0027) [2025-01-04 14:33:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14745.6, 300 sec: 13815.3). Total num frames: 915419136. Throughput: 0: 3780.1. Samples: 218022394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:38,968][134211] Avg episode reward: [(0, '9.199')] [2025-01-04 14:33:39,857][134294] Updated weights for policy 0, policy_version 223494 (0.0026) [2025-01-04 14:33:42,910][134294] Updated weights for policy 0, policy_version 223504 (0.0023) [2025-01-04 14:33:43,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14677.4, 300 sec: 13815.4). Total num frames: 915484672. Throughput: 0: 3767.6. Samples: 218042092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:43,968][134211] Avg episode reward: [(0, '9.515')] [2025-01-04 14:33:45,947][134294] Updated weights for policy 0, policy_version 223514 (0.0026) [2025-01-04 14:33:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14677.3, 300 sec: 13801.5). Total num frames: 915550208. Throughput: 0: 3772.3. Samples: 218052282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:33:48,968][134211] Avg episode reward: [(0, '10.094')] [2025-01-04 14:33:48,985][134294] Updated weights for policy 0, policy_version 223524 (0.0027) [2025-01-04 14:33:51,920][134294] Updated weights for policy 0, policy_version 223534 (0.0023) [2025-01-04 14:33:53,969][134211] Fps is (10 sec: 13515.6, 60 sec: 14677.1, 300 sec: 13801.4). Total num frames: 915619840. Throughput: 0: 3773.8. Samples: 218072886. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:33:53,969][134211] Avg episode reward: [(0, '10.320')] [2025-01-04 14:33:55,044][134294] Updated weights for policy 0, policy_version 223544 (0.0027) [2025-01-04 14:33:58,066][134294] Updated weights for policy 0, policy_version 223554 (0.0026) [2025-01-04 14:33:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.4, 300 sec: 13801.4). Total num frames: 915685376. Throughput: 0: 3768.6. Samples: 218092982. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:33:58,968][134211] Avg episode reward: [(0, '9.265')] [2025-01-04 14:34:01,051][134294] Updated weights for policy 0, policy_version 223564 (0.0024) [2025-01-04 14:34:03,968][134211] Fps is (10 sec: 13518.0, 60 sec: 14677.3, 300 sec: 13801.4). Total num frames: 915755008. Throughput: 0: 3699.0. Samples: 218103198. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:03,969][134211] Avg episode reward: [(0, '9.775')] [2025-01-04 14:34:04,237][134294] Updated weights for policy 0, policy_version 223574 (0.0028) [2025-01-04 14:34:07,176][134294] Updated weights for policy 0, policy_version 223584 (0.0026) [2025-01-04 14:34:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 13787.6). Total num frames: 915820544. Throughput: 0: 3420.3. Samples: 218123218. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:08,968][134211] Avg episode reward: [(0, '10.621')] [2025-01-04 14:34:10,163][134294] Updated weights for policy 0, policy_version 223594 (0.0022) [2025-01-04 14:34:13,186][134294] Updated weights for policy 0, policy_version 223604 (0.0025) [2025-01-04 14:34:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 13787.6). Total num frames: 915890176. Throughput: 0: 3343.5. Samples: 218143632. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:13,968][134211] Avg episode reward: [(0, '9.993')] [2025-01-04 14:34:16,111][134294] Updated weights for policy 0, policy_version 223614 (0.0025) [2025-01-04 14:34:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 13676.5). Total num frames: 915959808. Throughput: 0: 3365.4. Samples: 218154022. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:18,968][134211] Avg episode reward: [(0, '9.861')] [2025-01-04 14:34:19,306][134294] Updated weights for policy 0, policy_version 223624 (0.0025) [2025-01-04 14:34:22,196][134294] Updated weights for policy 0, policy_version 223634 (0.0023) [2025-01-04 14:34:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.5, 300 sec: 13676.6). Total num frames: 916025344. Throughput: 0: 3373.1. Samples: 218174184. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:23,968][134211] Avg episode reward: [(0, '9.608')] [2025-01-04 14:34:25,203][134294] Updated weights for policy 0, policy_version 223644 (0.0026) [2025-01-04 14:34:28,203][134294] Updated weights for policy 0, policy_version 223654 (0.0025) [2025-01-04 14:34:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13448.6, 300 sec: 13732.0). Total num frames: 916094976. Throughput: 0: 3395.4. Samples: 218194886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:28,968][134211] Avg episode reward: [(0, '9.638')] [2025-01-04 14:34:31,083][134294] Updated weights for policy 0, policy_version 223664 (0.0025) [2025-01-04 14:34:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13516.9, 300 sec: 13801.4). Total num frames: 916164608. Throughput: 0: 3401.5. Samples: 218205348. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:33,968][134211] Avg episode reward: [(0, '8.798')] [2025-01-04 14:34:34,299][134294] Updated weights for policy 0, policy_version 223674 (0.0025) [2025-01-04 14:34:37,222][134294] Updated weights for policy 0, policy_version 223684 (0.0022) [2025-01-04 14:34:38,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13516.8, 300 sec: 13690.4). Total num frames: 916230144. Throughput: 0: 3393.0. Samples: 218225566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:38,968][134211] Avg episode reward: [(0, '9.585')] [2025-01-04 14:34:40,219][134294] Updated weights for policy 0, policy_version 223694 (0.0026) [2025-01-04 14:34:43,169][134294] Updated weights for policy 0, policy_version 223704 (0.0025) [2025-01-04 14:34:43,969][134211] Fps is (10 sec: 13515.4, 60 sec: 13584.8, 300 sec: 13579.2). Total num frames: 916299776. Throughput: 0: 3404.7. Samples: 218246196. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:43,969][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 14:34:46,072][134294] Updated weights for policy 0, policy_version 223714 (0.0027) [2025-01-04 14:34:48,133][134294] Updated weights for policy 0, policy_version 223724 (0.0015) [2025-01-04 14:34:48,968][134211] Fps is (10 sec: 15564.8, 60 sec: 13926.4, 300 sec: 13593.2). Total num frames: 916385792. Throughput: 0: 3408.1. Samples: 218256562. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:48,968][134211] Avg episode reward: [(0, '9.264')] [2025-01-04 14:34:50,990][134294] Updated weights for policy 0, policy_version 223734 (0.0022) [2025-01-04 14:34:53,968][134211] Fps is (10 sec: 15156.8, 60 sec: 13858.4, 300 sec: 13648.7). Total num frames: 916451328. Throughput: 0: 3501.2. Samples: 218280772. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:53,968][134211] Avg episode reward: [(0, '10.507')] [2025-01-04 14:34:54,150][134294] Updated weights for policy 0, policy_version 223744 (0.0024) [2025-01-04 14:34:57,053][134294] Updated weights for policy 0, policy_version 223754 (0.0025) [2025-01-04 14:34:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13858.1, 300 sec: 13690.4). Total num frames: 916516864. Throughput: 0: 3492.0. Samples: 218300770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:34:58,968][134211] Avg episode reward: [(0, '10.449')] [2025-01-04 14:35:00,156][134294] Updated weights for policy 0, policy_version 223764 (0.0029) [2025-01-04 14:35:03,161][134294] Updated weights for policy 0, policy_version 223774 (0.0023) [2025-01-04 14:35:03,969][134211] Fps is (10 sec: 13514.6, 60 sec: 13857.8, 300 sec: 13759.7). Total num frames: 916586496. Throughput: 0: 3490.5. Samples: 218311100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:03,970][134211] Avg episode reward: [(0, '10.093')] [2025-01-04 14:35:06,112][134294] Updated weights for policy 0, policy_version 223784 (0.0025) [2025-01-04 14:35:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.4, 300 sec: 13801.5). Total num frames: 916656128. Throughput: 0: 3493.5. Samples: 218331390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:08,968][134211] Avg episode reward: [(0, '9.481')] [2025-01-04 14:35:09,260][134294] Updated weights for policy 0, policy_version 223794 (0.0027) [2025-01-04 14:35:11,777][134294] Updated weights for policy 0, policy_version 223804 (0.0018) [2025-01-04 14:35:13,968][134211] Fps is (10 sec: 15157.8, 60 sec: 14131.2, 300 sec: 13898.6). Total num frames: 916738048. Throughput: 0: 3571.0. Samples: 218355582. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:13,968][134211] Avg episode reward: [(0, '10.714')] [2025-01-04 14:35:14,020][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223814_916742144.pth... [2025-01-04 14:35:14,021][134294] Updated weights for policy 0, policy_version 223814 (0.0019) [2025-01-04 14:35:14,090][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223003_913420288.pth [2025-01-04 14:35:17,172][134294] Updated weights for policy 0, policy_version 223824 (0.0027) [2025-01-04 14:35:18,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14062.9, 300 sec: 13801.4). Total num frames: 916803584. Throughput: 0: 3563.0. Samples: 218365682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:18,968][134211] Avg episode reward: [(0, '9.944')] [2025-01-04 14:35:20,386][134294] Updated weights for policy 0, policy_version 223834 (0.0029) [2025-01-04 14:35:22,879][134294] Updated weights for policy 0, policy_version 223844 (0.0019) [2025-01-04 14:35:23,968][134211] Fps is (10 sec: 14745.5, 60 sec: 14336.0, 300 sec: 13898.6). Total num frames: 916885504. Throughput: 0: 3564.8. Samples: 218385982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:23,969][134211] Avg episode reward: [(0, '9.515')] [2025-01-04 14:35:25,122][134294] Updated weights for policy 0, policy_version 223854 (0.0017) [2025-01-04 14:35:28,060][134294] Updated weights for policy 0, policy_version 223864 (0.0025) [2025-01-04 14:35:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14336.0, 300 sec: 13926.4). Total num frames: 916955136. Throughput: 0: 3646.5. Samples: 218410284. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:28,968][134211] Avg episode reward: [(0, '9.667')] [2025-01-04 14:35:31,203][134294] Updated weights for policy 0, policy_version 223874 (0.0027) [2025-01-04 14:35:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 13926.4). Total num frames: 917020672. Throughput: 0: 3632.2. Samples: 218420012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:33,968][134211] Avg episode reward: [(0, '10.542')] [2025-01-04 14:35:34,378][134294] Updated weights for policy 0, policy_version 223884 (0.0027) [2025-01-04 14:35:37,350][134294] Updated weights for policy 0, policy_version 223894 (0.0027) [2025-01-04 14:35:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.0, 300 sec: 13829.2). Total num frames: 917090304. Throughput: 0: 3538.9. Samples: 218440022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:38,968][134211] Avg episode reward: [(0, '8.990')] [2025-01-04 14:35:40,564][134294] Updated weights for policy 0, policy_version 223904 (0.0026) [2025-01-04 14:35:43,438][134294] Updated weights for policy 0, policy_version 223914 (0.0023) [2025-01-04 14:35:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14268.0, 300 sec: 13801.4). Total num frames: 917155840. Throughput: 0: 3543.0. Samples: 218460204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:43,968][134211] Avg episode reward: [(0, '9.817')] [2025-01-04 14:35:46,517][134294] Updated weights for policy 0, policy_version 223924 (0.0027) [2025-01-04 14:35:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13926.4, 300 sec: 13857.0). Total num frames: 917221376. Throughput: 0: 3533.9. Samples: 218470120. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:48,968][134211] Avg episode reward: [(0, '8.427')] [2025-01-04 14:35:49,605][134294] Updated weights for policy 0, policy_version 223934 (0.0024) [2025-01-04 14:35:52,693][134294] Updated weights for policy 0, policy_version 223944 (0.0023) [2025-01-04 14:35:53,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13994.7, 300 sec: 13940.3). Total num frames: 917291008. Throughput: 0: 3531.7. Samples: 218490316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:53,968][134211] Avg episode reward: [(0, '10.302')] [2025-01-04 14:35:55,663][134294] Updated weights for policy 0, policy_version 223954 (0.0025) [2025-01-04 14:35:58,537][134294] Updated weights for policy 0, policy_version 223964 (0.0022) [2025-01-04 14:35:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14062.9, 300 sec: 13940.3). Total num frames: 917360640. Throughput: 0: 3454.8. Samples: 218511048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:35:58,968][134211] Avg episode reward: [(0, '11.326')] [2025-01-04 14:36:01,305][134294] Updated weights for policy 0, policy_version 223974 (0.0022) [2025-01-04 14:36:03,190][134294] Updated weights for policy 0, policy_version 223984 (0.0013) [2025-01-04 14:36:03,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14473.0, 300 sec: 13912.5). Total num frames: 917454848. Throughput: 0: 3476.1. Samples: 218522108. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:36:03,968][134211] Avg episode reward: [(0, '10.092')] [2025-01-04 14:36:05,112][134294] Updated weights for policy 0, policy_version 223994 (0.0015) [2025-01-04 14:36:07,959][134294] Updated weights for policy 0, policy_version 224004 (0.0025) [2025-01-04 14:36:08,968][134211] Fps is (10 sec: 17203.5, 60 sec: 14609.1, 300 sec: 13940.3). Total num frames: 917532672. Throughput: 0: 3661.2. Samples: 218550734. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:08,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 14:36:11,114][134294] Updated weights for policy 0, policy_version 224014 (0.0024) [2025-01-04 14:36:13,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14336.0, 300 sec: 13981.9). Total num frames: 917598208. Throughput: 0: 3554.2. Samples: 218570222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:13,968][134211] Avg episode reward: [(0, '9.369')] [2025-01-04 14:36:14,350][134294] Updated weights for policy 0, policy_version 224024 (0.0026) [2025-01-04 14:36:17,397][134294] Updated weights for policy 0, policy_version 224034 (0.0029) [2025-01-04 14:36:18,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14336.0, 300 sec: 13995.8). Total num frames: 917663744. Throughput: 0: 3553.3. Samples: 218579910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:18,968][134211] Avg episode reward: [(0, '9.463')] [2025-01-04 14:36:20,410][134294] Updated weights for policy 0, policy_version 224044 (0.0024) [2025-01-04 14:36:23,392][134294] Updated weights for policy 0, policy_version 224054 (0.0026) [2025-01-04 14:36:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14062.9, 300 sec: 14009.7). Total num frames: 917729280. Throughput: 0: 3568.6. Samples: 218600608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:23,968][134211] Avg episode reward: [(0, '11.709')] [2025-01-04 14:36:26,323][134294] Updated weights for policy 0, policy_version 224064 (0.0023) [2025-01-04 14:36:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14023.6). Total num frames: 917798912. Throughput: 0: 3567.3. Samples: 218620732. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:28,969][134211] Avg episode reward: [(0, '11.133')] [2025-01-04 14:36:29,420][134294] Updated weights for policy 0, policy_version 224074 (0.0026) [2025-01-04 14:36:32,406][134294] Updated weights for policy 0, policy_version 224084 (0.0030) [2025-01-04 14:36:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14051.4). Total num frames: 917868544. Throughput: 0: 3571.6. Samples: 218630842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:33,968][134211] Avg episode reward: [(0, '9.756')] [2025-01-04 14:36:35,392][134294] Updated weights for policy 0, policy_version 224094 (0.0026) [2025-01-04 14:36:38,355][134294] Updated weights for policy 0, policy_version 224104 (0.0024) [2025-01-04 14:36:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14131.2, 300 sec: 14065.3). Total num frames: 917938176. Throughput: 0: 3585.4. Samples: 218651660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:38,968][134211] Avg episode reward: [(0, '9.990')] [2025-01-04 14:36:41,356][134294] Updated weights for policy 0, policy_version 224114 (0.0026) [2025-01-04 14:36:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14131.2, 300 sec: 14079.1). Total num frames: 918003712. Throughput: 0: 3576.6. Samples: 218671994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:43,968][134211] Avg episode reward: [(0, '10.147')] [2025-01-04 14:36:44,448][134294] Updated weights for policy 0, policy_version 224124 (0.0024) [2025-01-04 14:36:47,414][134294] Updated weights for policy 0, policy_version 224134 (0.0024) [2025-01-04 14:36:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14199.5, 300 sec: 14079.1). Total num frames: 918073344. Throughput: 0: 3556.2. Samples: 218682138. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:48,968][134211] Avg episode reward: [(0, '9.656')] [2025-01-04 14:36:50,496][134294] Updated weights for policy 0, policy_version 224144 (0.0025) [2025-01-04 14:36:52,406][134294] Updated weights for policy 0, policy_version 224154 (0.0014) [2025-01-04 14:36:53,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14472.5, 300 sec: 14162.5). Total num frames: 918159360. Throughput: 0: 3443.4. Samples: 218705686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:53,968][134211] Avg episode reward: [(0, '9.570')] [2025-01-04 14:36:54,897][134294] Updated weights for policy 0, policy_version 224164 (0.0020) [2025-01-04 14:36:57,907][134294] Updated weights for policy 0, policy_version 224174 (0.0025) [2025-01-04 14:36:58,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14472.5, 300 sec: 14176.3). Total num frames: 918228992. Throughput: 0: 3509.5. Samples: 218728152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:36:58,968][134211] Avg episode reward: [(0, '8.759')] [2025-01-04 14:37:00,973][134294] Updated weights for policy 0, policy_version 224184 (0.0025) [2025-01-04 14:37:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13994.6, 300 sec: 14190.2). Total num frames: 918294528. Throughput: 0: 3519.3. Samples: 218738278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:37:03,969][134211] Avg episode reward: [(0, '10.539')] [2025-01-04 14:37:03,985][134294] Updated weights for policy 0, policy_version 224194 (0.0031) [2025-01-04 14:37:07,011][134294] Updated weights for policy 0, policy_version 224204 (0.0026) [2025-01-04 14:37:08,968][134211] Fps is (10 sec: 13517.1, 60 sec: 13858.1, 300 sec: 14204.1). Total num frames: 918364160. Throughput: 0: 3511.2. Samples: 218758612. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:37:08,968][134211] Avg episode reward: [(0, '10.778')] [2025-01-04 14:37:10,148][134294] Updated weights for policy 0, policy_version 224214 (0.0020) [2025-01-04 14:37:13,567][134294] Updated weights for policy 0, policy_version 224224 (0.0027) [2025-01-04 14:37:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13789.9, 300 sec: 14093.0). Total num frames: 918425600. Throughput: 0: 3476.7. Samples: 218777182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:37:13,968][134211] Avg episode reward: [(0, '8.713')] [2025-01-04 14:37:13,975][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000224226_918429696.pth... [2025-01-04 14:37:14,027][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223391_915009536.pth [2025-01-04 14:37:15,700][134294] Updated weights for policy 0, policy_version 224234 (0.0015) [2025-01-04 14:37:17,791][134294] Updated weights for policy 0, policy_version 224244 (0.0015) [2025-01-04 14:37:18,967][134211] Fps is (10 sec: 15974.6, 60 sec: 14336.1, 300 sec: 14204.1). Total num frames: 918523904. Throughput: 0: 3565.3. Samples: 218791282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:18,968][134211] Avg episode reward: [(0, '8.966')] [2025-01-04 14:37:19,796][134294] Updated weights for policy 0, policy_version 224254 (0.0014) [2025-01-04 14:37:21,746][134294] Updated weights for policy 0, policy_version 224264 (0.0013) [2025-01-04 14:37:23,603][134294] Updated weights for policy 0, policy_version 224274 (0.0013) [2025-01-04 14:37:23,968][134211] Fps is (10 sec: 20480.4, 60 sec: 15018.7, 300 sec: 14343.0). Total num frames: 918630400. Throughput: 0: 3785.1. Samples: 218821990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:23,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 14:37:26,407][134294] Updated weights for policy 0, policy_version 224284 (0.0023) [2025-01-04 14:37:28,968][134211] Fps is (10 sec: 17612.4, 60 sec: 15018.7, 300 sec: 14342.9). Total num frames: 918700032. Throughput: 0: 3866.8. Samples: 218846000. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:28,968][134211] Avg episode reward: [(0, '9.643')] [2025-01-04 14:37:29,486][134294] Updated weights for policy 0, policy_version 224294 (0.0027) [2025-01-04 14:37:32,660][134294] Updated weights for policy 0, policy_version 224304 (0.0026) [2025-01-04 14:37:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.4, 300 sec: 14342.9). Total num frames: 918765568. Throughput: 0: 3856.5. Samples: 218855678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:33,968][134211] Avg episode reward: [(0, '10.558')] [2025-01-04 14:37:35,665][134294] Updated weights for policy 0, policy_version 224314 (0.0028) [2025-01-04 14:37:38,685][134294] Updated weights for policy 0, policy_version 224324 (0.0026) [2025-01-04 14:37:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14343.0). Total num frames: 918835200. Throughput: 0: 3786.5. Samples: 218876080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:38,968][134211] Avg episode reward: [(0, '9.159')] [2025-01-04 14:37:41,568][134294] Updated weights for policy 0, policy_version 224334 (0.0025) [2025-01-04 14:37:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.4, 300 sec: 14342.9). Total num frames: 918900736. Throughput: 0: 3741.6. Samples: 218896524. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:43,968][134211] Avg episode reward: [(0, '9.847')] [2025-01-04 14:37:44,557][134294] Updated weights for policy 0, policy_version 224344 (0.0025) [2025-01-04 14:37:47,572][134294] Updated weights for policy 0, policy_version 224354 (0.0025) [2025-01-04 14:37:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.4, 300 sec: 14343.0). Total num frames: 918970368. Throughput: 0: 3743.8. Samples: 218906748. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:48,968][134211] Avg episode reward: [(0, '9.279')] [2025-01-04 14:37:50,595][134294] Updated weights for policy 0, policy_version 224364 (0.0025) [2025-01-04 14:37:53,428][134294] Updated weights for policy 0, policy_version 224374 (0.0022) [2025-01-04 14:37:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.3, 300 sec: 14356.8). Total num frames: 919040000. Throughput: 0: 3759.6. Samples: 218927794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:53,968][134211] Avg episode reward: [(0, '9.665')] [2025-01-04 14:37:56,338][134294] Updated weights for policy 0, policy_version 224384 (0.0020) [2025-01-04 14:37:58,968][134211] Fps is (10 sec: 13925.4, 60 sec: 14677.2, 300 sec: 14356.8). Total num frames: 919109632. Throughput: 0: 3806.7. Samples: 218948486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:37:58,969][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 14:37:59,422][134294] Updated weights for policy 0, policy_version 224394 (0.0023) [2025-01-04 14:38:02,449][134294] Updated weights for policy 0, policy_version 224404 (0.0025) [2025-01-04 14:38:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 14370.7). Total num frames: 919179264. Throughput: 0: 3721.1. Samples: 218958734. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:38:03,968][134211] Avg episode reward: [(0, '9.116')] [2025-01-04 14:38:05,349][134294] Updated weights for policy 0, policy_version 224414 (0.0026) [2025-01-04 14:38:08,273][134294] Updated weights for policy 0, policy_version 224424 (0.0024) [2025-01-04 14:38:08,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14745.6, 300 sec: 14370.7). Total num frames: 919248896. Throughput: 0: 3506.3. Samples: 218979774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:38:08,968][134211] Avg episode reward: [(0, '9.711')] [2025-01-04 14:38:11,222][134294] Updated weights for policy 0, policy_version 224434 (0.0025) [2025-01-04 14:38:13,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14813.7, 300 sec: 14245.7). Total num frames: 919314432. Throughput: 0: 3424.8. Samples: 219000118. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:38:13,969][134211] Avg episode reward: [(0, '10.040')] [2025-01-04 14:38:14,262][134294] Updated weights for policy 0, policy_version 224444 (0.0028) [2025-01-04 14:38:17,224][134294] Updated weights for policy 0, policy_version 224454 (0.0023) [2025-01-04 14:38:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14336.0, 300 sec: 14120.8). Total num frames: 919384064. Throughput: 0: 3437.2. Samples: 219010350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:38:18,968][134211] Avg episode reward: [(0, '10.364')] [2025-01-04 14:38:20,169][134294] Updated weights for policy 0, policy_version 224464 (0.0024) [2025-01-04 14:38:23,030][134294] Updated weights for policy 0, policy_version 224474 (0.0023) [2025-01-04 14:38:23,968][134211] Fps is (10 sec: 14337.0, 60 sec: 13789.8, 300 sec: 14134.7). Total num frames: 919457792. Throughput: 0: 3458.5. Samples: 219031714. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:38:23,968][134211] Avg episode reward: [(0, '9.131')] [2025-01-04 14:38:25,989][134294] Updated weights for policy 0, policy_version 224484 (0.0026) [2025-01-04 14:38:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13721.6, 300 sec: 14134.7). Total num frames: 919523328. Throughput: 0: 3463.2. Samples: 219052368. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:28,968][134211] Avg episode reward: [(0, '11.381')] [2025-01-04 14:38:28,999][134294] Updated weights for policy 0, policy_version 224494 (0.0024) [2025-01-04 14:38:31,837][134294] Updated weights for policy 0, policy_version 224504 (0.0023) [2025-01-04 14:38:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.8, 300 sec: 14148.5). Total num frames: 919592960. Throughput: 0: 3466.4. Samples: 219062736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:33,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 14:38:35,085][134294] Updated weights for policy 0, policy_version 224514 (0.0026) [2025-01-04 14:38:37,966][134294] Updated weights for policy 0, policy_version 224524 (0.0023) [2025-01-04 14:38:38,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13789.7, 300 sec: 14162.4). Total num frames: 919662592. Throughput: 0: 3446.2. Samples: 219082874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:38,969][134211] Avg episode reward: [(0, '9.248')] [2025-01-04 14:38:40,928][134294] Updated weights for policy 0, policy_version 224534 (0.0024) [2025-01-04 14:38:43,797][134294] Updated weights for policy 0, policy_version 224544 (0.0023) [2025-01-04 14:38:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13858.2, 300 sec: 14176.3). Total num frames: 919732224. Throughput: 0: 3458.8. Samples: 219104130. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:43,968][134211] Avg episode reward: [(0, '10.814')] [2025-01-04 14:38:46,763][134294] Updated weights for policy 0, policy_version 224554 (0.0026) [2025-01-04 14:38:48,968][134211] Fps is (10 sec: 13927.2, 60 sec: 13858.1, 300 sec: 14176.4). Total num frames: 919801856. Throughput: 0: 3462.5. Samples: 219114546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:48,968][134211] Avg episode reward: [(0, '10.278')] [2025-01-04 14:38:49,657][134294] Updated weights for policy 0, policy_version 224564 (0.0022) [2025-01-04 14:38:52,658][134294] Updated weights for policy 0, policy_version 224574 (0.0024) [2025-01-04 14:38:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13858.1, 300 sec: 14190.2). Total num frames: 919871488. Throughput: 0: 3454.5. Samples: 219135228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:53,968][134211] Avg episode reward: [(0, '9.766')] [2025-01-04 14:38:55,169][134294] Updated weights for policy 0, policy_version 224584 (0.0016) [2025-01-04 14:38:57,027][134294] Updated weights for policy 0, policy_version 224594 (0.0013) [2025-01-04 14:38:58,884][134294] Updated weights for policy 0, policy_version 224604 (0.0013) [2025-01-04 14:38:58,968][134211] Fps is (10 sec: 17613.1, 60 sec: 14472.7, 300 sec: 14315.2). Total num frames: 919977984. Throughput: 0: 3636.5. Samples: 219163756. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:38:58,968][134211] Avg episode reward: [(0, '10.291')] [2025-01-04 14:39:00,783][134294] Updated weights for policy 0, policy_version 224614 (0.0015) [2025-01-04 14:39:02,694][134294] Updated weights for policy 0, policy_version 224624 (0.0013) [2025-01-04 14:39:03,968][134211] Fps is (10 sec: 21299.6, 60 sec: 15087.0, 300 sec: 14454.0). Total num frames: 920084480. Throughput: 0: 3771.3. Samples: 219180060. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:03,968][134211] Avg episode reward: [(0, '9.592')] [2025-01-04 14:39:04,773][134294] Updated weights for policy 0, policy_version 224634 (0.0014) [2025-01-04 14:39:07,814][134294] Updated weights for policy 0, policy_version 224644 (0.0026) [2025-01-04 14:39:08,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15087.0, 300 sec: 14454.0). Total num frames: 920154112. Throughput: 0: 3889.5. Samples: 219206740. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:08,968][134211] Avg episode reward: [(0, '10.778')] [2025-01-04 14:39:10,878][134294] Updated weights for policy 0, policy_version 224654 (0.0024) [2025-01-04 14:39:13,969][134211] Fps is (10 sec: 13515.3, 60 sec: 15086.9, 300 sec: 14440.1). Total num frames: 920219648. Throughput: 0: 3874.8. Samples: 219226738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:13,969][134211] Avg episode reward: [(0, '10.701')] [2025-01-04 14:39:13,976][134294] Updated weights for policy 0, policy_version 224664 (0.0024) [2025-01-04 14:39:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000224664_920223744.pth... [2025-01-04 14:39:14,065][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000223814_916742144.pth [2025-01-04 14:39:17,041][134294] Updated weights for policy 0, policy_version 224674 (0.0024) [2025-01-04 14:39:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 14454.0). Total num frames: 920289280. Throughput: 0: 3860.6. Samples: 219236464. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:18,968][134211] Avg episode reward: [(0, '9.712')] [2025-01-04 14:39:20,146][134294] Updated weights for policy 0, policy_version 224684 (0.0022) [2025-01-04 14:39:23,073][134294] Updated weights for policy 0, policy_version 224694 (0.0023) [2025-01-04 14:39:23,968][134211] Fps is (10 sec: 13517.8, 60 sec: 14950.4, 300 sec: 14440.1). Total num frames: 920354816. Throughput: 0: 3866.8. Samples: 219256878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:23,969][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 14:39:26,026][134294] Updated weights for policy 0, policy_version 224704 (0.0029) [2025-01-04 14:39:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14440.1). Total num frames: 920424448. Throughput: 0: 3852.3. Samples: 219277484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:28,968][134211] Avg episode reward: [(0, '7.915')] [2025-01-04 14:39:29,074][134294] Updated weights for policy 0, policy_version 224714 (0.0027) [2025-01-04 14:39:32,033][134294] Updated weights for policy 0, policy_version 224724 (0.0022) [2025-01-04 14:39:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15018.7, 300 sec: 14454.0). Total num frames: 920494080. Throughput: 0: 3846.7. Samples: 219287648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:33,968][134211] Avg episode reward: [(0, '9.517')] [2025-01-04 14:39:34,920][134294] Updated weights for policy 0, policy_version 224734 (0.0024) [2025-01-04 14:39:37,769][134294] Updated weights for policy 0, policy_version 224744 (0.0024) [2025-01-04 14:39:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.8, 300 sec: 14454.1). Total num frames: 920563712. Throughput: 0: 3860.4. Samples: 219308946. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:38,968][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 14:39:40,743][134294] Updated weights for policy 0, policy_version 224754 (0.0026) [2025-01-04 14:39:43,650][134294] Updated weights for policy 0, policy_version 224764 (0.0023) [2025-01-04 14:39:43,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15086.9, 300 sec: 14412.4). Total num frames: 920637440. Throughput: 0: 3696.8. Samples: 219330112. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:43,968][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 14:39:46,483][134294] Updated weights for policy 0, policy_version 224774 (0.0024) [2025-01-04 14:39:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 15086.9, 300 sec: 14426.2). Total num frames: 920707072. Throughput: 0: 3566.3. Samples: 219340546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:48,969][134211] Avg episode reward: [(0, '9.450')] [2025-01-04 14:39:49,518][134294] Updated weights for policy 0, policy_version 224784 (0.0025) [2025-01-04 14:39:52,467][134294] Updated weights for policy 0, policy_version 224794 (0.0029) [2025-01-04 14:39:53,969][134211] Fps is (10 sec: 13515.3, 60 sec: 15018.4, 300 sec: 14426.2). Total num frames: 920772608. Throughput: 0: 3432.1. Samples: 219361188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:53,969][134211] Avg episode reward: [(0, '10.242')] [2025-01-04 14:39:55,397][134294] Updated weights for policy 0, policy_version 224804 (0.0024) [2025-01-04 14:39:58,304][134294] Updated weights for policy 0, policy_version 224814 (0.0023) [2025-01-04 14:39:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14440.2). Total num frames: 920846336. Throughput: 0: 3459.9. Samples: 219382430. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:39:58,968][134211] Avg episode reward: [(0, '9.102')] [2025-01-04 14:40:01,163][134294] Updated weights for policy 0, policy_version 224824 (0.0025) [2025-01-04 14:40:03,968][134211] Fps is (10 sec: 14337.5, 60 sec: 13858.1, 300 sec: 14440.1). Total num frames: 920915968. Throughput: 0: 3478.0. Samples: 219392974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:03,968][134211] Avg episode reward: [(0, '10.865')] [2025-01-04 14:40:04,216][134294] Updated weights for policy 0, policy_version 224834 (0.0027) [2025-01-04 14:40:07,137][134294] Updated weights for policy 0, policy_version 224844 (0.0025) [2025-01-04 14:40:08,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13858.1, 300 sec: 14398.5). Total num frames: 920985600. Throughput: 0: 3480.0. Samples: 219413476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:08,968][134211] Avg episode reward: [(0, '10.199')] [2025-01-04 14:40:10,097][134294] Updated weights for policy 0, policy_version 224854 (0.0024) [2025-01-04 14:40:13,032][134294] Updated weights for policy 0, policy_version 224864 (0.0023) [2025-01-04 14:40:13,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13926.5, 300 sec: 14412.3). Total num frames: 921055232. Throughput: 0: 3491.5. Samples: 219434602. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:13,969][134211] Avg episode reward: [(0, '9.684')] [2025-01-04 14:40:15,818][134294] Updated weights for policy 0, policy_version 224874 (0.0025) [2025-01-04 14:40:18,766][134294] Updated weights for policy 0, policy_version 224884 (0.0025) [2025-01-04 14:40:18,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13926.4, 300 sec: 14370.7). Total num frames: 921124864. Throughput: 0: 3503.0. Samples: 219445282. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:18,968][134211] Avg episode reward: [(0, '10.027')] [2025-01-04 14:40:21,636][134294] Updated weights for policy 0, policy_version 224894 (0.0025) [2025-01-04 14:40:23,968][134211] Fps is (10 sec: 13927.3, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 921194496. Throughput: 0: 3500.2. Samples: 219466454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:23,968][134211] Avg episode reward: [(0, '10.315')] [2025-01-04 14:40:24,685][134294] Updated weights for policy 0, policy_version 224904 (0.0023) [2025-01-04 14:40:27,409][134294] Updated weights for policy 0, policy_version 224914 (0.0023) [2025-01-04 14:40:28,968][134211] Fps is (10 sec: 15565.1, 60 sec: 14267.8, 300 sec: 14440.1). Total num frames: 921280512. Throughput: 0: 3536.6. Samples: 219489260. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:28,968][134211] Avg episode reward: [(0, '9.418')] [2025-01-04 14:40:29,320][134294] Updated weights for policy 0, policy_version 224924 (0.0011) [2025-01-04 14:40:31,167][134294] Updated weights for policy 0, policy_version 224934 (0.0012) [2025-01-04 14:40:33,064][134294] Updated weights for policy 0, policy_version 224944 (0.0014) [2025-01-04 14:40:33,968][134211] Fps is (10 sec: 19251.6, 60 sec: 14882.2, 300 sec: 14565.1). Total num frames: 921387008. Throughput: 0: 3665.8. Samples: 219505506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:33,968][134211] Avg episode reward: [(0, '9.476')] [2025-01-04 14:40:34,951][134294] Updated weights for policy 0, policy_version 224954 (0.0015) [2025-01-04 14:40:36,865][134294] Updated weights for policy 0, policy_version 224964 (0.0016) [2025-01-04 14:40:38,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15291.8, 300 sec: 14662.3). Total num frames: 921481216. Throughput: 0: 3921.4. Samples: 219537648. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:40:38,968][134211] Avg episode reward: [(0, '9.402')] [2025-01-04 14:40:39,739][134294] Updated weights for policy 0, policy_version 224974 (0.0025) [2025-01-04 14:40:43,070][134294] Updated weights for policy 0, policy_version 224984 (0.0025) [2025-01-04 14:40:43,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15086.9, 300 sec: 14648.4). Total num frames: 921542656. Throughput: 0: 3885.7. Samples: 219557286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:40:43,968][134211] Avg episode reward: [(0, '9.890')] [2025-01-04 14:40:46,046][134294] Updated weights for policy 0, policy_version 224994 (0.0023) [2025-01-04 14:40:48,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15087.0, 300 sec: 14648.4). Total num frames: 921612288. Throughput: 0: 3876.0. Samples: 219567394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:40:48,968][134211] Avg episode reward: [(0, '9.285')] [2025-01-04 14:40:49,277][134294] Updated weights for policy 0, policy_version 225004 (0.0027) [2025-01-04 14:40:52,155][134294] Updated weights for policy 0, policy_version 225014 (0.0025) [2025-01-04 14:40:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15087.2, 300 sec: 14634.5). Total num frames: 921677824. Throughput: 0: 3866.8. Samples: 219587482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:40:53,968][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 14:40:55,189][134294] Updated weights for policy 0, policy_version 225024 (0.0026) [2025-01-04 14:40:58,091][134294] Updated weights for policy 0, policy_version 225034 (0.0023) [2025-01-04 14:40:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14551.2). Total num frames: 921747456. Throughput: 0: 3858.0. Samples: 219608208. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:40:58,968][134211] Avg episode reward: [(0, '9.827')] [2025-01-04 14:41:01,036][134294] Updated weights for policy 0, policy_version 225044 (0.0026) [2025-01-04 14:41:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.7, 300 sec: 14523.4). Total num frames: 921817088. Throughput: 0: 3858.7. Samples: 219618922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:03,968][134211] Avg episode reward: [(0, '11.000')] [2025-01-04 14:41:04,118][134294] Updated weights for policy 0, policy_version 225054 (0.0027) [2025-01-04 14:41:06,980][134294] Updated weights for policy 0, policy_version 225064 (0.0026) [2025-01-04 14:41:08,968][134211] Fps is (10 sec: 13925.8, 60 sec: 15018.6, 300 sec: 14537.3). Total num frames: 921886720. Throughput: 0: 3841.0. Samples: 219639298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:08,969][134211] Avg episode reward: [(0, '10.972')] [2025-01-04 14:41:09,921][134294] Updated weights for policy 0, policy_version 225074 (0.0027) [2025-01-04 14:41:12,840][134294] Updated weights for policy 0, policy_version 225084 (0.0026) [2025-01-04 14:41:13,968][134211] Fps is (10 sec: 14335.7, 60 sec: 15087.1, 300 sec: 14565.1). Total num frames: 921960448. Throughput: 0: 3805.4. Samples: 219660504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:13,969][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 14:41:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225088_921960448.pth... 
[2025-01-04 14:41:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000224226_918429696.pth [2025-01-04 14:41:15,690][134294] Updated weights for policy 0, policy_version 225094 (0.0023) [2025-01-04 14:41:18,578][134294] Updated weights for policy 0, policy_version 225104 (0.0025) [2025-01-04 14:41:18,968][134211] Fps is (10 sec: 14336.4, 60 sec: 15086.9, 300 sec: 14579.0). Total num frames: 922030080. Throughput: 0: 3676.0. Samples: 219670926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:18,968][134211] Avg episode reward: [(0, '9.441')] [2025-01-04 14:41:21,597][134294] Updated weights for policy 0, policy_version 225114 (0.0027) [2025-01-04 14:41:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15018.6, 300 sec: 14565.1). Total num frames: 922095616. Throughput: 0: 3426.8. Samples: 219691854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:23,969][134211] Avg episode reward: [(0, '9.617')] [2025-01-04 14:41:24,630][134294] Updated weights for policy 0, policy_version 225124 (0.0024) [2025-01-04 14:41:27,621][134294] Updated weights for policy 0, policy_version 225134 (0.0024) [2025-01-04 14:41:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 922165248. Throughput: 0: 3446.1. Samples: 219712358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:28,968][134211] Avg episode reward: [(0, '9.519')] [2025-01-04 14:41:30,553][134294] Updated weights for policy 0, policy_version 225144 (0.0023) [2025-01-04 14:41:33,369][134294] Updated weights for policy 0, policy_version 225154 (0.0026) [2025-01-04 14:41:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14199.4, 300 sec: 14579.0). Total num frames: 922238976. Throughput: 0: 3458.3. Samples: 219723018. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:33,968][134211] Avg episode reward: [(0, '9.714')] [2025-01-04 14:41:36,245][134294] Updated weights for policy 0, policy_version 225164 (0.0026) [2025-01-04 14:41:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13721.6, 300 sec: 14579.0). Total num frames: 922304512. Throughput: 0: 3482.1. Samples: 219744178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:38,968][134211] Avg episode reward: [(0, '9.632')] [2025-01-04 14:41:39,359][134294] Updated weights for policy 0, policy_version 225174 (0.0024) [2025-01-04 14:41:42,164][134294] Updated weights for policy 0, policy_version 225184 (0.0025) [2025-01-04 14:41:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13858.2, 300 sec: 14579.0). Total num frames: 922374144. Throughput: 0: 3481.4. Samples: 219764870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:43,968][134211] Avg episode reward: [(0, '9.469')] [2025-01-04 14:41:45,131][134294] Updated weights for policy 0, policy_version 225194 (0.0027) [2025-01-04 14:41:48,009][134294] Updated weights for policy 0, policy_version 225204 (0.0024) [2025-01-04 14:41:48,968][134211] Fps is (10 sec: 14336.2, 60 sec: 13926.4, 300 sec: 14537.3). Total num frames: 922447872. Throughput: 0: 3482.1. Samples: 219775618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 14:41:48,968][134211] Avg episode reward: [(0, '9.630')] [2025-01-04 14:41:50,894][134294] Updated weights for policy 0, policy_version 225214 (0.0024) [2025-01-04 14:41:53,858][134294] Updated weights for policy 0, policy_version 225224 (0.0026) [2025-01-04 14:41:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13994.6, 300 sec: 14537.3). Total num frames: 922517504. Throughput: 0: 3501.7. Samples: 219796874. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:41:53,968][134211] Avg episode reward: [(0, '10.130')] [2025-01-04 14:41:56,705][134294] Updated weights for policy 0, policy_version 225234 (0.0024) [2025-01-04 14:41:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13994.6, 300 sec: 14551.2). Total num frames: 922587136. Throughput: 0: 3489.9. Samples: 219817550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:41:58,968][134211] Avg episode reward: [(0, '9.298')] [2025-01-04 14:41:59,766][134294] Updated weights for policy 0, policy_version 225244 (0.0026) [2025-01-04 14:42:02,074][134294] Updated weights for policy 0, policy_version 225254 (0.0014) [2025-01-04 14:42:03,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14336.0, 300 sec: 14620.6). Total num frames: 922677248. Throughput: 0: 3500.4. Samples: 219828442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:03,968][134211] Avg episode reward: [(0, '9.754')] [2025-01-04 14:42:04,011][134294] Updated weights for policy 0, policy_version 225264 (0.0013) [2025-01-04 14:42:05,917][134294] Updated weights for policy 0, policy_version 225274 (0.0014) [2025-01-04 14:42:07,775][134294] Updated weights for policy 0, policy_version 225284 (0.0014) [2025-01-04 14:42:08,968][134211] Fps is (10 sec: 20070.9, 60 sec: 15018.8, 300 sec: 14787.3). Total num frames: 922787840. Throughput: 0: 3752.1. Samples: 219860696. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:08,968][134211] Avg episode reward: [(0, '8.810')] [2025-01-04 14:42:09,710][134294] Updated weights for policy 0, policy_version 225294 (0.0013) [2025-01-04 14:42:12,167][134294] Updated weights for policy 0, policy_version 225304 (0.0020) [2025-01-04 14:42:13,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15087.0, 300 sec: 14717.8). Total num frames: 922865664. Throughput: 0: 3896.2. Samples: 219887688. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:13,968][134211] Avg episode reward: [(0, '9.662')] [2025-01-04 14:42:15,281][134294] Updated weights for policy 0, policy_version 225314 (0.0026) [2025-01-04 14:42:18,320][134294] Updated weights for policy 0, policy_version 225324 (0.0028) [2025-01-04 14:42:18,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15086.9, 300 sec: 14592.9). Total num frames: 922935296. Throughput: 0: 3884.7. Samples: 219897828. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:18,969][134211] Avg episode reward: [(0, '9.533')] [2025-01-04 14:42:21,372][134294] Updated weights for policy 0, policy_version 225334 (0.0028) [2025-01-04 14:42:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15086.9, 300 sec: 14579.0). Total num frames: 923000832. Throughput: 0: 3860.9. Samples: 219917918. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:23,969][134211] Avg episode reward: [(0, '9.568')] [2025-01-04 14:42:24,501][134294] Updated weights for policy 0, policy_version 225344 (0.0026) [2025-01-04 14:42:27,514][134294] Updated weights for policy 0, policy_version 225354 (0.0026) [2025-01-04 14:42:28,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15018.7, 300 sec: 14579.0). Total num frames: 923066368. Throughput: 0: 3845.3. Samples: 219937910. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:28,968][134211] Avg episode reward: [(0, '8.959')] [2025-01-04 14:42:30,530][134294] Updated weights for policy 0, policy_version 225364 (0.0028) [2025-01-04 14:42:33,435][134294] Updated weights for policy 0, policy_version 225374 (0.0023) [2025-01-04 14:42:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.4, 300 sec: 14579.0). Total num frames: 923136000. Throughput: 0: 3839.1. Samples: 219948376. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:33,968][134211] Avg episode reward: [(0, '8.998')] [2025-01-04 14:42:36,379][134294] Updated weights for policy 0, policy_version 225384 (0.0023) [2025-01-04 14:42:38,970][134211] Fps is (10 sec: 13923.4, 60 sec: 15018.2, 300 sec: 14592.8). Total num frames: 923205632. Throughput: 0: 3826.7. Samples: 219969084. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:38,970][134211] Avg episode reward: [(0, '9.180')] [2025-01-04 14:42:39,447][134294] Updated weights for policy 0, policy_version 225394 (0.0025) [2025-01-04 14:42:42,331][134294] Updated weights for policy 0, policy_version 225404 (0.0024) [2025-01-04 14:42:43,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15018.7, 300 sec: 14592.9). Total num frames: 923275264. Throughput: 0: 3824.9. Samples: 219989668. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:43,968][134211] Avg episode reward: [(0, '10.254')] [2025-01-04 14:42:45,356][134294] Updated weights for policy 0, policy_version 225414 (0.0023) [2025-01-04 14:42:48,200][134294] Updated weights for policy 0, policy_version 225424 (0.0023) [2025-01-04 14:42:48,968][134211] Fps is (10 sec: 13929.4, 60 sec: 14950.4, 300 sec: 14592.9). Total num frames: 923344896. Throughput: 0: 3818.6. Samples: 220000280. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:48,968][134211] Avg episode reward: [(0, '10.248')] [2025-01-04 14:42:51,119][134294] Updated weights for policy 0, policy_version 225434 (0.0025) [2025-01-04 14:42:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14592.9). Total num frames: 923414528. Throughput: 0: 3569.6. Samples: 220021330. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 14:42:53,968][134211] Avg episode reward: [(0, '10.249')] [2025-01-04 14:42:54,077][134294] Updated weights for policy 0, policy_version 225444 (0.0024) [2025-01-04 14:42:56,995][134294] Updated weights for policy 0, policy_version 225454 (0.0027) [2025-01-04 14:42:58,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14950.3, 300 sec: 14592.9). Total num frames: 923484160. Throughput: 0: 3430.8. Samples: 220042076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:42:58,969][134211] Avg episode reward: [(0, '11.352')] [2025-01-04 14:42:59,938][134294] Updated weights for policy 0, policy_version 225464 (0.0027) [2025-01-04 14:43:02,929][134294] Updated weights for policy 0, policy_version 225474 (0.0025) [2025-01-04 14:43:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.0, 300 sec: 14592.9). Total num frames: 923553792. Throughput: 0: 3441.2. Samples: 220052682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:03,968][134211] Avg episode reward: [(0, '9.900')] [2025-01-04 14:43:05,834][134294] Updated weights for policy 0, policy_version 225484 (0.0024) [2025-01-04 14:43:08,845][134294] Updated weights for policy 0, policy_version 225494 (0.0022) [2025-01-04 14:43:08,968][134211] Fps is (10 sec: 13926.8, 60 sec: 13926.4, 300 sec: 14606.8). Total num frames: 923623424. Throughput: 0: 3459.2. Samples: 220073582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:08,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 14:43:11,814][134294] Updated weights for policy 0, policy_version 225504 (0.0025) [2025-01-04 14:43:13,969][134211] Fps is (10 sec: 13925.4, 60 sec: 13789.7, 300 sec: 14606.7). Total num frames: 923693056. Throughput: 0: 3471.9. Samples: 220094148. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:13,969][134211] Avg episode reward: [(0, '9.695')] [2025-01-04 14:43:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225511_923693056.pth... [2025-01-04 14:43:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000224664_920223744.pth [2025-01-04 14:43:14,769][134294] Updated weights for policy 0, policy_version 225514 (0.0028) [2025-01-04 14:43:17,768][134294] Updated weights for policy 0, policy_version 225524 (0.0026) [2025-01-04 14:43:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.6, 300 sec: 14579.0). Total num frames: 923758592. Throughput: 0: 3464.9. Samples: 220104294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:18,968][134211] Avg episode reward: [(0, '9.627')] [2025-01-04 14:43:20,683][134294] Updated weights for policy 0, policy_version 225534 (0.0023) [2025-01-04 14:43:23,526][134294] Updated weights for policy 0, policy_version 225544 (0.0024) [2025-01-04 14:43:23,968][134211] Fps is (10 sec: 13927.4, 60 sec: 13858.2, 300 sec: 14606.7). Total num frames: 923832320. Throughput: 0: 3476.6. Samples: 220125522. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:23,968][134211] Avg episode reward: [(0, '10.011')] [2025-01-04 14:43:26,465][134294] Updated weights for policy 0, policy_version 225554 (0.0025) [2025-01-04 14:43:28,968][134211] Fps is (10 sec: 14745.5, 60 sec: 13994.7, 300 sec: 14620.6). Total num frames: 923906048. Throughput: 0: 3482.8. Samples: 220146396. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:28,968][134211] Avg episode reward: [(0, '8.489')] [2025-01-04 14:43:29,150][134294] Updated weights for policy 0, policy_version 225564 (0.0020) [2025-01-04 14:43:31,058][134294] Updated weights for policy 0, policy_version 225574 (0.0012) [2025-01-04 14:43:32,928][134294] Updated weights for policy 0, policy_version 225584 (0.0014) [2025-01-04 14:43:33,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14472.6, 300 sec: 14717.9). Total num frames: 924004352. Throughput: 0: 3589.2. Samples: 220161796. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:33,968][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 14:43:35,761][134294] Updated weights for policy 0, policy_version 225594 (0.0026) [2025-01-04 14:43:38,703][134294] Updated weights for policy 0, policy_version 225604 (0.0025) [2025-01-04 14:43:38,968][134211] Fps is (10 sec: 16792.9, 60 sec: 14473.0, 300 sec: 14717.8). Total num frames: 924073984. Throughput: 0: 3669.6. Samples: 220186462. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:38,969][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 14:43:41,740][134294] Updated weights for policy 0, policy_version 225614 (0.0026) [2025-01-04 14:43:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 924143616. Throughput: 0: 3660.3. Samples: 220206790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:43,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 14:43:44,788][134294] Updated weights for policy 0, policy_version 225624 (0.0026) [2025-01-04 14:43:47,723][134294] Updated weights for policy 0, policy_version 225634 (0.0024) [2025-01-04 14:43:48,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 924213248. Throughput: 0: 3649.3. Samples: 220216900. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:48,968][134211] Avg episode reward: [(0, '9.956')] [2025-01-04 14:43:50,714][134294] Updated weights for policy 0, policy_version 225644 (0.0025) [2025-01-04 14:43:53,554][134294] Updated weights for policy 0, policy_version 225654 (0.0023) [2025-01-04 14:43:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 924282880. Throughput: 0: 3654.0. Samples: 220238012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:53,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 14:43:56,553][134294] Updated weights for policy 0, policy_version 225664 (0.0023) [2025-01-04 14:43:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.6, 300 sec: 14467.9). Total num frames: 924352512. Throughput: 0: 3660.7. Samples: 220258876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:43:58,968][134211] Avg episode reward: [(0, '9.849')] [2025-01-04 14:43:59,502][134294] Updated weights for policy 0, policy_version 225674 (0.0025) [2025-01-04 14:44:02,525][134294] Updated weights for policy 0, policy_version 225684 (0.0025) [2025-01-04 14:44:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 924418048. Throughput: 0: 3662.8. Samples: 220269118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:03,968][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 14:44:05,371][134294] Updated weights for policy 0, policy_version 225694 (0.0024) [2025-01-04 14:44:08,298][134294] Updated weights for policy 0, policy_version 225704 (0.0024) [2025-01-04 14:44:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.6, 300 sec: 14481.8). Total num frames: 924491776. Throughput: 0: 3661.9. Samples: 220290306. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:08,968][134211] Avg episode reward: [(0, '9.863')] [2025-01-04 14:44:11,159][134294] Updated weights for policy 0, policy_version 225714 (0.0025) [2025-01-04 14:44:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14472.7, 300 sec: 14481.8). Total num frames: 924561408. Throughput: 0: 3658.5. Samples: 220311028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:13,973][134211] Avg episode reward: [(0, '10.270')] [2025-01-04 14:44:14,208][134294] Updated weights for policy 0, policy_version 225724 (0.0026) [2025-01-04 14:44:16,952][134294] Updated weights for policy 0, policy_version 225734 (0.0021) [2025-01-04 14:44:18,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 924643328. Throughput: 0: 3545.1. Samples: 220321324. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:18,968][134211] Avg episode reward: [(0, '10.395')] [2025-01-04 14:44:19,002][134294] Updated weights for policy 0, policy_version 225744 (0.0015) [2025-01-04 14:44:21,850][134294] Updated weights for policy 0, policy_version 225754 (0.0023) [2025-01-04 14:44:23,969][134211] Fps is (10 sec: 15153.4, 60 sec: 14677.0, 300 sec: 14537.3). Total num frames: 924712960. Throughput: 0: 3551.8. Samples: 220346296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:23,970][134211] Avg episode reward: [(0, '10.478')] [2025-01-04 14:44:24,968][134294] Updated weights for policy 0, policy_version 225764 (0.0025) [2025-01-04 14:44:27,925][134294] Updated weights for policy 0, policy_version 225774 (0.0023) [2025-01-04 14:44:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.0, 300 sec: 14537.3). Total num frames: 924782592. Throughput: 0: 3554.2. Samples: 220366728. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:28,969][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 14:44:30,852][134294] Updated weights for policy 0, policy_version 225784 (0.0028) [2025-01-04 14:44:33,708][134294] Updated weights for policy 0, policy_version 225794 (0.0024) [2025-01-04 14:44:33,968][134211] Fps is (10 sec: 13928.2, 60 sec: 14131.2, 300 sec: 14537.3). Total num frames: 924852224. Throughput: 0: 3562.9. Samples: 220377232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:33,968][134211] Avg episode reward: [(0, '10.103')] [2025-01-04 14:44:36,640][134294] Updated weights for policy 0, policy_version 225804 (0.0026) [2025-01-04 14:44:38,668][134294] Updated weights for policy 0, policy_version 225814 (0.0013) [2025-01-04 14:44:38,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14404.4, 300 sec: 14579.0). Total num frames: 924938240. Throughput: 0: 3573.6. Samples: 220398826. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:38,968][134211] Avg episode reward: [(0, '10.550')] [2025-01-04 14:44:41,443][134294] Updated weights for policy 0, policy_version 225824 (0.0024) [2025-01-04 14:44:43,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14404.2, 300 sec: 14579.0). Total num frames: 925007872. Throughput: 0: 3656.0. Samples: 220423396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:43,968][134211] Avg episode reward: [(0, '9.985')] [2025-01-04 14:44:44,314][134294] Updated weights for policy 0, policy_version 225834 (0.0028) [2025-01-04 14:44:47,171][134294] Updated weights for policy 0, policy_version 225844 (0.0025) [2025-01-04 14:44:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14592.9). Total num frames: 925077504. Throughput: 0: 3658.0. Samples: 220433730. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:48,968][134211] Avg episode reward: [(0, '10.834')] [2025-01-04 14:44:50,200][134294] Updated weights for policy 0, policy_version 225854 (0.0025) [2025-01-04 14:44:53,055][134294] Updated weights for policy 0, policy_version 225864 (0.0025) [2025-01-04 14:44:53,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 925151232. Throughput: 0: 3656.4. Samples: 220454842. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:53,968][134211] Avg episode reward: [(0, '10.303')] [2025-01-04 14:44:55,983][134294] Updated weights for policy 0, policy_version 225874 (0.0023) [2025-01-04 14:44:58,805][134294] Updated weights for policy 0, policy_version 225884 (0.0025) [2025-01-04 14:44:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 925220864. Throughput: 0: 3671.3. Samples: 220476238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:44:58,968][134211] Avg episode reward: [(0, '9.222')] [2025-01-04 14:45:01,785][134294] Updated weights for policy 0, policy_version 225894 (0.0021) [2025-01-04 14:45:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14540.8, 300 sec: 14592.9). Total num frames: 925290496. Throughput: 0: 3673.6. Samples: 220486634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:03,968][134211] Avg episode reward: [(0, '10.340')] [2025-01-04 14:45:04,921][134294] Updated weights for policy 0, policy_version 225904 (0.0028) [2025-01-04 14:45:07,722][134294] Updated weights for policy 0, policy_version 225914 (0.0023) [2025-01-04 14:45:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.6, 300 sec: 14592.9). Total num frames: 925360128. Throughput: 0: 3572.7. Samples: 220507062. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:08,968][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 14:45:09,967][134294] Updated weights for policy 0, policy_version 225924 (0.0017) [2025-01-04 14:45:12,421][134294] Updated weights for policy 0, policy_version 225934 (0.0020) [2025-01-04 14:45:13,968][134211] Fps is (10 sec: 15564.3, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 925446144. Throughput: 0: 3682.7. Samples: 220532450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:13,969][134211] Avg episode reward: [(0, '9.986')] [2025-01-04 14:45:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225939_925446144.pth... [2025-01-04 14:45:14,050][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225088_921960448.pth [2025-01-04 14:45:15,408][134294] Updated weights for policy 0, policy_version 225944 (0.0026) [2025-01-04 14:45:18,242][134294] Updated weights for policy 0, policy_version 225954 (0.0024) [2025-01-04 14:45:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 925515776. Throughput: 0: 3677.9. Samples: 220542738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:18,968][134211] Avg episode reward: [(0, '11.132')] [2025-01-04 14:45:21,262][134294] Updated weights for policy 0, policy_version 225964 (0.0024) [2025-01-04 14:45:23,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14541.1, 300 sec: 14592.9). Total num frames: 925585408. Throughput: 0: 3658.0. Samples: 220563438. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:23,968][134211] Avg episode reward: [(0, '9.623')] [2025-01-04 14:45:24,301][134294] Updated weights for policy 0, policy_version 225974 (0.0027) [2025-01-04 14:45:27,246][134294] Updated weights for policy 0, policy_version 225984 (0.0027) [2025-01-04 14:45:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14472.6, 300 sec: 14454.0). Total num frames: 925650944. Throughput: 0: 3569.9. Samples: 220584042. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:28,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 14:45:29,781][134294] Updated weights for policy 0, policy_version 225994 (0.0020) [2025-01-04 14:45:31,855][134294] Updated weights for policy 0, policy_version 226004 (0.0018) [2025-01-04 14:45:33,968][134211] Fps is (10 sec: 15563.9, 60 sec: 14813.7, 300 sec: 14440.1). Total num frames: 925741056. Throughput: 0: 3662.8. Samples: 220598560. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:33,969][134211] Avg episode reward: [(0, '10.223')] [2025-01-04 14:45:34,789][134294] Updated weights for policy 0, policy_version 226014 (0.0025) [2025-01-04 14:45:37,872][134294] Updated weights for policy 0, policy_version 226024 (0.0028) [2025-01-04 14:45:38,968][134211] Fps is (10 sec: 15564.0, 60 sec: 14472.4, 300 sec: 14454.0). Total num frames: 925806592. Throughput: 0: 3661.5. Samples: 220619612. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:38,969][134211] Avg episode reward: [(0, '9.579')] [2025-01-04 14:45:40,798][134294] Updated weights for policy 0, policy_version 226034 (0.0025) [2025-01-04 14:45:43,665][134294] Updated weights for policy 0, policy_version 226044 (0.0022) [2025-01-04 14:45:43,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 925876224. Throughput: 0: 3653.0. Samples: 220640624. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:43,968][134211] Avg episode reward: [(0, '9.812')] [2025-01-04 14:45:46,592][134294] Updated weights for policy 0, policy_version 226054 (0.0022) [2025-01-04 14:45:48,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14540.7, 300 sec: 14481.8). Total num frames: 925949952. Throughput: 0: 3653.1. Samples: 220651026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:48,969][134211] Avg episode reward: [(0, '9.625')] [2025-01-04 14:45:49,573][134294] Updated weights for policy 0, policy_version 226064 (0.0024) [2025-01-04 14:45:52,552][134294] Updated weights for policy 0, policy_version 226074 (0.0022) [2025-01-04 14:45:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.2, 300 sec: 14467.9). Total num frames: 926015488. Throughput: 0: 3657.5. Samples: 220671650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:53,968][134211] Avg episode reward: [(0, '9.416')] [2025-01-04 14:45:55,486][134294] Updated weights for policy 0, policy_version 226084 (0.0025) [2025-01-04 14:45:57,376][134294] Updated weights for policy 0, policy_version 226094 (0.0013) [2025-01-04 14:45:58,967][134211] Fps is (10 sec: 16384.6, 60 sec: 14882.2, 300 sec: 14565.1). Total num frames: 926113792. Throughput: 0: 3677.6. Samples: 220697938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:45:58,968][134211] Avg episode reward: [(0, '10.179')] [2025-01-04 14:45:59,241][134294] Updated weights for policy 0, policy_version 226104 (0.0012) [2025-01-04 14:46:01,090][134294] Updated weights for policy 0, policy_version 226114 (0.0013) [2025-01-04 14:46:03,070][134294] Updated weights for policy 0, policy_version 226124 (0.0014) [2025-01-04 14:46:03,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15496.5, 300 sec: 14690.1). Total num frames: 926220288. Throughput: 0: 3807.2. Samples: 220714060. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:46:03,969][134211] Avg episode reward: [(0, '10.814')] [2025-01-04 14:46:05,742][134294] Updated weights for policy 0, policy_version 226134 (0.0023) [2025-01-04 14:46:08,737][134294] Updated weights for policy 0, policy_version 226144 (0.0028) [2025-01-04 14:46:08,968][134211] Fps is (10 sec: 17202.7, 60 sec: 15428.2, 300 sec: 14662.3). Total num frames: 926285824. Throughput: 0: 3910.1. Samples: 220739392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:46:08,968][134211] Avg episode reward: [(0, '10.065')] [2025-01-04 14:46:11,903][134294] Updated weights for policy 0, policy_version 226154 (0.0026) [2025-01-04 14:46:13,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15087.0, 300 sec: 14648.4). Total num frames: 926351360. Throughput: 0: 3888.2. Samples: 220759014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:46:13,968][134211] Avg episode reward: [(0, '9.319')] [2025-01-04 14:46:15,091][134294] Updated weights for policy 0, policy_version 226164 (0.0026) [2025-01-04 14:46:18,076][134294] Updated weights for policy 0, policy_version 226174 (0.0027) [2025-01-04 14:46:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 15018.7, 300 sec: 14648.4). Total num frames: 926416896. Throughput: 0: 3784.5. Samples: 220768860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:46:18,968][134211] Avg episode reward: [(0, '9.870')] [2025-01-04 14:46:21,094][134294] Updated weights for policy 0, policy_version 226184 (0.0025) [2025-01-04 14:46:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15018.7, 300 sec: 14648.4). Total num frames: 926486528. Throughput: 0: 3773.8. Samples: 220789430. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:46:23,968][134211] Avg episode reward: [(0, '10.115')] [2025-01-04 14:46:24,106][134294] Updated weights for policy 0, policy_version 226194 (0.0028) [2025-01-04 14:46:27,035][134294] Updated weights for policy 0, policy_version 226204 (0.0026) [2025-01-04 14:46:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15086.9, 300 sec: 14634.5). Total num frames: 926556160. Throughput: 0: 3764.0. Samples: 220810004. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:28,968][134211] Avg episode reward: [(0, '8.602')] [2025-01-04 14:46:30,036][134294] Updated weights for policy 0, policy_version 226214 (0.0022) [2025-01-04 14:46:32,926][134294] Updated weights for policy 0, policy_version 226224 (0.0024) [2025-01-04 14:46:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.7, 300 sec: 14648.4). Total num frames: 926625792. Throughput: 0: 3765.2. Samples: 220820458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:33,968][134211] Avg episode reward: [(0, '9.492')] [2025-01-04 14:46:35,792][134294] Updated weights for policy 0, policy_version 226234 (0.0022) [2025-01-04 14:46:38,682][134294] Updated weights for policy 0, policy_version 226244 (0.0024) [2025-01-04 14:46:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14882.3, 300 sec: 14662.3). Total num frames: 926699520. Throughput: 0: 3783.1. Samples: 220841888. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:38,968][134211] Avg episode reward: [(0, '10.245')] [2025-01-04 14:46:41,597][134294] Updated weights for policy 0, policy_version 226254 (0.0024) [2025-01-04 14:46:43,969][134211] Fps is (10 sec: 14333.6, 60 sec: 14881.7, 300 sec: 14648.3). Total num frames: 926769152. Throughput: 0: 3662.2. Samples: 220862746. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:43,970][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 14:46:44,573][134294] Updated weights for policy 0, policy_version 226264 (0.0022) [2025-01-04 14:46:47,507][134294] Updated weights for policy 0, policy_version 226274 (0.0024) [2025-01-04 14:46:48,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14634.5). Total num frames: 926834688. Throughput: 0: 3535.0. Samples: 220873134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:48,968][134211] Avg episode reward: [(0, '8.697')] [2025-01-04 14:46:50,407][134294] Updated weights for policy 0, policy_version 226284 (0.0024) [2025-01-04 14:46:53,310][134294] Updated weights for policy 0, policy_version 226294 (0.0023) [2025-01-04 14:46:53,968][134211] Fps is (10 sec: 13928.3, 60 sec: 14882.1, 300 sec: 14648.4). Total num frames: 926908416. Throughput: 0: 3443.0. Samples: 220894326. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:53,969][134211] Avg episode reward: [(0, '10.149')] [2025-01-04 14:46:56,244][134294] Updated weights for policy 0, policy_version 226304 (0.0024) [2025-01-04 14:46:58,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14404.2, 300 sec: 14579.0). Total num frames: 926978048. Throughput: 0: 3471.5. Samples: 220915230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:46:58,968][134211] Avg episode reward: [(0, '10.311')] [2025-01-04 14:46:59,251][134294] Updated weights for policy 0, policy_version 226314 (0.0028) [2025-01-04 14:47:02,215][134294] Updated weights for policy 0, policy_version 226324 (0.0025) [2025-01-04 14:47:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.6, 300 sec: 14426.2). Total num frames: 927043584. Throughput: 0: 3477.0. Samples: 220925324. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:03,968][134211] Avg episode reward: [(0, '10.027')] [2025-01-04 14:47:05,160][134294] Updated weights for policy 0, policy_version 226334 (0.0027) [2025-01-04 14:47:08,056][134294] Updated weights for policy 0, policy_version 226344 (0.0022) [2025-01-04 14:47:08,968][134211] Fps is (10 sec: 13515.9, 60 sec: 13789.7, 300 sec: 14398.5). Total num frames: 927113216. Throughput: 0: 3490.7. Samples: 220946512. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:08,969][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 14:47:11,033][134294] Updated weights for policy 0, policy_version 226354 (0.0027) [2025-01-04 14:47:13,710][134294] Updated weights for policy 0, policy_version 226364 (0.0020) [2025-01-04 14:47:13,971][134211] Fps is (10 sec: 14741.4, 60 sec: 13994.0, 300 sec: 14426.1). Total num frames: 927191040. Throughput: 0: 3497.9. Samples: 220967418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:13,971][134211] Avg episode reward: [(0, '10.246')] [2025-01-04 14:47:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000226365_927191040.pth... [2025-01-04 14:47:14,019][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225511_923693056.pth [2025-01-04 14:47:15,569][134294] Updated weights for policy 0, policy_version 226374 (0.0011) [2025-01-04 14:47:17,577][134294] Updated weights for policy 0, policy_version 226384 (0.0016) [2025-01-04 14:47:18,968][134211] Fps is (10 sec: 17204.6, 60 sec: 14472.5, 300 sec: 14523.5). Total num frames: 927285248. Throughput: 0: 3622.3. Samples: 220983460. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:18,968][134211] Avg episode reward: [(0, '9.963')] [2025-01-04 14:47:20,418][134294] Updated weights for policy 0, policy_version 226394 (0.0025) [2025-01-04 14:47:23,407][134294] Updated weights for policy 0, policy_version 226404 (0.0025) [2025-01-04 14:47:23,968][134211] Fps is (10 sec: 16388.3, 60 sec: 14472.4, 300 sec: 14537.3). Total num frames: 927354880. Throughput: 0: 3665.7. Samples: 221006846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:23,969][134211] Avg episode reward: [(0, '8.489')] [2025-01-04 14:47:26,368][134294] Updated weights for policy 0, policy_version 226414 (0.0026) [2025-01-04 14:47:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.5, 300 sec: 14537.3). Total num frames: 927424512. Throughput: 0: 3657.4. Samples: 221027322. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 14:47:28,968][134211] Avg episode reward: [(0, '10.069')] [2025-01-04 14:47:29,403][134294] Updated weights for policy 0, policy_version 226424 (0.0022) [2025-01-04 14:47:32,359][134294] Updated weights for policy 0, policy_version 226434 (0.0026) [2025-01-04 14:47:33,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14472.5, 300 sec: 14537.4). Total num frames: 927494144. Throughput: 0: 3652.0. Samples: 221037474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:33,968][134211] Avg episode reward: [(0, '10.275')] [2025-01-04 14:47:35,293][134294] Updated weights for policy 0, policy_version 226444 (0.0025) [2025-01-04 14:47:38,215][134294] Updated weights for policy 0, policy_version 226454 (0.0025) [2025-01-04 14:47:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.2, 300 sec: 14537.3). Total num frames: 927563776. Throughput: 0: 3647.5. Samples: 221058464. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:38,968][134211] Avg episode reward: [(0, '10.699')] [2025-01-04 14:47:41,193][134294] Updated weights for policy 0, policy_version 226464 (0.0025) [2025-01-04 14:47:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14404.7, 300 sec: 14537.3). Total num frames: 927633408. Throughput: 0: 3643.2. Samples: 221079174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:43,968][134211] Avg episode reward: [(0, '10.171')] [2025-01-04 14:47:44,178][134294] Updated weights for policy 0, policy_version 226474 (0.0025) [2025-01-04 14:47:47,165][134294] Updated weights for policy 0, policy_version 226484 (0.0026) [2025-01-04 14:47:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.3, 300 sec: 14523.4). Total num frames: 927698944. Throughput: 0: 3646.5. Samples: 221089418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:48,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 14:47:49,983][134294] Updated weights for policy 0, policy_version 226494 (0.0022) [2025-01-04 14:47:51,887][134294] Updated weights for policy 0, policy_version 226504 (0.0012) [2025-01-04 14:47:53,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14677.4, 300 sec: 14592.9). Total num frames: 927789056. Throughput: 0: 3734.2. Samples: 221114548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:53,968][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 14:47:54,592][134294] Updated weights for policy 0, policy_version 226514 (0.0024) [2025-01-04 14:47:57,548][134294] Updated weights for policy 0, policy_version 226524 (0.0025) [2025-01-04 14:47:58,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14677.3, 300 sec: 14592.9). Total num frames: 927858688. Throughput: 0: 3739.5. Samples: 221135686. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:47:58,968][134211] Avg episode reward: [(0, '8.814')] [2025-01-04 14:48:00,551][134294] Updated weights for policy 0, policy_version 226534 (0.0026) [2025-01-04 14:48:03,518][134294] Updated weights for policy 0, policy_version 226544 (0.0025) [2025-01-04 14:48:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 927928320. Throughput: 0: 3616.9. Samples: 221146220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:03,968][134211] Avg episode reward: [(0, '9.953')] [2025-01-04 14:48:06,425][134294] Updated weights for policy 0, policy_version 226554 (0.0024) [2025-01-04 14:48:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.8, 300 sec: 14592.9). Total num frames: 927997952. Throughput: 0: 3562.0. Samples: 221167134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:08,968][134211] Avg episode reward: [(0, '9.896')] [2025-01-04 14:48:09,489][134294] Updated weights for policy 0, policy_version 226564 (0.0026) [2025-01-04 14:48:12,362][134294] Updated weights for policy 0, policy_version 226574 (0.0022) [2025-01-04 14:48:13,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14814.6, 300 sec: 14648.4). Total num frames: 928079872. Throughput: 0: 3601.7. Samples: 221189398. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:13,968][134211] Avg episode reward: [(0, '10.454')] [2025-01-04 14:48:14,241][134294] Updated weights for policy 0, policy_version 226584 (0.0015) [2025-01-04 14:48:17,036][134294] Updated weights for policy 0, policy_version 226594 (0.0023) [2025-01-04 14:48:18,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14472.4, 300 sec: 14648.4). Total num frames: 928153600. Throughput: 0: 3665.4. Samples: 221202416. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:18,969][134211] Avg episode reward: [(0, '10.195')] [2025-01-04 14:48:20,013][134294] Updated weights for policy 0, policy_version 226604 (0.0025) [2025-01-04 14:48:22,922][134294] Updated weights for policy 0, policy_version 226614 (0.0024) [2025-01-04 14:48:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.6, 300 sec: 14634.5). Total num frames: 928223232. Throughput: 0: 3665.8. Samples: 221223424. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:23,968][134211] Avg episode reward: [(0, '10.045')] [2025-01-04 14:48:25,816][134294] Updated weights for policy 0, policy_version 226624 (0.0027) [2025-01-04 14:48:28,714][134294] Updated weights for policy 0, policy_version 226634 (0.0025) [2025-01-04 14:48:28,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14540.8, 300 sec: 14551.2). Total num frames: 928296960. Throughput: 0: 3677.8. Samples: 221244674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:28,968][134211] Avg episode reward: [(0, '9.388')] [2025-01-04 14:48:31,554][134294] Updated weights for policy 0, policy_version 226644 (0.0021) [2025-01-04 14:48:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14537.3). Total num frames: 928362496. Throughput: 0: 3682.7. Samples: 221255138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:33,969][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 14:48:34,582][134294] Updated weights for policy 0, policy_version 226654 (0.0028) [2025-01-04 14:48:37,523][134294] Updated weights for policy 0, policy_version 226664 (0.0026) [2025-01-04 14:48:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.6, 300 sec: 14537.3). Total num frames: 928432128. Throughput: 0: 3587.2. Samples: 221275974. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:48:38,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 14:48:40,207][134294] Updated weights for policy 0, policy_version 226674 (0.0020) [2025-01-04 14:48:42,104][134294] Updated weights for policy 0, policy_version 226684 (0.0013) [2025-01-04 14:48:43,967][134211] Fps is (10 sec: 17203.7, 60 sec: 15018.7, 300 sec: 14648.4). Total num frames: 928534528. Throughput: 0: 3733.4. Samples: 221303688. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:48:43,968][134211] Avg episode reward: [(0, '8.881')] [2025-01-04 14:48:43,975][134294] Updated weights for policy 0, policy_version 226694 (0.0015) [2025-01-04 14:48:45,807][134294] Updated weights for policy 0, policy_version 226704 (0.0014) [2025-01-04 14:48:47,732][134294] Updated weights for policy 0, policy_version 226714 (0.0012) [2025-01-04 14:48:48,967][134211] Fps is (10 sec: 21299.5, 60 sec: 15769.7, 300 sec: 14787.3). Total num frames: 928645120. Throughput: 0: 3865.2. Samples: 221320152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:48:48,968][134211] Avg episode reward: [(0, '10.717')] [2025-01-04 14:48:49,618][134294] Updated weights for policy 0, policy_version 226724 (0.0015) [2025-01-04 14:48:52,534][134294] Updated weights for policy 0, policy_version 226734 (0.0023) [2025-01-04 14:48:53,968][134211] Fps is (10 sec: 18431.7, 60 sec: 15496.5, 300 sec: 14801.1). Total num frames: 928718848. Throughput: 0: 4017.9. Samples: 221347940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:48:53,968][134211] Avg episode reward: [(0, '8.137')] [2025-01-04 14:48:55,618][134294] Updated weights for policy 0, policy_version 226744 (0.0028) [2025-01-04 14:48:58,572][134294] Updated weights for policy 0, policy_version 226754 (0.0023) [2025-01-04 14:48:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15428.3, 300 sec: 14801.1). Total num frames: 928784384. Throughput: 0: 3964.5. Samples: 221367798. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:48:58,968][134211] Avg episode reward: [(0, '9.057')] [2025-01-04 14:49:01,613][134294] Updated weights for policy 0, policy_version 226764 (0.0025) [2025-01-04 14:49:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15428.3, 300 sec: 14787.3). Total num frames: 928854016. Throughput: 0: 3900.7. Samples: 221377946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:03,968][134211] Avg episode reward: [(0, '10.253')] [2025-01-04 14:49:04,851][134294] Updated weights for policy 0, policy_version 226774 (0.0028) [2025-01-04 14:49:07,751][134294] Updated weights for policy 0, policy_version 226784 (0.0027) [2025-01-04 14:49:08,969][134211] Fps is (10 sec: 13515.2, 60 sec: 15359.7, 300 sec: 14773.3). Total num frames: 928919552. Throughput: 0: 3876.0. Samples: 221397846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:08,969][134211] Avg episode reward: [(0, '8.859')] [2025-01-04 14:49:10,861][134294] Updated weights for policy 0, policy_version 226794 (0.0025) [2025-01-04 14:49:13,689][134294] Updated weights for policy 0, policy_version 226804 (0.0024) [2025-01-04 14:49:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15155.2, 300 sec: 14731.7). Total num frames: 928989184. Throughput: 0: 3869.4. Samples: 221418796. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:13,968][134211] Avg episode reward: [(0, '10.967')] [2025-01-04 14:49:13,987][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000226805_928993280.pth... [2025-01-04 14:49:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000225939_925446144.pth [2025-01-04 14:49:16,691][134294] Updated weights for policy 0, policy_version 226814 (0.0024) [2025-01-04 14:49:18,968][134211] Fps is (10 sec: 13928.1, 60 sec: 15087.0, 300 sec: 14731.8). Total num frames: 929058816. 
Throughput: 0: 3862.9. Samples: 221428966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:18,968][134211] Avg episode reward: [(0, '8.737')] [2025-01-04 14:49:19,715][134294] Updated weights for policy 0, policy_version 226824 (0.0027) [2025-01-04 14:49:22,691][134294] Updated weights for policy 0, policy_version 226834 (0.0026) [2025-01-04 14:49:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15086.9, 300 sec: 14731.7). Total num frames: 929128448. Throughput: 0: 3857.9. Samples: 221449582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:23,968][134211] Avg episode reward: [(0, '10.430')] [2025-01-04 14:49:25,593][134294] Updated weights for policy 0, policy_version 226844 (0.0027) [2025-01-04 14:49:28,489][134294] Updated weights for policy 0, policy_version 226854 (0.0025) [2025-01-04 14:49:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.7, 300 sec: 14731.7). Total num frames: 929198080. Throughput: 0: 3714.3. Samples: 221470834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:28,968][134211] Avg episode reward: [(0, '10.013')] [2025-01-04 14:49:31,358][134294] Updated weights for policy 0, policy_version 226864 (0.0024) [2025-01-04 14:49:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 14676.2). Total num frames: 929267712. Throughput: 0: 3583.0. Samples: 221481388. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:33,968][134211] Avg episode reward: [(0, '9.257')] [2025-01-04 14:49:34,373][134294] Updated weights for policy 0, policy_version 226874 (0.0023) [2025-01-04 14:49:37,285][134294] Updated weights for policy 0, policy_version 226884 (0.0024) [2025-01-04 14:49:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 14676.2). Total num frames: 929337344. Throughput: 0: 3425.6. Samples: 221502092. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:38,968][134211] Avg episode reward: [(0, '9.073')] [2025-01-04 14:49:40,214][134294] Updated weights for policy 0, policy_version 226894 (0.0024) [2025-01-04 14:49:43,033][134294] Updated weights for policy 0, policy_version 226904 (0.0022) [2025-01-04 14:49:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.7, 300 sec: 14676.2). Total num frames: 929406976. Throughput: 0: 3455.0. Samples: 221523272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:43,968][134211] Avg episode reward: [(0, '9.939')] [2025-01-04 14:49:45,988][134294] Updated weights for policy 0, policy_version 226914 (0.0025) [2025-01-04 14:49:48,806][134294] Updated weights for policy 0, policy_version 226924 (0.0021) [2025-01-04 14:49:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13926.3, 300 sec: 14676.2). Total num frames: 929480704. Throughput: 0: 3464.5. Samples: 221533850. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-04 14:49:48,968][134211] Avg episode reward: [(0, '8.433')] [2025-01-04 14:49:51,712][134294] Updated weights for policy 0, policy_version 226934 (0.0024) [2025-01-04 14:49:53,969][134211] Fps is (10 sec: 14334.2, 60 sec: 13857.8, 300 sec: 14676.1). Total num frames: 929550336. Throughput: 0: 3495.1. Samples: 221555128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:49:53,970][134211] Avg episode reward: [(0, '10.833')] [2025-01-04 14:49:54,680][134294] Updated weights for policy 0, policy_version 226944 (0.0024) [2025-01-04 14:49:57,661][134294] Updated weights for policy 0, policy_version 226954 (0.0023) [2025-01-04 14:49:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13926.4, 300 sec: 14676.2). Total num frames: 929619968. Throughput: 0: 3489.7. Samples: 221575830. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:49:58,968][134211] Avg episode reward: [(0, '9.607')] [2025-01-04 14:50:00,577][134294] Updated weights for policy 0, policy_version 226964 (0.0024) [2025-01-04 14:50:03,520][134294] Updated weights for policy 0, policy_version 226974 (0.0021) [2025-01-04 14:50:03,968][134211] Fps is (10 sec: 13928.0, 60 sec: 13926.4, 300 sec: 14676.2). Total num frames: 929689600. Throughput: 0: 3499.0. Samples: 221586422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:03,969][134211] Avg episode reward: [(0, '9.728')] [2025-01-04 14:50:06,082][134294] Updated weights for policy 0, policy_version 226984 (0.0020) [2025-01-04 14:50:07,935][134294] Updated weights for policy 0, policy_version 226994 (0.0013) [2025-01-04 14:50:08,968][134211] Fps is (10 sec: 16793.3, 60 sec: 14472.8, 300 sec: 14717.8). Total num frames: 929787904. Throughput: 0: 3586.1. Samples: 221610956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:08,969][134211] Avg episode reward: [(0, '10.917')] [2025-01-04 14:50:09,814][134294] Updated weights for policy 0, policy_version 227004 (0.0012) [2025-01-04 14:50:11,709][134294] Updated weights for policy 0, policy_version 227014 (0.0013) [2025-01-04 14:50:13,608][134294] Updated weights for policy 0, policy_version 227024 (0.0015) [2025-01-04 14:50:13,968][134211] Fps is (10 sec: 20480.5, 60 sec: 15087.0, 300 sec: 14842.8). Total num frames: 929894400. Throughput: 0: 3842.5. Samples: 221643744. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:13,968][134211] Avg episode reward: [(0, '9.366')] [2025-01-04 14:50:16,485][134294] Updated weights for policy 0, policy_version 227034 (0.0028) [2025-01-04 14:50:18,968][134211] Fps is (10 sec: 17203.3, 60 sec: 15018.6, 300 sec: 14828.9). Total num frames: 929959936. Throughput: 0: 3859.1. Samples: 221655046. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:18,968][134211] Avg episode reward: [(0, '9.656')] [2025-01-04 14:50:19,837][134294] Updated weights for policy 0, policy_version 227044 (0.0030) [2025-01-04 14:50:22,916][134294] Updated weights for policy 0, policy_version 227054 (0.0029) [2025-01-04 14:50:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14950.4, 300 sec: 14828.9). Total num frames: 930025472. Throughput: 0: 3823.5. Samples: 221674150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:23,968][134211] Avg episode reward: [(0, '8.840')] [2025-01-04 14:50:25,915][134294] Updated weights for policy 0, policy_version 227064 (0.0025) [2025-01-04 14:50:28,828][134294] Updated weights for policy 0, policy_version 227074 (0.0026) [2025-01-04 14:50:28,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 930095104. Throughput: 0: 3812.8. Samples: 221694846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:28,968][134211] Avg episode reward: [(0, '9.423')] [2025-01-04 14:50:31,767][134294] Updated weights for policy 0, policy_version 227084 (0.0022) [2025-01-04 14:50:33,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 930164736. Throughput: 0: 3806.6. Samples: 221705146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:33,968][134211] Avg episode reward: [(0, '10.574')] [2025-01-04 14:50:34,795][134294] Updated weights for policy 0, policy_version 227094 (0.0027) [2025-01-04 14:50:37,677][134294] Updated weights for policy 0, policy_version 227104 (0.0025) [2025-01-04 14:50:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 930234368. Throughput: 0: 3790.2. Samples: 221725680. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:38,968][134211] Avg episode reward: [(0, '9.916')] [2025-01-04 14:50:40,700][134294] Updated weights for policy 0, policy_version 227114 (0.0025) [2025-01-04 14:50:43,531][134294] Updated weights for policy 0, policy_version 227124 (0.0025) [2025-01-04 14:50:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 930304000. Throughput: 0: 3801.0. Samples: 221746876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:43,968][134211] Avg episode reward: [(0, '9.487')] [2025-01-04 14:50:46,455][134294] Updated weights for policy 0, policy_version 227134 (0.0026) [2025-01-04 14:50:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 930373632. Throughput: 0: 3797.2. Samples: 221757296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:48,968][134211] Avg episode reward: [(0, '9.078')] [2025-01-04 14:50:49,424][134294] Updated weights for policy 0, policy_version 227144 (0.0023) [2025-01-04 14:50:52,413][134294] Updated weights for policy 0, policy_version 227154 (0.0025) [2025-01-04 14:50:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.4, 300 sec: 14676.2). Total num frames: 930443264. Throughput: 0: 3713.1. Samples: 221778044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:53,968][134211] Avg episode reward: [(0, '9.671')] [2025-01-04 14:50:55,298][134294] Updated weights for policy 0, policy_version 227164 (0.0023) [2025-01-04 14:50:58,226][134294] Updated weights for policy 0, policy_version 227174 (0.0026) [2025-01-04 14:50:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.1, 300 sec: 14551.2). Total num frames: 930512896. Throughput: 0: 3455.3. Samples: 221799232. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:50:58,968][134211] Avg episode reward: [(0, '8.545')] [2025-01-04 14:51:01,111][134294] Updated weights for policy 0, policy_version 227184 (0.0025) [2025-01-04 14:51:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.1, 300 sec: 14565.1). Total num frames: 930582528. Throughput: 0: 3437.2. Samples: 221809722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:03,968][134211] Avg episode reward: [(0, '8.512')] [2025-01-04 14:51:04,200][134294] Updated weights for policy 0, policy_version 227194 (0.0024) [2025-01-04 14:51:07,151][134294] Updated weights for policy 0, policy_version 227204 (0.0025) [2025-01-04 14:51:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14579.0). Total num frames: 930652160. Throughput: 0: 3469.8. Samples: 221830290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:08,968][134211] Avg episode reward: [(0, '10.232')] [2025-01-04 14:51:10,049][134294] Updated weights for policy 0, policy_version 227214 (0.0022) [2025-01-04 14:51:12,952][134294] Updated weights for policy 0, policy_version 227224 (0.0024) [2025-01-04 14:51:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13789.9, 300 sec: 14592.9). Total num frames: 930721792. Throughput: 0: 3482.2. Samples: 221851544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:13,968][134211] Avg episode reward: [(0, '9.194')] [2025-01-04 14:51:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000227227_930721792.pth... 
[2025-01-04 14:51:14,049][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000226365_927191040.pth [2025-01-04 14:51:15,934][134294] Updated weights for policy 0, policy_version 227234 (0.0024) [2025-01-04 14:51:18,657][134294] Updated weights for policy 0, policy_version 227244 (0.0024) [2025-01-04 14:51:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.1, 300 sec: 14592.9). Total num frames: 930791424. Throughput: 0: 3482.2. Samples: 221861844. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:18,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 14:51:21,589][134294] Updated weights for policy 0, policy_version 227254 (0.0023) [2025-01-04 14:51:23,610][134294] Updated weights for policy 0, policy_version 227264 (0.0011) [2025-01-04 14:51:23,967][134211] Fps is (10 sec: 15565.1, 60 sec: 14199.5, 300 sec: 14648.4). Total num frames: 930877440. Throughput: 0: 3512.1. Samples: 221883724. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:23,968][134211] Avg episode reward: [(0, '11.144')] [2025-01-04 14:51:25,515][134294] Updated weights for policy 0, policy_version 227274 (0.0014) [2025-01-04 14:51:27,418][134294] Updated weights for policy 0, policy_version 227284 (0.0015) [2025-01-04 14:51:28,968][134211] Fps is (10 sec: 19660.6, 60 sec: 14882.1, 300 sec: 14787.2). Total num frames: 930988032. Throughput: 0: 3763.8. Samples: 221916246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:28,968][134211] Avg episode reward: [(0, '9.412')] [2025-01-04 14:51:29,300][134294] Updated weights for policy 0, policy_version 227294 (0.0014) [2025-01-04 14:51:31,962][134294] Updated weights for policy 0, policy_version 227304 (0.0024) [2025-01-04 14:51:33,968][134211] Fps is (10 sec: 18431.4, 60 sec: 14950.4, 300 sec: 14787.2). Total num frames: 931061760. Throughput: 0: 3837.0. Samples: 221929962. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:33,969][134211] Avg episode reward: [(0, '10.004')] [2025-01-04 14:51:35,144][134294] Updated weights for policy 0, policy_version 227314 (0.0026) [2025-01-04 14:51:38,152][134294] Updated weights for policy 0, policy_version 227324 (0.0028) [2025-01-04 14:51:38,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14882.2, 300 sec: 14773.5). Total num frames: 931127296. Throughput: 0: 3815.2. Samples: 221949726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:38,969][134211] Avg episode reward: [(0, '10.397')] [2025-01-04 14:51:41,175][134294] Updated weights for policy 0, policy_version 227334 (0.0028) [2025-01-04 14:51:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.8, 300 sec: 14773.4). Total num frames: 931192832. Throughput: 0: 3789.0. Samples: 221969738. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:43,968][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 14:51:44,329][134294] Updated weights for policy 0, policy_version 227344 (0.0024) [2025-01-04 14:51:47,288][134294] Updated weights for policy 0, policy_version 227354 (0.0023) [2025-01-04 14:51:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 931262464. Throughput: 0: 3781.1. Samples: 221979872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:48,968][134211] Avg episode reward: [(0, '10.882')] [2025-01-04 14:51:50,255][134294] Updated weights for policy 0, policy_version 227364 (0.0026) [2025-01-04 14:51:53,121][134294] Updated weights for policy 0, policy_version 227374 (0.0024) [2025-01-04 14:51:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 931332096. Throughput: 0: 3789.6. Samples: 222000824. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:53,968][134211] Avg episode reward: [(0, '9.981')] [2025-01-04 14:51:56,073][134294] Updated weights for policy 0, policy_version 227384 (0.0024) [2025-01-04 14:51:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14773.4). Total num frames: 931401728. Throughput: 0: 3780.3. Samples: 222021658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:51:58,968][134211] Avg episode reward: [(0, '9.948')] [2025-01-04 14:51:59,110][134294] Updated weights for policy 0, policy_version 227394 (0.0023) [2025-01-04 14:52:01,938][134294] Updated weights for policy 0, policy_version 227404 (0.0025) [2025-01-04 14:52:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14773.4). Total num frames: 931471360. Throughput: 0: 3781.3. Samples: 222032002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:52:03,968][134211] Avg episode reward: [(0, '11.009')] [2025-01-04 14:52:05,187][134294] Updated weights for policy 0, policy_version 227414 (0.0025) [2025-01-04 14:52:08,054][134294] Updated weights for policy 0, policy_version 227424 (0.0025) [2025-01-04 14:52:08,969][134211] Fps is (10 sec: 13515.7, 60 sec: 14745.4, 300 sec: 14731.8). Total num frames: 931536896. Throughput: 0: 3741.9. Samples: 222052114. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:08,969][134211] Avg episode reward: [(0, '9.642')] [2025-01-04 14:52:11,265][134294] Updated weights for policy 0, policy_version 227434 (0.0024) [2025-01-04 14:52:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 931606528. Throughput: 0: 3466.0. Samples: 222072218. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:13,968][134211] Avg episode reward: [(0, '9.151')] [2025-01-04 14:52:14,229][134294] Updated weights for policy 0, policy_version 227444 (0.0026) [2025-01-04 14:52:17,152][134294] Updated weights for policy 0, policy_version 227454 (0.0025) [2025-01-04 14:52:18,968][134211] Fps is (10 sec: 13517.9, 60 sec: 14677.3, 300 sec: 14634.5). Total num frames: 931672064. Throughput: 0: 3392.9. Samples: 222082644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:18,968][134211] Avg episode reward: [(0, '8.782')] [2025-01-04 14:52:20,101][134294] Updated weights for policy 0, policy_version 227464 (0.0024) [2025-01-04 14:52:22,938][134294] Updated weights for policy 0, policy_version 227474 (0.0022) [2025-01-04 14:52:23,968][134211] Fps is (10 sec: 13925.6, 60 sec: 14472.3, 300 sec: 14648.4). Total num frames: 931745792. Throughput: 0: 3426.7. Samples: 222103928. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:23,970][134211] Avg episode reward: [(0, '10.002')] [2025-01-04 14:52:25,615][134294] Updated weights for policy 0, policy_version 227484 (0.0021) [2025-01-04 14:52:27,597][134294] Updated weights for policy 0, policy_version 227494 (0.0015) [2025-01-04 14:52:28,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14131.2, 300 sec: 14717.8). Total num frames: 931835904. Throughput: 0: 3549.5. Samples: 222129466. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:28,968][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 14:52:30,355][134294] Updated weights for policy 0, policy_version 227504 (0.0024) [2025-01-04 14:52:33,145][134294] Updated weights for policy 0, policy_version 227514 (0.0024) [2025-01-04 14:52:33,968][134211] Fps is (10 sec: 15975.3, 60 sec: 14063.0, 300 sec: 14717.8). Total num frames: 931905536. Throughput: 0: 3568.0. Samples: 222140432. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:33,968][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 14:52:36,122][134294] Updated weights for policy 0, policy_version 227524 (0.0023) [2025-01-04 14:52:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 14717.8). Total num frames: 931975168. Throughput: 0: 3568.0. Samples: 222161386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:38,968][134211] Avg episode reward: [(0, '11.329')] [2025-01-04 14:52:39,149][134294] Updated weights for policy 0, policy_version 227534 (0.0026) [2025-01-04 14:52:42,110][134294] Updated weights for policy 0, policy_version 227544 (0.0023) [2025-01-04 14:52:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14199.5, 300 sec: 14731.7). Total num frames: 932044800. Throughput: 0: 3562.2. Samples: 222181956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:43,968][134211] Avg episode reward: [(0, '10.102')] [2025-01-04 14:52:45,020][134294] Updated weights for policy 0, policy_version 227554 (0.0023) [2025-01-04 14:52:47,877][134294] Updated weights for policy 0, policy_version 227564 (0.0022) [2025-01-04 14:52:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14199.5, 300 sec: 14662.3). Total num frames: 932114432. Throughput: 0: 3570.2. Samples: 222192662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:48,968][134211] Avg episode reward: [(0, '10.181')] [2025-01-04 14:52:50,784][134294] Updated weights for policy 0, policy_version 227574 (0.0023) [2025-01-04 14:52:53,368][134294] Updated weights for policy 0, policy_version 227584 (0.0018) [2025-01-04 14:52:53,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14404.3, 300 sec: 14703.9). Total num frames: 932196352. Throughput: 0: 3599.0. Samples: 222214066. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:53,968][134211] Avg episode reward: [(0, '10.260')] [2025-01-04 14:52:55,410][134294] Updated weights for policy 0, policy_version 227594 (0.0014) [2025-01-04 14:52:58,176][134294] Updated weights for policy 0, policy_version 227604 (0.0025) [2025-01-04 14:52:58,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14540.8, 300 sec: 14731.7). Total num frames: 932274176. Throughput: 0: 3717.3. Samples: 222239498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:52:58,968][134211] Avg episode reward: [(0, '10.131')] [2025-01-04 14:53:01,182][134294] Updated weights for policy 0, policy_version 227614 (0.0027) [2025-01-04 14:53:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 932339712. Throughput: 0: 3718.0. Samples: 222249954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:53:03,968][134211] Avg episode reward: [(0, '9.719')] [2025-01-04 14:53:04,271][134294] Updated weights for policy 0, policy_version 227624 (0.0025) [2025-01-04 14:53:07,238][134294] Updated weights for policy 0, policy_version 227634 (0.0024) [2025-01-04 14:53:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14541.0, 300 sec: 14676.2). Total num frames: 932409344. Throughput: 0: 3700.1. Samples: 222270428. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:53:08,968][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 14:53:10,187][134294] Updated weights for policy 0, policy_version 227644 (0.0026) [2025-01-04 14:53:13,058][134294] Updated weights for policy 0, policy_version 227654 (0.0023) [2025-01-04 14:53:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14540.8, 300 sec: 14662.3). Total num frames: 932478976. Throughput: 0: 3595.0. Samples: 222291240. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 14:53:13,968][134211] Avg episode reward: [(0, '11.150')] [2025-01-04 14:53:14,018][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000227657_932483072.pth... [2025-01-04 14:53:14,084][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000226805_928993280.pth [2025-01-04 14:53:15,787][134294] Updated weights for policy 0, policy_version 227664 (0.0021) [2025-01-04 14:53:17,668][134294] Updated weights for policy 0, policy_version 227674 (0.0015) [2025-01-04 14:53:18,968][134211] Fps is (10 sec: 16793.4, 60 sec: 15086.9, 300 sec: 14759.5). Total num frames: 932577280. Throughput: 0: 3621.6. Samples: 222303404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:18,968][134211] Avg episode reward: [(0, '11.186')] [2025-01-04 14:53:19,548][134294] Updated weights for policy 0, policy_version 227684 (0.0014) [2025-01-04 14:53:21,437][134294] Updated weights for policy 0, policy_version 227694 (0.0012) [2025-01-04 14:53:23,601][134294] Updated weights for policy 0, policy_version 227704 (0.0020) [2025-01-04 14:53:23,968][134211] Fps is (10 sec: 20070.3, 60 sec: 15565.0, 300 sec: 14856.7). Total num frames: 932679680. Throughput: 0: 3881.6. Samples: 222336058. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:23,968][134211] Avg episode reward: [(0, '9.827')] [2025-01-04 14:53:26,858][134294] Updated weights for policy 0, policy_version 227714 (0.0027) [2025-01-04 14:53:28,969][134211] Fps is (10 sec: 16382.3, 60 sec: 15086.6, 300 sec: 14842.7). Total num frames: 932741120. Throughput: 0: 3879.1. Samples: 222356522. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:28,970][134211] Avg episode reward: [(0, '9.625')] [2025-01-04 14:53:30,051][134294] Updated weights for policy 0, policy_version 227724 (0.0027) [2025-01-04 14:53:33,120][134294] Updated weights for policy 0, policy_version 227734 (0.0023) [2025-01-04 14:53:33,968][134211] Fps is (10 sec: 12697.7, 60 sec: 15018.7, 300 sec: 14828.9). Total num frames: 932806656. Throughput: 0: 3855.2. Samples: 222366148. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:33,968][134211] Avg episode reward: [(0, '10.439')] [2025-01-04 14:53:36,213][134294] Updated weights for policy 0, policy_version 227744 (0.0026) [2025-01-04 14:53:38,968][134211] Fps is (10 sec: 13518.4, 60 sec: 15018.7, 300 sec: 14717.8). Total num frames: 932876288. Throughput: 0: 3829.0. Samples: 222386370. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:38,968][134211] Avg episode reward: [(0, '9.817')] [2025-01-04 14:53:39,309][134294] Updated weights for policy 0, policy_version 227754 (0.0027) [2025-01-04 14:53:42,247][134294] Updated weights for policy 0, policy_version 227764 (0.0025) [2025-01-04 14:53:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.4, 300 sec: 14565.1). Total num frames: 932941824. Throughput: 0: 3715.6. Samples: 222406698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:43,968][134211] Avg episode reward: [(0, '10.020')] [2025-01-04 14:53:45,219][134294] Updated weights for policy 0, policy_version 227774 (0.0025) [2025-01-04 14:53:48,119][134294] Updated weights for policy 0, policy_version 227784 (0.0024) [2025-01-04 14:53:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14551.2). Total num frames: 933011456. Throughput: 0: 3716.9. Samples: 222417216. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:48,968][134211] Avg episode reward: [(0, '9.552')] [2025-01-04 14:53:50,964][134294] Updated weights for policy 0, policy_version 227794 (0.0021) [2025-01-04 14:53:53,852][134294] Updated weights for policy 0, policy_version 227804 (0.0025) [2025-01-04 14:53:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 933085184. Throughput: 0: 3736.0. Samples: 222438548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:53,968][134211] Avg episode reward: [(0, '9.861')] [2025-01-04 14:53:56,811][134294] Updated weights for policy 0, policy_version 227814 (0.0027) [2025-01-04 14:53:58,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.4, 300 sec: 14579.0). Total num frames: 933154816. Throughput: 0: 3740.0. Samples: 222459542. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:53:58,968][134211] Avg episode reward: [(0, '10.601')] [2025-01-04 14:53:59,758][134294] Updated weights for policy 0, policy_version 227824 (0.0023) [2025-01-04 14:54:02,823][134294] Updated weights for policy 0, policy_version 227834 (0.0024) [2025-01-04 14:54:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.4, 300 sec: 14579.0). Total num frames: 933220352. Throughput: 0: 3696.4. Samples: 222469742. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:54:03,968][134211] Avg episode reward: [(0, '10.073')] [2025-01-04 14:54:05,690][134294] Updated weights for policy 0, policy_version 227844 (0.0027) [2025-01-04 14:54:08,578][134294] Updated weights for policy 0, policy_version 227854 (0.0024) [2025-01-04 14:54:08,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14745.5, 300 sec: 14592.9). Total num frames: 933294080. Throughput: 0: 3439.0. Samples: 222490814. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:54:08,968][134211] Avg episode reward: [(0, '9.363')] [2025-01-04 14:54:11,496][134294] Updated weights for policy 0, policy_version 227864 (0.0025) [2025-01-04 14:54:13,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14745.6, 300 sec: 14592.9). Total num frames: 933363712. Throughput: 0: 3451.4. Samples: 222511830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:54:13,968][134211] Avg episode reward: [(0, '9.234')] [2025-01-04 14:54:14,504][134294] Updated weights for policy 0, policy_version 227874 (0.0025) [2025-01-04 14:54:17,427][134294] Updated weights for policy 0, policy_version 227884 (0.0024) [2025-01-04 14:54:18,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14267.8, 300 sec: 14592.9). Total num frames: 933433344. Throughput: 0: 3463.6. Samples: 222522010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:54:18,968][134211] Avg episode reward: [(0, '8.710')] [2025-01-04 14:54:20,384][134294] Updated weights for policy 0, policy_version 227894 (0.0026) [2025-01-04 14:54:23,222][134294] Updated weights for policy 0, policy_version 227904 (0.0022) [2025-01-04 14:54:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13721.6, 300 sec: 14592.9). Total num frames: 933502976. Throughput: 0: 3489.9. Samples: 222543414. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 14:54:23,968][134211] Avg episode reward: [(0, '9.224')] [2025-01-04 14:54:26,121][134294] Updated weights for policy 0, policy_version 227914 (0.0027) [2025-01-04 14:54:28,173][134294] Updated weights for policy 0, policy_version 227924 (0.0014) [2025-01-04 14:54:28,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14199.8, 300 sec: 14662.3). Total num frames: 933593088. Throughput: 0: 3574.1. Samples: 222567530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:28,968][134211] Avg episode reward: [(0, '11.715')] [2025-01-04 14:54:30,071][134294] Updated weights for policy 0, policy_version 227934 (0.0012) [2025-01-04 14:54:31,981][134294] Updated weights for policy 0, policy_version 227944 (0.0014) [2025-01-04 14:54:33,881][134294] Updated weights for policy 0, policy_version 227954 (0.0013) [2025-01-04 14:54:33,968][134211] Fps is (10 sec: 19661.1, 60 sec: 14882.1, 300 sec: 14787.3). Total num frames: 933699584. Throughput: 0: 3699.1. Samples: 222583674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:33,968][134211] Avg episode reward: [(0, '11.091')] [2025-01-04 14:54:36,492][134294] Updated weights for policy 0, policy_version 227964 (0.0020) [2025-01-04 14:54:38,968][134211] Fps is (10 sec: 18020.9, 60 sec: 14950.2, 300 sec: 14801.1). Total num frames: 933773312. Throughput: 0: 3830.8. Samples: 222610938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:38,969][134211] Avg episode reward: [(0, '10.756')] [2025-01-04 14:54:39,412][134294] Updated weights for policy 0, policy_version 227974 (0.0028) [2025-01-04 14:54:42,417][134294] Updated weights for policy 0, policy_version 227984 (0.0025) [2025-01-04 14:54:43,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 933838848. Throughput: 0: 3810.9. Samples: 222631034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:43,968][134211] Avg episode reward: [(0, '9.492')] [2025-01-04 14:54:45,552][134294] Updated weights for policy 0, policy_version 227994 (0.0024) [2025-01-04 14:54:48,452][134294] Updated weights for policy 0, policy_version 228004 (0.0026) [2025-01-04 14:54:48,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 933908480. Throughput: 0: 3814.0. Samples: 222641374. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:48,968][134211] Avg episode reward: [(0, '9.957')] [2025-01-04 14:54:51,437][134294] Updated weights for policy 0, policy_version 228014 (0.0022) [2025-01-04 14:54:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14882.2, 300 sec: 14773.4). Total num frames: 933978112. Throughput: 0: 3798.9. Samples: 222661762. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:53,968][134211] Avg episode reward: [(0, '10.657')] [2025-01-04 14:54:54,587][134294] Updated weights for policy 0, policy_version 228024 (0.0026) [2025-01-04 14:54:57,480][134294] Updated weights for policy 0, policy_version 228034 (0.0023) [2025-01-04 14:54:58,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14813.7, 300 sec: 14759.5). Total num frames: 934043648. Throughput: 0: 3784.8. Samples: 222682146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:54:58,969][134211] Avg episode reward: [(0, '10.051')] [2025-01-04 14:55:00,448][134294] Updated weights for policy 0, policy_version 228044 (0.0026) [2025-01-04 14:55:03,404][134294] Updated weights for policy 0, policy_version 228054 (0.0026) [2025-01-04 14:55:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14676.2). Total num frames: 934117376. Throughput: 0: 3796.1. Samples: 222692836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:03,968][134211] Avg episode reward: [(0, '9.874')] [2025-01-04 14:55:06,431][134294] Updated weights for policy 0, policy_version 228064 (0.0030) [2025-01-04 14:55:08,971][134211] Fps is (10 sec: 13922.7, 60 sec: 14813.1, 300 sec: 14537.2). Total num frames: 934182912. Throughput: 0: 3776.3. Samples: 222713360. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:08,971][134211] Avg episode reward: [(0, '9.716')] [2025-01-04 14:55:09,422][134294] Updated weights for policy 0, policy_version 228074 (0.0025) [2025-01-04 14:55:12,374][134294] Updated weights for policy 0, policy_version 228084 (0.0024) [2025-01-04 14:55:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 14551.2). Total num frames: 934252544. Throughput: 0: 3698.1. Samples: 222733946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:13,968][134211] Avg episode reward: [(0, '9.461')] [2025-01-04 14:55:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228089_934252544.pth... [2025-01-04 14:55:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000227227_930721792.pth [2025-01-04 14:55:15,376][134294] Updated weights for policy 0, policy_version 228094 (0.0026) [2025-01-04 14:55:18,240][134294] Updated weights for policy 0, policy_version 228104 (0.0026) [2025-01-04 14:55:18,968][134211] Fps is (10 sec: 13930.9, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 934322176. Throughput: 0: 3571.7. Samples: 222744402. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:18,968][134211] Avg episode reward: [(0, '9.301')] [2025-01-04 14:55:21,165][134294] Updated weights for policy 0, policy_version 228114 (0.0022) [2025-01-04 14:55:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 934391808. Throughput: 0: 3433.4. Samples: 222765438. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:23,968][134211] Avg episode reward: [(0, '10.383')] [2025-01-04 14:55:24,195][134294] Updated weights for policy 0, policy_version 228124 (0.0026) [2025-01-04 14:55:27,114][134294] Updated weights for policy 0, policy_version 228134 (0.0024) [2025-01-04 14:55:28,968][134211] Fps is (10 sec: 13925.2, 60 sec: 14472.3, 300 sec: 14565.1). Total num frames: 934461440. Throughput: 0: 3445.8. Samples: 222786096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:28,969][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 14:55:29,992][134294] Updated weights for policy 0, policy_version 228144 (0.0024) [2025-01-04 14:55:32,824][134294] Updated weights for policy 0, policy_version 228154 (0.0024) [2025-01-04 14:55:33,968][134211] Fps is (10 sec: 14745.9, 60 sec: 13994.7, 300 sec: 14592.9). Total num frames: 934539264. Throughput: 0: 3455.3. Samples: 222796860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:55:33,968][134211] Avg episode reward: [(0, '9.990')] [2025-01-04 14:55:34,765][134294] Updated weights for policy 0, policy_version 228164 (0.0015) [2025-01-04 14:55:36,627][134294] Updated weights for policy 0, policy_version 228174 (0.0014) [2025-01-04 14:55:38,968][134211] Fps is (10 sec: 17614.1, 60 sec: 14404.4, 300 sec: 14690.1). Total num frames: 934637568. Throughput: 0: 3652.6. Samples: 222826130. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:55:38,968][134211] Avg episode reward: [(0, '8.852')] [2025-01-04 14:55:39,186][134294] Updated weights for policy 0, policy_version 228184 (0.0021) [2025-01-04 14:55:42,203][134294] Updated weights for policy 0, policy_version 228194 (0.0025) [2025-01-04 14:55:43,968][134211] Fps is (10 sec: 16383.9, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 934703104. Throughput: 0: 3660.4. Samples: 222846860. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:55:43,968][134211] Avg episode reward: [(0, '9.733')] [2025-01-04 14:55:45,305][134294] Updated weights for policy 0, policy_version 228204 (0.0024) [2025-01-04 14:55:48,175][134294] Updated weights for policy 0, policy_version 228214 (0.0026) [2025-01-04 14:55:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 934772736. Throughput: 0: 3656.4. Samples: 222857374. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:55:48,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 14:55:51,085][134294] Updated weights for policy 0, policy_version 228224 (0.0026) [2025-01-04 14:55:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 934842368. Throughput: 0: 3660.3. Samples: 222878064. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:55:53,968][134211] Avg episode reward: [(0, '10.338')] [2025-01-04 14:55:54,222][134294] Updated weights for policy 0, policy_version 228234 (0.0023) [2025-01-04 14:55:57,107][134294] Updated weights for policy 0, policy_version 228244 (0.0023) [2025-01-04 14:55:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 934912000. Throughput: 0: 3659.9. Samples: 222898642. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:55:58,968][134211] Avg episode reward: [(0, '9.642')] [2025-01-04 14:56:00,098][134294] Updated weights for policy 0, policy_version 228254 (0.0026) [2025-01-04 14:56:03,050][134294] Updated weights for policy 0, policy_version 228264 (0.0024) [2025-01-04 14:56:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14404.3, 300 sec: 14676.2). Total num frames: 934981632. Throughput: 0: 3665.9. Samples: 222909368. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:03,968][134211] Avg episode reward: [(0, '9.122')] [2025-01-04 14:56:05,911][134294] Updated weights for policy 0, policy_version 228274 (0.0026) [2025-01-04 14:56:08,306][134294] Updated weights for policy 0, policy_version 228284 (0.0019) [2025-01-04 14:56:08,968][134211] Fps is (10 sec: 15155.7, 60 sec: 14678.1, 300 sec: 14717.8). Total num frames: 935063552. Throughput: 0: 3668.3. Samples: 222930510. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:08,968][134211] Avg episode reward: [(0, '9.185')] [2025-01-04 14:56:10,212][134294] Updated weights for policy 0, policy_version 228294 (0.0014) [2025-01-04 14:56:12,104][134294] Updated weights for policy 0, policy_version 228304 (0.0014) [2025-01-04 14:56:13,968][134211] Fps is (10 sec: 18842.1, 60 sec: 15291.8, 300 sec: 14842.8). Total num frames: 935170048. Throughput: 0: 3922.9. Samples: 222962622. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:13,968][134211] Avg episode reward: [(0, '10.279')] [2025-01-04 14:56:13,970][134294] Updated weights for policy 0, policy_version 228314 (0.0012) [2025-01-04 14:56:15,980][134294] Updated weights for policy 0, policy_version 228324 (0.0015) [2025-01-04 14:56:18,968][134211] Fps is (10 sec: 18841.2, 60 sec: 15496.5, 300 sec: 14828.9). Total num frames: 935251968. Throughput: 0: 4018.2. Samples: 222977682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:18,968][134211] Avg episode reward: [(0, '8.247')] [2025-01-04 14:56:18,979][134294] Updated weights for policy 0, policy_version 228334 (0.0027) [2025-01-04 14:56:22,177][134294] Updated weights for policy 0, policy_version 228344 (0.0025) [2025-01-04 14:56:23,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15428.3, 300 sec: 14676.2). Total num frames: 935317504. Throughput: 0: 3808.4. Samples: 222997510. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:23,968][134211] Avg episode reward: [(0, '10.047')] [2025-01-04 14:56:25,296][134294] Updated weights for policy 0, policy_version 228354 (0.0027) [2025-01-04 14:56:28,264][134294] Updated weights for policy 0, policy_version 228364 (0.0025) [2025-01-04 14:56:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15428.4, 300 sec: 14662.3). Total num frames: 935387136. Throughput: 0: 3799.4. Samples: 223017832. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:28,968][134211] Avg episode reward: [(0, '9.653')] [2025-01-04 14:56:31,210][134294] Updated weights for policy 0, policy_version 228374 (0.0026) [2025-01-04 14:56:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15291.7, 300 sec: 14676.2). Total num frames: 935456768. Throughput: 0: 3796.0. Samples: 223028196. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:33,968][134211] Avg episode reward: [(0, '9.273')] [2025-01-04 14:56:34,286][134294] Updated weights for policy 0, policy_version 228384 (0.0025) [2025-01-04 14:56:37,270][134294] Updated weights for policy 0, policy_version 228394 (0.0027) [2025-01-04 14:56:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 935522304. Throughput: 0: 3784.9. Samples: 223048386. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-04 14:56:38,968][134211] Avg episode reward: [(0, '10.903')] [2025-01-04 14:56:40,273][134294] Updated weights for policy 0, policy_version 228404 (0.0023) [2025-01-04 14:56:43,135][134294] Updated weights for policy 0, policy_version 228414 (0.0023) [2025-01-04 14:56:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.8, 300 sec: 14676.2). Total num frames: 935591936. Throughput: 0: 3793.8. Samples: 223069364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:56:43,968][134211] Avg episode reward: [(0, '10.594')] [2025-01-04 14:56:46,053][134294] Updated weights for policy 0, policy_version 228424 (0.0023) [2025-01-04 14:56:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 935661568. Throughput: 0: 3790.6. Samples: 223079944. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:56:48,968][134211] Avg episode reward: [(0, '9.726')] [2025-01-04 14:56:48,975][134294] Updated weights for policy 0, policy_version 228434 (0.0025) [2025-01-04 14:56:51,947][134294] Updated weights for policy 0, policy_version 228444 (0.0025) [2025-01-04 14:56:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.8, 300 sec: 14676.2). Total num frames: 935731200. Throughput: 0: 3780.3. Samples: 223100624. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:56:53,968][134211] Avg episode reward: [(0, '10.078')] [2025-01-04 14:56:54,989][134294] Updated weights for policy 0, policy_version 228454 (0.0025) [2025-01-04 14:56:57,969][134294] Updated weights for policy 0, policy_version 228464 (0.0028) [2025-01-04 14:56:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 935800832. Throughput: 0: 3522.2. Samples: 223121122. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:56:58,968][134211] Avg episode reward: [(0, '9.227')] [2025-01-04 14:57:00,854][134294] Updated weights for policy 0, policy_version 228474 (0.0026) [2025-01-04 14:57:03,757][134294] Updated weights for policy 0, policy_version 228484 (0.0023) [2025-01-04 14:57:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 935870464. Throughput: 0: 3422.8. Samples: 223131706. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:03,968][134211] Avg episode reward: [(0, '9.221')] [2025-01-04 14:57:06,732][134294] Updated weights for policy 0, policy_version 228494 (0.0027) [2025-01-04 14:57:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14609.1, 300 sec: 14690.1). Total num frames: 935940096. Throughput: 0: 3449.4. Samples: 223152732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:08,968][134211] Avg episode reward: [(0, '9.645')] [2025-01-04 14:57:09,795][134294] Updated weights for policy 0, policy_version 228504 (0.0024) [2025-01-04 14:57:12,633][134294] Updated weights for policy 0, policy_version 228514 (0.0026) [2025-01-04 14:57:13,970][134211] Fps is (10 sec: 13923.3, 60 sec: 13994.1, 300 sec: 14703.8). Total num frames: 936009728. Throughput: 0: 3454.9. Samples: 223173312. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:13,970][134211] Avg episode reward: [(0, '9.207')] [2025-01-04 14:57:14,026][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228519_936013824.pth... [2025-01-04 14:57:14,077][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000227657_932483072.pth [2025-01-04 14:57:15,045][134294] Updated weights for policy 0, policy_version 228524 (0.0017) [2025-01-04 14:57:17,332][134294] Updated weights for policy 0, policy_version 228534 (0.0018) [2025-01-04 14:57:18,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14063.0, 300 sec: 14745.6). Total num frames: 936095744. Throughput: 0: 3541.6. Samples: 223187568. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:18,968][134211] Avg episode reward: [(0, '10.805')] [2025-01-04 14:57:20,185][134294] Updated weights for policy 0, policy_version 228544 (0.0024) [2025-01-04 14:57:23,016][134294] Updated weights for policy 0, policy_version 228554 (0.0022) [2025-01-04 14:57:23,968][134211] Fps is (10 sec: 15977.9, 60 sec: 14199.5, 300 sec: 14690.1). Total num frames: 936169472. Throughput: 0: 3583.2. Samples: 223209628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:23,968][134211] Avg episode reward: [(0, '8.596')] [2025-01-04 14:57:25,935][134294] Updated weights for policy 0, policy_version 228564 (0.0027) [2025-01-04 14:57:28,732][134294] Updated weights for policy 0, policy_version 228574 (0.0025) [2025-01-04 14:57:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14199.5, 300 sec: 14690.1). Total num frames: 936239104. Throughput: 0: 3589.2. Samples: 223230878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:28,968][134211] Avg episode reward: [(0, '9.886')] [2025-01-04 14:57:31,700][134294] Updated weights for policy 0, policy_version 228584 (0.0027) [2025-01-04 14:57:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14690.1). Total num frames: 936308736. Throughput: 0: 3582.3. Samples: 223241146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:33,968][134211] Avg episode reward: [(0, '9.492')] [2025-01-04 14:57:34,778][134294] Updated weights for policy 0, policy_version 228594 (0.0023) [2025-01-04 14:57:37,245][134294] Updated weights for policy 0, policy_version 228604 (0.0018) [2025-01-04 14:57:38,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 936398848. Throughput: 0: 3621.6. Samples: 223263596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:38,968][134211] Avg episode reward: [(0, '11.000')] [2025-01-04 14:57:39,116][134294] Updated weights for policy 0, policy_version 228614 (0.0012) [2025-01-04 14:57:40,981][134294] Updated weights for policy 0, policy_version 228624 (0.0012) [2025-01-04 14:57:42,899][134294] Updated weights for policy 0, policy_version 228634 (0.0014) [2025-01-04 14:57:43,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15155.2, 300 sec: 14870.6). Total num frames: 936501248. Throughput: 0: 3878.9. Samples: 223295672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:43,968][134211] Avg episode reward: [(0, '10.151')] [2025-01-04 14:57:45,702][134294] Updated weights for policy 0, policy_version 228644 (0.0025) [2025-01-04 14:57:48,798][134294] Updated weights for policy 0, policy_version 228654 (0.0029) [2025-01-04 14:57:48,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15086.9, 300 sec: 14815.0). Total num frames: 936566784. Throughput: 0: 3878.0. Samples: 223306218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:48,968][134211] Avg episode reward: [(0, '9.764')] [2025-01-04 14:57:51,912][134294] Updated weights for policy 0, policy_version 228664 (0.0025) [2025-01-04 14:57:53,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 936632320. Throughput: 0: 3852.6. Samples: 223326098. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:53,968][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 14:57:55,071][134294] Updated weights for policy 0, policy_version 228674 (0.0025) [2025-01-04 14:57:57,963][134294] Updated weights for policy 0, policy_version 228684 (0.0026) [2025-01-04 14:57:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.6, 300 sec: 14787.3). Total num frames: 936701952. Throughput: 0: 3845.8. Samples: 223346364. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:57:58,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 14:58:00,960][134294] Updated weights for policy 0, policy_version 228694 (0.0025) [2025-01-04 14:58:03,847][134294] Updated weights for policy 0, policy_version 228704 (0.0024) [2025-01-04 14:58:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14787.2). Total num frames: 936771584. Throughput: 0: 3759.8. Samples: 223356760. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:03,969][134211] Avg episode reward: [(0, '9.323')] [2025-01-04 14:58:06,731][134294] Updated weights for policy 0, policy_version 228714 (0.0024) [2025-01-04 14:58:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14787.3). Total num frames: 936841216. Throughput: 0: 3732.2. Samples: 223377578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:08,968][134211] Avg episode reward: [(0, '10.387')] [2025-01-04 14:58:09,862][134294] Updated weights for policy 0, policy_version 228724 (0.0023) [2025-01-04 14:58:12,867][134294] Updated weights for policy 0, policy_version 228734 (0.0025) [2025-01-04 14:58:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14950.9, 300 sec: 14676.2). Total num frames: 936906752. Throughput: 0: 3710.2. Samples: 223397836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:13,968][134211] Avg episode reward: [(0, '10.254')] [2025-01-04 14:58:15,768][134294] Updated weights for policy 0, policy_version 228744 (0.0025) [2025-01-04 14:58:18,634][134294] Updated weights for policy 0, policy_version 228754 (0.0023) [2025-01-04 14:58:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 936980480. Throughput: 0: 3716.5. Samples: 223408386. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:18,968][134211] Avg episode reward: [(0, '10.105')] [2025-01-04 14:58:21,523][134294] Updated weights for policy 0, policy_version 228764 (0.0025) [2025-01-04 14:58:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.1, 300 sec: 14592.9). Total num frames: 937046016. Throughput: 0: 3690.9. Samples: 223429688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:23,968][134211] Avg episode reward: [(0, '9.230')] [2025-01-04 14:58:24,605][134294] Updated weights for policy 0, policy_version 228774 (0.0027) [2025-01-04 14:58:27,521][134294] Updated weights for policy 0, policy_version 228784 (0.0026) [2025-01-04 14:58:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.4, 300 sec: 14620.6). Total num frames: 937119744. Throughput: 0: 3435.4. Samples: 223450264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:28,968][134211] Avg episode reward: [(0, '9.347')] [2025-01-04 14:58:30,456][134294] Updated weights for policy 0, policy_version 228794 (0.0022) [2025-01-04 14:58:33,319][134294] Updated weights for policy 0, policy_version 228804 (0.0027) [2025-01-04 14:58:33,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14745.7, 300 sec: 14634.5). Total num frames: 937193472. Throughput: 0: 3439.4. Samples: 223460990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:33,968][134211] Avg episode reward: [(0, '10.514')] [2025-01-04 14:58:35,238][134294] Updated weights for policy 0, policy_version 228814 (0.0015) [2025-01-04 14:58:37,118][134294] Updated weights for policy 0, policy_version 228824 (0.0013) [2025-01-04 14:58:38,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 937291776. Throughput: 0: 3628.2. Samples: 223489366. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:38,968][134211] Avg episode reward: [(0, '9.278')] [2025-01-04 14:58:39,577][134294] Updated weights for policy 0, policy_version 228834 (0.0021) [2025-01-04 14:58:42,561][134294] Updated weights for policy 0, policy_version 228844 (0.0026) [2025-01-04 14:58:43,968][134211] Fps is (10 sec: 16792.5, 60 sec: 14335.9, 300 sec: 14745.6). Total num frames: 937361408. Throughput: 0: 3669.3. Samples: 223511486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:43,969][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 14:58:45,559][134294] Updated weights for policy 0, policy_version 228854 (0.0025) [2025-01-04 14:58:48,435][134294] Updated weights for policy 0, policy_version 228864 (0.0023) [2025-01-04 14:58:48,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14404.2, 300 sec: 14731.7). Total num frames: 937431040. Throughput: 0: 3670.6. Samples: 223521936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:48,969][134211] Avg episode reward: [(0, '9.679')] [2025-01-04 14:58:51,450][134294] Updated weights for policy 0, policy_version 228874 (0.0023) [2025-01-04 14:58:53,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14472.6, 300 sec: 14731.7). Total num frames: 937500672. Throughput: 0: 3669.2. Samples: 223542694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:53,968][134211] Avg episode reward: [(0, '9.646')] [2025-01-04 14:58:54,511][134294] Updated weights for policy 0, policy_version 228884 (0.0024) [2025-01-04 14:58:57,444][134294] Updated weights for policy 0, policy_version 228894 (0.0024) [2025-01-04 14:58:58,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 937566208. Throughput: 0: 3672.7. Samples: 223563106. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:58:58,968][134211] Avg episode reward: [(0, '9.734')] [2025-01-04 14:59:00,387][134294] Updated weights for policy 0, policy_version 228904 (0.0027) [2025-01-04 14:59:03,229][134294] Updated weights for policy 0, policy_version 228914 (0.0023) [2025-01-04 14:59:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.6, 300 sec: 14731.7). Total num frames: 937639936. Throughput: 0: 3676.3. Samples: 223573818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:03,968][134211] Avg episode reward: [(0, '8.695')] [2025-01-04 14:59:06,213][134294] Updated weights for policy 0, policy_version 228924 (0.0025) [2025-01-04 14:59:08,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14472.6, 300 sec: 14731.7). Total num frames: 937709568. Throughput: 0: 3664.8. Samples: 223594604. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:08,968][134211] Avg episode reward: [(0, '9.518')] [2025-01-04 14:59:09,215][134294] Updated weights for policy 0, policy_version 228934 (0.0026) [2025-01-04 14:59:12,144][134294] Updated weights for policy 0, policy_version 228944 (0.0023) [2025-01-04 14:59:13,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14745.6, 300 sec: 14773.4). Total num frames: 937791488. Throughput: 0: 3711.8. Samples: 223617298. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:13,968][134211] Avg episode reward: [(0, '10.082')] [2025-01-04 14:59:13,974][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228953_937791488.pth... 
[2025-01-04 14:59:14,018][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228089_934252544.pth [2025-01-04 14:59:14,097][134294] Updated weights for policy 0, policy_version 228954 (0.0012) [2025-01-04 14:59:15,980][134294] Updated weights for policy 0, policy_version 228964 (0.0012) [2025-01-04 14:59:17,824][134294] Updated weights for policy 0, policy_version 228974 (0.0013) [2025-01-04 14:59:18,968][134211] Fps is (10 sec: 19251.3, 60 sec: 15360.0, 300 sec: 14912.2). Total num frames: 937902080. Throughput: 0: 3835.9. Samples: 223633606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:18,968][134211] Avg episode reward: [(0, '9.218')] [2025-01-04 14:59:19,863][134294] Updated weights for policy 0, policy_version 228984 (0.0017) [2025-01-04 14:59:22,798][134294] Updated weights for policy 0, policy_version 228994 (0.0027) [2025-01-04 14:59:23,968][134211] Fps is (10 sec: 18022.7, 60 sec: 15428.3, 300 sec: 14842.8). Total num frames: 937971712. Throughput: 0: 3814.6. Samples: 223661022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:23,968][134211] Avg episode reward: [(0, '10.513')] [2025-01-04 14:59:25,917][134294] Updated weights for policy 0, policy_version 229004 (0.0026) [2025-01-04 14:59:28,948][134294] Updated weights for policy 0, policy_version 229014 (0.0028) [2025-01-04 14:59:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15360.0, 300 sec: 14717.8). Total num frames: 938041344. Throughput: 0: 3767.9. Samples: 223681040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:28,968][134211] Avg episode reward: [(0, '9.747')] [2025-01-04 14:59:31,918][134294] Updated weights for policy 0, policy_version 229024 (0.0025) [2025-01-04 14:59:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15223.4, 300 sec: 14690.1). Total num frames: 938106880. Throughput: 0: 3760.0. Samples: 223691136. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:33,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 14:59:34,980][134294] Updated weights for policy 0, policy_version 229034 (0.0027) [2025-01-04 14:59:37,938][134294] Updated weights for policy 0, policy_version 229044 (0.0025) [2025-01-04 14:59:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14745.6, 300 sec: 14704.0). Total num frames: 938176512. Throughput: 0: 3750.6. Samples: 223711472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:38,968][134211] Avg episode reward: [(0, '10.271')] [2025-01-04 14:59:40,929][134294] Updated weights for policy 0, policy_version 229054 (0.0027) [2025-01-04 14:59:43,844][134294] Updated weights for policy 0, policy_version 229064 (0.0027) [2025-01-04 14:59:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.7, 300 sec: 14703.9). Total num frames: 938246144. Throughput: 0: 3763.5. Samples: 223732464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:43,968][134211] Avg episode reward: [(0, '10.231')] [2025-01-04 14:59:46,765][134294] Updated weights for policy 0, policy_version 229074 (0.0025) [2025-01-04 14:59:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.7, 300 sec: 14704.0). Total num frames: 938315776. Throughput: 0: 3756.8. Samples: 223742874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:48,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 14:59:49,780][134294] Updated weights for policy 0, policy_version 229084 (0.0026) [2025-01-04 14:59:52,755][134294] Updated weights for policy 0, policy_version 229094 (0.0024) [2025-01-04 14:59:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14745.6, 300 sec: 14717.9). Total num frames: 938385408. Throughput: 0: 3753.9. Samples: 223763530. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:53,968][134211] Avg episode reward: [(0, '10.197')] [2025-01-04 14:59:55,699][134294] Updated weights for policy 0, policy_version 229104 (0.0025) [2025-01-04 14:59:58,454][134294] Updated weights for policy 0, policy_version 229114 (0.0024) [2025-01-04 14:59:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.9, 300 sec: 14703.9). Total num frames: 938455040. Throughput: 0: 3722.3. Samples: 223784802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 14:59:58,968][134211] Avg episode reward: [(0, '10.373')] [2025-01-04 15:00:01,431][134294] Updated weights for policy 0, policy_version 229124 (0.0025) [2025-01-04 15:00:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14718.0). Total num frames: 938524672. Throughput: 0: 3592.8. Samples: 223795284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:03,968][134211] Avg episode reward: [(0, '9.677')] [2025-01-04 15:00:04,537][134294] Updated weights for policy 0, policy_version 229134 (0.0027) [2025-01-04 15:00:07,417][134294] Updated weights for policy 0, policy_version 229144 (0.0025) [2025-01-04 15:00:08,969][134211] Fps is (10 sec: 13925.0, 60 sec: 14745.3, 300 sec: 14717.8). Total num frames: 938594304. Throughput: 0: 3437.7. Samples: 223815722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:08,969][134211] Avg episode reward: [(0, '9.660')] [2025-01-04 15:00:10,401][134294] Updated weights for policy 0, policy_version 229154 (0.0026) [2025-01-04 15:00:13,198][134294] Updated weights for policy 0, policy_version 229164 (0.0023) [2025-01-04 15:00:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14540.8, 300 sec: 14717.8). Total num frames: 938663936. Throughput: 0: 3464.0. Samples: 223836920. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:13,968][134211] Avg episode reward: [(0, '10.970')] [2025-01-04 15:00:16,223][134294] Updated weights for policy 0, policy_version 229174 (0.0025) [2025-01-04 15:00:18,582][134294] Updated weights for policy 0, policy_version 229184 (0.0018) [2025-01-04 15:00:18,967][134211] Fps is (10 sec: 14747.4, 60 sec: 13994.7, 300 sec: 14745.6). Total num frames: 938741760. Throughput: 0: 3470.1. Samples: 223847292. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:18,968][134211] Avg episode reward: [(0, '10.010')] [2025-01-04 15:00:20,490][134294] Updated weights for policy 0, policy_version 229194 (0.0012) [2025-01-04 15:00:22,663][134294] Updated weights for policy 0, policy_version 229204 (0.0018) [2025-01-04 15:00:23,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14404.2, 300 sec: 14828.9). Total num frames: 938835968. Throughput: 0: 3661.2. Samples: 223876226. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:23,969][134211] Avg episode reward: [(0, '11.544')] [2025-01-04 15:00:25,727][134294] Updated weights for policy 0, policy_version 229214 (0.0024) [2025-01-04 15:00:28,639][134294] Updated weights for policy 0, policy_version 229224 (0.0027) [2025-01-04 15:00:28,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14336.0, 300 sec: 14787.2). Total num frames: 938901504. Throughput: 0: 3655.4. Samples: 223896958. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:28,968][134211] Avg episode reward: [(0, '9.732')] [2025-01-04 15:00:31,669][134294] Updated weights for policy 0, policy_version 229234 (0.0024) [2025-01-04 15:00:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14404.2, 300 sec: 14690.1). Total num frames: 938971136. Throughput: 0: 3650.4. Samples: 223907144. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:33,969][134211] Avg episode reward: [(0, '11.062')] [2025-01-04 15:00:34,684][134294] Updated weights for policy 0, policy_version 229244 (0.0027) [2025-01-04 15:00:37,687][134294] Updated weights for policy 0, policy_version 229254 (0.0024) [2025-01-04 15:00:38,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14404.1, 300 sec: 14703.9). Total num frames: 939040768. Throughput: 0: 3644.5. Samples: 223927536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:38,969][134211] Avg episode reward: [(0, '9.930')] [2025-01-04 15:00:40,698][134294] Updated weights for policy 0, policy_version 229264 (0.0024) [2025-01-04 15:00:43,572][134294] Updated weights for policy 0, policy_version 229274 (0.0023) [2025-01-04 15:00:43,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14404.2, 300 sec: 14703.9). Total num frames: 939110400. Throughput: 0: 3640.5. Samples: 223948628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:43,971][134211] Avg episode reward: [(0, '9.540')] [2025-01-04 15:00:46,445][134294] Updated weights for policy 0, policy_version 229284 (0.0023) [2025-01-04 15:00:48,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14404.2, 300 sec: 14703.9). Total num frames: 939180032. Throughput: 0: 3640.2. Samples: 223959092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:48,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 15:00:49,571][134294] Updated weights for policy 0, policy_version 229294 (0.0025) [2025-01-04 15:00:52,487][134294] Updated weights for policy 0, policy_version 229304 (0.0026) [2025-01-04 15:00:53,968][134211] Fps is (10 sec: 13927.2, 60 sec: 14404.3, 300 sec: 14704.0). Total num frames: 939249664. Throughput: 0: 3639.0. Samples: 223979472. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:53,968][134211] Avg episode reward: [(0, '10.435')] [2025-01-04 15:00:54,953][134294] Updated weights for policy 0, policy_version 229314 (0.0018) [2025-01-04 15:00:57,063][134294] Updated weights for policy 0, policy_version 229324 (0.0016) [2025-01-04 15:00:58,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14677.3, 300 sec: 14759.5). Total num frames: 939335680. Throughput: 0: 3736.0. Samples: 224005040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:00:58,968][134211] Avg episode reward: [(0, '9.792')] [2025-01-04 15:00:59,957][134294] Updated weights for policy 0, policy_version 229334 (0.0029) [2025-01-04 15:01:02,862][134294] Updated weights for policy 0, policy_version 229344 (0.0028) [2025-01-04 15:01:03,968][134211] Fps is (10 sec: 15563.6, 60 sec: 14677.2, 300 sec: 14717.8). Total num frames: 939405312. Throughput: 0: 3740.1. Samples: 224015598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:01:03,969][134211] Avg episode reward: [(0, '9.790')] [2025-01-04 15:01:05,889][134294] Updated weights for policy 0, policy_version 229354 (0.0025) [2025-01-04 15:01:08,677][134294] Updated weights for policy 0, policy_version 229364 (0.0025) [2025-01-04 15:01:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.6, 300 sec: 14592.9). Total num frames: 939474944. Throughput: 0: 3561.5. Samples: 224036494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:01:08,968][134211] Avg episode reward: [(0, '10.577')] [2025-01-04 15:01:11,627][134294] Updated weights for policy 0, policy_version 229374 (0.0025) [2025-01-04 15:01:13,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14677.3, 300 sec: 14551.2). Total num frames: 939544576. Throughput: 0: 3566.8. Samples: 224057462. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:01:13,968][134211] Avg episode reward: [(0, '10.359')] [2025-01-04 15:01:14,023][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000229382_939548672.pth... [2025-01-04 15:01:14,089][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228519_936013824.pth [2025-01-04 15:01:14,705][134294] Updated weights for policy 0, policy_version 229384 (0.0025) [2025-01-04 15:01:17,726][134294] Updated weights for policy 0, policy_version 229394 (0.0023) [2025-01-04 15:01:18,967][134211] Fps is (10 sec: 13926.7, 60 sec: 14540.8, 300 sec: 14565.1). Total num frames: 939614208. Throughput: 0: 3562.9. Samples: 224067472. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:18,968][134211] Avg episode reward: [(0, '9.324')] [2025-01-04 15:01:19,920][134294] Updated weights for policy 0, policy_version 229404 (0.0018) [2025-01-04 15:01:21,821][134294] Updated weights for policy 0, policy_version 229414 (0.0013) [2025-01-04 15:01:23,914][134294] Updated weights for policy 0, policy_version 229424 (0.0016) [2025-01-04 15:01:23,968][134211] Fps is (10 sec: 17612.9, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 939720704. Throughput: 0: 3723.6. Samples: 224095098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:23,968][134211] Avg episode reward: [(0, '9.860')] [2025-01-04 15:01:26,968][134294] Updated weights for policy 0, policy_version 229434 (0.0030) [2025-01-04 15:01:28,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 939786240. Throughput: 0: 3750.0. Samples: 224117374. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:28,968][134211] Avg episode reward: [(0, '9.741')] [2025-01-04 15:01:30,057][134294] Updated weights for policy 0, policy_version 229444 (0.0024) [2025-01-04 15:01:32,999][134294] Updated weights for policy 0, policy_version 229454 (0.0026) [2025-01-04 15:01:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 939855872. Throughput: 0: 3751.6. Samples: 224127914. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:33,968][134211] Avg episode reward: [(0, '9.913')] [2025-01-04 15:01:35,963][134294] Updated weights for policy 0, policy_version 229464 (0.0024) [2025-01-04 15:01:38,891][134294] Updated weights for policy 0, policy_version 229474 (0.0023) [2025-01-04 15:01:38,969][134211] Fps is (10 sec: 13924.9, 60 sec: 14745.4, 300 sec: 14690.0). Total num frames: 939925504. Throughput: 0: 3759.9. Samples: 224148672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:38,969][134211] Avg episode reward: [(0, '9.858')] [2025-01-04 15:01:41,853][134294] Updated weights for policy 0, policy_version 229484 (0.0024) [2025-01-04 15:01:43,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14745.7, 300 sec: 14690.0). Total num frames: 939995136. Throughput: 0: 3656.4. Samples: 224169578. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:43,969][134211] Avg episode reward: [(0, '10.305')] [2025-01-04 15:01:44,751][134294] Updated weights for policy 0, policy_version 229494 (0.0026) [2025-01-04 15:01:47,747][134294] Updated weights for policy 0, policy_version 229504 (0.0027) [2025-01-04 15:01:48,968][134211] Fps is (10 sec: 13518.3, 60 sec: 14677.4, 300 sec: 14676.2). Total num frames: 940060672. Throughput: 0: 3649.1. Samples: 224179804. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:48,968][134211] Avg episode reward: [(0, '9.990')] [2025-01-04 15:01:50,696][134294] Updated weights for policy 0, policy_version 229514 (0.0027) [2025-01-04 15:01:53,570][134294] Updated weights for policy 0, policy_version 229524 (0.0025) [2025-01-04 15:01:53,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 940134400. Throughput: 0: 3656.9. Samples: 224201054. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:53,968][134211] Avg episode reward: [(0, '9.191')] [2025-01-04 15:01:56,357][134294] Updated weights for policy 0, policy_version 229534 (0.0024) [2025-01-04 15:01:58,968][134211] Fps is (10 sec: 14745.4, 60 sec: 14540.8, 300 sec: 14703.9). Total num frames: 940208128. Throughput: 0: 3666.9. Samples: 224222474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:01:58,968][134211] Avg episode reward: [(0, '9.999')] [2025-01-04 15:01:59,272][134294] Updated weights for policy 0, policy_version 229544 (0.0025) [2025-01-04 15:02:02,133][134294] Updated weights for policy 0, policy_version 229554 (0.0026) [2025-01-04 15:02:03,967][134211] Fps is (10 sec: 14336.3, 60 sec: 14541.0, 300 sec: 14703.9). Total num frames: 940277760. Throughput: 0: 3675.9. Samples: 224232886. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:02:03,968][134211] Avg episode reward: [(0, '8.741')] [2025-01-04 15:02:04,662][134294] Updated weights for policy 0, policy_version 229564 (0.0016) [2025-01-04 15:02:06,829][134294] Updated weights for policy 0, policy_version 229574 (0.0016) [2025-01-04 15:02:08,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14745.6, 300 sec: 14745.7). Total num frames: 940359680. Throughput: 0: 3621.7. Samples: 224258074. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:02:08,968][134211] Avg episode reward: [(0, '8.235')] [2025-01-04 15:02:10,094][134294] Updated weights for policy 0, policy_version 229584 (0.0026) [2025-01-04 15:02:13,151][134294] Updated weights for policy 0, policy_version 229594 (0.0025) [2025-01-04 15:02:13,968][134211] Fps is (10 sec: 14745.1, 60 sec: 14677.3, 300 sec: 14676.2). Total num frames: 940425216. Throughput: 0: 3564.0. Samples: 224277754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:02:13,969][134211] Avg episode reward: [(0, '9.405')] [2025-01-04 15:02:15,624][134294] Updated weights for policy 0, policy_version 229604 (0.0019) [2025-01-04 15:02:17,717][134294] Updated weights for policy 0, policy_version 229614 (0.0017) [2025-01-04 15:02:18,968][134211] Fps is (10 sec: 15564.3, 60 sec: 15018.5, 300 sec: 14731.7). Total num frames: 940515328. Throughput: 0: 3618.7. Samples: 224290758. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:02:18,969][134211] Avg episode reward: [(0, '9.161')] [2025-01-04 15:02:20,639][134294] Updated weights for policy 0, policy_version 229624 (0.0023) [2025-01-04 15:02:23,561][134294] Updated weights for policy 0, policy_version 229634 (0.0026) [2025-01-04 15:02:23,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 940584960. Throughput: 0: 3663.5. Samples: 224313528. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:02:23,968][134211] Avg episode reward: [(0, '9.806')] [2025-01-04 15:02:26,535][134294] Updated weights for policy 0, policy_version 229644 (0.0026) [2025-01-04 15:02:28,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14472.5, 300 sec: 14731.7). Total num frames: 940654592. Throughput: 0: 3657.8. Samples: 224334180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:28,968][134211] Avg episode reward: [(0, '9.764')] [2025-01-04 15:02:29,574][134294] Updated weights for policy 0, policy_version 229654 (0.0024) [2025-01-04 15:02:32,593][134294] Updated weights for policy 0, policy_version 229664 (0.0026) [2025-01-04 15:02:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14648.4). Total num frames: 940720128. Throughput: 0: 3656.3. Samples: 224344336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:33,968][134211] Avg episode reward: [(0, '9.197')] [2025-01-04 15:02:35,509][134294] Updated weights for policy 0, policy_version 229674 (0.0025) [2025-01-04 15:02:38,356][134294] Updated weights for policy 0, policy_version 229684 (0.0026) [2025-01-04 15:02:38,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14472.8, 300 sec: 14551.2). Total num frames: 940793856. Throughput: 0: 3653.8. Samples: 224365474. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:38,968][134211] Avg episode reward: [(0, '8.960')] [2025-01-04 15:02:41,229][134294] Updated weights for policy 0, policy_version 229694 (0.0024) [2025-01-04 15:02:43,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14472.6, 300 sec: 14565.1). Total num frames: 940863488. Throughput: 0: 3644.6. Samples: 224386480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:43,968][134211] Avg episode reward: [(0, '8.786')] [2025-01-04 15:02:44,267][134294] Updated weights for policy 0, policy_version 229704 (0.0026) [2025-01-04 15:02:47,183][134294] Updated weights for policy 0, policy_version 229714 (0.0027) [2025-01-04 15:02:48,967][134211] Fps is (10 sec: 14745.6, 60 sec: 14677.4, 300 sec: 14606.8). Total num frames: 940941312. Throughput: 0: 3643.9. Samples: 224396862. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:48,968][134211] Avg episode reward: [(0, '9.880')] [2025-01-04 15:02:49,199][134294] Updated weights for policy 0, policy_version 229724 (0.0015) [2025-01-04 15:02:51,781][134294] Updated weights for policy 0, policy_version 229734 (0.0022) [2025-01-04 15:02:53,968][134211] Fps is (10 sec: 15564.0, 60 sec: 14745.5, 300 sec: 14634.5). Total num frames: 941019136. Throughput: 0: 3649.6. Samples: 224422310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:53,969][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 15:02:54,678][134294] Updated weights for policy 0, policy_version 229744 (0.0022) [2025-01-04 15:02:57,647][134294] Updated weights for policy 0, policy_version 229754 (0.0026) [2025-01-04 15:02:58,968][134211] Fps is (10 sec: 14744.9, 60 sec: 14677.3, 300 sec: 14634.5). Total num frames: 941088768. Throughput: 0: 3675.5. Samples: 224443154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:02:58,969][134211] Avg episode reward: [(0, '8.965')] [2025-01-04 15:03:00,476][134294] Updated weights for policy 0, policy_version 229764 (0.0024) [2025-01-04 15:03:03,472][134294] Updated weights for policy 0, policy_version 229774 (0.0023) [2025-01-04 15:03:03,967][134211] Fps is (10 sec: 14337.1, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 941162496. Throughput: 0: 3623.6. Samples: 224453818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:03,968][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 15:03:05,443][134294] Updated weights for policy 0, policy_version 229784 (0.0015) [2025-01-04 15:03:08,034][134294] Updated weights for policy 0, policy_version 229794 (0.0021) [2025-01-04 15:03:08,968][134211] Fps is (10 sec: 15974.0, 60 sec: 14813.7, 300 sec: 14717.8). Total num frames: 941248512. Throughput: 0: 3682.5. Samples: 224479242. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:08,969][134211] Avg episode reward: [(0, '8.630')] [2025-01-04 15:03:11,049][134294] Updated weights for policy 0, policy_version 229804 (0.0026) [2025-01-04 15:03:13,968][134211] Fps is (10 sec: 15154.6, 60 sec: 14813.9, 300 sec: 14690.0). Total num frames: 941314048. Throughput: 0: 3678.7. Samples: 224499722. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:13,969][134211] Avg episode reward: [(0, '10.133')] [2025-01-04 15:03:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000229813_941314048.pth... [2025-01-04 15:03:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000228953_937791488.pth [2025-01-04 15:03:14,132][134294] Updated weights for policy 0, policy_version 229814 (0.0024) [2025-01-04 15:03:17,151][134294] Updated weights for policy 0, policy_version 229824 (0.0023) [2025-01-04 15:03:18,968][134211] Fps is (10 sec: 13517.5, 60 sec: 14472.6, 300 sec: 14703.9). Total num frames: 941383680. Throughput: 0: 3673.9. Samples: 224509662. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:18,968][134211] Avg episode reward: [(0, '9.788')] [2025-01-04 15:03:20,103][134294] Updated weights for policy 0, policy_version 229834 (0.0027) [2025-01-04 15:03:23,038][134294] Updated weights for policy 0, policy_version 229844 (0.0024) [2025-01-04 15:03:23,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14472.5, 300 sec: 14690.0). Total num frames: 941453312. Throughput: 0: 3672.2. Samples: 224530726. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:23,969][134211] Avg episode reward: [(0, '11.156')] [2025-01-04 15:03:25,855][134294] Updated weights for policy 0, policy_version 229854 (0.0025) [2025-01-04 15:03:28,752][134294] Updated weights for policy 0, policy_version 229864 (0.0027) [2025-01-04 15:03:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.6, 300 sec: 14676.2). Total num frames: 941522944. Throughput: 0: 3678.4. Samples: 224552008. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:28,968][134211] Avg episode reward: [(0, '9.247')] [2025-01-04 15:03:31,717][134294] Updated weights for policy 0, policy_version 229874 (0.0024) [2025-01-04 15:03:33,968][134211] Fps is (10 sec: 13926.8, 60 sec: 14540.8, 300 sec: 14579.0). Total num frames: 941592576. Throughput: 0: 3676.9. Samples: 224562324. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:03:33,968][134211] Avg episode reward: [(0, '9.376')] [2025-01-04 15:03:34,668][134294] Updated weights for policy 0, policy_version 229884 (0.0027) [2025-01-04 15:03:37,225][134294] Updated weights for policy 0, policy_version 229894 (0.0019) [2025-01-04 15:03:38,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14813.9, 300 sec: 14648.4). Total num frames: 941682688. Throughput: 0: 3607.4. Samples: 224584642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:03:38,968][134211] Avg episode reward: [(0, '8.698')] [2025-01-04 15:03:39,121][134294] Updated weights for policy 0, policy_version 229904 (0.0015) [2025-01-04 15:03:41,020][134294] Updated weights for policy 0, policy_version 229914 (0.0012) [2025-01-04 15:03:42,841][134294] Updated weights for policy 0, policy_version 229924 (0.0014) [2025-01-04 15:03:43,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15496.4, 300 sec: 14787.2). Total num frames: 941793280. Throughput: 0: 3866.3. Samples: 224617140. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:03:43,969][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 15:03:44,960][134294] Updated weights for policy 0, policy_version 229934 (0.0017) [2025-01-04 15:03:47,969][134294] Updated weights for policy 0, policy_version 229944 (0.0027) [2025-01-04 15:03:48,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15291.7, 300 sec: 14773.4). Total num frames: 941858816. Throughput: 0: 3914.3. Samples: 224629964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:03:48,968][134211] Avg episode reward: [(0, '8.818')] [2025-01-04 15:03:51,204][134294] Updated weights for policy 0, policy_version 229954 (0.0025) [2025-01-04 15:03:53,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15087.1, 300 sec: 14773.4). Total num frames: 941924352. Throughput: 0: 3781.6. Samples: 224649414. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:03:53,968][134211] Avg episode reward: [(0, '9.315')] [2025-01-04 15:03:54,344][134294] Updated weights for policy 0, policy_version 229964 (0.0027) [2025-01-04 15:03:57,302][134294] Updated weights for policy 0, policy_version 229974 (0.0030) [2025-01-04 15:03:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15087.0, 300 sec: 14759.5). Total num frames: 941993984. Throughput: 0: 3768.4. Samples: 224669300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:03:58,968][134211] Avg episode reward: [(0, '10.025')] [2025-01-04 15:04:00,316][134294] Updated weights for policy 0, policy_version 229984 (0.0025) [2025-01-04 15:04:03,300][134294] Updated weights for policy 0, policy_version 229994 (0.0023) [2025-01-04 15:04:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14950.3, 300 sec: 14745.6). Total num frames: 942059520. Throughput: 0: 3780.7. Samples: 224679794. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:03,968][134211] Avg episode reward: [(0, '9.822')] [2025-01-04 15:04:06,315][134294] Updated weights for policy 0, policy_version 230004 (0.0026) [2025-01-04 15:04:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.8, 300 sec: 14717.8). Total num frames: 942133248. Throughput: 0: 3768.1. Samples: 224700288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:08,968][134211] Avg episode reward: [(0, '10.909')] [2025-01-04 15:04:09,388][134294] Updated weights for policy 0, policy_version 230014 (0.0027) [2025-01-04 15:04:12,301][134294] Updated weights for policy 0, policy_version 230024 (0.0025) [2025-01-04 15:04:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 14565.1). Total num frames: 942198784. Throughput: 0: 3751.4. Samples: 224720822. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:13,968][134211] Avg episode reward: [(0, '9.282')] [2025-01-04 15:04:15,247][134294] Updated weights for policy 0, policy_version 230034 (0.0023) [2025-01-04 15:04:17,973][134294] Updated weights for policy 0, policy_version 230044 (0.0025) [2025-01-04 15:04:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 942272512. Throughput: 0: 3760.6. Samples: 224731550. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:18,968][134211] Avg episode reward: [(0, '9.453')] [2025-01-04 15:04:20,992][134294] Updated weights for policy 0, policy_version 230054 (0.0026) [2025-01-04 15:04:23,942][134294] Updated weights for policy 0, policy_version 230064 (0.0028) [2025-01-04 15:04:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14814.0, 300 sec: 14579.0). Total num frames: 942342144. Throughput: 0: 3732.7. Samples: 224752614. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:23,968][134211] Avg episode reward: [(0, '9.947')] [2025-01-04 15:04:26,851][134294] Updated weights for policy 0, policy_version 230074 (0.0024) [2025-01-04 15:04:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.8, 300 sec: 14592.9). Total num frames: 942411776. Throughput: 0: 3475.8. Samples: 224773548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:28,968][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 15:04:29,890][134294] Updated weights for policy 0, policy_version 230084 (0.0027) [2025-01-04 15:04:32,811][134294] Updated weights for policy 0, policy_version 230094 (0.0025) [2025-01-04 15:04:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 942477312. Throughput: 0: 3416.2. Samples: 224783692. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:33,968][134211] Avg episode reward: [(0, '9.928')] [2025-01-04 15:04:35,767][134294] Updated weights for policy 0, policy_version 230104 (0.0023) [2025-01-04 15:04:38,598][134294] Updated weights for policy 0, policy_version 230114 (0.0027) [2025-01-04 15:04:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 942551040. Throughput: 0: 3454.8. Samples: 224804878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:38,968][134211] Avg episode reward: [(0, '9.236')] [2025-01-04 15:04:41,605][134294] Updated weights for policy 0, policy_version 230124 (0.0026) [2025-01-04 15:04:43,536][134294] Updated weights for policy 0, policy_version 230134 (0.0012) [2025-01-04 15:04:43,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14063.0, 300 sec: 14648.4). Total num frames: 942637056. Throughput: 0: 3538.7. Samples: 224828542. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:04:43,968][134211] Avg episode reward: [(0, '9.436')] [2025-01-04 15:04:45,427][134294] Updated weights for policy 0, policy_version 230144 (0.0015) [2025-01-04 15:04:47,877][134294] Updated weights for policy 0, policy_version 230154 (0.0023) [2025-01-04 15:04:48,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14404.3, 300 sec: 14703.9). Total num frames: 942723072. Throughput: 0: 3658.6. Samples: 224844432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:04:48,968][134211] Avg episode reward: [(0, '9.396')] [2025-01-04 15:04:50,983][134294] Updated weights for policy 0, policy_version 230164 (0.0026) [2025-01-04 15:04:53,937][134294] Updated weights for policy 0, policy_version 230174 (0.0024) [2025-01-04 15:04:53,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14472.5, 300 sec: 14703.9). Total num frames: 942792704. Throughput: 0: 3665.5. Samples: 224865238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:04:53,969][134211] Avg episode reward: [(0, '10.118')] [2025-01-04 15:04:56,955][134294] Updated weights for policy 0, policy_version 230184 (0.0027) [2025-01-04 15:04:58,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14404.1, 300 sec: 14690.0). Total num frames: 942858240. Throughput: 0: 3666.5. Samples: 224885816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:04:58,969][134211] Avg episode reward: [(0, '9.862')] [2025-01-04 15:04:59,911][134294] Updated weights for policy 0, policy_version 230194 (0.0025) [2025-01-04 15:05:02,867][134294] Updated weights for policy 0, policy_version 230204 (0.0026) [2025-01-04 15:05:03,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14472.5, 300 sec: 14690.1). Total num frames: 942927872. Throughput: 0: 3662.7. Samples: 224896370. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:03,968][134211] Avg episode reward: [(0, '9.956')] [2025-01-04 15:05:05,831][134294] Updated weights for policy 0, policy_version 230214 (0.0024) [2025-01-04 15:05:08,556][134294] Updated weights for policy 0, policy_version 230224 (0.0025) [2025-01-04 15:05:08,968][134211] Fps is (10 sec: 14336.7, 60 sec: 14472.5, 300 sec: 14703.9). Total num frames: 943001600. Throughput: 0: 3665.6. Samples: 224917566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:08,968][134211] Avg episode reward: [(0, '9.400')] [2025-01-04 15:05:11,580][134294] Updated weights for policy 0, policy_version 230234 (0.0026) [2025-01-04 15:05:13,968][134211] Fps is (10 sec: 14335.4, 60 sec: 14540.7, 300 sec: 14676.1). Total num frames: 943071232. Throughput: 0: 3660.2. Samples: 224938260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:13,969][134211] Avg episode reward: [(0, '9.811')] [2025-01-04 15:05:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000230242_943071232.pth... [2025-01-04 15:05:14,047][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000229382_939548672.pth [2025-01-04 15:05:14,597][134294] Updated weights for policy 0, policy_version 230244 (0.0027) [2025-01-04 15:05:17,245][134294] Updated weights for policy 0, policy_version 230254 (0.0020) [2025-01-04 15:05:18,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 943157248. Throughput: 0: 3655.7. Samples: 224948198. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:18,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 15:05:19,099][134294] Updated weights for policy 0, policy_version 230264 (0.0012) [2025-01-04 15:05:21,001][134294] Updated weights for policy 0, policy_version 230274 (0.0014) [2025-01-04 15:05:22,894][134294] Updated weights for policy 0, policy_version 230284 (0.0014) [2025-01-04 15:05:23,967][134211] Fps is (10 sec: 19252.5, 60 sec: 15360.0, 300 sec: 14787.3). Total num frames: 943263744. Throughput: 0: 3895.1. Samples: 224980156. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:23,968][134211] Avg episode reward: [(0, '10.495')] [2025-01-04 15:05:24,773][134294] Updated weights for policy 0, policy_version 230294 (0.0014) [2025-01-04 15:05:26,855][134294] Updated weights for policy 0, policy_version 230304 (0.0016) [2025-01-04 15:05:28,968][134211] Fps is (10 sec: 19660.4, 60 sec: 15701.3, 300 sec: 14856.7). Total num frames: 943353856. Throughput: 0: 4014.7. Samples: 225009204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:28,968][134211] Avg episode reward: [(0, '10.277')] [2025-01-04 15:05:29,902][134294] Updated weights for policy 0, policy_version 230314 (0.0027) [2025-01-04 15:05:33,060][134294] Updated weights for policy 0, policy_version 230324 (0.0029) [2025-01-04 15:05:33,968][134211] Fps is (10 sec: 15154.7, 60 sec: 15633.0, 300 sec: 14828.9). Total num frames: 943415296. Throughput: 0: 3872.6. Samples: 225018700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:33,968][134211] Avg episode reward: [(0, '9.641')] [2025-01-04 15:05:36,107][134294] Updated weights for policy 0, policy_version 230334 (0.0023) [2025-01-04 15:05:38,968][134211] Fps is (10 sec: 12697.8, 60 sec: 15496.5, 300 sec: 14815.1). Total num frames: 943480832. Throughput: 0: 3851.8. Samples: 225038568. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:38,968][134211] Avg episode reward: [(0, '9.408')] [2025-01-04 15:05:39,260][134294] Updated weights for policy 0, policy_version 230344 (0.0029) [2025-01-04 15:05:42,288][134294] Updated weights for policy 0, policy_version 230354 (0.0025) [2025-01-04 15:05:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15223.4, 300 sec: 14815.0). Total num frames: 943550464. Throughput: 0: 3842.2. Samples: 225058712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:43,968][134211] Avg episode reward: [(0, '8.635')] [2025-01-04 15:05:45,302][134294] Updated weights for policy 0, policy_version 230364 (0.0025) [2025-01-04 15:05:48,360][134294] Updated weights for policy 0, policy_version 230374 (0.0025) [2025-01-04 15:05:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14815.0). Total num frames: 943620096. Throughput: 0: 3831.2. Samples: 225068772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:05:48,968][134211] Avg episode reward: [(0, '10.451')] [2025-01-04 15:05:51,224][134294] Updated weights for policy 0, policy_version 230384 (0.0026) [2025-01-04 15:05:53,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14950.4, 300 sec: 14759.5). Total num frames: 943689728. Throughput: 0: 3819.9. Samples: 225089464. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:05:53,969][134211] Avg episode reward: [(0, '8.792')] [2025-01-04 15:05:54,311][134294] Updated weights for policy 0, policy_version 230394 (0.0027) [2025-01-04 15:05:57,250][134294] Updated weights for policy 0, policy_version 230404 (0.0025) [2025-01-04 15:05:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.5, 300 sec: 14745.6). Total num frames: 943755264. Throughput: 0: 3818.2. Samples: 225110078. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:05:58,968][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 15:06:00,196][134294] Updated weights for policy 0, policy_version 230414 (0.0022) [2025-01-04 15:06:03,069][134294] Updated weights for policy 0, policy_version 230424 (0.0025) [2025-01-04 15:06:03,968][134211] Fps is (10 sec: 13926.0, 60 sec: 15018.6, 300 sec: 14759.5). Total num frames: 943828992. Throughput: 0: 3836.7. Samples: 225120852. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:03,969][134211] Avg episode reward: [(0, '10.827')] [2025-01-04 15:06:05,980][134294] Updated weights for policy 0, policy_version 230434 (0.0026) [2025-01-04 15:06:08,867][134294] Updated weights for policy 0, policy_version 230444 (0.0026) [2025-01-04 15:06:08,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14950.3, 300 sec: 14759.5). Total num frames: 943898624. Throughput: 0: 3595.3. Samples: 225141948. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:08,969][134211] Avg episode reward: [(0, '9.628')] [2025-01-04 15:06:11,688][134294] Updated weights for policy 0, policy_version 230454 (0.0025) [2025-01-04 15:06:13,968][134211] Fps is (10 sec: 13927.2, 60 sec: 14950.5, 300 sec: 14759.5). Total num frames: 943968256. Throughput: 0: 3424.7. Samples: 225163316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:13,968][134211] Avg episode reward: [(0, '10.001')] [2025-01-04 15:06:14,645][134294] Updated weights for policy 0, policy_version 230464 (0.0023) [2025-01-04 15:06:17,497][134294] Updated weights for policy 0, policy_version 230474 (0.0024) [2025-01-04 15:06:18,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14677.3, 300 sec: 14634.5). Total num frames: 944037888. Throughput: 0: 3445.2. Samples: 225173734. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:18,968][134211] Avg episode reward: [(0, '9.592')] [2025-01-04 15:06:20,545][134294] Updated weights for policy 0, policy_version 230484 (0.0028) [2025-01-04 15:06:23,417][134294] Updated weights for policy 0, policy_version 230494 (0.0025) [2025-01-04 15:06:23,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14131.1, 300 sec: 14662.3). Total num frames: 944111616. Throughput: 0: 3472.4. Samples: 225194828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:23,968][134211] Avg episode reward: [(0, '9.749')] [2025-01-04 15:06:26,465][134294] Updated weights for policy 0, policy_version 230504 (0.0023) [2025-01-04 15:06:28,971][134211] Fps is (10 sec: 13922.2, 60 sec: 13720.9, 300 sec: 14648.3). Total num frames: 944177152. Throughput: 0: 3475.1. Samples: 225215102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:28,971][134211] Avg episode reward: [(0, '10.638')] [2025-01-04 15:06:29,475][134294] Updated weights for policy 0, policy_version 230514 (0.0024) [2025-01-04 15:06:32,478][134294] Updated weights for policy 0, policy_version 230524 (0.0025) [2025-01-04 15:06:33,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13858.0, 300 sec: 14648.4). Total num frames: 944246784. Throughput: 0: 3479.6. Samples: 225225354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:33,969][134211] Avg episode reward: [(0, '9.278')] [2025-01-04 15:06:35,436][134294] Updated weights for policy 0, policy_version 230534 (0.0023) [2025-01-04 15:06:38,235][134294] Updated weights for policy 0, policy_version 230544 (0.0024) [2025-01-04 15:06:38,968][134211] Fps is (10 sec: 13929.7, 60 sec: 13926.2, 300 sec: 14648.4). Total num frames: 944316416. Throughput: 0: 3493.0. Samples: 225246652. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:38,969][134211] Avg episode reward: [(0, '10.317')] [2025-01-04 15:06:40,728][134294] Updated weights for policy 0, policy_version 230554 (0.0021) [2025-01-04 15:06:42,656][134294] Updated weights for policy 0, policy_version 230564 (0.0014) [2025-01-04 15:06:43,968][134211] Fps is (10 sec: 17204.4, 60 sec: 14472.6, 300 sec: 14773.4). Total num frames: 944418816. Throughput: 0: 3640.3. Samples: 225273890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:43,968][134211] Avg episode reward: [(0, '9.484')] [2025-01-04 15:06:44,531][134294] Updated weights for policy 0, policy_version 230574 (0.0015) [2025-01-04 15:06:47,399][134294] Updated weights for policy 0, policy_version 230584 (0.0022) [2025-01-04 15:06:48,968][134211] Fps is (10 sec: 17614.0, 60 sec: 14540.8, 300 sec: 14773.4). Total num frames: 944492544. Throughput: 0: 3694.5. Samples: 225287104. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:48,968][134211] Avg episode reward: [(0, '10.521')] [2025-01-04 15:06:50,381][134294] Updated weights for policy 0, policy_version 230594 (0.0027) [2025-01-04 15:06:53,363][134294] Updated weights for policy 0, policy_version 230604 (0.0024) [2025-01-04 15:06:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.6, 300 sec: 14745.6). Total num frames: 944558080. Throughput: 0: 3683.6. Samples: 225307708. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:53,968][134211] Avg episode reward: [(0, '10.327')] [2025-01-04 15:06:56,352][134294] Updated weights for policy 0, policy_version 230614 (0.0025) [2025-01-04 15:06:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14745.6). Total num frames: 944627712. Throughput: 0: 3654.5. Samples: 225327768. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:06:58,968][134211] Avg episode reward: [(0, '8.568')] [2025-01-04 15:06:59,492][134294] Updated weights for policy 0, policy_version 230624 (0.0025) [2025-01-04 15:07:02,590][134294] Updated weights for policy 0, policy_version 230634 (0.0023) [2025-01-04 15:07:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14404.4, 300 sec: 14690.1). Total num frames: 944693248. Throughput: 0: 3648.6. Samples: 225337922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:03,968][134211] Avg episode reward: [(0, '10.690')] [2025-01-04 15:07:05,605][134294] Updated weights for policy 0, policy_version 230644 (0.0026) [2025-01-04 15:07:08,438][134294] Updated weights for policy 0, policy_version 230654 (0.0025) [2025-01-04 15:07:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14703.9). Total num frames: 944762880. Throughput: 0: 3639.3. Samples: 225358596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:08,968][134211] Avg episode reward: [(0, '10.364')] [2025-01-04 15:07:11,338][134294] Updated weights for policy 0, policy_version 230664 (0.0027) [2025-01-04 15:07:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14404.2, 300 sec: 14634.5). Total num frames: 944832512. Throughput: 0: 3649.5. Samples: 225379318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:13,969][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 15:07:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000230672_944832512.pth... 
[2025-01-04 15:07:14,055][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000229813_941314048.pth [2025-01-04 15:07:14,443][134294] Updated weights for policy 0, policy_version 230674 (0.0027) [2025-01-04 15:07:16,796][134294] Updated weights for policy 0, policy_version 230684 (0.0016) [2025-01-04 15:07:18,725][134294] Updated weights for policy 0, policy_version 230694 (0.0013) [2025-01-04 15:07:18,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 944926720. Throughput: 0: 3667.8. Samples: 225390404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:18,968][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 15:07:20,631][134294] Updated weights for policy 0, policy_version 230704 (0.0012) [2025-01-04 15:07:22,481][134294] Updated weights for policy 0, policy_version 230714 (0.0012) [2025-01-04 15:07:23,967][134211] Fps is (10 sec: 20071.1, 60 sec: 15360.0, 300 sec: 14842.8). Total num frames: 945033216. Throughput: 0: 3911.6. Samples: 225422672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:23,968][134211] Avg episode reward: [(0, '9.931')] [2025-01-04 15:07:24,400][134294] Updated weights for policy 0, policy_version 230724 (0.0014) [2025-01-04 15:07:26,514][134294] Updated weights for policy 0, policy_version 230734 (0.0016) [2025-01-04 15:07:28,968][134211] Fps is (10 sec: 19250.8, 60 sec: 15702.1, 300 sec: 14912.2). Total num frames: 945119232. Throughput: 0: 3931.1. Samples: 225450790. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:28,968][134211] Avg episode reward: [(0, '10.094')] [2025-01-04 15:07:29,559][134294] Updated weights for policy 0, policy_version 230744 (0.0025) [2025-01-04 15:07:32,737][134294] Updated weights for policy 0, policy_version 230754 (0.0028) [2025-01-04 15:07:33,968][134211] Fps is (10 sec: 14745.2, 60 sec: 15564.9, 300 sec: 14870.6). Total num frames: 945180672. 
Throughput: 0: 3845.7. Samples: 225460160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:33,968][134211] Avg episode reward: [(0, '9.366')] [2025-01-04 15:07:35,772][134294] Updated weights for policy 0, policy_version 230764 (0.0025) [2025-01-04 15:07:38,697][134294] Updated weights for policy 0, policy_version 230774 (0.0025) [2025-01-04 15:07:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15564.9, 300 sec: 14870.6). Total num frames: 945250304. Throughput: 0: 3840.6. Samples: 225480536. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:38,968][134211] Avg episode reward: [(0, '11.050')] [2025-01-04 15:07:41,752][134294] Updated weights for policy 0, policy_version 230784 (0.0028) [2025-01-04 15:07:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14842.8). Total num frames: 945319936. Throughput: 0: 3844.4. Samples: 225500764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:43,968][134211] Avg episode reward: [(0, '9.133')] [2025-01-04 15:07:44,770][134294] Updated weights for policy 0, policy_version 230794 (0.0026) [2025-01-04 15:07:47,793][134294] Updated weights for policy 0, policy_version 230804 (0.0024) [2025-01-04 15:07:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14815.1). Total num frames: 945389568. Throughput: 0: 3844.5. Samples: 225510924. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:48,968][134211] Avg episode reward: [(0, '9.446')] [2025-01-04 15:07:50,730][134294] Updated weights for policy 0, policy_version 230814 (0.0024) [2025-01-04 15:07:53,612][134294] Updated weights for policy 0, policy_version 230824 (0.0023) [2025-01-04 15:07:53,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 14815.0). Total num frames: 945459200. Throughput: 0: 3852.9. Samples: 225531978. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:53,968][134211] Avg episode reward: [(0, '9.835')] [2025-01-04 15:07:56,549][134294] Updated weights for policy 0, policy_version 230834 (0.0025) [2025-01-04 15:07:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14801.1). Total num frames: 945528832. Throughput: 0: 3854.9. Samples: 225552788. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:07:58,968][134211] Avg episode reward: [(0, '10.263')] [2025-01-04 15:07:59,516][134294] Updated weights for policy 0, policy_version 230844 (0.0027) [2025-01-04 15:08:02,574][134294] Updated weights for policy 0, policy_version 230854 (0.0026) [2025-01-04 15:08:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14731.7). Total num frames: 945594368. Throughput: 0: 3836.5. Samples: 225563048. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:08:03,969][134211] Avg episode reward: [(0, '10.124')] [2025-01-04 15:08:05,496][134294] Updated weights for policy 0, policy_version 230864 (0.0025) [2025-01-04 15:08:08,375][134294] Updated weights for policy 0, policy_version 230874 (0.0025) [2025-01-04 15:08:08,968][134211] Fps is (10 sec: 13926.5, 60 sec: 15087.0, 300 sec: 14759.5). Total num frames: 945668096. Throughput: 0: 3584.0. Samples: 225583952. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:08:08,968][134211] Avg episode reward: [(0, '10.003')] [2025-01-04 15:08:11,297][134294] Updated weights for policy 0, policy_version 230884 (0.0025) [2025-01-04 15:08:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 945733632. Throughput: 0: 3422.7. Samples: 225604812. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:13,968][134211] Avg episode reward: [(0, '10.046')] [2025-01-04 15:08:14,248][134294] Updated weights for policy 0, policy_version 230894 (0.0024) [2025-01-04 15:08:17,285][134294] Updated weights for policy 0, policy_version 230904 (0.0025) [2025-01-04 15:08:18,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14609.0, 300 sec: 14745.6). Total num frames: 945803264. Throughput: 0: 3441.8. Samples: 225615040. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:18,969][134211] Avg episode reward: [(0, '10.826')] [2025-01-04 15:08:20,162][134294] Updated weights for policy 0, policy_version 230914 (0.0030) [2025-01-04 15:08:23,110][134294] Updated weights for policy 0, policy_version 230924 (0.0025) [2025-01-04 15:08:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13994.6, 300 sec: 14745.6). Total num frames: 945872896. Throughput: 0: 3466.0. Samples: 225636506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:23,968][134211] Avg episode reward: [(0, '9.956')] [2025-01-04 15:08:25,873][134294] Updated weights for policy 0, policy_version 230934 (0.0025) [2025-01-04 15:08:28,799][134294] Updated weights for policy 0, policy_version 230944 (0.0023) [2025-01-04 15:08:28,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13789.9, 300 sec: 14759.5). Total num frames: 945946624. Throughput: 0: 3488.7. Samples: 225657756. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:28,968][134211] Avg episode reward: [(0, '10.002')] [2025-01-04 15:08:31,676][134294] Updated weights for policy 0, policy_version 230954 (0.0024) [2025-01-04 15:08:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13926.4, 300 sec: 14690.0). Total num frames: 946016256. Throughput: 0: 3494.9. Samples: 225668196. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:33,968][134211] Avg episode reward: [(0, '10.199')] [2025-01-04 15:08:34,721][134294] Updated weights for policy 0, policy_version 230964 (0.0023) [2025-01-04 15:08:37,691][134294] Updated weights for policy 0, policy_version 230974 (0.0025) [2025-01-04 15:08:38,967][134211] Fps is (10 sec: 14336.3, 60 sec: 13994.7, 300 sec: 14565.1). Total num frames: 946089984. Throughput: 0: 3485.4. Samples: 225688818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:38,968][134211] Avg episode reward: [(0, '11.006')] [2025-01-04 15:08:39,803][134294] Updated weights for policy 0, policy_version 230984 (0.0016) [2025-01-04 15:08:42,222][134294] Updated weights for policy 0, policy_version 230994 (0.0018) [2025-01-04 15:08:43,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14199.5, 300 sec: 14620.6). Total num frames: 946171904. Throughput: 0: 3585.5. Samples: 225714134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:43,968][134211] Avg episode reward: [(0, '10.292')] [2025-01-04 15:08:45,217][134294] Updated weights for policy 0, policy_version 231004 (0.0025) [2025-01-04 15:08:48,140][134294] Updated weights for policy 0, policy_version 231014 (0.0024) [2025-01-04 15:08:48,968][134211] Fps is (10 sec: 15154.9, 60 sec: 14199.5, 300 sec: 14634.5). Total num frames: 946241536. Throughput: 0: 3593.0. Samples: 225724734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:48,968][134211] Avg episode reward: [(0, '10.326')] [2025-01-04 15:08:51,047][134294] Updated weights for policy 0, policy_version 231024 (0.0023) [2025-01-04 15:08:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14634.5). Total num frames: 946311168. Throughput: 0: 3593.8. Samples: 225745674. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:53,968][134211] Avg episode reward: [(0, '9.997')] [2025-01-04 15:08:54,040][134294] Updated weights for policy 0, policy_version 231034 (0.0023) [2025-01-04 15:08:56,977][134294] Updated weights for policy 0, policy_version 231044 (0.0024) [2025-01-04 15:08:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14199.5, 300 sec: 14648.4). Total num frames: 946380800. Throughput: 0: 3588.7. Samples: 225766302. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:08:58,968][134211] Avg episode reward: [(0, '9.940')] [2025-01-04 15:08:59,872][134294] Updated weights for policy 0, policy_version 231054 (0.0026) [2025-01-04 15:09:02,421][134294] Updated weights for policy 0, policy_version 231064 (0.0022) [2025-01-04 15:09:03,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14609.1, 300 sec: 14703.9). Total num frames: 946470912. Throughput: 0: 3600.1. Samples: 225777044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:09:03,968][134211] Avg episode reward: [(0, '9.836')] [2025-01-04 15:09:04,283][134294] Updated weights for policy 0, policy_version 231074 (0.0013) [2025-01-04 15:09:06,162][134294] Updated weights for policy 0, policy_version 231084 (0.0013) [2025-01-04 15:09:08,058][134294] Updated weights for policy 0, policy_version 231094 (0.0013) [2025-01-04 15:09:08,967][134211] Fps is (10 sec: 19660.9, 60 sec: 15155.2, 300 sec: 14842.8). Total num frames: 946577408. Throughput: 0: 3827.4. Samples: 225808736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:09:08,968][134211] Avg episode reward: [(0, '9.977')] [2025-01-04 15:09:10,034][134294] Updated weights for policy 0, policy_version 231104 (0.0015) [2025-01-04 15:09:13,019][134294] Updated weights for policy 0, policy_version 231114 (0.0023) [2025-01-04 15:09:13,968][134211] Fps is (10 sec: 18022.3, 60 sec: 15291.7, 300 sec: 14842.8). Total num frames: 946651136. Throughput: 0: 3922.1. Samples: 225834250. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:09:13,968][134211] Avg episode reward: [(0, '9.738')] [2025-01-04 15:09:14,033][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231117_946655232.pth... [2025-01-04 15:09:14,106][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000230242_943071232.pth [2025-01-04 15:09:16,277][134294] Updated weights for policy 0, policy_version 231124 (0.0030) [2025-01-04 15:09:18,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15291.8, 300 sec: 14842.8). Total num frames: 946720768. Throughput: 0: 3907.4. Samples: 225844028. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:09:18,968][134211] Avg episode reward: [(0, '9.228')] [2025-01-04 15:09:19,236][134294] Updated weights for policy 0, policy_version 231134 (0.0027) [2025-01-04 15:09:22,316][134294] Updated weights for policy 0, policy_version 231144 (0.0028) [2025-01-04 15:09:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15223.5, 300 sec: 14828.9). Total num frames: 946786304. Throughput: 0: 3900.7. Samples: 225864352. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:23,968][134211] Avg episode reward: [(0, '9.322')] [2025-01-04 15:09:25,225][134294] Updated weights for policy 0, policy_version 231154 (0.0024) [2025-01-04 15:09:28,126][134294] Updated weights for policy 0, policy_version 231164 (0.0028) [2025-01-04 15:09:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15155.2, 300 sec: 14842.8). Total num frames: 946855936. Throughput: 0: 3798.9. Samples: 225885084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:28,968][134211] Avg episode reward: [(0, '9.491')] [2025-01-04 15:09:31,200][134294] Updated weights for policy 0, policy_version 231174 (0.0023) [2025-01-04 15:09:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15155.2, 300 sec: 14828.9). Total num frames: 946925568. 
Throughput: 0: 3796.5. Samples: 225895576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:33,968][134211] Avg episode reward: [(0, '10.513')] [2025-01-04 15:09:34,099][134294] Updated weights for policy 0, policy_version 231184 (0.0025) [2025-01-04 15:09:37,099][134294] Updated weights for policy 0, policy_version 231194 (0.0024) [2025-01-04 15:09:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 14773.4). Total num frames: 946995200. Throughput: 0: 3786.8. Samples: 225916078. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:38,968][134211] Avg episode reward: [(0, '10.672')] [2025-01-04 15:09:40,045][134294] Updated weights for policy 0, policy_version 231204 (0.0023) [2025-01-04 15:09:42,935][134294] Updated weights for policy 0, policy_version 231214 (0.0023) [2025-01-04 15:09:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 947064832. Throughput: 0: 3797.6. Samples: 225937196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:43,968][134211] Avg episode reward: [(0, '10.646')] [2025-01-04 15:09:45,879][134294] Updated weights for policy 0, policy_version 231224 (0.0025) [2025-01-04 15:09:48,695][134294] Updated weights for policy 0, policy_version 231234 (0.0024) [2025-01-04 15:09:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 947134464. Throughput: 0: 3795.3. Samples: 225947834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:48,968][134211] Avg episode reward: [(0, '10.113')] [2025-01-04 15:09:51,617][134294] Updated weights for policy 0, policy_version 231244 (0.0021) [2025-01-04 15:09:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14950.4, 300 sec: 14745.6). Total num frames: 947208192. Throughput: 0: 3562.1. Samples: 225969030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:53,969][134211] Avg episode reward: [(0, '10.364')] [2025-01-04 15:09:54,519][134294] Updated weights for policy 0, policy_version 231254 (0.0022) [2025-01-04 15:09:57,555][134294] Updated weights for policy 0, policy_version 231264 (0.0026) [2025-01-04 15:09:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14882.1, 300 sec: 14731.7). Total num frames: 947273728. Throughput: 0: 3453.8. Samples: 225989670. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:09:58,968][134211] Avg episode reward: [(0, '10.776')] [2025-01-04 15:10:00,430][134294] Updated weights for policy 0, policy_version 231274 (0.0024) [2025-01-04 15:10:03,360][134294] Updated weights for policy 0, policy_version 231284 (0.0023) [2025-01-04 15:10:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14540.8, 300 sec: 14717.8). Total num frames: 947343360. Throughput: 0: 3474.6. Samples: 226000384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:03,968][134211] Avg episode reward: [(0, '8.287')] [2025-01-04 15:10:06,305][134294] Updated weights for policy 0, policy_version 231294 (0.0023) [2025-01-04 15:10:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13926.3, 300 sec: 14717.9). Total num frames: 947412992. Throughput: 0: 3487.3. Samples: 226021280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:08,968][134211] Avg episode reward: [(0, '9.963')] [2025-01-04 15:10:09,272][134294] Updated weights for policy 0, policy_version 231304 (0.0027) [2025-01-04 15:10:11,858][134294] Updated weights for policy 0, policy_version 231314 (0.0020) [2025-01-04 15:10:13,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14131.2, 300 sec: 14717.8). Total num frames: 947499008. Throughput: 0: 3562.6. Samples: 226045402. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:13,968][134211] Avg episode reward: [(0, '9.248')] [2025-01-04 15:10:14,140][134294] Updated weights for policy 0, policy_version 231324 (0.0016) [2025-01-04 15:10:17,021][134294] Updated weights for policy 0, policy_version 231334 (0.0025) [2025-01-04 15:10:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14131.2, 300 sec: 14592.9). Total num frames: 947568640. Throughput: 0: 3567.8. Samples: 226056128. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:18,968][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 15:10:20,034][134294] Updated weights for policy 0, policy_version 231344 (0.0028) [2025-01-04 15:10:22,886][134294] Updated weights for policy 0, policy_version 231354 (0.0025) [2025-01-04 15:10:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14199.5, 300 sec: 14523.4). Total num frames: 947638272. Throughput: 0: 3579.3. Samples: 226077146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:23,968][134211] Avg episode reward: [(0, '10.089')] [2025-01-04 15:10:25,921][134294] Updated weights for policy 0, policy_version 231364 (0.0028) [2025-01-04 15:10:28,279][134294] Updated weights for policy 0, policy_version 231374 (0.0019) [2025-01-04 15:10:28,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14404.2, 300 sec: 14592.9). Total num frames: 947720192. Throughput: 0: 3611.0. Samples: 226099690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:28,968][134211] Avg episode reward: [(0, '9.684')] [2025-01-04 15:10:30,188][134294] Updated weights for policy 0, policy_version 231384 (0.0013) [2025-01-04 15:10:32,257][134294] Updated weights for policy 0, policy_version 231394 (0.0018) [2025-01-04 15:10:33,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 947810304. Throughput: 0: 3734.4. Samples: 226115880. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:33,968][134211] Avg episode reward: [(0, '9.912')] [2025-01-04 15:10:35,295][134294] Updated weights for policy 0, policy_version 231404 (0.0023) [2025-01-04 15:10:38,216][134294] Updated weights for policy 0, policy_version 231414 (0.0024) [2025-01-04 15:10:38,968][134211] Fps is (10 sec: 15973.5, 60 sec: 14745.4, 300 sec: 14676.1). Total num frames: 947879936. Throughput: 0: 3746.3. Samples: 226137616. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:38,969][134211] Avg episode reward: [(0, '9.859')] [2025-01-04 15:10:41,199][134294] Updated weights for policy 0, policy_version 231424 (0.0025) [2025-01-04 15:10:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 947949568. Throughput: 0: 3741.2. Samples: 226158026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:43,968][134211] Avg episode reward: [(0, '10.619')] [2025-01-04 15:10:44,300][134294] Updated weights for policy 0, policy_version 231434 (0.0026) [2025-01-04 15:10:47,130][134294] Updated weights for policy 0, policy_version 231444 (0.0025) [2025-01-04 15:10:48,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 948015104. Throughput: 0: 3729.7. Samples: 226168220. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:48,969][134211] Avg episode reward: [(0, '10.495')] [2025-01-04 15:10:50,315][134294] Updated weights for policy 0, policy_version 231454 (0.0024) [2025-01-04 15:10:53,204][134294] Updated weights for policy 0, policy_version 231464 (0.0025) [2025-01-04 15:10:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14676.2). Total num frames: 948084736. Throughput: 0: 3719.6. Samples: 226188660. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:53,968][134211] Avg episode reward: [(0, '10.982')] [2025-01-04 15:10:56,150][134294] Updated weights for policy 0, policy_version 231474 (0.0024) [2025-01-04 15:10:58,970][134211] Fps is (10 sec: 13923.5, 60 sec: 14676.8, 300 sec: 14662.2). Total num frames: 948154368. Throughput: 0: 3644.0. Samples: 226209390. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:10:58,970][134211] Avg episode reward: [(0, '9.159')] [2025-01-04 15:10:59,227][134294] Updated weights for policy 0, policy_version 231484 (0.0025) [2025-01-04 15:11:02,098][134294] Updated weights for policy 0, policy_version 231494 (0.0026) [2025-01-04 15:11:03,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14677.4, 300 sec: 14662.3). Total num frames: 948224000. Throughput: 0: 3632.9. Samples: 226219608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:03,969][134211] Avg episode reward: [(0, '10.394')] [2025-01-04 15:11:05,154][134294] Updated weights for policy 0, policy_version 231504 (0.0023) [2025-01-04 15:11:08,028][134294] Updated weights for policy 0, policy_version 231514 (0.0021) [2025-01-04 15:11:08,968][134211] Fps is (10 sec: 13929.3, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 948293632. Throughput: 0: 3634.4. Samples: 226240694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:08,968][134211] Avg episode reward: [(0, '8.849')] [2025-01-04 15:11:10,929][134294] Updated weights for policy 0, policy_version 231524 (0.0026) [2025-01-04 15:11:13,771][134294] Updated weights for policy 0, policy_version 231534 (0.0025) [2025-01-04 15:11:13,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14404.2, 300 sec: 14662.3). Total num frames: 948363264. Throughput: 0: 3605.8. Samples: 226261950. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:13,968][134211] Avg episode reward: [(0, '10.705')] [2025-01-04 15:11:14,040][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231535_948367360.pth... [2025-01-04 15:11:14,111][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000230672_944832512.pth [2025-01-04 15:11:15,915][134294] Updated weights for policy 0, policy_version 231544 (0.0013) [2025-01-04 15:11:17,756][134294] Updated weights for policy 0, policy_version 231554 (0.0016) [2025-01-04 15:11:18,967][134211] Fps is (10 sec: 17613.3, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 948469760. Throughput: 0: 3554.3. Samples: 226275824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:18,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 15:11:19,678][134294] Updated weights for policy 0, policy_version 231564 (0.0014) [2025-01-04 15:11:21,522][134294] Updated weights for policy 0, policy_version 231574 (0.0014) [2025-01-04 15:11:23,953][134294] Updated weights for policy 0, policy_version 231584 (0.0018) [2025-01-04 15:11:23,968][134211] Fps is (10 sec: 20479.5, 60 sec: 15496.4, 300 sec: 14884.6). Total num frames: 948568064. Throughput: 0: 3797.9. Samples: 226308520. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:23,969][134211] Avg episode reward: [(0, '10.570')] [2025-01-04 15:11:27,056][134294] Updated weights for policy 0, policy_version 231594 (0.0026) [2025-01-04 15:11:28,968][134211] Fps is (10 sec: 16383.6, 60 sec: 15223.5, 300 sec: 14870.6). Total num frames: 948633600. Throughput: 0: 3798.6. Samples: 226328964. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:28,968][134211] Avg episode reward: [(0, '9.904')] [2025-01-04 15:11:30,166][134294] Updated weights for policy 0, policy_version 231604 (0.0026) [2025-01-04 15:11:33,214][134294] Updated weights for policy 0, policy_version 231614 (0.0026) [2025-01-04 15:11:33,968][134211] Fps is (10 sec: 13107.8, 60 sec: 14813.9, 300 sec: 14856.7). Total num frames: 948699136. Throughput: 0: 3796.0. Samples: 226339040. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:33,968][134211] Avg episode reward: [(0, '10.244')] [2025-01-04 15:11:36,129][134294] Updated weights for policy 0, policy_version 231624 (0.0023) [2025-01-04 15:11:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14814.0, 300 sec: 14745.6). Total num frames: 948768768. Throughput: 0: 3796.8. Samples: 226359514. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:38,968][134211] Avg episode reward: [(0, '10.691')] [2025-01-04 15:11:39,262][134294] Updated weights for policy 0, policy_version 231634 (0.0023) [2025-01-04 15:11:42,312][134294] Updated weights for policy 0, policy_version 231644 (0.0027) [2025-01-04 15:11:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14717.8). Total num frames: 948834304. Throughput: 0: 3787.2. Samples: 226379808. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:43,968][134211] Avg episode reward: [(0, '9.749')] [2025-01-04 15:11:45,196][134294] Updated weights for policy 0, policy_version 231654 (0.0025) [2025-01-04 15:11:48,035][134294] Updated weights for policy 0, policy_version 231664 (0.0024) [2025-01-04 15:11:48,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14882.0, 300 sec: 14745.6). Total num frames: 948908032. Throughput: 0: 3795.2. Samples: 226390394. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:48,969][134211] Avg episode reward: [(0, '8.590')] [2025-01-04 15:11:50,948][134294] Updated weights for policy 0, policy_version 231674 (0.0023) [2025-01-04 15:11:53,851][134294] Updated weights for policy 0, policy_version 231684 (0.0024) [2025-01-04 15:11:53,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 948977664. Throughput: 0: 3802.0. Samples: 226411784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:53,968][134211] Avg episode reward: [(0, '9.538')] [2025-01-04 15:11:56,724][134294] Updated weights for policy 0, policy_version 231694 (0.0027) [2025-01-04 15:11:58,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14882.7, 300 sec: 14759.5). Total num frames: 949047296. Throughput: 0: 3795.8. Samples: 226432758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:11:58,968][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 15:11:59,796][134294] Updated weights for policy 0, policy_version 231704 (0.0024) [2025-01-04 15:12:02,704][134294] Updated weights for policy 0, policy_version 231714 (0.0028) [2025-01-04 15:12:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14882.1, 300 sec: 14759.5). Total num frames: 949116928. Throughput: 0: 3712.6. Samples: 226442894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:03,968][134211] Avg episode reward: [(0, '9.954')] [2025-01-04 15:12:05,704][134294] Updated weights for policy 0, policy_version 231724 (0.0023) [2025-01-04 15:12:08,522][134294] Updated weights for policy 0, policy_version 231734 (0.0027) [2025-01-04 15:12:08,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14882.1, 300 sec: 14759.5). Total num frames: 949186560. Throughput: 0: 3456.1. Samples: 226464044. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:08,968][134211] Avg episode reward: [(0, '8.930')] [2025-01-04 15:12:11,721][134294] Updated weights for policy 0, policy_version 231744 (0.0025) [2025-01-04 15:12:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.9, 300 sec: 14662.3). Total num frames: 949252096. Throughput: 0: 3448.2. Samples: 226484132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:13,968][134211] Avg episode reward: [(0, '9.030')] [2025-01-04 15:12:14,673][134294] Updated weights for policy 0, policy_version 231754 (0.0026) [2025-01-04 15:12:17,653][134294] Updated weights for policy 0, policy_version 231764 (0.0025) [2025-01-04 15:12:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14199.4, 300 sec: 14537.3). Total num frames: 949321728. Throughput: 0: 3450.6. Samples: 226494316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:18,968][134211] Avg episode reward: [(0, '9.327')] [2025-01-04 15:12:20,597][134294] Updated weights for policy 0, policy_version 231774 (0.0025) [2025-01-04 15:12:23,199][134294] Updated weights for policy 0, policy_version 231784 (0.0019) [2025-01-04 15:12:23,967][134211] Fps is (10 sec: 15155.6, 60 sec: 13926.5, 300 sec: 14523.5). Total num frames: 949403648. Throughput: 0: 3469.6. Samples: 226515644. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:23,968][134211] Avg episode reward: [(0, '9.846')] [2025-01-04 15:12:25,240][134294] Updated weights for policy 0, policy_version 231794 (0.0016) [2025-01-04 15:12:28,043][134294] Updated weights for policy 0, policy_version 231804 (0.0024) [2025-01-04 15:12:28,968][134211] Fps is (10 sec: 15974.3, 60 sec: 14131.2, 300 sec: 14579.0). Total num frames: 949481472. Throughput: 0: 3579.3. Samples: 226540878. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:28,968][134211] Avg episode reward: [(0, '9.415')] [2025-01-04 15:12:30,999][134294] Updated weights for policy 0, policy_version 231814 (0.0026) [2025-01-04 15:12:33,968][134211] Fps is (10 sec: 14335.4, 60 sec: 14131.1, 300 sec: 14565.1). Total num frames: 949547008. Throughput: 0: 3576.0. Samples: 226551314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:33,969][134211] Avg episode reward: [(0, '9.198')] [2025-01-04 15:12:34,035][134294] Updated weights for policy 0, policy_version 231824 (0.0023) [2025-01-04 15:12:36,977][134294] Updated weights for policy 0, policy_version 231834 (0.0025) [2025-01-04 15:12:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14565.1). Total num frames: 949616640. Throughput: 0: 3555.5. Samples: 226571780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:38,968][134211] Avg episode reward: [(0, '10.827')] [2025-01-04 15:12:39,986][134294] Updated weights for policy 0, policy_version 231844 (0.0023) [2025-01-04 15:12:42,554][134294] Updated weights for policy 0, policy_version 231854 (0.0021) [2025-01-04 15:12:43,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14472.5, 300 sec: 14620.6). Total num frames: 949702656. Throughput: 0: 3602.3. Samples: 226594862. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:43,968][134211] Avg episode reward: [(0, '10.319')] [2025-01-04 15:12:44,640][134294] Updated weights for policy 0, policy_version 231864 (0.0016) [2025-01-04 15:12:47,449][134294] Updated weights for policy 0, policy_version 231874 (0.0023) [2025-01-04 15:12:48,968][134211] Fps is (10 sec: 15973.8, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 949776384. Throughput: 0: 3663.3. Samples: 226607746. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:48,969][134211] Avg episode reward: [(0, '9.136')] [2025-01-04 15:12:50,443][134294] Updated weights for policy 0, policy_version 231884 (0.0025) [2025-01-04 15:12:53,287][134294] Updated weights for policy 0, policy_version 231894 (0.0024) [2025-01-04 15:12:53,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.5, 300 sec: 14634.5). Total num frames: 949846016. Throughput: 0: 3660.2. Samples: 226628754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:53,968][134211] Avg episode reward: [(0, '11.386')] [2025-01-04 15:12:56,168][134294] Updated weights for policy 0, policy_version 231904 (0.0025) [2025-01-04 15:12:58,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14472.5, 300 sec: 14648.4). Total num frames: 949915648. Throughput: 0: 3679.2. Samples: 226649694. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:12:58,968][134211] Avg episode reward: [(0, '9.736')] [2025-01-04 15:12:59,164][134294] Updated weights for policy 0, policy_version 231914 (0.0023) [2025-01-04 15:13:02,198][134294] Updated weights for policy 0, policy_version 231924 (0.0024) [2025-01-04 15:13:03,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14404.2, 300 sec: 14620.6). Total num frames: 949981184. Throughput: 0: 3682.4. Samples: 226660026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:03,969][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 15:13:04,989][134294] Updated weights for policy 0, policy_version 231934 (0.0020) [2025-01-04 15:13:06,986][134294] Updated weights for policy 0, policy_version 231944 (0.0014) [2025-01-04 15:13:08,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14745.6, 300 sec: 14703.9). Total num frames: 950071296. Throughput: 0: 3758.0. Samples: 226684756. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:08,968][134211] Avg episode reward: [(0, '9.522')] [2025-01-04 15:13:09,749][134294] Updated weights for policy 0, policy_version 231954 (0.0024) [2025-01-04 15:13:12,596][134294] Updated weights for policy 0, policy_version 231964 (0.0022) [2025-01-04 15:13:13,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14813.8, 300 sec: 14703.9). Total num frames: 950140928. Throughput: 0: 3675.0. Samples: 226706256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:13,969][134211] Avg episode reward: [(0, '9.993')] [2025-01-04 15:13:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231968_950140928.pth... [2025-01-04 15:13:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231117_946655232.pth [2025-01-04 15:13:15,596][134294] Updated weights for policy 0, policy_version 231974 (0.0024) [2025-01-04 15:13:18,539][134294] Updated weights for policy 0, policy_version 231984 (0.0024) [2025-01-04 15:13:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.8, 300 sec: 14703.9). Total num frames: 950210560. Throughput: 0: 3674.4. Samples: 226716660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:18,968][134211] Avg episode reward: [(0, '9.736')] [2025-01-04 15:13:21,593][134294] Updated weights for policy 0, policy_version 231994 (0.0024) [2025-01-04 15:13:23,969][134211] Fps is (10 sec: 13515.7, 60 sec: 14540.5, 300 sec: 14676.1). Total num frames: 950276096. Throughput: 0: 3669.6. Samples: 226736918. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:23,969][134211] Avg episode reward: [(0, '9.608')] [2025-01-04 15:13:24,622][134294] Updated weights for policy 0, policy_version 232004 (0.0024) [2025-01-04 15:13:27,028][134294] Updated weights for policy 0, policy_version 232014 (0.0017) [2025-01-04 15:13:28,922][134294] Updated weights for policy 0, policy_version 232024 (0.0013) [2025-01-04 15:13:28,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14813.9, 300 sec: 14759.5). Total num frames: 950370304. Throughput: 0: 3709.2. Samples: 226761776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:28,968][134211] Avg episode reward: [(0, '9.369')] [2025-01-04 15:13:31,064][134294] Updated weights for policy 0, policy_version 232034 (0.0017) [2025-01-04 15:13:33,968][134211] Fps is (10 sec: 17205.2, 60 sec: 15018.7, 300 sec: 14773.4). Total num frames: 950448128. Throughput: 0: 3747.6. Samples: 226776388. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:33,968][134211] Avg episode reward: [(0, '9.472')] [2025-01-04 15:13:34,006][134294] Updated weights for policy 0, policy_version 232044 (0.0027) [2025-01-04 15:13:36,987][134294] Updated weights for policy 0, policy_version 232054 (0.0024) [2025-01-04 15:13:38,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15018.7, 300 sec: 14731.7). Total num frames: 950517760. Throughput: 0: 3737.1. Samples: 226796922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:38,968][134211] Avg episode reward: [(0, '10.712')] [2025-01-04 15:13:40,030][134294] Updated weights for policy 0, policy_version 232064 (0.0025) [2025-01-04 15:13:43,000][134294] Updated weights for policy 0, policy_version 232074 (0.0024) [2025-01-04 15:13:43,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14731.7). Total num frames: 950587392. Throughput: 0: 3727.4. Samples: 226817426. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:43,968][134211] Avg episode reward: [(0, '10.417')] [2025-01-04 15:13:46,014][134294] Updated weights for policy 0, policy_version 232084 (0.0027) [2025-01-04 15:13:48,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14609.0, 300 sec: 14717.8). Total num frames: 950652928. Throughput: 0: 3731.3. Samples: 226827934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:48,969][134211] Avg episode reward: [(0, '10.867')] [2025-01-04 15:13:49,053][134294] Updated weights for policy 0, policy_version 232094 (0.0025) [2025-01-04 15:13:51,973][134294] Updated weights for policy 0, policy_version 232104 (0.0025) [2025-01-04 15:13:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.0, 300 sec: 14717.8). Total num frames: 950722560. Throughput: 0: 3637.0. Samples: 226848422. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:53,968][134211] Avg episode reward: [(0, '9.114')] [2025-01-04 15:13:54,952][134294] Updated weights for policy 0, policy_version 232114 (0.0025) [2025-01-04 15:13:57,830][134294] Updated weights for policy 0, policy_version 232124 (0.0025) [2025-01-04 15:13:58,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 950796288. Throughput: 0: 3627.7. Samples: 226869502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:13:58,968][134211] Avg episode reward: [(0, '9.612')] [2025-01-04 15:14:00,670][134294] Updated weights for policy 0, policy_version 232134 (0.0026) [2025-01-04 15:14:03,575][134294] Updated weights for policy 0, policy_version 232144 (0.0025) [2025-01-04 15:14:03,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14745.7, 300 sec: 14537.3). Total num frames: 950865920. Throughput: 0: 3635.4. Samples: 226880254. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:03,968][134211] Avg episode reward: [(0, '10.206')] [2025-01-04 15:14:06,517][134294] Updated weights for policy 0, policy_version 232154 (0.0024) [2025-01-04 15:14:08,840][134294] Updated weights for policy 0, policy_version 232164 (0.0018) [2025-01-04 15:14:08,968][134211] Fps is (10 sec: 14745.9, 60 sec: 14540.9, 300 sec: 14551.2). Total num frames: 950943744. Throughput: 0: 3651.2. Samples: 226901216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:08,968][134211] Avg episode reward: [(0, '8.190')] [2025-01-04 15:14:10,753][134294] Updated weights for policy 0, policy_version 232174 (0.0013) [2025-01-04 15:14:12,580][134294] Updated weights for policy 0, policy_version 232184 (0.0011) [2025-01-04 15:14:13,967][134211] Fps is (10 sec: 18841.9, 60 sec: 15223.6, 300 sec: 14690.1). Total num frames: 951054336. Throughput: 0: 3798.7. Samples: 226932718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:13,968][134211] Avg episode reward: [(0, '12.158')] [2025-01-04 15:14:14,507][134294] Updated weights for policy 0, policy_version 232194 (0.0014) [2025-01-04 15:14:16,699][134294] Updated weights for policy 0, policy_version 232204 (0.0016) [2025-01-04 15:14:18,970][134211] Fps is (10 sec: 19246.8, 60 sec: 15427.7, 300 sec: 14745.5). Total num frames: 951136256. Throughput: 0: 3813.0. Samples: 226947980. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:18,970][134211] Avg episode reward: [(0, '9.611')] [2025-01-04 15:14:19,834][134294] Updated weights for policy 0, policy_version 232214 (0.0027) [2025-01-04 15:14:22,828][134294] Updated weights for policy 0, policy_version 232224 (0.0028) [2025-01-04 15:14:23,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15428.6, 300 sec: 14731.7). Total num frames: 951201792. Throughput: 0: 3806.6. Samples: 226968220. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:23,968][134211] Avg episode reward: [(0, '9.014')] [2025-01-04 15:14:25,978][134294] Updated weights for policy 0, policy_version 232234 (0.0026) [2025-01-04 15:14:28,861][134294] Updated weights for policy 0, policy_version 232244 (0.0027) [2025-01-04 15:14:28,968][134211] Fps is (10 sec: 13519.5, 60 sec: 15018.6, 300 sec: 14731.7). Total num frames: 951271424. Throughput: 0: 3806.5. Samples: 226988718. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:28,968][134211] Avg episode reward: [(0, '10.440')] [2025-01-04 15:14:31,922][134294] Updated weights for policy 0, policy_version 232254 (0.0025) [2025-01-04 15:14:33,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14813.8, 300 sec: 14717.8). Total num frames: 951336960. Throughput: 0: 3795.0. Samples: 226998710. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:33,969][134211] Avg episode reward: [(0, '9.947')] [2025-01-04 15:14:35,043][134294] Updated weights for policy 0, policy_version 232264 (0.0026) [2025-01-04 15:14:37,949][134294] Updated weights for policy 0, policy_version 232274 (0.0026) [2025-01-04 15:14:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 951406592. Throughput: 0: 3792.6. Samples: 227019088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:38,968][134211] Avg episode reward: [(0, '9.296')] [2025-01-04 15:14:40,961][134294] Updated weights for policy 0, policy_version 232284 (0.0024) [2025-01-04 15:14:43,784][134294] Updated weights for policy 0, policy_version 232294 (0.0023) [2025-01-04 15:14:43,970][134211] Fps is (10 sec: 13924.0, 60 sec: 14813.3, 300 sec: 14717.7). Total num frames: 951476224. Throughput: 0: 3791.1. Samples: 227040108. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:43,970][134211] Avg episode reward: [(0, '9.256')] [2025-01-04 15:14:46,731][134294] Updated weights for policy 0, policy_version 232304 (0.0024) [2025-01-04 15:14:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14882.3, 300 sec: 14704.0). Total num frames: 951545856. Throughput: 0: 3783.3. Samples: 227050504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:48,968][134211] Avg episode reward: [(0, '8.662')] [2025-01-04 15:14:49,720][134294] Updated weights for policy 0, policy_version 232314 (0.0024) [2025-01-04 15:14:52,685][134294] Updated weights for policy 0, policy_version 232324 (0.0025) [2025-01-04 15:14:53,968][134211] Fps is (10 sec: 13929.3, 60 sec: 14882.1, 300 sec: 14717.8). Total num frames: 951615488. Throughput: 0: 3776.3. Samples: 227071152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:53,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 15:14:55,673][134294] Updated weights for policy 0, policy_version 232334 (0.0025) [2025-01-04 15:14:58,447][134294] Updated weights for policy 0, policy_version 232344 (0.0023) [2025-01-04 15:14:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 951685120. Throughput: 0: 3549.5. Samples: 227092446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:14:58,968][134211] Avg episode reward: [(0, '10.375')] [2025-01-04 15:15:01,338][134294] Updated weights for policy 0, policy_version 232354 (0.0025) [2025-01-04 15:15:03,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14717.8). Total num frames: 951754752. Throughput: 0: 3443.1. Samples: 227102914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:03,968][134211] Avg episode reward: [(0, '10.580')] [2025-01-04 15:15:04,504][134294] Updated weights for policy 0, policy_version 232364 (0.0027) [2025-01-04 15:15:07,378][134294] Updated weights for policy 0, policy_version 232374 (0.0027) [2025-01-04 15:15:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14662.3). Total num frames: 951824384. Throughput: 0: 3448.8. Samples: 227123418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:08,968][134211] Avg episode reward: [(0, '9.406')] [2025-01-04 15:15:10,370][134294] Updated weights for policy 0, policy_version 232384 (0.0023) [2025-01-04 15:15:13,199][134294] Updated weights for policy 0, policy_version 232394 (0.0023) [2025-01-04 15:15:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13994.6, 300 sec: 14662.3). Total num frames: 951894016. Throughput: 0: 3465.3. Samples: 227144658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:13,968][134211] Avg episode reward: [(0, '10.058')] [2025-01-04 15:15:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000232396_951894016.pth... [2025-01-04 15:15:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231535_948367360.pth [2025-01-04 15:15:16,211][134294] Updated weights for policy 0, policy_version 232404 (0.0027) [2025-01-04 15:15:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13790.3, 300 sec: 14662.3). Total num frames: 951963648. Throughput: 0: 3471.2. Samples: 227154914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:18,968][134211] Avg episode reward: [(0, '9.473')] [2025-01-04 15:15:18,989][134294] Updated weights for policy 0, policy_version 232414 (0.0027) [2025-01-04 15:15:21,313][134294] Updated weights for policy 0, policy_version 232424 (0.0016) [2025-01-04 15:15:23,147][134294] Updated weights for policy 0, policy_version 232434 (0.0014) [2025-01-04 15:15:23,968][134211] Fps is (10 sec: 17203.4, 60 sec: 14404.3, 300 sec: 14731.7). Total num frames: 952066048. Throughput: 0: 3583.6. Samples: 227180348. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:23,968][134211] Avg episode reward: [(0, '9.476')] [2025-01-04 15:15:25,016][134294] Updated weights for policy 0, policy_version 232444 (0.0013) [2025-01-04 15:15:26,894][134294] Updated weights for policy 0, policy_version 232454 (0.0013) [2025-01-04 15:15:28,968][134211] Fps is (10 sec: 20479.9, 60 sec: 14950.4, 300 sec: 14773.4). Total num frames: 952168448. Throughput: 0: 3838.6. Samples: 227212836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:28,968][134211] Avg episode reward: [(0, '11.123')] [2025-01-04 15:15:29,110][134294] Updated weights for policy 0, policy_version 232464 (0.0018) [2025-01-04 15:15:32,278][134294] Updated weights for policy 0, policy_version 232474 (0.0026) [2025-01-04 15:15:33,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14950.5, 300 sec: 14759.5). Total num frames: 952233984. Throughput: 0: 3833.5. Samples: 227223012. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:33,968][134211] Avg episode reward: [(0, '9.966')] [2025-01-04 15:15:35,385][134294] Updated weights for policy 0, policy_version 232484 (0.0028) [2025-01-04 15:15:38,441][134294] Updated weights for policy 0, policy_version 232494 (0.0023) [2025-01-04 15:15:38,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14882.1, 300 sec: 14745.6). Total num frames: 952299520. Throughput: 0: 3814.4. Samples: 227242798. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:38,968][134211] Avg episode reward: [(0, '9.825')] [2025-01-04 15:15:41,412][134294] Updated weights for policy 0, policy_version 232504 (0.0025) [2025-01-04 15:15:43,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14882.6, 300 sec: 14759.5). Total num frames: 952369152. Throughput: 0: 3791.3. Samples: 227263056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:43,969][134211] Avg episode reward: [(0, '10.818')] [2025-01-04 15:15:44,528][134294] Updated weights for policy 0, policy_version 232514 (0.0025) [2025-01-04 15:15:47,519][134294] Updated weights for policy 0, policy_version 232524 (0.0025) [2025-01-04 15:15:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14813.9, 300 sec: 14745.6). Total num frames: 952434688. Throughput: 0: 3784.2. Samples: 227273202. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:48,968][134211] Avg episode reward: [(0, '9.358')] [2025-01-04 15:15:50,368][134294] Updated weights for policy 0, policy_version 232534 (0.0024) [2025-01-04 15:15:53,358][134294] Updated weights for policy 0, policy_version 232544 (0.0026) [2025-01-04 15:15:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14882.2, 300 sec: 14759.6). Total num frames: 952508416. Throughput: 0: 3795.5. Samples: 227294216. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:53,968][134211] Avg episode reward: [(0, '10.265')] [2025-01-04 15:15:56,259][134294] Updated weights for policy 0, policy_version 232554 (0.0025) [2025-01-04 15:15:58,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.1, 300 sec: 14759.5). Total num frames: 952578048. Throughput: 0: 3785.7. Samples: 227315012. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:15:58,968][134211] Avg episode reward: [(0, '10.366')] [2025-01-04 15:15:59,252][134294] Updated weights for policy 0, policy_version 232564 (0.0026) [2025-01-04 15:16:02,207][134294] Updated weights for policy 0, policy_version 232574 (0.0023) [2025-01-04 15:16:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.9, 300 sec: 14745.6). Total num frames: 952643584. Throughput: 0: 3787.1. Samples: 227325334. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:16:03,968][134211] Avg episode reward: [(0, '9.890')] [2025-01-04 15:16:05,247][134294] Updated weights for policy 0, policy_version 232584 (0.0023) [2025-01-04 15:16:08,072][134294] Updated weights for policy 0, policy_version 232594 (0.0025) [2025-01-04 15:16:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.9, 300 sec: 14745.6). Total num frames: 952713216. Throughput: 0: 3687.7. Samples: 227346296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:16:08,968][134211] Avg episode reward: [(0, '10.335')] [2025-01-04 15:16:10,994][134294] Updated weights for policy 0, policy_version 232604 (0.0025) [2025-01-04 15:16:13,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.9, 300 sec: 14620.6). Total num frames: 952782848. Throughput: 0: 3430.2. Samples: 227367194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:16:13,968][134211] Avg episode reward: [(0, '9.512')] [2025-01-04 15:16:14,006][134294] Updated weights for policy 0, policy_version 232614 (0.0024) [2025-01-04 15:16:16,959][134294] Updated weights for policy 0, policy_version 232624 (0.0026) [2025-01-04 15:16:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14523.5). Total num frames: 952852480. Throughput: 0: 3432.6. Samples: 227377478. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:18,968][134211] Avg episode reward: [(0, '8.643')] [2025-01-04 15:16:19,903][134294] Updated weights for policy 0, policy_version 232634 (0.0024) [2025-01-04 15:16:22,707][134294] Updated weights for policy 0, policy_version 232644 (0.0025) [2025-01-04 15:16:23,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14336.0, 300 sec: 14551.2). Total num frames: 952926208. Throughput: 0: 3466.1. Samples: 227398772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:23,968][134211] Avg episode reward: [(0, '9.904')] [2025-01-04 15:16:25,607][134294] Updated weights for policy 0, policy_version 232654 (0.0025) [2025-01-04 15:16:27,775][134294] Updated weights for policy 0, policy_version 232664 (0.0016) [2025-01-04 15:16:28,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14063.0, 300 sec: 14620.6). Total num frames: 953012224. Throughput: 0: 3559.6. Samples: 227423236. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:28,968][134211] Avg episode reward: [(0, '9.849')] [2025-01-04 15:16:30,222][134294] Updated weights for policy 0, policy_version 232674 (0.0021) [2025-01-04 15:16:33,049][134294] Updated weights for policy 0, policy_version 232684 (0.0024) [2025-01-04 15:16:33,969][134211] Fps is (10 sec: 15562.8, 60 sec: 14130.9, 300 sec: 14620.6). Total num frames: 953081856. Throughput: 0: 3594.1. Samples: 227434940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:33,970][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 15:16:36,047][134294] Updated weights for policy 0, policy_version 232694 (0.0026) [2025-01-04 15:16:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14199.5, 300 sec: 14634.5). Total num frames: 953151488. Throughput: 0: 3587.5. Samples: 227455654. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:38,968][134211] Avg episode reward: [(0, '10.522')] [2025-01-04 15:16:39,102][134294] Updated weights for policy 0, policy_version 232704 (0.0025) [2025-01-04 15:16:42,051][134294] Updated weights for policy 0, policy_version 232714 (0.0026) [2025-01-04 15:16:43,968][134211] Fps is (10 sec: 13928.3, 60 sec: 14199.5, 300 sec: 14620.7). Total num frames: 953221120. Throughput: 0: 3581.3. Samples: 227476168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:43,968][134211] Avg episode reward: [(0, '10.229')] [2025-01-04 15:16:44,564][134294] Updated weights for policy 0, policy_version 232724 (0.0019) [2025-01-04 15:16:46,423][134294] Updated weights for policy 0, policy_version 232734 (0.0013) [2025-01-04 15:16:48,343][134294] Updated weights for policy 0, policy_version 232744 (0.0014) [2025-01-04 15:16:48,968][134211] Fps is (10 sec: 18021.7, 60 sec: 14950.3, 300 sec: 14759.5). Total num frames: 953331712. Throughput: 0: 3692.7. Samples: 227491506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:48,968][134211] Avg episode reward: [(0, '9.743')] [2025-01-04 15:16:50,252][134294] Updated weights for policy 0, policy_version 232754 (0.0013) [2025-01-04 15:16:52,275][134294] Updated weights for policy 0, policy_version 232764 (0.0015) [2025-01-04 15:16:53,968][134211] Fps is (10 sec: 20070.2, 60 sec: 15223.5, 300 sec: 14828.9). Total num frames: 953421824. Throughput: 0: 3932.3. Samples: 227523250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:53,968][134211] Avg episode reward: [(0, '9.757')] [2025-01-04 15:16:55,352][134294] Updated weights for policy 0, policy_version 232774 (0.0026) [2025-01-04 15:16:58,459][134294] Updated weights for policy 0, policy_version 232784 (0.0024) [2025-01-04 15:16:58,968][134211] Fps is (10 sec: 15565.6, 60 sec: 15155.2, 300 sec: 14815.0). Total num frames: 953487360. Throughput: 0: 3907.3. Samples: 227543020. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:16:58,968][134211] Avg episode reward: [(0, '10.808')] [2025-01-04 15:17:01,525][134294] Updated weights for policy 0, policy_version 232794 (0.0028) [2025-01-04 15:17:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15155.2, 300 sec: 14801.1). Total num frames: 953552896. Throughput: 0: 3903.7. Samples: 227553146. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:03,968][134211] Avg episode reward: [(0, '9.678')] [2025-01-04 15:17:04,702][134294] Updated weights for policy 0, policy_version 232804 (0.0029) [2025-01-04 15:17:07,748][134294] Updated weights for policy 0, policy_version 232814 (0.0026) [2025-01-04 15:17:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15086.9, 300 sec: 14801.1). Total num frames: 953618432. Throughput: 0: 3865.0. Samples: 227572696. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:08,968][134211] Avg episode reward: [(0, '9.413')] [2025-01-04 15:17:10,825][134294] Updated weights for policy 0, policy_version 232824 (0.0022) [2025-01-04 15:17:13,712][134294] Updated weights for policy 0, policy_version 232834 (0.0022) [2025-01-04 15:17:13,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15086.9, 300 sec: 14801.1). Total num frames: 953688064. Throughput: 0: 3783.5. Samples: 227593496. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:13,969][134211] Avg episode reward: [(0, '10.628')] [2025-01-04 15:17:13,983][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000232835_953692160.pth... [2025-01-04 15:17:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000231968_950140928.pth [2025-01-04 15:17:16,701][134294] Updated weights for policy 0, policy_version 232844 (0.0024) [2025-01-04 15:17:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15086.9, 300 sec: 14759.5). Total num frames: 953757696. 
Throughput: 0: 3746.1. Samples: 227603510. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:18,968][134211] Avg episode reward: [(0, '11.673')] [2025-01-04 15:17:19,661][134294] Updated weights for policy 0, policy_version 232854 (0.0023) [2025-01-04 15:17:22,705][134294] Updated weights for policy 0, policy_version 232864 (0.0025) [2025-01-04 15:17:23,968][134211] Fps is (10 sec: 13926.7, 60 sec: 15018.7, 300 sec: 14731.7). Total num frames: 953827328. Throughput: 0: 3746.5. Samples: 227624248. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:23,968][134211] Avg episode reward: [(0, '10.395')] [2025-01-04 15:17:25,601][134294] Updated weights for policy 0, policy_version 232874 (0.0024) [2025-01-04 15:17:28,521][134294] Updated weights for policy 0, policy_version 232884 (0.0021) [2025-01-04 15:17:28,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14745.6). Total num frames: 953896960. Throughput: 0: 3762.7. Samples: 227645490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:28,968][134211] Avg episode reward: [(0, '9.289')] [2025-01-04 15:17:31,351][134294] Updated weights for policy 0, policy_version 232894 (0.0024) [2025-01-04 15:17:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.9, 300 sec: 14745.6). Total num frames: 953966592. Throughput: 0: 3656.5. Samples: 227656046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:33,968][134211] Avg episode reward: [(0, '10.740')] [2025-01-04 15:17:34,410][134294] Updated weights for policy 0, policy_version 232904 (0.0024) [2025-01-04 15:17:37,346][134294] Updated weights for policy 0, policy_version 232914 (0.0025) [2025-01-04 15:17:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14690.1). Total num frames: 954036224. Throughput: 0: 3406.6. Samples: 227676548. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:38,968][134211] Avg episode reward: [(0, '8.616')] [2025-01-04 15:17:40,287][134294] Updated weights for policy 0, policy_version 232924 (0.0024) [2025-01-04 15:17:43,084][134294] Updated weights for policy 0, policy_version 232934 (0.0025) [2025-01-04 15:17:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14745.6, 300 sec: 14676.2). Total num frames: 954105856. Throughput: 0: 3436.1. Samples: 227697644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:43,968][134211] Avg episode reward: [(0, '9.638')] [2025-01-04 15:17:46,067][134294] Updated weights for policy 0, policy_version 232944 (0.0026) [2025-01-04 15:17:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14063.0, 300 sec: 14676.2). Total num frames: 954175488. Throughput: 0: 3449.0. Samples: 227708352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:48,969][134211] Avg episode reward: [(0, '9.090')] [2025-01-04 15:17:48,985][134294] Updated weights for policy 0, policy_version 232954 (0.0026) [2025-01-04 15:17:51,941][134294] Updated weights for policy 0, policy_version 232964 (0.0022) [2025-01-04 15:17:53,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13858.2, 300 sec: 14704.0). Total num frames: 954253312. Throughput: 0: 3477.1. Samples: 227729166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:53,968][134211] Avg episode reward: [(0, '8.812')] [2025-01-04 15:17:54,345][134294] Updated weights for policy 0, policy_version 232974 (0.0017) [2025-01-04 15:17:56,729][134294] Updated weights for policy 0, policy_version 232984 (0.0017) [2025-01-04 15:17:58,968][134211] Fps is (10 sec: 15564.2, 60 sec: 14062.8, 300 sec: 14745.6). Total num frames: 954331136. Throughput: 0: 3570.1. Samples: 227754150. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:17:58,969][134211] Avg episode reward: [(0, '8.848')] [2025-01-04 15:17:59,554][134294] Updated weights for policy 0, policy_version 232994 (0.0025) [2025-01-04 15:18:02,630][134294] Updated weights for policy 0, policy_version 233004 (0.0026) [2025-01-04 15:18:03,968][134211] Fps is (10 sec: 14745.6, 60 sec: 14131.2, 300 sec: 14676.2). Total num frames: 954400768. Throughput: 0: 3577.6. Samples: 227764500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:03,968][134211] Avg episode reward: [(0, '9.549')] [2025-01-04 15:18:05,609][134294] Updated weights for policy 0, policy_version 233014 (0.0023) [2025-01-04 15:18:08,500][134294] Updated weights for policy 0, policy_version 233024 (0.0022) [2025-01-04 15:18:08,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14199.5, 300 sec: 14676.2). Total num frames: 954470400. Throughput: 0: 3578.7. Samples: 227785290. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:08,968][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 15:18:11,151][134294] Updated weights for policy 0, policy_version 233034 (0.0018) [2025-01-04 15:18:13,100][134294] Updated weights for policy 0, policy_version 233044 (0.0016) [2025-01-04 15:18:13,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14540.8, 300 sec: 14745.6). Total num frames: 954560512. Throughput: 0: 3665.6. Samples: 227810444. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:13,968][134211] Avg episode reward: [(0, '10.165')] [2025-01-04 15:18:15,871][134294] Updated weights for policy 0, policy_version 233054 (0.0022) [2025-01-04 15:18:18,792][134294] Updated weights for policy 0, policy_version 233064 (0.0026) [2025-01-04 15:18:18,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14540.8, 300 sec: 14759.5). Total num frames: 954630144. Throughput: 0: 3678.7. Samples: 227821588. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:18,968][134211] Avg episode reward: [(0, '9.841')] [2025-01-04 15:18:21,648][134294] Updated weights for policy 0, policy_version 233074 (0.0025) [2025-01-04 15:18:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14676.2). Total num frames: 954699776. Throughput: 0: 3691.0. Samples: 227842644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:23,968][134211] Avg episode reward: [(0, '10.035')] [2025-01-04 15:18:24,759][134294] Updated weights for policy 0, policy_version 233084 (0.0024) [2025-01-04 15:18:27,721][134294] Updated weights for policy 0, policy_version 233094 (0.0026) [2025-01-04 15:18:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 954769408. Throughput: 0: 3675.6. Samples: 227863044. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:18:28,968][134211] Avg episode reward: [(0, '10.300')] [2025-01-04 15:18:30,662][134294] Updated weights for policy 0, policy_version 233104 (0.0027) [2025-01-04 15:18:33,510][134294] Updated weights for policy 0, policy_version 233114 (0.0024) [2025-01-04 15:18:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14540.8, 300 sec: 14648.4). Total num frames: 954839040. Throughput: 0: 3671.7. Samples: 227873578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:33,968][134211] Avg episode reward: [(0, '9.657')] [2025-01-04 15:18:36,425][134294] Updated weights for policy 0, policy_version 233124 (0.0024) [2025-01-04 15:18:38,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14540.6, 300 sec: 14648.4). Total num frames: 954908672. Throughput: 0: 3679.5. Samples: 227894744. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:38,969][134211] Avg episode reward: [(0, '9.990')] [2025-01-04 15:18:39,619][134294] Updated weights for policy 0, policy_version 233134 (0.0025) [2025-01-04 15:18:41,669][134294] Updated weights for policy 0, policy_version 233144 (0.0014) [2025-01-04 15:18:43,571][134294] Updated weights for policy 0, policy_version 233154 (0.0013) [2025-01-04 15:18:43,967][134211] Fps is (10 sec: 16794.0, 60 sec: 15018.7, 300 sec: 14759.5). Total num frames: 955006976. Throughput: 0: 3705.1. Samples: 227920878. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:43,968][134211] Avg episode reward: [(0, '10.228')] [2025-01-04 15:18:45,441][134294] Updated weights for policy 0, policy_version 233164 (0.0014) [2025-01-04 15:18:47,340][134294] Updated weights for policy 0, policy_version 233174 (0.0013) [2025-01-04 15:18:48,968][134211] Fps is (10 sec: 20071.7, 60 sec: 15564.8, 300 sec: 14870.6). Total num frames: 955109376. Throughput: 0: 3837.6. Samples: 227937194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:48,968][134211] Avg episode reward: [(0, '8.624')] [2025-01-04 15:18:49,821][134294] Updated weights for policy 0, policy_version 233184 (0.0019) [2025-01-04 15:18:52,911][134294] Updated weights for policy 0, policy_version 233194 (0.0025) [2025-01-04 15:18:53,968][134211] Fps is (10 sec: 16793.3, 60 sec: 15360.0, 300 sec: 14842.8). Total num frames: 955174912. Throughput: 0: 3918.6. Samples: 227961628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:53,968][134211] Avg episode reward: [(0, '9.817')] [2025-01-04 15:18:56,013][134294] Updated weights for policy 0, policy_version 233204 (0.0026) [2025-01-04 15:18:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.3, 300 sec: 14828.9). Total num frames: 955240448. Throughput: 0: 3793.1. Samples: 227981134. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:18:58,968][134211] Avg episode reward: [(0, '10.710')] [2025-01-04 15:18:59,151][134294] Updated weights for policy 0, policy_version 233214 (0.0027) [2025-01-04 15:19:02,275][134294] Updated weights for policy 0, policy_version 233224 (0.0026) [2025-01-04 15:19:03,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15086.9, 300 sec: 14787.2). Total num frames: 955305984. Throughput: 0: 3767.0. Samples: 227991106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:03,968][134211] Avg episode reward: [(0, '10.492')] [2025-01-04 15:19:05,272][134294] Updated weights for policy 0, policy_version 233234 (0.0030) [2025-01-04 15:19:08,182][134294] Updated weights for policy 0, policy_version 233244 (0.0023) [2025-01-04 15:19:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15086.9, 300 sec: 14648.4). Total num frames: 955375616. Throughput: 0: 3758.1. Samples: 228011758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:08,968][134211] Avg episode reward: [(0, '9.777')] [2025-01-04 15:19:11,083][134294] Updated weights for policy 0, policy_version 233254 (0.0026) [2025-01-04 15:19:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14606.9). Total num frames: 955445248. Throughput: 0: 3758.3. Samples: 228032166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:13,968][134211] Avg episode reward: [(0, '9.318')] [2025-01-04 15:19:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000233263_955445248.pth... 
[2025-01-04 15:19:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000232396_951894016.pth [2025-01-04 15:19:14,230][134294] Updated weights for policy 0, policy_version 233264 (0.0029) [2025-01-04 15:19:17,299][134294] Updated weights for policy 0, policy_version 233274 (0.0026) [2025-01-04 15:19:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14606.8). Total num frames: 955510784. Throughput: 0: 3743.0. Samples: 228042014. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:18,968][134211] Avg episode reward: [(0, '9.074')] [2025-01-04 15:19:20,194][134294] Updated weights for policy 0, policy_version 233284 (0.0023) [2025-01-04 15:19:23,150][134294] Updated weights for policy 0, policy_version 233294 (0.0024) [2025-01-04 15:19:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14606.8). Total num frames: 955580416. Throughput: 0: 3737.7. Samples: 228062940. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:23,968][134211] Avg episode reward: [(0, '9.614')] [2025-01-04 15:19:26,110][134294] Updated weights for policy 0, policy_version 233304 (0.0022) [2025-01-04 15:19:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14620.7). Total num frames: 955650048. Throughput: 0: 3615.9. Samples: 228083594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:28,968][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 15:19:29,167][134294] Updated weights for policy 0, policy_version 233314 (0.0027) [2025-01-04 15:19:32,049][134294] Updated weights for policy 0, policy_version 233324 (0.0026) [2025-01-04 15:19:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.4, 300 sec: 14620.6). Total num frames: 955719680. Throughput: 0: 3478.9. Samples: 228093742. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:33,968][134211] Avg episode reward: [(0, '9.595')] [2025-01-04 15:19:34,972][134294] Updated weights for policy 0, policy_version 233334 (0.0027) [2025-01-04 15:19:37,854][134294] Updated weights for policy 0, policy_version 233344 (0.0022) [2025-01-04 15:19:38,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14677.3, 300 sec: 14620.7). Total num frames: 955789312. Throughput: 0: 3412.3. Samples: 228115182. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:19:38,969][134211] Avg episode reward: [(0, '11.451')] [2025-01-04 15:19:40,760][134294] Updated weights for policy 0, policy_version 233354 (0.0022) [2025-01-04 15:19:42,813][134294] Updated weights for policy 0, policy_version 233364 (0.0015) [2025-01-04 15:19:43,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 955879424. Throughput: 0: 3526.6. Samples: 228139832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:19:43,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 15:19:44,770][134294] Updated weights for policy 0, policy_version 233374 (0.0015) [2025-01-04 15:19:46,664][134294] Updated weights for policy 0, policy_version 233384 (0.0015) [2025-01-04 15:19:48,540][134294] Updated weights for policy 0, policy_version 233394 (0.0016) [2025-01-04 15:19:48,968][134211] Fps is (10 sec: 20072.1, 60 sec: 14677.4, 300 sec: 14828.9). Total num frames: 955990016. Throughput: 0: 3665.1. Samples: 228156034. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:19:48,968][134211] Avg episode reward: [(0, '9.103')] [2025-01-04 15:19:50,907][134294] Updated weights for policy 0, policy_version 233404 (0.0020) [2025-01-04 15:19:53,968][134211] Fps is (10 sec: 18022.0, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 956059648. Throughput: 0: 3804.8. Samples: 228182976. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:19:53,968][134211] Avg episode reward: [(0, '9.913')] [2025-01-04 15:19:54,092][134294] Updated weights for policy 0, policy_version 233414 (0.0029) [2025-01-04 15:19:57,206][134294] Updated weights for policy 0, policy_version 233424 (0.0028) [2025-01-04 15:19:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14745.6, 300 sec: 14815.0). Total num frames: 956125184. Throughput: 0: 3787.2. Samples: 228202588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:19:58,968][134211] Avg episode reward: [(0, '9.188')] [2025-01-04 15:20:00,267][134294] Updated weights for policy 0, policy_version 233434 (0.0025) [2025-01-04 15:20:03,226][134294] Updated weights for policy 0, policy_version 233444 (0.0027) [2025-01-04 15:20:03,969][134211] Fps is (10 sec: 13515.1, 60 sec: 14813.6, 300 sec: 14815.0). Total num frames: 956194816. Throughput: 0: 3799.3. Samples: 228212986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:03,970][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 15:20:06,170][134294] Updated weights for policy 0, policy_version 233454 (0.0025) [2025-01-04 15:20:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14815.0). Total num frames: 956264448. Throughput: 0: 3784.6. Samples: 228233248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:08,968][134211] Avg episode reward: [(0, '9.506')] [2025-01-04 15:20:09,358][134294] Updated weights for policy 0, policy_version 233464 (0.0029) [2025-01-04 15:20:12,220][134294] Updated weights for policy 0, policy_version 233474 (0.0023) [2025-01-04 15:20:13,968][134211] Fps is (10 sec: 13518.8, 60 sec: 14745.6, 300 sec: 14801.1). Total num frames: 956329984. Throughput: 0: 3766.0. Samples: 228253064. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:13,968][134211] Avg episode reward: [(0, '10.078')] [2025-01-04 15:20:15,387][134294] Updated weights for policy 0, policy_version 233484 (0.0025) [2025-01-04 15:20:18,203][134294] Updated weights for policy 0, policy_version 233494 (0.0024) [2025-01-04 15:20:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 956399616. Throughput: 0: 3779.8. Samples: 228263832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:18,968][134211] Avg episode reward: [(0, '10.880')] [2025-01-04 15:20:21,214][134294] Updated weights for policy 0, policy_version 233504 (0.0025) [2025-01-04 15:20:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14579.0). Total num frames: 956469248. Throughput: 0: 3761.8. Samples: 228284460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:23,968][134211] Avg episode reward: [(0, '10.628')] [2025-01-04 15:20:24,233][134294] Updated weights for policy 0, policy_version 233514 (0.0026) [2025-01-04 15:20:27,145][134294] Updated weights for policy 0, policy_version 233524 (0.0026) [2025-01-04 15:20:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14592.9). Total num frames: 956538880. Throughput: 0: 3673.6. Samples: 228305142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:28,968][134211] Avg episode reward: [(0, '9.507')] [2025-01-04 15:20:30,112][134294] Updated weights for policy 0, policy_version 233534 (0.0025) [2025-01-04 15:20:32,998][134294] Updated weights for policy 0, policy_version 233544 (0.0024) [2025-01-04 15:20:33,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14813.7, 300 sec: 14606.7). Total num frames: 956608512. Throughput: 0: 3551.1. Samples: 228315836. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:33,969][134211] Avg episode reward: [(0, '10.001')] [2025-01-04 15:20:35,904][134294] Updated weights for policy 0, policy_version 233554 (0.0023) [2025-01-04 15:20:38,750][134294] Updated weights for policy 0, policy_version 233564 (0.0022) [2025-01-04 15:20:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14814.0, 300 sec: 14606.8). Total num frames: 956678144. Throughput: 0: 3424.4. Samples: 228337076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:38,968][134211] Avg episode reward: [(0, '9.519')] [2025-01-04 15:20:41,668][134294] Updated weights for policy 0, policy_version 233574 (0.0023) [2025-01-04 15:20:43,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14472.5, 300 sec: 14620.6). Total num frames: 956747776. Throughput: 0: 3456.6. Samples: 228358134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:43,968][134211] Avg episode reward: [(0, '8.678')] [2025-01-04 15:20:44,657][134294] Updated weights for policy 0, policy_version 233584 (0.0024) [2025-01-04 15:20:47,685][134294] Updated weights for policy 0, policy_version 233594 (0.0025) [2025-01-04 15:20:48,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13789.8, 300 sec: 14606.8). Total num frames: 956817408. Throughput: 0: 3455.9. Samples: 228368496. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:20:48,968][134211] Avg episode reward: [(0, '10.327')] [2025-01-04 15:20:50,471][134294] Updated weights for policy 0, policy_version 233604 (0.0026) [2025-01-04 15:20:52,464][134294] Updated weights for policy 0, policy_version 233614 (0.0013) [2025-01-04 15:20:53,967][134211] Fps is (10 sec: 16793.8, 60 sec: 14267.8, 300 sec: 14704.0). Total num frames: 956915712. Throughput: 0: 3540.5. Samples: 228392570. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:20:53,968][134211] Avg episode reward: [(0, '9.447')] [2025-01-04 15:20:54,299][134294] Updated weights for policy 0, policy_version 233624 (0.0012) [2025-01-04 15:20:56,720][134294] Updated weights for policy 0, policy_version 233634 (0.0018) [2025-01-04 15:20:58,968][134211] Fps is (10 sec: 17612.6, 60 sec: 14472.5, 300 sec: 14745.6). Total num frames: 956993536. Throughput: 0: 3699.2. Samples: 228419530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:20:58,968][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 15:20:59,682][134294] Updated weights for policy 0, policy_version 233644 (0.0024) [2025-01-04 15:21:02,777][134294] Updated weights for policy 0, policy_version 233654 (0.0024) [2025-01-04 15:21:03,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14404.6, 300 sec: 14731.7). Total num frames: 957059072. Throughput: 0: 3681.6. Samples: 228429506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:03,968][134211] Avg episode reward: [(0, '9.948')] [2025-01-04 15:21:05,740][134294] Updated weights for policy 0, policy_version 233664 (0.0027) [2025-01-04 15:21:08,678][134294] Updated weights for policy 0, policy_version 233674 (0.0023) [2025-01-04 15:21:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14404.2, 300 sec: 14731.7). Total num frames: 957128704. Throughput: 0: 3682.6. Samples: 228450178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:08,968][134211] Avg episode reward: [(0, '8.265')] [2025-01-04 15:21:11,652][134294] Updated weights for policy 0, policy_version 233684 (0.0026) [2025-01-04 15:21:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14472.5, 300 sec: 14731.7). Total num frames: 957198336. Throughput: 0: 3679.1. Samples: 228470700. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:13,968][134211] Avg episode reward: [(0, '9.066')] [2025-01-04 15:21:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000233691_957198336.pth... [2025-01-04 15:21:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000232835_953692160.pth [2025-01-04 15:21:14,764][134294] Updated weights for policy 0, policy_version 233694 (0.0025) [2025-01-04 15:21:17,743][134294] Updated weights for policy 0, policy_version 233704 (0.0026) [2025-01-04 15:21:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14472.5, 300 sec: 14717.8). Total num frames: 957267968. Throughput: 0: 3664.4. Samples: 228480732. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:18,968][134211] Avg episode reward: [(0, '9.068')] [2025-01-04 15:21:20,652][134294] Updated weights for policy 0, policy_version 233714 (0.0026) [2025-01-04 15:21:23,470][134294] Updated weights for policy 0, policy_version 233724 (0.0025) [2025-01-04 15:21:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14472.5, 300 sec: 14662.3). Total num frames: 957337600. Throughput: 0: 3664.7. Samples: 228501986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:23,968][134211] Avg episode reward: [(0, '9.092')] [2025-01-04 15:21:26,441][134294] Updated weights for policy 0, policy_version 233734 (0.0024) [2025-01-04 15:21:28,970][134211] Fps is (10 sec: 13923.4, 60 sec: 14472.0, 300 sec: 14662.3). Total num frames: 957407232. Throughput: 0: 3656.4. Samples: 228522678. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:28,970][134211] Avg episode reward: [(0, '8.964')] [2025-01-04 15:21:29,416][134294] Updated weights for policy 0, policy_version 233744 (0.0023) [2025-01-04 15:21:31,335][134294] Updated weights for policy 0, policy_version 233754 (0.0013) [2025-01-04 15:21:33,160][134294] Updated weights for policy 0, policy_version 233764 (0.0013) [2025-01-04 15:21:33,968][134211] Fps is (10 sec: 17612.5, 60 sec: 15087.0, 300 sec: 14787.2). Total num frames: 957513728. Throughput: 0: 3735.0. Samples: 228536572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:33,968][134211] Avg episode reward: [(0, '10.748')] [2025-01-04 15:21:35,091][134294] Updated weights for policy 0, policy_version 233774 (0.0013) [2025-01-04 15:21:36,965][134294] Updated weights for policy 0, policy_version 233784 (0.0013) [2025-01-04 15:21:38,968][134211] Fps is (10 sec: 20484.2, 60 sec: 15564.8, 300 sec: 14884.4). Total num frames: 957612032. Throughput: 0: 3925.7. Samples: 228569228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:38,968][134211] Avg episode reward: [(0, '9.113')] [2025-01-04 15:21:39,563][134294] Updated weights for policy 0, policy_version 233794 (0.0024) [2025-01-04 15:21:42,669][134294] Updated weights for policy 0, policy_version 233804 (0.0026) [2025-01-04 15:21:43,968][134211] Fps is (10 sec: 16384.2, 60 sec: 15496.5, 300 sec: 14731.7). Total num frames: 957677568. Throughput: 0: 3789.7. Samples: 228590066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:43,968][134211] Avg episode reward: [(0, '9.589')] [2025-01-04 15:21:45,774][134294] Updated weights for policy 0, policy_version 233814 (0.0027) [2025-01-04 15:21:48,793][134294] Updated weights for policy 0, policy_version 233824 (0.0023) [2025-01-04 15:21:48,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15428.2, 300 sec: 14648.4). Total num frames: 957743104. Throughput: 0: 3792.0. Samples: 228600144. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:48,968][134211] Avg episode reward: [(0, '8.430')] [2025-01-04 15:21:51,755][134294] Updated weights for policy 0, policy_version 233834 (0.0025) [2025-01-04 15:21:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.3, 300 sec: 14662.3). Total num frames: 957812736. Throughput: 0: 3787.8. Samples: 228620628. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:53,969][134211] Avg episode reward: [(0, '9.269')] [2025-01-04 15:21:54,864][134294] Updated weights for policy 0, policy_version 233844 (0.0027) [2025-01-04 15:21:57,817][134294] Updated weights for policy 0, policy_version 233854 (0.0024) [2025-01-04 15:21:58,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14745.5, 300 sec: 14662.3). Total num frames: 957878272. Throughput: 0: 3780.7. Samples: 228640834. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-04 15:21:58,969][134211] Avg episode reward: [(0, '8.035')] [2025-01-04 15:22:00,726][134294] Updated weights for policy 0, policy_version 233864 (0.0023) [2025-01-04 15:22:03,812][134294] Updated weights for policy 0, policy_version 233874 (0.0024) [2025-01-04 15:22:03,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 957947904. Throughput: 0: 3789.6. Samples: 228651264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:03,968][134211] Avg episode reward: [(0, '10.153')] [2025-01-04 15:22:06,775][134294] Updated weights for policy 0, policy_version 233884 (0.0024) [2025-01-04 15:22:08,968][134211] Fps is (10 sec: 12288.8, 60 sec: 14540.8, 300 sec: 14620.6). Total num frames: 958001152. Throughput: 0: 3770.7. Samples: 228671668. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:08,968][134211] Avg episode reward: [(0, '9.169')] [2025-01-04 15:22:11,000][134294] Updated weights for policy 0, policy_version 233894 (0.0023) [2025-01-04 15:22:13,968][134211] Fps is (10 sec: 11878.4, 60 sec: 14472.5, 300 sec: 14606.7). Total num frames: 958066688. Throughput: 0: 3677.4. Samples: 228688152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:13,968][134211] Avg episode reward: [(0, '9.044')] [2025-01-04 15:22:14,089][134294] Updated weights for policy 0, policy_version 233904 (0.0030) [2025-01-04 15:22:17,960][134294] Updated weights for policy 0, policy_version 233914 (0.0027) [2025-01-04 15:22:18,968][134211] Fps is (10 sec: 12697.6, 60 sec: 14336.0, 300 sec: 14579.0). Total num frames: 958128128. Throughput: 0: 3531.5. Samples: 228695488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:18,969][134211] Avg episode reward: [(0, '8.952')] [2025-01-04 15:22:20,114][134294] Updated weights for policy 0, policy_version 233924 (0.0015) [2025-01-04 15:22:22,128][134294] Updated weights for policy 0, policy_version 233934 (0.0013) [2025-01-04 15:22:23,968][134211] Fps is (10 sec: 16384.1, 60 sec: 14882.1, 300 sec: 14690.1). Total num frames: 958230528. Throughput: 0: 3388.5. Samples: 228721712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:23,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 15:22:24,166][134294] Updated weights for policy 0, policy_version 233944 (0.0013) [2025-01-04 15:22:27,223][134294] Updated weights for policy 0, policy_version 233954 (0.0028) [2025-01-04 15:22:28,968][134211] Fps is (10 sec: 16793.0, 60 sec: 14814.3, 300 sec: 14676.2). Total num frames: 958296064. Throughput: 0: 3440.9. Samples: 228744906. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:28,969][134211] Avg episode reward: [(0, '10.070')] [2025-01-04 15:22:30,539][134294] Updated weights for policy 0, policy_version 233964 (0.0029) [2025-01-04 15:22:33,969][134211] Fps is (10 sec: 12286.9, 60 sec: 13994.5, 300 sec: 14634.5). Total num frames: 958353408. Throughput: 0: 3424.3. Samples: 228754242. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:33,969][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 15:22:34,241][134294] Updated weights for policy 0, policy_version 233974 (0.0030) [2025-01-04 15:22:37,367][134294] Updated weights for policy 0, policy_version 233984 (0.0023) [2025-01-04 15:22:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.3, 300 sec: 14662.3). Total num frames: 958431232. Throughput: 0: 3364.8. Samples: 228772044. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:38,968][134211] Avg episode reward: [(0, '9.539')] [2025-01-04 15:22:39,210][134294] Updated weights for policy 0, policy_version 233994 (0.0013) [2025-01-04 15:22:41,091][134294] Updated weights for policy 0, policy_version 234004 (0.0013) [2025-01-04 15:22:43,042][134294] Updated weights for policy 0, policy_version 234014 (0.0012) [2025-01-04 15:22:43,968][134211] Fps is (10 sec: 18843.6, 60 sec: 14404.3, 300 sec: 14801.1). Total num frames: 958541824. Throughput: 0: 3639.4. Samples: 228804606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:43,968][134211] Avg episode reward: [(0, '9.509')] [2025-01-04 15:22:44,864][134294] Updated weights for policy 0, policy_version 234024 (0.0014) [2025-01-04 15:22:46,929][134294] Updated weights for policy 0, policy_version 234034 (0.0016) [2025-01-04 15:22:48,968][134211] Fps is (10 sec: 19661.4, 60 sec: 14745.6, 300 sec: 14828.9). Total num frames: 958627840. Throughput: 0: 3769.7. Samples: 228820902. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:48,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 15:22:49,972][134294] Updated weights for policy 0, policy_version 234044 (0.0027) [2025-01-04 15:22:53,233][134294] Updated weights for policy 0, policy_version 234054 (0.0029) [2025-01-04 15:22:53,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14677.4, 300 sec: 14787.3). Total num frames: 958693376. Throughput: 0: 3764.2. Samples: 228841056. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:53,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 15:22:56,357][134294] Updated weights for policy 0, policy_version 234064 (0.0028) [2025-01-04 15:22:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14677.5, 300 sec: 14773.4). Total num frames: 958758912. Throughput: 0: 3833.4. Samples: 228860656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:22:58,968][134211] Avg episode reward: [(0, '9.915')] [2025-01-04 15:22:59,510][134294] Updated weights for policy 0, policy_version 234074 (0.0025) [2025-01-04 15:23:02,466][134294] Updated weights for policy 0, policy_version 234084 (0.0025) [2025-01-04 15:23:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14609.1, 300 sec: 14759.5). Total num frames: 958824448. Throughput: 0: 3892.4. Samples: 228870648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:23:03,968][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 15:23:05,445][134294] Updated weights for policy 0, policy_version 234094 (0.0028) [2025-01-04 15:23:08,316][134294] Updated weights for policy 0, policy_version 234104 (0.0024) [2025-01-04 15:23:08,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14950.4, 300 sec: 14703.9). Total num frames: 958898176. Throughput: 0: 3776.5. Samples: 228891656. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:23:08,968][134211] Avg episode reward: [(0, '9.000')] [2025-01-04 15:23:11,324][134294] Updated weights for policy 0, policy_version 234114 (0.0022) [2025-01-04 15:23:13,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14950.3, 300 sec: 14690.0). Total num frames: 958963712. Throughput: 0: 3720.3. Samples: 228912318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:13,969][134211] Avg episode reward: [(0, '8.086')] [2025-01-04 15:23:14,015][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234123_958967808.pth... [2025-01-04 15:23:14,086][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000233263_955445248.pth [2025-01-04 15:23:14,329][134294] Updated weights for policy 0, policy_version 234124 (0.0024) [2025-01-04 15:23:17,397][134294] Updated weights for policy 0, policy_version 234134 (0.0026) [2025-01-04 15:23:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15086.9, 300 sec: 14690.1). Total num frames: 959033344. Throughput: 0: 3736.8. Samples: 228922392. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:18,968][134211] Avg episode reward: [(0, '10.323')] [2025-01-04 15:23:20,199][134294] Updated weights for policy 0, policy_version 234144 (0.0023) [2025-01-04 15:23:23,076][134294] Updated weights for policy 0, policy_version 234154 (0.0025) [2025-01-04 15:23:23,968][134211] Fps is (10 sec: 13926.9, 60 sec: 14540.8, 300 sec: 14690.1). Total num frames: 959102976. Throughput: 0: 3813.5. Samples: 228943650. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:23,968][134211] Avg episode reward: [(0, '9.543')] [2025-01-04 15:23:25,967][134294] Updated weights for policy 0, policy_version 234164 (0.0024) [2025-01-04 15:23:28,755][134294] Updated weights for policy 0, policy_version 234174 (0.0026) [2025-01-04 15:23:28,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14677.4, 300 sec: 14703.9). Total num frames: 959176704. Throughput: 0: 3569.9. Samples: 228965250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:28,968][134211] Avg episode reward: [(0, '10.806')] [2025-01-04 15:23:31,698][134294] Updated weights for policy 0, policy_version 234184 (0.0023) [2025-01-04 15:23:33,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14882.4, 300 sec: 14704.0). Total num frames: 959246336. Throughput: 0: 3441.0. Samples: 228975746. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:33,968][134211] Avg episode reward: [(0, '10.442')] [2025-01-04 15:23:34,692][134294] Updated weights for policy 0, policy_version 234194 (0.0025) [2025-01-04 15:23:37,647][134294] Updated weights for policy 0, policy_version 234204 (0.0024) [2025-01-04 15:23:38,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14745.7, 300 sec: 14606.7). Total num frames: 959315968. Throughput: 0: 3456.0. Samples: 228996576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:38,968][134211] Avg episode reward: [(0, '10.952')] [2025-01-04 15:23:40,521][134294] Updated weights for policy 0, policy_version 234214 (0.0023) [2025-01-04 15:23:43,326][134294] Updated weights for policy 0, policy_version 234224 (0.0025) [2025-01-04 15:23:43,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14509.6). Total num frames: 959389696. Throughput: 0: 3497.7. Samples: 229018052. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:43,968][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 15:23:46,190][134294] Updated weights for policy 0, policy_version 234234 (0.0024) [2025-01-04 15:23:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 13858.1, 300 sec: 14523.4). Total num frames: 959459328. Throughput: 0: 3510.5. Samples: 229028620. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:48,968][134211] Avg episode reward: [(0, '9.394')] [2025-01-04 15:23:49,239][134294] Updated weights for policy 0, policy_version 234244 (0.0022) [2025-01-04 15:23:52,130][134294] Updated weights for policy 0, policy_version 234254 (0.0024) [2025-01-04 15:23:53,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13926.4, 300 sec: 14537.3). Total num frames: 959528960. Throughput: 0: 3503.6. Samples: 229049318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:53,968][134211] Avg episode reward: [(0, '8.836')] [2025-01-04 15:23:55,072][134294] Updated weights for policy 0, policy_version 234264 (0.0025) [2025-01-04 15:23:57,897][134294] Updated weights for policy 0, policy_version 234274 (0.0023) [2025-01-04 15:23:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13994.6, 300 sec: 14551.2). Total num frames: 959598592. Throughput: 0: 3522.9. Samples: 229070846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:23:58,968][134211] Avg episode reward: [(0, '9.840')] [2025-01-04 15:24:00,806][134294] Updated weights for policy 0, policy_version 234284 (0.0023) [2025-01-04 15:24:03,661][134294] Updated weights for policy 0, policy_version 234294 (0.0026) [2025-01-04 15:24:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14131.2, 300 sec: 14565.1). Total num frames: 959672320. Throughput: 0: 3538.8. Samples: 229081640. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:24:03,968][134211] Avg episode reward: [(0, '9.709')] [2025-01-04 15:24:06,452][134294] Updated weights for policy 0, policy_version 234304 (0.0025) [2025-01-04 15:24:08,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14062.9, 300 sec: 14565.1). Total num frames: 959741952. Throughput: 0: 3540.8. Samples: 229102988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:24:08,968][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 15:24:09,302][134294] Updated weights for policy 0, policy_version 234314 (0.0024) [2025-01-04 15:24:12,307][134294] Updated weights for policy 0, policy_version 234324 (0.0025) [2025-01-04 15:24:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.3, 300 sec: 14579.0). Total num frames: 959811584. Throughput: 0: 3530.5. Samples: 229124120. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:24:13,968][134211] Avg episode reward: [(0, '10.081')] [2025-01-04 15:24:15,165][134294] Updated weights for policy 0, policy_version 234334 (0.0025) [2025-01-04 15:24:18,118][134294] Updated weights for policy 0, policy_version 234344 (0.0024) [2025-01-04 15:24:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14131.2, 300 sec: 14579.0). Total num frames: 959881216. Throughput: 0: 3534.3. Samples: 229134790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:18,968][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 15:24:20,864][134294] Updated weights for policy 0, policy_version 234354 (0.0026) [2025-01-04 15:24:23,736][134294] Updated weights for policy 0, policy_version 234364 (0.0021) [2025-01-04 15:24:23,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14199.5, 300 sec: 14592.9). Total num frames: 959954944. Throughput: 0: 3554.1. Samples: 229156510. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:23,968][134211] Avg episode reward: [(0, '9.543')] [2025-01-04 15:24:26,621][134294] Updated weights for policy 0, policy_version 234374 (0.0023) [2025-01-04 15:24:28,968][134211] Fps is (10 sec: 14745.8, 60 sec: 14199.5, 300 sec: 14606.8). Total num frames: 960028672. Throughput: 0: 3545.4. Samples: 229177596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:28,968][134211] Avg episode reward: [(0, '9.725')] [2025-01-04 15:24:29,319][134294] Updated weights for policy 0, policy_version 234384 (0.0021) [2025-01-04 15:24:31,333][134294] Updated weights for policy 0, policy_version 234394 (0.0015) [2025-01-04 15:24:33,968][134211] Fps is (10 sec: 15974.6, 60 sec: 14472.6, 300 sec: 14662.3). Total num frames: 960114688. Throughput: 0: 3627.2. Samples: 229191846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:33,968][134211] Avg episode reward: [(0, '9.329')] [2025-01-04 15:24:34,076][134294] Updated weights for policy 0, policy_version 234404 (0.0024) [2025-01-04 15:24:37,037][134294] Updated weights for policy 0, policy_version 234414 (0.0025) [2025-01-04 15:24:38,968][134211] Fps is (10 sec: 15564.7, 60 sec: 14472.5, 300 sec: 14592.9). Total num frames: 960184320. Throughput: 0: 3650.4. Samples: 229213586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:38,968][134211] Avg episode reward: [(0, '10.385')] [2025-01-04 15:24:39,991][134294] Updated weights for policy 0, policy_version 234424 (0.0023) [2025-01-04 15:24:42,882][134294] Updated weights for policy 0, policy_version 234434 (0.0024) [2025-01-04 15:24:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 960253952. Throughput: 0: 3639.9. Samples: 229234642. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:43,968][134211] Avg episode reward: [(0, '8.978')] [2025-01-04 15:24:45,713][134294] Updated weights for policy 0, policy_version 234444 (0.0025) [2025-01-04 15:24:48,554][134294] Updated weights for policy 0, policy_version 234454 (0.0023) [2025-01-04 15:24:48,968][134211] Fps is (10 sec: 14335.9, 60 sec: 14472.5, 300 sec: 14467.9). Total num frames: 960327680. Throughput: 0: 3641.1. Samples: 229245490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:48,968][134211] Avg episode reward: [(0, '9.420')] [2025-01-04 15:24:51,597][134294] Updated weights for policy 0, policy_version 234464 (0.0023) [2025-01-04 15:24:53,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14472.5, 300 sec: 14481.8). Total num frames: 960397312. Throughput: 0: 3633.7. Samples: 229266506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:53,968][134211] Avg episode reward: [(0, '8.549')] [2025-01-04 15:24:54,478][134294] Updated weights for policy 0, policy_version 234474 (0.0023) [2025-01-04 15:24:56,374][134294] Updated weights for policy 0, policy_version 234484 (0.0012) [2025-01-04 15:24:58,278][134294] Updated weights for policy 0, policy_version 234494 (0.0014) [2025-01-04 15:24:58,967][134211] Fps is (10 sec: 17203.5, 60 sec: 15018.8, 300 sec: 14592.9). Total num frames: 960499712. Throughput: 0: 3792.4. Samples: 229294776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:24:58,968][134211] Avg episode reward: [(0, '9.860')] [2025-01-04 15:25:00,131][134294] Updated weights for policy 0, policy_version 234504 (0.0013) [2025-01-04 15:25:02,039][134294] Updated weights for policy 0, policy_version 234514 (0.0013) [2025-01-04 15:25:03,969][134211] Fps is (10 sec: 20067.2, 60 sec: 15427.8, 300 sec: 14690.0). Total num frames: 960598016. Throughput: 0: 3920.2. Samples: 229311204. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:25:03,970][134211] Avg episode reward: [(0, '8.748')] [2025-01-04 15:25:04,572][134294] Updated weights for policy 0, policy_version 234524 (0.0022) [2025-01-04 15:25:07,579][134294] Updated weights for policy 0, policy_version 234534 (0.0025) [2025-01-04 15:25:08,968][134211] Fps is (10 sec: 16793.2, 60 sec: 15428.3, 300 sec: 14703.9). Total num frames: 960667648. Throughput: 0: 3957.1. Samples: 229334580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:25:08,968][134211] Avg episode reward: [(0, '9.267')] [2025-01-04 15:25:10,796][134294] Updated weights for policy 0, policy_version 234544 (0.0027) [2025-01-04 15:25:13,781][134294] Updated weights for policy 0, policy_version 234554 (0.0026) [2025-01-04 15:25:13,968][134211] Fps is (10 sec: 13518.9, 60 sec: 15359.9, 300 sec: 14690.0). Total num frames: 960733184. Throughput: 0: 3933.4. Samples: 229354602. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:25:13,968][134211] Avg episode reward: [(0, '10.108')] [2025-01-04 15:25:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234554_960733184.pth... [2025-01-04 15:25:14,053][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000233691_957198336.pth [2025-01-04 15:25:16,699][134294] Updated weights for policy 0, policy_version 234564 (0.0025) [2025-01-04 15:25:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15360.0, 300 sec: 14690.1). Total num frames: 960802816. Throughput: 0: 3840.5. Samples: 229364670. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:25:18,968][134211] Avg episode reward: [(0, '9.554')] [2025-01-04 15:25:19,819][134294] Updated weights for policy 0, policy_version 234574 (0.0025) [2025-01-04 15:25:22,672][134294] Updated weights for policy 0, policy_version 234584 (0.0025) [2025-01-04 15:25:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 15291.8, 300 sec: 14690.1). Total num frames: 960872448. Throughput: 0: 3815.2. Samples: 229385272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:25:23,969][134211] Avg episode reward: [(0, '10.935')] [2025-01-04 15:25:25,719][134294] Updated weights for policy 0, policy_version 234594 (0.0023) [2025-01-04 15:25:28,466][134294] Updated weights for policy 0, policy_version 234604 (0.0027) [2025-01-04 15:25:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 15223.4, 300 sec: 14690.1). Total num frames: 960942080. Throughput: 0: 3820.3. Samples: 229406558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:28,968][134211] Avg episode reward: [(0, '9.302')] [2025-01-04 15:25:31,432][134294] Updated weights for policy 0, policy_version 234614 (0.0024) [2025-01-04 15:25:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14690.1). Total num frames: 961011712. Throughput: 0: 3813.7. Samples: 229417106. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:33,968][134211] Avg episode reward: [(0, '9.733')] [2025-01-04 15:25:34,422][134294] Updated weights for policy 0, policy_version 234624 (0.0027) [2025-01-04 15:25:37,260][134294] Updated weights for policy 0, policy_version 234634 (0.0024) [2025-01-04 15:25:38,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14950.4, 300 sec: 14690.1). Total num frames: 961081344. Throughput: 0: 3811.2. Samples: 229438008. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:38,968][134211] Avg episode reward: [(0, '9.815')] [2025-01-04 15:25:40,246][134294] Updated weights for policy 0, policy_version 234644 (0.0024) [2025-01-04 15:25:43,024][134294] Updated weights for policy 0, policy_version 234654 (0.0025) [2025-01-04 15:25:43,968][134211] Fps is (10 sec: 14335.8, 60 sec: 15018.6, 300 sec: 14703.9). Total num frames: 961155072. Throughput: 0: 3657.9. Samples: 229459382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:43,968][134211] Avg episode reward: [(0, '9.701')] [2025-01-04 15:25:45,969][134294] Updated weights for policy 0, policy_version 234664 (0.0027) [2025-01-04 15:25:48,843][134294] Updated weights for policy 0, policy_version 234674 (0.0024) [2025-01-04 15:25:48,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14950.4, 300 sec: 14606.7). Total num frames: 961224704. Throughput: 0: 3526.7. Samples: 229469900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:48,968][134211] Avg episode reward: [(0, '9.192')] [2025-01-04 15:25:51,728][134294] Updated weights for policy 0, policy_version 234684 (0.0024) [2025-01-04 15:25:53,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14579.0). Total num frames: 961294336. Throughput: 0: 3481.5. Samples: 229491246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:53,968][134211] Avg episode reward: [(0, '9.609')] [2025-01-04 15:25:54,719][134294] Updated weights for policy 0, policy_version 234694 (0.0023) [2025-01-04 15:25:57,595][134294] Updated weights for policy 0, policy_version 234704 (0.0024) [2025-01-04 15:25:58,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14404.2, 300 sec: 14592.9). Total num frames: 961363968. Throughput: 0: 3500.1. Samples: 229512108. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:25:58,968][134211] Avg episode reward: [(0, '10.016')] [2025-01-04 15:26:00,463][134294] Updated weights for policy 0, policy_version 234714 (0.0024) [2025-01-04 15:26:03,303][134294] Updated weights for policy 0, policy_version 234724 (0.0026) [2025-01-04 15:26:03,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13995.0, 300 sec: 14606.7). Total num frames: 961437696. Throughput: 0: 3517.1. Samples: 229522938. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:03,968][134211] Avg episode reward: [(0, '9.401')] [2025-01-04 15:26:06,225][134294] Updated weights for policy 0, policy_version 234734 (0.0024) [2025-01-04 15:26:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13994.7, 300 sec: 14606.8). Total num frames: 961507328. Throughput: 0: 3530.9. Samples: 229544162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:08,968][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 15:26:09,177][134294] Updated weights for policy 0, policy_version 234744 (0.0024) [2025-01-04 15:26:12,062][134294] Updated weights for policy 0, policy_version 234754 (0.0025) [2025-01-04 15:26:13,968][134211] Fps is (10 sec: 13926.0, 60 sec: 14062.9, 300 sec: 14606.7). Total num frames: 961576960. Throughput: 0: 3522.3. Samples: 229565064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:13,969][134211] Avg episode reward: [(0, '8.569')] [2025-01-04 15:26:15,001][134294] Updated weights for policy 0, policy_version 234764 (0.0022) [2025-01-04 15:26:16,909][134294] Updated weights for policy 0, policy_version 234774 (0.0013) [2025-01-04 15:26:18,784][134294] Updated weights for policy 0, policy_version 234784 (0.0014) [2025-01-04 15:26:18,967][134211] Fps is (10 sec: 17203.7, 60 sec: 14609.1, 300 sec: 14717.8). Total num frames: 961679360. Throughput: 0: 3574.3. Samples: 229577948. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:18,968][134211] Avg episode reward: [(0, '9.088')] [2025-01-04 15:26:20,578][134294] Updated weights for policy 0, policy_version 234794 (0.0012) [2025-01-04 15:26:22,436][134294] Updated weights for policy 0, policy_version 234804 (0.0012) [2025-01-04 15:26:23,967][134211] Fps is (10 sec: 20890.8, 60 sec: 15223.5, 300 sec: 14842.9). Total num frames: 961785856. Throughput: 0: 3841.5. Samples: 229610874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:23,968][134211] Avg episode reward: [(0, '10.498')] [2025-01-04 15:26:24,338][134294] Updated weights for policy 0, policy_version 234814 (0.0014) [2025-01-04 15:26:26,739][134294] Updated weights for policy 0, policy_version 234824 (0.0021) [2025-01-04 15:26:28,968][134211] Fps is (10 sec: 18840.6, 60 sec: 15428.2, 300 sec: 14759.5). Total num frames: 961867776. Throughput: 0: 3966.6. Samples: 229637880. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:28,970][134211] Avg episode reward: [(0, '8.813')] [2025-01-04 15:26:29,823][134294] Updated weights for policy 0, policy_version 234834 (0.0024) [2025-01-04 15:26:32,904][134294] Updated weights for policy 0, policy_version 234844 (0.0028) [2025-01-04 15:26:33,968][134211] Fps is (10 sec: 14745.3, 60 sec: 15360.0, 300 sec: 14648.4). Total num frames: 961933312. Throughput: 0: 3946.1. Samples: 229647474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:26:33,968][134211] Avg episode reward: [(0, '9.844')] [2025-01-04 15:26:35,979][134294] Updated weights for policy 0, policy_version 234854 (0.0027) [2025-01-04 15:26:38,968][134211] Fps is (10 sec: 13107.6, 60 sec: 15291.7, 300 sec: 14648.4). Total num frames: 961998848. Throughput: 0: 3924.8. Samples: 229667860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:26:38,968][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 15:26:39,075][134294] Updated weights for policy 0, policy_version 234864 (0.0028) [2025-01-04 15:26:42,201][134294] Updated weights for policy 0, policy_version 234874 (0.0026) [2025-01-04 15:26:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15155.2, 300 sec: 14648.4). Total num frames: 962064384. Throughput: 0: 3895.8. Samples: 229687418. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:26:43,968][134211] Avg episode reward: [(0, '8.926')] [2025-01-04 15:26:45,191][134294] Updated weights for policy 0, policy_version 234884 (0.0028) [2025-01-04 15:26:48,393][134294] Updated weights for policy 0, policy_version 234894 (0.0027) [2025-01-04 15:26:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 15086.9, 300 sec: 14634.5). Total num frames: 962129920. Throughput: 0: 3884.4. Samples: 229697736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:26:48,969][134211] Avg episode reward: [(0, '9.321')] [2025-01-04 15:26:52,148][134294] Updated weights for policy 0, policy_version 234904 (0.0029) [2025-01-04 15:26:53,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14882.2, 300 sec: 14606.8). Total num frames: 962187264. Throughput: 0: 3796.4. Samples: 229715000. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:26:53,968][134211] Avg episode reward: [(0, '9.800')] [2025-01-04 15:26:55,251][134294] Updated weights for policy 0, policy_version 234914 (0.0025) [2025-01-04 15:26:58,152][134294] Updated weights for policy 0, policy_version 234924 (0.0024) [2025-01-04 15:26:58,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14882.1, 300 sec: 14606.8). Total num frames: 962256896. Throughput: 0: 3785.7. Samples: 229735418. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:26:58,968][134211] Avg episode reward: [(0, '9.497')] [2025-01-04 15:27:01,145][134294] Updated weights for policy 0, policy_version 234934 (0.0026) [2025-01-04 15:27:03,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.9, 300 sec: 14662.3). Total num frames: 962326528. Throughput: 0: 3725.0. Samples: 229745574. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:03,968][134211] Avg episode reward: [(0, '9.014')] [2025-01-04 15:27:04,255][134294] Updated weights for policy 0, policy_version 234944 (0.0025) [2025-01-04 15:27:07,320][134294] Updated weights for policy 0, policy_version 234954 (0.0026) [2025-01-04 15:27:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14662.3). Total num frames: 962392064. Throughput: 0: 3440.7. Samples: 229765706. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:08,968][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 15:27:10,489][134294] Updated weights for policy 0, policy_version 234964 (0.0023) [2025-01-04 15:27:13,797][134294] Updated weights for policy 0, policy_version 234974 (0.0028) [2025-01-04 15:27:13,968][134211] Fps is (10 sec: 12696.8, 60 sec: 14609.0, 300 sec: 14662.3). Total num frames: 962453504. Throughput: 0: 3262.9. Samples: 229784712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:13,969][134211] Avg episode reward: [(0, '10.087')] [2025-01-04 15:27:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234974_962453504.pth... 
[2025-01-04 15:27:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234123_958967808.pth [2025-01-04 15:27:16,651][134294] Updated weights for policy 0, policy_version 234984 (0.0020) [2025-01-04 15:27:18,598][134294] Updated weights for policy 0, policy_version 234994 (0.0015) [2025-01-04 15:27:18,968][134211] Fps is (10 sec: 15155.5, 60 sec: 14404.3, 300 sec: 14620.6). Total num frames: 962543616. Throughput: 0: 3267.3. Samples: 229794504. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:18,968][134211] Avg episode reward: [(0, '10.284')] [2025-01-04 15:27:20,442][134294] Updated weights for policy 0, policy_version 235004 (0.0012) [2025-01-04 15:27:22,345][134294] Updated weights for policy 0, policy_version 235014 (0.0013) [2025-01-04 15:27:23,968][134211] Fps is (10 sec: 18843.0, 60 sec: 14267.7, 300 sec: 14731.7). Total num frames: 962641920. Throughput: 0: 3530.1. Samples: 229826714. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:23,968][134211] Avg episode reward: [(0, '9.404')] [2025-01-04 15:27:24,973][134294] Updated weights for policy 0, policy_version 235024 (0.0020) [2025-01-04 15:27:28,092][134294] Updated weights for policy 0, policy_version 235034 (0.0027) [2025-01-04 15:27:28,968][134211] Fps is (10 sec: 16383.6, 60 sec: 13994.7, 300 sec: 14759.5). Total num frames: 962707456. Throughput: 0: 3574.4. Samples: 229848266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:28,968][134211] Avg episode reward: [(0, '8.797')] [2025-01-04 15:27:31,146][134294] Updated weights for policy 0, policy_version 235044 (0.0028) [2025-01-04 15:27:33,969][134211] Fps is (10 sec: 13106.0, 60 sec: 13994.5, 300 sec: 14717.8). Total num frames: 962772992. Throughput: 0: 3564.4. Samples: 229858138. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:33,969][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 15:27:34,406][134294] Updated weights for policy 0, policy_version 235054 (0.0026) [2025-01-04 15:27:37,477][134294] Updated weights for policy 0, policy_version 235064 (0.0027) [2025-01-04 15:27:38,968][134211] Fps is (10 sec: 13107.3, 60 sec: 13994.6, 300 sec: 14565.1). Total num frames: 962838528. Throughput: 0: 3615.3. Samples: 229877688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:38,968][134211] Avg episode reward: [(0, '9.275')] [2025-01-04 15:27:40,511][134294] Updated weights for policy 0, policy_version 235074 (0.0025) [2025-01-04 15:27:43,517][134294] Updated weights for policy 0, policy_version 235084 (0.0026) [2025-01-04 15:27:43,968][134211] Fps is (10 sec: 13518.0, 60 sec: 14063.0, 300 sec: 14509.6). Total num frames: 962908160. Throughput: 0: 3618.5. Samples: 229898250. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:27:43,968][134211] Avg episode reward: [(0, '10.080')] [2025-01-04 15:27:46,434][134294] Updated weights for policy 0, policy_version 235094 (0.0025) [2025-01-04 15:27:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14131.2, 300 sec: 14523.4). Total num frames: 962977792. Throughput: 0: 3621.8. Samples: 229908556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:27:48,968][134211] Avg episode reward: [(0, '8.857')] [2025-01-04 15:27:49,537][134294] Updated weights for policy 0, policy_version 235104 (0.0029) [2025-01-04 15:27:52,573][134294] Updated weights for policy 0, policy_version 235114 (0.0023) [2025-01-04 15:27:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14523.4). Total num frames: 963043328. Throughput: 0: 3623.4. Samples: 229928758. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:27:53,968][134211] Avg episode reward: [(0, '10.974')] [2025-01-04 15:27:55,471][134294] Updated weights for policy 0, policy_version 235124 (0.0022) [2025-01-04 15:27:58,381][134294] Updated weights for policy 0, policy_version 235134 (0.0023) [2025-01-04 15:27:58,970][134211] Fps is (10 sec: 13514.1, 60 sec: 14267.3, 300 sec: 14537.2). Total num frames: 963112960. Throughput: 0: 3665.8. Samples: 229949680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:27:58,970][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 15:28:01,402][134294] Updated weights for policy 0, policy_version 235144 (0.0024) [2025-01-04 15:28:03,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14267.7, 300 sec: 14523.4). Total num frames: 963182592. Throughput: 0: 3677.7. Samples: 229960002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:03,969][134211] Avg episode reward: [(0, '10.066')] [2025-01-04 15:28:04,418][134294] Updated weights for policy 0, policy_version 235154 (0.0025) [2025-01-04 15:28:07,437][134294] Updated weights for policy 0, policy_version 235164 (0.0024) [2025-01-04 15:28:08,968][134211] Fps is (10 sec: 13519.6, 60 sec: 14267.7, 300 sec: 14523.5). Total num frames: 963248128. Throughput: 0: 3411.0. Samples: 229980208. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:08,968][134211] Avg episode reward: [(0, '8.883')] [2025-01-04 15:28:10,086][134294] Updated weights for policy 0, policy_version 235174 (0.0021) [2025-01-04 15:28:12,114][134294] Updated weights for policy 0, policy_version 235184 (0.0016) [2025-01-04 15:28:13,968][134211] Fps is (10 sec: 15565.2, 60 sec: 14745.8, 300 sec: 14592.9). Total num frames: 963338240. Throughput: 0: 3490.0. Samples: 230005314. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:13,968][134211] Avg episode reward: [(0, '10.002')] [2025-01-04 15:28:15,129][134294] Updated weights for policy 0, policy_version 235194 (0.0023) [2025-01-04 15:28:18,120][134294] Updated weights for policy 0, policy_version 235204 (0.0025) [2025-01-04 15:28:18,968][134211] Fps is (10 sec: 15564.2, 60 sec: 14335.9, 300 sec: 14579.0). Total num frames: 963403776. Throughput: 0: 3499.4. Samples: 230015608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:18,969][134211] Avg episode reward: [(0, '9.120')] [2025-01-04 15:28:21,088][134294] Updated weights for policy 0, policy_version 235214 (0.0025) [2025-01-04 15:28:23,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13858.1, 300 sec: 14565.1). Total num frames: 963473408. Throughput: 0: 3510.9. Samples: 230035680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:23,968][134211] Avg episode reward: [(0, '9.764')] [2025-01-04 15:28:24,270][134294] Updated weights for policy 0, policy_version 235224 (0.0024) [2025-01-04 15:28:26,702][134294] Updated weights for policy 0, policy_version 235234 (0.0016) [2025-01-04 15:28:28,909][134294] Updated weights for policy 0, policy_version 235244 (0.0019) [2025-01-04 15:28:28,968][134211] Fps is (10 sec: 15565.5, 60 sec: 14199.5, 300 sec: 14620.6). Total num frames: 963559424. Throughput: 0: 3600.0. Samples: 230060252. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:28,968][134211] Avg episode reward: [(0, '9.890')] [2025-01-04 15:28:31,958][134294] Updated weights for policy 0, policy_version 235254 (0.0025) [2025-01-04 15:28:33,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14199.7, 300 sec: 14606.7). Total num frames: 963624960. Throughput: 0: 3602.3. Samples: 230070660. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:33,968][134211] Avg episode reward: [(0, '10.123')] [2025-01-04 15:28:35,013][134294] Updated weights for policy 0, policy_version 235264 (0.0029) [2025-01-04 15:28:37,958][134294] Updated weights for policy 0, policy_version 235274 (0.0022) [2025-01-04 15:28:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.8, 300 sec: 14592.9). Total num frames: 963694592. Throughput: 0: 3614.4. Samples: 230091404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:38,968][134211] Avg episode reward: [(0, '10.314')] [2025-01-04 15:28:40,955][134294] Updated weights for policy 0, policy_version 235284 (0.0026) [2025-01-04 15:28:43,784][134294] Updated weights for policy 0, policy_version 235294 (0.0024) [2025-01-04 15:28:43,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.7, 300 sec: 14592.9). Total num frames: 963764224. Throughput: 0: 3609.2. Samples: 230112086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:43,968][134211] Avg episode reward: [(0, '10.371')] [2025-01-04 15:28:46,825][134294] Updated weights for policy 0, policy_version 235304 (0.0024) [2025-01-04 15:28:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14267.7, 300 sec: 14592.9). Total num frames: 963833856. Throughput: 0: 3608.0. Samples: 230122360. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:48,969][134211] Avg episode reward: [(0, '11.126')] [2025-01-04 15:28:49,837][134294] Updated weights for policy 0, policy_version 235314 (0.0025) [2025-01-04 15:28:52,808][134294] Updated weights for policy 0, policy_version 235324 (0.0028) [2025-01-04 15:28:53,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14267.7, 300 sec: 14579.0). Total num frames: 963899392. Throughput: 0: 3610.6. Samples: 230142686. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:28:53,969][134211] Avg episode reward: [(0, '9.711')] [2025-01-04 15:28:55,681][134294] Updated weights for policy 0, policy_version 235334 (0.0022) [2025-01-04 15:28:57,561][134294] Updated weights for policy 0, policy_version 235344 (0.0013) [2025-01-04 15:28:58,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14746.1, 300 sec: 14662.3). Total num frames: 963997696. Throughput: 0: 3636.4. Samples: 230168950. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:28:58,968][134211] Avg episode reward: [(0, '9.997')] [2025-01-04 15:28:59,406][134294] Updated weights for policy 0, policy_version 235354 (0.0015) [2025-01-04 15:29:01,325][134294] Updated weights for policy 0, policy_version 235364 (0.0014) [2025-01-04 15:29:03,213][134294] Updated weights for policy 0, policy_version 235374 (0.0012) [2025-01-04 15:29:03,968][134211] Fps is (10 sec: 20890.2, 60 sec: 15428.3, 300 sec: 14801.1). Total num frames: 964108288. Throughput: 0: 3770.0. Samples: 230185258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:03,968][134211] Avg episode reward: [(0, '9.051')] [2025-01-04 15:29:05,084][134294] Updated weights for policy 0, policy_version 235384 (0.0015) [2025-01-04 15:29:08,023][134294] Updated weights for policy 0, policy_version 235394 (0.0027) [2025-01-04 15:29:08,968][134211] Fps is (10 sec: 18430.9, 60 sec: 15564.7, 300 sec: 14815.0). Total num frames: 964182016. Throughput: 0: 3960.4. Samples: 230213898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:08,969][134211] Avg episode reward: [(0, '10.356')] [2025-01-04 15:29:11,304][134294] Updated weights for policy 0, policy_version 235404 (0.0030) [2025-01-04 15:29:13,968][134211] Fps is (10 sec: 13925.9, 60 sec: 15155.1, 300 sec: 14801.1). Total num frames: 964247552. Throughput: 0: 3834.0. Samples: 230232782. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:13,969][134211] Avg episode reward: [(0, '10.390')] [2025-01-04 15:29:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000235412_964247552.pth... [2025-01-04 15:29:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234554_960733184.pth [2025-01-04 15:29:14,635][134294] Updated weights for policy 0, policy_version 235414 (0.0028) [2025-01-04 15:29:17,761][134294] Updated weights for policy 0, policy_version 235424 (0.0028) [2025-01-04 15:29:18,968][134211] Fps is (10 sec: 12698.2, 60 sec: 15087.1, 300 sec: 14759.5). Total num frames: 964308992. Throughput: 0: 3810.9. Samples: 230242150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:18,968][134211] Avg episode reward: [(0, '9.626')] [2025-01-04 15:29:20,874][134294] Updated weights for policy 0, policy_version 235434 (0.0024) [2025-01-04 15:29:23,782][134294] Updated weights for policy 0, policy_version 235444 (0.0024) [2025-01-04 15:29:23,968][134211] Fps is (10 sec: 13107.5, 60 sec: 15087.0, 300 sec: 14745.6). Total num frames: 964378624. Throughput: 0: 3799.8. Samples: 230262396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:23,968][134211] Avg episode reward: [(0, '10.321')] [2025-01-04 15:29:26,804][134294] Updated weights for policy 0, policy_version 235454 (0.0026) [2025-01-04 15:29:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14690.1). Total num frames: 964448256. Throughput: 0: 3793.9. Samples: 230282812. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:28,968][134211] Avg episode reward: [(0, '10.123')] [2025-01-04 15:29:29,914][134294] Updated weights for policy 0, policy_version 235464 (0.0025) [2025-01-04 15:29:32,913][134294] Updated weights for policy 0, policy_version 235474 (0.0024) [2025-01-04 15:29:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14813.9, 300 sec: 14676.2). Total num frames: 964513792. Throughput: 0: 3783.9. Samples: 230292634. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:33,968][134211] Avg episode reward: [(0, '9.618')] [2025-01-04 15:29:35,804][134294] Updated weights for policy 0, policy_version 235484 (0.0025) [2025-01-04 15:29:38,816][134294] Updated weights for policy 0, policy_version 235494 (0.0025) [2025-01-04 15:29:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14813.8, 300 sec: 14676.2). Total num frames: 964583424. Throughput: 0: 3795.5. Samples: 230313484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:38,968][134211] Avg episode reward: [(0, '9.293')] [2025-01-04 15:29:41,758][134294] Updated weights for policy 0, policy_version 235504 (0.0027) [2025-01-04 15:29:43,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14813.8, 300 sec: 14662.3). Total num frames: 964653056. Throughput: 0: 3670.0. Samples: 230334100. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:43,968][134211] Avg episode reward: [(0, '9.940')] [2025-01-04 15:29:44,820][134294] Updated weights for policy 0, policy_version 235514 (0.0025) [2025-01-04 15:29:47,842][134294] Updated weights for policy 0, policy_version 235524 (0.0025) [2025-01-04 15:29:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 964718592. Throughput: 0: 3532.2. Samples: 230344208. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:48,968][134211] Avg episode reward: [(0, '10.519')] [2025-01-04 15:29:50,704][134294] Updated weights for policy 0, policy_version 235534 (0.0024) [2025-01-04 15:29:53,620][134294] Updated weights for policy 0, policy_version 235544 (0.0023) [2025-01-04 15:29:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14813.9, 300 sec: 14537.3). Total num frames: 964788224. Throughput: 0: 3361.2. Samples: 230365150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:53,968][134211] Avg episode reward: [(0, '10.354')] [2025-01-04 15:29:56,572][134294] Updated weights for policy 0, policy_version 235554 (0.0025) [2025-01-04 15:29:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14336.0, 300 sec: 14440.2). Total num frames: 964857856. Throughput: 0: 3400.8. Samples: 230385818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:29:58,968][134211] Avg episode reward: [(0, '9.472')] [2025-01-04 15:29:59,628][134294] Updated weights for policy 0, policy_version 235564 (0.0026) [2025-01-04 15:30:02,718][134294] Updated weights for policy 0, policy_version 235574 (0.0026) [2025-01-04 15:30:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13653.3, 300 sec: 14440.1). Total num frames: 964927488. Throughput: 0: 3415.6. Samples: 230395854. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:03,968][134211] Avg episode reward: [(0, '11.071')] [2025-01-04 15:30:05,697][134294] Updated weights for policy 0, policy_version 235584 (0.0028) [2025-01-04 15:30:08,591][134294] Updated weights for policy 0, policy_version 235594 (0.0023) [2025-01-04 15:30:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.9, 300 sec: 14440.1). Total num frames: 964993024. Throughput: 0: 3422.3. Samples: 230416400. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:08,968][134211] Avg episode reward: [(0, '9.301')] [2025-01-04 15:30:11,649][134294] Updated weights for policy 0, policy_version 235604 (0.0028) [2025-01-04 15:30:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13585.1, 300 sec: 14440.1). Total num frames: 965062656. Throughput: 0: 3419.7. Samples: 230436700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:13,968][134211] Avg episode reward: [(0, '9.430')] [2025-01-04 15:30:14,799][134294] Updated weights for policy 0, policy_version 235614 (0.0027) [2025-01-04 15:30:17,105][134294] Updated weights for policy 0, policy_version 235624 (0.0018) [2025-01-04 15:30:18,967][134211] Fps is (10 sec: 15974.8, 60 sec: 14063.0, 300 sec: 14509.6). Total num frames: 965152768. Throughput: 0: 3434.2. Samples: 230447174. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:18,968][134211] Avg episode reward: [(0, '9.144')] [2025-01-04 15:30:19,022][134294] Updated weights for policy 0, policy_version 235634 (0.0015) [2025-01-04 15:30:20,939][134294] Updated weights for policy 0, policy_version 235644 (0.0014) [2025-01-04 15:30:22,804][134294] Updated weights for policy 0, policy_version 235654 (0.0013) [2025-01-04 15:30:23,968][134211] Fps is (10 sec: 20070.9, 60 sec: 14745.6, 300 sec: 14648.4). Total num frames: 965263360. Throughput: 0: 3692.2. Samples: 230479632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:23,968][134211] Avg episode reward: [(0, '9.125')] [2025-01-04 15:30:24,726][134294] Updated weights for policy 0, policy_version 235664 (0.0015) [2025-01-04 15:30:26,826][134294] Updated weights for policy 0, policy_version 235674 (0.0021) [2025-01-04 15:30:28,968][134211] Fps is (10 sec: 19249.4, 60 sec: 14950.2, 300 sec: 14690.0). Total num frames: 965345280. Throughput: 0: 3848.8. Samples: 230507300. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:28,969][134211] Avg episode reward: [(0, '10.724')] [2025-01-04 15:30:30,297][134294] Updated weights for policy 0, policy_version 235684 (0.0028) [2025-01-04 15:30:33,621][134294] Updated weights for policy 0, policy_version 235694 (0.0030) [2025-01-04 15:30:33,968][134211] Fps is (10 sec: 14335.8, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 965406720. Throughput: 0: 3820.4. Samples: 230516126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:33,968][134211] Avg episode reward: [(0, '8.584')] [2025-01-04 15:30:36,698][134294] Updated weights for policy 0, policy_version 235704 (0.0025) [2025-01-04 15:30:38,968][134211] Fps is (10 sec: 12698.6, 60 sec: 14813.9, 300 sec: 14634.5). Total num frames: 965472256. Throughput: 0: 3787.7. Samples: 230535598. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:38,968][134211] Avg episode reward: [(0, '9.377')] [2025-01-04 15:30:39,817][134294] Updated weights for policy 0, policy_version 235714 (0.0024) [2025-01-04 15:30:42,915][134294] Updated weights for policy 0, policy_version 235724 (0.0029) [2025-01-04 15:30:43,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14745.6, 300 sec: 14620.6). Total num frames: 965537792. Throughput: 0: 3767.7. Samples: 230555364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:43,968][134211] Avg episode reward: [(0, '9.334')] [2025-01-04 15:30:46,011][134294] Updated weights for policy 0, policy_version 235734 (0.0026) [2025-01-04 15:30:48,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14745.6, 300 sec: 14606.8). Total num frames: 965603328. Throughput: 0: 3766.3. Samples: 230565336. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:48,968][134211] Avg episode reward: [(0, '9.904')] [2025-01-04 15:30:49,270][134294] Updated weights for policy 0, policy_version 235744 (0.0026) [2025-01-04 15:30:52,206][134294] Updated weights for policy 0, policy_version 235754 (0.0024) [2025-01-04 15:30:53,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14677.3, 300 sec: 14592.9). Total num frames: 965668864. Throughput: 0: 3753.7. Samples: 230585318. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:53,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 15:30:55,194][134294] Updated weights for policy 0, policy_version 235764 (0.0024) [2025-01-04 15:30:58,138][134294] Updated weights for policy 0, policy_version 235774 (0.0027) [2025-01-04 15:30:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14677.3, 300 sec: 14579.0). Total num frames: 965738496. Throughput: 0: 3757.4. Samples: 230605784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:30:58,968][134211] Avg episode reward: [(0, '10.160')] [2025-01-04 15:31:01,070][134294] Updated weights for policy 0, policy_version 235784 (0.0026) [2025-01-04 15:31:03,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.0, 300 sec: 14565.1). Total num frames: 965804032. Throughput: 0: 3755.2. Samples: 230616158. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:03,968][134211] Avg episode reward: [(0, '9.483')] [2025-01-04 15:31:04,349][134294] Updated weights for policy 0, policy_version 235794 (0.0026) [2025-01-04 15:31:07,374][134294] Updated weights for policy 0, policy_version 235804 (0.0023) [2025-01-04 15:31:08,969][134211] Fps is (10 sec: 13515.8, 60 sec: 14677.1, 300 sec: 14565.1). Total num frames: 965873664. Throughput: 0: 3473.7. Samples: 230635952. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:08,969][134211] Avg episode reward: [(0, '9.673')] [2025-01-04 15:31:10,354][134294] Updated weights for policy 0, policy_version 235814 (0.0026) [2025-01-04 15:31:13,328][134294] Updated weights for policy 0, policy_version 235824 (0.0026) [2025-01-04 15:31:13,968][134211] Fps is (10 sec: 13925.8, 60 sec: 14677.2, 300 sec: 14454.0). Total num frames: 965943296. Throughput: 0: 3315.0. Samples: 230656474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:13,969][134211] Avg episode reward: [(0, '9.641')] [2025-01-04 15:31:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000235826_965943296.pth... [2025-01-04 15:31:14,046][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000234974_962453504.pth [2025-01-04 15:31:16,317][134294] Updated weights for policy 0, policy_version 235834 (0.0024) [2025-01-04 15:31:18,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14267.7, 300 sec: 14315.2). Total num frames: 966008832. Throughput: 0: 3345.1. Samples: 230666656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:18,968][134211] Avg episode reward: [(0, '11.527')] [2025-01-04 15:31:19,445][134294] Updated weights for policy 0, policy_version 235844 (0.0025) [2025-01-04 15:31:22,520][134294] Updated weights for policy 0, policy_version 235854 (0.0027) [2025-01-04 15:31:23,968][134211] Fps is (10 sec: 13108.0, 60 sec: 13516.8, 300 sec: 14259.7). Total num frames: 966074368. Throughput: 0: 3358.1. Samples: 230686712. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:23,968][134211] Avg episode reward: [(0, '10.143')] [2025-01-04 15:31:25,504][134294] Updated weights for policy 0, policy_version 235864 (0.0026) [2025-01-04 15:31:28,513][134294] Updated weights for policy 0, policy_version 235874 (0.0025) [2025-01-04 15:31:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13312.2, 300 sec: 14273.5). Total num frames: 966144000. Throughput: 0: 3374.1. Samples: 230707196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:28,968][134211] Avg episode reward: [(0, '9.357')] [2025-01-04 15:31:31,038][134294] Updated weights for policy 0, policy_version 235884 (0.0018) [2025-01-04 15:31:33,180][134294] Updated weights for policy 0, policy_version 235894 (0.0018) [2025-01-04 15:31:33,968][134211] Fps is (10 sec: 15564.5, 60 sec: 13721.6, 300 sec: 14342.9). Total num frames: 966230016. Throughput: 0: 3416.6. Samples: 230719082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:33,968][134211] Avg episode reward: [(0, '8.927')] [2025-01-04 15:31:36,195][134294] Updated weights for policy 0, policy_version 235904 (0.0026) [2025-01-04 15:31:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 13721.6, 300 sec: 14343.0). Total num frames: 966295552. Throughput: 0: 3482.2. Samples: 230742016. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:38,968][134211] Avg episode reward: [(0, '10.065')] [2025-01-04 15:31:39,368][134294] Updated weights for policy 0, policy_version 235914 (0.0026) [2025-01-04 15:31:42,468][134294] Updated weights for policy 0, policy_version 235924 (0.0027) [2025-01-04 15:31:43,968][134211] Fps is (10 sec: 13107.4, 60 sec: 13721.6, 300 sec: 14342.9). Total num frames: 966361088. Throughput: 0: 3462.7. Samples: 230761606. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:43,968][134211] Avg episode reward: [(0, '9.176')] [2025-01-04 15:31:45,447][134294] Updated weights for policy 0, policy_version 235934 (0.0022) [2025-01-04 15:31:48,512][134294] Updated weights for policy 0, policy_version 235944 (0.0025) [2025-01-04 15:31:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13789.9, 300 sec: 14384.6). Total num frames: 966430720. Throughput: 0: 3463.6. Samples: 230772018. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:48,968][134211] Avg episode reward: [(0, '10.270')] [2025-01-04 15:31:50,820][134294] Updated weights for policy 0, policy_version 235954 (0.0016) [2025-01-04 15:31:52,678][134294] Updated weights for policy 0, policy_version 235964 (0.0013) [2025-01-04 15:31:53,967][134211] Fps is (10 sec: 17203.5, 60 sec: 14404.3, 300 sec: 14495.7). Total num frames: 966533120. Throughput: 0: 3587.5. Samples: 230797386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:53,968][134211] Avg episode reward: [(0, '10.189')] [2025-01-04 15:31:54,621][134294] Updated weights for policy 0, policy_version 235974 (0.0012) [2025-01-04 15:31:56,479][134294] Updated weights for policy 0, policy_version 235984 (0.0013) [2025-01-04 15:31:58,665][134294] Updated weights for policy 0, policy_version 235994 (0.0016) [2025-01-04 15:31:58,968][134211] Fps is (10 sec: 20070.2, 60 sec: 14882.1, 300 sec: 14592.9). Total num frames: 966631424. Throughput: 0: 3839.4. Samples: 230829244. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:31:58,968][134211] Avg episode reward: [(0, '10.712')] [2025-01-04 15:32:01,883][134294] Updated weights for policy 0, policy_version 236004 (0.0028) [2025-01-04 15:32:03,968][134211] Fps is (10 sec: 16383.2, 60 sec: 14882.1, 300 sec: 14592.9). Total num frames: 966696960. Throughput: 0: 3827.2. Samples: 230838880. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:32:03,969][134211] Avg episode reward: [(0, '9.802')] [2025-01-04 15:32:05,316][134294] Updated weights for policy 0, policy_version 236014 (0.0030) [2025-01-04 15:32:08,456][134294] Updated weights for policy 0, policy_version 236024 (0.0027) [2025-01-04 15:32:08,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.8, 300 sec: 14592.9). Total num frames: 966758400. Throughput: 0: 3799.2. Samples: 230857676. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:32:08,968][134211] Avg episode reward: [(0, '8.698')] [2025-01-04 15:32:11,534][134294] Updated weights for policy 0, policy_version 236034 (0.0028) [2025-01-04 15:32:13,968][134211] Fps is (10 sec: 12697.9, 60 sec: 14677.5, 300 sec: 14509.6). Total num frames: 966823936. Throughput: 0: 3781.3. Samples: 230877354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:32:13,968][134211] Avg episode reward: [(0, '11.165')] [2025-01-04 15:32:14,664][134294] Updated weights for policy 0, policy_version 236044 (0.0026) [2025-01-04 15:32:17,693][134294] Updated weights for policy 0, policy_version 236054 (0.0023) [2025-01-04 15:32:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14412.4). Total num frames: 966893568. Throughput: 0: 3737.7. Samples: 230887278. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:32:18,968][134211] Avg episode reward: [(0, '10.985')] [2025-01-04 15:32:20,660][134294] Updated weights for policy 0, policy_version 236064 (0.0025) [2025-01-04 15:32:23,611][134294] Updated weights for policy 0, policy_version 236074 (0.0024) [2025-01-04 15:32:23,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.8, 300 sec: 14426.2). Total num frames: 966963200. Throughput: 0: 3693.2. Samples: 230908210. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:23,968][134211] Avg episode reward: [(0, '9.961')] [2025-01-04 15:32:26,583][134294] Updated weights for policy 0, policy_version 236084 (0.0028) [2025-01-04 15:32:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14813.9, 300 sec: 14440.2). Total num frames: 967032832. Throughput: 0: 3710.4. Samples: 230928576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:28,968][134211] Avg episode reward: [(0, '8.553')] [2025-01-04 15:32:29,658][134294] Updated weights for policy 0, policy_version 236094 (0.0023) [2025-01-04 15:32:32,599][134294] Updated weights for policy 0, policy_version 236104 (0.0027) [2025-01-04 15:32:33,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 967098368. Throughput: 0: 3704.1. Samples: 230938702. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:33,968][134211] Avg episode reward: [(0, '10.092')] [2025-01-04 15:32:35,573][134294] Updated weights for policy 0, policy_version 236114 (0.0026) [2025-01-04 15:32:38,419][134294] Updated weights for policy 0, policy_version 236124 (0.0025) [2025-01-04 15:32:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 967168000. Throughput: 0: 3607.1. Samples: 230959706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:38,968][134211] Avg episode reward: [(0, '9.786')] [2025-01-04 15:32:41,476][134294] Updated weights for policy 0, policy_version 236134 (0.0024) [2025-01-04 15:32:43,970][134211] Fps is (10 sec: 13923.1, 60 sec: 14608.5, 300 sec: 14440.0). Total num frames: 967237632. Throughput: 0: 3353.1. Samples: 230980140. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:43,971][134211] Avg episode reward: [(0, '10.282')] [2025-01-04 15:32:44,614][134294] Updated weights for policy 0, policy_version 236144 (0.0026) [2025-01-04 15:32:47,462][134294] Updated weights for policy 0, policy_version 236154 (0.0023) [2025-01-04 15:32:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14440.1). Total num frames: 967303168. Throughput: 0: 3363.7. Samples: 230990246. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:48,968][134211] Avg episode reward: [(0, '9.089')] [2025-01-04 15:32:50,446][134294] Updated weights for policy 0, policy_version 236164 (0.0025) [2025-01-04 15:32:53,351][134294] Updated weights for policy 0, policy_version 236174 (0.0024) [2025-01-04 15:32:53,968][134211] Fps is (10 sec: 13519.9, 60 sec: 13994.6, 300 sec: 14440.2). Total num frames: 967372800. Throughput: 0: 3414.9. Samples: 231011348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:53,969][134211] Avg episode reward: [(0, '8.889')] [2025-01-04 15:32:56,214][134294] Updated weights for policy 0, policy_version 236184 (0.0022) [2025-01-04 15:32:58,243][134294] Updated weights for policy 0, policy_version 236194 (0.0015) [2025-01-04 15:32:58,968][134211] Fps is (10 sec: 15564.9, 60 sec: 13789.9, 300 sec: 14495.7). Total num frames: 967458816. Throughput: 0: 3517.0. Samples: 231035618. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:32:58,968][134211] Avg episode reward: [(0, '10.936')] [2025-01-04 15:33:01,251][134294] Updated weights for policy 0, policy_version 236204 (0.0025) [2025-01-04 15:33:03,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13789.9, 300 sec: 14495.7). Total num frames: 967524352. Throughput: 0: 3527.5. Samples: 231046018. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:03,968][134211] Avg episode reward: [(0, '9.882')] [2025-01-04 15:33:04,384][134294] Updated weights for policy 0, policy_version 236214 (0.0023) [2025-01-04 15:33:07,408][134294] Updated weights for policy 0, policy_version 236224 (0.0025) [2025-01-04 15:33:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.4, 300 sec: 14426.2). Total num frames: 967593984. Throughput: 0: 3503.7. Samples: 231065876. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:08,968][134211] Avg episode reward: [(0, '9.620')] [2025-01-04 15:33:10,292][134294] Updated weights for policy 0, policy_version 236234 (0.0025) [2025-01-04 15:33:13,312][134294] Updated weights for policy 0, policy_version 236244 (0.0025) [2025-01-04 15:33:13,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14063.0, 300 sec: 14454.0). Total num frames: 967667712. Throughput: 0: 3512.4. Samples: 231086632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:13,968][134211] Avg episode reward: [(0, '8.896')] [2025-01-04 15:33:13,973][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000236247_967667712.pth... [2025-01-04 15:33:14,024][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000235412_964247552.pth [2025-01-04 15:33:15,279][134294] Updated weights for policy 0, policy_version 236254 (0.0012) [2025-01-04 15:33:17,154][134294] Updated weights for policy 0, policy_version 236264 (0.0013) [2025-01-04 15:33:18,967][134211] Fps is (10 sec: 18022.9, 60 sec: 14677.4, 300 sec: 14579.0). Total num frames: 967774208. Throughput: 0: 3635.2. Samples: 231102284. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:18,968][134211] Avg episode reward: [(0, '10.679')] [2025-01-04 15:33:19,084][134294] Updated weights for policy 0, policy_version 236274 (0.0014) [2025-01-04 15:33:20,953][134294] Updated weights for policy 0, policy_version 236284 (0.0011) [2025-01-04 15:33:23,907][134294] Updated weights for policy 0, policy_version 236294 (0.0028) [2025-01-04 15:33:23,968][134211] Fps is (10 sec: 19250.8, 60 sec: 14950.4, 300 sec: 14579.0). Total num frames: 967860224. Throughput: 0: 3851.2. Samples: 231133012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:23,968][134211] Avg episode reward: [(0, '9.292')] [2025-01-04 15:33:27,552][134294] Updated weights for policy 0, policy_version 236304 (0.0029) [2025-01-04 15:33:28,968][134211] Fps is (10 sec: 14335.5, 60 sec: 14745.6, 300 sec: 14551.2). Total num frames: 967917568. Throughput: 0: 3785.6. Samples: 231150482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:33:28,968][134211] Avg episode reward: [(0, '9.407')] [2025-01-04 15:33:30,712][134294] Updated weights for policy 0, policy_version 236314 (0.0030) [2025-01-04 15:33:33,745][134294] Updated weights for policy 0, policy_version 236324 (0.0025) [2025-01-04 15:33:33,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14813.8, 300 sec: 14551.2). Total num frames: 967987200. Throughput: 0: 3781.7. Samples: 231160422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:33,969][134211] Avg episode reward: [(0, '10.250')] [2025-01-04 15:33:36,686][134294] Updated weights for policy 0, policy_version 236334 (0.0025) [2025-01-04 15:33:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 968052736. Throughput: 0: 3765.2. Samples: 231180780. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:38,968][134211] Avg episode reward: [(0, '9.275')] [2025-01-04 15:33:39,829][134294] Updated weights for policy 0, policy_version 236344 (0.0028) [2025-01-04 15:33:42,955][134294] Updated weights for policy 0, policy_version 236354 (0.0024) [2025-01-04 15:33:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14677.9, 300 sec: 14523.4). Total num frames: 968118272. Throughput: 0: 3662.4. Samples: 231200426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:43,968][134211] Avg episode reward: [(0, '9.714')] [2025-01-04 15:33:45,898][134294] Updated weights for policy 0, policy_version 236364 (0.0026) [2025-01-04 15:33:48,838][134294] Updated weights for policy 0, policy_version 236374 (0.0025) [2025-01-04 15:33:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 968187904. Throughput: 0: 3662.9. Samples: 231210850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:48,968][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 15:33:51,826][134294] Updated weights for policy 0, policy_version 236384 (0.0024) [2025-01-04 15:33:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14745.6, 300 sec: 14440.1). Total num frames: 968257536. Throughput: 0: 3683.0. Samples: 231231612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:53,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 15:33:54,875][134294] Updated weights for policy 0, policy_version 236394 (0.0025) [2025-01-04 15:33:57,824][134294] Updated weights for policy 0, policy_version 236404 (0.0027) [2025-01-04 15:33:58,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14287.4). Total num frames: 968323072. Throughput: 0: 3677.4. Samples: 231252114. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:33:58,968][134211] Avg episode reward: [(0, '9.352')] [2025-01-04 15:34:00,822][134294] Updated weights for policy 0, policy_version 236414 (0.0026) [2025-01-04 15:34:03,778][134294] Updated weights for policy 0, policy_version 236424 (0.0029) [2025-01-04 15:34:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 14273.5). Total num frames: 968392704. Throughput: 0: 3555.7. Samples: 231262292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:03,968][134211] Avg episode reward: [(0, '9.692')] [2025-01-04 15:34:06,716][134294] Updated weights for policy 0, policy_version 236434 (0.0027) [2025-01-04 15:34:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 968462336. Throughput: 0: 3337.0. Samples: 231283176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:08,968][134211] Avg episode reward: [(0, '9.960')] [2025-01-04 15:34:09,832][134294] Updated weights for policy 0, policy_version 236444 (0.0025) [2025-01-04 15:34:12,830][134294] Updated weights for policy 0, policy_version 236454 (0.0025) [2025-01-04 15:34:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14301.3). Total num frames: 968527872. Throughput: 0: 3392.5. Samples: 231303142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:13,968][134211] Avg episode reward: [(0, '9.543')] [2025-01-04 15:34:15,747][134294] Updated weights for policy 0, policy_version 236464 (0.0027) [2025-01-04 15:34:18,762][134294] Updated weights for policy 0, policy_version 236474 (0.0022) [2025-01-04 15:34:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13721.6, 300 sec: 14301.3). Total num frames: 968597504. Throughput: 0: 3404.2. Samples: 231313610. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:18,968][134211] Avg episode reward: [(0, '9.939')] [2025-01-04 15:34:21,646][134294] Updated weights for policy 0, policy_version 236484 (0.0024) [2025-01-04 15:34:23,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13448.5, 300 sec: 14301.3). Total num frames: 968667136. Throughput: 0: 3416.5. Samples: 231334524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:23,968][134211] Avg episode reward: [(0, '9.988')] [2025-01-04 15:34:24,820][134294] Updated weights for policy 0, policy_version 236494 (0.0026) [2025-01-04 15:34:27,692][134294] Updated weights for policy 0, policy_version 236504 (0.0024) [2025-01-04 15:34:28,968][134211] Fps is (10 sec: 14745.6, 60 sec: 13789.9, 300 sec: 14342.9). Total num frames: 968744960. Throughput: 0: 3446.8. Samples: 231355532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:28,968][134211] Avg episode reward: [(0, '10.904')] [2025-01-04 15:34:29,698][134294] Updated weights for policy 0, policy_version 236514 (0.0013) [2025-01-04 15:34:32,474][134294] Updated weights for policy 0, policy_version 236524 (0.0020) [2025-01-04 15:34:33,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13858.1, 300 sec: 14356.8). Total num frames: 968818688. Throughput: 0: 3525.1. Samples: 231369478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:33,969][134211] Avg episode reward: [(0, '9.109')] [2025-01-04 15:34:35,390][134294] Updated weights for policy 0, policy_version 236534 (0.0026) [2025-01-04 15:34:38,354][134294] Updated weights for policy 0, policy_version 236544 (0.0026) [2025-01-04 15:34:38,969][134211] Fps is (10 sec: 14334.4, 60 sec: 13926.2, 300 sec: 14356.8). Total num frames: 968888320. Throughput: 0: 3523.1. Samples: 231390154. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-04 15:34:38,969][134211] Avg episode reward: [(0, '10.256')] [2025-01-04 15:34:41,307][134294] Updated weights for policy 0, policy_version 236554 (0.0022) [2025-01-04 15:34:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 968957952. Throughput: 0: 3518.7. Samples: 231410458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:34:43,969][134211] Avg episode reward: [(0, '9.914')] [2025-01-04 15:34:44,524][134294] Updated weights for policy 0, policy_version 236564 (0.0027) [2025-01-04 15:34:47,467][134294] Updated weights for policy 0, policy_version 236574 (0.0027) [2025-01-04 15:34:48,968][134211] Fps is (10 sec: 13927.7, 60 sec: 13994.6, 300 sec: 14370.7). Total num frames: 969027584. Throughput: 0: 3517.7. Samples: 231420588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:34:48,968][134211] Avg episode reward: [(0, '8.323')] [2025-01-04 15:34:50,519][134294] Updated weights for policy 0, policy_version 236584 (0.0025) [2025-01-04 15:34:53,047][134294] Updated weights for policy 0, policy_version 236594 (0.0020) [2025-01-04 15:34:53,967][134211] Fps is (10 sec: 14746.1, 60 sec: 14131.2, 300 sec: 14398.5). Total num frames: 969105408. Throughput: 0: 3512.6. Samples: 231441244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:34:53,968][134211] Avg episode reward: [(0, '10.252')] [2025-01-04 15:34:55,214][134294] Updated weights for policy 0, policy_version 236604 (0.0016) [2025-01-04 15:34:58,071][134294] Updated weights for policy 0, policy_version 236614 (0.0025) [2025-01-04 15:34:58,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14267.7, 300 sec: 14412.4). Total num frames: 969179136. Throughput: 0: 3622.7. Samples: 231466162. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:34:58,968][134211] Avg episode reward: [(0, '10.190')] [2025-01-04 15:35:01,062][134294] Updated weights for policy 0, policy_version 236624 (0.0025) [2025-01-04 15:35:03,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14267.7, 300 sec: 14426.3). Total num frames: 969248768. Throughput: 0: 3618.9. Samples: 231476462. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:03,968][134211] Avg episode reward: [(0, '10.177')] [2025-01-04 15:35:04,219][134294] Updated weights for policy 0, policy_version 236634 (0.0026) [2025-01-04 15:35:07,259][134294] Updated weights for policy 0, policy_version 236644 (0.0024) [2025-01-04 15:35:08,968][134211] Fps is (10 sec: 13516.4, 60 sec: 14199.4, 300 sec: 14412.4). Total num frames: 969314304. Throughput: 0: 3600.0. Samples: 231496524. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:08,969][134211] Avg episode reward: [(0, '9.382')] [2025-01-04 15:35:10,265][134294] Updated weights for policy 0, policy_version 236654 (0.0027) [2025-01-04 15:35:13,160][134294] Updated weights for policy 0, policy_version 236664 (0.0023) [2025-01-04 15:35:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.7, 300 sec: 14342.9). Total num frames: 969383936. Throughput: 0: 3592.1. Samples: 231517178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:13,968][134211] Avg episode reward: [(0, '9.768')] [2025-01-04 15:35:14,055][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000236667_969388032.pth... 
[2025-01-04 15:35:14,127][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000235826_965943296.pth [2025-01-04 15:35:16,224][134294] Updated weights for policy 0, policy_version 236674 (0.0022) [2025-01-04 15:35:18,275][134294] Updated weights for policy 0, policy_version 236684 (0.0016) [2025-01-04 15:35:18,967][134211] Fps is (10 sec: 15565.6, 60 sec: 14540.8, 300 sec: 14259.6). Total num frames: 969469952. Throughput: 0: 3507.6. Samples: 231527320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:18,968][134211] Avg episode reward: [(0, '10.633')] [2025-01-04 15:35:20,249][134294] Updated weights for policy 0, policy_version 236694 (0.0012) [2025-01-04 15:35:22,097][134294] Updated weights for policy 0, policy_version 236704 (0.0012) [2025-01-04 15:35:23,968][134211] Fps is (10 sec: 19251.5, 60 sec: 15155.3, 300 sec: 14343.0). Total num frames: 969576448. Throughput: 0: 3740.7. Samples: 231558480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:23,968][134211] Avg episode reward: [(0, '10.310')] [2025-01-04 15:35:24,018][134294] Updated weights for policy 0, policy_version 236714 (0.0013) [2025-01-04 15:35:25,880][134294] Updated weights for policy 0, policy_version 236724 (0.0015) [2025-01-04 15:35:28,445][134294] Updated weights for policy 0, policy_version 236734 (0.0024) [2025-01-04 15:35:28,968][134211] Fps is (10 sec: 19660.2, 60 sec: 15360.0, 300 sec: 14440.1). Total num frames: 969666560. Throughput: 0: 3946.1. Samples: 231588030. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:28,968][134211] Avg episode reward: [(0, '9.439')] [2025-01-04 15:35:31,715][134294] Updated weights for policy 0, policy_version 236744 (0.0029) [2025-01-04 15:35:33,968][134211] Fps is (10 sec: 15154.7, 60 sec: 15155.2, 300 sec: 14426.2). Total num frames: 969728000. Throughput: 0: 3926.5. Samples: 231597282. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:33,969][134211] Avg episode reward: [(0, '9.415')] [2025-01-04 15:35:35,051][134294] Updated weights for policy 0, policy_version 236754 (0.0028) [2025-01-04 15:35:38,080][134294] Updated weights for policy 0, policy_version 236764 (0.0029) [2025-01-04 15:35:38,969][134211] Fps is (10 sec: 12696.3, 60 sec: 15086.9, 300 sec: 14426.2). Total num frames: 969793536. Throughput: 0: 3892.3. Samples: 231616404. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:38,969][134211] Avg episode reward: [(0, '9.717')] [2025-01-04 15:35:41,155][134294] Updated weights for policy 0, policy_version 236774 (0.0031) [2025-01-04 15:35:43,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.7, 300 sec: 14426.2). Total num frames: 969859072. Throughput: 0: 3784.2. Samples: 231636452. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:43,968][134211] Avg episode reward: [(0, '9.740')] [2025-01-04 15:35:44,382][134294] Updated weights for policy 0, policy_version 236784 (0.0027) [2025-01-04 15:35:47,226][134294] Updated weights for policy 0, policy_version 236794 (0.0026) [2025-01-04 15:35:48,968][134211] Fps is (10 sec: 13518.2, 60 sec: 15018.7, 300 sec: 14440.1). Total num frames: 969928704. Throughput: 0: 3775.5. Samples: 231646358. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:48,968][134211] Avg episode reward: [(0, '8.787')] [2025-01-04 15:35:50,308][134294] Updated weights for policy 0, policy_version 236804 (0.0025) [2025-01-04 15:35:53,263][134294] Updated weights for policy 0, policy_version 236814 (0.0028) [2025-01-04 15:35:53,970][134211] Fps is (10 sec: 13514.0, 60 sec: 14813.3, 300 sec: 14426.1). Total num frames: 969994240. Throughput: 0: 3789.3. Samples: 231667048. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:53,970][134211] Avg episode reward: [(0, '9.710')] [2025-01-04 15:35:56,288][134294] Updated weights for policy 0, policy_version 236824 (0.0027) [2025-01-04 15:35:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14440.1). Total num frames: 970063872. Throughput: 0: 3778.0. Samples: 231687188. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:35:58,968][134211] Avg episode reward: [(0, '9.874')] [2025-01-04 15:35:59,396][134294] Updated weights for policy 0, policy_version 236834 (0.0027) [2025-01-04 15:36:02,355][134294] Updated weights for policy 0, policy_version 236844 (0.0027) [2025-01-04 15:36:03,968][134211] Fps is (10 sec: 13929.2, 60 sec: 14745.6, 300 sec: 14440.2). Total num frames: 970133504. Throughput: 0: 3779.0. Samples: 231697378. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:03,969][134211] Avg episode reward: [(0, '9.918')] [2025-01-04 15:36:05,332][134294] Updated weights for policy 0, policy_version 236854 (0.0025) [2025-01-04 15:36:08,281][134294] Updated weights for policy 0, policy_version 236864 (0.0024) [2025-01-04 15:36:08,969][134211] Fps is (10 sec: 13924.8, 60 sec: 14813.7, 300 sec: 14440.1). Total num frames: 970203136. Throughput: 0: 3548.1. Samples: 231718148. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:08,969][134211] Avg episode reward: [(0, '8.585')] [2025-01-04 15:36:11,234][134294] Updated weights for policy 0, policy_version 236874 (0.0023) [2025-01-04 15:36:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14813.9, 300 sec: 14454.0). Total num frames: 970272768. Throughput: 0: 3344.1. Samples: 231738516. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:13,968][134211] Avg episode reward: [(0, '9.648')] [2025-01-04 15:36:14,342][134294] Updated weights for policy 0, policy_version 236884 (0.0025) [2025-01-04 15:36:17,275][134294] Updated weights for policy 0, policy_version 236894 (0.0025) [2025-01-04 15:36:18,968][134211] Fps is (10 sec: 13518.4, 60 sec: 14472.5, 300 sec: 14454.0). Total num frames: 970338304. Throughput: 0: 3364.4. Samples: 231748678. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:18,968][134211] Avg episode reward: [(0, '10.055')] [2025-01-04 15:36:20,333][134294] Updated weights for policy 0, policy_version 236904 (0.0023) [2025-01-04 15:36:23,216][134294] Updated weights for policy 0, policy_version 236914 (0.0025) [2025-01-04 15:36:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14454.0). Total num frames: 970407936. Throughput: 0: 3403.2. Samples: 231769546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:23,968][134211] Avg episode reward: [(0, '10.921')] [2025-01-04 15:36:26,189][134294] Updated weights for policy 0, policy_version 236924 (0.0023) [2025-01-04 15:36:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13516.8, 300 sec: 14398.5). Total num frames: 970477568. Throughput: 0: 3412.6. Samples: 231790018. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:28,968][134211] Avg episode reward: [(0, '9.523')] [2025-01-04 15:36:29,271][134294] Updated weights for policy 0, policy_version 236934 (0.0028) [2025-01-04 15:36:32,288][134294] Updated weights for policy 0, policy_version 236944 (0.0027) [2025-01-04 15:36:33,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13585.1, 300 sec: 14398.5). Total num frames: 970543104. Throughput: 0: 3415.3. Samples: 231800044. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:33,968][134211] Avg episode reward: [(0, '10.928')] [2025-01-04 15:36:35,257][134294] Updated weights for policy 0, policy_version 236954 (0.0027) [2025-01-04 15:36:38,178][134294] Updated weights for policy 0, policy_version 236964 (0.0024) [2025-01-04 15:36:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.6, 300 sec: 14412.4). Total num frames: 970612736. Throughput: 0: 3423.0. Samples: 231821076. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:38,968][134211] Avg episode reward: [(0, '10.427')] [2025-01-04 15:36:41,066][134294] Updated weights for policy 0, policy_version 236974 (0.0024) [2025-01-04 15:36:43,969][134211] Fps is (10 sec: 13924.7, 60 sec: 13721.4, 300 sec: 14412.3). Total num frames: 970682368. Throughput: 0: 3427.2. Samples: 231841418. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:43,970][134211] Avg episode reward: [(0, '9.524')] [2025-01-04 15:36:44,188][134294] Updated weights for policy 0, policy_version 236984 (0.0024) [2025-01-04 15:36:47,143][134294] Updated weights for policy 0, policy_version 236994 (0.0026) [2025-01-04 15:36:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13653.3, 300 sec: 14287.4). Total num frames: 970747904. Throughput: 0: 3426.5. Samples: 231851572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:48,968][134211] Avg episode reward: [(0, '9.780')] [2025-01-04 15:36:50,194][134294] Updated weights for policy 0, policy_version 237004 (0.0026) [2025-01-04 15:36:53,106][134294] Updated weights for policy 0, policy_version 237014 (0.0023) [2025-01-04 15:36:53,968][134211] Fps is (10 sec: 13518.4, 60 sec: 13722.1, 300 sec: 14190.2). Total num frames: 970817536. Throughput: 0: 3432.7. Samples: 231872614. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:53,968][134211] Avg episode reward: [(0, '10.695')] [2025-01-04 15:36:55,619][134294] Updated weights for policy 0, policy_version 237024 (0.0021) [2025-01-04 15:36:57,497][134294] Updated weights for policy 0, policy_version 237034 (0.0014) [2025-01-04 15:36:58,967][134211] Fps is (10 sec: 17203.6, 60 sec: 14267.8, 300 sec: 14315.2). Total num frames: 970919936. Throughput: 0: 3585.6. Samples: 231899868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:36:58,968][134211] Avg episode reward: [(0, '8.752')] [2025-01-04 15:36:59,494][134294] Updated weights for policy 0, policy_version 237044 (0.0015) [2025-01-04 15:37:02,308][134294] Updated weights for policy 0, policy_version 237054 (0.0027) [2025-01-04 15:37:03,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14267.8, 300 sec: 14342.9). Total num frames: 970989568. Throughput: 0: 3644.7. Samples: 231912690. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:03,968][134211] Avg episode reward: [(0, '9.344')] [2025-01-04 15:37:05,488][134294] Updated weights for policy 0, policy_version 237064 (0.0027) [2025-01-04 15:37:08,557][134294] Updated weights for policy 0, policy_version 237074 (0.0028) [2025-01-04 15:37:08,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14268.0, 300 sec: 14356.8). Total num frames: 971059200. Throughput: 0: 3622.0. Samples: 231932536. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:08,969][134211] Avg episode reward: [(0, '8.874')] [2025-01-04 15:37:11,859][134294] Updated weights for policy 0, policy_version 237084 (0.0029) [2025-01-04 15:37:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.4, 300 sec: 14342.9). Total num frames: 971124736. Throughput: 0: 3594.9. Samples: 231951788. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:13,968][134211] Avg episode reward: [(0, '10.209')] [2025-01-04 15:37:13,978][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237091_971124736.pth... [2025-01-04 15:37:14,059][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000236247_967667712.pth [2025-01-04 15:37:14,930][134294] Updated weights for policy 0, policy_version 237094 (0.0024) [2025-01-04 15:37:17,919][134294] Updated weights for policy 0, policy_version 237104 (0.0025) [2025-01-04 15:37:18,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 971190272. Throughput: 0: 3588.5. Samples: 231961528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:18,968][134211] Avg episode reward: [(0, '10.389')] [2025-01-04 15:37:20,892][134294] Updated weights for policy 0, policy_version 237114 (0.0025) [2025-01-04 15:37:23,896][134294] Updated weights for policy 0, policy_version 237124 (0.0027) [2025-01-04 15:37:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 14329.1). Total num frames: 971259904. Throughput: 0: 3588.1. Samples: 231982542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:23,968][134211] Avg episode reward: [(0, '9.594')] [2025-01-04 15:37:26,859][134294] Updated weights for policy 0, policy_version 237134 (0.0024) [2025-01-04 15:37:28,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14199.5, 300 sec: 14342.9). Total num frames: 971329536. Throughput: 0: 3589.1. Samples: 232002924. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:28,968][134211] Avg episode reward: [(0, '9.805')] [2025-01-04 15:37:29,839][134294] Updated weights for policy 0, policy_version 237144 (0.0029) [2025-01-04 15:37:32,969][134294] Updated weights for policy 0, policy_version 237154 (0.0025) [2025-01-04 15:37:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 971395072. Throughput: 0: 3585.6. Samples: 232012924. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:33,968][134211] Avg episode reward: [(0, '10.318')] [2025-01-04 15:37:35,792][134294] Updated weights for policy 0, policy_version 237164 (0.0025) [2025-01-04 15:37:38,713][134294] Updated weights for policy 0, policy_version 237174 (0.0022) [2025-01-04 15:37:38,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 14329.2). Total num frames: 971464704. Throughput: 0: 3585.8. Samples: 232033974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:38,968][134211] Avg episode reward: [(0, '8.922')] [2025-01-04 15:37:41,715][134294] Updated weights for policy 0, policy_version 237184 (0.0026) [2025-01-04 15:37:43,967][134211] Fps is (10 sec: 14746.0, 60 sec: 14336.3, 300 sec: 14370.7). Total num frames: 971542528. Throughput: 0: 3451.5. Samples: 232055184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:43,968][134211] Avg episode reward: [(0, '9.842')] [2025-01-04 15:37:44,038][134294] Updated weights for policy 0, policy_version 237194 (0.0018) [2025-01-04 15:37:46,442][134294] Updated weights for policy 0, policy_version 237204 (0.0020) [2025-01-04 15:37:48,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14540.8, 300 sec: 14398.5). Total num frames: 971620352. Throughput: 0: 3474.8. Samples: 232069056. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:48,968][134211] Avg episode reward: [(0, '9.839')] [2025-01-04 15:37:49,374][134294] Updated weights for policy 0, policy_version 237214 (0.0023) [2025-01-04 15:37:52,453][134294] Updated weights for policy 0, policy_version 237224 (0.0025) [2025-01-04 15:37:53,968][134211] Fps is (10 sec: 14745.2, 60 sec: 14540.8, 300 sec: 14342.9). Total num frames: 971689984. Throughput: 0: 3494.0. Samples: 232089768. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:53,968][134211] Avg episode reward: [(0, '9.468')] [2025-01-04 15:37:55,431][134294] Updated weights for policy 0, policy_version 237234 (0.0026) [2025-01-04 15:37:58,362][134294] Updated weights for policy 0, policy_version 237244 (0.0025) [2025-01-04 15:37:58,971][134211] Fps is (10 sec: 13922.1, 60 sec: 13993.9, 300 sec: 14356.7). Total num frames: 971759616. Throughput: 0: 3525.1. Samples: 232110430. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:37:58,971][134211] Avg episode reward: [(0, '9.306')] [2025-01-04 15:38:01,314][134294] Updated weights for policy 0, policy_version 237254 (0.0029) [2025-01-04 15:38:03,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13926.4, 300 sec: 14342.9). Total num frames: 971825152. Throughput: 0: 3537.9. Samples: 232120732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:38:03,968][134211] Avg episode reward: [(0, '9.067')] [2025-01-04 15:38:04,499][134294] Updated weights for policy 0, policy_version 237264 (0.0025) [2025-01-04 15:38:07,495][134294] Updated weights for policy 0, policy_version 237274 (0.0026) [2025-01-04 15:38:08,968][134211] Fps is (10 sec: 13111.2, 60 sec: 13858.2, 300 sec: 14315.2). Total num frames: 971890688. Throughput: 0: 3517.1. Samples: 232140812. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:38:08,968][134211] Avg episode reward: [(0, '9.727')] [2025-01-04 15:38:10,459][134294] Updated weights for policy 0, policy_version 237284 (0.0026) [2025-01-04 15:38:13,385][134294] Updated weights for policy 0, policy_version 237294 (0.0028) [2025-01-04 15:38:13,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13926.4, 300 sec: 14190.2). Total num frames: 971960320. Throughput: 0: 3527.0. Samples: 232161640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:13,968][134211] Avg episode reward: [(0, '11.053')] [2025-01-04 15:38:15,671][134294] Updated weights for policy 0, policy_version 237304 (0.0014) [2025-01-04 15:38:17,951][134294] Updated weights for policy 0, policy_version 237314 (0.0018) [2025-01-04 15:38:18,968][134211] Fps is (10 sec: 15973.4, 60 sec: 14335.9, 300 sec: 14204.1). Total num frames: 972050432. Throughput: 0: 3606.0. Samples: 232175194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:18,969][134211] Avg episode reward: [(0, '9.423')] [2025-01-04 15:38:21,009][134294] Updated weights for policy 0, policy_version 237324 (0.0025) [2025-01-04 15:38:23,968][134211] Fps is (10 sec: 15565.0, 60 sec: 14267.8, 300 sec: 14231.9). Total num frames: 972115968. Throughput: 0: 3615.2. Samples: 232196656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:23,968][134211] Avg episode reward: [(0, '10.843')] [2025-01-04 15:38:24,180][134294] Updated weights for policy 0, policy_version 237334 (0.0026) [2025-01-04 15:38:27,264][134294] Updated weights for policy 0, policy_version 237344 (0.0026) [2025-01-04 15:38:28,968][134211] Fps is (10 sec: 12698.3, 60 sec: 14131.2, 300 sec: 14204.1). Total num frames: 972177408. Throughput: 0: 3573.7. Samples: 232216000. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:28,969][134211] Avg episode reward: [(0, '9.962')] [2025-01-04 15:38:30,851][134294] Updated weights for policy 0, policy_version 237354 (0.0030) [2025-01-04 15:38:32,986][134294] Updated weights for policy 0, policy_version 237364 (0.0016) [2025-01-04 15:38:33,968][134211] Fps is (10 sec: 14336.1, 60 sec: 14404.3, 300 sec: 14259.6). Total num frames: 972259328. Throughput: 0: 3471.3. Samples: 232225262. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:33,968][134211] Avg episode reward: [(0, '9.602')] [2025-01-04 15:38:35,079][134294] Updated weights for policy 0, policy_version 237374 (0.0013) [2025-01-04 15:38:37,037][134294] Updated weights for policy 0, policy_version 237384 (0.0014) [2025-01-04 15:38:38,904][134294] Updated weights for policy 0, policy_version 237394 (0.0013) [2025-01-04 15:38:38,968][134211] Fps is (10 sec: 18842.0, 60 sec: 15018.7, 300 sec: 14398.5). Total num frames: 972365824. Throughput: 0: 3679.4. Samples: 232255340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:38,968][134211] Avg episode reward: [(0, '10.332')] [2025-01-04 15:38:40,802][134294] Updated weights for policy 0, policy_version 237404 (0.0012) [2025-01-04 15:38:42,678][134294] Updated weights for policy 0, policy_version 237414 (0.0013) [2025-01-04 15:38:43,968][134211] Fps is (10 sec: 21298.9, 60 sec: 15496.5, 300 sec: 14523.4). Total num frames: 972472320. Throughput: 0: 3948.3. Samples: 232288090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:43,968][134211] Avg episode reward: [(0, '10.442')] [2025-01-04 15:38:44,947][134294] Updated weights for policy 0, policy_version 237424 (0.0019) [2025-01-04 15:38:47,987][134294] Updated weights for policy 0, policy_version 237434 (0.0027) [2025-01-04 15:38:48,968][134211] Fps is (10 sec: 17202.8, 60 sec: 15291.7, 300 sec: 14509.6). Total num frames: 972537856. Throughput: 0: 3974.8. Samples: 232299596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:48,968][134211] Avg episode reward: [(0, '10.073')] [2025-01-04 15:38:51,137][134294] Updated weights for policy 0, policy_version 237444 (0.0024) [2025-01-04 15:38:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.7, 300 sec: 14523.4). Total num frames: 972607488. Throughput: 0: 3965.7. Samples: 232319268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:53,968][134211] Avg episode reward: [(0, '9.323')] [2025-01-04 15:38:54,247][134294] Updated weights for policy 0, policy_version 237454 (0.0026) [2025-01-04 15:38:57,221][134294] Updated weights for policy 0, policy_version 237464 (0.0030) [2025-01-04 15:38:58,968][134211] Fps is (10 sec: 13516.9, 60 sec: 15224.2, 300 sec: 14509.6). Total num frames: 972673024. Throughput: 0: 3955.3. Samples: 232339630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:38:58,968][134211] Avg episode reward: [(0, '9.615')] [2025-01-04 15:39:00,082][134294] Updated weights for policy 0, policy_version 237474 (0.0025) [2025-01-04 15:39:03,099][134294] Updated weights for policy 0, policy_version 237484 (0.0026) [2025-01-04 15:39:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15291.7, 300 sec: 14509.6). Total num frames: 972742656. Throughput: 0: 3891.2. Samples: 232350294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:39:03,968][134211] Avg episode reward: [(0, '10.212')] [2025-01-04 15:39:06,002][134294] Updated weights for policy 0, policy_version 237494 (0.0026) [2025-01-04 15:39:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15360.0, 300 sec: 14523.4). Total num frames: 972812288. Throughput: 0: 3876.6. Samples: 232371104. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:39:08,968][134211] Avg episode reward: [(0, '9.974')] [2025-01-04 15:39:08,991][134294] Updated weights for policy 0, policy_version 237504 (0.0025) [2025-01-04 15:39:11,876][134294] Updated weights for policy 0, policy_version 237514 (0.0025) [2025-01-04 15:39:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15360.0, 300 sec: 14523.4). Total num frames: 972881920. Throughput: 0: 3905.6. Samples: 232391752. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:39:13,968][134211] Avg episode reward: [(0, '9.740')] [2025-01-04 15:39:14,024][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237521_972886016.pth... [2025-01-04 15:39:14,091][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000236667_969388032.pth [2025-01-04 15:39:14,944][134294] Updated weights for policy 0, policy_version 237524 (0.0023) [2025-01-04 15:39:18,110][134294] Updated weights for policy 0, policy_version 237534 (0.0022) [2025-01-04 15:39:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14950.5, 300 sec: 14509.6). Total num frames: 972947456. Throughput: 0: 3920.2. Samples: 232401674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:39:18,969][134211] Avg episode reward: [(0, '9.250')] [2025-01-04 15:39:21,130][134294] Updated weights for policy 0, policy_version 237544 (0.0028) [2025-01-04 15:39:23,970][134211] Fps is (10 sec: 12695.6, 60 sec: 14881.7, 300 sec: 14453.9). Total num frames: 973008896. Throughput: 0: 3690.1. Samples: 232421400. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:23,970][134211] Avg episode reward: [(0, '9.634')] [2025-01-04 15:39:24,912][134294] Updated weights for policy 0, policy_version 237554 (0.0028) [2025-01-04 15:39:28,236][134294] Updated weights for policy 0, policy_version 237564 (0.0030) [2025-01-04 15:39:28,968][134211] Fps is (10 sec: 12288.2, 60 sec: 14882.1, 300 sec: 14412.4). Total num frames: 973070336. Throughput: 0: 3346.0. Samples: 232438662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:28,968][134211] Avg episode reward: [(0, '9.771')] [2025-01-04 15:39:31,277][134294] Updated weights for policy 0, policy_version 237574 (0.0026) [2025-01-04 15:39:33,968][134211] Fps is (10 sec: 12699.4, 60 sec: 14609.0, 300 sec: 14398.5). Total num frames: 973135872. Throughput: 0: 3311.0. Samples: 232448590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:33,969][134211] Avg episode reward: [(0, '9.538')] [2025-01-04 15:39:34,409][134294] Updated weights for policy 0, policy_version 237584 (0.0026) [2025-01-04 15:39:37,320][134294] Updated weights for policy 0, policy_version 237594 (0.0022) [2025-01-04 15:39:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13994.6, 300 sec: 14398.5). Total num frames: 973205504. Throughput: 0: 3324.3. Samples: 232468862. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:38,968][134211] Avg episode reward: [(0, '8.292')] [2025-01-04 15:39:40,380][134294] Updated weights for policy 0, policy_version 237604 (0.0025) [2025-01-04 15:39:43,283][134294] Updated weights for policy 0, policy_version 237614 (0.0023) [2025-01-04 15:39:43,968][134211] Fps is (10 sec: 13927.0, 60 sec: 13380.3, 300 sec: 14398.5). Total num frames: 973275136. Throughput: 0: 3334.2. Samples: 232489668. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:43,968][134211] Avg episode reward: [(0, '9.979')] [2025-01-04 15:39:46,166][134294] Updated weights for policy 0, policy_version 237624 (0.0027) [2025-01-04 15:39:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 14356.8). Total num frames: 973340672. Throughput: 0: 3327.3. Samples: 232500022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:48,969][134211] Avg episode reward: [(0, '9.913')] [2025-01-04 15:39:49,324][134294] Updated weights for policy 0, policy_version 237634 (0.0026) [2025-01-04 15:39:52,245][134294] Updated weights for policy 0, policy_version 237644 (0.0027) [2025-01-04 15:39:53,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13380.3, 300 sec: 14342.9). Total num frames: 973410304. Throughput: 0: 3320.6. Samples: 232520532. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:53,968][134211] Avg episode reward: [(0, '9.438')] [2025-01-04 15:39:55,244][134294] Updated weights for policy 0, policy_version 237654 (0.0027) [2025-01-04 15:39:58,175][134294] Updated weights for policy 0, policy_version 237664 (0.0025) [2025-01-04 15:39:58,968][134211] Fps is (10 sec: 13926.5, 60 sec: 13448.6, 300 sec: 14342.9). Total num frames: 973479936. Throughput: 0: 3321.5. Samples: 232541218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:39:58,968][134211] Avg episode reward: [(0, '9.761')] [2025-01-04 15:40:01,215][134294] Updated weights for policy 0, policy_version 237674 (0.0024) [2025-01-04 15:40:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13448.5, 300 sec: 14356.8). Total num frames: 973549568. Throughput: 0: 3331.3. Samples: 232551584. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:40:03,968][134211] Avg episode reward: [(0, '10.705')] [2025-01-04 15:40:04,266][134294] Updated weights for policy 0, policy_version 237684 (0.0026) [2025-01-04 15:40:07,173][134294] Updated weights for policy 0, policy_version 237694 (0.0024) [2025-01-04 15:40:08,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13380.3, 300 sec: 14342.9). Total num frames: 973615104. Throughput: 0: 3343.3. Samples: 232571844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:40:08,968][134211] Avg episode reward: [(0, '9.862')] [2025-01-04 15:40:10,172][134294] Updated weights for policy 0, policy_version 237704 (0.0027) [2025-01-04 15:40:13,045][134294] Updated weights for policy 0, policy_version 237714 (0.0025) [2025-01-04 15:40:13,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13448.6, 300 sec: 14301.3). Total num frames: 973688832. Throughput: 0: 3423.3. Samples: 232592708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:40:13,968][134211] Avg episode reward: [(0, '9.633')] [2025-01-04 15:40:15,957][134294] Updated weights for policy 0, policy_version 237724 (0.0024) [2025-01-04 15:40:18,880][134294] Updated weights for policy 0, policy_version 237734 (0.0026) [2025-01-04 15:40:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 13516.8, 300 sec: 14176.3). Total num frames: 973758464. Throughput: 0: 3435.9. Samples: 232603206. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:40:18,968][134211] Avg episode reward: [(0, '9.746')] [2025-01-04 15:40:21,937][134294] Updated weights for policy 0, policy_version 237744 (0.0026) [2025-01-04 15:40:23,968][134211] Fps is (10 sec: 14336.1, 60 sec: 13722.0, 300 sec: 14120.8). Total num frames: 973832192. Throughput: 0: 3447.3. Samples: 232623990. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:40:23,968][134211] Avg episode reward: [(0, '9.928')] [2025-01-04 15:40:24,280][134294] Updated weights for policy 0, policy_version 237754 (0.0017) [2025-01-04 15:40:26,186][134294] Updated weights for policy 0, policy_version 237764 (0.0013) [2025-01-04 15:40:28,021][134294] Updated weights for policy 0, policy_version 237774 (0.0014) [2025-01-04 15:40:28,967][134211] Fps is (10 sec: 18023.0, 60 sec: 14472.6, 300 sec: 14273.5). Total num frames: 973938688. Throughput: 0: 3663.9. Samples: 232654544. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:28,968][134211] Avg episode reward: [(0, '9.955')] [2025-01-04 15:40:30,164][134294] Updated weights for policy 0, policy_version 237784 (0.0017) [2025-01-04 15:40:33,173][134294] Updated weights for policy 0, policy_version 237794 (0.0028) [2025-01-04 15:40:33,968][134211] Fps is (10 sec: 18022.2, 60 sec: 14609.1, 300 sec: 14301.3). Total num frames: 974012416. Throughput: 0: 3723.8. Samples: 232667594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:33,968][134211] Avg episode reward: [(0, '9.666')] [2025-01-04 15:40:36,433][134294] Updated weights for policy 0, policy_version 237804 (0.0025) [2025-01-04 15:40:38,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14540.8, 300 sec: 14301.3). Total num frames: 974077952. Throughput: 0: 3697.1. Samples: 232686900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:38,968][134211] Avg episode reward: [(0, '9.506')] [2025-01-04 15:40:39,587][134294] Updated weights for policy 0, policy_version 237814 (0.0025) [2025-01-04 15:40:42,702][134294] Updated weights for policy 0, policy_version 237824 (0.0028) [2025-01-04 15:40:43,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 974143488. Throughput: 0: 3675.9. Samples: 232706632. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:43,968][134211] Avg episode reward: [(0, '9.624')] [2025-01-04 15:40:45,664][134294] Updated weights for policy 0, policy_version 237834 (0.0026) [2025-01-04 15:40:48,621][134294] Updated weights for policy 0, policy_version 237844 (0.0027) [2025-01-04 15:40:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14540.8, 300 sec: 14301.4). Total num frames: 974213120. Throughput: 0: 3674.9. Samples: 232716956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:48,968][134211] Avg episode reward: [(0, '9.669')] [2025-01-04 15:40:51,568][134294] Updated weights for policy 0, policy_version 237854 (0.0025) [2025-01-04 15:40:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14472.5, 300 sec: 14287.4). Total num frames: 974278656. Throughput: 0: 3680.9. Samples: 232737486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:53,968][134211] Avg episode reward: [(0, '11.176')] [2025-01-04 15:40:54,755][134294] Updated weights for policy 0, policy_version 237864 (0.0027) [2025-01-04 15:40:57,855][134294] Updated weights for policy 0, policy_version 237874 (0.0028) [2025-01-04 15:40:58,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14404.1, 300 sec: 14273.5). Total num frames: 974344192. Throughput: 0: 3655.1. Samples: 232757190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:40:58,970][134211] Avg episode reward: [(0, '9.905')] [2025-01-04 15:41:00,837][134294] Updated weights for policy 0, policy_version 237884 (0.0027) [2025-01-04 15:41:03,789][134294] Updated weights for policy 0, policy_version 237894 (0.0024) [2025-01-04 15:41:03,968][134211] Fps is (10 sec: 13516.2, 60 sec: 14404.2, 300 sec: 14273.5). Total num frames: 974413824. Throughput: 0: 3655.3. Samples: 232767696. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:03,969][134211] Avg episode reward: [(0, '8.626')] [2025-01-04 15:41:06,694][134294] Updated weights for policy 0, policy_version 237904 (0.0026) [2025-01-04 15:41:08,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14472.5, 300 sec: 14273.5). Total num frames: 974483456. Throughput: 0: 3650.7. Samples: 232788270. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:08,968][134211] Avg episode reward: [(0, '10.042')] [2025-01-04 15:41:09,830][134294] Updated weights for policy 0, policy_version 237914 (0.0026) [2025-01-04 15:41:12,867][134294] Updated weights for policy 0, policy_version 237924 (0.0024) [2025-01-04 15:41:13,968][134211] Fps is (10 sec: 13517.3, 60 sec: 14336.0, 300 sec: 14273.5). Total num frames: 974548992. Throughput: 0: 3423.4. Samples: 232808600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:13,969][134211] Avg episode reward: [(0, '9.572')] [2025-01-04 15:41:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237928_974553088.pth... [2025-01-04 15:41:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237091_971124736.pth [2025-01-04 15:41:15,850][134294] Updated weights for policy 0, policy_version 237934 (0.0027) [2025-01-04 15:41:18,782][134294] Updated weights for policy 0, policy_version 237944 (0.0027) [2025-01-04 15:41:18,970][134211] Fps is (10 sec: 13514.1, 60 sec: 14335.5, 300 sec: 14273.4). Total num frames: 974618624. Throughput: 0: 3356.9. Samples: 232818662. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:18,970][134211] Avg episode reward: [(0, '9.939')] [2025-01-04 15:41:21,753][134294] Updated weights for policy 0, policy_version 237954 (0.0026) [2025-01-04 15:41:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.7, 300 sec: 14273.5). Total num frames: 974688256. 
Throughput: 0: 3390.7. Samples: 232839480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:23,968][134211] Avg episode reward: [(0, '9.334')] [2025-01-04 15:41:24,858][134294] Updated weights for policy 0, policy_version 237964 (0.0028) [2025-01-04 15:41:26,937][134294] Updated weights for policy 0, policy_version 237974 (0.0014) [2025-01-04 15:41:28,968][134211] Fps is (10 sec: 15568.1, 60 sec: 13926.4, 300 sec: 14342.9). Total num frames: 974774272. Throughput: 0: 3499.6. Samples: 232864116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:28,968][134211] Avg episode reward: [(0, '10.643')] [2025-01-04 15:41:29,452][134294] Updated weights for policy 0, policy_version 237984 (0.0020) [2025-01-04 15:41:32,506][134294] Updated weights for policy 0, policy_version 237994 (0.0025) [2025-01-04 15:41:33,968][134211] Fps is (10 sec: 15155.4, 60 sec: 13789.9, 300 sec: 14329.1). Total num frames: 974839808. Throughput: 0: 3498.3. Samples: 232874380. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:41:33,968][134211] Avg episode reward: [(0, '10.473')] [2025-01-04 15:41:35,491][134294] Updated weights for policy 0, policy_version 238004 (0.0029) [2025-01-04 15:41:38,483][134294] Updated weights for policy 0, policy_version 238014 (0.0027) [2025-01-04 15:41:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.2, 300 sec: 14329.1). Total num frames: 974909440. Throughput: 0: 3501.9. Samples: 232895070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:41:38,968][134211] Avg episode reward: [(0, '9.128')] [2025-01-04 15:41:41,457][134294] Updated weights for policy 0, policy_version 238024 (0.0026) [2025-01-04 15:41:43,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13858.1, 300 sec: 14329.1). Total num frames: 974974976. Throughput: 0: 3517.1. Samples: 232915456. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:41:43,968][134211] Avg episode reward: [(0, '9.414')] [2025-01-04 15:41:44,568][134294] Updated weights for policy 0, policy_version 238034 (0.0026) [2025-01-04 15:41:47,570][134294] Updated weights for policy 0, policy_version 238044 (0.0026) [2025-01-04 15:41:48,967][134211] Fps is (10 sec: 14336.3, 60 sec: 13994.7, 300 sec: 14356.8). Total num frames: 975052800. Throughput: 0: 3509.7. Samples: 232925630. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:41:48,968][134211] Avg episode reward: [(0, '10.380')] [2025-01-04 15:41:49,564][134294] Updated weights for policy 0, policy_version 238054 (0.0013) [2025-01-04 15:41:51,407][134294] Updated weights for policy 0, policy_version 238064 (0.0012) [2025-01-04 15:41:53,867][134294] Updated weights for policy 0, policy_version 238074 (0.0022) [2025-01-04 15:41:53,968][134211] Fps is (10 sec: 17612.2, 60 sec: 14540.7, 300 sec: 14342.9). Total num frames: 975151104. Throughput: 0: 3695.3. Samples: 232954562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:41:53,969][134211] Avg episode reward: [(0, '9.335')] [2025-01-04 15:41:57,026][134294] Updated weights for policy 0, policy_version 238084 (0.0027) [2025-01-04 15:41:58,968][134211] Fps is (10 sec: 16383.6, 60 sec: 14540.9, 300 sec: 14329.1). Total num frames: 975216640. Throughput: 0: 3693.2. Samples: 232974794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:41:58,968][134211] Avg episode reward: [(0, '9.291')] [2025-01-04 15:42:00,148][134294] Updated weights for policy 0, policy_version 238094 (0.0025) [2025-01-04 15:42:03,139][134294] Updated weights for policy 0, policy_version 238104 (0.0025) [2025-01-04 15:42:03,968][134211] Fps is (10 sec: 13107.7, 60 sec: 14472.6, 300 sec: 14315.2). Total num frames: 975282176. Throughput: 0: 3698.3. Samples: 232985080. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:03,968][134211] Avg episode reward: [(0, '9.131')] [2025-01-04 15:42:06,214][134294] Updated weights for policy 0, policy_version 238114 (0.0026) [2025-01-04 15:42:08,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14404.3, 300 sec: 14315.2). Total num frames: 975347712. Throughput: 0: 3681.1. Samples: 233005130. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:08,968][134211] Avg episode reward: [(0, '10.103')] [2025-01-04 15:42:09,353][134294] Updated weights for policy 0, policy_version 238124 (0.0027) [2025-01-04 15:42:12,328][134294] Updated weights for policy 0, policy_version 238134 (0.0024) [2025-01-04 15:42:13,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 14329.1). Total num frames: 975417344. Throughput: 0: 3574.4. Samples: 233024964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:13,968][134211] Avg episode reward: [(0, '10.397')] [2025-01-04 15:42:15,319][134294] Updated weights for policy 0, policy_version 238144 (0.0022) [2025-01-04 15:42:18,334][134294] Updated weights for policy 0, policy_version 238154 (0.0025) [2025-01-04 15:42:18,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14473.0, 300 sec: 14329.1). Total num frames: 975486976. Throughput: 0: 3582.0. Samples: 233035570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:18,968][134211] Avg episode reward: [(0, '10.052')] [2025-01-04 15:42:21,274][134294] Updated weights for policy 0, policy_version 238164 (0.0025) [2025-01-04 15:42:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14404.3, 300 sec: 14315.2). Total num frames: 975552512. Throughput: 0: 3578.9. Samples: 233056120. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:23,968][134211] Avg episode reward: [(0, '9.812')] [2025-01-04 15:42:24,352][134294] Updated weights for policy 0, policy_version 238174 (0.0025) [2025-01-04 15:42:27,333][134294] Updated weights for policy 0, policy_version 238184 (0.0025) [2025-01-04 15:42:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14131.2, 300 sec: 14329.1). Total num frames: 975622144. Throughput: 0: 3576.2. Samples: 233076384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:28,968][134211] Avg episode reward: [(0, '10.161')] [2025-01-04 15:42:30,303][134294] Updated weights for policy 0, policy_version 238194 (0.0026) [2025-01-04 15:42:33,240][134294] Updated weights for policy 0, policy_version 238204 (0.0023) [2025-01-04 15:42:33,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.4, 300 sec: 14329.1). Total num frames: 975691776. Throughput: 0: 3583.7. Samples: 233086898. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:33,971][134211] Avg episode reward: [(0, '10.005')] [2025-01-04 15:42:36,265][134294] Updated weights for policy 0, policy_version 238214 (0.0026) [2025-01-04 15:42:38,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14199.5, 300 sec: 14301.3). Total num frames: 975761408. Throughput: 0: 3400.9. Samples: 233107600. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:38,968][134211] Avg episode reward: [(0, '9.985')] [2025-01-04 15:42:39,033][134294] Updated weights for policy 0, policy_version 238224 (0.0024) [2025-01-04 15:42:41,023][134294] Updated weights for policy 0, policy_version 238234 (0.0014) [2025-01-04 15:42:42,870][134294] Updated weights for policy 0, policy_version 238244 (0.0013) [2025-01-04 15:42:43,967][134211] Fps is (10 sec: 17613.2, 60 sec: 14882.2, 300 sec: 14398.5). Total num frames: 975867904. Throughput: 0: 3602.4. Samples: 233136900. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:43,968][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 15:42:44,768][134294] Updated weights for policy 0, policy_version 238254 (0.0013) [2025-01-04 15:42:46,630][134294] Updated weights for policy 0, policy_version 238264 (0.0015) [2025-01-04 15:42:48,968][134211] Fps is (10 sec: 20069.7, 60 sec: 15155.1, 300 sec: 14481.8). Total num frames: 975962112. Throughput: 0: 3739.5. Samples: 233153356. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:48,969][134211] Avg episode reward: [(0, '10.858')] [2025-01-04 15:42:49,601][134294] Updated weights for policy 0, policy_version 238274 (0.0027) [2025-01-04 15:42:52,724][134294] Updated weights for policy 0, policy_version 238284 (0.0031) [2025-01-04 15:42:53,968][134211] Fps is (10 sec: 15564.4, 60 sec: 14540.9, 300 sec: 14454.2). Total num frames: 976023552. Throughput: 0: 3759.7. Samples: 233174316. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:53,968][134211] Avg episode reward: [(0, '9.353')] [2025-01-04 15:42:55,923][134294] Updated weights for policy 0, policy_version 238294 (0.0027) [2025-01-04 15:42:58,949][134294] Updated weights for policy 0, policy_version 238304 (0.0027) [2025-01-04 15:42:58,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14609.1, 300 sec: 14467.9). Total num frames: 976093184. Throughput: 0: 3760.9. Samples: 233194204. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:42:58,969][134211] Avg episode reward: [(0, '9.728')] [2025-01-04 15:43:01,926][134294] Updated weights for policy 0, policy_version 238314 (0.0027) [2025-01-04 15:43:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14467.9). Total num frames: 976158720. Throughput: 0: 3747.2. Samples: 233204194. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:03,969][134211] Avg episode reward: [(0, '9.905')] [2025-01-04 15:43:05,071][134294] Updated weights for policy 0, policy_version 238324 (0.0026) [2025-01-04 15:43:08,095][134294] Updated weights for policy 0, policy_version 238334 (0.0027) [2025-01-04 15:43:08,968][134211] Fps is (10 sec: 13106.9, 60 sec: 14609.0, 300 sec: 14454.0). Total num frames: 976224256. Throughput: 0: 3732.8. Samples: 233224096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:08,969][134211] Avg episode reward: [(0, '9.265')] [2025-01-04 15:43:11,134][134294] Updated weights for policy 0, policy_version 238344 (0.0024) [2025-01-04 15:43:13,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 976293888. Throughput: 0: 3730.7. Samples: 233244264. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:13,969][134211] Avg episode reward: [(0, '10.118')] [2025-01-04 15:43:13,982][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000238353_976293888.pth... [2025-01-04 15:43:14,056][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237521_972886016.pth [2025-01-04 15:43:14,242][134294] Updated weights for policy 0, policy_version 238354 (0.0026) [2025-01-04 15:43:17,298][134294] Updated weights for policy 0, policy_version 238364 (0.0027) [2025-01-04 15:43:18,968][134211] Fps is (10 sec: 13517.4, 60 sec: 14540.9, 300 sec: 14384.6). Total num frames: 976359424. Throughput: 0: 3714.8. Samples: 233254062. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:18,968][134211] Avg episode reward: [(0, '8.546')] [2025-01-04 15:43:20,264][134294] Updated weights for policy 0, policy_version 238374 (0.0025) [2025-01-04 15:43:23,189][134294] Updated weights for policy 0, policy_version 238384 (0.0024) [2025-01-04 15:43:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 976429056. Throughput: 0: 3717.9. Samples: 233274906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:23,968][134211] Avg episode reward: [(0, '10.490')] [2025-01-04 15:43:26,260][134294] Updated weights for policy 0, policy_version 238394 (0.0025) [2025-01-04 15:43:28,969][134211] Fps is (10 sec: 13924.8, 60 sec: 14608.8, 300 sec: 14370.7). Total num frames: 976498688. Throughput: 0: 3518.9. Samples: 233295256. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:28,969][134211] Avg episode reward: [(0, '10.107')] [2025-01-04 15:43:29,339][134294] Updated weights for policy 0, policy_version 238404 (0.0026) [2025-01-04 15:43:32,235][134294] Updated weights for policy 0, policy_version 238414 (0.0026) [2025-01-04 15:43:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14540.8, 300 sec: 14231.9). Total num frames: 976564224. Throughput: 0: 3376.7. Samples: 233305308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:33,968][134211] Avg episode reward: [(0, '10.372')] [2025-01-04 15:43:35,210][134294] Updated weights for policy 0, policy_version 238424 (0.0023) [2025-01-04 15:43:38,167][134294] Updated weights for policy 0, policy_version 238434 (0.0025) [2025-01-04 15:43:38,968][134211] Fps is (10 sec: 13518.2, 60 sec: 14540.8, 300 sec: 14106.9). Total num frames: 976633856. Throughput: 0: 3381.6. Samples: 233326486. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:38,968][134211] Avg episode reward: [(0, '10.332')] [2025-01-04 15:43:41,155][134294] Updated weights for policy 0, policy_version 238444 (0.0027) [2025-01-04 15:43:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13926.3, 300 sec: 14120.8). Total num frames: 976703488. Throughput: 0: 3391.8. Samples: 233346834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:43,968][134211] Avg episode reward: [(0, '9.209')] [2025-01-04 15:43:44,165][134294] Updated weights for policy 0, policy_version 238454 (0.0024) [2025-01-04 15:43:47,294][134294] Updated weights for policy 0, policy_version 238464 (0.0027) [2025-01-04 15:43:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13448.6, 300 sec: 14106.9). Total num frames: 976769024. Throughput: 0: 3384.2. Samples: 233356482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:48,968][134211] Avg episode reward: [(0, '10.438')] [2025-01-04 15:43:49,943][134294] Updated weights for policy 0, policy_version 238474 (0.0020) [2025-01-04 15:43:51,835][134294] Updated weights for policy 0, policy_version 238484 (0.0012) [2025-01-04 15:43:53,646][134294] Updated weights for policy 0, policy_version 238494 (0.0013) [2025-01-04 15:43:53,968][134211] Fps is (10 sec: 17203.6, 60 sec: 14199.5, 300 sec: 14245.8). Total num frames: 976875520. Throughput: 0: 3528.8. Samples: 233382890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:43:53,968][134211] Avg episode reward: [(0, '10.029')] [2025-01-04 15:43:55,560][134294] Updated weights for policy 0, policy_version 238504 (0.0013) [2025-01-04 15:43:57,467][134294] Updated weights for policy 0, policy_version 238514 (0.0014) [2025-01-04 15:43:58,968][134211] Fps is (10 sec: 21299.1, 60 sec: 14813.9, 300 sec: 14370.7). Total num frames: 976982016. Throughput: 0: 3807.6. Samples: 233415606. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:43:58,968][134211] Avg episode reward: [(0, '9.876')] [2025-01-04 15:43:59,464][134294] Updated weights for policy 0, policy_version 238524 (0.0014) [2025-01-04 15:44:02,511][134294] Updated weights for policy 0, policy_version 238534 (0.0029) [2025-01-04 15:44:03,968][134211] Fps is (10 sec: 17612.7, 60 sec: 14882.2, 300 sec: 14370.7). Total num frames: 977051648. Throughput: 0: 3861.4. Samples: 233427826. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:03,968][134211] Avg episode reward: [(0, '10.345')] [2025-01-04 15:44:05,785][134294] Updated weights for policy 0, policy_version 238544 (0.0027) [2025-01-04 15:44:08,870][134294] Updated weights for policy 0, policy_version 238554 (0.0025) [2025-01-04 15:44:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.2, 300 sec: 14356.8). Total num frames: 977117184. Throughput: 0: 3822.5. Samples: 233446920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:08,968][134211] Avg episode reward: [(0, '8.740')] [2025-01-04 15:44:11,890][134294] Updated weights for policy 0, policy_version 238564 (0.0026) [2025-01-04 15:44:13,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.9, 300 sec: 14356.8). Total num frames: 977182720. Throughput: 0: 3811.6. Samples: 233466776. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:13,968][134211] Avg episode reward: [(0, '9.547')] [2025-01-04 15:44:15,072][134294] Updated weights for policy 0, policy_version 238574 (0.0024) [2025-01-04 15:44:18,083][134294] Updated weights for policy 0, policy_version 238584 (0.0025) [2025-01-04 15:44:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14813.8, 300 sec: 14370.8). Total num frames: 977248256. Throughput: 0: 3806.1. Samples: 233476580. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:18,968][134211] Avg episode reward: [(0, '10.150')] [2025-01-04 15:44:21,108][134294] Updated weights for policy 0, policy_version 238594 (0.0028) [2025-01-04 15:44:23,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 14398.5). Total num frames: 977317888. Throughput: 0: 3786.6. Samples: 233496884. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:23,968][134211] Avg episode reward: [(0, '10.696')] [2025-01-04 15:44:24,264][134294] Updated weights for policy 0, policy_version 238604 (0.0024) [2025-01-04 15:44:27,188][134294] Updated weights for policy 0, policy_version 238614 (0.0026) [2025-01-04 15:44:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14745.8, 300 sec: 14398.5). Total num frames: 977383424. Throughput: 0: 3783.0. Samples: 233517068. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:28,968][134211] Avg episode reward: [(0, '10.234')] [2025-01-04 15:44:30,187][134294] Updated weights for policy 0, policy_version 238624 (0.0026) [2025-01-04 15:44:33,143][134294] Updated weights for policy 0, policy_version 238634 (0.0021) [2025-01-04 15:44:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14813.9, 300 sec: 14398.5). Total num frames: 977453056. Throughput: 0: 3804.9. Samples: 233527702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:33,969][134211] Avg episode reward: [(0, '11.062')] [2025-01-04 15:44:36,129][134294] Updated weights for policy 0, policy_version 238644 (0.0024) [2025-01-04 15:44:38,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14813.8, 300 sec: 14398.5). Total num frames: 977522688. Throughput: 0: 3672.4. Samples: 233548150. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:38,968][134211] Avg episode reward: [(0, '9.191')] [2025-01-04 15:44:39,127][134294] Updated weights for policy 0, policy_version 238654 (0.0025) [2025-01-04 15:44:42,231][134294] Updated weights for policy 0, policy_version 238664 (0.0032) [2025-01-04 15:44:43,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14745.6, 300 sec: 14398.5). Total num frames: 977588224. Throughput: 0: 3391.9. Samples: 233568244. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:43,969][134211] Avg episode reward: [(0, '10.072')] [2025-01-04 15:44:45,243][134294] Updated weights for policy 0, policy_version 238674 (0.0027) [2025-01-04 15:44:48,124][134294] Updated weights for policy 0, policy_version 238684 (0.0025) [2025-01-04 15:44:48,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14813.9, 300 sec: 14398.5). Total num frames: 977657856. Throughput: 0: 3355.6. Samples: 233578826. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:48,968][134211] Avg episode reward: [(0, '9.369')] [2025-01-04 15:44:51,103][134294] Updated weights for policy 0, policy_version 238694 (0.0027) [2025-01-04 15:44:53,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14199.4, 300 sec: 14398.5). Total num frames: 977727488. Throughput: 0: 3390.3. Samples: 233599486. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:53,968][134211] Avg episode reward: [(0, '9.624')] [2025-01-04 15:44:54,141][134294] Updated weights for policy 0, policy_version 238704 (0.0025) [2025-01-04 15:44:57,186][134294] Updated weights for policy 0, policy_version 238714 (0.0027) [2025-01-04 15:44:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 13516.8, 300 sec: 14384.6). Total num frames: 977793024. Throughput: 0: 3398.7. Samples: 233619718. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:44:58,968][134211] Avg episode reward: [(0, '10.065')] [2025-01-04 15:45:00,129][134294] Updated weights for policy 0, policy_version 238724 (0.0026) [2025-01-04 15:45:03,128][134294] Updated weights for policy 0, policy_version 238734 (0.0026) [2025-01-04 15:45:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 14398.5). Total num frames: 977862656. Throughput: 0: 3418.2. Samples: 233630400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-04 15:45:03,968][134211] Avg episode reward: [(0, '10.415')] [2025-01-04 15:45:06,017][134294] Updated weights for policy 0, policy_version 238744 (0.0024) [2025-01-04 15:45:08,969][134211] Fps is (10 sec: 13925.5, 60 sec: 13584.9, 300 sec: 14384.6). Total num frames: 977932288. Throughput: 0: 3423.4. Samples: 233650940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:08,969][134211] Avg episode reward: [(0, '10.735')] [2025-01-04 15:45:09,089][134294] Updated weights for policy 0, policy_version 238754 (0.0027) [2025-01-04 15:45:11,620][134294] Updated weights for policy 0, policy_version 238764 (0.0018) [2025-01-04 15:45:13,430][134294] Updated weights for policy 0, policy_version 238774 (0.0012) [2025-01-04 15:45:13,968][134211] Fps is (10 sec: 16384.4, 60 sec: 14063.0, 300 sec: 14467.9). Total num frames: 978026496. Throughput: 0: 3536.1. Samples: 233676194. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:13,968][134211] Avg episode reward: [(0, '9.182')] [2025-01-04 15:45:14,035][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000238777_978030592.pth... 
[2025-01-04 15:45:14,077][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000237928_974553088.pth [2025-01-04 15:45:15,378][134294] Updated weights for policy 0, policy_version 238784 (0.0012) [2025-01-04 15:45:17,259][134294] Updated weights for policy 0, policy_version 238794 (0.0013) [2025-01-04 15:45:18,968][134211] Fps is (10 sec: 20072.1, 60 sec: 14745.6, 300 sec: 14579.0). Total num frames: 978132992. Throughput: 0: 3659.7. Samples: 233692390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:18,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 15:45:19,292][134294] Updated weights for policy 0, policy_version 238804 (0.0014) [2025-01-04 15:45:22,422][134294] Updated weights for policy 0, policy_version 238814 (0.0026) [2025-01-04 15:45:23,968][134211] Fps is (10 sec: 17203.0, 60 sec: 14677.4, 300 sec: 14440.1). Total num frames: 978198528. Throughput: 0: 3775.0. Samples: 233718026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:23,968][134211] Avg episode reward: [(0, '9.537')] [2025-01-04 15:45:25,653][134294] Updated weights for policy 0, policy_version 238824 (0.0027) [2025-01-04 15:45:28,622][134294] Updated weights for policy 0, policy_version 238834 (0.0028) [2025-01-04 15:45:28,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14745.6, 300 sec: 14426.3). Total num frames: 978268160. Throughput: 0: 3763.7. Samples: 233737610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:28,968][134211] Avg episode reward: [(0, '10.187')] [2025-01-04 15:45:31,711][134294] Updated weights for policy 0, policy_version 238844 (0.0027) [2025-01-04 15:45:33,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14677.3, 300 sec: 14426.2). Total num frames: 978333696. Throughput: 0: 3749.8. Samples: 233747566. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:33,968][134211] Avg episode reward: [(0, '10.197')] [2025-01-04 15:45:34,872][134294] Updated weights for policy 0, policy_version 238854 (0.0028) [2025-01-04 15:45:37,901][134294] Updated weights for policy 0, policy_version 238864 (0.0022) [2025-01-04 15:45:38,968][134211] Fps is (10 sec: 13106.8, 60 sec: 14609.0, 300 sec: 14426.2). Total num frames: 978399232. Throughput: 0: 3731.5. Samples: 233767404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:38,969][134211] Avg episode reward: [(0, '9.925')] [2025-01-04 15:45:40,895][134294] Updated weights for policy 0, policy_version 238874 (0.0024) [2025-01-04 15:45:43,802][134294] Updated weights for policy 0, policy_version 238884 (0.0022) [2025-01-04 15:45:43,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.4, 300 sec: 14426.3). Total num frames: 978468864. Throughput: 0: 3744.7. Samples: 233788230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:43,968][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 15:45:46,843][134294] Updated weights for policy 0, policy_version 238894 (0.0025) [2025-01-04 15:45:48,968][134211] Fps is (10 sec: 13517.2, 60 sec: 14609.1, 300 sec: 14426.3). Total num frames: 978534400. Throughput: 0: 3734.4. Samples: 233798446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:48,968][134211] Avg episode reward: [(0, '8.924')] [2025-01-04 15:45:49,896][134294] Updated weights for policy 0, policy_version 238904 (0.0023) [2025-01-04 15:45:52,869][134294] Updated weights for policy 0, policy_version 238914 (0.0026) [2025-01-04 15:45:53,969][134211] Fps is (10 sec: 13515.7, 60 sec: 14608.9, 300 sec: 14440.1). Total num frames: 978604032. Throughput: 0: 3726.8. Samples: 233818644. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:53,969][134211] Avg episode reward: [(0, '9.376')] [2025-01-04 15:45:56,124][134294] Updated weights for policy 0, policy_version 238924 (0.0028) [2025-01-04 15:45:58,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14472.6, 300 sec: 14398.5). Total num frames: 978661376. Throughput: 0: 3573.5. Samples: 233837002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:45:58,969][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 15:45:59,916][134294] Updated weights for policy 0, policy_version 238934 (0.0036) [2025-01-04 15:46:03,210][134294] Updated weights for policy 0, policy_version 238944 (0.0022) [2025-01-04 15:46:03,968][134211] Fps is (10 sec: 12289.1, 60 sec: 14404.3, 300 sec: 14384.6). Total num frames: 978726912. Throughput: 0: 3396.5. Samples: 233845230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:46:03,968][134211] Avg episode reward: [(0, '11.222')] [2025-01-04 15:46:05,220][134294] Updated weights for policy 0, policy_version 238954 (0.0013) [2025-01-04 15:46:07,125][134294] Updated weights for policy 0, policy_version 238964 (0.0012) [2025-01-04 15:46:08,967][134211] Fps is (10 sec: 17203.7, 60 sec: 15018.9, 300 sec: 14523.5). Total num frames: 978833408. Throughput: 0: 3440.5. Samples: 233872846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:46:08,968][134211] Avg episode reward: [(0, '10.153')] [2025-01-04 15:46:08,996][134294] Updated weights for policy 0, policy_version 238974 (0.0014) [2025-01-04 15:46:10,812][134294] Updated weights for policy 0, policy_version 238984 (0.0013) [2025-01-04 15:46:12,789][134294] Updated weights for policy 0, policy_version 238994 (0.0014) [2025-01-04 15:46:13,968][134211] Fps is (10 sec: 20889.2, 60 sec: 15155.1, 300 sec: 14634.6). Total num frames: 978935808. Throughput: 0: 3712.6. Samples: 233904678. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:46:13,968][134211] Avg episode reward: [(0, '8.729')] [2025-01-04 15:46:15,656][134294] Updated weights for policy 0, policy_version 239004 (0.0025) [2025-01-04 15:46:18,674][134294] Updated weights for policy 0, policy_version 239014 (0.0027) [2025-01-04 15:46:18,968][134211] Fps is (10 sec: 16793.2, 60 sec: 14472.5, 300 sec: 14620.6). Total num frames: 979001344. Throughput: 0: 3723.0. Samples: 233915102. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:18,968][134211] Avg episode reward: [(0, '9.105')] [2025-01-04 15:46:21,684][134294] Updated weights for policy 0, policy_version 239024 (0.0026) [2025-01-04 15:46:23,968][134211] Fps is (10 sec: 13516.3, 60 sec: 14540.7, 300 sec: 14565.1). Total num frames: 979070976. Throughput: 0: 3728.5. Samples: 233935188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:23,969][134211] Avg episode reward: [(0, '8.956')] [2025-01-04 15:46:24,887][134294] Updated weights for policy 0, policy_version 239034 (0.0026) [2025-01-04 15:46:27,855][134294] Updated weights for policy 0, policy_version 239044 (0.0026) [2025-01-04 15:46:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14565.1). Total num frames: 979136512. Throughput: 0: 3716.7. Samples: 233955480. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:28,968][134211] Avg episode reward: [(0, '9.280')] [2025-01-04 15:46:30,827][134294] Updated weights for policy 0, policy_version 239054 (0.0025) [2025-01-04 15:46:33,650][134294] Updated weights for policy 0, policy_version 239064 (0.0022) [2025-01-04 15:46:33,968][134211] Fps is (10 sec: 13927.1, 60 sec: 14609.1, 300 sec: 14579.0). Total num frames: 979210240. Throughput: 0: 3720.3. Samples: 233965860. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:33,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 15:46:36,540][134294] Updated weights for policy 0, policy_version 239074 (0.0025) [2025-01-04 15:46:38,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14677.4, 300 sec: 14592.9). Total num frames: 979279872. Throughput: 0: 3744.3. Samples: 233987134. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:38,968][134211] Avg episode reward: [(0, '10.099')] [2025-01-04 15:46:39,507][134294] Updated weights for policy 0, policy_version 239084 (0.0026) [2025-01-04 15:46:42,482][134294] Updated weights for policy 0, policy_version 239094 (0.0026) [2025-01-04 15:46:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14565.1). Total num frames: 979349504. Throughput: 0: 3796.5. Samples: 234007846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:43,968][134211] Avg episode reward: [(0, '10.465')] [2025-01-04 15:46:45,395][134294] Updated weights for policy 0, policy_version 239104 (0.0024) [2025-01-04 15:46:48,267][134294] Updated weights for policy 0, policy_version 239114 (0.0026) [2025-01-04 15:46:48,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14745.6, 300 sec: 14467.9). Total num frames: 979419136. Throughput: 0: 3853.3. Samples: 234018628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:48,968][134211] Avg episode reward: [(0, '8.474')] [2025-01-04 15:46:51,091][134294] Updated weights for policy 0, policy_version 239124 (0.0024) [2025-01-04 15:46:53,968][134211] Fps is (10 sec: 13925.5, 60 sec: 14745.6, 300 sec: 14481.8). Total num frames: 979488768. Throughput: 0: 3708.9. Samples: 234039750. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:53,969][134211] Avg episode reward: [(0, '9.954')] [2025-01-04 15:46:54,116][134294] Updated weights for policy 0, policy_version 239134 (0.0024) [2025-01-04 15:46:56,987][134294] Updated weights for policy 0, policy_version 239144 (0.0023) [2025-01-04 15:46:58,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.4, 300 sec: 14495.7). Total num frames: 979558400. Throughput: 0: 3465.9. Samples: 234060642. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:46:58,968][134211] Avg episode reward: [(0, '9.111')] [2025-01-04 15:47:00,005][134294] Updated weights for policy 0, policy_version 239154 (0.0026) [2025-01-04 15:47:02,741][134294] Updated weights for policy 0, policy_version 239164 (0.0026) [2025-01-04 15:47:03,968][134211] Fps is (10 sec: 13927.1, 60 sec: 15018.6, 300 sec: 14509.6). Total num frames: 979628032. Throughput: 0: 3472.4. Samples: 234071362. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:47:03,968][134211] Avg episode reward: [(0, '10.211')] [2025-01-04 15:47:05,707][134294] Updated weights for policy 0, policy_version 239174 (0.0025) [2025-01-04 15:47:08,595][134294] Updated weights for policy 0, policy_version 239184 (0.0025) [2025-01-04 15:47:08,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14472.5, 300 sec: 14523.5). Total num frames: 979701760. Throughput: 0: 3502.8. Samples: 234092814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:47:08,968][134211] Avg episode reward: [(0, '10.752')] [2025-01-04 15:47:11,620][134294] Updated weights for policy 0, policy_version 239194 (0.0026) [2025-01-04 15:47:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13858.1, 300 sec: 14509.6). Total num frames: 979767296. Throughput: 0: 3499.1. Samples: 234112942. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:47:13,969][134211] Avg episode reward: [(0, '10.760')] [2025-01-04 15:47:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000239201_979767296.pth... [2025-01-04 15:47:14,058][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000238353_976293888.pth [2025-01-04 15:47:14,694][134294] Updated weights for policy 0, policy_version 239204 (0.0025) [2025-01-04 15:47:17,657][134294] Updated weights for policy 0, policy_version 239214 (0.0025) [2025-01-04 15:47:18,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13926.4, 300 sec: 14523.4). Total num frames: 979836928. Throughput: 0: 3495.0. Samples: 234123136. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:47:18,968][134211] Avg episode reward: [(0, '9.926')] [2025-01-04 15:47:20,580][134294] Updated weights for policy 0, policy_version 239224 (0.0027) [2025-01-04 15:47:23,426][134294] Updated weights for policy 0, policy_version 239234 (0.0024) [2025-01-04 15:47:23,968][134211] Fps is (10 sec: 14336.3, 60 sec: 13994.8, 300 sec: 14537.3). Total num frames: 979910656. Throughput: 0: 3500.1. Samples: 234144640. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:47:23,968][134211] Avg episode reward: [(0, '9.635')] [2025-01-04 15:47:26,245][134294] Updated weights for policy 0, policy_version 239244 (0.0024) [2025-01-04 15:47:28,968][134211] Fps is (10 sec: 14336.0, 60 sec: 14062.9, 300 sec: 14537.3). Total num frames: 979980288. Throughput: 0: 3518.3. Samples: 234166172. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:28,968][134211] Avg episode reward: [(0, '10.587')] [2025-01-04 15:47:29,022][134294] Updated weights for policy 0, policy_version 239254 (0.0025) [2025-01-04 15:47:31,933][134294] Updated weights for policy 0, policy_version 239264 (0.0023) [2025-01-04 15:47:33,968][134211] Fps is (10 sec: 13926.0, 60 sec: 13994.6, 300 sec: 14537.3). Total num frames: 980049920. Throughput: 0: 3508.9. Samples: 234176530. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:33,969][134211] Avg episode reward: [(0, '9.615')] [2025-01-04 15:47:34,947][134294] Updated weights for policy 0, policy_version 239274 (0.0025) [2025-01-04 15:47:37,906][134294] Updated weights for policy 0, policy_version 239284 (0.0023) [2025-01-04 15:47:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13994.7, 300 sec: 14412.4). Total num frames: 980119552. Throughput: 0: 3506.4. Samples: 234197538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:38,968][134211] Avg episode reward: [(0, '9.440')] [2025-01-04 15:47:40,736][134294] Updated weights for policy 0, policy_version 239294 (0.0024) [2025-01-04 15:47:43,610][134294] Updated weights for policy 0, policy_version 239304 (0.0023) [2025-01-04 15:47:43,968][134211] Fps is (10 sec: 14336.4, 60 sec: 14062.9, 300 sec: 14342.9). Total num frames: 980193280. Throughput: 0: 3520.8. Samples: 234219078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:43,968][134211] Avg episode reward: [(0, '9.669')] [2025-01-04 15:47:45,956][134294] Updated weights for policy 0, policy_version 239314 (0.0016) [2025-01-04 15:47:48,110][134294] Updated weights for policy 0, policy_version 239324 (0.0019) [2025-01-04 15:47:48,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14336.0, 300 sec: 14426.3). Total num frames: 980279296. Throughput: 0: 3571.0. Samples: 234232056. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:48,968][134211] Avg episode reward: [(0, '10.761')] [2025-01-04 15:47:51,015][134294] Updated weights for policy 0, policy_version 239334 (0.0024) [2025-01-04 15:47:53,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14336.2, 300 sec: 14426.3). Total num frames: 980348928. Throughput: 0: 3606.7. Samples: 234255114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:53,968][134211] Avg episode reward: [(0, '9.575')] [2025-01-04 15:47:54,074][134294] Updated weights for policy 0, policy_version 239344 (0.0025) [2025-01-04 15:47:56,960][134294] Updated weights for policy 0, policy_version 239354 (0.0024) [2025-01-04 15:47:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14440.1). Total num frames: 980418560. Throughput: 0: 3620.4. Samples: 234275858. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:47:58,968][134211] Avg episode reward: [(0, '9.793')] [2025-01-04 15:47:59,926][134294] Updated weights for policy 0, policy_version 239364 (0.0025) [2025-01-04 15:48:02,763][134294] Updated weights for policy 0, policy_version 239374 (0.0023) [2025-01-04 15:48:03,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 980488192. Throughput: 0: 3631.8. Samples: 234286566. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:03,968][134211] Avg episode reward: [(0, '10.383')] [2025-01-04 15:48:05,704][134294] Updated weights for policy 0, policy_version 239384 (0.0024) [2025-01-04 15:48:07,754][134294] Updated weights for policy 0, policy_version 239394 (0.0015) [2025-01-04 15:48:08,968][134211] Fps is (10 sec: 15974.5, 60 sec: 14609.1, 300 sec: 14523.5). Total num frames: 980578304. Throughput: 0: 3667.0. Samples: 234309654. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:08,968][134211] Avg episode reward: [(0, '10.392')] [2025-01-04 15:48:10,268][134294] Updated weights for policy 0, policy_version 239404 (0.0020) [2025-01-04 15:48:13,143][134294] Updated weights for policy 0, policy_version 239414 (0.0025) [2025-01-04 15:48:13,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14677.3, 300 sec: 14537.3). Total num frames: 980647936. Throughput: 0: 3716.1. Samples: 234333396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:13,968][134211] Avg episode reward: [(0, '9.751')] [2025-01-04 15:48:16,027][134294] Updated weights for policy 0, policy_version 239424 (0.0024) [2025-01-04 15:48:18,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14677.3, 300 sec: 14537.3). Total num frames: 980717568. Throughput: 0: 3720.1. Samples: 234343934. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:18,968][134211] Avg episode reward: [(0, '9.424')] [2025-01-04 15:48:19,112][134294] Updated weights for policy 0, policy_version 239434 (0.0028) [2025-01-04 15:48:22,020][134294] Updated weights for policy 0, policy_version 239444 (0.0024) [2025-01-04 15:48:23,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14609.1, 300 sec: 14537.4). Total num frames: 980787200. Throughput: 0: 3707.5. Samples: 234364376. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:23,968][134211] Avg episode reward: [(0, '9.166')] [2025-01-04 15:48:24,939][134294] Updated weights for policy 0, policy_version 239454 (0.0024) [2025-01-04 15:48:27,100][134294] Updated weights for policy 0, policy_version 239464 (0.0017) [2025-01-04 15:48:28,968][134211] Fps is (10 sec: 15974.7, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 980877312. Throughput: 0: 3785.7. Samples: 234389434. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:28,968][134211] Avg episode reward: [(0, '10.190')] [2025-01-04 15:48:29,508][134294] Updated weights for policy 0, policy_version 239474 (0.0020) [2025-01-04 15:48:32,401][134294] Updated weights for policy 0, policy_version 239484 (0.0025) [2025-01-04 15:48:33,968][134211] Fps is (10 sec: 15974.2, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 980946944. Throughput: 0: 3745.6. Samples: 234400608. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:48:33,968][134211] Avg episode reward: [(0, '9.153')] [2025-01-04 15:48:35,300][134294] Updated weights for policy 0, policy_version 239494 (0.0025) [2025-01-04 15:48:38,125][134294] Updated weights for policy 0, policy_version 239504 (0.0024) [2025-01-04 15:48:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14950.4, 300 sec: 14620.6). Total num frames: 981016576. Throughput: 0: 3708.0. Samples: 234421974. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:48:38,968][134211] Avg episode reward: [(0, '10.381')] [2025-01-04 15:48:41,007][134294] Updated weights for policy 0, policy_version 239514 (0.0024) [2025-01-04 15:48:43,952][134294] Updated weights for policy 0, policy_version 239524 (0.0025) [2025-01-04 15:48:43,968][134211] Fps is (10 sec: 14335.6, 60 sec: 14950.3, 300 sec: 14648.4). Total num frames: 981090304. Throughput: 0: 3712.8. Samples: 234442934. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:48:43,969][134211] Avg episode reward: [(0, '8.493')] [2025-01-04 15:48:46,946][134294] Updated weights for policy 0, policy_version 239534 (0.0024) [2025-01-04 15:48:48,969][134211] Fps is (10 sec: 13925.1, 60 sec: 14608.9, 300 sec: 14509.5). Total num frames: 981155840. Throughput: 0: 3706.4. Samples: 234453358. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:48:48,969][134211] Avg episode reward: [(0, '9.433')] [2025-01-04 15:48:49,872][134294] Updated weights for policy 0, policy_version 239544 (0.0028) [2025-01-04 15:48:52,772][134294] Updated weights for policy 0, policy_version 239554 (0.0022) [2025-01-04 15:48:53,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14609.0, 300 sec: 14384.6). Total num frames: 981225472. Throughput: 0: 3657.7. Samples: 234474252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:48:53,968][134211] Avg episode reward: [(0, '9.785')] [2025-01-04 15:48:55,700][134294] Updated weights for policy 0, policy_version 239564 (0.0023) [2025-01-04 15:48:58,526][134294] Updated weights for policy 0, policy_version 239574 (0.0023) [2025-01-04 15:48:58,968][134211] Fps is (10 sec: 14337.2, 60 sec: 14677.3, 300 sec: 14398.5). Total num frames: 981299200. Throughput: 0: 3607.7. Samples: 234495742. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:48:58,968][134211] Avg episode reward: [(0, '9.986')] [2025-01-04 15:49:01,049][134294] Updated weights for policy 0, policy_version 239584 (0.0019) [2025-01-04 15:49:02,946][134294] Updated weights for policy 0, policy_version 239594 (0.0011) [2025-01-04 15:49:03,967][134211] Fps is (10 sec: 17203.7, 60 sec: 15155.2, 300 sec: 14509.6). Total num frames: 981397504. Throughput: 0: 3645.8. Samples: 234507996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:03,968][134211] Avg episode reward: [(0, '9.266')] [2025-01-04 15:49:04,785][134294] Updated weights for policy 0, policy_version 239604 (0.0013) [2025-01-04 15:49:06,685][134294] Updated weights for policy 0, policy_version 239614 (0.0014) [2025-01-04 15:49:08,596][134294] Updated weights for policy 0, policy_version 239624 (0.0015) [2025-01-04 15:49:08,968][134211] Fps is (10 sec: 20480.3, 60 sec: 15428.3, 300 sec: 14648.4). Total num frames: 981504000. Throughput: 0: 3921.6. Samples: 234540846. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:08,968][134211] Avg episode reward: [(0, '8.843')] [2025-01-04 15:49:11,366][134294] Updated weights for policy 0, policy_version 239634 (0.0024) [2025-01-04 15:49:13,968][134211] Fps is (10 sec: 17612.3, 60 sec: 15428.3, 300 sec: 14662.3). Total num frames: 981573632. Throughput: 0: 3890.5. Samples: 234564508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:13,968][134211] Avg episode reward: [(0, '10.319')] [2025-01-04 15:49:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000239642_981573632.pth... [2025-01-04 15:49:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000238777_978030592.pth [2025-01-04 15:49:14,461][134294] Updated weights for policy 0, policy_version 239644 (0.0027) [2025-01-04 15:49:17,586][134294] Updated weights for policy 0, policy_version 239654 (0.0028) [2025-01-04 15:49:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15360.0, 300 sec: 14648.4). Total num frames: 981639168. Throughput: 0: 3857.3. Samples: 234574184. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:18,968][134211] Avg episode reward: [(0, '9.826')] [2025-01-04 15:49:20,581][134294] Updated weights for policy 0, policy_version 239664 (0.0025) [2025-01-04 15:49:23,512][134294] Updated weights for policy 0, policy_version 239674 (0.0026) [2025-01-04 15:49:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 14662.3). Total num frames: 981708800. Throughput: 0: 3841.8. Samples: 234594858. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:23,968][134211] Avg episode reward: [(0, '9.269')] [2025-01-04 15:49:26,369][134294] Updated weights for policy 0, policy_version 239684 (0.0025) [2025-01-04 15:49:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 15018.6, 300 sec: 14662.3). Total num frames: 981778432. 
Throughput: 0: 3835.0. Samples: 234615508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:28,968][134211] Avg episode reward: [(0, '8.829')] [2025-01-04 15:49:29,476][134294] Updated weights for policy 0, policy_version 239694 (0.0023) [2025-01-04 15:49:32,360][134294] Updated weights for policy 0, policy_version 239704 (0.0025) [2025-01-04 15:49:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.7, 300 sec: 14662.3). Total num frames: 981848064. Throughput: 0: 3834.1. Samples: 234625890. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:33,968][134211] Avg episode reward: [(0, '10.684')] [2025-01-04 15:49:35,286][134294] Updated weights for policy 0, policy_version 239714 (0.0025) [2025-01-04 15:49:38,159][134294] Updated weights for policy 0, policy_version 239724 (0.0024) [2025-01-04 15:49:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15018.6, 300 sec: 14676.2). Total num frames: 981917696. Throughput: 0: 3844.8. Samples: 234647266. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:38,968][134211] Avg episode reward: [(0, '9.346')] [2025-01-04 15:49:41,062][134294] Updated weights for policy 0, policy_version 239734 (0.0023) [2025-01-04 15:49:43,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14950.5, 300 sec: 14676.2). Total num frames: 981987328. Throughput: 0: 3829.3. Samples: 234668062. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:49:43,968][134211] Avg episode reward: [(0, '9.381')] [2025-01-04 15:49:44,026][134294] Updated weights for policy 0, policy_version 239744 (0.0024) [2025-01-04 15:49:47,203][134294] Updated weights for policy 0, policy_version 239754 (0.0029) [2025-01-04 15:49:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14950.6, 300 sec: 14662.3). Total num frames: 982052864. Throughput: 0: 3784.1. Samples: 234678282. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:49:48,968][134211] Avg episode reward: [(0, '10.340')] [2025-01-04 15:49:50,129][134294] Updated weights for policy 0, policy_version 239764 (0.0024) [2025-01-04 15:49:53,699][134294] Updated weights for policy 0, policy_version 239774 (0.0030) [2025-01-04 15:49:53,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14882.1, 300 sec: 14662.3). Total num frames: 982118400. Throughput: 0: 3497.4. Samples: 234698228. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:49:53,969][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 15:49:57,087][134294] Updated weights for policy 0, policy_version 239784 (0.0027) [2025-01-04 15:49:58,968][134211] Fps is (10 sec: 13517.1, 60 sec: 14813.9, 300 sec: 14662.3). Total num frames: 982188032. Throughput: 0: 3390.5. Samples: 234717080. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:49:58,968][134211] Avg episode reward: [(0, '9.755')] [2025-01-04 15:49:59,138][134294] Updated weights for policy 0, policy_version 239794 (0.0012) [2025-01-04 15:50:01,146][134294] Updated weights for policy 0, policy_version 239804 (0.0015) [2025-01-04 15:50:03,127][134294] Updated weights for policy 0, policy_version 239814 (0.0014) [2025-01-04 15:50:03,968][134211] Fps is (10 sec: 17203.7, 60 sec: 14882.1, 300 sec: 14773.4). Total num frames: 982290432. Throughput: 0: 3522.6. Samples: 234732702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:03,968][134211] Avg episode reward: [(0, '9.689')] [2025-01-04 15:50:05,316][134294] Updated weights for policy 0, policy_version 239824 (0.0016) [2025-01-04 15:50:08,968][134211] Fps is (10 sec: 16793.0, 60 sec: 14199.4, 300 sec: 14676.2). Total num frames: 982355968. Throughput: 0: 3637.9. Samples: 234758562. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:08,969][134211] Avg episode reward: [(0, '10.595')] [2025-01-04 15:50:09,087][134294] Updated weights for policy 0, policy_version 239834 (0.0031) [2025-01-04 15:50:12,520][134294] Updated weights for policy 0, policy_version 239844 (0.0030) [2025-01-04 15:50:13,969][134211] Fps is (10 sec: 12696.3, 60 sec: 14062.8, 300 sec: 14523.4). Total num frames: 982417408. Throughput: 0: 3550.7. Samples: 234775294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:13,969][134211] Avg episode reward: [(0, '9.721')] [2025-01-04 15:50:15,977][134294] Updated weights for policy 0, policy_version 239854 (0.0027) [2025-01-04 15:50:18,968][134211] Fps is (10 sec: 12287.6, 60 sec: 13994.5, 300 sec: 14509.5). Total num frames: 982478848. Throughput: 0: 3523.1. Samples: 234784432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:18,969][134211] Avg episode reward: [(0, '10.751')] [2025-01-04 15:50:19,024][134294] Updated weights for policy 0, policy_version 239864 (0.0026) [2025-01-04 15:50:22,123][134294] Updated weights for policy 0, policy_version 239874 (0.0028) [2025-01-04 15:50:23,968][134211] Fps is (10 sec: 12698.5, 60 sec: 13926.4, 300 sec: 14495.7). Total num frames: 982544384. Throughput: 0: 3492.5. Samples: 234804428. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:23,969][134211] Avg episode reward: [(0, '9.417')] [2025-01-04 15:50:25,307][134294] Updated weights for policy 0, policy_version 239884 (0.0029) [2025-01-04 15:50:28,191][134294] Updated weights for policy 0, policy_version 239894 (0.0024) [2025-01-04 15:50:28,968][134211] Fps is (10 sec: 13517.5, 60 sec: 13926.4, 300 sec: 14509.6). Total num frames: 982614016. Throughput: 0: 3476.8. Samples: 234824518. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:28,968][134211] Avg episode reward: [(0, '10.131')] [2025-01-04 15:50:31,294][134294] Updated weights for policy 0, policy_version 239904 (0.0027) [2025-01-04 15:50:33,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13858.1, 300 sec: 14509.6). Total num frames: 982679552. Throughput: 0: 3471.7. Samples: 234834508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:33,968][134211] Avg episode reward: [(0, '10.018')] [2025-01-04 15:50:34,458][134294] Updated weights for policy 0, policy_version 239914 (0.0025) [2025-01-04 15:50:37,148][134294] Updated weights for policy 0, policy_version 239924 (0.0019) [2025-01-04 15:50:38,967][134211] Fps is (10 sec: 15155.4, 60 sec: 14131.2, 300 sec: 14565.1). Total num frames: 982765568. Throughput: 0: 3495.0. Samples: 234855500. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:38,968][134211] Avg episode reward: [(0, '9.639')] [2025-01-04 15:50:39,058][134294] Updated weights for policy 0, policy_version 239934 (0.0015) [2025-01-04 15:50:40,936][134294] Updated weights for policy 0, policy_version 239944 (0.0013) [2025-01-04 15:50:42,861][134294] Updated weights for policy 0, policy_version 239954 (0.0013) [2025-01-04 15:50:43,968][134211] Fps is (10 sec: 19251.2, 60 sec: 14745.6, 300 sec: 14703.9). Total num frames: 982872064. Throughput: 0: 3801.1. Samples: 234888130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:43,968][134211] Avg episode reward: [(0, '9.795')] [2025-01-04 15:50:44,691][134294] Updated weights for policy 0, policy_version 239964 (0.0013) [2025-01-04 15:50:46,840][134294] Updated weights for policy 0, policy_version 239974 (0.0018) [2025-01-04 15:50:48,968][134211] Fps is (10 sec: 19250.0, 60 sec: 15086.8, 300 sec: 14759.5). Total num frames: 982958080. Throughput: 0: 3805.9. Samples: 234903970. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 15:50:48,969][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 15:50:50,016][134294] Updated weights for policy 0, policy_version 239984 (0.0028) [2025-01-04 15:50:53,225][134294] Updated weights for policy 0, policy_version 239994 (0.0030) [2025-01-04 15:50:53,968][134211] Fps is (10 sec: 15155.1, 60 sec: 15086.9, 300 sec: 14787.2). Total num frames: 983023616. Throughput: 0: 3667.3. Samples: 234923590. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:50:53,968][134211] Avg episode reward: [(0, '9.186')] [2025-01-04 15:50:56,392][134294] Updated weights for policy 0, policy_version 240004 (0.0026) [2025-01-04 15:50:58,968][134211] Fps is (10 sec: 13107.7, 60 sec: 15018.6, 300 sec: 14787.2). Total num frames: 983089152. Throughput: 0: 3729.5. Samples: 234943118. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:50:58,968][134211] Avg episode reward: [(0, '10.168')] [2025-01-04 15:50:59,537][134294] Updated weights for policy 0, policy_version 240014 (0.0026) [2025-01-04 15:51:02,628][134294] Updated weights for policy 0, policy_version 240024 (0.0024) [2025-01-04 15:51:03,968][134211] Fps is (10 sec: 13106.6, 60 sec: 14404.1, 300 sec: 14648.4). Total num frames: 983154688. Throughput: 0: 3745.9. Samples: 234952998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:03,969][134211] Avg episode reward: [(0, '9.924')] [2025-01-04 15:51:05,591][134294] Updated weights for policy 0, policy_version 240034 (0.0024) [2025-01-04 15:51:08,540][134294] Updated weights for policy 0, policy_version 240044 (0.0024) [2025-01-04 15:51:08,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.6, 300 sec: 14537.3). Total num frames: 983224320. Throughput: 0: 3761.8. Samples: 234973708. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:08,968][134211] Avg episode reward: [(0, '10.103')] [2025-01-04 15:51:11,493][134294] Updated weights for policy 0, policy_version 240054 (0.0026) [2025-01-04 15:51:13,968][134211] Fps is (10 sec: 13927.0, 60 sec: 14609.2, 300 sec: 14551.2). Total num frames: 983293952. Throughput: 0: 3768.2. Samples: 234994090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:13,969][134211] Avg episode reward: [(0, '10.171')] [2025-01-04 15:51:13,979][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240062_983293952.pth... [2025-01-04 15:51:14,057][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000239201_979767296.pth [2025-01-04 15:51:14,618][134294] Updated weights for policy 0, policy_version 240064 (0.0029) [2025-01-04 15:51:17,565][134294] Updated weights for policy 0, policy_version 240074 (0.0028) [2025-01-04 15:51:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14677.5, 300 sec: 14537.4). Total num frames: 983359488. Throughput: 0: 3765.5. Samples: 235003956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:18,968][134211] Avg episode reward: [(0, '8.996')] [2025-01-04 15:51:20,552][134294] Updated weights for policy 0, policy_version 240084 (0.0025) [2025-01-04 15:51:23,442][134294] Updated weights for policy 0, policy_version 240094 (0.0024) [2025-01-04 15:51:23,968][134211] Fps is (10 sec: 13516.1, 60 sec: 14745.5, 300 sec: 14551.2). Total num frames: 983429120. Throughput: 0: 3767.3. Samples: 235025034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:23,969][134211] Avg episode reward: [(0, '9.747')] [2025-01-04 15:51:26,346][134294] Updated weights for policy 0, policy_version 240104 (0.0021) [2025-01-04 15:51:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 14537.3). Total num frames: 983498752. 
Throughput: 0: 3499.8. Samples: 235045622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:28,968][134211] Avg episode reward: [(0, '10.613')] [2025-01-04 15:51:29,458][134294] Updated weights for policy 0, policy_version 240114 (0.0024) [2025-01-04 15:51:32,425][134294] Updated weights for policy 0, policy_version 240124 (0.0024) [2025-01-04 15:51:33,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14813.9, 300 sec: 14537.3). Total num frames: 983568384. Throughput: 0: 3373.0. Samples: 235055754. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:33,968][134211] Avg episode reward: [(0, '10.501')] [2025-01-04 15:51:35,477][134294] Updated weights for policy 0, policy_version 240134 (0.0023) [2025-01-04 15:51:38,344][134294] Updated weights for policy 0, policy_version 240144 (0.0024) [2025-01-04 15:51:38,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14472.5, 300 sec: 14523.4). Total num frames: 983633920. Throughput: 0: 3400.1. Samples: 235076594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:38,968][134211] Avg episode reward: [(0, '8.998')] [2025-01-04 15:51:41,416][134294] Updated weights for policy 0, policy_version 240154 (0.0025) [2025-01-04 15:51:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13858.1, 300 sec: 14523.4). Total num frames: 983703552. Throughput: 0: 3415.9. Samples: 235096834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:43,968][134211] Avg episode reward: [(0, '9.508')] [2025-01-04 15:51:44,480][134294] Updated weights for policy 0, policy_version 240164 (0.0025) [2025-01-04 15:51:47,459][134294] Updated weights for policy 0, policy_version 240174 (0.0025) [2025-01-04 15:51:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 13516.9, 300 sec: 14509.6). Total num frames: 983769088. Throughput: 0: 3420.3. Samples: 235106908. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:48,968][134211] Avg episode reward: [(0, '9.601')] [2025-01-04 15:51:50,420][134294] Updated weights for policy 0, policy_version 240184 (0.0025) [2025-01-04 15:51:53,545][134294] Updated weights for policy 0, policy_version 240194 (0.0024) [2025-01-04 15:51:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 14509.6). Total num frames: 983838720. Throughput: 0: 3416.6. Samples: 235127454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:53,968][134211] Avg episode reward: [(0, '9.489')] [2025-01-04 15:51:56,582][134294] Updated weights for policy 0, policy_version 240204 (0.0026) [2025-01-04 15:51:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 14495.7). Total num frames: 983904256. Throughput: 0: 3401.9. Samples: 235147176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:51:58,968][134211] Avg episode reward: [(0, '9.364')] [2025-01-04 15:51:59,682][134294] Updated weights for policy 0, policy_version 240214 (0.0025) [2025-01-04 15:52:01,720][134294] Updated weights for policy 0, policy_version 240224 (0.0014) [2025-01-04 15:52:03,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13926.5, 300 sec: 14537.3). Total num frames: 983990272. Throughput: 0: 3460.6. Samples: 235159682. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:03,968][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 15:52:04,367][134294] Updated weights for policy 0, policy_version 240234 (0.0024) [2025-01-04 15:52:07,537][134294] Updated weights for policy 0, policy_version 240244 (0.0028) [2025-01-04 15:52:08,968][134211] Fps is (10 sec: 15155.1, 60 sec: 13858.1, 300 sec: 14537.3). Total num frames: 984055808. Throughput: 0: 3482.9. Samples: 235181762. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:08,968][134211] Avg episode reward: [(0, '9.796')] [2025-01-04 15:52:10,455][134294] Updated weights for policy 0, policy_version 240254 (0.0022) [2025-01-04 15:52:12,368][134294] Updated weights for policy 0, policy_version 240264 (0.0013) [2025-01-04 15:52:13,967][134211] Fps is (10 sec: 16384.3, 60 sec: 14336.1, 300 sec: 14634.5). Total num frames: 984154112. Throughput: 0: 3610.9. Samples: 235208110. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:13,968][134211] Avg episode reward: [(0, '9.145')] [2025-01-04 15:52:14,250][134294] Updated weights for policy 0, policy_version 240274 (0.0013) [2025-01-04 15:52:16,101][134294] Updated weights for policy 0, policy_version 240284 (0.0014) [2025-01-04 15:52:17,975][134294] Updated weights for policy 0, policy_version 240294 (0.0013) [2025-01-04 15:52:18,967][134211] Fps is (10 sec: 20890.0, 60 sec: 15087.0, 300 sec: 14759.5). Total num frames: 984264704. Throughput: 0: 3749.6. Samples: 235224484. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:18,968][134211] Avg episode reward: [(0, '9.700')] [2025-01-04 15:52:19,903][134294] Updated weights for policy 0, policy_version 240304 (0.0014) [2025-01-04 15:52:22,843][134294] Updated weights for policy 0, policy_version 240314 (0.0027) [2025-01-04 15:52:23,968][134211] Fps is (10 sec: 18431.6, 60 sec: 15155.4, 300 sec: 14773.4). Total num frames: 984338432. Throughput: 0: 3910.6. Samples: 235252570. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:23,968][134211] Avg episode reward: [(0, '10.548')] [2025-01-04 15:52:26,054][134294] Updated weights for policy 0, policy_version 240324 (0.0026) [2025-01-04 15:52:28,968][134211] Fps is (10 sec: 13516.5, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 984399872. Throughput: 0: 3882.5. Samples: 235271548. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:28,969][134211] Avg episode reward: [(0, '10.514')] [2025-01-04 15:52:29,396][134294] Updated weights for policy 0, policy_version 240334 (0.0028) [2025-01-04 15:52:32,412][134294] Updated weights for policy 0, policy_version 240344 (0.0028) [2025-01-04 15:52:33,968][134211] Fps is (10 sec: 13107.1, 60 sec: 15018.7, 300 sec: 14745.6). Total num frames: 984469504. Throughput: 0: 3875.4. Samples: 235281300. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:33,968][134211] Avg episode reward: [(0, '9.484')] [2025-01-04 15:52:35,471][134294] Updated weights for policy 0, policy_version 240354 (0.0026) [2025-01-04 15:52:38,403][134294] Updated weights for policy 0, policy_version 240364 (0.0025) [2025-01-04 15:52:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 15018.7, 300 sec: 14717.8). Total num frames: 984535040. Throughput: 0: 3873.3. Samples: 235301750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:38,968][134211] Avg episode reward: [(0, '9.631')] [2025-01-04 15:52:41,397][134294] Updated weights for policy 0, policy_version 240374 (0.0025) [2025-01-04 15:52:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 15018.7, 300 sec: 14662.3). Total num frames: 984604672. Throughput: 0: 3887.0. Samples: 235322092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:43,969][134211] Avg episode reward: [(0, '9.458')] [2025-01-04 15:52:44,453][134294] Updated weights for policy 0, policy_version 240384 (0.0027) [2025-01-04 15:52:47,415][134294] Updated weights for policy 0, policy_version 240394 (0.0026) [2025-01-04 15:52:48,968][134211] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 14662.3). Total num frames: 984674304. Throughput: 0: 3835.2. Samples: 235332264. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:48,968][134211] Avg episode reward: [(0, '11.224')] [2025-01-04 15:52:50,432][134294] Updated weights for policy 0, policy_version 240404 (0.0025) [2025-01-04 15:52:53,592][134294] Updated weights for policy 0, policy_version 240414 (0.0029) [2025-01-04 15:52:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 15018.6, 300 sec: 14648.4). Total num frames: 984739840. Throughput: 0: 3797.6. Samples: 235352654. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:53,969][134211] Avg episode reward: [(0, '9.857')] [2025-01-04 15:52:56,545][134294] Updated weights for policy 0, policy_version 240424 (0.0026) [2025-01-04 15:52:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 15018.7, 300 sec: 14634.5). Total num frames: 984805376. Throughput: 0: 3657.1. Samples: 235372680. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:52:58,968][134211] Avg episode reward: [(0, '10.673')] [2025-01-04 15:52:59,667][134294] Updated weights for policy 0, policy_version 240434 (0.0028) [2025-01-04 15:53:02,582][134294] Updated weights for policy 0, policy_version 240444 (0.0026) [2025-01-04 15:53:03,969][134211] Fps is (10 sec: 13515.7, 60 sec: 14745.3, 300 sec: 14565.0). Total num frames: 984875008. Throughput: 0: 3518.6. Samples: 235382824. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:03,969][134211] Avg episode reward: [(0, '9.171')] [2025-01-04 15:53:05,631][134294] Updated weights for policy 0, policy_version 240454 (0.0023) [2025-01-04 15:53:08,565][134294] Updated weights for policy 0, policy_version 240464 (0.0027) [2025-01-04 15:53:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14813.9, 300 sec: 14565.1). Total num frames: 984944640. Throughput: 0: 3356.5. Samples: 235403612. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:08,968][134211] Avg episode reward: [(0, '11.180')] [2025-01-04 15:53:11,445][134294] Updated weights for policy 0, policy_version 240474 (0.0025) [2025-01-04 15:53:13,968][134211] Fps is (10 sec: 13928.0, 60 sec: 14336.0, 300 sec: 14565.1). Total num frames: 985014272. Throughput: 0: 3393.0. Samples: 235424234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:13,968][134211] Avg episode reward: [(0, '9.714')] [2025-01-04 15:53:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240482_985014272.pth... [2025-01-04 15:53:14,064][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000239642_981573632.pth [2025-01-04 15:53:14,565][134294] Updated weights for policy 0, policy_version 240484 (0.0024) [2025-01-04 15:53:17,618][134294] Updated weights for policy 0, policy_version 240494 (0.0026) [2025-01-04 15:53:18,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13585.0, 300 sec: 14551.2). Total num frames: 985079808. Throughput: 0: 3395.9. Samples: 235434114. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:18,968][134211] Avg episode reward: [(0, '11.114')] [2025-01-04 15:53:20,524][134294] Updated weights for policy 0, policy_version 240504 (0.0025) [2025-01-04 15:53:23,446][134294] Updated weights for policy 0, policy_version 240514 (0.0026) [2025-01-04 15:53:23,968][134211] Fps is (10 sec: 13516.0, 60 sec: 13516.7, 300 sec: 14481.8). Total num frames: 985149440. Throughput: 0: 3409.1. Samples: 235455160. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:23,969][134211] Avg episode reward: [(0, '9.820')] [2025-01-04 15:53:26,385][134294] Updated weights for policy 0, policy_version 240524 (0.0024) [2025-01-04 15:53:28,968][134211] Fps is (10 sec: 13925.8, 60 sec: 13653.2, 300 sec: 14481.8). Total num frames: 985219072. 
Throughput: 0: 3414.1. Samples: 235475728. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:28,969][134211] Avg episode reward: [(0, '8.851')] [2025-01-04 15:53:29,445][134294] Updated weights for policy 0, policy_version 240534 (0.0024) [2025-01-04 15:53:32,187][134294] Updated weights for policy 0, policy_version 240544 (0.0022) [2025-01-04 15:53:33,968][134211] Fps is (10 sec: 15156.3, 60 sec: 13858.2, 300 sec: 14523.4). Total num frames: 985300992. Throughput: 0: 3414.4. Samples: 235485914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:33,968][134211] Avg episode reward: [(0, '9.241')] [2025-01-04 15:53:34,200][134294] Updated weights for policy 0, policy_version 240554 (0.0015) [2025-01-04 15:53:37,054][134294] Updated weights for policy 0, policy_version 240564 (0.0024) [2025-01-04 15:53:38,968][134211] Fps is (10 sec: 15565.4, 60 sec: 13994.6, 300 sec: 14523.5). Total num frames: 985374720. Throughput: 0: 3520.9. Samples: 235511092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:38,968][134211] Avg episode reward: [(0, '11.350')] [2025-01-04 15:53:40,118][134294] Updated weights for policy 0, policy_version 240574 (0.0026) [2025-01-04 15:53:42,966][134294] Updated weights for policy 0, policy_version 240584 (0.0024) [2025-01-04 15:53:43,968][134211] Fps is (10 sec: 14336.0, 60 sec: 13994.7, 300 sec: 14537.4). Total num frames: 985444352. Throughput: 0: 3536.0. Samples: 235531798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:43,968][134211] Avg episode reward: [(0, '9.939')] [2025-01-04 15:53:45,956][134294] Updated weights for policy 0, policy_version 240594 (0.0027) [2025-01-04 15:53:48,846][134294] Updated weights for policy 0, policy_version 240604 (0.0024) [2025-01-04 15:53:48,971][134211] Fps is (10 sec: 13922.1, 60 sec: 13993.9, 300 sec: 14537.2). Total num frames: 985513984. Throughput: 0: 3541.5. Samples: 235542198. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:48,971][134211] Avg episode reward: [(0, '10.702')] [2025-01-04 15:53:51,832][134294] Updated weights for policy 0, policy_version 240614 (0.0027) [2025-01-04 15:53:53,969][134211] Fps is (10 sec: 13515.3, 60 sec: 13994.5, 300 sec: 14509.5). Total num frames: 985579520. Throughput: 0: 3542.7. Samples: 235563036. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:53,969][134211] Avg episode reward: [(0, '11.193')] [2025-01-04 15:53:54,643][134294] Updated weights for policy 0, policy_version 240624 (0.0023) [2025-01-04 15:53:56,638][134294] Updated weights for policy 0, policy_version 240634 (0.0015) [2025-01-04 15:53:58,527][134294] Updated weights for policy 0, policy_version 240644 (0.0014) [2025-01-04 15:53:58,968][134211] Fps is (10 sec: 16798.9, 60 sec: 14609.1, 300 sec: 14523.4). Total num frames: 985681920. Throughput: 0: 3700.9. Samples: 235590776. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:53:58,968][134211] Avg episode reward: [(0, '10.757')] [2025-01-04 15:54:01,324][134294] Updated weights for policy 0, policy_version 240654 (0.0023) [2025-01-04 15:54:03,968][134211] Fps is (10 sec: 17614.4, 60 sec: 14677.6, 300 sec: 14412.4). Total num frames: 985755648. Throughput: 0: 3745.2. Samples: 235602648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:54:03,969][134211] Avg episode reward: [(0, '10.689')] [2025-01-04 15:54:04,210][134294] Updated weights for policy 0, policy_version 240664 (0.0023) [2025-01-04 15:54:07,258][134294] Updated weights for policy 0, policy_version 240674 (0.0026) [2025-01-04 15:54:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 985821184. Throughput: 0: 3731.7. Samples: 235623086. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:54:08,968][134211] Avg episode reward: [(0, '10.418')] [2025-01-04 15:54:10,304][134294] Updated weights for policy 0, policy_version 240684 (0.0027) [2025-01-04 15:54:13,170][134294] Updated weights for policy 0, policy_version 240694 (0.0025) [2025-01-04 15:54:13,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14609.1, 300 sec: 14412.4). Total num frames: 985890816. Throughput: 0: 3732.0. Samples: 235643668. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:54:13,968][134211] Avg episode reward: [(0, '11.141')] [2025-01-04 15:54:16,223][134294] Updated weights for policy 0, policy_version 240704 (0.0028) [2025-01-04 15:54:18,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 985956352. Throughput: 0: 3738.5. Samples: 235654148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:54:18,968][134211] Avg episode reward: [(0, '10.002')] [2025-01-04 15:54:19,218][134294] Updated weights for policy 0, policy_version 240714 (0.0028) [2025-01-04 15:54:22,261][134294] Updated weights for policy 0, policy_version 240724 (0.0025) [2025-01-04 15:54:23,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14609.2, 300 sec: 14398.5). Total num frames: 986025984. Throughput: 0: 3628.7. Samples: 235674382. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:23,968][134211] Avg episode reward: [(0, '10.809')] [2025-01-04 15:54:25,219][134294] Updated weights for policy 0, policy_version 240734 (0.0022) [2025-01-04 15:54:28,117][134294] Updated weights for policy 0, policy_version 240744 (0.0025) [2025-01-04 15:54:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14609.2, 300 sec: 14398.5). Total num frames: 986095616. Throughput: 0: 3631.0. Samples: 235695194. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:28,968][134211] Avg episode reward: [(0, '9.467')] [2025-01-04 15:54:31,125][134294] Updated weights for policy 0, policy_version 240754 (0.0025) [2025-01-04 15:54:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14404.2, 300 sec: 14398.5). Total num frames: 986165248. Throughput: 0: 3630.7. Samples: 235705568. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:33,968][134211] Avg episode reward: [(0, '9.462')] [2025-01-04 15:54:34,182][134294] Updated weights for policy 0, policy_version 240764 (0.0026) [2025-01-04 15:54:37,108][134294] Updated weights for policy 0, policy_version 240774 (0.0023) [2025-01-04 15:54:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14336.0, 300 sec: 14398.5). Total num frames: 986234880. Throughput: 0: 3620.3. Samples: 235725946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:38,968][134211] Avg episode reward: [(0, '9.318')] [2025-01-04 15:54:40,067][134294] Updated weights for policy 0, policy_version 240784 (0.0025) [2025-01-04 15:54:42,952][134294] Updated weights for policy 0, policy_version 240794 (0.0024) [2025-01-04 15:54:43,968][134211] Fps is (10 sec: 14336.3, 60 sec: 14404.3, 300 sec: 14426.3). Total num frames: 986308608. Throughput: 0: 3473.5. Samples: 235747084. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:43,968][134211] Avg episode reward: [(0, '9.039')] [2025-01-04 15:54:44,966][134294] Updated weights for policy 0, policy_version 240804 (0.0014) [2025-01-04 15:54:47,606][134294] Updated weights for policy 0, policy_version 240814 (0.0022) [2025-01-04 15:54:48,968][134211] Fps is (10 sec: 15564.8, 60 sec: 14609.8, 300 sec: 14481.8). Total num frames: 986390528. Throughput: 0: 3532.2. Samples: 235761596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:48,968][134211] Avg episode reward: [(0, '9.910')] [2025-01-04 15:54:50,643][134294] Updated weights for policy 0, policy_version 240824 (0.0023) [2025-01-04 15:54:53,538][134294] Updated weights for policy 0, policy_version 240834 (0.0024) [2025-01-04 15:54:53,968][134211] Fps is (10 sec: 15154.8, 60 sec: 14677.6, 300 sec: 14481.8). Total num frames: 986460160. Throughput: 0: 3544.3. Samples: 235782578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:53,968][134211] Avg episode reward: [(0, '9.055')] [2025-01-04 15:54:56,627][134294] Updated weights for policy 0, policy_version 240844 (0.0025) [2025-01-04 15:54:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14062.9, 300 sec: 14356.8). Total num frames: 986525696. Throughput: 0: 3531.4. Samples: 235802580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:54:58,968][134211] Avg episode reward: [(0, '9.337')] [2025-01-04 15:54:59,631][134294] Updated weights for policy 0, policy_version 240854 (0.0025) [2025-01-04 15:55:02,797][134294] Updated weights for policy 0, policy_version 240864 (0.0027) [2025-01-04 15:55:03,968][134211] Fps is (10 sec: 13517.0, 60 sec: 13994.7, 300 sec: 14370.7). Total num frames: 986595328. Throughput: 0: 3521.8. Samples: 235812628. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:03,968][134211] Avg episode reward: [(0, '9.783')] [2025-01-04 15:55:05,639][134294] Updated weights for policy 0, policy_version 240874 (0.0021) [2025-01-04 15:55:07,509][134294] Updated weights for policy 0, policy_version 240884 (0.0012) [2025-01-04 15:55:08,967][134211] Fps is (10 sec: 16384.4, 60 sec: 14472.6, 300 sec: 14481.8). Total num frames: 986689536. Throughput: 0: 3601.8. Samples: 235836464. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:08,968][134211] Avg episode reward: [(0, '9.220')] [2025-01-04 15:55:09,390][134294] Updated weights for policy 0, policy_version 240894 (0.0013) [2025-01-04 15:55:11,905][134294] Updated weights for policy 0, policy_version 240904 (0.0021) [2025-01-04 15:55:13,968][134211] Fps is (10 sec: 17202.8, 60 sec: 14609.0, 300 sec: 14537.3). Total num frames: 986767360. Throughput: 0: 3729.0. Samples: 235862998. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:13,969][134211] Avg episode reward: [(0, '10.601')] [2025-01-04 15:55:13,977][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240910_986767360.pth... [2025-01-04 15:55:14,066][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240062_983293952.pth [2025-01-04 15:55:15,155][134294] Updated weights for policy 0, policy_version 240914 (0.0026) [2025-01-04 15:55:18,288][134294] Updated weights for policy 0, policy_version 240924 (0.0025) [2025-01-04 15:55:18,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14609.1, 300 sec: 14537.3). Total num frames: 986832896. Throughput: 0: 3703.2. Samples: 235872210. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:18,968][134211] Avg episode reward: [(0, '10.585')] [2025-01-04 15:55:21,350][134294] Updated weights for policy 0, policy_version 240934 (0.0027) [2025-01-04 15:55:23,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14540.8, 300 sec: 14523.4). Total num frames: 986898432. Throughput: 0: 3699.4. Samples: 235892420. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:23,969][134211] Avg episode reward: [(0, '9.567')] [2025-01-04 15:55:24,416][134294] Updated weights for policy 0, policy_version 240944 (0.0026) [2025-01-04 15:55:27,395][134294] Updated weights for policy 0, policy_version 240954 (0.0025) [2025-01-04 15:55:28,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14472.5, 300 sec: 14523.4). Total num frames: 986963968. Throughput: 0: 3675.3. Samples: 235912472. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-04 15:55:28,969][134211] Avg episode reward: [(0, '9.765')] [2025-01-04 15:55:30,698][134294] Updated weights for policy 0, policy_version 240964 (0.0026) [2025-01-04 15:55:33,904][134294] Updated weights for policy 0, policy_version 240974 (0.0026) [2025-01-04 15:55:33,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 987029504. Throughput: 0: 3551.6. Samples: 235921420. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:33,968][134211] Avg episode reward: [(0, '10.231')] [2025-01-04 15:55:37,136][134294] Updated weights for policy 0, policy_version 240984 (0.0029) [2025-01-04 15:55:38,967][134211] Fps is (10 sec: 13926.8, 60 sec: 14472.6, 300 sec: 14343.0). Total num frames: 987103232. Throughput: 0: 3519.5. Samples: 235940956. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:38,968][134211] Avg episode reward: [(0, '10.308')] [2025-01-04 15:55:39,191][134294] Updated weights for policy 0, policy_version 240994 (0.0015) [2025-01-04 15:55:42,173][134294] Updated weights for policy 0, policy_version 241004 (0.0025) [2025-01-04 15:55:43,968][134211] Fps is (10 sec: 14335.7, 60 sec: 14404.2, 300 sec: 14287.4). Total num frames: 987172864. Throughput: 0: 3597.5. Samples: 235964470. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:43,969][134211] Avg episode reward: [(0, '9.848')] [2025-01-04 15:55:45,528][134294] Updated weights for policy 0, policy_version 241014 (0.0030) [2025-01-04 15:55:48,258][134294] Updated weights for policy 0, policy_version 241024 (0.0017) [2025-01-04 15:55:48,967][134211] Fps is (10 sec: 14336.1, 60 sec: 14267.8, 300 sec: 14315.2). Total num frames: 987246592. Throughput: 0: 3578.4. Samples: 235973656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:48,968][134211] Avg episode reward: [(0, '9.783')] [2025-01-04 15:55:50,163][134294] Updated weights for policy 0, policy_version 241034 (0.0013) [2025-01-04 15:55:52,218][134294] Updated weights for policy 0, policy_version 241044 (0.0015) [2025-01-04 15:55:53,968][134211] Fps is (10 sec: 18022.5, 60 sec: 14882.1, 300 sec: 14454.0). Total num frames: 987353088. Throughput: 0: 3684.6. Samples: 236002272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:53,968][134211] Avg episode reward: [(0, '8.911')] [2025-01-04 15:55:54,184][134294] Updated weights for policy 0, policy_version 241054 (0.0013) [2025-01-04 15:55:57,721][134294] Updated weights for policy 0, policy_version 241064 (0.0030) [2025-01-04 15:55:58,968][134211] Fps is (10 sec: 16383.8, 60 sec: 14745.6, 300 sec: 14426.3). Total num frames: 987410432. Throughput: 0: 3569.5. Samples: 236023626. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:55:58,968][134211] Avg episode reward: [(0, '9.561')] [2025-01-04 15:56:01,293][134294] Updated weights for policy 0, policy_version 241074 (0.0029) [2025-01-04 15:56:03,968][134211] Fps is (10 sec: 11878.8, 60 sec: 14609.1, 300 sec: 14398.5). Total num frames: 987471872. Throughput: 0: 3567.6. Samples: 236032752. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:03,968][134211] Avg episode reward: [(0, '9.812')] [2025-01-04 15:56:04,498][134294] Updated weights for policy 0, policy_version 241084 (0.0023) [2025-01-04 15:56:07,506][134294] Updated weights for policy 0, policy_version 241094 (0.0030) [2025-01-04 15:56:08,968][134211] Fps is (10 sec: 12697.4, 60 sec: 14131.1, 300 sec: 14384.6). Total num frames: 987537408. Throughput: 0: 3552.5. Samples: 236052284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:08,968][134211] Avg episode reward: [(0, '9.664')] [2025-01-04 15:56:10,544][134294] Updated weights for policy 0, policy_version 241104 (0.0026) [2025-01-04 15:56:13,641][134294] Updated weights for policy 0, policy_version 241114 (0.0028) [2025-01-04 15:56:13,968][134211] Fps is (10 sec: 13107.1, 60 sec: 13926.4, 300 sec: 14384.6). Total num frames: 987602944. Throughput: 0: 3550.3. Samples: 236072236. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:13,968][134211] Avg episode reward: [(0, '9.824')] [2025-01-04 15:56:17,011][134294] Updated weights for policy 0, policy_version 241124 (0.0027) [2025-01-04 15:56:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13858.2, 300 sec: 14356.9). Total num frames: 987664384. Throughput: 0: 3554.0. Samples: 236081350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:18,968][134211] Avg episode reward: [(0, '9.580')] [2025-01-04 15:56:20,196][134294] Updated weights for policy 0, policy_version 241134 (0.0025) [2025-01-04 15:56:23,281][134294] Updated weights for policy 0, policy_version 241144 (0.0026) [2025-01-04 15:56:23,969][134211] Fps is (10 sec: 13106.2, 60 sec: 13926.2, 300 sec: 14356.8). Total num frames: 987734016. Throughput: 0: 3559.3. Samples: 236101128. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:23,969][134211] Avg episode reward: [(0, '10.274')] [2025-01-04 15:56:26,302][134294] Updated weights for policy 0, policy_version 241154 (0.0026) [2025-01-04 15:56:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 13926.4, 300 sec: 14342.9). Total num frames: 987799552. Throughput: 0: 3477.3. Samples: 236120946. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:28,968][134211] Avg episode reward: [(0, '9.932')] [2025-01-04 15:56:29,554][134294] Updated weights for policy 0, policy_version 241164 (0.0025) [2025-01-04 15:56:32,756][134294] Updated weights for policy 0, policy_version 241174 (0.0026) [2025-01-04 15:56:33,968][134211] Fps is (10 sec: 12698.7, 60 sec: 13858.2, 300 sec: 14329.1). Total num frames: 987860992. Throughput: 0: 3474.1. Samples: 236129990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:33,968][134211] Avg episode reward: [(0, '9.432')] [2025-01-04 15:56:35,669][134294] Updated weights for policy 0, policy_version 241184 (0.0023) [2025-01-04 15:56:37,572][134294] Updated weights for policy 0, policy_version 241194 (0.0014) [2025-01-04 15:56:38,968][134211] Fps is (10 sec: 15155.3, 60 sec: 14131.2, 300 sec: 14398.5). Total num frames: 987951104. Throughput: 0: 3371.8. Samples: 236154002. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:38,968][134211] Avg episode reward: [(0, '10.680')] [2025-01-04 15:56:40,305][134294] Updated weights for policy 0, policy_version 241204 (0.0023) [2025-01-04 15:56:43,303][134294] Updated weights for policy 0, policy_version 241214 (0.0025) [2025-01-04 15:56:43,968][134211] Fps is (10 sec: 15974.1, 60 sec: 14131.2, 300 sec: 14412.4). Total num frames: 988020736. Throughput: 0: 3383.5. Samples: 236175886. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:43,968][134211] Avg episode reward: [(0, '9.629')] [2025-01-04 15:56:46,258][134294] Updated weights for policy 0, policy_version 241224 (0.0025) [2025-01-04 15:56:48,968][134211] Fps is (10 sec: 13516.7, 60 sec: 13994.6, 300 sec: 14398.5). Total num frames: 988086272. Throughput: 0: 3411.7. Samples: 236186280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:48,968][134211] Avg episode reward: [(0, '9.718')] [2025-01-04 15:56:49,482][134294] Updated weights for policy 0, policy_version 241234 (0.0027) [2025-01-04 15:56:52,378][134294] Updated weights for policy 0, policy_version 241244 (0.0023) [2025-01-04 15:56:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13380.3, 300 sec: 14412.4). Total num frames: 988155904. Throughput: 0: 3424.7. Samples: 236206394. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:53,969][134211] Avg episode reward: [(0, '9.076')] [2025-01-04 15:56:55,447][134294] Updated weights for policy 0, policy_version 241254 (0.0026) [2025-01-04 15:56:58,371][134294] Updated weights for policy 0, policy_version 241264 (0.0023) [2025-01-04 15:56:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13516.8, 300 sec: 14342.9). Total num frames: 988221440. Throughput: 0: 3433.6. Samples: 236226750. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:56:58,968][134211] Avg episode reward: [(0, '10.221')] [2025-01-04 15:57:01,447][134294] Updated weights for policy 0, policy_version 241274 (0.0027) [2025-01-04 15:57:03,968][134211] Fps is (10 sec: 13926.7, 60 sec: 13721.6, 300 sec: 14370.7). Total num frames: 988295168. Throughput: 0: 3454.8. Samples: 236236818. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:03,968][134211] Avg episode reward: [(0, '8.879')] [2025-01-04 15:57:04,036][134294] Updated weights for policy 0, policy_version 241284 (0.0018) [2025-01-04 15:57:06,623][134294] Updated weights for policy 0, policy_version 241294 (0.0020) [2025-01-04 15:57:08,968][134211] Fps is (10 sec: 14745.8, 60 sec: 13858.1, 300 sec: 14287.4). Total num frames: 988368896. Throughput: 0: 3531.0. Samples: 236260020. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:08,968][134211] Avg episode reward: [(0, '10.909')] [2025-01-04 15:57:09,780][134294] Updated weights for policy 0, policy_version 241304 (0.0030) [2025-01-04 15:57:12,876][134294] Updated weights for policy 0, policy_version 241314 (0.0026) [2025-01-04 15:57:13,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13858.2, 300 sec: 14134.7). Total num frames: 988434432. Throughput: 0: 3526.7. Samples: 236279646. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:13,968][134211] Avg episode reward: [(0, '9.968')] [2025-01-04 15:57:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000241317_988434432.pth... [2025-01-04 15:57:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240482_985014272.pth [2025-01-04 15:57:16,075][134294] Updated weights for policy 0, policy_version 241324 (0.0028) [2025-01-04 15:57:18,139][134294] Updated weights for policy 0, policy_version 241334 (0.0012) [2025-01-04 15:57:18,968][134211] Fps is (10 sec: 15155.1, 60 sec: 14267.7, 300 sec: 14176.3). Total num frames: 988520448. Throughput: 0: 3535.5. Samples: 236289090. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:18,968][134211] Avg episode reward: [(0, '9.720')] [2025-01-04 15:57:20,047][134294] Updated weights for policy 0, policy_version 241344 (0.0013) [2025-01-04 15:57:21,937][134294] Updated weights for policy 0, policy_version 241354 (0.0013) [2025-01-04 15:57:23,866][134294] Updated weights for policy 0, policy_version 241364 (0.0014) [2025-01-04 15:57:23,968][134211] Fps is (10 sec: 19250.5, 60 sec: 14882.3, 300 sec: 14329.0). Total num frames: 988626944. Throughput: 0: 3709.8. Samples: 236320942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:23,968][134211] Avg episode reward: [(0, '9.564')] [2025-01-04 15:57:25,723][134294] Updated weights for policy 0, policy_version 241374 (0.0014) [2025-01-04 15:57:28,706][134294] Updated weights for policy 0, policy_version 241384 (0.0027) [2025-01-04 15:57:28,968][134211] Fps is (10 sec: 18841.7, 60 sec: 15155.2, 300 sec: 14370.7). Total num frames: 988708864. Throughput: 0: 3843.9. Samples: 236348860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:28,968][134211] Avg episode reward: [(0, '8.880')] [2025-01-04 15:57:32,230][134294] Updated weights for policy 0, policy_version 241394 (0.0029) [2025-01-04 15:57:33,968][134211] Fps is (10 sec: 14336.4, 60 sec: 15155.2, 300 sec: 14356.8). Total num frames: 988770304. Throughput: 0: 3806.8. Samples: 236357586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:33,968][134211] Avg episode reward: [(0, '9.086')] [2025-01-04 15:57:35,301][134294] Updated weights for policy 0, policy_version 241404 (0.0029) [2025-01-04 15:57:38,451][134294] Updated weights for policy 0, policy_version 241414 (0.0025) [2025-01-04 15:57:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14745.6, 300 sec: 14343.0). Total num frames: 988835840. Throughput: 0: 3790.0. Samples: 236376942. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:38,968][134211] Avg episode reward: [(0, '9.005')] [2025-01-04 15:57:41,534][134294] Updated weights for policy 0, policy_version 241424 (0.0028) [2025-01-04 15:57:43,968][134211] Fps is (10 sec: 12697.5, 60 sec: 14609.1, 300 sec: 14315.2). Total num frames: 988897280. Throughput: 0: 3764.5. Samples: 236396152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:43,969][134211] Avg episode reward: [(0, '9.217')] [2025-01-04 15:57:45,138][134294] Updated weights for policy 0, policy_version 241434 (0.0030) [2025-01-04 15:57:48,460][134294] Updated weights for policy 0, policy_version 241444 (0.0026) [2025-01-04 15:57:48,968][134211] Fps is (10 sec: 12287.9, 60 sec: 14540.8, 300 sec: 14301.3). Total num frames: 988958720. Throughput: 0: 3733.2. Samples: 236404814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 15:57:48,968][134211] Avg episode reward: [(0, '9.143')] [2025-01-04 15:57:51,781][134294] Updated weights for policy 0, policy_version 241454 (0.0026) [2025-01-04 15:57:53,969][134211] Fps is (10 sec: 11877.4, 60 sec: 14335.8, 300 sec: 14273.5). Total num frames: 989016064. Throughput: 0: 3623.8. Samples: 236423094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:57:53,969][134211] Avg episode reward: [(0, '9.485')] [2025-01-04 15:57:55,555][134294] Updated weights for policy 0, policy_version 241464 (0.0030) [2025-01-04 15:57:58,502][134294] Updated weights for policy 0, policy_version 241474 (0.0024) [2025-01-04 15:57:58,969][134211] Fps is (10 sec: 12286.7, 60 sec: 14335.8, 300 sec: 14259.6). Total num frames: 989081600. Throughput: 0: 3595.8. Samples: 236441462. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:57:58,970][134211] Avg episode reward: [(0, '9.839')] [2025-01-04 15:58:01,668][134294] Updated weights for policy 0, policy_version 241484 (0.0024) [2025-01-04 15:58:03,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14199.3, 300 sec: 14245.7). Total num frames: 989147136. Throughput: 0: 3603.8. Samples: 236451262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:03,969][134211] Avg episode reward: [(0, '9.151')] [2025-01-04 15:58:04,755][134294] Updated weights for policy 0, policy_version 241494 (0.0024) [2025-01-04 15:58:07,766][134294] Updated weights for policy 0, policy_version 241504 (0.0025) [2025-01-04 15:58:08,968][134211] Fps is (10 sec: 13108.6, 60 sec: 14062.9, 300 sec: 14231.9). Total num frames: 989212672. Throughput: 0: 3340.4. Samples: 236471258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:08,968][134211] Avg episode reward: [(0, '9.276')] [2025-01-04 15:58:10,763][134294] Updated weights for policy 0, policy_version 241514 (0.0026) [2025-01-04 15:58:13,651][134294] Updated weights for policy 0, policy_version 241524 (0.0026) [2025-01-04 15:58:13,968][134211] Fps is (10 sec: 13927.3, 60 sec: 14199.5, 300 sec: 14259.6). Total num frames: 989286400. Throughput: 0: 3185.9. Samples: 236492224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:13,969][134211] Avg episode reward: [(0, '9.475')] [2025-01-04 15:58:16,595][134294] Updated weights for policy 0, policy_version 241534 (0.0024) [2025-01-04 15:58:18,968][134211] Fps is (10 sec: 13925.6, 60 sec: 13858.0, 300 sec: 14245.8). Total num frames: 989351936. Throughput: 0: 3220.8. Samples: 236502522. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:18,970][134211] Avg episode reward: [(0, '10.979')] [2025-01-04 15:58:19,626][134294] Updated weights for policy 0, policy_version 241544 (0.0027) [2025-01-04 15:58:22,664][134294] Updated weights for policy 0, policy_version 241554 (0.0025) [2025-01-04 15:58:23,969][134211] Fps is (10 sec: 13515.6, 60 sec: 13243.6, 300 sec: 14245.7). Total num frames: 989421568. Throughput: 0: 3243.5. Samples: 236522904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:23,969][134211] Avg episode reward: [(0, '10.093')] [2025-01-04 15:58:25,598][134294] Updated weights for policy 0, policy_version 241564 (0.0026) [2025-01-04 15:58:28,521][134294] Updated weights for policy 0, policy_version 241574 (0.0024) [2025-01-04 15:58:28,968][134211] Fps is (10 sec: 13927.3, 60 sec: 13038.9, 300 sec: 14204.1). Total num frames: 989491200. Throughput: 0: 3283.0. Samples: 236543886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:28,968][134211] Avg episode reward: [(0, '10.386')] [2025-01-04 15:58:31,427][134294] Updated weights for policy 0, policy_version 241584 (0.0025) [2025-01-04 15:58:33,968][134211] Fps is (10 sec: 13927.5, 60 sec: 13175.5, 300 sec: 14190.2). Total num frames: 989560832. Throughput: 0: 3321.4. Samples: 236554276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:33,968][134211] Avg episode reward: [(0, '9.786')] [2025-01-04 15:58:34,479][134294] Updated weights for policy 0, policy_version 241594 (0.0030) [2025-01-04 15:58:37,299][134294] Updated weights for policy 0, policy_version 241604 (0.0021) [2025-01-04 15:58:38,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13448.5, 300 sec: 14231.9). Total num frames: 989642752. Throughput: 0: 3376.5. Samples: 236575034. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:38,968][134211] Avg episode reward: [(0, '10.045')] [2025-01-04 15:58:39,400][134294] Updated weights for policy 0, policy_version 241614 (0.0016) [2025-01-04 15:58:42,250][134294] Updated weights for policy 0, policy_version 241624 (0.0023) [2025-01-04 15:58:43,968][134211] Fps is (10 sec: 15155.5, 60 sec: 13585.1, 300 sec: 14232.0). Total num frames: 989712384. Throughput: 0: 3507.5. Samples: 236599296. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:43,968][134211] Avg episode reward: [(0, '10.581')] [2025-01-04 15:58:45,277][134294] Updated weights for policy 0, policy_version 241634 (0.0022) [2025-01-04 15:58:48,285][134294] Updated weights for policy 0, policy_version 241644 (0.0026) [2025-01-04 15:58:48,968][134211] Fps is (10 sec: 13926.2, 60 sec: 13721.6, 300 sec: 14245.8). Total num frames: 989782016. Throughput: 0: 3522.3. Samples: 236609762. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:48,968][134211] Avg episode reward: [(0, '9.858')] [2025-01-04 15:58:51,451][134294] Updated weights for policy 0, policy_version 241654 (0.0026) [2025-01-04 15:58:53,506][134294] Updated weights for policy 0, policy_version 241664 (0.0013) [2025-01-04 15:58:53,969][134211] Fps is (10 sec: 15153.4, 60 sec: 14131.2, 300 sec: 14176.3). Total num frames: 989863936. Throughput: 0: 3527.8. Samples: 236630014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:53,969][134211] Avg episode reward: [(0, '10.373')] [2025-01-04 15:58:55,483][134294] Updated weights for policy 0, policy_version 241674 (0.0013) [2025-01-04 15:58:57,467][134294] Updated weights for policy 0, policy_version 241684 (0.0014) [2025-01-04 15:58:58,968][134211] Fps is (10 sec: 18841.6, 60 sec: 14814.1, 300 sec: 14287.4). Total num frames: 989970432. Throughput: 0: 3766.0. Samples: 236661694. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:58:58,968][134211] Avg episode reward: [(0, '9.686')] [2025-01-04 15:58:59,351][134294] Updated weights for policy 0, policy_version 241694 (0.0013) [2025-01-04 15:59:01,258][134294] Updated weights for policy 0, policy_version 241704 (0.0013) [2025-01-04 15:59:03,968][134211] Fps is (10 sec: 19253.2, 60 sec: 15155.4, 300 sec: 14356.8). Total num frames: 990056448. Throughput: 0: 3895.6. Samples: 236677822. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:03,969][134211] Avg episode reward: [(0, '10.803')] [2025-01-04 15:59:04,156][134294] Updated weights for policy 0, policy_version 241714 (0.0026) [2025-01-04 15:59:07,910][134294] Updated weights for policy 0, policy_version 241724 (0.0033) [2025-01-04 15:59:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 14287.4). Total num frames: 990105600. Throughput: 0: 3855.8. Samples: 236696410. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:08,969][134211] Avg episode reward: [(0, '10.648')] [2025-01-04 15:59:12,319][134294] Updated weights for policy 0, policy_version 241734 (0.0036) [2025-01-04 15:59:13,968][134211] Fps is (10 sec: 10239.9, 60 sec: 14540.8, 300 sec: 14245.7). Total num frames: 990158848. Throughput: 0: 3716.7. Samples: 236711140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:13,969][134211] Avg episode reward: [(0, '9.319')] [2025-01-04 15:59:14,036][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000241739_990162944.pth... 
[2025-01-04 15:59:14,107][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000240910_986767360.pth [2025-01-04 15:59:15,603][134294] Updated weights for policy 0, policy_version 241744 (0.0032) [2025-01-04 15:59:18,678][134294] Updated weights for policy 0, policy_version 241754 (0.0028) [2025-01-04 15:59:18,968][134211] Fps is (10 sec: 12288.1, 60 sec: 14609.2, 300 sec: 14245.7). Total num frames: 990228480. Throughput: 0: 3703.7. Samples: 236720942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:18,968][134211] Avg episode reward: [(0, '9.067')] [2025-01-04 15:59:21,702][134294] Updated weights for policy 0, policy_version 241764 (0.0025) [2025-01-04 15:59:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14541.0, 300 sec: 14231.9). Total num frames: 990294016. Throughput: 0: 3686.3. Samples: 236740916. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:23,968][134211] Avg episode reward: [(0, '9.450')] [2025-01-04 15:59:25,045][134294] Updated weights for policy 0, policy_version 241774 (0.0029) [2025-01-04 15:59:28,459][134294] Updated weights for policy 0, policy_version 241784 (0.0028) [2025-01-04 15:59:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14336.0, 300 sec: 14190.2). Total num frames: 990351360. Throughput: 0: 3550.7. Samples: 236759080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:28,968][134211] Avg episode reward: [(0, '8.944')] [2025-01-04 15:59:31,681][134294] Updated weights for policy 0, policy_version 241794 (0.0026) [2025-01-04 15:59:33,968][134211] Fps is (10 sec: 11878.2, 60 sec: 14199.5, 300 sec: 14162.4). Total num frames: 990412800. Throughput: 0: 3536.0. Samples: 236768882. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:33,969][134211] Avg episode reward: [(0, '9.296')] [2025-01-04 15:59:35,129][134294] Updated weights for policy 0, policy_version 241804 (0.0032) [2025-01-04 15:59:38,154][134294] Updated weights for policy 0, policy_version 241814 (0.0025) [2025-01-04 15:59:38,968][134211] Fps is (10 sec: 12697.7, 60 sec: 13926.4, 300 sec: 14134.7). Total num frames: 990478336. Throughput: 0: 3494.8. Samples: 236787276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:38,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 15:59:41,475][134294] Updated weights for policy 0, policy_version 241824 (0.0028) [2025-01-04 15:59:43,968][134211] Fps is (10 sec: 12288.0, 60 sec: 13721.6, 300 sec: 14051.4). Total num frames: 990535680. Throughput: 0: 3187.9. Samples: 236805148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:43,969][134211] Avg episode reward: [(0, '9.619')] [2025-01-04 15:59:44,882][134294] Updated weights for policy 0, policy_version 241834 (0.0023) [2025-01-04 15:59:46,797][134294] Updated weights for policy 0, policy_version 241844 (0.0013) [2025-01-04 15:59:48,760][134294] Updated weights for policy 0, policy_version 241854 (0.0013) [2025-01-04 15:59:48,967][134211] Fps is (10 sec: 15974.7, 60 sec: 14267.8, 300 sec: 14162.5). Total num frames: 990638080. Throughput: 0: 3119.0. Samples: 236818176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:48,968][134211] Avg episode reward: [(0, '9.148')] [2025-01-04 15:59:50,768][134294] Updated weights for policy 0, policy_version 241864 (0.0013) [2025-01-04 15:59:52,946][134294] Updated weights for policy 0, policy_version 241874 (0.0015) [2025-01-04 15:59:53,967][134211] Fps is (10 sec: 19661.4, 60 sec: 14472.8, 300 sec: 14259.6). Total num frames: 990732288. Throughput: 0: 3388.0. Samples: 236848868. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:53,968][134211] Avg episode reward: [(0, '9.881')] [2025-01-04 15:59:55,135][134294] Updated weights for policy 0, policy_version 241884 (0.0016) [2025-01-04 15:59:58,594][134294] Updated weights for policy 0, policy_version 241894 (0.0032) [2025-01-04 15:59:58,968][134211] Fps is (10 sec: 16383.8, 60 sec: 13858.2, 300 sec: 14259.6). Total num frames: 990801920. Throughput: 0: 3561.2. Samples: 236871392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 15:59:58,968][134211] Avg episode reward: [(0, '9.372')] [2025-01-04 16:00:01,949][134294] Updated weights for policy 0, policy_version 241904 (0.0028) [2025-01-04 16:00:03,969][134211] Fps is (10 sec: 12696.2, 60 sec: 13380.1, 300 sec: 14134.6). Total num frames: 990859264. Throughput: 0: 3545.4. Samples: 236880486. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:00:03,969][134211] Avg episode reward: [(0, '11.189')] [2025-01-04 16:00:05,254][134294] Updated weights for policy 0, policy_version 241914 (0.0032) [2025-01-04 16:00:08,371][134294] Updated weights for policy 0, policy_version 241924 (0.0023) [2025-01-04 16:00:08,969][134211] Fps is (10 sec: 12286.9, 60 sec: 13653.2, 300 sec: 14093.0). Total num frames: 990924800. Throughput: 0: 3521.8. Samples: 236899402. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:00:08,969][134211] Avg episode reward: [(0, '10.409')] [2025-01-04 16:00:11,419][134294] Updated weights for policy 0, policy_version 241934 (0.0025) [2025-01-04 16:00:13,968][134211] Fps is (10 sec: 13108.3, 60 sec: 13858.1, 300 sec: 14093.0). Total num frames: 990990336. Throughput: 0: 3556.3. Samples: 236919114. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:13,969][134211] Avg episode reward: [(0, '10.602')] [2025-01-04 16:00:14,598][134294] Updated weights for policy 0, policy_version 241944 (0.0028) [2025-01-04 16:00:17,788][134294] Updated weights for policy 0, policy_version 241954 (0.0026) [2025-01-04 16:00:18,968][134211] Fps is (10 sec: 13108.2, 60 sec: 13789.9, 300 sec: 14093.0). Total num frames: 991055872. Throughput: 0: 3553.1. Samples: 236928770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:18,968][134211] Avg episode reward: [(0, '8.846')] [2025-01-04 16:00:20,767][134294] Updated weights for policy 0, policy_version 241964 (0.0026) [2025-01-04 16:00:23,688][134294] Updated weights for policy 0, policy_version 241974 (0.0024) [2025-01-04 16:00:23,969][134211] Fps is (10 sec: 13925.1, 60 sec: 13926.1, 300 sec: 14120.7). Total num frames: 991129600. Throughput: 0: 3604.6. Samples: 236949488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:23,969][134211] Avg episode reward: [(0, '10.757')] [2025-01-04 16:00:26,614][134294] Updated weights for policy 0, policy_version 241984 (0.0026) [2025-01-04 16:00:28,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14062.9, 300 sec: 14120.8). Total num frames: 991195136. Throughput: 0: 3661.0. Samples: 236969894. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:28,969][134211] Avg episode reward: [(0, '9.598')] [2025-01-04 16:00:29,711][134294] Updated weights for policy 0, policy_version 241994 (0.0026) [2025-01-04 16:00:32,737][134294] Updated weights for policy 0, policy_version 242004 (0.0025) [2025-01-04 16:00:33,968][134211] Fps is (10 sec: 13518.2, 60 sec: 14199.5, 300 sec: 14106.9). Total num frames: 991264768. Throughput: 0: 3597.8. Samples: 236980078. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:33,968][134211] Avg episode reward: [(0, '10.213')] [2025-01-04 16:00:35,650][134294] Updated weights for policy 0, policy_version 242014 (0.0023) [2025-01-04 16:00:38,599][134294] Updated weights for policy 0, policy_version 242024 (0.0025) [2025-01-04 16:00:38,971][134211] Fps is (10 sec: 13922.4, 60 sec: 14267.0, 300 sec: 14106.8). Total num frames: 991334400. Throughput: 0: 3379.7. Samples: 237000964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:38,971][134211] Avg episode reward: [(0, '9.843')] [2025-01-04 16:00:41,506][134294] Updated weights for policy 0, policy_version 242034 (0.0026) [2025-01-04 16:00:43,969][134211] Fps is (10 sec: 13924.8, 60 sec: 14472.3, 300 sec: 14092.9). Total num frames: 991404032. Throughput: 0: 3335.8. Samples: 237021506. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:43,969][134211] Avg episode reward: [(0, '9.365')] [2025-01-04 16:00:44,651][134294] Updated weights for policy 0, policy_version 242044 (0.0025) [2025-01-04 16:00:47,605][134294] Updated weights for policy 0, policy_version 242054 (0.0027) [2025-01-04 16:00:48,968][134211] Fps is (10 sec: 13520.9, 60 sec: 13858.1, 300 sec: 13954.2). Total num frames: 991469568. Throughput: 0: 3359.4. Samples: 237031658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:48,968][134211] Avg episode reward: [(0, '9.454')] [2025-01-04 16:00:50,542][134294] Updated weights for policy 0, policy_version 242064 (0.0025) [2025-01-04 16:00:53,451][134294] Updated weights for policy 0, policy_version 242074 (0.0023) [2025-01-04 16:00:53,968][134211] Fps is (10 sec: 13518.3, 60 sec: 13448.5, 300 sec: 13995.8). Total num frames: 991539200. Throughput: 0: 3404.4. Samples: 237052596. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:53,968][134211] Avg episode reward: [(0, '9.946')] [2025-01-04 16:00:56,455][134294] Updated weights for policy 0, policy_version 242084 (0.0025) [2025-01-04 16:00:58,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13448.5, 300 sec: 14023.6). Total num frames: 991608832. Throughput: 0: 3423.7. Samples: 237073180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:00:58,968][134211] Avg episode reward: [(0, '9.348')] [2025-01-04 16:00:59,505][134294] Updated weights for policy 0, policy_version 242094 (0.0027) [2025-01-04 16:01:02,439][134294] Updated weights for policy 0, policy_version 242104 (0.0022) [2025-01-04 16:01:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.3, 300 sec: 14023.6). Total num frames: 991674368. Throughput: 0: 3433.1. Samples: 237083258. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:01:03,968][134211] Avg episode reward: [(0, '9.149')] [2025-01-04 16:01:05,482][134294] Updated weights for policy 0, policy_version 242114 (0.0025) [2025-01-04 16:01:08,361][134294] Updated weights for policy 0, policy_version 242124 (0.0025) [2025-01-04 16:01:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 13721.8, 300 sec: 14051.4). Total num frames: 991748096. Throughput: 0: 3435.0. Samples: 237104058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:01:08,968][134211] Avg episode reward: [(0, '9.588')] [2025-01-04 16:01:11,371][134294] Updated weights for policy 0, policy_version 242134 (0.0027) [2025-01-04 16:01:13,968][134211] Fps is (10 sec: 13926.4, 60 sec: 13721.6, 300 sec: 14065.2). Total num frames: 991813632. Throughput: 0: 3440.2. Samples: 237124702. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:01:13,969][134211] Avg episode reward: [(0, '8.966')] [2025-01-04 16:01:13,976][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242142_991813632.pth... 
[2025-01-04 16:01:14,062][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000241317_988434432.pth [2025-01-04 16:01:14,361][134294] Updated weights for policy 0, policy_version 242144 (0.0025) [2025-01-04 16:01:17,373][134294] Updated weights for policy 0, policy_version 242154 (0.0027) [2025-01-04 16:01:18,968][134211] Fps is (10 sec: 13107.2, 60 sec: 13721.6, 300 sec: 14051.4). Total num frames: 991879168. Throughput: 0: 3432.9. Samples: 237134556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:01:18,968][134211] Avg episode reward: [(0, '9.784')] [2025-01-04 16:01:20,370][134294] Updated weights for policy 0, policy_version 242164 (0.0023) [2025-01-04 16:01:22,244][134294] Updated weights for policy 0, policy_version 242174 (0.0013) [2025-01-04 16:01:23,968][134211] Fps is (10 sec: 16794.0, 60 sec: 14199.7, 300 sec: 14176.3). Total num frames: 991981568. Throughput: 0: 3513.6. Samples: 237159066. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:23,968][134211] Avg episode reward: [(0, '9.235')] [2025-01-04 16:01:24,159][134294] Updated weights for policy 0, policy_version 242184 (0.0015) [2025-01-04 16:01:26,627][134294] Updated weights for policy 0, policy_version 242194 (0.0021) [2025-01-04 16:01:28,968][134211] Fps is (10 sec: 17612.3, 60 sec: 14336.0, 300 sec: 14218.0). Total num frames: 992055296. Throughput: 0: 3636.3. Samples: 237185136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:28,969][134211] Avg episode reward: [(0, '10.157')] [2025-01-04 16:01:29,629][134294] Updated weights for policy 0, policy_version 242204 (0.0026) [2025-01-04 16:01:32,780][134294] Updated weights for policy 0, policy_version 242214 (0.0026) [2025-01-04 16:01:33,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 992120832. Throughput: 0: 3631.8. Samples: 237195088. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:33,968][134211] Avg episode reward: [(0, '11.225')] [2025-01-04 16:01:35,814][134294] Updated weights for policy 0, policy_version 242224 (0.0025) [2025-01-04 16:01:38,697][134294] Updated weights for policy 0, policy_version 242234 (0.0023) [2025-01-04 16:01:38,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14268.4, 300 sec: 14134.7). Total num frames: 992190464. Throughput: 0: 3621.3. Samples: 237215556. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:38,968][134211] Avg episode reward: [(0, '10.172')] [2025-01-04 16:01:41,708][134294] Updated weights for policy 0, policy_version 242244 (0.0025) [2025-01-04 16:01:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14268.0, 300 sec: 14148.6). Total num frames: 992260096. Throughput: 0: 3619.1. Samples: 237236038. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:43,968][134211] Avg episode reward: [(0, '9.457')] [2025-01-04 16:01:44,780][134294] Updated weights for policy 0, policy_version 242254 (0.0025) [2025-01-04 16:01:47,724][134294] Updated weights for policy 0, policy_version 242264 (0.0024) [2025-01-04 16:01:48,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14267.7, 300 sec: 14134.7). Total num frames: 992325632. Throughput: 0: 3616.9. Samples: 237246020. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:48,968][134211] Avg episode reward: [(0, '10.707')] [2025-01-04 16:01:50,753][134294] Updated weights for policy 0, policy_version 242274 (0.0025) [2025-01-04 16:01:53,731][134294] Updated weights for policy 0, policy_version 242284 (0.0022) [2025-01-04 16:01:53,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14336.0, 300 sec: 14162.4). Total num frames: 992399360. Throughput: 0: 3614.6. Samples: 237266714. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:53,968][134211] Avg episode reward: [(0, '8.709')] [2025-01-04 16:01:56,596][134294] Updated weights for policy 0, policy_version 242294 (0.0025) [2025-01-04 16:01:58,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 992464896. Throughput: 0: 3616.7. Samples: 237287454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:01:58,968][134211] Avg episode reward: [(0, '10.459')] [2025-01-04 16:01:59,687][134294] Updated weights for policy 0, policy_version 242304 (0.0026) [2025-01-04 16:02:02,637][134294] Updated weights for policy 0, policy_version 242314 (0.0027) [2025-01-04 16:02:03,968][134211] Fps is (10 sec: 13515.9, 60 sec: 14335.9, 300 sec: 14120.8). Total num frames: 992534528. Throughput: 0: 3622.0. Samples: 237297550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:03,969][134211] Avg episode reward: [(0, '10.210')] [2025-01-04 16:02:05,612][134294] Updated weights for policy 0, policy_version 242324 (0.0025) [2025-01-04 16:02:08,562][134294] Updated weights for policy 0, policy_version 242334 (0.0026) [2025-01-04 16:02:08,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14267.8, 300 sec: 14134.7). Total num frames: 992604160. Throughput: 0: 3543.5. Samples: 237318522. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:08,968][134211] Avg episode reward: [(0, '9.174')] [2025-01-04 16:02:11,585][134294] Updated weights for policy 0, policy_version 242344 (0.0027) [2025-01-04 16:02:13,968][134211] Fps is (10 sec: 14336.9, 60 sec: 14404.3, 300 sec: 14093.0). Total num frames: 992677888. Throughput: 0: 3418.1. Samples: 237338948. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:13,968][134211] Avg episode reward: [(0, '10.074')] [2025-01-04 16:02:14,196][134294] Updated weights for policy 0, policy_version 242354 (0.0019) [2025-01-04 16:02:16,047][134294] Updated weights for policy 0, policy_version 242364 (0.0013) [2025-01-04 16:02:18,127][134294] Updated weights for policy 0, policy_version 242374 (0.0015) [2025-01-04 16:02:18,968][134211] Fps is (10 sec: 17203.1, 60 sec: 14950.4, 300 sec: 14065.3). Total num frames: 992776192. Throughput: 0: 3547.0. Samples: 237354702. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:18,968][134211] Avg episode reward: [(0, '8.854')] [2025-01-04 16:02:21,142][134294] Updated weights for policy 0, policy_version 242384 (0.0025) [2025-01-04 16:02:23,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14267.7, 300 sec: 13995.8). Total num frames: 992837632. Throughput: 0: 3604.6. Samples: 237377762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:23,968][134211] Avg episode reward: [(0, '8.997')] [2025-01-04 16:02:24,365][134294] Updated weights for policy 0, policy_version 242394 (0.0026) [2025-01-04 16:02:27,379][134294] Updated weights for policy 0, policy_version 242404 (0.0023) [2025-01-04 16:02:28,968][134211] Fps is (10 sec: 13106.1, 60 sec: 14199.3, 300 sec: 14023.6). Total num frames: 992907264. Throughput: 0: 3589.3. Samples: 237397558. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:02:28,969][134211] Avg episode reward: [(0, '9.940')] [2025-01-04 16:02:30,401][134294] Updated weights for policy 0, policy_version 242414 (0.0028) [2025-01-04 16:02:33,438][134294] Updated weights for policy 0, policy_version 242424 (0.0024) [2025-01-04 16:02:33,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14199.5, 300 sec: 14023.6). Total num frames: 992972800. Throughput: 0: 3597.6. Samples: 237407914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:33,968][134211] Avg episode reward: [(0, '8.728')] [2025-01-04 16:02:36,400][134294] Updated weights for policy 0, policy_version 242434 (0.0027) [2025-01-04 16:02:38,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14199.5, 300 sec: 14051.4). Total num frames: 993042432. Throughput: 0: 3584.0. Samples: 237427996. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:38,969][134211] Avg episode reward: [(0, '10.809')] [2025-01-04 16:02:39,596][134294] Updated weights for policy 0, policy_version 242444 (0.0027) [2025-01-04 16:02:42,643][134294] Updated weights for policy 0, policy_version 242454 (0.0024) [2025-01-04 16:02:43,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14131.2, 300 sec: 14065.3). Total num frames: 993107968. Throughput: 0: 3567.5. Samples: 237447990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:43,968][134211] Avg episode reward: [(0, '9.530')] [2025-01-04 16:02:44,977][134294] Updated weights for policy 0, policy_version 242464 (0.0018) [2025-01-04 16:02:46,851][134294] Updated weights for policy 0, policy_version 242474 (0.0015) [2025-01-04 16:02:48,702][134294] Updated weights for policy 0, policy_version 242484 (0.0014) [2025-01-04 16:02:48,967][134211] Fps is (10 sec: 17613.3, 60 sec: 14882.2, 300 sec: 14245.8). Total num frames: 993218560. Throughput: 0: 3674.2. Samples: 237462886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:48,968][134211] Avg episode reward: [(0, '9.869')] [2025-01-04 16:02:50,625][134294] Updated weights for policy 0, policy_version 242494 (0.0013) [2025-01-04 16:02:52,775][134294] Updated weights for policy 0, policy_version 242504 (0.0016) [2025-01-04 16:02:53,968][134211] Fps is (10 sec: 20479.5, 60 sec: 15223.4, 300 sec: 14343.0). Total num frames: 993312768. Throughput: 0: 3924.7. Samples: 237495134. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:53,969][134211] Avg episode reward: [(0, '9.776')] [2025-01-04 16:02:55,852][134294] Updated weights for policy 0, policy_version 242514 (0.0026) [2025-01-04 16:02:58,968][134211] Fps is (10 sec: 15564.4, 60 sec: 15155.2, 300 sec: 14329.1). Total num frames: 993374208. Throughput: 0: 3921.7. Samples: 237515426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:02:58,968][134211] Avg episode reward: [(0, '10.806')] [2025-01-04 16:02:58,984][134294] Updated weights for policy 0, policy_version 242524 (0.0027) [2025-01-04 16:03:01,960][134294] Updated weights for policy 0, policy_version 242534 (0.0025) [2025-01-04 16:03:03,968][134211] Fps is (10 sec: 13107.3, 60 sec: 15155.3, 300 sec: 14342.9). Total num frames: 993443840. Throughput: 0: 3788.6. Samples: 237525188. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:03,968][134211] Avg episode reward: [(0, '9.369')] [2025-01-04 16:03:05,130][134294] Updated weights for policy 0, policy_version 242544 (0.0027) [2025-01-04 16:03:08,099][134294] Updated weights for policy 0, policy_version 242554 (0.0027) [2025-01-04 16:03:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15086.9, 300 sec: 14315.2). Total num frames: 993509376. Throughput: 0: 3728.8. Samples: 237545558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:08,968][134211] Avg episode reward: [(0, '10.550')] [2025-01-04 16:03:11,091][134294] Updated weights for policy 0, policy_version 242564 (0.0025) [2025-01-04 16:03:13,968][134211] Fps is (10 sec: 13516.7, 60 sec: 15018.6, 300 sec: 14329.1). Total num frames: 993579008. Throughput: 0: 3737.6. Samples: 237565748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:13,969][134211] Avg episode reward: [(0, '9.468')] [2025-01-04 16:03:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242573_993579008.pth... 
[2025-01-04 16:03:14,060][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000241739_990162944.pth [2025-01-04 16:03:14,137][134294] Updated weights for policy 0, policy_version 242574 (0.0026) [2025-01-04 16:03:17,117][134294] Updated weights for policy 0, policy_version 242584 (0.0025) [2025-01-04 16:03:18,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14540.8, 300 sec: 14329.1). Total num frames: 993648640. Throughput: 0: 3729.9. Samples: 237575758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:18,968][134211] Avg episode reward: [(0, '9.596')] [2025-01-04 16:03:20,140][134294] Updated weights for policy 0, policy_version 242594 (0.0026) [2025-01-04 16:03:23,159][134294] Updated weights for policy 0, policy_version 242604 (0.0025) [2025-01-04 16:03:23,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14609.0, 300 sec: 14315.2). Total num frames: 993714176. Throughput: 0: 3747.2. Samples: 237596618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:23,968][134211] Avg episode reward: [(0, '8.935')] [2025-01-04 16:03:26,067][134294] Updated weights for policy 0, policy_version 242614 (0.0024) [2025-01-04 16:03:28,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14609.2, 300 sec: 14315.2). Total num frames: 993783808. Throughput: 0: 3757.9. Samples: 237617096. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:28,969][134211] Avg episode reward: [(0, '9.977')] [2025-01-04 16:03:29,145][134294] Updated weights for policy 0, policy_version 242624 (0.0026) [2025-01-04 16:03:32,084][134294] Updated weights for policy 0, policy_version 242634 (0.0023) [2025-01-04 16:03:33,968][134211] Fps is (10 sec: 13926.6, 60 sec: 14677.3, 300 sec: 14273.5). Total num frames: 993853440. Throughput: 0: 3653.1. Samples: 237627274. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:33,968][134211] Avg episode reward: [(0, '9.667')] [2025-01-04 16:03:35,054][134294] Updated weights for policy 0, policy_version 242644 (0.0025) [2025-01-04 16:03:37,970][134294] Updated weights for policy 0, policy_version 242654 (0.0025) [2025-01-04 16:03:38,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14677.3, 300 sec: 14273.5). Total num frames: 993923072. Throughput: 0: 3401.6. Samples: 237648206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:03:38,968][134211] Avg episode reward: [(0, '10.050')] [2025-01-04 16:03:40,862][134294] Updated weights for policy 0, policy_version 242664 (0.0026) [2025-01-04 16:03:43,847][134294] Updated weights for policy 0, policy_version 242674 (0.0025) [2025-01-04 16:03:43,968][134211] Fps is (10 sec: 13926.2, 60 sec: 14745.6, 300 sec: 14273.5). Total num frames: 993992704. Throughput: 0: 3419.1. Samples: 237669284. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:03:43,968][134211] Avg episode reward: [(0, '9.877')] [2025-01-04 16:03:46,709][134294] Updated weights for policy 0, policy_version 242684 (0.0026) [2025-01-04 16:03:48,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14062.9, 300 sec: 14231.9). Total num frames: 994062336. Throughput: 0: 3431.0. Samples: 237679582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:03:48,968][134211] Avg episode reward: [(0, '9.490')] [2025-01-04 16:03:49,895][134294] Updated weights for policy 0, policy_version 242694 (0.0024) [2025-01-04 16:03:52,862][134294] Updated weights for policy 0, policy_version 242704 (0.0026) [2025-01-04 16:03:53,968][134211] Fps is (10 sec: 13516.8, 60 sec: 13585.1, 300 sec: 14093.0). Total num frames: 994127872. Throughput: 0: 3430.2. Samples: 237699918. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:03:53,968][134211] Avg episode reward: [(0, '9.121')] [2025-01-04 16:03:55,744][134294] Updated weights for policy 0, policy_version 242714 (0.0026) [2025-01-04 16:03:57,852][134294] Updated weights for policy 0, policy_version 242724 (0.0016) [2025-01-04 16:03:58,968][134211] Fps is (10 sec: 15155.2, 60 sec: 13994.7, 300 sec: 14093.0). Total num frames: 994213888. Throughput: 0: 3521.3. Samples: 237724206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:03:58,968][134211] Avg episode reward: [(0, '9.608')] [2025-01-04 16:04:00,346][134294] Updated weights for policy 0, policy_version 242734 (0.0022) [2025-01-04 16:04:03,368][134294] Updated weights for policy 0, policy_version 242744 (0.0026) [2025-01-04 16:04:03,968][134211] Fps is (10 sec: 15974.4, 60 sec: 14062.9, 300 sec: 14176.3). Total num frames: 994287616. Throughput: 0: 3552.7. Samples: 237735630. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:03,968][134211] Avg episode reward: [(0, '10.203')] [2025-01-04 16:04:06,239][134294] Updated weights for policy 0, policy_version 242754 (0.0024) [2025-01-04 16:04:08,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14062.9, 300 sec: 14218.0). Total num frames: 994353152. Throughput: 0: 3545.7. Samples: 237756174. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:08,968][134211] Avg episode reward: [(0, '9.576')] [2025-01-04 16:04:09,415][134294] Updated weights for policy 0, policy_version 242764 (0.0024) [2025-01-04 16:04:11,409][134294] Updated weights for policy 0, policy_version 242774 (0.0013) [2025-01-04 16:04:13,968][134211] Fps is (10 sec: 15155.4, 60 sec: 14336.1, 300 sec: 14273.5). Total num frames: 994439168. Throughput: 0: 3646.4. Samples: 237781184. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:13,968][134211] Avg episode reward: [(0, '10.454')] [2025-01-04 16:04:14,013][134294] Updated weights for policy 0, policy_version 242784 (0.0023) [2025-01-04 16:04:16,996][134294] Updated weights for policy 0, policy_version 242794 (0.0026) [2025-01-04 16:04:18,968][134211] Fps is (10 sec: 15564.9, 60 sec: 14336.0, 300 sec: 14287.4). Total num frames: 994508800. Throughput: 0: 3645.9. Samples: 237791338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:18,968][134211] Avg episode reward: [(0, '9.787')] [2025-01-04 16:04:20,018][134294] Updated weights for policy 0, policy_version 242804 (0.0025) [2025-01-04 16:04:22,963][134294] Updated weights for policy 0, policy_version 242814 (0.0024) [2025-01-04 16:04:23,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 994578432. Throughput: 0: 3644.6. Samples: 237812212. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:23,968][134211] Avg episode reward: [(0, '9.657')] [2025-01-04 16:04:25,866][134294] Updated weights for policy 0, policy_version 242824 (0.0025) [2025-01-04 16:04:28,846][134294] Updated weights for policy 0, policy_version 242834 (0.0023) [2025-01-04 16:04:28,968][134211] Fps is (10 sec: 13925.9, 60 sec: 14404.2, 300 sec: 14356.8). Total num frames: 994648064. Throughput: 0: 3641.5. Samples: 237833154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:28,969][134211] Avg episode reward: [(0, '9.093')] [2025-01-04 16:04:31,731][134294] Updated weights for policy 0, policy_version 242844 (0.0026) [2025-01-04 16:04:33,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14404.3, 300 sec: 14370.7). Total num frames: 994717696. Throughput: 0: 3642.3. Samples: 237843486. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:33,968][134211] Avg episode reward: [(0, '9.887')] [2025-01-04 16:04:34,774][134294] Updated weights for policy 0, policy_version 242854 (0.0027) [2025-01-04 16:04:37,804][134294] Updated weights for policy 0, policy_version 242864 (0.0027) [2025-01-04 16:04:38,968][134211] Fps is (10 sec: 13926.7, 60 sec: 14404.3, 300 sec: 14412.4). Total num frames: 994787328. Throughput: 0: 3643.5. Samples: 237863874. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:38,968][134211] Avg episode reward: [(0, '10.193')] [2025-01-04 16:04:40,276][134294] Updated weights for policy 0, policy_version 242874 (0.0019) [2025-01-04 16:04:42,416][134294] Updated weights for policy 0, policy_version 242884 (0.0015) [2025-01-04 16:04:43,968][134211] Fps is (10 sec: 15564.5, 60 sec: 14677.3, 300 sec: 14356.8). Total num frames: 994873344. Throughput: 0: 3665.8. Samples: 237889166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:43,969][134211] Avg episode reward: [(0, '9.736')] [2025-01-04 16:04:45,339][134294] Updated weights for policy 0, policy_version 242894 (0.0025) [2025-01-04 16:04:48,264][134294] Updated weights for policy 0, policy_version 242904 (0.0026) [2025-01-04 16:04:48,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14609.1, 300 sec: 14259.6). Total num frames: 994938880. Throughput: 0: 3641.8. Samples: 237899510. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:48,968][134211] Avg episode reward: [(0, '10.949')] [2025-01-04 16:04:51,228][134294] Updated weights for policy 0, policy_version 242914 (0.0025) [2025-01-04 16:04:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14677.4, 300 sec: 14259.6). Total num frames: 995008512. Throughput: 0: 3641.4. Samples: 237920036. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:53,968][134211] Avg episode reward: [(0, '10.088')] [2025-01-04 16:04:54,388][134294] Updated weights for policy 0, policy_version 242924 (0.0025) [2025-01-04 16:04:57,318][134294] Updated weights for policy 0, policy_version 242934 (0.0024) [2025-01-04 16:04:58,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14336.0, 300 sec: 14287.4). Total num frames: 995074048. Throughput: 0: 3528.5. Samples: 237939966. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:04:58,968][134211] Avg episode reward: [(0, '8.755')] [2025-01-04 16:05:00,734][134294] Updated weights for policy 0, policy_version 242944 (0.0030) [2025-01-04 16:05:03,377][134294] Updated weights for policy 0, policy_version 242954 (0.0021) [2025-01-04 16:05:03,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14329.1). Total num frames: 995151872. Throughput: 0: 3510.6. Samples: 237949316. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:03,968][134211] Avg episode reward: [(0, '9.917')] [2025-01-04 16:05:05,297][134294] Updated weights for policy 0, policy_version 242964 (0.0014) [2025-01-04 16:05:07,424][134294] Updated weights for policy 0, policy_version 242974 (0.0016) [2025-01-04 16:05:08,968][134211] Fps is (10 sec: 16793.6, 60 sec: 14813.9, 300 sec: 14412.4). Total num frames: 995241984. Throughput: 0: 3678.3. Samples: 237977734. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:08,968][134211] Avg episode reward: [(0, '10.030')] [2025-01-04 16:05:10,510][134294] Updated weights for policy 0, policy_version 242984 (0.0024) [2025-01-04 16:05:13,628][134294] Updated weights for policy 0, policy_version 242994 (0.0028) [2025-01-04 16:05:13,968][134211] Fps is (10 sec: 15154.7, 60 sec: 14404.2, 300 sec: 14398.5). Total num frames: 995303424. Throughput: 0: 3656.1. Samples: 237997680. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:13,969][134211] Avg episode reward: [(0, '9.587')] [2025-01-04 16:05:13,984][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242995_995307520.pth... [2025-01-04 16:05:14,067][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242142_991813632.pth [2025-01-04 16:05:16,740][134294] Updated weights for policy 0, policy_version 243004 (0.0026) [2025-01-04 16:05:18,968][134211] Fps is (10 sec: 12697.7, 60 sec: 14336.0, 300 sec: 14370.8). Total num frames: 995368960. Throughput: 0: 3638.1. Samples: 238007202. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:18,968][134211] Avg episode reward: [(0, '10.354')] [2025-01-04 16:05:19,933][134294] Updated weights for policy 0, policy_version 243014 (0.0026) [2025-01-04 16:05:22,900][134294] Updated weights for policy 0, policy_version 243024 (0.0027) [2025-01-04 16:05:23,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14336.0, 300 sec: 14384.6). Total num frames: 995438592. Throughput: 0: 3628.6. Samples: 238027162. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:23,968][134211] Avg episode reward: [(0, '9.722')] [2025-01-04 16:05:26,069][134294] Updated weights for policy 0, policy_version 243034 (0.0027) [2025-01-04 16:05:28,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14267.8, 300 sec: 14370.7). Total num frames: 995504128. Throughput: 0: 3500.4. Samples: 238046684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:28,968][134211] Avg episode reward: [(0, '11.322')] [2025-01-04 16:05:29,283][134294] Updated weights for policy 0, policy_version 243044 (0.0027) [2025-01-04 16:05:32,301][134294] Updated weights for policy 0, policy_version 243054 (0.0027) [2025-01-04 16:05:33,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14199.5, 300 sec: 14357.0). Total num frames: 995569664. 
Throughput: 0: 3492.5. Samples: 238056672. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:33,968][134211] Avg episode reward: [(0, '11.022')] [2025-01-04 16:05:35,294][134294] Updated weights for policy 0, policy_version 243064 (0.0024) [2025-01-04 16:05:38,124][134294] Updated weights for policy 0, policy_version 243074 (0.0025) [2025-01-04 16:05:38,968][134211] Fps is (10 sec: 13516.0, 60 sec: 14199.3, 300 sec: 14356.9). Total num frames: 995639296. Throughput: 0: 3502.1. Samples: 238077634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:38,969][134211] Avg episode reward: [(0, '10.173')] [2025-01-04 16:05:41,097][134294] Updated weights for policy 0, policy_version 243084 (0.0029) [2025-01-04 16:05:43,547][134294] Updated weights for policy 0, policy_version 243094 (0.0019) [2025-01-04 16:05:43,967][134211] Fps is (10 sec: 14745.8, 60 sec: 14063.0, 300 sec: 14398.5). Total num frames: 995717120. Throughput: 0: 3545.0. Samples: 238099492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:43,968][134211] Avg episode reward: [(0, '10.012')] [2025-01-04 16:05:45,896][134294] Updated weights for policy 0, policy_version 243104 (0.0018) [2025-01-04 16:05:48,735][134294] Updated weights for policy 0, policy_version 243114 (0.0026) [2025-01-04 16:05:48,968][134211] Fps is (10 sec: 15565.7, 60 sec: 14267.7, 300 sec: 14426.3). Total num frames: 995794944. Throughput: 0: 3630.9. Samples: 238112708. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:48,968][134211] Avg episode reward: [(0, '9.675')] [2025-01-04 16:05:51,790][134294] Updated weights for policy 0, policy_version 243124 (0.0023) [2025-01-04 16:05:53,969][134211] Fps is (10 sec: 14743.4, 60 sec: 14267.4, 300 sec: 14426.2). Total num frames: 995864576. Throughput: 0: 3462.7. Samples: 238133558. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:05:53,970][134211] Avg episode reward: [(0, '9.836')] [2025-01-04 16:05:54,842][134294] Updated weights for policy 0, policy_version 243134 (0.0025) [2025-01-04 16:05:57,882][134294] Updated weights for policy 0, policy_version 243144 (0.0026) [2025-01-04 16:05:58,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14267.8, 300 sec: 14426.3). Total num frames: 995930112. Throughput: 0: 3464.8. Samples: 238153594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:05:58,968][134211] Avg episode reward: [(0, '9.866')] [2025-01-04 16:06:00,830][134294] Updated weights for policy 0, policy_version 243154 (0.0025) [2025-01-04 16:06:03,123][134294] Updated weights for policy 0, policy_version 243164 (0.0018) [2025-01-04 16:06:03,968][134211] Fps is (10 sec: 15157.2, 60 sec: 14404.2, 300 sec: 14467.9). Total num frames: 996016128. Throughput: 0: 3489.1. Samples: 238164212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:03,968][134211] Avg episode reward: [(0, '10.041')] [2025-01-04 16:06:05,433][134294] Updated weights for policy 0, policy_version 243174 (0.0021) [2025-01-04 16:06:08,413][134294] Updated weights for policy 0, policy_version 243184 (0.0026) [2025-01-04 16:06:08,968][134211] Fps is (10 sec: 15564.6, 60 sec: 14062.9, 300 sec: 14481.8). Total num frames: 996085760. Throughput: 0: 3607.2. Samples: 238189486. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:08,968][134211] Avg episode reward: [(0, '9.841')] [2025-01-04 16:06:11,391][134294] Updated weights for policy 0, policy_version 243194 (0.0025) [2025-01-04 16:06:13,968][134211] Fps is (10 sec: 13926.1, 60 sec: 14199.5, 300 sec: 14495.7). Total num frames: 996155392. Throughput: 0: 3624.1. Samples: 238209768. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:13,968][134211] Avg episode reward: [(0, '8.942')] [2025-01-04 16:06:14,500][134294] Updated weights for policy 0, policy_version 243204 (0.0023) [2025-01-04 16:06:17,455][134294] Updated weights for policy 0, policy_version 243214 (0.0024) [2025-01-04 16:06:18,968][134211] Fps is (10 sec: 13516.9, 60 sec: 14199.5, 300 sec: 14370.7). Total num frames: 996220928. Throughput: 0: 3626.8. Samples: 238219876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:18,969][134211] Avg episode reward: [(0, '9.623')] [2025-01-04 16:06:20,433][134294] Updated weights for policy 0, policy_version 243224 (0.0022) [2025-01-04 16:06:23,337][134294] Updated weights for policy 0, policy_version 243234 (0.0027) [2025-01-04 16:06:23,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.7, 300 sec: 14370.7). Total num frames: 996294656. Throughput: 0: 3624.4. Samples: 238240732. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:23,968][134211] Avg episode reward: [(0, '9.661')] [2025-01-04 16:06:26,394][134294] Updated weights for policy 0, policy_version 243244 (0.0023) [2025-01-04 16:06:28,968][134211] Fps is (10 sec: 13926.3, 60 sec: 14267.7, 300 sec: 14370.7). Total num frames: 996360192. Throughput: 0: 3593.8. Samples: 238261212. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:28,968][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 16:06:29,370][134294] Updated weights for policy 0, policy_version 243254 (0.0024) [2025-01-04 16:06:31,391][134294] Updated weights for policy 0, policy_version 243264 (0.0014) [2025-01-04 16:06:33,968][134211] Fps is (10 sec: 15155.2, 60 sec: 14609.0, 300 sec: 14426.3). Total num frames: 996446208. Throughput: 0: 3599.8. Samples: 238274700. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:33,968][134211] Avg episode reward: [(0, '10.900')] [2025-01-04 16:06:34,001][134294] Updated weights for policy 0, policy_version 243274 (0.0022) [2025-01-04 16:06:37,021][134294] Updated weights for policy 0, policy_version 243284 (0.0027) [2025-01-04 16:06:38,968][134211] Fps is (10 sec: 15563.8, 60 sec: 14609.1, 300 sec: 14426.2). Total num frames: 996515840. Throughput: 0: 3618.9. Samples: 238296408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:38,969][134211] Avg episode reward: [(0, '10.453')] [2025-01-04 16:06:40,023][134294] Updated weights for policy 0, policy_version 243294 (0.0026) [2025-01-04 16:06:43,011][134294] Updated weights for policy 0, policy_version 243304 (0.0025) [2025-01-04 16:06:43,968][134211] Fps is (10 sec: 13926.4, 60 sec: 14472.5, 300 sec: 14440.1). Total num frames: 996585472. Throughput: 0: 3633.2. Samples: 238317088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:43,968][134211] Avg episode reward: [(0, '9.892')] [2025-01-04 16:06:45,997][134294] Updated weights for policy 0, policy_version 243314 (0.0021) [2025-01-04 16:06:48,968][134211] Fps is (10 sec: 13517.7, 60 sec: 14267.7, 300 sec: 14412.4). Total num frames: 996651008. Throughput: 0: 3630.4. Samples: 238327578. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:48,968][134211] Avg episode reward: [(0, '9.973')] [2025-01-04 16:06:49,030][134294] Updated weights for policy 0, policy_version 243324 (0.0025) [2025-01-04 16:06:51,950][134294] Updated weights for policy 0, policy_version 243334 (0.0024) [2025-01-04 16:06:53,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14268.1, 300 sec: 14426.3). Total num frames: 996720640. Throughput: 0: 3518.8. Samples: 238347830. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:53,968][134211] Avg episode reward: [(0, '10.026')] [2025-01-04 16:06:55,022][134294] Updated weights for policy 0, policy_version 243344 (0.0028) [2025-01-04 16:06:58,003][134294] Updated weights for policy 0, policy_version 243354 (0.0025) [2025-01-04 16:06:58,968][134211] Fps is (10 sec: 14336.2, 60 sec: 14404.3, 300 sec: 14440.2). Total num frames: 996794368. Throughput: 0: 3528.0. Samples: 238368526. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:06:58,968][134211] Avg episode reward: [(0, '10.610')] [2025-01-04 16:06:59,965][134294] Updated weights for policy 0, policy_version 243364 (0.0014) [2025-01-04 16:07:02,694][134294] Updated weights for policy 0, policy_version 243374 (0.0024) [2025-01-04 16:07:03,968][134211] Fps is (10 sec: 15155.0, 60 sec: 14267.7, 300 sec: 14467.9). Total num frames: 996872192. Throughput: 0: 3622.1. Samples: 238382870. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:07:03,968][134211] Avg episode reward: [(0, '8.997')] [2025-01-04 16:07:05,722][134294] Updated weights for policy 0, policy_version 243384 (0.0027) [2025-01-04 16:07:08,863][134294] Updated weights for policy 0, policy_version 243394 (0.0025) [2025-01-04 16:07:08,968][134211] Fps is (10 sec: 14745.3, 60 sec: 14267.7, 300 sec: 14454.0). Total num frames: 996941824. Throughput: 0: 3613.8. Samples: 238403352. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:08,968][134211] Avg episode reward: [(0, '8.516')] [2025-01-04 16:07:11,773][134294] Updated weights for policy 0, policy_version 243404 (0.0024) [2025-01-04 16:07:13,968][134211] Fps is (10 sec: 13926.5, 60 sec: 14267.8, 300 sec: 14356.8). Total num frames: 997011456. Throughput: 0: 3604.8. Samples: 238423430. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:13,968][134211] Avg episode reward: [(0, '8.950')] [2025-01-04 16:07:13,981][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000243411_997011456.pth... [2025-01-04 16:07:14,054][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242573_993579008.pth [2025-01-04 16:07:14,935][134294] Updated weights for policy 0, policy_version 243414 (0.0025) [2025-01-04 16:07:17,005][134294] Updated weights for policy 0, policy_version 243424 (0.0015) [2025-01-04 16:07:18,902][134294] Updated weights for policy 0, policy_version 243434 (0.0013) [2025-01-04 16:07:18,968][134211] Fps is (10 sec: 16384.3, 60 sec: 14745.6, 300 sec: 14467.9). Total num frames: 997105664. Throughput: 0: 3559.1. Samples: 238434860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:18,968][134211] Avg episode reward: [(0, '9.516')] [2025-01-04 16:07:20,803][134294] Updated weights for policy 0, policy_version 243444 (0.0013) [2025-01-04 16:07:22,683][134294] Updated weights for policy 0, policy_version 243454 (0.0013) [2025-01-04 16:07:23,968][134211] Fps is (10 sec: 20070.7, 60 sec: 15291.8, 300 sec: 14592.9). Total num frames: 997212160. Throughput: 0: 3802.3. Samples: 238467508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:23,968][134211] Avg episode reward: [(0, '9.056')] [2025-01-04 16:07:24,602][134294] Updated weights for policy 0, policy_version 243464 (0.0012) [2025-01-04 16:07:27,527][134294] Updated weights for policy 0, policy_version 243474 (0.0025) [2025-01-04 16:07:28,968][134211] Fps is (10 sec: 17612.6, 60 sec: 15360.0, 300 sec: 14606.8). Total num frames: 997281792. Throughput: 0: 3891.7. Samples: 238492214. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:28,968][134211] Avg episode reward: [(0, '9.964')] [2025-01-04 16:07:31,021][134294] Updated weights for policy 0, policy_version 243484 (0.0029) [2025-01-04 16:07:33,968][134211] Fps is (10 sec: 13107.0, 60 sec: 14950.4, 300 sec: 14579.0). Total num frames: 997343232. Throughput: 0: 3860.7. Samples: 238501310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:33,968][134211] Avg episode reward: [(0, '10.150')] [2025-01-04 16:07:34,301][134294] Updated weights for policy 0, policy_version 243494 (0.0031) [2025-01-04 16:07:37,343][134294] Updated weights for policy 0, policy_version 243504 (0.0023) [2025-01-04 16:07:38,968][134211] Fps is (10 sec: 13107.1, 60 sec: 14950.6, 300 sec: 14592.9). Total num frames: 997412864. Throughput: 0: 3844.0. Samples: 238520812. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:38,968][134211] Avg episode reward: [(0, '11.184')] [2025-01-04 16:07:40,449][134294] Updated weights for policy 0, policy_version 243514 (0.0027) [2025-01-04 16:07:43,402][134294] Updated weights for policy 0, policy_version 243524 (0.0026) [2025-01-04 16:07:43,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14882.2, 300 sec: 14440.1). Total num frames: 997478400. Throughput: 0: 3835.9. Samples: 238541140. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:43,968][134211] Avg episode reward: [(0, '9.699')] [2025-01-04 16:07:46,331][134294] Updated weights for policy 0, policy_version 243534 (0.0028) [2025-01-04 16:07:48,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14356.8). Total num frames: 997548032. Throughput: 0: 3745.9. Samples: 238551434. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:48,968][134211] Avg episode reward: [(0, '10.082')] [2025-01-04 16:07:49,459][134294] Updated weights for policy 0, policy_version 243544 (0.0025) [2025-01-04 16:07:52,460][134294] Updated weights for policy 0, policy_version 243554 (0.0023) [2025-01-04 16:07:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.1, 300 sec: 14370.7). Total num frames: 997613568. Throughput: 0: 3738.3. Samples: 238571576. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:53,969][134211] Avg episode reward: [(0, '9.766')] [2025-01-04 16:07:55,693][134294] Updated weights for policy 0, policy_version 243564 (0.0025) [2025-01-04 16:07:58,780][134294] Updated weights for policy 0, policy_version 243574 (0.0027) [2025-01-04 16:07:58,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.6, 300 sec: 14356.8). Total num frames: 997679104. Throughput: 0: 3723.2. Samples: 238590976. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:07:58,968][134211] Avg episode reward: [(0, '9.049')] [2025-01-04 16:08:01,798][134294] Updated weights for policy 0, policy_version 243584 (0.0024) [2025-01-04 16:08:03,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 997744640. Throughput: 0: 3695.1. Samples: 238601142. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:08:03,968][134211] Avg episode reward: [(0, '10.141')] [2025-01-04 16:08:04,943][134294] Updated weights for policy 0, policy_version 243594 (0.0026) [2025-01-04 16:08:07,877][134294] Updated weights for policy 0, policy_version 243604 (0.0026) [2025-01-04 16:08:08,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14540.8, 300 sec: 14356.8). Total num frames: 997814272. Throughput: 0: 3417.2. Samples: 238621284. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:08:08,968][134211] Avg episode reward: [(0, '8.534')] [2025-01-04 16:08:10,910][134294] Updated weights for policy 0, policy_version 243614 (0.0025) [2025-01-04 16:08:13,759][134294] Updated weights for policy 0, policy_version 243624 (0.0028) [2025-01-04 16:08:13,968][134211] Fps is (10 sec: 13925.7, 60 sec: 14540.7, 300 sec: 14356.8). Total num frames: 997883904. Throughput: 0: 3332.0. Samples: 238642154. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-04 16:08:13,969][134211] Avg episode reward: [(0, '10.914')] [2025-01-04 16:08:16,750][134294] Updated weights for policy 0, policy_version 243634 (0.0025) [2025-01-04 16:08:18,970][134211] Fps is (10 sec: 13923.3, 60 sec: 14130.7, 300 sec: 14370.6). Total num frames: 997953536. Throughput: 0: 3360.1. Samples: 238652524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:18,971][134211] Avg episode reward: [(0, '9.859')] [2025-01-04 16:08:19,863][134294] Updated weights for policy 0, policy_version 243644 (0.0026) [2025-01-04 16:08:22,719][134294] Updated weights for policy 0, policy_version 243654 (0.0024) [2025-01-04 16:08:23,968][134211] Fps is (10 sec: 13517.5, 60 sec: 13448.5, 300 sec: 14356.8). Total num frames: 998019072. Throughput: 0: 3381.2. Samples: 238672968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:23,968][134211] Avg episode reward: [(0, '9.031')] [2025-01-04 16:08:25,790][134294] Updated weights for policy 0, policy_version 243664 (0.0025) [2025-01-04 16:08:28,568][134294] Updated weights for policy 0, policy_version 243674 (0.0026) [2025-01-04 16:08:28,968][134211] Fps is (10 sec: 13929.6, 60 sec: 13516.8, 300 sec: 14370.7). Total num frames: 998092800. Throughput: 0: 3392.6. Samples: 238693808. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:28,968][134211] Avg episode reward: [(0, '9.670')] [2025-01-04 16:08:30,604][134294] Updated weights for policy 0, policy_version 243684 (0.0013) [2025-01-04 16:08:32,478][134294] Updated weights for policy 0, policy_version 243694 (0.0012) [2025-01-04 16:08:33,968][134211] Fps is (10 sec: 18022.8, 60 sec: 14267.8, 300 sec: 14495.7). Total num frames: 998199296. Throughput: 0: 3493.9. Samples: 238708658. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:33,968][134211] Avg episode reward: [(0, '9.583')] [2025-01-04 16:08:34,396][134294] Updated weights for policy 0, policy_version 243704 (0.0015) [2025-01-04 16:08:36,292][134294] Updated weights for policy 0, policy_version 243714 (0.0013) [2025-01-04 16:08:38,213][134294] Updated weights for policy 0, policy_version 243724 (0.0013) [2025-01-04 16:08:38,968][134211] Fps is (10 sec: 21299.0, 60 sec: 14882.2, 300 sec: 14620.6). Total num frames: 998305792. Throughput: 0: 3765.3. Samples: 238741014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:38,968][134211] Avg episode reward: [(0, '9.668')] [2025-01-04 16:08:40,887][134294] Updated weights for policy 0, policy_version 243734 (0.0021) [2025-01-04 16:08:43,968][134211] Fps is (10 sec: 17202.9, 60 sec: 14882.1, 300 sec: 14606.8). Total num frames: 998371328. Throughput: 0: 3854.1. Samples: 238764412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:43,968][134211] Avg episode reward: [(0, '10.239')] [2025-01-04 16:08:44,058][134294] Updated weights for policy 0, policy_version 243744 (0.0025) [2025-01-04 16:08:47,217][134294] Updated weights for policy 0, policy_version 243754 (0.0027) [2025-01-04 16:08:48,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14813.9, 300 sec: 14606.8). Total num frames: 998436864. Throughput: 0: 3838.2. Samples: 238773862. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:48,968][134211] Avg episode reward: [(0, '8.550')] [2025-01-04 16:08:50,354][134294] Updated weights for policy 0, policy_version 243764 (0.0026) [2025-01-04 16:08:53,299][134294] Updated weights for policy 0, policy_version 243774 (0.0027) [2025-01-04 16:08:53,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.1, 300 sec: 14551.2). Total num frames: 998506496. Throughput: 0: 3837.3. Samples: 238793962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:53,968][134211] Avg episode reward: [(0, '10.583')] [2025-01-04 16:08:56,370][134294] Updated weights for policy 0, policy_version 243784 (0.0023) [2025-01-04 16:08:58,968][134211] Fps is (10 sec: 13516.6, 60 sec: 14882.1, 300 sec: 14523.4). Total num frames: 998572032. Throughput: 0: 3819.6. Samples: 238814032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:08:58,968][134211] Avg episode reward: [(0, '8.883')] [2025-01-04 16:08:59,424][134294] Updated weights for policy 0, policy_version 243794 (0.0027) [2025-01-04 16:09:02,493][134294] Updated weights for policy 0, policy_version 243804 (0.0023) [2025-01-04 16:09:03,968][134211] Fps is (10 sec: 13516.8, 60 sec: 14950.4, 300 sec: 14537.3). Total num frames: 998641664. Throughput: 0: 3810.8. Samples: 238824004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:09:03,968][134211] Avg episode reward: [(0, '9.513')] [2025-01-04 16:09:05,527][134294] Updated weights for policy 0, policy_version 243814 (0.0027) [2025-01-04 16:09:08,368][134294] Updated weights for policy 0, policy_version 243824 (0.0027) [2025-01-04 16:09:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14882.1, 300 sec: 14467.9). Total num frames: 998707200. Throughput: 0: 3818.8. Samples: 238844814. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:09:08,968][134211] Avg episode reward: [(0, '10.099')] [2025-01-04 16:09:11,686][134294] Updated weights for policy 0, policy_version 243834 (0.0028) [2025-01-04 16:09:13,968][134211] Fps is (10 sec: 13107.3, 60 sec: 14814.0, 300 sec: 14454.0). Total num frames: 998772736. Throughput: 0: 3781.2. Samples: 238863962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:09:13,968][134211] Avg episode reward: [(0, '9.386')] [2025-01-04 16:09:13,980][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000243841_998772736.pth... [2025-01-04 16:09:14,052][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000242995_995307520.pth [2025-01-04 16:09:14,946][134294] Updated weights for policy 0, policy_version 243844 (0.0024) [2025-01-04 16:09:17,950][134294] Updated weights for policy 0, policy_version 243854 (0.0025) [2025-01-04 16:09:18,968][134211] Fps is (10 sec: 13107.4, 60 sec: 14746.2, 300 sec: 14440.1). Total num frames: 998838272. Throughput: 0: 3659.7. Samples: 238873344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:09:18,968][134211] Avg episode reward: [(0, '8.522')] [2025-01-04 16:09:21,016][134294] Updated weights for policy 0, policy_version 243864 (0.0023) [2025-01-04 16:09:23,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14745.6, 300 sec: 14426.3). Total num frames: 998903808. Throughput: 0: 3391.9. Samples: 238893650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-04 16:09:23,968][134211] Avg episode reward: [(0, '10.558')] [2025-01-04 16:09:24,394][134294] Updated weights for policy 0, policy_version 243874 (0.0028) [2025-01-04 16:09:27,739][134294] Updated weights for policy 0, policy_version 243884 (0.0028) [2025-01-04 16:09:28,968][134211] Fps is (10 sec: 12288.0, 60 sec: 14472.5, 300 sec: 14384.6). Total num frames: 998961152. 
Throughput: 0: 3276.0. Samples: 238911834. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:28,968][134211] Avg episode reward: [(0, '9.911')] [2025-01-04 16:09:30,918][134294] Updated weights for policy 0, policy_version 243894 (0.0026) [2025-01-04 16:09:33,774][134294] Updated weights for policy 0, policy_version 243904 (0.0026) [2025-01-04 16:09:33,968][134211] Fps is (10 sec: 12697.4, 60 sec: 13858.1, 300 sec: 14384.6). Total num frames: 999030784. Throughput: 0: 3287.1. Samples: 238921784. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:33,971][134211] Avg episode reward: [(0, '10.372')] [2025-01-04 16:09:36,743][134294] Updated weights for policy 0, policy_version 243914 (0.0028) [2025-01-04 16:09:38,968][134211] Fps is (10 sec: 13926.3, 60 sec: 13243.7, 300 sec: 14329.1). Total num frames: 999100416. Throughput: 0: 3303.3. Samples: 238942610. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:38,968][134211] Avg episode reward: [(0, '8.846')] [2025-01-04 16:09:39,856][134294] Updated weights for policy 0, policy_version 243924 (0.0022) [2025-01-04 16:09:42,247][134294] Updated weights for policy 0, policy_version 243934 (0.0017) [2025-01-04 16:09:43,968][134211] Fps is (10 sec: 15974.8, 60 sec: 13653.4, 300 sec: 14412.4). Total num frames: 999190528. Throughput: 0: 3392.6. Samples: 238966700. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:43,968][134211] Avg episode reward: [(0, '8.552')] [2025-01-04 16:09:44,151][134294] Updated weights for policy 0, policy_version 243944 (0.0012) [2025-01-04 16:09:46,016][134294] Updated weights for policy 0, policy_version 243954 (0.0013) [2025-01-04 16:09:47,828][134294] Updated weights for policy 0, policy_version 243964 (0.0013) [2025-01-04 16:09:48,967][134211] Fps is (10 sec: 19661.3, 60 sec: 14336.0, 300 sec: 14537.3). Total num frames: 999297024. Throughput: 0: 3535.3. Samples: 238983090. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:48,968][134211] Avg episode reward: [(0, '9.015')] [2025-01-04 16:09:49,889][134294] Updated weights for policy 0, policy_version 243974 (0.0016) [2025-01-04 16:09:53,108][134294] Updated weights for policy 0, policy_version 243984 (0.0029) [2025-01-04 16:09:53,968][134211] Fps is (10 sec: 17611.8, 60 sec: 14335.9, 300 sec: 14551.2). Total num frames: 999366656. Throughput: 0: 3671.9. Samples: 239010050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:53,969][134211] Avg episode reward: [(0, '9.731')] [2025-01-04 16:09:56,180][134294] Updated weights for policy 0, policy_version 243994 (0.0029) [2025-01-04 16:09:58,968][134211] Fps is (10 sec: 13516.5, 60 sec: 14336.0, 300 sec: 14509.6). Total num frames: 999432192. Throughput: 0: 3672.7. Samples: 239029234. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:09:58,968][134211] Avg episode reward: [(0, '10.345')] [2025-01-04 16:09:59,521][134294] Updated weights for policy 0, policy_version 244004 (0.0029) [2025-01-04 16:10:02,718][134294] Updated weights for policy 0, policy_version 244014 (0.0025) [2025-01-04 16:10:03,968][134211] Fps is (10 sec: 13107.6, 60 sec: 14267.7, 300 sec: 14426.2). Total num frames: 999497728. Throughput: 0: 3673.4. Samples: 239038648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:03,969][134211] Avg episode reward: [(0, '9.642')] [2025-01-04 16:10:05,716][134294] Updated weights for policy 0, policy_version 244024 (0.0024) [2025-01-04 16:10:08,685][134294] Updated weights for policy 0, policy_version 244034 (0.0025) [2025-01-04 16:10:08,968][134211] Fps is (10 sec: 13516.7, 60 sec: 14336.0, 300 sec: 14454.0). Total num frames: 999567360. Throughput: 0: 3674.9. Samples: 239059020. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:08,968][134211] Avg episode reward: [(0, '10.410')] [2025-01-04 16:10:11,578][134294] Updated weights for policy 0, policy_version 244044 (0.0023) [2025-01-04 16:10:13,969][134211] Fps is (10 sec: 13515.5, 60 sec: 14335.7, 300 sec: 14454.0). Total num frames: 999632896. Throughput: 0: 3722.8. Samples: 239079364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:13,969][134211] Avg episode reward: [(0, '9.165')] [2025-01-04 16:10:14,758][134294] Updated weights for policy 0, policy_version 244054 (0.0025) [2025-01-04 16:10:17,639][134294] Updated weights for policy 0, policy_version 244064 (0.0023) [2025-01-04 16:10:18,968][134211] Fps is (10 sec: 13517.0, 60 sec: 14404.3, 300 sec: 14454.0). Total num frames: 999702528. Throughput: 0: 3726.0. Samples: 239089454. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:18,968][134211] Avg episode reward: [(0, '9.449')] [2025-01-04 16:10:20,670][134294] Updated weights for policy 0, policy_version 244074 (0.0023) [2025-01-04 16:10:23,556][134294] Updated weights for policy 0, policy_version 244084 (0.0025) [2025-01-04 16:10:23,968][134211] Fps is (10 sec: 13927.8, 60 sec: 14472.5, 300 sec: 14467.9). Total num frames: 999772160. Throughput: 0: 3725.4. Samples: 239110252. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:23,968][134211] Avg episode reward: [(0, '9.808')] [2025-01-04 16:10:26,569][134294] Updated weights for policy 0, policy_version 244094 (0.0024) [2025-01-04 16:10:28,969][134211] Fps is (10 sec: 13515.4, 60 sec: 14608.8, 300 sec: 14467.9). Total num frames: 999837696. Throughput: 0: 3645.6. Samples: 239130754. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:28,969][134211] Avg episode reward: [(0, '10.259')] [2025-01-04 16:10:29,695][134294] Updated weights for policy 0, policy_version 244104 (0.0027) [2025-01-04 16:10:32,672][134294] Updated weights for policy 0, policy_version 244114 (0.0022) [2025-01-04 16:10:33,968][134211] Fps is (10 sec: 13107.2, 60 sec: 14540.8, 300 sec: 14454.0). Total num frames: 999903232. Throughput: 0: 3505.3. Samples: 239140830. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:33,968][134211] Avg episode reward: [(0, '10.286')] [2025-01-04 16:10:35,657][134294] Updated weights for policy 0, policy_version 244124 (0.0028) [2025-01-04 16:10:38,623][134294] Updated weights for policy 0, policy_version 244134 (0.0024) [2025-01-04 16:10:38,968][134211] Fps is (10 sec: 13927.9, 60 sec: 14609.1, 300 sec: 14440.1). Total num frames: 999976960. Throughput: 0: 3368.0. Samples: 239161608. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-04 16:10:38,968][134211] Avg episode reward: [(0, '10.077')] [2025-01-04 16:10:40,920][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth... [2025-01-04 16:10:40,930][134211] Component Batcher_0 stopped! [2025-01-04 16:10:40,929][134264] Stopping Batcher_0... [2025-01-04 16:10:40,934][134264] Loop batcher_evt_loop terminating... [2025-01-04 16:10:40,963][134294] Weights refcount: 2 0 [2025-01-04 16:10:40,965][134294] Stopping InferenceWorker_p0-w0... [2025-01-04 16:10:40,966][134294] Loop inference_proc0-0_evt_loop terminating... [2025-01-04 16:10:40,968][134211] Component InferenceWorker_p0-w0 stopped! 
[2025-01-04 16:10:41,003][134264] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000243411_997011456.pth [2025-01-04 16:10:41,006][134264] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth... [2025-01-04 16:10:41,093][134264] Stopping LearnerWorker_p0... [2025-01-04 16:10:41,094][134264] Loop learner_proc0_evt_loop terminating... [2025-01-04 16:10:41,096][134211] Component LearnerWorker_p0 stopped! [2025-01-04 16:10:41,305][134293] Stopping RolloutWorker_w1... [2025-01-04 16:10:41,306][134293] Loop rollout_proc1_evt_loop terminating... [2025-01-04 16:10:41,306][134211] Component RolloutWorker_w1 stopped! [2025-01-04 16:10:41,307][134311] Stopping RolloutWorker_w6... [2025-01-04 16:10:41,308][134211] Component RolloutWorker_w6 stopped! [2025-01-04 16:10:41,308][134311] Loop rollout_proc6_evt_loop terminating... [2025-01-04 16:10:41,319][134308] Stopping RolloutWorker_w3... [2025-01-04 16:10:41,319][134211] Component RolloutWorker_w3 stopped! [2025-01-04 16:10:41,320][134308] Loop rollout_proc3_evt_loop terminating... [2025-01-04 16:10:41,364][134211] Component RolloutWorker_w5 stopped! [2025-01-04 16:10:41,367][134313] Stopping RolloutWorker_w8... [2025-01-04 16:10:41,368][134313] Loop rollout_proc8_evt_loop terminating... [2025-01-04 16:10:41,368][134211] Component RolloutWorker_w8 stopped! [2025-01-04 16:10:41,373][134310] Stopping RolloutWorker_w5... [2025-01-04 16:10:41,380][134310] Loop rollout_proc5_evt_loop terminating... [2025-01-04 16:10:41,415][134315] Stopping RolloutWorker_w11... [2025-01-04 16:10:41,416][134315] Loop rollout_proc11_evt_loop terminating... [2025-01-04 16:10:41,416][134211] Component RolloutWorker_w11 stopped! [2025-01-04 16:10:41,436][134295] Stopping RolloutWorker_w2... [2025-01-04 16:10:41,437][134211] Component RolloutWorker_w2 stopped! 
[2025-01-04 16:10:41,437][134295] Loop rollout_proc2_evt_loop terminating... [2025-01-04 16:10:41,482][134211] Component RolloutWorker_w7 stopped! [2025-01-04 16:10:41,481][134314] Stopping RolloutWorker_w7... [2025-01-04 16:10:41,484][134314] Loop rollout_proc7_evt_loop terminating... [2025-01-04 16:10:41,597][134312] Stopping RolloutWorker_w4... [2025-01-04 16:10:41,598][134312] Loop rollout_proc4_evt_loop terminating... [2025-01-04 16:10:41,598][134211] Component RolloutWorker_w4 stopped! [2025-01-04 16:10:41,669][134211] Component RolloutWorker_w0 stopped! [2025-01-04 16:10:41,669][134296] Stopping RolloutWorker_w0... [2025-01-04 16:10:41,679][134296] Loop rollout_proc0_evt_loop terminating... [2025-01-04 16:10:42,325][134317] Stopping RolloutWorker_w10... [2025-01-04 16:10:42,325][134211] Component RolloutWorker_w10 stopped! [2025-01-04 16:10:42,326][134317] Loop rollout_proc10_evt_loop terminating... [2025-01-04 16:10:42,576][134316] Stopping RolloutWorker_w9... [2025-01-04 16:10:42,576][134211] Component RolloutWorker_w9 stopped! [2025-01-04 16:10:42,577][134316] Loop rollout_proc9_evt_loop terminating... [2025-01-04 16:10:42,577][134211] Waiting for process learner_proc0 to stop... [2025-01-04 16:10:42,916][134211] Waiting for process inference_proc0-0 to join... [2025-01-04 16:10:42,917][134211] Waiting for process rollout_proc0 to join... [2025-01-04 16:10:42,917][134211] Waiting for process rollout_proc1 to join... [2025-01-04 16:10:42,918][134211] Waiting for process rollout_proc2 to join... [2025-01-04 16:10:42,918][134211] Waiting for process rollout_proc3 to join... [2025-01-04 16:10:42,918][134211] Waiting for process rollout_proc4 to join... [2025-01-04 16:10:42,919][134211] Waiting for process rollout_proc5 to join... [2025-01-04 16:10:42,919][134211] Waiting for process rollout_proc6 to join... [2025-01-04 16:10:42,920][134211] Waiting for process rollout_proc7 to join... 
[2025-01-04 16:10:42,920][134211] Waiting for process rollout_proc8 to join... [2025-01-04 16:10:42,920][134211] Waiting for process rollout_proc9 to join... [2025-01-04 16:10:43,139][134211] Waiting for process rollout_proc10 to join... [2025-01-04 16:10:43,140][134211] Waiting for process rollout_proc11 to join... [2025-01-04 16:10:43,140][134211] Batcher 0 profile tree view: batching: 3769.6530, releasing_batches: 7.4324 [2025-01-04 16:10:43,141][134211] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 1173.1898 update_model: 1161.3375 weight_update: 0.0024 one_step: 0.0081 handle_policy_step: 60480.7014 deserialize: 2734.0166, stack: 349.5078, obs_to_device_normalize: 15395.9028, forward: 26261.9838, send_messages: 5517.1273 prepare_outputs: 7766.1900 to_cpu: 4886.6861 [2025-01-04 16:10:43,141][134211] Learner 0 profile tree view: misc: 1.3846, prepare_batch: 3016.8428 train: 14509.0264 epoch_init: 1.5437, minibatch_init: 2.0145, losses_postprocess: 91.5965, kl_divergence: 103.2778, after_optimizer: 116.8185 calculate_losses: 5096.1704 losses_init: 1.0783, forward_head: 213.1738, bptt_initial: 3641.5855, tail: 203.1416, advantages_returns: 59.1064, losses: 560.7626 bptt: 357.2067 bptt_forward_core: 336.3340 update: 8980.0928 clip: 227.6525 [2025-01-04 16:10:43,141][134211] RolloutWorker_w0 profile tree view: wait_for_trajectories: 30.9630, enqueue_policy_requests: 2384.2327, env_step: 30462.1752, overhead: 1532.4782, complete_rollouts: 51.0428 save_policy_outputs: 2726.4054 split_output_tensors: 891.3264 [2025-01-04 16:10:43,142][134211] RolloutWorker_w11 profile tree view: wait_for_trajectories: 34.0943, enqueue_policy_requests: 2648.2991, env_step: 30556.7327, overhead: 1672.4467, complete_rollouts: 57.3052 save_policy_outputs: 3043.3564 split_output_tensors: 1000.4512 [2025-01-04 16:10:43,142][134211] Loop Runner_EvtLoop terminating... 
[2025-01-04 16:10:43,143][134211] Runner profile tree view: main_loop: 65965.8047 [2025-01-04 16:10:43,143][134211] Collected {0: 1000005632}, FPS: 14502.9 [2025-01-04 16:10:43,509][134211] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-04 16:10:43,509][134211] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-04 16:10:43,509][134211] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-04 16:10:43,509][134211] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-04 16:10:43,510][134211] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-04 16:10:43,511][134211] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-04 16:10:43,511][134211] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-04 16:10:43,511][134211] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-04 16:10:43,511][134211] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-04 16:10:43,544][134211] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-04 16:10:43,546][134211] RunningMeanStd input shape: (3, 72, 128) [2025-01-04 16:10:43,547][134211] RunningMeanStd input shape: (1,) [2025-01-04 16:10:43,560][134211] ConvEncoder: input_channels=3 [2025-01-04 16:10:43,695][134211] Conv encoder output size: 512 [2025-01-04 16:10:43,695][134211] Policy head output size: 512 [2025-01-04 16:10:43,853][134211] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth... [2025-01-04 16:10:44,553][134211] Num frames 100... [2025-01-04 16:10:44,647][134211] Num frames 200... [2025-01-04 16:10:44,739][134211] Num frames 300... [2025-01-04 16:10:44,870][134211] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 [2025-01-04 16:10:44,870][134211] Avg episode reward: 3.840, avg true_objective: 3.840 [2025-01-04 16:10:44,899][134211] Num frames 400... [2025-01-04 16:10:44,996][134211] Num frames 500... [2025-01-04 16:10:45,088][134211] Num frames 600... [2025-01-04 16:10:45,183][134211] Num frames 700... [2025-01-04 16:10:45,277][134211] Num frames 800... [2025-01-04 16:10:45,371][134211] Num frames 900... [2025-01-04 16:10:45,463][134211] Num frames 1000... [2025-01-04 16:10:45,556][134211] Num frames 1100... [2025-01-04 16:10:45,629][134211] Avg episode rewards: #0: 8.600, true rewards: #0: 5.600 [2025-01-04 16:10:45,629][134211] Avg episode reward: 8.600, avg true_objective: 5.600 [2025-01-04 16:10:45,726][134211] Num frames 1200... [2025-01-04 16:10:45,820][134211] Num frames 1300... [2025-01-04 16:10:45,915][134211] Num frames 1400... [2025-01-04 16:10:46,011][134211] Num frames 1500... [2025-01-04 16:10:46,104][134211] Num frames 1600... [2025-01-04 16:10:46,199][134211] Num frames 1700... 
[2025-01-04 16:10:46,311][134211] Avg episode rewards: #0: 9.537, true rewards: #0: 5.870 [2025-01-04 16:10:46,311][134211] Avg episode reward: 9.537, avg true_objective: 5.870 [2025-01-04 16:10:46,385][134211] Num frames 1800... [2025-01-04 16:10:46,477][134211] Num frames 1900... [2025-01-04 16:10:46,571][134211] Num frames 2000... [2025-01-04 16:10:46,642][134211] Avg episode rewards: #0: 7.793, true rewards: #0: 5.042 [2025-01-04 16:10:46,642][134211] Avg episode reward: 7.793, avg true_objective: 5.042 [2025-01-04 16:10:46,738][134211] Num frames 2100... [2025-01-04 16:10:46,832][134211] Num frames 2200... [2025-01-04 16:10:46,926][134211] Num frames 2300... [2025-01-04 16:10:47,032][134211] Num frames 2400... [2025-01-04 16:10:47,146][134211] Avg episode rewards: #0: 7.730, true rewards: #0: 4.930 [2025-01-04 16:10:47,147][134211] Avg episode reward: 7.730, avg true_objective: 4.930 [2025-01-04 16:10:47,212][134211] Num frames 2500... [2025-01-04 16:10:47,303][134211] Num frames 2600... [2025-01-04 16:10:47,396][134211] Num frames 2700... [2025-01-04 16:10:47,488][134211] Num frames 2800... [2025-01-04 16:10:47,582][134211] Num frames 2900... [2025-01-04 16:10:47,675][134211] Num frames 3000... [2025-01-04 16:10:47,768][134211] Avg episode rewards: #0: 7.902, true rewards: #0: 5.068 [2025-01-04 16:10:47,769][134211] Avg episode reward: 7.902, avg true_objective: 5.068 [2025-01-04 16:10:47,852][134211] Num frames 3100... [2025-01-04 16:10:47,945][134211] Num frames 3200... [2025-01-04 16:10:48,039][134211] Num frames 3300... [2025-01-04 16:10:48,134][134211] Num frames 3400... [2025-01-04 16:10:48,244][134211] Avg episode rewards: #0: 7.653, true rewards: #0: 4.939 [2025-01-04 16:10:48,245][134211] Avg episode reward: 7.653, avg true_objective: 4.939 [2025-01-04 16:10:48,326][134211] Num frames 3500... [2025-01-04 16:10:48,421][134211] Num frames 3600... [2025-01-04 16:10:48,515][134211] Num frames 3700... [2025-01-04 16:10:48,610][134211] Num frames 3800... 
[2025-01-04 16:10:48,702][134211] Num frames 3900... [2025-01-04 16:10:48,797][134211] Num frames 4000... [2025-01-04 16:10:48,891][134211] Num frames 4100... [2025-01-04 16:10:48,987][134211] Num frames 4200... [2025-01-04 16:10:49,080][134211] Num frames 4300... [2025-01-04 16:10:49,174][134211] Num frames 4400... [2025-01-04 16:10:49,269][134211] Num frames 4500... [2025-01-04 16:10:49,366][134211] Num frames 4600... [2025-01-04 16:10:49,462][134211] Num frames 4700... [2025-01-04 16:10:49,556][134211] Num frames 4800... [2025-01-04 16:10:49,649][134211] Num frames 4900... [2025-01-04 16:10:49,745][134211] Num frames 5000... [2025-01-04 16:10:49,840][134211] Num frames 5100... [2025-01-04 16:10:49,933][134211] Num frames 5200... [2025-01-04 16:10:50,031][134211] Num frames 5300... [2025-01-04 16:10:50,159][134211] Avg episode rewards: #0: 12.096, true rewards: #0: 6.721 [2025-01-04 16:10:50,160][134211] Avg episode reward: 12.096, avg true_objective: 6.721 [2025-01-04 16:10:50,196][134211] Num frames 5400... [2025-01-04 16:10:50,297][134211] Num frames 5500... [2025-01-04 16:10:50,391][134211] Num frames 5600... [2025-01-04 16:10:50,486][134211] Num frames 5700... [2025-01-04 16:10:50,597][134211] Avg episode rewards: #0: 11.401, true rewards: #0: 6.401 [2025-01-04 16:10:50,597][134211] Avg episode reward: 11.401, avg true_objective: 6.401 [2025-01-04 16:10:50,668][134211] Num frames 5800... [2025-01-04 16:10:50,761][134211] Num frames 5900... [2025-01-04 16:10:50,855][134211] Num frames 6000... [2025-01-04 16:10:50,950][134211] Num frames 6100... [2025-01-04 16:10:51,051][134211] Num frames 6200... [2025-01-04 16:10:51,146][134211] Num frames 6300... 
[2025-01-04 16:10:51,265][134211] Avg episode rewards: #0: 11.169, true rewards: #0: 6.369 [2025-01-04 16:10:51,266][134211] Avg episode reward: 11.169, avg true_objective: 6.369 [2025-01-04 16:11:02,436][134211] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-04 16:11:02,447][134211] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-04 16:11:02,448][134211] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-04 16:11:02,448][134211] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-04 16:11:02,448][134211] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-04 16:11:02,449][134211] Adding new argument 'train_script'=None that is not in the saved config file! 
[2025-01-04 16:11:02,449][134211] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-04 16:11:02,449][134211] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-04 16:11:02,465][134211] RunningMeanStd input shape: (3, 72, 128) [2025-01-04 16:11:02,465][134211] RunningMeanStd input shape: (1,) [2025-01-04 16:11:02,472][134211] ConvEncoder: input_channels=3 [2025-01-04 16:11:02,498][134211] Conv encoder output size: 512 [2025-01-04 16:11:02,498][134211] Policy head output size: 512 [2025-01-04 16:11:02,511][134211] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth... [2025-01-04 16:11:02,862][134211] Num frames 100... [2025-01-04 16:11:02,953][134211] Num frames 200... [2025-01-04 16:11:03,035][134211] Num frames 300... [2025-01-04 16:11:03,113][134211] Num frames 400... [2025-01-04 16:11:03,205][134211] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2025-01-04 16:11:03,206][134211] Avg episode reward: 5.480, avg true_objective: 4.480 [2025-01-04 16:11:03,288][134211] Num frames 500... [2025-01-04 16:11:03,378][134211] Num frames 600... [2025-01-04 16:11:03,465][134211] Num frames 700... [2025-01-04 16:11:03,543][134211] Num frames 800... [2025-01-04 16:11:03,621][134211] Num frames 900... [2025-01-04 16:11:03,697][134211] Num frames 1000... [2025-01-04 16:11:03,773][134211] Num frames 1100... [2025-01-04 16:11:03,892][134211] Avg episode rewards: #0: 9.425, true rewards: #0: 5.925 [2025-01-04 16:11:03,893][134211] Avg episode reward: 9.425, avg true_objective: 5.925 [2025-01-04 16:11:03,923][134211] Num frames 1200... [2025-01-04 16:11:04,031][134211] Num frames 1300... [2025-01-04 16:11:04,119][134211] Num frames 1400... [2025-01-04 16:11:04,211][134211] Num frames 1500... [2025-01-04 16:11:04,302][134211] Num frames 1600... [2025-01-04 16:11:04,394][134211] Num frames 1700... 
[2025-01-04 16:11:04,535][134211] Avg episode rewards: #0: 9.310, true rewards: #0: 5.977 [2025-01-04 16:11:04,536][134211] Avg episode reward: 9.310, avg true_objective: 5.977 [2025-01-04 16:11:04,554][134211] Num frames 1800... [2025-01-04 16:11:04,651][134211] Num frames 1900... [2025-01-04 16:11:04,743][134211] Num frames 2000... [2025-01-04 16:11:04,881][134211] Avg episode rewards: #0: 7.735, true rewards: #0: 5.235 [2025-01-04 16:11:04,882][134211] Avg episode reward: 7.735, avg true_objective: 5.235 [2025-01-04 16:11:04,896][134211] Num frames 2100... [2025-01-04 16:11:05,016][134211] Num frames 2200... [2025-01-04 16:11:05,112][134211] Num frames 2300... [2025-01-04 16:11:05,205][134211] Num frames 2400... [2025-01-04 16:11:05,302][134211] Avg episode rewards: #0: 7.092, true rewards: #0: 4.892 [2025-01-04 16:11:05,302][134211] Avg episode reward: 7.092, avg true_objective: 4.892 [2025-01-04 16:11:05,375][134211] Num frames 2500... [2025-01-04 16:11:05,470][134211] Num frames 2600... [2025-01-04 16:11:05,562][134211] Num frames 2700... [2025-01-04 16:11:05,653][134211] Num frames 2800... [2025-01-04 16:11:05,735][134211] Avg episode rewards: #0: 6.550, true rewards: #0: 4.717 [2025-01-04 16:11:05,736][134211] Avg episode reward: 6.550, avg true_objective: 4.717 [2025-01-04 16:11:05,815][134211] Num frames 2900... [2025-01-04 16:11:05,909][134211] Num frames 3000... [2025-01-04 16:11:06,001][134211] Num frames 3100... [2025-01-04 16:11:06,096][134211] Num frames 3200... [2025-01-04 16:11:06,188][134211] Num frames 3300... [2025-01-04 16:11:06,310][134211] Avg episode rewards: #0: 7.106, true rewards: #0: 4.820 [2025-01-04 16:11:06,310][134211] Avg episode reward: 7.106, avg true_objective: 4.820 [2025-01-04 16:11:06,360][134211] Num frames 3400... [2025-01-04 16:11:06,455][134211] Num frames 3500... [2025-01-04 16:11:06,548][134211] Num frames 3600... [2025-01-04 16:11:06,642][134211] Num frames 3700... [2025-01-04 16:11:06,735][134211] Num frames 3800... 
[2025-01-04 16:11:06,828][134211] Num frames 3900... [2025-01-04 16:11:06,921][134211] Num frames 4000... [2025-01-04 16:11:07,015][134211] Num frames 4100... [2025-01-04 16:11:07,110][134211] Num frames 4200... [2025-01-04 16:11:07,205][134211] Num frames 4300... [2025-01-04 16:11:07,299][134211] Num frames 4400... [2025-01-04 16:11:07,393][134211] Num frames 4500... [2025-01-04 16:11:07,490][134211] Num frames 4600... [2025-01-04 16:11:07,631][134211] Avg episode rewards: #0: 10.246, true rewards: #0: 5.871 [2025-01-04 16:11:07,632][134211] Avg episode reward: 10.246, avg true_objective: 5.871 [2025-01-04 16:11:07,635][134211] Num frames 4700... [2025-01-04 16:11:07,751][134211] Num frames 4800... [2025-01-04 16:11:07,843][134211] Num frames 4900... [2025-01-04 16:11:07,934][134211] Num frames 5000... [2025-01-04 16:11:08,030][134211] Avg episode rewards: #0: 9.610, true rewards: #0: 5.610 [2025-01-04 16:11:08,030][134211] Avg episode reward: 9.610, avg true_objective: 5.610 [2025-01-04 16:11:08,088][134211] Num frames 5100... [2025-01-04 16:11:08,175][134211] Num frames 5200... [2025-01-04 16:11:08,261][134211] Num frames 5300... [2025-01-04 16:11:08,348][134211] Num frames 5400... [2025-01-04 16:11:08,435][134211] Num frames 5500... [2025-01-04 16:11:08,521][134211] Num frames 5600... [2025-01-04 16:11:08,608][134211] Num frames 5700... [2025-01-04 16:11:08,697][134211] Num frames 5800... [2025-01-04 16:11:08,775][134211] Avg episode rewards: #0: 10.628, true rewards: #0: 5.828 [2025-01-04 16:11:08,776][134211] Avg episode reward: 10.628, avg true_objective: 5.828 [2025-01-04 16:11:19,530][134211] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! 
[2025-01-04 16:13:04,299][134211] The model has been pushed to https://huggingface.co/spenning/rl_course_vizdoom_health_gathering_supreme [2025-01-05 10:27:12,121][28277] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 10:27:12,130][28277] Rollout worker 0 uses device cpu [2025-01-05 10:27:12,130][28277] Rollout worker 1 uses device cpu [2025-01-05 10:27:12,130][28277] Rollout worker 2 uses device cpu [2025-01-05 10:27:12,130][28277] Rollout worker 3 uses device cpu [2025-01-05 10:27:12,130][28277] Rollout worker 4 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 5 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 6 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 7 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 8 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 9 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 10 uses device cpu [2025-01-05 10:27:12,131][28277] Rollout worker 11 uses device cpu [2025-01-05 10:33:53,140][07439] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-05 10:33:53,142][07439] Rollout worker 0 uses device cpu [2025-01-05 10:33:53,142][07439] Rollout worker 1 uses device cpu [2025-01-05 10:33:53,142][07439] Rollout worker 2 uses device cpu [2025-01-05 10:33:53,143][07439] Rollout worker 3 uses device cpu [2025-01-05 10:33:53,143][07439] Rollout worker 4 uses device cpu [2025-01-05 10:33:53,143][07439] Rollout worker 5 uses device cpu [2025-01-05 10:33:53,143][07439] Rollout worker 6 uses device cpu [2025-01-05 10:33:53,144][07439] Rollout worker 7 uses device cpu [2025-01-05 10:33:53,144][07439] Rollout worker 8 uses device cpu [2025-01-05 10:33:53,144][07439] Rollout worker 9 uses device cpu [2025-01-05 10:33:53,144][07439] Rollout worker 10 uses device cpu [2025-01-05 10:33:53,145][07439] Rollout worker 11 uses device cpu [2025-01-05 10:33:53,227][07439] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 10:33:53,227][07439] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 10:33:53,265][07439] Starting all processes... [2025-01-05 10:33:53,266][07439] Starting process learner_proc0 [2025-01-05 10:33:55,336][07439] Starting all processes... 
[2025-01-05 10:33:55,354][07750] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 10:33:55,354][07750] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 10:33:55,358][07439] Starting process inference_proc0-0 [2025-01-05 10:33:55,358][07439] Starting process rollout_proc0 [2025-01-05 10:33:55,364][07750] Num visible devices: 1 [2025-01-05 10:33:55,358][07439] Starting process rollout_proc1 [2025-01-05 10:33:55,359][07439] Starting process rollout_proc2 [2025-01-05 10:33:55,371][07439] Starting process rollout_proc6 [2025-01-05 10:33:55,367][07439] Starting process rollout_proc4 [2025-01-05 10:33:55,378][07750] Starting seed is not provided [2025-01-05 10:33:55,379][07750] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 10:33:55,370][07439] Starting process rollout_proc5 [2025-01-05 10:33:55,379][07750] Initializing actor-critic model on device cuda:0 [2025-01-05 10:33:55,380][07750] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 10:33:55,383][07750] RunningMeanStd input shape: (1,) [2025-01-05 10:33:55,360][07439] Starting process rollout_proc3 [2025-01-05 10:33:55,374][07439] Starting process rollout_proc7 [2025-01-05 10:33:55,374][07439] Starting process rollout_proc8 [2025-01-05 10:33:55,379][07439] Starting process rollout_proc9 [2025-01-05 10:33:55,379][07439] Starting process rollout_proc10 [2025-01-05 10:33:55,379][07439] Starting process rollout_proc11 [2025-01-05 10:33:55,418][07750] ConvEncoder: input_channels=3 [2025-01-05 10:33:55,851][07750] Conv encoder output size: 512 [2025-01-05 10:33:55,852][07750] Policy head output size: 512 [2025-01-05 10:33:55,914][07750] Created Actor Critic model with architecture: [2025-01-05 10:33:55,914][07750] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): 
RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 10:33:55,964][07750] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=() Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init init_model_data = self.learner.init() ^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 215, in init self.actor_critic.model_to_device(self.device) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/model/actor_critic.py", line 60, in model_to_device module.to(device) File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 988, in _apply
    self._buffers[key] = fn(buf)
                         ^^^^^^^
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1326, in convert
    return t.to(
           ^^^^^
  File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/torch/cuda/__init__.py", line 319, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
[2025-01-05 10:33:55,971][07750] Unhandled exception The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
in evt loop learner_proc0_evt_loop [2025-01-05 10:33:59,615][07831] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:33:59,913][07827] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:33:59,987][07839] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,027][07835] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,241][07840] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,310][07833] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,343][07832] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,372][07836] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,392][07829] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 10:34:00,392][07829] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 10:34:00,404][07837] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,404][07829] Num visible devices: 1 [2025-01-05 10:34:00,407][07828] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,424][07838] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:00,444][07834] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 10:34:13,220][07439] Heartbeat connected on Batcher_0 [2025-01-05 10:34:13,227][07439] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-05 10:34:13,231][07439] Heartbeat connected on RolloutWorker_w0 [2025-01-05 10:34:13,234][07439] Heartbeat connected on RolloutWorker_w1 [2025-01-05 10:34:13,236][07439] Heartbeat connected on 
RolloutWorker_w2 [2025-01-05 10:34:13,238][07439] Heartbeat connected on RolloutWorker_w3 [2025-01-05 10:34:13,241][07439] Heartbeat connected on RolloutWorker_w4 [2025-01-05 10:34:13,243][07439] Heartbeat connected on RolloutWorker_w5 [2025-01-05 10:34:13,247][07439] Heartbeat connected on RolloutWorker_w6 [2025-01-05 10:34:13,252][07439] Heartbeat connected on RolloutWorker_w7 [2025-01-05 10:34:13,255][07439] Heartbeat connected on RolloutWorker_w8 [2025-01-05 10:34:13,258][07439] Heartbeat connected on RolloutWorker_w9 [2025-01-05 10:34:13,261][07439] Heartbeat connected on RolloutWorker_w10 [2025-01-05 10:34:13,264][07439] Heartbeat connected on RolloutWorker_w11 [2025-01-05 10:37:08,716][07439] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 7439], exiting... [2025-01-05 10:37:08,718][07831] Stopping RolloutWorker_w2... [2025-01-05 10:37:08,718][07838] Stopping RolloutWorker_w9... [2025-01-05 10:37:08,718][07835] Stopping RolloutWorker_w3... [2025-01-05 10:37:08,718][07828] Stopping RolloutWorker_w1... [2025-01-05 10:37:08,718][07837] Stopping RolloutWorker_w10... [2025-01-05 10:37:08,718][07833] Stopping RolloutWorker_w6... [2025-01-05 10:37:08,718][07827] Stopping RolloutWorker_w0... [2025-01-05 10:37:08,718][07439] Runner profile tree view: main_loop: 195.4533 [2025-01-05 10:37:08,718][07832] Stopping RolloutWorker_w4... [2025-01-05 10:37:08,718][07750] Stopping Batcher_0... [2025-01-05 10:37:08,718][07839] Stopping RolloutWorker_w5... [2025-01-05 10:37:08,719][07829] Stopping InferenceWorker_p0-w0... [2025-01-05 10:37:08,719][07840] Stopping RolloutWorker_w11... [2025-01-05 10:37:08,719][07831] Loop rollout_proc2_evt_loop terminating... [2025-01-05 10:37:08,719][07439] Collected {}, FPS: 0.0 [2025-01-05 10:37:08,720][07835] Loop rollout_proc3_evt_loop terminating... [2025-01-05 10:37:08,720][07834] Stopping RolloutWorker_w7... [2025-01-05 10:37:08,720][07833] Loop rollout_proc6_evt_loop terminating... 
[2025-01-05 10:37:08,720][07836] Stopping RolloutWorker_w8... [2025-01-05 10:37:08,720][07838] Loop rollout_proc9_evt_loop terminating... [2025-01-05 10:37:08,720][07750] Loop batcher_evt_loop terminating... [2025-01-05 10:37:08,720][07839] Loop rollout_proc5_evt_loop terminating... [2025-01-05 10:37:08,720][07840] Loop rollout_proc11_evt_loop terminating... [2025-01-05 10:37:08,720][07837] Loop rollout_proc10_evt_loop terminating... [2025-01-05 10:37:08,721][07832] Loop rollout_proc4_evt_loop terminating... [2025-01-05 10:37:08,721][07834] Loop rollout_proc7_evt_loop terminating... [2025-01-05 10:37:08,721][07836] Loop rollout_proc8_evt_loop terminating... [2025-01-05 10:37:08,722][07828] Loop rollout_proc1_evt_loop terminating... [2025-01-05 10:37:08,722][07827] Loop rollout_proc0_evt_loop terminating... [2025-01-05 10:37:08,720][07829] Loop inference_proc0-0_evt_loop terminating... [2025-01-05 10:37:09,698][07439] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 10:37:09,699][07439] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 10:37:09,703][07439] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 10:37:09,703][07439] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 10:37:09,704][07439] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 10:37:09,708][07439] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 10:37:09,708][07439] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 10:37:09,709][07439] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 10:37:09,709][07439] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
[2025-01-05 10:37:09,709][07439] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 10:37:09,709][07439] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 10:37:09,710][07439] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 10:37:09,710][07439] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 10:37:09,710][07439] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 10:37:09,710][07439] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 10:37:09,770][07439] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 10:37:09,773][07439] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 10:37:09,774][07439] RunningMeanStd input shape: (1,) [2025-01-05 10:37:09,800][07439] ConvEncoder: input_channels=3 [2025-01-05 10:37:09,964][07439] Conv encoder output size: 512 [2025-01-05 10:37:09,965][07439] Policy head output size: 512 [2025-01-05 10:48:11,699][06474] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-05 10:48:11,700][06474] Rollout worker 0 uses device cpu [2025-01-05 10:48:11,701][06474] Rollout worker 1 uses device cpu [2025-01-05 10:48:11,701][06474] Rollout worker 2 uses device cpu [2025-01-05 10:48:11,701][06474] Rollout worker 3 uses device cpu [2025-01-05 10:48:11,701][06474] Rollout worker 4 uses device cpu [2025-01-05 10:48:11,701][06474] Rollout worker 5 uses device cpu [2025-01-05 10:48:11,702][06474] Rollout worker 6 uses device cpu [2025-01-05 10:48:11,702][06474] Rollout worker 7 uses device cpu [2025-01-05 10:48:11,702][06474] Rollout worker 8 uses device cpu [2025-01-05 10:48:11,702][06474] Rollout worker 9 uses device cpu [2025-01-05 10:48:11,703][06474] Rollout worker 10 uses device cpu [2025-01-05 10:48:11,703][06474] Rollout worker 11 uses device cpu [2025-01-05 11:37:17,770][05549] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 11:37:17,772][05549] Rollout worker 0 uses device cpu [2025-01-05 11:37:17,773][05549] Rollout worker 1 uses device cpu [2025-01-05 11:37:17,773][05549] Rollout worker 2 uses device cpu [2025-01-05 11:37:17,773][05549] Rollout worker 3 uses device cpu [2025-01-05 11:37:17,773][05549] Rollout worker 4 uses device cpu [2025-01-05 11:37:17,773][05549] Rollout worker 5 uses device cpu [2025-01-05 11:37:17,774][05549] Rollout worker 6 uses device cpu [2025-01-05 11:37:17,774][05549] Rollout worker 7 uses device cpu [2025-01-05 11:37:17,774][05549] Rollout worker 8 uses device cpu [2025-01-05 11:37:17,774][05549] Rollout worker 9 uses device cpu [2025-01-05 11:37:17,775][05549] Rollout worker 10 uses device cpu [2025-01-05 11:37:17,775][05549] Rollout worker 11 uses device cpu [2025-01-05 11:37:17,845][05549] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:37:17,845][05549] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 11:37:17,883][05549] Starting all processes... 
[2025-01-05 11:37:17,884][05549] Starting process learner_proc0 [2025-01-05 11:37:19,501][05549] Starting all processes... [2025-01-05 11:37:19,507][05549] Starting process inference_proc0-0 [2025-01-05 11:37:19,510][05634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:37:19,510][05549] Starting process rollout_proc0 [2025-01-05 11:37:19,510][05634] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 11:37:19,510][05549] Starting process rollout_proc1 [2025-01-05 11:37:19,510][05549] Starting process rollout_proc2 [2025-01-05 11:37:19,513][05549] Starting process rollout_proc3 [2025-01-05 11:37:19,514][05549] Starting process rollout_proc4 [2025-01-05 11:37:19,517][05549] Starting process rollout_proc5 [2025-01-05 11:37:19,517][05549] Starting process rollout_proc6 [2025-01-05 11:37:19,523][05549] Starting process rollout_proc7 [2025-01-05 11:37:19,526][05634] Num visible devices: 1 [2025-01-05 11:37:19,523][05549] Starting process rollout_proc8 [2025-01-05 11:37:19,537][05634] Starting seed is not provided [2025-01-05 11:37:19,537][05634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:37:19,537][05634] Initializing actor-critic model on device cuda:0 [2025-01-05 11:37:19,538][05634] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:37:19,542][05634] RunningMeanStd input shape: (1,) [2025-01-05 11:37:19,524][05549] Starting process rollout_proc9 [2025-01-05 11:37:19,524][05549] Starting process rollout_proc10 [2025-01-05 11:37:19,524][05549] Starting process rollout_proc11 [2025-01-05 11:37:19,567][05634] ConvEncoder: input_channels=3 [2025-01-05 11:37:19,873][05634] Conv encoder output size: 512 [2025-01-05 11:37:19,874][05634] Policy head output size: 512 [2025-01-05 11:37:19,933][05634] Created Actor Critic model with architecture: [2025-01-05 11:37:19,933][05634] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): 
RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 11:37:20,186][05634] Using optimizer [2025-01-05 11:37:22,675][05675] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,726][05676] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,744][05701] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,859][05697] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,900][05673] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,942][05696] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,958][05677] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:22,964][05694] Worker 5 uses CPU cores [0, 1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:23,070][05700] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:23,089][05674] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:37:23,089][05674] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 11:37:23,102][05698] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:23,102][05674] Num visible devices: 1 [2025-01-05 11:37:23,129][05678] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:23,138][05699] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:37:23,146][05634] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth... [2025-01-05 11:37:23,201][05634] Loading model from checkpoint [2025-01-05 11:37:23,202][05634] Loaded experiment state at self.train_step=244142, self.env_steps=1000005632 [2025-01-05 11:37:23,204][05634] Initialized policy 0 weights for model version 244142 [2025-01-05 11:37:23,207][05634] LearnerWorker_p0 finished initialization! [2025-01-05 11:37:23,207][05634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:37:23,287][05674] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:37:23,287][05674] RunningMeanStd input shape: (1,) [2025-01-05 11:37:23,297][05674] ConvEncoder: input_channels=3 [2025-01-05 11:37:23,390][05674] Conv encoder output size: 512 [2025-01-05 11:37:23,391][05674] Policy head output size: 512 [2025-01-05 11:37:23,418][05549] Inference worker 0-0 is ready! [2025-01-05 11:37:23,418][05549] All inference workers are ready! Signal rollout workers to start! 
[2025-01-05 11:37:23,460][05700] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,460][05678] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,462][05677] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,467][05676] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,481][05673] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,483][05675] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,483][05701] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,484][05699] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,484][05697] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,484][05696] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,484][05698] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,484][05694] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:23,670][05549] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1000005632. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:37:23,935][05678] Decorrelating experience for 0 frames... [2025-01-05 11:37:23,935][05676] Decorrelating experience for 0 frames... [2025-01-05 11:37:23,935][05694] Decorrelating experience for 0 frames... [2025-01-05 11:37:23,935][05697] Decorrelating experience for 0 frames... [2025-01-05 11:37:23,935][05675] Decorrelating experience for 0 frames... [2025-01-05 11:37:23,950][05673] Decorrelating experience for 0 frames... [2025-01-05 11:37:24,213][05678] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,213][05697] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,243][05694] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,247][05698] Decorrelating experience for 0 frames... 
[2025-01-05 11:37:24,272][05673] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,278][05676] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,531][05700] Decorrelating experience for 0 frames... [2025-01-05 11:37:24,589][05696] Decorrelating experience for 0 frames... [2025-01-05 11:37:24,598][05697] Decorrelating experience for 64 frames... [2025-01-05 11:37:24,626][05677] Decorrelating experience for 0 frames... [2025-01-05 11:37:24,642][05694] Decorrelating experience for 64 frames... [2025-01-05 11:37:24,845][05676] Decorrelating experience for 64 frames... [2025-01-05 11:37:24,850][05700] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,904][05698] Decorrelating experience for 32 frames... [2025-01-05 11:37:24,916][05673] Decorrelating experience for 64 frames... [2025-01-05 11:37:24,965][05697] Decorrelating experience for 96 frames... [2025-01-05 11:37:25,006][05694] Decorrelating experience for 96 frames... [2025-01-05 11:37:25,181][05678] Decorrelating experience for 64 frames... [2025-01-05 11:37:25,269][05677] Decorrelating experience for 32 frames... [2025-01-05 11:37:25,302][05700] Decorrelating experience for 64 frames... [2025-01-05 11:37:25,341][05673] Decorrelating experience for 96 frames... [2025-01-05 11:37:25,540][05696] Decorrelating experience for 32 frames... [2025-01-05 11:37:25,570][05678] Decorrelating experience for 96 frames... [2025-01-05 11:37:25,681][05700] Decorrelating experience for 96 frames... [2025-01-05 11:37:25,693][05698] Decorrelating experience for 64 frames... [2025-01-05 11:37:25,863][05675] Decorrelating experience for 32 frames... [2025-01-05 11:37:25,949][05696] Decorrelating experience for 64 frames... [2025-01-05 11:37:25,973][05677] Decorrelating experience for 64 frames... [2025-01-05 11:37:26,259][05699] Decorrelating experience for 0 frames... [2025-01-05 11:37:26,296][05698] Decorrelating experience for 96 frames... 
[2025-01-05 11:37:26,327][05675] Decorrelating experience for 64 frames... [2025-01-05 11:37:26,644][05699] Decorrelating experience for 32 frames... [2025-01-05 11:37:26,649][05701] Decorrelating experience for 0 frames... [2025-01-05 11:37:26,723][05677] Decorrelating experience for 96 frames... [2025-01-05 11:37:26,741][05634] Signal inference workers to stop experience collection... [2025-01-05 11:37:26,749][05674] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 11:37:26,962][05701] Decorrelating experience for 32 frames... [2025-01-05 11:37:26,997][05675] Decorrelating experience for 96 frames... [2025-01-05 11:37:27,017][05696] Decorrelating experience for 96 frames... [2025-01-05 11:37:27,306][05676] Decorrelating experience for 96 frames... [2025-01-05 11:37:27,328][05701] Decorrelating experience for 64 frames... [2025-01-05 11:37:27,621][05699] Decorrelating experience for 64 frames... [2025-01-05 11:37:27,646][05701] Decorrelating experience for 96 frames... [2025-01-05 11:37:27,956][05699] Decorrelating experience for 96 frames... [2025-01-05 11:37:28,669][05549] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1000005632. Throughput: 0: 254.0. Samples: 1270. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:37:28,670][05549] Avg episode reward: [(0, '2.280')] [2025-01-05 11:37:30,283][05634] Signal inference workers to resume experience collection... [2025-01-05 11:37:30,283][05674] InferenceWorker_p0-w0: resuming experience collection [2025-01-05 11:37:30,284][05634] Stopping Batcher_0... [2025-01-05 11:37:30,284][05634] Loop batcher_evt_loop terminating... [2025-01-05 11:37:30,289][05549] Component Batcher_0 stopped! [2025-01-05 11:37:30,303][05674] Weights refcount: 2 0 [2025-01-05 11:37:30,304][05674] Stopping InferenceWorker_p0-w0... [2025-01-05 11:37:30,305][05549] Component InferenceWorker_p0-w0 stopped! [2025-01-05 11:37:30,305][05674] Loop inference_proc0-0_evt_loop terminating... 
[2025-01-05 11:37:30,340][05694] Stopping RolloutWorker_w5... [2025-01-05 11:37:30,340][05549] Component RolloutWorker_w5 stopped! [2025-01-05 11:37:30,340][05700] Stopping RolloutWorker_w10... [2025-01-05 11:37:30,341][05700] Loop rollout_proc10_evt_loop terminating... [2025-01-05 11:37:30,341][05549] Component RolloutWorker_w10 stopped! [2025-01-05 11:37:30,340][05694] Loop rollout_proc5_evt_loop terminating... [2025-01-05 11:37:30,344][05549] Component RolloutWorker_w1 stopped! [2025-01-05 11:37:30,344][05675] Stopping RolloutWorker_w1... [2025-01-05 11:37:30,345][05675] Loop rollout_proc1_evt_loop terminating... [2025-01-05 11:37:30,346][05549] Component RolloutWorker_w8 stopped! [2025-01-05 11:37:30,346][05697] Stopping RolloutWorker_w8... [2025-01-05 11:37:30,347][05698] Stopping RolloutWorker_w9... [2025-01-05 11:37:30,347][05549] Component RolloutWorker_w9 stopped! [2025-01-05 11:37:30,347][05697] Loop rollout_proc8_evt_loop terminating... [2025-01-05 11:37:30,347][05698] Loop rollout_proc9_evt_loop terminating... [2025-01-05 11:37:30,348][05696] Stopping RolloutWorker_w7... [2025-01-05 11:37:30,348][05696] Loop rollout_proc7_evt_loop terminating... [2025-01-05 11:37:30,350][05549] Component RolloutWorker_w7 stopped! [2025-01-05 11:37:30,350][05676] Stopping RolloutWorker_w3... [2025-01-05 11:37:30,350][05549] Component RolloutWorker_w3 stopped! [2025-01-05 11:37:30,350][05676] Loop rollout_proc3_evt_loop terminating... [2025-01-05 11:37:30,350][05699] Stopping RolloutWorker_w6... [2025-01-05 11:37:30,350][05549] Component RolloutWorker_w6 stopped! [2025-01-05 11:37:30,351][05699] Loop rollout_proc6_evt_loop terminating... [2025-01-05 11:37:30,354][05677] Stopping RolloutWorker_w2... [2025-01-05 11:37:30,354][05549] Component RolloutWorker_w2 stopped! [2025-01-05 11:37:30,355][05673] Stopping RolloutWorker_w0... [2025-01-05 11:37:30,355][05678] Stopping RolloutWorker_w4... [2025-01-05 11:37:30,355][05677] Loop rollout_proc2_evt_loop terminating... 
[2025-01-05 11:37:30,355][05549] Component RolloutWorker_w0 stopped! [2025-01-05 11:37:30,355][05673] Loop rollout_proc0_evt_loop terminating... [2025-01-05 11:37:30,355][05678] Loop rollout_proc4_evt_loop terminating... [2025-01-05 11:37:30,355][05549] Component RolloutWorker_w4 stopped! [2025-01-05 11:37:30,359][05701] Stopping RolloutWorker_w11... [2025-01-05 11:37:30,360][05549] Component RolloutWorker_w11 stopped! [2025-01-05 11:37:30,360][05701] Loop rollout_proc11_evt_loop terminating... [2025-01-05 11:37:30,790][05634] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth... [2025-01-05 11:37:30,884][05634] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000243841_998772736.pth [2025-01-05 11:37:30,886][05634] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth... [2025-01-05 11:37:30,978][05634] Stopping LearnerWorker_p0... [2025-01-05 11:37:30,979][05634] Loop learner_proc0_evt_loop terminating... [2025-01-05 11:37:30,978][05549] Component LearnerWorker_p0 stopped! [2025-01-05 11:37:30,980][05549] Waiting for process learner_proc0 to stop... [2025-01-05 11:37:31,834][05549] Waiting for process inference_proc0-0 to join... [2025-01-05 11:37:31,835][05549] Waiting for process rollout_proc0 to join... [2025-01-05 11:37:31,835][05549] Waiting for process rollout_proc1 to join... [2025-01-05 11:37:31,835][05549] Waiting for process rollout_proc2 to join... [2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc3 to join... [2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc4 to join... [2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc5 to join... [2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc6 to join... 
[2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc7 to join... [2025-01-05 11:37:31,836][05549] Waiting for process rollout_proc8 to join... [2025-01-05 11:37:31,837][05549] Waiting for process rollout_proc9 to join... [2025-01-05 11:37:31,837][05549] Waiting for process rollout_proc10 to join... [2025-01-05 11:37:31,837][05549] Waiting for process rollout_proc11 to join... [2025-01-05 11:37:31,837][05549] Batcher 0 profile tree view: batching: 0.0225, releasing_batches: 0.0007 [2025-01-05 11:37:31,837][05549] InferenceWorker_p0-w0 profile tree view: update_model: 0.0088 wait_policy: 0.0001 wait_policy_total: 1.6375 one_step: 0.0042 handle_policy_step: 1.6229 deserialize: 0.0288, stack: 0.0040, obs_to_device_normalize: 0.2556, forward: 1.1785, send_messages: 0.0425 prepare_outputs: 0.0857 to_cpu: 0.0484 [2025-01-05 11:37:31,838][05549] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 1.1710 train: 3.1627 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0015, kl_divergence: 0.0078, after_optimizer: 0.0398 calculate_losses: 0.5947 losses_init: 0.0000, forward_head: 0.4179, bptt_initial: 0.1187, tail: 0.0145, advantages_returns: 0.0017, losses: 0.0369 bptt: 0.0043 bptt_forward_core: 0.0042 update: 2.5159 clip: 0.0526 [2025-01-05 11:37:31,838][05549] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0422, env_step: 0.4404, overhead: 0.0230, complete_rollouts: 0.0008 save_policy_outputs: 0.0396 split_output_tensors: 0.0130 [2025-01-05 11:37:31,838][05549] RolloutWorker_w11 profile tree view: wait_for_trajectories: 0.0002, enqueue_policy_requests: 0.0006 [2025-01-05 11:37:31,838][05549] Loop Runner_EvtLoop terminating... 
[2025-01-05 11:37:31,839][05549] Runner profile tree view: main_loop: 13.9557 [2025-01-05 11:37:31,839][05549] Collected {0: 1000013824}, FPS: 587.0 [2025-01-05 11:37:32,130][05549] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:37:32,130][05549] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:37:32,131][05549] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 11:37:32,131][05549] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:37:32,132][05549] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:37:32,132][05549] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:37:32,132][05549] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 11:37:32,132][05549] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:37:32,152][05549] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:37:32,153][05549] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:37:32,154][05549] RunningMeanStd input shape: (1,) [2025-01-05 11:37:32,163][05549] ConvEncoder: input_channels=3 [2025-01-05 11:37:32,258][05549] Conv encoder output size: 512 [2025-01-05 11:37:32,258][05549] Policy head output size: 512 [2025-01-05 11:37:32,377][05549] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth... [2025-01-05 11:37:33,017][05549] Num frames 100... [2025-01-05 11:37:33,163][05549] Num frames 200... [2025-01-05 11:37:33,296][05549] Num frames 300... [2025-01-05 11:37:33,415][05549] Num frames 400... [2025-01-05 11:37:33,523][05549] Num frames 500... [2025-01-05 11:37:33,640][05549] Num frames 600... [2025-01-05 11:37:33,778][05549] Num frames 700... [2025-01-05 11:37:33,926][05549] Num frames 800... [2025-01-05 11:37:34,094][05549] Avg episode rewards: #0: 18.960, true rewards: #0: 8.960 [2025-01-05 11:37:34,094][05549] Avg episode reward: 18.960, avg true_objective: 8.960 [2025-01-05 11:37:34,101][05549] Num frames 900... [2025-01-05 11:37:34,231][05549] Num frames 1000... [2025-01-05 11:37:34,365][05549] Num frames 1100... [2025-01-05 11:37:34,487][05549] Num frames 1200... [2025-01-05 11:37:34,631][05549] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2025-01-05 11:37:34,632][05549] Avg episode reward: 11.400, avg true_objective: 6.400 [2025-01-05 11:37:34,656][05549] Num frames 1300... [2025-01-05 11:37:34,774][05549] Num frames 1400... [2025-01-05 11:37:34,888][05549] Num frames 1500... [2025-01-05 11:37:35,006][05549] Num frames 1600... [2025-01-05 11:37:35,124][05549] Num frames 1700... [2025-01-05 11:37:35,260][05549] Num frames 1800... 
[2025-01-05 11:37:35,390][05549] Num frames 1900... [2025-01-05 11:37:35,557][05549] Avg episode rewards: #0: 11.613, true rewards: #0: 6.613 [2025-01-05 11:37:35,558][05549] Avg episode reward: 11.613, avg true_objective: 6.613 [2025-01-05 11:37:35,577][05549] Num frames 2000... [2025-01-05 11:37:35,690][05549] Num frames 2100... [2025-01-05 11:37:35,805][05549] Num frames 2200... [2025-01-05 11:37:35,938][05549] Num frames 2300... [2025-01-05 11:37:36,031][05549] Avg episode rewards: #0: 9.840, true rewards: #0: 5.840 [2025-01-05 11:37:36,032][05549] Avg episode reward: 9.840, avg true_objective: 5.840 [2025-01-05 11:37:36,101][05549] Num frames 2400... [2025-01-05 11:37:36,206][05549] Num frames 2500... [2025-01-05 11:37:36,309][05549] Num frames 2600... [2025-01-05 11:37:36,370][05549] Avg episode rewards: #0: 8.412, true rewards: #0: 5.212 [2025-01-05 11:37:36,370][05549] Avg episode reward: 8.412, avg true_objective: 5.212 [2025-01-05 11:37:36,471][05549] Num frames 2700... [2025-01-05 11:37:36,576][05549] Num frames 2800... [2025-01-05 11:37:36,680][05549] Num frames 2900... [2025-01-05 11:37:36,787][05549] Num frames 3000... [2025-01-05 11:37:36,894][05549] Num frames 3100... [2025-01-05 11:37:37,001][05549] Num frames 3200... [2025-01-05 11:37:37,097][05549] Num frames 3300... [2025-01-05 11:37:37,192][05549] Avg episode rewards: #0: 9.237, true rewards: #0: 5.570 [2025-01-05 11:37:37,192][05549] Avg episode reward: 9.237, avg true_objective: 5.570 [2025-01-05 11:37:37,250][05549] Num frames 3400... [2025-01-05 11:37:37,361][05549] Num frames 3500... [2025-01-05 11:37:37,477][05549] Num frames 3600... [2025-01-05 11:37:37,588][05549] Num frames 3700... [2025-01-05 11:37:37,753][05549] Avg episode rewards: #0: 8.700, true rewards: #0: 5.414 [2025-01-05 11:37:37,754][05549] Avg episode reward: 8.700, avg true_objective: 5.414 [2025-01-05 11:37:37,766][05549] Num frames 3800... [2025-01-05 11:37:37,898][05549] Num frames 3900... 
[2025-01-05 11:37:38,039][05549] Num frames 4000... [2025-01-05 11:37:38,158][05549] Avg episode rewards: #0: 7.933, true rewards: #0: 5.057 [2025-01-05 11:37:38,158][05549] Avg episode reward: 7.933, avg true_objective: 5.057 [2025-01-05 11:37:38,231][05549] Num frames 4100... [2025-01-05 11:37:38,376][05549] Num frames 4200... [2025-01-05 11:37:38,489][05549] Num frames 4300... [2025-01-05 11:37:38,602][05549] Num frames 4400... [2025-01-05 11:37:38,716][05549] Num frames 4500... [2025-01-05 11:37:38,842][05549] Avg episode rewards: #0: 7.953, true rewards: #0: 5.064 [2025-01-05 11:37:38,842][05549] Avg episode reward: 7.953, avg true_objective: 5.064 [2025-01-05 11:37:38,891][05549] Num frames 4600... [2025-01-05 11:37:39,015][05549] Num frames 4700... [2025-01-05 11:37:39,163][05549] Avg episode rewards: #0: 7.482, true rewards: #0: 4.782 [2025-01-05 11:37:39,164][05549] Avg episode reward: 7.482, avg true_objective: 4.782 [2025-01-05 11:37:49,273][05549] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 11:37:49,282][05549] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:37:49,282][05549] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:37:49,283][05549] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2025-01-05 11:37:49,283][05549] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:37:49,283][05549] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:37:49,284][05549] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 11:37:49,284][05549] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:37:49,301][05549] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:37:49,302][05549] RunningMeanStd input shape: (1,) [2025-01-05 11:37:49,308][05549] ConvEncoder: input_channels=3 [2025-01-05 11:37:49,335][05549] Conv encoder output size: 512 [2025-01-05 11:37:49,336][05549] Policy head output size: 512 [2025-01-05 11:37:49,351][05549] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth... [2025-01-05 11:37:49,782][05549] Num frames 100... [2025-01-05 11:37:49,882][05549] Num frames 200... [2025-01-05 11:37:49,978][05549] Num frames 300... [2025-01-05 11:37:50,077][05549] Num frames 400... [2025-01-05 11:37:50,176][05549] Num frames 500... [2025-01-05 11:37:50,274][05549] Num frames 600... [2025-01-05 11:37:50,367][05549] Avg episode rewards: #0: 11.400, true rewards: #0: 6.400 [2025-01-05 11:37:50,367][05549] Avg episode reward: 11.400, avg true_objective: 6.400 [2025-01-05 11:37:50,468][05549] Num frames 700... 
[2025-01-05 11:37:50,548][05549] Num frames 800... [2025-01-05 11:37:50,646][05549] Avg episode rewards: #0: 6.755, true rewards: #0: 4.255 [2025-01-05 11:37:50,646][05549] Avg episode reward: 6.755, avg true_objective: 4.255 [2025-01-05 11:37:50,694][05549] Num frames 900... [2025-01-05 11:37:50,777][05549] Num frames 1000... [2025-01-05 11:37:50,864][05549] Num frames 1100... [2025-01-05 11:37:50,949][05549] Num frames 1200... [2025-01-05 11:37:51,039][05549] Num frames 1300... [2025-01-05 11:37:51,121][05549] Avg episode rewards: #0: 7.437, true rewards: #0: 4.437 [2025-01-05 11:37:51,121][05549] Avg episode reward: 7.437, avg true_objective: 4.437 [2025-01-05 11:37:51,192][05549] Num frames 1400... [2025-01-05 11:37:51,280][05549] Num frames 1500... [2025-01-05 11:37:51,367][05549] Num frames 1600... [2025-01-05 11:37:51,451][05549] Num frames 1700... [2025-01-05 11:37:51,535][05549] Num frames 1800... [2025-01-05 11:37:51,654][05549] Avg episode rewards: #0: 7.438, true rewards: #0: 4.687 [2025-01-05 11:37:51,654][05549] Avg episode reward: 7.438, avg true_objective: 4.687 [2025-01-05 11:37:51,691][05549] Num frames 1900... [2025-01-05 11:37:51,781][05549] Num frames 2000... [2025-01-05 11:37:51,866][05549] Num frames 2100... [2025-01-05 11:37:51,951][05549] Num frames 2200... [2025-01-05 11:37:52,028][05549] Avg episode rewards: #0: 6.854, true rewards: #0: 4.454 [2025-01-05 11:37:52,028][05549] Avg episode reward: 6.854, avg true_objective: 4.454 [2025-01-05 11:37:52,103][05549] Num frames 2300... [2025-01-05 11:37:52,186][05549] Num frames 2400... [2025-01-05 11:37:52,272][05549] Num frames 2500... [2025-01-05 11:37:52,356][05549] Num frames 2600... [2025-01-05 11:37:52,475][05549] Avg episode rewards: #0: 7.125, true rewards: #0: 4.458 [2025-01-05 11:37:52,475][05549] Avg episode reward: 7.125, avg true_objective: 4.458 [2025-01-05 11:37:52,504][05549] Num frames 2700... [2025-01-05 11:37:52,608][05549] Num frames 2800... 
[2025-01-05 11:37:52,691][05549] Num frames 2900... [2025-01-05 11:37:52,776][05549] Num frames 3000... [2025-01-05 11:37:52,863][05549] Num frames 3100... [2025-01-05 11:37:52,948][05549] Num frames 3200... [2025-01-05 11:37:53,036][05549] Num frames 3300... [2025-01-05 11:37:53,122][05549] Num frames 3400... [2025-01-05 11:37:53,210][05549] Num frames 3500... [2025-01-05 11:37:53,293][05549] Num frames 3600... [2025-01-05 11:37:53,378][05549] Num frames 3700... [2025-01-05 11:37:53,466][05549] Num frames 3800... [2025-01-05 11:37:53,553][05549] Num frames 3900... [2025-01-05 11:37:53,637][05549] Num frames 4000... [2025-01-05 11:37:53,723][05549] Num frames 4100... [2025-01-05 11:37:53,859][05549] Avg episode rewards: #0: 11.121, true rewards: #0: 5.979 [2025-01-05 11:37:53,860][05549] Avg episode reward: 11.121, avg true_objective: 5.979 [2025-01-05 11:37:53,881][05549] Num frames 4200... [2025-01-05 11:37:53,973][05549] Num frames 4300... [2025-01-05 11:37:54,062][05549] Num frames 4400... [2025-01-05 11:37:54,147][05549] Num frames 4500... [2025-01-05 11:37:54,231][05549] Num frames 4600... [2025-01-05 11:37:54,368][05549] Avg episode rewards: #0: 10.746, true rewards: #0: 5.871 [2025-01-05 11:37:54,368][05549] Avg episode reward: 10.746, avg true_objective: 5.871 [2025-01-05 11:37:54,370][05549] Num frames 4700... [2025-01-05 11:37:54,469][05549] Num frames 4800... [2025-01-05 11:37:54,554][05549] Num frames 4900... [2025-01-05 11:37:54,645][05549] Num frames 5000... [2025-01-05 11:37:54,764][05549] Num frames 5100... [2025-01-05 11:37:54,862][05549] Num frames 5200... [2025-01-05 11:37:54,959][05549] Num frames 5300... [2025-01-05 11:37:55,019][05549] Avg episode rewards: #0: 10.561, true rewards: #0: 5.894 [2025-01-05 11:37:55,020][05549] Avg episode reward: 10.561, avg true_objective: 5.894 [2025-01-05 11:37:55,145][05549] Num frames 5400... [2025-01-05 11:37:55,231][05549] Num frames 5500... [2025-01-05 11:37:55,317][05549] Num frames 5600... 
[2025-01-05 11:37:55,403][05549] Num frames 5700... [2025-01-05 11:37:55,504][05549] Avg episode rewards: #0: 10.053, true rewards: #0: 5.753 [2025-01-05 11:37:55,505][05549] Avg episode reward: 10.053, avg true_objective: 5.753 [2025-01-05 11:38:07,233][05549] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 11:39:04,802][06859] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 11:39:04,803][06859] Rollout worker 0 uses device cpu [2025-01-05 11:39:04,803][06859] Rollout worker 1 uses device cpu [2025-01-05 11:39:04,803][06859] Rollout worker 2 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 3 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 4 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 5 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 6 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 7 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 8 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 9 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 10 uses device cpu [2025-01-05 11:39:04,804][06859] Rollout worker 11 uses device cpu [2025-01-05 11:39:04,855][06859] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:04,855][06859] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 11:39:04,885][06859] Starting all processes... [2025-01-05 11:39:04,885][06859] Starting process learner_proc0 [2025-01-05 11:39:06,267][06859] Starting all processes... 
[2025-01-05 11:39:06,273][06859] Starting process inference_proc0-0 [2025-01-05 11:39:06,275][06919] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:06,276][06919] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 11:39:06,273][06859] Starting process rollout_proc0 [2025-01-05 11:39:06,274][06859] Starting process rollout_proc1 [2025-01-05 11:39:06,276][06859] Starting process rollout_proc2 [2025-01-05 11:39:06,276][06859] Starting process rollout_proc3 [2025-01-05 11:39:06,277][06859] Starting process rollout_proc4 [2025-01-05 11:39:06,279][06859] Starting process rollout_proc5 [2025-01-05 11:39:06,290][06919] Num visible devices: 1 [2025-01-05 11:39:06,295][06919] Starting seed is not provided [2025-01-05 11:39:06,279][06859] Starting process rollout_proc6 [2025-01-05 11:39:06,296][06919] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:06,296][06919] Initializing actor-critic model on device cuda:0 [2025-01-05 11:39:06,297][06919] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:06,298][06919] RunningMeanStd input shape: (1,) [2025-01-05 11:39:06,280][06859] Starting process rollout_proc7 [2025-01-05 11:39:06,295][06859] Starting process rollout_proc8 [2025-01-05 11:39:06,296][06859] Starting process rollout_proc9 [2025-01-05 11:39:06,300][06859] Starting process rollout_proc10 [2025-01-05 11:39:06,300][06859] Starting process rollout_proc11 [2025-01-05 11:39:06,316][06919] ConvEncoder: input_channels=3 [2025-01-05 11:39:06,476][06919] Conv encoder output size: 512 [2025-01-05 11:39:06,477][06919] Policy head output size: 512 [2025-01-05 11:39:06,491][06919] Created Actor Critic model with architecture: [2025-01-05 11:39:06,492][06919] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): 
RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 11:39:06,605][06919] Using optimizer [2025-01-05 11:39:08,472][06919] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth... [2025-01-05 11:39:08,526][06919] Loading model from checkpoint [2025-01-05 11:39:08,528][06919] Loaded experiment state at self.train_step=244144, self.env_steps=1000013824 [2025-01-05 11:39:08,529][06919] Initialized policy 0 weights for model version 244144 [2025-01-05 11:39:08,531][06919] LearnerWorker_p0 finished initialization! 
[2025-01-05 11:39:08,532][06919] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:08,971][06974] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,114][06978] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,146][06972] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,210][06955] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,381][06951] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:09,381][06951] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 11:39:09,395][06951] Num visible devices: 1 [2025-01-05 11:39:09,446][06979] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,474][06976] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,475][06954] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,476][06973] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,484][06951] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:09,485][06953] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,485][06951] RunningMeanStd input shape: (1,) [2025-01-05 11:39:09,497][06951] ConvEncoder: input_channels=3 [2025-01-05 11:39:09,521][06952] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,588][06951] Conv encoder output size: 512 [2025-01-05 11:39:09,588][06951] Policy head output size: 512 [2025-01-05 11:39:09,600][06977] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,686][06859] Inference worker 0-0 is ready! 
[2025-01-05 11:39:09,687][06859] All inference workers are ready! Signal rollout workers to start! [2025-01-05 11:39:09,687][06859] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1000013824. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:39:09,701][06975] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:09,732][06974] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,736][06954] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,744][06975] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,746][06977] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,746][06953] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,746][06973] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,746][06976] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,748][06972] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,749][06978] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,749][06955] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,750][06979] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:09,750][06952] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:10,087][06975] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,087][06974] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,090][06954] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,095][06977] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,096][06953] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,101][06973] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,101][06976] Decorrelating experience for 0 frames... 
[2025-01-05 11:39:10,102][06979] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,375][06974] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,380][06977] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,387][06979] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,390][06976] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,395][06973] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,395][06953] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,403][06978] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,404][06955] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,699][06978] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,704][06955] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,720][06954] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,737][06952] Decorrelating experience for 0 frames... [2025-01-05 11:39:10,762][06975] Decorrelating experience for 32 frames... [2025-01-05 11:39:10,793][06976] Decorrelating experience for 64 frames... [2025-01-05 11:39:10,802][06973] Decorrelating experience for 64 frames... [2025-01-05 11:39:10,805][06972] Decorrelating experience for 0 frames... [2025-01-05 11:39:11,002][06974] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,019][06977] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,061][06953] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,081][06972] Decorrelating experience for 32 frames... [2025-01-05 11:39:11,085][06954] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,140][06952] Decorrelating experience for 32 frames... [2025-01-05 11:39:11,291][06973] Decorrelating experience for 96 frames... [2025-01-05 11:39:11,327][06978] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,381][06977] Decorrelating experience for 96 frames... 
[2025-01-05 11:39:11,395][06974] Decorrelating experience for 96 frames... [2025-01-05 11:39:11,424][06954] Decorrelating experience for 96 frames... [2025-01-05 11:39:11,446][06979] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,501][06952] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,601][06975] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,659][06976] Decorrelating experience for 96 frames... [2025-01-05 11:39:11,668][06978] Decorrelating experience for 96 frames... [2025-01-05 11:39:11,673][06859] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1000013824. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:39:11,697][06972] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,922][06955] Decorrelating experience for 64 frames... [2025-01-05 11:39:11,981][06975] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,000][06953] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,015][06979] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,067][06972] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,297][06952] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,385][06955] Decorrelating experience for 96 frames... [2025-01-05 11:39:12,513][06919] Signal inference workers to stop experience collection... [2025-01-05 11:39:12,522][06951] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 11:39:15,169][06919] Signal inference workers to resume experience collection... [2025-01-05 11:39:15,170][06919] Stopping Batcher_0... [2025-01-05 11:39:15,171][06919] Loop batcher_evt_loop terminating... [2025-01-05 11:39:15,181][06859] Component Batcher_0 stopped! [2025-01-05 11:39:15,188][06951] Weights refcount: 2 0 [2025-01-05 11:39:15,189][06951] Stopping InferenceWorker_p0-w0... [2025-01-05 11:39:15,190][06951] Loop inference_proc0-0_evt_loop terminating... 
[2025-01-05 11:39:15,190][06859] Component InferenceWorker_p0-w0 stopped! [2025-01-05 11:39:15,220][06859] Component RolloutWorker_w7 stopped! [2025-01-05 11:39:15,220][06976] Stopping RolloutWorker_w7... [2025-01-05 11:39:15,221][06976] Loop rollout_proc7_evt_loop terminating... [2025-01-05 11:39:15,222][06954] Stopping RolloutWorker_w2... [2025-01-05 11:39:15,222][06859] Component RolloutWorker_w2 stopped! [2025-01-05 11:39:15,222][06954] Loop rollout_proc2_evt_loop terminating... [2025-01-05 11:39:15,224][06979] Stopping RolloutWorker_w11... [2025-01-05 11:39:15,224][06979] Loop rollout_proc11_evt_loop terminating... [2025-01-05 11:39:15,224][06952] Stopping RolloutWorker_w0... [2025-01-05 11:39:15,225][06952] Loop rollout_proc0_evt_loop terminating... [2025-01-05 11:39:15,225][06859] Component RolloutWorker_w11 stopped! [2025-01-05 11:39:15,225][06953] Stopping RolloutWorker_w1... [2025-01-05 11:39:15,225][06859] Component RolloutWorker_w0 stopped! [2025-01-05 11:39:15,225][06977] Stopping RolloutWorker_w9... [2025-01-05 11:39:15,226][06859] Component RolloutWorker_w1 stopped! [2025-01-05 11:39:15,226][06953] Loop rollout_proc1_evt_loop terminating... [2025-01-05 11:39:15,226][06977] Loop rollout_proc9_evt_loop terminating... [2025-01-05 11:39:15,226][06859] Component RolloutWorker_w9 stopped! [2025-01-05 11:39:15,226][06974] Stopping RolloutWorker_w5... [2025-01-05 11:39:15,226][06955] Stopping RolloutWorker_w3... [2025-01-05 11:39:15,226][06975] Stopping RolloutWorker_w8... [2025-01-05 11:39:15,226][06859] Component RolloutWorker_w5 stopped! [2025-01-05 11:39:15,227][06859] Component RolloutWorker_w3 stopped! [2025-01-05 11:39:15,227][06974] Loop rollout_proc5_evt_loop terminating... [2025-01-05 11:39:15,227][06955] Loop rollout_proc3_evt_loop terminating... [2025-01-05 11:39:15,227][06859] Component RolloutWorker_w8 stopped! [2025-01-05 11:39:15,227][06975] Loop rollout_proc8_evt_loop terminating... 
[2025-01-05 11:39:15,229][06978] Stopping RolloutWorker_w10... [2025-01-05 11:39:15,229][06859] Component RolloutWorker_w10 stopped! [2025-01-05 11:39:15,229][06978] Loop rollout_proc10_evt_loop terminating... [2025-01-05 11:39:15,230][06859] Component RolloutWorker_w4 stopped! [2025-01-05 11:39:15,231][06859] Component RolloutWorker_w6 stopped! [2025-01-05 11:39:15,230][06973] Stopping RolloutWorker_w6... [2025-01-05 11:39:15,230][06972] Stopping RolloutWorker_w4... [2025-01-05 11:39:15,231][06972] Loop rollout_proc4_evt_loop terminating... [2025-01-05 11:39:15,231][06973] Loop rollout_proc6_evt_loop terminating... [2025-01-05 11:39:15,546][06919] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth... [2025-01-05 11:39:15,622][06919] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244142_1000005632.pth [2025-01-05 11:39:15,623][06919] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth... [2025-01-05 11:39:15,726][06919] Stopping LearnerWorker_p0... [2025-01-05 11:39:15,730][06919] Loop learner_proc0_evt_loop terminating... [2025-01-05 11:39:15,729][06859] Component LearnerWorker_p0 stopped! [2025-01-05 11:39:15,734][06859] Waiting for process learner_proc0 to stop... [2025-01-05 11:39:16,492][06859] Waiting for process inference_proc0-0 to join... [2025-01-05 11:39:16,493][06859] Waiting for process rollout_proc0 to join... [2025-01-05 11:39:16,493][06859] Waiting for process rollout_proc1 to join... [2025-01-05 11:39:16,493][06859] Waiting for process rollout_proc2 to join... [2025-01-05 11:39:16,493][06859] Waiting for process rollout_proc3 to join... [2025-01-05 11:39:16,494][06859] Waiting for process rollout_proc4 to join... [2025-01-05 11:39:16,494][06859] Waiting for process rollout_proc5 to join... 
[2025-01-05 11:39:16,494][06859] Waiting for process rollout_proc6 to join... [2025-01-05 11:39:16,494][06859] Waiting for process rollout_proc7 to join... [2025-01-05 11:39:16,495][06859] Waiting for process rollout_proc8 to join... [2025-01-05 11:39:16,495][06859] Waiting for process rollout_proc9 to join... [2025-01-05 11:39:16,495][06859] Waiting for process rollout_proc10 to join... [2025-01-05 11:39:16,495][06859] Waiting for process rollout_proc11 to join... [2025-01-05 11:39:16,496][06859] Batcher 0 profile tree view: batching: 0.0918, releasing_batches: 0.0009 [2025-01-05 11:39:16,496][06859] InferenceWorker_p0-w0 profile tree view: update_model: 0.0063 wait_policy: 0.0001 wait_policy_total: 1.7705 one_step: 0.0033 handle_policy_step: 1.0765 deserialize: 0.0318, stack: 0.0033, obs_to_device_normalize: 0.1935, forward: 0.7124, send_messages: 0.0463 prepare_outputs: 0.0672 to_cpu: 0.0406 [2025-01-05 11:39:16,496][06859] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 0.6967 train: 2.4827 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0061, after_optimizer: 0.0317 calculate_losses: 0.4923 losses_init: 0.0000, forward_head: 0.3994, bptt_initial: 0.0650, tail: 0.0090, advantages_returns: 0.0007, losses: 0.0151 bptt: 0.0027 bptt_forward_core: 0.0026 update: 1.9514 clip: 0.0283 [2025-01-05 11:39:16,496][06859] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0002, enqueue_policy_requests: 0.0068, env_step: 0.0600, overhead: 0.0036, complete_rollouts: 0.0000 save_policy_outputs: 0.0066 split_output_tensors: 0.0021 [2025-01-05 11:39:16,497][06859] RolloutWorker_w11 profile tree view: wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0197, env_step: 0.1844, overhead: 0.0105, complete_rollouts: 0.0003 save_policy_outputs: 0.0182 split_output_tensors: 0.0061 [2025-01-05 11:39:16,497][06859] Loop Runner_EvtLoop terminating... 
[2025-01-05 11:39:16,498][06859] Runner profile tree view: main_loop: 11.6131 [2025-01-05 11:39:16,498][06859] Collected {0: 1000022016}, FPS: 705.4 [2025-01-05 11:39:16,698][06859] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:39:16,699][06859] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:39:16,699][06859] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:39:16,699][06859] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:39:16,699][06859] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:39:16,699][06859] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:39:16,699][06859] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:39:16,700][06859] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 11:39:16,700][06859] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:39:16,721][06859] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:16,722][06859] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:16,722][06859] RunningMeanStd input shape: (1,) [2025-01-05 11:39:16,732][06859] ConvEncoder: input_channels=3 [2025-01-05 11:39:16,819][06859] Conv encoder output size: 512 [2025-01-05 11:39:16,819][06859] Policy head output size: 512 [2025-01-05 11:39:16,927][06859] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth... [2025-01-05 11:39:17,523][06859] Num frames 100... [2025-01-05 11:39:17,610][06859] Num frames 200... [2025-01-05 11:39:17,699][06859] Num frames 300... [2025-01-05 11:39:17,828][06859] Avg episode rewards: #0: 7.840, true rewards: #0: 3.840 [2025-01-05 11:39:17,828][06859] Avg episode reward: 7.840, avg true_objective: 3.840 [2025-01-05 11:39:17,847][06859] Num frames 400... [2025-01-05 11:39:17,938][06859] Num frames 500... [2025-01-05 11:39:18,027][06859] Num frames 600... [2025-01-05 11:39:18,116][06859] Num frames 700... [2025-01-05 11:39:18,230][06859] Avg episode rewards: #0: 6.840, true rewards: #0: 3.840 [2025-01-05 11:39:18,230][06859] Avg episode reward: 6.840, avg true_objective: 3.840 [2025-01-05 11:39:18,271][06859] Num frames 800... [2025-01-05 11:39:18,357][06859] Num frames 900... [2025-01-05 11:39:18,444][06859] Num frames 1000... [2025-01-05 11:39:18,532][06859] Num frames 1100... [2025-01-05 11:39:18,622][06859] Num frames 1200... [2025-01-05 11:39:18,690][06859] Avg episode rewards: #0: 7.053, true rewards: #0: 4.053 [2025-01-05 11:39:18,690][06859] Avg episode reward: 7.053, avg true_objective: 4.053 [2025-01-05 11:39:18,780][06859] Num frames 1300... [2025-01-05 11:39:18,865][06859] Num frames 1400... [2025-01-05 11:39:18,955][06859] Num frames 1500... 
[2025-01-05 11:39:19,044][06859] Num frames 1600... [2025-01-05 11:39:19,127][06859] Avg episode rewards: #0: 6.830, true rewards: #0: 4.080 [2025-01-05 11:39:19,128][06859] Avg episode reward: 6.830, avg true_objective: 4.080 [2025-01-05 11:39:19,231][06859] Num frames 1700... [2025-01-05 11:39:19,317][06859] Num frames 1800... [2025-01-05 11:39:19,406][06859] Num frames 1900... [2025-01-05 11:39:19,496][06859] Num frames 2000... [2025-01-05 11:39:19,585][06859] Num frames 2100... [2025-01-05 11:39:19,675][06859] Num frames 2200... [2025-01-05 11:39:19,765][06859] Num frames 2300... [2025-01-05 11:39:19,855][06859] Num frames 2400... [2025-01-05 11:39:19,943][06859] Num frames 2500... [2025-01-05 11:39:20,035][06859] Num frames 2600... [2025-01-05 11:39:20,125][06859] Num frames 2700... [2025-01-05 11:39:20,255][06859] Avg episode rewards: #0: 10.764, true rewards: #0: 5.564 [2025-01-05 11:39:20,256][06859] Avg episode reward: 10.764, avg true_objective: 5.564 [2025-01-05 11:39:20,283][06859] Num frames 2800... [2025-01-05 11:39:20,378][06859] Num frames 2900... [2025-01-05 11:39:20,466][06859] Num frames 3000... [2025-01-05 11:39:20,558][06859] Num frames 3100... [2025-01-05 11:39:20,647][06859] Num frames 3200... [2025-01-05 11:39:20,737][06859] Num frames 3300... [2025-01-05 11:39:20,815][06859] Avg episode rewards: #0: 10.710, true rewards: #0: 5.543 [2025-01-05 11:39:20,816][06859] Avg episode reward: 10.710, avg true_objective: 5.543 [2025-01-05 11:39:20,888][06859] Num frames 3400... [2025-01-05 11:39:20,975][06859] Num frames 3500... [2025-01-05 11:39:21,067][06859] Num frames 3600... [2025-01-05 11:39:21,156][06859] Num frames 3700... [2025-01-05 11:39:21,251][06859] Num frames 3800... [2025-01-05 11:39:21,360][06859] Num frames 3900... 
[2025-01-05 11:39:21,438][06859] Avg episode rewards: #0: 10.463, true rewards: #0: 5.606 [2025-01-05 11:39:21,439][06859] Avg episode reward: 10.463, avg true_objective: 5.606 [2025-01-05 11:39:21,547][06859] Num frames 4000... [2025-01-05 11:39:21,639][06859] Num frames 4100... [2025-01-05 11:39:21,732][06859] Num frames 4200... [2025-01-05 11:39:21,821][06859] Num frames 4300... [2025-01-05 11:39:21,947][06859] Avg episode rewards: #0: 9.840, true rewards: #0: 5.465 [2025-01-05 11:39:21,947][06859] Avg episode reward: 9.840, avg true_objective: 5.465 [2025-01-05 11:39:21,991][06859] Num frames 4400... [2025-01-05 11:39:22,083][06859] Num frames 4500... [2025-01-05 11:39:22,175][06859] Num frames 4600... [2025-01-05 11:39:22,264][06859] Num frames 4700... [2025-01-05 11:39:22,368][06859] Avg episode rewards: #0: 9.396, true rewards: #0: 5.284 [2025-01-05 11:39:22,369][06859] Avg episode reward: 9.396, avg true_objective: 5.284 [2025-01-05 11:39:22,416][06859] Num frames 4800... [2025-01-05 11:39:22,502][06859] Num frames 4900... [2025-01-05 11:39:22,585][06859] Num frames 5000... [2025-01-05 11:39:22,668][06859] Num frames 5100... [2025-01-05 11:39:22,753][06859] Num frames 5200... [2025-01-05 11:39:22,838][06859] Num frames 5300... [2025-01-05 11:39:22,926][06859] Num frames 5400... [2025-01-05 11:39:23,012][06859] Num frames 5500... [2025-01-05 11:39:23,100][06859] Num frames 5600... [2025-01-05 11:39:23,227][06859] Avg episode rewards: #0: 10.283, true rewards: #0: 5.683 [2025-01-05 11:39:23,227][06859] Avg episode reward: 10.283, avg true_objective: 5.683 [2025-01-05 11:39:29,241][06859] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! 
[2025-01-05 11:39:29,249][06859] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:39:29,250][06859] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:39:29,250][06859] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:39:29,250][06859] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:39:29,250][06859] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:39:29,250][06859] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:39:29,250][06859] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-05 11:39:29,250][06859] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:39:29,251][06859] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 11:39:29,251][06859] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:39:29,266][06859] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:29,268][06859] RunningMeanStd input shape: (1,) [2025-01-05 11:39:29,273][06859] ConvEncoder: input_channels=3 [2025-01-05 11:39:29,301][06859] Conv encoder output size: 512 [2025-01-05 11:39:29,301][06859] Policy head output size: 512 [2025-01-05 11:39:29,317][06859] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth... [2025-01-05 11:39:54,505][07906] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 11:39:54,506][07906] Rollout worker 0 uses device cpu [2025-01-05 11:39:54,506][07906] Rollout worker 1 uses device cpu [2025-01-05 11:39:54,506][07906] Rollout worker 2 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 3 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 4 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 5 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 6 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 7 uses device cpu [2025-01-05 11:39:54,507][07906] Rollout worker 8 uses device cpu [2025-01-05 11:39:54,508][07906] Rollout worker 9 uses device cpu [2025-01-05 11:39:54,508][07906] Rollout worker 10 uses device cpu [2025-01-05 11:39:54,508][07906] Rollout worker 11 uses device cpu [2025-01-05 11:39:54,558][07906] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:54,558][07906] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 11:39:54,586][07906] Starting all processes... [2025-01-05 11:39:54,586][07906] Starting process learner_proc0 [2025-01-05 11:39:55,997][07906] Starting all processes... 
[2025-01-05 11:39:56,002][07906] Starting process inference_proc0-0 [2025-01-05 11:39:56,004][07906] Starting process rollout_proc0 [2025-01-05 11:39:56,004][07967] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:56,005][07967] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 11:39:56,004][07906] Starting process rollout_proc1 [2025-01-05 11:39:56,007][07906] Starting process rollout_proc2 [2025-01-05 11:39:56,009][07906] Starting process rollout_proc3 [2025-01-05 11:39:56,011][07906] Starting process rollout_proc4 [2025-01-05 11:39:56,019][07967] Num visible devices: 1 [2025-01-05 11:39:56,011][07906] Starting process rollout_proc5 [2025-01-05 11:39:56,025][07967] Starting seed is not provided [2025-01-05 11:39:56,025][07967] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:56,025][07967] Initializing actor-critic model on device cuda:0 [2025-01-05 11:39:56,025][07967] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:56,026][07967] RunningMeanStd input shape: (1,) [2025-01-05 11:39:56,014][07906] Starting process rollout_proc6 [2025-01-05 11:39:56,014][07906] Starting process rollout_proc7 [2025-01-05 11:39:56,014][07906] Starting process rollout_proc8 [2025-01-05 11:39:56,014][07906] Starting process rollout_proc9 [2025-01-05 11:39:56,038][07967] ConvEncoder: input_channels=3 [2025-01-05 11:39:56,016][07906] Starting process rollout_proc10 [2025-01-05 11:39:56,024][07906] Starting process rollout_proc11 [2025-01-05 11:39:56,157][07967] Conv encoder output size: 512 [2025-01-05 11:39:56,158][07967] Policy head output size: 512 [2025-01-05 11:39:56,173][07967] Created Actor Critic model with architecture: [2025-01-05 11:39:56,173][07967] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): 
RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 11:39:56,292][07967] Using optimizer [2025-01-05 11:39:58,083][07967] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth... [2025-01-05 11:39:58,131][07967] Loading model from checkpoint [2025-01-05 11:39:58,134][07967] Loaded experiment state at self.train_step=244146, self.env_steps=1000022016 [2025-01-05 11:39:58,134][07967] Initialized policy 0 weights for model version 244146 [2025-01-05 11:39:58,137][07967] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:58,140][07967] LearnerWorker_p0 finished initialization! 
[2025-01-05 11:39:58,701][08002] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:58,717][07999] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:58,756][08026] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,092][08021] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,096][08003] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,096][08004] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,162][08001] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,192][08000] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:39:59,192][08000] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 11:39:59,206][08000] Num visible devices: 1 [2025-01-05 11:39:59,209][08024] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,253][08027] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,293][08000] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:39:59,294][08000] RunningMeanStd input shape: (1,) [2025-01-05 11:39:59,304][08000] ConvEncoder: input_channels=3 [2025-01-05 11:39:59,338][08025] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,352][08023] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,386][07906] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1000022016. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:39:59,391][08022] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:39:59,399][08000] Conv encoder output size: 512 [2025-01-05 11:39:59,399][08000] Policy head output size: 512 [2025-01-05 11:39:59,423][07906] Inference worker 0-0 is ready! [2025-01-05 11:39:59,423][07906] All inference workers are ready! Signal rollout workers to start! [2025-01-05 11:39:59,460][08022] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,460][08023] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,461][08002] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,461][08026] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,478][08004] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,481][08025] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][08027] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][08021] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][07999] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][08024] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][08003] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,482][08001] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:39:59,816][08026] Decorrelating experience for 0 frames... [2025-01-05 11:39:59,816][08002] Decorrelating experience for 0 frames... [2025-01-05 11:39:59,819][08025] Decorrelating experience for 0 frames... [2025-01-05 11:39:59,820][08023] Decorrelating experience for 0 frames... [2025-01-05 11:39:59,827][08003] Decorrelating experience for 0 frames... [2025-01-05 11:39:59,830][08024] Decorrelating experience for 0 frames... 
[2025-01-05 11:39:59,832][08021] Decorrelating experience for 0 frames... [2025-01-05 11:40:00,084][08023] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,090][08026] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,097][08003] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,105][08021] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,125][08001] Decorrelating experience for 0 frames... [2025-01-05 11:40:00,147][08025] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,407][08002] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,409][08001] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,414][08004] Decorrelating experience for 0 frames... [2025-01-05 11:40:00,443][08024] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,451][07999] Decorrelating experience for 0 frames... [2025-01-05 11:40:00,465][08022] Decorrelating experience for 0 frames... [2025-01-05 11:40:00,479][08026] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,534][08025] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,700][08004] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,706][08023] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,748][08021] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,794][08001] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,816][08002] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,831][08024] Decorrelating experience for 64 frames... [2025-01-05 11:40:00,835][08022] Decorrelating experience for 32 frames... [2025-01-05 11:40:00,990][07999] Decorrelating experience for 32 frames... [2025-01-05 11:40:01,093][08021] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,109][08001] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,113][08026] Decorrelating experience for 96 frames... 
[2025-01-05 11:40:01,150][08002] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,202][08022] Decorrelating experience for 64 frames... [2025-01-05 11:40:01,279][08025] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,364][07999] Decorrelating experience for 64 frames... [2025-01-05 11:40:01,400][08004] Decorrelating experience for 64 frames... [2025-01-05 11:40:01,537][08003] Decorrelating experience for 64 frames... [2025-01-05 11:40:01,596][07906] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1000022016. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:40:01,691][07999] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,719][08004] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,861][08022] Decorrelating experience for 96 frames... [2025-01-05 11:40:01,893][08003] Decorrelating experience for 96 frames... [2025-01-05 11:40:02,217][08023] Decorrelating experience for 96 frames... [2025-01-05 11:40:02,252][07967] Signal inference workers to stop experience collection... [2025-01-05 11:40:02,261][08000] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 11:40:02,475][08024] Decorrelating experience for 96 frames... [2025-01-05 11:40:02,721][08027] Decorrelating experience for 0 frames... [2025-01-05 11:40:02,942][08027] Decorrelating experience for 32 frames... [2025-01-05 11:40:03,215][08027] Decorrelating experience for 64 frames... [2025-01-05 11:40:03,451][08027] Decorrelating experience for 96 frames... [2025-01-05 11:40:04,920][07967] Signal inference workers to resume experience collection... [2025-01-05 11:40:04,921][08000] InferenceWorker_p0-w0: resuming experience collection [2025-01-05 11:40:04,922][07967] Stopping Batcher_0... [2025-01-05 11:40:04,922][07967] Loop batcher_evt_loop terminating... [2025-01-05 11:40:04,929][07906] Component Batcher_0 stopped! 
[2025-01-05 11:40:04,939][08000] Weights refcount: 2 0 [2025-01-05 11:40:04,940][08000] Stopping InferenceWorker_p0-w0... [2025-01-05 11:40:04,941][07906] Component InferenceWorker_p0-w0 stopped! [2025-01-05 11:40:04,941][08000] Loop inference_proc0-0_evt_loop terminating... [2025-01-05 11:40:04,970][07999] Stopping RolloutWorker_w0... [2025-01-05 11:40:04,971][07906] Component RolloutWorker_w0 stopped! [2025-01-05 11:40:04,970][07999] Loop rollout_proc0_evt_loop terminating... [2025-01-05 11:40:04,973][07906] Component RolloutWorker_w11 stopped! [2025-01-05 11:40:04,974][08026] Stopping RolloutWorker_w11... [2025-01-05 11:40:04,974][08026] Loop rollout_proc11_evt_loop terminating... [2025-01-05 11:40:04,975][08024] Stopping RolloutWorker_w8... [2025-01-05 11:40:04,975][07906] Component RolloutWorker_w8 stopped! [2025-01-05 11:40:04,975][08024] Loop rollout_proc8_evt_loop terminating... [2025-01-05 11:40:04,976][07906] Component RolloutWorker_w5 stopped! [2025-01-05 11:40:04,976][08021] Stopping RolloutWorker_w5... [2025-01-05 11:40:04,977][08023] Stopping RolloutWorker_w7... [2025-01-05 11:40:04,977][07906] Component RolloutWorker_w7 stopped! [2025-01-05 11:40:04,977][08021] Loop rollout_proc5_evt_loop terminating... [2025-01-05 11:40:04,977][08023] Loop rollout_proc7_evt_loop terminating... [2025-01-05 11:40:04,978][07906] Component RolloutWorker_w9 stopped! [2025-01-05 11:40:04,978][08025] Stopping RolloutWorker_w9... [2025-01-05 11:40:04,979][08025] Loop rollout_proc9_evt_loop terminating... [2025-01-05 11:40:04,979][08022] Stopping RolloutWorker_w6... [2025-01-05 11:40:04,980][07906] Component RolloutWorker_w6 stopped! [2025-01-05 11:40:04,980][08022] Loop rollout_proc6_evt_loop terminating... [2025-01-05 11:40:04,980][07906] Component RolloutWorker_w3 stopped! [2025-01-05 11:40:04,980][08003] Stopping RolloutWorker_w3... [2025-01-05 11:40:04,981][08003] Loop rollout_proc3_evt_loop terminating... [2025-01-05 11:40:04,982][08027] Stopping RolloutWorker_w10... 
[2025-01-05 11:40:04,983][07906] Component RolloutWorker_w10 stopped! [2025-01-05 11:40:04,983][08027] Loop rollout_proc10_evt_loop terminating... [2025-01-05 11:40:04,984][08001] Stopping RolloutWorker_w1... [2025-01-05 11:40:04,984][07906] Component RolloutWorker_w1 stopped! [2025-01-05 11:40:04,984][08002] Stopping RolloutWorker_w2... [2025-01-05 11:40:04,984][07906] Component RolloutWorker_w2 stopped! [2025-01-05 11:40:04,984][08002] Loop rollout_proc2_evt_loop terminating... [2025-01-05 11:40:04,985][08001] Loop rollout_proc1_evt_loop terminating... [2025-01-05 11:40:04,985][08004] Stopping RolloutWorker_w4... [2025-01-05 11:40:04,985][07906] Component RolloutWorker_w4 stopped! [2025-01-05 11:40:04,986][08004] Loop rollout_proc4_evt_loop terminating... [2025-01-05 11:40:05,301][07967] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244148_1000030208.pth... [2025-01-05 11:40:05,384][07967] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244144_1000013824.pth [2025-01-05 11:40:05,385][07967] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244148_1000030208.pth... [2025-01-05 11:40:05,461][07967] Stopping LearnerWorker_p0... [2025-01-05 11:40:05,461][07967] Loop learner_proc0_evt_loop terminating... [2025-01-05 11:40:05,464][07906] Component LearnerWorker_p0 stopped! [2025-01-05 11:40:05,464][07906] Waiting for process learner_proc0 to stop... [2025-01-05 11:40:06,260][07906] Waiting for process inference_proc0-0 to join... [2025-01-05 11:40:06,260][07906] Waiting for process rollout_proc0 to join... [2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc1 to join... [2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc2 to join... [2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc3 to join... 
[2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc4 to join... [2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc5 to join... [2025-01-05 11:40:06,261][07906] Waiting for process rollout_proc6 to join... [2025-01-05 11:40:06,262][07906] Waiting for process rollout_proc7 to join... [2025-01-05 11:40:06,262][07906] Waiting for process rollout_proc8 to join... [2025-01-05 11:40:06,262][07906] Waiting for process rollout_proc9 to join... [2025-01-05 11:40:06,262][07906] Waiting for process rollout_proc10 to join... [2025-01-05 11:40:06,262][07906] Waiting for process rollout_proc11 to join... [2025-01-05 11:40:06,263][07906] Batcher 0 profile tree view: batching: 0.0216, releasing_batches: 0.0005 [2025-01-05 11:40:06,263][07906] InferenceWorker_p0-w0 profile tree view: update_model: 0.0069 wait_policy: 0.0001 wait_policy_total: 1.7620 one_step: 0.0034 handle_policy_step: 1.0165 deserialize: 0.0328, stack: 0.0037, obs_to_device_normalize: 0.1897, forward: 0.6413, send_messages: 0.0441 prepare_outputs: 0.0807 to_cpu: 0.0524 [2025-01-05 11:40:06,263][07906] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 0.7388 train: 2.5439 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0004, kl_divergence: 0.0069, after_optimizer: 0.0519 calculate_losses: 0.5059 losses_init: 0.0000, forward_head: 0.4079, bptt_initial: 0.0680, tail: 0.0097, advantages_returns: 0.0007, losses: 0.0165 bptt: 0.0027 bptt_forward_core: 0.0026 update: 1.9778 clip: 0.0273 [2025-01-05 11:40:06,263][07906] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0004, enqueue_policy_requests: 0.0241, env_step: 0.2353, overhead: 0.0135, complete_rollouts: 0.0003 save_policy_outputs: 0.0236 split_output_tensors: 0.0076 [2025-01-05 11:40:06,263][07906] RolloutWorker_w11 profile tree view: wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0377, env_step: 0.3669, overhead: 0.0203, complete_rollouts: 0.0008 save_policy_outputs: 0.0364 
split_output_tensors: 0.0119 [2025-01-05 11:40:06,264][07906] Loop Runner_EvtLoop terminating... [2025-01-05 11:40:06,264][07906] Runner profile tree view: main_loop: 11.6780 [2025-01-05 11:40:06,264][07906] Collected {0: 1000030208}, FPS: 701.5 [2025-01-05 11:40:06,467][07906] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:40:06,467][07906] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:40:06,467][07906] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:40:06,467][07906] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:40:06,467][07906] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:40:06,467][07906] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:40:06,467][07906] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 11:40:06,467][07906] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:40:06,468][07906] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 11:40:06,468][07906] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:40:06,488][07906] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:40:06,489][07906] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:40:06,490][07906] RunningMeanStd input shape: (1,) [2025-01-05 11:40:06,499][07906] ConvEncoder: input_channels=3 [2025-01-05 11:40:06,581][07906] Conv encoder output size: 512 [2025-01-05 11:40:06,581][07906] Policy head output size: 512 [2025-01-05 11:40:06,697][07906] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244148_1000030208.pth... [2025-01-05 11:40:07,231][07906] Num frames 100... [2025-01-05 11:40:07,319][07906] Num frames 200... [2025-01-05 11:40:07,411][07906] Num frames 300... [2025-01-05 11:40:07,515][07906] Num frames 400... [2025-01-05 11:40:07,610][07906] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 [2025-01-05 11:40:07,610][07906] Avg episode reward: 5.480, avg true_objective: 4.480 [2025-01-05 11:40:07,667][07906] Num frames 500... [2025-01-05 11:40:07,752][07906] Num frames 600... [2025-01-05 11:40:07,836][07906] Num frames 700... [2025-01-05 11:40:07,923][07906] Num frames 800... [2025-01-05 11:40:08,014][07906] Num frames 900... [2025-01-05 11:40:08,151][07906] Avg episode rewards: #0: 6.460, true rewards: #0: 4.960 [2025-01-05 11:40:08,151][07906] Avg episode reward: 6.460, avg true_objective: 4.960 [2025-01-05 11:40:08,163][07906] Num frames 1000... [2025-01-05 11:40:08,258][07906] Num frames 1100... [2025-01-05 11:40:08,344][07906] Num frames 1200... [2025-01-05 11:40:08,428][07906] Num frames 1300... [2025-01-05 11:40:08,513][07906] Num frames 1400... [2025-01-05 11:40:08,594][07906] Avg episode rewards: #0: 6.100, true rewards: #0: 4.767 [2025-01-05 11:40:08,594][07906] Avg episode reward: 6.100, avg true_objective: 4.767 [2025-01-05 11:40:08,664][07906] Num frames 1500... 
[2025-01-05 11:40:08,750][07906] Num frames 1600... [2025-01-05 11:40:08,853][07906] Num frames 1700... [2025-01-05 11:40:08,936][07906] Num frames 1800... [2025-01-05 11:40:09,024][07906] Num frames 1900... [2025-01-05 11:40:09,147][07906] Avg episode rewards: #0: 6.700, true rewards: #0: 4.950 [2025-01-05 11:40:09,148][07906] Avg episode reward: 6.700, avg true_objective: 4.950 [2025-01-05 11:40:09,178][07906] Num frames 2000... [2025-01-05 11:40:09,271][07906] Num frames 2100... [2025-01-05 11:40:09,363][07906] Num frames 2200... [2025-01-05 11:40:09,454][07906] Num frames 2300... [2025-01-05 11:40:09,543][07906] Num frames 2400... [2025-01-05 11:40:09,636][07906] Num frames 2500... [2025-01-05 11:40:09,765][07906] Avg episode rewards: #0: 7.376, true rewards: #0: 5.176 [2025-01-05 11:40:09,766][07906] Avg episode reward: 7.376, avg true_objective: 5.176 [2025-01-05 11:40:09,783][07906] Num frames 2600... [2025-01-05 11:40:09,888][07906] Num frames 2700... [2025-01-05 11:40:09,979][07906] Num frames 2800... [2025-01-05 11:40:10,074][07906] Num frames 2900... [2025-01-05 11:40:10,166][07906] Num frames 3000... [2025-01-05 11:40:10,263][07906] Num frames 3100... [2025-01-05 11:40:10,406][07906] Avg episode rewards: #0: 7.660, true rewards: #0: 5.327 [2025-01-05 11:40:10,406][07906] Avg episode reward: 7.660, avg true_objective: 5.327 [2025-01-05 11:40:10,412][07906] Num frames 3200... [2025-01-05 11:40:10,508][07906] Num frames 3300... [2025-01-05 11:40:10,589][07906] Num frames 3400... [2025-01-05 11:40:10,680][07906] Num frames 3500... [2025-01-05 11:40:10,770][07906] Num frames 3600... [2025-01-05 11:40:10,856][07906] Num frames 3700... [2025-01-05 11:40:10,943][07906] Num frames 3800... [2025-01-05 11:40:11,002][07906] Avg episode rewards: #0: 7.863, true rewards: #0: 5.434 [2025-01-05 11:40:11,003][07906] Avg episode reward: 7.863, avg true_objective: 5.434 [2025-01-05 11:40:11,132][07906] Num frames 3900... [2025-01-05 11:40:11,220][07906] Num frames 4000... 
[2025-01-05 11:40:11,308][07906] Num frames 4100... [2025-01-05 11:40:11,398][07906] Num frames 4200... [2025-01-05 11:40:11,481][07906] Num frames 4300... [2025-01-05 11:40:11,576][07906] Avg episode rewards: #0: 7.810, true rewards: #0: 5.435 [2025-01-05 11:40:11,576][07906] Avg episode reward: 7.810, avg true_objective: 5.435 [2025-01-05 11:40:11,628][07906] Num frames 4400... [2025-01-05 11:40:11,712][07906] Num frames 4500... [2025-01-05 11:40:11,858][07906] Num frames 4600... [2025-01-05 11:40:12,011][07906] Num frames 4700... [2025-01-05 11:40:12,177][07906] Avg episode rewards: #0: 7.653, true rewards: #0: 5.320 [2025-01-05 11:40:12,177][07906] Avg episode reward: 7.653, avg true_objective: 5.320 [2025-01-05 11:40:12,205][07906] Num frames 4800... [2025-01-05 11:40:12,327][07906] Num frames 4900... [2025-01-05 11:40:12,450][07906] Num frames 5000... [2025-01-05 11:40:12,565][07906] Num frames 5100... [2025-01-05 11:40:12,685][07906] Num frames 5200... [2025-01-05 11:40:12,774][07906] Avg episode rewards: #0: 7.436, true rewards: #0: 5.236 [2025-01-05 11:40:12,775][07906] Avg episode reward: 7.436, avg true_objective: 5.236 [2025-01-05 11:40:18,231][07906] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 11:40:18,241][07906] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 11:40:18,241][07906] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 11:40:18,241][07906] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 11:40:18,241][07906] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 11:40:18,241][07906] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2025-01-05 11:40:18,241][07906] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 11:40:18,241][07906] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-05 11:40:18,241][07906] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 11:40:18,241][07906] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 11:40:18,242][07906] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 11:40:18,257][07906] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:40:18,259][07906] RunningMeanStd input shape: (1,) [2025-01-05 11:40:18,264][07906] ConvEncoder: input_channels=3 [2025-01-05 11:40:18,293][07906] Conv encoder output size: 512 [2025-01-05 11:40:18,294][07906] Policy head output size: 512 [2025-01-05 11:41:50,870][08963] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... 
[2025-01-05 11:41:50,871][08963] Rollout worker 0 uses device cpu [2025-01-05 11:41:50,871][08963] Rollout worker 1 uses device cpu [2025-01-05 11:41:50,871][08963] Rollout worker 2 uses device cpu [2025-01-05 11:41:50,871][08963] Rollout worker 3 uses device cpu [2025-01-05 11:41:50,871][08963] Rollout worker 4 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 5 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 6 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 7 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 8 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 9 uses device cpu [2025-01-05 11:41:50,872][08963] Rollout worker 10 uses device cpu [2025-01-05 11:41:50,873][08963] Rollout worker 11 uses device cpu [2025-01-05 11:41:50,922][08963] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:41:50,922][08963] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 11:41:50,950][08963] Starting all processes... [2025-01-05 11:41:50,950][08963] Starting process learner_proc0 [2025-01-05 11:41:52,320][08963] Starting all processes... 
[2025-01-05 11:41:52,325][08963] Starting process inference_proc0-0 [2025-01-05 11:41:52,326][08963] Starting process rollout_proc0 [2025-01-05 11:41:52,328][09024] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:41:52,328][08963] Starting process rollout_proc1 [2025-01-05 11:41:52,328][09024] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 11:41:52,334][08963] Starting process rollout_proc2 [2025-01-05 11:41:52,337][08963] Starting process rollout_proc3 [2025-01-05 11:41:52,337][08963] Starting process rollout_proc4 [2025-01-05 11:41:52,344][08963] Starting process rollout_proc5 [2025-01-05 11:41:52,344][09024] Num visible devices: 1 [2025-01-05 11:41:52,344][08963] Starting process rollout_proc6 [2025-01-05 11:41:52,350][09024] Starting seed is not provided [2025-01-05 11:41:52,351][09024] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:41:52,351][09024] Initializing actor-critic model on device cuda:0 [2025-01-05 11:41:52,352][09024] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:41:52,353][09024] RunningMeanStd input shape: (1,) [2025-01-05 11:41:52,344][08963] Starting process rollout_proc7 [2025-01-05 11:41:52,350][08963] Starting process rollout_proc8 [2025-01-05 11:41:52,350][08963] Starting process rollout_proc9 [2025-01-05 11:41:52,354][08963] Starting process rollout_proc10 [2025-01-05 11:41:52,354][08963] Starting process rollout_proc11 [2025-01-05 11:41:52,371][09024] ConvEncoder: input_channels=3 [2025-01-05 11:41:52,558][09024] Conv encoder output size: 512 [2025-01-05 11:41:52,558][09024] Policy head output size: 512 [2025-01-05 11:41:52,583][09024] Created Actor Critic model with architecture: [2025-01-05 11:41:52,584][09024] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): 
RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 11:41:52,728][09024] Using optimizer [2025-01-05 11:41:54,645][09024] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244148_1000030208.pth... [2025-01-05 11:41:54,732][09024] Loading model from checkpoint [2025-01-05 11:41:54,734][09024] Loaded experiment state at self.train_step=244148, self.env_steps=1000030208 [2025-01-05 11:41:54,735][09024] Initialized policy 0 weights for model version 244148 [2025-01-05 11:41:54,739][09024] LearnerWorker_p0 finished initialization! 
[2025-01-05 11:41:54,740][09024] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:41:54,951][09059] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,096][09061] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,147][09084] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,260][09060] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,289][09058] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,292][09056] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,438][09081] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,448][09057] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 11:41:55,449][09057] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 11:41:55,462][09057] Num visible devices: 1 [2025-01-05 11:41:55,487][09077] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,509][09080] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,544][09083] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,551][09057] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 11:41:55,552][09057] RunningMeanStd input shape: (1,) [2025-01-05 11:41:55,561][09057] ConvEncoder: input_channels=3 [2025-01-05 11:41:55,570][09082] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,591][08963] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1000030208. Throughput: 0: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:41:55,596][09079] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 11:41:55,649][09057] Conv encoder output size: 512 [2025-01-05 11:41:55,649][09057] Policy head output size: 512 [2025-01-05 11:41:55,676][08963] Inference worker 0-0 is ready! [2025-01-05 11:41:55,676][08963] All inference workers are ready! Signal rollout workers to start! [2025-01-05 11:41:55,715][09084] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,715][09082] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,717][09056] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,717][09061] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,739][09081] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,739][09079] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,739][09083] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,739][09077] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,740][09058] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,740][09059] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,740][09060] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:55,740][09080] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 11:41:56,068][09084] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,069][09056] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,068][09061] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,069][09082] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,075][09083] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,078][09077] Decorrelating experience for 0 frames... 
[2025-01-05 11:41:56,079][09059] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,351][09084] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,352][09061] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,357][09082] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,358][09077] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,361][09059] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,388][09080] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,392][09081] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,392][09058] Decorrelating experience for 0 frames... [2025-01-05 11:41:56,660][09083] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,691][09056] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,697][09058] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,700][09081] Decorrelating experience for 32 frames... [2025-01-05 11:41:56,723][09061] Decorrelating experience for 64 frames... [2025-01-05 11:41:56,730][09077] Decorrelating experience for 64 frames... [2025-01-05 11:41:56,730][09059] Decorrelating experience for 64 frames... [2025-01-05 11:41:56,969][09080] Decorrelating experience for 32 frames... [2025-01-05 11:41:57,008][09084] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,019][09082] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,041][09083] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,054][09058] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,060][09061] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,298][09079] Decorrelating experience for 0 frames... [2025-01-05 11:41:57,323][09059] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,344][09081] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,347][09082] Decorrelating experience for 96 frames... 
[2025-01-05 11:41:57,370][09083] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,373][09077] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,392][09056] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,508][09080] Decorrelating experience for 64 frames... [2025-01-05 11:41:57,612][09060] Decorrelating experience for 0 frames... [2025-01-05 11:41:57,679][09081] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,688][09084] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,699][09079] Decorrelating experience for 32 frames... [2025-01-05 11:41:57,712][09058] Decorrelating experience for 96 frames... [2025-01-05 11:41:57,842][08963] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1000030208. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 11:41:57,988][09060] Decorrelating experience for 32 frames... [2025-01-05 11:41:58,068][09056] Decorrelating experience for 96 frames... [2025-01-05 11:41:58,146][09080] Decorrelating experience for 96 frames... [2025-01-05 11:41:58,205][09079] Decorrelating experience for 64 frames... [2025-01-05 11:41:58,316][09024] Signal inference workers to stop experience collection... [2025-01-05 11:41:58,324][09057] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 11:41:58,409][09060] Decorrelating experience for 64 frames... [2025-01-05 11:41:58,515][09079] Decorrelating experience for 96 frames... [2025-01-05 11:41:58,674][09060] Decorrelating experience for 96 frames... [2025-01-05 11:42:00,905][09024] Signal inference workers to resume experience collection... [2025-01-05 11:42:00,906][09057] InferenceWorker_p0-w0: resuming experience collection [2025-01-05 11:42:02,842][08963] Fps is (10 sec: 5084.3, 60 sec: 5084.3, 300 sec: 5084.3). Total num frames: 1000067072. Throughput: 0: 760.2. Samples: 5512. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:02,842][08963] Avg episode reward: [(0, '6.801')] [2025-01-05 11:42:02,877][09057] Updated weights for policy 0, policy_version 244158 (0.0074) [2025-01-05 11:42:04,840][09057] Updated weights for policy 0, policy_version 244168 (0.0015) [2025-01-05 11:42:06,902][09057] Updated weights for policy 0, policy_version 244178 (0.0017) [2025-01-05 11:42:07,842][08963] Fps is (10 sec: 13926.4, 60 sec: 11368.0, 300 sec: 11368.0). Total num frames: 1000169472. Throughput: 0: 2922.8. Samples: 35806. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:07,842][08963] Avg episode reward: [(0, '8.935')] [2025-01-05 11:42:09,029][09057] Updated weights for policy 0, policy_version 244188 (0.0017) [2025-01-05 11:42:10,915][08963] Heartbeat connected on Batcher_0 [2025-01-05 11:42:10,918][08963] Heartbeat connected on LearnerWorker_p0 [2025-01-05 11:42:10,926][08963] Heartbeat connected on RolloutWorker_w0 [2025-01-05 11:42:10,928][08963] Heartbeat connected on RolloutWorker_w1 [2025-01-05 11:42:10,929][08963] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-05 11:42:10,932][08963] Heartbeat connected on RolloutWorker_w2 [2025-01-05 11:42:10,932][08963] Heartbeat connected on RolloutWorker_w3 [2025-01-05 11:42:10,937][08963] Heartbeat connected on RolloutWorker_w5 [2025-01-05 11:42:10,938][08963] Heartbeat connected on RolloutWorker_w4 [2025-01-05 11:42:10,939][08963] Heartbeat connected on RolloutWorker_w6 [2025-01-05 11:42:10,944][08963] Heartbeat connected on RolloutWorker_w8 [2025-01-05 11:42:10,944][08963] Heartbeat connected on RolloutWorker_w7 [2025-01-05 11:42:10,948][08963] Heartbeat connected on RolloutWorker_w9 [2025-01-05 11:42:10,948][08963] Heartbeat connected on RolloutWorker_w10 [2025-01-05 11:42:10,951][08963] Heartbeat connected on RolloutWorker_w11 [2025-01-05 11:42:11,099][09057] Updated weights for policy 0, policy_version 244198 (0.0017) [2025-01-05 11:42:12,842][08963] Fps is (10 
sec: 20070.4, 60 sec: 13771.6, 300 sec: 13771.6). Total num frames: 1000267776. Throughput: 0: 2926.9. Samples: 50490. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:12,842][08963] Avg episode reward: [(0, '9.689')] [2025-01-05 11:42:13,126][09057] Updated weights for policy 0, policy_version 244208 (0.0017) [2025-01-05 11:42:15,116][09057] Updated weights for policy 0, policy_version 244218 (0.0016) [2025-01-05 11:42:17,169][09057] Updated weights for policy 0, policy_version 244228 (0.0017) [2025-01-05 11:42:17,842][08963] Fps is (10 sec: 20070.3, 60 sec: 15279.1, 300 sec: 15279.1). Total num frames: 1000370176. Throughput: 0: 3632.5. Samples: 80826. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:17,842][08963] Avg episode reward: [(0, '10.685')] [2025-01-05 11:42:19,267][09057] Updated weights for policy 0, policy_version 244238 (0.0017) [2025-01-05 11:42:21,366][09057] Updated weights for policy 0, policy_version 244248 (0.0016) [2025-01-05 11:42:22,842][08963] Fps is (10 sec: 19660.6, 60 sec: 15932.7, 300 sec: 15932.7). Total num frames: 1000464384. Throughput: 0: 4020.3. Samples: 109556. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:22,842][08963] Avg episode reward: [(0, '8.856')] [2025-01-05 11:42:23,895][09057] Updated weights for policy 0, policy_version 244258 (0.0018) [2025-01-05 11:42:26,366][09057] Updated weights for policy 0, policy_version 244268 (0.0018) [2025-01-05 11:42:27,842][08963] Fps is (10 sec: 17612.8, 60 sec: 16002.7, 300 sec: 16002.7). Total num frames: 1000546304. Throughput: 0: 3759.1. Samples: 121232. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:27,842][08963] Avg episode reward: [(0, '8.858')] [2025-01-05 11:42:28,591][09057] Updated weights for policy 0, policy_version 244278 (0.0019) [2025-01-05 11:42:30,758][09057] Updated weights for policy 0, policy_version 244288 (0.0017) [2025-01-05 11:42:32,842][08963] Fps is (10 sec: 17613.0, 60 sec: 16383.8, 300 sec: 16383.8). Total num frames: 1000640512. Throughput: 0: 3990.5. Samples: 148648. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:32,842][08963] Avg episode reward: [(0, '9.842')] [2025-01-05 11:42:33,038][09057] Updated weights for policy 0, policy_version 244298 (0.0018) [2025-01-05 11:42:35,235][09057] Updated weights for policy 0, policy_version 244308 (0.0019) [2025-01-05 11:42:37,286][09057] Updated weights for policy 0, policy_version 244318 (0.0017) [2025-01-05 11:42:37,842][08963] Fps is (10 sec: 18841.8, 60 sec: 16674.6, 300 sec: 16674.6). Total num frames: 1000734720. Throughput: 0: 4192.3. Samples: 177126. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:37,842][08963] Avg episode reward: [(0, '8.990')] [2025-01-05 11:42:39,497][09057] Updated weights for policy 0, policy_version 244328 (0.0018) [2025-01-05 11:42:41,610][09057] Updated weights for policy 0, policy_version 244338 (0.0018) [2025-01-05 11:42:42,842][08963] Fps is (10 sec: 18841.5, 60 sec: 16903.9, 300 sec: 16903.9). Total num frames: 1000828928. Throughput: 0: 4253.8. Samples: 191422. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:42,842][08963] Avg episode reward: [(0, '10.438')] [2025-01-05 11:42:43,841][09057] Updated weights for policy 0, policy_version 244348 (0.0019) [2025-01-05 11:42:45,877][09057] Updated weights for policy 0, policy_version 244358 (0.0015) [2025-01-05 11:42:47,825][09057] Updated weights for policy 0, policy_version 244368 (0.0015) [2025-01-05 11:42:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 17246.1, 300 sec: 17246.1). 
Total num frames: 1000931328. Throughput: 0: 4779.3. Samples: 220582. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:47,842][08963] Avg episode reward: [(0, '10.013')] [2025-01-05 11:42:49,755][09057] Updated weights for policy 0, policy_version 244378 (0.0014) [2025-01-05 11:42:51,657][09057] Updated weights for policy 0, policy_version 244388 (0.0014) [2025-01-05 11:42:52,842][08963] Fps is (10 sec: 20889.8, 60 sec: 17600.1, 300 sec: 17600.1). Total num frames: 1001037824. Throughput: 0: 4818.5. Samples: 252640. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:52,842][08963] Avg episode reward: [(0, '10.312')] [2025-01-05 11:42:53,537][09057] Updated weights for policy 0, policy_version 244398 (0.0014) [2025-01-05 11:42:55,476][09057] Updated weights for policy 0, policy_version 244408 (0.0013) [2025-01-05 11:42:57,379][09057] Updated weights for policy 0, policy_version 244418 (0.0015) [2025-01-05 11:42:57,842][08963] Fps is (10 sec: 21299.0, 60 sec: 18568.5, 300 sec: 17897.2). Total num frames: 1001144320. Throughput: 0: 4849.4. Samples: 268716. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:42:57,842][08963] Avg episode reward: [(0, '10.529')] [2025-01-05 11:42:59,294][09057] Updated weights for policy 0, policy_version 244428 (0.0014) [2025-01-05 11:43:01,270][09057] Updated weights for policy 0, policy_version 244438 (0.0014) [2025-01-05 11:43:02,842][08963] Fps is (10 sec: 21299.2, 60 sec: 19729.1, 300 sec: 18150.2). Total num frames: 1001250816. Throughput: 0: 4882.7. Samples: 300548. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:43:02,842][08963] Avg episode reward: [(0, '9.875')] [2025-01-05 11:43:03,185][09057] Updated weights for policy 0, policy_version 244448 (0.0013) [2025-01-05 11:43:05,113][09057] Updated weights for policy 0, policy_version 244458 (0.0014) [2025-01-05 11:43:07,101][09057] Updated weights for policy 0, policy_version 244468 (0.0014) [2025-01-05 11:43:07,842][08963] Fps is (10 sec: 20890.0, 60 sec: 19729.1, 300 sec: 18311.4). Total num frames: 1001353216. Throughput: 0: 4945.8. Samples: 332116. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 11:43:07,842][08963] Avg episode reward: [(0, '9.913')] [2025-01-05 11:43:09,032][09057] Updated weights for policy 0, policy_version 244478 (0.0014) [2025-01-05 11:43:10,952][09057] Updated weights for policy 0, policy_version 244488 (0.0014) [2025-01-05 11:43:12,842][08963] Fps is (10 sec: 20889.5, 60 sec: 19865.6, 300 sec: 18504.8). Total num frames: 1001459712. Throughput: 0: 5039.9. Samples: 348026. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:12,842][08963] Avg episode reward: [(0, '9.836')] [2025-01-05 11:43:12,955][09057] Updated weights for policy 0, policy_version 244498 (0.0014) [2025-01-05 11:43:14,898][09057] Updated weights for policy 0, policy_version 244508 (0.0014) [2025-01-05 11:43:16,842][09057] Updated weights for policy 0, policy_version 244518 (0.0015) [2025-01-05 11:43:17,842][08963] Fps is (10 sec: 21299.0, 60 sec: 19933.9, 300 sec: 18674.6). Total num frames: 1001566208. Throughput: 0: 5125.5. Samples: 379298. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:17,842][08963] Avg episode reward: [(0, '10.237')] [2025-01-05 11:43:18,840][09057] Updated weights for policy 0, policy_version 244528 (0.0014) [2025-01-05 11:43:20,807][09057] Updated weights for policy 0, policy_version 244538 (0.0015) [2025-01-05 11:43:22,758][09057] Updated weights for policy 0, policy_version 244548 (0.0015) [2025-01-05 11:43:22,842][08963] Fps is (10 sec: 20889.6, 60 sec: 20070.4, 300 sec: 18778.1). Total num frames: 1001668608. Throughput: 0: 5186.8. Samples: 410530. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:22,842][08963] Avg episode reward: [(0, '10.285')] [2025-01-05 11:43:24,742][09057] Updated weights for policy 0, policy_version 244558 (0.0016) [2025-01-05 11:43:26,689][09057] Updated weights for policy 0, policy_version 244568 (0.0014) [2025-01-05 11:43:27,842][08963] Fps is (10 sec: 20479.8, 60 sec: 20411.7, 300 sec: 18870.3). Total num frames: 1001771008. Throughput: 0: 5215.4. Samples: 426116. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:27,842][08963] Avg episode reward: [(0, '9.365')] [2025-01-05 11:43:28,655][09057] Updated weights for policy 0, policy_version 244578 (0.0014) [2025-01-05 11:43:30,660][09057] Updated weights for policy 0, policy_version 244588 (0.0016) [2025-01-05 11:43:32,591][09057] Updated weights for policy 0, policy_version 244598 (0.0014) [2025-01-05 11:43:32,842][08963] Fps is (10 sec: 20889.7, 60 sec: 20616.5, 300 sec: 18995.2). Total num frames: 1001877504. Throughput: 0: 5260.8. Samples: 457316. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:32,842][08963] Avg episode reward: [(0, '9.237')] [2025-01-05 11:43:34,567][09057] Updated weights for policy 0, policy_version 244608 (0.0014) [2025-01-05 11:43:36,584][09057] Updated weights for policy 0, policy_version 244618 (0.0014) [2025-01-05 11:43:37,842][08963] Fps is (10 sec: 20889.7, 60 sec: 20753.0, 300 sec: 19067.8). 
Total num frames: 1001979904. Throughput: 0: 5235.1. Samples: 488220. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:37,843][08963] Avg episode reward: [(0, '9.309')] [2025-01-05 11:43:38,592][09057] Updated weights for policy 0, policy_version 244628 (0.0015) [2025-01-05 11:43:40,623][09057] Updated weights for policy 0, policy_version 244638 (0.0015) [2025-01-05 11:43:42,668][09057] Updated weights for policy 0, policy_version 244648 (0.0013) [2025-01-05 11:43:42,842][08963] Fps is (10 sec: 20480.0, 60 sec: 20889.6, 300 sec: 19133.7). Total num frames: 1002082304. Throughput: 0: 5211.6. Samples: 503238. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:42,842][08963] Avg episode reward: [(0, '10.344')] [2025-01-05 11:43:44,667][09057] Updated weights for policy 0, policy_version 244658 (0.0016) [2025-01-05 11:43:46,715][09057] Updated weights for policy 0, policy_version 244668 (0.0015) [2025-01-05 11:43:47,842][08963] Fps is (10 sec: 20070.2, 60 sec: 20821.3, 300 sec: 19157.1). Total num frames: 1002180608. Throughput: 0: 5175.8. Samples: 533462. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:47,842][08963] Avg episode reward: [(0, '10.106')] [2025-01-05 11:43:47,850][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244673_1002180608.pth... [2025-01-05 11:43:47,904][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244146_1000022016.pth [2025-01-05 11:43:48,888][09057] Updated weights for policy 0, policy_version 244678 (0.0016) [2025-01-05 11:43:50,865][09057] Updated weights for policy 0, policy_version 244688 (0.0015) [2025-01-05 11:43:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 20684.8, 300 sec: 19178.6). Total num frames: 1002278912. Throughput: 0: 5142.7. Samples: 563536. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:52,842][08963] Avg episode reward: [(0, '11.036')] [2025-01-05 11:43:52,882][09057] Updated weights for policy 0, policy_version 244698 (0.0015) [2025-01-05 11:43:54,918][09057] Updated weights for policy 0, policy_version 244708 (0.0014) [2025-01-05 11:43:56,839][09057] Updated weights for policy 0, policy_version 244718 (0.0016) [2025-01-05 11:43:57,842][08963] Fps is (10 sec: 20070.5, 60 sec: 20616.5, 300 sec: 19231.8). Total num frames: 1002381312. Throughput: 0: 5128.4. Samples: 578804. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:43:57,842][08963] Avg episode reward: [(0, '9.751')] [2025-01-05 11:43:58,857][09057] Updated weights for policy 0, policy_version 244728 (0.0014) [2025-01-05 11:44:00,888][09057] Updated weights for policy 0, policy_version 244738 (0.0015) [2025-01-05 11:44:02,842][08963] Fps is (10 sec: 20480.2, 60 sec: 20548.3, 300 sec: 19280.9). Total num frames: 1002483712. Throughput: 0: 5115.1. Samples: 609478. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:44:02,842][08963] Avg episode reward: [(0, '10.395')] [2025-01-05 11:44:02,903][09057] Updated weights for policy 0, policy_version 244748 (0.0016) [2025-01-05 11:44:04,986][09057] Updated weights for policy 0, policy_version 244758 (0.0016) [2025-01-05 11:44:07,061][09057] Updated weights for policy 0, policy_version 244768 (0.0016) [2025-01-05 11:44:07,842][08963] Fps is (10 sec: 20480.3, 60 sec: 20548.3, 300 sec: 19326.2). Total num frames: 1002586112. Throughput: 0: 5087.3. Samples: 639458. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:44:07,842][08963] Avg episode reward: [(0, '10.657')] [2025-01-05 11:44:09,069][09057] Updated weights for policy 0, policy_version 244778 (0.0018) [2025-01-05 11:44:11,118][09057] Updated weights for policy 0, policy_version 244788 (0.0016) [2025-01-05 11:44:12,842][08963] Fps is (10 sec: 20070.3, 60 sec: 20411.7, 300 sec: 19338.4). 
Total num frames: 1002684416. Throughput: 0: 5073.0. Samples: 654400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:44:12,842][08963] Avg episode reward: [(0, '9.585')] [2025-01-05 11:44:13,282][09057] Updated weights for policy 0, policy_version 244798 (0.0017) [2025-01-05 11:44:15,225][09057] Updated weights for policy 0, policy_version 244808 (0.0018) [2025-01-05 11:44:17,276][09057] Updated weights for policy 0, policy_version 244818 (0.0014) [2025-01-05 11:44:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 20275.2, 300 sec: 19349.7). Total num frames: 1002782720. Throughput: 0: 5043.0. Samples: 684250. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:44:17,842][08963] Avg episode reward: [(0, '9.813')] [2025-01-05 11:44:19,465][09057] Updated weights for policy 0, policy_version 244828 (0.0016) [2025-01-05 11:44:21,408][09057] Updated weights for policy 0, policy_version 244838 (0.0017) [2025-01-05 11:44:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 20207.0, 300 sec: 19360.3). Total num frames: 1002881024. Throughput: 0: 5018.5. Samples: 714050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:22,842][08963] Avg episode reward: [(0, '9.896')] [2025-01-05 11:44:23,467][09057] Updated weights for policy 0, policy_version 244848 (0.0018) [2025-01-05 11:44:25,535][09057] Updated weights for policy 0, policy_version 244858 (0.0018) [2025-01-05 11:44:27,464][09057] Updated weights for policy 0, policy_version 244868 (0.0016) [2025-01-05 11:44:27,842][08963] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 19397.1). Total num frames: 1002983424. Throughput: 0: 5022.2. Samples: 729238. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:27,842][08963] Avg episode reward: [(0, '10.605')] [2025-01-05 11:44:29,530][09057] Updated weights for policy 0, policy_version 244878 (0.0017) [2025-01-05 11:44:31,598][09057] Updated weights for policy 0, policy_version 244888 (0.0016) [2025-01-05 11:44:32,842][08963] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 19405.5). Total num frames: 1003081728. Throughput: 0: 5026.5. Samples: 759654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:32,842][08963] Avg episode reward: [(0, '9.514')] [2025-01-05 11:44:33,664][09057] Updated weights for policy 0, policy_version 244898 (0.0017) [2025-01-05 11:44:35,775][09057] Updated weights for policy 0, policy_version 244908 (0.0016) [2025-01-05 11:44:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 19413.3). Total num frames: 1003180032. Throughput: 0: 4999.0. Samples: 788490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:37,843][08963] Avg episode reward: [(0, '9.712')] [2025-01-05 11:44:38,026][09057] Updated weights for policy 0, policy_version 244918 (0.0017) [2025-01-05 11:44:40,124][09057] Updated weights for policy 0, policy_version 244928 (0.0018) [2025-01-05 11:44:42,320][09057] Updated weights for policy 0, policy_version 244938 (0.0016) [2025-01-05 11:44:42,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19865.6, 300 sec: 19396.2). Total num frames: 1003274240. Throughput: 0: 4963.2. Samples: 802150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:42,842][08963] Avg episode reward: [(0, '9.384')] [2025-01-05 11:44:43,931][09024] Signal inference workers to stop experience collection... (50 times) [2025-01-05 11:44:43,932][09024] Signal inference workers to resume experience collection... 
(50 times) [2025-01-05 11:44:43,943][09057] InferenceWorker_p0-w0: stopping experience collection (50 times) [2025-01-05 11:44:43,943][09057] InferenceWorker_p0-w0: resuming experience collection (50 times) [2025-01-05 11:44:44,577][09057] Updated weights for policy 0, policy_version 244948 (0.0020) [2025-01-05 11:44:46,524][09057] Updated weights for policy 0, policy_version 244958 (0.0016) [2025-01-05 11:44:47,842][08963] Fps is (10 sec: 19251.6, 60 sec: 19865.6, 300 sec: 19403.9). Total num frames: 1003372544. Throughput: 0: 4929.2. Samples: 831294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:47,842][08963] Avg episode reward: [(0, '9.515')] [2025-01-05 11:44:48,562][09057] Updated weights for policy 0, policy_version 244968 (0.0016) [2025-01-05 11:44:50,675][09057] Updated weights for policy 0, policy_version 244978 (0.0015) [2025-01-05 11:44:52,720][09057] Updated weights for policy 0, policy_version 244988 (0.0018) [2025-01-05 11:44:52,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19865.6, 300 sec: 19411.2). Total num frames: 1003470848. Throughput: 0: 4922.3. Samples: 860962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:52,842][08963] Avg episode reward: [(0, '9.559')] [2025-01-05 11:44:54,911][09057] Updated weights for policy 0, policy_version 244998 (0.0017) [2025-01-05 11:44:57,086][09057] Updated weights for policy 0, policy_version 245008 (0.0018) [2025-01-05 11:44:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19418.0). Total num frames: 1003569152. Throughput: 0: 4907.0. Samples: 875214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:44:57,842][08963] Avg episode reward: [(0, '9.693')] [2025-01-05 11:44:59,116][09057] Updated weights for policy 0, policy_version 245018 (0.0018) [2025-01-05 11:45:01,296][09057] Updated weights for policy 0, policy_version 245028 (0.0017) [2025-01-05 11:45:02,842][08963] Fps is (10 sec: 18841.4, 60 sec: 19592.5, 300 sec: 19380.7). 
Total num frames: 1003659264. Throughput: 0: 4885.0. Samples: 904076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:02,842][08963] Avg episode reward: [(0, '10.232')] [2025-01-05 11:45:03,485][09057] Updated weights for policy 0, policy_version 245038 (0.0017) [2025-01-05 11:45:05,425][09057] Updated weights for policy 0, policy_version 245048 (0.0017) [2025-01-05 11:45:07,477][09057] Updated weights for policy 0, policy_version 245058 (0.0017) [2025-01-05 11:45:07,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19592.5, 300 sec: 19409.3). Total num frames: 1003761664. Throughput: 0: 4888.9. Samples: 934050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:07,842][08963] Avg episode reward: [(0, '9.336')] [2025-01-05 11:45:09,695][09057] Updated weights for policy 0, policy_version 245068 (0.0019) [2025-01-05 11:45:11,666][09057] Updated weights for policy 0, policy_version 245078 (0.0015) [2025-01-05 11:45:12,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19592.5, 300 sec: 19415.7). Total num frames: 1003859968. Throughput: 0: 4869.3. Samples: 948356. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:12,842][08963] Avg episode reward: [(0, '10.173')] [2025-01-05 11:45:13,687][09057] Updated weights for policy 0, policy_version 245088 (0.0016) [2025-01-05 11:45:15,164][09024] Signal inference workers to stop experience collection... (100 times) [2025-01-05 11:45:15,165][09024] Signal inference workers to resume experience collection... (100 times) [2025-01-05 11:45:15,177][09057] InferenceWorker_p0-w0: stopping experience collection (100 times) [2025-01-05 11:45:15,178][09057] InferenceWorker_p0-w0: resuming experience collection (100 times) [2025-01-05 11:45:15,768][09057] Updated weights for policy 0, policy_version 245098 (0.0016) [2025-01-05 11:45:17,688][09057] Updated weights for policy 0, policy_version 245108 (0.0016) [2025-01-05 11:45:17,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19442.0). 
Total num frames: 1003962368. Throughput: 0: 4873.7. Samples: 978970. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:17,842][08963] Avg episode reward: [(0, '9.544')] [2025-01-05 11:45:19,726][09057] Updated weights for policy 0, policy_version 245118 (0.0015) [2025-01-05 11:45:21,820][09057] Updated weights for policy 0, policy_version 245128 (0.0016) [2025-01-05 11:45:22,842][08963] Fps is (10 sec: 20479.8, 60 sec: 19729.0, 300 sec: 19467.1). Total num frames: 1004064768. Throughput: 0: 4905.5. Samples: 1009238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:22,842][08963] Avg episode reward: [(0, '9.975')] [2025-01-05 11:45:23,858][09057] Updated weights for policy 0, policy_version 245138 (0.0017) [2025-01-05 11:45:25,955][09057] Updated weights for policy 0, policy_version 245148 (0.0017) [2025-01-05 11:45:27,842][08963] Fps is (10 sec: 20069.9, 60 sec: 19660.7, 300 sec: 19471.6). Total num frames: 1004163072. Throughput: 0: 4925.0. Samples: 1023776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 11:45:27,843][08963] Avg episode reward: [(0, '9.656')] [2025-01-05 11:45:28,117][09057] Updated weights for policy 0, policy_version 245158 (0.0017) [2025-01-05 11:45:30,075][09057] Updated weights for policy 0, policy_version 245168 (0.0015) [2025-01-05 11:45:32,105][09057] Updated weights for policy 0, policy_version 245178 (0.0017) [2025-01-05 11:45:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19476.0). Total num frames: 1004261376. Throughput: 0: 4944.4. Samples: 1053792. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:32,842][08963] Avg episode reward: [(0, '8.872')] [2025-01-05 11:45:34,290][09057] Updated weights for policy 0, policy_version 245188 (0.0017) [2025-01-05 11:45:36,279][09057] Updated weights for policy 0, policy_version 245198 (0.0016) [2025-01-05 11:45:37,842][08963] Fps is (10 sec: 19661.2, 60 sec: 19660.8, 300 sec: 19480.1). Total num frames: 1004359680. 
Throughput: 0: 4943.4. Samples: 1083414. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:37,842][08963] Avg episode reward: [(0, '10.859')] [2025-01-05 11:45:38,301][09057] Updated weights for policy 0, policy_version 245208 (0.0015) [2025-01-05 11:45:40,365][09057] Updated weights for policy 0, policy_version 245218 (0.0015) [2025-01-05 11:45:42,314][09057] Updated weights for policy 0, policy_version 245228 (0.0014) [2025-01-05 11:45:42,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.4, 300 sec: 19502.1). Total num frames: 1004462080. Throughput: 0: 4966.0. Samples: 1098684. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:42,842][08963] Avg episode reward: [(0, '9.664')] [2025-01-05 11:45:44,339][09057] Updated weights for policy 0, policy_version 245238 (0.0014) [2025-01-05 11:45:46,391][09057] Updated weights for policy 0, policy_version 245248 (0.0015) [2025-01-05 11:45:47,842][08963] Fps is (10 sec: 20479.8, 60 sec: 19865.5, 300 sec: 19523.2). Total num frames: 1004564480. Throughput: 0: 5003.9. Samples: 1129252. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:47,842][08963] Avg episode reward: [(0, '9.956')] [2025-01-05 11:45:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000245255_1004564480.pth... [2025-01-05 11:45:47,905][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244148_1000030208.pth [2025-01-05 11:45:48,463][09057] Updated weights for policy 0, policy_version 245258 (0.0019) [2025-01-05 11:45:50,567][09057] Updated weights for policy 0, policy_version 245268 (0.0017) [2025-01-05 11:45:52,685][09057] Updated weights for policy 0, policy_version 245278 (0.0016) [2025-01-05 11:45:52,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19865.5, 300 sec: 19526.1). Total num frames: 1004662784. Throughput: 0: 4989.0. Samples: 1158554. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:52,842][08963] Avg episode reward: [(0, '9.659')] [2025-01-05 11:45:54,753][09057] Updated weights for policy 0, policy_version 245288 (0.0018) [2025-01-05 11:45:56,902][09057] Updated weights for policy 0, policy_version 245298 (0.0017) [2025-01-05 11:45:57,842][08963] Fps is (10 sec: 18841.9, 60 sec: 19729.1, 300 sec: 19495.1). Total num frames: 1004752896. Throughput: 0: 4984.8. Samples: 1172670. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:45:57,842][08963] Avg episode reward: [(0, '8.919')] [2025-01-05 11:45:59,160][09057] Updated weights for policy 0, policy_version 245308 (0.0018) [2025-01-05 11:46:01,184][09057] Updated weights for policy 0, policy_version 245318 (0.0018) [2025-01-05 11:46:02,843][08963] Fps is (10 sec: 18430.0, 60 sec: 19797.0, 300 sec: 19481.8). Total num frames: 1004847104. Throughput: 0: 4943.9. Samples: 1201450. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:02,844][08963] Avg episode reward: [(0, '11.012')] [2025-01-05 11:46:03,615][09057] Updated weights for policy 0, policy_version 245328 (0.0018) [2025-01-05 11:46:05,962][09057] Updated weights for policy 0, policy_version 245338 (0.0019) [2025-01-05 11:46:07,842][08963] Fps is (10 sec: 18431.8, 60 sec: 19592.5, 300 sec: 19452.9). Total num frames: 1004937216. Throughput: 0: 4856.8. Samples: 1227796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:07,842][08963] Avg episode reward: [(0, '8.755')] [2025-01-05 11:46:08,137][09057] Updated weights for policy 0, policy_version 245348 (0.0019) [2025-01-05 11:46:10,484][09057] Updated weights for policy 0, policy_version 245358 (0.0018) [2025-01-05 11:46:12,842][08963] Fps is (10 sec: 17614.9, 60 sec: 19387.7, 300 sec: 19409.2). Total num frames: 1005023232. Throughput: 0: 4829.7. Samples: 1241112. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:12,842][08963] Avg episode reward: [(0, '10.238')] [2025-01-05 11:46:12,982][09057] Updated weights for policy 0, policy_version 245368 (0.0017) [2025-01-05 11:46:15,115][09057] Updated weights for policy 0, policy_version 245378 (0.0018) [2025-01-05 11:46:17,290][09057] Updated weights for policy 0, policy_version 245388 (0.0017) [2025-01-05 11:46:17,842][08963] Fps is (10 sec: 18022.2, 60 sec: 19251.1, 300 sec: 19398.3). Total num frames: 1005117440. Throughput: 0: 4760.6. Samples: 1268020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:17,843][08963] Avg episode reward: [(0, '9.739')] [2025-01-05 11:46:19,601][09057] Updated weights for policy 0, policy_version 245398 (0.0018) [2025-01-05 11:46:21,907][09057] Updated weights for policy 0, policy_version 245408 (0.0022) [2025-01-05 11:46:22,842][08963] Fps is (10 sec: 18021.9, 60 sec: 18978.1, 300 sec: 19357.3). Total num frames: 1005203456. Throughput: 0: 4691.4. Samples: 1294526. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:22,843][08963] Avg episode reward: [(0, '11.576')] [2025-01-05 11:46:24,325][09057] Updated weights for policy 0, policy_version 245418 (0.0021) [2025-01-05 11:46:26,782][09057] Updated weights for policy 0, policy_version 245428 (0.0022) [2025-01-05 11:46:27,842][08963] Fps is (10 sec: 17203.3, 60 sec: 18773.4, 300 sec: 19317.7). Total num frames: 1005289472. Throughput: 0: 4641.1. Samples: 1307532. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:27,843][08963] Avg episode reward: [(0, '9.897')] [2025-01-05 11:46:29,197][09057] Updated weights for policy 0, policy_version 245438 (0.0021) [2025-01-05 11:46:31,505][09057] Updated weights for policy 0, policy_version 245448 (0.0020) [2025-01-05 11:46:32,842][08963] Fps is (10 sec: 17203.6, 60 sec: 18568.5, 300 sec: 19279.6). Total num frames: 1005375488. Throughput: 0: 4525.8. Samples: 1332914. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:32,842][08963] Avg episode reward: [(0, '10.505')] [2025-01-05 11:46:34,020][09057] Updated weights for policy 0, policy_version 245458 (0.0022) [2025-01-05 11:46:36,197][09057] Updated weights for policy 0, policy_version 245468 (0.0018) [2025-01-05 11:46:37,842][08963] Fps is (10 sec: 17203.5, 60 sec: 18363.8, 300 sec: 19242.8). Total num frames: 1005461504. Throughput: 0: 4457.7. Samples: 1359150. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:37,842][08963] Avg episode reward: [(0, '10.043')] [2025-01-05 11:46:38,532][09057] Updated weights for policy 0, policy_version 245478 (0.0020) [2025-01-05 11:46:40,659][09057] Updated weights for policy 0, policy_version 245488 (0.0017) [2025-01-05 11:46:42,682][09057] Updated weights for policy 0, policy_version 245498 (0.0017) [2025-01-05 11:46:42,842][08963] Fps is (10 sec: 18431.7, 60 sec: 18295.4, 300 sec: 19250.1). Total num frames: 1005559808. Throughput: 0: 4464.1. Samples: 1373556. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:42,843][08963] Avg episode reward: [(0, '9.880')] [2025-01-05 11:46:45,001][09057] Updated weights for policy 0, policy_version 245508 (0.0021) [2025-01-05 11:46:47,207][09057] Updated weights for policy 0, policy_version 245518 (0.0018) [2025-01-05 11:46:47,842][08963] Fps is (10 sec: 18841.3, 60 sec: 18090.7, 300 sec: 19229.1). Total num frames: 1005649920. Throughput: 0: 4447.5. Samples: 1401582. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:47,843][08963] Avg episode reward: [(0, '10.147')] [2025-01-05 11:46:49,348][09057] Updated weights for policy 0, policy_version 245528 (0.0018) [2025-01-05 11:46:51,491][09057] Updated weights for policy 0, policy_version 245538 (0.0017) [2025-01-05 11:46:52,842][08963] Fps is (10 sec: 18842.1, 60 sec: 18090.7, 300 sec: 19383.1). Total num frames: 1005748224. Throughput: 0: 4496.2. Samples: 1430126. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:52,842][08963] Avg episode reward: [(0, '10.198')] [2025-01-05 11:46:53,710][09057] Updated weights for policy 0, policy_version 245548 (0.0018) [2025-01-05 11:46:55,730][09057] Updated weights for policy 0, policy_version 245558 (0.0017) [2025-01-05 11:46:57,842][08963] Fps is (10 sec: 19251.3, 60 sec: 18158.9, 300 sec: 19577.5). Total num frames: 1005842432. Throughput: 0: 4521.3. Samples: 1444570. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:46:57,843][08963] Avg episode reward: [(0, '9.834')] [2025-01-05 11:46:58,013][09057] Updated weights for policy 0, policy_version 245568 (0.0019) [2025-01-05 11:47:00,198][09057] Updated weights for policy 0, policy_version 245578 (0.0020) [2025-01-05 11:47:02,245][09057] Updated weights for policy 0, policy_version 245588 (0.0018) [2025-01-05 11:47:02,842][08963] Fps is (10 sec: 18841.5, 60 sec: 18159.3, 300 sec: 19549.7). Total num frames: 1005936640. Throughput: 0: 4547.7. Samples: 1472666. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:02,842][08963] Avg episode reward: [(0, '10.200')] [2025-01-05 11:47:04,533][09057] Updated weights for policy 0, policy_version 245598 (0.0018) [2025-01-05 11:47:06,664][09057] Updated weights for policy 0, policy_version 245608 (0.0018) [2025-01-05 11:47:07,842][08963] Fps is (10 sec: 18841.7, 60 sec: 18227.2, 300 sec: 19535.8). Total num frames: 1006030848. Throughput: 0: 4590.6. Samples: 1501100. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:07,843][08963] Avg episode reward: [(0, '10.476')] [2025-01-05 11:47:08,796][09057] Updated weights for policy 0, policy_version 245618 (0.0018) [2025-01-05 11:47:10,963][09057] Updated weights for policy 0, policy_version 245628 (0.0018) [2025-01-05 11:47:12,842][08963] Fps is (10 sec: 18841.2, 60 sec: 18363.7, 300 sec: 19508.1). Total num frames: 1006125056. Throughput: 0: 4616.1. Samples: 1515258. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:12,843][08963] Avg episode reward: [(0, '9.491')] [2025-01-05 11:47:13,206][09057] Updated weights for policy 0, policy_version 245638 (0.0019) [2025-01-05 11:47:15,196][09057] Updated weights for policy 0, policy_version 245648 (0.0017) [2025-01-05 11:47:17,308][09057] Updated weights for policy 0, policy_version 245658 (0.0016) [2025-01-05 11:47:17,842][08963] Fps is (10 sec: 19251.1, 60 sec: 18432.1, 300 sec: 19522.0). Total num frames: 1006223360. Throughput: 0: 4698.9. Samples: 1544366. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:17,842][08963] Avg episode reward: [(0, '9.014')] [2025-01-05 11:47:19,548][09057] Updated weights for policy 0, policy_version 245668 (0.0018) [2025-01-05 11:47:21,551][09057] Updated weights for policy 0, policy_version 245678 (0.0017) [2025-01-05 11:47:22,842][08963] Fps is (10 sec: 19661.2, 60 sec: 18636.9, 300 sec: 19577.5). Total num frames: 1006321664. Throughput: 0: 4758.9. Samples: 1573302. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:22,842][08963] Avg episode reward: [(0, '8.907')] [2025-01-05 11:47:23,694][09057] Updated weights for policy 0, policy_version 245688 (0.0017) [2025-01-05 11:47:25,759][09057] Updated weights for policy 0, policy_version 245698 (0.0017) [2025-01-05 11:47:27,842][08963] Fps is (10 sec: 19250.9, 60 sec: 18773.3, 300 sec: 19577.5). Total num frames: 1006415872. Throughput: 0: 4769.7. Samples: 1588192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:27,843][08963] Avg episode reward: [(0, '10.439')] [2025-01-05 11:47:27,883][09057] Updated weights for policy 0, policy_version 245708 (0.0019) [2025-01-05 11:47:30,149][09057] Updated weights for policy 0, policy_version 245718 (0.0019) [2025-01-05 11:47:32,275][09057] Updated weights for policy 0, policy_version 245728 (0.0017) [2025-01-05 11:47:32,842][08963] Fps is (10 sec: 18841.7, 60 sec: 18909.9, 300 sec: 19577.5). 
Total num frames: 1006510080. Throughput: 0: 4772.6. Samples: 1616348. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:32,842][08963] Avg episode reward: [(0, '9.415')] [2025-01-05 11:47:34,394][09057] Updated weights for policy 0, policy_version 245738 (0.0019) [2025-01-05 11:47:36,510][09057] Updated weights for policy 0, policy_version 245748 (0.0018) [2025-01-05 11:47:37,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19114.6, 300 sec: 19591.4). Total num frames: 1006608384. Throughput: 0: 4777.0. Samples: 1645090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:37,842][08963] Avg episode reward: [(0, '10.206')] [2025-01-05 11:47:38,766][09057] Updated weights for policy 0, policy_version 245758 (0.0020) [2025-01-05 11:47:40,747][09057] Updated weights for policy 0, policy_version 245768 (0.0017) [2025-01-05 11:47:42,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19046.5, 300 sec: 19563.6). Total num frames: 1006702592. Throughput: 0: 4776.3. Samples: 1659502. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:42,842][08963] Avg episode reward: [(0, '10.041')] [2025-01-05 11:47:42,860][09057] Updated weights for policy 0, policy_version 245778 (0.0017) [2025-01-05 11:47:45,118][09057] Updated weights for policy 0, policy_version 245788 (0.0018) [2025-01-05 11:47:47,119][09057] Updated weights for policy 0, policy_version 245798 (0.0017) [2025-01-05 11:47:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19535.8). Total num frames: 1006800896. Throughput: 0: 4795.2. Samples: 1688450. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:47:47,842][08963] Avg episode reward: [(0, '9.366')] [2025-01-05 11:47:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000245801_1006800896.pth... 
[2025-01-05 11:47:47,908][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000244673_1002180608.pth [2025-01-05 11:47:49,307][09057] Updated weights for policy 0, policy_version 245808 (0.0018) [2025-01-05 11:47:51,415][09057] Updated weights for policy 0, policy_version 245818 (0.0018) [2025-01-05 11:47:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19182.9, 300 sec: 19508.1). Total num frames: 1006899200. Throughput: 0: 4817.2. Samples: 1717872. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:47:52,842][08963] Avg episode reward: [(0, '10.251')] [2025-01-05 11:47:53,379][09057] Updated weights for policy 0, policy_version 245828 (0.0017) [2025-01-05 11:47:55,463][09057] Updated weights for policy 0, policy_version 245838 (0.0017) [2025-01-05 11:47:57,532][09057] Updated weights for policy 0, policy_version 245848 (0.0017) [2025-01-05 11:47:57,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19251.2, 300 sec: 19480.3). Total num frames: 1006997504. Throughput: 0: 4835.9. Samples: 1732874. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:47:57,842][08963] Avg episode reward: [(0, '10.319')] [2025-01-05 11:47:59,622][09057] Updated weights for policy 0, policy_version 245858 (0.0019) [2025-01-05 11:48:01,762][09057] Updated weights for policy 0, policy_version 245868 (0.0016) [2025-01-05 11:48:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1007095808. Throughput: 0: 4839.6. Samples: 1762148. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:02,842][08963] Avg episode reward: [(0, '9.954')] [2025-01-05 11:48:04,024][09057] Updated weights for policy 0, policy_version 245878 (0.0021) [2025-01-05 11:48:05,994][09057] Updated weights for policy 0, policy_version 245888 (0.0016) [2025-01-05 11:48:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19387.7, 300 sec: 19438.6). Total num frames: 1007194112. 
Throughput: 0: 4843.5. Samples: 1791258. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:07,842][08963] Avg episode reward: [(0, '9.741')] [2025-01-05 11:48:08,107][09057] Updated weights for policy 0, policy_version 245898 (0.0017) [2025-01-05 11:48:10,187][09057] Updated weights for policy 0, policy_version 245908 (0.0017) [2025-01-05 11:48:10,792][09024] Signal inference workers to stop experience collection... (150 times) [2025-01-05 11:48:10,795][09024] Signal inference workers to resume experience collection... (150 times) [2025-01-05 11:48:10,810][09057] InferenceWorker_p0-w0: stopping experience collection (150 times) [2025-01-05 11:48:10,818][09057] InferenceWorker_p0-w0: resuming experience collection (150 times) [2025-01-05 11:48:12,149][09057] Updated weights for policy 0, policy_version 245918 (0.0017) [2025-01-05 11:48:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.1, 300 sec: 19410.9). Total num frames: 1007292416. Throughput: 0: 4846.5. Samples: 1806284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:12,842][08963] Avg episode reward: [(0, '10.164')] [2025-01-05 11:48:14,276][09057] Updated weights for policy 0, policy_version 245928 (0.0017) [2025-01-05 11:48:16,367][09057] Updated weights for policy 0, policy_version 245938 (0.0018) [2025-01-05 11:48:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1007390720. Throughput: 0: 4878.7. Samples: 1835890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:17,842][08963] Avg episode reward: [(0, '10.175')] [2025-01-05 11:48:18,459][09057] Updated weights for policy 0, policy_version 245948 (0.0019) [2025-01-05 11:48:20,636][09057] Updated weights for policy 0, policy_version 245958 (0.0018) [2025-01-05 11:48:22,745][09057] Updated weights for policy 0, policy_version 245968 (0.0018) [2025-01-05 11:48:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19369.2). Total num frames: 1007484928. 
Throughput: 0: 4885.2. Samples: 1864922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:22,842][08963] Avg episode reward: [(0, '10.452')] [2025-01-05 11:48:24,832][09057] Updated weights for policy 0, policy_version 245978 (0.0019) [2025-01-05 11:48:26,986][09057] Updated weights for policy 0, policy_version 245988 (0.0019) [2025-01-05 11:48:27,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19387.8, 300 sec: 19327.6). Total num frames: 1007579136. Throughput: 0: 4876.9. Samples: 1878964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:27,842][08963] Avg episode reward: [(0, '9.808')] [2025-01-05 11:48:29,222][09057] Updated weights for policy 0, policy_version 245998 (0.0019) [2025-01-05 11:48:31,218][09057] Updated weights for policy 0, policy_version 246008 (0.0017) [2025-01-05 11:48:32,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19313.7). Total num frames: 1007677440. Throughput: 0: 4878.8. Samples: 1907994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:32,842][08963] Avg episode reward: [(0, '9.939')] [2025-01-05 11:48:33,317][09057] Updated weights for policy 0, policy_version 246018 (0.0017) [2025-01-05 11:48:35,395][09057] Updated weights for policy 0, policy_version 246028 (0.0016) [2025-01-05 11:48:37,353][09057] Updated weights for policy 0, policy_version 246038 (0.0016) [2025-01-05 11:48:37,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19299.8). Total num frames: 1007775744. Throughput: 0: 4890.6. Samples: 1937950. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:37,842][08963] Avg episode reward: [(0, '10.159')] [2025-01-05 11:48:39,456][09057] Updated weights for policy 0, policy_version 246048 (0.0017) [2025-01-05 11:48:41,539][09057] Updated weights for policy 0, policy_version 246058 (0.0017) [2025-01-05 11:48:42,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19592.5, 300 sec: 19313.7). Total num frames: 1007878144. Throughput: 0: 4888.3. 
Samples: 1952846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:42,842][08963] Avg episode reward: [(0, '9.488')] [2025-01-05 11:48:43,629][09057] Updated weights for policy 0, policy_version 246068 (0.0019) [2025-01-05 11:48:45,807][09057] Updated weights for policy 0, policy_version 246078 (0.0018) [2025-01-05 11:48:47,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19524.2, 300 sec: 19299.8). Total num frames: 1007972352. Throughput: 0: 4876.2. Samples: 1981580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:47,843][08963] Avg episode reward: [(0, '9.230')] [2025-01-05 11:48:48,052][09057] Updated weights for policy 0, policy_version 246088 (0.0018) [2025-01-05 11:48:50,175][09057] Updated weights for policy 0, policy_version 246098 (0.0020) [2025-01-05 11:48:52,291][09057] Updated weights for policy 0, policy_version 246108 (0.0017) [2025-01-05 11:48:52,842][08963] Fps is (10 sec: 18841.3, 60 sec: 19456.0, 300 sec: 19272.0). Total num frames: 1008066560. Throughput: 0: 4859.2. Samples: 2009922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:52,842][08963] Avg episode reward: [(0, '10.648')] [2025-01-05 11:48:54,519][09057] Updated weights for policy 0, policy_version 246118 (0.0018) [2025-01-05 11:48:56,543][09057] Updated weights for policy 0, policy_version 246128 (0.0017) [2025-01-05 11:48:57,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19387.7, 300 sec: 19244.2). Total num frames: 1008160768. Throughput: 0: 4847.5. Samples: 2024422. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:48:57,843][08963] Avg episode reward: [(0, '9.520')] [2025-01-05 11:48:58,738][09057] Updated weights for policy 0, policy_version 246138 (0.0019) [2025-01-05 11:49:00,912][09057] Updated weights for policy 0, policy_version 246148 (0.0018) [2025-01-05 11:49:02,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19319.4, 300 sec: 19216.5). Total num frames: 1008254976. Throughput: 0: 4822.1. Samples: 2052886. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:02,842][08963] Avg episode reward: [(0, '9.632')] [2025-01-05 11:49:03,043][09057] Updated weights for policy 0, policy_version 246158 (0.0018) [2025-01-05 11:49:05,132][09057] Updated weights for policy 0, policy_version 246168 (0.0017) [2025-01-05 11:49:07,215][09057] Updated weights for policy 0, policy_version 246178 (0.0017) [2025-01-05 11:49:07,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19319.5, 300 sec: 19216.5). Total num frames: 1008353280. Throughput: 0: 4830.0. Samples: 2082272. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:07,842][08963] Avg episode reward: [(0, '9.674')] [2025-01-05 11:49:09,342][09057] Updated weights for policy 0, policy_version 246188 (0.0018) [2025-01-05 11:49:11,452][09057] Updated weights for policy 0, policy_version 246198 (0.0018) [2025-01-05 11:49:12,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19319.5, 300 sec: 19216.5). Total num frames: 1008451584. Throughput: 0: 4835.6. Samples: 2096564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:12,842][08963] Avg episode reward: [(0, '10.272')] [2025-01-05 11:49:13,636][09057] Updated weights for policy 0, policy_version 246208 (0.0017) [2025-01-05 11:49:15,672][09057] Updated weights for policy 0, policy_version 246218 (0.0016) [2025-01-05 11:49:17,815][09057] Updated weights for policy 0, policy_version 246228 (0.0016) [2025-01-05 11:49:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19319.4, 300 sec: 19216.5). Total num frames: 1008549888. Throughput: 0: 4839.8. Samples: 2125786. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:17,843][08963] Avg episode reward: [(0, '9.715')] [2025-01-05 11:49:20,006][09057] Updated weights for policy 0, policy_version 246238 (0.0018) [2025-01-05 11:49:22,038][09057] Updated weights for policy 0, policy_version 246248 (0.0016) [2025-01-05 11:49:22,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19188.7). 
Total num frames: 1008644096. Throughput: 0: 4816.4. Samples: 2154686. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:22,842][08963] Avg episode reward: [(0, '9.865')] [2025-01-05 11:49:24,172][09057] Updated weights for policy 0, policy_version 246258 (0.0016) [2025-01-05 11:49:26,286][09057] Updated weights for policy 0, policy_version 246268 (0.0016) [2025-01-05 11:49:27,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19188.7). Total num frames: 1008742400. Throughput: 0: 4808.9. Samples: 2169248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:27,842][08963] Avg episode reward: [(0, '10.232')] [2025-01-05 11:49:28,402][09057] Updated weights for policy 0, policy_version 246278 (0.0016) [2025-01-05 11:49:30,450][09057] Updated weights for policy 0, policy_version 246288 (0.0015) [2025-01-05 11:49:32,545][09057] Updated weights for policy 0, policy_version 246298 (0.0017) [2025-01-05 11:49:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.8, 300 sec: 19188.7). Total num frames: 1008840704. Throughput: 0: 4822.6. Samples: 2198594. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:32,842][08963] Avg episode reward: [(0, '10.543')] [2025-01-05 11:49:34,664][09057] Updated weights for policy 0, policy_version 246308 (0.0017) [2025-01-05 11:49:36,725][09057] Updated weights for policy 0, policy_version 246318 (0.0016) [2025-01-05 11:49:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19202.6). Total num frames: 1008939008. Throughput: 0: 4844.8. Samples: 2227936. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:37,842][08963] Avg episode reward: [(0, '11.202')] [2025-01-05 11:49:38,898][09057] Updated weights for policy 0, policy_version 246328 (0.0017) [2025-01-05 11:49:40,946][09057] Updated weights for policy 0, policy_version 246338 (0.0016) [2025-01-05 11:49:42,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19188.7). Total num frames: 1009033216. 
Throughput: 0: 4847.0. Samples: 2242538. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:42,842][08963] Avg episode reward: [(0, '10.615')] [2025-01-05 11:49:43,074][09057] Updated weights for policy 0, policy_version 246348 (0.0016) [2025-01-05 11:49:45,204][09057] Updated weights for policy 0, policy_version 246358 (0.0015) [2025-01-05 11:49:47,251][09057] Updated weights for policy 0, policy_version 246368 (0.0015) [2025-01-05 11:49:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19188.7). Total num frames: 1009131520. Throughput: 0: 4865.9. Samples: 2271850. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:47,842][08963] Avg episode reward: [(0, '10.662')] [2025-01-05 11:49:47,896][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000246371_1009135616.pth... [2025-01-05 11:49:47,955][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000245255_1004564480.pth [2025-01-05 11:49:49,356][09057] Updated weights for policy 0, policy_version 246378 (0.0017) [2025-01-05 11:49:51,492][09057] Updated weights for policy 0, policy_version 246388 (0.0015) [2025-01-05 11:49:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19188.7). Total num frames: 1009229824. Throughput: 0: 4857.8. Samples: 2300874. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:52,842][08963] Avg episode reward: [(0, '9.784')] [2025-01-05 11:49:53,645][09057] Updated weights for policy 0, policy_version 246398 (0.0017) [2025-01-05 11:49:55,679][09057] Updated weights for policy 0, policy_version 246408 (0.0017) [2025-01-05 11:49:57,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19202.6). Total num frames: 1009324032. Throughput: 0: 4862.8. Samples: 2315392. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:49:57,843][08963] Avg episode reward: [(0, '9.345')] [2025-01-05 11:49:57,899][09057] Updated weights for policy 0, policy_version 246418 (0.0017) [2025-01-05 11:50:00,042][09057] Updated weights for policy 0, policy_version 246428 (0.0017) [2025-01-05 11:50:02,071][09057] Updated weights for policy 0, policy_version 246438 (0.0017) [2025-01-05 11:50:02,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19188.7). Total num frames: 1009422336. Throughput: 0: 4853.8. Samples: 2344208. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:50:02,842][08963] Avg episode reward: [(0, '10.898')] [2025-01-05 11:50:04,289][09057] Updated weights for policy 0, policy_version 246448 (0.0017) [2025-01-05 11:50:06,350][09057] Updated weights for policy 0, policy_version 246458 (0.0017) [2025-01-05 11:50:07,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19188.7). Total num frames: 1009520640. Throughput: 0: 4852.4. Samples: 2373046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:50:07,843][08963] Avg episode reward: [(0, '9.829')] [2025-01-05 11:50:08,468][09057] Updated weights for policy 0, policy_version 246468 (0.0016) [2025-01-05 11:50:10,603][09057] Updated weights for policy 0, policy_version 246478 (0.0016) [2025-01-05 11:50:12,642][09057] Updated weights for policy 0, policy_version 246488 (0.0016) [2025-01-05 11:50:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19174.8). Total num frames: 1009618944. Throughput: 0: 4854.9. Samples: 2387718. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:12,842][08963] Avg episode reward: [(0, '10.645')] [2025-01-05 11:50:14,759][09057] Updated weights for policy 0, policy_version 246498 (0.0016) [2025-01-05 11:50:16,896][09057] Updated weights for policy 0, policy_version 246508 (0.0015) [2025-01-05 11:50:17,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19147.1). 
Total num frames: 1009713152. Throughput: 0: 4852.4. Samples: 2416954. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:17,843][08963] Avg episode reward: [(0, '9.620')] [2025-01-05 11:50:19,042][09057] Updated weights for policy 0, policy_version 246518 (0.0017) [2025-01-05 11:50:21,096][09057] Updated weights for policy 0, policy_version 246528 (0.0016) [2025-01-05 11:50:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19147.1). Total num frames: 1009811456. Throughput: 0: 4840.5. Samples: 2445760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:22,843][08963] Avg episode reward: [(0, '10.277')] [2025-01-05 11:50:23,315][09057] Updated weights for policy 0, policy_version 246538 (0.0016) [2025-01-05 11:50:25,362][09057] Updated weights for policy 0, policy_version 246548 (0.0016) [2025-01-05 11:50:27,411][09057] Updated weights for policy 0, policy_version 246558 (0.0015) [2025-01-05 11:50:27,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19147.1). Total num frames: 1009909760. Throughput: 0: 4841.0. Samples: 2460384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:27,842][08963] Avg episode reward: [(0, '9.625')] [2025-01-05 11:50:29,619][09057] Updated weights for policy 0, policy_version 246568 (0.0017) [2025-01-05 11:50:31,596][09057] Updated weights for policy 0, policy_version 246578 (0.0015) [2025-01-05 11:50:32,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19133.2). Total num frames: 1010003968. Throughput: 0: 4844.1. Samples: 2489832. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:32,842][08963] Avg episode reward: [(0, '9.085')] [2025-01-05 11:50:33,682][09057] Updated weights for policy 0, policy_version 246588 (0.0016) [2025-01-05 11:50:35,803][09057] Updated weights for policy 0, policy_version 246598 (0.0016) [2025-01-05 11:50:37,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.8, 300 sec: 19119.3). Total num frames: 1010102272. 
Throughput: 0: 4849.6. Samples: 2519108. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:37,842][08963] Avg episode reward: [(0, '9.203')] [2025-01-05 11:50:37,905][09057] Updated weights for policy 0, policy_version 246608 (0.0018) [2025-01-05 11:50:40,093][09057] Updated weights for policy 0, policy_version 246618 (0.0018) [2025-01-05 11:50:42,281][09057] Updated weights for policy 0, policy_version 246628 (0.0017) [2025-01-05 11:50:42,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19091.5). Total num frames: 1010196480. Throughput: 0: 4838.1. Samples: 2533106. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:42,842][08963] Avg episode reward: [(0, '9.144')] [2025-01-05 11:50:44,271][09024] Signal inference workers to stop experience collection... (200 times) [2025-01-05 11:50:44,272][09024] Signal inference workers to resume experience collection... (200 times) [2025-01-05 11:50:44,288][09057] InferenceWorker_p0-w0: stopping experience collection (200 times) [2025-01-05 11:50:44,289][09057] InferenceWorker_p0-w0: resuming experience collection (200 times) [2025-01-05 11:50:44,398][09057] Updated weights for policy 0, policy_version 246638 (0.0018) [2025-01-05 11:50:46,528][09057] Updated weights for policy 0, policy_version 246648 (0.0018) [2025-01-05 11:50:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.8, 300 sec: 19091.5). Total num frames: 1010294784. Throughput: 0: 4840.9. Samples: 2562048. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:47,842][08963] Avg episode reward: [(0, '10.427')] [2025-01-05 11:50:48,655][09057] Updated weights for policy 0, policy_version 246658 (0.0015) [2025-01-05 11:50:50,623][09057] Updated weights for policy 0, policy_version 246668 (0.0016) [2025-01-05 11:50:52,710][09057] Updated weights for policy 0, policy_version 246678 (0.0016) [2025-01-05 11:50:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19119.3). Total num frames: 1010393088. 
Throughput: 0: 4859.1. Samples: 2591706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:52,842][08963] Avg episode reward: [(0, '10.621')] [2025-01-05 11:50:55,005][09057] Updated weights for policy 0, policy_version 246688 (0.0018) [2025-01-05 11:50:56,987][09057] Updated weights for policy 0, policy_version 246698 (0.0018) [2025-01-05 11:50:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19456.1, 300 sec: 19133.3). Total num frames: 1010491392. Throughput: 0: 4846.4. Samples: 2605804. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:50:57,842][08963] Avg episode reward: [(0, '9.911')] [2025-01-05 11:50:59,073][09057] Updated weights for policy 0, policy_version 246708 (0.0018) [2025-01-05 11:51:01,222][09057] Updated weights for policy 0, policy_version 246718 (0.0016) [2025-01-05 11:51:02,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19147.1). Total num frames: 1010585600. Throughput: 0: 4853.4. Samples: 2635354. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:51:02,842][08963] Avg episode reward: [(0, '10.121')] [2025-01-05 11:51:03,305][09057] Updated weights for policy 0, policy_version 246728 (0.0017) [2025-01-05 11:51:05,446][09057] Updated weights for policy 0, policy_version 246738 (0.0016) [2025-01-05 11:51:07,558][09057] Updated weights for policy 0, policy_version 246748 (0.0018) [2025-01-05 11:51:07,842][08963] Fps is (10 sec: 19250.4, 60 sec: 19387.6, 300 sec: 19188.7). Total num frames: 1010683904. Throughput: 0: 4858.7. Samples: 2664402. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:51:07,843][08963] Avg episode reward: [(0, '10.061')] [2025-01-05 11:51:09,614][09057] Updated weights for policy 0, policy_version 246758 (0.0016) [2025-01-05 11:51:11,724][09057] Updated weights for policy 0, policy_version 246768 (0.0016) [2025-01-05 11:51:12,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19387.7, 300 sec: 19202.6). Total num frames: 1010782208. Throughput: 0: 4856.3. 
Samples: 2678920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:51:12,842][08963] Avg episode reward: [(0, '10.733')] [2025-01-05 11:51:13,995][09057] Updated weights for policy 0, policy_version 246778 (0.0018) [2025-01-05 11:51:15,983][09057] Updated weights for policy 0, policy_version 246788 (0.0017) [2025-01-05 11:51:17,842][08963] Fps is (10 sec: 19661.3, 60 sec: 19456.0, 300 sec: 19244.3). Total num frames: 1010880512. Throughput: 0: 4841.9. Samples: 2707718. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 11:51:17,842][08963] Avg episode reward: [(0, '9.179')] [2025-01-05 11:51:18,104][09057] Updated weights for policy 0, policy_version 246798 (0.0017) [2025-01-05 11:51:20,221][09057] Updated weights for policy 0, policy_version 246808 (0.0017) [2025-01-05 11:51:22,207][09057] Updated weights for policy 0, policy_version 246818 (0.0018) [2025-01-05 11:51:22,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19285.9). Total num frames: 1010978816. Throughput: 0: 4852.4. Samples: 2737468. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:22,842][08963] Avg episode reward: [(0, '10.254')] [2025-01-05 11:51:23,346][09024] Signal inference workers to stop experience collection... (250 times) [2025-01-05 11:51:23,348][09024] Signal inference workers to resume experience collection... (250 times) [2025-01-05 11:51:23,368][09057] InferenceWorker_p0-w0: stopping experience collection (250 times) [2025-01-05 11:51:23,368][09057] InferenceWorker_p0-w0: resuming experience collection (250 times) [2025-01-05 11:51:24,351][09057] Updated weights for policy 0, policy_version 246828 (0.0017) [2025-01-05 11:51:26,500][09057] Updated weights for policy 0, policy_version 246838 (0.0015) [2025-01-05 11:51:27,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19387.8, 300 sec: 19313.7). Total num frames: 1011073024. Throughput: 0: 4866.8. Samples: 2752112. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:27,842][08963] Avg episode reward: [(0, '10.107')] [2025-01-05 11:51:28,500][09057] Updated weights for policy 0, policy_version 246848 (0.0016) [2025-01-05 11:51:30,609][09057] Updated weights for policy 0, policy_version 246858 (0.0015) [2025-01-05 11:51:32,753][09057] Updated weights for policy 0, policy_version 246868 (0.0018) [2025-01-05 11:51:32,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1011171328. Throughput: 0: 4876.3. Samples: 2781482. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:32,842][08963] Avg episode reward: [(0, '10.043')] [2025-01-05 11:51:34,849][09057] Updated weights for policy 0, policy_version 246878 (0.0018) [2025-01-05 11:51:37,037][09057] Updated weights for policy 0, policy_version 246888 (0.0017) [2025-01-05 11:51:37,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19387.7, 300 sec: 19341.5). Total num frames: 1011265536. Throughput: 0: 4850.3. Samples: 2809968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:37,842][08963] Avg episode reward: [(0, '9.922')] [2025-01-05 11:51:39,477][09057] Updated weights for policy 0, policy_version 246898 (0.0017) [2025-01-05 11:51:41,834][09057] Updated weights for policy 0, policy_version 246908 (0.0016) [2025-01-05 11:51:42,842][08963] Fps is (10 sec: 17612.8, 60 sec: 19182.9, 300 sec: 19313.7). Total num frames: 1011347456. Throughput: 0: 4811.5. Samples: 2822320. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:42,842][08963] Avg episode reward: [(0, '9.395')] [2025-01-05 11:51:44,319][09057] Updated weights for policy 0, policy_version 246918 (0.0017) [2025-01-05 11:51:46,576][09057] Updated weights for policy 0, policy_version 246928 (0.0017) [2025-01-05 11:51:47,842][08963] Fps is (10 sec: 17203.0, 60 sec: 19046.4, 300 sec: 19285.9). Total num frames: 1011437568. Throughput: 0: 4737.4. Samples: 2848540. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:47,843][08963] Avg episode reward: [(0, '10.342')] [2025-01-05 11:51:47,922][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000246934_1011441664.pth... [2025-01-05 11:51:47,980][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000245801_1006800896.pth [2025-01-05 11:51:48,731][09057] Updated weights for policy 0, policy_version 246938 (0.0018) [2025-01-05 11:51:50,911][09057] Updated weights for policy 0, policy_version 246948 (0.0017) [2025-01-05 11:51:52,842][08963] Fps is (10 sec: 18841.8, 60 sec: 19046.4, 300 sec: 19299.8). Total num frames: 1011535872. Throughput: 0: 4730.7. Samples: 2877282. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:52,842][08963] Avg episode reward: [(0, '9.947')] [2025-01-05 11:51:52,993][09057] Updated weights for policy 0, policy_version 246958 (0.0016) [2025-01-05 11:51:54,973][09057] Updated weights for policy 0, policy_version 246968 (0.0016) [2025-01-05 11:51:57,088][09057] Updated weights for policy 0, policy_version 246978 (0.0019) [2025-01-05 11:51:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19046.3, 300 sec: 19313.7). Total num frames: 1011634176. Throughput: 0: 4739.1. Samples: 2892180. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:51:57,843][08963] Avg episode reward: [(0, '9.179')] [2025-01-05 11:51:59,189][09057] Updated weights for policy 0, policy_version 246988 (0.0017) [2025-01-05 11:52:01,162][09057] Updated weights for policy 0, policy_version 246998 (0.0018) [2025-01-05 11:52:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19114.6, 300 sec: 19327.6). Total num frames: 1011732480. Throughput: 0: 4761.1. Samples: 2921966. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:02,842][08963] Avg episode reward: [(0, '10.809')] [2025-01-05 11:52:03,264][09057] Updated weights for policy 0, policy_version 247008 (0.0017) [2025-01-05 11:52:05,362][09057] Updated weights for policy 0, policy_version 247018 (0.0018) [2025-01-05 11:52:07,317][09057] Updated weights for policy 0, policy_version 247028 (0.0015) [2025-01-05 11:52:07,842][08963] Fps is (10 sec: 20070.9, 60 sec: 19183.1, 300 sec: 19355.4). Total num frames: 1011834880. Throughput: 0: 4766.7. Samples: 2951968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:07,842][08963] Avg episode reward: [(0, '9.167')] [2025-01-05 11:52:09,440][09057] Updated weights for policy 0, policy_version 247038 (0.0018) [2025-01-05 11:52:11,527][09057] Updated weights for policy 0, policy_version 247048 (0.0018) [2025-01-05 11:52:12,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19182.9, 300 sec: 19355.3). Total num frames: 1011933184. Throughput: 0: 4770.3. Samples: 2966778. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:12,843][08963] Avg episode reward: [(0, '8.940')] [2025-01-05 11:52:13,590][09057] Updated weights for policy 0, policy_version 247058 (0.0018) [2025-01-05 11:52:14,759][09024] Signal inference workers to stop experience collection... (300 times) [2025-01-05 11:52:14,762][09024] Signal inference workers to resume experience collection... (300 times) [2025-01-05 11:52:14,771][09057] InferenceWorker_p0-w0: stopping experience collection (300 times) [2025-01-05 11:52:14,779][09057] InferenceWorker_p0-w0: resuming experience collection (300 times) [2025-01-05 11:52:15,783][09057] Updated weights for policy 0, policy_version 247068 (0.0016) [2025-01-05 11:52:17,842][08963] Fps is (10 sec: 19250.8, 60 sec: 19114.7, 300 sec: 19341.4). Total num frames: 1012027392. Throughput: 0: 4764.2. Samples: 2995870. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:17,842][08963] Avg episode reward: [(0, '10.608')] [2025-01-05 11:52:17,859][09057] Updated weights for policy 0, policy_version 247078 (0.0017) [2025-01-05 11:52:19,822][09057] Updated weights for policy 0, policy_version 247088 (0.0019) [2025-01-05 11:52:21,932][09057] Updated weights for policy 0, policy_version 247098 (0.0017) [2025-01-05 11:52:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19114.6, 300 sec: 19355.3). Total num frames: 1012125696. Throughput: 0: 4792.0. Samples: 3025610. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:22,843][08963] Avg episode reward: [(0, '9.733')] [2025-01-05 11:52:24,123][09057] Updated weights for policy 0, policy_version 247108 (0.0019) [2025-01-05 11:52:26,121][09057] Updated weights for policy 0, policy_version 247118 (0.0016) [2025-01-05 11:52:27,842][08963] Fps is (10 sec: 20070.9, 60 sec: 19251.2, 300 sec: 19383.1). Total num frames: 1012228096. Throughput: 0: 4840.7. Samples: 3040152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:27,842][08963] Avg episode reward: [(0, '11.396')] [2025-01-05 11:52:28,203][09057] Updated weights for policy 0, policy_version 247128 (0.0016) [2025-01-05 11:52:30,282][09057] Updated weights for policy 0, policy_version 247138 (0.0019) [2025-01-05 11:52:32,248][09057] Updated weights for policy 0, policy_version 247148 (0.0016) [2025-01-05 11:52:32,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19251.2, 300 sec: 19383.1). Total num frames: 1012326400. Throughput: 0: 4924.7. Samples: 3070150. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:32,842][08963] Avg episode reward: [(0, '10.381')] [2025-01-05 11:52:34,335][09057] Updated weights for policy 0, policy_version 247158 (0.0016) [2025-01-05 11:52:36,402][09057] Updated weights for policy 0, policy_version 247168 (0.0019) [2025-01-05 11:52:37,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19387.8, 300 sec: 19410.9). 
Total num frames: 1012428800. Throughput: 0: 4948.6. Samples: 3099968. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:37,842][08963] Avg episode reward: [(0, '9.010')] [2025-01-05 11:52:38,479][09057] Updated weights for policy 0, policy_version 247178 (0.0017) [2025-01-05 11:52:40,643][09057] Updated weights for policy 0, policy_version 247188 (0.0016) [2025-01-05 11:52:42,723][09057] Updated weights for policy 0, policy_version 247198 (0.0016) [2025-01-05 11:52:42,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1012527104. Throughput: 0: 4938.3. Samples: 3114404. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:42,842][08963] Avg episode reward: [(0, '9.732')] [2025-01-05 11:52:44,772][09057] Updated weights for policy 0, policy_version 247208 (0.0017) [2025-01-05 11:52:46,908][09057] Updated weights for policy 0, policy_version 247218 (0.0016) [2025-01-05 11:52:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19729.1, 300 sec: 19397.0). Total num frames: 1012621312. Throughput: 0: 4929.7. Samples: 3143802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:47,842][08963] Avg episode reward: [(0, '9.829')] [2025-01-05 11:52:49,093][09057] Updated weights for policy 0, policy_version 247228 (0.0019) [2025-01-05 11:52:51,089][09057] Updated weights for policy 0, policy_version 247238 (0.0016) [2025-01-05 11:52:52,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19729.1, 300 sec: 19397.0). Total num frames: 1012719616. Throughput: 0: 4914.3. Samples: 3173110. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:52,842][08963] Avg episode reward: [(0, '10.601')] [2025-01-05 11:52:53,200][09057] Updated weights for policy 0, policy_version 247248 (0.0015) [2025-01-05 11:52:55,265][09057] Updated weights for policy 0, policy_version 247258 (0.0016) [2025-01-05 11:52:57,216][09057] Updated weights for policy 0, policy_version 247268 (0.0016) [2025-01-05 11:52:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19797.4, 300 sec: 19410.9). Total num frames: 1012822016. Throughput: 0: 4920.0. Samples: 3188178. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:52:57,842][08963] Avg episode reward: [(0, '10.131')] [2025-01-05 11:52:59,310][09057] Updated weights for policy 0, policy_version 247278 (0.0017) [2025-01-05 11:53:01,378][09057] Updated weights for policy 0, policy_version 247288 (0.0016) [2025-01-05 11:53:02,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19797.3, 300 sec: 19410.9). Total num frames: 1012920320. Throughput: 0: 4943.2. Samples: 3218314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:02,843][08963] Avg episode reward: [(0, '9.489')] [2025-01-05 11:53:03,399][09057] Updated weights for policy 0, policy_version 247298 (0.0017) [2025-01-05 11:53:05,502][09057] Updated weights for policy 0, policy_version 247308 (0.0016) [2025-01-05 11:53:07,568][09057] Updated weights for policy 0, policy_version 247318 (0.0017) [2025-01-05 11:53:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19410.9). Total num frames: 1013018624. Throughput: 0: 4940.1. Samples: 3247916. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:07,842][08963] Avg episode reward: [(0, '9.613')] [2025-01-05 11:53:08,210][09024] Signal inference workers to stop experience collection... (350 times) [2025-01-05 11:53:08,213][09024] Signal inference workers to resume experience collection... 
(350 times) [2025-01-05 11:53:08,225][09057] InferenceWorker_p0-w0: stopping experience collection (350 times) [2025-01-05 11:53:08,226][09057] InferenceWorker_p0-w0: resuming experience collection (350 times) [2025-01-05 11:53:09,622][09057] Updated weights for policy 0, policy_version 247328 (0.0018) [2025-01-05 11:53:11,777][09057] Updated weights for policy 0, policy_version 247338 (0.0017) [2025-01-05 11:53:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19410.9). Total num frames: 1013116928. Throughput: 0: 4938.5. Samples: 3262384. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:12,843][08963] Avg episode reward: [(0, '10.110')] [2025-01-05 11:53:13,944][09057] Updated weights for policy 0, policy_version 247348 (0.0016) [2025-01-05 11:53:15,935][09057] Updated weights for policy 0, policy_version 247358 (0.0016) [2025-01-05 11:53:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19424.8). Total num frames: 1013215232. Throughput: 0: 4925.4. Samples: 3291792. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:17,842][08963] Avg episode reward: [(0, '10.664')] [2025-01-05 11:53:18,017][09057] Updated weights for policy 0, policy_version 247368 (0.0016) [2025-01-05 11:53:20,107][09057] Updated weights for policy 0, policy_version 247378 (0.0016) [2025-01-05 11:53:22,066][09057] Updated weights for policy 0, policy_version 247388 (0.0016) [2025-01-05 11:53:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19438.6). Total num frames: 1013313536. Throughput: 0: 4926.5. Samples: 3321660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:22,842][08963] Avg episode reward: [(0, '10.520')] [2025-01-05 11:53:24,176][09057] Updated weights for policy 0, policy_version 247398 (0.0016) [2025-01-05 11:53:26,234][09057] Updated weights for policy 0, policy_version 247408 (0.0015) [2025-01-05 11:53:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19438.6). 
Total num frames: 1013411840. Throughput: 0: 4942.8. Samples: 3336828. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:27,843][08963] Avg episode reward: [(0, '10.975')] [2025-01-05 11:53:28,280][09057] Updated weights for policy 0, policy_version 247418 (0.0017) [2025-01-05 11:53:30,389][09057] Updated weights for policy 0, policy_version 247428 (0.0017) [2025-01-05 11:53:32,460][09057] Updated weights for policy 0, policy_version 247438 (0.0015) [2025-01-05 11:53:32,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19438.6). Total num frames: 1013510144. Throughput: 0: 4948.3. Samples: 3366476. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:32,842][08963] Avg episode reward: [(0, '9.581')] [2025-01-05 11:53:34,563][09057] Updated weights for policy 0, policy_version 247448 (0.0018) [2025-01-05 11:53:36,651][09057] Updated weights for policy 0, policy_version 247458 (0.0017) [2025-01-05 11:53:37,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19660.7, 300 sec: 19424.7). Total num frames: 1013608448. Throughput: 0: 4941.2. Samples: 3395466. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 11:53:37,843][08963] Avg episode reward: [(0, '10.671')] [2025-01-05 11:53:38,854][09057] Updated weights for policy 0, policy_version 247468 (0.0017) [2025-01-05 11:53:40,890][09057] Updated weights for policy 0, policy_version 247478 (0.0017) [2025-01-05 11:53:42,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19438.7). Total num frames: 1013706752. Throughput: 0: 4931.6. Samples: 3410098. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:53:42,842][08963] Avg episode reward: [(0, '8.998')] [2025-01-05 11:53:43,067][09057] Updated weights for policy 0, policy_version 247488 (0.0017) [2025-01-05 11:53:45,196][09057] Updated weights for policy 0, policy_version 247498 (0.0017) [2025-01-05 11:53:47,191][09057] Updated weights for policy 0, policy_version 247508 (0.0018) [2025-01-05 11:53:47,842][08963] Fps is (10 sec: 19661.4, 60 sec: 19729.1, 300 sec: 19452.5). Total num frames: 1013805056. Throughput: 0: 4908.8. Samples: 3439210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:53:47,842][08963] Avg episode reward: [(0, '9.205')] [2025-01-05 11:53:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000247511_1013805056.pth... [2025-01-05 11:53:47,897][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000246371_1009135616.pth [2025-01-05 11:53:49,329][09057] Updated weights for policy 0, policy_version 247518 (0.0016) [2025-01-05 11:53:51,383][09057] Updated weights for policy 0, policy_version 247528 (0.0017) [2025-01-05 11:53:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.0, 300 sec: 19466.4). Total num frames: 1013903360. Throughput: 0: 4909.3. Samples: 3468834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:53:52,842][08963] Avg episode reward: [(0, '11.641')] [2025-01-05 11:53:53,421][09057] Updated weights for policy 0, policy_version 247538 (0.0017) [2025-01-05 11:53:55,492][09057] Updated weights for policy 0, policy_version 247548 (0.0015) [2025-01-05 11:53:57,571][09057] Updated weights for policy 0, policy_version 247558 (0.0016) [2025-01-05 11:53:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1014001664. Throughput: 0: 4920.1. Samples: 3483786. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:53:57,842][08963] Avg episode reward: [(0, '9.892')] [2025-01-05 11:53:59,633][09057] Updated weights for policy 0, policy_version 247568 (0.0017) [2025-01-05 11:54:01,769][09057] Updated weights for policy 0, policy_version 247578 (0.0016) [2025-01-05 11:54:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1014099968. Throughput: 0: 4918.6. Samples: 3513130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:02,842][08963] Avg episode reward: [(0, '10.562')] [2025-01-05 11:54:03,960][09057] Updated weights for policy 0, policy_version 247588 (0.0021) [2025-01-05 11:54:06,012][09057] Updated weights for policy 0, policy_version 247598 (0.0017) [2025-01-05 11:54:07,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1014194176. Throughput: 0: 4897.1. Samples: 3542032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:07,842][08963] Avg episode reward: [(0, '11.504')] [2025-01-05 11:54:08,165][09057] Updated weights for policy 0, policy_version 247608 (0.0018) [2025-01-05 11:54:10,301][09057] Updated weights for policy 0, policy_version 247618 (0.0017) [2025-01-05 11:54:12,301][09057] Updated weights for policy 0, policy_version 247628 (0.0017) [2025-01-05 11:54:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1014292480. Throughput: 0: 4884.0. Samples: 3556606. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:12,842][08963] Avg episode reward: [(0, '9.734')] [2025-01-05 11:54:14,367][09057] Updated weights for policy 0, policy_version 247638 (0.0016) [2025-01-05 11:54:16,443][09057] Updated weights for policy 0, policy_version 247648 (0.0017) [2025-01-05 11:54:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19480.3). Total num frames: 1014390784. Throughput: 0: 4889.7. Samples: 3586514. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:17,842][08963] Avg episode reward: [(0, '9.995')] [2025-01-05 11:54:18,548][09057] Updated weights for policy 0, policy_version 247658 (0.0017) [2025-01-05 11:54:20,664][09057] Updated weights for policy 0, policy_version 247668 (0.0017) [2025-01-05 11:54:22,727][09057] Updated weights for policy 0, policy_version 247678 (0.0017) [2025-01-05 11:54:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19480.3). Total num frames: 1014489088. Throughput: 0: 4897.4. Samples: 3615850. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:22,842][08963] Avg episode reward: [(0, '10.212')] [2025-01-05 11:54:24,867][09057] Updated weights for policy 0, policy_version 247688 (0.0020) [2025-01-05 11:54:26,956][09057] Updated weights for policy 0, policy_version 247698 (0.0017) [2025-01-05 11:54:27,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19524.3, 300 sec: 19466.4). Total num frames: 1014583296. Throughput: 0: 4888.6. Samples: 3630084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:27,842][08963] Avg episode reward: [(0, '9.630')] [2025-01-05 11:54:29,164][09057] Updated weights for policy 0, policy_version 247708 (0.0018) [2025-01-05 11:54:31,242][09057] Updated weights for policy 0, policy_version 247718 (0.0017) [2025-01-05 11:54:32,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19466.4). Total num frames: 1014681600. Throughput: 0: 4880.3. Samples: 3658824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:32,842][08963] Avg episode reward: [(0, '9.385')] [2025-01-05 11:54:33,454][09057] Updated weights for policy 0, policy_version 247728 (0.0018) [2025-01-05 11:54:35,538][09057] Updated weights for policy 0, policy_version 247738 (0.0016) [2025-01-05 11:54:37,568][09057] Updated weights for policy 0, policy_version 247748 (0.0016) [2025-01-05 11:54:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19480.3). 
Total num frames: 1014779904. Throughput: 0: 4873.9. Samples: 3688160. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:37,842][08963] Avg episode reward: [(0, '10.677')] [2025-01-05 11:54:39,744][09057] Updated weights for policy 0, policy_version 247758 (0.0018) [2025-01-05 11:54:41,852][09057] Updated weights for policy 0, policy_version 247768 (0.0017) [2025-01-05 11:54:42,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1014874112. Throughput: 0: 4857.4. Samples: 3702372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:42,842][08963] Avg episode reward: [(0, '10.412')] [2025-01-05 11:54:43,964][09057] Updated weights for policy 0, policy_version 247778 (0.0017) [2025-01-05 11:54:46,065][09057] Updated weights for policy 0, policy_version 247788 (0.0017) [2025-01-05 11:54:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1014972416. Throughput: 0: 4853.6. Samples: 3731542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 11:54:47,842][08963] Avg episode reward: [(0, '10.062')] [2025-01-05 11:54:48,229][09057] Updated weights for policy 0, policy_version 247798 (0.0018) [2025-01-05 11:54:50,319][09057] Updated weights for policy 0, policy_version 247808 (0.0017) [2025-01-05 11:54:52,374][09057] Updated weights for policy 0, policy_version 247818 (0.0016) [2025-01-05 11:54:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19480.3). Total num frames: 1015070720. Throughput: 0: 4863.1. Samples: 3760870. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:54:52,842][08963] Avg episode reward: [(0, '10.562')] [2025-01-05 11:54:54,536][09057] Updated weights for policy 0, policy_version 247828 (0.0017) [2025-01-05 11:54:56,610][09057] Updated weights for policy 0, policy_version 247838 (0.0016) [2025-01-05 11:54:57,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015164928. 
Throughput: 0: 4855.9. Samples: 3775122. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:54:57,842][08963] Avg episode reward: [(0, '9.871')] [2025-01-05 11:54:58,741][09057] Updated weights for policy 0, policy_version 247848 (0.0016) [2025-01-05 11:55:00,799][09057] Updated weights for policy 0, policy_version 247858 (0.0016) [2025-01-05 11:55:02,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015263232. Throughput: 0: 4845.1. Samples: 3804544. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:02,842][08963] Avg episode reward: [(0, '10.394')] [2025-01-05 11:55:02,903][09057] Updated weights for policy 0, policy_version 247868 (0.0017) [2025-01-05 11:55:05,076][09057] Updated weights for policy 0, policy_version 247878 (0.0017) [2025-01-05 11:55:07,175][09057] Updated weights for policy 0, policy_version 247888 (0.0017) [2025-01-05 11:55:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1015361536. Throughput: 0: 4837.5. Samples: 3833538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:07,842][08963] Avg episode reward: [(0, '9.564')] [2025-01-05 11:55:09,287][09057] Updated weights for policy 0, policy_version 247898 (0.0018) [2025-01-05 11:55:11,405][09057] Updated weights for policy 0, policy_version 247908 (0.0017) [2025-01-05 11:55:12,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015455744. Throughput: 0: 4840.3. Samples: 3847896. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:12,842][08963] Avg episode reward: [(0, '10.319')] [2025-01-05 11:55:13,562][09057] Updated weights for policy 0, policy_version 247918 (0.0017) [2025-01-05 11:55:15,627][09057] Updated weights for policy 0, policy_version 247928 (0.0017) [2025-01-05 11:55:17,677][09057] Updated weights for policy 0, policy_version 247938 (0.0016) [2025-01-05 11:55:17,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015554048. Throughput: 0: 4852.2. Samples: 3877172. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:17,842][08963] Avg episode reward: [(0, '9.622')] [2025-01-05 11:55:19,861][09057] Updated weights for policy 0, policy_version 247948 (0.0016) [2025-01-05 11:55:21,917][09057] Updated weights for policy 0, policy_version 247958 (0.0017) [2025-01-05 11:55:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015652352. Throughput: 0: 4847.5. Samples: 3906298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:22,842][08963] Avg episode reward: [(0, '9.826')] [2025-01-05 11:55:24,086][09057] Updated weights for policy 0, policy_version 247968 (0.0018) [2025-01-05 11:55:26,202][09057] Updated weights for policy 0, policy_version 247978 (0.0017) [2025-01-05 11:55:27,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1015746560. Throughput: 0: 4851.0. Samples: 3920666. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:27,842][08963] Avg episode reward: [(0, '9.662')] [2025-01-05 11:55:28,310][09057] Updated weights for policy 0, policy_version 247988 (0.0016) [2025-01-05 11:55:30,427][09057] Updated weights for policy 0, policy_version 247998 (0.0016) [2025-01-05 11:55:32,488][09057] Updated weights for policy 0, policy_version 248008 (0.0018) [2025-01-05 11:55:32,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19466.4). 
Total num frames: 1015844864. Throughput: 0: 4853.8. Samples: 3949962. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:32,842][08963] Avg episode reward: [(0, '10.048')] [2025-01-05 11:55:34,604][09057] Updated weights for policy 0, policy_version 248018 (0.0017) [2025-01-05 11:55:36,717][09057] Updated weights for policy 0, policy_version 248028 (0.0016) [2025-01-05 11:55:37,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19480.3). Total num frames: 1015943168. Throughput: 0: 4846.2. Samples: 3978950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:37,842][08963] Avg episode reward: [(0, '10.527')] [2025-01-05 11:55:38,933][09057] Updated weights for policy 0, policy_version 248038 (0.0018) [2025-01-05 11:55:41,012][09057] Updated weights for policy 0, policy_version 248048 (0.0015) [2025-01-05 11:55:42,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19466.4). Total num frames: 1016037376. Throughput: 0: 4849.4. Samples: 3993344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:42,842][08963] Avg episode reward: [(0, '9.042')] [2025-01-05 11:55:43,188][09057] Updated weights for policy 0, policy_version 248058 (0.0017) [2025-01-05 11:55:45,270][09057] Updated weights for policy 0, policy_version 248068 (0.0017) [2025-01-05 11:55:47,318][09057] Updated weights for policy 0, policy_version 248078 (0.0017) [2025-01-05 11:55:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1016135680. Throughput: 0: 4842.9. Samples: 4022476. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:47,843][08963] Avg episode reward: [(0, '9.610')] [2025-01-05 11:55:47,953][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000248081_1016139776.pth... 
[2025-01-05 11:55:48,005][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000246934_1011441664.pth [2025-01-05 11:55:49,459][09057] Updated weights for policy 0, policy_version 248088 (0.0017) [2025-01-05 11:55:51,553][09057] Updated weights for policy 0, policy_version 248098 (0.0016) [2025-01-05 11:55:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1016233984. Throughput: 0: 4843.9. Samples: 4051512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:52,842][08963] Avg episode reward: [(0, '10.844')] [2025-01-05 11:55:53,717][09057] Updated weights for policy 0, policy_version 248108 (0.0016) [2025-01-05 11:55:55,793][09057] Updated weights for policy 0, policy_version 248118 (0.0017) [2025-01-05 11:55:57,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1016328192. Throughput: 0: 4847.0. Samples: 4066012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:55:57,842][08963] Avg episode reward: [(0, '11.741')] [2025-01-05 11:55:57,976][09057] Updated weights for policy 0, policy_version 248128 (0.0017) [2025-01-05 11:56:00,160][09057] Updated weights for policy 0, policy_version 248138 (0.0018) [2025-01-05 11:56:02,222][09057] Updated weights for policy 0, policy_version 248148 (0.0016) [2025-01-05 11:56:02,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19319.5, 300 sec: 19452.6). Total num frames: 1016422400. Throughput: 0: 4831.3. Samples: 4094582. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:02,842][08963] Avg episode reward: [(0, '9.427')] [2025-01-05 11:56:04,413][09057] Updated weights for policy 0, policy_version 248158 (0.0018) [2025-01-05 11:56:06,581][09057] Updated weights for policy 0, policy_version 248168 (0.0017) [2025-01-05 11:56:07,842][08963] Fps is (10 sec: 18841.7, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1016516608. 
Throughput: 0: 4814.8. Samples: 4122966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:07,842][08963] Avg episode reward: [(0, '10.178')] [2025-01-05 11:56:08,748][09057] Updated weights for policy 0, policy_version 248178 (0.0017) [2025-01-05 11:56:10,804][09057] Updated weights for policy 0, policy_version 248188 (0.0016) [2025-01-05 11:56:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19438.6). Total num frames: 1016614912. Throughput: 0: 4820.9. Samples: 4137606. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:12,842][08963] Avg episode reward: [(0, '9.852')] [2025-01-05 11:56:13,042][09057] Updated weights for policy 0, policy_version 248198 (0.0018) [2025-01-05 11:56:15,178][09057] Updated weights for policy 0, policy_version 248208 (0.0017) [2025-01-05 11:56:17,286][09057] Updated weights for policy 0, policy_version 248218 (0.0016) [2025-01-05 11:56:17,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19424.7). Total num frames: 1016709120. Throughput: 0: 4797.5. Samples: 4165852. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:17,842][08963] Avg episode reward: [(0, '10.208')] [2025-01-05 11:56:19,477][09057] Updated weights for policy 0, policy_version 248228 (0.0017) [2025-01-05 11:56:21,506][09057] Updated weights for policy 0, policy_version 248238 (0.0016) [2025-01-05 11:56:22,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1016807424. Throughput: 0: 4799.8. Samples: 4194942. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:22,842][08963] Avg episode reward: [(0, '10.760')] [2025-01-05 11:56:23,672][09057] Updated weights for policy 0, policy_version 248248 (0.0018) [2025-01-05 11:56:25,785][09057] Updated weights for policy 0, policy_version 248258 (0.0017) [2025-01-05 11:56:27,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19424.8). Total num frames: 1016901632. Throughput: 0: 4804.9. 
Samples: 4209566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:27,843][08963] Avg episode reward: [(0, '9.822')] [2025-01-05 11:56:27,919][09057] Updated weights for policy 0, policy_version 248268 (0.0017) [2025-01-05 11:56:30,079][09057] Updated weights for policy 0, policy_version 248278 (0.0017) [2025-01-05 11:56:32,220][09057] Updated weights for policy 0, policy_version 248288 (0.0017) [2025-01-05 11:56:32,842][08963] Fps is (10 sec: 18841.7, 60 sec: 19183.0, 300 sec: 19424.8). Total num frames: 1016995840. Throughput: 0: 4791.4. Samples: 4238088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:32,842][08963] Avg episode reward: [(0, '9.180')] [2025-01-05 11:56:34,336][09057] Updated weights for policy 0, policy_version 248298 (0.0018) [2025-01-05 11:56:36,445][09057] Updated weights for policy 0, policy_version 248308 (0.0016) [2025-01-05 11:56:37,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19480.3). Total num frames: 1017094144. Throughput: 0: 4790.0. Samples: 4267062. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:37,843][08963] Avg episode reward: [(0, '8.870')] [2025-01-05 11:56:38,600][09057] Updated weights for policy 0, policy_version 248318 (0.0017) [2025-01-05 11:56:40,654][09057] Updated weights for policy 0, policy_version 248328 (0.0017) [2025-01-05 11:56:42,711][09057] Updated weights for policy 0, policy_version 248338 (0.0015) [2025-01-05 11:56:42,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1017192448. Throughput: 0: 4791.4. Samples: 4281626. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:42,843][08963] Avg episode reward: [(0, '9.174')] [2025-01-05 11:56:44,899][09057] Updated weights for policy 0, policy_version 248348 (0.0018) [2025-01-05 11:56:46,950][09057] Updated weights for policy 0, policy_version 248358 (0.0016) [2025-01-05 11:56:47,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1017290752. Throughput: 0: 4807.9. Samples: 4310936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:47,842][08963] Avg episode reward: [(0, '9.992')] [2025-01-05 11:56:49,109][09057] Updated weights for policy 0, policy_version 248368 (0.0017) [2025-01-05 11:56:51,252][09057] Updated weights for policy 0, policy_version 248378 (0.0016) [2025-01-05 11:56:52,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19182.9, 300 sec: 19494.2). Total num frames: 1017384960. Throughput: 0: 4813.9. Samples: 4339592. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:52,843][08963] Avg episode reward: [(0, '9.714')] [2025-01-05 11:56:53,401][09057] Updated weights for policy 0, policy_version 248388 (0.0018) [2025-01-05 11:56:55,482][09057] Updated weights for policy 0, policy_version 248398 (0.0015) [2025-01-05 11:56:57,566][09057] Updated weights for policy 0, policy_version 248408 (0.0016) [2025-01-05 11:56:57,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1017483264. Throughput: 0: 4813.2. Samples: 4354200. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:56:57,842][08963] Avg episode reward: [(0, '9.696')] [2025-01-05 11:56:59,723][09057] Updated weights for policy 0, policy_version 248418 (0.0017) [2025-01-05 11:57:01,786][09057] Updated weights for policy 0, policy_version 248428 (0.0016) [2025-01-05 11:57:02,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19466.4). Total num frames: 1017577472. Throughput: 0: 4834.0. Samples: 4383382. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:02,842][08963] Avg episode reward: [(0, '9.541')] [2025-01-05 11:57:03,995][09057] Updated weights for policy 0, policy_version 248438 (0.0017) [2025-01-05 11:57:06,089][09057] Updated weights for policy 0, policy_version 248448 (0.0016) [2025-01-05 11:57:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1017675776. Throughput: 0: 4823.6. Samples: 4412004. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:07,843][08963] Avg episode reward: [(0, '9.579')] [2025-01-05 11:57:08,227][09057] Updated weights for policy 0, policy_version 248458 (0.0018) [2025-01-05 11:57:10,347][09057] Updated weights for policy 0, policy_version 248468 (0.0017) [2025-01-05 11:57:12,413][09057] Updated weights for policy 0, policy_version 248478 (0.0014) [2025-01-05 11:57:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19480.3). Total num frames: 1017774080. Throughput: 0: 4822.4. Samples: 4426572. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:12,842][08963] Avg episode reward: [(0, '9.191')] [2025-01-05 11:57:14,571][09057] Updated weights for policy 0, policy_version 248488 (0.0017) [2025-01-05 11:57:16,644][09057] Updated weights for policy 0, policy_version 248498 (0.0015) [2025-01-05 11:57:17,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1017868288. Throughput: 0: 4838.2. Samples: 4455808. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:17,842][08963] Avg episode reward: [(0, '9.642')] [2025-01-05 11:57:18,807][09057] Updated weights for policy 0, policy_version 248508 (0.0018) [2025-01-05 11:57:20,887][09057] Updated weights for policy 0, policy_version 248518 (0.0016) [2025-01-05 11:57:22,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19452.5). Total num frames: 1017966592. Throughput: 0: 4834.2. Samples: 4484600. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:22,842][08963] Avg episode reward: [(0, '9.580')] [2025-01-05 11:57:23,066][09057] Updated weights for policy 0, policy_version 248528 (0.0017) [2025-01-05 11:57:25,113][09057] Updated weights for policy 0, policy_version 248538 (0.0016) [2025-01-05 11:57:27,175][09057] Updated weights for policy 0, policy_version 248548 (0.0017) [2025-01-05 11:57:27,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19387.7, 300 sec: 19452.5). Total num frames: 1018064896. Throughput: 0: 4839.1. Samples: 4499384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:27,843][08963] Avg episode reward: [(0, '9.919')] [2025-01-05 11:57:29,360][09057] Updated weights for policy 0, policy_version 248558 (0.0018) [2025-01-05 11:57:31,449][09057] Updated weights for policy 0, policy_version 248568 (0.0016) [2025-01-05 11:57:32,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.8, 300 sec: 19424.8). Total num frames: 1018159104. Throughput: 0: 4832.3. Samples: 4528390. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:32,842][08963] Avg episode reward: [(0, '9.437')] [2025-01-05 11:57:33,623][09057] Updated weights for policy 0, policy_version 248578 (0.0017) [2025-01-05 11:57:35,724][09057] Updated weights for policy 0, policy_version 248588 (0.0016) [2025-01-05 11:57:37,842][08963] Fps is (10 sec: 18841.9, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1018253312. Throughput: 0: 4830.4. Samples: 4556958. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:37,842][08963] Avg episode reward: [(0, '9.801')] [2025-01-05 11:57:37,930][09057] Updated weights for policy 0, policy_version 248598 (0.0017) [2025-01-05 11:57:40,082][09057] Updated weights for policy 0, policy_version 248608 (0.0017) [2025-01-05 11:57:42,153][09057] Updated weights for policy 0, policy_version 248618 (0.0017) [2025-01-05 11:57:42,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19319.5, 300 sec: 19424.8). 
Total num frames: 1018351616. Throughput: 0: 4822.9. Samples: 4571228. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:42,842][08963] Avg episode reward: [(0, '9.181')] [2025-01-05 11:57:44,313][09057] Updated weights for policy 0, policy_version 248628 (0.0017) [2025-01-05 11:57:46,352][09057] Updated weights for policy 0, policy_version 248638 (0.0016) [2025-01-05 11:57:47,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1018445824. Throughput: 0: 4828.2. Samples: 4600654. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:47,842][08963] Avg episode reward: [(0, '10.043')] [2025-01-05 11:57:47,851][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000248645_1018449920.pth... [2025-01-05 11:57:47,905][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000247511_1013805056.pth [2025-01-05 11:57:48,517][09057] Updated weights for policy 0, policy_version 248648 (0.0017) [2025-01-05 11:57:50,630][09057] Updated weights for policy 0, policy_version 248658 (0.0017) [2025-01-05 11:57:52,637][09057] Updated weights for policy 0, policy_version 248668 (0.0016) [2025-01-05 11:57:52,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19397.0). Total num frames: 1018544128. Throughput: 0: 4842.8. Samples: 4629930. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:52,842][08963] Avg episode reward: [(0, '10.045')] [2025-01-05 11:57:54,686][09057] Updated weights for policy 0, policy_version 248678 (0.0016) [2025-01-05 11:57:56,745][09057] Updated weights for policy 0, policy_version 248688 (0.0015) [2025-01-05 11:57:57,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19387.8, 300 sec: 19410.9). Total num frames: 1018646528. Throughput: 0: 4853.9. Samples: 4644996. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:57:57,842][08963] Avg episode reward: [(0, '9.902')] [2025-01-05 11:57:58,849][09057] Updated weights for policy 0, policy_version 248698 (0.0017) [2025-01-05 11:58:00,938][09057] Updated weights for policy 0, policy_version 248708 (0.0016) [2025-01-05 11:58:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19387.7, 300 sec: 19397.0). Total num frames: 1018740736. Throughput: 0: 4855.8. Samples: 4674318. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:58:02,842][08963] Avg episode reward: [(0, '10.309')] [2025-01-05 11:58:03,106][09057] Updated weights for policy 0, policy_version 248718 (0.0017) [2025-01-05 11:58:05,119][09057] Updated weights for policy 0, policy_version 248728 (0.0017) [2025-01-05 11:58:07,155][09057] Updated weights for policy 0, policy_version 248738 (0.0016) [2025-01-05 11:58:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19410.9). Total num frames: 1018843136. Throughput: 0: 4875.1. Samples: 4703978. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:58:07,842][08963] Avg episode reward: [(0, '9.662')] [2025-01-05 11:58:09,351][09057] Updated weights for policy 0, policy_version 248748 (0.0018) [2025-01-05 11:58:11,335][09057] Updated weights for policy 0, policy_version 248758 (0.0015) [2025-01-05 11:58:12,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19455.9, 300 sec: 19410.9). Total num frames: 1018941440. Throughput: 0: 4869.4. Samples: 4718506. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:58:12,843][08963] Avg episode reward: [(0, '9.437')] [2025-01-05 11:58:13,399][09057] Updated weights for policy 0, policy_version 248768 (0.0017) [2025-01-05 11:58:15,480][09057] Updated weights for policy 0, policy_version 248778 (0.0016) [2025-01-05 11:58:17,447][09057] Updated weights for policy 0, policy_version 248788 (0.0015) [2025-01-05 11:58:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19410.9). 
Total num frames: 1019039744. Throughput: 0: 4895.9. Samples: 4748704. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 11:58:17,842][08963] Avg episode reward: [(0, '10.443')] [2025-01-05 11:58:19,510][09057] Updated weights for policy 0, policy_version 248798 (0.0016) [2025-01-05 11:58:21,579][09057] Updated weights for policy 0, policy_version 248808 (0.0015) [2025-01-05 11:58:22,842][08963] Fps is (10 sec: 19661.3, 60 sec: 19524.3, 300 sec: 19410.9). Total num frames: 1019138048. Throughput: 0: 4927.7. Samples: 4778706. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:22,842][08963] Avg episode reward: [(0, '9.440')] [2025-01-05 11:58:23,646][09057] Updated weights for policy 0, policy_version 248818 (0.0016) [2025-01-05 11:58:25,744][09057] Updated weights for policy 0, policy_version 248828 (0.0016) [2025-01-05 11:58:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19410.9). Total num frames: 1019236352. Throughput: 0: 4936.9. Samples: 4793388. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:27,842][08963] Avg episode reward: [(0, '9.175')] [2025-01-05 11:58:27,929][09057] Updated weights for policy 0, policy_version 248838 (0.0017) [2025-01-05 11:58:29,981][09057] Updated weights for policy 0, policy_version 248848 (0.0017) [2025-01-05 11:58:32,083][09057] Updated weights for policy 0, policy_version 248858 (0.0016) [2025-01-05 11:58:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19410.9). Total num frames: 1019334656. Throughput: 0: 4927.5. Samples: 4822392. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:32,842][08963] Avg episode reward: [(0, '9.002')] [2025-01-05 11:58:34,264][09057] Updated weights for policy 0, policy_version 248868 (0.0017) [2025-01-05 11:58:36,281][09057] Updated weights for policy 0, policy_version 248878 (0.0016) [2025-01-05 11:58:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1019432960. 
Throughput: 0: 4927.7. Samples: 4851678. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:37,842][08963] Avg episode reward: [(0, '9.863')] [2025-01-05 11:58:38,368][09057] Updated weights for policy 0, policy_version 248888 (0.0017) [2025-01-05 11:58:40,502][09057] Updated weights for policy 0, policy_version 248898 (0.0017) [2025-01-05 11:58:42,467][09057] Updated weights for policy 0, policy_version 248908 (0.0017) [2025-01-05 11:58:42,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1019531264. Throughput: 0: 4921.8. Samples: 4866478. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:42,842][08963] Avg episode reward: [(0, '9.595')] [2025-01-05 11:58:44,539][09057] Updated weights for policy 0, policy_version 248918 (0.0016) [2025-01-05 11:58:46,621][09057] Updated weights for policy 0, policy_version 248928 (0.0016) [2025-01-05 11:58:47,842][08963] Fps is (10 sec: 19660.2, 60 sec: 19729.0, 300 sec: 19410.9). Total num frames: 1019629568. Throughput: 0: 4938.1. Samples: 4896532. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:47,843][08963] Avg episode reward: [(0, '10.039')] [2025-01-05 11:58:48,664][09057] Updated weights for policy 0, policy_version 248938 (0.0018) [2025-01-05 11:58:50,823][09057] Updated weights for policy 0, policy_version 248948 (0.0017) [2025-01-05 11:58:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.0, 300 sec: 19410.9). Total num frames: 1019727872. Throughput: 0: 4921.1. Samples: 4925428. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:52,842][08963] Avg episode reward: [(0, '11.493')] [2025-01-05 11:58:53,007][09057] Updated weights for policy 0, policy_version 248958 (0.0017) [2025-01-05 11:58:54,957][09057] Updated weights for policy 0, policy_version 248968 (0.0015) [2025-01-05 11:58:57,049][09057] Updated weights for policy 0, policy_version 248978 (0.0015) [2025-01-05 11:58:57,842][08963] Fps is (10 sec: 19661.6, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1019826176. Throughput: 0: 4930.8. Samples: 4940392. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:58:57,842][08963] Avg episode reward: [(0, '10.734')] [2025-01-05 11:58:59,230][09057] Updated weights for policy 0, policy_version 248988 (0.0017) [2025-01-05 11:59:01,222][09057] Updated weights for policy 0, policy_version 248998 (0.0016) [2025-01-05 11:59:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19424.8). Total num frames: 1019924480. Throughput: 0: 4914.8. Samples: 4969870. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:02,842][08963] Avg episode reward: [(0, '10.405')] [2025-01-05 11:59:03,323][09057] Updated weights for policy 0, policy_version 249008 (0.0017) [2025-01-05 11:59:05,390][09057] Updated weights for policy 0, policy_version 249018 (0.0016) [2025-01-05 11:59:07,327][09057] Updated weights for policy 0, policy_version 249028 (0.0016) [2025-01-05 11:59:07,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.1, 300 sec: 19438.7). Total num frames: 1020026880. Throughput: 0: 4920.2. Samples: 5000114. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:07,842][08963] Avg episode reward: [(0, '11.300')] [2025-01-05 11:59:09,406][09057] Updated weights for policy 0, policy_version 249038 (0.0016) [2025-01-05 11:59:11,467][09057] Updated weights for policy 0, policy_version 249048 (0.0017) [2025-01-05 11:59:12,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19729.2, 300 sec: 19438.6). 
Total num frames: 1020125184. Throughput: 0: 4930.6. Samples: 5015264. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:12,842][08963] Avg episode reward: [(0, '9.716')] [2025-01-05 11:59:13,504][09057] Updated weights for policy 0, policy_version 249058 (0.0017) [2025-01-05 11:59:15,657][09057] Updated weights for policy 0, policy_version 249068 (0.0018) [2025-01-05 11:59:17,738][09057] Updated weights for policy 0, policy_version 249078 (0.0016) [2025-01-05 11:59:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19438.6). Total num frames: 1020223488. Throughput: 0: 4937.3. Samples: 5044572. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:17,842][08963] Avg episode reward: [(0, '9.869')] [2025-01-05 11:59:19,768][09057] Updated weights for policy 0, policy_version 249088 (0.0016) [2025-01-05 11:59:21,870][09057] Updated weights for policy 0, policy_version 249098 (0.0016) [2025-01-05 11:59:22,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19452.5). Total num frames: 1020321792. Throughput: 0: 4943.6. Samples: 5074140. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:22,842][08963] Avg episode reward: [(0, '9.626')] [2025-01-05 11:59:24,046][09057] Updated weights for policy 0, policy_version 249108 (0.0017) [2025-01-05 11:59:26,056][09057] Updated weights for policy 0, policy_version 249118 (0.0017) [2025-01-05 11:59:27,162][09024] Signal inference workers to stop experience collection... (400 times) [2025-01-05 11:59:27,163][09024] Signal inference workers to resume experience collection... (400 times) [2025-01-05 11:59:27,170][09057] InferenceWorker_p0-w0: stopping experience collection (400 times) [2025-01-05 11:59:27,171][09057] InferenceWorker_p0-w0: resuming experience collection (400 times) [2025-01-05 11:59:27,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19452.5). Total num frames: 1020420096. Throughput: 0: 4934.9. Samples: 5088550. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:27,842][08963] Avg episode reward: [(0, '9.289')] [2025-01-05 11:59:28,121][09057] Updated weights for policy 0, policy_version 249128 (0.0016) [2025-01-05 11:59:30,231][09057] Updated weights for policy 0, policy_version 249138 (0.0016) [2025-01-05 11:59:32,170][09057] Updated weights for policy 0, policy_version 249148 (0.0016) [2025-01-05 11:59:32,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19466.4). Total num frames: 1020522496. Throughput: 0: 4938.4. Samples: 5118758. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:32,842][08963] Avg episode reward: [(0, '10.175')] [2025-01-05 11:59:34,236][09057] Updated weights for policy 0, policy_version 249158 (0.0016) [2025-01-05 11:59:36,311][09057] Updated weights for policy 0, policy_version 249168 (0.0015) [2025-01-05 11:59:37,842][08963] Fps is (10 sec: 20070.0, 60 sec: 19797.3, 300 sec: 19480.3). Total num frames: 1020620800. Throughput: 0: 4959.2. Samples: 5148592. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:37,843][08963] Avg episode reward: [(0, '10.075')] [2025-01-05 11:59:38,369][09057] Updated weights for policy 0, policy_version 249178 (0.0017) [2025-01-05 11:59:40,431][09057] Updated weights for policy 0, policy_version 249188 (0.0015) [2025-01-05 11:59:42,534][09057] Updated weights for policy 0, policy_version 249198 (0.0016) [2025-01-05 11:59:42,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.4, 300 sec: 19480.3). Total num frames: 1020719104. Throughput: 0: 4956.8. Samples: 5163448. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:42,842][08963] Avg episode reward: [(0, '8.773')] [2025-01-05 11:59:44,583][09057] Updated weights for policy 0, policy_version 249208 (0.0017) [2025-01-05 11:59:46,711][09057] Updated weights for policy 0, policy_version 249218 (0.0017) [2025-01-05 11:59:47,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19480.3). 
Total num frames: 1020817408. Throughput: 0: 4954.5. Samples: 5192824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:47,842][08963] Avg episode reward: [(0, '11.168')] [2025-01-05 11:59:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000249223_1020817408.pth... [2025-01-05 11:59:47,905][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000248081_1016139776.pth [2025-01-05 11:59:48,928][09057] Updated weights for policy 0, policy_version 249228 (0.0018) [2025-01-05 11:59:50,950][09057] Updated weights for policy 0, policy_version 249238 (0.0015) [2025-01-05 11:59:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19494.2). Total num frames: 1020915712. Throughput: 0: 4931.5. Samples: 5222034. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:52,842][08963] Avg episode reward: [(0, '10.445')] [2025-01-05 11:59:53,020][09057] Updated weights for policy 0, policy_version 249248 (0.0016) [2025-01-05 11:59:55,093][09057] Updated weights for policy 0, policy_version 249258 (0.0016) [2025-01-05 11:59:57,086][09057] Updated weights for policy 0, policy_version 249268 (0.0016) [2025-01-05 11:59:57,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19797.3, 300 sec: 19494.2). Total num frames: 1021014016. Throughput: 0: 4929.2. Samples: 5237076. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 11:59:57,842][08963] Avg episode reward: [(0, '8.999')] [2025-01-05 11:59:59,152][09057] Updated weights for policy 0, policy_version 249278 (0.0016) [2025-01-05 12:00:01,194][09057] Updated weights for policy 0, policy_version 249288 (0.0015) [2025-01-05 12:00:02,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19494.2). Total num frames: 1021112320. Throughput: 0: 4946.3. Samples: 5267156. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:02,843][08963] Avg episode reward: [(0, '10.802')] [2025-01-05 12:00:03,263][09057] Updated weights for policy 0, policy_version 249298 (0.0018) [2025-01-05 12:00:05,397][09057] Updated weights for policy 0, policy_version 249308 (0.0016) [2025-01-05 12:00:07,452][09057] Updated weights for policy 0, policy_version 249318 (0.0015) [2025-01-05 12:00:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19508.1). Total num frames: 1021210624. Throughput: 0: 4944.3. Samples: 5296632. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:07,842][08963] Avg episode reward: [(0, '9.639')] [2025-01-05 12:00:09,560][09057] Updated weights for policy 0, policy_version 249328 (0.0018) [2025-01-05 12:00:11,712][09057] Updated weights for policy 0, policy_version 249338 (0.0018) [2025-01-05 12:00:12,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.0, 300 sec: 19508.1). Total num frames: 1021308928. Throughput: 0: 4939.3. Samples: 5310818. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:12,842][08963] Avg episode reward: [(0, '9.741')] [2025-01-05 12:00:13,952][09057] Updated weights for policy 0, policy_version 249348 (0.0018) [2025-01-05 12:00:15,958][09057] Updated weights for policy 0, policy_version 249358 (0.0017) [2025-01-05 12:00:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19508.1). Total num frames: 1021407232. Throughput: 0: 4912.5. Samples: 5339820. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:17,842][08963] Avg episode reward: [(0, '9.745')] [2025-01-05 12:00:18,074][09057] Updated weights for policy 0, policy_version 249368 (0.0017) [2025-01-05 12:00:20,180][09057] Updated weights for policy 0, policy_version 249378 (0.0016) [2025-01-05 12:00:22,186][09057] Updated weights for policy 0, policy_version 249388 (0.0016) [2025-01-05 12:00:22,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19522.0). 
Total num frames: 1021505536. Throughput: 0: 4907.2. Samples: 5369414. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:22,842][08963] Avg episode reward: [(0, '9.492')] [2025-01-05 12:00:24,281][09057] Updated weights for policy 0, policy_version 249398 (0.0016) [2025-01-05 12:00:26,389][09057] Updated weights for policy 0, policy_version 249408 (0.0016) [2025-01-05 12:00:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19522.0). Total num frames: 1021603840. Throughput: 0: 4909.1. Samples: 5384358. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:27,843][08963] Avg episode reward: [(0, '9.671')] [2025-01-05 12:00:28,474][09057] Updated weights for policy 0, policy_version 249418 (0.0017) [2025-01-05 12:00:30,616][09057] Updated weights for policy 0, policy_version 249428 (0.0016) [2025-01-05 12:00:32,726][09057] Updated weights for policy 0, policy_version 249438 (0.0016) [2025-01-05 12:00:32,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19592.5, 300 sec: 19508.1). Total num frames: 1021698048. Throughput: 0: 4900.0. Samples: 5413322. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:32,842][08963] Avg episode reward: [(0, '9.966')] [2025-01-05 12:00:34,810][09057] Updated weights for policy 0, policy_version 249448 (0.0018) [2025-01-05 12:00:36,939][09057] Updated weights for policy 0, policy_version 249458 (0.0017) [2025-01-05 12:00:37,842][08963] Fps is (10 sec: 18841.8, 60 sec: 19524.3, 300 sec: 19508.1). Total num frames: 1021792256. Throughput: 0: 4893.3. Samples: 5442234. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:00:37,842][08963] Avg episode reward: [(0, '8.827')] [2025-01-05 12:00:39,152][09057] Updated weights for policy 0, policy_version 249468 (0.0018) [2025-01-05 12:00:41,159][09057] Updated weights for policy 0, policy_version 249478 (0.0016) [2025-01-05 12:00:42,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19524.2, 300 sec: 19508.1). Total num frames: 1021890560. 
Throughput: 0: 4880.2. Samples: 5456686. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:00:42,842][08963] Avg episode reward: [(0, '9.945')] [2025-01-05 12:00:43,273][09057] Updated weights for policy 0, policy_version 249488 (0.0016) [2025-01-05 12:00:45,343][09057] Updated weights for policy 0, policy_version 249498 (0.0016) [2025-01-05 12:00:47,321][09057] Updated weights for policy 0, policy_version 249508 (0.0016) [2025-01-05 12:00:47,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19592.5, 300 sec: 19521.9). Total num frames: 1021992960. Throughput: 0: 4873.7. Samples: 5486472. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:00:47,842][08963] Avg episode reward: [(0, '10.342')] [2025-01-05 12:00:49,425][09057] Updated weights for policy 0, policy_version 249518 (0.0016) [2025-01-05 12:00:51,543][09057] Updated weights for policy 0, policy_version 249528 (0.0017) [2025-01-05 12:00:52,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19522.0). Total num frames: 1022087168. Throughput: 0: 4872.7. Samples: 5515902. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:00:52,842][08963] Avg episode reward: [(0, '9.810')] [2025-01-05 12:00:53,642][09057] Updated weights for policy 0, policy_version 249538 (0.0017) [2025-01-05 12:00:55,803][09057] Updated weights for policy 0, policy_version 249548 (0.0017) [2025-01-05 12:00:57,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19535.8). Total num frames: 1022185472. Throughput: 0: 4877.9. Samples: 5530322. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:00:57,842][08963] Avg episode reward: [(0, '9.241')] [2025-01-05 12:00:57,985][09057] Updated weights for policy 0, policy_version 249558 (0.0017) [2025-01-05 12:01:00,072][09057] Updated weights for policy 0, policy_version 249568 (0.0017) [2025-01-05 12:01:02,224][09057] Updated weights for policy 0, policy_version 249578 (0.0016) [2025-01-05 12:01:02,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19535.8). Total num frames: 1022279680. Throughput: 0: 4870.0. Samples: 5558968. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:02,842][08963] Avg episode reward: [(0, '10.183')] [2025-01-05 12:01:04,423][09057] Updated weights for policy 0, policy_version 249588 (0.0018) [2025-01-05 12:01:06,420][09057] Updated weights for policy 0, policy_version 249598 (0.0016) [2025-01-05 12:01:07,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19535.8). Total num frames: 1022377984. Throughput: 0: 4859.9. Samples: 5588108. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:07,842][08963] Avg episode reward: [(0, '10.379')] [2025-01-05 12:01:08,528][09057] Updated weights for policy 0, policy_version 249608 (0.0016) [2025-01-05 12:01:10,616][09057] Updated weights for policy 0, policy_version 249618 (0.0016) [2025-01-05 12:01:12,602][09057] Updated weights for policy 0, policy_version 249628 (0.0016) [2025-01-05 12:01:12,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1022480384. Throughput: 0: 4858.9. Samples: 5603008. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:12,842][08963] Avg episode reward: [(0, '10.441')] [2025-01-05 12:01:14,683][09057] Updated weights for policy 0, policy_version 249638 (0.0016) [2025-01-05 12:01:16,762][09057] Updated weights for policy 0, policy_version 249648 (0.0016) [2025-01-05 12:01:17,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19524.2, 300 sec: 19563.6). 
Total num frames: 1022578688. Throughput: 0: 4883.5. Samples: 5633082. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:17,842][08963] Avg episode reward: [(0, '10.857')] [2025-01-05 12:01:18,853][09057] Updated weights for policy 0, policy_version 249658 (0.0017) [2025-01-05 12:01:20,988][09057] Updated weights for policy 0, policy_version 249668 (0.0017) [2025-01-05 12:01:22,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1022672896. Throughput: 0: 4875.9. Samples: 5661650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:22,842][08963] Avg episode reward: [(0, '10.423')] [2025-01-05 12:01:23,212][09057] Updated weights for policy 0, policy_version 249678 (0.0018) [2025-01-05 12:01:25,245][09057] Updated weights for policy 0, policy_version 249688 (0.0016) [2025-01-05 12:01:27,332][09057] Updated weights for policy 0, policy_version 249698 (0.0016) [2025-01-05 12:01:27,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1022771200. Throughput: 0: 4880.2. Samples: 5676294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:27,842][08963] Avg episode reward: [(0, '9.503')] [2025-01-05 12:01:29,524][09057] Updated weights for policy 0, policy_version 249708 (0.0017) [2025-01-05 12:01:31,496][09057] Updated weights for policy 0, policy_version 249718 (0.0016) [2025-01-05 12:01:32,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1022869504. Throughput: 0: 4871.3. Samples: 5705680. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:32,842][08963] Avg episode reward: [(0, '10.553')] [2025-01-05 12:01:33,575][09057] Updated weights for policy 0, policy_version 249728 (0.0016) [2025-01-05 12:01:35,649][09057] Updated weights for policy 0, policy_version 249738 (0.0016) [2025-01-05 12:01:37,621][09057] Updated weights for policy 0, policy_version 249748 (0.0016) [2025-01-05 12:01:37,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1022967808. Throughput: 0: 4887.8. Samples: 5735852. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:37,842][08963] Avg episode reward: [(0, '10.020')] [2025-01-05 12:01:39,686][09057] Updated weights for policy 0, policy_version 249758 (0.0015) [2025-01-05 12:01:41,736][09057] Updated weights for policy 0, policy_version 249768 (0.0016) [2025-01-05 12:01:42,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1023070208. Throughput: 0: 4901.5. Samples: 5750888. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:42,842][08963] Avg episode reward: [(0, '10.259')] [2025-01-05 12:01:43,825][09057] Updated weights for policy 0, policy_version 249778 (0.0017) [2025-01-05 12:01:45,968][09057] Updated weights for policy 0, policy_version 249788 (0.0019) [2025-01-05 12:01:47,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1023164416. Throughput: 0: 4914.0. Samples: 5780098. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:01:47,842][08963] Avg episode reward: [(0, '8.801')] [2025-01-05 12:01:47,909][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000249797_1023168512.pth... 
[2025-01-05 12:01:47,963][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000248645_1018449920.pth [2025-01-05 12:01:48,141][09057] Updated weights for policy 0, policy_version 249798 (0.0018) [2025-01-05 12:01:50,188][09057] Updated weights for policy 0, policy_version 249808 (0.0016) [2025-01-05 12:01:52,251][09057] Updated weights for policy 0, policy_version 249818 (0.0016) [2025-01-05 12:01:52,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1023262720. Throughput: 0: 4916.5. Samples: 5809352. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:01:52,842][08963] Avg episode reward: [(0, '9.940')] [2025-01-05 12:01:54,417][09057] Updated weights for policy 0, policy_version 249828 (0.0017) [2025-01-05 12:01:56,442][09057] Updated weights for policy 0, policy_version 249838 (0.0016) [2025-01-05 12:01:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19605.3). Total num frames: 1023361024. Throughput: 0: 4909.8. Samples: 5823950. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:01:57,842][08963] Avg episode reward: [(0, '10.814')] [2025-01-05 12:01:58,594][09057] Updated weights for policy 0, policy_version 249848 (0.0016) [2025-01-05 12:02:00,719][09057] Updated weights for policy 0, policy_version 249858 (0.0017) [2025-01-05 12:02:02,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.6, 300 sec: 19591.4). Total num frames: 1023455232. Throughput: 0: 4885.7. Samples: 5852936. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:02,842][08963] Avg episode reward: [(0, '9.392')] [2025-01-05 12:02:02,871][09057] Updated weights for policy 0, policy_version 249868 (0.0018) [2025-01-05 12:02:05,068][09057] Updated weights for policy 0, policy_version 249878 (0.0017) [2025-01-05 12:02:07,188][09057] Updated weights for policy 0, policy_version 249888 (0.0016) [2025-01-05 12:02:07,842][08963] Fps is (10 sec: 18841.8, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1023549440. Throughput: 0: 4882.4. Samples: 5881358. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:07,842][08963] Avg episode reward: [(0, '9.836')] [2025-01-05 12:02:09,348][09057] Updated weights for policy 0, policy_version 249898 (0.0017) [2025-01-05 12:02:11,456][09057] Updated weights for policy 0, policy_version 249908 (0.0016) [2025-01-05 12:02:12,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1023647744. Throughput: 0: 4874.4. Samples: 5895640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:12,842][08963] Avg episode reward: [(0, '10.323')] [2025-01-05 12:02:13,646][09057] Updated weights for policy 0, policy_version 249918 (0.0017) [2025-01-05 12:02:15,725][09057] Updated weights for policy 0, policy_version 249928 (0.0016) [2025-01-05 12:02:17,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.8, 300 sec: 19577.5). Total num frames: 1023741952. Throughput: 0: 4861.7. Samples: 5924456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:17,842][08963] Avg episode reward: [(0, '11.262')] [2025-01-05 12:02:17,912][09057] Updated weights for policy 0, policy_version 249938 (0.0017) [2025-01-05 12:02:20,056][09057] Updated weights for policy 0, policy_version 249948 (0.0017) [2025-01-05 12:02:22,165][09057] Updated weights for policy 0, policy_version 249958 (0.0018) [2025-01-05 12:02:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19577.5). 
Total num frames: 1023840256. Throughput: 0: 4825.7. Samples: 5953010. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:22,842][08963] Avg episode reward: [(0, '9.872')] [2025-01-05 12:02:24,364][09057] Updated weights for policy 0, policy_version 249968 (0.0017) [2025-01-05 12:02:26,443][09057] Updated weights for policy 0, policy_version 249978 (0.0016) [2025-01-05 12:02:27,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19577.5). Total num frames: 1023934464. Throughput: 0: 4811.1. Samples: 5967386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:27,842][08963] Avg episode reward: [(0, '11.137')] [2025-01-05 12:02:28,549][09057] Updated weights for policy 0, policy_version 249988 (0.0018) [2025-01-05 12:02:30,703][09057] Updated weights for policy 0, policy_version 249998 (0.0017) [2025-01-05 12:02:32,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19319.4, 300 sec: 19577.5). Total num frames: 1024028672. Throughput: 0: 4805.2. Samples: 5996332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:32,842][08963] Avg episode reward: [(0, '10.701')] [2025-01-05 12:02:32,856][09057] Updated weights for policy 0, policy_version 250008 (0.0017) [2025-01-05 12:02:34,958][09057] Updated weights for policy 0, policy_version 250018 (0.0017) [2025-01-05 12:02:37,112][09057] Updated weights for policy 0, policy_version 250028 (0.0016) [2025-01-05 12:02:37,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19319.4, 300 sec: 19577.5). Total num frames: 1024126976. Throughput: 0: 4796.1. Samples: 6025178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:37,842][08963] Avg episode reward: [(0, '10.724')] [2025-01-05 12:02:39,260][09057] Updated weights for policy 0, policy_version 250038 (0.0017) [2025-01-05 12:02:41,265][09057] Updated weights for policy 0, policy_version 250048 (0.0017) [2025-01-05 12:02:42,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19251.2, 300 sec: 19591.4). Total num frames: 1024225280. 
Throughput: 0: 4792.1. Samples: 6039596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:42,842][08963] Avg episode reward: [(0, '10.287')] [2025-01-05 12:02:43,367][09057] Updated weights for policy 0, policy_version 250058 (0.0015) [2025-01-05 12:02:45,406][09057] Updated weights for policy 0, policy_version 250068 (0.0016) [2025-01-05 12:02:47,415][09057] Updated weights for policy 0, policy_version 250078 (0.0016) [2025-01-05 12:02:47,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19319.5, 300 sec: 19591.4). Total num frames: 1024323584. Throughput: 0: 4816.0. Samples: 6069656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:47,842][08963] Avg episode reward: [(0, '10.562')] [2025-01-05 12:02:49,523][09057] Updated weights for policy 0, policy_version 250088 (0.0015) [2025-01-05 12:02:51,558][09057] Updated weights for policy 0, policy_version 250098 (0.0015) [2025-01-05 12:02:52,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19387.7, 300 sec: 19591.4). Total num frames: 1024425984. Throughput: 0: 4847.1. Samples: 6099476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:52,842][08963] Avg episode reward: [(0, '10.382')] [2025-01-05 12:02:53,652][09057] Updated weights for policy 0, policy_version 250108 (0.0016) [2025-01-05 12:02:55,800][09057] Updated weights for policy 0, policy_version 250118 (0.0016) [2025-01-05 12:02:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19319.5, 300 sec: 19591.4). Total num frames: 1024520192. Throughput: 0: 4853.7. Samples: 6114058. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:02:57,842][08963] Avg episode reward: [(0, '10.242')] [2025-01-05 12:02:57,970][09057] Updated weights for policy 0, policy_version 250128 (0.0017) [2025-01-05 12:03:00,072][09057] Updated weights for policy 0, policy_version 250138 (0.0017) [2025-01-05 12:03:02,235][09057] Updated weights for policy 0, policy_version 250148 (0.0016) [2025-01-05 12:03:02,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19319.5, 300 sec: 19563.6). Total num frames: 1024614400. Throughput: 0: 4846.3. Samples: 6142538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:02,842][08963] Avg episode reward: [(0, '9.547')] [2025-01-05 12:03:04,410][09057] Updated weights for policy 0, policy_version 250158 (0.0018) [2025-01-05 12:03:06,474][09057] Updated weights for policy 0, policy_version 250168 (0.0016) [2025-01-05 12:03:07,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19563.6). Total num frames: 1024712704. Throughput: 0: 4851.4. Samples: 6171322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:07,842][08963] Avg episode reward: [(0, '9.851')] [2025-01-05 12:03:08,676][09057] Updated weights for policy 0, policy_version 250178 (0.0017) [2025-01-05 12:03:10,736][09057] Updated weights for policy 0, policy_version 250188 (0.0017) [2025-01-05 12:03:12,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19549.7). Total num frames: 1024806912. Throughput: 0: 4858.9. Samples: 6186036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:12,842][08963] Avg episode reward: [(0, '9.398')] [2025-01-05 12:03:12,857][09057] Updated weights for policy 0, policy_version 250198 (0.0019) [2025-01-05 12:03:15,087][09057] Updated weights for policy 0, policy_version 250208 (0.0019) [2025-01-05 12:03:17,185][09057] Updated weights for policy 0, policy_version 250218 (0.0017) [2025-01-05 12:03:17,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19549.7). 
Total num frames: 1024905216. Throughput: 0: 4847.3. Samples: 6214460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:17,842][08963] Avg episode reward: [(0, '10.663')] [2025-01-05 12:03:19,315][09057] Updated weights for policy 0, policy_version 250228 (0.0017) [2025-01-05 12:03:21,452][09057] Updated weights for policy 0, policy_version 250238 (0.0016) [2025-01-05 12:03:22,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19319.5, 300 sec: 19535.8). Total num frames: 1024999424. Throughput: 0: 4849.3. Samples: 6243394. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:22,842][08963] Avg episode reward: [(0, '9.635')] [2025-01-05 12:03:23,607][09057] Updated weights for policy 0, policy_version 250248 (0.0017) [2025-01-05 12:03:25,668][09057] Updated weights for policy 0, policy_version 250258 (0.0016) [2025-01-05 12:03:27,842][08963] Fps is (10 sec: 18841.9, 60 sec: 19319.5, 300 sec: 19522.0). Total num frames: 1025093632. Throughput: 0: 4850.2. Samples: 6257854. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:27,842][08963] Avg episode reward: [(0, '10.866')] [2025-01-05 12:03:27,861][09057] Updated weights for policy 0, policy_version 250268 (0.0019) [2025-01-05 12:03:29,991][09057] Updated weights for policy 0, policy_version 250278 (0.0017) [2025-01-05 12:03:32,074][09057] Updated weights for policy 0, policy_version 250288 (0.0017) [2025-01-05 12:03:32,842][08963] Fps is (10 sec: 19250.8, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1025191936. Throughput: 0: 4822.6. Samples: 6286672. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:32,843][08963] Avg episode reward: [(0, '9.672')] [2025-01-05 12:03:34,254][09057] Updated weights for policy 0, policy_version 250298 (0.0018) [2025-01-05 12:03:36,249][09057] Updated weights for policy 0, policy_version 250308 (0.0018) [2025-01-05 12:03:37,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1025290240. 
Throughput: 0: 4815.8. Samples: 6316190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:37,842][08963] Avg episode reward: [(0, '9.710')] [2025-01-05 12:03:38,287][09057] Updated weights for policy 0, policy_version 250318 (0.0016) [2025-01-05 12:03:40,371][09057] Updated weights for policy 0, policy_version 250328 (0.0016) [2025-01-05 12:03:42,396][09057] Updated weights for policy 0, policy_version 250338 (0.0020) [2025-01-05 12:03:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19387.7, 300 sec: 19522.0). Total num frames: 1025388544. Throughput: 0: 4825.2. Samples: 6331190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:42,842][08963] Avg episode reward: [(0, '10.282')] [2025-01-05 12:03:44,561][09057] Updated weights for policy 0, policy_version 250348 (0.0017) [2025-01-05 12:03:46,695][09057] Updated weights for policy 0, policy_version 250358 (0.0017) [2025-01-05 12:03:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1025486848. Throughput: 0: 4839.1. Samples: 6360298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:47,843][08963] Avg episode reward: [(0, '10.113')] [2025-01-05 12:03:47,852][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000250363_1025486848.pth... [2025-01-05 12:03:47,913][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000249223_1020817408.pth [2025-01-05 12:03:48,865][09057] Updated weights for policy 0, policy_version 250368 (0.0018) [2025-01-05 12:03:50,942][09057] Updated weights for policy 0, policy_version 250378 (0.0016) [2025-01-05 12:03:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1025585152. Throughput: 0: 4845.9. Samples: 6389388. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:52,842][08963] Avg episode reward: [(0, '10.594')] [2025-01-05 12:03:53,029][09057] Updated weights for policy 0, policy_version 250388 (0.0017) [2025-01-05 12:03:55,061][09057] Updated weights for policy 0, policy_version 250398 (0.0016) [2025-01-05 12:03:57,098][09057] Updated weights for policy 0, policy_version 250408 (0.0016) [2025-01-05 12:03:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19522.0). Total num frames: 1025683456. Throughput: 0: 4852.2. Samples: 6404386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:03:57,842][08963] Avg episode reward: [(0, '10.904')] [2025-01-05 12:03:59,194][09057] Updated weights for policy 0, policy_version 250418 (0.0016) [2025-01-05 12:04:01,241][09057] Updated weights for policy 0, policy_version 250428 (0.0016) [2025-01-05 12:04:02,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19508.1). Total num frames: 1025781760. Throughput: 0: 4881.0. Samples: 6434106. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:02,843][08963] Avg episode reward: [(0, '9.888')] [2025-01-05 12:04:03,403][09057] Updated weights for policy 0, policy_version 250438 (0.0018) [2025-01-05 12:04:05,513][09057] Updated weights for policy 0, policy_version 250448 (0.0016) [2025-01-05 12:04:07,559][09057] Updated weights for policy 0, policy_version 250458 (0.0017) [2025-01-05 12:04:07,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19508.1). Total num frames: 1025880064. Throughput: 0: 4888.5. Samples: 6463376. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:07,842][08963] Avg episode reward: [(0, '9.295')] [2025-01-05 12:04:09,688][09057] Updated weights for policy 0, policy_version 250468 (0.0017) [2025-01-05 12:04:11,790][09057] Updated weights for policy 0, policy_version 250478 (0.0016) [2025-01-05 12:04:12,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.2, 300 sec: 19508.1). 
Total num frames: 1025978368. Throughput: 0: 4888.1. Samples: 6477818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:12,842][08963] Avg episode reward: [(0, '9.304')] [2025-01-05 12:04:13,931][09057] Updated weights for policy 0, policy_version 250488 (0.0017) [2025-01-05 12:04:15,977][09057] Updated weights for policy 0, policy_version 250498 (0.0016) [2025-01-05 12:04:17,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19494.2). Total num frames: 1026072576. Throughput: 0: 4896.0. Samples: 6506990. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:17,843][08963] Avg episode reward: [(0, '9.744')] [2025-01-05 12:04:18,141][09057] Updated weights for policy 0, policy_version 250508 (0.0017) [2025-01-05 12:04:20,213][09057] Updated weights for policy 0, policy_version 250518 (0.0017) [2025-01-05 12:04:22,215][09057] Updated weights for policy 0, policy_version 250528 (0.0016) [2025-01-05 12:04:22,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19508.1). Total num frames: 1026174976. Throughput: 0: 4898.9. Samples: 6536640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:22,842][08963] Avg episode reward: [(0, '9.848')] [2025-01-05 12:04:24,282][09057] Updated weights for policy 0, policy_version 250538 (0.0015) [2025-01-05 12:04:26,329][09057] Updated weights for policy 0, policy_version 250548 (0.0016) [2025-01-05 12:04:27,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1026273280. Throughput: 0: 4900.6. Samples: 6551718. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:27,842][08963] Avg episode reward: [(0, '10.778')] [2025-01-05 12:04:28,386][09057] Updated weights for policy 0, policy_version 250558 (0.0016) [2025-01-05 12:04:30,450][09057] Updated weights for policy 0, policy_version 250568 (0.0016) [2025-01-05 12:04:32,481][09057] Updated weights for policy 0, policy_version 250578 (0.0015) [2025-01-05 12:04:32,842][08963] Fps is (10 sec: 19659.7, 60 sec: 19660.7, 300 sec: 19494.2). Total num frames: 1026371584. Throughput: 0: 4917.5. Samples: 6581586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:32,843][08963] Avg episode reward: [(0, '9.284')] [2025-01-05 12:04:34,597][09057] Updated weights for policy 0, policy_version 250588 (0.0017) [2025-01-05 12:04:36,728][09057] Updated weights for policy 0, policy_version 250598 (0.0016) [2025-01-05 12:04:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1026469888. Throughput: 0: 4918.2. Samples: 6610708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:37,842][08963] Avg episode reward: [(0, '11.471')] [2025-01-05 12:04:38,862][09057] Updated weights for policy 0, policy_version 250608 (0.0017) [2025-01-05 12:04:40,888][09057] Updated weights for policy 0, policy_version 250618 (0.0016) [2025-01-05 12:04:42,842][08963] Fps is (10 sec: 19661.6, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1026568192. Throughput: 0: 4911.4. Samples: 6625398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:42,842][08963] Avg episode reward: [(0, '9.565')] [2025-01-05 12:04:43,098][09057] Updated weights for policy 0, policy_version 250628 (0.0016) [2025-01-05 12:04:45,162][09057] Updated weights for policy 0, policy_version 250638 (0.0016) [2025-01-05 12:04:47,157][09057] Updated weights for policy 0, policy_version 250648 (0.0015) [2025-01-05 12:04:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19494.2). 
Total num frames: 1026666496. Throughput: 0: 4903.8. Samples: 6654778. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:47,843][08963] Avg episode reward: [(0, '9.609')] [2025-01-05 12:04:49,234][09057] Updated weights for policy 0, policy_version 250658 (0.0016) [2025-01-05 12:04:51,258][09057] Updated weights for policy 0, policy_version 250668 (0.0016) [2025-01-05 12:04:52,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1026764800. Throughput: 0: 4918.3. Samples: 6684700. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:52,842][08963] Avg episode reward: [(0, '8.504')] [2025-01-05 12:04:53,292][09057] Updated weights for policy 0, policy_version 250678 (0.0017) [2025-01-05 12:04:55,381][09057] Updated weights for policy 0, policy_version 250688 (0.0015) [2025-01-05 12:04:57,419][09057] Updated weights for policy 0, policy_version 250698 (0.0016) [2025-01-05 12:04:57,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.0, 300 sec: 19508.1). Total num frames: 1026867200. Throughput: 0: 4932.6. Samples: 6699784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:04:57,843][08963] Avg episode reward: [(0, '9.778')] [2025-01-05 12:04:59,512][09057] Updated weights for policy 0, policy_version 250708 (0.0017) [2025-01-05 12:05:01,616][09057] Updated weights for policy 0, policy_version 250718 (0.0016) [2025-01-05 12:05:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.9, 300 sec: 19494.2). Total num frames: 1026961408. Throughput: 0: 4940.2. Samples: 6729298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:05:02,842][08963] Avg episode reward: [(0, '9.908')] [2025-01-05 12:05:03,773][09057] Updated weights for policy 0, policy_version 250728 (0.0016) [2025-01-05 12:05:05,819][09057] Updated weights for policy 0, policy_version 250738 (0.0016) [2025-01-05 12:05:07,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19660.7, 300 sec: 19494.2). Total num frames: 1027059712. 
Throughput: 0: 4924.4. Samples: 6758238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:05:07,842][08963] Avg episode reward: [(0, '9.997')] [2025-01-05 12:05:07,989][09057] Updated weights for policy 0, policy_version 250748 (0.0017) [2025-01-05 12:05:10,133][09057] Updated weights for policy 0, policy_version 250758 (0.0016) [2025-01-05 12:05:12,148][09057] Updated weights for policy 0, policy_version 250768 (0.0016) [2025-01-05 12:05:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1027158016. Throughput: 0: 4910.0. Samples: 6772668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:05:12,842][08963] Avg episode reward: [(0, '10.028')] [2025-01-05 12:05:14,236][09057] Updated weights for policy 0, policy_version 250778 (0.0016) [2025-01-05 12:05:16,276][09057] Updated weights for policy 0, policy_version 250788 (0.0016) [2025-01-05 12:05:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19494.2). Total num frames: 1027256320. Throughput: 0: 4911.8. Samples: 6802614. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:05:17,842][08963] Avg episode reward: [(0, '9.488')] [2025-01-05 12:05:18,392][09057] Updated weights for policy 0, policy_version 250798 (0.0017) [2025-01-05 12:05:20,564][09057] Updated weights for policy 0, policy_version 250808 (0.0017) [2025-01-05 12:05:22,592][09057] Updated weights for policy 0, policy_version 250818 (0.0015) [2025-01-05 12:05:22,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.7, 300 sec: 19494.2). Total num frames: 1027354624. Throughput: 0: 4913.8. Samples: 6831830. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:22,842][08963] Avg episode reward: [(0, '9.703')] [2025-01-05 12:05:24,699][09057] Updated weights for policy 0, policy_version 250828 (0.0016) [2025-01-05 12:05:26,806][09057] Updated weights for policy 0, policy_version 250838 (0.0016) [2025-01-05 12:05:27,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19592.5, 300 sec: 19494.2). Total num frames: 1027448832. Throughput: 0: 4907.9. Samples: 6846256. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:27,842][08963] Avg episode reward: [(0, '10.491')] [2025-01-05 12:05:28,942][09057] Updated weights for policy 0, policy_version 250848 (0.0017) [2025-01-05 12:05:30,994][09057] Updated weights for policy 0, policy_version 250858 (0.0018) [2025-01-05 12:05:32,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.7, 300 sec: 19508.1). Total num frames: 1027547136. Throughput: 0: 4900.9. Samples: 6875320. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:32,842][08963] Avg episode reward: [(0, '9.088')] [2025-01-05 12:05:33,222][09057] Updated weights for policy 0, policy_version 250868 (0.0018) [2025-01-05 12:05:35,315][09057] Updated weights for policy 0, policy_version 250878 (0.0016) [2025-01-05 12:05:37,333][09057] Updated weights for policy 0, policy_version 250888 (0.0017) [2025-01-05 12:05:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19508.1). Total num frames: 1027645440. Throughput: 0: 4888.9. Samples: 6904700. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:37,843][08963] Avg episode reward: [(0, '8.933')] [2025-01-05 12:05:39,435][09057] Updated weights for policy 0, policy_version 250898 (0.0017) [2025-01-05 12:05:41,515][09057] Updated weights for policy 0, policy_version 250908 (0.0017) [2025-01-05 12:05:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19494.2). Total num frames: 1027743744. Throughput: 0: 4881.4. Samples: 6919448. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:42,842][08963] Avg episode reward: [(0, '9.733')] [2025-01-05 12:05:43,654][09057] Updated weights for policy 0, policy_version 250918 (0.0018) [2025-01-05 12:05:45,824][09057] Updated weights for policy 0, policy_version 250928 (0.0017) [2025-01-05 12:05:47,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19494.2). Total num frames: 1027837952. Throughput: 0: 4862.2. Samples: 6948096. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:47,842][08963] Avg episode reward: [(0, '9.017')] [2025-01-05 12:05:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000250937_1027837952.pth... [2025-01-05 12:05:47,907][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000249797_1023168512.pth [2025-01-05 12:05:48,013][09057] Updated weights for policy 0, policy_version 250938 (0.0017) [2025-01-05 12:05:50,124][09057] Updated weights for policy 0, policy_version 250948 (0.0017) [2025-01-05 12:05:52,238][09057] Updated weights for policy 0, policy_version 250958 (0.0016) [2025-01-05 12:05:52,842][08963] Fps is (10 sec: 18841.3, 60 sec: 19456.0, 300 sec: 19480.3). Total num frames: 1027932160. Throughput: 0: 4860.1. Samples: 6976942. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:52,842][08963] Avg episode reward: [(0, '10.606')] [2025-01-05 12:05:54,377][09057] Updated weights for policy 0, policy_version 250968 (0.0021) [2025-01-05 12:05:56,454][09057] Updated weights for policy 0, policy_version 250978 (0.0017) [2025-01-05 12:05:57,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19494.2). Total num frames: 1028030464. Throughput: 0: 4856.6. Samples: 6991214. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:05:57,842][08963] Avg episode reward: [(0, '8.487')] [2025-01-05 12:05:58,670][09057] Updated weights for policy 0, policy_version 250988 (0.0017) [2025-01-05 12:06:00,691][09057] Updated weights for policy 0, policy_version 250998 (0.0016) [2025-01-05 12:06:02,817][09057] Updated weights for policy 0, policy_version 251008 (0.0017) [2025-01-05 12:06:02,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19494.2). Total num frames: 1028128768. Throughput: 0: 4841.7. Samples: 7020492. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:02,842][08963] Avg episode reward: [(0, '10.929')] [2025-01-05 12:06:05,022][09057] Updated weights for policy 0, policy_version 251018 (0.0017) [2025-01-05 12:06:07,099][09057] Updated weights for policy 0, policy_version 251028 (0.0016) [2025-01-05 12:06:07,842][08963] Fps is (10 sec: 19251.6, 60 sec: 19387.8, 300 sec: 19466.4). Total num frames: 1028222976. Throughput: 0: 4828.5. Samples: 7049112. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:07,842][08963] Avg episode reward: [(0, '9.896')] [2025-01-05 12:06:09,241][09057] Updated weights for policy 0, policy_version 251038 (0.0017) [2025-01-05 12:06:11,399][09057] Updated weights for policy 0, policy_version 251048 (0.0017) [2025-01-05 12:06:12,842][08963] Fps is (10 sec: 18841.4, 60 sec: 19319.4, 300 sec: 19452.5). Total num frames: 1028317184. Throughput: 0: 4827.9. Samples: 7063510. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:12,842][08963] Avg episode reward: [(0, '9.711')] [2025-01-05 12:06:13,576][09057] Updated weights for policy 0, policy_version 251058 (0.0018) [2025-01-05 12:06:15,663][09057] Updated weights for policy 0, policy_version 251068 (0.0019) [2025-01-05 12:06:17,766][09057] Updated weights for policy 0, policy_version 251078 (0.0016) [2025-01-05 12:06:17,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19319.5, 300 sec: 19466.4). 
Total num frames: 1028415488. Throughput: 0: 4823.3. Samples: 7092370. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:17,842][08963] Avg episode reward: [(0, '9.738')] [2025-01-05 12:06:19,907][09057] Updated weights for policy 0, policy_version 251088 (0.0017) [2025-01-05 12:06:21,943][09057] Updated weights for policy 0, policy_version 251098 (0.0015) [2025-01-05 12:06:22,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19452.5). Total num frames: 1028509696. Throughput: 0: 4817.6. Samples: 7121490. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:22,842][08963] Avg episode reward: [(0, '10.727')] [2025-01-05 12:06:24,154][09057] Updated weights for policy 0, policy_version 251108 (0.0017) [2025-01-05 12:06:26,303][09057] Updated weights for policy 0, policy_version 251118 (0.0016) [2025-01-05 12:06:27,842][08963] Fps is (10 sec: 19250.6, 60 sec: 19319.4, 300 sec: 19452.5). Total num frames: 1028608000. Throughput: 0: 4802.7. Samples: 7135572. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:06:27,843][08963] Avg episode reward: [(0, '10.167')] [2025-01-05 12:06:28,447][09057] Updated weights for policy 0, policy_version 251128 (0.0018) [2025-01-05 12:06:30,557][09057] Updated weights for policy 0, policy_version 251138 (0.0015) [2025-01-05 12:06:32,604][09057] Updated weights for policy 0, policy_version 251148 (0.0016) [2025-01-05 12:06:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19319.5, 300 sec: 19452.5). Total num frames: 1028706304. Throughput: 0: 4814.6. Samples: 7164752. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:32,842][08963] Avg episode reward: [(0, '9.663')] [2025-01-05 12:06:34,785][09057] Updated weights for policy 0, policy_version 251158 (0.0018) [2025-01-05 12:06:36,956][09057] Updated weights for policy 0, policy_version 251168 (0.0017) [2025-01-05 12:06:37,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19424.7). Total num frames: 1028800512. 
Throughput: 0: 4808.6. Samples: 7193330. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:37,843][08963] Avg episode reward: [(0, '9.403')] [2025-01-05 12:06:39,139][09057] Updated weights for policy 0, policy_version 251178 (0.0017) [2025-01-05 12:06:41,179][09057] Updated weights for policy 0, policy_version 251188 (0.0016) [2025-01-05 12:06:42,842][08963] Fps is (10 sec: 18841.3, 60 sec: 19182.9, 300 sec: 19424.8). Total num frames: 1028894720. Throughput: 0: 4813.7. Samples: 7207830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:42,842][08963] Avg episode reward: [(0, '8.733')] [2025-01-05 12:06:43,397][09057] Updated weights for policy 0, policy_version 251198 (0.0018) [2025-01-05 12:06:45,468][09057] Updated weights for policy 0, policy_version 251208 (0.0015) [2025-01-05 12:06:47,508][09057] Updated weights for policy 0, policy_version 251218 (0.0016) [2025-01-05 12:06:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19424.7). Total num frames: 1028993024. Throughput: 0: 4807.0. Samples: 7236806. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:47,843][08963] Avg episode reward: [(0, '9.599')] [2025-01-05 12:06:49,713][09057] Updated weights for policy 0, policy_version 251228 (0.0016) [2025-01-05 12:06:51,755][09057] Updated weights for policy 0, policy_version 251238 (0.0015) [2025-01-05 12:06:52,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1029087232. Throughput: 0: 4816.0. Samples: 7265834. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:52,842][08963] Avg episode reward: [(0, '10.449')] [2025-01-05 12:06:53,929][09057] Updated weights for policy 0, policy_version 251248 (0.0017) [2025-01-05 12:06:56,072][09057] Updated weights for policy 0, policy_version 251258 (0.0016) [2025-01-05 12:06:57,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19424.7). Total num frames: 1029185536. Throughput: 0: 4812.7. 
Samples: 7280080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:06:57,843][08963] Avg episode reward: [(0, '10.324')] [2025-01-05 12:06:58,250][09057] Updated weights for policy 0, policy_version 251268 (0.0017) [2025-01-05 12:07:00,312][09057] Updated weights for policy 0, policy_version 251278 (0.0015) [2025-01-05 12:07:02,384][09057] Updated weights for policy 0, policy_version 251288 (0.0017) [2025-01-05 12:07:02,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1029283840. Throughput: 0: 4821.5. Samples: 7309336. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:02,842][08963] Avg episode reward: [(0, '9.655')] [2025-01-05 12:07:04,544][09057] Updated weights for policy 0, policy_version 251298 (0.0017) [2025-01-05 12:07:06,695][09057] Updated weights for policy 0, policy_version 251308 (0.0016) [2025-01-05 12:07:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.1, 300 sec: 19424.8). Total num frames: 1029378048. Throughput: 0: 4807.5. Samples: 7337830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:07,842][08963] Avg episode reward: [(0, '8.887')] [2025-01-05 12:07:08,889][09057] Updated weights for policy 0, policy_version 251318 (0.0016) [2025-01-05 12:07:10,950][09057] Updated weights for policy 0, policy_version 251328 (0.0016) [2025-01-05 12:07:12,842][08963] Fps is (10 sec: 18841.8, 60 sec: 19251.2, 300 sec: 19424.8). Total num frames: 1029472256. Throughput: 0: 4816.5. Samples: 7352312. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:12,842][08963] Avg episode reward: [(0, '9.354')] [2025-01-05 12:07:13,105][09057] Updated weights for policy 0, policy_version 251338 (0.0017) [2025-01-05 12:07:15,233][09057] Updated weights for policy 0, policy_version 251348 (0.0016) [2025-01-05 12:07:17,241][09057] Updated weights for policy 0, policy_version 251358 (0.0016) [2025-01-05 12:07:17,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19424.8). Total num frames: 1029570560. Throughput: 0: 4817.5. Samples: 7381540. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:17,842][08963] Avg episode reward: [(0, '10.252')] [2025-01-05 12:07:19,280][09057] Updated weights for policy 0, policy_version 251368 (0.0016) [2025-01-05 12:07:21,363][09057] Updated weights for policy 0, policy_version 251378 (0.0016) [2025-01-05 12:07:22,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19387.7, 300 sec: 19452.5). Total num frames: 1029672960. Throughput: 0: 4847.0. Samples: 7411446. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:22,842][08963] Avg episode reward: [(0, '10.289')] [2025-01-05 12:07:23,405][09057] Updated weights for policy 0, policy_version 251388 (0.0015) [2025-01-05 12:07:25,438][09057] Updated weights for policy 0, policy_version 251398 (0.0015) [2025-01-05 12:07:27,538][09057] Updated weights for policy 0, policy_version 251408 (0.0015) [2025-01-05 12:07:27,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19387.9, 300 sec: 19466.4). Total num frames: 1029771264. Throughput: 0: 4857.1. Samples: 7426400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:27,842][08963] Avg episode reward: [(0, '10.085')] [2025-01-05 12:07:29,647][09057] Updated weights for policy 0, policy_version 251418 (0.0017) [2025-01-05 12:07:31,717][09057] Updated weights for policy 0, policy_version 251428 (0.0016) [2025-01-05 12:07:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19466.4). 
Total num frames: 1029869568. Throughput: 0: 4864.7. Samples: 7455718. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:32,843][08963] Avg episode reward: [(0, '8.111')] [2025-01-05 12:07:33,922][09057] Updated weights for policy 0, policy_version 251438 (0.0017) [2025-01-05 12:07:35,972][09057] Updated weights for policy 0, policy_version 251448 (0.0016) [2025-01-05 12:07:37,842][08963] Fps is (10 sec: 19250.8, 60 sec: 19387.8, 300 sec: 19452.5). Total num frames: 1029963776. Throughput: 0: 4859.9. Samples: 7484528. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:07:37,843][08963] Avg episode reward: [(0, '10.878')] [2025-01-05 12:07:38,106][09057] Updated weights for policy 0, policy_version 251458 (0.0020) [2025-01-05 12:07:40,235][09057] Updated weights for policy 0, policy_version 251468 (0.0016) [2025-01-05 12:07:42,280][09057] Updated weights for policy 0, policy_version 251478 (0.0016) [2025-01-05 12:07:42,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1030062080. Throughput: 0: 4869.7. Samples: 7499216. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:07:42,842][08963] Avg episode reward: [(0, '8.664')] [2025-01-05 12:07:44,406][09057] Updated weights for policy 0, policy_version 251488 (0.0017) [2025-01-05 12:07:46,491][09057] Updated weights for policy 0, policy_version 251498 (0.0016) [2025-01-05 12:07:47,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1030160384. Throughput: 0: 4872.0. Samples: 7528576. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:07:47,843][08963] Avg episode reward: [(0, '9.483')] [2025-01-05 12:07:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000251504_1030160384.pth... 
[2025-01-05 12:07:47,904][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000250363_1025486848.pth [2025-01-05 12:07:48,673][09057] Updated weights for policy 0, policy_version 251508 (0.0017) [2025-01-05 12:07:50,745][09057] Updated weights for policy 0, policy_version 251518 (0.0016) [2025-01-05 12:07:52,804][09057] Updated weights for policy 0, policy_version 251528 (0.0015) [2025-01-05 12:07:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19452.5). Total num frames: 1030258688. Throughput: 0: 4889.3. Samples: 7557848. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:07:52,842][08963] Avg episode reward: [(0, '10.724')] [2025-01-05 12:07:54,980][09057] Updated weights for policy 0, policy_version 251538 (0.0018) [2025-01-05 12:07:57,056][09057] Updated weights for policy 0, policy_version 251548 (0.0017) [2025-01-05 12:07:57,842][08963] Fps is (10 sec: 19251.7, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1030352896. Throughput: 0: 4882.0. Samples: 7572004. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:07:57,842][08963] Avg episode reward: [(0, '9.937')] [2025-01-05 12:07:59,257][09057] Updated weights for policy 0, policy_version 251558 (0.0017) [2025-01-05 12:08:01,356][09057] Updated weights for policy 0, policy_version 251568 (0.0016) [2025-01-05 12:08:02,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1030451200. Throughput: 0: 4878.4. Samples: 7601070. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:02,842][08963] Avg episode reward: [(0, '10.293')] [2025-01-05 12:08:03,477][09057] Updated weights for policy 0, policy_version 251578 (0.0018) [2025-01-05 12:08:05,601][09057] Updated weights for policy 0, policy_version 251588 (0.0016) [2025-01-05 12:08:07,658][09057] Updated weights for policy 0, policy_version 251598 (0.0017) [2025-01-05 12:08:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19466.4). Total num frames: 1030549504. Throughput: 0: 4863.7. Samples: 7630314. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:07,842][08963] Avg episode reward: [(0, '10.841')] [2025-01-05 12:08:09,773][09057] Updated weights for policy 0, policy_version 251608 (0.0017) [2025-01-05 12:08:11,905][09057] Updated weights for policy 0, policy_version 251618 (0.0017) [2025-01-05 12:08:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19452.5). Total num frames: 1030643712. Throughput: 0: 4850.3. Samples: 7644662. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:12,842][08963] Avg episode reward: [(0, '10.463')] [2025-01-05 12:08:14,029][09057] Updated weights for policy 0, policy_version 251628 (0.0020) [2025-01-05 12:08:16,091][09057] Updated weights for policy 0, policy_version 251638 (0.0016) [2025-01-05 12:08:17,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19466.4). Total num frames: 1030742016. Throughput: 0: 4848.0. Samples: 7673878. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:17,842][08963] Avg episode reward: [(0, '9.331')] [2025-01-05 12:08:18,261][09057] Updated weights for policy 0, policy_version 251648 (0.0017) [2025-01-05 12:08:20,316][09057] Updated weights for policy 0, policy_version 251658 (0.0016) [2025-01-05 12:08:22,362][09057] Updated weights for policy 0, policy_version 251668 (0.0016) [2025-01-05 12:08:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19480.3). 
Total num frames: 1030840320. Throughput: 0: 4864.2. Samples: 7703416. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:22,842][08963] Avg episode reward: [(0, '10.792')] [2025-01-05 12:08:24,511][09057] Updated weights for policy 0, policy_version 251678 (0.0017) [2025-01-05 12:08:26,587][09057] Updated weights for policy 0, policy_version 251688 (0.0016) [2025-01-05 12:08:27,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19466.4). Total num frames: 1030934528. Throughput: 0: 4857.1. Samples: 7717786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:27,842][08963] Avg episode reward: [(0, '9.363')] [2025-01-05 12:08:28,715][09057] Updated weights for policy 0, policy_version 251698 (0.0017) [2025-01-05 12:08:30,781][09057] Updated weights for policy 0, policy_version 251708 (0.0016) [2025-01-05 12:08:32,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19466.4). Total num frames: 1031032832. Throughput: 0: 4855.5. Samples: 7747074. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:32,842][08963] Avg episode reward: [(0, '9.360')] [2025-01-05 12:08:32,903][09057] Updated weights for policy 0, policy_version 251718 (0.0016) [2025-01-05 12:08:35,015][09057] Updated weights for policy 0, policy_version 251728 (0.0016) [2025-01-05 12:08:37,092][09057] Updated weights for policy 0, policy_version 251738 (0.0015) [2025-01-05 12:08:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1031131136. Throughput: 0: 4854.2. Samples: 7776286. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:37,842][08963] Avg episode reward: [(0, '8.711')] [2025-01-05 12:08:39,264][09057] Updated weights for policy 0, policy_version 251748 (0.0017) [2025-01-05 12:08:41,303][09057] Updated weights for policy 0, policy_version 251758 (0.0016) [2025-01-05 12:08:42,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1031229440. 
Throughput: 0: 4860.6. Samples: 7790730. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:42,842][08963] Avg episode reward: [(0, '10.679')] [2025-01-05 12:08:43,486][09057] Updated weights for policy 0, policy_version 251768 (0.0017) [2025-01-05 12:08:45,584][09057] Updated weights for policy 0, policy_version 251778 (0.0016) [2025-01-05 12:08:47,586][09057] Updated weights for policy 0, policy_version 251788 (0.0016) [2025-01-05 12:08:47,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.1, 300 sec: 19466.4). Total num frames: 1031327744. Throughput: 0: 4865.5. Samples: 7820016. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:08:47,842][08963] Avg episode reward: [(0, '10.305')] [2025-01-05 12:08:49,653][09057] Updated weights for policy 0, policy_version 251798 (0.0016) [2025-01-05 12:08:51,696][09057] Updated weights for policy 0, policy_version 251808 (0.0016) [2025-01-05 12:08:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19466.4). Total num frames: 1031426048. Throughput: 0: 4880.1. Samples: 7849918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:08:52,842][08963] Avg episode reward: [(0, '10.705')] [2025-01-05 12:08:53,796][09057] Updated weights for policy 0, policy_version 251818 (0.0017) [2025-01-05 12:08:55,943][09057] Updated weights for policy 0, policy_version 251828 (0.0017) [2025-01-05 12:08:57,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1031520256. Throughput: 0: 4880.4. Samples: 7864278. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:08:57,842][08963] Avg episode reward: [(0, '9.137')] [2025-01-05 12:08:58,139][09057] Updated weights for policy 0, policy_version 251838 (0.0017) [2025-01-05 12:09:00,182][09057] Updated weights for policy 0, policy_version 251848 (0.0017) [2025-01-05 12:09:02,270][09057] Updated weights for policy 0, policy_version 251858 (0.0016) [2025-01-05 12:09:02,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1031618560. Throughput: 0: 4879.6. Samples: 7893460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:02,842][08963] Avg episode reward: [(0, '9.714')] [2025-01-05 12:09:04,420][09057] Updated weights for policy 0, policy_version 251868 (0.0017) [2025-01-05 12:09:06,472][09057] Updated weights for policy 0, policy_version 251878 (0.0016) [2025-01-05 12:09:07,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1031716864. Throughput: 0: 4869.1. Samples: 7922524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:07,842][08963] Avg episode reward: [(0, '9.415')] [2025-01-05 12:09:08,653][09057] Updated weights for policy 0, policy_version 251888 (0.0018) [2025-01-05 12:09:10,729][09057] Updated weights for policy 0, policy_version 251898 (0.0016) [2025-01-05 12:09:12,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19456.0, 300 sec: 19452.5). Total num frames: 1031811072. Throughput: 0: 4874.0. Samples: 7937116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:12,842][08963] Avg episode reward: [(0, '8.893')] [2025-01-05 12:09:12,870][09057] Updated weights for policy 0, policy_version 251908 (0.0017) [2025-01-05 12:09:15,066][09057] Updated weights for policy 0, policy_version 251918 (0.0017) [2025-01-05 12:09:17,157][09057] Updated weights for policy 0, policy_version 251928 (0.0016) [2025-01-05 12:09:17,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19438.6). 
Total num frames: 1031909376. Throughput: 0: 4857.4. Samples: 7965656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:17,842][08963] Avg episode reward: [(0, '10.520')] [2025-01-05 12:09:19,266][09057] Updated weights for policy 0, policy_version 251938 (0.0017) [2025-01-05 12:09:21,387][09057] Updated weights for policy 0, policy_version 251948 (0.0018) [2025-01-05 12:09:22,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19424.8). Total num frames: 1032003584. Throughput: 0: 4852.4. Samples: 7994644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:22,842][08963] Avg episode reward: [(0, '10.405')] [2025-01-05 12:09:23,538][09057] Updated weights for policy 0, policy_version 251958 (0.0017) [2025-01-05 12:09:25,615][09057] Updated weights for policy 0, policy_version 251968 (0.0016) [2025-01-05 12:09:27,698][09057] Updated weights for policy 0, policy_version 251978 (0.0016) [2025-01-05 12:09:27,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19424.8). Total num frames: 1032101888. Throughput: 0: 4854.4. Samples: 8009176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:27,842][08963] Avg episode reward: [(0, '9.832')] [2025-01-05 12:09:29,815][09057] Updated weights for policy 0, policy_version 251988 (0.0017) [2025-01-05 12:09:31,882][09057] Updated weights for policy 0, policy_version 251998 (0.0017) [2025-01-05 12:09:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19424.8). Total num frames: 1032200192. Throughput: 0: 4856.3. Samples: 8038550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:32,842][08963] Avg episode reward: [(0, '10.431')] [2025-01-05 12:09:34,091][09057] Updated weights for policy 0, policy_version 252008 (0.0018) [2025-01-05 12:09:36,194][09057] Updated weights for policy 0, policy_version 252018 (0.0017) [2025-01-05 12:09:37,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19387.7, 300 sec: 19410.9). Total num frames: 1032294400. 
Throughput: 0: 4828.6. Samples: 8067204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:37,842][08963] Avg episode reward: [(0, '9.220')] [2025-01-05 12:09:38,320][09057] Updated weights for policy 0, policy_version 252028 (0.0018) [2025-01-05 12:09:40,440][09057] Updated weights for policy 0, policy_version 252038 (0.0018) [2025-01-05 12:09:42,455][09057] Updated weights for policy 0, policy_version 252048 (0.0016) [2025-01-05 12:09:42,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19410.9). Total num frames: 1032392704. Throughput: 0: 4837.3. Samples: 8081958. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:42,842][08963] Avg episode reward: [(0, '9.835')] [2025-01-05 12:09:44,490][09057] Updated weights for policy 0, policy_version 252058 (0.0016) [2025-01-05 12:09:46,605][09057] Updated weights for policy 0, policy_version 252068 (0.0016) [2025-01-05 12:09:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19410.9). Total num frames: 1032491008. Throughput: 0: 4853.4. Samples: 8111864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:47,842][08963] Avg episode reward: [(0, '9.261')] [2025-01-05 12:09:47,874][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000252074_1032495104.pth... [2025-01-05 12:09:47,929][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000250937_1027837952.pth [2025-01-05 12:09:48,725][09057] Updated weights for policy 0, policy_version 252078 (0.0016) [2025-01-05 12:09:50,851][09057] Updated weights for policy 0, policy_version 252088 (0.0017) [2025-01-05 12:09:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19397.0). Total num frames: 1032589312. Throughput: 0: 4848.3. Samples: 8140698. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:52,842][08963] Avg episode reward: [(0, '8.705')] [2025-01-05 12:09:52,967][09057] Updated weights for policy 0, policy_version 252098 (0.0017) [2025-01-05 12:09:54,994][09057] Updated weights for policy 0, policy_version 252108 (0.0016) [2025-01-05 12:09:57,055][09057] Updated weights for policy 0, policy_version 252118 (0.0018) [2025-01-05 12:09:57,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19410.9). Total num frames: 1032687616. Throughput: 0: 4856.2. Samples: 8155644. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:09:57,842][08963] Avg episode reward: [(0, '10.872')] [2025-01-05 12:09:59,184][09057] Updated weights for policy 0, policy_version 252128 (0.0017) [2025-01-05 12:10:01,200][09057] Updated weights for policy 0, policy_version 252138 (0.0016) [2025-01-05 12:10:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19410.9). Total num frames: 1032785920. Throughput: 0: 4881.5. Samples: 8185322. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:02,842][08963] Avg episode reward: [(0, '10.439')] [2025-01-05 12:10:03,246][09057] Updated weights for policy 0, policy_version 252148 (0.0017) [2025-01-05 12:10:05,335][09057] Updated weights for policy 0, policy_version 252158 (0.0016) [2025-01-05 12:10:07,360][09057] Updated weights for policy 0, policy_version 252168 (0.0016) [2025-01-05 12:10:07,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19524.3, 300 sec: 19424.8). Total num frames: 1032888320. Throughput: 0: 4900.6. Samples: 8215170. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:07,842][08963] Avg episode reward: [(0, '9.541')] [2025-01-05 12:10:09,499][09057] Updated weights for policy 0, policy_version 252178 (0.0017) [2025-01-05 12:10:11,645][09057] Updated weights for policy 0, policy_version 252188 (0.0017) [2025-01-05 12:10:12,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.2, 300 sec: 19410.9). 
Total num frames: 1032982528. Throughput: 0: 4896.7. Samples: 8229528. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:12,842][08963] Avg episode reward: [(0, '9.830')] [2025-01-05 12:10:13,764][09057] Updated weights for policy 0, policy_version 252198 (0.0018) [2025-01-05 12:10:15,822][09057] Updated weights for policy 0, policy_version 252208 (0.0017) [2025-01-05 12:10:17,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19410.9). Total num frames: 1033080832. Throughput: 0: 4891.1. Samples: 8258648. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:17,843][08963] Avg episode reward: [(0, '10.573')] [2025-01-05 12:10:18,028][09057] Updated weights for policy 0, policy_version 252218 (0.0018) [2025-01-05 12:10:20,196][09057] Updated weights for policy 0, policy_version 252228 (0.0018) [2025-01-05 12:10:22,248][09057] Updated weights for policy 0, policy_version 252238 (0.0017) [2025-01-05 12:10:22,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19524.2, 300 sec: 19410.9). Total num frames: 1033175040. Throughput: 0: 4892.9. Samples: 8287386. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:22,842][08963] Avg episode reward: [(0, '9.700')] [2025-01-05 12:10:24,447][09057] Updated weights for policy 0, policy_version 252248 (0.0017) [2025-01-05 12:10:26,538][09057] Updated weights for policy 0, policy_version 252258 (0.0016) [2025-01-05 12:10:27,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19410.9). Total num frames: 1033273344. Throughput: 0: 4884.2. Samples: 8301746. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:27,842][08963] Avg episode reward: [(0, '9.506')] [2025-01-05 12:10:28,654][09057] Updated weights for policy 0, policy_version 252268 (0.0017) [2025-01-05 12:10:30,761][09057] Updated weights for policy 0, policy_version 252278 (0.0017) [2025-01-05 12:10:32,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1033367552. 
Throughput: 0: 4866.0. Samples: 8330832. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:32,843][08963] Avg episode reward: [(0, '11.304')] [2025-01-05 12:10:32,912][09057] Updated weights for policy 0, policy_version 252288 (0.0018) [2025-01-05 12:10:35,056][09057] Updated weights for policy 0, policy_version 252298 (0.0017) [2025-01-05 12:10:37,198][09057] Updated weights for policy 0, policy_version 252308 (0.0016) [2025-01-05 12:10:37,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19397.0). Total num frames: 1033465856. Throughput: 0: 4864.6. Samples: 8359606. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:37,842][08963] Avg episode reward: [(0, '9.176')] [2025-01-05 12:10:39,321][09057] Updated weights for policy 0, policy_version 252318 (0.0018) [2025-01-05 12:10:41,395][09057] Updated weights for policy 0, policy_version 252328 (0.0016) [2025-01-05 12:10:42,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1033560064. Throughput: 0: 4851.9. Samples: 8373978. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:42,842][08963] Avg episode reward: [(0, '9.425')] [2025-01-05 12:10:43,578][09057] Updated weights for policy 0, policy_version 252338 (0.0018) [2025-01-05 12:10:45,675][09057] Updated weights for policy 0, policy_version 252348 (0.0019) [2025-01-05 12:10:47,793][09057] Updated weights for policy 0, policy_version 252358 (0.0017) [2025-01-05 12:10:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19410.9). Total num frames: 1033658368. Throughput: 0: 4837.7. Samples: 8403020. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:47,842][08963] Avg episode reward: [(0, '10.929')] [2025-01-05 12:10:49,983][09057] Updated weights for policy 0, policy_version 252368 (0.0017) [2025-01-05 12:10:52,098][09057] Updated weights for policy 0, policy_version 252378 (0.0017) [2025-01-05 12:10:52,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19387.7, 300 sec: 19397.0). Total num frames: 1033752576. Throughput: 0: 4811.4. Samples: 8431682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:52,842][08963] Avg episode reward: [(0, '9.298')] [2025-01-05 12:10:54,224][09057] Updated weights for policy 0, policy_version 252388 (0.0017) [2025-01-05 12:10:56,355][09057] Updated weights for policy 0, policy_version 252398 (0.0017) [2025-01-05 12:10:57,842][08963] Fps is (10 sec: 18841.9, 60 sec: 19319.5, 300 sec: 19383.1). Total num frames: 1033846784. Throughput: 0: 4811.4. Samples: 8446040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:10:57,842][08963] Avg episode reward: [(0, '8.969')] [2025-01-05 12:10:58,516][09057] Updated weights for policy 0, policy_version 252408 (0.0017) [2025-01-05 12:11:00,576][09057] Updated weights for policy 0, policy_version 252418 (0.0017) [2025-01-05 12:11:02,629][09057] Updated weights for policy 0, policy_version 252428 (0.0016) [2025-01-05 12:11:02,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19397.0). Total num frames: 1033945088. Throughput: 0: 4814.6. Samples: 8475304. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:02,842][08963] Avg episode reward: [(0, '9.232')] [2025-01-05 12:11:04,795][09057] Updated weights for policy 0, policy_version 252438 (0.0017) [2025-01-05 12:11:06,868][09057] Updated weights for policy 0, policy_version 252448 (0.0019) [2025-01-05 12:11:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1034043392. Throughput: 0: 4821.7. Samples: 8504364. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:07,842][08963] Avg episode reward: [(0, '9.855')] [2025-01-05 12:11:09,039][09057] Updated weights for policy 0, policy_version 252458 (0.0018) [2025-01-05 12:11:11,148][09057] Updated weights for policy 0, policy_version 252468 (0.0017) [2025-01-05 12:11:12,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19397.0). Total num frames: 1034137600. Throughput: 0: 4820.6. Samples: 8518672. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:12,842][08963] Avg episode reward: [(0, '10.804')] [2025-01-05 12:11:13,323][09057] Updated weights for policy 0, policy_version 252478 (0.0018) [2025-01-05 12:11:15,381][09057] Updated weights for policy 0, policy_version 252488 (0.0017) [2025-01-05 12:11:17,438][09057] Updated weights for policy 0, policy_version 252498 (0.0016) [2025-01-05 12:11:17,842][08963] Fps is (10 sec: 19250.1, 60 sec: 19251.0, 300 sec: 19410.8). Total num frames: 1034235904. Throughput: 0: 4824.1. Samples: 8547920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:17,843][08963] Avg episode reward: [(0, '8.683')] [2025-01-05 12:11:19,587][09057] Updated weights for policy 0, policy_version 252508 (0.0017) [2025-01-05 12:11:21,658][09057] Updated weights for policy 0, policy_version 252518 (0.0015) [2025-01-05 12:11:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1034334208. Throughput: 0: 4833.9. Samples: 8577130. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:22,842][08963] Avg episode reward: [(0, '10.316')] [2025-01-05 12:11:23,763][09057] Updated weights for policy 0, policy_version 252528 (0.0016) [2025-01-05 12:11:25,860][09057] Updated weights for policy 0, policy_version 252538 (0.0016) [2025-01-05 12:11:27,842][08963] Fps is (10 sec: 19661.7, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1034432512. Throughput: 0: 4839.5. Samples: 8591756. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:27,843][08963] Avg episode reward: [(0, '9.497')] [2025-01-05 12:11:28,028][09057] Updated weights for policy 0, policy_version 252548 (0.0016) [2025-01-05 12:11:30,038][09057] Updated weights for policy 0, policy_version 252558 (0.0016) [2025-01-05 12:11:32,086][09057] Updated weights for policy 0, policy_version 252568 (0.0015) [2025-01-05 12:11:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19424.8). Total num frames: 1034530816. Throughput: 0: 4852.7. Samples: 8621392. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:32,842][08963] Avg episode reward: [(0, '10.492')] [2025-01-05 12:11:34,230][09057] Updated weights for policy 0, policy_version 252578 (0.0017) [2025-01-05 12:11:36,220][09057] Updated weights for policy 0, policy_version 252588 (0.0015) [2025-01-05 12:11:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19387.8, 300 sec: 19438.7). Total num frames: 1034629120. Throughput: 0: 4873.7. Samples: 8650998. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:37,842][08963] Avg episode reward: [(0, '8.346')] [2025-01-05 12:11:38,303][09057] Updated weights for policy 0, policy_version 252598 (0.0015) [2025-01-05 12:11:40,381][09057] Updated weights for policy 0, policy_version 252608 (0.0015) [2025-01-05 12:11:42,364][09057] Updated weights for policy 0, policy_version 252618 (0.0016) [2025-01-05 12:11:42,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19438.7). Total num frames: 1034727424. Throughput: 0: 4887.2. Samples: 8665966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:42,842][08963] Avg episode reward: [(0, '10.512')] [2025-01-05 12:11:44,439][09057] Updated weights for policy 0, policy_version 252628 (0.0015) [2025-01-05 12:11:46,552][09057] Updated weights for policy 0, policy_version 252638 (0.0016) [2025-01-05 12:11:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19452.5). 
Total num frames: 1034825728. Throughput: 0: 4899.1. Samples: 8695764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:47,842][08963] Avg episode reward: [(0, '9.064')] [2025-01-05 12:11:47,866][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000252644_1034829824.pth... [2025-01-05 12:11:47,917][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000251504_1030160384.pth [2025-01-05 12:11:48,673][09057] Updated weights for policy 0, policy_version 252648 (0.0017) [2025-01-05 12:11:50,831][09057] Updated weights for policy 0, policy_version 252658 (0.0017) [2025-01-05 12:11:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19452.5). Total num frames: 1034924032. Throughput: 0: 4897.7. Samples: 8724760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:52,842][08963] Avg episode reward: [(0, '9.625')] [2025-01-05 12:11:52,904][09057] Updated weights for policy 0, policy_version 252668 (0.0016) [2025-01-05 12:11:54,861][09057] Updated weights for policy 0, policy_version 252678 (0.0015) [2025-01-05 12:11:56,946][09057] Updated weights for policy 0, policy_version 252688 (0.0016) [2025-01-05 12:11:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1035026432. Throughput: 0: 4915.4. Samples: 8739864. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:11:57,842][08963] Avg episode reward: [(0, '11.308')] [2025-01-05 12:11:59,132][09057] Updated weights for policy 0, policy_version 252698 (0.0017) [2025-01-05 12:12:01,147][09057] Updated weights for policy 0, policy_version 252708 (0.0017) [2025-01-05 12:12:02,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1035124736. Throughput: 0: 4917.1. Samples: 8769188. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:12:02,842][08963] Avg episode reward: [(0, '9.354')] [2025-01-05 12:12:03,230][09057] Updated weights for policy 0, policy_version 252718 (0.0017) [2025-01-05 12:12:05,322][09057] Updated weights for policy 0, policy_version 252728 (0.0016) [2025-01-05 12:12:07,316][09057] Updated weights for policy 0, policy_version 252738 (0.0016) [2025-01-05 12:12:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1035223040. Throughput: 0: 4933.9. Samples: 8799158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:12:07,842][08963] Avg episode reward: [(0, '9.450')] [2025-01-05 12:12:09,385][09057] Updated weights for policy 0, policy_version 252748 (0.0017) [2025-01-05 12:12:11,458][09057] Updated weights for policy 0, policy_version 252758 (0.0016) [2025-01-05 12:12:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19494.2). Total num frames: 1035321344. Throughput: 0: 4942.9. Samples: 8814188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:12:12,842][08963] Avg episode reward: [(0, '9.402')] [2025-01-05 12:12:13,538][09057] Updated weights for policy 0, policy_version 252768 (0.0017) [2025-01-05 12:12:15,670][09057] Updated weights for policy 0, policy_version 252778 (0.0018) [2025-01-05 12:12:17,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.9, 300 sec: 19466.4). Total num frames: 1035415552. Throughput: 0: 4928.2. Samples: 8843162. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:12:17,842][08963] Avg episode reward: [(0, '8.402')] [2025-01-05 12:12:17,853][09057] Updated weights for policy 0, policy_version 252788 (0.0020) [2025-01-05 12:12:19,969][09057] Updated weights for policy 0, policy_version 252798 (0.0017) [2025-01-05 12:12:22,061][09057] Updated weights for policy 0, policy_version 252808 (0.0017) [2025-01-05 12:12:22,842][08963] Fps is (10 sec: 19250.6, 60 sec: 19660.7, 300 sec: 19466.4). 
Total num frames: 1035513856. Throughput: 0: 4908.0. Samples: 8871858. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:12:22,843][08963] Avg episode reward: [(0, '10.755')] [2025-01-05 12:12:24,269][09057] Updated weights for policy 0, policy_version 252818 (0.0017) [2025-01-05 12:12:26,339][09057] Updated weights for policy 0, policy_version 252828 (0.0017) [2025-01-05 12:12:27,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1035612160. Throughput: 0: 4897.3. Samples: 8886344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:27,842][08963] Avg episode reward: [(0, '10.363')] [2025-01-05 12:12:28,440][09057] Updated weights for policy 0, policy_version 252838 (0.0017) [2025-01-05 12:12:30,561][09057] Updated weights for policy 0, policy_version 252848 (0.0016) [2025-01-05 12:12:32,595][09057] Updated weights for policy 0, policy_version 252858 (0.0015) [2025-01-05 12:12:32,842][08963] Fps is (10 sec: 19661.4, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1035710464. Throughput: 0: 4892.4. Samples: 8915922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:32,842][08963] Avg episode reward: [(0, '9.974')] [2025-01-05 12:12:34,696][09057] Updated weights for policy 0, policy_version 252868 (0.0017) [2025-01-05 12:12:36,806][09057] Updated weights for policy 0, policy_version 252878 (0.0016) [2025-01-05 12:12:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1035808768. Throughput: 0: 4896.9. Samples: 8945122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:37,842][08963] Avg episode reward: [(0, '9.497')] [2025-01-05 12:12:38,943][09057] Updated weights for policy 0, policy_version 252888 (0.0017) [2025-01-05 12:12:40,970][09057] Updated weights for policy 0, policy_version 252898 (0.0016) [2025-01-05 12:12:42,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1035902976. 
Throughput: 0: 4885.6. Samples: 8959716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:42,842][08963] Avg episode reward: [(0, '9.825')] [2025-01-05 12:12:43,157][09057] Updated weights for policy 0, policy_version 252908 (0.0017) [2025-01-05 12:12:45,268][09057] Updated weights for policy 0, policy_version 252918 (0.0017) [2025-01-05 12:12:47,260][09057] Updated weights for policy 0, policy_version 252928 (0.0016) [2025-01-05 12:12:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1036001280. Throughput: 0: 4881.9. Samples: 8988872. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:47,842][08963] Avg episode reward: [(0, '9.093')] [2025-01-05 12:12:49,354][09057] Updated weights for policy 0, policy_version 252938 (0.0016) [2025-01-05 12:12:51,381][09057] Updated weights for policy 0, policy_version 252948 (0.0016) [2025-01-05 12:12:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1036103680. Throughput: 0: 4880.0. Samples: 9018758. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:52,842][08963] Avg episode reward: [(0, '9.646')] [2025-01-05 12:12:53,470][09057] Updated weights for policy 0, policy_version 252958 (0.0018) [2025-01-05 12:12:55,619][09057] Updated weights for policy 0, policy_version 252968 (0.0017) [2025-01-05 12:12:57,623][09057] Updated weights for policy 0, policy_version 252978 (0.0016) [2025-01-05 12:12:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19592.5, 300 sec: 19494.2). Total num frames: 1036201984. Throughput: 0: 4872.9. Samples: 9033470. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:12:57,842][08963] Avg episode reward: [(0, '9.484')] [2025-01-05 12:12:59,638][09057] Updated weights for policy 0, policy_version 252988 (0.0016) [2025-01-05 12:13:01,745][09057] Updated weights for policy 0, policy_version 252998 (0.0016) [2025-01-05 12:13:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19494.2). Total num frames: 1036300288. Throughput: 0: 4896.6. Samples: 9063510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:02,842][08963] Avg episode reward: [(0, '9.691')] [2025-01-05 12:13:03,866][09057] Updated weights for policy 0, policy_version 253008 (0.0017) [2025-01-05 12:13:05,904][09057] Updated weights for policy 0, policy_version 253018 (0.0017) [2025-01-05 12:13:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19508.1). Total num frames: 1036398592. Throughput: 0: 4901.5. Samples: 9092424. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:07,842][08963] Avg episode reward: [(0, '10.045')] [2025-01-05 12:13:08,112][09057] Updated weights for policy 0, policy_version 253028 (0.0017) [2025-01-05 12:13:10,204][09057] Updated weights for policy 0, policy_version 253038 (0.0017) [2025-01-05 12:13:12,209][09057] Updated weights for policy 0, policy_version 253048 (0.0019) [2025-01-05 12:13:12,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19508.1). Total num frames: 1036496896. Throughput: 0: 4906.1. Samples: 9107118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:12,842][08963] Avg episode reward: [(0, '10.062')] [2025-01-05 12:13:14,311][09057] Updated weights for policy 0, policy_version 253058 (0.0017) [2025-01-05 12:13:16,329][09057] Updated weights for policy 0, policy_version 253068 (0.0016) [2025-01-05 12:13:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1036595200. Throughput: 0: 4915.6. Samples: 9137122. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:17,842][08963] Avg episode reward: [(0, '9.476')] [2025-01-05 12:13:18,390][09057] Updated weights for policy 0, policy_version 253078 (0.0016) [2025-01-05 12:13:20,504][09057] Updated weights for policy 0, policy_version 253088 (0.0016) [2025-01-05 12:13:22,512][09057] Updated weights for policy 0, policy_version 253098 (0.0016) [2025-01-05 12:13:22,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.9, 300 sec: 19521.9). Total num frames: 1036693504. Throughput: 0: 4928.1. Samples: 9166888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:22,842][08963] Avg episode reward: [(0, '9.158')] [2025-01-05 12:13:24,536][09057] Updated weights for policy 0, policy_version 253108 (0.0016) [2025-01-05 12:13:26,642][09057] Updated weights for policy 0, policy_version 253118 (0.0018) [2025-01-05 12:13:27,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19521.9). Total num frames: 1036791808. Throughput: 0: 4938.3. Samples: 9181940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:27,842][08963] Avg episode reward: [(0, '10.334')] [2025-01-05 12:13:28,748][09057] Updated weights for policy 0, policy_version 253128 (0.0017) [2025-01-05 12:13:30,845][09057] Updated weights for policy 0, policy_version 253138 (0.0017) [2025-01-05 12:13:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19522.0). Total num frames: 1036890112. Throughput: 0: 4935.6. Samples: 9210974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:13:32,842][08963] Avg episode reward: [(0, '10.737')] [2025-01-05 12:13:33,029][09057] Updated weights for policy 0, policy_version 253148 (0.0017) [2025-01-05 12:13:35,071][09057] Updated weights for policy 0, policy_version 253158 (0.0017) [2025-01-05 12:13:37,117][09057] Updated weights for policy 0, policy_version 253168 (0.0016) [2025-01-05 12:13:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19521.9). 
Total num frames: 1036988416. Throughput: 0: 4926.4. Samples: 9240446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:13:37,842][08963] Avg episode reward: [(0, '10.026')] [2025-01-05 12:13:39,282][09057] Updated weights for policy 0, policy_version 253178 (0.0017) [2025-01-05 12:13:41,316][09057] Updated weights for policy 0, policy_version 253188 (0.0017) [2025-01-05 12:13:42,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1037086720. Throughput: 0: 4921.1. Samples: 9254918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:13:42,842][08963] Avg episode reward: [(0, '9.860')] [2025-01-05 12:13:43,442][09057] Updated weights for policy 0, policy_version 253198 (0.0017) [2025-01-05 12:13:45,549][09057] Updated weights for policy 0, policy_version 253208 (0.0016) [2025-01-05 12:13:47,558][09057] Updated weights for policy 0, policy_version 253218 (0.0017) [2025-01-05 12:13:47,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1037185024. Throughput: 0: 4911.5. Samples: 9284528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:13:47,842][08963] Avg episode reward: [(0, '11.104')] [2025-01-05 12:13:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000253219_1037185024.pth... [2025-01-05 12:13:47,903][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000252074_1032495104.pth [2025-01-05 12:13:49,631][09057] Updated weights for policy 0, policy_version 253228 (0.0016) [2025-01-05 12:13:51,716][09057] Updated weights for policy 0, policy_version 253238 (0.0016) [2025-01-05 12:13:52,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1037283328. Throughput: 0: 4927.4. Samples: 9314154. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:13:52,842][08963] Avg episode reward: [(0, '10.097')] [2025-01-05 12:13:53,820][09057] Updated weights for policy 0, policy_version 253248 (0.0017) [2025-01-05 12:13:55,862][09057] Updated weights for policy 0, policy_version 253258 (0.0016) [2025-01-05 12:13:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1037381632. Throughput: 0: 4926.6. Samples: 9328814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:13:57,843][08963] Avg episode reward: [(0, '8.953')] [2025-01-05 12:13:58,025][09057] Updated weights for policy 0, policy_version 253268 (0.0017) [2025-01-05 12:14:00,099][09057] Updated weights for policy 0, policy_version 253278 (0.0016) [2025-01-05 12:14:02,121][09057] Updated weights for policy 0, policy_version 253288 (0.0017) [2025-01-05 12:14:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1037479936. Throughput: 0: 4914.4. Samples: 9358272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:02,842][08963] Avg episode reward: [(0, '9.823')] [2025-01-05 12:14:04,299][09057] Updated weights for policy 0, policy_version 253298 (0.0017) [2025-01-05 12:14:06,411][09057] Updated weights for policy 0, policy_version 253308 (0.0018) [2025-01-05 12:14:07,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19592.6, 300 sec: 19535.8). Total num frames: 1037574144. Throughput: 0: 4895.6. Samples: 9387190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:07,842][08963] Avg episode reward: [(0, '10.856')] [2025-01-05 12:14:08,509][09057] Updated weights for policy 0, policy_version 253318 (0.0017) [2025-01-05 12:14:10,650][09057] Updated weights for policy 0, policy_version 253328 (0.0017) [2025-01-05 12:14:12,737][09057] Updated weights for policy 0, policy_version 253338 (0.0016) [2025-01-05 12:14:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19535.8). 
Total num frames: 1037672448. Throughput: 0: 4885.0. Samples: 9401766. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:12,842][08963] Avg episode reward: [(0, '9.396')] [2025-01-05 12:14:14,812][09057] Updated weights for policy 0, policy_version 253348 (0.0017) [2025-01-05 12:14:16,941][09057] Updated weights for policy 0, policy_version 253358 (0.0017) [2025-01-05 12:14:17,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19592.5, 300 sec: 19549.7). Total num frames: 1037770752. Throughput: 0: 4888.4. Samples: 9430954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:17,842][08963] Avg episode reward: [(0, '10.422')] [2025-01-05 12:14:19,088][09057] Updated weights for policy 0, policy_version 253368 (0.0016) [2025-01-05 12:14:21,086][09057] Updated weights for policy 0, policy_version 253378 (0.0016) [2025-01-05 12:14:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.6, 300 sec: 19549.7). Total num frames: 1037869056. Throughput: 0: 4889.7. Samples: 9460480. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:22,842][08963] Avg episode reward: [(0, '10.149')] [2025-01-05 12:14:23,192][09057] Updated weights for policy 0, policy_version 253388 (0.0017) [2025-01-05 12:14:25,283][09057] Updated weights for policy 0, policy_version 253398 (0.0017) [2025-01-05 12:14:27,255][09057] Updated weights for policy 0, policy_version 253408 (0.0016) [2025-01-05 12:14:27,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19549.7). Total num frames: 1037967360. Throughput: 0: 4899.4. Samples: 9475392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:27,842][08963] Avg episode reward: [(0, '9.754')] [2025-01-05 12:14:29,358][09057] Updated weights for policy 0, policy_version 253418 (0.0016) [2025-01-05 12:14:31,471][09057] Updated weights for policy 0, policy_version 253428 (0.0016) [2025-01-05 12:14:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1038065664. 
Throughput: 0: 4904.7. Samples: 9505240. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:32,843][08963] Avg episode reward: [(0, '9.595')] [2025-01-05 12:14:33,530][09057] Updated weights for policy 0, policy_version 253438 (0.0020) [2025-01-05 12:14:35,713][09057] Updated weights for policy 0, policy_version 253448 (0.0018) [2025-01-05 12:14:37,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1038159872. Throughput: 0: 4882.8. Samples: 9533880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:37,843][08963] Avg episode reward: [(0, '8.794')] [2025-01-05 12:14:37,909][09057] Updated weights for policy 0, policy_version 253458 (0.0017) [2025-01-05 12:14:39,957][09057] Updated weights for policy 0, policy_version 253468 (0.0017) [2025-01-05 12:14:42,057][09057] Updated weights for policy 0, policy_version 253478 (0.0019) [2025-01-05 12:14:42,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1038258176. Throughput: 0: 4875.7. Samples: 9548218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:14:42,843][08963] Avg episode reward: [(0, '10.079')] [2025-01-05 12:14:44,249][09057] Updated weights for policy 0, policy_version 253488 (0.0016) [2025-01-05 12:14:46,253][09057] Updated weights for policy 0, policy_version 253498 (0.0017) [2025-01-05 12:14:47,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.2, 300 sec: 19549.7). Total num frames: 1038356480. Throughput: 0: 4874.1. Samples: 9577606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:14:47,842][08963] Avg episode reward: [(0, '9.226')] [2025-01-05 12:14:48,336][09057] Updated weights for policy 0, policy_version 253508 (0.0015) [2025-01-05 12:14:50,444][09057] Updated weights for policy 0, policy_version 253518 (0.0016) [2025-01-05 12:14:51,076][09024] Signal inference workers to stop experience collection... 
(450 times) [2025-01-05 12:14:51,079][09024] Signal inference workers to resume experience collection... (450 times) [2025-01-05 12:14:51,093][09057] InferenceWorker_p0-w0: stopping experience collection (450 times) [2025-01-05 12:14:51,093][09057] InferenceWorker_p0-w0: resuming experience collection (450 times) [2025-01-05 12:14:52,423][09057] Updated weights for policy 0, policy_version 253528 (0.0016) [2025-01-05 12:14:52,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1038454784. Throughput: 0: 4894.2. Samples: 9607430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:14:52,842][08963] Avg episode reward: [(0, '9.788')] [2025-01-05 12:14:54,512][09057] Updated weights for policy 0, policy_version 253538 (0.0014) [2025-01-05 12:14:56,604][09057] Updated weights for policy 0, policy_version 253548 (0.0015) [2025-01-05 12:14:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1038553088. Throughput: 0: 4903.0. Samples: 9622400. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:14:57,842][08963] Avg episode reward: [(0, '8.962')] [2025-01-05 12:14:58,697][09057] Updated weights for policy 0, policy_version 253558 (0.0017) [2025-01-05 12:15:00,809][09057] Updated weights for policy 0, policy_version 253568 (0.0016) [2025-01-05 12:15:02,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19524.3, 300 sec: 19535.8). Total num frames: 1038651392. Throughput: 0: 4898.0. Samples: 9651364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:02,842][08963] Avg episode reward: [(0, '9.183')] [2025-01-05 12:15:03,039][09057] Updated weights for policy 0, policy_version 253578 (0.0018) [2025-01-05 12:15:05,137][09057] Updated weights for policy 0, policy_version 253588 (0.0017) [2025-01-05 12:15:07,234][09057] Updated weights for policy 0, policy_version 253598 (0.0016) [2025-01-05 12:15:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19535.8). 
Total num frames: 1038745600. Throughput: 0: 4882.4. Samples: 9680190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:07,842][08963] Avg episode reward: [(0, '9.242')] [2025-01-05 12:15:09,420][09057] Updated weights for policy 0, policy_version 253608 (0.0016) [2025-01-05 12:15:11,440][09057] Updated weights for policy 0, policy_version 253618 (0.0016) [2025-01-05 12:15:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19535.8). Total num frames: 1038843904. Throughput: 0: 4870.8. Samples: 9694580. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:12,844][08963] Avg episode reward: [(0, '10.089')] [2025-01-05 12:15:13,597][09057] Updated weights for policy 0, policy_version 253628 (0.0016) [2025-01-05 12:15:15,730][09057] Updated weights for policy 0, policy_version 253638 (0.0017) [2025-01-05 12:15:17,816][09057] Updated weights for policy 0, policy_version 253648 (0.0017) [2025-01-05 12:15:17,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1038942208. Throughput: 0: 4854.7. Samples: 9723704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:17,842][08963] Avg episode reward: [(0, '10.083')] [2025-01-05 12:15:19,968][09057] Updated weights for policy 0, policy_version 253658 (0.0016) [2025-01-05 12:15:22,138][09057] Updated weights for policy 0, policy_version 253668 (0.0015) [2025-01-05 12:15:22,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19456.0, 300 sec: 19535.8). Total num frames: 1039036416. Throughput: 0: 4860.1. Samples: 9752582. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:22,842][08963] Avg episode reward: [(0, '9.824')] [2025-01-05 12:15:24,232][09057] Updated weights for policy 0, policy_version 253678 (0.0020) [2025-01-05 12:15:26,297][09057] Updated weights for policy 0, policy_version 253688 (0.0017) [2025-01-05 12:15:27,842][08963] Fps is (10 sec: 18841.7, 60 sec: 19387.7, 300 sec: 19535.8). Total num frames: 1039130624. 
Throughput: 0: 4859.1. Samples: 9766876. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:27,842][08963] Avg episode reward: [(0, '9.697')] [2025-01-05 12:15:28,533][09057] Updated weights for policy 0, policy_version 253698 (0.0017) [2025-01-05 12:15:30,552][09057] Updated weights for policy 0, policy_version 253708 (0.0016) [2025-01-05 12:15:32,573][09057] Updated weights for policy 0, policy_version 253718 (0.0016) [2025-01-05 12:15:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19549.7). Total num frames: 1039233024. Throughput: 0: 4860.8. Samples: 9796342. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:32,842][08963] Avg episode reward: [(0, '10.441')] [2025-01-05 12:15:34,788][09057] Updated weights for policy 0, policy_version 253728 (0.0018) [2025-01-05 12:15:36,794][09057] Updated weights for policy 0, policy_version 253738 (0.0016) [2025-01-05 12:15:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19456.1, 300 sec: 19549.7). Total num frames: 1039327232. Throughput: 0: 4852.7. Samples: 9825800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:37,842][08963] Avg episode reward: [(0, '10.109')] [2025-01-05 12:15:38,842][09057] Updated weights for policy 0, policy_version 253748 (0.0016) [2025-01-05 12:15:40,931][09057] Updated weights for policy 0, policy_version 253758 (0.0016) [2025-01-05 12:15:42,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19549.7). Total num frames: 1039425536. Throughput: 0: 4853.8. Samples: 9840822. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:42,842][08963] Avg episode reward: [(0, '9.949')] [2025-01-05 12:15:43,025][09057] Updated weights for policy 0, policy_version 253768 (0.0017) [2025-01-05 12:15:45,143][09057] Updated weights for policy 0, policy_version 253778 (0.0016) [2025-01-05 12:15:47,269][09057] Updated weights for policy 0, policy_version 253788 (0.0017) [2025-01-05 12:15:47,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1039523840. Throughput: 0: 4857.3. Samples: 9869942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:47,842][08963] Avg episode reward: [(0, '9.448')] [2025-01-05 12:15:47,912][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000253791_1039527936.pth... [2025-01-05 12:15:47,967][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000252644_1034829824.pth [2025-01-05 12:15:49,384][09057] Updated weights for policy 0, policy_version 253798 (0.0017) [2025-01-05 12:15:51,465][09057] Updated weights for policy 0, policy_version 253808 (0.0016) [2025-01-05 12:15:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1039622144. Throughput: 0: 4856.4. Samples: 9898726. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:15:52,842][08963] Avg episode reward: [(0, '10.531')] [2025-01-05 12:15:53,695][09057] Updated weights for policy 0, policy_version 253818 (0.0017) [2025-01-05 12:15:55,737][09057] Updated weights for policy 0, policy_version 253828 (0.0020) [2025-01-05 12:15:57,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19563.6). Total num frames: 1039716352. Throughput: 0: 4856.0. Samples: 9913100. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:15:57,842][08963] Avg episode reward: [(0, '9.829')] [2025-01-05 12:15:57,908][09057] Updated weights for policy 0, policy_version 253838 (0.0017) [2025-01-05 12:16:00,152][09057] Updated weights for policy 0, policy_version 253848 (0.0017) [2025-01-05 12:16:02,160][09057] Updated weights for policy 0, policy_version 253858 (0.0017) [2025-01-05 12:16:02,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19563.6). Total num frames: 1039814656. Throughput: 0: 4844.2. Samples: 9941690. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:02,842][08963] Avg episode reward: [(0, '10.427')] [2025-01-05 12:16:04,224][09057] Updated weights for policy 0, policy_version 253868 (0.0017) [2025-01-05 12:16:06,334][09057] Updated weights for policy 0, policy_version 253878 (0.0016) [2025-01-05 12:16:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1039912960. Throughput: 0: 4863.5. Samples: 9971442. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:07,842][08963] Avg episode reward: [(0, '10.329')] [2025-01-05 12:16:08,375][09057] Updated weights for policy 0, policy_version 253888 (0.0016) [2025-01-05 12:16:10,477][09057] Updated weights for policy 0, policy_version 253898 (0.0016) [2025-01-05 12:16:12,589][09057] Updated weights for policy 0, policy_version 253908 (0.0017) [2025-01-05 12:16:12,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1040011264. Throughput: 0: 4873.8. Samples: 9986196. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:12,842][08963] Avg episode reward: [(0, '9.029')] [2025-01-05 12:16:14,683][09057] Updated weights for policy 0, policy_version 253918 (0.0017) [2025-01-05 12:16:16,758][09057] Updated weights for policy 0, policy_version 253928 (0.0015) [2025-01-05 12:16:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19577.5). 
Total num frames: 1040109568. Throughput: 0: 4868.6. Samples: 10015430. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:17,842][08963] Avg episode reward: [(0, '9.783')] [2025-01-05 12:16:18,984][09057] Updated weights for policy 0, policy_version 253938 (0.0017) [2025-01-05 12:16:20,948][09057] Updated weights for policy 0, policy_version 253948 (0.0016) [2025-01-05 12:16:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.2, 300 sec: 19577.5). Total num frames: 1040207872. Throughput: 0: 4871.3. Samples: 10045008. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:22,842][08963] Avg episode reward: [(0, '11.014')] [2025-01-05 12:16:22,991][09057] Updated weights for policy 0, policy_version 253958 (0.0016) [2025-01-05 12:16:25,100][09057] Updated weights for policy 0, policy_version 253968 (0.0016) [2025-01-05 12:16:27,077][09057] Updated weights for policy 0, policy_version 253978 (0.0019) [2025-01-05 12:16:27,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1040306176. Throughput: 0: 4871.7. Samples: 10060050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:27,842][08963] Avg episode reward: [(0, '10.820')] [2025-01-05 12:16:29,130][09057] Updated weights for policy 0, policy_version 253988 (0.0016) [2025-01-05 12:16:31,228][09057] Updated weights for policy 0, policy_version 253998 (0.0015) [2025-01-05 12:16:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19577.5). Total num frames: 1040404480. Throughput: 0: 4889.8. Samples: 10089984. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:32,842][08963] Avg episode reward: [(0, '9.476')] [2025-01-05 12:16:33,309][09057] Updated weights for policy 0, policy_version 254008 (0.0017) [2025-01-05 12:16:35,415][09057] Updated weights for policy 0, policy_version 254018 (0.0016) [2025-01-05 12:16:37,520][09057] Updated weights for policy 0, policy_version 254028 (0.0017) [2025-01-05 12:16:37,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1040502784. Throughput: 0: 4902.1. Samples: 10119322. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:37,842][08963] Avg episode reward: [(0, '9.734')] [2025-01-05 12:16:39,594][09057] Updated weights for policy 0, policy_version 254038 (0.0017) [2025-01-05 12:16:41,664][09057] Updated weights for policy 0, policy_version 254048 (0.0016) [2025-01-05 12:16:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1040601088. Throughput: 0: 4904.9. Samples: 10133818. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:42,842][08963] Avg episode reward: [(0, '10.384')] [2025-01-05 12:16:43,875][09057] Updated weights for policy 0, policy_version 254058 (0.0016) [2025-01-05 12:16:45,907][09057] Updated weights for policy 0, policy_version 254068 (0.0016) [2025-01-05 12:16:47,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1040699392. Throughput: 0: 4917.2. Samples: 10162964. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:47,842][08963] Avg episode reward: [(0, '10.341')] [2025-01-05 12:16:48,029][09057] Updated weights for policy 0, policy_version 254078 (0.0018) [2025-01-05 12:16:50,154][09057] Updated weights for policy 0, policy_version 254088 (0.0016) [2025-01-05 12:16:52,150][09057] Updated weights for policy 0, policy_version 254098 (0.0016) [2025-01-05 12:16:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.6, 300 sec: 19563.6). 
Total num frames: 1040797696. Throughput: 0: 4917.2. Samples: 10192716. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:52,842][08963] Avg episode reward: [(0, '10.227')] [2025-01-05 12:16:54,160][09057] Updated weights for policy 0, policy_version 254108 (0.0016) [2025-01-05 12:16:56,275][09057] Updated weights for policy 0, policy_version 254118 (0.0016) [2025-01-05 12:16:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19563.6). Total num frames: 1040896000. Throughput: 0: 4924.8. Samples: 10207812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:16:57,842][08963] Avg episode reward: [(0, '9.737')] [2025-01-05 12:16:58,280][09057] Updated weights for policy 0, policy_version 254128 (0.0016) [2025-01-05 12:17:00,312][09057] Updated weights for policy 0, policy_version 254138 (0.0016) [2025-01-05 12:17:02,414][09057] Updated weights for policy 0, policy_version 254148 (0.0016) [2025-01-05 12:17:02,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.0, 300 sec: 19577.5). Total num frames: 1040998400. Throughput: 0: 4942.0. Samples: 10237820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:17:02,842][08963] Avg episode reward: [(0, '9.695')] [2025-01-05 12:17:04,499][09057] Updated weights for policy 0, policy_version 254158 (0.0017) [2025-01-05 12:17:06,621][09057] Updated weights for policy 0, policy_version 254168 (0.0016) [2025-01-05 12:17:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19563.6). Total num frames: 1041092608. Throughput: 0: 4924.9. Samples: 10266626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:07,842][08963] Avg episode reward: [(0, '10.189')] [2025-01-05 12:17:08,841][09057] Updated weights for policy 0, policy_version 254178 (0.0017) [2025-01-05 12:17:10,828][09057] Updated weights for policy 0, policy_version 254188 (0.0016) [2025-01-05 12:17:12,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1041190912. 
Throughput: 0: 4918.3. Samples: 10281372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:12,842][08963] Avg episode reward: [(0, '9.384')] [2025-01-05 12:17:12,873][09057] Updated weights for policy 0, policy_version 254198 (0.0017) [2025-01-05 12:17:14,984][09057] Updated weights for policy 0, policy_version 254208 (0.0016) [2025-01-05 12:17:16,952][09057] Updated weights for policy 0, policy_version 254218 (0.0017) [2025-01-05 12:17:17,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.0, 300 sec: 19591.4). Total num frames: 1041293312. Throughput: 0: 4921.7. Samples: 10311462. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:17,842][08963] Avg episode reward: [(0, '10.736')] [2025-01-05 12:17:18,997][09057] Updated weights for policy 0, policy_version 254228 (0.0017) [2025-01-05 12:17:21,089][09057] Updated weights for policy 0, policy_version 254238 (0.0015) [2025-01-05 12:17:22,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1041391616. Throughput: 0: 4931.6. Samples: 10341246. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:22,842][08963] Avg episode reward: [(0, '10.136')] [2025-01-05 12:17:23,158][09057] Updated weights for policy 0, policy_version 254248 (0.0017) [2025-01-05 12:17:25,256][09057] Updated weights for policy 0, policy_version 254258 (0.0017) [2025-01-05 12:17:27,340][09057] Updated weights for policy 0, policy_version 254268 (0.0016) [2025-01-05 12:17:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1041489920. Throughput: 0: 4936.6. Samples: 10355966. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:27,842][08963] Avg episode reward: [(0, '11.044')] [2025-01-05 12:17:29,456][09057] Updated weights for policy 0, policy_version 254278 (0.0016) [2025-01-05 12:17:31,551][09057] Updated weights for policy 0, policy_version 254288 (0.0017) [2025-01-05 12:17:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1041588224. Throughput: 0: 4940.7. Samples: 10385294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:32,842][08963] Avg episode reward: [(0, '9.522')] [2025-01-05 12:17:33,705][09057] Updated weights for policy 0, policy_version 254298 (0.0017) [2025-01-05 12:17:35,733][09057] Updated weights for policy 0, policy_version 254308 (0.0017) [2025-01-05 12:17:37,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1041682432. Throughput: 0: 4923.8. Samples: 10414286. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:37,843][08963] Avg episode reward: [(0, '8.770')] [2025-01-05 12:17:37,881][09057] Updated weights for policy 0, policy_version 254318 (0.0018) [2025-01-05 12:17:40,046][09057] Updated weights for policy 0, policy_version 254328 (0.0017) [2025-01-05 12:17:42,090][09057] Updated weights for policy 0, policy_version 254338 (0.0016) [2025-01-05 12:17:42,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1041780736. Throughput: 0: 4908.3. Samples: 10428684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:42,842][08963] Avg episode reward: [(0, '9.141')] [2025-01-05 12:17:44,273][09057] Updated weights for policy 0, policy_version 254348 (0.0016) [2025-01-05 12:17:46,374][09057] Updated weights for policy 0, policy_version 254358 (0.0017) [2025-01-05 12:17:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1041879040. Throughput: 0: 4890.3. Samples: 10457882. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:47,843][08963] Avg episode reward: [(0, '9.702')] [2025-01-05 12:17:47,850][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000254365_1041879040.pth... [2025-01-05 12:17:47,904][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000253219_1037185024.pth [2025-01-05 12:17:48,492][09057] Updated weights for policy 0, policy_version 254368 (0.0017) [2025-01-05 12:17:50,611][09057] Updated weights for policy 0, policy_version 254378 (0.0017) [2025-01-05 12:17:52,678][09057] Updated weights for policy 0, policy_version 254388 (0.0017) [2025-01-05 12:17:52,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1041977344. Throughput: 0: 4900.0. Samples: 10487124. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:52,842][08963] Avg episode reward: [(0, '9.902')] [2025-01-05 12:17:54,758][09057] Updated weights for policy 0, policy_version 254398 (0.0016) [2025-01-05 12:17:56,862][09057] Updated weights for policy 0, policy_version 254408 (0.0017) [2025-01-05 12:17:57,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1042071552. Throughput: 0: 4892.6. Samples: 10501540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:17:57,842][08963] Avg episode reward: [(0, '10.029')] [2025-01-05 12:17:59,068][09057] Updated weights for policy 0, policy_version 254418 (0.0017) [2025-01-05 12:18:01,075][09057] Updated weights for policy 0, policy_version 254428 (0.0015) [2025-01-05 12:18:02,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1042169856. Throughput: 0: 4875.5. Samples: 10530860. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:18:02,842][08963] Avg episode reward: [(0, '9.850')] [2025-01-05 12:18:03,116][09057] Updated weights for policy 0, policy_version 254438 (0.0016) [2025-01-05 12:18:05,197][09057] Updated weights for policy 0, policy_version 254448 (0.0016) [2025-01-05 12:18:07,204][09057] Updated weights for policy 0, policy_version 254458 (0.0016) [2025-01-05 12:18:07,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1042272256. Throughput: 0: 4882.0. Samples: 10560934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:18:07,842][08963] Avg episode reward: [(0, '9.923')] [2025-01-05 12:18:09,247][09057] Updated weights for policy 0, policy_version 254468 (0.0016) [2025-01-05 12:18:11,351][09057] Updated weights for policy 0, policy_version 254478 (0.0017) [2025-01-05 12:18:12,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1042370560. Throughput: 0: 4890.1. Samples: 10576020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:18:12,842][08963] Avg episode reward: [(0, '9.228')] [2025-01-05 12:18:13,431][09057] Updated weights for policy 0, policy_version 254488 (0.0016) [2025-01-05 12:18:15,547][09057] Updated weights for policy 0, policy_version 254498 (0.0016) [2025-01-05 12:18:17,635][09057] Updated weights for policy 0, policy_version 254508 (0.0016) [2025-01-05 12:18:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1042468864. Throughput: 0: 4888.5. Samples: 10605276. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:17,842][08963] Avg episode reward: [(0, '7.947')] [2025-01-05 12:18:19,718][09057] Updated weights for policy 0, policy_version 254518 (0.0020) [2025-01-05 12:18:21,826][09057] Updated weights for policy 0, policy_version 254528 (0.0016) [2025-01-05 12:18:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19563.6). 
Total num frames: 1042563072. Throughput: 0: 4891.6. Samples: 10634408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:22,842][08963] Avg episode reward: [(0, '10.105')] [2025-01-05 12:18:24,016][09057] Updated weights for policy 0, policy_version 254538 (0.0017) [2025-01-05 12:18:26,016][09057] Updated weights for policy 0, policy_version 254548 (0.0017) [2025-01-05 12:18:27,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1042661376. Throughput: 0: 4897.1. Samples: 10649052. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:27,842][08963] Avg episode reward: [(0, '10.996')] [2025-01-05 12:18:28,079][09057] Updated weights for policy 0, policy_version 254558 (0.0017) [2025-01-05 12:18:30,181][09057] Updated weights for policy 0, policy_version 254568 (0.0016) [2025-01-05 12:18:32,147][09057] Updated weights for policy 0, policy_version 254578 (0.0016) [2025-01-05 12:18:32,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1042763776. Throughput: 0: 4915.7. Samples: 10679090. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:32,842][08963] Avg episode reward: [(0, '10.126')] [2025-01-05 12:18:34,203][09057] Updated weights for policy 0, policy_version 254588 (0.0016) [2025-01-05 12:18:36,303][09057] Updated weights for policy 0, policy_version 254598 (0.0016) [2025-01-05 12:18:37,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1042862080. Throughput: 0: 4925.5. Samples: 10708774. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:37,842][08963] Avg episode reward: [(0, '10.710')] [2025-01-05 12:18:38,405][09057] Updated weights for policy 0, policy_version 254608 (0.0016) [2025-01-05 12:18:40,437][09057] Updated weights for policy 0, policy_version 254618 (0.0016) [2025-01-05 12:18:42,555][09057] Updated weights for policy 0, policy_version 254628 (0.0017) [2025-01-05 12:18:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1042960384. Throughput: 0: 4935.7. Samples: 10723648. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:42,842][08963] Avg episode reward: [(0, '10.798')] [2025-01-05 12:18:44,620][09057] Updated weights for policy 0, policy_version 254638 (0.0017) [2025-01-05 12:18:46,684][09057] Updated weights for policy 0, policy_version 254648 (0.0015) [2025-01-05 12:18:47,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1043058688. Throughput: 0: 4937.9. Samples: 10753064. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:47,842][08963] Avg episode reward: [(0, '10.626')] [2025-01-05 12:18:48,913][09057] Updated weights for policy 0, policy_version 254658 (0.0017) [2025-01-05 12:18:50,894][09057] Updated weights for policy 0, policy_version 254668 (0.0016) [2025-01-05 12:18:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1043156992. Throughput: 0: 4922.5. Samples: 10782446. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:52,842][08963] Avg episode reward: [(0, '10.103')] [2025-01-05 12:18:52,971][09057] Updated weights for policy 0, policy_version 254678 (0.0016) [2025-01-05 12:18:55,060][09057] Updated weights for policy 0, policy_version 254688 (0.0016) [2025-01-05 12:18:57,022][09057] Updated weights for policy 0, policy_version 254698 (0.0015) [2025-01-05 12:18:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19577.5). 
Total num frames: 1043255296. Throughput: 0: 4921.5. Samples: 10797488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:18:57,842][08963] Avg episode reward: [(0, '10.339')] [2025-01-05 12:18:59,107][09057] Updated weights for policy 0, policy_version 254708 (0.0016) [2025-01-05 12:19:01,203][09057] Updated weights for policy 0, policy_version 254718 (0.0019) [2025-01-05 12:19:02,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19605.3). Total num frames: 1043357696. Throughput: 0: 4938.1. Samples: 10827488. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:19:02,842][08963] Avg episode reward: [(0, '9.975')] [2025-01-05 12:19:03,239][09057] Updated weights for policy 0, policy_version 254728 (0.0017) [2025-01-05 12:19:05,345][09057] Updated weights for policy 0, policy_version 254738 (0.0016) [2025-01-05 12:19:07,436][09057] Updated weights for policy 0, policy_version 254748 (0.0016) [2025-01-05 12:19:07,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19605.3). Total num frames: 1043456000. Throughput: 0: 4947.8. Samples: 10857058. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:19:07,842][08963] Avg episode reward: [(0, '9.493')] [2025-01-05 12:19:09,513][09057] Updated weights for policy 0, policy_version 254758 (0.0016) [2025-01-05 12:19:11,567][09057] Updated weights for policy 0, policy_version 254768 (0.0015) [2025-01-05 12:19:12,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1043550208. Throughput: 0: 4944.6. Samples: 10871558. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:19:12,842][08963] Avg episode reward: [(0, '9.227')] [2025-01-05 12:19:13,772][09057] Updated weights for policy 0, policy_version 254778 (0.0016) [2025-01-05 12:19:15,782][09057] Updated weights for policy 0, policy_version 254788 (0.0016) [2025-01-05 12:19:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19605.3). Total num frames: 1043652608. 
Throughput: 0: 4930.9. Samples: 10900982. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:19:17,842][09057] Updated weights for policy 0, policy_version 254798 (0.0016) [2025-01-05 12:19:17,842][08963] Avg episode reward: [(0, '9.426')] [2025-01-05 12:19:20,032][09057] Updated weights for policy 0, policy_version 254808 (0.0017) [2025-01-05 12:19:22,074][09057] Updated weights for policy 0, policy_version 254818 (0.0016) [2025-01-05 12:19:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1043746816. Throughput: 0: 4918.2. Samples: 10930092. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:19:22,842][08963] Avg episode reward: [(0, '10.410')] [2025-01-05 12:19:24,250][09057] Updated weights for policy 0, policy_version 254828 (0.0016) [2025-01-05 12:19:26,399][09057] Updated weights for policy 0, policy_version 254838 (0.0017) [2025-01-05 12:19:27,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1043845120. Throughput: 0: 4908.2. Samples: 10944516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:27,842][08963] Avg episode reward: [(0, '9.375')] [2025-01-05 12:19:28,491][09057] Updated weights for policy 0, policy_version 254848 (0.0017) [2025-01-05 12:19:30,602][09057] Updated weights for policy 0, policy_version 254858 (0.0016) [2025-01-05 12:19:32,689][09057] Updated weights for policy 0, policy_version 254868 (0.0017) [2025-01-05 12:19:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1043943424. Throughput: 0: 4902.6. Samples: 10973680. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:32,842][08963] Avg episode reward: [(0, '9.671')] [2025-01-05 12:19:34,789][09057] Updated weights for policy 0, policy_version 254878 (0.0017) [2025-01-05 12:19:36,908][09057] Updated weights for policy 0, policy_version 254888 (0.0016) [2025-01-05 12:19:37,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1044037632. Throughput: 0: 4894.9. Samples: 11002718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:37,842][08963] Avg episode reward: [(0, '9.685')] [2025-01-05 12:19:39,097][09057] Updated weights for policy 0, policy_version 254898 (0.0017) [2025-01-05 12:19:41,123][09057] Updated weights for policy 0, policy_version 254908 (0.0017) [2025-01-05 12:19:42,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19524.2, 300 sec: 19577.5). Total num frames: 1044131840. Throughput: 0: 4880.0. Samples: 11017090. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:42,842][08963] Avg episode reward: [(0, '10.299')] [2025-01-05 12:19:43,315][09057] Updated weights for policy 0, policy_version 254918 (0.0017) [2025-01-05 12:19:45,407][09057] Updated weights for policy 0, policy_version 254928 (0.0015) [2025-01-05 12:19:47,368][09057] Updated weights for policy 0, policy_version 254938 (0.0016) [2025-01-05 12:19:47,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1044230144. Throughput: 0: 4867.8. Samples: 11046540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:47,842][08963] Avg episode reward: [(0, '10.418')] [2025-01-05 12:19:47,855][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000254940_1044234240.pth... 
[2025-01-05 12:19:47,910][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000253791_1039527936.pth [2025-01-05 12:19:49,522][09057] Updated weights for policy 0, policy_version 254948 (0.0016) [2025-01-05 12:19:51,620][09057] Updated weights for policy 0, policy_version 254958 (0.0016) [2025-01-05 12:19:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1044328448. Throughput: 0: 4864.9. Samples: 11075976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:52,842][08963] Avg episode reward: [(0, '9.562')] [2025-01-05 12:19:53,678][09057] Updated weights for policy 0, policy_version 254968 (0.0017) [2025-01-05 12:19:55,844][09057] Updated weights for policy 0, policy_version 254978 (0.0018) [2025-01-05 12:19:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1044426752. Throughput: 0: 4863.6. Samples: 11090418. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:19:57,842][08963] Avg episode reward: [(0, '10.689')] [2025-01-05 12:19:58,089][09057] Updated weights for policy 0, policy_version 254988 (0.0018) [2025-01-05 12:20:00,073][09057] Updated weights for policy 0, policy_version 254998 (0.0016) [2025-01-05 12:20:02,166][09057] Updated weights for policy 0, policy_version 255008 (0.0015) [2025-01-05 12:20:02,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1044525056. Throughput: 0: 4856.5. Samples: 11119522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:02,842][08963] Avg episode reward: [(0, '10.898')] [2025-01-05 12:20:04,413][09057] Updated weights for policy 0, policy_version 255018 (0.0017) [2025-01-05 12:20:06,380][09057] Updated weights for policy 0, policy_version 255028 (0.0016) [2025-01-05 12:20:07,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19577.5). Total num frames: 1044619264. 
Throughput: 0: 4856.2. Samples: 11148622. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:07,842][08963] Avg episode reward: [(0, '9.641')] [2025-01-05 12:20:08,476][09057] Updated weights for policy 0, policy_version 255038 (0.0016) [2025-01-05 12:20:10,584][09057] Updated weights for policy 0, policy_version 255048 (0.0015) [2025-01-05 12:20:12,548][09057] Updated weights for policy 0, policy_version 255058 (0.0015) [2025-01-05 12:20:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1044721664. Throughput: 0: 4867.2. Samples: 11163542. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:12,842][08963] Avg episode reward: [(0, '9.531')] [2025-01-05 12:20:14,657][09057] Updated weights for policy 0, policy_version 255068 (0.0015) [2025-01-05 12:20:16,778][09057] Updated weights for policy 0, policy_version 255078 (0.0016) [2025-01-05 12:20:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19387.8, 300 sec: 19591.4). Total num frames: 1044815872. Throughput: 0: 4882.1. Samples: 11193372. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:17,842][08963] Avg episode reward: [(0, '10.430')] [2025-01-05 12:20:18,899][09057] Updated weights for policy 0, policy_version 255088 (0.0019) [2025-01-05 12:20:21,046][09057] Updated weights for policy 0, policy_version 255098 (0.0016) [2025-01-05 12:20:22,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19605.3). Total num frames: 1044914176. Throughput: 0: 4864.0. Samples: 11221598. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:22,842][08963] Avg episode reward: [(0, '9.923')] [2025-01-05 12:20:23,318][09057] Updated weights for policy 0, policy_version 255108 (0.0017) [2025-01-05 12:20:25,327][09057] Updated weights for policy 0, policy_version 255118 (0.0018) [2025-01-05 12:20:27,719][09057] Updated weights for policy 0, policy_version 255128 (0.0019) [2025-01-05 12:20:27,842][08963] Fps is (10 sec: 18841.3, 60 sec: 19319.4, 300 sec: 19563.6). Total num frames: 1045004288. Throughput: 0: 4866.3. Samples: 11236072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:27,842][08963] Avg episode reward: [(0, '10.459')] [2025-01-05 12:20:30,162][09057] Updated weights for policy 0, policy_version 255138 (0.0020) [2025-01-05 12:20:32,228][09057] Updated weights for policy 0, policy_version 255148 (0.0018) [2025-01-05 12:20:32,842][08963] Fps is (10 sec: 18022.4, 60 sec: 19182.9, 300 sec: 19549.7). Total num frames: 1045094400. Throughput: 0: 4796.4. Samples: 11262380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:20:32,842][08963] Avg episode reward: [(0, '10.445')] [2025-01-05 12:20:34,481][09057] Updated weights for policy 0, policy_version 255158 (0.0019) [2025-01-05 12:20:36,726][09057] Updated weights for policy 0, policy_version 255168 (0.0019) [2025-01-05 12:20:37,842][08963] Fps is (10 sec: 18431.7, 60 sec: 19182.9, 300 sec: 19535.8). Total num frames: 1045188608. Throughput: 0: 4762.7. Samples: 11290298. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:20:37,843][08963] Avg episode reward: [(0, '11.188')] [2025-01-05 12:20:38,873][09057] Updated weights for policy 0, policy_version 255178 (0.0018) [2025-01-05 12:20:40,999][09057] Updated weights for policy 0, policy_version 255188 (0.0018) [2025-01-05 12:20:42,842][08963] Fps is (10 sec: 18432.1, 60 sec: 19114.7, 300 sec: 19508.1). Total num frames: 1045278720. Throughput: 0: 4759.0. Samples: 11304574. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:20:42,842][08963] Avg episode reward: [(0, '11.571')] [2025-01-05 12:20:43,439][09057] Updated weights for policy 0, policy_version 255198 (0.0020) [2025-01-05 12:20:45,547][09057] Updated weights for policy 0, policy_version 255208 (0.0017) [2025-01-05 12:20:47,676][09057] Updated weights for policy 0, policy_version 255218 (0.0018) [2025-01-05 12:20:47,842][08963] Fps is (10 sec: 18432.4, 60 sec: 19046.4, 300 sec: 19494.2). Total num frames: 1045372928. Throughput: 0: 4720.9. Samples: 11331964. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:20:47,842][08963] Avg episode reward: [(0, '9.948')] [2025-01-05 12:20:49,973][09057] Updated weights for policy 0, policy_version 255228 (0.0019) [2025-01-05 12:20:52,078][09057] Updated weights for policy 0, policy_version 255238 (0.0019) [2025-01-05 12:20:52,842][08963] Fps is (10 sec: 18841.5, 60 sec: 18978.1, 300 sec: 19494.2). Total num frames: 1045467136. Throughput: 0: 4697.3. Samples: 11360002. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:20:52,842][08963] Avg episode reward: [(0, '9.812')] [2025-01-05 12:20:54,334][09057] Updated weights for policy 0, policy_version 255248 (0.0019) [2025-01-05 12:20:56,530][09057] Updated weights for policy 0, policy_version 255258 (0.0018) [2025-01-05 12:20:57,842][08963] Fps is (10 sec: 18432.2, 60 sec: 18841.6, 300 sec: 19466.4). Total num frames: 1045557248. Throughput: 0: 4674.5. Samples: 11373896. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:20:57,842][08963] Avg episode reward: [(0, '11.455')] [2025-01-05 12:20:58,866][09057] Updated weights for policy 0, policy_version 255268 (0.0018) [2025-01-05 12:21:01,151][09057] Updated weights for policy 0, policy_version 255278 (0.0018) [2025-01-05 12:21:02,842][08963] Fps is (10 sec: 18022.2, 60 sec: 18705.0, 300 sec: 19438.6). Total num frames: 1045647360. Throughput: 0: 4605.2. Samples: 11400608. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:02,843][08963] Avg episode reward: [(0, '9.491')] [2025-01-05 12:21:03,495][09057] Updated weights for policy 0, policy_version 255288 (0.0019) [2025-01-05 12:21:05,608][09057] Updated weights for policy 0, policy_version 255298 (0.0017) [2025-01-05 12:21:07,710][09057] Updated weights for policy 0, policy_version 255308 (0.0017) [2025-01-05 12:21:07,842][08963] Fps is (10 sec: 18431.8, 60 sec: 18705.0, 300 sec: 19424.8). Total num frames: 1045741568. Throughput: 0: 4603.3. Samples: 11428748. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:07,842][08963] Avg episode reward: [(0, '9.809')] [2025-01-05 12:21:09,991][09057] Updated weights for policy 0, policy_version 255318 (0.0018) [2025-01-05 12:21:12,073][09057] Updated weights for policy 0, policy_version 255328 (0.0018) [2025-01-05 12:21:12,842][08963] Fps is (10 sec: 18841.9, 60 sec: 18568.5, 300 sec: 19410.9). Total num frames: 1045835776. Throughput: 0: 4594.5. Samples: 11442824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:12,842][08963] Avg episode reward: [(0, '10.316')] [2025-01-05 12:21:14,310][09057] Updated weights for policy 0, policy_version 255338 (0.0019) [2025-01-05 12:21:16,506][09057] Updated weights for policy 0, policy_version 255348 (0.0018) [2025-01-05 12:21:17,842][08963] Fps is (10 sec: 18431.9, 60 sec: 18500.2, 300 sec: 19383.1). Total num frames: 1045925888. Throughput: 0: 4632.8. Samples: 11470856. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:17,842][08963] Avg episode reward: [(0, '9.247')] [2025-01-05 12:21:18,719][09057] Updated weights for policy 0, policy_version 255358 (0.0019) [2025-01-05 12:21:20,854][09057] Updated weights for policy 0, policy_version 255368 (0.0017) [2025-01-05 12:21:22,842][08963] Fps is (10 sec: 18431.7, 60 sec: 18432.0, 300 sec: 19369.2). Total num frames: 1046020096. Throughput: 0: 4630.9. Samples: 11498688. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:22,843][08963] Avg episode reward: [(0, '8.187')] [2025-01-05 12:21:23,148][09057] Updated weights for policy 0, policy_version 255378 (0.0019) [2025-01-05 12:21:25,210][09057] Updated weights for policy 0, policy_version 255388 (0.0018) [2025-01-05 12:21:27,386][09057] Updated weights for policy 0, policy_version 255398 (0.0019) [2025-01-05 12:21:27,842][08963] Fps is (10 sec: 18841.4, 60 sec: 18500.2, 300 sec: 19355.3). Total num frames: 1046114304. Throughput: 0: 4634.5. Samples: 11513126. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:27,843][08963] Avg episode reward: [(0, '9.335')] [2025-01-05 12:21:29,690][09057] Updated weights for policy 0, policy_version 255408 (0.0019) [2025-01-05 12:21:31,815][09057] Updated weights for policy 0, policy_version 255418 (0.0017) [2025-01-05 12:21:32,842][08963] Fps is (10 sec: 18841.7, 60 sec: 18568.5, 300 sec: 19341.4). Total num frames: 1046208512. Throughput: 0: 4645.8. Samples: 11541026. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:32,843][08963] Avg episode reward: [(0, '9.863')] [2025-01-05 12:21:34,086][09057] Updated weights for policy 0, policy_version 255428 (0.0018) [2025-01-05 12:21:36,281][09057] Updated weights for policy 0, policy_version 255438 (0.0018) [2025-01-05 12:21:37,842][08963] Fps is (10 sec: 18432.4, 60 sec: 18500.4, 300 sec: 19313.7). Total num frames: 1046298624. Throughput: 0: 4634.3. Samples: 11568544. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:37,842][08963] Avg episode reward: [(0, '8.779')] [2025-01-05 12:21:38,515][09057] Updated weights for policy 0, policy_version 255448 (0.0018) [2025-01-05 12:21:40,647][09057] Updated weights for policy 0, policy_version 255458 (0.0018) [2025-01-05 12:21:42,770][09057] Updated weights for policy 0, policy_version 255468 (0.0018) [2025-01-05 12:21:42,842][08963] Fps is (10 sec: 18841.8, 60 sec: 18636.8, 300 sec: 19313.7). 
Total num frames: 1046396928. Throughput: 0: 4642.1. Samples: 11582792. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:21:42,842][08963] Avg episode reward: [(0, '9.557')] [2025-01-05 12:21:45,009][09057] Updated weights for policy 0, policy_version 255478 (0.0019) [2025-01-05 12:21:47,138][09057] Updated weights for policy 0, policy_version 255488 (0.0017) [2025-01-05 12:21:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 18636.8, 300 sec: 19299.8). Total num frames: 1046491136. Throughput: 0: 4678.9. Samples: 11611158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:21:47,842][08963] Avg episode reward: [(0, '9.481')] [2025-01-05 12:21:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000255491_1046491136.pth... [2025-01-05 12:21:47,905][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000254365_1041879040.pth [2025-01-05 12:21:49,426][09057] Updated weights for policy 0, policy_version 255498 (0.0019) [2025-01-05 12:21:51,525][09057] Updated weights for policy 0, policy_version 255508 (0.0017) [2025-01-05 12:21:52,842][08963] Fps is (10 sec: 18431.8, 60 sec: 18568.5, 300 sec: 19272.0). Total num frames: 1046581248. Throughput: 0: 4672.4. Samples: 11639008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:21:52,843][08963] Avg episode reward: [(0, '10.603')] [2025-01-05 12:21:53,733][09057] Updated weights for policy 0, policy_version 255518 (0.0019) [2025-01-05 12:21:55,932][09057] Updated weights for policy 0, policy_version 255528 (0.0018) [2025-01-05 12:21:57,842][08963] Fps is (10 sec: 18431.9, 60 sec: 18636.8, 300 sec: 19244.3). Total num frames: 1046675456. Throughput: 0: 4676.0. Samples: 11653246. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:21:57,843][08963] Avg episode reward: [(0, '9.977')] [2025-01-05 12:21:58,141][09057] Updated weights for policy 0, policy_version 255538 (0.0021) [2025-01-05 12:22:00,260][09057] Updated weights for policy 0, policy_version 255548 (0.0018) [2025-01-05 12:22:02,414][09057] Updated weights for policy 0, policy_version 255558 (0.0017) [2025-01-05 12:22:02,842][08963] Fps is (10 sec: 19251.4, 60 sec: 18773.4, 300 sec: 19258.1). Total num frames: 1046773760. Throughput: 0: 4685.7. Samples: 11681712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:02,842][08963] Avg episode reward: [(0, '9.999')] [2025-01-05 12:22:04,610][09057] Updated weights for policy 0, policy_version 255568 (0.0018) [2025-01-05 12:22:06,751][09057] Updated weights for policy 0, policy_version 255578 (0.0019) [2025-01-05 12:22:07,842][08963] Fps is (10 sec: 18841.3, 60 sec: 18705.0, 300 sec: 19230.4). Total num frames: 1046863872. Throughput: 0: 4688.5. Samples: 11709670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:07,843][08963] Avg episode reward: [(0, '10.157')] [2025-01-05 12:22:09,055][09057] Updated weights for policy 0, policy_version 255588 (0.0019) [2025-01-05 12:22:11,194][09057] Updated weights for policy 0, policy_version 255598 (0.0018) [2025-01-05 12:22:12,842][08963] Fps is (10 sec: 18022.0, 60 sec: 18636.7, 300 sec: 19188.7). Total num frames: 1046953984. Throughput: 0: 4678.0. Samples: 11723638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:12,843][08963] Avg episode reward: [(0, '10.934')] [2025-01-05 12:22:13,604][09057] Updated weights for policy 0, policy_version 255608 (0.0020) [2025-01-05 12:22:15,793][09057] Updated weights for policy 0, policy_version 255618 (0.0018) [2025-01-05 12:22:17,842][08963] Fps is (10 sec: 18022.7, 60 sec: 18636.8, 300 sec: 19160.9). Total num frames: 1047044096. Throughput: 0: 4659.3. Samples: 11750694. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:17,842][08963] Avg episode reward: [(0, '9.628')] [2025-01-05 12:22:18,083][09057] Updated weights for policy 0, policy_version 255628 (0.0019) [2025-01-05 12:22:20,353][09057] Updated weights for policy 0, policy_version 255638 (0.0022) [2025-01-05 12:22:22,574][09057] Updated weights for policy 0, policy_version 255648 (0.0017) [2025-01-05 12:22:22,842][08963] Fps is (10 sec: 18432.2, 60 sec: 18636.8, 300 sec: 19147.1). Total num frames: 1047138304. Throughput: 0: 4651.2. Samples: 11777848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:22,842][08963] Avg episode reward: [(0, '9.158')] [2025-01-05 12:22:24,824][09057] Updated weights for policy 0, policy_version 255658 (0.0020) [2025-01-05 12:22:27,180][09057] Updated weights for policy 0, policy_version 255668 (0.0018) [2025-01-05 12:22:27,842][08963] Fps is (10 sec: 18022.2, 60 sec: 18500.3, 300 sec: 19105.4). Total num frames: 1047224320. Throughput: 0: 4629.4. Samples: 11791118. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:27,843][08963] Avg episode reward: [(0, '9.311')] [2025-01-05 12:22:29,785][09057] Updated weights for policy 0, policy_version 255678 (0.0020) [2025-01-05 12:22:32,013][09057] Updated weights for policy 0, policy_version 255688 (0.0018) [2025-01-05 12:22:32,842][08963] Fps is (10 sec: 17203.4, 60 sec: 18363.8, 300 sec: 19077.7). Total num frames: 1047310336. Throughput: 0: 4568.5. Samples: 11816740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:32,842][08963] Avg episode reward: [(0, '8.192')] [2025-01-05 12:22:34,380][09057] Updated weights for policy 0, policy_version 255698 (0.0020) [2025-01-05 12:22:36,662][09057] Updated weights for policy 0, policy_version 255708 (0.0018) [2025-01-05 12:22:37,842][08963] Fps is (10 sec: 17613.3, 60 sec: 18363.7, 300 sec: 19049.9). Total num frames: 1047400448. Throughput: 0: 4535.7. Samples: 11843112. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:37,842][08963] Avg episode reward: [(0, '10.039')] [2025-01-05 12:22:39,005][09057] Updated weights for policy 0, policy_version 255718 (0.0022) [2025-01-05 12:22:41,211][09057] Updated weights for policy 0, policy_version 255728 (0.0018) [2025-01-05 12:22:42,842][08963] Fps is (10 sec: 17612.5, 60 sec: 18158.9, 300 sec: 19008.2). Total num frames: 1047486464. Throughput: 0: 4519.3. Samples: 11856614. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:42,843][08963] Avg episode reward: [(0, '9.216')] [2025-01-05 12:22:43,545][09057] Updated weights for policy 0, policy_version 255738 (0.0019) [2025-01-05 12:22:45,711][09057] Updated weights for policy 0, policy_version 255748 (0.0018) [2025-01-05 12:22:47,842][08963] Fps is (10 sec: 18022.1, 60 sec: 18158.9, 300 sec: 18994.3). Total num frames: 1047580672. Throughput: 0: 4495.3. Samples: 11884000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:47,842][08963] Avg episode reward: [(0, '10.103')] [2025-01-05 12:22:47,980][09057] Updated weights for policy 0, policy_version 255758 (0.0020) [2025-01-05 12:22:50,359][09057] Updated weights for policy 0, policy_version 255768 (0.0019) [2025-01-05 12:22:52,540][09057] Updated weights for policy 0, policy_version 255778 (0.0018) [2025-01-05 12:22:52,842][08963] Fps is (10 sec: 18432.1, 60 sec: 18158.9, 300 sec: 18980.5). Total num frames: 1047670784. Throughput: 0: 4474.3. Samples: 11911014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:22:52,842][08963] Avg episode reward: [(0, '9.838')] [2025-01-05 12:22:54,886][09057] Updated weights for policy 0, policy_version 255788 (0.0021) [2025-01-05 12:22:57,534][09057] Updated weights for policy 0, policy_version 255798 (0.0018) [2025-01-05 12:22:57,842][08963] Fps is (10 sec: 17203.3, 60 sec: 17954.1, 300 sec: 18924.9). Total num frames: 1047752704. Throughput: 0: 4444.4. Samples: 11923636. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:22:57,842][08963] Avg episode reward: [(0, '10.779')] [2025-01-05 12:23:00,207][09057] Updated weights for policy 0, policy_version 255808 (0.0018) [2025-01-05 12:23:02,460][09057] Updated weights for policy 0, policy_version 255818 (0.0019) [2025-01-05 12:23:02,842][08963] Fps is (10 sec: 16384.2, 60 sec: 17681.1, 300 sec: 18855.5). Total num frames: 1047834624. Throughput: 0: 4380.2. Samples: 11947804. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:02,842][08963] Avg episode reward: [(0, '9.506')] [2025-01-05 12:23:05,219][09057] Updated weights for policy 0, policy_version 255828 (0.0017) [2025-01-05 12:23:07,431][09057] Updated weights for policy 0, policy_version 255838 (0.0018) [2025-01-05 12:23:07,842][08963] Fps is (10 sec: 16383.9, 60 sec: 17544.6, 300 sec: 18799.9). Total num frames: 1047916544. Throughput: 0: 4339.7. Samples: 11973136. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:07,843][08963] Avg episode reward: [(0, '9.316')] [2025-01-05 12:23:09,613][09057] Updated weights for policy 0, policy_version 255848 (0.0019) [2025-01-05 12:23:11,797][09057] Updated weights for policy 0, policy_version 255858 (0.0018) [2025-01-05 12:23:12,842][08963] Fps is (10 sec: 17612.5, 60 sec: 17612.8, 300 sec: 18786.1). Total num frames: 1048010752. Throughput: 0: 4352.6. Samples: 11986986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:12,842][08963] Avg episode reward: [(0, '8.542')] [2025-01-05 12:23:14,026][09057] Updated weights for policy 0, policy_version 255868 (0.0019) [2025-01-05 12:23:16,115][09057] Updated weights for policy 0, policy_version 255878 (0.0018) [2025-01-05 12:23:17,842][08963] Fps is (10 sec: 18841.7, 60 sec: 17681.1, 300 sec: 18786.1). Total num frames: 1048104960. Throughput: 0: 4410.0. Samples: 12015192. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:17,842][08963] Avg episode reward: [(0, '9.482')] [2025-01-05 12:23:18,403][09057] Updated weights for policy 0, policy_version 255888 (0.0019) [2025-01-05 12:23:20,474][09057] Updated weights for policy 0, policy_version 255898 (0.0018) [2025-01-05 12:23:22,492][09057] Updated weights for policy 0, policy_version 255908 (0.0017) [2025-01-05 12:23:22,842][08963] Fps is (10 sec: 19251.3, 60 sec: 17749.3, 300 sec: 18786.1). Total num frames: 1048203264. Throughput: 0: 4470.6. Samples: 12044288. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:22,842][08963] Avg episode reward: [(0, '9.271')] [2025-01-05 12:23:24,757][09057] Updated weights for policy 0, policy_version 255918 (0.0017) [2025-01-05 12:23:26,813][09057] Updated weights for policy 0, policy_version 255928 (0.0017) [2025-01-05 12:23:27,842][08963] Fps is (10 sec: 19251.5, 60 sec: 17885.9, 300 sec: 18758.3). Total num frames: 1048297472. Throughput: 0: 4482.5. Samples: 12058326. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:27,842][08963] Avg episode reward: [(0, '9.177')] [2025-01-05 12:23:28,982][09057] Updated weights for policy 0, policy_version 255938 (0.0018) [2025-01-05 12:23:31,198][09057] Updated weights for policy 0, policy_version 255948 (0.0018) [2025-01-05 12:23:32,842][08963] Fps is (10 sec: 18841.5, 60 sec: 18022.4, 300 sec: 18744.4). Total num frames: 1048391680. Throughput: 0: 4506.5. Samples: 12086794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:32,842][08963] Avg episode reward: [(0, '9.450')] [2025-01-05 12:23:33,344][09057] Updated weights for policy 0, policy_version 255958 (0.0018) [2025-01-05 12:23:35,462][09057] Updated weights for policy 0, policy_version 255968 (0.0017) [2025-01-05 12:23:37,588][09057] Updated weights for policy 0, policy_version 255978 (0.0017) [2025-01-05 12:23:37,842][08963] Fps is (10 sec: 19250.9, 60 sec: 18158.9, 300 sec: 18744.4). 
Total num frames: 1048489984. Throughput: 0: 4549.1. Samples: 12115724. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:37,842][08963] Avg episode reward: [(0, '10.132')] [2025-01-05 12:23:39,751][09057] Updated weights for policy 0, policy_version 255988 (0.0018) [2025-01-05 12:23:41,861][09057] Updated weights for policy 0, policy_version 255998 (0.0017) [2025-01-05 12:23:42,842][08963] Fps is (10 sec: 19251.4, 60 sec: 18295.5, 300 sec: 18730.5). Total num frames: 1048584192. Throughput: 0: 4576.2. Samples: 12129566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:42,842][08963] Avg episode reward: [(0, '10.290')] [2025-01-05 12:23:44,129][09057] Updated weights for policy 0, policy_version 256008 (0.0018) [2025-01-05 12:23:46,208][09057] Updated weights for policy 0, policy_version 256018 (0.0018) [2025-01-05 12:23:47,842][08963] Fps is (10 sec: 18841.6, 60 sec: 18295.5, 300 sec: 18716.6). Total num frames: 1048678400. Throughput: 0: 4674.8. Samples: 12158172. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:47,842][08963] Avg episode reward: [(0, '10.008')] [2025-01-05 12:23:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000256025_1048678400.pth... [2025-01-05 12:23:47,907][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000254940_1044234240.pth [2025-01-05 12:23:48,474][09057] Updated weights for policy 0, policy_version 256028 (0.0019) [2025-01-05 12:23:50,703][09057] Updated weights for policy 0, policy_version 256038 (0.0019) [2025-01-05 12:23:52,842][08963] Fps is (10 sec: 18431.1, 60 sec: 18295.3, 300 sec: 18688.8). Total num frames: 1048768512. Throughput: 0: 4727.1. Samples: 12185858. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:52,843][08963] Avg episode reward: [(0, '10.195')] [2025-01-05 12:23:52,848][09057] Updated weights for policy 0, policy_version 256048 (0.0019) [2025-01-05 12:23:55,095][09057] Updated weights for policy 0, policy_version 256058 (0.0019) [2025-01-05 12:23:57,253][09057] Updated weights for policy 0, policy_version 256068 (0.0017) [2025-01-05 12:23:57,842][08963] Fps is (10 sec: 18431.9, 60 sec: 18500.2, 300 sec: 18661.1). Total num frames: 1048862720. Throughput: 0: 4728.3. Samples: 12199762. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:23:57,843][08963] Avg episode reward: [(0, '10.204')] [2025-01-05 12:23:59,389][09057] Updated weights for policy 0, policy_version 256078 (0.0019) [2025-01-05 12:24:01,517][09057] Updated weights for policy 0, policy_version 256088 (0.0018) [2025-01-05 12:24:02,842][08963] Fps is (10 sec: 19252.0, 60 sec: 18773.3, 300 sec: 18661.1). Total num frames: 1048961024. Throughput: 0: 4741.7. Samples: 12228570. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:24:02,842][08963] Avg episode reward: [(0, '9.573')] [2025-01-05 12:24:03,752][09057] Updated weights for policy 0, policy_version 256098 (0.0018) [2025-01-05 12:24:05,833][09057] Updated weights for policy 0, policy_version 256108 (0.0017) [2025-01-05 12:24:07,842][08963] Fps is (10 sec: 19251.3, 60 sec: 18978.1, 300 sec: 18661.1). Total num frames: 1049055232. Throughput: 0: 4715.9. Samples: 12256504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:24:07,842][08963] Avg episode reward: [(0, '10.300')] [2025-01-05 12:24:08,115][09057] Updated weights for policy 0, policy_version 256118 (0.0019) [2025-01-05 12:24:10,289][09057] Updated weights for policy 0, policy_version 256128 (0.0018) [2025-01-05 12:24:12,339][09057] Updated weights for policy 0, policy_version 256138 (0.0018) [2025-01-05 12:24:12,842][08963] Fps is (10 sec: 18431.9, 60 sec: 18909.9, 300 sec: 18619.4). 
Total num frames: 1049145344. Throughput: 0: 4717.9. Samples: 12270632. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:12,842][08963] Avg episode reward: [(0, '9.867')] [2025-01-05 12:24:14,585][09057] Updated weights for policy 0, policy_version 256148 (0.0019) [2025-01-05 12:24:16,753][09057] Updated weights for policy 0, policy_version 256158 (0.0018) [2025-01-05 12:24:17,842][08963] Fps is (10 sec: 18841.7, 60 sec: 18978.1, 300 sec: 18633.3). Total num frames: 1049243648. Throughput: 0: 4722.6. Samples: 12299310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:17,842][08963] Avg episode reward: [(0, '9.480')] [2025-01-05 12:24:18,923][09057] Updated weights for policy 0, policy_version 256168 (0.0018) [2025-01-05 12:24:21,065][09057] Updated weights for policy 0, policy_version 256178 (0.0018) [2025-01-05 12:24:22,842][08963] Fps is (10 sec: 19251.3, 60 sec: 18909.9, 300 sec: 18619.4). Total num frames: 1049337856. Throughput: 0: 4700.8. Samples: 12327258. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:22,842][08963] Avg episode reward: [(0, '8.968')] [2025-01-05 12:24:23,322][09057] Updated weights for policy 0, policy_version 256188 (0.0018) [2025-01-05 12:24:25,422][09057] Updated weights for policy 0, policy_version 256198 (0.0017) [2025-01-05 12:24:27,473][09057] Updated weights for policy 0, policy_version 256208 (0.0018) [2025-01-05 12:24:27,842][08963] Fps is (10 sec: 18841.4, 60 sec: 18909.8, 300 sec: 18605.6). Total num frames: 1049432064. Throughput: 0: 4713.0. Samples: 12341652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:27,843][08963] Avg episode reward: [(0, '8.480')] [2025-01-05 12:24:29,719][09057] Updated weights for policy 0, policy_version 256218 (0.0017) [2025-01-05 12:24:31,765][09057] Updated weights for policy 0, policy_version 256228 (0.0018) [2025-01-05 12:24:32,842][08963] Fps is (10 sec: 18841.4, 60 sec: 18909.9, 300 sec: 18605.6). Total num frames: 1049526272. 
Throughput: 0: 4721.0. Samples: 12370616. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:32,843][08963] Avg episode reward: [(0, '10.198')] [2025-01-05 12:24:33,959][09057] Updated weights for policy 0, policy_version 256238 (0.0019) [2025-01-05 12:24:36,142][09057] Updated weights for policy 0, policy_version 256248 (0.0018) [2025-01-05 12:24:37,842][08963] Fps is (10 sec: 18841.8, 60 sec: 18841.6, 300 sec: 18605.6). Total num frames: 1049620480. Throughput: 0: 4729.5. Samples: 12398682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:37,842][08963] Avg episode reward: [(0, '10.160')] [2025-01-05 12:24:38,329][09057] Updated weights for policy 0, policy_version 256258 (0.0019) [2025-01-05 12:24:40,461][09057] Updated weights for policy 0, policy_version 256268 (0.0018) [2025-01-05 12:24:42,618][09057] Updated weights for policy 0, policy_version 256278 (0.0018) [2025-01-05 12:24:42,842][08963] Fps is (10 sec: 19251.5, 60 sec: 18909.9, 300 sec: 18605.6). Total num frames: 1049718784. Throughput: 0: 4741.7. Samples: 12413136. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:42,842][08963] Avg episode reward: [(0, '9.964')] [2025-01-05 12:24:44,767][09057] Updated weights for policy 0, policy_version 256288 (0.0018) [2025-01-05 12:24:46,902][09057] Updated weights for policy 0, policy_version 256298 (0.0018) [2025-01-05 12:24:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 18909.9, 300 sec: 18591.7). Total num frames: 1049812992. Throughput: 0: 4732.8. Samples: 12441546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:47,842][08963] Avg episode reward: [(0, '10.200')] [2025-01-05 12:24:49,179][09057] Updated weights for policy 0, policy_version 256308 (0.0019) [2025-01-05 12:24:51,229][09057] Updated weights for policy 0, policy_version 256318 (0.0017) [2025-01-05 12:24:52,842][08963] Fps is (10 sec: 18841.5, 60 sec: 18978.3, 300 sec: 18577.8). Total num frames: 1049907200. Throughput: 0: 4743.5. 
Samples: 12469962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:52,843][08963] Avg episode reward: [(0, '9.521')] [2025-01-05 12:24:53,442][09057] Updated weights for policy 0, policy_version 256328 (0.0018) [2025-01-05 12:24:55,603][09057] Updated weights for policy 0, policy_version 256338 (0.0017) [2025-01-05 12:24:57,603][09057] Updated weights for policy 0, policy_version 256348 (0.0017) [2025-01-05 12:24:57,843][08963] Fps is (10 sec: 19250.2, 60 sec: 19046.3, 300 sec: 18577.7). Total num frames: 1050005504. Throughput: 0: 4747.9. Samples: 12484292. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:24:57,844][08963] Avg episode reward: [(0, '10.060')] [2025-01-05 12:24:59,675][09057] Updated weights for policy 0, policy_version 256358 (0.0017) [2025-01-05 12:25:01,782][09057] Updated weights for policy 0, policy_version 256368 (0.0017) [2025-01-05 12:25:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19046.4, 300 sec: 18591.7). Total num frames: 1050103808. Throughput: 0: 4776.5. Samples: 12514254. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:25:02,842][08963] Avg episode reward: [(0, '9.983')] [2025-01-05 12:25:03,918][09057] Updated weights for policy 0, policy_version 256378 (0.0018) [2025-01-05 12:25:06,067][09057] Updated weights for policy 0, policy_version 256388 (0.0017) [2025-01-05 12:25:07,842][08963] Fps is (10 sec: 19252.3, 60 sec: 19046.4, 300 sec: 18563.9). Total num frames: 1050198016. Throughput: 0: 4781.1. Samples: 12542406. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:25:07,843][08963] Avg episode reward: [(0, '9.475')] [2025-01-05 12:25:08,315][09057] Updated weights for policy 0, policy_version 256398 (0.0018) [2025-01-05 12:25:10,383][09057] Updated weights for policy 0, policy_version 256408 (0.0018) [2025-01-05 12:25:12,482][09057] Updated weights for policy 0, policy_version 256418 (0.0017) [2025-01-05 12:25:12,842][08963] Fps is (10 sec: 18841.4, 60 sec: 19114.7, 300 sec: 18563.9). Total num frames: 1050292224. Throughput: 0: 4780.2. Samples: 12556760. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:25:12,843][08963] Avg episode reward: [(0, '9.339')] [2025-01-05 12:25:14,762][09057] Updated weights for policy 0, policy_version 256428 (0.0019) [2025-01-05 12:25:16,057][09024] Signal inference workers to stop experience collection... (500 times) [2025-01-05 12:25:16,058][09024] Signal inference workers to resume experience collection... (500 times) [2025-01-05 12:25:16,079][09057] InferenceWorker_p0-w0: stopping experience collection (500 times) [2025-01-05 12:25:16,079][09057] InferenceWorker_p0-w0: resuming experience collection (500 times) [2025-01-05 12:25:16,799][09057] Updated weights for policy 0, policy_version 256438 (0.0017) [2025-01-05 12:25:17,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19046.4, 300 sec: 18550.0). Total num frames: 1050386432. Throughput: 0: 4771.4. Samples: 12585328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) [2025-01-05 12:25:17,842][08963] Avg episode reward: [(0, '9.146')] [2025-01-05 12:25:18,929][09057] Updated weights for policy 0, policy_version 256448 (0.0018) [2025-01-05 12:25:21,036][09057] Updated weights for policy 0, policy_version 256458 (0.0018) [2025-01-05 12:25:22,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19114.7, 300 sec: 18577.8). Total num frames: 1050484736. Throughput: 0: 4794.9. Samples: 12614454. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:22,842][08963] Avg episode reward: [(0, '9.736')] [2025-01-05 12:25:23,156][09057] Updated weights for policy 0, policy_version 256468 (0.0018) [2025-01-05 12:25:25,313][09057] Updated weights for policy 0, policy_version 256478 (0.0017) [2025-01-05 12:25:27,451][09057] Updated weights for policy 0, policy_version 256488 (0.0018) [2025-01-05 12:25:27,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19114.7, 300 sec: 18591.7). Total num frames: 1050578944. Throughput: 0: 4792.6. Samples: 12628802. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:27,842][08963] Avg episode reward: [(0, '10.295')] [2025-01-05 12:25:29,619][09057] Updated weights for policy 0, policy_version 256498 (0.0019) [2025-01-05 12:25:31,790][09057] Updated weights for policy 0, policy_version 256508 (0.0018) [2025-01-05 12:25:32,842][08963] Fps is (10 sec: 18841.7, 60 sec: 19114.7, 300 sec: 18591.7). Total num frames: 1050673152. Throughput: 0: 4796.9. Samples: 12657406. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:32,842][08963] Avg episode reward: [(0, '9.107')] [2025-01-05 12:25:34,079][09057] Updated weights for policy 0, policy_version 256518 (0.0019) [2025-01-05 12:25:36,131][09057] Updated weights for policy 0, policy_version 256528 (0.0017) [2025-01-05 12:25:37,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19114.6, 300 sec: 18605.6). Total num frames: 1050767360. Throughput: 0: 4784.2. Samples: 12685250. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:37,843][08963] Avg episode reward: [(0, '10.380')] [2025-01-05 12:25:38,387][09057] Updated weights for policy 0, policy_version 256538 (0.0019) [2025-01-05 12:25:40,541][09057] Updated weights for policy 0, policy_version 256548 (0.0017) [2025-01-05 12:25:42,566][09057] Updated weights for policy 0, policy_version 256558 (0.0017) [2025-01-05 12:25:42,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19114.7, 300 sec: 18619.5). 
Total num frames: 1050865664. Throughput: 0: 4783.3. Samples: 12699538. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:42,842][08963] Avg episode reward: [(0, '11.191')] [2025-01-05 12:25:44,802][09057] Updated weights for policy 0, policy_version 256568 (0.0019) [2025-01-05 12:25:46,940][09057] Updated weights for policy 0, policy_version 256578 (0.0017) [2025-01-05 12:25:47,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19114.7, 300 sec: 18619.4). Total num frames: 1050959872. Throughput: 0: 4757.3. Samples: 12728334. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:47,842][08963] Avg episode reward: [(0, '10.328')] [2025-01-05 12:25:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000256582_1050959872.pth... [2025-01-05 12:25:47,903][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000255491_1046491136.pth [2025-01-05 12:25:49,143][09057] Updated weights for policy 0, policy_version 256588 (0.0019) [2025-01-05 12:25:51,312][09057] Updated weights for policy 0, policy_version 256598 (0.0018) [2025-01-05 12:25:52,844][08963] Fps is (10 sec: 18837.5, 60 sec: 19114.0, 300 sec: 18633.2). Total num frames: 1051054080. Throughput: 0: 4765.6. Samples: 12756866. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:52,845][08963] Avg episode reward: [(0, '10.518')] [2025-01-05 12:25:53,431][09057] Updated weights for policy 0, policy_version 256608 (0.0018) [2025-01-05 12:25:55,442][09057] Updated weights for policy 0, policy_version 256618 (0.0018) [2025-01-05 12:25:57,535][09057] Updated weights for policy 0, policy_version 256628 (0.0017) [2025-01-05 12:25:57,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19114.9, 300 sec: 18661.1). Total num frames: 1051152384. Throughput: 0: 4771.7. Samples: 12771486. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:25:57,842][08963] Avg episode reward: [(0, '10.051')] [2025-01-05 12:25:59,702][09057] Updated weights for policy 0, policy_version 256638 (0.0018) [2025-01-05 12:26:01,684][09057] Updated weights for policy 0, policy_version 256648 (0.0016) [2025-01-05 12:26:02,842][08963] Fps is (10 sec: 19665.0, 60 sec: 19114.7, 300 sec: 18675.0). Total num frames: 1051250688. Throughput: 0: 4792.5. Samples: 12800988. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:02,842][08963] Avg episode reward: [(0, '10.967')] [2025-01-05 12:26:03,800][09057] Updated weights for policy 0, policy_version 256658 (0.0017) [2025-01-05 12:26:05,910][09057] Updated weights for policy 0, policy_version 256668 (0.0018) [2025-01-05 12:26:07,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19114.6, 300 sec: 18675.0). Total num frames: 1051344896. Throughput: 0: 4794.2. Samples: 12830192. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:07,843][08963] Avg episode reward: [(0, '9.935')] [2025-01-05 12:26:08,018][09057] Updated weights for policy 0, policy_version 256678 (0.0018) [2025-01-05 12:26:10,172][09057] Updated weights for policy 0, policy_version 256688 (0.0017) [2025-01-05 12:26:12,271][09057] Updated weights for policy 0, policy_version 256698 (0.0016) [2025-01-05 12:26:12,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19183.0, 300 sec: 18702.8). Total num frames: 1051443200. Throughput: 0: 4797.9. Samples: 12844708. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:12,842][08963] Avg episode reward: [(0, '10.355')] [2025-01-05 12:26:14,404][09057] Updated weights for policy 0, policy_version 256708 (0.0019) [2025-01-05 12:26:16,562][09057] Updated weights for policy 0, policy_version 256718 (0.0017) [2025-01-05 12:26:17,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19183.0, 300 sec: 18702.8). Total num frames: 1051537408. Throughput: 0: 4800.6. Samples: 12873432. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:17,842][08963] Avg episode reward: [(0, '10.773')] [2025-01-05 12:26:18,813][09057] Updated weights for policy 0, policy_version 256728 (0.0018) [2025-01-05 12:26:20,865][09057] Updated weights for policy 0, policy_version 256738 (0.0016) [2025-01-05 12:26:22,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19182.9, 300 sec: 18716.7). Total num frames: 1051635712. Throughput: 0: 4809.2. Samples: 12901664. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:22,842][08963] Avg episode reward: [(0, '10.157')] [2025-01-05 12:26:23,091][09057] Updated weights for policy 0, policy_version 256748 (0.0018) [2025-01-05 12:26:25,237][09057] Updated weights for policy 0, policy_version 256758 (0.0017) [2025-01-05 12:26:27,215][09057] Updated weights for policy 0, policy_version 256768 (0.0017) [2025-01-05 12:26:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19251.2, 300 sec: 18730.5). Total num frames: 1051734016. Throughput: 0: 4817.6. Samples: 12916332. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:26:27,842][08963] Avg episode reward: [(0, '9.856')] [2025-01-05 12:26:29,324][09057] Updated weights for policy 0, policy_version 256778 (0.0017) [2025-01-05 12:26:31,424][09057] Updated weights for policy 0, policy_version 256788 (0.0016) [2025-01-05 12:26:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 18758.3). Total num frames: 1051832320. Throughput: 0: 4836.7. Samples: 12945984. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:32,842][08963] Avg episode reward: [(0, '10.433')] [2025-01-05 12:26:33,458][09057] Updated weights for policy 0, policy_version 256798 (0.0017) [2025-01-05 12:26:35,542][09057] Updated weights for policy 0, policy_version 256808 (0.0020) [2025-01-05 12:26:37,653][09057] Updated weights for policy 0, policy_version 256818 (0.0018) [2025-01-05 12:26:37,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 18758.3). 
Total num frames: 1051930624. Throughput: 0: 4861.3. Samples: 12975616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:37,842][08963] Avg episode reward: [(0, '10.491')] [2025-01-05 12:26:39,781][09057] Updated weights for policy 0, policy_version 256828 (0.0018) [2025-01-05 12:26:41,899][09057] Updated weights for policy 0, policy_version 256838 (0.0018) [2025-01-05 12:26:42,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19319.4, 300 sec: 18758.3). Total num frames: 1052024832. Throughput: 0: 4846.5. Samples: 12989580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:42,842][08963] Avg episode reward: [(0, '9.461')] [2025-01-05 12:26:44,169][09057] Updated weights for policy 0, policy_version 256848 (0.0018) [2025-01-05 12:26:46,231][09057] Updated weights for policy 0, policy_version 256858 (0.0018) [2025-01-05 12:26:47,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19319.5, 300 sec: 18772.2). Total num frames: 1052119040. Throughput: 0: 4827.6. Samples: 13018230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:47,842][08963] Avg episode reward: [(0, '10.064')] [2025-01-05 12:26:48,444][09057] Updated weights for policy 0, policy_version 256868 (0.0018) [2025-01-05 12:26:50,583][09057] Updated weights for policy 0, policy_version 256878 (0.0017) [2025-01-05 12:26:52,577][09057] Updated weights for policy 0, policy_version 256888 (0.0019) [2025-01-05 12:26:52,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19388.4, 300 sec: 18786.1). Total num frames: 1052217344. Throughput: 0: 4826.5. Samples: 13047382. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:52,842][08963] Avg episode reward: [(0, '9.779')] [2025-01-05 12:26:54,655][09057] Updated weights for policy 0, policy_version 256898 (0.0017) [2025-01-05 12:26:56,773][09057] Updated weights for policy 0, policy_version 256908 (0.0017) [2025-01-05 12:26:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 18786.1). Total num frames: 1052315648. 
Throughput: 0: 4831.6. Samples: 13062128. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:26:57,842][08963] Avg episode reward: [(0, '10.321')] [2025-01-05 12:26:58,913][09057] Updated weights for policy 0, policy_version 256918 (0.0018) [2025-01-05 12:27:01,054][09057] Updated weights for policy 0, policy_version 256928 (0.0017) [2025-01-05 12:27:02,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 18800.0). Total num frames: 1052409856. Throughput: 0: 4828.4. Samples: 13090710. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:02,842][08963] Avg episode reward: [(0, '9.978')] [2025-01-05 12:27:03,323][09057] Updated weights for policy 0, policy_version 256938 (0.0018) [2025-01-05 12:27:05,330][09057] Updated weights for policy 0, policy_version 256948 (0.0016) [2025-01-05 12:27:07,410][09057] Updated weights for policy 0, policy_version 256958 (0.0017) [2025-01-05 12:27:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19387.8, 300 sec: 18827.7). Total num frames: 1052508160. Throughput: 0: 4848.1. Samples: 13119828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:07,842][08963] Avg episode reward: [(0, '9.171')] [2025-01-05 12:27:09,655][09057] Updated weights for policy 0, policy_version 256968 (0.0018) [2025-01-05 12:27:11,721][09057] Updated weights for policy 0, policy_version 256978 (0.0017) [2025-01-05 12:27:12,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19251.2, 300 sec: 18827.7). Total num frames: 1052598272. Throughput: 0: 4832.6. Samples: 13133798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:12,842][08963] Avg episode reward: [(0, '9.446')] [2025-01-05 12:27:13,929][09057] Updated weights for policy 0, policy_version 256988 (0.0018) [2025-01-05 12:27:16,106][09057] Updated weights for policy 0, policy_version 256998 (0.0018) [2025-01-05 12:27:17,842][08963] Fps is (10 sec: 18841.4, 60 sec: 19319.4, 300 sec: 18841.6). Total num frames: 1052696576. Throughput: 0: 4811.1. 
Samples: 13162484. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:17,843][08963] Avg episode reward: [(0, '10.269')] [2025-01-05 12:27:18,304][09057] Updated weights for policy 0, policy_version 257008 (0.0019) [2025-01-05 12:27:20,409][09057] Updated weights for policy 0, policy_version 257018 (0.0017) [2025-01-05 12:27:22,529][09057] Updated weights for policy 0, policy_version 257028 (0.0018) [2025-01-05 12:27:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 18869.4). Total num frames: 1052790784. Throughput: 0: 4789.6. Samples: 13191150. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:22,842][08963] Avg episode reward: [(0, '10.811')] [2025-01-05 12:27:24,691][09057] Updated weights for policy 0, policy_version 257038 (0.0018) [2025-01-05 12:27:26,807][09057] Updated weights for policy 0, policy_version 257048 (0.0018) [2025-01-05 12:27:27,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19182.9, 300 sec: 18897.1). Total num frames: 1052884992. Throughput: 0: 4792.1. Samples: 13205226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:27,843][08963] Avg episode reward: [(0, '9.020')] [2025-01-05 12:27:29,063][09057] Updated weights for policy 0, policy_version 257058 (0.0019) [2025-01-05 12:27:31,181][09057] Updated weights for policy 0, policy_version 257068 (0.0018) [2025-01-05 12:27:32,842][08963] Fps is (10 sec: 18841.6, 60 sec: 19114.6, 300 sec: 18911.0). Total num frames: 1052979200. Throughput: 0: 4789.4. Samples: 13233754. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:32,842][08963] Avg episode reward: [(0, '9.253')] [2025-01-05 12:27:33,343][09057] Updated weights for policy 0, policy_version 257078 (0.0018) [2025-01-05 12:27:35,536][09057] Updated weights for policy 0, policy_version 257088 (0.0018) [2025-01-05 12:27:37,565][09057] Updated weights for policy 0, policy_version 257098 (0.0017) [2025-01-05 12:27:37,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19114.7, 300 sec: 18952.7). Total num frames: 1053077504. Throughput: 0: 4782.4. Samples: 13262588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:27:37,842][08963] Avg episode reward: [(0, '8.297')] [2025-01-05 12:27:39,728][09057] Updated weights for policy 0, policy_version 257108 (0.0018) [2025-01-05 12:27:41,948][09057] Updated weights for policy 0, policy_version 257118 (0.0018) [2025-01-05 12:27:42,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19114.7, 300 sec: 18952.7). Total num frames: 1053171712. Throughput: 0: 4763.3. Samples: 13276474. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:27:42,842][08963] Avg episode reward: [(0, '9.195')] [2025-01-05 12:27:44,173][09057] Updated weights for policy 0, policy_version 257128 (0.0019) [2025-01-05 12:27:46,241][09057] Updated weights for policy 0, policy_version 257138 (0.0018) [2025-01-05 12:27:47,842][08963] Fps is (10 sec: 18841.8, 60 sec: 19114.7, 300 sec: 18966.6). Total num frames: 1053265920. Throughput: 0: 4761.3. Samples: 13304970. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:27:47,842][08963] Avg episode reward: [(0, '10.308')] [2025-01-05 12:27:47,848][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000257145_1053265920.pth... 
[2025-01-05 12:27:47,901][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000256025_1048678400.pth [2025-01-05 12:27:48,537][09057] Updated weights for policy 0, policy_version 257148 (0.0019) [2025-01-05 12:27:50,694][09057] Updated weights for policy 0, policy_version 257158 (0.0018) [2025-01-05 12:27:52,842][08963] Fps is (10 sec: 18431.6, 60 sec: 18978.1, 300 sec: 18994.3). Total num frames: 1053356032. Throughput: 0: 4733.7. Samples: 13332846. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:27:52,843][08963] Avg episode reward: [(0, '9.657')] [2025-01-05 12:27:52,870][09057] Updated weights for policy 0, policy_version 257168 (0.0019) [2025-01-05 12:27:55,087][09057] Updated weights for policy 0, policy_version 257178 (0.0019) [2025-01-05 12:27:57,304][09057] Updated weights for policy 0, policy_version 257188 (0.0019) [2025-01-05 12:27:57,842][08963] Fps is (10 sec: 18431.8, 60 sec: 18909.9, 300 sec: 19036.0). Total num frames: 1053450240. Throughput: 0: 4730.6. Samples: 13346674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:27:57,842][08963] Avg episode reward: [(0, '9.887')] [2025-01-05 12:27:59,480][09057] Updated weights for policy 0, policy_version 257198 (0.0019) [2025-01-05 12:28:01,589][09057] Updated weights for policy 0, policy_version 257208 (0.0018) [2025-01-05 12:28:02,842][08963] Fps is (10 sec: 18842.0, 60 sec: 18909.9, 300 sec: 19077.6). Total num frames: 1053544448. Throughput: 0: 4724.4. Samples: 13375082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:02,842][08963] Avg episode reward: [(0, '10.966')] [2025-01-05 12:28:03,863][09057] Updated weights for policy 0, policy_version 257218 (0.0019) [2025-01-05 12:28:05,868][09057] Updated weights for policy 0, policy_version 257228 (0.0015) [2025-01-05 12:28:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 18909.9, 300 sec: 19091.5). Total num frames: 1053642752. 
Throughput: 0: 4733.1. Samples: 13404138. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:07,842][08963] Avg episode reward: [(0, '9.881')] [2025-01-05 12:28:07,953][09057] Updated weights for policy 0, policy_version 257238 (0.0017) [2025-01-05 12:28:10,036][09057] Updated weights for policy 0, policy_version 257248 (0.0015) [2025-01-05 12:28:12,021][09057] Updated weights for policy 0, policy_version 257258 (0.0016) [2025-01-05 12:28:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19046.4, 300 sec: 19105.4). Total num frames: 1053741056. Throughput: 0: 4755.1. Samples: 13419206. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:12,842][08963] Avg episode reward: [(0, '9.826')] [2025-01-05 12:28:14,097][09057] Updated weights for policy 0, policy_version 257268 (0.0015) [2025-01-05 12:28:16,194][09057] Updated weights for policy 0, policy_version 257278 (0.0016) [2025-01-05 12:28:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19046.4, 300 sec: 19105.4). Total num frames: 1053839360. Throughput: 0: 4781.3. Samples: 13448914. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:17,842][08963] Avg episode reward: [(0, '9.157')] [2025-01-05 12:28:18,334][09057] Updated weights for policy 0, policy_version 257288 (0.0016) [2025-01-05 12:28:20,403][09057] Updated weights for policy 0, policy_version 257298 (0.0016) [2025-01-05 12:28:22,510][09057] Updated weights for policy 0, policy_version 257308 (0.0016) [2025-01-05 12:28:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19114.7, 300 sec: 19119.3). Total num frames: 1053937664. Throughput: 0: 4789.0. Samples: 13478092. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:22,842][08963] Avg episode reward: [(0, '8.537')] [2025-01-05 12:28:24,628][09057] Updated weights for policy 0, policy_version 257318 (0.0016) [2025-01-05 12:28:26,734][09057] Updated weights for policy 0, policy_version 257328 (0.0016) [2025-01-05 12:28:27,842][08963] Fps is (10 sec: 19660.2, 60 sec: 19182.9, 300 sec: 19133.2). Total num frames: 1054035968. Throughput: 0: 4800.5. Samples: 13492500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:27,843][08963] Avg episode reward: [(0, '9.557')] [2025-01-05 12:28:28,926][09057] Updated weights for policy 0, policy_version 257338 (0.0017) [2025-01-05 12:28:30,969][09057] Updated weights for policy 0, policy_version 257348 (0.0016) [2025-01-05 12:28:32,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19119.3). Total num frames: 1054130176. Throughput: 0: 4810.1. Samples: 13521426. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:32,843][08963] Avg episode reward: [(0, '9.355')] [2025-01-05 12:28:33,157][09057] Updated weights for policy 0, policy_version 257358 (0.0017) [2025-01-05 12:28:35,295][09057] Updated weights for policy 0, policy_version 257368 (0.0017) [2025-01-05 12:28:37,322][09057] Updated weights for policy 0, policy_version 257378 (0.0017) [2025-01-05 12:28:37,842][08963] Fps is (10 sec: 19252.0, 60 sec: 19183.0, 300 sec: 19133.2). Total num frames: 1054228480. Throughput: 0: 4836.8. Samples: 13550500. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:37,842][08963] Avg episode reward: [(0, '10.200')] [2025-01-05 12:28:39,512][09057] Updated weights for policy 0, policy_version 257388 (0.0017) [2025-01-05 12:28:41,643][09057] Updated weights for policy 0, policy_version 257398 (0.0016) [2025-01-05 12:28:42,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19182.9, 300 sec: 19133.2). Total num frames: 1054322688. Throughput: 0: 4845.0. Samples: 13564698. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:42,843][08963] Avg episode reward: [(0, '8.708')] [2025-01-05 12:28:43,779][09057] Updated weights for policy 0, policy_version 257408 (0.0016) [2025-01-05 12:28:45,864][09057] Updated weights for policy 0, policy_version 257418 (0.0017) [2025-01-05 12:28:47,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19161.0). Total num frames: 1054420992. Throughput: 0: 4857.1. Samples: 13593652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 12:28:47,843][08963] Avg episode reward: [(0, '9.502')] [2025-01-05 12:28:48,059][09057] Updated weights for policy 0, policy_version 257428 (0.0018) [2025-01-05 12:28:50,097][09057] Updated weights for policy 0, policy_version 257438 (0.0016) [2025-01-05 12:28:52,159][09057] Updated weights for policy 0, policy_version 257448 (0.0015) [2025-01-05 12:28:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19387.8, 300 sec: 19174.8). Total num frames: 1054519296. Throughput: 0: 4863.9. Samples: 13623012. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:28:52,842][08963] Avg episode reward: [(0, '10.404')] [2025-01-05 12:28:54,349][09057] Updated weights for policy 0, policy_version 257458 (0.0017) [2025-01-05 12:28:56,341][09057] Updated weights for policy 0, policy_version 257468 (0.0015) [2025-01-05 12:28:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19174.8). Total num frames: 1054617600. Throughput: 0: 4852.6. Samples: 13637572. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:28:57,842][08963] Avg episode reward: [(0, '9.743')] [2025-01-05 12:28:58,415][09057] Updated weights for policy 0, policy_version 257478 (0.0015) [2025-01-05 12:29:00,498][09057] Updated weights for policy 0, policy_version 257488 (0.0016) [2025-01-05 12:29:02,482][09057] Updated weights for policy 0, policy_version 257498 (0.0015) [2025-01-05 12:29:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19188.7). 
Total num frames: 1054715904. Throughput: 0: 4858.4. Samples: 13667540. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:02,842][08963] Avg episode reward: [(0, '9.724')] [2025-01-05 12:29:04,557][09057] Updated weights for policy 0, policy_version 257508 (0.0016) [2025-01-05 12:29:06,633][09057] Updated weights for policy 0, policy_version 257518 (0.0015) [2025-01-05 12:29:07,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19524.2, 300 sec: 19216.5). Total num frames: 1054814208. Throughput: 0: 4870.6. Samples: 13697270. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:07,843][08963] Avg episode reward: [(0, '9.685')] [2025-01-05 12:29:08,739][09057] Updated weights for policy 0, policy_version 257528 (0.0017) [2025-01-05 12:29:10,821][09057] Updated weights for policy 0, policy_version 257538 (0.0016) [2025-01-05 12:29:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19216.5). Total num frames: 1054912512. Throughput: 0: 4878.4. Samples: 13712026. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:12,842][08963] Avg episode reward: [(0, '9.831')] [2025-01-05 12:29:13,016][09057] Updated weights for policy 0, policy_version 257548 (0.0017) [2025-01-05 12:29:15,116][09057] Updated weights for policy 0, policy_version 257558 (0.0016) [2025-01-05 12:29:17,215][09057] Updated weights for policy 0, policy_version 257568 (0.0017) [2025-01-05 12:29:17,842][08963] Fps is (10 sec: 19251.6, 60 sec: 19456.0, 300 sec: 19216.5). Total num frames: 1055006720. Throughput: 0: 4875.9. Samples: 13740844. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:17,842][08963] Avg episode reward: [(0, '7.967')] [2025-01-05 12:29:19,421][09057] Updated weights for policy 0, policy_version 257578 (0.0017) [2025-01-05 12:29:21,470][09057] Updated weights for policy 0, policy_version 257588 (0.0017) [2025-01-05 12:29:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19230.4). Total num frames: 1055105024. 
Throughput: 0: 4869.0. Samples: 13769606. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:22,842][08963] Avg episode reward: [(0, '8.933')] [2025-01-05 12:29:23,675][09057] Updated weights for policy 0, policy_version 257598 (0.0016) [2025-01-05 12:29:25,747][09057] Updated weights for policy 0, policy_version 257608 (0.0017) [2025-01-05 12:29:27,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.1, 300 sec: 19244.3). Total num frames: 1055203328. Throughput: 0: 4875.8. Samples: 13784108. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:27,842][08963] Avg episode reward: [(0, '9.627')] [2025-01-05 12:29:27,843][09057] Updated weights for policy 0, policy_version 257618 (0.0017) [2025-01-05 12:29:30,057][09057] Updated weights for policy 0, policy_version 257628 (0.0017) [2025-01-05 12:29:32,136][09057] Updated weights for policy 0, policy_version 257638 (0.0016) [2025-01-05 12:29:32,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19244.3). Total num frames: 1055297536. Throughput: 0: 4869.8. Samples: 13812794. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:32,842][08963] Avg episode reward: [(0, '8.995')] [2025-01-05 12:29:34,265][09057] Updated weights for policy 0, policy_version 257648 (0.0016) [2025-01-05 12:29:36,371][09057] Updated weights for policy 0, policy_version 257658 (0.0016) [2025-01-05 12:29:37,842][08963] Fps is (10 sec: 18841.5, 60 sec: 19387.7, 300 sec: 19230.4). Total num frames: 1055391744. Throughput: 0: 4862.1. Samples: 13841806. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:37,842][08963] Avg episode reward: [(0, '10.458')] [2025-01-05 12:29:38,560][09057] Updated weights for policy 0, policy_version 257668 (0.0017) [2025-01-05 12:29:40,633][09057] Updated weights for policy 0, policy_version 257678 (0.0016) [2025-01-05 12:29:42,719][09057] Updated weights for policy 0, policy_version 257688 (0.0016) [2025-01-05 12:29:42,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19244.3). Total num frames: 1055490048. Throughput: 0: 4858.1. Samples: 13856186. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:42,842][08963] Avg episode reward: [(0, '8.839')] [2025-01-05 12:29:44,936][09057] Updated weights for policy 0, policy_version 257698 (0.0017) [2025-01-05 12:29:46,964][09057] Updated weights for policy 0, policy_version 257708 (0.0016) [2025-01-05 12:29:47,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19244.3). Total num frames: 1055584256. Throughput: 0: 4838.3. Samples: 13885264. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:47,842][08963] Avg episode reward: [(0, '9.996')] [2025-01-05 12:29:47,914][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000257712_1055588352.pth... [2025-01-05 12:29:47,970][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000256582_1050959872.pth [2025-01-05 12:29:49,260][09057] Updated weights for policy 0, policy_version 257718 (0.0018) [2025-01-05 12:29:51,399][09057] Updated weights for policy 0, policy_version 257728 (0.0019) [2025-01-05 12:29:52,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19244.3). Total num frames: 1055682560. Throughput: 0: 4809.5. Samples: 13913696. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:52,842][08963] Avg episode reward: [(0, '9.310')] [2025-01-05 12:29:53,421][09057] Updated weights for policy 0, policy_version 257738 (0.0015) [2025-01-05 12:29:55,506][09057] Updated weights for policy 0, policy_version 257748 (0.0015) [2025-01-05 12:29:57,543][09057] Updated weights for policy 0, policy_version 257758 (0.0015) [2025-01-05 12:29:57,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19387.7, 300 sec: 19244.2). Total num frames: 1055780864. Throughput: 0: 4814.2. Samples: 13928664. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:29:57,843][08963] Avg episode reward: [(0, '9.400')] [2025-01-05 12:29:59,583][09057] Updated weights for policy 0, policy_version 257768 (0.0016) [2025-01-05 12:30:01,689][09057] Updated weights for policy 0, policy_version 257778 (0.0016) [2025-01-05 12:30:02,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19258.1). Total num frames: 1055879168. Throughput: 0: 4837.6. Samples: 13958534. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:02,842][08963] Avg episode reward: [(0, '8.645')] [2025-01-05 12:30:03,849][09057] Updated weights for policy 0, policy_version 257788 (0.0017) [2025-01-05 12:30:05,944][09057] Updated weights for policy 0, policy_version 257798 (0.0017) [2025-01-05 12:30:07,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19258.1). Total num frames: 1055973376. Throughput: 0: 4827.9. Samples: 13986864. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:07,843][08963] Avg episode reward: [(0, '9.293')] [2025-01-05 12:30:08,194][09057] Updated weights for policy 0, policy_version 257808 (0.0017) [2025-01-05 12:30:10,275][09057] Updated weights for policy 0, policy_version 257818 (0.0016) [2025-01-05 12:30:12,305][09057] Updated weights for policy 0, policy_version 257828 (0.0015) [2025-01-05 12:30:12,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19319.4, 300 sec: 19272.0). 
Total num frames: 1056071680. Throughput: 0: 4827.7. Samples: 14001356. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:12,842][08963] Avg episode reward: [(0, '9.502')] [2025-01-05 12:30:14,545][09057] Updated weights for policy 0, policy_version 257838 (0.0018) [2025-01-05 12:30:16,631][09057] Updated weights for policy 0, policy_version 257848 (0.0015) [2025-01-05 12:30:17,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19319.4, 300 sec: 19258.1). Total num frames: 1056165888. Throughput: 0: 4833.9. Samples: 14030318. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:17,842][08963] Avg episode reward: [(0, '9.726')] [2025-01-05 12:30:18,768][09057] Updated weights for policy 0, policy_version 257858 (0.0017) [2025-01-05 12:30:20,942][09057] Updated weights for policy 0, policy_version 257868 (0.0017) [2025-01-05 12:30:22,842][08963] Fps is (10 sec: 18841.7, 60 sec: 19251.2, 300 sec: 19258.1). Total num frames: 1056260096. Throughput: 0: 4822.3. Samples: 14058810. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:22,842][08963] Avg episode reward: [(0, '9.476')] [2025-01-05 12:30:23,114][09057] Updated weights for policy 0, policy_version 257878 (0.0018) [2025-01-05 12:30:25,201][09057] Updated weights for policy 0, policy_version 257888 (0.0017) [2025-01-05 12:30:27,307][09057] Updated weights for policy 0, policy_version 257898 (0.0015) [2025-01-05 12:30:27,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19272.0). Total num frames: 1056358400. Throughput: 0: 4825.0. Samples: 14073310. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:27,842][08963] Avg episode reward: [(0, '9.522')] [2025-01-05 12:30:29,463][09057] Updated weights for policy 0, policy_version 257908 (0.0017) [2025-01-05 12:30:31,541][09057] Updated weights for policy 0, policy_version 257918 (0.0017) [2025-01-05 12:30:32,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19272.0). Total num frames: 1056452608. 
Throughput: 0: 4821.9. Samples: 14102248. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:32,842][08963] Avg episode reward: [(0, '10.188')] [2025-01-05 12:30:33,877][09057] Updated weights for policy 0, policy_version 257928 (0.0017) [2025-01-05 12:30:36,287][09057] Updated weights for policy 0, policy_version 257938 (0.0016) [2025-01-05 12:30:37,842][08963] Fps is (10 sec: 18022.2, 60 sec: 19114.7, 300 sec: 19230.4). Total num frames: 1056538624. Throughput: 0: 4764.5. Samples: 14128100. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:37,843][08963] Avg episode reward: [(0, '10.252')] [2025-01-05 12:30:38,662][09057] Updated weights for policy 0, policy_version 257948 (0.0016) [2025-01-05 12:30:40,943][09057] Updated weights for policy 0, policy_version 257958 (0.0017) [2025-01-05 12:30:42,842][08963] Fps is (10 sec: 17613.0, 60 sec: 18978.1, 300 sec: 19216.5). Total num frames: 1056628736. Throughput: 0: 4731.4. Samples: 14141578. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:42,842][08963] Avg episode reward: [(0, '8.775')] [2025-01-05 12:30:43,192][09057] Updated weights for policy 0, policy_version 257968 (0.0017) [2025-01-05 12:30:45,323][09057] Updated weights for policy 0, policy_version 257978 (0.0017) [2025-01-05 12:30:47,431][09057] Updated weights for policy 0, policy_version 257988 (0.0017) [2025-01-05 12:30:47,842][08963] Fps is (10 sec: 18432.1, 60 sec: 18978.1, 300 sec: 19216.6). Total num frames: 1056722944. Throughput: 0: 4696.4. Samples: 14169874. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:47,842][08963] Avg episode reward: [(0, '8.951')] [2025-01-05 12:30:49,599][09057] Updated weights for policy 0, policy_version 257998 (0.0018) [2025-01-05 12:30:51,694][09057] Updated weights for policy 0, policy_version 258008 (0.0017) [2025-01-05 12:30:52,842][08963] Fps is (10 sec: 19251.1, 60 sec: 18978.2, 300 sec: 19216.5). Total num frames: 1056821248. Throughput: 0: 4706.2. 
Samples: 14198644. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:52,842][08963] Avg episode reward: [(0, '9.894')] [2025-01-05 12:30:53,854][09057] Updated weights for policy 0, policy_version 258018 (0.0017) [2025-01-05 12:30:55,958][09057] Updated weights for policy 0, policy_version 258028 (0.0017) [2025-01-05 12:30:57,842][08963] Fps is (10 sec: 19251.2, 60 sec: 18909.9, 300 sec: 19202.6). Total num frames: 1056915456. Throughput: 0: 4703.9. Samples: 14213030. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:30:57,842][08963] Avg episode reward: [(0, '8.829')] [2025-01-05 12:30:58,109][09057] Updated weights for policy 0, policy_version 258038 (0.0018) [2025-01-05 12:31:00,173][09057] Updated weights for policy 0, policy_version 258048 (0.0017) [2025-01-05 12:31:02,272][09057] Updated weights for policy 0, policy_version 258058 (0.0017) [2025-01-05 12:31:02,842][08963] Fps is (10 sec: 19251.1, 60 sec: 18909.9, 300 sec: 19216.5). Total num frames: 1057013760. Throughput: 0: 4710.9. Samples: 14242306. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:31:02,842][08963] Avg episode reward: [(0, '9.545')] [2025-01-05 12:31:04,411][09057] Updated weights for policy 0, policy_version 258068 (0.0017) [2025-01-05 12:31:06,502][09057] Updated weights for policy 0, policy_version 258078 (0.0017) [2025-01-05 12:31:07,842][08963] Fps is (10 sec: 19661.0, 60 sec: 18978.2, 300 sec: 19216.5). Total num frames: 1057112064. Throughput: 0: 4720.3. Samples: 14271224. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 12:31:07,842][08963] Avg episode reward: [(0, '9.546')] [2025-01-05 12:31:08,675][09057] Updated weights for policy 0, policy_version 258088 (0.0019) [2025-01-05 12:31:10,761][09057] Updated weights for policy 0, policy_version 258098 (0.0017) [2025-01-05 12:31:12,842][08963] Fps is (10 sec: 19251.2, 60 sec: 18909.9, 300 sec: 19216.5). Total num frames: 1057206272. Throughput: 0: 4720.9. Samples: 14285752. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:12,842][08963] Avg episode reward: [(0, '9.145')] [2025-01-05 12:31:12,973][09057] Updated weights for policy 0, policy_version 258108 (0.0017) [2025-01-05 12:31:15,103][09057] Updated weights for policy 0, policy_version 258118 (0.0017) [2025-01-05 12:31:17,196][09057] Updated weights for policy 0, policy_version 258128 (0.0017) [2025-01-05 12:31:17,842][08963] Fps is (10 sec: 18841.3, 60 sec: 18909.9, 300 sec: 19202.6). Total num frames: 1057300480. Throughput: 0: 4711.3. Samples: 14314258. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:17,843][08963] Avg episode reward: [(0, '10.408')] [2025-01-05 12:31:19,404][09057] Updated weights for policy 0, policy_version 258138 (0.0018) [2025-01-05 12:31:21,501][09057] Updated weights for policy 0, policy_version 258148 (0.0017) [2025-01-05 12:31:22,842][08963] Fps is (10 sec: 19251.2, 60 sec: 18978.1, 300 sec: 19202.6). Total num frames: 1057398784. Throughput: 0: 4771.7. Samples: 14342824. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:22,842][08963] Avg episode reward: [(0, '9.532')] [2025-01-05 12:31:23,668][09057] Updated weights for policy 0, policy_version 258158 (0.0016) [2025-01-05 12:31:25,737][09057] Updated weights for policy 0, policy_version 258168 (0.0015) [2025-01-05 12:31:27,842][08963] Fps is (10 sec: 19251.5, 60 sec: 18909.9, 300 sec: 19188.7). Total num frames: 1057492992. Throughput: 0: 4799.4. Samples: 14357550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:27,842][08963] Avg episode reward: [(0, '8.773')] [2025-01-05 12:31:27,923][09057] Updated weights for policy 0, policy_version 258178 (0.0018) [2025-01-05 12:31:30,139][09057] Updated weights for policy 0, policy_version 258188 (0.0017) [2025-01-05 12:31:32,230][09057] Updated weights for policy 0, policy_version 258198 (0.0016) [2025-01-05 12:31:32,842][08963] Fps is (10 sec: 18841.6, 60 sec: 18909.9, 300 sec: 19174.8). 
Total num frames: 1057587200. Throughput: 0: 4798.3. Samples: 14385798. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:32,842][08963] Avg episode reward: [(0, '9.344')] [2025-01-05 12:31:34,334][09057] Updated weights for policy 0, policy_version 258208 (0.0016) [2025-01-05 12:31:36,438][09057] Updated weights for policy 0, policy_version 258218 (0.0016) [2025-01-05 12:31:37,842][08963] Fps is (10 sec: 19250.5, 60 sec: 19114.6, 300 sec: 19188.7). Total num frames: 1057685504. Throughput: 0: 4804.8. Samples: 14414860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:37,843][08963] Avg episode reward: [(0, '9.841')] [2025-01-05 12:31:38,591][09057] Updated weights for policy 0, policy_version 258228 (0.0016) [2025-01-05 12:31:40,660][09057] Updated weights for policy 0, policy_version 258238 (0.0016) [2025-01-05 12:31:42,770][09057] Updated weights for policy 0, policy_version 258248 (0.0016) [2025-01-05 12:31:42,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19251.2, 300 sec: 19202.6). Total num frames: 1057783808. Throughput: 0: 4808.5. Samples: 14429412. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:42,842][08963] Avg episode reward: [(0, '9.874')] [2025-01-05 12:31:44,904][09057] Updated weights for policy 0, policy_version 258258 (0.0017) [2025-01-05 12:31:46,959][09057] Updated weights for policy 0, policy_version 258268 (0.0016) [2025-01-05 12:31:47,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19188.7). Total num frames: 1057878016. Throughput: 0: 4808.3. Samples: 14458680. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:47,842][08963] Avg episode reward: [(0, '9.410')] [2025-01-05 12:31:47,874][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000258272_1057882112.pth... 
[2025-01-05 12:31:47,929][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000257145_1053265920.pth [2025-01-05 12:31:49,195][09057] Updated weights for policy 0, policy_version 258278 (0.0017) [2025-01-05 12:31:51,227][09057] Updated weights for policy 0, policy_version 258288 (0.0016) [2025-01-05 12:31:52,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19188.7). Total num frames: 1057976320. Throughput: 0: 4809.6. Samples: 14487656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:52,842][08963] Avg episode reward: [(0, '9.834')] [2025-01-05 12:31:53,297][09057] Updated weights for policy 0, policy_version 258298 (0.0017) [2025-01-05 12:31:55,428][09057] Updated weights for policy 0, policy_version 258308 (0.0018) [2025-01-05 12:31:57,475][09057] Updated weights for policy 0, policy_version 258318 (0.0017) [2025-01-05 12:31:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19319.5, 300 sec: 19202.6). Total num frames: 1058074624. Throughput: 0: 4816.8. Samples: 14502508. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:31:57,842][08963] Avg episode reward: [(0, '9.495')] [2025-01-05 12:31:59,551][09057] Updated weights for policy 0, policy_version 258328 (0.0016) [2025-01-05 12:32:01,684][09057] Updated weights for policy 0, policy_version 258338 (0.0017) [2025-01-05 12:32:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19319.5, 300 sec: 19202.6). Total num frames: 1058172928. Throughput: 0: 4837.9. Samples: 14531964. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:32:02,843][08963] Avg episode reward: [(0, '9.377')] [2025-01-05 12:32:03,841][09057] Updated weights for policy 0, policy_version 258348 (0.0018) [2025-01-05 12:32:05,909][09057] Updated weights for policy 0, policy_version 258358 (0.0017) [2025-01-05 12:32:07,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19216.5). Total num frames: 1058267136. 
Throughput: 0: 4834.8. Samples: 14560390. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:32:07,843][08963] Avg episode reward: [(0, '9.539')] [2025-01-05 12:32:08,143][09057] Updated weights for policy 0, policy_version 258368 (0.0017) [2025-01-05 12:32:10,210][09057] Updated weights for policy 0, policy_version 258378 (0.0017) [2025-01-05 12:32:12,255][09057] Updated weights for policy 0, policy_version 258388 (0.0017) [2025-01-05 12:32:12,842][08963] Fps is (10 sec: 19250.5, 60 sec: 19319.3, 300 sec: 19216.5). Total num frames: 1058365440. Throughput: 0: 4835.5. Samples: 14575152. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:32:12,843][08963] Avg episode reward: [(0, '8.824')] [2025-01-05 12:32:14,467][09057] Updated weights for policy 0, policy_version 258398 (0.0016) [2025-01-05 12:32:16,533][09057] Updated weights for policy 0, policy_version 258408 (0.0016) [2025-01-05 12:32:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19387.8, 300 sec: 19230.4). Total num frames: 1058463744. Throughput: 0: 4854.1. Samples: 14604234. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:32:17,843][08963] Avg episode reward: [(0, '9.411')] [2025-01-05 12:32:18,682][09057] Updated weights for policy 0, policy_version 258418 (0.0018) [2025-01-05 12:32:20,821][09057] Updated weights for policy 0, policy_version 258428 (0.0016) [2025-01-05 12:32:22,842][08963] Fps is (10 sec: 19251.9, 60 sec: 19319.5, 300 sec: 19230.4). Total num frames: 1058557952. Throughput: 0: 4843.4. Samples: 14632812. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:22,843][08963] Avg episode reward: [(0, '10.008')] [2025-01-05 12:32:22,992][09057] Updated weights for policy 0, policy_version 258438 (0.0017) [2025-01-05 12:32:25,135][09057] Updated weights for policy 0, policy_version 258448 (0.0017) [2025-01-05 12:32:27,265][09057] Updated weights for policy 0, policy_version 258458 (0.0016) [2025-01-05 12:32:27,842][08963] Fps is (10 sec: 18841.3, 60 sec: 19319.4, 300 sec: 19230.4). Total num frames: 1058652160. Throughput: 0: 4834.9. Samples: 14646982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:27,843][08963] Avg episode reward: [(0, '9.556')] [2025-01-05 12:32:29,404][09057] Updated weights for policy 0, policy_version 258468 (0.0016) [2025-01-05 12:32:31,445][09057] Updated weights for policy 0, policy_version 258478 (0.0016) [2025-01-05 12:32:32,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19230.4). Total num frames: 1058750464. Throughput: 0: 4833.3. Samples: 14676178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:32,843][08963] Avg episode reward: [(0, '9.549')] [2025-01-05 12:32:33,633][09057] Updated weights for policy 0, policy_version 258488 (0.0016) [2025-01-05 12:32:35,644][09057] Updated weights for policy 0, policy_version 258498 (0.0015) [2025-01-05 12:32:37,649][09057] Updated weights for policy 0, policy_version 258508 (0.0014) [2025-01-05 12:32:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19387.8, 300 sec: 19244.2). Total num frames: 1058848768. Throughput: 0: 4848.5. Samples: 14705840. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:37,842][08963] Avg episode reward: [(0, '8.894')] [2025-01-05 12:32:39,738][09057] Updated weights for policy 0, policy_version 258518 (0.0015) [2025-01-05 12:32:41,746][09057] Updated weights for policy 0, policy_version 258528 (0.0016) [2025-01-05 12:32:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19456.0, 300 sec: 19272.0). 
Total num frames: 1058951168. Throughput: 0: 4854.2. Samples: 14720946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:42,842][08963] Avg episode reward: [(0, '9.044')] [2025-01-05 12:32:43,751][09057] Updated weights for policy 0, policy_version 258538 (0.0015) [2025-01-05 12:32:45,833][09057] Updated weights for policy 0, policy_version 258548 (0.0016) [2025-01-05 12:32:47,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19524.3, 300 sec: 19299.8). Total num frames: 1059049472. Throughput: 0: 4864.5. Samples: 14750868. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:47,842][08963] Avg episode reward: [(0, '9.653')] [2025-01-05 12:32:47,942][09057] Updated weights for policy 0, policy_version 258558 (0.0017) [2025-01-05 12:32:49,976][09057] Updated weights for policy 0, policy_version 258568 (0.0015) [2025-01-05 12:32:52,049][09057] Updated weights for policy 0, policy_version 258578 (0.0017) [2025-01-05 12:32:52,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19313.7). Total num frames: 1059147776. Throughput: 0: 4895.7. Samples: 14780694. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:52,842][08963] Avg episode reward: [(0, '9.305')] [2025-01-05 12:32:54,161][09057] Updated weights for policy 0, policy_version 258588 (0.0017) [2025-01-05 12:32:56,218][09057] Updated weights for policy 0, policy_version 258598 (0.0016) [2025-01-05 12:32:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19327.6). Total num frames: 1059246080. Throughput: 0: 4895.4. Samples: 14795442. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:32:57,842][08963] Avg episode reward: [(0, '10.465')] [2025-01-05 12:32:58,363][09057] Updated weights for policy 0, policy_version 258608 (0.0017) [2025-01-05 12:33:00,401][09057] Updated weights for policy 0, policy_version 258618 (0.0018) [2025-01-05 12:33:02,457][09057] Updated weights for policy 0, policy_version 258628 (0.0015) [2025-01-05 12:33:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19327.6). Total num frames: 1059344384. Throughput: 0: 4903.8. Samples: 14824904. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:02,842][08963] Avg episode reward: [(0, '9.684')] [2025-01-05 12:33:04,597][09057] Updated weights for policy 0, policy_version 258638 (0.0019) [2025-01-05 12:33:06,638][09057] Updated weights for policy 0, policy_version 258648 (0.0016) [2025-01-05 12:33:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19327.6). Total num frames: 1059442688. Throughput: 0: 4919.9. Samples: 14854210. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:07,843][08963] Avg episode reward: [(0, '8.721')] [2025-01-05 12:33:08,793][09057] Updated weights for policy 0, policy_version 258658 (0.0017) [2025-01-05 12:33:10,833][09057] Updated weights for policy 0, policy_version 258668 (0.0017) [2025-01-05 12:33:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.7, 300 sec: 19327.6). Total num frames: 1059540992. Throughput: 0: 4934.2. Samples: 14869022. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:12,842][08963] Avg episode reward: [(0, '9.583')] [2025-01-05 12:33:12,990][09057] Updated weights for policy 0, policy_version 258678 (0.0017) [2025-01-05 12:33:15,031][09057] Updated weights for policy 0, policy_version 258688 (0.0014) [2025-01-05 12:33:17,096][09057] Updated weights for policy 0, policy_version 258698 (0.0017) [2025-01-05 12:33:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19327.6). 
Total num frames: 1059639296. Throughput: 0: 4938.5. Samples: 14898412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:17,842][08963] Avg episode reward: [(0, '9.550')] [2025-01-05 12:33:19,256][09057] Updated weights for policy 0, policy_version 258708 (0.0019) [2025-01-05 12:33:21,289][09057] Updated weights for policy 0, policy_version 258718 (0.0015) [2025-01-05 12:33:22,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19327.6). Total num frames: 1059737600. Throughput: 0: 4925.8. Samples: 14927502. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:22,842][08963] Avg episode reward: [(0, '9.293')] [2025-01-05 12:33:23,460][09057] Updated weights for policy 0, policy_version 258728 (0.0017) [2025-01-05 12:33:25,507][09057] Updated weights for policy 0, policy_version 258738 (0.0015) [2025-01-05 12:33:27,573][09057] Updated weights for policy 0, policy_version 258748 (0.0015) [2025-01-05 12:33:27,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19341.4). Total num frames: 1059835904. Throughput: 0: 4919.2. Samples: 14942310. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:27,842][08963] Avg episode reward: [(0, '9.758')] [2025-01-05 12:33:29,784][09057] Updated weights for policy 0, policy_version 258758 (0.0017) [2025-01-05 12:33:31,825][09057] Updated weights for policy 0, policy_version 258768 (0.0016) [2025-01-05 12:33:32,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19327.6). Total num frames: 1059930112. Throughput: 0: 4901.4. Samples: 14971432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:32,842][08963] Avg episode reward: [(0, '9.631')] [2025-01-05 12:33:34,021][09057] Updated weights for policy 0, policy_version 258778 (0.0017) [2025-01-05 12:33:36,062][09057] Updated weights for policy 0, policy_version 258788 (0.0015) [2025-01-05 12:33:37,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19660.8, 300 sec: 19341.5). Total num frames: 1060028416. 
Throughput: 0: 4880.9. Samples: 15000334. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:37,842][08963] Avg episode reward: [(0, '8.679')] [2025-01-05 12:33:38,223][09057] Updated weights for policy 0, policy_version 258798 (0.0019) [2025-01-05 12:33:40,291][09057] Updated weights for policy 0, policy_version 258808 (0.0017) [2025-01-05 12:33:42,329][09057] Updated weights for policy 0, policy_version 258818 (0.0016) [2025-01-05 12:33:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19341.5). Total num frames: 1060126720. Throughput: 0: 4880.5. Samples: 15015062. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:42,842][08963] Avg episode reward: [(0, '9.971')] [2025-01-05 12:33:44,503][09057] Updated weights for policy 0, policy_version 258828 (0.0018) [2025-01-05 12:33:46,581][09057] Updated weights for policy 0, policy_version 258838 (0.0017) [2025-01-05 12:33:47,842][08963] Fps is (10 sec: 19250.7, 60 sec: 19524.2, 300 sec: 19327.6). Total num frames: 1060220928. Throughput: 0: 4875.8. Samples: 15044316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:47,842][08963] Avg episode reward: [(0, '9.694')] [2025-01-05 12:33:47,858][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000258844_1060225024.pth... [2025-01-05 12:33:47,912][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000257712_1055588352.pth [2025-01-05 12:33:48,768][09057] Updated weights for policy 0, policy_version 258848 (0.0018) [2025-01-05 12:33:50,802][09057] Updated weights for policy 0, policy_version 258858 (0.0016) [2025-01-05 12:33:52,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19524.2, 300 sec: 19327.6). Total num frames: 1060319232. Throughput: 0: 4876.2. Samples: 15073638. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:52,842][08963] Avg episode reward: [(0, '10.802')] [2025-01-05 12:33:52,868][09057] Updated weights for policy 0, policy_version 258868 (0.0017) [2025-01-05 12:33:55,030][09057] Updated weights for policy 0, policy_version 258878 (0.0017) [2025-01-05 12:33:57,064][09057] Updated weights for policy 0, policy_version 258888 (0.0017) [2025-01-05 12:33:57,842][08963] Fps is (10 sec: 19661.2, 60 sec: 19524.3, 300 sec: 19327.6). Total num frames: 1060417536. Throughput: 0: 4867.6. Samples: 15088064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:33:57,842][08963] Avg episode reward: [(0, '9.367')] [2025-01-05 12:33:59,211][09057] Updated weights for policy 0, policy_version 258898 (0.0017) [2025-01-05 12:34:01,278][09057] Updated weights for policy 0, policy_version 258908 (0.0017) [2025-01-05 12:34:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19327.6). Total num frames: 1060515840. Throughput: 0: 4866.5. Samples: 15117404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:02,842][08963] Avg episode reward: [(0, '10.517')] [2025-01-05 12:34:03,391][09057] Updated weights for policy 0, policy_version 258918 (0.0017) [2025-01-05 12:34:05,438][09057] Updated weights for policy 0, policy_version 258928 (0.0016) [2025-01-05 12:34:07,506][09057] Updated weights for policy 0, policy_version 258938 (0.0017) [2025-01-05 12:34:07,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19524.3, 300 sec: 19327.6). Total num frames: 1060614144. Throughput: 0: 4880.6. Samples: 15147130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:07,842][08963] Avg episode reward: [(0, '9.432')] [2025-01-05 12:34:09,628][09057] Updated weights for policy 0, policy_version 258948 (0.0017) [2025-01-05 12:34:11,678][09057] Updated weights for policy 0, policy_version 258958 (0.0016) [2025-01-05 12:34:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19341.4). 
Total num frames: 1060712448. Throughput: 0: 4874.6. Samples: 15161668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:12,842][08963] Avg episode reward: [(0, '10.826')] [2025-01-05 12:34:13,810][09057] Updated weights for policy 0, policy_version 258968 (0.0017) [2025-01-05 12:34:15,881][09057] Updated weights for policy 0, policy_version 258978 (0.0016) [2025-01-05 12:34:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19341.4). Total num frames: 1060810752. Throughput: 0: 4880.4. Samples: 15191048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:17,842][08963] Avg episode reward: [(0, '9.168')] [2025-01-05 12:34:18,005][09057] Updated weights for policy 0, policy_version 258988 (0.0017) [2025-01-05 12:34:20,045][09057] Updated weights for policy 0, policy_version 258998 (0.0016) [2025-01-05 12:34:22,130][09057] Updated weights for policy 0, policy_version 259008 (0.0016) [2025-01-05 12:34:22,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19341.4). Total num frames: 1060909056. Throughput: 0: 4897.5. Samples: 15220722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:22,843][08963] Avg episode reward: [(0, '9.694')] [2025-01-05 12:34:24,243][09057] Updated weights for policy 0, policy_version 259018 (0.0017) [2025-01-05 12:34:26,298][09057] Updated weights for policy 0, policy_version 259028 (0.0017) [2025-01-05 12:34:27,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19355.3). Total num frames: 1061007360. Throughput: 0: 4892.7. Samples: 15235234. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:27,842][08963] Avg episode reward: [(0, '10.559')] [2025-01-05 12:34:28,460][09057] Updated weights for policy 0, policy_version 259038 (0.0017) [2025-01-05 12:34:30,496][09057] Updated weights for policy 0, policy_version 259048 (0.0017) [2025-01-05 12:34:32,522][09057] Updated weights for policy 0, policy_version 259058 (0.0016) [2025-01-05 12:34:32,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19369.2). Total num frames: 1061105664. Throughput: 0: 4901.1. Samples: 15264864. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:32,842][08963] Avg episode reward: [(0, '10.796')] [2025-01-05 12:34:34,686][09057] Updated weights for policy 0, policy_version 259068 (0.0017) [2025-01-05 12:34:36,757][09057] Updated weights for policy 0, policy_version 259078 (0.0016) [2025-01-05 12:34:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19369.2). Total num frames: 1061203968. Throughput: 0: 4897.5. Samples: 15294024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:37,842][08963] Avg episode reward: [(0, '10.517')] [2025-01-05 12:34:38,876][09057] Updated weights for policy 0, policy_version 259088 (0.0017) [2025-01-05 12:34:40,975][09057] Updated weights for policy 0, policy_version 259098 (0.0016) [2025-01-05 12:34:42,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19524.2, 300 sec: 19369.2). Total num frames: 1061298176. Throughput: 0: 4903.3. Samples: 15308712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:34:42,842][08963] Avg episode reward: [(0, '9.671')] [2025-01-05 12:34:43,108][09057] Updated weights for policy 0, policy_version 259108 (0.0017) [2025-01-05 12:34:45,121][09057] Updated weights for policy 0, policy_version 259118 (0.0016) [2025-01-05 12:34:47,167][09057] Updated weights for policy 0, policy_version 259128 (0.0017) [2025-01-05 12:34:47,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19383.1). 
Total num frames: 1061400576. Throughput: 0: 4912.6. Samples: 15338472. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:34:47,842][08963] Avg episode reward: [(0, '12.323')] [2025-01-05 12:34:47,850][09024] Saving new best policy, reward=12.323! [2025-01-05 12:34:49,302][09057] Updated weights for policy 0, policy_version 259138 (0.0017) [2025-01-05 12:34:51,351][09057] Updated weights for policy 0, policy_version 259148 (0.0016) [2025-01-05 12:34:52,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19369.2). Total num frames: 1061494784. Throughput: 0: 4901.2. Samples: 15367682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:34:52,842][08963] Avg episode reward: [(0, '10.153')] [2025-01-05 12:34:53,500][09057] Updated weights for policy 0, policy_version 259158 (0.0016) [2025-01-05 12:34:55,527][09057] Updated weights for policy 0, policy_version 259168 (0.0016) [2025-01-05 12:34:57,597][09057] Updated weights for policy 0, policy_version 259178 (0.0016) [2025-01-05 12:34:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19383.1). Total num frames: 1061597184. Throughput: 0: 4905.9. Samples: 15382432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:34:57,842][08963] Avg episode reward: [(0, '9.981')] [2025-01-05 12:34:59,740][09057] Updated weights for policy 0, policy_version 259188 (0.0016) [2025-01-05 12:35:01,778][09057] Updated weights for policy 0, policy_version 259198 (0.0016) [2025-01-05 12:35:02,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19383.1). Total num frames: 1061691392. Throughput: 0: 4910.9. Samples: 15412036. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:02,842][08963] Avg episode reward: [(0, '10.262')] [2025-01-05 12:35:03,914][09057] Updated weights for policy 0, policy_version 259208 (0.0017) [2025-01-05 12:35:05,986][09057] Updated weights for policy 0, policy_version 259218 (0.0017) [2025-01-05 12:35:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.6, 300 sec: 19383.1). Total num frames: 1061789696. Throughput: 0: 4899.2. Samples: 15441188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:07,842][08963] Avg episode reward: [(0, '10.061')] [2025-01-05 12:35:08,080][09057] Updated weights for policy 0, policy_version 259228 (0.0017) [2025-01-05 12:35:10,137][09057] Updated weights for policy 0, policy_version 259238 (0.0016) [2025-01-05 12:35:12,206][09057] Updated weights for policy 0, policy_version 259248 (0.0015) [2025-01-05 12:35:12,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19397.0). Total num frames: 1061888000. Throughput: 0: 4910.3. Samples: 15456198. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:12,842][08963] Avg episode reward: [(0, '9.390')] [2025-01-05 12:35:14,291][09057] Updated weights for policy 0, policy_version 259258 (0.0016) [2025-01-05 12:35:16,346][09057] Updated weights for policy 0, policy_version 259268 (0.0016) [2025-01-05 12:35:17,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19410.9). Total num frames: 1061986304. Throughput: 0: 4909.8. Samples: 15485806. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:17,842][08963] Avg episode reward: [(0, '10.831')] [2025-01-05 12:35:18,498][09057] Updated weights for policy 0, policy_version 259278 (0.0017) [2025-01-05 12:35:20,524][09057] Updated weights for policy 0, policy_version 259288 (0.0016) [2025-01-05 12:35:22,591][09057] Updated weights for policy 0, policy_version 259298 (0.0016) [2025-01-05 12:35:22,842][08963] Fps is (10 sec: 20070.0, 60 sec: 19660.8, 300 sec: 19424.8). 
Total num frames: 1062088704. Throughput: 0: 4918.4. Samples: 15515352. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:22,842][08963] Avg episode reward: [(0, '10.775')] [2025-01-05 12:35:24,695][09057] Updated weights for policy 0, policy_version 259308 (0.0016) [2025-01-05 12:35:26,729][09057] Updated weights for policy 0, policy_version 259318 (0.0018) [2025-01-05 12:35:27,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19438.6). Total num frames: 1062187008. Throughput: 0: 4920.9. Samples: 15530154. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:27,842][08963] Avg episode reward: [(0, '10.509')] [2025-01-05 12:35:28,905][09057] Updated weights for policy 0, policy_version 259328 (0.0017) [2025-01-05 12:35:30,938][09057] Updated weights for policy 0, policy_version 259338 (0.0017) [2025-01-05 12:35:32,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1062281216. Throughput: 0: 4910.0. Samples: 15559422. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:32,843][08963] Avg episode reward: [(0, '10.095')] [2025-01-05 12:35:33,070][09057] Updated weights for policy 0, policy_version 259348 (0.0018) [2025-01-05 12:35:35,158][09057] Updated weights for policy 0, policy_version 259358 (0.0016) [2025-01-05 12:35:37,184][09057] Updated weights for policy 0, policy_version 259368 (0.0016) [2025-01-05 12:35:37,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1062383616. Throughput: 0: 4916.3. Samples: 15588916. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:37,842][08963] Avg episode reward: [(0, '10.177')] [2025-01-05 12:35:39,299][09057] Updated weights for policy 0, policy_version 259378 (0.0017) [2025-01-05 12:35:41,350][09057] Updated weights for policy 0, policy_version 259388 (0.0016) [2025-01-05 12:35:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1062481920. 
Throughput: 0: 4918.4. Samples: 15603760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:42,843][08963] Avg episode reward: [(0, '10.063')] [2025-01-05 12:35:43,479][09057] Updated weights for policy 0, policy_version 259398 (0.0016) [2025-01-05 12:35:45,547][09057] Updated weights for policy 0, policy_version 259408 (0.0016) [2025-01-05 12:35:47,601][09057] Updated weights for policy 0, policy_version 259418 (0.0016) [2025-01-05 12:35:47,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19521.9). Total num frames: 1062580224. Throughput: 0: 4917.7. Samples: 15633332. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:47,842][08963] Avg episode reward: [(0, '8.858')] [2025-01-05 12:35:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000259419_1062580224.pth... [2025-01-05 12:35:47,899][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000258272_1057882112.pth [2025-01-05 12:35:49,728][09057] Updated weights for policy 0, policy_version 259428 (0.0017) [2025-01-05 12:35:51,759][09057] Updated weights for policy 0, policy_version 259438 (0.0016) [2025-01-05 12:35:52,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19522.0). Total num frames: 1062674432. Throughput: 0: 4918.5. Samples: 15662518. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:52,842][08963] Avg episode reward: [(0, '9.289')] [2025-01-05 12:35:53,937][09057] Updated weights for policy 0, policy_version 259448 (0.0017) [2025-01-05 12:35:55,970][09057] Updated weights for policy 0, policy_version 259458 (0.0016) [2025-01-05 12:35:57,842][08963] Fps is (10 sec: 19251.5, 60 sec: 19592.6, 300 sec: 19522.0). Total num frames: 1062772736. Throughput: 0: 4914.0. Samples: 15677328. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:35:57,842][08963] Avg episode reward: [(0, '10.995')] [2025-01-05 12:35:58,084][09057] Updated weights for policy 0, policy_version 259468 (0.0017) [2025-01-05 12:36:00,122][09057] Updated weights for policy 0, policy_version 259478 (0.0016) [2025-01-05 12:36:02,162][09057] Updated weights for policy 0, policy_version 259488 (0.0016) [2025-01-05 12:36:02,842][08963] Fps is (10 sec: 20069.8, 60 sec: 19729.0, 300 sec: 19535.8). Total num frames: 1062875136. Throughput: 0: 4917.1. Samples: 15707078. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:02,843][08963] Avg episode reward: [(0, '10.261')] [2025-01-05 12:36:04,300][09057] Updated weights for policy 0, policy_version 259498 (0.0017) [2025-01-05 12:36:06,361][09057] Updated weights for policy 0, policy_version 259508 (0.0017) [2025-01-05 12:36:07,842][08963] Fps is (10 sec: 19660.3, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1062969344. Throughput: 0: 4905.1. Samples: 15736080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:07,843][08963] Avg episode reward: [(0, '9.994')] [2025-01-05 12:36:08,520][09057] Updated weights for policy 0, policy_version 259518 (0.0018) [2025-01-05 12:36:10,586][09057] Updated weights for policy 0, policy_version 259528 (0.0017) [2025-01-05 12:36:12,635][09057] Updated weights for policy 0, policy_version 259538 (0.0016) [2025-01-05 12:36:12,842][08963] Fps is (10 sec: 19251.9, 60 sec: 19660.8, 300 sec: 19549.7). Total num frames: 1063067648. Throughput: 0: 4907.5. Samples: 15750990. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:12,842][08963] Avg episode reward: [(0, '9.539')] [2025-01-05 12:36:14,776][09057] Updated weights for policy 0, policy_version 259548 (0.0017) [2025-01-05 12:36:16,867][09057] Updated weights for policy 0, policy_version 259558 (0.0016) [2025-01-05 12:36:17,842][08963] Fps is (10 sec: 19661.2, 60 sec: 19660.8, 300 sec: 19549.7). 
Total num frames: 1063165952. Throughput: 0: 4907.6. Samples: 15780264. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:17,842][08963] Avg episode reward: [(0, '10.154')] [2025-01-05 12:36:19,006][09057] Updated weights for policy 0, policy_version 259568 (0.0017) [2025-01-05 12:36:21,061][09057] Updated weights for policy 0, policy_version 259578 (0.0016) [2025-01-05 12:36:22,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.6, 300 sec: 19563.6). Total num frames: 1063264256. Throughput: 0: 4895.6. Samples: 15809218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:22,842][08963] Avg episode reward: [(0, '8.803')] [2025-01-05 12:36:23,232][09057] Updated weights for policy 0, policy_version 259588 (0.0017) [2025-01-05 12:36:25,278][09057] Updated weights for policy 0, policy_version 259598 (0.0018) [2025-01-05 12:36:27,320][09057] Updated weights for policy 0, policy_version 259608 (0.0017) [2025-01-05 12:36:27,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1063362560. Throughput: 0: 4896.2. Samples: 15824088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:27,843][08963] Avg episode reward: [(0, '10.739')] [2025-01-05 12:36:29,509][09057] Updated weights for policy 0, policy_version 259618 (0.0018) [2025-01-05 12:36:31,562][09057] Updated weights for policy 0, policy_version 259628 (0.0016) [2025-01-05 12:36:32,848][08963] Fps is (10 sec: 19649.0, 60 sec: 19658.9, 300 sec: 19577.1). Total num frames: 1063460864. Throughput: 0: 4889.9. Samples: 15853404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:32,848][08963] Avg episode reward: [(0, '8.689')] [2025-01-05 12:36:33,676][09057] Updated weights for policy 0, policy_version 259638 (0.0017) [2025-01-05 12:36:35,768][09057] Updated weights for policy 0, policy_version 259648 (0.0017) [2025-01-05 12:36:37,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1063555072. 
Throughput: 0: 4888.3. Samples: 15882492. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:37,842][08963] Avg episode reward: [(0, '9.957')] [2025-01-05 12:36:37,901][09057] Updated weights for policy 0, policy_version 259658 (0.0017) [2025-01-05 12:36:40,016][09057] Updated weights for policy 0, policy_version 259668 (0.0017) [2025-01-05 12:36:42,065][09057] Updated weights for policy 0, policy_version 259678 (0.0014) [2025-01-05 12:36:42,842][08963] Fps is (10 sec: 19262.8, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1063653376. Throughput: 0: 4883.6. Samples: 15897088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:42,842][08963] Avg episode reward: [(0, '9.022')] [2025-01-05 12:36:44,197][09057] Updated weights for policy 0, policy_version 259688 (0.0017) [2025-01-05 12:36:46,222][09057] Updated weights for policy 0, policy_version 259698 (0.0016) [2025-01-05 12:36:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1063751680. Throughput: 0: 4878.3. Samples: 15926600. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:47,842][08963] Avg episode reward: [(0, '9.927')] [2025-01-05 12:36:48,359][09057] Updated weights for policy 0, policy_version 259708 (0.0016) [2025-01-05 12:36:50,399][09057] Updated weights for policy 0, policy_version 259718 (0.0015) [2025-01-05 12:36:52,412][09057] Updated weights for policy 0, policy_version 259728 (0.0016) [2025-01-05 12:36:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1063849984. Throughput: 0: 4896.2. Samples: 15956408. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:52,842][08963] Avg episode reward: [(0, '9.825')] [2025-01-05 12:36:54,480][09057] Updated weights for policy 0, policy_version 259738 (0.0016) [2025-01-05 12:36:56,530][09057] Updated weights for policy 0, policy_version 259748 (0.0016) [2025-01-05 12:36:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1063952384. Throughput: 0: 4899.9. Samples: 15971488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:36:57,842][08963] Avg episode reward: [(0, '10.712')] [2025-01-05 12:36:58,624][09057] Updated weights for policy 0, policy_version 259758 (0.0016) [2025-01-05 12:37:00,718][09057] Updated weights for policy 0, policy_version 259768 (0.0016) [2025-01-05 12:37:02,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19592.6, 300 sec: 19605.3). Total num frames: 1064050688. Throughput: 0: 4900.3. Samples: 16000780. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:02,842][08963] Avg episode reward: [(0, '9.885')] [2025-01-05 12:37:02,847][09057] Updated weights for policy 0, policy_version 259778 (0.0017) [2025-01-05 12:37:04,997][09057] Updated weights for policy 0, policy_version 259788 (0.0017) [2025-01-05 12:37:07,068][09057] Updated weights for policy 0, policy_version 259798 (0.0017) [2025-01-05 12:37:07,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19592.6, 300 sec: 19591.4). Total num frames: 1064144896. Throughput: 0: 4901.1. Samples: 16029770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:07,842][08963] Avg episode reward: [(0, '11.558')] [2025-01-05 12:37:09,183][09057] Updated weights for policy 0, policy_version 259808 (0.0017) [2025-01-05 12:37:11,218][09057] Updated weights for policy 0, policy_version 259818 (0.0017) [2025-01-05 12:37:12,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1064243200. Throughput: 0: 4902.6. Samples: 16044704. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:12,842][08963] Avg episode reward: [(0, '10.572')] [2025-01-05 12:37:13,341][09057] Updated weights for policy 0, policy_version 259828 (0.0017) [2025-01-05 12:37:15,365][09057] Updated weights for policy 0, policy_version 259838 (0.0016) [2025-01-05 12:37:17,409][09057] Updated weights for policy 0, policy_version 259848 (0.0015) [2025-01-05 12:37:17,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19619.1). Total num frames: 1064345600. Throughput: 0: 4913.0. Samples: 16074462. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:17,842][08963] Avg episode reward: [(0, '9.208')] [2025-01-05 12:37:19,532][09057] Updated weights for policy 0, policy_version 259858 (0.0016) [2025-01-05 12:37:21,568][09057] Updated weights for policy 0, policy_version 259868 (0.0017) [2025-01-05 12:37:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19619.2). Total num frames: 1064439808. Throughput: 0: 4920.0. Samples: 16103892. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:22,842][08963] Avg episode reward: [(0, '9.096')] [2025-01-05 12:37:23,673][09057] Updated weights for policy 0, policy_version 259878 (0.0016) [2025-01-05 12:37:25,734][09057] Updated weights for policy 0, policy_version 259888 (0.0016) [2025-01-05 12:37:27,842][08963] Fps is (10 sec: 19251.0, 60 sec: 19592.5, 300 sec: 19619.1). Total num frames: 1064538112. Throughput: 0: 4926.6. Samples: 16118786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:27,842][08963] Avg episode reward: [(0, '10.381')] [2025-01-05 12:37:27,863][09057] Updated weights for policy 0, policy_version 259898 (0.0016) [2025-01-05 12:37:29,999][09057] Updated weights for policy 0, policy_version 259908 (0.0016) [2025-01-05 12:37:32,059][09057] Updated weights for policy 0, policy_version 259918 (0.0016) [2025-01-05 12:37:32,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19594.4, 300 sec: 19619.1). 
Total num frames: 1064636416. Throughput: 0: 4916.5. Samples: 16147844. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:32,842][08963] Avg episode reward: [(0, '10.760')] [2025-01-05 12:37:34,191][09057] Updated weights for policy 0, policy_version 259928 (0.0016) [2025-01-05 12:37:36,241][09057] Updated weights for policy 0, policy_version 259938 (0.0016) [2025-01-05 12:37:37,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1064734720. Throughput: 0: 4905.3. Samples: 16177146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:37,842][08963] Avg episode reward: [(0, '10.251')] [2025-01-05 12:37:38,389][09057] Updated weights for policy 0, policy_version 259948 (0.0017) [2025-01-05 12:37:40,436][09057] Updated weights for policy 0, policy_version 259958 (0.0017) [2025-01-05 12:37:42,474][09057] Updated weights for policy 0, policy_version 259968 (0.0017) [2025-01-05 12:37:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1064833024. Throughput: 0: 4898.8. Samples: 16191932. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:42,842][08963] Avg episode reward: [(0, '10.147')] [2025-01-05 12:37:44,603][09057] Updated weights for policy 0, policy_version 259978 (0.0017) [2025-01-05 12:37:46,635][09057] Updated weights for policy 0, policy_version 259988 (0.0016) [2025-01-05 12:37:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19605.2). Total num frames: 1064931328. Throughput: 0: 4908.4. Samples: 16221660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:47,842][08963] Avg episode reward: [(0, '10.320')] [2025-01-05 12:37:47,912][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000259994_1064935424.pth... 
[2025-01-05 12:37:47,963][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000258844_1060225024.pth [2025-01-05 12:37:48,783][09057] Updated weights for policy 0, policy_version 259998 (0.0016) [2025-01-05 12:37:50,867][09057] Updated weights for policy 0, policy_version 260008 (0.0016) [2025-01-05 12:37:52,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1065029632. Throughput: 0: 4916.7. Samples: 16251020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:52,842][08963] Avg episode reward: [(0, '11.334')] [2025-01-05 12:37:52,898][09057] Updated weights for policy 0, policy_version 260018 (0.0015) [2025-01-05 12:37:54,916][09057] Updated weights for policy 0, policy_version 260028 (0.0015) [2025-01-05 12:37:56,983][09057] Updated weights for policy 0, policy_version 260038 (0.0015) [2025-01-05 12:37:57,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19660.8, 300 sec: 19619.1). Total num frames: 1065132032. Throughput: 0: 4921.6. Samples: 16266178. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:37:57,842][08963] Avg episode reward: [(0, '9.783')] [2025-01-05 12:37:59,125][09057] Updated weights for policy 0, policy_version 260048 (0.0017) [2025-01-05 12:38:01,146][09057] Updated weights for policy 0, policy_version 260058 (0.0018) [2025-01-05 12:38:02,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19605.3). Total num frames: 1065226240. Throughput: 0: 4914.0. Samples: 16295592. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:38:02,842][08963] Avg episode reward: [(0, '9.871')] [2025-01-05 12:38:03,314][09057] Updated weights for policy 0, policy_version 260068 (0.0018) [2025-01-05 12:38:05,389][09057] Updated weights for policy 0, policy_version 260078 (0.0016) [2025-01-05 12:38:07,406][09057] Updated weights for policy 0, policy_version 260088 (0.0016) [2025-01-05 12:38:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19619.1). Total num frames: 1065328640. Throughput: 0: 4915.8. Samples: 16325102. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:38:07,842][08963] Avg episode reward: [(0, '9.808')] [2025-01-05 12:38:09,490][09057] Updated weights for policy 0, policy_version 260098 (0.0015) [2025-01-05 12:38:11,523][09057] Updated weights for policy 0, policy_version 260108 (0.0016) [2025-01-05 12:38:12,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19619.2). Total num frames: 1065426944. Throughput: 0: 4917.2. Samples: 16340060. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:38:12,842][08963] Avg episode reward: [(0, '10.206')] [2025-01-05 12:38:13,571][09057] Updated weights for policy 0, policy_version 260118 (0.0015) [2025-01-05 12:38:15,663][09057] Updated weights for policy 0, policy_version 260128 (0.0016) [2025-01-05 12:38:17,681][09057] Updated weights for policy 0, policy_version 260138 (0.0016) [2025-01-05 12:38:17,842][08963] Fps is (10 sec: 19660.0, 60 sec: 19660.7, 300 sec: 19619.1). Total num frames: 1065525248. Throughput: 0: 4937.0. Samples: 16370012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:17,843][08963] Avg episode reward: [(0, '9.920')] [2025-01-05 12:38:19,806][09057] Updated weights for policy 0, policy_version 260148 (0.0017) [2025-01-05 12:38:21,902][09057] Updated weights for policy 0, policy_version 260158 (0.0016) [2025-01-05 12:38:22,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19619.2). 
Total num frames: 1065623552. Throughput: 0: 4935.7. Samples: 16399250. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:22,842][08963] Avg episode reward: [(0, '10.199')] [2025-01-05 12:38:24,038][09057] Updated weights for policy 0, policy_version 260168 (0.0017) [2025-01-05 12:38:26,075][09057] Updated weights for policy 0, policy_version 260178 (0.0017) [2025-01-05 12:38:27,842][08963] Fps is (10 sec: 19661.7, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1065721856. Throughput: 0: 4932.5. Samples: 16413894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:27,842][08963] Avg episode reward: [(0, '9.678')] [2025-01-05 12:38:28,269][09057] Updated weights for policy 0, policy_version 260188 (0.0017) [2025-01-05 12:38:30,307][09057] Updated weights for policy 0, policy_version 260198 (0.0016) [2025-01-05 12:38:32,322][09057] Updated weights for policy 0, policy_version 260208 (0.0014) [2025-01-05 12:38:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1065820160. Throughput: 0: 4928.1. Samples: 16443426. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:32,842][08963] Avg episode reward: [(0, '9.448')] [2025-01-05 12:38:34,385][09057] Updated weights for policy 0, policy_version 260218 (0.0016) [2025-01-05 12:38:36,454][09057] Updated weights for policy 0, policy_version 260228 (0.0017) [2025-01-05 12:38:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1065918464. Throughput: 0: 4938.9. Samples: 16473268. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:37,842][08963] Avg episode reward: [(0, '9.703')] [2025-01-05 12:38:38,486][09057] Updated weights for policy 0, policy_version 260238 (0.0016) [2025-01-05 12:38:40,559][09057] Updated weights for policy 0, policy_version 260248 (0.0016) [2025-01-05 12:38:42,596][09057] Updated weights for policy 0, policy_version 260258 (0.0016) [2025-01-05 12:38:42,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1066020864. Throughput: 0: 4936.7. Samples: 16488330. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:42,842][08963] Avg episode reward: [(0, '9.189')] [2025-01-05 12:38:44,724][09057] Updated weights for policy 0, policy_version 260268 (0.0017) [2025-01-05 12:38:46,827][09057] Updated weights for policy 0, policy_version 260278 (0.0016) [2025-01-05 12:38:47,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1066115072. Throughput: 0: 4936.5. Samples: 16517734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:47,842][08963] Avg episode reward: [(0, '9.927')] [2025-01-05 12:38:48,951][09057] Updated weights for policy 0, policy_version 260288 (0.0016) [2025-01-05 12:38:50,963][09057] Updated weights for policy 0, policy_version 260298 (0.0015) [2025-01-05 12:38:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1066217472. Throughput: 0: 4940.0. Samples: 16547402. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:52,842][08963] Avg episode reward: [(0, '10.551')] [2025-01-05 12:38:53,019][09057] Updated weights for policy 0, policy_version 260308 (0.0016) [2025-01-05 12:38:55,079][09057] Updated weights for policy 0, policy_version 260318 (0.0016) [2025-01-05 12:38:57,105][09057] Updated weights for policy 0, policy_version 260328 (0.0016) [2025-01-05 12:38:57,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.1, 300 sec: 19660.8). 
Total num frames: 1066315776. Throughput: 0: 4940.2. Samples: 16562368. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:38:57,842][08963] Avg episode reward: [(0, '9.985')] [2025-01-05 12:38:59,251][09057] Updated weights for policy 0, policy_version 260338 (0.0017) [2025-01-05 12:39:01,330][09057] Updated weights for policy 0, policy_version 260348 (0.0016) [2025-01-05 12:39:02,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19660.8). Total num frames: 1066414080. Throughput: 0: 4926.7. Samples: 16591712. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:39:02,842][08963] Avg episode reward: [(0, '9.825')] [2025-01-05 12:39:03,464][09057] Updated weights for policy 0, policy_version 260358 (0.0017) [2025-01-05 12:39:05,514][09057] Updated weights for policy 0, policy_version 260368 (0.0017) [2025-01-05 12:39:07,602][09057] Updated weights for policy 0, policy_version 260378 (0.0017) [2025-01-05 12:39:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1066512384. Throughput: 0: 4934.7. Samples: 16621314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:39:07,842][08963] Avg episode reward: [(0, '9.682')] [2025-01-05 12:39:09,684][09057] Updated weights for policy 0, policy_version 260388 (0.0017) [2025-01-05 12:39:11,749][09057] Updated weights for policy 0, policy_version 260398 (0.0016) [2025-01-05 12:39:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1066610688. Throughput: 0: 4934.0. Samples: 16635922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:39:12,842][08963] Avg episode reward: [(0, '9.510')] [2025-01-05 12:39:13,917][09057] Updated weights for policy 0, policy_version 260408 (0.0020) [2025-01-05 12:39:15,937][09057] Updated weights for policy 0, policy_version 260418 (0.0016) [2025-01-05 12:39:17,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.9, 300 sec: 19646.9). Total num frames: 1066704896. 
Throughput: 0: 4931.1. Samples: 16665328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:39:17,843][08963] Avg episode reward: [(0, '10.571')] [2025-01-05 12:39:18,060][09057] Updated weights for policy 0, policy_version 260428 (0.0017) [2025-01-05 12:39:20,140][09057] Updated weights for policy 0, policy_version 260438 (0.0016) [2025-01-05 12:39:22,162][09057] Updated weights for policy 0, policy_version 260448 (0.0015) [2025-01-05 12:39:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1066807296. Throughput: 0: 4925.4. Samples: 16694910. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 12:39:22,842][08963] Avg episode reward: [(0, '9.414')] [2025-01-05 12:39:24,294][09057] Updated weights for policy 0, policy_version 260458 (0.0017) [2025-01-05 12:39:26,390][09057] Updated weights for policy 0, policy_version 260468 (0.0016) [2025-01-05 12:39:27,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1066901504. Throughput: 0: 4917.1. Samples: 16709600. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:27,842][08963] Avg episode reward: [(0, '10.610')] [2025-01-05 12:39:28,494][09057] Updated weights for policy 0, policy_version 260478 (0.0017) [2025-01-05 12:39:30,537][09057] Updated weights for policy 0, policy_version 260488 (0.0016) [2025-01-05 12:39:32,625][09057] Updated weights for policy 0, policy_version 260498 (0.0016) [2025-01-05 12:39:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1067003904. Throughput: 0: 4920.9. Samples: 16739176. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:32,842][08963] Avg episode reward: [(0, '10.366')] [2025-01-05 12:39:34,720][09057] Updated weights for policy 0, policy_version 260508 (0.0017) [2025-01-05 12:39:36,766][09057] Updated weights for policy 0, policy_version 260518 (0.0016) [2025-01-05 12:39:37,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19729.0, 300 sec: 19674.7). Total num frames: 1067102208. Throughput: 0: 4913.5. Samples: 16768508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:37,843][08963] Avg episode reward: [(0, '9.902')] [2025-01-05 12:39:38,917][09057] Updated weights for policy 0, policy_version 260528 (0.0019) [2025-01-05 12:39:40,935][09057] Updated weights for policy 0, policy_version 260538 (0.0016) [2025-01-05 12:39:42,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1067200512. Throughput: 0: 4910.4. Samples: 16783334. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:42,842][08963] Avg episode reward: [(0, '9.817')] [2025-01-05 12:39:42,973][09057] Updated weights for policy 0, policy_version 260548 (0.0015) [2025-01-05 12:39:45,053][09057] Updated weights for policy 0, policy_version 260558 (0.0015) [2025-01-05 12:39:47,057][09057] Updated weights for policy 0, policy_version 260568 (0.0016) [2025-01-05 12:39:47,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19674.7). Total num frames: 1067298816. Throughput: 0: 4926.2. Samples: 16813392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:47,842][08963] Avg episode reward: [(0, '9.096')] [2025-01-05 12:39:47,866][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000260572_1067302912.pth... 
[2025-01-05 12:39:47,913][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000259419_1062580224.pth [2025-01-05 12:39:49,128][09057] Updated weights for policy 0, policy_version 260578 (0.0016) [2025-01-05 12:39:51,201][09057] Updated weights for policy 0, policy_version 260588 (0.0016) [2025-01-05 12:39:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.1, 300 sec: 19674.7). Total num frames: 1067401216. Throughput: 0: 4936.0. Samples: 16843436. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:52,842][08963] Avg episode reward: [(0, '10.523')] [2025-01-05 12:39:53,206][09057] Updated weights for policy 0, policy_version 260598 (0.0016) [2025-01-05 12:39:55,247][09057] Updated weights for policy 0, policy_version 260608 (0.0016) [2025-01-05 12:39:57,319][09057] Updated weights for policy 0, policy_version 260618 (0.0016) [2025-01-05 12:39:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19688.6). Total num frames: 1067499520. Throughput: 0: 4944.0. Samples: 16858402. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:39:57,842][08963] Avg episode reward: [(0, '9.910')] [2025-01-05 12:39:59,388][09057] Updated weights for policy 0, policy_version 260628 (0.0019) [2025-01-05 12:40:01,448][09057] Updated weights for policy 0, policy_version 260638 (0.0019) [2025-01-05 12:40:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.0, 300 sec: 19688.6). Total num frames: 1067597824. Throughput: 0: 4950.4. Samples: 16888098. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:02,842][08963] Avg episode reward: [(0, '9.575')] [2025-01-05 12:40:03,644][09057] Updated weights for policy 0, policy_version 260648 (0.0017) [2025-01-05 12:40:05,625][09057] Updated weights for policy 0, policy_version 260658 (0.0016) [2025-01-05 12:40:07,686][09057] Updated weights for policy 0, policy_version 260668 (0.0016) [2025-01-05 12:40:07,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19729.1, 300 sec: 19688.6). Total num frames: 1067696128. Throughput: 0: 4950.2. Samples: 16917670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:07,842][08963] Avg episode reward: [(0, '8.762')] [2025-01-05 12:40:09,845][09057] Updated weights for policy 0, policy_version 260678 (0.0017) [2025-01-05 12:40:11,829][09057] Updated weights for policy 0, policy_version 260688 (0.0016) [2025-01-05 12:40:12,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19688.6). Total num frames: 1067794432. Throughput: 0: 4948.2. Samples: 16932270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:12,842][08963] Avg episode reward: [(0, '9.681')] [2025-01-05 12:40:13,877][09057] Updated weights for policy 0, policy_version 260698 (0.0016) [2025-01-05 12:40:15,963][09057] Updated weights for policy 0, policy_version 260708 (0.0016) [2025-01-05 12:40:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19674.7). Total num frames: 1067892736. Throughput: 0: 4957.8. Samples: 16962278. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:17,842][08963] Avg episode reward: [(0, '10.134')] [2025-01-05 12:40:18,025][09057] Updated weights for policy 0, policy_version 260718 (0.0017) [2025-01-05 12:40:20,058][09057] Updated weights for policy 0, policy_version 260728 (0.0016) [2025-01-05 12:40:22,121][09057] Updated weights for policy 0, policy_version 260738 (0.0015) [2025-01-05 12:40:22,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.4, 300 sec: 19688.6). 
Total num frames: 1067995136. Throughput: 0: 4973.9. Samples: 16992334. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:22,842][08963] Avg episode reward: [(0, '10.634')] [2025-01-05 12:40:24,184][09057] Updated weights for policy 0, policy_version 260748 (0.0016) [2025-01-05 12:40:26,217][09057] Updated weights for policy 0, policy_version 260758 (0.0014) [2025-01-05 12:40:27,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19702.5). Total num frames: 1068093440. Throughput: 0: 4973.9. Samples: 17007158. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:27,842][08963] Avg episode reward: [(0, '8.373')] [2025-01-05 12:40:28,395][09057] Updated weights for policy 0, policy_version 260768 (0.0018) [2025-01-05 12:40:30,370][09057] Updated weights for policy 0, policy_version 260778 (0.0016) [2025-01-05 12:40:32,397][09057] Updated weights for policy 0, policy_version 260788 (0.0015) [2025-01-05 12:40:32,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19702.5). Total num frames: 1068195840. Throughput: 0: 4968.6. Samples: 17036980. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:32,842][08963] Avg episode reward: [(0, '9.661')] [2025-01-05 12:40:34,606][09057] Updated weights for policy 0, policy_version 260798 (0.0017) [2025-01-05 12:40:36,589][09057] Updated weights for policy 0, policy_version 260808 (0.0015) [2025-01-05 12:40:37,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19702.5). Total num frames: 1068294144. Throughput: 0: 4958.7. Samples: 17066576. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 12:40:37,842][08963] Avg episode reward: [(0, '9.467')] [2025-01-05 12:40:38,614][09057] Updated weights for policy 0, policy_version 260818 (0.0016) [2025-01-05 12:40:40,708][09057] Updated weights for policy 0, policy_version 260828 (0.0016) [2025-01-05 12:40:42,801][09057] Updated weights for policy 0, policy_version 260838 (0.0017) [2025-01-05 12:40:42,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19702.5). Total num frames: 1068392448. Throughput: 0: 4960.4. Samples: 17081620. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:40:42,842][08963] Avg episode reward: [(0, '9.437')] [2025-01-05 12:40:44,918][09057] Updated weights for policy 0, policy_version 260848 (0.0017) [2025-01-05 12:40:46,989][09057] Updated weights for policy 0, policy_version 260858 (0.0016) [2025-01-05 12:40:47,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19716.3). Total num frames: 1068490752. Throughput: 0: 4951.0. Samples: 17110892. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:40:47,842][08963] Avg episode reward: [(0, '9.908')] [2025-01-05 12:40:49,100][09057] Updated weights for policy 0, policy_version 260868 (0.0016) [2025-01-05 12:40:51,139][09057] Updated weights for policy 0, policy_version 260878 (0.0016) [2025-01-05 12:40:52,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19729.0, 300 sec: 19702.4). Total num frames: 1068584960. Throughput: 0: 4943.2. Samples: 17140114. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:40:52,843][08963] Avg episode reward: [(0, '9.256')] [2025-01-05 12:40:53,305][09057] Updated weights for policy 0, policy_version 260888 (0.0017) [2025-01-05 12:40:55,314][09057] Updated weights for policy 0, policy_version 260898 (0.0016) [2025-01-05 12:40:57,362][09057] Updated weights for policy 0, policy_version 260908 (0.0016) [2025-01-05 12:40:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19702.5). 
Total num frames: 1068687360. Throughput: 0: 4951.8. Samples: 17155102. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:40:57,842][08963] Avg episode reward: [(0, '9.955')] [2025-01-05 12:40:59,551][09057] Updated weights for policy 0, policy_version 260918 (0.0017) [2025-01-05 12:41:01,541][09057] Updated weights for policy 0, policy_version 260928 (0.0015) [2025-01-05 12:41:02,842][08963] Fps is (10 sec: 20070.9, 60 sec: 19797.4, 300 sec: 19716.4). Total num frames: 1068785664. Throughput: 0: 4942.5. Samples: 17184690. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:02,842][08963] Avg episode reward: [(0, '8.867')] [2025-01-05 12:41:03,575][09057] Updated weights for policy 0, policy_version 260938 (0.0016) [2025-01-05 12:41:05,652][09057] Updated weights for policy 0, policy_version 260948 (0.0016) [2025-01-05 12:41:07,626][09057] Updated weights for policy 0, policy_version 260958 (0.0016) [2025-01-05 12:41:07,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19730.2). Total num frames: 1068888064. Throughput: 0: 4945.2. Samples: 17214868. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:07,842][08963] Avg episode reward: [(0, '11.061')] [2025-01-05 12:41:09,664][09057] Updated weights for policy 0, policy_version 260968 (0.0016) [2025-01-05 12:41:11,779][09057] Updated weights for policy 0, policy_version 260978 (0.0017) [2025-01-05 12:41:12,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19730.2). Total num frames: 1068986368. Throughput: 0: 4951.4. Samples: 17229972. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:12,842][08963] Avg episode reward: [(0, '9.977')] [2025-01-05 12:41:13,823][09057] Updated weights for policy 0, policy_version 260988 (0.0019) [2025-01-05 12:41:15,857][09057] Updated weights for policy 0, policy_version 260998 (0.0015) [2025-01-05 12:41:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19730.2). Total num frames: 1069084672. 
Throughput: 0: 4947.1. Samples: 17259602. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:17,842][08963] Avg episode reward: [(0, '9.928')] [2025-01-05 12:41:18,058][09057] Updated weights for policy 0, policy_version 261008 (0.0017) [2025-01-05 12:41:20,040][09057] Updated weights for policy 0, policy_version 261018 (0.0015) [2025-01-05 12:41:22,074][09057] Updated weights for policy 0, policy_version 261028 (0.0015) [2025-01-05 12:41:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19730.2). Total num frames: 1069182976. Throughput: 0: 4947.5. Samples: 17289212. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:22,842][08963] Avg episode reward: [(0, '10.573')] [2025-01-05 12:41:24,255][09057] Updated weights for policy 0, policy_version 261038 (0.0016) [2025-01-05 12:41:26,209][09057] Updated weights for policy 0, policy_version 261048 (0.0015) [2025-01-05 12:41:27,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19797.3, 300 sec: 19730.6). Total num frames: 1069281280. Throughput: 0: 4945.2. Samples: 17304152. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:27,842][08963] Avg episode reward: [(0, '9.832')] [2025-01-05 12:41:28,244][09057] Updated weights for policy 0, policy_version 261058 (0.0015) [2025-01-05 12:41:30,342][09057] Updated weights for policy 0, policy_version 261068 (0.0016) [2025-01-05 12:41:32,304][09057] Updated weights for policy 0, policy_version 261078 (0.0016) [2025-01-05 12:41:32,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19758.0). Total num frames: 1069383680. Throughput: 0: 4964.8. Samples: 17334308. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:32,842][08963] Avg episode reward: [(0, '10.198')] [2025-01-05 12:41:34,343][09057] Updated weights for policy 0, policy_version 261088 (0.0016) [2025-01-05 12:41:36,452][09057] Updated weights for policy 0, policy_version 261098 (0.0017) [2025-01-05 12:41:37,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19758.0). Total num frames: 1069481984. Throughput: 0: 4979.4. Samples: 17364188. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:37,842][08963] Avg episode reward: [(0, '10.301')] [2025-01-05 12:41:38,525][09057] Updated weights for policy 0, policy_version 261108 (0.0017) [2025-01-05 12:41:40,589][09057] Updated weights for policy 0, policy_version 261118 (0.0015) [2025-01-05 12:41:42,688][09057] Updated weights for policy 0, policy_version 261128 (0.0016) [2025-01-05 12:41:42,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19771.9). Total num frames: 1069584384. Throughput: 0: 4976.6. Samples: 17379048. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:42,842][08963] Avg episode reward: [(0, '10.598')] [2025-01-05 12:41:44,739][09057] Updated weights for policy 0, policy_version 261138 (0.0017) [2025-01-05 12:41:46,814][09057] Updated weights for policy 0, policy_version 261148 (0.0016) [2025-01-05 12:41:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.4, 300 sec: 19758.0). Total num frames: 1069678592. Throughput: 0: 4975.7. Samples: 17408596. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 12:41:47,842][08963] Avg episode reward: [(0, '9.909')] [2025-01-05 12:41:47,862][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000261153_1069682688.pth... 
[2025-01-05 12:41:47,915][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000259994_1064935424.pth [2025-01-05 12:41:49,022][09057] Updated weights for policy 0, policy_version 261158 (0.0017) [2025-01-05 12:41:51,047][09057] Updated weights for policy 0, policy_version 261168 (0.0016) [2025-01-05 12:41:52,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19865.7, 300 sec: 19744.1). Total num frames: 1069776896. Throughput: 0: 4957.9. Samples: 17437972. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:41:52,842][08963] Avg episode reward: [(0, '9.192')] [2025-01-05 12:41:53,074][09057] Updated weights for policy 0, policy_version 261178 (0.0015) [2025-01-05 12:41:55,159][09057] Updated weights for policy 0, policy_version 261188 (0.0015) [2025-01-05 12:41:57,149][09057] Updated weights for policy 0, policy_version 261198 (0.0015) [2025-01-05 12:41:57,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19758.0). Total num frames: 1069879296. Throughput: 0: 4958.5. Samples: 17453102. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:41:57,842][08963] Avg episode reward: [(0, '9.929')] [2025-01-05 12:41:59,212][09057] Updated weights for policy 0, policy_version 261208 (0.0016) [2025-01-05 12:42:01,312][09057] Updated weights for policy 0, policy_version 261218 (0.0016) [2025-01-05 12:42:02,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19771.9). Total num frames: 1069977600. Throughput: 0: 4963.2. Samples: 17482946. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:02,842][08963] Avg episode reward: [(0, '10.395')] [2025-01-05 12:42:03,363][09057] Updated weights for policy 0, policy_version 261228 (0.0016) [2025-01-05 12:42:05,433][09057] Updated weights for policy 0, policy_version 261238 (0.0016) [2025-01-05 12:42:07,542][09057] Updated weights for policy 0, policy_version 261248 (0.0015) [2025-01-05 12:42:07,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19771.9). Total num frames: 1070075904. Throughput: 0: 4962.4. Samples: 17512522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:07,843][08963] Avg episode reward: [(0, '10.745')] [2025-01-05 12:42:09,621][09057] Updated weights for policy 0, policy_version 261258 (0.0017) [2025-01-05 12:42:11,677][09057] Updated weights for policy 0, policy_version 261268 (0.0016) [2025-01-05 12:42:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19758.0). Total num frames: 1070174208. Throughput: 0: 4956.8. Samples: 17527210. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:12,843][08963] Avg episode reward: [(0, '9.826')] [2025-01-05 12:42:13,851][09057] Updated weights for policy 0, policy_version 261278 (0.0016) [2025-01-05 12:42:15,842][09057] Updated weights for policy 0, policy_version 261288 (0.0015) [2025-01-05 12:42:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19771.9). Total num frames: 1070272512. Throughput: 0: 4942.7. Samples: 17556730. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:17,842][08963] Avg episode reward: [(0, '8.947')] [2025-01-05 12:42:17,887][09057] Updated weights for policy 0, policy_version 261298 (0.0016) [2025-01-05 12:42:20,005][09057] Updated weights for policy 0, policy_version 261308 (0.0016) [2025-01-05 12:42:21,974][09057] Updated weights for policy 0, policy_version 261318 (0.0016) [2025-01-05 12:42:22,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19785.8). 
Total num frames: 1070374912. Throughput: 0: 4946.9. Samples: 17586800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:22,842][08963] Avg episode reward: [(0, '8.566')] [2025-01-05 12:42:24,041][09057] Updated weights for policy 0, policy_version 261328 (0.0016) [2025-01-05 12:42:26,142][09057] Updated weights for policy 0, policy_version 261338 (0.0016) [2025-01-05 12:42:27,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19785.8). Total num frames: 1070473216. Throughput: 0: 4948.6. Samples: 17601736. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:27,842][08963] Avg episode reward: [(0, '9.953')] [2025-01-05 12:42:28,194][09057] Updated weights for policy 0, policy_version 261348 (0.0016) [2025-01-05 12:42:30,252][09057] Updated weights for policy 0, policy_version 261358 (0.0015) [2025-01-05 12:42:32,355][09057] Updated weights for policy 0, policy_version 261368 (0.0016) [2025-01-05 12:42:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1070571520. Throughput: 0: 4952.4. Samples: 17631454. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:32,842][08963] Avg episode reward: [(0, '10.728')] [2025-01-05 12:42:34,476][09057] Updated weights for policy 0, policy_version 261378 (0.0018) [2025-01-05 12:42:36,558][09057] Updated weights for policy 0, policy_version 261388 (0.0016) [2025-01-05 12:42:37,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1070665728. Throughput: 0: 4941.0. Samples: 17660316. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:37,842][08963] Avg episode reward: [(0, '9.050')] [2025-01-05 12:42:38,788][09057] Updated weights for policy 0, policy_version 261398 (0.0018) [2025-01-05 12:42:40,743][09057] Updated weights for policy 0, policy_version 261408 (0.0017) [2025-01-05 12:42:42,816][09057] Updated weights for policy 0, policy_version 261418 (0.0016) [2025-01-05 12:42:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1070768128. Throughput: 0: 4932.8. Samples: 17675078. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:42,842][08963] Avg episode reward: [(0, '10.357')] [2025-01-05 12:42:44,968][09057] Updated weights for policy 0, policy_version 261428 (0.0016) [2025-01-05 12:42:46,919][09057] Updated weights for policy 0, policy_version 261438 (0.0016) [2025-01-05 12:42:47,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.4, 300 sec: 19785.8). Total num frames: 1070866432. Throughput: 0: 4935.6. Samples: 17705046. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:47,842][08963] Avg episode reward: [(0, '10.938')] [2025-01-05 12:42:48,957][09057] Updated weights for policy 0, policy_version 261448 (0.0016) [2025-01-05 12:42:51,064][09057] Updated weights for policy 0, policy_version 261458 (0.0015) [2025-01-05 12:42:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19771.9). Total num frames: 1070964736. Throughput: 0: 4934.7. Samples: 17734582. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:52,843][08963] Avg episode reward: [(0, '10.346')] [2025-01-05 12:42:53,162][09057] Updated weights for policy 0, policy_version 261468 (0.0017) [2025-01-05 12:42:55,219][09057] Updated weights for policy 0, policy_version 261478 (0.0020) [2025-01-05 12:42:57,353][09057] Updated weights for policy 0, policy_version 261488 (0.0016) [2025-01-05 12:42:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19785.8). 
Total num frames: 1071063040. Throughput: 0: 4937.2. Samples: 17749384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:42:57,842][08963] Avg episode reward: [(0, '10.208')] [2025-01-05 12:42:59,426][09057] Updated weights for policy 0, policy_version 261498 (0.0016) [2025-01-05 12:43:01,476][09057] Updated weights for policy 0, policy_version 261508 (0.0015) [2025-01-05 12:43:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1071161344. Throughput: 0: 4938.0. Samples: 17778942. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:02,842][08963] Avg episode reward: [(0, '10.109')] [2025-01-05 12:43:03,673][09057] Updated weights for policy 0, policy_version 261518 (0.0016) [2025-01-05 12:43:05,658][09057] Updated weights for policy 0, policy_version 261528 (0.0015) [2025-01-05 12:43:07,736][09057] Updated weights for policy 0, policy_version 261538 (0.0015) [2025-01-05 12:43:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1071259648. Throughput: 0: 4925.7. Samples: 17808456. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:07,842][08963] Avg episode reward: [(0, '9.039')] [2025-01-05 12:43:09,900][09057] Updated weights for policy 0, policy_version 261548 (0.0017) [2025-01-05 12:43:11,872][09057] Updated weights for policy 0, policy_version 261558 (0.0016) [2025-01-05 12:43:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1071357952. Throughput: 0: 4917.6. Samples: 17823030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:12,842][08963] Avg episode reward: [(0, '9.530')] [2025-01-05 12:43:13,958][09057] Updated weights for policy 0, policy_version 261568 (0.0016) [2025-01-05 12:43:16,019][09057] Updated weights for policy 0, policy_version 261578 (0.0016) [2025-01-05 12:43:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1071456256. 
Throughput: 0: 4922.6. Samples: 17852972. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:17,842][08963] Avg episode reward: [(0, '9.825')] [2025-01-05 12:43:18,111][09057] Updated weights for policy 0, policy_version 261588 (0.0017) [2025-01-05 12:43:20,194][09057] Updated weights for policy 0, policy_version 261598 (0.0016) [2025-01-05 12:43:22,272][09057] Updated weights for policy 0, policy_version 261608 (0.0016) [2025-01-05 12:43:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1071554560. Throughput: 0: 4935.4. Samples: 17882410. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:22,842][08963] Avg episode reward: [(0, '9.596')] [2025-01-05 12:43:24,333][09057] Updated weights for policy 0, policy_version 261618 (0.0016) [2025-01-05 12:43:26,404][09057] Updated weights for policy 0, policy_version 261628 (0.0016) [2025-01-05 12:43:27,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19660.7, 300 sec: 19771.9). Total num frames: 1071652864. Throughput: 0: 4937.0. Samples: 17897242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:27,843][08963] Avg episode reward: [(0, '10.519')] [2025-01-05 12:43:28,573][09057] Updated weights for policy 0, policy_version 261638 (0.0017) [2025-01-05 12:43:30,537][09057] Updated weights for policy 0, policy_version 261648 (0.0016) [2025-01-05 12:43:32,614][09057] Updated weights for policy 0, policy_version 261658 (0.0016) [2025-01-05 12:43:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1071751168. Throughput: 0: 4930.4. Samples: 17926916. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:32,842][08963] Avg episode reward: [(0, '9.762')] [2025-01-05 12:43:34,770][09057] Updated weights for policy 0, policy_version 261668 (0.0017) [2025-01-05 12:43:36,720][09057] Updated weights for policy 0, policy_version 261678 (0.0015) [2025-01-05 12:43:37,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19771.9). Total num frames: 1071853568. Throughput: 0: 4938.1. Samples: 17956796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:37,842][08963] Avg episode reward: [(0, '9.679')] [2025-01-05 12:43:38,780][09057] Updated weights for policy 0, policy_version 261688 (0.0016) [2025-01-05 12:43:40,864][09057] Updated weights for policy 0, policy_version 261698 (0.0015) [2025-01-05 12:43:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1071951872. Throughput: 0: 4943.2. Samples: 17971826. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:42,842][08963] Avg episode reward: [(0, '10.532')] [2025-01-05 12:43:42,913][09057] Updated weights for policy 0, policy_version 261708 (0.0017) [2025-01-05 12:43:45,026][09057] Updated weights for policy 0, policy_version 261718 (0.0016) [2025-01-05 12:43:47,112][09057] Updated weights for policy 0, policy_version 261728 (0.0016) [2025-01-05 12:43:47,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072050176. Throughput: 0: 4939.3. Samples: 18001212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:47,842][08963] Avg episode reward: [(0, '8.570')] [2025-01-05 12:43:47,916][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000261732_1072054272.pth... 
[2025-01-05 12:43:47,968][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000260572_1067302912.pth [2025-01-05 12:43:49,292][09057] Updated weights for policy 0, policy_version 261738 (0.0018) [2025-01-05 12:43:51,396][09057] Updated weights for policy 0, policy_version 261748 (0.0016) [2025-01-05 12:43:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072148480. Throughput: 0: 4928.2. Samples: 18030226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:52,842][08963] Avg episode reward: [(0, '9.488')] [2025-01-05 12:43:53,470][09057] Updated weights for policy 0, policy_version 261758 (0.0015) [2025-01-05 12:43:55,488][09057] Updated weights for policy 0, policy_version 261768 (0.0016) [2025-01-05 12:43:57,541][09057] Updated weights for policy 0, policy_version 261778 (0.0015) [2025-01-05 12:43:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072246784. Throughput: 0: 4935.1. Samples: 18045112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:43:57,842][08963] Avg episode reward: [(0, '9.320')] [2025-01-05 12:43:59,643][09057] Updated weights for policy 0, policy_version 261788 (0.0016) [2025-01-05 12:44:01,664][09057] Updated weights for policy 0, policy_version 261798 (0.0015) [2025-01-05 12:44:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072345088. Throughput: 0: 4931.6. Samples: 18074896. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:44:02,842][08963] Avg episode reward: [(0, '11.205')] [2025-01-05 12:44:03,846][09057] Updated weights for policy 0, policy_version 261808 (0.0017) [2025-01-05 12:44:05,922][09057] Updated weights for policy 0, policy_version 261818 (0.0016) [2025-01-05 12:44:07,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072443392. 
Throughput: 0: 4924.1. Samples: 18103996. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:44:07,842][08963] Avg episode reward: [(0, '8.921')] [2025-01-05 12:44:08,026][09057] Updated weights for policy 0, policy_version 261828 (0.0018) [2025-01-05 12:44:10,063][09057] Updated weights for policy 0, policy_version 261838 (0.0016) [2025-01-05 12:44:12,122][09057] Updated weights for policy 0, policy_version 261848 (0.0016) [2025-01-05 12:44:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19785.8). Total num frames: 1072541696. Throughput: 0: 4925.0. Samples: 18118868. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:12,842][08963] Avg episode reward: [(0, '9.728')] [2025-01-05 12:44:14,257][09057] Updated weights for policy 0, policy_version 261858 (0.0018) [2025-01-05 12:44:16,301][09057] Updated weights for policy 0, policy_version 261868 (0.0016) [2025-01-05 12:44:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19771.9). Total num frames: 1072640000. Throughput: 0: 4921.4. Samples: 18148378. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:17,842][08963] Avg episode reward: [(0, '9.322')] [2025-01-05 12:44:18,435][09057] Updated weights for policy 0, policy_version 261878 (0.0017) [2025-01-05 12:44:20,458][09057] Updated weights for policy 0, policy_version 261888 (0.0017) [2025-01-05 12:44:22,501][09057] Updated weights for policy 0, policy_version 261898 (0.0016) [2025-01-05 12:44:22,843][08963] Fps is (10 sec: 19658.6, 60 sec: 19728.7, 300 sec: 19785.7). Total num frames: 1072738304. Throughput: 0: 4918.3. Samples: 18178126. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:22,843][08963] Avg episode reward: [(0, '10.178')] [2025-01-05 12:44:24,636][09057] Updated weights for policy 0, policy_version 261908 (0.0017) [2025-01-05 12:44:26,661][09057] Updated weights for policy 0, policy_version 261918 (0.0016) [2025-01-05 12:44:27,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1072836608. Throughput: 0: 4911.2. Samples: 18192830. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:27,842][08963] Avg episode reward: [(0, '10.589')] [2025-01-05 12:44:28,748][09057] Updated weights for policy 0, policy_version 261928 (0.0017) [2025-01-05 12:44:30,782][09057] Updated weights for policy 0, policy_version 261938 (0.0016) [2025-01-05 12:44:32,842][08963] Fps is (10 sec: 19662.9, 60 sec: 19729.0, 300 sec: 19771.9). Total num frames: 1072934912. Throughput: 0: 4917.7. Samples: 18222510. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:32,842][08963] Avg episode reward: [(0, '9.272')] [2025-01-05 12:44:32,900][09057] Updated weights for policy 0, policy_version 261948 (0.0017) [2025-01-05 12:44:35,053][09057] Updated weights for policy 0, policy_version 261958 (0.0016) [2025-01-05 12:44:37,058][09057] Updated weights for policy 0, policy_version 261968 (0.0016) [2025-01-05 12:44:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1073033216. Throughput: 0: 4926.8. Samples: 18251930. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:37,842][08963] Avg episode reward: [(0, '9.332')] [2025-01-05 12:44:39,110][09057] Updated weights for policy 0, policy_version 261978 (0.0016) [2025-01-05 12:44:41,164][09057] Updated weights for policy 0, policy_version 261988 (0.0019) [2025-01-05 12:44:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1073131520. Throughput: 0: 4931.0. Samples: 18267008. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:42,842][08963] Avg episode reward: [(0, '10.328')] [2025-01-05 12:44:43,267][09057] Updated weights for policy 0, policy_version 261998 (0.0017) [2025-01-05 12:44:45,330][09057] Updated weights for policy 0, policy_version 262008 (0.0015) [2025-01-05 12:44:47,370][09057] Updated weights for policy 0, policy_version 262018 (0.0016) [2025-01-05 12:44:47,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19729.0, 300 sec: 19771.9). Total num frames: 1073233920. Throughput: 0: 4930.3. Samples: 18296758. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:47,842][08963] Avg episode reward: [(0, '9.934')] [2025-01-05 12:44:49,498][09057] Updated weights for policy 0, policy_version 262028 (0.0016) [2025-01-05 12:44:51,527][09057] Updated weights for policy 0, policy_version 262038 (0.0019) [2025-01-05 12:44:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.0, 300 sec: 19771.9). Total num frames: 1073332224. Throughput: 0: 4938.2. Samples: 18326216. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:52,842][08963] Avg episode reward: [(0, '10.262')] [2025-01-05 12:44:53,665][09057] Updated weights for policy 0, policy_version 262048 (0.0016) [2025-01-05 12:44:55,704][09057] Updated weights for policy 0, policy_version 262058 (0.0016) [2025-01-05 12:44:57,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1073426432. Throughput: 0: 4934.4. Samples: 18340916. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:44:57,842][08963] Avg episode reward: [(0, '10.987')] [2025-01-05 12:44:57,858][09057] Updated weights for policy 0, policy_version 262068 (0.0017) [2025-01-05 12:45:00,033][09057] Updated weights for policy 0, policy_version 262078 (0.0017) [2025-01-05 12:45:02,051][09057] Updated weights for policy 0, policy_version 262088 (0.0016) [2025-01-05 12:45:02,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19758.0). 
Total num frames: 1073524736. Throughput: 0: 4923.5. Samples: 18369934. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:45:02,842][08963] Avg episode reward: [(0, '10.421')] [2025-01-05 12:45:04,213][09057] Updated weights for policy 0, policy_version 262098 (0.0018) [2025-01-05 12:45:06,314][09057] Updated weights for policy 0, policy_version 262108 (0.0017) [2025-01-05 12:45:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1073623040. Throughput: 0: 4907.4. Samples: 18398954. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:45:07,842][08963] Avg episode reward: [(0, '9.167')] [2025-01-05 12:45:08,400][09057] Updated weights for policy 0, policy_version 262118 (0.0016) [2025-01-05 12:45:10,453][09057] Updated weights for policy 0, policy_version 262128 (0.0016) [2025-01-05 12:45:12,524][09057] Updated weights for policy 0, policy_version 262138 (0.0019) [2025-01-05 12:45:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1073721344. Throughput: 0: 4912.8. Samples: 18413904. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:45:12,842][08963] Avg episode reward: [(0, '11.443')] [2025-01-05 12:45:14,650][09057] Updated weights for policy 0, policy_version 262148 (0.0018) [2025-01-05 12:45:16,734][09057] Updated weights for policy 0, policy_version 262158 (0.0017) [2025-01-05 12:45:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19744.1). Total num frames: 1073819648. Throughput: 0: 4905.6. Samples: 18443260. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:45:17,842][08963] Avg episode reward: [(0, '9.163')] [2025-01-05 12:45:18,945][09057] Updated weights for policy 0, policy_version 262168 (0.0017) [2025-01-05 12:45:20,966][09057] Updated weights for policy 0, policy_version 262178 (0.0015) [2025-01-05 12:45:22,842][08963] Fps is (10 sec: 19250.3, 60 sec: 19592.7, 300 sec: 19730.2). Total num frames: 1073913856. 
Throughput: 0: 4890.8. Samples: 18472020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:22,843][08963] Avg episode reward: [(0, '9.664')] [2025-01-05 12:45:23,128][09057] Updated weights for policy 0, policy_version 262188 (0.0017) [2025-01-05 12:45:25,213][09057] Updated weights for policy 0, policy_version 262198 (0.0015) [2025-01-05 12:45:27,233][09057] Updated weights for policy 0, policy_version 262208 (0.0016) [2025-01-05 12:45:27,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19592.5, 300 sec: 19716.3). Total num frames: 1074012160. Throughput: 0: 4887.6. Samples: 18486950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:27,842][08963] Avg episode reward: [(0, '10.983')] [2025-01-05 12:45:29,396][09057] Updated weights for policy 0, policy_version 262218 (0.0017) [2025-01-05 12:45:31,492][09057] Updated weights for policy 0, policy_version 262228 (0.0017) [2025-01-05 12:45:32,842][08963] Fps is (10 sec: 19661.8, 60 sec: 19592.6, 300 sec: 19716.3). Total num frames: 1074110464. Throughput: 0: 4875.0. Samples: 18516132. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:32,842][08963] Avg episode reward: [(0, '8.172')] [2025-01-05 12:45:33,630][09057] Updated weights for policy 0, policy_version 262238 (0.0016) [2025-01-05 12:45:35,673][09057] Updated weights for policy 0, policy_version 262248 (0.0015) [2025-01-05 12:45:37,838][09057] Updated weights for policy 0, policy_version 262258 (0.0016) [2025-01-05 12:45:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19716.3). Total num frames: 1074208768. Throughput: 0: 4865.9. Samples: 18545180. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:37,842][08963] Avg episode reward: [(0, '9.464')] [2025-01-05 12:45:39,990][09057] Updated weights for policy 0, policy_version 262268 (0.0017) [2025-01-05 12:45:42,061][09057] Updated weights for policy 0, policy_version 262278 (0.0018) [2025-01-05 12:45:42,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19702.5). Total num frames: 1074302976. Throughput: 0: 4858.9. Samples: 18559568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:42,842][08963] Avg episode reward: [(0, '9.352')] [2025-01-05 12:45:44,216][09057] Updated weights for policy 0, policy_version 262288 (0.0016) [2025-01-05 12:45:46,248][09057] Updated weights for policy 0, policy_version 262298 (0.0015) [2025-01-05 12:45:47,842][08963] Fps is (10 sec: 19250.9, 60 sec: 19456.0, 300 sec: 19716.3). Total num frames: 1074401280. Throughput: 0: 4865.6. Samples: 18588888. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:47,842][08963] Avg episode reward: [(0, '9.339')] [2025-01-05 12:45:47,850][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000262305_1074401280.pth... [2025-01-05 12:45:47,901][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000261153_1069682688.pth [2025-01-05 12:45:48,412][09057] Updated weights for policy 0, policy_version 262308 (0.0016) [2025-01-05 12:45:50,477][09057] Updated weights for policy 0, policy_version 262318 (0.0018) [2025-01-05 12:45:52,503][09057] Updated weights for policy 0, policy_version 262328 (0.0016) [2025-01-05 12:45:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19702.5). Total num frames: 1074499584. Throughput: 0: 4877.0. Samples: 18618418. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:52,842][08963] Avg episode reward: [(0, '10.730')] [2025-01-05 12:45:54,658][09057] Updated weights for policy 0, policy_version 262338 (0.0017) [2025-01-05 12:45:56,705][09057] Updated weights for policy 0, policy_version 262348 (0.0016) [2025-01-05 12:45:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19702.4). Total num frames: 1074597888. Throughput: 0: 4870.3. Samples: 18633066. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:45:57,842][08963] Avg episode reward: [(0, '9.694')] [2025-01-05 12:45:58,806][09057] Updated weights for policy 0, policy_version 262358 (0.0016) [2025-01-05 12:46:00,866][09057] Updated weights for policy 0, policy_version 262368 (0.0017) [2025-01-05 12:46:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19688.6). Total num frames: 1074696192. Throughput: 0: 4872.4. Samples: 18662518. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:02,842][08963] Avg episode reward: [(0, '9.253')] [2025-01-05 12:46:02,988][09057] Updated weights for policy 0, policy_version 262378 (0.0017) [2025-01-05 12:46:04,998][09057] Updated weights for policy 0, policy_version 262388 (0.0015) [2025-01-05 12:46:07,075][09057] Updated weights for policy 0, policy_version 262398 (0.0016) [2025-01-05 12:46:07,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19688.6). Total num frames: 1074794496. Throughput: 0: 4892.4. Samples: 18692176. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:07,842][08963] Avg episode reward: [(0, '9.311')] [2025-01-05 12:46:09,250][09057] Updated weights for policy 0, policy_version 262408 (0.0017) [2025-01-05 12:46:11,266][09057] Updated weights for policy 0, policy_version 262418 (0.0017) [2025-01-05 12:46:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19688.6). Total num frames: 1074892800. Throughput: 0: 4886.3. Samples: 18706834. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:12,842][08963] Avg episode reward: [(0, '10.255')] [2025-01-05 12:46:13,325][09057] Updated weights for policy 0, policy_version 262428 (0.0018) [2025-01-05 12:46:15,414][09057] Updated weights for policy 0, policy_version 262438 (0.0016) [2025-01-05 12:46:17,455][09057] Updated weights for policy 0, policy_version 262448 (0.0019) [2025-01-05 12:46:17,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19524.3, 300 sec: 19688.6). Total num frames: 1074991104. Throughput: 0: 4901.6. Samples: 18736706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:17,842][08963] Avg episode reward: [(0, '9.883')] [2025-01-05 12:46:19,516][09057] Updated weights for policy 0, policy_version 262458 (0.0016) [2025-01-05 12:46:21,606][09057] Updated weights for policy 0, policy_version 262468 (0.0017) [2025-01-05 12:46:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19592.7, 300 sec: 19688.6). Total num frames: 1075089408. Throughput: 0: 4912.4. Samples: 18766236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:22,842][08963] Avg episode reward: [(0, '10.729')] [2025-01-05 12:46:23,731][09057] Updated weights for policy 0, policy_version 262478 (0.0017) [2025-01-05 12:46:25,758][09057] Updated weights for policy 0, policy_version 262488 (0.0016) [2025-01-05 12:46:27,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19674.7). Total num frames: 1075187712. Throughput: 0: 4921.6. Samples: 18781038. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:27,842][08963] Avg episode reward: [(0, '10.200')] [2025-01-05 12:46:27,949][09057] Updated weights for policy 0, policy_version 262498 (0.0016) [2025-01-05 12:46:30,069][09057] Updated weights for policy 0, policy_version 262508 (0.0016) [2025-01-05 12:46:32,117][09057] Updated weights for policy 0, policy_version 262518 (0.0017) [2025-01-05 12:46:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19674.7). 
Total num frames: 1075286016. Throughput: 0: 4913.9. Samples: 18810014. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:32,842][08963] Avg episode reward: [(0, '8.979')] [2025-01-05 12:46:34,282][09057] Updated weights for policy 0, policy_version 262528 (0.0017) [2025-01-05 12:46:36,327][09057] Updated weights for policy 0, policy_version 262538 (0.0016) [2025-01-05 12:46:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19660.8). Total num frames: 1075384320. Throughput: 0: 4902.4. Samples: 18839028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:37,843][08963] Avg episode reward: [(0, '9.646')] [2025-01-05 12:46:38,471][09057] Updated weights for policy 0, policy_version 262548 (0.0018) [2025-01-05 12:46:40,542][09057] Updated weights for policy 0, policy_version 262558 (0.0015) [2025-01-05 12:46:42,570][09057] Updated weights for policy 0, policy_version 262568 (0.0016) [2025-01-05 12:46:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19674.7). Total num frames: 1075482624. Throughput: 0: 4908.9. Samples: 18853964. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:42,842][08963] Avg episode reward: [(0, '9.444')] [2025-01-05 12:46:44,710][09057] Updated weights for policy 0, policy_version 262578 (0.0016) [2025-01-05 12:46:46,785][09057] Updated weights for policy 0, policy_version 262588 (0.0015) [2025-01-05 12:46:47,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19674.7). Total num frames: 1075580928. Throughput: 0: 4909.7. Samples: 18883456. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:47,842][08963] Avg episode reward: [(0, '9.310')] [2025-01-05 12:46:48,952][09057] Updated weights for policy 0, policy_version 262598 (0.0016) [2025-01-05 12:46:50,966][09057] Updated weights for policy 0, policy_version 262608 (0.0015) [2025-01-05 12:46:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1075679232. 
Throughput: 0: 4902.9. Samples: 18912806. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:52,842][08963] Avg episode reward: [(0, '10.844')] [2025-01-05 12:46:53,064][09057] Updated weights for policy 0, policy_version 262618 (0.0015) [2025-01-05 12:46:55,101][09057] Updated weights for policy 0, policy_version 262628 (0.0015) [2025-01-05 12:46:57,118][09057] Updated weights for policy 0, policy_version 262638 (0.0016) [2025-01-05 12:46:57,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1075777536. Throughput: 0: 4910.0. Samples: 18927786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:46:57,842][08963] Avg episode reward: [(0, '9.467')] [2025-01-05 12:46:59,216][09057] Updated weights for policy 0, policy_version 262648 (0.0016) [2025-01-05 12:47:01,231][09057] Updated weights for policy 0, policy_version 262658 (0.0015) [2025-01-05 12:47:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1075875840. Throughput: 0: 4913.2. Samples: 18957798. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:02,842][08963] Avg episode reward: [(0, '9.544')] [2025-01-05 12:47:03,272][09057] Updated weights for policy 0, policy_version 262668 (0.0016) [2025-01-05 12:47:05,358][09057] Updated weights for policy 0, policy_version 262678 (0.0017) [2025-01-05 12:47:07,376][09057] Updated weights for policy 0, policy_version 262688 (0.0016) [2025-01-05 12:47:07,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19729.0, 300 sec: 19674.7). Total num frames: 1075978240. Throughput: 0: 4922.3. Samples: 18987738. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:07,842][08963] Avg episode reward: [(0, '9.053')] [2025-01-05 12:47:09,437][09057] Updated weights for policy 0, policy_version 262698 (0.0017) [2025-01-05 12:47:11,541][09057] Updated weights for policy 0, policy_version 262708 (0.0017) [2025-01-05 12:47:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1076072448. Throughput: 0: 4922.9. Samples: 19002568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:12,843][08963] Avg episode reward: [(0, '9.722')] [2025-01-05 12:47:13,713][09057] Updated weights for policy 0, policy_version 262718 (0.0018) [2025-01-05 12:47:15,732][09057] Updated weights for policy 0, policy_version 262728 (0.0015) [2025-01-05 12:47:17,842][08963] Fps is (10 sec: 19251.3, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1076170752. Throughput: 0: 4924.7. Samples: 19031624. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:17,842][08963] Avg episode reward: [(0, '8.823')] [2025-01-05 12:47:17,952][09057] Updated weights for policy 0, policy_version 262738 (0.0018) [2025-01-05 12:47:20,103][09057] Updated weights for policy 0, policy_version 262748 (0.0017) [2025-01-05 12:47:22,172][09057] Updated weights for policy 0, policy_version 262758 (0.0016) [2025-01-05 12:47:22,842][08963] Fps is (10 sec: 19659.6, 60 sec: 19660.6, 300 sec: 19646.9). Total num frames: 1076269056. Throughput: 0: 4916.1. Samples: 19060256. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:22,843][08963] Avg episode reward: [(0, '10.538')] [2025-01-05 12:47:24,384][09057] Updated weights for policy 0, policy_version 262768 (0.0017) [2025-01-05 12:47:26,393][09057] Updated weights for policy 0, policy_version 262778 (0.0015) [2025-01-05 12:47:27,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1076367360. Throughput: 0: 4906.8. Samples: 19074770. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:27,842][08963] Avg episode reward: [(0, '10.675')] [2025-01-05 12:47:28,449][09057] Updated weights for policy 0, policy_version 262788 (0.0015) [2025-01-05 12:47:30,506][09057] Updated weights for policy 0, policy_version 262798 (0.0016) [2025-01-05 12:47:32,528][09057] Updated weights for policy 0, policy_version 262808 (0.0015) [2025-01-05 12:47:32,842][08963] Fps is (10 sec: 19662.0, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1076465664. Throughput: 0: 4918.3. Samples: 19104778. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:32,842][08963] Avg episode reward: [(0, '10.058')] [2025-01-05 12:47:34,598][09057] Updated weights for policy 0, policy_version 262818 (0.0016) [2025-01-05 12:47:36,668][09057] Updated weights for policy 0, policy_version 262828 (0.0015) [2025-01-05 12:47:37,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1076563968. Throughput: 0: 4925.1. Samples: 19134436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:37,843][08963] Avg episode reward: [(0, '10.651')] [2025-01-05 12:47:38,815][09057] Updated weights for policy 0, policy_version 262838 (0.0016) [2025-01-05 12:47:40,837][09057] Updated weights for policy 0, policy_version 262848 (0.0016) [2025-01-05 12:47:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1076662272. Throughput: 0: 4920.3. Samples: 19149198. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:42,842][08963] Avg episode reward: [(0, '9.849')] [2025-01-05 12:47:43,008][09057] Updated weights for policy 0, policy_version 262858 (0.0017) [2025-01-05 12:47:45,036][09057] Updated weights for policy 0, policy_version 262868 (0.0016) [2025-01-05 12:47:47,047][09057] Updated weights for policy 0, policy_version 262878 (0.0015) [2025-01-05 12:47:47,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19646.9). 
Total num frames: 1076760576. Throughput: 0: 4911.0. Samples: 19178794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:47,842][08963] Avg episode reward: [(0, '9.502')] [2025-01-05 12:47:47,902][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000262882_1076764672.pth... [2025-01-05 12:47:47,950][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000261732_1072054272.pth [2025-01-05 12:47:49,177][09057] Updated weights for policy 0, policy_version 262888 (0.0017) [2025-01-05 12:47:51,222][09057] Updated weights for policy 0, policy_version 262898 (0.0016) [2025-01-05 12:47:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1076858880. Throughput: 0: 4908.4. Samples: 19208618. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:52,842][08963] Avg episode reward: [(0, '8.854')] [2025-01-05 12:47:53,232][09057] Updated weights for policy 0, policy_version 262908 (0.0016) [2025-01-05 12:47:55,318][09057] Updated weights for policy 0, policy_version 262918 (0.0015) [2025-01-05 12:47:57,351][09057] Updated weights for policy 0, policy_version 262928 (0.0016) [2025-01-05 12:47:57,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1076961280. Throughput: 0: 4911.4. Samples: 19223580. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:47:57,842][08963] Avg episode reward: [(0, '9.715')] [2025-01-05 12:47:59,462][09057] Updated weights for policy 0, policy_version 262938 (0.0015) [2025-01-05 12:48:01,564][09057] Updated weights for policy 0, policy_version 262948 (0.0015) [2025-01-05 12:48:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1077055488. Throughput: 0: 4923.3. Samples: 19253170. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:02,842][08963] Avg episode reward: [(0, '9.631')] [2025-01-05 12:48:03,679][09057] Updated weights for policy 0, policy_version 262958 (0.0016) [2025-01-05 12:48:05,707][09057] Updated weights for policy 0, policy_version 262968 (0.0016) [2025-01-05 12:48:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19646.9). Total num frames: 1077153792. Throughput: 0: 4934.9. Samples: 19282326. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:07,842][08963] Avg episode reward: [(0, '10.434')] [2025-01-05 12:48:07,881][09057] Updated weights for policy 0, policy_version 262978 (0.0017) [2025-01-05 12:48:10,020][09057] Updated weights for policy 0, policy_version 262988 (0.0016) [2025-01-05 12:48:12,001][09057] Updated weights for policy 0, policy_version 262998 (0.0015) [2025-01-05 12:48:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1077252096. Throughput: 0: 4935.2. Samples: 19296854. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:12,842][08963] Avg episode reward: [(0, '8.998')] [2025-01-05 12:48:14,090][09057] Updated weights for policy 0, policy_version 263008 (0.0015) [2025-01-05 12:48:16,121][09057] Updated weights for policy 0, policy_version 263018 (0.0016) [2025-01-05 12:48:17,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1077354496. Throughput: 0: 4938.9. Samples: 19327028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:17,842][08963] Avg episode reward: [(0, '9.593')] [2025-01-05 12:48:18,172][09057] Updated weights for policy 0, policy_version 263028 (0.0017) [2025-01-05 12:48:20,264][09057] Updated weights for policy 0, policy_version 263038 (0.0016) [2025-01-05 12:48:22,294][09057] Updated weights for policy 0, policy_version 263048 (0.0015) [2025-01-05 12:48:22,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.3, 300 sec: 19660.8). 
Total num frames: 1077452800. Throughput: 0: 4941.1. Samples: 19356784. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:22,842][08963] Avg episode reward: [(0, '9.534')] [2025-01-05 12:48:24,371][09057] Updated weights for policy 0, policy_version 263058 (0.0016) [2025-01-05 12:48:26,448][09057] Updated weights for policy 0, policy_version 263068 (0.0016) [2025-01-05 12:48:27,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1077551104. Throughput: 0: 4941.3. Samples: 19371558. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:27,842][08963] Avg episode reward: [(0, '11.667')] [2025-01-05 12:48:28,591][09057] Updated weights for policy 0, policy_version 263078 (0.0017) [2025-01-05 12:48:30,569][09057] Updated weights for policy 0, policy_version 263088 (0.0015) [2025-01-05 12:48:32,640][09057] Updated weights for policy 0, policy_version 263098 (0.0019) [2025-01-05 12:48:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1077649408. Throughput: 0: 4946.1. Samples: 19401368. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:32,842][08963] Avg episode reward: [(0, '9.574')] [2025-01-05 12:48:34,803][09057] Updated weights for policy 0, policy_version 263108 (0.0017) [2025-01-05 12:48:36,826][09057] Updated weights for policy 0, policy_version 263118 (0.0015) [2025-01-05 12:48:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1077747712. Throughput: 0: 4929.5. Samples: 19430444. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:37,842][08963] Avg episode reward: [(0, '9.551')] [2025-01-05 12:48:39,051][09057] Updated weights for policy 0, policy_version 263128 (0.0017) [2025-01-05 12:48:41,071][09057] Updated weights for policy 0, policy_version 263138 (0.0016) [2025-01-05 12:48:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1077846016. 
Throughput: 0: 4921.2. Samples: 19445032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:42,842][08963] Avg episode reward: [(0, '10.672')] [2025-01-05 12:48:43,229][09057] Updated weights for policy 0, policy_version 263148 (0.0017) [2025-01-05 12:48:45,324][09057] Updated weights for policy 0, policy_version 263158 (0.0016) [2025-01-05 12:48:47,359][09057] Updated weights for policy 0, policy_version 263168 (0.0015) [2025-01-05 12:48:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1077944320. Throughput: 0: 4915.1. Samples: 19474348. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:47,842][08963] Avg episode reward: [(0, '10.672')] [2025-01-05 12:48:49,458][09057] Updated weights for policy 0, policy_version 263178 (0.0016) [2025-01-05 12:48:51,511][09057] Updated weights for policy 0, policy_version 263188 (0.0017) [2025-01-05 12:48:52,842][08963] Fps is (10 sec: 19250.6, 60 sec: 19660.7, 300 sec: 19633.0). Total num frames: 1078038528. Throughput: 0: 4915.5. Samples: 19503526. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:52,843][08963] Avg episode reward: [(0, '9.998')] [2025-01-05 12:48:53,741][09057] Updated weights for policy 0, policy_version 263198 (0.0017) [2025-01-05 12:48:55,742][09057] Updated weights for policy 0, policy_version 263208 (0.0016) [2025-01-05 12:48:57,825][09057] Updated weights for policy 0, policy_version 263218 (0.0017) [2025-01-05 12:48:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.9, 300 sec: 19646.9). Total num frames: 1078140928. Throughput: 0: 4921.2. Samples: 19518306. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:48:57,842][08963] Avg episode reward: [(0, '9.415')] [2025-01-05 12:48:59,926][09057] Updated weights for policy 0, policy_version 263228 (0.0016) [2025-01-05 12:49:01,967][09057] Updated weights for policy 0, policy_version 263238 (0.0017) [2025-01-05 12:49:02,842][08963] Fps is (10 sec: 20070.9, 60 sec: 19729.0, 300 sec: 19646.9). Total num frames: 1078239232. Throughput: 0: 4909.7. Samples: 19547966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:02,842][08963] Avg episode reward: [(0, '10.087')] [2025-01-05 12:49:04,102][09057] Updated weights for policy 0, policy_version 263248 (0.0016) [2025-01-05 12:49:06,142][09057] Updated weights for policy 0, policy_version 263258 (0.0016) [2025-01-05 12:49:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1078337536. Throughput: 0: 4904.0. Samples: 19577462. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:07,842][08963] Avg episode reward: [(0, '10.184')] [2025-01-05 12:49:08,216][09057] Updated weights for policy 0, policy_version 263268 (0.0016) [2025-01-05 12:49:10,288][09057] Updated weights for policy 0, policy_version 263278 (0.0016) [2025-01-05 12:49:12,317][09057] Updated weights for policy 0, policy_version 263288 (0.0016) [2025-01-05 12:49:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.0, 300 sec: 19646.9). Total num frames: 1078435840. Throughput: 0: 4907.4. Samples: 19592390. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:12,842][08963] Avg episode reward: [(0, '10.198')] [2025-01-05 12:49:14,435][09057] Updated weights for policy 0, policy_version 263298 (0.0017) [2025-01-05 12:49:16,516][09057] Updated weights for policy 0, policy_version 263308 (0.0016) [2025-01-05 12:49:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19647.0). Total num frames: 1078534144. Throughput: 0: 4901.5. Samples: 19621936. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:17,842][08963] Avg episode reward: [(0, '10.544')] [2025-01-05 12:49:18,666][09057] Updated weights for policy 0, policy_version 263318 (0.0017) [2025-01-05 12:49:20,666][09057] Updated weights for policy 0, policy_version 263328 (0.0017) [2025-01-05 12:49:22,709][09057] Updated weights for policy 0, policy_version 263338 (0.0015) [2025-01-05 12:49:22,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1078632448. Throughput: 0: 4916.5. Samples: 19651686. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:22,842][08963] Avg episode reward: [(0, '8.832')] [2025-01-05 12:49:24,857][09057] Updated weights for policy 0, policy_version 263348 (0.0017) [2025-01-05 12:49:26,880][09057] Updated weights for policy 0, policy_version 263358 (0.0016) [2025-01-05 12:49:27,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.7, 300 sec: 19646.9). Total num frames: 1078730752. Throughput: 0: 4918.9. Samples: 19666384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:27,842][08963] Avg episode reward: [(0, '8.324')] [2025-01-05 12:49:28,993][09057] Updated weights for policy 0, policy_version 263368 (0.0016) [2025-01-05 12:49:31,046][09057] Updated weights for policy 0, policy_version 263378 (0.0015) [2025-01-05 12:49:32,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1078829056. Throughput: 0: 4924.4. Samples: 19695944. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:32,842][08963] Avg episode reward: [(0, '9.943')] [2025-01-05 12:49:33,143][09057] Updated weights for policy 0, policy_version 263388 (0.0016) [2025-01-05 12:49:35,181][09057] Updated weights for policy 0, policy_version 263398 (0.0015) [2025-01-05 12:49:37,266][09057] Updated weights for policy 0, policy_version 263408 (0.0017) [2025-01-05 12:49:37,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19646.9). 
Total num frames: 1078927360. Throughput: 0: 4936.0. Samples: 19725644. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:37,842][08963] Avg episode reward: [(0, '10.050')] [2025-01-05 12:49:39,337][09057] Updated weights for policy 0, policy_version 263418 (0.0017) [2025-01-05 12:49:41,386][09057] Updated weights for policy 0, policy_version 263428 (0.0016) [2025-01-05 12:49:42,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1079025664. Throughput: 0: 4936.7. Samples: 19740458. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:42,842][08963] Avg episode reward: [(0, '9.447')] [2025-01-05 12:49:43,530][09057] Updated weights for policy 0, policy_version 263438 (0.0016) [2025-01-05 12:49:45,521][09057] Updated weights for policy 0, policy_version 263448 (0.0015) [2025-01-05 12:49:47,592][09057] Updated weights for policy 0, policy_version 263458 (0.0016) [2025-01-05 12:49:47,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19729.0, 300 sec: 19646.9). Total num frames: 1079128064. Throughput: 0: 4937.6. Samples: 19770158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:47,842][08963] Avg episode reward: [(0, '10.730')] [2025-01-05 12:49:47,848][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000263459_1079128064.pth... [2025-01-05 12:49:47,908][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000262305_1074401280.pth [2025-01-05 12:49:49,786][09057] Updated weights for policy 0, policy_version 263468 (0.0017) [2025-01-05 12:49:51,794][09057] Updated weights for policy 0, policy_version 263478 (0.0016) [2025-01-05 12:49:52,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.2, 300 sec: 19646.9). Total num frames: 1079222272. Throughput: 0: 4935.5. Samples: 19799558. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:52,842][08963] Avg episode reward: [(0, '8.984')] [2025-01-05 12:49:53,850][09057] Updated weights for policy 0, policy_version 263488 (0.0016) [2025-01-05 12:49:55,901][09057] Updated weights for policy 0, policy_version 263498 (0.0016) [2025-01-05 12:49:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.0, 300 sec: 19660.8). Total num frames: 1079324672. Throughput: 0: 4938.0. Samples: 19814598. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:49:57,843][08963] Avg episode reward: [(0, '9.962')] [2025-01-05 12:49:58,009][09057] Updated weights for policy 0, policy_version 263508 (0.0017) [2025-01-05 12:50:00,048][09057] Updated weights for policy 0, policy_version 263518 (0.0016) [2025-01-05 12:50:02,119][09057] Updated weights for policy 0, policy_version 263528 (0.0016) [2025-01-05 12:50:02,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1079422976. Throughput: 0: 4941.4. Samples: 19844298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:02,842][08963] Avg episode reward: [(0, '9.157')] [2025-01-05 12:50:04,212][09057] Updated weights for policy 0, policy_version 263538 (0.0017) [2025-01-05 12:50:06,240][09057] Updated weights for policy 0, policy_version 263548 (0.0019) [2025-01-05 12:50:07,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19660.8). Total num frames: 1079521280. Throughput: 0: 4934.8. Samples: 19873752. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:07,843][08963] Avg episode reward: [(0, '10.229')] [2025-01-05 12:50:08,391][09057] Updated weights for policy 0, policy_version 263558 (0.0017) [2025-01-05 12:50:10,424][09057] Updated weights for policy 0, policy_version 263568 (0.0016) [2025-01-05 12:50:12,434][09057] Updated weights for policy 0, policy_version 263578 (0.0016) [2025-01-05 12:50:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19660.8). 
Total num frames: 1079619584. Throughput: 0: 4940.4. Samples: 19888702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:12,842][08963] Avg episode reward: [(0, '9.302')] [2025-01-05 12:50:14,495][09057] Updated weights for policy 0, policy_version 263588 (0.0016) [2025-01-05 12:50:16,534][09057] Updated weights for policy 0, policy_version 263598 (0.0016) [2025-01-05 12:50:17,842][08963] Fps is (10 sec: 20070.8, 60 sec: 19797.4, 300 sec: 19688.6). Total num frames: 1079721984. Throughput: 0: 4953.3. Samples: 19918844. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:17,842][08963] Avg episode reward: [(0, '10.298')] [2025-01-05 12:50:18,565][09057] Updated weights for policy 0, policy_version 263608 (0.0016) [2025-01-05 12:50:20,599][09057] Updated weights for policy 0, policy_version 263618 (0.0016) [2025-01-05 12:50:22,657][09057] Updated weights for policy 0, policy_version 263628 (0.0016) [2025-01-05 12:50:22,842][08963] Fps is (10 sec: 20480.2, 60 sec: 19865.7, 300 sec: 19702.5). Total num frames: 1079824384. Throughput: 0: 4962.6. Samples: 19948960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:22,842][08963] Avg episode reward: [(0, '10.056')] [2025-01-05 12:50:24,760][09057] Updated weights for policy 0, policy_version 263638 (0.0017) [2025-01-05 12:50:26,789][09057] Updated weights for policy 0, policy_version 263648 (0.0016) [2025-01-05 12:50:27,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19797.3, 300 sec: 19688.6). Total num frames: 1079918592. Throughput: 0: 4960.1. Samples: 19963662. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:27,843][08963] Avg episode reward: [(0, '9.766')] [2025-01-05 12:50:28,917][09057] Updated weights for policy 0, policy_version 263658 (0.0016) [2025-01-05 12:50:30,921][09057] Updated weights for policy 0, policy_version 263668 (0.0015) [2025-01-05 12:50:32,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19702.4). Total num frames: 1080020992. 
Throughput: 0: 4963.9. Samples: 19993532. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:32,842][08963] Avg episode reward: [(0, '9.812')] [2025-01-05 12:50:32,946][09057] Updated weights for policy 0, policy_version 263678 (0.0017) [2025-01-05 12:50:34,979][09057] Updated weights for policy 0, policy_version 263688 (0.0015) [2025-01-05 12:50:37,023][09057] Updated weights for policy 0, policy_version 263698 (0.0015) [2025-01-05 12:50:37,842][08963] Fps is (10 sec: 20480.1, 60 sec: 19933.8, 300 sec: 19730.2). Total num frames: 1080123392. Throughput: 0: 4981.1. Samples: 20023710. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:37,842][08963] Avg episode reward: [(0, '11.219')] [2025-01-05 12:50:39,111][09057] Updated weights for policy 0, policy_version 263708 (0.0017) [2025-01-05 12:50:41,137][09057] Updated weights for policy 0, policy_version 263718 (0.0016) [2025-01-05 12:50:42,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19730.2). Total num frames: 1080221696. Throughput: 0: 4978.9. Samples: 20038646. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:42,842][08963] Avg episode reward: [(0, '9.877')] [2025-01-05 12:50:43,264][09057] Updated weights for policy 0, policy_version 263728 (0.0016) [2025-01-05 12:50:45,289][09057] Updated weights for policy 0, policy_version 263738 (0.0015) [2025-01-05 12:50:47,320][09057] Updated weights for policy 0, policy_version 263748 (0.0015) [2025-01-05 12:50:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19730.2). Total num frames: 1080320000. Throughput: 0: 4980.7. Samples: 20068428. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:47,842][08963] Avg episode reward: [(0, '9.067')] [2025-01-05 12:50:49,465][09057] Updated weights for policy 0, policy_version 263758 (0.0016) [2025-01-05 12:50:51,481][09057] Updated weights for policy 0, policy_version 263768 (0.0015) [2025-01-05 12:50:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19730.2). Total num frames: 1080418304. Throughput: 0: 4987.4. Samples: 20098184. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:52,842][08963] Avg episode reward: [(0, '9.697')] [2025-01-05 12:50:53,493][09057] Updated weights for policy 0, policy_version 263778 (0.0016) [2025-01-05 12:50:55,557][09057] Updated weights for policy 0, policy_version 263788 (0.0016) [2025-01-05 12:50:57,584][09057] Updated weights for policy 0, policy_version 263798 (0.0016) [2025-01-05 12:50:57,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 19744.1). Total num frames: 1080520704. Throughput: 0: 4990.2. Samples: 20113260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:50:57,842][08963] Avg episode reward: [(0, '9.918')] [2025-01-05 12:50:59,718][09057] Updated weights for policy 0, policy_version 263808 (0.0020) [2025-01-05 12:51:01,774][09057] Updated weights for policy 0, policy_version 263818 (0.0018) [2025-01-05 12:51:02,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19744.1). Total num frames: 1080619008. Throughput: 0: 4981.1. Samples: 20142992. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:51:02,842][08963] Avg episode reward: [(0, '10.584')] [2025-01-05 12:51:03,886][09057] Updated weights for policy 0, policy_version 263828 (0.0016) [2025-01-05 12:51:05,922][09057] Updated weights for policy 0, policy_version 263838 (0.0016) [2025-01-05 12:51:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19744.1). Total num frames: 1080717312. Throughput: 0: 4962.9. Samples: 20172292. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:51:07,842][08963] Avg episode reward: [(0, '9.477')] [2025-01-05 12:51:08,064][09057] Updated weights for policy 0, policy_version 263848 (0.0016) [2025-01-05 12:51:10,104][09057] Updated weights for policy 0, policy_version 263858 (0.0015) [2025-01-05 12:51:12,118][09057] Updated weights for policy 0, policy_version 263868 (0.0016) [2025-01-05 12:51:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19744.1). Total num frames: 1080815616. Throughput: 0: 4970.1. Samples: 20187316. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:51:12,842][08963] Avg episode reward: [(0, '9.484')] [2025-01-05 12:51:14,163][09057] Updated weights for policy 0, policy_version 263878 (0.0015) [2025-01-05 12:51:16,167][09057] Updated weights for policy 0, policy_version 263888 (0.0015) [2025-01-05 12:51:17,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19758.0). Total num frames: 1080918016. Throughput: 0: 4981.0. Samples: 20217676. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:17,842][08963] Avg episode reward: [(0, '10.252')] [2025-01-05 12:51:18,185][09057] Updated weights for policy 0, policy_version 263898 (0.0016) [2025-01-05 12:51:20,237][09057] Updated weights for policy 0, policy_version 263908 (0.0016) [2025-01-05 12:51:22,290][09057] Updated weights for policy 0, policy_version 263918 (0.0014) [2025-01-05 12:51:22,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19758.0). Total num frames: 1081016320. Throughput: 0: 4975.7. Samples: 20247618. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:22,842][08963] Avg episode reward: [(0, '9.189')] [2025-01-05 12:51:24,410][09057] Updated weights for policy 0, policy_version 263928 (0.0017) [2025-01-05 12:51:26,474][09057] Updated weights for policy 0, policy_version 263938 (0.0015) [2025-01-05 12:51:27,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19758.0). 
Total num frames: 1081114624. Throughput: 0: 4969.2. Samples: 20262260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:27,842][08963] Avg episode reward: [(0, '9.922')] [2025-01-05 12:51:28,630][09057] Updated weights for policy 0, policy_version 263948 (0.0017) [2025-01-05 12:51:30,636][09057] Updated weights for policy 0, policy_version 263958 (0.0015) [2025-01-05 12:51:32,700][09057] Updated weights for policy 0, policy_version 263968 (0.0016) [2025-01-05 12:51:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19758.0). Total num frames: 1081212928. Throughput: 0: 4964.9. Samples: 20291850. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:32,843][08963] Avg episode reward: [(0, '9.975')] [2025-01-05 12:51:34,838][09057] Updated weights for policy 0, policy_version 263978 (0.0016) [2025-01-05 12:51:36,842][09057] Updated weights for policy 0, policy_version 263988 (0.0015) [2025-01-05 12:51:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19758.0). Total num frames: 1081311232. Throughput: 0: 4964.1. Samples: 20321570. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:37,842][08963] Avg episode reward: [(0, '9.442')] [2025-01-05 12:51:38,918][09057] Updated weights for policy 0, policy_version 263998 (0.0016) [2025-01-05 12:51:40,973][09057] Updated weights for policy 0, policy_version 264008 (0.0016) [2025-01-05 12:51:42,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19758.0). Total num frames: 1081409536. Throughput: 0: 4963.8. Samples: 20336630. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:42,842][08963] Avg episode reward: [(0, '12.102')] [2025-01-05 12:51:43,087][09057] Updated weights for policy 0, policy_version 264018 (0.0018) [2025-01-05 12:51:45,170][09057] Updated weights for policy 0, policy_version 264028 (0.0016) [2025-01-05 12:51:47,188][09057] Updated weights for policy 0, policy_version 264038 (0.0016) [2025-01-05 12:51:47,843][08963] Fps is (10 sec: 20068.6, 60 sec: 19865.3, 300 sec: 19771.8). Total num frames: 1081511936. Throughput: 0: 4957.8. Samples: 20366098. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:47,843][08963] Avg episode reward: [(0, '10.057')] [2025-01-05 12:51:47,852][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000264041_1081511936.pth... [2025-01-05 12:51:47,904][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000262882_1076764672.pth [2025-01-05 12:51:49,397][09057] Updated weights for policy 0, policy_version 264048 (0.0017) [2025-01-05 12:51:51,529][09057] Updated weights for policy 0, policy_version 264058 (0.0016) [2025-01-05 12:51:52,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19758.0). Total num frames: 1081606144. Throughput: 0: 4942.0. Samples: 20394684. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:52,842][08963] Avg episode reward: [(0, '9.386')] [2025-01-05 12:51:53,652][09057] Updated weights for policy 0, policy_version 264068 (0.0017) [2025-01-05 12:51:55,686][09057] Updated weights for policy 0, policy_version 264078 (0.0016) [2025-01-05 12:51:57,842][08963] Fps is (10 sec: 18843.3, 60 sec: 19660.8, 300 sec: 19744.1). Total num frames: 1081700352. Throughput: 0: 4936.2. Samples: 20409446. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:51:57,842][08963] Avg episode reward: [(0, '9.560')] [2025-01-05 12:51:57,912][09057] Updated weights for policy 0, policy_version 264088 (0.0017) [2025-01-05 12:52:00,055][09057] Updated weights for policy 0, policy_version 264098 (0.0019) [2025-01-05 12:52:02,089][09057] Updated weights for policy 0, policy_version 264108 (0.0016) [2025-01-05 12:52:02,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19730.2). Total num frames: 1081798656. Throughput: 0: 4901.4. Samples: 20438238. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:02,842][08963] Avg episode reward: [(0, '9.827')] [2025-01-05 12:52:04,271][09057] Updated weights for policy 0, policy_version 264118 (0.0016) [2025-01-05 12:52:06,269][09057] Updated weights for policy 0, policy_version 264128 (0.0016) [2025-01-05 12:52:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19744.1). Total num frames: 1081896960. Throughput: 0: 4893.0. Samples: 20467802. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:07,842][08963] Avg episode reward: [(0, '10.416')] [2025-01-05 12:52:08,290][09057] Updated weights for policy 0, policy_version 264138 (0.0015) [2025-01-05 12:52:10,374][09057] Updated weights for policy 0, policy_version 264148 (0.0016) [2025-01-05 12:52:12,383][09057] Updated weights for policy 0, policy_version 264158 (0.0016) [2025-01-05 12:52:12,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19758.0). Total num frames: 1081999360. Throughput: 0: 4904.7. Samples: 20482972. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:12,842][08963] Avg episode reward: [(0, '9.247')] [2025-01-05 12:52:14,400][09057] Updated weights for policy 0, policy_version 264168 (0.0017) [2025-01-05 12:52:16,498][09057] Updated weights for policy 0, policy_version 264178 (0.0016) [2025-01-05 12:52:17,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19758.0). 
Total num frames: 1082097664. Throughput: 0: 4916.2. Samples: 20513078. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:17,842][08963] Avg episode reward: [(0, '10.487')] [2025-01-05 12:52:18,612][09057] Updated weights for policy 0, policy_version 264188 (0.0019) [2025-01-05 12:52:20,622][09057] Updated weights for policy 0, policy_version 264198 (0.0015) [2025-01-05 12:52:22,708][09057] Updated weights for policy 0, policy_version 264208 (0.0016) [2025-01-05 12:52:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1082195968. Throughput: 0: 4914.8. Samples: 20542734. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:22,842][08963] Avg episode reward: [(0, '9.988')] [2025-01-05 12:52:24,806][09057] Updated weights for policy 0, policy_version 264218 (0.0017) [2025-01-05 12:52:26,802][09057] Updated weights for policy 0, policy_version 264228 (0.0015) [2025-01-05 12:52:27,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1082298368. Throughput: 0: 4906.0. Samples: 20557400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:27,842][08963] Avg episode reward: [(0, '10.073')] [2025-01-05 12:52:28,901][09057] Updated weights for policy 0, policy_version 264238 (0.0015) [2025-01-05 12:52:30,899][09057] Updated weights for policy 0, policy_version 264248 (0.0016) [2025-01-05 12:52:32,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19771.9). Total num frames: 1082396672. Throughput: 0: 4920.2. Samples: 20587502. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:32,842][08963] Avg episode reward: [(0, '9.295')] [2025-01-05 12:52:32,933][09057] Updated weights for policy 0, policy_version 264258 (0.0015) [2025-01-05 12:52:35,016][09057] Updated weights for policy 0, policy_version 264268 (0.0014) [2025-01-05 12:52:37,015][09057] Updated weights for policy 0, policy_version 264278 (0.0015) [2025-01-05 12:52:37,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1082499072. Throughput: 0: 4955.4. Samples: 20617678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:37,842][08963] Avg episode reward: [(0, '9.716')] [2025-01-05 12:52:39,031][09057] Updated weights for policy 0, policy_version 264288 (0.0017) [2025-01-05 12:52:41,122][09057] Updated weights for policy 0, policy_version 264298 (0.0015) [2025-01-05 12:52:42,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1082597376. Throughput: 0: 4963.6. Samples: 20632806. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:42,842][08963] Avg episode reward: [(0, '9.019')] [2025-01-05 12:52:43,202][09057] Updated weights for policy 0, policy_version 264308 (0.0016) [2025-01-05 12:52:45,242][09057] Updated weights for policy 0, policy_version 264318 (0.0016) [2025-01-05 12:52:47,338][09057] Updated weights for policy 0, policy_version 264328 (0.0018) [2025-01-05 12:52:47,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.3, 300 sec: 19785.8). Total num frames: 1082695680. Throughput: 0: 4984.4. Samples: 20662538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:47,842][08963] Avg episode reward: [(0, '8.554')] [2025-01-05 12:52:49,409][09057] Updated weights for policy 0, policy_version 264338 (0.0017) [2025-01-05 12:52:51,446][09057] Updated weights for policy 0, policy_version 264348 (0.0015) [2025-01-05 12:52:52,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19771.9). 
Total num frames: 1082793984. Throughput: 0: 4980.3. Samples: 20691914. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:52,842][08963] Avg episode reward: [(0, '9.707')] [2025-01-05 12:52:53,617][09057] Updated weights for policy 0, policy_version 264358 (0.0016) [2025-01-05 12:52:55,606][09057] Updated weights for policy 0, policy_version 264368 (0.0015) [2025-01-05 12:52:57,606][09057] Updated weights for policy 0, policy_version 264378 (0.0016) [2025-01-05 12:52:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19785.8). Total num frames: 1082892288. Throughput: 0: 4977.3. Samples: 20706952. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:52:57,842][08963] Avg episode reward: [(0, '10.546')] [2025-01-05 12:52:59,694][09057] Updated weights for policy 0, policy_version 264388 (0.0016) [2025-01-05 12:53:01,706][09057] Updated weights for policy 0, policy_version 264398 (0.0015) [2025-01-05 12:53:02,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19799.7). Total num frames: 1082994688. Throughput: 0: 4980.8. Samples: 20737212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:02,842][08963] Avg episode reward: [(0, '11.232')] [2025-01-05 12:53:03,716][09057] Updated weights for policy 0, policy_version 264408 (0.0017) [2025-01-05 12:53:05,790][09057] Updated weights for policy 0, policy_version 264418 (0.0015) [2025-01-05 12:53:07,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19933.8, 300 sec: 19799.6). Total num frames: 1083092992. Throughput: 0: 4985.1. Samples: 20767064. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:07,842][08963] Avg episode reward: [(0, '9.066')] [2025-01-05 12:53:07,848][09057] Updated weights for policy 0, policy_version 264428 (0.0016) [2025-01-05 12:53:09,954][09057] Updated weights for policy 0, policy_version 264438 (0.0016) [2025-01-05 12:53:12,036][09057] Updated weights for policy 0, policy_version 264448 (0.0015) [2025-01-05 12:53:12,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19799.7). Total num frames: 1083195392. Throughput: 0: 4989.9. Samples: 20781946. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:12,842][08963] Avg episode reward: [(0, '9.904')] [2025-01-05 12:53:14,118][09057] Updated weights for policy 0, policy_version 264458 (0.0017) [2025-01-05 12:53:16,184][09057] Updated weights for policy 0, policy_version 264468 (0.0016) [2025-01-05 12:53:17,842][08963] Fps is (10 sec: 19661.2, 60 sec: 19865.6, 300 sec: 19785.8). Total num frames: 1083289600. Throughput: 0: 4972.9. Samples: 20811280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:17,842][08963] Avg episode reward: [(0, '10.188')] [2025-01-05 12:53:18,327][09057] Updated weights for policy 0, policy_version 264478 (0.0016) [2025-01-05 12:53:20,341][09057] Updated weights for policy 0, policy_version 264488 (0.0019) [2025-01-05 12:53:22,378][09057] Updated weights for policy 0, policy_version 264498 (0.0016) [2025-01-05 12:53:22,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19933.8, 300 sec: 19799.6). Total num frames: 1083392000. Throughput: 0: 4964.6. Samples: 20841086. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:22,842][08963] Avg episode reward: [(0, '9.080')] [2025-01-05 12:53:24,512][09057] Updated weights for policy 0, policy_version 264508 (0.0017) [2025-01-05 12:53:26,555][09057] Updated weights for policy 0, policy_version 264518 (0.0016) [2025-01-05 12:53:27,842][08963] Fps is (10 sec: 20070.0, 60 sec: 19865.6, 300 sec: 19799.6). 
Total num frames: 1083490304. Throughput: 0: 4955.8. Samples: 20855818. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:27,842][08963] Avg episode reward: [(0, '9.762')] [2025-01-05 12:53:28,675][09057] Updated weights for policy 0, policy_version 264528 (0.0017) [2025-01-05 12:53:30,717][09057] Updated weights for policy 0, policy_version 264538 (0.0016) [2025-01-05 12:53:32,810][09057] Updated weights for policy 0, policy_version 264548 (0.0017) [2025-01-05 12:53:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1083588608. Throughput: 0: 4951.6. Samples: 20885360. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:32,842][08963] Avg episode reward: [(0, '10.167')] [2025-01-05 12:53:34,935][09057] Updated weights for policy 0, policy_version 264558 (0.0016) [2025-01-05 12:53:36,962][09057] Updated weights for policy 0, policy_version 264568 (0.0015) [2025-01-05 12:53:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1083686912. Throughput: 0: 4953.8. Samples: 20914836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:37,842][08963] Avg episode reward: [(0, '9.400')] [2025-01-05 12:53:39,088][09057] Updated weights for policy 0, policy_version 264578 (0.0017) [2025-01-05 12:53:41,104][09057] Updated weights for policy 0, policy_version 264588 (0.0016) [2025-01-05 12:53:42,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19799.7). Total num frames: 1083785216. Throughput: 0: 4948.4. Samples: 20929630. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:42,842][08963] Avg episode reward: [(0, '9.378')] [2025-01-05 12:53:43,155][09057] Updated weights for policy 0, policy_version 264598 (0.0017) [2025-01-05 12:53:45,216][09057] Updated weights for policy 0, policy_version 264608 (0.0016) [2025-01-05 12:53:47,229][09057] Updated weights for policy 0, policy_version 264618 (0.0016) [2025-01-05 12:53:47,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19865.6, 300 sec: 19827.4). Total num frames: 1083887616. Throughput: 0: 4946.4. Samples: 20959798. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:47,842][08963] Avg episode reward: [(0, '8.691')] [2025-01-05 12:53:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000264621_1083887616.pth... [2025-01-05 12:53:47,898][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000263459_1079128064.pth [2025-01-05 12:53:49,292][09057] Updated weights for policy 0, policy_version 264628 (0.0016) [2025-01-05 12:53:51,368][09057] Updated weights for policy 0, policy_version 264638 (0.0015) [2025-01-05 12:53:52,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1083985920. Throughput: 0: 4941.4. Samples: 20989426. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:52,843][08963] Avg episode reward: [(0, '10.191')] [2025-01-05 12:53:53,479][09057] Updated weights for policy 0, policy_version 264648 (0.0016) [2025-01-05 12:53:55,489][09057] Updated weights for policy 0, policy_version 264658 (0.0016) [2025-01-05 12:53:57,578][09057] Updated weights for policy 0, policy_version 264668 (0.0016) [2025-01-05 12:53:57,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1084084224. Throughput: 0: 4942.3. Samples: 21004352. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:53:57,842][08963] Avg episode reward: [(0, '9.890')] [2025-01-05 12:53:59,630][09057] Updated weights for policy 0, policy_version 264678 (0.0017) [2025-01-05 12:54:01,679][09057] Updated weights for policy 0, policy_version 264688 (0.0015) [2025-01-05 12:54:02,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1084182528. Throughput: 0: 4952.5. Samples: 21034144. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:02,843][08963] Avg episode reward: [(0, '10.259')] [2025-01-05 12:54:03,880][09057] Updated weights for policy 0, policy_version 264698 (0.0017) [2025-01-05 12:54:05,902][09057] Updated weights for policy 0, policy_version 264708 (0.0016) [2025-01-05 12:54:07,843][08963] Fps is (10 sec: 19659.7, 60 sec: 19797.1, 300 sec: 19813.5). Total num frames: 1084280832. Throughput: 0: 4933.0. Samples: 21063074. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:07,845][08963] Avg episode reward: [(0, '9.362')] [2025-01-05 12:54:08,059][09057] Updated weights for policy 0, policy_version 264718 (0.0018) [2025-01-05 12:54:10,159][09057] Updated weights for policy 0, policy_version 264728 (0.0016) [2025-01-05 12:54:12,165][09057] Updated weights for policy 0, policy_version 264738 (0.0016) [2025-01-05 12:54:12,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.0, 300 sec: 19813.5). Total num frames: 1084379136. Throughput: 0: 4936.8. Samples: 21077972. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:12,842][08963] Avg episode reward: [(0, '10.545')] [2025-01-05 12:54:14,228][09057] Updated weights for policy 0, policy_version 264748 (0.0016) [2025-01-05 12:54:16,318][09057] Updated weights for policy 0, policy_version 264758 (0.0016) [2025-01-05 12:54:17,842][08963] Fps is (10 sec: 19661.9, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1084477440. Throughput: 0: 4940.8. Samples: 21107698. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:17,843][08963] Avg episode reward: [(0, '10.330')] [2025-01-05 12:54:18,355][09057] Updated weights for policy 0, policy_version 264768 (0.0016) [2025-01-05 12:54:20,421][09057] Updated weights for policy 0, policy_version 264778 (0.0016) [2025-01-05 12:54:22,494][09057] Updated weights for policy 0, policy_version 264788 (0.0019) [2025-01-05 12:54:22,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1084575744. Throughput: 0: 4949.4. Samples: 21137558. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:22,842][08963] Avg episode reward: [(0, '8.904')] [2025-01-05 12:54:24,613][09057] Updated weights for policy 0, policy_version 264798 (0.0017) [2025-01-05 12:54:26,664][09057] Updated weights for policy 0, policy_version 264808 (0.0016) [2025-01-05 12:54:27,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1084674048. Throughput: 0: 4944.1. Samples: 21152114. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:27,842][08963] Avg episode reward: [(0, '9.550')] [2025-01-05 12:54:28,834][09057] Updated weights for policy 0, policy_version 264818 (0.0018) [2025-01-05 12:54:30,855][09057] Updated weights for policy 0, policy_version 264828 (0.0016) [2025-01-05 12:54:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1084772352. Throughput: 0: 4928.7. Samples: 21181588. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:32,843][08963] Avg episode reward: [(0, '9.685')] [2025-01-05 12:54:32,957][09057] Updated weights for policy 0, policy_version 264838 (0.0017) [2025-01-05 12:54:35,043][09057] Updated weights for policy 0, policy_version 264848 (0.0016) [2025-01-05 12:54:37,045][09057] Updated weights for policy 0, policy_version 264858 (0.0016) [2025-01-05 12:54:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19813.5). 
Total num frames: 1084870656. Throughput: 0: 4933.6. Samples: 21211438. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:37,842][08963] Avg episode reward: [(0, '9.870')] [2025-01-05 12:54:39,080][09057] Updated weights for policy 0, policy_version 264868 (0.0016) [2025-01-05 12:54:41,173][09057] Updated weights for policy 0, policy_version 264878 (0.0016) [2025-01-05 12:54:42,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1084973056. Throughput: 0: 4936.0. Samples: 21226472. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:42,842][08963] Avg episode reward: [(0, '10.769')] [2025-01-05 12:54:43,263][09057] Updated weights for policy 0, policy_version 264888 (0.0017) [2025-01-05 12:54:45,308][09057] Updated weights for policy 0, policy_version 264898 (0.0016) [2025-01-05 12:54:47,409][09057] Updated weights for policy 0, policy_version 264908 (0.0016) [2025-01-05 12:54:47,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19729.0, 300 sec: 19827.4). Total num frames: 1085071360. Throughput: 0: 4933.4. Samples: 21256146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:54:47,842][08963] Avg episode reward: [(0, '9.784')] [2025-01-05 12:54:49,479][09057] Updated weights for policy 0, policy_version 264918 (0.0017) [2025-01-05 12:54:51,532][09057] Updated weights for policy 0, policy_version 264928 (0.0017) [2025-01-05 12:54:52,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1085169664. Throughput: 0: 4941.4. Samples: 21285434. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:54:52,842][08963] Avg episode reward: [(0, '9.530')] [2025-01-05 12:54:53,712][09057] Updated weights for policy 0, policy_version 264938 (0.0017) [2025-01-05 12:54:55,701][09057] Updated weights for policy 0, policy_version 264948 (0.0016) [2025-01-05 12:54:57,761][09057] Updated weights for policy 0, policy_version 264958 (0.0017) [2025-01-05 12:54:57,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1085267968. Throughput: 0: 4937.7. Samples: 21300168. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:54:57,842][08963] Avg episode reward: [(0, '10.859')] [2025-01-05 12:54:59,941][09057] Updated weights for policy 0, policy_version 264968 (0.0016) [2025-01-05 12:55:01,927][09057] Updated weights for policy 0, policy_version 264978 (0.0016) [2025-01-05 12:55:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1085366272. Throughput: 0: 4937.9. Samples: 21329904. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:02,842][08963] Avg episode reward: [(0, '10.032')] [2025-01-05 12:55:03,995][09057] Updated weights for policy 0, policy_version 264988 (0.0016) [2025-01-05 12:55:06,059][09057] Updated weights for policy 0, policy_version 264998 (0.0017) [2025-01-05 12:55:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.3, 300 sec: 19813.5). Total num frames: 1085464576. Throughput: 0: 4935.3. Samples: 21359646. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:07,842][08963] Avg episode reward: [(0, '10.114')] [2025-01-05 12:55:08,122][09057] Updated weights for policy 0, policy_version 265008 (0.0017) [2025-01-05 12:55:10,199][09057] Updated weights for policy 0, policy_version 265018 (0.0015) [2025-01-05 12:55:12,248][09057] Updated weights for policy 0, policy_version 265028 (0.0015) [2025-01-05 12:55:12,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19799.6). 
Total num frames: 1085562880. Throughput: 0: 4944.7. Samples: 21374624. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:12,842][08963] Avg episode reward: [(0, '10.750')] [2025-01-05 12:55:14,304][09057] Updated weights for policy 0, policy_version 265038 (0.0016) [2025-01-05 12:55:16,373][09057] Updated weights for policy 0, policy_version 265048 (0.0015) [2025-01-05 12:55:17,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1085661184. Throughput: 0: 4953.4. Samples: 21404490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:17,842][08963] Avg episode reward: [(0, '9.295')] [2025-01-05 12:55:18,509][09057] Updated weights for policy 0, policy_version 265058 (0.0016) [2025-01-05 12:55:20,485][09057] Updated weights for policy 0, policy_version 265068 (0.0017) [2025-01-05 12:55:22,572][09057] Updated weights for policy 0, policy_version 265078 (0.0015) [2025-01-05 12:55:22,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1085763584. Throughput: 0: 4951.5. Samples: 21434258. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:22,842][08963] Avg episode reward: [(0, '9.911')] [2025-01-05 12:55:24,714][09057] Updated weights for policy 0, policy_version 265088 (0.0017) [2025-01-05 12:55:26,672][09057] Updated weights for policy 0, policy_version 265098 (0.0015) [2025-01-05 12:55:27,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19797.4, 300 sec: 19799.7). Total num frames: 1085861888. Throughput: 0: 4941.8. Samples: 21448854. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:27,842][08963] Avg episode reward: [(0, '9.382')] [2025-01-05 12:55:28,747][09057] Updated weights for policy 0, policy_version 265108 (0.0015) [2025-01-05 12:55:30,782][09057] Updated weights for policy 0, policy_version 265118 (0.0015) [2025-01-05 12:55:32,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19785.8). Total num frames: 1085960192. 
Throughput: 0: 4951.1. Samples: 21478944. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:32,842][08963] Avg episode reward: [(0, '9.285')] [2025-01-05 12:55:32,849][09057] Updated weights for policy 0, policy_version 265128 (0.0016) [2025-01-05 12:55:35,017][09057] Updated weights for policy 0, policy_version 265138 (0.0016) [2025-01-05 12:55:37,049][09057] Updated weights for policy 0, policy_version 265148 (0.0015) [2025-01-05 12:55:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1086058496. Throughput: 0: 4953.6. Samples: 21508344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:37,842][08963] Avg episode reward: [(0, '10.671')] [2025-01-05 12:55:39,121][09057] Updated weights for policy 0, policy_version 265158 (0.0017) [2025-01-05 12:55:41,217][09057] Updated weights for policy 0, policy_version 265168 (0.0015) [2025-01-05 12:55:42,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1086156800. Throughput: 0: 4955.2. Samples: 21523150. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:42,842][08963] Avg episode reward: [(0, '10.114')] [2025-01-05 12:55:43,350][09057] Updated weights for policy 0, policy_version 265178 (0.0016) [2025-01-05 12:55:45,353][09057] Updated weights for policy 0, policy_version 265188 (0.0016) [2025-01-05 12:55:47,413][09057] Updated weights for policy 0, policy_version 265198 (0.0015) [2025-01-05 12:55:47,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1086259200. Throughput: 0: 4953.9. Samples: 21552828. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:47,842][08963] Avg episode reward: [(0, '8.673')] [2025-01-05 12:55:47,849][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000265200_1086259200.pth... 
[2025-01-05 12:55:47,899][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000264041_1081511936.pth [2025-01-05 12:55:49,575][09057] Updated weights for policy 0, policy_version 265208 (0.0016) [2025-01-05 12:55:51,573][09057] Updated weights for policy 0, policy_version 265218 (0.0016) [2025-01-05 12:55:52,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1086357504. Throughput: 0: 4952.4. Samples: 21582506. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:52,842][08963] Avg episode reward: [(0, '9.538')] [2025-01-05 12:55:53,616][09057] Updated weights for policy 0, policy_version 265228 (0.0016) [2025-01-05 12:55:55,687][09057] Updated weights for policy 0, policy_version 265238 (0.0015) [2025-01-05 12:55:57,804][09057] Updated weights for policy 0, policy_version 265248 (0.0016) [2025-01-05 12:55:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1086455808. Throughput: 0: 4951.3. Samples: 21597432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:55:57,842][08963] Avg episode reward: [(0, '10.121')] [2025-01-05 12:55:59,920][09057] Updated weights for policy 0, policy_version 265258 (0.0016) [2025-01-05 12:56:01,978][09057] Updated weights for policy 0, policy_version 265268 (0.0015) [2025-01-05 12:56:02,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1086554112. Throughput: 0: 4938.5. Samples: 21626724. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:02,842][08963] Avg episode reward: [(0, '9.090')] [2025-01-05 12:56:04,084][09057] Updated weights for policy 0, policy_version 265278 (0.0017) [2025-01-05 12:56:06,110][09057] Updated weights for policy 0, policy_version 265288 (0.0015) [2025-01-05 12:56:07,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1086652416. 
Throughput: 0: 4931.9. Samples: 21656192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:07,842][08963] Avg episode reward: [(0, '10.703')] [2025-01-05 12:56:08,243][09057] Updated weights for policy 0, policy_version 265298 (0.0016) [2025-01-05 12:56:10,251][09057] Updated weights for policy 0, policy_version 265308 (0.0014) [2025-01-05 12:56:12,254][09057] Updated weights for policy 0, policy_version 265318 (0.0015) [2025-01-05 12:56:12,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19771.9). Total num frames: 1086750720. Throughput: 0: 4943.4. Samples: 21671306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:12,842][08963] Avg episode reward: [(0, '8.859')] [2025-01-05 12:56:14,305][09057] Updated weights for policy 0, policy_version 265328 (0.0014) [2025-01-05 12:56:16,329][09057] Updated weights for policy 0, policy_version 265338 (0.0018) [2025-01-05 12:56:17,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19865.6, 300 sec: 19785.8). Total num frames: 1086853120. Throughput: 0: 4948.2. Samples: 21701612. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:17,842][08963] Avg episode reward: [(0, '8.476')] [2025-01-05 12:56:18,351][09057] Updated weights for policy 0, policy_version 265348 (0.0015) [2025-01-05 12:56:20,414][09057] Updated weights for policy 0, policy_version 265358 (0.0015) [2025-01-05 12:56:22,424][09057] Updated weights for policy 0, policy_version 265368 (0.0015) [2025-01-05 12:56:22,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.4, 300 sec: 19785.8). Total num frames: 1086951424. Throughput: 0: 4967.0. Samples: 21731858. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:22,842][08963] Avg episode reward: [(0, '9.765')] [2025-01-05 12:56:24,444][09057] Updated weights for policy 0, policy_version 265378 (0.0015) [2025-01-05 12:56:26,520][09057] Updated weights for policy 0, policy_version 265388 (0.0016) [2025-01-05 12:56:27,843][08963] Fps is (10 sec: 20068.9, 60 sec: 19865.3, 300 sec: 19799.6). Total num frames: 1087053824. Throughput: 0: 4974.9. Samples: 21747026. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:27,844][08963] Avg episode reward: [(0, '9.601')] [2025-01-05 12:56:28,598][09057] Updated weights for policy 0, policy_version 265398 (0.0017) [2025-01-05 12:56:30,638][09057] Updated weights for policy 0, policy_version 265408 (0.0017) [2025-01-05 12:56:32,707][09057] Updated weights for policy 0, policy_version 265418 (0.0015) [2025-01-05 12:56:32,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1087152128. Throughput: 0: 4976.2. Samples: 21776758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:32,842][08963] Avg episode reward: [(0, '10.465')] [2025-01-05 12:56:34,768][09057] Updated weights for policy 0, policy_version 265428 (0.0016) [2025-01-05 12:56:36,805][09057] Updated weights for policy 0, policy_version 265438 (0.0016) [2025-01-05 12:56:37,842][08963] Fps is (10 sec: 19662.1, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1087250432. Throughput: 0: 4975.8. Samples: 21806418. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:37,842][08963] Avg episode reward: [(0, '8.994')] [2025-01-05 12:56:38,945][09057] Updated weights for policy 0, policy_version 265448 (0.0016) [2025-01-05 12:56:40,934][09057] Updated weights for policy 0, policy_version 265458 (0.0016) [2025-01-05 12:56:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19799.7). Total num frames: 1087352832. Throughput: 0: 4979.0. Samples: 21821486. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:42,842][08963] Avg episode reward: [(0, '9.265')] [2025-01-05 12:56:42,962][09057] Updated weights for policy 0, policy_version 265468 (0.0015) [2025-01-05 12:56:45,010][09057] Updated weights for policy 0, policy_version 265478 (0.0016) [2025-01-05 12:56:46,984][09057] Updated weights for policy 0, policy_version 265488 (0.0015) [2025-01-05 12:56:47,842][08963] Fps is (10 sec: 20480.1, 60 sec: 19933.9, 300 sec: 19827.4). Total num frames: 1087455232. Throughput: 0: 5002.2. Samples: 21851822. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:47,842][08963] Avg episode reward: [(0, '9.196')] [2025-01-05 12:56:49,032][09057] Updated weights for policy 0, policy_version 265498 (0.0016) [2025-01-05 12:56:51,088][09057] Updated weights for policy 0, policy_version 265508 (0.0015) [2025-01-05 12:56:52,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 19841.3). Total num frames: 1087553536. Throughput: 0: 5014.3. Samples: 21881836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:52,842][08963] Avg episode reward: [(0, '11.013')] [2025-01-05 12:56:53,153][09057] Updated weights for policy 0, policy_version 265518 (0.0016) [2025-01-05 12:56:55,210][09057] Updated weights for policy 0, policy_version 265528 (0.0015) [2025-01-05 12:56:57,267][09057] Updated weights for policy 0, policy_version 265538 (0.0017) [2025-01-05 12:56:57,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19933.9, 300 sec: 19841.3). Total num frames: 1087651840. Throughput: 0: 5011.7. Samples: 21896834. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:56:57,842][08963] Avg episode reward: [(0, '9.302')] [2025-01-05 12:56:59,326][09057] Updated weights for policy 0, policy_version 265548 (0.0016) [2025-01-05 12:57:01,376][09057] Updated weights for policy 0, policy_version 265558 (0.0017) [2025-01-05 12:57:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19841.3). 
Total num frames: 1087750144. Throughput: 0: 4999.8. Samples: 21926602. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:57:02,842][08963] Avg episode reward: [(0, '9.376')] [2025-01-05 12:57:03,571][09057] Updated weights for policy 0, policy_version 265568 (0.0017) [2025-01-05 12:57:05,557][09057] Updated weights for policy 0, policy_version 265578 (0.0016) [2025-01-05 12:57:07,633][09057] Updated weights for policy 0, policy_version 265588 (0.0016) [2025-01-05 12:57:07,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19933.9, 300 sec: 19827.4). Total num frames: 1087848448. Throughput: 0: 4981.6. Samples: 21956028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:57:07,842][08963] Avg episode reward: [(0, '10.192')] [2025-01-05 12:57:09,817][09057] Updated weights for policy 0, policy_version 265598 (0.0018) [2025-01-05 12:57:11,793][09057] Updated weights for policy 0, policy_version 265608 (0.0015) [2025-01-05 12:57:12,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19933.9, 300 sec: 19827.4). Total num frames: 1087946752. Throughput: 0: 4969.7. Samples: 21970658. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 12:57:12,842][08963] Avg episode reward: [(0, '10.189')] [2025-01-05 12:57:13,872][09057] Updated weights for policy 0, policy_version 265618 (0.0015) [2025-01-05 12:57:15,950][09057] Updated weights for policy 0, policy_version 265628 (0.0016) [2025-01-05 12:57:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19827.4). Total num frames: 1088045056. Throughput: 0: 4970.5. Samples: 22000430. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:17,842][08963] Avg episode reward: [(0, '10.287')] [2025-01-05 12:57:18,077][09057] Updated weights for policy 0, policy_version 265638 (0.0017) [2025-01-05 12:57:20,116][09057] Updated weights for policy 0, policy_version 265648 (0.0016) [2025-01-05 12:57:22,217][09057] Updated weights for policy 0, policy_version 265658 (0.0016) [2025-01-05 12:57:22,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1088143360. Throughput: 0: 4966.1. Samples: 22029892. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:22,842][08963] Avg episode reward: [(0, '9.619')] [2025-01-05 12:57:24,295][09057] Updated weights for policy 0, policy_version 265668 (0.0016) [2025-01-05 12:57:26,345][09057] Updated weights for policy 0, policy_version 265678 (0.0016) [2025-01-05 12:57:27,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.6, 300 sec: 19813.5). Total num frames: 1088241664. Throughput: 0: 4959.5. Samples: 22044662. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:27,842][08963] Avg episode reward: [(0, '10.015')] [2025-01-05 12:57:28,542][09057] Updated weights for policy 0, policy_version 265688 (0.0016) [2025-01-05 12:57:30,548][09057] Updated weights for policy 0, policy_version 265698 (0.0017) [2025-01-05 12:57:32,602][09057] Updated weights for policy 0, policy_version 265708 (0.0015) [2025-01-05 12:57:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1088339968. Throughput: 0: 4939.8. Samples: 22074114. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:32,842][08963] Avg episode reward: [(0, '8.576')] [2025-01-05 12:57:34,795][09057] Updated weights for policy 0, policy_version 265718 (0.0017) [2025-01-05 12:57:36,806][09057] Updated weights for policy 0, policy_version 265728 (0.0015) [2025-01-05 12:57:37,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19799.6). 
Total num frames: 1088438272. Throughput: 0: 4926.2. Samples: 22103514. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:37,842][08963] Avg episode reward: [(0, '8.632')] [2025-01-05 12:57:38,849][09057] Updated weights for policy 0, policy_version 265738 (0.0014) [2025-01-05 12:57:40,924][09057] Updated weights for policy 0, policy_version 265748 (0.0015) [2025-01-05 12:57:42,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19799.6). Total num frames: 1088536576. Throughput: 0: 4927.1. Samples: 22118552. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:42,843][08963] Avg episode reward: [(0, '9.162')] [2025-01-05 12:57:43,062][09057] Updated weights for policy 0, policy_version 265758 (0.0017) [2025-01-05 12:57:45,097][09057] Updated weights for policy 0, policy_version 265768 (0.0015) [2025-01-05 12:57:47,159][09057] Updated weights for policy 0, policy_version 265778 (0.0017) [2025-01-05 12:57:47,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.0, 300 sec: 19813.5). Total num frames: 1088638976. Throughput: 0: 4921.9. Samples: 22148088. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:47,843][08963] Avg episode reward: [(0, '9.592')] [2025-01-05 12:57:47,851][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000265781_1088638976.pth... [2025-01-05 12:57:47,908][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000264621_1083887616.pth [2025-01-05 12:57:49,344][09057] Updated weights for policy 0, policy_version 265788 (0.0017) [2025-01-05 12:57:51,361][09057] Updated weights for policy 0, policy_version 265798 (0.0016) [2025-01-05 12:57:52,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19813.5). Total num frames: 1088737280. Throughput: 0: 4918.9. Samples: 22177380. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:52,842][08963] Avg episode reward: [(0, '9.992')] [2025-01-05 12:57:53,450][09057] Updated weights for policy 0, policy_version 265808 (0.0015) [2025-01-05 12:57:55,486][09057] Updated weights for policy 0, policy_version 265818 (0.0015) [2025-01-05 12:57:57,506][09057] Updated weights for policy 0, policy_version 265828 (0.0017) [2025-01-05 12:57:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19799.6). Total num frames: 1088835584. Throughput: 0: 4927.4. Samples: 22192390. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:57:57,842][08963] Avg episode reward: [(0, '10.255')] [2025-01-05 12:57:59,611][09057] Updated weights for policy 0, policy_version 265838 (0.0016) [2025-01-05 12:58:01,660][09057] Updated weights for policy 0, policy_version 265848 (0.0017) [2025-01-05 12:58:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19799.7). Total num frames: 1088933888. Throughput: 0: 4927.1. Samples: 22222150. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:58:02,842][08963] Avg episode reward: [(0, '10.495')] [2025-01-05 12:58:03,828][09057] Updated weights for policy 0, policy_version 265858 (0.0017) [2025-01-05 12:58:05,903][09057] Updated weights for policy 0, policy_version 265868 (0.0016) [2025-01-05 12:58:07,842][08963] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089028096. Throughput: 0: 4912.1. Samples: 22250938. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:58:07,842][08963] Avg episode reward: [(0, '10.462')] [2025-01-05 12:58:08,073][09057] Updated weights for policy 0, policy_version 265878 (0.0017) [2025-01-05 12:58:10,122][09057] Updated weights for policy 0, policy_version 265888 (0.0016) [2025-01-05 12:58:12,172][09057] Updated weights for policy 0, policy_version 265898 (0.0017) [2025-01-05 12:58:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19799.6). 
Total num frames: 1089130496. Throughput: 0: 4915.7. Samples: 22265868. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:58:12,842][08963] Avg episode reward: [(0, '8.739')] [2025-01-05 12:58:14,319][09057] Updated weights for policy 0, policy_version 265908 (0.0017) [2025-01-05 12:58:16,369][09057] Updated weights for policy 0, policy_version 265918 (0.0016) [2025-01-05 12:58:17,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089224704. Throughput: 0: 4915.6. Samples: 22295316. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:58:17,842][08963] Avg episode reward: [(0, '9.929')] [2025-01-05 12:58:18,523][09057] Updated weights for policy 0, policy_version 265928 (0.0017) [2025-01-05 12:58:20,551][09057] Updated weights for policy 0, policy_version 265938 (0.0016) [2025-01-05 12:58:22,630][09057] Updated weights for policy 0, policy_version 265948 (0.0016) [2025-01-05 12:58:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1089327104. Throughput: 0: 4917.5. Samples: 22324802. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 12:58:22,842][08963] Avg episode reward: [(0, '9.865')] [2025-01-05 12:58:24,770][09057] Updated weights for policy 0, policy_version 265958 (0.0017) [2025-01-05 12:58:26,810][09057] Updated weights for policy 0, policy_version 265968 (0.0017) [2025-01-05 12:58:27,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089421312. Throughput: 0: 4907.7. Samples: 22339400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:27,842][08963] Avg episode reward: [(0, '10.313')] [2025-01-05 12:58:28,972][09057] Updated weights for policy 0, policy_version 265978 (0.0017) [2025-01-05 12:58:31,009][09057] Updated weights for policy 0, policy_version 265988 (0.0016) [2025-01-05 12:58:32,842][08963] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089519616. 
Throughput: 0: 4902.4. Samples: 22368696. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:32,842][08963] Avg episode reward: [(0, '9.808')] [2025-01-05 12:58:33,101][09057] Updated weights for policy 0, policy_version 265998 (0.0017) [2025-01-05 12:58:35,186][09057] Updated weights for policy 0, policy_version 266008 (0.0016) [2025-01-05 12:58:37,224][09057] Updated weights for policy 0, policy_version 266018 (0.0016) [2025-01-05 12:58:37,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089617920. Throughput: 0: 4911.7. Samples: 22398404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:37,842][08963] Avg episode reward: [(0, '10.000')] [2025-01-05 12:58:39,324][09057] Updated weights for policy 0, policy_version 266028 (0.0017) [2025-01-05 12:58:41,392][09057] Updated weights for policy 0, policy_version 266038 (0.0016) [2025-01-05 12:58:42,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1089716224. Throughput: 0: 4906.4. Samples: 22413180. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:42,842][08963] Avg episode reward: [(0, '11.032')] [2025-01-05 12:58:43,519][09057] Updated weights for policy 0, policy_version 266048 (0.0017) [2025-01-05 12:58:45,523][09057] Updated weights for policy 0, policy_version 266058 (0.0016) [2025-01-05 12:58:47,584][09057] Updated weights for policy 0, policy_version 266068 (0.0016) [2025-01-05 12:58:47,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089818624. Throughput: 0: 4903.1. Samples: 22442792. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:47,842][08963] Avg episode reward: [(0, '10.493')] [2025-01-05 12:58:49,745][09057] Updated weights for policy 0, policy_version 266078 (0.0016) [2025-01-05 12:58:51,752][09057] Updated weights for policy 0, policy_version 266088 (0.0016) [2025-01-05 12:58:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1089916928. Throughput: 0: 4919.5. Samples: 22472314. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:52,842][08963] Avg episode reward: [(0, '10.374')] [2025-01-05 12:58:53,829][09057] Updated weights for policy 0, policy_version 266098 (0.0016) [2025-01-05 12:58:55,879][09057] Updated weights for policy 0, policy_version 266108 (0.0016) [2025-01-05 12:58:57,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1090015232. Throughput: 0: 4923.3. Samples: 22487414. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:58:57,842][08963] Avg episode reward: [(0, '10.230')] [2025-01-05 12:58:57,963][09057] Updated weights for policy 0, policy_version 266118 (0.0017) [2025-01-05 12:59:00,027][09057] Updated weights for policy 0, policy_version 266128 (0.0016) [2025-01-05 12:59:02,078][09057] Updated weights for policy 0, policy_version 266138 (0.0016) [2025-01-05 12:59:02,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19771.9). Total num frames: 1090113536. Throughput: 0: 4927.4. Samples: 22517046. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:02,842][08963] Avg episode reward: [(0, '10.129')] [2025-01-05 12:59:04,177][09057] Updated weights for policy 0, policy_version 266148 (0.0017) [2025-01-05 12:59:06,178][09057] Updated weights for policy 0, policy_version 266158 (0.0015) [2025-01-05 12:59:07,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1090215936. Throughput: 0: 4939.9. Samples: 22547096. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:07,842][08963] Avg episode reward: [(0, '9.535')] [2025-01-05 12:59:08,211][09057] Updated weights for policy 0, policy_version 266168 (0.0015) [2025-01-05 12:59:10,227][09057] Updated weights for policy 0, policy_version 266178 (0.0016) [2025-01-05 12:59:12,268][09057] Updated weights for policy 0, policy_version 266188 (0.0016) [2025-01-05 12:59:12,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19785.8). Total num frames: 1090314240. Throughput: 0: 4951.5. Samples: 22562216. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:12,842][08963] Avg episode reward: [(0, '10.156')] [2025-01-05 12:59:14,347][09057] Updated weights for policy 0, policy_version 266198 (0.0016) [2025-01-05 12:59:16,349][09057] Updated weights for policy 0, policy_version 266208 (0.0015) [2025-01-05 12:59:17,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.7, 300 sec: 19799.7). Total num frames: 1090416640. Throughput: 0: 4969.8. Samples: 22592338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:17,842][08963] Avg episode reward: [(0, '10.281')] [2025-01-05 12:59:18,397][09057] Updated weights for policy 0, policy_version 266218 (0.0014) [2025-01-05 12:59:20,441][09057] Updated weights for policy 0, policy_version 266228 (0.0016) [2025-01-05 12:59:22,426][09057] Updated weights for policy 0, policy_version 266238 (0.0015) [2025-01-05 12:59:22,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1090514944. Throughput: 0: 4981.1. Samples: 22622552. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:22,842][08963] Avg episode reward: [(0, '9.412')] [2025-01-05 12:59:24,457][09057] Updated weights for policy 0, policy_version 266248 (0.0015) [2025-01-05 12:59:26,531][09057] Updated weights for policy 0, policy_version 266258 (0.0015) [2025-01-05 12:59:27,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19933.8, 300 sec: 19813.5). 
Total num frames: 1090617344. Throughput: 0: 4989.5. Samples: 22637706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:27,842][08963] Avg episode reward: [(0, '9.789')] [2025-01-05 12:59:28,622][09057] Updated weights for policy 0, policy_version 266268 (0.0017) [2025-01-05 12:59:30,677][09057] Updated weights for policy 0, policy_version 266278 (0.0016) [2025-01-05 12:59:32,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 19813.5). Total num frames: 1090715648. Throughput: 0: 4983.6. Samples: 22667052. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 12:59:32,842][09057] Updated weights for policy 0, policy_version 266288 (0.0017) [2025-01-05 12:59:32,842][08963] Avg episode reward: [(0, '9.730')] [2025-01-05 12:59:34,890][09057] Updated weights for policy 0, policy_version 266298 (0.0016) [2025-01-05 12:59:36,944][09057] Updated weights for policy 0, policy_version 266308 (0.0016) [2025-01-05 12:59:37,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.8, 300 sec: 19799.6). Total num frames: 1090813952. Throughput: 0: 4983.3. Samples: 22696564. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:59:37,842][08963] Avg episode reward: [(0, '10.590')] [2025-01-05 12:59:39,097][09057] Updated weights for policy 0, policy_version 266318 (0.0017) [2025-01-05 12:59:41,081][09057] Updated weights for policy 0, policy_version 266328 (0.0015) [2025-01-05 12:59:42,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19799.6). Total num frames: 1090912256. Throughput: 0: 4976.8. Samples: 22711370. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:59:42,842][08963] Avg episode reward: [(0, '11.046')] [2025-01-05 12:59:43,148][09057] Updated weights for policy 0, policy_version 266338 (0.0016) [2025-01-05 12:59:45,196][09057] Updated weights for policy 0, policy_version 266348 (0.0015) [2025-01-05 12:59:47,205][09057] Updated weights for policy 0, policy_version 266358 (0.0016) [2025-01-05 12:59:47,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19813.5). Total num frames: 1091014656. Throughput: 0: 4989.2. Samples: 22741562. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:59:47,842][08963] Avg episode reward: [(0, '9.961')] [2025-01-05 12:59:47,848][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000266361_1091014656.pth... [2025-01-05 12:59:47,900][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000265200_1086259200.pth [2025-01-05 12:59:49,300][09057] Updated weights for policy 0, policy_version 266368 (0.0016) [2025-01-05 12:59:51,353][09057] Updated weights for policy 0, policy_version 266378 (0.0015) [2025-01-05 12:59:52,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19813.5). Total num frames: 1091112960. Throughput: 0: 4980.5. Samples: 22771218. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:59:52,842][08963] Avg episode reward: [(0, '9.952')] [2025-01-05 12:59:53,434][09057] Updated weights for policy 0, policy_version 266388 (0.0017) [2025-01-05 12:59:55,475][09057] Updated weights for policy 0, policy_version 266398 (0.0017) [2025-01-05 12:59:57,517][09057] Updated weights for policy 0, policy_version 266408 (0.0015) [2025-01-05 12:59:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19933.8, 300 sec: 19813.5). Total num frames: 1091211264. Throughput: 0: 4977.5. Samples: 22786202. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 12:59:57,842][08963] Avg episode reward: [(0, '9.219')] [2025-01-05 12:59:59,601][09057] Updated weights for policy 0, policy_version 266418 (0.0017) [2025-01-05 13:00:01,631][09057] Updated weights for policy 0, policy_version 266428 (0.0016) [2025-01-05 13:00:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19933.8, 300 sec: 19813.5). Total num frames: 1091309568. Throughput: 0: 4973.5. Samples: 22816146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:02,842][08963] Avg episode reward: [(0, '10.756')] [2025-01-05 13:00:03,747][09057] Updated weights for policy 0, policy_version 266438 (0.0016) [2025-01-05 13:00:05,770][09057] Updated weights for policy 0, policy_version 266448 (0.0016) [2025-01-05 13:00:07,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1091407872. Throughput: 0: 4958.9. Samples: 22845704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:07,842][08963] Avg episode reward: [(0, '10.103')] [2025-01-05 13:00:07,857][09057] Updated weights for policy 0, policy_version 266458 (0.0016) [2025-01-05 13:00:10,005][09057] Updated weights for policy 0, policy_version 266468 (0.0016) [2025-01-05 13:00:12,045][09057] Updated weights for policy 0, policy_version 266478 (0.0016) [2025-01-05 13:00:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1091506176. Throughput: 0: 4951.5. Samples: 22860524. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:12,842][08963] Avg episode reward: [(0, '9.608')] [2025-01-05 13:00:14,155][09057] Updated weights for policy 0, policy_version 266488 (0.0017) [2025-01-05 13:00:16,228][09057] Updated weights for policy 0, policy_version 266498 (0.0016) [2025-01-05 13:00:17,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19799.7). Total num frames: 1091604480. Throughput: 0: 4953.9. Samples: 22889978. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:17,842][08963] Avg episode reward: [(0, '9.918')] [2025-01-05 13:00:18,312][09057] Updated weights for policy 0, policy_version 266508 (0.0017) [2025-01-05 13:00:20,349][09057] Updated weights for policy 0, policy_version 266518 (0.0016) [2025-01-05 13:00:22,441][09057] Updated weights for policy 0, policy_version 266528 (0.0016) [2025-01-05 13:00:22,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1091706880. Throughput: 0: 4957.6. Samples: 22919656. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:22,842][08963] Avg episode reward: [(0, '9.674')] [2025-01-05 13:00:24,509][09057] Updated weights for policy 0, policy_version 266538 (0.0017) [2025-01-05 13:00:26,537][09057] Updated weights for policy 0, policy_version 266548 (0.0016) [2025-01-05 13:00:27,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1091805184. Throughput: 0: 4957.3. Samples: 22934448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:27,842][08963] Avg episode reward: [(0, '8.986')] [2025-01-05 13:00:28,705][09057] Updated weights for policy 0, policy_version 266558 (0.0017) [2025-01-05 13:00:30,690][09057] Updated weights for policy 0, policy_version 266568 (0.0016) [2025-01-05 13:00:32,722][09057] Updated weights for policy 0, policy_version 266578 (0.0016) [2025-01-05 13:00:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1091903488. Throughput: 0: 4950.4. Samples: 22964330. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:32,842][08963] Avg episode reward: [(0, '9.262')] [2025-01-05 13:00:34,877][09057] Updated weights for policy 0, policy_version 266588 (0.0016) [2025-01-05 13:00:36,878][09057] Updated weights for policy 0, policy_version 266598 (0.0016) [2025-01-05 13:00:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19813.5). 
Total num frames: 1092001792. Throughput: 0: 4949.4. Samples: 22993942. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:37,843][08963] Avg episode reward: [(0, '9.593')] [2025-01-05 13:00:38,902][09057] Updated weights for policy 0, policy_version 266608 (0.0016) [2025-01-05 13:00:40,997][09057] Updated weights for policy 0, policy_version 266618 (0.0016) [2025-01-05 13:00:42,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1092100096. Throughput: 0: 4953.9. Samples: 23009126. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 13:00:42,843][08963] Avg episode reward: [(0, '10.034')] [2025-01-05 13:00:43,059][09057] Updated weights for policy 0, policy_version 266628 (0.0017) [2025-01-05 13:00:45,083][09057] Updated weights for policy 0, policy_version 266638 (0.0016) [2025-01-05 13:00:47,170][09057] Updated weights for policy 0, policy_version 266648 (0.0016) [2025-01-05 13:00:47,842][08963] Fps is (10 sec: 20070.8, 60 sec: 19797.4, 300 sec: 19813.5). Total num frames: 1092202496. Throughput: 0: 4951.6. Samples: 23038966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:00:47,842][08963] Avg episode reward: [(0, '9.388')] [2025-01-05 13:00:49,220][09057] Updated weights for policy 0, policy_version 266658 (0.0016) [2025-01-05 13:00:51,250][09057] Updated weights for policy 0, policy_version 266668 (0.0016) [2025-01-05 13:00:52,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1092300800. Throughput: 0: 4954.2. Samples: 23068644. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:00:52,843][08963] Avg episode reward: [(0, '10.660')] [2025-01-05 13:00:53,371][09057] Updated weights for policy 0, policy_version 266678 (0.0016) [2025-01-05 13:00:55,370][09057] Updated weights for policy 0, policy_version 266688 (0.0016) [2025-01-05 13:00:57,383][09057] Updated weights for policy 0, policy_version 266698 (0.0016) [2025-01-05 13:00:57,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19827.4). Total num frames: 1092403200. Throughput: 0: 4960.5. Samples: 23083746. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:00:57,842][08963] Avg episode reward: [(0, '10.166')] [2025-01-05 13:00:59,461][09057] Updated weights for policy 0, policy_version 266708 (0.0016) [2025-01-05 13:01:01,456][09057] Updated weights for policy 0, policy_version 266718 (0.0016) [2025-01-05 13:01:02,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19865.6, 300 sec: 19827.4). Total num frames: 1092501504. Throughput: 0: 4977.2. Samples: 23113954. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:02,842][08963] Avg episode reward: [(0, '9.086')] [2025-01-05 13:01:03,485][09057] Updated weights for policy 0, policy_version 266728 (0.0016) [2025-01-05 13:01:05,553][09057] Updated weights for policy 0, policy_version 266738 (0.0016) [2025-01-05 13:01:07,543][09057] Updated weights for policy 0, policy_version 266748 (0.0016) [2025-01-05 13:01:07,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.8, 300 sec: 19841.3). Total num frames: 1092603904. Throughput: 0: 4992.3. Samples: 23144308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:07,842][08963] Avg episode reward: [(0, '10.559')] [2025-01-05 13:01:09,575][09057] Updated weights for policy 0, policy_version 266758 (0.0015) [2025-01-05 13:01:11,636][09057] Updated weights for policy 0, policy_version 266768 (0.0016) [2025-01-05 13:01:12,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19933.9, 300 sec: 19827.4). 
Total num frames: 1092702208. Throughput: 0: 5000.9. Samples: 23159488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:12,843][08963] Avg episode reward: [(0, '9.725')] [2025-01-05 13:01:13,718][09057] Updated weights for policy 0, policy_version 266778 (0.0018) [2025-01-05 13:01:15,735][09057] Updated weights for policy 0, policy_version 266788 (0.0015) [2025-01-05 13:01:17,817][09057] Updated weights for policy 0, policy_version 266798 (0.0017) [2025-01-05 13:01:17,842][08963] Fps is (10 sec: 20070.0, 60 sec: 20002.0, 300 sec: 19841.3). Total num frames: 1092804608. Throughput: 0: 5000.4. Samples: 23189348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:17,843][08963] Avg episode reward: [(0, '9.516')] [2025-01-05 13:01:19,872][09057] Updated weights for policy 0, policy_version 266808 (0.0016) [2025-01-05 13:01:21,871][09057] Updated weights for policy 0, policy_version 266818 (0.0016) [2025-01-05 13:01:22,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19827.5). Total num frames: 1092902912. Throughput: 0: 5009.7. Samples: 23219380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:22,842][08963] Avg episode reward: [(0, '9.827')] [2025-01-05 13:01:23,948][09057] Updated weights for policy 0, policy_version 266828 (0.0016) [2025-01-05 13:01:25,903][09057] Updated weights for policy 0, policy_version 266838 (0.0016) [2025-01-05 13:01:27,842][08963] Fps is (10 sec: 20070.7, 60 sec: 20002.1, 300 sec: 19841.3). Total num frames: 1093005312. Throughput: 0: 5011.3. Samples: 23234636. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:27,842][08963] Avg episode reward: [(0, '9.440')] [2025-01-05 13:01:27,943][09057] Updated weights for policy 0, policy_version 266848 (0.0016) [2025-01-05 13:01:30,023][09057] Updated weights for policy 0, policy_version 266858 (0.0017) [2025-01-05 13:01:32,025][09057] Updated weights for policy 0, policy_version 266868 (0.0015) [2025-01-05 13:01:32,842][08963] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 19855.2). Total num frames: 1093107712. Throughput: 0: 5017.2. Samples: 23264740. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:32,842][08963] Avg episode reward: [(0, '9.465')] [2025-01-05 13:01:34,048][09057] Updated weights for policy 0, policy_version 266878 (0.0016) [2025-01-05 13:01:36,122][09057] Updated weights for policy 0, policy_version 266888 (0.0016) [2025-01-05 13:01:37,842][08963] Fps is (10 sec: 20070.7, 60 sec: 20070.5, 300 sec: 19841.3). Total num frames: 1093206016. Throughput: 0: 5022.9. Samples: 23294674. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:37,842][08963] Avg episode reward: [(0, '10.237')] [2025-01-05 13:01:38,215][09057] Updated weights for policy 0, policy_version 266898 (0.0017) [2025-01-05 13:01:40,242][09057] Updated weights for policy 0, policy_version 266908 (0.0015) [2025-01-05 13:01:42,286][09057] Updated weights for policy 0, policy_version 266918 (0.0015) [2025-01-05 13:01:42,842][08963] Fps is (10 sec: 19660.9, 60 sec: 20070.5, 300 sec: 19827.4). Total num frames: 1093304320. Throughput: 0: 5020.2. Samples: 23309656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:42,842][08963] Avg episode reward: [(0, '10.780')] [2025-01-05 13:01:44,388][09057] Updated weights for policy 0, policy_version 266928 (0.0016) [2025-01-05 13:01:46,391][09057] Updated weights for policy 0, policy_version 266938 (0.0015) [2025-01-05 13:01:47,842][08963] Fps is (10 sec: 20070.1, 60 sec: 20070.3, 300 sec: 19841.3). 
Total num frames: 1093406720. Throughput: 0: 5015.9. Samples: 23339670. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:47,843][08963] Avg episode reward: [(0, '9.253')] [2025-01-05 13:01:47,850][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000266945_1093406720.pth... [2025-01-05 13:01:47,907][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000265781_1088638976.pth [2025-01-05 13:01:48,484][09057] Updated weights for policy 0, policy_version 266948 (0.0016) [2025-01-05 13:01:50,509][09057] Updated weights for policy 0, policy_version 266958 (0.0015) [2025-01-05 13:01:52,524][09057] Updated weights for policy 0, policy_version 266968 (0.0015) [2025-01-05 13:01:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19841.3). Total num frames: 1093505024. Throughput: 0: 5009.0. Samples: 23369714. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:52,842][08963] Avg episode reward: [(0, '8.839')] [2025-01-05 13:01:54,591][09057] Updated weights for policy 0, policy_version 266978 (0.0015) [2025-01-05 13:01:56,633][09057] Updated weights for policy 0, policy_version 266988 (0.0016) [2025-01-05 13:01:57,842][08963] Fps is (10 sec: 19660.8, 60 sec: 20002.1, 300 sec: 19841.3). Total num frames: 1093603328. Throughput: 0: 5006.5. Samples: 23384780. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:01:57,843][08963] Avg episode reward: [(0, '10.301')] [2025-01-05 13:01:58,748][09057] Updated weights for policy 0, policy_version 266998 (0.0017) [2025-01-05 13:02:00,835][09057] Updated weights for policy 0, policy_version 267008 (0.0015) [2025-01-05 13:02:02,842][08963] Fps is (10 sec: 19660.8, 60 sec: 20002.1, 300 sec: 19841.3). Total num frames: 1093701632. Throughput: 0: 4994.0. Samples: 23414076. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:02,842][08963] Avg episode reward: [(0, '10.689')] [2025-01-05 13:02:02,942][09057] Updated weights for policy 0, policy_version 267018 (0.0016) [2025-01-05 13:02:04,927][09057] Updated weights for policy 0, policy_version 267028 (0.0015) [2025-01-05 13:02:06,999][09057] Updated weights for policy 0, policy_version 267038 (0.0015) [2025-01-05 13:02:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19841.3). Total num frames: 1093799936. Throughput: 0: 4991.7. Samples: 23444006. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:07,842][08963] Avg episode reward: [(0, '10.601')] [2025-01-05 13:02:09,141][09057] Updated weights for policy 0, policy_version 267048 (0.0017) [2025-01-05 13:02:11,134][09057] Updated weights for policy 0, policy_version 267058 (0.0017) [2025-01-05 13:02:12,842][08963] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19855.2). Total num frames: 1093902336. Throughput: 0: 4980.5. Samples: 23458756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:12,842][08963] Avg episode reward: [(0, '9.777')] [2025-01-05 13:02:13,221][09057] Updated weights for policy 0, policy_version 267068 (0.0015) [2025-01-05 13:02:15,260][09057] Updated weights for policy 0, policy_version 267078 (0.0015) [2025-01-05 13:02:17,243][09057] Updated weights for policy 0, policy_version 267088 (0.0016) [2025-01-05 13:02:17,842][08963] Fps is (10 sec: 20480.1, 60 sec: 20002.2, 300 sec: 19869.1). Total num frames: 1094004736. Throughput: 0: 4980.4. Samples: 23488860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:17,842][08963] Avg episode reward: [(0, '9.312')] [2025-01-05 13:02:19,307][09057] Updated weights for policy 0, policy_version 267098 (0.0015) [2025-01-05 13:02:21,355][09057] Updated weights for policy 0, policy_version 267108 (0.0015) [2025-01-05 13:02:22,842][08963] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19869.1). 
Total num frames: 1094103040. Throughput: 0: 4980.3. Samples: 23518786. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:22,842][08963] Avg episode reward: [(0, '10.575')] [2025-01-05 13:02:23,430][09057] Updated weights for policy 0, policy_version 267118 (0.0016) [2025-01-05 13:02:25,521][09057] Updated weights for policy 0, policy_version 267128 (0.0015) [2025-01-05 13:02:27,550][09057] Updated weights for policy 0, policy_version 267138 (0.0015) [2025-01-05 13:02:27,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 19869.1). Total num frames: 1094201344. Throughput: 0: 4980.3. Samples: 23533770. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:27,842][08963] Avg episode reward: [(0, '10.295')] [2025-01-05 13:02:29,621][09057] Updated weights for policy 0, policy_version 267148 (0.0016) [2025-01-05 13:02:31,720][09057] Updated weights for policy 0, policy_version 267158 (0.0016) [2025-01-05 13:02:32,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1094299648. Throughput: 0: 4972.6. Samples: 23563438. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:32,842][08963] Avg episode reward: [(0, '8.912')] [2025-01-05 13:02:33,867][09057] Updated weights for policy 0, policy_version 267168 (0.0018) [2025-01-05 13:02:35,845][09057] Updated weights for policy 0, policy_version 267178 (0.0014) [2025-01-05 13:02:37,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1094397952. Throughput: 0: 4962.5. Samples: 23593028. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:37,842][08963] Avg episode reward: [(0, '9.998')] [2025-01-05 13:02:37,933][09057] Updated weights for policy 0, policy_version 267188 (0.0015) [2025-01-05 13:02:39,973][09057] Updated weights for policy 0, policy_version 267198 (0.0015) [2025-01-05 13:02:41,967][09057] Updated weights for policy 0, policy_version 267208 (0.0015) [2025-01-05 13:02:42,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19933.8, 300 sec: 19869.1). Total num frames: 1094500352. Throughput: 0: 4963.3. Samples: 23608130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:42,842][08963] Avg episode reward: [(0, '10.260')] [2025-01-05 13:02:44,058][09057] Updated weights for policy 0, policy_version 267218 (0.0015) [2025-01-05 13:02:46,064][09057] Updated weights for policy 0, policy_version 267228 (0.0015) [2025-01-05 13:02:47,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1094598656. Throughput: 0: 4983.4. Samples: 23638330. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:47,842][08963] Avg episode reward: [(0, '9.554')] [2025-01-05 13:02:48,058][09057] Updated weights for policy 0, policy_version 267238 (0.0016) [2025-01-05 13:02:50,161][09057] Updated weights for policy 0, policy_version 267248 (0.0016) [2025-01-05 13:02:52,171][09057] Updated weights for policy 0, policy_version 267258 (0.0016) [2025-01-05 13:02:52,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.8, 300 sec: 19883.0). Total num frames: 1094701056. Throughput: 0: 4990.9. Samples: 23668594. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:52,842][08963] Avg episode reward: [(0, '9.944')] [2025-01-05 13:02:54,167][09057] Updated weights for policy 0, policy_version 267268 (0.0016) [2025-01-05 13:02:56,253][09057] Updated weights for policy 0, policy_version 267278 (0.0019) [2025-01-05 13:02:57,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 19883.0). 
Total num frames: 1094799360. Throughput: 0: 4995.4. Samples: 23683550. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:02:57,842][08963] Avg episode reward: [(0, '9.655')] [2025-01-05 13:02:58,368][09057] Updated weights for policy 0, policy_version 267288 (0.0016) [2025-01-05 13:03:00,379][09057] Updated weights for policy 0, policy_version 267298 (0.0015) [2025-01-05 13:03:02,473][09057] Updated weights for policy 0, policy_version 267308 (0.0015) [2025-01-05 13:03:02,842][08963] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 19910.7). Total num frames: 1094901760. Throughput: 0: 4987.2. Samples: 23713286. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:03:02,842][08963] Avg episode reward: [(0, '9.545')] [2025-01-05 13:03:04,578][09057] Updated weights for policy 0, policy_version 267318 (0.0017) [2025-01-05 13:03:06,608][09057] Updated weights for policy 0, policy_version 267328 (0.0016) [2025-01-05 13:03:07,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19883.0). Total num frames: 1094995968. Throughput: 0: 4973.1. Samples: 23742574. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:03:07,842][08963] Avg episode reward: [(0, '11.074')] [2025-01-05 13:03:08,784][09057] Updated weights for policy 0, policy_version 267338 (0.0016) [2025-01-05 13:03:10,821][09057] Updated weights for policy 0, policy_version 267348 (0.0015) [2025-01-05 13:03:12,842][08963] Fps is (10 sec: 19251.4, 60 sec: 19865.6, 300 sec: 19896.9). Total num frames: 1095094272. Throughput: 0: 4970.7. Samples: 23757450. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:12,842][08963] Avg episode reward: [(0, '10.261')] [2025-01-05 13:03:12,912][09057] Updated weights for policy 0, policy_version 267358 (0.0017) [2025-01-05 13:03:15,067][09057] Updated weights for policy 0, policy_version 267368 (0.0016) [2025-01-05 13:03:17,065][09057] Updated weights for policy 0, policy_version 267378 (0.0015) [2025-01-05 13:03:17,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19883.0). Total num frames: 1095192576. Throughput: 0: 4964.1. Samples: 23786824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:17,842][08963] Avg episode reward: [(0, '9.662')] [2025-01-05 13:03:19,100][09057] Updated weights for policy 0, policy_version 267388 (0.0016) [2025-01-05 13:03:21,176][09057] Updated weights for policy 0, policy_version 267398 (0.0016) [2025-01-05 13:03:22,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19896.8). Total num frames: 1095290880. Throughput: 0: 4966.8. Samples: 23816534. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:22,842][08963] Avg episode reward: [(0, '9.283')] [2025-01-05 13:03:23,279][09057] Updated weights for policy 0, policy_version 267408 (0.0017) [2025-01-05 13:03:25,327][09057] Updated weights for policy 0, policy_version 267418 (0.0017) [2025-01-05 13:03:27,408][09057] Updated weights for policy 0, policy_version 267428 (0.0015) [2025-01-05 13:03:27,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19910.7). Total num frames: 1095393280. Throughput: 0: 4964.0. Samples: 23831510. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:27,842][08963] Avg episode reward: [(0, '9.766')] [2025-01-05 13:03:29,496][09057] Updated weights for policy 0, policy_version 267438 (0.0016) [2025-01-05 13:03:31,548][09057] Updated weights for policy 0, policy_version 267448 (0.0017) [2025-01-05 13:03:32,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19910.7). 
Total num frames: 1095491584. Throughput: 0: 4950.4. Samples: 23861098. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:32,842][08963] Avg episode reward: [(0, '9.140')] [2025-01-05 13:03:33,680][09057] Updated weights for policy 0, policy_version 267458 (0.0016) [2025-01-05 13:03:35,688][09057] Updated weights for policy 0, policy_version 267468 (0.0015) [2025-01-05 13:03:37,751][09057] Updated weights for policy 0, policy_version 267478 (0.0017) [2025-01-05 13:03:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19910.7). Total num frames: 1095589888. Throughput: 0: 4936.8. Samples: 23890752. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:37,842][08963] Avg episode reward: [(0, '10.043')] [2025-01-05 13:03:39,898][09057] Updated weights for policy 0, policy_version 267488 (0.0017) [2025-01-05 13:03:41,919][09057] Updated weights for policy 0, policy_version 267498 (0.0016) [2025-01-05 13:03:42,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19896.9). Total num frames: 1095688192. Throughput: 0: 4928.1. Samples: 23905316. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:42,842][08963] Avg episode reward: [(0, '9.606')] [2025-01-05 13:03:44,025][09057] Updated weights for policy 0, policy_version 267508 (0.0016) [2025-01-05 13:03:46,042][09057] Updated weights for policy 0, policy_version 267518 (0.0014) [2025-01-05 13:03:47,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19896.8). Total num frames: 1095786496. Throughput: 0: 4929.7. Samples: 23935124. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:47,843][08963] Avg episode reward: [(0, '9.137')] [2025-01-05 13:03:47,953][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000267527_1095790592.pth... 
[2025-01-05 13:03:48,005][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000266361_1091014656.pth [2025-01-05 13:03:48,167][09057] Updated weights for policy 0, policy_version 267528 (0.0016) [2025-01-05 13:03:50,240][09057] Updated weights for policy 0, policy_version 267538 (0.0015) [2025-01-05 13:03:52,276][09057] Updated weights for policy 0, policy_version 267548 (0.0016) [2025-01-05 13:03:52,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19896.8). Total num frames: 1095884800. Throughput: 0: 4937.0. Samples: 23964740. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:52,842][08963] Avg episode reward: [(0, '8.829')] [2025-01-05 13:03:54,385][09057] Updated weights for policy 0, policy_version 267558 (0.0016) [2025-01-05 13:03:56,427][09057] Updated weights for policy 0, policy_version 267568 (0.0015) [2025-01-05 13:03:57,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19729.0, 300 sec: 19896.8). Total num frames: 1095983104. Throughput: 0: 4937.4. Samples: 23979634. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:03:57,842][08963] Avg episode reward: [(0, '10.141')] [2025-01-05 13:03:58,546][09057] Updated weights for policy 0, policy_version 267578 (0.0015) [2025-01-05 13:04:00,536][09057] Updated weights for policy 0, policy_version 267588 (0.0015) [2025-01-05 13:04:02,606][09057] Updated weights for policy 0, policy_version 267598 (0.0015) [2025-01-05 13:04:02,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19729.1, 300 sec: 19896.8). Total num frames: 1096085504. Throughput: 0: 4946.1. Samples: 24009400. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:04:02,842][08963] Avg episode reward: [(0, '9.003')] [2025-01-05 13:04:04,736][09057] Updated weights for policy 0, policy_version 267608 (0.0016) [2025-01-05 13:04:06,718][09057] Updated weights for policy 0, policy_version 267618 (0.0015) [2025-01-05 13:04:07,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19797.3, 300 sec: 19896.8). Total num frames: 1096183808. Throughput: 0: 4952.1. Samples: 24039378. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:04:07,842][08963] Avg episode reward: [(0, '10.700')] [2025-01-05 13:04:08,775][09057] Updated weights for policy 0, policy_version 267628 (0.0015) [2025-01-05 13:04:10,811][09057] Updated weights for policy 0, policy_version 267638 (0.0016) [2025-01-05 13:04:12,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19883.0). Total num frames: 1096282112. Throughput: 0: 4955.4. Samples: 24054502. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:04:12,842][08963] Avg episode reward: [(0, '9.415')] [2025-01-05 13:04:12,886][09057] Updated weights for policy 0, policy_version 267648 (0.0017) [2025-01-05 13:04:14,979][09057] Updated weights for policy 0, policy_version 267658 (0.0016) [2025-01-05 13:04:17,070][09057] Updated weights for policy 0, policy_version 267668 (0.0016) [2025-01-05 13:04:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19883.0). Total num frames: 1096380416. Throughput: 0: 4951.6. Samples: 24083922. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:04:17,842][08963] Avg episode reward: [(0, '10.685')] [2025-01-05 13:04:19,130][09057] Updated weights for policy 0, policy_version 267678 (0.0016) [2025-01-05 13:04:21,166][09057] Updated weights for policy 0, policy_version 267688 (0.0015) [2025-01-05 13:04:22,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19869.1). Total num frames: 1096478720. Throughput: 0: 4947.0. Samples: 24113368. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:22,842][08963] Avg episode reward: [(0, '11.400')] [2025-01-05 13:04:23,310][09057] Updated weights for policy 0, policy_version 267698 (0.0019) [2025-01-05 13:04:25,296][09057] Updated weights for policy 0, policy_version 267708 (0.0015) [2025-01-05 13:04:27,337][09057] Updated weights for policy 0, policy_version 267718 (0.0015) [2025-01-05 13:04:27,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19883.0). Total num frames: 1096581120. Throughput: 0: 4959.9. Samples: 24128514. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:27,842][08963] Avg episode reward: [(0, '8.908')] [2025-01-05 13:04:29,499][09057] Updated weights for policy 0, policy_version 267728 (0.0016) [2025-01-05 13:04:31,480][09057] Updated weights for policy 0, policy_version 267738 (0.0016) [2025-01-05 13:04:32,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19797.4, 300 sec: 19883.0). Total num frames: 1096679424. Throughput: 0: 4958.3. Samples: 24158248. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:32,842][08963] Avg episode reward: [(0, '8.726')] [2025-01-05 13:04:33,529][09057] Updated weights for policy 0, policy_version 267748 (0.0016) [2025-01-05 13:04:35,589][09057] Updated weights for policy 0, policy_version 267758 (0.0015) [2025-01-05 13:04:37,560][09057] Updated weights for policy 0, policy_version 267768 (0.0016) [2025-01-05 13:04:37,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19896.8). Total num frames: 1096781824. Throughput: 0: 4974.6. Samples: 24188596. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:37,842][08963] Avg episode reward: [(0, '9.492')] [2025-01-05 13:04:39,609][09057] Updated weights for policy 0, policy_version 267778 (0.0016) [2025-01-05 13:04:41,660][09057] Updated weights for policy 0, policy_version 267788 (0.0015) [2025-01-05 13:04:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19883.0). 
Total num frames: 1096880128. Throughput: 0: 4980.5. Samples: 24203756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:42,842][08963] Avg episode reward: [(0, '10.538')] [2025-01-05 13:04:43,725][09057] Updated weights for policy 0, policy_version 267798 (0.0017) [2025-01-05 13:04:45,782][09057] Updated weights for policy 0, policy_version 267808 (0.0015) [2025-01-05 13:04:47,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19883.0). Total num frames: 1096978432. Throughput: 0: 4975.8. Samples: 24233312. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:47,843][08963] Avg episode reward: [(0, '10.451')] [2025-01-05 13:04:47,970][09057] Updated weights for policy 0, policy_version 267818 (0.0016) [2025-01-05 13:04:49,992][09057] Updated weights for policy 0, policy_version 267828 (0.0016) [2025-01-05 13:04:52,065][09057] Updated weights for policy 0, policy_version 267838 (0.0016) [2025-01-05 13:04:52,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19883.0). Total num frames: 1097076736. Throughput: 0: 4963.2. Samples: 24262722. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:52,843][08963] Avg episode reward: [(0, '9.948')] [2025-01-05 13:04:54,191][09057] Updated weights for policy 0, policy_version 267848 (0.0015) [2025-01-05 13:04:56,160][09057] Updated weights for policy 0, policy_version 267858 (0.0015) [2025-01-05 13:04:57,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19933.9, 300 sec: 19896.8). Total num frames: 1097179136. Throughput: 0: 4960.3. Samples: 24277716. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:04:57,842][08963] Avg episode reward: [(0, '9.890')] [2025-01-05 13:04:58,216][09057] Updated weights for policy 0, policy_version 267868 (0.0015) [2025-01-05 13:05:00,293][09057] Updated weights for policy 0, policy_version 267878 (0.0016) [2025-01-05 13:05:02,248][09057] Updated weights for policy 0, policy_version 267888 (0.0015) [2025-01-05 13:05:02,842][08963] Fps is (10 sec: 20070.5, 60 sec: 19865.6, 300 sec: 19896.8). Total num frames: 1097277440. Throughput: 0: 4977.7. Samples: 24307920. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:02,842][08963] Avg episode reward: [(0, '9.345')] [2025-01-05 13:05:04,324][09057] Updated weights for policy 0, policy_version 267898 (0.0015) [2025-01-05 13:05:06,392][09057] Updated weights for policy 0, policy_version 267908 (0.0015) [2025-01-05 13:05:07,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19933.9, 300 sec: 19910.7). Total num frames: 1097379840. Throughput: 0: 4987.7. Samples: 24337816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:07,842][08963] Avg episode reward: [(0, '9.422')] [2025-01-05 13:05:08,450][09057] Updated weights for policy 0, policy_version 267918 (0.0017) [2025-01-05 13:05:10,515][09057] Updated weights for policy 0, policy_version 267928 (0.0015) [2025-01-05 13:05:12,560][09057] Updated weights for policy 0, policy_version 267938 (0.0016) [2025-01-05 13:05:12,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19933.8, 300 sec: 19910.7). Total num frames: 1097478144. Throughput: 0: 4985.6. Samples: 24352866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:12,842][08963] Avg episode reward: [(0, '10.122')] [2025-01-05 13:05:14,593][09057] Updated weights for policy 0, policy_version 267948 (0.0015) [2025-01-05 13:05:16,655][09057] Updated weights for policy 0, policy_version 267958 (0.0016) [2025-01-05 13:05:17,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19896.8). 
Total num frames: 1097576448. Throughput: 0: 4988.7. Samples: 24382738. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:17,842][08963] Avg episode reward: [(0, '9.531')] [2025-01-05 13:05:18,781][09057] Updated weights for policy 0, policy_version 267968 (0.0015) [2025-01-05 13:05:20,771][09057] Updated weights for policy 0, policy_version 267978 (0.0015) [2025-01-05 13:05:22,828][09057] Updated weights for policy 0, policy_version 267988 (0.0016) [2025-01-05 13:05:22,842][08963] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 19910.7). Total num frames: 1097678848. Throughput: 0: 4979.4. Samples: 24412668. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:22,842][08963] Avg episode reward: [(0, '9.504')] [2025-01-05 13:05:24,954][09057] Updated weights for policy 0, policy_version 267998 (0.0016) [2025-01-05 13:05:26,949][09057] Updated weights for policy 0, policy_version 268008 (0.0015) [2025-01-05 13:05:27,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 19910.7). Total num frames: 1097777152. Throughput: 0: 4968.2. Samples: 24427324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:27,842][08963] Avg episode reward: [(0, '9.921')] [2025-01-05 13:05:29,001][09057] Updated weights for policy 0, policy_version 268018 (0.0016) [2025-01-05 13:05:31,077][09057] Updated weights for policy 0, policy_version 268028 (0.0015) [2025-01-05 13:05:32,842][08963] Fps is (10 sec: 19660.8, 60 sec: 19933.8, 300 sec: 19910.7). Total num frames: 1097875456. Throughput: 0: 4979.9. Samples: 24457406. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:05:32,842][08963] Avg episode reward: [(0, '9.691')] [2025-01-05 13:05:33,124][09057] Updated weights for policy 0, policy_version 268038 (0.0016) [2025-01-05 13:05:35,162][09057] Updated weights for policy 0, policy_version 268048 (0.0015) [2025-01-05 13:05:37,220][09057] Updated weights for policy 0, policy_version 268058 (0.0015) [2025-01-05 13:05:37,842][08963] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19910.7). Total num frames: 1097973760. Throughput: 0: 4994.9. Samples: 24487494. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:05:37,842][08963] Avg episode reward: [(0, '9.507')] [2025-01-05 13:05:39,277][09057] Updated weights for policy 0, policy_version 268068 (0.0015) [2025-01-05 13:05:41,325][09057] Updated weights for policy 0, policy_version 268078 (0.0016) [2025-01-05 13:05:42,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.8, 300 sec: 19910.7). Total num frames: 1098076160. Throughput: 0: 4989.5. Samples: 24502244. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:05:42,842][08963] Avg episode reward: [(0, '10.754')] [2025-01-05 13:05:43,467][09057] Updated weights for policy 0, policy_version 268088 (0.0017) [2025-01-05 13:05:45,440][09057] Updated weights for policy 0, policy_version 268098 (0.0015) [2025-01-05 13:05:47,506][09057] Updated weights for policy 0, policy_version 268108 (0.0015) [2025-01-05 13:05:47,842][08963] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19910.7). Total num frames: 1098174464. Throughput: 0: 4980.2. Samples: 24532030. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:05:47,842][08963] Avg episode reward: [(0, '10.048')] [2025-01-05 13:05:47,889][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268110_1098178560.pth... 
[2025-01-05 13:05:47,940][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000266945_1093406720.pth [2025-01-05 13:05:49,653][09057] Updated weights for policy 0, policy_version 268118 (0.0016) [2025-01-05 13:05:51,621][09057] Updated weights for policy 0, policy_version 268128 (0.0015) [2025-01-05 13:05:52,842][08963] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 19910.7). Total num frames: 1098276864. Throughput: 0: 4981.0. Samples: 24561960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:05:52,842][08963] Avg episode reward: [(0, '9.306')] [2025-01-05 13:05:53,681][09057] Updated weights for policy 0, policy_version 268138 (0.0016) [2025-01-05 13:05:55,729][09057] Updated weights for policy 0, policy_version 268148 (0.0015) [2025-01-05 13:05:57,763][09057] Updated weights for policy 0, policy_version 268158 (0.0016) [2025-01-05 13:05:57,842][08963] Fps is (10 sec: 20070.7, 60 sec: 19933.9, 300 sec: 19910.7). Total num frames: 1098375168. Throughput: 0: 4983.7. Samples: 24577130. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:05:57,842][08963] Avg episode reward: [(0, '9.212')] [2025-01-05 13:05:59,913][09057] Updated weights for policy 0, policy_version 268168 (0.0016) [2025-01-05 13:06:01,952][09057] Updated weights for policy 0, policy_version 268178 (0.0014) [2025-01-05 13:06:02,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19896.9). Total num frames: 1098473472. Throughput: 0: 4976.6. Samples: 24606684. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:02,842][08963] Avg episode reward: [(0, '9.729')] [2025-01-05 13:06:04,019][09057] Updated weights for policy 0, policy_version 268188 (0.0017) [2025-01-05 13:06:06,088][09057] Updated weights for policy 0, policy_version 268198 (0.0016) [2025-01-05 13:06:07,842][08963] Fps is (10 sec: 19660.4, 60 sec: 19865.6, 300 sec: 19896.8). Total num frames: 1098571776. 
Throughput: 0: 4966.1. Samples: 24636144. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:07,843][08963] Avg episode reward: [(0, '10.120')] [2025-01-05 13:06:08,216][09057] Updated weights for policy 0, policy_version 268208 (0.0016) [2025-01-05 13:06:10,189][09057] Updated weights for policy 0, policy_version 268218 (0.0015) [2025-01-05 13:06:12,247][09057] Updated weights for policy 0, policy_version 268228 (0.0015) [2025-01-05 13:06:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19865.7, 300 sec: 19883.0). Total num frames: 1098670080. Throughput: 0: 4975.7. Samples: 24651230. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:12,842][08963] Avg episode reward: [(0, '9.513')] [2025-01-05 13:06:14,401][09057] Updated weights for policy 0, policy_version 268238 (0.0016) [2025-01-05 13:06:16,407][09057] Updated weights for policy 0, policy_version 268248 (0.0017) [2025-01-05 13:06:17,842][08963] Fps is (10 sec: 19661.1, 60 sec: 19865.6, 300 sec: 19883.0). Total num frames: 1098768384. Throughput: 0: 4965.0. Samples: 24680830. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:17,842][08963] Avg episode reward: [(0, '9.634')] [2025-01-05 13:06:18,482][09057] Updated weights for policy 0, policy_version 268258 (0.0017) [2025-01-05 13:06:20,561][09057] Updated weights for policy 0, policy_version 268268 (0.0017) [2025-01-05 13:06:22,548][09057] Updated weights for policy 0, policy_version 268278 (0.0016) [2025-01-05 13:06:22,842][08963] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19883.0). Total num frames: 1098870784. Throughput: 0: 4964.6. Samples: 24710900. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:22,842][08963] Avg episode reward: [(0, '9.642')] [2025-01-05 13:06:24,627][09057] Updated weights for policy 0, policy_version 268288 (0.0016) [2025-01-05 13:06:26,701][09057] Updated weights for policy 0, policy_version 268298 (0.0016) [2025-01-05 13:06:27,842][08963] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1098969088. Throughput: 0: 4968.1. Samples: 24725810. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:27,842][08963] Avg episode reward: [(0, '10.735')] [2025-01-05 13:06:28,773][09057] Updated weights for policy 0, policy_version 268308 (0.0017) [2025-01-05 13:06:30,878][09057] Updated weights for policy 0, policy_version 268318 (0.0016) [2025-01-05 13:06:32,842][08963] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1099067392. Throughput: 0: 4960.1. Samples: 24755236. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:32,842][08963] Avg episode reward: [(0, '9.421')] [2025-01-05 13:06:33,003][09057] Updated weights for policy 0, policy_version 268328 (0.0016) [2025-01-05 13:06:34,973][09057] Updated weights for policy 0, policy_version 268338 (0.0015) [2025-01-05 13:06:37,027][09057] Updated weights for policy 0, policy_version 268348 (0.0015) [2025-01-05 13:06:37,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1099165696. Throughput: 0: 4959.9. Samples: 24785156. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:37,842][08963] Avg episode reward: [(0, '10.031')] [2025-01-05 13:06:39,167][09057] Updated weights for policy 0, policy_version 268358 (0.0016) [2025-01-05 13:06:41,156][09057] Updated weights for policy 0, policy_version 268368 (0.0015) [2025-01-05 13:06:42,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19855.2). Total num frames: 1099264000. Throughput: 0: 4953.6. Samples: 24800040. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:06:42,842][08963] Avg episode reward: [(0, '8.984')] [2025-01-05 13:06:43,235][09057] Updated weights for policy 0, policy_version 268378 (0.0016) [2025-01-05 13:06:45,309][09057] Updated weights for policy 0, policy_version 268388 (0.0016) [2025-01-05 13:06:47,294][09057] Updated weights for policy 0, policy_version 268398 (0.0016) [2025-01-05 13:06:47,842][08963] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 19869.1). Total num frames: 1099366400. Throughput: 0: 4961.0. Samples: 24829932. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:06:47,843][08963] Avg episode reward: [(0, '10.458')] [2025-01-05 13:06:49,362][09057] Updated weights for policy 0, policy_version 268408 (0.0016) [2025-01-05 13:06:51,426][09057] Updated weights for policy 0, policy_version 268418 (0.0016) [2025-01-05 13:06:52,842][08963] Fps is (10 sec: 20070.0, 60 sec: 19797.3, 300 sec: 19869.1). Total num frames: 1099464704. Throughput: 0: 4971.6. Samples: 24859866. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:06:52,842][08963] Avg episode reward: [(0, '10.015')] [2025-01-05 13:06:53,516][09057] Updated weights for policy 0, policy_version 268428 (0.0017) [2025-01-05 13:06:55,575][09057] Updated weights for policy 0, policy_version 268438 (0.0017) [2025-01-05 13:06:57,627][09057] Updated weights for policy 0, policy_version 268448 (0.0016) [2025-01-05 13:06:57,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19882.9). Total num frames: 1099567104. Throughput: 0: 4968.8. Samples: 24874826. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:06:57,842][08963] Avg episode reward: [(0, '10.010')] [2025-01-05 13:06:59,686][09057] Updated weights for policy 0, policy_version 268458 (0.0016) [2025-01-05 13:07:01,718][09057] Updated weights for policy 0, policy_version 268468 (0.0016) [2025-01-05 13:07:02,842][08963] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19883.0). 
Total num frames: 1099665408. Throughput: 0: 4973.0. Samples: 24904616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:07:02,842][08963] Avg episode reward: [(0, '10.221')] [2025-01-05 13:07:03,862][09057] Updated weights for policy 0, policy_version 268478 (0.0016) [2025-01-05 13:07:05,890][09057] Updated weights for policy 0, policy_version 268488 (0.0016) [2025-01-05 13:07:07,842][08963] Fps is (10 sec: 19661.0, 60 sec: 19865.7, 300 sec: 19869.1). Total num frames: 1099763712. Throughput: 0: 4958.4. Samples: 24934028. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:07:07,842][08963] Avg episode reward: [(0, '9.039')] [2025-01-05 13:07:08,011][09057] Updated weights for policy 0, policy_version 268498 (0.0017) [2025-01-05 13:07:10,082][09057] Updated weights for policy 0, policy_version 268508 (0.0016) [2025-01-05 13:07:12,108][09057] Updated weights for policy 0, policy_version 268518 (0.0016) [2025-01-05 13:07:12,842][08963] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19855.2). Total num frames: 1099862016. Throughput: 0: 4959.1. Samples: 24948970. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:07:12,842][08963] Avg episode reward: [(0, '9.773')] [2025-01-05 13:07:14,235][09057] Updated weights for policy 0, policy_version 268528 (0.0017) [2025-01-05 13:07:16,312][09057] Updated weights for policy 0, policy_version 268538 (0.0017) [2025-01-05 13:07:17,842][08963] Fps is (10 sec: 19660.5, 60 sec: 19865.6, 300 sec: 19855.2). Total num frames: 1099960320. Throughput: 0: 4960.7. Samples: 24978470. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:07:17,842][08963] Avg episode reward: [(0, '9.652')] [2025-01-05 13:07:18,429][09057] Updated weights for policy 0, policy_version 268548 (0.0017) [2025-01-05 13:07:20,069][09024] Stopping Batcher_0... [2025-01-05 13:07:20,069][08963] Component Batcher_0 stopped! [2025-01-05 13:07:20,070][09024] Loop batcher_evt_loop terminating... 
[2025-01-05 13:07:20,070][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth... [2025-01-05 13:07:20,101][09057] Weights refcount: 2 0 [2025-01-05 13:07:20,103][09057] Stopping InferenceWorker_p0-w0... [2025-01-05 13:07:20,103][08963] Component InferenceWorker_p0-w0 stopped! [2025-01-05 13:07:20,104][09057] Loop inference_proc0-0_evt_loop terminating... [2025-01-05 13:07:20,140][09024] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000267527_1095790592.pth [2025-01-05 13:07:20,142][09024] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth... [2025-01-05 13:07:20,153][09061] Stopping RolloutWorker_w3... [2025-01-05 13:07:20,153][08963] Component RolloutWorker_w3 stopped! [2025-01-05 13:07:20,153][09079] Stopping RolloutWorker_w7... [2025-01-05 13:07:20,153][08963] Component RolloutWorker_w7 stopped! [2025-01-05 13:07:20,153][09061] Loop rollout_proc3_evt_loop terminating... [2025-01-05 13:07:20,153][09079] Loop rollout_proc7_evt_loop terminating... [2025-01-05 13:07:20,156][09084] Stopping RolloutWorker_w11... [2025-01-05 13:07:20,157][08963] Component RolloutWorker_w11 stopped! [2025-01-05 13:07:20,157][09084] Loop rollout_proc11_evt_loop terminating... [2025-01-05 13:07:20,158][09082] Stopping RolloutWorker_w8... [2025-01-05 13:07:20,158][08963] Component RolloutWorker_w8 stopped! [2025-01-05 13:07:20,158][09082] Loop rollout_proc8_evt_loop terminating... [2025-01-05 13:07:20,160][09077] Stopping RolloutWorker_w5... [2025-01-05 13:07:20,160][08963] Component RolloutWorker_w5 stopped! [2025-01-05 13:07:20,160][09077] Loop rollout_proc5_evt_loop terminating... [2025-01-05 13:07:20,161][08963] Component RolloutWorker_w1 stopped! [2025-01-05 13:07:20,161][09058] Stopping RolloutWorker_w1... 
[2025-01-05 13:07:20,161][08963] Component RolloutWorker_w6 stopped! [2025-01-05 13:07:20,161][09080] Stopping RolloutWorker_w6... [2025-01-05 13:07:20,161][09058] Loop rollout_proc1_evt_loop terminating... [2025-01-05 13:07:20,162][09080] Loop rollout_proc6_evt_loop terminating... [2025-01-05 13:07:20,162][09083] Stopping RolloutWorker_w10... [2025-01-05 13:07:20,163][09083] Loop rollout_proc10_evt_loop terminating... [2025-01-05 13:07:20,163][09059] Stopping RolloutWorker_w2... [2025-01-05 13:07:20,163][08963] Component RolloutWorker_w10 stopped! [2025-01-05 13:07:20,163][08963] Component RolloutWorker_w2 stopped! [2025-01-05 13:07:20,163][09059] Loop rollout_proc2_evt_loop terminating... [2025-01-05 13:07:20,164][08963] Component RolloutWorker_w0 stopped! [2025-01-05 13:07:20,164][09056] Stopping RolloutWorker_w0... [2025-01-05 13:07:20,164][08963] Component RolloutWorker_w4 stopped! [2025-01-05 13:07:20,164][09060] Stopping RolloutWorker_w4... [2025-01-05 13:07:20,164][09056] Loop rollout_proc0_evt_loop terminating... [2025-01-05 13:07:20,165][09060] Loop rollout_proc4_evt_loop terminating... [2025-01-05 13:07:20,166][09081] Stopping RolloutWorker_w9... [2025-01-05 13:07:20,166][08963] Component RolloutWorker_w9 stopped! [2025-01-05 13:07:20,166][09081] Loop rollout_proc9_evt_loop terminating... [2025-01-05 13:07:20,221][09024] Stopping LearnerWorker_p0... [2025-01-05 13:07:20,221][08963] Component LearnerWorker_p0 stopped! [2025-01-05 13:07:20,221][09024] Loop learner_proc0_evt_loop terminating... [2025-01-05 13:07:20,221][08963] Waiting for process learner_proc0 to stop... [2025-01-05 13:07:21,367][08963] Waiting for process inference_proc0-0 to join... [2025-01-05 13:07:21,367][08963] Waiting for process rollout_proc0 to join... [2025-01-05 13:07:21,367][08963] Waiting for process rollout_proc1 to join... [2025-01-05 13:07:21,367][08963] Waiting for process rollout_proc2 to join... [2025-01-05 13:07:21,368][08963] Waiting for process rollout_proc3 to join... 
[2025-01-05 13:07:21,368][08963] Waiting for process rollout_proc4 to join... [2025-01-05 13:07:21,368][08963] Waiting for process rollout_proc5 to join... [2025-01-05 13:07:21,368][08963] Waiting for process rollout_proc6 to join... [2025-01-05 13:07:21,369][08963] Waiting for process rollout_proc7 to join... [2025-01-05 13:07:21,369][08963] Waiting for process rollout_proc8 to join... [2025-01-05 13:07:21,369][08963] Waiting for process rollout_proc9 to join... [2025-01-05 13:07:21,369][08963] Waiting for process rollout_proc10 to join... [2025-01-05 13:07:21,369][08963] Waiting for process rollout_proc11 to join... [2025-01-05 13:07:21,370][08963] Batcher 0 profile tree view: batching: 247.8647, releasing_batches: 0.7155 [2025-01-05 13:07:21,370][08963] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0001 wait_policy_total: 84.2574 update_model: 88.5945 weight_update: 0.0017 one_step: 0.0053 handle_policy_step: 4725.3877 deserialize: 234.3060, stack: 23.9724, obs_to_device_normalize: 1130.5378, forward: 1888.2092, send_messages: 434.4525 prepare_outputs: 845.5434 to_cpu: 649.4179 [2025-01-05 13:07:21,370][08963] Learner 0 profile tree view: misc: 0.1169, prepare_batch: 577.8470 train: 2012.7437 epoch_init: 0.1360, minibatch_init: 0.1845, losses_postprocess: 11.7308, kl_divergence: 7.5536, after_optimizer: 27.7208 calculate_losses: 657.0565 losses_init: 0.0886, forward_head: 18.6687, bptt_initial: 502.2819, tail: 17.6191, advantages_returns: 5.3158, losses: 77.2376 bptt: 30.8606 bptt_forward_core: 29.1220 update: 1298.2636 clip: 20.2211 [2025-01-05 13:07:21,370][08963] RolloutWorker_w0 profile tree view: wait_for_trajectories: 2.7464, enqueue_policy_requests: 212.9446, env_step: 2620.3819, overhead: 139.9405, complete_rollouts: 4.4244 save_policy_outputs: 242.6155 split_output_tensors: 78.5702 [2025-01-05 13:07:21,371][08963] RolloutWorker_w11 profile tree view: wait_for_trajectories: 2.7811, enqueue_policy_requests: 214.6509, env_step: 2619.1545, 
overhead: 139.8958, complete_rollouts: 4.5946 save_policy_outputs: 244.8692 split_output_tensors: 79.5708 [2025-01-05 13:07:21,371][08963] Loop Runner_EvtLoop terminating... [2025-01-05 13:07:21,371][08963] Runner profile tree view: main_loop: 5130.4213 [2025-01-05 13:07:21,371][08963] Collected {0: 1100005376}, FPS: 19486.7 [2025-01-05 13:07:21,617][08963] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 13:07:21,617][08963] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 13:07:21,618][08963] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 13:07:21,618][08963] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 13:07:21,618][08963] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 13:07:21,618][08963] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 13:07:21,618][08963] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 13:07:21,618][08963] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 13:07:21,619][08963] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 13:07:21,619][08963] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 13:07:21,643][08963] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:07:21,645][08963] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 13:07:21,645][08963] RunningMeanStd input shape: (1,) [2025-01-05 13:07:21,655][08963] ConvEncoder: input_channels=3 [2025-01-05 13:07:21,759][08963] Conv encoder output size: 512 [2025-01-05 13:07:21,760][08963] Policy head output size: 512 [2025-01-05 13:07:21,877][08963] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth... [2025-01-05 13:07:22,548][08963] Num frames 100... [2025-01-05 13:07:22,648][08963] Num frames 200... [2025-01-05 13:07:22,751][08963] Num frames 300... [2025-01-05 13:07:22,851][08963] Num frames 400... [2025-01-05 13:07:22,934][08963] Avg episode rewards: #0: 7.290, true rewards: #0: 4.290 [2025-01-05 13:07:22,935][08963] Avg episode reward: 7.290, avg true_objective: 4.290 [2025-01-05 13:07:23,009][08963] Num frames 500... [2025-01-05 13:07:23,101][08963] Num frames 600... [2025-01-05 13:07:23,189][08963] Num frames 700... [2025-01-05 13:07:23,295][08963] Num frames 800... [2025-01-05 13:07:23,362][08963] Avg episode rewards: #0: 5.565, true rewards: #0: 4.065 [2025-01-05 13:07:23,362][08963] Avg episode reward: 5.565, avg true_objective: 4.065 [2025-01-05 13:07:23,449][08963] Num frames 900... [2025-01-05 13:07:23,539][08963] Num frames 1000... [2025-01-05 13:07:23,632][08963] Num frames 1100... [2025-01-05 13:07:23,775][08963] Avg episode rewards: #0: 4.990, true rewards: #0: 3.990 [2025-01-05 13:07:23,775][08963] Avg episode reward: 4.990, avg true_objective: 3.990 [2025-01-05 13:07:23,778][08963] Num frames 1200... [2025-01-05 13:07:23,875][08963] Num frames 1300... [2025-01-05 13:07:23,964][08963] Num frames 1400... [2025-01-05 13:07:24,056][08963] Num frames 1500... 
[2025-01-05 13:07:24,149][08963] Num frames 1600... [2025-01-05 13:07:24,241][08963] Num frames 1700... [2025-01-05 13:07:24,330][08963] Num frames 1800... [2025-01-05 13:07:24,421][08963] Num frames 1900... [2025-01-05 13:07:24,564][08963] Avg episode rewards: #0: 7.240, true rewards: #0: 4.990 [2025-01-05 13:07:24,564][08963] Avg episode reward: 7.240, avg true_objective: 4.990 [2025-01-05 13:07:24,569][08963] Num frames 2000... [2025-01-05 13:07:24,669][08963] Num frames 2100... [2025-01-05 13:07:24,759][08963] Num frames 2200... [2025-01-05 13:07:24,860][08963] Avg episode rewards: #0: 6.304, true rewards: #0: 4.504 [2025-01-05 13:07:24,860][08963] Avg episode reward: 6.304, avg true_objective: 4.504 [2025-01-05 13:07:24,919][08963] Num frames 2300... [2025-01-05 13:07:25,006][08963] Num frames 2400... [2025-01-05 13:07:25,099][08963] Num frames 2500... [2025-01-05 13:07:25,188][08963] Num frames 2600... [2025-01-05 13:07:25,280][08963] Num frames 2700... [2025-01-05 13:07:25,422][08963] Avg episode rewards: #0: 6.493, true rewards: #0: 4.660 [2025-01-05 13:07:25,423][08963] Avg episode reward: 6.493, avg true_objective: 4.660 [2025-01-05 13:07:25,441][08963] Num frames 2800... [2025-01-05 13:07:25,551][08963] Num frames 2900... [2025-01-05 13:07:25,644][08963] Num frames 3000... [2025-01-05 13:07:25,743][08963] Num frames 3100... [2025-01-05 13:07:25,842][08963] Num frames 3200... [2025-01-05 13:07:25,917][08963] Avg episode rewards: #0: 6.463, true rewards: #0: 4.606 [2025-01-05 13:07:25,918][08963] Avg episode reward: 6.463, avg true_objective: 4.606 [2025-01-05 13:07:26,031][08963] Num frames 3300... [2025-01-05 13:07:26,117][08963] Num frames 3400... [2025-01-05 13:07:26,206][08963] Num frames 3500... [2025-01-05 13:07:26,297][08963] Num frames 3600... [2025-01-05 13:07:26,388][08963] Num frames 3700... 
[2025-01-05 13:07:26,474][08963] Avg episode rewards: #0: 6.545, true rewards: #0: 4.670 [2025-01-05 13:07:26,475][08963] Avg episode reward: 6.545, avg true_objective: 4.670 [2025-01-05 13:07:26,575][08963] Num frames 3800... [2025-01-05 13:07:26,661][08963] Num frames 3900... [2025-01-05 13:07:26,749][08963] Num frames 4000... [2025-01-05 13:07:26,841][08963] Num frames 4100... [2025-01-05 13:07:26,912][08963] Avg episode rewards: #0: 6.244, true rewards: #0: 4.578 [2025-01-05 13:07:26,912][08963] Avg episode reward: 6.244, avg true_objective: 4.578 [2025-01-05 13:07:27,012][08963] Num frames 4200... [2025-01-05 13:07:27,097][08963] Num frames 4300... [2025-01-05 13:07:27,187][08963] Num frames 4400... [2025-01-05 13:07:27,277][08963] Num frames 4500... [2025-01-05 13:07:27,392][08963] Avg episode rewards: #0: 6.168, true rewards: #0: 4.568 [2025-01-05 13:07:27,392][08963] Avg episode reward: 6.168, avg true_objective: 4.568 [2025-01-05 13:07:37,310][08963] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 13:07:37,318][08963] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 13:07:37,318][08963] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 13:07:37,318][08963] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
[2025-01-05 13:07:37,319][08963] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 13:07:37,319][08963] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 13:07:37,320][08963] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 13:07:37,320][08963] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 13:07:37,335][08963] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 13:07:37,336][08963] RunningMeanStd input shape: (1,) [2025-01-05 13:07:37,342][08963] ConvEncoder: input_channels=3 [2025-01-05 13:07:37,371][08963] Conv encoder output size: 512 [2025-01-05 13:07:37,371][08963] Policy head output size: 512 [2025-01-05 13:07:37,387][08963] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth... [2025-01-05 13:07:37,790][08963] Num frames 100... [2025-01-05 13:07:37,895][08963] Num frames 200... [2025-01-05 13:07:38,008][08963] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 [2025-01-05 13:07:38,009][08963] Avg episode reward: 2.560, avg true_objective: 2.560 [2025-01-05 13:07:38,055][08963] Num frames 300... [2025-01-05 13:07:38,141][08963] Num frames 400... [2025-01-05 13:07:38,225][08963] Num frames 500... [2025-01-05 13:07:38,311][08963] Num frames 600... [2025-01-05 13:07:38,412][08963] Num frames 700... 
[2025-01-05 13:07:38,468][08963] Avg episode rewards: #0: 4.020, true rewards: #0: 3.520 [2025-01-05 13:07:38,469][08963] Avg episode reward: 4.020, avg true_objective: 3.520 [2025-01-05 13:07:38,552][08963] Num frames 800... [2025-01-05 13:07:38,637][08963] Num frames 900... [2025-01-05 13:07:38,722][08963] Num frames 1000... [2025-01-05 13:07:38,805][08963] Num frames 1100... [2025-01-05 13:07:38,890][08963] Num frames 1200... [2025-01-05 13:07:39,011][08963] Avg episode rewards: #0: 5.933, true rewards: #0: 4.267 [2025-01-05 13:07:39,011][08963] Avg episode reward: 5.933, avg true_objective: 4.267 [2025-01-05 13:07:39,034][08963] Num frames 1300... [2025-01-05 13:07:39,123][08963] Num frames 1400... [2025-01-05 13:07:39,207][08963] Num frames 1500... [2025-01-05 13:07:39,291][08963] Num frames 1600... [2025-01-05 13:07:39,427][08963] Avg episode rewards: #0: 5.990, true rewards: #0: 4.240 [2025-01-05 13:07:39,428][08963] Avg episode reward: 5.990, avg true_objective: 4.240 [2025-01-05 13:07:39,432][08963] Num frames 1700... [2025-01-05 13:07:39,525][08963] Num frames 1800... [2025-01-05 13:07:39,607][08963] Num frames 1900... [2025-01-05 13:07:39,692][08963] Num frames 2000... [2025-01-05 13:07:39,777][08963] Num frames 2100... [2025-01-05 13:07:39,862][08963] Num frames 2200... [2025-01-05 13:07:39,954][08963] Num frames 2300... [2025-01-05 13:07:40,047][08963] Num frames 2400... [2025-01-05 13:07:40,138][08963] Num frames 2500... [2025-01-05 13:07:40,230][08963] Num frames 2600... [2025-01-05 13:07:40,321][08963] Num frames 2700... [2025-01-05 13:07:40,414][08963] Num frames 2800... [2025-01-05 13:07:40,484][08963] Avg episode rewards: #0: 9.232, true rewards: #0: 5.632 [2025-01-05 13:07:40,485][08963] Avg episode reward: 9.232, avg true_objective: 5.632 [2025-01-05 13:07:40,603][08963] Num frames 2900... [2025-01-05 13:07:40,689][08963] Num frames 3000... [2025-01-05 13:07:40,779][08963] Num frames 3100... [2025-01-05 13:07:40,872][08963] Num frames 3200... 
[2025-01-05 13:07:40,964][08963] Num frames 3300... [2025-01-05 13:07:41,100][08963] Avg episode rewards: #0: 9.320, true rewards: #0: 5.653 [2025-01-05 13:07:41,101][08963] Avg episode reward: 9.320, avg true_objective: 5.653 [2025-01-05 13:07:41,110][08963] Num frames 3400... [2025-01-05 13:07:41,214][08963] Num frames 3500... [2025-01-05 13:07:41,297][08963] Num frames 3600... [2025-01-05 13:07:41,378][08963] Num frames 3700... [2025-01-05 13:07:41,464][08963] Num frames 3800... [2025-01-05 13:07:41,549][08963] Num frames 3900... [2025-01-05 13:07:41,635][08963] Num frames 4000... [2025-01-05 13:07:41,720][08963] Num frames 4100... [2025-01-05 13:07:41,804][08963] Num frames 4200... [2025-01-05 13:07:41,894][08963] Num frames 4300... [2025-01-05 13:07:41,984][08963] Num frames 4400... [2025-01-05 13:07:42,079][08963] Num frames 4500... [2025-01-05 13:07:42,169][08963] Num frames 4600... [2025-01-05 13:07:42,230][08963] Avg episode rewards: #0: 11.440, true rewards: #0: 6.583 [2025-01-05 13:07:42,230][08963] Avg episode reward: 11.440, avg true_objective: 6.583 [2025-01-05 13:07:42,320][08963] Num frames 4700... [2025-01-05 13:07:42,409][08963] Num frames 4800... [2025-01-05 13:07:42,501][08963] Num frames 4900... [2025-01-05 13:07:42,593][08963] Num frames 5000... [2025-01-05 13:07:42,655][08963] Avg episode rewards: #0: 10.886, true rewards: #0: 6.261 [2025-01-05 13:07:42,656][08963] Avg episode reward: 10.886, avg true_objective: 6.261 [2025-01-05 13:07:42,751][08963] Num frames 5100... [2025-01-05 13:07:42,840][08963] Num frames 5200... [2025-01-05 13:07:42,930][08963] Num frames 5300... [2025-01-05 13:07:43,021][08963] Num frames 5400... [2025-01-05 13:07:43,114][08963] Num frames 5500... [2025-01-05 13:07:43,206][08963] Num frames 5600... 
[2025-01-05 13:07:43,275][08963] Avg episode rewards: #0: 10.686, true rewards: #0: 6.241 [2025-01-05 13:07:43,275][08963] Avg episode reward: 10.686, avg true_objective: 6.241 [2025-01-05 13:07:43,348][08963] Num frames 5700... [2025-01-05 13:07:43,438][08963] Num frames 5800... [2025-01-05 13:07:43,532][08963] Num frames 5900... [2025-01-05 13:07:43,648][08963] Avg episode rewards: #0: 10.069, true rewards: #0: 5.969 [2025-01-05 13:07:43,648][08963] Avg episode reward: 10.069, avg true_objective: 5.969 [2025-01-05 13:07:55,526][08963] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 13:09:30,692][08963] The model has been pushed to https://huggingface.co/spenning/rl_course_vizdoom_health_gathering_supreme [2025-01-05 13:18:42,914][19571] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 13:18:42,915][19571] Rollout worker 0 uses device cpu [2025-01-05 13:18:42,915][19571] Rollout worker 1 uses device cpu [2025-01-05 13:18:42,915][19571] Rollout worker 2 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 3 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 4 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 5 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 6 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 7 uses device cpu [2025-01-05 13:18:42,916][19571] Rollout worker 8 uses device cpu [2025-01-05 13:18:42,917][19571] Rollout worker 9 uses device cpu [2025-01-05 13:18:42,917][19571] Rollout worker 10 uses device cpu [2025-01-05 13:18:42,917][19571] Rollout worker 11 uses device cpu [2025-01-05 13:18:42,963][19571] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 13:18:42,963][19571] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 13:18:42,991][19571] Starting all processes... 
[2025-01-05 13:18:42,991][19571] Starting process learner_proc0 [2025-01-05 13:18:44,329][19571] Starting all processes... [2025-01-05 13:18:44,335][19571] Starting process inference_proc0-0 [2025-01-05 13:18:44,335][19571] Starting process rollout_proc0 [2025-01-05 13:18:44,337][19636] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 13:18:44,338][19636] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 13:18:44,335][19571] Starting process rollout_proc1 [2025-01-05 13:18:44,338][19571] Starting process rollout_proc2 [2025-01-05 13:18:44,338][19571] Starting process rollout_proc3 [2025-01-05 13:18:44,338][19571] Starting process rollout_proc4 [2025-01-05 13:18:44,342][19571] Starting process rollout_proc5 [2025-01-05 13:18:44,343][19571] Starting process rollout_proc6 [2025-01-05 13:18:44,353][19636] Num visible devices: 1 [2025-01-05 13:18:44,343][19571] Starting process rollout_proc7 [2025-01-05 13:18:44,361][19636] Starting seed is not provided [2025-01-05 13:18:44,362][19636] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 13:18:44,362][19636] Initializing actor-critic model on device cuda:0 [2025-01-05 13:18:44,363][19636] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 13:18:44,364][19636] RunningMeanStd input shape: (1,) [2025-01-05 13:18:44,347][19571] Starting process rollout_proc8 [2025-01-05 13:18:44,350][19571] Starting process rollout_proc9 [2025-01-05 13:18:44,351][19571] Starting process rollout_proc10 [2025-01-05 13:18:44,351][19571] Starting process rollout_proc11 [2025-01-05 13:18:44,382][19636] ConvEncoder: input_channels=3 [2025-01-05 13:18:44,559][19636] Conv encoder output size: 512 [2025-01-05 13:18:44,560][19636] Policy head output size: 512 [2025-01-05 13:18:44,583][19636] Created Actor Critic model with architecture: [2025-01-05 13:18:44,584][19636] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): 
RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 13:18:44,721][19636] Using optimizer [2025-01-05 13:18:46,595][19636] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth... [2025-01-05 13:18:46,653][19636] Loading model from checkpoint [2025-01-05 13:18:46,655][19636] Loaded experiment state at self.train_step=268556, self.env_steps=1100005376 [2025-01-05 13:18:46,656][19636] Initialized policy 0 weights for model version 268556 [2025-01-05 13:18:46,659][19636] LearnerWorker_p0 finished initialization! 
[2025-01-05 13:18:46,660][19636] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 13:18:47,081][19671] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,142][19672] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,184][19693] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,346][19694] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,406][19668] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 13:18:47,406][19668] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 13:18:47,418][19689] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,423][19668] Num visible devices: 1 [2025-01-05 13:18:47,512][19668] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 13:18:47,513][19668] RunningMeanStd input shape: (1,) [2025-01-05 13:18:47,521][19695] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,524][19668] ConvEncoder: input_channels=3 [2025-01-05 13:18:47,592][19669] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,600][19692] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,617][19696] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,633][19668] Conv encoder output size: 512 [2025-01-05 13:18:47,633][19668] Policy head output size: 512 [2025-01-05 13:18:47,658][19691] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,660][19670] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,693][19571] Inference worker 0-0 is ready! 
[2025-01-05 13:18:47,693][19571] All inference workers are ready! Signal rollout workers to start! [2025-01-05 13:18:47,694][19571] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1100005376. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 13:18:47,711][19690] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 13:18:47,736][19696] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,736][19692] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,740][19669] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,740][19691] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,741][19695] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,749][19670] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,755][19689] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,757][19693] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,757][19671] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,757][19694] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,757][19672] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:47,769][19690] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 13:18:48,084][19670] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,084][19696] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,085][19693] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,085][19672] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,086][19690] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,125][19695] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,125][19692] Decorrelating experience for 0 frames... 
[2025-01-05 13:18:48,367][19672] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,371][19696] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,371][19690] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,393][19689] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,398][19671] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,418][19695] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,427][19670] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,428][19669] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,678][19692] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,679][19693] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,729][19691] Decorrelating experience for 0 frames... [2025-01-05 13:18:48,746][19690] Decorrelating experience for 64 frames... [2025-01-05 13:18:48,750][19672] Decorrelating experience for 64 frames... [2025-01-05 13:18:48,762][19669] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,768][19671] Decorrelating experience for 32 frames... [2025-01-05 13:18:48,838][19670] Decorrelating experience for 64 frames... [2025-01-05 13:18:48,998][19689] Decorrelating experience for 32 frames... [2025-01-05 13:18:49,038][19691] Decorrelating experience for 32 frames... [2025-01-05 13:18:49,077][19694] Decorrelating experience for 0 frames... [2025-01-05 13:18:49,085][19695] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,112][19672] Decorrelating experience for 96 frames... [2025-01-05 13:18:49,140][19671] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,309][19670] Decorrelating experience for 96 frames... [2025-01-05 13:18:49,393][19690] Decorrelating experience for 96 frames... [2025-01-05 13:18:49,396][19694] Decorrelating experience for 32 frames... [2025-01-05 13:18:49,410][19691] Decorrelating experience for 64 frames... 
[2025-01-05 13:18:49,451][19696] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,461][19669] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,512][19671] Decorrelating experience for 96 frames... [2025-01-05 13:18:49,641][19692] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,741][19693] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,816][19689] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,837][19694] Decorrelating experience for 64 frames... [2025-01-05 13:18:49,965][19571] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1100005376. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 13:18:49,965][19571] Avg episode reward: [(0, '1.530')] [2025-01-05 13:18:50,018][19696] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,098][19691] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,172][19695] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,217][19694] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,362][19689] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,616][19669] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,763][19692] Decorrelating experience for 96 frames... [2025-01-05 13:18:50,911][19636] Signal inference workers to stop experience collection... [2025-01-05 13:18:51,016][19668] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 13:18:51,064][19693] Decorrelating experience for 96 frames... [2025-01-05 13:18:53,084][19636] Signal inference workers to resume experience collection... [2025-01-05 13:18:53,085][19668] InferenceWorker_p0-w0: resuming experience collection [2025-01-05 13:18:54,804][19668] Updated weights for policy 0, policy_version 268566 (0.0082) [2025-01-05 13:18:54,965][19571] Fps is (10 sec: 5633.8, 60 sec: 5633.8, 300 sec: 5633.8). Total num frames: 1100046336. Throughput: 0: 735.9. Samples: 5350. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:18:54,965][19571] Avg episode reward: [(0, '6.920')] [2025-01-05 13:18:56,844][19668] Updated weights for policy 0, policy_version 268576 (0.0015) [2025-01-05 13:18:58,897][19668] Updated weights for policy 0, policy_version 268586 (0.0016) [2025-01-05 13:18:59,965][19571] Fps is (10 sec: 13926.1, 60 sec: 11349.4, 300 sec: 11349.4). Total num frames: 1100144640. Throughput: 0: 2901.2. Samples: 35600. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:18:59,966][19571] Avg episode reward: [(0, '8.870')] [2025-01-05 13:19:01,098][19668] Updated weights for policy 0, policy_version 268596 (0.0020) [2025-01-05 13:19:02,956][19571] Heartbeat connected on Batcher_0 [2025-01-05 13:19:02,962][19571] Heartbeat connected on LearnerWorker_p0 [2025-01-05 13:19:02,969][19571] Heartbeat connected on RolloutWorker_w1 [2025-01-05 13:19:02,970][19571] Heartbeat connected on RolloutWorker_w0 [2025-01-05 13:19:02,971][19571] Heartbeat connected on RolloutWorker_w2 [2025-01-05 13:19:02,972][19571] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-05 13:19:02,975][19571] Heartbeat connected on RolloutWorker_w3 [2025-01-05 13:19:02,976][19571] Heartbeat connected on RolloutWorker_w4 [2025-01-05 13:19:02,980][19571] Heartbeat connected on RolloutWorker_w6 [2025-01-05 13:19:02,984][19571] Heartbeat connected on RolloutWorker_w7 [2025-01-05 13:19:02,985][19571] Heartbeat connected on RolloutWorker_w8 [2025-01-05 13:19:02,987][19571] Heartbeat connected on RolloutWorker_w5 [2025-01-05 13:19:02,987][19571] Heartbeat connected on RolloutWorker_w9 [2025-01-05 13:19:02,989][19571] Heartbeat connected on RolloutWorker_w10 [2025-01-05 13:19:02,992][19571] Heartbeat connected on RolloutWorker_w11 [2025-01-05 13:19:03,201][19668] Updated weights for policy 0, policy_version 268606 (0.0017) [2025-01-05 13:19:04,965][19571] Fps is (10 sec: 19660.5, 60 sec: 13755.6, 300 sec: 13755.6). Total num frames: 1100242944. 
Throughput: 0: 2898.5. Samples: 50058. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:04,966][19571] Avg episode reward: [(0, '10.811')] [2025-01-05 13:19:05,301][19668] Updated weights for policy 0, policy_version 268616 (0.0016) [2025-01-05 13:19:07,325][19668] Updated weights for policy 0, policy_version 268626 (0.0017) [2025-01-05 13:19:09,579][19668] Updated weights for policy 0, policy_version 268636 (0.0018) [2025-01-05 13:19:09,965][19571] Fps is (10 sec: 19251.5, 60 sec: 14897.6, 300 sec: 14897.6). Total num frames: 1100337152. Throughput: 0: 3531.0. Samples: 78636. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:09,965][19571] Avg episode reward: [(0, '10.184')] [2025-01-05 13:19:11,745][19668] Updated weights for policy 0, policy_version 268646 (0.0018) [2025-01-05 13:19:13,732][19668] Updated weights for policy 0, policy_version 268656 (0.0018) [2025-01-05 13:19:14,965][19571] Fps is (10 sec: 19661.1, 60 sec: 15921.1, 300 sec: 15921.1). Total num frames: 1100439552. Throughput: 0: 3977.3. Samples: 108464. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:14,965][19571] Avg episode reward: [(0, '8.854')] [2025-01-05 13:19:15,757][19668] Updated weights for policy 0, policy_version 268666 (0.0016) [2025-01-05 13:19:17,979][19668] Updated weights for policy 0, policy_version 268676 (0.0017) [2025-01-05 13:19:19,965][19571] Fps is (10 sec: 19251.0, 60 sec: 16246.6, 300 sec: 16246.6). Total num frames: 1100529664. Throughput: 0: 3816.4. Samples: 123158. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:19,965][19571] Avg episode reward: [(0, '8.898')] [2025-01-05 13:19:20,302][19668] Updated weights for policy 0, policy_version 268686 (0.0018) [2025-01-05 13:19:22,638][19668] Updated weights for policy 0, policy_version 268696 (0.0018) [2025-01-05 13:19:24,924][19668] Updated weights for policy 0, policy_version 268706 (0.0018) [2025-01-05 13:19:24,965][19571] Fps is (10 sec: 18022.4, 60 sec: 16484.9, 300 sec: 16484.9). Total num frames: 1100619776. Throughput: 0: 4009.3. Samples: 149430. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:24,965][19571] Avg episode reward: [(0, '9.548')] [2025-01-05 13:19:27,032][19668] Updated weights for policy 0, policy_version 268716 (0.0018) [2025-01-05 13:19:29,069][19668] Updated weights for policy 0, policy_version 268726 (0.0019) [2025-01-05 13:19:29,965][19571] Fps is (10 sec: 18841.8, 60 sec: 16860.6, 300 sec: 16860.6). Total num frames: 1100718080. Throughput: 0: 4217.9. Samples: 178292. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:29,965][19571] Avg episode reward: [(0, '10.331')] [2025-01-05 13:19:31,244][19668] Updated weights for policy 0, policy_version 268736 (0.0018) [2025-01-05 13:19:33,363][19668] Updated weights for policy 0, policy_version 268746 (0.0017) [2025-01-05 13:19:34,965][19571] Fps is (10 sec: 19661.0, 60 sec: 17156.8, 300 sec: 17156.8). Total num frames: 1100816384. Throughput: 0: 4278.2. Samples: 192520. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:34,965][19571] Avg episode reward: [(0, '10.237')] [2025-01-05 13:19:35,351][19668] Updated weights for policy 0, policy_version 268756 (0.0014) [2025-01-05 13:19:37,219][19668] Updated weights for policy 0, policy_version 268766 (0.0013) [2025-01-05 13:19:39,110][19668] Updated weights for policy 0, policy_version 268776 (0.0014) [2025-01-05 13:19:39,965][19571] Fps is (10 sec: 20480.3, 60 sec: 17553.0, 300 sec: 17553.0). 
Total num frames: 1100922880. Throughput: 0: 4864.2. Samples: 224238. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:39,965][19571] Avg episode reward: [(0, '9.381')] [2025-01-05 13:19:41,083][19668] Updated weights for policy 0, policy_version 268786 (0.0014) [2025-01-05 13:19:42,967][19668] Updated weights for policy 0, policy_version 268796 (0.0014) [2025-01-05 13:19:44,878][19668] Updated weights for policy 0, policy_version 268806 (0.0014) [2025-01-05 13:19:44,965][19571] Fps is (10 sec: 21298.9, 60 sec: 17880.1, 300 sec: 17880.1). Total num frames: 1101029376. Throughput: 0: 4900.8. Samples: 256136. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:44,965][19571] Avg episode reward: [(0, '9.285')] [2025-01-05 13:19:46,814][19668] Updated weights for policy 0, policy_version 268816 (0.0014) [2025-01-05 13:19:48,703][19668] Updated weights for policy 0, policy_version 268826 (0.0014) [2025-01-05 13:19:49,965][19571] Fps is (10 sec: 21298.8, 60 sec: 18841.6, 300 sec: 18154.6). Total num frames: 1101135872. Throughput: 0: 4936.5. Samples: 272202. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:49,965][19571] Avg episode reward: [(0, '9.691')] [2025-01-05 13:19:50,647][19668] Updated weights for policy 0, policy_version 268836 (0.0015) [2025-01-05 13:19:52,621][19668] Updated weights for policy 0, policy_version 268846 (0.0017) [2025-01-05 13:19:54,522][19668] Updated weights for policy 0, policy_version 268856 (0.0015) [2025-01-05 13:19:54,965][19571] Fps is (10 sec: 21299.2, 60 sec: 19933.9, 300 sec: 18388.3). Total num frames: 1101242368. Throughput: 0: 5006.0. Samples: 303904. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:54,965][19571] Avg episode reward: [(0, '10.596')] [2025-01-05 13:19:56,468][19668] Updated weights for policy 0, policy_version 268866 (0.0014) [2025-01-05 13:19:58,453][19668] Updated weights for policy 0, policy_version 268876 (0.0015) [2025-01-05 13:19:59,965][19571] Fps is (10 sec: 21299.4, 60 sec: 20070.4, 300 sec: 18589.7). Total num frames: 1101348864. Throughput: 0: 5046.0. Samples: 335536. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 13:19:59,965][19571] Avg episode reward: [(0, '10.212')] [2025-01-05 13:20:00,338][19668] Updated weights for policy 0, policy_version 268886 (0.0014) [2025-01-05 13:20:02,303][19668] Updated weights for policy 0, policy_version 268896 (0.0015) [2025-01-05 13:20:04,306][19668] Updated weights for policy 0, policy_version 268906 (0.0014) [2025-01-05 13:20:04,965][19571] Fps is (10 sec: 20889.8, 60 sec: 20138.7, 300 sec: 18712.1). Total num frames: 1101451264. Throughput: 0: 5067.4. Samples: 351188. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:04,965][19571] Avg episode reward: [(0, '10.413')] [2025-01-05 13:20:06,200][19668] Updated weights for policy 0, policy_version 268916 (0.0015) [2025-01-05 13:20:08,159][19668] Updated weights for policy 0, policy_version 268926 (0.0015) [2025-01-05 13:20:09,965][19571] Fps is (10 sec: 20889.7, 60 sec: 20343.5, 300 sec: 18869.3). Total num frames: 1101557760. Throughput: 0: 5185.3. Samples: 382770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:09,965][19571] Avg episode reward: [(0, '9.385')] [2025-01-05 13:20:10,148][19668] Updated weights for policy 0, policy_version 268936 (0.0015) [2025-01-05 13:20:12,057][19668] Updated weights for policy 0, policy_version 268946 (0.0015) [2025-01-05 13:20:14,042][19668] Updated weights for policy 0, policy_version 268956 (0.0015) [2025-01-05 13:20:14,965][19571] Fps is (10 sec: 20889.4, 60 sec: 20343.5, 300 sec: 18961.6). 
Total num frames: 1101660160. Throughput: 0: 5239.0. Samples: 414046. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:14,965][19571] Avg episode reward: [(0, '9.673')] [2025-01-05 13:20:16,044][19668] Updated weights for policy 0, policy_version 268966 (0.0014) [2025-01-05 13:20:17,948][19668] Updated weights for policy 0, policy_version 268976 (0.0017) [2025-01-05 13:20:19,926][19668] Updated weights for policy 0, policy_version 268986 (0.0015) [2025-01-05 13:20:19,965][19571] Fps is (10 sec: 20889.4, 60 sec: 20616.6, 300 sec: 19088.2). Total num frames: 1101766656. Throughput: 0: 5272.6. Samples: 429788. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:19,965][19571] Avg episode reward: [(0, '9.498')] [2025-01-05 13:20:21,907][19668] Updated weights for policy 0, policy_version 268996 (0.0015) [2025-01-05 13:20:23,818][19668] Updated weights for policy 0, policy_version 269006 (0.0015) [2025-01-05 13:20:24,965][19571] Fps is (10 sec: 20889.8, 60 sec: 20821.4, 300 sec: 19159.8). Total num frames: 1101869056. Throughput: 0: 5264.7. Samples: 461150. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:24,965][19571] Avg episode reward: [(0, '9.649')] [2025-01-05 13:20:25,786][19668] Updated weights for policy 0, policy_version 269016 (0.0014) [2025-01-05 13:20:27,761][19668] Updated weights for policy 0, policy_version 269026 (0.0014) [2025-01-05 13:20:29,656][19668] Updated weights for policy 0, policy_version 269036 (0.0014) [2025-01-05 13:20:29,965][19571] Fps is (10 sec: 20889.8, 60 sec: 20957.9, 300 sec: 19264.4). Total num frames: 1101975552. Throughput: 0: 5255.6. Samples: 492638. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:29,965][19571] Avg episode reward: [(0, '9.987')] [2025-01-05 13:20:31,646][19668] Updated weights for policy 0, policy_version 269046 (0.0014) [2025-01-05 13:20:33,628][19668] Updated weights for policy 0, policy_version 269056 (0.0015) [2025-01-05 13:20:34,965][19571] Fps is (10 sec: 21299.0, 60 sec: 21094.4, 300 sec: 19359.2). Total num frames: 1102082048. Throughput: 0: 5249.2. Samples: 508414. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:34,965][19571] Avg episode reward: [(0, '10.661')] [2025-01-05 13:20:35,524][19668] Updated weights for policy 0, policy_version 269066 (0.0014) [2025-01-05 13:20:37,521][19668] Updated weights for policy 0, policy_version 269076 (0.0014) [2025-01-05 13:20:39,510][19668] Updated weights for policy 0, policy_version 269086 (0.0014) [2025-01-05 13:20:39,965][19571] Fps is (10 sec: 20889.2, 60 sec: 21026.1, 300 sec: 19409.1). Total num frames: 1102184448. Throughput: 0: 5239.0. Samples: 539660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:39,965][19571] Avg episode reward: [(0, '10.007')] [2025-01-05 13:20:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000269088_1102184448.pth... [2025-01-05 13:20:40,026][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268110_1098178560.pth [2025-01-05 13:20:41,481][19668] Updated weights for policy 0, policy_version 269096 (0.0014) [2025-01-05 13:20:43,476][19668] Updated weights for policy 0, policy_version 269106 (0.0014) [2025-01-05 13:20:44,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20957.9, 300 sec: 19454.8). Total num frames: 1102286848. Throughput: 0: 5222.7. Samples: 570558. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:44,965][19571] Avg episode reward: [(0, '9.636')] [2025-01-05 13:20:45,484][19668] Updated weights for policy 0, policy_version 269116 (0.0015) [2025-01-05 13:20:47,394][19668] Updated weights for policy 0, policy_version 269126 (0.0017) [2025-01-05 13:20:49,379][19668] Updated weights for policy 0, policy_version 269136 (0.0014) [2025-01-05 13:20:49,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20889.6, 300 sec: 19496.7). Total num frames: 1102389248. Throughput: 0: 5222.9. Samples: 586218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:49,965][19571] Avg episode reward: [(0, '9.607')] [2025-01-05 13:20:51,381][19668] Updated weights for policy 0, policy_version 269146 (0.0015) [2025-01-05 13:20:53,294][19668] Updated weights for policy 0, policy_version 269156 (0.0013) [2025-01-05 13:20:54,965][19571] Fps is (10 sec: 20889.4, 60 sec: 20889.6, 300 sec: 19567.5). Total num frames: 1102495744. Throughput: 0: 5216.7. Samples: 617522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:54,965][19571] Avg episode reward: [(0, '10.514')] [2025-01-05 13:20:55,286][19668] Updated weights for policy 0, policy_version 269166 (0.0015) [2025-01-05 13:20:57,287][19668] Updated weights for policy 0, policy_version 269176 (0.0014) [2025-01-05 13:20:59,232][19668] Updated weights for policy 0, policy_version 269186 (0.0015) [2025-01-05 13:20:59,965][19571] Fps is (10 sec: 20889.4, 60 sec: 20821.3, 300 sec: 19602.0). Total num frames: 1102598144. Throughput: 0: 5208.9. Samples: 648448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:20:59,965][19571] Avg episode reward: [(0, '10.675')] [2025-01-05 13:21:01,248][19668] Updated weights for policy 0, policy_version 269196 (0.0014) [2025-01-05 13:21:03,214][19668] Updated weights for policy 0, policy_version 269206 (0.0014) [2025-01-05 13:21:04,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20821.3, 300 sec: 19634.0). 
Total num frames: 1102700544. Throughput: 0: 5205.8. Samples: 664048. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:21:04,965][19571] Avg episode reward: [(0, '8.998')] [2025-01-05 13:21:05,169][19668] Updated weights for policy 0, policy_version 269216 (0.0016) [2025-01-05 13:21:07,175][19668] Updated weights for policy 0, policy_version 269226 (0.0014) [2025-01-05 13:21:09,155][19668] Updated weights for policy 0, policy_version 269236 (0.0014) [2025-01-05 13:21:09,965][19571] Fps is (10 sec: 20889.8, 60 sec: 20821.3, 300 sec: 19692.5). Total num frames: 1102807040. Throughput: 0: 5198.8. Samples: 695098. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:21:09,965][19571] Avg episode reward: [(0, '9.377')] [2025-01-05 13:21:11,149][19668] Updated weights for policy 0, policy_version 269246 (0.0015) [2025-01-05 13:21:13,172][19668] Updated weights for policy 0, policy_version 269256 (0.0015) [2025-01-05 13:21:14,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20753.1, 300 sec: 19691.4). Total num frames: 1102905344. Throughput: 0: 5179.4. Samples: 725712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:21:14,965][19571] Avg episode reward: [(0, '8.891')] [2025-01-05 13:21:15,200][19668] Updated weights for policy 0, policy_version 269266 (0.0016) [2025-01-05 13:21:17,160][19668] Updated weights for policy 0, policy_version 269276 (0.0014) [2025-01-05 13:21:19,172][19668] Updated weights for policy 0, policy_version 269286 (0.0015) [2025-01-05 13:21:19,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20753.1, 300 sec: 19744.2). Total num frames: 1103011840. Throughput: 0: 5172.4. Samples: 741172. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:19,965][19571] Avg episode reward: [(0, '10.509')] [2025-01-05 13:21:21,112][19668] Updated weights for policy 0, policy_version 269296 (0.0014) [2025-01-05 13:21:23,081][19668] Updated weights for policy 0, policy_version 269306 (0.0015) [2025-01-05 13:21:24,965][19571] Fps is (10 sec: 20889.6, 60 sec: 20753.0, 300 sec: 19767.6). Total num frames: 1103114240. Throughput: 0: 5168.7. Samples: 772252. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:24,965][19571] Avg episode reward: [(0, '9.686')] [2025-01-05 13:21:25,074][19668] Updated weights for policy 0, policy_version 269316 (0.0015) [2025-01-05 13:21:27,025][19668] Updated weights for policy 0, policy_version 269326 (0.0016) [2025-01-05 13:21:28,995][19668] Updated weights for policy 0, policy_version 269336 (0.0014) [2025-01-05 13:21:29,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20684.7, 300 sec: 19789.6). Total num frames: 1103216640. Throughput: 0: 5175.5. Samples: 803456. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:29,965][19571] Avg episode reward: [(0, '9.999')] [2025-01-05 13:21:31,023][19668] Updated weights for policy 0, policy_version 269346 (0.0015) [2025-01-05 13:21:32,990][19668] Updated weights for policy 0, policy_version 269356 (0.0015) [2025-01-05 13:21:34,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20616.5, 300 sec: 19810.2). Total num frames: 1103319040. Throughput: 0: 5170.0. Samples: 818866. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:34,965][19571] Avg episode reward: [(0, '9.862')] [2025-01-05 13:21:34,981][19668] Updated weights for policy 0, policy_version 269366 (0.0016) [2025-01-05 13:21:36,985][19668] Updated weights for policy 0, policy_version 269376 (0.0015) [2025-01-05 13:21:38,945][19668] Updated weights for policy 0, policy_version 269386 (0.0015) [2025-01-05 13:21:39,965][19571] Fps is (10 sec: 20889.7, 60 sec: 20684.8, 300 sec: 19853.4). 
Total num frames: 1103425536. Throughput: 0: 5163.1. Samples: 849860. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:39,965][19571] Avg episode reward: [(0, '10.341')] [2025-01-05 13:21:40,921][19668] Updated weights for policy 0, policy_version 269396 (0.0014) [2025-01-05 13:21:42,956][19668] Updated weights for policy 0, policy_version 269406 (0.0015) [2025-01-05 13:21:44,965][19571] Fps is (10 sec: 20480.3, 60 sec: 20616.5, 300 sec: 19848.0). Total num frames: 1103523840. Throughput: 0: 5155.4. Samples: 880438. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:44,965][19571] Avg episode reward: [(0, '10.446')] [2025-01-05 13:21:44,979][19668] Updated weights for policy 0, policy_version 269416 (0.0015) [2025-01-05 13:21:46,958][19668] Updated weights for policy 0, policy_version 269426 (0.0014) [2025-01-05 13:21:48,995][19668] Updated weights for policy 0, policy_version 269436 (0.0015) [2025-01-05 13:21:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20616.5, 300 sec: 19865.3). Total num frames: 1103626240. Throughput: 0: 5149.7. Samples: 895784. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:49,965][19571] Avg episode reward: [(0, '9.040')] [2025-01-05 13:21:51,020][19668] Updated weights for policy 0, policy_version 269446 (0.0015) [2025-01-05 13:21:52,989][19668] Updated weights for policy 0, policy_version 269456 (0.0014) [2025-01-05 13:21:54,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20548.3, 300 sec: 19881.7). Total num frames: 1103728640. Throughput: 0: 5136.9. Samples: 926258. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:54,965][19571] Avg episode reward: [(0, '9.941')] [2025-01-05 13:21:55,040][19668] Updated weights for policy 0, policy_version 269466 (0.0014) [2025-01-05 13:21:56,989][19668] Updated weights for policy 0, policy_version 269476 (0.0015) [2025-01-05 13:21:58,968][19668] Updated weights for policy 0, policy_version 269486 (0.0018) [2025-01-05 13:21:59,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20548.3, 300 sec: 19897.3). Total num frames: 1103831040. Throughput: 0: 5144.1. Samples: 957198. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:21:59,965][19571] Avg episode reward: [(0, '10.415')] [2025-01-05 13:22:01,014][19668] Updated weights for policy 0, policy_version 269496 (0.0014) [2025-01-05 13:22:02,975][19668] Updated weights for policy 0, policy_version 269506 (0.0014) [2025-01-05 13:22:04,949][19668] Updated weights for policy 0, policy_version 269516 (0.0014) [2025-01-05 13:22:04,965][19571] Fps is (10 sec: 20889.8, 60 sec: 20616.6, 300 sec: 19932.8). Total num frames: 1103937536. Throughput: 0: 5143.4. Samples: 972624. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:22:04,965][19571] Avg episode reward: [(0, '9.384')] [2025-01-05 13:22:06,988][19668] Updated weights for policy 0, policy_version 269526 (0.0015) [2025-01-05 13:22:08,959][19668] Updated weights for policy 0, policy_version 269536 (0.0014) [2025-01-05 13:22:09,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 19926.1). Total num frames: 1104035840. Throughput: 0: 5137.0. Samples: 1003418. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:22:09,965][19571] Avg episode reward: [(0, '10.219')] [2025-01-05 13:22:10,933][19668] Updated weights for policy 0, policy_version 269546 (0.0014) [2025-01-05 13:22:12,985][19668] Updated weights for policy 0, policy_version 269556 (0.0015) [2025-01-05 13:22:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20548.3, 300 sec: 19939.5). 
Total num frames: 1104138240. Throughput: 0: 5118.9. Samples: 1033806. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:22:14,965][19571] Avg episode reward: [(0, '11.005')] [2025-01-05 13:22:15,025][19668] Updated weights for policy 0, policy_version 269566 (0.0016) [2025-01-05 13:22:17,039][19668] Updated weights for policy 0, policy_version 269576 (0.0015) [2025-01-05 13:22:19,045][19668] Updated weights for policy 0, policy_version 269586 (0.0014) [2025-01-05 13:22:19,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 19952.2). Total num frames: 1104240640. Throughput: 0: 5114.3. Samples: 1049010. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:22:19,965][19571] Avg episode reward: [(0, '9.608')] [2025-01-05 13:22:20,998][19668] Updated weights for policy 0, policy_version 269596 (0.0014) [2025-01-05 13:22:23,021][19668] Updated weights for policy 0, policy_version 269606 (0.0017) [2025-01-05 13:22:24,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19964.4). Total num frames: 1104343040. Throughput: 0: 5105.0. Samples: 1079586. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:22:24,965][19571] Avg episode reward: [(0, '9.272')] [2025-01-05 13:22:25,103][19668] Updated weights for policy 0, policy_version 269616 (0.0016) [2025-01-05 13:22:27,084][19668] Updated weights for policy 0, policy_version 269626 (0.0017) [2025-01-05 13:22:29,082][19668] Updated weights for policy 0, policy_version 269636 (0.0015) [2025-01-05 13:22:29,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 19975.9). Total num frames: 1104445440. Throughput: 0: 5105.9. Samples: 1110206. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:29,965][19571] Avg episode reward: [(0, '9.200')] [2025-01-05 13:22:31,119][19668] Updated weights for policy 0, policy_version 269646 (0.0015) [2025-01-05 13:22:33,091][19668] Updated weights for policy 0, policy_version 269656 (0.0015) [2025-01-05 13:22:34,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20480.0, 300 sec: 19987.0). Total num frames: 1104547840. Throughput: 0: 5104.2. Samples: 1125474. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:34,965][19571] Avg episode reward: [(0, '10.087')] [2025-01-05 13:22:35,111][19668] Updated weights for policy 0, policy_version 269666 (0.0015) [2025-01-05 13:22:37,147][19668] Updated weights for policy 0, policy_version 269676 (0.0015) [2025-01-05 13:22:39,105][19668] Updated weights for policy 0, policy_version 269686 (0.0018) [2025-01-05 13:22:39,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20411.7, 300 sec: 19997.6). Total num frames: 1104650240. Throughput: 0: 5108.1. Samples: 1156124. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:39,965][19571] Avg episode reward: [(0, '10.266')] [2025-01-05 13:22:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000269690_1104650240.pth... [2025-01-05 13:22:40,018][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000268556_1100005376.pth [2025-01-05 13:22:41,129][19668] Updated weights for policy 0, policy_version 269696 (0.0015) [2025-01-05 13:22:43,146][19668] Updated weights for policy 0, policy_version 269706 (0.0015) [2025-01-05 13:22:44,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20480.0, 300 sec: 20007.8). Total num frames: 1104752640. Throughput: 0: 5104.3. Samples: 1186890. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:44,965][19571] Avg episode reward: [(0, '10.704')] [2025-01-05 13:22:45,099][19668] Updated weights for policy 0, policy_version 269716 (0.0016) [2025-01-05 13:22:47,142][19668] Updated weights for policy 0, policy_version 269726 (0.0015) [2025-01-05 13:22:49,154][19668] Updated weights for policy 0, policy_version 269736 (0.0016) [2025-01-05 13:22:49,965][19571] Fps is (10 sec: 20480.4, 60 sec: 20480.0, 300 sec: 20017.6). Total num frames: 1104855040. Throughput: 0: 5100.9. Samples: 1202164. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:49,965][19571] Avg episode reward: [(0, '10.261')] [2025-01-05 13:22:51,092][19668] Updated weights for policy 0, policy_version 269746 (0.0015) [2025-01-05 13:22:53,119][19668] Updated weights for policy 0, policy_version 269756 (0.0016) [2025-01-05 13:22:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20411.8, 300 sec: 20010.4). Total num frames: 1104953344. Throughput: 0: 5096.6. Samples: 1232762. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:54,965][19571] Avg episode reward: [(0, '10.739')] [2025-01-05 13:22:55,240][19668] Updated weights for policy 0, policy_version 269766 (0.0017) [2025-01-05 13:22:57,149][19668] Updated weights for policy 0, policy_version 269776 (0.0015) [2025-01-05 13:22:59,175][19668] Updated weights for policy 0, policy_version 269786 (0.0016) [2025-01-05 13:22:59,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20411.7, 300 sec: 20019.7). Total num frames: 1105055744. Throughput: 0: 5098.2. Samples: 1263226. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:22:59,965][19571] Avg episode reward: [(0, '10.424')] [2025-01-05 13:23:01,327][19668] Updated weights for policy 0, policy_version 269796 (0.0018) [2025-01-05 13:23:03,247][19668] Updated weights for policy 0, policy_version 269806 (0.0018) [2025-01-05 13:23:04,965][19571] Fps is (10 sec: 20479.6, 60 sec: 20343.4, 300 sec: 20028.6). 
Total num frames: 1105158144. Throughput: 0: 5091.8. Samples: 1278142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:04,965][19571] Avg episode reward: [(0, '11.131')] [2025-01-05 13:23:05,292][19668] Updated weights for policy 0, policy_version 269816 (0.0018) [2025-01-05 13:23:07,339][19668] Updated weights for policy 0, policy_version 269826 (0.0016) [2025-01-05 13:23:09,257][19668] Updated weights for policy 0, policy_version 269836 (0.0017) [2025-01-05 13:23:09,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20037.2). Total num frames: 1105260544. Throughput: 0: 5092.5. Samples: 1308750. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:09,965][19571] Avg episode reward: [(0, '9.069')] [2025-01-05 13:23:11,307][19668] Updated weights for policy 0, policy_version 269846 (0.0016) [2025-01-05 13:23:13,369][19668] Updated weights for policy 0, policy_version 269856 (0.0016) [2025-01-05 13:23:14,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20343.5, 300 sec: 20030.2). Total num frames: 1105358848. Throughput: 0: 5086.6. Samples: 1339100. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:14,965][19571] Avg episode reward: [(0, '9.704')] [2025-01-05 13:23:15,360][19668] Updated weights for policy 0, policy_version 269866 (0.0016) [2025-01-05 13:23:17,390][19668] Updated weights for policy 0, policy_version 269876 (0.0015) [2025-01-05 13:23:19,420][19668] Updated weights for policy 0, policy_version 269886 (0.0015) [2025-01-05 13:23:19,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20343.5, 300 sec: 20038.4). Total num frames: 1105461248. Throughput: 0: 5087.9. Samples: 1354430. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:19,965][19571] Avg episode reward: [(0, '9.406')] [2025-01-05 13:23:21,418][19668] Updated weights for policy 0, policy_version 269896 (0.0016) [2025-01-05 13:23:23,432][19668] Updated weights for policy 0, policy_version 269906 (0.0014) [2025-01-05 13:23:24,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20343.4, 300 sec: 20046.4). Total num frames: 1105563648. Throughput: 0: 5083.4. Samples: 1384876. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:24,965][19571] Avg episode reward: [(0, '9.797')] [2025-01-05 13:23:25,468][19668] Updated weights for policy 0, policy_version 269916 (0.0015) [2025-01-05 13:23:27,402][19668] Updated weights for policy 0, policy_version 269926 (0.0015) [2025-01-05 13:23:29,432][19668] Updated weights for policy 0, policy_version 269936 (0.0015) [2025-01-05 13:23:29,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20343.5, 300 sec: 20054.1). Total num frames: 1105666048. Throughput: 0: 5081.2. Samples: 1415544. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:29,965][19571] Avg episode reward: [(0, '9.680')] [2025-01-05 13:23:31,516][19668] Updated weights for policy 0, policy_version 269946 (0.0016) [2025-01-05 13:23:33,467][19668] Updated weights for policy 0, policy_version 269956 (0.0014) [2025-01-05 13:23:34,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20343.5, 300 sec: 20061.5). Total num frames: 1105768448. Throughput: 0: 5078.2. Samples: 1430682. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:23:34,965][19571] Avg episode reward: [(0, '9.629')] [2025-01-05 13:23:35,528][19668] Updated weights for policy 0, policy_version 269966 (0.0016) [2025-01-05 13:23:37,558][19668] Updated weights for policy 0, policy_version 269976 (0.0015) [2025-01-05 13:23:39,532][19668] Updated weights for policy 0, policy_version 269986 (0.0015) [2025-01-05 13:23:39,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20275.2, 300 sec: 20054.6). 
Total num frames: 1105866752. Throughput: 0: 5075.0. Samples: 1461140. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:23:39,965][19571] Avg episode reward: [(0, '10.051')] [2025-01-05 13:23:41,577][19668] Updated weights for policy 0, policy_version 269996 (0.0015) [2025-01-05 13:23:43,608][19668] Updated weights for policy 0, policy_version 270006 (0.0016) [2025-01-05 13:23:44,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20275.1, 300 sec: 20216.2). Total num frames: 1105969152. Throughput: 0: 5071.6. Samples: 1491450. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:23:44,965][19571] Avg episode reward: [(0, '11.197')] [2025-01-05 13:23:45,621][19668] Updated weights for policy 0, policy_version 270016 (0.0016) [2025-01-05 13:23:47,654][19668] Updated weights for policy 0, policy_version 270026 (0.0015) [2025-01-05 13:23:49,665][19668] Updated weights for policy 0, policy_version 270036 (0.0015) [2025-01-05 13:23:49,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 20424.5). Total num frames: 1106071552. Throughput: 0: 5078.1. Samples: 1506658. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:23:49,965][19571] Avg episode reward: [(0, '10.430')] [2025-01-05 13:23:51,611][19668] Updated weights for policy 0, policy_version 270046 (0.0016) [2025-01-05 13:23:53,641][19668] Updated weights for policy 0, policy_version 270056 (0.0016) [2025-01-05 13:23:54,965][19571] Fps is (10 sec: 20480.3, 60 sec: 20343.4, 300 sec: 20438.4). Total num frames: 1106173952. Throughput: 0: 5081.3. Samples: 1537406. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:23:54,966][19571] Avg episode reward: [(0, '9.891')] [2025-01-05 13:23:55,782][19668] Updated weights for policy 0, policy_version 270066 (0.0016) [2025-01-05 13:23:57,712][19668] Updated weights for policy 0, policy_version 270076 (0.0019) [2025-01-05 13:23:59,754][19668] Updated weights for policy 0, policy_version 270086 (0.0016) [2025-01-05 13:23:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20275.2, 300 sec: 20438.4). Total num frames: 1106272256. Throughput: 0: 5077.4. Samples: 1567586. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:23:59,965][19571] Avg episode reward: [(0, '11.331')] [2025-01-05 13:24:01,843][19668] Updated weights for policy 0, policy_version 270096 (0.0015) [2025-01-05 13:24:03,763][19668] Updated weights for policy 0, policy_version 270106 (0.0016) [2025-01-05 13:24:04,965][19571] Fps is (10 sec: 20480.3, 60 sec: 20343.5, 300 sec: 20480.0). Total num frames: 1106378752. Throughput: 0: 5071.9. Samples: 1582666. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:04,965][19571] Avg episode reward: [(0, '9.768')] [2025-01-05 13:24:05,802][19668] Updated weights for policy 0, policy_version 270116 (0.0015) [2025-01-05 13:24:07,834][19668] Updated weights for policy 0, policy_version 270126 (0.0016) [2025-01-05 13:24:09,808][19668] Updated weights for policy 0, policy_version 270136 (0.0018) [2025-01-05 13:24:09,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20275.2, 300 sec: 20466.1). Total num frames: 1106477056. Throughput: 0: 5074.5. Samples: 1613228. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:09,966][19571] Avg episode reward: [(0, '9.924')] [2025-01-05 13:24:11,894][19668] Updated weights for policy 0, policy_version 270146 (0.0015) [2025-01-05 13:24:13,946][19668] Updated weights for policy 0, policy_version 270156 (0.0018) [2025-01-05 13:24:14,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20343.4, 300 sec: 20507.8). 
Total num frames: 1106579456. Throughput: 0: 5061.4. Samples: 1643308. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:14,965][19571] Avg episode reward: [(0, '9.626')] [2025-01-05 13:24:15,935][19668] Updated weights for policy 0, policy_version 270166 (0.0016) [2025-01-05 13:24:17,981][19668] Updated weights for policy 0, policy_version 270176 (0.0015) [2025-01-05 13:24:19,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20275.2, 300 sec: 20535.5). Total num frames: 1106677760. Throughput: 0: 5062.5. Samples: 1658496. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:19,965][19571] Avg episode reward: [(0, '8.853')] [2025-01-05 13:24:20,064][19668] Updated weights for policy 0, policy_version 270186 (0.0015) [2025-01-05 13:24:22,003][19668] Updated weights for policy 0, policy_version 270196 (0.0015) [2025-01-05 13:24:24,045][19668] Updated weights for policy 0, policy_version 270206 (0.0016) [2025-01-05 13:24:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20275.2, 300 sec: 20549.4). Total num frames: 1106780160. Throughput: 0: 5061.4. Samples: 1688902. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:24,965][19571] Avg episode reward: [(0, '9.744')] [2025-01-05 13:24:26,187][19668] Updated weights for policy 0, policy_version 270216 (0.0017) [2025-01-05 13:24:28,126][19668] Updated weights for policy 0, policy_version 270226 (0.0017) [2025-01-05 13:24:29,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20275.2, 300 sec: 20563.3). Total num frames: 1106882560. Throughput: 0: 5057.3. Samples: 1719028. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:29,965][19571] Avg episode reward: [(0, '10.046')] [2025-01-05 13:24:30,185][19668] Updated weights for policy 0, policy_version 270236 (0.0015) [2025-01-05 13:24:32,232][19668] Updated weights for policy 0, policy_version 270246 (0.0014) [2025-01-05 13:24:34,178][19668] Updated weights for policy 0, policy_version 270256 (0.0016) [2025-01-05 13:24:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20207.0, 300 sec: 20535.5). Total num frames: 1106980864. Throughput: 0: 5056.5. Samples: 1734200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:34,965][19571] Avg episode reward: [(0, '9.340')] [2025-01-05 13:24:36,224][19668] Updated weights for policy 0, policy_version 270266 (0.0015) [2025-01-05 13:24:38,285][19668] Updated weights for policy 0, policy_version 270276 (0.0015) [2025-01-05 13:24:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20275.2, 300 sec: 20521.7). Total num frames: 1107083264. Throughput: 0: 5045.0. Samples: 1764432. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:39,965][19571] Avg episode reward: [(0, '10.193')] [2025-01-05 13:24:39,970][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000270284_1107083264.pth... [2025-01-05 13:24:40,021][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000269088_1102184448.pth [2025-01-05 13:24:40,376][19668] Updated weights for policy 0, policy_version 270286 (0.0016) [2025-01-05 13:24:42,422][19668] Updated weights for policy 0, policy_version 270296 (0.0015) [2025-01-05 13:24:44,453][19668] Updated weights for policy 0, policy_version 270306 (0.0016) [2025-01-05 13:24:44,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20207.0, 300 sec: 20493.9). Total num frames: 1107181568. Throughput: 0: 5039.6. Samples: 1794366. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:44,965][19571] Avg episode reward: [(0, '9.393')] [2025-01-05 13:24:46,533][19668] Updated weights for policy 0, policy_version 270316 (0.0017) [2025-01-05 13:24:48,572][19668] Updated weights for policy 0, policy_version 270326 (0.0016) [2025-01-05 13:24:49,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20138.7, 300 sec: 20466.1). Total num frames: 1107279872. Throughput: 0: 5033.0. Samples: 1809152. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 13:24:49,965][19571] Avg episode reward: [(0, '8.560')] [2025-01-05 13:24:50,706][19668] Updated weights for policy 0, policy_version 270336 (0.0018) [2025-01-05 13:24:52,689][19668] Updated weights for policy 0, policy_version 270346 (0.0015) [2025-01-05 13:24:54,711][19668] Updated weights for policy 0, policy_version 270356 (0.0017) [2025-01-05 13:24:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20452.2). Total num frames: 1107382272. Throughput: 0: 5021.3. Samples: 1839184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:24:54,965][19571] Avg episode reward: [(0, '8.717')] [2025-01-05 13:24:56,859][19668] Updated weights for policy 0, policy_version 270366 (0.0016) [2025-01-05 13:24:58,846][19668] Updated weights for policy 0, policy_version 270376 (0.0016) [2025-01-05 13:24:59,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20438.3). Total num frames: 1107480576. Throughput: 0: 5016.7. Samples: 1869060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:24:59,965][19571] Avg episode reward: [(0, '9.913')] [2025-01-05 13:25:00,876][19668] Updated weights for policy 0, policy_version 270386 (0.0014) [2025-01-05 13:25:02,947][19668] Updated weights for policy 0, policy_version 270396 (0.0016) [2025-01-05 13:25:04,965][19571] Fps is (10 sec: 19661.1, 60 sec: 20002.1, 300 sec: 20410.6). Total num frames: 1107578880. Throughput: 0: 5014.9. Samples: 1884168. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:04,965][19571] Avg episode reward: [(0, '9.786')] [2025-01-05 13:25:05,010][19668] Updated weights for policy 0, policy_version 270406 (0.0017) [2025-01-05 13:25:07,033][19668] Updated weights for policy 0, policy_version 270416 (0.0015) [2025-01-05 13:25:09,119][19668] Updated weights for policy 0, policy_version 270426 (0.0016) [2025-01-05 13:25:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.5, 300 sec: 20410.6). Total num frames: 1107681280. Throughput: 0: 5000.8. Samples: 1913940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:09,965][19571] Avg episode reward: [(0, '10.164')] [2025-01-05 13:25:11,167][19668] Updated weights for policy 0, policy_version 270436 (0.0016) [2025-01-05 13:25:13,193][19668] Updated weights for policy 0, policy_version 270446 (0.0016) [2025-01-05 13:25:14,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20002.1, 300 sec: 20382.8). Total num frames: 1107779584. Throughput: 0: 4993.9. Samples: 1943756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:14,965][19571] Avg episode reward: [(0, '10.043')] [2025-01-05 13:25:15,321][19668] Updated weights for policy 0, policy_version 270456 (0.0017) [2025-01-05 13:25:17,318][19668] Updated weights for policy 0, policy_version 270466 (0.0015) [2025-01-05 13:25:19,322][19668] Updated weights for policy 0, policy_version 270476 (0.0015) [2025-01-05 13:25:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20382.8). Total num frames: 1107881984. Throughput: 0: 4993.1. Samples: 1958890. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:19,965][19571] Avg episode reward: [(0, '10.325')] [2025-01-05 13:25:21,361][19668] Updated weights for policy 0, policy_version 270486 (0.0017) [2025-01-05 13:25:23,328][19668] Updated weights for policy 0, policy_version 270496 (0.0016) [2025-01-05 13:25:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 20355.0). 
Total num frames: 1107980288. Throughput: 0: 5001.5. Samples: 1989500. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:24,965][19571] Avg episode reward: [(0, '9.771')] [2025-01-05 13:25:25,353][19668] Updated weights for policy 0, policy_version 270506 (0.0015) [2025-01-05 13:25:27,394][19668] Updated weights for policy 0, policy_version 270516 (0.0015) [2025-01-05 13:25:29,352][19668] Updated weights for policy 0, policy_version 270526 (0.0015) [2025-01-05 13:25:29,965][19571] Fps is (10 sec: 20479.4, 60 sec: 20070.3, 300 sec: 20355.0). Total num frames: 1108086784. Throughput: 0: 5015.8. Samples: 2020078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:29,966][19571] Avg episode reward: [(0, '9.434')] [2025-01-05 13:25:31,382][19668] Updated weights for policy 0, policy_version 270536 (0.0017) [2025-01-05 13:25:33,404][19668] Updated weights for policy 0, policy_version 270546 (0.0015) [2025-01-05 13:25:34,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20341.2). Total num frames: 1108185088. Throughput: 0: 5028.3. Samples: 2035424. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:34,965][19571] Avg episode reward: [(0, '10.855')] [2025-01-05 13:25:35,411][19668] Updated weights for policy 0, policy_version 270556 (0.0016) [2025-01-05 13:25:37,452][19668] Updated weights for policy 0, policy_version 270566 (0.0015) [2025-01-05 13:25:39,464][19668] Updated weights for policy 0, policy_version 270576 (0.0015) [2025-01-05 13:25:39,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.3, 300 sec: 20341.1). Total num frames: 1108287488. Throughput: 0: 5034.5. Samples: 2065738. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:39,966][19571] Avg episode reward: [(0, '10.388')] [2025-01-05 13:25:41,418][19668] Updated weights for policy 0, policy_version 270586 (0.0017) [2025-01-05 13:25:43,468][19668] Updated weights for policy 0, policy_version 270596 (0.0015) [2025-01-05 13:25:44,965][19571] Fps is (10 sec: 20479.4, 60 sec: 20138.6, 300 sec: 20341.1). Total num frames: 1108389888. Throughput: 0: 5045.7. Samples: 2096116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:44,966][19571] Avg episode reward: [(0, '9.866')] [2025-01-05 13:25:45,563][19668] Updated weights for policy 0, policy_version 270606 (0.0019) [2025-01-05 13:25:47,511][19668] Updated weights for policy 0, policy_version 270616 (0.0014) [2025-01-05 13:25:49,521][19668] Updated weights for policy 0, policy_version 270626 (0.0015) [2025-01-05 13:25:49,965][19571] Fps is (10 sec: 20480.5, 60 sec: 20207.0, 300 sec: 20327.3). Total num frames: 1108492288. Throughput: 0: 5049.0. Samples: 2111374. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:49,965][19571] Avg episode reward: [(0, '10.271')] [2025-01-05 13:25:51,572][19668] Updated weights for policy 0, policy_version 270636 (0.0014) [2025-01-05 13:25:53,508][19668] Updated weights for policy 0, policy_version 270646 (0.0015) [2025-01-05 13:25:54,965][19571] Fps is (10 sec: 20480.5, 60 sec: 20207.0, 300 sec: 20327.3). Total num frames: 1108594688. Throughput: 0: 5069.2. Samples: 2142054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:54,965][19571] Avg episode reward: [(0, '10.208')] [2025-01-05 13:25:55,531][19668] Updated weights for policy 0, policy_version 270656 (0.0015) [2025-01-05 13:25:57,558][19668] Updated weights for policy 0, policy_version 270666 (0.0015) [2025-01-05 13:25:59,522][19668] Updated weights for policy 0, policy_version 270676 (0.0016) [2025-01-05 13:25:59,965][19571] Fps is (10 sec: 20479.2, 60 sec: 20275.1, 300 sec: 20327.2). 
Total num frames: 1108697088. Throughput: 0: 5089.7. Samples: 2172794. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:25:59,966][19571] Avg episode reward: [(0, '9.619')] [2025-01-05 13:26:01,541][19668] Updated weights for policy 0, policy_version 270686 (0.0014) [2025-01-05 13:26:03,558][19668] Updated weights for policy 0, policy_version 270696 (0.0015) [2025-01-05 13:26:04,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20343.4, 300 sec: 20313.4). Total num frames: 1108799488. Throughput: 0: 5094.3. Samples: 2188136. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:04,965][19571] Avg episode reward: [(0, '9.153')] [2025-01-05 13:26:05,530][19668] Updated weights for policy 0, policy_version 270706 (0.0015) [2025-01-05 13:26:07,564][19668] Updated weights for policy 0, policy_version 270716 (0.0015) [2025-01-05 13:26:09,555][19668] Updated weights for policy 0, policy_version 270726 (0.0015) [2025-01-05 13:26:09,965][19571] Fps is (10 sec: 20480.9, 60 sec: 20343.5, 300 sec: 20327.3). Total num frames: 1108901888. Throughput: 0: 5095.5. Samples: 2218798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:09,965][19571] Avg episode reward: [(0, '10.020')] [2025-01-05 13:26:11,536][19668] Updated weights for policy 0, policy_version 270736 (0.0016) [2025-01-05 13:26:13,596][19668] Updated weights for policy 0, policy_version 270746 (0.0015) [2025-01-05 13:26:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20343.5, 300 sec: 20299.5). Total num frames: 1109000192. Throughput: 0: 5090.7. Samples: 2249160. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:14,965][19571] Avg episode reward: [(0, '10.556')] [2025-01-05 13:26:15,671][19668] Updated weights for policy 0, policy_version 270756 (0.0016) [2025-01-05 13:26:17,662][19668] Updated weights for policy 0, policy_version 270766 (0.0016) [2025-01-05 13:26:19,689][19668] Updated weights for policy 0, policy_version 270776 (0.0015) [2025-01-05 13:26:19,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20343.4, 300 sec: 20299.5). Total num frames: 1109102592. Throughput: 0: 5082.7. Samples: 2264146. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:19,965][19571] Avg episode reward: [(0, '10.206')] [2025-01-05 13:26:21,778][19668] Updated weights for policy 0, policy_version 270786 (0.0016) [2025-01-05 13:26:23,731][19668] Updated weights for policy 0, policy_version 270796 (0.0015) [2025-01-05 13:26:24,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20411.7, 300 sec: 20299.5). Total num frames: 1109204992. Throughput: 0: 5085.0. Samples: 2294560. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:24,965][19571] Avg episode reward: [(0, '9.891')] [2025-01-05 13:26:25,764][19668] Updated weights for policy 0, policy_version 270806 (0.0016) [2025-01-05 13:26:27,790][19668] Updated weights for policy 0, policy_version 270816 (0.0015) [2025-01-05 13:26:29,827][19668] Updated weights for policy 0, policy_version 270826 (0.0016) [2025-01-05 13:26:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20275.3, 300 sec: 20285.6). Total num frames: 1109303296. Throughput: 0: 5083.0. Samples: 2324852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:29,965][19571] Avg episode reward: [(0, '10.126')] [2025-01-05 13:26:31,913][19668] Updated weights for policy 0, policy_version 270836 (0.0016) [2025-01-05 13:26:33,961][19668] Updated weights for policy 0, policy_version 270846 (0.0016) [2025-01-05 13:26:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 20275.2, 300 sec: 20257.9). 
Total num frames: 1109401600. Throughput: 0: 5076.8. Samples: 2339832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:34,965][19571] Avg episode reward: [(0, '9.047')] [2025-01-05 13:26:35,986][19668] Updated weights for policy 0, policy_version 270856 (0.0016) [2025-01-05 13:26:38,027][19668] Updated weights for policy 0, policy_version 270866 (0.0018) [2025-01-05 13:26:39,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20275.2, 300 sec: 20271.7). Total num frames: 1109504000. Throughput: 0: 5060.8. Samples: 2369790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:39,966][19571] Avg episode reward: [(0, '9.757')] [2025-01-05 13:26:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000270875_1109504000.pth... [2025-01-05 13:26:40,027][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000269690_1104650240.pth [2025-01-05 13:26:40,134][19668] Updated weights for policy 0, policy_version 270876 (0.0016) [2025-01-05 13:26:42,144][19668] Updated weights for policy 0, policy_version 270886 (0.0016) [2025-01-05 13:26:44,178][19668] Updated weights for policy 0, policy_version 270896 (0.0015) [2025-01-05 13:26:44,965][19571] Fps is (10 sec: 20069.7, 60 sec: 20206.9, 300 sec: 20257.8). Total num frames: 1109602304. Throughput: 0: 5045.5. Samples: 2399842. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:44,965][19571] Avg episode reward: [(0, '9.321')] [2025-01-05 13:26:46,277][19668] Updated weights for policy 0, policy_version 270906 (0.0017) [2025-01-05 13:26:48,271][19668] Updated weights for policy 0, policy_version 270916 (0.0015) [2025-01-05 13:26:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20206.9, 300 sec: 20257.8). Total num frames: 1109704704. Throughput: 0: 5037.7. Samples: 2414834. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:49,965][19571] Avg episode reward: [(0, '11.132')] [2025-01-05 13:26:50,297][19668] Updated weights for policy 0, policy_version 270926 (0.0015) [2025-01-05 13:26:52,329][19668] Updated weights for policy 0, policy_version 270936 (0.0015) [2025-01-05 13:26:54,336][19668] Updated weights for policy 0, policy_version 270946 (0.0015) [2025-01-05 13:26:54,965][19571] Fps is (10 sec: 20480.4, 60 sec: 20206.9, 300 sec: 20257.8). Total num frames: 1109807104. Throughput: 0: 5033.1. Samples: 2445290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:54,965][19571] Avg episode reward: [(0, '9.920')] [2025-01-05 13:26:56,327][19668] Updated weights for policy 0, policy_version 270956 (0.0015) [2025-01-05 13:26:58,346][19668] Updated weights for policy 0, policy_version 270966 (0.0015) [2025-01-05 13:26:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.8, 300 sec: 20230.1). Total num frames: 1109905408. Throughput: 0: 5031.5. Samples: 2475576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:26:59,966][19571] Avg episode reward: [(0, '10.725')] [2025-01-05 13:27:00,440][19668] Updated weights for policy 0, policy_version 270976 (0.0016) [2025-01-05 13:27:02,455][19668] Updated weights for policy 0, policy_version 270986 (0.0015) [2025-01-05 13:27:04,495][19668] Updated weights for policy 0, policy_version 270996 (0.0015) [2025-01-05 13:27:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20244.0). Total num frames: 1110007808. Throughput: 0: 5033.0. Samples: 2490630. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:04,965][19571] Avg episode reward: [(0, '8.978')] [2025-01-05 13:27:06,543][19668] Updated weights for policy 0, policy_version 271006 (0.0015) [2025-01-05 13:27:08,571][19668] Updated weights for policy 0, policy_version 271016 (0.0015) [2025-01-05 13:27:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20230.1). 
Total num frames: 1110106112. Throughput: 0: 5023.2. Samples: 2520604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:09,966][19571] Avg episode reward: [(0, '9.741')] [2025-01-05 13:27:10,733][19668] Updated weights for policy 0, policy_version 271026 (0.0016) [2025-01-05 13:27:12,738][19668] Updated weights for policy 0, policy_version 271036 (0.0015) [2025-01-05 13:27:14,728][19668] Updated weights for policy 0, policy_version 271046 (0.0016) [2025-01-05 13:27:14,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20070.4, 300 sec: 20216.2). Total num frames: 1110204416. Throughput: 0: 5017.7. Samples: 2550650. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:14,965][19571] Avg episode reward: [(0, '10.159')] [2025-01-05 13:27:16,776][19668] Updated weights for policy 0, policy_version 271056 (0.0015) [2025-01-05 13:27:18,759][19668] Updated weights for policy 0, policy_version 271066 (0.0015) [2025-01-05 13:27:19,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20138.7, 300 sec: 20230.1). Total num frames: 1110310912. Throughput: 0: 5024.6. Samples: 2565942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:19,965][19571] Avg episode reward: [(0, '10.758')] [2025-01-05 13:27:20,739][19668] Updated weights for policy 0, policy_version 271076 (0.0016) [2025-01-05 13:27:22,786][19668] Updated weights for policy 0, policy_version 271086 (0.0015) [2025-01-05 13:27:24,865][19668] Updated weights for policy 0, policy_version 271096 (0.0016) [2025-01-05 13:27:24,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20070.4, 300 sec: 20216.2). Total num frames: 1110409216. Throughput: 0: 5039.5. Samples: 2596566. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:24,965][19571] Avg episode reward: [(0, '9.801')] [2025-01-05 13:27:26,942][19668] Updated weights for policy 0, policy_version 271106 (0.0015) [2025-01-05 13:27:28,975][19668] Updated weights for policy 0, policy_version 271116 (0.0015) [2025-01-05 13:27:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20070.4, 300 sec: 20202.3). Total num frames: 1110507520. Throughput: 0: 5026.7. Samples: 2626044. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:29,965][19571] Avg episode reward: [(0, '10.662')] [2025-01-05 13:27:31,082][19668] Updated weights for policy 0, policy_version 271126 (0.0015) [2025-01-05 13:27:33,077][19668] Updated weights for policy 0, policy_version 271136 (0.0015) [2025-01-05 13:27:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.6, 300 sec: 20202.3). Total num frames: 1110609920. Throughput: 0: 5027.0. Samples: 2641050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:34,965][19571] Avg episode reward: [(0, '9.870')] [2025-01-05 13:27:35,119][19668] Updated weights for policy 0, policy_version 271146 (0.0015) [2025-01-05 13:27:37,135][19668] Updated weights for policy 0, policy_version 271156 (0.0014) [2025-01-05 13:27:39,125][19668] Updated weights for policy 0, policy_version 271166 (0.0015) [2025-01-05 13:27:39,965][19571] Fps is (10 sec: 20480.3, 60 sec: 20138.7, 300 sec: 20202.3). Total num frames: 1110712320. Throughput: 0: 5028.9. Samples: 2671590. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:39,965][19571] Avg episode reward: [(0, '9.556')] [2025-01-05 13:27:41,173][19668] Updated weights for policy 0, policy_version 271176 (0.0015) [2025-01-05 13:27:43,181][19668] Updated weights for policy 0, policy_version 271186 (0.0016) [2025-01-05 13:27:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20188.4). Total num frames: 1110810624. Throughput: 0: 5033.3. Samples: 2702074. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:44,965][19571] Avg episode reward: [(0, '10.077')] [2025-01-05 13:27:45,154][19668] Updated weights for policy 0, policy_version 271196 (0.0016) [2025-01-05 13:27:47,201][19668] Updated weights for policy 0, policy_version 271206 (0.0015) [2025-01-05 13:27:49,207][19668] Updated weights for policy 0, policy_version 271216 (0.0016) [2025-01-05 13:27:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20202.3). Total num frames: 1110913024. Throughput: 0: 5037.7. Samples: 2717326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:49,965][19571] Avg episode reward: [(0, '10.285')] [2025-01-05 13:27:51,186][19668] Updated weights for policy 0, policy_version 271226 (0.0015) [2025-01-05 13:27:53,220][19668] Updated weights for policy 0, policy_version 271236 (0.0014) [2025-01-05 13:27:54,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20138.7, 300 sec: 20202.3). Total num frames: 1111015424. Throughput: 0: 5051.2. Samples: 2747906. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:54,965][19571] Avg episode reward: [(0, '10.462')] [2025-01-05 13:27:55,304][19668] Updated weights for policy 0, policy_version 271246 (0.0016) [2025-01-05 13:27:57,302][19668] Updated weights for policy 0, policy_version 271256 (0.0015) [2025-01-05 13:27:59,311][19668] Updated weights for policy 0, policy_version 271266 (0.0014) [2025-01-05 13:27:59,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20207.0, 300 sec: 20202.3). Total num frames: 1111117824. Throughput: 0: 5057.1. Samples: 2778218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:27:59,965][19571] Avg episode reward: [(0, '9.833')] [2025-01-05 13:28:01,328][19668] Updated weights for policy 0, policy_version 271276 (0.0015) [2025-01-05 13:28:03,351][19668] Updated weights for policy 0, policy_version 271286 (0.0015) [2025-01-05 13:28:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20188.4). 
Total num frames: 1111216128. Throughput: 0: 5054.8. Samples: 2793410. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:04,965][19571] Avg episode reward: [(0, '9.633')] [2025-01-05 13:28:05,438][19668] Updated weights for policy 0, policy_version 271296 (0.0017) [2025-01-05 13:28:07,437][19668] Updated weights for policy 0, policy_version 271306 (0.0014) [2025-01-05 13:28:09,427][19668] Updated weights for policy 0, policy_version 271316 (0.0014) [2025-01-05 13:28:09,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 20202.3). Total num frames: 1111318528. Throughput: 0: 5046.8. Samples: 2823674. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:09,965][19571] Avg episode reward: [(0, '10.000')] [2025-01-05 13:28:11,441][19668] Updated weights for policy 0, policy_version 271326 (0.0015) [2025-01-05 13:28:13,453][19668] Updated weights for policy 0, policy_version 271336 (0.0015) [2025-01-05 13:28:14,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20275.2, 300 sec: 20202.3). Total num frames: 1111420928. Throughput: 0: 5071.6. Samples: 2854266. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:14,965][19571] Avg episode reward: [(0, '9.410')] [2025-01-05 13:28:15,441][19668] Updated weights for policy 0, policy_version 271346 (0.0016) [2025-01-05 13:28:17,483][19668] Updated weights for policy 0, policy_version 271356 (0.0015) [2025-01-05 13:28:19,470][19668] Updated weights for policy 0, policy_version 271366 (0.0017) [2025-01-05 13:28:19,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20202.3). Total num frames: 1111523328. Throughput: 0: 5076.8. Samples: 2869508. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:19,965][19571] Avg episode reward: [(0, '11.063')] [2025-01-05 13:28:21,449][19668] Updated weights for policy 0, policy_version 271376 (0.0016) [2025-01-05 13:28:23,491][19668] Updated weights for policy 0, policy_version 271386 (0.0015) [2025-01-05 13:28:24,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 20202.3). Total num frames: 1111625728. Throughput: 0: 5078.7. Samples: 2900134. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:24,965][19571] Avg episode reward: [(0, '9.803')] [2025-01-05 13:28:25,595][19668] Updated weights for policy 0, policy_version 271396 (0.0016) [2025-01-05 13:28:27,562][19668] Updated weights for policy 0, policy_version 271406 (0.0015) [2025-01-05 13:28:29,599][19668] Updated weights for policy 0, policy_version 271416 (0.0015) [2025-01-05 13:28:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20275.2, 300 sec: 20188.4). Total num frames: 1111724032. Throughput: 0: 5071.8. Samples: 2930304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:29,965][19571] Avg episode reward: [(0, '10.354')] [2025-01-05 13:28:31,716][19668] Updated weights for policy 0, policy_version 271426 (0.0016) [2025-01-05 13:28:33,725][19668] Updated weights for policy 0, policy_version 271436 (0.0016) [2025-01-05 13:28:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20275.2, 300 sec: 20202.3). Total num frames: 1111826432. Throughput: 0: 5061.3. Samples: 2945086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:34,965][19571] Avg episode reward: [(0, '9.388')] [2025-01-05 13:28:35,778][19668] Updated weights for policy 0, policy_version 271446 (0.0016) [2025-01-05 13:28:37,826][19668] Updated weights for policy 0, policy_version 271456 (0.0016) [2025-01-05 13:28:39,887][19668] Updated weights for policy 0, policy_version 271466 (0.0017) [2025-01-05 13:28:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20206.9, 300 sec: 20188.4). 
Total num frames: 1111924736. Throughput: 0: 5047.9. Samples: 2975060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:39,965][19571] Avg episode reward: [(0, '10.037')] [2025-01-05 13:28:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000271466_1111924736.pth... [2025-01-05 13:28:40,031][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000270284_1107083264.pth [2025-01-05 13:28:42,048][19668] Updated weights for policy 0, policy_version 271476 (0.0016) [2025-01-05 13:28:44,056][19668] Updated weights for policy 0, policy_version 271486 (0.0015) [2025-01-05 13:28:44,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20206.9, 300 sec: 20174.5). Total num frames: 1112023040. Throughput: 0: 5034.8. Samples: 3004786. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:44,965][19571] Avg episode reward: [(0, '9.960')] [2025-01-05 13:28:46,039][19668] Updated weights for policy 0, policy_version 271496 (0.0015) [2025-01-05 13:28:48,093][19668] Updated weights for policy 0, policy_version 271506 (0.0016) [2025-01-05 13:28:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 20174.5). Total num frames: 1112125440. Throughput: 0: 5035.2. Samples: 3019994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:49,965][19571] Avg episode reward: [(0, '9.984')] [2025-01-05 13:28:50,164][19668] Updated weights for policy 0, policy_version 271516 (0.0016) [2025-01-05 13:28:52,159][19668] Updated weights for policy 0, policy_version 271526 (0.0015) [2025-01-05 13:28:54,212][19668] Updated weights for policy 0, policy_version 271536 (0.0015) [2025-01-05 13:28:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20174.5). Total num frames: 1112223744. Throughput: 0: 5033.5. Samples: 3050182. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:54,965][19571] Avg episode reward: [(0, '9.918')] [2025-01-05 13:28:56,345][19668] Updated weights for policy 0, policy_version 271546 (0.0016) [2025-01-05 13:28:58,331][19668] Updated weights for policy 0, policy_version 271556 (0.0015) [2025-01-05 13:28:59,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1112322048. Throughput: 0: 5016.5. Samples: 3080008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:28:59,965][19571] Avg episode reward: [(0, '9.474')] [2025-01-05 13:29:00,388][19668] Updated weights for policy 0, policy_version 271566 (0.0016) [2025-01-05 13:29:02,451][19668] Updated weights for policy 0, policy_version 271576 (0.0015) [2025-01-05 13:29:04,446][19668] Updated weights for policy 0, policy_version 271586 (0.0015) [2025-01-05 13:29:04,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20160.7). Total num frames: 1112424448. Throughput: 0: 5010.4. Samples: 3094976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:04,965][19571] Avg episode reward: [(0, '8.891')] [2025-01-05 13:29:06,502][19668] Updated weights for policy 0, policy_version 271596 (0.0014) [2025-01-05 13:29:08,573][19668] Updated weights for policy 0, policy_version 271606 (0.0015) [2025-01-05 13:29:09,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1112522752. Throughput: 0: 5000.4. Samples: 3125152. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:09,965][19571] Avg episode reward: [(0, '10.107')] [2025-01-05 13:29:10,627][19668] Updated weights for policy 0, policy_version 271616 (0.0016) [2025-01-05 13:29:12,674][19668] Updated weights for policy 0, policy_version 271626 (0.0016) [2025-01-05 13:29:14,743][19668] Updated weights for policy 0, policy_version 271636 (0.0015) [2025-01-05 13:29:14,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20160.7). 
Total num frames: 1112625152. Throughput: 0: 4995.5. Samples: 3155100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:14,965][19571] Avg episode reward: [(0, '10.428')] [2025-01-05 13:29:16,815][19668] Updated weights for policy 0, policy_version 271646 (0.0019) [2025-01-05 13:29:18,885][19668] Updated weights for policy 0, policy_version 271656 (0.0016) [2025-01-05 13:29:19,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20002.2, 300 sec: 20146.8). Total num frames: 1112723456. Throughput: 0: 4989.1. Samples: 3169594. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:19,965][19571] Avg episode reward: [(0, '9.182')] [2025-01-05 13:29:21,022][19668] Updated weights for policy 0, policy_version 271666 (0.0016) [2025-01-05 13:29:23,005][19668] Updated weights for policy 0, policy_version 271676 (0.0015) [2025-01-05 13:29:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1112821760. Throughput: 0: 4984.4. Samples: 3199358. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:24,965][19571] Avg episode reward: [(0, '9.487')] [2025-01-05 13:29:25,115][19668] Updated weights for policy 0, policy_version 271686 (0.0016) [2025-01-05 13:29:27,200][19668] Updated weights for policy 0, policy_version 271696 (0.0016) [2025-01-05 13:29:29,179][19668] Updated weights for policy 0, policy_version 271706 (0.0014) [2025-01-05 13:29:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1112920064. Throughput: 0: 4989.1. Samples: 3229294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:29,965][19571] Avg episode reward: [(0, '9.253')] [2025-01-05 13:29:31,235][19668] Updated weights for policy 0, policy_version 271716 (0.0015) [2025-01-05 13:29:33,332][19668] Updated weights for policy 0, policy_version 271726 (0.0016) [2025-01-05 13:29:34,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 20119.0). Total num frames: 1113018368. 
Throughput: 0: 4985.3. Samples: 3244332. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 13:29:34,965][19571] Avg episode reward: [(0, '10.322')] [2025-01-05 13:29:35,383][19668] Updated weights for policy 0, policy_version 271736 (0.0017) [2025-01-05 13:29:37,445][19668] Updated weights for policy 0, policy_version 271746 (0.0016) [2025-01-05 13:29:39,522][19668] Updated weights for policy 0, policy_version 271756 (0.0017) [2025-01-05 13:29:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1113120768. Throughput: 0: 4973.3. Samples: 3273982. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:29:39,965][19571] Avg episode reward: [(0, '9.875')] [2025-01-05 13:29:41,547][19668] Updated weights for policy 0, policy_version 271766 (0.0016) [2025-01-05 13:29:43,606][19668] Updated weights for policy 0, policy_version 271776 (0.0016) [2025-01-05 13:29:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1113219072. Throughput: 0: 4972.2. Samples: 3303756. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:29:44,965][19571] Avg episode reward: [(0, '9.632')] [2025-01-05 13:29:45,741][19668] Updated weights for policy 0, policy_version 271786 (0.0017) [2025-01-05 13:29:47,704][19668] Updated weights for policy 0, policy_version 271796 (0.0016) [2025-01-05 13:29:49,759][19668] Updated weights for policy 0, policy_version 271806 (0.0015) [2025-01-05 13:29:49,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 20119.0). Total num frames: 1113317376. Throughput: 0: 4972.5. Samples: 3318738. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:29:49,965][19571] Avg episode reward: [(0, '10.782')] [2025-01-05 13:29:51,907][19668] Updated weights for policy 0, policy_version 271816 (0.0016) [2025-01-05 13:29:53,860][19668] Updated weights for policy 0, policy_version 271826 (0.0016) [2025-01-05 13:29:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1113419776. Throughput: 0: 4969.5. Samples: 3348780. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:29:54,965][19571] Avg episode reward: [(0, '9.674')] [2025-01-05 13:29:55,915][19668] Updated weights for policy 0, policy_version 271836 (0.0017) [2025-01-05 13:29:57,998][19668] Updated weights for policy 0, policy_version 271846 (0.0016) [2025-01-05 13:29:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.8, 300 sec: 20132.9). Total num frames: 1113518080. Throughput: 0: 4967.4. Samples: 3378632. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:29:59,965][19571] Avg episode reward: [(0, '9.290')] [2025-01-05 13:30:00,002][19668] Updated weights for policy 0, policy_version 271856 (0.0017) [2025-01-05 13:30:02,091][19668] Updated weights for policy 0, policy_version 271866 (0.0016) [2025-01-05 13:30:04,136][19668] Updated weights for policy 0, policy_version 271876 (0.0017) [2025-01-05 13:30:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.8, 300 sec: 20132.9). Total num frames: 1113620480. Throughput: 0: 4981.1. Samples: 3393744. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:04,965][19571] Avg episode reward: [(0, '11.488')] [2025-01-05 13:30:06,151][19668] Updated weights for policy 0, policy_version 271886 (0.0016) [2025-01-05 13:30:08,186][19668] Updated weights for policy 0, policy_version 271896 (0.0016) [2025-01-05 13:30:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1113718784. Throughput: 0: 4985.2. Samples: 3423692. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:09,965][19571] Avg episode reward: [(0, '9.447')] [2025-01-05 13:30:10,339][19668] Updated weights for policy 0, policy_version 271906 (0.0018) [2025-01-05 13:30:12,284][19668] Updated weights for policy 0, policy_version 271916 (0.0016) [2025-01-05 13:30:13,991][19636] Signal inference workers to stop experience collection... (50 times) [2025-01-05 13:30:13,995][19636] Signal inference workers to resume experience collection... (50 times) [2025-01-05 13:30:13,999][19668] InferenceWorker_p0-w0: stopping experience collection (50 times) [2025-01-05 13:30:14,009][19668] InferenceWorker_p0-w0: resuming experience collection (50 times) [2025-01-05 13:30:14,356][19668] Updated weights for policy 0, policy_version 271926 (0.0015) [2025-01-05 13:30:14,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 20119.0). Total num frames: 1113817088. Throughput: 0: 4987.2. Samples: 3453720. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:14,965][19571] Avg episode reward: [(0, '10.790')] [2025-01-05 13:30:16,440][19668] Updated weights for policy 0, policy_version 271936 (0.0016) [2025-01-05 13:30:18,388][19668] Updated weights for policy 0, policy_version 271946 (0.0015) [2025-01-05 13:30:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20132.9). Total num frames: 1113919488. Throughput: 0: 4989.3. Samples: 3468848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:19,965][19571] Avg episode reward: [(0, '8.653')] [2025-01-05 13:30:20,473][19668] Updated weights for policy 0, policy_version 271956 (0.0016) [2025-01-05 13:30:22,552][19668] Updated weights for policy 0, policy_version 271966 (0.0015) [2025-01-05 13:30:24,504][19668] Updated weights for policy 0, policy_version 271976 (0.0016) [2025-01-05 13:30:24,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19933.8, 300 sec: 20105.1). Total num frames: 1114017792. Throughput: 0: 5003.0. Samples: 3499116. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:24,965][19571] Avg episode reward: [(0, '10.190')] [2025-01-05 13:30:26,556][19668] Updated weights for policy 0, policy_version 271986 (0.0015) [2025-01-05 13:30:28,661][19668] Updated weights for policy 0, policy_version 271996 (0.0016) [2025-01-05 13:30:29,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20002.1, 300 sec: 20119.0). Total num frames: 1114120192. Throughput: 0: 5003.3. Samples: 3528904. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:29,966][19571] Avg episode reward: [(0, '10.835')] [2025-01-05 13:30:30,690][19668] Updated weights for policy 0, policy_version 272006 (0.0017) [2025-01-05 13:30:32,743][19668] Updated weights for policy 0, policy_version 272016 (0.0015) [2025-01-05 13:30:34,808][19668] Updated weights for policy 0, policy_version 272026 (0.0016) [2025-01-05 13:30:34,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20119.0). Total num frames: 1114222592. Throughput: 0: 5004.9. Samples: 3543958. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:34,965][19571] Avg episode reward: [(0, '10.483')] [2025-01-05 13:30:36,812][19668] Updated weights for policy 0, policy_version 272036 (0.0016) [2025-01-05 13:30:38,867][19668] Updated weights for policy 0, policy_version 272046 (0.0017) [2025-01-05 13:30:39,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 20105.1). Total num frames: 1114320896. Throughput: 0: 5006.0. Samples: 3574048. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:39,965][19571] Avg episode reward: [(0, '9.698')] [2025-01-05 13:30:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000272051_1114320896.pth... 
[2025-01-05 13:30:40,029][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000270875_1109504000.pth [2025-01-05 13:30:41,008][19668] Updated weights for policy 0, policy_version 272056 (0.0016) [2025-01-05 13:30:42,953][19668] Updated weights for policy 0, policy_version 272066 (0.0015) [2025-01-05 13:30:44,965][19571] Fps is (10 sec: 19661.1, 60 sec: 20002.2, 300 sec: 20091.2). Total num frames: 1114419200. Throughput: 0: 5007.0. Samples: 3603946. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:30:44,965][19571] Avg episode reward: [(0, '9.651')] [2025-01-05 13:30:45,025][19668] Updated weights for policy 0, policy_version 272076 (0.0015) [2025-01-05 13:30:47,096][19668] Updated weights for policy 0, policy_version 272086 (0.0019) [2025-01-05 13:30:49,045][19668] Updated weights for policy 0, policy_version 272096 (0.0015) [2025-01-05 13:30:49,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20070.3, 300 sec: 20091.2). Total num frames: 1114521600. Throughput: 0: 5008.6. Samples: 3619130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:30:49,965][19571] Avg episode reward: [(0, '9.217')] [2025-01-05 13:30:51,107][19668] Updated weights for policy 0, policy_version 272106 (0.0015) [2025-01-05 13:30:53,180][19668] Updated weights for policy 0, policy_version 272116 (0.0015) [2025-01-05 13:30:54,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20002.1, 300 sec: 20077.4). Total num frames: 1114619904. Throughput: 0: 5015.2. Samples: 3649378. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:30:54,965][19571] Avg episode reward: [(0, '9.501')] [2025-01-05 13:30:55,216][19668] Updated weights for policy 0, policy_version 272126 (0.0016) [2025-01-05 13:30:57,260][19668] Updated weights for policy 0, policy_version 272136 (0.0015) [2025-01-05 13:30:59,312][19668] Updated weights for policy 0, policy_version 272146 (0.0016) [2025-01-05 13:30:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.3, 300 sec: 20077.3). Total num frames: 1114722304. Throughput: 0: 5012.0. Samples: 3679262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:30:59,966][19571] Avg episode reward: [(0, '9.006')] [2025-01-05 13:31:01,360][19668] Updated weights for policy 0, policy_version 272156 (0.0016) [2025-01-05 13:31:03,419][19668] Updated weights for policy 0, policy_version 272166 (0.0017) [2025-01-05 13:31:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20063.4). Total num frames: 1114820608. Throughput: 0: 5007.2. Samples: 3694172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:04,966][19571] Avg episode reward: [(0, '9.732')] [2025-01-05 13:31:05,551][19668] Updated weights for policy 0, policy_version 272176 (0.0016) [2025-01-05 13:31:07,534][19668] Updated weights for policy 0, policy_version 272186 (0.0016) [2025-01-05 13:31:09,590][19668] Updated weights for policy 0, policy_version 272196 (0.0014) [2025-01-05 13:31:09,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.1, 300 sec: 20063.4). Total num frames: 1114918912. Throughput: 0: 4997.4. Samples: 3724000. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:09,965][19571] Avg episode reward: [(0, '9.479')] [2025-01-05 13:31:11,708][19668] Updated weights for policy 0, policy_version 272206 (0.0017) [2025-01-05 13:31:13,675][19668] Updated weights for policy 0, policy_version 272216 (0.0015) [2025-01-05 13:31:14,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20063.5). 
Total num frames: 1115021312. Throughput: 0: 5004.4. Samples: 3754100. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:14,965][19571] Avg episode reward: [(0, '9.921')] [2025-01-05 13:31:15,718][19668] Updated weights for policy 0, policy_version 272226 (0.0015) [2025-01-05 13:31:17,768][19668] Updated weights for policy 0, policy_version 272236 (0.0015) [2025-01-05 13:31:19,759][19668] Updated weights for policy 0, policy_version 272246 (0.0014) [2025-01-05 13:31:19,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.3, 300 sec: 20063.5). Total num frames: 1115123712. Throughput: 0: 5008.0. Samples: 3769316. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:19,965][19571] Avg episode reward: [(0, '10.316')] [2025-01-05 13:31:21,776][19668] Updated weights for policy 0, policy_version 272256 (0.0015) [2025-01-05 13:31:23,830][19668] Updated weights for policy 0, policy_version 272266 (0.0015) [2025-01-05 13:31:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1115222016. Throughput: 0: 5014.6. Samples: 3799704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:24,965][19571] Avg episode reward: [(0, '9.190')] [2025-01-05 13:31:25,912][19668] Updated weights for policy 0, policy_version 272276 (0.0016) [2025-01-05 13:31:27,936][19668] Updated weights for policy 0, policy_version 272286 (0.0016) [2025-01-05 13:31:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20002.1, 300 sec: 20063.4). Total num frames: 1115320320. Throughput: 0: 5005.8. Samples: 3829206. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:29,965][19571] Avg episode reward: [(0, '8.947')] [2025-01-05 13:31:30,033][19668] Updated weights for policy 0, policy_version 272296 (0.0017) [2025-01-05 13:31:32,121][19668] Updated weights for policy 0, policy_version 272306 (0.0016) [2025-01-05 13:31:34,135][19668] Updated weights for policy 0, policy_version 272316 (0.0015) [2025-01-05 13:31:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1115422720. Throughput: 0: 5000.2. Samples: 3844140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:34,965][19571] Avg episode reward: [(0, '10.388')] [2025-01-05 13:31:36,171][19668] Updated weights for policy 0, policy_version 272326 (0.0015) [2025-01-05 13:31:38,210][19668] Updated weights for policy 0, policy_version 272336 (0.0015) [2025-01-05 13:31:39,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1115521024. Throughput: 0: 4999.6. Samples: 3874360. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:39,965][19571] Avg episode reward: [(0, '9.602')] [2025-01-05 13:31:40,301][19668] Updated weights for policy 0, policy_version 272346 (0.0017) [2025-01-05 13:31:42,337][19668] Updated weights for policy 0, policy_version 272356 (0.0015) [2025-01-05 13:31:44,400][19668] Updated weights for policy 0, policy_version 272366 (0.0016) [2025-01-05 13:31:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1115619328. Throughput: 0: 5000.4. Samples: 3904280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:44,965][19571] Avg episode reward: [(0, '10.418')] [2025-01-05 13:31:46,496][19668] Updated weights for policy 0, policy_version 272376 (0.0016) [2025-01-05 13:31:48,514][19668] Updated weights for policy 0, policy_version 272386 (0.0016) [2025-01-05 13:31:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 20049.6). 
Total num frames: 1115721728. Throughput: 0: 4995.6. Samples: 3918972. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:49,965][19571] Avg episode reward: [(0, '9.010')] [2025-01-05 13:31:50,568][19668] Updated weights for policy 0, policy_version 272396 (0.0015) [2025-01-05 13:31:52,571][19668] Updated weights for policy 0, policy_version 272406 (0.0017) [2025-01-05 13:31:54,619][19668] Updated weights for policy 0, policy_version 272416 (0.0015) [2025-01-05 13:31:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1115820032. Throughput: 0: 5006.5. Samples: 3949294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:54,965][19571] Avg episode reward: [(0, '8.877')] [2025-01-05 13:31:56,742][19668] Updated weights for policy 0, policy_version 272426 (0.0016) [2025-01-05 13:31:58,742][19668] Updated weights for policy 0, policy_version 272436 (0.0015) [2025-01-05 13:31:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.2, 300 sec: 20049.6). Total num frames: 1115922432. Throughput: 0: 5000.9. Samples: 3979140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:31:59,965][19571] Avg episode reward: [(0, '10.130')] [2025-01-05 13:32:00,791][19668] Updated weights for policy 0, policy_version 272446 (0.0016) [2025-01-05 13:32:02,798][19668] Updated weights for policy 0, policy_version 272456 (0.0015) [2025-01-05 13:32:04,792][19668] Updated weights for policy 0, policy_version 272466 (0.0015) [2025-01-05 13:32:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.2, 300 sec: 20049.6). Total num frames: 1116020736. Throughput: 0: 5002.4. Samples: 3994422. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:04,965][19571] Avg episode reward: [(0, '10.028')] [2025-01-05 13:32:06,857][19668] Updated weights for policy 0, policy_version 272476 (0.0015) [2025-01-05 13:32:08,890][19668] Updated weights for policy 0, policy_version 272486 (0.0015) [2025-01-05 13:32:09,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20063.4). Total num frames: 1116123136. Throughput: 0: 4998.1. Samples: 4024618. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:09,965][19571] Avg episode reward: [(0, '11.062')] [2025-01-05 13:32:10,952][19668] Updated weights for policy 0, policy_version 272496 (0.0016) [2025-01-05 13:32:12,998][19668] Updated weights for policy 0, policy_version 272506 (0.0015) [2025-01-05 13:32:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.2, 300 sec: 20035.7). Total num frames: 1116221440. Throughput: 0: 5001.3. Samples: 4054264. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:14,965][19571] Avg episode reward: [(0, '9.777')] [2025-01-05 13:32:15,089][19668] Updated weights for policy 0, policy_version 272516 (0.0017) [2025-01-05 13:32:17,061][19668] Updated weights for policy 0, policy_version 272526 (0.0015) [2025-01-05 13:32:19,089][19668] Updated weights for policy 0, policy_version 272536 (0.0016) [2025-01-05 13:32:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1116323840. Throughput: 0: 5009.7. Samples: 4069576. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:19,965][19571] Avg episode reward: [(0, '9.623')] [2025-01-05 13:32:21,174][19668] Updated weights for policy 0, policy_version 272546 (0.0015) [2025-01-05 13:32:23,149][19668] Updated weights for policy 0, policy_version 272556 (0.0016) [2025-01-05 13:32:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1116422144. Throughput: 0: 5010.3. Samples: 4099822. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:24,965][19571] Avg episode reward: [(0, '9.178')] [2025-01-05 13:32:25,227][19668] Updated weights for policy 0, policy_version 272566 (0.0019) [2025-01-05 13:32:27,222][19668] Updated weights for policy 0, policy_version 272576 (0.0015) [2025-01-05 13:32:29,214][19668] Updated weights for policy 0, policy_version 272586 (0.0016) [2025-01-05 13:32:29,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1116524544. Throughput: 0: 5020.2. Samples: 4130190. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:29,965][19571] Avg episode reward: [(0, '9.293')] [2025-01-05 13:32:31,273][19668] Updated weights for policy 0, policy_version 272596 (0.0016) [2025-01-05 13:32:33,280][19668] Updated weights for policy 0, policy_version 272606 (0.0015) [2025-01-05 13:32:34,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1116626944. Throughput: 0: 5032.2. Samples: 4145420. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:34,965][19571] Avg episode reward: [(0, '9.637')] [2025-01-05 13:32:35,263][19668] Updated weights for policy 0, policy_version 272616 (0.0016) [2025-01-05 13:32:37,324][19668] Updated weights for policy 0, policy_version 272626 (0.0016) [2025-01-05 13:32:39,322][19668] Updated weights for policy 0, policy_version 272636 (0.0016) [2025-01-05 13:32:39,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20138.6, 300 sec: 20063.5). Total num frames: 1116729344. Throughput: 0: 5036.5. Samples: 4175938. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:39,965][19571] Avg episode reward: [(0, '9.424')] [2025-01-05 13:32:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000272639_1116729344.pth... 
[2025-01-05 13:32:40,026][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000271466_1111924736.pth [2025-01-05 13:32:41,345][19668] Updated weights for policy 0, policy_version 272646 (0.0016) [2025-01-05 13:32:43,424][19668] Updated weights for policy 0, policy_version 272656 (0.0016) [2025-01-05 13:32:44,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1116827648. Throughput: 0: 5039.2. Samples: 4205902. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:44,965][19571] Avg episode reward: [(0, '9.400')] [2025-01-05 13:32:45,471][19668] Updated weights for policy 0, policy_version 272666 (0.0016) [2025-01-05 13:32:47,468][19668] Updated weights for policy 0, policy_version 272676 (0.0016) [2025-01-05 13:32:49,528][19668] Updated weights for policy 0, policy_version 272686 (0.0015) [2025-01-05 13:32:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1116930048. Throughput: 0: 5037.7. Samples: 4221120. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:49,965][19571] Avg episode reward: [(0, '11.433')] [2025-01-05 13:32:51,582][19668] Updated weights for policy 0, policy_version 272696 (0.0016) [2025-01-05 13:32:53,588][19668] Updated weights for policy 0, policy_version 272706 (0.0015) [2025-01-05 13:32:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1117028352. Throughput: 0: 5036.8. Samples: 4251274. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:54,965][19571] Avg episode reward: [(0, '8.876')] [2025-01-05 13:32:55,634][19668] Updated weights for policy 0, policy_version 272716 (0.0017) [2025-01-05 13:32:57,623][19668] Updated weights for policy 0, policy_version 272726 (0.0016) [2025-01-05 13:32:59,617][19668] Updated weights for policy 0, policy_version 272736 (0.0015) [2025-01-05 13:32:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1117130752. Throughput: 0: 5057.1. Samples: 4281834. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:32:59,965][19571] Avg episode reward: [(0, '9.538')] [2025-01-05 13:33:01,644][19668] Updated weights for policy 0, policy_version 272746 (0.0014) [2025-01-05 13:33:03,654][19668] Updated weights for policy 0, policy_version 272756 (0.0014) [2025-01-05 13:33:04,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20049.6). Total num frames: 1117233152. Throughput: 0: 5057.5. Samples: 4297164. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:33:04,965][19571] Avg episode reward: [(0, '9.891')] [2025-01-05 13:33:05,621][19668] Updated weights for policy 0, policy_version 272766 (0.0014) [2025-01-05 13:33:07,651][19668] Updated weights for policy 0, policy_version 272776 (0.0015) [2025-01-05 13:33:09,663][19668] Updated weights for policy 0, policy_version 272786 (0.0016) [2025-01-05 13:33:09,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20207.0, 300 sec: 20049.6). Total num frames: 1117335552. Throughput: 0: 5065.9. Samples: 4327788. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:33:09,965][19571] Avg episode reward: [(0, '9.248')] [2025-01-05 13:33:11,628][19668] Updated weights for policy 0, policy_version 272796 (0.0015) [2025-01-05 13:33:13,680][19668] Updated weights for policy 0, policy_version 272806 (0.0016) [2025-01-05 13:33:14,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 20049.6). 
Total num frames: 1117437952. Throughput: 0: 5062.7. Samples: 4358010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:14,965][19571] Avg episode reward: [(0, '9.394')] [2025-01-05 13:33:15,804][19668] Updated weights for policy 0, policy_version 272816 (0.0018) [2025-01-05 13:33:17,799][19668] Updated weights for policy 0, policy_version 272826 (0.0016) [2025-01-05 13:33:19,858][19668] Updated weights for policy 0, policy_version 272836 (0.0018) [2025-01-05 13:33:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20207.0, 300 sec: 20035.7). Total num frames: 1117536256. Throughput: 0: 5058.1. Samples: 4373034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:19,965][19571] Avg episode reward: [(0, '10.027')] [2025-01-05 13:33:21,928][19668] Updated weights for policy 0, policy_version 272846 (0.0016) [2025-01-05 13:33:23,918][19668] Updated weights for policy 0, policy_version 272856 (0.0016) [2025-01-05 13:33:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20206.9, 300 sec: 20035.7). Total num frames: 1117634560. Throughput: 0: 5048.8. Samples: 4403134. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:24,965][19571] Avg episode reward: [(0, '10.870')] [2025-01-05 13:33:25,984][19668] Updated weights for policy 0, policy_version 272866 (0.0016) [2025-01-05 13:33:28,025][19668] Updated weights for policy 0, policy_version 272876 (0.0016) [2025-01-05 13:33:29,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20206.9, 300 sec: 20035.7). Total num frames: 1117736960. Throughput: 0: 5049.6. Samples: 4433136. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:29,965][19571] Avg episode reward: [(0, '9.096')] [2025-01-05 13:33:30,070][19668] Updated weights for policy 0, policy_version 272886 (0.0017) [2025-01-05 13:33:32,107][19668] Updated weights for policy 0, policy_version 272896 (0.0015) [2025-01-05 13:33:34,131][19668] Updated weights for policy 0, policy_version 272906 (0.0017) [2025-01-05 13:33:34,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20049.6). Total num frames: 1117839360. Throughput: 0: 5049.5. Samples: 4448350. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:34,965][19571] Avg episode reward: [(0, '9.942')] [2025-01-05 13:33:36,188][19668] Updated weights for policy 0, policy_version 272916 (0.0016) [2025-01-05 13:33:38,222][19668] Updated weights for policy 0, policy_version 272926 (0.0016) [2025-01-05 13:33:39,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1117937664. Throughput: 0: 5045.6. Samples: 4478328. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:39,965][19571] Avg episode reward: [(0, '10.697')] [2025-01-05 13:33:40,298][19668] Updated weights for policy 0, policy_version 272936 (0.0015) [2025-01-05 13:33:42,312][19668] Updated weights for policy 0, policy_version 272946 (0.0015) [2025-01-05 13:33:44,360][19668] Updated weights for policy 0, policy_version 272956 (0.0016) [2025-01-05 13:33:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20138.6, 300 sec: 20035.7). Total num frames: 1118035968. Throughput: 0: 5037.7. Samples: 4508532. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:44,965][19571] Avg episode reward: [(0, '9.897')] [2025-01-05 13:33:46,431][19668] Updated weights for policy 0, policy_version 272966 (0.0015) [2025-01-05 13:33:48,482][19668] Updated weights for policy 0, policy_version 272976 (0.0015) [2025-01-05 13:33:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20049.6). 
Total num frames: 1118138368. Throughput: 0: 5026.8. Samples: 4523368. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:49,965][19571] Avg episode reward: [(0, '10.174')] [2025-01-05 13:33:50,576][19668] Updated weights for policy 0, policy_version 272986 (0.0016) [2025-01-05 13:33:52,574][19668] Updated weights for policy 0, policy_version 272996 (0.0015) [2025-01-05 13:33:54,577][19668] Updated weights for policy 0, policy_version 273006 (0.0015) [2025-01-05 13:33:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.6, 300 sec: 20049.6). Total num frames: 1118236672. Throughput: 0: 5014.9. Samples: 4553460. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:54,965][19571] Avg episode reward: [(0, '8.913')] [2025-01-05 13:33:56,613][19668] Updated weights for policy 0, policy_version 273016 (0.0015) [2025-01-05 13:33:58,631][19668] Updated weights for policy 0, policy_version 273026 (0.0015) [2025-01-05 13:33:59,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 20049.6). Total num frames: 1118339072. Throughput: 0: 5015.6. Samples: 4583712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:33:59,965][19571] Avg episode reward: [(0, '9.960')] [2025-01-05 13:34:00,681][19668] Updated weights for policy 0, policy_version 273036 (0.0016) [2025-01-05 13:34:02,752][19668] Updated weights for policy 0, policy_version 273046 (0.0014) [2025-01-05 13:34:04,788][19668] Updated weights for policy 0, policy_version 273056 (0.0015) [2025-01-05 13:34:04,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 20063.5). Total num frames: 1118441472. Throughput: 0: 5016.6. Samples: 4598780. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:34:04,965][19571] Avg episode reward: [(0, '10.900')] [2025-01-05 13:34:06,835][19668] Updated weights for policy 0, policy_version 273066 (0.0017) [2025-01-05 13:34:08,895][19668] Updated weights for policy 0, policy_version 273076 (0.0018) [2025-01-05 13:34:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1118539776. Throughput: 0: 5011.1. Samples: 4628634. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:34:09,965][19571] Avg episode reward: [(0, '10.471')] [2025-01-05 13:34:11,015][19668] Updated weights for policy 0, policy_version 273086 (0.0015) [2025-01-05 13:34:12,959][19668] Updated weights for policy 0, policy_version 273096 (0.0015) [2025-01-05 13:34:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1118638080. Throughput: 0: 5014.8. Samples: 4658800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:34:14,965][19571] Avg episode reward: [(0, '9.598')] [2025-01-05 13:34:15,008][19668] Updated weights for policy 0, policy_version 273106 (0.0015) [2025-01-05 13:34:17,054][19668] Updated weights for policy 0, policy_version 273116 (0.0015) [2025-01-05 13:34:19,021][19668] Updated weights for policy 0, policy_version 273126 (0.0015) [2025-01-05 13:34:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.3, 300 sec: 20063.4). Total num frames: 1118740480. Throughput: 0: 5014.9. Samples: 4674022. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:34:19,966][19571] Avg episode reward: [(0, '10.739')] [2025-01-05 13:34:21,082][19668] Updated weights for policy 0, policy_version 273136 (0.0016) [2025-01-05 13:34:23,126][19668] Updated weights for policy 0, policy_version 273146 (0.0014) [2025-01-05 13:34:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1118838784. Throughput: 0: 5021.3. Samples: 4704288. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:24,965][19571] Avg episode reward: [(0, '10.433')] [2025-01-05 13:34:25,154][19668] Updated weights for policy 0, policy_version 273156 (0.0016) [2025-01-05 13:34:27,206][19668] Updated weights for policy 0, policy_version 273166 (0.0015) [2025-01-05 13:34:29,242][19668] Updated weights for policy 0, policy_version 273176 (0.0015) [2025-01-05 13:34:29,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20070.5, 300 sec: 20077.4). Total num frames: 1118941184. Throughput: 0: 5018.9. Samples: 4734384. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:29,965][19571] Avg episode reward: [(0, '9.718')] [2025-01-05 13:34:31,245][19668] Updated weights for policy 0, policy_version 273186 (0.0016) [2025-01-05 13:34:33,285][19668] Updated weights for policy 0, policy_version 273196 (0.0014) [2025-01-05 13:34:34,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1119043584. Throughput: 0: 5024.0. Samples: 4749448. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:34,965][19571] Avg episode reward: [(0, '9.934')] [2025-01-05 13:34:35,380][19668] Updated weights for policy 0, policy_version 273206 (0.0016) [2025-01-05 13:34:37,335][19668] Updated weights for policy 0, policy_version 273216 (0.0016) [2025-01-05 13:34:39,387][19668] Updated weights for policy 0, policy_version 273226 (0.0015) [2025-01-05 13:34:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1119141888. Throughput: 0: 5028.6. Samples: 4779746. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:39,965][19571] Avg episode reward: [(0, '8.713')] [2025-01-05 13:34:40,031][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000273229_1119145984.pth... 
[2025-01-05 13:34:40,082][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000272051_1114320896.pth [2025-01-05 13:34:41,528][19668] Updated weights for policy 0, policy_version 273236 (0.0016) [2025-01-05 13:34:43,485][19668] Updated weights for policy 0, policy_version 273246 (0.0015) [2025-01-05 13:34:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1119244288. Throughput: 0: 5019.3. Samples: 4809578. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:44,965][19571] Avg episode reward: [(0, '9.668')] [2025-01-05 13:34:45,573][19668] Updated weights for policy 0, policy_version 273256 (0.0015) [2025-01-05 13:34:47,582][19668] Updated weights for policy 0, policy_version 273266 (0.0016) [2025-01-05 13:34:49,547][19668] Updated weights for policy 0, policy_version 273276 (0.0014) [2025-01-05 13:34:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1119342592. Throughput: 0: 5023.4. Samples: 4824834. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:49,965][19571] Avg episode reward: [(0, '9.814')] [2025-01-05 13:34:51,629][19668] Updated weights for policy 0, policy_version 273286 (0.0017) [2025-01-05 13:34:53,662][19668] Updated weights for policy 0, policy_version 273296 (0.0015) [2025-01-05 13:34:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1119444992. Throughput: 0: 5034.6. Samples: 4855192. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:54,965][19571] Avg episode reward: [(0, '10.333')] [2025-01-05 13:34:55,710][19668] Updated weights for policy 0, policy_version 273306 (0.0016) [2025-01-05 13:34:57,784][19668] Updated weights for policy 0, policy_version 273316 (0.0015) [2025-01-05 13:34:59,799][19668] Updated weights for policy 0, policy_version 273326 (0.0016) [2025-01-05 13:34:59,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1119547392. Throughput: 0: 5031.1. Samples: 4885200. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:34:59,965][19571] Avg episode reward: [(0, '9.363')] [2025-01-05 13:35:01,777][19668] Updated weights for policy 0, policy_version 273336 (0.0015) [2025-01-05 13:35:03,847][19668] Updated weights for policy 0, policy_version 273346 (0.0016) [2025-01-05 13:35:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1119645696. Throughput: 0: 5030.0. Samples: 4900370. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:04,965][19571] Avg episode reward: [(0, '10.116')] [2025-01-05 13:35:05,932][19668] Updated weights for policy 0, policy_version 273356 (0.0017) [2025-01-05 13:35:07,907][19668] Updated weights for policy 0, policy_version 273366 (0.0015) [2025-01-05 13:35:09,964][19668] Updated weights for policy 0, policy_version 273376 (0.0016) [2025-01-05 13:35:09,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1119748096. Throughput: 0: 5026.7. Samples: 4930490. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:09,965][19571] Avg episode reward: [(0, '9.776')] [2025-01-05 13:35:12,042][19668] Updated weights for policy 0, policy_version 273386 (0.0016) [2025-01-05 13:35:14,039][19668] Updated weights for policy 0, policy_version 273396 (0.0015) [2025-01-05 13:35:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20091.2). 
Total num frames: 1119846400. Throughput: 0: 5026.7. Samples: 4960588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:14,965][19571] Avg episode reward: [(0, '10.491')] [2025-01-05 13:35:16,072][19668] Updated weights for policy 0, policy_version 273406 (0.0016) [2025-01-05 13:35:18,093][19668] Updated weights for policy 0, policy_version 273416 (0.0015) [2025-01-05 13:35:19,965][19571] Fps is (10 sec: 19660.5, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1119944704. Throughput: 0: 5028.8. Samples: 4975746. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:19,966][19571] Avg episode reward: [(0, '11.243')] [2025-01-05 13:35:20,171][19668] Updated weights for policy 0, policy_version 273426 (0.0017) [2025-01-05 13:35:22,254][19668] Updated weights for policy 0, policy_version 273436 (0.0014) [2025-01-05 13:35:24,300][19668] Updated weights for policy 0, policy_version 273446 (0.0016) [2025-01-05 13:35:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1120047104. Throughput: 0: 5017.0. Samples: 5005512. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:24,965][19571] Avg episode reward: [(0, '10.349')] [2025-01-05 13:35:26,442][19668] Updated weights for policy 0, policy_version 273456 (0.0017) [2025-01-05 13:35:28,509][19668] Updated weights for policy 0, policy_version 273466 (0.0016) [2025-01-05 13:35:29,965][19571] Fps is (10 sec: 19661.2, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1120141312. Throughput: 0: 5000.5. Samples: 5034602. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:29,965][19571] Avg episode reward: [(0, '9.930')] [2025-01-05 13:35:30,635][19668] Updated weights for policy 0, policy_version 273476 (0.0017) [2025-01-05 13:35:32,646][19668] Updated weights for policy 0, policy_version 273486 (0.0015) [2025-01-05 13:35:34,714][19668] Updated weights for policy 0, policy_version 273496 (0.0016) [2025-01-05 13:35:34,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.2, 300 sec: 20077.3). Total num frames: 1120243712. Throughput: 0: 4994.7. Samples: 5049596. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:35:34,965][19571] Avg episode reward: [(0, '9.587')] [2025-01-05 13:35:36,840][19668] Updated weights for policy 0, policy_version 273506 (0.0017) [2025-01-05 13:35:38,853][19668] Updated weights for policy 0, policy_version 273516 (0.0015) [2025-01-05 13:35:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 20077.3). Total num frames: 1120342016. Throughput: 0: 4980.7. Samples: 5079324. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:35:39,965][19571] Avg episode reward: [(0, '9.772')] [2025-01-05 13:35:40,911][19668] Updated weights for policy 0, policy_version 273526 (0.0016) [2025-01-05 13:35:42,948][19668] Updated weights for policy 0, policy_version 273536 (0.0014) [2025-01-05 13:35:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1120440320. Throughput: 0: 4973.9. Samples: 5109026. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:35:44,965][19571] Avg episode reward: [(0, '10.239')] [2025-01-05 13:35:45,043][19668] Updated weights for policy 0, policy_version 273546 (0.0017) [2025-01-05 13:35:47,127][19668] Updated weights for policy 0, policy_version 273556 (0.0016) [2025-01-05 13:35:49,149][19668] Updated weights for policy 0, policy_version 273566 (0.0016) [2025-01-05 13:35:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 20063.5). 
Total num frames: 1120538624. Throughput: 0: 4971.6. Samples: 5124092. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:35:49,965][19571] Avg episode reward: [(0, '9.415')] [2025-01-05 13:35:51,291][19668] Updated weights for policy 0, policy_version 273576 (0.0016) [2025-01-05 13:35:53,367][19668] Updated weights for policy 0, policy_version 273586 (0.0016) [2025-01-05 13:35:54,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 20049.6). Total num frames: 1120636928. Throughput: 0: 4957.4. Samples: 5153572. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:35:54,965][19571] Avg episode reward: [(0, '9.587')] [2025-01-05 13:35:55,476][19668] Updated weights for policy 0, policy_version 273596 (0.0015) [2025-01-05 13:35:57,500][19668] Updated weights for policy 0, policy_version 273606 (0.0015) [2025-01-05 13:35:59,587][19668] Updated weights for policy 0, policy_version 273616 (0.0015) [2025-01-05 13:35:59,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19865.5, 300 sec: 20063.5). Total num frames: 1120739328. Throughput: 0: 4949.7. Samples: 5183324. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:35:59,965][19571] Avg episode reward: [(0, '9.580')] [2025-01-05 13:36:01,662][19668] Updated weights for policy 0, policy_version 273626 (0.0016) [2025-01-05 13:36:03,686][19668] Updated weights for policy 0, policy_version 273636 (0.0015) [2025-01-05 13:36:04,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 20049.6). Total num frames: 1120833536. Throughput: 0: 4940.6. Samples: 5198072. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:04,965][19571] Avg episode reward: [(0, '10.034')] [2025-01-05 13:36:05,836][19668] Updated weights for policy 0, policy_version 273646 (0.0017) [2025-01-05 13:36:07,851][19668] Updated weights for policy 0, policy_version 273656 (0.0015) [2025-01-05 13:36:09,865][19668] Updated weights for policy 0, policy_version 273666 (0.0015) [2025-01-05 13:36:09,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 20049.6). Total num frames: 1120935936. Throughput: 0: 4939.4. Samples: 5227784. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:09,965][19571] Avg episode reward: [(0, '10.154')] [2025-01-05 13:36:11,929][19668] Updated weights for policy 0, policy_version 273676 (0.0016) [2025-01-05 13:36:13,949][19668] Updated weights for policy 0, policy_version 273686 (0.0014) [2025-01-05 13:36:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 20035.7). Total num frames: 1121034240. Throughput: 0: 4961.9. Samples: 5257890. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:14,965][19571] Avg episode reward: [(0, '9.348')] [2025-01-05 13:36:16,039][19668] Updated weights for policy 0, policy_version 273696 (0.0016) [2025-01-05 13:36:18,095][19668] Updated weights for policy 0, policy_version 273706 (0.0015) [2025-01-05 13:36:19,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 20035.7). Total num frames: 1121132544. Throughput: 0: 4960.4. Samples: 5272812. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:19,965][19571] Avg episode reward: [(0, '8.824')] [2025-01-05 13:36:20,164][19668] Updated weights for policy 0, policy_version 273716 (0.0016) [2025-01-05 13:36:22,173][19668] Updated weights for policy 0, policy_version 273726 (0.0015) [2025-01-05 13:36:24,246][19668] Updated weights for policy 0, policy_version 273736 (0.0015) [2025-01-05 13:36:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 20049.6). 
Total num frames: 1121234944. Throughput: 0: 4966.9. Samples: 5302836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:24,966][19571] Avg episode reward: [(0, '10.428')] [2025-01-05 13:36:26,329][19668] Updated weights for policy 0, policy_version 273746 (0.0016) [2025-01-05 13:36:28,311][19668] Updated weights for policy 0, policy_version 273756 (0.0015) [2025-01-05 13:36:29,965][19571] Fps is (10 sec: 20479.8, 60 sec: 19933.8, 300 sec: 20049.6). Total num frames: 1121337344. Throughput: 0: 4974.8. Samples: 5332894. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:29,965][19571] Avg episode reward: [(0, '9.432')] [2025-01-05 13:36:30,378][19668] Updated weights for policy 0, policy_version 273766 (0.0017) [2025-01-05 13:36:32,409][19668] Updated weights for policy 0, policy_version 273776 (0.0015) [2025-01-05 13:36:34,397][19668] Updated weights for policy 0, policy_version 273786 (0.0017) [2025-01-05 13:36:34,965][19571] Fps is (10 sec: 20070.8, 60 sec: 19865.6, 300 sec: 20049.6). Total num frames: 1121435648. Throughput: 0: 4975.4. Samples: 5347986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:34,965][19571] Avg episode reward: [(0, '10.234')] [2025-01-05 13:36:36,448][19668] Updated weights for policy 0, policy_version 273796 (0.0015) [2025-01-05 13:36:38,473][19668] Updated weights for policy 0, policy_version 273806 (0.0015) [2025-01-05 13:36:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.8, 300 sec: 20063.5). Total num frames: 1121538048. Throughput: 0: 4993.1. Samples: 5378262. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:39,965][19571] Avg episode reward: [(0, '8.963')] [2025-01-05 13:36:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000273813_1121538048.pth... 
[2025-01-05 13:36:40,024][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000272639_1116729344.pth [2025-01-05 13:36:40,589][19668] Updated weights for policy 0, policy_version 273816 (0.0017) [2025-01-05 13:36:42,646][19668] Updated weights for policy 0, policy_version 273826 (0.0014) [2025-01-05 13:36:44,663][19668] Updated weights for policy 0, policy_version 273836 (0.0015) [2025-01-05 13:36:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 20049.6). Total num frames: 1121636352. Throughput: 0: 4994.4. Samples: 5408070. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:36:44,965][19571] Avg episode reward: [(0, '9.810')] [2025-01-05 13:36:46,685][19668] Updated weights for policy 0, policy_version 273846 (0.0015) [2025-01-05 13:36:48,728][19668] Updated weights for policy 0, policy_version 273856 (0.0015) [2025-01-05 13:36:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.8, 300 sec: 20049.6). Total num frames: 1121734656. Throughput: 0: 5003.9. Samples: 5423246. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:36:49,965][19571] Avg episode reward: [(0, '9.791')] [2025-01-05 13:36:50,786][19668] Updated weights for policy 0, policy_version 273866 (0.0017) [2025-01-05 13:36:52,800][19668] Updated weights for policy 0, policy_version 273876 (0.0015) [2025-01-05 13:36:54,849][19668] Updated weights for policy 0, policy_version 273886 (0.0015) [2025-01-05 13:36:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1121837056. Throughput: 0: 5013.3. Samples: 5453384. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:36:54,965][19571] Avg episode reward: [(0, '9.576')] [2025-01-05 13:36:56,910][19668] Updated weights for policy 0, policy_version 273896 (0.0016) [2025-01-05 13:36:58,917][19668] Updated weights for policy 0, policy_version 273906 (0.0015) [2025-01-05 13:36:59,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1121939456. Throughput: 0: 5014.0. Samples: 5483522. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:36:59,965][19571] Avg episode reward: [(0, '10.317')] [2025-01-05 13:37:00,975][19668] Updated weights for policy 0, policy_version 273916 (0.0015) [2025-01-05 13:37:02,978][19668] Updated weights for policy 0, policy_version 273926 (0.0016) [2025-01-05 13:37:04,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1122037760. Throughput: 0: 5020.3. Samples: 5498724. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:04,965][19571] Avg episode reward: [(0, '9.702')] [2025-01-05 13:37:04,993][19668] Updated weights for policy 0, policy_version 273936 (0.0015) [2025-01-05 13:37:07,052][19668] Updated weights for policy 0, policy_version 273946 (0.0016) [2025-01-05 13:37:09,075][19668] Updated weights for policy 0, policy_version 273956 (0.0016) [2025-01-05 13:37:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1122140160. Throughput: 0: 5021.0. Samples: 5528782. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:09,965][19571] Avg episode reward: [(0, '9.110')] [2025-01-05 13:37:11,172][19668] Updated weights for policy 0, policy_version 273966 (0.0016) [2025-01-05 13:37:13,257][19668] Updated weights for policy 0, policy_version 273976 (0.0016) [2025-01-05 13:37:14,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1122238464. Throughput: 0: 5012.9. Samples: 5558474. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:14,965][19571] Avg episode reward: [(0, '10.132')] [2025-01-05 13:37:15,292][19668] Updated weights for policy 0, policy_version 273986 (0.0017) [2025-01-05 13:37:17,316][19668] Updated weights for policy 0, policy_version 273996 (0.0016) [2025-01-05 13:37:19,393][19668] Updated weights for policy 0, policy_version 274006 (0.0015) [2025-01-05 13:37:19,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20070.3, 300 sec: 20049.6). Total num frames: 1122336768. Throughput: 0: 5014.2. Samples: 5573626. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:19,965][19571] Avg episode reward: [(0, '9.970')] [2025-01-05 13:37:21,438][19668] Updated weights for policy 0, policy_version 274016 (0.0015) [2025-01-05 13:37:23,472][19668] Updated weights for policy 0, policy_version 274026 (0.0016) [2025-01-05 13:37:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1122435072. Throughput: 0: 5007.3. Samples: 5603590. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:24,965][19571] Avg episode reward: [(0, '10.504')] [2025-01-05 13:37:25,623][19668] Updated weights for policy 0, policy_version 274036 (0.0016) [2025-01-05 13:37:27,586][19668] Updated weights for policy 0, policy_version 274046 (0.0015) [2025-01-05 13:37:29,637][19668] Updated weights for policy 0, policy_version 274056 (0.0015) [2025-01-05 13:37:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1122537472. Throughput: 0: 5007.1. Samples: 5633390. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:29,965][19571] Avg episode reward: [(0, '9.733')] [2025-01-05 13:37:31,765][19668] Updated weights for policy 0, policy_version 274066 (0.0016) [2025-01-05 13:37:33,702][19668] Updated weights for policy 0, policy_version 274076 (0.0014) [2025-01-05 13:37:34,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20035.7). 
Total num frames: 1122639872. Throughput: 0: 5002.3. Samples: 5648350. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:34,965][19571] Avg episode reward: [(0, '9.539')] [2025-01-05 13:37:35,752][19668] Updated weights for policy 0, policy_version 274086 (0.0014) [2025-01-05 13:37:37,803][19668] Updated weights for policy 0, policy_version 274096 (0.0015) [2025-01-05 13:37:39,824][19668] Updated weights for policy 0, policy_version 274106 (0.0016) [2025-01-05 13:37:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1122738176. Throughput: 0: 5003.6. Samples: 5678546. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:39,965][19571] Avg episode reward: [(0, '10.948')] [2025-01-05 13:37:41,907][19668] Updated weights for policy 0, policy_version 274116 (0.0015) [2025-01-05 13:37:43,970][19668] Updated weights for policy 0, policy_version 274126 (0.0015) [2025-01-05 13:37:44,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.1, 300 sec: 20021.8). Total num frames: 1122836480. Throughput: 0: 4998.1. Samples: 5708436. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:44,965][19571] Avg episode reward: [(0, '10.008')] [2025-01-05 13:37:46,025][19668] Updated weights for policy 0, policy_version 274136 (0.0017) [2025-01-05 13:37:48,072][19668] Updated weights for policy 0, policy_version 274146 (0.0016) [2025-01-05 13:37:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20035.7). Total num frames: 1122938880. Throughput: 0: 4991.9. Samples: 5723362. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:49,965][19571] Avg episode reward: [(0, '9.869')] [2025-01-05 13:37:50,193][19668] Updated weights for policy 0, policy_version 274156 (0.0016) [2025-01-05 13:37:52,156][19668] Updated weights for policy 0, policy_version 274166 (0.0016) [2025-01-05 13:37:54,221][19668] Updated weights for policy 0, policy_version 274176 (0.0016) [2025-01-05 13:37:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 20021.8). Total num frames: 1123037184. Throughput: 0: 4989.5. Samples: 5753310. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:37:54,965][19571] Avg episode reward: [(0, '10.392')] [2025-01-05 13:37:56,337][19668] Updated weights for policy 0, policy_version 274186 (0.0017) [2025-01-05 13:37:58,314][19668] Updated weights for policy 0, policy_version 274196 (0.0016) [2025-01-05 13:37:59,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 20007.9). Total num frames: 1123135488. Throughput: 0: 4998.5. Samples: 5783406. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:37:59,965][19571] Avg episode reward: [(0, '9.418')] [2025-01-05 13:38:00,356][19668] Updated weights for policy 0, policy_version 274206 (0.0016) [2025-01-05 13:38:02,404][19668] Updated weights for policy 0, policy_version 274216 (0.0015) [2025-01-05 13:38:04,368][19668] Updated weights for policy 0, policy_version 274226 (0.0016) [2025-01-05 13:38:04,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20021.8). Total num frames: 1123241984. Throughput: 0: 5000.7. Samples: 5798656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:04,965][19571] Avg episode reward: [(0, '9.411')] [2025-01-05 13:38:06,393][19668] Updated weights for policy 0, policy_version 274236 (0.0016) [2025-01-05 13:38:08,452][19668] Updated weights for policy 0, policy_version 274246 (0.0016) [2025-01-05 13:38:09,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20002.1, 300 sec: 20007.9). 
Total num frames: 1123340288. Throughput: 0: 5010.9. Samples: 5829080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:09,965][19571] Avg episode reward: [(0, '8.445')] [2025-01-05 13:38:10,528][19668] Updated weights for policy 0, policy_version 274256 (0.0016) [2025-01-05 13:38:12,579][19668] Updated weights for policy 0, policy_version 274266 (0.0016) [2025-01-05 13:38:14,657][19668] Updated weights for policy 0, policy_version 274276 (0.0016) [2025-01-05 13:38:14,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.2, 300 sec: 20007.9). Total num frames: 1123438592. Throughput: 0: 5008.2. Samples: 5858760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:14,965][19571] Avg episode reward: [(0, '8.810')] [2025-01-05 13:38:16,684][19668] Updated weights for policy 0, policy_version 274286 (0.0016) [2025-01-05 13:38:18,787][19668] Updated weights for policy 0, policy_version 274296 (0.0017) [2025-01-05 13:38:19,965][19571] Fps is (10 sec: 19661.1, 60 sec: 20002.2, 300 sec: 20007.9). Total num frames: 1123536896. Throughput: 0: 5003.3. Samples: 5873496. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:19,965][19571] Avg episode reward: [(0, '10.705')] [2025-01-05 13:38:20,905][19668] Updated weights for policy 0, policy_version 274306 (0.0016) [2025-01-05 13:38:22,865][19668] Updated weights for policy 0, policy_version 274316 (0.0018) [2025-01-05 13:38:24,953][19668] Updated weights for policy 0, policy_version 274326 (0.0016) [2025-01-05 13:38:24,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1123639296. Throughput: 0: 4995.1. Samples: 5903324. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:24,965][19571] Avg episode reward: [(0, '9.372')] [2025-01-05 13:38:27,065][19668] Updated weights for policy 0, policy_version 274336 (0.0019) [2025-01-05 13:38:29,035][19668] Updated weights for policy 0, policy_version 274346 (0.0016) [2025-01-05 13:38:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19994.0). Total num frames: 1123737600. Throughput: 0: 4994.8. Samples: 5933202. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:29,965][19571] Avg episode reward: [(0, '9.522')] [2025-01-05 13:38:31,097][19668] Updated weights for policy 0, policy_version 274356 (0.0017) [2025-01-05 13:38:33,162][19668] Updated weights for policy 0, policy_version 274366 (0.0016) [2025-01-05 13:38:34,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19994.0). Total num frames: 1123835904. Throughput: 0: 4998.0. Samples: 5948270. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:34,965][19571] Avg episode reward: [(0, '9.382')] [2025-01-05 13:38:35,183][19668] Updated weights for policy 0, policy_version 274376 (0.0017) [2025-01-05 13:38:37,246][19668] Updated weights for policy 0, policy_version 274386 (0.0016) [2025-01-05 13:38:39,314][19668] Updated weights for policy 0, policy_version 274396 (0.0018) [2025-01-05 13:38:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1123938304. Throughput: 0: 5000.5. Samples: 5978334. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:39,966][19571] Avg episode reward: [(0, '10.095')] [2025-01-05 13:38:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000274399_1123938304.pth... 
[2025-01-05 13:38:40,031][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000273229_1119145984.pth [2025-01-05 13:38:41,376][19668] Updated weights for policy 0, policy_version 274406 (0.0016) [2025-01-05 13:38:43,424][19668] Updated weights for policy 0, policy_version 274416 (0.0016) [2025-01-05 13:38:44,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19994.0). Total num frames: 1124036608. Throughput: 0: 4989.6. Samples: 6007938. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:44,965][19571] Avg episode reward: [(0, '10.805')] [2025-01-05 13:38:45,576][19668] Updated weights for policy 0, policy_version 274426 (0.0019) [2025-01-05 13:38:47,533][19668] Updated weights for policy 0, policy_version 274436 (0.0016) [2025-01-05 13:38:49,608][19668] Updated weights for policy 0, policy_version 274446 (0.0017) [2025-01-05 13:38:49,965][19571] Fps is (10 sec: 19661.2, 60 sec: 19933.9, 300 sec: 19994.0). Total num frames: 1124134912. Throughput: 0: 4983.3. Samples: 6022906. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:49,965][19571] Avg episode reward: [(0, '10.622')] [2025-01-05 13:38:51,736][19668] Updated weights for policy 0, policy_version 274456 (0.0019) [2025-01-05 13:38:53,698][19668] Updated weights for policy 0, policy_version 274466 (0.0016) [2025-01-05 13:38:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19994.0). Total num frames: 1124237312. Throughput: 0: 4970.2. Samples: 6052740. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:54,965][19571] Avg episode reward: [(0, '10.305')] [2025-01-05 13:38:55,770][19668] Updated weights for policy 0, policy_version 274476 (0.0015) [2025-01-05 13:38:57,828][19668] Updated weights for policy 0, policy_version 274486 (0.0017) [2025-01-05 13:38:59,868][19668] Updated weights for policy 0, policy_version 274496 (0.0017) [2025-01-05 13:38:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19980.2). Total num frames: 1124335616. Throughput: 0: 4977.8. Samples: 6082762. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:38:59,965][19571] Avg episode reward: [(0, '9.458')] [2025-01-05 13:39:01,950][19668] Updated weights for policy 0, policy_version 274506 (0.0015) [2025-01-05 13:39:03,992][19668] Updated weights for policy 0, policy_version 274516 (0.0016) [2025-01-05 13:39:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19980.2). Total num frames: 1124433920. Throughput: 0: 4983.9. Samples: 6097770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:39:04,965][19571] Avg episode reward: [(0, '9.768')] [2025-01-05 13:39:05,999][19668] Updated weights for policy 0, policy_version 274526 (0.0015) [2025-01-05 13:39:08,023][19668] Updated weights for policy 0, policy_version 274536 (0.0015) [2025-01-05 13:39:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19994.0). Total num frames: 1124536320. Throughput: 0: 4989.9. Samples: 6127870. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:39:09,965][19571] Avg episode reward: [(0, '9.991')] [2025-01-05 13:39:10,132][19668] Updated weights for policy 0, policy_version 274546 (0.0016) [2025-01-05 13:39:12,083][19668] Updated weights for policy 0, policy_version 274556 (0.0015) [2025-01-05 13:39:14,092][19668] Updated weights for policy 0, policy_version 274566 (0.0015) [2025-01-05 13:39:14,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20002.1, 300 sec: 19994.0). 
Total num frames: 1124638720. Throughput: 0: 5005.1. Samples: 6158430. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:14,965][19571] Avg episode reward: [(0, '10.717')] [2025-01-05 13:39:16,125][19668] Updated weights for policy 0, policy_version 274576 (0.0015) [2025-01-05 13:39:18,073][19668] Updated weights for policy 0, policy_version 274586 (0.0015) [2025-01-05 13:39:19,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1124741120. Throughput: 0: 5011.9. Samples: 6173804. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:19,965][19571] Avg episode reward: [(0, '11.273')] [2025-01-05 13:39:20,106][19668] Updated weights for policy 0, policy_version 274596 (0.0015) [2025-01-05 13:39:22,156][19668] Updated weights for policy 0, policy_version 274606 (0.0016) [2025-01-05 13:39:24,107][19668] Updated weights for policy 0, policy_version 274616 (0.0015) [2025-01-05 13:39:24,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1124843520. Throughput: 0: 5022.7. Samples: 6204356. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:24,965][19571] Avg episode reward: [(0, '11.255')] [2025-01-05 13:39:26,139][19668] Updated weights for policy 0, policy_version 274626 (0.0015) [2025-01-05 13:39:28,185][19668] Updated weights for policy 0, policy_version 274636 (0.0015) [2025-01-05 13:39:29,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 19994.0). Total num frames: 1124941824. Throughput: 0: 5039.2. Samples: 6234702. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:29,965][19571] Avg episode reward: [(0, '9.651')] [2025-01-05 13:39:30,207][19668] Updated weights for policy 0, policy_version 274646 (0.0016) [2025-01-05 13:39:32,225][19668] Updated weights for policy 0, policy_version 274656 (0.0015) [2025-01-05 13:39:34,277][19668] Updated weights for policy 0, policy_version 274666 (0.0015) [2025-01-05 13:39:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 20007.9). Total num frames: 1125044224. Throughput: 0: 5045.1. Samples: 6249936. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:34,965][19571] Avg episode reward: [(0, '10.377')] [2025-01-05 13:39:36,293][19668] Updated weights for policy 0, policy_version 274676 (0.0016) [2025-01-05 13:39:38,313][19668] Updated weights for policy 0, policy_version 274686 (0.0016) [2025-01-05 13:39:39,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 19994.0). Total num frames: 1125142528. Throughput: 0: 5052.0. Samples: 6280078. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:39,965][19571] Avg episode reward: [(0, '10.950')] [2025-01-05 13:39:40,425][19668] Updated weights for policy 0, policy_version 274696 (0.0016) [2025-01-05 13:39:42,429][19668] Updated weights for policy 0, policy_version 274706 (0.0016) [2025-01-05 13:39:44,428][19668] Updated weights for policy 0, policy_version 274716 (0.0014) [2025-01-05 13:39:44,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20007.9). Total num frames: 1125244928. Throughput: 0: 5055.8. Samples: 6310272. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:44,965][19571] Avg episode reward: [(0, '10.523')] [2025-01-05 13:39:46,471][19668] Updated weights for policy 0, policy_version 274726 (0.0015) [2025-01-05 13:39:48,470][19668] Updated weights for policy 0, policy_version 274736 (0.0015) [2025-01-05 13:39:49,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20007.9). 
Total num frames: 1125347328. Throughput: 0: 5062.2. Samples: 6325568. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:49,965][19571] Avg episode reward: [(0, '10.389')] [2025-01-05 13:39:50,463][19668] Updated weights for policy 0, policy_version 274746 (0.0016) [2025-01-05 13:39:52,489][19668] Updated weights for policy 0, policy_version 274756 (0.0015) [2025-01-05 13:39:54,494][19668] Updated weights for policy 0, policy_version 274766 (0.0015) [2025-01-05 13:39:54,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20207.0, 300 sec: 20007.9). Total num frames: 1125449728. Throughput: 0: 5073.4. Samples: 6356174. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:54,965][19571] Avg episode reward: [(0, '10.191')] [2025-01-05 13:39:56,498][19668] Updated weights for policy 0, policy_version 274776 (0.0015) [2025-01-05 13:39:58,523][19668] Updated weights for policy 0, policy_version 274786 (0.0015) [2025-01-05 13:39:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20007.9). Total num frames: 1125548032. Throughput: 0: 5068.8. Samples: 6386526. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:39:59,965][19571] Avg episode reward: [(0, '10.288')] [2025-01-05 13:40:00,607][19668] Updated weights for policy 0, policy_version 274796 (0.0016) [2025-01-05 13:40:02,621][19668] Updated weights for policy 0, policy_version 274806 (0.0015) [2025-01-05 13:40:04,629][19668] Updated weights for policy 0, policy_version 274816 (0.0015) [2025-01-05 13:40:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20275.2, 300 sec: 20007.9). Total num frames: 1125650432. Throughput: 0: 5062.1. Samples: 6401600. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:04,965][19571] Avg episode reward: [(0, '9.496')] [2025-01-05 13:40:06,663][19668] Updated weights for policy 0, policy_version 274826 (0.0016) [2025-01-05 13:40:08,652][19668] Updated weights for policy 0, policy_version 274836 (0.0016) [2025-01-05 13:40:09,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20275.2, 300 sec: 20021.8). Total num frames: 1125752832. Throughput: 0: 5062.2. Samples: 6432156. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:09,965][19571] Avg episode reward: [(0, '9.794')] [2025-01-05 13:40:10,649][19668] Updated weights for policy 0, policy_version 274846 (0.0015) [2025-01-05 13:40:12,706][19668] Updated weights for policy 0, policy_version 274856 (0.0016) [2025-01-05 13:40:14,681][19668] Updated weights for policy 0, policy_version 274866 (0.0015) [2025-01-05 13:40:14,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 20035.7). Total num frames: 1125855232. Throughput: 0: 5064.7. Samples: 6462614. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:14,965][19571] Avg episode reward: [(0, '10.474')] [2025-01-05 13:40:16,722][19668] Updated weights for policy 0, policy_version 274876 (0.0015) [2025-01-05 13:40:18,751][19668] Updated weights for policy 0, policy_version 274886 (0.0015) [2025-01-05 13:40:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20207.0, 300 sec: 20021.8). Total num frames: 1125953536. Throughput: 0: 5064.0. Samples: 6477814. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:19,965][19571] Avg episode reward: [(0, '9.522')] [2025-01-05 13:40:20,809][19668] Updated weights for policy 0, policy_version 274896 (0.0016) [2025-01-05 13:40:22,882][19668] Updated weights for policy 0, policy_version 274906 (0.0015) [2025-01-05 13:40:24,942][19668] Updated weights for policy 0, policy_version 274916 (0.0016) [2025-01-05 13:40:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20049.6). 
Total num frames: 1126055936. Throughput: 0: 5056.8. Samples: 6507632. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:24,965][19571] Avg episode reward: [(0, '9.893')] [2025-01-05 13:40:26,991][19668] Updated weights for policy 0, policy_version 274926 (0.0016) [2025-01-05 13:40:29,087][19668] Updated weights for policy 0, policy_version 274936 (0.0017) [2025-01-05 13:40:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 20035.7). Total num frames: 1126154240. Throughput: 0: 5046.7. Samples: 6537372. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:29,965][19571] Avg episode reward: [(0, '9.785')] [2025-01-05 13:40:31,185][19668] Updated weights for policy 0, policy_version 274946 (0.0017) [2025-01-05 13:40:33,201][19668] Updated weights for policy 0, policy_version 274956 (0.0016) [2025-01-05 13:40:34,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1126252544. Throughput: 0: 5035.1. Samples: 6552146. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:34,965][19571] Avg episode reward: [(0, '10.210')] [2025-01-05 13:40:35,303][19668] Updated weights for policy 0, policy_version 274966 (0.0016) [2025-01-05 13:40:37,355][19668] Updated weights for policy 0, policy_version 274976 (0.0018) [2025-01-05 13:40:39,344][19668] Updated weights for policy 0, policy_version 274986 (0.0016) [2025-01-05 13:40:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20206.9, 300 sec: 20049.6). Total num frames: 1126354944. Throughput: 0: 5020.2. Samples: 6582084. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:39,965][19571] Avg episode reward: [(0, '9.925')] [2025-01-05 13:40:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000274989_1126354944.pth... 
[2025-01-05 13:40:40,019][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000273813_1121538048.pth [2025-01-05 13:40:41,454][19668] Updated weights for policy 0, policy_version 274996 (0.0018) [2025-01-05 13:40:43,528][19668] Updated weights for policy 0, policy_version 275006 (0.0016) [2025-01-05 13:40:44,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1126453248. Throughput: 0: 5007.1. Samples: 6611844. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:44,965][19571] Avg episode reward: [(0, '10.165')] [2025-01-05 13:40:45,578][19668] Updated weights for policy 0, policy_version 275016 (0.0016) [2025-01-05 13:40:47,631][19668] Updated weights for policy 0, policy_version 275026 (0.0016) [2025-01-05 13:40:49,688][19668] Updated weights for policy 0, policy_version 275036 (0.0016) [2025-01-05 13:40:49,965][19571] Fps is (10 sec: 19661.1, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1126551552. Throughput: 0: 5004.5. Samples: 6626800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:49,965][19571] Avg episode reward: [(0, '8.422')] [2025-01-05 13:40:51,718][19668] Updated weights for policy 0, policy_version 275046 (0.0015) [2025-01-05 13:40:53,751][19668] Updated weights for policy 0, policy_version 275056 (0.0016) [2025-01-05 13:40:54,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1126649856. Throughput: 0: 4994.3. Samples: 6656902. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:54,965][19571] Avg episode reward: [(0, '9.811')] [2025-01-05 13:40:55,891][19668] Updated weights for policy 0, policy_version 275066 (0.0016) [2025-01-05 13:40:57,894][19668] Updated weights for policy 0, policy_version 275076 (0.0017) [2025-01-05 13:40:59,931][19668] Updated weights for policy 0, policy_version 275086 (0.0015) [2025-01-05 13:40:59,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1126752256. Throughput: 0: 4979.1. Samples: 6686674. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:40:59,965][19571] Avg episode reward: [(0, '9.908')] [2025-01-05 13:41:02,086][19668] Updated weights for policy 0, policy_version 275096 (0.0016) [2025-01-05 13:41:04,034][19668] Updated weights for policy 0, policy_version 275106 (0.0015) [2025-01-05 13:41:04,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 20049.6). Total num frames: 1126850560. Throughput: 0: 4969.8. Samples: 6701454. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:04,965][19571] Avg episode reward: [(0, '10.458')] [2025-01-05 13:41:06,082][19668] Updated weights for policy 0, policy_version 275116 (0.0016) [2025-01-05 13:41:08,143][19668] Updated weights for policy 0, policy_version 275126 (0.0015) [2025-01-05 13:41:09,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 20049.6). Total num frames: 1126948864. Throughput: 0: 4978.0. Samples: 6731640. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:09,965][19571] Avg episode reward: [(0, '9.732')] [2025-01-05 13:41:10,190][19668] Updated weights for policy 0, policy_version 275136 (0.0016) [2025-01-05 13:41:12,258][19668] Updated weights for policy 0, policy_version 275146 (0.0015) [2025-01-05 13:41:14,337][19668] Updated weights for policy 0, policy_version 275156 (0.0015) [2025-01-05 13:41:14,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 20049.6). 
Total num frames: 1127047168. Throughput: 0: 4980.3. Samples: 6761486. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:14,965][19571] Avg episode reward: [(0, '9.848')] [2025-01-05 13:41:16,432][19668] Updated weights for policy 0, policy_version 275166 (0.0016) [2025-01-05 13:41:18,475][19668] Updated weights for policy 0, policy_version 275176 (0.0015) [2025-01-05 13:41:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 20035.7). Total num frames: 1127145472. Throughput: 0: 4978.0. Samples: 6776156. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:19,965][19571] Avg episode reward: [(0, '9.320')] [2025-01-05 13:41:20,618][19668] Updated weights for policy 0, policy_version 275186 (0.0016) [2025-01-05 13:41:22,601][19668] Updated weights for policy 0, policy_version 275196 (0.0015) [2025-01-05 13:41:24,668][19668] Updated weights for policy 0, policy_version 275206 (0.0015) [2025-01-05 13:41:24,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 20035.7). Total num frames: 1127247872. Throughput: 0: 4974.3. Samples: 6805926. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:24,965][19571] Avg episode reward: [(0, '10.921')] [2025-01-05 13:41:26,812][19668] Updated weights for policy 0, policy_version 275216 (0.0017) [2025-01-05 13:41:28,820][19668] Updated weights for policy 0, policy_version 275226 (0.0016) [2025-01-05 13:41:29,965][19571] Fps is (10 sec: 20069.6, 60 sec: 19865.5, 300 sec: 20035.7). Total num frames: 1127346176. Throughput: 0: 4970.5. Samples: 6835518. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:41:29,966][19571] Avg episode reward: [(0, '9.836')] [2025-01-05 13:41:30,890][19668] Updated weights for policy 0, policy_version 275236 (0.0016) [2025-01-05 13:41:32,938][19668] Updated weights for policy 0, policy_version 275246 (0.0016) [2025-01-05 13:41:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19865.6, 300 sec: 20021.8). Total num frames: 1127444480. 
Throughput: 0: 4973.9. Samples: 6850626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:34,965][19571] Avg episode reward: [(0, '10.292')] [2025-01-05 13:41:35,034][19668] Updated weights for policy 0, policy_version 275256 (0.0017) [2025-01-05 13:41:37,170][19668] Updated weights for policy 0, policy_version 275266 (0.0016) [2025-01-05 13:41:39,210][19668] Updated weights for policy 0, policy_version 275276 (0.0015) [2025-01-05 13:41:39,965][19571] Fps is (10 sec: 19661.6, 60 sec: 19797.4, 300 sec: 20021.8). Total num frames: 1127542784. Throughput: 0: 4956.5. Samples: 6879944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:39,965][19571] Avg episode reward: [(0, '9.696')] [2025-01-05 13:41:41,288][19668] Updated weights for policy 0, policy_version 275286 (0.0016) [2025-01-05 13:41:43,357][19668] Updated weights for policy 0, policy_version 275296 (0.0015) [2025-01-05 13:41:44,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 20021.8). Total num frames: 1127641088. Throughput: 0: 4951.2. Samples: 6909476. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:44,965][19571] Avg episode reward: [(0, '9.400')] [2025-01-05 13:41:45,478][19668] Updated weights for policy 0, policy_version 275306 (0.0016) [2025-01-05 13:41:47,417][19668] Updated weights for policy 0, policy_version 275316 (0.0015) [2025-01-05 13:41:49,466][19668] Updated weights for policy 0, policy_version 275326 (0.0016) [2025-01-05 13:41:49,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19865.5, 300 sec: 20021.8). Total num frames: 1127743488. Throughput: 0: 4960.2. Samples: 6924666. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:49,966][19571] Avg episode reward: [(0, '10.137')] [2025-01-05 13:41:51,584][19668] Updated weights for policy 0, policy_version 275336 (0.0016) [2025-01-05 13:41:53,515][19668] Updated weights for policy 0, policy_version 275346 (0.0018) [2025-01-05 13:41:54,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 20007.9). Total num frames: 1127841792. Throughput: 0: 4960.7. Samples: 6954870. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:54,965][19571] Avg episode reward: [(0, '10.077')] [2025-01-05 13:41:55,596][19668] Updated weights for policy 0, policy_version 275356 (0.0016) [2025-01-05 13:41:57,632][19668] Updated weights for policy 0, policy_version 275366 (0.0016) [2025-01-05 13:41:59,584][19668] Updated weights for policy 0, policy_version 275376 (0.0015) [2025-01-05 13:41:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 20021.8). Total num frames: 1127944192. Throughput: 0: 4971.9. Samples: 6985224. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:41:59,966][19571] Avg episode reward: [(0, '9.727')] [2025-01-05 13:42:01,667][19668] Updated weights for policy 0, policy_version 275386 (0.0016) [2025-01-05 13:42:03,731][19668] Updated weights for policy 0, policy_version 275396 (0.0015) [2025-01-05 13:42:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 20007.9). Total num frames: 1128042496. Throughput: 0: 4981.0. Samples: 7000300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:04,965][19571] Avg episode reward: [(0, '10.567')] [2025-01-05 13:42:05,753][19668] Updated weights for policy 0, policy_version 275406 (0.0016) [2025-01-05 13:42:07,813][19668] Updated weights for policy 0, policy_version 275416 (0.0015) [2025-01-05 13:42:09,942][19668] Updated weights for policy 0, policy_version 275426 (0.0016) [2025-01-05 13:42:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19933.8, 300 sec: 20021.8). 
Total num frames: 1128144896. Throughput: 0: 4980.8. Samples: 7030060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:09,966][19571] Avg episode reward: [(0, '9.545')] [2025-01-05 13:42:11,968][19668] Updated weights for policy 0, policy_version 275436 (0.0016) [2025-01-05 13:42:14,031][19668] Updated weights for policy 0, policy_version 275446 (0.0015) [2025-01-05 13:42:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20021.8). Total num frames: 1128243200. Throughput: 0: 4983.9. Samples: 7059790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:14,965][19571] Avg episode reward: [(0, '10.832')] [2025-01-05 13:42:16,157][19668] Updated weights for policy 0, policy_version 275456 (0.0016) [2025-01-05 13:42:18,135][19668] Updated weights for policy 0, policy_version 275466 (0.0015) [2025-01-05 13:42:19,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1128345600. Throughput: 0: 4980.0. Samples: 7074726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:19,965][19571] Avg episode reward: [(0, '9.312')] [2025-01-05 13:42:20,188][19668] Updated weights for policy 0, policy_version 275476 (0.0015) [2025-01-05 13:42:22,254][19668] Updated weights for policy 0, policy_version 275486 (0.0020) [2025-01-05 13:42:24,257][19668] Updated weights for policy 0, policy_version 275496 (0.0015) [2025-01-05 13:42:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 20021.8). Total num frames: 1128443904. Throughput: 0: 4997.4. Samples: 7104828. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:24,965][19571] Avg episode reward: [(0, '11.226')] [2025-01-05 13:42:26,300][19668] Updated weights for policy 0, policy_version 275506 (0.0015) [2025-01-05 13:42:28,356][19668] Updated weights for policy 0, policy_version 275516 (0.0015) [2025-01-05 13:42:29,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19934.0, 300 sec: 20007.9). Total num frames: 1128542208. 
Throughput: 0: 5007.6. Samples: 7134818. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:29,965][19571] Avg episode reward: [(0, '9.124')] [2025-01-05 13:42:30,426][19668] Updated weights for policy 0, policy_version 275526 (0.0017) [2025-01-05 13:42:32,452][19668] Updated weights for policy 0, policy_version 275536 (0.0015) [2025-01-05 13:42:34,527][19668] Updated weights for policy 0, policy_version 275546 (0.0015) [2025-01-05 13:42:34,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 20021.8). Total num frames: 1128644608. Throughput: 0: 5004.8. Samples: 7149880. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:34,965][19571] Avg episode reward: [(0, '10.630')] [2025-01-05 13:42:36,604][19668] Updated weights for policy 0, policy_version 275556 (0.0016) [2025-01-05 13:42:38,619][19668] Updated weights for policy 0, policy_version 275566 (0.0014) [2025-01-05 13:42:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20021.8). Total num frames: 1128742912. Throughput: 0: 4996.9. Samples: 7179732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:39,965][19571] Avg episode reward: [(0, '10.028')] [2025-01-05 13:42:40,070][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000275573_1128747008.pth... [2025-01-05 13:42:40,121][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000274399_1123938304.pth [2025-01-05 13:42:40,733][19668] Updated weights for policy 0, policy_version 275576 (0.0018) [2025-01-05 13:42:42,719][19668] Updated weights for policy 0, policy_version 275586 (0.0015) [2025-01-05 13:42:44,760][19668] Updated weights for policy 0, policy_version 275596 (0.0015) [2025-01-05 13:42:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1128841216. Throughput: 0: 4989.7. Samples: 7209760. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:42:44,965][19571] Avg episode reward: [(0, '11.407')] [2025-01-05 13:42:46,887][19668] Updated weights for policy 0, policy_version 275606 (0.0016) [2025-01-05 13:42:48,879][19668] Updated weights for policy 0, policy_version 275616 (0.0016) [2025-01-05 13:42:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 20021.8). Total num frames: 1128943616. Throughput: 0: 4981.9. Samples: 7224486. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:42:49,965][19571] Avg episode reward: [(0, '9.472')] [2025-01-05 13:42:50,918][19668] Updated weights for policy 0, policy_version 275626 (0.0018) [2025-01-05 13:42:52,964][19668] Updated weights for policy 0, policy_version 275636 (0.0015) [2025-01-05 13:42:54,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 20021.8). Total num frames: 1129041920. Throughput: 0: 4992.0. Samples: 7254698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:42:54,965][19571] Avg episode reward: [(0, '10.663')] [2025-01-05 13:42:55,048][19668] Updated weights for policy 0, policy_version 275646 (0.0020) [2025-01-05 13:42:57,090][19668] Updated weights for policy 0, policy_version 275656 (0.0014) [2025-01-05 13:42:59,128][19668] Updated weights for policy 0, policy_version 275666 (0.0015) [2025-01-05 13:42:59,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.2, 300 sec: 20007.9). Total num frames: 1129144320. Throughput: 0: 4996.7. Samples: 7284640. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:42:59,965][19571] Avg episode reward: [(0, '10.863')] [2025-01-05 13:43:01,243][19668] Updated weights for policy 0, policy_version 275676 (0.0017) [2025-01-05 13:43:03,281][19668] Updated weights for policy 0, policy_version 275686 (0.0016) [2025-01-05 13:43:04,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19994.0). Total num frames: 1129238528. Throughput: 0: 4990.6. Samples: 7299302. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:04,965][19571] Avg episode reward: [(0, '9.522')] [2025-01-05 13:43:05,420][19668] Updated weights for policy 0, policy_version 275696 (0.0015) [2025-01-05 13:43:07,445][19668] Updated weights for policy 0, policy_version 275706 (0.0015) [2025-01-05 13:43:09,463][19668] Updated weights for policy 0, policy_version 275716 (0.0015) [2025-01-05 13:43:09,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 20007.9). Total num frames: 1129340928. Throughput: 0: 4983.2. Samples: 7329072. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:09,965][19571] Avg episode reward: [(0, '10.084')] [2025-01-05 13:43:11,568][19668] Updated weights for policy 0, policy_version 275726 (0.0015) [2025-01-05 13:43:13,571][19668] Updated weights for policy 0, policy_version 275736 (0.0016) [2025-01-05 13:43:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 20007.9). Total num frames: 1129439232. Throughput: 0: 4985.1. Samples: 7359148. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:14,965][19571] Avg episode reward: [(0, '10.428')] [2025-01-05 13:43:15,623][19668] Updated weights for policy 0, policy_version 275746 (0.0015) [2025-01-05 13:43:17,647][19668] Updated weights for policy 0, policy_version 275756 (0.0016) [2025-01-05 13:43:19,677][19668] Updated weights for policy 0, policy_version 275766 (0.0015) [2025-01-05 13:43:19,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 20007.9). Total num frames: 1129541632. Throughput: 0: 4986.2. Samples: 7374258. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:19,965][19571] Avg episode reward: [(0, '8.848')] [2025-01-05 13:43:21,769][19668] Updated weights for policy 0, policy_version 275776 (0.0016) [2025-01-05 13:43:23,807][19668] Updated weights for policy 0, policy_version 275786 (0.0016) [2025-01-05 13:43:24,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.9, 300 sec: 20007.9). 
Total num frames: 1129639936. Throughput: 0: 4988.4. Samples: 7404212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:24,965][19571] Avg episode reward: [(0, '9.932')] [2025-01-05 13:43:25,882][19668] Updated weights for policy 0, policy_version 275796 (0.0016) [2025-01-05 13:43:27,905][19668] Updated weights for policy 0, policy_version 275806 (0.0015) [2025-01-05 13:43:29,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19933.8, 300 sec: 20007.9). Total num frames: 1129738240. Throughput: 0: 4983.1. Samples: 7434000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:29,965][19571] Avg episode reward: [(0, '9.972')] [2025-01-05 13:43:30,004][19668] Updated weights for policy 0, policy_version 275816 (0.0016) [2025-01-05 13:43:32,081][19668] Updated weights for policy 0, policy_version 275826 (0.0015) [2025-01-05 13:43:34,107][19668] Updated weights for policy 0, policy_version 275836 (0.0016) [2025-01-05 13:43:34,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19865.6, 300 sec: 19994.0). Total num frames: 1129836544. Throughput: 0: 4982.6. Samples: 7448702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:34,966][19571] Avg episode reward: [(0, '12.006')] [2025-01-05 13:43:36,228][19668] Updated weights for policy 0, policy_version 275846 (0.0016) [2025-01-05 13:43:38,237][19668] Updated weights for policy 0, policy_version 275856 (0.0016) [2025-01-05 13:43:39,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20007.9). Total num frames: 1129938944. Throughput: 0: 4975.5. Samples: 7478594. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:39,965][19571] Avg episode reward: [(0, '9.663')] [2025-01-05 13:43:40,262][19668] Updated weights for policy 0, policy_version 275866 (0.0016) [2025-01-05 13:43:42,269][19668] Updated weights for policy 0, policy_version 275876 (0.0015) [2025-01-05 13:43:44,262][19668] Updated weights for policy 0, policy_version 275886 (0.0016) [2025-01-05 13:43:44,965][19571] Fps is (10 sec: 20480.5, 60 sec: 20002.2, 300 sec: 20021.8). Total num frames: 1130041344. Throughput: 0: 4991.7. Samples: 7509268. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:44,965][19571] Avg episode reward: [(0, '8.890')] [2025-01-05 13:43:46,282][19668] Updated weights for policy 0, policy_version 275896 (0.0015) [2025-01-05 13:43:48,300][19668] Updated weights for policy 0, policy_version 275906 (0.0016) [2025-01-05 13:43:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.8, 300 sec: 20007.9). Total num frames: 1130139648. Throughput: 0: 5004.8. Samples: 7524520. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:49,965][19571] Avg episode reward: [(0, '9.374')] [2025-01-05 13:43:50,396][19668] Updated weights for policy 0, policy_version 275916 (0.0016) [2025-01-05 13:43:52,421][19668] Updated weights for policy 0, policy_version 275926 (0.0016) [2025-01-05 13:43:54,476][19668] Updated weights for policy 0, policy_version 275936 (0.0015) [2025-01-05 13:43:54,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20002.1, 300 sec: 20021.8). Total num frames: 1130242048. Throughput: 0: 5007.3. Samples: 7554400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:54,965][19571] Avg episode reward: [(0, '9.904')] [2025-01-05 13:43:56,555][19668] Updated weights for policy 0, policy_version 275946 (0.0016) [2025-01-05 13:43:58,577][19668] Updated weights for policy 0, policy_version 275956 (0.0016) [2025-01-05 13:43:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 20021.8). 
Total num frames: 1130340352. Throughput: 0: 5001.5. Samples: 7584214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:43:59,965][19571] Avg episode reward: [(0, '9.813')] [2025-01-05 13:44:00,667][19668] Updated weights for policy 0, policy_version 275966 (0.0016) [2025-01-05 13:44:02,679][19668] Updated weights for policy 0, policy_version 275976 (0.0015) [2025-01-05 13:44:04,664][19668] Updated weights for policy 0, policy_version 275986 (0.0015) [2025-01-05 13:44:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20021.8). Total num frames: 1130442752. Throughput: 0: 5002.2. Samples: 7599358. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:04,965][19571] Avg episode reward: [(0, '9.483')] [2025-01-05 13:44:06,717][19668] Updated weights for policy 0, policy_version 275996 (0.0015) [2025-01-05 13:44:08,728][19668] Updated weights for policy 0, policy_version 276006 (0.0015) [2025-01-05 13:44:09,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20021.8). Total num frames: 1130545152. Throughput: 0: 5014.1. Samples: 7629844. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:09,965][19571] Avg episode reward: [(0, '10.533')] [2025-01-05 13:44:10,738][19668] Updated weights for policy 0, policy_version 276016 (0.0015) [2025-01-05 13:44:12,776][19668] Updated weights for policy 0, policy_version 276026 (0.0015) [2025-01-05 13:44:14,790][19668] Updated weights for policy 0, policy_version 276036 (0.0018) [2025-01-05 13:44:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1130643456. Throughput: 0: 5029.2. Samples: 7660312. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:14,965][19571] Avg episode reward: [(0, '9.518')] [2025-01-05 13:44:16,783][19668] Updated weights for policy 0, policy_version 276046 (0.0014) [2025-01-05 13:44:18,803][19668] Updated weights for policy 0, policy_version 276056 (0.0015) [2025-01-05 13:44:19,965][19571] Fps is (10 sec: 20069.9, 60 sec: 20070.3, 300 sec: 20007.9). Total num frames: 1130745856. Throughput: 0: 5040.3. Samples: 7675516. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:19,966][19571] Avg episode reward: [(0, '9.791')] [2025-01-05 13:44:20,910][19668] Updated weights for policy 0, policy_version 276066 (0.0016) [2025-01-05 13:44:22,933][19668] Updated weights for policy 0, policy_version 276076 (0.0016) [2025-01-05 13:44:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1130844160. Throughput: 0: 5039.8. Samples: 7705386. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:24,965][19571] Avg episode reward: [(0, '10.008')] [2025-01-05 13:44:24,996][19668] Updated weights for policy 0, policy_version 276086 (0.0015) [2025-01-05 13:44:27,098][19668] Updated weights for policy 0, policy_version 276096 (0.0015) [2025-01-05 13:44:29,108][19668] Updated weights for policy 0, policy_version 276106 (0.0015) [2025-01-05 13:44:29,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20138.7, 300 sec: 20007.9). Total num frames: 1130946560. Throughput: 0: 5024.2. Samples: 7735356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:29,965][19571] Avg episode reward: [(0, '10.792')] [2025-01-05 13:44:31,120][19668] Updated weights for policy 0, policy_version 276116 (0.0018) [2025-01-05 13:44:33,187][19668] Updated weights for policy 0, policy_version 276126 (0.0016) [2025-01-05 13:44:34,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.8, 300 sec: 20007.9). Total num frames: 1131044864. Throughput: 0: 5021.9. Samples: 7750506. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:34,965][19571] Avg episode reward: [(0, '8.969')] [2025-01-05 13:44:35,231][19668] Updated weights for policy 0, policy_version 276136 (0.0018) [2025-01-05 13:44:37,238][19668] Updated weights for policy 0, policy_version 276146 (0.0015) [2025-01-05 13:44:39,292][19668] Updated weights for policy 0, policy_version 276156 (0.0015) [2025-01-05 13:44:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 20007.9). Total num frames: 1131147264. Throughput: 0: 5027.6. Samples: 7780642. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:39,966][19571] Avg episode reward: [(0, '9.552')] [2025-01-05 13:44:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000276159_1131147264.pth... [2025-01-05 13:44:40,028][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000274989_1126354944.pth [2025-01-05 13:44:41,373][19668] Updated weights for policy 0, policy_version 276166 (0.0017) [2025-01-05 13:44:43,395][19668] Updated weights for policy 0, policy_version 276176 (0.0017) [2025-01-05 13:44:44,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 19994.0). Total num frames: 1131245568. Throughput: 0: 5028.1. Samples: 7810476. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:44,965][19571] Avg episode reward: [(0, '9.033')] [2025-01-05 13:44:45,481][19668] Updated weights for policy 0, policy_version 276186 (0.0015) [2025-01-05 13:44:47,480][19668] Updated weights for policy 0, policy_version 276196 (0.0019) [2025-01-05 13:44:49,518][19668] Updated weights for policy 0, policy_version 276206 (0.0016) [2025-01-05 13:44:49,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 19994.0). Total num frames: 1131347968. Throughput: 0: 5026.5. Samples: 7825550. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:49,965][19571] Avg episode reward: [(0, '9.200')] [2025-01-05 13:44:51,595][19668] Updated weights for policy 0, policy_version 276216 (0.0019) [2025-01-05 13:44:53,590][19668] Updated weights for policy 0, policy_version 276226 (0.0016) [2025-01-05 13:44:54,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 19994.0). Total num frames: 1131446272. Throughput: 0: 5021.7. Samples: 7855820. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:54,965][19571] Avg episode reward: [(0, '9.954')] [2025-01-05 13:44:55,589][19668] Updated weights for policy 0, policy_version 276236 (0.0016) [2025-01-05 13:44:57,675][19668] Updated weights for policy 0, policy_version 276246 (0.0016) [2025-01-05 13:44:59,640][19668] Updated weights for policy 0, policy_version 276256 (0.0015) [2025-01-05 13:44:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19994.0). Total num frames: 1131548672. Throughput: 0: 5018.1. Samples: 7886128. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:44:59,965][19571] Avg episode reward: [(0, '9.355')] [2025-01-05 13:45:01,650][19668] Updated weights for policy 0, policy_version 276266 (0.0016) [2025-01-05 13:45:03,729][19668] Updated weights for policy 0, policy_version 276276 (0.0016) [2025-01-05 13:45:04,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 19980.1). Total num frames: 1131646976. Throughput: 0: 5018.4. Samples: 7901342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:45:04,965][19571] Avg episode reward: [(0, '10.424')] [2025-01-05 13:45:05,761][19668] Updated weights for policy 0, policy_version 276286 (0.0016) [2025-01-05 13:45:07,760][19668] Updated weights for policy 0, policy_version 276296 (0.0016) [2025-01-05 13:45:09,850][19668] Updated weights for policy 0, policy_version 276306 (0.0016) [2025-01-05 13:45:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 19980.2). 
Total num frames: 1131749376. Throughput: 0: 5026.6. Samples: 7931584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:45:09,965][19571] Avg episode reward: [(0, '10.527')] [2025-01-05 13:45:11,872][19668] Updated weights for policy 0, policy_version 276316 (0.0017) [2025-01-05 13:45:13,878][19668] Updated weights for policy 0, policy_version 276326 (0.0015) [2025-01-05 13:45:14,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 19994.0). Total num frames: 1131851776. Throughput: 0: 5029.5. Samples: 7961684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:14,965][19571] Avg episode reward: [(0, '10.576')] [2025-01-05 13:45:15,977][19668] Updated weights for policy 0, policy_version 276336 (0.0016) [2025-01-05 13:45:17,942][19668] Updated weights for policy 0, policy_version 276346 (0.0018) [2025-01-05 13:45:19,949][19668] Updated weights for policy 0, policy_version 276356 (0.0016) [2025-01-05 13:45:19,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20138.8, 300 sec: 19994.0). Total num frames: 1131954176. Throughput: 0: 5030.4. Samples: 7976874. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:19,965][19571] Avg episode reward: [(0, '9.358')] [2025-01-05 13:45:22,037][19668] Updated weights for policy 0, policy_version 276366 (0.0015) [2025-01-05 13:45:23,993][19668] Updated weights for policy 0, policy_version 276376 (0.0016) [2025-01-05 13:45:24,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 19994.0). Total num frames: 1132052480. Throughput: 0: 5038.1. Samples: 8007358. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:24,966][19571] Avg episode reward: [(0, '10.797')] [2025-01-05 13:45:25,985][19668] Updated weights for policy 0, policy_version 276386 (0.0015) [2025-01-05 13:45:28,064][19668] Updated weights for policy 0, policy_version 276396 (0.0016) [2025-01-05 13:45:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20007.9). Total num frames: 1132154880. 
Throughput: 0: 5045.1. Samples: 8037506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:29,965][19571] Avg episode reward: [(0, '9.166')] [2025-01-05 13:45:30,106][19668] Updated weights for policy 0, policy_version 276406 (0.0016) [2025-01-05 13:45:32,141][19668] Updated weights for policy 0, policy_version 276416 (0.0016) [2025-01-05 13:45:34,175][19668] Updated weights for policy 0, policy_version 276426 (0.0015) [2025-01-05 13:45:34,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.6, 300 sec: 19994.0). Total num frames: 1132253184. Throughput: 0: 5047.2. Samples: 8052676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:34,965][19571] Avg episode reward: [(0, '10.199')] [2025-01-05 13:45:36,206][19668] Updated weights for policy 0, policy_version 276436 (0.0016) [2025-01-05 13:45:38,219][19668] Updated weights for policy 0, policy_version 276446 (0.0015) [2025-01-05 13:45:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20007.9). Total num frames: 1132355584. Throughput: 0: 5049.0. Samples: 8083026. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:39,965][19571] Avg episode reward: [(0, '8.410')] [2025-01-05 13:45:40,270][19668] Updated weights for policy 0, policy_version 276456 (0.0016) [2025-01-05 13:45:42,237][19668] Updated weights for policy 0, policy_version 276466 (0.0016) [2025-01-05 13:45:44,276][19668] Updated weights for policy 0, policy_version 276476 (0.0016) [2025-01-05 13:45:44,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20207.0, 300 sec: 20021.8). Total num frames: 1132457984. Throughput: 0: 5048.7. Samples: 8113320. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:44,965][19571] Avg episode reward: [(0, '10.132')] [2025-01-05 13:45:46,350][19668] Updated weights for policy 0, policy_version 276486 (0.0015) [2025-01-05 13:45:48,306][19668] Updated weights for policy 0, policy_version 276496 (0.0016) [2025-01-05 13:45:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20021.8). Total num frames: 1132556288. Throughput: 0: 5047.3. Samples: 8128472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:49,965][19571] Avg episode reward: [(0, '10.481')] [2025-01-05 13:45:50,352][19668] Updated weights for policy 0, policy_version 276506 (0.0017) [2025-01-05 13:45:52,399][19668] Updated weights for policy 0, policy_version 276516 (0.0016) [2025-01-05 13:45:54,379][19668] Updated weights for policy 0, policy_version 276526 (0.0015) [2025-01-05 13:45:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20207.0, 300 sec: 20021.8). Total num frames: 1132658688. Throughput: 0: 5054.6. Samples: 8159042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:54,965][19571] Avg episode reward: [(0, '9.462')] [2025-01-05 13:45:56,405][19668] Updated weights for policy 0, policy_version 276536 (0.0016) [2025-01-05 13:45:58,433][19668] Updated weights for policy 0, policy_version 276546 (0.0016) [2025-01-05 13:45:59,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20035.7). Total num frames: 1132761088. Throughput: 0: 5057.7. Samples: 8189282. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:45:59,965][19571] Avg episode reward: [(0, '9.784')] [2025-01-05 13:46:00,470][19668] Updated weights for policy 0, policy_version 276556 (0.0016) [2025-01-05 13:46:02,463][19668] Updated weights for policy 0, policy_version 276566 (0.0016) [2025-01-05 13:46:04,500][19668] Updated weights for policy 0, policy_version 276576 (0.0016) [2025-01-05 13:46:04,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20275.2, 300 sec: 20049.6). 
Total num frames: 1132863488. Throughput: 0: 5059.3. Samples: 8204544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:46:04,965][19571] Avg episode reward: [(0, '10.652')] [2025-01-05 13:46:06,540][19668] Updated weights for policy 0, policy_version 276586 (0.0016) [2025-01-05 13:46:08,538][19668] Updated weights for policy 0, policy_version 276596 (0.0016) [2025-01-05 13:46:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20206.9, 300 sec: 20049.6). Total num frames: 1132961792. Throughput: 0: 5057.5. Samples: 8234946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:46:09,965][19571] Avg episode reward: [(0, '9.831')] [2025-01-05 13:46:10,598][19668] Updated weights for policy 0, policy_version 276606 (0.0016) [2025-01-05 13:46:12,573][19668] Updated weights for policy 0, policy_version 276616 (0.0016) [2025-01-05 13:46:14,597][19668] Updated weights for policy 0, policy_version 276626 (0.0017) [2025-01-05 13:46:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 20063.5). Total num frames: 1133064192. Throughput: 0: 5062.6. Samples: 8265322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:46:14,965][19571] Avg episode reward: [(0, '11.566')] [2025-01-05 13:46:16,687][19668] Updated weights for policy 0, policy_version 276636 (0.0016) [2025-01-05 13:46:18,671][19668] Updated weights for policy 0, policy_version 276646 (0.0016) [2025-01-05 13:46:19,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20063.5). Total num frames: 1133166592. Throughput: 0: 5057.5. Samples: 8280262. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:46:19,965][19571] Avg episode reward: [(0, '9.356')] [2025-01-05 13:46:20,685][19668] Updated weights for policy 0, policy_version 276656 (0.0015) [2025-01-05 13:46:22,708][19668] Updated weights for policy 0, policy_version 276666 (0.0018) [2025-01-05 13:46:24,708][19668] Updated weights for policy 0, policy_version 276676 (0.0015) [2025-01-05 13:46:24,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20275.2, 300 sec: 20077.4). Total num frames: 1133268992. Throughput: 0: 5062.5. Samples: 8310838. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:24,965][19571] Avg episode reward: [(0, '10.913')] [2025-01-05 13:46:26,739][19668] Updated weights for policy 0, policy_version 276686 (0.0015) [2025-01-05 13:46:28,762][19668] Updated weights for policy 0, policy_version 276696 (0.0016) [2025-01-05 13:46:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 20077.3). Total num frames: 1133367296. Throughput: 0: 5063.0. Samples: 8341154. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:29,965][19571] Avg episode reward: [(0, '11.929')] [2025-01-05 13:46:30,809][19668] Updated weights for policy 0, policy_version 276706 (0.0016) [2025-01-05 13:46:32,888][19668] Updated weights for policy 0, policy_version 276716 (0.0016) [2025-01-05 13:46:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20275.2, 300 sec: 20091.2). Total num frames: 1133469696. Throughput: 0: 5056.0. Samples: 8355994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:34,965][19571] Avg episode reward: [(0, '9.747')] [2025-01-05 13:46:34,968][19668] Updated weights for policy 0, policy_version 276726 (0.0017) [2025-01-05 13:46:37,086][19668] Updated weights for policy 0, policy_version 276736 (0.0018) [2025-01-05 13:46:39,153][19668] Updated weights for policy 0, policy_version 276746 (0.0016) [2025-01-05 13:46:39,965][19571] Fps is (10 sec: 19660.4, 60 sec: 20138.6, 300 sec: 20077.3). 
Total num frames: 1133563904. Throughput: 0: 5032.0. Samples: 8385484. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:39,966][19571] Avg episode reward: [(0, '10.241')] [2025-01-05 13:46:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000276750_1133568000.pth... [2025-01-05 13:46:40,023][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000275573_1128747008.pth [2025-01-05 13:46:41,250][19668] Updated weights for policy 0, policy_version 276756 (0.0017) [2025-01-05 13:46:43,286][19668] Updated weights for policy 0, policy_version 276766 (0.0017) [2025-01-05 13:46:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20138.6, 300 sec: 20077.3). Total num frames: 1133666304. Throughput: 0: 5022.1. Samples: 8415278. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:44,965][19571] Avg episode reward: [(0, '10.129')] [2025-01-05 13:46:45,336][19668] Updated weights for policy 0, policy_version 276776 (0.0016) [2025-01-05 13:46:47,372][19668] Updated weights for policy 0, policy_version 276786 (0.0016) [2025-01-05 13:46:49,374][19668] Updated weights for policy 0, policy_version 276796 (0.0017) [2025-01-05 13:46:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20077.3). Total num frames: 1133764608. Throughput: 0: 5018.0. Samples: 8430356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:49,965][19571] Avg episode reward: [(0, '10.160')] [2025-01-05 13:46:51,457][19668] Updated weights for policy 0, policy_version 276806 (0.0016) [2025-01-05 13:46:53,494][19668] Updated weights for policy 0, policy_version 276816 (0.0016) [2025-01-05 13:46:54,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1133862912. Throughput: 0: 5008.1. Samples: 8460312. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:54,965][19571] Avg episode reward: [(0, '8.539')] [2025-01-05 13:46:55,602][19668] Updated weights for policy 0, policy_version 276826 (0.0016) [2025-01-05 13:46:57,663][19668] Updated weights for policy 0, policy_version 276836 (0.0016) [2025-01-05 13:46:59,680][19668] Updated weights for policy 0, policy_version 276846 (0.0016) [2025-01-05 13:46:59,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.5, 300 sec: 20077.3). Total num frames: 1133965312. Throughput: 0: 4994.5. Samples: 8490074. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:46:59,965][19571] Avg episode reward: [(0, '10.222')] [2025-01-05 13:47:01,706][19668] Updated weights for policy 0, policy_version 276856 (0.0016) [2025-01-05 13:47:03,763][19668] Updated weights for policy 0, policy_version 276866 (0.0016) [2025-01-05 13:47:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 20063.5). Total num frames: 1134063616. Throughput: 0: 4999.3. Samples: 8505230. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:04,965][19571] Avg episode reward: [(0, '9.926')] [2025-01-05 13:47:05,839][19668] Updated weights for policy 0, policy_version 276876 (0.0017) [2025-01-05 13:47:07,896][19668] Updated weights for policy 0, policy_version 276886 (0.0016) [2025-01-05 13:47:09,965][19571] Fps is (10 sec: 19660.5, 60 sec: 20002.1, 300 sec: 20063.4). Total num frames: 1134161920. Throughput: 0: 4976.9. Samples: 8534800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:09,965][19571] Avg episode reward: [(0, '10.632')] [2025-01-05 13:47:10,048][19668] Updated weights for policy 0, policy_version 276896 (0.0017) [2025-01-05 13:47:12,104][19668] Updated weights for policy 0, policy_version 276906 (0.0017) [2025-01-05 13:47:14,132][19668] Updated weights for policy 0, policy_version 276916 (0.0017) [2025-01-05 13:47:14,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19933.8, 300 sec: 20049.6). 
Total num frames: 1134260224. Throughput: 0: 4964.1. Samples: 8564538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:14,965][19571] Avg episode reward: [(0, '9.301')] [2025-01-05 13:47:16,272][19668] Updated weights for policy 0, policy_version 276926 (0.0017) [2025-01-05 13:47:18,263][19668] Updated weights for policy 0, policy_version 276936 (0.0015) [2025-01-05 13:47:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1134362624. Throughput: 0: 4962.8. Samples: 8579320. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:19,965][19571] Avg episode reward: [(0, '9.630')] [2025-01-05 13:47:20,304][19668] Updated weights for policy 0, policy_version 276946 (0.0015) [2025-01-05 13:47:22,364][19668] Updated weights for policy 0, policy_version 276956 (0.0016) [2025-01-05 13:47:24,352][19668] Updated weights for policy 0, policy_version 276966 (0.0016) [2025-01-05 13:47:24,965][19571] Fps is (10 sec: 20479.9, 60 sec: 19933.8, 300 sec: 20077.3). Total num frames: 1134465024. Throughput: 0: 4981.2. Samples: 8609638. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:24,965][19571] Avg episode reward: [(0, '10.013')] [2025-01-05 13:47:26,391][19668] Updated weights for policy 0, policy_version 276976 (0.0019) [2025-01-05 13:47:28,464][19668] Updated weights for policy 0, policy_version 276986 (0.0015) [2025-01-05 13:47:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.8, 300 sec: 20063.5). Total num frames: 1134563328. Throughput: 0: 4984.7. Samples: 8639588. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:47:29,965][19571] Avg episode reward: [(0, '11.085')] [2025-01-05 13:47:30,538][19668] Updated weights for policy 0, policy_version 276996 (0.0016) [2025-01-05 13:47:32,562][19668] Updated weights for policy 0, policy_version 277006 (0.0016) [2025-01-05 13:47:34,623][19668] Updated weights for policy 0, policy_version 277016 (0.0016) [2025-01-05 13:47:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19865.6, 300 sec: 20063.5). Total num frames: 1134661632. Throughput: 0: 4983.4. Samples: 8654608. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:34,965][19571] Avg episode reward: [(0, '9.559')] [2025-01-05 13:47:36,713][19668] Updated weights for policy 0, policy_version 277026 (0.0017) [2025-01-05 13:47:38,745][19668] Updated weights for policy 0, policy_version 277036 (0.0016) [2025-01-05 13:47:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1134759936. Throughput: 0: 4979.1. Samples: 8684372. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:39,965][19571] Avg episode reward: [(0, '9.483')] [2025-01-05 13:47:40,874][19668] Updated weights for policy 0, policy_version 277046 (0.0016) [2025-01-05 13:47:42,883][19668] Updated weights for policy 0, policy_version 277056 (0.0016) [2025-01-05 13:47:44,898][19668] Updated weights for policy 0, policy_version 277066 (0.0016) [2025-01-05 13:47:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1134862336. Throughput: 0: 4983.8. Samples: 8714346. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:44,965][19571] Avg episode reward: [(0, '11.911')] [2025-01-05 13:47:46,960][19668] Updated weights for policy 0, policy_version 277076 (0.0019) [2025-01-05 13:47:48,964][19668] Updated weights for policy 0, policy_version 277086 (0.0015) [2025-01-05 13:47:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 20063.5). 
Total num frames: 1134960640. Throughput: 0: 4985.4. Samples: 8729574. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:49,965][19571] Avg episode reward: [(0, '9.307')] [2025-01-05 13:47:50,961][19668] Updated weights for policy 0, policy_version 277096 (0.0016) [2025-01-05 13:47:53,051][19668] Updated weights for policy 0, policy_version 277106 (0.0016) [2025-01-05 13:47:54,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19933.8, 300 sec: 20049.6). Total num frames: 1135058944. Throughput: 0: 4997.7. Samples: 8759698. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:54,965][19571] Avg episode reward: [(0, '10.003')] [2025-01-05 13:47:55,130][19668] Updated weights for policy 0, policy_version 277116 (0.0016) [2025-01-05 13:47:57,154][19668] Updated weights for policy 0, policy_version 277126 (0.0016) [2025-01-05 13:47:59,222][19668] Updated weights for policy 0, policy_version 277136 (0.0015) [2025-01-05 13:47:59,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.8, 300 sec: 20077.3). Total num frames: 1135161344. Throughput: 0: 5001.7. Samples: 8789614. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:47:59,965][19571] Avg episode reward: [(0, '8.644')] [2025-01-05 13:48:01,259][19668] Updated weights for policy 0, policy_version 277146 (0.0016) [2025-01-05 13:48:03,260][19668] Updated weights for policy 0, policy_version 277156 (0.0015) [2025-01-05 13:48:04,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20002.1, 300 sec: 20077.3). Total num frames: 1135263744. Throughput: 0: 5006.7. Samples: 8804622. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:04,965][19571] Avg episode reward: [(0, '9.965')] [2025-01-05 13:48:05,318][19668] Updated weights for policy 0, policy_version 277166 (0.0015) [2025-01-05 13:48:07,307][19668] Updated weights for policy 0, policy_version 277176 (0.0015) [2025-01-05 13:48:09,297][19668] Updated weights for policy 0, policy_version 277186 (0.0015) [2025-01-05 13:48:09,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1135366144. Throughput: 0: 5011.6. Samples: 8835158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:09,965][19571] Avg episode reward: [(0, '9.182')] [2025-01-05 13:48:11,403][19668] Updated weights for policy 0, policy_version 277196 (0.0015) [2025-01-05 13:48:13,388][19668] Updated weights for policy 0, policy_version 277206 (0.0017) [2025-01-05 13:48:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1135464448. Throughput: 0: 5019.5. Samples: 8865466. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:14,965][19571] Avg episode reward: [(0, '10.040')] [2025-01-05 13:48:15,392][19668] Updated weights for policy 0, policy_version 277216 (0.0014) [2025-01-05 13:48:17,469][19668] Updated weights for policy 0, policy_version 277226 (0.0015) [2025-01-05 13:48:19,449][19668] Updated weights for policy 0, policy_version 277236 (0.0015) [2025-01-05 13:48:19,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1135566848. Throughput: 0: 5021.6. Samples: 8880578. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:19,965][19571] Avg episode reward: [(0, '11.489')] [2025-01-05 13:48:21,441][19668] Updated weights for policy 0, policy_version 277246 (0.0015) [2025-01-05 13:48:23,529][19668] Updated weights for policy 0, policy_version 277256 (0.0015) [2025-01-05 13:48:24,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20070.4, 300 sec: 20105.1). 
Total num frames: 1135669248. Throughput: 0: 5036.8. Samples: 8911028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:24,965][19571] Avg episode reward: [(0, '9.666')] [2025-01-05 13:48:25,590][19668] Updated weights for policy 0, policy_version 277266 (0.0015) [2025-01-05 13:48:27,570][19668] Updated weights for policy 0, policy_version 277276 (0.0015) [2025-01-05 13:48:29,640][19668] Updated weights for policy 0, policy_version 277286 (0.0015) [2025-01-05 13:48:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20105.1). Total num frames: 1135767552. Throughput: 0: 5038.5. Samples: 8941080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:29,965][19571] Avg episode reward: [(0, '9.714')] [2025-01-05 13:48:31,709][19668] Updated weights for policy 0, policy_version 277296 (0.0015) [2025-01-05 13:48:33,707][19668] Updated weights for policy 0, policy_version 277306 (0.0015) [2025-01-05 13:48:34,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.6, 300 sec: 20105.1). Total num frames: 1135869952. Throughput: 0: 5032.2. Samples: 8956024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:34,965][19571] Avg episode reward: [(0, '9.211')] [2025-01-05 13:48:35,770][19668] Updated weights for policy 0, policy_version 277316 (0.0015) [2025-01-05 13:48:37,759][19668] Updated weights for policy 0, policy_version 277326 (0.0015) [2025-01-05 13:48:39,751][19668] Updated weights for policy 0, policy_version 277336 (0.0016) [2025-01-05 13:48:39,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1135968256. Throughput: 0: 5040.8. Samples: 8986534. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:39,965][19571] Avg episode reward: [(0, '9.277')] [2025-01-05 13:48:39,999][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000277337_1135972352.pth... 
[2025-01-05 13:48:40,051][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000276159_1131147264.pth [2025-01-05 13:48:41,877][19668] Updated weights for policy 0, policy_version 277346 (0.0016) [2025-01-05 13:48:43,854][19668] Updated weights for policy 0, policy_version 277356 (0.0016) [2025-01-05 13:48:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1136070656. Throughput: 0: 5047.1. Samples: 9016734. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:48:44,965][19571] Avg episode reward: [(0, '10.354')] [2025-01-05 13:48:45,832][19668] Updated weights for policy 0, policy_version 277366 (0.0014) [2025-01-05 13:48:47,903][19668] Updated weights for policy 0, policy_version 277376 (0.0016) [2025-01-05 13:48:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1136168960. Throughput: 0: 5052.2. Samples: 9031970. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:48:49,966][19571] Avg episode reward: [(0, '9.529')] [2025-01-05 13:48:49,978][19668] Updated weights for policy 0, policy_version 277386 (0.0016) [2025-01-05 13:48:52,009][19668] Updated weights for policy 0, policy_version 277396 (0.0017) [2025-01-05 13:48:54,087][19668] Updated weights for policy 0, policy_version 277406 (0.0016) [2025-01-05 13:48:54,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20206.9, 300 sec: 20105.1). Total num frames: 1136271360. Throughput: 0: 5038.1. Samples: 9061874. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:48:54,966][19571] Avg episode reward: [(0, '9.468')] [2025-01-05 13:48:56,112][19668] Updated weights for policy 0, policy_version 277416 (0.0015) [2025-01-05 13:48:58,117][19668] Updated weights for policy 0, policy_version 277426 (0.0014) [2025-01-05 13:48:59,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20207.0, 300 sec: 20105.1). Total num frames: 1136373760. 
Throughput: 0: 5032.4. Samples: 9091922. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:48:59,965][19571] Avg episode reward: [(0, '9.666')] [2025-01-05 13:49:00,209][19668] Updated weights for policy 0, policy_version 277436 (0.0015) [2025-01-05 13:49:02,181][19668] Updated weights for policy 0, policy_version 277446 (0.0015) [2025-01-05 13:49:04,171][19668] Updated weights for policy 0, policy_version 277456 (0.0015) [2025-01-05 13:49:04,965][19571] Fps is (10 sec: 20070.9, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1136472064. Throughput: 0: 5034.7. Samples: 9107140. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:04,965][19571] Avg episode reward: [(0, '10.182')] [2025-01-05 13:49:06,234][19668] Updated weights for policy 0, policy_version 277466 (0.0015) [2025-01-05 13:49:08,231][19668] Updated weights for policy 0, policy_version 277476 (0.0017) [2025-01-05 13:49:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1136574464. Throughput: 0: 5035.8. Samples: 9137640. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:09,965][19571] Avg episode reward: [(0, '10.723')] [2025-01-05 13:49:10,241][19668] Updated weights for policy 0, policy_version 277486 (0.0015) [2025-01-05 13:49:12,283][19668] Updated weights for policy 0, policy_version 277496 (0.0015) [2025-01-05 13:49:14,276][19668] Updated weights for policy 0, policy_version 277506 (0.0015) [2025-01-05 13:49:14,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20206.9, 300 sec: 20105.1). Total num frames: 1136676864. Throughput: 0: 5044.8. Samples: 9168096. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:14,965][19571] Avg episode reward: [(0, '10.032')] [2025-01-05 13:49:16,276][19668] Updated weights for policy 0, policy_version 277516 (0.0015) [2025-01-05 13:49:18,326][19668] Updated weights for policy 0, policy_version 277526 (0.0016) [2025-01-05 13:49:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.6, 300 sec: 20105.1). Total num frames: 1136775168. Throughput: 0: 5053.2. Samples: 9183418. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:19,965][19571] Avg episode reward: [(0, '11.471')] [2025-01-05 13:49:20,425][19668] Updated weights for policy 0, policy_version 277536 (0.0017) [2025-01-05 13:49:22,447][19668] Updated weights for policy 0, policy_version 277546 (0.0017) [2025-01-05 13:49:24,532][19668] Updated weights for policy 0, policy_version 277556 (0.0016) [2025-01-05 13:49:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1136877568. Throughput: 0: 5035.5. Samples: 9213132. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:24,965][19571] Avg episode reward: [(0, '10.120')] [2025-01-05 13:49:26,629][19668] Updated weights for policy 0, policy_version 277566 (0.0016) [2025-01-05 13:49:28,635][19668] Updated weights for policy 0, policy_version 277576 (0.0016) [2025-01-05 13:49:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1136975872. Throughput: 0: 5025.1. Samples: 9242862. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:29,965][19571] Avg episode reward: [(0, '10.514')] [2025-01-05 13:49:30,730][19668] Updated weights for policy 0, policy_version 277586 (0.0016) [2025-01-05 13:49:32,774][19668] Updated weights for policy 0, policy_version 277596 (0.0019) [2025-01-05 13:49:34,771][19668] Updated weights for policy 0, policy_version 277606 (0.0016) [2025-01-05 13:49:34,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20070.4, 300 sec: 20091.2). 
Total num frames: 1137074176. Throughput: 0: 5019.7. Samples: 9257856. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:34,965][19571] Avg episode reward: [(0, '10.442')] [2025-01-05 13:49:36,858][19668] Updated weights for policy 0, policy_version 277616 (0.0016) [2025-01-05 13:49:38,865][19668] Updated weights for policy 0, policy_version 277626 (0.0016) [2025-01-05 13:49:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1137176576. Throughput: 0: 5026.1. Samples: 9288048. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:39,965][19571] Avg episode reward: [(0, '10.230')] [2025-01-05 13:49:40,880][19668] Updated weights for policy 0, policy_version 277636 (0.0018) [2025-01-05 13:49:42,973][19668] Updated weights for policy 0, policy_version 277646 (0.0016) [2025-01-05 13:49:44,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1137274880. Throughput: 0: 5019.3. Samples: 9317790. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:44,965][19571] Avg episode reward: [(0, '10.075')] [2025-01-05 13:49:45,054][19668] Updated weights for policy 0, policy_version 277656 (0.0020) [2025-01-05 13:49:47,075][19668] Updated weights for policy 0, policy_version 277666 (0.0018) [2025-01-05 13:49:49,167][19668] Updated weights for policy 0, policy_version 277676 (0.0016) [2025-01-05 13:49:49,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1137373184. Throughput: 0: 5013.4. Samples: 9332746. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:49,965][19571] Avg episode reward: [(0, '10.770')] [2025-01-05 13:49:51,229][19668] Updated weights for policy 0, policy_version 277686 (0.0017) [2025-01-05 13:49:53,206][19668] Updated weights for policy 0, policy_version 277696 (0.0015) [2025-01-05 13:49:54,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1137475584. 
Throughput: 0: 5006.8. Samples: 9362944. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:54,965][19571] Avg episode reward: [(0, '10.291')] [2025-01-05 13:49:55,275][19668] Updated weights for policy 0, policy_version 277706 (0.0015) [2025-01-05 13:49:57,269][19668] Updated weights for policy 0, policy_version 277716 (0.0016) [2025-01-05 13:49:59,230][19668] Updated weights for policy 0, policy_version 277726 (0.0015) [2025-01-05 13:49:59,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20105.1). Total num frames: 1137577984. Throughput: 0: 5009.5. Samples: 9393522. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:49:59,965][19571] Avg episode reward: [(0, '10.598')] [2025-01-05 13:50:01,309][19668] Updated weights for policy 0, policy_version 277736 (0.0015) [2025-01-05 13:50:03,272][19668] Updated weights for policy 0, policy_version 277746 (0.0016) [2025-01-05 13:50:04,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20138.6, 300 sec: 20105.1). Total num frames: 1137680384. Throughput: 0: 5007.4. Samples: 9408750. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:04,965][19571] Avg episode reward: [(0, '8.597')] [2025-01-05 13:50:05,266][19668] Updated weights for policy 0, policy_version 277756 (0.0015) [2025-01-05 13:50:07,345][19668] Updated weights for policy 0, policy_version 277766 (0.0015) [2025-01-05 13:50:09,299][19668] Updated weights for policy 0, policy_version 277776 (0.0016) [2025-01-05 13:50:09,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1137782784. Throughput: 0: 5027.8. Samples: 9439382. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:09,965][19571] Avg episode reward: [(0, '8.661')] [2025-01-05 13:50:11,286][19668] Updated weights for policy 0, policy_version 277786 (0.0014) [2025-01-05 13:50:13,345][19668] Updated weights for policy 0, policy_version 277796 (0.0015) [2025-01-05 13:50:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1137881088. Throughput: 0: 5041.2. Samples: 9469718. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:14,965][19571] Avg episode reward: [(0, '10.649')] [2025-01-05 13:50:15,406][19668] Updated weights for policy 0, policy_version 277806 (0.0016) [2025-01-05 13:50:17,415][19668] Updated weights for policy 0, policy_version 277816 (0.0016) [2025-01-05 13:50:19,479][19668] Updated weights for policy 0, policy_version 277826 (0.0016) [2025-01-05 13:50:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.6, 300 sec: 20105.1). Total num frames: 1137983488. Throughput: 0: 5043.3. Samples: 9484804. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:19,965][19571] Avg episode reward: [(0, '9.379')] [2025-01-05 13:50:21,538][19668] Updated weights for policy 0, policy_version 277836 (0.0016) [2025-01-05 13:50:23,548][19668] Updated weights for policy 0, policy_version 277846 (0.0016) [2025-01-05 13:50:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1138081792. Throughput: 0: 5042.9. Samples: 9514978. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:24,965][19571] Avg episode reward: [(0, '10.531')] [2025-01-05 13:50:25,638][19668] Updated weights for policy 0, policy_version 277856 (0.0016) [2025-01-05 13:50:27,687][19668] Updated weights for policy 0, policy_version 277866 (0.0016) [2025-01-05 13:50:29,712][19668] Updated weights for policy 0, policy_version 277876 (0.0015) [2025-01-05 13:50:29,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20138.7, 300 sec: 20105.1). 
Total num frames: 1138184192. Throughput: 0: 5044.7. Samples: 9544800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:29,965][19571] Avg episode reward: [(0, '9.859')] [2025-01-05 13:50:31,802][19668] Updated weights for policy 0, policy_version 277886 (0.0018) [2025-01-05 13:50:33,797][19668] Updated weights for policy 0, policy_version 277896 (0.0015) [2025-01-05 13:50:34,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1138282496. Throughput: 0: 5046.2. Samples: 9559822. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:34,965][19571] Avg episode reward: [(0, '9.276')] [2025-01-05 13:50:35,789][19668] Updated weights for policy 0, policy_version 277906 (0.0016) [2025-01-05 13:50:37,829][19668] Updated weights for policy 0, policy_version 277916 (0.0018) [2025-01-05 13:50:39,894][19668] Updated weights for policy 0, policy_version 277926 (0.0017) [2025-01-05 13:50:39,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20138.6, 300 sec: 20091.2). Total num frames: 1138384896. Throughput: 0: 5049.2. Samples: 9590158. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:39,965][19571] Avg episode reward: [(0, '10.097')] [2025-01-05 13:50:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000277926_1138384896.pth... [2025-01-05 13:50:40,024][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000276750_1133568000.pth [2025-01-05 13:50:41,958][19668] Updated weights for policy 0, policy_version 277936 (0.0016) [2025-01-05 13:50:43,988][19668] Updated weights for policy 0, policy_version 277946 (0.0015) [2025-01-05 13:50:44,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1138483200. Throughput: 0: 5033.5. Samples: 9620030. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:44,965][19571] Avg episode reward: [(0, '10.944')] [2025-01-05 13:50:46,080][19668] Updated weights for policy 0, policy_version 277956 (0.0016) [2025-01-05 13:50:48,086][19668] Updated weights for policy 0, policy_version 277966 (0.0015) [2025-01-05 13:50:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20091.2). Total num frames: 1138585600. Throughput: 0: 5028.8. Samples: 9635048. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:49,965][19571] Avg episode reward: [(0, '9.255')] [2025-01-05 13:50:50,099][19668] Updated weights for policy 0, policy_version 277976 (0.0015) [2025-01-05 13:50:52,150][19668] Updated weights for policy 0, policy_version 277986 (0.0016) [2025-01-05 13:50:54,139][19668] Updated weights for policy 0, policy_version 277996 (0.0015) [2025-01-05 13:50:54,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20206.9, 300 sec: 20091.2). Total num frames: 1138688000. Throughput: 0: 5025.2. Samples: 9665516. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:54,965][19571] Avg episode reward: [(0, '11.078')] [2025-01-05 13:50:56,163][19668] Updated weights for policy 0, policy_version 278006 (0.0015) [2025-01-05 13:50:58,204][19668] Updated weights for policy 0, policy_version 278016 (0.0015) [2025-01-05 13:50:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20077.3). Total num frames: 1138786304. Throughput: 0: 5019.2. Samples: 9695580. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:50:59,965][19571] Avg episode reward: [(0, '9.804')] [2025-01-05 13:51:00,265][19668] Updated weights for policy 0, policy_version 278026 (0.0016) [2025-01-05 13:51:02,299][19668] Updated weights for policy 0, policy_version 278036 (0.0015) [2025-01-05 13:51:04,325][19668] Updated weights for policy 0, policy_version 278046 (0.0016) [2025-01-05 13:51:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20091.2). 
Total num frames: 1138888704. Throughput: 0: 5020.4. Samples: 9710720. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:04,965][19571] Avg episode reward: [(0, '9.342')] [2025-01-05 13:51:06,394][19668] Updated weights for policy 0, policy_version 278056 (0.0015) [2025-01-05 13:51:08,442][19668] Updated weights for policy 0, policy_version 278066 (0.0015) [2025-01-05 13:51:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1138987008. Throughput: 0: 5016.4. Samples: 9740718. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:09,965][19571] Avg episode reward: [(0, '9.713')] [2025-01-05 13:51:10,500][19668] Updated weights for policy 0, policy_version 278076 (0.0015) [2025-01-05 13:51:12,532][19668] Updated weights for policy 0, policy_version 278086 (0.0016) [2025-01-05 13:51:14,569][19668] Updated weights for policy 0, policy_version 278096 (0.0016) [2025-01-05 13:51:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1139085312. Throughput: 0: 5022.7. Samples: 9770824. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:14,965][19571] Avg episode reward: [(0, '10.144')] [2025-01-05 13:51:16,701][19668] Updated weights for policy 0, policy_version 278106 (0.0017) [2025-01-05 13:51:18,738][19668] Updated weights for policy 0, policy_version 278116 (0.0017) [2025-01-05 13:51:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.2, 300 sec: 20049.6). Total num frames: 1139183616. Throughput: 0: 5013.6. Samples: 9785436. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:19,965][19571] Avg episode reward: [(0, '10.126')] [2025-01-05 13:51:20,849][19668] Updated weights for policy 0, policy_version 278126 (0.0016) [2025-01-05 13:51:22,891][19668] Updated weights for policy 0, policy_version 278136 (0.0017) [2025-01-05 13:51:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20002.2, 300 sec: 20049.6). Total num frames: 1139281920. 
Throughput: 0: 4997.1. Samples: 9815026. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:24,965][19571] Avg episode reward: [(0, '9.629')] [2025-01-05 13:51:24,987][19668] Updated weights for policy 0, policy_version 278146 (0.0017) [2025-01-05 13:51:27,108][19668] Updated weights for policy 0, policy_version 278156 (0.0016) [2025-01-05 13:51:29,140][19668] Updated weights for policy 0, policy_version 278166 (0.0015) [2025-01-05 13:51:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1139384320. Throughput: 0: 4992.0. Samples: 9844668. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:29,965][19571] Avg episode reward: [(0, '10.687')] [2025-01-05 13:51:31,197][19668] Updated weights for policy 0, policy_version 278176 (0.0016) [2025-01-05 13:51:33,238][19668] Updated weights for policy 0, policy_version 278186 (0.0016) [2025-01-05 13:51:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1139482624. Throughput: 0: 4988.9. Samples: 9859550. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:34,965][19571] Avg episode reward: [(0, '8.560')] [2025-01-05 13:51:35,346][19668] Updated weights for policy 0, policy_version 278196 (0.0016) [2025-01-05 13:51:37,320][19668] Updated weights for policy 0, policy_version 278206 (0.0016) [2025-01-05 13:51:39,357][19668] Updated weights for policy 0, policy_version 278216 (0.0016) [2025-01-05 13:51:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 20049.6). Total num frames: 1139580928. Throughput: 0: 4981.3. Samples: 9889674. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:39,965][19571] Avg episode reward: [(0, '9.310')] [2025-01-05 13:51:41,447][19668] Updated weights for policy 0, policy_version 278226 (0.0016) [2025-01-05 13:51:43,445][19668] Updated weights for policy 0, policy_version 278236 (0.0015) [2025-01-05 13:51:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1139683328. Throughput: 0: 4983.1. Samples: 9919818. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:44,965][19571] Avg episode reward: [(0, '9.693')] [2025-01-05 13:51:45,466][19668] Updated weights for policy 0, policy_version 278246 (0.0016) [2025-01-05 13:51:47,497][19668] Updated weights for policy 0, policy_version 278256 (0.0016) [2025-01-05 13:51:49,494][19668] Updated weights for policy 0, policy_version 278266 (0.0016) [2025-01-05 13:51:49,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20002.1, 300 sec: 20077.3). Total num frames: 1139785728. Throughput: 0: 4985.2. Samples: 9935056. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:49,965][19571] Avg episode reward: [(0, '10.806')] [2025-01-05 13:51:51,529][19668] Updated weights for policy 0, policy_version 278276 (0.0019) [2025-01-05 13:51:53,559][19668] Updated weights for policy 0, policy_version 278286 (0.0016) [2025-01-05 13:51:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1139884032. Throughput: 0: 4994.8. Samples: 9965482. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:54,965][19571] Avg episode reward: [(0, '9.420')] [2025-01-05 13:51:55,627][19668] Updated weights for policy 0, policy_version 278296 (0.0017) [2025-01-05 13:51:57,659][19668] Updated weights for policy 0, policy_version 278306 (0.0016) [2025-01-05 13:51:59,739][19668] Updated weights for policy 0, policy_version 278316 (0.0022) [2025-01-05 13:51:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 20077.3). 
Total num frames: 1139986432. Throughput: 0: 4989.7. Samples: 9995360. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:51:59,965][19571] Avg episode reward: [(0, '10.929')] [2025-01-05 13:52:01,798][19668] Updated weights for policy 0, policy_version 278326 (0.0017) [2025-01-05 13:52:03,862][19668] Updated weights for policy 0, policy_version 278336 (0.0016) [2025-01-05 13:52:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.8, 300 sec: 20077.3). Total num frames: 1140084736. Throughput: 0: 4992.0. Samples: 10010078. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:52:04,966][19571] Avg episode reward: [(0, '9.887')] [2025-01-05 13:52:06,009][19668] Updated weights for policy 0, policy_version 278346 (0.0016) [2025-01-05 13:52:08,006][19668] Updated weights for policy 0, policy_version 278356 (0.0017) [2025-01-05 13:52:09,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 20077.3). Total num frames: 1140183040. Throughput: 0: 4992.9. Samples: 10039706. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:52:09,965][19571] Avg episode reward: [(0, '10.006')] [2025-01-05 13:52:10,083][19668] Updated weights for policy 0, policy_version 278366 (0.0016) [2025-01-05 13:52:12,161][19668] Updated weights for policy 0, policy_version 278376 (0.0016) [2025-01-05 13:52:14,134][19668] Updated weights for policy 0, policy_version 278386 (0.0016) [2025-01-05 13:52:14,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20002.1, 300 sec: 20077.4). Total num frames: 1140285440. Throughput: 0: 5005.9. Samples: 10069932. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:52:14,965][19571] Avg episode reward: [(0, '10.903')] [2025-01-05 13:52:16,196][19668] Updated weights for policy 0, policy_version 278396 (0.0016) [2025-01-05 13:52:18,274][19668] Updated weights for policy 0, policy_version 278406 (0.0016) [2025-01-05 13:52:19,965][19571] Fps is (10 sec: 20069.7, 60 sec: 20002.0, 300 sec: 20063.4). Total num frames: 1140383744. 
Throughput: 0: 5009.9. Samples: 10084996. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:52:19,966][19571] Avg episode reward: [(0, '11.329')] [2025-01-05 13:52:20,324][19668] Updated weights for policy 0, policy_version 278416 (0.0017) [2025-01-05 13:52:22,371][19668] Updated weights for policy 0, policy_version 278426 (0.0016) [2025-01-05 13:52:24,463][19668] Updated weights for policy 0, policy_version 278436 (0.0016) [2025-01-05 13:52:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 20063.5). Total num frames: 1140482048. Throughput: 0: 5002.6. Samples: 10114790. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:24,965][19571] Avg episode reward: [(0, '10.772')] [2025-01-05 13:52:26,484][19668] Updated weights for policy 0, policy_version 278446 (0.0017) [2025-01-05 13:52:28,553][19668] Updated weights for policy 0, policy_version 278456 (0.0017) [2025-01-05 13:52:29,965][19571] Fps is (10 sec: 19661.3, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1140580352. Throughput: 0: 4987.2. Samples: 10144242. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:29,965][19571] Avg episode reward: [(0, '9.382')] [2025-01-05 13:52:30,703][19668] Updated weights for policy 0, policy_version 278466 (0.0017) [2025-01-05 13:52:32,682][19668] Updated weights for policy 0, policy_version 278476 (0.0017) [2025-01-05 13:52:34,718][19668] Updated weights for policy 0, policy_version 278486 (0.0016) [2025-01-05 13:52:34,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1140678656. Throughput: 0: 4982.7. Samples: 10159278. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:34,965][19571] Avg episode reward: [(0, '10.478')] [2025-01-05 13:52:36,869][19668] Updated weights for policy 0, policy_version 278496 (0.0018) [2025-01-05 13:52:38,860][19668] Updated weights for policy 0, policy_version 278506 (0.0016) [2025-01-05 13:52:39,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.2, 300 sec: 20063.5). Total num frames: 1140781056. Throughput: 0: 4970.4. Samples: 10189148. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:39,965][19571] Avg episode reward: [(0, '9.291')] [2025-01-05 13:52:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000278511_1140781056.pth... [2025-01-05 13:52:40,025][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000277337_1135972352.pth [2025-01-05 13:52:40,950][19668] Updated weights for policy 0, policy_version 278516 (0.0016) [2025-01-05 13:52:43,041][19668] Updated weights for policy 0, policy_version 278526 (0.0016) [2025-01-05 13:52:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20063.5). Total num frames: 1140879360. Throughput: 0: 4969.7. Samples: 10218994. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:44,965][19571] Avg episode reward: [(0, '9.516')] [2025-01-05 13:52:45,031][19668] Updated weights for policy 0, policy_version 278536 (0.0017) [2025-01-05 13:52:47,095][19668] Updated weights for policy 0, policy_version 278546 (0.0016) [2025-01-05 13:52:49,220][19668] Updated weights for policy 0, policy_version 278556 (0.0016) [2025-01-05 13:52:49,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 20063.5). Total num frames: 1140977664. Throughput: 0: 4973.5. Samples: 10233884. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:49,965][19571] Avg episode reward: [(0, '9.692')] [2025-01-05 13:52:51,252][19668] Updated weights for policy 0, policy_version 278566 (0.0017) [2025-01-05 13:52:53,297][19668] Updated weights for policy 0, policy_version 278576 (0.0016) [2025-01-05 13:52:54,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 20049.6). Total num frames: 1141075968. Throughput: 0: 4975.6. Samples: 10263610. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:54,965][19571] Avg episode reward: [(0, '10.216')] [2025-01-05 13:52:55,460][19668] Updated weights for policy 0, policy_version 278586 (0.0016) [2025-01-05 13:52:57,419][19668] Updated weights for policy 0, policy_version 278596 (0.0016) [2025-01-05 13:52:59,480][19668] Updated weights for policy 0, policy_version 278606 (0.0016) [2025-01-05 13:52:59,965][19571] Fps is (10 sec: 20070.7, 60 sec: 19865.6, 300 sec: 20049.6). Total num frames: 1141178368. Throughput: 0: 4967.3. Samples: 10293460. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:52:59,965][19571] Avg episode reward: [(0, '9.158')] [2025-01-05 13:53:01,623][19668] Updated weights for policy 0, policy_version 278616 (0.0016) [2025-01-05 13:53:03,607][19668] Updated weights for policy 0, policy_version 278626 (0.0016) [2025-01-05 13:53:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 20035.7). Total num frames: 1141276672. Throughput: 0: 4962.9. Samples: 10308324. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:04,965][19571] Avg episode reward: [(0, '9.117')] [2025-01-05 13:53:05,633][19668] Updated weights for policy 0, policy_version 278636 (0.0016) [2025-01-05 13:53:07,722][19668] Updated weights for policy 0, policy_version 278646 (0.0016) [2025-01-05 13:53:09,704][19668] Updated weights for policy 0, policy_version 278656 (0.0017) [2025-01-05 13:53:09,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19933.8, 300 sec: 20049.6). 
Total num frames: 1141379072. Throughput: 0: 4971.0. Samples: 10338486. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:09,965][19571] Avg episode reward: [(0, '10.516')] [2025-01-05 13:53:11,728][19668] Updated weights for policy 0, policy_version 278666 (0.0016) [2025-01-05 13:53:13,817][19668] Updated weights for policy 0, policy_version 278676 (0.0016) [2025-01-05 13:53:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 20035.7). Total num frames: 1141477376. Throughput: 0: 4983.8. Samples: 10368512. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:14,965][19571] Avg episode reward: [(0, '11.544')] [2025-01-05 13:53:15,884][19668] Updated weights for policy 0, policy_version 278686 (0.0017) [2025-01-05 13:53:17,939][19668] Updated weights for policy 0, policy_version 278696 (0.0017) [2025-01-05 13:53:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19865.7, 300 sec: 20021.8). Total num frames: 1141575680. Throughput: 0: 4978.4. Samples: 10383304. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:19,965][19571] Avg episode reward: [(0, '10.559')] [2025-01-05 13:53:20,067][19668] Updated weights for policy 0, policy_version 278706 (0.0016) [2025-01-05 13:53:22,107][19668] Updated weights for policy 0, policy_version 278716 (0.0016) [2025-01-05 13:53:24,141][19668] Updated weights for policy 0, policy_version 278726 (0.0016) [2025-01-05 13:53:24,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19865.6, 300 sec: 20021.8). Total num frames: 1141673984. Throughput: 0: 4975.9. Samples: 10413064. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:24,965][19571] Avg episode reward: [(0, '10.040')] [2025-01-05 13:53:26,299][19668] Updated weights for policy 0, policy_version 278736 (0.0016) [2025-01-05 13:53:28,265][19668] Updated weights for policy 0, policy_version 278746 (0.0016) [2025-01-05 13:53:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 20021.8). Total num frames: 1141776384. 
Throughput: 0: 4975.2. Samples: 10442876. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 13:53:29,965][19571] Avg episode reward: [(0, '9.329')] [2025-01-05 13:53:30,298][19668] Updated weights for policy 0, policy_version 278756 (0.0016) [2025-01-05 13:53:32,364][19668] Updated weights for policy 0, policy_version 278766 (0.0015) [2025-01-05 13:53:34,352][19668] Updated weights for policy 0, policy_version 278776 (0.0018) [2025-01-05 13:53:34,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20002.1, 300 sec: 20035.7). Total num frames: 1141878784. Throughput: 0: 4982.7. Samples: 10458106. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:34,965][19571] Avg episode reward: [(0, '10.419')] [2025-01-05 13:53:36,394][19668] Updated weights for policy 0, policy_version 278786 (0.0016) [2025-01-05 13:53:38,451][19668] Updated weights for policy 0, policy_version 278796 (0.0015) [2025-01-05 13:53:39,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19933.8, 300 sec: 20021.8). Total num frames: 1141977088. Throughput: 0: 4990.3. Samples: 10488176. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:39,965][19571] Avg episode reward: [(0, '9.735')] [2025-01-05 13:53:40,543][19668] Updated weights for policy 0, policy_version 278806 (0.0017) [2025-01-05 13:53:42,548][19668] Updated weights for policy 0, policy_version 278816 (0.0016) [2025-01-05 13:53:44,619][19668] Updated weights for policy 0, policy_version 278826 (0.0015) [2025-01-05 13:53:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 20021.8). Total num frames: 1142075392. Throughput: 0: 4994.1. Samples: 10518194. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:44,965][19571] Avg episode reward: [(0, '10.561')] [2025-01-05 13:53:46,681][19668] Updated weights for policy 0, policy_version 278836 (0.0017) [2025-01-05 13:53:48,695][19668] Updated weights for policy 0, policy_version 278846 (0.0016) [2025-01-05 13:53:49,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20002.2, 300 sec: 20021.8). Total num frames: 1142177792. Throughput: 0: 4993.9. Samples: 10533050. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:49,965][19571] Avg episode reward: [(0, '9.304')] [2025-01-05 13:53:50,772][19668] Updated weights for policy 0, policy_version 278856 (0.0015) [2025-01-05 13:53:52,774][19668] Updated weights for policy 0, policy_version 278866 (0.0014) [2025-01-05 13:53:54,793][19668] Updated weights for policy 0, policy_version 278876 (0.0016) [2025-01-05 13:53:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1142276096. Throughput: 0: 4995.4. Samples: 10563278. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:54,965][19571] Avg episode reward: [(0, '9.257')] [2025-01-05 13:53:56,974][19668] Updated weights for policy 0, policy_version 278886 (0.0016) [2025-01-05 13:53:58,969][19668] Updated weights for policy 0, policy_version 278896 (0.0015) [2025-01-05 13:53:59,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.8, 300 sec: 20007.9). Total num frames: 1142374400. Throughput: 0: 4988.7. Samples: 10593002. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:53:59,965][19571] Avg episode reward: [(0, '9.446')] [2025-01-05 13:54:00,976][19668] Updated weights for policy 0, policy_version 278906 (0.0015) [2025-01-05 13:54:03,044][19668] Updated weights for policy 0, policy_version 278916 (0.0015) [2025-01-05 13:54:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1142476800. Throughput: 0: 4997.1. Samples: 10608172. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:04,965][19571] Avg episode reward: [(0, '9.861')] [2025-01-05 13:54:05,125][19668] Updated weights for policy 0, policy_version 278926 (0.0016) [2025-01-05 13:54:07,129][19668] Updated weights for policy 0, policy_version 278936 (0.0015) [2025-01-05 13:54:09,200][19668] Updated weights for policy 0, policy_version 278946 (0.0015) [2025-01-05 13:54:09,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.9, 300 sec: 19994.0). Total num frames: 1142575104. Throughput: 0: 5002.1. Samples: 10638160. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:09,965][19571] Avg episode reward: [(0, '8.910')] [2025-01-05 13:54:11,315][19668] Updated weights for policy 0, policy_version 278956 (0.0017) [2025-01-05 13:54:13,312][19668] Updated weights for policy 0, policy_version 278966 (0.0015) [2025-01-05 13:54:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1142677504. Throughput: 0: 5002.6. Samples: 10667994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:14,965][19571] Avg episode reward: [(0, '9.926')] [2025-01-05 13:54:15,368][19668] Updated weights for policy 0, policy_version 278976 (0.0015) [2025-01-05 13:54:17,398][19668] Updated weights for policy 0, policy_version 278986 (0.0015) [2025-01-05 13:54:19,382][19668] Updated weights for policy 0, policy_version 278996 (0.0015) [2025-01-05 13:54:19,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 19994.0). Total num frames: 1142775808. Throughput: 0: 5000.1. Samples: 10683112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:19,965][19571] Avg episode reward: [(0, '10.282')] [2025-01-05 13:54:21,460][19668] Updated weights for policy 0, policy_version 279006 (0.0016) [2025-01-05 13:54:23,468][19668] Updated weights for policy 0, policy_version 279016 (0.0016) [2025-01-05 13:54:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20007.9). 
Total num frames: 1142878208. Throughput: 0: 5003.7. Samples: 10713344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:24,965][19571] Avg episode reward: [(0, '10.670')] [2025-01-05 13:54:25,477][19668] Updated weights for policy 0, policy_version 279026 (0.0019) [2025-01-05 13:54:27,542][19668] Updated weights for policy 0, policy_version 279036 (0.0015) [2025-01-05 13:54:29,548][19668] Updated weights for policy 0, policy_version 279046 (0.0015) [2025-01-05 13:54:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1142976512. Throughput: 0: 5010.1. Samples: 10743650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:29,965][19571] Avg episode reward: [(0, '10.293')] [2025-01-05 13:54:31,599][19668] Updated weights for policy 0, policy_version 279056 (0.0016) [2025-01-05 13:54:33,660][19668] Updated weights for policy 0, policy_version 279066 (0.0016) [2025-01-05 13:54:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1143078912. Throughput: 0: 5014.3. Samples: 10758692. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:34,965][19571] Avg episode reward: [(0, '9.208')] [2025-01-05 13:54:35,774][19668] Updated weights for policy 0, policy_version 279076 (0.0016) [2025-01-05 13:54:37,798][19668] Updated weights for policy 0, policy_version 279086 (0.0016) [2025-01-05 13:54:39,935][19668] Updated weights for policy 0, policy_version 279096 (0.0016) [2025-01-05 13:54:39,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1143177216. Throughput: 0: 4996.6. Samples: 10788126. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:39,965][19571] Avg episode reward: [(0, '8.837')] [2025-01-05 13:54:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000279096_1143177216.pth... 
[2025-01-05 13:54:40,026][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000277926_1138384896.pth [2025-01-05 13:54:42,041][19668] Updated weights for policy 0, policy_version 279106 (0.0016) [2025-01-05 13:54:44,079][19668] Updated weights for policy 0, policy_version 279116 (0.0016) [2025-01-05 13:54:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20002.1, 300 sec: 20007.9). Total num frames: 1143275520. Throughput: 0: 4987.8. Samples: 10817454. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 13:54:44,965][19571] Avg episode reward: [(0, '9.370')] [2025-01-05 13:54:46,224][19668] Updated weights for policy 0, policy_version 279126 (0.0016) [2025-01-05 13:54:48,243][19668] Updated weights for policy 0, policy_version 279136 (0.0015) [2025-01-05 13:54:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.8, 300 sec: 19994.0). Total num frames: 1143373824. Throughput: 0: 4980.6. Samples: 10832300. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:54:49,965][19571] Avg episode reward: [(0, '9.858')] [2025-01-05 13:54:50,333][19668] Updated weights for policy 0, policy_version 279146 (0.0015) [2025-01-05 13:54:52,387][19668] Updated weights for policy 0, policy_version 279156 (0.0016) [2025-01-05 13:54:54,386][19668] Updated weights for policy 0, policy_version 279166 (0.0015) [2025-01-05 13:54:54,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19980.2). Total num frames: 1143472128. Throughput: 0: 4980.7. Samples: 10862290. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:54:54,965][19571] Avg episode reward: [(0, '10.452')] [2025-01-05 13:54:56,419][19668] Updated weights for policy 0, policy_version 279176 (0.0016) [2025-01-05 13:54:58,501][19668] Updated weights for policy 0, policy_version 279186 (0.0019) [2025-01-05 13:54:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19980.2). Total num frames: 1143574528. 
Throughput: 0: 4982.6. Samples: 10892212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:54:59,965][19571] Avg episode reward: [(0, '9.586')] [2025-01-05 13:55:00,556][19668] Updated weights for policy 0, policy_version 279196 (0.0015) [2025-01-05 13:55:02,586][19668] Updated weights for policy 0, policy_version 279206 (0.0016) [2025-01-05 13:55:04,636][19668] Updated weights for policy 0, policy_version 279216 (0.0016) [2025-01-05 13:55:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.9, 300 sec: 19966.3). Total num frames: 1143672832. Throughput: 0: 4982.7. Samples: 10907334. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:04,965][19571] Avg episode reward: [(0, '10.956')] [2025-01-05 13:55:06,750][19668] Updated weights for policy 0, policy_version 279226 (0.0016) [2025-01-05 13:55:08,778][19668] Updated weights for policy 0, policy_version 279236 (0.0016) [2025-01-05 13:55:09,965][19571] Fps is (10 sec: 19660.4, 60 sec: 19933.8, 300 sec: 19966.3). Total num frames: 1143771136. Throughput: 0: 4969.8. Samples: 10936984. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:09,966][19571] Avg episode reward: [(0, '9.280')] [2025-01-05 13:55:10,933][19668] Updated weights for policy 0, policy_version 279246 (0.0016) [2025-01-05 13:55:12,942][19668] Updated weights for policy 0, policy_version 279256 (0.0018) [2025-01-05 13:55:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19952.4). Total num frames: 1143869440. Throughput: 0: 4956.8. Samples: 10966704. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:14,965][19571] Avg episode reward: [(0, '9.603')] [2025-01-05 13:55:14,976][19668] Updated weights for policy 0, policy_version 279266 (0.0016) [2025-01-05 13:55:17,121][19668] Updated weights for policy 0, policy_version 279276 (0.0016) [2025-01-05 13:55:19,142][19668] Updated weights for policy 0, policy_version 279286 (0.0018) [2025-01-05 13:55:19,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19952.4). Total num frames: 1143967744. Throughput: 0: 4949.2. Samples: 10981404. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:19,965][19571] Avg episode reward: [(0, '10.485')] [2025-01-05 13:55:21,235][19668] Updated weights for policy 0, policy_version 279296 (0.0017) [2025-01-05 13:55:23,283][19668] Updated weights for policy 0, policy_version 279306 (0.0016) [2025-01-05 13:55:24,965][19571] Fps is (10 sec: 20070.8, 60 sec: 19865.6, 300 sec: 19952.4). Total num frames: 1144070144. Throughput: 0: 4958.9. Samples: 11011274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:24,965][19571] Avg episode reward: [(0, '9.757')] [2025-01-05 13:55:25,343][19668] Updated weights for policy 0, policy_version 279316 (0.0016) [2025-01-05 13:55:27,349][19668] Updated weights for policy 0, policy_version 279326 (0.0015) [2025-01-05 13:55:29,419][19668] Updated weights for policy 0, policy_version 279336 (0.0016) [2025-01-05 13:55:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19952.4). Total num frames: 1144168448. Throughput: 0: 4977.0. Samples: 11041418. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:29,965][19571] Avg episode reward: [(0, '9.515')] [2025-01-05 13:55:31,452][19668] Updated weights for policy 0, policy_version 279346 (0.0016) [2025-01-05 13:55:33,474][19668] Updated weights for policy 0, policy_version 279356 (0.0016) [2025-01-05 13:55:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19952.4). 
Total num frames: 1144270848. Throughput: 0: 4979.6. Samples: 11056380. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:34,965][19571] Avg episode reward: [(0, '10.193')] [2025-01-05 13:55:35,574][19668] Updated weights for policy 0, policy_version 279366 (0.0016) [2025-01-05 13:55:37,555][19668] Updated weights for policy 0, policy_version 279376 (0.0016) [2025-01-05 13:55:39,556][19668] Updated weights for policy 0, policy_version 279386 (0.0015) [2025-01-05 13:55:39,965][19571] Fps is (10 sec: 20070.8, 60 sec: 19865.6, 300 sec: 19952.4). Total num frames: 1144369152. Throughput: 0: 4985.4. Samples: 11086632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:39,965][19571] Avg episode reward: [(0, '9.929')] [2025-01-05 13:55:41,607][19668] Updated weights for policy 0, policy_version 279396 (0.0015) [2025-01-05 13:55:43,596][19668] Updated weights for policy 0, policy_version 279406 (0.0015) [2025-01-05 13:55:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19952.4). Total num frames: 1144471552. Throughput: 0: 4999.4. Samples: 11117186. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:44,965][19571] Avg episode reward: [(0, '9.743')] [2025-01-05 13:55:45,589][19668] Updated weights for policy 0, policy_version 279416 (0.0014) [2025-01-05 13:55:47,639][19668] Updated weights for policy 0, policy_version 279426 (0.0015) [2025-01-05 13:55:49,646][19668] Updated weights for policy 0, policy_version 279436 (0.0015) [2025-01-05 13:55:49,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20002.2, 300 sec: 19952.4). Total num frames: 1144573952. Throughput: 0: 5002.6. Samples: 11132452. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:49,965][19571] Avg episode reward: [(0, '9.612')] [2025-01-05 13:55:51,645][19668] Updated weights for policy 0, policy_version 279446 (0.0015) [2025-01-05 13:55:53,673][19668] Updated weights for policy 0, policy_version 279456 (0.0014) [2025-01-05 13:55:54,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20070.4, 300 sec: 19966.3). Total num frames: 1144676352. Throughput: 0: 5020.2. Samples: 11162894. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 13:55:54,965][19571] Avg episode reward: [(0, '9.935')] [2025-01-05 13:55:55,800][19668] Updated weights for policy 0, policy_version 279466 (0.0016) [2025-01-05 13:55:57,774][19668] Updated weights for policy 0, policy_version 279476 (0.0015) [2025-01-05 13:55:59,817][19668] Updated weights for policy 0, policy_version 279486 (0.0016) [2025-01-05 13:55:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 19952.4). Total num frames: 1144774656. Throughput: 0: 5027.0. Samples: 11192918. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:55:59,965][19571] Avg episode reward: [(0, '9.922')] [2025-01-05 13:56:01,905][19668] Updated weights for policy 0, policy_version 279496 (0.0015) [2025-01-05 13:56:03,880][19668] Updated weights for policy 0, policy_version 279506 (0.0015) [2025-01-05 13:56:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19966.3). Total num frames: 1144877056. Throughput: 0: 5032.1. Samples: 11207850. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:04,965][19571] Avg episode reward: [(0, '10.081')] [2025-01-05 13:56:05,952][19668] Updated weights for policy 0, policy_version 279516 (0.0016) [2025-01-05 13:56:07,985][19668] Updated weights for policy 0, policy_version 279526 (0.0016) [2025-01-05 13:56:09,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.5, 300 sec: 19966.3). Total num frames: 1144975360. Throughput: 0: 5038.6. Samples: 11238014. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:09,966][19571] Avg episode reward: [(0, '10.018')] [2025-01-05 13:56:10,039][19668] Updated weights for policy 0, policy_version 279536 (0.0016) [2025-01-05 13:56:12,088][19668] Updated weights for policy 0, policy_version 279546 (0.0015) [2025-01-05 13:56:14,114][19668] Updated weights for policy 0, policy_version 279556 (0.0018) [2025-01-05 13:56:14,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19980.1). Total num frames: 1145077760. Throughput: 0: 5037.4. Samples: 11268102. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:14,965][19571] Avg episode reward: [(0, '10.403')] [2025-01-05 13:56:16,131][19668] Updated weights for policy 0, policy_version 279566 (0.0016) [2025-01-05 13:56:18,198][19668] Updated weights for policy 0, policy_version 279576 (0.0018) [2025-01-05 13:56:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19980.1). Total num frames: 1145176064. Throughput: 0: 5042.1. Samples: 11283276. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:19,966][19571] Avg episode reward: [(0, '11.259')] [2025-01-05 13:56:20,280][19668] Updated weights for policy 0, policy_version 279586 (0.0016) [2025-01-05 13:56:22,275][19668] Updated weights for policy 0, policy_version 279596 (0.0016) [2025-01-05 13:56:24,326][19668] Updated weights for policy 0, policy_version 279606 (0.0015) [2025-01-05 13:56:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.6, 300 sec: 19980.2). Total num frames: 1145278464. Throughput: 0: 5037.0. Samples: 11313296. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:24,965][19571] Avg episode reward: [(0, '11.033')] [2025-01-05 13:56:26,390][19668] Updated weights for policy 0, policy_version 279616 (0.0015) [2025-01-05 13:56:28,407][19668] Updated weights for policy 0, policy_version 279626 (0.0015) [2025-01-05 13:56:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 19980.1). 
Total num frames: 1145376768. Throughput: 0: 5026.8. Samples: 11343392. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:29,965][19571] Avg episode reward: [(0, '10.304')] [2025-01-05 13:56:30,451][19668] Updated weights for policy 0, policy_version 279636 (0.0016) [2025-01-05 13:56:32,470][19668] Updated weights for policy 0, policy_version 279646 (0.0019) [2025-01-05 13:56:34,463][19668] Updated weights for policy 0, policy_version 279656 (0.0016) [2025-01-05 13:56:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19994.0). Total num frames: 1145479168. Throughput: 0: 5022.8. Samples: 11358480. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:34,965][19571] Avg episode reward: [(0, '9.119')] [2025-01-05 13:56:36,516][19668] Updated weights for policy 0, policy_version 279666 (0.0016) [2025-01-05 13:56:38,533][19668] Updated weights for policy 0, policy_version 279676 (0.0015) [2025-01-05 13:56:39,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 19994.0). Total num frames: 1145581568. Throughput: 0: 5022.4. Samples: 11388900. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:39,965][19571] Avg episode reward: [(0, '9.940')] [2025-01-05 13:56:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000279683_1145581568.pth... [2025-01-05 13:56:40,021][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000278511_1140781056.pth [2025-01-05 13:56:40,561][19668] Updated weights for policy 0, policy_version 279686 (0.0015) [2025-01-05 13:56:42,626][19668] Updated weights for policy 0, policy_version 279696 (0.0016) [2025-01-05 13:56:44,651][19668] Updated weights for policy 0, policy_version 279706 (0.0016) [2025-01-05 13:56:44,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20138.6, 300 sec: 19980.1). Total num frames: 1145679872. Throughput: 0: 5024.7. 
Samples: 11419032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:44,965][19571] Avg episode reward: [(0, '10.283')] [2025-01-05 13:56:46,724][19668] Updated weights for policy 0, policy_version 279716 (0.0020) [2025-01-05 13:56:48,782][19668] Updated weights for policy 0, policy_version 279726 (0.0017) [2025-01-05 13:56:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20070.4, 300 sec: 19980.1). Total num frames: 1145778176. Throughput: 0: 5023.9. Samples: 11433924. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:49,965][19571] Avg episode reward: [(0, '10.783')] [2025-01-05 13:56:50,866][19668] Updated weights for policy 0, policy_version 279736 (0.0015) [2025-01-05 13:56:52,859][19668] Updated weights for policy 0, policy_version 279746 (0.0016) [2025-01-05 13:56:54,886][19668] Updated weights for policy 0, policy_version 279756 (0.0015) [2025-01-05 13:56:54,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20070.3, 300 sec: 19980.1). Total num frames: 1145880576. Throughput: 0: 5021.0. Samples: 11463960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:54,966][19571] Avg episode reward: [(0, '8.823')] [2025-01-05 13:56:56,958][19668] Updated weights for policy 0, policy_version 279766 (0.0015) [2025-01-05 13:56:58,927][19668] Updated weights for policy 0, policy_version 279776 (0.0015) [2025-01-05 13:56:59,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 19980.2). Total num frames: 1145978880. Throughput: 0: 5025.7. Samples: 11494258. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:56:59,965][19571] Avg episode reward: [(0, '9.879')] [2025-01-05 13:57:00,992][19668] Updated weights for policy 0, policy_version 279786 (0.0015) [2025-01-05 13:57:02,995][19668] Updated weights for policy 0, policy_version 279796 (0.0016) [2025-01-05 13:57:04,965][19571] Fps is (10 sec: 20070.9, 60 sec: 20070.4, 300 sec: 19994.0). Total num frames: 1146081280. Throughput: 0: 5026.6. Samples: 11509472. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:57:04,965][19571] Avg episode reward: [(0, '9.930')] [2025-01-05 13:57:04,984][19668] Updated weights for policy 0, policy_version 279806 (0.0015) [2025-01-05 13:57:07,048][19668] Updated weights for policy 0, policy_version 279816 (0.0015) [2025-01-05 13:57:09,122][19668] Updated weights for policy 0, policy_version 279826 (0.0015) [2025-01-05 13:57:09,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20138.7, 300 sec: 19994.0). Total num frames: 1146183680. Throughput: 0: 5025.1. Samples: 11539428. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 13:57:09,965][19571] Avg episode reward: [(0, '10.422')] [2025-01-05 13:57:11,209][19668] Updated weights for policy 0, policy_version 279836 (0.0016) [2025-01-05 13:57:13,245][19668] Updated weights for policy 0, policy_version 279846 (0.0015) [2025-01-05 13:57:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19994.1). Total num frames: 1146281984. Throughput: 0: 5018.5. Samples: 11569226. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:14,965][19571] Avg episode reward: [(0, '9.930')] [2025-01-05 13:57:15,349][19668] Updated weights for policy 0, policy_version 279856 (0.0016) [2025-01-05 13:57:17,320][19668] Updated weights for policy 0, policy_version 279866 (0.0015) [2025-01-05 13:57:19,339][19668] Updated weights for policy 0, policy_version 279876 (0.0015) [2025-01-05 13:57:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20007.9). Total num frames: 1146384384. Throughput: 0: 5023.5. Samples: 11584538. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:19,965][19571] Avg episode reward: [(0, '9.745')] [2025-01-05 13:57:21,467][19668] Updated weights for policy 0, policy_version 279886 (0.0016) [2025-01-05 13:57:23,442][19668] Updated weights for policy 0, policy_version 279896 (0.0015) [2025-01-05 13:57:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20007.9). 
Total num frames: 1146482688. Throughput: 0: 5015.2. Samples: 11614582. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:24,965][19571] Avg episode reward: [(0, '9.302')] [2025-01-05 13:57:25,453][19668] Updated weights for policy 0, policy_version 279906 (0.0015) [2025-01-05 13:57:27,507][19668] Updated weights for policy 0, policy_version 279916 (0.0017) [2025-01-05 13:57:29,500][19668] Updated weights for policy 0, policy_version 279926 (0.0015) [2025-01-05 13:57:29,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20021.8). Total num frames: 1146585088. Throughput: 0: 5021.5. Samples: 11644998. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:29,965][19571] Avg episode reward: [(0, '9.923')] [2025-01-05 13:57:31,500][19668] Updated weights for policy 0, policy_version 279936 (0.0014) [2025-01-05 13:57:33,560][19668] Updated weights for policy 0, policy_version 279946 (0.0019) [2025-01-05 13:57:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20007.9). Total num frames: 1146683392. Throughput: 0: 5029.8. Samples: 11660266. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:34,965][19571] Avg episode reward: [(0, '10.453')] [2025-01-05 13:57:35,617][19668] Updated weights for policy 0, policy_version 279956 (0.0015) [2025-01-05 13:57:37,595][19668] Updated weights for policy 0, policy_version 279966 (0.0015) [2025-01-05 13:57:39,665][19668] Updated weights for policy 0, policy_version 279976 (0.0016) [2025-01-05 13:57:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20021.8). Total num frames: 1146785792. Throughput: 0: 5035.0. Samples: 11690536. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:39,965][19571] Avg episode reward: [(0, '10.741')] [2025-01-05 13:57:41,724][19668] Updated weights for policy 0, policy_version 279986 (0.0016) [2025-01-05 13:57:43,735][19668] Updated weights for policy 0, policy_version 279996 (0.0016) [2025-01-05 13:57:44,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1146888192. Throughput: 0: 5027.4. Samples: 11720492. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:44,965][19571] Avg episode reward: [(0, '9.682')] [2025-01-05 13:57:45,811][19668] Updated weights for policy 0, policy_version 280006 (0.0015) [2025-01-05 13:57:47,792][19668] Updated weights for policy 0, policy_version 280016 (0.0015) [2025-01-05 13:57:49,787][19668] Updated weights for policy 0, policy_version 280026 (0.0015) [2025-01-05 13:57:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1146986496. Throughput: 0: 5028.2. Samples: 11735740. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:49,965][19571] Avg episode reward: [(0, '10.511')] [2025-01-05 13:57:51,860][19668] Updated weights for policy 0, policy_version 280036 (0.0014) [2025-01-05 13:57:53,862][19668] Updated weights for policy 0, policy_version 280046 (0.0017) [2025-01-05 13:57:54,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1147088896. Throughput: 0: 5037.7. Samples: 11766124. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:54,965][19571] Avg episode reward: [(0, '10.805')] [2025-01-05 13:57:55,865][19668] Updated weights for policy 0, policy_version 280056 (0.0015) [2025-01-05 13:57:57,924][19668] Updated weights for policy 0, policy_version 280066 (0.0015) [2025-01-05 13:57:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20035.7). Total num frames: 1147187200. Throughput: 0: 5041.3. Samples: 11796084. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:57:59,965][19571] Avg episode reward: [(0, '9.652')] [2025-01-05 13:58:00,007][19668] Updated weights for policy 0, policy_version 280076 (0.0017) [2025-01-05 13:58:02,078][19668] Updated weights for policy 0, policy_version 280086 (0.0017) [2025-01-05 13:58:04,147][19668] Updated weights for policy 0, policy_version 280096 (0.0016) [2025-01-05 13:58:04,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20070.4, 300 sec: 20021.8). Total num frames: 1147285504. Throughput: 0: 5029.2. Samples: 11810850. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:04,965][19571] Avg episode reward: [(0, '9.444')] [2025-01-05 13:58:06,216][19668] Updated weights for policy 0, policy_version 280106 (0.0016) [2025-01-05 13:58:08,230][19668] Updated weights for policy 0, policy_version 280116 (0.0015) [2025-01-05 13:58:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20035.7). Total num frames: 1147387904. Throughput: 0: 5030.4. Samples: 11840952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:09,965][19571] Avg episode reward: [(0, '9.598')] [2025-01-05 13:58:10,276][19668] Updated weights for policy 0, policy_version 280126 (0.0014) [2025-01-05 13:58:12,294][19668] Updated weights for policy 0, policy_version 280136 (0.0015) [2025-01-05 13:58:14,311][19668] Updated weights for policy 0, policy_version 280146 (0.0016) [2025-01-05 13:58:14,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1147490304. Throughput: 0: 5028.2. Samples: 11871268. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:14,965][19571] Avg episode reward: [(0, '9.788')] [2025-01-05 13:58:16,352][19668] Updated weights for policy 0, policy_version 280156 (0.0015) [2025-01-05 13:58:18,351][19668] Updated weights for policy 0, policy_version 280166 (0.0015) [2025-01-05 13:58:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20070.4, 300 sec: 20049.6). 
Total num frames: 1147588608. Throughput: 0: 5027.7. Samples: 11886514. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:19,965][19571] Avg episode reward: [(0, '9.592')] [2025-01-05 13:58:20,365][19668] Updated weights for policy 0, policy_version 280176 (0.0015) [2025-01-05 13:58:22,413][19668] Updated weights for policy 0, policy_version 280186 (0.0015) [2025-01-05 13:58:24,433][19668] Updated weights for policy 0, policy_version 280196 (0.0015) [2025-01-05 13:58:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1147691008. Throughput: 0: 5029.5. Samples: 11916862. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:24,965][19571] Avg episode reward: [(0, '9.623')] [2025-01-05 13:58:26,506][19668] Updated weights for policy 0, policy_version 280206 (0.0016) [2025-01-05 13:58:28,563][19668] Updated weights for policy 0, policy_version 280216 (0.0015) [2025-01-05 13:58:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20035.7). Total num frames: 1147789312. Throughput: 0: 5024.0. Samples: 11946574. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:29,965][19571] Avg episode reward: [(0, '9.513')] [2025-01-05 13:58:30,646][19668] Updated weights for policy 0, policy_version 280226 (0.0015) [2025-01-05 13:58:32,661][19668] Updated weights for policy 0, policy_version 280236 (0.0015) [2025-01-05 13:58:34,709][19668] Updated weights for policy 0, policy_version 280246 (0.0015) [2025-01-05 13:58:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20049.6). Total num frames: 1147891712. Throughput: 0: 5019.3. Samples: 11961608. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:34,965][19571] Avg episode reward: [(0, '9.272')] [2025-01-05 13:58:36,764][19668] Updated weights for policy 0, policy_version 280256 (0.0015) [2025-01-05 13:58:38,770][19668] Updated weights for policy 0, policy_version 280266 (0.0015) [2025-01-05 13:58:39,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1147990016. Throughput: 0: 5015.3. Samples: 11991812. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:39,965][19571] Avg episode reward: [(0, '10.014')] [2025-01-05 13:58:39,982][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000280272_1147994112.pth... [2025-01-05 13:58:40,033][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000279096_1143177216.pth [2025-01-05 13:58:40,847][19668] Updated weights for policy 0, policy_version 280276 (0.0016) [2025-01-05 13:58:42,840][19668] Updated weights for policy 0, policy_version 280286 (0.0014) [2025-01-05 13:58:44,871][19668] Updated weights for policy 0, policy_version 280296 (0.0015) [2025-01-05 13:58:44,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1148092416. Throughput: 0: 5019.3. Samples: 12021950. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:44,965][19571] Avg episode reward: [(0, '8.680')] [2025-01-05 13:58:46,969][19668] Updated weights for policy 0, policy_version 280306 (0.0016) [2025-01-05 13:58:48,959][19668] Updated weights for policy 0, policy_version 280316 (0.0015) [2025-01-05 13:58:49,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1148190720. Throughput: 0: 5026.1. Samples: 12037024. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:49,965][19571] Avg episode reward: [(0, '9.319')] [2025-01-05 13:58:50,980][19668] Updated weights for policy 0, policy_version 280326 (0.0015) [2025-01-05 13:58:53,058][19668] Updated weights for policy 0, policy_version 280336 (0.0016) [2025-01-05 13:58:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1148293120. Throughput: 0: 5026.7. Samples: 12067152. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:54,965][19571] Avg episode reward: [(0, '9.780')] [2025-01-05 13:58:55,090][19668] Updated weights for policy 0, policy_version 280346 (0.0015) [2025-01-05 13:58:57,101][19668] Updated weights for policy 0, policy_version 280356 (0.0015) [2025-01-05 13:58:59,173][19668] Updated weights for policy 0, policy_version 280366 (0.0016) [2025-01-05 13:58:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1148391424. Throughput: 0: 5023.3. Samples: 12097316. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:58:59,965][19571] Avg episode reward: [(0, '10.257')] [2025-01-05 13:59:01,222][19668] Updated weights for policy 0, policy_version 280376 (0.0016) [2025-01-05 13:59:03,270][19668] Updated weights for policy 0, policy_version 280386 (0.0015) [2025-01-05 13:59:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 20063.5). Total num frames: 1148493824. Throughput: 0: 5012.7. Samples: 12112086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:04,965][19571] Avg episode reward: [(0, '9.730')] [2025-01-05 13:59:05,355][19668] Updated weights for policy 0, policy_version 280396 (0.0015) [2025-01-05 13:59:07,342][19668] Updated weights for policy 0, policy_version 280406 (0.0015) [2025-01-05 13:59:09,377][19668] Updated weights for policy 0, policy_version 280416 (0.0017) [2025-01-05 13:59:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20049.6). 
Total num frames: 1148592128. Throughput: 0: 5011.3. Samples: 12142372. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:09,965][19571] Avg episode reward: [(0, '9.632')] [2025-01-05 13:59:11,473][19668] Updated weights for policy 0, policy_version 280426 (0.0016) [2025-01-05 13:59:13,447][19668] Updated weights for policy 0, policy_version 280436 (0.0016) [2025-01-05 13:59:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1148694528. Throughput: 0: 5019.0. Samples: 12172426. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:14,965][19571] Avg episode reward: [(0, '9.745')] [2025-01-05 13:59:15,498][19668] Updated weights for policy 0, policy_version 280446 (0.0015) [2025-01-05 13:59:17,534][19668] Updated weights for policy 0, policy_version 280456 (0.0016) [2025-01-05 13:59:19,569][19668] Updated weights for policy 0, policy_version 280466 (0.0015) [2025-01-05 13:59:19,965][19571] Fps is (10 sec: 20069.9, 60 sec: 20070.3, 300 sec: 20049.6). Total num frames: 1148792832. Throughput: 0: 5018.3. Samples: 12187434. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:19,966][19571] Avg episode reward: [(0, '9.725')] [2025-01-05 13:59:21,645][19668] Updated weights for policy 0, policy_version 280476 (0.0016) [2025-01-05 13:59:23,716][19668] Updated weights for policy 0, policy_version 280486 (0.0015) [2025-01-05 13:59:24,965][19571] Fps is (10 sec: 19660.5, 60 sec: 20002.1, 300 sec: 20049.6). Total num frames: 1148891136. Throughput: 0: 5018.4. Samples: 12217642. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:24,965][19571] Avg episode reward: [(0, '10.039')] [2025-01-05 13:59:25,741][19668] Updated weights for policy 0, policy_version 280496 (0.0016) [2025-01-05 13:59:27,774][19668] Updated weights for policy 0, policy_version 280506 (0.0014) [2025-01-05 13:59:29,838][19668] Updated weights for policy 0, policy_version 280516 (0.0015) [2025-01-05 13:59:29,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20049.6). Total num frames: 1148993536. Throughput: 0: 5015.8. Samples: 12247662. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:29,965][19571] Avg episode reward: [(0, '9.144')] [2025-01-05 13:59:31,890][19668] Updated weights for policy 0, policy_version 280526 (0.0015) [2025-01-05 13:59:33,921][19668] Updated weights for policy 0, policy_version 280536 (0.0014) [2025-01-05 13:59:34,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20070.4, 300 sec: 20063.5). Total num frames: 1149095936. Throughput: 0: 5010.0. Samples: 12262474. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:34,965][19571] Avg episode reward: [(0, '10.027')] [2025-01-05 13:59:36,037][19668] Updated weights for policy 0, policy_version 280546 (0.0017) [2025-01-05 13:59:37,995][19668] Updated weights for policy 0, policy_version 280556 (0.0014) [2025-01-05 13:59:39,965][19571] Fps is (10 sec: 20069.9, 60 sec: 20070.3, 300 sec: 20063.4). Total num frames: 1149194240. Throughput: 0: 5012.2. Samples: 12292704. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:39,966][19571] Avg episode reward: [(0, '9.025')] [2025-01-05 13:59:40,001][19668] Updated weights for policy 0, policy_version 280566 (0.0015) [2025-01-05 13:59:42,065][19668] Updated weights for policy 0, policy_version 280576 (0.0015) [2025-01-05 13:59:44,028][19668] Updated weights for policy 0, policy_version 280586 (0.0016) [2025-01-05 13:59:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20077.3). 
Total num frames: 1149296640. Throughput: 0: 5017.6. Samples: 12323110. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:44,965][19571] Avg episode reward: [(0, '11.248')] [2025-01-05 13:59:46,082][19668] Updated weights for policy 0, policy_version 280596 (0.0016) [2025-01-05 13:59:48,113][19668] Updated weights for policy 0, policy_version 280606 (0.0017) [2025-01-05 13:59:49,965][19571] Fps is (10 sec: 20070.9, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1149394944. Throughput: 0: 5028.0. Samples: 12338344. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:49,965][19571] Avg episode reward: [(0, '10.255')] [2025-01-05 13:59:50,187][19668] Updated weights for policy 0, policy_version 280616 (0.0017) [2025-01-05 13:59:52,228][19668] Updated weights for policy 0, policy_version 280626 (0.0015) [2025-01-05 13:59:54,268][19668] Updated weights for policy 0, policy_version 280636 (0.0015) [2025-01-05 13:59:54,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1149497344. Throughput: 0: 5021.5. Samples: 12368338. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:54,965][19571] Avg episode reward: [(0, '8.976')] [2025-01-05 13:59:56,305][19668] Updated weights for policy 0, policy_version 280646 (0.0016) [2025-01-05 13:59:58,326][19668] Updated weights for policy 0, policy_version 280656 (0.0015) [2025-01-05 13:59:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20077.3). Total num frames: 1149595648. Throughput: 0: 5014.5. Samples: 12398080. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 13:59:59,966][19571] Avg episode reward: [(0, '9.562')] [2025-01-05 14:00:00,469][19668] Updated weights for policy 0, policy_version 280666 (0.0015) [2025-01-05 14:00:02,459][19668] Updated weights for policy 0, policy_version 280676 (0.0014) [2025-01-05 14:00:04,471][19668] Updated weights for policy 0, policy_version 280686 (0.0015) [2025-01-05 14:00:04,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 20091.2). Total num frames: 1149698048. Throughput: 0: 5016.9. Samples: 12413192. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:04,965][19571] Avg episode reward: [(0, '9.954')] [2025-01-05 14:00:06,528][19668] Updated weights for policy 0, policy_version 280696 (0.0015) [2025-01-05 14:00:08,529][19668] Updated weights for policy 0, policy_version 280706 (0.0015) [2025-01-05 14:00:09,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20138.6, 300 sec: 20105.1). Total num frames: 1149800448. Throughput: 0: 5021.6. Samples: 12443612. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:09,965][19571] Avg episode reward: [(0, '9.807')] [2025-01-05 14:00:10,529][19668] Updated weights for policy 0, policy_version 280716 (0.0015) [2025-01-05 14:00:12,553][19668] Updated weights for policy 0, policy_version 280726 (0.0015) [2025-01-05 14:00:14,580][19668] Updated weights for policy 0, policy_version 280736 (0.0016) [2025-01-05 14:00:14,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20105.1). Total num frames: 1149898752. Throughput: 0: 5032.9. Samples: 12474142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:14,965][19571] Avg episode reward: [(0, '9.937')] [2025-01-05 14:00:16,651][19668] Updated weights for policy 0, policy_version 280746 (0.0017) [2025-01-05 14:00:18,655][19668] Updated weights for policy 0, policy_version 280756 (0.0015) [2025-01-05 14:00:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20105.1). 
Total num frames: 1150001152. Throughput: 0: 5033.9. Samples: 12488998. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:19,965][19571] Avg episode reward: [(0, '9.202')] [2025-01-05 14:00:20,699][19668] Updated weights for policy 0, policy_version 280766 (0.0015) [2025-01-05 14:00:22,678][19668] Updated weights for policy 0, policy_version 280776 (0.0016) [2025-01-05 14:00:24,709][19668] Updated weights for policy 0, policy_version 280786 (0.0015) [2025-01-05 14:00:24,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1150103552. Throughput: 0: 5040.3. Samples: 12519518. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:24,965][19571] Avg episode reward: [(0, '9.674')] [2025-01-05 14:00:26,801][19668] Updated weights for policy 0, policy_version 280796 (0.0016) [2025-01-05 14:00:28,778][19668] Updated weights for policy 0, policy_version 280806 (0.0016) [2025-01-05 14:00:29,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1150201856. Throughput: 0: 5036.2. Samples: 12549738. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:29,965][19571] Avg episode reward: [(0, '9.334')] [2025-01-05 14:00:30,842][19668] Updated weights for policy 0, policy_version 280816 (0.0015) [2025-01-05 14:00:32,851][19668] Updated weights for policy 0, policy_version 280826 (0.0016) [2025-01-05 14:00:34,840][19668] Updated weights for policy 0, policy_version 280836 (0.0016) [2025-01-05 14:00:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1150304256. Throughput: 0: 5035.0. Samples: 12564920. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:34,965][19571] Avg episode reward: [(0, '10.493')] [2025-01-05 14:00:36,898][19668] Updated weights for policy 0, policy_version 280846 (0.0015) [2025-01-05 14:00:38,929][19668] Updated weights for policy 0, policy_version 280856 (0.0015) [2025-01-05 14:00:39,965][19571] Fps is (10 sec: 20069.9, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1150402560. Throughput: 0: 5044.0. Samples: 12595322. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:39,966][19571] Avg episode reward: [(0, '9.142')] [2025-01-05 14:00:39,995][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000280861_1150406656.pth... [2025-01-05 14:00:40,050][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000279683_1145581568.pth [2025-01-05 14:00:41,023][19668] Updated weights for policy 0, policy_version 280866 (0.0017) [2025-01-05 14:00:43,100][19668] Updated weights for policy 0, policy_version 280876 (0.0016) [2025-01-05 14:00:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1150504960. Throughput: 0: 5042.0. Samples: 12624968. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:00:44,965][19571] Avg episode reward: [(0, '9.186')] [2025-01-05 14:00:45,127][19668] Updated weights for policy 0, policy_version 280886 (0.0016) [2025-01-05 14:00:47,103][19668] Updated weights for policy 0, policy_version 280896 (0.0016) [2025-01-05 14:00:49,179][19668] Updated weights for policy 0, policy_version 280906 (0.0016) [2025-01-05 14:00:49,965][19571] Fps is (10 sec: 20071.0, 60 sec: 20138.7, 300 sec: 20091.2). Total num frames: 1150603264. Throughput: 0: 5044.9. Samples: 12640214. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:00:49,965][19571] Avg episode reward: [(0, '10.639')] [2025-01-05 14:00:51,265][19668] Updated weights for policy 0, policy_version 280916 (0.0016) [2025-01-05 14:00:53,231][19668] Updated weights for policy 0, policy_version 280926 (0.0016) [2025-01-05 14:00:54,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1150705664. Throughput: 0: 5037.6. Samples: 12670302. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:00:54,965][19571] Avg episode reward: [(0, '9.157')] [2025-01-05 14:00:55,278][19668] Updated weights for policy 0, policy_version 280936 (0.0016) [2025-01-05 14:00:57,310][19668] Updated weights for policy 0, policy_version 280946 (0.0017) [2025-01-05 14:00:59,278][19668] Updated weights for policy 0, policy_version 280956 (0.0016) [2025-01-05 14:00:59,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20105.1). Total num frames: 1150808064. Throughput: 0: 5036.0. Samples: 12700762. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:00:59,965][19571] Avg episode reward: [(0, '9.556')] [2025-01-05 14:01:01,327][19668] Updated weights for policy 0, policy_version 280966 (0.0015) [2025-01-05 14:01:03,343][19668] Updated weights for policy 0, policy_version 280976 (0.0016) [2025-01-05 14:01:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20105.1). Total num frames: 1150906368. Throughput: 0: 5044.0. Samples: 12715978. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:04,965][19571] Avg episode reward: [(0, '10.106')] [2025-01-05 14:01:05,334][19668] Updated weights for policy 0, policy_version 280986 (0.0015) [2025-01-05 14:01:07,379][19668] Updated weights for policy 0, policy_version 280996 (0.0016) [2025-01-05 14:01:09,374][19668] Updated weights for policy 0, policy_version 281006 (0.0015) [2025-01-05 14:01:09,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20105.1). 
Total num frames: 1151008768. Throughput: 0: 5045.6. Samples: 12746570. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:09,965][19571] Avg episode reward: [(0, '10.042')] [2025-01-05 14:01:11,369][19668] Updated weights for policy 0, policy_version 281016 (0.0016) [2025-01-05 14:01:13,427][19668] Updated weights for policy 0, policy_version 281026 (0.0016) [2025-01-05 14:01:14,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1151111168. Throughput: 0: 5044.0. Samples: 12776720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:14,965][19571] Avg episode reward: [(0, '9.949')] [2025-01-05 14:01:15,488][19668] Updated weights for policy 0, policy_version 281036 (0.0016) [2025-01-05 14:01:17,516][19668] Updated weights for policy 0, policy_version 281046 (0.0016) [2025-01-05 14:01:19,545][19668] Updated weights for policy 0, policy_version 281056 (0.0016) [2025-01-05 14:01:19,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1151213568. Throughput: 0: 5043.8. Samples: 12791892. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:19,965][19571] Avg episode reward: [(0, '9.926')] [2025-01-05 14:01:21,600][19668] Updated weights for policy 0, policy_version 281066 (0.0016) [2025-01-05 14:01:23,614][19668] Updated weights for policy 0, policy_version 281076 (0.0016) [2025-01-05 14:01:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1151311872. Throughput: 0: 5038.4. Samples: 12822050. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:24,965][19571] Avg episode reward: [(0, '10.068')] [2025-01-05 14:01:25,659][19668] Updated weights for policy 0, policy_version 281086 (0.0016) [2025-01-05 14:01:27,676][19668] Updated weights for policy 0, policy_version 281096 (0.0016) [2025-01-05 14:01:29,664][19668] Updated weights for policy 0, policy_version 281106 (0.0015) [2025-01-05 14:01:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1151414272. Throughput: 0: 5053.9. Samples: 12852396. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:29,965][19571] Avg episode reward: [(0, '10.421')] [2025-01-05 14:01:31,728][19668] Updated weights for policy 0, policy_version 281116 (0.0016) [2025-01-05 14:01:33,731][19668] Updated weights for policy 0, policy_version 281126 (0.0015) [2025-01-05 14:01:34,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1151516672. Throughput: 0: 5051.6. Samples: 12867536. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:34,965][19571] Avg episode reward: [(0, '10.174')] [2025-01-05 14:01:35,715][19668] Updated weights for policy 0, policy_version 281136 (0.0016) [2025-01-05 14:01:37,792][19668] Updated weights for policy 0, policy_version 281146 (0.0016) [2025-01-05 14:01:39,848][19668] Updated weights for policy 0, policy_version 281156 (0.0016) [2025-01-05 14:01:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20207.0, 300 sec: 20119.0). Total num frames: 1151614976. Throughput: 0: 5055.3. Samples: 12897792. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:39,965][19571] Avg episode reward: [(0, '9.236')] [2025-01-05 14:01:41,946][19668] Updated weights for policy 0, policy_version 281166 (0.0017) [2025-01-05 14:01:44,016][19668] Updated weights for policy 0, policy_version 281176 (0.0015) [2025-01-05 14:01:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 20138.6, 300 sec: 20119.0). 
Total num frames: 1151713280. Throughput: 0: 5039.6. Samples: 12927544. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:44,965][19571] Avg episode reward: [(0, '10.826')] [2025-01-05 14:01:46,054][19668] Updated weights for policy 0, policy_version 281186 (0.0017) [2025-01-05 14:01:48,065][19668] Updated weights for policy 0, policy_version 281196 (0.0015) [2025-01-05 14:01:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1151815680. Throughput: 0: 5035.7. Samples: 12942584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:49,965][19571] Avg episode reward: [(0, '8.598')] [2025-01-05 14:01:50,141][19668] Updated weights for policy 0, policy_version 281206 (0.0015) [2025-01-05 14:01:52,122][19668] Updated weights for policy 0, policy_version 281216 (0.0019) [2025-01-05 14:01:54,155][19668] Updated weights for policy 0, policy_version 281226 (0.0015) [2025-01-05 14:01:54,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1151913984. Throughput: 0: 5031.6. Samples: 12972992. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:54,965][19571] Avg episode reward: [(0, '9.887')] [2025-01-05 14:01:56,255][19668] Updated weights for policy 0, policy_version 281236 (0.0016) [2025-01-05 14:01:58,242][19668] Updated weights for policy 0, policy_version 281246 (0.0015) [2025-01-05 14:01:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1152016384. Throughput: 0: 5028.3. Samples: 13002992. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:01:59,965][19571] Avg episode reward: [(0, '9.944')] [2025-01-05 14:02:00,264][19668] Updated weights for policy 0, policy_version 281256 (0.0015) [2025-01-05 14:02:02,299][19668] Updated weights for policy 0, policy_version 281266 (0.0016) [2025-01-05 14:02:04,298][19668] Updated weights for policy 0, policy_version 281276 (0.0016) [2025-01-05 14:02:04,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20206.9, 300 sec: 20119.0). Total num frames: 1152118784. Throughput: 0: 5029.6. Samples: 13018226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:04,965][19571] Avg episode reward: [(0, '9.137')] [2025-01-05 14:02:06,315][19668] Updated weights for policy 0, policy_version 281286 (0.0016) [2025-01-05 14:02:08,397][19668] Updated weights for policy 0, policy_version 281296 (0.0016) [2025-01-05 14:02:09,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1152217088. Throughput: 0: 5033.3. Samples: 13048548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:09,965][19571] Avg episode reward: [(0, '11.344')] [2025-01-05 14:02:10,418][19668] Updated weights for policy 0, policy_version 281306 (0.0017) [2025-01-05 14:02:12,427][19668] Updated weights for policy 0, policy_version 281316 (0.0015) [2025-01-05 14:02:14,482][19668] Updated weights for policy 0, policy_version 281326 (0.0016) [2025-01-05 14:02:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1152319488. Throughput: 0: 5029.9. Samples: 13078740. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:14,965][19571] Avg episode reward: [(0, '9.319')] [2025-01-05 14:02:16,527][19668] Updated weights for policy 0, policy_version 281336 (0.0016) [2025-01-05 14:02:18,553][19668] Updated weights for policy 0, policy_version 281346 (0.0016) [2025-01-05 14:02:19,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20119.0). 
Total num frames: 1152417792. Throughput: 0: 5025.4. Samples: 13093678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:19,965][19571] Avg episode reward: [(0, '10.181')] [2025-01-05 14:02:20,681][19668] Updated weights for policy 0, policy_version 281356 (0.0018) [2025-01-05 14:02:22,651][19668] Updated weights for policy 0, policy_version 281366 (0.0016) [2025-01-05 14:02:24,682][19668] Updated weights for policy 0, policy_version 281376 (0.0019) [2025-01-05 14:02:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1152520192. Throughput: 0: 5021.6. Samples: 13123766. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:24,965][19571] Avg episode reward: [(0, '9.091')] [2025-01-05 14:02:26,801][19668] Updated weights for policy 0, policy_version 281386 (0.0016) [2025-01-05 14:02:28,773][19668] Updated weights for policy 0, policy_version 281396 (0.0016) [2025-01-05 14:02:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20070.4, 300 sec: 20119.0). Total num frames: 1152618496. Throughput: 0: 5031.2. Samples: 13153946. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:29,965][19571] Avg episode reward: [(0, '9.654')] [2025-01-05 14:02:30,791][19668] Updated weights for policy 0, policy_version 281406 (0.0015) [2025-01-05 14:02:32,855][19668] Updated weights for policy 0, policy_version 281416 (0.0016) [2025-01-05 14:02:34,871][19668] Updated weights for policy 0, policy_version 281426 (0.0016) [2025-01-05 14:02:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20119.0). Total num frames: 1152720896. Throughput: 0: 5036.6. Samples: 13169230. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:34,966][19571] Avg episode reward: [(0, '9.079')] [2025-01-05 14:02:36,910][19668] Updated weights for policy 0, policy_version 281436 (0.0015) [2025-01-05 14:02:38,965][19668] Updated weights for policy 0, policy_version 281446 (0.0016) [2025-01-05 14:02:39,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 20105.1). Total num frames: 1152819200. Throughput: 0: 5029.3. Samples: 13199312. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:39,966][19571] Avg episode reward: [(0, '9.831')] [2025-01-05 14:02:39,974][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000281451_1152823296.pth... [2025-01-05 14:02:40,025][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000280272_1147994112.pth [2025-01-05 14:02:41,010][19668] Updated weights for policy 0, policy_version 281456 (0.0016) [2025-01-05 14:02:43,038][19668] Updated weights for policy 0, policy_version 281466 (0.0019) [2025-01-05 14:02:44,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1152921600. Throughput: 0: 5027.2. Samples: 13229214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:44,965][19571] Avg episode reward: [(0, '10.114')] [2025-01-05 14:02:45,109][19668] Updated weights for policy 0, policy_version 281476 (0.0016) [2025-01-05 14:02:47,083][19668] Updated weights for policy 0, policy_version 281486 (0.0016) [2025-01-05 14:02:49,100][19668] Updated weights for policy 0, policy_version 281496 (0.0015) [2025-01-05 14:02:49,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1153024000. Throughput: 0: 5027.7. Samples: 13244472. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:49,965][19571] Avg episode reward: [(0, '10.309')] [2025-01-05 14:02:51,149][19668] Updated weights for policy 0, policy_version 281506 (0.0015) [2025-01-05 14:02:53,115][19668] Updated weights for policy 0, policy_version 281516 (0.0016) [2025-01-05 14:02:54,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20206.9, 300 sec: 20132.9). Total num frames: 1153126400. Throughput: 0: 5033.0. Samples: 13275034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:54,965][19571] Avg episode reward: [(0, '9.739')] [2025-01-05 14:02:55,142][19668] Updated weights for policy 0, policy_version 281526 (0.0015) [2025-01-05 14:02:57,173][19668] Updated weights for policy 0, policy_version 281536 (0.0016) [2025-01-05 14:02:59,178][19668] Updated weights for policy 0, policy_version 281546 (0.0016) [2025-01-05 14:02:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1153224704. Throughput: 0: 5040.7. Samples: 13305570. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:02:59,965][19571] Avg episode reward: [(0, '10.603')] [2025-01-05 14:03:01,199][19668] Updated weights for policy 0, policy_version 281556 (0.0016) [2025-01-05 14:03:03,226][19668] Updated weights for policy 0, policy_version 281566 (0.0015) [2025-01-05 14:03:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1153327104. Throughput: 0: 5045.6. Samples: 13320732. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:04,966][19571] Avg episode reward: [(0, '9.906')] [2025-01-05 14:03:05,328][19668] Updated weights for policy 0, policy_version 281576 (0.0016) [2025-01-05 14:03:07,342][19668] Updated weights for policy 0, policy_version 281586 (0.0016) [2025-01-05 14:03:09,354][19668] Updated weights for policy 0, policy_version 281596 (0.0015) [2025-01-05 14:03:09,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20206.9, 300 sec: 20132.9). 
Total num frames: 1153429504. Throughput: 0: 5044.2. Samples: 13350754. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:09,965][19571] Avg episode reward: [(0, '10.439')] [2025-01-05 14:03:11,376][19668] Updated weights for policy 0, policy_version 281606 (0.0015) [2025-01-05 14:03:13,402][19668] Updated weights for policy 0, policy_version 281616 (0.0016) [2025-01-05 14:03:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1153527808. Throughput: 0: 5043.3. Samples: 13380896. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:14,966][19571] Avg episode reward: [(0, '9.223')] [2025-01-05 14:03:15,463][19668] Updated weights for policy 0, policy_version 281626 (0.0016) [2025-01-05 14:03:17,468][19668] Updated weights for policy 0, policy_version 281636 (0.0015) [2025-01-05 14:03:19,509][19668] Updated weights for policy 0, policy_version 281646 (0.0016) [2025-01-05 14:03:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20207.0, 300 sec: 20132.9). Total num frames: 1153630208. Throughput: 0: 5042.5. Samples: 13396142. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:19,965][19571] Avg episode reward: [(0, '9.365')] [2025-01-05 14:03:21,601][19668] Updated weights for policy 0, policy_version 281656 (0.0016) [2025-01-05 14:03:23,593][19668] Updated weights for policy 0, policy_version 281666 (0.0016) [2025-01-05 14:03:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1153728512. Throughput: 0: 5043.4. Samples: 13426264. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:24,965][19571] Avg episode reward: [(0, '10.086')] [2025-01-05 14:03:25,611][19668] Updated weights for policy 0, policy_version 281676 (0.0016) [2025-01-05 14:03:27,644][19668] Updated weights for policy 0, policy_version 281686 (0.0016) [2025-01-05 14:03:29,624][19668] Updated weights for policy 0, policy_version 281696 (0.0016) [2025-01-05 14:03:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20132.9). Total num frames: 1153830912. Throughput: 0: 5058.4. Samples: 13456840. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:29,965][19571] Avg episode reward: [(0, '9.722')] [2025-01-05 14:03:31,649][19668] Updated weights for policy 0, policy_version 281706 (0.0015) [2025-01-05 14:03:33,673][19668] Updated weights for policy 0, policy_version 281716 (0.0016) [2025-01-05 14:03:34,965][19571] Fps is (10 sec: 20479.4, 60 sec: 20206.8, 300 sec: 20146.7). Total num frames: 1153933312. Throughput: 0: 5056.8. Samples: 13472030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:34,966][19571] Avg episode reward: [(0, '9.071')] [2025-01-05 14:03:35,759][19668] Updated weights for policy 0, policy_version 281726 (0.0016) [2025-01-05 14:03:37,791][19668] Updated weights for policy 0, policy_version 281736 (0.0016) [2025-01-05 14:03:39,867][19668] Updated weights for policy 0, policy_version 281746 (0.0016) [2025-01-05 14:03:39,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20206.9, 300 sec: 20132.9). Total num frames: 1154031616. Throughput: 0: 5040.5. Samples: 13501858. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:39,965][19571] Avg episode reward: [(0, '9.989')] [2025-01-05 14:03:41,958][19668] Updated weights for policy 0, policy_version 281756 (0.0015) [2025-01-05 14:03:43,968][19668] Updated weights for policy 0, policy_version 281766 (0.0017) [2025-01-05 14:03:44,965][19571] Fps is (10 sec: 19661.4, 60 sec: 20138.7, 300 sec: 20132.9). 
Total num frames: 1154129920. Throughput: 0: 5029.7. Samples: 13531908. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:44,965][19571] Avg episode reward: [(0, '9.731')] [2025-01-05 14:03:46,008][19668] Updated weights for policy 0, policy_version 281776 (0.0015) [2025-01-05 14:03:48,002][19668] Updated weights for policy 0, policy_version 281786 (0.0015) [2025-01-05 14:03:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1154232320. Throughput: 0: 5030.6. Samples: 13547110. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:49,965][19571] Avg episode reward: [(0, '10.385')] [2025-01-05 14:03:50,030][19668] Updated weights for policy 0, policy_version 281796 (0.0015) [2025-01-05 14:03:52,058][19668] Updated weights for policy 0, policy_version 281806 (0.0015) [2025-01-05 14:03:54,057][19668] Updated weights for policy 0, policy_version 281816 (0.0016) [2025-01-05 14:03:54,965][19571] Fps is (10 sec: 20480.4, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1154334720. Throughput: 0: 5038.5. Samples: 13577484. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:54,965][19571] Avg episode reward: [(0, '9.852')] [2025-01-05 14:03:56,067][19668] Updated weights for policy 0, policy_version 281826 (0.0015) [2025-01-05 14:03:58,087][19668] Updated weights for policy 0, policy_version 281836 (0.0016) [2025-01-05 14:03:59,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20207.0, 300 sec: 20146.8). Total num frames: 1154437120. Throughput: 0: 5043.4. Samples: 13607848. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:03:59,965][19571] Avg episode reward: [(0, '10.382')] [2025-01-05 14:04:00,142][19668] Updated weights for policy 0, policy_version 281846 (0.0015) [2025-01-05 14:04:02,176][19668] Updated weights for policy 0, policy_version 281856 (0.0016) [2025-01-05 14:04:04,208][19668] Updated weights for policy 0, policy_version 281866 (0.0016) [2025-01-05 14:04:04,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1154535424. Throughput: 0: 5042.8. Samples: 13623068. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:04:04,965][19571] Avg episode reward: [(0, '8.588')] [2025-01-05 14:04:06,326][19668] Updated weights for policy 0, policy_version 281876 (0.0016) [2025-01-05 14:04:08,332][19668] Updated weights for policy 0, policy_version 281886 (0.0015) [2025-01-05 14:04:09,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1154637824. Throughput: 0: 5037.9. Samples: 13652968. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:04:09,965][19571] Avg episode reward: [(0, '9.304')] [2025-01-05 14:04:10,332][19668] Updated weights for policy 0, policy_version 281896 (0.0015) [2025-01-05 14:04:12,351][19668] Updated weights for policy 0, policy_version 281906 (0.0015) [2025-01-05 14:04:14,419][19668] Updated weights for policy 0, policy_version 281916 (0.0015) [2025-01-05 14:04:14,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1154736128. Throughput: 0: 5028.3. Samples: 13683112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:04:14,965][19571] Avg episode reward: [(0, '9.646')] [2025-01-05 14:04:16,482][19668] Updated weights for policy 0, policy_version 281926 (0.0016) [2025-01-05 14:04:18,489][19668] Updated weights for policy 0, policy_version 281936 (0.0015) [2025-01-05 14:04:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.6, 300 sec: 20160.7). 
Total num frames: 1154838528. Throughput: 0: 5026.6. Samples: 13698226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:04:19,965][19571] Avg episode reward: [(0, '9.420')] [2025-01-05 14:04:20,501][19668] Updated weights for policy 0, policy_version 281946 (0.0016) [2025-01-05 14:04:22,540][19668] Updated weights for policy 0, policy_version 281956 (0.0015) [2025-01-05 14:04:24,532][19668] Updated weights for policy 0, policy_version 281966 (0.0015) [2025-01-05 14:04:24,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20160.7). Total num frames: 1154940928. Throughput: 0: 5041.6. Samples: 13728728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:24,965][19571] Avg episode reward: [(0, '9.902')] [2025-01-05 14:04:26,561][19668] Updated weights for policy 0, policy_version 281976 (0.0017) [2025-01-05 14:04:28,580][19668] Updated weights for policy 0, policy_version 281986 (0.0015) [2025-01-05 14:04:29,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1155039232. Throughput: 0: 5051.5. Samples: 13759224. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:29,965][19571] Avg episode reward: [(0, '10.009')] [2025-01-05 14:04:30,580][19668] Updated weights for policy 0, policy_version 281996 (0.0015) [2025-01-05 14:04:32,607][19668] Updated weights for policy 0, policy_version 282006 (0.0016) [2025-01-05 14:04:34,644][19668] Updated weights for policy 0, policy_version 282016 (0.0015) [2025-01-05 14:04:34,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.8, 300 sec: 20160.7). Total num frames: 1155141632. Throughput: 0: 5051.6. Samples: 13774430. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:34,965][19571] Avg episode reward: [(0, '10.192')] [2025-01-05 14:04:36,722][19668] Updated weights for policy 0, policy_version 282026 (0.0016) [2025-01-05 14:04:38,756][19668] Updated weights for policy 0, policy_version 282036 (0.0014) [2025-01-05 14:04:39,965][19571] Fps is (10 sec: 20070.0, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1155239936. Throughput: 0: 5043.4. Samples: 13804438. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:39,965][19571] Avg episode reward: [(0, '8.964')] [2025-01-05 14:04:40,017][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000282042_1155244032.pth... [2025-01-05 14:04:40,064][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000280861_1150406656.pth [2025-01-05 14:04:40,846][19668] Updated weights for policy 0, policy_version 282046 (0.0015) [2025-01-05 14:04:42,884][19668] Updated weights for policy 0, policy_version 282056 (0.0015) [2025-01-05 14:04:44,910][19668] Updated weights for policy 0, policy_version 282066 (0.0015) [2025-01-05 14:04:44,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20207.0, 300 sec: 20160.7). Total num frames: 1155342336. Throughput: 0: 5033.3. Samples: 13834344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:44,965][19571] Avg episode reward: [(0, '10.810')] [2025-01-05 14:04:46,972][19668] Updated weights for policy 0, policy_version 282076 (0.0016) [2025-01-05 14:04:48,980][19668] Updated weights for policy 0, policy_version 282086 (0.0015) [2025-01-05 14:04:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1155440640. Throughput: 0: 5026.8. Samples: 13849272. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:49,965][19571] Avg episode reward: [(0, '9.428')] [2025-01-05 14:04:51,030][19668] Updated weights for policy 0, policy_version 282096 (0.0015) [2025-01-05 14:04:53,036][19668] Updated weights for policy 0, policy_version 282106 (0.0015) [2025-01-05 14:04:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.6, 300 sec: 20160.7). Total num frames: 1155543040. Throughput: 0: 5037.4. Samples: 13879650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:54,965][19571] Avg episode reward: [(0, '9.058')] [2025-01-05 14:04:55,025][19668] Updated weights for policy 0, policy_version 282116 (0.0015) [2025-01-05 14:04:57,097][19668] Updated weights for policy 0, policy_version 282126 (0.0015) [2025-01-05 14:04:59,111][19668] Updated weights for policy 0, policy_version 282136 (0.0015) [2025-01-05 14:04:59,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20138.6, 300 sec: 20160.6). Total num frames: 1155645440. Throughput: 0: 5044.8. Samples: 13910128. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:04:59,965][19571] Avg episode reward: [(0, '9.336')] [2025-01-05 14:05:01,099][19668] Updated weights for policy 0, policy_version 282146 (0.0015) [2025-01-05 14:05:03,153][19668] Updated weights for policy 0, policy_version 282156 (0.0014) [2025-01-05 14:05:04,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1155743744. Throughput: 0: 5047.3. Samples: 13925354. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:04,965][19571] Avg episode reward: [(0, '9.779')] [2025-01-05 14:05:05,228][19668] Updated weights for policy 0, policy_version 282166 (0.0017) [2025-01-05 14:05:07,221][19668] Updated weights for policy 0, policy_version 282176 (0.0014) [2025-01-05 14:05:09,267][19668] Updated weights for policy 0, policy_version 282186 (0.0016) [2025-01-05 14:05:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 20160.6). 
Total num frames: 1155846144. Throughput: 0: 5038.9. Samples: 13955478. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:09,965][19571] Avg episode reward: [(0, '9.872')] [2025-01-05 14:05:11,339][19668] Updated weights for policy 0, policy_version 282196 (0.0015) [2025-01-05 14:05:13,330][19668] Updated weights for policy 0, policy_version 282206 (0.0015) [2025-01-05 14:05:14,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1155944448. Throughput: 0: 5031.7. Samples: 13985650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:14,965][19571] Avg episode reward: [(0, '10.117')] [2025-01-05 14:05:15,379][19668] Updated weights for policy 0, policy_version 282216 (0.0015) [2025-01-05 14:05:17,393][19668] Updated weights for policy 0, policy_version 282226 (0.0016) [2025-01-05 14:05:19,370][19668] Updated weights for policy 0, policy_version 282236 (0.0016) [2025-01-05 14:05:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1156046848. Throughput: 0: 5032.8. Samples: 14000906. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:19,965][19571] Avg episode reward: [(0, '9.460')] [2025-01-05 14:05:21,441][19668] Updated weights for policy 0, policy_version 282246 (0.0014) [2025-01-05 14:05:23,450][19668] Updated weights for policy 0, policy_version 282256 (0.0015) [2025-01-05 14:05:24,965][19571] Fps is (10 sec: 20479.7, 60 sec: 20138.7, 300 sec: 20160.6). Total num frames: 1156149248. Throughput: 0: 5039.4. Samples: 14031210. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:24,965][19571] Avg episode reward: [(0, '9.490')] [2025-01-05 14:05:25,431][19668] Updated weights for policy 0, policy_version 282266 (0.0016) [2025-01-05 14:05:27,500][19668] Updated weights for policy 0, policy_version 282276 (0.0015) [2025-01-05 14:05:29,497][19668] Updated weights for policy 0, policy_version 282286 (0.0015) [2025-01-05 14:05:29,965][19571] Fps is (10 sec: 20479.9, 60 sec: 20206.9, 300 sec: 20160.7). Total num frames: 1156251648. Throughput: 0: 5052.8. Samples: 14061720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:29,965][19571] Avg episode reward: [(0, '9.024')] [2025-01-05 14:05:31,485][19668] Updated weights for policy 0, policy_version 282296 (0.0015) [2025-01-05 14:05:33,547][19668] Updated weights for policy 0, policy_version 282306 (0.0015) [2025-01-05 14:05:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.6, 300 sec: 20160.7). Total num frames: 1156349952. Throughput: 0: 5058.9. Samples: 14076924. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:05:34,965][19571] Avg episode reward: [(0, '10.528')] [2025-01-05 14:05:35,612][19668] Updated weights for policy 0, policy_version 282316 (0.0016) [2025-01-05 14:05:37,584][19668] Updated weights for policy 0, policy_version 282326 (0.0015) [2025-01-05 14:05:39,662][19668] Updated weights for policy 0, policy_version 282336 (0.0016) [2025-01-05 14:05:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20207.0, 300 sec: 20160.6). Total num frames: 1156452352. Throughput: 0: 5054.0. Samples: 14107078. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:05:39,965][19571] Avg episode reward: [(0, '10.170')] [2025-01-05 14:05:41,747][19668] Updated weights for policy 0, policy_version 282346 (0.0015) [2025-01-05 14:05:43,698][19668] Updated weights for policy 0, policy_version 282356 (0.0015) [2025-01-05 14:05:44,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20174.5). 
Total num frames: 1156554752. Throughput: 0: 5050.1. Samples: 14137382. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:05:44,965][19571] Avg episode reward: [(0, '9.766')] [2025-01-05 14:05:45,769][19668] Updated weights for policy 0, policy_version 282366 (0.0016) [2025-01-05 14:05:47,763][19668] Updated weights for policy 0, policy_version 282376 (0.0015) [2025-01-05 14:05:49,739][19668] Updated weights for policy 0, policy_version 282386 (0.0015) [2025-01-05 14:05:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 20160.6). Total num frames: 1156653056. Throughput: 0: 5050.2. Samples: 14152612. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:05:49,965][19571] Avg episode reward: [(0, '9.390')] [2025-01-05 14:05:51,792][19668] Updated weights for policy 0, policy_version 282396 (0.0014) [2025-01-05 14:05:53,797][19668] Updated weights for policy 0, policy_version 282406 (0.0015) [2025-01-05 14:05:54,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20206.9, 300 sec: 20160.7). Total num frames: 1156755456. Throughput: 0: 5059.8. Samples: 14183168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:05:54,965][19571] Avg episode reward: [(0, '9.900')] [2025-01-05 14:05:55,772][19668] Updated weights for policy 0, policy_version 282416 (0.0016) [2025-01-05 14:05:57,829][19668] Updated weights for policy 0, policy_version 282426 (0.0014) [2025-01-05 14:05:59,912][19668] Updated weights for policy 0, policy_version 282436 (0.0019) [2025-01-05 14:05:59,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20206.9, 300 sec: 20174.5). Total num frames: 1156857856. Throughput: 0: 5057.1. Samples: 14213222. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:05:59,965][19571] Avg episode reward: [(0, '10.109')] [2025-01-05 14:06:01,942][19668] Updated weights for policy 0, policy_version 282446 (0.0016) [2025-01-05 14:06:04,015][19668] Updated weights for policy 0, policy_version 282456 (0.0015) [2025-01-05 14:06:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20160.6). Total num frames: 1156956160. Throughput: 0: 5052.2. Samples: 14228254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:04,965][19571] Avg episode reward: [(0, '8.346')] [2025-01-05 14:06:06,069][19668] Updated weights for policy 0, policy_version 282466 (0.0016) [2025-01-05 14:06:08,055][19668] Updated weights for policy 0, policy_version 282476 (0.0016) [2025-01-05 14:06:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 20160.6). Total num frames: 1157058560. Throughput: 0: 5047.4. Samples: 14258344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:09,965][19571] Avg episode reward: [(0, '9.790')] [2025-01-05 14:06:10,115][19668] Updated weights for policy 0, policy_version 282486 (0.0015) [2025-01-05 14:06:12,145][19668] Updated weights for policy 0, policy_version 282496 (0.0016) [2025-01-05 14:06:14,121][19668] Updated weights for policy 0, policy_version 282506 (0.0014) [2025-01-05 14:06:14,965][19571] Fps is (10 sec: 20480.2, 60 sec: 20275.2, 300 sec: 20160.7). Total num frames: 1157160960. Throughput: 0: 5045.2. Samples: 14288752. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:14,965][19571] Avg episode reward: [(0, '9.392')] [2025-01-05 14:06:16,175][19668] Updated weights for policy 0, policy_version 282516 (0.0015) [2025-01-05 14:06:18,196][19668] Updated weights for policy 0, policy_version 282526 (0.0016) [2025-01-05 14:06:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20206.9, 300 sec: 20160.6). Total num frames: 1157259264. Throughput: 0: 5046.3. Samples: 14304006. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:19,965][19571] Avg episode reward: [(0, '8.925')] [2025-01-05 14:06:20,205][19668] Updated weights for policy 0, policy_version 282536 (0.0016) [2025-01-05 14:06:22,259][19668] Updated weights for policy 0, policy_version 282546 (0.0016) [2025-01-05 14:06:24,289][19668] Updated weights for policy 0, policy_version 282556 (0.0016) [2025-01-05 14:06:24,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20206.9, 300 sec: 20160.7). Total num frames: 1157361664. Throughput: 0: 5050.7. Samples: 14334360. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:24,965][19571] Avg episode reward: [(0, '10.212')] [2025-01-05 14:06:26,288][19668] Updated weights for policy 0, policy_version 282566 (0.0015) [2025-01-05 14:06:28,364][19668] Updated weights for policy 0, policy_version 282576 (0.0016) [2025-01-05 14:06:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 20146.8). Total num frames: 1157459968. Throughput: 0: 5040.4. Samples: 14364202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:29,965][19571] Avg episode reward: [(0, '9.916')] [2025-01-05 14:06:30,510][19668] Updated weights for policy 0, policy_version 282586 (0.0016) [2025-01-05 14:06:32,433][19668] Updated weights for policy 0, policy_version 282596 (0.0015) [2025-01-05 14:06:34,539][19668] Updated weights for policy 0, policy_version 282606 (0.0017) [2025-01-05 14:06:34,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20206.9, 300 sec: 20160.7). Total num frames: 1157562368. Throughput: 0: 5036.8. Samples: 14379270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:34,965][19571] Avg episode reward: [(0, '10.213')] [2025-01-05 14:06:35,454][19636] Signal inference workers to stop experience collection... (100 times) [2025-01-05 14:06:35,457][19636] Signal inference workers to resume experience collection... 
(100 times) [2025-01-05 14:06:35,471][19668] InferenceWorker_p0-w0: stopping experience collection (100 times) [2025-01-05 14:06:35,472][19668] InferenceWorker_p0-w0: resuming experience collection (100 times) [2025-01-05 14:06:36,684][19668] Updated weights for policy 0, policy_version 282616 (0.0016) [2025-01-05 14:06:38,600][19668] Updated weights for policy 0, policy_version 282626 (0.0015) [2025-01-05 14:06:39,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20160.7). Total num frames: 1157660672. Throughput: 0: 5020.6. Samples: 14409094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:39,965][19571] Avg episode reward: [(0, '10.148')] [2025-01-05 14:06:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000282632_1157660672.pth... [2025-01-05 14:06:40,026][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000281451_1152823296.pth [2025-01-05 14:06:40,729][19668] Updated weights for policy 0, policy_version 282636 (0.0016) [2025-01-05 14:06:42,777][19668] Updated weights for policy 0, policy_version 282646 (0.0016) [2025-01-05 14:06:44,731][19668] Updated weights for policy 0, policy_version 282656 (0.0016) [2025-01-05 14:06:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20070.5, 300 sec: 20146.8). Total num frames: 1157758976. Throughput: 0: 5022.4. Samples: 14439230. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:06:44,965][19571] Avg episode reward: [(0, '9.454')] [2025-01-05 14:06:46,814][19668] Updated weights for policy 0, policy_version 282666 (0.0015) [2025-01-05 14:06:48,863][19668] Updated weights for policy 0, policy_version 282676 (0.0016) [2025-01-05 14:06:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 20160.6). Total num frames: 1157861376. Throughput: 0: 5024.0. Samples: 14454334. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:06:49,965][19571] Avg episode reward: [(0, '9.222')] [2025-01-05 14:06:50,879][19668] Updated weights for policy 0, policy_version 282686 (0.0017) [2025-01-05 14:06:52,943][19668] Updated weights for policy 0, policy_version 282696 (0.0015) [2025-01-05 14:06:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1157959680. Throughput: 0: 5021.1. Samples: 14484294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:06:54,965][19571] Avg episode reward: [(0, '9.325')] [2025-01-05 14:06:55,119][19668] Updated weights for policy 0, policy_version 282706 (0.0017) [2025-01-05 14:06:57,109][19668] Updated weights for policy 0, policy_version 282716 (0.0017) [2025-01-05 14:06:59,188][19668] Updated weights for policy 0, policy_version 282726 (0.0016) [2025-01-05 14:06:59,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 20132.9). Total num frames: 1158057984. Throughput: 0: 5000.7. Samples: 14513784. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:06:59,965][19571] Avg episode reward: [(0, '9.743')] [2025-01-05 14:07:01,324][19668] Updated weights for policy 0, policy_version 282736 (0.0017) [2025-01-05 14:07:03,268][19668] Updated weights for policy 0, policy_version 282746 (0.0016) [2025-01-05 14:07:04,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 20132.9). Total num frames: 1158156288. Throughput: 0: 4992.5. Samples: 14528668. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:04,965][19571] Avg episode reward: [(0, '9.581')] [2025-01-05 14:07:05,337][19668] Updated weights for policy 0, policy_version 282756 (0.0017) [2025-01-05 14:07:07,411][19668] Updated weights for policy 0, policy_version 282766 (0.0016) [2025-01-05 14:07:09,381][19668] Updated weights for policy 0, policy_version 282776 (0.0016) [2025-01-05 14:07:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 20132.9). 
Total num frames: 1158258688. Throughput: 0: 4990.3. Samples: 14558922. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:09,965][19571] Avg episode reward: [(0, '9.516')] [2025-01-05 14:07:11,449][19668] Updated weights for policy 0, policy_version 282786 (0.0016) [2025-01-05 14:07:12,903][19636] Signal inference workers to stop experience collection... (150 times) [2025-01-05 14:07:12,903][19636] Signal inference workers to resume experience collection... (150 times) [2025-01-05 14:07:12,911][19668] InferenceWorker_p0-w0: stopping experience collection (150 times) [2025-01-05 14:07:12,911][19668] InferenceWorker_p0-w0: resuming experience collection (150 times) [2025-01-05 14:07:13,504][19668] Updated weights for policy 0, policy_version 282796 (0.0015) [2025-01-05 14:07:14,965][19571] Fps is (10 sec: 20480.1, 60 sec: 20002.1, 300 sec: 20146.8). Total num frames: 1158361088. Throughput: 0: 5001.6. Samples: 14589276. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:14,965][19571] Avg episode reward: [(0, '9.395')] [2025-01-05 14:07:15,434][19668] Updated weights for policy 0, policy_version 282806 (0.0015) [2025-01-05 14:07:17,492][19668] Updated weights for policy 0, policy_version 282816 (0.0015) [2025-01-05 14:07:19,565][19668] Updated weights for policy 0, policy_version 282826 (0.0015) [2025-01-05 14:07:19,965][19571] Fps is (10 sec: 20480.0, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1158463488. Throughput: 0: 5006.5. Samples: 14604564. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:19,965][19571] Avg episode reward: [(0, '10.259')] [2025-01-05 14:07:21,559][19668] Updated weights for policy 0, policy_version 282836 (0.0016) [2025-01-05 14:07:23,593][19668] Updated weights for policy 0, policy_version 282846 (0.0015) [2025-01-05 14:07:24,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 20146.8). Total num frames: 1158561792. Throughput: 0: 5011.7. Samples: 14634622. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:24,965][19571] Avg episode reward: [(0, '10.247')] [2025-01-05 14:07:25,739][19668] Updated weights for policy 0, policy_version 282856 (0.0016) [2025-01-05 14:07:27,680][19668] Updated weights for policy 0, policy_version 282866 (0.0015) [2025-01-05 14:07:29,721][19668] Updated weights for policy 0, policy_version 282876 (0.0016) [2025-01-05 14:07:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1158664192. Throughput: 0: 5012.6. Samples: 14664796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:29,965][19571] Avg episode reward: [(0, '9.418')] [2025-01-05 14:07:31,819][19668] Updated weights for policy 0, policy_version 282886 (0.0016) [2025-01-05 14:07:33,776][19668] Updated weights for policy 0, policy_version 282896 (0.0018) [2025-01-05 14:07:34,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 20146.8). Total num frames: 1158762496. Throughput: 0: 5008.4. Samples: 14679710. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:34,965][19571] Avg episode reward: [(0, '10.132')] [2025-01-05 14:07:35,833][19668] Updated weights for policy 0, policy_version 282906 (0.0017) [2025-01-05 14:07:37,890][19668] Updated weights for policy 0, policy_version 282916 (0.0014) [2025-01-05 14:07:39,886][19668] Updated weights for policy 0, policy_version 282926 (0.0015) [2025-01-05 14:07:39,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20146.8). Total num frames: 1158864896. Throughput: 0: 5013.9. Samples: 14709920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:39,965][19571] Avg episode reward: [(0, '9.386')] [2025-01-05 14:07:41,986][19668] Updated weights for policy 0, policy_version 282936 (0.0015) [2025-01-05 14:07:44,040][19668] Updated weights for policy 0, policy_version 282946 (0.0015) [2025-01-05 14:07:44,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20132.9). 
Total num frames: 1158963200. Throughput: 0: 5026.7. Samples: 14739984. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:44,965][19571] Avg episode reward: [(0, '9.751')] [2025-01-05 14:07:46,051][19668] Updated weights for policy 0, policy_version 282956 (0.0016) [2025-01-05 14:07:48,094][19668] Updated weights for policy 0, policy_version 282966 (0.0015) [2025-01-05 14:07:49,965][19571] Fps is (10 sec: 19660.4, 60 sec: 20002.1, 300 sec: 20119.0). Total num frames: 1159061504. Throughput: 0: 5031.7. Samples: 14755096. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:49,966][19571] Avg episode reward: [(0, '9.464')] [2025-01-05 14:07:50,233][19668] Updated weights for policy 0, policy_version 282976 (0.0017) [2025-01-05 14:07:52,198][19668] Updated weights for policy 0, policy_version 282986 (0.0015) [2025-01-05 14:07:54,235][19668] Updated weights for policy 0, policy_version 282996 (0.0015) [2025-01-05 14:07:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 20132.9). Total num frames: 1159163904. Throughput: 0: 5023.2. Samples: 14784968. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:54,965][19571] Avg episode reward: [(0, '10.336')] [2025-01-05 14:07:56,372][19668] Updated weights for policy 0, policy_version 283006 (0.0016) [2025-01-05 14:07:58,326][19668] Updated weights for policy 0, policy_version 283016 (0.0015) [2025-01-05 14:07:59,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20070.5, 300 sec: 20119.0). Total num frames: 1159262208. Throughput: 0: 5018.7. Samples: 14815116. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:07:59,965][19571] Avg episode reward: [(0, '9.391')] [2025-01-05 14:08:00,354][19668] Updated weights for policy 0, policy_version 283026 (0.0014) [2025-01-05 14:08:02,426][19668] Updated weights for policy 0, policy_version 283036 (0.0015) [2025-01-05 14:08:04,380][19668] Updated weights for policy 0, policy_version 283046 (0.0015) [2025-01-05 14:08:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 20119.0). Total num frames: 1159364608. Throughput: 0: 5018.4. Samples: 14830392. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:04,965][19571] Avg episode reward: [(0, '10.428')] [2025-01-05 14:08:06,414][19668] Updated weights for policy 0, policy_version 283056 (0.0016) [2025-01-05 14:08:08,512][19668] Updated weights for policy 0, policy_version 283066 (0.0015) [2025-01-05 14:08:09,965][19571] Fps is (10 sec: 20479.8, 60 sec: 20138.7, 300 sec: 20132.9). Total num frames: 1159467008. Throughput: 0: 5021.4. Samples: 14860584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:09,965][19571] Avg episode reward: [(0, '9.813')] [2025-01-05 14:08:10,495][19668] Updated weights for policy 0, policy_version 283076 (0.0015) [2025-01-05 14:08:12,589][19668] Updated weights for policy 0, policy_version 283086 (0.0019) [2025-01-05 14:08:14,668][19668] Updated weights for policy 0, policy_version 283096 (0.0016) [2025-01-05 14:08:14,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 20119.0). Total num frames: 1159565312. Throughput: 0: 5016.8. Samples: 14890552. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:14,965][19571] Avg episode reward: [(0, '9.372')] [2025-01-05 14:08:16,710][19668] Updated weights for policy 0, policy_version 283106 (0.0017) [2025-01-05 14:08:18,806][19668] Updated weights for policy 0, policy_version 283116 (0.0016) [2025-01-05 14:08:19,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20002.2, 300 sec: 20119.0). 
Total num frames: 1159663616. Throughput: 0: 5009.3. Samples: 14905130. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:19,965][19571] Avg episode reward: [(0, '9.214')] [2025-01-05 14:08:20,966][19668] Updated weights for policy 0, policy_version 283126 (0.0016) [2025-01-05 14:08:22,906][19668] Updated weights for policy 0, policy_version 283136 (0.0015) [2025-01-05 14:08:24,944][19668] Updated weights for policy 0, policy_version 283146 (0.0016) [2025-01-05 14:08:24,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20070.4, 300 sec: 20119.0). Total num frames: 1159766016. Throughput: 0: 5001.6. Samples: 14934994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:24,965][19571] Avg episode reward: [(0, '10.226')] [2025-01-05 14:08:27,149][19668] Updated weights for policy 0, policy_version 283156 (0.0016) [2025-01-05 14:08:29,090][19668] Updated weights for policy 0, policy_version 283166 (0.0016) [2025-01-05 14:08:29,965][19571] Fps is (10 sec: 20070.1, 60 sec: 20002.1, 300 sec: 20105.1). Total num frames: 1159864320. Throughput: 0: 4993.3. Samples: 14964682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:29,965][19571] Avg episode reward: [(0, '10.596')] [2025-01-05 14:08:31,151][19668] Updated weights for policy 0, policy_version 283176 (0.0016) [2025-01-05 14:08:33,242][19668] Updated weights for policy 0, policy_version 283186 (0.0016) [2025-01-05 14:08:34,529][19636] Signal inference workers to stop experience collection... (200 times) [2025-01-05 14:08:34,530][19636] Signal inference workers to resume experience collection... (200 times) [2025-01-05 14:08:34,542][19668] InferenceWorker_p0-w0: stopping experience collection (200 times) [2025-01-05 14:08:34,542][19668] InferenceWorker_p0-w0: resuming experience collection (200 times) [2025-01-05 14:08:34,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 20105.1). Total num frames: 1159962624. Throughput: 0: 4993.5. Samples: 14979804. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:34,965][19571] Avg episode reward: [(0, '9.126')] [2025-01-05 14:08:35,258][19668] Updated weights for policy 0, policy_version 283196 (0.0017) [2025-01-05 14:08:37,359][19668] Updated weights for policy 0, policy_version 283206 (0.0016) [2025-01-05 14:08:39,541][19668] Updated weights for policy 0, policy_version 283216 (0.0017) [2025-01-05 14:08:39,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19933.8, 300 sec: 20105.1). Total num frames: 1160060928. Throughput: 0: 4980.9. Samples: 15009110. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:39,965][19571] Avg episode reward: [(0, '9.925')] [2025-01-05 14:08:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000283218_1160060928.pth... [2025-01-05 14:08:40,023][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000282042_1155244032.pth [2025-01-05 14:08:41,616][19668] Updated weights for policy 0, policy_version 283226 (0.0017) [2025-01-05 14:08:43,673][19668] Updated weights for policy 0, policy_version 283236 (0.0016) [2025-01-05 14:08:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 20091.2). Total num frames: 1160159232. Throughput: 0: 4962.4. Samples: 15038424. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:44,965][19571] Avg episode reward: [(0, '10.089')] [2025-01-05 14:08:45,797][19668] Updated weights for policy 0, policy_version 283246 (0.0019) [2025-01-05 14:08:47,748][19668] Updated weights for policy 0, policy_version 283256 (0.0016) [2025-01-05 14:08:49,798][19668] Updated weights for policy 0, policy_version 283266 (0.0016) [2025-01-05 14:08:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 20077.3). Total num frames: 1160257536. Throughput: 0: 4958.0. Samples: 15053502. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:49,965][19571] Avg episode reward: [(0, '9.579')] [2025-01-05 14:08:51,939][19668] Updated weights for policy 0, policy_version 283276 (0.0017) [2025-01-05 14:08:53,876][19668] Updated weights for policy 0, policy_version 283286 (0.0016) [2025-01-05 14:08:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.9, 300 sec: 20077.3). Total num frames: 1160359936. Throughput: 0: 4958.4. Samples: 15083710. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:54,965][19571] Avg episode reward: [(0, '10.705')] [2025-01-05 14:08:55,910][19668] Updated weights for policy 0, policy_version 283296 (0.0016) [2025-01-05 14:08:57,968][19668] Updated weights for policy 0, policy_version 283306 (0.0016) [2025-01-05 14:08:59,965][19571] Fps is (10 sec: 20480.3, 60 sec: 20002.1, 300 sec: 20091.2). Total num frames: 1160462336. Throughput: 0: 4964.4. Samples: 15113950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:08:59,965][19668] Updated weights for policy 0, policy_version 283316 (0.0015) [2025-01-05 14:08:59,965][19571] Avg episode reward: [(0, '11.436')] [2025-01-05 14:09:01,990][19668] Updated weights for policy 0, policy_version 283326 (0.0015) [2025-01-05 14:09:04,132][19668] Updated weights for policy 0, policy_version 283336 (0.0016) [2025-01-05 14:09:04,965][19571] Fps is (10 sec: 20069.7, 60 sec: 19933.7, 300 sec: 20077.3). Total num frames: 1160560640. Throughput: 0: 4976.1. Samples: 15129056. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:09:04,966][19571] Avg episode reward: [(0, '9.914')] [2025-01-05 14:09:06,162][19668] Updated weights for policy 0, policy_version 283346 (0.0017) [2025-01-05 14:09:08,389][19668] Updated weights for policy 0, policy_version 283356 (0.0016) [2025-01-05 14:09:09,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19797.3, 300 sec: 20063.5). Total num frames: 1160654848. Throughput: 0: 4949.3. Samples: 15157712. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:09:09,965][19571] Avg episode reward: [(0, '9.213')] [2025-01-05 14:09:10,572][19668] Updated weights for policy 0, policy_version 283366 (0.0017) [2025-01-05 14:09:12,560][19668] Updated weights for policy 0, policy_version 283376 (0.0015) [2025-01-05 14:09:14,885][19668] Updated weights for policy 0, policy_version 283386 (0.0019) [2025-01-05 14:09:14,965][19571] Fps is (10 sec: 18842.2, 60 sec: 19729.1, 300 sec: 20035.7). Total num frames: 1160749056. Throughput: 0: 4919.3. Samples: 15186050. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:14,965][19571] Avg episode reward: [(0, '8.761')] [2025-01-05 14:09:17,304][19668] Updated weights for policy 0, policy_version 283396 (0.0020) [2025-01-05 14:09:19,332][19668] Updated weights for policy 0, policy_version 283406 (0.0017) [2025-01-05 14:09:19,965][19571] Fps is (10 sec: 18432.0, 60 sec: 19592.5, 300 sec: 19994.0). Total num frames: 1160839168. Throughput: 0: 4877.5. Samples: 15199290. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:19,965][19571] Avg episode reward: [(0, '11.384')] [2025-01-05 14:09:21,543][19668] Updated weights for policy 0, policy_version 283416 (0.0018) [2025-01-05 14:09:23,908][19668] Updated weights for policy 0, policy_version 283426 (0.0017) [2025-01-05 14:09:24,965][19571] Fps is (10 sec: 18022.6, 60 sec: 19387.7, 300 sec: 19966.3). Total num frames: 1160929280. Throughput: 0: 4847.0. Samples: 15227222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:24,965][19571] Avg episode reward: [(0, '10.844')] [2025-01-05 14:09:26,159][19668] Updated weights for policy 0, policy_version 283436 (0.0018) [2025-01-05 14:09:28,370][19668] Updated weights for policy 0, policy_version 283446 (0.0019) [2025-01-05 14:09:29,965][19571] Fps is (10 sec: 18022.5, 60 sec: 19251.2, 300 sec: 19924.6). Total num frames: 1161019392. Throughput: 0: 4795.6. Samples: 15254228. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:29,965][19571] Avg episode reward: [(0, '10.869')] [2025-01-05 14:09:30,821][19668] Updated weights for policy 0, policy_version 283456 (0.0018) [2025-01-05 14:09:32,897][19668] Updated weights for policy 0, policy_version 283466 (0.0018) [2025-01-05 14:09:34,965][19571] Fps is (10 sec: 18431.9, 60 sec: 19182.9, 300 sec: 19910.7). Total num frames: 1161113600. Throughput: 0: 4764.9. Samples: 15267922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:34,965][19571] Avg episode reward: [(0, '10.769')] [2025-01-05 14:09:35,152][19668] Updated weights for policy 0, policy_version 283476 (0.0019) [2025-01-05 14:09:37,411][19668] Updated weights for policy 0, policy_version 283486 (0.0019) [2025-01-05 14:09:39,534][19668] Updated weights for policy 0, policy_version 283496 (0.0019) [2025-01-05 14:09:39,965][19571] Fps is (10 sec: 18431.7, 60 sec: 19046.4, 300 sec: 19869.1). Total num frames: 1161203712. Throughput: 0: 4712.2. Samples: 15295758. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:39,966][19571] Avg episode reward: [(0, '10.439')] [2025-01-05 14:09:41,893][19668] Updated weights for policy 0, policy_version 283506 (0.0018) [2025-01-05 14:09:44,019][19668] Updated weights for policy 0, policy_version 283516 (0.0018) [2025-01-05 14:09:44,965][19571] Fps is (10 sec: 18432.0, 60 sec: 18978.1, 300 sec: 19855.2). Total num frames: 1161297920. Throughput: 0: 4653.3. Samples: 15323350. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:44,965][19571] Avg episode reward: [(0, '9.801')] [2025-01-05 14:09:46,167][19668] Updated weights for policy 0, policy_version 283526 (0.0018) [2025-01-05 14:09:48,322][19668] Updated weights for policy 0, policy_version 283536 (0.0018) [2025-01-05 14:09:49,965][19571] Fps is (10 sec: 18841.9, 60 sec: 18909.9, 300 sec: 19827.4). Total num frames: 1161392128. Throughput: 0: 4635.5. Samples: 15337650. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:49,965][19571] Avg episode reward: [(0, '10.077')] [2025-01-05 14:09:50,489][19668] Updated weights for policy 0, policy_version 283546 (0.0018) [2025-01-05 14:09:52,497][19668] Updated weights for policy 0, policy_version 283556 (0.0017) [2025-01-05 14:09:54,680][19668] Updated weights for policy 0, policy_version 283566 (0.0018) [2025-01-05 14:09:54,965][19571] Fps is (10 sec: 19251.4, 60 sec: 18841.6, 300 sec: 19813.5). Total num frames: 1161490432. Throughput: 0: 4643.7. Samples: 15366678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:54,965][19571] Avg episode reward: [(0, '10.093')] [2025-01-05 14:09:56,933][19668] Updated weights for policy 0, policy_version 283576 (0.0019) [2025-01-05 14:09:59,111][19668] Updated weights for policy 0, policy_version 283586 (0.0018) [2025-01-05 14:09:59,965][19571] Fps is (10 sec: 18841.2, 60 sec: 18636.7, 300 sec: 19785.8). Total num frames: 1161580544. Throughput: 0: 4627.3. Samples: 15394280. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:09:59,966][19571] Avg episode reward: [(0, '8.643')] [2025-01-05 14:10:01,328][19668] Updated weights for policy 0, policy_version 283596 (0.0018) [2025-01-05 14:10:03,432][19668] Updated weights for policy 0, policy_version 283606 (0.0019) [2025-01-05 14:10:04,965][19571] Fps is (10 sec: 18841.4, 60 sec: 18636.9, 300 sec: 19771.9). Total num frames: 1161678848. Throughput: 0: 4660.0. Samples: 15408992. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:10:04,965][19571] Avg episode reward: [(0, '11.153')] [2025-01-05 14:10:05,587][19668] Updated weights for policy 0, policy_version 283616 (0.0019) [2025-01-05 14:10:08,104][19668] Updated weights for policy 0, policy_version 283626 (0.0016) [2025-01-05 14:10:09,965][19571] Fps is (10 sec: 18432.4, 60 sec: 18500.3, 300 sec: 19730.2). Total num frames: 1161764864. Throughput: 0: 4630.8. Samples: 15435610. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:10:09,966][19571] Avg episode reward: [(0, '9.884')] [2025-01-05 14:10:10,364][19668] Updated weights for policy 0, policy_version 283636 (0.0018) [2025-01-05 14:10:12,383][19668] Updated weights for policy 0, policy_version 283646 (0.0017) [2025-01-05 14:10:14,554][19668] Updated weights for policy 0, policy_version 283656 (0.0018) [2025-01-05 14:10:14,965][19571] Fps is (10 sec: 18432.1, 60 sec: 18568.6, 300 sec: 19716.3). Total num frames: 1161863168. Throughput: 0: 4666.9. Samples: 15464238. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:10:14,965][19571] Avg episode reward: [(0, '11.073')] [2025-01-05 14:10:16,852][19668] Updated weights for policy 0, policy_version 283666 (0.0019) [2025-01-05 14:10:18,969][19668] Updated weights for policy 0, policy_version 283676 (0.0017) [2025-01-05 14:10:19,965][19571] Fps is (10 sec: 18841.5, 60 sec: 18568.5, 300 sec: 19674.7). Total num frames: 1161953280. Throughput: 0: 4666.2. Samples: 15477900. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:10:19,965][19571] Avg episode reward: [(0, '9.305')] [2025-01-05 14:10:21,398][19668] Updated weights for policy 0, policy_version 283686 (0.0019) [2025-01-05 14:10:23,606][19668] Updated weights for policy 0, policy_version 283696 (0.0019) [2025-01-05 14:10:24,965][19571] Fps is (10 sec: 17612.8, 60 sec: 18500.3, 300 sec: 19619.1). Total num frames: 1162039296. Throughput: 0: 4646.6. Samples: 15504856. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:10:24,965][19571] Avg episode reward: [(0, '9.750')] [2025-01-05 14:10:25,953][19668] Updated weights for policy 0, policy_version 283706 (0.0020) [2025-01-05 14:10:28,173][19668] Updated weights for policy 0, policy_version 283716 (0.0019) [2025-01-05 14:10:29,965][19571] Fps is (10 sec: 17612.8, 60 sec: 18500.2, 300 sec: 19591.4). Total num frames: 1162129408. Throughput: 0: 4616.5. Samples: 15531092. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:29,965][19571] Avg episode reward: [(0, '8.532')] [2025-01-05 14:10:30,623][19668] Updated weights for policy 0, policy_version 283726 (0.0019) [2025-01-05 14:10:32,859][19668] Updated weights for policy 0, policy_version 283736 (0.0018) [2025-01-05 14:10:34,965][19571] Fps is (10 sec: 18022.0, 60 sec: 18431.9, 300 sec: 19549.7). Total num frames: 1162219520. Throughput: 0: 4599.6. Samples: 15544632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:34,966][19571] Avg episode reward: [(0, '10.324')] [2025-01-05 14:10:35,196][19668] Updated weights for policy 0, policy_version 283746 (0.0020) [2025-01-05 14:10:37,518][19668] Updated weights for policy 0, policy_version 283756 (0.0018) [2025-01-05 14:10:39,606][19668] Updated weights for policy 0, policy_version 283766 (0.0017) [2025-01-05 14:10:39,965][19571] Fps is (10 sec: 18022.1, 60 sec: 18432.0, 300 sec: 19508.1). Total num frames: 1162309632. Throughput: 0: 4561.4. Samples: 15571944. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:39,966][19571] Avg episode reward: [(0, '9.951')] [2025-01-05 14:10:40,089][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000283768_1162313728.pth... [2025-01-05 14:10:40,140][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000282632_1157660672.pth [2025-01-05 14:10:41,818][19668] Updated weights for policy 0, policy_version 283776 (0.0019) [2025-01-05 14:10:43,911][19668] Updated weights for policy 0, policy_version 283786 (0.0017) [2025-01-05 14:10:44,965][19571] Fps is (10 sec: 18432.3, 60 sec: 18432.0, 300 sec: 19494.2). Total num frames: 1162403840. Throughput: 0: 4576.3. Samples: 15600214. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:44,965][19571] Avg episode reward: [(0, '9.047')] [2025-01-05 14:10:46,166][19668] Updated weights for policy 0, policy_version 283796 (0.0020) [2025-01-05 14:10:48,273][19668] Updated weights for policy 0, policy_version 283806 (0.0017) [2025-01-05 14:10:49,965][19571] Fps is (10 sec: 18841.7, 60 sec: 18431.9, 300 sec: 19466.4). Total num frames: 1162498048. Throughput: 0: 4561.1. Samples: 15614242. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:49,966][19571] Avg episode reward: [(0, '10.086')] [2025-01-05 14:10:50,491][19668] Updated weights for policy 0, policy_version 283816 (0.0018) [2025-01-05 14:10:52,504][19668] Updated weights for policy 0, policy_version 283826 (0.0017) [2025-01-05 14:10:54,562][19668] Updated weights for policy 0, policy_version 283836 (0.0018) [2025-01-05 14:10:54,965][19571] Fps is (10 sec: 19251.1, 60 sec: 18432.0, 300 sec: 19452.5). Total num frames: 1162596352. Throughput: 0: 4619.7. Samples: 15643498. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:54,965][19571] Avg episode reward: [(0, '10.022')] [2025-01-05 14:10:56,773][19668] Updated weights for policy 0, policy_version 283846 (0.0018) [2025-01-05 14:10:58,796][19668] Updated weights for policy 0, policy_version 283856 (0.0018) [2025-01-05 14:10:59,965][19571] Fps is (10 sec: 19661.1, 60 sec: 18568.6, 300 sec: 19452.5). Total num frames: 1162694656. Throughput: 0: 4628.7. Samples: 15672528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:10:59,965][19571] Avg episode reward: [(0, '8.846')] [2025-01-05 14:11:00,941][19668] Updated weights for policy 0, policy_version 283866 (0.0018) [2025-01-05 14:11:03,033][19668] Updated weights for policy 0, policy_version 283876 (0.0017) [2025-01-05 14:11:04,965][19571] Fps is (10 sec: 19661.1, 60 sec: 18568.6, 300 sec: 19438.7). Total num frames: 1162792960. Throughput: 0: 4654.1. Samples: 15687334. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:04,965][19571] Avg episode reward: [(0, '10.626')] [2025-01-05 14:11:05,136][19668] Updated weights for policy 0, policy_version 283886 (0.0018) [2025-01-05 14:11:07,209][19668] Updated weights for policy 0, policy_version 283896 (0.0017) [2025-01-05 14:11:09,286][19668] Updated weights for policy 0, policy_version 283906 (0.0018) [2025-01-05 14:11:09,965][19571] Fps is (10 sec: 19660.5, 60 sec: 18773.3, 300 sec: 19424.7). Total num frames: 1162891264. Throughput: 0: 4708.1. Samples: 15716720. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:09,965][19571] Avg episode reward: [(0, '8.829')] [2025-01-05 14:11:11,440][19668] Updated weights for policy 0, policy_version 283916 (0.0018) [2025-01-05 14:11:13,472][19668] Updated weights for policy 0, policy_version 283926 (0.0017) [2025-01-05 14:11:14,965][19571] Fps is (10 sec: 19250.5, 60 sec: 18705.0, 300 sec: 19410.9). Total num frames: 1162985472. Throughput: 0: 4771.1. Samples: 15745792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:14,966][19571] Avg episode reward: [(0, '9.486')] [2025-01-05 14:11:15,655][19668] Updated weights for policy 0, policy_version 283936 (0.0018) [2025-01-05 14:11:17,696][19668] Updated weights for policy 0, policy_version 283946 (0.0017) [2025-01-05 14:11:19,731][19668] Updated weights for policy 0, policy_version 283956 (0.0017) [2025-01-05 14:11:19,965][19571] Fps is (10 sec: 19251.4, 60 sec: 18841.6, 300 sec: 19397.0). Total num frames: 1163083776. Throughput: 0: 4798.4. Samples: 15760558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:19,965][19571] Avg episode reward: [(0, '9.573')] [2025-01-05 14:11:21,909][19668] Updated weights for policy 0, policy_version 283966 (0.0018) [2025-01-05 14:11:23,953][19668] Updated weights for policy 0, policy_version 283976 (0.0017) [2025-01-05 14:11:24,965][19571] Fps is (10 sec: 19661.5, 60 sec: 19046.4, 300 sec: 19397.0). 
Total num frames: 1163182080. Throughput: 0: 4846.0. Samples: 15790014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:24,965][19571] Avg episode reward: [(0, '11.023')] [2025-01-05 14:11:26,120][19668] Updated weights for policy 0, policy_version 283986 (0.0018) [2025-01-05 14:11:28,191][19668] Updated weights for policy 0, policy_version 283996 (0.0017) [2025-01-05 14:11:29,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19183.0, 300 sec: 19383.1). Total num frames: 1163280384. Throughput: 0: 4856.0. Samples: 15818732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:29,965][19571] Avg episode reward: [(0, '8.599')] [2025-01-05 14:11:30,341][19668] Updated weights for policy 0, policy_version 284006 (0.0018) [2025-01-05 14:11:32,460][19668] Updated weights for policy 0, policy_version 284016 (0.0018) [2025-01-05 14:11:34,522][19668] Updated weights for policy 0, policy_version 284026 (0.0018) [2025-01-05 14:11:34,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19319.5, 300 sec: 19383.1). Total num frames: 1163378688. Throughput: 0: 4870.2. Samples: 15833400. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:34,965][19571] Avg episode reward: [(0, '10.253')] [2025-01-05 14:11:36,665][19668] Updated weights for policy 0, policy_version 284036 (0.0018) [2025-01-05 14:11:38,762][19668] Updated weights for policy 0, policy_version 284046 (0.0018) [2025-01-05 14:11:39,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.8, 300 sec: 19369.2). Total num frames: 1163472896. Throughput: 0: 4872.3. Samples: 15862750. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:39,965][19571] Avg episode reward: [(0, '8.772')] [2025-01-05 14:11:40,945][19668] Updated weights for policy 0, policy_version 284056 (0.0018) [2025-01-05 14:11:42,970][19668] Updated weights for policy 0, policy_version 284066 (0.0017) [2025-01-05 14:11:44,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1163571200. 
Throughput: 0: 4862.7. Samples: 15891348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:44,965][19571] Avg episode reward: [(0, '10.280')] [2025-01-05 14:11:45,214][19668] Updated weights for policy 0, policy_version 284076 (0.0018) [2025-01-05 14:11:47,266][19668] Updated weights for policy 0, policy_version 284086 (0.0017) [2025-01-05 14:11:49,299][19668] Updated weights for policy 0, policy_version 284096 (0.0017) [2025-01-05 14:11:49,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19355.3). Total num frames: 1163669504. Throughput: 0: 4861.4. Samples: 15906098. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:49,965][19571] Avg episode reward: [(0, '8.792')] [2025-01-05 14:11:51,584][19668] Updated weights for policy 0, policy_version 284106 (0.0019) [2025-01-05 14:11:53,636][19668] Updated weights for policy 0, policy_version 284116 (0.0017) [2025-01-05 14:11:54,965][19571] Fps is (10 sec: 18841.6, 60 sec: 19387.7, 300 sec: 19327.6). Total num frames: 1163759616. Throughput: 0: 4849.3. Samples: 15934936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:54,965][19571] Avg episode reward: [(0, '9.844')] [2025-01-05 14:11:55,775][19668] Updated weights for policy 0, policy_version 284126 (0.0018) [2025-01-05 14:11:57,894][19668] Updated weights for policy 0, policy_version 284136 (0.0017) [2025-01-05 14:11:59,965][19571] Fps is (10 sec: 18841.3, 60 sec: 19387.7, 300 sec: 19327.6). Total num frames: 1163857920. Throughput: 0: 4841.5. Samples: 15963660. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:11:59,966][19571] Avg episode reward: [(0, '9.189')] [2025-01-05 14:12:00,059][19668] Updated weights for policy 0, policy_version 284146 (0.0019) [2025-01-05 14:12:02,194][19668] Updated weights for policy 0, policy_version 284156 (0.0018) [2025-01-05 14:12:04,297][19668] Updated weights for policy 0, policy_version 284166 (0.0018) [2025-01-05 14:12:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19313.7). Total num frames: 1163956224. Throughput: 0: 4830.6. Samples: 15977934. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:04,965][19571] Avg episode reward: [(0, '9.678')] [2025-01-05 14:12:06,463][19668] Updated weights for policy 0, policy_version 284176 (0.0019) [2025-01-05 14:12:08,513][19668] Updated weights for policy 0, policy_version 284186 (0.0019) [2025-01-05 14:12:09,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19319.5, 300 sec: 19285.9). Total num frames: 1164050432. Throughput: 0: 4822.8. Samples: 16007038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:09,965][19571] Avg episode reward: [(0, '10.054')] [2025-01-05 14:12:10,703][19668] Updated weights for policy 0, policy_version 284196 (0.0017) [2025-01-05 14:12:12,767][19668] Updated weights for policy 0, policy_version 284206 (0.0017) [2025-01-05 14:12:14,799][19668] Updated weights for policy 0, policy_version 284216 (0.0017) [2025-01-05 14:12:14,965][19571] Fps is (10 sec: 19250.8, 60 sec: 19387.8, 300 sec: 19272.0). Total num frames: 1164148736. Throughput: 0: 4839.6. Samples: 16036516. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:14,966][19571] Avg episode reward: [(0, '10.431')] [2025-01-05 14:12:17,006][19668] Updated weights for policy 0, policy_version 284226 (0.0019) [2025-01-05 14:12:19,071][19668] Updated weights for policy 0, policy_version 284236 (0.0018) [2025-01-05 14:12:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.8, 300 sec: 19272.0). 
Total num frames: 1164247040. Throughput: 0: 4829.7. Samples: 16050736. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:19,965][19571] Avg episode reward: [(0, '9.611')] [2025-01-05 14:12:21,198][19668] Updated weights for policy 0, policy_version 284246 (0.0018) [2025-01-05 14:12:23,280][19668] Updated weights for policy 0, policy_version 284256 (0.0017) [2025-01-05 14:12:24,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19319.4, 300 sec: 19244.3). Total num frames: 1164341248. Throughput: 0: 4826.8. Samples: 16079956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:24,965][19571] Avg episode reward: [(0, '10.112')] [2025-01-05 14:12:25,428][19668] Updated weights for policy 0, policy_version 284266 (0.0020) [2025-01-05 14:12:27,492][19668] Updated weights for policy 0, policy_version 284276 (0.0017) [2025-01-05 14:12:29,557][19668] Updated weights for policy 0, policy_version 284286 (0.0017) [2025-01-05 14:12:29,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.4, 300 sec: 19244.2). Total num frames: 1164439552. Throughput: 0: 4845.9. Samples: 16109416. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:29,965][19571] Avg episode reward: [(0, '10.848')] [2025-01-05 14:12:31,716][19668] Updated weights for policy 0, policy_version 284296 (0.0018) [2025-01-05 14:12:33,772][19668] Updated weights for policy 0, policy_version 284306 (0.0018) [2025-01-05 14:12:34,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19319.5, 300 sec: 19230.4). Total num frames: 1164537856. Throughput: 0: 4838.6. Samples: 16123834. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:34,965][19571] Avg episode reward: [(0, '9.042')] [2025-01-05 14:12:35,955][19668] Updated weights for policy 0, policy_version 284316 (0.0018) [2025-01-05 14:12:38,044][19668] Updated weights for policy 0, policy_version 284326 (0.0017) [2025-01-05 14:12:39,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19319.4, 300 sec: 19216.5). Total num frames: 1164632064. 
Throughput: 0: 4840.8. Samples: 16152772. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:39,966][19571] Avg episode reward: [(0, '9.479')] [2025-01-05 14:12:39,991][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000284335_1164636160.pth... [2025-01-05 14:12:40,038][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000283218_1160060928.pth [2025-01-05 14:12:40,186][19668] Updated weights for policy 0, policy_version 284336 (0.0018) [2025-01-05 14:12:42,330][19668] Updated weights for policy 0, policy_version 284346 (0.0018) [2025-01-05 14:12:44,425][19668] Updated weights for policy 0, policy_version 284356 (0.0018) [2025-01-05 14:12:44,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19216.5). Total num frames: 1164730368. Throughput: 0: 4843.4. Samples: 16181610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:44,965][19571] Avg episode reward: [(0, '9.351')] [2025-01-05 14:12:46,595][19668] Updated weights for policy 0, policy_version 284366 (0.0019) [2025-01-05 14:12:48,715][19668] Updated weights for policy 0, policy_version 284376 (0.0017) [2025-01-05 14:12:49,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19251.1, 300 sec: 19188.7). Total num frames: 1164824576. Throughput: 0: 4841.7. Samples: 16195810. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:12:49,966][19571] Avg episode reward: [(0, '10.268')] [2025-01-05 14:12:50,847][19668] Updated weights for policy 0, policy_version 284386 (0.0018) [2025-01-05 14:12:52,855][19668] Updated weights for policy 0, policy_version 284396 (0.0016) [2025-01-05 14:12:54,874][19668] Updated weights for policy 0, policy_version 284406 (0.0016) [2025-01-05 14:12:54,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19456.0, 300 sec: 19202.6). Total num frames: 1164926976. Throughput: 0: 4853.2. Samples: 16225434. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:12:54,965][19571] Avg episode reward: [(0, '10.705')] [2025-01-05 14:12:56,953][19668] Updated weights for policy 0, policy_version 284416 (0.0016) [2025-01-05 14:12:58,979][19668] Updated weights for policy 0, policy_version 284426 (0.0015) [2025-01-05 14:12:59,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19456.0, 300 sec: 19188.7). Total num frames: 1165025280. Throughput: 0: 4868.4. Samples: 16255596. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:12:59,966][19571] Avg episode reward: [(0, '11.318')] [2025-01-05 14:13:01,084][19668] Updated weights for policy 0, policy_version 284436 (0.0016) [2025-01-05 14:13:03,112][19668] Updated weights for policy 0, policy_version 284446 (0.0017) [2025-01-05 14:13:04,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19174.8). Total num frames: 1165123584. Throughput: 0: 4883.6. Samples: 16270500. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:04,965][19571] Avg episode reward: [(0, '9.488')] [2025-01-05 14:13:05,192][19668] Updated weights for policy 0, policy_version 284456 (0.0016) [2025-01-05 14:13:07,211][19668] Updated weights for policy 0, policy_version 284466 (0.0015) [2025-01-05 14:13:09,270][19668] Updated weights for policy 0, policy_version 284476 (0.0016) [2025-01-05 14:13:09,965][19571] Fps is (10 sec: 20070.9, 60 sec: 19592.5, 300 sec: 19188.7). Total num frames: 1165225984. Throughput: 0: 4898.7. Samples: 16300396. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:09,965][19571] Avg episode reward: [(0, '11.361')] [2025-01-05 14:13:11,380][19668] Updated weights for policy 0, policy_version 284486 (0.0018) [2025-01-05 14:13:13,443][19668] Updated weights for policy 0, policy_version 284496 (0.0017) [2025-01-05 14:13:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19592.6, 300 sec: 19188.7). Total num frames: 1165324288. Throughput: 0: 4897.7. Samples: 16329810. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:14,965][19571] Avg episode reward: [(0, '9.836')] [2025-01-05 14:13:15,541][19668] Updated weights for policy 0, policy_version 284506 (0.0017) [2025-01-05 14:13:17,570][19668] Updated weights for policy 0, policy_version 284516 (0.0016) [2025-01-05 14:13:19,619][19668] Updated weights for policy 0, policy_version 284526 (0.0016) [2025-01-05 14:13:19,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19174.8). Total num frames: 1165422592. Throughput: 0: 4911.5. Samples: 16344852. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:19,965][19571] Avg episode reward: [(0, '11.258')] [2025-01-05 14:13:21,733][19668] Updated weights for policy 0, policy_version 284536 (0.0016) [2025-01-05 14:13:23,742][19668] Updated weights for policy 0, policy_version 284546 (0.0017) [2025-01-05 14:13:24,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19188.7). Total num frames: 1165524992. Throughput: 0: 4929.8. Samples: 16374612. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:24,965][19571] Avg episode reward: [(0, '10.531')] [2025-01-05 14:13:25,813][19668] Updated weights for policy 0, policy_version 284556 (0.0016) [2025-01-05 14:13:27,898][19668] Updated weights for policy 0, policy_version 284566 (0.0016) [2025-01-05 14:13:29,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19174.8). Total num frames: 1165619200. Throughput: 0: 4943.6. Samples: 16404074. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:29,965][19571] Avg episode reward: [(0, '9.924')] [2025-01-05 14:13:30,018][19668] Updated weights for policy 0, policy_version 284576 (0.0017) [2025-01-05 14:13:32,119][19668] Updated weights for policy 0, policy_version 284586 (0.0016) [2025-01-05 14:13:34,151][19668] Updated weights for policy 0, policy_version 284596 (0.0015) [2025-01-05 14:13:34,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19174.8). 
Total num frames: 1165717504. Throughput: 0: 4957.1. Samples: 16418880. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:34,965][19571] Avg episode reward: [(0, '10.294')] [2025-01-05 14:13:36,246][19668] Updated weights for policy 0, policy_version 284606 (0.0015) [2025-01-05 14:13:38,309][19668] Updated weights for policy 0, policy_version 284616 (0.0015) [2025-01-05 14:13:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19174.8). Total num frames: 1165815808. Throughput: 0: 4955.2. Samples: 16448416. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:39,965][19571] Avg episode reward: [(0, '8.944')] [2025-01-05 14:13:40,408][19668] Updated weights for policy 0, policy_version 284626 (0.0016) [2025-01-05 14:13:42,438][19668] Updated weights for policy 0, policy_version 284636 (0.0015) [2025-01-05 14:13:44,480][19668] Updated weights for policy 0, policy_version 284646 (0.0014) [2025-01-05 14:13:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19188.7). Total num frames: 1165918208. Throughput: 0: 4950.5. Samples: 16478368. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:44,965][19571] Avg episode reward: [(0, '10.489')] [2025-01-05 14:13:46,599][19668] Updated weights for policy 0, policy_version 284656 (0.0016) [2025-01-05 14:13:48,591][19668] Updated weights for policy 0, policy_version 284666 (0.0016) [2025-01-05 14:13:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19865.7, 300 sec: 19174.8). Total num frames: 1166016512. Throughput: 0: 4950.0. Samples: 16493250. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:49,965][19571] Avg episode reward: [(0, '10.297')] [2025-01-05 14:13:50,669][19668] Updated weights for policy 0, policy_version 284676 (0.0016) [2025-01-05 14:13:52,710][19668] Updated weights for policy 0, policy_version 284686 (0.0015) [2025-01-05 14:13:54,734][19668] Updated weights for policy 0, policy_version 284696 (0.0016) [2025-01-05 14:13:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.4, 300 sec: 19160.9). Total num frames: 1166114816. Throughput: 0: 4953.2. Samples: 16523292. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:54,965][19571] Avg episode reward: [(0, '9.675')] [2025-01-05 14:13:56,857][19668] Updated weights for policy 0, policy_version 284706 (0.0016) [2025-01-05 14:13:58,869][19668] Updated weights for policy 0, policy_version 284716 (0.0017) [2025-01-05 14:13:59,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19865.7, 300 sec: 19174.9). Total num frames: 1166217216. Throughput: 0: 4961.7. Samples: 16553086. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:13:59,965][19571] Avg episode reward: [(0, '9.551')] [2025-01-05 14:14:00,908][19668] Updated weights for policy 0, policy_version 284726 (0.0015) [2025-01-05 14:14:02,955][19668] Updated weights for policy 0, policy_version 284736 (0.0014) [2025-01-05 14:14:04,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19188.7). Total num frames: 1166315520. Throughput: 0: 4962.9. Samples: 16568182. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:04,965][19571] Avg episode reward: [(0, '9.013')] [2025-01-05 14:14:05,115][19668] Updated weights for policy 0, policy_version 284746 (0.0016) [2025-01-05 14:14:07,130][19668] Updated weights for policy 0, policy_version 284756 (0.0016) [2025-01-05 14:14:09,170][19668] Updated weights for policy 0, policy_version 284766 (0.0015) [2025-01-05 14:14:09,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19202.6). 
Total num frames: 1166413824. Throughput: 0: 4961.0. Samples: 16597858. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:09,965][19571] Avg episode reward: [(0, '9.233')] [2025-01-05 14:14:11,299][19668] Updated weights for policy 0, policy_version 284776 (0.0016) [2025-01-05 14:14:13,319][19668] Updated weights for policy 0, policy_version 284786 (0.0015) [2025-01-05 14:14:14,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19230.4). Total num frames: 1166512128. Throughput: 0: 4957.1. Samples: 16627144. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:14,966][19571] Avg episode reward: [(0, '10.588')] [2025-01-05 14:14:15,474][19668] Updated weights for policy 0, policy_version 284796 (0.0016) [2025-01-05 14:14:17,525][19668] Updated weights for policy 0, policy_version 284806 (0.0016) [2025-01-05 14:14:19,543][19668] Updated weights for policy 0, policy_version 284816 (0.0015) [2025-01-05 14:14:19,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.4, 300 sec: 19258.1). Total num frames: 1166610432. Throughput: 0: 4960.0. Samples: 16642082. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:19,965][19571] Avg episode reward: [(0, '9.874')] [2025-01-05 14:14:21,633][19668] Updated weights for policy 0, policy_version 284826 (0.0016) [2025-01-05 14:14:23,675][19668] Updated weights for policy 0, policy_version 284836 (0.0014) [2025-01-05 14:14:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19299.8). Total num frames: 1166712832. Throughput: 0: 4969.6. Samples: 16672048. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:24,965][19571] Avg episode reward: [(0, '8.839')] [2025-01-05 14:14:25,785][19668] Updated weights for policy 0, policy_version 284846 (0.0016) [2025-01-05 14:14:27,802][19668] Updated weights for policy 0, policy_version 284856 (0.0016) [2025-01-05 14:14:29,851][19668] Updated weights for policy 0, policy_version 284866 (0.0016) [2025-01-05 14:14:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19313.7). Total num frames: 1166811136. Throughput: 0: 4966.0. Samples: 16701840. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:29,965][19571] Avg episode reward: [(0, '10.238')] [2025-01-05 14:14:31,960][19668] Updated weights for policy 0, policy_version 284876 (0.0015) [2025-01-05 14:14:33,954][19668] Updated weights for policy 0, policy_version 284886 (0.0015) [2025-01-05 14:14:34,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19341.5). Total num frames: 1166909440. Throughput: 0: 4964.5. Samples: 16716654. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:34,965][19571] Avg episode reward: [(0, '9.846')] [2025-01-05 14:14:36,015][19668] Updated weights for policy 0, policy_version 284896 (0.0015) [2025-01-05 14:14:38,031][19668] Updated weights for policy 0, policy_version 284906 (0.0015) [2025-01-05 14:14:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19369.2). Total num frames: 1167011840. Throughput: 0: 4970.2. Samples: 16746952. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:39,965][19571] Avg episode reward: [(0, '9.937')] [2025-01-05 14:14:40,048][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000284916_1167015936.pth... 
[2025-01-05 14:14:40,052][19668] Updated weights for policy 0, policy_version 284916 (0.0015) [2025-01-05 14:14:40,099][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000283768_1162313728.pth [2025-01-05 14:14:42,127][19668] Updated weights for policy 0, policy_version 284926 (0.0016) [2025-01-05 14:14:44,147][19668] Updated weights for policy 0, policy_version 284936 (0.0016) [2025-01-05 14:14:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19383.1). Total num frames: 1167110144. Throughput: 0: 4976.9. Samples: 16777046. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:44,965][19571] Avg episode reward: [(0, '9.649')] [2025-01-05 14:14:46,303][19668] Updated weights for policy 0, policy_version 284946 (0.0016) [2025-01-05 14:14:48,351][19668] Updated weights for policy 0, policy_version 284956 (0.0017) [2025-01-05 14:14:49,965][19571] Fps is (10 sec: 19659.7, 60 sec: 19865.5, 300 sec: 19383.1). Total num frames: 1167208448. Throughput: 0: 4962.3. Samples: 16791488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:49,966][19571] Avg episode reward: [(0, '10.416')] [2025-01-05 14:14:50,508][19668] Updated weights for policy 0, policy_version 284966 (0.0017) [2025-01-05 14:14:52,565][19668] Updated weights for policy 0, policy_version 284976 (0.0016) [2025-01-05 14:14:54,640][19668] Updated weights for policy 0, policy_version 284986 (0.0016) [2025-01-05 14:14:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19410.9). Total num frames: 1167306752. Throughput: 0: 4955.3. Samples: 16820848. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:54,965][19571] Avg episode reward: [(0, '9.810')] [2025-01-05 14:14:56,808][19668] Updated weights for policy 0, policy_version 284996 (0.0017) [2025-01-05 14:14:58,843][19668] Updated weights for policy 0, policy_version 285006 (0.0016) [2025-01-05 14:14:59,965][19571] Fps is (10 sec: 19662.0, 60 sec: 19797.4, 300 sec: 19410.9). Total num frames: 1167405056. Throughput: 0: 4951.9. Samples: 16849980. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:14:59,965][19571] Avg episode reward: [(0, '11.429')] [2025-01-05 14:15:00,936][19668] Updated weights for policy 0, policy_version 285016 (0.0016) [2025-01-05 14:15:03,003][19668] Updated weights for policy 0, policy_version 285026 (0.0016) [2025-01-05 14:15:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19452.5). Total num frames: 1167503360. Throughput: 0: 4952.7. Samples: 16864954. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:04,965][19571] Avg episode reward: [(0, '9.824')] [2025-01-05 14:15:05,096][19668] Updated weights for policy 0, policy_version 285036 (0.0018) [2025-01-05 14:15:07,152][19668] Updated weights for policy 0, policy_version 285046 (0.0015) [2025-01-05 14:15:09,203][19668] Updated weights for policy 0, policy_version 285056 (0.0019) [2025-01-05 14:15:09,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19452.5). Total num frames: 1167601664. Throughput: 0: 4948.4. Samples: 16894728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:09,965][19571] Avg episode reward: [(0, '10.369')] [2025-01-05 14:15:11,283][19668] Updated weights for policy 0, policy_version 285066 (0.0016) [2025-01-05 14:15:13,334][19668] Updated weights for policy 0, policy_version 285076 (0.0016) [2025-01-05 14:15:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19480.3). Total num frames: 1167699968. Throughput: 0: 4940.4. Samples: 16924160. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:14,965][19571] Avg episode reward: [(0, '9.665')] [2025-01-05 14:15:15,479][19668] Updated weights for policy 0, policy_version 285086 (0.0016) [2025-01-05 14:15:17,486][19668] Updated weights for policy 0, policy_version 285096 (0.0016) [2025-01-05 14:15:19,517][19668] Updated weights for policy 0, policy_version 285106 (0.0015) [2025-01-05 14:15:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19865.6, 300 sec: 19535.8). Total num frames: 1167802368. Throughput: 0: 4942.5. Samples: 16939068. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:19,965][19571] Avg episode reward: [(0, '9.613')] [2025-01-05 14:15:22,001][19668] Updated weights for policy 0, policy_version 285116 (0.0017) [2025-01-05 14:15:24,274][19668] Updated weights for policy 0, policy_version 285126 (0.0014) [2025-01-05 14:15:24,965][19571] Fps is (10 sec: 18431.9, 60 sec: 19524.2, 300 sec: 19508.1). Total num frames: 1167884288. Throughput: 0: 4872.8. Samples: 16966230. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:24,965][19571] Avg episode reward: [(0, '11.180')] [2025-01-05 14:15:26,581][19668] Updated weights for policy 0, policy_version 285136 (0.0015) [2025-01-05 14:15:28,719][19668] Updated weights for policy 0, policy_version 285146 (0.0016) [2025-01-05 14:15:29,965][19571] Fps is (10 sec: 17612.9, 60 sec: 19456.0, 300 sec: 19522.0). Total num frames: 1167978496. Throughput: 0: 4820.5. Samples: 16993966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:29,965][19571] Avg episode reward: [(0, '10.730')] [2025-01-05 14:15:30,849][19668] Updated weights for policy 0, policy_version 285156 (0.0015) [2025-01-05 14:15:32,915][19668] Updated weights for policy 0, policy_version 285166 (0.0016) [2025-01-05 14:15:34,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19549.7). Total num frames: 1168076800. Throughput: 0: 4825.7. Samples: 17008642. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:34,965][19571] Avg episode reward: [(0, '9.794')] [2025-01-05 14:15:35,056][19668] Updated weights for policy 0, policy_version 285176 (0.0017) [2025-01-05 14:15:37,107][19668] Updated weights for policy 0, policy_version 285186 (0.0018) [2025-01-05 14:15:39,156][19668] Updated weights for policy 0, policy_version 285196 (0.0016) [2025-01-05 14:15:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19563.6). Total num frames: 1168175104. Throughput: 0: 4830.8. Samples: 17038234. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:39,965][19571] Avg episode reward: [(0, '10.915')] [2025-01-05 14:15:41,251][19668] Updated weights for policy 0, policy_version 285206 (0.0019) [2025-01-05 14:15:43,246][19668] Updated weights for policy 0, policy_version 285216 (0.0016) [2025-01-05 14:15:44,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1168277504. Throughput: 0: 4849.7. Samples: 17068218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:44,965][19571] Avg episode reward: [(0, '10.195')] [2025-01-05 14:15:45,287][19668] Updated weights for policy 0, policy_version 285226 (0.0016) [2025-01-05 14:15:47,325][19668] Updated weights for policy 0, policy_version 285236 (0.0016) [2025-01-05 14:15:49,323][19668] Updated weights for policy 0, policy_version 285246 (0.0016) [2025-01-05 14:15:49,965][19571] Fps is (10 sec: 20479.8, 60 sec: 19524.4, 300 sec: 19605.3). Total num frames: 1168379904. Throughput: 0: 4854.3. Samples: 17083398. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:49,965][19571] Avg episode reward: [(0, '9.825')] [2025-01-05 14:15:51,370][19668] Updated weights for policy 0, policy_version 285256 (0.0015) [2025-01-05 14:15:53,380][19668] Updated weights for policy 0, policy_version 285266 (0.0016) [2025-01-05 14:15:54,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19524.3, 300 sec: 19605.3). 
Total num frames: 1168478208. Throughput: 0: 4868.6. Samples: 17113816. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:54,965][19571] Avg episode reward: [(0, '9.852')] [2025-01-05 14:15:55,404][19668] Updated weights for policy 0, policy_version 285276 (0.0016) [2025-01-05 14:15:57,481][19668] Updated weights for policy 0, policy_version 285286 (0.0016) [2025-01-05 14:15:59,474][19668] Updated weights for policy 0, policy_version 285296 (0.0016) [2025-01-05 14:15:59,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19592.5, 300 sec: 19619.1). Total num frames: 1168580608. Throughput: 0: 4884.9. Samples: 17143980. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:15:59,966][19571] Avg episode reward: [(0, '10.470')] [2025-01-05 14:16:01,495][19668] Updated weights for policy 0, policy_version 285306 (0.0016) [2025-01-05 14:16:03,561][19668] Updated weights for policy 0, policy_version 285316 (0.0016) [2025-01-05 14:16:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19592.5, 300 sec: 19619.2). Total num frames: 1168678912. Throughput: 0: 4889.1. Samples: 17159076. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:04,965][19571] Avg episode reward: [(0, '9.405')] [2025-01-05 14:16:05,622][19668] Updated weights for policy 0, policy_version 285326 (0.0016) [2025-01-05 14:16:07,641][19668] Updated weights for policy 0, policy_version 285336 (0.0015) [2025-01-05 14:16:09,718][19668] Updated weights for policy 0, policy_version 285346 (0.0015) [2025-01-05 14:16:09,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1168781312. Throughput: 0: 4951.9. Samples: 17189066. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:09,965][19571] Avg episode reward: [(0, '9.172')] [2025-01-05 14:16:11,765][19668] Updated weights for policy 0, policy_version 285356 (0.0016) [2025-01-05 14:16:13,777][19668] Updated weights for policy 0, policy_version 285366 (0.0016) [2025-01-05 14:16:14,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1168879616. Throughput: 0: 5004.5. Samples: 17219170. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:14,965][19571] Avg episode reward: [(0, '10.436')] [2025-01-05 14:16:15,839][19668] Updated weights for policy 0, policy_version 285376 (0.0016) [2025-01-05 14:16:17,830][19668] Updated weights for policy 0, policy_version 285386 (0.0016) [2025-01-05 14:16:19,858][19668] Updated weights for policy 0, policy_version 285396 (0.0016) [2025-01-05 14:16:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 1168982016. Throughput: 0: 5015.6. Samples: 17234344. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:19,965][19571] Avg episode reward: [(0, '10.441')] [2025-01-05 14:16:21,985][19668] Updated weights for policy 0, policy_version 285406 (0.0016) [2025-01-05 14:16:24,000][19668] Updated weights for policy 0, policy_version 285416 (0.0016) [2025-01-05 14:16:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 19660.8). Total num frames: 1169080320. Throughput: 0: 5021.9. Samples: 17264218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:24,965][19571] Avg episode reward: [(0, '10.551')] [2025-01-05 14:16:26,049][19668] Updated weights for policy 0, policy_version 285426 (0.0015) [2025-01-05 14:16:28,105][19668] Updated weights for policy 0, policy_version 285436 (0.0018) [2025-01-05 14:16:29,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20002.1, 300 sec: 19660.8). Total num frames: 1169178624. Throughput: 0: 5019.0. Samples: 17294072. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:29,965][19571] Avg episode reward: [(0, '10.791')] [2025-01-05 14:16:30,180][19668] Updated weights for policy 0, policy_version 285446 (0.0017) [2025-01-05 14:16:32,235][19668] Updated weights for policy 0, policy_version 285456 (0.0016) [2025-01-05 14:16:34,263][19668] Updated weights for policy 0, policy_version 285466 (0.0016) [2025-01-05 14:16:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 19688.6). Total num frames: 1169281024. Throughput: 0: 5015.3. Samples: 17309088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:34,965][19571] Avg episode reward: [(0, '11.500')] [2025-01-05 14:16:36,340][19668] Updated weights for policy 0, policy_version 285476 (0.0017) [2025-01-05 14:16:38,401][19668] Updated weights for policy 0, policy_version 285486 (0.0015) [2025-01-05 14:16:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19688.6). Total num frames: 1169379328. Throughput: 0: 5002.7. Samples: 17338936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:39,965][19571] Avg episode reward: [(0, '10.902')] [2025-01-05 14:16:40,062][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000285494_1169383424.pth... [2025-01-05 14:16:40,112][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000284335_1164636160.pth [2025-01-05 14:16:40,516][19668] Updated weights for policy 0, policy_version 285496 (0.0017) [2025-01-05 14:16:42,524][19668] Updated weights for policy 0, policy_version 285506 (0.0016) [2025-01-05 14:16:44,576][19668] Updated weights for policy 0, policy_version 285516 (0.0017) [2025-01-05 14:16:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20002.2, 300 sec: 19688.6). Total num frames: 1169477632. Throughput: 0: 4998.4. Samples: 17368908. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:44,965][19571] Avg episode reward: [(0, '9.810')] [2025-01-05 14:16:46,687][19668] Updated weights for policy 0, policy_version 285526 (0.0017) [2025-01-05 14:16:48,698][19668] Updated weights for policy 0, policy_version 285536 (0.0015) [2025-01-05 14:16:49,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.2, 300 sec: 19730.2). Total num frames: 1169580032. Throughput: 0: 4990.4. Samples: 17383644. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:49,965][19571] Avg episode reward: [(0, '10.201')] [2025-01-05 14:16:50,728][19668] Updated weights for policy 0, policy_version 285546 (0.0016) [2025-01-05 14:16:52,785][19668] Updated weights for policy 0, policy_version 285556 (0.0016) [2025-01-05 14:16:54,783][19668] Updated weights for policy 0, policy_version 285566 (0.0015) [2025-01-05 14:16:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 19730.2). Total num frames: 1169678336. Throughput: 0: 4997.7. Samples: 17413962. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:54,965][19571] Avg episode reward: [(0, '9.596')] [2025-01-05 14:16:56,811][19668] Updated weights for policy 0, policy_version 285576 (0.0016) [2025-01-05 14:16:58,844][19668] Updated weights for policy 0, policy_version 285586 (0.0016) [2025-01-05 14:16:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19744.1). Total num frames: 1169780736. Throughput: 0: 4997.9. Samples: 17444076. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:16:59,965][19571] Avg episode reward: [(0, '10.209')] [2025-01-05 14:17:00,922][19668] Updated weights for policy 0, policy_version 285596 (0.0016) [2025-01-05 14:17:02,984][19668] Updated weights for policy 0, policy_version 285606 (0.0016) [2025-01-05 14:17:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19758.0). Total num frames: 1169879040. Throughput: 0: 4992.4. Samples: 17459002. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:04,965][19571] Avg episode reward: [(0, '11.000')] [2025-01-05 14:17:05,083][19668] Updated weights for policy 0, policy_version 285616 (0.0016) [2025-01-05 14:17:07,085][19668] Updated weights for policy 0, policy_version 285626 (0.0015) [2025-01-05 14:17:09,165][19668] Updated weights for policy 0, policy_version 285636 (0.0016) [2025-01-05 14:17:09,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19933.9, 300 sec: 19758.0). Total num frames: 1169977344. Throughput: 0: 4991.3. Samples: 17488826. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:09,965][19571] Avg episode reward: [(0, '10.285')] [2025-01-05 14:17:11,296][19668] Updated weights for policy 0, policy_version 285646 (0.0017) [2025-01-05 14:17:13,325][19668] Updated weights for policy 0, policy_version 285656 (0.0017) [2025-01-05 14:17:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19758.0). Total num frames: 1170075648. Throughput: 0: 4979.3. Samples: 17518142. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:14,965][19571] Avg episode reward: [(0, '9.378')] [2025-01-05 14:17:15,455][19668] Updated weights for policy 0, policy_version 285666 (0.0016) [2025-01-05 14:17:17,504][19668] Updated weights for policy 0, policy_version 285676 (0.0017) [2025-01-05 14:17:19,532][19668] Updated weights for policy 0, policy_version 285686 (0.0017) [2025-01-05 14:17:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19771.9). Total num frames: 1170173952. Throughput: 0: 4976.8. Samples: 17533044. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:19,965][19571] Avg episode reward: [(0, '10.339')] [2025-01-05 14:17:21,618][19668] Updated weights for policy 0, policy_version 285696 (0.0017) [2025-01-05 14:17:23,692][19668] Updated weights for policy 0, policy_version 285706 (0.0016) [2025-01-05 14:17:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.8, 300 sec: 19785.8). 
Total num frames: 1170276352. Throughput: 0: 4977.1. Samples: 17562904. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:24,965][19571] Avg episode reward: [(0, '9.636')] [2025-01-05 14:17:25,753][19668] Updated weights for policy 0, policy_version 285716 (0.0017) [2025-01-05 14:17:27,783][19668] Updated weights for policy 0, policy_version 285726 (0.0016) [2025-01-05 14:17:29,889][19668] Updated weights for policy 0, policy_version 285736 (0.0018) [2025-01-05 14:17:29,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.8, 300 sec: 19785.8). Total num frames: 1170374656. Throughput: 0: 4970.2. Samples: 17592568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:29,965][19571] Avg episode reward: [(0, '10.382')] [2025-01-05 14:17:31,956][19668] Updated weights for policy 0, policy_version 285746 (0.0017) [2025-01-05 14:17:33,979][19668] Updated weights for policy 0, policy_version 285756 (0.0017) [2025-01-05 14:17:34,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19799.7). Total num frames: 1170472960. Throughput: 0: 4971.0. Samples: 17607338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:34,965][19571] Avg episode reward: [(0, '10.307')] [2025-01-05 14:17:36,138][19668] Updated weights for policy 0, policy_version 285766 (0.0016) [2025-01-05 14:17:38,170][19668] Updated weights for policy 0, policy_version 285776 (0.0016) [2025-01-05 14:17:39,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1170571264. Throughput: 0: 4956.3. Samples: 17636998. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:39,965][19571] Avg episode reward: [(0, '11.269')] [2025-01-05 14:17:40,285][19668] Updated weights for policy 0, policy_version 285786 (0.0017) [2025-01-05 14:17:42,333][19668] Updated weights for policy 0, policy_version 285796 (0.0016) [2025-01-05 14:17:44,368][19668] Updated weights for policy 0, policy_version 285806 (0.0016) [2025-01-05 14:17:44,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19813.6). Total num frames: 1170669568. Throughput: 0: 4949.3. Samples: 17666794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:44,965][19571] Avg episode reward: [(0, '9.874')] [2025-01-05 14:17:46,450][19668] Updated weights for policy 0, policy_version 285816 (0.0019) [2025-01-05 14:17:48,502][19668] Updated weights for policy 0, policy_version 285826 (0.0016) [2025-01-05 14:17:49,965][19571] Fps is (10 sec: 19660.4, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1170767872. Throughput: 0: 4945.5. Samples: 17681550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:49,966][19571] Avg episode reward: [(0, '10.749')] [2025-01-05 14:17:50,592][19668] Updated weights for policy 0, policy_version 285836 (0.0016) [2025-01-05 14:17:52,601][19668] Updated weights for policy 0, policy_version 285846 (0.0015) [2025-01-05 14:17:54,656][19668] Updated weights for policy 0, policy_version 285856 (0.0015) [2025-01-05 14:17:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1170870272. Throughput: 0: 4948.9. Samples: 17711528. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:54,965][19571] Avg episode reward: [(0, '10.013')] [2025-01-05 14:17:56,772][19668] Updated weights for policy 0, policy_version 285866 (0.0017) [2025-01-05 14:17:58,783][19668] Updated weights for policy 0, policy_version 285876 (0.0015) [2025-01-05 14:17:59,965][19571] Fps is (10 sec: 20070.7, 60 sec: 19797.3, 300 sec: 19813.5). 
Total num frames: 1170968576. Throughput: 0: 4960.9. Samples: 17741384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:17:59,965][19571] Avg episode reward: [(0, '9.871')] [2025-01-05 14:18:00,811][19668] Updated weights for policy 0, policy_version 285886 (0.0016) [2025-01-05 14:18:02,865][19668] Updated weights for policy 0, policy_version 285896 (0.0016) [2025-01-05 14:18:04,961][19668] Updated weights for policy 0, policy_version 285906 (0.0018) [2025-01-05 14:18:04,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1171070976. Throughput: 0: 4965.0. Samples: 17756470. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:04,965][19571] Avg episode reward: [(0, '10.083')] [2025-01-05 14:18:07,021][19668] Updated weights for policy 0, policy_version 285916 (0.0016) [2025-01-05 14:18:09,050][19668] Updated weights for policy 0, policy_version 285926 (0.0015) [2025-01-05 14:18:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1171169280. Throughput: 0: 4963.6. Samples: 17786264. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:09,965][19571] Avg episode reward: [(0, '10.441')] [2025-01-05 14:18:11,170][19668] Updated weights for policy 0, policy_version 285936 (0.0016) [2025-01-05 14:18:13,176][19668] Updated weights for policy 0, policy_version 285946 (0.0015) [2025-01-05 14:18:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1171267584. Throughput: 0: 4970.6. Samples: 17816246. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:14,965][19571] Avg episode reward: [(0, '10.318')] [2025-01-05 14:18:15,209][19668] Updated weights for policy 0, policy_version 285956 (0.0015) [2025-01-05 14:18:17,231][19668] Updated weights for policy 0, policy_version 285966 (0.0017) [2025-01-05 14:18:19,257][19668] Updated weights for policy 0, policy_version 285976 (0.0015) [2025-01-05 14:18:19,966][19571] Fps is (10 sec: 20068.0, 60 sec: 19933.5, 300 sec: 19813.5). Total num frames: 1171369984. Throughput: 0: 4979.8. Samples: 17831436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:19,967][19571] Avg episode reward: [(0, '10.570')] [2025-01-05 14:18:21,327][19668] Updated weights for policy 0, policy_version 285986 (0.0019) [2025-01-05 14:18:23,343][19668] Updated weights for policy 0, policy_version 285996 (0.0015) [2025-01-05 14:18:24,965][19571] Fps is (10 sec: 20480.0, 60 sec: 19933.9, 300 sec: 19841.3). Total num frames: 1171472384. Throughput: 0: 4989.9. Samples: 17861542. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:24,965][19571] Avg episode reward: [(0, '10.300')] [2025-01-05 14:18:25,382][19668] Updated weights for policy 0, policy_version 286006 (0.0015) [2025-01-05 14:18:27,423][19668] Updated weights for policy 0, policy_version 286016 (0.0015) [2025-01-05 14:18:29,447][19668] Updated weights for policy 0, policy_version 286026 (0.0016) [2025-01-05 14:18:29,965][19571] Fps is (10 sec: 20072.7, 60 sec: 19933.9, 300 sec: 19841.3). Total num frames: 1171570688. Throughput: 0: 4996.5. Samples: 17891636. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:29,965][19571] Avg episode reward: [(0, '9.950')] [2025-01-05 14:18:31,486][19668] Updated weights for policy 0, policy_version 286036 (0.0015) [2025-01-05 14:18:33,490][19668] Updated weights for policy 0, policy_version 286046 (0.0015) [2025-01-05 14:18:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19855.2). 
Total num frames: 1171673088. Throughput: 0: 5006.1. Samples: 17906822. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:34,965][19571] Avg episode reward: [(0, '10.306')] [2025-01-05 14:18:35,530][19668] Updated weights for policy 0, policy_version 286056 (0.0016) [2025-01-05 14:18:37,558][19668] Updated weights for policy 0, policy_version 286066 (0.0015) [2025-01-05 14:18:39,562][19668] Updated weights for policy 0, policy_version 286076 (0.0014) [2025-01-05 14:18:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 19841.3). Total num frames: 1171771392. Throughput: 0: 5016.6. Samples: 17937274. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:39,966][19571] Avg episode reward: [(0, '10.098')] [2025-01-05 14:18:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000286078_1171775488.pth... [2025-01-05 14:18:40,024][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000284916_1167015936.pth [2025-01-05 14:18:41,670][19668] Updated weights for policy 0, policy_version 286086 (0.0015) [2025-01-05 14:18:43,708][19668] Updated weights for policy 0, policy_version 286096 (0.0015) [2025-01-05 14:18:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19855.2). Total num frames: 1171873792. Throughput: 0: 5011.1. Samples: 17966882. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:44,965][19571] Avg episode reward: [(0, '9.586')] [2025-01-05 14:18:45,820][19668] Updated weights for policy 0, policy_version 286106 (0.0016) [2025-01-05 14:18:47,825][19668] Updated weights for policy 0, policy_version 286116 (0.0015) [2025-01-05 14:18:49,832][19668] Updated weights for policy 0, policy_version 286126 (0.0015) [2025-01-05 14:18:49,965][19571] Fps is (10 sec: 20070.8, 60 sec: 20070.5, 300 sec: 19855.2). Total num frames: 1171972096. Throughput: 0: 5010.3. 
Samples: 17981936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:18:49,965][19571] Avg episode reward: [(0, '11.021')] [2025-01-05 14:18:51,891][19668] Updated weights for policy 0, policy_version 286136 (0.0015) [2025-01-05 14:18:53,923][19668] Updated weights for policy 0, policy_version 286146 (0.0015) [2025-01-05 14:18:54,965][19571] Fps is (10 sec: 19661.0, 60 sec: 20002.2, 300 sec: 19841.3). Total num frames: 1172070400. Throughput: 0: 5020.9. Samples: 18012204. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:18:54,965][19571] Avg episode reward: [(0, '10.029')] [2025-01-05 14:18:56,008][19668] Updated weights for policy 0, policy_version 286156 (0.0017) [2025-01-05 14:18:58,073][19668] Updated weights for policy 0, policy_version 286166 (0.0015) [2025-01-05 14:18:59,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19855.2). Total num frames: 1172172800. Throughput: 0: 5011.8. Samples: 18041776. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:18:59,965][19571] Avg episode reward: [(0, '10.667')] [2025-01-05 14:19:00,142][19668] Updated weights for policy 0, policy_version 286176 (0.0016) [2025-01-05 14:19:02,171][19668] Updated weights for policy 0, policy_version 286186 (0.0015) [2025-01-05 14:19:04,227][19668] Updated weights for policy 0, policy_version 286196 (0.0015) [2025-01-05 14:19:04,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 19855.2). Total num frames: 1172271104. Throughput: 0: 5008.9. Samples: 18056832. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:04,965][19571] Avg episode reward: [(0, '9.140')] [2025-01-05 14:19:06,310][19668] Updated weights for policy 0, policy_version 286206 (0.0017) [2025-01-05 14:19:08,356][19668] Updated weights for policy 0, policy_version 286216 (0.0015) [2025-01-05 14:19:09,965][19571] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 19855.2). Total num frames: 1172369408. Throughput: 0: 5003.7. Samples: 18086708. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:09,965][19571] Avg episode reward: [(0, '10.284')] [2025-01-05 14:19:10,451][19668] Updated weights for policy 0, policy_version 286226 (0.0016) [2025-01-05 14:19:12,447][19668] Updated weights for policy 0, policy_version 286236 (0.0015) [2025-01-05 14:19:14,486][19668] Updated weights for policy 0, policy_version 286246 (0.0015) [2025-01-05 14:19:14,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 19869.1). Total num frames: 1172471808. Throughput: 0: 5001.8. Samples: 18116716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:14,965][19571] Avg episode reward: [(0, '9.373')] [2025-01-05 14:19:16,598][19668] Updated weights for policy 0, policy_version 286256 (0.0015) [2025-01-05 14:19:18,595][19668] Updated weights for policy 0, policy_version 286266 (0.0016) [2025-01-05 14:19:19,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20002.6, 300 sec: 19855.2). Total num frames: 1172570112. Throughput: 0: 4993.7. Samples: 18131538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:19,965][19571] Avg episode reward: [(0, '10.294')] [2025-01-05 14:19:20,651][19668] Updated weights for policy 0, policy_version 286276 (0.0015) [2025-01-05 14:19:22,681][19668] Updated weights for policy 0, policy_version 286286 (0.0015) [2025-01-05 14:19:24,667][19668] Updated weights for policy 0, policy_version 286296 (0.0015) [2025-01-05 14:19:24,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 19869.1). Total num frames: 1172672512. Throughput: 0: 4990.8. Samples: 18161860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:24,965][19571] Avg episode reward: [(0, '9.067')] [2025-01-05 14:19:26,715][19668] Updated weights for policy 0, policy_version 286306 (0.0015) [2025-01-05 14:19:28,741][19668] Updated weights for policy 0, policy_version 286316 (0.0015) [2025-01-05 14:19:29,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.2, 300 sec: 19869.1). 
Total num frames: 1172770816. Throughput: 0: 5004.2. Samples: 18192072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:29,965][19571] Avg episode reward: [(0, '9.463')] [2025-01-05 14:19:30,834][19668] Updated weights for policy 0, policy_version 286326 (0.0017) [2025-01-05 14:19:32,889][19668] Updated weights for policy 0, policy_version 286336 (0.0014) [2025-01-05 14:19:34,948][19668] Updated weights for policy 0, policy_version 286346 (0.0015) [2025-01-05 14:19:34,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19869.1). Total num frames: 1172873216. Throughput: 0: 5000.1. Samples: 18206940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:34,965][19571] Avg episode reward: [(0, '9.363')] [2025-01-05 14:19:37,018][19668] Updated weights for policy 0, policy_version 286356 (0.0017) [2025-01-05 14:19:39,051][19668] Updated weights for policy 0, policy_version 286366 (0.0016) [2025-01-05 14:19:39,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20002.2, 300 sec: 19869.1). Total num frames: 1172971520. Throughput: 0: 4992.7. Samples: 18236874. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:39,965][19571] Avg episode reward: [(0, '10.615')] [2025-01-05 14:19:41,140][19668] Updated weights for policy 0, policy_version 286376 (0.0018) [2025-01-05 14:19:43,163][19668] Updated weights for policy 0, policy_version 286386 (0.0015) [2025-01-05 14:19:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19869.1). Total num frames: 1173069824. Throughput: 0: 4991.7. Samples: 18266404. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:44,965][19571] Avg episode reward: [(0, '9.869')] [2025-01-05 14:19:45,288][19668] Updated weights for policy 0, policy_version 286396 (0.0016) [2025-01-05 14:19:47,285][19668] Updated weights for policy 0, policy_version 286406 (0.0015) [2025-01-05 14:19:49,317][19668] Updated weights for policy 0, policy_version 286416 (0.0015) [2025-01-05 14:19:49,965][19571] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19882.9). Total num frames: 1173172224. Throughput: 0: 4994.9. Samples: 18281602. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:49,965][19571] Avg episode reward: [(0, '9.336')] [2025-01-05 14:19:51,428][19668] Updated weights for policy 0, policy_version 286426 (0.0016) [2025-01-05 14:19:53,415][19668] Updated weights for policy 0, policy_version 286436 (0.0015) [2025-01-05 14:19:54,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19882.9). Total num frames: 1173270528. Throughput: 0: 4997.2. Samples: 18311582. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:54,965][19571] Avg episode reward: [(0, '9.641')] [2025-01-05 14:19:55,464][19668] Updated weights for policy 0, policy_version 286446 (0.0015) [2025-01-05 14:19:57,513][19668] Updated weights for policy 0, policy_version 286456 (0.0018) [2025-01-05 14:19:59,499][19668] Updated weights for policy 0, policy_version 286466 (0.0015) [2025-01-05 14:19:59,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 19896.8). Total num frames: 1173372928. Throughput: 0: 5003.9. Samples: 18341892. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:19:59,965][19571] Avg episode reward: [(0, '9.178')] [2025-01-05 14:20:01,547][19668] Updated weights for policy 0, policy_version 286476 (0.0015) [2025-01-05 14:20:03,598][19668] Updated weights for policy 0, policy_version 286486 (0.0015) [2025-01-05 14:20:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 19896.8). 
Total num frames: 1173471232. Throughput: 0: 5012.2. Samples: 18357088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:20:04,965][19571] Avg episode reward: [(0, '11.415')] [2025-01-05 14:20:05,651][19668] Updated weights for policy 0, policy_version 286496 (0.0017) [2025-01-05 14:20:07,720][19668] Updated weights for policy 0, policy_version 286506 (0.0017) [2025-01-05 14:20:09,774][19668] Updated weights for policy 0, policy_version 286516 (0.0014) [2025-01-05 14:20:09,965][19571] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 19910.7). Total num frames: 1173573632. Throughput: 0: 5000.2. Samples: 18386868. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:09,965][19571] Avg episode reward: [(0, '9.446')] [2025-01-05 14:20:11,826][19668] Updated weights for policy 0, policy_version 286526 (0.0018) [2025-01-05 14:20:13,917][19668] Updated weights for policy 0, policy_version 286536 (0.0016) [2025-01-05 14:20:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 20002.2, 300 sec: 19896.8). Total num frames: 1173671936. Throughput: 0: 4987.6. Samples: 18416512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:14,965][19571] Avg episode reward: [(0, '10.630')] [2025-01-05 14:20:16,051][19668] Updated weights for policy 0, policy_version 286546 (0.0018) [2025-01-05 14:20:18,062][19668] Updated weights for policy 0, policy_version 286556 (0.0016) [2025-01-05 14:20:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.1, 300 sec: 19952.4). Total num frames: 1173770240. Throughput: 0: 4984.4. Samples: 18431236. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:19,965][19571] Avg episode reward: [(0, '9.882')] [2025-01-05 14:20:20,197][19668] Updated weights for policy 0, policy_version 286566 (0.0017) [2025-01-05 14:20:22,288][19668] Updated weights for policy 0, policy_version 286576 (0.0016) [2025-01-05 14:20:24,486][19668] Updated weights for policy 0, policy_version 286586 (0.0017) [2025-01-05 14:20:24,965][19571] Fps is (10 sec: 18841.3, 60 sec: 19797.3, 300 sec: 19938.5). Total num frames: 1173860352. Throughput: 0: 4968.8. Samples: 18460472. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:24,965][19571] Avg episode reward: [(0, '10.972')] [2025-01-05 14:20:26,977][19668] Updated weights for policy 0, policy_version 286596 (0.0018) [2025-01-05 14:20:29,552][19668] Updated weights for policy 0, policy_version 286606 (0.0021) [2025-01-05 14:20:29,965][19571] Fps is (10 sec: 17202.9, 60 sec: 19524.2, 300 sec: 19882.9). Total num frames: 1173942272. Throughput: 0: 4857.6. Samples: 18484998. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:29,966][19571] Avg episode reward: [(0, '10.869')] [2025-01-05 14:20:31,821][19668] Updated weights for policy 0, policy_version 286616 (0.0020) [2025-01-05 14:20:34,412][19668] Updated weights for policy 0, policy_version 286626 (0.0018) [2025-01-05 14:20:34,966][19571] Fps is (10 sec: 16792.5, 60 sec: 19251.0, 300 sec: 19841.3). Total num frames: 1174028288. Throughput: 0: 4820.7. Samples: 18498536. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:34,966][19571] Avg episode reward: [(0, '10.641')] [2025-01-05 14:20:37,294][19668] Updated weights for policy 0, policy_version 286636 (0.0023) [2025-01-05 14:20:39,965][19571] Fps is (10 sec: 15155.3, 60 sec: 18705.0, 300 sec: 19716.3). Total num frames: 1174093824. Throughput: 0: 4620.8. Samples: 18519520. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:39,966][19571] Avg episode reward: [(0, '9.191')] [2025-01-05 14:20:40,010][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000286645_1174097920.pth... [2025-01-05 14:20:40,063][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000285494_1169383424.pth [2025-01-05 14:20:40,255][19668] Updated weights for policy 0, policy_version 286646 (0.0022) [2025-01-05 14:20:42,527][19668] Updated weights for policy 0, policy_version 286656 (0.0019) [2025-01-05 14:20:44,688][19668] Updated weights for policy 0, policy_version 286666 (0.0019) [2025-01-05 14:20:44,965][19571] Fps is (10 sec: 15975.6, 60 sec: 18636.8, 300 sec: 19688.6). Total num frames: 1174188032. Throughput: 0: 4529.5. Samples: 18545720. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:44,965][19571] Avg episode reward: [(0, '10.066')] [2025-01-05 14:20:46,880][19668] Updated weights for policy 0, policy_version 286676 (0.0018) [2025-01-05 14:20:48,990][19668] Updated weights for policy 0, policy_version 286686 (0.0018) [2025-01-05 14:20:49,965][19571] Fps is (10 sec: 18841.8, 60 sec: 18500.3, 300 sec: 19674.7). Total num frames: 1174282240. Throughput: 0: 4510.0. Samples: 18560038. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:49,965][19571] Avg episode reward: [(0, '9.742')] [2025-01-05 14:20:51,209][19668] Updated weights for policy 0, policy_version 286696 (0.0019) [2025-01-05 14:20:53,286][19668] Updated weights for policy 0, policy_version 286706 (0.0018) [2025-01-05 14:20:54,965][19571] Fps is (10 sec: 18841.5, 60 sec: 18432.0, 300 sec: 19646.9). Total num frames: 1174376448. Throughput: 0: 4481.3. Samples: 18588528. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:54,966][19571] Avg episode reward: [(0, '9.917')] [2025-01-05 14:20:55,495][19668] Updated weights for policy 0, policy_version 286716 (0.0019) [2025-01-05 14:20:57,533][19668] Updated weights for policy 0, policy_version 286726 (0.0017) [2025-01-05 14:20:59,676][19668] Updated weights for policy 0, policy_version 286736 (0.0018) [2025-01-05 14:20:59,965][19571] Fps is (10 sec: 19251.1, 60 sec: 18363.7, 300 sec: 19646.9). Total num frames: 1174474752. Throughput: 0: 4465.2. Samples: 18617446. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:20:59,965][19571] Avg episode reward: [(0, '9.893')] [2025-01-05 14:21:01,914][19668] Updated weights for policy 0, policy_version 286746 (0.0018) [2025-01-05 14:21:03,963][19668] Updated weights for policy 0, policy_version 286756 (0.0017) [2025-01-05 14:21:04,965][19571] Fps is (10 sec: 19251.3, 60 sec: 18295.5, 300 sec: 19619.1). Total num frames: 1174568960. Throughput: 0: 4453.6. Samples: 18631650. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:21:04,965][19571] Avg episode reward: [(0, '10.086')] [2025-01-05 14:21:06,171][19668] Updated weights for policy 0, policy_version 286766 (0.0019) [2025-01-05 14:21:08,287][19668] Updated weights for policy 0, policy_version 286776 (0.0018) [2025-01-05 14:21:09,965][19571] Fps is (10 sec: 18841.7, 60 sec: 18159.0, 300 sec: 19605.3). Total num frames: 1174663168. Throughput: 0: 4440.8. Samples: 18660306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:21:09,965][19571] Avg episode reward: [(0, '12.476')] [2025-01-05 14:21:10,035][19636] Saving new best policy, reward=12.476! 
[2025-01-05 14:21:10,491][19668] Updated weights for policy 0, policy_version 286786 (0.0019) [2025-01-05 14:21:12,579][19668] Updated weights for policy 0, policy_version 286796 (0.0017) [2025-01-05 14:21:14,663][19668] Updated weights for policy 0, policy_version 286806 (0.0018) [2025-01-05 14:21:14,965][19571] Fps is (10 sec: 19251.1, 60 sec: 18158.9, 300 sec: 19591.4). Total num frames: 1174761472. Throughput: 0: 4540.0. Samples: 18689296. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:21:14,965][19571] Avg episode reward: [(0, '9.763')] [2025-01-05 14:21:16,838][19668] Updated weights for policy 0, policy_version 286816 (0.0019) [2025-01-05 14:21:18,877][19668] Updated weights for policy 0, policy_version 286826 (0.0017) [2025-01-05 14:21:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 18090.7, 300 sec: 19577.5). Total num frames: 1174855680. Throughput: 0: 4557.4. Samples: 18703616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:19,965][19571] Avg episode reward: [(0, '9.173')] [2025-01-05 14:21:21,070][19668] Updated weights for policy 0, policy_version 286836 (0.0017) [2025-01-05 14:21:23,114][19668] Updated weights for policy 0, policy_version 286846 (0.0017) [2025-01-05 14:21:24,965][19571] Fps is (10 sec: 19251.4, 60 sec: 18227.2, 300 sec: 19577.5). Total num frames: 1174953984. Throughput: 0: 4740.9. Samples: 18732860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:24,965][19571] Avg episode reward: [(0, '10.114')] [2025-01-05 14:21:25,315][19668] Updated weights for policy 0, policy_version 286856 (0.0017) [2025-01-05 14:21:27,416][19668] Updated weights for policy 0, policy_version 286866 (0.0018) [2025-01-05 14:21:29,449][19668] Updated weights for policy 0, policy_version 286876 (0.0018) [2025-01-05 14:21:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 18500.3, 300 sec: 19563.6). Total num frames: 1175052288. Throughput: 0: 4803.3. Samples: 18761870. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:29,965][19571] Avg episode reward: [(0, '10.312')] [2025-01-05 14:21:31,632][19668] Updated weights for policy 0, policy_version 286886 (0.0018) [2025-01-05 14:21:33,707][19668] Updated weights for policy 0, policy_version 286896 (0.0017) [2025-01-05 14:21:34,965][19571] Fps is (10 sec: 19251.2, 60 sec: 18637.0, 300 sec: 19549.7). Total num frames: 1175146496. Throughput: 0: 4806.0. Samples: 18776306. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:34,965][19571] Avg episode reward: [(0, '9.764')] [2025-01-05 14:21:35,879][19668] Updated weights for policy 0, policy_version 286906 (0.0019) [2025-01-05 14:21:37,927][19668] Updated weights for policy 0, policy_version 286916 (0.0019) [2025-01-05 14:21:39,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19183.0, 300 sec: 19549.7). Total num frames: 1175244800. Throughput: 0: 4818.4. Samples: 18805354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:39,965][19571] Avg episode reward: [(0, '9.728')] [2025-01-05 14:21:40,137][19668] Updated weights for policy 0, policy_version 286926 (0.0020) [2025-01-05 14:21:42,271][19668] Updated weights for policy 0, policy_version 286936 (0.0018) [2025-01-05 14:21:44,310][19668] Updated weights for policy 0, policy_version 286946 (0.0017) [2025-01-05 14:21:44,965][19571] Fps is (10 sec: 19250.8, 60 sec: 19182.9, 300 sec: 19521.9). Total num frames: 1175339008. Throughput: 0: 4818.4. Samples: 18834276. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:44,966][19571] Avg episode reward: [(0, '11.020')] [2025-01-05 14:21:46,526][19668] Updated weights for policy 0, policy_version 286956 (0.0018) [2025-01-05 14:21:48,563][19668] Updated weights for policy 0, policy_version 286966 (0.0016) [2025-01-05 14:21:49,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19521.9). Total num frames: 1175437312. Throughput: 0: 4824.8. Samples: 18848764. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:49,965][19571] Avg episode reward: [(0, '10.814')] [2025-01-05 14:21:50,731][19668] Updated weights for policy 0, policy_version 286976 (0.0018) [2025-01-05 14:21:52,828][19668] Updated weights for policy 0, policy_version 286986 (0.0016) [2025-01-05 14:21:54,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1175531520. Throughput: 0: 4826.7. Samples: 18877510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:54,966][19571] Avg episode reward: [(0, '10.749')] [2025-01-05 14:21:54,973][19668] Updated weights for policy 0, policy_version 286996 (0.0019) [2025-01-05 14:21:57,210][19668] Updated weights for policy 0, policy_version 287006 (0.0019) [2025-01-05 14:21:59,303][19668] Updated weights for policy 0, policy_version 287016 (0.0017) [2025-01-05 14:21:59,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1175629824. Throughput: 0: 4817.3. Samples: 18906076. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:21:59,965][19571] Avg episode reward: [(0, '9.012')] [2025-01-05 14:22:01,454][19668] Updated weights for policy 0, policy_version 287026 (0.0021) [2025-01-05 14:22:03,569][19668] Updated weights for policy 0, policy_version 287036 (0.0017) [2025-01-05 14:22:04,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19480.3). Total num frames: 1175724032. Throughput: 0: 4816.6. Samples: 18920364. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:04,965][19571] Avg episode reward: [(0, '9.529')] [2025-01-05 14:22:05,776][19668] Updated weights for policy 0, policy_version 287046 (0.0018) [2025-01-05 14:22:07,820][19668] Updated weights for policy 0, policy_version 287056 (0.0017) [2025-01-05 14:22:09,965][19571] Fps is (10 sec: 18840.9, 60 sec: 19251.1, 300 sec: 19466.4). Total num frames: 1175818240. Throughput: 0: 4801.2. Samples: 18948914. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:09,966][19571] Avg episode reward: [(0, '9.800')] [2025-01-05 14:22:10,105][19668] Updated weights for policy 0, policy_version 287066 (0.0021) [2025-01-05 14:22:12,365][19668] Updated weights for policy 0, policy_version 287076 (0.0019) [2025-01-05 14:22:14,382][19668] Updated weights for policy 0, policy_version 287086 (0.0017) [2025-01-05 14:22:14,965][19571] Fps is (10 sec: 18841.6, 60 sec: 19182.9, 300 sec: 19452.5). Total num frames: 1175912448. Throughput: 0: 4787.7. Samples: 18977318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:14,965][19571] Avg episode reward: [(0, '9.981')] [2025-01-05 14:22:16,480][19668] Updated weights for policy 0, policy_version 287096 (0.0017) [2025-01-05 14:22:18,593][19668] Updated weights for policy 0, policy_version 287106 (0.0018) [2025-01-05 14:22:19,965][19571] Fps is (10 sec: 19251.7, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1176010752. Throughput: 0: 4794.0. Samples: 18992036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:19,965][19571] Avg episode reward: [(0, '9.584')] [2025-01-05 14:22:20,712][19668] Updated weights for policy 0, policy_version 287116 (0.0019) [2025-01-05 14:22:22,791][19668] Updated weights for policy 0, policy_version 287126 (0.0017) [2025-01-05 14:22:24,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19182.9, 300 sec: 19424.8). Total num frames: 1176104960. Throughput: 0: 4789.6. Samples: 19020888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:24,966][19571] Avg episode reward: [(0, '9.535')] [2025-01-05 14:22:25,013][19668] Updated weights for policy 0, policy_version 287136 (0.0018) [2025-01-05 14:22:27,207][19668] Updated weights for policy 0, policy_version 287146 (0.0019) [2025-01-05 14:22:29,253][19668] Updated weights for policy 0, policy_version 287156 (0.0017) [2025-01-05 14:22:29,965][19571] Fps is (10 sec: 19251.6, 60 sec: 19183.0, 300 sec: 19424.8). 
Total num frames: 1176203264. Throughput: 0: 4782.1. Samples: 19049468. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:22:29,965][19571] Avg episode reward: [(0, '9.583')] [2025-01-05 14:22:31,492][19668] Updated weights for policy 0, policy_version 287166 (0.0018) [2025-01-05 14:22:33,532][19668] Updated weights for policy 0, policy_version 287176 (0.0016) [2025-01-05 14:22:34,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19182.9, 300 sec: 19410.9). Total num frames: 1176297472. Throughput: 0: 4780.5. Samples: 19063888. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:34,965][19571] Avg episode reward: [(0, '10.472')] [2025-01-05 14:22:35,671][19668] Updated weights for policy 0, policy_version 287186 (0.0017) [2025-01-05 14:22:37,773][19668] Updated weights for policy 0, policy_version 287196 (0.0018) [2025-01-05 14:22:39,829][19668] Updated weights for policy 0, policy_version 287206 (0.0017) [2025-01-05 14:22:39,965][19571] Fps is (10 sec: 19250.4, 60 sec: 19182.8, 300 sec: 19410.9). Total num frames: 1176395776. Throughput: 0: 4792.2. Samples: 19093160. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:39,966][19571] Avg episode reward: [(0, '10.429')] [2025-01-05 14:22:40,035][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000287207_1176399872.pth... [2025-01-05 14:22:40,084][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000286078_1171775488.pth [2025-01-05 14:22:42,053][19668] Updated weights for policy 0, policy_version 287216 (0.0019) [2025-01-05 14:22:44,171][19668] Updated weights for policy 0, policy_version 287226 (0.0017) [2025-01-05 14:22:44,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19183.0, 300 sec: 19397.0). Total num frames: 1176489984. Throughput: 0: 4793.2. Samples: 19121770. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:44,965][19571] Avg episode reward: [(0, '9.633')] [2025-01-05 14:22:46,293][19668] Updated weights for policy 0, policy_version 287236 (0.0018) [2025-01-05 14:22:48,364][19668] Updated weights for policy 0, policy_version 287246 (0.0017) [2025-01-05 14:22:49,965][19571] Fps is (10 sec: 19251.8, 60 sec: 19182.9, 300 sec: 19383.1). Total num frames: 1176588288. Throughput: 0: 4798.1. Samples: 19136278. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:49,965][19571] Avg episode reward: [(0, '10.579')] [2025-01-05 14:22:50,550][19668] Updated weights for policy 0, policy_version 287256 (0.0018) [2025-01-05 14:22:52,600][19668] Updated weights for policy 0, policy_version 287266 (0.0017) [2025-01-05 14:22:54,641][19668] Updated weights for policy 0, policy_version 287276 (0.0017) [2025-01-05 14:22:54,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19251.2, 300 sec: 19383.1). Total num frames: 1176686592. Throughput: 0: 4814.4. Samples: 19165560. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:54,965][19571] Avg episode reward: [(0, '11.061')] [2025-01-05 14:22:56,902][19668] Updated weights for policy 0, policy_version 287286 (0.0018) [2025-01-05 14:22:58,970][19668] Updated weights for policy 0, policy_version 287296 (0.0018) [2025-01-05 14:22:59,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19355.3). Total num frames: 1176780800. Throughput: 0: 4818.6. Samples: 19194154. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:22:59,965][19571] Avg episode reward: [(0, '9.894')] [2025-01-05 14:23:01,115][19668] Updated weights for policy 0, policy_version 287306 (0.0018) [2025-01-05 14:23:03,230][19668] Updated weights for policy 0, policy_version 287316 (0.0017) [2025-01-05 14:23:04,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19355.3). Total num frames: 1176879104. Throughput: 0: 4819.1. Samples: 19208894. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:04,965][19571] Avg episode reward: [(0, '9.127')] [2025-01-05 14:23:05,363][19668] Updated weights for policy 0, policy_version 287326 (0.0018) [2025-01-05 14:23:07,428][19668] Updated weights for policy 0, policy_version 287336 (0.0017) [2025-01-05 14:23:09,545][19668] Updated weights for policy 0, policy_version 287346 (0.0018) [2025-01-05 14:23:09,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19319.6, 300 sec: 19355.3). Total num frames: 1176977408. Throughput: 0: 4822.1. Samples: 19237884. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:09,965][19571] Avg episode reward: [(0, '9.750')] [2025-01-05 14:23:11,679][19668] Updated weights for policy 0, policy_version 287356 (0.0018) [2025-01-05 14:23:13,731][19668] Updated weights for policy 0, policy_version 287366 (0.0017) [2025-01-05 14:23:14,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19327.6). Total num frames: 1177071616. Throughput: 0: 4829.8. Samples: 19266808. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:14,965][19571] Avg episode reward: [(0, '9.678')] [2025-01-05 14:23:15,949][19668] Updated weights for policy 0, policy_version 287376 (0.0018) [2025-01-05 14:23:17,987][19668] Updated weights for policy 0, policy_version 287386 (0.0017) [2025-01-05 14:23:19,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19313.7). Total num frames: 1177169920. Throughput: 0: 4837.9. Samples: 19281594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:19,965][19571] Avg episode reward: [(0, '9.642')] [2025-01-05 14:23:20,133][19668] Updated weights for policy 0, policy_version 287396 (0.0018) [2025-01-05 14:23:22,250][19668] Updated weights for policy 0, policy_version 287406 (0.0017) [2025-01-05 14:23:24,278][19668] Updated weights for policy 0, policy_version 287416 (0.0017) [2025-01-05 14:23:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19387.8, 300 sec: 19313.7). 
Total num frames: 1177268224. Throughput: 0: 4835.2. Samples: 19310744. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:24,965][19571] Avg episode reward: [(0, '9.374')] [2025-01-05 14:23:26,448][19668] Updated weights for policy 0, policy_version 287426 (0.0018) [2025-01-05 14:23:28,586][19668] Updated weights for policy 0, policy_version 287436 (0.0017) [2025-01-05 14:23:29,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.4, 300 sec: 19285.9). Total num frames: 1177362432. Throughput: 0: 4838.0. Samples: 19339480. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:29,965][19571] Avg episode reward: [(0, '10.939')] [2025-01-05 14:23:30,685][19668] Updated weights for policy 0, policy_version 287446 (0.0017) [2025-01-05 14:23:32,668][19668] Updated weights for policy 0, policy_version 287456 (0.0015) [2025-01-05 14:23:34,727][19668] Updated weights for policy 0, policy_version 287466 (0.0016) [2025-01-05 14:23:34,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19299.8). Total num frames: 1177464832. Throughput: 0: 4853.6. Samples: 19354688. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:34,965][19571] Avg episode reward: [(0, '9.842')] [2025-01-05 14:23:36,791][19668] Updated weights for policy 0, policy_version 287476 (0.0015) [2025-01-05 14:23:38,827][19668] Updated weights for policy 0, policy_version 287486 (0.0015) [2025-01-05 14:23:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19456.1, 300 sec: 19285.9). Total num frames: 1177563136. Throughput: 0: 4869.8. Samples: 19384700. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:39,965][19571] Avg episode reward: [(0, '9.683')] [2025-01-05 14:23:40,943][19668] Updated weights for policy 0, policy_version 287496 (0.0015) [2025-01-05 14:23:42,958][19668] Updated weights for policy 0, policy_version 287506 (0.0015) [2025-01-05 14:23:44,966][19668] Updated weights for policy 0, policy_version 287516 (0.0015) [2025-01-05 14:23:44,969][19571] Fps is (10 sec: 20061.1, 60 sec: 19591.1, 300 sec: 19299.5). Total num frames: 1177665536. Throughput: 0: 4902.4. Samples: 19414786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:44,970][19571] Avg episode reward: [(0, '10.165')] [2025-01-05 14:23:47,028][19668] Updated weights for policy 0, policy_version 287526 (0.0015) [2025-01-05 14:23:49,045][19668] Updated weights for policy 0, policy_version 287536 (0.0015) [2025-01-05 14:23:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19592.6, 300 sec: 19299.8). Total num frames: 1177763840. Throughput: 0: 4910.4. Samples: 19429864. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:49,965][19571] Avg episode reward: [(0, '10.228')] [2025-01-05 14:23:51,067][19668] Updated weights for policy 0, policy_version 287546 (0.0015) [2025-01-05 14:23:53,132][19668] Updated weights for policy 0, policy_version 287556 (0.0016) [2025-01-05 14:23:54,965][19571] Fps is (10 sec: 19669.9, 60 sec: 19592.5, 300 sec: 19285.9). Total num frames: 1177862144. Throughput: 0: 4931.9. Samples: 19459820. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:54,965][19571] Avg episode reward: [(0, '9.168')] [2025-01-05 14:23:55,286][19668] Updated weights for policy 0, policy_version 287566 (0.0016) [2025-01-05 14:23:57,301][19668] Updated weights for policy 0, policy_version 287576 (0.0016) [2025-01-05 14:23:59,333][19668] Updated weights for policy 0, policy_version 287586 (0.0018) [2025-01-05 14:23:59,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19285.9). 
Total num frames: 1177960448. Throughput: 0: 4949.5. Samples: 19489538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:23:59,965][19571] Avg episode reward: [(0, '10.006')] [2025-01-05 14:24:01,489][19668] Updated weights for policy 0, policy_version 287596 (0.0015) [2025-01-05 14:24:03,495][19668] Updated weights for policy 0, policy_version 287606 (0.0014) [2025-01-05 14:24:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19299.8). Total num frames: 1178062848. Throughput: 0: 4947.7. Samples: 19504242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:04,965][19571] Avg episode reward: [(0, '10.102')] [2025-01-05 14:24:05,570][19668] Updated weights for policy 0, policy_version 287616 (0.0016) [2025-01-05 14:24:07,624][19668] Updated weights for policy 0, policy_version 287626 (0.0015) [2025-01-05 14:24:09,656][19668] Updated weights for policy 0, policy_version 287636 (0.0016) [2025-01-05 14:24:09,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19285.9). Total num frames: 1178161152. Throughput: 0: 4966.3. Samples: 19534228. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:09,965][19571] Avg episode reward: [(0, '10.685')] [2025-01-05 14:24:11,781][19668] Updated weights for policy 0, policy_version 287646 (0.0017) [2025-01-05 14:24:13,876][19668] Updated weights for policy 0, policy_version 287656 (0.0016) [2025-01-05 14:24:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19285.9). Total num frames: 1178259456. Throughput: 0: 4978.3. Samples: 19563504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:14,965][19571] Avg episode reward: [(0, '10.400')] [2025-01-05 14:24:15,962][19668] Updated weights for policy 0, policy_version 287666 (0.0015) [2025-01-05 14:24:18,027][19668] Updated weights for policy 0, policy_version 287676 (0.0015) [2025-01-05 14:24:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19272.0). Total num frames: 1178357760. 
Throughput: 0: 4973.1. Samples: 19578478. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:19,966][19571] Avg episode reward: [(0, '11.021')] [2025-01-05 14:24:20,174][19668] Updated weights for policy 0, policy_version 287686 (0.0017) [2025-01-05 14:24:22,177][19668] Updated weights for policy 0, policy_version 287696 (0.0015) [2025-01-05 14:24:24,292][19668] Updated weights for policy 0, policy_version 287706 (0.0016) [2025-01-05 14:24:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19272.0). Total num frames: 1178456064. Throughput: 0: 4957.0. Samples: 19607766. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:24,965][19571] Avg episode reward: [(0, '9.940')] [2025-01-05 14:24:26,424][19668] Updated weights for policy 0, policy_version 287716 (0.0016) [2025-01-05 14:24:28,450][19668] Updated weights for policy 0, policy_version 287726 (0.0014) [2025-01-05 14:24:29,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19797.3, 300 sec: 19244.3). Total num frames: 1178550272. Throughput: 0: 4941.6. Samples: 19637134. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:29,966][19571] Avg episode reward: [(0, '9.324')] [2025-01-05 14:24:30,607][19668] Updated weights for policy 0, policy_version 287736 (0.0016) [2025-01-05 14:24:32,645][19668] Updated weights for policy 0, policy_version 287746 (0.0015) [2025-01-05 14:24:34,659][19668] Updated weights for policy 0, policy_version 287756 (0.0015) [2025-01-05 14:24:34,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19258.1). Total num frames: 1178652672. Throughput: 0: 4937.0. Samples: 19652030. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:34,965][19571] Avg episode reward: [(0, '9.171')] [2025-01-05 14:24:36,755][19668] Updated weights for policy 0, policy_version 287766 (0.0015) [2025-01-05 14:24:38,845][19668] Updated weights for policy 0, policy_version 287776 (0.0016) [2025-01-05 14:24:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19258.1). Total num frames: 1178750976. Throughput: 0: 4933.6. Samples: 19681832. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:39,966][19571] Avg episode reward: [(0, '9.974')] [2025-01-05 14:24:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000287781_1178750976.pth... [2025-01-05 14:24:40,023][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000286645_1174097920.pth [2025-01-05 14:24:40,948][19668] Updated weights for policy 0, policy_version 287786 (0.0016) [2025-01-05 14:24:43,036][19668] Updated weights for policy 0, policy_version 287796 (0.0016) [2025-01-05 14:24:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19730.6, 300 sec: 19244.3). Total num frames: 1178849280. Throughput: 0: 4929.0. Samples: 19711342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:44,965][19571] Avg episode reward: [(0, '9.120')] [2025-01-05 14:24:45,100][19668] Updated weights for policy 0, policy_version 287806 (0.0016) [2025-01-05 14:24:47,081][19668] Updated weights for policy 0, policy_version 287816 (0.0016) [2025-01-05 14:24:49,182][19668] Updated weights for policy 0, policy_version 287826 (0.0015) [2025-01-05 14:24:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19729.0, 300 sec: 19244.3). Total num frames: 1178947584. Throughput: 0: 4933.5. Samples: 19726248. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:49,965][19571] Avg episode reward: [(0, '8.386')] [2025-01-05 14:24:51,327][19668] Updated weights for policy 0, policy_version 287836 (0.0016) [2025-01-05 14:24:53,330][19668] Updated weights for policy 0, policy_version 287846 (0.0017) [2025-01-05 14:24:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19230.4). Total num frames: 1179045888. Throughput: 0: 4924.9. Samples: 19755850. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:24:54,965][19571] Avg episode reward: [(0, '9.721')] [2025-01-05 14:24:55,458][19668] Updated weights for policy 0, policy_version 287856 (0.0016) [2025-01-05 14:24:57,497][19668] Updated weights for policy 0, policy_version 287866 (0.0015) [2025-01-05 14:24:59,484][19668] Updated weights for policy 0, policy_version 287876 (0.0015) [2025-01-05 14:24:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19244.3). Total num frames: 1179148288. Throughput: 0: 4939.3. Samples: 19785772. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:24:59,965][19571] Avg episode reward: [(0, '10.343')] [2025-01-05 14:25:01,586][19668] Updated weights for policy 0, policy_version 287886 (0.0016) [2025-01-05 14:25:03,613][19668] Updated weights for policy 0, policy_version 287896 (0.0016) [2025-01-05 14:25:04,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19729.0, 300 sec: 19230.4). Total num frames: 1179246592. Throughput: 0: 4940.7. Samples: 19800808. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:04,965][19571] Avg episode reward: [(0, '9.973')] [2025-01-05 14:25:05,710][19668] Updated weights for policy 0, policy_version 287906 (0.0017) [2025-01-05 14:25:07,805][19668] Updated weights for policy 0, policy_version 287916 (0.0015) [2025-01-05 14:25:09,911][19668] Updated weights for policy 0, policy_version 287926 (0.0016) [2025-01-05 14:25:09,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19230.4). 
Total num frames: 1179344896. Throughput: 0: 4943.5. Samples: 19830224. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:09,965][19571] Avg episode reward: [(0, '9.951')] [2025-01-05 14:25:12,003][19668] Updated weights for policy 0, policy_version 287936 (0.0016) [2025-01-05 14:25:14,099][19668] Updated weights for policy 0, policy_version 287946 (0.0015) [2025-01-05 14:25:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19230.4). Total num frames: 1179443200. Throughput: 0: 4940.8. Samples: 19859468. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:14,965][19571] Avg episode reward: [(0, '10.222')] [2025-01-05 14:25:16,199][19668] Updated weights for policy 0, policy_version 287956 (0.0017) [2025-01-05 14:25:18,246][19668] Updated weights for policy 0, policy_version 287966 (0.0015) [2025-01-05 14:25:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19244.3). Total num frames: 1179537408. Throughput: 0: 4932.9. Samples: 19874012. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:19,965][19571] Avg episode reward: [(0, '10.192')] [2025-01-05 14:25:20,421][19668] Updated weights for policy 0, policy_version 287976 (0.0016) [2025-01-05 14:25:22,443][19668] Updated weights for policy 0, policy_version 287986 (0.0016) [2025-01-05 14:25:24,478][19668] Updated weights for policy 0, policy_version 287996 (0.0016) [2025-01-05 14:25:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19313.7). Total num frames: 1179639808. Throughput: 0: 4928.3. Samples: 19903604. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:24,965][19571] Avg episode reward: [(0, '9.857')] [2025-01-05 14:25:26,637][19668] Updated weights for policy 0, policy_version 288006 (0.0018) [2025-01-05 14:25:28,670][19668] Updated weights for policy 0, policy_version 288016 (0.0016) [2025-01-05 14:25:29,965][19571] Fps is (10 sec: 20070.7, 60 sec: 19797.4, 300 sec: 19355.4). Total num frames: 1179738112. 
Throughput: 0: 4927.7. Samples: 19933086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:29,965][19571] Avg episode reward: [(0, '10.542')] [2025-01-05 14:25:30,774][19668] Updated weights for policy 0, policy_version 288026 (0.0017) [2025-01-05 14:25:32,895][19668] Updated weights for policy 0, policy_version 288036 (0.0016) [2025-01-05 14:25:34,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19729.0, 300 sec: 19466.4). Total num frames: 1179836416. Throughput: 0: 4922.3. Samples: 19947754. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:34,965][19571] Avg episode reward: [(0, '8.918')] [2025-01-05 14:25:34,970][19668] Updated weights for policy 0, policy_version 288046 (0.0017) [2025-01-05 14:25:37,078][19668] Updated weights for policy 0, policy_version 288056 (0.0016) [2025-01-05 14:25:39,178][19668] Updated weights for policy 0, policy_version 288066 (0.0016) [2025-01-05 14:25:39,965][19571] Fps is (10 sec: 19250.7, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1179930624. Throughput: 0: 4917.5. Samples: 19977138. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:39,966][19571] Avg episode reward: [(0, '9.634')] [2025-01-05 14:25:41,285][19668] Updated weights for policy 0, policy_version 288076 (0.0017) [2025-01-05 14:25:43,251][19668] Updated weights for policy 0, policy_version 288086 (0.0016) [2025-01-05 14:25:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19494.2). Total num frames: 1180033024. Throughput: 0: 4917.2. Samples: 20007044. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:44,965][19571] Avg episode reward: [(0, '9.817')] [2025-01-05 14:25:45,341][19668] Updated weights for policy 0, policy_version 288096 (0.0016) [2025-01-05 14:25:47,346][19668] Updated weights for policy 0, policy_version 288106 (0.0016) [2025-01-05 14:25:49,339][19668] Updated weights for policy 0, policy_version 288116 (0.0016) [2025-01-05 14:25:49,965][19571] Fps is (10 sec: 20480.5, 60 sec: 19797.4, 300 sec: 19522.0). Total num frames: 1180135424. Throughput: 0: 4919.5. Samples: 20022184. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:49,965][19571] Avg episode reward: [(0, '11.468')] [2025-01-05 14:25:51,433][19668] Updated weights for policy 0, policy_version 288126 (0.0015) [2025-01-05 14:25:53,446][19668] Updated weights for policy 0, policy_version 288136 (0.0016) [2025-01-05 14:25:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19522.0). Total num frames: 1180233728. Throughput: 0: 4934.9. Samples: 20052296. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:54,965][19571] Avg episode reward: [(0, '10.109')] [2025-01-05 14:25:55,460][19668] Updated weights for policy 0, policy_version 288146 (0.0015) [2025-01-05 14:25:57,527][19668] Updated weights for policy 0, policy_version 288156 (0.0015) [2025-01-05 14:25:59,539][19668] Updated weights for policy 0, policy_version 288166 (0.0016) [2025-01-05 14:25:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19797.4, 300 sec: 19549.7). Total num frames: 1180336128. Throughput: 0: 4958.2. Samples: 20082586. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:25:59,965][19571] Avg episode reward: [(0, '10.929')] [2025-01-05 14:26:01,810][19668] Updated weights for policy 0, policy_version 288176 (0.0016) [2025-01-05 14:26:04,209][19668] Updated weights for policy 0, policy_version 288186 (0.0017) [2025-01-05 14:26:04,965][19571] Fps is (10 sec: 18841.0, 60 sec: 19592.4, 300 sec: 19521.9). 
Total num frames: 1180422144. Throughput: 0: 4931.7. Samples: 20095940. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:26:04,966][19571] Avg episode reward: [(0, '10.584')] [2025-01-05 14:26:06,273][19668] Updated weights for policy 0, policy_version 288196 (0.0017) [2025-01-05 14:26:08,294][19668] Updated weights for policy 0, policy_version 288206 (0.0016) [2025-01-05 14:26:09,965][19571] Fps is (10 sec: 18431.8, 60 sec: 19592.5, 300 sec: 19521.9). Total num frames: 1180520448. Throughput: 0: 4909.7. Samples: 20124542. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:26:09,966][19571] Avg episode reward: [(0, '10.076')] [2025-01-05 14:26:10,455][19668] Updated weights for policy 0, policy_version 288216 (0.0017) [2025-01-05 14:26:12,475][19668] Updated weights for policy 0, policy_version 288226 (0.0017) [2025-01-05 14:26:14,528][19668] Updated weights for policy 0, policy_version 288236 (0.0017) [2025-01-05 14:26:14,965][19571] Fps is (10 sec: 19661.6, 60 sec: 19592.5, 300 sec: 19535.8). Total num frames: 1180618752. Throughput: 0: 4912.5. Samples: 20154150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:14,965][19571] Avg episode reward: [(0, '10.475')] [2025-01-05 14:26:16,736][19668] Updated weights for policy 0, policy_version 288246 (0.0018) [2025-01-05 14:26:18,757][19668] Updated weights for policy 0, policy_version 288256 (0.0017) [2025-01-05 14:26:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1180717056. Throughput: 0: 4906.3. Samples: 20168538. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:19,965][19571] Avg episode reward: [(0, '10.152')] [2025-01-05 14:26:20,910][19668] Updated weights for policy 0, policy_version 288266 (0.0018) [2025-01-05 14:26:23,072][19668] Updated weights for policy 0, policy_version 288276 (0.0018) [2025-01-05 14:26:24,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19522.0). Total num frames: 1180811264. 
Throughput: 0: 4894.0. Samples: 20197368. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:24,965][19571] Avg episode reward: [(0, '10.733')] [2025-01-05 14:26:25,211][19668] Updated weights for policy 0, policy_version 288286 (0.0018) [2025-01-05 14:26:27,432][19668] Updated weights for policy 0, policy_version 288296 (0.0018) [2025-01-05 14:26:29,572][19668] Updated weights for policy 0, policy_version 288306 (0.0018) [2025-01-05 14:26:29,965][19571] Fps is (10 sec: 18841.1, 60 sec: 19455.9, 300 sec: 19521.9). Total num frames: 1180905472. Throughput: 0: 4860.4. Samples: 20225762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:29,966][19571] Avg episode reward: [(0, '10.417')] [2025-01-05 14:26:31,877][19668] Updated weights for policy 0, policy_version 288316 (0.0019) [2025-01-05 14:26:34,053][19668] Updated weights for policy 0, policy_version 288326 (0.0018) [2025-01-05 14:26:34,965][19571] Fps is (10 sec: 18431.9, 60 sec: 19319.5, 300 sec: 19494.2). Total num frames: 1180995584. Throughput: 0: 4822.7. Samples: 20239206. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:34,965][19571] Avg episode reward: [(0, '10.112')] [2025-01-05 14:26:36,303][19668] Updated weights for policy 0, policy_version 288336 (0.0018) [2025-01-05 14:26:38,370][19668] Updated weights for policy 0, policy_version 288346 (0.0017) [2025-01-05 14:26:39,965][19571] Fps is (10 sec: 18842.3, 60 sec: 19387.8, 300 sec: 19508.1). Total num frames: 1181093888. Throughput: 0: 4783.7. Samples: 20267564. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:39,965][19571] Avg episode reward: [(0, '9.050')] [2025-01-05 14:26:39,970][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000288353_1181093888.pth... 
[2025-01-05 14:26:40,037][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000287207_1176399872.pth [2025-01-05 14:26:40,578][19668] Updated weights for policy 0, policy_version 288356 (0.0018) [2025-01-05 14:26:42,737][19668] Updated weights for policy 0, policy_version 288366 (0.0017) [2025-01-05 14:26:44,830][19668] Updated weights for policy 0, policy_version 288376 (0.0018) [2025-01-05 14:26:44,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1181188096. Throughput: 0: 4746.0. Samples: 20296156. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:44,966][19571] Avg episode reward: [(0, '9.984')] [2025-01-05 14:26:46,977][19668] Updated weights for policy 0, policy_version 288386 (0.0018) [2025-01-05 14:26:49,081][19668] Updated weights for policy 0, policy_version 288396 (0.0018) [2025-01-05 14:26:49,965][19571] Fps is (10 sec: 18841.6, 60 sec: 19114.7, 300 sec: 19494.2). Total num frames: 1181282304. Throughput: 0: 4767.9. Samples: 20310492. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:49,965][19571] Avg episode reward: [(0, '9.956')] [2025-01-05 14:26:51,236][19668] Updated weights for policy 0, policy_version 288406 (0.0019) [2025-01-05 14:26:53,330][19668] Updated weights for policy 0, policy_version 288416 (0.0018) [2025-01-05 14:26:54,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19114.7, 300 sec: 19494.2). Total num frames: 1181380608. Throughput: 0: 4774.4. Samples: 20339390. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:54,965][19571] Avg episode reward: [(0, '8.936')] [2025-01-05 14:26:55,518][19668] Updated weights for policy 0, policy_version 288426 (0.0018) [2025-01-05 14:26:57,580][19668] Updated weights for policy 0, policy_version 288436 (0.0017) [2025-01-05 14:26:59,635][19668] Updated weights for policy 0, policy_version 288446 (0.0017) [2025-01-05 14:26:59,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19046.4, 300 sec: 19508.1). Total num frames: 1181478912. Throughput: 0: 4766.7. Samples: 20368652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:26:59,965][19571] Avg episode reward: [(0, '9.152')] [2025-01-05 14:27:01,815][19668] Updated weights for policy 0, policy_version 288456 (0.0017) [2025-01-05 14:27:03,899][19668] Updated weights for policy 0, policy_version 288466 (0.0017) [2025-01-05 14:27:04,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19183.0, 300 sec: 19508.1). Total num frames: 1181573120. Throughput: 0: 4764.4. Samples: 20382936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:27:04,965][19571] Avg episode reward: [(0, '9.820')] [2025-01-05 14:27:06,069][19668] Updated weights for policy 0, policy_version 288476 (0.0018) [2025-01-05 14:27:08,142][19668] Updated weights for policy 0, policy_version 288486 (0.0018) [2025-01-05 14:27:09,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19522.0). Total num frames: 1181671424. Throughput: 0: 4767.3. Samples: 20411894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:27:09,965][19571] Avg episode reward: [(0, '10.493')] [2025-01-05 14:27:10,342][19668] Updated weights for policy 0, policy_version 288496 (0.0018) [2025-01-05 14:27:12,440][19668] Updated weights for policy 0, policy_version 288506 (0.0017) [2025-01-05 14:27:14,503][19668] Updated weights for policy 0, policy_version 288516 (0.0017) [2025-01-05 14:27:14,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19183.0, 300 sec: 19522.0). 
Total num frames: 1181769728. Throughput: 0: 4782.8. Samples: 20440984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:27:14,965][19571] Avg episode reward: [(0, '9.556')] [2025-01-05 14:27:16,660][19668] Updated weights for policy 0, policy_version 288526 (0.0018) [2025-01-05 14:27:18,799][19668] Updated weights for policy 0, policy_version 288536 (0.0018) [2025-01-05 14:27:19,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19114.7, 300 sec: 19522.0). Total num frames: 1181863936. Throughput: 0: 4805.5. Samples: 20455454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) [2025-01-05 14:27:19,965][19571] Avg episode reward: [(0, '10.138')] [2025-01-05 14:27:20,940][19668] Updated weights for policy 0, policy_version 288546 (0.0018) [2025-01-05 14:27:23,012][19668] Updated weights for policy 0, policy_version 288556 (0.0018) [2025-01-05 14:27:24,965][19571] Fps is (10 sec: 18841.3, 60 sec: 19114.7, 300 sec: 19508.1). Total num frames: 1181958144. Throughput: 0: 4810.7. Samples: 20484046. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:24,965][19571] Avg episode reward: [(0, '9.118')] [2025-01-05 14:27:25,238][19668] Updated weights for policy 0, policy_version 288566 (0.0017) [2025-01-05 14:27:27,270][19668] Updated weights for policy 0, policy_version 288576 (0.0017) [2025-01-05 14:27:29,309][19668] Updated weights for policy 0, policy_version 288586 (0.0018) [2025-01-05 14:27:29,965][19571] Fps is (10 sec: 19250.7, 60 sec: 19183.0, 300 sec: 19521.9). Total num frames: 1182056448. Throughput: 0: 4830.6. Samples: 20513534. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:29,966][19571] Avg episode reward: [(0, '9.040')] [2025-01-05 14:27:31,528][19668] Updated weights for policy 0, policy_version 288596 (0.0019) [2025-01-05 14:27:33,591][19668] Updated weights for policy 0, policy_version 288606 (0.0017) [2025-01-05 14:27:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19319.5, 300 sec: 19522.0). Total num frames: 1182154752. 
Throughput: 0: 4834.3. Samples: 20528034. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:34,965][19571] Avg episode reward: [(0, '10.120')] [2025-01-05 14:27:35,743][19668] Updated weights for policy 0, policy_version 288616 (0.0018) [2025-01-05 14:27:37,817][19668] Updated weights for policy 0, policy_version 288626 (0.0018) [2025-01-05 14:27:39,940][19668] Updated weights for policy 0, policy_version 288636 (0.0018) [2025-01-05 14:27:39,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19319.4, 300 sec: 19535.8). Total num frames: 1182253056. Throughput: 0: 4831.6. Samples: 20556814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:39,965][19571] Avg episode reward: [(0, '10.776')] [2025-01-05 14:27:42,112][19668] Updated weights for policy 0, policy_version 288646 (0.0018) [2025-01-05 14:27:44,232][19668] Updated weights for policy 0, policy_version 288656 (0.0018) [2025-01-05 14:27:44,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.5, 300 sec: 19522.0). Total num frames: 1182347264. Throughput: 0: 4819.9. Samples: 20585546. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:44,965][19571] Avg episode reward: [(0, '9.932')] [2025-01-05 14:27:46,421][19668] Updated weights for policy 0, policy_version 288666 (0.0019) [2025-01-05 14:27:48,514][19668] Updated weights for policy 0, policy_version 288676 (0.0018) [2025-01-05 14:27:49,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19319.5, 300 sec: 19508.1). Total num frames: 1182441472. Throughput: 0: 4824.3. Samples: 20600030. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:49,965][19571] Avg episode reward: [(0, '9.717')] [2025-01-05 14:27:50,670][19668] Updated weights for policy 0, policy_version 288686 (0.0018) [2025-01-05 14:27:52,759][19668] Updated weights for policy 0, policy_version 288696 (0.0018) [2025-01-05 14:27:54,825][19668] Updated weights for policy 0, policy_version 288706 (0.0017) [2025-01-05 14:27:54,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1182539776. Throughput: 0: 4828.2. Samples: 20629162. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:54,966][19571] Avg episode reward: [(0, '8.819')] [2025-01-05 14:27:56,969][19668] Updated weights for policy 0, policy_version 288716 (0.0018) [2025-01-05 14:27:59,133][19668] Updated weights for policy 0, policy_version 288726 (0.0019) [2025-01-05 14:27:59,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1182633984. Throughput: 0: 4820.5. Samples: 20657906. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:27:59,965][19571] Avg episode reward: [(0, '10.039')] [2025-01-05 14:28:01,300][19668] Updated weights for policy 0, policy_version 288736 (0.0018) [2025-01-05 14:28:03,352][19668] Updated weights for policy 0, policy_version 288746 (0.0017) [2025-01-05 14:28:04,965][19571] Fps is (10 sec: 19251.6, 60 sec: 19319.5, 300 sec: 19508.1). Total num frames: 1182732288. Throughput: 0: 4818.7. Samples: 20672296. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:04,965][19571] Avg episode reward: [(0, '8.968')] [2025-01-05 14:28:05,594][19668] Updated weights for policy 0, policy_version 288756 (0.0019) [2025-01-05 14:28:07,688][19668] Updated weights for policy 0, policy_version 288766 (0.0017) [2025-01-05 14:28:09,748][19668] Updated weights for policy 0, policy_version 288776 (0.0017) [2025-01-05 14:28:09,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19251.1, 300 sec: 19508.1). 
Total num frames: 1182826496. Throughput: 0: 4826.7. Samples: 20701248. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:09,966][19571] Avg episode reward: [(0, '9.926')] [2025-01-05 14:28:11,958][19668] Updated weights for policy 0, policy_version 288786 (0.0018) [2025-01-05 14:28:14,032][19668] Updated weights for policy 0, policy_version 288796 (0.0020) [2025-01-05 14:28:14,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1182924800. Throughput: 0: 4812.2. Samples: 20730082. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:14,965][19571] Avg episode reward: [(0, '10.986')] [2025-01-05 14:28:16,176][19668] Updated weights for policy 0, policy_version 288806 (0.0018) [2025-01-05 14:28:18,278][19668] Updated weights for policy 0, policy_version 288816 (0.0017) [2025-01-05 14:28:19,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.1, 300 sec: 19494.2). Total num frames: 1183019008. Throughput: 0: 4810.2. Samples: 20744494. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:19,966][19571] Avg episode reward: [(0, '8.646')] [2025-01-05 14:28:20,497][19668] Updated weights for policy 0, policy_version 288826 (0.0019) [2025-01-05 14:28:22,518][19668] Updated weights for policy 0, policy_version 288836 (0.0018) [2025-01-05 14:28:24,602][19668] Updated weights for policy 0, policy_version 288846 (0.0017) [2025-01-05 14:28:24,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.5, 300 sec: 19508.1). Total num frames: 1183117312. Throughput: 0: 4819.2. Samples: 20773678. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:24,966][19571] Avg episode reward: [(0, '9.675')] [2025-01-05 14:28:26,801][19668] Updated weights for policy 0, policy_version 288856 (0.0018) [2025-01-05 14:28:28,821][19668] Updated weights for policy 0, policy_version 288866 (0.0017) [2025-01-05 14:28:29,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19480.3). Total num frames: 1183211520. 
Throughput: 0: 4821.2. Samples: 20802502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:29,965][19571] Avg episode reward: [(0, '9.814')] [2025-01-05 14:28:31,028][19668] Updated weights for policy 0, policy_version 288876 (0.0018) [2025-01-05 14:28:33,106][19668] Updated weights for policy 0, policy_version 288886 (0.0017) [2025-01-05 14:28:34,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19480.3). Total num frames: 1183309824. Throughput: 0: 4822.2. Samples: 20817028. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:28:34,965][19571] Avg episode reward: [(0, '10.696')] [2025-01-05 14:28:35,297][19668] Updated weights for policy 0, policy_version 288896 (0.0021) [2025-01-05 14:28:37,411][19668] Updated weights for policy 0, policy_version 288906 (0.0018) [2025-01-05 14:28:39,496][19668] Updated weights for policy 0, policy_version 288916 (0.0018) [2025-01-05 14:28:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19251.2, 300 sec: 19466.7). Total num frames: 1183408128. Throughput: 0: 4813.8. Samples: 20845784. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:28:39,965][19571] Avg episode reward: [(0, '8.889')] [2025-01-05 14:28:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000288918_1183408128.pth... [2025-01-05 14:28:40,023][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000287781_1178750976.pth [2025-01-05 14:28:41,685][19668] Updated weights for policy 0, policy_version 288926 (0.0018) [2025-01-05 14:28:43,801][19668] Updated weights for policy 0, policy_version 288936 (0.0018) [2025-01-05 14:28:44,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19452.5). Total num frames: 1183502336. Throughput: 0: 4810.9. Samples: 20874398. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:28:44,965][19571] Avg episode reward: [(0, '10.851')] [2025-01-05 14:28:46,028][19668] Updated weights for policy 0, policy_version 288946 (0.0018) [2025-01-05 14:28:48,048][19668] Updated weights for policy 0, policy_version 288956 (0.0017) [2025-01-05 14:28:49,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1183596544. Throughput: 0: 4814.2. Samples: 20888936. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:28:49,965][19571] Avg episode reward: [(0, '10.409')] [2025-01-05 14:28:50,227][19668] Updated weights for policy 0, policy_version 288966 (0.0018) [2025-01-05 14:28:52,332][19668] Updated weights for policy 0, policy_version 288976 (0.0017) [2025-01-05 14:28:54,381][19668] Updated weights for policy 0, policy_version 288986 (0.0017) [2025-01-05 14:28:54,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19438.6). Total num frames: 1183694848. Throughput: 0: 4819.0. Samples: 20918102. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:28:54,965][19571] Avg episode reward: [(0, '10.879')] [2025-01-05 14:28:56,597][19668] Updated weights for policy 0, policy_version 288996 (0.0018) [2025-01-05 14:28:58,655][19668] Updated weights for policy 0, policy_version 289006 (0.0018) [2025-01-05 14:28:59,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1183789056. Throughput: 0: 4817.5. Samples: 20946872. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:28:59,965][19571] Avg episode reward: [(0, '9.134')] [2025-01-05 14:29:00,800][19668] Updated weights for policy 0, policy_version 289016 (0.0018) [2025-01-05 14:29:02,890][19668] Updated weights for policy 0, policy_version 289026 (0.0017) [2025-01-05 14:29:04,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1183887360. Throughput: 0: 4820.2. Samples: 20961404. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:04,965][19571] Avg episode reward: [(0, '9.666')] [2025-01-05 14:29:05,084][19668] Updated weights for policy 0, policy_version 289036 (0.0018) [2025-01-05 14:29:07,217][19668] Updated weights for policy 0, policy_version 289046 (0.0017) [2025-01-05 14:29:09,314][19668] Updated weights for policy 0, policy_version 289056 (0.0017) [2025-01-05 14:29:09,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19251.2, 300 sec: 19397.0). Total num frames: 1183981568. Throughput: 0: 4808.6. Samples: 20990064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:09,966][19571] Avg episode reward: [(0, '10.432')] [2025-01-05 14:29:11,519][19668] Updated weights for policy 0, policy_version 289066 (0.0019) [2025-01-05 14:29:13,560][19668] Updated weights for policy 0, policy_version 289076 (0.0017) [2025-01-05 14:29:14,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19397.0). Total num frames: 1184079872. Throughput: 0: 4805.2. Samples: 21018736. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:14,965][19571] Avg episode reward: [(0, '9.160')] [2025-01-05 14:29:15,751][19668] Updated weights for policy 0, policy_version 289086 (0.0018) [2025-01-05 14:29:17,835][19668] Updated weights for policy 0, policy_version 289096 (0.0016) [2025-01-05 14:29:19,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19383.1). Total num frames: 1184174080. Throughput: 0: 4807.5. Samples: 21033368. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:19,966][19571] Avg episode reward: [(0, '9.266')] [2025-01-05 14:29:20,009][19668] Updated weights for policy 0, policy_version 289106 (0.0018) [2025-01-05 14:29:22,192][19668] Updated weights for policy 0, policy_version 289116 (0.0018) [2025-01-05 14:29:24,277][19668] Updated weights for policy 0, policy_version 289126 (0.0017) [2025-01-05 14:29:24,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19397.0). 
Total num frames: 1184272384. Throughput: 0: 4805.4. Samples: 21062028. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:24,965][19571] Avg episode reward: [(0, '10.587')] [2025-01-05 14:29:26,438][19668] Updated weights for policy 0, policy_version 289136 (0.0019) [2025-01-05 14:29:28,515][19668] Updated weights for policy 0, policy_version 289146 (0.0018) [2025-01-05 14:29:29,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19369.2). Total num frames: 1184366592. Throughput: 0: 4812.6. Samples: 21090966. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:29,966][19571] Avg episode reward: [(0, '9.546')] [2025-01-05 14:29:30,675][19668] Updated weights for policy 0, policy_version 289156 (0.0018) [2025-01-05 14:29:32,724][19668] Updated weights for policy 0, policy_version 289166 (0.0017) [2025-01-05 14:29:34,780][19668] Updated weights for policy 0, policy_version 289176 (0.0017) [2025-01-05 14:29:34,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19369.2). Total num frames: 1184464896. Throughput: 0: 4815.2. Samples: 21105620. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:34,965][19571] Avg episode reward: [(0, '9.182')] [2025-01-05 14:29:36,957][19668] Updated weights for policy 0, policy_version 289186 (0.0018) [2025-01-05 14:29:38,989][19668] Updated weights for policy 0, policy_version 289196 (0.0017) [2025-01-05 14:29:39,965][19571] Fps is (10 sec: 19661.2, 60 sec: 19251.3, 300 sec: 19369.2). Total num frames: 1184563200. Throughput: 0: 4821.7. Samples: 21135080. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:39,965][19571] Avg episode reward: [(0, '8.988')] [2025-01-05 14:29:41,152][19668] Updated weights for policy 0, policy_version 289206 (0.0018) [2025-01-05 14:29:43,202][19668] Updated weights for policy 0, policy_version 289216 (0.0017) [2025-01-05 14:29:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19319.4, 300 sec: 19369.2). Total num frames: 1184661504. 
Throughput: 0: 4825.4. Samples: 21164014. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:44,965][19571] Avg episode reward: [(0, '9.976')] [2025-01-05 14:29:45,347][19668] Updated weights for policy 0, policy_version 289226 (0.0018) [2025-01-05 14:29:47,428][19668] Updated weights for policy 0, policy_version 289236 (0.0019) [2025-01-05 14:29:49,534][19668] Updated weights for policy 0, policy_version 289246 (0.0018) [2025-01-05 14:29:49,965][19571] Fps is (10 sec: 19660.4, 60 sec: 19387.7, 300 sec: 19369.2). Total num frames: 1184759808. Throughput: 0: 4830.0. Samples: 21178756. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:49,966][19571] Avg episode reward: [(0, '9.582')] [2025-01-05 14:29:51,711][19668] Updated weights for policy 0, policy_version 289256 (0.0018) [2025-01-05 14:29:53,757][19668] Updated weights for policy 0, policy_version 289266 (0.0017) [2025-01-05 14:29:54,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19319.5, 300 sec: 19341.5). Total num frames: 1184854016. Throughput: 0: 4839.9. Samples: 21207860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:54,965][19571] Avg episode reward: [(0, '10.104')] [2025-01-05 14:29:55,875][19668] Updated weights for policy 0, policy_version 289276 (0.0017) [2025-01-05 14:29:57,846][19668] Updated weights for policy 0, policy_version 289286 (0.0015) [2025-01-05 14:29:59,851][19668] Updated weights for policy 0, policy_version 289296 (0.0014) [2025-01-05 14:29:59,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1184956416. Throughput: 0: 4872.2. Samples: 21237984. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:29:59,965][19571] Avg episode reward: [(0, '10.564')] [2025-01-05 14:30:01,889][19668] Updated weights for policy 0, policy_version 289306 (0.0017) [2025-01-05 14:30:03,887][19668] Updated weights for policy 0, policy_version 289316 (0.0015) [2025-01-05 14:30:04,965][19571] Fps is (10 sec: 20480.1, 60 sec: 19524.3, 300 sec: 19369.2). Total num frames: 1185058816. Throughput: 0: 4885.5. Samples: 21253216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:04,965][19571] Avg episode reward: [(0, '8.584')] [2025-01-05 14:30:05,919][19668] Updated weights for policy 0, policy_version 289326 (0.0015) [2025-01-05 14:30:07,980][19668] Updated weights for policy 0, policy_version 289336 (0.0016) [2025-01-05 14:30:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19592.6, 300 sec: 19369.2). Total num frames: 1185157120. Throughput: 0: 4919.2. Samples: 21283392. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:09,965][19571] Avg episode reward: [(0, '9.889')] [2025-01-05 14:30:10,064][19668] Updated weights for policy 0, policy_version 289346 (0.0016) [2025-01-05 14:30:12,088][19668] Updated weights for policy 0, policy_version 289356 (0.0016) [2025-01-05 14:30:14,123][19668] Updated weights for policy 0, policy_version 289366 (0.0015) [2025-01-05 14:30:14,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19383.1). Total num frames: 1185255424. Throughput: 0: 4942.8. Samples: 21313390. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:14,965][19571] Avg episode reward: [(0, '10.533')] [2025-01-05 14:30:16,232][19668] Updated weights for policy 0, policy_version 289376 (0.0016) [2025-01-05 14:30:18,309][19668] Updated weights for policy 0, policy_version 289386 (0.0015) [2025-01-05 14:30:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19660.9, 300 sec: 19369.2). Total num frames: 1185353728. Throughput: 0: 4940.0. Samples: 21327918. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:19,965][19571] Avg episode reward: [(0, '9.563')] [2025-01-05 14:30:20,443][19668] Updated weights for policy 0, policy_version 289396 (0.0016) [2025-01-05 14:30:22,481][19668] Updated weights for policy 0, policy_version 289406 (0.0014) [2025-01-05 14:30:24,503][19668] Updated weights for policy 0, policy_version 289416 (0.0016) [2025-01-05 14:30:24,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19383.1). Total num frames: 1185456128. Throughput: 0: 4946.8. Samples: 21357684. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:24,965][19571] Avg episode reward: [(0, '9.777')] [2025-01-05 14:30:26,634][19668] Updated weights for policy 0, policy_version 289426 (0.0016) [2025-01-05 14:30:28,680][19668] Updated weights for policy 0, policy_version 289436 (0.0016) [2025-01-05 14:30:29,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.4, 300 sec: 19383.1). Total num frames: 1185554432. Throughput: 0: 4960.1. Samples: 21387218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:29,965][19571] Avg episode reward: [(0, '10.011')] [2025-01-05 14:30:30,799][19668] Updated weights for policy 0, policy_version 289446 (0.0016) [2025-01-05 14:30:32,829][19668] Updated weights for policy 0, policy_version 289456 (0.0015) [2025-01-05 14:30:34,936][19668] Updated weights for policy 0, policy_version 289466 (0.0016) [2025-01-05 14:30:34,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.4, 300 sec: 19397.0). Total num frames: 1185652736. Throughput: 0: 4965.7. Samples: 21402212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:34,965][19571] Avg episode reward: [(0, '10.303')] [2025-01-05 14:30:37,032][19668] Updated weights for policy 0, policy_version 289476 (0.0016) [2025-01-05 14:30:39,117][19668] Updated weights for policy 0, policy_version 289486 (0.0019) [2025-01-05 14:30:39,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19729.1, 300 sec: 19369.2). 
Total num frames: 1185746944. Throughput: 0: 4968.6. Samples: 21431446. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:39,965][19571] Avg episode reward: [(0, '11.040')] [2025-01-05 14:30:39,991][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000289490_1185751040.pth... [2025-01-05 14:30:40,045][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000288353_1181093888.pth [2025-01-05 14:30:41,311][19668] Updated weights for policy 0, policy_version 289496 (0.0017) [2025-01-05 14:30:43,368][19668] Updated weights for policy 0, policy_version 289506 (0.0015) [2025-01-05 14:30:44,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19729.1, 300 sec: 19355.3). Total num frames: 1185845248. Throughput: 0: 4947.7. Samples: 21460628. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:44,965][19571] Avg episode reward: [(0, '10.513')] [2025-01-05 14:30:45,441][19668] Updated weights for policy 0, policy_version 289516 (0.0016) [2025-01-05 14:30:47,493][19668] Updated weights for policy 0, policy_version 289526 (0.0015) [2025-01-05 14:30:49,518][19668] Updated weights for policy 0, policy_version 289536 (0.0017) [2025-01-05 14:30:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19355.3). Total num frames: 1185943552. Throughput: 0: 4938.1. Samples: 21475432. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:49,965][19571] Avg episode reward: [(0, '9.468')] [2025-01-05 14:30:51,639][19668] Updated weights for policy 0, policy_version 289546 (0.0015) [2025-01-05 14:30:53,664][19668] Updated weights for policy 0, policy_version 289556 (0.0015) [2025-01-05 14:30:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19355.3). Total num frames: 1186045952. Throughput: 0: 4930.9. Samples: 21505284. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:54,965][19571] Avg episode reward: [(0, '9.776')] [2025-01-05 14:30:55,788][19668] Updated weights for policy 0, policy_version 289566 (0.0018) [2025-01-05 14:30:57,855][19668] Updated weights for policy 0, policy_version 289576 (0.0016) [2025-01-05 14:30:59,954][19668] Updated weights for policy 0, policy_version 289586 (0.0016) [2025-01-05 14:30:59,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19797.4, 300 sec: 19397.0). Total num frames: 1186144256. Throughput: 0: 4915.9. Samples: 21534604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:30:59,965][19571] Avg episode reward: [(0, '10.538')] [2025-01-05 14:31:02,020][19668] Updated weights for policy 0, policy_version 289596 (0.0015) [2025-01-05 14:31:04,143][19668] Updated weights for policy 0, policy_version 289606 (0.0016) [2025-01-05 14:31:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19397.0). Total num frames: 1186242560. Throughput: 0: 4919.5. Samples: 21549294. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:04,965][19571] Avg episode reward: [(0, '10.850')] [2025-01-05 14:31:06,223][19668] Updated weights for policy 0, policy_version 289616 (0.0016) [2025-01-05 14:31:08,254][19668] Updated weights for policy 0, policy_version 289626 (0.0016) [2025-01-05 14:31:09,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19383.1). Total num frames: 1186336768. Throughput: 0: 4913.6. Samples: 21578794. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:09,965][19571] Avg episode reward: [(0, '10.859')] [2025-01-05 14:31:10,410][19668] Updated weights for policy 0, policy_version 289636 (0.0016) [2025-01-05 14:31:12,445][19668] Updated weights for policy 0, policy_version 289646 (0.0015) [2025-01-05 14:31:14,464][19668] Updated weights for policy 0, policy_version 289656 (0.0016) [2025-01-05 14:31:14,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19729.1, 300 sec: 19397.0). 
Total num frames: 1186439168. Throughput: 0: 4920.1. Samples: 21608622. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:14,965][19571] Avg episode reward: [(0, '10.327')] [2025-01-05 14:31:16,538][19668] Updated weights for policy 0, policy_version 289666 (0.0015) [2025-01-05 14:31:18,547][19668] Updated weights for policy 0, policy_version 289676 (0.0019) [2025-01-05 14:31:19,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19410.9). Total num frames: 1186537472. Throughput: 0: 4922.0. Samples: 21623704. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:19,965][19571] Avg episode reward: [(0, '9.182')] [2025-01-05 14:31:20,584][19668] Updated weights for policy 0, policy_version 289686 (0.0016) [2025-01-05 14:31:22,646][19668] Updated weights for policy 0, policy_version 289696 (0.0015) [2025-01-05 14:31:24,667][19668] Updated weights for policy 0, policy_version 289706 (0.0016) [2025-01-05 14:31:24,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19729.1, 300 sec: 19438.7). Total num frames: 1186639872. Throughput: 0: 4941.3. Samples: 21653804. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:24,965][19571] Avg episode reward: [(0, '9.137')] [2025-01-05 14:31:26,776][19668] Updated weights for policy 0, policy_version 289716 (0.0016) [2025-01-05 14:31:28,874][19668] Updated weights for policy 0, policy_version 289726 (0.0018) [2025-01-05 14:31:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19729.1, 300 sec: 19466.4). Total num frames: 1186738176. Throughput: 0: 4946.6. Samples: 21683224. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:29,965][19571] Avg episode reward: [(0, '9.496')] [2025-01-05 14:31:30,967][19668] Updated weights for policy 0, policy_version 289736 (0.0016) [2025-01-05 14:31:32,984][19668] Updated weights for policy 0, policy_version 289746 (0.0015) [2025-01-05 14:31:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19729.1, 300 sec: 19466.4). Total num frames: 1186836480. 
Throughput: 0: 4949.7. Samples: 21698166. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:34,965][19571] Avg episode reward: [(0, '8.995')] [2025-01-05 14:31:35,084][19668] Updated weights for policy 0, policy_version 289756 (0.0015) [2025-01-05 14:31:37,123][19668] Updated weights for policy 0, policy_version 289766 (0.0014) [2025-01-05 14:31:39,150][19668] Updated weights for policy 0, policy_version 289776 (0.0015) [2025-01-05 14:31:39,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19480.3). Total num frames: 1186934784. Throughput: 0: 4952.2. Samples: 21728132. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:39,965][19571] Avg episode reward: [(0, '9.096')] [2025-01-05 14:31:41,249][19668] Updated weights for policy 0, policy_version 289786 (0.0016) [2025-01-05 14:31:43,296][19668] Updated weights for policy 0, policy_version 289796 (0.0016) [2025-01-05 14:31:44,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.4, 300 sec: 19494.2). Total num frames: 1187033088. Throughput: 0: 4956.2. Samples: 21757632. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:44,965][19571] Avg episode reward: [(0, '9.306')] [2025-01-05 14:31:45,376][19668] Updated weights for policy 0, policy_version 289806 (0.0016) [2025-01-05 14:31:47,453][19668] Updated weights for policy 0, policy_version 289816 (0.0017) [2025-01-05 14:31:49,510][19668] Updated weights for policy 0, policy_version 289826 (0.0016) [2025-01-05 14:31:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19508.1). Total num frames: 1187135488. Throughput: 0: 4963.6. Samples: 21772658. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:49,965][19571] Avg episode reward: [(0, '9.771')] [2025-01-05 14:31:51,597][19668] Updated weights for policy 0, policy_version 289836 (0.0017) [2025-01-05 14:31:53,693][19668] Updated weights for policy 0, policy_version 289846 (0.0016) [2025-01-05 14:31:54,965][19571] Fps is (10 sec: 19660.5, 60 sec: 19729.1, 300 sec: 19494.2). Total num frames: 1187229696. Throughput: 0: 4964.4. Samples: 21802190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:54,965][19571] Avg episode reward: [(0, '10.572')] [2025-01-05 14:31:55,798][19668] Updated weights for policy 0, policy_version 289856 (0.0016) [2025-01-05 14:31:57,808][19668] Updated weights for policy 0, policy_version 289866 (0.0016) [2025-01-05 14:31:59,875][19668] Updated weights for policy 0, policy_version 289876 (0.0016) [2025-01-05 14:31:59,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19521.9). Total num frames: 1187332096. Throughput: 0: 4963.7. Samples: 21831988. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:31:59,966][19571] Avg episode reward: [(0, '8.504')] [2025-01-05 14:32:01,984][19668] Updated weights for policy 0, policy_version 289886 (0.0017) [2025-01-05 14:32:03,980][19668] Updated weights for policy 0, policy_version 289896 (0.0018) [2025-01-05 14:32:04,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19797.4, 300 sec: 19522.0). Total num frames: 1187430400. Throughput: 0: 4955.7. Samples: 21846712. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:32:04,965][19571] Avg episode reward: [(0, '9.473')] [2025-01-05 14:32:06,073][19668] Updated weights for policy 0, policy_version 289906 (0.0016) [2025-01-05 14:32:08,079][19668] Updated weights for policy 0, policy_version 289916 (0.0016) [2025-01-05 14:32:09,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.8, 300 sec: 19535.8). Total num frames: 1187532800. Throughput: 0: 4953.8. Samples: 21876726. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:32:09,965][19571] Avg episode reward: [(0, '9.565')] [2025-01-05 14:32:10,124][19668] Updated weights for policy 0, policy_version 289926 (0.0016) [2025-01-05 14:32:12,213][19668] Updated weights for policy 0, policy_version 289936 (0.0016) [2025-01-05 14:32:14,243][19668] Updated weights for policy 0, policy_version 289946 (0.0016) [2025-01-05 14:32:14,965][19571] Fps is (10 sec: 20069.7, 60 sec: 19865.5, 300 sec: 19549.7). Total num frames: 1187631104. Throughput: 0: 4966.9. Samples: 21906736. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:14,966][19571] Avg episode reward: [(0, '9.734')] [2025-01-05 14:32:16,318][19668] Updated weights for policy 0, policy_version 289956 (0.0016) [2025-01-05 14:32:18,416][19668] Updated weights for policy 0, policy_version 289966 (0.0017) [2025-01-05 14:32:19,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19563.6). Total num frames: 1187729408. Throughput: 0: 4962.0. Samples: 21921458. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:19,965][19571] Avg episode reward: [(0, '9.937')] [2025-01-05 14:32:20,528][19668] Updated weights for policy 0, policy_version 289976 (0.0016) [2025-01-05 14:32:22,492][19668] Updated weights for policy 0, policy_version 289986 (0.0015) [2025-01-05 14:32:24,866][19668] Updated weights for policy 0, policy_version 289996 (0.0015) [2025-01-05 14:32:24,965][19571] Fps is (10 sec: 19251.8, 60 sec: 19729.1, 300 sec: 19549.7). Total num frames: 1187823616. Throughput: 0: 4948.6. Samples: 21950820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:24,965][19571] Avg episode reward: [(0, '9.655')] [2025-01-05 14:32:27,253][19668] Updated weights for policy 0, policy_version 290006 (0.0019) [2025-01-05 14:32:29,453][19668] Updated weights for policy 0, policy_version 290016 (0.0016) [2025-01-05 14:32:29,965][19571] Fps is (10 sec: 18431.7, 60 sec: 19592.5, 300 sec: 19521.9). 
Total num frames: 1187913728. Throughput: 0: 4879.0. Samples: 21977190. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:29,966][19571] Avg episode reward: [(0, '9.688')] [2025-01-05 14:32:31,689][19668] Updated weights for policy 0, policy_version 290026 (0.0016) [2025-01-05 14:32:33,792][19668] Updated weights for policy 0, policy_version 290036 (0.0017) [2025-01-05 14:32:34,965][19571] Fps is (10 sec: 18431.9, 60 sec: 19524.2, 300 sec: 19508.1). Total num frames: 1188007936. Throughput: 0: 4859.7. Samples: 21991346. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:34,965][19571] Avg episode reward: [(0, '9.756')] [2025-01-05 14:32:35,940][19668] Updated weights for policy 0, policy_version 290046 (0.0017) [2025-01-05 14:32:38,011][19668] Updated weights for policy 0, policy_version 290056 (0.0017) [2025-01-05 14:32:39,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19521.9). Total num frames: 1188106240. Throughput: 0: 4845.7. Samples: 22020246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:39,965][19571] Avg episode reward: [(0, '9.656')] [2025-01-05 14:32:39,971][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000290065_1188106240.pth... [2025-01-05 14:32:40,028][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000288918_1183408128.pth [2025-01-05 14:32:40,190][19668] Updated weights for policy 0, policy_version 290066 (0.0018) [2025-01-05 14:32:42,241][19668] Updated weights for policy 0, policy_version 290076 (0.0015) [2025-01-05 14:32:44,319][19668] Updated weights for policy 0, policy_version 290086 (0.0016) [2025-01-05 14:32:44,965][19571] Fps is (10 sec: 19250.7, 60 sec: 19455.9, 300 sec: 19521.9). Total num frames: 1188200448. Throughput: 0: 4834.5. Samples: 22049540. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:44,966][19571] Avg episode reward: [(0, '10.567')] [2025-01-05 14:32:46,518][19668] Updated weights for policy 0, policy_version 290096 (0.0017) [2025-01-05 14:32:48,504][19668] Updated weights for policy 0, policy_version 290106 (0.0016) [2025-01-05 14:32:49,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19522.0). Total num frames: 1188298752. Throughput: 0: 4832.5. Samples: 22064176. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:49,965][19571] Avg episode reward: [(0, '10.789')] [2025-01-05 14:32:50,590][19668] Updated weights for policy 0, policy_version 290116 (0.0015) [2025-01-05 14:32:52,632][19668] Updated weights for policy 0, policy_version 290126 (0.0018) [2025-01-05 14:32:54,662][19668] Updated weights for policy 0, policy_version 290136 (0.0015) [2025-01-05 14:32:54,965][19571] Fps is (10 sec: 20070.9, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1188401152. Throughput: 0: 4832.0. Samples: 22094164. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:54,965][19571] Avg episode reward: [(0, '9.821')] [2025-01-05 14:32:56,855][19668] Updated weights for policy 0, policy_version 290146 (0.0017) [2025-01-05 14:32:58,901][19668] Updated weights for policy 0, policy_version 290156 (0.0016) [2025-01-05 14:32:59,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19387.8, 300 sec: 19535.8). Total num frames: 1188495360. Throughput: 0: 4812.9. Samples: 22123314. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:32:59,965][19571] Avg episode reward: [(0, '9.575')] [2025-01-05 14:33:01,012][19668] Updated weights for policy 0, policy_version 290166 (0.0017) [2025-01-05 14:33:03,082][19668] Updated weights for policy 0, policy_version 290176 (0.0016) [2025-01-05 14:33:04,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19549.7). Total num frames: 1188593664. Throughput: 0: 4814.0. Samples: 22138088. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:33:04,965][19571] Avg episode reward: [(0, '9.258')] [2025-01-05 14:33:05,255][19668] Updated weights for policy 0, policy_version 290186 (0.0016) [2025-01-05 14:33:07,232][19668] Updated weights for policy 0, policy_version 290196 (0.0015) [2025-01-05 14:33:09,308][19668] Updated weights for policy 0, policy_version 290206 (0.0015) [2025-01-05 14:33:09,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19549.7). Total num frames: 1188691968. Throughput: 0: 4818.4. Samples: 22167650. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:33:09,965][19571] Avg episode reward: [(0, '9.904')] [2025-01-05 14:33:11,501][19668] Updated weights for policy 0, policy_version 290216 (0.0018) [2025-01-05 14:33:13,581][19668] Updated weights for policy 0, policy_version 290226 (0.0016) [2025-01-05 14:33:14,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19319.6, 300 sec: 19563.6). Total num frames: 1188790272. Throughput: 0: 4867.4. Samples: 22196222. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:33:14,965][19571] Avg episode reward: [(0, '10.018')] [2025-01-05 14:33:15,817][19668] Updated weights for policy 0, policy_version 290236 (0.0017) [2025-01-05 14:33:18,306][19668] Updated weights for policy 0, policy_version 290246 (0.0018) [2025-01-05 14:33:19,965][19571] Fps is (10 sec: 18431.9, 60 sec: 19114.7, 300 sec: 19522.0). Total num frames: 1188876288. Throughput: 0: 4844.3. Samples: 22209340. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:33:19,965][19571] Avg episode reward: [(0, '8.974')] [2025-01-05 14:33:20,513][19668] Updated weights for policy 0, policy_version 290256 (0.0018) [2025-01-05 14:33:22,713][19668] Updated weights for policy 0, policy_version 290266 (0.0019) [2025-01-05 14:33:24,862][19668] Updated weights for policy 0, policy_version 290276 (0.0018) [2025-01-05 14:33:24,965][19571] Fps is (10 sec: 18022.5, 60 sec: 19114.7, 300 sec: 19522.0). 
Total num frames: 1188970496. Throughput: 0: 4812.5. Samples: 22236810. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:33:24,965][19571] Avg episode reward: [(0, '9.812')] [2025-01-05 14:33:27,053][19668] Updated weights for policy 0, policy_version 290286 (0.0018) [2025-01-05 14:33:29,180][19668] Updated weights for policy 0, policy_version 290296 (0.0018) [2025-01-05 14:33:29,965][19571] Fps is (10 sec: 18841.5, 60 sec: 19183.0, 300 sec: 19508.1). Total num frames: 1189064704. Throughput: 0: 4791.9. Samples: 22265176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:29,965][19571] Avg episode reward: [(0, '10.181')] [2025-01-05 14:33:31,454][19668] Updated weights for policy 0, policy_version 290306 (0.0019) [2025-01-05 14:33:33,444][19668] Updated weights for policy 0, policy_version 290316 (0.0016) [2025-01-05 14:33:34,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1189163008. Throughput: 0: 4785.4. Samples: 22279520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:34,965][19571] Avg episode reward: [(0, '9.433')] [2025-01-05 14:33:35,538][19668] Updated weights for policy 0, policy_version 290326 (0.0018) [2025-01-05 14:33:37,673][19668] Updated weights for policy 0, policy_version 290336 (0.0018) [2025-01-05 14:33:39,650][19668] Updated weights for policy 0, policy_version 290346 (0.0017) [2025-01-05 14:33:39,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19251.3, 300 sec: 19522.0). Total num frames: 1189261312. Throughput: 0: 4778.7. Samples: 22309206. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:39,965][19571] Avg episode reward: [(0, '9.695')] [2025-01-05 14:33:41,734][19668] Updated weights for policy 0, policy_version 290356 (0.0017) [2025-01-05 14:33:43,871][19668] Updated weights for policy 0, policy_version 290366 (0.0018) [2025-01-05 14:33:44,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19319.6, 300 sec: 19535.8). Total num frames: 1189359616. 
Throughput: 0: 4781.7. Samples: 22338492. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:44,965][19571] Avg episode reward: [(0, '10.314')] [2025-01-05 14:33:45,993][19668] Updated weights for policy 0, policy_version 290376 (0.0018) [2025-01-05 14:33:48,145][19668] Updated weights for policy 0, policy_version 290386 (0.0019) [2025-01-05 14:33:49,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19522.0). Total num frames: 1189453824. Throughput: 0: 4777.2. Samples: 22353064. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:49,965][19571] Avg episode reward: [(0, '10.059')] [2025-01-05 14:33:50,384][19668] Updated weights for policy 0, policy_version 290396 (0.0019) [2025-01-05 14:33:52,398][19668] Updated weights for policy 0, policy_version 290406 (0.0017) [2025-01-05 14:33:54,495][19668] Updated weights for policy 0, policy_version 290416 (0.0018) [2025-01-05 14:33:54,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19535.8). Total num frames: 1189552128. Throughput: 0: 4760.2. Samples: 22381860. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:54,965][19571] Avg episode reward: [(0, '9.342')] [2025-01-05 14:33:56,680][19668] Updated weights for policy 0, policy_version 290426 (0.0018) [2025-01-05 14:33:58,724][19668] Updated weights for policy 0, policy_version 290436 (0.0018) [2025-01-05 14:33:59,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19521.9). Total num frames: 1189646336. Throughput: 0: 4762.0. Samples: 22410514. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:33:59,965][19571] Avg episode reward: [(0, '10.076')] [2025-01-05 14:34:00,935][19668] Updated weights for policy 0, policy_version 290446 (0.0018) [2025-01-05 14:34:03,028][19668] Updated weights for policy 0, policy_version 290456 (0.0018) [2025-01-05 14:34:04,965][19571] Fps is (10 sec: 18841.5, 60 sec: 19114.6, 300 sec: 19522.0). Total num frames: 1189740544. Throughput: 0: 4797.4. 
Samples: 22425222. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:04,965][19571] Avg episode reward: [(0, '9.812')] [2025-01-05 14:34:05,204][19668] Updated weights for policy 0, policy_version 290466 (0.0018) [2025-01-05 14:34:07,261][19668] Updated weights for policy 0, policy_version 290476 (0.0017) [2025-01-05 14:34:09,346][19668] Updated weights for policy 0, policy_version 290486 (0.0018) [2025-01-05 14:34:09,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19114.6, 300 sec: 19521.9). Total num frames: 1189838848. Throughput: 0: 4835.6. Samples: 22454412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:09,966][19571] Avg episode reward: [(0, '10.193')] [2025-01-05 14:34:11,486][19668] Updated weights for policy 0, policy_version 290496 (0.0018) [2025-01-05 14:34:13,603][19668] Updated weights for policy 0, policy_version 290506 (0.0018) [2025-01-05 14:34:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19114.7, 300 sec: 19535.9). Total num frames: 1189937152. Throughput: 0: 4839.6. Samples: 22482956. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:14,965][19571] Avg episode reward: [(0, '9.741')] [2025-01-05 14:34:15,812][19668] Updated weights for policy 0, policy_version 290516 (0.0018) [2025-01-05 14:34:17,852][19668] Updated weights for policy 0, policy_version 290526 (0.0018) [2025-01-05 14:34:19,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19251.1, 300 sec: 19521.9). Total num frames: 1190031360. Throughput: 0: 4847.6. Samples: 22497664. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:19,966][19571] Avg episode reward: [(0, '9.777')] [2025-01-05 14:34:20,033][19668] Updated weights for policy 0, policy_version 290536 (0.0019) [2025-01-05 14:34:22,184][19668] Updated weights for policy 0, policy_version 290546 (0.0018) [2025-01-05 14:34:24,340][19668] Updated weights for policy 0, policy_version 290556 (0.0019) [2025-01-05 14:34:24,965][19571] Fps is (10 sec: 18841.5, 60 sec: 19251.2, 300 sec: 19522.0). Total num frames: 1190125568. Throughput: 0: 4819.7. Samples: 22526092. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:24,965][19571] Avg episode reward: [(0, '9.868')] [2025-01-05 14:34:26,516][19668] Updated weights for policy 0, policy_version 290566 (0.0017) [2025-01-05 14:34:28,593][19668] Updated weights for policy 0, policy_version 290576 (0.0017) [2025-01-05 14:34:29,965][19571] Fps is (10 sec: 19251.6, 60 sec: 19319.5, 300 sec: 19522.0). Total num frames: 1190223872. Throughput: 0: 4808.8. Samples: 22554888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:29,965][19571] Avg episode reward: [(0, '8.861')] [2025-01-05 14:34:30,705][19668] Updated weights for policy 0, policy_version 290586 (0.0018) [2025-01-05 14:34:32,747][19668] Updated weights for policy 0, policy_version 290596 (0.0017) [2025-01-05 14:34:34,845][19668] Updated weights for policy 0, policy_version 290606 (0.0017) [2025-01-05 14:34:34,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1190322176. Throughput: 0: 4818.0. Samples: 22569874. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:34,965][19571] Avg episode reward: [(0, '10.199')] [2025-01-05 14:34:37,034][19668] Updated weights for policy 0, policy_version 290616 (0.0019) [2025-01-05 14:34:39,096][19668] Updated weights for policy 0, policy_version 290626 (0.0017) [2025-01-05 14:34:39,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19508.1). 
Total num frames: 1190416384. Throughput: 0: 4821.8. Samples: 22598842. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:34:39,965][19571] Avg episode reward: [(0, '9.601')] [2025-01-05 14:34:40,018][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000290630_1190420480.pth... [2025-01-05 14:34:40,065][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000289490_1185751040.pth [2025-01-05 14:34:41,334][19668] Updated weights for policy 0, policy_version 290636 (0.0018) [2025-01-05 14:34:43,427][19668] Updated weights for policy 0, policy_version 290646 (0.0017) [2025-01-05 14:34:44,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19508.1). Total num frames: 1190514688. Throughput: 0: 4823.7. Samples: 22627580. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:34:44,965][19571] Avg episode reward: [(0, '10.423')] [2025-01-05 14:34:45,491][19668] Updated weights for policy 0, policy_version 290656 (0.0017) [2025-01-05 14:34:47,587][19668] Updated weights for policy 0, policy_version 290666 (0.0018) [2025-01-05 14:34:49,776][19668] Updated weights for policy 0, policy_version 290676 (0.0019) [2025-01-05 14:34:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.4, 300 sec: 19522.0). Total num frames: 1190612992. Throughput: 0: 4825.3. Samples: 22642362. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:34:49,965][19571] Avg episode reward: [(0, '10.451')] [2025-01-05 14:34:51,833][19668] Updated weights for policy 0, policy_version 290686 (0.0017) [2025-01-05 14:34:53,939][19668] Updated weights for policy 0, policy_version 290696 (0.0018) [2025-01-05 14:34:54,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1190707200. Throughput: 0: 4821.0. Samples: 22671358. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:34:54,965][19571] Avg episode reward: [(0, '10.642')] [2025-01-05 14:34:56,162][19668] Updated weights for policy 0, policy_version 290706 (0.0018) [2025-01-05 14:34:58,201][19668] Updated weights for policy 0, policy_version 290716 (0.0017) [2025-01-05 14:34:59,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19251.2, 300 sec: 19466.4). Total num frames: 1190801408. Throughput: 0: 4821.9. Samples: 22699942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:34:59,965][19571] Avg episode reward: [(0, '9.876')] [2025-01-05 14:35:00,423][19668] Updated weights for policy 0, policy_version 290726 (0.0017) [2025-01-05 14:35:02,511][19668] Updated weights for policy 0, policy_version 290736 (0.0017) [2025-01-05 14:35:04,574][19668] Updated weights for policy 0, policy_version 290746 (0.0018) [2025-01-05 14:35:04,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1190899712. Throughput: 0: 4817.3. Samples: 22714442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:04,965][19571] Avg episode reward: [(0, '10.505')] [2025-01-05 14:35:06,886][19668] Updated weights for policy 0, policy_version 290756 (0.0019) [2025-01-05 14:35:08,990][19668] Updated weights for policy 0, policy_version 290766 (0.0018) [2025-01-05 14:35:09,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.3, 300 sec: 19452.5). Total num frames: 1190993920. Throughput: 0: 4819.5. Samples: 22742968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:09,965][19571] Avg episode reward: [(0, '9.396')] [2025-01-05 14:35:11,169][19668] Updated weights for policy 0, policy_version 290776 (0.0020) [2025-01-05 14:35:13,292][19668] Updated weights for policy 0, policy_version 290786 (0.0017) [2025-01-05 14:35:14,965][19571] Fps is (10 sec: 18841.5, 60 sec: 19182.9, 300 sec: 19438.6). Total num frames: 1191088128. Throughput: 0: 4800.7. Samples: 22770920. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:14,966][19571] Avg episode reward: [(0, '10.297')] [2025-01-05 14:35:15,527][19668] Updated weights for policy 0, policy_version 290796 (0.0021) [2025-01-05 14:35:17,642][19668] Updated weights for policy 0, policy_version 290806 (0.0018) [2025-01-05 14:35:19,746][19668] Updated weights for policy 0, policy_version 290816 (0.0018) [2025-01-05 14:35:19,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.3, 300 sec: 19424.8). Total num frames: 1191186432. Throughput: 0: 4788.7. Samples: 22785366. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:19,965][19571] Avg episode reward: [(0, '9.285')] [2025-01-05 14:35:21,903][19668] Updated weights for policy 0, policy_version 290826 (0.0019) [2025-01-05 14:35:23,949][19668] Updated weights for policy 0, policy_version 290836 (0.0017) [2025-01-05 14:35:24,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1191280640. Throughput: 0: 4795.8. Samples: 22814654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:24,965][19571] Avg episode reward: [(0, '10.775')] [2025-01-05 14:35:26,139][19668] Updated weights for policy 0, policy_version 290846 (0.0018) [2025-01-05 14:35:28,231][19668] Updated weights for policy 0, policy_version 290856 (0.0018) [2025-01-05 14:35:29,965][19571] Fps is (10 sec: 18841.2, 60 sec: 19182.9, 300 sec: 19397.0). Total num frames: 1191374848. Throughput: 0: 4791.9. Samples: 22843214. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:29,966][19571] Avg episode reward: [(0, '10.357')] [2025-01-05 14:35:30,362][19668] Updated weights for policy 0, policy_version 290866 (0.0017) [2025-01-05 14:35:32,470][19668] Updated weights for policy 0, policy_version 290876 (0.0018) [2025-01-05 14:35:34,551][19668] Updated weights for policy 0, policy_version 290886 (0.0017) [2025-01-05 14:35:34,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19410.9). 
Total num frames: 1191473152. Throughput: 0: 4792.4. Samples: 22858020. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:34,965][19571] Avg episode reward: [(0, '9.367')] [2025-01-05 14:35:36,677][19668] Updated weights for policy 0, policy_version 290896 (0.0018) [2025-01-05 14:35:38,825][19668] Updated weights for policy 0, policy_version 290906 (0.0018) [2025-01-05 14:35:39,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19251.2, 300 sec: 19410.9). Total num frames: 1191571456. Throughput: 0: 4794.6. Samples: 22887116. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:39,966][19571] Avg episode reward: [(0, '9.486')] [2025-01-05 14:35:41,032][19668] Updated weights for policy 0, policy_version 290916 (0.0019) [2025-01-05 14:35:43,062][19668] Updated weights for policy 0, policy_version 290926 (0.0017) [2025-01-05 14:35:44,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19397.0). Total num frames: 1191665664. Throughput: 0: 4786.4. Samples: 22915328. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:44,965][19571] Avg episode reward: [(0, '9.375')] [2025-01-05 14:35:45,327][19668] Updated weights for policy 0, policy_version 290936 (0.0018) [2025-01-05 14:35:47,417][19668] Updated weights for policy 0, policy_version 290946 (0.0018) [2025-01-05 14:35:49,445][19668] Updated weights for policy 0, policy_version 290956 (0.0018) [2025-01-05 14:35:49,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19182.9, 300 sec: 19383.1). Total num frames: 1191763968. Throughput: 0: 4791.0. Samples: 22930036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:35:49,965][19571] Avg episode reward: [(0, '10.163')] [2025-01-05 14:35:51,641][19668] Updated weights for policy 0, policy_version 290966 (0.0017) [2025-01-05 14:35:53,722][19668] Updated weights for policy 0, policy_version 290976 (0.0018) [2025-01-05 14:35:54,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19369.2). Total num frames: 1191858176. 
Throughput: 0: 4806.1. Samples: 22959244. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:35:54,965][19571] Avg episode reward: [(0, '10.097')] [2025-01-05 14:35:55,895][19668] Updated weights for policy 0, policy_version 290986 (0.0018) [2025-01-05 14:35:57,997][19668] Updated weights for policy 0, policy_version 290996 (0.0018) [2025-01-05 14:35:59,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19369.2). Total num frames: 1191956480. Throughput: 0: 4817.7. Samples: 22987716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:35:59,965][19571] Avg episode reward: [(0, '9.951')] [2025-01-05 14:36:00,171][19668] Updated weights for policy 0, policy_version 291006 (0.0018) [2025-01-05 14:36:02,194][19668] Updated weights for policy 0, policy_version 291016 (0.0016) [2025-01-05 14:36:04,313][19668] Updated weights for policy 0, policy_version 291026 (0.0017) [2025-01-05 14:36:04,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19182.9, 300 sec: 19369.2). Total num frames: 1192050688. Throughput: 0: 4826.4. Samples: 23002554. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:04,966][19571] Avg episode reward: [(0, '9.979')] [2025-01-05 14:36:06,502][19668] Updated weights for policy 0, policy_version 291036 (0.0018) [2025-01-05 14:36:08,523][19668] Updated weights for policy 0, policy_version 291046 (0.0017) [2025-01-05 14:36:09,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19251.2, 300 sec: 19355.3). Total num frames: 1192148992. Throughput: 0: 4824.5. Samples: 23031756. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:09,965][19571] Avg episode reward: [(0, '9.692')] [2025-01-05 14:36:10,709][19668] Updated weights for policy 0, policy_version 291056 (0.0018) [2025-01-05 14:36:12,773][19668] Updated weights for policy 0, policy_version 291066 (0.0018) [2025-01-05 14:36:14,826][19668] Updated weights for policy 0, policy_version 291076 (0.0018) [2025-01-05 14:36:14,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19319.5, 300 sec: 19355.3). Total num frames: 1192247296. Throughput: 0: 4838.0. Samples: 23060922. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:14,965][19571] Avg episode reward: [(0, '10.476')] [2025-01-05 14:36:17,055][19668] Updated weights for policy 0, policy_version 291086 (0.0019) [2025-01-05 14:36:19,119][19668] Updated weights for policy 0, policy_version 291096 (0.0017) [2025-01-05 14:36:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19341.5). Total num frames: 1192345600. Throughput: 0: 4825.8. Samples: 23075182. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:19,965][19571] Avg episode reward: [(0, '9.772')] [2025-01-05 14:36:21,255][19668] Updated weights for policy 0, policy_version 291106 (0.0018) [2025-01-05 14:36:23,388][19668] Updated weights for policy 0, policy_version 291116 (0.0018) [2025-01-05 14:36:24,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19319.4, 300 sec: 19327.6). Total num frames: 1192439808. Throughput: 0: 4823.2. Samples: 23104162. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:24,966][19571] Avg episode reward: [(0, '9.655')] [2025-01-05 14:36:25,542][19668] Updated weights for policy 0, policy_version 291126 (0.0020) [2025-01-05 14:36:27,573][19668] Updated weights for policy 0, policy_version 291136 (0.0017) [2025-01-05 14:36:29,645][19668] Updated weights for policy 0, policy_version 291146 (0.0017) [2025-01-05 14:36:29,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19387.8, 300 sec: 19327.6). 
Total num frames: 1192538112. Throughput: 0: 4850.7. Samples: 23133608. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:29,965][19571] Avg episode reward: [(0, '10.586')] [2025-01-05 14:36:31,815][19668] Updated weights for policy 0, policy_version 291156 (0.0018) [2025-01-05 14:36:33,842][19668] Updated weights for policy 0, policy_version 291166 (0.0017) [2025-01-05 14:36:34,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19313.7). Total num frames: 1192632320. Throughput: 0: 4844.5. Samples: 23148038. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:34,965][19571] Avg episode reward: [(0, '9.360')] [2025-01-05 14:36:36,093][19668] Updated weights for policy 0, policy_version 291176 (0.0019) [2025-01-05 14:36:38,177][19668] Updated weights for policy 0, policy_version 291186 (0.0017) [2025-01-05 14:36:39,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19313.7). Total num frames: 1192730624. Throughput: 0: 4837.9. Samples: 23176948. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:39,965][19571] Avg episode reward: [(0, '9.067')] [2025-01-05 14:36:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000291194_1192730624.pth... [2025-01-05 14:36:40,030][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000290065_1188106240.pth [2025-01-05 14:36:40,292][19668] Updated weights for policy 0, policy_version 291196 (0.0019) [2025-01-05 14:36:42,431][19668] Updated weights for policy 0, policy_version 291206 (0.0018) [2025-01-05 14:36:44,524][19668] Updated weights for policy 0, policy_version 291216 (0.0018) [2025-01-05 14:36:44,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19299.8). Total num frames: 1192828928. Throughput: 0: 4848.0. Samples: 23205876. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:44,965][19571] Avg episode reward: [(0, '9.578')] [2025-01-05 14:36:46,703][19668] Updated weights for policy 0, policy_version 291226 (0.0019) [2025-01-05 14:36:48,811][19668] Updated weights for policy 0, policy_version 291236 (0.0017) [2025-01-05 14:36:49,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19299.8). Total num frames: 1192923136. Throughput: 0: 4834.7. Samples: 23220114. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:49,965][19571] Avg episode reward: [(0, '9.705')] [2025-01-05 14:36:51,003][19668] Updated weights for policy 0, policy_version 291246 (0.0018) [2025-01-05 14:36:53,012][19668] Updated weights for policy 0, policy_version 291256 (0.0017) [2025-01-05 14:36:54,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19285.9). Total num frames: 1193021440. Throughput: 0: 4830.1. Samples: 23249112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:54,965][19571] Avg episode reward: [(0, '9.304')] [2025-01-05 14:36:55,121][19668] Updated weights for policy 0, policy_version 291266 (0.0017) [2025-01-05 14:36:57,212][19668] Updated weights for policy 0, policy_version 291276 (0.0017) [2025-01-05 14:36:59,206][19668] Updated weights for policy 0, policy_version 291286 (0.0018) [2025-01-05 14:36:59,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19387.7, 300 sec: 19285.9). Total num frames: 1193119744. Throughput: 0: 4846.0. Samples: 23278994. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:36:59,965][19571] Avg episode reward: [(0, '9.956')] [2025-01-05 14:37:01,320][19668] Updated weights for policy 0, policy_version 291296 (0.0017) [2025-01-05 14:37:03,391][19668] Updated weights for policy 0, policy_version 291306 (0.0017) [2025-01-05 14:37:04,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19456.1, 300 sec: 19272.0). Total num frames: 1193218048. Throughput: 0: 4860.0. Samples: 23293884. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:37:04,965][19571] Avg episode reward: [(0, '8.773')] [2025-01-05 14:37:05,506][19668] Updated weights for policy 0, policy_version 291316 (0.0019) [2025-01-05 14:37:07,644][19668] Updated weights for policy 0, policy_version 291326 (0.0017) [2025-01-05 14:37:09,751][19668] Updated weights for policy 0, policy_version 291336 (0.0018) [2025-01-05 14:37:09,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19272.0). Total num frames: 1193316352. Throughput: 0: 4856.7. Samples: 23322712. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:09,965][19571] Avg episode reward: [(0, '9.753')] [2025-01-05 14:37:11,882][19668] Updated weights for policy 0, policy_version 291346 (0.0018) [2025-01-05 14:37:13,986][19668] Updated weights for policy 0, policy_version 291356 (0.0017) [2025-01-05 14:37:14,965][19571] Fps is (10 sec: 19250.8, 60 sec: 19387.7, 300 sec: 19258.1). Total num frames: 1193410560. Throughput: 0: 4848.8. Samples: 23351804. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:14,966][19571] Avg episode reward: [(0, '9.404')] [2025-01-05 14:37:16,113][19668] Updated weights for policy 0, policy_version 291366 (0.0017) [2025-01-05 14:37:18,148][19668] Updated weights for policy 0, policy_version 291376 (0.0017) [2025-01-05 14:37:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19272.0). Total num frames: 1193508864. Throughput: 0: 4856.3. Samples: 23366572. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:19,965][19571] Avg episode reward: [(0, '10.372')] [2025-01-05 14:37:20,359][19668] Updated weights for policy 0, policy_version 291386 (0.0017) [2025-01-05 14:37:22,423][19668] Updated weights for policy 0, policy_version 291396 (0.0017) [2025-01-05 14:37:24,425][19668] Updated weights for policy 0, policy_version 291406 (0.0017) [2025-01-05 14:37:24,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19299.8). 
Total num frames: 1193607168. Throughput: 0: 4865.1. Samples: 23395878. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:24,965][19571] Avg episode reward: [(0, '10.157')] [2025-01-05 14:37:26,526][19668] Updated weights for policy 0, policy_version 291416 (0.0018) [2025-01-05 14:37:28,587][19668] Updated weights for policy 0, policy_version 291426 (0.0017) [2025-01-05 14:37:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19313.7). Total num frames: 1193705472. Throughput: 0: 4886.0. Samples: 23425746. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:29,965][19571] Avg episode reward: [(0, '9.732')] [2025-01-05 14:37:30,598][19668] Updated weights for policy 0, policy_version 291436 (0.0017) [2025-01-05 14:37:32,712][19668] Updated weights for policy 0, policy_version 291446 (0.0017) [2025-01-05 14:37:34,790][19668] Updated weights for policy 0, policy_version 291456 (0.0018) [2025-01-05 14:37:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19592.6, 300 sec: 19327.6). Total num frames: 1193807872. Throughput: 0: 4901.1. Samples: 23440664. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:34,965][19571] Avg episode reward: [(0, '9.525')] [2025-01-05 14:37:36,945][19668] Updated weights for policy 0, policy_version 291466 (0.0019) [2025-01-05 14:37:39,093][19668] Updated weights for policy 0, policy_version 291476 (0.0018) [2025-01-05 14:37:39,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19455.9, 300 sec: 19313.7). Total num frames: 1193897984. Throughput: 0: 4895.7. Samples: 23469420. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:39,966][19571] Avg episode reward: [(0, '9.282')] [2025-01-05 14:37:41,284][19668] Updated weights for policy 0, policy_version 291486 (0.0018) [2025-01-05 14:37:43,284][19668] Updated weights for policy 0, policy_version 291496 (0.0017) [2025-01-05 14:37:44,965][19571] Fps is (10 sec: 18841.6, 60 sec: 19456.0, 300 sec: 19313.7). Total num frames: 1193996288. 
Throughput: 0: 4881.3. Samples: 23498652. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:44,965][19571] Avg episode reward: [(0, '10.267')] [2025-01-05 14:37:45,429][19668] Updated weights for policy 0, policy_version 291506 (0.0017) [2025-01-05 14:37:47,483][19668] Updated weights for policy 0, policy_version 291516 (0.0017) [2025-01-05 14:37:49,445][19668] Updated weights for policy 0, policy_version 291526 (0.0016) [2025-01-05 14:37:49,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19592.5, 300 sec: 19313.7). Total num frames: 1194098688. Throughput: 0: 4877.2. Samples: 23513358. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:49,965][19571] Avg episode reward: [(0, '10.525')] [2025-01-05 14:37:51,497][19668] Updated weights for policy 0, policy_version 291536 (0.0015) [2025-01-05 14:37:53,506][19668] Updated weights for policy 0, policy_version 291546 (0.0015) [2025-01-05 14:37:54,965][19571] Fps is (10 sec: 20479.9, 60 sec: 19660.8, 300 sec: 19341.4). Total num frames: 1194201088. Throughput: 0: 4918.3. Samples: 23544034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:54,965][19571] Avg episode reward: [(0, '9.892')] [2025-01-05 14:37:55,477][19668] Updated weights for policy 0, policy_version 291556 (0.0015) [2025-01-05 14:37:57,535][19668] Updated weights for policy 0, policy_version 291566 (0.0014) [2025-01-05 14:37:59,545][19668] Updated weights for policy 0, policy_version 291576 (0.0014) [2025-01-05 14:37:59,965][19571] Fps is (10 sec: 20480.0, 60 sec: 19729.1, 300 sec: 19355.3). Total num frames: 1194303488. Throughput: 0: 4948.5. Samples: 23574486. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:37:59,965][19571] Avg episode reward: [(0, '11.002')] [2025-01-05 14:38:01,545][19668] Updated weights for policy 0, policy_version 291586 (0.0015) [2025-01-05 14:38:03,595][19668] Updated weights for policy 0, policy_version 291596 (0.0015) [2025-01-05 14:38:04,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19729.0, 300 sec: 19355.3). Total num frames: 1194401792. Throughput: 0: 4958.4. Samples: 23589700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:38:04,965][19571] Avg episode reward: [(0, '10.418')] [2025-01-05 14:38:05,705][19668] Updated weights for policy 0, policy_version 291606 (0.0016) [2025-01-05 14:38:07,692][19668] Updated weights for policy 0, policy_version 291616 (0.0015) [2025-01-05 14:38:09,799][19668] Updated weights for policy 0, policy_version 291626 (0.0015) [2025-01-05 14:38:09,965][19571] Fps is (10 sec: 19659.7, 60 sec: 19728.9, 300 sec: 19355.3). Total num frames: 1194500096. Throughput: 0: 4970.0. Samples: 23619532. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:38:09,966][19571] Avg episode reward: [(0, '9.857')] [2025-01-05 14:38:11,910][19668] Updated weights for policy 0, policy_version 291636 (0.0016) [2025-01-05 14:38:13,907][19668] Updated weights for policy 0, policy_version 291646 (0.0016) [2025-01-05 14:38:14,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19397.0). Total num frames: 1194598400. Throughput: 0: 4966.8. Samples: 23649250. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 14:38:14,965][19571] Avg episode reward: [(0, '10.318')] [2025-01-05 14:38:16,000][19668] Updated weights for policy 0, policy_version 291656 (0.0016) [2025-01-05 14:38:18,064][19668] Updated weights for policy 0, policy_version 291666 (0.0016) [2025-01-05 14:38:19,965][19571] Fps is (10 sec: 20071.5, 60 sec: 19865.6, 300 sec: 19424.8). Total num frames: 1194700800. Throughput: 0: 4967.8. Samples: 23664214. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:19,965][19571] Avg episode reward: [(0, '10.484')] [2025-01-05 14:38:20,143][19668] Updated weights for policy 0, policy_version 291676 (0.0018) [2025-01-05 14:38:22,207][19668] Updated weights for policy 0, policy_version 291686 (0.0016) [2025-01-05 14:38:24,278][19668] Updated weights for policy 0, policy_version 291696 (0.0016) [2025-01-05 14:38:24,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 19438.6). Total num frames: 1194799104. Throughput: 0: 4985.4. Samples: 23693764. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:24,965][19571] Avg episode reward: [(0, '10.275')] [2025-01-05 14:38:26,360][19668] Updated weights for policy 0, policy_version 291706 (0.0016) [2025-01-05 14:38:28,447][19668] Updated weights for policy 0, policy_version 291716 (0.0016) [2025-01-05 14:38:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19438.6). Total num frames: 1194897408. Throughput: 0: 4989.3. Samples: 23723172. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:29,965][19571] Avg episode reward: [(0, '9.331')] [2025-01-05 14:38:30,552][19668] Updated weights for policy 0, policy_version 291726 (0.0016) [2025-01-05 14:38:32,549][19668] Updated weights for policy 0, policy_version 291736 (0.0018) [2025-01-05 14:38:34,632][19668] Updated weights for policy 0, policy_version 291746 (0.0016) [2025-01-05 14:38:34,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19438.6). Total num frames: 1194995712. Throughput: 0: 4995.7. Samples: 23738164. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:34,965][19571] Avg episode reward: [(0, '8.771')] [2025-01-05 14:38:36,762][19668] Updated weights for policy 0, policy_version 291756 (0.0016) [2025-01-05 14:38:38,730][19668] Updated weights for policy 0, policy_version 291766 (0.0016) [2025-01-05 14:38:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 19452.5). 
Total num frames: 1195098112. Throughput: 0: 4974.4. Samples: 23767884. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:39,965][19571] Avg episode reward: [(0, '9.293')] [2025-01-05 14:38:39,970][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000291772_1195098112.pth... [2025-01-05 14:38:40,018][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000290630_1190420480.pth [2025-01-05 14:38:40,868][19668] Updated weights for policy 0, policy_version 291776 (0.0016) [2025-01-05 14:38:42,926][19668] Updated weights for policy 0, policy_version 291786 (0.0015) [2025-01-05 14:38:44,885][19668] Updated weights for policy 0, policy_version 291796 (0.0015) [2025-01-05 14:38:44,965][19571] Fps is (10 sec: 20070.7, 60 sec: 20002.1, 300 sec: 19466.4). Total num frames: 1195196416. Throughput: 0: 4964.7. Samples: 23797896. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:44,965][19571] Avg episode reward: [(0, '9.198')] [2025-01-05 14:38:46,987][19668] Updated weights for policy 0, policy_version 291806 (0.0016) [2025-01-05 14:38:49,047][19668] Updated weights for policy 0, policy_version 291816 (0.0016) [2025-01-05 14:38:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19466.4). Total num frames: 1195294720. Throughput: 0: 4962.7. Samples: 23813022. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:49,965][19571] Avg episode reward: [(0, '9.139')] [2025-01-05 14:38:51,092][19668] Updated weights for policy 0, policy_version 291826 (0.0017) [2025-01-05 14:38:53,162][19668] Updated weights for policy 0, policy_version 291836 (0.0016) [2025-01-05 14:38:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19480.3). Total num frames: 1195393024. Throughput: 0: 4958.9. Samples: 23842678. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:54,965][19571] Avg episode reward: [(0, '10.758')] [2025-01-05 14:38:55,294][19668] Updated weights for policy 0, policy_version 291846 (0.0016) [2025-01-05 14:38:57,264][19668] Updated weights for policy 0, policy_version 291856 (0.0015) [2025-01-05 14:38:59,362][19668] Updated weights for policy 0, policy_version 291866 (0.0016) [2025-01-05 14:38:59,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19494.2). Total num frames: 1195491328. Throughput: 0: 4957.8. Samples: 23872352. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:38:59,965][19571] Avg episode reward: [(0, '10.668')] [2025-01-05 14:39:01,476][19668] Updated weights for policy 0, policy_version 291876 (0.0016) [2025-01-05 14:39:03,447][19668] Updated weights for policy 0, policy_version 291886 (0.0015) [2025-01-05 14:39:04,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19865.6, 300 sec: 19508.1). Total num frames: 1195593728. Throughput: 0: 4954.2. Samples: 23887152. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:04,965][19571] Avg episode reward: [(0, '10.270')] [2025-01-05 14:39:05,571][19668] Updated weights for policy 0, policy_version 291896 (0.0019) [2025-01-05 14:39:07,603][19668] Updated weights for policy 0, policy_version 291906 (0.0016) [2025-01-05 14:39:09,577][19668] Updated weights for policy 0, policy_version 291916 (0.0016) [2025-01-05 14:39:09,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19865.8, 300 sec: 19508.1). Total num frames: 1195692032. Throughput: 0: 4966.4. Samples: 23917252. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:09,965][19571] Avg episode reward: [(0, '9.292')] [2025-01-05 14:39:11,677][19668] Updated weights for policy 0, policy_version 291926 (0.0016) [2025-01-05 14:39:13,711][19668] Updated weights for policy 0, policy_version 291936 (0.0016) [2025-01-05 14:39:14,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19522.0). 
Total num frames: 1195790336. Throughput: 0: 4977.6. Samples: 23947162. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:14,965][19571] Avg episode reward: [(0, '9.379')] [2025-01-05 14:39:15,776][19668] Updated weights for policy 0, policy_version 291946 (0.0017) [2025-01-05 14:39:17,898][19668] Updated weights for policy 0, policy_version 291956 (0.0016) [2025-01-05 14:39:19,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19535.8). Total num frames: 1195888640. Throughput: 0: 4969.1. Samples: 23961774. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:19,965][19571] Avg episode reward: [(0, '9.888')] [2025-01-05 14:39:19,998][19668] Updated weights for policy 0, policy_version 291966 (0.0016) [2025-01-05 14:39:22,081][19668] Updated weights for policy 0, policy_version 291976 (0.0017) [2025-01-05 14:39:24,119][19668] Updated weights for policy 0, policy_version 291986 (0.0017) [2025-01-05 14:39:24,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19535.8). Total num frames: 1195986944. Throughput: 0: 4966.9. Samples: 23991396. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:24,965][19571] Avg episode reward: [(0, '10.502')] [2025-01-05 14:39:26,228][19668] Updated weights for policy 0, policy_version 291996 (0.0016) [2025-01-05 14:39:28,241][19668] Updated weights for policy 0, policy_version 292006 (0.0015) [2025-01-05 14:39:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19549.7). Total num frames: 1196089344. Throughput: 0: 4961.4. Samples: 24021158. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:39:29,965][19571] Avg episode reward: [(0, '10.825')] [2025-01-05 14:39:30,343][19668] Updated weights for policy 0, policy_version 292016 (0.0016) [2025-01-05 14:39:32,382][19668] Updated weights for policy 0, policy_version 292026 (0.0018) [2025-01-05 14:39:34,369][19668] Updated weights for policy 0, policy_version 292036 (0.0016) [2025-01-05 14:39:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19865.7, 300 sec: 19563.6). Total num frames: 1196187648. Throughput: 0: 4958.3. Samples: 24036144. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:34,965][19571] Avg episode reward: [(0, '10.387')] [2025-01-05 14:39:36,432][19668] Updated weights for policy 0, policy_version 292046 (0.0016) [2025-01-05 14:39:38,447][19668] Updated weights for policy 0, policy_version 292056 (0.0016) [2025-01-05 14:39:39,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19577.5). Total num frames: 1196290048. Throughput: 0: 4972.4. Samples: 24066436. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:39,965][19571] Avg episode reward: [(0, '9.863')] [2025-01-05 14:39:40,471][19668] Updated weights for policy 0, policy_version 292066 (0.0016) [2025-01-05 14:39:42,525][19668] Updated weights for policy 0, policy_version 292076 (0.0016) [2025-01-05 14:39:44,555][19668] Updated weights for policy 0, policy_version 292086 (0.0016) [2025-01-05 14:39:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19577.5). Total num frames: 1196388352. Throughput: 0: 4982.1. Samples: 24096548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:44,965][19571] Avg episode reward: [(0, '9.537')] [2025-01-05 14:39:46,653][19668] Updated weights for policy 0, policy_version 292096 (0.0016) [2025-01-05 14:39:48,695][19668] Updated weights for policy 0, policy_version 292106 (0.0016) [2025-01-05 14:39:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 19605.3). 
Total num frames: 1196490752. Throughput: 0: 4981.2. Samples: 24111306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:49,965][19571] Avg episode reward: [(0, '9.811')] [2025-01-05 14:39:50,812][19668] Updated weights for policy 0, policy_version 292116 (0.0017) [2025-01-05 14:39:52,813][19668] Updated weights for policy 0, policy_version 292126 (0.0016) [2025-01-05 14:39:54,867][19668] Updated weights for policy 0, policy_version 292136 (0.0017) [2025-01-05 14:39:54,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19933.8, 300 sec: 19619.1). Total num frames: 1196589056. Throughput: 0: 4977.4. Samples: 24141236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:54,965][19571] Avg episode reward: [(0, '8.708')] [2025-01-05 14:39:57,006][19668] Updated weights for policy 0, policy_version 292146 (0.0017) [2025-01-05 14:39:59,036][19668] Updated weights for policy 0, policy_version 292156 (0.0018) [2025-01-05 14:39:59,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.8, 300 sec: 19619.1). Total num frames: 1196687360. Throughput: 0: 4966.9. Samples: 24170672. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:39:59,965][19571] Avg episode reward: [(0, '10.150')] [2025-01-05 14:40:01,163][19668] Updated weights for policy 0, policy_version 292166 (0.0017) [2025-01-05 14:40:03,240][19668] Updated weights for policy 0, policy_version 292176 (0.0015) [2025-01-05 14:40:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19633.0). Total num frames: 1196785664. Throughput: 0: 4973.3. Samples: 24185572. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:04,965][19571] Avg episode reward: [(0, '11.233')] [2025-01-05 14:40:05,293][19668] Updated weights for policy 0, policy_version 292186 (0.0016) [2025-01-05 14:40:07,344][19668] Updated weights for policy 0, policy_version 292196 (0.0016) [2025-01-05 14:40:09,386][19668] Updated weights for policy 0, policy_version 292206 (0.0015) [2025-01-05 14:40:09,965][19571] Fps is (10 sec: 19660.2, 60 sec: 19865.5, 300 sec: 19646.9). Total num frames: 1196883968. Throughput: 0: 4979.8. Samples: 24215490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:09,966][19571] Avg episode reward: [(0, '10.062')] [2025-01-05 14:40:11,454][19668] Updated weights for policy 0, policy_version 292216 (0.0016) [2025-01-05 14:40:13,498][19668] Updated weights for policy 0, policy_version 292226 (0.0015) [2025-01-05 14:40:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19646.9). Total num frames: 1196982272. Throughput: 0: 4976.3. Samples: 24245092. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:14,965][19571] Avg episode reward: [(0, '9.598')] [2025-01-05 14:40:15,650][19668] Updated weights for policy 0, policy_version 292236 (0.0016) [2025-01-05 14:40:17,635][19668] Updated weights for policy 0, policy_version 292246 (0.0015) [2025-01-05 14:40:19,965][19668] Updated weights for policy 0, policy_version 292256 (0.0014) [2025-01-05 14:40:19,965][19571] Fps is (10 sec: 19661.6, 60 sec: 19865.7, 300 sec: 19660.8). Total num frames: 1197080576. Throughput: 0: 4975.2. Samples: 24260030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:19,965][19571] Avg episode reward: [(0, '9.667')] [2025-01-05 14:40:22,349][19668] Updated weights for policy 0, policy_version 292266 (0.0016) [2025-01-05 14:40:24,506][19668] Updated weights for policy 0, policy_version 292276 (0.0015) [2025-01-05 14:40:24,965][19571] Fps is (10 sec: 18432.0, 60 sec: 19660.8, 300 sec: 19633.0). 
Total num frames: 1197166592. Throughput: 0: 4895.1. Samples: 24286714. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:24,965][19571] Avg episode reward: [(0, '9.191')] [2025-01-05 14:40:26,709][19668] Updated weights for policy 0, policy_version 292286 (0.0016) [2025-01-05 14:40:28,833][19668] Updated weights for policy 0, policy_version 292296 (0.0015) [2025-01-05 14:40:29,965][19571] Fps is (10 sec: 18431.8, 60 sec: 19592.5, 300 sec: 19633.0). Total num frames: 1197264896. Throughput: 0: 4858.6. Samples: 24315186. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:29,965][19571] Avg episode reward: [(0, '9.947')] [2025-01-05 14:40:30,905][19668] Updated weights for policy 0, policy_version 292306 (0.0016) [2025-01-05 14:40:32,942][19668] Updated weights for policy 0, policy_version 292316 (0.0017) [2025-01-05 14:40:34,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19633.0). Total num frames: 1197363200. Throughput: 0: 4866.1. Samples: 24330280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:34,965][19571] Avg episode reward: [(0, '9.861')] [2025-01-05 14:40:35,116][19668] Updated weights for policy 0, policy_version 292326 (0.0016) [2025-01-05 14:40:37,192][19668] Updated weights for policy 0, policy_version 292336 (0.0019) [2025-01-05 14:40:39,261][19668] Updated weights for policy 0, policy_version 292346 (0.0016) [2025-01-05 14:40:39,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19646.9). Total num frames: 1197461504. Throughput: 0: 4846.4. Samples: 24359324. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:39,965][19571] Avg episode reward: [(0, '8.380')] [2025-01-05 14:40:39,973][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000292349_1197461504.pth... 
[2025-01-05 14:40:40,024][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000291194_1192730624.pth [2025-01-05 14:40:41,441][19668] Updated weights for policy 0, policy_version 292356 (0.0016) [2025-01-05 14:40:43,430][19668] Updated weights for policy 0, policy_version 292366 (0.0015) [2025-01-05 14:40:44,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19646.9). Total num frames: 1197559808. Throughput: 0: 4851.3. Samples: 24388982. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:40:44,965][19571] Avg episode reward: [(0, '10.218')] [2025-01-05 14:40:45,493][19668] Updated weights for policy 0, policy_version 292376 (0.0016) [2025-01-05 14:40:47,563][19668] Updated weights for policy 0, policy_version 292386 (0.0016) [2025-01-05 14:40:49,550][19668] Updated weights for policy 0, policy_version 292396 (0.0015) [2025-01-05 14:40:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19660.8). Total num frames: 1197658112. Throughput: 0: 4854.6. Samples: 24404030. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:40:49,965][19571] Avg episode reward: [(0, '11.170')] [2025-01-05 14:40:51,596][19668] Updated weights for policy 0, policy_version 292406 (0.0017) [2025-01-05 14:40:53,669][19668] Updated weights for policy 0, policy_version 292416 (0.0016) [2025-01-05 14:40:54,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19524.3, 300 sec: 19674.7). Total num frames: 1197760512. Throughput: 0: 4859.2. Samples: 24434152. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:40:54,965][19571] Avg episode reward: [(0, '10.803')] [2025-01-05 14:40:55,761][19668] Updated weights for policy 0, policy_version 292426 (0.0017) [2025-01-05 14:40:57,801][19668] Updated weights for policy 0, policy_version 292436 (0.0015) [2025-01-05 14:40:59,961][19668] Updated weights for policy 0, policy_version 292446 (0.0017) [2025-01-05 14:40:59,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19524.3, 300 sec: 19688.6). Total num frames: 1197858816. Throughput: 0: 4851.0. Samples: 24463386. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:40:59,965][19571] Avg episode reward: [(0, '10.279')] [2025-01-05 14:41:02,071][19668] Updated weights for policy 0, policy_version 292456 (0.0017) [2025-01-05 14:41:04,113][19668] Updated weights for policy 0, policy_version 292466 (0.0015) [2025-01-05 14:41:04,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19674.7). Total num frames: 1197953024. Throughput: 0: 4840.2. Samples: 24477840. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:04,965][19571] Avg episode reward: [(0, '10.863')] [2025-01-05 14:41:06,258][19668] Updated weights for policy 0, policy_version 292476 (0.0016) [2025-01-05 14:41:08,296][19668] Updated weights for policy 0, policy_version 292486 (0.0016) [2025-01-05 14:41:09,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19456.1, 300 sec: 19674.7). Total num frames: 1198051328. Throughput: 0: 4905.8. Samples: 24507478. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:09,965][19571] Avg episode reward: [(0, '10.271')] [2025-01-05 14:41:10,405][19668] Updated weights for policy 0, policy_version 292496 (0.0016) [2025-01-05 14:41:12,433][19668] Updated weights for policy 0, policy_version 292506 (0.0015) [2025-01-05 14:41:14,468][19668] Updated weights for policy 0, policy_version 292516 (0.0016) [2025-01-05 14:41:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19524.3, 300 sec: 19688.6). 
Total num frames: 1198153728. Throughput: 0: 4935.9. Samples: 24537302. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:14,965][19571] Avg episode reward: [(0, '9.390')] [2025-01-05 14:41:16,609][19668] Updated weights for policy 0, policy_version 292526 (0.0016) [2025-01-05 14:41:18,643][19668] Updated weights for policy 0, policy_version 292536 (0.0015) [2025-01-05 14:41:19,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19524.2, 300 sec: 19702.5). Total num frames: 1198252032. Throughput: 0: 4926.8. Samples: 24551986. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:19,965][19571] Avg episode reward: [(0, '10.652')] [2025-01-05 14:41:20,732][19668] Updated weights for policy 0, policy_version 292546 (0.0016) [2025-01-05 14:41:22,794][19668] Updated weights for policy 0, policy_version 292556 (0.0015) [2025-01-05 14:41:24,872][19668] Updated weights for policy 0, policy_version 292566 (0.0017) [2025-01-05 14:41:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19729.0, 300 sec: 19702.5). Total num frames: 1198350336. Throughput: 0: 4938.8. Samples: 24581572. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:24,965][19571] Avg episode reward: [(0, '10.099')] [2025-01-05 14:41:26,994][19668] Updated weights for policy 0, policy_version 292576 (0.0016) [2025-01-05 14:41:29,063][19668] Updated weights for policy 0, policy_version 292586 (0.0015) [2025-01-05 14:41:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19716.3). Total num frames: 1198448640. Throughput: 0: 4934.5. Samples: 24611034. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:29,965][19571] Avg episode reward: [(0, '10.095')] [2025-01-05 14:41:31,119][19668] Updated weights for policy 0, policy_version 292596 (0.0016) [2025-01-05 14:41:33,180][19668] Updated weights for policy 0, policy_version 292606 (0.0015) [2025-01-05 14:41:34,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19729.0, 300 sec: 19716.3). Total num frames: 1198546944. 
Throughput: 0: 4933.6. Samples: 24626042. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:34,965][19571] Avg episode reward: [(0, '10.910')] [2025-01-05 14:41:35,322][19668] Updated weights for policy 0, policy_version 292616 (0.0016) [2025-01-05 14:41:37,350][19668] Updated weights for policy 0, policy_version 292626 (0.0015) [2025-01-05 14:41:39,377][19668] Updated weights for policy 0, policy_version 292636 (0.0017) [2025-01-05 14:41:39,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19716.3). Total num frames: 1198645248. Throughput: 0: 4923.5. Samples: 24655712. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:39,965][19571] Avg episode reward: [(0, '9.486')] [2025-01-05 14:41:41,481][19668] Updated weights for policy 0, policy_version 292646 (0.0016) [2025-01-05 14:41:43,493][19668] Updated weights for policy 0, policy_version 292656 (0.0017) [2025-01-05 14:41:44,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19744.1). Total num frames: 1198747648. Throughput: 0: 4936.4. Samples: 24685522. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:44,965][19571] Avg episode reward: [(0, '9.752')] [2025-01-05 14:41:45,551][19668] Updated weights for policy 0, policy_version 292666 (0.0015) [2025-01-05 14:41:47,605][19668] Updated weights for policy 0, policy_version 292676 (0.0015) [2025-01-05 14:41:49,637][19668] Updated weights for policy 0, policy_version 292686 (0.0014) [2025-01-05 14:41:49,965][19571] Fps is (10 sec: 20069.8, 60 sec: 19797.2, 300 sec: 19744.1). Total num frames: 1198845952. Throughput: 0: 4948.4. Samples: 24700518. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:49,966][19571] Avg episode reward: [(0, '10.842')] [2025-01-05 14:41:51,766][19668] Updated weights for policy 0, policy_version 292696 (0.0016) [2025-01-05 14:41:53,816][19668] Updated weights for policy 0, policy_version 292706 (0.0015) [2025-01-05 14:41:54,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19744.1). Total num frames: 1198944256. Throughput: 0: 4948.8. Samples: 24730176. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 14:41:54,965][19571] Avg episode reward: [(0, '10.427')] [2025-01-05 14:41:55,937][19668] Updated weights for policy 0, policy_version 292716 (0.0017) [2025-01-05 14:41:58,045][19668] Updated weights for policy 0, policy_version 292726 (0.0017) [2025-01-05 14:41:59,965][19571] Fps is (10 sec: 19251.8, 60 sec: 19660.8, 300 sec: 19730.2). Total num frames: 1199038464. Throughput: 0: 4928.6. Samples: 24759088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:41:59,966][19571] Avg episode reward: [(0, '10.426')] [2025-01-05 14:42:00,206][19668] Updated weights for policy 0, policy_version 292736 (0.0016) [2025-01-05 14:42:02,358][19668] Updated weights for policy 0, policy_version 292746 (0.0016) [2025-01-05 14:42:04,709][19668] Updated weights for policy 0, policy_version 292756 (0.0022) [2025-01-05 14:42:04,965][19571] Fps is (10 sec: 18431.8, 60 sec: 19592.5, 300 sec: 19702.4). Total num frames: 1199128576. Throughput: 0: 4910.1. Samples: 24772940. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:04,966][19571] Avg episode reward: [(0, '9.639')] [2025-01-05 14:42:06,990][19668] Updated weights for policy 0, policy_version 292766 (0.0019) [2025-01-05 14:42:09,129][19668] Updated weights for policy 0, policy_version 292776 (0.0019) [2025-01-05 14:42:09,965][19571] Fps is (10 sec: 18432.1, 60 sec: 19524.3, 300 sec: 19702.5). Total num frames: 1199222784. Throughput: 0: 4858.4. Samples: 24800198. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:09,965][19571] Avg episode reward: [(0, '9.917')] [2025-01-05 14:42:11,370][19668] Updated weights for policy 0, policy_version 292786 (0.0018) [2025-01-05 14:42:13,469][19668] Updated weights for policy 0, policy_version 292796 (0.0018) [2025-01-05 14:42:14,965][19571] Fps is (10 sec: 18841.7, 60 sec: 19387.7, 300 sec: 19688.6). Total num frames: 1199316992. Throughput: 0: 4832.5. Samples: 24828496. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:14,966][19571] Avg episode reward: [(0, '10.214')] [2025-01-05 14:42:15,660][19668] Updated weights for policy 0, policy_version 292806 (0.0018) [2025-01-05 14:42:17,785][19668] Updated weights for policy 0, policy_version 292816 (0.0018) [2025-01-05 14:42:19,891][19668] Updated weights for policy 0, policy_version 292826 (0.0018) [2025-01-05 14:42:19,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19688.6). Total num frames: 1199415296. Throughput: 0: 4821.0. Samples: 24842988. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:19,965][19571] Avg episode reward: [(0, '10.691')] [2025-01-05 14:42:22,097][19668] Updated weights for policy 0, policy_version 292836 (0.0019) [2025-01-05 14:42:24,205][19668] Updated weights for policy 0, policy_version 292846 (0.0018) [2025-01-05 14:42:24,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19674.7). Total num frames: 1199509504. Throughput: 0: 4799.4. Samples: 24871686. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:24,965][19571] Avg episode reward: [(0, '9.926')] [2025-01-05 14:42:26,351][19668] Updated weights for policy 0, policy_version 292856 (0.0018) [2025-01-05 14:42:28,417][19668] Updated weights for policy 0, policy_version 292866 (0.0018) [2025-01-05 14:42:29,965][19571] Fps is (10 sec: 18841.7, 60 sec: 19251.2, 300 sec: 19646.9). Total num frames: 1199603712. Throughput: 0: 4774.9. Samples: 24900394. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:29,966][19571] Avg episode reward: [(0, '9.744')] [2025-01-05 14:42:30,601][19668] Updated weights for policy 0, policy_version 292876 (0.0017) [2025-01-05 14:42:32,649][19668] Updated weights for policy 0, policy_version 292886 (0.0018) [2025-01-05 14:42:34,746][19668] Updated weights for policy 0, policy_version 292896 (0.0019) [2025-01-05 14:42:34,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19674.7). Total num frames: 1199702016. Throughput: 0: 4770.8. Samples: 24915202. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:34,965][19571] Avg episode reward: [(0, '10.003')] [2025-01-05 14:42:36,944][19668] Updated weights for policy 0, policy_version 292906 (0.0017) [2025-01-05 14:42:39,004][19668] Updated weights for policy 0, policy_version 292916 (0.0018) [2025-01-05 14:42:39,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19251.2, 300 sec: 19674.7). Total num frames: 1199800320. Throughput: 0: 4758.7. Samples: 24944318. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:39,965][19571] Avg episode reward: [(0, '10.420')] [2025-01-05 14:42:39,974][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000292920_1199800320.pth... [2025-01-05 14:42:40,040][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000291772_1195098112.pth [2025-01-05 14:42:41,179][19668] Updated weights for policy 0, policy_version 292926 (0.0019) [2025-01-05 14:42:43,250][19668] Updated weights for policy 0, policy_version 292936 (0.0017) [2025-01-05 14:42:44,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19183.0, 300 sec: 19660.8). Total num frames: 1199898624. Throughput: 0: 4761.6. Samples: 24973358. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:44,965][19571] Avg episode reward: [(0, '9.743')] [2025-01-05 14:42:45,362][19668] Updated weights for policy 0, policy_version 292946 (0.0018) [2025-01-05 14:42:47,380][19668] Updated weights for policy 0, policy_version 292956 (0.0017) [2025-01-05 14:42:49,487][19668] Updated weights for policy 0, policy_version 292966 (0.0018) [2025-01-05 14:42:49,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19183.0, 300 sec: 19646.9). Total num frames: 1199996928. Throughput: 0: 4782.5. Samples: 24988152. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:49,965][19571] Avg episode reward: [(0, '10.487')] [2025-01-05 14:42:51,678][19668] Updated weights for policy 0, policy_version 292976 (0.0017) [2025-01-05 14:42:53,710][19668] Updated weights for policy 0, policy_version 292986 (0.0017) [2025-01-05 14:42:54,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19114.7, 300 sec: 19619.2). Total num frames: 1200091136. Throughput: 0: 4822.6. Samples: 25017214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:54,965][19571] Avg episode reward: [(0, '8.677')] [2025-01-05 14:42:55,913][19668] Updated weights for policy 0, policy_version 292996 (0.0018) [2025-01-05 14:42:58,032][19668] Updated weights for policy 0, policy_version 293006 (0.0018) [2025-01-05 14:42:59,965][19571] Fps is (10 sec: 18841.4, 60 sec: 19114.6, 300 sec: 19605.3). Total num frames: 1200185344. Throughput: 0: 4826.4. Samples: 25045686. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:42:59,966][19571] Avg episode reward: [(0, '10.186')] [2025-01-05 14:43:00,198][19668] Updated weights for policy 0, policy_version 293016 (0.0018) [2025-01-05 14:43:02,274][19668] Updated weights for policy 0, policy_version 293026 (0.0018) [2025-01-05 14:43:04,428][19668] Updated weights for policy 0, policy_version 293036 (0.0018) [2025-01-05 14:43:04,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19605.3). 
Total num frames: 1200283648. Throughput: 0: 4828.4. Samples: 25060264. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:43:04,965][19571] Avg episode reward: [(0, '10.040')] [2025-01-05 14:43:06,554][19668] Updated weights for policy 0, policy_version 293046 (0.0018) [2025-01-05 14:43:08,647][19668] Updated weights for policy 0, policy_version 293056 (0.0017) [2025-01-05 14:43:09,966][19571] Fps is (10 sec: 19250.0, 60 sec: 19251.0, 300 sec: 19591.3). Total num frames: 1200377856. Throughput: 0: 4832.2. Samples: 25089138. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:43:09,966][19571] Avg episode reward: [(0, '10.612')] [2025-01-05 14:43:10,933][19668] Updated weights for policy 0, policy_version 293066 (0.0020) [2025-01-05 14:43:12,948][19668] Updated weights for policy 0, policy_version 293076 (0.0017) [2025-01-05 14:43:14,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19577.5). Total num frames: 1200476160. Throughput: 0: 4834.9. Samples: 25117962. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:14,965][19571] Avg episode reward: [(0, '9.588')] [2025-01-05 14:43:15,062][19668] Updated weights for policy 0, policy_version 293086 (0.0018) [2025-01-05 14:43:17,279][19668] Updated weights for policy 0, policy_version 293096 (0.0018) [2025-01-05 14:43:19,286][19668] Updated weights for policy 0, policy_version 293106 (0.0018) [2025-01-05 14:43:19,965][19571] Fps is (10 sec: 19662.3, 60 sec: 19319.5, 300 sec: 19577.5). Total num frames: 1200574464. Throughput: 0: 4822.0. Samples: 25132194. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:19,965][19571] Avg episode reward: [(0, '9.630')] [2025-01-05 14:43:21,392][19668] Updated weights for policy 0, policy_version 293116 (0.0017) [2025-01-05 14:43:23,491][19668] Updated weights for policy 0, policy_version 293126 (0.0017) [2025-01-05 14:43:24,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19387.8, 300 sec: 19577.5). Total num frames: 1200672768. 
Throughput: 0: 4833.0. Samples: 25161800. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:24,965][19571] Avg episode reward: [(0, '9.461')] [2025-01-05 14:43:25,582][19668] Updated weights for policy 0, policy_version 293136 (0.0019) [2025-01-05 14:43:27,672][19668] Updated weights for policy 0, policy_version 293146 (0.0017) [2025-01-05 14:43:29,780][19668] Updated weights for policy 0, policy_version 293156 (0.0018) [2025-01-05 14:43:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1200771072. Throughput: 0: 4842.8. Samples: 25191286. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:29,965][19571] Avg episode reward: [(0, '10.357')] [2025-01-05 14:43:31,882][19668] Updated weights for policy 0, policy_version 293166 (0.0019) [2025-01-05 14:43:33,978][19668] Updated weights for policy 0, policy_version 293176 (0.0017) [2025-01-05 14:43:34,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19549.7). Total num frames: 1200865280. Throughput: 0: 4830.6. Samples: 25205528. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:34,965][19571] Avg episode reward: [(0, '10.136')] [2025-01-05 14:43:36,230][19668] Updated weights for policy 0, policy_version 293186 (0.0019) [2025-01-05 14:43:38,254][19668] Updated weights for policy 0, policy_version 293196 (0.0017) [2025-01-05 14:43:39,965][19571] Fps is (10 sec: 18841.6, 60 sec: 19319.5, 300 sec: 19535.8). Total num frames: 1200959488. Throughput: 0: 4827.6. Samples: 25234458. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:39,965][19571] Avg episode reward: [(0, '10.514')] [2025-01-05 14:43:40,397][19668] Updated weights for policy 0, policy_version 293206 (0.0018) [2025-01-05 14:43:42,523][19668] Updated weights for policy 0, policy_version 293216 (0.0017) [2025-01-05 14:43:44,531][19668] Updated weights for policy 0, policy_version 293226 (0.0017) [2025-01-05 14:43:44,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19535.8). Total num frames: 1201057792. Throughput: 0: 4847.4. Samples: 25263820. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:44,965][19571] Avg episode reward: [(0, '9.224')] [2025-01-05 14:43:46,603][19668] Updated weights for policy 0, policy_version 293236 (0.0017) [2025-01-05 14:43:48,845][19668] Updated weights for policy 0, policy_version 293246 (0.0017) [2025-01-05 14:43:49,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19535.8). Total num frames: 1201156096. Throughput: 0: 4846.9. Samples: 25278376. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:49,965][19571] Avg episode reward: [(0, '10.245')] [2025-01-05 14:43:50,967][19668] Updated weights for policy 0, policy_version 293256 (0.0018) [2025-01-05 14:43:53,020][19668] Updated weights for policy 0, policy_version 293266 (0.0018) [2025-01-05 14:43:54,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.5, 300 sec: 19521.9). Total num frames: 1201250304. Throughput: 0: 4842.0. Samples: 25307026. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:54,965][19571] Avg episode reward: [(0, '10.256')] [2025-01-05 14:43:55,251][19668] Updated weights for policy 0, policy_version 293276 (0.0018) [2025-01-05 14:43:57,316][19668] Updated weights for policy 0, policy_version 293286 (0.0017) [2025-01-05 14:43:59,369][19668] Updated weights for policy 0, policy_version 293296 (0.0018) [2025-01-05 14:43:59,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19508.1). 
Total num frames: 1201348608. Throughput: 0: 4849.7. Samples: 25336200. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:43:59,966][19571] Avg episode reward: [(0, '11.525')] [2025-01-05 14:44:01,569][19668] Updated weights for policy 0, policy_version 293306 (0.0018) [2025-01-05 14:44:03,629][19668] Updated weights for policy 0, policy_version 293316 (0.0019) [2025-01-05 14:44:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19508.1). Total num frames: 1201446912. Throughput: 0: 4856.8. Samples: 25350748. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:44:04,965][19571] Avg episode reward: [(0, '9.506')] [2025-01-05 14:44:05,770][19668] Updated weights for policy 0, policy_version 293326 (0.0018) [2025-01-05 14:44:07,932][19668] Updated weights for policy 0, policy_version 293336 (0.0018) [2025-01-05 14:44:09,965][19571] Fps is (10 sec: 19251.7, 60 sec: 19388.0, 300 sec: 19494.2). Total num frames: 1201541120. Throughput: 0: 4834.8. Samples: 25379364. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:44:09,965][19571] Avg episode reward: [(0, '10.175')] [2025-01-05 14:44:10,110][19668] Updated weights for policy 0, policy_version 293346 (0.0018) [2025-01-05 14:44:12,305][19668] Updated weights for policy 0, policy_version 293356 (0.0019) [2025-01-05 14:44:14,427][19668] Updated weights for policy 0, policy_version 293366 (0.0018) [2025-01-05 14:44:14,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19319.5, 300 sec: 19480.3). Total num frames: 1201635328. Throughput: 0: 4811.7. Samples: 25407814. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:44:14,965][19571] Avg episode reward: [(0, '9.409')] [2025-01-05 14:44:16,604][19668] Updated weights for policy 0, policy_version 293376 (0.0018) [2025-01-05 14:44:18,610][19668] Updated weights for policy 0, policy_version 293386 (0.0017) [2025-01-05 14:44:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19480.3). Total num frames: 1201733632. 
Throughput: 0: 4817.7. Samples: 25422326. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:44:19,965][19571] Avg episode reward: [(0, '10.043')] [2025-01-05 14:44:20,734][19668] Updated weights for policy 0, policy_version 293396 (0.0017) [2025-01-05 14:44:22,800][19668] Updated weights for policy 0, policy_version 293406 (0.0018) [2025-01-05 14:44:24,884][19668] Updated weights for policy 0, policy_version 293416 (0.0018) [2025-01-05 14:44:24,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19319.4, 300 sec: 19466.4). Total num frames: 1201831936. Throughput: 0: 4829.8. Samples: 25451798. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:24,965][19571] Avg episode reward: [(0, '10.214')] [2025-01-05 14:44:27,110][19668] Updated weights for policy 0, policy_version 293426 (0.0018) [2025-01-05 14:44:29,185][19668] Updated weights for policy 0, policy_version 293436 (0.0017) [2025-01-05 14:44:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1201930240. Throughput: 0: 4823.7. Samples: 25480886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:29,965][19571] Avg episode reward: [(0, '10.042')] [2025-01-05 14:44:31,285][19668] Updated weights for policy 0, policy_version 293446 (0.0019) [2025-01-05 14:44:33,401][19668] Updated weights for policy 0, policy_version 293456 (0.0019) [2025-01-05 14:44:34,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19438.6). Total num frames: 1202024448. Throughput: 0: 4820.3. Samples: 25495290. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:34,965][19571] Avg episode reward: [(0, '10.203')] [2025-01-05 14:44:35,594][19668] Updated weights for policy 0, policy_version 293466 (0.0017) [2025-01-05 14:44:37,586][19668] Updated weights for policy 0, policy_version 293476 (0.0017) [2025-01-05 14:44:39,703][19668] Updated weights for policy 0, policy_version 293486 (0.0018) [2025-01-05 14:44:39,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19387.7, 300 sec: 19438.6). Total num frames: 1202122752. Throughput: 0: 4833.6. Samples: 25524540. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:39,965][19571] Avg episode reward: [(0, '9.878')] [2025-01-05 14:44:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000293487_1202122752.pth... [2025-01-05 14:44:40,029][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000292349_1197461504.pth [2025-01-05 14:44:42,002][19668] Updated weights for policy 0, policy_version 293496 (0.0019) [2025-01-05 14:44:44,007][19668] Updated weights for policy 0, policy_version 293506 (0.0018) [2025-01-05 14:44:44,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1202216960. Throughput: 0: 4819.4. Samples: 25553070. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:44,965][19571] Avg episode reward: [(0, '9.812')] [2025-01-05 14:44:46,144][19668] Updated weights for policy 0, policy_version 293516 (0.0018) [2025-01-05 14:44:48,226][19668] Updated weights for policy 0, policy_version 293526 (0.0018) [2025-01-05 14:44:49,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1202315264. Throughput: 0: 4825.9. Samples: 25567914. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:49,965][19571] Avg episode reward: [(0, '9.162')] [2025-01-05 14:44:50,307][19668] Updated weights for policy 0, policy_version 293536 (0.0019) [2025-01-05 14:44:52,419][19668] Updated weights for policy 0, policy_version 293546 (0.0017) [2025-01-05 14:44:54,503][19668] Updated weights for policy 0, policy_version 293556 (0.0018) [2025-01-05 14:44:54,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19410.9). Total num frames: 1202413568. Throughput: 0: 4842.7. Samples: 25597284. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:54,965][19571] Avg episode reward: [(0, '10.139')] [2025-01-05 14:44:56,540][19668] Updated weights for policy 0, policy_version 293566 (0.0017) [2025-01-05 14:44:58,634][19668] Updated weights for policy 0, policy_version 293576 (0.0018) [2025-01-05 14:44:59,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19397.0). Total num frames: 1202507776. Throughput: 0: 4858.5. Samples: 25626448. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:44:59,965][19571] Avg episode reward: [(0, '9.484')] [2025-01-05 14:45:00,838][19668] Updated weights for policy 0, policy_version 293586 (0.0018) [2025-01-05 14:45:02,865][19668] Updated weights for policy 0, policy_version 293596 (0.0017) [2025-01-05 14:45:04,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19397.0). Total num frames: 1202606080. Throughput: 0: 4862.7. Samples: 25641146. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:04,965][19571] Avg episode reward: [(0, '10.007')] [2025-01-05 14:45:05,094][19668] Updated weights for policy 0, policy_version 293606 (0.0019) [2025-01-05 14:45:07,267][19668] Updated weights for policy 0, policy_version 293616 (0.0018) [2025-01-05 14:45:09,319][19668] Updated weights for policy 0, policy_version 293626 (0.0019) [2025-01-05 14:45:09,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19319.4, 300 sec: 19383.1). 
Total num frames: 1202700288. Throughput: 0: 4839.5. Samples: 25669576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:09,965][19571] Avg episode reward: [(0, '10.505')] [2025-01-05 14:45:11,537][19668] Updated weights for policy 0, policy_version 293636 (0.0017) [2025-01-05 14:45:13,638][19668] Updated weights for policy 0, policy_version 293646 (0.0018) [2025-01-05 14:45:14,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19319.5, 300 sec: 19369.2). Total num frames: 1202794496. Throughput: 0: 4832.6. Samples: 25698354. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:14,965][19571] Avg episode reward: [(0, '9.798')] [2025-01-05 14:45:15,744][19668] Updated weights for policy 0, policy_version 293656 (0.0018) [2025-01-05 14:45:17,860][19668] Updated weights for policy 0, policy_version 293666 (0.0017) [2025-01-05 14:45:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.4, 300 sec: 19410.9). Total num frames: 1202892800. Throughput: 0: 4838.1. Samples: 25713006. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:19,966][19571] Avg episode reward: [(0, '9.662')] [2025-01-05 14:45:20,077][19668] Updated weights for policy 0, policy_version 293676 (0.0019) [2025-01-05 14:45:22,167][19668] Updated weights for policy 0, policy_version 293686 (0.0017) [2025-01-05 14:45:24,304][19668] Updated weights for policy 0, policy_version 293696 (0.0018) [2025-01-05 14:45:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1202991104. Throughput: 0: 4821.7. Samples: 25741516. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:24,965][19571] Avg episode reward: [(0, '11.546')] [2025-01-05 14:45:26,470][19668] Updated weights for policy 0, policy_version 293706 (0.0017) [2025-01-05 14:45:28,480][19668] Updated weights for policy 0, policy_version 293716 (0.0018) [2025-01-05 14:45:29,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1203089408. 
Throughput: 0: 4839.3. Samples: 25770838. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:29,965][19571] Avg episode reward: [(0, '11.179')] [2025-01-05 14:45:30,596][19668] Updated weights for policy 0, policy_version 293726 (0.0016) [2025-01-05 14:45:32,633][19668] Updated weights for policy 0, policy_version 293736 (0.0017) [2025-01-05 14:45:34,646][19668] Updated weights for policy 0, policy_version 293746 (0.0017) [2025-01-05 14:45:34,965][19571] Fps is (10 sec: 19660.3, 60 sec: 19387.6, 300 sec: 19410.9). Total num frames: 1203187712. Throughput: 0: 4842.1. Samples: 25785812. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) [2025-01-05 14:45:34,966][19571] Avg episode reward: [(0, '9.349')] [2025-01-05 14:45:36,767][19668] Updated weights for policy 0, policy_version 293756 (0.0018) [2025-01-05 14:45:38,857][19668] Updated weights for policy 0, policy_version 293766 (0.0018) [2025-01-05 14:45:39,965][19571] Fps is (10 sec: 19250.9, 60 sec: 19319.5, 300 sec: 19397.0). Total num frames: 1203281920. Throughput: 0: 4847.3. Samples: 25815412. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:45:39,966][19571] Avg episode reward: [(0, '9.427')] [2025-01-05 14:45:41,002][19668] Updated weights for policy 0, policy_version 293776 (0.0019) [2025-01-05 14:45:43,121][19668] Updated weights for policy 0, policy_version 293786 (0.0018) [2025-01-05 14:45:44,965][19571] Fps is (10 sec: 19251.8, 60 sec: 19387.7, 300 sec: 19397.0). Total num frames: 1203380224. Throughput: 0: 4835.2. Samples: 25844034. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:45:44,965][19571] Avg episode reward: [(0, '9.431')] [2025-01-05 14:45:45,298][19668] Updated weights for policy 0, policy_version 293796 (0.0017) [2025-01-05 14:45:47,320][19668] Updated weights for policy 0, policy_version 293806 (0.0017) [2025-01-05 14:45:49,434][19668] Updated weights for policy 0, policy_version 293816 (0.0017) [2025-01-05 14:45:49,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19383.1). Total num frames: 1203478528. Throughput: 0: 4839.2. Samples: 25858908. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:45:49,965][19571] Avg episode reward: [(0, '10.501')] [2025-01-05 14:45:51,573][19668] Updated weights for policy 0, policy_version 293826 (0.0019) [2025-01-05 14:45:53,617][19668] Updated weights for policy 0, policy_version 293836 (0.0016) [2025-01-05 14:45:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19383.1). Total num frames: 1203576832. Throughput: 0: 4856.1. Samples: 25888100. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:45:54,965][19571] Avg episode reward: [(0, '9.557')] [2025-01-05 14:45:55,848][19668] Updated weights for policy 0, policy_version 293846 (0.0018) [2025-01-05 14:45:57,933][19668] Updated weights for policy 0, policy_version 293856 (0.0018) [2025-01-05 14:45:59,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19383.1). Total num frames: 1203671040. Throughput: 0: 4848.7. Samples: 25916546. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:45:59,966][19571] Avg episode reward: [(0, '9.972')] [2025-01-05 14:46:00,102][19668] Updated weights for policy 0, policy_version 293866 (0.0018) [2025-01-05 14:46:02,297][19668] Updated weights for policy 0, policy_version 293876 (0.0018) [2025-01-05 14:46:04,378][19668] Updated weights for policy 0, policy_version 293886 (0.0017) [2025-01-05 14:46:04,965][19571] Fps is (10 sec: 18841.4, 60 sec: 19319.4, 300 sec: 19369.2). 
Total num frames: 1203765248. Throughput: 0: 4840.9. Samples: 25930846. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:04,965][19571] Avg episode reward: [(0, '10.769')] [2025-01-05 14:46:06,520][19668] Updated weights for policy 0, policy_version 293896 (0.0019) [2025-01-05 14:46:08,629][19668] Updated weights for policy 0, policy_version 293906 (0.0018) [2025-01-05 14:46:09,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19355.3). Total num frames: 1203863552. Throughput: 0: 4853.4. Samples: 25959920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:09,966][19571] Avg episode reward: [(0, '9.933')] [2025-01-05 14:46:10,820][19668] Updated weights for policy 0, policy_version 293916 (0.0017) [2025-01-05 14:46:12,805][19668] Updated weights for policy 0, policy_version 293926 (0.0016) [2025-01-05 14:46:14,905][19668] Updated weights for policy 0, policy_version 293936 (0.0017) [2025-01-05 14:46:14,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1203961856. Throughput: 0: 4853.2. Samples: 25989234. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:14,965][19571] Avg episode reward: [(0, '9.023')] [2025-01-05 14:46:17,114][19668] Updated weights for policy 0, policy_version 293946 (0.0019) [2025-01-05 14:46:19,099][19668] Updated weights for policy 0, policy_version 293956 (0.0017) [2025-01-05 14:46:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1204060160. Throughput: 0: 4843.5. Samples: 26003770. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:19,965][19571] Avg episode reward: [(0, '9.625')] [2025-01-05 14:46:21,198][19668] Updated weights for policy 0, policy_version 293966 (0.0018) [2025-01-05 14:46:23,238][19668] Updated weights for policy 0, policy_version 293976 (0.0015) [2025-01-05 14:46:24,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1204158464. 
Throughput: 0: 4852.7. Samples: 26033782. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:24,965][19571] Avg episode reward: [(0, '9.995')] [2025-01-05 14:46:25,243][19668] Updated weights for policy 0, policy_version 293986 (0.0014) [2025-01-05 14:46:27,289][19668] Updated weights for policy 0, policy_version 293996 (0.0015) [2025-01-05 14:46:29,336][19668] Updated weights for policy 0, policy_version 294006 (0.0015) [2025-01-05 14:46:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1204256768. Throughput: 0: 4886.2. Samples: 26063914. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:29,965][19571] Avg episode reward: [(0, '9.963')] [2025-01-05 14:46:31,372][19668] Updated weights for policy 0, policy_version 294016 (0.0016) [2025-01-05 14:46:33,414][19668] Updated weights for policy 0, policy_version 294026 (0.0015) [2025-01-05 14:46:34,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19524.4, 300 sec: 19369.2). Total num frames: 1204359168. Throughput: 0: 4887.1. Samples: 26078828. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:34,965][19571] Avg episode reward: [(0, '9.401')] [2025-01-05 14:46:35,546][19668] Updated weights for policy 0, policy_version 294036 (0.0016) [2025-01-05 14:46:37,509][19668] Updated weights for policy 0, policy_version 294046 (0.0015) [2025-01-05 14:46:39,570][19668] Updated weights for policy 0, policy_version 294056 (0.0015) [2025-01-05 14:46:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19592.5, 300 sec: 19355.3). Total num frames: 1204457472. Throughput: 0: 4906.1. Samples: 26108874. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:39,965][19571] Avg episode reward: [(0, '9.461')] [2025-01-05 14:46:40,033][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000294058_1204461568.pth... 
[2025-01-05 14:46:40,085][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000292920_1199800320.pth [2025-01-05 14:46:41,925][19668] Updated weights for policy 0, policy_version 294066 (0.0017) [2025-01-05 14:46:44,011][19668] Updated weights for policy 0, policy_version 294076 (0.0020) [2025-01-05 14:46:44,974][19571] Fps is (10 sec: 18415.4, 60 sec: 19384.8, 300 sec: 19313.1). Total num frames: 1204543488. Throughput: 0: 4859.6. Samples: 26135272. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:44,975][19571] Avg episode reward: [(0, '9.741')] [2025-01-05 14:46:47,795][19668] Updated weights for policy 0, policy_version 294086 (0.0027) [2025-01-05 14:46:49,965][19571] Fps is (10 sec: 15565.0, 60 sec: 18909.9, 300 sec: 19216.5). Total num frames: 1204613120. Throughput: 0: 4725.0. Samples: 26143472. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:46:49,965][19571] Avg episode reward: [(0, '10.542')] [2025-01-05 14:46:50,176][19668] Updated weights for policy 0, policy_version 294096 (0.0020) [2025-01-05 14:46:52,471][19668] Updated weights for policy 0, policy_version 294106 (0.0022) [2025-01-05 14:46:54,813][19668] Updated weights for policy 0, policy_version 294116 (0.0021) [2025-01-05 14:46:54,965][19571] Fps is (10 sec: 15578.7, 60 sec: 18705.0, 300 sec: 19188.7). Total num frames: 1204699136. Throughput: 0: 4661.6. Samples: 26169694. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:46:54,966][19571] Avg episode reward: [(0, '9.673')] [2025-01-05 14:46:57,346][19668] Updated weights for policy 0, policy_version 294126 (0.0021) [2025-01-05 14:46:59,819][19668] Updated weights for policy 0, policy_version 294136 (0.0021) [2025-01-05 14:46:59,965][19571] Fps is (10 sec: 16793.6, 60 sec: 18500.3, 300 sec: 19161.0). Total num frames: 1204781056. Throughput: 0: 4560.7. Samples: 26194464. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:46:59,965][19571] Avg episode reward: [(0, '9.070')] [2025-01-05 14:47:02,346][19668] Updated weights for policy 0, policy_version 294146 (0.0022) [2025-01-05 14:47:04,951][19668] Updated weights for policy 0, policy_version 294156 (0.0022) [2025-01-05 14:47:04,965][19571] Fps is (10 sec: 16383.1, 60 sec: 18295.3, 300 sec: 19119.3). Total num frames: 1204862976. Throughput: 0: 4511.5. Samples: 26206790. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:04,966][19571] Avg episode reward: [(0, '9.099')] [2025-01-05 14:47:07,577][19668] Updated weights for policy 0, policy_version 294166 (0.0019) [2025-01-05 14:47:09,674][19668] Updated weights for policy 0, policy_version 294176 (0.0018) [2025-01-05 14:47:09,965][19571] Fps is (10 sec: 16793.6, 60 sec: 18090.7, 300 sec: 19091.5). Total num frames: 1204948992. Throughput: 0: 4386.3. Samples: 26231166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:09,965][19571] Avg episode reward: [(0, '10.116')] [2025-01-05 14:47:11,925][19668] Updated weights for policy 0, policy_version 294186 (0.0018) [2025-01-05 14:47:13,912][19668] Updated weights for policy 0, policy_version 294196 (0.0017) [2025-01-05 14:47:14,965][19571] Fps is (10 sec: 18023.6, 60 sec: 18022.4, 300 sec: 19077.7). Total num frames: 1205043200. Throughput: 0: 4364.5. Samples: 26260318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:14,965][19571] Avg episode reward: [(0, '10.131')] [2025-01-05 14:47:16,025][19668] Updated weights for policy 0, policy_version 294206 (0.0017) [2025-01-05 14:47:18,119][19668] Updated weights for policy 0, policy_version 294216 (0.0018) [2025-01-05 14:47:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 18022.4, 300 sec: 19091.5). Total num frames: 1205141504. Throughput: 0: 4361.6. Samples: 26275102. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:19,965][19571] Avg episode reward: [(0, '9.354')] [2025-01-05 14:47:20,196][19668] Updated weights for policy 0, policy_version 294226 (0.0018) [2025-01-05 14:47:22,296][19668] Updated weights for policy 0, policy_version 294236 (0.0018) [2025-01-05 14:47:24,386][19668] Updated weights for policy 0, policy_version 294246 (0.0017) [2025-01-05 14:47:24,965][19571] Fps is (10 sec: 19660.5, 60 sec: 18022.3, 300 sec: 19105.4). Total num frames: 1205239808. Throughput: 0: 4350.8. Samples: 26304660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:24,966][19571] Avg episode reward: [(0, '10.341')] [2025-01-05 14:47:26,465][19668] Updated weights for policy 0, policy_version 294256 (0.0018) [2025-01-05 14:47:28,542][19668] Updated weights for policy 0, policy_version 294266 (0.0017) [2025-01-05 14:47:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 18022.4, 300 sec: 19105.4). Total num frames: 1205338112. Throughput: 0: 4407.5. Samples: 26333568. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:29,965][19571] Avg episode reward: [(0, '11.371')] [2025-01-05 14:47:30,728][19668] Updated weights for policy 0, policy_version 294276 (0.0018) [2025-01-05 14:47:32,730][19668] Updated weights for policy 0, policy_version 294286 (0.0017) [2025-01-05 14:47:34,836][19668] Updated weights for policy 0, policy_version 294296 (0.0018) [2025-01-05 14:47:34,965][19571] Fps is (10 sec: 19661.1, 60 sec: 17954.1, 300 sec: 19105.4). Total num frames: 1205436416. Throughput: 0: 4551.8. Samples: 26348304. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:34,965][19571] Avg episode reward: [(0, '9.711')] [2025-01-05 14:47:37,032][19668] Updated weights for policy 0, policy_version 294306 (0.0018) [2025-01-05 14:47:39,039][19668] Updated weights for policy 0, policy_version 294316 (0.0017) [2025-01-05 14:47:39,965][19571] Fps is (10 sec: 19660.9, 60 sec: 17954.2, 300 sec: 19105.4). 
Total num frames: 1205534720. Throughput: 0: 4617.7. Samples: 26377490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:39,965][19571] Avg episode reward: [(0, '10.935')] [2025-01-05 14:47:41,159][19668] Updated weights for policy 0, policy_version 294326 (0.0018) [2025-01-05 14:47:43,264][19668] Updated weights for policy 0, policy_version 294336 (0.0018) [2025-01-05 14:47:44,965][19571] Fps is (10 sec: 19251.3, 60 sec: 18093.4, 300 sec: 19091.5). Total num frames: 1205628928. Throughput: 0: 4714.1. Samples: 26406598. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:44,965][19571] Avg episode reward: [(0, '9.853')] [2025-01-05 14:47:45,394][19668] Updated weights for policy 0, policy_version 294346 (0.0018) [2025-01-05 14:47:47,535][19668] Updated weights for policy 0, policy_version 294356 (0.0017) [2025-01-05 14:47:49,628][19668] Updated weights for policy 0, policy_version 294366 (0.0018) [2025-01-05 14:47:49,965][19571] Fps is (10 sec: 19250.9, 60 sec: 18568.5, 300 sec: 19105.4). Total num frames: 1205727232. Throughput: 0: 4767.0. Samples: 26421302. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:49,965][19571] Avg episode reward: [(0, '9.580')] [2025-01-05 14:47:51,759][19668] Updated weights for policy 0, policy_version 294376 (0.0018) [2025-01-05 14:47:53,889][19668] Updated weights for policy 0, policy_version 294386 (0.0018) [2025-01-05 14:47:54,965][19571] Fps is (10 sec: 19660.8, 60 sec: 18773.4, 300 sec: 19119.3). Total num frames: 1205825536. Throughput: 0: 4867.7. Samples: 26450212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:54,965][19571] Avg episode reward: [(0, '9.579')] [2025-01-05 14:47:56,087][19668] Updated weights for policy 0, policy_version 294396 (0.0018) [2025-01-05 14:47:58,103][19668] Updated weights for policy 0, policy_version 294406 (0.0017) [2025-01-05 14:47:59,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19046.4, 300 sec: 19119.3). Total num frames: 1205923840. 
Throughput: 0: 4868.2. Samples: 26479388. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:47:59,965][19571] Avg episode reward: [(0, '9.523')] [2025-01-05 14:48:00,230][19668] Updated weights for policy 0, policy_version 294416 (0.0017) [2025-01-05 14:48:02,327][19668] Updated weights for policy 0, policy_version 294426 (0.0018) [2025-01-05 14:48:04,319][19668] Updated weights for policy 0, policy_version 294436 (0.0017) [2025-01-05 14:48:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.7, 300 sec: 19133.2). Total num frames: 1206022144. Throughput: 0: 4867.3. Samples: 26494132. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:04,965][19571] Avg episode reward: [(0, '10.198')] [2025-01-05 14:48:06,438][19668] Updated weights for policy 0, policy_version 294446 (0.0020) [2025-01-05 14:48:08,523][19668] Updated weights for policy 0, policy_version 294456 (0.0017) [2025-01-05 14:48:09,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19133.2). Total num frames: 1206120448. Throughput: 0: 4867.0. Samples: 26523676. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:09,965][19571] Avg episode reward: [(0, '9.878')] [2025-01-05 14:48:10,615][19668] Updated weights for policy 0, policy_version 294466 (0.0018) [2025-01-05 14:48:12,732][19668] Updated weights for policy 0, policy_version 294476 (0.0017) [2025-01-05 14:48:14,841][19668] Updated weights for policy 0, policy_version 294486 (0.0017) [2025-01-05 14:48:14,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19119.3). Total num frames: 1206214656. Throughput: 0: 4877.8. Samples: 26553068. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:14,965][19571] Avg episode reward: [(0, '9.465')] [2025-01-05 14:48:16,913][19668] Updated weights for policy 0, policy_version 294496 (0.0017) [2025-01-05 14:48:19,031][19668] Updated weights for policy 0, policy_version 294506 (0.0017) [2025-01-05 14:48:19,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19524.2, 300 sec: 19119.3). Total num frames: 1206312960. Throughput: 0: 4868.5. Samples: 26567386. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:19,966][19571] Avg episode reward: [(0, '9.425')] [2025-01-05 14:48:21,220][19668] Updated weights for policy 0, policy_version 294516 (0.0017) [2025-01-05 14:48:23,231][19668] Updated weights for policy 0, policy_version 294526 (0.0016) [2025-01-05 14:48:24,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19119.3). Total num frames: 1206411264. Throughput: 0: 4871.2. Samples: 26596694. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:24,965][19571] Avg episode reward: [(0, '9.392')] [2025-01-05 14:48:25,328][19668] Updated weights for policy 0, policy_version 294536 (0.0017) [2025-01-05 14:48:27,419][19668] Updated weights for policy 0, policy_version 294546 (0.0017) [2025-01-05 14:48:29,413][19668] Updated weights for policy 0, policy_version 294556 (0.0017) [2025-01-05 14:48:29,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19524.2, 300 sec: 19133.2). Total num frames: 1206509568. Throughput: 0: 4884.2. Samples: 26626388. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:29,966][19571] Avg episode reward: [(0, '10.370')] [2025-01-05 14:48:31,525][19668] Updated weights for policy 0, policy_version 294566 (0.0017) [2025-01-05 14:48:33,671][19668] Updated weights for policy 0, policy_version 294576 (0.0018) [2025-01-05 14:48:34,965][19571] Fps is (10 sec: 19250.8, 60 sec: 19455.9, 300 sec: 19133.2). Total num frames: 1206603776. Throughput: 0: 4886.9. Samples: 26641214. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:34,966][19571] Avg episode reward: [(0, '9.944')] [2025-01-05 14:48:35,765][19668] Updated weights for policy 0, policy_version 294586 (0.0018) [2025-01-05 14:48:37,903][19668] Updated weights for policy 0, policy_version 294596 (0.0018) [2025-01-05 14:48:39,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19456.0, 300 sec: 19133.2). Total num frames: 1206702080. Throughput: 0: 4883.2. Samples: 26669956. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:39,965][19571] Avg episode reward: [(0, '9.702')] [2025-01-05 14:48:40,081][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000294606_1206706176.pth... [2025-01-05 14:48:40,082][19668] Updated weights for policy 0, policy_version 294606 (0.0019) [2025-01-05 14:48:40,136][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000293487_1202122752.pth [2025-01-05 14:48:42,225][19668] Updated weights for policy 0, policy_version 294616 (0.0019) [2025-01-05 14:48:44,351][19668] Updated weights for policy 0, policy_version 294626 (0.0017) [2025-01-05 14:48:44,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19119.3). Total num frames: 1206796288. Throughput: 0: 4868.1. Samples: 26698452. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:44,965][19571] Avg episode reward: [(0, '9.906')] [2025-01-05 14:48:46,607][19668] Updated weights for policy 0, policy_version 294636 (0.0019) [2025-01-05 14:48:48,621][19668] Updated weights for policy 0, policy_version 294646 (0.0017) [2025-01-05 14:48:49,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19133.2). Total num frames: 1206894592. Throughput: 0: 4855.9. Samples: 26712646. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:49,965][19571] Avg episode reward: [(0, '10.182')] [2025-01-05 14:48:50,721][19668] Updated weights for policy 0, policy_version 294656 (0.0017) [2025-01-05 14:48:52,803][19668] Updated weights for policy 0, policy_version 294666 (0.0017) [2025-01-05 14:48:54,896][19668] Updated weights for policy 0, policy_version 294676 (0.0019) [2025-01-05 14:48:54,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19133.2). Total num frames: 1206992896. Throughput: 0: 4852.2. Samples: 26742026. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:54,965][19571] Avg episode reward: [(0, '10.031')] [2025-01-05 14:48:57,076][19668] Updated weights for policy 0, policy_version 294686 (0.0018) [2025-01-05 14:48:59,174][19668] Updated weights for policy 0, policy_version 294696 (0.0018) [2025-01-05 14:48:59,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19119.3). Total num frames: 1207087104. Throughput: 0: 4846.8. Samples: 26771174. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:48:59,966][19571] Avg episode reward: [(0, '11.184')] [2025-01-05 14:49:01,262][19668] Updated weights for policy 0, policy_version 294706 (0.0017) [2025-01-05 14:49:03,360][19668] Updated weights for policy 0, policy_version 294716 (0.0017) [2025-01-05 14:49:04,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19133.2). Total num frames: 1207185408. Throughput: 0: 4853.1. Samples: 26785776. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:49:04,965][19571] Avg episode reward: [(0, '10.274')] [2025-01-05 14:49:05,530][19668] Updated weights for policy 0, policy_version 294726 (0.0019) [2025-01-05 14:49:07,546][19668] Updated weights for policy 0, policy_version 294736 (0.0017) [2025-01-05 14:49:09,616][19668] Updated weights for policy 0, policy_version 294746 (0.0017) [2025-01-05 14:49:09,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19387.7, 300 sec: 19147.0). 
Total num frames: 1207283712. Throughput: 0: 4853.6. Samples: 26815106. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:49:09,966][19571] Avg episode reward: [(0, '10.502')] [2025-01-05 14:49:11,803][19668] Updated weights for policy 0, policy_version 294756 (0.0018) [2025-01-05 14:49:13,823][19668] Updated weights for policy 0, policy_version 294766 (0.0018) [2025-01-05 14:49:14,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19133.2). Total num frames: 1207377920. Throughput: 0: 4837.8. Samples: 26844086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:49:14,965][19571] Avg episode reward: [(0, '9.919')] [2025-01-05 14:49:15,998][19668] Updated weights for policy 0, policy_version 294776 (0.0018) [2025-01-05 14:49:18,137][19668] Updated weights for policy 0, policy_version 294786 (0.0017) [2025-01-05 14:49:19,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19387.8, 300 sec: 19133.2). Total num frames: 1207476224. Throughput: 0: 4831.3. Samples: 26858624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:19,965][19571] Avg episode reward: [(0, '8.972')] [2025-01-05 14:49:20,266][19668] Updated weights for policy 0, policy_version 294796 (0.0019) [2025-01-05 14:49:22,348][19668] Updated weights for policy 0, policy_version 294806 (0.0017) [2025-01-05 14:49:24,485][19668] Updated weights for policy 0, policy_version 294816 (0.0016) [2025-01-05 14:49:24,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19133.2). Total num frames: 1207574528. Throughput: 0: 4839.4. Samples: 26887730. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:24,965][19571] Avg episode reward: [(0, '9.691')] [2025-01-05 14:49:26,606][19668] Updated weights for policy 0, policy_version 294826 (0.0019) [2025-01-05 14:49:28,695][19668] Updated weights for policy 0, policy_version 294836 (0.0018) [2025-01-05 14:49:29,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19133.2). Total num frames: 1207668736. 
Throughput: 0: 4843.6. Samples: 26916414. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:29,965][19571] Avg episode reward: [(0, '9.621')] [2025-01-05 14:49:30,897][19668] Updated weights for policy 0, policy_version 294846 (0.0018) [2025-01-05 14:49:32,907][19668] Updated weights for policy 0, policy_version 294856 (0.0017) [2025-01-05 14:49:34,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.8, 300 sec: 19133.2). Total num frames: 1207767040. Throughput: 0: 4858.0. Samples: 26931254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:34,965][19571] Avg episode reward: [(0, '8.932')] [2025-01-05 14:49:35,013][19668] Updated weights for policy 0, policy_version 294866 (0.0017) [2025-01-05 14:49:37,192][19668] Updated weights for policy 0, policy_version 294876 (0.0018) [2025-01-05 14:49:39,220][19668] Updated weights for policy 0, policy_version 294886 (0.0018) [2025-01-05 14:49:39,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19147.1). Total num frames: 1207865344. Throughput: 0: 4853.7. Samples: 26960442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:39,965][19571] Avg episode reward: [(0, '9.530')] [2025-01-05 14:49:41,355][19668] Updated weights for policy 0, policy_version 294896 (0.0016) [2025-01-05 14:49:43,356][19668] Updated weights for policy 0, policy_version 294906 (0.0015) [2025-01-05 14:49:44,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19524.3, 300 sec: 19160.9). Total num frames: 1207967744. Throughput: 0: 4874.2. Samples: 26990510. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:44,965][19571] Avg episode reward: [(0, '8.735')] [2025-01-05 14:49:45,332][19668] Updated weights for policy 0, policy_version 294916 (0.0015) [2025-01-05 14:49:47,372][19668] Updated weights for policy 0, policy_version 294926 (0.0015) [2025-01-05 14:49:49,417][19668] Updated weights for policy 0, policy_version 294936 (0.0016) [2025-01-05 14:49:49,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19524.3, 300 sec: 19161.0). Total num frames: 1208066048. Throughput: 0: 4887.3. Samples: 27005704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:49,965][19571] Avg episode reward: [(0, '10.368')] [2025-01-05 14:49:51,457][19668] Updated weights for policy 0, policy_version 294946 (0.0015) [2025-01-05 14:49:53,513][19668] Updated weights for policy 0, policy_version 294956 (0.0016) [2025-01-05 14:49:54,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19174.8). Total num frames: 1208164352. Throughput: 0: 4900.0. Samples: 27035604. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:54,965][19571] Avg episode reward: [(0, '10.202')] [2025-01-05 14:49:55,720][19668] Updated weights for policy 0, policy_version 294966 (0.0017) [2025-01-05 14:49:57,795][19668] Updated weights for policy 0, policy_version 294976 (0.0017) [2025-01-05 14:49:59,945][19668] Updated weights for policy 0, policy_version 294986 (0.0019) [2025-01-05 14:49:59,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19592.6, 300 sec: 19174.8). Total num frames: 1208262656. Throughput: 0: 4893.0. Samples: 27064272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:49:59,965][19571] Avg episode reward: [(0, '10.123')] [2025-01-05 14:50:02,120][19668] Updated weights for policy 0, policy_version 294996 (0.0019) [2025-01-05 14:50:04,165][19668] Updated weights for policy 0, policy_version 295006 (0.0017) [2025-01-05 14:50:04,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19174.8). 
Total num frames: 1208356864. Throughput: 0: 4890.1. Samples: 27078680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:04,965][19571] Avg episode reward: [(0, '10.439')] [2025-01-05 14:50:06,329][19668] Updated weights for policy 0, policy_version 295016 (0.0018) [2025-01-05 14:50:08,392][19668] Updated weights for policy 0, policy_version 295026 (0.0017) [2025-01-05 14:50:09,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19524.3, 300 sec: 19188.7). Total num frames: 1208455168. Throughput: 0: 4892.4. Samples: 27107888. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:09,965][19571] Avg episode reward: [(0, '8.205')] [2025-01-05 14:50:10,578][19668] Updated weights for policy 0, policy_version 295036 (0.0018) [2025-01-05 14:50:12,638][19668] Updated weights for policy 0, policy_version 295046 (0.0017) [2025-01-05 14:50:14,734][19668] Updated weights for policy 0, policy_version 295056 (0.0018) [2025-01-05 14:50:14,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19188.7). Total num frames: 1208553472. Throughput: 0: 4902.9. Samples: 27137042. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:14,965][19571] Avg episode reward: [(0, '10.480')] [2025-01-05 14:50:16,872][19668] Updated weights for policy 0, policy_version 295066 (0.0017) [2025-01-05 14:50:18,958][19668] Updated weights for policy 0, policy_version 295076 (0.0017) [2025-01-05 14:50:19,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19174.8). Total num frames: 1208647680. Throughput: 0: 4891.0. Samples: 27151348. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:19,965][19571] Avg episode reward: [(0, '9.596')] [2025-01-05 14:50:21,189][19668] Updated weights for policy 0, policy_version 295086 (0.0019) [2025-01-05 14:50:23,238][19668] Updated weights for policy 0, policy_version 295096 (0.0017) [2025-01-05 14:50:24,965][19571] Fps is (10 sec: 18841.3, 60 sec: 19456.0, 300 sec: 19160.9). Total num frames: 1208741888. 
Throughput: 0: 4881.7. Samples: 27180120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:24,965][19571] Avg episode reward: [(0, '9.993')] [2025-01-05 14:50:25,439][19668] Updated weights for policy 0, policy_version 295106 (0.0017) [2025-01-05 14:50:27,521][19668] Updated weights for policy 0, policy_version 295116 (0.0017) [2025-01-05 14:50:29,607][19668] Updated weights for policy 0, policy_version 295126 (0.0018) [2025-01-05 14:50:29,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19524.3, 300 sec: 19161.0). Total num frames: 1208840192. Throughput: 0: 4859.2. Samples: 27209176. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:50:29,965][19571] Avg episode reward: [(0, '9.420')] [2025-01-05 14:50:31,801][19668] Updated weights for policy 0, policy_version 295136 (0.0019) [2025-01-05 14:50:33,874][19668] Updated weights for policy 0, policy_version 295146 (0.0018) [2025-01-05 14:50:34,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19161.0). Total num frames: 1208934400. Throughput: 0: 4839.1. Samples: 27223464. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:34,965][19571] Avg episode reward: [(0, '9.841')] [2025-01-05 14:50:36,041][19668] Updated weights for policy 0, policy_version 295156 (0.0018) [2025-01-05 14:50:38,094][19668] Updated weights for policy 0, policy_version 295166 (0.0018) [2025-01-05 14:50:39,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19160.9). Total num frames: 1209032704. Throughput: 0: 4823.5. Samples: 27252662. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:39,965][19571] Avg episode reward: [(0, '10.441')] [2025-01-05 14:50:40,058][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000295175_1209036800.pth... 
[2025-01-05 14:50:40,106][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000294058_1204461568.pth [2025-01-05 14:50:40,273][19668] Updated weights for policy 0, policy_version 295176 (0.0018) [2025-01-05 14:50:42,397][19668] Updated weights for policy 0, policy_version 295186 (0.0019) [2025-01-05 14:50:44,457][19668] Updated weights for policy 0, policy_version 295196 (0.0018) [2025-01-05 14:50:44,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19161.0). Total num frames: 1209131008. Throughput: 0: 4828.8. Samples: 27281566. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:44,965][19571] Avg episode reward: [(0, '11.589')] [2025-01-05 14:50:46,632][19668] Updated weights for policy 0, policy_version 295206 (0.0018) [2025-01-05 14:50:48,682][19668] Updated weights for policy 0, policy_version 295216 (0.0019) [2025-01-05 14:50:49,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19147.1). Total num frames: 1209225216. Throughput: 0: 4831.3. Samples: 27296088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:49,965][19571] Avg episode reward: [(0, '9.473')] [2025-01-05 14:50:50,851][19668] Updated weights for policy 0, policy_version 295226 (0.0018) [2025-01-05 14:50:52,929][19668] Updated weights for policy 0, policy_version 295236 (0.0017) [2025-01-05 14:50:54,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19319.4, 300 sec: 19161.0). Total num frames: 1209323520. Throughput: 0: 4826.3. Samples: 27325072. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:54,965][19571] Avg episode reward: [(0, '9.248')] [2025-01-05 14:50:55,111][19668] Updated weights for policy 0, policy_version 295246 (0.0018) [2025-01-05 14:50:57,290][19668] Updated weights for policy 0, policy_version 295256 (0.0018) [2025-01-05 14:50:59,360][19668] Updated weights for policy 0, policy_version 295266 (0.0017) [2025-01-05 14:50:59,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19161.0). Total num frames: 1209417728. Throughput: 0: 4818.0. Samples: 27353852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:50:59,965][19571] Avg episode reward: [(0, '10.977')] [2025-01-05 14:51:01,503][19668] Updated weights for policy 0, policy_version 295276 (0.0018) [2025-01-05 14:51:03,545][19668] Updated weights for policy 0, policy_version 295286 (0.0017) [2025-01-05 14:51:04,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19161.0). Total num frames: 1209516032. Throughput: 0: 4824.5. Samples: 27368448. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:04,965][19571] Avg episode reward: [(0, '9.268')] [2025-01-05 14:51:05,726][19668] Updated weights for policy 0, policy_version 295296 (0.0018) [2025-01-05 14:51:07,834][19668] Updated weights for policy 0, policy_version 295306 (0.0018) [2025-01-05 14:51:09,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19147.1). Total num frames: 1209610240. Throughput: 0: 4823.9. Samples: 27397196. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:09,965][19571] Avg episode reward: [(0, '10.259')] [2025-01-05 14:51:09,968][19668] Updated weights for policy 0, policy_version 295316 (0.0018) [2025-01-05 14:51:12,122][19668] Updated weights for policy 0, policy_version 295326 (0.0017) [2025-01-05 14:51:14,172][19668] Updated weights for policy 0, policy_version 295336 (0.0016) [2025-01-05 14:51:14,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19147.1). 
Total num frames: 1209708544. Throughput: 0: 4825.3. Samples: 27426314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:14,965][19571] Avg episode reward: [(0, '9.945')] [2025-01-05 14:51:16,322][19668] Updated weights for policy 0, policy_version 295346 (0.0018) [2025-01-05 14:51:18,359][19668] Updated weights for policy 0, policy_version 295356 (0.0017) [2025-01-05 14:51:19,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19319.5, 300 sec: 19147.1). Total num frames: 1209806848. Throughput: 0: 4834.2. Samples: 27441004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:19,965][19571] Avg episode reward: [(0, '9.765')] [2025-01-05 14:51:20,499][19668] Updated weights for policy 0, policy_version 295366 (0.0018) [2025-01-05 14:51:22,579][19668] Updated weights for policy 0, policy_version 295376 (0.0017) [2025-01-05 14:51:24,616][19668] Updated weights for policy 0, policy_version 295386 (0.0017) [2025-01-05 14:51:24,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19147.1). Total num frames: 1209905152. Throughput: 0: 4838.6. Samples: 27470398. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:24,965][19571] Avg episode reward: [(0, '9.026')] [2025-01-05 14:51:26,771][19668] Updated weights for policy 0, policy_version 295396 (0.0017) [2025-01-05 14:51:28,859][19668] Updated weights for policy 0, policy_version 295406 (0.0017) [2025-01-05 14:51:29,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19133.2). Total num frames: 1210003456. Throughput: 0: 4841.3. Samples: 27499426. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:29,965][19571] Avg episode reward: [(0, '9.704')] [2025-01-05 14:51:30,992][19668] Updated weights for policy 0, policy_version 295416 (0.0018) [2025-01-05 14:51:33,063][19668] Updated weights for policy 0, policy_version 295426 (0.0018) [2025-01-05 14:51:34,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19387.8, 300 sec: 19119.3). Total num frames: 1210097664. 
Throughput: 0: 4844.4. Samples: 27514086. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:34,965][19571] Avg episode reward: [(0, '8.853')] [2025-01-05 14:51:35,236][19668] Updated weights for policy 0, policy_version 295436 (0.0017) [2025-01-05 14:51:37,318][19668] Updated weights for policy 0, policy_version 295446 (0.0018) [2025-01-05 14:51:39,369][19668] Updated weights for policy 0, policy_version 295456 (0.0017) [2025-01-05 14:51:39,965][19571] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19161.5). Total num frames: 1210195968. Throughput: 0: 4850.7. Samples: 27543354. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:39,966][19571] Avg episode reward: [(0, '9.900')] [2025-01-05 14:51:41,557][19668] Updated weights for policy 0, policy_version 295466 (0.0017) [2025-01-05 14:51:43,647][19668] Updated weights for policy 0, policy_version 295476 (0.0018) [2025-01-05 14:51:44,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19387.8, 300 sec: 19258.1). Total num frames: 1210294272. Throughput: 0: 4848.9. Samples: 27572050. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:44,965][19571] Avg episode reward: [(0, '9.704')] [2025-01-05 14:51:45,807][19668] Updated weights for policy 0, policy_version 295486 (0.0018) [2025-01-05 14:51:47,854][19668] Updated weights for policy 0, policy_version 295496 (0.0017) [2025-01-05 14:51:49,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19285.9). Total num frames: 1210388480. Throughput: 0: 4851.2. Samples: 27586754. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:49,965][19571] Avg episode reward: [(0, '10.220')] [2025-01-05 14:51:50,023][19668] Updated weights for policy 0, policy_version 295506 (0.0018) [2025-01-05 14:51:52,154][19668] Updated weights for policy 0, policy_version 295516 (0.0018) [2025-01-05 14:51:54,201][19668] Updated weights for policy 0, policy_version 295526 (0.0017) [2025-01-05 14:51:54,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19387.8, 300 sec: 19341.5). Total num frames: 1210486784. Throughput: 0: 4856.8. Samples: 27615752. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:54,965][19571] Avg episode reward: [(0, '9.184')] [2025-01-05 14:51:56,374][19668] Updated weights for policy 0, policy_version 295536 (0.0018) [2025-01-05 14:51:58,446][19668] Updated weights for policy 0, policy_version 295546 (0.0017) [2025-01-05 14:51:59,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1210585088. Throughput: 0: 4855.1. Samples: 27644794. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:51:59,965][19571] Avg episode reward: [(0, '10.904')] [2025-01-05 14:52:00,569][19668] Updated weights for policy 0, policy_version 295556 (0.0018) [2025-01-05 14:52:02,642][19668] Updated weights for policy 0, policy_version 295566 (0.0018) [2025-01-05 14:52:04,716][19668] Updated weights for policy 0, policy_version 295576 (0.0017) [2025-01-05 14:52:04,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1210683392. Throughput: 0: 4857.2. Samples: 27659576. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:04,965][19571] Avg episode reward: [(0, '10.805')] [2025-01-05 14:52:06,840][19668] Updated weights for policy 0, policy_version 295586 (0.0018) [2025-01-05 14:52:08,973][19668] Updated weights for policy 0, policy_version 295596 (0.0018) [2025-01-05 14:52:09,965][19571] Fps is (10 sec: 19250.8, 60 sec: 19455.9, 300 sec: 19438.6). 
Total num frames: 1210777600. Throughput: 0: 4851.1. Samples: 27688698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:09,966][19571] Avg episode reward: [(0, '10.444')] [2025-01-05 14:52:11,129][19668] Updated weights for policy 0, policy_version 295606 (0.0018) [2025-01-05 14:52:13,160][19668] Updated weights for policy 0, policy_version 295616 (0.0017) [2025-01-05 14:52:14,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1210875904. Throughput: 0: 4849.5. Samples: 27717654. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:14,965][19571] Avg episode reward: [(0, '9.424')] [2025-01-05 14:52:15,389][19668] Updated weights for policy 0, policy_version 295626 (0.0018) [2025-01-05 14:52:17,441][19668] Updated weights for policy 0, policy_version 295636 (0.0018) [2025-01-05 14:52:19,490][19668] Updated weights for policy 0, policy_version 295646 (0.0017) [2025-01-05 14:52:19,965][19571] Fps is (10 sec: 19251.3, 60 sec: 19387.7, 300 sec: 19424.8). Total num frames: 1210970112. Throughput: 0: 4850.2. Samples: 27732344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:19,966][19571] Avg episode reward: [(0, '10.313')] [2025-01-05 14:52:21,681][19668] Updated weights for policy 0, policy_version 295656 (0.0018) [2025-01-05 14:52:23,751][19668] Updated weights for policy 0, policy_version 295666 (0.0018) [2025-01-05 14:52:24,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19424.8). Total num frames: 1211068416. Throughput: 0: 4849.7. Samples: 27761588. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:24,965][19571] Avg episode reward: [(0, '9.028')] [2025-01-05 14:52:25,823][19668] Updated weights for policy 0, policy_version 295676 (0.0017) [2025-01-05 14:52:27,819][19668] Updated weights for policy 0, policy_version 295686 (0.0015) [2025-01-05 14:52:29,831][19668] Updated weights for policy 0, policy_version 295696 (0.0015) [2025-01-05 14:52:29,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1211170816. Throughput: 0: 4887.2. Samples: 27791974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:29,965][19571] Avg episode reward: [(0, '9.581')] [2025-01-05 14:52:31,824][19668] Updated weights for policy 0, policy_version 295706 (0.0014) [2025-01-05 14:52:33,875][19668] Updated weights for policy 0, policy_version 295716 (0.0015) [2025-01-05 14:52:34,965][19571] Fps is (10 sec: 20480.0, 60 sec: 19592.5, 300 sec: 19452.5). Total num frames: 1211273216. Throughput: 0: 4896.8. Samples: 27807108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:34,965][19571] Avg episode reward: [(0, '9.588')] [2025-01-05 14:52:35,938][19668] Updated weights for policy 0, policy_version 295726 (0.0016) [2025-01-05 14:52:37,961][19668] Updated weights for policy 0, policy_version 295736 (0.0016) [2025-01-05 14:52:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1211371520. Throughput: 0: 4918.1. Samples: 27837066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:39,966][19571] Avg episode reward: [(0, '10.119')] [2025-01-05 14:52:40,088][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000295746_1211375616.pth... 
[2025-01-05 14:52:40,089][19668] Updated weights for policy 0, policy_version 295746 (0.0015) [2025-01-05 14:52:40,136][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000294606_1206706176.pth [2025-01-05 14:52:42,144][19668] Updated weights for policy 0, policy_version 295756 (0.0018) [2025-01-05 14:52:44,147][19668] Updated weights for policy 0, policy_version 295766 (0.0015) [2025-01-05 14:52:44,965][19571] Fps is (10 sec: 19661.1, 60 sec: 19592.5, 300 sec: 19466.4). Total num frames: 1211469824. Throughput: 0: 4937.7. Samples: 27866990. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:44,965][19571] Avg episode reward: [(0, '9.056')] [2025-01-05 14:52:46,193][19668] Updated weights for policy 0, policy_version 295776 (0.0016) [2025-01-05 14:52:48,209][19668] Updated weights for policy 0, policy_version 295786 (0.0015) [2025-01-05 14:52:49,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19729.0, 300 sec: 19480.3). Total num frames: 1211572224. Throughput: 0: 4948.0. Samples: 27882238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:49,965][19571] Avg episode reward: [(0, '9.629')] [2025-01-05 14:52:50,218][19668] Updated weights for policy 0, policy_version 295796 (0.0015) [2025-01-05 14:52:52,285][19668] Updated weights for policy 0, policy_version 295806 (0.0015) [2025-01-05 14:52:54,341][19668] Updated weights for policy 0, policy_version 295816 (0.0015) [2025-01-05 14:52:54,965][19571] Fps is (10 sec: 20070.1, 60 sec: 19729.0, 300 sec: 19480.3). Total num frames: 1211670528. Throughput: 0: 4969.5. Samples: 27912326. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 14:52:54,965][19571] Avg episode reward: [(0, '9.662')] [2025-01-05 14:52:56,422][19668] Updated weights for policy 0, policy_version 295826 (0.0016) [2025-01-05 14:52:58,457][19668] Updated weights for policy 0, policy_version 295836 (0.0016) [2025-01-05 14:52:59,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19729.0, 300 sec: 19480.3). Total num frames: 1211768832. Throughput: 0: 4981.7. Samples: 27941830. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:52:59,966][19571] Avg episode reward: [(0, '8.779')] [2025-01-05 14:53:00,583][19668] Updated weights for policy 0, policy_version 295846 (0.0016) [2025-01-05 14:53:02,578][19668] Updated weights for policy 0, policy_version 295856 (0.0016) [2025-01-05 14:53:04,651][19668] Updated weights for policy 0, policy_version 295866 (0.0015) [2025-01-05 14:53:04,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19494.2). Total num frames: 1211871232. Throughput: 0: 4992.2. Samples: 27956992. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:04,965][19571] Avg episode reward: [(0, '10.912')] [2025-01-05 14:53:06,750][19668] Updated weights for policy 0, policy_version 295876 (0.0018) [2025-01-05 14:53:08,768][19668] Updated weights for policy 0, policy_version 295886 (0.0016) [2025-01-05 14:53:09,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19508.1). Total num frames: 1211969536. Throughput: 0: 5005.5. Samples: 27986836. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:09,965][19571] Avg episode reward: [(0, '10.437')] [2025-01-05 14:53:10,797][19668] Updated weights for policy 0, policy_version 295896 (0.0015) [2025-01-05 14:53:12,844][19668] Updated weights for policy 0, policy_version 295906 (0.0015) [2025-01-05 14:53:14,947][19668] Updated weights for policy 0, policy_version 295916 (0.0016) [2025-01-05 14:53:14,965][19571] Fps is (10 sec: 20070.6, 60 sec: 19933.9, 300 sec: 19522.0). 
Total num frames: 1212071936. Throughput: 0: 4992.0. Samples: 28016612. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:14,965][19571] Avg episode reward: [(0, '9.827')] [2025-01-05 14:53:17,023][19668] Updated weights for policy 0, policy_version 295926 (0.0016) [2025-01-05 14:53:19,093][19668] Updated weights for policy 0, policy_version 295936 (0.0014) [2025-01-05 14:53:19,965][19571] Fps is (10 sec: 20070.4, 60 sec: 20002.1, 300 sec: 19521.9). Total num frames: 1212170240. Throughput: 0: 4983.8. Samples: 28031380. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:19,965][19571] Avg episode reward: [(0, '9.603')] [2025-01-05 14:53:21,204][19668] Updated weights for policy 0, policy_version 295946 (0.0015) [2025-01-05 14:53:23,186][19668] Updated weights for policy 0, policy_version 295956 (0.0016) [2025-01-05 14:53:24,965][19571] Fps is (10 sec: 19660.9, 60 sec: 20002.2, 300 sec: 19522.0). Total num frames: 1212268544. Throughput: 0: 4980.8. Samples: 28061202. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:24,965][19571] Avg episode reward: [(0, '9.689')] [2025-01-05 14:53:25,310][19668] Updated weights for policy 0, policy_version 295966 (0.0015) [2025-01-05 14:53:27,332][19668] Updated weights for policy 0, policy_version 295976 (0.0015) [2025-01-05 14:53:29,341][19668] Updated weights for policy 0, policy_version 295986 (0.0016) [2025-01-05 14:53:29,965][19571] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 19549.7). Total num frames: 1212370944. Throughput: 0: 4984.0. Samples: 28091272. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:29,965][19571] Avg episode reward: [(0, '10.160')] [2025-01-05 14:53:31,405][19668] Updated weights for policy 0, policy_version 295996 (0.0015) [2025-01-05 14:53:33,482][19668] Updated weights for policy 0, policy_version 296006 (0.0015) [2025-01-05 14:53:34,965][19571] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 19549.7). Total num frames: 1212469248. 
Throughput: 0: 4977.8. Samples: 28106238. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:34,965][19571] Avg episode reward: [(0, '10.866')] [2025-01-05 14:53:35,573][19668] Updated weights for policy 0, policy_version 296016 (0.0016) [2025-01-05 14:53:37,617][19668] Updated weights for policy 0, policy_version 296026 (0.0015) [2025-01-05 14:53:39,686][19668] Updated weights for policy 0, policy_version 296036 (0.0015) [2025-01-05 14:53:39,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19933.9, 300 sec: 19563.6). Total num frames: 1212567552. Throughput: 0: 4967.1. Samples: 28135846. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:39,965][19571] Avg episode reward: [(0, '11.103')] [2025-01-05 14:53:41,730][19668] Updated weights for policy 0, policy_version 296046 (0.0016) [2025-01-05 14:53:43,787][19668] Updated weights for policy 0, policy_version 296056 (0.0016) [2025-01-05 14:53:44,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.8, 300 sec: 19563.6). Total num frames: 1212665856. Throughput: 0: 4970.7. Samples: 28165512. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:44,965][19571] Avg episode reward: [(0, '10.325')] [2025-01-05 14:53:45,921][19668] Updated weights for policy 0, policy_version 296066 (0.0015) [2025-01-05 14:53:47,929][19668] Updated weights for policy 0, policy_version 296076 (0.0016) [2025-01-05 14:53:49,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19577.5). Total num frames: 1212768256. Throughput: 0: 4967.6. Samples: 28180536. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:49,965][19571] Avg episode reward: [(0, '10.171')] [2025-01-05 14:53:49,967][19668] Updated weights for policy 0, policy_version 296086 (0.0015) [2025-01-05 14:53:52,075][19668] Updated weights for policy 0, policy_version 296096 (0.0017) [2025-01-05 14:53:54,088][19668] Updated weights for policy 0, policy_version 296106 (0.0015) [2025-01-05 14:53:54,965][19571] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 19591.4). Total num frames: 1212866560. Throughput: 0: 4967.4. Samples: 28210370. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:54,965][19571] Avg episode reward: [(0, '9.687')] [2025-01-05 14:53:56,149][19668] Updated weights for policy 0, policy_version 296116 (0.0015) [2025-01-05 14:53:58,204][19668] Updated weights for policy 0, policy_version 296126 (0.0017) [2025-01-05 14:53:59,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19591.4). Total num frames: 1212964864. Throughput: 0: 4967.3. Samples: 28240140. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:53:59,965][19571] Avg episode reward: [(0, '10.358')] [2025-01-05 14:54:00,300][19668] Updated weights for policy 0, policy_version 296136 (0.0016) [2025-01-05 14:54:02,350][19668] Updated weights for policy 0, policy_version 296146 (0.0015) [2025-01-05 14:54:04,417][19668] Updated weights for policy 0, policy_version 296156 (0.0016) [2025-01-05 14:54:04,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19591.4). Total num frames: 1213063168. Throughput: 0: 4971.1. Samples: 28255078. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:54:04,965][19571] Avg episode reward: [(0, '9.480')] [2025-01-05 14:54:06,476][19668] Updated weights for policy 0, policy_version 296166 (0.0016) [2025-01-05 14:54:08,546][19668] Updated weights for policy 0, policy_version 296176 (0.0016) [2025-01-05 14:54:09,965][19571] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19605.2). 
Total num frames: 1213161472. Throughput: 0: 4968.7. Samples: 28284794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:09,965][19571] Avg episode reward: [(0, '9.145')] [2025-01-05 14:54:10,722][19668] Updated weights for policy 0, policy_version 296186 (0.0016) [2025-01-05 14:54:12,748][19668] Updated weights for policy 0, policy_version 296196 (0.0017) [2025-01-05 14:54:14,847][19668] Updated weights for policy 0, policy_version 296206 (0.0016) [2025-01-05 14:54:14,965][19571] Fps is (10 sec: 19661.0, 60 sec: 19797.3, 300 sec: 19605.3). Total num frames: 1213259776. Throughput: 0: 4951.1. Samples: 28314070. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:14,965][19571] Avg episode reward: [(0, '9.375')] [2025-01-05 14:54:17,037][19668] Updated weights for policy 0, policy_version 296216 (0.0019) [2025-01-05 14:54:19,043][19668] Updated weights for policy 0, policy_version 296226 (0.0016) [2025-01-05 14:54:19,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19605.3). Total num frames: 1213358080. Throughput: 0: 4937.4. Samples: 28328420. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:19,965][19571] Avg episode reward: [(0, '9.067')] [2025-01-05 14:54:21,130][19668] Updated weights for policy 0, policy_version 296236 (0.0015) [2025-01-05 14:54:23,182][19668] Updated weights for policy 0, policy_version 296246 (0.0015) [2025-01-05 14:54:24,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19619.1). Total num frames: 1213456384. Throughput: 0: 4944.8. Samples: 28358364. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:24,965][19571] Avg episode reward: [(0, '9.907')] [2025-01-05 14:54:25,233][19668] Updated weights for policy 0, policy_version 296256 (0.0016) [2025-01-05 14:54:27,317][19668] Updated weights for policy 0, policy_version 296266 (0.0017) [2025-01-05 14:54:29,368][19668] Updated weights for policy 0, policy_version 296276 (0.0015) [2025-01-05 14:54:29,965][19571] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19619.1). Total num frames: 1213554688. Throughput: 0: 4948.2. Samples: 28388182. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:29,965][19571] Avg episode reward: [(0, '10.249')] [2025-01-05 14:54:31,416][19668] Updated weights for policy 0, policy_version 296286 (0.0016) [2025-01-05 14:54:33,483][19668] Updated weights for policy 0, policy_version 296296 (0.0014) [2025-01-05 14:54:34,965][19571] Fps is (10 sec: 20070.5, 60 sec: 19797.3, 300 sec: 19633.0). Total num frames: 1213657088. Throughput: 0: 4943.3. Samples: 28402986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:34,965][19571] Avg episode reward: [(0, '8.731')] [2025-01-05 14:54:35,572][19668] Updated weights for policy 0, policy_version 296306 (0.0016) [2025-01-05 14:54:37,565][19668] Updated weights for policy 0, policy_version 296316 (0.0017) [2025-01-05 14:54:39,648][19668] Updated weights for policy 0, policy_version 296326 (0.0017) [2025-01-05 14:54:39,965][19571] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19619.1). Total num frames: 1213755392. Throughput: 0: 4944.8. Samples: 28432888. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:39,965][19571] Avg episode reward: [(0, '9.931')] [2025-01-05 14:54:40,046][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000296328_1213759488.pth... 
[2025-01-05 14:54:40,098][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000295175_1209036800.pth [2025-01-05 14:54:41,842][19668] Updated weights for policy 0, policy_version 296336 (0.0016) [2025-01-05 14:54:43,838][19668] Updated weights for policy 0, policy_version 296346 (0.0015) [2025-01-05 14:54:44,965][19571] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19619.1). Total num frames: 1213853696. Throughput: 0: 4939.2. Samples: 28462402. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:44,965][19571] Avg episode reward: [(0, '10.735')] [2025-01-05 14:54:45,884][19668] Updated weights for policy 0, policy_version 296356 (0.0015) [2025-01-05 14:54:47,958][19668] Updated weights for policy 0, policy_version 296366 (0.0014) [2025-01-05 14:54:49,965][19571] Fps is (10 sec: 19661.2, 60 sec: 19729.1, 300 sec: 19619.1). Total num frames: 1213952000. Throughput: 0: 4942.5. Samples: 28477488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:49,965][19571] Avg episode reward: [(0, '8.638')] [2025-01-05 14:54:50,056][19668] Updated weights for policy 0, policy_version 296376 (0.0017) [2025-01-05 14:54:52,234][19668] Updated weights for policy 0, policy_version 296386 (0.0014) [2025-01-05 14:54:54,431][19668] Updated weights for policy 0, policy_version 296396 (0.0016) [2025-01-05 14:54:54,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1214046208. Throughput: 0: 4914.3. Samples: 28505938. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:54,965][19571] Avg episode reward: [(0, '10.893')] [2025-01-05 14:54:56,564][19668] Updated weights for policy 0, policy_version 296406 (0.0018) [2025-01-05 14:54:58,837][19668] Updated weights for policy 0, policy_version 296416 (0.0017) [2025-01-05 14:54:59,965][19571] Fps is (10 sec: 18841.3, 60 sec: 19592.5, 300 sec: 19605.3). Total num frames: 1214140416. 
Throughput: 0: 4884.1. Samples: 28533854. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:54:59,965][19571] Avg episode reward: [(0, '10.983')] [2025-01-05 14:55:01,035][19668] Updated weights for policy 0, policy_version 296426 (0.0018) [2025-01-05 14:55:03,076][19668] Updated weights for policy 0, policy_version 296436 (0.0018) [2025-01-05 14:55:04,965][19571] Fps is (10 sec: 18841.8, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1214234624. Throughput: 0: 4887.8. Samples: 28548372. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:55:04,965][19571] Avg episode reward: [(0, '10.356')] [2025-01-05 14:55:05,291][19668] Updated weights for policy 0, policy_version 296446 (0.0018) [2025-01-05 14:55:07,426][19668] Updated weights for policy 0, policy_version 296456 (0.0017) [2025-01-05 14:55:09,458][19668] Updated weights for policy 0, policy_version 296466 (0.0017) [2025-01-05 14:55:09,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1214332928. Throughput: 0: 4863.7. Samples: 28577230. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:55:09,965][19571] Avg episode reward: [(0, '9.602')] [2025-01-05 14:55:11,655][19668] Updated weights for policy 0, policy_version 296476 (0.0018) [2025-01-05 14:55:13,761][19668] Updated weights for policy 0, policy_version 296486 (0.0017) [2025-01-05 14:55:14,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1214427136. Throughput: 0: 4839.2. Samples: 28605948. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:55:14,965][19571] Avg episode reward: [(0, '9.850')] [2025-01-05 14:55:15,892][19668] Updated weights for policy 0, policy_version 296496 (0.0018) [2025-01-05 14:55:17,981][19668] Updated weights for policy 0, policy_version 296506 (0.0016) [2025-01-05 14:55:19,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19605.3). Total num frames: 1214525440. Throughput: 0: 4834.7. 
Samples: 28620548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 14:55:19,965][19571] Avg episode reward: [(0, '9.830')] [2025-01-05 14:55:20,164][19668] Updated weights for policy 0, policy_version 296516 (0.0018) [2025-01-05 14:55:22,185][19668] Updated weights for policy 0, policy_version 296526 (0.0017) [2025-01-05 14:55:24,290][19668] Updated weights for policy 0, policy_version 296536 (0.0018) [2025-01-05 14:55:24,965][19571] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19605.3). Total num frames: 1214623744. Throughput: 0: 4818.8. Samples: 28649734. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:24,965][19571] Avg episode reward: [(0, '10.063')] [2025-01-05 14:55:26,485][19668] Updated weights for policy 0, policy_version 296546 (0.0018) [2025-01-05 14:55:28,514][19668] Updated weights for policy 0, policy_version 296556 (0.0019) [2025-01-05 14:55:29,965][19571] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19605.3). Total num frames: 1214717952. Throughput: 0: 4802.0. Samples: 28678494. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:29,965][19571] Avg episode reward: [(0, '9.215')] [2025-01-05 14:55:30,710][19668] Updated weights for policy 0, policy_version 296566 (0.0018) [2025-01-05 14:55:32,825][19668] Updated weights for policy 0, policy_version 296576 (0.0017) [2025-01-05 14:55:34,940][19668] Updated weights for policy 0, policy_version 296586 (0.0018) [2025-01-05 14:55:34,965][19571] Fps is (10 sec: 19250.6, 60 sec: 19319.3, 300 sec: 19605.2). Total num frames: 1214816256. Throughput: 0: 4793.4. Samples: 28693194. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:34,966][19571] Avg episode reward: [(0, '10.354')] [2025-01-05 14:55:37,145][19668] Updated weights for policy 0, policy_version 296596 (0.0018) [2025-01-05 14:55:39,287][19668] Updated weights for policy 0, policy_version 296606 (0.0018) [2025-01-05 14:55:39,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19251.3, 300 sec: 19591.4). Total num frames: 1214910464. Throughput: 0: 4791.3. Samples: 28721548. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:39,965][19571] Avg episode reward: [(0, '9.859')] [2025-01-05 14:55:41,424][19668] Updated weights for policy 0, policy_version 296616 (0.0019) [2025-01-05 14:55:43,600][19668] Updated weights for policy 0, policy_version 296626 (0.0018) [2025-01-05 14:55:44,965][19571] Fps is (10 sec: 18842.4, 60 sec: 19183.0, 300 sec: 19591.4). Total num frames: 1215004672. Throughput: 0: 4795.5. Samples: 28749652. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:44,965][19571] Avg episode reward: [(0, '11.215')] [2025-01-05 14:55:45,913][19668] Updated weights for policy 0, policy_version 296636 (0.0018) [2025-01-05 14:55:48,036][19668] Updated weights for policy 0, policy_version 296646 (0.0018) [2025-01-05 14:55:49,965][19571] Fps is (10 sec: 18022.4, 60 sec: 18978.1, 300 sec: 19549.7). Total num frames: 1215090688. Throughput: 0: 4781.9. Samples: 28763558. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:49,965][19571] Avg episode reward: [(0, '10.415')] [2025-01-05 14:55:50,422][19668] Updated weights for policy 0, policy_version 296656 (0.0019) [2025-01-05 14:55:52,554][19668] Updated weights for policy 0, policy_version 296666 (0.0017) [2025-01-05 14:55:54,571][19668] Updated weights for policy 0, policy_version 296676 (0.0017) [2025-01-05 14:55:54,965][19571] Fps is (10 sec: 18431.8, 60 sec: 19046.4, 300 sec: 19563.6). Total num frames: 1215188992. Throughput: 0: 4762.7. Samples: 28791552. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:54,965][19571] Avg episode reward: [(0, '10.746')] [2025-01-05 14:55:56,678][19668] Updated weights for policy 0, policy_version 296686 (0.0017) [2025-01-05 14:55:59,330][19668] Updated weights for policy 0, policy_version 296696 (0.0022) [2025-01-05 14:55:59,965][19571] Fps is (10 sec: 18431.3, 60 sec: 18909.8, 300 sec: 19521.9). Total num frames: 1215275008. Throughput: 0: 4716.8. Samples: 28818206. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:55:59,966][19571] Avg episode reward: [(0, '9.626')] [2025-01-05 14:56:01,773][19668] Updated weights for policy 0, policy_version 296706 (0.0021) [2025-01-05 14:56:04,132][19668] Updated weights for policy 0, policy_version 296716 (0.0018) [2025-01-05 14:56:04,965][19571] Fps is (10 sec: 17203.2, 60 sec: 18773.3, 300 sec: 19494.2). Total num frames: 1215361024. Throughput: 0: 4672.2. Samples: 28830798. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:04,965][19571] Avg episode reward: [(0, '10.768')] [2025-01-05 14:56:06,532][19668] Updated weights for policy 0, policy_version 296726 (0.0019) [2025-01-05 14:56:08,628][19668] Updated weights for policy 0, policy_version 296736 (0.0018) [2025-01-05 14:56:09,965][19571] Fps is (10 sec: 18022.9, 60 sec: 18705.0, 300 sec: 19480.3). Total num frames: 1215455232. Throughput: 0: 4617.0. Samples: 28857500. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:09,965][19571] Avg episode reward: [(0, '10.007')] [2025-01-05 14:56:10,964][19668] Updated weights for policy 0, policy_version 296746 (0.0019) [2025-01-05 14:56:13,120][19668] Updated weights for policy 0, policy_version 296756 (0.0017) [2025-01-05 14:56:14,965][19571] Fps is (10 sec: 18432.0, 60 sec: 18636.8, 300 sec: 19452.5). Total num frames: 1215545344. Throughput: 0: 4590.2. Samples: 28885052. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:14,965][19571] Avg episode reward: [(0, '10.323')] [2025-01-05 14:56:15,308][19668] Updated weights for policy 0, policy_version 296766 (0.0019) [2025-01-05 14:56:17,397][19668] Updated weights for policy 0, policy_version 296776 (0.0017) [2025-01-05 14:56:19,456][19668] Updated weights for policy 0, policy_version 296786 (0.0018) [2025-01-05 14:56:19,965][19571] Fps is (10 sec: 18841.6, 60 sec: 18636.8, 300 sec: 19452.5). Total num frames: 1215643648. Throughput: 0: 4596.0. Samples: 28900014. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:19,965][19571] Avg episode reward: [(0, '9.118')] [2025-01-05 14:56:21,645][19668] Updated weights for policy 0, policy_version 296796 (0.0018) [2025-01-05 14:56:23,851][19668] Updated weights for policy 0, policy_version 296806 (0.0018) [2025-01-05 14:56:24,965][19571] Fps is (10 sec: 19251.2, 60 sec: 18568.5, 300 sec: 19438.6). Total num frames: 1215737856. Throughput: 0: 4600.2. Samples: 28928558. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:24,965][19571] Avg episode reward: [(0, '9.605')] [2025-01-05 14:56:25,997][19668] Updated weights for policy 0, policy_version 296816 (0.0017) [2025-01-05 14:56:28,021][19668] Updated weights for policy 0, policy_version 296826 (0.0017) [2025-01-05 14:56:29,965][19571] Fps is (10 sec: 18841.5, 60 sec: 18568.5, 300 sec: 19438.6). Total num frames: 1215832064. Throughput: 0: 4611.0. Samples: 28957150. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:29,966][19571] Avg episode reward: [(0, '9.044')] [2025-01-05 14:56:30,305][19668] Updated weights for policy 0, policy_version 296836 (0.0018) [2025-01-05 14:56:32,498][19668] Updated weights for policy 0, policy_version 296846 (0.0017) [2025-01-05 14:56:34,527][19668] Updated weights for policy 0, policy_version 296856 (0.0018) [2025-01-05 14:56:34,965][19571] Fps is (10 sec: 18841.4, 60 sec: 18500.3, 300 sec: 19424.8). 
Total num frames: 1215926272. Throughput: 0: 4612.3. Samples: 28971112. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 14:56:34,966][19571] Avg episode reward: [(0, '9.483')] [2025-01-05 14:56:36,872][19668] Updated weights for policy 0, policy_version 296866 (0.0018) [2025-01-05 14:56:38,984][19668] Updated weights for policy 0, policy_version 296876 (0.0017) [2025-01-05 14:56:39,965][19571] Fps is (10 sec: 18841.6, 60 sec: 18500.2, 300 sec: 19410.9). Total num frames: 1216020480. Throughput: 0: 4623.2. Samples: 28999596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:56:39,965][19571] Avg episode reward: [(0, '10.629')] [2025-01-05 14:56:39,972][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000296880_1216020480.pth... [2025-01-05 14:56:40,033][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000295746_1211375616.pth [2025-01-05 14:56:41,157][19668] Updated weights for policy 0, policy_version 296886 (0.0019) [2025-01-05 14:56:43,288][19668] Updated weights for policy 0, policy_version 296896 (0.0018) [2025-01-05 14:56:44,965][19571] Fps is (10 sec: 18841.7, 60 sec: 18500.2, 300 sec: 19410.9). Total num frames: 1216114688. Throughput: 0: 4657.0. Samples: 29027770. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:56:44,965][19571] Avg episode reward: [(0, '10.344')] [2025-01-05 14:56:45,560][19668] Updated weights for policy 0, policy_version 296906 (0.0020) [2025-01-05 14:56:47,733][19668] Updated weights for policy 0, policy_version 296916 (0.0018) [2025-01-05 14:56:49,884][19668] Updated weights for policy 0, policy_version 296926 (0.0018) [2025-01-05 14:56:49,965][19571] Fps is (10 sec: 18841.9, 60 sec: 18636.8, 300 sec: 19397.0). Total num frames: 1216208896. Throughput: 0: 4681.3. Samples: 29041454. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:56:49,965][19571] Avg episode reward: [(0, '10.905')] [2025-01-05 14:56:52,099][19668] Updated weights for policy 0, policy_version 296936 (0.0019) [2025-01-05 14:56:54,202][19668] Updated weights for policy 0, policy_version 296946 (0.0018) [2025-01-05 14:56:54,965][19571] Fps is (10 sec: 18432.2, 60 sec: 18500.3, 300 sec: 19369.2). Total num frames: 1216299008. Throughput: 0: 4727.4. Samples: 29070232. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:56:54,965][19571] Avg episode reward: [(0, '10.436')] [2025-01-05 14:56:56,616][19668] Updated weights for policy 0, policy_version 296956 (0.0019) [2025-01-05 14:56:58,787][19668] Updated weights for policy 0, policy_version 296966 (0.0017) [2025-01-05 14:56:59,965][19571] Fps is (10 sec: 18431.8, 60 sec: 18636.9, 300 sec: 19355.3). Total num frames: 1216393216. Throughput: 0: 4711.9. Samples: 29097088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:56:59,966][19571] Avg episode reward: [(0, '9.693')] [2025-01-05 14:57:00,935][19668] Updated weights for policy 0, policy_version 296976 (0.0018) [2025-01-05 14:57:03,055][19668] Updated weights for policy 0, policy_version 296986 (0.0017) [2025-01-05 14:57:04,965][19571] Fps is (10 sec: 18841.5, 60 sec: 18773.4, 300 sec: 19355.3). Total num frames: 1216487424. Throughput: 0: 4708.3. Samples: 29111886. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:04,965][19571] Avg episode reward: [(0, '11.493')] [2025-01-05 14:57:05,417][19668] Updated weights for policy 0, policy_version 296996 (0.0018) [2025-01-05 14:57:07,558][19668] Updated weights for policy 0, policy_version 297006 (0.0017) [2025-01-05 14:57:09,807][19668] Updated weights for policy 0, policy_version 297016 (0.0017) [2025-01-05 14:57:09,965][19571] Fps is (10 sec: 18432.1, 60 sec: 18705.1, 300 sec: 19327.6). Total num frames: 1216577536. Throughput: 0: 4683.6. Samples: 29139318. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:09,965][19571] Avg episode reward: [(0, '10.653')] [2025-01-05 14:57:12,032][19668] Updated weights for policy 0, policy_version 297026 (0.0019) [2025-01-05 14:57:14,153][19668] Updated weights for policy 0, policy_version 297036 (0.0018) [2025-01-05 14:57:14,965][19571] Fps is (10 sec: 18432.1, 60 sec: 18773.4, 300 sec: 19327.6). Total num frames: 1216671744. Throughput: 0: 4663.8. Samples: 29167022. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:14,965][19571] Avg episode reward: [(0, '11.650')] [2025-01-05 14:57:16,404][19668] Updated weights for policy 0, policy_version 297046 (0.0017) [2025-01-05 14:57:18,522][19668] Updated weights for policy 0, policy_version 297056 (0.0017) [2025-01-05 14:57:19,965][19571] Fps is (10 sec: 18841.4, 60 sec: 18705.0, 300 sec: 19313.7). Total num frames: 1216765952. Throughput: 0: 4666.8. Samples: 29181116. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:19,966][19571] Avg episode reward: [(0, '11.303')] [2025-01-05 14:57:20,653][19668] Updated weights for policy 0, policy_version 297066 (0.0019) [2025-01-05 14:57:22,863][19668] Updated weights for policy 0, policy_version 297076 (0.0017) [2025-01-05 14:57:24,965][19571] Fps is (10 sec: 18841.6, 60 sec: 18705.1, 300 sec: 19285.9). Total num frames: 1216860160. Throughput: 0: 4663.3. Samples: 29209444. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:24,965][19571] Avg episode reward: [(0, '11.564')] [2025-01-05 14:57:25,134][19668] Updated weights for policy 0, policy_version 297086 (0.0019) [2025-01-05 14:57:27,274][19668] Updated weights for policy 0, policy_version 297096 (0.0018) [2025-01-05 14:57:29,565][19668] Updated weights for policy 0, policy_version 297106 (0.0017) [2025-01-05 14:57:29,965][19571] Fps is (10 sec: 18432.2, 60 sec: 18636.8, 300 sec: 19244.3). Total num frames: 1216950272. Throughput: 0: 4651.3. Samples: 29237078. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:29,965][19571] Avg episode reward: [(0, '11.459')] [2025-01-05 14:57:31,788][19668] Updated weights for policy 0, policy_version 297116 (0.0018) [2025-01-05 14:57:33,854][19668] Updated weights for policy 0, policy_version 297126 (0.0017) [2025-01-05 14:57:34,965][19571] Fps is (10 sec: 18431.7, 60 sec: 18636.8, 300 sec: 19230.4). Total num frames: 1217044480. Throughput: 0: 4658.0. Samples: 29251066. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:34,965][19571] Avg episode reward: [(0, '10.548')] [2025-01-05 14:57:36,130][19668] Updated weights for policy 0, policy_version 297136 (0.0019) [2025-01-05 14:57:38,204][19668] Updated weights for policy 0, policy_version 297146 (0.0018) [2025-01-05 14:57:39,965][19571] Fps is (10 sec: 18841.6, 60 sec: 18636.8, 300 sec: 19216.5). Total num frames: 1217138688. Throughput: 0: 4650.6. Samples: 29279508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:39,965][19571] Avg episode reward: [(0, '11.957')] [2025-01-05 14:57:40,367][19668] Updated weights for policy 0, policy_version 297156 (0.0019) [2025-01-05 14:57:42,463][19668] Updated weights for policy 0, policy_version 297166 (0.0017) [2025-01-05 14:57:44,508][19668] Updated weights for policy 0, policy_version 297176 (0.0018) [2025-01-05 14:57:44,965][19571] Fps is (10 sec: 19660.9, 60 sec: 18773.3, 300 sec: 19216.5). Total num frames: 1217241088. Throughput: 0: 4706.3. Samples: 29308872. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 14:57:44,965][19571] Avg episode reward: [(0, '11.240')] [2025-01-05 14:57:46,686][19668] Updated weights for policy 0, policy_version 297186 (0.0018) [2025-01-05 14:57:48,785][19668] Updated weights for policy 0, policy_version 297196 (0.0017) [2025-01-05 14:57:49,965][19571] Fps is (10 sec: 19660.6, 60 sec: 18773.3, 300 sec: 19202.6). Total num frames: 1217335296. Throughput: 0: 4698.1. Samples: 29323302. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:57:49,965][19571] Avg episode reward: [(0, '11.570')] [2025-01-05 14:57:50,967][19668] Updated weights for policy 0, policy_version 297206 (0.0018) [2025-01-05 14:57:53,011][19668] Updated weights for policy 0, policy_version 297216 (0.0018) [2025-01-05 14:57:54,965][19571] Fps is (10 sec: 18841.5, 60 sec: 18841.6, 300 sec: 19188.7). Total num frames: 1217429504. Throughput: 0: 4724.4. Samples: 29351918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:57:54,965][19571] Avg episode reward: [(0, '10.322')] [2025-01-05 14:57:55,235][19668] Updated weights for policy 0, policy_version 297226 (0.0019) [2025-01-05 14:57:57,304][19668] Updated weights for policy 0, policy_version 297236 (0.0018) [2025-01-05 14:57:59,369][19668] Updated weights for policy 0, policy_version 297246 (0.0018) [2025-01-05 14:57:59,965][19571] Fps is (10 sec: 19251.0, 60 sec: 18909.8, 300 sec: 19174.8). Total num frames: 1217527808. Throughput: 0: 4758.7. Samples: 29381166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:57:59,966][19571] Avg episode reward: [(0, '10.939')] [2025-01-05 14:58:01,581][19668] Updated weights for policy 0, policy_version 297256 (0.0018) [2025-01-05 14:58:03,625][19668] Updated weights for policy 0, policy_version 297266 (0.0017) [2025-01-05 14:58:04,965][19571] Fps is (10 sec: 19661.0, 60 sec: 18978.1, 300 sec: 19174.8). Total num frames: 1217626112. Throughput: 0: 4769.1. Samples: 29395726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:04,965][19571] Avg episode reward: [(0, '10.645')] [2025-01-05 14:58:05,761][19668] Updated weights for policy 0, policy_version 297276 (0.0018) [2025-01-05 14:58:07,857][19668] Updated weights for policy 0, policy_version 297286 (0.0017) [2025-01-05 14:58:09,965][19571] Fps is (10 sec: 19251.5, 60 sec: 19046.4, 300 sec: 19147.1). Total num frames: 1217720320. Throughput: 0: 4780.8. Samples: 29424582. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:09,965][19571] Avg episode reward: [(0, '11.455')] [2025-01-05 14:58:09,999][19668] Updated weights for policy 0, policy_version 297296 (0.0018) [2025-01-05 14:58:12,164][19668] Updated weights for policy 0, policy_version 297306 (0.0019) [2025-01-05 14:58:14,284][19668] Updated weights for policy 0, policy_version 297316 (0.0018) [2025-01-05 14:58:14,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19114.6, 300 sec: 19147.1). Total num frames: 1217818624. Throughput: 0: 4805.2. Samples: 29453314. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:14,965][19571] Avg episode reward: [(0, '12.325')] [2025-01-05 14:58:16,435][19668] Updated weights for policy 0, policy_version 297326 (0.0018) [2025-01-05 14:58:18,503][19668] Updated weights for policy 0, policy_version 297336 (0.0018) [2025-01-05 14:58:19,965][19571] Fps is (10 sec: 19251.4, 60 sec: 19114.7, 300 sec: 19133.2). Total num frames: 1217912832. Throughput: 0: 4816.4. Samples: 29467802. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:19,965][19571] Avg episode reward: [(0, '12.249')] [2025-01-05 14:58:20,726][19668] Updated weights for policy 0, policy_version 297346 (0.0019) [2025-01-05 14:58:22,776][19668] Updated weights for policy 0, policy_version 297356 (0.0017) [2025-01-05 14:58:24,851][19668] Updated weights for policy 0, policy_version 297366 (0.0017) [2025-01-05 14:58:24,965][19571] Fps is (10 sec: 19251.2, 60 sec: 19182.9, 300 sec: 19119.3). Total num frames: 1218011136. Throughput: 0: 4827.9. Samples: 29496764. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:24,965][19571] Avg episode reward: [(0, '12.679')] [2025-01-05 14:58:24,966][19636] Saving new best policy, reward=12.679! 
[2025-01-05 14:58:27,315][19668] Updated weights for policy 0, policy_version 297376 (0.0020) [2025-01-05 14:58:29,345][19668] Updated weights for policy 0, policy_version 297386 (0.0017) [2025-01-05 14:58:29,965][19571] Fps is (10 sec: 18841.2, 60 sec: 19182.9, 300 sec: 19091.5). Total num frames: 1218101248. Throughput: 0: 4795.6. Samples: 29524676. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 14:58:29,966][19571] Avg episode reward: [(0, '11.793')] [2025-01-05 14:58:31,511][19668] Updated weights for policy 0, policy_version 297396 (0.0018) [2025-01-05 14:58:31,988][19571] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 19571], exiting... [2025-01-05 14:58:31,989][19571] Runner profile tree view: main_loop: 5988.9978 [2025-01-05 14:58:31,989][19571] Collected {0: 1218142208}, FPS: 19725.6 [2025-01-05 14:58:31,993][19636] Stopping Batcher_0... [2025-01-05 14:58:31,994][19636] Loop batcher_evt_loop terminating... [2025-01-05 14:58:31,998][19636] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297398_1218142208.pth... 
[2025-01-05 14:58:32,014][19695] EvtLoop [rollout_proc10_evt_loop, process=rollout_proc10] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance10'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,030][19695] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc10_evt_loop [2025-01-05 14:58:32,029][19694] EvtLoop [rollout_proc9_evt_loop, process=rollout_proc9] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance9'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. 
[2025-01-05 14:58:32,019][19671] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,033][19694] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc9_evt_loop [2025-01-05 14:58:32,034][19671] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc2_evt_loop [2025-01-05 14:58:32,033][19693] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,038][19693] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc7_evt_loop [2025-01-05 14:58:32,026][19690] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,039][19690] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc4_evt_loop [2025-01-05 14:58:32,045][19692] EvtLoop [rollout_proc8_evt_loop, process=rollout_proc8] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance8'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,050][19692] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc8_evt_loop [2025-01-05 14:58:32,062][19668] Weights refcount: 2 0 [2025-01-05 14:58:32,064][19668] Stopping InferenceWorker_p0-w0... [2025-01-05 14:58:32,065][19668] Loop inference_proc0-0_evt_loop terminating... 
[2025-01-05 14:58:32,069][19672] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,074][19672] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc3_evt_loop [2025-01-05 14:58:32,075][19689] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,089][19689] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop [2025-01-05 14:58:32,092][19636] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000296328_1213759488.pth [2025-01-05 14:58:32,093][19636] Stopping LearnerWorker_p0... [2025-01-05 14:58:32,094][19636] Loop learner_proc0_evt_loop terminating... 
[2025-01-05 14:58:32,071][19669] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,095][19669] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc1_evt_loop [2025-01-05 14:58:32,165][19696] EvtLoop [rollout_proc11_evt_loop, process=rollout_proc11] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance11'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,169][19696] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc11_evt_loop [2025-01-05 14:58:32,164][19670] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,208][19670] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc0_evt_loop [2025-01-05 14:58:32,272][19691] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 14:58:32,280][19691] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc5_evt_loop [2025-01-05 14:58:32,546][19571] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 14:58:32,547][19571] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 14:58:32,547][19571] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 14:58:32,547][19571] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 14:58:32,548][19571] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2025-01-05 14:58:32,548][19571] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 14:58:32,548][19571] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 14:58:32,549][19571] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 14:58:32,549][19571] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 14:58:32,549][19571] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 14:58:32,550][19571] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 14:58:32,550][19571] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 14:58:32,550][19571] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 14:58:32,551][19571] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 14:58:32,551][19571] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 14:58:32,603][19571] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 14:58:32,610][19571] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 14:58:32,611][19571] RunningMeanStd input shape: (1,) [2025-01-05 14:58:32,636][19571] ConvEncoder: input_channels=3 [2025-01-05 14:58:32,991][19571] Conv encoder output size: 512 [2025-01-05 14:58:32,992][19571] Policy head output size: 512 [2025-01-05 14:58:33,197][19571] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297398_1218142208.pth... [2025-01-05 14:58:33,917][19571] Num frames 100... [2025-01-05 14:58:34,031][19571] Num frames 200... [2025-01-05 14:58:34,150][19571] Num frames 300... [2025-01-05 14:58:34,263][19571] Num frames 400... [2025-01-05 14:58:34,357][19571] Num frames 500... 
[2025-01-05 14:58:34,459][19571] Num frames 600... [2025-01-05 14:58:34,587][19571] Avg episode rewards: #0: 10.720, true rewards: #0: 6.720 [2025-01-05 14:58:34,587][19571] Avg episode reward: 10.720, avg true_objective: 6.720 [2025-01-05 14:58:34,641][19571] Num frames 700... [2025-01-05 14:58:34,739][19571] Num frames 800... [2025-01-05 14:58:34,842][19571] Num frames 900... [2025-01-05 14:58:34,947][19571] Num frames 1000... [2025-01-05 14:58:35,058][19571] Num frames 1100... [2025-01-05 14:58:35,162][19571] Num frames 1200... [2025-01-05 14:58:35,268][19571] Num frames 1300... [2025-01-05 15:20:41,010][07361] Saving configuration to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json... [2025-01-05 15:20:41,011][07361] Rollout worker 0 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 1 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 2 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 3 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 4 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 5 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 6 uses device cpu [2025-01-05 15:20:41,012][07361] Rollout worker 7 uses device cpu [2025-01-05 15:20:41,013][07361] Rollout worker 8 uses device cpu [2025-01-05 15:20:41,013][07361] Rollout worker 9 uses device cpu [2025-01-05 15:20:41,013][07361] Rollout worker 10 uses device cpu [2025-01-05 15:20:41,013][07361] Rollout worker 11 uses device cpu [2025-01-05 15:20:41,110][07361] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 15:20:41,110][07361] InferenceWorker_p0-w0: min num requests: 4 [2025-01-05 15:20:41,135][07361] Starting all processes... [2025-01-05 15:20:41,135][07361] Starting process learner_proc0 [2025-01-05 15:20:42,477][07361] Starting all processes... 
[2025-01-05 15:20:42,488][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 15:20:42,488][07448] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-05 15:20:42,490][07361] Starting process inference_proc0-0 [2025-01-05 15:20:42,490][07361] Starting process rollout_proc0 [2025-01-05 15:20:42,492][07361] Starting process rollout_proc1 [2025-01-05 15:20:42,493][07361] Starting process rollout_proc2 [2025-01-05 15:20:42,494][07361] Starting process rollout_proc3 [2025-01-05 15:20:42,501][07448] Num visible devices: 1 [2025-01-05 15:20:42,497][07361] Starting process rollout_proc4 [2025-01-05 15:20:42,499][07361] Starting process rollout_proc5 [2025-01-05 15:20:42,501][07361] Starting process rollout_proc6 [2025-01-05 15:20:42,518][07448] Starting seed is not provided [2025-01-05 15:20:42,519][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 15:20:42,519][07448] Initializing actor-critic model on device cuda:0 [2025-01-05 15:20:42,502][07361] Starting process rollout_proc7 [2025-01-05 15:20:42,521][07448] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 15:20:42,530][07448] RunningMeanStd input shape: (1,) [2025-01-05 15:20:42,502][07361] Starting process rollout_proc8 [2025-01-05 15:20:42,508][07361] Starting process rollout_proc9 [2025-01-05 15:20:42,509][07361] Starting process rollout_proc10 [2025-01-05 15:20:42,527][07361] Starting process rollout_proc11 [2025-01-05 15:20:42,554][07448] ConvEncoder: input_channels=3 [2025-01-05 15:20:42,950][07448] Conv encoder output size: 512 [2025-01-05 15:20:42,951][07448] Policy head output size: 512 [2025-01-05 15:20:43,018][07448] Created Actor Critic model with architecture: [2025-01-05 15:20:43,019][07448] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): 
RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-05 15:20:43,310][07448] Using optimizer [2025-01-05 15:20:46,102][07510] Worker 11 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,387][07503] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,519][07482] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 15:20:46,519][07482] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-05 15:20:46,549][07482] Num visible devices: 1 [2025-01-05 15:20:46,625][07485] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,667][07486] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,687][07509] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,802][07508] Worker 10 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 
15:20:46,805][07484] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,867][07505] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,882][07483] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:46,882][07448] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297398_1218142208.pth... [2025-01-05 15:20:46,979][07448] Loading model from checkpoint [2025-01-05 15:20:46,982][07448] Loaded experiment state at self.train_step=297398, self.env_steps=1218142208 [2025-01-05 15:20:46,985][07448] Initialized policy 0 weights for model version 297398 [2025-01-05 15:20:46,989][07448] LearnerWorker_p0 finished initialization! [2025-01-05 15:20:46,989][07448] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-05 15:20:46,990][07504] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:47,004][07507] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:47,014][07361] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1218142208. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 15:20:47,018][07506] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] [2025-01-05 15:20:47,087][07482] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 15:20:47,087][07482] RunningMeanStd input shape: (1,) [2025-01-05 15:20:47,094][07482] ConvEncoder: input_channels=3 [2025-01-05 15:20:47,166][07482] Conv encoder output size: 512 [2025-01-05 15:20:47,167][07482] Policy head output size: 512 [2025-01-05 15:20:47,188][07361] Inference worker 0-0 is ready! [2025-01-05 15:20:47,188][07361] All inference workers are ready! Signal rollout workers to start! 
[2025-01-05 15:20:47,239][07486] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,239][07485] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,240][07503] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,251][07508] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07484] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07509] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07505] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07483] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07510] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07504] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07506] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,272][07507] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 15:20:47,828][07505] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,828][07504] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,828][07506] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,830][07485] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,837][07503] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,837][07486] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,837][07508] Decorrelating experience for 0 frames... [2025-01-05 15:20:47,852][07361] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1218142208. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 15:20:48,215][07503] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,224][07510] Decorrelating experience for 0 frames... [2025-01-05 15:20:48,229][07506] Decorrelating experience for 32 frames... 
[2025-01-05 15:20:48,230][07483] Decorrelating experience for 0 frames... [2025-01-05 15:20:48,254][07486] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,273][07484] Decorrelating experience for 0 frames... [2025-01-05 15:20:48,327][07485] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,353][07505] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,607][07504] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,617][07510] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,625][07509] Decorrelating experience for 0 frames... [2025-01-05 15:20:48,675][07483] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,756][07506] Decorrelating experience for 64 frames... [2025-01-05 15:20:48,765][07484] Decorrelating experience for 32 frames... [2025-01-05 15:20:48,997][07486] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,009][07509] Decorrelating experience for 32 frames... [2025-01-05 15:20:49,094][07507] Decorrelating experience for 0 frames... [2025-01-05 15:20:49,113][07504] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,120][07510] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,134][07508] Decorrelating experience for 32 frames... [2025-01-05 15:20:49,151][07485] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,278][07505] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,405][07484] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,468][07486] Decorrelating experience for 96 frames... [2025-01-05 15:20:49,471][07507] Decorrelating experience for 32 frames... [2025-01-05 15:20:49,541][07504] Decorrelating experience for 96 frames... [2025-01-05 15:20:49,639][07503] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,771][07505] Decorrelating experience for 96 frames... [2025-01-05 15:20:49,827][07508] Decorrelating experience for 64 frames... 
[2025-01-05 15:20:49,845][07483] Decorrelating experience for 64 frames... [2025-01-05 15:20:49,970][07506] Decorrelating experience for 96 frames... [2025-01-05 15:20:49,993][07507] Decorrelating experience for 64 frames... [2025-01-05 15:20:50,089][07503] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,150][07484] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,219][07510] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,257][07485] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,367][07508] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,433][07507] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,486][07483] Decorrelating experience for 96 frames... [2025-01-05 15:20:50,561][07509] Decorrelating experience for 64 frames... [2025-01-05 15:20:50,856][07509] Decorrelating experience for 96 frames... [2025-01-05 15:20:51,673][07448] Signal inference workers to stop experience collection... [2025-01-05 15:20:51,716][07482] InferenceWorker_p0-w0: stopping experience collection [2025-01-05 15:20:52,852][07361] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1218142208. Throughput: 0: 9.3. Samples: 54. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-05 15:20:52,853][07361] Avg episode reward: [(0, '1.423')] [2025-01-05 15:20:56,002][07448] Signal inference workers to resume experience collection... [2025-01-05 15:20:56,002][07482] InferenceWorker_p0-w0: resuming experience collection [2025-01-05 15:20:57,852][07361] Fps is (10 sec: 2867.2, 60 sec: 2645.6, 300 sec: 2645.6). Total num frames: 1218170880. Throughput: 0: 301.4. Samples: 3266. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:20:57,852][07361] Avg episode reward: [(0, '6.581')] [2025-01-05 15:20:58,424][07482] Updated weights for policy 0, policy_version 297408 (0.0094) [2025-01-05 15:21:01,105][07361] Heartbeat connected on Batcher_0 [2025-01-05 15:21:01,107][07361] Heartbeat connected on LearnerWorker_p0 [2025-01-05 15:21:01,113][07361] Heartbeat connected on RolloutWorker_w0 [2025-01-05 15:21:01,117][07361] Heartbeat connected on RolloutWorker_w1 [2025-01-05 15:21:01,119][07361] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-05 15:21:01,123][07361] Heartbeat connected on RolloutWorker_w4 [2025-01-05 15:21:01,123][07361] Heartbeat connected on RolloutWorker_w3 [2025-01-05 15:21:01,123][07361] Heartbeat connected on RolloutWorker_w2 [2025-01-05 15:21:01,124][07361] Heartbeat connected on RolloutWorker_w5 [2025-01-05 15:21:01,127][07361] Heartbeat connected on RolloutWorker_w6 [2025-01-05 15:21:01,127][07361] Heartbeat connected on RolloutWorker_w7 [2025-01-05 15:21:01,131][07361] Heartbeat connected on RolloutWorker_w8 [2025-01-05 15:21:01,131][07361] Heartbeat connected on RolloutWorker_w9 [2025-01-05 15:21:01,133][07361] Heartbeat connected on RolloutWorker_w10 [2025-01-05 15:21:01,137][07361] Heartbeat connected on RolloutWorker_w11 [2025-01-05 15:21:01,295][07482] Updated weights for policy 0, policy_version 297418 (0.0017) [2025-01-05 15:21:02,852][07361] Fps is (10 sec: 10239.9, 60 sec: 6465.5, 300 sec: 6465.5). Total num frames: 1218244608. Throughput: 0: 1578.6. Samples: 25002. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:02,853][07361] Avg episode reward: [(0, '12.664')] [2025-01-05 15:21:04,248][07482] Updated weights for policy 0, policy_version 297428 (0.0016) [2025-01-05 15:21:07,078][07482] Updated weights for policy 0, policy_version 297438 (0.0022) [2025-01-05 15:21:07,852][07361] Fps is (10 sec: 14335.3, 60 sec: 8255.7, 300 sec: 8255.7). Total num frames: 1218314240. 
Throughput: 0: 1698.5. Samples: 35394. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:07,853][07361] Avg episode reward: [(0, '11.812')] [2025-01-05 15:21:10,332][07482] Updated weights for policy 0, policy_version 297448 (0.0025) [2025-01-05 15:21:12,852][07361] Fps is (10 sec: 13107.1, 60 sec: 9036.0, 300 sec: 9036.0). Total num frames: 1218375680. Throughput: 0: 2136.9. Samples: 55212. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:12,854][07361] Avg episode reward: [(0, '11.833')] [2025-01-05 15:21:13,587][07482] Updated weights for policy 0, policy_version 297458 (0.0027) [2025-01-05 15:21:16,678][07482] Updated weights for policy 0, policy_version 297468 (0.0025) [2025-01-05 15:21:17,853][07361] Fps is (10 sec: 12697.1, 60 sec: 9696.0, 300 sec: 9696.0). Total num frames: 1218441216. Throughput: 0: 2397.4. Samples: 73932. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:17,854][07361] Avg episode reward: [(0, '11.645')] [2025-01-05 15:21:20,101][07482] Updated weights for policy 0, policy_version 297478 (0.0027) [2025-01-05 15:21:22,852][07361] Fps is (10 sec: 12697.6, 60 sec: 10057.7, 300 sec: 10057.7). Total num frames: 1218502656. Throughput: 0: 2331.2. Samples: 83546. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:22,854][07361] Avg episode reward: [(0, '13.630')] [2025-01-05 15:21:22,856][07448] Saving new best policy, reward=13.630! [2025-01-05 15:21:23,366][07482] Updated weights for policy 0, policy_version 297488 (0.0024) [2025-01-05 15:21:26,624][07482] Updated weights for policy 0, policy_version 297498 (0.0026) [2025-01-05 15:21:27,852][07361] Fps is (10 sec: 12288.5, 60 sec: 10330.8, 300 sec: 10330.8). Total num frames: 1218564096. Throughput: 0: 2494.0. Samples: 101850. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:27,854][07361] Avg episode reward: [(0, '13.152')] [2025-01-05 15:21:30,095][07482] Updated weights for policy 0, policy_version 297508 (0.0026) [2025-01-05 15:21:32,851][07361] Fps is (10 sec: 12288.7, 60 sec: 10544.4, 300 sec: 10544.4). Total num frames: 1218625536. Throughput: 0: 2670.8. Samples: 120186. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:32,852][07361] Avg episode reward: [(0, '12.354')] [2025-01-05 15:21:33,351][07482] Updated weights for policy 0, policy_version 297518 (0.0032) [2025-01-05 15:21:36,868][07482] Updated weights for policy 0, policy_version 297528 (0.0028) [2025-01-05 15:21:37,852][07361] Fps is (10 sec: 11878.1, 60 sec: 10635.1, 300 sec: 10635.1). Total num frames: 1218682880. Throughput: 0: 2881.0. Samples: 129702. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:37,854][07361] Avg episode reward: [(0, '12.368')] [2025-01-05 15:21:40,462][07482] Updated weights for policy 0, policy_version 297538 (0.0020) [2025-01-05 15:21:42,851][07361] Fps is (10 sec: 11878.4, 60 sec: 10783.3, 300 sec: 10783.3). Total num frames: 1218744320. Throughput: 0: 3198.1. Samples: 147182. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:42,852][07361] Avg episode reward: [(0, '12.760')] [2025-01-05 15:21:43,752][07482] Updated weights for policy 0, policy_version 297548 (0.0032) [2025-01-05 15:21:47,852][07361] Fps is (10 sec: 11059.8, 60 sec: 10854.3, 300 sec: 10704.9). Total num frames: 1218793472. Throughput: 0: 3065.3. Samples: 162940. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:47,853][07361] Avg episode reward: [(0, '14.160')] [2025-01-05 15:21:47,857][07448] Saving new best policy, reward=14.160! 
[2025-01-05 15:21:48,149][07482] Updated weights for policy 0, policy_version 297558 (0.0030) [2025-01-05 15:21:52,151][07482] Updated weights for policy 0, policy_version 297568 (0.0035) [2025-01-05 15:21:52,852][07361] Fps is (10 sec: 10239.4, 60 sec: 11741.8, 300 sec: 10700.7). Total num frames: 1218846720. Throughput: 0: 2984.2. Samples: 169684. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:52,853][07361] Avg episode reward: [(0, '15.193')] [2025-01-05 15:21:52,855][07448] Saving new best policy, reward=15.193! [2025-01-05 15:21:57,109][07482] Updated weights for policy 0, policy_version 297578 (0.0027) [2025-01-05 15:21:57,853][07361] Fps is (10 sec: 9010.0, 60 sec: 11878.1, 300 sec: 10465.6). Total num frames: 1218883584. Throughput: 0: 2860.6. Samples: 183944. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:21:57,855][07361] Avg episode reward: [(0, '13.217')] [2025-01-05 15:22:01,222][07482] Updated weights for policy 0, policy_version 297588 (0.0028) [2025-01-05 15:22:02,852][07361] Fps is (10 sec: 9421.0, 60 sec: 11605.4, 300 sec: 10532.0). Total num frames: 1218940928. Throughput: 0: 2767.2. Samples: 198456. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:22:02,853][07361] Avg episode reward: [(0, '13.879')] [2025-01-05 15:22:04,549][07482] Updated weights for policy 0, policy_version 297598 (0.0031) [2025-01-05 15:22:07,852][07361] Fps is (10 sec: 11470.6, 60 sec: 11400.6, 300 sec: 10589.9). Total num frames: 1218998272. Throughput: 0: 2761.8. Samples: 207824. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-05 15:22:07,852][07361] Avg episode reward: [(0, '13.492')] [2025-01-05 15:22:07,996][07482] Updated weights for policy 0, policy_version 297608 (0.0021) [2025-01-05 15:22:11,579][07482] Updated weights for policy 0, policy_version 297618 (0.0025) [2025-01-05 15:22:12,852][07361] Fps is (10 sec: 11469.1, 60 sec: 11332.4, 300 sec: 10641.1). Total num frames: 1219055616. Throughput: 0: 2743.0. 
Samples: 225284. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:12,852][07361] Avg episode reward: [(0, '14.106')] [2025-01-05 15:22:14,934][07482] Updated weights for policy 0, policy_version 297628 (0.0027) [2025-01-05 15:22:17,852][07361] Fps is (10 sec: 11878.4, 60 sec: 11264.2, 300 sec: 10731.8). Total num frames: 1219117056. Throughput: 0: 2736.4. Samples: 243324. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:17,852][07361] Avg episode reward: [(0, '16.467')] [2025-01-05 15:22:17,859][07448] Saving new best policy, reward=16.467! [2025-01-05 15:22:18,498][07482] Updated weights for policy 0, policy_version 297638 (0.0030) [2025-01-05 15:22:21,922][07482] Updated weights for policy 0, policy_version 297648 (0.0027) [2025-01-05 15:22:22,852][07361] Fps is (10 sec: 11878.3, 60 sec: 11195.8, 300 sec: 10770.2). Total num frames: 1219174400. Throughput: 0: 2717.0. Samples: 251964. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:22,852][07361] Avg episode reward: [(0, '15.813')] [2025-01-05 15:22:25,410][07482] Updated weights for policy 0, policy_version 297658 (0.0026) [2025-01-05 15:22:27,852][07361] Fps is (10 sec: 11877.7, 60 sec: 11195.7, 300 sec: 10845.4). Total num frames: 1219235840. Throughput: 0: 2726.0. Samples: 269856. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:27,854][07361] Avg episode reward: [(0, '14.429')] [2025-01-05 15:22:28,785][07482] Updated weights for policy 0, policy_version 297668 (0.0029) [2025-01-05 15:22:32,265][07482] Updated weights for policy 0, policy_version 297678 (0.0025) [2025-01-05 15:22:32,852][07361] Fps is (10 sec: 11877.9, 60 sec: 11127.4, 300 sec: 10874.9). Total num frames: 1219293184. Throughput: 0: 2767.9. Samples: 287498. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:32,853][07361] Avg episode reward: [(0, '15.840')] [2025-01-05 15:22:35,681][07482] Updated weights for policy 0, policy_version 297688 (0.0029) [2025-01-05 15:22:37,852][07361] Fps is (10 sec: 11878.6, 60 sec: 11195.8, 300 sec: 10938.6). Total num frames: 1219354624. Throughput: 0: 2824.6. Samples: 296792. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:37,852][07361] Avg episode reward: [(0, '16.146')] [2025-01-05 15:22:37,858][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297694_1219354624.pth... [2025-01-05 15:22:37,913][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000296880_1216020480.pth [2025-01-05 15:22:39,256][07482] Updated weights for policy 0, policy_version 297698 (0.0030) [2025-01-05 15:22:42,485][07482] Updated weights for policy 0, policy_version 297708 (0.0025) [2025-01-05 15:22:42,852][07361] Fps is (10 sec: 12288.1, 60 sec: 11195.6, 300 sec: 10996.9). Total num frames: 1219416064. Throughput: 0: 2900.9. Samples: 314480. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:42,853][07361] Avg episode reward: [(0, '15.730')] [2025-01-05 15:22:45,915][07482] Updated weights for policy 0, policy_version 297718 (0.0031) [2025-01-05 15:22:47,852][07361] Fps is (10 sec: 11878.7, 60 sec: 11332.3, 300 sec: 11016.4). Total num frames: 1219473408. Throughput: 0: 2978.1. Samples: 332472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:47,852][07361] Avg episode reward: [(0, '15.684')] [2025-01-05 15:22:49,867][07482] Updated weights for policy 0, policy_version 297728 (0.0029) [2025-01-05 15:22:52,852][07361] Fps is (10 sec: 11059.1, 60 sec: 11332.3, 300 sec: 11001.8). Total num frames: 1219526656. Throughput: 0: 2936.2. Samples: 339954. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:52,853][07361] Avg episode reward: [(0, '14.498')] [2025-01-05 15:22:53,373][07482] Updated weights for policy 0, policy_version 297738 (0.0025) [2025-01-05 15:22:57,109][07482] Updated weights for policy 0, policy_version 297748 (0.0029) [2025-01-05 15:22:57,852][07361] Fps is (10 sec: 10649.1, 60 sec: 11605.5, 300 sec: 10988.4). Total num frames: 1219579904. Throughput: 0: 2920.5. Samples: 356710. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:22:57,854][07361] Avg episode reward: [(0, '15.754')] [2025-01-05 15:23:01,488][07482] Updated weights for policy 0, policy_version 297758 (0.0030) [2025-01-05 15:23:02,852][07361] Fps is (10 sec: 10240.5, 60 sec: 11468.9, 300 sec: 10945.8). Total num frames: 1219629056. Throughput: 0: 2849.2. Samples: 371540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:23:02,852][07361] Avg episode reward: [(0, '16.357')] [2025-01-05 15:23:06,873][07482] Updated weights for policy 0, policy_version 297768 (0.0034) [2025-01-05 15:23:07,852][07361] Fps is (10 sec: 8601.5, 60 sec: 11127.3, 300 sec: 10818.9). Total num frames: 1219665920. Throughput: 0: 2778.2. Samples: 376986. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:23:07,854][07361] Avg episode reward: [(0, '16.210')] [2025-01-05 15:23:10,681][07482] Updated weights for policy 0, policy_version 297778 (0.0029) [2025-01-05 15:23:12,852][07361] Fps is (10 sec: 9420.5, 60 sec: 11127.4, 300 sec: 10841.2). Total num frames: 1219723264. Throughput: 0: 2704.2. Samples: 391544. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:23:12,853][07361] Avg episode reward: [(0, '14.873')] [2025-01-05 15:23:14,210][07482] Updated weights for policy 0, policy_version 297788 (0.0028) [2025-01-05 15:23:17,847][07482] Updated weights for policy 0, policy_version 297798 (0.0027) [2025-01-05 15:23:17,851][07361] Fps is (10 sec: 11469.9, 60 sec: 11059.2, 300 sec: 10862.0). 
Total num frames: 1219780608. Throughput: 0: 2703.8. Samples: 409166. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:23:17,852][07361] Avg episode reward: [(0, '12.967')] [2025-01-05 15:23:21,840][07482] Updated weights for policy 0, policy_version 297808 (0.0027) [2025-01-05 15:23:22,852][07361] Fps is (10 sec: 10649.9, 60 sec: 10922.7, 300 sec: 10828.9). Total num frames: 1219829760. Throughput: 0: 2655.0. Samples: 416268. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:22,852][07361] Avg episode reward: [(0, '14.428')] [2025-01-05 15:23:25,287][07482] Updated weights for policy 0, policy_version 297818 (0.0033) [2025-01-05 15:23:27,852][07361] Fps is (10 sec: 11058.4, 60 sec: 10922.7, 300 sec: 10874.2). Total num frames: 1219891200. Throughput: 0: 2657.7. Samples: 434076. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:27,854][07361] Avg episode reward: [(0, '15.432')] [2025-01-05 15:23:28,726][07482] Updated weights for policy 0, policy_version 297828 (0.0027) [2025-01-05 15:23:32,413][07482] Updated weights for policy 0, policy_version 297838 (0.0030) [2025-01-05 15:23:32,852][07361] Fps is (10 sec: 11877.9, 60 sec: 10922.7, 300 sec: 10892.2). Total num frames: 1219948544. Throughput: 0: 2637.2. Samples: 451148. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:32,853][07361] Avg episode reward: [(0, '15.191')] [2025-01-05 15:23:35,885][07482] Updated weights for policy 0, policy_version 297848 (0.0027) [2025-01-05 15:23:37,852][07361] Fps is (10 sec: 11059.6, 60 sec: 10786.2, 300 sec: 10885.1). Total num frames: 1220001792. Throughput: 0: 2669.3. Samples: 460072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:37,852][07361] Avg episode reward: [(0, '14.132')] [2025-01-05 15:23:39,677][07482] Updated weights for policy 0, policy_version 297858 (0.0028) [2025-01-05 15:23:42,852][07361] Fps is (10 sec: 10240.1, 60 sec: 10581.3, 300 sec: 10855.1). Total num frames: 1220050944. 
Throughput: 0: 2630.6. Samples: 475088. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:42,854][07361] Avg episode reward: [(0, '14.133')] [2025-01-05 15:23:44,091][07482] Updated weights for policy 0, policy_version 297868 (0.0031) [2025-01-05 15:23:47,852][07361] Fps is (10 sec: 9830.0, 60 sec: 10444.7, 300 sec: 10826.7). Total num frames: 1220100096. Throughput: 0: 2643.7. Samples: 490508. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:47,854][07361] Avg episode reward: [(0, '15.755')] [2025-01-05 15:23:48,488][07482] Updated weights for policy 0, policy_version 297878 (0.0036) [2025-01-05 15:23:52,851][07361] Fps is (10 sec: 9421.3, 60 sec: 10308.4, 300 sec: 10777.9). Total num frames: 1220145152. Throughput: 0: 2637.1. Samples: 495652. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:52,852][07361] Avg episode reward: [(0, '14.805')] [2025-01-05 15:23:52,859][07482] Updated weights for policy 0, policy_version 297888 (0.0031) [2025-01-05 15:23:57,120][07482] Updated weights for policy 0, policy_version 297898 (0.0028) [2025-01-05 15:23:57,852][07361] Fps is (10 sec: 9831.0, 60 sec: 10308.4, 300 sec: 10774.6). Total num frames: 1220198400. Throughput: 0: 2643.5. Samples: 510502. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:23:57,852][07361] Avg episode reward: [(0, '15.150')] [2025-01-05 15:24:00,784][07482] Updated weights for policy 0, policy_version 297908 (0.0032) [2025-01-05 15:24:02,852][07361] Fps is (10 sec: 11059.2, 60 sec: 10444.8, 300 sec: 10792.3). Total num frames: 1220255744. Throughput: 0: 2637.1. Samples: 527836. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:02,852][07361] Avg episode reward: [(0, '15.123')] [2025-01-05 15:24:04,111][07482] Updated weights for policy 0, policy_version 297918 (0.0023) [2025-01-05 15:24:07,852][07361] Fps is (10 sec: 11059.1, 60 sec: 10718.0, 300 sec: 10788.7). Total num frames: 1220308992. Throughput: 0: 2665.6. Samples: 536220. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:07,852][07361] Avg episode reward: [(0, '16.582')] [2025-01-05 15:24:07,858][07448] Saving new best policy, reward=16.582! [2025-01-05 15:24:08,154][07482] Updated weights for policy 0, policy_version 297928 (0.0031) [2025-01-05 15:24:11,475][07482] Updated weights for policy 0, policy_version 297938 (0.0028) [2025-01-05 15:24:12,852][07361] Fps is (10 sec: 11058.9, 60 sec: 10717.9, 300 sec: 10805.2). Total num frames: 1220366336. Throughput: 0: 2644.4. Samples: 553072. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:12,853][07361] Avg episode reward: [(0, '18.370')] [2025-01-05 15:24:12,854][07448] Saving new best policy, reward=18.370! [2025-01-05 15:24:15,382][07482] Updated weights for policy 0, policy_version 297948 (0.0034) [2025-01-05 15:24:17,852][07361] Fps is (10 sec: 11059.1, 60 sec: 10649.5, 300 sec: 10801.6). Total num frames: 1220419584. Throughput: 0: 2632.4. Samples: 569606. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:17,852][07361] Avg episode reward: [(0, '18.769')] [2025-01-05 15:24:17,862][07448] Saving new best policy, reward=18.769! [2025-01-05 15:24:18,910][07482] Updated weights for policy 0, policy_version 297958 (0.0032) [2025-01-05 15:24:22,350][07482] Updated weights for policy 0, policy_version 297968 (0.0032) [2025-01-05 15:24:22,851][07361] Fps is (10 sec: 11469.2, 60 sec: 10854.4, 300 sec: 10836.0). Total num frames: 1220481024. Throughput: 0: 2627.4. Samples: 578304. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:22,852][07361] Avg episode reward: [(0, '16.316')] [2025-01-05 15:24:25,677][07482] Updated weights for policy 0, policy_version 297978 (0.0028) [2025-01-05 15:24:27,852][07361] Fps is (10 sec: 11878.2, 60 sec: 10786.2, 300 sec: 10850.3). Total num frames: 1220538368. Throughput: 0: 2694.2. Samples: 596326. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:27,853][07361] Avg episode reward: [(0, '15.390')] [2025-01-05 15:24:29,374][07482] Updated weights for policy 0, policy_version 297988 (0.0033) [2025-01-05 15:24:32,740][07482] Updated weights for policy 0, policy_version 297998 (0.0023) [2025-01-05 15:24:32,851][07361] Fps is (10 sec: 11878.5, 60 sec: 10854.5, 300 sec: 10882.2). Total num frames: 1220599808. Throughput: 0: 2739.4. Samples: 613780. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) [2025-01-05 15:24:32,852][07361] Avg episode reward: [(0, '16.697')] [2025-01-05 15:24:36,230][07482] Updated weights for policy 0, policy_version 298008 (0.0030) [2025-01-05 15:24:37,852][07361] Fps is (10 sec: 11878.1, 60 sec: 10922.6, 300 sec: 10894.8). Total num frames: 1220657152. Throughput: 0: 2819.1. Samples: 622514. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:24:37,854][07361] Avg episode reward: [(0, '17.120')] [2025-01-05 15:24:37,942][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298013_1220661248.pth... [2025-01-05 15:24:38,075][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297398_1218142208.pth [2025-01-05 15:24:39,848][07482] Updated weights for policy 0, policy_version 298018 (0.0026) [2025-01-05 15:24:42,851][07361] Fps is (10 sec: 11468.8, 60 sec: 11059.3, 300 sec: 10907.0). Total num frames: 1220714496. Throughput: 0: 2872.8. Samples: 639778. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:24:42,852][07361] Avg episode reward: [(0, '16.675')] [2025-01-05 15:24:43,216][07482] Updated weights for policy 0, policy_version 298028 (0.0031) [2025-01-05 15:24:46,639][07482] Updated weights for policy 0, policy_version 298038 (0.0032) [2025-01-05 15:24:47,852][07361] Fps is (10 sec: 11878.3, 60 sec: 11264.0, 300 sec: 10935.7). Total num frames: 1220775936. 
Throughput: 0: 2890.3. Samples: 657900. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:24:47,853][07361] Avg episode reward: [(0, '15.884')] [2025-01-05 15:24:50,142][07482] Updated weights for policy 0, policy_version 298048 (0.0024) [2025-01-05 15:24:52,852][07361] Fps is (10 sec: 11877.9, 60 sec: 11468.7, 300 sec: 10946.5). Total num frames: 1220833280. Throughput: 0: 2906.7. Samples: 667020. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:24:52,853][07361] Avg episode reward: [(0, '15.370')] [2025-01-05 15:24:53,591][07482] Updated weights for policy 0, policy_version 298058 (0.0032) [2025-01-05 15:24:57,103][07482] Updated weights for policy 0, policy_version 298068 (0.0026) [2025-01-05 15:24:57,852][07361] Fps is (10 sec: 11878.6, 60 sec: 11605.2, 300 sec: 10973.3). Total num frames: 1220894720. Throughput: 0: 2921.5. Samples: 684540. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:24:57,853][07361] Avg episode reward: [(0, '15.441')] [2025-01-05 15:25:00,467][07482] Updated weights for policy 0, policy_version 298078 (0.0027) [2025-01-05 15:25:02,852][07361] Fps is (10 sec: 11878.2, 60 sec: 11605.2, 300 sec: 10983.0). Total num frames: 1220952064. Throughput: 0: 2956.6. Samples: 702652. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:02,854][07361] Avg episode reward: [(0, '16.309')] [2025-01-05 15:25:03,879][07482] Updated weights for policy 0, policy_version 298088 (0.0024) [2025-01-05 15:25:07,461][07482] Updated weights for policy 0, policy_version 298098 (0.0030) [2025-01-05 15:25:07,852][07361] Fps is (10 sec: 11878.7, 60 sec: 11741.9, 300 sec: 11008.0). Total num frames: 1221013504. Throughput: 0: 2957.1. Samples: 711376. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:07,852][07361] Avg episode reward: [(0, '17.610')] [2025-01-05 15:25:10,941][07482] Updated weights for policy 0, policy_version 298108 (0.0030) [2025-01-05 15:25:12,852][07361] Fps is (10 sec: 11878.4, 60 sec: 11741.8, 300 sec: 11016.6). Total num frames: 1221070848. Throughput: 0: 2950.5. Samples: 729098. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:12,853][07361] Avg episode reward: [(0, '17.276')] [2025-01-05 15:25:14,397][07482] Updated weights for policy 0, policy_version 298118 (0.0027) [2025-01-05 15:25:17,726][07482] Updated weights for policy 0, policy_version 298128 (0.0024) [2025-01-05 15:25:17,852][07361] Fps is (10 sec: 11878.7, 60 sec: 11878.4, 300 sec: 11040.1). Total num frames: 1221132288. Throughput: 0: 2961.1. Samples: 747028. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:17,852][07361] Avg episode reward: [(0, '18.572')] [2025-01-05 15:25:21,150][07482] Updated weights for policy 0, policy_version 298138 (0.0028) [2025-01-05 15:25:22,852][07361] Fps is (10 sec: 11878.9, 60 sec: 11810.1, 300 sec: 11047.9). Total num frames: 1221189632. Throughput: 0: 2961.9. Samples: 755800. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:22,852][07361] Avg episode reward: [(0, '18.433')] [2025-01-05 15:25:24,729][07482] Updated weights for policy 0, policy_version 298148 (0.0032) [2025-01-05 15:25:27,852][07361] Fps is (10 sec: 11878.4, 60 sec: 11878.5, 300 sec: 11070.0). Total num frames: 1221251072. Throughput: 0: 2975.1. Samples: 773656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:27,852][07361] Avg episode reward: [(0, '16.785')] [2025-01-05 15:25:28,158][07482] Updated weights for policy 0, policy_version 298158 (0.0029) [2025-01-05 15:25:31,557][07482] Updated weights for policy 0, policy_version 298168 (0.0030) [2025-01-05 15:25:32,853][07361] Fps is (10 sec: 11877.3, 60 sec: 11809.9, 300 sec: 11076.9). 
Total num frames: 1221308416. Throughput: 0: 2968.7. Samples: 791492. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:32,854][07361] Avg episode reward: [(0, '15.603')] [2025-01-05 15:25:35,009][07482] Updated weights for policy 0, policy_version 298178 (0.0030) [2025-01-05 15:25:37,852][07361] Fps is (10 sec: 11878.3, 60 sec: 11878.5, 300 sec: 11097.8). Total num frames: 1221369856. Throughput: 0: 2969.4. Samples: 800642. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:37,852][07361] Avg episode reward: [(0, '15.945')] [2025-01-05 15:25:38,441][07482] Updated weights for policy 0, policy_version 298188 (0.0028) [2025-01-05 15:25:41,881][07482] Updated weights for policy 0, policy_version 298198 (0.0029) [2025-01-05 15:25:42,852][07361] Fps is (10 sec: 11879.6, 60 sec: 11878.4, 300 sec: 11135.6). Total num frames: 1221427200. Throughput: 0: 2972.8. Samples: 818314. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:42,852][07361] Avg episode reward: [(0, '15.568')] [2025-01-05 15:25:45,593][07482] Updated weights for policy 0, policy_version 298208 (0.0024) [2025-01-05 15:25:47,852][07361] Fps is (10 sec: 11877.5, 60 sec: 11878.4, 300 sec: 11343.8). Total num frames: 1221488640. Throughput: 0: 2954.2. Samples: 835594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:25:47,854][07361] Avg episode reward: [(0, '16.162')] [2025-01-05 15:25:48,921][07482] Updated weights for policy 0, policy_version 298218 (0.0026) [2025-01-05 15:25:52,379][07482] Updated weights for policy 0, policy_version 298228 (0.0032) [2025-01-05 15:25:52,852][07361] Fps is (10 sec: 11877.7, 60 sec: 11878.3, 300 sec: 11441.0). Total num frames: 1221545984. Throughput: 0: 2962.2. Samples: 844678. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:25:52,854][07361] Avg episode reward: [(0, '17.646')] [2025-01-05 15:25:55,910][07482] Updated weights for policy 0, policy_version 298238 (0.0031) [2025-01-05 15:25:57,852][07361] Fps is (10 sec: 11469.5, 60 sec: 11810.2, 300 sec: 11385.5). Total num frames: 1221603328. Throughput: 0: 2962.6. Samples: 862412. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:25:57,852][07361] Avg episode reward: [(0, '16.680')] [2025-01-05 15:25:59,390][07482] Updated weights for policy 0, policy_version 298248 (0.0026) [2025-01-05 15:26:01,544][07482] Updated weights for policy 0, policy_version 298258 (0.0012) [2025-01-05 15:26:02,851][07361] Fps is (10 sec: 14337.0, 60 sec: 12288.1, 300 sec: 11441.1). Total num frames: 1221689344. Throughput: 0: 3066.9. Samples: 885036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:02,852][07361] Avg episode reward: [(0, '18.643')] [2025-01-05 15:26:03,672][07482] Updated weights for policy 0, policy_version 298268 (0.0013) [2025-01-05 15:26:05,665][07482] Updated weights for policy 0, policy_version 298278 (0.0013) [2025-01-05 15:26:07,754][07482] Updated weights for policy 0, policy_version 298288 (0.0013) [2025-01-05 15:26:07,851][07361] Fps is (10 sec: 18432.4, 60 sec: 12902.5, 300 sec: 11566.0). Total num frames: 1221787648. Throughput: 0: 3204.9. Samples: 900018. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:07,852][07361] Avg episode reward: [(0, '17.976')] [2025-01-05 15:26:09,882][07482] Updated weights for policy 0, policy_version 298298 (0.0014) [2025-01-05 15:26:11,920][07482] Updated weights for policy 0, policy_version 298308 (0.0014) [2025-01-05 15:26:12,852][07361] Fps is (10 sec: 19660.1, 60 sec: 13585.1, 300 sec: 11677.1). Total num frames: 1221885952. Throughput: 0: 3462.7. Samples: 929480. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:12,852][07361] Avg episode reward: [(0, '19.228')] [2025-01-05 15:26:12,853][07448] Saving new best policy, reward=19.228! [2025-01-05 15:26:14,125][07482] Updated weights for policy 0, policy_version 298318 (0.0013) [2025-01-05 15:26:16,217][07482] Updated weights for policy 0, policy_version 298328 (0.0015) [2025-01-05 15:26:17,851][07361] Fps is (10 sec: 19251.2, 60 sec: 14131.2, 300 sec: 11788.2). Total num frames: 1221980160. Throughput: 0: 3712.7. Samples: 958560. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:17,852][07361] Avg episode reward: [(0, '20.098')] [2025-01-05 15:26:17,915][07448] Saving new best policy, reward=20.098! [2025-01-05 15:26:18,327][07482] Updated weights for policy 0, policy_version 298338 (0.0014) [2025-01-05 15:26:20,475][07482] Updated weights for policy 0, policy_version 298348 (0.0014) [2025-01-05 15:26:22,548][07482] Updated weights for policy 0, policy_version 298358 (0.0013) [2025-01-05 15:26:22,851][07361] Fps is (10 sec: 19251.7, 60 sec: 14813.9, 300 sec: 11913.1). Total num frames: 1222078464. Throughput: 0: 3826.4. Samples: 972830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:22,852][07361] Avg episode reward: [(0, '20.892')] [2025-01-05 15:26:22,852][07448] Saving new best policy, reward=20.892! [2025-01-05 15:26:24,656][07482] Updated weights for policy 0, policy_version 298368 (0.0014) [2025-01-05 15:26:27,203][07482] Updated weights for policy 0, policy_version 298378 (0.0014) [2025-01-05 15:26:27,852][07361] Fps is (10 sec: 18431.8, 60 sec: 15223.5, 300 sec: 11996.4). Total num frames: 1222164480. Throughput: 0: 4043.5. Samples: 1000270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:27,852][07361] Avg episode reward: [(0, '22.397')] [2025-01-05 15:26:27,917][07448] Saving new best policy, reward=22.397! 
[2025-01-05 15:26:29,533][07482] Updated weights for policy 0, policy_version 298388 (0.0016) [2025-01-05 15:26:31,996][07482] Updated weights for policy 0, policy_version 298398 (0.0015) [2025-01-05 15:26:32,851][07361] Fps is (10 sec: 17203.2, 60 sec: 15701.6, 300 sec: 12093.6). Total num frames: 1222250496. Throughput: 0: 4234.0. Samples: 1026122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:32,852][07361] Avg episode reward: [(0, '22.495')] [2025-01-05 15:26:32,937][07448] Saving new best policy, reward=22.495! [2025-01-05 15:26:34,248][07482] Updated weights for policy 0, policy_version 298408 (0.0015) [2025-01-05 15:26:36,381][07482] Updated weights for policy 0, policy_version 298418 (0.0014) [2025-01-05 15:26:37,851][07361] Fps is (10 sec: 18432.3, 60 sec: 16315.8, 300 sec: 12218.6). Total num frames: 1222348800. Throughput: 0: 4342.9. Samples: 1040104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:37,852][07361] Avg episode reward: [(0, '19.677')] [2025-01-05 15:26:37,858][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298425_1222348800.pth... [2025-01-05 15:26:37,899][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000297694_1219354624.pth [2025-01-05 15:26:38,489][07482] Updated weights for policy 0, policy_version 298428 (0.0014) [2025-01-05 15:26:40,614][07482] Updated weights for policy 0, policy_version 298438 (0.0014) [2025-01-05 15:26:42,735][07482] Updated weights for policy 0, policy_version 298448 (0.0014) [2025-01-05 15:26:42,851][07361] Fps is (10 sec: 19251.3, 60 sec: 16930.2, 300 sec: 12371.3). Total num frames: 1222443008. Throughput: 0: 4594.1. Samples: 1069144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:42,852][07361] Avg episode reward: [(0, '22.564')] [2025-01-05 15:26:42,933][07448] Saving new best policy, reward=22.564! 
[2025-01-05 15:26:44,932][07482] Updated weights for policy 0, policy_version 298458 (0.0015) [2025-01-05 15:26:47,134][07482] Updated weights for policy 0, policy_version 298468 (0.0014) [2025-01-05 15:26:47,851][07361] Fps is (10 sec: 18841.6, 60 sec: 17476.5, 300 sec: 12510.2). Total num frames: 1222537216. Throughput: 0: 4717.4. Samples: 1097318. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:47,852][07361] Avg episode reward: [(0, '20.025')] [2025-01-05 15:26:49,340][07482] Updated weights for policy 0, policy_version 298478 (0.0014) [2025-01-05 15:26:51,446][07482] Updated weights for policy 0, policy_version 298488 (0.0015) [2025-01-05 15:26:52,851][07361] Fps is (10 sec: 18841.6, 60 sec: 18090.9, 300 sec: 12704.6). Total num frames: 1222631424. Throughput: 0: 4694.7. Samples: 1111280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:52,852][07361] Avg episode reward: [(0, '23.069')] [2025-01-05 15:26:52,852][07448] Saving new best policy, reward=23.069! [2025-01-05 15:26:53,717][07482] Updated weights for policy 0, policy_version 298498 (0.0015) [2025-01-05 15:26:55,845][07482] Updated weights for policy 0, policy_version 298508 (0.0015) [2025-01-05 15:26:57,851][07361] Fps is (10 sec: 18841.5, 60 sec: 18705.1, 300 sec: 12829.5). Total num frames: 1222725632. Throughput: 0: 4671.7. Samples: 1139704. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:26:57,852][07361] Avg episode reward: [(0, '23.589')] [2025-01-05 15:26:57,967][07448] Saving new best policy, reward=23.589! [2025-01-05 15:26:57,971][07482] Updated weights for policy 0, policy_version 298518 (0.0013) [2025-01-05 15:27:00,178][07482] Updated weights for policy 0, policy_version 298528 (0.0014) [2025-01-05 15:27:02,307][07482] Updated weights for policy 0, policy_version 298538 (0.0014) [2025-01-05 15:27:02,852][07361] Fps is (10 sec: 18841.4, 60 sec: 18841.6, 300 sec: 12954.5). Total num frames: 1222819840. Throughput: 0: 4658.8. Samples: 1168206. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:02,852][07361] Avg episode reward: [(0, '21.202')] [2025-01-05 15:27:04,478][07482] Updated weights for policy 0, policy_version 298548 (0.0014) [2025-01-05 15:27:06,642][07482] Updated weights for policy 0, policy_version 298558 (0.0015) [2025-01-05 15:27:07,852][07361] Fps is (10 sec: 18840.9, 60 sec: 18773.2, 300 sec: 13079.4). Total num frames: 1222914048. Throughput: 0: 4654.8. Samples: 1182298. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:07,852][07361] Avg episode reward: [(0, '23.271')] [2025-01-05 15:27:08,892][07482] Updated weights for policy 0, policy_version 298568 (0.0016) [2025-01-05 15:27:11,000][07482] Updated weights for policy 0, policy_version 298578 (0.0015) [2025-01-05 15:27:12,851][07361] Fps is (10 sec: 18841.7, 60 sec: 18705.1, 300 sec: 13190.5). Total num frames: 1223008256. Throughput: 0: 4670.1. Samples: 1210424. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:12,852][07361] Avg episode reward: [(0, '23.285')] [2025-01-05 15:27:13,279][07482] Updated weights for policy 0, policy_version 298588 (0.0015) [2025-01-05 15:27:15,389][07482] Updated weights for policy 0, policy_version 298598 (0.0015) [2025-01-05 15:27:17,493][07482] Updated weights for policy 0, policy_version 298608 (0.0015) [2025-01-05 15:27:17,852][07361] Fps is (10 sec: 18842.1, 60 sec: 18705.0, 300 sec: 13315.5). Total num frames: 1223102464. Throughput: 0: 4727.3. Samples: 1238852. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:17,852][07361] Avg episode reward: [(0, '21.848')] [2025-01-05 15:27:19,769][07482] Updated weights for policy 0, policy_version 298618 (0.0015) [2025-01-05 15:27:21,917][07482] Updated weights for policy 0, policy_version 298628 (0.0015) [2025-01-05 15:27:22,852][07361] Fps is (10 sec: 18841.6, 60 sec: 18636.8, 300 sec: 13426.6). Total num frames: 1223196672. Throughput: 0: 4720.4. Samples: 1252524. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:22,852][07361] Avg episode reward: [(0, '25.024')] [2025-01-05 15:27:22,853][07448] Saving new best policy, reward=25.024! [2025-01-05 15:27:24,304][07482] Updated weights for policy 0, policy_version 298638 (0.0017) [2025-01-05 15:27:26,522][07482] Updated weights for policy 0, policy_version 298648 (0.0015) [2025-01-05 15:27:27,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18705.1, 300 sec: 13537.7). Total num frames: 1223286784. Throughput: 0: 4682.7. Samples: 1279866. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:27,852][07361] Avg episode reward: [(0, '21.399')] [2025-01-05 15:27:28,723][07482] Updated weights for policy 0, policy_version 298658 (0.0019) [2025-01-05 15:27:30,884][07482] Updated weights for policy 0, policy_version 298668 (0.0016) [2025-01-05 15:27:32,852][07361] Fps is (10 sec: 18022.3, 60 sec: 18773.3, 300 sec: 13634.8). Total num frames: 1223376896. Throughput: 0: 4672.4. Samples: 1307576. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:32,852][07361] Avg episode reward: [(0, '26.360')] [2025-01-05 15:27:32,927][07448] Saving new best policy, reward=26.360! [2025-01-05 15:27:33,241][07482] Updated weights for policy 0, policy_version 298678 (0.0017) [2025-01-05 15:27:35,457][07482] Updated weights for policy 0, policy_version 298688 (0.0015) [2025-01-05 15:27:37,627][07482] Updated weights for policy 0, policy_version 298698 (0.0015) [2025-01-05 15:27:37,851][07361] Fps is (10 sec: 18022.5, 60 sec: 18636.8, 300 sec: 13732.0). Total num frames: 1223467008. Throughput: 0: 4664.0. Samples: 1321160. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:37,852][07361] Avg episode reward: [(0, '24.790')] [2025-01-05 15:27:39,957][07482] Updated weights for policy 0, policy_version 298708 (0.0017) [2025-01-05 15:27:42,151][07482] Updated weights for policy 0, policy_version 298718 (0.0017) [2025-01-05 15:27:42,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18636.8, 300 sec: 13857.0). Total num frames: 1223561216. Throughput: 0: 4643.6. Samples: 1348664. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:42,852][07361] Avg episode reward: [(0, '21.975')] [2025-01-05 15:27:44,384][07482] Updated weights for policy 0, policy_version 298728 (0.0017) [2025-01-05 15:27:46,616][07482] Updated weights for policy 0, policy_version 298738 (0.0018) [2025-01-05 15:27:47,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18568.5, 300 sec: 13982.0). Total num frames: 1223651328. Throughput: 0: 4623.0. Samples: 1376242. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:47,852][07361] Avg episode reward: [(0, '22.137')] [2025-01-05 15:27:48,848][07482] Updated weights for policy 0, policy_version 298748 (0.0017) [2025-01-05 15:27:51,015][07482] Updated weights for policy 0, policy_version 298758 (0.0016) [2025-01-05 15:27:52,852][07361] Fps is (10 sec: 18022.5, 60 sec: 18500.2, 300 sec: 14106.9). Total num frames: 1223741440. Throughput: 0: 4616.3. Samples: 1390028. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:52,852][07361] Avg episode reward: [(0, '23.650')] [2025-01-05 15:27:53,516][07482] Updated weights for policy 0, policy_version 298768 (0.0016) [2025-01-05 15:27:55,736][07482] Updated weights for policy 0, policy_version 298778 (0.0015) [2025-01-05 15:27:57,852][07361] Fps is (10 sec: 18022.4, 60 sec: 18432.0, 300 sec: 14245.7). Total num frames: 1223831552. Throughput: 0: 4579.2. Samples: 1416488. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:27:57,852][07361] Avg episode reward: [(0, '21.900')] [2025-01-05 15:27:58,023][07482] Updated weights for policy 0, policy_version 298788 (0.0017) [2025-01-05 15:28:00,342][07482] Updated weights for policy 0, policy_version 298798 (0.0016) [2025-01-05 15:28:02,554][07482] Updated weights for policy 0, policy_version 298808 (0.0016) [2025-01-05 15:28:02,852][07361] Fps is (10 sec: 18022.4, 60 sec: 18363.7, 300 sec: 14426.3). Total num frames: 1223921664. Throughput: 0: 4550.5. Samples: 1443622. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:28:02,852][07361] Avg episode reward: [(0, '25.059')] [2025-01-05 15:28:04,798][07482] Updated weights for policy 0, policy_version 298818 (0.0016) [2025-01-05 15:28:07,062][07482] Updated weights for policy 0, policy_version 298828 (0.0016) [2025-01-05 15:28:07,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18295.6, 300 sec: 14537.4). Total num frames: 1224011776. Throughput: 0: 4545.8. Samples: 1457086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:28:07,852][07361] Avg episode reward: [(0, '24.336')] [2025-01-05 15:28:09,344][07482] Updated weights for policy 0, policy_version 298838 (0.0016) [2025-01-05 15:28:11,455][07482] Updated weights for policy 0, policy_version 298848 (0.0015) [2025-01-05 15:28:12,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18295.5, 300 sec: 14662.3). Total num frames: 1224105984. Throughput: 0: 4555.0. Samples: 1484842. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) [2025-01-05 15:28:12,852][07361] Avg episode reward: [(0, '22.494')] [2025-01-05 15:28:13,770][07482] Updated weights for policy 0, policy_version 298858 (0.0016) [2025-01-05 15:28:15,917][07482] Updated weights for policy 0, policy_version 298868 (0.0015) [2025-01-05 15:28:17,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18227.2, 300 sec: 14801.1). Total num frames: 1224196096. Throughput: 0: 4553.3. Samples: 1512474. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:17,852][07361] Avg episode reward: [(0, '26.737')] [2025-01-05 15:28:17,971][07448] Saving new best policy, reward=26.737! [2025-01-05 15:28:18,172][07482] Updated weights for policy 0, policy_version 298878 (0.0017) [2025-01-05 15:28:20,424][07482] Updated weights for policy 0, policy_version 298888 (0.0016) [2025-01-05 15:28:22,587][07482] Updated weights for policy 0, policy_version 298898 (0.0015) [2025-01-05 15:28:22,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18227.2, 300 sec: 14912.3). Total num frames: 1224290304. Throughput: 0: 4553.7. Samples: 1526076. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:22,852][07361] Avg episode reward: [(0, '24.726')] [2025-01-05 15:28:24,807][07482] Updated weights for policy 0, policy_version 298908 (0.0018) [2025-01-05 15:28:27,054][07482] Updated weights for policy 0, policy_version 298918 (0.0016) [2025-01-05 15:28:27,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18227.2, 300 sec: 15023.3). Total num frames: 1224380416. Throughput: 0: 4559.6. Samples: 1553848. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:27,852][07361] Avg episode reward: [(0, '28.657')] [2025-01-05 15:28:27,991][07448] Saving new best policy, reward=28.657! [2025-01-05 15:28:29,358][07482] Updated weights for policy 0, policy_version 298928 (0.0017) [2025-01-05 15:28:31,503][07482] Updated weights for policy 0, policy_version 298938 (0.0016) [2025-01-05 15:28:32,852][07361] Fps is (10 sec: 18022.2, 60 sec: 18227.2, 300 sec: 15148.3). Total num frames: 1224470528. Throughput: 0: 4551.3. Samples: 1581050. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:32,852][07361] Avg episode reward: [(0, '26.563')] [2025-01-05 15:28:33,857][07482] Updated weights for policy 0, policy_version 298948 (0.0016) [2025-01-05 15:28:36,084][07482] Updated weights for policy 0, policy_version 298958 (0.0016) [2025-01-05 15:28:37,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18227.2, 300 sec: 15287.1). Total num frames: 1224560640. Throughput: 0: 4548.8. Samples: 1594722. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:37,852][07361] Avg episode reward: [(0, '28.307')] [2025-01-05 15:28:37,895][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298966_1224564736.pth... [2025-01-05 15:28:37,942][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298013_1220661248.pth [2025-01-05 15:28:38,343][07482] Updated weights for policy 0, policy_version 298968 (0.0016) [2025-01-05 15:28:40,617][07482] Updated weights for policy 0, policy_version 298978 (0.0016) [2025-01-05 15:28:42,837][07482] Updated weights for policy 0, policy_version 298988 (0.0016) [2025-01-05 15:28:42,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18227.2, 300 sec: 15439.9). Total num frames: 1224654848. Throughput: 0: 4566.4. Samples: 1621976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:42,852][07361] Avg episode reward: [(0, '24.028')] [2025-01-05 15:28:45,060][07482] Updated weights for policy 0, policy_version 298998 (0.0017) [2025-01-05 15:28:47,266][07482] Updated weights for policy 0, policy_version 299008 (0.0016) [2025-01-05 15:28:47,852][07361] Fps is (10 sec: 18431.8, 60 sec: 18227.2, 300 sec: 15592.6). Total num frames: 1224744960. Throughput: 0: 4579.1. Samples: 1649684. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:47,852][07361] Avg episode reward: [(0, '24.426')] [2025-01-05 15:28:49,566][07482] Updated weights for policy 0, policy_version 299018 (0.0017) [2025-01-05 15:28:51,722][07482] Updated weights for policy 0, policy_version 299028 (0.0015) [2025-01-05 15:28:52,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18227.2, 300 sec: 15717.5). Total num frames: 1224835072. Throughput: 0: 4581.4. Samples: 1663250. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:52,852][07361] Avg episode reward: [(0, '25.675')] [2025-01-05 15:28:54,058][07482] Updated weights for policy 0, policy_version 299038 (0.0016) [2025-01-05 15:28:56,233][07482] Updated weights for policy 0, policy_version 299048 (0.0015) [2025-01-05 15:28:57,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18295.5, 300 sec: 15842.5). Total num frames: 1224929280. Throughput: 0: 4578.6. Samples: 1690880. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:28:57,852][07361] Avg episode reward: [(0, '26.946')] [2025-01-05 15:28:58,459][07482] Updated weights for policy 0, policy_version 299058 (0.0016) [2025-01-05 15:29:00,663][07482] Updated weights for policy 0, policy_version 299068 (0.0016) [2025-01-05 15:29:02,825][07482] Updated weights for policy 0, policy_version 299078 (0.0015) [2025-01-05 15:29:02,852][07361] Fps is (10 sec: 18841.3, 60 sec: 18363.7, 300 sec: 15981.3). Total num frames: 1225023488. Throughput: 0: 4582.5. Samples: 1718688. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:02,852][07361] Avg episode reward: [(0, '26.586')] [2025-01-05 15:29:05,019][07482] Updated weights for policy 0, policy_version 299088 (0.0016) [2025-01-05 15:29:07,222][07482] Updated weights for policy 0, policy_version 299098 (0.0016) [2025-01-05 15:29:07,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18363.7, 300 sec: 16092.4). Total num frames: 1225113600. Throughput: 0: 4587.9. Samples: 1732532. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:07,852][07361] Avg episode reward: [(0, '25.786')] [2025-01-05 15:29:09,482][07482] Updated weights for policy 0, policy_version 299108 (0.0015) [2025-01-05 15:29:11,627][07482] Updated weights for policy 0, policy_version 299118 (0.0016) [2025-01-05 15:29:12,851][07361] Fps is (10 sec: 18432.3, 60 sec: 18363.8, 300 sec: 16231.3). Total num frames: 1225207808. Throughput: 0: 4591.0. Samples: 1760442. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:12,852][07361] Avg episode reward: [(0, '24.637')] [2025-01-05 15:29:13,912][07482] Updated weights for policy 0, policy_version 299128 (0.0016) [2025-01-05 15:29:16,064][07482] Updated weights for policy 0, policy_version 299138 (0.0016) [2025-01-05 15:29:17,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18363.8, 300 sec: 16328.5). Total num frames: 1225297920. Throughput: 0: 4603.9. Samples: 1788226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:17,852][07361] Avg episode reward: [(0, '25.536')] [2025-01-05 15:29:18,304][07482] Updated weights for policy 0, policy_version 299148 (0.0016) [2025-01-05 15:29:20,552][07482] Updated weights for policy 0, policy_version 299158 (0.0016) [2025-01-05 15:29:22,756][07482] Updated weights for policy 0, policy_version 299168 (0.0016) [2025-01-05 15:29:22,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18363.7, 300 sec: 16453.4). Total num frames: 1225392128. Throughput: 0: 4604.9. Samples: 1801942. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:22,852][07361] Avg episode reward: [(0, '26.354')] [2025-01-05 15:29:25,027][07482] Updated weights for policy 0, policy_version 299178 (0.0017) [2025-01-05 15:29:27,277][07482] Updated weights for policy 0, policy_version 299188 (0.0016) [2025-01-05 15:29:27,852][07361] Fps is (10 sec: 18431.8, 60 sec: 18363.7, 300 sec: 16550.6). Total num frames: 1225482240. Throughput: 0: 4606.9. Samples: 1829288. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:29:27,852][07361] Avg episode reward: [(0, '26.622')] [2025-01-05 15:29:29,559][07482] Updated weights for policy 0, policy_version 299198 (0.0016) [2025-01-05 15:29:31,748][07482] Updated weights for policy 0, policy_version 299208 (0.0016) [2025-01-05 15:29:32,851][07361] Fps is (10 sec: 18022.5, 60 sec: 18363.7, 300 sec: 16661.7). Total num frames: 1225572352. Throughput: 0: 4598.9. Samples: 1856636. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:32,852][07361] Avg episode reward: [(0, '26.186')] [2025-01-05 15:29:34,081][07482] Updated weights for policy 0, policy_version 299218 (0.0017) [2025-01-05 15:29:36,281][07482] Updated weights for policy 0, policy_version 299228 (0.0016) [2025-01-05 15:29:37,852][07361] Fps is (10 sec: 18022.5, 60 sec: 18363.7, 300 sec: 16772.8). Total num frames: 1225662464. Throughput: 0: 4598.3. Samples: 1870174. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:37,852][07361] Avg episode reward: [(0, '23.938')] [2025-01-05 15:29:38,579][07482] Updated weights for policy 0, policy_version 299238 (0.0017) [2025-01-05 15:29:40,749][07482] Updated weights for policy 0, policy_version 299248 (0.0015) [2025-01-05 15:29:42,852][07361] Fps is (10 sec: 18431.9, 60 sec: 18363.7, 300 sec: 16883.9). Total num frames: 1225756672. Throughput: 0: 4595.4. Samples: 1897674. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:42,852][07361] Avg episode reward: [(0, '25.855')] [2025-01-05 15:29:43,062][07482] Updated weights for policy 0, policy_version 299258 (0.0017) [2025-01-05 15:29:45,338][07482] Updated weights for policy 0, policy_version 299268 (0.0017) [2025-01-05 15:29:47,527][07482] Updated weights for policy 0, policy_version 299278 (0.0015) [2025-01-05 15:29:47,852][07361] Fps is (10 sec: 18431.8, 60 sec: 18363.7, 300 sec: 16994.9). Total num frames: 1225846784. Throughput: 0: 4582.7. Samples: 1924910. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:47,852][07361] Avg episode reward: [(0, '29.272')] [2025-01-05 15:29:47,859][07448] Saving new best policy, reward=29.272! [2025-01-05 15:29:49,838][07482] Updated weights for policy 0, policy_version 299288 (0.0017) [2025-01-05 15:29:51,985][07482] Updated weights for policy 0, policy_version 299298 (0.0016) [2025-01-05 15:29:52,851][07361] Fps is (10 sec: 18022.7, 60 sec: 18363.7, 300 sec: 17092.2). Total num frames: 1225936896. Throughput: 0: 4574.4. Samples: 1938378. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:52,852][07361] Avg episode reward: [(0, '26.045')] [2025-01-05 15:29:54,316][07482] Updated weights for policy 0, policy_version 299308 (0.0016) [2025-01-05 15:29:56,447][07482] Updated weights for policy 0, policy_version 299318 (0.0015) [2025-01-05 15:29:57,852][07361] Fps is (10 sec: 18432.2, 60 sec: 18363.7, 300 sec: 17217.1). Total num frames: 1226031104. Throughput: 0: 4572.6. Samples: 1966208. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:29:57,852][07361] Avg episode reward: [(0, '28.359')] [2025-01-05 15:29:58,684][07482] Updated weights for policy 0, policy_version 299328 (0.0016) [2025-01-05 15:30:00,851][07482] Updated weights for policy 0, policy_version 299338 (0.0017) [2025-01-05 15:30:02,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18295.5, 300 sec: 17314.3). Total num frames: 1226121216. Throughput: 0: 4571.7. Samples: 1993954. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:02,852][07361] Avg episode reward: [(0, '30.049')] [2025-01-05 15:30:02,902][07448] Saving new best policy, reward=30.049! 
[2025-01-05 15:30:03,145][07482] Updated weights for policy 0, policy_version 299348 (0.0016) [2025-01-05 15:30:05,334][07482] Updated weights for policy 0, policy_version 299358 (0.0015) [2025-01-05 15:30:07,552][07482] Updated weights for policy 0, policy_version 299368 (0.0015) [2025-01-05 15:30:07,852][07361] Fps is (10 sec: 18431.9, 60 sec: 18363.7, 300 sec: 17439.3). Total num frames: 1226215424. Throughput: 0: 4571.5. Samples: 2007658. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:07,852][07361] Avg episode reward: [(0, '31.139')] [2025-01-05 15:30:07,859][07448] Saving new best policy, reward=31.139! [2025-01-05 15:30:09,856][07482] Updated weights for policy 0, policy_version 299378 (0.0017) [2025-01-05 15:30:12,089][07482] Updated weights for policy 0, policy_version 299388 (0.0015) [2025-01-05 15:30:12,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18295.4, 300 sec: 17536.4). Total num frames: 1226305536. Throughput: 0: 4568.9. Samples: 2034886. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:12,852][07361] Avg episode reward: [(0, '29.381')] [2025-01-05 15:30:14,416][07482] Updated weights for policy 0, policy_version 299398 (0.0016) [2025-01-05 15:30:16,651][07482] Updated weights for policy 0, policy_version 299408 (0.0017) [2025-01-05 15:30:17,851][07361] Fps is (10 sec: 18022.8, 60 sec: 18295.5, 300 sec: 17647.5). Total num frames: 1226395648. Throughput: 0: 4560.5. Samples: 2061856. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:17,852][07361] Avg episode reward: [(0, '30.265')] [2025-01-05 15:30:18,995][07482] Updated weights for policy 0, policy_version 299418 (0.0018) [2025-01-05 15:30:21,223][07482] Updated weights for policy 0, policy_version 299428 (0.0015) [2025-01-05 15:30:22,852][07361] Fps is (10 sec: 18022.1, 60 sec: 18227.2, 300 sec: 17744.7). Total num frames: 1226485760. Throughput: 0: 4557.5. Samples: 2075264. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:22,852][07361] Avg episode reward: [(0, '30.502')] [2025-01-05 15:30:23,505][07482] Updated weights for policy 0, policy_version 299438 (0.0015) [2025-01-05 15:30:25,675][07482] Updated weights for policy 0, policy_version 299448 (0.0015) [2025-01-05 15:30:27,852][07361] Fps is (10 sec: 18022.2, 60 sec: 18227.2, 300 sec: 17855.8). Total num frames: 1226575872. Throughput: 0: 4558.3. Samples: 2102796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:27,852][07361] Avg episode reward: [(0, '29.236')] [2025-01-05 15:30:27,976][07482] Updated weights for policy 0, policy_version 299458 (0.0019) [2025-01-05 15:30:30,225][07482] Updated weights for policy 0, policy_version 299468 (0.0016) [2025-01-05 15:30:32,399][07482] Updated weights for policy 0, policy_version 299478 (0.0014) [2025-01-05 15:30:32,852][07361] Fps is (10 sec: 18432.2, 60 sec: 18295.5, 300 sec: 17966.9). Total num frames: 1226670080. Throughput: 0: 4563.9. Samples: 2130284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:32,852][07361] Avg episode reward: [(0, '28.463')] [2025-01-05 15:30:34,684][07482] Updated weights for policy 0, policy_version 299488 (0.0016) [2025-01-05 15:30:36,815][07482] Updated weights for policy 0, policy_version 299498 (0.0016) [2025-01-05 15:30:37,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18295.5, 300 sec: 18077.9). Total num frames: 1226760192. Throughput: 0: 4566.6. Samples: 2143874. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:37,852][07361] Avg episode reward: [(0, '29.955')] [2025-01-05 15:30:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000299502_1226760192.pth... 
[2025-01-05 15:30:37,920][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298425_1222348800.pth [2025-01-05 15:30:39,170][07482] Updated weights for policy 0, policy_version 299508 (0.0016) [2025-01-05 15:30:41,336][07482] Updated weights for policy 0, policy_version 299518 (0.0015) [2025-01-05 15:30:42,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18295.5, 300 sec: 18189.1). Total num frames: 1226854400. Throughput: 0: 4563.1. Samples: 2171548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:42,852][07361] Avg episode reward: [(0, '27.909')] [2025-01-05 15:30:43,489][07482] Updated weights for policy 0, policy_version 299528 (0.0015) [2025-01-05 15:30:45,683][07482] Updated weights for policy 0, policy_version 299538 (0.0015) [2025-01-05 15:30:47,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18295.5, 300 sec: 18300.1). Total num frames: 1226944512. Throughput: 0: 4568.5. Samples: 2199538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:47,852][07361] Avg episode reward: [(0, '28.671')] [2025-01-05 15:30:47,965][07482] Updated weights for policy 0, policy_version 299548 (0.0017) [2025-01-05 15:30:50,207][07482] Updated weights for policy 0, policy_version 299558 (0.0016) [2025-01-05 15:30:52,412][07482] Updated weights for policy 0, policy_version 299568 (0.0015) [2025-01-05 15:30:52,852][07361] Fps is (10 sec: 18022.4, 60 sec: 18295.4, 300 sec: 18411.2). Total num frames: 1227034624. Throughput: 0: 4561.8. Samples: 2212940. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:52,852][07361] Avg episode reward: [(0, '29.865')] [2025-01-05 15:30:54,708][07482] Updated weights for policy 0, policy_version 299578 (0.0016) [2025-01-05 15:30:56,877][07482] Updated weights for policy 0, policy_version 299588 (0.0016) [2025-01-05 15:30:57,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18295.4, 300 sec: 18438.9). Total num frames: 1227128832. 
Throughput: 0: 4569.7. Samples: 2240524. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:30:57,852][07361] Avg episode reward: [(0, '31.349')] [2025-01-05 15:30:57,859][07448] Saving new best policy, reward=31.349! [2025-01-05 15:30:59,242][07482] Updated weights for policy 0, policy_version 299598 (0.0017) [2025-01-05 15:31:01,501][07482] Updated weights for policy 0, policy_version 299608 (0.0017) [2025-01-05 15:31:02,851][07361] Fps is (10 sec: 18022.5, 60 sec: 18227.2, 300 sec: 18397.3). Total num frames: 1227214848. Throughput: 0: 4570.3. Samples: 2267520. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:02,852][07361] Avg episode reward: [(0, '33.999')] [2025-01-05 15:31:02,893][07448] Saving new best policy, reward=33.999! [2025-01-05 15:31:03,777][07482] Updated weights for policy 0, policy_version 299618 (0.0017) [2025-01-05 15:31:06,071][07482] Updated weights for policy 0, policy_version 299628 (0.0016) [2025-01-05 15:31:07,852][07361] Fps is (10 sec: 18022.5, 60 sec: 18227.2, 300 sec: 18383.4). Total num frames: 1227309056. Throughput: 0: 4565.9. Samples: 2280730. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:07,852][07361] Avg episode reward: [(0, '30.831')] [2025-01-05 15:31:08,310][07482] Updated weights for policy 0, policy_version 299638 (0.0016) [2025-01-05 15:31:10,468][07482] Updated weights for policy 0, policy_version 299648 (0.0016) [2025-01-05 15:31:12,721][07482] Updated weights for policy 0, policy_version 299658 (0.0016) [2025-01-05 15:31:12,852][07361] Fps is (10 sec: 18431.9, 60 sec: 18227.2, 300 sec: 18369.5). Total num frames: 1227399168. Throughput: 0: 4571.8. Samples: 2308528. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:12,852][07361] Avg episode reward: [(0, '29.375')] [2025-01-05 15:31:14,992][07482] Updated weights for policy 0, policy_version 299668 (0.0016) [2025-01-05 15:31:17,103][07482] Updated weights for policy 0, policy_version 299678 (0.0015) [2025-01-05 15:31:17,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18295.4, 300 sec: 18355.6). Total num frames: 1227493376. Throughput: 0: 4578.8. Samples: 2336328. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:17,852][07361] Avg episode reward: [(0, '28.557')] [2025-01-05 15:31:19,359][07482] Updated weights for policy 0, policy_version 299688 (0.0016) [2025-01-05 15:31:21,519][07482] Updated weights for policy 0, policy_version 299698 (0.0014) [2025-01-05 15:31:22,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18295.5, 300 sec: 18369.5). Total num frames: 1227583488. Throughput: 0: 4586.8. Samples: 2350280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:22,852][07361] Avg episode reward: [(0, '30.780')] [2025-01-05 15:31:23,752][07482] Updated weights for policy 0, policy_version 299708 (0.0015) [2025-01-05 15:31:25,932][07482] Updated weights for policy 0, policy_version 299718 (0.0016) [2025-01-05 15:31:27,852][07361] Fps is (10 sec: 18432.0, 60 sec: 18363.7, 300 sec: 18397.3). Total num frames: 1227677696. Throughput: 0: 4586.9. Samples: 2377958. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:27,852][07361] Avg episode reward: [(0, '26.548')] [2025-01-05 15:31:28,291][07482] Updated weights for policy 0, policy_version 299728 (0.0017) [2025-01-05 15:31:30,459][07482] Updated weights for policy 0, policy_version 299738 (0.0016) [2025-01-05 15:31:32,665][07482] Updated weights for policy 0, policy_version 299748 (0.0015) [2025-01-05 15:31:32,851][07361] Fps is (10 sec: 18432.2, 60 sec: 18295.5, 300 sec: 18369.5). Total num frames: 1227767808. Throughput: 0: 4576.0. Samples: 2405456. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:32,852][07361] Avg episode reward: [(0, '27.467')] [2025-01-05 15:31:34,995][07482] Updated weights for policy 0, policy_version 299758 (0.0017) [2025-01-05 15:31:37,160][07482] Updated weights for policy 0, policy_version 299768 (0.0015) [2025-01-05 15:31:37,852][07361] Fps is (10 sec: 18022.4, 60 sec: 18295.4, 300 sec: 18355.6). Total num frames: 1227857920. Throughput: 0: 4575.3. Samples: 2418830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:37,852][07361] Avg episode reward: [(0, '30.792')] [2025-01-05 15:31:39,510][07482] Updated weights for policy 0, policy_version 299778 (0.0017) [2025-01-05 15:31:41,729][07482] Updated weights for policy 0, policy_version 299788 (0.0017) [2025-01-05 15:31:42,852][07361] Fps is (10 sec: 18022.3, 60 sec: 18227.2, 300 sec: 18341.7). Total num frames: 1227948032. Throughput: 0: 4571.4. Samples: 2446236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:42,852][07361] Avg episode reward: [(0, '28.548')] [2025-01-05 15:31:44,028][07482] Updated weights for policy 0, policy_version 299798 (0.0016) [2025-01-05 15:31:46,226][07482] Updated weights for policy 0, policy_version 299808 (0.0015) [2025-01-05 15:31:47,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18227.2, 300 sec: 18327.9). Total num frames: 1228038144. Throughput: 0: 4574.3. Samples: 2473362. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:47,852][07361] Avg episode reward: [(0, '27.560')] [2025-01-05 15:31:48,538][07482] Updated weights for policy 0, policy_version 299818 (0.0016) [2025-01-05 15:31:50,694][07482] Updated weights for policy 0, policy_version 299828 (0.0015) [2025-01-05 15:31:52,851][07361] Fps is (10 sec: 18432.1, 60 sec: 18295.5, 300 sec: 18327.9). Total num frames: 1228132352. Throughput: 0: 4588.9. Samples: 2487232. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:52,852][07361] Avg episode reward: [(0, '28.175')] [2025-01-05 15:31:52,955][07482] Updated weights for policy 0, policy_version 299838 (0.0016) [2025-01-05 15:31:55,263][07482] Updated weights for policy 0, policy_version 299848 (0.0017) [2025-01-05 15:31:57,466][07482] Updated weights for policy 0, policy_version 299858 (0.0016) [2025-01-05 15:31:57,852][07361] Fps is (10 sec: 18431.8, 60 sec: 18227.2, 300 sec: 18314.0). Total num frames: 1228222464. Throughput: 0: 4575.2. Samples: 2514414. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:31:57,852][07361] Avg episode reward: [(0, '31.568')] [2025-01-05 15:31:59,753][07482] Updated weights for policy 0, policy_version 299868 (0.0016) [2025-01-05 15:32:01,988][07482] Updated weights for policy 0, policy_version 299878 (0.0015) [2025-01-05 15:32:02,851][07361] Fps is (10 sec: 18022.5, 60 sec: 18295.5, 300 sec: 18300.1). Total num frames: 1228312576. Throughput: 0: 4564.5. Samples: 2541728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:02,852][07361] Avg episode reward: [(0, '29.970')] [2025-01-05 15:32:04,248][07482] Updated weights for policy 0, policy_version 299888 (0.0017) [2025-01-05 15:32:06,424][07482] Updated weights for policy 0, policy_version 299898 (0.0015) [2025-01-05 15:32:07,852][07361] Fps is (10 sec: 18431.9, 60 sec: 18295.4, 300 sec: 18300.1). Total num frames: 1228406784. Throughput: 0: 4558.3. Samples: 2555404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:07,852][07361] Avg episode reward: [(0, '30.105')] [2025-01-05 15:32:08,806][07482] Updated weights for policy 0, policy_version 299908 (0.0016) [2025-01-05 15:32:10,986][07482] Updated weights for policy 0, policy_version 299918 (0.0016) [2025-01-05 15:32:12,852][07361] Fps is (10 sec: 18022.2, 60 sec: 18227.2, 300 sec: 18272.3). Total num frames: 1228492800. Throughput: 0: 4546.8. Samples: 2582562. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:12,852][07361] Avg episode reward: [(0, '32.684')] [2025-01-05 15:32:13,286][07482] Updated weights for policy 0, policy_version 299928 (0.0016) [2025-01-05 15:32:15,573][07482] Updated weights for policy 0, policy_version 299938 (0.0015) [2025-01-05 15:32:17,722][07482] Updated weights for policy 0, policy_version 299948 (0.0015) [2025-01-05 15:32:17,852][07361] Fps is (10 sec: 18022.5, 60 sec: 18227.2, 300 sec: 18272.3). Total num frames: 1228587008. Throughput: 0: 4544.6. Samples: 2609964. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:17,852][07361] Avg episode reward: [(0, '34.441')] [2025-01-05 15:32:17,949][07448] Saving new best policy, reward=34.441! [2025-01-05 15:32:20,032][07482] Updated weights for policy 0, policy_version 299958 (0.0016) [2025-01-05 15:32:22,288][07482] Updated weights for policy 0, policy_version 299968 (0.0016) [2025-01-05 15:32:22,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18227.2, 300 sec: 18272.3). Total num frames: 1228677120. Throughput: 0: 4547.5. Samples: 2623468. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:22,852][07361] Avg episode reward: [(0, '32.341')] [2025-01-05 15:32:24,591][07482] Updated weights for policy 0, policy_version 299978 (0.0017) [2025-01-05 15:32:26,775][07482] Updated weights for policy 0, policy_version 299988 (0.0016) [2025-01-05 15:32:27,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18159.0, 300 sec: 18272.3). Total num frames: 1228767232. Throughput: 0: 4540.9. Samples: 2650578. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:27,852][07361] Avg episode reward: [(0, '30.436')] [2025-01-05 15:32:29,162][07482] Updated weights for policy 0, policy_version 299998 (0.0017) [2025-01-05 15:32:31,351][07482] Updated weights for policy 0, policy_version 300008 (0.0015) [2025-01-05 15:32:32,852][07361] Fps is (10 sec: 18022.3, 60 sec: 18158.9, 300 sec: 18272.3). Total num frames: 1228857344. 
Throughput: 0: 4536.6. Samples: 2677510. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:32,852][07361] Avg episode reward: [(0, '33.531')] [2025-01-05 15:32:33,713][07482] Updated weights for policy 0, policy_version 300018 (0.0017) [2025-01-05 15:32:35,999][07482] Updated weights for policy 0, policy_version 300028 (0.0016) [2025-01-05 15:32:37,852][07361] Fps is (10 sec: 18022.2, 60 sec: 18158.9, 300 sec: 18258.4). Total num frames: 1228947456. Throughput: 0: 4526.4. Samples: 2690920. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:37,852][07361] Avg episode reward: [(0, '32.812')] [2025-01-05 15:32:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000300036_1228947456.pth... [2025-01-05 15:32:37,905][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000298966_1224564736.pth [2025-01-05 15:32:38,333][07482] Updated weights for policy 0, policy_version 300038 (0.0016) [2025-01-05 15:32:40,522][07482] Updated weights for policy 0, policy_version 300048 (0.0017) [2025-01-05 15:32:42,788][07482] Updated weights for policy 0, policy_version 300058 (0.0016) [2025-01-05 15:32:42,852][07361] Fps is (10 sec: 18022.4, 60 sec: 18158.9, 300 sec: 18258.4). Total num frames: 1229037568. Throughput: 0: 4520.2. Samples: 2717824. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:42,852][07361] Avg episode reward: [(0, '31.056')] [2025-01-05 15:32:45,026][07482] Updated weights for policy 0, policy_version 300068 (0.0016) [2025-01-05 15:32:47,183][07482] Updated weights for policy 0, policy_version 300078 (0.0015) [2025-01-05 15:32:47,852][07361] Fps is (10 sec: 18431.5, 60 sec: 18227.1, 300 sec: 18272.3). Total num frames: 1229131776. Throughput: 0: 4527.5. Samples: 2745466. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:47,852][07361] Avg episode reward: [(0, '32.833')] [2025-01-05 15:32:49,497][07482] Updated weights for policy 0, policy_version 300088 (0.0015) [2025-01-05 15:32:51,666][07482] Updated weights for policy 0, policy_version 300098 (0.0015) [2025-01-05 15:32:52,851][07361] Fps is (10 sec: 18022.6, 60 sec: 18090.7, 300 sec: 18258.5). Total num frames: 1229217792. Throughput: 0: 4530.7. Samples: 2759284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:52,852][07361] Avg episode reward: [(0, '35.604')] [2025-01-05 15:32:52,855][07448] Saving new best policy, reward=35.604! [2025-01-05 15:32:54,037][07482] Updated weights for policy 0, policy_version 300108 (0.0017) [2025-01-05 15:32:56,269][07482] Updated weights for policy 0, policy_version 300118 (0.0015) [2025-01-05 15:32:57,851][07361] Fps is (10 sec: 17613.4, 60 sec: 18090.7, 300 sec: 18258.4). Total num frames: 1229307904. Throughput: 0: 4528.3. Samples: 2786336. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:32:57,852][07361] Avg episode reward: [(0, '31.860')] [2025-01-05 15:32:58,608][07482] Updated weights for policy 0, policy_version 300128 (0.0014) [2025-01-05 15:33:01,289][07482] Updated weights for policy 0, policy_version 300138 (0.0016) [2025-01-05 15:33:02,851][07361] Fps is (10 sec: 17203.2, 60 sec: 17954.1, 300 sec: 18230.7). Total num frames: 1229389824. Throughput: 0: 4465.8. Samples: 2810924. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:33:02,852][07361] Avg episode reward: [(0, '29.310')] [2025-01-05 15:33:03,660][07482] Updated weights for policy 0, policy_version 300148 (0.0015) [2025-01-05 15:33:06,000][07482] Updated weights for policy 0, policy_version 300158 (0.0016) [2025-01-05 15:33:07,852][07361] Fps is (10 sec: 17203.2, 60 sec: 17885.9, 300 sec: 18216.8). Total num frames: 1229479936. Throughput: 0: 4457.8. Samples: 2824070. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:33:07,852][07361] Avg episode reward: [(0, '30.966')] [2025-01-05 15:33:08,309][07482] Updated weights for policy 0, policy_version 300168 (0.0016) [2025-01-05 15:33:11,072][07482] Updated weights for policy 0, policy_version 300178 (0.0016) [2025-01-05 15:33:12,851][07361] Fps is (10 sec: 16384.0, 60 sec: 17681.1, 300 sec: 18161.3). Total num frames: 1229553664. Throughput: 0: 4398.5. Samples: 2848508. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:12,852][07361] Avg episode reward: [(0, '31.537')] [2025-01-05 15:33:13,632][07482] Updated weights for policy 0, policy_version 300188 (0.0016) [2025-01-05 15:33:16,517][07482] Updated weights for policy 0, policy_version 300198 (0.0015) [2025-01-05 15:33:17,851][07361] Fps is (10 sec: 15155.4, 60 sec: 17408.1, 300 sec: 18105.7). Total num frames: 1229631488. Throughput: 0: 4298.9. Samples: 2870962. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:17,852][07361] Avg episode reward: [(0, '32.072')] [2025-01-05 15:33:18,892][07482] Updated weights for policy 0, policy_version 300208 (0.0014) [2025-01-05 15:33:21,956][07482] Updated weights for policy 0, policy_version 300218 (0.0015) [2025-01-05 15:33:22,851][07361] Fps is (10 sec: 15155.2, 60 sec: 17135.0, 300 sec: 18050.2). Total num frames: 1229705216. Throughput: 0: 4280.1. Samples: 2883524. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:22,852][07361] Avg episode reward: [(0, '30.907')] [2025-01-05 15:33:24,382][07482] Updated weights for policy 0, policy_version 300228 (0.0014) [2025-01-05 15:33:26,971][07482] Updated weights for policy 0, policy_version 300238 (0.0015) [2025-01-05 15:33:27,851][07361] Fps is (10 sec: 15564.7, 60 sec: 16998.4, 300 sec: 18022.4). Total num frames: 1229787136. Throughput: 0: 4181.0. Samples: 2905970. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:27,852][07361] Avg episode reward: [(0, '33.960')] [2025-01-05 15:33:29,518][07482] Updated weights for policy 0, policy_version 300248 (0.0016) [2025-01-05 15:33:31,791][07482] Updated weights for policy 0, policy_version 300258 (0.0016) [2025-01-05 15:33:32,851][07361] Fps is (10 sec: 16384.0, 60 sec: 16861.9, 300 sec: 17994.6). Total num frames: 1229869056. Throughput: 0: 4120.8. Samples: 2930900. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:32,852][07361] Avg episode reward: [(0, '34.824')] [2025-01-05 15:33:34,567][07482] Updated weights for policy 0, policy_version 300268 (0.0015) [2025-01-05 15:33:37,364][07482] Updated weights for policy 0, policy_version 300278 (0.0017) [2025-01-05 15:33:37,851][07361] Fps is (10 sec: 15564.8, 60 sec: 16588.8, 300 sec: 17925.2). Total num frames: 1229942784. Throughput: 0: 4076.9. Samples: 2942744. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:37,852][07361] Avg episode reward: [(0, '30.519')] [2025-01-05 15:33:40,611][07482] Updated weights for policy 0, policy_version 300288 (0.0015) [2025-01-05 15:33:42,851][07361] Fps is (10 sec: 13516.6, 60 sec: 16110.9, 300 sec: 17828.0). Total num frames: 1230004224. Throughput: 0: 3910.3. Samples: 2962298. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:42,852][07361] Avg episode reward: [(0, '28.430')] [2025-01-05 15:33:44,090][07482] Updated weights for policy 0, policy_version 300298 (0.0015) [2025-01-05 15:33:47,132][07482] Updated weights for policy 0, policy_version 300308 (0.0016) [2025-01-05 15:33:47,852][07361] Fps is (10 sec: 13107.1, 60 sec: 15701.4, 300 sec: 17758.6). Total num frames: 1230073856. Throughput: 0: 3786.0. Samples: 2981296. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:47,852][07361] Avg episode reward: [(0, '31.216')] [2025-01-05 15:33:49,762][07482] Updated weights for policy 0, policy_version 300318 (0.0018) [2025-01-05 15:33:52,852][07361] Fps is (10 sec: 13516.8, 60 sec: 15360.0, 300 sec: 17661.4). Total num frames: 1230139392. Throughput: 0: 3739.4. Samples: 2992344. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:52,852][07361] Avg episode reward: [(0, '31.563')] [2025-01-05 15:33:53,157][07482] Updated weights for policy 0, policy_version 300328 (0.0016) [2025-01-05 15:33:55,734][07482] Updated weights for policy 0, policy_version 300338 (0.0016) [2025-01-05 15:33:57,852][07361] Fps is (10 sec: 13926.4, 60 sec: 15086.9, 300 sec: 17592.0). Total num frames: 1230213120. Throughput: 0: 3666.5. Samples: 3013502. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:33:57,852][07361] Avg episode reward: [(0, '31.628')] [2025-01-05 15:33:58,599][07482] Updated weights for policy 0, policy_version 300348 (0.0016) [2025-01-05 15:34:01,490][07482] Updated weights for policy 0, policy_version 300358 (0.0016) [2025-01-05 15:34:02,851][07361] Fps is (10 sec: 14745.6, 60 sec: 14950.4, 300 sec: 17536.4). Total num frames: 1230286848. Throughput: 0: 3643.5. Samples: 3034918. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:34:02,852][07361] Avg episode reward: [(0, '30.286')] [2025-01-05 15:34:03,885][07482] Updated weights for policy 0, policy_version 300368 (0.0016) [2025-01-05 15:34:06,125][07482] Updated weights for policy 0, policy_version 300378 (0.0015) [2025-01-05 15:34:07,852][07361] Fps is (10 sec: 16384.0, 60 sec: 14950.4, 300 sec: 17522.5). Total num frames: 1230376960. Throughput: 0: 3664.5. Samples: 3048428. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:34:07,852][07361] Avg episode reward: [(0, '29.320')] [2025-01-05 15:34:08,438][07482] Updated weights for policy 0, policy_version 300388 (0.0015) [2025-01-05 15:34:10,638][07482] Updated weights for policy 0, policy_version 300398 (0.0015) [2025-01-05 15:34:12,803][07482] Updated weights for policy 0, policy_version 300408 (0.0015) [2025-01-05 15:34:12,852][07361] Fps is (10 sec: 18432.0, 60 sec: 15291.7, 300 sec: 17536.4). Total num frames: 1230471168. Throughput: 0: 3775.2. Samples: 3075852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:34:12,852][07361] Avg episode reward: [(0, '30.244')] [2025-01-05 15:34:15,134][07482] Updated weights for policy 0, policy_version 300418 (0.0016) [2025-01-05 15:34:17,285][07482] Updated weights for policy 0, policy_version 300428 (0.0015) [2025-01-05 15:34:17,851][07361] Fps is (10 sec: 18432.0, 60 sec: 15496.5, 300 sec: 17522.6). Total num frames: 1230561280. Throughput: 0: 3837.8. Samples: 3103602. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:34:17,852][07361] Avg episode reward: [(0, '33.902')] [2025-01-05 15:34:19,514][07482] Updated weights for policy 0, policy_version 300438 (0.0017) [2025-01-05 15:34:21,738][07482] Updated weights for policy 0, policy_version 300448 (0.0015) [2025-01-05 15:34:22,852][07361] Fps is (10 sec: 18432.0, 60 sec: 15837.8, 300 sec: 17536.4). Total num frames: 1230655488. Throughput: 0: 3878.8. Samples: 3117288. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:34:22,852][07361] Avg episode reward: [(0, '36.640')] [2025-01-05 15:34:22,853][07448] Saving new best policy, reward=36.640! [2025-01-05 15:34:24,012][07482] Updated weights for policy 0, policy_version 300458 (0.0016) [2025-01-05 15:34:26,199][07482] Updated weights for policy 0, policy_version 300468 (0.0018) [2025-01-05 15:34:27,852][07361] Fps is (10 sec: 18431.9, 60 sec: 15974.4, 300 sec: 17536.4). Total num frames: 1230745600. 
Throughput: 0: 4056.2. Samples: 3144828. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:27,852][07361] Avg episode reward: [(0, '34.310')] [2025-01-05 15:34:28,435][07482] Updated weights for policy 0, policy_version 300478 (0.0015) [2025-01-05 15:34:30,601][07482] Updated weights for policy 0, policy_version 300488 (0.0014) [2025-01-05 15:34:32,758][07482] Updated weights for policy 0, policy_version 300498 (0.0014) [2025-01-05 15:34:32,852][07361] Fps is (10 sec: 18431.9, 60 sec: 16179.2, 300 sec: 17550.3). Total num frames: 1230839808. Throughput: 0: 4259.9. Samples: 3172992. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:32,852][07361] Avg episode reward: [(0, '29.804')] [2025-01-05 15:34:35,005][07482] Updated weights for policy 0, policy_version 300508 (0.0014) [2025-01-05 15:34:37,175][07482] Updated weights for policy 0, policy_version 300518 (0.0015) [2025-01-05 15:34:37,851][07361] Fps is (10 sec: 18841.8, 60 sec: 16520.5, 300 sec: 17550.3). Total num frames: 1230934016. Throughput: 0: 4323.4. Samples: 3186898. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:37,852][07361] Avg episode reward: [(0, '31.278')] [2025-01-05 15:34:37,858][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000300521_1230934016.pth... [2025-01-05 15:34:37,902][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000299502_1226760192.pth [2025-01-05 15:34:39,440][07482] Updated weights for policy 0, policy_version 300528 (0.0016) [2025-01-05 15:34:41,669][07482] Updated weights for policy 0, policy_version 300538 (0.0016) [2025-01-05 15:34:42,852][07361] Fps is (10 sec: 18432.1, 60 sec: 16998.4, 300 sec: 17550.3). Total num frames: 1231024128. Throughput: 0: 4467.6. Samples: 3214542. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:42,852][07361] Avg episode reward: [(0, '31.278')] [2025-01-05 15:34:43,888][07482] Updated weights for policy 0, policy_version 300548 (0.0016) [2025-01-05 15:34:46,079][07482] Updated weights for policy 0, policy_version 300558 (0.0015) [2025-01-05 15:34:47,852][07361] Fps is (10 sec: 18022.1, 60 sec: 17339.7, 300 sec: 17550.3). Total num frames: 1231114240. Throughput: 0: 4602.6. Samples: 3242036. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:47,852][07361] Avg episode reward: [(0, '31.810')] [2025-01-05 15:34:48,386][07482] Updated weights for policy 0, policy_version 300568 (0.0015) [2025-01-05 15:34:50,550][07482] Updated weights for policy 0, policy_version 300578 (0.0016) [2025-01-05 15:34:52,701][07482] Updated weights for policy 0, policy_version 300588 (0.0014) [2025-01-05 15:34:52,851][07361] Fps is (10 sec: 18432.1, 60 sec: 17817.6, 300 sec: 17550.3). Total num frames: 1231208448. Throughput: 0: 4611.3. Samples: 3255936. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:52,852][07361] Avg episode reward: [(0, '33.978')] [2025-01-05 15:34:54,989][07482] Updated weights for policy 0, policy_version 300598 (0.0019) [2025-01-05 15:34:57,171][07482] Updated weights for policy 0, policy_version 300608 (0.0018) [2025-01-05 15:34:57,852][07361] Fps is (10 sec: 18431.9, 60 sec: 18090.6, 300 sec: 17550.3). Total num frames: 1231298560. Throughput: 0: 4620.8. Samples: 3283790. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:34:57,852][07361] Avg episode reward: [(0, '32.868')] [2025-01-05 15:34:59,402][07482] Updated weights for policy 0, policy_version 300618 (0.0015) [2025-01-05 15:35:01,633][07482] Updated weights for policy 0, policy_version 300628 (0.0015) [2025-01-05 15:35:02,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18432.0, 300 sec: 17550.3). Total num frames: 1231392768. Throughput: 0: 4618.0. Samples: 3311414. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:02,852][07361] Avg episode reward: [(0, '34.617')] [2025-01-05 15:35:03,897][07482] Updated weights for policy 0, policy_version 300638 (0.0019) [2025-01-05 15:35:06,075][07482] Updated weights for policy 0, policy_version 300648 (0.0015) [2025-01-05 15:35:07,852][07361] Fps is (10 sec: 18432.3, 60 sec: 18432.0, 300 sec: 17550.3). Total num frames: 1231482880. Throughput: 0: 4618.2. Samples: 3325108. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:07,852][07361] Avg episode reward: [(0, '33.326')] [2025-01-05 15:35:08,355][07482] Updated weights for policy 0, policy_version 300658 (0.0015) [2025-01-05 15:35:10,515][07482] Updated weights for policy 0, policy_version 300668 (0.0015) [2025-01-05 15:35:12,667][07482] Updated weights for policy 0, policy_version 300678 (0.0016) [2025-01-05 15:35:12,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18432.0, 300 sec: 17564.2). Total num frames: 1231577088. Throughput: 0: 4624.1. Samples: 3352910. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:12,852][07361] Avg episode reward: [(0, '33.442')] [2025-01-05 15:35:14,975][07482] Updated weights for policy 0, policy_version 300688 (0.0016) [2025-01-05 15:35:17,163][07482] Updated weights for policy 0, policy_version 300698 (0.0016) [2025-01-05 15:35:17,851][07361] Fps is (10 sec: 18841.8, 60 sec: 18500.3, 300 sec: 17578.1). Total num frames: 1231671296. Throughput: 0: 4614.1. Samples: 3380626. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:17,852][07361] Avg episode reward: [(0, '35.198')] [2025-01-05 15:35:19,394][07482] Updated weights for policy 0, policy_version 300708 (0.0016) [2025-01-05 15:35:21,576][07482] Updated weights for policy 0, policy_version 300718 (0.0015) [2025-01-05 15:35:22,851][07361] Fps is (10 sec: 18432.0, 60 sec: 18432.0, 300 sec: 17578.1). Total num frames: 1231761408. Throughput: 0: 4613.0. Samples: 3394482. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:22,852][07361] Avg episode reward: [(0, '34.853')] [2025-01-05 15:35:23,824][07482] Updated weights for policy 0, policy_version 300728 (0.0016) [2025-01-05 15:35:25,941][07482] Updated weights for policy 0, policy_version 300738 (0.0015) [2025-01-05 15:35:27,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18500.3, 300 sec: 17578.1). Total num frames: 1231855616. Throughput: 0: 4623.3. Samples: 3422592. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:27,852][07361] Avg episode reward: [(0, '33.958')] [2025-01-05 15:35:28,287][07482] Updated weights for policy 0, policy_version 300748 (0.0018) [2025-01-05 15:35:30,530][07482] Updated weights for policy 0, policy_version 300758 (0.0017) [2025-01-05 15:35:32,552][07482] Updated weights for policy 0, policy_version 300768 (0.0016) [2025-01-05 15:35:32,851][07361] Fps is (10 sec: 18841.5, 60 sec: 18500.3, 300 sec: 17592.0). Total num frames: 1231949824. Throughput: 0: 4628.6. Samples: 3450324. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:32,852][07361] Avg episode reward: [(0, '32.224')] [2025-01-05 15:35:34,648][07482] Updated weights for policy 0, policy_version 300778 (0.0017) [2025-01-05 15:35:36,718][07482] Updated weights for policy 0, policy_version 300788 (0.0016) [2025-01-05 15:35:37,852][07361] Fps is (10 sec: 19251.1, 60 sec: 18568.5, 300 sec: 17605.9). Total num frames: 1232048128. Throughput: 0: 4648.8. Samples: 3465134. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:37,852][07361] Avg episode reward: [(0, '31.523')] [2025-01-05 15:35:38,763][07482] Updated weights for policy 0, policy_version 300798 (0.0017) [2025-01-05 15:35:40,832][07482] Updated weights for policy 0, policy_version 300808 (0.0017) [2025-01-05 15:35:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 18705.1, 300 sec: 17633.6). Total num frames: 1232146432. Throughput: 0: 4688.4. Samples: 3494766. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:42,852][07361] Avg episode reward: [(0, '33.253')] [2025-01-05 15:35:43,046][07482] Updated weights for policy 0, policy_version 300818 (0.0016) [2025-01-05 15:35:45,159][07482] Updated weights for policy 0, policy_version 300828 (0.0017) [2025-01-05 15:35:47,294][07482] Updated weights for policy 0, policy_version 300838 (0.0017) [2025-01-05 15:35:47,852][07361] Fps is (10 sec: 19251.2, 60 sec: 18773.3, 300 sec: 17647.5). Total num frames: 1232240640. Throughput: 0: 4713.5. Samples: 3523524. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:47,852][07361] Avg episode reward: [(0, '32.460')] [2025-01-05 15:35:49,512][07482] Updated weights for policy 0, policy_version 300848 (0.0016) [2025-01-05 15:35:51,543][07482] Updated weights for policy 0, policy_version 300858 (0.0016) [2025-01-05 15:35:52,851][07361] Fps is (10 sec: 18841.8, 60 sec: 18773.3, 300 sec: 17647.5). Total num frames: 1232334848. Throughput: 0: 4720.4. Samples: 3537524. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:52,852][07361] Avg episode reward: [(0, '31.065')] [2025-01-05 15:35:53,742][07482] Updated weights for policy 0, policy_version 300868 (0.0016) [2025-01-05 15:35:55,843][07482] Updated weights for policy 0, policy_version 300878 (0.0017) [2025-01-05 15:35:57,852][07361] Fps is (10 sec: 19251.4, 60 sec: 18909.9, 300 sec: 17689.2). Total num frames: 1232433152. Throughput: 0: 4747.8. Samples: 3566562. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:35:57,852][07361] Avg episode reward: [(0, '31.238')] [2025-01-05 15:35:58,002][07482] Updated weights for policy 0, policy_version 300888 (0.0016) [2025-01-05 15:36:00,162][07482] Updated weights for policy 0, policy_version 300898 (0.0015) [2025-01-05 15:36:02,232][07482] Updated weights for policy 0, policy_version 300908 (0.0016) [2025-01-05 15:36:02,852][07361] Fps is (10 sec: 19251.0, 60 sec: 18909.8, 300 sec: 17689.2). 
Total num frames: 1232527360. Throughput: 0: 4773.7. Samples: 3595444. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:02,852][07361] Avg episode reward: [(0, '34.022')] [2025-01-05 15:36:04,320][07482] Updated weights for policy 0, policy_version 300918 (0.0017) [2025-01-05 15:36:06,369][07482] Updated weights for policy 0, policy_version 300928 (0.0016) [2025-01-05 15:36:07,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19046.4, 300 sec: 17716.9). Total num frames: 1232625664. Throughput: 0: 4792.2. Samples: 3610130. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:07,852][07361] Avg episode reward: [(0, '36.291')] [2025-01-05 15:36:08,482][07482] Updated weights for policy 0, policy_version 300938 (0.0015) [2025-01-05 15:36:10,494][07482] Updated weights for policy 0, policy_version 300948 (0.0016) [2025-01-05 15:36:12,533][07482] Updated weights for policy 0, policy_version 300958 (0.0019) [2025-01-05 15:36:12,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19182.9, 300 sec: 17744.7). Total num frames: 1232728064. Throughput: 0: 4831.8. Samples: 3640022. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:12,852][07361] Avg episode reward: [(0, '32.819')] [2025-01-05 15:36:14,644][07482] Updated weights for policy 0, policy_version 300968 (0.0016) [2025-01-05 15:36:16,631][07482] Updated weights for policy 0, policy_version 300978 (0.0016) [2025-01-05 15:36:17,852][07361] Fps is (10 sec: 20480.1, 60 sec: 19319.4, 300 sec: 17786.4). Total num frames: 1232830464. Throughput: 0: 4881.6. Samples: 3669998. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:17,852][07361] Avg episode reward: [(0, '33.371')] [2025-01-05 15:36:18,665][07482] Updated weights for policy 0, policy_version 300988 (0.0016) [2025-01-05 15:36:20,700][07482] Updated weights for policy 0, policy_version 300998 (0.0016) [2025-01-05 15:36:22,757][07482] Updated weights for policy 0, policy_version 301008 (0.0016) [2025-01-05 15:36:22,852][07361] Fps is (10 sec: 20070.5, 60 sec: 19456.0, 300 sec: 17800.2). Total num frames: 1232928768. Throughput: 0: 4891.0. Samples: 3685228. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:22,852][07361] Avg episode reward: [(0, '32.429')] [2025-01-05 15:36:24,869][07482] Updated weights for policy 0, policy_version 301018 (0.0016) [2025-01-05 15:36:26,908][07482] Updated weights for policy 0, policy_version 301028 (0.0016) [2025-01-05 15:36:27,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 17828.0). Total num frames: 1233027072. Throughput: 0: 4893.3. Samples: 3714966. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:27,852][07361] Avg episode reward: [(0, '35.473')] [2025-01-05 15:36:28,979][07482] Updated weights for policy 0, policy_version 301038 (0.0019) [2025-01-05 15:36:31,017][07482] Updated weights for policy 0, policy_version 301048 (0.0015) [2025-01-05 15:36:32,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 17855.8). Total num frames: 1233125376. Throughput: 0: 4911.4. Samples: 3744538. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:32,852][07361] Avg episode reward: [(0, '35.060')] [2025-01-05 15:36:33,155][07482] Updated weights for policy 0, policy_version 301058 (0.0016) [2025-01-05 15:36:35,113][07482] Updated weights for policy 0, policy_version 301068 (0.0015) [2025-01-05 15:36:37,137][07482] Updated weights for policy 0, policy_version 301078 (0.0015) [2025-01-05 15:36:37,852][07361] Fps is (10 sec: 20070.0, 60 sec: 19660.8, 300 sec: 17897.4). 
Total num frames: 1233227776. Throughput: 0: 4938.0. Samples: 3759736. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:37,852][07361] Avg episode reward: [(0, '36.004')] [2025-01-05 15:36:37,861][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000301081_1233227776.pth... [2025-01-05 15:36:37,918][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000300036_1228947456.pth [2025-01-05 15:36:39,297][07482] Updated weights for policy 0, policy_version 301088 (0.0017) [2025-01-05 15:36:41,284][07482] Updated weights for policy 0, policy_version 301098 (0.0015) [2025-01-05 15:36:42,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 17925.2). Total num frames: 1233326080. Throughput: 0: 4956.0. Samples: 3789584. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:42,852][07361] Avg episode reward: [(0, '30.831')] [2025-01-05 15:36:43,320][07482] Updated weights for policy 0, policy_version 301108 (0.0015) [2025-01-05 15:36:45,374][07482] Updated weights for policy 0, policy_version 301118 (0.0016) [2025-01-05 15:36:47,378][07482] Updated weights for policy 0, policy_version 301128 (0.0015) [2025-01-05 15:36:47,851][07361] Fps is (10 sec: 20070.7, 60 sec: 19797.4, 300 sec: 17953.0). Total num frames: 1233428480. Throughput: 0: 4987.1. Samples: 3819864. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:36:47,852][07361] Avg episode reward: [(0, '31.168')] [2025-01-05 15:36:49,391][07482] Updated weights for policy 0, policy_version 301138 (0.0014) [2025-01-05 15:36:51,450][07482] Updated weights for policy 0, policy_version 301148 (0.0015) [2025-01-05 15:36:52,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 17980.8). Total num frames: 1233526784. Throughput: 0: 4998.6. Samples: 3835066. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:36:52,852][07361] Avg episode reward: [(0, '33.116')] [2025-01-05 15:36:53,520][07482] Updated weights for policy 0, policy_version 301158 (0.0016) [2025-01-05 15:36:55,521][07482] Updated weights for policy 0, policy_version 301168 (0.0014) [2025-01-05 15:36:57,550][07482] Updated weights for policy 0, policy_version 301178 (0.0015) [2025-01-05 15:36:57,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19933.9, 300 sec: 18022.4). Total num frames: 1233629184. Throughput: 0: 5003.4. Samples: 3865174. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:36:57,852][07361] Avg episode reward: [(0, '32.680')] [2025-01-05 15:36:59,671][07482] Updated weights for policy 0, policy_version 301188 (0.0017) [2025-01-05 15:37:01,693][07482] Updated weights for policy 0, policy_version 301198 (0.0015) [2025-01-05 15:37:02,852][07361] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 18036.3). Total num frames: 1233727488. Throughput: 0: 4998.7. Samples: 3894938. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:02,852][07361] Avg episode reward: [(0, '30.728')] [2025-01-05 15:37:03,807][07482] Updated weights for policy 0, policy_version 301208 (0.0016) [2025-01-05 15:37:05,853][07482] Updated weights for policy 0, policy_version 301218 (0.0016) [2025-01-05 15:37:07,852][07361] Fps is (10 sec: 19660.6, 60 sec: 20002.1, 300 sec: 18077.9). Total num frames: 1233825792. Throughput: 0: 4990.4. Samples: 3909798. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:07,852][07361] Avg episode reward: [(0, '33.148')] [2025-01-05 15:37:07,934][07482] Updated weights for policy 0, policy_version 301228 (0.0015) [2025-01-05 15:37:09,953][07482] Updated weights for policy 0, policy_version 301238 (0.0016) [2025-01-05 15:37:12,018][07482] Updated weights for policy 0, policy_version 301248 (0.0015) [2025-01-05 15:37:12,851][07361] Fps is (10 sec: 20070.7, 60 sec: 20002.2, 300 sec: 18105.7). 
Total num frames: 1233928192. Throughput: 0: 4991.2. Samples: 3939568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:12,852][07361] Avg episode reward: [(0, '30.714')] [2025-01-05 15:37:14,110][07482] Updated weights for policy 0, policy_version 301258 (0.0016) [2025-01-05 15:37:16,109][07482] Updated weights for policy 0, policy_version 301268 (0.0014) [2025-01-05 15:37:17,851][07361] Fps is (10 sec: 20070.7, 60 sec: 19933.9, 300 sec: 18133.5). Total num frames: 1234026496. Throughput: 0: 5000.5. Samples: 3969558. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:17,852][07361] Avg episode reward: [(0, '29.567')] [2025-01-05 15:37:18,166][07482] Updated weights for policy 0, policy_version 301278 (0.0015) [2025-01-05 15:37:20,201][07482] Updated weights for policy 0, policy_version 301288 (0.0016) [2025-01-05 15:37:22,225][07482] Updated weights for policy 0, policy_version 301298 (0.0016) [2025-01-05 15:37:22,851][07361] Fps is (10 sec: 20070.4, 60 sec: 20002.2, 300 sec: 18175.1). Total num frames: 1234128896. Throughput: 0: 4996.2. Samples: 3984566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:22,852][07361] Avg episode reward: [(0, '30.577')] [2025-01-05 15:37:24,301][07482] Updated weights for policy 0, policy_version 301308 (0.0016) [2025-01-05 15:37:26,308][07482] Updated weights for policy 0, policy_version 301318 (0.0016) [2025-01-05 15:37:27,851][07361] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 18202.9). Total num frames: 1234227200. Throughput: 0: 5004.8. Samples: 4014802. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:27,852][07361] Avg episode reward: [(0, '33.583')] [2025-01-05 15:37:28,321][07482] Updated weights for policy 0, policy_version 301328 (0.0015) [2025-01-05 15:37:30,361][07482] Updated weights for policy 0, policy_version 301338 (0.0015) [2025-01-05 15:37:32,378][07482] Updated weights for policy 0, policy_version 301348 (0.0014) [2025-01-05 15:37:32,851][07361] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 18244.6). Total num frames: 1234329600. Throughput: 0: 5005.0. Samples: 4045090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:32,852][07361] Avg episode reward: [(0, '35.115')] [2025-01-05 15:37:34,417][07482] Updated weights for policy 0, policy_version 301358 (0.0015) [2025-01-05 15:37:36,445][07482] Updated weights for policy 0, policy_version 301368 (0.0015) [2025-01-05 15:37:37,852][07361] Fps is (10 sec: 20070.3, 60 sec: 20002.2, 300 sec: 18272.3). Total num frames: 1234427904. Throughput: 0: 5003.4. Samples: 4060218. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:37,852][07361] Avg episode reward: [(0, '30.612')] [2025-01-05 15:37:38,563][07482] Updated weights for policy 0, policy_version 301378 (0.0016) [2025-01-05 15:37:40,591][07482] Updated weights for policy 0, policy_version 301388 (0.0014) [2025-01-05 15:37:42,649][07482] Updated weights for policy 0, policy_version 301398 (0.0016) [2025-01-05 15:37:42,852][07361] Fps is (10 sec: 20069.9, 60 sec: 20070.3, 300 sec: 18300.1). Total num frames: 1234530304. Throughput: 0: 4996.3. Samples: 4090010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:42,852][07361] Avg episode reward: [(0, '33.286')] [2025-01-05 15:37:44,742][07482] Updated weights for policy 0, policy_version 301408 (0.0016) [2025-01-05 15:37:46,754][07482] Updated weights for policy 0, policy_version 301418 (0.0015) [2025-01-05 15:37:47,851][07361] Fps is (10 sec: 20070.6, 60 sec: 20002.1, 300 sec: 18341.7). 
Total num frames: 1234628608. Throughput: 0: 4999.7. Samples: 4119924. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:47,852][07361] Avg episode reward: [(0, '32.147')] [2025-01-05 15:37:48,790][07482] Updated weights for policy 0, policy_version 301428 (0.0014) [2025-01-05 15:37:50,792][07482] Updated weights for policy 0, policy_version 301438 (0.0015) [2025-01-05 15:37:52,808][07482] Updated weights for policy 0, policy_version 301448 (0.0015) [2025-01-05 15:37:52,851][07361] Fps is (10 sec: 20070.8, 60 sec: 20070.4, 300 sec: 18383.4). Total num frames: 1234731008. Throughput: 0: 5007.9. Samples: 4135152. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:52,852][07361] Avg episode reward: [(0, '33.434')] [2025-01-05 15:37:54,860][07482] Updated weights for policy 0, policy_version 301458 (0.0017) [2025-01-05 15:37:56,862][07482] Updated weights for policy 0, policy_version 301468 (0.0015) [2025-01-05 15:37:57,851][07361] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 18438.9). Total num frames: 1234829312. Throughput: 0: 5020.6. Samples: 4165494. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:37:57,852][07361] Avg episode reward: [(0, '35.394')] [2025-01-05 15:37:58,906][07482] Updated weights for policy 0, policy_version 301478 (0.0015) [2025-01-05 15:38:00,951][07482] Updated weights for policy 0, policy_version 301488 (0.0015) [2025-01-05 15:38:02,852][07361] Fps is (10 sec: 19660.7, 60 sec: 20002.2, 300 sec: 18466.7). Total num frames: 1234927616. Throughput: 0: 5019.8. Samples: 4195448. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:38:02,852][07361] Avg episode reward: [(0, '34.560')] [2025-01-05 15:38:03,036][07482] Updated weights for policy 0, policy_version 301498 (0.0016) [2025-01-05 15:38:05,072][07482] Updated weights for policy 0, policy_version 301508 (0.0015) [2025-01-05 15:38:07,139][07482] Updated weights for policy 0, policy_version 301518 (0.0015) [2025-01-05 15:38:07,851][07361] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 18563.9). Total num frames: 1235030016. Throughput: 0: 5021.6. Samples: 4210538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:07,852][07361] Avg episode reward: [(0, '32.925')] [2025-01-05 15:38:09,189][07482] Updated weights for policy 0, policy_version 301528 (0.0016) [2025-01-05 15:38:11,207][07482] Updated weights for policy 0, policy_version 301538 (0.0015) [2025-01-05 15:38:12,851][07361] Fps is (10 sec: 20070.5, 60 sec: 20002.1, 300 sec: 18633.3). Total num frames: 1235128320. Throughput: 0: 5013.4. Samples: 4240404. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:12,852][07361] Avg episode reward: [(0, '32.193')] [2025-01-05 15:38:13,366][07482] Updated weights for policy 0, policy_version 301548 (0.0015) [2025-01-05 15:38:15,345][07482] Updated weights for policy 0, policy_version 301558 (0.0016) [2025-01-05 15:38:17,394][07482] Updated weights for policy 0, policy_version 301568 (0.0016) [2025-01-05 15:38:17,852][07361] Fps is (10 sec: 20070.1, 60 sec: 20070.4, 300 sec: 18730.5). Total num frames: 1235230720. Throughput: 0: 5002.5. Samples: 4270204. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:17,852][07361] Avg episode reward: [(0, '32.159')] [2025-01-05 15:38:19,558][07482] Updated weights for policy 0, policy_version 301578 (0.0015) [2025-01-05 15:38:21,560][07482] Updated weights for policy 0, policy_version 301588 (0.0016) [2025-01-05 15:38:22,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 18772.2). 
Total num frames: 1235324928. Throughput: 0: 4993.7. Samples: 4284936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:22,852][07361] Avg episode reward: [(0, '32.291')] [2025-01-05 15:38:23,703][07482] Updated weights for policy 0, policy_version 301598 (0.0015) [2025-01-05 15:38:25,822][07482] Updated weights for policy 0, policy_version 301608 (0.0015) [2025-01-05 15:38:27,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19933.9, 300 sec: 18827.7). Total num frames: 1235423232. Throughput: 0: 4981.9. Samples: 4314196. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:27,852][07361] Avg episode reward: [(0, '30.008')] [2025-01-05 15:38:27,898][07482] Updated weights for policy 0, policy_version 301618 (0.0016) [2025-01-05 15:38:29,975][07482] Updated weights for policy 0, policy_version 301628 (0.0016) [2025-01-05 15:38:32,080][07482] Updated weights for policy 0, policy_version 301638 (0.0016) [2025-01-05 15:38:32,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 18911.0). Total num frames: 1235521536. Throughput: 0: 4973.9. Samples: 4343750. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:32,852][07361] Avg episode reward: [(0, '30.632')] [2025-01-05 15:38:34,123][07482] Updated weights for policy 0, policy_version 301648 (0.0015) [2025-01-05 15:38:36,164][07482] Updated weights for policy 0, policy_version 301658 (0.0015) [2025-01-05 15:38:37,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19036.0). Total num frames: 1235619840. Throughput: 0: 4964.2. Samples: 4358542. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:37,852][07361] Avg episode reward: [(0, '33.424')] [2025-01-05 15:38:37,871][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000301666_1235623936.pth... 
[2025-01-05 15:38:37,924][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000300521_1230934016.pth [2025-01-05 15:38:38,364][07482] Updated weights for policy 0, policy_version 301668 (0.0019) [2025-01-05 15:38:40,374][07482] Updated weights for policy 0, policy_version 301678 (0.0015) [2025-01-05 15:38:42,442][07482] Updated weights for policy 0, policy_version 301688 (0.0016) [2025-01-05 15:38:42,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19865.7, 300 sec: 19147.1). Total num frames: 1235722240. Throughput: 0: 4944.3. Samples: 4387986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:42,852][07361] Avg episode reward: [(0, '30.811')] [2025-01-05 15:38:44,578][07482] Updated weights for policy 0, policy_version 301698 (0.0016) [2025-01-05 15:38:46,583][07482] Updated weights for policy 0, policy_version 301708 (0.0015) [2025-01-05 15:38:47,852][07361] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 19258.1). Total num frames: 1235820544. Throughput: 0: 4937.6. Samples: 4417638. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:47,852][07361] Avg episode reward: [(0, '32.927')] [2025-01-05 15:38:48,645][07482] Updated weights for policy 0, policy_version 301718 (0.0016) [2025-01-05 15:38:50,739][07482] Updated weights for policy 0, policy_version 301728 (0.0016) [2025-01-05 15:38:52,844][07482] Updated weights for policy 0, policy_version 301738 (0.0017) [2025-01-05 15:38:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19341.5). Total num frames: 1235918848. Throughput: 0: 4935.7. Samples: 4432646. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:52,852][07361] Avg episode reward: [(0, '30.721')] [2025-01-05 15:38:55,005][07482] Updated weights for policy 0, policy_version 301748 (0.0016) [2025-01-05 15:38:57,125][07482] Updated weights for policy 0, policy_version 301758 (0.0016) [2025-01-05 15:38:57,851][07361] Fps is (10 sec: 18841.8, 60 sec: 19660.8, 300 sec: 19397.0). Total num frames: 1236008960. Throughput: 0: 4912.4. Samples: 4461464. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:38:57,852][07361] Avg episode reward: [(0, '34.106')] [2025-01-05 15:38:59,657][07482] Updated weights for policy 0, policy_version 301768 (0.0018) [2025-01-05 15:39:01,807][07482] Updated weights for policy 0, policy_version 301778 (0.0017) [2025-01-05 15:39:02,852][07361] Fps is (10 sec: 18022.4, 60 sec: 19524.3, 300 sec: 19397.0). Total num frames: 1236099072. Throughput: 0: 4839.3. Samples: 4487974. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:02,852][07361] Avg episode reward: [(0, '33.746')] [2025-01-05 15:39:04,174][07482] Updated weights for policy 0, policy_version 301788 (0.0018) [2025-01-05 15:39:06,308][07482] Updated weights for policy 0, policy_version 301798 (0.0018) [2025-01-05 15:39:07,852][07361] Fps is (10 sec: 18022.3, 60 sec: 19319.4, 300 sec: 19383.1). Total num frames: 1236189184. Throughput: 0: 4811.2. Samples: 4501440. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:07,852][07361] Avg episode reward: [(0, '34.764')] [2025-01-05 15:39:08,651][07482] Updated weights for policy 0, policy_version 301808 (0.0018) [2025-01-05 15:39:10,984][07482] Updated weights for policy 0, policy_version 301818 (0.0018) [2025-01-05 15:39:12,852][07361] Fps is (10 sec: 18022.3, 60 sec: 19182.9, 300 sec: 19383.1). Total num frames: 1236279296. Throughput: 0: 4762.4. Samples: 4528504. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:12,852][07361] Avg episode reward: [(0, '36.145')] [2025-01-05 15:39:13,225][07482] Updated weights for policy 0, policy_version 301828 (0.0018) [2025-01-05 15:39:15,500][07482] Updated weights for policy 0, policy_version 301838 (0.0018) [2025-01-05 15:39:17,605][07482] Updated weights for policy 0, policy_version 301848 (0.0015) [2025-01-05 15:39:17,852][07361] Fps is (10 sec: 18431.9, 60 sec: 19046.4, 300 sec: 19383.1). Total num frames: 1236373504. Throughput: 0: 4723.5. Samples: 4556306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:17,852][07361] Avg episode reward: [(0, '36.615')] [2025-01-05 15:39:19,665][07482] Updated weights for policy 0, policy_version 301858 (0.0017) [2025-01-05 15:39:21,721][07482] Updated weights for policy 0, policy_version 301868 (0.0015) [2025-01-05 15:39:22,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19114.6, 300 sec: 19410.9). Total num frames: 1236471808. Throughput: 0: 4717.6. Samples: 4570834. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:22,852][07361] Avg episode reward: [(0, '33.783')] [2025-01-05 15:39:23,958][07482] Updated weights for policy 0, policy_version 301878 (0.0019) [2025-01-05 15:39:25,936][07482] Updated weights for policy 0, policy_version 301888 (0.0015) [2025-01-05 15:39:27,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19114.7, 300 sec: 19424.8). Total num frames: 1236570112. Throughput: 0: 4715.5. Samples: 4600182. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:27,852][07361] Avg episode reward: [(0, '35.684')] [2025-01-05 15:39:27,989][07482] Updated weights for policy 0, policy_version 301898 (0.0016) [2025-01-05 15:39:30,109][07482] Updated weights for policy 0, policy_version 301908 (0.0016) [2025-01-05 15:39:32,076][07482] Updated weights for policy 0, policy_version 301918 (0.0016) [2025-01-05 15:39:32,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19114.7, 300 sec: 19438.6). 
Total num frames: 1236668416. Throughput: 0: 4725.0. Samples: 4630260. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:32,852][07361] Avg episode reward: [(0, '32.469')] [2025-01-05 15:39:34,127][07482] Updated weights for policy 0, policy_version 301928 (0.0016) [2025-01-05 15:39:36,246][07482] Updated weights for policy 0, policy_version 301938 (0.0016) [2025-01-05 15:39:37,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19114.6, 300 sec: 19466.4). Total num frames: 1236766720. Throughput: 0: 4724.0. Samples: 4645226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:37,852][07361] Avg episode reward: [(0, '32.762')] [2025-01-05 15:39:38,317][07482] Updated weights for policy 0, policy_version 301948 (0.0018) [2025-01-05 15:39:40,364][07482] Updated weights for policy 0, policy_version 301958 (0.0014) [2025-01-05 15:39:42,462][07482] Updated weights for policy 0, policy_version 301968 (0.0016) [2025-01-05 15:39:42,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19114.7, 300 sec: 19508.1). Total num frames: 1236869120. Throughput: 0: 4741.6. Samples: 4674836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:42,852][07361] Avg episode reward: [(0, '32.879')] [2025-01-05 15:39:44,518][07482] Updated weights for policy 0, policy_version 301978 (0.0017) [2025-01-05 15:39:46,592][07482] Updated weights for policy 0, policy_version 301988 (0.0016) [2025-01-05 15:39:47,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19046.4, 300 sec: 19508.1). Total num frames: 1236963328. Throughput: 0: 4807.5. Samples: 4704312. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:47,852][07361] Avg episode reward: [(0, '36.293')] [2025-01-05 15:39:48,769][07482] Updated weights for policy 0, policy_version 301998 (0.0017) [2025-01-05 15:39:50,730][07482] Updated weights for policy 0, policy_version 302008 (0.0016) [2025-01-05 15:39:52,771][07482] Updated weights for policy 0, policy_version 302018 (0.0016) [2025-01-05 15:39:52,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19114.7, 300 sec: 19549.7). Total num frames: 1237065728. Throughput: 0: 4836.5. Samples: 4719080. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:52,852][07361] Avg episode reward: [(0, '35.804')] [2025-01-05 15:39:54,929][07482] Updated weights for policy 0, policy_version 302028 (0.0017) [2025-01-05 15:39:56,885][07482] Updated weights for policy 0, policy_version 302038 (0.0015) [2025-01-05 15:39:57,852][07361] Fps is (10 sec: 20070.1, 60 sec: 19251.2, 300 sec: 19563.6). Total num frames: 1237164032. Throughput: 0: 4899.4. Samples: 4748976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:39:57,852][07361] Avg episode reward: [(0, '37.449')] [2025-01-05 15:39:57,859][07448] Saving new best policy, reward=37.449! [2025-01-05 15:39:58,998][07482] Updated weights for policy 0, policy_version 302048 (0.0015) [2025-01-05 15:40:01,101][07482] Updated weights for policy 0, policy_version 302058 (0.0017) [2025-01-05 15:40:02,852][07361] Fps is (10 sec: 19660.3, 60 sec: 19387.7, 300 sec: 19591.4). Total num frames: 1237262336. Throughput: 0: 4945.8. Samples: 4778870. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:02,852][07361] Avg episode reward: [(0, '35.940')] [2025-01-05 15:40:03,065][07482] Updated weights for policy 0, policy_version 302068 (0.0016) [2025-01-05 15:40:05,128][07482] Updated weights for policy 0, policy_version 302078 (0.0015) [2025-01-05 15:40:07,203][07482] Updated weights for policy 0, policy_version 302088 (0.0015) [2025-01-05 15:40:07,851][07361] Fps is (10 sec: 20070.7, 60 sec: 19592.5, 300 sec: 19619.1). Total num frames: 1237364736. Throughput: 0: 4958.3. Samples: 4793958. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:07,852][07361] Avg episode reward: [(0, '34.955')] [2025-01-05 15:40:09,256][07482] Updated weights for policy 0, policy_version 302098 (0.0016) [2025-01-05 15:40:11,322][07482] Updated weights for policy 0, policy_version 302108 (0.0015) [2025-01-05 15:40:12,852][07361] Fps is (10 sec: 20070.8, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1237463040. Throughput: 0: 4967.7. Samples: 4823730. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:12,852][07361] Avg episode reward: [(0, '35.681')] [2025-01-05 15:40:13,435][07482] Updated weights for policy 0, policy_version 302118 (0.0016) [2025-01-05 15:40:15,426][07482] Updated weights for policy 0, policy_version 302128 (0.0017) [2025-01-05 15:40:17,477][07482] Updated weights for policy 0, policy_version 302138 (0.0015) [2025-01-05 15:40:17,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1237561344. Throughput: 0: 4962.5. Samples: 4853574. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:17,852][07361] Avg episode reward: [(0, '37.002')] [2025-01-05 15:40:19,606][07482] Updated weights for policy 0, policy_version 302148 (0.0016) [2025-01-05 15:40:21,619][07482] Updated weights for policy 0, policy_version 302158 (0.0015) [2025-01-05 15:40:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19674.7). 
Total num frames: 1237659648. Throughput: 0: 4956.5. Samples: 4868270. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:22,852][07361] Avg episode reward: [(0, '39.359')] [2025-01-05 15:40:22,853][07448] Saving new best policy, reward=39.359! [2025-01-05 15:40:23,727][07482] Updated weights for policy 0, policy_version 302168 (0.0016) [2025-01-05 15:40:25,807][07482] Updated weights for policy 0, policy_version 302178 (0.0015) [2025-01-05 15:40:27,792][07482] Updated weights for policy 0, policy_version 302188 (0.0015) [2025-01-05 15:40:27,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19702.4). Total num frames: 1237762048. Throughput: 0: 4957.8. Samples: 4897940. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:40:27,852][07361] Avg episode reward: [(0, '37.582')] [2025-01-05 15:40:29,867][07482] Updated weights for policy 0, policy_version 302198 (0.0015) [2025-01-05 15:40:31,945][07482] Updated weights for policy 0, policy_version 302208 (0.0016) [2025-01-05 15:40:32,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19865.5, 300 sec: 19702.5). Total num frames: 1237860352. Throughput: 0: 4971.1. Samples: 4928010. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:32,852][07361] Avg episode reward: [(0, '37.983')] [2025-01-05 15:40:34,019][07482] Updated weights for policy 0, policy_version 302218 (0.0016) [2025-01-05 15:40:36,122][07482] Updated weights for policy 0, policy_version 302228 (0.0016) [2025-01-05 15:40:37,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19702.4). Total num frames: 1237958656. Throughput: 0: 4966.7. Samples: 4942584. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:37,852][07361] Avg episode reward: [(0, '38.268')] [2025-01-05 15:40:37,860][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000302236_1237958656.pth... 
[2025-01-05 15:40:37,923][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000301081_1233227776.pth [2025-01-05 15:40:38,288][07482] Updated weights for policy 0, policy_version 302238 (0.0017) [2025-01-05 15:40:40,323][07482] Updated weights for policy 0, policy_version 302248 (0.0016) [2025-01-05 15:40:42,382][07482] Updated weights for policy 0, policy_version 302258 (0.0017) [2025-01-05 15:40:42,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19797.3, 300 sec: 19716.4). Total num frames: 1238056960. Throughput: 0: 4952.2. Samples: 4971822. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:42,852][07361] Avg episode reward: [(0, '36.495')] [2025-01-05 15:40:44,512][07482] Updated weights for policy 0, policy_version 302268 (0.0016) [2025-01-05 15:40:46,502][07482] Updated weights for policy 0, policy_version 302278 (0.0016) [2025-01-05 15:40:47,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19865.6, 300 sec: 19730.2). Total num frames: 1238155264. Throughput: 0: 4951.7. Samples: 5001698. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:47,852][07361] Avg episode reward: [(0, '33.104')] [2025-01-05 15:40:48,576][07482] Updated weights for policy 0, policy_version 302288 (0.0017) [2025-01-05 15:40:50,655][07482] Updated weights for policy 0, policy_version 302298 (0.0016) [2025-01-05 15:40:52,662][07482] Updated weights for policy 0, policy_version 302308 (0.0020) [2025-01-05 15:40:52,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19730.2). Total num frames: 1238253568. Throughput: 0: 4949.0. Samples: 5016664. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:52,852][07361] Avg episode reward: [(0, '34.786')] [2025-01-05 15:40:54,739][07482] Updated weights for policy 0, policy_version 302318 (0.0017) [2025-01-05 15:40:56,815][07482] Updated weights for policy 0, policy_version 302328 (0.0016) [2025-01-05 15:40:57,851][07361] Fps is (10 sec: 20070.6, 60 sec: 19865.7, 300 sec: 19758.0). Total num frames: 1238355968. Throughput: 0: 4951.3. Samples: 5046538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:40:57,852][07361] Avg episode reward: [(0, '33.343')] [2025-01-05 15:40:58,887][07482] Updated weights for policy 0, policy_version 302338 (0.0017) [2025-01-05 15:41:00,961][07482] Updated weights for policy 0, policy_version 302348 (0.0017) [2025-01-05 15:41:02,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19744.1). Total num frames: 1238450176. Throughput: 0: 4938.7. Samples: 5075816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:02,852][07361] Avg episode reward: [(0, '31.705')] [2025-01-05 15:41:03,143][07482] Updated weights for policy 0, policy_version 302358 (0.0017) [2025-01-05 15:41:05,123][07482] Updated weights for policy 0, policy_version 302368 (0.0019) [2025-01-05 15:41:07,213][07482] Updated weights for policy 0, policy_version 302378 (0.0017) [2025-01-05 15:41:07,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19729.1, 300 sec: 19730.2). Total num frames: 1238548480. Throughput: 0: 4941.9. Samples: 5090656. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:07,852][07361] Avg episode reward: [(0, '34.519')] [2025-01-05 15:41:09,391][07482] Updated weights for policy 0, policy_version 302388 (0.0017) [2025-01-05 15:41:11,360][07482] Updated weights for policy 0, policy_version 302398 (0.0016) [2025-01-05 15:41:12,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19730.2). Total num frames: 1238650880. Throughput: 0: 4939.2. Samples: 5120202. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:12,852][07361] Avg episode reward: [(0, '31.945')] [2025-01-05 15:41:13,466][07482] Updated weights for policy 0, policy_version 302408 (0.0017) [2025-01-05 15:41:15,539][07482] Updated weights for policy 0, policy_version 302418 (0.0017) [2025-01-05 15:41:17,508][07482] Updated weights for policy 0, policy_version 302428 (0.0016) [2025-01-05 15:41:17,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19730.2). Total num frames: 1238749184. Throughput: 0: 4936.3. Samples: 5150144. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:17,852][07361] Avg episode reward: [(0, '32.387')] [2025-01-05 15:41:19,602][07482] Updated weights for policy 0, policy_version 302438 (0.0017) [2025-01-05 15:41:21,678][07482] Updated weights for policy 0, policy_version 302448 (0.0017) [2025-01-05 15:41:22,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19730.2). Total num frames: 1238847488. Throughput: 0: 4944.9. Samples: 5165104. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:22,852][07361] Avg episode reward: [(0, '35.102')] [2025-01-05 15:41:23,754][07482] Updated weights for policy 0, policy_version 302458 (0.0017) [2025-01-05 15:41:25,819][07482] Updated weights for policy 0, policy_version 302468 (0.0015) [2025-01-05 15:41:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19730.2). Total num frames: 1238945792. Throughput: 0: 4951.7. Samples: 5194648. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:27,852][07361] Avg episode reward: [(0, '34.091')] [2025-01-05 15:41:27,985][07482] Updated weights for policy 0, policy_version 302478 (0.0017) [2025-01-05 15:41:30,058][07482] Updated weights for policy 0, policy_version 302488 (0.0016) [2025-01-05 15:41:32,108][07482] Updated weights for policy 0, policy_version 302498 (0.0016) [2025-01-05 15:41:32,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19729.1, 300 sec: 19716.3). 
Total num frames: 1239044096. Throughput: 0: 4939.5. Samples: 5223974. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:32,852][07361] Avg episode reward: [(0, '34.214')] [2025-01-05 15:41:34,250][07482] Updated weights for policy 0, policy_version 302508 (0.0016) [2025-01-05 15:41:36,219][07482] Updated weights for policy 0, policy_version 302518 (0.0015) [2025-01-05 15:41:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19716.3). Total num frames: 1239142400. Throughput: 0: 4935.7. Samples: 5238770. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:37,852][07361] Avg episode reward: [(0, '32.682')] [2025-01-05 15:41:38,274][07482] Updated weights for policy 0, policy_version 302528 (0.0016) [2025-01-05 15:41:40,350][07482] Updated weights for policy 0, policy_version 302538 (0.0016) [2025-01-05 15:41:42,327][07482] Updated weights for policy 0, policy_version 302548 (0.0016) [2025-01-05 15:41:42,851][07361] Fps is (10 sec: 20070.7, 60 sec: 19797.3, 300 sec: 19716.3). Total num frames: 1239244800. Throughput: 0: 4944.3. Samples: 5269032. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:41:42,852][07361] Avg episode reward: [(0, '33.659')] [2025-01-05 15:41:44,393][07482] Updated weights for policy 0, policy_version 302558 (0.0015) [2025-01-05 15:41:46,485][07482] Updated weights for policy 0, policy_version 302568 (0.0019) [2025-01-05 15:41:47,851][07361] Fps is (10 sec: 20070.6, 60 sec: 19797.4, 300 sec: 19716.3). Total num frames: 1239343104. Throughput: 0: 4957.1. Samples: 5298884. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:41:47,852][07361] Avg episode reward: [(0, '35.310')] [2025-01-05 15:41:48,553][07482] Updated weights for policy 0, policy_version 302578 (0.0017) [2025-01-05 15:41:50,631][07482] Updated weights for policy 0, policy_version 302588 (0.0016) [2025-01-05 15:41:52,722][07482] Updated weights for policy 0, policy_version 302598 (0.0016) [2025-01-05 15:41:52,852][07361] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 19716.3). Total num frames: 1239445504. Throughput: 0: 4954.8. Samples: 5313624. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:41:52,852][07361] Avg episode reward: [(0, '35.429')] [2025-01-05 15:41:54,757][07482] Updated weights for policy 0, policy_version 302608 (0.0017) [2025-01-05 15:41:56,832][07482] Updated weights for policy 0, policy_version 302618 (0.0017) [2025-01-05 15:41:57,852][07361] Fps is (10 sec: 20070.1, 60 sec: 19797.3, 300 sec: 19716.3). Total num frames: 1239543808. Throughput: 0: 4957.2. Samples: 5343278. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:41:57,852][07361] Avg episode reward: [(0, '35.689')] [2025-01-05 15:41:59,002][07482] Updated weights for policy 0, policy_version 302628 (0.0017) [2025-01-05 15:42:00,982][07482] Updated weights for policy 0, policy_version 302638 (0.0015) [2025-01-05 15:42:02,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19716.3). Total num frames: 1239642112. Throughput: 0: 4948.4. Samples: 5372822. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:02,852][07361] Avg episode reward: [(0, '33.286')] [2025-01-05 15:42:03,064][07482] Updated weights for policy 0, policy_version 302648 (0.0016) [2025-01-05 15:42:05,136][07482] Updated weights for policy 0, policy_version 302658 (0.0015) [2025-01-05 15:42:07,117][07482] Updated weights for policy 0, policy_version 302668 (0.0016) [2025-01-05 15:42:07,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19702.4). 
Total num frames: 1239740416. Throughput: 0: 4950.9. Samples: 5387896. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:07,852][07361] Avg episode reward: [(0, '32.968')] [2025-01-05 15:42:09,232][07482] Updated weights for policy 0, policy_version 302678 (0.0016) [2025-01-05 15:42:11,315][07482] Updated weights for policy 0, policy_version 302688 (0.0015) [2025-01-05 15:42:12,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19702.4). Total num frames: 1239838720. Throughput: 0: 4959.1. Samples: 5417808. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:12,852][07361] Avg episode reward: [(0, '35.291')] [2025-01-05 15:42:13,445][07482] Updated weights for policy 0, policy_version 302698 (0.0017) [2025-01-05 15:42:15,514][07482] Updated weights for policy 0, policy_version 302708 (0.0016) [2025-01-05 15:42:17,594][07482] Updated weights for policy 0, policy_version 302718 (0.0016) [2025-01-05 15:42:17,852][07361] Fps is (10 sec: 19660.3, 60 sec: 19797.2, 300 sec: 19688.5). Total num frames: 1239937024. Throughput: 0: 4957.5. Samples: 5447062. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:17,852][07361] Avg episode reward: [(0, '37.744')] [2025-01-05 15:42:19,691][07482] Updated weights for policy 0, policy_version 302728 (0.0016) [2025-01-05 15:42:21,761][07482] Updated weights for policy 0, policy_version 302738 (0.0016) [2025-01-05 15:42:22,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19688.6). Total num frames: 1240035328. Throughput: 0: 4949.3. Samples: 5461488. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:22,852][07361] Avg episode reward: [(0, '35.180')] [2025-01-05 15:42:23,961][07482] Updated weights for policy 0, policy_version 302748 (0.0017) [2025-01-05 15:42:25,959][07482] Updated weights for policy 0, policy_version 302758 (0.0019) [2025-01-05 15:42:27,851][07361] Fps is (10 sec: 19661.6, 60 sec: 19797.4, 300 sec: 19674.7). Total num frames: 1240133632. 
Throughput: 0: 4929.9. Samples: 5490876. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:27,852][07361] Avg episode reward: [(0, '38.431')] [2025-01-05 15:42:28,024][07482] Updated weights for policy 0, policy_version 302768 (0.0016) [2025-01-05 15:42:30,107][07482] Updated weights for policy 0, policy_version 302778 (0.0017) [2025-01-05 15:42:32,115][07482] Updated weights for policy 0, policy_version 302788 (0.0015) [2025-01-05 15:42:32,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19674.7). Total num frames: 1240231936. Throughput: 0: 4932.8. Samples: 5520858. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:32,852][07361] Avg episode reward: [(0, '36.114')] [2025-01-05 15:42:34,190][07482] Updated weights for policy 0, policy_version 302798 (0.0017) [2025-01-05 15:42:36,270][07482] Updated weights for policy 0, policy_version 302808 (0.0016) [2025-01-05 15:42:37,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1240330240. Throughput: 0: 4937.0. Samples: 5535788. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:37,852][07361] Avg episode reward: [(0, '38.710')] [2025-01-05 15:42:37,939][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000302816_1240334336.pth... [2025-01-05 15:42:38,000][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000301666_1235623936.pth [2025-01-05 15:42:38,382][07482] Updated weights for policy 0, policy_version 302818 (0.0017) [2025-01-05 15:42:40,449][07482] Updated weights for policy 0, policy_version 302828 (0.0018) [2025-01-05 15:42:42,513][07482] Updated weights for policy 0, policy_version 302838 (0.0016) [2025-01-05 15:42:42,852][07361] Fps is (10 sec: 19660.2, 60 sec: 19729.0, 300 sec: 19660.8). Total num frames: 1240428544. Throughput: 0: 4933.0. Samples: 5565264. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:42,852][07361] Avg episode reward: [(0, '38.250')] [2025-01-05 15:42:44,641][07482] Updated weights for policy 0, policy_version 302848 (0.0017) [2025-01-05 15:42:46,682][07482] Updated weights for policy 0, policy_version 302858 (0.0016) [2025-01-05 15:42:47,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19646.9). Total num frames: 1240526848. Throughput: 0: 4928.5. Samples: 5594606. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:47,852][07361] Avg episode reward: [(0, '36.634')] [2025-01-05 15:42:48,846][07482] Updated weights for policy 0, policy_version 302868 (0.0017) [2025-01-05 15:42:50,858][07482] Updated weights for policy 0, policy_version 302878 (0.0016) [2025-01-05 15:42:52,852][07361] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1240625152. Throughput: 0: 4920.7. Samples: 5609328. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:52,852][07361] Avg episode reward: [(0, '33.876')] [2025-01-05 15:42:52,904][07482] Updated weights for policy 0, policy_version 302888 (0.0015) [2025-01-05 15:42:54,976][07482] Updated weights for policy 0, policy_version 302898 (0.0016) [2025-01-05 15:42:56,971][07482] Updated weights for policy 0, policy_version 302908 (0.0015) [2025-01-05 15:42:57,852][07361] Fps is (10 sec: 20070.6, 60 sec: 19729.1, 300 sec: 19660.8). Total num frames: 1240727552. Throughput: 0: 4927.0. Samples: 5639524. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 15:42:57,852][07361] Avg episode reward: [(0, '35.026')] [2025-01-05 15:42:59,005][07482] Updated weights for policy 0, policy_version 302918 (0.0015) [2025-01-05 15:43:01,088][07482] Updated weights for policy 0, policy_version 302928 (0.0016) [2025-01-05 15:43:02,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1240825856. Throughput: 0: 4940.7. Samples: 5669392. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:02,852][07361] Avg episode reward: [(0, '33.798')] [2025-01-05 15:43:03,179][07482] Updated weights for policy 0, policy_version 302938 (0.0016) [2025-01-05 15:43:05,215][07482] Updated weights for policy 0, policy_version 302948 (0.0015) [2025-01-05 15:43:07,277][07482] Updated weights for policy 0, policy_version 302958 (0.0016) [2025-01-05 15:43:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19729.1, 300 sec: 19646.9). Total num frames: 1240924160. Throughput: 0: 4949.9. Samples: 5684236. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:07,852][07361] Avg episode reward: [(0, '35.704')] [2025-01-05 15:43:09,343][07482] Updated weights for policy 0, policy_version 302968 (0.0017) [2025-01-05 15:43:11,397][07482] Updated weights for policy 0, policy_version 302978 (0.0015) [2025-01-05 15:43:12,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1241022464. Throughput: 0: 4959.6. Samples: 5714060. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:12,852][07361] Avg episode reward: [(0, '37.119')] [2025-01-05 15:43:13,535][07482] Updated weights for policy 0, policy_version 302988 (0.0016) [2025-01-05 15:43:15,546][07482] Updated weights for policy 0, policy_version 302998 (0.0016) [2025-01-05 15:43:17,592][07482] Updated weights for policy 0, policy_version 303008 (0.0015) [2025-01-05 15:43:17,852][07361] Fps is (10 sec: 20070.5, 60 sec: 19797.4, 300 sec: 19660.8). Total num frames: 1241124864. Throughput: 0: 4954.6. Samples: 5743814. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:17,852][07361] Avg episode reward: [(0, '36.806')] [2025-01-05 15:43:19,730][07482] Updated weights for policy 0, policy_version 303018 (0.0017) [2025-01-05 15:43:21,753][07482] Updated weights for policy 0, policy_version 303028 (0.0015) [2025-01-05 15:43:22,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19646.9). 
Total num frames: 1241219072. Throughput: 0: 4949.4. Samples: 5758510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:22,852][07361] Avg episode reward: [(0, '36.292')] [2025-01-05 15:43:23,884][07482] Updated weights for policy 0, policy_version 303038 (0.0017) [2025-01-05 15:43:25,926][07482] Updated weights for policy 0, policy_version 303048 (0.0015) [2025-01-05 15:43:27,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1241321472. Throughput: 0: 4955.7. Samples: 5788270. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:27,852][07361] Avg episode reward: [(0, '35.668')] [2025-01-05 15:43:27,994][07482] Updated weights for policy 0, policy_version 303058 (0.0015) [2025-01-05 15:43:30,025][07482] Updated weights for policy 0, policy_version 303068 (0.0016) [2025-01-05 15:43:32,088][07482] Updated weights for policy 0, policy_version 303078 (0.0015) [2025-01-05 15:43:32,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19660.8). Total num frames: 1241419776. Throughput: 0: 4968.8. Samples: 5818202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:32,852][07361] Avg episode reward: [(0, '35.058')] [2025-01-05 15:43:34,201][07482] Updated weights for policy 0, policy_version 303088 (0.0019) [2025-01-05 15:43:36,206][07482] Updated weights for policy 0, policy_version 303098 (0.0015) [2025-01-05 15:43:37,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.4, 300 sec: 19646.9). Total num frames: 1241518080. Throughput: 0: 4967.6. Samples: 5832868. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:37,852][07361] Avg episode reward: [(0, '39.209')] [2025-01-05 15:43:38,275][07482] Updated weights for policy 0, policy_version 303108 (0.0016) [2025-01-05 15:43:40,287][07482] Updated weights for policy 0, policy_version 303118 (0.0015) [2025-01-05 15:43:42,311][07482] Updated weights for policy 0, policy_version 303128 (0.0015) [2025-01-05 15:43:42,851][07361] Fps is (10 sec: 20070.6, 60 sec: 19865.7, 300 sec: 19660.8). Total num frames: 1241620480. Throughput: 0: 4968.9. Samples: 5863122. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:42,852][07361] Avg episode reward: [(0, '39.365')] [2025-01-05 15:43:42,940][07448] Saving new best policy, reward=39.365! [2025-01-05 15:43:44,468][07482] Updated weights for policy 0, policy_version 303138 (0.0017) [2025-01-05 15:43:46,515][07482] Updated weights for policy 0, policy_version 303148 (0.0016) [2025-01-05 15:43:47,852][07361] Fps is (10 sec: 20070.1, 60 sec: 19865.6, 300 sec: 19660.8). Total num frames: 1241718784. Throughput: 0: 4959.3. Samples: 5892560. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:47,852][07361] Avg episode reward: [(0, '40.150')] [2025-01-05 15:43:47,857][07448] Saving new best policy, reward=40.150! [2025-01-05 15:43:48,623][07482] Updated weights for policy 0, policy_version 303158 (0.0017) [2025-01-05 15:43:50,699][07482] Updated weights for policy 0, policy_version 303168 (0.0015) [2025-01-05 15:43:52,781][07482] Updated weights for policy 0, policy_version 303178 (0.0016) [2025-01-05 15:43:52,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19865.6, 300 sec: 19688.6). Total num frames: 1241817088. Throughput: 0: 4956.8. Samples: 5907290. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:52,852][07361] Avg episode reward: [(0, '38.982')] [2025-01-05 15:43:54,857][07482] Updated weights for policy 0, policy_version 303188 (0.0017) [2025-01-05 15:43:56,942][07482] Updated weights for policy 0, policy_version 303198 (0.0014) [2025-01-05 15:43:57,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19716.3). Total num frames: 1241915392. Throughput: 0: 4948.1. Samples: 5936724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:43:57,852][07361] Avg episode reward: [(0, '37.021')] [2025-01-05 15:43:59,040][07482] Updated weights for policy 0, policy_version 303208 (0.0017) [2025-01-05 15:44:01,044][07482] Updated weights for policy 0, policy_version 303218 (0.0015) [2025-01-05 15:44:02,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19797.4, 300 sec: 19744.1). Total num frames: 1242013696. Throughput: 0: 4951.0. Samples: 5966606. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:44:02,852][07361] Avg episode reward: [(0, '37.741')] [2025-01-05 15:44:03,126][07482] Updated weights for policy 0, policy_version 303228 (0.0015) [2025-01-05 15:44:05,172][07482] Updated weights for policy 0, policy_version 303238 (0.0016) [2025-01-05 15:44:07,167][07482] Updated weights for policy 0, policy_version 303248 (0.0015) [2025-01-05 15:44:07,851][07361] Fps is (10 sec: 20070.6, 60 sec: 19865.6, 300 sec: 19785.8). Total num frames: 1242116096. Throughput: 0: 4957.2. Samples: 5981586. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:44:07,852][07361] Avg episode reward: [(0, '36.400')] [2025-01-05 15:44:09,242][07482] Updated weights for policy 0, policy_version 303258 (0.0017) [2025-01-05 15:44:11,274][07482] Updated weights for policy 0, policy_version 303268 (0.0016) [2025-01-05 15:44:12,851][07361] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19799.7). Total num frames: 1242214400. Throughput: 0: 4969.0. Samples: 6011876. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:44:12,852][07361] Avg episode reward: [(0, '36.136')] [2025-01-05 15:44:13,386][07482] Updated weights for policy 0, policy_version 303278 (0.0016) [2025-01-05 15:44:15,434][07482] Updated weights for policy 0, policy_version 303288 (0.0015) [2025-01-05 15:44:17,475][07482] Updated weights for policy 0, policy_version 303298 (0.0015) [2025-01-05 15:44:17,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1242312704. Throughput: 0: 4962.5. Samples: 6041512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:17,852][07361] Avg episode reward: [(0, '37.365')] [2025-01-05 15:44:19,566][07482] Updated weights for policy 0, policy_version 303308 (0.0017) [2025-01-05 15:44:21,634][07482] Updated weights for policy 0, policy_version 303318 (0.0018) [2025-01-05 15:44:22,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1242411008. Throughput: 0: 4960.6. Samples: 6056094. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:22,852][07361] Avg episode reward: [(0, '35.503')] [2025-01-05 15:44:23,775][07482] Updated weights for policy 0, policy_version 303328 (0.0016) [2025-01-05 15:44:25,756][07482] Updated weights for policy 0, policy_version 303338 (0.0016) [2025-01-05 15:44:27,803][07482] Updated weights for policy 0, policy_version 303348 (0.0016) [2025-01-05 15:44:27,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19813.5). Total num frames: 1242513408. Throughput: 0: 4952.8. Samples: 6086000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:27,852][07361] Avg episode reward: [(0, '37.297')] [2025-01-05 15:44:29,957][07482] Updated weights for policy 0, policy_version 303358 (0.0017) [2025-01-05 15:44:31,951][07482] Updated weights for policy 0, policy_version 303368 (0.0016) [2025-01-05 15:44:32,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19865.6, 300 sec: 19813.5). 
Total num frames: 1242611712. Throughput: 0: 4957.9. Samples: 6115666. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:32,852][07361] Avg episode reward: [(0, '37.949')] [2025-01-05 15:44:34,013][07482] Updated weights for policy 0, policy_version 303378 (0.0015) [2025-01-05 15:44:36,110][07482] Updated weights for policy 0, policy_version 303388 (0.0016) [2025-01-05 15:44:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19865.6, 300 sec: 19799.6). Total num frames: 1242710016. Throughput: 0: 4964.1. Samples: 6130674. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:37,852][07361] Avg episode reward: [(0, '34.990')] [2025-01-05 15:44:37,862][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000303396_1242710016.pth... [2025-01-05 15:44:37,915][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000302236_1237958656.pth [2025-01-05 15:44:38,196][07482] Updated weights for policy 0, policy_version 303398 (0.0017) [2025-01-05 15:44:40,267][07482] Updated weights for policy 0, policy_version 303408 (0.0016) [2025-01-05 15:44:42,375][07482] Updated weights for policy 0, policy_version 303418 (0.0017) [2025-01-05 15:44:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19813.5). Total num frames: 1242808320. Throughput: 0: 4964.1. Samples: 6160108. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:42,852][07361] Avg episode reward: [(0, '37.055')] [2025-01-05 15:44:44,464][07482] Updated weights for policy 0, policy_version 303428 (0.0017) [2025-01-05 15:44:46,521][07482] Updated weights for policy 0, policy_version 303438 (0.0015) [2025-01-05 15:44:47,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1242906624. Throughput: 0: 4950.3. Samples: 6189370. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:47,852][07361] Avg episode reward: [(0, '37.556')] [2025-01-05 15:44:48,702][07482] Updated weights for policy 0, policy_version 303448 (0.0016) [2025-01-05 15:44:50,674][07482] Updated weights for policy 0, policy_version 303458 (0.0015) [2025-01-05 15:44:52,720][07482] Updated weights for policy 0, policy_version 303468 (0.0016) [2025-01-05 15:44:52,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19797.3, 300 sec: 19799.6). Total num frames: 1243004928. Throughput: 0: 4948.0. Samples: 6204248. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:52,852][07361] Avg episode reward: [(0, '34.578')] [2025-01-05 15:44:54,886][07482] Updated weights for policy 0, policy_version 303478 (0.0016) [2025-01-05 15:44:56,865][07482] Updated weights for policy 0, policy_version 303488 (0.0015) [2025-01-05 15:44:57,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19797.4, 300 sec: 19799.7). Total num frames: 1243103232. Throughput: 0: 4937.4. Samples: 6234058. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:44:57,852][07361] Avg episode reward: [(0, '39.551')] [2025-01-05 15:44:58,951][07482] Updated weights for policy 0, policy_version 303498 (0.0015) [2025-01-05 15:45:01,078][07482] Updated weights for policy 0, policy_version 303508 (0.0017) [2025-01-05 15:45:02,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19797.3, 300 sec: 19785.8). Total num frames: 1243201536. Throughput: 0: 4932.0. Samples: 6263452. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:02,852][07361] Avg episode reward: [(0, '40.032')] [2025-01-05 15:45:03,181][07482] Updated weights for policy 0, policy_version 303518 (0.0017) [2025-01-05 15:45:05,245][07482] Updated weights for policy 0, policy_version 303528 (0.0017) [2025-01-05 15:45:07,341][07482] Updated weights for policy 0, policy_version 303538 (0.0015) [2025-01-05 15:45:07,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19785.8). 
Total num frames: 1243299840. Throughput: 0: 4933.1. Samples: 6278084. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:07,852][07361] Avg episode reward: [(0, '40.371')] [2025-01-05 15:45:07,858][07448] Saving new best policy, reward=40.371! [2025-01-05 15:45:09,436][07482] Updated weights for policy 0, policy_version 303548 (0.0017) [2025-01-05 15:45:11,506][07482] Updated weights for policy 0, policy_version 303558 (0.0017) [2025-01-05 15:45:12,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19729.0, 300 sec: 19785.8). Total num frames: 1243398144. Throughput: 0: 4923.3. Samples: 6307546. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:12,852][07361] Avg episode reward: [(0, '36.648')] [2025-01-05 15:45:13,727][07482] Updated weights for policy 0, policy_version 303568 (0.0018) [2025-01-05 15:45:15,747][07482] Updated weights for policy 0, policy_version 303578 (0.0017) [2025-01-05 15:45:17,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19729.0, 300 sec: 19785.8). Total num frames: 1243496448. Throughput: 0: 4907.3. Samples: 6336496. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:17,852][07361] Avg episode reward: [(0, '37.972')] [2025-01-05 15:45:17,857][07482] Updated weights for policy 0, policy_version 303588 (0.0018) [2025-01-05 15:45:20,032][07482] Updated weights for policy 0, policy_version 303598 (0.0017) [2025-01-05 15:45:22,048][07482] Updated weights for policy 0, policy_version 303608 (0.0015) [2025-01-05 15:45:22,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19660.8, 300 sec: 19758.0). Total num frames: 1243590656. Throughput: 0: 4897.1. Samples: 6351044. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:22,852][07361] Avg episode reward: [(0, '39.463')] [2025-01-05 15:45:24,103][07482] Updated weights for policy 0, policy_version 303618 (0.0016) [2025-01-05 15:45:26,245][07482] Updated weights for policy 0, policy_version 303628 (0.0015) [2025-01-05 15:45:27,852][07361] Fps is (10 sec: 19251.4, 60 sec: 19592.5, 300 sec: 19758.0). Total num frames: 1243688960. Throughput: 0: 4905.2. Samples: 6380842. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:27,852][07361] Avg episode reward: [(0, '40.302')] [2025-01-05 15:45:28,326][07482] Updated weights for policy 0, policy_version 303638 (0.0017) [2025-01-05 15:45:30,411][07482] Updated weights for policy 0, policy_version 303648 (0.0016) [2025-01-05 15:45:32,550][07482] Updated weights for policy 0, policy_version 303658 (0.0016) [2025-01-05 15:45:32,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.6, 300 sec: 19758.0). Total num frames: 1243787264. Throughput: 0: 4903.4. Samples: 6410022. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:32,852][07361] Avg episode reward: [(0, '38.725')] [2025-01-05 15:45:34,754][07482] Updated weights for policy 0, policy_version 303668 (0.0018) [2025-01-05 15:45:36,878][07482] Updated weights for policy 0, policy_version 303678 (0.0017) [2025-01-05 15:45:37,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19524.3, 300 sec: 19744.1). Total num frames: 1243881472. Throughput: 0: 4879.8. Samples: 6423836. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:37,852][07361] Avg episode reward: [(0, '36.077')] [2025-01-05 15:45:39,096][07482] Updated weights for policy 0, policy_version 303688 (0.0017) [2025-01-05 15:45:41,140][07482] Updated weights for policy 0, policy_version 303698 (0.0017) [2025-01-05 15:45:42,852][07361] Fps is (10 sec: 18841.4, 60 sec: 19456.0, 300 sec: 19730.2). Total num frames: 1243975680. Throughput: 0: 4857.9. Samples: 6452662. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:42,852][07361] Avg episode reward: [(0, '36.581')] [2025-01-05 15:45:43,235][07482] Updated weights for policy 0, policy_version 303708 (0.0016) [2025-01-05 15:45:45,329][07482] Updated weights for policy 0, policy_version 303718 (0.0016) [2025-01-05 15:45:47,369][07482] Updated weights for policy 0, policy_version 303728 (0.0016) [2025-01-05 15:45:47,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19744.1). Total num frames: 1244078080. Throughput: 0: 4865.1. Samples: 6482380. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:47,852][07361] Avg episode reward: [(0, '38.103')] [2025-01-05 15:45:49,526][07482] Updated weights for policy 0, policy_version 303738 (0.0017) [2025-01-05 15:45:51,676][07482] Updated weights for policy 0, policy_version 303748 (0.0016) [2025-01-05 15:45:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.1, 300 sec: 19716.3). Total num frames: 1244172288. Throughput: 0: 4854.7. Samples: 6496544. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:52,852][07361] Avg episode reward: [(0, '40.552')] [2025-01-05 15:45:52,853][07448] Saving new best policy, reward=40.552! [2025-01-05 15:45:53,872][07482] Updated weights for policy 0, policy_version 303758 (0.0016) [2025-01-05 15:45:55,929][07482] Updated weights for policy 0, policy_version 303768 (0.0016) [2025-01-05 15:45:57,851][07361] Fps is (10 sec: 18841.7, 60 sec: 19387.7, 300 sec: 19716.3). Total num frames: 1244266496. Throughput: 0: 4840.0. Samples: 6525346. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:45:57,852][07361] Avg episode reward: [(0, '39.903')] [2025-01-05 15:45:58,177][07482] Updated weights for policy 0, policy_version 303778 (0.0015) [2025-01-05 15:46:00,262][07482] Updated weights for policy 0, policy_version 303788 (0.0015) [2025-01-05 15:46:02,295][07482] Updated weights for policy 0, policy_version 303798 (0.0015) [2025-01-05 15:46:02,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19387.7, 300 sec: 19716.3). Total num frames: 1244364800. Throughput: 0: 4839.8. Samples: 6554286. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:02,852][07361] Avg episode reward: [(0, '38.035')] [2025-01-05 15:46:04,433][07482] Updated weights for policy 0, policy_version 303808 (0.0015) [2025-01-05 15:46:06,423][07482] Updated weights for policy 0, policy_version 303818 (0.0015) [2025-01-05 15:46:07,851][07361] Fps is (10 sec: 20070.6, 60 sec: 19456.0, 300 sec: 19716.3). Total num frames: 1244467200. Throughput: 0: 4847.0. Samples: 6569158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:07,852][07361] Avg episode reward: [(0, '35.630')] [2025-01-05 15:46:08,429][07482] Updated weights for policy 0, policy_version 303828 (0.0015) [2025-01-05 15:46:10,538][07482] Updated weights for policy 0, policy_version 303838 (0.0016) [2025-01-05 15:46:12,560][07482] Updated weights for policy 0, policy_version 303848 (0.0015) [2025-01-05 15:46:12,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19456.0, 300 sec: 19716.3). Total num frames: 1244565504. Throughput: 0: 4855.1. Samples: 6599322. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:12,852][07361] Avg episode reward: [(0, '38.878')] [2025-01-05 15:46:14,659][07482] Updated weights for policy 0, policy_version 303858 (0.0017) [2025-01-05 15:46:16,784][07482] Updated weights for policy 0, policy_version 303868 (0.0015) [2025-01-05 15:46:17,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19456.0, 300 sec: 19716.3). 
Total num frames: 1244663808. Throughput: 0: 4860.8. Samples: 6628758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:17,852][07361] Avg episode reward: [(0, '38.010')] [2025-01-05 15:46:18,860][07482] Updated weights for policy 0, policy_version 303878 (0.0016) [2025-01-05 15:46:20,886][07482] Updated weights for policy 0, policy_version 303888 (0.0017) [2025-01-05 15:46:22,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.2, 300 sec: 19716.3). Total num frames: 1244762112. Throughput: 0: 4881.3. Samples: 6643494. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:22,852][07361] Avg episode reward: [(0, '38.633')] [2025-01-05 15:46:23,057][07482] Updated weights for policy 0, policy_version 303898 (0.0017) [2025-01-05 15:46:25,078][07482] Updated weights for policy 0, policy_version 303908 (0.0017) [2025-01-05 15:46:27,102][07482] Updated weights for policy 0, policy_version 303918 (0.0016) [2025-01-05 15:46:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19716.3). Total num frames: 1244860416. Throughput: 0: 4900.0. Samples: 6673160. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:27,852][07361] Avg episode reward: [(0, '39.507')] [2025-01-05 15:46:29,266][07482] Updated weights for policy 0, policy_version 303928 (0.0017) [2025-01-05 15:46:31,289][07482] Updated weights for policy 0, policy_version 303938 (0.0016) [2025-01-05 15:46:32,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.2, 300 sec: 19716.3). Total num frames: 1244958720. Throughput: 0: 4893.6. Samples: 6702592. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:32,852][07361] Avg episode reward: [(0, '41.148')] [2025-01-05 15:46:32,853][07448] Saving new best policy, reward=41.148! 
[2025-01-05 15:46:33,430][07482] Updated weights for policy 0, policy_version 303948 (0.0017) [2025-01-05 15:46:35,590][07482] Updated weights for policy 0, policy_version 303958 (0.0015) [2025-01-05 15:46:37,653][07482] Updated weights for policy 0, policy_version 303968 (0.0015) [2025-01-05 15:46:37,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19702.5). Total num frames: 1245057024. Throughput: 0: 4898.6. Samples: 6716982. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:37,852][07361] Avg episode reward: [(0, '41.450')] [2025-01-05 15:46:37,860][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000303969_1245057024.pth... [2025-01-05 15:46:37,916][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000302816_1240334336.pth [2025-01-05 15:46:37,917][07448] Saving new best policy, reward=41.450! [2025-01-05 15:46:39,846][07482] Updated weights for policy 0, policy_version 303978 (0.0016) [2025-01-05 15:46:41,928][07482] Updated weights for policy 0, policy_version 303988 (0.0015) [2025-01-05 15:46:42,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19688.6). Total num frames: 1245151232. Throughput: 0: 4902.0. Samples: 6745934. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:42,852][07361] Avg episode reward: [(0, '38.604')] [2025-01-05 15:46:44,039][07482] Updated weights for policy 0, policy_version 303998 (0.0016) [2025-01-05 15:46:46,098][07482] Updated weights for policy 0, policy_version 304008 (0.0015) [2025-01-05 15:46:47,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19524.3, 300 sec: 19674.7). Total num frames: 1245249536. Throughput: 0: 4906.1. Samples: 6775060. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:47,852][07361] Avg episode reward: [(0, '39.081')] [2025-01-05 15:46:48,258][07482] Updated weights for policy 0, policy_version 304018 (0.0016) [2025-01-05 15:46:50,358][07482] Updated weights for policy 0, policy_version 304028 (0.0017) [2025-01-05 15:46:52,390][07482] Updated weights for policy 0, policy_version 304038 (0.0015) [2025-01-05 15:46:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19674.7). Total num frames: 1245347840. Throughput: 0: 4899.6. Samples: 6789640. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:52,852][07361] Avg episode reward: [(0, '39.601')] [2025-01-05 15:46:54,576][07482] Updated weights for policy 0, policy_version 304048 (0.0016) [2025-01-05 15:46:56,651][07482] Updated weights for policy 0, policy_version 304058 (0.0016) [2025-01-05 15:46:57,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19660.8). Total num frames: 1245442048. Throughput: 0: 4882.0. Samples: 6819010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:46:57,852][07361] Avg episode reward: [(0, '40.423')] [2025-01-05 15:46:58,823][07482] Updated weights for policy 0, policy_version 304068 (0.0016) [2025-01-05 15:47:00,959][07482] Updated weights for policy 0, policy_version 304078 (0.0017) [2025-01-05 15:47:02,852][07361] Fps is (10 sec: 18841.4, 60 sec: 19524.2, 300 sec: 19646.9). Total num frames: 1245536256. Throughput: 0: 4863.7. Samples: 6847624. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:02,852][07361] Avg episode reward: [(0, '41.330')] [2025-01-05 15:47:03,085][07482] Updated weights for policy 0, policy_version 304088 (0.0017) [2025-01-05 15:47:05,142][07482] Updated weights for policy 0, policy_version 304098 (0.0016) [2025-01-05 15:47:07,295][07482] Updated weights for policy 0, policy_version 304108 (0.0015) [2025-01-05 15:47:07,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19455.9, 300 sec: 19646.9). 
Total num frames: 1245634560. Throughput: 0: 4859.5. Samples: 6862170. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:07,852][07361] Avg episode reward: [(0, '40.156')] [2025-01-05 15:47:09,421][07482] Updated weights for policy 0, policy_version 304118 (0.0015) [2025-01-05 15:47:11,528][07482] Updated weights for policy 0, policy_version 304128 (0.0017) [2025-01-05 15:47:12,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19646.9). Total num frames: 1245732864. Throughput: 0: 4841.6. Samples: 6891034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:12,852][07361] Avg episode reward: [(0, '40.284')] [2025-01-05 15:47:13,778][07482] Updated weights for policy 0, policy_version 304138 (0.0016) [2025-01-05 15:47:15,889][07482] Updated weights for policy 0, policy_version 304148 (0.0016) [2025-01-05 15:47:17,852][07361] Fps is (10 sec: 18841.6, 60 sec: 19319.5, 300 sec: 19619.1). Total num frames: 1245822976. Throughput: 0: 4814.5. Samples: 6919246. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:17,852][07361] Avg episode reward: [(0, '39.584')] [2025-01-05 15:47:18,060][07482] Updated weights for policy 0, policy_version 304158 (0.0017) [2025-01-05 15:47:20,192][07482] Updated weights for policy 0, policy_version 304168 (0.0018) [2025-01-05 15:47:22,289][07482] Updated weights for policy 0, policy_version 304178 (0.0016) [2025-01-05 15:47:22,852][07361] Fps is (10 sec: 18841.5, 60 sec: 19319.4, 300 sec: 19619.1). Total num frames: 1245921280. Throughput: 0: 4819.5. Samples: 6933858. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:22,852][07361] Avg episode reward: [(0, '38.735')] [2025-01-05 15:47:24,408][07482] Updated weights for policy 0, policy_version 304188 (0.0017) [2025-01-05 15:47:26,541][07482] Updated weights for policy 0, policy_version 304198 (0.0017) [2025-01-05 15:47:27,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19319.5, 300 sec: 19619.1). Total num frames: 1246019584. 
Throughput: 0: 4821.4. Samples: 6962898. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:27,852][07361] Avg episode reward: [(0, '41.134')] [2025-01-05 15:47:28,676][07482] Updated weights for policy 0, policy_version 304208 (0.0017) [2025-01-05 15:47:30,703][07482] Updated weights for policy 0, policy_version 304218 (0.0017) [2025-01-05 15:47:32,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19251.2, 300 sec: 19605.3). Total num frames: 1246113792. Throughput: 0: 4818.3. Samples: 6991884. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:32,852][07361] Avg episode reward: [(0, '37.966')] [2025-01-05 15:47:32,897][07482] Updated weights for policy 0, policy_version 304228 (0.0017) [2025-01-05 15:47:35,027][07482] Updated weights for policy 0, policy_version 304238 (0.0018) [2025-01-05 15:47:37,069][07482] Updated weights for policy 0, policy_version 304248 (0.0018) [2025-01-05 15:47:37,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19251.1, 300 sec: 19605.3). Total num frames: 1246212096. Throughput: 0: 4813.7. Samples: 7006258. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:37,852][07361] Avg episode reward: [(0, '40.303')] [2025-01-05 15:47:39,258][07482] Updated weights for policy 0, policy_version 304258 (0.0017) [2025-01-05 15:47:41,325][07482] Updated weights for policy 0, policy_version 304268 (0.0016) [2025-01-05 15:47:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19605.3). Total num frames: 1246310400. Throughput: 0: 4812.9. Samples: 7035590. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:42,852][07361] Avg episode reward: [(0, '42.542')] [2025-01-05 15:47:42,853][07448] Saving new best policy, reward=42.542! 
[2025-01-05 15:47:43,438][07482] Updated weights for policy 0, policy_version 304278 (0.0017) [2025-01-05 15:47:45,508][07482] Updated weights for policy 0, policy_version 304288 (0.0015) [2025-01-05 15:47:47,534][07482] Updated weights for policy 0, policy_version 304298 (0.0017) [2025-01-05 15:47:47,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19319.4, 300 sec: 19605.3). Total num frames: 1246408704. Throughput: 0: 4834.7. Samples: 7065186. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:47,852][07361] Avg episode reward: [(0, '40.236')] [2025-01-05 15:47:49,729][07482] Updated weights for policy 0, policy_version 304308 (0.0018) [2025-01-05 15:47:51,969][07482] Updated weights for policy 0, policy_version 304318 (0.0016) [2025-01-05 15:47:52,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19251.2, 300 sec: 19577.5). Total num frames: 1246502912. Throughput: 0: 4826.0. Samples: 7079338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:47:52,852][07361] Avg episode reward: [(0, '43.563')] [2025-01-05 15:47:52,853][07448] Saving new best policy, reward=43.563! [2025-01-05 15:47:54,147][07482] Updated weights for policy 0, policy_version 304328 (0.0017) [2025-01-05 15:47:56,160][07482] Updated weights for policy 0, policy_version 304338 (0.0015) [2025-01-05 15:47:57,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19577.5). Total num frames: 1246601216. Throughput: 0: 4817.7. Samples: 7107830. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:47:57,852][07361] Avg episode reward: [(0, '44.406')] [2025-01-05 15:47:57,858][07448] Saving new best policy, reward=44.406! 
[2025-01-05 15:47:58,242][07482] Updated weights for policy 0, policy_version 304348 (0.0016) [2025-01-05 15:48:00,301][07482] Updated weights for policy 0, policy_version 304358 (0.0015) [2025-01-05 15:48:02,330][07482] Updated weights for policy 0, policy_version 304368 (0.0016) [2025-01-05 15:48:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19387.8, 300 sec: 19577.5). Total num frames: 1246699520. Throughput: 0: 4857.1. Samples: 7137816. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:02,852][07361] Avg episode reward: [(0, '41.890')] [2025-01-05 15:48:04,418][07482] Updated weights for policy 0, policy_version 304378 (0.0017) [2025-01-05 15:48:06,722][07482] Updated weights for policy 0, policy_version 304388 (0.0021) [2025-01-05 15:48:07,852][07361] Fps is (10 sec: 18841.3, 60 sec: 19251.2, 300 sec: 19549.7). Total num frames: 1246789632. Throughput: 0: 4860.0. Samples: 7152558. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:07,852][07361] Avg episode reward: [(0, '38.655')] [2025-01-05 15:48:08,972][07482] Updated weights for policy 0, policy_version 304398 (0.0017) [2025-01-05 15:48:11,094][07482] Updated weights for policy 0, policy_version 304408 (0.0016) [2025-01-05 15:48:12,852][07361] Fps is (10 sec: 18841.6, 60 sec: 19251.2, 300 sec: 19535.8). Total num frames: 1246887936. Throughput: 0: 4826.1. Samples: 7180072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:12,852][07361] Avg episode reward: [(0, '42.140')] [2025-01-05 15:48:13,191][07482] Updated weights for policy 0, policy_version 304418 (0.0015) [2025-01-05 15:48:15,250][07482] Updated weights for policy 0, policy_version 304428 (0.0016) [2025-01-05 15:48:17,305][07482] Updated weights for policy 0, policy_version 304438 (0.0016) [2025-01-05 15:48:17,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19549.7). Total num frames: 1246986240. Throughput: 0: 4841.6. Samples: 7209756. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:17,852][07361] Avg episode reward: [(0, '40.590')] [2025-01-05 15:48:19,439][07482] Updated weights for policy 0, policy_version 304448 (0.0017) [2025-01-05 15:48:21,493][07482] Updated weights for policy 0, policy_version 304458 (0.0015) [2025-01-05 15:48:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19535.8). Total num frames: 1247084544. Throughput: 0: 4843.7. Samples: 7224226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:22,852][07361] Avg episode reward: [(0, '39.679')] [2025-01-05 15:48:23,624][07482] Updated weights for policy 0, policy_version 304468 (0.0016) [2025-01-05 15:48:25,659][07482] Updated weights for policy 0, policy_version 304478 (0.0017) [2025-01-05 15:48:27,730][07482] Updated weights for policy 0, policy_version 304488 (0.0016) [2025-01-05 15:48:27,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19535.8). Total num frames: 1247182848. Throughput: 0: 4850.1. Samples: 7253844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:27,852][07361] Avg episode reward: [(0, '39.868')] [2025-01-05 15:48:29,975][07482] Updated weights for policy 0, policy_version 304498 (0.0018) [2025-01-05 15:48:32,040][07482] Updated weights for policy 0, policy_version 304508 (0.0017) [2025-01-05 15:48:32,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1247277056. Throughput: 0: 4829.8. Samples: 7282528. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:32,852][07361] Avg episode reward: [(0, '36.541')] [2025-01-05 15:48:34,208][07482] Updated weights for policy 0, policy_version 304518 (0.0017) [2025-01-05 15:48:36,211][07482] Updated weights for policy 0, policy_version 304528 (0.0015) [2025-01-05 15:48:37,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19387.8, 300 sec: 19508.1). Total num frames: 1247375360. Throughput: 0: 4840.1. Samples: 7297144. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:37,852][07361] Avg episode reward: [(0, '38.703')] [2025-01-05 15:48:37,884][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000304536_1247379456.pth... [2025-01-05 15:48:37,933][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000303396_1242710016.pth [2025-01-05 15:48:38,292][07482] Updated weights for policy 0, policy_version 304538 (0.0016) [2025-01-05 15:48:40,384][07482] Updated weights for policy 0, policy_version 304548 (0.0016) [2025-01-05 15:48:42,404][07482] Updated weights for policy 0, policy_version 304558 (0.0016) [2025-01-05 15:48:42,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19508.1). Total num frames: 1247473664. Throughput: 0: 4869.3. Samples: 7326946. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:42,852][07361] Avg episode reward: [(0, '39.372')] [2025-01-05 15:48:44,540][07482] Updated weights for policy 0, policy_version 304568 (0.0017) [2025-01-05 15:48:46,740][07482] Updated weights for policy 0, policy_version 304578 (0.0018) [2025-01-05 15:48:47,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19387.8, 300 sec: 19508.1). Total num frames: 1247571968. Throughput: 0: 4842.4. Samples: 7355724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:47,852][07361] Avg episode reward: [(0, '37.221')] [2025-01-05 15:48:48,899][07482] Updated weights for policy 0, policy_version 304588 (0.0016) [2025-01-05 15:48:51,001][07482] Updated weights for policy 0, policy_version 304598 (0.0017) [2025-01-05 15:48:52,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19387.8, 300 sec: 19494.2). Total num frames: 1247666176. Throughput: 0: 4830.2. Samples: 7369914. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:52,852][07361] Avg episode reward: [(0, '37.190')] [2025-01-05 15:48:53,239][07482] Updated weights for policy 0, policy_version 304608 (0.0016) [2025-01-05 15:48:55,317][07482] Updated weights for policy 0, policy_version 304618 (0.0016) [2025-01-05 15:48:57,372][07482] Updated weights for policy 0, policy_version 304628 (0.0016) [2025-01-05 15:48:57,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19494.2). Total num frames: 1247764480. Throughput: 0: 4861.7. Samples: 7398848. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:48:57,852][07361] Avg episode reward: [(0, '37.153')] [2025-01-05 15:48:59,566][07482] Updated weights for policy 0, policy_version 304638 (0.0016) [2025-01-05 15:49:01,600][07482] Updated weights for policy 0, policy_version 304648 (0.0018) [2025-01-05 15:49:02,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1247858688. Throughput: 0: 4849.8. Samples: 7427996. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:49:02,852][07361] Avg episode reward: [(0, '38.395')] [2025-01-05 15:49:03,851][07482] Updated weights for policy 0, policy_version 304658 (0.0017) [2025-01-05 15:49:06,060][07482] Updated weights for policy 0, policy_version 304668 (0.0016) [2025-01-05 15:49:07,852][07361] Fps is (10 sec: 18841.6, 60 sec: 19387.8, 300 sec: 19452.5). Total num frames: 1247952896. Throughput: 0: 4841.4. Samples: 7442088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 15:49:07,852][07361] Avg episode reward: [(0, '38.452')] [2025-01-05 15:49:08,196][07482] Updated weights for policy 0, policy_version 304678 (0.0016) [2025-01-05 15:49:10,363][07482] Updated weights for policy 0, policy_version 304688 (0.0017) [2025-01-05 15:49:12,550][07482] Updated weights for policy 0, policy_version 304698 (0.0017) [2025-01-05 15:49:12,852][07361] Fps is (10 sec: 18841.6, 60 sec: 19319.5, 300 sec: 19438.6). 
Total num frames: 1248047104. Throughput: 0: 4805.6. Samples: 7470094. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:12,852][07361] Avg episode reward: [(0, '37.136')] [2025-01-05 15:49:14,729][07482] Updated weights for policy 0, policy_version 304708 (0.0018) [2025-01-05 15:49:16,901][07482] Updated weights for policy 0, policy_version 304718 (0.0017) [2025-01-05 15:49:17,851][07361] Fps is (10 sec: 18841.8, 60 sec: 19251.2, 300 sec: 19424.8). Total num frames: 1248141312. Throughput: 0: 4792.1. Samples: 7498172. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:17,852][07361] Avg episode reward: [(0, '41.292')] [2025-01-05 15:49:19,185][07482] Updated weights for policy 0, policy_version 304728 (0.0018) [2025-01-05 15:49:21,352][07482] Updated weights for policy 0, policy_version 304738 (0.0018) [2025-01-05 15:49:22,852][07361] Fps is (10 sec: 18432.0, 60 sec: 19114.7, 300 sec: 19383.1). Total num frames: 1248231424. Throughput: 0: 4774.5. Samples: 7511998. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:22,852][07361] Avg episode reward: [(0, '39.358')] [2025-01-05 15:49:23,573][07482] Updated weights for policy 0, policy_version 304748 (0.0017) [2025-01-05 15:49:25,676][07482] Updated weights for policy 0, policy_version 304758 (0.0020) [2025-01-05 15:49:27,781][07482] Updated weights for policy 0, policy_version 304768 (0.0019) [2025-01-05 15:49:27,852][07361] Fps is (10 sec: 18841.5, 60 sec: 19114.7, 300 sec: 19383.1). Total num frames: 1248329728. Throughput: 0: 4744.8. Samples: 7540460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:27,852][07361] Avg episode reward: [(0, '37.459')] [2025-01-05 15:49:29,965][07482] Updated weights for policy 0, policy_version 304778 (0.0018) [2025-01-05 15:49:32,076][07482] Updated weights for policy 0, policy_version 304788 (0.0017) [2025-01-05 15:49:32,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19114.7, 300 sec: 19369.2). Total num frames: 1248423936. 
Throughput: 0: 4745.1. Samples: 7569254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:32,852][07361] Avg episode reward: [(0, '35.279')] [2025-01-05 15:49:34,199][07482] Updated weights for policy 0, policy_version 304798 (0.0018) [2025-01-05 15:49:36,293][07482] Updated weights for policy 0, policy_version 304808 (0.0017) [2025-01-05 15:49:37,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19114.7, 300 sec: 19369.2). Total num frames: 1248522240. Throughput: 0: 4751.5. Samples: 7583734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:37,852][07361] Avg episode reward: [(0, '37.852')] [2025-01-05 15:49:38,488][07482] Updated weights for policy 0, policy_version 304818 (0.0017) [2025-01-05 15:49:40,508][07482] Updated weights for policy 0, policy_version 304828 (0.0016) [2025-01-05 15:49:42,609][07482] Updated weights for policy 0, policy_version 304838 (0.0017) [2025-01-05 15:49:42,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19046.4, 300 sec: 19355.3). Total num frames: 1248616448. Throughput: 0: 4752.4. Samples: 7612706. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:42,852][07361] Avg episode reward: [(0, '39.722')] [2025-01-05 15:49:44,830][07482] Updated weights for policy 0, policy_version 304848 (0.0018) [2025-01-05 15:49:46,833][07482] Updated weights for policy 0, policy_version 304858 (0.0017) [2025-01-05 15:49:47,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19046.4, 300 sec: 19355.3). Total num frames: 1248714752. Throughput: 0: 4756.2. Samples: 7642024. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:47,852][07361] Avg episode reward: [(0, '39.559')] [2025-01-05 15:49:48,902][07482] Updated weights for policy 0, policy_version 304868 (0.0017) [2025-01-05 15:49:51,022][07482] Updated weights for policy 0, policy_version 304878 (0.0017) [2025-01-05 15:49:52,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19114.7, 300 sec: 19355.3). Total num frames: 1248813056. Throughput: 0: 4772.4. 
Samples: 7656844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:52,852][07361] Avg episode reward: [(0, '38.810')] [2025-01-05 15:49:53,135][07482] Updated weights for policy 0, policy_version 304888 (0.0017) [2025-01-05 15:49:55,222][07482] Updated weights for policy 0, policy_version 304898 (0.0021) [2025-01-05 15:49:57,370][07482] Updated weights for policy 0, policy_version 304908 (0.0017) [2025-01-05 15:49:57,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19114.7, 300 sec: 19355.3). Total num frames: 1248911360. Throughput: 0: 4797.4. Samples: 7685976. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:49:57,852][07361] Avg episode reward: [(0, '40.251')] [2025-01-05 15:49:59,496][07482] Updated weights for policy 0, policy_version 304918 (0.0017) [2025-01-05 15:50:01,597][07482] Updated weights for policy 0, policy_version 304928 (0.0017) [2025-01-05 15:50:02,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19114.7, 300 sec: 19341.5). Total num frames: 1249005568. Throughput: 0: 4814.0. Samples: 7714802. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:50:02,852][07361] Avg episode reward: [(0, '42.399')] [2025-01-05 15:50:03,810][07482] Updated weights for policy 0, policy_version 304938 (0.0017) [2025-01-05 15:50:05,876][07482] Updated weights for policy 0, policy_version 304948 (0.0018) [2025-01-05 15:50:07,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19182.9, 300 sec: 19341.4). Total num frames: 1249103872. Throughput: 0: 4826.3. Samples: 7729182. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:50:07,852][07361] Avg episode reward: [(0, '41.480')] [2025-01-05 15:50:08,059][07482] Updated weights for policy 0, policy_version 304958 (0.0021) [2025-01-05 15:50:10,207][07482] Updated weights for policy 0, policy_version 304968 (0.0017) [2025-01-05 15:50:12,229][07482] Updated weights for policy 0, policy_version 304978 (0.0015) [2025-01-05 15:50:12,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19182.9, 300 sec: 19327.6). Total num frames: 1249198080. Throughput: 0: 4834.6. Samples: 7758018. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:50:12,852][07361] Avg episode reward: [(0, '42.244')] [2025-01-05 15:50:14,520][07482] Updated weights for policy 0, policy_version 304988 (0.0018) [2025-01-05 15:50:16,681][07482] Updated weights for policy 0, policy_version 304998 (0.0017) [2025-01-05 15:50:17,852][07361] Fps is (10 sec: 18841.0, 60 sec: 19182.8, 300 sec: 19327.5). Total num frames: 1249292288. Throughput: 0: 4823.8. Samples: 7786326. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:50:17,852][07361] Avg episode reward: [(0, '38.873')] [2025-01-05 15:50:18,798][07482] Updated weights for policy 0, policy_version 305008 (0.0019) [2025-01-05 15:50:20,854][07482] Updated weights for policy 0, policy_version 305018 (0.0015) [2025-01-05 15:50:22,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19327.6). Total num frames: 1249390592. Throughput: 0: 4822.1. Samples: 7800728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:22,852][07361] Avg episode reward: [(0, '38.840')] [2025-01-05 15:50:23,065][07482] Updated weights for policy 0, policy_version 305028 (0.0017) [2025-01-05 15:50:25,046][07482] Updated weights for policy 0, policy_version 305038 (0.0014) [2025-01-05 15:50:27,093][07482] Updated weights for policy 0, policy_version 305048 (0.0015) [2025-01-05 15:50:27,852][07361] Fps is (10 sec: 19661.4, 60 sec: 19319.4, 300 sec: 19327.6). 
Total num frames: 1249488896. Throughput: 0: 4835.9. Samples: 7830324. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:27,852][07361] Avg episode reward: [(0, '39.791')] [2025-01-05 15:50:29,280][07482] Updated weights for policy 0, policy_version 305058 (0.0016) [2025-01-05 15:50:31,303][07482] Updated weights for policy 0, policy_version 305068 (0.0015) [2025-01-05 15:50:32,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.8, 300 sec: 19341.5). Total num frames: 1249587200. Throughput: 0: 4834.2. Samples: 7859564. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:32,852][07361] Avg episode reward: [(0, '40.801')] [2025-01-05 15:50:33,451][07482] Updated weights for policy 0, policy_version 305078 (0.0017) [2025-01-05 15:50:35,515][07482] Updated weights for policy 0, policy_version 305088 (0.0016) [2025-01-05 15:50:37,508][07482] Updated weights for policy 0, policy_version 305098 (0.0016) [2025-01-05 15:50:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19355.3). Total num frames: 1249685504. Throughput: 0: 4833.3. Samples: 7874342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:37,852][07361] Avg episode reward: [(0, '40.476')] [2025-01-05 15:50:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000305099_1249685504.pth... [2025-01-05 15:50:37,918][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000303969_1245057024.pth [2025-01-05 15:50:39,620][07482] Updated weights for policy 0, policy_version 305108 (0.0015) [2025-01-05 15:50:41,711][07482] Updated weights for policy 0, policy_version 305118 (0.0016) [2025-01-05 15:50:42,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19341.5). Total num frames: 1249783808. Throughput: 0: 4847.5. Samples: 7904114. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:42,852][07361] Avg episode reward: [(0, '38.561')] [2025-01-05 15:50:43,785][07482] Updated weights for policy 0, policy_version 305128 (0.0017) [2025-01-05 15:50:45,829][07482] Updated weights for policy 0, policy_version 305138 (0.0016) [2025-01-05 15:50:47,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1249882112. Throughput: 0: 4861.4. Samples: 7933566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:47,852][07361] Avg episode reward: [(0, '36.921')] [2025-01-05 15:50:47,992][07482] Updated weights for policy 0, policy_version 305148 (0.0016) [2025-01-05 15:50:50,030][07482] Updated weights for policy 0, policy_version 305158 (0.0016) [2025-01-05 15:50:52,083][07482] Updated weights for policy 0, policy_version 305168 (0.0017) [2025-01-05 15:50:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19369.2). Total num frames: 1249980416. Throughput: 0: 4866.5. Samples: 7948176. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:52,852][07361] Avg episode reward: [(0, '37.331')] [2025-01-05 15:50:54,260][07482] Updated weights for policy 0, policy_version 305178 (0.0016) [2025-01-05 15:50:56,233][07482] Updated weights for policy 0, policy_version 305188 (0.0016) [2025-01-05 15:50:57,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19369.2). Total num frames: 1250078720. Throughput: 0: 4884.1. Samples: 7977804. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:50:57,852][07361] Avg episode reward: [(0, '40.416')] [2025-01-05 15:50:58,304][07482] Updated weights for policy 0, policy_version 305198 (0.0015) [2025-01-05 15:51:00,375][07482] Updated weights for policy 0, policy_version 305208 (0.0016) [2025-01-05 15:51:02,354][07482] Updated weights for policy 0, policy_version 305218 (0.0016) [2025-01-05 15:51:02,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19592.5, 300 sec: 19369.2). 
Total num frames: 1250181120. Throughput: 0: 4925.8. Samples: 8007986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:02,852][07361] Avg episode reward: [(0, '35.818')] [2025-01-05 15:51:04,404][07482] Updated weights for policy 0, policy_version 305228 (0.0016) [2025-01-05 15:51:06,475][07482] Updated weights for policy 0, policy_version 305238 (0.0016) [2025-01-05 15:51:07,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19592.5, 300 sec: 19369.2). Total num frames: 1250279424. Throughput: 0: 4942.5. Samples: 8023142. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:07,852][07361] Avg episode reward: [(0, '35.265')] [2025-01-05 15:51:08,541][07482] Updated weights for policy 0, policy_version 305248 (0.0016) [2025-01-05 15:51:10,582][07482] Updated weights for policy 0, policy_version 305258 (0.0015) [2025-01-05 15:51:12,635][07482] Updated weights for policy 0, policy_version 305268 (0.0016) [2025-01-05 15:51:12,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19383.1). Total num frames: 1250381824. Throughput: 0: 4946.8. Samples: 8052930. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:12,852][07361] Avg episode reward: [(0, '35.584')] [2025-01-05 15:51:14,696][07482] Updated weights for policy 0, policy_version 305278 (0.0016) [2025-01-05 15:51:16,725][07482] Updated weights for policy 0, policy_version 305288 (0.0016) [2025-01-05 15:51:17,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19797.5, 300 sec: 19383.1). Total num frames: 1250480128. Throughput: 0: 4958.9. Samples: 8082714. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:17,852][07361] Avg episode reward: [(0, '36.767')] [2025-01-05 15:51:18,891][07482] Updated weights for policy 0, policy_version 305298 (0.0017) [2025-01-05 15:51:20,877][07482] Updated weights for policy 0, policy_version 305308 (0.0015) [2025-01-05 15:51:22,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19383.1). Total num frames: 1250578432. 
Throughput: 0: 4960.2. Samples: 8097550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:22,852][07361] Avg episode reward: [(0, '36.166')] [2025-01-05 15:51:22,896][07482] Updated weights for policy 0, policy_version 305318 (0.0016) [2025-01-05 15:51:24,967][07482] Updated weights for policy 0, policy_version 305328 (0.0015) [2025-01-05 15:51:26,944][07482] Updated weights for policy 0, policy_version 305338 (0.0015) [2025-01-05 15:51:27,852][07361] Fps is (10 sec: 20069.1, 60 sec: 19865.4, 300 sec: 19396.9). Total num frames: 1250680832. Throughput: 0: 4973.9. Samples: 8127942. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:27,853][07361] Avg episode reward: [(0, '37.034')] [2025-01-05 15:51:28,981][07482] Updated weights for policy 0, policy_version 305348 (0.0015) [2025-01-05 15:51:31,060][07482] Updated weights for policy 0, policy_version 305358 (0.0016) [2025-01-05 15:51:32,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19865.6, 300 sec: 19397.0). Total num frames: 1250779136. Throughput: 0: 4985.2. Samples: 8157898. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:51:32,852][07361] Avg episode reward: [(0, '37.840')] [2025-01-05 15:51:33,145][07482] Updated weights for policy 0, policy_version 305368 (0.0016) [2025-01-05 15:51:35,164][07482] Updated weights for policy 0, policy_version 305378 (0.0015) [2025-01-05 15:51:37,259][07482] Updated weights for policy 0, policy_version 305388 (0.0015) [2025-01-05 15:51:37,851][07361] Fps is (10 sec: 19662.2, 60 sec: 19865.7, 300 sec: 19410.9). Total num frames: 1250877440. Throughput: 0: 4991.0. Samples: 8172770. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:51:37,852][07361] Avg episode reward: [(0, '39.626')] [2025-01-05 15:51:39,369][07482] Updated weights for policy 0, policy_version 305398 (0.0018) [2025-01-05 15:51:41,372][07482] Updated weights for policy 0, policy_version 305408 (0.0016) [2025-01-05 15:51:42,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19933.9, 300 sec: 19424.8). Total num frames: 1250979840. Throughput: 0: 4992.1. Samples: 8202448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:51:42,852][07361] Avg episode reward: [(0, '38.389')] [2025-01-05 15:51:43,456][07482] Updated weights for policy 0, policy_version 305418 (0.0017) [2025-01-05 15:51:45,467][07482] Updated weights for policy 0, policy_version 305428 (0.0015) [2025-01-05 15:51:47,483][07482] Updated weights for policy 0, policy_version 305438 (0.0016) [2025-01-05 15:51:47,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19933.9, 300 sec: 19424.8). Total num frames: 1251078144. Throughput: 0: 4994.8. Samples: 8232752. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:51:47,852][07361] Avg episode reward: [(0, '39.538')] [2025-01-05 15:51:49,556][07482] Updated weights for policy 0, policy_version 305448 (0.0015) [2025-01-05 15:51:51,584][07482] Updated weights for policy 0, policy_version 305458 (0.0016) [2025-01-05 15:51:52,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19933.9, 300 sec: 19438.7). Total num frames: 1251176448. Throughput: 0: 4991.5. Samples: 8247758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:51:52,852][07361] Avg episode reward: [(0, '39.521')] [2025-01-05 15:51:53,684][07482] Updated weights for policy 0, policy_version 305468 (0.0016) [2025-01-05 15:51:55,756][07482] Updated weights for policy 0, policy_version 305478 (0.0016) [2025-01-05 15:51:57,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19933.9, 300 sec: 19452.5). Total num frames: 1251274752. Throughput: 0: 4986.2. Samples: 8277310. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:51:57,852][07361] Avg episode reward: [(0, '39.569')] [2025-01-05 15:51:57,904][07482] Updated weights for policy 0, policy_version 305488 (0.0017) [2025-01-05 15:51:59,948][07482] Updated weights for policy 0, policy_version 305498 (0.0016) [2025-01-05 15:52:02,100][07482] Updated weights for policy 0, policy_version 305508 (0.0017) [2025-01-05 15:52:02,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19865.6, 300 sec: 19452.5). Total num frames: 1251373056. Throughput: 0: 4973.5. Samples: 8306522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:02,852][07361] Avg episode reward: [(0, '38.881')] [2025-01-05 15:52:04,274][07482] Updated weights for policy 0, policy_version 305518 (0.0017) [2025-01-05 15:52:06,258][07482] Updated weights for policy 0, policy_version 305528 (0.0016) [2025-01-05 15:52:07,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19865.6, 300 sec: 19452.5). Total num frames: 1251471360. Throughput: 0: 4964.4. Samples: 8320950. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:07,852][07361] Avg episode reward: [(0, '37.685')] [2025-01-05 15:52:08,350][07482] Updated weights for policy 0, policy_version 305538 (0.0016) [2025-01-05 15:52:10,407][07482] Updated weights for policy 0, policy_version 305548 (0.0016) [2025-01-05 15:52:12,396][07482] Updated weights for policy 0, policy_version 305558 (0.0016) [2025-01-05 15:52:12,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19480.3). Total num frames: 1251569664. Throughput: 0: 4955.4. Samples: 8350930. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:12,852][07361] Avg episode reward: [(0, '39.168')] [2025-01-05 15:52:14,491][07482] Updated weights for policy 0, policy_version 305568 (0.0015) [2025-01-05 15:52:16,551][07482] Updated weights for policy 0, policy_version 305578 (0.0016) [2025-01-05 15:52:17,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19480.3). 
Total num frames: 1251667968. Throughput: 0: 4951.5. Samples: 8380716. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:17,852][07361] Avg episode reward: [(0, '40.343')] [2025-01-05 15:52:18,661][07482] Updated weights for policy 0, policy_version 305588 (0.0017) [2025-01-05 15:52:20,769][07482] Updated weights for policy 0, policy_version 305598 (0.0016) [2025-01-05 15:52:22,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19480.3). Total num frames: 1251766272. Throughput: 0: 4945.3. Samples: 8395310. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:22,852][07361] Avg episode reward: [(0, '40.993')] [2025-01-05 15:52:22,923][07482] Updated weights for policy 0, policy_version 305608 (0.0016) [2025-01-05 15:52:24,996][07482] Updated weights for policy 0, policy_version 305618 (0.0016) [2025-01-05 15:52:27,093][07482] Updated weights for policy 0, policy_version 305628 (0.0015) [2025-01-05 15:52:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.3, 300 sec: 19494.2). Total num frames: 1251864576. Throughput: 0: 4930.7. Samples: 8424332. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:27,852][07361] Avg episode reward: [(0, '39.414')] [2025-01-05 15:52:29,245][07482] Updated weights for policy 0, policy_version 305638 (0.0016) [2025-01-05 15:52:31,246][07482] Updated weights for policy 0, policy_version 305648 (0.0017) [2025-01-05 15:52:32,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19494.2). Total num frames: 1251962880. Throughput: 0: 4914.9. Samples: 8453920. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:32,852][07361] Avg episode reward: [(0, '38.268')] [2025-01-05 15:52:33,342][07482] Updated weights for policy 0, policy_version 305658 (0.0016) [2025-01-05 15:52:35,389][07482] Updated weights for policy 0, policy_version 305668 (0.0016) [2025-01-05 15:52:37,379][07482] Updated weights for policy 0, policy_version 305678 (0.0016) [2025-01-05 15:52:37,852][07361] Fps is (10 sec: 20070.6, 60 sec: 19797.3, 300 sec: 19508.1). Total num frames: 1252065280. Throughput: 0: 4914.4. Samples: 8468908. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:37,852][07361] Avg episode reward: [(0, '41.604')] [2025-01-05 15:52:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000305680_1252065280.pth... [2025-01-05 15:52:37,913][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000304536_1247379456.pth [2025-01-05 15:52:39,505][07482] Updated weights for policy 0, policy_version 305688 (0.0016) [2025-01-05 15:52:41,565][07482] Updated weights for policy 0, policy_version 305698 (0.0017) [2025-01-05 15:52:42,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19729.0, 300 sec: 19508.1). Total num frames: 1252163584. Throughput: 0: 4923.4. Samples: 8498864. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:42,852][07361] Avg episode reward: [(0, '42.104')] [2025-01-05 15:52:43,614][07482] Updated weights for policy 0, policy_version 305708 (0.0016) [2025-01-05 15:52:45,702][07482] Updated weights for policy 0, policy_version 305718 (0.0016) [2025-01-05 15:52:47,829][07482] Updated weights for policy 0, policy_version 305728 (0.0017) [2025-01-05 15:52:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1252261888. Throughput: 0: 4926.2. Samples: 8528200. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:52:47,852][07361] Avg episode reward: [(0, '39.332')] [2025-01-05 15:52:49,855][07482] Updated weights for policy 0, policy_version 305738 (0.0017) [2025-01-05 15:52:51,965][07482] Updated weights for policy 0, policy_version 305748 (0.0016) [2025-01-05 15:52:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.0, 300 sec: 19522.0). Total num frames: 1252360192. Throughput: 0: 4933.4. Samples: 8542952. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:52:52,852][07361] Avg episode reward: [(0, '36.629')] [2025-01-05 15:52:54,122][07482] Updated weights for policy 0, policy_version 305758 (0.0017) [2025-01-05 15:52:56,089][07482] Updated weights for policy 0, policy_version 305768 (0.0017) [2025-01-05 15:52:57,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1252458496. Throughput: 0: 4922.8. Samples: 8572458. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:52:57,852][07361] Avg episode reward: [(0, '38.690')] [2025-01-05 15:52:58,195][07482] Updated weights for policy 0, policy_version 305778 (0.0017) [2025-01-05 15:53:00,241][07482] Updated weights for policy 0, policy_version 305788 (0.0016) [2025-01-05 15:53:02,218][07482] Updated weights for policy 0, policy_version 305798 (0.0016) [2025-01-05 15:53:02,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19797.4, 300 sec: 19563.6). Total num frames: 1252560896. Throughput: 0: 4929.6. Samples: 8602548. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:02,852][07361] Avg episode reward: [(0, '40.492')] [2025-01-05 15:53:04,313][07482] Updated weights for policy 0, policy_version 305808 (0.0016) [2025-01-05 15:53:06,372][07482] Updated weights for policy 0, policy_version 305818 (0.0016) [2025-01-05 15:53:07,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19797.3, 300 sec: 19563.6). Total num frames: 1252659200. Throughput: 0: 4941.2. Samples: 8617666. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:07,852][07361] Avg episode reward: [(0, '40.191')] [2025-01-05 15:53:08,434][07482] Updated weights for policy 0, policy_version 305828 (0.0017) [2025-01-05 15:53:10,539][07482] Updated weights for policy 0, policy_version 305838 (0.0016) [2025-01-05 15:53:12,594][07482] Updated weights for policy 0, policy_version 305848 (0.0016) [2025-01-05 15:53:12,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19563.6). Total num frames: 1252757504. Throughput: 0: 4954.9. Samples: 8647304. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:12,852][07361] Avg episode reward: [(0, '40.660')] [2025-01-05 15:53:14,646][07482] Updated weights for policy 0, policy_version 305858 (0.0017) [2025-01-05 15:53:16,708][07482] Updated weights for policy 0, policy_version 305868 (0.0016) [2025-01-05 15:53:17,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19797.3, 300 sec: 19563.6). Total num frames: 1252855808. Throughput: 0: 4955.5. Samples: 8676918. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:17,852][07361] Avg episode reward: [(0, '36.683')] [2025-01-05 15:53:18,855][07482] Updated weights for policy 0, policy_version 305878 (0.0017) [2025-01-05 15:53:20,837][07482] Updated weights for policy 0, policy_version 305888 (0.0016) [2025-01-05 15:53:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19563.6). Total num frames: 1252954112. Throughput: 0: 4950.8. Samples: 8691692. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:22,852][07361] Avg episode reward: [(0, '37.724')] [2025-01-05 15:53:22,941][07482] Updated weights for policy 0, policy_version 305898 (0.0016) [2025-01-05 15:53:25,020][07482] Updated weights for policy 0, policy_version 305908 (0.0017) [2025-01-05 15:53:27,015][07482] Updated weights for policy 0, policy_version 305918 (0.0017) [2025-01-05 15:53:27,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19797.3, 300 sec: 19577.5). 
Total num frames: 1253052416. Throughput: 0: 4949.3. Samples: 8721584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:27,852][07361] Avg episode reward: [(0, '40.575')] [2025-01-05 15:53:29,128][07482] Updated weights for policy 0, policy_version 305928 (0.0016) [2025-01-05 15:53:31,169][07482] Updated weights for policy 0, policy_version 305938 (0.0017) [2025-01-05 15:53:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19577.5). Total num frames: 1253150720. Throughput: 0: 4958.1. Samples: 8751314. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:32,852][07361] Avg episode reward: [(0, '39.387')] [2025-01-05 15:53:33,226][07482] Updated weights for policy 0, policy_version 305948 (0.0017) [2025-01-05 15:53:35,326][07482] Updated weights for policy 0, policy_version 305958 (0.0015) [2025-01-05 15:53:37,397][07482] Updated weights for policy 0, policy_version 305968 (0.0016) [2025-01-05 15:53:37,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19591.4). Total num frames: 1253253120. Throughput: 0: 4961.7. Samples: 8766230. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:37,852][07361] Avg episode reward: [(0, '41.572')] [2025-01-05 15:53:39,424][07482] Updated weights for policy 0, policy_version 305978 (0.0017) [2025-01-05 15:53:41,541][07482] Updated weights for policy 0, policy_version 305988 (0.0017) [2025-01-05 15:53:42,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19797.3, 300 sec: 19591.4). Total num frames: 1253351424. Throughput: 0: 4963.8. Samples: 8795830. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:42,852][07361] Avg episode reward: [(0, '41.162')] [2025-01-05 15:53:43,653][07482] Updated weights for policy 0, policy_version 305998 (0.0017) [2025-01-05 15:53:45,655][07482] Updated weights for policy 0, policy_version 306008 (0.0016) [2025-01-05 15:53:47,762][07482] Updated weights for policy 0, policy_version 306018 (0.0016) [2025-01-05 15:53:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19797.3, 300 sec: 19605.3). Total num frames: 1253449728. Throughput: 0: 4953.4. Samples: 8825452. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:47,852][07361] Avg episode reward: [(0, '38.923')] [2025-01-05 15:53:49,930][07482] Updated weights for policy 0, policy_version 306028 (0.0018) [2025-01-05 15:53:51,978][07482] Updated weights for policy 0, policy_version 306038 (0.0017) [2025-01-05 15:53:52,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19729.0, 300 sec: 19591.4). Total num frames: 1253543936. Throughput: 0: 4937.0. Samples: 8839830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:52,852][07361] Avg episode reward: [(0, '40.406')] [2025-01-05 15:53:54,338][07482] Updated weights for policy 0, policy_version 306048 (0.0019) [2025-01-05 15:53:56,668][07482] Updated weights for policy 0, policy_version 306058 (0.0020) [2025-01-05 15:53:57,852][07361] Fps is (10 sec: 18431.9, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1253634048. Throughput: 0: 4886.8. Samples: 8867210. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:53:57,852][07361] Avg episode reward: [(0, '41.832')] [2025-01-05 15:53:58,842][07482] Updated weights for policy 0, policy_version 306068 (0.0018) [2025-01-05 15:54:01,389][07482] Updated weights for policy 0, policy_version 306078 (0.0019) [2025-01-05 15:54:02,852][07361] Fps is (10 sec: 17203.3, 60 sec: 19251.2, 300 sec: 19535.8). Total num frames: 1253715968. Throughput: 0: 4787.9. Samples: 8892372. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:54:02,852][07361] Avg episode reward: [(0, '40.202')] [2025-01-05 15:54:04,209][07482] Updated weights for policy 0, policy_version 306088 (0.0020) [2025-01-05 15:54:06,531][07482] Updated weights for policy 0, policy_version 306098 (0.0018) [2025-01-05 15:54:07,855][07361] Fps is (10 sec: 16378.6, 60 sec: 18977.1, 300 sec: 19494.0). Total num frames: 1253797888. Throughput: 0: 4722.2. Samples: 8904208. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:07,857][07361] Avg episode reward: [(0, '43.973')] [2025-01-05 15:54:08,890][07482] Updated weights for policy 0, policy_version 306108 (0.0019) [2025-01-05 15:54:11,414][07482] Updated weights for policy 0, policy_version 306118 (0.0020) [2025-01-05 15:54:12,852][07361] Fps is (10 sec: 16793.7, 60 sec: 18773.3, 300 sec: 19466.4). Total num frames: 1253883904. Throughput: 0: 4623.7. Samples: 8929648. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:12,852][07361] Avg episode reward: [(0, '41.748')] [2025-01-05 15:54:13,679][07482] Updated weights for policy 0, policy_version 306128 (0.0017) [2025-01-05 15:54:15,963][07482] Updated weights for policy 0, policy_version 306138 (0.0021) [2025-01-05 15:54:17,852][07361] Fps is (10 sec: 17618.6, 60 sec: 18636.8, 300 sec: 19466.4). Total num frames: 1253974016. Throughput: 0: 4563.2. Samples: 8956660. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:17,852][07361] Avg episode reward: [(0, '37.824')] [2025-01-05 15:54:18,322][07482] Updated weights for policy 0, policy_version 306148 (0.0018) [2025-01-05 15:54:20,400][07482] Updated weights for policy 0, policy_version 306158 (0.0017) [2025-01-05 15:54:22,480][07482] Updated weights for policy 0, policy_version 306168 (0.0016) [2025-01-05 15:54:22,851][07361] Fps is (10 sec: 18432.2, 60 sec: 18568.6, 300 sec: 19452.5). Total num frames: 1254068224. Throughput: 0: 4537.7. Samples: 8970424. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:22,852][07361] Avg episode reward: [(0, '41.133')] [2025-01-05 15:54:24,669][07482] Updated weights for policy 0, policy_version 306178 (0.0017) [2025-01-05 15:54:26,859][07482] Updated weights for policy 0, policy_version 306188 (0.0017) [2025-01-05 15:54:27,852][07361] Fps is (10 sec: 18841.6, 60 sec: 18500.3, 300 sec: 19452.5). Total num frames: 1254162432. Throughput: 0: 4514.5. Samples: 8998982. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:27,852][07361] Avg episode reward: [(0, '40.764')] [2025-01-05 15:54:29,097][07482] Updated weights for policy 0, policy_version 306198 (0.0017) [2025-01-05 15:54:31,171][07482] Updated weights for policy 0, policy_version 306208 (0.0016) [2025-01-05 15:54:32,852][07361] Fps is (10 sec: 18841.4, 60 sec: 18432.0, 300 sec: 19438.6). Total num frames: 1254256640. Throughput: 0: 4494.6. Samples: 9027708. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:32,852][07361] Avg episode reward: [(0, '41.676')] [2025-01-05 15:54:33,261][07482] Updated weights for policy 0, policy_version 306218 (0.0017) [2025-01-05 15:54:35,351][07482] Updated weights for policy 0, policy_version 306228 (0.0016) [2025-01-05 15:54:37,428][07482] Updated weights for policy 0, policy_version 306238 (0.0015) [2025-01-05 15:54:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 18432.0, 300 sec: 19466.4). Total num frames: 1254359040. Throughput: 0: 4502.9. Samples: 9042460. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:37,852][07361] Avg episode reward: [(0, '41.878')] [2025-01-05 15:54:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000306240_1254359040.pth... 
[2025-01-05 15:54:37,914][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000305099_1249685504.pth [2025-01-05 15:54:39,577][07482] Updated weights for policy 0, policy_version 306248 (0.0017) [2025-01-05 15:54:41,661][07482] Updated weights for policy 0, policy_version 306258 (0.0016) [2025-01-05 15:54:42,852][07361] Fps is (10 sec: 19660.8, 60 sec: 18363.7, 300 sec: 19452.5). Total num frames: 1254453248. Throughput: 0: 4542.9. Samples: 9071640. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:42,852][07361] Avg episode reward: [(0, '41.906')] [2025-01-05 15:54:43,816][07482] Updated weights for policy 0, policy_version 306268 (0.0017) [2025-01-05 15:54:45,825][07482] Updated weights for policy 0, policy_version 306278 (0.0016) [2025-01-05 15:54:47,852][07361] Fps is (10 sec: 19251.3, 60 sec: 18363.7, 300 sec: 19452.5). Total num frames: 1254551552. Throughput: 0: 4633.6. Samples: 9100886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:47,852][07361] Avg episode reward: [(0, '40.171')] [2025-01-05 15:54:47,966][07482] Updated weights for policy 0, policy_version 306288 (0.0016) [2025-01-05 15:54:50,124][07482] Updated weights for policy 0, policy_version 306298 (0.0017) [2025-01-05 15:54:52,135][07482] Updated weights for policy 0, policy_version 306308 (0.0017) [2025-01-05 15:54:52,852][07361] Fps is (10 sec: 19660.9, 60 sec: 18432.0, 300 sec: 19452.5). Total num frames: 1254649856. Throughput: 0: 4692.2. Samples: 9115342. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:52,852][07361] Avg episode reward: [(0, '38.806')] [2025-01-05 15:54:54,226][07482] Updated weights for policy 0, policy_version 306318 (0.0016) [2025-01-05 15:54:56,350][07482] Updated weights for policy 0, policy_version 306328 (0.0016) [2025-01-05 15:54:57,851][07361] Fps is (10 sec: 19661.0, 60 sec: 18568.6, 300 sec: 19466.4). Total num frames: 1254748160. 
Throughput: 0: 4785.9. Samples: 9145012. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:54:57,852][07361] Avg episode reward: [(0, '41.425')] [2025-01-05 15:54:58,485][07482] Updated weights for policy 0, policy_version 306338 (0.0017) [2025-01-05 15:55:00,595][07482] Updated weights for policy 0, policy_version 306348 (0.0017) [2025-01-05 15:55:02,661][07482] Updated weights for policy 0, policy_version 306358 (0.0017) [2025-01-05 15:55:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 18841.6, 300 sec: 19466.4). Total num frames: 1254846464. Throughput: 0: 4831.3. Samples: 9174066. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:55:02,852][07361] Avg episode reward: [(0, '41.498')] [2025-01-05 15:55:04,795][07482] Updated weights for policy 0, policy_version 306368 (0.0017) [2025-01-05 15:55:06,903][07482] Updated weights for policy 0, policy_version 306378 (0.0016) [2025-01-05 15:55:07,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19047.5, 300 sec: 19466.4). Total num frames: 1254940672. Throughput: 0: 4845.9. Samples: 9188488. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:55:07,852][07361] Avg episode reward: [(0, '40.244')] [2025-01-05 15:55:09,055][07482] Updated weights for policy 0, policy_version 306388 (0.0017) [2025-01-05 15:55:11,095][07482] Updated weights for policy 0, policy_version 306398 (0.0016) [2025-01-05 15:55:12,851][07361] Fps is (10 sec: 18432.0, 60 sec: 19114.7, 300 sec: 19452.6). Total num frames: 1255030784. Throughput: 0: 4857.1. Samples: 9217552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:55:12,852][07361] Avg episode reward: [(0, '39.045')] [2025-01-05 15:55:13,623][07482] Updated weights for policy 0, policy_version 306408 (0.0019) [2025-01-05 15:55:15,922][07482] Updated weights for policy 0, policy_version 306418 (0.0017) [2025-01-05 15:55:17,852][07361] Fps is (10 sec: 18022.1, 60 sec: 19114.7, 300 sec: 19424.7). Total num frames: 1255120896. Throughput: 0: 4795.6. 
Samples: 9243510. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:55:17,852][07361] Avg episode reward: [(0, '37.352')] [2025-01-05 15:55:18,261][07482] Updated weights for policy 0, policy_version 306428 (0.0019) [2025-01-05 15:55:20,452][07482] Updated weights for policy 0, policy_version 306438 (0.0017) [2025-01-05 15:55:22,533][07482] Updated weights for policy 0, policy_version 306448 (0.0017) [2025-01-05 15:55:22,852][07361] Fps is (10 sec: 18431.8, 60 sec: 19114.6, 300 sec: 19410.9). Total num frames: 1255215104. Throughput: 0: 4767.5. Samples: 9256996. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:22,852][07361] Avg episode reward: [(0, '39.447')] [2025-01-05 15:55:24,865][07482] Updated weights for policy 0, policy_version 306458 (0.0019) [2025-01-05 15:55:27,115][07482] Updated weights for policy 0, policy_version 306468 (0.0019) [2025-01-05 15:55:27,852][07361] Fps is (10 sec: 18432.2, 60 sec: 19046.4, 300 sec: 19383.1). Total num frames: 1255305216. Throughput: 0: 4732.4. Samples: 9284598. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:27,852][07361] Avg episode reward: [(0, '39.741')] [2025-01-05 15:55:29,285][07482] Updated weights for policy 0, policy_version 306478 (0.0018) [2025-01-05 15:55:31,386][07482] Updated weights for policy 0, policy_version 306488 (0.0016) [2025-01-05 15:55:32,852][07361] Fps is (10 sec: 18432.0, 60 sec: 19046.4, 300 sec: 19369.2). Total num frames: 1255399424. Throughput: 0: 4718.0. Samples: 9313198. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:32,852][07361] Avg episode reward: [(0, '40.453')] [2025-01-05 15:55:33,591][07482] Updated weights for policy 0, policy_version 306498 (0.0017) [2025-01-05 15:55:35,592][07482] Updated weights for policy 0, policy_version 306508 (0.0015) [2025-01-05 15:55:37,670][07482] Updated weights for policy 0, policy_version 306518 (0.0016) [2025-01-05 15:55:37,852][07361] Fps is (10 sec: 19251.2, 60 sec: 18978.2, 300 sec: 19369.2). Total num frames: 1255497728. Throughput: 0: 4720.0. Samples: 9327744. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:37,852][07361] Avg episode reward: [(0, '42.298')] [2025-01-05 15:55:39,854][07482] Updated weights for policy 0, policy_version 306528 (0.0016) [2025-01-05 15:55:41,865][07482] Updated weights for policy 0, policy_version 306538 (0.0017) [2025-01-05 15:55:42,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19046.4, 300 sec: 19369.2). Total num frames: 1255596032. Throughput: 0: 4714.9. Samples: 9357184. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:42,852][07361] Avg episode reward: [(0, '41.116')] [2025-01-05 15:55:43,980][07482] Updated weights for policy 0, policy_version 306548 (0.0015) [2025-01-05 15:55:46,063][07482] Updated weights for policy 0, policy_version 306558 (0.0016) [2025-01-05 15:55:47,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19046.4, 300 sec: 19369.2). Total num frames: 1255694336. Throughput: 0: 4721.6. Samples: 9386538. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:47,852][07361] Avg episode reward: [(0, '37.086')] [2025-01-05 15:55:48,167][07482] Updated weights for policy 0, policy_version 306568 (0.0016) [2025-01-05 15:55:50,263][07482] Updated weights for policy 0, policy_version 306578 (0.0017) [2025-01-05 15:55:52,357][07482] Updated weights for policy 0, policy_version 306588 (0.0016) [2025-01-05 15:55:52,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19046.4, 300 sec: 19369.2). 
Total num frames: 1255792640. Throughput: 0: 4727.5. Samples: 9401224. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:52,852][07361] Avg episode reward: [(0, '37.834')] [2025-01-05 15:55:54,465][07482] Updated weights for policy 0, policy_version 306598 (0.0018) [2025-01-05 15:55:56,638][07482] Updated weights for policy 0, policy_version 306608 (0.0017) [2025-01-05 15:55:57,852][07361] Fps is (10 sec: 19251.1, 60 sec: 18978.1, 300 sec: 19341.4). Total num frames: 1255886848. Throughput: 0: 4726.4. Samples: 9430242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:55:57,852][07361] Avg episode reward: [(0, '39.240')] [2025-01-05 15:55:58,780][07482] Updated weights for policy 0, policy_version 306618 (0.0017) [2025-01-05 15:56:00,808][07482] Updated weights for policy 0, policy_version 306628 (0.0018) [2025-01-05 15:56:02,852][07361] Fps is (10 sec: 19251.1, 60 sec: 18978.1, 300 sec: 19341.5). Total num frames: 1255985152. Throughput: 0: 4794.3. Samples: 9459252. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:02,852][07361] Avg episode reward: [(0, '41.210')] [2025-01-05 15:56:02,974][07482] Updated weights for policy 0, policy_version 306638 (0.0017) [2025-01-05 15:56:05,097][07482] Updated weights for policy 0, policy_version 306648 (0.0017) [2025-01-05 15:56:07,127][07482] Updated weights for policy 0, policy_version 306658 (0.0016) [2025-01-05 15:56:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19046.4, 300 sec: 19327.6). Total num frames: 1256083456. Throughput: 0: 4816.9. Samples: 9473758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:07,852][07361] Avg episode reward: [(0, '40.137')] [2025-01-05 15:56:09,353][07482] Updated weights for policy 0, policy_version 306668 (0.0017) [2025-01-05 15:56:11,467][07482] Updated weights for policy 0, policy_version 306678 (0.0018) [2025-01-05 15:56:12,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19114.7, 300 sec: 19313.7). Total num frames: 1256177664. 
Throughput: 0: 4847.4. Samples: 9502732. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:12,852][07361] Avg episode reward: [(0, '41.222')] [2025-01-05 15:56:13,672][07482] Updated weights for policy 0, policy_version 306688 (0.0018) [2025-01-05 15:56:15,959][07482] Updated weights for policy 0, policy_version 306698 (0.0021) [2025-01-05 15:56:17,852][07361] Fps is (10 sec: 18431.9, 60 sec: 19114.7, 300 sec: 19285.9). Total num frames: 1256267776. Throughput: 0: 4823.2. Samples: 9530244. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:17,852][07361] Avg episode reward: [(0, '43.839')] [2025-01-05 15:56:18,145][07482] Updated weights for policy 0, policy_version 306708 (0.0017) [2025-01-05 15:56:20,255][07482] Updated weights for policy 0, policy_version 306718 (0.0017) [2025-01-05 15:56:22,295][07482] Updated weights for policy 0, policy_version 306728 (0.0015) [2025-01-05 15:56:22,851][07361] Fps is (10 sec: 18841.5, 60 sec: 19183.0, 300 sec: 19272.1). Total num frames: 1256366080. Throughput: 0: 4818.7. Samples: 9544584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:22,852][07361] Avg episode reward: [(0, '45.038')] [2025-01-05 15:56:22,957][07448] Saving new best policy, reward=45.038! [2025-01-05 15:56:24,540][07482] Updated weights for policy 0, policy_version 306738 (0.0016) [2025-01-05 15:56:26,573][07482] Updated weights for policy 0, policy_version 306748 (0.0017) [2025-01-05 15:56:27,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19258.1). Total num frames: 1256460288. Throughput: 0: 4811.2. Samples: 9573690. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 15:56:27,852][07361] Avg episode reward: [(0, '44.463')] [2025-01-05 15:56:28,741][07482] Updated weights for policy 0, policy_version 306758 (0.0017) [2025-01-05 15:56:30,861][07482] Updated weights for policy 0, policy_version 306768 (0.0017) [2025-01-05 15:56:32,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19258.1). Total num frames: 1256558592. Throughput: 0: 4796.1. Samples: 9602360. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:32,852][07361] Avg episode reward: [(0, '40.825')] [2025-01-05 15:56:33,009][07482] Updated weights for policy 0, policy_version 306778 (0.0016) [2025-01-05 15:56:35,073][07482] Updated weights for policy 0, policy_version 306788 (0.0016) [2025-01-05 15:56:37,210][07482] Updated weights for policy 0, policy_version 306798 (0.0016) [2025-01-05 15:56:37,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19251.2, 300 sec: 19230.4). Total num frames: 1256652800. Throughput: 0: 4799.7. Samples: 9617212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:37,852][07361] Avg episode reward: [(0, '42.999')] [2025-01-05 15:56:37,877][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000306801_1256656896.pth... [2025-01-05 15:56:37,936][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000305680_1252065280.pth [2025-01-05 15:56:39,362][07482] Updated weights for policy 0, policy_version 306808 (0.0017) [2025-01-05 15:56:41,428][07482] Updated weights for policy 0, policy_version 306818 (0.0017) [2025-01-05 15:56:42,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19251.2, 300 sec: 19230.4). Total num frames: 1256751104. Throughput: 0: 4796.5. Samples: 9646082. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:42,852][07361] Avg episode reward: [(0, '45.601')] [2025-01-05 15:56:42,853][07448] Saving new best policy, reward=45.601! [2025-01-05 15:56:43,682][07482] Updated weights for policy 0, policy_version 306828 (0.0017) [2025-01-05 15:56:45,706][07482] Updated weights for policy 0, policy_version 306838 (0.0016) [2025-01-05 15:56:47,784][07482] Updated weights for policy 0, policy_version 306848 (0.0016) [2025-01-05 15:56:47,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19251.2, 300 sec: 19230.4). Total num frames: 1256849408. Throughput: 0: 4795.6. Samples: 9675054. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:47,852][07361] Avg episode reward: [(0, '41.138')] [2025-01-05 15:56:49,983][07482] Updated weights for policy 0, policy_version 306858 (0.0017) [2025-01-05 15:56:51,998][07482] Updated weights for policy 0, policy_version 306868 (0.0016) [2025-01-05 15:56:52,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19183.0, 300 sec: 19216.5). Total num frames: 1256943616. Throughput: 0: 4793.7. Samples: 9689472. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:52,852][07361] Avg episode reward: [(0, '39.748')] [2025-01-05 15:56:54,084][07482] Updated weights for policy 0, policy_version 306878 (0.0016) [2025-01-05 15:56:56,171][07482] Updated weights for policy 0, policy_version 306888 (0.0015) [2025-01-05 15:56:57,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19251.2, 300 sec: 19216.5). Total num frames: 1257041920. Throughput: 0: 4811.2. Samples: 9719236. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:56:57,852][07361] Avg episode reward: [(0, '42.683')] [2025-01-05 15:56:58,277][07482] Updated weights for policy 0, policy_version 306898 (0.0017) [2025-01-05 15:57:00,369][07482] Updated weights for policy 0, policy_version 306908 (0.0016) [2025-01-05 15:57:02,458][07482] Updated weights for policy 0, policy_version 306918 (0.0016) [2025-01-05 15:57:02,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19251.2, 300 sec: 19216.5). Total num frames: 1257140224. Throughput: 0: 4850.9. Samples: 9748534. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:02,852][07361] Avg episode reward: [(0, '42.298')] [2025-01-05 15:57:04,549][07482] Updated weights for policy 0, policy_version 306928 (0.0017) [2025-01-05 15:57:06,627][07482] Updated weights for policy 0, policy_version 306938 (0.0016) [2025-01-05 15:57:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19251.2, 300 sec: 19216.5). Total num frames: 1257238528. Throughput: 0: 4856.2. Samples: 9763112. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:07,852][07361] Avg episode reward: [(0, '42.393')] [2025-01-05 15:57:08,842][07482] Updated weights for policy 0, policy_version 306948 (0.0017) [2025-01-05 15:57:10,853][07482] Updated weights for policy 0, policy_version 306958 (0.0016) [2025-01-05 15:57:12,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19319.5, 300 sec: 19216.5). Total num frames: 1257336832. Throughput: 0: 4859.0. Samples: 9792346. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:12,852][07361] Avg episode reward: [(0, '41.287')] [2025-01-05 15:57:12,957][07482] Updated weights for policy 0, policy_version 306968 (0.0016) [2025-01-05 15:57:15,093][07482] Updated weights for policy 0, policy_version 306978 (0.0019) [2025-01-05 15:57:17,100][07482] Updated weights for policy 0, policy_version 306988 (0.0016) [2025-01-05 15:57:17,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19456.1, 300 sec: 19216.5). 
Total num frames: 1257435136. Throughput: 0: 4876.6. Samples: 9821806. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:17,852][07361] Avg episode reward: [(0, '43.243')] [2025-01-05 15:57:19,202][07482] Updated weights for policy 0, policy_version 306998 (0.0017) [2025-01-05 15:57:21,294][07482] Updated weights for policy 0, policy_version 307008 (0.0016) [2025-01-05 15:57:22,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19456.0, 300 sec: 19216.5). Total num frames: 1257533440. Throughput: 0: 4875.0. Samples: 9836588. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:22,852][07361] Avg episode reward: [(0, '40.733')] [2025-01-05 15:57:23,472][07482] Updated weights for policy 0, policy_version 307018 (0.0018) [2025-01-05 15:57:25,555][07482] Updated weights for policy 0, policy_version 307028 (0.0016) [2025-01-05 15:57:27,629][07482] Updated weights for policy 0, policy_version 307038 (0.0016) [2025-01-05 15:57:27,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19216.5). Total num frames: 1257631744. Throughput: 0: 4878.5. Samples: 9865616. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:27,852][07361] Avg episode reward: [(0, '42.899')] [2025-01-05 15:57:29,753][07482] Updated weights for policy 0, policy_version 307048 (0.0017) [2025-01-05 15:57:31,838][07482] Updated weights for policy 0, policy_version 307058 (0.0016) [2025-01-05 15:57:32,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19456.0, 300 sec: 19188.7). Total num frames: 1257725952. Throughput: 0: 4880.9. Samples: 9894694. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:32,852][07361] Avg episode reward: [(0, '42.427')] [2025-01-05 15:57:34,073][07482] Updated weights for policy 0, policy_version 307068 (0.0017) [2025-01-05 15:57:36,135][07482] Updated weights for policy 0, policy_version 307078 (0.0017) [2025-01-05 15:57:37,851][07361] Fps is (10 sec: 18841.7, 60 sec: 19456.1, 300 sec: 19174.8). Total num frames: 1257820160. 
Throughput: 0: 4879.5. Samples: 9909048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:37,852][07361] Avg episode reward: [(0, '41.557')] [2025-01-05 15:57:38,324][07482] Updated weights for policy 0, policy_version 307088 (0.0017) [2025-01-05 15:57:40,396][07482] Updated weights for policy 0, policy_version 307098 (0.0016) [2025-01-05 15:57:42,452][07482] Updated weights for policy 0, policy_version 307108 (0.0017) [2025-01-05 15:57:42,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19174.8). Total num frames: 1257918464. Throughput: 0: 4861.5. Samples: 9938004. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 15:57:42,852][07361] Avg episode reward: [(0, '43.373')] [2025-01-05 15:57:44,627][07482] Updated weights for policy 0, policy_version 307118 (0.0018) [2025-01-05 15:57:46,708][07482] Updated weights for policy 0, policy_version 307128 (0.0016) [2025-01-05 15:57:47,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19174.8). Total num frames: 1258016768. Throughput: 0: 4859.3. Samples: 9967202. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:57:47,852][07361] Avg episode reward: [(0, '42.727')] [2025-01-05 15:57:48,819][07482] Updated weights for policy 0, policy_version 307138 (0.0019) [2025-01-05 15:57:50,949][07482] Updated weights for policy 0, policy_version 307148 (0.0016) [2025-01-05 15:57:52,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19161.0). Total num frames: 1258110976. Throughput: 0: 4858.1. Samples: 9981726. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:57:52,852][07361] Avg episode reward: [(0, '40.890')] [2025-01-05 15:57:53,111][07482] Updated weights for policy 0, policy_version 307158 (0.0016) [2025-01-05 15:57:55,171][07482] Updated weights for policy 0, policy_version 307168 (0.0016) [2025-01-05 15:57:57,241][07482] Updated weights for policy 0, policy_version 307178 (0.0016) [2025-01-05 15:57:57,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19147.1). Total num frames: 1258209280. Throughput: 0: 4854.2. Samples: 10010786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:57:57,852][07361] Avg episode reward: [(0, '40.668')] [2025-01-05 15:57:59,446][07482] Updated weights for policy 0, policy_version 307188 (0.0018) [2025-01-05 15:58:01,486][07482] Updated weights for policy 0, policy_version 307198 (0.0017) [2025-01-05 15:58:02,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19456.0, 300 sec: 19147.1). Total num frames: 1258307584. Throughput: 0: 4839.3. Samples: 10039576. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:02,852][07361] Avg episode reward: [(0, '45.223')] [2025-01-05 15:58:03,743][07482] Updated weights for policy 0, policy_version 307208 (0.0019) [2025-01-05 15:58:05,938][07482] Updated weights for policy 0, policy_version 307218 (0.0016) [2025-01-05 15:58:07,852][07361] Fps is (10 sec: 18841.4, 60 sec: 19319.5, 300 sec: 19119.3). Total num frames: 1258397696. Throughput: 0: 4822.5. Samples: 10053600. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:07,852][07361] Avg episode reward: [(0, '42.599')] [2025-01-05 15:58:08,176][07482] Updated weights for policy 0, policy_version 307228 (0.0018) [2025-01-05 15:58:10,409][07482] Updated weights for policy 0, policy_version 307238 (0.0017) [2025-01-05 15:58:12,542][07482] Updated weights for policy 0, policy_version 307248 (0.0017) [2025-01-05 15:58:12,852][07361] Fps is (10 sec: 18431.7, 60 sec: 19251.1, 300 sec: 19105.4). 
Total num frames: 1258491904. Throughput: 0: 4793.0. Samples: 10081302. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:12,852][07361] Avg episode reward: [(0, '40.763')] [2025-01-05 15:58:14,712][07482] Updated weights for policy 0, policy_version 307258 (0.0018) [2025-01-05 15:58:16,863][07482] Updated weights for policy 0, policy_version 307268 (0.0016) [2025-01-05 15:58:17,852][07361] Fps is (10 sec: 18840.6, 60 sec: 19182.7, 300 sec: 19091.5). Total num frames: 1258586112. Throughput: 0: 4777.7. Samples: 10109694. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:17,853][07361] Avg episode reward: [(0, '40.046')] [2025-01-05 15:58:18,988][07482] Updated weights for policy 0, policy_version 307278 (0.0016) [2025-01-05 15:58:20,995][07482] Updated weights for policy 0, policy_version 307288 (0.0016) [2025-01-05 15:58:22,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19183.0, 300 sec: 19091.5). Total num frames: 1258684416. Throughput: 0: 4786.4. Samples: 10124434. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:22,852][07361] Avg episode reward: [(0, '40.474')] [2025-01-05 15:58:23,123][07482] Updated weights for policy 0, policy_version 307298 (0.0016) [2025-01-05 15:58:25,201][07482] Updated weights for policy 0, policy_version 307308 (0.0016) [2025-01-05 15:58:27,261][07482] Updated weights for policy 0, policy_version 307318 (0.0016) [2025-01-05 15:58:27,852][07361] Fps is (10 sec: 19661.8, 60 sec: 19182.9, 300 sec: 19091.5). Total num frames: 1258782720. Throughput: 0: 4798.8. Samples: 10153952. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:27,852][07361] Avg episode reward: [(0, '41.782')] [2025-01-05 15:58:29,440][07482] Updated weights for policy 0, policy_version 307328 (0.0017) [2025-01-05 15:58:31,486][07482] Updated weights for policy 0, policy_version 307338 (0.0016) [2025-01-05 15:58:32,852][07361] Fps is (10 sec: 19659.9, 60 sec: 19251.1, 300 sec: 19077.6). Total num frames: 1258881024. 
Throughput: 0: 4801.2. Samples: 10183256. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:32,852][07361] Avg episode reward: [(0, '43.464')] [2025-01-05 15:58:33,589][07482] Updated weights for policy 0, policy_version 307348 (0.0017) [2025-01-05 15:58:35,696][07482] Updated weights for policy 0, policy_version 307358 (0.0016) [2025-01-05 15:58:37,832][07482] Updated weights for policy 0, policy_version 307368 (0.0018) [2025-01-05 15:58:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19319.4, 300 sec: 19077.6). Total num frames: 1258979328. Throughput: 0: 4804.1. Samples: 10197912. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:37,852][07361] Avg episode reward: [(0, '41.637')] [2025-01-05 15:58:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000307368_1258979328.pth... [2025-01-05 15:58:37,915][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000306240_1254359040.pth [2025-01-05 15:58:39,934][07482] Updated weights for policy 0, policy_version 307378 (0.0017) [2025-01-05 15:58:42,046][07482] Updated weights for policy 0, policy_version 307388 (0.0020) [2025-01-05 15:58:42,852][07361] Fps is (10 sec: 19252.0, 60 sec: 19251.2, 300 sec: 19063.8). Total num frames: 1259073536. Throughput: 0: 4801.0. Samples: 10226830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:42,852][07361] Avg episode reward: [(0, '42.178')] [2025-01-05 15:58:44,204][07482] Updated weights for policy 0, policy_version 307398 (0.0017) [2025-01-05 15:58:46,214][07482] Updated weights for policy 0, policy_version 307408 (0.0016) [2025-01-05 15:58:47,851][07361] Fps is (10 sec: 19251.6, 60 sec: 19251.2, 300 sec: 19077.7). Total num frames: 1259171840. Throughput: 0: 4815.1. Samples: 10256256. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:47,852][07361] Avg episode reward: [(0, '39.586')] [2025-01-05 15:58:48,340][07482] Updated weights for policy 0, policy_version 307418 (0.0016) [2025-01-05 15:58:50,381][07482] Updated weights for policy 0, policy_version 307428 (0.0016) [2025-01-05 15:58:52,415][07482] Updated weights for policy 0, policy_version 307438 (0.0016) [2025-01-05 15:58:52,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19319.5, 300 sec: 19105.4). Total num frames: 1259270144. Throughput: 0: 4833.4. Samples: 10271102. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:52,852][07361] Avg episode reward: [(0, '39.132')] [2025-01-05 15:58:54,551][07482] Updated weights for policy 0, policy_version 307448 (0.0016) [2025-01-05 15:58:56,604][07482] Updated weights for policy 0, policy_version 307458 (0.0016) [2025-01-05 15:58:57,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19161.0). Total num frames: 1259368448. Throughput: 0: 4876.5. Samples: 10300744. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:58:57,852][07361] Avg episode reward: [(0, '43.301')] [2025-01-05 15:58:58,722][07482] Updated weights for policy 0, policy_version 307468 (0.0017) [2025-01-05 15:59:00,798][07482] Updated weights for policy 0, policy_version 307478 (0.0017) [2025-01-05 15:59:02,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19319.5, 300 sec: 19216.7). Total num frames: 1259466752. Throughput: 0: 4896.5. Samples: 10330036. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:02,852][07361] Avg episode reward: [(0, '44.552')] [2025-01-05 15:59:02,903][07482] Updated weights for policy 0, policy_version 307488 (0.0016) [2025-01-05 15:59:05,013][07482] Updated weights for policy 0, policy_version 307498 (0.0017) [2025-01-05 15:59:07,116][07482] Updated weights for policy 0, policy_version 307508 (0.0016) [2025-01-05 15:59:07,852][07361] Fps is (10 sec: 19659.9, 60 sec: 19455.9, 300 sec: 19258.1). 
Total num frames: 1259565056. Throughput: 0: 4889.2. Samples: 10344452. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:07,852][07361] Avg episode reward: [(0, '42.656')] [2025-01-05 15:59:09,244][07482] Updated weights for policy 0, policy_version 307518 (0.0016) [2025-01-05 15:59:11,266][07482] Updated weights for policy 0, policy_version 307528 (0.0018) [2025-01-05 15:59:12,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19285.9). Total num frames: 1259663360. Throughput: 0: 4888.2. Samples: 10373922. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:12,852][07361] Avg episode reward: [(0, '39.496')] [2025-01-05 15:59:13,449][07482] Updated weights for policy 0, policy_version 307538 (0.0017) [2025-01-05 15:59:15,502][07482] Updated weights for policy 0, policy_version 307548 (0.0017) [2025-01-05 15:59:17,526][07482] Updated weights for policy 0, policy_version 307558 (0.0016) [2025-01-05 15:59:17,852][07361] Fps is (10 sec: 19661.5, 60 sec: 19592.7, 300 sec: 19299.8). Total num frames: 1259761664. Throughput: 0: 4893.1. Samples: 10403446. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:17,852][07361] Avg episode reward: [(0, '40.532')] [2025-01-05 15:59:19,689][07482] Updated weights for policy 0, policy_version 307568 (0.0017) [2025-01-05 15:59:21,826][07482] Updated weights for policy 0, policy_version 307578 (0.0017) [2025-01-05 15:59:22,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19299.8). Total num frames: 1259855872. Throughput: 0: 4888.7. Samples: 10417902. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:22,852][07361] Avg episode reward: [(0, '41.882')] [2025-01-05 15:59:23,884][07482] Updated weights for policy 0, policy_version 307588 (0.0017) [2025-01-05 15:59:25,975][07482] Updated weights for policy 0, policy_version 307598 (0.0016) [2025-01-05 15:59:27,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19313.7). Total num frames: 1259954176. 
Throughput: 0: 4895.6. Samples: 10447132. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:27,852][07361] Avg episode reward: [(0, '42.298')] [2025-01-05 15:59:28,121][07482] Updated weights for policy 0, policy_version 307608 (0.0017) [2025-01-05 15:59:30,135][07482] Updated weights for policy 0, policy_version 307618 (0.0016) [2025-01-05 15:59:32,232][07482] Updated weights for policy 0, policy_version 307628 (0.0017) [2025-01-05 15:59:32,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.4, 300 sec: 19299.8). Total num frames: 1260052480. Throughput: 0: 4897.8. Samples: 10476656. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:32,852][07361] Avg episode reward: [(0, '39.820')] [2025-01-05 15:59:34,424][07482] Updated weights for policy 0, policy_version 307638 (0.0017) [2025-01-05 15:59:36,426][07482] Updated weights for policy 0, policy_version 307648 (0.0016) [2025-01-05 15:59:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19313.7). Total num frames: 1260150784. Throughput: 0: 4885.0. Samples: 10490926. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:37,852][07361] Avg episode reward: [(0, '38.304')] [2025-01-05 15:59:38,525][07482] Updated weights for policy 0, policy_version 307658 (0.0015) [2025-01-05 15:59:40,588][07482] Updated weights for policy 0, policy_version 307668 (0.0016) [2025-01-05 15:59:42,574][07482] Updated weights for policy 0, policy_version 307678 (0.0016) [2025-01-05 15:59:42,851][07361] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19327.6). Total num frames: 1260253184. Throughput: 0: 4892.2. Samples: 10520892. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:42,852][07361] Avg episode reward: [(0, '39.571')] [2025-01-05 15:59:44,682][07482] Updated weights for policy 0, policy_version 307688 (0.0017) [2025-01-05 15:59:46,742][07482] Updated weights for policy 0, policy_version 307698 (0.0016) [2025-01-05 15:59:47,852][07361] Fps is (10 sec: 20070.4, 60 sec: 19660.8, 300 sec: 19327.6). Total num frames: 1260351488. Throughput: 0: 4904.3. Samples: 10550730. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:47,852][07361] Avg episode reward: [(0, '39.546')] [2025-01-05 15:59:48,842][07482] Updated weights for policy 0, policy_version 307708 (0.0017) [2025-01-05 15:59:51,142][07482] Updated weights for policy 0, policy_version 307718 (0.0017) [2025-01-05 15:59:52,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19592.5, 300 sec: 19313.7). Total num frames: 1260445696. Throughput: 0: 4893.2. Samples: 10564644. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:52,852][07361] Avg episode reward: [(0, '39.999')] [2025-01-05 15:59:53,311][07482] Updated weights for policy 0, policy_version 307728 (0.0016) [2025-01-05 15:59:55,310][07482] Updated weights for policy 0, policy_version 307738 (0.0016) [2025-01-05 15:59:57,414][07482] Updated weights for policy 0, policy_version 307748 (0.0015) [2025-01-05 15:59:57,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19313.7). Total num frames: 1260544000. Throughput: 0: 4883.3. Samples: 10593672. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 15:59:57,852][07361] Avg episode reward: [(0, '41.850')] [2025-01-05 15:59:59,571][07482] Updated weights for policy 0, policy_version 307758 (0.0016) [2025-01-05 16:00:01,596][07482] Updated weights for policy 0, policy_version 307768 (0.0015) [2025-01-05 16:00:02,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19313.7). Total num frames: 1260638208. Throughput: 0: 4876.9. Samples: 10622904. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:02,852][07361] Avg episode reward: [(0, '44.830')] [2025-01-05 16:00:03,743][07482] Updated weights for policy 0, policy_version 307778 (0.0017) [2025-01-05 16:00:05,830][07482] Updated weights for policy 0, policy_version 307788 (0.0018) [2025-01-05 16:00:07,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19524.4, 300 sec: 19341.5). Total num frames: 1260736512. Throughput: 0: 4882.8. Samples: 10637630. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:07,852][07361] Avg episode reward: [(0, '44.723')] [2025-01-05 16:00:07,957][07482] Updated weights for policy 0, policy_version 307798 (0.0019) [2025-01-05 16:00:10,121][07482] Updated weights for policy 0, policy_version 307808 (0.0017) [2025-01-05 16:00:12,238][07482] Updated weights for policy 0, policy_version 307818 (0.0016) [2025-01-05 16:00:12,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19355.3). Total num frames: 1260830720. Throughput: 0: 4874.9. Samples: 10666504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:12,852][07361] Avg episode reward: [(0, '43.460')] [2025-01-05 16:00:14,310][07482] Updated weights for policy 0, policy_version 307828 (0.0016) [2025-01-05 16:00:16,387][07482] Updated weights for policy 0, policy_version 307838 (0.0016) [2025-01-05 16:00:17,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19369.2). Total num frames: 1260929024. Throughput: 0: 4866.4. Samples: 10695644. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:17,852][07361] Avg episode reward: [(0, '43.897')] [2025-01-05 16:00:18,561][07482] Updated weights for policy 0, policy_version 307848 (0.0017) [2025-01-05 16:00:20,573][07482] Updated weights for policy 0, policy_version 307858 (0.0018) [2025-01-05 16:00:22,656][07482] Updated weights for policy 0, policy_version 307868 (0.0016) [2025-01-05 16:00:22,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19397.0). 
Total num frames: 1261027328. Throughput: 0: 4876.1. Samples: 10710350. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:22,852][07361] Avg episode reward: [(0, '43.712')] [2025-01-05 16:00:24,957][07482] Updated weights for policy 0, policy_version 307878 (0.0017) [2025-01-05 16:00:27,138][07482] Updated weights for policy 0, policy_version 307888 (0.0016) [2025-01-05 16:00:27,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1261121536. Throughput: 0: 4839.3. Samples: 10738660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:27,852][07361] Avg episode reward: [(0, '41.023')] [2025-01-05 16:00:29,364][07482] Updated weights for policy 0, policy_version 307898 (0.0017) [2025-01-05 16:00:31,477][07482] Updated weights for policy 0, policy_version 307908 (0.0017) [2025-01-05 16:00:32,852][07361] Fps is (10 sec: 18840.6, 60 sec: 19387.5, 300 sec: 19383.1). Total num frames: 1261215744. Throughput: 0: 4803.7. Samples: 10766898. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:32,853][07361] Avg episode reward: [(0, '38.541')] [2025-01-05 16:00:33,650][07482] Updated weights for policy 0, policy_version 307918 (0.0017) [2025-01-05 16:00:35,768][07482] Updated weights for policy 0, policy_version 307928 (0.0016) [2025-01-05 16:00:37,851][07361] Fps is (10 sec: 18841.8, 60 sec: 19319.5, 300 sec: 19369.2). Total num frames: 1261309952. Throughput: 0: 4809.4. Samples: 10781066. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:37,852][07361] Avg episode reward: [(0, '41.361')] [2025-01-05 16:00:37,925][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000307938_1261314048.pth... 
[2025-01-05 16:00:37,928][07482] Updated weights for policy 0, policy_version 307938 (0.0017) [2025-01-05 16:00:37,976][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000306801_1256656896.pth [2025-01-05 16:00:40,058][07482] Updated weights for policy 0, policy_version 307948 (0.0018) [2025-01-05 16:00:42,124][07482] Updated weights for policy 0, policy_version 307958 (0.0016) [2025-01-05 16:00:42,852][07361] Fps is (10 sec: 19252.3, 60 sec: 19251.2, 300 sec: 19369.2). Total num frames: 1261408256. Throughput: 0: 4812.5. Samples: 10810234. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:42,852][07361] Avg episode reward: [(0, '41.806')] [2025-01-05 16:00:44,272][07482] Updated weights for policy 0, policy_version 307968 (0.0016) [2025-01-05 16:00:46,322][07482] Updated weights for policy 0, policy_version 307978 (0.0019) [2025-01-05 16:00:47,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19251.2, 300 sec: 19369.2). Total num frames: 1261506560. Throughput: 0: 4809.8. Samples: 10839346. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:47,852][07361] Avg episode reward: [(0, '40.600')] [2025-01-05 16:00:48,467][07482] Updated weights for policy 0, policy_version 307988 (0.0016) [2025-01-05 16:00:50,526][07482] Updated weights for policy 0, policy_version 307998 (0.0015) [2025-01-05 16:00:52,597][07482] Updated weights for policy 0, policy_version 308008 (0.0016) [2025-01-05 16:00:52,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19383.1). Total num frames: 1261604864. Throughput: 0: 4810.7. Samples: 10854110. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:52,852][07361] Avg episode reward: [(0, '41.131')] [2025-01-05 16:00:54,730][07482] Updated weights for policy 0, policy_version 308018 (0.0016) [2025-01-05 16:00:56,778][07482] Updated weights for policy 0, policy_version 308028 (0.0016) [2025-01-05 16:00:57,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19319.5, 300 sec: 19383.1). Total num frames: 1261703168. Throughput: 0: 4823.2. Samples: 10883550. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:00:57,852][07361] Avg episode reward: [(0, '41.137')] [2025-01-05 16:00:58,900][07482] Updated weights for policy 0, policy_version 308038 (0.0019) [2025-01-05 16:01:00,945][07482] Updated weights for policy 0, policy_version 308048 (0.0016) [2025-01-05 16:01:02,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19369.2). Total num frames: 1261797376. Throughput: 0: 4825.7. Samples: 10912798. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:02,852][07361] Avg episode reward: [(0, '42.262')] [2025-01-05 16:01:03,110][07482] Updated weights for policy 0, policy_version 308058 (0.0016) [2025-01-05 16:01:05,120][07482] Updated weights for policy 0, policy_version 308068 (0.0015) [2025-01-05 16:01:07,168][07482] Updated weights for policy 0, policy_version 308078 (0.0015) [2025-01-05 16:01:07,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19397.0). Total num frames: 1261899776. Throughput: 0: 4830.5. Samples: 10927722. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:07,852][07361] Avg episode reward: [(0, '41.234')] [2025-01-05 16:01:09,325][07482] Updated weights for policy 0, policy_version 308088 (0.0016) [2025-01-05 16:01:11,353][07482] Updated weights for policy 0, policy_version 308098 (0.0018) [2025-01-05 16:01:12,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19456.0, 300 sec: 19424.8). Total num frames: 1261998080. Throughput: 0: 4858.6. Samples: 10957298. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:12,852][07361] Avg episode reward: [(0, '41.488')] [2025-01-05 16:01:13,491][07482] Updated weights for policy 0, policy_version 308108 (0.0017) [2025-01-05 16:01:15,596][07482] Updated weights for policy 0, policy_version 308118 (0.0015) [2025-01-05 16:01:17,597][07482] Updated weights for policy 0, policy_version 308128 (0.0015) [2025-01-05 16:01:17,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19456.0, 300 sec: 19424.7). Total num frames: 1262096384. Throughput: 0: 4885.2. Samples: 10986728. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:17,852][07361] Avg episode reward: [(0, '43.525')] [2025-01-05 16:01:19,642][07482] Updated weights for policy 0, policy_version 308138 (0.0015) [2025-01-05 16:01:21,758][07482] Updated weights for policy 0, policy_version 308148 (0.0015) [2025-01-05 16:01:22,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1262194688. Throughput: 0: 4903.5. Samples: 11001724. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:22,852][07361] Avg episode reward: [(0, '43.240')] [2025-01-05 16:01:23,855][07482] Updated weights for policy 0, policy_version 308158 (0.0016) [2025-01-05 16:01:25,903][07482] Updated weights for policy 0, policy_version 308168 (0.0015) [2025-01-05 16:01:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19438.6). Total num frames: 1262292992. Throughput: 0: 4907.9. Samples: 11031088. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:01:27,852][07361] Avg episode reward: [(0, '40.062')] [2025-01-05 16:01:28,090][07482] Updated weights for policy 0, policy_version 308178 (0.0017) [2025-01-05 16:01:30,098][07482] Updated weights for policy 0, policy_version 308188 (0.0016) [2025-01-05 16:01:32,160][07482] Updated weights for policy 0, policy_version 308198 (0.0016) [2025-01-05 16:01:32,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.7, 300 sec: 19452.5). 
Total num frames: 1262391296. Throughput: 0: 4916.1. Samples: 11060568. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:32,852][07361] Avg episode reward: [(0, '40.936')] [2025-01-05 16:01:34,364][07482] Updated weights for policy 0, policy_version 308208 (0.0016) [2025-01-05 16:01:36,352][07482] Updated weights for policy 0, policy_version 308218 (0.0016) [2025-01-05 16:01:37,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19452.5). Total num frames: 1262489600. Throughput: 0: 4911.2. Samples: 11075116. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:37,852][07361] Avg episode reward: [(0, '41.699')] [2025-01-05 16:01:38,403][07482] Updated weights for policy 0, policy_version 308228 (0.0016) [2025-01-05 16:01:40,509][07482] Updated weights for policy 0, policy_version 308238 (0.0016) [2025-01-05 16:01:42,502][07482] Updated weights for policy 0, policy_version 308248 (0.0016) [2025-01-05 16:01:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19452.5). Total num frames: 1262587904. Throughput: 0: 4922.4. Samples: 11105056. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:42,852][07361] Avg episode reward: [(0, '42.076')] [2025-01-05 16:01:44,554][07482] Updated weights for policy 0, policy_version 308258 (0.0016) [2025-01-05 16:01:46,655][07482] Updated weights for policy 0, policy_version 308268 (0.0017) [2025-01-05 16:01:47,852][07361] Fps is (10 sec: 19659.0, 60 sec: 19660.5, 300 sec: 19466.3). Total num frames: 1262686208. Throughput: 0: 4936.8. Samples: 11134960. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:47,853][07361] Avg episode reward: [(0, '42.545')] [2025-01-05 16:01:48,751][07482] Updated weights for policy 0, policy_version 308278 (0.0017) [2025-01-05 16:01:50,804][07482] Updated weights for policy 0, policy_version 308288 (0.0016) [2025-01-05 16:01:52,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1262784512. 
Throughput: 0: 4928.4. Samples: 11149498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:52,852][07361] Avg episode reward: [(0, '40.331')] [2025-01-05 16:01:52,986][07482] Updated weights for policy 0, policy_version 308298 (0.0017) [2025-01-05 16:01:55,106][07482] Updated weights for policy 0, policy_version 308308 (0.0017) [2025-01-05 16:01:57,161][07482] Updated weights for policy 0, policy_version 308318 (0.0016) [2025-01-05 16:01:57,852][07361] Fps is (10 sec: 19662.5, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1262882816. Throughput: 0: 4915.6. Samples: 11178502. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:01:57,852][07361] Avg episode reward: [(0, '41.121')] [2025-01-05 16:01:59,302][07482] Updated weights for policy 0, policy_version 308328 (0.0017) [2025-01-05 16:02:01,363][07482] Updated weights for policy 0, policy_version 308338 (0.0016) [2025-01-05 16:02:02,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19729.1, 300 sec: 19466.4). Total num frames: 1262981120. Throughput: 0: 4912.9. Samples: 11207810. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:02,852][07361] Avg episode reward: [(0, '43.131')] [2025-01-05 16:02:03,473][07482] Updated weights for policy 0, policy_version 308348 (0.0017) [2025-01-05 16:02:05,516][07482] Updated weights for policy 0, policy_version 308358 (0.0016) [2025-01-05 16:02:07,549][07482] Updated weights for policy 0, policy_version 308368 (0.0017) [2025-01-05 16:02:07,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1263079424. Throughput: 0: 4911.7. Samples: 11222752. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:07,852][07361] Avg episode reward: [(0, '42.552')] [2025-01-05 16:02:09,761][07482] Updated weights for policy 0, policy_version 308378 (0.0018) [2025-01-05 16:02:11,836][07482] Updated weights for policy 0, policy_version 308388 (0.0017) [2025-01-05 16:02:12,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19592.6, 300 sec: 19452.5). Total num frames: 1263173632. Throughput: 0: 4907.4. Samples: 11251922. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:12,852][07361] Avg episode reward: [(0, '42.424')] [2025-01-05 16:02:13,949][07482] Updated weights for policy 0, policy_version 308398 (0.0017) [2025-01-05 16:02:15,975][07482] Updated weights for policy 0, policy_version 308408 (0.0016) [2025-01-05 16:02:17,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19592.6, 300 sec: 19452.5). Total num frames: 1263271936. Throughput: 0: 4904.4. Samples: 11281266. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:17,852][07361] Avg episode reward: [(0, '40.902')] [2025-01-05 16:02:18,121][07482] Updated weights for policy 0, policy_version 308418 (0.0017) [2025-01-05 16:02:20,160][07482] Updated weights for policy 0, policy_version 308428 (0.0016) [2025-01-05 16:02:22,209][07482] Updated weights for policy 0, policy_version 308438 (0.0016) [2025-01-05 16:02:22,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1263374336. Throughput: 0: 4911.2. Samples: 11296118. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:22,852][07361] Avg episode reward: [(0, '41.330')] [2025-01-05 16:02:24,347][07482] Updated weights for policy 0, policy_version 308448 (0.0017) [2025-01-05 16:02:26,363][07482] Updated weights for policy 0, policy_version 308458 (0.0020) [2025-01-05 16:02:27,851][07361] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1263472640. Throughput: 0: 4904.9. Samples: 11325776. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:27,852][07361] Avg episode reward: [(0, '40.715')] [2025-01-05 16:02:28,428][07482] Updated weights for policy 0, policy_version 308468 (0.0016) [2025-01-05 16:02:30,502][07482] Updated weights for policy 0, policy_version 308478 (0.0016) [2025-01-05 16:02:32,530][07482] Updated weights for policy 0, policy_version 308488 (0.0017) [2025-01-05 16:02:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1263570944. Throughput: 0: 4906.4. Samples: 11355742. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:32,852][07361] Avg episode reward: [(0, '41.081')] [2025-01-05 16:02:34,634][07482] Updated weights for policy 0, policy_version 308498 (0.0017) [2025-01-05 16:02:36,666][07482] Updated weights for policy 0, policy_version 308508 (0.0016) [2025-01-05 16:02:37,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1263669248. Throughput: 0: 4912.3. Samples: 11370554. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:02:37,852][07361] Avg episode reward: [(0, '37.254')] [2025-01-05 16:02:37,952][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000308514_1263673344.pth... [2025-01-05 16:02:38,005][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000307368_1258979328.pth [2025-01-05 16:02:38,788][07482] Updated weights for policy 0, policy_version 308518 (0.0017) [2025-01-05 16:02:40,899][07482] Updated weights for policy 0, policy_version 308528 (0.0017) [2025-01-05 16:02:42,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1263767552. Throughput: 0: 4920.0. Samples: 11399902. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:02:42,852][07361] Avg episode reward: [(0, '37.763')] [2025-01-05 16:02:42,945][07482] Updated weights for policy 0, policy_version 308538 (0.0016) [2025-01-05 16:02:44,971][07482] Updated weights for policy 0, policy_version 308548 (0.0016) [2025-01-05 16:02:47,062][07482] Updated weights for policy 0, policy_version 308558 (0.0017) [2025-01-05 16:02:47,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19661.1, 300 sec: 19508.1). Total num frames: 1263865856. Throughput: 0: 4932.1. Samples: 11429756. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:02:47,852][07361] Avg episode reward: [(0, '41.663')] [2025-01-05 16:02:49,160][07482] Updated weights for policy 0, policy_version 308568 (0.0016) [2025-01-05 16:02:51,194][07482] Updated weights for policy 0, policy_version 308578 (0.0017) [2025-01-05 16:02:52,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1263964160. Throughput: 0: 4928.9. Samples: 11444552. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:02:52,852][07361] Avg episode reward: [(0, '40.484')] [2025-01-05 16:02:53,416][07482] Updated weights for policy 0, policy_version 308588 (0.0017) [2025-01-05 16:02:55,463][07482] Updated weights for policy 0, policy_version 308598 (0.0015) [2025-01-05 16:02:57,491][07482] Updated weights for policy 0, policy_version 308608 (0.0019) [2025-01-05 16:02:57,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1264062464. Throughput: 0: 4929.7. Samples: 11473760. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:02:57,852][07361] Avg episode reward: [(0, '43.059')] [2025-01-05 16:02:59,685][07482] Updated weights for policy 0, policy_version 308618 (0.0017) [2025-01-05 16:03:01,765][07482] Updated weights for policy 0, policy_version 308628 (0.0016) [2025-01-05 16:03:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19535.8). 
Total num frames: 1264160768. Throughput: 0: 4924.0. Samples: 11502846. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:02,852][07361] Avg episode reward: [(0, '44.764')] [2025-01-05 16:03:03,909][07482] Updated weights for policy 0, policy_version 308638 (0.0017) [2025-01-05 16:03:05,989][07482] Updated weights for policy 0, policy_version 308648 (0.0016) [2025-01-05 16:03:07,852][07361] Fps is (10 sec: 19251.4, 60 sec: 19592.5, 300 sec: 19535.8). Total num frames: 1264254976. Throughput: 0: 4917.4. Samples: 11517400. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:07,852][07361] Avg episode reward: [(0, '44.573')] [2025-01-05 16:03:08,198][07482] Updated weights for policy 0, policy_version 308658 (0.0017) [2025-01-05 16:03:10,181][07482] Updated weights for policy 0, policy_version 308668 (0.0015) [2025-01-05 16:03:12,281][07482] Updated weights for policy 0, policy_version 308678 (0.0016) [2025-01-05 16:03:12,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19660.8, 300 sec: 19549.8). Total num frames: 1264353280. Throughput: 0: 4909.6. Samples: 11546706. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:12,852][07361] Avg episode reward: [(0, '44.042')] [2025-01-05 16:03:14,457][07482] Updated weights for policy 0, policy_version 308688 (0.0016) [2025-01-05 16:03:16,458][07482] Updated weights for policy 0, policy_version 308698 (0.0016) [2025-01-05 16:03:17,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19549.7). Total num frames: 1264451584. Throughput: 0: 4896.3. Samples: 11576074. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:17,852][07361] Avg episode reward: [(0, '39.863')] [2025-01-05 16:03:18,548][07482] Updated weights for policy 0, policy_version 308708 (0.0016) [2025-01-05 16:03:20,660][07482] Updated weights for policy 0, policy_version 308718 (0.0016) [2025-01-05 16:03:22,647][07482] Updated weights for policy 0, policy_version 308728 (0.0016) [2025-01-05 16:03:22,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19549.7). Total num frames: 1264549888. Throughput: 0: 4894.5. Samples: 11590804. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:22,852][07361] Avg episode reward: [(0, '40.221')] [2025-01-05 16:03:24,750][07482] Updated weights for policy 0, policy_version 308738 (0.0017) [2025-01-05 16:03:26,837][07482] Updated weights for policy 0, policy_version 308748 (0.0016) [2025-01-05 16:03:27,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19549.8). Total num frames: 1264648192. Throughput: 0: 4907.1. Samples: 11620720. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:27,852][07361] Avg episode reward: [(0, '44.370')] [2025-01-05 16:03:28,945][07482] Updated weights for policy 0, policy_version 308758 (0.0020) [2025-01-05 16:03:31,032][07482] Updated weights for policy 0, policy_version 308768 (0.0017) [2025-01-05 16:03:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19549.7). Total num frames: 1264746496. Throughput: 0: 4884.5. Samples: 11649560. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:32,852][07361] Avg episode reward: [(0, '42.562')] [2025-01-05 16:03:33,248][07482] Updated weights for policy 0, policy_version 308778 (0.0017) [2025-01-05 16:03:35,239][07482] Updated weights for policy 0, policy_version 308788 (0.0016) [2025-01-05 16:03:37,330][07482] Updated weights for policy 0, policy_version 308798 (0.0016) [2025-01-05 16:03:37,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19592.6, 300 sec: 19563.6). 
Total num frames: 1264844800. Throughput: 0: 4883.0. Samples: 11664286. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:37,852][07361] Avg episode reward: [(0, '40.922')] [2025-01-05 16:03:39,538][07482] Updated weights for policy 0, policy_version 308808 (0.0017) [2025-01-05 16:03:41,533][07482] Updated weights for policy 0, policy_version 308818 (0.0016) [2025-01-05 16:03:42,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1264943104. Throughput: 0: 4885.8. Samples: 11693620. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:42,852][07361] Avg episode reward: [(0, '43.157')] [2025-01-05 16:03:43,632][07482] Updated weights for policy 0, policy_version 308828 (0.0015) [2025-01-05 16:03:45,738][07482] Updated weights for policy 0, policy_version 308838 (0.0016) [2025-01-05 16:03:47,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19524.2, 300 sec: 19549.7). Total num frames: 1265037312. Throughput: 0: 4887.0. Samples: 11722762. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:47,853][07361] Avg episode reward: [(0, '40.044')] [2025-01-05 16:03:47,864][07482] Updated weights for policy 0, policy_version 308848 (0.0019) [2025-01-05 16:03:50,054][07482] Updated weights for policy 0, policy_version 308858 (0.0017) [2025-01-05 16:03:52,148][07482] Updated weights for policy 0, policy_version 308868 (0.0017) [2025-01-05 16:03:52,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1265135616. Throughput: 0: 4882.4. Samples: 11737108. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:03:52,852][07361] Avg episode reward: [(0, '41.005')] [2025-01-05 16:03:54,240][07482] Updated weights for policy 0, policy_version 308878 (0.0017) [2025-01-05 16:03:56,311][07482] Updated weights for policy 0, policy_version 308888 (0.0016) [2025-01-05 16:03:57,851][07361] Fps is (10 sec: 19661.3, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1265233920. 
Throughput: 0: 4885.6. Samples: 11766556. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:03:57,852][07361] Avg episode reward: [(0, '41.462')] [2025-01-05 16:03:58,465][07482] Updated weights for policy 0, policy_version 308898 (0.0018) [2025-01-05 16:04:00,458][07482] Updated weights for policy 0, policy_version 308908 (0.0016) [2025-01-05 16:04:02,532][07482] Updated weights for policy 0, policy_version 308918 (0.0016) [2025-01-05 16:04:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1265332224. Throughput: 0: 4889.8. Samples: 11796116. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:02,852][07361] Avg episode reward: [(0, '39.245')] [2025-01-05 16:04:04,722][07482] Updated weights for policy 0, policy_version 308928 (0.0016) [2025-01-05 16:04:06,716][07482] Updated weights for policy 0, policy_version 308938 (0.0015) [2025-01-05 16:04:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19549.7). Total num frames: 1265430528. Throughput: 0: 4884.8. Samples: 11810620. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:07,852][07361] Avg episode reward: [(0, '41.645')] [2025-01-05 16:04:08,790][07482] Updated weights for policy 0, policy_version 308948 (0.0017) [2025-01-05 16:04:10,885][07482] Updated weights for policy 0, policy_version 308958 (0.0015) [2025-01-05 16:04:12,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19524.2, 300 sec: 19535.8). Total num frames: 1265524736. Throughput: 0: 4881.0. Samples: 11840366. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:12,852][07361] Avg episode reward: [(0, '43.938')] [2025-01-05 16:04:13,016][07482] Updated weights for policy 0, policy_version 308968 (0.0017) [2025-01-05 16:04:15,059][07482] Updated weights for policy 0, policy_version 308978 (0.0015) [2025-01-05 16:04:17,179][07482] Updated weights for policy 0, policy_version 308988 (0.0016) [2025-01-05 16:04:17,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1265627136. Throughput: 0: 4893.1. Samples: 11869748. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:17,852][07361] Avg episode reward: [(0, '35.699')] [2025-01-05 16:04:19,248][07482] Updated weights for policy 0, policy_version 308998 (0.0016) [2025-01-05 16:04:21,295][07482] Updated weights for policy 0, policy_version 309008 (0.0016) [2025-01-05 16:04:22,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19549.7). Total num frames: 1265721344. Throughput: 0: 4892.5. Samples: 11884448. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:22,852][07361] Avg episode reward: [(0, '38.562')] [2025-01-05 16:04:23,510][07482] Updated weights for policy 0, policy_version 309018 (0.0017) [2025-01-05 16:04:25,531][07482] Updated weights for policy 0, policy_version 309028 (0.0017) [2025-01-05 16:04:27,602][07482] Updated weights for policy 0, policy_version 309038 (0.0016) [2025-01-05 16:04:27,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19524.2, 300 sec: 19549.7). Total num frames: 1265819648. Throughput: 0: 4891.4. Samples: 11913734. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:27,852][07361] Avg episode reward: [(0, '41.866')] [2025-01-05 16:04:29,837][07482] Updated weights for policy 0, policy_version 309048 (0.0016) [2025-01-05 16:04:31,895][07482] Updated weights for policy 0, policy_version 309058 (0.0017) [2025-01-05 16:04:32,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19549.7). 
Total num frames: 1265917952. Throughput: 0: 4883.6. Samples: 11942524. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:32,852][07361] Avg episode reward: [(0, '39.541')] [2025-01-05 16:04:34,062][07482] Updated weights for policy 0, policy_version 309068 (0.0017) [2025-01-05 16:04:36,164][07482] Updated weights for policy 0, policy_version 309078 (0.0018) [2025-01-05 16:04:37,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19535.8). Total num frames: 1266016256. Throughput: 0: 4887.7. Samples: 11957054. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:37,852][07361] Avg episode reward: [(0, '38.004')] [2025-01-05 16:04:37,858][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000309086_1266016256.pth... [2025-01-05 16:04:37,911][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000307938_1261314048.pth [2025-01-05 16:04:38,268][07482] Updated weights for policy 0, policy_version 309088 (0.0019) [2025-01-05 16:04:40,358][07482] Updated weights for policy 0, policy_version 309098 (0.0016) [2025-01-05 16:04:42,480][07482] Updated weights for policy 0, policy_version 309108 (0.0016) [2025-01-05 16:04:42,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19522.0). Total num frames: 1266110464. Throughput: 0: 4881.6. Samples: 11986226. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:42,852][07361] Avg episode reward: [(0, '39.842')] [2025-01-05 16:04:44,580][07482] Updated weights for policy 0, policy_version 309118 (0.0017) [2025-01-05 16:04:46,648][07482] Updated weights for policy 0, policy_version 309128 (0.0015) [2025-01-05 16:04:47,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19524.3, 300 sec: 19535.8). Total num frames: 1266208768. Throughput: 0: 4873.4. Samples: 12015420. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:47,852][07361] Avg episode reward: [(0, '43.229')] [2025-01-05 16:04:48,850][07482] Updated weights for policy 0, policy_version 309138 (0.0016) [2025-01-05 16:04:50,848][07482] Updated weights for policy 0, policy_version 309148 (0.0016) [2025-01-05 16:04:52,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19535.8). Total num frames: 1266307072. Throughput: 0: 4875.0. Samples: 12029994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:52,852][07361] Avg episode reward: [(0, '42.546')] [2025-01-05 16:04:52,935][07482] Updated weights for policy 0, policy_version 309158 (0.0016) [2025-01-05 16:04:55,071][07482] Updated weights for policy 0, policy_version 309168 (0.0016) [2025-01-05 16:04:57,197][07482] Updated weights for policy 0, policy_version 309178 (0.0017) [2025-01-05 16:04:57,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19535.8). Total num frames: 1266401280. Throughput: 0: 4861.2. Samples: 12059122. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:04:57,852][07361] Avg episode reward: [(0, '40.252')] [2025-01-05 16:04:59,390][07482] Updated weights for policy 0, policy_version 309188 (0.0019) [2025-01-05 16:05:01,495][07482] Updated weights for policy 0, policy_version 309198 (0.0016) [2025-01-05 16:05:02,852][07361] Fps is (10 sec: 18841.4, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1266495488. Throughput: 0: 4846.1. Samples: 12087822. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:05:02,852][07361] Avg episode reward: [(0, '38.774')] [2025-01-05 16:05:03,777][07482] Updated weights for policy 0, policy_version 309208 (0.0018) [2025-01-05 16:05:05,835][07482] Updated weights for policy 0, policy_version 309218 (0.0016) [2025-01-05 16:05:07,851][07361] Fps is (10 sec: 18841.8, 60 sec: 19319.5, 300 sec: 19522.0). Total num frames: 1266589696. Throughput: 0: 4827.8. Samples: 12101700. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:05:07,852][07361] Avg episode reward: [(0, '40.237')] [2025-01-05 16:05:08,115][07482] Updated weights for policy 0, policy_version 309228 (0.0017) [2025-01-05 16:05:10,305][07482] Updated weights for policy 0, policy_version 309238 (0.0020) [2025-01-05 16:05:12,563][07482] Updated weights for policy 0, policy_version 309248 (0.0017) [2025-01-05 16:05:12,852][07361] Fps is (10 sec: 18432.0, 60 sec: 19251.2, 300 sec: 19494.2). Total num frames: 1266679808. Throughput: 0: 4787.9. Samples: 12129190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:12,852][07361] Avg episode reward: [(0, '43.100')] [2025-01-05 16:05:15,015][07482] Updated weights for policy 0, policy_version 309258 (0.0018) [2025-01-05 16:05:17,045][07482] Updated weights for policy 0, policy_version 309268 (0.0016) [2025-01-05 16:05:17,852][07361] Fps is (10 sec: 18431.8, 60 sec: 19114.7, 300 sec: 19480.3). Total num frames: 1266774016. Throughput: 0: 4762.2. Samples: 12156824. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:17,852][07361] Avg episode reward: [(0, '45.533')] [2025-01-05 16:05:19,192][07482] Updated weights for policy 0, policy_version 309278 (0.0017) [2025-01-05 16:05:21,313][07482] Updated weights for policy 0, policy_version 309288 (0.0017) [2025-01-05 16:05:22,851][07361] Fps is (10 sec: 19251.6, 60 sec: 19183.0, 300 sec: 19494.2). Total num frames: 1266872320. Throughput: 0: 4760.6. Samples: 12171280. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:22,852][07361] Avg episode reward: [(0, '42.491')] [2025-01-05 16:05:23,372][07482] Updated weights for policy 0, policy_version 309298 (0.0015) [2025-01-05 16:05:25,493][07482] Updated weights for policy 0, policy_version 309308 (0.0017) [2025-01-05 16:05:27,803][07482] Updated weights for policy 0, policy_version 309318 (0.0018) [2025-01-05 16:05:27,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19114.7, 300 sec: 19494.2). 
Total num frames: 1266966528. Throughput: 0: 4753.7. Samples: 12200142. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:27,852][07361] Avg episode reward: [(0, '39.115')] [2025-01-05 16:05:30,036][07482] Updated weights for policy 0, policy_version 309328 (0.0018) [2025-01-05 16:05:32,310][07482] Updated weights for policy 0, policy_version 309338 (0.0019) [2025-01-05 16:05:32,852][07361] Fps is (10 sec: 18431.8, 60 sec: 18978.1, 300 sec: 19480.3). Total num frames: 1267056640. Throughput: 0: 4707.6. Samples: 12227260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:32,852][07361] Avg episode reward: [(0, '43.051')] [2025-01-05 16:05:34,550][07482] Updated weights for policy 0, policy_version 309348 (0.0017) [2025-01-05 16:05:36,656][07482] Updated weights for policy 0, policy_version 309358 (0.0017) [2025-01-05 16:05:37,852][07361] Fps is (10 sec: 18431.7, 60 sec: 18909.8, 300 sec: 19466.4). Total num frames: 1267150848. Throughput: 0: 4695.8. Samples: 12241308. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:37,853][07361] Avg episode reward: [(0, '43.435')] [2025-01-05 16:05:38,941][07482] Updated weights for policy 0, policy_version 309368 (0.0018) [2025-01-05 16:05:41,065][07482] Updated weights for policy 0, policy_version 309378 (0.0017) [2025-01-05 16:05:42,852][07361] Fps is (10 sec: 18841.6, 60 sec: 18909.8, 300 sec: 19452.5). Total num frames: 1267245056. Throughput: 0: 4674.6. Samples: 12269480. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:42,852][07361] Avg episode reward: [(0, '43.521')] [2025-01-05 16:05:43,229][07482] Updated weights for policy 0, policy_version 309388 (0.0017) [2025-01-05 16:05:45,364][07482] Updated weights for policy 0, policy_version 309398 (0.0018) [2025-01-05 16:05:47,465][07482] Updated weights for policy 0, policy_version 309408 (0.0016) [2025-01-05 16:05:47,852][07361] Fps is (10 sec: 18841.8, 60 sec: 18841.6, 300 sec: 19438.6). Total num frames: 1267339264. 
Throughput: 0: 4675.3. Samples: 12298212. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:47,852][07361] Avg episode reward: [(0, '42.486')] [2025-01-05 16:05:49,634][07482] Updated weights for policy 0, policy_version 309418 (0.0016) [2025-01-05 16:05:51,752][07482] Updated weights for policy 0, policy_version 309428 (0.0016) [2025-01-05 16:05:52,852][07361] Fps is (10 sec: 19251.2, 60 sec: 18841.6, 300 sec: 19438.6). Total num frames: 1267437568. Throughput: 0: 4684.9. Samples: 12312520. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:52,852][07361] Avg episode reward: [(0, '43.137')] [2025-01-05 16:05:53,943][07482] Updated weights for policy 0, policy_version 309438 (0.0017) [2025-01-05 16:05:56,072][07482] Updated weights for policy 0, policy_version 309448 (0.0017) [2025-01-05 16:05:57,851][07361] Fps is (10 sec: 19251.4, 60 sec: 18841.6, 300 sec: 19438.6). Total num frames: 1267531776. Throughput: 0: 4708.5. Samples: 12341070. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:05:57,852][07361] Avg episode reward: [(0, '43.168')] [2025-01-05 16:05:58,205][07482] Updated weights for policy 0, policy_version 309458 (0.0017) [2025-01-05 16:06:00,262][07482] Updated weights for policy 0, policy_version 309468 (0.0017) [2025-01-05 16:06:02,314][07482] Updated weights for policy 0, policy_version 309478 (0.0016) [2025-01-05 16:06:02,852][07361] Fps is (10 sec: 19251.2, 60 sec: 18909.9, 300 sec: 19424.8). Total num frames: 1267630080. Throughput: 0: 4752.3. Samples: 12370678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:06:02,852][07361] Avg episode reward: [(0, '42.278')] [2025-01-05 16:06:04,411][07482] Updated weights for policy 0, policy_version 309488 (0.0016) [2025-01-05 16:06:06,471][07482] Updated weights for policy 0, policy_version 309498 (0.0016) [2025-01-05 16:06:07,851][07361] Fps is (10 sec: 19660.8, 60 sec: 18978.1, 300 sec: 19424.8). Total num frames: 1267728384. Throughput: 0: 4756.2. 
Samples: 12385310. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:06:07,852][07361] Avg episode reward: [(0, '42.432')] [2025-01-05 16:06:08,604][07482] Updated weights for policy 0, policy_version 309508 (0.0016) [2025-01-05 16:06:10,623][07482] Updated weights for policy 0, policy_version 309518 (0.0018) [2025-01-05 16:06:12,692][07482] Updated weights for policy 0, policy_version 309528 (0.0016) [2025-01-05 16:06:12,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19114.7, 300 sec: 19424.8). Total num frames: 1267826688. Throughput: 0: 4771.0. Samples: 12414836. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:06:12,852][07361] Avg episode reward: [(0, '39.612')] [2025-01-05 16:06:14,833][07482] Updated weights for policy 0, policy_version 309538 (0.0018) [2025-01-05 16:06:16,864][07482] Updated weights for policy 0, policy_version 309548 (0.0017) [2025-01-05 16:06:17,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19182.9, 300 sec: 19424.8). Total num frames: 1267924992. Throughput: 0: 4823.0. Samples: 12444294. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:06:17,852][07361] Avg episode reward: [(0, '40.738')] [2025-01-05 16:06:19,062][07482] Updated weights for policy 0, policy_version 309558 (0.0016) [2025-01-05 16:06:21,126][07482] Updated weights for policy 0, policy_version 309568 (0.0016) [2025-01-05 16:06:22,851][07361] Fps is (10 sec: 19251.0, 60 sec: 19114.6, 300 sec: 19410.9). Total num frames: 1268019200. Throughput: 0: 4834.4. Samples: 12458856. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:06:22,852][07361] Avg episode reward: [(0, '40.610')] [2025-01-05 16:06:23,244][07482] Updated weights for policy 0, policy_version 309578 (0.0018) [2025-01-05 16:06:25,370][07482] Updated weights for policy 0, policy_version 309588 (0.0016) [2025-01-05 16:06:27,418][07482] Updated weights for policy 0, policy_version 309598 (0.0016) [2025-01-05 16:06:27,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19182.9, 300 sec: 19410.9). Total num frames: 1268117504. Throughput: 0: 4856.9. Samples: 12488040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:27,852][07361] Avg episode reward: [(0, '42.053')] [2025-01-05 16:06:29,497][07482] Updated weights for policy 0, policy_version 309608 (0.0017) [2025-01-05 16:06:31,623][07482] Updated weights for policy 0, policy_version 309618 (0.0016) [2025-01-05 16:06:32,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19319.5, 300 sec: 19410.9). Total num frames: 1268215808. Throughput: 0: 4871.0. Samples: 12517406. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:32,852][07361] Avg episode reward: [(0, '39.156')] [2025-01-05 16:06:33,744][07482] Updated weights for policy 0, policy_version 309628 (0.0017) [2025-01-05 16:06:35,774][07482] Updated weights for policy 0, policy_version 309638 (0.0017) [2025-01-05 16:06:37,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.8, 300 sec: 19410.9). Total num frames: 1268314112. Throughput: 0: 4877.4. Samples: 12532002. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:37,852][07361] Avg episode reward: [(0, '40.025')] [2025-01-05 16:06:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000309647_1268314112.pth... 
[2025-01-05 16:06:37,914][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000308514_1263673344.pth [2025-01-05 16:06:38,021][07482] Updated weights for policy 0, policy_version 309648 (0.0017) [2025-01-05 16:06:40,145][07482] Updated weights for policy 0, policy_version 309658 (0.0017) [2025-01-05 16:06:42,206][07482] Updated weights for policy 0, policy_version 309668 (0.0016) [2025-01-05 16:06:42,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19397.1). Total num frames: 1268408320. Throughput: 0: 4880.3. Samples: 12560682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:42,852][07361] Avg episode reward: [(0, '42.474')] [2025-01-05 16:06:44,388][07482] Updated weights for policy 0, policy_version 309678 (0.0017) [2025-01-05 16:06:46,452][07482] Updated weights for policy 0, policy_version 309688 (0.0016) [2025-01-05 16:06:47,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1268506624. Throughput: 0: 4870.4. Samples: 12589844. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:47,852][07361] Avg episode reward: [(0, '42.192')] [2025-01-05 16:06:48,542][07482] Updated weights for policy 0, policy_version 309698 (0.0016) [2025-01-05 16:06:50,619][07482] Updated weights for policy 0, policy_version 309708 (0.0015) [2025-01-05 16:06:52,688][07482] Updated weights for policy 0, policy_version 309718 (0.0017) [2025-01-05 16:06:52,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1268604928. Throughput: 0: 4873.8. Samples: 12604632. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:52,852][07361] Avg episode reward: [(0, '43.597')] [2025-01-05 16:06:54,788][07482] Updated weights for policy 0, policy_version 309728 (0.0017) [2025-01-05 16:06:56,882][07482] Updated weights for policy 0, policy_version 309738 (0.0016) [2025-01-05 16:06:57,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19397.0). Total num frames: 1268703232. Throughput: 0: 4873.3. Samples: 12634136. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:06:57,852][07361] Avg episode reward: [(0, '39.944')] [2025-01-05 16:06:59,017][07482] Updated weights for policy 0, policy_version 309748 (0.0020) [2025-01-05 16:07:01,051][07482] Updated weights for policy 0, policy_version 309758 (0.0016) [2025-01-05 16:07:02,851][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19397.0). Total num frames: 1268801536. Throughput: 0: 4865.0. Samples: 12663220. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:02,852][07361] Avg episode reward: [(0, '40.439')] [2025-01-05 16:07:03,221][07482] Updated weights for policy 0, policy_version 309768 (0.0017) [2025-01-05 16:07:05,260][07482] Updated weights for policy 0, policy_version 309778 (0.0016) [2025-01-05 16:07:07,326][07482] Updated weights for policy 0, policy_version 309788 (0.0015) [2025-01-05 16:07:07,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19524.2, 300 sec: 19410.9). Total num frames: 1268899840. Throughput: 0: 4870.7. Samples: 12678036. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:07,852][07361] Avg episode reward: [(0, '43.117')] [2025-01-05 16:07:09,505][07482] Updated weights for policy 0, policy_version 309798 (0.0017) [2025-01-05 16:07:11,571][07482] Updated weights for policy 0, policy_version 309808 (0.0019) [2025-01-05 16:07:12,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19456.0, 300 sec: 19397.0). Total num frames: 1268994048. Throughput: 0: 4872.7. Samples: 12707312. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:12,852][07361] Avg episode reward: [(0, '43.036')] [2025-01-05 16:07:13,679][07482] Updated weights for policy 0, policy_version 309818 (0.0017) [2025-01-05 16:07:15,773][07482] Updated weights for policy 0, policy_version 309828 (0.0016) [2025-01-05 16:07:17,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19456.1, 300 sec: 19383.1). Total num frames: 1269092352. Throughput: 0: 4869.2. Samples: 12736518. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:17,852][07361] Avg episode reward: [(0, '41.064')] [2025-01-05 16:07:17,864][07482] Updated weights for policy 0, policy_version 309838 (0.0016) [2025-01-05 16:07:19,969][07482] Updated weights for policy 0, policy_version 309848 (0.0018) [2025-01-05 16:07:22,048][07482] Updated weights for policy 0, policy_version 309858 (0.0015) [2025-01-05 16:07:22,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19383.1). Total num frames: 1269190656. Throughput: 0: 4869.4. Samples: 12751126. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:22,852][07361] Avg episode reward: [(0, '42.063')] [2025-01-05 16:07:24,175][07482] Updated weights for policy 0, policy_version 309868 (0.0017) [2025-01-05 16:07:26,212][07482] Updated weights for policy 0, policy_version 309878 (0.0016) [2025-01-05 16:07:27,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19383.1). Total num frames: 1269288960. Throughput: 0: 4886.9. Samples: 12780594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:27,852][07361] Avg episode reward: [(0, '39.980')] [2025-01-05 16:07:28,390][07482] Updated weights for policy 0, policy_version 309888 (0.0017) [2025-01-05 16:07:30,446][07482] Updated weights for policy 0, policy_version 309898 (0.0016) [2025-01-05 16:07:32,472][07482] Updated weights for policy 0, policy_version 309908 (0.0016) [2025-01-05 16:07:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19383.1). 
Total num frames: 1269387264. Throughput: 0: 4894.1. Samples: 12810078. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:32,852][07361] Avg episode reward: [(0, '38.277')] [2025-01-05 16:07:34,632][07482] Updated weights for policy 0, policy_version 309918 (0.0017) [2025-01-05 16:07:36,690][07482] Updated weights for policy 0, policy_version 309928 (0.0017) [2025-01-05 16:07:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19383.1). Total num frames: 1269485568. Throughput: 0: 4887.9. Samples: 12824590. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:07:37,852][07361] Avg episode reward: [(0, '38.529')] [2025-01-05 16:07:38,810][07482] Updated weights for policy 0, policy_version 309938 (0.0017) [2025-01-05 16:07:40,877][07482] Updated weights for policy 0, policy_version 309948 (0.0017) [2025-01-05 16:07:42,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19383.1). Total num frames: 1269583872. Throughput: 0: 4886.9. Samples: 12854048. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:07:42,852][07361] Avg episode reward: [(0, '41.782')] [2025-01-05 16:07:42,985][07482] Updated weights for policy 0, policy_version 309958 (0.0017) [2025-01-05 16:07:45,019][07482] Updated weights for policy 0, policy_version 309968 (0.0015) [2025-01-05 16:07:47,076][07482] Updated weights for policy 0, policy_version 309978 (0.0017) [2025-01-05 16:07:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19383.1). Total num frames: 1269682176. Throughput: 0: 4901.6. Samples: 12883794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:07:47,852][07361] Avg episode reward: [(0, '41.890')] [2025-01-05 16:07:49,221][07482] Updated weights for policy 0, policy_version 309988 (0.0018) [2025-01-05 16:07:51,275][07482] Updated weights for policy 0, policy_version 309998 (0.0015) [2025-01-05 16:07:52,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19592.5, 300 sec: 19383.1). Total num frames: 1269780480. 
Throughput: 0: 4895.4. Samples: 12898330. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:07:52,852][07361] Avg episode reward: [(0, '41.342')] [2025-01-05 16:07:53,394][07482] Updated weights for policy 0, policy_version 310008 (0.0016) [2025-01-05 16:07:55,459][07482] Updated weights for policy 0, policy_version 310018 (0.0016) [2025-01-05 16:07:57,498][07482] Updated weights for policy 0, policy_version 310028 (0.0016) [2025-01-05 16:07:57,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19383.1). Total num frames: 1269878784. Throughput: 0: 4901.1. Samples: 12927860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:07:57,852][07361] Avg episode reward: [(0, '40.387')] [2025-01-05 16:07:59,672][07482] Updated weights for policy 0, policy_version 310038 (0.0016) [2025-01-05 16:08:01,728][07482] Updated weights for policy 0, policy_version 310048 (0.0015) [2025-01-05 16:08:02,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19397.0). Total num frames: 1269977088. Throughput: 0: 4902.7. Samples: 12957138. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:02,852][07361] Avg episode reward: [(0, '42.072')] [2025-01-05 16:08:03,841][07482] Updated weights for policy 0, policy_version 310058 (0.0017) [2025-01-05 16:08:05,908][07482] Updated weights for policy 0, policy_version 310068 (0.0016) [2025-01-05 16:08:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.6, 300 sec: 19397.0). Total num frames: 1270075392. Throughput: 0: 4906.1. Samples: 12971900. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:07,852][07361] Avg episode reward: [(0, '41.556')] [2025-01-05 16:08:07,989][07482] Updated weights for policy 0, policy_version 310078 (0.0017) [2025-01-05 16:08:10,049][07482] Updated weights for policy 0, policy_version 310088 (0.0017) [2025-01-05 16:08:12,120][07482] Updated weights for policy 0, policy_version 310098 (0.0016) [2025-01-05 16:08:12,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19397.0). Total num frames: 1270173696. Throughput: 0: 4911.1. Samples: 13001594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:12,852][07361] Avg episode reward: [(0, '40.842')] [2025-01-05 16:08:14,245][07482] Updated weights for policy 0, policy_version 310108 (0.0017) [2025-01-05 16:08:16,295][07482] Updated weights for policy 0, policy_version 310118 (0.0016) [2025-01-05 16:08:17,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19397.0). Total num frames: 1270272000. Throughput: 0: 4905.6. Samples: 13030830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:17,852][07361] Avg episode reward: [(0, '39.046')] [2025-01-05 16:08:18,457][07482] Updated weights for policy 0, policy_version 310128 (0.0019) [2025-01-05 16:08:20,479][07482] Updated weights for policy 0, policy_version 310138 (0.0016) [2025-01-05 16:08:22,526][07482] Updated weights for policy 0, policy_version 310148 (0.0015) [2025-01-05 16:08:22,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19397.0). Total num frames: 1270370304. Throughput: 0: 4913.0. Samples: 13045674. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:22,852][07361] Avg episode reward: [(0, '41.602')] [2025-01-05 16:08:24,695][07482] Updated weights for policy 0, policy_version 310158 (0.0017) [2025-01-05 16:08:26,702][07482] Updated weights for policy 0, policy_version 310168 (0.0016) [2025-01-05 16:08:27,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19397.0). 
Total num frames: 1270468608. Throughput: 0: 4915.4. Samples: 13075240. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:27,852][07361] Avg episode reward: [(0, '40.635')] [2025-01-05 16:08:28,750][07482] Updated weights for policy 0, policy_version 310178 (0.0016) [2025-01-05 16:08:30,814][07482] Updated weights for policy 0, policy_version 310188 (0.0015) [2025-01-05 16:08:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19397.0). Total num frames: 1270566912. Throughput: 0: 4915.3. Samples: 13104980. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:32,852][07361] Avg episode reward: [(0, '41.705')] [2025-01-05 16:08:32,906][07482] Updated weights for policy 0, policy_version 310198 (0.0016) [2025-01-05 16:08:35,055][07482] Updated weights for policy 0, policy_version 310208 (0.0016) [2025-01-05 16:08:37,131][07482] Updated weights for policy 0, policy_version 310218 (0.0016) [2025-01-05 16:08:37,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.9, 300 sec: 19397.0). Total num frames: 1270665216. Throughput: 0: 4914.8. Samples: 13119496. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:37,852][07361] Avg episode reward: [(0, '40.994')] [2025-01-05 16:08:37,938][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000310222_1270669312.pth... [2025-01-05 16:08:37,992][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000309086_1266016256.pth [2025-01-05 16:08:39,244][07482] Updated weights for policy 0, policy_version 310228 (0.0017) [2025-01-05 16:08:41,311][07482] Updated weights for policy 0, policy_version 310238 (0.0016) [2025-01-05 16:08:42,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1270763520. Throughput: 0: 4912.4. Samples: 13148920. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:42,852][07361] Avg episode reward: [(0, '39.982')] [2025-01-05 16:08:43,380][07482] Updated weights for policy 0, policy_version 310248 (0.0016) [2025-01-05 16:08:45,387][07482] Updated weights for policy 0, policy_version 310258 (0.0016) [2025-01-05 16:08:47,453][07482] Updated weights for policy 0, policy_version 310268 (0.0018) [2025-01-05 16:08:47,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.9, 300 sec: 19410.9). Total num frames: 1270861824. Throughput: 0: 4926.9. Samples: 13178846. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:47,852][07361] Avg episode reward: [(0, '41.163')] [2025-01-05 16:08:49,567][07482] Updated weights for policy 0, policy_version 310278 (0.0016) [2025-01-05 16:08:51,586][07482] Updated weights for policy 0, policy_version 310288 (0.0017) [2025-01-05 16:08:52,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1270960128. Throughput: 0: 4928.9. Samples: 13193700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:52,852][07361] Avg episode reward: [(0, '43.496')] [2025-01-05 16:08:53,725][07482] Updated weights for policy 0, policy_version 310298 (0.0018) [2025-01-05 16:08:55,823][07482] Updated weights for policy 0, policy_version 310308 (0.0016) [2025-01-05 16:08:57,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1271058432. Throughput: 0: 4920.9. Samples: 13223032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:08:57,852][07361] Avg episode reward: [(0, '44.915')] [2025-01-05 16:08:57,901][07482] Updated weights for policy 0, policy_version 310318 (0.0017) [2025-01-05 16:09:00,059][07482] Updated weights for policy 0, policy_version 310328 (0.0017) [2025-01-05 16:09:02,157][07482] Updated weights for policy 0, policy_version 310338 (0.0015) [2025-01-05 16:09:02,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19410.9). 
Total num frames: 1271156736. Throughput: 0: 4919.0. Samples: 13252184. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:02,852][07361] Avg episode reward: [(0, '42.215')] [2025-01-05 16:09:04,211][07482] Updated weights for policy 0, policy_version 310348 (0.0016) [2025-01-05 16:09:06,280][07482] Updated weights for policy 0, policy_version 310358 (0.0016) [2025-01-05 16:09:07,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19424.8). Total num frames: 1271255040. Throughput: 0: 4916.2. Samples: 13266902. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:07,852][07361] Avg episode reward: [(0, '42.568')] [2025-01-05 16:09:08,453][07482] Updated weights for policy 0, policy_version 310368 (0.0017) [2025-01-05 16:09:10,455][07482] Updated weights for policy 0, policy_version 310378 (0.0015) [2025-01-05 16:09:12,505][07482] Updated weights for policy 0, policy_version 310388 (0.0015) [2025-01-05 16:09:12,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19410.9). Total num frames: 1271353344. Throughput: 0: 4915.3. Samples: 13296428. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:12,852][07361] Avg episode reward: [(0, '41.828')] [2025-01-05 16:09:14,650][07482] Updated weights for policy 0, policy_version 310398 (0.0016) [2025-01-05 16:09:16,678][07482] Updated weights for policy 0, policy_version 310408 (0.0016) [2025-01-05 16:09:17,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19660.8, 300 sec: 19424.8). Total num frames: 1271451648. Throughput: 0: 4909.8. Samples: 13325922. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:17,852][07361] Avg episode reward: [(0, '42.562')] [2025-01-05 16:09:18,801][07482] Updated weights for policy 0, policy_version 310418 (0.0017) [2025-01-05 16:09:20,881][07482] Updated weights for policy 0, policy_version 310428 (0.0016) [2025-01-05 16:09:22,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19424.8). Total num frames: 1271549952. 
Throughput: 0: 4914.8. Samples: 13340660. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:22,852][07361] Avg episode reward: [(0, '41.554')] [2025-01-05 16:09:22,986][07482] Updated weights for policy 0, policy_version 310438 (0.0017) [2025-01-05 16:09:25,060][07482] Updated weights for policy 0, policy_version 310448 (0.0016) [2025-01-05 16:09:27,184][07482] Updated weights for policy 0, policy_version 310458 (0.0017) [2025-01-05 16:09:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19424.7). Total num frames: 1271648256. Throughput: 0: 4911.1. Samples: 13369922. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:27,852][07361] Avg episode reward: [(0, '43.499')] [2025-01-05 16:09:29,298][07482] Updated weights for policy 0, policy_version 310468 (0.0017) [2025-01-05 16:09:31,387][07482] Updated weights for policy 0, policy_version 310478 (0.0020) [2025-01-05 16:09:32,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.5, 300 sec: 19410.9). Total num frames: 1271742464. Throughput: 0: 4892.8. Samples: 13399024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:32,852][07361] Avg episode reward: [(0, '43.907')] [2025-01-05 16:09:33,559][07482] Updated weights for policy 0, policy_version 310488 (0.0017) [2025-01-05 16:09:35,598][07482] Updated weights for policy 0, policy_version 310498 (0.0017) [2025-01-05 16:09:37,673][07482] Updated weights for policy 0, policy_version 310508 (0.0016) [2025-01-05 16:09:37,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19592.5, 300 sec: 19424.8). Total num frames: 1271840768. Throughput: 0: 4885.1. Samples: 13413532. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:37,852][07361] Avg episode reward: [(0, '44.547')] [2025-01-05 16:09:39,878][07482] Updated weights for policy 0, policy_version 310518 (0.0017) [2025-01-05 16:09:41,884][07482] Updated weights for policy 0, policy_version 310528 (0.0016) [2025-01-05 16:09:42,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.6, 300 sec: 19424.8). Total num frames: 1271939072. Throughput: 0: 4883.6. Samples: 13442792. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:42,852][07361] Avg episode reward: [(0, '43.640')] [2025-01-05 16:09:43,951][07482] Updated weights for policy 0, policy_version 310538 (0.0015) [2025-01-05 16:09:46,043][07482] Updated weights for policy 0, policy_version 310548 (0.0016) [2025-01-05 16:09:47,851][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19424.8). Total num frames: 1272037376. Throughput: 0: 4892.5. Samples: 13472348. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:47,852][07361] Avg episode reward: [(0, '41.832')] [2025-01-05 16:09:48,181][07482] Updated weights for policy 0, policy_version 310558 (0.0016) [2025-01-05 16:09:50,254][07482] Updated weights for policy 0, policy_version 310568 (0.0017) [2025-01-05 16:09:52,370][07482] Updated weights for policy 0, policy_version 310578 (0.0016) [2025-01-05 16:09:52,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19438.6). Total num frames: 1272135680. Throughput: 0: 4892.6. Samples: 13487070. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:52,852][07361] Avg episode reward: [(0, '42.163')] [2025-01-05 16:09:54,510][07482] Updated weights for policy 0, policy_version 310588 (0.0018) [2025-01-05 16:09:56,600][07482] Updated weights for policy 0, policy_version 310598 (0.0017) [2025-01-05 16:09:57,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19438.7). Total num frames: 1272229888. Throughput: 0: 4880.5. Samples: 13516048. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:09:57,852][07361] Avg episode reward: [(0, '42.320')] [2025-01-05 16:09:58,774][07482] Updated weights for policy 0, policy_version 310608 (0.0017) [2025-01-05 16:10:00,778][07482] Updated weights for policy 0, policy_version 310618 (0.0016) [2025-01-05 16:10:02,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19452.5). Total num frames: 1272328192. Throughput: 0: 4875.7. Samples: 13545326. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:02,852][07361] Avg episode reward: [(0, '41.007')] [2025-01-05 16:10:02,882][07482] Updated weights for policy 0, policy_version 310628 (0.0016) [2025-01-05 16:10:05,091][07482] Updated weights for policy 0, policy_version 310638 (0.0018) [2025-01-05 16:10:07,135][07482] Updated weights for policy 0, policy_version 310648 (0.0019) [2025-01-05 16:10:07,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19480.3). Total num frames: 1272426496. Throughput: 0: 4865.0. Samples: 13559584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:07,852][07361] Avg episode reward: [(0, '41.899')] [2025-01-05 16:10:09,316][07482] Updated weights for policy 0, policy_version 310658 (0.0016) [2025-01-05 16:10:11,429][07482] Updated weights for policy 0, policy_version 310668 (0.0017) [2025-01-05 16:10:12,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19456.0, 300 sec: 19480.3). Total num frames: 1272520704. Throughput: 0: 4861.8. Samples: 13588704. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:12,852][07361] Avg episode reward: [(0, '41.871')] [2025-01-05 16:10:13,580][07482] Updated weights for policy 0, policy_version 310678 (0.0017) [2025-01-05 16:10:15,635][07482] Updated weights for policy 0, policy_version 310688 (0.0016) [2025-01-05 16:10:17,710][07482] Updated weights for policy 0, policy_version 310698 (0.0016) [2025-01-05 16:10:17,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19456.0, 300 sec: 19480.3). 
Total num frames: 1272619008. Throughput: 0: 4866.0. Samples: 13617994. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:17,852][07361] Avg episode reward: [(0, '42.358')] [2025-01-05 16:10:19,871][07482] Updated weights for policy 0, policy_version 310708 (0.0017) [2025-01-05 16:10:21,957][07482] Updated weights for policy 0, policy_version 310718 (0.0016) [2025-01-05 16:10:22,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19480.3). Total num frames: 1272713216. Throughput: 0: 4861.1. Samples: 13632284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:22,852][07361] Avg episode reward: [(0, '42.163')] [2025-01-05 16:10:24,166][07482] Updated weights for policy 0, policy_version 310728 (0.0017) [2025-01-05 16:10:26,216][07482] Updated weights for policy 0, policy_version 310738 (0.0016) [2025-01-05 16:10:27,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19508.1). Total num frames: 1272811520. Throughput: 0: 4855.4. Samples: 13661284. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:27,852][07361] Avg episode reward: [(0, '41.993')] [2025-01-05 16:10:28,416][07482] Updated weights for policy 0, policy_version 310748 (0.0018) [2025-01-05 16:10:30,515][07482] Updated weights for policy 0, policy_version 310758 (0.0016) [2025-01-05 16:10:32,629][07482] Updated weights for policy 0, policy_version 310768 (0.0017) [2025-01-05 16:10:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19522.0). Total num frames: 1272909824. Throughput: 0: 4839.9. Samples: 13690142. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:32,852][07361] Avg episode reward: [(0, '44.330')] [2025-01-05 16:10:34,839][07482] Updated weights for policy 0, policy_version 310778 (0.0017) [2025-01-05 16:10:36,955][07482] Updated weights for policy 0, policy_version 310788 (0.0016) [2025-01-05 16:10:37,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1273004032. 
Throughput: 0: 4822.8. Samples: 13704098. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:37,852][07361] Avg episode reward: [(0, '41.685')] [2025-01-05 16:10:37,858][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000310792_1273004032.pth... [2025-01-05 16:10:37,914][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000309647_1268314112.pth [2025-01-05 16:10:39,112][07482] Updated weights for policy 0, policy_version 310798 (0.0017) [2025-01-05 16:10:41,183][07482] Updated weights for policy 0, policy_version 310808 (0.0017) [2025-01-05 16:10:42,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19535.8). Total num frames: 1273102336. Throughput: 0: 4823.8. Samples: 13733118. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:42,852][07361] Avg episode reward: [(0, '41.915')] [2025-01-05 16:10:43,271][07482] Updated weights for policy 0, policy_version 310818 (0.0016) [2025-01-05 16:10:45,329][07482] Updated weights for policy 0, policy_version 310828 (0.0016) [2025-01-05 16:10:47,399][07482] Updated weights for policy 0, policy_version 310838 (0.0016) [2025-01-05 16:10:47,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19387.7, 300 sec: 19535.8). Total num frames: 1273200640. Throughput: 0: 4831.9. Samples: 13762764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:47,852][07361] Avg episode reward: [(0, '43.180')] [2025-01-05 16:10:49,539][07482] Updated weights for policy 0, policy_version 310848 (0.0016) [2025-01-05 16:10:51,598][07482] Updated weights for policy 0, policy_version 310858 (0.0016) [2025-01-05 16:10:52,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19319.5, 300 sec: 19535.8). Total num frames: 1273294848. Throughput: 0: 4839.7. Samples: 13777370. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:52,852][07361] Avg episode reward: [(0, '43.058')] [2025-01-05 16:10:53,802][07482] Updated weights for policy 0, policy_version 310868 (0.0017) [2025-01-05 16:10:55,886][07482] Updated weights for policy 0, policy_version 310878 (0.0016) [2025-01-05 16:10:57,852][07361] Fps is (10 sec: 18841.7, 60 sec: 19319.4, 300 sec: 19522.0). Total num frames: 1273389056. Throughput: 0: 4835.7. Samples: 13806312. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:10:57,852][07361] Avg episode reward: [(0, '45.660')] [2025-01-05 16:10:57,858][07448] Saving new best policy, reward=45.660! [2025-01-05 16:10:58,040][07482] Updated weights for policy 0, policy_version 310888 (0.0017) [2025-01-05 16:11:00,158][07482] Updated weights for policy 0, policy_version 310898 (0.0016) [2025-01-05 16:11:02,248][07482] Updated weights for policy 0, policy_version 310908 (0.0017) [2025-01-05 16:11:02,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1273487360. Throughput: 0: 4830.5. Samples: 13835368. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:02,852][07361] Avg episode reward: [(0, '46.204')] [2025-01-05 16:11:02,896][07448] Saving new best policy, reward=46.204! [2025-01-05 16:11:04,425][07482] Updated weights for policy 0, policy_version 310918 (0.0016) [2025-01-05 16:11:06,488][07482] Updated weights for policy 0, policy_version 310928 (0.0016) [2025-01-05 16:11:07,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1273585664. Throughput: 0: 4827.8. Samples: 13849534. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:07,852][07361] Avg episode reward: [(0, '44.081')] [2025-01-05 16:11:08,712][07482] Updated weights for policy 0, policy_version 310938 (0.0020) [2025-01-05 16:11:10,740][07482] Updated weights for policy 0, policy_version 310948 (0.0016) [2025-01-05 16:11:12,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19319.4, 300 sec: 19508.1). Total num frames: 1273679872. Throughput: 0: 4829.2. Samples: 13878596. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:12,852][07361] Avg episode reward: [(0, '43.275')] [2025-01-05 16:11:12,859][07482] Updated weights for policy 0, policy_version 310958 (0.0016) [2025-01-05 16:11:15,006][07482] Updated weights for policy 0, policy_version 310968 (0.0016) [2025-01-05 16:11:17,023][07482] Updated weights for policy 0, policy_version 310978 (0.0016) [2025-01-05 16:11:17,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.4, 300 sec: 19521.9). Total num frames: 1273778176. Throughput: 0: 4841.4. Samples: 13908004. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:17,852][07361] Avg episode reward: [(0, '43.002')] [2025-01-05 16:11:19,143][07482] Updated weights for policy 0, policy_version 310988 (0.0017) [2025-01-05 16:11:21,248][07482] Updated weights for policy 0, policy_version 310998 (0.0018) [2025-01-05 16:11:22,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19522.0). Total num frames: 1273876480. Throughput: 0: 4856.2. Samples: 13922628. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:22,852][07361] Avg episode reward: [(0, '45.248')] [2025-01-05 16:11:23,343][07482] Updated weights for policy 0, policy_version 311008 (0.0017) [2025-01-05 16:11:25,441][07482] Updated weights for policy 0, policy_version 311018 (0.0016) [2025-01-05 16:11:27,522][07482] Updated weights for policy 0, policy_version 311028 (0.0016) [2025-01-05 16:11:27,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19387.7, 300 sec: 19522.0). 
Total num frames: 1273974784. Throughput: 0: 4863.6. Samples: 13951982. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:27,852][07361] Avg episode reward: [(0, '42.081')] [2025-01-05 16:11:29,663][07482] Updated weights for policy 0, policy_version 311038 (0.0018) [2025-01-05 16:11:31,729][07482] Updated weights for policy 0, policy_version 311048 (0.0015) [2025-01-05 16:11:32,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19387.7, 300 sec: 19521.9). Total num frames: 1274073088. Throughput: 0: 4848.9. Samples: 13980966. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:32,852][07361] Avg episode reward: [(0, '43.892')] [2025-01-05 16:11:33,922][07482] Updated weights for policy 0, policy_version 311058 (0.0015) [2025-01-05 16:11:35,929][07482] Updated weights for policy 0, policy_version 311068 (0.0017) [2025-01-05 16:11:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19535.8). Total num frames: 1274171392. Throughput: 0: 4849.8. Samples: 13995610. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:37,852][07361] Avg episode reward: [(0, '46.775')] [2025-01-05 16:11:37,858][07448] Saving new best policy, reward=46.775! [2025-01-05 16:11:38,019][07482] Updated weights for policy 0, policy_version 311078 (0.0017) [2025-01-05 16:11:40,168][07482] Updated weights for policy 0, policy_version 311088 (0.0017) [2025-01-05 16:11:42,187][07482] Updated weights for policy 0, policy_version 311098 (0.0016) [2025-01-05 16:11:42,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19535.9). Total num frames: 1274269696. Throughput: 0: 4861.1. Samples: 14025060. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:42,852][07361] Avg episode reward: [(0, '46.491')] [2025-01-05 16:11:44,298][07482] Updated weights for policy 0, policy_version 311108 (0.0016) [2025-01-05 16:11:46,397][07482] Updated weights for policy 0, policy_version 311118 (0.0016) [2025-01-05 16:11:47,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19387.8, 300 sec: 19521.9). Total num frames: 1274363904. Throughput: 0: 4864.7. Samples: 14054280. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:47,852][07361] Avg episode reward: [(0, '45.473')] [2025-01-05 16:11:48,581][07482] Updated weights for policy 0, policy_version 311128 (0.0017) [2025-01-05 16:11:50,629][07482] Updated weights for policy 0, policy_version 311138 (0.0015) [2025-01-05 16:11:52,761][07482] Updated weights for policy 0, policy_version 311148 (0.0016) [2025-01-05 16:11:52,852][07361] Fps is (10 sec: 19250.8, 60 sec: 19456.0, 300 sec: 19521.9). Total num frames: 1274462208. Throughput: 0: 4872.5. Samples: 14068796. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:52,852][07361] Avg episode reward: [(0, '43.301')] [2025-01-05 16:11:54,910][07482] Updated weights for policy 0, policy_version 311158 (0.0017) [2025-01-05 16:11:56,972][07482] Updated weights for policy 0, policy_version 311168 (0.0017) [2025-01-05 16:11:57,852][07361] Fps is (10 sec: 19249.8, 60 sec: 19455.8, 300 sec: 19508.0). Total num frames: 1274556416. Throughput: 0: 4870.2. Samples: 14097758. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:11:57,853][07361] Avg episode reward: [(0, '44.286')] [2025-01-05 16:11:59,169][07482] Updated weights for policy 0, policy_version 311178 (0.0017) [2025-01-05 16:12:01,204][07482] Updated weights for policy 0, policy_version 311188 (0.0016) [2025-01-05 16:12:02,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19508.1). Total num frames: 1274654720. Throughput: 0: 4859.7. Samples: 14126690. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:02,852][07361] Avg episode reward: [(0, '46.816')] [2025-01-05 16:12:02,853][07448] Saving new best policy, reward=46.816! [2025-01-05 16:12:03,456][07482] Updated weights for policy 0, policy_version 311198 (0.0019) [2025-01-05 16:12:05,548][07482] Updated weights for policy 0, policy_version 311208 (0.0015) [2025-01-05 16:12:07,585][07482] Updated weights for policy 0, policy_version 311218 (0.0016) [2025-01-05 16:12:07,852][07361] Fps is (10 sec: 19661.5, 60 sec: 19455.9, 300 sec: 19521.9). Total num frames: 1274753024. Throughput: 0: 4853.6. Samples: 14141040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:07,852][07361] Avg episode reward: [(0, '46.927')] [2025-01-05 16:12:07,859][07448] Saving new best policy, reward=46.927! [2025-01-05 16:12:09,877][07482] Updated weights for policy 0, policy_version 311228 (0.0018) [2025-01-05 16:12:11,970][07482] Updated weights for policy 0, policy_version 311238 (0.0016) [2025-01-05 16:12:12,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19508.1). Total num frames: 1274847232. Throughput: 0: 4838.7. Samples: 14169724. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:12,852][07361] Avg episode reward: [(0, '45.805')] [2025-01-05 16:12:14,139][07482] Updated weights for policy 0, policy_version 311248 (0.0017) [2025-01-05 16:12:16,225][07482] Updated weights for policy 0, policy_version 311258 (0.0020) [2025-01-05 16:12:17,852][07361] Fps is (10 sec: 18842.1, 60 sec: 19387.7, 300 sec: 19494.2). Total num frames: 1274941440. Throughput: 0: 4829.3. Samples: 14198286. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:17,852][07361] Avg episode reward: [(0, '44.105')] [2025-01-05 16:12:18,445][07482] Updated weights for policy 0, policy_version 311268 (0.0017) [2025-01-05 16:12:20,463][07482] Updated weights for policy 0, policy_version 311278 (0.0016) [2025-01-05 16:12:22,520][07482] Updated weights for policy 0, policy_version 311288 (0.0016) [2025-01-05 16:12:22,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19387.7, 300 sec: 19494.2). Total num frames: 1275039744. Throughput: 0: 4830.6. Samples: 14212986. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:22,852][07361] Avg episode reward: [(0, '43.465')] [2025-01-05 16:12:24,718][07482] Updated weights for policy 0, policy_version 311298 (0.0020) [2025-01-05 16:12:26,740][07482] Updated weights for policy 0, policy_version 311308 (0.0017) [2025-01-05 16:12:27,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19480.3). Total num frames: 1275133952. Throughput: 0: 4826.1. Samples: 14242236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:27,852][07361] Avg episode reward: [(0, '42.226')] [2025-01-05 16:12:28,895][07482] Updated weights for policy 0, policy_version 311318 (0.0016) [2025-01-05 16:12:30,981][07482] Updated weights for policy 0, policy_version 311328 (0.0018) [2025-01-05 16:12:32,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19480.3). Total num frames: 1275232256. Throughput: 0: 4822.5. Samples: 14271294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:12:32,852][07361] Avg episode reward: [(0, '42.231')] [2025-01-05 16:12:33,103][07482] Updated weights for policy 0, policy_version 311338 (0.0017) [2025-01-05 16:12:35,190][07482] Updated weights for policy 0, policy_version 311348 (0.0017) [2025-01-05 16:12:37,293][07482] Updated weights for policy 0, policy_version 311358 (0.0018) [2025-01-05 16:12:37,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19319.5, 300 sec: 19480.3). 
Total num frames: 1275330560. Throughput: 0: 4826.4. Samples: 14285984. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:12:37,852][07361] Avg episode reward: [(0, '44.860')] [2025-01-05 16:12:37,938][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000311361_1275334656.pth... [2025-01-05 16:12:37,993][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000310222_1270669312.pth [2025-01-05 16:12:39,410][07482] Updated weights for policy 0, policy_version 311368 (0.0018) [2025-01-05 16:12:41,476][07482] Updated weights for policy 0, policy_version 311378 (0.0016) [2025-01-05 16:12:42,851][07361] Fps is (10 sec: 19660.6, 60 sec: 19319.4, 300 sec: 19480.3). Total num frames: 1275428864. Throughput: 0: 4834.5. Samples: 14315306. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:12:42,852][07361] Avg episode reward: [(0, '43.865')] [2025-01-05 16:12:43,662][07482] Updated weights for policy 0, policy_version 311388 (0.0017) [2025-01-05 16:12:45,685][07482] Updated weights for policy 0, policy_version 311398 (0.0016) [2025-01-05 16:12:47,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.4, 300 sec: 19466.4). Total num frames: 1275523072. Throughput: 0: 4833.8. Samples: 14344212. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:12:47,852][07361] Avg episode reward: [(0, '43.507')] [2025-01-05 16:12:47,869][07482] Updated weights for policy 0, policy_version 311408 (0.0017) [2025-01-05 16:12:50,080][07482] Updated weights for policy 0, policy_version 311418 (0.0017) [2025-01-05 16:12:52,157][07482] Updated weights for policy 0, policy_version 311428 (0.0017) [2025-01-05 16:12:52,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19319.5, 300 sec: 19466.4). Total num frames: 1275621376. Throughput: 0: 4830.6. Samples: 14358416. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:12:52,852][07361] Avg episode reward: [(0, '45.123')] [2025-01-05 16:12:54,334][07482] Updated weights for policy 0, policy_version 311438 (0.0018) [2025-01-05 16:12:56,465][07482] Updated weights for policy 0, policy_version 311448 (0.0017) [2025-01-05 16:12:57,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19319.7, 300 sec: 19452.5). Total num frames: 1275715584. Throughput: 0: 4834.4. Samples: 14387274. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:12:57,852][07361] Avg episode reward: [(0, '46.472')] [2025-01-05 16:12:58,578][07482] Updated weights for policy 0, policy_version 311458 (0.0017) [2025-01-05 16:13:00,687][07482] Updated weights for policy 0, policy_version 311468 (0.0016) [2025-01-05 16:13:02,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19319.5, 300 sec: 19452.5). Total num frames: 1275813888. Throughput: 0: 4840.4. Samples: 14416102. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:02,852][07361] Avg episode reward: [(0, '41.708')] [2025-01-05 16:13:02,852][07482] Updated weights for policy 0, policy_version 311478 (0.0018) [2025-01-05 16:13:04,951][07482] Updated weights for policy 0, policy_version 311488 (0.0017) [2025-01-05 16:13:07,059][07482] Updated weights for policy 0, policy_version 311498 (0.0017) [2025-01-05 16:13:07,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19251.3, 300 sec: 19438.7). Total num frames: 1275908096. Throughput: 0: 4832.2. Samples: 14430434. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:07,852][07361] Avg episode reward: [(0, '44.849')] [2025-01-05 16:13:09,236][07482] Updated weights for policy 0, policy_version 311508 (0.0018) [2025-01-05 16:13:11,233][07482] Updated weights for policy 0, policy_version 311518 (0.0018) [2025-01-05 16:13:12,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19319.5, 300 sec: 19438.7). Total num frames: 1276006400. Throughput: 0: 4834.6. Samples: 14459792. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:12,852][07361] Avg episode reward: [(0, '43.670')] [2025-01-05 16:13:13,290][07482] Updated weights for policy 0, policy_version 311528 (0.0015) [2025-01-05 16:13:15,366][07482] Updated weights for policy 0, policy_version 311538 (0.0016) [2025-01-05 16:13:17,387][07482] Updated weights for policy 0, policy_version 311548 (0.0016) [2025-01-05 16:13:17,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19387.7, 300 sec: 19438.6). Total num frames: 1276104704. Throughput: 0: 4855.4. Samples: 14489788. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:17,852][07361] Avg episode reward: [(0, '40.322')] [2025-01-05 16:13:19,499][07482] Updated weights for policy 0, policy_version 311558 (0.0016) [2025-01-05 16:13:21,579][07482] Updated weights for policy 0, policy_version 311568 (0.0018) [2025-01-05 16:13:22,852][07361] Fps is (10 sec: 19660.3, 60 sec: 19387.7, 300 sec: 19438.6). Total num frames: 1276203008. Throughput: 0: 4856.7. Samples: 14504534. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:22,852][07361] Avg episode reward: [(0, '42.230')] [2025-01-05 16:13:23,695][07482] Updated weights for policy 0, policy_version 311578 (0.0017) [2025-01-05 16:13:25,787][07482] Updated weights for policy 0, policy_version 311588 (0.0016) [2025-01-05 16:13:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.0, 300 sec: 19438.6). Total num frames: 1276301312. Throughput: 0: 4853.0. Samples: 14533690. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:27,852][07361] Avg episode reward: [(0, '43.666')] [2025-01-05 16:13:27,946][07482] Updated weights for policy 0, policy_version 311598 (0.0017) [2025-01-05 16:13:30,069][07482] Updated weights for policy 0, policy_version 311608 (0.0016) [2025-01-05 16:13:32,140][07482] Updated weights for policy 0, policy_version 311618 (0.0018) [2025-01-05 16:13:32,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19456.0, 300 sec: 19438.6). 
Total num frames: 1276399616. Throughput: 0: 4856.3. Samples: 14562744. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:32,852][07361] Avg episode reward: [(0, '41.836')] [2025-01-05 16:13:34,325][07482] Updated weights for policy 0, policy_version 311628 (0.0017) [2025-01-05 16:13:36,420][07482] Updated weights for policy 0, policy_version 311638 (0.0016) [2025-01-05 16:13:37,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19387.7, 300 sec: 19424.8). Total num frames: 1276493824. Throughput: 0: 4859.0. Samples: 14577072. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:37,852][07361] Avg episode reward: [(0, '42.939')] [2025-01-05 16:13:38,543][07482] Updated weights for policy 0, policy_version 311648 (0.0017) [2025-01-05 16:13:40,656][07482] Updated weights for policy 0, policy_version 311658 (0.0017) [2025-01-05 16:13:42,680][07482] Updated weights for policy 0, policy_version 311668 (0.0015) [2025-01-05 16:13:42,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19387.7, 300 sec: 19424.7). Total num frames: 1276592128. Throughput: 0: 4867.0. Samples: 14606288. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:42,852][07361] Avg episode reward: [(0, '41.991')] [2025-01-05 16:13:44,773][07482] Updated weights for policy 0, policy_version 311678 (0.0016) [2025-01-05 16:13:46,831][07482] Updated weights for policy 0, policy_version 311688 (0.0015) [2025-01-05 16:13:47,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19456.0, 300 sec: 19424.8). Total num frames: 1276690432. Throughput: 0: 4887.5. Samples: 14636038. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:13:47,852][07361] Avg episode reward: [(0, '42.045')] [2025-01-05 16:13:48,916][07482] Updated weights for policy 0, policy_version 311698 (0.0016) [2025-01-05 16:13:50,975][07482] Updated weights for policy 0, policy_version 311708 (0.0016) [2025-01-05 16:13:52,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19456.0, 300 sec: 19424.8). Total num frames: 1276788736. 
Throughput: 0: 4898.7. Samples: 14650876. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:13:52,852][07361] Avg episode reward: [(0, '43.542')] [2025-01-05 16:13:53,129][07482] Updated weights for policy 0, policy_version 311718 (0.0016) [2025-01-05 16:13:55,176][07482] Updated weights for policy 0, policy_version 311728 (0.0016) [2025-01-05 16:13:57,222][07482] Updated weights for policy 0, policy_version 311738 (0.0017) [2025-01-05 16:13:57,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.3, 300 sec: 19424.8). Total num frames: 1276887040. Throughput: 0: 4899.5. Samples: 14680272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:13:57,852][07361] Avg episode reward: [(0, '44.665')] [2025-01-05 16:13:59,367][07482] Updated weights for policy 0, policy_version 311748 (0.0017) [2025-01-05 16:14:01,392][07482] Updated weights for policy 0, policy_version 311758 (0.0015) [2025-01-05 16:14:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19424.8). Total num frames: 1276985344. Throughput: 0: 4884.9. Samples: 14709610. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:02,852][07361] Avg episode reward: [(0, '46.246')] [2025-01-05 16:14:03,561][07482] Updated weights for policy 0, policy_version 311768 (0.0016) [2025-01-05 16:14:05,644][07482] Updated weights for policy 0, policy_version 311778 (0.0016) [2025-01-05 16:14:07,691][07482] Updated weights for policy 0, policy_version 311788 (0.0016) [2025-01-05 16:14:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19424.8). Total num frames: 1277083648. Throughput: 0: 4882.0. Samples: 14724222. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:07,852][07361] Avg episode reward: [(0, '45.759')] [2025-01-05 16:14:09,839][07482] Updated weights for policy 0, policy_version 311798 (0.0016) [2025-01-05 16:14:11,924][07482] Updated weights for policy 0, policy_version 311808 (0.0016) [2025-01-05 16:14:12,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19424.8). Total num frames: 1277181952. Throughput: 0: 4888.6. Samples: 14753678. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:12,852][07361] Avg episode reward: [(0, '46.200')] [2025-01-05 16:14:14,059][07482] Updated weights for policy 0, policy_version 311818 (0.0019) [2025-01-05 16:14:16,116][07482] Updated weights for policy 0, policy_version 311828 (0.0016) [2025-01-05 16:14:17,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19424.8). Total num frames: 1277280256. Throughput: 0: 4886.5. Samples: 14782638. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:17,852][07361] Avg episode reward: [(0, '44.052')] [2025-01-05 16:14:18,284][07482] Updated weights for policy 0, policy_version 311838 (0.0017) [2025-01-05 16:14:20,311][07482] Updated weights for policy 0, policy_version 311848 (0.0015) [2025-01-05 16:14:22,383][07482] Updated weights for policy 0, policy_version 311858 (0.0016) [2025-01-05 16:14:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.6, 300 sec: 19424.8). Total num frames: 1277378560. Throughput: 0: 4897.1. Samples: 14797442. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:22,852][07361] Avg episode reward: [(0, '44.492')] [2025-01-05 16:14:24,531][07482] Updated weights for policy 0, policy_version 311868 (0.0016) [2025-01-05 16:14:26,579][07482] Updated weights for policy 0, policy_version 311878 (0.0016) [2025-01-05 16:14:27,852][07361] Fps is (10 sec: 19250.7, 60 sec: 19524.2, 300 sec: 19424.7). Total num frames: 1277472768. Throughput: 0: 4900.4. Samples: 14826808. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:27,852][07361] Avg episode reward: [(0, '45.053')] [2025-01-05 16:14:28,722][07482] Updated weights for policy 0, policy_version 311888 (0.0017) [2025-01-05 16:14:30,814][07482] Updated weights for policy 0, policy_version 311898 (0.0016) [2025-01-05 16:14:32,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19524.2, 300 sec: 19424.7). Total num frames: 1277571072. Throughput: 0: 4883.8. Samples: 14855810. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:32,852][07361] Avg episode reward: [(0, '41.028')] [2025-01-05 16:14:32,939][07482] Updated weights for policy 0, policy_version 311908 (0.0017) [2025-01-05 16:14:35,074][07482] Updated weights for policy 0, policy_version 311918 (0.0017) [2025-01-05 16:14:37,158][07482] Updated weights for policy 0, policy_version 311928 (0.0016) [2025-01-05 16:14:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19424.7). Total num frames: 1277669376. Throughput: 0: 4878.0. Samples: 14870386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:37,852][07361] Avg episode reward: [(0, '44.002')] [2025-01-05 16:14:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000311931_1277669376.pth... [2025-01-05 16:14:37,916][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000310792_1273004032.pth [2025-01-05 16:14:39,278][07482] Updated weights for policy 0, policy_version 311938 (0.0017) [2025-01-05 16:14:41,348][07482] Updated weights for policy 0, policy_version 311948 (0.0017) [2025-01-05 16:14:42,852][07361] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19410.9). Total num frames: 1277763584. Throughput: 0: 4875.6. Samples: 14899676. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:42,852][07361] Avg episode reward: [(0, '44.810')] [2025-01-05 16:14:43,555][07482] Updated weights for policy 0, policy_version 311958 (0.0018) [2025-01-05 16:14:45,550][07482] Updated weights for policy 0, policy_version 311968 (0.0016) [2025-01-05 16:14:47,608][07482] Updated weights for policy 0, policy_version 311978 (0.0016) [2025-01-05 16:14:47,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19524.2, 300 sec: 19410.9). Total num frames: 1277861888. Throughput: 0: 4877.9. Samples: 14929114. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:47,852][07361] Avg episode reward: [(0, '43.755')] [2025-01-05 16:14:49,796][07482] Updated weights for policy 0, policy_version 311988 (0.0016) [2025-01-05 16:14:51,813][07482] Updated weights for policy 0, policy_version 311998 (0.0017) [2025-01-05 16:14:52,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19424.8). Total num frames: 1277960192. Throughput: 0: 4875.9. Samples: 14943636. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:52,852][07361] Avg episode reward: [(0, '44.144')] [2025-01-05 16:14:53,859][07482] Updated weights for policy 0, policy_version 312008 (0.0017) [2025-01-05 16:14:55,972][07482] Updated weights for policy 0, policy_version 312018 (0.0015) [2025-01-05 16:14:57,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19424.8). Total num frames: 1278058496. Throughput: 0: 4882.4. Samples: 14973386. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:14:57,852][07361] Avg episode reward: [(0, '45.098')] [2025-01-05 16:14:58,046][07482] Updated weights for policy 0, policy_version 312028 (0.0018) [2025-01-05 16:15:00,117][07482] Updated weights for policy 0, policy_version 312038 (0.0015) [2025-01-05 16:15:02,214][07482] Updated weights for policy 0, policy_version 312048 (0.0015) [2025-01-05 16:15:02,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19424.8). 
Total num frames: 1278156800. Throughput: 0: 4896.2. Samples: 15002968. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) [2025-01-05 16:15:02,852][07361] Avg episode reward: [(0, '40.340')] [2025-01-05 16:15:04,288][07482] Updated weights for policy 0, policy_version 312058 (0.0017) [2025-01-05 16:15:06,357][07482] Updated weights for policy 0, policy_version 312068 (0.0016) [2025-01-05 16:15:07,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19438.6). Total num frames: 1278255104. Throughput: 0: 4891.3. Samples: 15017550. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:07,852][07361] Avg episode reward: [(0, '40.701')] [2025-01-05 16:15:08,572][07482] Updated weights for policy 0, policy_version 312078 (0.0017) [2025-01-05 16:15:10,548][07482] Updated weights for policy 0, policy_version 312088 (0.0015) [2025-01-05 16:15:12,605][07482] Updated weights for policy 0, policy_version 312098 (0.0015) [2025-01-05 16:15:12,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19438.6). Total num frames: 1278353408. Throughput: 0: 4892.1. Samples: 15046952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:12,852][07361] Avg episode reward: [(0, '43.231')] [2025-01-05 16:15:14,794][07482] Updated weights for policy 0, policy_version 312108 (0.0019) [2025-01-05 16:15:16,774][07482] Updated weights for policy 0, policy_version 312118 (0.0016) [2025-01-05 16:15:17,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19452.5). Total num frames: 1278451712. Throughput: 0: 4905.2. Samples: 15076542. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:17,852][07361] Avg episode reward: [(0, '45.252')] [2025-01-05 16:15:18,836][07482] Updated weights for policy 0, policy_version 312128 (0.0016) [2025-01-05 16:15:20,958][07482] Updated weights for policy 0, policy_version 312138 (0.0015) [2025-01-05 16:15:22,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19452.5). Total num frames: 1278550016. 
Throughput: 0: 4913.6. Samples: 15091496. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:22,852][07361] Avg episode reward: [(0, '45.832')] [2025-01-05 16:15:23,019][07482] Updated weights for policy 0, policy_version 312148 (0.0017) [2025-01-05 16:15:25,088][07482] Updated weights for policy 0, policy_version 312158 (0.0016) [2025-01-05 16:15:27,194][07482] Updated weights for policy 0, policy_version 312168 (0.0016) [2025-01-05 16:15:27,851][07361] Fps is (10 sec: 20070.3, 60 sec: 19660.9, 300 sec: 19466.4). Total num frames: 1278652416. Throughput: 0: 4918.1. Samples: 15120992. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:27,852][07361] Avg episode reward: [(0, '41.725')] [2025-01-05 16:15:29,252][07482] Updated weights for policy 0, policy_version 312178 (0.0017) [2025-01-05 16:15:31,306][07482] Updated weights for policy 0, policy_version 312188 (0.0016) [2025-01-05 16:15:32,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19480.3). Total num frames: 1278750720. Throughput: 0: 4918.9. Samples: 15150462. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:32,852][07361] Avg episode reward: [(0, '42.324')] [2025-01-05 16:15:33,470][07482] Updated weights for policy 0, policy_version 312198 (0.0017) [2025-01-05 16:15:35,491][07482] Updated weights for policy 0, policy_version 312208 (0.0016) [2025-01-05 16:15:37,582][07482] Updated weights for policy 0, policy_version 312218 (0.0016) [2025-01-05 16:15:37,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.6, 300 sec: 19466.4). Total num frames: 1278844928. Throughput: 0: 4923.0. Samples: 15165172. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:37,852][07361] Avg episode reward: [(0, '42.313')] [2025-01-05 16:15:39,792][07482] Updated weights for policy 0, policy_version 312228 (0.0018) [2025-01-05 16:15:41,753][07482] Updated weights for policy 0, policy_version 312238 (0.0017) [2025-01-05 16:15:42,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19660.8, 300 sec: 19466.4). Total num frames: 1278943232. Throughput: 0: 4912.5. Samples: 15194446. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:42,852][07361] Avg episode reward: [(0, '41.044')] [2025-01-05 16:15:43,831][07482] Updated weights for policy 0, policy_version 312248 (0.0017) [2025-01-05 16:15:45,949][07482] Updated weights for policy 0, policy_version 312258 (0.0018) [2025-01-05 16:15:47,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.9, 300 sec: 19480.3). Total num frames: 1279041536. Throughput: 0: 4914.9. Samples: 15224138. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:47,852][07361] Avg episode reward: [(0, '43.272')] [2025-01-05 16:15:47,995][07482] Updated weights for policy 0, policy_version 312268 (0.0018) [2025-01-05 16:15:50,073][07482] Updated weights for policy 0, policy_version 312278 (0.0017) [2025-01-05 16:15:52,228][07482] Updated weights for policy 0, policy_version 312288 (0.0020) [2025-01-05 16:15:52,852][07361] Fps is (10 sec: 19659.2, 60 sec: 19660.6, 300 sec: 19494.1). Total num frames: 1279139840. Throughput: 0: 4920.4. Samples: 15238972. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:52,853][07361] Avg episode reward: [(0, '43.989')] [2025-01-05 16:15:54,260][07482] Updated weights for policy 0, policy_version 312298 (0.0016) [2025-01-05 16:15:56,355][07482] Updated weights for policy 0, policy_version 312308 (0.0016) [2025-01-05 16:15:57,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19494.2). Total num frames: 1279238144. Throughput: 0: 4918.5. Samples: 15268284. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:15:57,852][07361] Avg episode reward: [(0, '44.369')] [2025-01-05 16:15:58,511][07482] Updated weights for policy 0, policy_version 312318 (0.0016) [2025-01-05 16:16:00,500][07482] Updated weights for policy 0, policy_version 312328 (0.0016) [2025-01-05 16:16:02,577][07482] Updated weights for policy 0, policy_version 312338 (0.0016) [2025-01-05 16:16:02,851][07361] Fps is (10 sec: 20071.9, 60 sec: 19729.1, 300 sec: 19508.1). Total num frames: 1279340544. Throughput: 0: 4918.5. Samples: 15297874. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:16:02,852][07361] Avg episode reward: [(0, '43.429')] [2025-01-05 16:16:04,804][07482] Updated weights for policy 0, policy_version 312348 (0.0017) [2025-01-05 16:16:06,749][07482] Updated weights for policy 0, policy_version 312358 (0.0015) [2025-01-05 16:16:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1279434752. Throughput: 0: 4906.8. Samples: 15312302. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:16:07,852][07361] Avg episode reward: [(0, '42.242')] [2025-01-05 16:16:08,856][07482] Updated weights for policy 0, policy_version 312368 (0.0017) [2025-01-05 16:16:10,972][07482] Updated weights for policy 0, policy_version 312378 (0.0018) [2025-01-05 16:16:12,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1279533056. Throughput: 0: 4914.8. Samples: 15342158. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:16:12,852][07361] Avg episode reward: [(0, '42.452')] [2025-01-05 16:16:12,991][07482] Updated weights for policy 0, policy_version 312388 (0.0017) [2025-01-05 16:16:15,083][07482] Updated weights for policy 0, policy_version 312398 (0.0018) [2025-01-05 16:16:17,186][07482] Updated weights for policy 0, policy_version 312408 (0.0017) [2025-01-05 16:16:17,851][07361] Fps is (10 sec: 20070.7, 60 sec: 19729.1, 300 sec: 19522.0). 
Total num frames: 1279635456. Throughput: 0: 4917.7. Samples: 15371760. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:16:17,852][07361] Avg episode reward: [(0, '43.132')] [2025-01-05 16:16:19,223][07482] Updated weights for policy 0, policy_version 312418 (0.0017) [2025-01-05 16:16:21,310][07482] Updated weights for policy 0, policy_version 312428 (0.0018) [2025-01-05 16:16:22,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19729.1, 300 sec: 19522.0). Total num frames: 1279733760. Throughput: 0: 4917.8. Samples: 15386474. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:22,852][07361] Avg episode reward: [(0, '42.367')] [2025-01-05 16:16:23,453][07482] Updated weights for policy 0, policy_version 312438 (0.0017) [2025-01-05 16:16:25,428][07482] Updated weights for policy 0, policy_version 312448 (0.0017) [2025-01-05 16:16:27,508][07482] Updated weights for policy 0, policy_version 312458 (0.0016) [2025-01-05 16:16:27,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19660.7, 300 sec: 19521.9). Total num frames: 1279832064. Throughput: 0: 4926.5. Samples: 15416140. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:27,852][07361] Avg episode reward: [(0, '43.668')] [2025-01-05 16:16:29,696][07482] Updated weights for policy 0, policy_version 312468 (0.0018) [2025-01-05 16:16:31,683][07482] Updated weights for policy 0, policy_version 312478 (0.0017) [2025-01-05 16:16:32,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19522.0). Total num frames: 1279930368. Throughput: 0: 4918.4. Samples: 15445468. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:32,852][07361] Avg episode reward: [(0, '41.339')] [2025-01-05 16:16:33,826][07482] Updated weights for policy 0, policy_version 312488 (0.0018) [2025-01-05 16:16:35,946][07482] Updated weights for policy 0, policy_version 312498 (0.0017) [2025-01-05 16:16:37,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19660.8, 300 sec: 19508.1). Total num frames: 1280024576. 
Throughput: 0: 4916.8. Samples: 15460224. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:37,852][07361] Avg episode reward: [(0, '40.832')] [2025-01-05 16:16:37,895][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000312507_1280028672.pth... [2025-01-05 16:16:37,948][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000311361_1275334656.pth [2025-01-05 16:16:38,059][07482] Updated weights for policy 0, policy_version 312508 (0.0018) [2025-01-05 16:16:40,119][07482] Updated weights for policy 0, policy_version 312518 (0.0016) [2025-01-05 16:16:42,248][07448] Signal inference workers to stop experience collection... (50 times) [2025-01-05 16:16:42,251][07448] Signal inference workers to resume experience collection... (50 times) [2025-01-05 16:16:42,257][07482] Updated weights for policy 0, policy_version 312528 (0.0019) [2025-01-05 16:16:42,266][07482] InferenceWorker_p0-w0: stopping experience collection (50 times) [2025-01-05 16:16:42,266][07482] InferenceWorker_p0-w0: resuming experience collection (50 times) [2025-01-05 16:16:42,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19521.9). Total num frames: 1280122880. Throughput: 0: 4913.7. Samples: 15489402. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:42,852][07361] Avg episode reward: [(0, '43.361')] [2025-01-05 16:16:44,285][07482] Updated weights for policy 0, policy_version 312538 (0.0018) [2025-01-05 16:16:46,387][07482] Updated weights for policy 0, policy_version 312548 (0.0017) [2025-01-05 16:16:47,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.7, 300 sec: 19522.0). Total num frames: 1280221184. Throughput: 0: 4907.4. Samples: 15518708. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:47,852][07361] Avg episode reward: [(0, '44.505')] [2025-01-05 16:16:48,580][07482] Updated weights for policy 0, policy_version 312558 (0.0017) [2025-01-05 16:16:50,554][07482] Updated weights for policy 0, policy_version 312568 (0.0016) [2025-01-05 16:16:52,689][07482] Updated weights for policy 0, policy_version 312578 (0.0017) [2025-01-05 16:16:52,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19661.0, 300 sec: 19535.9). Total num frames: 1280319488. Throughput: 0: 4913.6. Samples: 15533416. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:52,852][07361] Avg episode reward: [(0, '44.668')] [2025-01-05 16:16:54,895][07482] Updated weights for policy 0, policy_version 312588 (0.0019) [2025-01-05 16:16:56,851][07482] Updated weights for policy 0, policy_version 312598 (0.0016) [2025-01-05 16:16:57,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19535.8). Total num frames: 1280417792. Throughput: 0: 4897.9. Samples: 15562564. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:16:57,852][07361] Avg episode reward: [(0, '43.487')] [2025-01-05 16:16:58,943][07482] Updated weights for policy 0, policy_version 312608 (0.0018) [2025-01-05 16:17:01,045][07482] Updated weights for policy 0, policy_version 312618 (0.0018) [2025-01-05 16:17:02,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19535.9). Total num frames: 1280516096. Throughput: 0: 4904.1. Samples: 15592444. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:02,852][07361] Avg episode reward: [(0, '43.248')] [2025-01-05 16:17:03,072][07482] Updated weights for policy 0, policy_version 312628 (0.0019) [2025-01-05 16:17:05,162][07482] Updated weights for policy 0, policy_version 312638 (0.0016) [2025-01-05 16:17:07,334][07482] Updated weights for policy 0, policy_version 312648 (0.0019) [2025-01-05 16:17:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19549.7). 
Total num frames: 1280614400. Throughput: 0: 4903.6. Samples: 15607138. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:07,852][07361] Avg episode reward: [(0, '44.564')] [2025-01-05 16:17:09,459][07482] Updated weights for policy 0, policy_version 312658 (0.0019) [2025-01-05 16:17:11,596][07482] Updated weights for policy 0, policy_version 312668 (0.0018) [2025-01-05 16:17:12,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19592.6, 300 sec: 19549.7). Total num frames: 1280708608. Throughput: 0: 4881.8. Samples: 15635818. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:12,852][07361] Avg episode reward: [(0, '43.890')] [2025-01-05 16:17:13,833][07482] Updated weights for policy 0, policy_version 312678 (0.0019) [2025-01-05 16:17:15,798][07482] Updated weights for policy 0, policy_version 312688 (0.0016) [2025-01-05 16:17:17,852][07361] Fps is (10 sec: 19250.8, 60 sec: 19524.2, 300 sec: 19549.7). Total num frames: 1280806912. Throughput: 0: 4874.7. Samples: 15664830. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:17,852][07361] Avg episode reward: [(0, '41.121')] [2025-01-05 16:17:17,936][07482] Updated weights for policy 0, policy_version 312698 (0.0016) [2025-01-05 16:17:20,147][07482] Updated weights for policy 0, policy_version 312708 (0.0016) [2025-01-05 16:17:22,129][07482] Updated weights for policy 0, policy_version 312718 (0.0016) [2025-01-05 16:17:22,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1280905216. Throughput: 0: 4864.2. Samples: 15679112. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:22,852][07361] Avg episode reward: [(0, '42.022')] [2025-01-05 16:17:24,235][07482] Updated weights for policy 0, policy_version 312728 (0.0016) [2025-01-05 16:17:25,745][07448] Signal inference workers to stop experience collection... (100 times) [2025-01-05 16:17:25,746][07448] Signal inference workers to resume experience collection... 
(100 times) [2025-01-05 16:17:25,759][07482] InferenceWorker_p0-w0: stopping experience collection (100 times) [2025-01-05 16:17:25,759][07482] InferenceWorker_p0-w0: resuming experience collection (100 times) [2025-01-05 16:17:26,357][07482] Updated weights for policy 0, policy_version 312738 (0.0016) [2025-01-05 16:17:27,852][07361] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1281003520. Throughput: 0: 4877.0. Samples: 15708868. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:27,852][07361] Avg episode reward: [(0, '43.968')] [2025-01-05 16:17:28,327][07482] Updated weights for policy 0, policy_version 312748 (0.0017) [2025-01-05 16:17:30,438][07482] Updated weights for policy 0, policy_version 312758 (0.0017) [2025-01-05 16:17:32,517][07482] Updated weights for policy 0, policy_version 312768 (0.0017) [2025-01-05 16:17:32,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1281101824. Throughput: 0: 4889.7. Samples: 15738744. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:17:32,852][07361] Avg episode reward: [(0, '41.549')] [2025-01-05 16:17:34,586][07482] Updated weights for policy 0, policy_version 312778 (0.0017) [2025-01-05 16:17:36,667][07482] Updated weights for policy 0, policy_version 312788 (0.0016) [2025-01-05 16:17:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1281200128. Throughput: 0: 4883.7. Samples: 15753184. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:17:37,852][07361] Avg episode reward: [(0, '40.279')] [2025-01-05 16:17:38,830][07482] Updated weights for policy 0, policy_version 312798 (0.0016) [2025-01-05 16:17:40,823][07482] Updated weights for policy 0, policy_version 312808 (0.0015) [2025-01-05 16:17:42,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1281298432. Throughput: 0: 4895.6. Samples: 15782864. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:17:42,852][07361] Avg episode reward: [(0, '39.876')] [2025-01-05 16:17:42,903][07482] Updated weights for policy 0, policy_version 312818 (0.0017) [2025-01-05 16:17:45,045][07482] Updated weights for policy 0, policy_version 312828 (0.0016) [2025-01-05 16:17:47,032][07482] Updated weights for policy 0, policy_version 312838 (0.0015) [2025-01-05 16:17:47,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1281396736. Throughput: 0: 4891.2. Samples: 15812548. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:17:47,852][07361] Avg episode reward: [(0, '44.359')] [2025-01-05 16:17:49,090][07482] Updated weights for policy 0, policy_version 312848 (0.0015) [2025-01-05 16:17:51,213][07482] Updated weights for policy 0, policy_version 312858 (0.0016) [2025-01-05 16:17:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1281495040. Throughput: 0: 4899.7. Samples: 15827624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:17:52,852][07361] Avg episode reward: [(0, '44.402')] [2025-01-05 16:17:53,254][07482] Updated weights for policy 0, policy_version 312868 (0.0017) [2025-01-05 16:17:55,336][07482] Updated weights for policy 0, policy_version 312878 (0.0016) [2025-01-05 16:17:57,437][07482] Updated weights for policy 0, policy_version 312888 (0.0018) [2025-01-05 16:17:57,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1281597440. Throughput: 0: 4916.2. Samples: 15857048. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:17:57,852][07361] Avg episode reward: [(0, '42.616')] [2025-01-05 16:17:59,507][07482] Updated weights for policy 0, policy_version 312898 (0.0017) [2025-01-05 16:18:01,615][07482] Updated weights for policy 0, policy_version 312908 (0.0016) [2025-01-05 16:18:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19605.3). 
Total num frames: 1281691648. Throughput: 0: 4919.4. Samples: 15886202. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:02,852][07361] Avg episode reward: [(0, '43.045')] [2025-01-05 16:18:03,824][07482] Updated weights for policy 0, policy_version 312918 (0.0017) [2025-01-05 16:18:05,782][07482] Updated weights for policy 0, policy_version 312928 (0.0015) [2025-01-05 16:18:07,852][07361] Fps is (10 sec: 19250.8, 60 sec: 19592.5, 300 sec: 19605.2). Total num frames: 1281789952. Throughput: 0: 4926.8. Samples: 15900820. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:07,852][07361] Avg episode reward: [(0, '43.156')] [2025-01-05 16:18:07,911][07482] Updated weights for policy 0, policy_version 312938 (0.0016) [2025-01-05 16:18:10,098][07482] Updated weights for policy 0, policy_version 312948 (0.0018) [2025-01-05 16:18:12,068][07482] Updated weights for policy 0, policy_version 312958 (0.0016) [2025-01-05 16:18:12,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1281888256. Throughput: 0: 4920.1. Samples: 15930274. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:12,852][07361] Avg episode reward: [(0, '38.748')] [2025-01-05 16:18:14,164][07482] Updated weights for policy 0, policy_version 312968 (0.0018) [2025-01-05 16:18:16,274][07482] Updated weights for policy 0, policy_version 312978 (0.0016) [2025-01-05 16:18:17,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.9, 300 sec: 19605.3). Total num frames: 1281986560. Throughput: 0: 4912.5. Samples: 15959808. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:17,852][07361] Avg episode reward: [(0, '40.636')] [2025-01-05 16:18:18,340][07482] Updated weights for policy 0, policy_version 312988 (0.0017) [2025-01-05 16:18:20,429][07482] Updated weights for policy 0, policy_version 312998 (0.0016) [2025-01-05 16:18:22,544][07482] Updated weights for policy 0, policy_version 313008 (0.0016) [2025-01-05 16:18:22,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1282084864. Throughput: 0: 4916.3. Samples: 15974418. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:22,852][07361] Avg episode reward: [(0, '42.833')] [2025-01-05 16:18:24,578][07482] Updated weights for policy 0, policy_version 313018 (0.0016) [2025-01-05 16:18:26,671][07482] Updated weights for policy 0, policy_version 313028 (0.0015) [2025-01-05 16:18:27,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19605.3). Total num frames: 1282183168. Throughput: 0: 4916.1. Samples: 16004088. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:27,852][07361] Avg episode reward: [(0, '40.819')] [2025-01-05 16:18:28,869][07482] Updated weights for policy 0, policy_version 313038 (0.0016) [2025-01-05 16:18:30,849][07482] Updated weights for policy 0, policy_version 313048 (0.0016) [2025-01-05 16:18:32,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19619.2). Total num frames: 1282281472. Throughput: 0: 4908.8. Samples: 16033444. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:32,852][07361] Avg episode reward: [(0, '41.553')] [2025-01-05 16:18:32,931][07482] Updated weights for policy 0, policy_version 313058 (0.0017) [2025-01-05 16:18:35,036][07482] Updated weights for policy 0, policy_version 313068 (0.0016) [2025-01-05 16:18:37,034][07482] Updated weights for policy 0, policy_version 313078 (0.0017) [2025-01-05 16:18:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19619.1). 
Total num frames: 1282379776. Throughput: 0: 4903.7. Samples: 16048292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:37,852][07361] Avg episode reward: [(0, '43.468')] [2025-01-05 16:18:37,868][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000313082_1282383872.pth... [2025-01-05 16:18:37,919][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000311931_1277669376.pth [2025-01-05 16:18:39,146][07482] Updated weights for policy 0, policy_version 313088 (0.0016) [2025-01-05 16:18:41,210][07482] Updated weights for policy 0, policy_version 313098 (0.0019) [2025-01-05 16:18:42,851][07361] Fps is (10 sec: 20070.2, 60 sec: 19729.1, 300 sec: 19633.0). Total num frames: 1282482176. Throughput: 0: 4912.4. Samples: 16078108. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:42,852][07361] Avg episode reward: [(0, '42.233')] [2025-01-05 16:18:43,221][07482] Updated weights for policy 0, policy_version 313108 (0.0016) [2025-01-05 16:18:45,304][07482] Updated weights for policy 0, policy_version 313118 (0.0016) [2025-01-05 16:18:47,363][07482] Updated weights for policy 0, policy_version 313128 (0.0016) [2025-01-05 16:18:47,852][07361] Fps is (10 sec: 20070.5, 60 sec: 19729.0, 300 sec: 19633.0). Total num frames: 1282580480. Throughput: 0: 4930.7. Samples: 16108084. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:47,852][07361] Avg episode reward: [(0, '39.337')] [2025-01-05 16:18:49,472][07482] Updated weights for policy 0, policy_version 313138 (0.0019) [2025-01-05 16:18:51,585][07482] Updated weights for policy 0, policy_version 313148 (0.0017) [2025-01-05 16:18:52,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19660.8, 300 sec: 19619.2). Total num frames: 1282674688. Throughput: 0: 4927.4. Samples: 16122554. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:52,852][07361] Avg episode reward: [(0, '41.620')] [2025-01-05 16:18:53,759][07482] Updated weights for policy 0, policy_version 313158 (0.0017) [2025-01-05 16:18:55,752][07482] Updated weights for policy 0, policy_version 313168 (0.0016) [2025-01-05 16:18:57,849][07482] Updated weights for policy 0, policy_version 313178 (0.0016) [2025-01-05 16:18:57,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1282777088. Throughput: 0: 4922.8. Samples: 16151800. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:18:57,852][07361] Avg episode reward: [(0, '46.270')] [2025-01-05 16:19:00,018][07482] Updated weights for policy 0, policy_version 313188 (0.0017) [2025-01-05 16:19:02,017][07482] Updated weights for policy 0, policy_version 313198 (0.0016) [2025-01-05 16:19:02,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19619.1). Total num frames: 1282871296. Throughput: 0: 4920.7. Samples: 16181238. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:02,852][07361] Avg episode reward: [(0, '45.553')] [2025-01-05 16:19:04,110][07482] Updated weights for policy 0, policy_version 313208 (0.0017) [2025-01-05 16:19:06,196][07482] Updated weights for policy 0, policy_version 313218 (0.0015) [2025-01-05 16:19:07,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19660.8, 300 sec: 19619.1). Total num frames: 1282969600. Throughput: 0: 4926.7. Samples: 16196120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:07,852][07361] Avg episode reward: [(0, '46.579')] [2025-01-05 16:19:08,282][07482] Updated weights for policy 0, policy_version 313228 (0.0017) [2025-01-05 16:19:10,354][07482] Updated weights for policy 0, policy_version 313238 (0.0015) [2025-01-05 16:19:12,455][07482] Updated weights for policy 0, policy_version 313248 (0.0016) [2025-01-05 16:19:12,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19619.1). 
Total num frames: 1283067904. Throughput: 0: 4921.4. Samples: 16225552. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:12,852][07361] Avg episode reward: [(0, '47.250')] [2025-01-05 16:19:12,858][07448] Saving new best policy, reward=47.250! [2025-01-05 16:19:14,606][07482] Updated weights for policy 0, policy_version 313258 (0.0016) [2025-01-05 16:19:16,683][07482] Updated weights for policy 0, policy_version 313268 (0.0015) [2025-01-05 16:19:17,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19660.8, 300 sec: 19619.2). Total num frames: 1283166208. Throughput: 0: 4913.2. Samples: 16254538. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:17,852][07361] Avg episode reward: [(0, '47.947')] [2025-01-05 16:19:17,859][07448] Saving new best policy, reward=47.947! [2025-01-05 16:19:18,895][07482] Updated weights for policy 0, policy_version 313278 (0.0018) [2025-01-05 16:19:20,887][07482] Updated weights for policy 0, policy_version 313288 (0.0016) [2025-01-05 16:19:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1283264512. Throughput: 0: 4905.4. Samples: 16269036. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:22,852][07361] Avg episode reward: [(0, '43.206')] [2025-01-05 16:19:22,969][07482] Updated weights for policy 0, policy_version 313298 (0.0016) [2025-01-05 16:19:25,049][07482] Updated weights for policy 0, policy_version 313308 (0.0015) [2025-01-05 16:19:27,044][07482] Updated weights for policy 0, policy_version 313318 (0.0015) [2025-01-05 16:19:27,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1283362816. Throughput: 0: 4907.5. Samples: 16298946. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:27,852][07361] Avg episode reward: [(0, '41.301')] [2025-01-05 16:19:29,132][07482] Updated weights for policy 0, policy_version 313328 (0.0016) [2025-01-05 16:19:31,215][07482] Updated weights for policy 0, policy_version 313338 (0.0015) [2025-01-05 16:19:32,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1283461120. Throughput: 0: 4900.8. Samples: 16328620. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:32,852][07361] Avg episode reward: [(0, '40.036')] [2025-01-05 16:19:33,296][07482] Updated weights for policy 0, policy_version 313348 (0.0017) [2025-01-05 16:19:35,400][07482] Updated weights for policy 0, policy_version 313358 (0.0016) [2025-01-05 16:19:37,480][07482] Updated weights for policy 0, policy_version 313368 (0.0016) [2025-01-05 16:19:37,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1283559424. Throughput: 0: 4903.6. Samples: 16343218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:37,852][07361] Avg episode reward: [(0, '40.717')] [2025-01-05 16:19:39,578][07482] Updated weights for policy 0, policy_version 313378 (0.0016) [2025-01-05 16:19:41,666][07482] Updated weights for policy 0, policy_version 313388 (0.0015) [2025-01-05 16:19:42,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19646.9). Total num frames: 1283657728. Throughput: 0: 4909.5. Samples: 16372726. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:42,852][07361] Avg episode reward: [(0, '42.652')] [2025-01-05 16:19:43,850][07482] Updated weights for policy 0, policy_version 313398 (0.0017) [2025-01-05 16:19:45,868][07482] Updated weights for policy 0, policy_version 313408 (0.0016) [2025-01-05 16:19:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19646.9). Total num frames: 1283756032. Throughput: 0: 4905.4. Samples: 16401982. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:47,852][07361] Avg episode reward: [(0, '42.412')] [2025-01-05 16:19:47,957][07482] Updated weights for policy 0, policy_version 313418 (0.0015) [2025-01-05 16:19:50,078][07482] Updated weights for policy 0, policy_version 313428 (0.0015) [2025-01-05 16:19:52,077][07482] Updated weights for policy 0, policy_version 313438 (0.0015) [2025-01-05 16:19:52,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1283854336. Throughput: 0: 4902.7. Samples: 16416742. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:52,852][07361] Avg episode reward: [(0, '44.326')] [2025-01-05 16:19:54,166][07482] Updated weights for policy 0, policy_version 313448 (0.0016) [2025-01-05 16:19:56,268][07482] Updated weights for policy 0, policy_version 313458 (0.0016) [2025-01-05 16:19:57,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19646.9). Total num frames: 1283952640. Throughput: 0: 4911.6. Samples: 16446572. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:19:57,852][07361] Avg episode reward: [(0, '43.260')] [2025-01-05 16:19:58,385][07482] Updated weights for policy 0, policy_version 313468 (0.0017) [2025-01-05 16:20:00,451][07482] Updated weights for policy 0, policy_version 313478 (0.0018) [2025-01-05 16:20:02,558][07482] Updated weights for policy 0, policy_version 313488 (0.0015) [2025-01-05 16:20:02,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1284050944. Throughput: 0: 4919.1. Samples: 16475898. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:02,852][07361] Avg episode reward: [(0, '42.466')] [2025-01-05 16:20:04,637][07482] Updated weights for policy 0, policy_version 313498 (0.0017) [2025-01-05 16:20:06,711][07482] Updated weights for policy 0, policy_version 313508 (0.0016) [2025-01-05 16:20:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19646.9). 
Total num frames: 1284149248. Throughput: 0: 4918.7. Samples: 16490378. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:07,852][07361] Avg episode reward: [(0, '46.048')] [2025-01-05 16:20:08,922][07482] Updated weights for policy 0, policy_version 313518 (0.0017) [2025-01-05 16:20:10,949][07482] Updated weights for policy 0, policy_version 313528 (0.0016) [2025-01-05 16:20:12,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19592.6, 300 sec: 19633.0). Total num frames: 1284243456. Throughput: 0: 4902.7. Samples: 16519566. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:12,852][07361] Avg episode reward: [(0, '45.261')] [2025-01-05 16:20:13,059][07482] Updated weights for policy 0, policy_version 313538 (0.0016) [2025-01-05 16:20:15,159][07482] Updated weights for policy 0, policy_version 313548 (0.0016) [2025-01-05 16:20:17,186][07482] Updated weights for policy 0, policy_version 313558 (0.0015) [2025-01-05 16:20:17,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19660.8, 300 sec: 19646.9). Total num frames: 1284345856. Throughput: 0: 4901.0. Samples: 16549164. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:17,852][07361] Avg episode reward: [(0, '46.762')] [2025-01-05 16:20:19,304][07482] Updated weights for policy 0, policy_version 313568 (0.0016) [2025-01-05 16:20:21,379][07482] Updated weights for policy 0, policy_version 313578 (0.0018) [2025-01-05 16:20:22,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.6, 300 sec: 19619.2). Total num frames: 1284440064. Throughput: 0: 4903.0. Samples: 16563854. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:22,852][07361] Avg episode reward: [(0, '45.377')] [2025-01-05 16:20:23,479][07482] Updated weights for policy 0, policy_version 313588 (0.0016) [2025-01-05 16:20:25,532][07482] Updated weights for policy 0, policy_version 313598 (0.0016) [2025-01-05 16:20:27,619][07482] Updated weights for policy 0, policy_version 313608 (0.0016) [2025-01-05 16:20:27,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19660.8, 300 sec: 19633.0). Total num frames: 1284542464. Throughput: 0: 4903.9. Samples: 16593400. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:27,852][07361] Avg episode reward: [(0, '43.021')] [2025-01-05 16:20:29,732][07482] Updated weights for policy 0, policy_version 313618 (0.0017) [2025-01-05 16:20:31,774][07482] Updated weights for policy 0, policy_version 313628 (0.0015) [2025-01-05 16:20:32,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19633.0). Total num frames: 1284636672. Throughput: 0: 4906.9. Samples: 16622790. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:32,852][07361] Avg episode reward: [(0, '43.868')] [2025-01-05 16:20:33,941][07482] Updated weights for policy 0, policy_version 313638 (0.0017) [2025-01-05 16:20:36,004][07482] Updated weights for policy 0, policy_version 313648 (0.0015) [2025-01-05 16:20:37,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19592.5, 300 sec: 19633.0). Total num frames: 1284734976. Throughput: 0: 4902.2. Samples: 16637340. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:37,852][07361] Avg episode reward: [(0, '44.745')] [2025-01-05 16:20:37,922][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000313657_1284739072.pth... 
[2025-01-05 16:20:37,976][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000312507_1280028672.pth [2025-01-05 16:20:38,141][07482] Updated weights for policy 0, policy_version 313658 (0.0017) [2025-01-05 16:20:40,258][07482] Updated weights for policy 0, policy_version 313668 (0.0019) [2025-01-05 16:20:42,294][07482] Updated weights for policy 0, policy_version 313678 (0.0017) [2025-01-05 16:20:42,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19633.0). Total num frames: 1284833280. Throughput: 0: 4888.0. Samples: 16666532. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:42,852][07361] Avg episode reward: [(0, '45.845')] [2025-01-05 16:20:44,447][07482] Updated weights for policy 0, policy_version 313688 (0.0016) [2025-01-05 16:20:46,537][07482] Updated weights for policy 0, policy_version 313698 (0.0017) [2025-01-05 16:20:47,852][07361] Fps is (10 sec: 19659.9, 60 sec: 19592.3, 300 sec: 19633.0). Total num frames: 1284931584. Throughput: 0: 4884.8. Samples: 16695718. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:47,853][07361] Avg episode reward: [(0, '44.564')] [2025-01-05 16:20:48,640][07482] Updated weights for policy 0, policy_version 313708 (0.0017) [2025-01-05 16:20:50,706][07482] Updated weights for policy 0, policy_version 313718 (0.0017) [2025-01-05 16:20:52,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19619.1). Total num frames: 1285025792. Throughput: 0: 4888.5. Samples: 16710358. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:52,852][07361] Avg episode reward: [(0, '44.261')] [2025-01-05 16:20:52,909][07482] Updated weights for policy 0, policy_version 313728 (0.0016) [2025-01-05 16:20:54,995][07482] Updated weights for policy 0, policy_version 313738 (0.0017) [2025-01-05 16:20:57,038][07482] Updated weights for policy 0, policy_version 313748 (0.0017) [2025-01-05 16:20:57,852][07361] Fps is (10 sec: 19252.4, 60 sec: 19524.3, 300 sec: 19605.3). Total num frames: 1285124096. Throughput: 0: 4884.6. Samples: 16739372. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:20:57,852][07361] Avg episode reward: [(0, '43.155')] [2025-01-05 16:20:59,184][07482] Updated weights for policy 0, policy_version 313758 (0.0017) [2025-01-05 16:21:01,235][07482] Updated weights for policy 0, policy_version 313768 (0.0016) [2025-01-05 16:21:02,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19619.1). Total num frames: 1285222400. Throughput: 0: 4877.1. Samples: 16768636. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:21:02,852][07361] Avg episode reward: [(0, '43.887')] [2025-01-05 16:21:03,393][07482] Updated weights for policy 0, policy_version 313778 (0.0018) [2025-01-05 16:21:05,520][07482] Updated weights for policy 0, policy_version 313788 (0.0017) [2025-01-05 16:21:07,582][07482] Updated weights for policy 0, policy_version 313798 (0.0017) [2025-01-05 16:21:07,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19619.1). Total num frames: 1285320704. Throughput: 0: 4872.4. Samples: 16783112. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:21:07,852][07361] Avg episode reward: [(0, '44.631')] [2025-01-05 16:21:09,747][07482] Updated weights for policy 0, policy_version 313808 (0.0018) [2025-01-05 16:21:11,865][07482] Updated weights for policy 0, policy_version 313818 (0.0019) [2025-01-05 16:21:12,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19591.4). 
Total num frames: 1285414912. Throughput: 0: 4862.1. Samples: 16812194. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) [2025-01-05 16:21:12,852][07361] Avg episode reward: [(0, '44.885')] [2025-01-05 16:21:13,987][07482] Updated weights for policy 0, policy_version 313828 (0.0018) [2025-01-05 16:21:16,050][07482] Updated weights for policy 0, policy_version 313838 (0.0017) [2025-01-05 16:21:17,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19455.9, 300 sec: 19591.4). Total num frames: 1285513216. Throughput: 0: 4852.8. Samples: 16841168. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:17,852][07361] Avg episode reward: [(0, '44.297')] [2025-01-05 16:21:18,221][07482] Updated weights for policy 0, policy_version 313848 (0.0018) [2025-01-05 16:21:20,284][07482] Updated weights for policy 0, policy_version 313858 (0.0017) [2025-01-05 16:21:22,376][07482] Updated weights for policy 0, policy_version 313868 (0.0017) [2025-01-05 16:21:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19591.4). Total num frames: 1285611520. Throughput: 0: 4857.2. Samples: 16855912. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:22,852][07361] Avg episode reward: [(0, '44.034')] [2025-01-05 16:21:24,583][07482] Updated weights for policy 0, policy_version 313878 (0.0018) [2025-01-05 16:21:26,643][07482] Updated weights for policy 0, policy_version 313888 (0.0017) [2025-01-05 16:21:27,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19387.8, 300 sec: 19577.5). Total num frames: 1285705728. Throughput: 0: 4849.5. Samples: 16884760. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:27,852][07361] Avg episode reward: [(0, '42.957')] [2025-01-05 16:21:28,783][07482] Updated weights for policy 0, policy_version 313898 (0.0018) [2025-01-05 16:21:30,912][07482] Updated weights for policy 0, policy_version 313908 (0.0017) [2025-01-05 16:21:32,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1285804032. 
Throughput: 0: 4844.2. Samples: 16913702. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:32,852][07361] Avg episode reward: [(0, '42.870')] [2025-01-05 16:21:33,025][07482] Updated weights for policy 0, policy_version 313918 (0.0017) [2025-01-05 16:21:35,042][07482] Updated weights for policy 0, policy_version 313928 (0.0016) [2025-01-05 16:21:37,152][07482] Updated weights for policy 0, policy_version 313938 (0.0016) [2025-01-05 16:21:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1285902336. Throughput: 0: 4852.8. Samples: 16928734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:37,852][07361] Avg episode reward: [(0, '44.825')] [2025-01-05 16:21:39,267][07482] Updated weights for policy 0, policy_version 313948 (0.0017) [2025-01-05 16:21:41,276][07482] Updated weights for policy 0, policy_version 313958 (0.0015) [2025-01-05 16:21:42,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19456.0, 300 sec: 19591.4). Total num frames: 1286000640. Throughput: 0: 4863.2. Samples: 16958216. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:42,852][07361] Avg episode reward: [(0, '43.316')] [2025-01-05 16:21:43,382][07482] Updated weights for policy 0, policy_version 313968 (0.0016) [2025-01-05 16:21:45,428][07482] Updated weights for policy 0, policy_version 313978 (0.0015) [2025-01-05 16:21:47,439][07482] Updated weights for policy 0, policy_version 313988 (0.0016) [2025-01-05 16:21:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19456.2, 300 sec: 19591.4). Total num frames: 1286098944. Throughput: 0: 4877.9. Samples: 16988144. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:47,852][07361] Avg episode reward: [(0, '43.138')] [2025-01-05 16:21:49,542][07482] Updated weights for policy 0, policy_version 313998 (0.0015) [2025-01-05 16:21:51,619][07482] Updated weights for policy 0, policy_version 314008 (0.0017) [2025-01-05 16:21:52,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19524.2, 300 sec: 19591.4). Total num frames: 1286197248. Throughput: 0: 4889.3. Samples: 17003132. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:52,852][07361] Avg episode reward: [(0, '43.569')] [2025-01-05 16:21:53,766][07482] Updated weights for policy 0, policy_version 314018 (0.0018) [2025-01-05 16:21:55,868][07482] Updated weights for policy 0, policy_version 314028 (0.0016) [2025-01-05 16:21:57,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1286295552. Throughput: 0: 4887.5. Samples: 17032130. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:21:57,852][07361] Avg episode reward: [(0, '44.777')] [2025-01-05 16:21:57,990][07482] Updated weights for policy 0, policy_version 314038 (0.0016) [2025-01-05 16:22:00,030][07482] Updated weights for policy 0, policy_version 314048 (0.0016) [2025-01-05 16:22:02,139][07482] Updated weights for policy 0, policy_version 314058 (0.0016) [2025-01-05 16:22:02,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1286393856. Throughput: 0: 4897.2. Samples: 17061540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:02,852][07361] Avg episode reward: [(0, '44.294')] [2025-01-05 16:22:04,244][07482] Updated weights for policy 0, policy_version 314068 (0.0016) [2025-01-05 16:22:06,307][07482] Updated weights for policy 0, policy_version 314078 (0.0015) [2025-01-05 16:22:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19605.2). Total num frames: 1286492160. Throughput: 0: 4894.0. Samples: 17076144. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:07,852][07361] Avg episode reward: [(0, '45.381')] [2025-01-05 16:22:08,499][07482] Updated weights for policy 0, policy_version 314088 (0.0017) [2025-01-05 16:22:10,582][07482] Updated weights for policy 0, policy_version 314098 (0.0015) [2025-01-05 16:22:12,630][07482] Updated weights for policy 0, policy_version 314108 (0.0018) [2025-01-05 16:22:12,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1286586368. Throughput: 0: 4896.9. Samples: 17105120. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:12,852][07361] Avg episode reward: [(0, '43.342')] [2025-01-05 16:22:14,788][07482] Updated weights for policy 0, policy_version 314118 (0.0017) [2025-01-05 16:22:16,811][07482] Updated weights for policy 0, policy_version 314128 (0.0015) [2025-01-05 16:22:17,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1286684672. Throughput: 0: 4909.4. Samples: 17134626. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:17,852][07361] Avg episode reward: [(0, '43.161')] [2025-01-05 16:22:18,944][07482] Updated weights for policy 0, policy_version 314138 (0.0020) [2025-01-05 16:22:21,052][07482] Updated weights for policy 0, policy_version 314148 (0.0017) [2025-01-05 16:22:22,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19524.3, 300 sec: 19591.4). Total num frames: 1286782976. Throughput: 0: 4898.8. Samples: 17149178. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:22,852][07361] Avg episode reward: [(0, '44.043')] [2025-01-05 16:22:23,158][07482] Updated weights for policy 0, policy_version 314158 (0.0017) [2025-01-05 16:22:25,183][07482] Updated weights for policy 0, policy_version 314168 (0.0016) [2025-01-05 16:22:27,278][07482] Updated weights for policy 0, policy_version 314178 (0.0016) [2025-01-05 16:22:27,852][07361] Fps is (10 sec: 19659.3, 60 sec: 19592.3, 300 sec: 19591.3). 
Total num frames: 1286881280. Throughput: 0: 4900.5. Samples: 17178744. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) [2025-01-05 16:22:27,853][07361] Avg episode reward: [(0, '45.462')] [2025-01-05 16:22:29,396][07482] Updated weights for policy 0, policy_version 314188 (0.0016) [2025-01-05 16:22:31,434][07482] Updated weights for policy 0, policy_version 314198 (0.0016) [2025-01-05 16:22:32,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1286979584. Throughput: 0: 4888.9. Samples: 17208142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:32,852][07361] Avg episode reward: [(0, '44.774')] [2025-01-05 16:22:33,593][07482] Updated weights for policy 0, policy_version 314208 (0.0017) [2025-01-05 16:22:35,648][07482] Updated weights for policy 0, policy_version 314218 (0.0016) [2025-01-05 16:22:37,702][07482] Updated weights for policy 0, policy_version 314228 (0.0017) [2025-01-05 16:22:37,852][07361] Fps is (10 sec: 19662.1, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1287077888. Throughput: 0: 4881.1. Samples: 17222782. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:37,852][07361] Avg episode reward: [(0, '45.271')] [2025-01-05 16:22:37,936][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000314229_1287081984.pth... [2025-01-05 16:22:37,988][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000313082_1282383872.pth [2025-01-05 16:22:39,889][07482] Updated weights for policy 0, policy_version 314238 (0.0018) [2025-01-05 16:22:41,925][07482] Updated weights for policy 0, policy_version 314248 (0.0016) [2025-01-05 16:22:42,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1287176192. Throughput: 0: 4888.2. Samples: 17252098. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:42,852][07361] Avg episode reward: [(0, '44.033')] [2025-01-05 16:22:44,095][07482] Updated weights for policy 0, policy_version 314258 (0.0018) [2025-01-05 16:22:46,169][07482] Updated weights for policy 0, policy_version 314268 (0.0017) [2025-01-05 16:22:47,852][07361] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1287270400. Throughput: 0: 4880.6. Samples: 17281166. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:47,852][07361] Avg episode reward: [(0, '45.549')] [2025-01-05 16:22:48,278][07482] Updated weights for policy 0, policy_version 314278 (0.0017) [2025-01-05 16:22:50,340][07482] Updated weights for policy 0, policy_version 314288 (0.0016) [2025-01-05 16:22:52,425][07482] Updated weights for policy 0, policy_version 314298 (0.0016) [2025-01-05 16:22:52,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1287372800. Throughput: 0: 4884.8. Samples: 17295960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:52,852][07361] Avg episode reward: [(0, '45.106')] [2025-01-05 16:22:54,529][07482] Updated weights for policy 0, policy_version 314308 (0.0017) [2025-01-05 16:22:56,599][07482] Updated weights for policy 0, policy_version 314318 (0.0017) [2025-01-05 16:22:57,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1287467008. Throughput: 0: 4894.7. Samples: 17325382. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:22:57,852][07361] Avg episode reward: [(0, '43.562')] [2025-01-05 16:22:58,762][07482] Updated weights for policy 0, policy_version 314328 (0.0017) [2025-01-05 16:23:00,789][07482] Updated weights for policy 0, policy_version 314338 (0.0016) [2025-01-05 16:23:02,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.2, 300 sec: 19577.5). Total num frames: 1287565312. Throughput: 0: 4886.5. Samples: 17354518. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:02,852][07361] Avg episode reward: [(0, '43.865')] [2025-01-05 16:23:02,926][07482] Updated weights for policy 0, policy_version 314348 (0.0017) [2025-01-05 16:23:05,110][07482] Updated weights for policy 0, policy_version 314358 (0.0016) [2025-01-05 16:23:07,106][07482] Updated weights for policy 0, policy_version 314368 (0.0015) [2025-01-05 16:23:07,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1287663616. Throughput: 0: 4882.6. Samples: 17368896. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:07,852][07361] Avg episode reward: [(0, '46.982')] [2025-01-05 16:23:09,201][07482] Updated weights for policy 0, policy_version 314378 (0.0016) [2025-01-05 16:23:11,285][07482] Updated weights for policy 0, policy_version 314388 (0.0016) [2025-01-05 16:23:12,852][07361] Fps is (10 sec: 19659.8, 60 sec: 19592.4, 300 sec: 19577.5). Total num frames: 1287761920. Throughput: 0: 4890.0. Samples: 17398794. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:12,853][07361] Avg episode reward: [(0, '45.267')] [2025-01-05 16:23:13,362][07482] Updated weights for policy 0, policy_version 314398 (0.0017) [2025-01-05 16:23:15,421][07482] Updated weights for policy 0, policy_version 314408 (0.0015) [2025-01-05 16:23:17,517][07482] Updated weights for policy 0, policy_version 314418 (0.0016) [2025-01-05 16:23:17,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1287860224. Throughput: 0: 4892.2. Samples: 17428290. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:17,852][07361] Avg episode reward: [(0, '43.327')] [2025-01-05 16:23:19,626][07482] Updated weights for policy 0, policy_version 314428 (0.0016) [2025-01-05 16:23:21,685][07482] Updated weights for policy 0, policy_version 314438 (0.0015) [2025-01-05 16:23:22,851][07361] Fps is (10 sec: 19662.1, 60 sec: 19592.5, 300 sec: 19577.5). 
Total num frames: 1287958528. Throughput: 0: 4889.0. Samples: 17442784. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:22,852][07361] Avg episode reward: [(0, '44.286')] [2025-01-05 16:23:23,869][07482] Updated weights for policy 0, policy_version 314448 (0.0017) [2025-01-05 16:23:25,886][07482] Updated weights for policy 0, policy_version 314458 (0.0015) [2025-01-05 16:23:27,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.8, 300 sec: 19577.5). Total num frames: 1288056832. Throughput: 0: 4892.7. Samples: 17472272. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:27,852][07361] Avg episode reward: [(0, '45.242')] [2025-01-05 16:23:27,916][07482] Updated weights for policy 0, policy_version 314468 (0.0016) [2025-01-05 16:23:30,024][07482] Updated weights for policy 0, policy_version 314478 (0.0015) [2025-01-05 16:23:32,040][07482] Updated weights for policy 0, policy_version 314488 (0.0016) [2025-01-05 16:23:32,852][07361] Fps is (10 sec: 20070.2, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1288159232. Throughput: 0: 4911.4. Samples: 17502180. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:32,852][07361] Avg episode reward: [(0, '45.459')] [2025-01-05 16:23:34,071][07482] Updated weights for policy 0, policy_version 314498 (0.0016) [2025-01-05 16:23:36,185][07482] Updated weights for policy 0, policy_version 314508 (0.0015) [2025-01-05 16:23:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1288253440. Throughput: 0: 4915.3. Samples: 17517148. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:37,852][07361] Avg episode reward: [(0, '44.613')] [2025-01-05 16:23:38,308][07482] Updated weights for policy 0, policy_version 314518 (0.0016) [2025-01-05 16:23:40,336][07482] Updated weights for policy 0, policy_version 314528 (0.0015) [2025-01-05 16:23:42,451][07482] Updated weights for policy 0, policy_version 314538 (0.0015) [2025-01-05 16:23:42,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1288355840. Throughput: 0: 4915.1. Samples: 17546562. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:42,852][07361] Avg episode reward: [(0, '46.041')] [2025-01-05 16:23:44,573][07482] Updated weights for policy 0, policy_version 314548 (0.0017) [2025-01-05 16:23:46,609][07482] Updated weights for policy 0, policy_version 314558 (0.0016) [2025-01-05 16:23:47,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1288450048. Throughput: 0: 4916.9. Samples: 17575778. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:47,852][07361] Avg episode reward: [(0, '44.460')] [2025-01-05 16:23:48,786][07482] Updated weights for policy 0, policy_version 314568 (0.0016) [2025-01-05 16:23:50,797][07482] Updated weights for policy 0, policy_version 314578 (0.0015) [2025-01-05 16:23:52,835][07482] Updated weights for policy 0, policy_version 314588 (0.0015) [2025-01-05 16:23:52,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1288552448. Throughput: 0: 4926.4. Samples: 17590586. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:52,852][07361] Avg episode reward: [(0, '45.105')] [2025-01-05 16:23:55,032][07482] Updated weights for policy 0, policy_version 314598 (0.0017) [2025-01-05 16:23:57,053][07482] Updated weights for policy 0, policy_version 314608 (0.0015) [2025-01-05 16:23:57,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19660.8, 300 sec: 19577.5). 
Total num frames: 1288646656. Throughput: 0: 4918.2. Samples: 17620110. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:23:57,852][07361] Avg episode reward: [(0, '45.048')] [2025-01-05 16:23:59,173][07482] Updated weights for policy 0, policy_version 314618 (0.0017) [2025-01-05 16:24:01,268][07482] Updated weights for policy 0, policy_version 314628 (0.0015) [2025-01-05 16:24:02,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1288744960. Throughput: 0: 4911.6. Samples: 17649310. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:02,852][07361] Avg episode reward: [(0, '46.068')] [2025-01-05 16:24:03,388][07482] Updated weights for policy 0, policy_version 314638 (0.0016) [2025-01-05 16:24:05,445][07482] Updated weights for policy 0, policy_version 314648 (0.0016) [2025-01-05 16:24:07,534][07482] Updated weights for policy 0, policy_version 314658 (0.0019) [2025-01-05 16:24:07,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1288843264. Throughput: 0: 4916.2. Samples: 17664012. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:07,852][07361] Avg episode reward: [(0, '44.488')] [2025-01-05 16:24:09,630][07482] Updated weights for policy 0, policy_version 314668 (0.0016) [2025-01-05 16:24:11,669][07482] Updated weights for policy 0, policy_version 314678 (0.0015) [2025-01-05 16:24:12,851][07361] Fps is (10 sec: 19660.9, 60 sec: 19661.0, 300 sec: 19577.5). Total num frames: 1288941568. Throughput: 0: 4919.3. Samples: 17693638. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:12,852][07361] Avg episode reward: [(0, '45.917')] [2025-01-05 16:24:13,838][07482] Updated weights for policy 0, policy_version 314688 (0.0016) [2025-01-05 16:24:15,887][07482] Updated weights for policy 0, policy_version 314698 (0.0016) [2025-01-05 16:24:17,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1289039872. 
Throughput: 0: 4901.7. Samples: 17722756. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:17,852][07361] Avg episode reward: [(0, '45.091')] [2025-01-05 16:24:17,992][07482] Updated weights for policy 0, policy_version 314708 (0.0016) [2025-01-05 16:24:20,071][07482] Updated weights for policy 0, policy_version 314718 (0.0016) [2025-01-05 16:24:22,092][07482] Updated weights for policy 0, policy_version 314728 (0.0016) [2025-01-05 16:24:22,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19660.7, 300 sec: 19577.5). Total num frames: 1289138176. Throughput: 0: 4901.7. Samples: 17737724. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:22,852][07361] Avg episode reward: [(0, '45.220')] [2025-01-05 16:24:24,198][07482] Updated weights for policy 0, policy_version 314738 (0.0017) [2025-01-05 16:24:26,313][07482] Updated weights for policy 0, policy_version 314748 (0.0016) [2025-01-05 16:24:27,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1289236480. Throughput: 0: 4905.4. Samples: 17767304. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:27,852][07361] Avg episode reward: [(0, '46.181')] [2025-01-05 16:24:28,411][07482] Updated weights for policy 0, policy_version 314758 (0.0016) [2025-01-05 16:24:30,440][07482] Updated weights for policy 0, policy_version 314768 (0.0016) [2025-01-05 16:24:32,552][07482] Updated weights for policy 0, policy_version 314778 (0.0016) [2025-01-05 16:24:32,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1289334784. Throughput: 0: 4912.5. Samples: 17796842. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:32,852][07361] Avg episode reward: [(0, '42.178')] [2025-01-05 16:24:34,610][07482] Updated weights for policy 0, policy_version 314788 (0.0016) [2025-01-05 16:24:36,675][07482] Updated weights for policy 0, policy_version 314798 (0.0018) [2025-01-05 16:24:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1289433088. Throughput: 0: 4909.0. Samples: 17811492. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:37,852][07361] Avg episode reward: [(0, '46.310')] [2025-01-05 16:24:37,967][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000314804_1289437184.pth... [2025-01-05 16:24:38,020][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000313657_1284739072.pth [2025-01-05 16:24:38,912][07482] Updated weights for policy 0, policy_version 314808 (0.0016) [2025-01-05 16:24:40,929][07482] Updated weights for policy 0, policy_version 314818 (0.0016) [2025-01-05 16:24:42,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1289531392. Throughput: 0: 4902.3. Samples: 17840712. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:42,852][07361] Avg episode reward: [(0, '45.084')] [2025-01-05 16:24:42,962][07482] Updated weights for policy 0, policy_version 314828 (0.0017) [2025-01-05 16:24:45,073][07482] Updated weights for policy 0, policy_version 314838 (0.0016) [2025-01-05 16:24:47,113][07482] Updated weights for policy 0, policy_version 314848 (0.0017) [2025-01-05 16:24:47,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1289629696. Throughput: 0: 4916.1. Samples: 17870534. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:47,852][07361] Avg episode reward: [(0, '42.230')] [2025-01-05 16:24:49,177][07482] Updated weights for policy 0, policy_version 314858 (0.0017) [2025-01-05 16:24:51,314][07482] Updated weights for policy 0, policy_version 314868 (0.0017) [2025-01-05 16:24:52,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1289728000. Throughput: 0: 4918.5. Samples: 17885342. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:52,852][07361] Avg episode reward: [(0, '44.937')] [2025-01-05 16:24:53,458][07482] Updated weights for policy 0, policy_version 314878 (0.0018) [2025-01-05 16:24:55,469][07482] Updated weights for policy 0, policy_version 314888 (0.0016) [2025-01-05 16:24:57,622][07482] Updated weights for policy 0, policy_version 314898 (0.0017) [2025-01-05 16:24:57,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1289826304. Throughput: 0: 4907.3. Samples: 17914468. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:24:57,852][07361] Avg episode reward: [(0, '48.146')] [2025-01-05 16:24:57,860][07448] Saving new best policy, reward=48.146! [2025-01-05 16:24:59,757][07482] Updated weights for policy 0, policy_version 314908 (0.0016) [2025-01-05 16:25:01,779][07482] Updated weights for policy 0, policy_version 314918 (0.0017) [2025-01-05 16:25:02,852][07361] Fps is (10 sec: 19250.9, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1289920512. Throughput: 0: 4908.4. Samples: 17943634. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:02,852][07361] Avg episode reward: [(0, '46.674')] [2025-01-05 16:25:04,021][07482] Updated weights for policy 0, policy_version 314928 (0.0018) [2025-01-05 16:25:06,061][07482] Updated weights for policy 0, policy_version 314938 (0.0016) [2025-01-05 16:25:07,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1290018816. 
Throughput: 0: 4898.0. Samples: 17958132. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:07,852][07361] Avg episode reward: [(0, '47.103')] [2025-01-05 16:25:08,142][07482] Updated weights for policy 0, policy_version 314948 (0.0018) [2025-01-05 16:25:10,259][07482] Updated weights for policy 0, policy_version 314958 (0.0015) [2025-01-05 16:25:12,308][07482] Updated weights for policy 0, policy_version 314968 (0.0016) [2025-01-05 16:25:12,852][07361] Fps is (10 sec: 19660.9, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1290117120. Throughput: 0: 4895.3. Samples: 17987594. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:12,852][07361] Avg episode reward: [(0, '46.171')] [2025-01-05 16:25:14,414][07482] Updated weights for policy 0, policy_version 314978 (0.0017) [2025-01-05 16:25:16,566][07482] Updated weights for policy 0, policy_version 314988 (0.0016) [2025-01-05 16:25:17,852][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1290215424. Throughput: 0: 4884.0. Samples: 18016620. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:17,852][07361] Avg episode reward: [(0, '43.484')] [2025-01-05 16:25:18,712][07482] Updated weights for policy 0, policy_version 314998 (0.0017) [2025-01-05 16:25:20,715][07482] Updated weights for policy 0, policy_version 315008 (0.0016) [2025-01-05 16:25:22,832][07482] Updated weights for policy 0, policy_version 315018 (0.0016) [2025-01-05 16:25:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1290313728. Throughput: 0: 4884.8. Samples: 18031306. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:22,852][07361] Avg episode reward: [(0, '43.872')] [2025-01-05 16:25:24,994][07482] Updated weights for policy 0, policy_version 315028 (0.0017) [2025-01-05 16:25:27,009][07482] Updated weights for policy 0, policy_version 315038 (0.0016) [2025-01-05 16:25:27,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19524.2, 300 sec: 19563.6). Total num frames: 1290407936. Throughput: 0: 4885.1. Samples: 18060542. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:27,852][07361] Avg episode reward: [(0, '47.090')] [2025-01-05 16:25:29,138][07482] Updated weights for policy 0, policy_version 315048 (0.0016) [2025-01-05 16:25:31,197][07482] Updated weights for policy 0, policy_version 315058 (0.0016) [2025-01-05 16:25:32,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1290506240. Throughput: 0: 4875.7. Samples: 18089942. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:32,852][07361] Avg episode reward: [(0, '42.512')] [2025-01-05 16:25:33,292][07482] Updated weights for policy 0, policy_version 315068 (0.0017) [2025-01-05 16:25:35,405][07482] Updated weights for policy 0, policy_version 315078 (0.0015) [2025-01-05 16:25:37,473][07482] Updated weights for policy 0, policy_version 315088 (0.0017) [2025-01-05 16:25:37,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19563.6). Total num frames: 1290604544. Throughput: 0: 4874.3. Samples: 18104688. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:37,852][07361] Avg episode reward: [(0, '43.054')] [2025-01-05 16:25:39,570][07482] Updated weights for policy 0, policy_version 315098 (0.0017) [2025-01-05 16:25:41,697][07482] Updated weights for policy 0, policy_version 315108 (0.0016) [2025-01-05 16:25:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.2, 300 sec: 19563.6). Total num frames: 1290702848. Throughput: 0: 4879.7. Samples: 18134056. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:42,852][07361] Avg episode reward: [(0, '44.946')] [2025-01-05 16:25:43,855][07482] Updated weights for policy 0, policy_version 315118 (0.0017) [2025-01-05 16:25:45,902][07482] Updated weights for policy 0, policy_version 315128 (0.0018) [2025-01-05 16:25:47,851][07361] Fps is (10 sec: 19251.7, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1290797056. Throughput: 0: 4866.6. Samples: 18162632. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:47,852][07361] Avg episode reward: [(0, '46.142')] [2025-01-05 16:25:48,146][07482] Updated weights for policy 0, policy_version 315138 (0.0017) [2025-01-05 16:25:50,217][07482] Updated weights for policy 0, policy_version 315148 (0.0016) [2025-01-05 16:25:52,247][07482] Updated weights for policy 0, policy_version 315158 (0.0015) [2025-01-05 16:25:52,852][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1290895360. Throughput: 0: 4871.1. Samples: 18177330. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:52,852][07361] Avg episode reward: [(0, '45.087')] [2025-01-05 16:25:54,443][07482] Updated weights for policy 0, policy_version 315168 (0.0016) [2025-01-05 16:25:56,476][07482] Updated weights for policy 0, policy_version 315178 (0.0015) [2025-01-05 16:25:57,852][07361] Fps is (10 sec: 19660.0, 60 sec: 19455.9, 300 sec: 19563.6). Total num frames: 1290993664. Throughput: 0: 4870.9. Samples: 18206788. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:25:57,852][07361] Avg episode reward: [(0, '45.593')] [2025-01-05 16:25:58,592][07482] Updated weights for policy 0, policy_version 315188 (0.0017) [2025-01-05 16:26:00,684][07482] Updated weights for policy 0, policy_version 315198 (0.0017) [2025-01-05 16:26:02,811][07482] Updated weights for policy 0, policy_version 315208 (0.0017) [2025-01-05 16:26:02,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19563.6). 
Total num frames: 1291091968. Throughput: 0: 4870.0. Samples: 18235770. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:26:02,852][07361] Avg episode reward: [(0, '48.894')] [2025-01-05 16:26:02,853][07448] Saving new best policy, reward=48.894! [2025-01-05 16:26:04,920][07482] Updated weights for policy 0, policy_version 315218 (0.0016) [2025-01-05 16:26:07,012][07482] Updated weights for policy 0, policy_version 315228 (0.0015) [2025-01-05 16:26:07,852][07361] Fps is (10 sec: 19251.7, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1291186176. Throughput: 0: 4864.8. Samples: 18250224. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:26:07,852][07361] Avg episode reward: [(0, '47.854')] [2025-01-05 16:26:09,132][07482] Updated weights for policy 0, policy_version 315238 (0.0017) [2025-01-05 16:26:11,166][07482] Updated weights for policy 0, policy_version 315248 (0.0016) [2025-01-05 16:26:12,851][07361] Fps is (10 sec: 19251.3, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1291284480. Throughput: 0: 4870.5. Samples: 18279714. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) [2025-01-05 16:26:12,852][07361] Avg episode reward: [(0, '44.259')] [2025-01-05 16:26:13,383][07482] Updated weights for policy 0, policy_version 315258 (0.0017) [2025-01-05 16:26:15,402][07482] Updated weights for policy 0, policy_version 315268 (0.0017) [2025-01-05 16:26:17,434][07482] Updated weights for policy 0, policy_version 315278 (0.0017) [2025-01-05 16:26:17,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19456.0, 300 sec: 19563.6). Total num frames: 1291382784. Throughput: 0: 4871.8. Samples: 18309172. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:17,852][07361] Avg episode reward: [(0, '42.831')] [2025-01-05 16:26:19,612][07482] Updated weights for policy 0, policy_version 315288 (0.0017) [2025-01-05 16:26:21,650][07482] Updated weights for policy 0, policy_version 315298 (0.0015) [2025-01-05 16:26:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19456.0, 300 sec: 19577.5). Total num frames: 1291481088. Throughput: 0: 4868.5. Samples: 18323770. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:22,852][07361] Avg episode reward: [(0, '45.091')] [2025-01-05 16:26:23,779][07482] Updated weights for policy 0, policy_version 315308 (0.0017) [2025-01-05 16:26:25,873][07482] Updated weights for policy 0, policy_version 315318 (0.0015) [2025-01-05 16:26:27,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1291579392. Throughput: 0: 4865.6. Samples: 18353008. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:27,852][07361] Avg episode reward: [(0, '48.537')] [2025-01-05 16:26:28,027][07482] Updated weights for policy 0, policy_version 315328 (0.0017) [2025-01-05 16:26:30,040][07482] Updated weights for policy 0, policy_version 315338 (0.0016) [2025-01-05 16:26:32,155][07482] Updated weights for policy 0, policy_version 315348 (0.0016) [2025-01-05 16:26:32,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1291677696. Throughput: 0: 4884.8. Samples: 18382450. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:32,852][07361] Avg episode reward: [(0, '46.150')] [2025-01-05 16:26:34,296][07482] Updated weights for policy 0, policy_version 315358 (0.0017) [2025-01-05 16:26:36,310][07482] Updated weights for policy 0, policy_version 315368 (0.0015) [2025-01-05 16:26:37,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1291776000. Throughput: 0: 4880.9. Samples: 18396972. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:37,852][07361] Avg episode reward: [(0, '44.483')] [2025-01-05 16:26:37,859][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000315375_1291776000.pth... [2025-01-05 16:26:37,916][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000314229_1287081984.pth [2025-01-05 16:26:38,474][07482] Updated weights for policy 0, policy_version 315378 (0.0016) [2025-01-05 16:26:40,516][07482] Updated weights for policy 0, policy_version 315388 (0.0015) [2025-01-05 16:26:42,534][07482] Updated weights for policy 0, policy_version 315398 (0.0016) [2025-01-05 16:26:42,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1291874304. Throughput: 0: 4883.3. Samples: 18426536. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:42,852][07361] Avg episode reward: [(0, '47.240')] [2025-01-05 16:26:44,680][07482] Updated weights for policy 0, policy_version 315408 (0.0016) [2025-01-05 16:26:46,727][07482] Updated weights for policy 0, policy_version 315418 (0.0015) [2025-01-05 16:26:47,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1291972608. Throughput: 0: 4895.0. Samples: 18456044. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:47,852][07361] Avg episode reward: [(0, '48.984')] [2025-01-05 16:26:47,858][07448] Saving new best policy, reward=48.984! [2025-01-05 16:26:48,890][07482] Updated weights for policy 0, policy_version 315428 (0.0016) [2025-01-05 16:26:51,031][07482] Updated weights for policy 0, policy_version 315438 (0.0016) [2025-01-05 16:26:52,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1292066816. Throughput: 0: 4891.1. Samples: 18470322. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:52,852][07361] Avg episode reward: [(0, '47.466')] [2025-01-05 16:26:53,078][07482] Updated weights for policy 0, policy_version 315448 (0.0016) [2025-01-05 16:26:55,112][07482] Updated weights for policy 0, policy_version 315458 (0.0016) [2025-01-05 16:26:57,220][07482] Updated weights for policy 0, policy_version 315468 (0.0017) [2025-01-05 16:26:57,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19524.4, 300 sec: 19563.6). Total num frames: 1292165120. Throughput: 0: 4895.5. Samples: 18500012. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:26:57,852][07361] Avg episode reward: [(0, '45.298')] [2025-01-05 16:26:59,302][07482] Updated weights for policy 0, policy_version 315478 (0.0016) [2025-01-05 16:27:01,338][07482] Updated weights for policy 0, policy_version 315488 (0.0016) [2025-01-05 16:27:02,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1292263424. Throughput: 0: 4894.5. Samples: 18529426. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:02,852][07361] Avg episode reward: [(0, '46.199')] [2025-01-05 16:27:03,546][07482] Updated weights for policy 0, policy_version 315498 (0.0017) [2025-01-05 16:27:05,579][07482] Updated weights for policy 0, policy_version 315508 (0.0015) [2025-01-05 16:27:07,604][07482] Updated weights for policy 0, policy_version 315518 (0.0017) [2025-01-05 16:27:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1292361728. Throughput: 0: 4896.4. Samples: 18544108. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:07,852][07361] Avg episode reward: [(0, '45.318')] [2025-01-05 16:27:09,884][07482] Updated weights for policy 0, policy_version 315528 (0.0017) [2025-01-05 16:27:11,918][07482] Updated weights for policy 0, policy_version 315538 (0.0015) [2025-01-05 16:27:12,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19592.5, 300 sec: 19577.5). 
Total num frames: 1292460032. Throughput: 0: 4891.9. Samples: 18573142. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:12,852][07361] Avg episode reward: [(0, '42.571')] [2025-01-05 16:27:14,013][07482] Updated weights for policy 0, policy_version 315548 (0.0017) [2025-01-05 16:27:16,132][07482] Updated weights for policy 0, policy_version 315558 (0.0016) [2025-01-05 16:27:17,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1292558336. Throughput: 0: 4889.6. Samples: 18602484. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:17,852][07361] Avg episode reward: [(0, '43.803')] [2025-01-05 16:27:18,222][07482] Updated weights for policy 0, policy_version 315568 (0.0016) [2025-01-05 16:27:20,257][07482] Updated weights for policy 0, policy_version 315578 (0.0014) [2025-01-05 16:27:22,489][07482] Updated weights for policy 0, policy_version 315588 (0.0017) [2025-01-05 16:27:22,851][07361] Fps is (10 sec: 19251.4, 60 sec: 19524.3, 300 sec: 19563.7). Total num frames: 1292652544. Throughput: 0: 4892.0. Samples: 18617110. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:22,852][07361] Avg episode reward: [(0, '43.053')] [2025-01-05 16:27:24,623][07482] Updated weights for policy 0, policy_version 315598 (0.0017) [2025-01-05 16:27:26,631][07482] Updated weights for policy 0, policy_version 315608 (0.0018) [2025-01-05 16:27:27,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1292750848. Throughput: 0: 4878.9. Samples: 18646088. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:27,852][07361] Avg episode reward: [(0, '43.470')] [2025-01-05 16:27:28,767][07482] Updated weights for policy 0, policy_version 315618 (0.0016) [2025-01-05 16:27:30,765][07482] Updated weights for policy 0, policy_version 315628 (0.0015) [2025-01-05 16:27:32,781][07482] Updated weights for policy 0, policy_version 315638 (0.0015) [2025-01-05 16:27:32,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1292853248. Throughput: 0: 4886.8. Samples: 18675952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:32,852][07361] Avg episode reward: [(0, '44.366')] [2025-01-05 16:27:34,914][07482] Updated weights for policy 0, policy_version 315648 (0.0016) [2025-01-05 16:27:36,923][07482] Updated weights for policy 0, policy_version 315658 (0.0016) [2025-01-05 16:27:37,852][07361] Fps is (10 sec: 20070.6, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1292951552. Throughput: 0: 4901.8. Samples: 18690902. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:37,852][07361] Avg episode reward: [(0, '45.502')] [2025-01-05 16:27:38,962][07482] Updated weights for policy 0, policy_version 315668 (0.0014) [2025-01-05 16:27:41,081][07482] Updated weights for policy 0, policy_version 315678 (0.0016) [2025-01-05 16:27:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19591.4). Total num frames: 1293049856. Throughput: 0: 4905.0. Samples: 18720736. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:42,852][07361] Avg episode reward: [(0, '45.436')] [2025-01-05 16:27:43,173][07482] Updated weights for policy 0, policy_version 315688 (0.0017) [2025-01-05 16:27:45,222][07482] Updated weights for policy 0, policy_version 315698 (0.0016) [2025-01-05 16:27:47,371][07482] Updated weights for policy 0, policy_version 315708 (0.0018) [2025-01-05 16:27:47,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19577.5). 
Total num frames: 1293148160. Throughput: 0: 4905.2. Samples: 18750160. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:47,852][07361] Avg episode reward: [(0, '44.056')] [2025-01-05 16:27:49,436][07482] Updated weights for policy 0, policy_version 315718 (0.0016) [2025-01-05 16:27:51,473][07482] Updated weights for policy 0, policy_version 315728 (0.0016) [2025-01-05 16:27:52,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1293246464. Throughput: 0: 4902.8. Samples: 18764736. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:52,852][07361] Avg episode reward: [(0, '46.335')] [2025-01-05 16:27:53,699][07482] Updated weights for policy 0, policy_version 315738 (0.0016) [2025-01-05 16:27:55,715][07482] Updated weights for policy 0, policy_version 315748 (0.0017) [2025-01-05 16:27:57,759][07482] Updated weights for policy 0, policy_version 315758 (0.0017) [2025-01-05 16:27:57,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1293344768. Throughput: 0: 4909.2. Samples: 18794058. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:27:57,852][07361] Avg episode reward: [(0, '47.163')] [2025-01-05 16:27:59,941][07482] Updated weights for policy 0, policy_version 315768 (0.0017) [2025-01-05 16:28:01,945][07482] Updated weights for policy 0, policy_version 315778 (0.0016) [2025-01-05 16:28:02,852][07361] Fps is (10 sec: 19659.7, 60 sec: 19660.6, 300 sec: 19591.3). Total num frames: 1293443072. Throughput: 0: 4913.4. Samples: 18823588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:02,853][07361] Avg episode reward: [(0, '47.721')] [2025-01-05 16:28:03,993][07482] Updated weights for policy 0, policy_version 315788 (0.0016) [2025-01-05 16:28:06,118][07482] Updated weights for policy 0, policy_version 315798 (0.0017) [2025-01-05 16:28:07,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1293541376. 
Throughput: 0: 4918.7. Samples: 18838452. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:07,852][07361] Avg episode reward: [(0, '47.219')] [2025-01-05 16:28:08,211][07482] Updated weights for policy 0, policy_version 315808 (0.0017) [2025-01-05 16:28:10,287][07482] Updated weights for policy 0, policy_version 315818 (0.0017) [2025-01-05 16:28:12,402][07482] Updated weights for policy 0, policy_version 315828 (0.0015) [2025-01-05 16:28:12,852][07361] Fps is (10 sec: 19661.9, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1293639680. Throughput: 0: 4925.5. Samples: 18867734. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:12,852][07361] Avg episode reward: [(0, '48.140')] [2025-01-05 16:28:14,480][07482] Updated weights for policy 0, policy_version 315838 (0.0017) [2025-01-05 16:28:16,534][07482] Updated weights for policy 0, policy_version 315848 (0.0015) [2025-01-05 16:28:17,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19591.4). Total num frames: 1293737984. Throughput: 0: 4914.1. Samples: 18897088. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:17,852][07361] Avg episode reward: [(0, '48.644')] [2025-01-05 16:28:18,732][07482] Updated weights for policy 0, policy_version 315858 (0.0016) [2025-01-05 16:28:20,758][07482] Updated weights for policy 0, policy_version 315868 (0.0016) [2025-01-05 16:28:22,852][07361] Fps is (10 sec: 19251.1, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1293832192. Throughput: 0: 4909.3. Samples: 18911822. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:22,852][07361] Avg episode reward: [(0, '47.245')] [2025-01-05 16:28:22,891][07482] Updated weights for policy 0, policy_version 315878 (0.0017) [2025-01-05 16:28:25,073][07482] Updated weights for policy 0, policy_version 315888 (0.0017) [2025-01-05 16:28:27,079][07482] Updated weights for policy 0, policy_version 315898 (0.0016) [2025-01-05 16:28:27,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19660.8, 300 sec: 19563.6). Total num frames: 1293930496. Throughput: 0: 4890.6. Samples: 18940814. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:27,852][07361] Avg episode reward: [(0, '46.993')] [2025-01-05 16:28:29,156][07482] Updated weights for policy 0, policy_version 315908 (0.0016) [2025-01-05 16:28:31,278][07482] Updated weights for policy 0, policy_version 315918 (0.0016) [2025-01-05 16:28:32,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1294028800. Throughput: 0: 4891.5. Samples: 18970278. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:32,852][07361] Avg episode reward: [(0, '43.664')] [2025-01-05 16:28:33,373][07482] Updated weights for policy 0, policy_version 315928 (0.0017) [2025-01-05 16:28:35,468][07482] Updated weights for policy 0, policy_version 315938 (0.0016) [2025-01-05 16:28:37,548][07482] Updated weights for policy 0, policy_version 315948 (0.0016) [2025-01-05 16:28:37,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19592.5, 300 sec: 19563.6). Total num frames: 1294127104. Throughput: 0: 4891.8. Samples: 18984868. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:37,852][07361] Avg episode reward: [(0, '43.910')] [2025-01-05 16:28:37,932][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000315950_1294131200.pth... 
[2025-01-05 16:28:37,988][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000314804_1289437184.pth [2025-01-05 16:28:39,660][07482] Updated weights for policy 0, policy_version 315958 (0.0017) [2025-01-05 16:28:41,742][07482] Updated weights for policy 0, policy_version 315968 (0.0016) [2025-01-05 16:28:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.5, 300 sec: 19577.5). Total num frames: 1294225408. Throughput: 0: 4892.3. Samples: 19014210. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:42,852][07361] Avg episode reward: [(0, '44.769')] [2025-01-05 16:28:43,943][07482] Updated weights for policy 0, policy_version 315978 (0.0017) [2025-01-05 16:28:45,955][07482] Updated weights for policy 0, policy_version 315988 (0.0016) [2025-01-05 16:28:47,851][07361] Fps is (10 sec: 19661.2, 60 sec: 19592.6, 300 sec: 19563.6). Total num frames: 1294323712. Throughput: 0: 4886.4. Samples: 19043472. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:47,852][07361] Avg episode reward: [(0, '47.114')] [2025-01-05 16:28:48,010][07482] Updated weights for policy 0, policy_version 315998 (0.0016) [2025-01-05 16:28:50,128][07482] Updated weights for policy 0, policy_version 316008 (0.0016) [2025-01-05 16:28:52,179][07482] Updated weights for policy 0, policy_version 316018 (0.0018) [2025-01-05 16:28:52,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19592.6, 300 sec: 19577.5). Total num frames: 1294422016. Throughput: 0: 4889.0. Samples: 19058456. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:52,852][07361] Avg episode reward: [(0, '47.466')] [2025-01-05 16:28:54,285][07482] Updated weights for policy 0, policy_version 316028 (0.0016) [2025-01-05 16:28:56,433][07482] Updated weights for policy 0, policy_version 316038 (0.0016) [2025-01-05 16:28:57,851][07361] Fps is (10 sec: 19251.1, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1294516224. 
Throughput: 0: 4887.8. Samples: 19087684. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:28:57,852][07361] Avg episode reward: [(0, '47.939')] [2025-01-05 16:28:58,521][07482] Updated weights for policy 0, policy_version 316048 (0.0018) [2025-01-05 16:29:00,589][07482] Updated weights for policy 0, policy_version 316058 (0.0016) [2025-01-05 16:29:02,719][07482] Updated weights for policy 0, policy_version 316068 (0.0017) [2025-01-05 16:29:02,851][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.5, 300 sec: 19563.6). Total num frames: 1294614528. Throughput: 0: 4886.8. Samples: 19116992. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:02,852][07361] Avg episode reward: [(0, '44.290')] [2025-01-05 16:29:04,792][07482] Updated weights for policy 0, policy_version 316078 (0.0017) [2025-01-05 16:29:06,841][07482] Updated weights for policy 0, policy_version 316088 (0.0016) [2025-01-05 16:29:07,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1294712832. Throughput: 0: 4883.7. Samples: 19131590. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:07,852][07361] Avg episode reward: [(0, '42.158')] [2025-01-05 16:29:09,006][07482] Updated weights for policy 0, policy_version 316098 (0.0017) [2025-01-05 16:29:11,016][07482] Updated weights for policy 0, policy_version 316108 (0.0021) [2025-01-05 16:29:12,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.3, 300 sec: 19563.6). Total num frames: 1294811136. Throughput: 0: 4895.9. Samples: 19161130. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:12,852][07361] Avg episode reward: [(0, '44.449')] [2025-01-05 16:29:13,104][07482] Updated weights for policy 0, policy_version 316118 (0.0017) [2025-01-05 16:29:15,201][07482] Updated weights for policy 0, policy_version 316128 (0.0017) [2025-01-05 16:29:17,225][07482] Updated weights for policy 0, policy_version 316138 (0.0017) [2025-01-05 16:29:17,852][07361] Fps is (10 sec: 19660.8, 60 sec: 19524.2, 300 sec: 19563.6). Total num frames: 1294909440. Throughput: 0: 4901.9. Samples: 19190864. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:17,852][07361] Avg episode reward: [(0, '46.689')] [2025-01-05 16:29:19,318][07482] Updated weights for policy 0, policy_version 316148 (0.0017) [2025-01-05 16:29:21,416][07482] Updated weights for policy 0, policy_version 316158 (0.0018) [2025-01-05 16:29:22,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19592.6, 300 sec: 19563.6). Total num frames: 1295007744. Throughput: 0: 4905.0. Samples: 19205592. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:22,852][07361] Avg episode reward: [(0, '46.866')] [2025-01-05 16:29:23,498][07482] Updated weights for policy 0, policy_version 316168 (0.0017) [2025-01-05 16:29:25,539][07482] Updated weights for policy 0, policy_version 316178 (0.0015) [2025-01-05 16:29:27,631][07482] Updated weights for policy 0, policy_version 316188 (0.0017) [2025-01-05 16:29:27,852][07361] Fps is (10 sec: 20070.3, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1295110144. Throughput: 0: 4911.4. Samples: 19235222. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:27,852][07361] Avg episode reward: [(0, '44.802')] [2025-01-05 16:29:29,717][07482] Updated weights for policy 0, policy_version 316198 (0.0017) [2025-01-05 16:29:31,757][07482] Updated weights for policy 0, policy_version 316208 (0.0017) [2025-01-05 16:29:32,851][07361] Fps is (10 sec: 20070.5, 60 sec: 19660.8, 300 sec: 19577.5). 
Total num frames: 1295208448. Throughput: 0: 4914.7. Samples: 19264632. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:32,852][07361] Avg episode reward: [(0, '41.965')] [2025-01-05 16:29:33,955][07482] Updated weights for policy 0, policy_version 316218 (0.0016) [2025-01-05 16:29:35,944][07482] Updated weights for policy 0, policy_version 316228 (0.0015) [2025-01-05 16:29:37,851][07361] Fps is (10 sec: 19661.1, 60 sec: 19660.9, 300 sec: 19577.5). Total num frames: 1295306752. Throughput: 0: 4909.0. Samples: 19279360. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:37,852][07361] Avg episode reward: [(0, '43.389')] [2025-01-05 16:29:37,984][07482] Updated weights for policy 0, policy_version 316238 (0.0016) [2025-01-05 16:29:40,118][07482] Updated weights for policy 0, policy_version 316248 (0.0017) [2025-01-05 16:29:42,103][07482] Updated weights for policy 0, policy_version 316258 (0.0016) [2025-01-05 16:29:42,852][07361] Fps is (10 sec: 19660.7, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1295405056. Throughput: 0: 4925.0. Samples: 19309310. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:42,852][07361] Avg episode reward: [(0, '43.708')] [2025-01-05 16:29:44,154][07482] Updated weights for policy 0, policy_version 316268 (0.0017) [2025-01-05 16:29:46,260][07482] Updated weights for policy 0, policy_version 316278 (0.0016) [2025-01-05 16:29:47,852][07361] Fps is (10 sec: 19660.6, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1295503360. Throughput: 0: 4933.3. Samples: 19338992. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:47,852][07361] Avg episode reward: [(0, '46.514')] [2025-01-05 16:29:48,385][07482] Updated weights for policy 0, policy_version 316288 (0.0017) [2025-01-05 16:29:50,444][07482] Updated weights for policy 0, policy_version 316298 (0.0015) [2025-01-05 16:29:52,551][07482] Updated weights for policy 0, policy_version 316308 (0.0016) [2025-01-05 16:29:52,851][07361] Fps is (10 sec: 19661.0, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1295601664. Throughput: 0: 4933.3. Samples: 19353588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:52,852][07361] Avg episode reward: [(0, '49.187')] [2025-01-05 16:29:52,930][07448] Saving new best policy, reward=49.187! [2025-01-05 16:29:54,669][07482] Updated weights for policy 0, policy_version 316318 (0.0016) [2025-01-05 16:29:56,721][07482] Updated weights for policy 0, policy_version 316328 (0.0018) [2025-01-05 16:29:57,852][07361] Fps is (10 sec: 19660.4, 60 sec: 19729.0, 300 sec: 19591.4). Total num frames: 1295699968. Throughput: 0: 4929.3. Samples: 19382952. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) [2025-01-05 16:29:57,852][07361] Avg episode reward: [(0, '45.870')] [2025-01-05 16:29:58,915][07482] Updated weights for policy 0, policy_version 316338 (0.0016) [2025-01-05 16:30:00,926][07482] Updated weights for policy 0, policy_version 316348 (0.0016) [2025-01-05 16:30:02,852][07361] Fps is (10 sec: 19660.5, 60 sec: 19729.0, 300 sec: 19591.4). Total num frames: 1295798272. Throughput: 0: 4920.0. Samples: 19412262. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:02,852][07361] Avg episode reward: [(0, '45.132')] [2025-01-05 16:30:02,996][07482] Updated weights for policy 0, policy_version 316358 (0.0015) [2025-01-05 16:30:05,130][07482] Updated weights for policy 0, policy_version 316368 (0.0017) [2025-01-05 16:30:07,153][07482] Updated weights for policy 0, policy_version 316378 (0.0016) [2025-01-05 16:30:07,851][07361] Fps is (10 sec: 19661.4, 60 sec: 19729.1, 300 sec: 19591.4). Total num frames: 1295896576. Throughput: 0: 4923.0. Samples: 19427126. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:07,852][07361] Avg episode reward: [(0, '46.297')] [2025-01-05 16:30:09,305][07482] Updated weights for policy 0, policy_version 316388 (0.0018) [2025-01-05 16:30:11,418][07482] Updated weights for policy 0, policy_version 316398 (0.0016) [2025-01-05 16:30:12,851][07361] Fps is (10 sec: 19251.5, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1295990784. Throughput: 0: 4912.8. Samples: 19456298. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:12,852][07361] Avg episode reward: [(0, '45.500')] [2025-01-05 16:30:13,527][07482] Updated weights for policy 0, policy_version 316408 (0.0017) [2025-01-05 16:30:15,581][07482] Updated weights for policy 0, policy_version 316418 (0.0016) [2025-01-05 16:30:17,680][07482] Updated weights for policy 0, policy_version 316428 (0.0016) [2025-01-05 16:30:17,852][07361] Fps is (10 sec: 19251.0, 60 sec: 19660.8, 300 sec: 19577.5). Total num frames: 1296089088. Throughput: 0: 4912.7. Samples: 19485702. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:17,852][07361] Avg episode reward: [(0, '45.789')] [2025-01-05 16:30:19,820][07482] Updated weights for policy 0, policy_version 316438 (0.0016) [2025-01-05 16:30:21,881][07482] Updated weights for policy 0, policy_version 316448 (0.0016) [2025-01-05 16:30:22,851][07361] Fps is (10 sec: 19660.8, 60 sec: 19660.8, 300 sec: 19591.4). 
Total num frames: 1296187392. Throughput: 0: 4905.9. Samples: 19500124. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:22,852][07361] Avg episode reward: [(0, '46.876')] [2025-01-05 16:30:24,051][07482] Updated weights for policy 0, policy_version 316458 (0.0017) [2025-01-05 16:30:26,088][07482] Updated weights for policy 0, policy_version 316468 (0.0016) [2025-01-05 16:30:27,852][07361] Fps is (10 sec: 19251.2, 60 sec: 19524.3, 300 sec: 19577.5). Total num frames: 1296281600. Throughput: 0: 4891.6. Samples: 19529432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:27,852][07361] Avg episode reward: [(0, '44.685')] [2025-01-05 16:30:28,299][07482] Updated weights for policy 0, policy_version 316478 (0.0016) [2025-01-05 16:30:30,418][07482] Updated weights for policy 0, policy_version 316488 (0.0016) [2025-01-05 16:30:32,508][07482] Updated weights for policy 0, policy_version 316498 (0.0018) [2025-01-05 16:30:32,852][07361] Fps is (10 sec: 19250.7, 60 sec: 19524.2, 300 sec: 19577.5). Total num frames: 1296379904. Throughput: 0: 4873.3. Samples: 19558292. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:32,852][07361] Avg episode reward: [(0, '44.408')] [2025-01-05 16:30:34,910][07482] Updated weights for policy 0, policy_version 316508 (0.0018) [2025-01-05 16:30:37,089][07482] Updated weights for policy 0, policy_version 316518 (0.0017) [2025-01-05 16:30:37,852][07361] Fps is (10 sec: 18841.6, 60 sec: 19387.7, 300 sec: 19549.7). Total num frames: 1296470016. Throughput: 0: 4835.9. Samples: 19571206. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:37,852][07361] Avg episode reward: [(0, '48.190')] [2025-01-05 16:30:37,901][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000316522_1296474112.pth... 
[2025-01-05 16:30:37,961][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000315375_1291776000.pth [2025-01-05 16:30:39,264][07482] Updated weights for policy 0, policy_version 316528 (0.0018) [2025-01-05 16:30:41,364][07482] Updated weights for policy 0, policy_version 316538 (0.0017) [2025-01-05 16:30:42,852][07361] Fps is (10 sec: 18432.3, 60 sec: 19319.5, 300 sec: 19549.7). Total num frames: 1296564224. Throughput: 0: 4817.2. Samples: 19599726. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) [2025-01-05 16:30:42,852][07361] Avg episode reward: [(0, '47.135')] [2025-01-05 16:30:43,498][07482] Updated weights for policy 0, policy_version 316548 (0.0017) [2025-01-05 16:30:43,981][07361] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 7361], exiting... [2025-01-05 16:30:43,982][07448] Stopping Batcher_0... [2025-01-05 16:30:43,984][07361] Runner profile tree view: main_loop: 4202.8490 [2025-01-05 16:30:43,983][07448] Loop batcher_evt_loop terminating... [2025-01-05 16:30:43,985][07448] Saving /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000316550_1296588800.pth... 
[2025-01-05 16:30:43,985][07361] Collected {0: 1296588800}, FPS: 18665.1 [2025-01-05 16:30:44,050][07482] Weights refcount: 2 0 [2025-01-05 16:30:44,034][07505] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info 
= self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,056][07482] Stopping InferenceWorker_p0-w0... 
[2025-01-05 16:30:44,053][07510] EvtLoop [rollout_proc11_evt_loop, process=rollout_proc11] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance11'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,058][07482] Loop inference_proc0-0_evt_loop terminating... [2025-01-05 16:30:44,058][07510] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc11_evt_loop [2025-01-05 16:30:44,058][07505] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop [2025-01-05 16:30:44,059][07448] Removing /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000315950_1294131200.pth [2025-01-05 16:30:44,060][07448] Stopping LearnerWorker_p0... [2025-01-05 16:30:44,061][07448] Loop learner_proc0_evt_loop terminating... 
[2025-01-05 16:30:44,058][07483] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,063][07483] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc2_evt_loop [2025-01-05 16:30:44,056][07485] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,065][07485] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc1_evt_loop [2025-01-05 16:30:44,065][07509] EvtLoop [rollout_proc9_evt_loop, process=rollout_proc9] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance9'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,088][07509] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc9_evt_loop [2025-01-05 16:30:44,079][07486] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,090][07486] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc0_evt_loop [2025-01-05 16:30:44,064][07503] EvtLoop [rollout_proc5_evt_loop, process=rollout_proc5] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance5'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,098][07503] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc5_evt_loop [2025-01-05 16:30:44,079][07507] EvtLoop [rollout_proc8_evt_loop, process=rollout_proc8] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance8'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,100][07507] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc8_evt_loop [2025-01-05 16:30:44,117][07504] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(1, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,128][07504] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc7_evt_loop [2025-01-05 16:30:44,146][07506] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,151][07506] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc4_evt_loop [2025-01-05 16:30:44,180][07484] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,187][07484] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc3_evt_loop [2025-01-05 16:30:44,177][07508] EvtLoop [rollout_proc10_evt_loop, process=rollout_proc10] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance10'), args=(0, 0) Traceback (most recent call last): File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal slot_callable(*args) File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts new_obs, rewards, terminated, truncated, infos = e.step(actions) ^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 129, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/algo/utils/make_env.py", line 115, in step obs, rew, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File 
"/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 522, in step observation, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sample_factory/envs/env_wrappers.py", line 86, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/gymnasium/core.py", line 461, in step return self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step obs, reward, terminated, truncated, info = self.env.step(action) ^^^^^^^^^^^^^^^^^^^^^ File "/home/steve/Documents/AI/huggingface/rl/unit_8_2/.env/lib/python3.12/site-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step reward = self.game.make_action(actions_flattened, self.skip_frames) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [2025-01-05 16:30:44,195][07508] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc10_evt_loop [2025-01-05 16:30:44,894][07361] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 16:30:44,894][07361] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 16:30:44,895][07361] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 16:30:44,895][07361] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 16:30:44,895][07361] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2025-01-05 16:30:44,896][07361] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 16:30:44,896][07361] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 16:30:44,896][07361] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 16:30:44,896][07361] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-05 16:30:44,897][07361] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-05 16:30:44,897][07361] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 16:30:44,897][07361] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 16:30:44,897][07361] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 16:30:44,898][07361] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-05 16:30:44,898][07361] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 16:30:44,950][07361] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-05 16:30:44,953][07361] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 16:30:44,954][07361] RunningMeanStd input shape: (1,) [2025-01-05 16:30:44,979][07361] ConvEncoder: input_channels=3 [2025-01-05 16:30:45,211][07361] Conv encoder output size: 512 [2025-01-05 16:30:45,211][07361] Policy head output size: 512 [2025-01-05 16:30:45,338][07361] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000316550_1296588800.pth... [2025-01-05 16:30:45,989][07361] Num frames 100... [2025-01-05 16:30:46,090][07361] Num frames 200... [2025-01-05 16:30:46,190][07361] Num frames 300... [2025-01-05 16:30:46,286][07361] Num frames 400... [2025-01-05 16:30:46,388][07361] Num frames 500... 
[2025-01-05 16:30:46,492][07361] Num frames 600... [2025-01-05 16:30:46,595][07361] Num frames 700... [2025-01-05 16:30:46,698][07361] Num frames 800... [2025-01-05 16:30:46,804][07361] Num frames 900... [2025-01-05 16:30:46,898][07361] Num frames 1000... [2025-01-05 16:30:46,998][07361] Num frames 1100... [2025-01-05 16:30:47,101][07361] Num frames 1200... [2025-01-05 16:30:47,205][07361] Num frames 1300... [2025-01-05 16:30:47,308][07361] Num frames 1400... [2025-01-05 16:30:47,410][07361] Num frames 1500... [2025-01-05 16:30:47,530][07361] Num frames 1600... [2025-01-05 16:30:47,632][07361] Num frames 1700... [2025-01-05 16:30:47,735][07361] Num frames 1800... [2025-01-05 16:30:47,839][07361] Num frames 1900... [2025-01-05 16:30:47,941][07361] Num frames 2000... [2025-01-05 16:30:48,045][07361] Num frames 2100... [2025-01-05 16:30:48,097][07361] Avg episode rewards: #0: 51.999, true rewards: #0: 21.000 [2025-01-05 16:30:48,098][07361] Avg episode reward: 51.999, avg true_objective: 21.000 [2025-01-05 16:30:48,224][07361] Num frames 2200... [2025-01-05 16:30:48,323][07361] Num frames 2300... [2025-01-05 16:30:48,422][07361] Num frames 2400... [2025-01-05 16:30:48,521][07361] Num frames 2500... [2025-01-05 16:30:48,644][07361] Avg episode rewards: #0: 29.840, true rewards: #0: 12.840 [2025-01-05 16:30:48,644][07361] Avg episode reward: 29.840, avg true_objective: 12.840 [2025-01-05 16:30:48,695][07361] Num frames 2600... [2025-01-05 16:30:48,790][07361] Num frames 2700... [2025-01-05 16:30:48,890][07361] Num frames 2800... [2025-01-05 16:30:48,994][07361] Num frames 2900... [2025-01-05 16:30:49,094][07361] Num frames 3000... [2025-01-05 16:30:49,193][07361] Num frames 3100... [2025-01-05 16:30:49,293][07361] Num frames 3200... [2025-01-05 16:30:49,391][07361] Num frames 3300... [2025-01-05 16:30:49,492][07361] Num frames 3400... [2025-01-05 16:30:49,591][07361] Num frames 3500... [2025-01-05 16:30:49,707][07361] Num frames 3600... 
[2025-01-05 16:30:49,810][07361] Num frames 3700... [2025-01-05 16:30:49,912][07361] Num frames 3800... [2025-01-05 16:30:50,013][07361] Num frames 3900... [2025-01-05 16:30:50,116][07361] Num frames 4000... [2025-01-05 16:30:50,218][07361] Num frames 4100... [2025-01-05 16:30:50,318][07361] Num frames 4200... [2025-01-05 16:30:50,419][07361] Num frames 4300... [2025-01-05 16:30:50,519][07361] Num frames 4400... [2025-01-05 16:30:50,618][07361] Num frames 4500... [2025-01-05 16:30:50,719][07361] Num frames 4600... [2025-01-05 16:30:50,850][07361] Avg episode rewards: #0: 39.893, true rewards: #0: 15.560 [2025-01-05 16:30:50,850][07361] Avg episode reward: 39.893, avg true_objective: 15.560 [2025-01-05 16:30:50,901][07361] Num frames 4700... [2025-01-05 16:30:50,992][07361] Num frames 4800... [2025-01-05 16:30:51,092][07361] Num frames 4900... [2025-01-05 16:30:51,190][07361] Num frames 5000... [2025-01-05 16:30:51,289][07361] Num frames 5100... [2025-01-05 16:30:51,387][07361] Num frames 5200... [2025-01-05 16:30:51,486][07361] Num frames 5300... [2025-01-05 16:30:51,583][07361] Num frames 5400... [2025-01-05 16:30:51,683][07361] Num frames 5500... [2025-01-05 16:30:51,769][07361] Avg episode rewards: #0: 33.829, true rewards: #0: 13.830 [2025-01-05 16:30:51,770][07361] Avg episode reward: 33.829, avg true_objective: 13.830 [2025-01-05 16:30:51,877][07361] Num frames 5600... [2025-01-05 16:30:51,970][07361] Num frames 5700... [2025-01-05 16:30:52,068][07361] Num frames 5800... [2025-01-05 16:30:52,165][07361] Num frames 5900... [2025-01-05 16:30:52,261][07361] Num frames 6000... [2025-01-05 16:30:52,358][07361] Num frames 6100... [2025-01-05 16:30:52,457][07361] Num frames 6200... [2025-01-05 16:30:52,555][07361] Num frames 6300... [2025-01-05 16:30:52,653][07361] Num frames 6400... [2025-01-05 16:30:52,749][07361] Num frames 6500... [2025-01-05 16:30:52,847][07361] Num frames 6600... [2025-01-05 16:30:52,946][07361] Num frames 6700... 
[2025-01-05 16:30:53,044][07361] Num frames 6800... [2025-01-05 16:30:53,142][07361] Num frames 6900... [2025-01-05 16:30:53,239][07361] Num frames 7000... [2025-01-05 16:30:53,336][07361] Num frames 7100... [2025-01-05 16:30:53,433][07361] Num frames 7200... [2025-01-05 16:30:53,532][07361] Num frames 7300... [2025-01-05 16:30:53,630][07361] Num frames 7400... [2025-01-05 16:30:53,729][07361] Num frames 7500... [2025-01-05 16:30:53,828][07361] Num frames 7600... [2025-01-05 16:30:53,913][07361] Avg episode rewards: #0: 39.263, true rewards: #0: 15.264 [2025-01-05 16:30:53,914][07361] Avg episode reward: 39.263, avg true_objective: 15.264 [2025-01-05 16:30:54,015][07361] Num frames 7700... [2025-01-05 16:30:54,112][07361] Num frames 7800... [2025-01-05 16:30:54,207][07361] Num frames 7900... [2025-01-05 16:30:54,304][07361] Num frames 8000... [2025-01-05 16:30:54,400][07361] Num frames 8100... [2025-01-05 16:30:54,499][07361] Num frames 8200... [2025-01-05 16:30:54,598][07361] Num frames 8300... [2025-01-05 16:30:54,694][07361] Num frames 8400... [2025-01-05 16:30:54,792][07361] Num frames 8500... [2025-01-05 16:30:54,896][07361] Num frames 8600... [2025-01-05 16:30:54,998][07361] Num frames 8700... [2025-01-05 16:30:55,106][07361] Num frames 8800... [2025-01-05 16:30:55,211][07361] Num frames 8900... [2025-01-05 16:30:55,316][07361] Num frames 9000... [2025-01-05 16:30:55,421][07361] Num frames 9100... [2025-01-05 16:30:55,529][07361] Num frames 9200... [2025-01-05 16:30:55,634][07361] Num frames 9300... [2025-01-05 16:30:55,738][07361] Num frames 9400... [2025-01-05 16:30:55,840][07361] Avg episode rewards: #0: 40.741, true rewards: #0: 15.742 [2025-01-05 16:30:55,841][07361] Avg episode reward: 40.741, avg true_objective: 15.742 [2025-01-05 16:30:55,910][07361] Num frames 9500... [2025-01-05 16:30:56,009][07361] Num frames 9600... [2025-01-05 16:30:56,134][07361] Num frames 9700... [2025-01-05 16:30:56,233][07361] Num frames 9800... 
[2025-01-05 16:30:56,338][07361] Num frames 9900... [2025-01-05 16:30:56,443][07361] Num frames 10000... [2025-01-05 16:30:56,546][07361] Num frames 10100... [2025-01-05 16:30:56,638][07361] Num frames 10200... [2025-01-05 16:30:56,729][07361] Num frames 10300... [2025-01-05 16:30:56,821][07361] Num frames 10400... [2025-01-05 16:30:56,877][07361] Avg episode rewards: #0: 37.858, true rewards: #0: 14.859 [2025-01-05 16:30:56,877][07361] Avg episode reward: 37.858, avg true_objective: 14.859 [2025-01-05 16:30:56,981][07361] Num frames 10500... [2025-01-05 16:30:57,084][07361] Num frames 10600... [2025-01-05 16:30:57,178][07361] Num frames 10700... [2025-01-05 16:30:57,270][07361] Num frames 10800... [2025-01-05 16:30:57,363][07361] Num frames 10900... [2025-01-05 16:30:57,457][07361] Num frames 11000... [2025-01-05 16:30:57,552][07361] Num frames 11100... [2025-01-05 16:30:57,648][07361] Num frames 11200... [2025-01-05 16:30:57,742][07361] Num frames 11300... [2025-01-05 16:30:57,844][07361] Num frames 11400... [2025-01-05 16:30:57,950][07361] Num frames 11500... [2025-01-05 16:30:58,055][07361] Num frames 11600... [2025-01-05 16:30:58,159][07361] Num frames 11700... [2025-01-05 16:30:58,274][07361] Num frames 11800... [2025-01-05 16:30:58,380][07361] Num frames 11900... [2025-01-05 16:30:58,488][07361] Num frames 12000... [2025-01-05 16:30:58,594][07361] Num frames 12100... [2025-01-05 16:30:58,699][07361] Num frames 12200... [2025-01-05 16:30:58,804][07361] Num frames 12300... [2025-01-05 16:30:58,906][07361] Num frames 12400... [2025-01-05 16:30:59,010][07361] Num frames 12500... [2025-01-05 16:30:59,066][07361] Avg episode rewards: #0: 40.501, true rewards: #0: 15.626 [2025-01-05 16:30:59,066][07361] Avg episode reward: 40.501, avg true_objective: 15.626 [2025-01-05 16:30:59,177][07361] Num frames 12600... [2025-01-05 16:30:59,284][07361] Num frames 12700... [2025-01-05 16:30:59,389][07361] Num frames 12800... [2025-01-05 16:30:59,495][07361] Num frames 12900... 
[2025-01-05 16:30:59,600][07361] Num frames 13000... [2025-01-05 16:30:59,704][07361] Num frames 13100... [2025-01-05 16:30:59,809][07361] Num frames 13200... [2025-01-05 16:30:59,914][07361] Num frames 13300... [2025-01-05 16:31:00,017][07361] Num frames 13400... [2025-01-05 16:31:00,120][07361] Num frames 13500... [2025-01-05 16:31:00,223][07361] Num frames 13600... [2025-01-05 16:31:00,324][07361] Num frames 13700... [2025-01-05 16:31:00,443][07361] Num frames 13800... [2025-01-05 16:31:00,543][07361] Avg episode rewards: #0: 39.716, true rewards: #0: 15.383 [2025-01-05 16:31:00,544][07361] Avg episode reward: 39.716, avg true_objective: 15.383 [2025-01-05 16:31:00,638][07361] Num frames 13900... [2025-01-05 16:31:00,741][07361] Num frames 14000... [2025-01-05 16:31:00,842][07361] Num frames 14100... [2025-01-05 16:31:00,946][07361] Num frames 14200... [2025-01-05 16:31:01,049][07361] Num frames 14300... [2025-01-05 16:31:01,154][07361] Num frames 14400... [2025-01-05 16:31:01,256][07361] Num frames 14500... [2025-01-05 16:31:01,357][07361] Num frames 14600... [2025-01-05 16:31:01,459][07361] Num frames 14700... [2025-01-05 16:31:01,563][07361] Num frames 14800... [2025-01-05 16:31:01,666][07361] Num frames 14900... [2025-01-05 16:31:01,770][07361] Num frames 15000... [2025-01-05 16:31:01,873][07361] Num frames 15100... [2025-01-05 16:31:01,977][07361] Num frames 15200... [2025-01-05 16:31:02,078][07361] Num frames 15300... [2025-01-05 16:31:02,180][07361] Num frames 15400... [2025-01-05 16:31:02,287][07361] Num frames 15500... [2025-01-05 16:31:02,390][07361] Num frames 15600... [2025-01-05 16:31:02,494][07361] Num frames 15700... [2025-01-05 16:31:02,610][07361] Num frames 15800... [2025-01-05 16:31:02,709][07361] Num frames 15900... 
[2025-01-05 16:31:02,810][07361] Avg episode rewards: #0: 41.144, true rewards: #0: 15.945 [2025-01-05 16:31:02,810][07361] Avg episode reward: 41.144, avg true_objective: 15.945 [2025-01-05 16:31:33,681][07361] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4! [2025-01-05 16:31:33,692][07361] Loading existing experiment configuration from /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/config.json [2025-01-05 16:31:33,692][07361] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-05 16:31:33,692][07361] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-05 16:31:33,692][07361] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-05 16:31:33,692][07361] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-05 16:31:33,692][07361] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-05 16:31:33,692][07361] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-05 16:31:33,692][07361] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'hf_repository'='spenning/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-05 16:31:33,693][07361] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
[2025-01-05 16:31:33,693][07361] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-05 16:31:33,708][07361] RunningMeanStd input shape: (3, 72, 128) [2025-01-05 16:31:33,710][07361] RunningMeanStd input shape: (1,) [2025-01-05 16:31:33,717][07361] ConvEncoder: input_channels=3 [2025-01-05 16:31:33,748][07361] Conv encoder output size: 512 [2025-01-05 16:31:33,748][07361] Policy head output size: 512 [2025-01-05 16:31:33,778][07361] Loading state from checkpoint /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/checkpoint_p0/checkpoint_000316550_1296588800.pth... [2025-01-05 16:31:34,227][07361] Num frames 100... [2025-01-05 16:31:34,338][07361] Num frames 200... [2025-01-05 16:31:34,451][07361] Num frames 300... [2025-01-05 16:31:34,552][07361] Num frames 400... [2025-01-05 16:31:34,639][07361] Num frames 500... [2025-01-05 16:31:34,731][07361] Num frames 600... [2025-01-05 16:31:34,831][07361] Num frames 700... [2025-01-05 16:31:34,919][07361] Num frames 800... [2025-01-05 16:31:35,008][07361] Num frames 900... [2025-01-05 16:31:35,101][07361] Num frames 1000... [2025-01-05 16:31:35,190][07361] Num frames 1100... [2025-01-05 16:31:35,277][07361] Num frames 1200... [2025-01-05 16:31:35,346][07361] Avg episode rewards: #0: 24.160, true rewards: #0: 12.160 [2025-01-05 16:31:35,347][07361] Avg episode reward: 24.160, avg true_objective: 12.160 [2025-01-05 16:31:35,460][07361] Num frames 1300... [2025-01-05 16:31:35,549][07361] Num frames 1400... [2025-01-05 16:31:35,639][07361] Num frames 1500... [2025-01-05 16:31:35,728][07361] Num frames 1600... [2025-01-05 16:31:35,819][07361] Num frames 1700... [2025-01-05 16:31:35,910][07361] Num frames 1800... [2025-01-05 16:31:36,003][07361] Num frames 1900... [2025-01-05 16:31:36,097][07361] Num frames 2000... [2025-01-05 16:31:36,193][07361] Num frames 2100... [2025-01-05 16:31:36,277][07361] Num frames 2200... [2025-01-05 16:31:36,363][07361] Num frames 2300... 
[2025-01-05 16:31:36,450][07361] Num frames 2400... [2025-01-05 16:31:36,542][07361] Num frames 2500... [2025-01-05 16:31:36,631][07361] Num frames 2600... [2025-01-05 16:31:36,721][07361] Num frames 2700... [2025-01-05 16:31:36,811][07361] Num frames 2800... [2025-01-05 16:31:36,900][07361] Num frames 2900... [2025-01-05 16:31:36,989][07361] Num frames 3000... [2025-01-05 16:31:37,078][07361] Num frames 3100... [2025-01-05 16:31:37,167][07361] Num frames 3200... [2025-01-05 16:31:37,283][07361] Num frames 3300... [2025-01-05 16:31:37,351][07361] Avg episode rewards: #0: 37.580, true rewards: #0: 16.580 [2025-01-05 16:31:37,351][07361] Avg episode reward: 37.580, avg true_objective: 16.580 [2025-01-05 16:31:37,445][07361] Num frames 3400... [2025-01-05 16:31:37,531][07361] Num frames 3500... [2025-01-05 16:31:37,620][07361] Num frames 3600... [2025-01-05 16:31:37,710][07361] Num frames 3700... [2025-01-05 16:31:37,802][07361] Num frames 3800... [2025-01-05 16:31:37,893][07361] Num frames 3900... [2025-01-05 16:31:37,984][07361] Num frames 4000... [2025-01-05 16:31:38,075][07361] Num frames 4100... [2025-01-05 16:31:38,164][07361] Num frames 4200... [2025-01-05 16:31:38,254][07361] Num frames 4300... [2025-01-05 16:31:38,345][07361] Num frames 4400... [2025-01-05 16:31:38,435][07361] Num frames 4500... [2025-01-05 16:31:38,525][07361] Num frames 4600... [2025-01-05 16:31:38,618][07361] Num frames 4700... [2025-01-05 16:31:38,706][07361] Num frames 4800... [2025-01-05 16:31:38,792][07361] Num frames 4900... [2025-01-05 16:31:38,878][07361] Num frames 5000... [2025-01-05 16:31:38,966][07361] Num frames 5100... [2025-01-05 16:31:39,055][07361] Num frames 5200... [2025-01-05 16:31:39,145][07361] Num frames 5300... [2025-01-05 16:31:39,232][07361] Num frames 5400... 
[2025-01-05 16:31:39,301][07361] Avg episode rewards: #0: 44.053, true rewards: #0: 18.053 [2025-01-05 16:31:39,301][07361] Avg episode reward: 44.053, avg true_objective: 18.053 [2025-01-05 16:31:39,402][07361] Num frames 5500... [2025-01-05 16:31:39,486][07361] Num frames 5600... [2025-01-05 16:31:39,572][07361] Num frames 5700... [2025-01-05 16:31:39,658][07361] Num frames 5800... [2025-01-05 16:31:39,748][07361] Num frames 5900... [2025-01-05 16:31:39,835][07361] Num frames 6000... [2025-01-05 16:31:39,923][07361] Num frames 6100... [2025-01-05 16:31:40,011][07361] Num frames 6200... [2025-01-05 16:31:40,098][07361] Num frames 6300... [2025-01-05 16:31:40,185][07361] Num frames 6400... [2025-01-05 16:31:40,271][07361] Num frames 6500... [2025-01-05 16:31:40,355][07361] Num frames 6600... [2025-01-05 16:31:40,443][07361] Num frames 6700... [2025-01-05 16:31:40,527][07361] Num frames 6800... [2025-01-05 16:31:40,602][07361] Avg episode rewards: #0: 42.059, true rewards: #0: 17.060 [2025-01-05 16:31:40,602][07361] Avg episode reward: 42.059, avg true_objective: 17.060 [2025-01-05 16:31:40,690][07361] Num frames 6900... [2025-01-05 16:31:40,774][07361] Num frames 7000... [2025-01-05 16:31:40,858][07361] Num frames 7100... [2025-01-05 16:31:40,944][07361] Num frames 7200... [2025-01-05 16:31:41,031][07361] Num frames 7300... [2025-01-05 16:31:41,115][07361] Num frames 7400... [2025-01-05 16:31:41,201][07361] Num frames 7500... [2025-01-05 16:31:41,286][07361] Num frames 7600... [2025-01-05 16:31:41,371][07361] Num frames 7700... [2025-01-05 16:31:41,498][07361] Avg episode rewards: #0: 38.968, true rewards: #0: 15.568 [2025-01-05 16:31:41,498][07361] Avg episode reward: 38.968, avg true_objective: 15.568 [2025-01-05 16:31:41,521][07361] Num frames 7800... [2025-01-05 16:31:41,619][07361] Num frames 7900... [2025-01-05 16:31:41,707][07361] Num frames 8000... [2025-01-05 16:31:41,794][07361] Num frames 8100... [2025-01-05 16:31:41,881][07361] Num frames 8200... 
[2025-01-05 16:31:41,969][07361] Num frames 8300... [2025-01-05 16:31:42,057][07361] Num frames 8400... [2025-01-05 16:31:42,145][07361] Num frames 8500... [2025-01-05 16:31:42,244][07361] Avg episode rewards: #0: 35.086, true rewards: #0: 14.253 [2025-01-05 16:31:42,244][07361] Avg episode reward: 35.086, avg true_objective: 14.253 [2025-01-05 16:31:42,293][07361] Num frames 8600... [2025-01-05 16:31:42,377][07361] Num frames 8700... [2025-01-05 16:31:42,462][07361] Num frames 8800... [2025-01-05 16:31:42,548][07361] Num frames 8900... [2025-01-05 16:31:42,635][07361] Num frames 9000... [2025-01-05 16:31:42,723][07361] Num frames 9100... [2025-01-05 16:31:42,811][07361] Num frames 9200... [2025-01-05 16:31:42,897][07361] Num frames 9300... [2025-01-05 16:31:42,985][07361] Num frames 9400... [2025-01-05 16:31:43,071][07361] Num frames 9500... [2025-01-05 16:31:43,158][07361] Num frames 9600... [2025-01-05 16:31:43,244][07361] Num frames 9700... [2025-01-05 16:31:43,333][07361] Num frames 9800... [2025-01-05 16:31:43,420][07361] Num frames 9900... [2025-01-05 16:31:43,507][07361] Num frames 10000... [2025-01-05 16:31:43,595][07361] Num frames 10100... [2025-01-05 16:31:43,680][07361] Num frames 10200... [2025-01-05 16:31:43,784][07361] Num frames 10300... [2025-01-05 16:31:43,870][07361] Num frames 10400... [2025-01-05 16:31:43,957][07361] Num frames 10500... [2025-01-05 16:31:44,043][07361] Num frames 10600... [2025-01-05 16:31:44,142][07361] Avg episode rewards: #0: 38.217, true rewards: #0: 15.217 [2025-01-05 16:31:44,142][07361] Avg episode reward: 38.217, avg true_objective: 15.217 [2025-01-05 16:31:44,195][07361] Num frames 10700... [2025-01-05 16:31:44,282][07361] Num frames 10800... [2025-01-05 16:31:44,368][07361] Num frames 10900... [2025-01-05 16:31:44,452][07361] Num frames 11000... [2025-01-05 16:31:44,540][07361] Num frames 11100... [2025-01-05 16:31:44,626][07361] Num frames 11200... [2025-01-05 16:31:44,714][07361] Num frames 11300... 
[2025-01-05 16:31:44,778][07361] Avg episode rewards: #0: 35.390, true rewards: #0: 14.140 [2025-01-05 16:31:44,778][07361] Avg episode reward: 35.390, avg true_objective: 14.140 [2025-01-05 16:31:44,875][07361] Num frames 11400... [2025-01-05 16:31:44,959][07361] Num frames 11500... [2025-01-05 16:31:45,046][07361] Num frames 11600... [2025-01-05 16:31:45,131][07361] Num frames 11700... [2025-01-05 16:31:45,221][07361] Num frames 11800... [2025-01-05 16:31:45,310][07361] Num frames 11900... [2025-01-05 16:31:45,398][07361] Num frames 12000... [2025-01-05 16:31:45,484][07361] Num frames 12100... [2025-01-05 16:31:45,571][07361] Num frames 12200... [2025-01-05 16:31:45,659][07361] Num frames 12300... [2025-01-05 16:31:45,746][07361] Num frames 12400... [2025-01-05 16:31:45,832][07361] Num frames 12500... [2025-01-05 16:31:45,941][07361] Num frames 12600... [2025-01-05 16:31:46,030][07361] Num frames 12700... [2025-01-05 16:31:46,122][07361] Num frames 12800... [2025-01-05 16:31:46,213][07361] Num frames 12900... [2025-01-05 16:31:46,304][07361] Num frames 13000... [2025-01-05 16:31:46,395][07361] Num frames 13100... [2025-01-05 16:31:46,485][07361] Num frames 13200... [2025-01-05 16:31:46,574][07361] Num frames 13300... [2025-01-05 16:31:46,666][07361] Num frames 13400... [2025-01-05 16:31:46,731][07361] Avg episode rewards: #0: 37.791, true rewards: #0: 14.902 [2025-01-05 16:31:46,731][07361] Avg episode reward: 37.791, avg true_objective: 14.902 [2025-01-05 16:31:46,809][07361] Num frames 13500... [2025-01-05 16:31:46,898][07361] Num frames 13600... [2025-01-05 16:31:46,990][07361] Num frames 13700... [2025-01-05 16:31:47,080][07361] Num frames 13800... [2025-01-05 16:31:47,172][07361] Num frames 13900... [2025-01-05 16:31:47,262][07361] Num frames 14000... [2025-01-05 16:31:47,352][07361] Num frames 14100... [2025-01-05 16:31:47,443][07361] Num frames 14200... [2025-01-05 16:31:47,534][07361] Num frames 14300... 
[2025-01-05 16:31:47,624][07361] Num frames 14400... [2025-01-05 16:31:47,714][07361] Num frames 14500... [2025-01-05 16:31:47,804][07361] Num frames 14600... [2025-01-05 16:31:47,894][07361] Num frames 14700... [2025-01-05 16:31:47,984][07361] Num frames 14800... [2025-01-05 16:31:48,097][07361] Num frames 14900... [2025-01-05 16:31:48,187][07361] Num frames 15000... [2025-01-05 16:31:48,277][07361] Num frames 15100... [2025-01-05 16:31:48,366][07361] Num frames 15200... [2025-01-05 16:31:48,458][07361] Num frames 15300... [2025-01-05 16:31:48,548][07361] Num frames 15400... [2025-01-05 16:31:48,637][07361] Num frames 15500... [2025-01-05 16:31:48,702][07361] Avg episode rewards: #0: 39.911, true rewards: #0: 15.512 [2025-01-05 16:31:48,702][07361] Avg episode reward: 39.911, avg true_objective: 15.512 [2025-01-05 16:32:18,397][07361] Replay video saved to /home/steve/Documents/AI/huggingface/rl/unit_8_2/train_dir/default_experiment/replay.mp4!