PPO Agent playing AntBulletEnv-v0
This is a trained model of a PPO agent playing AntBulletEnv-v0 using the stable-baselines3 library.
Usage (with Stable-baselines3)
from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub
from torch import nn  # required for nn.ReLU in policy_kwargs below
...
# MODEL
model = PPO(
    policy="MlpPolicy",
    env=env,
    batch_size=256,
    clip_range=0.4,
    ent_coef=0.0,
    gae_lambda=0.92,
    gamma=0.99,
    learning_rate=3.0e-05,
    max_grad_norm=0.5,
    n_epochs=30,
    n_steps=512,
    policy_kwargs=dict(
        log_std_init=-2,
        ortho_init=False,
        activation_fn=nn.ReLU,
        net_arch=[dict(pi=[256, 256], vf=[256, 256])],
    ),
    use_sde=True,
    sde_sample_freq=4,
    vf_coef=0.5,
    tensorboard_log="./tensorboard",
    verbose=1,
)
model.learn(1_000_000)
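To get a feel for what these hyperparameters imply, the following minimal sketch estimates how many rollouts and gradient steps the training run performs. It assumes a single environment (`n_envs=1`), since the environment setup is elided above, and that rollouts are collected until the timestep budget is reached. The helper name `ppo_update_budget` is hypothetical and not part of stable-baselines3.

```python
import math


def ppo_update_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Rough update budget for an on-policy PPO run (illustrative helper only)."""
    # Transitions gathered before each policy update.
    rollout_size = n_steps * n_envs
    # Rollouts are collected until the timestep budget is reached,
    # so the last rollout may overshoot the budget slightly.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    # Each epoch splits the rollout into minibatches of batch_size.
    minibatches_per_epoch = math.ceil(rollout_size / batch_size)
    gradient_steps = n_rollouts * n_epochs * minibatches_per_epoch
    return {
        "rollout_size": rollout_size,
        "n_rollouts": n_rollouts,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps": gradient_steps,
        "timesteps_collected": n_rollouts * rollout_size,
    }


# Values taken from the configuration above; n_envs=1 is an assumption.
budget = ppo_update_budget(
    total_timesteps=1_000_000, n_steps=512, batch_size=256, n_epochs=30
)
print(budget)
```

With more parallel environments the rollout size grows and the number of updates shrinks accordingly, so the numbers here apply only under the single-environment assumption.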
Evaluation results
- mean_reward on AntBulletEnv-v0 (self-reported): 2447.40 +/- 23.14
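A figure in this "mean +/- spread" form is usually the mean and standard deviation of per-episode returns over a set of evaluation episodes. The sketch below shows that calculation under the assumption that the population standard deviation is used. The helper name `summarize_returns` is hypothetical, and the input values are placeholders for illustration, not this model's episode returns.

```python
import statistics


def summarize_returns(episode_returns):
    """Format per-episode returns as 'mean +/- std' (illustrative helper only)."""
    mean = statistics.fmean(episode_returns)
    # Population standard deviation (divides by N); this is an assumption
    # about how the reported spread was computed.
    std = statistics.pstdev(episode_returns)
    return f"{mean:.2f} +/- {std:.2f}"


# Placeholder returns for illustration only, not results from this model.
placeholder_returns = [10.0, 12.0, 14.0]
summary = summarize_returns(placeholder_returns)
print(summary)
```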