---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
- deep-rl-course
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: -40.01 +/- 76.54
      name: mean_reward
      verified: false
---
# PPO Agent Playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2.

# Hyperparameters
```python
{'seed': 42069,
 'capture_video': True,
 'learning_rate': 0.0003,
 'eps': 1e-05,
 'num_steps': 1024,
 'total_timesteps': 2500000,
 'anneal_lr': True,
 'gae': True,
 'gamma': 0.999,
 'gae_lambda': 0.95,
 'update_epochs': 4,
 'num_minibatches': 4,
 'clip_coef': 0.2,
 'norm_adv': True,
 'clip_vloss': True,
 'ent_coef': 0.01,
 'vf_coef': 0.5,
 'max_grad_norm': 0.5,
 'target_kl': None,
 'batch_size': 8192,
 'minibatch_size': 2048,
 'env_id': 'LunarLander-v2'}
```
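
Several of the values above depend on one another. The sketch below shows one way they could fit together. It assumes the convention common in single-file PPO scripts, where `batch_size = num_envs * num_steps` and `minibatch_size = batch_size // num_minibatches`. The card does not list `num_envs`, so the value derived here is an inference from the listed numbers, not a documented setting. The helper `annealed_lr` is a hypothetical name, and the linear decay it implements is an assumption about what `anneal_lr: True` does in the training script.

```python
# Subset of the hyperparameters listed in this card.
hp = {
    "learning_rate": 0.0003,
    "num_steps": 1024,
    "total_timesteps": 2500000,
    "num_minibatches": 4,
    "batch_size": 8192,
    "minibatch_size": 2048,
    "anneal_lr": True,
}

# Assumption: batch_size = num_envs * num_steps, so num_envs can be inferred.
num_envs = hp["batch_size"] // hp["num_steps"]

# Assumption: each rollout batch is split evenly into minibatches.
minibatch_size = hp["batch_size"] // hp["num_minibatches"]

# Number of policy updates that fit into the total step budget.
num_updates = hp["total_timesteps"] // hp["batch_size"]


def annealed_lr(update: int) -> float:
    """Learning rate for a 1-based update index, decayed linearly toward zero.

    Hypothetical helper: assumes anneal_lr means a linear schedule.
    """
    if not hp["anneal_lr"]:
        return hp["learning_rate"]
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * hp["learning_rate"]


print(f"num_envs={num_envs}, minibatch_size={minibatch_size}, num_updates={num_updates}")
print(f"lr at first update: {annealed_lr(1)}")
print(f"lr at last update:  {annealed_lr(num_updates)}")
```

Under these assumptions the listed `minibatch_size` of 2048 agrees with `batch_size / num_minibatches`, which suggests the card's values are internally consistent.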