---
library_name: skrl
tags:
- deep-reinforcement-learning
- reinforcement-learning
- skrl
model-index:
- name: PPO
  results:
  - metrics:
    - type: mean_reward
      value: 6524.74 +/- 570.54
      name: Total reward (mean)
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: IsaacGymEnvs-Humanoid
      type: IsaacGymEnvs-Humanoid
---

<!-- ---
torch: 6524.74 +/- 570.54
jax: 6265.95 +/- 280.11
numpy: 5727.54 +/- 406.96
--- -->

# IsaacGymEnvs-Humanoid-PPO

Trained agent for [NVIDIA Isaac Gym Preview](https://github.com/NVIDIA-Omniverse/IsaacGymEnvs) environments.

- **Task:** Humanoid
- **Agent:** [PPO](https://skrl.readthedocs.io/en/latest/api/agents/ppo.html)

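The PPO agent minimizes a clipped surrogate policy loss plus a scaled value loss. The NumPy sketch below shows both terms for a batch of samples, using the `ratio_clip`, `value_clip` and `value_loss_scale` values listed under Hyperparameters. It illustrates the technique and is not skrl's own code: the function names are hypothetical, and the entropy term is left out because `entropy_loss_scale` is 0.0 in this configuration.

```python
import numpy as np


# Illustrative sketch with hypothetical names, not skrl's implementation.
def ppo_policy_loss(log_prob_new, log_prob_old, advantages, ratio_clip=0.2):
    """Clipped surrogate policy loss (to be minimized)."""
    ratio = np.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    surrogate = advantages * ratio
    surrogate_clipped = advantages * np.clip(ratio, 1.0 - ratio_clip, 1.0 + ratio_clip)
    # pessimistic bound: elementwise minimum, negated so gradient descent maximizes it
    return -np.mean(np.minimum(surrogate, surrogate_clipped))


def ppo_value_loss(values_new, values_old, returns, value_clip=0.2, value_loss_scale=2.0):
    """Scaled MSE value loss with predictions clipped around the old predictions."""
    # analogous to clip_predicted_values=True: limit how far new predictions move
    values_clipped = values_old + np.clip(values_new - values_old, -value_clip, value_clip)
    return value_loss_scale * np.mean((returns - values_clipped) ** 2)
```

Taking the minimum of the clipped and unclipped terms means a large probability ratio can never increase the objective beyond the clipped value, which keeps each update close to the data-collecting policy.
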
# Usage (with skrl)

Note: Visit the skrl [Examples](https://skrl.readthedocs.io/en/latest/intro/examples.html) section to access the scripts.

* PyTorch

```python
from skrl.utils.huggingface import download_model_from_huggingface

# assuming that there is an agent named `agent`
path = download_model_from_huggingface("skrl/IsaacGymEnvs-Humanoid-PPO", filename="agent.pt")
agent.load(path)
```

* JAX

```python
from skrl.utils.huggingface import download_model_from_huggingface

# assuming that there is an agent named `agent`
path = download_model_from_huggingface("skrl/IsaacGymEnvs-Humanoid-PPO", filename="agent.pickle")
agent.load(path)
```

# Hyperparameters

Note: Parameters not set below keep their default values.

```python
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html#configuration-and-hyperparameters
from skrl.agents.torch.ppo import PPO_DEFAULT_CONFIG
from skrl.resources.preprocessors.torch import RunningStandardScaler
from skrl.resources.schedulers.torch import KLAdaptiveRL

# assuming that the wrapped environment `env` and the `device` are already defined
cfg = PPO_DEFAULT_CONFIG.copy()
cfg["rollouts"] = 32  # memory_size
cfg["learning_epochs"] = 5
cfg["mini_batches"] = 4  # 32 * 4096 / 32768 (rollouts * num_envs / mini-batch size)
cfg["discount_factor"] = 0.99
cfg["lambda"] = 0.95
cfg["learning_rate"] = 5e-4
cfg["learning_rate_scheduler"] = KLAdaptiveRL
cfg["learning_rate_scheduler_kwargs"] = {"kl_threshold": 0.008}
cfg["random_timesteps"] = 0
cfg["learning_starts"] = 0
cfg["grad_norm_clip"] = 1.0
cfg["ratio_clip"] = 0.2
cfg["value_clip"] = 0.2
cfg["clip_predicted_values"] = True
cfg["entropy_loss_scale"] = 0.0
cfg["value_loss_scale"] = 2.0
cfg["kl_threshold"] = 0  # KL early-stopping threshold; 0 disables early stopping
# scale rewards down by a factor of 100 before they are stored
cfg["rewards_shaper"] = lambda rewards, timestep, timesteps: rewards * 0.01
cfg["state_preprocessor"] = RunningStandardScaler
cfg["state_preprocessor_kwargs"] = {"size": env.observation_space, "device": device}
cfg["value_preprocessor"] = RunningStandardScaler
cfg["value_preprocessor_kwargs"] = {"size": 1, "device": device}
```

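The `mini_batches` comment encodes how each update is split. As a quick arithmetic check, assuming 4096 parallel environments (the value implied by the `32 * 4096 / 32768` comment): each update collects `rollouts × num_envs` transitions, splits them into `mini_batches` chunks, and revisits them for `learning_epochs` passes. The helper name is hypothetical.

```python
def minibatch_size(rollouts: int, num_envs: int, mini_batches: int) -> int:
    """Hypothetical helper: transitions per mini-batch for one PPO update."""
    samples_per_update = rollouts * num_envs  # transitions stored before each update
    if samples_per_update % mini_batches != 0:
        raise ValueError("rollouts * num_envs should split evenly into mini_batches")
    return samples_per_update // mini_batches


rollouts, mini_batches, learning_epochs = 32, 4, 5  # values from cfg
num_envs = 4096  # assumption implied by the "32 * 4096 / 32768" comment

print(rollouts * num_envs)  # 131072 transitions per update
print(minibatch_size(rollouts, num_envs, mini_batches))  # 32768
print(rollouts * num_envs // 32768)  # 4 mini-batches for a 32768-sample target
print(learning_epochs * mini_batches)  # 20 optimizer steps per update
```
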
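`discount_factor` (γ) and `lambda` (λ) parameterize Generalized Advantage Estimation (GAE). The following is a minimal NumPy sketch for a single environment's rollout, written from the standard GAE recursion rather than copied from skrl; the function name and argument layout are assumptions, and `lambda_` carries a trailing underscore because `lambda` is a Python keyword. With this configuration, the rewards fed in would already be scaled by the `rewards_shaper` (multiplied by 0.01).

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, discount_factor=0.99, lambda_=0.95):
    """Generalized Advantage Estimation for one environment's rollout (time-major arrays).

    dones[t] == 1 marks that the episode ended after step t, so no value is
    bootstrapped across that boundary. Returns (returns, advantages).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    advantages = np.zeros_like(rewards)
    advantage = 0.0
    next_value = float(last_value)  # V(s_T): value of the state after the last stored step
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + discount_factor * not_done * next_value - values[t]
        # A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}
        advantage = delta + discount_factor * lambda_ * not_done * advantage
        advantages[t] = advantage
        next_value = values[t]
    returns = advantages + values  # regression targets for the value function
    return returns, advantages
```

Setting `lambda_=1.0` recovers plain discounted returns, while smaller values trade variance for bias by leaning more on the value estimates.
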
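`KLAdaptiveRL` adjusts the learning rate from the KL divergence measured between the old and updated policies, targeting `kl_threshold` = 0.008 here. The sketch below follows the common rl_games-style rule (shrink the rate when KL overshoots the target band, grow it when KL undershoots). The function is hypothetical, and the `kl_factor`, `lr_factor`, `min_lr` and `max_lr` values are assumptions for illustration; check the skrl scheduler documentation for the actual defaults.

```python
def kl_adaptive_lr(lr, kl, kl_threshold=0.008, kl_factor=2.0, lr_factor=1.5,
                   min_lr=1e-6, max_lr=1e-2):
    """Return the learning rate for the next update given the measured KL divergence.

    kl_factor, lr_factor, min_lr and max_lr are assumed values for illustration.
    """
    if kl > kl_threshold * kl_factor:
        # the policy changed too much in this update: take smaller steps
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # the policy barely changed: take larger steps
        return min(lr * lr_factor, max_lr)
    return lr  # KL within the target band: keep the learning rate


lr = 5e-4  # cfg["learning_rate"]
for kl in (0.03, 0.001, 0.008):  # example KL measurements from three updates
    lr = kl_adaptive_lr(lr, kl)
```

This scheduler threshold is separate from `cfg["kl_threshold"]`, which controls KL-based early stopping and is disabled (0) in this configuration.
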