Upload README.md with huggingface_hub
README.md
CHANGED
@@ -5,6 +5,7 @@ tags:
 - deep-reinforcement-learning
 - reinforcement-learning
 - stable-baselines3
+- InvertedDoublePendulum-v4
 model-index:
 - name: PPO
 results:
@@ -20,61 +21,3 @@ model-index:
 name: mean_reward
 verified: false
 ---
-
-# **PPO** Agent playing **InvertedDoublePendulum-v2**
-This is a trained model of a **PPO** agent playing **InvertedDoublePendulum-v2**
-using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
-and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
-
-The RL Zoo is a training framework for Stable Baselines3
-reinforcement learning agents,
-with hyperparameter optimization and pre-trained agents included.
-
-## Usage (with SB3 RL Zoo)
-
-RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
-SB3: https://github.com/DLR-RM/stable-baselines3<br/>
-SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
-
-Install the RL Zoo (with SB3 and SB3-Contrib):
-```bash
-pip install rl_zoo3
-```
-
-```
-# Download model and save it into the logs/ folder
-python -m rl_zoo3.load_from_hub --algo ppo --env InvertedDoublePendulum-v2 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo ppo --env InvertedDoublePendulum-v2 -f logs/
-```
-
-If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
-```
-python -m rl_zoo3.load_from_hub --algo ppo --env InvertedDoublePendulum-v2 -orga qgallouedec -f logs/
-python -m rl_zoo3.enjoy --algo ppo --env InvertedDoublePendulum-v2 -f logs/
-```
-
-## Training (with the RL Zoo)
-```
-python -m rl_zoo3.train --algo ppo --env InvertedDoublePendulum-v2 -f logs/
-# Upload the model and generate video (when possible)
-python -m rl_zoo3.push_to_hub --algo ppo --env InvertedDoublePendulum-v2 -f logs/ -orga qgallouedec
-```
-
-## Hyperparameters
-```python
-OrderedDict([('batch_size', 512),
-('clip_range', 0.4),
-('ent_coef', 1.05057e-06),
-('gae_lambda', 0.8),
-('gamma', 0.98),
-('learning_rate', 0.000155454),
-('max_grad_norm', 0.5),
-('n_envs', 1),
-('n_epochs', 10),
-('n_steps', 128),
-('n_timesteps', 1000000.0),
-('normalize', True),
-('policy', 'MlpPolicy'),
-('vf_coef', 0.695929),
-('normalize_kwargs', {'norm_obs': True, 'norm_reward': False})])
-```
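
The hyperparameters removed from the README in this commit imply a few derived quantities that are easy to miss, for example that `batch_size` (512) is larger than a single rollout (`n_steps * n_envs` = 128). The following is a minimal sketch of that arithmetic using only the standard library. The helper `rollout_summary` is hypothetical and not part of Stable Baselines3 or the RL Zoo, and the two rules it applies (collection proceeds in whole rollouts until the timestep budget is met, and a mini-batch cannot hold more samples than one rollout) are stated assumptions rather than a description of the library's internals.

```python
import math
from collections import OrderedDict

# Values copied from the Hyperparameters block removed in this diff.
hyperparams = OrderedDict([
    ("batch_size", 512),
    ("n_envs", 1),
    ("n_epochs", 10),
    ("n_steps", 128),
    ("n_timesteps", 1000000.0),
])


def rollout_summary(hp):
    """Derive rollout and update counts (hypothetical helper, not an SB3 API)."""
    # One rollout gathers n_steps transitions from each environment.
    rollout_size = hp["n_steps"] * hp["n_envs"]
    # n_timesteps is stored as a float, so cast it before counting.
    total_timesteps = int(hp["n_timesteps"])
    # Assumption: collection continues in whole rollouts until the budget is met.
    n_rollouts = math.ceil(total_timesteps / rollout_size)
    # Assumption: a mini-batch cannot hold more samples than one rollout.
    effective_batch = min(hp["batch_size"], rollout_size)
    minibatches_per_epoch = math.ceil(rollout_size / effective_batch)
    return {
        "rollout_size": rollout_size,
        "n_rollouts": n_rollouts,
        "effective_batch": effective_batch,
        "gradient_steps_per_rollout": hp["n_epochs"] * minibatches_per_epoch,
    }


summary = rollout_summary(hyperparams)
print(summary)
```

Under these assumptions, each rollout is consumed as a single mini-batch per epoch, so the configured `batch_size` acts as an upper bound rather than the actual mini-batch size.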