Commit · acafacf
Parent(s): 2c407e0
model card enhancement
- .gitattributes +1 -0
- README.md +8 -19
- atari.gif +3 -0
- config.json +32 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 *best_online_params filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
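The added `*.gif` rule routes GIF files such as `atari.gif` through Git LFS. As a rough illustration, the sketch below checks file names against the four patterns visible in this hunk. It relies on Python's `fnmatch`, which only approximates gitattributes pattern matching, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase

# LFS patterns visible in this hunk of .gitattributes
# (the last one is the line added by this commit).
LFS_PATTERNS = ["*.zst", "*tfevents*", "*best_online_params", "*.gif"]


def is_lfs_tracked(filename, patterns=LFS_PATTERNS):
    """Return True if the base file name matches any of the patterns.

    Assumption: fnmatch-style globbing is a simplification of the real
    gitattributes rules, adequate only for simple base-name patterns.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


for name in ["atari.gif", "config.json", "README.md"]:
    print(name, is_lfs_tracked(name))
```

Under this approximation, only `atari.gif` among the files touched by the commit would be stored as an LFS pointer.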
README.md
CHANGED
@@ -1,35 +1,24 @@
 ---
 license: mit
 license_link: https://huggingface.co/TheoVincent/Atari_i-QN/blob/main/LICENSE
-language:
-- en
 tags:
 - reinforcement-learning
 - jax
-- eval-results
-- deep-reinforcement-learning
 - atari
-
-
+co2_eq_emissions:
+  emissions: 3000
 ---
 
-# Model parameters
-This repository contains the model parameters trained with `i-DQN` on [
+# Model parameters trained with `i-DQN` and `i-IQN`
+This repository contains the model parameters trained with `i-DQN` on [56 Atari games](#i-DQN_games) and trained with `i-IQN` on [20 Atari games](#i-IQN_games) 🎮. 5 seeds are available for each configuration which makes a total of 380 available models.
 
-The [evaluate.ipynb](./evaluate.ipynb) notebook contains a minimal example to evaluate to model parameters 🧑‍🏫. It uses JAX.
+The [evaluate.ipynb](./evaluate.ipynb) notebook contains a minimal example to evaluate to model parameters 🧑‍🏫. It uses JAX. The hyperparameters used during training are reported in [config.json](./config.json) 🔧.
 
-ps: The set of [
+ps: The set of [20 Atari games](#i-DQN_games) is included in the set of [56 Atari games](#i-IQN_games).
 
 ### Model performances
-`i-DQN` and `i-IQN` are improvements made over [`DQN`](https://www.nature.com/articles/nature14236.pdf) and [`IQN`](https://arxiv.org/abs/1806.06923) ✨. Check
-
-
-
-### List of games for `i-DQN`
-Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon.
-
-### List of games for `i-IQN`
-Alien, Assault, BankHeist, Berzerk, Breakout, Centipede, ChopperCommand, DemonAttack, Enduro, Frostbite, Gopher, Gravitar, IceHockey, Jamesbond, Krull, KungFuMaster, Riverraid, Seaquest, Skiing, StarGunner.
+| `i-DQN` and `i-IQN` are improvements made over [`DQN`](https://www.nature.com/articles/nature14236.pdf) and [`IQN`](https://arxiv.org/abs/1806.06923) ✨. Check the paper on [arXiv](https://arxiv.org/abs/2403.02107)! <details> <summary id=i-DQN_games>List of games trained with `i-DQN`</summary> *Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon.* </details> <details> <summary id=i-IQN_games>List of games trained with `i-IQN`</summary> *Alien, Assault, BankHeist, Berzerk, Breakout, Centipede, ChopperCommand, DemonAttack, Enduro, Frostbite, Gopher, Gravitar, IceHockey, Jamesbond, Krull, KungFuMaster, Riverraid, Seaquest, Skiing, StarGunner.* </details> | <img src="performances.png" alt="drawing" width="600"/> |
+| :-: | :-: |
 
 ## User installation
 Python 3.10 is recommended. Create a Python virtual environment, activate it, update pip and install the package and its dependencies in editable mode:
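The counts quoted in the new README can be checked from the two game lists themselves. The sketch below copies the lists from the diff, confirms that every `i-IQN` game also appears in the `i-DQN` list, and recomputes the total under the README's statement of 5 seeds per configuration. The variable names are illustrative only.

```python
# Game lists copied from the README in this commit.
IDQN_GAMES = (
    "Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, "
    "BeamRider, Berzerk, Bowling, Boxing, Breakout, Centipede, ChopperCommand, "
    "CrazyClimber, DemonAttack, DoubleDunk, Enduro, FishingDerby, Freeway, Frostbite, "
    "Gopher, Gravitar, Hero, IceHockey, Jamesbond, Kangaroo, Krull, KungFuMaster, "
    "MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, "
    "PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, "
    "SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, "
    "VideoPinball, WizardOfWor, YarsRevenge, Zaxxon"
).split(", ")

IIQN_GAMES = (
    "Alien, Assault, BankHeist, Berzerk, Breakout, Centipede, ChopperCommand, "
    "DemonAttack, Enduro, Frostbite, Gopher, Gravitar, IceHockey, Jamesbond, Krull, "
    "KungFuMaster, Riverraid, Seaquest, Skiing, StarGunner"
).split(", ")

N_SEEDS = 5  # "5 seeds are available for each configuration"

# The i-IQN games should all be contained in the i-DQN games.
iiqn_is_subset = set(IIQN_GAMES) <= set(IDQN_GAMES)

# One configuration per (algorithm, game) pair, N_SEEDS models per configuration.
n_models = (len(IDQN_GAMES) + len(IIQN_GAMES)) * N_SEEDS

print(len(IDQN_GAMES), len(IIQN_GAMES), iiqn_is_subset, n_models)
```

Since the 20-game list belongs to `i-IQN` and the 56-game list to `i-DQN`, the anchors in the "ps" line of the new README appear to be swapped.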
atari.gif
ADDED
(binary file, stored with Git LFS)
config.json
ADDED
@@ -0,0 +1,32 @@
+{
+    "---- Shared parameters ---": "----------------",
+    "gamma": 0.99,
+    "replay_buffer_size": 1000000,
+    "n_initial_samples": 20000,
+    "n_epochs": 200,
+    "n_training_steps_per_epoch": 250000,
+    "n_training_steps_per_online_update": 4,
+    "horizon": 27000,
+    "starting_eps": 1,
+    "ending_eps": 0.01,
+    "duration_eps": 250000,
+    "batch_size": 32,
+    "n_step_return": 5,
+    "---- i-DQN ---": "----------------------------",
+    "idqn_learning_rate": 6.25e-5,
+    "idqn_optimizer_eps": 1.5e-4,
+    "idqn_n_training_steps_per_target_update": 3000000,
+    "idqn_n_training_steps_per_rolling_step": 8000,
+    "idqn_head_behaviorial_policy": "uniform",
+    "idqn_shared_network": true,
+    "---- i-IQN ---": "----------------------------",
+    "iiqn_learning_rate": 0.00005,
+    "iiqn_optimizer_eps": 0.0003125,
+    "iiqn_n_training_steps_per_target_update": 3000000,
+    "iiqn_n_training_steps_per_rolling_step": 8000,
+    "iiqn_head_behaviorial_policy": "uniform",
+    "iiqn_n_quantiles_policy": 32,
+    "iiqn_n_quantiles": 64,
+    "iiqn_n_quantiles_target": 64,
+    "iiqn_shared_network": true
+}
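Because `config.json` uses `"----"` keys as section separators, a consumer would likely want to filter them out before using the values. The sketch below parses an excerpt of the shared parameters, drops the separators, and derives two quantities. The helper names `load_hyperparameters` and `linear_epsilon` are hypothetical, and the linear shape of the epsilon decay is an assumption; the commit only gives the start value, end value and duration.

```python
import json

# Excerpt of config.json from this commit (a few shared parameters only).
CONFIG_TEXT = """
{
    "---- Shared parameters ---": "----------------",
    "gamma": 0.99,
    "n_epochs": 200,
    "n_training_steps_per_epoch": 250000,
    "starting_eps": 1,
    "ending_eps": 0.01,
    "duration_eps": 250000
}
"""


def load_hyperparameters(text):
    """Parse the JSON text and drop the '----' keys used as section headers."""
    raw = json.loads(text)
    return {key: value for key, value in raw.items() if not key.startswith("----")}


def linear_epsilon(step, start, end, duration):
    """Exploration rate at `step`, ASSUMING a linear decay clipped to [0, duration]."""
    fraction = min(max(step / duration, 0.0), 1.0)
    return start + fraction * (end - start)


params = load_hyperparameters(CONFIG_TEXT)

# Product of the two epoch-related entries, read here as the total step budget.
total_steps = params["n_epochs"] * params["n_training_steps_per_epoch"]

eps_midway = linear_epsilon(
    125_000, params["starting_eps"], params["ending_eps"], params["duration_eps"]
)
print(total_steps, eps_midway)
```

The same filtering would apply to the `idqn_` and `iiqn_` sections, which share the separator convention.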