TheoVincent committed on
Commit acafacf · 1 Parent(s): 2c407e0

model card enhancement

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +8 -19
  3. atari.gif +3 -0
  4. config.json +32 -0
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  *best_online_params filter=lfs diff=lfs merge=lfs -text
+ *.gif filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,35 +1,24 @@
  ---
  license: mit
  license_link: https://huggingface.co/TheoVincent/Atari_i-QN/blob/main/LICENSE
- language:
- - en
  tags:
  - reinforcement-learning
  - jax
- - eval-results
- - deep-reinforcement-learning
  - atari
- - dqn
- - iqn
+ co2_eq_emissions:
+   emissions: 3000
  ---

- # Model parameters training with `i-DQN` and `i-IQN`
- This repository contains the model parameters trained with `i-DQN` on [$56$ Atari games](#list-of-games-for-i-dqn) and trained with `i-IQN` on [$20$ Atari games](#list-of-games-for-i-iqn) 🎮. $5$ seeds are available for each configuration which makes a total of $380$ available models 📈.
+ # Model parameters trained with `i-DQN` and `i-IQN`
+ This repository contains the model parameters trained with `i-DQN` on [56 Atari games](#i-DQN_games) and trained with `i-IQN` on [20 Atari games](#i-IQN_games) 🎮. 5 seeds are available for each configuration which makes a total of 380 available models 📈.

- The [evaluate.ipynb](./evaluate.ipynb) notebook contains a minimal example to evaluate to model parameters 🧑‍🏫. It uses JAX 🚀.
+ The [evaluate.ipynb](./evaluate.ipynb) notebook contains a minimal example to evaluate to model parameters 🧑‍🏫. It uses JAX 🚀. The hyperparameters used during training are reported in [config.json](./config.json) 🔧.

- ps: The set of [$20$ Atari games](#list-of-games-for-i-iqn) is included in the set of [$56$ Atari games](#list-of-games-for-i-dqn).
+ ps: The set of [20 Atari games](#i-DQN_games) is included in the set of [56 Atari games](#i-IQN_games).

  ### Model performances
- `i-DQN` and `i-IQN` are improvements made over [`DQN`](https://www.nature.com/articles/nature14236.pdf) and [`IQN`](https://arxiv.org/abs/1806.06923) ✨. Check it out on [arXiv](https://arxiv.org/abs/2403.02107)! | <img src="performances.png" alt="drawing" width="600"/>
- :-:|:-:
-
-
- ### List of games for `i-DQN`
- Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon.
-
- ### List of games for `i-IQN`
- Alien, Assault, BankHeist, Berzerk, Breakout, Centipede, ChopperCommand, DemonAttack, Enduro, Frostbite, Gopher, Gravitar, IceHockey, Jamesbond, Krull, KungFuMaster, Riverraid, Seaquest, Skiing, StarGunner.
+ | `i-DQN` and `i-IQN` are improvements made over [`DQN`](https://www.nature.com/articles/nature14236.pdf) and [`IQN`](https://arxiv.org/abs/1806.06923) ✨. Check the paper on [arXiv](https://arxiv.org/abs/2403.02107)! <details> <summary id=i-DQN_games>List of games trained with `i-DQN`</summary> *Alien, Amidar, Assault, Asterix, Asteroids, Atlantis, BankHeist, BattleZone, BeamRider, Berzerk, Bowling, Boxing, Breakout, Centipede, ChopperCommand, CrazyClimber, DemonAttack, DoubleDunk, Enduro, FishingDerby, Freeway, Frostbite, Gopher, Gravitar, Hero, IceHockey, Jamesbond, Kangaroo, Krull, KungFuMaster, MontezumaRevenge, MsPacman, NameThisGame, Phoenix, Pitfall, Pong, Pooyan, PrivateEye, Qbert, Riverraid, RoadRunner, Robotank, Seaquest, Skiing, Solaris, SpaceInvaders, StarGunner, Tennis, TimePilot, Tutankham, UpNDown, Venture, VideoPinball, WizardOfWor, YarsRevenge, Zaxxon.* </details> <details> <summary id=i-IQN_games>List of games trained with `i-IQN`</summary> *Alien, Assault, BankHeist, Berzerk, Breakout, Centipede, ChopperCommand, DemonAttack, Enduro, Frostbite, Gopher, Gravitar, IceHockey, Jamesbond, Krull, KungFuMaster, Riverraid, Seaquest, Skiing, StarGunner.* </details> | <img src="performances.png" alt="drawing" width="600"/> |
+ | :-: | :-: |

  ## User installation
  Python 3.10 is recommended. Create a Python virtual environment, activate it, update pip and install the package and its dependencies in editable mode:
atari.gif ADDED

Git LFS Details

  • SHA256: 9084236da60018fb7538ed8b5a1a65c76a22cc8fea4a2423c639c3d81ca71760
  • Pointer size: 131 Bytes
  • Size of remote file: 690 kB
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "---- Shared parameters ---": "----------------",
+ "gamma": 0.99,
+ "replay_buffer_size": 1000000,
+ "n_initial_samples": 20000,
+ "n_epochs": 200,
+ "n_training_steps_per_epoch": 250000,
+ "n_training_steps_per_online_update": 4,
+ "horizon": 27000,
+ "starting_eps": 1,
+ "ending_eps": 0.01,
+ "duration_eps": 250000,
+ "batch_size": 32,
+ "n_step_return": 5,
+ "---- i-DQN ---": "----------------------------",
+ "idqn_learning_rate": 6.25e-5,
+ "idqn_optimizer_eps": 1.5e-4,
+ "idqn_n_training_steps_per_target_update": 3000000,
+ "idqn_n_training_steps_per_rolling_step": 8000,
+ "idqn_head_behaviorial_policy": "uniform",
+ "idqn_shared_network": true,
+ "---- i-IQN ---": "----------------------------",
+ "iiqn_learning_rate": 0.00005,
+ "iiqn_optimizer_eps": 0.0003125,
+ "iiqn_n_training_steps_per_target_update": 3000000,
+ "iiqn_n_training_steps_per_rolling_step": 8000,
+ "iiqn_head_behaviorial_policy": "uniform",
+ "iiqn_n_quantiles_policy": 32,
+ "iiqn_n_quantiles": 64,
+ "iiqn_n_quantiles_target": 64,
+ "iiqn_shared_network": true
+ }
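
Reviewer note: the banner keys in `config.json` (`"---- Shared parameters ---"` and friends) are ordinary string entries, so any consumer has to skip them. Below is a minimal sketch of how the file might be read, using a few of the shared values copied from the diff. The helper names `load_hyperparameters` and `linear_epsilon` are hypothetical, and the linear shape of the exploration schedule is an assumption: the config only gives `starting_eps`, `ending_eps` and `duration_eps`, not how they are interpolated.

```python
import json

# Excerpt of the shared parameters added in config.json (values copied from the diff).
CONFIG_TEXT = """
{
    "---- Shared parameters ---": "----------------",
    "n_epochs": 200,
    "n_training_steps_per_epoch": 250000,
    "starting_eps": 1,
    "ending_eps": 0.01,
    "duration_eps": 250000
}
"""


def load_hyperparameters(text):
    """Parse the JSON text and drop the decorative '----' banner entries."""
    raw = json.loads(text)
    return {key: value for key, value in raw.items() if not key.startswith("----")}


def linear_epsilon(step, params):
    """Exploration rate at a given training step.

    ASSUMPTION: epsilon decays linearly from starting_eps to ending_eps over
    duration_eps steps and then stays constant; the config does not state this.
    """
    fraction = min(max(step / params["duration_eps"], 0.0), 1.0)
    return params["starting_eps"] + fraction * (params["ending_eps"] - params["starting_eps"])


params = load_hyperparameters(CONFIG_TEXT)

# Total number of training steps implied by the two epoch settings.
total_steps = params["n_epochs"] * params["n_training_steps_per_epoch"]

print(sorted(params))
print(total_steps)
print(linear_epsilon(0, params), linear_epsilon(params["duration_eps"], params))
```

To use the real file, replace `CONFIG_TEXT` with the contents of `config.json` read from disk; the filtering step is unchanged.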