zjowowen committed on
Commit
10e4fa0
1 Parent(s): 5eff669

Upload README.md with huggingface_hub

  1. README.md +388 -0
README.md ADDED
@@ -0,0 +1,388 @@
---
language: en
license: apache-2.0
library_name: pytorch
tags:
- deep-reinforcement-learning
- reinforcement-learning
- DI-engine
- CartPole-v0
benchmark_name: OpenAI/Gym/Box2d
task_name: CartPole-v0
pipeline_tag: reinforcement-learning
model-index:
- name: MuZero
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: CartPole-v0
      type: CartPole-v0
    metrics:
    - type: mean_reward
      value: 195.5 +/- 4.18
      name: mean_reward
---

# Play **CartPole-v0** with **MuZero** Policy

## Model Description
<!-- Provide a longer summary of what this model is. -->
This is a simple **MuZero** implementation applied to the OpenAI/Gym/Box2d **CartPole-v0** task, built with the [DI-engine library](https://github.com/opendilab/di-engine) and the [DI-zoo](https://github.com/opendilab/DI-engine/tree/main/dizoo).

**DI-engine** is a Python library for solving general decision intelligence problems. It builds on reinforcement learning framework implementations in PyTorch or JAX. The library aims to standardize the reinforcement learning framework across different algorithms, benchmarks, and environments, and to support both academic research and prototype applications. Customized training pipelines and applications are also supported by reusing the different abstraction levels of the DI-engine reinforcement learning framework.

## Model Usage
### Install the Dependencies
<details close>
<summary>(Click for Details)</summary>

```shell
# install huggingface_ding
git clone https://github.com/opendilab/huggingface_ding.git
pip3 install -e ./huggingface_ding/
# install environment dependencies if needed
# (the quotes keep shells such as zsh from expanding the square brackets)
pip3 install "DI-engine[common_env,video]" LightZero
```
</details>
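
After installation, a quick import check can confirm that the packages used by the scripts in this README are visible to the interpreter. This is a minimal sketch and not part of huggingface_ding or LightZero: the helper name `find_missing_modules` is hypothetical, and the module names are simply the ones that appear in the import statements of the `run.py` and `train.py` examples.

```python
from importlib.util import find_spec

# Top-level modules imported by the run.py and train.py examples in this README.
REQUIRED_MODULES = ("torch", "easydict", "ding", "lzero", "huggingface_ding")


def find_missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that the import system cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

An empty result only shows that the modules can be located; it does not verify that their versions match the ones this model was trained with.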

### Git Clone from Hugging Face and Run the Model

<details close>
<summary>(Click for Details)</summary>

```shell
# run with the trained model
python3 -u run.py
```
**run.py**
```python
from lzero.agent import MuZeroAgent
from ding.config import Config
from easydict import EasyDict
import torch

# Load the weights and config from the files git-cloned from Hugging Face
policy_state_dict = torch.load("pytorch_model.bin", map_location=torch.device("cpu"))
cfg = EasyDict(Config.file_to_dict("policy_config.py").cfg_dict)
# Instantiate the agent
agent = MuZeroAgent(
    env_id="CartPole-v0", exp_name="CartPole-v0-MuZero", cfg=cfg.exp_config, policy_state_dict=policy_state_dict
)
# Continue training
agent.train(step=5000)
# Render the new agent performance
agent.deploy(enable_save_replay=True)
```
</details>

### Run Model by Using Huggingface_ding

<details close>
<summary>(Click for Details)</summary>

```shell
# run with the trained model
python3 -u run.py
```
**run.py**
```python
from lzero.agent import MuZeroAgent
from huggingface_ding import pull_model_from_hub

# Pull the model weights and config from the Hugging Face Hub
policy_state_dict, cfg = pull_model_from_hub(repo_id="OpenDILabCommunity/CartPole-v0-MuZero")
# Instantiate the agent
agent = MuZeroAgent(
    env_id="CartPole-v0", exp_name="CartPole-v0-MuZero", cfg=cfg.exp_config, policy_state_dict=policy_state_dict
)
# Continue training
agent.train(step=5000)
# Render the new agent performance
agent.deploy(enable_save_replay=True)
```
</details>

## Model Training

### Train the Model and Push to Huggingface_hub

<details close>
<summary>(Click for Details)</summary>

```shell
# train your own agent
python3 -u train.py
```
**train.py**
```python
from lzero.agent import MuZeroAgent
from huggingface_ding import push_model_to_hub

# Instantiate the agent
agent = MuZeroAgent(env_id="CartPole-v0", exp_name="CartPole-v0-MuZero")
# Train the agent
return_ = agent.train(step=10000)
# Push the best checkpoint to the Hugging Face Hub
push_model_to_hub(
    agent=agent.best,
    env_name="OpenAI/Gym/Box2d",
    task_name="CartPole-v0",
    algo_name="MuZero",
    github_repo_url="https://github.com/opendilab/LightZero",
    github_doc_model_url=None,
    github_doc_env_url=None,
    installation_guide="pip3 install DI-engine[common_env,video] LightZero",
    usage_file_by_git_clone="./muzero/cartpole_muzero_deploy.py",
    usage_file_by_huggingface_ding="./muzero/cartpole_muzero_download.py",
    train_file="./muzero/cartpole_muzero.py",
    repo_id="OpenDILabCommunity/CartPole-v0-MuZero",
    create_repo=True
)
```
</details>

**Configuration**
<details close>
<summary>(Click for Details)</summary>

```python
exp_config = {
    'env': {
        'manager': {
            'episode_num': float("inf"),
            'max_retry': 5,
            'step_timeout': None,
            'auto_reset': True,
            'reset_timeout': None,
            'retry_type': 'reset',
            'retry_waiting_time': 0.1,
            'shared_memory': False,
            'copy_on_get': True,
            'context': 'fork',
            'wait_num': float("inf"),
            'step_wait_timeout': None,
            'connect_timeout': 60,
            'reset_inplace': False,
            'cfg_type': 'SyncSubprocessEnvManagerDict',
            'type': 'subprocess'
        },
        'stop_value': 10000000000,
        'n_evaluator_episode': 3,
        'type': 'cartpole_lightzero',
        'import_names': ['zoo.classic_control.cartpole.envs.cartpole_lightzero_env'],
        'env_id': 'CartPole-v0',
        'continuous': False,
        'manually_discretization': False,
        'replay_path': '/tmp/tmp4kdr3rf1/videos'
    },
    'policy': {
        'model': {
            'model_type': 'mlp',
            'continuous_action_space': False,
            'observation_shape': 4,
            'self_supervised_learning_loss': True,
            'categorical_distribution': True,
            'image_channel': 1,
            'frame_stack_num': 1,
            'num_res_blocks': 1,
            'num_channels': 64,
            'support_scale': 300,
            'bias': True,
            'discrete_action_encoding_type': 'one_hot',
            'res_connection_in_dynamics': True,
            'norm_type': 'BN',
            'action_space_size': 2,
            'lstm_hidden_size': 128,
            'latent_state_dim': 128
        },
        'learn': {
            'learner': {
                'train_iterations': 1000000000,
                'dataloader': {
                    'num_workers': 0
                },
                'log_policy': True,
                'hook': {
                    'load_ckpt_before_run': '',
                    'log_show_after_iter': 100,
                    'save_ckpt_after_iter': 10000,
                    'save_ckpt_after_run': True
                },
                'cfg_type': 'BaseLearnerDict'
            }
        },
        'collect': {
            'collector': {
                'deepcopy_obs': False,
                'transform_obs': False,
                'collect_print_freq': 100,
                'cfg_type': 'SampleSerialCollectorDict',
                'type': 'sample'
            }
        },
        'eval': {
            'evaluator': {
                'eval_freq': 1000,
                'render': {
                    'render_freq': -1,
                    'mode': 'train_iter'
                },
                'figure_path': None,
                'cfg_type': 'InteractionSerialEvaluatorDict',
                'stop_value': 10000000000,
                'n_episode': 3
            }
        },
        'other': {
            'replay_buffer': {
                'type': 'advanced',
                'replay_buffer_size': 4096,
                'max_use': float("inf"),
                'max_staleness': float("inf"),
                'alpha': 0.6,
                'beta': 0.4,
                'anneal_step': 100000,
                'enable_track_used_data': False,
                'deepcopy': False,
                'thruput_controller': {
                    'push_sample_rate_limit': {
                        'max': float("inf"),
                        'min': 0
                    },
                    'window_seconds': 30,
                    'sample_min_limit_ratio': 1
                },
                'monitor': {
                    'sampled_data_attr': {
                        'average_range': 5,
                        'print_freq': 200
                    },
                    'periodic_thruput': {
                        'seconds': 60
                    }
                },
                'cfg_type': 'AdvancedReplayBufferDict'
            },
            'commander': {
                'cfg_type': 'BaseSerialCommanderDict'
            }
        },
        'on_policy': False,
        'cuda': True,
        'multi_gpu': False,
        'bp_update_sync': True,
        'traj_len_inf': False,
        'use_rnd_model': False,
        'sampled_algo': False,
        'gumbel_algo': False,
        'mcts_ctree': True,
        'collector_env_num': 8,
        'evaluator_env_num': 3,
        'env_type': 'not_board_games',
        'battle_mode': 'play_with_bot_mode',
        'monitor_extra_statistics': True,
        'game_segment_length': 50,
        'transform2string': False,
        'gray_scale': False,
        'use_augmentation': False,
        'augmentation': ['shift', 'intensity'],
        'ignore_done': False,
        'update_per_collect': 100,
        'model_update_ratio': 0.1,
        'batch_size': 256,
        'optim_type': 'Adam',
        'learning_rate': 0.003,
        'target_update_freq': 100,
        'target_update_freq_for_intrinsic_reward': 1000,
        'weight_decay': 0.0001,
        'momentum': 0.9,
        'grad_clip_value': 10,
        'n_episode': 8,
        'num_simulations': 25,
        'discount_factor': 0.997,
        'td_steps': 5,
        'num_unroll_steps': 5,
        'reward_loss_weight': 1,
        'value_loss_weight': 0.25,
        'policy_loss_weight': 1,
        'policy_entropy_loss_weight': 0,
        'ssl_loss_weight': 2,
        'lr_piecewise_constant_decay': False,
        'threshold_training_steps_for_final_lr': 50000,
        'manual_temperature_decay': False,
        'threshold_training_steps_for_final_temperature': 100000,
        'fixed_temperature_value': 0.25,
        'use_ture_chance_label_in_chance_encoder': False,
        'use_priority': True,
        'priority_prob_alpha': 0.6,
        'priority_prob_beta': 0.4,
        'root_dirichlet_alpha': 0.3,
        'root_noise_weight': 0.25,
        'random_collect_episode_num': 0,
        'eps': {
            'eps_greedy_exploration_in_collect': False,
            'type': 'linear',
            'start': 1.0,
            'end': 0.05,
            'decay': 100000
        },
        'cfg_type': 'MuZeroPolicyDict',
        'type': 'muzero',
        'import_names': ['lzero.policy.muzero'],
        'reanalyze_ratio': 0,
        'eval_freq': 200,
        'replay_buffer_size': 1000000,
        'device': 'cuda'
    },
    'exp_name': 'CartPole-v0-MuZero',
    'seed': 0,
    'wandb_logger': {
        'gradient_logger': False,
        'video_logger': False,
        'plot_logger': False,
        'action_logger': False,
        'return_logger': False
    }
}
```
</details>
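
Two of the settings above, `td_steps` and `discount_factor`, control the n-step bootstrapped value target used in MuZero-style training. The following is a minimal sketch of that target in plain Python, assuming the standard textbook definition (a discounted sum of the next `td_steps` rewards plus a discounted value estimate). The function `n_step_value_target` and its inputs are hypothetical; this does not reproduce LightZero's implementation, which additionally handles batching, value transforms, and reanalysis.

```python
def n_step_value_target(rewards, values, t, td_steps=5, discount=0.997):
    """Discounted sum of up to `td_steps` rewards from step t, plus a bootstrapped value.

    rewards[i] is the reward received after acting in state i;
    values[i] is a value estimate for state i (for example, from search).
    """
    horizon = min(t + td_steps, len(rewards))
    target = 0.0
    for i in range(t, horizon):
        target += (discount ** (i - t)) * rewards[i]
    # Bootstrap only if the state td_steps ahead still lies inside the episode.
    if t + td_steps < len(values):
        target += (discount ** td_steps) * values[t + td_steps]
    return target


if __name__ == "__main__":
    # CartPole gives a reward of 1 per step; the value estimates here are made up.
    episode_rewards = [1.0] * 10
    episode_values = [20.0] * 10
    print(n_step_value_target(episode_rewards, episode_values, t=0))
```

Near the end of an episode the sum is truncated and no bootstrap term is added, so the target reduces to the remaining discounted rewards.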

**Training Procedure**
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
- **Weights & Biases (wandb):** [monitor link](<TODO>)

## Model Information
<!-- Provide the basic links for the model. -->
- **GitHub Repository:** [repo link](https://github.com/opendilab/LightZero)
- **Doc:** [DI-engine-docs Algorithm link](<TODO>)
- **Configuration:** [config link](https://huggingface.co/OpenDILabCommunity/CartPole-v0-MuZero/blob/main/policy_config.py)
- **Demo:** [video](https://huggingface.co/OpenDILabCommunity/CartPole-v0-MuZero/blob/main/replay.mp4)
<!-- Provide the size information for the model. -->
- **Parameters total size:** 13548.13 KB
- **Last Update Date:** 2023-11-30

## Environments
<!-- Address questions around what environment the model is intended to be trained and deployed at, including the necessary information needed to be provided for future users. -->
- **Benchmark:** OpenAI/Gym/Box2d
- **Task:** CartPole-v0
- **Gym version:** 0.25.1
- **DI-engine version:** v0.4.9
- **PyTorch version:** 2.1.1+cu121
- **Doc:** [DI-engine-docs Environments link](<TODO>)
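
To compare a local setup against the versions listed above, a small standard-library check along the following lines may help. It is a sketch under stated assumptions: the helper `compare_versions` is hypothetical, and the distribution names `gym`, `DI-engine`, and `torch` are assumed to be the names the packages are installed under. The reported `torch` string may also omit the `+cu121` suffix depending on where the wheel came from.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Environments section (distribution names are assumptions).
EXPECTED_VERSIONS = {"gym": "0.25.1", "DI-engine": "0.4.9", "torch": "2.1.1+cu121"}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Map each package to (expected, installed); installed is None if absent."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A reported difference is not necessarily a problem; it only flags where the local environment departs from the one recorded for this model.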