Files changed (6)
  1. README.md +19 -23
  2. config.json +59 -38
  3. config.yaml +0 -141
  4. eval_info.json +0 -0
  5. model.safetensors +2 -2
  6. train_config.json +234 -0
README.md CHANGED
@@ -2,6 +2,11 @@
  license: apache-2.0
  datasets:
  - lerobot/pusht
  pipeline_tag: robotics
  ---
  # Model Card for Diffusion Policy / PushT
@@ -15,31 +20,23 @@ See the [LeRobot library](https://github.com/huggingface/lerobot) (particularly

  ## Training Details

- The model was trained using [LeRobot's training script](https://github.com/huggingface/lerobot/blob/d747195c5733c4f68d4bfbe62632d6fc1b605712/lerobot/scripts/train.py) and with the [pusht](https://huggingface.co/datasets/lerobot/pusht/tree/v1.3) dataset, using this command:

  ```bash
  python lerobot/scripts/train.py \
- hydra.run.dir=outputs/train/diffusion_pusht \
- hydra.job.name=diffusion_pusht \
- policy=diffusion training.save_model=true \
- env=pusht \
- env.task=PushT-v0 \
- dataset_repo_id=lerobot/pusht \
- training.offline_steps=200000 \
- training.save_freq=20000 \
- training.eval_freq=10000 \
- eval.n_episodes=50 \
- wandb.enable=true \
- wandb.disable_artifact=true \
- device=cuda
  ```


- The training curves may be found at https://wandb.ai/alexander-soare/Alexander-LeRobot/runs/508luayd.
-
- This took about 7 hours to train on an Nvidia RTX 3090.
-
- _Note: At the time of training, [this PR](https://github.com/huggingface/lerobot/pull/129) was also incorporated._

  ## Evaluation

@@ -48,12 +45,11 @@ The model was evaluated on the `PushT` environment from [gym-pusht](https://gith
  - Maximum overlap with target (seen as `eval/avg_max_reward` in the charts above). This ranges in [0, 1].
  - Success: whether or not the maximum overlap is at least 95%.

- Here are the metrics for 500 episodes' worth of evaluation. For the success rate we add an extra row with confidence bounds. This assumes a uniform prior over success probability and computes the beta posterior, then calculates the mean and lower/upper confidence bounds (with a 68.2% confidence interval centered on the mean). The "Theirs" column is for an equivalent model trained on the original Diffusion Policy repository and evaluated on LeRobot (the model weights may be found in the [`original_dp_repo`](https://huggingface.co/lerobot/diffusion_pusht/tree/original_dp_repo) branch of this repository).

  <blank>|Ours|Theirs
  -|-|-
- Average max. overlap ratio | 0.959 | 0.957
- Success rate for 500 episodes (%) | 63.8 | 64.2
- Beta distribution lower/mean/upper (%) | 61.6 / 63.7 / 65.9 | 62.0 / 64.1 / 66.3

  The results of each of the individual rollouts may be found in [eval_info.json](eval_info.json).
 
  license: apache-2.0
  datasets:
  - lerobot/pusht
+ tags:
+ - diffusion-policy
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
+ - robotics
  pipeline_tag: robotics
  ---
  # Model Card for Diffusion Policy / PushT

  ## Training Details

+ The model was trained using [LeRobot's training script](https://github.com/huggingface/lerobot/blob/main/lerobot/scripts/train.py) and with the [pusht](https://huggingface.co/datasets/lerobot/pusht) dataset, using this command:

  ```bash
  python lerobot/scripts/train.py \
+ --policy.type=diffusion \
+ --dataset.repo_id=lerobot/pusht \
+ --seed=100000 \
+ --env.type=pusht \
+ --batch_size=64 \
+ --offline.steps=200000 \
+ --eval_freq=25000 \
+ --save_freq=25000 \
+ --wandb.enable=true
  ```


+ The training curves may be found at https://wandb.ai/aliberts/lerobot/runs/s7elvf4r.

  ## Evaluation

  - Maximum overlap with target (seen as `eval/avg_max_reward` in the charts above). This ranges in [0, 1].
  - Success: whether or not the maximum overlap is at least 95%.

+ Here are the metrics for 500 episodes' worth of evaluation. The "Theirs" column is for an equivalent model trained on the original Diffusion Policy repository and evaluated on LeRobot (the model weights may be found in the [`original_dp_repo`](https://huggingface.co/lerobot/diffusion_pusht/tree/original_dp_repo) branch of this repository).

  <blank>|Ours|Theirs
  -|-|-
+ Average max. overlap ratio | 0.945 | 0.957
+ Success rate for 500 episodes (%) | 63.2 | 64.2

  The results of each of the individual rollouts may be found in [eval_info.json](eval_info.json).
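The evaluation metrics described in the model card (success means a maximum overlap of at least 95%, plus the Beta-posterior row that the new version drops) can be sketched in a few lines of Python. This is not LeRobot's evaluation code: `summarize_rollouts` and `beta_bounds_pct` are hypothetical names, and reading "a 68.2% confidence interval centered on the mean" as the posterior mean plus or minus one standard deviation is an assumption. Under that reading, 319 and 321 successes out of 500 (63.8% and 64.2%) give 61.6 / 63.7 / 65.9 and 62.0 / 64.1 / 66.3, which match the removed table row.

```python
import math
from typing import Dict, Sequence, Tuple

# Success criterion from the model card: maximum overlap of at least 95%.
SUCCESS_THRESHOLD = 0.95


def summarize_rollouts(max_overlaps: Sequence[float]) -> Dict[str, float]:
    """Average max. overlap ratio and success rate (%) over per-episode maxima."""
    n = len(max_overlaps)
    successes = sum(1 for r in max_overlaps if r >= SUCCESS_THRESHOLD)
    return {
        "avg_max_overlap": sum(max_overlaps) / n,
        "successes": successes,
        "success_rate_pct": 100.0 * successes / n,
    }


def beta_bounds_pct(successes: int, trials: int) -> Tuple[float, float, float]:
    """Lower / mean / upper (%) of the Beta posterior over the success probability.

    A uniform prior is Beta(1, 1), so the posterior is
    Beta(successes + 1, failures + 1). ASSUMPTION: the bounds are the
    posterior mean plus or minus one standard deviation, which covers
    roughly 68.2% of the mass under a normal approximation.
    """
    a = successes + 1
    b = trials - successes + 1
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return 100 * (mean - std), 100 * mean, 100 * (mean + std)


if __name__ == "__main__":
    # 63.8% and 64.2% of 500 episodes are 319 and 321 successes.
    for label, k in (("Ours", 319), ("Theirs", 321)):
        lo, mean, hi = beta_bounds_pct(k, 500)
        print(f"{label}: {lo:.1f} / {mean:.1f} / {hi:.1f}")
```

The per-episode maxima would come from the rollouts recorded in `eval_info.json`; its schema is not shown in this diff, so the sketch takes a plain list of floats.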
config.json CHANGED
@@ -1,53 +1,74 @@
  {
- "n_obs_steps": 2,
- "horizon": 16,
- "n_action_steps": 8,
- "input_shapes": {
- "observation.image": [
- 3,
- 96,
- 96
- ],
- "observation.state": [
- 2
- ]
- },
- "output_shapes": {
- "action": [
- 2
- ]
- },
- "input_normalization_modes": {
- "observation.image": "mean_std",
- "observation.state": "min_max"
- },
- "output_normalization_modes": {
- "action": "min_max"
- },
- "vision_backbone": "resnet18",
  "crop_shape": [
  84,
  84
  ],
- "crop_is_random": true,
- "pretrained_backbone_weights": null,
- "use_group_norm": true,
- "spatial_softmax_num_keypoints": 32,
  "down_dims": [
  512,
  1024,
  2048
  ],
  "kernel_size": 5,
  "n_groups": 8,
- "diffusion_step_embed_dim": 128,
- "use_film_scale_modulation": true,
  "num_train_timesteps": 100,
- "beta_schedule": "squaredcos_cap_v2",
- "beta_start": 0.0001,
- "beta_end": 0.02,
  "prediction_type": "epsilon",
- "clip_sample": true,
- "clip_sample_range": 1.0,
- "num_inference_steps": 100
  }
 
  {
+ "beta_end": 0.02,
+ "beta_schedule": "squaredcos_cap_v2",
+ "beta_start": 0.0001,
+ "clip_sample": true,
+ "clip_sample_range": 1.0,
+ "crop_is_random": true,
  "crop_shape": [
  84,
  84
  ],
+ "diffusion_step_embed_dim": 128,
+ "do_mask_loss_for_padding": false,
  "down_dims": [
  512,
  1024,
  2048
  ],
+ "drop_n_last_frames": 7,
+ "horizon": 16,
+ "input_features": {
+ "observation.image": {
+ "shape": [
+ 3,
+ 96,
+ 96
+ ],
+ "type": "VISUAL"
+ },
+ "observation.state": {
+ "shape": [
+ 2
+ ],
+ "type": "STATE"
+ }
+ },
  "kernel_size": 5,
+ "n_action_steps": 8,
  "n_groups": 8,
+ "n_obs_steps": 2,
+ "noise_scheduler_type": "DDPM",
+ "normalization_mapping": {
+ "ACTION": "MIN_MAX",
+ "STATE": "MIN_MAX",
+ "VISUAL": "MEAN_STD"
+ },
+ "num_inference_steps": null,
  "num_train_timesteps": 100,
+ "optimizer_betas": [
+ 0.95,
+ 0.999
+ ],
+ "optimizer_eps": 1e-08,
+ "optimizer_lr": 0.0001,
+ "optimizer_weight_decay": 1e-06,
+ "output_features": {
+ "action": {
+ "shape": [
+ 2
+ ],
+ "type": "ACTION"
+ }
+ },
  "prediction_type": "epsilon",
+ "pretrained_backbone_weights": null,
+ "scheduler_name": "cosine",
+ "scheduler_warmup_steps": 500,
+ "spatial_softmax_num_keypoints": 32,
+ "type": "diffusion",
+ "use_film_scale_modulation": true,
+ "use_group_norm": true,
+ "use_separate_rgb_encoder_per_camera": false,
+ "vision_backbone": "resnet18"
  }
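In the new config.json, the per-key `input_shapes`/`output_shapes` and `*_normalization_modes` fields are replaced by `input_features`/`output_features` (a shape plus a feature type) and a per-type `normalization_mapping`. A minimal sketch of that correspondence, applied to the old values in this diff, follows. It is not LeRobot's own conversion code: `feature_type` and `migrate_features` are hypothetical helpers, and the rule that infers a type from a key's prefix is an assumption.

```python
import json
from typing import Any, Dict


def feature_type(key: str) -> str:
    """HYPOTHETICAL rule: infer a feature type from the key name."""
    if key.startswith("observation.image"):
        return "VISUAL"
    if key.startswith("observation.state"):
        return "STATE"
    if key == "action":
        return "ACTION"
    raise ValueError(f"cannot infer a feature type for {key!r}")


def migrate_features(old: Dict[str, Any]) -> Dict[str, Any]:
    """Map old per-key shape/normalization fields onto the new per-type layout."""
    new: Dict[str, Any] = {
        "input_features": {},
        "output_features": {},
        "normalization_mapping": {},
    }
    for section, shapes_key, modes_key in (
        ("input_features", "input_shapes", "input_normalization_modes"),
        ("output_features", "output_shapes", "output_normalization_modes"),
    ):
        for key, shape in old[shapes_key].items():
            ftype = feature_type(key)
            new[section][key] = {"shape": list(shape), "type": ftype}
            # Normalization is now chosen per feature type rather than per key,
            # so all keys sharing a type must agree on the mode.
            mode = old[modes_key][key].upper()  # e.g. "mean_std" -> "MEAN_STD"
            if new["normalization_mapping"].setdefault(ftype, mode) != mode:
                raise ValueError(f"conflicting normalization modes for {ftype}")
    return new


# Old values, copied from the removed lines of config.json.
OLD_CONFIG = {
    "input_shapes": {"observation.image": [3, 96, 96], "observation.state": [2]},
    "output_shapes": {"action": [2]},
    "input_normalization_modes": {
        "observation.image": "mean_std",
        "observation.state": "min_max",
    },
    "output_normalization_modes": {"action": "min_max"},
}

if __name__ == "__main__":
    print(json.dumps(migrate_features(OLD_CONFIG), indent=2, sort_keys=True))
```

The conflict check reflects the design change: with a per-type mapping, two image keys can no longer use different normalization modes.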
config.yaml DELETED
@@ -1,141 +0,0 @@
- device: cuda
- use_amp: false
- seed: 100000
- dataset_repo_id: lerobot/pusht
- training:
- offline_steps: 200000
- online_steps: 0
- online_steps_between_rollouts: 1
- online_sampling_ratio: 0.5
- online_env_seed: ???
- eval_freq: 10000
- save_freq: 20000
- log_freq: 250
- save_model: true
- batch_size: 64
- grad_clip_norm: 10
- lr: 0.0001
- lr_scheduler: cosine
- lr_warmup_steps: 500
- adam_betas:
- - 0.95
- - 0.999
- adam_eps: 1.0e-08
- adam_weight_decay: 1.0e-06
- delta_timestamps:
- observation.image:
- - -0.1
- - 0.0
- observation.state:
- - -0.1
- - 0.0
- action:
- - -0.1
- - 0.0
- - 0.1
- - 0.2
- - 0.3
- - 0.4
- - 0.5
- - 0.6
- - 0.7
- - 0.8
- - 0.9
- - 1.0
- - 1.1
- - 1.2
- - 1.3
- - 1.4
- n_end_keyframes_dropped: ${policy.horizon} - ${policy.n_action_steps} - ${policy.n_obs_steps}
- + 1
- eval:
- n_episodes: 50
- batch_size: 50
- use_async_envs: false
- wandb:
- enable: true
- disable_artifact: true
- project: lerobot
- notes: ''
- fps: 10
- env:
- name: pusht
- task: PushT-v0
- image_size: 96
- state_dim: 2
- action_dim: 2
- fps: ${fps}
- episode_length: 300
- gym:
- obs_type: pixels_agent_pos
- render_mode: rgb_array
- visualization_width: 384
- visualization_height: 384
- override_dataset_stats:
- observation.image:
- mean:
- - - - 0.5
- - - - 0.5
- - - - 0.5
- std:
- - - - 0.5
- - - - 0.5
- - - - 0.5
- observation.state:
- min:
- - 13.456424
- - 32.938293
- max:
- - 496.14618
- - 510.9579
- action:
- min:
- - 12.0
- - 25.0
- max:
- - 511.0
- - 511.0
- policy:
- name: diffusion
- n_obs_steps: 2
- horizon: 16
- n_action_steps: 8
- input_shapes:
- observation.image:
- - 3
- - 96
- - 96
- observation.state:
- - ${env.state_dim}
- output_shapes:
- action:
- - ${env.action_dim}
- input_normalization_modes:
- observation.image: mean_std
- observation.state: min_max
- output_normalization_modes:
- action: min_max
- vision_backbone: resnet18
- crop_shape:
- - 84
- - 84
- crop_is_random: true
- pretrained_backbone_weights: null
- use_group_norm: true
- spatial_softmax_num_keypoints: 32
- down_dims:
- - 512
- - 1024
- - 2048
- kernel_size: 5
- n_groups: 8
- diffusion_step_embed_dim: 128
- use_film_scale_modulation: true
- num_train_timesteps: 100
- beta_schedule: squaredcos_cap_v2
- beta_start: 0.0001
- beta_end: 0.02
- prediction_type: epsilon
- clip_sample: true
- clip_sample_range: 1.0
- num_inference_steps: 100
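The deleted config.yaml spelled out `delta_timestamps` by hand, but the values follow a simple pattern in `fps`, `n_obs_steps` and `horizon`. Its `n_end_keyframes_dropped` expression evaluates to 7, the same value that appears as `drop_n_last_frames` in the new config.json and train_config.json. A small sketch that regenerates both; `make_delta_timestamps` is a hypothetical helper, and the pattern it encodes is inferred from the listed values rather than taken from LeRobot's code.

```python
from typing import Dict, List


def make_delta_timestamps(fps: int, n_obs_steps: int, horizon: int) -> Dict[str, List[float]]:
    """Regenerate the hand-written delta_timestamps of the deleted config.yaml.

    ASSUMPTION: observations cover the last n_obs_steps frames (ending at 0.0 s)
    and actions cover `horizon` frames starting at the first observation frame.
    """
    first = 1 - n_obs_steps  # -1 when n_obs_steps == 2
    obs = [i / fps for i in range(first, 1)]
    action = [i / fps for i in range(first, first + horizon)]
    return {"observation.image": obs, "observation.state": list(obs), "action": action}


def n_end_keyframes_dropped(horizon: int, n_action_steps: int, n_obs_steps: int) -> int:
    # Same expression as in the deleted config.yaml.
    return horizon - n_action_steps - n_obs_steps + 1


if __name__ == "__main__":
    deltas = make_delta_timestamps(fps=10, n_obs_steps=2, horizon=16)
    print(deltas["observation.state"])  # [-0.1, 0.0]
    print(deltas["action"][0], deltas["action"][-1], len(deltas["action"]))  # -0.1 1.4 16
    print(n_end_keyframes_dropped(horizon=16, n_action_steps=8, n_obs_steps=2))  # 7
```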
eval_info.json CHANGED
The diff for this file is too large to render.
 
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:877969d58d12af315d8c672a2328b3984071901b6f71bdf592b6f131056b520f
- size 1050862612

  version https://git-lfs.github.com/spec/v1
+ oid sha256:f6b2458798c6dfad3edc0d338a1a4c78da62cc68817eb3d8bd7b395bed4ef672
+ size 1050862408
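`model.safetensors` is stored with Git LFS, so the diff only shows the pointer file (`version`, `oid`, `size`); the new weights are 204 bytes smaller than the old ones. A minimal standard-library sketch for checking a downloaded copy against the pointer follows. The function names and the local path `model.safetensors` are hypothetical, and only `sha256` oids are handled.

```python
import hashlib
import os
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the size and sha256 oid recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; hashing a ~1 GB file takes a while.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer for the new model.safetensors, copied from the diff.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f6b2458798c6dfad3edc0d338a1a4c78da62cc68817eb3d8bd7b395bed4ef672
size 1050862408
"""

if __name__ == "__main__":
    # "model.safetensors" is a hypothetical local path to the downloaded weights.
    if os.path.exists("model.safetensors"):
        print(matches_pointer("model.safetensors", NEW_POINTER))
```

Reading in fixed-size chunks keeps memory use flat regardless of the checkpoint size.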
train_config.json ADDED
@@ -0,0 +1,234 @@
+ {
+ "dataset": {
+ "repo_id": "lerobot/pusht",
+ "episodes": null,
+ "image_transforms": {
+ "enable": false,
+ "max_num_transforms": 3,
+ "random_order": false,
+ "tfs": {
+ "brightness": {
+ "weight": 1.0,
+ "type": "ColorJitter",
+ "kwargs": {
+ "brightness": [
+ 0.8,
+ 1.2
+ ]
+ }
+ },
+ "contrast": {
+ "weight": 1.0,
+ "type": "ColorJitter",
+ "kwargs": {
+ "contrast": [
+ 0.8,
+ 1.2
+ ]
+ }
+ },
+ "saturation": {
+ "weight": 1.0,
+ "type": "ColorJitter",
+ "kwargs": {
+ "saturation": [
+ 0.5,
+ 1.5
+ ]
+ }
+ },
+ "hue": {
+ "weight": 1.0,
+ "type": "ColorJitter",
+ "kwargs": {
+ "hue": [
+ -0.05,
+ 0.05
+ ]
+ }
+ },
+ "sharpness": {
+ "weight": 1.0,
+ "type": "SharpnessJitter",
+ "kwargs": {
+ "sharpness": [
+ 0.5,
+ 1.5
+ ]
+ }
+ }
+ }
+ },
+ "local_files_only": false,
+ "use_imagenet_stats": true,
+ "video_backend": "pyav"
+ },
+ "env": {
+ "type": "pusht",
+ "n_envs": null,
+ "task": "PushT-v0",
+ "fps": 10,
+ "features": {
+ "action": {
+ "type": "ACTION",
+ "shape": [
+ 2
+ ]
+ },
+ "agent_pos": {
+ "type": "STATE",
+ "shape": [
+ 2
+ ]
+ },
+ "pixels": {
+ "type": "VISUAL",
+ "shape": [
+ 384,
+ 384,
+ 3
+ ]
+ }
+ },
+ "features_map": {
+ "action": "action",
+ "agent_pos": "observation.state",
+ "environment_state": "observation.environment_state",
+ "pixels": "observation.image"
+ },
+ "episode_length": 300,
+ "obs_type": "pixels_agent_pos",
+ "render_mode": "rgb_array",
+ "visualization_width": 384,
+ "visualization_height": 384
+ },
+ "policy": {
+ "type": "diffusion",
+ "n_obs_steps": 2,
+ "normalization_mapping": {
+ "VISUAL": "MEAN_STD",
+ "STATE": "MIN_MAX",
+ "ACTION": "MIN_MAX"
+ },
+ "input_features": {
+ "observation.image": {
+ "type": "VISUAL",
+ "shape": [
+ 3,
+ 96,
+ 96
+ ]
+ },
+ "observation.state": {
+ "type": "STATE",
+ "shape": [
+ 2
+ ]
+ }
+ },
+ "output_features": {
+ "action": {
+ "type": "ACTION",
+ "shape": [
+ 2
+ ]
+ }
+ },
+ "horizon": 16,
+ "n_action_steps": 8,
+ "drop_n_last_frames": 7,
+ "vision_backbone": "resnet18",
+ "crop_shape": [
+ 84,
+ 84
+ ],
+ "crop_is_random": true,
+ "pretrained_backbone_weights": null,
+ "use_group_norm": true,
+ "spatial_softmax_num_keypoints": 32,
+ "use_separate_rgb_encoder_per_camera": false,
+ "down_dims": [
+ 512,
+ 1024,
+ 2048
+ ],
+ "kernel_size": 5,
+ "n_groups": 8,
+ "diffusion_step_embed_dim": 128,
+ "use_film_scale_modulation": true,
+ "noise_scheduler_type": "DDPM",
+ "num_train_timesteps": 100,
+ "beta_schedule": "squaredcos_cap_v2",
+ "beta_start": 0.0001,
+ "beta_end": 0.02,
+ "prediction_type": "epsilon",
+ "clip_sample": true,
+ "clip_sample_range": 1.0,
+ "num_inference_steps": null,
+ "do_mask_loss_for_padding": false,
+ "optimizer_lr": 0.0001,
+ "optimizer_betas": [
+ 0.95,
+ 0.999
+ ],
+ "optimizer_eps": 1e-08,
+ "optimizer_weight_decay": 1e-06,
+ "scheduler_name": "cosine",
+ "scheduler_warmup_steps": 500
+ },
+ "output_dir": "outputs/train/2025-01-17/11-51-15_pusht_diffusion",
+ "job_name": "pusht_diffusion",
+ "resume": false,
+ "device": "cuda",
+ "use_amp": false,
+ "seed": 100000,
+ "num_workers": 4,
+ "batch_size": 64,
+ "eval_freq": 25000,
+ "log_freq": 200,
+ "save_checkpoint": true,
+ "save_freq": 25000,
+ "offline": {
+ "steps": 200000
+ },
+ "online": {
+ "steps": 0,
+ "rollout_n_episodes": 1,
+ "rollout_batch_size": 1,
+ "steps_between_rollouts": null,
+ "sampling_ratio": 0.5,
+ "env_seed": null,
+ "buffer_capacity": null,
+ "buffer_seed_size": 0,
+ "do_rollout_async": false
+ },
+ "use_policy_training_preset": true,
+ "optimizer": {
+ "type": "adam",
+ "lr": 0.0001,
+ "betas": [
+ 0.95,
+ 0.999
+ ],
+ "eps": 1e-08,
+ "weight_decay": 1e-06,
+ "grad_clip_norm": 10.0
+ },
+ "scheduler": {
+ "type": "diffuser",
+ "num_warmup_steps": 500,
+ "name": "cosine"
+ },
+ "eval": {
+ "n_episodes": 50,
+ "batch_size": 50,
+ "use_async_envs": false
+ },
+ "wandb": {
+ "enable": true,
+ "disable_artifact": false,
+ "project": "lerobot",
+ "entity": null,
+ "notes": null
+ }
+ }
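The `optimizer` and `scheduler` blocks of train_config.json (Adam at `lr` 0.0001, a `diffuser`-type `cosine` scheduler with 500 warmup steps, and `offline.steps` of 200000) imply a learning-rate curve. The sketch below assumes the scheduler follows the standard cosine-with-warmup formula (linear warmup, then a half cosine decaying to zero, as in diffusers' `get_cosine_schedule_with_warmup`) and that the total step count equals `offline.steps`. Both are assumptions rather than details read from LeRobot's code, and `cosine_with_warmup_lr` is a hypothetical name.

```python
import math


def cosine_with_warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 500,
                          total_steps: int = 200_000, num_cycles: float = 0.5) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to zero.

    ASSUMPTION: total_steps equals offline.steps; num_cycles=0.5 gives a single
    half cosine wave from base_lr down to 0 over the post-warmup steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


if __name__ == "__main__":
    for step in (0, 250, 500, 100_250, 200_000):
        print(step, f"{cosine_with_warmup_lr(step):.2e}")
```

With these settings the rate reaches 0.0001 at step 500, is halved midway through the decay (step 100250), and reaches zero at step 200000.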