carohiguera committed on
Commit
ebd4d0b
•
1 Parent(s): b14c0ac

added force field decoder for gelsight

README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - sparsh
+ - force field
+ - gelsight mini
+ ---
+
+ # Sparsh (DINO) + force field decoder for GelSight mini sensor
+
+ We decode the touch representations from Sparsh into normal and shear force fields. This gives a human-interpretable view of what the representations capture in terms of forces.
+
+ ## How to Use
+ To test the Sparsh (DINO) + force field decoder live, you only need a GelSight mini sensor. Follow these steps to run the demo:
+
+ 1. Clone the [sparsh repo](https://github.com/facebookresearch/sparsh.git).
+
+ 2. Create a folder for downloading the task checkpoints. For example, `${YOUR_PATH}/outputs_sparsh/checkpoints`.
+ 3. Download the Sparsh (DINO) base [checkpoint](https://huggingface.co/facebook/sparsh-dino-base).
+ 4. Download the decoder checkpoints from this repo.
+ 5. Connect the sensor to your PC.
+ 6. Make sure the device is recognized by the OS (on Linux, you can use Cheese to view the video that the sensor is streaming).
+
+ 7. Run the demo for GelSight mini (refer to the Sparsh repo README for more information about how to set up the path configs):
+
+ ```bash
+ python demo_forcefield.py +experiment=digit/downstream_task/forcefield/gelsight_dino paths=${YOUR_PATH_CONFIG} paths.output_dir=${YOUR_PATH}/outputs_sparsh/checkpoints/ test.demo.gelsight_device_id=${YOUR_GELSIGHT_VIDEO_ID}
+ ```
+
+ The GelSight mini is recognized as a webcam. You can get the video ID by running `ls -l /dev/video*` in a terminal.
+
+ 8. Take the sensor and slide it across the edge of a table, or across objects with interesting textures! Look at the normal field to localize where you're making contact on the sensor's surface. Look at the shear field to build an intuition about the direction of the shear force that you applied while sliding the sensor. For example, slide the sensor up and down over an edge to see translational shear, or rotate the sensor in place to see torsional slip!
+
+ ### BibTeX entry and citation info
+ ```bibtex
+ @inproceedings{
+ higuera2024sparsh,
+ title={Sparsh: Self-supervised touch representations for vision-based tactile sensing},
+ author={Carolina Higuera and Akash Sharma and Chaithanya Krishna Bodduluri and Taosha Fan and Patrick Lancaster and Mrinal Kalakrishnan and Michael Kaess and Byron Boots and Mike Lambeta and Tingfan Wu and Mustafa Mukadam},
+ booktitle={8th Annual Conference on Robot Learning},
+ year={2024},
+ url={https://openreview.net/forum?id=xYJn2e1uu8}
+ }
+ ```
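The README's `ls -l /dev/video*` check can also be scripted. The following is a minimal sketch, assuming a Linux host where video devices appear as `/dev/videoN` and that the numeric suffix is the value expected by `test.demo.gelsight_device_id`. The helper name `list_video_ids` is hypothetical and not part of the Sparsh repo, and the script only enumerates device nodes; it cannot tell which one is the GelSight mini.

```python
import glob
import re


def list_video_ids(pattern="/dev/video*"):
    """Return the sorted integer suffixes of device nodes matching the pattern."""
    ids = []
    for path in glob.glob(pattern):
        # Keep only paths that end in "video<digits>", e.g. /dev/video2.
        match = re.fullmatch(r".*video(\d+)", path)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    for video_id in list_video_ids():
        # Candidate value for the demo override (assumption: the ID is the suffix).
        print(f"/dev/video{video_id} -> test.demo.gelsight_device_id={video_id}")
```

If several IDs are listed, unplugging the sensor and running the script again is one way to see which entries disappear.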
gelsight_t1_forcefield_dino_vitbase_bg/checkpoints/epoch-0010.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2342576673a2e731790e1a02b059f12572e5946aa25b48a172c30494c958d2ea
+ size 14695019
gelsight_t1_forcefield_dino_vitbase_bg/checkpoints/epoch-0014.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdc987d0a960ec6c9207ad7712693b807bc68743dc95500b78b94db427092081
+ size 14695019
gelsight_t1_forcefield_dino_vitbase_bg/checkpoints/epoch-0021.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc8a697da3131adaffd54bc4b03e8fb137a0affca689c296d145dd54ad0151c3
+ size 14695019
gelsight_t1_forcefield_dino_vitbase_bg/checkpoints/last.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d62ad4094fec1d32a22ce180e7e099b901cac210c04cff1d3d974019f751d997
+ size 541797495
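The checkpoint entries above are Git LFS pointer files: each records a SHA-256 digest (`oid`) and a byte count (`size`) for the real checkpoint. A downloaded checkpoint can be checked against its pointer with a sketch like the one below. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, not part of Git LFS or the Sparsh repo.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file has the size and SHA-256 recorded in the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so a large checkpoint is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["sha256"]
```

For example, passing the three pointer lines of `epoch-0010.pth` to `parse_lfs_pointer` and then calling `matches_pointer` on the downloaded file should return `True` for an intact download.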
gelsight_t1_forcefield_dino_vitbase_bg/config.yaml ADDED
@@ -0,0 +1,159 @@
+ paths:
+ data_root: /media/chiguera/GUM/
+ encoder_checkpoint_root: /media/chiguera/2TB/sparsh/encoders_460k/
+ log_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/${experiment_name}/logs/
+ output_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/${experiment_name}/
+ work_dir: ${hydra:runtime.cwd}
+ wandb:
+ project: ${task_name}_${sensor}
+ entity: chiguera
+ save_dir: ${paths.output_dir}
+ id: ${hydra:job.id}_${experiment_name}
+ tags:
+ - ${ssl_name}
+ group: null
+ notes: null
+ data:
+ train_val_split: null
+ train_data_budget: ${train_data_budget}
+ val_data_budget: ${val_data_budget}
+ max_train_data: null
+ sensor: gelsight_mini
+ dataset:
+ _target_: tactile_ssl.data.vision_tactile_forcefield.VisionTactileBackboneDataset
+ config:
+ sensor: ${data.sensor}
+ remove_bg: true
+ out_format: concat_ch_img
+ num_frames: 2
+ frame_stride: 5
+ path_dataset: ${paths.data_root}/datasets/gelsight/Object-Slide/
+ path_bgs: ${paths.data_root}/datasets/gelsight/Object-Slide/bgs/
+ list_datasets:
+ - strawberry/dataset_0
+ - strawberry/dataset_1
+ - strawberry/dataset_2
+ - plum/dataset_0
+ - plum/dataset_1
+ - plum/dataset_2
+ - icecream_cup/dataset_0
+ - icecream_cup/dataset_1
+ - icecream_cup/dataset_2
+ - donut/dataset_0
+ - donut/dataset_1
+ - donut/dataset_2
+ - cookie2/dataset_0
+ - cookie2/dataset_1
+ - cookie2/dataset_2
+ - bread/dataset_0
+ - bread/dataset_1
+ - bread/dataset_2
+ - banana/dataset_0
+ - banana/dataset_1
+ - banana/dataset_2
+ - bagel/dataset_0
+ - bagel/dataset_1
+ - bagel/dataset_2
+ list_datasets_test:
+ - bagel/dataset_3
+ - banana/dataset_3
+ - bread/dataset_3
+ - cookie2/dataset_3
+ - donut/dataset_3
+ - icecream_cup/dataset_3
+ - plum/dataset_3
+ - strawberry/dataset_3
+ look_in_folder: false
+ transforms:
+ with_augmentation: false
+ resize:
+ - 224
+ - 224
+ p_flip: 0.0
+ p_crop: 0.0
+ p_rot: 0.0
+ train_dataloader:
+ batch_size: 20
+ num_workers: 2
+ drop_last: true
+ pin_memory: true
+ persistent_workers: true
+ shuffle: true
+ val_dataloader:
+ batch_size: 20
+ num_workers: 2
+ drop_last: true
+ pin_memory: true
+ persistent_workers: true
+ task:
+ _target_: tactile_ssl.downstream_task.ForceFieldModuleSL
+ model_encoder:
+ _target_: tactile_ssl.model.vit_${ssl_model_size}
+ img_size:
+ - 224
+ - 224
+ in_chans: 6
+ pos_embed_fn: sinusoidal
+ num_register_tokens: 1
+ model_task:
+ _target_: tactile_ssl.downstream_task.ForceFieldDecoderSL
+ embed_dim: ${ssl_model_size}
+ checkpoint_encoder: ${paths.encoder_checkpoint_root}/${ssl_name}_vit${ssl_model_size}.ckpt
+ checkpoint_task: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/digit_t1_forcefield_dino_vitbase_bg/checkpoints/epoch-0031.pth
+ train_encoder: false
+ optim_cfg:
+ _partial_: true
+ _target_: torch.optim.Adam
+ lr: 0.0001
+ scheduler_cfg: null
+ ssl_config:
+ img_sz:
+ - 224
+ - 224
+ pose_estimator:
+ num_encoder_layers: 18
+ loss:
+ with_mask_supervision: false
+ with_sl_supervision: false
+ with_ssim: true
+ disparity_smoothness: 0.001
+ min_depth: 0.1
+ max_depth: 100.0
+ encoder_type: ${ssl_name}
+ ssl_name: dino
+ sensor: gelsight
+ ckpt_path: null
+ task_name: t1_forcefield
+ ssl_model_size: base
+ train_data_budget: 1.0
+ val_data_budget: 1.0
+ experiment_name: ${sensor}_${task_name}_${ssl_name}_vit${ssl_model_size}_bg
+ seed: 42
+ data_out_format: concat_ch_img
+ num_frames: 2
+ frame_stride: 5
+ trainer:
+ max_epochs: 21
+ validation_frequency: 2
+ sanity_validate: false
+ save_checkpoint_dir: ${paths.output_dir}/checkpoints
+ checkpoint_interval_type: log
+ max_task_checkpoints: 10
+ save_probe_weights_only: true
+ limit_train_batches: 500
+ limit_val_batches: 150
+ use_distributed_sampler: false
+ devices:
+ - 0
+ test:
+ data:
+ dataset_name:
+ - cookie2/dataset_0
+ batch_size: 1
+ tester:
+ _partial_: true
+ _target_: tactile_ssl.test.TestForceField
+ demo:
+ _partial_: true
+ _target_: tactile_ssl.test.DemoForceField
+ path_outputs: null
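The config pairs `num_frames: 2`, `frame_stride: 5`, and `out_format: concat_ch_img` with an encoder that takes `in_chans: 6`. The sketch below shows how those numbers fit together under two assumptions that the config itself does not state: each frame is a 3-channel RGB image, and the sample at time step `t` stacks frame `t` with the frame `frame_stride` steps earlier. Both function names are hypothetical.

```python
def frame_indices(t, num_frames=2, frame_stride=5):
    """Indices of the frames stacked for time step t, newest first (assumed order)."""
    return [t - k * frame_stride for k in range(num_frames)]


def stacked_channels(num_frames=2, channels_per_frame=3):
    """Channel count after concatenating the frames along the channel axis."""
    return num_frames * channels_per_frame


print(frame_indices(10))   # [10, 5]
print(stacked_channels())  # 6, matching in_chans in the encoder config
```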
gelsight_t1_forcefield_dino_vitbase_bg/config_tree.log ADDED
@@ -0,0 +1,178 @@
+ CONFIG
+ ├── data
+ │ └── train_val_split: null
+ │ train_data_budget: 1.0
+ │ val_data_budget: 1.0
+ │ max_train_data: null
+ │ sensor: gelsight_mini
+ │ dataset:
+ │ _target_: tactile_ssl.data.vision_tactile_forcefield.VisionTactileBackboneDataset
+ │ config:
+ │ sensor: gelsight_mini
+ │ remove_bg: true
+ │ out_format: concat_ch_img
+ │ num_frames: 2
+ │ frame_stride: 5
+ │ path_dataset: /media/chiguera/GUM//datasets/gelsight/Object-Slide/
+ │ path_bgs: /media/chiguera/GUM//datasets/gelsight/Object-Slide/bgs/
+ │ list_datasets:
+ │ - strawberry/dataset_0
+ │ - strawberry/dataset_1
+ │ - strawberry/dataset_2
+ │ - plum/dataset_0
+ │ - plum/dataset_1
+ │ - plum/dataset_2
+ │ - icecream_cup/dataset_0
+ │ - icecream_cup/dataset_1
+ │ - icecream_cup/dataset_2
+ │ - donut/dataset_0
+ │ - donut/dataset_1
+ │ - donut/dataset_2
+ │ - cookie2/dataset_0
+ │ - cookie2/dataset_1
+ │ - cookie2/dataset_2
+ │ - bread/dataset_0
+ │ - bread/dataset_1
+ │ - bread/dataset_2
+ │ - banana/dataset_0
+ │ - banana/dataset_1
+ │ - banana/dataset_2
+ │ - bagel/dataset_0
+ │ - bagel/dataset_1
+ │ - bagel/dataset_2
+ │ list_datasets_test:
+ │ - bagel/dataset_3
+ │ - banana/dataset_3
+ │ - bread/dataset_3
+ │ - cookie2/dataset_3
+ │ - donut/dataset_3
+ │ - icecream_cup/dataset_3
+ │ - plum/dataset_3
+ │ - strawberry/dataset_3
+ │ look_in_folder: false
+ │ transforms:
+ │ with_augmentation: false
+ │ resize:
+ │ - 224
+ │ - 224
+ │ p_flip: 0.0
+ │ p_crop: 0.0
+ │ p_rot: 0.0
+ │ train_dataloader:
+ │ batch_size: 20
+ │ num_workers: 2
+ │ drop_last: true
+ │ pin_memory: true
+ │ persistent_workers: true
+ │ shuffle: true
+ │ val_dataloader:
+ │ batch_size: 20
+ │ num_workers: 2
+ │ drop_last: true
+ │ pin_memory: true
+ │ persistent_workers: true
+ │
+ ├── trainer
+ │ └── max_epochs: 21
+ │ validation_frequency: 2
+ │ sanity_validate: false
+ │ save_checkpoint_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/gelsight_t1_forcefield_dino_vitb
+ │ checkpoint_interval_type: log
+ │ max_task_checkpoints: 10
+ │ save_probe_weights_only: true
+ │ limit_train_batches: 500
+ │ limit_val_batches: 150
+ │ use_distributed_sampler: false
+ │ devices:
+ │ - 0
+ │
+ ├── paths
+ │ └── data_root: /media/chiguera/GUM/
+ │ encoder_checkpoint_root: /media/chiguera/2TB/sparsh/encoders_460k/
+ │ log_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/gelsight_t1_forcefield_dino_vitbase_bg/logs/
+ │ output_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/gelsight_t1_forcefield_dino_vitbase_bg/
+ │ work_dir: /media/chiguera/2TB/sparsh/tactile-ssl
+ │
+ ├── wandb
+ │ └── project: t1_forcefield_gelsight
+ │ entity: chiguera
+ │ save_dir: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/gelsight_t1_forcefield_dino_vitbase_bg/
+ │ id: 2024.09.30_11-38_gelsight_t1_forcefield_dino_vitbase_bg
+ │ tags:
+ │ - dino
+ │ group: null
+ │ notes: null
+ │
+ ├── task
+ │ └── _target_: tactile_ssl.downstream_task.ForceFieldModuleSL
+ │ model_encoder:
+ │ _target_: tactile_ssl.model.vit_base
+ │ img_size:
+ │ - 224
+ │ - 224
+ │ in_chans: 6
+ │ pos_embed_fn: sinusoidal
+ │ num_register_tokens: 1
+ │ model_task:
+ │ _target_: tactile_ssl.downstream_task.ForceFieldDecoderSL
+ │ embed_dim: base
+ │ checkpoint_encoder: /media/chiguera/2TB/sparsh/encoders_460k//dino_vitbase.ckpt
+ │ checkpoint_task: /media/chiguera/GUM/tactile_ssl/outputs_sparsh/digit_t1_forcefield_dino_vitbase_bg/
+ │ train_encoder: false
+ │ optim_cfg:
+ │ _partial_: true
+ │ _target_: torch.optim.Adam
+ │ lr: 0.0001
+ │ scheduler_cfg: null
+ │ ssl_config:
+ │ img_sz:
+ │ - 224
+ │ - 224
+ │ pose_estimator:
+ │ num_encoder_layers: 18
+ │ loss:
+ │ with_mask_supervision: false
+ │ with_sl_supervision: false
+ │ with_ssim: true
+ │ disparity_smoothness: 0.001
+ │ min_depth: 0.1
+ │ max_depth: 100.0
+ │ encoder_type: dino
+ │
+ ├── ssl_name
+ │ └── dino
+ ├── sensor
+ │ └── gelsight
+ ├── ckpt_path
+ │ └── None
+ ├── task_name
+ │ └── t1_forcefield
+ ├── ssl_model_size
+ │ └── base
+ ├── train_data_budget
+ │ └── 1.0
+ ├── val_data_budget
+ │ └── 1.0
+ ├── experiment_name
+ │ └── gelsight_t1_forcefield_dino_vitbase_bg
+ ├── seed
+ │ └── 42
+ ├── data_out_format
+ │ └── concat_ch_img
+ ├── num_frames
+ │ └── 2
+ ├── frame_stride
+ │ └── 5
+ └── test
+ └── data:
+ dataset_name:
+ - cookie2/dataset_0
+ batch_size: 1
+ tester:
+ _partial_: true
+ _target_: tactile_ssl.test.TestForceField
+ demo:
+ _partial_: true
+ _target_: tactile_ssl.test.DemoForceField
+ path_outputs: null
+
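The resolved values in this tree follow from the `${...}` interpolations in `config.yaml`. The sketch below reproduces three of them using Python's standard-library `string.Template` as a stand-in for Hydra's resolver. This is an illustration only: it handles flat keys such as `${sensor}`, but not dotted references like `${paths.output_dir}` or resolver calls like `${hydra:runtime.cwd}`. The `resolve` helper is hypothetical.

```python
from string import Template

# Top-level values copied from config.yaml.
values = {
    "sensor": "gelsight",
    "task_name": "t1_forcefield",
    "ssl_name": "dino",
    "ssl_model_size": "base",
}


def resolve(template, mapping):
    """Substitute ${key} placeholders; raises KeyError if a key is missing."""
    return Template(template).substitute(mapping)


experiment_name = resolve("${sensor}_${task_name}_${ssl_name}_vit${ssl_model_size}_bg", values)
values["experiment_name"] = experiment_name

project = resolve("${task_name}_${sensor}", values)
output_dir = resolve("/media/chiguera/GUM/tactile_ssl/outputs_sparsh/${experiment_name}/", values)

print(experiment_name)  # gelsight_t1_forcefield_dino_vitbase_bg
print(project)          # t1_forcefield_gelsight
print(output_dir)
```

The printed experiment name and project match the `experiment_name` and `wandb` entries in the tree, and `output_dir` matches the resolved `paths` entry.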