Image-to-3D
English
make-a-shape
sv-to-3d
Hooman committed · Commit 57ca87a · verified · 1 Parent(s): c59a8fa

Update README.md

  1. README.md +85 -114
README.md CHANGED
@@ -1,200 +1,171 @@
1
  ---
2
- # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
3
- # Doc / guide: https://huggingface.co/docs/hub/model-cards
4
- {{ card_data }}
 
 
 
 
 
 
5
  ---
 
 
6
 
7
- # Model Card for {{ model_id | default("Model ID", true) }}
8
-
9
- <!-- Provide a quick summary of what the model is/does. -->
10
-
11
- {{ model_summary | default("", true) }}
12
 
13
  ## Model Details
14
 
15
  ### Model Description
16
 
17
- <!-- Provide a longer summary of what this model is. -->
18
 
19
- {{ model_description | default("", true) }}
 
 
20
 
21
- - **Developed by:** {{ developers | default("[More Information Needed]", true)}}
22
- - **Funded by [optional]:** {{ funded_by | default("[More Information Needed]", true)}}
23
- - **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}}
24
- - **Model type:** {{ model_type | default("[More Information Needed]", true)}}
25
- - **Language(s) (NLP):** {{ language | default("[More Information Needed]", true)}}
26
- - **License:** {{ license | default("[More Information Needed]", true)}}
27
- - **Finetuned from model [optional]:** {{ base_model | default("[More Information Needed]", true)}}
28
 
29
- ### Model Sources [optional]
30
 
31
  <!-- Provide the basic links for the model. -->
32
 
33
- - **Repository:** {{ repo | default("[More Information Needed]", true)}}
34
- - **Paper [optional]:** {{ paper | default("[More Information Needed]", true)}}
35
- - **Demo [optional]:** {{ demo | default("[More Information Needed]", true)}}
36
 
37
  ## Uses
38
 
39
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
40
-
41
  ### Direct Use
42
 
43
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
44
-
45
- {{ direct_use | default("[More Information Needed]", true)}}
46
 
47
- ### Downstream Use [optional]
48
 
49
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
50
-
51
- {{ downstream_use | default("[More Information Needed]", true)}}
 
 
52
 
53
  ### Out-of-Scope Use
54
 
55
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
56
-
57
- {{ out_of_scope_use | default("[More Information Needed]", true)}}
 
58
 
59
  ## Bias, Risks, and Limitations
60
 
61
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
62
 
63
- {{ bias_risks_limitations | default("[More Information Needed]", true)}}
 
 
 
64
 
65
  ### Recommendations
66
 
67
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
68
-
69
- {{ bias_recommendations | default("Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.", true)}}
 
 
 
70
 
71
  ## How to Get Started with the Model
72
 
73
- Use the code below to get started with the model.
74
-
75
- {{ get_started_code | default("[More Information Needed]", true)}}
76
 
77
  ## Training Details
78
 
79
  ### Training Data
80
 
81
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
82
-
83
- {{ training_data | default("[More Information Needed]", true)}}
84
 
85
  ### Training Procedure
86
 
87
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
88
-
89
- #### Preprocessing [optional]
90
-
91
- {{ preprocessing | default("[More Information Needed]", true)}}
92
 
 
93
 
94
  #### Training Hyperparameters
95
 
96
- - **Training regime:** {{ training_regime | default("[More Information Needed]", true)}} <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
97
 
98
- #### Speeds, Sizes, Times [optional]
99
 
100
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
101
-
102
- {{ speeds_sizes_times | default("[More Information Needed]", true)}}
103
 
104
  ## Evaluation
105
 
106
- <!-- This section describes the evaluation protocols and provides the results. -->
107
-
108
  ### Testing Data, Factors & Metrics
109
 
110
  #### Testing Data
111
 
112
- <!-- This should link to a Dataset Card if possible. -->
113
-
114
- {{ testing_data | default("[More Information Needed]", true)}}
115
 
116
  #### Factors
117
 
118
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
119
-
120
- {{ testing_factors | default("[More Information Needed]", true)}}
121
 
122
  #### Metrics
123
 
124
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
125
-
126
- {{ testing_metrics | default("[More Information Needed]", true)}}
 
127
 
128
  ### Results
129
 
130
- {{ results | default("[More Information Needed]", true)}}
131
-
132
- #### Summary
 
133
 
134
- {{ results_summary | default("", true) }}
 
 
 
135
 
136
- ## Model Examination [optional]
137
 
138
- <!-- Relevant interpretability work for the model goes here -->
139
-
140
- {{ model_examination | default("[More Information Needed]", true)}}
141
-
142
- ## Environmental Impact
143
-
144
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
145
-
146
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
147
-
148
- - **Hardware Type:** {{ hardware_type | default("[More Information Needed]", true)}}
149
- - **Hours used:** {{ hours_used | default("[More Information Needed]", true)}}
150
- - **Cloud Provider:** {{ cloud_provider | default("[More Information Needed]", true)}}
151
- - **Compute Region:** {{ cloud_region | default("[More Information Needed]", true)}}
152
- - **Carbon Emitted:** {{ co2_emitted | default("[More Information Needed]", true)}}
153
-
154
- ## Technical Specifications [optional]
155
 
156
  ### Model Architecture and Objective
157
 
158
- {{ model_specs | default("[More Information Needed]", true)}}
159
 
160
  ### Compute Infrastructure
161
 
162
- {{ compute_infrastructure | default("[More Information Needed]", true)}}
163
-
164
  #### Hardware
165
 
166
- {{ hardware_requirements | default("[More Information Needed]", true)}}
167
-
168
- #### Software
169
-
170
- {{ software | default("[More Information Needed]", true)}}
171
 
172
- ## Citation [optional]
173
-
174
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
175
 
176
  **BibTeX:**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
177
 
178
- {{ citation_bibtex | default("[More Information Needed]", true)}}
179
 
180
  **APA:**
181
 
182
- {{ citation_apa | default("[More Information Needed]", true)}}
183
-
184
- ## Glossary [optional]
185
-
186
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
187
-
188
- {{ glossary | default("[More Information Needed]", true)}}
189
-
190
- ## More Information [optional]
191
-
192
- {{ more_information | default("[More Information Needed]", true)}}
193
-
194
- ## Model Card Authors [optional]
195
-
196
- {{ model_card_authors | default("[More Information Needed]", true)}}
197
 
198
  ## Model Card Contact
199
 
200
- {{ model_card_contact | default("[More Information Needed]", true)}}
 
1
  ---
2
3
+ language:
4
+ - en
5
+ license: other
6
+ license_name: autodesk-non-commercial-3d-generative-v1.0
7
+ license_link: LICENSE.md
8
+ tags:
9
+ - make-a-shape
10
+ - sv-to-3d
11
  ---
12
+ ---
13
+ # Model Card for Make-A-Shape Single-View to 3D Model
14
 
15
+ This model, introduced in the Make-A-Shape paper, generates high-quality 3D shapes from single-view images, with intricate geometric details, realistic structures, and complex topologies.
 
 
 
 
16
 
17
  ## Model Details
18
 
19
  ### Model Description
20
 
21
+ Make-A-Shape is a 3D generative framework trained on over 10 million publicly-available 3D shapes. The single-view to 3D model is one of the framework's conditional generation models. It generates a wide range of high-quality 3D shapes from single-view image inputs, typically within two seconds. The model combines a wavelet-tree representation with a subband adaptive training strategy to capture geometric detail and structural plausibility.
22
 
23
+ - **Developed by:** Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
24
+ - **Model type:** 3D Generative Model
25
+ - **License:** Autodesk Non-Commercial (3D Generative) v1.0
26
 
27
+ For more information, see the project pages ([Autodesk Research](https://www.research.autodesk.com/publications/generative-ai-make-a-shape/), [GitHub Pages](https://edward1997104.github.io/make-a-shape/)) and [the ICML paper](https://proceedings.mlr.press/v235/hui24a.html).
 
 
 
 
 
 
28
 
29
+ ### Model Sources
30
 
31
  <!-- Provide the basic links for the model. -->
32
 
33
+ - **Repository:** [https://github.com/AutodeskAILab/Make-a-Shape](https://github.com/AutodeskAILab/Make-a-Shape)
34
+ - **Paper:** [Make-A-Shape: a Ten-Million-scale 3D Shape Model](https://proceedings.mlr.press/v235/hui24a.html)
35
+ - **Demo:** [in progress...]
36
 
37
  ## Uses
38
 
 
 
39
  ### Direct Use
40
 
41
+ To test this model for research and academic purposes, follow the instructions [here](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#single-view-to-3d).
 
 
42
 
43
+ ### Downstream Use
44
 
45
+ This model could potentially be used in various applications such as:
46
+ - 3D content creation for gaming and virtual environments
47
+ - Augmented reality applications
48
+ - Computer-aided design and prototyping
49
+ - Architectural visualization
50
 
51
  ### Out-of-Scope Use
52
 
53
+ The model should not be used for:
54
+ - Commercial use
55
+ - Generating 3D shapes of sensitive or copyrighted content without proper authorization
56
+ - Creating 3D models intended for harmful or malicious purposes
57
 
58
  ## Bias, Risks, and Limitations
59
 
60
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
+ - The model may inherit biases present in the training dataset, which could lead to uneven representation of certain object types or styles.
63
+ - The quality of the generated 3D shape depends on the quality and clarity of the input image.
64
+ - The model may occasionally generate implausible shapes, especially when the input image is ambiguous or of low quality.
65
+ - The model's performance may degrade for object categories or styles that are underrepresented in the training data.
66
 
67
  ### Recommendations
68
 
69
+ Users should be aware of the potential biases and limitations of the model. It's recommended to:
70
+ - Use high-quality, clear input images for best results
71
+ - Verify and potentially post-process the generated 3D shapes for critical applications
72
+ - Be cautious when using the model for object categories that may be underrepresented in the training data
73
+ - Consider ethical implications and potential biases
74
+ - DO NOT USE for commercial or public-facing applications
75
 
76
  ## How to Get Started with the Model
77
 
78
+ [More Information Needed]
 
 
79
 
80
  ## Training Details
81
 
82
  ### Training Data
83
 
84
+ The model was trained on a dataset of over 10 million 3D shapes aggregated from 18 different publicly-available sub-datasets, including ModelNet, ShapeNet, SMPL, Thingi10K, SMAL, COMA, House3D, ABC, Fusion 360, 3D-FUTURE, BuildingNet, DeformingThings4D, FG3D, Toys4K, ABO, Infinigen, Objaverse, and two subsets of ObjaverseXL (Thingiverse and GitHub).
 
 
85
 
86
  ### Training Procedure
87
 
88
+ #### Preprocessing
 
 
 
 
89
 
90
+ Each 3D shape in the dataset was converted into a truncated signed distance function (TSDF) with a resolution of 256³. The TSDF was then decomposed using a discrete wavelet transform to create the wavelet-tree representation used by the model.
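
As a rough illustration of the preprocessing idea, the sketch below truncates a row of signed distances and applies one level of a discrete wavelet transform. It is not the project's code: it works on a 1D row rather than a 256³ grid, it swaps in the simple Haar wavelet (the filter used by Make-A-Shape may differ), and the function names and the truncation value `0.5` are assumptions made for this example. A 3D transform would apply the same step along each of the three axes.

```python
import math


def truncate_sdf(distances, trunc):
    """Clamp signed distances to [-trunc, trunc], giving a truncated SDF."""
    return [max(-trunc, min(trunc, d)) for d in distances]


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar transform: (coarse, detail) halves."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    coarse = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return coarse, detail


def haar_idwt_1d(coarse, detail):
    """Invert haar_dwt_1d, recovering the original samples."""
    s = math.sqrt(2.0)
    out = []
    for c, d in zip(coarse, detail):
        out.extend([(c + d) / s, (c - d) / s])
    return out


# Hypothetical signed distances sampled along one grid row.
sdf_row = [0.9, 0.4, 0.1, -0.05, -0.3, -0.8, 0.2, 0.7]
tsdf_row = truncate_sdf(sdf_row, trunc=0.5)
coarse, detail = haar_dwt_1d(tsdf_row)
restored = haar_idwt_1d(coarse, detail)
```

The coarse half is a half-resolution summary of the row, and the detail half holds what is needed to restore it exactly. Applying the transform again to the coarse half yields the multi-level hierarchy that a wavelet-tree representation organizes.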
91
 
92
  #### Training Hyperparameters
93
 
94
+ - **Training regime:** Please look at the paper.
95
 
96
+ #### Speeds, Sizes, Times
97
 
98
+ - The model was trained on 48 × A10G GPUs for about 20 days, amounting to around 23,000 GPU hours.
99
+ - The model can generate shapes within two seconds for most conditions.
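
The GPU-hour figure above can be checked with simple arithmetic. The snippet assumes the GPUs ran continuously for the full 20 days, which the card only states approximately.

```python
gpus = 48                 # A10G GPUs, as stated in the card
days = 20                 # "about 20 days" per the card
gpu_hours = gpus * days * 24
print(gpu_hours)          # 23040, i.e. around 23,000 GPU hours
```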
 
100
 
101
  ## Evaluation
102
 
 
 
103
  ### Testing Data, Factors & Metrics
104
 
105
  #### Testing Data
106
 
107
+ The model was evaluated on a test set consisting of 2% of the shapes from each sub-dataset in the training data, as well as on the entire Google Scanned Objects (GSO) dataset, which was not part of the training data.
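
A per-sub-dataset holdout like the one described could be sketched as follows. This is only an illustration: the card does not say how the 2% was selected, so the random shuffle, the seed, the function name `split_per_subdataset`, and the toy shape identifiers are all assumptions.

```python
import random


def split_per_subdataset(shapes_by_dataset, test_fraction=0.02, seed=0):
    """Hold out a fixed fraction of shapes from every sub-dataset."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for name, shapes in shapes_by_dataset.items():
        shuffled = list(shapes)
        rng.shuffle(shuffled)
        # Keep at least one test shape even for very small sub-datasets.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test[name] = shuffled[:n_test]
        train[name] = shuffled[n_test:]
    return train, test


# Toy stand-ins for two sub-datasets; real shape IDs would come from the data.
toy = {
    "ABC": [f"abc_{i}" for i in range(100)],
    "ShapeNet": [f"sn_{i}" for i in range(200)],
}
train, test = split_per_subdataset(toy)
```

Splitting inside each sub-dataset, rather than over the pooled collection, keeps small sub-datasets represented in the test set.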
 
 
108
 
109
  #### Factors
110
 
111
+ The evaluation considered various factors such as the quality of generated shapes, the ability to capture fine details and complex structures, and the model's performance across different object categories.
 
 
112
 
113
  #### Metrics
114
 
115
+ The model was evaluated using the following metrics:
116
+ - Intersection over Union (IoU)
117
+ - Light Field Distance (LFD)
118
+ - Chamfer Distance (CD)
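
Two of these metrics are easy to sketch in a few lines. The version below computes IoU over sets of occupied voxel indices and a symmetric Chamfer distance using mean squared nearest-neighbor distances. These are common conventions, not necessarily the exact definitions, sampling, or normalization used in the paper, and the function names are chosen for this example. LFD is omitted because it relies on rendered views of the shapes.

```python
def voxel_iou(occupied_a, occupied_b):
    """IoU between two shapes given as collections of occupied voxel indices."""
    a, b = set(occupied_a), set(occupied_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def chamfer_distance(points_p, points_q):
    """Symmetric Chamfer distance: mean squared nearest-neighbor distance,
    summed over both directions (brute force, fine for small point sets)."""
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    def one_way(src, dst):
        return sum(min(sq_dist(s, d) for d in dst) for s in src) / len(src)

    return one_way(points_p, points_q) + one_way(points_q, points_p)


# Toy inputs: three occupied voxels per shape, two points per point set.
iou = voxel_iou({(0, 0, 0), (0, 0, 1), (0, 1, 0)},
                {(0, 0, 1), (0, 1, 0), (1, 1, 1)})
cd = chamfer_distance([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 1)])
```

Higher IoU is better, while lower CD and LFD are better, which is how the result lists in this card should be read.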
119
 
120
  ### Results
121
 
122
+ The single-view to 3D model achieved the following results on the "Our Val" dataset:
123
+ - LFD: 4071.33
124
+ - IoU: 0.4285
125
+ - CD: 0.01851
126
 
127
+ On the GSO dataset:
128
+ - LFD: 3406.61
129
+ - IoU: 0.5004
130
+ - CD: 0.01748
131
 
 
132
 
133
+ ## Technical Specifications
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
134
 
135
  ### Model Architecture and Objective
136
 
137
+ The model uses a U-ViT architecture with learnable skip-connections between the convolution and deconvolution blocks. It employs a wavelet-tree representation and a subband adaptive training strategy to effectively capture both coarse and fine details of 3D shapes.
138
 
139
  ### Compute Infrastructure
140
 
 
 
141
  #### Hardware
142
 
143
+ The model was trained on 48 × A10G GPUs.
 
 
 
 
144
 
145
+ ## Citation
 
 
146
 
147
  **BibTeX:**
148
+ @InProceedings{pmlr-v235-hui24a,
149
+ title = {Make-A-Shape: a Ten-Million-scale 3{D} Shape Model},
150
+ author = {Hui, Ka-Hei and Sanghi, Aditya and Rampini, Arianna and Rahimi Malekshan, Kamal and Liu, Zhengzhe and Shayani, Hooman and Fu, Chi-Wing},
151
+ booktitle = {Proceedings of the 41st International Conference on Machine Learning},
152
+ pages = {20660--20681},
153
+ year = {2024},
154
+ editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
155
+ volume = {235},
156
+ series = {Proceedings of Machine Learning Research},
157
+ month = {21--27 Jul},
158
+ publisher = {PMLR},
159
+ pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hui24a/hui24a.pdf},
160
+ url = {https://proceedings.mlr.press/v235/hui24a.html},
161
+ abstract = {The progression in large-scale 3D generative models has been impeded by significant resource requirements for training and challenges like inefficient representations. This paper introduces Make-A-Shape, a novel 3D generative model trained on a vast scale, using 10 million publicly-available shapes. We first innovate the wavelet-tree representation to encode high-resolution SDF shapes with minimal loss, leveraging our newly-proposed subband coefficient filtering scheme. We then design a subband coefficient packing scheme to facilitate diffusion-based generation and a subband adaptive training strategy for effective training on the large-scale dataset. Our generative framework is versatile, capable of conditioning on various input modalities such as images, point clouds, and voxels, enabling a variety of downstream applications, e.g., unconditional generation, completion, and conditional generation. Our approach clearly surpasses the existing baselines in delivering high-quality results and can efficiently generate shapes within two seconds for most conditions.}
162
+ }
163
 
 
164
 
165
  **APA:**
166
 
167
+ Hui, K. H., Sanghi, A., Rampini, A., Malekshan, K. R., Liu, Z., Shayani, H., & Fu, C. W. (2024). Make-A-Shape: a Ten-Million-scale 3D Shape Model. arXiv preprint arXiv:2401.08504.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
168
 
169
  ## Model Card Contact
170
 
171