Update README.md
README.md
CHANGED
@@ -1,200 +1,171 @@
|
|
1 |
---
|
2 |
-
|
3 |
-
|
4 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
5 |
---
|
|
|
|
|
6 |
|
7 |
-
|
8 |
-
|
9 |
-
<!-- Provide a quick summary of what the model is/does. -->
|
10 |
-
|
11 |
-
{{ model_summary | default("", true) }}
|
12 |
|
13 |
## Model Details
|
14 |
|
15 |
### Model Description
|
16 |
|
17 |
-
|
18 |
|
19 |
-
|
|
|
|
|
20 |
|
21 |
-
|
22 |
-
- **Funded by [optional]:** {{ funded_by | default("[More Information Needed]", true)}}
|
23 |
-
- **Shared by [optional]:** {{ shared_by | default("[More Information Needed]", true)}}
|
24 |
-
- **Model type:** {{ model_type | default("[More Information Needed]", true)}}
|
25 |
-
- **Language(s) (NLP):** {{ language | default("[More Information Needed]", true)}}
|
26 |
-
- **License:** {{ license | default("[More Information Needed]", true)}}
|
27 |
-
- **Finetuned from model [optional]:** {{ base_model | default("[More Information Needed]", true)}}
|
28 |
|
29 |
-
### Model Sources
|
30 |
|
31 |
<!-- Provide the basic links for the model. -->
|
32 |
|
33 |
-
- **Repository:**
|
34 |
-
- **Paper [
|
35 |
-
- **Demo
|
36 |
|
37 |
## Uses
|
38 |
|
39 |
-
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
40 |
-
|
41 |
### Direct Use
|
42 |
|
43 |
-
|
44 |
-
|
45 |
-
{{ direct_use | default("[More Information Needed]", true)}}
|
46 |
|
47 |
-
### Downstream Use
|
48 |
|
49 |
-
|
50 |
-
|
51 |
-
|
|
|
|
|
52 |
|
53 |
### Out-of-Scope Use
|
54 |
|
55 |
-
|
56 |
-
|
57 |
-
|
|
|
58 |
|
59 |
## Bias, Risks, and Limitations
|
60 |
|
61 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
62 |
|
63 |
-
|
|
|
|
|
|
|
64 |
|
65 |
### Recommendations
|
66 |
|
67 |
-
|
68 |
-
|
69 |
-
|
|
|
|
|
|
|
70 |
|
71 |
## How to Get Started with the Model
|
72 |
|
73 |
-
|
74 |
-
|
75 |
-
{{ get_started_code | default("[More Information Needed]", true)}}
|
76 |
|
77 |
## Training Details
|
78 |
|
79 |
### Training Data
|
80 |
|
81 |
-
|
82 |
-
|
83 |
-
{{ training_data | default("[More Information Needed]", true)}}
|
84 |
|
85 |
### Training Procedure
|
86 |
|
87 |
-
|
88 |
-
|
89 |
-
#### Preprocessing [optional]
|
90 |
-
|
91 |
-
{{ preprocessing | default("[More Information Needed]", true)}}
|
92 |
|
|
|
93 |
|
94 |
#### Training Hyperparameters
|
95 |
|
96 |
-
- **Training regime:**
|
97 |
|
98 |
-
#### Speeds, Sizes, Times
|
99 |
|
100 |
-
|
101 |
-
|
102 |
-
{{ speeds_sizes_times | default("[More Information Needed]", true)}}
|
103 |
|
104 |
## Evaluation
|
105 |
|
106 |
-
<!-- This section describes the evaluation protocols and provides the results. -->
|
107 |
-
|
108 |
### Testing Data, Factors & Metrics
|
109 |
|
110 |
#### Testing Data
|
111 |
|
112 |
-
|
113 |
-
|
114 |
-
{{ testing_data | default("[More Information Needed]", true)}}
|
115 |
|
116 |
#### Factors
|
117 |
|
118 |
-
|
119 |
-
|
120 |
-
{{ testing_factors | default("[More Information Needed]", true)}}
|
121 |
|
122 |
#### Metrics
|
123 |
|
124 |
-
|
125 |
-
|
126 |
-
|
|
|
127 |
|
128 |
### Results
|
129 |
|
130 |
-
|
131 |
-
|
132 |
-
|
|
|
133 |
|
134 |
-
|
|
|
|
|
|
|
135 |
|
136 |
-
## Model Examination [optional]
|
137 |
|
138 |
-
|
139 |
-
|
140 |
-
{{ model_examination | default("[More Information Needed]", true)}}
|
141 |
-
|
142 |
-
## Environmental Impact
|
143 |
-
|
144 |
-
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
145 |
-
|
146 |
-
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
147 |
-
|
148 |
-
- **Hardware Type:** {{ hardware_type | default("[More Information Needed]", true)}}
|
149 |
-
- **Hours used:** {{ hours_used | default("[More Information Needed]", true)}}
|
150 |
-
- **Cloud Provider:** {{ cloud_provider | default("[More Information Needed]", true)}}
|
151 |
-
- **Compute Region:** {{ cloud_region | default("[More Information Needed]", true)}}
|
152 |
-
- **Carbon Emitted:** {{ co2_emitted | default("[More Information Needed]", true)}}
|
153 |
-
|
154 |
-
## Technical Specifications [optional]
|
155 |
|
156 |
### Model Architecture and Objective
|
157 |
|
158 |
-
|
159 |
|
160 |
### Compute Infrastructure
|
161 |
|
162 |
-
{{ compute_infrastructure | default("[More Information Needed]", true)}}
|
163 |
-
|
164 |
#### Hardware
|
165 |
|
166 |
-
|
167 |
-
|
168 |
-
#### Software
|
169 |
-
|
170 |
-
{{ software | default("[More Information Needed]", true)}}
|
171 |
|
172 |
-
## Citation
|
173 |
-
|
174 |
-
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
175 |
|
176 |
**BibTeX:**
|
177 |
|
178 |
-
{{ citation_bibtex | default("[More Information Needed]", true)}}
|
179 |
|
180 |
**APA:**
|
181 |
|
182 |
-
|
183 |
-
|
184 |
-
## Glossary [optional]
|
185 |
-
|
186 |
-
<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
|
187 |
-
|
188 |
-
{{ glossary | default("[More Information Needed]", true)}}
|
189 |
-
|
190 |
-
## More Information [optional]
|
191 |
-
|
192 |
-
{{ more_information | default("[More Information Needed]", true)}}
|
193 |
-
|
194 |
-
## Model Card Authors [optional]
|
195 |
-
|
196 |
-
{{ model_card_authors | default("[More Information Needed]", true)}}
|
197 |
|
198 |
## Model Card Contact
|
199 |
|
200 |
-
|
|
|
1 |
---
|
2 |
+
'[object Object]': null
|
3 |
+
language:
|
4 |
+
- en
|
5 |
+
license: other
|
6 |
+
license_name: autodesk-non-commercial-3d-generative-v1.0
|
7 |
+
license_link: LICENSE.md
|
8 |
+
tags:
|
9 |
+
- make-a-shape
|
10 |
+
- sv-to-3d
|
11 |
---
|
12 |
+
---
|
13 |
+
# Model Card for Make-A-Shape Single-View to 3D Model
|
14 |
|
15 |
+
This model, introduced in the Make-A-Shape paper, generates high-quality 3D shapes from single-view images, with intricate geometric details, realistic structures, and complex topologies.
|
|
|
|
|
|
|
|
|
16 |
|
17 |
## Model Details
|
18 |
|
19 |
### Model Description
|
20 |
|
21 |
+
Make-A-Shape is a 3D generative framework trained on a dataset of over 10 million publicly available 3D shapes. The single-view to 3D model is one of the conditional generation models in this framework. It generates a 3D shape from a single-view image in about two seconds. The model uses a wavelet-tree representation and a subband adaptive training strategy to capture fine geometric detail and produce structurally plausible shapes.
|
22 |
|
23 |
+
- **Developed by:** Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
|
24 |
+
- **Model type:** 3D Generative Model
|
25 |
+
- **License:** Autodesk Non-Commercial (3D Generative) v1.0
|
26 |
|
27 |
+
For more information, see the [Autodesk Research publication page](https://www.research.autodesk.com/publications/generative-ai-make-a-shape/), the [project page](https://edward1997104.github.io/make-a-shape/), and [the ICML paper](https://proceedings.mlr.press/v235/hui24a.html).
|
|
|
|
|
|
|
|
|
|
|
|
|
28 |
|
29 |
+
### Model Sources
|
30 |
|
31 |
<!-- Provide the basic links for the model. -->
|
32 |
|
33 |
+
- **Repository:** [https://github.com/AutodeskAILab/Make-a-Shape](https://github.com/AutodeskAILab/Make-a-Shape)
|
34 |
+
- **Paper:** [Make-A-Shape: a Ten-Million-scale 3D Shape Model](https://proceedings.mlr.press/v235/hui24a.html)
|
35 |
+
- **Demo:** [in progress...]
|
36 |
|
37 |
## Uses
|
38 |
|
|
|
|
|
39 |
### Direct Use
|
40 |
|
41 |
+
Follow the instructions [here](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#single-view-to-3d) to test this model for research and academic purposes.
|
|
|
|
|
42 |
|
43 |
+
### Downstream Use
|
44 |
|
45 |
+
This model could potentially be used in various applications such as:
|
46 |
+
- 3D content creation for gaming and virtual environments
|
47 |
+
- Augmented reality applications
|
48 |
+
- Computer-aided design and prototyping
|
49 |
+
- Architectural visualization
|
50 |
|
51 |
### Out-of-Scope Use
|
52 |
|
53 |
+
The model should not be used for:
|
54 |
+
- Commercial use
|
55 |
+
- Generating 3D shapes of sensitive or copyrighted content without proper authorization
|
56 |
+
- Creating 3D models intended for harmful or malicious purposes
|
57 |
|
58 |
## Bias, Risks, and Limitations
|
59 |
|
60 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
61 |
|
62 |
+
- The model may inherit biases present in the training dataset, which could lead to uneven representation of certain object types or styles.
|
63 |
+
- The quality of the generated 3D shape depends on the quality and clarity of the input image.
|
64 |
+
- The model may occasionally generate implausible shapes, especially when the input image is ambiguous or of low quality.
|
65 |
+
- The model's performance may degrade for object categories or styles that are underrepresented in the training data.
|
66 |
|
67 |
### Recommendations
|
68 |
|
69 |
+
Users should be aware of the potential biases and limitations of the model. It's recommended to:
|
70 |
+
- Use high-quality, clear input images for best results
|
71 |
+
- Verify and potentially post-process the generated 3D shapes for critical applications
|
72 |
+
- Be cautious when using the model for object categories that may be underrepresented in the training data
|
73 |
+
- Consider ethical implications and potential biases
|
74 |
+
- DO NOT USE for commercial or public-facing applications
|
75 |
|
76 |
## How to Get Started with the Model
|
77 |
|
78 |
+
[More Information Needed]
|
|
|
|
|
79 |
|
80 |
## Training Details
|
81 |
|
82 |
### Training Data
|
83 |
|
84 |
+
The model was trained on a dataset of over 10 million 3D shapes aggregated from 18 publicly available sub-datasets, including ModelNet, ShapeNet, SMPL, Thingi10K, SMAL, COMA, House3D, ABC, Fusion 360, 3D-FUTURE, BuildingNet, DeformingThings4D, FG3D, Toys4K, ABO, Infinigen, Objaverse, and two subsets of ObjaverseXL (Thingiverse and GitHub).
|
|
|
|
|
85 |
|
86 |
### Training Procedure
|
87 |
|
88 |
+
#### Preprocessing
|
|
|
|
|
|
|
|
|
89 |
|
90 |
+
Each 3D shape in the dataset was converted into a truncated signed distance function (TSDF) with a resolution of 256³. The TSDF was then decomposed using a discrete wavelet transform to create the wavelet-tree representation used by the model.
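
The two preprocessing steps can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a toy sphere instead of a real mesh, an 8³ grid instead of 256³, and a single-level Haar wavelet chosen only for simplicity (the wavelet filter is not specified here). The names `sphere_tsdf`, `haar_axis`, and `haar_dwt3` are hypothetical.

```python
import math

def sphere_tsdf(res=8, radius=0.5, trunc=0.1):
    """Truncated signed distance to a sphere on a res^3 grid over [-1, 1]^3."""
    coords = [-1.0 + 2.0 * i / (res - 1) for i in range(res)]
    tsdf = {}
    for i, x in enumerate(coords):
        for j, y in enumerate(coords):
            for k, z in enumerate(coords):
                sdf = math.sqrt(x * x + y * y + z * z) - radius
                tsdf[(i, j, k)] = max(-trunc, min(trunc, sdf))  # truncate
    return tsdf

def haar_axis(vol, axis):
    """One orthonormal Haar step along one axis of a {(i, j, k): value} volume."""
    s = math.sqrt(2.0)
    low, high = {}, {}
    for idx, a in vol.items():
        if idx[axis] % 2:  # visit each pair once, from its even-indexed voxel
            continue
        nxt = list(idx)
        nxt[axis] += 1
        b = vol[tuple(nxt)]
        out = list(idx)
        out[axis] //= 2
        low[tuple(out)] = (a + b) / s   # local average (coarse)
        high[tuple(out)] = (a - b) / s  # local difference (detail)
    return low, high

def haar_dwt3(vol):
    """Single-level 3D Haar transform: one coarse and seven detail subbands."""
    bands = {"": vol}
    for axis in range(3):
        split = {}
        for name, band in bands.items():
            low, high = haar_axis(band, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    coarse = bands.pop("LLL")
    return coarse, bands

tsdf = sphere_tsdf(res=8)            # grid size must be even
coarse, details = haar_dwt3(tsdf)
print(len(tsdf), len(coarse), sorted(details))
```

Applying `haar_dwt3` again to the coarse subband would give the next, coarser level, which is the sense in which the coefficients form a tree in this sketch.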
|
91 |
|
92 |
#### Training Hyperparameters
|
93 |
|
94 |
+
- **Training regime:** See the [paper](https://proceedings.mlr.press/v235/hui24a.html) for details.
|
95 |
|
96 |
+
#### Speeds, Sizes, Times
|
97 |
|
98 |
+
- The model was trained on 48 × A10G GPUs for about 20 days, amounting to around 23,000 GPU hours.
|
99 |
+
- The model can generate shapes within two seconds for most conditions.
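
As a quick check of the GPU-hour figure above, the arithmetic can be written out as below. It assumes all 48 GPUs ran around the clock for the full 20 days; `gpu_hours` is a hypothetical helper.

```python
def gpu_hours(num_gpus, days, hours_per_day=24):
    """Total GPU hours if every GPU runs for the whole period."""
    return num_gpus * days * hours_per_day

total = gpu_hours(48, 20)
print(total)  # 23040, i.e. roughly 23,000 GPU hours
```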
|
|
|
100 |
|
101 |
## Evaluation
|
102 |
|
|
|
|
|
103 |
### Testing Data, Factors & Metrics
|
104 |
|
105 |
#### Testing Data
|
106 |
|
107 |
+
The model was evaluated on a test set consisting of 2% of the shapes from each sub-dataset in the training data, as well as on the entire Google Scanned Objects (GSO) dataset, which was not part of the training data.
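
A per-sub-dataset 2% hold-out could be sketched as follows. The actual split procedure is not described here, so the seeded random shuffle, the function name `split_per_dataset`, and the toy dataset sizes are all assumptions.

```python
import random

def split_per_dataset(datasets, test_fraction=0.02, seed=0):
    """Hold out a fixed fraction of shape IDs from every sub-dataset."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = {}, {}
    for name, shape_ids in datasets.items():
        ids = list(shape_ids)
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))
        test[name] = ids[:n_test]
        train[name] = ids[n_test:]
    return train, test

# Toy stand-ins for two sub-datasets (sizes are made up).
toy = {"ShapeNet": range(100), "ABC": range(250)}
train, test = split_per_dataset(toy)
print({name: len(ids) for name, ids in test.items()})
```

Splitting inside each sub-dataset, rather than over the pooled data, keeps every source represented in the test set in proportion to its size.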
|
|
|
|
|
108 |
|
109 |
#### Factors
|
110 |
|
111 |
+
The evaluation considered various factors such as the quality of generated shapes, the ability to capture fine details and complex structures, and the model's performance across different object categories.
|
|
|
|
|
112 |
|
113 |
#### Metrics
|
114 |
|
115 |
+
The model was evaluated using the following metrics:
|
116 |
+
- Intersection over Union (IoU)
|
117 |
+
- Light Field Distance (LFD)
|
118 |
+
- Chamfer Distance (CD)
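
Two of these metrics are simple enough to sketch directly. The code below is illustrative rather than the evaluation code used for the reported numbers: it assumes IoU is computed on sets of occupied voxels, and it uses one common Chamfer Distance convention (mean squared nearest-neighbour distance, summed over both directions); other conventions differ by scaling. LFD is omitted because it requires rendering. Both function names are hypothetical.

```python
def voxel_iou(a, b):
    """IoU of two occupancy grids given as collections of occupied (i, j, k) voxels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two non-empty point sets (brute force)."""
    def one_way(src, dst):
        # Mean over src of the squared distance to the nearest point in dst.
        total = 0.0
        for u in src:
            total += min(sum((x - y) ** 2 for x, y in zip(u, v)) for v in dst)
        return total / len(src)
    return one_way(p, q) + one_way(q, p)

pred = [(0, 0, 0), (1, 0, 0)]
target = [(1, 0, 0), (2, 0, 0)]
print(voxel_iou(pred, target))
print(chamfer_distance(pred, target))
```

Higher IoU is better, while lower CD and LFD are better, which matters when reading the results below.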
|
119 |
|
120 |
### Results
|
121 |
|
122 |
+
The single-view to 3D model achieved the following results on the "Our Val" dataset:
|
123 |
+
- LFD: 4071.33
|
124 |
+
- IoU: 0.4285
|
125 |
+
- CD: 0.01851
|
126 |
|
127 |
+
On the GSO dataset:
|
128 |
+
- LFD: 3406.61
|
129 |
+
- IoU: 0.5004
|
130 |
+
- CD: 0.01748
|
131 |
|
|
|
132 |
|
133 |
+
## Technical Specifications
|
134 |
|
135 |
### Model Architecture and Objective
|
136 |
|
137 |
+
The model uses a U-ViT architecture with learnable skip-connections between the convolution and deconvolution blocks. It employs a wavelet-tree representation and a subband adaptive training strategy to effectively capture both coarse and fine details of 3D shapes.
|
138 |
|
139 |
### Compute Infrastructure
|
140 |
|
|
|
|
|
141 |
#### Hardware
|
142 |
|
143 |
+
The model was trained on 48 × A10G GPUs.
|
|
|
|
|
|
|
|
|
144 |
|
145 |
+
## Citation
|
|
|
|
|
146 |
|
147 |
**BibTeX:**
|
148 |
+
@InProceedings{pmlr-v235-hui24a,
|
149 |
+
title = {Make-A-Shape: a Ten-Million-scale 3{D} Shape Model},
|
150 |
+
author = {Hui, Ka-Hei and Sanghi, Aditya and Rampini, Arianna and Rahimi Malekshan, Kamal and Liu, Zhengzhe and Shayani, Hooman and Fu, Chi-Wing},
|
151 |
+
booktitle = {Proceedings of the 41st International Conference on Machine Learning},
|
152 |
+
pages = {20660--20681},
|
153 |
+
year = {2024},
|
154 |
+
editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
|
155 |
+
volume = {235},
|
156 |
+
series = {Proceedings of Machine Learning Research},
|
157 |
+
month = {21--27 Jul},
|
158 |
+
publisher = {PMLR},
|
159 |
+
pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hui24a/hui24a.pdf},
|
160 |
+
url = {https://proceedings.mlr.press/v235/hui24a.html},
|
161 |
+
abstract = {The progression in large-scale 3D generative models has been impeded by significant resource requirements for training and challenges like inefficient representations. This paper introduces Make-A-Shape, a novel 3D generative model trained on a vast scale, using 10 million publicly-available shapes. We first innovate the wavelet-tree representation to encode high-resolution SDF shapes with minimal loss, leveraging our newly-proposed subband coefficient filtering scheme. We then design a subband coefficient packing scheme to facilitate diffusion-based generation and a subband adaptive training strategy for effective training on the large-scale dataset. Our generative framework is versatile, capable of conditioning on various input modalities such as images, point clouds, and voxels, enabling a variety of downstream applications, e.g., unconditional generation, completion, and conditional generation. Our approach clearly surpasses the existing baselines in delivering high-quality results and can efficiently generate shapes within two seconds for most conditions.}
|
162 |
+
}
|
163 |
|
|
|
164 |
|
165 |
**APA:**
|
166 |
|
167 |
+
Hui, K. H., Sanghi, A., Rampini, A., Malekshan, K. R., Liu, Z., Shayani, H., & Fu, C. W. (2024). Make-A-Shape: a Ten-Million-scale 3D Shape Model. arXiv preprint arXiv:2401.08504.
|
168 |
|
169 |
## Model Card Contact
|
170 |
|
171 |