lyndonzheng committed
Commit 0b13fd6
1 Parent(s): a19de17
Files changed (2)
  1. app.py +25 -21
  2. demo_examples/re10k_04.jpg +0 -0
app.py CHANGED
@@ -75,6 +75,10 @@ def main():
  gr.Markdown(
  """
  # Flash3D
+ **Flash3D** [[project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/)] is a fast, highly efficient method for 3D scene reconstruction from a single image, trainable on a single GPU in a day.
+ The model used in this demo was trained only on the **RealEstate10k dataset, on a single A6000 GPU, within 1 day**.
+ Upload an image of a scene, or click one of the provided examples, to see how Flash3D does.
+ The 3D viewer will render a .ply scene exported from the 3D Gaussians, which is only an approximation.
  """
  )
  with gr.Row(variant="panel"):
@@ -96,7 +100,6 @@ def main():
  './demo_examples/bedroom_01.png',
  './demo_examples/kitti_02.png',
  './demo_examples/kitti_03.png',
- './demo_examples/re10k_04.jpg',
  './demo_examples/re10k_05.jpg',
  './demo_examples/re10k_06.jpg',
  ],
@@ -118,26 +121,27 @@ def main():
  interactive=False
  )

- # gr.Markdown(
- # """
- # ## Comments:
- # 1. If you run the demo online, the first example you upload should take about 4.5 seconds (with preprocessing, saving and overhead), the following take about 1.5s.
- # 2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximations and artefacts might show.
- # 3. Known limitations include:
- # - a black dot appearing on the model from some viewpoints
- # - see-through parts of objects, especially on the back: this is due to the model performing less well on more complicated shapes
- # - back of objects are blurry: this is a model limiation due to it being deterministic
- # 4. Our model is of comparable quality to state-of-the-art methods, and is **much** cheaper to train and run.
- # ## How does it work?
- # Splatter Image formulates 3D reconstruction as an image-to-image translation task. It maps the input image to another image,
- # in which every pixel represents one 3D Gaussian and the channels of the output represent parameters of these Gaussians, including their shapes, colours and locations.
- # The resulting image thus represents a set of Gaussians (almost like a point cloud) which reconstruct the shape and colour of the object.
- # The method is very cheap: the reconstruction amounts to a single forward pass of a neural network with only 2D operators (2D convolutions and attention).
- # The rendering is also very fast, due to using Gaussian Splatting.
- # Combined, this results in very cheap training and high-quality results.
- # For more results see the [project page](https://szymanowiczs.github.io/splatter-image) and the [CVPR article](https://arxiv.org/abs/2312.13150).
- # """
- # )
+ gr.Markdown(
+ """
+ ## Comments:
+ 1. If you run the demo online, the first example you upload should take about 25 seconds (with preprocessing, saving and overhead); the following ones take about 14s.
+ 2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximation, and artefacts might show.
+ 3. Known limitations include:
+ - a black dot appearing on the model from some viewpoints
+ - while the multiple layers of Gaussians fill in reasonable pixels for the invisible parts, the visual quality there is still blurry
+ 4. Flash3D achieves state-of-the-art results when trained and tested on RealEstate10k, and is **much** cheaper to train and run.
+ 5. When transferred to unseen datasets like NYU, it outperforms competitors by a large margin.
+ 6. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset.
+ ## How does it work?
+ Given a single image I as input, Flash3D first estimates the metric depth D using a frozen off-the-shelf network.
+ Then, a ResNet50-like encoder–decoder network predicts a set of shape and appearance parameters P for K layers of Gaussians at every pixel u,
+ allowing unobserved and occluded surfaces to be modelled.
+ From these predicted components, the depth of each layer is obtained by summing the predicted (positive) offsets δi onto the predicted monocular depth D,
+ which yields the mean vector of every layer of Gaussians.
+ This strategy ensures that the layers are depth-ordered, encouraging the network to model occluded surfaces.
+ For more results see the [project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/).
+ """
+ )

  submit.click(fn=check_input_image, inputs=[input_image]).success(
  fn=preprocess,
 
demo_examples/re10k_04.jpg DELETED
Binary file (15.1 kB)
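
The "How does it work?" text added in this commit describes a layered, per-pixel depth parameterization: a frozen network predicts metric depth D, and the decoder predicts K positive offsets δi per pixel that are summed onto D so the Gaussian layers stay depth-ordered. Below is a minimal PyTorch sketch of that one step, not the official Flash3D code: the function name, tensor layout, and the softplus/cumulative-sum reading of "positive offsets" are all assumptions.

```python
import torch
import torch.nn.functional as F

def layered_gaussian_means(depth, offsets, K_inv, pixel_grid):
    """Hypothetical sketch: lift per-pixel layered depths to 3D Gaussian means.

    depth:      (B, 1, H, W)   metric depth from the frozen monocular estimator
    offsets:    (B, K, H, W)   raw per-layer offsets from the encoder-decoder
    K_inv:      (B, 3, 3)      inverse camera intrinsics
    pixel_grid: (B, 3, H*W)    homogeneous pixel coordinates (u, v, 1)
    returns:    (B, K, 3, H*W) mean of each Gaussian layer at each pixel
    """
    B, K, H, W = offsets.shape
    # Softplus keeps every offset positive; the cumulative sum over layers then
    # guarantees depth ordering: layer i+1 always lies behind layer i.
    delta = F.softplus(offsets)
    layer_depth = depth + torch.cumsum(delta, dim=1)        # (B, K, H, W)
    # Back-project each pixel along its camera ray: x = d * K^-1 @ (u, v, 1)
    rays = torch.bmm(K_inv, pixel_grid)                     # (B, 3, H*W)
    return layer_depth.reshape(B, K, 1, H * W) * rays.unsqueeze(1)

# Toy check with identity intrinsics on a 4x4 image:
B, K, H, W = 1, 2, 4, 4
v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                      torch.arange(W, dtype=torch.float32), indexing="ij")
grid = torch.stack([u.flatten(), v.flatten(), torch.ones(H * W)])  # (3, H*W)
means = layered_gaussian_means(torch.ones(B, 1, H, W),
                               torch.zeros(B, K, H, W),
                               torch.eye(3).expand(B, 3, 3),
                               grid.expand(B, 3, H * W))
print(means.shape)  # torch.Size([1, 2, 3, 16])
```

In the real model the offsets come out of the same decoder head as the opacities, covariances and colours; this sketch isolates only the depth-ordering step the demo text describes.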