lyndonzheng committed
Commit 0b13fd6
1 Parent(s): a19de17
Files changed (2)
  1. app.py +25 -21
  2. demo_examples/re10k_04.jpg +0 -0
app.py CHANGED
@@ -75,6 +75,10 @@ def main():
  gr.Markdown(
  """
  # Flash3D
+ **Flash3D** [[project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/)] is a fast, highly efficient method for 3D scene reconstruction from a single image, trainable on a single GPU in a day.
+ The model used in this demo was trained only on the **RealEstate10k dataset, on a single A6000 GPU, within 1 day**.
+ Upload an image of a scene, or click one of the provided examples, to see how Flash3D does.
+ The 3D viewer will render a .ply scene exported from the 3D Gaussians, which is only an approximation.
  """
  )
  with gr.Row(variant="panel"):
@@ -96,7 +100,6 @@ def main():
  './demo_examples/bedroom_01.png',
  './demo_examples/kitti_02.png',
  './demo_examples/kitti_03.png',
- './demo_examples/re10k_04.jpg',
  './demo_examples/re10k_05.jpg',
  './demo_examples/re10k_06.jpg',
  ],
@@ -118,26 +121,27 @@ def main():
  interactive=False
  )

- # gr.Markdown(
- # """
- # ## Comments:
- # 1. If you run the demo online, the first example you upload should take about 4.5 seconds (with preprocessing, saving and overhead), the following take about 1.5s.
- # 2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximations and artefacts might show.
- # 3. Known limitations include:
- # - a black dot appearing on the model from some viewpoints
- # - see-through parts of objects, especially on the back: this is due to the model performing less well on more complicated shapes
- # - back of objects are blurry: this is a model limiation due to it being deterministic
- # 4. Our model is of comparable quality to state-of-the-art methods, and is **much** cheaper to train and run.
- # ## How does it work?
- # Splatter Image formulates 3D reconstruction as an image-to-image translation task. It maps the input image to another image,
- # in which every pixel represents one 3D Gaussian and the channels of the output represent parameters of these Gaussians, including their shapes, colours and locations.
- # The resulting image thus represents a set of Gaussians (almost like a point cloud) which reconstruct the shape and colour of the object.
- # The method is very cheap: the reconstruction amounts to a single forward pass of a neural network with only 2D operators (2D convolutions and attention).
- # The rendering is also very fast, due to using Gaussian Splatting.
- # Combined, this results in very cheap training and high-quality results.
- # For more results see the [project page](https://szymanowiczs.github.io/splatter-image) and the [CVPR article](https://arxiv.org/abs/2312.13150).
- # """
- # )
+ gr.Markdown(
+ """
+ ## Comments:
+ 1. If you run the demo online, the first example you upload should take about 25 seconds (with preprocessing, saving and overhead); the following ones take about 14s.
+ 2. The 3D viewer shows a .ply mesh extracted from a mix of 3D Gaussians. This is only an approximation, and artefacts might show.
+ 3. Known limitations include:
+ - a black dot appearing on the model from some viewpoints
+ - while the multiple layers of Gaussians fill in reasonable pixels for the invisible parts, the visual quality there is still blurry
+ 4. Flash3D achieves state-of-the-art results when trained and tested on RealEstate10k, and is **much** cheaper to train and run.
+ 5. When transferred to unseen datasets like NYU, it outperforms competitors by a large margin.
+ 6. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset.
+ ## How does it work?
+ Given a single image I as input, Flash3D first estimates the metric depth D using a frozen off-the-shelf network.
+ Then, a ResNet50-like encoder–decoder network predicts a set of shape and appearance parameters P for K layers of Gaussians at every pixel u,
+ allowing unobserved and occluded surfaces to be modelled.
+ From these predicted components, the depth of each layer is obtained by summing the predicted (positive) offsets δi onto the predicted monocular depth D,
+ which yields the mean vector of every layer of Gaussians.
+ This strategy ensures that the layers are depth-ordered, encouraging the network to model occluded surfaces.
+ For more results see the [project page](https://www.robots.ox.ac.uk/~vgg/research/flash3d/).
+ """
+ )

  submit.click(fn=check_input_image, inputs=[input_image]).success(
  fn=preprocess,
 
demo_examples/re10k_04.jpg DELETED
Binary file (15.1 kB)
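
The "How does it work?" text added in this commit describes a layered, per-pixel depth parameterization: a frozen network predicts metric depth D, and the decoder predicts K positive offsets δi per pixel that are summed onto D so the Gaussian layers stay depth-ordered. Below is a minimal PyTorch sketch of that one step, not the official Flash3D code: the function name, tensor layout, and the softplus/cumulative-sum reading of "positive offsets" are all assumptions.

```python
import torch
import torch.nn.functional as F

def layered_gaussian_means(depth, offsets, K_inv, pixel_grid):
    """Hypothetical sketch: lift per-pixel layered depths to 3D Gaussian means.

    depth:      (B, 1, H, W)   metric depth from the frozen monocular estimator
    offsets:    (B, K, H, W)   raw per-layer offsets from the encoder-decoder
    K_inv:      (B, 3, 3)      inverse camera intrinsics
    pixel_grid: (B, 3, H*W)    homogeneous pixel coordinates (u, v, 1)
    returns:    (B, K, 3, H*W) mean of each Gaussian layer at each pixel
    """
    B, K, H, W = offsets.shape
    # Softplus keeps every offset positive; the cumulative sum over layers then
    # guarantees depth ordering: layer i+1 always lies behind layer i.
    delta = F.softplus(offsets)
    layer_depth = depth + torch.cumsum(delta, dim=1)        # (B, K, H, W)
    # Back-project each pixel along its camera ray: x = d * K^-1 @ (u, v, 1)
    rays = torch.bmm(K_inv, pixel_grid)                     # (B, 3, H*W)
    return layer_depth.reshape(B, K, 1, H * W) * rays.unsqueeze(1)

# Toy check with identity intrinsics on a 4x4 image:
B, K, H, W = 1, 2, 4, 4
v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                      torch.arange(W, dtype=torch.float32), indexing="ij")
grid = torch.stack([u.flatten(), v.flatten(), torch.ones(H * W)])  # (3, H*W)
means = layered_gaussian_means(torch.ones(B, 1, H, W),
                               torch.zeros(B, K, H, W),
                               torch.eye(3).expand(B, 3, 3),
                               grid.expand(B, 3, H * W))
print(means.shape)  # torch.Size([1, 2, 3, 16])
```

In the real model the offsets come out of the same decoder head as the opacities, covariances and colours; this sketch isolates only the depth-ordering step the demo text describes.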