---
license: mit
pipeline_tag: image-to-3d
tags:
- image-to-3d
---
# LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images
[Hao He*](https://heye0507.github.io/), [Yixun Liang*](https://yixunliang.github.io/), [Luozhou Wang](https://wileewang.github.io/), [Yuanhao Cai](https://github.com/caiyuanhao1998), [Xinli Xu](https://scholar.google.com/citations?user=lrgPuBUAAAAJ&hl=en&inst=1381320739207392350), [Hao-Xiang Guo](), [Xiang Wen](), [Yingcong Chen**](https://www.yingcong.me)
\*: Equal contribution.
\**: Corresponding author.
[Paper PDF](https://arxiv.org/abs/2410.15636) | [Project Page](https://heye0507.github.io/LucidFusion_page/) | Gradio Demo (coming soon)
---
### Demo results of our latest model
<style>
.gif-row {
display: flex;
justify-content: center;
flex-wrap: wrap;
gap: 10px; /* Adjust the gap as needed */
}
.gif-row img {
width: 20%; /* Adjust width to control size */
}
</style>
<div class="gif-row">
<img src="resources/res_ironman.gif" alt="Ironman">
<img src="resources/res_hulk.gif" alt="Hulk">
<img src="resources/res_deadpool.gif" alt="Deadpool">
<img src="resources/res_team_america.gif" alt="Team America">
</div>
<div class="gif-row">
<img src="resources/res_venom_1.gif" alt="Venom">
<img src="resources/res_black_widow.gif" alt="Black Widow">
<img src="resources/res_spiderman.gif" alt="Spiderman">
<img src="resources/res_superman.gif" alt="Superman">
</div>
<div class="gif-row">
<img src="resources/res_minions.gif" alt="Minions">
<img src="resources/res_snowman.gif" alt="Snowman">
<img src="resources/res_d2_witch.gif" alt="Diablo 2">
<img src="resources/res_harry_porter.gif" alt="Harry Potter">
</div>
<div class="gif-row">
<img src="resources/princess.gif" alt="Princess">
<img src="resources/res_arabic.gif" alt="Arabic">
<img src="resources/res_chief.gif" alt="Chief">
<img src="resources/res_knight.gif" alt="Knight">
</div>
<div class="gif-row">
<img src="resources/res_cry_witch.gif" alt="Witch">
<img src="resources/boy_running.gif" alt="Boy">
<img src="resources/girl_head_3.gif" alt="Girl Head">
<img src="resources/girl_head_2.gif" alt="Girl Head 2">
</div>
<div align=center>
<p style="max-width: 1000px;">
We present <i>LucidFusion</i>, a flexible end-to-end feed-forward framework that generates high-resolution 3D Gaussians from an arbitrary number of unposed, sparse multiview images.
</p>
</div>
---
<!-- ### Demo results of 256 model -->
<!-- <div align="center">
<img src="resources/output_16.gif" width="95%"/>
<br>
<p><i>Note: we compress these motion pictures for faster previewing.</i></p>
</div> -->
<div align=center>
<img src="resources/ours_qualitative.jpeg" width="95%"/>
Examples of cross-dataset content creation with our framework, *LucidFusion*, at **~13 FPS** on an A800.
</div>
## Abstract
We present *LucidFusion*, a flexible end-to-end feed-forward framework that generates high-resolution 3D Gaussians from an arbitrary number of unposed, sparse multiview images.
<details><summary>CLICK for the full abstract</summary>
> Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages the Relative Coordinate Map (RCM). Unlike traditional methods that link images to the 3D world through pose, LucidFusion utilizes RCM to align geometric features coherently across different views, making it highly adaptable for 3D generation from arbitrary, unposed images. Furthermore, LucidFusion seamlessly integrates with the original single-image-to-3D pipeline, producing detailed 3D Gaussians at a resolution of $512 \times 512$, making it well-suited for a wide range of applications.
</details>
## Training Instructions
Our inference code is now released!
Please refer to our [repo](https://github.com/EnVision-Research/LucidFusion/tree/master) for more details.
### Pretrained Weights
Our current model loads a pre-trained diffusion model for its config. We use stable-diffusion-2-1-base; to download it, simply run
```
python pretrained/download.py
```
You can skip this step if you already have stable-diffusion-2-1-base; in that case, update "model_key" in the scripts under the scripts/ folder to point to your local SD-2-1 path.
Our pre-trained weights are released!
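As a rough illustration of the choice described above, the following sketch picks a value for "model_key": a local directory if it looks like a diffusers-format checkpoint, otherwise a Hub ID. The helper `resolve_model_key` is hypothetical and not part of the LucidFusion repo. The Hub ID `stabilityai/stable-diffusion-2-1-base` and the `model_index.json` check are assumptions about how the base model is stored, so verify them against your own setup.

```python
from pathlib import Path
from typing import Optional

# Assumed Hugging Face Hub ID for the base model named in this README.
HUB_MODEL_ID = "stabilityai/stable-diffusion-2-1-base"


def resolve_model_key(local_path: Optional[str] = None) -> str:
    """Return a local SD-2-1 directory if it looks valid, else the Hub ID.

    Hypothetical helper for illustration only; not part of LucidFusion.
    """
    if local_path:
        candidate = Path(local_path).expanduser()
        # Diffusers-format pipeline directories contain model_index.json.
        if (candidate / "model_index.json").is_file():
            return str(candidate)
    # Fall back to the Hub ID when no usable local copy is found.
    return HUB_MODEL_ID


if __name__ == "__main__":
    # Prints the value you would place in "model_key".
    print(resolve_model_key("~/models/stable-diffusion-2-1-base"))
```

The returned string can then be copied into "model_key" in the relevant script, so a local copy is used when one exists and the Hub ID is used otherwise.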
## Todo
- [x] Release the inference code
- [x] Release our weights
- [ ] Release the Gradio demo
- [ ] Release the Stage 1 and Stage 2 training code
## Citation
If you find our work useful, please consider citing our paper.
```
@misc{he2024lucidfusion,
title={LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images},
author={Hao He and Yixun Liang and Luozhou Wang and Yuanhao Cai and Xinli Xu and Hao-Xiang Guo and Xiang Wen and Yingcong Chen},
year={2024},
eprint={2410.15636},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.15636},
}
```
## Acknowledgement
This work is built on many amazing research works and open-source projects:
- [gaussian-splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization)
- [ZeroShape](https://github.com/zxhuang1698/ZeroShape)
- [LGM](https://github.com/3DTopia/LGM)
Thanks for their excellent work and great contributions to the 3D generation area.