
---
license: mit
pipeline_tag: image-to-3d
tags:
- image-to-3d
---

# LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images

[Hao He*](https://heye0507.github.io/), [Yixun Liang*](https://yixunliang.github.io/), [Luozhou Wang](https://wileewang.github.io/), [Yuanhao Cai](https://github.com/caiyuanhao1998), [Xinli Xu](https://scholar.google.com/citations?user=lrgPuBUAAAAJ&hl=en&inst=1381320739207392350), Hao-Xiang Guo, Xiang Wen, [Yingcong Chen**](https://www.yingcong.me)

\*: Equal contribution.
\**: Corresponding author.

[Paper PDF](https://arxiv.org/abs/2410.15636) | [Project Page](https://heye0507.github.io/LucidFusion_page/) | Gradio Demo (coming soon)

---

<div align="center">
    <img src="resources/output_16.gif" width="95%"/>  
    <br>
    <p><i>Note: we compress these motion pictures for faster previewing.</i></p>
</div>

<div align=center>
<img src="resources/ours_qualitative.jpeg" width="95%"/>  
  
Examples of cross-dataset content creation with our framework, *LucidFusion*, running at **~13 FPS** on an A800.

</div>

## 🎏 Abstract
We present a flexible end-to-end feed-forward framework, named *LucidFusion*, to generate high-resolution 3D Gaussians from an arbitrary number of unposed, sparse multi-view images.

<details><summary>CLICK for the full abstract</summary>

> Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages the Relative Coordinate Map (RCM). Unlike traditional methods that link images to the 3D world through pose, LucidFusion utilizes RCM to align geometric features coherently across different views, making it highly adaptable for 3D generation from arbitrary, unposed images. Furthermore, LucidFusion seamlessly integrates with the original single-image-to-3D pipeline, producing detailed 3D Gaussians at a resolution of $512 \times 512$, making it well-suited for a wide range of applications.

</details>

## 🔧 Inference Instructions

Our inference code is now released! 

Please refer to our [repo](https://github.com/EnVision-Research/LucidFusion/tree/master) for more details.


### Pretrained Weights

Our current model loads a pre-trained diffusion model for its configuration. We use stable-diffusion-2-1-base; to download it, simply run:
```
python pretrained/download.py
```
You can skip this step if you already have stable-diffusion-2-1-base; simply set `model_key` to your local SD-2-1 path in the scripts under the `scripts/` folder.
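
As a rough illustration of the choice described above, the sketch below picks a value for `model_key`: a local SD-2-1 directory when one is present, otherwise a Hub ID. The helper `resolve_model_key`, the `SD21_PATH` environment variable, and the Hub ID `stabilityai/stable-diffusion-2-1-base` are assumptions made for this example; they are not part of the LucidFusion scripts, and how those scripts actually consume `model_key` should be checked in the repo.

```python
import os
from pathlib import Path

# Assumed Hub ID for the base model named in this README.
DEFAULT_MODEL_KEY = "stabilityai/stable-diffusion-2-1-base"


def resolve_model_key(local_path=None):
    """Return a local SD-2-1 directory if it looks valid, else the Hub ID.

    Hypothetical helper for illustration only.
    """
    if local_path:
        candidate = Path(local_path).expanduser()
        # A diffusers-format pipeline directory contains model_index.json.
        if (candidate / "model_index.json").is_file():
            return str(candidate)
    return DEFAULT_MODEL_KEY


if __name__ == "__main__":
    # SD21_PATH is a hypothetical environment variable used only in this sketch.
    print(resolve_model_key(os.environ.get("SD21_PATH")))
```

The printed value is what you would paste into `model_key`; if no valid local directory is found, the sketch falls back to the Hub ID rather than failing.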

Our pre-trained weights are released!

## 🚧 Todo

- [x] Release the inference code
- [x] Release our weights
- [ ] Release the Gradio Demo
- [ ] Release the Stage 1 and 2 training code

## 📍 Citation 
If you find our work useful, please consider citing our paper.
```
@misc{he2024lucidfusion,
      title={LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images}, 
      author={Hao He and Yixun Liang and Luozhou Wang and Yuanhao Cai and Xinli Xu and Hao-Xiang Guo and Xiang Wen and Yingcong Chen},
      year={2024},
      eprint={2410.15636},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.15636}, 
}
```

## 💼 Acknowledgement
This work is built on many amazing research works and open-source projects:
- [gaussian-splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization)
- [ZeroShape](https://github.com/zxhuang1698/ZeroShape)
- [LGM](https://github.com/3DTopia/LGM)

Thanks for their excellent work and great contributions to the 3D generation area.