We assemble a large-scale dataset from a collection of object detection datasets and use it to train Stable Box Diffusion, a more general model based on ODGEN.
We use 10 datasets: COCO2014, OpenImagesv7, Objects365, PascalVOC2007, PascalVOC2012, ImageNet, RUOD, nuScenes, ADE20K, and BDD100K. Together they cover about 31 million images and more than 5300 object categories. Stable Box Diffusion is trained on 24 NVIDIA A6000 GPUs with batch size 96 for 20 epochs. Training took 42 days, more than 24000 GPU hours in total.
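As a quick sanity check on the training cost quoted above, the GPU-hour figure can be reproduced from the GPU count and the wall-clock duration. The sketch below is illustrative only: the helper names are hypothetical, the GPUs are assumed to run for the full 42 days, and the step estimate assumes that 96 is the global batch size and that "about 31 million" is exactly 31,000,000 images, neither of which is stated here.

```python
import math

# Figures taken from the model card text.
NUM_GPUS = 24
TRAINING_DAYS = 42
BATCH_SIZE = 96          # assumed to be the global batch size
EPOCHS = 20
NUM_IMAGES = 31_000_000  # "about 31 million" images, rounded for illustration


def gpu_hours(num_gpus: int, days: int) -> int:
    """GPU hours, assuming every GPU runs for the whole duration."""
    return num_gpus * days * 24


def total_steps(num_images: int, batch_size: int, epochs: int) -> int:
    """Rough optimizer-step count: batches per epoch (rounded up) times epochs."""
    return math.ceil(num_images / batch_size) * epochs


if __name__ == "__main__":
    print("GPU hours:", gpu_hours(NUM_GPUS, TRAINING_DAYS))
    print("Approximate steps:", total_steps(NUM_IMAGES, BATCH_SIZE, EPOCHS))
```

Under these assumptions, 24 GPUs running for 42 days give 24 × 42 × 24 = 24192 GPU hours, which agrees with the "more than 24000" figure. The step count is only an order-of-magnitude estimate.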
@misc{zhu2024odgendomainspecificobjectdetection,
title={ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models},
author={Jingyuan Zhu and Shiyu Li and Yuxuan Liu and Ping Huang and Jiulong Shan and Huimin Ma and Jian Yuan},
year={2024},
eprint={2405.15199},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2405.15199},
}
Base model for jy-zhu/Stable_Box_Diffusion: stabilityai/stable-diffusion-2-1