We assemble a large-scale dataset from a series of object detection datasets and use it to train Stable Box Diffusion, a more general model based on ODGEN.

We employ 10 datasets: COCO2014, OpenImagesv7, Objects365, PascalVOC2007, PascalVOC2012, ImageNet, RUOD, nuScenes, ADE20K, and BDD100K. Together they cover about 31 million images and more than 5,300 object categories. Stable Box Diffusion is trained on 24 NVIDIA A6000 GPUs with a batch size of 96 for 20 epochs. Training takes 42 days and more than 24,000 GPU hours in total.
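As a quick sanity check on the training cost quoted above, the GPU-hour figure can be reproduced from the GPU count and the wall-clock duration. This is a minimal sketch that assumes all 24 GPUs run continuously for the full 42 days; the helper name `total_gpu_hours` is hypothetical and not part of any released code.

```python
# Figures quoted in the model card.
NUM_GPUS = 24
TRAINING_DAYS = 42
HOURS_PER_DAY = 24


def total_gpu_hours(num_gpus: int, days: int, hours_per_day: int = HOURS_PER_DAY) -> int:
    """Return GPU hours, assuming every GPU is busy for the whole run (an assumption)."""
    return num_gpus * days * hours_per_day


gpu_hours = total_gpu_hours(NUM_GPUS, TRAINING_DAYS)
print(gpu_hours)  # 24 * 42 * 24 = 24192
```

The result, 24,192 GPU hours, is consistent with the "more than 24,000 GPU hours" stated above.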

Visualized Results

@misc{zhu2024odgendomainspecificobjectdetection,
      title={ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models}, 
      author={Jingyuan Zhu and Shiyu Li and Yuxuan Liu and Ping Huang and Jiulong Shan and Huimin Ma and Jian Yuan},
      year={2024},
      eprint={2405.15199},
      archivePrefix={arXiv},
      primaryClass={cs.CV},   
      url={https://arxiv.org/abs/2405.15199}, 
}