---
license: apache-2.0
metrics:
- accuracy
base_model:
- liuhaotian/llava-v1.5-7b
---

# LLaVA-3D

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

LLaVA-3D is a 7B-parameter model trained on LLaVA-3D-Instruct-1M, built on LLaVA-v1.5-7B.

- **Repository:** [ZCMax/LLaVA-3D](https://github.com/ZCMax/LLaVA-3D)
- **Project Website:** [zcmax.github.io/projects/LLaVA-3D](https://zcmax.github.io/projects/LLaVA-3D/)
- **Paper:** [LLaVA-3D](https://arxiv.org/abs/2409.18125)
- **Point of Contact:** [Chenming Zhu](mailto:zcm952742165@gmail.com)
- **Languages:** English

## Use

### Intended use

The model was trained on LLaVA-3D-Instruct-1M and can take a single image as input for 2D tasks and posed RGB-D images for 3D tasks. A minimal inference sketch is given at the end of this card.

**Feel free to share your generations in the Community tab!**

## Training

### Model

- **Pretraining Stage:** scene-level and region-level caption data, 1 epoch, projector only
- **Instruction Tuning Stage:** a mixture of 1M high-quality 2D and 3D data, 1 epoch, full model
- **Precision:** bfloat16

### Hardware & Software

- **GPUs:** 8 × NVIDIA Tesla A100 (for whole model series training)
- **Orchestration:** [Hugging Face Trainer](https://huggingface.co/docs/transformers/main_classes/trainer)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

## Citation

```
@article{zhu2024llava,
  title={LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness},
  author={Zhu, Chenming and Wang, Tai and Zhang, Wenwei and Pang, Jiangmiao and Liu, Xihui},
  journal={arXiv preprint arXiv:2409.18125},
  year={2024}
}
```
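
## Usage sketch

The sketch below is illustrative only. It assumes the [ZCMax/LLaVA-3D](https://github.com/ZCMax/LLaVA-3D) codebase is installed and keeps LLaVA v1.5's `load_pretrained_model` builder interface; the checkpoint path used here is a placeholder, and the repository's own demo and inference scripts are the authoritative reference for 2D (single-image) and 3D (posed RGB-D) inputs.

```python
# Minimal loading sketch (assumptions noted inline).
# Assumes: `pip install -e .` inside the ZCMax/LLaVA-3D repo, and that the repo
# retains LLaVA v1.5's builder utilities; verify against the repo's demo scripts.
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Placeholder checkpoint path; replace with this model card's actual repo id
# or a local directory containing the LLaVA-3D weights.
model_path = "path/to/LLaVA-3D-7B"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)

# For 2D tasks, pass a single RGB image through `image_processor` as in LLaVA v1.5.
# For 3D tasks, the repo's inference scripts additionally expect posed RGB-D frames
# (depth maps plus camera intrinsics/extrinsics); see the repository README.
```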