---
inference: false
datasets:
- liuhaotian/LLaVA-CC3M-Pretrain-595K
---

# llava-v1.5-llama-3-8b-pretrain Model Card

This is a pretrained checkpoint containing the MLP connector weights from LLaVA stage 1; you can use it to instruction-tune your own multimodal models.

Please follow my reproduced implementation [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3/) for more details on fine-tuning a LLaVA model with Llama-3 as the foundation LLM.
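
For instruction tuning, you typically load these connector weights into your model before stage 2. Below is a minimal sketch, assuming this checkpoint is hosted on the Hugging Face Hub under the repo id `weizhiwang/llava-v1.5-llama-3-8b-pretrain` and uses the conventional LLaVA stage-1 file name `mm_projector.bin` (both are assumptions, not confirmed by this card):

```python
import torch
from huggingface_hub import hf_hub_download

# Assumed repo id and file name; adjust them to the actual checkpoint layout.
ckpt_path = hf_hub_download(
    repo_id="weizhiwang/llava-v1.5-llama-3-8b-pretrain",
    filename="mm_projector.bin",
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# The connector maps frozen CLIP-ViT features into the Llama-3 embedding
# space; printing the shapes is a quick sanity check before fine-tuning.
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```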

## Training dataset

- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.

## Architecture

- LLM: llama-3-8b (Frozen)
- Vision-Language Adapter: MLP (see the sketch after this list)
- Vision Encoder: CLIP-ViT-L-336px (Frozen)
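
The card does not spell out the adapter dimensions. Here is a minimal sketch of the standard LLaVA-1.5 `mlp2x_gelu` connector, assuming CLIP-ViT-L's 1024-d patch features and Llama-3-8b's 4096-d hidden size (assumptions from the common LLaVA-1.5 recipe, not stated in this card):

```python
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Two-layer GELU MLP in the LLaVA-1.5 style; dims are assumptions."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the frozen
        # CLIP encoder; the output lives in the frozen LLM's embedding space.
        return self.proj(image_features)

# Example: project a dummy batch of 576 CLIP patch tokens (24x24 at 336px).
projector = MLPProjector()
dummy = torch.randn(1, 576, 1024)
print(projector(dummy).shape)  # torch.Size([1, 576, 4096])
```

In stage 1 only this connector is trained while the vision encoder and the LLM both stay frozen, which is why the checkpoint is small enough to drop directly into a later instruction-tuning run.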