---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: image-text-to-text
---
# Model description

BLIP-3 consists of three components: a CLIP-like image encoder, a vision-language (VL) connector, and a large language model (LLM).
# Direct Use and Downstream Use

# Bias, Risks, Limitations, and Ethical Considerations
# How to use

We require the development version (`4.41.0.dev0`) of the `transformers` library. To get it, as of 05/07/2024, run:

```shell
pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers
```
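Once the development build of `transformers` is installed, the checkpoint can be loaded through the library's auto classes. The sketch below is a minimal, hedged example: the repository ID passed to `load_blip3` is a placeholder (substitute this model's actual repo name), and the exact auto classes and prompt format for this checkpoint should be confirmed against the repository files.

```python
def load_blip3(model_id: str):
    """Load a BLIP-3 checkpoint via transformers' auto classes.

    Sketch only: `model_id` is a placeholder repository name, and
    `trust_remote_code=True` assumes the checkpoint ships custom
    modeling code, as many vision-language repos do.
    """
    # Imported lazily so the helper can be defined even when
    # transformers is not installed yet.
    from transformers import (
        AutoImageProcessor,
        AutoModelForVision2Seq,
        AutoTokenizer,
    )

    model = AutoModelForVision2Seq.from_pretrained(
        model_id, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True
    )
    image_processor = AutoImageProcessor.from_pretrained(
        model_id, trust_remote_code=True
    )
    return model, tokenizer, image_processor
```

Downloading the weights requires network access and substantial disk space; pass a local path to `load_blip3` if the checkpoint has already been fetched.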
# License

Our code and weights are released under the Creative Commons Attribution Non-Commercial 4.0 (CC BY-NC 4.0) license.
# Troubleshoot

- If you are missing any packages, please consider installing the following:

  ```shell
  pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
  pip install open_clip_torch==2.24.0
  pip install einops
  pip install einops-exts
  ```