---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: image-text-to-text
---
# Model description

BLIP-3 consists of three components: a CLIP-like image encoder, a vision-language (VL) connector, and a large language model (LLM).
# Direct Use and Downstream Use

# Bias, Risks, Limitations, and Ethical Considerations
# How to use

We require the development version (`4.41.0.dev0`) of the `transformers` library. To get it, as of 05/07/2024, run:

```shell
pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers
```
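Once the development build of `transformers` is installed, the checkpoint can be loaded through the library's auto classes. The sketch below is a minimal, hedged example: the repository ID passed to `load_blip3` is a placeholder (substitute this model's actual repo name), and the exact auto classes and prompt format for this checkpoint should be confirmed against the repository files.

```python
def load_blip3(model_id: str):
    """Load a BLIP-3 checkpoint via transformers' auto classes.

    Sketch only: `model_id` is a placeholder repository name, and
    `trust_remote_code=True` assumes the checkpoint ships custom
    modeling code, as many vision-language repos do.
    """
    # Imported lazily so the helper can be defined even when
    # transformers is not installed yet.
    from transformers import (
        AutoImageProcessor,
        AutoModelForVision2Seq,
        AutoTokenizer,
    )

    model = AutoModelForVision2Seq.from_pretrained(
        model_id, trust_remote_code=True
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True
    )
    image_processor = AutoImageProcessor.from_pretrained(
        model_id, trust_remote_code=True
    )
    return model, tokenizer, image_processor
```

Downloading the weights requires network access and substantial disk space; pass a local path to `load_blip3` if the checkpoint has already been fetched.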
# License

Our code and weights are released under the Creative Commons Attribution Non-Commercial 4.0 (CC BY-NC 4.0) license.
# Troubleshoot

- If you are missing any packages, please consider installing the following:

  ```shell
  pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
  pip install open_clip_torch==2.24.0
  pip install einops
  pip install einops-exts
  ```