---
license: cc-by-nc-4.0
language:
  - en
pipeline_tag: image-text-to-text
---

## Model description

BLIP-3 consists of three models: a CLIP-like image encoder, a vision-language (VL) connector, and a large language model (LLM).
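As a rough illustration of how these three pieces fit together, here is a schematic sketch; all class, argument, and shape names below are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class Blip3Sketch(nn.Module):
    """Schematic composition of the three BLIP-3 components; names are illustrative."""

    def __init__(self, image_encoder: nn.Module, vl_connector: nn.Module, llm: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder  # CLIP-like vision encoder -> patch features
        self.vl_connector = vl_connector    # projects vision features into the LLM token space
        self.llm = llm                      # autoregressive large language model

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        vision_feats = self.image_encoder(pixel_values)   # (batch, n_patches, d_vision)
        vision_tokens = self.vl_connector(vision_feats)   # (batch, n_tokens, d_llm)
        # Concatenate projected vision tokens with the text embeddings, then decode with the LLM.
        inputs_embeds = torch.cat([vision_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds)
```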

## Direct Use and Downstream Use

## Bias, Risks, Limitations, and Ethical Considerations

## How to use

This model requires the development version (`4.41.0.dev0`) of the `transformers` library. As of 05/07/2024, you can install it with `pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers`.
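Once installed, loading and running the model should follow the usual `trust_remote_code` pattern. The sketch below is an assumption-laden example rather than the official snippet: the repository ID is a placeholder, and the exact processor classes and `generate()` arguments may differ in this repo's custom modeling code, so check the example code in the repository for authoritative usage.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoTokenizer, AutoImageProcessor

model_id = "<this-model-repo-id>"  # placeholder: replace with this repository's ID

# trust_remote_code=True is needed because the model ships custom modeling code.
model = AutoModelForVision2Seq.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # any local test image
pixel_values = image_processor(images=[image], return_tensors="pt")["pixel_values"]
inputs = tokenizer("Describe this image.", return_tensors="pt")

with torch.no_grad():
    # The pixel_values keyword follows common vision-language models and is an
    # assumption; this repo's custom generate() signature may differ.
    output_ids = model.generate(**inputs, pixel_values=pixel_values, max_new_tokens=64)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```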

## License

Our code and weights are released under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license.

## Troubleshoot

1. If you are missing any packages, consider installing the following:

   ```bash
   pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
   pip install open_clip_torch==2.24.0
   pip install einops
   pip install einops-exts
   ```