---
license: cc-by-nc-4.0
language:
- en
pipeline_tag: image-text-to-text
---
|
|
|
|
|
# Model description
|
|
|
BLIP-3 consists of three components: a CLIP-like image encoder, a vision-language (VL) connector, and a large language model.
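
As a rough illustration of how these pieces fit together (a sketch only, not the actual BLIP-3 implementation; all module names and shapes here are hypothetical): the image encoder produces patch features, the VL connector projects them into the LLM's token embedding space, and the LLM decodes over the concatenated visual and text tokens.

```python
import torch
import torch.nn as nn

class BLIP3Sketch(nn.Module):
    """Illustrative wiring of the three components; not the real BLIP-3 code."""

    def __init__(self, image_encoder, connector, llm):
        super().__init__()
        self.image_encoder = image_encoder  # CLIP-like ViT -> patch features
        self.connector = connector          # projects visual features into the LLM embedding space
        self.llm = llm                      # decoder-only language model

    def forward(self, pixel_values, text_embeds):
        visual_feats = self.image_encoder(pixel_values)   # (B, n_patches, d_vision)
        visual_tokens = self.connector(visual_feats)      # (B, n_tokens, d_llm)
        # Prepend visual tokens to the text embeddings and decode.
        return self.llm(inputs_embeds=torch.cat([visual_tokens, text_embeds], dim=1))
```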
|
|
|
# Direct Use and Downstream Use



# Bias, Risks, Limitations, and Ethical Considerations



# How to use
|
|
|
> We require the development version (`4.41.0.dev0`) of the `transformers` library. As of 05/07/2024, you can get it with `pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers`.
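
A minimal inference sketch under the following assumptions: the checkpoint lives on the Hugging Face Hub (the `MODEL_ID` below is a placeholder, not the real repository name), it ships custom modeling code (hence `trust_remote_code=True`), and the exact prompt template and `generate` arguments may differ in the checkpoint's own code.

```python
import requests
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoTokenizer, AutoImageProcessor

# Placeholder repository name -- replace with the actual BLIP-3 checkpoint on the Hub.
MODEL_ID = "your-org/blip3-checkpoint"

# trust_remote_code=True because the checkpoint ships custom modeling code.
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

# Fetch an example image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and the prompt; the prompt format depends on the checkpoint.
inputs = image_processor(images=image, return_tensors="pt")
text = tokenizer("Describe this image.", return_tensors="pt")

# The exact generate signature is defined by the checkpoint's custom code.
generated = model.generate(
    pixel_values=inputs["pixel_values"],
    input_ids=text["input_ids"],
    max_new_tokens=64,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```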
|
|
|
|
|
# License



Our code and weights are released under the Creative Commons Attribution-NonCommercial 4.0 license ([LICENSE](LICENSE.txt)).
|
|
|
# Troubleshoot



1. If you are missing any packages, consider installing the following:
|
|
|
```
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
pip install open_clip_torch==2.24.0
pip install einops
pip install einops-exts
```
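
If the installs succeed but the model still fails to load, a quick sanity check of the installed versions can help (a sketch; expected values follow the pins above and the `transformers` note in "How to use"):

```python
import torch
import torchvision
import transformers
import open_clip  # installed by the open_clip_torch package

print("torch:", torch.__version__)              # expect 2.2.1
print("torchvision:", torchvision.__version__)  # expect 0.17.1
print("transformers:", transformers.__version__)  # expect 4.41.0.dev0
print("open_clip:", open_clip.__version__)      # expect 2.24.0
```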