|
--- |
|
title: VLM Demo |
|
sdk: docker |
|
license: mit |
|
--- |
|
|
|
This demo illustrates the work published in the paper ["Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models"](https://arxiv.org/pdf/2402.07865.pdf).
|
|
|
|
|
# VLM Demo |
|
|
|
> *VLM Demo*: Lightweight repo for chatting with VLMs supported by our [VLM Evaluation Suite](https://github.com/TRI-ML/vlm-evaluation/tree/main).
|
|
|
--- |
|
|
|
## Installation |
|
|
|
This repository can be installed as follows: |
|
|
|
```bash |
|
git clone git@github.com:TRI-ML/vlm-demo.git
|
cd vlm-demo |
|
pip install -e . |
|
``` |
|
|
|
This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is |
|
installed in the current environment. Installation instructions can be found |
|
[here](https://github.com/TRI-ML/vlm-evaluation/tree/main). |
|
|
|
## Usage |
|
|
|
The main script to run is `interactive_demo.py`; the Gradio Controller (`serve/gradio_controller.py`) and Gradio Web Server

(`serve/gradio_web_server.py`) are implemented within `serve`. All of this code is heavily

adapted from the [LLaVA GitHub Repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).

More details on how this code was modified from the original LLaVA repo are provided in the

relevant source files.
|
|
|
To run the demo, first run the following commands in separate terminals: |
|
|
|
+ Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000` |
|
+ Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share` |
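
The two launch commands above can be collected into a single launcher script. This is a sketch, assuming the `vlm-demo` package is installed in the current environment; the `RUN_SERVERS` guard and the 5-second wait are illustrative conveniences, not part of the repo. By default it only prints the two commands, and setting `RUN_SERVERS=1` executes them.

```shell
#!/usr/bin/env bash
# Sketch: build the controller and web-server launch commands in one place.
# By default the commands are only printed; set RUN_SERVERS=1 to execute them.
# Assumes the vlm-demo package is installed in the current environment.
set -euo pipefail

CONTROLLER_PORT=10000
CONTROLLER_CMD="python -m serve.controller --host 0.0.0.0 --port ${CONTROLLER_PORT}"
WEB_SERVER_CMD="python -m serve.gradio_web_server --controller http://localhost:${CONTROLLER_PORT} --model-list-mode reload --share"

echo "${CONTROLLER_CMD}"
echo "${WEB_SERVER_CMD}"

if [ "${RUN_SERVERS:-0}" = "1" ]; then
  ${CONTROLLER_CMD} &                     # controller runs in the background
  CONTROLLER_PID=$!
  trap 'kill "${CONTROLLER_PID}"' EXIT    # stop it when this script exits
  sleep 5                                 # give the controller time to bind its port
  ${WEB_SERVER_CMD}                       # web server blocks in the foreground
fi
```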
|
|
|
To run the interactive demo, you can specify a model to chat with via a `model_id` or `model_dir` as follows:
|
|
|
+ `python -m interactive_demo --port 40000 --model_id <MODEL_ID>` OR |
|
+ `python -m interactive_demo --port 40000 --model_dir <MODEL_DIR>` |
|
|
|
If you want to chat with multiple models simultaneously, you can launch the `interactive_demo` script in different terminals. |
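
The multi-terminal setup can be sketched as a small helper that gives each model its own port. This is only an illustration: the `launch_commands` function name and the second model ID are made up for the sketch, and some model families (e.g., LLaVA, InstructBLIP) need additional `--model_family`/`--model_dir` flags as described later in this README.

```shell
# Sketch: print one interactive_demo launch command per model,
# assigning each model its own port starting at 40000.
# Run each printed command in its own terminal.
launch_commands() {
  local port=40000
  local model_id
  for model_id in "$@"; do
    echo "python -m interactive_demo --port ${port} --model_id ${model_id}"
    port=$((port + 1))
  done
}

# "some-other-model" is a placeholder model ID, not one shipped with the repo.
launch_commands prism-dinosiglip+7b some-other-model
```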
|
|
|
When running the demo, the following parameters are adjustable: |
|
+ Temperature |
|
+ Max output tokens |
|
|
|
The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other |
|
interaction modes for more specific use cases: |
|
+ Captioning: Here, you can simply upload an image with no prompt and the selected model will output a caption. Even if a prompt

is input by the user, it will not be used in producing the caption.

+ Bounding Box Prediction: After uploading an image, simply specify in the prompt the portion of the image for which bounding box

coordinates are desired, and the selected model will output the corresponding coordinates.

+ Visual Question Answering: Selecting this option is best when the user wants a short, succinct answer to a specific question provided in the

prompt.

+ True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the

prompt.
|
|
|
## Example |
|
|
|
To chat with the LLaVA 1.5 (7B) and Prism 7B models in an interactive GUI, run the following scripts in separate terminals.
|
|
|
Launch gradio controller: |
|
|
|
`python -m serve.controller --host 0.0.0.0 --port 10000` |
|
|
|
Launch web server: |
|
|
|
`python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share` |
|
|
|
Now we can launch an interactive demo for each of the models we want to chat with. For Prism models, you

only need to specify a `model_id`, while for LLaVA and InstructBLIP, you additionally need to specify a `model_family`

and `model_dir`. Note that a different port must be specified for each model.
|
|
|
Launch interactive demo for Prism 7B Model: |
|
|
|
`python -m interactive_demo --port 40000 --model_id prism-dinosiglip+7b` |
|
|
|
Launch interactive demo for LLaVA 1.5 7B Model: |
|
|
|
`python -m interactive_demo --port 40001 --model_family llava-v15 --model_id llava-v1.5-7b --model_dir liuhaotian/llava-v1.5-7b` |
|
|
|
## Contributing |
|
|
|
Before committing to the repository, *make sure to set up your dev environment!* |
|
|
|
Here are the basic development environment setup guidelines: |
|
|
|
+ Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies |
|
(e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`. |
|
|
|
+ Install `pre-commit` hooks (`pre-commit install`). |
|
|
|
+ Branch for the specific feature/issue, and issue a PR against the upstream repository for review.
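
Collected into one sequence, the setup above looks roughly like this (a sketch; `<YOUR_USERNAME>` is a placeholder for your own fork):

```shell
# Sketch of the dev-environment setup described above.
# <YOUR_USERNAME> is a placeholder for the account hosting your fork.
git clone git@github.com:<YOUR_USERNAME>/vlm-demo.git
cd vlm-demo

# Editable install with the development dependencies (black, ruff, pre-commit).
pip install -e ".[dev]"

# Install the pre-commit hooks so linting runs on every commit.
pre-commit install
```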
|
|
|
|