title: VLM Demo
sdk: gradio
sdk_version: 3.35.2
app_file: serve/gradio_web_server.py
VLM Demo
VLM Demo: Lightweight repo for chatting with models loaded into VLM Bench.
Installation
This repository can be installed as follows:
git clone git@github.com:TRI-ML/vlm-demo.git
cd vlm-demo
pip install -e .
This repository also requires that the vlm-bench package (vlbench) and the prismatic-vlms package (prisma) are installed in the current environment.
Both can be installed from source from the following git repos:
- vlm-bench: https://github.com/TRI-ML/vlm-bench
- prismatic-vlms: https://github.com/TRI-ML/prismatic-vlms
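Before launching the demo, it can help to confirm that both dependencies are importable. The following is a minimal sketch, assuming the import names are exactly vlbench and prisma as given above; the helper name missing_packages is hypothetical and not part of this repository.

```python
import importlib.util

# Import names taken from the installation notes; adjust them if your
# installed versions expose different top-level module names.
REQUIRED_PACKAGES = ("vlbench", "prisma")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty result means both packages were found; any name that is returned still needs to be installed from its repo.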
Usage
The main script to run is interactive_demo.py, while the implementations of the Gradio Controller (serve/gradio_controller.py) and the Gradio Web Server (serve/gradio_web_server.py) live in serve.
All of this code is heavily adapted from the LLaVA GitHub repo. More details on how this code was modified from the original LLaVA repo are provided in the relevant source files.
To run the demo, run the following commands:
- Start Gradio Controller:
python -m serve.controller --host 0.0.0.0 --port 10000
- Start Gradio Web Server:
python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share
- Run interactive demo:
CUDA_VISIBLE_DEVICES=0 python -m interactive_demo --port 40000 --model_dir <PATH TO MODEL CKPT>
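The three commands above are normally run in separate terminals. As a rough sketch, they could also be started in order from one Python script. The helper names build_commands and launch_all and the startup_delay pause are assumptions made for illustration; the module names, flags, and ports are copied from the commands above, with the current interpreter substituted for python.

```python
import os
import subprocess
import sys
import time


def build_commands(model_dir, controller_port=10000, worker_port=40000):
    """Return the three argv lists for the demo, in launch order."""
    controller = [sys.executable, "-m", "serve.controller",
                  "--host", "0.0.0.0", "--port", str(controller_port)]
    web_server = [sys.executable, "-m", "serve.gradio_web_server",
                  "--controller", f"http://localhost:{controller_port}",
                  "--model-list-mode", "reload", "--share"]
    worker = [sys.executable, "-m", "interactive_demo",
              "--port", str(worker_port), "--model_dir", str(model_dir)]
    return [controller, web_server, worker]


def launch_all(model_dir, gpu="0", startup_delay=5.0):
    """Start controller, web server, and model worker as child processes.

    startup_delay is an assumed pause between launches, not a documented
    requirement. Only the worker gets CUDA_VISIBLE_DEVICES, as in the
    original command.
    """
    controller, web_server, worker = build_commands(model_dir)
    worker_env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    procs = [subprocess.Popen(controller)]
    time.sleep(startup_delay)
    procs.append(subprocess.Popen(web_server))
    time.sleep(startup_delay)
    procs.append(subprocess.Popen(worker, env=worker_env))
    return procs
```

Calling launch_all with the path to a model checkpoint returns the three process handles, which can later be stopped with their terminate() method.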
When running the demo, the following parameters are adjustable:
- Temperature
- Max output tokens
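For intuition about the temperature setting, the sketch below shows the standard temperature-scaled softmax used when sampling from a language model. It is a generic illustration rather than this demo's code, and the function name softmax_with_temperature is hypothetical. Max output tokens, by contrast, caps how many tokens the model may generate in a response.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, giving more deterministic output; higher temperatures spread it out, giving more varied output.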
The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other interaction modes for more specific use cases:
- Captioning: Upload an image with no prompt and the selected model will output a caption. Any prompt entered by the user is ignored when producing the caption.
- Bounding Box Prediction: After uploading an image, describe in the prompt the portion of the image for which bounding box coordinates are desired; the selected model will output the corresponding coordinates.
- Visual Question Answering: Selecting this option is best when the user wants short, succinct answers to a specific question provided in the prompt.
- True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the prompt.
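The way each mode treats the user's prompt can be summarized with a small sketch. Everything here is hypothetical: the Mode enum, the PROMPT_REQUIRED set, and the effective_prompt function do not come from this repository. The only behavior taken from the descriptions above is that Captioning ignores the prompt and that the three question-style modes rely on one.

```python
from enum import Enum


class Mode(Enum):
    CHAT = "Chat"
    CAPTIONING = "Captioning"
    BOUNDING_BOX = "Bounding Box Prediction"
    VQA = "Visual Question Answering"
    TRUE_FALSE = "True/False Question Answering"


# Modes whose descriptions say the request is specified in the prompt.
PROMPT_REQUIRED = {Mode.BOUNDING_BOX, Mode.VQA, Mode.TRUE_FALSE}


def effective_prompt(mode, user_prompt):
    """Return the prompt text a mode would use, or None if the mode ignores it."""
    if mode is Mode.CAPTIONING:
        return None  # captioning ignores any user-supplied prompt
    text = (user_prompt or "").strip()
    if mode in PROMPT_REQUIRED and not text:
        raise ValueError(f"{mode.value} needs a prompt")
    return text
```

For example, effective_prompt(Mode.CAPTIONING, "describe this") returns None, while the same call with Mode.VQA returns the stripped prompt text.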
Contributing
Before committing to the repository, make sure to set up your dev environment!
Here are the basic development environment setup guidelines:
- Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies (e.g., pip install -e ".[dev]"); this will install black, ruff, and pre-commit.
- Install pre-commit hooks (pre-commit install).
- Branch for the specific feature/issue, issuing a PR against the upstream repository for review.