---
title: VLM Demo
sdk: docker
sdk_version: 3.35.2
app_file: serve/gradio_web_server.py
---
# VLM Demo
> *VLM Demo*: Lightweight repo for chatting with models loaded into *VLM Bench*.
---
## Installation
This repository can be installed as follows:
```bash
git clone git@github.com:TRI-ML/vlm-demo.git
cd vlm-demo
pip install -e .
```
This repository also requires that the `vlm-evaluation` package (`vlm_eval`) is
installed in the current environment. Installation instructions can be found
[here](https://github.com/TRI-ML/vlm-evaluation/tree/main).
## Usage
The main script to run is `interactive_demo.py`; the implementations of the
Gradio Controller (`serve/gradio_controller.py`) and the Gradio Web Server
(`serve/gradio_web_server.py`) live in `serve`. All of this code is heavily
adapted from the [LLaVA GitHub repo](https://github.com/haotian-liu/LLaVA/blob/main/llava/serve/).
Details on how this code was modified from the original LLaVA repo are provided in the
relevant source files.
To run the demo, start the following processes in order (each one is long-running, so use a separate terminal for each):
+ Start Gradio Controller: `python -m serve.controller --host 0.0.0.0 --port 10000`
+ Start Gradio Web Server: `python -m serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload --share`
+ Run interactive demo: `CUDA_VISIBLE_DEVICES=0 python -m interactive_demo --port 40000 --model_dir <PATH TO MODEL CKPT>`
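If you prefer a single entry point, the three commands above can be wrapped in a small bash script. This is a minimal sketch, not part of the repo: the file name `launch_demo.sh`, the `logs/` directory, and the `wait_for_port` / `launch_demo` helpers are hypothetical, and the script assumes bash (for its `/dev/tcp` redirection) plus the exact ports and flags shown above. It waits until the controller accepts connections on port 10000 before starting the web server, and stops the background processes when the interactive demo exits.

```bash
#!/usr/bin/env bash
# launch_demo.sh: hypothetical helper, NOT shipped with vlm-demo.
# Wraps the three README commands; ports and flags are copied from the README.
set -euo pipefail

# Poll once per second until host:port accepts a TCP connection, or time out.
# Uses bash's built-in /dev/tcp redirection, so no extra tools (nc, curl) are needed.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} i=0
  while [ "$i" -lt "$timeout" ]; do
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}

launch_demo() {
  local model_dir=$1
  mkdir -p logs
  # When this script exits, stop the background controller and web server.
  trap 'kill $(jobs -p) 2>/dev/null || true' EXIT

  python -m serve.controller --host 0.0.0.0 --port 10000 >logs/controller.log 2>&1 &
  wait_for_port 127.0.0.1 10000

  python -m serve.gradio_web_server --controller http://localhost:10000 \
    --model-list-mode reload --share >logs/web_server.log 2>&1 &

  # Runs in the foreground; when it exits (e.g. after Ctrl-C), the EXIT trap fires.
  CUDA_VISIBLE_DEVICES=0 python -m interactive_demo --port 40000 --model_dir "$model_dir"
}

if [ "$#" -eq 1 ]; then
  launch_demo "$1"
else
  echo "usage: $0 <PATH TO MODEL CKPT>" >&2
fi
```

For example, `bash launch_demo.sh <PATH TO MODEL CKPT>`; controller and web-server output then goes to `logs/controller.log` and `logs/web_server.log` instead of the terminal.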
When running the demo, the following parameters are adjustable:
+ Temperature
+ Max output tokens
The default interaction mode is Chat, which is the main way to use our models. However, we also support a number of other
interaction modes for more specific use cases:
+ Captioning: Upload an image with no prompt, and the selected model will output a caption. Any prompt the user
enters is ignored when producing the caption.
+ Bounding Box Prediction: After uploading an image, describe in the prompt the region of the image whose bounding box
coordinates you want; the selected model will output the corresponding coordinates.
+ Visual Question Answering: Selecting this option is best when the user wants short, succinct answers to a specific question provided in the
prompt.
+ True/False Question Answering: Selecting this option is best when the user wants a True/False answer to a specific question provided in the
prompt.
## Contributing
Before committing to the repository, *make sure to set up your dev environment!*
Here are the basic development environment setup guidelines:
+ Fork/clone the repository, performing an editable installation. Make sure to install with the development dependencies
(e.g., `pip install -e ".[dev]"`); this will install `black`, `ruff`, and `pre-commit`.
+ Install `pre-commit` hooks (`pre-commit install`).
+ Create a branch for the specific feature/issue, and open a PR against the upstream repository for review.
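As a quick sanity check after the steps above, a small shell function like the following reports whether the tools installed by the `[dev]` extra (`black`, `ruff`, `pre-commit`) are on your `PATH`. It is a hypothetical helper, not something shipped with the repo; the name `check_dev_env` is illustrative only.

```bash
# check_dev_env: hypothetical helper, NOT part of vlm-demo.
# Returns non-zero and prints a hint for every dev tool missing from PATH.
check_dev_env() {
  local status=0 tool
  for tool in black ruff pre-commit; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing $tool: run pip install -e \".[dev]\"" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run `check_dev_env` from the repository root after installing. To run every hook once over the whole tree before opening a PR, `pre-commit run --all-files` is the standard `pre-commit` command.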