File size: 1,943 Bytes
4b62e65 873e6c4 7486058 873e6c4 4b62e65 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
---
license: cc
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
language:
- en
---
# Model Card for LLaVA-LLaMA-3-8B
<!-- Provide a quick summary of what the model is/does. -->
A reproduced LLaVA LVLM based on Llama-3-8B LLM backbone. Not an official implementation.
## Model Details
Follows LLavA-1.5 pre-train and supervised fine-tuning data.
## How to Use
Please firstly install llava via
```
pip install llava==1.1.2
```
You can load the model and perform inference as follows:
```python
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.mm_utils import tokenizer_image_token, process_images, get_model_name_from_path
from PIL import Image
import requests
# load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = get_model_name_from_path(weizhiwang/LLaVA-Llama-3-8B)
tokenizer, model, image_processor, context_len = load_pretrained_model(weizhiwang/LLaVA-Llama-3-8B, None, model_name, False, False, device=device)
# prepare inputs for the model
text = '<image>' + '\n' + "Describe the image."
conv.append_message(conv.roles[0], text)
conv.append_message(conv.roles[1], None)
url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().cuda()
# autoregressively generate text
with torch.inference_mode():
output_ids = model.generate(
input_ids,
images=image_tensor,
do_sample=False,
max_new_tokens=512,
use_cache=True)
outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)
print(outputs[0])
```
Please refer to a forked [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3) git repo for usage. The data loading function and fastchat conversation template are changed due to a different tokenizer.
|