weizhiwang
/

LLaVA-Llama-3-8B

Text Generation

Inference Endpoints

Model card Files Files and versions Community

LLaVA-Llama-3-8B / README.md

weizhiwang's picture

Update README.md

7486058 verified 7 months ago

|

1.94 kB

	---
	license: cc
	datasets:
	- liuhaotian/LLaVA-Instruct-150K
	- liuhaotian/LLaVA-Pretrain
	language:
	- en
	---

	# Model Card for LLaVA-LLaMA-3-8B

	<!-- Provide a quick summary of what the model is/does. -->

	A reproduced LLaVA LVLM based on Llama-3-8B LLM backbone. Not an official implementation.

	## Model Details
	Follows LLavA-1.5 pre-train and supervised fine-tuning data.

	## How to Use

	Please firstly install llava via
	```
	pip install llava==1.1.2
	```

	You can load the model and perform inference as follows:
	```python
	from llava.conversation import conv_templates, SeparatorStyle
	from llava.model.builder import load_pretrained_model
	from llava.mm_utils import tokenizer_image_token, process_images, get_model_name_from_path
	from PIL import Image
	import requests

	# load model and processor
	device = "cuda" if torch.cuda.is_available() else "cpu"
	model_name = get_model_name_from_path(weizhiwang/LLaVA-Llama-3-8B)
	tokenizer, model, image_processor, context_len = load_pretrained_model(weizhiwang/LLaVA-Llama-3-8B, None, model_name, False, False, device=device)

	# prepare inputs for the model
	text = '<image>' + '\n' + "Describe the image."
	conv.append_message(conv.roles[0], text)
	conv.append_message(conv.roles[1], None)
	url = "https://huggingface.co/adept/fuyu-8b/resolve/main/bus.png"
	image_tensor = image_processor.preprocess(image, return_tensors='pt')['pixel_values'].half().cuda()

	# autoregressively generate text
	with torch.inference_mode():
	output_ids = model.generate(
	input_ids,
	images=image_tensor,
	do_sample=False,
	max_new_tokens=512,
	use_cache=True)

	outputs = tokenizer.batch_decode(output_ids[:, input_ids.shape[1]:], skip_special_tokens=True)
	print(outputs[0])
	```



	Please refer to a forked [LLaVA-Llama-3](https://github.com/Victorwz/LLaVA-Llama-3) git repo for usage. The data loading function and fastchat conversation template are changed due to a different tokenizer.