Loading images locally?

by fusi0n - opened Nov 26, 2024

Nov 26, 2024

I can't seem to get the model to recognize any local images. I've tried loading them with PIL, e.g. Image.open("./test/test.jpg"), but no luck. Any ideas?

andito

Hugging Face TB Research org Nov 27, 2024

Have you tried this?

```python
from transformers.image_utils import load_image
image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
```
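To my knowledge, `load_image` also accepts a local file path, not just a URL, so a relative path is a common suspect: a path like `./codespace/image1.jpg` is resolved against the current working directory, not the folder the script lives in. A minimal sketch, using a hypothetical helper `resolve_local_image` (not part of transformers), that anchors a relative path to a chosen base folder and fails loudly if the file isn't there:

```python
from pathlib import Path
from typing import Optional


def resolve_local_image(path_str: str, base_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: turn a possibly relative image path into an absolute one.

    Relative paths are joined to base_dir, which defaults to the current working
    directory (the same place open() and PIL would look).
    """
    if path_str.startswith(("http://", "https://")):
        raise ValueError("This helper is for local files only; pass URLs to load_image directly")
    path = Path(path_str).expanduser()
    if not path.is_absolute():
        path = (base_dir if base_dir is not None else Path.cwd()) / path
    path = path.resolve()
    if not path.is_file():
        # Show both the resolved path and the cwd, since a wrong cwd is the usual culprit
        raise FileNotFoundError(f"No image file at {path} (cwd is {Path.cwd()})")
    return path
```

For example, `resolve_local_image("codespace/image1.jpg", base_dir=Path(__file__).parent)` ties the path to the script's folder; pass `str(...)` of the result to `load_image` or `Image.open`.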

fusi0n

Nov 28, 2024

I have, and that works fine. But if I pass a local path like ./codespace/image1.jpg instead, the model does not see the image.

kishtyle

Dec 3, 2024

If anyone has gotten it running with local images, please post. Also, I found that it only handles .jpg and could not process .png. Can anyone confirm this?
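I can't confirm the .png limitation either way. One plausible cause, and it is only an assumption, is that PNGs often carry an alpha channel (mode "RGBA") or a palette (mode "P") rather than plain 3-channel RGB. Normalizing every image to RGB with Pillow before handing it to the processor rules that out. A minimal sketch with a hypothetical `load_rgb` helper:

```python
from PIL import Image


def load_rgb(path: str) -> Image.Image:
    """Hypothetical helper: open a local image (JPEG, PNG, ...) as 3-channel RGB."""
    with Image.open(path) as img:
        # convert() reads the pixel data, so the result stays valid after the file closes.
        # For RGBA input the alpha channel is simply dropped.
        return img.convert("RGB")
```

The result, e.g. `image = load_rgb("./test/test.png")`, can then be passed as `images=[image]` to the processor.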

ctranslate2-4you

Dec 3, 2024


edited Dec 3, 2024

Here ya go... it's run a little differently when processing a local file. Also, please note:

- I opted to use the native prompt format because I like seeing it spelled out, and I don't like using `apply_chat_template`.
- I use a custom `set_cuda_paths` function at the top because I prefer pip-installing these libraries rather than relying on a system-wide installation. If you use a system-wide installation (like most people do), simply remove this function.
- I point to a hardcoded folder containing the model files rather than the Hugging Face repo ID, because I like downloading the files first with `snapshot_download`, where I can actually see them instead of having them buried in my cache. Adjust accordingly.
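Because the prompt is written by hand instead of through `apply_chat_template`, a small builder keeps it consistent. A minimal sketch with a hypothetical `build_prompt` helper that reproduces the single-image prompt string from the script; putting one `<image>` placeholder per image for multi-image prompts is my assumption, so check it against the model's chat template before relying on it:

```python
def build_prompt(question: str, num_images: int = 1) -> str:
    """Hypothetical helper: assemble the native SmolVLM-style prompt used in the script.

    Assumes one "<image>" placeholder per image, placed before the question,
    mirroring the single-image prompt string in the script.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    image_tokens = "<image>" * num_images
    return f"<|im_start|>User:{image_tokens}{question}<end_of_utterance>\nAssistant:"
```

With the script's question, `build_prompt(...)` yields the same string as its `prompt` variable, which then goes to `processor(text=prompt, images=[image], return_tensors="pt")`.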

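```python
import sys
import os
from pathlib import Path

def set_cuda_paths():
    # Windows venv layout: pip-installed NVIDIA wheels put their DLLs under
    # <venv>\Lib\site-packages\nvidia\<package>\bin. Remove this function
    # entirely if you rely on a system-wide CUDA installation.
    venv_base = Path(sys.executable).parent.parent
    nvidia_base_path = venv_base / 'Lib' / 'site-packages' / 'nvidia'
    cuda_path = nvidia_base_path / 'cuda_runtime' / 'bin'
    cublas_path = nvidia_base_path / 'cublas' / 'bin'
    cudnn_path = nvidia_base_path / 'cudnn' / 'bin'
    nvrtc_path = nvidia_base_path / 'cuda_nvrtc' / 'bin'

    paths_to_add = [
        str(cuda_path),
        str(cublas_path),
        str(cudnn_path),
        str(nvrtc_path),
    ]
    env_vars = ['CUDA_PATH', 'PATH']

    for env_var in env_vars:
        current_value = os.environ.get(env_var, '')
        # Prepend the venv's NVIDIA folders, keeping any existing value after them
        new_value = os.pathsep.join(
            (paths_to_add + [current_value]) if current_value else paths_to_add
        )
        os.environ[env_var] = new_value

# Called before importing torch so the updated paths are in place when its libraries load
set_cuda_paths()

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Local image: open it with PIL and pass the PIL object (not the path) to the processor
image_path = r"D:\Scripts\bench_vision\IMG_140531.JPG"
# convert("RGB") normalizes PNGs with an alpha channel or palette to 3 channels
image = Image.open(image_path).convert("RGB")

# Folder downloaded beforehand with snapshot_download
# (or pass the repo id "HuggingFaceTB/SmolVLM-Instruct" instead)
model_dir = r"D:\Scripts\bench_vision\HuggingFaceTB--SmolVLM-Instruct"

processor = AutoProcessor.from_pretrained(model_dir)

model = AutoModelForVision2Seq.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,
    # flash_attention_2 needs the flash-attn package; on CPU fall back to eager attention
    attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
    low_cpu_mem_usage=True,
)

model.to(DEVICE)

# Native prompt format; <image> marks where the processor inserts the image tokens
prompt = """<|im_start|>User:<image>Can you describe this image in detail but be succinct and do not repeat yourself?<end_of_utterance>
Assistant:"""

# The processor tokenizes the prompt and preprocesses the image in one call
inputs = processor(text=prompt, images=[image], return_tensors="pt")
inputs = inputs.to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=500)
# Drop the prompt tokens so only the newly generated answer is decoded
generated_ids = generated_ids[:, inputs['input_ids'].shape[1]:]
generated_texts = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True,
)
print(generated_texts[0])
```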
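To run the same script over a whole folder of local images, you first need the file list. A minimal sketch; the helper name `find_images` and the suffix set are my own assumptions. The extension check is case-insensitive because camera files like `IMG_140531.JPG` often have upper-case extensions that a plain `.endswith(".jpg")` check would skip:

```python
from pathlib import Path
from typing import List

# Assumed set of extensions to treat as images; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def find_images(folder: str) -> List[Path]:
    """Hypothetical helper: list image files in `folder`, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each returned path can then go through `Image.open(path)` and the same `processor(...)` and `model.generate(...)` calls as above, one image per prompt.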

ctranslate2-4you

Dec 8, 2024

> I have. That works fine. But if I include a local directory like ./codespace/image1.jpg, the model does not see the image.

Did it work? I'm always curious whether something works on another platform.
