How to get Image and Caption Embeddings

by Harshad2410 - opened Sep 16

I have an image and a caption associated with it. I want to get the cross-modal embedding of the image and the text together, as a single vector.

shaoyent

BridgeTower org 29 days ago

Hi,
BridgeTowerForContrastiveLearning returns a BridgeTowerContrastiveOutput, and you can read the cross-modal embeddings from its cross_embeds field:

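from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
model = BridgeTowerForContrastiveLearning.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")

# images: a list of PIL images; texts: a list of captions, one per image
inputs  = processor(images, texts, padding=True, return_tensors="pt")
outputs = model(**inputs)

# One row per (image, caption) pair
cross_modal_embeddings = outputs.cross_embeds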
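
If it helps, the snippet above can be wrapped in a small helper that takes one caption per image and returns one vector per pair. This is only a sketch: `embed_pairs`, `demo`, and the file name `photo.jpg` are names made up for illustration, and it assumes `BridgeTowerProcessor` is the processor that goes with this checkpoint. The heavy imports are kept inside `demo()` so the helper itself has no dependencies.

```python
def embed_pairs(model, processor, images, texts):
    """Return one cross-modal vector per (image, caption) pair.

    Hypothetical helper: `model` and `processor` are expected to be a
    BridgeTowerForContrastiveLearning and its matching processor.
    """
    if len(images) != len(texts):
        raise ValueError("need exactly one caption per image")
    inputs = processor(images, texts, padding=True, return_tensors="pt")
    outputs = model(**inputs)
    # cross_embeds holds one row per pair; detach() drops the autograd graph
    return outputs.cross_embeds.detach()


def demo():
    # Imports are local so embed_pairs can be used without loading them.
    from PIL import Image
    from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

    checkpoint = "BridgeTower/bridgetower-large-itm-mlm-itc"
    processor = BridgeTowerProcessor.from_pretrained(checkpoint)
    model = BridgeTowerForContrastiveLearning.from_pretrained(checkpoint)
    model.eval()  # inference mode (disables dropout)

    image = Image.open("photo.jpg").convert("RGB")  # hypothetical local file
    vectors = embed_pairs(model, processor, [image], ["a caption for the photo"])
    print(vectors.shape)  # first dimension is the number of pairs
```

Calling `demo()` downloads the checkpoint on first use. To embed several pairs at once, pass lists of the same length; row `i` of the result corresponds to `images[i]` and `texts[i]`.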
