How to get Image and Caption Embeddings

#2
by Harshad2410 - opened

I have an image and a caption associated with it. I want to get the cross-modal embedding of the image and text together as a single vector.

BridgeTower org

Hi,
BridgeTowerForContrastiveLearning returns a BridgeTowerContrastiveOutput, and you can read the cross-modal embeddings from its cross_embeds field:

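from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

# Load the processor and the model from the same checkpoint
processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
model = BridgeTowerForContrastiveLearning.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")

# images: a list of PIL images; texts: a list of captions, one per image
inputs = processor(images, texts, padding=True, return_tensors="pt")
outputs = model(**inputs)

# Cross-modal embeddings, one row per (image, caption) pair
cross_modal_embeddings = outputs.cross_embeds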
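
Since the question is about a single image and a single caption, here is a minimal sketch of a wrapper that returns just one vector. The name embed_pair is hypothetical and not part of transformers; it assumes the model and processor are the BridgeTower objects loaded from the checkpoint above and simply takes the first row of cross_embeds.

```python
def embed_pair(model, processor, image, caption):
    """Return the cross-modal embedding for one image-caption pair.

    Hypothetical helper: assumes `model` is a
    BridgeTowerForContrastiveLearning instance and `processor` is the
    processor loaded from the same checkpoint.
    """
    # Wrap the single image and caption in lists so the batch has size 1.
    inputs = processor([image], [caption], padding=True, return_tensors="pt")
    outputs = model(**inputs)
    # cross_embeds has one row per pair; return the only row.
    return outputs.cross_embeds[0]
```

You would call it as embed_pair(model, processor, image, "your caption"), where image is a PIL image. For inference only, it may be worth running the call inside torch.no_grad() to avoid tracking gradients.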
