license: mit
base_model:
  - openbmb/MiniCPM-V-2_6

CaReBench: A Fine-grained Benchmark for Video Captioning and Retrieval

Yifan Xu, Xinhao Li, Yichun Yang, Desen Meng, Rui Huang, Limin Wang

🤗 Model | 🤗 Data | 📑 Paper

📝 Introduction

This checkpoint is MiniCPM-V 2.6 trained with Retrieval Adaptation (RA). Refer to our paper for details.

Usage

Loading directly from the Hugging Face remote path has not been tested. We recommend downloading this checkpoint to a local directory first to avoid potential bugs.
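One way to fetch a local copy is `snapshot_download` from the `huggingface_hub` package. The sketch below is a minimal example, not an official script: `REPO_ID` is a placeholder you must replace with this checkpoint's actual Hub ID, and `local_checkpoint_dir` and `download_checkpoint` are hypothetical helper names introduced here for illustration.

```python
from pathlib import Path

# Placeholder (assumption): replace with the real "<namespace>/<name>" Hub ID.
REPO_ID = "<namespace>/MiniCPM-V-2_6-RA"


def local_checkpoint_dir(root="path/to/checkpoints", repo_id=REPO_ID):
    """Return <root>/<repo name>, dropping the namespace part of the Hub ID."""
    return Path(root) / repo_id.split("/")[-1]


def download_checkpoint(root="path/to/checkpoints", repo_id=REPO_ID):
    """Download every file of the repo into the local checkpoint directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_checkpoint_dir(root, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_checkpoint()` once (with network access and a valid `REPO_ID`) should leave the files under `path/to/checkpoints/MiniCPM-V-2_6-RA`, which is the path used in the retrieval example.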

For Retrieval Tasks

# utils.video and models.modeling_encoders are assumed to be project-local modules, not pip packages
from utils.video import read_frames_decord
from models.modeling_encoders import AutoEncoder
from torch.nn.functional import cosine_similarity

# Load the encoder from a local copy of the checkpoint
encoder = AutoEncoder.from_pretrained('path/to/checkpoints/MiniCPM-V-2_6-RA')
# Read 32 frames from the demo video
frames = read_frames_decord(video_path='assets/demo.mp4', num_frames=32)
text = "This video features a man slicing tomatoes in the kitchen."
# unsqueeze(0) adds a leading batch dimension before encoding
vision_emb = encoder.encode_vision(frames.unsqueeze(0))
text_emb = encoder.encode_text(text)
print(f'Vision embedding shape: {vision_emb.shape}')
print(f'Text embedding shape: {text_emb.shape}')
print(f'Cosine similarity: {cosine_similarity(vision_emb, text_emb)}')
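The snippet above scores a single video-text pair. For retrieval, the same cosine similarity is typically computed against many candidates, which are then ranked by score. The following is a minimal sketch of that ranking step, not the evaluation code from the paper. It uses plain Python lists with toy vectors so it runs without the checkpoint; `cosine` and `rank_candidates` are hypothetical helper names, and real embeddings would come from `encode_vision` and `encode_text`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(query_emb, candidate_embs):
    """Return (indices sorted from most to least similar, raw scores)."""
    scores = [cosine(query_emb, c) for c in candidate_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy stand-ins for embeddings (assumption: illustrative values only).
video_emb = [1.0, 0.0, 1.0]
caption_embs = [
    [0.0, 1.0, 0.0],  # orthogonal to the video embedding
    [2.0, 0.0, 2.0],  # same direction, different magnitude
    [1.0, 1.0, 0.0],  # partially aligned
]

order, scores = rank_candidates(video_emb, caption_embs)
print(order)
```

Because cosine similarity ignores vector magnitude, the second caption ranks first even though its norm differs from the query's.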