Image Caption Model

Model description

The model generates a Chinese title for an arbitrary movie poster. It pairs a BEiT image encoder with a GPT2 text decoder.

Training Data

The training data consists of 5043 movie posters and their corresponding Chinese titles, collected using Movie-Title-Post.

How to use

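from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer
from PIL import Image

# Load the model, image preprocessor, and tokenizer from the same checkpoint.
pretrained = "snzhang/FilmTitle-Beit-GPT2"
model = VisionEncoderDecoderModel.from_pretrained(pretrained)
feature_extractor = ViTFeatureExtractor.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# Generation settings. The original snippet used gen_kwargs without defining it;
# these values are assumptions, so adjust them as needed.
gen_kwargs = {"max_length": 16, "num_beams": 4}

# Load the poster and make sure it has three colour channels.
image_path = "your image path"
image = Image.open(image_path)
if image.mode != "RGB":
    image = image.convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate token ids and decode them into title strings.
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
preds = [pred.strip() for pred in preds]
print(preds)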
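
Depending on the tokenizer, decoded Chinese titles can come back with a space between every character. Whether this checkpoint does so is not stated here, so the following is only a sketch under that assumption. The helper name clean_title is hypothetical and not part of the model or of transformers. It trims the string and removes whitespace only where it sits between two Chinese characters, leaving spaces in Latin text untouched.

```python
import re

# Range of common CJK unified ideographs (an assumption about which
# characters the titles contain; extend it if other scripts are needed).
_CJK = r"\u4e00-\u9fff"

# Whitespace that has a Chinese character on both sides.
_GAP = re.compile(rf"(?<=[{_CJK}])\s+(?=[{_CJK}])")


def clean_title(text: str) -> str:
    """Trim the title and drop spaces between adjacent Chinese characters."""
    return _GAP.sub("", text.strip())
```

It could be applied to the decoded predictions in place of the plain strip, for example: preds = [clean_title(pred) for pred in preds]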

More Details

See FilmTitle-Beit-GPT2 for more training details.

Model size: 264M params (Safetensors). Tensor types: I64, FP16, BOOL.