
License

The license is inherited from Salesforce/blip-image-captioning-base.

Overview

ifmain/blip-image2promt-stable-diffusion-base is a model based on Salesforce/blip-image-captioning-base, trained on 2K images from the Ar4ikov/civitai-sd-337k dataset. It is designed to generate text descriptions of images in the style of Stable Diffusion prompts.

The model was trained with my BLIP training code: BLIP-Easy-Trainer.

Example Usage

import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import re

def prepare(text):
    # Normalize spacing around punctuation commonly used in SD prompts
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop empty emphasis groups the model sometimes emits
    text = text.replace(',(())', '').replace('(()),', '')
    # Collapse runs of nested parentheses down to double emphasis
    for i in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Strip anything in angle brackets (e.g. embedded <lora:...> tags)
    text = re.sub(r'<[^>]*>', '', text)
    return text

path_to_model = "ifmain/blip-image2promt-stable-diffusion-base"

# Load the fine-tuned processor and model in half precision on the GPU
processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")

# Load a demo image from the web
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=100)

out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt)) # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
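
The base BLIP architecture also supports conditional captioning, where a text prefix steers generation. Below is a minimal sketch using the standard BlipForConditionalGeneration API; note that it is untested whether this fine-tune benefits from a prefix:

# Conditional image captioning (standard BLIP API; untested with this fine-tune)
prefix = "a photography of"
inputs = processor(raw_image, prefix, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=100)
print(prepare(processor.decode(out[0], skip_special_tokens=True)))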

Additional Notes

This model supports both SFW and NSFW content.
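
Since the output is formatted as a Stable Diffusion prompt, it can be passed directly to a text-to-image pipeline. The following is a hypothetical sketch using the diffusers library; the runwayml/stable-diffusion-v1-5 checkpoint is only an example and is not part of this model:

# Hypothetical downstream use: render the cleaned caption with Stable Diffusion.
# The checkpoint below is an arbitrary example; any SD checkpoint will do.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = prepare(out_txt)  # cleaned caption from the example above
image = pipe(prompt).images[0]
image.save("generated.png")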

