---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---

# Overview

`ifmain/blip-image2prompt-stable-diffusion` is a model based on `Salesforce/blip-image-captioning-base` and fine-tuned on 2K images from the `Ar4ikov/civitai-sd-337k` dataset. It generates text descriptions of images in the style of prompts for Stable Diffusion models.

The model was trained with my BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer).

# Example Usage

```python
import re

import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor


def prepare(text):
    """Clean up a raw caption so it reads like a Stable Diffusion prompt."""
    # Remove stray spaces around punctuation and prompt-weighting brackets
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop stray empty '(())' groups and collapse runs of three or more parentheses down to two
    text = text.replace(',(())', '').replace('(()),', '')
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Strip anything in angle brackets (e.g., <lora:...> tags)
    text = re.sub(r'<[^>]*>', '', text)
    return text


path_to_model = "ifmain/blip-image2promt-stable-diffusion"

processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(
    path_to_model, torch_dtype=torch.float16
).to("cuda")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt))
# woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```

## Addition

This model supports both SFW and NSFW content.
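
## Conditional captioning

The base model (`Salesforce/blip-image-captioning-base`) also supports conditional captioning, where a text prefix steers the generated caption. The snippet below is a minimal sketch, not part of the original example: it assumes this fine-tune retains that behaviour, reuses the imports and objects (`processor`, `model`, `raw_image`, `prepare`) from the example above, and uses a purely illustrative prefix string.

```python
# Conditional captioning sketch: the caption is generated as a continuation of
# a text prefix. Assumes `torch`, `processor`, `model`, `raw_image`, and
# `prepare` are already defined as in the example above; the prefix below is
# an arbitrary illustration, not a value taken from the training data.
prefix = "a woman sitting on the beach"

inputs = processor(raw_image, prefix, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=100)
print(prepare(processor.decode(out[0], skip_special_tokens=True)))
```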