---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---

# Overview

`ifmain/blip-image2prompt-stable-diffusion` is a model based on `Salesforce/blip-image-captioning-base`, fine-tuned on 2K images from the `Ar4ikov/civitai-sd-337k` dataset. It generates text descriptions of images in the style of prompts for Stable Diffusion models.

I trained it with my BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer)

# Example Usage

```python
import re

import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor


def prepare(text):
    # Remove spaces around punctuation and brackets in the decoded text
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop empty emphasis groups such as "(())"
    text = text.replace(',(())', '').replace('(()),', '')
    # Repeatedly shrink runs of three or more parentheses toward two
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Remove any leftover <...> markup
    text = re.sub(r'<[^>]*>', '', text)
    return text


path_to_model = "ifmain/blip-image2promt-stable-diffusion"

# Half precision on GPU; fall back to float32 on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=dtype).to(device)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Unconditional image captioning (no text prompt)
inputs = processor(raw_image, return_tensors="pt").to(device, dtype)

out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt))  # e.g. woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
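The `prepare` helper only post-processes the decoded string, so its effect can be checked without downloading the model. The sketch below restates it compactly (the same replacements, applied in the same order) and runs it on two hand-written strings that imitate the spaced-out tokens of decoded output. These inputs are hypothetical and are not real model output.

```python
import re

# Replacement pairs applied in order; same steps as the replace() chain in prepare()
REPLACEMENTS = [
    ('. ', '.'), (' .', '.'),
    ('( ', '('), (' (', '('),
    (') ', ')'), (' )', ')'),
    (': ', ':'), (' :', ':'),
    ('_ ', '_'), (' _', '_'),
    (',(())', ''), ('(()),', ''),  # drop empty emphasis groups
]


def prepare(text):
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    # Repeatedly shrink runs of three or more parentheses toward two
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Remove any leftover <...> markup
    return re.sub(r'<[^>]*>', '', text)


# Hypothetical decoded strings, written by hand for illustration
print(prepare("a cat, ( ( cute ) ), ( ( ( fluffy ) ) ), ( ( ) ), masterpiece"))
# → a cat,((cute)),((fluffy)), masterpiece
print(prepare("( sunset : 1. 2 )"))
# → (sunset:1.2)
```

The clean-up is purely string-based, so spacing it does not target (for example, a space before a comma) passes through unchanged.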

## Additional Notes

This model supports both SFW and NSFW content.