---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---
# Overview
`ifmain/blip-image2prompt-stable-diffusion` is a fine-tune of `Salesforce/blip-image-captioning-base`, trained on 2K images from the `Ar4ikov/civitai-sd-337k` dataset. It is designed to generate text descriptions of images in the style of prompts for Stable Diffusion models.
I used my own BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer).
# Example Usage
```python
import re

import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration


def prepare(text):
    # Normalize spacing around prompt punctuation and weight brackets
    text = text.replace('. ', '.').replace(' .', '.')
    text = text.replace('( ', '(').replace(' (', '(')
    text = text.replace(') ', ')').replace(' )', ')')
    text = text.replace(': ', ':').replace(' :', ':')
    text = text.replace('_ ', '_').replace(' _', '_')
    # Drop empty weight groups
    text = text.replace(',(())', '').replace('(()),', '')
    # Collapse runs of more than two parentheses
    for _ in range(10):
        text = text.replace(')))', '))').replace('(((', '((')
    # Strip leftover tag-like fragments, e.g. <lora:...>
    text = re.sub(r'<[^>]*>', '', text)
    return text


path_to_model = "ifmain/blip-image2prompt-stable-diffusion"
processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)
print(prepare(out_txt))  # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
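The example above assumes a CUDA GPU. On a CPU-only machine the same checkpoint can be loaded in the default float32 precision; this is a minimal sketch reusing `path_to_model`, `processor`, `raw_image`, and `prepare` from above, not a separately validated path:

```python
# CPU-only variant (assumes no CUDA device; fp16 inference is poorly supported on CPU)
model = BlipForConditionalGeneration.from_pretrained(path_to_model)  # default float32

inputs = processor(raw_image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(prepare(processor.decode(out[0], skip_special_tokens=True)))
```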
## Additional Notes
This model supports both SFW and NSFW content.
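Since the generated captions are already formatted as Stable Diffusion prompts, they can be passed straight to a text-to-image pipeline. Below is a minimal sketch using the `diffusers` library; the `runwayml/stable-diffusion-v1-5` checkpoint is an illustrative assumption, and `out_txt`/`prepare` come from the captioning example above:

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; any SD 1.x model should accept prompts in this style
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt_text = prepare(out_txt)  # cleaned caption from the example above
image = pipe(prompt_text).images[0]
image.save("generated.png")
```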