---
datasets:
- Ar4ikov/civitai-sd-337k
language:
- en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---

# Overview
`ifmain/blip-image2prompt-stable-diffusion` is a model based on `Salesforce/blip-image-captioning-base`, fine-tuned on 2K images from the `Ar4ikov/civitai-sd-337k` dataset. It is designed to generate text descriptions of images in the style of Stable Diffusion prompts.

I used my BLIP training code: [BLIP-Easy-Trainer](https://github.com/ifmain/BLIP-Easy-Trainer)

# Example Usage
```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import re

def prepare(text):
    """Clean up the raw caption so it reads like a Stable Diffusion prompt."""
    # Remove stray spaces around punctuation, colons, underscores and weighting brackets
    text = text.replace('. ','.').replace(' .','.')
    text = text.replace('( ','(').replace(' (','(')
    text = text.replace(') ',')').replace(' )',')')
    text = text.replace(': ',':').replace(' :',':')
    text = text.replace('_ ','_').replace(' _','_')
    # Drop empty weighting groups such as ",(())"
    text = text.replace(',(())','').replace('(()),','')
    # Collapse over-nested parentheses, e.g. "(((" -> "(("
    for _ in range(10):
        text = text.replace(')))','))').replace('(((','((')
    # Strip anything that looks like an HTML/special tag
    text = re.sub(r'<[^>]*>', '', text)
    return text

path_to_model = "ifmain/blip-image2promt-stable-diffusion"

# Load the processor and the fine-tuned model (fp16 weights on a CUDA GPU)
processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")

# Download a test image
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=100)

out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt)) # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
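
BLIP also supports conditional captioning, where a short text prefix seeds the generated caption. The snippet below is a minimal sketch reusing the `processor`, `model`, `raw_image`, and `prepare` objects from the example above; the prefix string is only an illustrative assumption, not a documented feature of this fine-tune.

```python
# Conditional captioning (sketch): seed the generation with a prompt prefix.
# The prefix below is an arbitrary example, not part of this model's training setup.
prefix = "masterpiece, best quality,"
inputs = processor(raw_image, prefix, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=100)
print(prepare(processor.decode(out[0], skip_special_tokens=True)))
```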

## Additional Notes

This model supports both SFW and NSFW content.