---
license: mit
datasets:
  - Ar4ikov/civitai-sd-337k
language:
  - en
pipeline_tag: image-to-text
base_model: Salesforce/blip-image-captioning-base
---

# Overview
`ifmain/blip-image2prompt-stable-diffusion` is a fine-tuned version of `Salesforce/blip-image-captioning-base`, trained on 2K images from the `Ar4ikov/civitai-sd-337k` dataset. It generates text descriptions of images in the style of Stable Diffusion prompts.

# Example Usage
```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import re

def prepare(text):
    # Clean up the raw caption so it reads like a Stable Diffusion prompt:
    # remove stray spaces around punctuation used for prompt weighting.
    text = text.replace('. ','.').replace(' .','.')
    text = text.replace('( ','(').replace(' (','(')
    text = text.replace(') ',')').replace(' )',')')
    text = text.replace(': ',':').replace(' :',':')
    text = text.replace('_ ','_').replace(' _','_')
    # Drop empty weight groups and collapse overly deep nesting like "(((".
    text = text.replace(',(())','').replace('(()),','')
    for i in range(10):
        text = text.replace(')))','))').replace('(((','((')
    # Strip any HTML-like tags the model may emit.
    text = re.sub(r'<[^>]*>', '', text)
    return text

path_to_model = "ifmain/blip-image2promt-stable-diffusion"

processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg' 
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)

out = model.generate(**inputs, max_new_tokens=100)

out_txt = processor.decode(out[0], skip_special_tokens=True)

print(prepare(out_txt)) # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
```
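
The example above hard-codes a CUDA device and half precision, so it will not run on CPU-only machines. Below is a minimal device-agnostic sketch (an addition to the original card, under the assumption that float32 on CPU is acceptable); the `prepare()` cleanup from the example above can be applied to its output in the same way.

```python
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Pick the device automatically; half precision is only used on GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

path_to_model = "ifmain/blip-image2prompt-stable-diffusion"
processor = BlipProcessor.from_pretrained(path_to_model)
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=dtype).to(device)

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

inputs = processor(raw_image, return_tensors="pt").to(device, dtype)
out = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(out[0], skip_special_tokens=True))  # pass through prepare() for a cleaner prompt
```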

## Additional Notes

This model supports both SFW and NSFW content.