MiaoshouAI
committed on
Update README.md
README.md
CHANGED
@@ -1,3 +1,55 @@
- ---
- license:
- ---
---
license: mit
---
# Florence-2-base-PromptGen

Florence-2-base-PromptGen is a model trained for [MiaoshouAI Tagger for ComfyUI](https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger).
It is an image captioning tool based on the [Microsoft Florence-2 model](https://huggingface.co/microsoft/Florence-2-base), fine-tuned for prompt and tag generation.

## Why another tagging model?
Most vision models today are trained mainly for general-purpose visual recognition, but prompting and image tagging for model training call for captions with a quite different format and level of detail.

Florence-2-base-PromptGen is trained for exactly this purpose: to improve the experience and accuracy of prompting and tagging. It is trained on images and cleaned tags from Civitai, so the tags it produces for an image resemble the prompts you would use to generate that image.

## Instruction prompt:
A new instruction prompt, \<GENERATE_PROMPT\>, was created for this purpose, in addition to \<DETAILED_CAPTION\> and \<MORE_DETAILED_CAPTION\>.
It responds in Danbooru tagging style, with much better accuracy and an appropriate level of detail.
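
One way to keep the three task tokens in one place is a small lookup. This is only a minimal sketch: the tokens come from the text above, while `TASK_PROMPTS`, `task_prompt`, and the short mode names are hypothetical and not part of the model or its processor.

```python
# Task tokens named in this README. The short keys ("tags", "detailed",
# "more_detailed") are hypothetical labels chosen for this sketch.
TASK_PROMPTS = {
    "tags": "<GENERATE_PROMPT>",
    "detailed": "<DETAILED_CAPTION>",
    "more_detailed": "<MORE_DETAILED_CAPTION>",
}


def task_prompt(mode: str = "tags") -> str:
    """Return the task token for a mode, or raise ValueError if unknown."""
    try:
        return TASK_PROMPTS[mode]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None


print(task_prompt())            # token for Danbooru-style tagging
print(task_prompt("detailed"))  # token for a detailed caption
```

The returned string would be passed as the text prompt to the processor, so switching between tagging and captioning is a one-word change.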

## How to use:

To use this model, you can load it directly from the Hugging Face Model Hub:

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and processor; the model must live on the same device as the inputs.
model = AutoModelForCausalLM.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("MiaoshouAI/Florence-2-base-PromptGen", trust_remote_code=True)

# Task token that selects the tagging mode.
prompt = "<GENERATE_PROMPT>"
# Not used in this example; kept from the original snippet.
question = "Describe everything you see in this image?"

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

# Deterministic beam search (no sampling).
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    do_sample=False,
    num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))

print(parsed_answer)
```
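
If you want the tags as a list rather than one string, a small helper can split them. This is a rough sketch that assumes the tags arrive as a single comma-separated string, as is usual for Danbooru-style tagging; the structure of `parsed_answer` is not described here, so extracting that string is left to you. The helper name `split_tags` and the sample string are hypothetical.

```python
def split_tags(tag_string: str) -> list[str]:
    """Split a comma-separated tag string into a clean list.

    Surrounding whitespace is stripped, empty entries are dropped, and
    duplicates are removed while keeping first-seen order.
    """
    seen = set()
    tags = []
    for raw in tag_string.split(","):
        tag = raw.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


# Made-up sample string for illustration; this is not real model output.
sample = "1girl, solo, long hair, solo, , outdoors"
print(split_tags(sample))
```

Keeping the original order matters when the list is reused as a generation prompt, which is why the sketch de-duplicates with a seen-set instead of converting to a plain `set`.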

## Use with MiaoshouAI Tagger for ComfyUI
If you just want to use this model, you can run it through ComfyUI-Miaoshouai-Tagger:

https://github.com/miaoshouai/ComfyUI-Miaoshouai-Tagger

Detailed installation and usage instructions are available there.