AIsplain available AI models / easy download links
README.md
---
license: mit
datasets:
- SPRIGHT-T2I/spright_coco
---

## Update 03/SEP/2024 / edit 05/AUG:

## Looking for a Text Encoder for Flux.1 (or SD3, SDXL, SD, ...) to replace CLIP-L?

You'll generally want one of the "TE-only" .safetensors files:

- The "TEXT" model has superior prompt following, especially for text, but also for other details. [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors)
- The "SMOOTH" model can sometimes** have better details (when there is no text in the image). [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors)
- The "GmP" initial fine-tune is deprecated and inferior to the two models above, but you can still [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-GmP-ft-TE-only-HF-format.safetensors) it.

**: The "TEXT" model is the best for text, full stop. But whether the "SMOOTH" model is better than the "TEXT" model for your (text-free) scenario really depends on the specific prompt. You may also simply prefer the "TEXT" model's images over "SMOOTH"'s; the only way to know is to experiment with both.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/y-B-FimzahYqskNr2MV1C.png)
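
What "TE-only" means in practice: a full CLIP checkpoint contains both the vision tower and the text tower, and the TE-only files keep just the text-tower tensors that a text-to-image pipeline actually consumes. A minimal sketch with dummy arrays (key names follow the Hugging Face `CLIPModel` layout; the shapes here are placeholders, not the real checkpoint):

```python
import numpy as np

# Dummy stand-ins for a full CLIP-L state dict (HF CLIPModel key layout).
full_state_dict = {
    "text_model.embeddings.token_embedding.weight": np.zeros((49408, 768), np.float32),
    "text_model.encoder.layers.0.self_attn.q_proj.weight": np.zeros((768, 768), np.float32),
    "vision_model.embeddings.patch_embedding.weight": np.zeros((1024, 3, 14, 14), np.float32),
    "logit_scale": np.float32(4.6052),
}

# A "TE-only" checkpoint keeps just the text-encoder tensors and drops
# the vision tower and the joint logit scale.
te_only = {k: v for k, v in full_state_dict.items() if k.startswith("text_model.")}

print(sorted(te_only))
```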

## In general (because we're not limited to text-to-image generative AI), I provide four versions / downloads:

- Text encoder only .safetensors.
- Full model .safetensors.
- State_dict pickle.
- Full model pickle (can be used as-is with `import clip` -> `clip.load()` after bypassing the SHA checksum verification).
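
On that last point: `clip`'s download cache verifies files by SHA256, so a swapped-in fine-tune sitting under a pre-trained model's name will not match the expected digest. The check itself is a plain SHA256 over the file bytes, which you can compute up front with the standard library (a small sketch, not part of the `clip` package):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, computed in 1 MiB chunks so large
    checkpoints don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing this digest against the one the loader expects tells you in advance whether a cached file would pass or fail verification.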

## The TEXT model has a modality gap of 0.80 (OpenAI pre-trained: 0.82)

- Trained with a high temperature of 0.1, plus tinkering.
- ImageNet/ObjectNet accuracy of ~0.91 for both the "SMOOTH" and "TEXT" models (pre-trained: ~0.84).
- The models (this plot = "TEXT" model) are also golden retrievers:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/WiyuZLZVyjBTdPwHaVG_6.png)
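
The modality gap can be measured in several ways; a common definition (an assumption here, since this card does not state the exact metric behind the 0.80 / 0.82 figures) is the distance between the centroids of the L2-normalized image and text embeddings:

```python
import numpy as np

def modality_gap(image_embs, text_embs):
    """Distance between the centroids of L2-normalized image and text
    embeddings; 0 would mean the two modalities share one cluster."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))

# Synthetic embeddings: the +/- offset puts the two modalities in
# separate cones, mimicking the gap seen in real CLIP embedding spaces.
rng = np.random.default_rng(0)
img = rng.normal(size=(512, 768)) + 1.0
txt = rng.normal(size=(512, 768)) - 1.0
print(round(modality_gap(img, txt), 2))
```

With real CLIP features you would feed the encoder outputs for matched image/caption batches into the same function.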

----
## Update 11/AUG/2024: