zer0int committed on
Commit
2e62dca
•
1 Parent(s): c7eb10c

AIsplain available AI models / easy download links

Files changed (1): README.md +22 -2
README.md CHANGED
@@ -3,13 +3,33 @@ license: mit
  datasets:
  - SPRIGHT-T2I/spright_coco
  ---
- ## Update 03/SEP/2024:
-
- Improved CLIP-L for use with Flux.1 (improved TEXT and prompt detail adherence). Trained with high temperature of 0.1 + tinkering. Otherwise, same GmP-smooth-labels code as previous model fine-tune; same dataset (see below for link to GitHub).
+ ## Update 03/SEP/2024 / edit 05/AUG:

+ ## 👋 Looking for a Text Encoder for Flux.1 (or SD3, SDXL, SD, ...) to replace CLIP-L? 👀
+ You'll generally want the "TE-only" .safetensors:

+ - 👉 The "TEXT" model has superior prompt following, especially for text, but also for other details. [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-TEXT-detail-improved-hiT-GmP-TE-only-HF.safetensors)
+ - 👉 The "SMOOTH" model can sometimes** have better details (when there's no text in the image). [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-BEST-smooth-GmP-TE-only-HF-format.safetensors)
+ - The "GmP" initial fine-tune is deprecated / inferior to the above models. Still, you can [DOWNLOAD](https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/blob/main/ViT-L-14-GmP-ft-TE-only-HF-format.safetensors) it.
+
+ **: The "TEXT" model is the best for text. Full stop. But whether the "SMOOTH" model is better for your (text-free) scenario than the "TEXT" model really depends on the specific prompt. It might also be that the "TEXT" model leads to images you prefer over "SMOOTH"; the only way to know is to experiment with both.
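
For illustration, one way to plug a TE-only file in: in ComfyUI, you'd typically drop it into `ComfyUI/models/clip` and select it in the DualCLIPLoader node. Below is a minimal diffusers sketch of the same swap, assuming the HF-format weights in this repo load as a standard `CLIPTextModel`; the pipeline ID, dtype, and prompt are illustrative:

```python
# Minimal sketch (assumptions: HF-format CLIP weights, diffusers >= 0.30,
# access to the gated FLUX.1-dev repo). Swaps the fine-tuned CLIP-L text
# encoder in place of the stock one; the T5 encoder is left untouched.
import torch
from transformers import CLIPTextModel
from diffusers import FluxPipeline

text_encoder = CLIPTextModel.from_pretrained(
    "zer0int/CLIP-GmP-ViT-L-14", torch_dtype=torch.bfloat16
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder=text_encoder,  # replaces the stock CLIP-L
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a neon sign that reads 'GmP'", num_inference_steps=28).images[0]
image.save("gmp-clip-test.png")
```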

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/y-B-FimzahYqskNr2MV1C.png)

+ ## 🤓👨‍💻 In general (because we're not limited to text-to-image generative AI), I provide four versions / downloads:
+
+ - Text encoder only .safetensors.
+ - Full model .safetensors.
+ - State_dict pickle.
+ - Full model pickle (can be used as-is with "import clip" -> clip.load() after bypassing SHA checksum verification; see the sketch below).
+
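
A minimal sketch for the two pickle variants with OpenAI's `clip` package; the filenames are placeholders, and in my understanding passing a local file path to `clip.load()` sidesteps the SHA-checked download used for the named models:

```python
# Minimal sketch for the pickle downloads (filenames are placeholders).
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Full-model pickle: hand the local path straight to clip.load().
model, preprocess = clip.load("ViT-L-14-GmP-ft.pt", device=device)

# State_dict pickle: build the stock ViT-L/14, then load the weights into it.
model, preprocess = clip.load("ViT-L-14", device=device)
state_dict = torch.load("ViT-L-14-GmP-ft-state_dict.pt", map_location=device)
model.load_state_dict(state_dict)
```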
+ ## The TEXT model has a modality gap of 0.80 (OpenAI pre-trained: 0.82).
+ - Trained with a high temperature of 0.1 + tinkering.
+ - ImageNet/ObjectNet accuracy ~0.91 for both "SMOOTH" and "TEXT" models (pre-trained: ~0.84).
+ - The models (this plot = "TEXT" model) are also golden retrievers: 🥰🐕
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/WiyuZLZVyjBTdPwHaVG_6.png)
+
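
For context, "modality gap" here is presumably the usual centroid-distance measure (Liang et al., 2022): how far apart the image and text embeddings sit in the shared space, with a smaller value meaning the two modalities overlap more. A sketch of that computation (how the 0.80 / 0.82 figures were actually obtained is my assumption):

```python
# Sketch of the standard modality-gap measure (Liang et al., 2022):
# Euclidean distance between the centroids of L2-normalized image and
# text embeddings over a set of N image-text pairs.
import torch
import torch.nn.functional as F

def modality_gap(image_embeds: torch.Tensor, text_embeds: torch.Tensor) -> float:
    """image_embeds, text_embeds: [N, D] CLIP embeddings of paired data."""
    img_centroid = F.normalize(image_embeds, dim=-1).mean(dim=0)
    txt_centroid = F.normalize(text_embeds, dim=-1).mean(dim=0)
    return (img_centroid - txt_centroid).norm().item()
```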
  ----
  ## Update 11/AUG/2024: