---
license: mit
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
---

- Requires a custom training notebook that will be provided soon.
- Distills SDXL using T5 attention masking, teaching SDXL's CLIP_L and CLIP_G encoders to expect the T5 attention mask.
- Additional finetuning, interpolation, and distillation are required for full cohesion.
- Ongoing training effort interpolating the T5 encoder into SDXL via a teacher/student process; a hedged sketch of that step follows the training configuration below.

```python
import torch

config = {
    "epochs": 10,
    "batch_size": 64,
    "learning_rate": 1e-6,  # Lower learning rate for stability
    "save_interval_steps": 10,  # Save a checkpoint every 10 training steps
    "test_save_interval_steps": 10,  # Save test images every 10 training steps
    "checkpoint_dir": "./checkpoints",  # Full diffusers checkpoint folder
    "compact_model_dir": "./compact_model",  # For the final compact model (not used for caching)
    "baseline_test_dir": "./baseline_test",  # For baseline images & captions
    "cache_dir": "./cache",  # Folder for caching T5 outputs and teacher features
    "num_generated_captions": 128,  # Number of captions to generate for training
    "model_id": "stabilityai/stable-diffusion-xl-base-1.0",
    "model_name": "my_interpolative_distillation",  # Folder name for checkpoints
    "seed": 420,
    "device": torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu"),
    "inference_steps": 50,
    "height": 1024,
    "width": 1024,
    "guidance_scale": 7.5,
    "inference_interval": 10,
    "max_caption_length": 512,
    # Batch size for teacher feature caching (set very low to reduce VRAM usage)
    "cache_teacher_batch_size": 64,
}
```
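The sketch below illustrates the teacher/student idea described above, assuming a simple setup: captions are encoded with a T5 encoder under an attention mask, the masked T5 features are mean-pooled and projected into the CLIP_G pooled-embedding width, and an MSE loss pulls the student projection toward teacher features. The `t5-base` checkpoint, the linear projection head, the random stand-in teacher tensor, and the MSE loss are all assumptions for illustration, not the released training code.

```python
# Hedged sketch of masked T5 encoding + a teacher/student distillation loss.
# Assumptions: t5-base as the T5 encoder, a linear projection head as the
# "student", and a random tensor standing in for cached SDXL teacher features.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5EncoderModel

device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

tokenizer = AutoTokenizer.from_pretrained("t5-base")  # assumed T5 variant
t5 = T5EncoderModel.from_pretrained("t5-base").to(device).eval()

captions = ["a photo of a red fox in the snow"]
tokens = tokenizer(
    captions,
    padding="max_length",
    max_length=512,  # matches config["max_caption_length"]
    truncation=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    # (batch, 512, d_model) hidden states from the frozen T5 encoder
    t5_hidden = t5(**tokens).last_hidden_state

# Masked mean-pool so padded positions do not contribute to the pooled feature.
mask = tokens.attention_mask.unsqueeze(-1).float()
pooled = (t5_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

# Hypothetical student projection into the CLIP_G pooled-embedding width (1280).
student_proj = torch.nn.Linear(t5.config.d_model, 1280).to(device)
student_feat = student_proj(pooled)

# Stand-in for cached teacher features; in practice these would be SDXL CLIP_G
# pooled text embeddings loaded from config["cache_dir"].
teacher_feat = torch.randn_like(student_feat)

loss = F.mse_loss(student_feat, teacher_feat)
loss.backward()  # gradients flow into the student projection only
print(float(loss))
```

In a real run the teacher features would be precomputed with the SDXL text encoders and cached in batches of `cache_teacher_batch_size`, so only the student side needs gradients during training.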