---
library_name: diffusers
pipeline_tag: text-to-image
---
## Model Details
### Model Description
This model is fine-tuned from [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on 110,000 image-text pairs from the MIMIC dataset using the Norm-tuning PEFT method. Under this fine-tuning strategy, only the normalization weights in the U-Net are fine-tuned while everything else is kept frozen (a minimal sketch of this setup follows the model details below).
- **Developed by:** [Raman Dutt](https://twitter.com/RamanDutt4)
- **Shared by:** [Raman Dutt](https://twitter.com/RamanDutt4)
- **Model type:** Stable Diffusion fine-tuned using Parameter-Efficient Fine-Tuning (PEFT)
- **Finetuned from model:** [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
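
As a rough illustration of this selective freezing (an assumed setup, not the paper's exact training code), the sketch below loads the SD v1.5 U-Net, freezes it, and re-enables gradients only for the normalization layers (`GroupNorm` and `LayerNorm`):

```python
import torch
from diffusers import UNet2DConditionModel

# Minimal Norm-tuning sketch (assumed setup): freeze the whole U-Net,
# then unfreeze only the normalization-layer parameters.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet.requires_grad_(False)

trainable = []
for name, module in unet.named_modules():
    # GroupNorm and LayerNorm are the normalization layers used inside the SD U-Net
    if isinstance(module, (torch.nn.GroupNorm, torch.nn.LayerNorm)):
        for param in module.parameters():
            param.requires_grad = True
            trainable.append(param)

total = sum(p.numel() for p in unet.parameters())
tuned = sum(p.numel() for p in trainable)
print(f"Trainable parameters: {tuned:,} / {total:,} ({100 * tuned / total:.2f}%)")
```

Because only these few parameters receive gradients, the optimizer state and memory overhead stay small compared to full fine-tuning.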
### Model Sources
- **Paper:** [Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity](https://arxiv.org/abs/2305.08252)
- **Demo:** [MIMIC-SD-PEFT-Demo](https://huggingface.co/spaces/raman07/MIMIC-SD-Demo-Memory-Optimized?logs=container)
## Direct Use
This model can be directly used to generate realistic medical images from text prompts.
## How to Get Started with the Model
```python
import os
from safetensors.torch import load_file
from diffusers.pipelines import StableDiffusionPipeline
# Base Stable Diffusion v1.5 weights (a local folder path or the Hub ID)
sd_folder_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(sd_folder_path, revision="fp16")

# Norm-tuned U-Net weights from this repository
exp_path = os.path.join('unet', 'diffusion_pytorch_model.safetensors')
state_dict = load_file(exp_path)
# Load the adapted U-Net
pipe.unet.load_state_dict(state_dict, strict=False)
pipe.to('cuda:0')
# Generate images with text prompts
TEXT_PROMPT = "No acute cardiopulmonary abnormality."
GUIDANCE_SCALE = 4
INFERENCE_STEPS = 75
result_image = pipe(
prompt=TEXT_PROMPT,
height=224,
width=224,
guidance_scale=GUIDANCE_SCALE,
num_inference_steps=INFERENCE_STEPS,
)
# The pipeline output holds a list of PIL images
result_pil_image = result_image["images"][0]
```
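The returned image is a standard PIL image, so it can be written to disk with, for example, `result_pil_image.save("sample.png")`.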
## Training Details
### Training Data
This model has been fine-tuned on 110K image-text pairs from the MIMIC dataset.
### Training Procedure
The training procedure has been described in detail in Section 4.3 of this [paper](https://arxiv.org/abs/2305.08252).
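
For orientation, the sketch below shows what one fine-tuning step typically looks like for a latent-diffusion model under this setup: the frozen SD v1.5 components are loaded, only the norm parameters are optimized, and the standard noise-prediction (MSE) objective is used. This is an assumed, simplified reconstruction, not the paper's exact training code; hyperparameters such as the learning rate are placeholders.

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel

# Load the SD v1.5 components (sketch only).
base = "runwayml/stable-diffusion-v1-5"
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
noise_scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")

# Norm-tuning: everything frozen except the U-Net normalization layers.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
unet.requires_grad_(False)
norm_params = [
    p
    for _, m in unet.named_modules()
    if isinstance(m, (torch.nn.GroupNorm, torch.nn.LayerNorm))
    for p in m.parameters()
]
for p in norm_params:
    p.requires_grad = True
optimizer = torch.optim.AdamW(norm_params, lr=1e-4)  # learning rate is a placeholder

def training_step(pixel_values, input_ids):
    """One noise-prediction step; `pixel_values` in [-1, 1], `input_ids` are CLIP tokens."""
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    encoder_hidden_states = text_encoder(input_ids)[0]
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred.float(), noise.float())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```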
#### Metrics
This model has been evaluated using the Fréchet Inception Distance (FID) score on the MIMIC dataset.
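
For reference, FID between a set of real MIMIC images and a set of generated samples can be computed along the lines below; `torchmetrics` is an illustrative choice here, and the paper's exact evaluation pipeline (preprocessing, sample counts) may differ.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Illustrative FID computation; assumes `torchmetrics` (with its
# `torch-fidelity` dependency for FID) is installed.
fid = FrechetInceptionDistance(feature=2048, normalize=True)

# Placeholders: in practice these are real MIMIC images and pipeline samples,
# as float tensors in [0, 1] with shape (N, 3, H, W).
real_images = torch.rand(16, 3, 224, 224)
generated_images = torch.rand(16, 3, 224, 224)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print(f"FID: {fid.compute().item():.2f}")
```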
### Results
| Fine-Tuning Strategy | FID Score (lower is better) |
|------------------------|-----------|
| Full FT | 58.74 |
| Attention | 52.41 |
| Bias | 20.81 |
| Norm | 29.84 |
| Bias+Norm+Attention | 35.93 |
| LoRA | 439.65 |
| SV-Diff | 23.59 |
| DiffFit | 42.50 |
## Environmental Impact
Using Parameter-Efficient Fine-Tuning potentially causes **less** harm to the environment than full fine-tuning, since only a small fraction of the model's parameters is updated. This substantially lowers the compute and hardware requirements.
## Citation
**BibTeX:**

    @article{dutt2023parameter,
      title={Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity},
      author={Dutt, Raman and Ericsson, Linus and Sanchez, Pedro and Tsaftaris, Sotirios A and Hospedales, Timothy},
      journal={arXiv preprint arXiv:2305.08252},
      year={2023}
    }
**APA:**
Dutt, R., Ericsson, L., Sanchez, P., Tsaftaris, S. A., & Hospedales, T. (2023). Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity. arXiv preprint arXiv:2305.08252.
## Model Card Authors
Raman Dutt
[Twitter](https://twitter.com/RamanDutt4)
[LinkedIn](https://www.linkedin.com/in/raman-dutt/)
[Email](mailto:[email protected])