---
library_name: diffusers
pipeline_tag: text-to-image
---

## Model Details

### Model Description

This model is fine-tuned from [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on 110,000 image-text pairs from the MIMIC dataset using Norm-tuning, a parameter-efficient fine-tuning (PEFT) method. Under this strategy, only the normalization weights in the U-Net are fine-tuned, while all other parameters of the model remain frozen.

- **Developed by:** [Raman Dutt](https://twitter.com/RamanDutt4)
- **Shared by:** [Raman Dutt](https://twitter.com/RamanDutt4)
- **Model type:** Stable Diffusion fine-tuned with parameter-efficient fine-tuning (Norm-tuning)
- **Finetuned from model:** [stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5)

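The parameter selection behind Norm-tuning can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' training code: `ToyBlock` and `apply_norm_tuning` are hypothetical names, and the sketch assumes that "normalization weights" means the affine scale and shift parameters of `nn.GroupNorm` and `nn.LayerNorm`, the normalization layer types found in the Stable Diffusion v1.5 U-Net. The learning rate is a placeholder, not the value used for this model.

```python
import torch
from torch import nn

# Normalization layer types whose affine parameters are trained (assumption).
NORM_TYPES = (nn.GroupNorm, nn.LayerNorm)


class ToyBlock(nn.Module):
    """Hypothetical stand-in for a U-Net block: conv + GroupNorm, then a token projection + LayerNorm."""

    def __init__(self, channels: int = 32) -> None:
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm1 = nn.GroupNorm(num_groups=8, num_channels=channels)
        self.proj = nn.Linear(channels, channels)
        self.norm2 = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm1(self.conv(x))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.norm2(self.proj(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)


def apply_norm_tuning(model: nn.Module) -> list[nn.Parameter]:
    """Freeze every parameter, then unfreeze only the normalization layers' weights and biases."""
    model.requires_grad_(False)
    trainable = []
    for module in model.modules():
        if isinstance(module, NORM_TYPES):
            for param in module.parameters():
                param.requires_grad_(True)
                trainable.append(param)
    return trainable


model = ToyBlock()
trainable_params = apply_norm_tuning(model)
n_total = sum(p.numel() for p in model.parameters())
n_trainable = sum(p.numel() for p in trainable_params)
print(f"trainable: {n_trainable} / {n_total} ({100 * n_trainable / n_total:.2f}%)")

# Only the unfrozen parameters are handed to the optimizer (placeholder learning rate).
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)
```

Because a diffusers `UNet2DConditionModel` is an ordinary `torch.nn.Module`, the same hypothetical `apply_norm_tuning` helper should work on `pipe.unet` unchanged; the printed trainable fraction is a quick check that only a small share of the U-Net is being updated.
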
### Model Sources

- **Paper:** [Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity](https://arxiv.org/abs/2305.08252)
- **Demo:** [MIMIC-SD-PEFT-Demo](https://huggingface.co/spaces/raman07/MIMIC-SD-Demo-Memory-Optimized)

## Direct Use

This model can be used directly to generate synthetic chest X-ray images from radiology-report-style text prompts, such as "No acute cardiopulmonary abnormality."

## How to Get Started with the Model

```python
import os

import torch
from safetensors.torch import load_file
from diffusers import StableDiffusionPipeline

# Local copy (or Hub id) of the base stable-diffusion-v1-5 weights; set this before running.
sd_folder_path = "path/to/stable-diffusion-v1-5"

# Load the base pipeline in half precision for GPU inference.
pipe = StableDiffusionPipeline.from_pretrained(sd_folder_path, torch_dtype=torch.float16)

# Norm-tuned U-Net weights from this repository (path relative to a local clone).
exp_path = os.path.join("unet", "diffusion_pytorch_model.safetensors")
state_dict = load_file(exp_path)

# Load the adapted U-Net weights on top of the base U-Net.
# strict=False skips keys that do not match instead of raising an error.
pipe.unet.load_state_dict(state_dict, strict=False)
pipe.to("cuda:0")

# Generate images with text prompts
TEXT_PROMPT = "No acute cardiopulmonary abnormality."
GUIDANCE_SCALE = 4
INFERENCE_STEPS = 75

result_image = pipe(
    prompt=TEXT_PROMPT,
    height=224,
    width=224,
    guidance_scale=GUIDANCE_SCALE,
    num_inference_steps=INFERENCE_STEPS,
)

# The pipeline output holds a list of PIL images.
result_pil_image = result_image.images[0]
```

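The loading step above passes `strict=False`, so `load_state_dict` skips mismatched keys silently rather than raising an error. It does, however, return the skipped keys, which makes it possible to confirm that the checkpoint actually matched the model. The sketch below uses a hypothetical toy model (not the real U-Net) and, for illustration only, assumes a partial checkpoint that holds just the normalization weights.

```python
import torch
from torch import nn


def make_model() -> nn.Module:
    """Hypothetical toy network standing in for the U-Net."""
    return nn.Sequential(
        nn.Conv2d(4, 8, kernel_size=3, padding=1),
        nn.GroupNorm(num_groups=2, num_channels=8),
        nn.Conv2d(8, 4, kernel_size=3, padding=1),
    )


# Simulate a fine-tuned model whose normalization weights differ from the base.
fine_tuned = make_model()
with torch.no_grad():
    fine_tuned[1].weight.add_(0.5)

# Partial checkpoint: keep only the GroupNorm entries (index "1" in the Sequential).
partial_state = {k: v for k, v in fine_tuned.state_dict().items() if k.startswith("1.")}

base = make_model()
incompatible = base.load_state_dict(partial_state, strict=False)

# missing_keys: model parameters absent from the checkpoint (left at their base values).
# unexpected_keys: checkpoint entries that matched no parameter and were ignored.
print("missing:", incompatible.missing_keys)
print("unexpected:", incompatible.unexpected_keys)
if incompatible.unexpected_keys:
    raise RuntimeError(f"unmatched checkpoint keys: {incompatible.unexpected_keys}")
```

With the real pipeline, the same check would capture `incompatible = pipe.unet.load_state_dict(state_dict, strict=False)` and inspect `incompatible.unexpected_keys`; a non-empty list (for example, keys carrying an extra prefix) means those weights were not loaded.
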
## Training Details

### Training Data

This model has been fine-tuned on 110K image-text pairs from the MIMIC dataset.

### Training Procedure

The training procedure is described in detail in Section 4.3 of the [paper](https://arxiv.org/abs/2305.08252).

### Metrics

This model is evaluated with the Fréchet Inception Distance (FID) on the MIMIC dataset. A lower FID indicates that the distribution of generated images is closer to that of real images.

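For reference, FID compares the mean $\mu$ and covariance $\Sigma$ of Inception-v3 features extracted from real ($r$) and generated ($g$) images:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

The sketch below computes this quantity from two pre-extracted feature matrices using only NumPy. It illustrates the formula and is not the evaluation code behind the scores reported below: Inception-v3 feature extraction (2048-dimensional pool features in standard FID) is omitted, the toy inputs are random 16-dimensional features, and the trace of the matrix square root is computed through the symmetric matrix $\Sigma_r^{1/2}\Sigma_g\Sigma_r^{1/2}$, which has the same eigenvalues as $\Sigma_r\Sigma_g$.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two (N, D) feature matrices, e.g. Inception-v3 pool features."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    # Tr((Sigma_r Sigma_g)^{1/2}) via the symmetric matrix Sigma_r^{1/2} Sigma_g Sigma_r^{1/2}.
    sqrt_r = _sqrtm_psd(sigma_r)
    inner = sqrt_r @ sigma_g @ sqrt_r
    inner_eigs = np.clip(np.linalg.eigvalsh((inner + inner.T) / 2), 0.0, None)
    tr_covmean = np.sqrt(inner_eigs).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)


rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 16))
shifted = real + 0.5  # same covariance, every feature mean shifted by 0.5
print(frechet_distance(real, real))     # approximately 0 (identical feature sets)
print(frechet_distance(real, shifted))  # approximately 4.0 (16 dimensions x 0.5**2)
```

The shifted example isolates the mean term: the covariances are identical, so the trace term cancels and only the squared distance between the means remains.
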
### Results

| Fine-Tuning Strategy | FID Score (lower is better) |
|----------------------|-----------------------------|
| Full FT              | 58.74                       |
| Attention            | 52.41                       |
| Bias                 | 20.81                       |
| Norm (this model)    | 29.84                       |
| Bias+Norm+Attention  | 35.93                       |
| LoRA                 | 439.65                      |
| SV-Diff              | 23.59                       |
| DiffFit              | 42.50                       |

## Environmental Impact

Parameter-efficient fine-tuning potentially causes **less** harm to the environment than full fine-tuning, since only a small fraction of the model's parameters is updated. Gradients and optimizer states need to be stored only for those parameters, which lowers the memory, compute, and hardware requirements of training.

## Citation

**BibTeX:**

```bibtex
@article{dutt2023parameter,
  title={Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity},
  author={Dutt, Raman and Ericsson, Linus and Sanchez, Pedro and Tsaftaris, Sotirios A and Hospedales, Timothy},
  journal={arXiv preprint arXiv:2305.08252},
  year={2023}
}
```

**APA:**

Dutt, R., Ericsson, L., Sanchez, P., Tsaftaris, S. A., & Hospedales, T. (2023). Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity. arXiv preprint arXiv:2305.08252.

## Model Card Authors

Raman Dutt

[Twitter](https://twitter.com/RamanDutt4)

[LinkedIn](https://www.linkedin.com/in/raman-dutt/)

[Email](mailto:[email protected])