# Model Card for KEEP
KEEP (KnowledgE-Enhanced Pathology) is a foundation model designed for cancer diagnosis that integrates disease knowledge into vision-language pre-training. It utilizes a comprehensive disease knowledge graph (KG) containing 11,454 human diseases and 139,143 disease attributes, such as synonyms, definitions, and hierarchical relationships. KEEP reorganizes millions of publicly available noisy pathology image-text pairs into 143K well-structured semantic groups based on the hierarchical relations of the disease KG. By incorporating disease knowledge into the alignment process, KEEP achieves more nuanced image and text representations. The model is validated on 18 diverse benchmarks with over 14,000 whole-slide images (WSIs), demonstrating state-of-the-art performance in zero-shot cancer diagnosis, including an average sensitivity of 89.8% for cancer detection across 7 cancer types. KEEP also excels in subtyping rare cancers, achieving strong generalizability in diagnosing rare tumor subtypes.
## Model Details
### Model Description
- Developed by: MAGIC-AI4Med team from Shanghai Jiao Tong University and Shanghai AI Lab.
- Model type: Vision-language model (vision encoder: ViT-L/16; text encoder: BERT)
- Pretraining data: 143K pathology semantic groups, each pairing a single caption with multiple images.
- License: MIT
### Model Sources
- Repository: https://github.com/MAGIC-AI4Med/KEEP
- Paper: https://arxiv.org/abs/2412.13126
## Direct Use
```python
import torch
from transformers import AutoModel, AutoTokenizer
from torchvision import transforms
from PIL import Image

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
model.eval()

# Standard ImageNet-style preprocessing at 224x224
transform = transforms.Compose([
    transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(size=(224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

example_image_path = './example.tif'
example_text = [
    'an H&E image of breast invasive carcinoma.',
    'an H&E image of normal tissue.',
    'an H&E image of lung adenocarcinoma.',
]

img_input = transform(Image.open(example_image_path).convert('RGB')).unsqueeze(0)
token_input = tokenizer(example_text, max_length=256, padding='max_length',
                        truncation=True, return_tensors='pt')

with torch.no_grad():
    img_feature = model.encode_image(img_input)
    text_feature = model.encode_text(token_input)
```
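For zero-shot diagnosis, the image embedding is compared against the embedding of each candidate text prompt. The sketch below illustrates that comparison with random tensors standing in for the real encoder outputs; the embedding dimension (512) is an assumption for illustration only.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the encoder outputs above:
# img_feature has shape (1, D), text_feature has shape (num_prompts, D).
img_feature = torch.randn(1, 512)
text_feature = torch.randn(3, 512)

# L2-normalize so the dot product equals cosine similarity.
img_feature = F.normalize(img_feature, dim=-1)
text_feature = F.normalize(text_feature, dim=-1)

# Similarity of the image to each text prompt; softmax turns the scores
# into a probability distribution over the candidate diagnoses.
similarity = (img_feature @ text_feature.T).softmax(dim=-1)  # shape (1, 3)
predicted_idx = similarity.argmax(dim=-1).item()
```

The prompt with the highest probability is taken as the predicted diagnosis for the patch.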
## Evaluation
### Testing Data
We present benchmark results for a range of representative tasks. A complete set of benchmarks can be found in the paper. These results will be updated with each new iteration of KEEP.
### Results
#### Zero-shot Cancer Region Segmentation (DICE)
Dataset | PLIP [1] | QuiltNet [2] | MI-Zero (Pub) [3] | CONCH [4] | KEEP (Ours) |
---|---|---|---|---|---|
CAMELYON16 | 0.253 | 0.157 | 0.186 | 0.292 | 0.361 |
PANDA | 0.295 | 0.309 | 0.276 | 0.315 | 0.334 |
AGGC22 | 0.284 | 0.282 | 0.324 | 0.449 | 0.530 |
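The DICE coefficient reported above measures overlap between the predicted cancer mask and the ground-truth annotation. A minimal NumPy illustration of the metric (our own sketch, not the paper's evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """DICE = 2|A ∩ B| / (|A| + |B|) for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy 4x4 masks: identical masks score ~1.0, disjoint masks score 0.0.
a = np.zeros((4, 4)); a[:2, :2] = 1
b = np.zeros((4, 4)); b[:2, :2] = 1
c = np.zeros((4, 4)); c[2:, 2:] = 1
print(dice_score(a, b))  # → ~1.0
print(dice_score(a, c))  # → 0.0
```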
#### Zero-shot Cancer Detection (AUROC)
Dataset | CHIEF [1] | PLIP [2] | QuiltNet [3] | MI-Zero (Pub) [4] | CONCH [5] | KEEP (Ours) |
---|---|---|---|---|---|---|
CPTAC-CM | 0.915 | 0.970 | 0.972 | 0.985 | 0.994 | 0.994 |
CPTAC-CCRCC | 0.723 | 0.330 | 0.755 | 0.886 | 0.871 | 0.999 |
CPTAC-PDA | 0.825 | 0.391 | 0.464 | 0.796 | 0.920 | 0.929 |
CPTAC-UCEC | 0.955 | 0.945 | 0.973 | 0.979 | 0.996 | 0.998 |
CPTAC-LSCC | 0.901 | 0.965 | 0.966 | 0.910 | 0.987 | 0.983 |
CPTAC-HNSCC | 0.946 | 0.898 | 0.874 | 0.918 | 0.982 | 0.976 |
CPTAC-LUAD | 0.891 | 0.988 | 0.991 | 0.981 | 0.999 | 1.000 |
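AUROC, the metric in the table above, is the probability that a randomly chosen cancer slide scores higher than a randomly chosen normal slide. A small self-contained sketch via the Mann-Whitney U statistic (an illustration, not the benchmark code):

```python
import numpy as np

def auroc(y_true, scores):
    """AUROC as P(score of a random positive > score of a random negative),
    with ties counted as half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auroc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]))  # → 1.0 (perfect ranking)
```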
#### Zero-shot Cancer Subtyping (BACC)
Dataset | PLIP [1] | QuiltNet [2] | MI-Zero (Pub) [3] | CONCH [4] | KEEP (Ours) |
---|---|---|---|---|---|
TCGA-BRCA | 0.519 | 0.500 | 0.633 | 0.727 | 0.774 |
TCGA-NSCLC | 0.699 | 0.667 | 0.753 | 0.901 | 0.902 |
TCGA-RCC | 0.735 | 0.755 | 0.908 | 0.921 | 0.926 |
TCGA-ESCA | 0.614 | 0.746 | 0.954 | 0.923 | 0.977 |
TCGA-BRAIN | 0.361 | 0.346 | 0.361 | 0.453 | 0.604 |
UBC-OCEAN | 0.343 | 0.469 | 0.652 | 0.674 | 0.661 |
CPTAC-NSCLC | 0.647 | 0.607 | 0.643 | 0.836 | 0.863 |
EBRAINS | 0.096 | 0.093 | 0.325 | 0.371 | 0.456 |
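Balanced accuracy (BACC), used for the subtyping results above, is the mean of per-class recalls, which keeps rare subtypes from being drowned out by common ones. A minimal sketch of the metric (illustration only):

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust to class imbalance, which matters
    for rare-subtype benchmarks such as EBRAINS."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Class 0 recall = 4/4, class 1 recall = 1/2, so BACC = 0.75.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(balanced_accuracy(y_true, y_pred))  # → 0.75
```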
## Summary
Validated on 18 diverse benchmarks with more than 14,000 whole slide images (WSIs), KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks. Notably, for cancer detection, KEEP demonstrates an average sensitivity of 89.8% at a specificity of 95.0% across 7 cancer types, significantly outperforming vision-only foundation models and highlighting its promising potential for clinical application. For cancer subtyping, KEEP achieves a median balanced accuracy of 0.456 in subtyping 30 rare brain cancers, indicating strong generalizability for diagnosing rare tumors.
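The "sensitivity at 95.0% specificity" figure fixes the decision threshold so that at most 5% of normal slides are flagged, then measures how many cancer slides are caught. A NumPy sketch of this operating-point metric on synthetic scores (our own illustration; the distributions are made up):

```python
import numpy as np

def sensitivity_at_specificity(y_true, scores, target_specificity=0.95):
    """Pick the threshold that achieves the target specificity on negatives,
    then report sensitivity (true positive rate) on positives."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Threshold exceeded by at most (1 - target_specificity) of negatives.
    threshold = np.quantile(scores[~y_true], target_specificity)
    preds = scores > threshold
    return preds[y_true].mean(), threshold

# Synthetic slide-level scores: cancer slides score higher on average.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(2.0, 1.0, 500),   # cancer slides
                         rng.normal(0.0, 1.0, 500)])  # normal slides
labels = np.array([1] * 500 + [0] * 500)
sens, thr = sensitivity_at_specificity(labels, scores)
```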
## Citation
```bibtex
@article{zhou2024keep,
  title={A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis},
  author={Zhou, Xiao and Sun, Luoyi and He, Dexuan and Guan, Wenbin and Wang, Ruifen and Wang, Lifeng and Sun, Xin and Sun, Kun and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  journal={arXiv preprint arXiv:2412.13126},
  year={2024}
}
```