license: mit
language:
  - en
tags:
  - medical
  - vision
widget:
  - src: https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg
    candidate_labels: Chest X-Ray, Brain MRI, Abdomen CT Scan
    example_title: Abdomen CT Scan

Model Card for PubMedCLIP

PubMedCLIP is a fine-tuned version of CLIP for the medical domain.

Model Description

PubMedCLIP was trained on the Radiology Objects in COntext (ROCO) dataset, a large-scale multimodal medical imaging dataset. ROCO covers diverse imaging modalities (such as X-Ray, MRI, ultrasound and fluoroscopy) and various human body regions (such as head, spine, chest and abdomen), with images extracted from open-access PubMed articles.

PubMedCLIP was trained for 50 epochs with a batch size of 64 using the Adam optimizer with a learning rate of $10^{-5}$. The authors have released three different pre-trained models at this link which use ResNet-50, ResNet-50x4 and ViT32 as image encoders. This repository includes only the ViT32 variant of the PubMedCLIP model.
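
For readers who want the stated hyperparameters in one place, the following is a minimal sketch that collects them in a small configuration object. The class name `TrainingConfig`, the helper `total_optimizer_steps`, and the dataset size used in the example are hypothetical and are not taken from the authors' training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as stated in the model description (names are illustrative)."""
    epochs: int = 50
    batch_size: int = 64
    optimizer: str = "adam"
    learning_rate: float = 1e-5


def total_optimizer_steps(config, num_training_pairs):
    """Count parameter updates, assuming the last partial batch of each epoch is kept."""
    steps_per_epoch = math.ceil(num_training_pairs / config.batch_size)
    return steps_per_epoch * config.epochs


config = TrainingConfig()
# 6400 is a placeholder dataset size chosen for easy arithmetic; it is NOT the size of ROCO.
steps = total_optimizer_steps(config, num_training_pairs=6400)
print(steps)
```

Whether the original training run kept or dropped the final partial batch is not stated in this card, so the rounding in `total_optimizer_steps` is an assumption.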

Usage

import requests
from PIL import Image

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

url = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["Chest X-Ray", "Brain MRI", "Abdomen CT Scan"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, one row per image
probs = logits_per_image.softmax(dim=1)  # softmax over the candidate labels gives the label probabilities
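
The softmax step in the usage snippet above turns one similarity score per candidate label into probabilities that sum to 1. The sketch below reproduces that step in plain Python and pairs each probability with its label. The logit values are made up for illustration and are not outputs of PubMedCLIP; the names `softmax`, `example_logits` and `ranked` are likewise only illustrative.

```python
import math


def softmax(logits):
    """Convert raw similarity scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability without changing the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Candidate labels from the usage snippet.
labels = ["Chest X-Ray", "Brain MRI", "Abdomen CT Scan"]

# Made-up logits for illustration only; these are NOT real model outputs.
example_logits = [18.2, 17.5, 24.9]

probs = softmax(example_logits)

# Sort the (label, probability) pairs from most to least likely.
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
best_label, best_prob = ranked[0]
print(best_label)
```

With the actual model output, `probs[0].tolist()` should yield the same kind of per-label list for the single input image, which can then be paired with the candidate labels in the same way.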

Additional Information

Licensing Information

The authors have released the model code and pre-trained checkpoints under the MIT License.

Citation Information

@article{eslami2021does,
  title={Does {CLIP} benefit visual question answering in the medical domain as much as it does in the general domain?},
  author={Eslami, Sedigheh and de Melo, Gerard and Meinel, Christoph},
  journal={arXiv preprint arXiv:2112.13906},
  year={2021}
}