Commit 0199e15 · Parent: 699e237 · Update README.md

README.md:

PubMedCLIP is a fine-tuned version of [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) for the medical domain.

## Model Description

PubMedCLIP was trained on the [Radiology Objects in COntext (ROCO)](https://github.com/razorx89/roco-dataset) dataset, a large-scale multimodal medical imaging dataset. The ROCO dataset includes diverse imaging modalities (such as ultrasound, X-Ray, and MRI) from various human body regions (such as the head, neck, and spine), all captured from open-access [PubMed](https://pubmed.ncbi.nlm.nih.gov/) articles.

The authors of PubMedCLIP have released three pre-trained models at this [link](https://1drv.ms/u/s!ApXgPqe9kykTgwD4Np3-f7ODAot8?e=zLVlJ2), which use ResNet-50, ResNet-50x4, and ViT32 as image encoders. This repository includes only the ViT32 variant of the PubMedCLIP model.

- **Repository:** [PubMedCLIP Official GitHub Repository](https://github.com/sarahESL/PubMedCLIP)
- **Paper:** [Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?](https://arxiv.org/abs/2112.13906)
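
Since only the ViT32 variant is hosted here, it can be useful to confirm which vision backbone a loaded checkpoint uses. Below is a minimal sketch, assuming the checkpoint id of this repository (`flaviagiammarino/pubmed-clip-vit-base-patch32`):

```python
from transformers import CLIPModel

# Load the checkpoint and inspect the configuration of its vision tower.
model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
print(model.config.vision_config.patch_size)   # 32 for a ViT-B/32 image encoder
print(model.config.vision_config.hidden_size)  # 768 for a ViT-Base backbone
```
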
## Use with Transformers

```python
import requests
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

# Load the ViT32 PubMedCLIP checkpoint and its matching processor.
model = CLIPModel.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("flaviagiammarino/pubmed-clip-vit-base-patch32")

# Fetch a sample radiology image.
url = "https://d168r5mdg5gtkq.cloudfront.net/medpix/img/full/synpic9078.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Score the image against a set of candidate labels.
inputs = processor(text=["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)  # softmax over the labels gives probabilities
```
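
The returned `probs` follows the order of the text prompts, so the most likely label can be read off directly. A small follow-up sketch (the `labels` list simply repeats the prompts passed to the processor above):

```python
labels = ["Chest X-Ray", "Brain MRI", "Abdominal CT Scan"]
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")  # probability that the image matches this label
```
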
## Additional Information