fede97 committed on
Commit 74563d4
1 Parent(s): b515bba

Update README.md

Files changed (1)
  1. README.md +20 -18
README.md CHANGED
@@ -22,6 +22,17 @@ Based on the CLIP model, Safe-CLIP is fine-tuned to serve the association betwee
  ## NSFW Definition
  In our work, taking inspiration from this [paper](https://arxiv.org/abs/2211.05105), we define NSFW as a finite and fixed set of concepts that are considered inappropriate, offensive, or harmful to individuals. These concepts are divided into twenty categories: _hate, harassment, violence, suffering, humiliation, harm, suicide, sexual, nudity, bodily fluids, blood, obscene gestures, illegal activity, drug use, theft, vandalism, weapons, child abuse, brutality and cruelty_.

+ #### Use with Transformers
+ See the snippet below for usage with Transformers:
+
+ ```python
+ >>> from transformers import CLIPModel
+
+ >>> model_id = "aimagelab/safeclip_vit-h_14"
+ >>> model = CLIPModel.from_pretrained(model_id)
+ ```
+
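The loaded checkpoint behaves like any other `CLIPModel`. Below is a minimal sketch of pairing it with a `CLIPProcessor` for image-text matching; it assumes the repository also ships the standard CLIP processor files and uses a placeholder image path, neither of which is stated on this card.

```python
# Minimal sketch: image-text matching with the Safe-CLIP checkpoint.
# "photo.jpg" is a placeholder; the processor files are assumed to be
# published alongside the model weights.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "aimagelab/safeclip_vit-h_14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
captions = ["a photo of a dog", "a photo of a cat"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # probability of each caption
print(probs)
```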
  ## Model Details

  Safe-CLIP is a fine-tuned version of the [CLIP](https://huggingface.co/docs/transformers/en/model_doc/clip) model. The fine-tuning is performed on the ViSU (Visual Safe and Unsafe) Dataset, introduced in the Safe-CLIP [paper](https://arxiv.org/abs/2311.16254).
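Since fine-tuning relies on the ViSU data, a minimal sketch for inspecting its textual pairs is given below; the dataset id is an assumption (it does not appear on this card) and the schema should be checked before use.

```python
# Minimal sketch: peeking at ViSU-style safe/unsafe text pairs.
# The dataset id below is an assumption for illustration only.
from datasets import load_dataset

visu = load_dataset("aimagelab/ViSU-Text", split="train")  # assumed id
sample = visu[0]
print(sample.keys())  # inspect the actual schema before relying on field names
```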
 
@@ -31,12 +42,12 @@ ViSU contains quadruplets of elements: safe and NSFW sentence pairs along with c
 
  **Variations** Safe-CLIP comes in four versions to improve compatibility with some of the most popular vision-and-language models employed for I2T and T2I generation tasks. More details are reported in the table below.

- | | StableDiffusion compatibility | LLaVA compatibility |
- |--------------------------|:-----------------------------:|:-------------------:|
- | safe-CLIP ViT-L-14 | 1.4 | ? |
- | safe-CLIP ViT-L-14-336px | - | 1.5 1.6 |
- | safe-CLIP ViT-H-14 | - | - |
- | safe-CLIP SD 2.0 | 2.0 | - |
+ | | StableDiffusion compatibility | LLaVA compatibility |
+ |--------------------------|:-----------------------------:|:----------------------------------:|
+ | safe-CLIP ViT-L-14 | 1.4 | llama-2-13b-chat-lightning-preview |
+ | safe-CLIP ViT-L-14-336px | - | 1.5 - 1.6 |
+ | safe-CLIP ViT-H-14 | - | - |
+ | safe-CLIP SD 2.0 | 2.0 | - |
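The StableDiffusion column indicates which generator each encoder was aligned to. As an illustration, the sketch below swaps a Safe-CLIP text encoder into Stable Diffusion 1.4 through `diffusers`; the checkpoint id `aimagelab/safeclip_vit-l_14` is an assumption (only the ViT-H id appears on this card), as is loading the text tower with `CLIPTextModel`.

```python
# Minimal sketch: using a Safe-CLIP text encoder with Stable Diffusion 1.4.
# The Safe-CLIP ViT-L repo id below is assumed, not taken from this card;
# check the released checkpoints for the exact name.
from diffusers import StableDiffusionPipeline
from transformers import CLIPTextModel

safe_text_encoder = CLIPTextModel.from_pretrained("aimagelab/safeclip_vit-l_14")

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    text_encoder=safe_text_encoder,  # replace the original CLIP text encoder
)
image = pipe("a portrait photo of a person at a concert").images[0]
image.save("generation.png")
```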
  **Model Release Date** 9 July 2024.

@@ -46,23 +57,14 @@ You can also find the downstream-task example code in the repository of the pa
  ## Applications
  Safe-CLIP can be employed in various applications where safety and appropriateness are critical, including cross-modal retrieval, text-to-image, and image-to-text generation. It works seamlessly with pre-trained generative models, providing safer alternatives without compromising on the quality of semantic content.
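For the retrieval use case mentioned above, a minimal sketch (with placeholder file names) is to embed a text query and a small image gallery with the model's feature extractors and rank the gallery by cosine similarity:

```python
# Minimal sketch: text-to-image retrieval with Safe-CLIP embeddings.
# Gallery paths are placeholders for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "aimagelab/safeclip_vit-h_14"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

gallery = [Image.open(p) for p in ["img_0.jpg", "img_1.jpg", "img_2.jpg"]]
query = "a family hiking in the mountains"

with torch.no_grad():
    image_emb = model.get_image_features(**processor(images=gallery, return_tensors="pt"))
    text_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))

# Normalize and rank gallery images by similarity to the query.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
ranking = (text_emb @ image_emb.T).squeeze(0).argsort(descending=True)
print(ranking.tolist())  # gallery indices, most similar first
```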
  ## Downstream Use
+ More example code is available in the official Safe-CLIP [repo](https://github.com/aimagelab/safe-clip).
  #### Zero-shot classification example
  ```python
- >>> from transformers import CLIPModel
+ >>> from transformers import CLIPModel, CLIPProcessor
+ >>> from PIL import Image

  >>> model_id = "aimagelab/safeclip_vit-h_14"