---
metrics:
- accuracy
pipeline_tag: image-classification
base_model: vit-base-patch16-384
model-index:
- name: vit-base-nsfw-detector
  results:
  - task:
      type: image-classification
      name: Image Classification
    metrics:
    - type: accuracy
      value: 0.9654
      name: Accuracy
    - type: AUC
      value: 0.9948
    - type: loss
      value: 0.0937
      name: Loss
---

# vit-base-nsfw-detector

This model is a fine-tuned version of [vit-base-patch16-384](https://huggingface.co/google/vit-base-patch16-384) on around 2,000 images (drawings, photos, etc.).
It achieves the following results on the evaluation set:
- Loss: 0.0937
- Accuracy: 0.9654

## Model description

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384.

## Intended uses & limitations

There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classifies "sexy" images as NSFW; that is, if an image shows cleavage or a lot of skin, it will be classified as NSFW. This is expected behavior.

Usage for a local image:
```python
from transformers import pipeline
from PIL import Image

# Load an image from disk and classify it with the hosted pipeline.
img = Image.open("<path_to_image_file>")
predict = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")
predict(img)
```
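
The pipeline also accepts a list of images, and a `top_k` argument returns the scores for both classes rather than only the top label. A minimal sketch, assuming a hypothetical `./images` folder (the folder path and file pattern are illustrative, not part of the original example):

```python
from pathlib import Path

from PIL import Image
from transformers import pipeline

predict = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")

# "./images" is a placeholder folder; adjust the path and pattern as needed.
paths = sorted(Path("./images").glob("*.jpg"))
images = [Image.open(p) for p in paths]

# top_k=2 returns the scores for both classes instead of only the top label.
for path, scores in zip(paths, predict(images, top_k=2)):
    print(path.name, scores)
```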

Usage for a remote image:
```python
from transformers import ViTImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests

# Download an image and classify it with the processor and model directly.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
processor = ViTImageProcessor.from_pretrained('AdamCodd/vit-base-nsfw-detector')
model = AutoModelForImageClassification.from_pretrained('AdamCodd/vit-base-nsfw-detector')
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits

# The class with the highest logit is the prediction.
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
# Predicted class: sfw
```
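
If you need a confidence score rather than just the top class, a softmax over the logits gives per-class probabilities. A short sketch continuing from the snippet above; the 0.7 threshold is an arbitrary illustration, and the lowercase `"nsfw"` key is an assumption based on the printed output:

```python
import torch

# Continuing from the example above: turn the logits into class probabilities.
probs = torch.softmax(logits, dim=-1)[0]
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx].item():.4f}")

# Only flag an image above a chosen confidence; 0.7 is an arbitrary example,
# and "nsfw" assumes the lowercase label names shown in the example output.
is_nsfw = probs[model.config.label2id["nsfw"]].item() > 0.7
```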

The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hypothetical sketch of how they map onto a `Trainer` run follows the list):
- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
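
For context, here is a hypothetical sketch of how these hyperparameters could map onto a Transformers `Trainer` run. The training data is not released, so the dataset below is a random stand-in, and the label order is an assumption; the Adam betas and epsilon listed above are the Transformers defaults, so no explicit optimizer setup is needed:

```python
import torch
from transformers import (
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

class PlaceholderDataset(torch.utils.data.Dataset):
    """Random stand-in for the unreleased training data (384x384 RGB tensors)."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {"pixel_values": torch.randn(3, 384, 384),
                "labels": torch.tensor(idx % 2)}

# Swap the 1000-class ImageNet head for a 2-class SFW/NSFW head;
# the id2label order here is an assumption, not taken from the released model.
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-384",
    num_labels=2,
    id2label={0: "sfw", 1: "nsfw"},
    label2id={"sfw": 0, "nsfw": 1},
    ignore_mismatched_sizes=True,
)

args = TrainingArguments(
    output_dir="vit-base-nsfw-detector",
    learning_rate=3e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    seed=42,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=PlaceholderDataset(),
    eval_dataset=PlaceholderDataset(),
)
trainer.train()
```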

### Training results

- Validation Loss: 0.0937
- Accuracy: 0.9654
- AUC: 0.9948

Confusion Matrix:

[1076   37]
[  60 1627]
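
As a sanity check, the reported accuracy can be recovered from this matrix: correct predictions sit on the diagonal, so accuracy is 2703 / 2800 regardless of which class each row represents.

```python
# Recompute the reported accuracy from the confusion matrix above.
matrix = [[1076, 37],
          [60, 1627]]
correct = matrix[0][0] + matrix[1][1]       # 2703 correct predictions
total = sum(sum(row) for row in matrix)     # 2800 evaluation images
print(f"accuracy = {correct / total:.4f}")  # accuracy = 0.9654
```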

### Framework versions

- Transformers 4.36.2
- Evaluate 0.4.1

If you want to support me, you can do so [here](https://ko-fi.com/adamcodd).