	---
	metrics:
	- accuracy
	pipeline_tag: image-classification
	base_model: vit-base-patch16-384
	model-index:
	- name: vit-base-nsfw-detector
	  results:
	  - task:
	      type: image-classification
	      name: Image Classification
	    metrics:
	    - type: accuracy
	      value: 0.9654
	      name: Accuracy
	    - type: AUC
	      value: 0.9948
	    - type: loss
	      value: 0.0937
	      name: Loss
	---

	# vit-base-nsfw-detector

	This model is a fine-tuned version of [vit-base-patch16-384](https://huggingface.co/google/vit-base-patch16-384), trained on around 2000 images (drawings, photos, and more).
	It achieves the following results on the evaluation set:
	- Loss: 0.0937
	- Accuracy: 0.9654


	## Model description

	The base model, the Vision Transformer (ViT), is a transformer encoder (BERT-like) pretrained in a supervised fashion on ImageNet-21k, a large collection of images, at a resolution of 224x224 pixels. It was then fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, at a higher resolution of 384x384.
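
As a quick worked example of what the 384x384 resolution means for the encoder, the sketch below computes the input sequence length. It assumes the 16x16 patch size implied by `patch16` in the base model's name and the usual extra classification token; the variable names are illustrative only.

```python
# Assumptions: square 16x16 patches (from "patch16" in the base model name)
# and one extra [CLS] token prepended to the patch sequence.
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
sequence_length = num_patches + 1  # patches plus the [CLS] token

print(patches_per_side, num_patches, sequence_length)
# 24 576 577
```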

	## Intended uses & limitations

	There are two classes: SFW and NSFW. The model has been trained to be restrictive and therefore classifies "sexy" images as NSFW: if an image shows cleavage or too much skin, it will be classified as NSFW. This is expected behavior.

	Usage for a local image:
	```python
	from transformers import pipeline
	from PIL import Image

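	# Load the image from disk and build the classification pipeline
	img = Image.open("<path_to_image_file>")
	predict = pipeline("image-classification", model="AdamCodd/vit-base-nsfw-detector")
	# Returns a list of {"label": ..., "score": ...} dicts
	predict(img)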
	```
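
Because the model is deliberately restrictive, it can be useful to apply your own cutoff to the pipeline scores rather than taking the top label. The helper below is a minimal sketch, not part of the model or the library: `is_nsfw`, its `threshold` parameter, and the example scores are hypothetical, and the label name `"nsfw"` is an assumption (this card only shows `sfw` as printed output).

```python
def is_nsfw(predictions, threshold=0.5, nsfw_label="nsfw"):
    """Return True if the NSFW score in pipeline-style output reaches the threshold."""
    for item in predictions:
        if item["label"].lower() == nsfw_label:
            return item["score"] >= threshold
    # If the NSFW label is absent from the output, treat the image as not flagged
    return False


# Hypothetical scores, shaped like the list of dicts returned by predict(img)
example = [{"label": "sfw", "score": 0.91}, {"label": "nsfw", "score": 0.09}]

print(is_nsfw(example))                  # False
print(is_nsfw(example, threshold=0.05))  # True
```

Lowering the threshold makes the filter stricter, and raising it makes the filter more permissive.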

	Usage for a remote image:
	```python
	from transformers import ViTImageProcessor, AutoModelForImageClassification
	from PIL import Image
	import requests

	# Download the image and open it with PIL
	url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
	image = Image.open(requests.get(url, stream=True).raw)
	# Load the image processor and the fine-tuned classifier
	processor = ViTImageProcessor.from_pretrained('AdamCodd/vit-base-nsfw-detector')
	model = AutoModelForImageClassification.from_pretrained('AdamCodd/vit-base-nsfw-detector')
	# Preprocess the image and run a forward pass
	inputs = processor(images=image, return_tensors="pt")
	outputs = model(**inputs)
	logits = outputs.logits

	# Pick the class with the highest logit and map it to its label
	predicted_class_idx = logits.argmax(-1).item()
	print("Predicted class:", model.config.id2label[predicted_class_idx])
	# Predicted class: sfw
	```
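
The snippet above only reports the top class. To see how confident the model is, the logits can be turned into probabilities with a softmax. The sketch below uses only the standard library and hypothetical logit values; with PyTorch installed, `torch.softmax(logits, dim=-1)` should give the same result directly on the tensor.

```python
import math


def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the two classes; with the model above, one could
# pass logits[0].tolist() instead.
example_logits = [2.0, -1.0]
probs = softmax(example_logits)
print(probs)
```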

	The model has been trained on a variety of images (realistic, 3D, drawings), yet it is not perfect and some images may be wrongly classified as NSFW when they are not.

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- num_epochs: 1
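
For readers who want to reproduce a similar setup, the hyperparameters above can be written as keyword arguments in the style of the Transformers `TrainingArguments` class. This is a sketch under stated assumptions, not the author's training script: it assumes single-device training (so the per-device batch size equals the reported batch size), and the output directory in the comment is a hypothetical path.

```python
# Assumption: the card's hyperparameter names map onto TrainingArguments
# keyword names as below, with training on a single device.
training_kwargs = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "num_train_epochs": 1,
}

# With transformers installed, these could be passed along with an output
# directory of your choice (hypothetical path):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./vit-nsfw-out", **training_kwargs)
print(training_kwargs)
```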

	### Training results

	- Validation Loss: 0.0937
	- Accuracy: 0.9654
	- AUC: 0.9948

	Confusion Matrix:

	[1076 37]

	[ 60 1627]
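
As a sanity check, the reported accuracy can be recomputed from the confusion matrix. Accuracy depends only on the diagonal and the total, so the sketch below makes no assumption about which row or column corresponds to which class.

```python
# Confusion matrix values copied from the card
confusion = [[1076, 37],
             [60, 1627]]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = correct / total

print(total, correct, round(accuracy, 4))
# 2800 2703 0.9654
```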

	### Framework versions

	- Transformers 4.36.2
	- Evaluate 0.4.1

	If you want to support me, you can do so [here](https://ko-fi.com/adamcodd).