---
language: en
datasets:
- abdulmananraja/real-life-violence-situations
tags:
- image-classification
- vision
- violence-detection
license: apache-2.0
---

# ViT Base Violence Detection

## Model Description

This is a Vision Transformer (ViT) model fine-tuned for violence detection. It is based on [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) and was trained on the [Real Life Violence Situations](https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset) dataset from Kaggle to classify images as violent or non-violent.

## Intended Use

The model is intended for applications that need to detect violent content in images, including:

- Content moderation
- Surveillance
- Parental control software

## Model Accuracy

- Test accuracy (ViT Base): 98.80%
- Test loss: 0.2004

## How to Use

Here is an example of how to use this model for image classification:

```python
import torch
from transformers import ViTForImageClassification, ViTFeatureExtractor
from PIL import Image

# Load the model and feature extractor
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image
image = Image.open('image.jpg')

# Preprocess the image
inputs = feature_extractor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
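
For quick experiments, the checkpoint can also be loaded through the `transformers` `pipeline` API, which wraps preprocessing and label mapping. This is a minimal sketch; the file name `image.jpg` is a placeholder:

```python
from transformers import pipeline

# The image-classification pipeline handles preprocessing and label mapping internally
classifier = pipeline("image-classification", model="jaranohaal/vit-base-violence-detection")

# Returns a list of {"label": ..., "score": ...} dicts sorted by descending score
predictions = classifier("image.jpg")
print(predictions[0]["label"], predictions[0]["score"])
```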
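
Because the underlying dataset consists of video clips, a common pattern is to sample frames from a video and classify each frame independently. The sketch below uses OpenCV for frame extraction; the file name `video.mp4`, the sampling stride, and the per-frame aggregation are illustrative assumptions, not part of this model card.

```python
import cv2
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTFeatureExtractor

model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')
model.eval()

cap = cv2.VideoCapture('video.mp4')  # hypothetical input file
frame_labels = []
frame_idx = 0
stride = 30  # classify roughly one frame per second for 30 fps video (assumption)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % stride == 0:
        # OpenCV returns BGR arrays; convert to an RGB PIL image for the feature extractor
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = feature_extractor(images=image, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        frame_labels.append(model.config.id2label[logits.argmax(-1).item()])
    frame_idx += 1

cap.release()
print(frame_labels)
```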