ViT Base Violence Detection

Model Description

This is a Vision Transformer (ViT) model fine-tuned for violence detection. The model is based on google/vit-base-patch16-224-in21k and has been trained on the Real Life Violence Situations dataset from Kaggle to classify images into violent or non-violent categories.

Intended Use

The model is intended for use in applications where detecting violent content in images is necessary. This can include:

Content moderation
Surveillance
Parental control software

Model Accuracy

Test accuracy for ViT Base: 98.80%. Loss: 0.2004.

How to Use

Here is an example of how to use this model for image classification:

import torch
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image

# Load the model and image processor
# (ViTImageProcessor replaces the deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
processor = ViTImageProcessor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image and convert it to RGB, since the model expects three channels
image = Image.open('image.jpg').convert('RGB')

# Preprocess the image
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
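
The example above reports only the winning class. For uses such as content moderation it can help to also see how confident the model is. The following is a minimal sketch, not part of the original model card, that turns a list of logits into probabilities with a softmax and returns the top label with its score. The helper names softmax and top_prediction, the example logits, and the label names "non-violent" and "violent" are assumptions made for illustration; the actual label names come from model.config.id2label.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical logits and label names, for illustration only.
example_logits = [-1.2, 2.3]
example_labels = {0: "non-violent", 1: "violent"}

label, confidence = top_prediction(example_logits, example_labels)
print(f"{label}: {confidence:.2%}")
```

With the inference code above, the same helper could be called as top_prediction(logits[0].tolist(), model.config.id2label). The cutoff for acting on a prediction is application-specific and is not something the model card specifies.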