---
language: en
datasets:
- abdulmananraja/real-life-violence-situations
tags:
- image-classification
- vision
- violence-detection
license: apache-2.0
---
# ViT Base Violence Detection

## Model Description

This is a Vision Transformer (ViT) model fine-tuned for violence detection. It is based on [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) and was fine-tuned on the [Real Life Violence Situations](https://www.kaggle.com/datasets/mohamedmustafa/real-life-violence-situations-dataset) dataset from Kaggle to classify images as violent or non-violent.
## Intended Use

The model is intended for applications that need to detect violent content in images, such as:

- Content moderation (see the sketch after this list)
- Surveillance
- Parental control software
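
As a rough illustration of the content-moderation use case, the sketch below wraps the model in a simple flagging helper. The `is_violent` function name, the probability threshold, and the assumption that the checkpoint exposes a class literally named `"violent"` in `model.config.label2id` are illustrative choices, not part of the released model; inspect `model.config.id2label` for the actual class names.

```python
import torch
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

model = ViTForImageClassification.from_pretrained("jaranohaal/vit-base-violence-detection")
feature_extractor = ViTFeatureExtractor.from_pretrained("jaranohaal/vit-base-violence-detection")
model.eval()

def is_violent(image_path: str, threshold: float = 0.5) -> bool:
    """Flag an image when the predicted probability of the violent class exceeds the threshold."""
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(-1)[0]
    # Assumed label name; check model.config.id2label to confirm how the classes are named
    violent_idx = model.config.label2id.get("violent", 1)
    return probs[violent_idx].item() >= threshold

print(is_violent("image.jpg"))
```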
## Model Accuracy

- Test accuracy (ViT Base): 98.80%
- Loss: 0.2004
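
As a hedged sketch of how a comparable figure could be reproduced locally, the snippet below assumes a held-out folder laid out as `test/<class name>/<image>.jpg`, with folder names matching the entries in `model.config.label2id`; the path and layout are assumptions, not part of this repository.

```python
import torch
from pathlib import Path
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

model = ViTForImageClassification.from_pretrained("jaranohaal/vit-base-violence-detection")
feature_extractor = ViTFeatureExtractor.from_pretrained("jaranohaal/vit-base-violence-detection")
model.eval()

correct = total = 0
for label_dir in Path("test").iterdir():  # assumed layout: test/<class name>/
    if not label_dir.is_dir():
        continue
    true_idx = model.config.label2id[label_dir.name]  # assumes folder names match the model's labels
    for path in label_dir.glob("*.jpg"):
        image = Image.open(path).convert("RGB")
        inputs = feature_extractor(images=image, return_tensors="pt")
        with torch.no_grad():
            pred_idx = model(**inputs).logits.argmax(-1).item()
        correct += int(pred_idx == true_idx)
        total += 1

print(f"Accuracy: {correct / total:.2%}")
```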
## How to Use

Here is an example of how to use this model for image classification:
```python
import torch
from transformers import ViTForImageClassification, ViTFeatureExtractor
from PIL import Image

# Load the model and feature extractor
model = ViTForImageClassification.from_pretrained('jaranohaal/vit-base-violence-detection')
feature_extractor = ViTFeatureExtractor.from_pretrained('jaranohaal/vit-base-violence-detection')

# Load an image
image = Image.open('image.jpg')

# Preprocess the image
inputs = feature_extractor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()

# Print the predicted class
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
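
For quick experiments, the same checkpoint can also be run through the high-level `pipeline` API, which bundles preprocessing, inference, and label mapping in one call; this is a minimal sketch using the standard `image-classification` task, with `image.jpg` as a placeholder path.

```python
from transformers import pipeline

# The pipeline handles preprocessing, inference, and id-to-label mapping
classifier = pipeline("image-classification", model="jaranohaal/vit-base-violence-detection")

# Returns a list of {'label': ..., 'score': ...} dicts sorted by score
print(classifier("image.jpg"))
```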