metadata
license: apache-2.0
datasets:
- cybersectony/PhishingEmailDetection
library_name: transformers
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- Phishing
- Email
- URL
- Detection
A DistilBERT-based Phishing Email Detection Model
Model Overview
This model is a DistilBERT model fine-tuned with the Hugging Face Trainer API to detect phishing emails.
Key Specifications
- Base Architecture: DistilBERT
- Task: Multilabel Classification
- Fine-tuning Framework: Hugging Face Trainer API
- Training Duration: 3 epochs
Performance Metrics
- F1-score: 97.717%
- Accuracy: 97.716%
- Precision: 97.736%
- Recall: 97.717%
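For readers unfamiliar with how these four metrics relate, the following is a minimal sketch that computes them in plain Python. It is not the evaluation script used for this model: the averaging scheme (support-weighted here) is an assumption, the function name `weighted_metrics` is hypothetical, and the labels are made-up toy data. With support-weighted averaging, recall equals accuracy by construction, which would be consistent with the near-identical accuracy and recall reported above, but the card does not state which averaging was used.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1 as fractions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per label
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / total

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_l = tp / (tp + fp) if tp + fp else 0.0
        r_l = tp / (tp + fn) if tp + fn else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if p_l + r_l else 0.0
        weight = support[label] / total  # weight each label by its share of y_true
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up for illustration: 0 = legitimate, 1 = phishing.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
scores = weighted_metrics(y_true, y_pred)
print({name: f"{value:.2%}" for name, value in scores.items()})
```

Multiplying each fraction by 100 gives values on the same scale as the figures listed above.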
Dataset Details
This model was trained on the cybersectony/PhishingEmailDetection dataset.
Usage Guide
Installation
pip install transformers
pip install torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")
def predict_phishing(email_text):
    """Return the phishing decision and the phishing-class probability for one email."""
    # Preprocess and tokenize (inputs longer than 512 tokens are truncated)
    inputs = tokenizer(email_text, return_tensors="pt", truncation=True, max_length=512)
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # Index 1 is assumed to be the phishing class
    return {
        "is_phishing": bool(predictions[0][1] > 0.5),
        "confidence": float(predictions[0][1])
    }
# Example usage
email = "Your email text here..."
result = predict_phishing(email)
print(f"Is Phishing: {result['is_phishing']}")
print(f"Confidence: {result['confidence']:.2%}")
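The decision step in the function above can be illustrated without loading the model. The sketch below reimplements softmax and the 0.5 threshold in plain Python on hypothetical logits. The helper names `softmax` and `decide`, the logit values, and the assumption that index 1 is the phishing class are all illustrative; the real label order can be checked with `model.config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, phishing_index=1, threshold=0.5):
    """Mirror the usage code: threshold the probability of the assumed phishing class."""
    probability = softmax(logits)[phishing_index]
    return {"is_phishing": probability > threshold, "confidence": probability}


# Hypothetical logits for two emails (not produced by the actual model).
likely_phishing = decide([-1.2, 2.3])
likely_legitimate = decide([3.0, -0.5])

print(likely_phishing)
print(likely_legitimate)
```

Note that "confidence" here is the probability of the phishing class, not the confidence in whichever label was chosen, so an email classified as legitimate reports a low value.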