README.md · cybersectony/phishing-email-detection-distilbert_v2.1 at 0236b9284ffd43140b1d616846890cc8003fc362

metadata

license: apache-2.0
datasets:
  - cybersectony/PhishingEmailDetection
library_name: transformers
language:
  - en
base_model:
  - distilbert/distilbert-base-uncased
tags:
  - Phishing
  - Email
  - URL
  - Detection

A DistilBERT-based Phishing Email Detection Model

Model Overview

This model is a DistilBERT model fine-tuned with the Hugging Face Trainer API to detect phishing emails.

Key Specifications

Base Architecture: DistilBERT
Task: Multilabel Classification
Fine-tuning Framework: Hugging Face Trainer API
Training Duration: 3 epochs

Performance Metrics

F1-score: 97.717%
Accuracy: 97.716%
Precision: 97.736%
Recall: 97.717%
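
The card does not state how these figures were averaged across classes. As a minimal sketch, assuming support-weighted averaging (under which recall equals accuracy by construction, which would fit the near-identical values above), the four metrics can be computed from true and predicted labels as shown here. The function name `weighted_metrics` and the toy labels are hypothetical and are not taken from the model's evaluation code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the evaluation set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(correct.values()) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical): 1 = phishing, 0 = legitimate
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
scores = weighted_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

If the reported numbers were produced with a different averaging mode (for example macro averaging), the per-class weights in the loop would change accordingly.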

Dataset Details

This model was trained on the cybersectony/PhishingEmailDetection dataset.

Usage Guide

Installation

pip install transformers
pip install torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer from this repository
model_id = "cybersectony/phishing-email-detection-distilbert_v2.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

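def predict_phishing(email_text):
    """Return the phishing decision and phishing probability for one email."""
    # Tokenize; text beyond 512 tokens is truncated
    inputs = tokenizer(email_text, return_tensors="pt", truncation=True, max_length=512)

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Assumes label index 1 is the phishing class; verify with model.config.id2label
    phishing_prob = predictions[0][1].item()
    return {
        "is_phishing": phishing_prob > 0.5,
        "confidence": phishing_prob
    }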

# Example usage
email = "Your email text here..."
result = predict_phishing(email)
print(f"Is Phishing: {result['is_phishing']}")
print(f"Confidence: {result['confidence']:.2%}")
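
The 0.5 cutoff used in `predict_phishing` is a choice rather than a property of the model. The post-processing step can be sketched without loading the model at all. This sketch assumes a logit vector in which index 1 is the phishing class, which should be checked against `model.config.id2label`. The names `classify_logits`, `phishing_index`, and `threshold`, and the example logits, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits, phishing_index=1, threshold=0.5):
    """Turn raw logits into the same dict shape that predict_phishing returns."""
    probs = softmax(logits)
    score = probs[phishing_index]  # probability assigned to the assumed phishing class
    return {"is_phishing": score > threshold, "confidence": score}


# Hypothetical logits for a single email
default_result = classify_logits([0.2, 2.3])
strict_result = classify_logits([0.2, 2.3], threshold=0.95)
print(default_result)
print(strict_result)
```

Raising the threshold flags fewer emails, which generally trades recall for precision. A suitable value depends on how costly a missed phishing email is compared with a false alarm.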