File size: 2,706 Bytes
f621008 c6f8426 0b968d4 f2f9a33 5cdd99f f2f9a33 5cdd99f e94292b f2f9a33 5cdd99f f2f9a33 5cdd99f f2f9a33 5cdd99f 3a73780 e94292b 3a73780 5cdd99f 3f4d371 5cdd99f 3a73780 0236b92 3a73780 0236b92 5cdd99f 5aab137 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 5cdd99f 0236b92 d641088 0236b92 d641088 0236b92 d641088 3a73780 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 |
---
license: apache-2.0
datasets:
- cybersectony/PhishingEmailDetection
library_name: transformers
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- Phishing
- Email
- URL
- Detection
---
# A distilBERT based Phishing Email Detection Model
## Model Overview
This model is based on DistilBERT and has been fine-tuned for multilabel classification of Emails and URLs as safe or potentially phishing.
## Key Specifications
- __Base Architecture:__ DistilBERT
- __Task:__ Multilabel Classification
- __Fine-tuning Framework:__ Hugging Face Trainer API
- __Training Duration:__ 3 epochs
## Performance Metrics
- __F1-score:__ 97.717
- __Accuracy:__ 97.716
- __Precision:__ 97.736
- __Recall:__ 97.717
## Dataset Details
The model was trained on a custom dataset of Emails and URLs labeled as legitimate or phishing. The dataset is available at [`cybersectony/PhishingEmailDetection`](https://huggingface.co/datasets/cybersectony/PhishingEmailDetection) on the Hugging Face Hub.
## Usage Guide
## Installation
```bash
pip install transformers
pip install torch
```
## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")
def predict_email(email_text):
# Preprocess and tokenize
inputs = tokenizer(
email_text,
return_tensors="pt",
truncation=True,
max_length=512
)
# Get prediction
with torch.no_grad():
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Get probabilities for each class
probs = predictions[0].tolist()
# Create labels dictionary
labels = {
"legitimate_email": probs[0],
"phishing_url": probs[1],
"legitimate_url": probs[2],
"phishing_url_alt": probs[3]
}
# Determine the most likely classification
max_label = max(labels.items(), key=lambda x: x[1])
return {
"prediction": max_label[0],
"confidence": max_label[1],
"all_probabilities": labels
}
```
## Example Usage
```python
# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""
result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
print(f"{label}: {prob:.2%}")
``` |