File size: 2,558 Bytes
f621008 c6f8426 0b968d4 f2f9a33 fe14407 f2f9a33 3a73780 3f4d371 3a73780 0236b92 3a73780 0236b92 5aab137 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 0236b92 d641088 3a73780 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 |
---
license: apache-2.0
datasets:
- cybersectony/PhishingEmailDetection
library_name: transformers
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- Phishing
- Email
- URL
- Detection
---
**A distilBERT based Phishing Email Detection Model**
**Model Overview**
This model is specifically fine-tuned for detecting phishing emails using the Hugging Face Trainer API.
**Key Specifications**
- __Base Architecture:__ DistilBERT
- __Task:__ Multilabel Classification
- __Fine-tuning Framework:__ Hugging Face Trainer API
- __Training Duration:__ 3 epochs
**Performance Metrics**
- __F1-score:__ 97.717
- __Accuracy:__ 97.716
- __Precision:__ 97.736
- __Recall:__ 97.717
**Dataset Details**
This model was trained using a [Phishing Email Detection Dataset](https://huggingface.co/datasets/cybersectony/PhishingEmailDetection).
**Usage Guide**
**Installation**
```bash
pip install transformers
pip install torch
```
**Quick Start**
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")
def predict_email(email_text):
# Preprocess and tokenize
inputs = tokenizer(
email_text,
return_tensors="pt",
truncation=True,
max_length=512
)
# Get prediction
with torch.no_grad():
outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Get probabilities for each class
probs = predictions[0].tolist()
# Create labels dictionary
labels = {
"legitimate_email": probs[0],
"phishing_url": probs[1],
"legitimate_url": probs[2],
"phishing_url_alt": probs[3]
}
# Determine the most likely classification
max_label = max(labels.items(), key=lambda x: x[1])
return {
"prediction": max_label[0],
"confidence": max_label[1],
"all_probabilities": labels
}
```
**Example Usage**
```python
# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""
result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
print(f"{label}: {prob:.2%}")
``` |