---
license: apache-2.0
datasets:
- cybersectony/PhishingEmailDetection
library_name: transformers
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- Phishing
- Email
- URL
- Detection
---

# A DistilBERT-Based Phishing Email Detection Model

## Model Overview
This model is a DistilBERT checkpoint fine-tuned with the Hugging Face Trainer API to detect phishing emails and URLs.

## Key Specifications
- __Base Architecture:__ DistilBERT
- __Task:__ Multiclass Classification (four labels, one prediction per input)
- __Fine-tuning Framework:__ Hugging Face Trainer API
- __Training Duration:__ 3 epochs

## Performance Metrics
- __F1-score:__ 97.717%
- __Accuracy:__ 97.716%
- __Precision:__ 97.736%
- __Recall:__ 97.717%
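
The card does not state how precision, recall, and F1 were averaged across classes. The sketch below shows one common choice, support-weighted averaging, in plain Python. The function name `weighted_metrics` and the choice of weighted averaging are assumptions made for illustration, not the author's evaluation code. Under weighted averaging, recall always equals accuracy, which would be consistent with the nearly identical accuracy and recall values listed above.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1.

    Hypothetical helper: the averaging scheme is an assumption,
    since the model card does not specify one.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total  # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Calling `weighted_metrics(labels, predictions)` on integer class IDs from a held-out split returns the four scores as fractions between 0 and 1.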

## Dataset Details

This model was trained using a [Phishing Email Detection Dataset](https://huggingface.co/datasets/cybersectony/PhishingEmailDetection).

## Usage Guide

The sections below cover installation, a reusable prediction helper, and a sample run.

## Installation

```bash
pip install transformers
pip install torch
```

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer (replace the placeholder with the actual model repository ID)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")

def predict_email(email_text):
    # Preprocess and tokenize
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )
    
    # Get prediction
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    
    # Get probabilities for each class
    probs = predictions[0].tolist()
    
    # Create labels dictionary
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }
    
    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])
    
    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
```

## Example Usage

```python
# Classify a sample email (assumes predict_email from the Quick Start is defined)
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
```
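
If a single yes/no decision is more convenient than four probabilities, the result can be post-processed. The following is a minimal sketch, assuming the two labels that start with `phishing` in the Quick Start should be grouped together and that a 0.5 cutoff is acceptable. The helper name `is_phishing`, the grouping, and the threshold are all assumptions for illustration and should be tuned on validation data.

```python
# Label names come from the Quick Start dictionary; grouping them is an assumption.
PHISHING_LABELS = ("phishing_url", "phishing_url_alt")


def is_phishing(result, threshold=0.5):
    """Return (flagged, score) from a predict_email-style result dictionary.

    Hypothetical helper: sums the probabilities of the phishing labels
    and compares the total against an assumed threshold.
    """
    probs = result["all_probabilities"]
    phishing_score = sum(probs.get(label, 0.0) for label in PHISHING_LABELS)
    return phishing_score >= threshold, phishing_score
```

For example, `flagged, score = is_phishing(predict_email(email))` yields a boolean flag together with the combined phishing probability.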