cybersectony/phishing-email-detection-distilbert_v2.1

	---
	license: apache-2.0
	datasets:
	- cybersectony/PhishingEmailDetection
	library_name: transformers
	language:
	- en
	base_model:
	- distilbert/distilbert-base-uncased
	tags:
	- Phishing
	- Email
	- URL
	- Detection
	---

	# A DistilBERT-Based Phishing Email Detection Model

	## Model Overview
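	This model is a fine-tuned DistilBERT (`distilbert-base-uncased`) that classifies emails and URLs as legitimate or phishing. The Quick Start code applies a softmax over four output classes, so each input receives a single label: the task is multiclass rather than multilabel classification.

	## Key Specifications
	- __Base Architecture:__ DistilBERT
	- __Task:__ Multiclass Classification (four classes)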
	- __Fine-tuning Framework:__ Hugging Face Trainer API
	- __Training Duration:__ 3 epochs

	## Performance Metrics
	- __F1-score:__ 97.717%
	- __Accuracy:__ 97.716%
	- __Precision:__ 97.736%
	- __Recall:__ 97.717%
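
The card does not say how the per-class scores were averaged. Accuracy and recall agreeing to within rounding is what support-weighted averaging would produce, since weighted recall always equals accuracy, but that is an inference rather than something stated here. As a minimal sketch under that assumption, the four metrics can be computed in plain Python; the function name `weighted_scores` and the toy labels are invented for illustration.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / total  # class weight = share of true examples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for four classes, invented for illustration only.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 3]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

On the toy data, weighted recall and accuracy coincide while precision and F1 differ, which mirrors the pattern in the reported numbers.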

	## Dataset Details

	The model was trained on a custom dataset of emails and URLs labeled as legitimate or phishing. The dataset is available on the Hugging Face Hub as [`cybersectony/PhishingEmailDetection`](https://huggingface.co/datasets/cybersectony/PhishingEmailDetection).


	## Usage Guide

	## Installation

	```bash
	pip install transformers
	pip install torch
	```

	## Quick Start

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer from the Hugging Face Hub
	MODEL_ID = "cybersectony/phishing-email-detection-distilbert_v2.1"
	tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
	model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

	def predict_email(email_text):
	    # Tokenize; inputs longer than 512 tokens are truncated
	    inputs = tokenizer(
	        email_text,
	        return_tensors="pt",
	        truncation=True,
	        max_length=512
	    )

	    # Run the model without tracking gradients and convert logits to probabilities
	    with torch.no_grad():
	        outputs = model(**inputs)
	        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

	    # Probabilities for each class of the single input
	    probs = predictions[0].tolist()

	    # Map class indices to label names (order as given in this model card;
	    # compare with model.config.id2label before relying on it)
	    labels = {
	        "legitimate_email": probs[0],
	        "phishing_url": probs[1],
	        "legitimate_url": probs[2],
	        "phishing_url_alt": probs[3]
	    }

	    # Pick the label with the highest probability
	    max_label = max(labels.items(), key=lambda x: x[1])

	    return {
	        "prediction": max_label[0],
	        "confidence": max_label[1],
	        "all_probabilities": labels
	    }
	```
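
To make the softmax step in `predict_email` concrete, here is a plain-Python sketch of the same computation on made-up logits. The helper `softmax` and the example values are illustrative and not taken from the model; the real code uses `torch.nn.functional.softmax`.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the four output classes (not real model output).
example_logits = [0.2, 3.1, -1.0, 0.5]
example_probs = softmax(example_logits)

# Index of the most probable class, as max() over the labels dict does above.
top_index = max(range(len(example_probs)), key=example_probs.__getitem__)
print(example_probs, top_index)
```

Softmax preserves the ordering of the logits, so the class with the largest logit is also the reported prediction; the `confidence` field is that class's share of the total probability.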

	## Example Usage

	```python
	# Example usage
	email = """
	Dear User,
	Your account security needs immediate attention. Please verify your credentials.
	Click here: http://suspicious-link.com
	"""

	result = predict_email(email)
	print(f"Prediction: {result['prediction']}")
	print(f"Confidence: {result['confidence']:.2%}")
	print("\nAll probabilities:")
	for label, prob in result['all_probabilities'].items():
	    print(f"{label}: {prob:.2%}")
	```
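
Callers that only need a yes/no answer could collapse the four probabilities into one phishing score. The sketch below assumes the label names from the Quick Start; the helpers `phishing_score` and `is_phishing` and the 0.5 threshold are hypothetical choices for illustration, not part of the model or a tuned value.

```python
def phishing_score(all_probabilities):
    """Sum the probability of every label whose name starts with 'phishing'."""
    return sum(
        prob for label, prob in all_probabilities.items()
        if label.startswith("phishing")
    )

def is_phishing(all_probabilities, threshold=0.5):
    """Hypothetical helper: flag the input when the summed phishing
    probability reaches the threshold (0.5 is an arbitrary default)."""
    return phishing_score(all_probabilities) >= threshold

# Hand-written probabilities in the shape returned by predict_email
# (not real model output).
sample = {
    "legitimate_email": 0.10,
    "phishing_url": 0.45,
    "legitimate_url": 0.05,
    "phishing_url_alt": 0.40,
}
flagged = is_phishing(sample)
print(f"Phishing score: {phishing_score(sample):.2%}, flagged: {flagged}")
```

With a real prediction this would be called as `is_phishing(result["all_probabilities"])`. Summing the phishing classes can flag an input even when no single class dominates, so a suitable threshold depends on how costly false positives are for the application.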