---
license: apache-2.0
datasets:
- cybersectony/PhishingEmailDetection
library_name: transformers
language:
- en
base_model:
- distilbert/distilbert-base-uncased
tags:
- Phishing
- Email
- URL
- Detection
---

**A DistilBERT-based Phishing Email Detection Model**

**Model Overview**

This DistilBERT model was fine-tuned with the Hugging Face Trainer API to detect phishing emails.

**Key Specifications**

- __Base Architecture:__ DistilBERT
- __Task:__ Multilabel Classification
- __Fine-tuning Framework:__ Hugging Face Trainer API
- __Training Duration:__ 3 epochs

**Performance Metrics**

- __F1-score:__ 97.717
- __Accuracy:__ 97.716
- __Precision:__ 97.736
- __Recall:__ 97.717

**Dataset Details**

This model was trained using a [Phishing Email Detection Dataset](https://huggingface.co/datasets/cybersectony/PhishingEmailDetection).

**Usage Guide**

**Installation**

```bash
pip install transformers
pip install torch
```

**Quick Start**

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer ("your-username/model-name" is a placeholder repository ID)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")
model = AutoModelForSequenceClassification.from_pretrained("your-username/model-name")

def predict_email(email_text):
    # Preprocess and tokenize; inputs longer than 512 tokens are truncated
    inputs = tokenizer(
        email_text,
        return_tensors="pt",
        truncation=True,
        max_length=512
    )

    # Get prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get probabilities for each class
    probs = predictions[0].tolist()

    # Create labels dictionary
    labels = {
        "legitimate_email": probs[0],
        "phishing_url": probs[1],
        "legitimate_url": probs[2],
        "phishing_url_alt": probs[3]
    }

    # Determine the most likely classification
    max_label = max(labels.items(), key=lambda x: x[1])

    return {
        "prediction": max_label[0],
        "confidence": max_label[1],
        "all_probabilities": labels
    }
```

**Example Usage**

```python
# Example usage
email = """
Dear User,
Your account security needs immediate attention. Please verify your credentials.
Click here: http://suspicious-link.com
"""

result = predict_email(email)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.2%}")
print("\nAll probabilities:")
for label, prob in result['all_probabilities'].items():
    print(f"{label}: {prob:.2%}")
```
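
**How the Probabilities Are Computed**

The Quick Start code turns the model's raw logits into class probabilities with a softmax and then picks the highest one. The following is a minimal sketch of that step in plain Python, with no model required. The `softmax` helper and the `example_logits` values are illustrative assumptions, not output from this model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the resulting probabilities.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the four classes, in the order used by predict_email
example_logits = [0.2, 3.1, -1.0, 0.5]
probs = softmax(example_logits)

# Index of the most likely class, mirroring the max() call in predict_email
best_index = max(range(len(probs)), key=probs.__getitem__)
print(f"Most likely class index: {best_index}")
print(f"Probability: {probs[best_index]:.2%}")
```

Because softmax normalizes across all four classes, the values in `all_probabilities` always sum to 1, so a high score for one label necessarily lowers the others.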
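
**Turning the Output into a Yes/No Decision**

The dictionary returned by `predict_email` reports four probabilities, but many applications only need a single phishing flag. One possible approach, sketched below, sums the probabilities of the labels whose names begin with `phishing` and compares the total with a threshold. The `is_phishing` helper, the default threshold of 0.5, and the `sample` values are assumptions for illustration; they are not part of the model or its training setup.

```python
def is_phishing(all_probabilities, threshold=0.5):
    """Return True when the summed phishing probability reaches the threshold."""
    # Assumption: every label whose name starts with "phishing" counts as phishing.
    phishing_score = sum(
        prob
        for label, prob in all_probabilities.items()
        if label.startswith("phishing")
    )
    return phishing_score >= threshold

# Hypothetical probabilities shaped like result["all_probabilities"]
sample = {
    "legitimate_email": 0.05,
    "phishing_url": 0.60,
    "legitimate_url": 0.05,
    "phishing_url_alt": 0.30,
}

print(is_phishing(sample))                  # default threshold
print(is_phishing(sample, threshold=0.95))  # stricter threshold
```

A higher threshold reduces false alarms at the cost of missing more phishing messages, so the value is best chosen against a labeled validation set.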