malphish-eater-v1 / README.md
license: cc-by-nc-4.0
datasets:
  - FredZhang7/malicious-website-features-2.4M
wget:
  - text: https://chat.openai.com/
  - text: https://huggingface.co/FredZhang7/aivance-safesearch-v3
metrics:
  - accuracy
language:
  - af
  - en
  - et
  - sw
  - sv
  - sq
  - de
  - ca
  - hu
  - da
  - tl
  - so
  - fi
  - fr
  - cs
  - hr
  - cy
  - es
  - sl
  - tr
  - pl
  - pt
  - nl
  - id
  - sk
  - lt
  - 'no'
  - lv
  - vi
  - it
  - ro
  - ru
  - mk
  - bg
  - th
  - ja
  - ko
  - multilingual

The classification task is split into two stages:

  1. URL features model
    • 96.5%+ accuracy on training and validation data
    • 2,436,727 rows of labelled URLs
  2. Website features model
    • 98.2% accuracy on training data, 98.7% on validation data
    • 911,180 rows of 11 features

URL Features

from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("FredZhang7/malware-phisher")
model = AutoModelForSequenceClassification.from_pretrained("FredZhang7/malware-phisher")
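
A minimal sketch of how the loaded model might be used to score a single URL. This README does not say how URLs should be preprocessed or what the output labels are called, so the example assumes the raw URL string is passed straight to the tokenizer and reads label names from `model.config.id2label`. The helper names `softmax`, `top_label`, and `classify_url` are hypothetical and not part of this repository.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label.get(best, str(best)), probs[best]


def classify_url(url, model_id="FredZhang7/malware-phisher"):
    """Hypothetical helper: load the model and classify one URL.

    Assumption: the raw URL string is a valid input for the tokenizer.
    """
    # Heavy imports are kept local so the helpers above work without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(url, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_label(logits, model.config.id2label)
```

Calling `classify_url("https://chat.openai.com/")` would download the model on first use and return a label with its probability. For many URLs, load the tokenizer and model once rather than on every call.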

Website Features

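# Install the dependency first (shell command): pip install lightgbm
import lightgbm as lgb
# Load the trained booster from its text model file
model = lgb.Booster(model_file="malicious_features_combined.txt")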
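
A rough sketch of scoring rows with the loaded booster. The only detail taken from this README is that the website features model uses 11 features; their names, order, and scaling are not listed here, so the rows below are assumed to be already-extracted numeric vectors in the order used during training. `validate_rows` and `score_rows` are hypothetical helper names.

```python
NUM_FEATURES = 11  # from this README: "911,180 rows of 11 features"


def validate_rows(rows):
    """Check that every row has exactly NUM_FEATURES numeric values."""
    checked = []
    for i, row in enumerate(rows):
        if len(row) != NUM_FEATURES:
            raise ValueError(
                f"row {i} has {len(row)} values, expected {NUM_FEATURES}"
            )
        checked.append([float(v) for v in row])
    return checked


def score_rows(rows, model_file="malicious_features_combined.txt"):
    """Hypothetical helper: load the booster and predict on the given rows.

    Assumption: feature order matches the order used when the model was trained.
    """
    # Imported here so validate_rows can be used without these packages.
    import lightgbm as lgb
    import numpy as np

    booster = lgb.Booster(model_file=model_file)
    data = np.asarray(validate_rows(rows), dtype=np.float64)
    # Returns one prediction (or one vector of class scores) per row.
    return booster.predict(data)
```

How to interpret the returned scores depends on the objective the model was trained with, which this README does not state.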