---
license: mit
language:
- en
base_model:
- CrabInHoney/urlbert-tiny-base-v1
pipeline_tag: text-classification
tags:
- classification
- url
- urls
- phishing
new_version: CrabInHoney/urlbert-tiny-v2-phishing-classifier
---

This is a very small BERT model that classifies URLs as phishing or non-phishing.

Model size: 6.53M params

Tensor type: F32

[Dataset](https://www.kaggle.com/datasets/taruntiwarihp/phishing-site-urls "Dataset")

Example:

    from transformers import BertTokenizerFast, BertForSequenceClassification, pipeline
    import torch

    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Local directory containing the downloaded model files
    model_path = "./urlbert-tiny-v1-phishing-classifier"

    tokenizer = BertTokenizerFast.from_pretrained(model_path)
    model = BertForSequenceClassification.from_pretrained(model_path)
    model.to(device)

    # return_all_scores=True returns the score of every class;
    # newer transformers releases deprecate it in favor of top_k=None
    classifier = pipeline(
        "text-classification",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1,
        return_all_scores=True
    )

    test_urls = [
        "en.wikipedia.org/wiki/",
        "facebook-profile.km6.net"
    ]

    for url in test_urls:
        results = classifier(url)
        print(f"\nURL: {url}")
        # results[0] holds one {label, score} entry per class
        for result in results[0]:
            label = result['label']
            score = result['score']
            print(f"Class: {label}, probability: {score:.4f}")

Output:

    Using device: cuda

    URL: en.wikipedia.org/wiki/
    Class: good, probability: 0.9995
    Class: phish, probability: 0.0005

    URL: facebook-profile.km6.net
    Class: good, probability: 0.0012
    Class: phish, probability: 0.9988

## License

[MIT](https://choosealicense.com/licenses/mit/)
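
## Turning scores into a verdict

The example prints one probability per class; in practice a single decision per URL is often more convenient. The following is a minimal sketch, assuming the classifier output for one URL is a list of `{label, score}` dictionaries with the labels `good` and `phish`, as in the example output. The helper name `phishing_verdict` and the default threshold of 0.5 are hypothetical choices for illustration and are not part of the model.

```python
from typing import Dict, List, Union

ScoreEntry = Dict[str, Union[str, float]]


def phishing_verdict(scores: List[ScoreEntry], threshold: float = 0.5) -> str:
    """Return 'phish' when the phish score reaches the threshold, else 'good'.

    `scores` is the per-URL list of {label, score} entries.
    The threshold value is an assumption, not a tuned setting.
    """
    by_label = {entry["label"]: entry["score"] for entry in scores}
    return "phish" if by_label.get("phish", 0.0) >= threshold else "good"


# Scores copied from the example output for the two test URLs
wikipedia = [{"label": "good", "score": 0.9995}, {"label": "phish", "score": 0.0005}]
suspicious = [{"label": "good", "score": 0.0012}, {"label": "phish", "score": 0.9988}]

print(phishing_verdict(wikipedia))   # good
print(phishing_verdict(suspicious))  # phish
```

Raising the threshold trades missed phishing links for fewer false alarms; a suitable value depends on the application and would need to be checked against labeled data.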