FredZhang7 committed
Commit e0e11e3
Parent(s): 40f35ef

update descriptions

README.md CHANGED
@@ -51,11 +51,11 @@ language:
 
 The classification task is split into two stages:
 1. URL features model
-   - 96.5%+ accuracy on training and validation data
+   - **96.5%+ accuracy** on training and validation data
    - 2,436,727 rows of labelled URLs
 2. Website features model
-   -
-   - 911,180 rows of
+   - **100.0% accuracy** on training and validation data
+   - 911,180 rows of 43 features
 
 ## Training Features
 I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
@@ -72,6 +72,20 @@ params = {
     'num_boost_round': [500, 750, 800, 900, 1000, 1250, 2000]
 }
 ```
+To reproduce the 100.0% accuracy model, you can follow the data analysis on the dataset page to filter out the unimportant features.
+Then train a LightGBM model using the hyperparameters best suited to this task:
+```python
+params = {
+    'objective': 'binary',
+    'metric': 'binary_logloss',
+    'boosting_type': 'gbdt',
+    'num_leaves': 31,
+    'learning_rate': 0.01,
+    'feature_fraction': 0.6,
+    'early_stopping_rounds': 10,
+    'num_boost_round': 800
+}
+```
 
 ## URL Features