FredZhang7 committed on
Commit e0e11e3
1 Parent(s): 40f35ef

update descriptions

Files changed (1): README.md (+17 -3)
```diff
@@ -51,11 +51,11 @@ language:
 
 The classification task is split into two stages:
 1. URL features model
-   - 96.5%+ accuracy on training and validation data
+   - **96.5%+ accuracy** on training and validation data
    - 2,436,727 rows of labelled URLs
 2. Website features model
-   - 98.2% on training data, 98.7% accuracy on validation
-   - 911,180 rows of 11 features
+   - **100.0% accuracy** on training and validation data
+   - 911,180 rows of 43 features
 
 ## Training Features
 I applied cross-validation with `cv=5` to the training dataset to search for the best hyperparameters.
@@ -72,6 +72,20 @@ params = {
     'num_boost_round': [500, 750, 800, 900, 1000, 1250, 2000]
 }
 ```
+To reproduce the 100.0% accuracy model, you can follow the data analysis in the dataset page to filter out the unimportant features.
+Then train a LightGBM model using the hyperparameters best suited for this task:
+```python
+params = {
+    'objective': 'binary',
+    'metric': 'binary_logloss',
+    'boosting_type': 'gbdt',
+    'num_leaves': 31,
+    'learning_rate': 0.01,
+    'feature_fraction': 0.6,
+    'early_stopping_rounds': 10,
+    'num_boost_round': 800
+}
+```
 
 ## URL Features
```