ayethuzar committed
Commit 256f0db · 2 parents: a8c0953 8d029ab

Merge branch 'milestone-3' of https://github.com/aye-thuzar/CS634Project into milestone-3

CS634Project_Milestone3_AyeThuzar.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,10 +1,16 @@
 # CS634Project
-Milestone-3 notebook: https://colab.research.google.com/drive/17-7A0RkGcwqcJw0IcSvkniDmhbn5SuXe
 Hugging Face App:
-Results:
 XGBoost Model's RMSE: 28986 (Milestone-2)
@@ -14,15 +20,15 @@ Optuna optimized LGBM's RMSE: 13799.282803291926
 ***********
-Totalnumber of trials: 120
-Best RMSE score on validation data: 12338.665498601415
-------------------------------
-Best params:
-------------------------------
 boosting_type : goss
@@ -44,6 +50,62 @@ min_child_samples : 1
 ***********
-Reference:
 https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main
 
# CS634Project

Milestone-3 notebook: https://github.com/aye-thuzar/CS634Project/blob/milestone-3/CS634Project_Milestone3_AyeThuzar.ipynb (Colab: https://colab.research.google.com/drive/1BeoZ4Dxhgd6OcUwPhk6rKCeFnDFMUCmt#scrollTo=TZ4Ci-YXOSl6)

Hugging Face App:

App Demonstration Video:

***********

Results

***********
 
XGBoost Model's RMSE: 28986 (Milestone-2)

Optuna optimized LGBM's RMSE: 13799.282803291926

***********

Hyperparameter Tuning with Optuna

***********

Total number of trials: 120

Best RMSE score on validation data: 12338.665498601415

**Best params:**

boosting_type : goss

min_child_samples : 1
  ***********
## Documentation for Milestone 4

***********

Dataset: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview

**Data Processing and Feature Selection:**

For feature selection, I started by dropping columns with a low correlation (< 0.4) with SalePrice, then dropped columns with low variance (< 1). Next, I examined the correlation matrix of the remaining columns and, guided by domain knowledge, dropped selected columns whose pairwise correlation exceeded 0.5. I then checked for NAs in the numerical columns and filled them with the most appropriate value, which in this case was 0; categorical NAs were replaced with 'None'. Once all the NAs were handled, I used LabelEncoder to encode the categorical values, then re-checked the correlations between columns and dropped a few more based on domain knowledge.
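A minimal sketch of those steps on a toy frame (all values hypothetical; the notebook's exact pipeline may differ, and the low-variance filter is applied the same way as the correlation filter):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the Kaggle training frame (hypothetical values).
df = pd.DataFrame({
    "SalePrice":   [208500, 181500, 223500, 140000],
    "OverallQual": [7, 6, 8, 5],
    "MoSold":      [4, 3, 6, 7],                          # weakly correlated with SalePrice
    "MasVnrArea":  [196.0, None, 162.0, None],            # numerical NAs -> 0
    "GarageType":  ["Attchd", None, "Attchd", "Detchd"],  # categorical NAs -> 'None'
})

# 1. Drop numeric columns weakly correlated (|r| < 0.4) with SalePrice.
corr = df.corr(numeric_only=True)["SalePrice"].abs()
df = df.drop(columns=[c for c in corr.index if c != "SalePrice" and corr[c] < 0.4])

# 2. Fill NAs: 0 for numeric columns, 'None' for categorical ones.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(0)
df[cat_cols] = df[cat_cols].fillna("None")

# 3. Label-encode the categorical columns.
for c in cat_cols:
    df[c] = LabelEncoder().fit_transform(df[c])
```

Note that LabelEncoder fits one column at a time, which is why the loop is needed.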
Here are the 10 features I selected:

- 'OverallQual': Overall material and finish quality
- 'YearBuilt': Original construction date
- 'TotalBsmtSF': Total square feet of basement area
- 'GrLivArea': Above grade (ground) living area square feet
- 'MasVnrArea': Masonry veneer area in square feet
- 'BsmtFinType1': Quality of basement finished area
- 'Neighborhood': Physical locations within Ames city limits
- 'GarageType': Garage location
- 'SaleCondition': Condition of sale
- 'BsmtExposure': Walkout or garden-level basement walls

All the attributes are encoded and normalized before being split into train and test sets (80% train, 20% test).
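The normalize-and-split step can be sketched as follows; MinMaxScaler is an assumption for "normalized" (the notebook may use a different scaler), and the data here are synthetic:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical encoded feature matrix (10 columns) and target.
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(100, 10)).astype(float)
y = rng.normal(180000, 40000, size=100)

# Scale every feature into [0, 1].
X_scaled = MinMaxScaler().fit_transform(X)

# 80% train / 20% test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```

In a real pipeline the scaler should be fit on the training portion only to avoid leaking test statistics into training.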
**Milestone 2:**

For Milestone 2, I ran an XGBoost model with objective="reg:squarederror" and max_depth=3. The RMSE score is 28986.
**Milestone 3:**

**References:**

https://towardsdatascience.com/analysing-interactions-with-shap-8c4a2bc11c2a

https://towardsdatascience.com/introduction-to-shap-with-python-d27edc23c454

https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/

https://www.kaggle.com/code/rnepal2/lightgbm-optuna-housing-prices-regression/notebook

https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/

https://towardsdatascience.com/why-is-everyone-at-kaggle-obsessed-with-optuna-for-hyperparameter-tuning-7608fdca337c

https://github.com/adhok/streamlit_ames_housing_price_prediction_app/tree/main