shwetashweta05 committed
Commit c700107 · verified · 1 Parent(s): d2060c2

Update pages/Introduction of Ensemble.py

Files changed (1)
  1. pages/Introduction of Ensemble.py +269 -0
pages/Introduction of Ensemble.py CHANGED
@@ -0,0 +1,269 @@
+ import streamlit as st
+
+ st.title(":red[**Introduction to Ensemble Learning**]")
+
+ st.markdown("""
+ **Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.
+
+ The idea is that a **group of models** can outperform any individual model by:
+ - **Reducing variance** (overfitting),
+ - **Reducing bias** (underfitting),
+ - **Improving prediction accuracy**.
+
+ ---
+
+ ### Why Use Ensemble Methods?
+
+ - Improves performance and stability.
+ - Reduces the risk of overfitting.
+ - Works well in both classification and regression tasks.
+ - Often wins data science competitions (e.g., Kaggle).
+
+ ---
+
+ ### Common Ensemble Techniques
+
+ 1. **Bagging** (Bootstrap Aggregating)
+ - Builds multiple models in parallel.
+ - Reduces **variance**.
+ - Example: `RandomForest`
+
+ 2. **Boosting**
+ - Builds models sequentially, each correcting errors from the previous.
+ - Reduces **bias**.
+ - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`
+
+ 3. **Stacking**
+ - Combines different model types.
+ - A meta-model learns how to best combine them (see the sketch below).
+
+ ---
+
+ ### Real-World Examples
+ - **Random Forest**: A popular bagging method using decision trees.
+ - **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
+ - **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).
+
+ ---
+
+ **In short:** ensemble learning means smarter predictions by having models work together.
+ """)
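+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # the stacking idea mentioned in the overview above, using scikit-learn's StackingClassifier.
+ st.markdown("A minimal sketch of **stacking** with scikit-learn (illustrative only; assumes scikit-learn is available):")
+ st.code("""
+ from sklearn.datasets import load_breast_cancer
+ from sklearn.ensemble import StackingClassifier, RandomForestClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import SVC
+ from sklearn.model_selection import cross_val_score
+
+ X, y = load_breast_cancer(return_X_y=True)
+
+ # Base learners of different types; a logistic-regression meta-model combines their outputs.
+ stack = StackingClassifier(
+     estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
+                 ("svm", SVC(probability=True, random_state=42))],
+     final_estimator=LogisticRegression(max_iter=1000),
+ )
+ print(cross_val_score(stack, X, y, cv=5).mean())
+ """, language="python")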
+
+
+ st.subheader(":blue[**Voting Ensemble (Classifier)**]")
+
+ st.markdown("""
+ In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.
+
+ ---
+
+ ### Types of Voting:
+
+ #### Hard Voting
+ - Each model votes for a class label.
+ - The final prediction is the **majority vote**.
+ - Useful when all models are equally good.
+
+ #### Soft Voting
+ - Uses **predicted probabilities** from models.
+ - Averages probabilities and picks the class with the **highest average probability**.
+ - Works best when base models are **well-calibrated**.
+
+ ---
+
+ ### Why Use Voting?
+ - Combines **strengths** of different models.
+ - Reduces the **risk of overfitting**.
+ - Often **improves accuracy** over individual models.
+ """)
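+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # hard vs. soft voting with scikit-learn's VotingClassifier.
+ st.markdown("A minimal sketch of hard and soft voting (illustrative only):")
+ st.code("""
+ from sklearn.datasets import load_iris
+ from sklearn.ensemble import VotingClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import SVC
+ from sklearn.tree import DecisionTreeClassifier
+ from sklearn.model_selection import cross_val_score
+
+ X, y = load_iris(return_X_y=True)
+ estimators = [("lr", LogisticRegression(max_iter=1000)),
+               ("dt", DecisionTreeClassifier(random_state=42)),
+               ("svm", SVC(probability=True, random_state=42))]
+
+ # Hard voting: each model votes for a label and the majority wins.
+ hard = VotingClassifier(estimators=estimators, voting="hard")
+ # Soft voting: average predicted probabilities (base models must expose predict_proba).
+ soft = VotingClassifier(estimators=estimators, voting="soft")
+
+ print(cross_val_score(hard, X, y, cv=5).mean())
+ print(cross_val_score(soft, X, y, cv=5).mean())
+ """, language="python")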
+
+ st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")
+
+ st.markdown("""
+ **Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.
+
+ It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.
+
+ ---
+
+ ### How It Works:
+
+ 1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
+ 2. Train a **separate model** on each subset.
+ 3. Aggregate the predictions:
+ - For **classification**: majority vote.
+ - For **regression**: average.
+
+ ---
+
+ ### Key Points:
+
+ - Models are trained **independently and in parallel**.
+ - Often used with **Decision Trees**.
+ - Final prediction is **more robust** than any individual model.
+
+ ---
+
+ ### Example:
+ A well-known example of Bagging is the **Random Forest** algorithm:
+ - Uses multiple decision trees trained on bootstrapped samples.
+ - Adds feature randomness for further diversity.
+
+ """)
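+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # bagging decision trees with scikit-learn's BaggingClassifier.
+ st.markdown("A minimal sketch of bagging with decision trees (illustrative only):")
+ st.code("""
+ from sklearn.datasets import load_wine
+ from sklearn.ensemble import BaggingClassifier
+ from sklearn.tree import DecisionTreeClassifier
+ from sklearn.model_selection import train_test_split
+
+ X, y = load_wine(return_X_y=True)
+ X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
+
+ # 100 trees, each fit on a bootstrap sample of the training data
+ # (the keyword is `base_estimator` in scikit-learn versions before 1.2).
+ bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
+                         n_estimators=100, bootstrap=True, random_state=42)
+ bag.fit(X_train, y_train)
+ print(bag.score(X_test, y_test))
+ """, language="python")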
+
+
+ st.subheader(":blue[**What is Random Forest?**]")
+
+ st.markdown("""
+ **Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.
+
+ It is based on the **Bagging** technique and introduces **randomness** at two levels:
+ - Random sampling of data (bootstrap samples).
+ - Random subset of features for splitting at each node.
+
+ ---
+
+ ### How It Works:
+
+ 1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
+ 2. **Train multiple Decision Trees** on different subsets.
+ 3. Each tree makes a prediction.
+ 4. The final output is:
+ - **Majority vote** (for classification).
+ - **Average prediction** (for regression).
+
+ ---
+
+ ### Key Benefits:
+
+ - Handles **high-dimensional** data well.
+ - Reduces **overfitting** (more than a single Decision Tree).
+ - Works for both **classification** and **regression** tasks.
+ - **Feature importance** is easy to extract.
+
+ ---
+
+ ### Real-Life Analogy:
+ Imagine asking a **group of experts** instead of one person – each tree gives their opinion, and the forest makes the final decision based on consensus!
+
+ """)
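+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # training a Random Forest and reading its feature importances.
+ st.markdown("A minimal sketch of a Random Forest classifier (illustrative only):")
+ st.code("""
+ from sklearn.datasets import load_iris
+ from sklearn.ensemble import RandomForestClassifier
+ from sklearn.model_selection import train_test_split
+
+ data = load_iris()
+ X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)
+
+ # 200 trees; each split considers a random subset of features.
+ rf = RandomForestClassifier(n_estimators=200, random_state=42)
+ rf.fit(X_train, y_train)
+
+ print(rf.score(X_test, y_test))
+
+ # Feature importance is easy to extract:
+ for name, importance in zip(data.feature_names, rf.feature_importances_):
+     print(name, round(importance, 3))
+ """, language="python")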
+
+
+ st.subheader(":blue[**Random Forest: Bagging Ensemble**]")
+
+ st.markdown("""
+ **Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist:
+
+ ---
+
+ ### Bagging Recap:
+ - **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
+ - Final prediction is made by **aggregating** outputs from all models:
+ - Majority vote (Classification)
+ - Average (Regression)
+
+ ---
+
+ ### What Makes Random Forest Special?
+
+ ✅ Uses **Bagging** to build multiple Decision Trees
+ ✅ Adds **randomness in feature selection** at each split in a tree
+ ✅ Helps make each tree **less correlated** → more powerful ensemble
+
+ ---
+
+ ### How Random Forest Works:
+ 1. Create many bootstrap samples from the training data.
+ 2. Train a **Decision Tree** on each sample.
+ 3. At each split in the tree, only consider a **random subset of features**.
+ 4. Combine all trees:
+ - For classification → **Majority voting**
+ - For regression → **Averaging**
+
+ ---
+
+ ### Why Random Forest Works Well:
+ - Handles **high-dimensional** data.
+ - Reduces **variance** and **overfitting**.
+ - More stable than individual decision trees.
+
+ """)
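+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # the max_features knob that controls per-split feature randomness, shown on a
+ # regression forest, where the trees' predictions are averaged.
+ st.markdown("A minimal sketch of feature randomness and averaging for regression (illustrative only):")
+ st.code("""
+ from sklearn.datasets import load_diabetes
+ from sklearn.ensemble import RandomForestRegressor
+ from sklearn.model_selection import cross_val_score
+
+ X, y = load_diabetes(return_X_y=True)
+
+ # max_features limits how many features each split may consider;
+ # smaller values make the individual trees less correlated.
+ forest = RandomForestRegressor(n_estimators=300, max_features="sqrt", random_state=42)
+ print(cross_val_score(forest, X, y, cv=5, scoring="r2").mean())
+ """, language="python")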
+
+
+ st.subheader(":blue[**Bagging Algorithm in Random Forest**]")
+
+ st.markdown("""
+ ### 🧺 What is Bagging?
+
+ **Bagging** (Bootstrap Aggregating) is an ensemble technique that:
+
+ - Trains multiple models on **random samples** of the data (with replacement).
+ - Aggregates the predictions to make the final decision.
+ - **Classification** → Majority vote
+ - **Regression** → Average
+
+ ---
+
+ ### How Random Forest Uses Bagging:
+
+ **Random Forest = Bagging + Random Feature Selection**
+
+ #### Here's what happens:
+ 1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
+ 2. When splitting a node, it uses a **random subset of features**.
+ 3. It aggregates the predictions of all trees.
+
+ This makes the trees **more diverse** and **less correlated**, and the forest **more accurate** than basic bagging with full-feature trees.
+
+ ---
+
+ ### Why Bagging Helps Random Forest:
+ - Reduces **overfitting** by combining diverse learners.
+ - Lowers **variance** of predictions.
+ - Makes the model **robust and stable**.
+
+ """)
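+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # comparing bagged full-feature trees with a Random Forest on the same data.
+ st.markdown("A minimal sketch comparing plain bagging with a Random Forest (illustrative only):")
+ st.code("""
+ from sklearn.datasets import load_digits
+ from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
+ from sklearn.tree import DecisionTreeClassifier
+ from sklearn.model_selection import cross_val_score
+
+ X, y = load_digits(return_X_y=True)
+
+ # Plain bagging: every split can look at all features.
+ bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=42)
+ # Random Forest: bagging plus a random feature subset at each split.
+ rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
+
+ print("bagging:", cross_val_score(bag, X, y, cv=5).mean())
+ print("random forest:", cross_val_score(rf, X, y, cv=5).mean())
+ """, language="python")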
+
+
+ st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")
+
+ st.markdown("""
+ ### What is Bagging?
+
+ **Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.
+
+ ---
+
+ ### For Classification:
+
+ - Uses a **voting mechanism**:
+ - Each model votes for a class.
+ - The final prediction is the **majority class**.
+
+ #### Advantages:
+ - Reduces **overfitting**
+ - Decreases **variance**
+ - Works well with **unstable learners** like Decision Trees
+
+ ---
+
+ ### For Regression:
+
+ - Uses **averaging**:
+ - Each model makes a numerical prediction.
+ - The final output is the **average** of all predictions.
+
+ #### Benefits:
+ - Produces **smoother** predictions
+ - Helps with **noisy datasets**
+ - Improves **model generalization**
+
+ ---
+
+ ### Common Base Estimators:
+ - `DecisionTreeClassifier` for classification
+ - `DecisionTreeRegressor` for regression
+
+ Scikit-learn’s `BaggingClassifier` and `BaggingRegressor` are often used.
+
+ """)
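+
+ # Illustrative sketch (added as an example; not part of the original lesson text):
+ # BaggingRegressor averaging numeric predictions from many trees.
+ st.markdown("A minimal sketch of `BaggingRegressor` (illustrative only):")
+ st.code("""
+ from sklearn.datasets import make_regression
+ from sklearn.ensemble import BaggingRegressor
+ from sklearn.tree import DecisionTreeRegressor
+ from sklearn.model_selection import train_test_split
+
+ X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)
+ X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
+
+ # Each tree predicts a number; the ensemble output is the average of all trees.
+ reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=42)
+ reg.fit(X_train, y_train)
+ print(reg.score(X_test, y_test))  # R^2 on the held-out split
+ """, language="python")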