Update pages/Introduction of Ensemble.py
pages/Introduction of Ensemble.py
CHANGED
@@ -0,0 +1,269 @@
import streamlit as st

st.title(":red[**Introduction to Ensemble Learning**]")

st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.

The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.

---

### Why Use Ensemble Methods?

- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Frequently appears in winning solutions of data science competitions (e.g., Kaggle).

---

### Common Ensemble Techniques

1. **Bagging** (Bootstrap Aggregating)
   - Builds multiple models in parallel.
   - Reduces **variance**.
   - Example: `RandomForest`

2. **Boosting**
   - Builds models sequentially, each correcting errors from the previous.
   - Reduces **bias**.
   - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`

3. **Stacking**
   - Combines different model types.
   - A meta-model learns how to best combine them.

A short code sketch of each family is shown right after this section.

---

### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).

---

**In short:** Ensemble learning = smarter predictions by making models work together.
""")


st.subheader(":blue[**Voting Ensemble (Classifier)**]")

st.markdown("""
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.

---

### Types of Voting:

#### Hard Voting
- Each model votes for a class label.
- The final prediction is the **majority vote**.
- Useful when all models perform about equally well.

#### Soft Voting
- Uses the **predicted probabilities** from the models.
- Averages the probabilities and picks the class with the **highest average probability**.
- Works best when the base models are **well-calibrated**.

---

### Why Use Voting?
- Combines the **strengths** of different models.
- Reduces the **risk of overfitting**.
- Often **improves accuracy** over any individual model.
""")


st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")

st.markdown("""
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.

It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.

---

### How It Works:

1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
2. Train a **separate model** on each subset.
3. Aggregate the predictions:
   - For **classification**: majority vote.
   - For **regression**: average.

---

### Key Points:

- Models are trained **independently and in parallel**.
- Often used with **Decision Trees**.
- The final prediction is **more robust** than that of any individual model.

---

### Example:
A well-known example of Bagging is the **Random Forest** algorithm:
- Uses multiple decision trees trained on bootstrapped samples.
- Adds feature randomness for further diversity.
""")


st.title("What is Random Forest?")

st.markdown("""
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.

It is based on the **Bagging** technique and introduces **randomness** at two levels:
- Random sampling of data (bootstrap samples).
- Random subset of features for splitting at each node.

---

### How It Works:

1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
2. **Train multiple Decision Trees** on different subsets.
3. Each tree makes a prediction.
4. The final output is:
   - The **majority vote** (for classification).
   - The **average prediction** (for regression).

---

### Key Benefits:

- Handles **high-dimensional** data well.
- Reduces **overfitting** (more than a single Decision Tree).
- Works for both **classification** and **regression** tasks.
- **Feature importance** is easy to extract.

---

### Real-Life Analogy:
Imagine asking a **group of experts** instead of one person – each tree gives its opinion, and the forest makes the final decision based on consensus!
""")


st.subheader(":blue[**Random Forest: Bagging Ensemble**]")

st.markdown("""
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist:

---

### Bagging Recap:
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
- The final prediction is made by **aggregating** the outputs of all models:
  - Majority vote (classification)
  - Average (regression)

---

### What Makes Random Forest Special?

✅ Uses **Bagging** to build multiple Decision Trees

✅ Adds **randomness in feature selection** at each split in a tree

✅ Helps make each tree **less correlated** → a more powerful ensemble

---

### How Random Forest Works:
1. Create many bootstrap samples from the training data.
2. Train a **Decision Tree** on each sample.
3. At each split in the tree, only consider a **random subset of features**.
4. Combine all trees:
   - For classification → **majority voting**
   - For regression → **averaging**

---

### Why Random Forest Works Well:
- Handles **high-dimensional** data.
- Reduces **variance** and **overfitting**.
- More stable than individual decision trees.
""")


st.subheader(":blue[**Bagging Algorithm in Random Forest**]")

st.markdown("""
### 🧺 What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble technique that:

- Trains multiple models on **random samples** of the data (with replacement).
- Aggregates the predictions to make the final decision:
  - **Classification** → majority vote
  - **Regression** → average

---

### How Random Forest Uses Bagging:

**Random Forest = Bagging + Random Feature Selection**

#### Here's what happens:
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
2. When splitting a node, it uses a **random subset of features**.
3. It aggregates the predictions of all trees.

This makes Random Forest **more diverse**, **less correlated**, and often **more accurate** than basic bagging with full-feature trees.

---

### Why Bagging Helps Random Forest:
- Reduces **overfitting** by combining diverse learners.
- Lowers the **variance** of predictions.
- Makes the model **robust and stable**.
""")


st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")

st.markdown("""
### What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.

---

### For Classification:

- Uses a **voting mechanism**:
  - Each model votes for a class.
  - The final prediction is the **majority class**.

#### Advantages:
- Reduces **overfitting**
- Decreases **variance**
- Works well with **unstable learners** like Decision Trees

---

### For Regression:

- Uses **averaging**:
  - Each model makes a numerical prediction.
  - The final output is the **average** of all predictions.

#### Benefits:
- Produces **smoother** predictions
- Helps with **noisy datasets**
- Improves **model generalization**

---

### Common Base Estimators:
- `DecisionTreeClassifier` for classification
- `DecisionTreeRegressor` for regression

Scikit-learn’s `BaggingClassifier` and `BaggingRegressor` are often used for this, as sketched below.
""")