Update pages/Introduction of Ensemble.py
pages/Introduction of Ensemble.py
CHANGED
@@ -0,0 +1,269 @@
import streamlit as st

st.title(":red[**Introduction to Ensemble Learning**]")

st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.

The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.

---

### Why Use Ensemble Methods?

- Improves performance and stability.
- Reduces the risk of overfitting.
- Works well in both classification and regression tasks.
- Frequently appears in winning solutions of data science competitions (e.g., Kaggle).

---

### Common Ensemble Techniques

1. **Bagging** (Bootstrap Aggregating)
   - Builds multiple models in parallel.
   - Reduces **variance**.
   - Example: `RandomForest`

2. **Boosting**
   - Builds models sequentially, each correcting errors from the previous.
   - Reduces **bias**.
   - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`

3. **Stacking**
   - Combines different model types.
   - A meta-model learns how to best combine them.

A short code sketch of each family is shown right after this section.

---

### Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).

---

**In short:** Ensemble learning = smarter predictions by making models work together.
""")


st.subheader(":blue[**Voting Ensemble (Classifier)**]")

st.markdown("""
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.

---

### Types of Voting:

#### Hard Voting
- Each model votes for a class label.
- The final prediction is the **majority vote**.
- Useful when all models perform about equally well.

#### Soft Voting
- Uses the **predicted probabilities** from the models.
- Averages the probabilities and picks the class with the **highest average probability**.
- Works best when the base models are **well-calibrated**.

---

### Why Use Voting?
- Combines the **strengths** of different models.
- Reduces the **risk of overfitting**.
- Often **improves accuracy** over any individual model.
""")


st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")

st.markdown("""
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.

It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.

---

### How It Works:

1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
2. Train a **separate model** on each subset.
3. Aggregate the predictions:
   - For **classification**: majority vote.
   - For **regression**: average.

---

### Key Points:

- Models are trained **independently and in parallel**.
- Often used with **Decision Trees**.
- The final prediction is **more robust** than that of any individual model.

---

### Example:
A well-known example of Bagging is the **Random Forest** algorithm:
- Uses multiple decision trees trained on bootstrapped samples.
- Adds feature randomness for further diversity.
""")


st.title("What is Random Forest?")

st.markdown("""
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.

It is based on the **Bagging** technique and introduces **randomness** at two levels:
- Random sampling of data (bootstrap samples).
- Random subset of features for splitting at each node.

---

### How It Works:

1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
2. **Train multiple Decision Trees** on different subsets.
3. Each tree makes a prediction.
4. The final output is:
   - The **majority vote** (for classification).
   - The **average prediction** (for regression).

---

### Key Benefits:

- Handles **high-dimensional** data well.
- Reduces **overfitting** (more than a single Decision Tree).
- Works for both **classification** and **regression** tasks.
- **Feature importance** is easy to extract.

---

### Real-Life Analogy:
Imagine asking a **group of experts** instead of one person – each tree gives its opinion, and the forest makes the final decision based on consensus!
""")


st.subheader(":blue[**Random Forest: Bagging Ensemble**]")

st.markdown("""
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist:

---

### Bagging Recap:
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
- The final prediction is made by **aggregating** the outputs of all models:
  - Majority vote (classification)
  - Average (regression)

---

### What Makes Random Forest Special?

✅ Uses **Bagging** to build multiple Decision Trees

✅ Adds **randomness in feature selection** at each split in a tree

✅ Helps make each tree **less correlated** → a more powerful ensemble

---

### How Random Forest Works:
1. Create many bootstrap samples from the training data.
2. Train a **Decision Tree** on each sample.
3. At each split in the tree, only consider a **random subset of features**.
4. Combine all trees:
   - For classification → **majority voting**
   - For regression → **averaging**

---

### Why Random Forest Works Well:
- Handles **high-dimensional** data.
- Reduces **variance** and **overfitting**.
- More stable than individual decision trees.
""")


st.subheader(":blue[**Bagging Algorithm in Random Forest**]")

st.markdown("""
### 🧺 What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble technique that:

- Trains multiple models on **random samples** of the data (with replacement).
- Aggregates the predictions to make the final decision:
  - **Classification** → majority vote
  - **Regression** → average

---

### How Random Forest Uses Bagging:

**Random Forest = Bagging + Random Feature Selection**

#### Here's what happens:
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
2. When splitting a node, it uses a **random subset of features**.
3. It aggregates the predictions of all trees.

This makes Random Forest **more diverse**, **less correlated**, and often **more accurate** than basic bagging with full-feature trees.

---

### Why Bagging Helps Random Forest:
- Reduces **overfitting** by combining diverse learners.
- Lowers the **variance** of predictions.
- Makes the model **robust and stable**.
""")


st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")

st.markdown("""
### What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.

---

### For Classification:

- Uses a **voting mechanism**:
  - Each model votes for a class.
  - The final prediction is the **majority class**.

#### Advantages:
- Reduces **overfitting**
- Decreases **variance**
- Works well with **unstable learners** like Decision Trees

---

### For Regression:

- Uses **averaging**:
  - Each model makes a numerical prediction.
  - The final output is the **average** of all predictions.

#### Benefits:
- Produces **smoother** predictions
- Helps with **noisy datasets**
- Improves **model generalization**

---

### Common Base Estimators:
- `DecisionTreeClassifier` for classification
- `DecisionTreeRegressor` for regression

Scikit-learn’s `BaggingClassifier` and `BaggingRegressor` are often used for this, as sketched below.
""")