import streamlit as st

st.title(":red[**Introduction to Ensemble Learning**]")

st.markdown("""
**Ensemble Learning** is a machine learning technique where **multiple models** (often called "learners") are combined to **solve the same problem**.

The idea is that a **group of models** can outperform any individual model by:
- **Reducing variance** (overfitting),
- **Reducing bias** (underfitting),
- **Improving prediction accuracy**.

---

### Why Use Ensemble Methods?

- Often improves predictive performance and stability.
- Can reduce the risk of overfitting, especially with bagging-style methods.
- Works for both classification and regression tasks.
- Frequently appears in winning solutions to data science competitions (e.g., Kaggle).

---

### Common Ensemble Techniques

1. **Bagging** (Bootstrap Aggregating)
   - Builds multiple models in parallel.
   - Reduces **variance**.
   - Example: `RandomForest`

2. **Boosting**
   - Builds models sequentially, each one correcting the errors of the previous one.
   - Reduces **bias**.
   - Examples: `AdaBoost`, `GradientBoosting`, `XGBoost`, `LightGBM`

3. **Stacking**
   - Combines different model types.
   - A meta-model learns how to best combine them.

---

###  Real-World Examples
- **Random Forest**: A popular bagging method using decision trees.
- **XGBoost / LightGBM**: Powerful boosting frameworks used in competitions.
- **Voting Classifier**: Combines different models (e.g., SVM + Logistic Regression + Decision Tree).

---

**In short:** ensemble learning combines several models so that their joint prediction is better than what any one of them would give alone.
""")


st.subheader(":blue[**Voting Ensemble (Classifier)**]")

st.markdown("""
In **ensemble learning**, a **Voting Classifier** combines predictions from multiple different models to make a **final decision**.

---

###  Types of Voting:

#### Hard Voting
- Each model votes for a class label.
- The final prediction is the **majority vote**.
- Useful when all models are equally good.

####  Soft Voting
- Uses **predicted probabilities** from models.
- Averages probabilities and picks the class with the **highest average probability**.
- Works best when base models are **well-calibrated**.
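
The difference between the two voting rules can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `hard_vote` and `soft_vote` and the three sets of model probabilities are made up for the example, and are not part of any library.

```python
from collections import Counter

def hard_vote(labels):
    # Return the most common label among the models' votes.
    # Ties are broken by whichever label was seen first.
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities):
    # probabilities: one dict per model, mapping class label -> probability.
    classes = probabilities[0].keys()
    averages = {
        c: sum(p[c] for p in probabilities) / len(probabilities)
        for c in classes
    }
    return max(averages, key=averages.get), averages

# Hypothetical outputs of three models for a single sample.
model_probabilities = [
    {"cat": 0.90, "dog": 0.10},
    {"cat": 0.40, "dog": 0.60},
    {"cat": 0.45, "dog": 0.55},
]
# Each model's hard vote is its most probable class.
model_labels = [max(p, key=p.get) for p in model_probabilities]

hard_result = hard_vote(model_labels)
soft_result, soft_averages = soft_vote(model_probabilities)
print("hard voting:", hard_result)
print("soft voting:", soft_result)
```

In this made-up case two of the three models lean slightly towards `dog`, so hard voting picks `dog`, while the one very confident model pulls the averaged probability towards `cat`. That is why soft voting depends on the probabilities being trustworthy.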

---

###  Why Use Voting?
- Combines **strengths** of different models.
- Reduces the **risk of overfitting**.
- Often **improves accuracy** over individual models.
""")

st.subheader(":blue[**Bagging Algorithm (Bootstrap Aggregating)**]")

st.markdown("""
**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning method that aims to improve the stability and accuracy of machine learning algorithms.

It reduces **variance** and helps to **avoid overfitting**, especially for high-variance models like Decision Trees.

---

###  How It Works:

1. Create **multiple subsets** of the original training dataset using **bootstrapping** (random sampling with replacement).
2. Train a **separate model** on each subset.
3. Aggregate the predictions:
   - For **classification**: majority vote.
   - For **regression**: average.
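
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not a production implementation: the base model is a toy 1-nearest-neighbour rule chosen for brevity (the text later uses Decision Trees), and the function names and the dataset are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Step 1: draw len(data) items at random, with replacement.
    return [rng.choice(data) for _ in data]

def nearest_neighbour_label(train, x):
    # Toy base model: predict the label of the closest training point.
    return min(train, key=lambda point: abs(point[0] - x))[1]

def bagging_predict(data, x, n_models=15, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        # Step 2: the sample itself acts as the fitted 1-NN model.
        votes.append(nearest_neighbour_label(sample, x))
    # Step 3: aggregate by majority vote (classification).
    return Counter(votes).most_common(1)[0][0]

# Toy 1-D dataset of (feature, label) pairs; the values are made up.
data = [(0.5, "low"), (1.0, "low"), (1.5, "low"), (2.0, "low"),
        (6.0, "high"), (6.5, "high"), (7.0, "high"), (7.5, "high")]

prediction_low = bagging_predict(data, 1.2)
prediction_high = bagging_predict(data, 6.8)
print(prediction_low, prediction_high)
```

For regression, the same loop would collect numeric predictions and return their average instead of the most common label.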

---

### Key Points:

- Models are trained **independently and in parallel**.
- Often used with **Decision Trees**.
- The final prediction is usually **more robust** than that of any individual model.

---

###  Example:
A well-known example of Bagging is the **Random Forest** algorithm:
- Uses multiple decision trees trained on bootstrapped samples.
- Adds feature randomness for further diversity.

""")


st.title("What is Random Forest?")

st.markdown("""
**Random Forest** is a popular **ensemble learning** algorithm that combines the power of **multiple decision trees** to make more accurate and robust predictions.

It is based on the **Bagging** technique and introduces **randomness** at two levels:
- Random sampling of data (bootstrap samples).
- Random subset of features for splitting at each node.

---

###  How It Works:

1. **Bootstrap sampling**: Random subsets of the training data are created with replacement.
2. **Train multiple Decision Trees** on different subsets.
3. Each tree makes a prediction.
4. The final output is:
   - **Majority vote** (for classification).
   - **Average prediction** (for regression).
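
To make step 1 concrete, the sketch below draws one bootstrap sample and counts how many distinct rows it contains. The row count of 1000 and the seed are arbitrary assumptions for the illustration.

```python
import random

rng = random.Random(7)
rows = list(range(1000))  # row indices of a hypothetical training set

# Same size as the original data, drawn with replacement.
sample = [rng.choice(rows) for _ in rows]

unique_rows = set(sample)
unique_fraction = len(unique_rows) / len(rows)
left_out = len(rows) - len(unique_rows)

print("distinct rows in the sample:", len(unique_rows))
print("rows never drawn:", left_out)
```

Because sampling is done with replacement, some rows appear several times and others not at all (on average roughly 63% of the rows show up in a sample of this kind). Each tree therefore sees a slightly different version of the data.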

---

### Key Benefits:

- Handles **high-dimensional** data well.
- Less prone to **overfitting** than a single Decision Tree.
- Works for both **classification** and **regression** tasks.
- **Feature importance** is easy to extract.

---

### Real-Life Analogy:
Imagine asking a **group of experts** instead of one person: each tree gives its opinion, and the forest makes the final decision by consensus.

""")


st.subheader(":blue[**Random Forest: Bagging Ensemble**]")

st.markdown("""
**Random Forest** is a powerful ensemble algorithm that uses the **Bagging (Bootstrap Aggregating)** technique with an added twist: each tree considers only a random subset of features at every split.

---

###  Bagging Recap:
- **Bagging** creates multiple models (like decision trees) trained on **random subsets** of the data (with replacement).
- Final prediction is made by **aggregating** outputs from all models:
  - Majority vote (Classification)
  - Average (Regression)

---

###  What Makes Random Forest Special?

✅ Uses **Bagging** to build multiple Decision Trees  
✅ Adds **randomness in feature selection** at each split in a tree  
✅ Helps make each tree **less correlated** → more powerful ensemble

---

###  How Random Forest Works:
1. Create many bootstrap samples from the training data.
2. Train a **Decision Tree** on each sample.
3. At each split in the tree, only consider a **random subset of features**.
4. Combine all trees:
   - For classification → **Majority voting**
   - For regression → **Averaging**
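
Step 3 is the part that distinguishes Random Forest from plain bagging, and it can be sketched as follows. The helper `random_feature_subset` and the feature names are hypothetical. The default of the square root of the number of features is a common heuristic for classification and is an assumption here, not something stated above.

```python
import math
import random

def random_feature_subset(feature_names, rng, k=None):
    # Pick k distinct candidate features for one split.
    # Assumed default: sqrt(n_features), a common heuristic.
    if k is None:
        k = max(1, int(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, k)

rng = random.Random(42)
features = ["age", "income", "height", "weight", "score",
            "tenure", "visits", "clicks", "rating"]

# Every split draws a fresh subset, so different nodes (and different
# trees) are forced to consider different candidate features.
candidates_per_split = [random_feature_subset(features, rng) for _ in range(3)]
for candidates in candidates_per_split:
    print(candidates)
```

A real tree would then search for the best split among these candidates only, rather than among all features.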

---

###  Why Random Forest Works Well:
- Handles **high-dimensional** data.
- Reduces **variance** and **overfitting**.
- More stable than individual decision trees.

""")


st.subheader(":blue[**Bagging Algorithm in Random Forest**]")

st.markdown("""
### 🧺 What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble technique that:

- Trains multiple models on **random samples** of the data (with replacement).
- Aggregates the predictions to make the final decision.
  - **Classification** → Majority vote  
  - **Regression** → Average

---

###  How Random Forest Uses Bagging:

**Random Forest = Bagging + Random Feature Selection**

#### Here's what happens:
1. It builds **many decision trees** using **bootstrapped datasets** (Bagging).
2. When splitting a node, it uses a **random subset of features**.
3. It aggregates the predictions of all trees.

This makes the individual trees **more diverse** and **less correlated**, which usually makes Random Forest **more accurate** than basic bagging with full-feature trees.

---

###  Why Bagging Helps Random Forest:
- Reduces **overfitting** by combining diverse learners.
- Lowers **variance** of predictions.
- Makes the model **robust and stable**.
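
The variance claim can be checked with a small simulation. It is a simplified sketch: each "model" is just the true value plus independent Gaussian noise, which is an assumption made for the illustration. Real bagged trees are correlated, so the reduction in practice is smaller, which is exactly why Random Forest tries to decorrelate its trees.

```python
import random
import statistics

rng = random.Random(0)

def noisy_prediction():
    # One high-variance model: true value 10.0 plus noise with std 1.0.
    return rng.gauss(10.0, 1.0)

def ensemble_prediction(n_models=25):
    # Average of n_models independent noisy predictions.
    return sum(noisy_prediction() for _ in range(n_models)) / n_models

single = [noisy_prediction() for _ in range(2000)]
averaged = [ensemble_prediction() for _ in range(2000)]

var_single = statistics.variance(single)
var_averaged = statistics.variance(averaged)
print("variance of a single model:", round(var_single, 3))
print("variance of the 25-model average:", round(var_averaged, 3))
```

With independent errors, averaging 25 predictions cuts the variance by a factor of about 25, while the expected prediction stays the same.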

""")


st.subheader(":blue[**Bagging Ensemble for Classification & Regression**]")

st.markdown("""
###  What is Bagging?

**Bagging** (Bootstrap Aggregating) is an ensemble method that trains multiple base models on **randomly drawn subsets** (with replacement) of the training data, and then **combines** their predictions.

---

###  For Classification:

- Uses a **voting mechanism**:
  - Each model votes for a class.
  - The final prediction is the **majority class**.

####  Advantages:
- Reduces **overfitting**
- Decreases **variance**
- Works well with **unstable learners** like Decision Trees

---

###  For Regression:

- Uses **averaging**:
  - Each model makes a numerical prediction.
  - The final output is the **average** of all predictions.

####  Benefits:
- Produces **smoother** predictions
- Helps with **noisy datasets**
- Improves **model generalization**
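
A minimal sketch of bagging for regression, using only the standard library. The base model is again a toy 1-nearest-neighbour rule rather than a `DecisionTreeRegressor`, and the helper names and data points are made up for the example.

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) items at random, with replacement.
    return [rng.choice(data) for _ in data]

def nearest_neighbour_value(train, x):
    # Toy base model: return the target of the closest training point.
    return min(train, key=lambda point: abs(point[0] - x))[1]

def bagged_regression(data, x, n_models=50, seed=0):
    rng = random.Random(seed)
    predictions = [
        nearest_neighbour_value(bootstrap_sample(data, rng), x)
        for _ in range(n_models)
    ]
    # Aggregate by averaging (regression).
    return sum(predictions) / len(predictions)

# Made-up noisy samples of roughly y = 2x.
data = [(1, 2.3), (2, 3.6), (3, 6.4), (4, 7.5), (5, 10.6), (6, 11.4)]

single_model = nearest_neighbour_value(data, 3.4)
bagged = bagged_regression(data, 3.4)
print("single model:", single_model)
print("bagged average:", round(bagged, 2))
```

The single model jumps straight to the target of the nearest point, while the bagged prediction blends the targets of nearby points, because the nearest point differs from one bootstrap sample to the next. This is the "smoother predictions" effect listed above.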

---

###  Common Base Estimator:
- `DecisionTreeClassifier` for classification  
- `DecisionTreeRegressor` for regression  

Scikit-learn provides `BaggingClassifier` and `BaggingRegressor` for this purpose.

""")