Spaces: shwetashweta05/Zero_to_Hero_Machine_Learning
shwetashweta05 committed on Dec 8, 2024

Commit 5118bc0 (verified) · 1 Parent(s): cdc7d83

Update pages/Life Cycle Of Machine Learning.py

pages/Life Cycle Of Machine Learning.py +41 -1


@@ -72,4 +72,44 @@ if st.button("Simple EDA"):
     - **7.Identify Relationships**
     Check how different features relate to each other using scatter plots or correlation matrices.
         - Example: Do students with high math marks also score high in science?
-    """)

+    """)
+if st.button("Data Pre-processing"):
+    st.write("**Data preprocessing is the process of cleaning and preparing raw data so it can be used by a machine learning model. It ensures that the data is in the right format, free from errors, and ready for analysis.**")
+    st.write("""
+    **Why is Data Preprocessing Important?**
+    - Raw data often contains errors, missing values, or irrelevant information.
+    - Clean and processed data improves the accuracy and performance of the model.
+    """)
+    st.write("""
+    **Steps in Data Preprocessing:**
+    - **1.Collect the Data**
+    Gather data from sources like CSV files, databases, or APIs.
+        - Example: A dataset of house prices with columns like Size, Location, Price, and Year Built.
+    - **2.Handle Missing Data**
+    Replace or remove missing values so the model doesn't face errors.
+        - **Methods:**
+        - Fill with mean, median, or mode.
+        - Remove rows or columns with too many missing values.
+        - Example: If Price is missing for some houses, replace it with the average price.
+    - **3.Remove Outliers**
+    Outliers are extreme values that can distort the model. Use methods like z-score or interquartile range (IQR) to identify and handle them.
+        - Example: A house with a price 10x higher than similar houses might be an outlier.
+    - **4.Convert Categorical Data to Numbers**
+    Machine learning models work with numbers, so categorical data must be converted.
+        - **Techniques:**
+        - Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1).
+        - One-Hot Encoding: Create new columns for each category with binary values (0 or 1).
+        - Example: Convert Location (e.g., "City A", "City B") into numerical values.
+    - **5.Scale Features**
+    Ensure all numerical values are on the same scale so that no feature dominates.
+        - **Techniques:**
+        - Normalization: Rescale values to be between 0 and 1.
+        - Standardization: Scale data to have a mean of 0 and standard deviation of 1.
+        - Example: House sizes (in square feet) might range from 500 to 5,000, while prices run into the millions; scaling puts both features on a comparable range so the larger numbers do not dominate.
+    - **6.Split the Data**
+    Divide the data into training and testing sets.
+        - Training set: Used to train the model.
+        - Testing set: Used to evaluate the model’s performance.
+        - Example: Split 80% of the data for training and 20% for testing.
+        """)
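
The preprocessing steps described in the added text (2 through 6) can be sketched in a few lines of Python. This is a minimal illustration, not code from the app: the toy `rows` dataset and every variable name are invented for the example, and it uses only the standard library so the arithmetic behind each step stays visible. It assumes median imputation, the 1.5 × IQR rule, one-hot encoding, and a seeded 80/20 shuffle split as one reasonable choice for each step.

```python
import random
import statistics

# Hypothetical toy data, invented for illustration (not taken from the app).
rows = [
    {"size": 800.0,  "location": "City A", "price": 150_000.0},
    {"size": 1200.0, "location": "City B", "price": 210_000.0},
    {"size": None,   "location": "City A", "price": 180_000.0},  # missing size
    {"size": 1500.0, "location": "City B", "price": 260_000.0},
    {"size": 1000.0, "location": "City A", "price": 175_000.0},
    {"size": 1100.0, "location": "City B", "price": 2_500_000.0},  # extreme price
    {"size": 1400.0, "location": "City A", "price": 230_000.0},
    {"size": 1050.0, "location": "City B", "price": 195_000.0},
    {"size": 1300.0, "location": "City A", "price": 240_000.0},
    {"size": 900.0,  "location": "City B", "price": 165_000.0},
    {"size": 1250.0, "location": "City A", "price": 220_000.0},
]

# Step 2: fill missing sizes with the median of the known sizes.
known_sizes = [r["size"] for r in rows if r["size"] is not None]
median_size = statistics.median(known_sizes)
for r in rows:
    if r["size"] is None:
        r["size"] = median_size

# Step 3: drop rows whose price lies more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles([r["price"] for r in rows], n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
rows = [r for r in rows if low <= r["price"] <= high]

# Step 4: one-hot encode location (one 0/1 column per category).
categories = sorted({r["location"] for r in rows})
for r in rows:
    for c in categories:
        r[f"location_{c}"] = 1 if r["location"] == c else 0

# Step 5: scale size two ways.
sizes = [r["size"] for r in rows]
min_size, max_size = min(sizes), max(sizes)
mean_size, std_size = statistics.fmean(sizes), statistics.pstdev(sizes)
for r in rows:
    r["size_norm"] = (r["size"] - min_size) / (max_size - min_size)  # 0..1
    r["size_std"] = (r["size"] - mean_size) / std_size  # mean 0, std 1

# Step 6: shuffle with a fixed seed, then split 80% / 20%.
shuffled = rows[:]
random.Random(42).shuffle(shuffled)
cut = int(0.8 * len(shuffled))
train, test = shuffled[:cut], shuffled[cut:]
```

The sketch follows the order in which the steps are listed. A common variation is to split first and compute the scaling statistics from the training set only, so that nothing about the test set influences preprocessing. Libraries such as pandas and scikit-learn provide ready-made equivalents for each step.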