Spaces: shwetashweta05/Zero_to_Hero_Machine_Learning

shwetashweta05 committed on Dec 8, 2024

Commit a63d7c1 (verified) · 1 Parent(s): 5118bc0

Update pages/Life Cycle Of Machine Learning.py

Files changed (1)

pages/Life Cycle Of Machine Learning.py +35 -2

@@ -97,7 +97,7 @@ if st.button("Data Pre-processing"):
         - Example: A house with a price 10x higher than similar houses might be an outlier.
     - **4.Convert Categorical Data to Numbers**
     Machine learning models work with numbers, so categorical data must be converted.
-        - **echniques:**
+        - **Techniques:**
         - Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1).
         - One-Hot Encoding: Create new columns for each category with binary values (0 or 1).
         - Example: Convert Location (e.g., "City A", "City B") into numerical values.
@@ -112,4 +112,37 @@ if st.button("Data Pre-processing"):
         - Training set: Used to train the model.
         - Testing set: Used to evaluate the model’s performance.
         - Example: Split 80% of the data for training and 20% for testing.
-        """)
+        """)
+if st.button("Exploratory Data Analysis (EDA)"):
+    st.write("**EDA in Machine Learning (Easy Language):** EDA (Exploratory Data Analysis) is like getting to know your dataset before using it in a machine learning model. It helps you understand the data's structure, patterns, and relationships so you can decide how to process and use it effectively.")
+    st.write("""
+    **Why is EDA Important?**
+    - Identifies errors, missing values, or outliers.
+    - Helps understand data distribution and trends.
+    - Guides feature selection and engineering.
+    - Gives insights for choosing the right ML model.
+    """)
+    st.write("""**Steps in EDA:**
+    - **Understand the Dataset**
+        - Look at the structure of your data (rows, columns, and types of values).
+        - Example: In a student dataset, check if columns include Name, Math Marks, and Grade.
+    - **Summarize the Data**
+        - Generate statistics like mean, median, minimum, maximum, and standard deviation.
+        - Example: For math scores, check the average, highest, and lowest scores.
+    - **Handle Missing Values**
+        - Identify any missing data and decide how to fix it (e.g., fill with average values or remove).
+        - Example: If a student is missing Science Marks, fill it with the average science score.
+    - **Visualize the Data**
+        - Create plots to understand data distributions and relationships:
+            - Histograms: Show how data is spread across a range (e.g., how many students scored between 70-80).
+            - Boxplots: Highlight outliers and data spread.
+            - Scatter Plots: Show relationships between two variables (e.g., Attendance vs. Marks).
+    - **Check Relationships**
+        - Use a correlation matrix to see how features relate to each other.
+        - Example: See if Attendance has a strong positive correlation with Math Marks.
+    - **Identify Outliers**
+        - Look for extreme values that might distort the analysis.
+        - Example: A student with Marks = 0 when others scored 70-100 could be an error.
+        """)
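
The EDA steps added in this commit can be sketched in a few lines of Python. This is a minimal illustration, not code from the app: the student records, the column names (`attendance`, `math`, `science`), and the `pearson` helper are all hypothetical. It uses only the standard library so it runs without extra packages; in practice a library such as pandas would be the more usual choice. Plotting is left out, and the 1.5 × IQR rule is just one common way to flag outliers.

```python
import math
import statistics

# Hypothetical student records (not from the original app); None marks a missing value.
students = [
    {"name": "A", "attendance": 95, "math": 92, "science": 88},
    {"name": "B", "attendance": 80, "math": 78, "science": None},
    {"name": "C", "attendance": 88, "math": 85, "science": 80},
    {"name": "D", "attendance": 70, "math": 72, "science": 75},
    {"name": "E", "attendance": 90, "math": 0, "science": 84},
    {"name": "F", "attendance": 85, "math": 81, "science": 79},
]

# 1. Understand the dataset: how many rows, which columns.
n_rows = len(students)
columns = list(students[0].keys())

# 2. Summarize the data: basic statistics for one numeric column.
math_marks = [s["math"] for s in students]
math_summary = {
    "mean": statistics.mean(math_marks),
    "median": statistics.median(math_marks),
    "min": min(math_marks),
    "max": max(math_marks),
    "stdev": statistics.stdev(math_marks),
}

# 3. Handle missing values: fill missing science marks with the mean of the known ones.
known_science = [s["science"] for s in students if s["science"] is not None]
science_mean = statistics.mean(known_science)
for s in students:
    if s["science"] is None:
        s["science"] = science_mean


# 4. Check relationships: Pearson correlation between two columns.
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r_all = pearson([s["attendance"] for s in students], math_marks)

# 5. Identify outliers: flag math marks outside 1.5 * IQR of the quartiles.
q1, _, q3 = statistics.quantiles(math_marks, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in math_marks if m < low or m > high]

# Recompute the correlation without the flagged rows to see how much they distort it.
clean = [s for s in students if low <= s["math"] <= high]
r_clean = pearson([s["attendance"] for s in clean], [s["math"] for s in clean])

print("rows:", n_rows, "columns:", columns)
print("math summary:", math_summary)
print("outliers in math:", outliers)
print("attendance vs math correlation (all rows / outliers removed):", r_all, r_clean)
```

Comparing `r_all` with `r_clean` shows why the outlier check matters: a single suspicious mark of 0 can change the apparent relationship between attendance and marks.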