Update pages/Life Cycle Of Machine Learning.py
pages/Life Cycle Of Machine Learning.py
CHANGED
@@ -97,7 +97,7 @@ if st.button("Data Pre-processing"):
|
|
97 |
- Example: A house with a price 10x higher than similar houses might be an outlier.
|
98 |
- **4. Convert Categorical Data to Numbers**
|
99 |
Machine learning models work with numbers, so categorical data must be converted.
|
100 |
-
- **
|
101 |
- Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1).
|
102 |
- One-Hot Encoding: Create new columns for each category with binary values (0 or 1).
|
103 |
- Example: Convert Location (e.g., "City A", "City B") into numerical values.
|
@@ -112,4 +112,37 @@ if st.button("Data Pre-processing"):
|
|
112 |
- Training set: Used to train the model.
|
113 |
- Testing set: Used to evaluate the model’s performance.
|
114 |
- Example: Split 80% of the data for training and 20% for testing.
|
115 |
-
""")
|
|
97 |
- Example: A house with a price 10x higher than similar houses might be an outlier.
|
98 |
- **4. Convert Categorical Data to Numbers**
|
99 |
Machine learning models work with numbers, so categorical data must be converted.
|
100 |
+
- **Techniques:**
|
101 |
- Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1).
|
102 |
- One-Hot Encoding: Create new columns for each category with binary values (0 or 1).
|
103 |
- Example: Convert Location (e.g., "City A", "City B") into numerical values.
|
|
|
112 |
- Training set: Used to train the model.
|
113 |
- Testing set: Used to evaluate the model’s performance.
|
114 |
- Example: Split 80% of the data for training and 20% for testing.
|
115 |
+
""")
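
The encoding and splitting steps described in the text above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the page author's code: the helper names `label_encode`, `one_hot_encode`, and `split_train_test` and the sample `locations` list are hypothetical. In practice, libraries such as pandas and scikit-learn are commonly used for these steps.

```python
import random


def label_encode(values):
    """Map each distinct category to an integer (categories sorted alphabetically)."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Build one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample data for illustration only.
locations = ["City A", "City B", "City A", "City C", "City B"]

codes, mapping = label_encode(locations)
one_hot, categories = one_hot_encode(locations)
train, test = split_train_test(list(range(10)), test_fraction=0.2)

print(mapping)
print(codes)
print(categories, one_hot)
print(len(train), "training rows,", len(test), "testing rows")
```

Label encoding assigns integers that imply an order (0 < 1 < 2), which may mislead some models when the categories have no natural ranking. One-hot encoding avoids that at the cost of extra columns.
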
|
116 |
+
|
117 |
+
if st.button("Exploratory Data Analysis (EDA)"):
|
118 |
+
st.write("**EDA in Machine Learning (Easy Language)**\n\nEDA (Exploratory Data Analysis) is like getting to know your dataset before using it in a machine learning model. It helps you understand the data's structure, patterns, and relationships so you can decide how to process and use it effectively.")
|
119 |
+
st.write("""
|
120 |
+
**Why is EDA Important?**
|
121 |
+
- Identifies errors, missing values, or outliers.
|
122 |
+
- Helps understand data distribution and trends.
|
123 |
+
- Guides feature selection and engineering.
|
124 |
+
- Gives insights for choosing the right ML model.
|
125 |
+
""")
|
126 |
+
st.write("""**Steps in EDA:**
|
127 |
+
- **Understand the Dataset**
|
128 |
+
- Look at the structure of your data (rows, columns, and types of values).
|
129 |
+
- Example: In a student dataset, check if columns include Name, Math Marks, and Grade.
|
130 |
+
- **Summarize the Data**
|
131 |
+
- Generate statistics like mean, median, minimum, maximum, and standard deviation.
|
132 |
+
- Example: For math scores, check the average, highest, and lowest scores.
|
133 |
+
- **Handle Missing Values**
|
134 |
+
- Identify any missing data and decide how to fix it (e.g., fill with the column average or remove the affected rows).
|
135 |
+
- Example: If a student is missing Science Marks, fill it with the average science score.
|
136 |
+
- **Visualize the Data**
|
137 |
+
- Create plots to understand data distributions and relationships:
|
138 |
+
- Histograms: Show how data is spread across a range (e.g., how many students scored between 70-80).
|
139 |
+
- Boxplots: Highlight outliers and data spread.
|
140 |
+
- Scatter Plots: Show relationships between two variables (e.g., Attendance vs. Marks).
|
141 |
+
- **Check Relationships**
|
142 |
+
- Use a correlation matrix to see how features relate to each other.
|
143 |
+
- Example: See if Attendance has a strong positive correlation with Math Marks.
|
144 |
+
- **Identify Outliers**
|
145 |
+
- Look for extreme values that might distort the analysis.
|
146 |
+
- Example: A student with Marks = 0 when others scored 70-100 could be a data-entry error or a genuine outlier, so check it before removing it.
|
147 |
+
""")
|
148 |
+
|
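
The EDA steps listed above (summarize, handle missing values, look at the distribution, check relationships, identify outliers) can be sketched with the Python standard library alone. This is a rough illustration under assumptions, not the page author's code: the `students` records and all helper names are hypothetical, and the 1.5 x IQR rule is one common outlier convention rather than the only option. A real project would more likely use pandas and a plotting library.

```python
import math
import statistics
from collections import Counter

# Hypothetical student records; None marks a missing value.
students = [
    {"attendance": 95, "math": 88, "science": 90},
    {"attendance": 80, "math": 75, "science": None},
    {"attendance": 90, "math": 82, "science": 85},
    {"attendance": 70, "math": 71, "science": 78},
    {"attendance": 85, "math": 0, "science": 80},
    {"attendance": 98, "math": 93, "science": 95},
    {"attendance": 75, "math": 79, "science": 72},
    {"attendance": 88, "math": 85, "science": 86},
]


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    return [mean if v is None else v for v in values]


def histogram(values, bin_width=10):
    """Count values per bin; a text stand-in for a histogram plot."""
    return Counter(int(v // bin_width) * bin_width for v in values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


attendance = [s["attendance"] for s in students]
math_marks = [s["math"] for s in students]
science = fill_missing_with_mean([s["science"] for s in students])

summary = summarize(math_marks)
bins = histogram(math_marks)
r = pearson(attendance, science)
outliers = iqr_outliers(math_marks)

print("Math summary:", summary)
print("Math histogram bins:", dict(sorted(bins.items())))
print("Attendance vs. Science correlation:", round(r, 2))
print("Math outliers:", outliers)
```

Filling with the mean keeps every row but shrinks the column's variance slightly. Dropping rows is the simpler alternative when only a few values are missing.
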