Update pages/Life Cycle Of Machine Learning.py
pages/Life Cycle Of Machine Learning.py
@@ -72,4 +72,44 @@ if st.button("Simple EDA"):
 - **7.Identify Relationships**
 Check how different features relate to each other using scatter plots or correlation matrices.
 - Example: Do students with high math marks also score high in science?
-""")
+""")
+
+if st.button("Data Pre-processing"):
+    st.write("**Data preprocessing is the process of cleaning and preparing raw data so it can be used by a machine learning model. It ensures that the data is in the right format, free from errors, and ready for analysis.**")
+    st.write("""
+**Why is Data Preprocessing Important?**
+- Raw data often contains errors, missing values, or irrelevant information.
+- Clean, well-processed data generally improves the model's accuracy and performance.
+""")
+    st.write("""
+**Steps in Data Preprocessing:**
+- **1.Collect the Data**
+  Gather data from sources like CSV files, databases, or APIs.
+  - Example: A dataset of house prices with columns like Size, Location, Price, and Year Built.
+- **2.Handle Missing Data**
+  Replace or remove missing values, because many models cannot be trained on data that contains them.
+  - **Methods:**
+    - Fill with the mean, median, or mode.
+    - Remove rows or columns with too many missing values.
+  - Example: If Price is missing for some houses, replace it with the average price.
+- **3.Remove Outliers**
+  Outliers are extreme values that can distort the model. Use methods like the z-score or the interquartile range (IQR) to identify them, then remove them or cap them at a threshold.
+  - Example: A house priced 10x higher than similar houses might be an outlier.
+- **4.Convert Categorical Data to Numbers**
+  Most machine learning models only accept numeric input, so categorical data must be converted.
+  - **Techniques:**
+    - Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1). Because this implies an order, it suits binary or ordered categories best.
+    - One-Hot Encoding: Create a new column for each category with binary values (0 or 1).
+  - Example: Convert Location (e.g., "City A", "City B") into numerical values with one-hot encoding.
+- **5.Scale Features**
+  Bring numerical features onto a comparable scale so that features with large ranges do not dominate.
+  - **Techniques:**
+    - Normalization (min-max scaling): Rescale values to lie between 0 and 1.
+    - Standardization: Rescale data to have a mean of 0 and a standard deviation of 1.
+  - Example: House sizes (in square feet) might range from 500 to 5,000, while prices run into the millions; after scaling, both lie on a comparable range.
+- **6.Split the Data**
+  Divide the data into training and testing sets.
+  - Training set: Used to train the model.
+  - Testing set: Used to evaluate the model’s performance on unseen data.
+  - Example: Use 80% of the data for training and 20% for testing.
+""")
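The **7.Identify Relationships** step shown as context in this diff mentions correlation matrices. A minimal sketch in plain Python, assuming made-up marks for five students; `pearson` and `correlation_matrix` are hypothetical helpers, not part of this file. A coefficient close to +1 between math and science would suggest that students who do well in one tend to do well in the other. With pandas, `DataFrame.corr()` computes the same Pearson matrix.

```python
import math

# Hypothetical helpers for illustration (not from the original file).


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of {column name: values}."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical marks for five students (illustrative data only).
marks = {
    "math": [90, 75, 60, 85, 50],
    "science": [88, 70, 65, 80, 55],
    "art": [60, 80, 70, 65, 75],
}

matrix = correlation_matrix(marks)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each row of the printed matrix lists one subject's correlation with every subject, including itself (always 1.0); a scatter plot of the same two columns shows the relationship visually.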
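Steps 2 to 6 of the preprocessing list can be strung together on a tiny dataset. The sketch below is one possible version in plain Python, assuming hypothetical house records and helper names (`impute_median`, `remove_outliers`, `one_hot`, `min_max_scale`, `split_train_test`) that are not part of this commit. It fills a missing Size with the median, applies the IQR rule (rather than the z-score) to Price, one-hot encodes Location, min-max scales Size, and holds out 20% of the rows. In a real project, scikit-learn's `SimpleImputer`, `OneHotEncoder`, `MinMaxScaler`, and `train_test_split` cover the same ground.

```python
import random
import statistics

# Hypothetical house records (illustrative only); None marks a missing value.
houses = [
    {"Size": 1200, "Location": "City A", "Price": 250_000},
    {"Size": 1500, "Location": "City B", "Price": 310_000},
    {"Size": None, "Location": "City A", "Price": 275_000},
    {"Size": 2000, "Location": "City B", "Price": 400_000},
    {"Size": 900, "Location": "City A", "Price": 180_000},
    {"Size": 1800, "Location": "City B", "Price": 360_000},
    {"Size": 1100, "Location": "City A", "Price": 230_000},
    {"Size": 1600, "Location": "City B", "Price": 330_000},
    {"Size": 1300, "Location": "City A", "Price": 2_900_000},  # roughly 10x similar houses
    {"Size": 1400, "Location": "City B", "Price": 300_000},
]

# Hypothetical helper functions (not from the original file).


def impute_median(rows, column):
    """Step 2: fill missing (None) values in `column` with the median of the rest."""
    median = statistics.median([r[column] for r in rows if r[column] is not None])
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]


def remove_outliers(rows, column, k=1.5):
    """Step 3: drop rows outside the IQR fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[column] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[column] <= high]


def one_hot(rows, column):
    """Step 4: replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {key: value for key, value in r.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded


def min_max_scale(rows, column):
    """Step 5: normalization, rescaling `column` to the range [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # guard against division by zero for a constant column
    return [{**r, column: (r[column] - low) / span} for r in rows]


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Step 6: shuffle reproducibly, then hold out `test_fraction` of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = impute_median(houses, "Size")
rows = remove_outliers(rows, "Price")
rows = one_hot(rows, "Location")
rows = min_max_scale(rows, "Size")
train, test = split_train_test(rows, test_fraction=0.2)
print(len(train), len(test))  # prints: 7 2
```

The order follows the list of steps, so the scaling minimum and maximum come from all rows. To keep test data from leaking into training, a safer order is to split first, compute the imputation and scaling statistics on the training rows only, and reuse them for the test rows.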