shwetashweta05 committed 5118bc0 (verified) · 1 parent: cdc7d83

Update pages/Life Cycle Of Machine Learning.py

pages/Life Cycle Of Machine Learning.py CHANGED
@@ -72,4 +72,44 @@ if st.button("Simple EDA"):
  - **7.Identify Relationships**
  Check how different features relate to each other using scatter plots or correlation matrices.
  - Example: Do students with high math marks also score high in science?
- """)
+ """)
+
+ if st.button("Data Pre-processing"):
+     st.write("**Data preprocessing is the process of cleaning and preparing raw data so it can be used by a machine learning model. It ensures that the data is in the right format, free from errors, and ready for analysis.**")
+     st.write("""
+ **Why is Data Preprocessing Important?**
+ - Raw data often contains errors, missing values, or irrelevant information.
+ - Clean and processed data improves the accuracy and performance of the model.
+ """)
+     st.write("""
+ **Steps in Data Preprocessing:**
+ - **1.Collect the Data**
+ Gather data from sources like CSV files, databases, or APIs.
+ - Example: A dataset of house prices with columns like Size, Location, Price, and Year Built.
+ - **2.Handle Missing Data**
+ Replace or remove missing values so the model doesn't face errors.
+ - **Methods:**
+ - Fill with mean, median, or mode.
+ - Remove rows or columns with too many missing values.
+ - Example: If Price is missing for some houses, replace it with the average price.
+ - **3.Remove Outliers**
+ Outliers are extreme values that can distort the model. Use methods like z-score or interquartile range (IQR) to identify and handle them.
+ - Example: A house with a price 10x higher than similar houses might be an outlier.
+ - **4.Convert Categorical Data to Numbers**
+ Machine learning models work with numbers, so categorical data must be converted.
+ - **Techniques:**
+ - Label Encoding: Assign a number to each category (e.g., Male = 0, Female = 1).
+ - One-Hot Encoding: Create new columns for each category with binary values (0 or 1).
+ - Example: Convert Location (e.g., "City A", "City B") into numerical values.
+ - **5.Scale Features**
+ Ensure all numerical values are on the same scale so that no feature dominates.
+ - **Techniques:**
+ - Normalization: Rescale values to be between 0 and 1.
+ - Standardization: Scale data to have a mean of 0 and standard deviation of 1.
+ - Example: House sizes (in square feet) might range from 500 to 5,000, while prices range in millions; scaling ensures both features are treated equally.
+ - **6.Split the Data**
+ Divide the data into training and testing sets.
+ - Training set: Used to train the model.
+ - Testing set: Used to evaluate the model’s performance.
+ - Example: Split 80% of the data for training and 20% for testing.
+ """)
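Step 7 in the unchanged context of this diff ("Identify Relationships") suggests checking how features relate using correlation matrices. As a minimal sketch, assuming plain Python (standard library only, no pandas) and made-up marks, the Pearson correlation coefficient and a small correlation matrix can be computed as below; `pearson` and `correlation_matrix` are hypothetical helper names, not part of the app:

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(xs), mean(ys)
    # Sum of co-deviations from the means (proportional to the covariance)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of {column name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical marks for five students (illustrative values, not real data)
marks = {
    "math": [90, 75, 60, 85, 50],
    "science": [88, 70, 65, 80, 55],
}
matrix = correlation_matrix(marks)
print(round(matrix["math"]["science"], 2))
```

A value near +1 means students who score high in math tend to score high in science as well; a value near 0 suggests no linear relationship. With pandas available, `DataFrame.corr()` computes the same kind of matrix (Pearson by default).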
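The "Handle Missing Data" step can be sketched with the standard library alone. This assumes rows are plain dicts with `None` marking a missing value; the house records and the helper names (`fill_missing`, `drop_incomplete_rows`, `drop_sparse_columns`) are hypothetical illustrations, not code from this commit:

```python
from statistics import mean, median, mode

# Hypothetical house records; None marks a missing value (illustrative data only)
houses = [
    {"Size": 1200, "Price": 250_000},
    {"Size": 1500, "Price": None},
    {"Size": 900, "Price": 180_000},
    {"Size": 2000, "Price": 410_000},
]

# Summary statistics that can serve as the fill value
STRATEGIES = {"mean": mean, "median": median, "mode": mode}


def fill_missing(rows, column, strategy="mean"):
    """Return copies of `rows` with None in `column` replaced by a summary of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill_value = STRATEGIES[strategy](observed)
    return [{**r, column: fill_value if r[column] is None else r[column]} for r in rows]


def drop_incomplete_rows(rows):
    """Remove rows that contain any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_sparse_columns(rows, max_missing_ratio=0.5):
    """Remove columns whose share of missing values exceeds `max_missing_ratio`."""
    keep = [c for c in rows[0]
            if sum(r[c] is None for r in rows) / len(rows) <= max_missing_ratio]
    return [{c: r[c] for c in keep} for r in rows]


filled = fill_missing(houses, "Price", strategy="mean")
print(filled[1]["Price"])
```

Here the missing Price is replaced by the mean of the three observed prices. The median is often preferred for skewed columns such as prices, because a single very expensive house pulls the mean but not the median.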
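For the "Remove Outliers" step, the IQR rule (Tukey's fences) flags values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. A minimal sketch using `statistics.quantiles` from the standard library; the prices and the helper names `iqr_bounds` and `remove_outliers` are hypothetical:

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower/upper fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is the usual choice)."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences, preserving order."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical prices of similar houses; the last one is roughly 10x the others
prices = [200_000, 210_000, 195_000, 220_000, 205_000, 2_100_000]
cleaned = remove_outliers(prices)
print(cleaned)
```

`statistics.quantiles` uses its "exclusive" method by default, while NumPy and pandas default to linear interpolation, so the fences can differ slightly between tools. The z-score rule the text also mentions (for example |z| > 3) can miss outliers in very small samples, because the outlier itself inflates the standard deviation.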
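The two encodings in step 4 can be sketched as follows. The category names come from the text's Location example; the helpers `label_encode` and `one_hot_encode` and the `Location_<category>` column naming are assumptions for illustration:

```python
def label_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))  # new category gets the next free code
    return [mapping[v] for v in values], mapping


def one_hot_encode(values, prefix):
    """One binary (0/1) column per category, named '<prefix>_<category>'."""
    categories = sorted(set(values))
    return {f"{prefix}_{c}": [int(v == c) for v in values] for c in categories}


locations = ["City A", "City B", "City A", "City C"]
codes, mapping = label_encode(locations)
one_hot = one_hot_encode(locations, prefix="Location")
print(codes, mapping)
print(one_hot)
```

Label encoding implies an order (City C > City A) that a model may treat as meaningful, which is why one-hot encoding is usually preferred for nominal categories such as Location. In pandas, `pd.get_dummies` produces one-hot columns with a similar prefix scheme.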
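Normalization and standardization from step 5, as a standard-library sketch with hypothetical house sizes. It assumes the population standard deviation (`statistics.pstdev`), which is also what scikit-learn's `StandardScaler` uses; `min_max_scale` and `standardize` are hypothetical helper names:

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Normalization: rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: nothing to rescale
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: (x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant feature: every value sits at the mean
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical house sizes in square feet (illustrative values)
sizes = [500, 1200, 2750, 5000]
print(min_max_scale(sizes))
print(standardize(sizes))
```

In a real pipeline, compute the min/max or mean/std on the training set only and reuse those values for the test set; fitting them on the full dataset leaks information from the test data into training.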
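Step 6's 80/20 split can be sketched with a seeded shuffle from the `random` module. The helper name `split_train_test` and its parameters are hypothetical; scikit-learn's `sklearn.model_selection.train_test_split` (with `test_size` and `random_state`) implements the same idea:

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of `rows`, then return (train, test) with about `test_ratio` held out."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 100 record IDs
records = list(range(100))
train, test = split_train_test(records)
print(len(train), len(test))
```

Shuffling before slicing matters when the data is ordered (for example by Year Built); without it, the test set would contain only the newest or oldest houses.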