Update pages/Life Cycle Of Machine Learning.py
pages/Life Cycle Of Machine Learning.py
CHANGED
@@ -41,4 +41,35 @@ if st.button("Data Collection"):
|
|
41 |
- **4.Ensure Quality**
|
42 |
- Data should be accurate, relevant, and complete. Remove any errors or inconsistencies.
|
43 |
- Example: For customer churn prediction, make sure there are no missing customer details like age or usage data.
|
44 |
-
""")
41 |
- **4.Ensure Quality**
|
42 |
- Data should be accurate, relevant, and complete. Remove any errors or inconsistencies.
|
43 |
- Example: For customer churn prediction, make sure there are no missing customer details like age or usage data.
|
44 |
+
""")
|
45 |
+
|
46 |
+
if st.button("Simple EDA"):
|
47 |
+
st.write("""
|
48 |
+
**EDA (Exploratory Data Analysis) is the process of exploring your data to understand its structure, patterns, and insights before building a machine learning model. Think of it as getting to know your data better!**
|
49 |
+
|
50 |
+
**Steps for Simple EDA:**
|
51 |
+
- **1.Understand the Data**
|
52 |
+
Look at the data to understand its structure and contents.
|
53 |
+
- Example: If you have a dataset of students' marks, check columns like Name, Math Marks, Science Marks, and Grade.
|
54 |
+
- **2.Check the Size of the Data**
|
55 |
+
Find out how many rows (data points) and columns (features) are in the dataset.
|
56 |
+
- Example: Your student dataset might have 500 rows (students) and 5 columns (attributes).
|
57 |
+
- **3.View the First Few Rows**
|
58 |
+
Look at the top 5-10 rows to get a snapshot of the data.
|
59 |
+
- Example: Check if the columns contain relevant information like scores and grades.
|
60 |
+
- **4.Summarize the Data**
|
61 |
+
Generate basic statistics for numerical data, such as:
|
62 |
+
- Mean: Average of a column (e.g., average math marks).
|
63 |
+
- Minimum and Maximum: Lowest and highest values (e.g., lowest and highest scores).
|
64 |
+
- Count: Number of non-missing values (e.g., total students who took the test).
|
65 |
+
- **5.Handle Missing Data**
|
66 |
+
Identify and deal with missing or incomplete values.
|
67 |
+
- Example: If some students are missing marks, decide to either fill them with an average or remove those rows.
|
68 |
+
- **6.Check Data Distribution**
|
69 |
+
Visualize how data is spread using graphs:
|
70 |
+
- Histograms: Show the distribution of scores (e.g., most students scored 70-80 in math).
|
71 |
+
- Boxplots: Highlight outliers and data spread.
|
72 |
+
- **7.Identify Relationships**
|
73 |
+
Check how different features relate to each other using scatter plots or correlation matrices.
|
74 |
+
- Example: Do students with high math marks also score high in science?
|
75 |
+
""")
|
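The EDA steps described in the added text can be sketched with pandas. This is a minimal illustration, not part of the commit: the `df` DataFrame, its column names, and its values are hypothetical, and filling missing marks with the column mean is only one of the two options the text mentions.

```python
import pandas as pd

# Hypothetical student-marks dataset; names and values are illustrative only.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dara", "Eli"],
    "Math Marks": [78.0, 85.0, None, 62.0, 91.0],
    "Science Marks": [74.0, 88.0, 69.0, None, 95.0],
})

# Steps 1-3: inspect structure, size, and the first few rows.
column_types = df.dtypes
n_rows, n_cols = df.shape
first_rows = df.head(5)

# Step 4: basic statistics (count, mean, min, max, ...) for numeric columns.
summary = df.describe()

# Step 5: count missing values, then either fill with the column mean
# or drop the incomplete rows.
missing_counts = df.isna().sum()
filled = df.fillna(df.mean(numeric_only=True))
dropped = df.dropna()

# Step 6: a text-only view of the distribution, counting marks per score band.
# (A plotted histogram or boxplot would need a plotting library such as matplotlib.)
score_bands = pd.cut(filled["Math Marks"], bins=[60, 70, 80, 90, 100])
band_counts = score_bands.value_counts(sort=False)

# Step 7: correlation between the two numeric features.
corr = filled[["Math Marks", "Science Marks"]].corr()

print(f"{n_rows} rows x {n_cols} columns")
print(summary)
print(missing_counts)
print(band_counts)
print(corr)
```

In a Streamlit page, the same objects could be shown with `st.write(summary)` or `st.dataframe(first_rows)` instead of `print`.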