shwetashweta05 committed · Commit cdc7d83 · verified · 1 Parent(s): 87bf5ed

Update pages/Life Cycle Of Machine Learning.py

pages/Life Cycle Of Machine Learning.py CHANGED
@@ -41,4 +41,35 @@ if st.button("Data Collection"):
  - **4.Ensure Quality**
  - Data should be accurate, relevant, and complete. Remove any errors or inconsistencies.
  - Example: For customer churn prediction, make sure there are no missing customer details like age or usage data.
- """)
+ """)
+
+ if st.button("Simple EDA"):
+     st.write("""
+ **EDA (Exploratory Data Analysis) is the process of exploring your data to understand its structure, patterns, and insights before building a machine learning model. Think of it as getting to know your data better!**
+
+ **Steps for Simple EDA:**
+ - **1.Understand the Data**
+ Look at the data to understand its structure and contents.
+ - Example: If you have a dataset of students' marks, check columns like Name, Math Marks, Science Marks, and Grade.
+ - **2.Check the Size of the Data**
+ Find out how many rows (data points) and columns (features) are in the dataset.
+ - Example: Your student dataset might have 500 rows (students) and 5 columns (attributes).
+ - **3.View the First Few Rows**
+ Look at the top 5-10 rows to get a snapshot of the data.
+ - Example: Check if the columns contain relevant information like scores and grades.
+ - **4.Summarize the Data**
+ Generate basic statistics for numerical data, such as:
+ - Mean: Average of a column (e.g., average math marks).
+ - Minimum and Maximum: Lowest and highest values (e.g., lowest and highest scores).
+ - Count: Number of non-missing values (e.g., total students who took the test).
+ - **5.Handle Missing Data**
+ Identify and deal with missing or incomplete values.
+ - Example: If some students are missing marks, decide whether to fill them with an average or remove those rows.
+ - **6.Check Data Distribution**
+ Visualize how data is spread using graphs:
+ - Histograms: Show the distribution of scores (e.g., most students scored 70-80 in math).
+ - Boxplots: Highlight outliers and data spread.
+ - **7.Identify Relationships**
+ Check how different features relate to each other using scatter plots or correlation matrices.
+ - Example: Do students with high math marks also score high in science?
+ """)
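
The EDA steps described in the added text can be tried out on a toy dataset. The following is a minimal sketch in plain Python, not part of the commit: the `students` records and the helper names `shape`, `summarize`, `fill_missing`, `histogram`, and `pearson` are hypothetical and exist only for illustration. In practice a data-analysis library such as pandas would typically handle these steps.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records (illustration only); None marks a missing value.
students = [
    {"name": "Asha", "math": 78, "science": 82},
    {"name": "Ben", "math": 65, "science": 70},
    {"name": "Chen", "math": None, "science": 58},
    {"name": "Dia", "math": 91, "science": 95},
    {"name": "Eli", "math": 72, "science": None},
]


def present(rows, column):
    """Return the non-missing values of one column."""
    return [r[column] for r in rows if r[column] is not None]


def shape(rows):
    """Step 2: number of rows and number of columns."""
    n_cols = len(rows[0]) if rows else 0
    return (len(rows), n_cols)


def summarize(rows, column):
    """Step 4: count, mean, minimum, and maximum of a numeric column."""
    values = present(rows, column)
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


def fill_missing(rows, column):
    """Step 5: fill missing values with the column mean (returns new rows)."""
    avg = mean(present(rows, column))
    return [{**r, column: avg if r[column] is None else r[column]} for r in rows]


def histogram(rows, column, width=10):
    """Step 6: count values per bucket of the given width (60 means 60-69)."""
    buckets = Counter((v // width) * width for v in present(rows, column))
    return dict(sorted(buckets.items()))


def pearson(rows, x, y):
    """Step 7: Pearson correlation over rows where both columns are present."""
    pairs = [(r[x], r[y]) for r in rows if r[x] is not None and r[y] is not None]
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


print(shape(students))                      # step 2: size
print(students[:3])                         # step 3: first few rows
print(summarize(students, "math"))          # step 4: summary statistics
print(fill_missing(students, "math")[2])    # step 5: Chen's mark filled in
print(histogram(students, "math"))          # step 6: distribution by bucket
print(pearson(students, "math", "science")) # step 7: relationship
```

Filling with the mean is only one of the two options the text mentions; dropping the incomplete rows instead would be a one-line filter on `students`.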