Spaces: shwetashweta05/Zero_to_Hero_Machine_Learning

shwetashweta05 committed on Dec 15, 2024

Commit 16e390d (verified) · 1 Parent(s): 17326a7

Update pages/6.Data Collection.py

Files changed (1)

pages/6.Data Collection.py +75 -0

@@ -75,3 +75,78 @@ if data_type == "Structured":
                     mime="application/octet-stream",
                 )

+# CSV Format Content
+if format_selected == "CSV":
+    st.write("#### CSV Format")
+    # Part (a) What it is
+    st.subheader("What is CSV?")
+    st.write("""
+    CSV (Comma-Separated Values) is a plain-text file format used to store tabular data,
+    where each row corresponds to a record, and fields are separated by commas.
+    It is widely used for data exchange due to its simplicity and compatibility across systems.
+    The standard file extension is `.csv`.
+    """)
+    # Part (b) How to Read These Files
+    st.subheader("How to Read CSV Files?")
+    st.code("""
+    import pandas as pd
+    # Reading a CSV file
+    df = pd.read_csv("file.csv")
+    print(df.head())
+    # Reading a CSV file with custom delimiter
+    df = pd.read_csv("file.csv", sep=";")
+    """)
+    # Part (c) Issues Encountered
+    st.subheader("Common Issues Encountered When Handling CSV Files")
+    st.write("""
+    - **Incorrect Delimiters**: Files may use delimiters other than commas, e.g., semicolons or tabs.
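+    - **Encoding Problems**: Files saved in an encoding other than the one the reader assumes (pandas defaults to UTF-8; older exports often use ISO-8859-1) can raise `UnicodeDecodeError` or produce garbled characters.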
+    - **Missing or Corrupted Data**: Blank fields or inconsistencies in data.
+    - **Header Issues**: Missing headers or extra/unexpected columns.
+    - **Large File Sizes**: Memory limitations when processing large datasets.
+    """)
+    # Part (d) How to Overcome These Issues
+    st.subheader("How to Overcome These Issues?")
+    st.write("""
+    - **Incorrect Delimiters**: Specify the correct delimiter when reading:
+      ```python
+      df = pd.read_csv("file.csv", sep=";")
+      ```
+    - **Encoding Problems**: Specify the file's actual encoding explicitly when it is not UTF-8 (the pandas default):
+      ```python
+      df = pd.read_csv("file.csv", encoding="ISO-8859-1")
+      ```
+    - **Missing or Corrupted Data**: Handle missing values using pandas:
+      ```python
+      # Note: a string fill value turns numeric columns that contain missing values into object dtype
+      df.fillna("NA", inplace=True)
+      ```
+    - **Header Issues**: Assign custom headers when the file has none (`skiprows` can skip problematic leading rows):
+      ```python
+      df = pd.read_csv("file.csv", header=None, names=["Column1", "Column2", "Column3"])
+      ```
+    - **Large File Sizes**: Read the file in chunks so that only part of it is in memory at a time:
+      ```python
+      chunks = pd.read_csv("file.csv", chunksize=1000)
+      for chunk in chunks:
+          process(chunk)  # process() is a placeholder for your own per-chunk logic
+      ```
+    """)
+    # Downloadable Guide Button
+    st.markdown("### Download Coding Guide:")
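+    # Render the download button directly: a download_button nested under
+    # st.button needs two clicks and disappears again on the next rerun.
+    file_path = "CSV_guide.ipynb"  # Replace with the actual file path
+    try:
+        with open(file_path, "rb") as file:
+            st.download_button(
+                label="Download CSV Guide",
+                data=file,
+                file_name="CSV_guide.ipynb",
+                mime="application/octet-stream",
+            )
+    except FileNotFoundError:
+        # Show a warning instead of crashing the page when the notebook is missing
+        st.warning(f"Guide file not found: {file_path}")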
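
Note on the "How to Overcome These Issues?" section: the delimiter and encoding fixes both assume the reader already knows the right values. The sketch below is not part of this commit. It shows one way to guess them from a byte sample using only the standard library. `detect_csv_settings` is a hypothetical helper name, and the candidate encodings (UTF-8 first, then ISO-8859-1) and candidate delimiters (comma, semicolon, tab) are assumptions chosen to match the examples in the page.

```python
import csv


def detect_csv_settings(raw: bytes, encodings=("utf-8", "ISO-8859-1")):
    """Hypothetical helper: guess (encoding, delimiter) from a byte sample."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the sample; try the next one
        # csv.Sniffer guesses the delimiter; candidates are restricted to
        # comma, semicolon, and tab. It raises csv.Error if it cannot decide.
        dialect = csv.Sniffer().sniff(text, delimiters=",;\t")
        return enc, dialect.delimiter
    raise ValueError("no candidate encoding could decode the sample")


# Sample bytes stand in for the beginning of a real file (illustrative data)
sample = "name;city\nZoë;Köln\nAna;Lyon\n".encode("ISO-8859-1")
encoding, delimiter = detect_csv_settings(sample)
print(encoding, delimiter)
```

The returned values can then be passed to the reader, for example `pd.read_csv("file.csv", encoding=encoding, sep=delimiter)`. Because ISO-8859-1 can decode any byte sequence, it only works as a last-resort fallback and should stay at the end of the candidate list.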