{ "cells": [ { "cell_type": "markdown", "id": "7e2a8960-f41d-40fc-a8e7-43c7e938e759", "metadata": {}, "source": [ "# Excel" ] }, { "cell_type": "markdown", "id": "7ea74fc9-f615-4214-a2a0-da948feb484e", "metadata": {}, "source": [ "## (a) What It Is\n", "- Excel is a spreadsheet application developed by Microsoft, widely used for organizing, analyzing, and storing structured data in tabular form.\n", "\n", "## Common file extensions: \n", "- .xlsx (current standard) and .xls (older versions).\n", "\n", "## Features:\n", "- Rows and columns for data organization.\n", "- Built-in formulas and functions for calculations.\n", "- Support for charts and data visualization.\n", "- Capability to handle multiple sheets in a single file." ] }, { "cell_type": "markdown", "id": "fd26d0c4-3772-46f3-9bcc-18a7aedeb4cc", "metadata": {}, "source": [ "## (b) How to Read These Files\n", "- In Python, the pandas library provides a simple and efficient way to read Excel files." ] }, { "cell_type": "markdown", "id": "06ae061f-6c3e-4b3c-8e33-8bf8dda50c88", "metadata": {}, "source": [ "## Code Example:\n", "import pandas as pd\n", "\n", "### Reading an Excel file\n", "df = pd.read_excel(\"example.xlsx\")\n", "\n", "### Display the first few rows\n", "print(df.head())" ] }, { "cell_type": "markdown", "id": "d6b0b4b3-7514-4b8e-9887-51fd25b4bacf", "metadata": {}, "source": [ "## Explanation:\n", "### read_excel():\n", "- Reads data from the specified Excel file.\n", "- Automatically detects the first sheet unless specified." ] }, { "cell_type": "markdown", "id": "bcb8595c-b0f0-47a6-b8e4-5d9d42dd797d", "metadata": {}, "source": [ "## (c) What Are the Issues Encountered When Handling These Files?\n", "1. Missing Data\n", "Cells may be empty, causing errors or skewed analysis.\n", "2. Encoding Problems\n", "Older Excel files or files saved with non-standard encodings may raise errors.\n", "3. Corrupted Files\n", "Excel files can get corrupted, becoming unreadable.\n", "4. Large Files\n", "Excel struggles with large datasets (e.g., millions of rows).\n", "5. Unsupported Features\n", "Some advanced Excel features (like macros) may not be accessible in Python.\n" ] }, { "cell_type": "markdown", "id": "6d2e0ab1-61bb-40a3-a232-ab94fa5669a8", "metadata": {}, "source": [ "## (d) How to Overcome These Errors/Issues?\n", "1. Missing Data\n", "- Use pandas methods to handle missing values:" ] }, { "cell_type": "markdown", "id": "a4366681-0738-49dd-8f06-34f4e3bdec03", "metadata": {}, "source": [ "## code:\n", "### df.fillna(value=\"Unknown\", inplace=True) # Fill missing values with a placeholder\n", "### df.dropna(inplace=True) # Remove rows with missing values" ] }, { "cell_type": "markdown", "id": "cd4081ed-3060-4a90-95d6-0ebcbca45439", "metadata": {}, "source": [ "## 2. Corrupted Files\n", "- Open the file in Excel and repair it manually or save it as a CSV file for processing." ] }, { "cell_type": "markdown", "id": "8b23e35a-911e-4673-8b2d-1a25e38ef756", "metadata": {}, "source": [ "## 3. Large Files\n", "- Read data in chunks using pandas:" ] }, { "cell_type": "markdown", "id": "141a2aba-4ce1-48e6-af2e-5102b927dcca", "metadata": {}, "source": [ "## code:\n", "### for chunk in pd.read_excel(\"large_file.xlsx\", chunksize=1000):\n", " ### process(chunk) # Replace `process` with your function" ] }, { "cell_type": "markdown", "id": "cc730499-3610-406c-8b1c-8b7fc39805fd", "metadata": {}, "source": [ "## 5. Unsupported Features\n", "- Avoid saving files with unsupported features if they need to be processed programmatically.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cac6941e-31ec-484f-9e6a-c62e0c45f4ca", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }