File size: 4,600 Bytes
c1104ec |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 |
{
"cells": [
{
"cell_type": "markdown",
"id": "7e2a8960-f41d-40fc-a8e7-43c7e938e759",
"metadata": {},
"source": [
"# Excel"
]
},
{
"cell_type": "markdown",
"id": "7ea74fc9-f615-4214-a2a0-da948feb484e",
"metadata": {},
"source": [
"## (a) What It Is\n",
"- Excel is a spreadsheet application developed by Microsoft, widely used for organizing, analyzing, and storing structured data in tabular form.\n",
"\n",
"## Common file extensions: \n",
"- .xlsx (current standard) and .xls (older versions).\n",
"\n",
"## Features:\n",
"- Rows and columns for data organization.\n",
"- Built-in formulas and functions for calculations.\n",
"- Support for charts and data visualization.\n",
"- Capability to handle multiple sheets in a single file."
]
},
{
"cell_type": "markdown",
"id": "fd26d0c4-3772-46f3-9bcc-18a7aedeb4cc",
"metadata": {},
"source": [
"## (b) How to Read These Files\n",
"- In Python, the pandas library provides a simple and efficient way to read Excel files."
]
},
{
"cell_type": "markdown",
"id": "06ae061f-6c3e-4b3c-8e33-8bf8dda50c88",
"metadata": {},
"source": [
"## Code Example:\n",
"import pandas as pd\n",
"\n",
"### Reading an Excel file\n",
"df = pd.read_excel(\"example.xlsx\")\n",
"\n",
"### Display the first few rows\n",
"print(df.head())"
]
},
{
"cell_type": "markdown",
"id": "d6b0b4b3-7514-4b8e-9887-51fd25b4bacf",
"metadata": {},
"source": [
"## Explanation:\n",
"### read_excel():\n",
"- Reads data from the specified Excel file.\n",
"- Automatically detects the first sheet unless specified."
]
},
{
"cell_type": "markdown",
"id": "bcb8595c-b0f0-47a6-b8e4-5d9d42dd797d",
"metadata": {},
"source": [
"## (c) What Are the Issues Encountered When Handling These Files?\n",
"1. Missing Data\n",
"Cells may be empty, causing errors or skewed analysis.\n",
"2. Encoding Problems\n",
"Older Excel files or files saved with non-standard encodings may raise errors.\n",
"3. Corrupted Files\n",
"Excel files can get corrupted, becoming unreadable.\n",
"4. Large Files\n",
"Excel struggles with large datasets (e.g., millions of rows).\n",
"5. Unsupported Features\n",
"Some advanced Excel features (like macros) may not be accessible in Python.\n"
]
},
{
"cell_type": "markdown",
"id": "6d2e0ab1-61bb-40a3-a232-ab94fa5669a8",
"metadata": {},
"source": [
"## (d) How to Overcome These Errors/Issues?\n",
"1. Missing Data\n",
"- Use pandas methods to handle missing values:"
]
},
{
"cell_type": "markdown",
"id": "a4366681-0738-49dd-8f06-34f4e3bdec03",
"metadata": {},
"source": [
"## code:\n",
"### df.fillna(value=\"Unknown\", inplace=True) # Fill missing values with a placeholder\n",
"### df.dropna(inplace=True) # Remove rows with missing values"
]
},
{
"cell_type": "markdown",
"id": "cd4081ed-3060-4a90-95d6-0ebcbca45439",
"metadata": {},
"source": [
"## 2. Corrupted Files\n",
"- Open the file in Excel and repair it manually or save it as a CSV file for processing."
]
},
{
"cell_type": "markdown",
"id": "8b23e35a-911e-4673-8b2d-1a25e38ef756",
"metadata": {},
"source": [
"## 3. Large Files\n",
"- Read data in chunks using pandas:"
]
},
{
"cell_type": "markdown",
"id": "141a2aba-4ce1-48e6-af2e-5102b927dcca",
"metadata": {},
"source": [
"## code:\n",
"### for chunk in pd.read_excel(\"large_file.xlsx\", chunksize=1000):\n",
" ### process(chunk) # Replace `process` with your function"
]
},
{
"cell_type": "markdown",
"id": "cc730499-3610-406c-8b1c-8b7fc39805fd",
"metadata": {},
"source": [
"## 5. Unsupported Features\n",
"- Avoid saving files with unsupported features if they need to be processed programmatically.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cac6941e-31ec-484f-9e6a-c62e0c45f4ca",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
|