shwetashweta05
committed on
Upload Excel_guide.ipynb
Browse files- Excel_guide.ipynb +169 -0
Excel_guide.ipynb
ADDED
@@ -0,0 +1,169 @@
|
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "markdown",
|
5 |
+
"id": "7e2a8960-f41d-40fc-a8e7-43c7e938e759",
|
6 |
+
"metadata": {},
|
7 |
+
"source": [
|
8 |
+
"# Excel"
|
9 |
+
]
|
10 |
+
},
|
11 |
+
{
|
12 |
+
"cell_type": "markdown",
|
13 |
+
"id": "7ea74fc9-f615-4214-a2a0-da948feb484e",
|
14 |
+
"metadata": {},
|
15 |
+
"source": [
|
16 |
+
"## (a) What It Is\n",
|
17 |
+
"- Excel is a spreadsheet application developed by Microsoft, widely used for organizing, analyzing, and storing structured data in tabular form.\n",
|
18 |
+
"\n",
|
19 |
+
"## Common file extensions: \n",
|
20 |
+
"- .xlsx (the default since Excel 2007) and .xls (the legacy binary format used by Excel 97-2003).\n",
|
21 |
+
"\n",
|
22 |
+
"## Features:\n",
|
23 |
+
"- Rows and columns for data organization.\n",
|
24 |
+
"- Built-in formulas and functions for calculations.\n",
|
25 |
+
"- Support for charts and data visualization.\n",
|
26 |
+
"- Capability to handle multiple sheets in a single file."
|
27 |
+
]
|
28 |
+
},
|
29 |
+
{
|
30 |
+
"cell_type": "markdown",
|
31 |
+
"id": "fd26d0c4-3772-46f3-9bcc-18a7aedeb4cc",
|
32 |
+
"metadata": {},
|
33 |
+
"source": [
|
34 |
+
"## (b) How to Read These Files\n",
|
35 |
+
"- In Python, the pandas library reads Excel files with read_excel(), which relies on an engine such as openpyxl for .xlsx files."
|
36 |
+
]
|
37 |
+
},
|
38 |
+
{
|
39 |
+
"cell_type": "markdown",
|
40 |
+
"id": "06ae061f-6c3e-4b3c-8e33-8bf8dda50c88",
|
41 |
+
"metadata": {},
|
42 |
+
"source": [
|
43 |
+
"## Code Example:\n",
|
44 |
+
"import pandas as pd\n",
|
45 |
+
"\n",
|
46 |
+
"### Reading an Excel file\n",
|
47 |
+
"df = pd.read_excel(\"example.xlsx\")\n",
|
48 |
+
"\n",
|
49 |
+
"### Display the first few rows\n",
|
50 |
+
"print(df.head())"
|
51 |
+
]
|
52 |
+
},
|
53 |
+
{
|
54 |
+
"cell_type": "markdown",
|
55 |
+
"id": "d6b0b4b3-7514-4b8e-9887-51fd25b4bacf",
|
56 |
+
"metadata": {},
|
57 |
+
"source": [
|
58 |
+
"## Explanation:\n",
|
59 |
+
"### read_excel():\n",
|
60 |
+
"- Reads data from the specified Excel file.\n",
|
61 |
+
"- Reads the first sheet by default; pass sheet_name to read a different one."
|
62 |
+
]
|
63 |
+
},
|
64 |
+
{
|
65 |
+
"cell_type": "markdown",
|
66 |
+
"id": "bcb8595c-b0f0-47a6-b8e4-5d9d42dd797d",
|
67 |
+
"metadata": {},
|
68 |
+
"source": [
|
69 |
+
"## (c) What Are the Issues Encountered When Handling These Files?\n",
|
70 |
+
"1. Missing Data\n",
|
71 |
+
"Cells may be empty, causing errors or skewed analysis.\n",
|
72 |
+
"2. Encoding Problems\n",
|
73 |
+
"Legacy .xls files, or workbooks exported by other tools, may contain text in a non-Unicode encoding that is read back garbled or raises errors.\n",
|
74 |
+
"3. Corrupted Files\n",
|
75 |
+
"Excel files can get corrupted, becoming unreadable.\n",
|
76 |
+
"4. Large Files\n",
|
77 |
+
"A worksheet holds at most 1,048,576 rows, and pandas loads the whole sheet into memory, so large workbooks are slow to read.\n",
|
78 |
+
"5. Unsupported Features\n",
|
79 |
+
"Some Excel features (such as VBA macros, charts, and pivot tables) are ignored when pandas reads a file; only cell values are loaded.\n"
|
80 |
+
]
|
81 |
+
},
|
82 |
+
{
|
83 |
+
"cell_type": "markdown",
|
84 |
+
"id": "6d2e0ab1-61bb-40a3-a232-ab94fa5669a8",
|
85 |
+
"metadata": {},
|
86 |
+
"source": [
|
87 |
+
"## (d) How to Overcome These Errors/Issues?\n",
|
88 |
+
"1. Missing Data\n",
|
89 |
+
"- Use pandas methods to handle missing values:"
|
90 |
+
]
|
91 |
+
},
|
92 |
+
{
|
93 |
+
"cell_type": "markdown",
|
94 |
+
"id": "a4366681-0738-49dd-8f06-34f4e3bdec03",
|
95 |
+
"metadata": {},
|
96 |
+
"source": [
|
97 |
+
"## code:\n",
|
98 |
+
"### df.fillna(value=\"Unknown\", inplace=True) # Fill missing values with a placeholder\n",
|
99 |
+
"### df.dropna(inplace=True) # Remove rows with missing values"
|
100 |
+
]
|
101 |
+
},
|
102 |
+
{
|
103 |
+
"cell_type": "markdown",
|
104 |
+
"id": "cd4081ed-3060-4a90-95d6-0ebcbca45439",
|
105 |
+
"metadata": {},
|
106 |
+
"source": [
|
107 |
+
"## 3. Corrupted Files\n",
|
108 |
+
"- Open the file in Excel, repair it manually, and re-save it, or export it as a CSV file for processing."
|
109 |
+
]
|
110 |
+
},
|
111 |
+
{
|
112 |
+
"cell_type": "markdown",
|
113 |
+
"id": "8b23e35a-911e-4673-8b2d-1a25e38ef756",
|
114 |
+
"metadata": {},
|
115 |
+
"source": [
|
116 |
+
"## 4. Large Files\n",
|
117 |
+
"- Read data in chunks using pandas:"
|
118 |
+
]
|
119 |
+
},
|
120 |
+
{
|
121 |
+
"cell_type": "markdown",
|
122 |
+
"id": "141a2aba-4ce1-48e6-af2e-5102b927dcca",
|
123 |
+
"metadata": {},
|
124 |
+
"source": [
|
125 |
+
"## code:\n",
|
126 |
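+
"### for start in range(0, total_rows, 1000):  # read_excel() has no chunksize argument, so slice with skiprows and nrows\n",
|
127 |
+
" ### process(pd.read_excel(\"large_file.xlsx\", skiprows=range(1, start + 1), nrows=1000))  # `total_rows` and `process` are yours to define"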
|
128 |
+
]
|
129 |
+
},
|
130 |
+
{
|
131 |
+
"cell_type": "markdown",
|
132 |
+
"id": "cc730499-3610-406c-8b1c-8b7fc39805fd",
|
133 |
+
"metadata": {},
|
134 |
+
"source": [
|
135 |
+
"## 5. Unsupported Features\n",
|
136 |
+
"- Avoid saving files with unsupported features if they need to be processed programmatically.\n"
|
137 |
+
]
|
138 |
+
},
|
139 |
+
{
|
140 |
+
"cell_type": "code",
|
141 |
+
"execution_count": null,
|
142 |
+
"id": "cac6941e-31ec-484f-9e6a-c62e0c45f4ca",
|
143 |
+
"metadata": {},
|
144 |
+
"outputs": [],
|
145 |
+
"source": []
|
146 |
+
}
|
147 |
+
],
|
148 |
+
"metadata": {
|
149 |
+
"kernelspec": {
|
150 |
+
"display_name": "Python 3 (ipykernel)",
|
151 |
+
"language": "python",
|
152 |
+
"name": "python3"
|
153 |
+
},
|
154 |
+
"language_info": {
|
155 |
+
"codemirror_mode": {
|
156 |
+
"name": "ipython",
|
157 |
+
"version": 3
|
158 |
+
},
|
159 |
+
"file_extension": ".py",
|
160 |
+
"mimetype": "text/x-python",
|
161 |
+
"name": "python",
|
162 |
+
"nbconvert_exporter": "python",
|
163 |
+
"pygments_lexer": "ipython3",
|
164 |
+
"version": "3.11.7"
|
165 |
+
}
|
166 |
+
},
|
167 |
+
"nbformat": 4,
|
168 |
+
"nbformat_minor": 5
|
169 |
+
}
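The reading step in (b) and the missing-data advice in (d) can be tried together in one short sketch. The workbook, its sheet names (`Scores`, `Notes`), and its columns are made up for illustration, and writing the demo file assumes the `openpyxl` engine is installed; this is one possible approach, not the only one.

```python
import os
import tempfile

import pandas as pd

# Build a tiny two-sheet workbook so the example is self-contained.
# The file, sheet names, and columns are hypothetical demo data.
demo_path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(demo_path) as writer:
    pd.DataFrame(
        {"name": ["Ana", None, "Chen"], "score": [90.0, 85.0, None]}
    ).to_excel(writer, sheet_name="Scores", index=False)
    pd.DataFrame({"note": ["demo"]}).to_excel(writer, sheet_name="Notes", index=False)

# Read one sheet by name (the default, sheet_name=0, is the first sheet).
df = pd.read_excel(demo_path, sheet_name="Scores")

# sheet_name=None returns a dict mapping every sheet name to a DataFrame.
all_sheets = pd.read_excel(demo_path, sheet_name=None)

# Empty cells come back as NaN; count them per column before deciding what to do.
missing_per_column = df.isna().sum()

# Option 1: fill each column with a value that suits its type.
filled = df.fillna({"name": "Unknown", "score": df["score"].mean()})

# Option 2: drop every row that has at least one missing value.
dropped = df.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Filling per column avoids putting a text placeholder such as "Unknown" into a numeric column, which would change that column's dtype and break later calculations.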
|
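For the large-file case in (d), note that `pd.read_excel` does not accept a `chunksize` argument (unlike `pd.read_csv`). A minimal sketch of one workaround is to read slices with `skiprows` and `nrows`. The helper name `read_excel_in_chunks` and the demo workbook are assumptions made for this example, and it assumes a single header row and the `openpyxl` engine.

```python
import os
import tempfile

import pandas as pd


def read_excel_in_chunks(path, chunk_size, sheet_name=0):
    """Yield DataFrames of at most `chunk_size` data rows from one sheet.

    Hypothetical helper: each chunk is a separate read_excel call that
    skips the data rows already consumed and keeps the header row (row 0).
    """
    start = 0
    while True:
        chunk = pd.read_excel(
            path,
            sheet_name=sheet_name,
            skiprows=range(1, start + 1),  # rows 1..start were read earlier
            nrows=chunk_size,
        )
        if chunk.empty:
            break
        yield chunk
        start += chunk_size


# Small demo workbook standing in for a genuinely large file.
demo_path = os.path.join(tempfile.mkdtemp(), "large_file.xlsx")
pd.DataFrame(
    {"id": range(1, 8), "value": [10, 20, 30, 40, 50, 60, 70]}
).to_excel(demo_path, index=False)

chunk_sizes = [len(chunk) for chunk in read_excel_in_chunks(demo_path, chunk_size=3)]
total = sum(chunk["value"].sum() for chunk in read_excel_in_chunks(demo_path, chunk_size=3))
print(chunk_sizes, total)
```

Each call reopens the workbook, so this trades speed for lower peak memory. For very large data, converting the sheet to CSV once and using `pd.read_csv(..., chunksize=...)` is often the simpler route.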