shwetashweta05 committed on
Commit
c1104ec
·
verified ·
1 Parent(s): 597e65b

Upload Excel_guide.ipynb

  1. Excel_guide.ipynb +169 -0
Excel_guide.ipynb ADDED
@@ -0,0 +1,169 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "7e2a8960-f41d-40fc-a8e7-43c7e938e759",
6
+ "metadata": {},
7
+ "source": [
8
+ "# Excel"
9
+ ]
10
+ },
11
+ {
12
+ "cell_type": "markdown",
13
+ "id": "7ea74fc9-f615-4214-a2a0-da948feb484e",
14
+ "metadata": {},
15
+ "source": [
16
+ "## (a) What It Is\n",
17
+ "- Excel is a spreadsheet application developed by Microsoft, widely used for organizing, analyzing, and storing structured data in tabular form.\n",
18
+ "\n",
19
+ "## Common file extensions: \n",
20
+ "- .xlsx (the default format since Excel 2007) and .xls (the legacy binary format used by Excel 97-2003).\n",
21
+ "\n",
22
+ "## Features:\n",
23
+ "- Rows and columns for data organization.\n",
24
+ "- Built-in formulas and functions for calculations.\n",
25
+ "- Support for charts and data visualization.\n",
26
+ "- Capability to handle multiple sheets in a single file."
27
+ ]
28
+ },
29
+ {
30
+ "cell_type": "markdown",
31
+ "id": "fd26d0c4-3772-46f3-9bcc-18a7aedeb4cc",
32
+ "metadata": {},
33
+ "source": [
34
+ "## (b) How to Read These Files\n",
35
+ "- In Python, the pandas library reads Excel files through `pd.read_excel()`, which relies on an engine package such as openpyxl for .xlsx files."
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "markdown",
40
+ "id": "06ae061f-6c3e-4b3c-8e33-8bf8dda50c88",
41
+ "metadata": {},
42
+ "source": [
43
+ "## Code Example:\n",
44
+ "    import pandas as pd\n",
45
+ "\n",
46
+ "    # Read the first sheet of the workbook into a DataFrame\n",
47
+ "    df = pd.read_excel(\"example.xlsx\")\n",
48
+ "\n",
49
+ "    # Display the first few rows\n",
50
+ "    print(df.head())"
51
+ ]
52
+ },
53
+ {
54
+ "cell_type": "markdown",
55
+ "id": "d6b0b4b3-7514-4b8e-9887-51fd25b4bacf",
56
+ "metadata": {},
57
+ "source": [
58
+ "## Explanation:\n",
59
+ "### read_excel():\n",
60
+ "- Reads data from the specified Excel file and returns it as a DataFrame.\n",
61
+ "- Reads the first sheet by default; pass `sheet_name` to select another sheet."
62
+ ]
63
+ },
64
+ {
65
+ "cell_type": "markdown",
66
+ "id": "bcb8595c-b0f0-47a6-b8e4-5d9d42dd797d",
67
+ "metadata": {},
68
+ "source": [
69
+ "## (c) What Are the Issues Encountered When Handling These Files?\n",
70
+ "1. Missing Data:\n",
71
+ "   Empty cells are read as NaN and can cause errors or skew the analysis if left unhandled.\n",
72
+ "2. Encoding Problems:\n",
73
+ "   Older .xls files need a different reader engine (xlrd) than .xlsx files, and text that originated in non-standard encodings may load as garbled characters or raise errors.\n",
74
+ "3. Corrupted Files:\n",
75
+ "   A damaged or partially written workbook cannot be opened, and the read fails with an error.\n",
76
+ "4. Large Files:\n",
77
+ "   A worksheet holds at most 1,048,576 rows, and large workbooks are slow and memory-intensive to load.\n",
78
+ "5. Unsupported Features:\n",
79
+ "   pandas loads cell values only; advanced Excel features such as macros are not accessible through it.\n"
80
+ ]
81
+ },
82
+ {
83
+ "cell_type": "markdown",
84
+ "id": "6d2e0ab1-61bb-40a3-a232-ab94fa5669a8",
85
+ "metadata": {},
86
+ "source": [
87
+ "## (d) How to Overcome These Errors/Issues?\n",
88
+ "## 1. Missing Data\n",
89
+ "- Use pandas methods to handle missing values:"
90
+ ]
91
+ },
92
+ {
93
+ "cell_type": "markdown",
94
+ "id": "a4366681-0738-49dd-8f06-34f4e3bdec03",
95
+ "metadata": {},
96
+ "source": [
97
+ "## Code:\n",
98
+ "    df_filled = df.fillna(\"Unknown\")  # Option 1: fill missing values with a placeholder\n",
99
+ "    df_dropped = df.dropna()  # Option 2: remove rows with missing values"
100
+ ]
101
+ },
102
+ {
103
+ "cell_type": "markdown",
104
+ "id": "cd4081ed-3060-4a90-95d6-0ebcbca45439",
105
+ "metadata": {},
106
+ "source": [
107
+ "## 2. Corrupted Files\n",
108
+ "- Open the file in Excel with its Open and Repair option, then re-save it as .xlsx or export it as a CSV file for processing."
109
+ ]
110
+ },
111
+ {
112
+ "cell_type": "markdown",
113
+ "id": "8b23e35a-911e-4673-8b2d-1a25e38ef756",
114
+ "metadata": {},
115
+ "source": [
116
+ "## 3. Large Files\n",
117
+ "- `pd.read_excel()` has no `chunksize` argument, so export the sheet to CSV (here the hypothetical `large_file.csv`) and read it in chunks with `pd.read_csv()`:"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "markdown",
122
+ "id": "141a2aba-4ce1-48e6-af2e-5102b927dcca",
123
+ "metadata": {},
124
+ "source": [
125
+ "## Code:\n",
126
+ "    for chunk in pd.read_csv(\"large_file.csv\", chunksize=1000):\n",
127
+ "        process(chunk)  # Replace `process` with your function"
128
+ ]
129
+ },
130
+ {
131
+ "cell_type": "markdown",
132
+ "id": "cc730499-3610-406c-8b1c-8b7fc39805fd",
133
+ "metadata": {},
134
+ "source": [
135
+ "## 4. Unsupported Features\n",
136
+ "- Keep macros and similar advanced features out of workbooks that must be processed programmatically, or use a library such as openpyxl when you need more than cell values.\n"
137
+ ]
138
+ },
139
+ {
140
+ "cell_type": "code",
141
+ "execution_count": null,
142
+ "id": "cac6941e-31ec-484f-9e6a-c62e0c45f4ca",
143
+ "metadata": {},
144
+ "outputs": [],
145
+ "source": []
146
+ }
147
+ ],
148
+ "metadata": {
149
+ "kernelspec": {
150
+ "display_name": "Python 3 (ipykernel)",
151
+ "language": "python",
152
+ "name": "python3"
153
+ },
154
+ "language_info": {
155
+ "codemirror_mode": {
156
+ "name": "ipython",
157
+ "version": 3
158
+ },
159
+ "file_extension": ".py",
160
+ "mimetype": "text/x-python",
161
+ "name": "python",
162
+ "nbconvert_exporter": "python",
163
+ "pygments_lexer": "ipython3",
164
+ "version": "3.11.7"
165
+ }
166
+ },
167
+ "nbformat": 4,
168
+ "nbformat_minor": 5
169
+ }
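
The missing-data and large-file advice in sections (c) and (d) of the notebook can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook author's code: the in-memory `CSV_TEXT` stands in for a sheet that has already been exported to CSV, and the helper names `fill_missing` and `count_rows_in_chunks` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data standing in for a sheet exported to CSV.
# The second record has an empty "city" cell.
CSV_TEXT = "name,city\nAsha,Pune\nRavi,\nMeera,Delhi\n"


def fill_missing(df: pd.DataFrame, placeholder: str = "Unknown") -> pd.DataFrame:
    """Return a copy of df with missing values replaced by a placeholder."""
    return df.fillna(placeholder)


def count_rows_in_chunks(source, chunksize: int = 2) -> int:
    """Stream a CSV in fixed-size chunks and count its rows.

    Only one chunk is held in memory at a time, which is the point of
    chunked reading for large files.
    """
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += len(chunk)
    return total


df = pd.read_csv(io.StringIO(CSV_TEXT))   # the empty cell is read as NaN
filled = fill_missing(df)                 # option 1: fill with a placeholder
dropped = df.dropna()                     # option 2: drop incomplete rows
total_rows = count_rows_in_chunks(io.StringIO(CSV_TEXT))

print(filled)
print(dropped)
print(total_rows)
```

Filling and dropping are alternatives, so the sketch keeps both results in separate variables instead of modifying `df` in place. For a real workbook, the same chunked loop would run over the exported CSV file path in place of the `io.StringIO` object.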