cboettig committed on
Commit
5a7775e
·
1 Parent(s): 1fc5096
preprocess-States-Counties-Tracts.ipynb ADDED
@@ -0,0 +1,1431 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "707b1a14",
6
+ "metadata": {},
7
+ "source": [
8
+ "Pre-process SVI Data from [CDC portal](https://www.atsdr.cdc.gov/place-health/php/svi/svi-data-documentation-download.html)\n",
9
+ "\n",
10
+ "- Tract data for United States from 2022, 2020, 2010, 2000. \n",
11
+ "- Data documentation"
12
+ ]
13
+ },
14
+ {
15
+ "cell_type": "code",
16
+ "execution_count": 1,
17
+ "id": "803df305",
18
+ "metadata": {},
19
+ "outputs": [],
20
+ "source": [
21
+ "import ibis\n",
22
+ "from ibis import _\n",
23
+ "import streamlit as st\n",
24
+ "from utilities import generate_pmtiles\n",
25
+ "\n",
26
+ "con = ibis.duckdb.connect(\"duck.db\", extensions=['httpfs', 'spatial', 'h3'])\n"
27
+ ]
28
+ },
29
+ {
30
+ "cell_type": "code",
31
+ "execution_count": 13,
32
+ "id": "7ac648e6",
33
+ "metadata": {},
34
+ "outputs": [
35
+ {
36
+ "data": {
37
+ "application/vnd.jupyter.widget-view+json": {
38
+ "model_id": "781a57e6e9004c5b8b7ae644aea77dbe",
39
+ "version_major": 2,
40
+ "version_minor": 0
41
+ },
42
+ "text/plain": [
43
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
44
+ ]
45
+ },
46
+ "metadata": {},
47
+ "output_type": "display_data"
48
+ },
49
+ {
50
+ "data": {
51
+ "application/vnd.jupyter.widget-view+json": {
52
+ "model_id": "9de6547cfe7e4b32af6852eadf27e53e",
53
+ "version_major": 2,
54
+ "version_minor": 0
55
+ },
56
+ "text/plain": [
57
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
58
+ ]
59
+ },
60
+ "metadata": {},
61
+ "output_type": "display_data"
62
+ },
63
+ {
64
+ "name": "stderr",
65
+ "output_type": "stream",
66
+ "text": [
67
+ "For layer 0, using name \"svi\"\n",
68
+ "84120 features, 34922477 bytes of geometry, 5150225 bytes of string pool\n",
69
+ "tile 1/0/0 size is 673414 with detail 12, >500000 \n",
70
+ "Going to try keeping the sparsest 66.82% of the features to make it fit\n",
71
+ "tile 1/0/0 size is 654918 with detail 12, >500000 \n",
72
+ "Going to try keeping the sparsest 45.92% of the features to make it fit\n",
73
+ "tile 1/0/0 size is 627082 with detail 12, >500000 \n",
74
+ "Going to try keeping the sparsest 32.95% of the features to make it fit\n",
75
+ "tile 1/0/0 size is 571221 with detail 12, >500000 \n",
76
+ "Going to try keeping the sparsest 25.96% of the features to make it fit\n",
77
+ "tile 1/0/0 size is 515026 with detail 12, >500000 \n",
78
+ "Going to try keeping the sparsest 22.68% of the features to make it fit\n",
79
+ "tile 2/0/1 size is 556184 with detail 12, >500000 \n",
80
+ "Going to try keeping the sparsest 80.91% of the features to make it fit\n",
81
+ "tile 2/1/1 size is 680483 with detail 12, >500000 \n",
82
+ "Going to try keeping the sparsest 66.13% of the features to make it fit\n",
83
+ "tile 2/0/1 size is 544973 with detail 12, >500000 \n",
84
+ "Going to try keeping the sparsest 66.81% of the features to make it fit\n",
85
+ "tile 2/1/1 size is 633636 with detail 12, >500000 \n",
86
+ "Going to try keeping the sparsest 46.96% of the features to make it fit\n",
87
+ "tile 2/0/1 size is 529976 with detail 12, >500000 \n",
88
+ "Going to try keeping the sparsest 56.73% of the features to make it fit\n",
89
+ "tile 2/1/1 size is 562278 with detail 12, >500000 \n",
90
+ "Going to try keeping the sparsest 37.59% of the features to make it fit\n",
91
+ "tile 2/0/1 size is 509845 with detail 12, >500000 \n",
92
+ "Going to try keeping the sparsest 50.07% of the features to make it fit\n",
93
+ "tile 3/1/3 size is 614365 with detail 12, >500000 \n",
94
+ "Going to try keeping the sparsest 73.25% of the features to make it fit\n",
95
+ "tile 3/2/3 size is 828844 with detail 12, >500000 \n",
96
+ "Going to try keeping the sparsest 54.29% of the features to make it fit\n",
97
+ "tile 3/1/3 size is 557346 with detail 12, >500000 \n",
98
+ "Going to try keeping the sparsest 59.14% of the features to make it fit\n",
99
+ "tile 3/2/3 size is 622365 with detail 12, >500000 \n",
100
+ "Going to try keeping the sparsest 39.26% of the features to make it fit\n",
101
+ "tile 3/1/3 size is 507698 with detail 12, >500000 \n",
102
+ "Going to try keeping the sparsest 52.42% of the features to make it fit\n",
103
+ "tile 4/4/5 size is 513228 with detail 12, >500000 \n",
104
+ "Going to try keeping the sparsest 87.68% of the features to make it fit\n",
105
+ "tile 4/3/6 size is 635333 with detail 12, >500000 \n",
106
+ "Going to try keeping the sparsest 70.83% of the features to make it fit\n",
107
+ "tile 4/3/6 size is 515357 with detail 12, >500000 \n",
108
+ "Going to try keeping the sparsest 61.85% of the features to make it fit\n",
109
+ "tile 4/4/6 size is 1080604 with detail 12, >500000 \n",
110
+ "Going to try keeping the sparsest 41.64% of the features to make it fit\n",
111
+ "tile 4/4/6 size is 614947 with detail 12, >500000 \n",
112
+ "Going to try keeping the sparsest 30.47% of the features to make it fit\n",
113
+ "tile 5/8/12 size is 784796 with detail 12, >500000 \n",
114
+ "Going to try keeping the sparsest 57.34% of the features to make it fit\n",
115
+ "tile 5/8/12 size is 540488 with detail 12, >500000 \n",
116
+ "Going to try keeping the sparsest 47.74% of the features to make it fit\n",
117
+ " 99.9% 12/973/1656 \n",
118
+ " 100.0% 12/4092/1352 \r"
119
+ ]
120
+ },
121
+ {
122
+ "name": "stdout",
123
+ "output_type": "stream",
124
+ "text": [
125
+ "Successfully generated PMTiles file: svi-data/2022/SVI2022_US_tract.pmtiles\n"
126
+ ]
127
+ }
128
+ ],
129
+ "source": [
130
+ "expr = con.read_geo(\"svi-data/2022/SVI2022_US_tract.gdb\")\n",
131
+ "expr.to_parquet(\"svi-data/2022/SVI2022_US_tract.parquet\")\n",
132
+ "\n",
133
+ "# tippecanoe requires geojson input to create PMTiles. Drop most additional variables in PMTiles creation.\n",
134
+ "query = ibis.to_sql(expr.select('STATE', 'COUNTY', 'LOCATION', 'FIPS', 'RPL_THEMES', 'Shape'))\n",
135
+ "con.raw_sql(f\"COPY ({query}) TO '/tmp/svi.json' WITH (FORMAT GDAL, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES');\")\n",
136
+ "\n",
137
+ "generate_pmtiles(\"/tmp/svi.json\", \"svi-data/2022/SVI2022_US_tract.pmtiles\")\n"
138
+ ]
139
+ },
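The cell above calls `generate_pmtiles` from a local `utilities` module whose source is not part of this diff. As a rough sketch only, a helper of that kind could assemble a tippecanoe command line like the one below. The function name `build_tippecanoe_command`, the layer name default, and the choice of flags are assumptions for illustration; the log lines about "keeping the sparsest ... of the features" are consistent with a feature-dropping option such as `--drop-densest-as-needed`, but the actual helper may differ.

```python
import shlex


def build_tippecanoe_command(input_geojson, output_pmtiles, layer="svi"):
    """Build (but do not run) a tippecanoe command line.

    Hypothetical helper: the real generate_pmtiles in utilities.py is not shown.
    """
    return [
        "tippecanoe",
        "-o", output_pmtiles,          # output format is inferred from the .pmtiles extension
        "-l", layer,                   # layer name inside the tileset
        "--drop-densest-as-needed",    # thin features when a tile exceeds the size limit
        "--force",                     # overwrite an existing output file
        input_geojson,
    ]


cmd = build_tippecanoe_command(
    "/tmp/svi.json", "svi-data/2022/SVI2022_US_tract.pmtiles"
)
# Print a shell-quoted version for inspection; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command easy to inspect and test without tippecanoe installed.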
140
+ {
141
+ "cell_type": "code",
142
+ "execution_count": 15,
143
+ "id": "2e29cc6e",
144
+ "metadata": {},
145
+ "outputs": [
146
+ {
147
+ "data": {
148
+ "text/plain": [
149
+ "<minio.helpers.ObjectWriteResult at 0x77886893f050>"
150
+ ]
151
+ },
152
+ "execution_count": 15,
153
+ "metadata": {},
154
+ "output_type": "execute_result"
155
+ }
156
+ ],
157
+ "source": [
158
+ "import minio\n",
159
+ "import re\n",
160
+ "\n",
161
+ "minio_key = st.secrets[\"MINIO_KEY\"]\n",
162
+ "minio_secret = st.secrets[\"MINIO_SECRET\"]\n",
163
+ "mc = minio.Minio(\"minio.carlboettiger.info\", minio_key, minio_secret)\n",
164
+ "\n",
165
+ "mc.fput_object(\"public-data\", \"social-vulnerability/2022/SVI2022_US_tract.pmtiles\", \"svi-data/2022/SVI2022_US_tract.pmtiles\")\n",
166
+ "mc.fput_object(\"public-data\", \"social-vulnerability/2022/SVI2022_US_tract.parquet\", \"svi-data/2022/SVI2022_US_tract.parquet\")\n"
167
+ ]
168
+ },
169
+ {
170
+ "cell_type": "code",
171
+ "execution_count": 19,
172
+ "id": "5fcd59bc-72a4-4de7-9cdb-1b6eca9407fb",
173
+ "metadata": {},
174
+ "outputs": [
175
+ {
176
+ "data": {
177
+ "text/plain": [
178
+ "<duckdb.duckdb.DuckDBPyConnection at 0x7edb2419f330>"
179
+ ]
180
+ },
181
+ "execution_count": 19,
182
+ "metadata": {},
183
+ "output_type": "execute_result"
184
+ }
185
+ ],
186
+ "source": [
187
+ "\n",
188
+ "\n",
189
+ "\n",
190
+ "# Local cloud\n",
191
+ "minio_key = st.secrets[\"MINIO_KEY\"]\n",
192
+ "minio_secret = st.secrets[\"MINIO_SECRET\"]\n",
193
+ "query1 = f'''\n",
194
+ "CREATE OR REPLACE SECRET secret1 (\n",
195
+ " TYPE S3,\n",
196
+ " KEY_ID '{minio_key}',\n",
197
+ " SECRET '{minio_secret}',\n",
198
+ " ENDPOINT 'minio.carlboettiger.info',\n",
199
+ " URL_STYLE 'path',\n",
200
+ " SCOPE \"s3://public-gbif\"\n",
201
+ "\n",
202
+ ");\n",
203
+ "'''\n",
204
+ "query2 = f'''\n",
205
+ "CREATE OR REPLACE SECRET secret2 (\n",
206
+ " TYPE S3,\n",
207
+ " KEY_ID '{minio_key}',\n",
208
+ " SECRET '{minio_secret}',\n",
209
+ " ENDPOINT 'minio.carlboettiger.info',\n",
210
+ " URL_STYLE 'path',\n",
211
+ " SCOPE \"s3://public-data\"\n",
212
+ "\n",
213
+ ");\n",
214
+ "'''\n",
215
+ "# don't scope to a single bucket\n",
216
+ "# SCOPE 's3://public-gbif'\n",
217
+ "\n",
218
+ "con.raw_sql(query1)\n",
219
+ "con.raw_sql(query2)\n",
220
+ "## Limits are sometimes good \n",
221
+ "con.raw_sql(\"SET memory_limit = '20GB';\")\n",
222
+ "con.raw_sql(\"set threads=40;\")\n",
223
+ "\n",
224
+ "# can/should we add explicit spatial index to gbif first? using RTree takes too much memory"
225
+ ]
226
+ },
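The two `CREATE OR REPLACE SECRET` statements above differ only in the secret name and the bucket in `SCOPE`. A minimal sketch of generating them from one template follows. The helper names `sql_quote` and `s3_secret_sql` are hypothetical, the credentials are placeholders, and the sketch writes `SCOPE` as a single-quoted string literal, which is an assumption about preferred style rather than a change the notebook makes.

```python
def sql_quote(value):
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def s3_secret_sql(name, key_id, secret, endpoint, bucket):
    """Return a DuckDB CREATE SECRET statement scoped to one bucket."""
    return (
        f"CREATE OR REPLACE SECRET {name} (\n"
        f"    TYPE S3,\n"
        f"    KEY_ID {sql_quote(key_id)},\n"
        f"    SECRET {sql_quote(secret)},\n"
        f"    ENDPOINT {sql_quote(endpoint)},\n"
        f"    URL_STYLE 'path',\n"
        f"    SCOPE {sql_quote('s3://' + bucket)}\n"
        ");"
    )


# Placeholder credentials; the notebook reads the real ones from st.secrets.
statements = [
    s3_secret_sql(f"secret{n}", "EXAMPLE_KEY", "EXAMPLE_SECRET",
                  "minio.carlboettiger.info", bucket)
    for n, bucket in enumerate(["public-gbif", "public-data"], start=1)
]
for statement in statements:
    print(statement)
```

Each generated statement could then be passed to `con.raw_sql(...)` in the same way as `query1` and `query2`.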
227
+ {
228
+ "cell_type": "code",
229
+ "execution_count": 20,
230
+ "id": "dcf50375-75ee-4208-87b2-6ffef6361742",
231
+ "metadata": {},
232
+ "outputs": [],
233
+ "source": [
234
+ "overture = (\n",
235
+ " con.read_parquet('s3://overturemaps-us-west-2/release/2024-11-13.0/theme=divisions/type=division_area/*', \n",
236
+ " filename=True, hive_partitioning=1))\n",
237
+ "usa = overture.filter(_.subtype==\"country\").filter(_.country == \"US\").select(_.geometry).execute()"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "code",
242
+ "execution_count": 21,
243
+ "id": "ce86081b-a46f-426b-9432-9bce588156ee",
244
+ "metadata": {},
245
+ "outputs": [],
246
+ "source": [
247
+ "\n",
248
+ "gbif = con.read_parquet(\"s3://public-gbif/2024-10-01/**\")\n",
249
+ "svi = con.read_parquet(\"s3://public-data/social-vulnerability/2022/SVI2022_US_tract.parquet\").rename(geom = \"Shape\")\n"
250
+ ]
251
+ },
252
+ {
253
+ "cell_type": "markdown",
254
+ "id": "3891abb6-3652-4217-8615-106d354ff131",
255
+ "metadata": {},
256
+ "source": [
257
+ "We iterate through the states, and through the counties within each state, so that each spatial join stays small. (Should we filter gbif down to the US boundary as a one-off first? For now we assume it is efficient to filter the global table state by state.)"
258
+ ]
259
+ },
260
+ {
261
+ "cell_type": "code",
262
+ "execution_count": 23,
263
+ "id": "69bf6dc6-4a13-4830-8c1a-87bb5899eb32",
264
+ "metadata": {},
265
+ "outputs": [],
266
+ "source": [
267
+ "all_states = svi.select(_.ST_ABBR).distinct().order_by(_.ST_ABBR).execute()[\"ST_ABBR\"]\n",
268
+ "#all_states"
269
+ ]
270
+ },
271
+ {
272
+ "cell_type": "code",
273
+ "execution_count": 26,
274
+ "id": "32a2b4c1-e08b-4fbb-b891-ac19053a4585",
275
+ "metadata": {},
276
+ "outputs": [],
277
+ "source": [
278
+ "## select from the list we haven't yet written (allows resume).\n",
279
+ "import minio\n",
280
+ "import re\n",
281
+ "\n",
282
+ "minio_key = st.secrets[\"MINIO_KEY\"]\n",
283
+ "minio_secret = st.secrets[\"MINIO_SECRET\"]\n",
284
+ "mc = minio.Minio(\"minio.carlboettiger.info\", minio_key, minio_secret)\n",
285
+ "obj = mc.list_objects(\"public-gbif\", \"social-vulnerability\", recursive=True)\n",
286
+ "pattern = r\"social-vulnerability/|\\.parquet$\"\n",
287
+ "finished = [re.sub(pattern, \"\", i.object_name) for i in obj if not i.is_dir]\n",
288
+ "remaining = set(all_states) - set(finished)"
289
+ ]
290
+ },
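One thing to watch in the resume cell above: the loop later in this notebook writes objects to `social-vulnerability/state={i}/{county}.parquet`, so stripping only the prefix and the `.parquet` suffix leaves strings such as `state=NV/Eureka County`, which never equal a bare state abbreviation. The sketch below shows one way to recover the state codes from such keys. The helper name `finished_states` and the sample object names are illustrative assumptions, not part of the notebook.

```python
import re


def finished_states(object_names, prefix="social-vulnerability/"):
    """Collect state abbreviations from keys like
    'social-vulnerability/state=NV/Eureka County.parquet'."""
    pattern = re.compile(re.escape(prefix) + r"state=([A-Z]{2})/.+\.parquet$")
    states = set()
    for name in object_names:
        match = pattern.match(name)
        if match:
            states.add(match.group(1))
    return states


# Illustrative object names following the path layout used by the write loop.
names = [
    "social-vulnerability/state=NV/Eureka County.parquet",
    "social-vulnerability/state=NV/Clark County.parquet",
    "social-vulnerability/state=NE/Blaine County.parquet",
]

# What the prefix/suffix-stripping pattern yields for the first key.
stripped = re.sub(r"social-vulnerability/|\.parquet$", "", names[0])

all_states = ["CA", "NE", "NV"]
remaining = sorted(set(all_states) - finished_states(names))
print(stripped, remaining)
```

Note that this treats a state as finished as soon as any one of its counties has been written. Resuming at county granularity would need a set of (state, county) pairs instead.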
291
+ {
292
+ "cell_type": "code",
293
+ "execution_count": 27,
294
+ "id": "4ecc58a3",
295
+ "metadata": {},
296
+ "outputs": [
297
+ {
298
+ "data": {
299
+ "text/plain": [
300
+ "{'AK',\n",
301
+ " 'AL',\n",
302
+ " 'AR',\n",
303
+ " 'AZ',\n",
304
+ " 'CA',\n",
305
+ " 'CO',\n",
306
+ " 'CT',\n",
307
+ " 'DC',\n",
308
+ " 'DE',\n",
309
+ " 'FL',\n",
310
+ " 'GA',\n",
311
+ " 'HI',\n",
312
+ " 'IA',\n",
313
+ " 'ID',\n",
314
+ " 'IL',\n",
315
+ " 'IN',\n",
316
+ " 'KS',\n",
317
+ " 'KY',\n",
318
+ " 'LA',\n",
319
+ " 'MA',\n",
320
+ " 'MD',\n",
321
+ " 'ME',\n",
322
+ " 'MI',\n",
323
+ " 'MN',\n",
324
+ " 'MO',\n",
325
+ " 'MS',\n",
326
+ " 'MT',\n",
327
+ " 'NC',\n",
328
+ " 'ND',\n",
329
+ " 'NE',\n",
330
+ " 'NH',\n",
331
+ " 'NJ',\n",
332
+ " 'NM',\n",
333
+ " 'NV',\n",
334
+ " 'NY',\n",
335
+ " 'OH',\n",
336
+ " 'OK',\n",
337
+ " 'OR',\n",
338
+ " 'PA',\n",
339
+ " 'RI',\n",
340
+ " 'SC',\n",
341
+ " 'SD',\n",
342
+ " 'TN',\n",
343
+ " 'TX',\n",
344
+ " 'UT',\n",
345
+ " 'VA',\n",
346
+ " 'VT',\n",
347
+ " 'WA',\n",
348
+ " 'WI',\n",
349
+ " 'WV',\n",
350
+ " 'WY'}"
351
+ ]
352
+ },
353
+ "execution_count": 27,
354
+ "metadata": {},
355
+ "output_type": "execute_result"
356
+ }
357
+ ],
358
+ "source": [
359
+ "remaining"
360
+ ]
361
+ },
362
+ {
363
+ "cell_type": "code",
364
+ "execution_count": null,
365
+ "id": "c3a4005c-1e8c-4f2a-a93c-1c158c9c26ab",
366
+ "metadata": {},
367
+ "outputs": [
368
+ {
369
+ "name": "stdout",
370
+ "output_type": "stream",
371
+ "text": [
372
+ "NV/Eureka County\n"
373
+ ]
374
+ },
375
+ {
376
+ "data": {
377
+ "application/vnd.jupyter.widget-view+json": {
378
+ "model_id": "6fd7b6e6fe1a4e3b9c8d476e0e757644",
379
+ "version_major": 2,
380
+ "version_minor": 0
381
+ },
382
+ "text/plain": [
383
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
384
+ ]
385
+ },
386
+ "metadata": {},
387
+ "output_type": "display_data"
388
+ },
389
+ {
390
+ "name": "stdout",
391
+ "output_type": "stream",
392
+ "text": [
393
+ "NV/Lander County\n"
394
+ ]
395
+ },
396
+ {
397
+ "data": {
398
+ "application/vnd.jupyter.widget-view+json": {
399
+ "model_id": "80f08f9cc267481996667dd1e383a3fb",
400
+ "version_major": 2,
401
+ "version_minor": 0
402
+ },
403
+ "text/plain": [
404
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
405
+ ]
406
+ },
407
+ "metadata": {},
408
+ "output_type": "display_data"
409
+ },
410
+ {
411
+ "name": "stdout",
412
+ "output_type": "stream",
413
+ "text": [
414
+ "NV/Clark County\n"
415
+ ]
416
+ },
417
+ {
418
+ "data": {
419
+ "application/vnd.jupyter.widget-view+json": {
420
+ "model_id": "872f806c983d4804b399881eac7d3bd9",
421
+ "version_major": 2,
422
+ "version_minor": 0
423
+ },
424
+ "text/plain": [
425
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
426
+ ]
427
+ },
428
+ "metadata": {},
429
+ "output_type": "display_data"
430
+ },
431
+ {
432
+ "name": "stdout",
433
+ "output_type": "stream",
434
+ "text": [
435
+ "NV/Storey County\n"
436
+ ]
437
+ },
438
+ {
439
+ "data": {
440
+ "application/vnd.jupyter.widget-view+json": {
441
+ "model_id": "9927ce307cc8413f973891b388c89288",
442
+ "version_major": 2,
443
+ "version_minor": 0
444
+ },
445
+ "text/plain": [
446
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
447
+ ]
448
+ },
449
+ "metadata": {},
450
+ "output_type": "display_data"
451
+ },
452
+ {
453
+ "name": "stdout",
454
+ "output_type": "stream",
455
+ "text": [
456
+ "NV/Churchill County\n"
457
+ ]
458
+ },
459
+ {
460
+ "data": {
461
+ "application/vnd.jupyter.widget-view+json": {
462
+ "model_id": "a5d573d2766c41fbbfde5af9ed19ab76",
463
+ "version_major": 2,
464
+ "version_minor": 0
465
+ },
466
+ "text/plain": [
467
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
468
+ ]
469
+ },
470
+ "metadata": {},
471
+ "output_type": "display_data"
472
+ },
473
+ {
474
+ "name": "stdout",
475
+ "output_type": "stream",
476
+ "text": [
477
+ "NV/Esmeralda County\n"
478
+ ]
479
+ },
480
+ {
481
+ "data": {
482
+ "application/vnd.jupyter.widget-view+json": {
483
+ "model_id": "bda8ee87bd1340f1b666c61b5d6bc716",
484
+ "version_major": 2,
485
+ "version_minor": 0
486
+ },
487
+ "text/plain": [
488
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
489
+ ]
490
+ },
491
+ "metadata": {},
492
+ "output_type": "display_data"
493
+ },
494
+ {
495
+ "name": "stdout",
496
+ "output_type": "stream",
497
+ "text": [
498
+ "NV/Lyon County\n"
499
+ ]
500
+ },
501
+ {
502
+ "data": {
503
+ "application/vnd.jupyter.widget-view+json": {
504
+ "model_id": "460041e4c2a745d5b0ff6dff64f26343",
505
+ "version_major": 2,
506
+ "version_minor": 0
507
+ },
508
+ "text/plain": [
509
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
510
+ ]
511
+ },
512
+ "metadata": {},
513
+ "output_type": "display_data"
514
+ },
515
+ {
516
+ "name": "stdout",
517
+ "output_type": "stream",
518
+ "text": [
519
+ "NV/Nye County\n"
520
+ ]
521
+ },
522
+ {
523
+ "data": {
524
+ "application/vnd.jupyter.widget-view+json": {
525
+ "model_id": "7690f211b2e2418e876898369d3b04ef",
526
+ "version_major": 2,
527
+ "version_minor": 0
528
+ },
529
+ "text/plain": [
530
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
531
+ ]
532
+ },
533
+ "metadata": {},
534
+ "output_type": "display_data"
535
+ },
536
+ {
537
+ "name": "stdout",
538
+ "output_type": "stream",
539
+ "text": [
540
+ "NV/Douglas County\n"
541
+ ]
542
+ },
543
+ {
544
+ "data": {
545
+ "application/vnd.jupyter.widget-view+json": {
546
+ "model_id": "fb3235545ac04933ae82d95b5783657e",
547
+ "version_major": 2,
548
+ "version_minor": 0
549
+ },
550
+ "text/plain": [
551
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
552
+ ]
553
+ },
554
+ "metadata": {},
555
+ "output_type": "display_data"
556
+ },
557
+ {
558
+ "name": "stdout",
559
+ "output_type": "stream",
560
+ "text": [
561
+ "NV/Elko County\n"
562
+ ]
563
+ },
564
+ {
565
+ "data": {
566
+ "application/vnd.jupyter.widget-view+json": {
567
+ "model_id": "9c72bf43115f4c65b9b7bdfb44c2fc67",
568
+ "version_major": 2,
569
+ "version_minor": 0
570
+ },
571
+ "text/plain": [
572
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
573
+ ]
574
+ },
575
+ "metadata": {},
576
+ "output_type": "display_data"
577
+ },
578
+ {
579
+ "name": "stdout",
580
+ "output_type": "stream",
581
+ "text": [
582
+ "NV/Pershing County\n"
583
+ ]
584
+ },
585
+ {
586
+ "data": {
587
+ "application/vnd.jupyter.widget-view+json": {
588
+ "model_id": "2e7d1ff419fe4bf4ae07c1d4288d0259",
589
+ "version_major": 2,
590
+ "version_minor": 0
591
+ },
592
+ "text/plain": [
593
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
594
+ ]
595
+ },
596
+ "metadata": {},
597
+ "output_type": "display_data"
598
+ },
599
+ {
600
+ "name": "stdout",
601
+ "output_type": "stream",
602
+ "text": [
603
+ "NV/Washoe County\n"
604
+ ]
605
+ },
606
+ {
607
+ "data": {
608
+ "application/vnd.jupyter.widget-view+json": {
609
+ "model_id": "c57a17eaf70941a9968ad2db89e2c98d",
610
+ "version_major": 2,
611
+ "version_minor": 0
612
+ },
613
+ "text/plain": [
614
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
615
+ ]
616
+ },
617
+ "metadata": {},
618
+ "output_type": "display_data"
619
+ },
620
+ {
621
+ "name": "stdout",
622
+ "output_type": "stream",
623
+ "text": [
624
+ "NV/Humboldt County\n"
625
+ ]
626
+ },
627
+ {
628
+ "data": {
629
+ "application/vnd.jupyter.widget-view+json": {
630
+ "model_id": "958d00b68de94acaa96d63871daa356b",
631
+ "version_major": 2,
632
+ "version_minor": 0
633
+ },
634
+ "text/plain": [
635
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
636
+ ]
637
+ },
638
+ "metadata": {},
639
+ "output_type": "display_data"
640
+ },
641
+ {
642
+ "name": "stdout",
643
+ "output_type": "stream",
644
+ "text": [
645
+ "NV/Carson City\n"
646
+ ]
647
+ },
648
+ {
649
+ "data": {
650
+ "application/vnd.jupyter.widget-view+json": {
651
+ "model_id": "a162c535a70f4dc89b7e708f2fc8633b",
652
+ "version_major": 2,
653
+ "version_minor": 0
654
+ },
655
+ "text/plain": [
656
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
657
+ ]
658
+ },
659
+ "metadata": {},
660
+ "output_type": "display_data"
661
+ },
662
+ {
663
+ "name": "stdout",
664
+ "output_type": "stream",
665
+ "text": [
666
+ "NV/Lincoln County\n"
667
+ ]
668
+ },
669
+ {
670
+ "data": {
671
+ "application/vnd.jupyter.widget-view+json": {
672
+ "model_id": "e4ca31b6f6b64667bc8c1e5dfac7ab03",
673
+ "version_major": 2,
674
+ "version_minor": 0
675
+ },
676
+ "text/plain": [
677
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
678
+ ]
679
+ },
680
+ "metadata": {},
681
+ "output_type": "display_data"
682
+ },
683
+ {
684
+ "name": "stdout",
685
+ "output_type": "stream",
686
+ "text": [
687
+ "NV/White Pine County\n"
688
+ ]
689
+ },
690
+ {
691
+ "data": {
692
+ "application/vnd.jupyter.widget-view+json": {
693
+ "model_id": "b89cd3441e714d5983daffa6c93cc0ee",
694
+ "version_major": 2,
695
+ "version_minor": 0
696
+ },
697
+ "text/plain": [
698
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
699
+ ]
700
+ },
701
+ "metadata": {},
702
+ "output_type": "display_data"
703
+ },
704
+ {
705
+ "name": "stdout",
706
+ "output_type": "stream",
707
+ "text": [
708
+ "NV/Mineral County\n"
709
+ ]
710
+ },
711
+ {
712
+ "data": {
713
+ "application/vnd.jupyter.widget-view+json": {
714
+ "model_id": "c05bb3bbd3434373a068a489410beb4a",
715
+ "version_major": 2,
716
+ "version_minor": 0
717
+ },
718
+ "text/plain": [
719
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
720
+ ]
721
+ },
722
+ "metadata": {},
723
+ "output_type": "display_data"
724
+ },
725
+ {
726
+ "name": "stdout",
727
+ "output_type": "stream",
728
+ "text": [
729
+ "NE/Blaine County\n"
730
+ ]
731
+ },
732
+ {
733
+ "data": {
734
+ "application/vnd.jupyter.widget-view+json": {
735
+ "model_id": "f88979e7186546f3be239f901393ab8a",
736
+ "version_major": 2,
737
+ "version_minor": 0
738
+ },
739
+ "text/plain": [
740
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
741
+ ]
742
+ },
743
+ "metadata": {},
744
+ "output_type": "display_data"
745
+ },
746
+ {
747
+ "name": "stdout",
748
+ "output_type": "stream",
749
+ "text": [
750
+ "NE/Butler County\n"
751
+ ]
752
+ },
753
+ {
754
+ "data": {
755
+ "application/vnd.jupyter.widget-view+json": {
756
+ "model_id": "c4bfcd9ba81944b29907ccf3c6d3783e",
757
+ "version_major": 2,
758
+ "version_minor": 0
759
+ },
760
+ "text/plain": [
761
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
762
+ ]
763
+ },
764
+ "metadata": {},
765
+ "output_type": "display_data"
766
+ },
767
+ {
768
+ "name": "stdout",
769
+ "output_type": "stream",
770
+ "text": [
771
+ "NE/Custer County\n"
772
+ ]
773
+ },
774
+ {
775
+ "data": {
776
+ "application/vnd.jupyter.widget-view+json": {
777
+ "model_id": "6cd52ae08b7b4dea931ff2ffa5d6c7f6",
778
+ "version_major": 2,
779
+ "version_minor": 0
780
+ },
781
+ "text/plain": [
782
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
783
+ ]
784
+ },
785
+ "metadata": {},
786
+ "output_type": "display_data"
787
+ },
788
+ {
789
+ "name": "stdout",
790
+ "output_type": "stream",
791
+ "text": [
792
+ "NE/Dakota County\n"
793
+ ]
794
+ },
795
+ {
796
+ "data": {
797
+ "application/vnd.jupyter.widget-view+json": {
798
+ "model_id": "b3a2e7411b69407ba76c06b0d083a961",
799
+ "version_major": 2,
800
+ "version_minor": 0
801
+ },
802
+ "text/plain": [
803
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
804
+ ]
805
+ },
806
+ "metadata": {},
807
+ "output_type": "display_data"
808
+ },
809
+ {
810
+ "name": "stdout",
811
+ "output_type": "stream",
812
+ "text": [
813
+ "NE/Kearney County\n"
814
+ ]
815
+ },
816
+ {
817
+ "data": {
818
+ "application/vnd.jupyter.widget-view+json": {
819
+ "model_id": "e0ccf52489a3467ba172afef3e36f2f0",
820
+ "version_major": 2,
821
+ "version_minor": 0
822
+ },
823
+ "text/plain": [
824
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
825
+ ]
826
+ },
827
+ "metadata": {},
828
+ "output_type": "display_data"
829
+ },
830
+ {
831
+ "name": "stdout",
832
+ "output_type": "stream",
833
+ "text": [
834
+ "NE/Keith County\n"
835
+ ]
836
+ },
837
+ {
838
+ "data": {
839
+ "application/vnd.jupyter.widget-view+json": {
840
+ "model_id": "e6f8cd9d59284e7aaa9eabe117a69079",
841
+ "version_major": 2,
842
+ "version_minor": 0
843
+ },
844
+ "text/plain": [
845
+ "FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))"
846
+ ]
847
+ },
848
+ "metadata": {},
849
+ "output_type": "display_data"
850
+ }
851
+ ],
852
+ "source": [
853
+ "## And here we go: long-running loop over each county in each remaining state\n",
854
+ "for i in remaining:\n",
855
+ " counties = svi.filter(_.ST_ABBR == i).select(_.COUNTY).distinct().execute()[\"COUNTY\"].to_numpy()\n",
856
+ " for county in counties:\n",
857
+ " gdf = (svi\n",
858
+ " .filter(_.ST_ABBR == i, _.COUNTY== county)\n",
859
+ " .mutate(area = _.geom.area())\n",
860
+ " )\n",
861
+ "\n",
862
+ " print(i + \"/\" + county)\n",
863
+ " \n",
864
+ " bounds = gdf.execute().total_bounds\n",
865
+ " points = (gbif\n",
866
+ " .filter(_.decimallongitude >= bounds[0], \n",
867
+ " _.decimallongitude < bounds[2], \n",
868
+ " _.decimallatitude >= bounds[1], \n",
869
+ " _.decimallatitude < bounds[3])\n",
870
+ " )\n",
871
+ " \n",
872
+ " (gdf\n",
873
+ " .join(points, gdf.geom.intersects(points.geom))\n",
874
+ " .to_parquet(f\"s3://public-gbif/social-vulnerability/state={i}/{county}.parquet\")\n",
875
+ " )\n"
876
+ ]
877
+ },
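The loop above narrows the GBIF table to each county's bounding box before the spatial join, so the more expensive `intersects` test only runs on nearby points. A plain-Python sketch of that prefilter is shown below, assuming `bounds` is ordered `(minx, miny, maxx, maxy)` as returned by GeoPandas `total_bounds`. The function name `in_bounds` and the sample coordinates are made up for illustration.

```python
def in_bounds(lon, lat, bounds):
    """Half-open bounding-box test matching the >= / < comparisons in the loop."""
    minx, miny, maxx, maxy = bounds
    return minx <= lon < maxx and miny <= lat < maxy


# Made-up bounding box and (longitude, latitude) points.
bounds = (-117.0, 39.0, -115.0, 41.0)
points = [
    (-116.0, 40.0),  # inside the box
    (-115.0, 40.0),  # exactly on the eastern edge, excluded by the strict <
    (-120.0, 40.0),  # west of the box
]

candidates = [p for p in points if in_bounds(p[0], p[1], bounds)]
print(candidates)
```

Because the upper comparisons are strict, a point lying exactly on the maximum longitude or latitude is dropped before the join. Using `<=` on both sides would keep it.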
878
+ {
879
+ "cell_type": "markdown",
880
+ "id": "050a358f-e2de-49bd-a80d-4f8c47e36bab",
881
+ "metadata": {},
882
+ "source": [
883
+ "gbif_usa = con.read_parquet(\"s3://cboettig/gbif/svi/**\")\n"
884
+ ]
885
+ },
886
+ {
887
+ "cell_type": "code",
888
+ "execution_count": 43,
889
+ "id": "9bd1299b-af6b-4d85-97fb-ba83a5c26c70",
890
+ "metadata": {},
891
+ "outputs": [
892
+ {
893
+ "data": {
894
+ "text/html": [
895
+ "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">DatabaseTable: ibis_read_parquet_msislo4d7fcgdfh2pyoxvxjkdu\n",
896
+ " OBJECTID int64\n",
897
+ " ST string\n",
898
+ " STATE string\n",
899
+ " ST_ABBR string\n",
900
+ " STCNTY string\n",
901
+ " COUNTY string\n",
902
+ " FIPS string\n",
903
+ " LOCATION string\n",
904
+ " AREA_SQMI float64\n",
905
+ " E_TOTPOP int32\n",
906
+ " M_TOTPOP int32\n",
907
+ " E_HU int32\n",
908
+ " M_HU int32\n",
909
+ " E_HH int32\n",
910
+ " M_HH int32\n",
911
+ " E_POV150 int32\n",
912
+ " M_POV150 int32\n",
913
+ " E_UNEMP int32\n",
914
+ " M_UNEMP int32\n",
915
+ " E_HBURD int32\n",
916
+ " M_HBURD int32\n",
917
+ " E_NOHSDP int32\n",
918
+ " M_NOHSDP int32\n",
919
+ " E_UNINSUR int32\n",
920
+ " M_UNINSUR int32\n",
921
+ " E_AGE65 int32\n",
922
+ " M_AGE65 int32\n",
923
+ " E_AGE17 int32\n",
924
+ " M_AGE17 int32\n",
925
+ " E_DISABL int32\n",
926
+ " M_DISABL int32\n",
927
+ " E_SNGPNT int32\n",
928
+ " M_SNGPNT int32\n",
+ " E_LIMENG int32\n",
+ " M_LIMENG int32\n",
+ " E_MINRTY int32\n",
+ " M_MINRTY int32\n",
+ " E_MUNIT int32\n",
+ " M_MUNIT int32\n",
+ " E_MOBILE int32\n",
+ " M_MOBILE int32\n",
+ " E_CROWD int32\n",
+ " M_CROWD int32\n",
+ " E_NOVEH int32\n",
+ " M_NOVEH int32\n",
+ " E_GROUPQ int32\n",
+ " M_GROUPQ int32\n",
+ " EP_POV150 float64\n",
+ " MP_POV150 float64\n",
+ " EP_UNEMP float64\n",
+ " MP_UNEMP float64\n",
+ " EP_HBURD float64\n",
+ " MP_HBURD float64\n",
+ " EP_NOHSDP float64\n",
+ " MP_NOHSDP float64\n",
+ " EP_UNINSUR float64\n",
+ " MP_UNINSUR float64\n",
+ " EP_AGE65 float64\n",
+ " MP_AGE65 float64\n",
+ " EP_AGE17 float64\n",
+ " MP_AGE17 float64\n",
+ " EP_DISABL float64\n",
+ " MP_DISABL float64\n",
+ " EP_SNGPNT float64\n",
+ " MP_SNGPNT float64\n",
+ " EP_LIMENG float64\n",
+ " MP_LIMENG float64\n",
+ " EP_MINRTY float64\n",
+ " MP_MINRTY float64\n",
+ " EP_MUNIT float64\n",
+ " MP_MUNIT float64\n",
+ " EP_MOBILE float64\n",
+ " MP_MOBILE float64\n",
+ " EP_CROWD float64\n",
+ " MP_CROWD float64\n",
+ " EP_NOVEH float64\n",
+ " MP_NOVEH float64\n",
+ " EP_GROUPQ float64\n",
+ " MP_GROUPQ float64\n",
+ " EPL_POV150 float64\n",
+ " EPL_UNEMP float64\n",
+ " EPL_HBURD float64\n",
+ " EPL_NOHSDP float64\n",
+ " EPL_UNINSUR float64\n",
+ " SPL_THEME1 float64\n",
+ " RPL_THEME1 float64\n",
+ " EPL_AGE65 float64\n",
+ " EPL_AGE17 float64\n",
+ " EPL_DISABL float64\n",
+ " EPL_SNGPNT float64\n",
+ " EPL_LIMENG float64\n",
+ " SPL_THEME2 float64\n",
+ " RPL_THEME2 float64\n",
+ " EPL_MINRTY float64\n",
+ " SPL_THEME3 float64\n",
+ " RPL_THEME3 float64\n",
+ " EPL_MUNIT float64\n",
+ " EPL_MOBILE float64\n",
+ " EPL_CROWD float64\n",
+ " EPL_NOVEH float64\n",
+ " EPL_GROUPQ float64\n",
+ " SPL_THEME4 float64\n",
+ " RPL_THEME4 float64\n",
+ " SPL_THEMES float64\n",
+ " RPL_THEMES float64\n",
+ " F_POV150 int16\n",
+ " F_UNEMP int16\n",
+ " F_HBURD int16\n",
+ " F_NOHSDP int16\n",
+ " F_UNINSUR int16\n",
+ " F_THEME1 int16\n",
+ " F_AGE65 int16\n",
+ " F_AGE17 int16\n",
+ " F_DISABL int16\n",
+ " F_SNGPNT int16\n",
+ " F_LIMENG int16\n",
+ " F_THEME2 int16\n",
+ " F_MINRTY int16\n",
+ " F_THEME3 int16\n",
+ " F_MUNIT int16\n",
+ " F_MOBILE int16\n",
+ " F_CROWD int16\n",
+ " F_NOVEH int16\n",
+ " F_GROUPQ int16\n",
+ " F_THEME4 int16\n",
+ " F_TOTAL int16\n",
+ " E_DAYPOP int32\n",
+ " E_NOINT int32\n",
+ " M_NOINT int32\n",
+ " E_AFAM int32\n",
+ " M_AFAM int32\n",
+ " E_HISP int32\n",
+ " M_HISP int32\n",
+ " E_ASIAN int32\n",
+ " M_ASIAN int32\n",
+ " E_AIAN int32\n",
+ " M_AIAN int32\n",
+ " E_NHPI int32\n",
+ " M_NHPI int32\n",
+ " E_TWOMORE int32\n",
+ " M_TWOMORE int32\n",
+ " E_OTHERRACE int32\n",
+ " M_OTHERRACE int32\n",
+ " EP_NOINT float64\n",
+ " MP_NOINT float64\n",
+ " EP_AFAM float64\n",
+ " MP_AFAM float64\n",
+ " EP_HISP float64\n",
+ " MP_HISP float64\n",
+ " EP_ASIAN float64\n",
+ " MP_ASIAN float64\n",
+ " EP_AIAN float64\n",
+ " MP_AIAN float64\n",
+ " EP_NHPI float64\n",
+ " MP_NHPI float64\n",
+ " EP_TWOMORE float64\n",
+ " MP_TWOMORE float64\n",
+ " EP_OTHERRACE float64\n",
+ " MP_OTHERRACE float64\n",
+ " Shape_Length float64\n",
+ " Shape_Area float64\n",
+ " geom geospatial:geometry\n",
+ " area float64\n",
+ " gbifid string\n",
+ " datasetkey string\n",
+ " occurrenceid string\n",
+ " kingdom string\n",
+ " phylum string\n",
+ " class string\n",
+ " order string\n",
+ " family string\n",
+ " genus string\n",
+ " species string\n",
+ " infraspecificepithet string\n",
+ " taxonrank string\n",
+ " scientificname string\n",
+ " verbatimscientificname string\n",
+ " verbatimscientificnameauthorship string\n",
+ " countrycode string\n",
+ " locality string\n",
+ " stateprovince string\n",
+ " occurrencestatus string\n",
+ " individualcount int32\n",
+ " publishingorgkey string\n",
+ " decimallatitude float64\n",
+ " decimallongitude float64\n",
+ " coordinateuncertaintyinmeters float64\n",
+ " coordinateprecision float64\n",
+ " elevation float64\n",
+ " elevationaccuracy float64\n",
+ " depth float64\n",
+ " depthaccuracy float64\n",
+ " eventdate timestamp(6)\n",
+ " day int32\n",
+ " month int32\n",
+ " year int32\n",
+ " taxonkey int32\n",
+ " specieskey int32\n",
+ " basisofrecord string\n",
+ " institutioncode string\n",
+ " collectioncode string\n",
+ " catalognumber string\n",
+ " recordnumber string\n",
+ " identifiedby array&lt;string&gt;\n",
+ " dateidentified timestamp(6)\n",
+ " license string\n",
+ " rightsholder string\n",
+ " recordedby array&lt;string&gt;\n",
+ " typestatus array&lt;string&gt;\n",
+ " establishmentmeans string\n",
+ " lastinterpreted timestamp(6)\n",
+ " mediatype array&lt;string&gt;\n",
+ " issue array&lt;string&gt;\n",
+ " geom_right geospatial:geometry\n",
+ " h0 string\n",
+ " h1 string\n",
+ " h2 string\n",
+ " h3 string\n",
+ " h4 string\n",
+ " h5 string\n",
+ " h6 string\n",
+ " h7 string\n",
+ " h8 string\n",
+ " h9 string\n",
+ " h10 string\n",
+ " h11 string\n",
+ "</pre>\n"
+ ],
+ "text/plain": [
+ "DatabaseTable: ibis_read_parquet_msislo4d7fcgdfh2pyoxvxjkdu\n",
+ " OBJECTID int64\n",
+ " ST string\n",
+ " STATE string\n",
+ " ST_ABBR string\n",
+ " STCNTY string\n",
+ " COUNTY string\n",
+ " FIPS string\n",
+ " LOCATION string\n",
+ " AREA_SQMI float64\n",
+ " E_TOTPOP int32\n",
+ " M_TOTPOP int32\n",
+ " E_HU int32\n",
+ " M_HU int32\n",
+ " E_HH int32\n",
+ " M_HH int32\n",
+ " E_POV150 int32\n",
+ " M_POV150 int32\n",
+ " E_UNEMP int32\n",
+ " M_UNEMP int32\n",
+ " E_HBURD int32\n",
+ " M_HBURD int32\n",
+ " E_NOHSDP int32\n",
+ " M_NOHSDP int32\n",
+ " E_UNINSUR int32\n",
+ " M_UNINSUR int32\n",
+ " E_AGE65 int32\n",
+ " M_AGE65 int32\n",
+ " E_AGE17 int32\n",
+ " M_AGE17 int32\n",
+ " E_DISABL int32\n",
+ " M_DISABL int32\n",
+ " E_SNGPNT int32\n",
+ " M_SNGPNT int32\n",
+ " E_LIMENG int32\n",
+ " M_LIMENG int32\n",
+ " E_MINRTY int32\n",
+ " M_MINRTY int32\n",
+ " E_MUNIT int32\n",
+ " M_MUNIT int32\n",
+ " E_MOBILE int32\n",
+ " M_MOBILE int32\n",
+ " E_CROWD int32\n",
+ " M_CROWD int32\n",
+ " E_NOVEH int32\n",
+ " M_NOVEH int32\n",
+ " E_GROUPQ int32\n",
+ " M_GROUPQ int32\n",
+ " EP_POV150 float64\n",
+ " MP_POV150 float64\n",
+ " EP_UNEMP float64\n",
+ " MP_UNEMP float64\n",
+ " EP_HBURD float64\n",
+ " MP_HBURD float64\n",
+ " EP_NOHSDP float64\n",
+ " MP_NOHSDP float64\n",
+ " EP_UNINSUR float64\n",
+ " MP_UNINSUR float64\n",
+ " EP_AGE65 float64\n",
+ " MP_AGE65 float64\n",
+ " EP_AGE17 float64\n",
+ " MP_AGE17 float64\n",
+ " EP_DISABL float64\n",
+ " MP_DISABL float64\n",
+ " EP_SNGPNT float64\n",
+ " MP_SNGPNT float64\n",
+ " EP_LIMENG float64\n",
+ " MP_LIMENG float64\n",
+ " EP_MINRTY float64\n",
+ " MP_MINRTY float64\n",
+ " EP_MUNIT float64\n",
+ " MP_MUNIT float64\n",
+ " EP_MOBILE float64\n",
+ " MP_MOBILE float64\n",
+ " EP_CROWD float64\n",
+ " MP_CROWD float64\n",
+ " EP_NOVEH float64\n",
+ " MP_NOVEH float64\n",
+ " EP_GROUPQ float64\n",
+ " MP_GROUPQ float64\n",
+ " EPL_POV150 float64\n",
+ " EPL_UNEMP float64\n",
+ " EPL_HBURD float64\n",
+ " EPL_NOHSDP float64\n",
+ " EPL_UNINSUR float64\n",
+ " SPL_THEME1 float64\n",
+ " RPL_THEME1 float64\n",
+ " EPL_AGE65 float64\n",
+ " EPL_AGE17 float64\n",
+ " EPL_DISABL float64\n",
+ " EPL_SNGPNT float64\n",
+ " EPL_LIMENG float64\n",
+ " SPL_THEME2 float64\n",
+ " RPL_THEME2 float64\n",
+ " EPL_MINRTY float64\n",
+ " SPL_THEME3 float64\n",
+ " RPL_THEME3 float64\n",
+ " EPL_MUNIT float64\n",
+ " EPL_MOBILE float64\n",
+ " EPL_CROWD float64\n",
+ " EPL_NOVEH float64\n",
+ " EPL_GROUPQ float64\n",
+ " SPL_THEME4 float64\n",
+ " RPL_THEME4 float64\n",
+ " SPL_THEMES float64\n",
+ " RPL_THEMES float64\n",
+ " F_POV150 int16\n",
+ " F_UNEMP int16\n",
+ " F_HBURD int16\n",
+ " F_NOHSDP int16\n",
+ " F_UNINSUR int16\n",
+ " F_THEME1 int16\n",
+ " F_AGE65 int16\n",
+ " F_AGE17 int16\n",
+ " F_DISABL int16\n",
+ " F_SNGPNT int16\n",
+ " F_LIMENG int16\n",
+ " F_THEME2 int16\n",
+ " F_MINRTY int16\n",
+ " F_THEME3 int16\n",
+ " F_MUNIT int16\n",
+ " F_MOBILE int16\n",
+ " F_CROWD int16\n",
+ " F_NOVEH int16\n",
+ " F_GROUPQ int16\n",
+ " F_THEME4 int16\n",
+ " F_TOTAL int16\n",
+ " E_DAYPOP int32\n",
+ " E_NOINT int32\n",
+ " M_NOINT int32\n",
+ " E_AFAM int32\n",
+ " M_AFAM int32\n",
+ " E_HISP int32\n",
+ " M_HISP int32\n",
+ " E_ASIAN int32\n",
+ " M_ASIAN int32\n",
+ " E_AIAN int32\n",
+ " M_AIAN int32\n",
+ " E_NHPI int32\n",
+ " M_NHPI int32\n",
+ " E_TWOMORE int32\n",
+ " M_TWOMORE int32\n",
+ " E_OTHERRACE int32\n",
+ " M_OTHERRACE int32\n",
+ " EP_NOINT float64\n",
+ " MP_NOINT float64\n",
+ " EP_AFAM float64\n",
+ " MP_AFAM float64\n",
+ " EP_HISP float64\n",
+ " MP_HISP float64\n",
+ " EP_ASIAN float64\n",
+ " MP_ASIAN float64\n",
+ " EP_AIAN float64\n",
+ " MP_AIAN float64\n",
+ " EP_NHPI float64\n",
+ " MP_NHPI float64\n",
+ " EP_TWOMORE float64\n",
+ " MP_TWOMORE float64\n",
+ " EP_OTHERRACE float64\n",
+ " MP_OTHERRACE float64\n",
+ " Shape_Length float64\n",
+ " Shape_Area float64\n",
+ " geom geospatial:geometry\n",
+ " area float64\n",
+ " gbifid string\n",
+ " datasetkey string\n",
+ " occurrenceid string\n",
+ " kingdom string\n",
+ " phylum string\n",
+ " class string\n",
+ " order string\n",
+ " family string\n",
+ " genus string\n",
+ " species string\n",
+ " infraspecificepithet string\n",
+ " taxonrank string\n",
+ " scientificname string\n",
+ " verbatimscientificname string\n",
+ " verbatimscientificnameauthorship string\n",
+ " countrycode string\n",
+ " locality string\n",
+ " stateprovince string\n",
+ " occurrencestatus string\n",
+ " individualcount int32\n",
+ " publishingorgkey string\n",
+ " decimallatitude float64\n",
+ " decimallongitude float64\n",
+ " coordinateuncertaintyinmeters float64\n",
+ " coordinateprecision float64\n",
+ " elevation float64\n",
+ " elevationaccuracy float64\n",
+ " depth float64\n",
+ " depthaccuracy float64\n",
+ " eventdate timestamp(6)\n",
+ " day int32\n",
+ " month int32\n",
+ " year int32\n",
+ " taxonkey int32\n",
+ " specieskey int32\n",
+ " basisofrecord string\n",
+ " institutioncode string\n",
+ " collectioncode string\n",
+ " catalognumber string\n",
+ " recordnumber string\n",
+ " identifiedby array<string>\n",
+ " dateidentified timestamp(6)\n",
+ " license string\n",
+ " rightsholder string\n",
+ " recordedby array<string>\n",
+ " typestatus array<string>\n",
+ " establishmentmeans string\n",
+ " lastinterpreted timestamp(6)\n",
+ " mediatype array<string>\n",
+ " issue array<string>\n",
+ " geom_right geospatial:geometry\n",
+ " h0 string\n",
+ " h1 string\n",
+ " h2 string\n",
+ " h3 string\n",
+ " h4 string\n",
+ " h5 string\n",
+ " h6 string\n",
+ " h7 string\n",
+ " h8 string\n",
+ " h9 string\n",
+ " h10 string\n",
+ " h11 string"
+ ]
+ },
+ "execution_count": 43,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "gbif_usa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a6ce4d65-6f93-4725-87fa-29bf413398ad",
+ "metadata": {},
+ "source": [
+ "The four summary theme ranking variables, detailed in the Data Dictionary of the CDC/ATSDR SVI documentation, are:\n",
+ "\n",
+ "- Socioeconomic Status: `RPL_THEME1`\n",
+ "- Household Characteristics: `RPL_THEME2`\n",
+ "- Racial & Ethnic Minority Status: `RPL_THEME3`\n",
+ "- Housing Type & Transportation: `RPL_THEME4`"
+ ]
+ },
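+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2e85529-348b-4f33-b09d-f8424299dc8d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import seaborn.objects as so\n",
+ "from ibis import _\n",
+ "\n",
+ "# County-level summary: occurrence count scaled by summed Shape_Area, plus mean SVI theme ranks.\n",
+ "# Caveat: if gbif_usa has one row per occurrence (as its schema suggests),\n",
+ "# Shape_Area.sum() counts a tract's area once per occurrence, so n is not a true density.\n",
+ "#df = gbif_usa.group_by(_.FIPS).agg(n = _.count().log(), svi = _.RPL_THEMES.mean()).execute()\n",
+ "df = gbif_usa.group_by(_.STATE, _.COUNTY).agg(\n",
+ "    n=_.count() / _.Shape_Area.sum(),\n",
+ "    svi1=_.RPL_THEME1.mean(),\n",
+ "    svi3=_.RPL_THEME3.mean(),\n",
+ ").execute()\n",
+ "\n",
+ "# Plot n on a log scale against the Theme 1 rank, colored by the Theme 3 rank.\n",
+ "so.Plot(df, x=\"svi1\", y=\"n\", color=\"svi3\").add(so.Dots()).scale(y=\"log\")"
+ ]
+ },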
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9030d3dc-e2fb-41b7-8fe9-80ee76739b78",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import altair as alt\n",
+ "\n",
+ "alt.Chart(df).mark_point().encode(\n",
+ "    x='svi1',\n",
+ "    y='n',\n",
+ "    color='svi3',\n",
+ "    tooltip=['STATE', 'COUNTY']\n",
+ ")\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "base",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
preprocess-redlining.ipynb ADDED
The diff for this file is too large to render. See raw diff