cjber committed on
Commit f704542 · 1 Parent(s): e41c091

fix: use pandoc for compiling not quarto

Former-commit-id: 06c88db08f808ed8d4405de562a4d453800562b1 [formerly 9ce1c873ed1d6771a47ebe3fe259a2d3889b7a35]
Former-commit-id: 6c0cd4332b81fa2a27068293a126a17251da154b

Files changed (2)
  1. packages.txt +1 -1
  2. planning_ai/document.py +20 -60
packages.txt CHANGED
@@ -2,4 +2,4 @@ texlive-latex-extra
  texlive-fonts-extra
  cm-super
  dvipng
- quarto
+ pandoc
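
With `quarto` replaced by `pandoc` in `packages.txt`, PDF output also depends on a LaTeX engine being on `PATH`, because pandoc produces PDF through LaTeX by default. A minimal pre-flight sketch, not part of this commit: `missing_tools` is a hypothetical helper, and checking `pdflatex` assumes pandoc's default PDF engine is the one in use.

```python
import shutil


def missing_tools(tools=("pandoc", "pdflatex")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper; "pdflatex" is assumed because it is pandoc's
    default engine when writing PDF.
    """
    # shutil.which returns None when an executable cannot be located.
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools()` once before the first render, and stopping on a non-empty result, gives a clearer message than a failed `subprocess.run` call later on.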
planning_ai/document.py CHANGED
@@ -331,44 +331,10 @@ def build_final_report(out, rep):
  introduction_paragraph = """
  This report was produced using a generative pre-trained transformer (GPT) large-language model (LLM) to produce an abstractive summary of all responses to the related planning application. This model automatically reviews every response in detail, and extracts key information to inform decision making. This document first consolidates this information into a single-page executive summary, highlighting areas of particular interest to consider, and the broad consensus of responses. Figures generated from responses then give both a geographic and statistical overview, highlighting any demographic imbalances in responses. The document then extracts detailed information from responses, grouped by theme and policy. In this section we incorporate citations which relate with the 'Summary Responses' document, to increase transparency.
  """
- figures_paragraph = """
- This section describes the characteristics of where submissions were received from. This can help to identify how representative submissions were and whether there were any communities whose views were not being considered. @fig-wards shows the number (frequency) of submitted representations by Ward based on the address attached to the submission. To interpret the figure, areas which are coloured white had no submissions from residents, and then areas are coloured in based on the total number of submissions with yellows and greens representing the largest numbers. This figure helps to identify which Wards are more active in terms of participation and representation in this report.
 
- @fig-oas displays the percentage of representations submitted by the Output Area Classification (2021). The Output Area Classification is the Office for National Statistics preferred classification of neighbourhoods. This measure groups neighbourhoods (here defined as Output Areas, typically containing 100 people) into categories that capture similar types of people based on population, demographic and socioeconomic characteristics. It therefore provides an insightful view of the types of communities who submitted representations. To interpret the figure, where bars extend higher/upwards, this represents a larger population share within a specific area type. The blue bars represent the characteristics of who submitted representations, and the orange bars represent the underlying population – allowing one to compare whether the profile of submissions matched the characteristics of the local population.
-
- This figure uses OAC 'Supergroups', which are the highest level of the hierarchy, and provide information relative to the average values for the UK population at large. The following gives a summary of each supergroup:
-
- 1. **Retired Professionals**
-
- Typically married but no longer with resident dependent children, these well-educated households either remain working in their managerial, professional, administrative or other skilled occupations, or are retired from them – the modal individual age is beyond normal retirement age. Underoccupied detached and semi-detached properties predominate, and unpaid care is more prevalent than reported disability. The prevalence of this Supergroup outside most urban conurbations indicates that rural lifestyles prevail, typically sustained by using two or more cars per household.
-
- 2. **Suburbanites and Peri-Urbanites**
-
- Pervasive throughout the UK, members of this Supergroup typically own (or are buying) their detached, semi-detached or terraced homes. They are also typically educated to A Level/Highers or degree level and work in skilled or professional occupations. Typically born in the UK, some families have children, although the median adult age is above 45 and some property has become under-occupied after children have left home. This Supergroup is pervasive not only in suburban locations, but also in neighbourhoods at or beyond the edge of cities that adjoin rural parts of the country.
-
- 3. **Multicultural and Educated Urbanites**
-
- Established populations comprising ethnic minorities together with persons born outside the UK predominate in this Supergroup. Residents present diverse personal characteristics and circumstances: while generally well-educated and practising skilled occupations, some residents live in overcrowded rental sector housing. English may not be the main language used by people in this Group. Although the typical adult resident is middle aged, single person households are common and marriage rates are low by national standards. This Supergroup predominates in Inner London, with smaller enclaves in many other densely populated metropolitan areas.
-
- 4. **Low-Skilled Migrant and Student Communities**
-
- Young adults, many of whom are students, predominate in these high-density and overcrowded neighbourhoods of rented terrace houses or flats. Most ethnic minorities are present in these communities, as are people born in European countries that are not part of the EU. Students aside, low skilled occupations predominate, and unemployment rates are above average. Overall, the mix of students and more sedentary households means that neighbourhood average numbers of children are not very high. The Mixed or Multiple ethnic group composition of neighbourhoods is often associated with low rates of affiliation to Christian religions. This Supergroup predominates in non-central urban locations across the UK, particularly within England in the Midlands and the outskirts of west, south and north-east London.
-
- 5. **Ethnically Diverse Suburban Professionals**
-
- Those working within the managerial, professional and administrative occupations typically reflect a wide range of ethnic groups, and reside in detached or semi-detached housing. Their residential locations at the edges of cities and conurbations and car-based lifestyles are more characteristic of Supergroup membership than birthplace or participation in child-rearing. Houses are typically owner-occupied and marriage rates are lower than the national average. This Supergroup is found throughout suburban UK.
-
- 6. **Baseline UK**
-
- This Supergroup exemplifies the broad base to the UK’s social structure, encompassing as it does the average or modal levels of many neighbourhood characteristics, including all housing tenures, a range of levels of educational attainment and religious affiliations, and a variety of pre-retirement age structures. Yet, in combination, these mixes are each distinctive of the parts of the UK. Overall, terraced houses and flats are the most prevalent, as is employment in intermediate or low-skilled occupations. However, this Supergroup is also characterised by above average levels of unemployment and lower levels of use of English as the main language. Many neighbourhoods occur in south London and the UK’s other major urban centres.
-
- 7. **Semi- and Un-Skilled Workforce**
-
- Living in terraced or semi-detached houses, residents of these neighbourhoods typically lack high levels of education and work in elementary or routine service occupations. Unemployment is above average. Residents are predominantly born in the UK, and residents are also predominantly from ethnic minorities. Social (but not private sector) rented sector housing is common. This Supergroup is found throughout the UK’s conurbations and industrial regions but is also an integral part of smaller towns.
-
- 8. **Legacy Communities**
-
- These neighbourhoods characteristically comprise pockets of flats that are scattered across the UK, particularly in towns that retain or have legacies of heavy industry or are in more remote seaside locations. Employed residents of these neighbourhoods work mainly in low-skilled occupations. Residents typically have limited educational qualifications. Unemployment is above average. Some residents live in overcrowded housing within the social rented sector and experience long-term disability. All adult age groups are represented, although there is an overall age bias towards elderly people in general and the very old in particular. Individuals identifying as belonging to ethnic minorities or Mixed or Multiple ethnic groups are uncommon.
 
  @fig-imd shows the percentage of responses by level of neighbourhood socioeconomic deprivation. The information is presented using the 2019 Index of Multiple Deprivation, divided into quintiles (i.e., dividing the English population into equal fifths). This measure is the UK Government’s preferred measure of socioeconomic deprivation and is based on information about income, employment, education, health, crime, housing and the local environment for small areas (Lower Super Output Areas, typically containing 1600 people). To interpret the graph, bars represent the share of population from each quintile. Quintile 1 represents the most deprived 20% of areas, and quintile 5 the least deprived 20% of areas. The orange bars represent the distribution of people who submitted representations (i.e., larger bars mean that more people from these areas submitted representations). The blue bars show the distribution of the local population, allowing one to evaluate whether the evidence submitted was from the same communities in the area.
  """
@@ -388,10 +354,8 @@ The following section provides a detailed breakdown of notable details from resp
  quarto_doc = (
  "---\n"
  f"title: 'Summary of Submitted Representations: {rep}'\n"
- "format: pdf\n"
- "execute:\n"
- " freeze: auto\n"
- " echo: false\n"
  "fontfamily: libertinus\n"
  "monofont: 'JetBrains Mono'\n"
  "monofontoptions:\n"
@@ -406,9 +370,9 @@ The following section provides a detailed breakdown of notable details from resp
  f"{introduction_paragraph}\n\n"
  "\n# Profile of Submissions\n\n"
  f"{figures_paragraph}\n\n"
- f"![Total number of representations submitted by Ward](./figs/wards.pdf){{#fig-wards}}\n\n"
- f"![Total number of representations submitted by Output Area (OA 2021)](./figs/oas.pdf){{#fig-oas}}\n\n"
- f"![Percentage of representations submitted by quintile of index of multiple deprivation (2019)](./figs/imd_decile.pdf){{#fig-imd}}\n\n"
  r"\newpage"
  "\n\n# Themes and Policies\n\n"
  f"{themes_paragraph}\n\n"
@@ -427,18 +391,15 @@ The following section provides a detailed breakdown of notable details from resp
  f"{other_policies or '_No other representations._'}\n\n"
  )
 
- out_path = Paths.SUMMARY / f"Summary_of_Submitted_Responses-{rep}.qmd"
  with open(out_path, "w") as f:
  f.write(quarto_doc)
- command = [
- "quarto",
- "render",
- f"{out_path}",
- ]
  try:
  subprocess.run(command, check=True, capture_output=True)
  except subprocess.CalledProcessError as e:
- logging.error(f"Error during Summary_of_Submitted_Responses.qmd render: {e}")
 
 
  def build_summaries_document(out, rep):
@@ -450,25 +411,24 @@ def build_summaries_document(out, rep):
  # f"**Identified Entities**\n\n{document['entities']}\n\n"
  for document in out["generate_final_report"]["documents"]
  )
- quarto_header = (
  "---\n"
  f"title: 'Summary Documents: {rep}'\n"
- "format: pdf\n"
- "execute:\n"
- " freeze: auto\n"
- " echo: false\n"
  "fontfamily: libertinus\n"
  "monofont: 'JetBrains Mono'\n"
  "monofontoptions:\n"
  " - Scale=0.55\n"
  "---\n\n"
  )
- out_path = Paths.SUMMARY / f"Summary_Documents-{rep}.qmd"
  with open(out_path, "w") as f:
- f.write(f"{quarto_header}{full_text}")
 
- command = ["quarto", "render", f"{out_path}"]
  try:
  subprocess.run(command, check=True, capture_output=True)
  except subprocess.CalledProcessError as e:
- logging.error(f"Error during Summary_Documents.qmd render: {e}")

  introduction_paragraph = """
  This report was produced using a generative pre-trained transformer (GPT) large-language model (LLM) to produce an abstractive summary of all responses to the related planning application. This model automatically reviews every response in detail, and extracts key information to inform decision making. This document first consolidates this information into a single-page executive summary, highlighting areas of particular interest to consider, and the broad consensus of responses. Figures generated from responses then give both a geographic and statistical overview, highlighting any demographic imbalances in responses. The document then extracts detailed information from responses, grouped by theme and policy. In this section we incorporate citations which relate with the 'Summary Responses' document, to increase transparency.
  """
+ figures_paragraph = r"""
+ This section describes the characteristics of where submissions were received from. This can help to identify how representative submissions were and whether there were any communities whose views were not being considered. \ref{fig-wards} shows the number (frequency) of submitted representations by Ward based on the address attached to the submission. To interpret the figure, areas which are coloured white had no submissions from residents, and then areas are coloured in based on the total number of submissions with yellows and greens representing the largest numbers. This figure helps to identify which Wards are more active in terms of participation and representation in this report.
 
+ @fig-oas displays the percentage of representations submitted by the Output Area Classification (2021). The Output Area Classification is the Office for National Statistics preferred classification of neighbourhoods. This measure groups neighbourhoods (here defined as Output Areas, typically containing 100 people) into categories that capture similar types of people based on population, demographic and socioeconomic characteristics. It therefore provides an insightful view of the types of communities who submitted representations. To interpret the figure, where bars extend higher/upwards, this represents a larger population share within a specific area type. The blue bars represent the characteristics of who submitted representations, and the orange bars represent the underlying population – allowing one to compare whether the profile of submissions matched the characteristics of the local population. This figure uses OAC 'Supergroups', which are the highest level of the hierarchy, and provide information relative to the average values for the UK population at large.
 
  @fig-imd shows the percentage of responses by level of neighbourhood socioeconomic deprivation. The information is presented using the 2019 Index of Multiple Deprivation, divided into quintiles (i.e., dividing the English population into equal fifths). This measure is the UK Government’s preferred measure of socioeconomic deprivation and is based on information about income, employment, education, health, crime, housing and the local environment for small areas (Lower Super Output Areas, typically containing 1600 people). To interpret the graph, bars represent the share of population from each quintile. Quintile 1 represents the most deprived 20% of areas, and quintile 5 the least deprived 20% of areas. The orange bars represent the distribution of people who submitted representations (i.e., larger bars mean that more people from these areas submitted representations). The blue bars show the distribution of the local population, allowing one to evaluate whether the evidence submitted was from the same communities in the area.
  """
 
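In the new `figures_paragraph`, only the first reference was rewritten as `\ref{fig-wards}`; `@fig-oas` and `@fig-imd` are still Quarto cross-reference syntax. Without a cross-reference filter (for example pandoc-crossref), pandoc parses `@key` as a citation, so these are unlikely to render as figure numbers. A minimal sketch, assuming labels contain only letters, digits, hyphens and underscores; `quarto_refs_to_latex` is a hypothetical name:

```python
import re

# Quarto-style figure cross-references such as "@fig-oas".
# Assumption: labels use only letters, digits, hyphens and underscores.
QUARTO_FIG_REF = re.compile(r"@(fig-[A-Za-z0-9_-]+)")


def quarto_refs_to_latex(text: str) -> str:
    """Replace Quarto figure references with raw LaTeX ref commands.

    Pandoc passes inline raw LaTeX through unchanged to the LaTeX writer,
    so LaTeX fills in the figure number when the PDF is compiled.
    """
    # In the replacement template a doubled backslash yields one literal
    # backslash, and \1 is the captured label.
    return QUARTO_FIG_REF.sub(r"Figure \\ref{\1}", text)


sample = "@fig-oas displays the percentage; see also @fig-imd."
print(quarto_refs_to_latex(sample))
```

The `Figure` prefix is added because `\ref` expands only to the number; the existing `\ref{fig-wards}` sentence would need the same prefix to read naturally.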

  quarto_doc = (
  "---\n"
  f"title: 'Summary of Submitted Representations: {rep}'\n"
+ "geometry: a4paper\n"
+ "margin: 2cm\n"
  "fontfamily: libertinus\n"
  "monofont: 'JetBrains Mono'\n"
  "monofontoptions:\n"
 
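The new header lines `geometry: a4paper` and `margin: 2cm` may not both take effect. Pandoc's LaTeX template passes each `geometry` value to the geometry package as an option, whereas a bare `margin` key does not appear among pandoc's documented LaTeX variables, so the 2 cm margin is likely ignored. A sketch of a header builder that supplies both settings as geometry options; `pandoc_header` is a hypothetical helper, and the remaining keys are copied from the diff:

```python
def pandoc_header(title: str) -> str:
    """Build a YAML metadata block for pandoc (hypothetical helper).

    Assumption: each `geometry` list entry is forwarded to the LaTeX
    geometry package, so the margin is given as a geometry option.
    """
    # YAML single-quoted scalars escape an apostrophe by doubling it.
    safe_title = title.replace("'", "''")
    lines = [
        "---",
        f"title: '{safe_title}'",
        "geometry:",
        "  - a4paper",
        "  - margin=2cm",
        "fontfamily: libertinus",
        "monofont: 'JetBrains Mono'",
        "monofontoptions:",
        "  - Scale=0.55",
        "---",
        "",
        "",
    ]
    # The two trailing empty entries leave a blank line after the closing "---".
    return "\n".join(lines)
```

Writing `geometry` as a YAML list keeps each option on its own line; pandoc's template joins the entries with commas when it loads the package.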

  f"{introduction_paragraph}\n\n"
  "\n# Profile of Submissions\n\n"
  f"{figures_paragraph}\n\n"
+ f"![Total number of representations submitted by Ward\\label{{fig-wards}}](./figs/wards.pdf)\n\n"
+ f"![Total number of representations submitted by Output Area (OA 2021)\\label{{fig-oas}}](./figs/oas.pdf)\n\n"
+ f"![Percentage of representations submitted by quintile of index of multiple deprivation (2019)\\label{{fig-imd}}](./figs/imd_decile.pdf)\n\n"
  r"\newpage"
  "\n\n# Themes and Policies\n\n"
  f"{themes_paragraph}\n\n"
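
The new image lines stack two escaping layers: the f-string turns `\\` into one backslash and `{{`/`}}` into literal braces, so pandoc receives `\label{fig-wards}` as inline raw LaTeX at the end of the caption, which should attach the label to the figure that `\ref{fig-wards}` points to. A hypothetical helper that builds the same line from its parts, making the escaping easier to check:

```python
def figure_markdown(caption: str, label: str, path: str) -> str:
    """Return a pandoc image line whose caption ends with a raw LaTeX label.

    Hypothetical helper mirroring the f-strings in the diff.
    """
    # "\\" is one literal backslash; "{{" and "}}" are literal braces,
    # and the innermost {label} is the substituted value.
    return f"![{caption}\\label{{{label}}}]({path})\n\n"


line = figure_markdown(
    "Total number of representations submitted by Ward",
    "fig-wards",
    "./figs/wards.pdf",
)
print(line, end="")
```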
 
  f"{other_policies or '_No other representations._'}\n\n"
  )
 
+ out_path = Paths.SUMMARY / f"Summary_of_Submitted_Responses-{rep}.md"
+ out_file = Paths.SUMMARY / f"Summary_of_Submitted_Responses-{rep}.pdf"
  with open(out_path, "w") as f:
  f.write(quarto_doc)
+ command = ["pandoc", f"{out_path}", "-o", f"{out_file}"]
  try:
  subprocess.run(command, check=True, capture_output=True)
  except subprocess.CalledProcessError as e:
+ logging.error(f"Error during Summary_of_Submitted_Responses.md render: {e}")
 
 
  def build_summaries_document(out, rep):

  # f"**Identified Entities**\n\n{document['entities']}\n\n"
  for document in out["generate_final_report"]["documents"]
  )
+ header = (
  "---\n"
  f"title: 'Summary Documents: {rep}'\n"
  "fontfamily: libertinus\n"
+ "geometry: a4paper\n"
+ "margin: 2cm\n"
  "monofont: 'JetBrains Mono'\n"
  "monofontoptions:\n"
  " - Scale=0.55\n"
  "---\n\n"
  )
+ out_path = Paths.SUMMARY / f"Summary_Documents-{rep}.md"
+ out_file = Paths.SUMMARY / f"Summary_Documents-{rep}.pdf"
  with open(out_path, "w") as f:
+ f.write(f"{header}{full_text}")
 
+ command = ["pandoc", f"{out_path}", "-o", f"{out_file}"]
  try:
  subprocess.run(command, check=True, capture_output=True)
  except subprocess.CalledProcessError as e:
+ logging.error(f"Error during render: {e}")
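
Two details of the new `command` handling may be worth a follow-up. With `capture_output=True`, pandoc's own error text is kept in `e.stderr`, but `logging.error(f"... {e}")` only logs the command and exit status; and a missing `pandoc` binary raises `FileNotFoundError`, which the `except subprocess.CalledProcessError` clause does not catch. Separately, pandoc uses pdflatex by default, while the `monofont`/`monofontoptions` variables in the headers only apply under xelatex or lualatex. A sketch addressing all three, assuming xelatex is installed; `render_pdf` is a hypothetical helper:

```python
import logging
import subprocess
from pathlib import Path


def render_pdf(md_path: Path, pdf_path: Path, engine: str = "xelatex") -> bool:
    """Render a markdown file to PDF with pandoc and report success.

    Hypothetical helper: assumes pandoc and the chosen LaTeX engine are
    installed. xelatex is the assumed default because pandoc applies the
    `monofont` variables only with xelatex or lualatex.
    """
    command = ["pandoc", str(md_path), "-o", str(pdf_path), f"--pdf-engine={engine}"]
    try:
        # text=True decodes stdout/stderr to str for readable log messages.
        subprocess.run(command, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the pandoc executable itself cannot be found.
        logging.error("pandoc not found; could not render %s", md_path)
        return False
    except subprocess.CalledProcessError as e:
        # str(e) holds only the command and exit status; pandoc's message is in e.stderr.
        logging.error("Error rendering %s: %s", md_path, e.stderr.strip())
        return False
    return True
```

Returning a boolean lets each caller decide whether a failed render should stop the pipeline or, as in the current code, only be logged.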