Spaces: cjber/planning-ai

cjber committed on Feb 17

Commit d7566e3 (1 parent: a7f708f)

fix: add info on unused documents

Former-commit-id: 2723beb529ea4a60f16344e3cf4b036ee1e60970 [formerly f160c54bcb76a16803b59a641252bd44f3dedcb1]
Former-commit-id: 8ca6a279a0c088afc3e96a0141cd5682acb9d84f

Files changed (4)

app.py +2 -2
planning_ai/document.py +10 -6
planning_ai/nodes/reduce_node.py +11 -1
planning_ai/states.py +2 -0

app.py CHANGED

@@ -136,7 +136,7 @@ if st.session_state["completed"]:
         col1, col2 = st.columns(2, border=True)
         with col1:
             with open(summaries_path, "rb") as pdf_file:
-                st.markdown("**Representations Summary Download**")
                 st.download_button(
                     label=f"{rep}",
                     data=pdf_file,
@@ -146,7 +146,7 @@ if st.session_state["completed"]:
                 )
         with col2:
             with open(report_path, "rb") as pdf_file:
-                st.markdown("**Executive Report Download**")
                 st.download_button(
                     label=f"{rep}",
                     data=pdf_file,

         col1, col2 = st.columns(2, border=True)
         with col1:
             with open(summaries_path, "rb") as pdf_file:
+                st.markdown("**Executive Report Download**")
                 st.download_button(
                     label=f"{rep}",
                     data=pdf_file,
                 )
         with col2:
             with open(report_path, "rb") as pdf_file:
+                st.markdown("**Representations Summary Download**")
                 st.download_button(
                     label=f"{rep}",
                     data=pdf_file,

planning_ai/document.py CHANGED

@@ -332,16 +332,17 @@ def build_final_report(out, rep):
 This report was produced using a generative pre-trained transformer (GPT) large-language model (LLM) to produce an abstractive summary of all responses to the related planning application. This model automatically reviews every response in detail, and extracts key information to inform decision making. This document first consolidates this information into a single-page executive summary, highlighting areas of particular interest to consider, and the broad consensus of responses. Figures generated from responses then give both a geographic and statistical overview, highlighting any demographic imbalances in responses. The document then extracts detailed information from responses, grouped by theme and policy. In this section we incorporate citations which relate with the 'Summary Responses' document, to increase transparency.
 """
     figures_paragraph = r"""
-This section describes the characteristics of where submissions were received from. This can help to identify how representative submissions were and whether there were any communities whose views were not being considered. \ref{fig-wards} shows the number (frequency) of submitted representations by Ward based on the address attached to the submission. To interpret the figure, areas which are coloured white had no submissions from residents, and then areas are coloured in based on the total number of submissions with yellows and greens representing the largest numbers. This figure helps to identify which Wards are more active in terms of participation and representation in this report.
-@fig-oas displays the percentage of representations submitted by the Output Area Classification (2021). The Output Area Classification is the Office for National Statistics preferred classification of neighbourhoods.  This measure groups neighbourhoods (here defined as Output Areas, typically containing 100 people) into categories that capture similar types of people based on population, demographic and socioeconomic characteristics. It therefore provides an insightful view of the types of communities who submitted representations. To interpret the figure, where bars extend higher/upwards, this represents a larger population share within a specific area type. The blue bars represent the characteristics of who submitted representations, and the orange bars represent the underlying population – allowing one to compare whether the profile of submissions matched the characteristics of the local population. This figure uses OAC 'Supergroups', which are the highest level of the hierarchy, and provide information relative to the average values for the UK population at large.
-@fig-imd shows the percentage of responses by level of neighbourhood socioeconomic deprivation. The information is presented using the 2019 Index of Multiple Deprivation, divided into quintiles (i.e., dividing the English population into equal fifths). This measure is the UK Government’s preferred measure of socioeconomic deprivation and is based on information about income, employment, education, health, crime, housing and the local environment for small areas (Lower Super Output Areas, typically containing 1600 people). To interpret the graph, bars represent the share of population from each quintile. Quintile 1 represents the most deprived 20% of areas, and quintile 5 the least deprived 20% of areas. The orange bars represent the distribution of people who submitted representations (i.e., larger bars mean that more people from these areas submitted representations). The blue bars show the distribution of the local population, allowing one to evaluate whether the evidence submitted was from the same communities in the area.
 """
     themes_paragraph = """
 The following section provides a detailed breakdown of notable details from responses, grouped by themes and policies. Both themes and associated policies are automatically determined through an analysis of the summary content by an LLM agent. Each theme is grouped by whether a response is supporting, opposed, or a general comment. This section aims to give a comprehensive view of the key issues raised by the respondents with respect to the themes and policies outlined. We have incorporated citations into each point (see numbers in square brackets) which relate to the specific document they were made in, to promote the transparency of where information was sourced from. @tbl-themes gives a breakdown of the number of submissions that relate to each theme; submissions may relate to more than one theme.
     """
     final = out["generate_final_report"]
     support_policies, object_policies, other_policies = _process_policies(final)
     postcodes = _process_postcodes(final)
     stances = _process_stances(final)
@@ -370,9 +371,9 @@ The following section provides a detailed breakdown of notable details from resp
         f"{introduction_paragraph}\n\n"
         "\n# Profile of Submissions\n\n"
         f"{figures_paragraph}\n\n"
-        f"![Total number of representations submitted by Ward\\label{{fig-wards}}](./figs/wards.pdf)\n\n"
-        f"![Total number of representations submitted by Output Area (OA 2021)\\label{{fig-oas}}](./figs/oas.pdf)\n\n"
-        f"![Percentage of representations submitted by quintile of index of multiple deprivation (2019)\\label{{fig-imd}}](./figs/imd_decile.pdf)\n\n"
         r"\newpage"
         "\n\n# Themes and Policies\n\n"
         f"{themes_paragraph}\n\n"
@@ -389,6 +390,9 @@ The following section provides a detailed breakdown of notable details from resp
         "The following section presents a list of all points raised in representations that do not support "
         "or object to the plan, grouped by theme and policy.\n\n"
         f"{other_policies or '_No other representations._'}\n\n"
     )
     out_path = Paths.SUMMARY / f"Summary_of_Submitted_Responses-{rep}.md"

 This report was produced using a generative pre-trained transformer (GPT) large-language model (LLM) to produce an abstractive summary of all responses to the related planning application. This model automatically reviews every response in detail, and extracts key information to inform decision making. This document first consolidates this information into a single-page executive summary, highlighting areas of particular interest to consider, and the broad consensus of responses. Figures generated from responses then give both a geographic and statistical overview, highlighting any demographic imbalances in responses. The document then extracts detailed information from responses, grouped by theme and policy. In this section we incorporate citations which relate with the 'Summary Responses' document, to increase transparency.
 """
     figures_paragraph = r"""
+This section describes the characteristics of where submissions were received from. This can help to identify how representative submissions were and whether there were any communities whose views were not being considered. Figure \ref{fig-wards} shows the number (frequency) of submitted representations by Ward based on the address attached to the submission. To interpret the figure, areas which are coloured white had no submissions from residents, and then areas are coloured in based on the total number of submissions with yellows and greens representing the largest numbers. This figure helps to identify which Wards are more active in terms of participation and representation in this report.
+Figure \ref{fig-oas} displays the percentage of representations submitted by the Output Area Classification (2021). The Output Area Classification is the Office for National Statistics preferred classification of neighbourhoods.  This measure groups neighbourhoods (here defined as Output Areas, typically containing 100 people) into categories that capture similar types of people based on population, demographic and socioeconomic characteristics. It therefore provides an insightful view of the types of communities who submitted representations. To interpret the figure, where bars extend higher/upwards, this represents a larger population share within a specific area type. The blue bars represent the characteristics of who submitted representations, and the orange bars represent the underlying population – allowing one to compare whether the profile of submissions matched the characteristics of the local population. This figure uses OAC 'Supergroups', which are the highest level of the hierarchy, and provide information relative to the average values for the UK population at large.
+Figure \ref{fig-imd} shows the percentage of responses by level of neighbourhood socioeconomic deprivation. The information is presented using the 2019 Index of Multiple Deprivation, divided into quintiles (i.e., dividing the English population into equal fifths). This measure is the UK Government’s preferred measure of socioeconomic deprivation and is based on information about income, employment, education, health, crime, housing and the local environment for small areas (Lower Super Output Areas, typically containing 1600 people). To interpret the graph, bars represent the share of population from each quintile. Quintile 1 represents the most deprived 20% of areas, and quintile 5 the least deprived 20% of areas. The orange bars represent the distribution of people who submitted representations (i.e., larger bars mean that more people from these areas submitted representations). The blue bars show the distribution of the local population, allowing one to evaluate whether the evidence submitted was from the same communities in the area.
 """
     themes_paragraph = """
 The following section provides a detailed breakdown of notable details from responses, grouped by themes and policies. Both themes and associated policies are automatically determined through an analysis of the summary content by an LLM agent. Each theme is grouped by whether a response is supporting, opposed, or a general comment. This section aims to give a comprehensive view of the key issues raised by the respondents with respect to the themes and policies outlined. We have incorporated citations into each point (see numbers in square brackets) which relate to the specific document they were made in, to promote the transparency of where information was sourced from. @tbl-themes gives a breakdown of the number of submissions that relate to each theme; submissions may relate to more than one theme.
     """
     final = out["generate_final_report"]
+    unused_documents = out["generate_final_report"]["unused_documents"]
     support_policies, object_policies, other_policies = _process_policies(final)
     postcodes = _process_postcodes(final)
     stances = _process_stances(final)
         f"{introduction_paragraph}\n\n"
         "\n# Profile of Submissions\n\n"
         f"{figures_paragraph}\n\n"
+        f"![Total number of representations submitted by Ward\\label{{fig-wards}}](./data/out/summary/figs/wards.pdf)\n\n"
+        f"![Total number of representations submitted by Output Area (OA 2021)\\label{{fig-oas}}](./data/out/summary/figs/oas.pdf)\n\n"
+        f"![Percentage of representations submitted by quintile of index of multiple deprivation (2019)\\label{{fig-imd}}](./data/out/summary/figs/imd_decile.pdf)\n\n"
         r"\newpage"
         "\n\n# Themes and Policies\n\n"
         f"{themes_paragraph}\n\n"
         "The following section presents a list of all points raised in representations that do not support "
         "or object to the plan, grouped by theme and policy.\n\n"
         f"{other_policies or '_No other representations._'}\n\n"
+        "## Unused Documents\n\n"
+        "Please note that the following documents were not used to produce this report:\n\n"
+        f"{str(unused_documents)}"
     )
     out_path = Paths.SUMMARY / f"Summary_of_Submitted_Responses-{rep}.md"
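
The new figure lines depend on f-string escaping: `\\` emits one backslash and `{{`/`}}` emit literal braces, so each caption ends in a LaTeX `\label{...}` that the rewritten paragraphs cite with `Figure \ref{...}`. Those references only resolve if the Markdown is converted to PDF through LaTeX, which is assumed here rather than shown in the diff. A minimal sketch of the same pattern as a table-driven helper; `FIGURES`, `FIGURES_DIR` and `figure_markdown` are hypothetical names, with captions, labels and paths copied from the diff:

```python
# (caption, LaTeX label, file name) for each figure, as in the diff above.
FIGURES = [
    ("Total number of representations submitted by Ward", "fig-wards", "wards.pdf"),
    ("Total number of representations submitted by Output Area (OA 2021)", "fig-oas", "oas.pdf"),
    (
        "Percentage of representations submitted by quintile of index of multiple deprivation (2019)",
        "fig-imd",
        "imd_decile.pdf",
    ),
]
FIGURES_DIR = "./data/out/summary/figs"  # directory used by the new code


def figure_markdown(caption: str, label: str, filename: str, figures_dir: str = FIGURES_DIR) -> str:
    """Return one Markdown image line whose caption ends in a LaTeX label.

    Hypothetical helper. In the f-string, a doubled backslash yields one
    backslash and doubled braces yield literal braces, so `{label}` in the
    middle is the only substituted field inside the label.
    """
    return f"![{caption}\\label{{{label}}}]({figures_dir}/{filename})\n\n"


def figures_block() -> str:
    """Concatenate the image lines for every figure, in report order."""
    return "".join(figure_markdown(c, lbl, f) for c, lbl, f in FIGURES)


if __name__ == "__main__":
    # Prints: ![Total number of representations submitted by Ward\label{fig-wards}](./data/out/summary/figs/wards.pdf)
    print(figure_markdown(*FIGURES[0]), end="")
```

Keeping the captions, labels and file names in one list means the `\ref{...}` targets in the prose and the `\label{...}` anchors in the captions come from a single place.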

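The new "Unused Documents" section interpolates `str(unused_documents)`, which writes the Python list literal (for example `['a.pdf', 'b.pdf']`) straight into the Markdown. A minimal sketch of rendering the same list as bullets instead; `format_unused_documents` and `unused_documents_section` are hypothetical names, not functions in the repository, the file names in the usage line are made up, and the `_None._` placeholder is an assumption modelled on the report's existing `_No other representations._` fallback:

```python
from collections.abc import Sequence


def format_unused_documents(filenames: Sequence[object]) -> str:
    """Render dropped document names as a Markdown bullet list.

    Hypothetical helper. Items are formatted with str(), so this works whether
    the pipeline stores names as strings or as integer IDs.
    """
    if not filenames:
        # Same italic-placeholder style as '_No other representations._'
        return "_None._\n"
    return "".join(f"- {name}\n" for name in filenames)


def unused_documents_section(filenames: Sequence[object]) -> str:
    """Build the 'Unused Documents' section text appended in build_final_report."""
    return (
        "## Unused Documents\n\n"
        "Please note that the following documents were not used to produce this report:\n\n"
        f"{format_unused_documents(filenames)}"
    )


# Hypothetical file names, for illustration only.
print(unused_documents_section(["rep_017.pdf", "rep_042.pdf"]))
```

A bullet list renders one document per line in the PDF, and the empty-list branch avoids printing a bare `[]` when every document was used.
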
planning_ai/nodes/reduce_node.py CHANGED

@@ -130,6 +130,11 @@ def generate_final_report(state: OverallState):
 def final_output(final_docs):
     docs = [doc for doc in final_docs if not doc["failed"]]
     docs = add_doc_id(docs)
     policy_groups = extract_policies_from_docs(docs)
@@ -139,4 +144,9 @@ def final_output(final_docs):
     executive = reduce_chain_final.invoke(
         {"context": "Executive Report:\n\n".join(batch_executive)}
     )
-    return {"executive": executive, "documents": docs, "policies": policies}

 def final_output(final_docs):
     docs = [doc for doc in final_docs if not doc["failed"]]
+    # TODO: say which docs are not considered in the final report
+    failed_docs = [
+        doc["document"].metadata["filename"] for doc in final_docs if doc["failed"]
+    ]
     docs = add_doc_id(docs)
     policy_groups = extract_policies_from_docs(docs)
     executive = reduce_chain_final.invoke(
         {"context": "Executive Report:\n\n".join(batch_executive)}
     )
+    return {
+        "executive": executive,
+        "documents": docs,
+        "policies": policies,
+        "unused_documents": failed_docs,
+    }

planning_ai/states.py CHANGED

@@ -31,4 +31,6 @@ class OverallState(TypedDict):
     executive: str
     policies: pl.DataFrame
     n_docs: int

     executive: str
     policies: pl.DataFrame
+    unused_documents: list[int]
     n_docs: int