cjber committed on
Commit
466cbea
·
1 Parent(s): a10efb2

docs: add tree

Former-commit-id: 4bd5556ea204ff3f298386acbab2d43e5538a92e [formerly 1bd861ab14f0b19796ec55b9192678937c0f07f1]
Former-commit-id: a7e1bac1444032132725db899766fa54d2737ca5

Files changed (1)
  1. README.md +28 -5
README.md CHANGED
@@ -34,10 +34,29 @@ graph TD;
34
 
35
  ## Features
36
 
37
- - **Document Processing**: Extracts and processes text from various document formats including PDFs and Excel files.
38
- - **Summarisation**: Generates concise summaries of each response, highlighting key points and overall sentiment.
39
- - **Thematic Analysis**: Breaks down responses into thematic categories, providing a percentage breakdown of themes.
40
- - **Reporting**: Aggregates response summaries to produce an extensive final overview.
41
 
42
  ## Installation
43
 
@@ -72,7 +91,9 @@ Alternatively run everything manually:
72
 
73
  - **Environment Variables**: Use a `.env` file to store sensitive information like API keys.
74
  - `OPENAI_API_KEY` required for summarisation.
 
75
  - **Constants**: Adjust `Consts` in `planning_ai/common/utils.py` to modify token limits and other settings.
 
76
 
77
  ## Workflow
78
 
@@ -80,4 +101,6 @@ Alternatively run everything manually:
80
  2. **Text Splitting**: Documents are split into manageable chunks using `CharacterTextSplitter`.
81
  3. **Graph Processing**: The `StateGraph` orchestrates the flow of data through various nodes, including mapping and reducing summaries.
82
  4. **Summarisation**: The `map_chain` and `reduce_chain` are used to generate and refine summaries using LLMs.
83
- 5. **Output**: Final summaries and thematic breakdowns are used to produce a final Quarto report.
 
 
 
34
 
35
  ## Features
36
 
37
+ - **Document Processing**: Extracts and processes text from `.json` and `.pdf` files.
38
+ - **Summarisation**: Generates concise summaries of each response, highlighting key points and how they relate to policies.
39
+ - **Thematic Analysis**: Breaks down responses into themes.
40
+ - **Reporting**: Aggregates response summaries to produce an extensive final overview and a summary document.
41
+
42
+ ## Project Tree
43
+
44
+
45
+ ```bash
46
+ planning_ai/
47
+ ├── chains # llm calls with prompts using langchain
48
+ ├── common # shared utility functions
49
+ ├── documents # processing for final documents
50
+ ├── eval # evaluation functions to compare summaries to manual summaries
51
+ ├── graph.py # main langgraph functions
52
+ ├── llms # openai llm definitions
53
+ ├── logging.py # shared logging functions
54
+ ├── main.py # calls langgraph functions and document processing
55
+ ├── nodes # langgraph nodes that use chains to modify graph state
56
+ ├── preprocessing # functions for processing .json and .pdf files
57
+ ├── states.py # define the parameters used by graph states
58
+ └── themes.py # defines main themes and policies
59
+ ```
60
 
61
  ## Installation
62
 
 
91
 
92
  - **Environment Variables**: Use a `.env` file to store sensitive information like API keys.
93
  - `OPENAI_API_KEY` required for summarisation.
94
+ - `AZURE_API_KEY` and `AZURE_API_ENDPOINT` needed to process `.pdf` files.
95
  - **Constants**: Adjust `Consts` in `planning_ai/common/utils.py` to modify token limits and other settings.
96
+ - The document output format may be altered using files in `planning_ai/documents`.
97
 
98
  ## Workflow
99
 
 
101
  2. **Text Splitting**: Documents are split into manageable chunks using `CharacterTextSplitter`.
102
  3. **Graph Processing**: The `StateGraph` orchestrates the flow of data through various nodes, including mapping and reducing summaries.
103
  4. **Summarisation**: The `map_chain` and `reduce_chain` are used to generate and refine summaries using LLMs.
104
+ 5. **Output**: Final summaries and thematic breakdowns are used to produce a final report.
105
+
106
+ Citations within the final report correspond with the document IDs attributed to responses in the summaries document.
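
The workflow steps above (split, map, reduce, then cite by document ID) can be illustrated with a minimal, dependency-free sketch. This is not the project's implementation: it does not call LangChain, LangGraph, or an LLM. `split_text` is a simplified stand-in for `CharacterTextSplitter`, `summarise_stub` is a placeholder for the LLM call made by `map_chain`, and the `Summary` class, function names, and `[id]` citation format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    doc_id: int  # hypothetical document ID, reused as the citation marker
    text: str


def split_text(text: str, chunk_size: int = 40, separator: str = " ") -> list[str]:
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters (a single oversized piece is kept whole)."""
    chunks: list[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def summarise_stub(text: str, max_words: int = 6) -> str:
    """Placeholder for an LLM call: keeps only the first few words."""
    return " ".join(text.split()[:max_words])


def map_step(responses: dict[int, str]) -> list[Summary]:
    """Map: summarise every chunk of every response, keeping its document ID."""
    return [
        Summary(doc_id, summarise_stub(chunk))
        for doc_id, text in responses.items()
        for chunk in split_text(text)
    ]


def reduce_step(summaries: list[Summary]) -> str:
    """Reduce: combine the summaries into one report, citing each document ID."""
    return "\n".join(f"{s.text} [{s.doc_id}]" for s in summaries)


if __name__ == "__main__":
    responses = {1: "More housing is needed.", 2: "Protect green spaces."}
    print(reduce_step(map_step(responses)))
```

Carrying the document ID through the map step is what lets a citation in the combined report be traced back to a single response.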