Commit e481f98 by cjber (parent: d6d2f21)

add reports dir
reports/DOCS/DOCS.pdf ADDED
Binary file (141 kB)
 
reports/DOCS/DOCS.qmd ADDED
@@ -0,0 +1,98 @@
+ ---
+ title: 'Summarisation of Planning Responses with LLMs'
+ format:
+   PrettyPDF-pdf:
+     papersize: A4
+ execute:
+   freeze: auto
+   echo: false
+ monofont: 'JetBrains Mono'
+ monofontoptions:
+   - Scale=0.55
+ ---
+
+ ## Introduction
+
+ * Saves time: summaries take minutes to produce rather than hours (or days).
+ * Reduces bias?
+     * All information can be considered equally.
+ * Handles diverse forms of input, from letters to brief comments, and even handwritten text (gpt-4o).
+ * Generates easy-to-understand summaries that remove domain-specific terminology, increasing transparency.
+
+ * Summaries must be accurate, which may require human oversight. To support this we use hallucination detection, which works well. Should we also evaluate our output against summaries that have already been generated?
+
+ ## Methodology
+
+ This project primarily considers the use of generative pre-trained transformer (GPT) large language models (LLMs) for _abstractive_ summarisation of planning responses. _Extractive_ summarisation has been an established task for encoder-only transformer models (e.g. Google's BERT) for a number of years; abstractive summarisation, by contrast, has been advanced considerably by larger-scale GPT models (e.g. OpenAI's GPT-3/GPT-4 series). One benefit of these newer models is their size: they are both _trained_ on more human-written text and have a larger number of _model parameters_. Together, these factors mean that such models capture human text and its semantic nuances to a greater degree. Their architectures also differ: while BERT-like models excel at _extractive_ summarisation, GPT models are able to _generate_ large amounts of human-like text.
+
+ Given these advances, a number of document summarisation methods have been established in recent years (and months). In this project, we focus on _map-reduce_ summarisation: given a large set of documents, summarise each one, then summarise those summaries to produce a final report.
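The map-reduce pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: `map_reduce_summarise` is a name chosen for this example, and `summarise` and `combine` are hypothetical stand-ins for the per-response and final-report LLM calls. A thread pool is used for the map step on the assumption that each call is I/O-bound (waiting on a remote model).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def map_reduce_summarise(
    documents: Sequence[str],
    summarise: Callable[[str], str],
    combine: Callable[[List[str]], str],
    max_workers: int = 8,
) -> str:
    """Summarise each document (map), then summarise the summaries (reduce).

    `summarise` and `combine` are hypothetical placeholders for LLM calls.
    """
    # Map: summarise every document independently. The calls are assumed to be
    # I/O-bound (remote model requests), so threads let them run concurrently.
    # executor.map returns results in input order, so summaries[i] matches documents[i].
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        summaries = list(executor.map(summarise, documents))
    # Reduce: produce a single report from the per-document summaries.
    return combine(summaries)


if __name__ == "__main__":
    responses = [
        "The treatment plant was recently upgraded; relocating it is too costly.",
        "More cycle-only routes should be set up.",
    ]
    # Toy stand-ins: "summarise" keeps the first clause, "combine" lists the results.
    report = map_reduce_summarise(
        responses,
        summarise=lambda text: text.split(";")[0],
        combine=lambda parts: "\n".join(f"- {p}" for p in parts),
    )
    print(report)
```

If the number of responses is large, the concatenated summaries may not fit in a single prompt; the reduce step can then be applied in batches (summarising groups of summaries) before a final pass.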
+
+ For our use case we established the following data flow:
+
+ NOTE: I want some way to integrate 'citations'. One approach is to use extractive summarisation to show related passages. Inline citations are another option, but there could be quite a few of them (700?). Perhaps these could be reduced by splitting the summary into sentences and grouping them?
+
+ ```{mermaid}
+ %%{init: {'flowchart': {'curve': 'linear'}}}%%
+ graph TD;
+ __start__([__start__]):::first
+ generate_summary(generate_summary)
+ check_hallucination(check_hallucination)
+ fix_hallucination(fix_hallucination)
+ generate_final_summary(generate_final_summary)
+ __end__([__end__]):::last
+ check_hallucination --> generate_final_summary;
+ generate_final_summary --> __end__;
+ __start__ -.-> generate_summary;
+ generate_summary -.-> check_hallucination;
+ check_hallucination -.-> fix_hallucination;
+ fix_hallucination -.-> check_hallucination;
+ classDef default fill:#f2f0ff,line-height:1.2
+ classDef first fill-opacity:0
+ classDef last fill:#bfb6fc
+ ```
+
+ 1. Summaries for each response are generated in parallel.
+ 2. Each summary is checked for _hallucinations_; if any are found, the summary is revised and checked again, repeating until it passes.
+ 3. The verified summaries are combined into a final report.
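The check and fix cycle in the diagram can be expressed as a loop around a single response. The sketch below assumes three hypothetical callables, `generate`, `check` and `fix`, mirroring the graph's `generate_summary`, `check_hallucination` and `fix_hallucination` nodes. The `Check` type and the `max_fixes` cap are also assumptions added for this illustration, since the diagram shows no exit from the loop other than a passing check.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Check:
    """Hypothetical result of a hallucination check."""
    passed: bool
    feedback: str  # the checker's report, handed to the fix step


def verified_summary(
    document: str,
    generate: Callable[[str], str],
    check: Callable[[str, str], Check],
    fix: Callable[[str, str, str], str],
    max_fixes: int = 3,
) -> Tuple[str, bool]:
    """Return (summary, passed) after the generate -> check -> fix cycle."""
    summary = generate(document)
    result = check(document, summary)
    fixes = 0
    # Every revision is re-checked, as in fix_hallucination -> check_hallucination.
    while not result.passed and fixes < max_fixes:
        summary = fix(document, summary, result.feedback)
        result = check(document, summary)
        fixes += 1
    # If the cap is reached without a pass, `passed` is False so the caller can
    # decide what to do (e.g. flag the summary for human review).
    return summary, result.passed
```

Because each response is handled independently, a function like this is the natural unit to run in parallel in the map step; the verified summaries then feed the final report.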
+
+ ### Hallucination removal
+
+ Hallucination detection and removal is a key part of this process. While summaries typically reflect the content of the documents they summarise accurately, the model may sometimes inject information that is not explicitly stated. To catch these cases we use a second LLM _agent_ that reads both the original document and the summary, produces a report highlighting any issues, and gives a score that tells the downstream stage whether the summary is accurate.
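A checking agent of this kind returns a free-text report plus a score. A minimal sketch of how such an agent might be prompted and its reply parsed is given here; the prompt wording, the `call_llm` callable, `grade_summary` and `HallucinationReport` are all hypothetical, and the parser assumes the model has been instructed to finish with a `Score: PASS` or `Score: FAIL` line. Treating a missing score as a failure keeps the loop conservative.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt: the wording used in the project is not given in this report.
CHECK_PROMPT = """Compare the summary with the source document.

Source document:
{document}

Summary:
{summary}

Report any statements in the summary that are not supported by the source document.
Finish with a single line: "Score: PASS" if the summary is accurate, otherwise "Score: FAIL"."""

# Matches a line such as "Score: FAIL" (case-insensitive).
SCORE_LINE = re.compile(r"^[ \t]*Score:[ \t]*(PASS|FAIL)\b", re.IGNORECASE | re.MULTILINE)


@dataclass
class HallucinationReport:
    report: str   # free-text explanation, reused when fixing the summary
    passed: bool  # True only if the last score line says PASS


def grade_summary(document: str, summary: str, call_llm: Callable[[str], str]) -> HallucinationReport:
    """Ask a second model (call_llm, a hypothetical stand-in) to check a summary."""
    reply = call_llm(CHECK_PROMPT.format(document=document, summary=summary))
    scores = SCORE_LINE.findall(reply)
    # Use the last score line; a missing score is treated as a failure.
    passed = bool(scores) and scores[-1].upper() == "PASS"
    # Strip the score line so only the explanatory report remains.
    report = SCORE_LINE.sub("", reply).strip()
    return HallucinationReport(report=report, passed=passed)
```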
+
+ In the following example we highlight a case where the summarisation LLM makes a clear mistake in its first attempt, which is then rectified in a further iteration using the report generated by the hallucination agent. For brevity, the questions have been removed from the response.
+
+ **Document**:
+
+ _None. The wastewater treatment plant has recently been upgraded and deemed fit for purpose for a significant
+ number of years going forward. The carbon and financial cost of relocating this site is huge - if housing is needed then the housing should be situated in the proposed site for the relocated treatment plant, the new
+ homeowners would certainly enjoy their life in the greenbelt, those living nearby would be grateful and £227 million pounds and many tonnes of carbon saved._
+
+ _Vast open spaces should be employed. Community centers are important, including support for children and mental health. National chains should be banned from owning shops or property in the area._
+
+ _Cambourne should remain isolated and become self sufficient. If anything, more cycle only routes should be set up._
+
+ _None. These villages should remain as they are._
+
+ _Grantchester._
+
+ _Road bypasses._
+
+ _Barton and Newnham_
+
+ _Underdeveloped and preserved_
+
+ **Summarisation Attempt**:
+
+ The response indicates that the area east of Milton Road can be developed into a lively city district after the wastewater treatment plant relocates, emphasizing the high costs of relocation. It advocates for vast open spaces and community centers around the Cambridge Biomedical Campus, while suggesting that Cambourne should remain isolated and self-sufficient. The response opposes development in the southern rural cluster of villages, supports limited development in Grantchester, and proposes road bypasses. Additionally, it identifies Barton and Newnham as potential sites for development. The overall vision for Greater Cambridge in 2041 includes a focus on preservation and limited development.
+
+ **Hallucination agent response**:
+
+ The Assistant's Summary contains several inaccuracies and misinterpretations of the source document. Firstly, it states that the response opposes development east of Milton Road due to the recent upgrade of the wastewater treatment plant, which is misleading. The source document suggests that the area can be developed after the plant relocates, indicating a potential for development rather than outright opposition. Secondly, the summary claims that the vision for Greater Cambridge in 2041 is to remain underdeveloped and preserved, which is a misrepresentation of the nuanced views expressed in the source document. The source document does advocate for some development in specific areas, such as Grantchester and the Cambridge Biomedical Campus, while also emphasizing the importance of open spaces and community centers. Therefore, the summary does not fully align with the source document and contains hallucinations.
+
+ **Score**: FAIL → Iterate to create another summary using the hallucination agent response
+
+ **Final**:
+
+ The response opposes the development of the area east of Milton Road, arguing that the wastewater treatment plant is fit for purpose and should not be relocated due to high costs. It supports the development of the Cambridge Biomedical Campus with a focus on vast open spaces and community centers. Cambourne should remain isolated and self-sufficient, with an emphasis on cycle routes. The response opposes development in the southern rural cluster of villages, supports limited development in Grantchester, and suggests that road bypasses are needed. It identifies Barton and Newnham as potential sites for development. The overall vision for Greater Cambridge in 2041 emphasizes preservation and limited development.
+
+
reports/DOCS/DOCS_files/figure-latex/mermaid-figure-1.png ADDED
reports/DOCS/_extensions/nrennie/PrettyPDF/PrettyPDF.lua ADDED
@@ -0,0 +1,10 @@
+ function Div(el)
+   if el.classes:includes('article') then
+     local blocks = pandoc.List({
+       pandoc.RawBlock('typst', '#article[')
+     })
+     blocks:extend(el.content)
+     blocks:insert(pandoc.RawBlock('typst', ']\n'))
+     return blocks
+   end
+ end
reports/DOCS/_extensions/nrennie/PrettyPDF/PrettyPDF.tex ADDED
@@ -0,0 +1,77 @@
+ % load packages
+ \usepackage{geometry}
+ \usepackage{xcolor}
+ \usepackage{eso-pic}
+ \usepackage{fancyhdr}
+ \usepackage{sectsty}
+ \usepackage{fontspec}
+ \usepackage{titlesec}
+
+ %% Set page size with a wider right margin
+ \geometry{a4paper, total={170mm,257mm}, left=20mm, top=20mm, bottom=20mm, right=50mm}
+
+ %% Let's define some colours
+ \definecolor{light}{HTML}{E6E6FA}
+ \definecolor{highlight}{HTML}{800080}
+ \definecolor{dark}{HTML}{330033}
+
+ %% Let's add the border on the right hand side
+ \AddToShipoutPicture{%
+   \AtPageLowerLeft{%
+     \put(\LenToUnit{\dimexpr\paperwidth-3cm},0){%
+       \color{light}\rule{3cm}{\LenToUnit\paperheight}%
+     }%
+   }%
+   % logo
+   \AtPageLowerLeft{% start the bar at the bottom right of the page
+     \put(\LenToUnit{\dimexpr\paperwidth-2.75cm},27.2cm){% move it to the top right
+       \color{light}\includegraphics[width=2.5cm]{_extensions/nrennie/PrettyPDF/logo.png}
+     }%
+   }%
+ }
+
+ %% Style the page number
+ \fancypagestyle{mystyle}{
+   \fancyhf{}
+   \renewcommand\headrulewidth{0pt}
+   \fancyfoot[R]{\thepage}
+   \fancyfootoffset{3.5cm}
+ }
+ \setlength{\footskip}{20pt}
+
+ %% style the chapter/section fonts
+ \chapterfont{\color{dark}\fontsize{20}{16.8}\selectfont}
+ \sectionfont{\color{dark}\fontsize{20}{16.8}\selectfont}
+ \subsectionfont{\color{dark}\fontsize{14}{16.8}\selectfont}
+ \titleformat{\subsection}
+   {\sffamily\Large\bfseries}{\thesection}{1em}{}[{\titlerule[0.8pt]}]
+
+ % left align title
+ \makeatletter
+ \renewcommand{\maketitle}{\bgroup\setlength{\parindent}{0pt}
+ \begin{flushleft}
+   {\sffamily\huge\textbf{\MakeUppercase{\@title}}} \vspace{0.3cm} \newline
+   {\Large {\@subtitle}} \newline
+   \@author
+ \end{flushleft}\egroup
+ }
+ \makeatother
+
+ %% Use some custom fonts
+ \setsansfont{Ubuntu}[
+   Path=_extensions/nrennie/PrettyPDF/Ubuntu/,
+   Scale=0.9,
+   Extension = .ttf,
+   UprightFont=*-Regular,
+   BoldFont=*-Bold,
+   ItalicFont=*-Italic,
+ ]
+
+ \setmainfont{Ubuntu}[
+   Path=_extensions/nrennie/PrettyPDF/Ubuntu/,
+   Scale=0.9,
+   Extension = .ttf,
+   UprightFont=*-Regular,
+   BoldFont=*-Bold,
+   ItalicFont=*-Italic,
+ ]
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/UFL.txt ADDED
@@ -0,0 +1,96 @@
+ -------------------------------
+ UBUNTU FONT LICENCE Version 1.0
+ -------------------------------
+
+ PREAMBLE
+ This licence allows the licensed fonts to be used, studied, modified and
+ redistributed freely. The fonts, including any derivative works, can be
+ bundled, embedded, and redistributed provided the terms of this licence
+ are met. The fonts and derivatives, however, cannot be released under
+ any other licence. The requirement for fonts to remain under this
+ licence does not require any document created using the fonts or their
+ derivatives to be published under this licence, as long as the primary
+ purpose of the document is not to be a vehicle for the distribution of
+ the fonts.
+
+ DEFINITIONS
+ "Font Software" refers to the set of files released by the Copyright
+ Holder(s) under this licence and clearly marked as such. This may
+ include source files, build scripts and documentation.
+
+ "Original Version" refers to the collection of Font Software components
+ as received under this licence.
+
+ "Modified Version" refers to any derivative made by adding to, deleting,
+ or substituting -- in part or in whole -- any of the components of the
+ Original Version, by changing formats or by porting the Font Software to
+ a new environment.
+
+ "Copyright Holder(s)" refers to all individuals and companies who have a
+ copyright ownership of the Font Software.
+
+ "Substantially Changed" refers to Modified Versions which can be easily
+ identified as dissimilar to the Font Software by users of the Font
+ Software comparing the Original Version with the Modified Version.
+
+ To "Propagate" a work means to do anything with it that, without
+ permission, would make you directly or secondarily liable for
+ infringement under applicable copyright law, except executing it on a
+ computer or modifying a private copy. Propagation includes copying,
+ distribution (with or without modification and with or without charging
+ a redistribution fee), making available to the public, and in some
+ countries other activities as well.
+
+ PERMISSION & CONDITIONS
+ This licence does not grant any rights under trademark law and all such
+ rights are reserved.
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of the Font Software, to propagate the Font Software, subject to
+ the below conditions:
+
+ 1) Each copy of the Font Software must contain the above copyright
+ notice and this licence. These can be included either as stand-alone
+ text files, human-readable headers or in the appropriate machine-
+ readable metadata fields within text or binary files as long as those
+ fields can be easily viewed by the user.
+
+ 2) The font name complies with the following:
+ (a) The Original Version must retain its name, unmodified.
+ (b) Modified Versions which are Substantially Changed must be renamed to
+ avoid use of the name of the Original Version or similar names entirely.
+ (c) Modified Versions which are not Substantially Changed must be
+ renamed to both (i) retain the name of the Original Version and (ii) add
+ additional naming elements to distinguish the Modified Version from the
+ Original Version. The name of such Modified Versions must be the name of
+ the Original Version, with "derivative X" where X represents the name of
+ the new work, appended to that name.
+
+ 3) The name(s) of the Copyright Holder(s) and any contributor to the
+ Font Software shall not be used to promote, endorse or advertise any
+ Modified Version, except (i) as required by this licence, (ii) to
+ acknowledge the contribution(s) of the Copyright Holder(s) or (iii) with
+ their explicit written permission.
+
+ 4) The Font Software, modified or unmodified, in part or in whole, must
+ be distributed entirely under this licence, and must not be distributed
+ under any other licence. The requirement for fonts to remain under this
+ licence does not affect any document created using the Font Software,
+ except any version of the Font Software extracted from a document
+ created using the Font Software may only be distributed under this
+ licence.
+
+ TERMINATION
+ This licence becomes null and void if any of the above conditions are
+ not met.
+
+ DISCLAIMER
+ THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
+ COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
+ COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
+ DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER
+ DEALINGS IN THE FONT SOFTWARE.
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-Bold.ttf ADDED
Binary file (270 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-BoldItalic.ttf ADDED
Binary file (283 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-Italic.ttf ADDED
Binary file (327 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-Light.ttf ADDED
Binary file (363 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-LightItalic.ttf ADDED
Binary file (350 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-Medium.ttf ADDED
Binary file (285 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-MediumItalic.ttf ADDED
Binary file (310 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/Ubuntu/Ubuntu-Regular.ttf ADDED
Binary file (300 kB)
 
reports/DOCS/_extensions/nrennie/PrettyPDF/_extension.yml ADDED
@@ -0,0 +1,30 @@
+ title: PrettyPDF
+ author: Nicola Rennie
+ version: 0.0.3
+ contributes:
+   project:
+     project:
+       type: book
+   formats:
+     pdf:
+       include-in-header:
+         - "PrettyPDF.tex"
+       include-before-body:
+         - "pagestyle.tex"
+       toc: false
+       code-block-bg: light
+       linkcolor: highlight
+       urlcolor: highlight
+     typst:
+       papersize: a4
+       margin:
+         x: 2cm
+         y: 2cm
+       font-paths: Ubuntu
+       typst-logo:
+         path: "logo.png"
+       template-partials:
+         - typst-template.typ
+         - typst-show.typ
+       filters:
+         - PrettyPDF.lua
reports/DOCS/_extensions/nrennie/PrettyPDF/pagestyle.tex ADDED
@@ -0,0 +1 @@
+ \pagestyle{mystyle}
reports/DOCS/_extensions/nrennie/PrettyPDF/typst-show.typ ADDED
@@ -0,0 +1,15 @@
+ #show: PrettyPDF.with(
+ $if(title)$
+   title: "$title$",
+ $endif$
+ $if(subtitle)$
+   title: "$subtitle$",
+ $endif$
+ $if(typst-logo)$
+   typst-logo: (
+     path: "$typst-logo.path$",
+     caption: [$typst-logo.caption$]
+   ),
+ $endif$
+ )
+
reports/DOCS/_extensions/nrennie/PrettyPDF/typst-template.typ ADDED
@@ -0,0 +1,90 @@
+
+ #let PrettyPDF(
+   // The document title.
+   title: "PrettyPDF",
+
+   // Logo in top right corner.
+   typst-logo: none,
+
+   // The document content.
+   body
+ ) = {
+
+   // Set document metadata.
+   set document(title: title)
+
+   // Configure pages.
+   set page(
+     margin: (left: 2cm, right: 1.5cm, top: 2cm, bottom: 2cm),
+     numbering: "1",
+     number-align: right,
+     background: place(right + top, rect(
+       fill: rgb("#E6E6FA"),
+       height: 100%,
+       width: 3cm,
+     ))
+   )
+
+   // Set the body font.
+   set text(10pt, font: "Ubuntu")
+
+   // Configure headings.
+   show heading.where(level: 1): set block(below: 0.8em)
+   show heading.where(level: 1): underline
+   show heading.where(level: 2): set block(above: 0.5cm, below: 0.5cm)
+
+   // Links should be purple.
+   show link: set text(rgb("#800080"))
+
+   // Configure light purple border.
+   show figure: it => block({
+     move(dx: -3%, dy: 1.5%, rect(
+       fill: rgb("FF7D79"),
+       inset: 0pt,
+       move(dx: 3%, dy: -1.5%, it.body)
+     ))
+   })
+
+   // Purple border column
+   grid(
+     columns: (1fr, 0.75cm),
+     column-gutter: 2.5cm,
+
+     // Title.
+     text(font: "Ubuntu", 20pt, weight: 800, upper(title)),
+
+     // The logo in the sidebar.
+     locate(loc => {
+       set align(right)
+
+       // Logo.
+       style(styles => {
+         if typst-logo == none {
+           return
+         }
+
+         let img = image(typst-logo.path, width: 1.5cm)
+         let img-size = measure(img, styles)
+
+         grid(
+           columns: (img-size.width, 1cm),
+           column-gutter: 16pt,
+           rows: img-size.height,
+           img,
+         )
+       })
+
+     }),
+
+     // The main body text.
+     {
+       set par(justify: true)
+       body
+       v(1fr)
+     },
+
+
+   )
+ }
+
+