Msvr committed on
Commit 3911020 · 1 Parent(s): d374df1

Initial commit
.gitignore ADDED
@@ -0,0 +1,9 @@
+ .env
+ */__pycache__
+ .vscode
+ notebooks/
+ *.pyc
+ local_tests/
+ questions_tests.txt
+ env/
+ spinoza_env
LICENSE.txt ADDED
@@ -0,0 +1,15 @@
+ Project Spinoza License
+
+ This project, Spinoza, was developed by Ekimetrics, Reporters sans Frontières, and l'Alliance de la presse d'information générale, and funded by the French Ministry of Culture.
+ License: GNU General Public License v3.0
+
+ This project is licensed under the GNU General Public License, version 3.0 (GPL-3.0). A full copy of the GPL-3.0 license is available at https://www.gnu.org/licenses/gpl-3.0#license-text
+ Key Provisions:
+
+ Any redistribution or reuse of the front-end interface of the Spinoza project must retain the footer that references the aforementioned organizations (Ekimetrics, Reporters sans Frontières, l'Alliance de la presse d'information générale) and the French Ministry of Culture.
+
+ The software includes components that rely on the NVIDIA CUDA Runtime and other NVIDIA-specific packages. These packages impose constraints that restrict the software's deployment to environments running on NVIDIA GPUs, as is the case on similar platforms such as Hugging Face.
+
+ If this software or any of its components are made available outside the Spinoza repository, it is the responsibility of the person or organization making the software available to ensure compliance with all applicable licensing terms, including, but not limited to, ensuring that all necessary legal conditions and technical constraints (such as deployment on NVIDIA hardware) are met.
+
+ By using, modifying, or redistributing this software, you agree to these terms.
README.md CHANGED
@@ -1,12 +1,59 @@
  ---
- title: Spinoza Public
- emoji: 👀
- colorFrom: purple
- colorTo: yellow
+ title: Spinoza
+ emoji: 🐨
+ colorFrom: green
+ colorTo: indigo
  sdk: gradio
- sdk_version: 5.23.1
+ sdk_version: 4.37.2
  app_file: app.py
  pinned: false
+ hf_oauth: true
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Spinoza Project
+ Reporters Without Borders (RSF) has teamed up with the General Information Press Alliance (the Alliance) to launch the first phase of its **Spinoza project**: an artificial intelligence tool developed by and for journalists that safeguards the media's intellectual property over what they publish. You can find more information <a href="https://rsf.org/en/rsf-and-french-general-press-alliance-launch-spinoza-project-develop-ai-tool-journalists" target="_blank">here</a> and <a href="https://rsf.org/en/journalists-france-test-spinoza-project-s-first-ai-prototype-launched-rsf-and-alliance" target="_blank">here</a>.
+
+ A visual, guided introduction to the tool is available on <a href="https://www.youtube.com/@reporterssansfrontieres" target="_blank">RSF's YouTube channel</a>; you can watch it just below.
+
+ <iframe width="560" height="315" src="https://www.youtube.com/embed/iBy6IBSxRkw?si=zjissjcLEOJv7cwP" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
+
+ ## Table of Contents
+ - [Installation](#installation)
+ - [Usage](#usage)
+ - [Contributing](#contributing)
+ - [License](#license)
+
+ ## Installation
+ 1. Clone the repository:
+ ```bash
+ git clone https://huggingface.co/spaces/SpinozaProject/spinoza
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ poetry install
+ ```
+
+ Note that we highly recommend creating a fresh environment first:
+ ```bash
+ conda create -n myenv python=3.10
+ conda activate myenv
+
+ pip install poetry
+ ```
+
+ ## Usage
+ To run the project locally, use the following command:
+ ```bash
+ python app.py
+ ```
+
+ ## Contributing
+ 1. Clone the repository.
+ 2. Create a new branch: `git checkout -b feature-name`.
+ 3. Make your changes.
+ 4. Push your branch: `git push origin feature-name`.
+ 5. Create a pull request.
+
+ ## License
+ WIP
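
A note on the Usage step: `app.py` loads environment variables via `init_env()` and builds its LLM client from the `langchain-groq` dependency, so a Groq credential must be present in the git-ignored `.env`. The variable name in this minimal sketch is the conventional one read by langchain-groq and is an assumption, not something this commit documents:

```python
# Hypothetical preflight check before running `python app.py`.
# GROQ_API_KEY is assumed from the langchain-groq dependency; adjust the
# name if the project reads a differently named variable.
import os

from dotenv import load_dotenv  # python-dotenv, already in pyproject.toml

load_dotenv()  # picks up the .env file excluded by .gitignore
if not os.getenv("GROQ_API_KEY"):
    raise SystemExit("Set GROQ_API_KEY in your .env before running app.py")
```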
app.py ADDED
@@ -0,0 +1,640 @@
+ import gradio as gr
+ import time
+ import os
+ import yaml
+ from qdrant_client import models
+ from tqdm import tqdm
+ from collections import defaultdict
+ import pandas as pd
+
+ from spinoza_project.source.backend.llm_utils import (
+     get_llm_api,
+ )
+ from spinoza_project.source.frontend.utils import (
+     init_env,
+     parse_output_llm_with_sources,
+ )
+ from spinoza_project.source.frontend.gradio_utils import (
+     get_sources,
+     set_prompts,
+     get_config,
+     get_prompts,
+     get_assets,
+     get_theme,
+     get_init_prompt,
+     get_synthesis_prompt,
+     get_qdrants,
+     start_agents,
+     end_agents,
+     next_call,
+     zip_longest_fill,
+     reformulate,
+     answer,
+     get_text,
+     update_translation,
+ )
+
+ from assets.utils_javascript import (
+     accordion_trigger,
+     accordion_trigger_end,
+     accordion_trigger_spinoza,
+     accordion_trigger_spinoza_end,
+     update_footer,
+ )
+
+ init_env()
+
+ with open("./spinoza_project/config.yaml") as f:
+     config = yaml.full_load(f)
+
+ ## Loading Prompts
+ print("Loading Prompts")
+ prompts = get_prompts(config)
+ chat_qa_prompts, chat_reformulation_prompts = set_prompts(prompts, config)
+ synthesis_prompt_template = get_synthesis_prompt(config)
+
+ ## Building LLM
+ print("Building LLM")
+ groq_model_name = config.get("groq_model_name", "")
+ llm = get_llm_api(groq_model_name)
+
+ ## Loading databases
+ print("Loading Databases")
+ qdrants, df_qdrants = get_qdrants(config)
+
+ dataframes_by_source = {
+     source: df_qdrants[df_qdrants['Source'] == source].drop(columns=['Source'])
+     for source in df_qdrants['Source'].unique()
+ }
+
+ for source, df in dataframes_by_source.items():
+     dataframes_by_source[source]['Filter'] = dataframes_by_source[source]['Filter'].fillna('Unknown')
+
+     unknown_percentage = df.apply(lambda x: (x == 'Unknown').mean())
+     columns_to_drop = unknown_percentage[unknown_percentage == 1.0].index
+
+     if len(columns_to_drop) > 0:
+         print(f"Deleting following columns for {source}: {columns_to_drop.tolist()}")
+         dataframes_by_source[source] = df.drop(columns=columns_to_drop)
+
+ ## Loading Assets
+ print("Loading assets")
+ css, source_information_fr, source_information_en, about_contact_fr, about_contact_en = get_assets()
+ theme = get_theme()
+ init_prompt = get_init_prompt()
+
+ ## Updating the TRANSLATIONS dictionary
+ list_tabs = list(config["tabs"])
+ update_translation(list_tabs, config)
+
+ def get_source_df(source_name):
+     return dataframes_by_source.get(source_name, pd.DataFrame())
+
+ LANGUAGE_MAPPING = {
+     "fr": "french/français",
+     "en": "english/anglais",
+ }
+
+ def reformulate_questions(
+     lang_component,
+     question,
+     llm=llm,
+     chat_reformulation_prompts=chat_reformulation_prompts,
+     config=config,
+ ):
+     lang = lang_component.value if hasattr(lang_component, 'value') else lang_component
+     language = LANGUAGE_MAPPING.get(lang, "french/français")
+
+     for elt in zip_longest_fill(
+         *[
+             reformulate(language, llm, chat_reformulation_prompts, question, tab, config=config)
+             for tab in config["tabs"]
+         ]
+     ):
+         time.sleep(0.02)
+         yield elt
+
+ def retrieve_sources(
118
+ *questions,
119
+ filters_dict,
120
+ qdrants=qdrants,
121
+ config=config,
122
+ ):
123
+ if filters_dict is None:
124
+ filters_dict = {}
125
+
126
+ formated_sources, text_sources = get_sources(
127
+ questions, filters_dict, qdrants, config
128
+ )
129
+
130
+ return (formated_sources, *text_sources)
131
+
132
+ def retrieve_sources_wrapper(*args):
133
+ questions = list(args[:-1])
134
+ filters = args[-1]
135
+
136
+ return retrieve_sources(
137
+ questions,
138
+ filters_dict=filters
139
+ )
140
+
141
+ def answer_questions(
142
+ lang_component,
143
+ *questions_sources,
144
+ llm=llm,
145
+ chat_qa_prompts=chat_qa_prompts,
146
+ config=config
147
+ ):
148
+ lang = lang_component.value if hasattr(lang_component, 'value') else lang_component
149
+ language = LANGUAGE_MAPPING.get(lang, "french/français")
150
+
151
+ questions = [elt for elt in questions_sources[: len(questions_sources) // 2]]
152
+ sources = [elt for elt in questions_sources[len(questions_sources) // 2 :]]
153
+
154
+ for elt in zip_longest_fill(
155
+ *[
156
+ answer(language, llm, chat_qa_prompts, question, source, tab, config)
157
+ for question, source, tab in zip(questions, sources, config["tabs"])
158
+ ]
159
+ ):
160
+ time.sleep(0.02)
161
+ yield [
162
+ [(question, parse_output_llm_with_sources(ans))]
163
+ for question, ans in zip(questions, elt)
164
+ ]
165
+
166
+ def get_synthesis(
167
+ lang_component,
168
+ question,
169
+ *answers,
170
+ llm=llm,
171
+ synthesis_prompt_template=synthesis_prompt_template,
172
+ config=config,
173
+ ):
174
+ lang = lang_component.value if hasattr(lang_component, 'value') else lang_component
175
+ language = LANGUAGE_MAPPING.get(lang, "french/français")
176
+
177
+ answer = []
178
+ for i, tab in enumerate(config["tabs"]):
179
+ if len(str(answers[i])) >= 100:
180
+ answer.append(
181
+ f"{tab}\n{answers[i]}".replace("<p>", "").replace("</p>\n", "")
182
+ )
183
+
184
+ if len(answer) == 0:
185
+ return "Aucune source n'a pu être identifiée pour répondre, veuillez modifier votre question"
186
+ else:
187
+ for elt in llm.stream(
188
+ synthesis_prompt_template,
189
+ {
190
+ "question": question.replace("<p>", "").replace("</p>\n", ""),
191
+ "answers": "\n\n".join(answer),
192
+ "language": language
193
+ },
194
+ ):
195
+ time.sleep(0.01)
196
+ yield [(question, parse_output_llm_with_sources(elt))]
197
+
198
+ def get_unique_values_filters(df):
199
+ filters_values = sorted([
200
+ str(x) for x in df['Filter'].unique()
201
+ if pd.notna(x) and str(x).strip() != ''
202
+ ])
203
+
204
+ return filters_values
205
+
+ def filter_data(selected_filters, source):
+     if source not in dataframes_by_source:
+         raise ValueError(f"'{source}' not found within the available sources")
+
+     df = dataframes_by_source[source]
+
+     if selected_filters:
+         df = df[df['Filter'].fillna('').astype(str).isin(selected_filters)]
+
+     return df.values.tolist()
+
+ def update_filters(filters_dict, agent, values):
+     field = "file_filtering_modality"
+     if filters_dict is None:
+         filters_dict = {}
+     new_filters = dict(filters_dict)
+
+     if agent not in new_filters:
+         new_filters[agent] = {}
+
+     # An empty selection clears the filter; a non-empty selection records it.
+     # (The original test `not values or isinstance(values, list)` always
+     # matched, because CheckboxGroup values are lists, so filters were
+     # silently discarded.)
+     if not values:
+         if field in new_filters[agent]:
+             del new_filters[agent][field]
+         if not new_filters[agent]:
+             del new_filters[agent]
+     else:
+         new_filters[agent][field] = values
+
+     return new_filters, new_filters
+
+ with gr.Blocks(
+     title="🔍 Spinoza",
+     css=css,
+     js=update_footer(),
+     theme=theme,
+ ) as demo:
+     accordions_qa = {}
+     accordions_filters = {}
+     current_language = gr.State(value="fr")
+     chatbots = {}
+     question = gr.State("")
+     agt_input_flt = {}
+     agt_desc = {}
+     agt_input_dsp = gr.State({})
+     docs_textbox = gr.State([""])
+     agent_questions = {elt: gr.State("") for elt in config["tabs"]}
+     component_sources = {elt: gr.State("") for elt in config["tabs"]}
+     text_sources = {elt: gr.State("") for elt in config["tabs"]}
+     tab_states = {elt: gr.State(elt) for elt in config["tabs"]}
+     filters_state = gr.State({})
+     filters_display = gr.JSON(
+         label="Filtres sélectionnés",
+         value={},
+         visible=False,
+     )
+
+     with gr.Row(elem_classes="header-row"):
+         button_fr = gr.Button("", elem_id="fr-button", elem_classes="lang-button", icon='./assets/logos/france_round.png')
+         button_en = gr.Button("", elem_id="en-button", elem_classes="lang-button", icon='./assets/logos/us_round.png')
+
+     with gr.Row(elem_classes="main-row"):
+         with gr.Tab("Q&A", elem_id="main-component"):
+             with gr.Row(elem_id="chatbot-row"):
+                 with gr.Column(scale=2, elem_id="center-panel"):
+                     with gr.Row(elem_id="input-message"):
+                         ask = gr.Textbox(
+                             placeholder=get_text("ask_placeholder", current_language.value),
+                             show_label=False,
+                             scale=7,
+                             lines=1,
+                             interactive=True,
+                             elem_id="input-textbox",
+                         )
+
+                     with gr.Group(elem_id="chatbot-group"):
+                         for tab in list(config["tabs"].keys()):
+                             agent_name = get_text(f"agent_{config['source_mapping'][tab]}_qa", current_language.value)
+                             elem_id = f"accordion-{config['source_mapping'][tab]}"
+                             elem_classes = "accordion accordion-agent"
+
+                             with gr.Accordion(
+                                 label=agent_name,
+                                 open=False,
+                                 elem_id=elem_id,
+                                 elem_classes=elem_classes,
+                             ) as accordions_qa[config['source_mapping'][tab]]:
+                                 chatbots[tab] = gr.Chatbot(
+                                     value=None,
+                                     show_copy_button=True,
+                                     show_share_button=False,
+                                     show_label=False,
+                                     elem_id=f"chatbot-{agent_name.lower().replace(' ', '-')}",
+                                     layout="panel",
+                                     avatar_images=(
+                                         "./assets/logos/help.png",
+                                         (
+                                             "./assets/logos/spinoza.png"
+                                             if agent_name == "Spinoza"
+                                             else None
+                                         ),
+                                     ),
+                                 )
+
+                         agent_name = "Spinoza"
+                         with gr.Accordion(
+                             label=agent_name,
+                             open=True,
+                             elem_id="accordion-Spinoza",
+                             elem_classes="accordion accordion-agent spinoza-agent",
+                         ) as accordion_spinoza:
+                             chatbots["Spinoza"] = gr.Chatbot(
+                                 value=[(None, get_text("init_prompt", current_language.value))],
+                                 show_copy_button=True,
+                                 show_share_button=False,
+                                 show_label=False,
+                                 elem_id=f"chatbot-{agent_name.lower().replace(' ', '-')}",
+                                 layout="panel",
+                                 avatar_images=(
+                                     "./assets/logos/help.png",
+                                     "./assets/logos/spinoza.png",
+                                 ),
+                             )
+
+                 with gr.Column(scale=1, variant="panel", elem_id="right-panel"):
+                     with gr.TabItem("Sources", elem_id="tab-sources", id=0):
+                         sources_textbox = gr.HTML(
+                             show_label=False, elem_id="sources-textbox"
+                         )
+
+         with gr.Tab(label=get_text("source_filter_label", current_language.value), elem_id="filter-component") as source_filter_tab:
+             source_filter_title = gr.Markdown(value=get_text("source_filter_title", current_language.value))
+             source_filter_subtitle = gr.Markdown(value=get_text("source_filter_subtitle", current_language.value))
+
+             with gr.Row(elem_id="filter-row"):
+                 with gr.Column(scale=2, elem_id="filter-center-panel"):
+                     with gr.Group(elem_id="filter-group"):
+                         for tab in list(config["tabs"].keys()):
+                             agent_name = get_text(f"agent_{config['source_mapping'][tab]}_flt", current_language.value)
+                             elem_id = f"accordion-filter-{config['source_mapping'][tab]}"
+                             elem_classes = "accordion accordion-source"
+
+                             with gr.Accordion(
+                                 label=agent_name,
+                                 open=False,
+                                 elem_id=elem_id,
+                                 elem_classes=elem_classes,
+                             ) as accordions_filters[config['source_mapping'][tab]]:
+                                 question_filter = gr.Markdown(value=get_text("question_filter", current_language.value))
+                                 with gr.Tabs():
+                                     df = get_source_df(config['source_mapping'][tab])
+                                     if not df.empty and 'Filter' in df.columns:
+                                         filters = get_unique_values_filters(df)
+
+                                         with gr.Row():
+                                             var_name = f"{config['source_mapping'][tab]}_input_flt"
+                                             agt_input_flt[var_name] = gr.CheckboxGroup(
+                                                 list(filters),
+                                                 label="Filter(s):",
+                                             )
+
+                                         agt_input_flt[var_name].change(
+                                             fn=update_filters,
+                                             inputs=[filters_state, gr.State(config['source_mapping'][tab]), agt_input_flt[var_name]],
+                                             outputs=[filters_state, filters_display],
+                                         )
+
+                                     else:
+                                         gr.Markdown("**Error:** No data / 'Filter' column doesn't exist...")
+
+         with gr.Tab(label=get_text("source_informatation_label", current_language.value), elem_id="source-component") as source_information_tab:
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     display_info_desc = gr.Markdown(value=get_text("display_info_desc", current_language.value))
+                     accordions_inf = {}
+                     with gr.Tabs(elem_id="main-tab-disp"):
+                         for tab in list(config["tabs"].keys()):
+                             agent_name = get_text(f"agent_{config['source_mapping'][tab]}_tab", current_language.value)
+                             elem_id = f"accordion-{config['source_mapping'][tab]}-tab"
+                             elem_classes = "disp-tabs"
+
+                             with gr.Tab(
+                                 label=agent_name,
+                                 elem_id=elem_id,
+                                 elem_classes=elem_classes,
+                             ) as accordions_inf[config['source_mapping'][tab]]:
+                                 var_name = f"{config['source_mapping'][tab]}_desc"
+                                 agt_desc[var_name] = gr.Markdown(value=get_text(f"{config['source_mapping'][tab]}_desc", current_language.value))
+                                 df = get_source_df(config['source_mapping'][tab])
+                                 if not df.empty and 'Filter' in df.columns:
+                                     filters = get_unique_values_filters(df)
+
+                                     with gr.Row():
+                                         var_name = f"{config['source_mapping'][tab]}_input_dsp"
+                                         agt_input_dsp.value[var_name] = gr.CheckboxGroup(
+                                             list(filters),
+                                             label="Filter(s):",
+                                         )
+
+                                     output_df = gr.Dataframe(
+                                         headers=['Title', 'Pages', 'Filter Category', 'Publishing Date'],
+                                         datatype=['str', 'number', 'str', 'number'],
+                                         value=df.values.tolist(),
+                                         column_widths=[300, 100, 100, 150],
+                                         wrap=True,
+                                     )
+
+                                     agt_input_dsp.value[var_name].change(
+                                         filter_data,
+                                         inputs=[agt_input_dsp.value[var_name]] + [gr.State(config['source_mapping'][tab])],
+                                         outputs=[output_df],
+                                     )
+
+                                 else:
+                                     gr.Markdown("**Error:** No data / 'Filter' column doesn't exist...")
+
+         with gr.Tab(label=get_text("contact_label", current_language.value), elem_id="contact-component") as contact_label:
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     contact_info = gr.Markdown(value=about_contact_fr)
+
+     ask.submit(
+         start_agents, inputs=[current_language], outputs=[chatbots["Spinoza"]] + [source_filter_tab], js=accordion_trigger()
+     ).then(
+         fn=reformulate_questions,
+         inputs=[current_language] + [ask],
+         outputs=[agent_questions[tab] for tab in config["tabs"]],
+     ).then(
+         fn=retrieve_sources_wrapper,
+         inputs=[agent_questions[tab] for tab in config["tabs"]] + [filters_state],
+         outputs=[sources_textbox] + [text_sources[tab] for tab in config["tabs"]],
+     ).then(
+         fn=answer_questions,
+         inputs=[current_language]
+         + [agent_questions[tab] for tab in config["tabs"]]
+         + [text_sources[tab] for tab in config["tabs"]],
+         outputs=[chatbots[tab] for tab in config["tabs"]],
+     ).then(
+         fn=next_call, inputs=[], outputs=[], js=accordion_trigger_end()
+     ).then(
+         fn=next_call, inputs=[], outputs=[], js=accordion_trigger_spinoza()
+     ).then(
+         fn=get_synthesis,
+         inputs=[current_language]
+         + [ask]
+         + [chatbots[tab] for tab in config["tabs"]],
+         outputs=[chatbots["Spinoza"]],
+     ).then(
+         fn=next_call, inputs=[], outputs=[], js=accordion_trigger_spinoza_end()
+     ).then(
+         fn=end_agents, inputs=[current_language], outputs=[source_filter_tab]
+     )
+
+ def reset_app(language):
462
+
463
+ chatbot_updates = {}
464
+ for tab in config["tabs"]:
465
+ chatbot_updates[tab] = gr.update(value=None)
466
+ chatbot_updates["Spinoza"] = gr.update(value=[(None, get_text("init_prompt", language))])
467
+
468
+ empty_checkbox = gr.update(value=None)
469
+ checkbox_components = list(agt_input_flt.keys()) + list(agt_input_dsp.value.keys())
470
+ checkbox_updates = {component: empty_checkbox for component in checkbox_components}
471
+
472
+ return {
473
+ "chatbots": chatbot_updates,
474
+ "filters_state": gr.update(value={}),
475
+ "filters_display": gr.update(value={}),
476
+ "ask": gr.update(value="", placeholder=get_text("ask_placeholder", language)),
477
+ "sources_textbox": gr.update(value=""),
478
+ "checkbox_updates": checkbox_updates
479
+ }
480
+
481
+ def toggle_language_fr():
482
+ reset_state = reset_app("fr")
483
+ return [
484
+ "fr",
485
+ reset_state["ask"],
486
+ reset_state["chatbots"]["Spinoza"],
487
+ *[reset_state["chatbots"][tab] for tab in config["tabs"]],
488
+ *[
489
+ gr.update(
490
+ label=get_text(f"agent_{config['source_mapping'][tab]}_qa", "fr"),
491
+ open=False,
492
+ elem_id=f"accordion-{config['source_mapping'][tab]}",
493
+ elem_classes="accordion accordion-agent"
494
+ )
495
+ for tab in list(config["tabs"].keys())
496
+ ],
497
+ gr.update(label=get_text("source_filter_label", "fr"), elem_id="filter-component"),
498
+ *[
499
+ gr.update(
500
+ label=get_text(f"agent_{config['source_mapping'][tab]}_flt", "fr"),
501
+ elem_id=f"accordion-filter-{config['source_mapping'][tab]}",
502
+ elem_classes="accordion accordion-source"
503
+ )
504
+ for tab in list(config["tabs"].keys())
505
+ ],
506
+ gr.update(value=get_text("source_filter_title", 'fr')),
507
+ gr.update(value=get_text("source_filter_subtitle", 'fr')),
508
+ gr.update(value=get_text("question_filter", 'fr')),
509
+ gr.update(label=get_text("source_informatation_label", "fr"), elem_id="source-component"),
510
+ gr.update(value=get_text("display_info_desc", "fr")),
511
+ *[
512
+ gr.update(value=get_text(f"{config['source_mapping'][tab]}_desc", "fr"))
513
+ for tab in list(config["tabs"].keys())
514
+ ],
515
+ *[
516
+ gr.update(
517
+ label=get_text(f"agent_{config['source_mapping'][tab]}_tab", "fr"),
518
+ elem_id=f"accordion-{config['source_mapping'][tab]}-tab",
519
+ elem_classes="disp-tabs"
520
+ )
521
+ for tab in list(config["tabs"].keys())
522
+ ],
523
+ gr.update(label=get_text("contact_label", "fr")),
524
+ gr.update(value=about_contact_fr),
525
+ gr.update(value=""),
526
+ gr.update(value={}),
527
+ gr.update(value={}),
528
+ *[
529
+ gr.update(value=None) for _ in range(len(agt_input_flt))
530
+ ]
531
+ ]
532
+
533
+ def toggle_language_en():
534
+ reset_state = reset_app("en")
535
+ return [
536
+ "en",
537
+ reset_state["ask"],
538
+ reset_state["chatbots"]["Spinoza"],
539
+ *[reset_state["chatbots"][tab] for tab in config["tabs"]],
540
+ *[
541
+ gr.update(
542
+ label=get_text(f"agent_{config['source_mapping'][tab]}_qa", "en"),
543
+ open=False,
544
+ elem_id=f"accordion-{config['source_mapping'][tab]}",
545
+ elem_classes="accordion accordion-agent"
546
+ )
547
+ for tab in list(config["tabs"].keys())
548
+ ],
549
+ gr.update(label=get_text("source_filter_label", "en"), elem_id="filter-component"),
550
+ *[
551
+ gr.update(
552
+ label=get_text(f"agent_{config['source_mapping'][tab]}_flt", "en"),
553
+ elem_id=f"accordion-filter-{config['source_mapping'][tab]}",
554
+ elem_classes="accordion accordion-source"
555
+ )
556
+ for tab in list(config["tabs"].keys())
557
+ ],
558
+ gr.update(value=get_text("source_filter_title", 'en')),
559
+ gr.update(value=get_text("source_filter_subtitle", 'en')),
560
+ gr.update(value=get_text("question_filter", 'en')),
561
+ gr.update(label=get_text("source_informatation_label", "en"), elem_id="source-component"),
562
+ gr.update(value=get_text("display_info_desc", "en")),
563
+ *[
564
+ gr.update(value=get_text(f"{config['source_mapping'][tab]}_desc", "en"))
565
+ for tab in list(config["tabs"].keys())
566
+ ],
567
+ *[
568
+ gr.update(
569
+ label=get_text(f"agent_{config['source_mapping'][tab]}_tab", "en"),
570
+ elem_id=f"accordion-{config['source_mapping'][tab]}-tab",
571
+ elem_classes="disp-tabs"
572
+ )
573
+ for tab in list(config["tabs"].keys())
574
+ ],
575
+ gr.update(label=get_text("contact_label", "en")),
576
+ gr.update(value=about_contact_en),
577
+ gr.update(value=""),
578
+ gr.update(value={}),
579
+ gr.update(value={}),
580
+ *[
581
+ gr.update(value=None) for _ in range(len(agt_input_flt))
582
+ ]
583
+ ]
584
+
585
+ button_fr.click(
586
+ fn=toggle_language_fr,
587
+ inputs=[],
588
+ outputs=[
589
+ current_language,
590
+ ask,
591
+ chatbots["Spinoza"],
592
+ *[chatbots[tab] for tab in config["tabs"]],
593
+ *[accordions_qa[key] for key in accordions_qa.keys()],
594
+ source_filter_tab,
595
+ *[accordions_filters[key] for key in accordions_filters.keys()],
596
+ source_filter_title,
597
+ source_filter_subtitle,
598
+ question_filter,
599
+ source_information_tab,
600
+ display_info_desc,
601
+ *[agt_desc[key] for key in agt_desc.keys()],
602
+ *[accordions_inf[key] for key in accordions_inf.keys()],
603
+ contact_label,
604
+ contact_info,
605
+ sources_textbox,
606
+ filters_state,
607
+ filters_display,
608
+ *[agt_input_flt[key] for key in agt_input_flt.keys()]
609
+ ]
610
+ )
611
+
612
+ button_en.click(
613
+ fn=toggle_language_en,
614
+ inputs=[],
615
+ outputs=[
616
+ current_language,
617
+ ask,
618
+ chatbots["Spinoza"],
619
+ *[chatbots[tab] for tab in config["tabs"]],
620
+ *[accordions_qa[key] for key in accordions_qa.keys()],
621
+ source_filter_tab,
622
+ *[accordions_filters[key] for key in accordions_filters.keys()],
623
+ source_filter_title,
624
+ source_filter_subtitle,
625
+ question_filter,
626
+ source_information_tab,
627
+ display_info_desc,
628
+ *[agt_desc[key] for key in agt_desc.keys()],
629
+ *[accordions_inf[key] for key in accordions_inf.keys()],
630
+ contact_label,
631
+ contact_info,
632
+ sources_textbox,
633
+ filters_state,
634
+ filters_display,
635
+ *[agt_input_flt[key] for key in agt_input_flt.keys()]
636
+ ]
637
+ )
638
+
639
+ if __name__ == "__main__":
640
+ demo.queue().launch(debug=True, share=True)
assets/about_contact_en.md ADDED
@@ -0,0 +1,78 @@
+ # The Spinoza Project
+
+ Reporters Without Borders and the Alliance of General Information Press have collaborated to create an AI tool tailored to journalism. The project aims to develop a collaborative prototype to assist journalists in their work while ensuring the protection of media data.
+
+ The ultimate goal is to provide reliable, pluralistic, and high-quality information. Ekimetrics acts as the technical partner in this project, responsible for identifying appropriate solutions to meet the needs and implementing the prototype.
+
+ If you encounter any bugs to report, need additional information, or wish to get in touch with us, feel free to write to us at the following address: [[email protected]](mailto:[email protected]).
+
+ ---
+
+ ## How it Works
+
+ ### How Sources are Retrieved
+
+ Here are some details about the **relevance score**. The relevance score is a metric used to evaluate the relevance of retrieved documents to a given query in a *vectorstore*. When a document is stored as a vector, the relevance score indicates how close that document is to the query in terms of vector similarity.
+
+ Here's how it generally works:
+
+ - **Vector Representation**: Documents and the query are converted into vectors in a vector space.
+ - **Similarity Calculation**: A similarity measure (such as dot product or cosine distance) is used to compare the vectors of the documents with the query vector.
+ - **Relevance Score**: The result of this comparison is the relevance score, which indicates how relevant each document is to the query.
+
+ A higher score means the document is more relevant to the query. This allows the retrieved documents to be ranked by their relevance.
+
+ ### How Summaries are Generated
+
+ Summaries are generated using advanced natural language processing (NLP) algorithms, which rely on the following steps:
+
+ - **Analyzing Source Content**: Relevant documents identified through the relevance score are analyzed.
+ - **Extracting Key Information**: The essential points of the documents are extracted, taking into account their importance and complementarity.
+ - **Automatic Writing**: A coherent and readable summary is generated, following a format tailored to journalists' needs.
+
+ These steps ensure that the proposed summaries are faithful to the original content while being easily usable by the end-users.
+
+ ---
+
+ ## Contact Us
+
+ For any questions or suggestions, feel free to contact us: "Spinoza Support" <[email protected]>!
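
The relevance-score mechanics described in this file are easy to make concrete. Below is a minimal, self-contained sketch of cosine-similarity ranking; the vectors are made up for illustration, and this is not the project's actual retrieval code (which sits behind `get_sources` and Qdrant in `app.py`):

```python
# Minimal illustration of a relevance score: cosine similarity between a
# query vector and each document vector, used to rank documents.
import numpy as np

def relevance_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each document vector."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return doc_vecs @ query_vec  # higher = more relevant

# Made-up 3-dimensional embeddings for three documents and one query.
docs = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.4, 0.4, 0.8]])
query = np.array([1.0, 0.2, 0.1])

scores = relevance_scores(query, docs)
print(np.argsort(-scores))  # document indices, ranked by relevance
```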
assets/about_contact_fr.md ADDED
@@ -0,0 +1,78 @@
+ # Le projet Spinoza
+
+ Reporters sans frontières et l'Alliance de la presse d'information générale ont collaboré pour créer un outil d'IA adapté au journalisme. Le projet vise à développer un prototype collaboratif pour aider les journalistes dans leur travail tout en garantissant le respect des données des médias.
+
+ L'objectif final est de fournir une information fiable, pluraliste et de haute qualité. Ekimetrics joue dans ce projet le rôle de partenaire technique, en charge de rechercher des solutions adaptées aux besoins identifiés et d'implémenter le prototype.
+
+ En cas de bug à reporter, de besoin d'informations supplémentaires ou si vous souhaitez nous contacter, n'hésitez pas à nous écrire à l'adresse suivante : [[email protected]](mailto:[email protected]).
+
+ ---
+
+ ## Le fonctionnement
+
+ ### Comment sont retrouvées les sources
+
+ Voici quelques informations sur ce qu'est le **score de pertinence**. Le score de pertinence est une mesure utilisée pour évaluer la pertinence des documents récupérés par rapport à une requête donnée dans un *vectorstore*. Lorsqu'un document est stocké sous forme de vecteur, le score de pertinence indique à quel point ce document est proche de la requête en termes de similarité vectorielle.
+
+ Voici comment cela fonctionne généralement :
+
+ - **Représentation vectorielle** : Les documents et la requête sont convertis en vecteurs dans un espace vectoriel.
+ - **Calcul de similarité** : Une mesure de similarité (comme le produit scalaire ou la distance cosinus) est utilisée pour comparer les vecteurs des documents avec celui de la requête.
+ - **Score de pertinence** : Le résultat de cette comparaison est le score de pertinence, qui indique à quel point chaque document est pertinent par rapport à la requête.
+
+ Un score plus élevé signifie que le document est plus pertinent pour la requête. Cela permet de classer les documents récupérés en fonction de leur pertinence.
+
+ ### Comment sont générées les synthèses
+
+ Les synthèses sont générées grâce à l'utilisation d'algorithmes avancés de traitement du langage naturel (NLP) qui s'appuient sur les étapes suivantes :
+
+ - **Analyse des contenus sources** : Les documents pertinents identifiés grâce au score de pertinence sont analysés.
+ - **Extraction des informations clés** : Les points essentiels des documents sont extraits, en tenant compte de leur importance et de leur complémentarité.
+ - **Rédaction automatique** : Une synthèse cohérente et lisible est produite, en respectant un format adapté aux besoins des journalistes.
+
+ Ces étapes garantissent que les synthèses proposées soient fidèles aux contenus originaux tout en étant facilement exploitables par les utilisateurs.
+
+ ---
+
+ ## Contactez-nous
+
+ Pour toute question ou suggestion, n'hésitez pas à nous contacter : "Spinoza Support" <[email protected]> !
assets/logos/apig.png ADDED
assets/logos/france_round.png ADDED
assets/logos/help.png ADDED
assets/logos/question.png ADDED
assets/logos/rsf.png ADDED
assets/logos/spinoza.png ADDED
assets/logos/uk_round.png ADDED
assets/logos/us_round.png ADDED
assets/source_information_en.md ADDED
@@ -0,0 +1,34 @@
+ Here's a brief introduction to the data sources accessible by the different agents.
+
+ 1. **Science**: This tool consists of IPCC and IPBES reports.
+
+ 2. **Law**: This tool is based on French law and includes 21 "codes" that were modified by the 2021 "Climate Law".
+
+ 3. **Public Organizations**: This tool queries the French national low-carbon strategy (SNBC).
+
+ 4. **ADEME**: This tool is dedicated to ADEME data, and we have selected different categories of reports:
+
+    - Guides made available to the general public
+    - Experience reports on new technologies
+    - Studies and research on local impacts, institutional documents (analyses commissioned by France & activity reports)
+    - Sectoral transition plans for the most emitting industrial sectors (glass, paper, cement, steel, aluminum, chemistry, sugar)
+
+ 5. **Press**: In 2023, hundreds of thousands of articles from 212 press titles were analyzed to identify those dedicated to Ecological Transition. A documentary query of more than 300 keywords helped select articles mentioning these terms in the title, header, subheadings, or multiple times in the text. The chosen articles were specifically focused on ecological transition and not mere mentions. Once deduplicated and proportionally distributed among media groups, articles were randomly selected, without relying on criteria of size, format, or content, reaching a total of 28,450 articles.
+
+ 6. **AFP**: More than 700 AFP documents were also collected:
+    - References and boxes: These educational formats contain an average of 400 to 600 words. Structured in 3 to 5 sub-sections, their objective is to clearly and concisely explain a current event.
+    - Dispatches: These articles are written by AFP and cover real-time news, following an inverted pyramid approach (essential information first). Their length varies from a few words ("alert") to about 600 to 700 words for more detailed articles ("general paper").
+    - Fact-checking: Verification of facts related to current events.
+    - General papers
+
+ <br>
+
+ Here is some information about what a relevance score is. The relevance score is a metric used to evaluate the relevance of documents retrieved in relation to a given query within a **vectorstore**. When a document is stored as a vector, the relevance score indicates how closely that document aligns with the query in terms of vector similarity.
+
+ Here's how it generally works:
+
+ - **Vector Representation**: Documents and the query are converted into vectors in a vector space.
+ - **Similarity Calculation**: A similarity measure (such as dot product or cosine distance) is used to compare document vectors with the query vector.
+ - **Relevance Score**: The result of this comparison is the relevance score, which indicates how relevant each document is to the query.
+
+ A higher score means the document is more relevant to the query. This allows ranking retrieved documents based on their relevance.
assets/source_information_fr.md ADDED
@@ -0,0 +1,35 @@
+ Voici une brève introduction aux sources de données accessibles par les différents agents.
+
+ 1. **Science** : cet outil est composé des rapports du GIEC et de l'IPBES.
+
+ 2. **Loi** : cet outil est basé sur le droit français et regroupe 21 des "codes" qui ont été modifiés par la "loi climat" de 2021.
+
+ 3. **Organismes Publics** : cet outil interroge la politique nationale française de la stratégie bas carbone (SNBC).
+
+ 4. **ADEME** : cet outil est dédié aux données de l'ADEME, et nous avons sélectionné différentes catégories de rapports :
+
+    - Guides mis à disposition du grand public
+    - Rapports d'expériences sur les nouvelles technologies
+    - Études et recherches sur les impacts locaux, documents institutionnels (analyses commandées par la France & rapports d'activité)
+    - Plans de transition sectoriels pour les secteurs industriels les plus émetteurs (verre, papier, ciment, acier, aluminium, chimie, sucre)
+
+ 5. **Presse** : En 2023, des centaines de milliers d'articles provenant de 212 titres de presse ont été analysés pour repérer ceux consacrés à la Transition Écologique. Une requête documentaire de plus de 300 mots-clés a permis de sélectionner les articles mentionnant ces termes dans le titre, le chapo, les intertitres ou plusieurs fois dans le texte. Les articles choisis étaient spécifiquement axés sur la transition écologique et non de simples mentions. Une fois dédupliqués et répartis proportionnellement entre les groupes de médias, des articles ont été tirés aléatoirement, sans se baser sur des critères de taille, de format ou de contenu, pour arriver à un total de 28 450 articles.
+
+ 6. **AFP** : Plus de 700 documents de l'AFP ont aussi été collectés :
+    - Repères et encadrés : Ces formats pédagogiques contiennent en moyenne entre 400 et 600 mots. Structurés en 3 à 5 sous-parties, leur objectif est d'expliquer de manière claire et concise un fait d'actualité.
+    - Dépêches : Ces articles sont rédigés par l'AFP et traitent de l'actualité en temps réel, selon une approche de pyramide inversée (les informations essentielles en premier). Leur longueur varie de quelques mots ("alerte") à environ 600 à 700 mots pour les articles plus détaillés ("papier général").
+    - Fact-checking : Vérification des faits en lien avec l'actualité.
+    - Papiers généraux
+
+ <br>
+
+ Voici quelques informations sur ce qu'est le score de pertinence. Le score de pertinence est une mesure utilisée pour évaluer la pertinence des documents récupérés par rapport à une requête donnée dans un **vectorstore**. Lorsqu'un document est stocké sous forme de vecteur, le score de pertinence indique à quel point ce document est proche de la requête en termes de similarité vectorielle.
+
+ Voici comment cela fonctionne généralement :
+
+ - **Représentation vectorielle** : Les documents et la requête sont convertis en vecteurs dans un espace vectoriel.
+ - **Calcul de similarité** : Une mesure de similarité (comme le produit scalaire ou la distance cosinus) est utilisée pour comparer les vecteurs des documents avec celui de la requête.
+ - **Score de pertinence** : Le résultat de cette comparaison est le score de pertinence, qui indique à quel point chaque document est pertinent par rapport à la requête.
+
+ Un score plus élevé signifie que le document est plus pertinent pour la requête. Cela permet de classer les documents récupérés en fonction de leur pertinence.
assets/style.css ADDED
@@ -0,0 +1,326 @@
+ .message {
+     font-size: 14px !important;
+ }
+
+ a {
+     text-decoration: none;
+     color: inherit;
+ }
+
+ .card {
+     background-color: white;
+     border-radius: 10px;
+     box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+     overflow: hidden;
+     display: flex;
+     flex-direction: column;
+     margin: 20px;
+ }
+
+ .card-content {
+     padding: 20px;
+ }
+
+ .card-content h2 {
+     font-size: 14px !important;
+     font-weight: bold;
+     margin-bottom: 10px;
+     margin-top: 0px !important;
+     color: #dc2626 !important;
+ }
+
+ .card-content p {
+     font-size: 12px;
+     margin-bottom: 0;
+ }
+
+ .card-footer {
+     background-color: #f4f4f4;
+     font-size: 10px;
+     padding: 10px;
+     display: flex;
+     justify-content: space-between;
+     align-items: center;
+ }
+
+ .card-footer span {
+     flex-grow: 1;
+     text-align: left;
+     color: #999 !important;
+ }
+
+ .message.user {
+     /* background-color: #7494b0 !important; */
+     border: none;
+     /* color: white !important; */
+ }
+
+ .message.bot {
+     /* background-color: #f2f2f7 !important; */
+     border: none;
+ }
+
+ .message.svelte-x4qvqz .prose {
+     font-size: small;
+ }
+
+ .message-row.panel.svelte-x4qvqz.svelte-x4qvqz {
+     margin: 0;
+     padding: calc(var(--spacing-xl) * 2) calc(var(--spacing-xxl) * 1);
+ }
+
+ @media screen {
+     div#sources-textbox {
+         height: calc(95vh - 160px) !important;
+         overflow-y: auto !important;
+         scrollbar-width: none;
+         -ms-overflow-style: none;
+     }
+
+     div#sources-textbox::-webkit-scrollbar {
+         width: 0;
+         height: 0;
+     }
+
+     div.svelte-iyf88w {
+         scrollbar-width: none;
+         height: calc(-180px + 95vh) !important;
+         overflow-y: auto;
+     }
+
+     div.svelte-iyf88w::-webkit-scrollbar {
+         width: 0;
+         height: 0;
+     }
+
+     div#chatbot-row {
+         height: calc(95vh - 100px) !important;
+     }
+
+     .form {
+         position: relative;
+         top: 10px;
+     }
+
+     #center-panel {
+         flex-grow: 2;
+         min-width: min(320px, 100%);
+         height: calc(-100px + 95vh) !important;
+     }
+
+     .compact.svelte-vt1mxs,
+     .panel.svelte-vt1mxs {
+         border: solid var(--panel-border-width) var(--panel-border-color);
+         border-radius: var(--container-radius);
+         background: var(--panel-background-fill);
+         padding: var(--spacing-lg);
+         height: calc(-100px + 95vh) !important;
+     }
+
+     .accordion-agent.spinoza-agent {
+         height: 15cm;
+     }
+
+     .accordion-agent.spinoza-agent > button > span {
+         color: #000000;
+         font-size: large;
+         font-weight: bold;
+     }
+
+     .accordion-agent > button > span {
+         color: #9ca1a5e7;
+         font-weight: bold;
+     }
+
+     .accordion-source > button > span {
+         color: #9ca1a5e7;
+         font-weight: bold;
+     }
+ }
+
+ textarea.scroll-hide {
+     max-height: 42px;
+ }
+
+ footer {
+     position: fixed;
+     left: 0;
+     bottom: 0;
+     width: 100%;
+     text-align: center;
+ }
+
+ .footer-ekimetrics {
+     align-items: center;
+     margin-left: var(--size-2);
+     font-size: small;
+ }
+
+ @media screen and (max-width: 767px) {
+     /* Mobile-specific styles go here */
+     div#chatbot-row {
+         height: 500px !important;
+     }
+
+     #submit-button {
+         padding: 0px !important;
+         min-width: 80px;
+     }
+
+     /* This will hide all tab-nav buttons */
+     div.tab-nav button {
+         display: none !important;
+     }
+
+     /* This will show only the first tab-nav button */
+     div.tab-nav button:first-child {
+         display: block !important;
+     }
+
+     /* This will show the second tab-nav button */
+     div.tab-nav button:nth-child(2) {
+         display: block !important;
+     }
+
+     #right-panel button {
+         display: block !important;
+     }
+ }
+
+ .tabitem {
+     border: none !important;
+ }
+
+ .other-tabs > div {
+     padding-left: 40px;
+     padding-right: 40px;
+     padding-top: 10px;
+ }
+
+ .tab-nav {
+     border: none !important;
+ }
+
+ .tab-nav > button.selected {
+     color: #4b8ec3;
+     font-weight: bold;
+     border: none;
+ }
+
+ #science-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ #law-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ #org-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ #ademe-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ #press-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ #afp-tab > button.selected {
+     color: #4b8ec3 !important;
+     font-weight: bold !important;
+     border: 2px solid #4b8ec3 !important;
+     border-radius: 5px !important;
+     padding: 5px 15px !important;
+ }
+
+ .header-row {
+     position: fixed;
+     top: 10px;
+     right: 10px;
+     z-index: 1000;
+     display: flex;
+     justify-content: flex-end;
+     gap: 10px;
+ }
+
+ .header-row button {
+     flex: none !important;
+     width: auto !important;
+     min-width: 0 !important;
+ }
+
+ .header-row .lang-button {
+     flex-grow: 0 !important;
+     flex-shrink: 0 !important;
+     flex-basis: 35px !important;
+     width: 35px !important;
+     height: 35px !important;
+     padding: 0 !important;
+     align-items: center !important;
+     justify-content: center !important;
+     background: none !important;
+     border: none !important;
+     box-shadow: none !important;
+ }
+
+ .lang-button.selected {
+     background-color: #2196F3 !important;
+     color: white !important;
+     border-color: #1976D2 !important;
+ }
+
+ .header-row .lang-button:hover {
+     background: none !important;
+     border: none !important;
+     box-shadow: none !important;
+ }
+
+ .main-row {
+     margin-top: 50px !important;
+ }
+
+ #input-textbox > label > textarea {
+     border-radius: 40px;
+     padding-left: 30px;
+     resize: none;
+ }
+
+ #input-message > div {
+     border: none;
+ }
+
+ .loader {
+     animation: blink 2s linear infinite;
+ }
+
+ @keyframes blink {
+     0% {
+         opacity: 0;
+     }
+     100% {
+         opacity: 1;
+     }
+ }
@@ -0,0 +1,72 @@
+ def update_footer():
+     return """
+     function update_footer() {
+         var footer = document.querySelector('footer');
+         footer.innerHTML = "<a href='https://rsf.org/fr' class='footer-ekimetrics' target='_blank' rel='noreferrer'>À l'initiative de RSF</a>";
+         footer.innerHTML += "<a href='https://www.alliancepresse.fr/' class='footer-ekimetrics' target='_blank' rel='noreferrer'>et l'Alliance Presse</a>";
+         footer.innerHTML += "<div class='footer-ekimetrics'> · </div>";
+         footer.innerHTML += "<a href='https://www.culture.gouv.fr/' class='footer-ekimetrics' target='_blank' rel='noreferrer'>Avec le soutien du Ministère de la Culture</a>";
+         footer.innerHTML += "<div class='footer-ekimetrics'> · </div>";
+         footer.innerHTML += "<a href='https://ekimetrics.com/' class='footer-ekimetrics' target='_blank' rel='noreferrer'>Conçu par Ekimetrics</a>";
+     }
+     """
+
+
+ def accordion_trigger():
+     return """
+     function accordion_trigger() {
+         var input_textbox = document.getElementById("input-textbox");
+         input_textbox.addEventListener('keyup', function (e) {
+             if (e.key === 'Enter' || e.keyCode === 13) {
+                 document.querySelectorAll(".loader, .loader-helper").forEach(el => el.remove());
+                 var accordions = document.querySelectorAll('.accordion-agent');
+                 accordions.forEach(function (accordion) {
+                     var agentName = "Agent " + accordion.id.split('-')[1];
+                     var buttonSpan = accordion.querySelector('button > span');
+                     if (!accordion.classList.contains('spinoza-agent')) {
+                         buttonSpan.textContent = agentName;
+                         buttonSpan.innerHTML += "<span class='loader-helper'> - </span><span class='loader'>loading</span>";
+                     }
+                 });
+             }
+         });
+     }
+     """
+
+
+ def accordion_trigger_end():
+     return """
+     function accordion_trigger_end() {
+         var accordions = document.querySelectorAll('.accordion-agent');
+
+         accordions.forEach(function (accordion) {
+             if (!accordion.classList.contains('spinoza-agent')) {
+                 var agentName = "Agent " + accordion.id.split('-')[1];
+                 var buttonSpan = accordion.querySelector('button > span');
+                 buttonSpan.textContent = agentName + " - ready";
+             }
+         });
+     }
+     """
+
+
+ def accordion_trigger_spinoza():
+     return """
+     function accordion_trigger_spinoza() {
+         var accordion_spinoza = document.querySelector('.spinoza-agent');
+         document.querySelectorAll(".loader, .loader-helper").forEach(el => el.remove());
+         var buttonSpan = accordion_spinoza.querySelector('button > span');
+         buttonSpan.textContent = "Spinoza";
+         buttonSpan.innerHTML += "<span class='loader-helper'> - </span><span class='loader'>generating</span>";
+     }
+     """
+
+
+ def accordion_trigger_spinoza_end():
+     return """
+     function accordion_trigger_spinoza_end() {
+         var accordion_spinoza = document.querySelector('.spinoza-agent');
+         var buttonSpan = accordion_spinoza.querySelector('button > span');
+         buttonSpan.textContent = "Spinoza - ready";
+     }
+     """
poetry.lock ADDED
The diff for this file is too large to render. See raw diff
 
pyproject.toml ADDED
@@ -0,0 +1,32 @@
+ [tool.poetry]
+ name = "spinoza_project"
+ version = "0.1.0"
+ description = ""
+ authors = ["Miguel Omenaca Muro <[email protected]>"]
+ readme = "README.md"
+ package-mode = false
+
+ [tool.poetry.dependencies]
+ python = "^3.10"
+ gradio = "4.37.2"
+ sentence-transformers = "2.2.2"
+ msal = "^1.28.1"
+ qdrant-client = "^1.9.1"
+ loadenv = "^0.1.1"
+ datasets = "^2.20.0"
+ transformers = "4.39.0"
+ azure-search-documents = "^11.4.0"
+ azure-identity = "^1.17.1"
+ load-dotenv = "^0.1.0"
+ python-dotenv = "^1.0.1"
+ langchain-groq = "^0.2.1"
+ langchain-openai = "^0.2.6"
+ langchain-community = "^0.3.5"
+ langchain = "^0.3.7"
+ huggingface-hub = "< 0.26"
+ fastapi = "0.111.0"
+
+ [build-system]
+ requires = ["poetry-core"]
+ build-backend = "poetry.core.masonry.api"
requirements.txt ADDED
The diff for this file is too large to render. See raw diff
 
spinoza_project/__init__.py ADDED
File without changes
spinoza_project/config.yaml ADDED
@@ -0,0 +1,87 @@
+ demo_name: Spinoza Q&A
+
+ default_databases: "SpinozaProject/spinoza-database"
+ ## Please provide the path of the repository in which the pickle files of the databases to add are stored
+ database_hf: ""
+ # Example >>> database_hf: "user_repo/database_name"
+
+ databases_pickle_files:
+   ## Please specify the EXACT name of the pickle files added to the repositories above, one per database
+   science_agent: "database_Science.pickle"
+   law_agent: "database_Loi.pickle"
+   pub_institutions_agent: "database_Organismes_publics.pickle"
+   ademe_agent: "database_ADEME.pickle"
+   # Example >>> new_agent: "database_Nouvelle_bdd.pickle"
+
+ prompts_path_list:
+   ## Please specify the path of the prompt file associated with each agent within the Spinoza project directory
+   science_agent: "./spinoza_project/prompt_Science.yaml"
+   law_agent: "./spinoza_project/prompt_Loi.yaml"
+   pub_institutions_agent: "./spinoza_project/prompt_Organismes_publics.yaml"
+   ademe_agent: "./spinoza_project/prompt_ADEME.yaml"
+   # Example >>> new_agent: "./spinoza_project/prompt_Template.yaml"
+
+ tabs:
+   ## Please map each database you want to use/display to its (French) description
+   science_agent: "Cet outil s’appuie sur les rapports du **GIEC**, de l'**IPBES** et de l'**UNESCO**. Nous avons sélectionné ces organismes de sciences consensuelles afin de garantir la fiabilité maximale des réponses obtenues."
+   law_agent: "Cet outil regroupe 21 des codes juridiques français modifiés par la **'loi climat' de 2021**. Il est important de souligner qu’il offre une vision limitée de la réglementation française, ne prenant pas en compte, par exemple, les réglementations régionales, les jurisprudences ou les lois européennes. Son objectif est de mettre en perspective des questions avec des articles spécifiques, sans prétendre fournir de réponses légales définitives."
+   pub_institutions_agent: "Cet outil interroge les documents publics de certains organismes publics. Il ne s’agit pas d’une vision exhaustive de leurs données. Si vous estimez que certains ajouts sont nécessaires, n’hésitez pas à nous les transmettre à l’adresse suivante : **[[email protected]](mailto:[email protected])**"
+   ademe_agent:
+     "Cet outil est dédié aux données de l’ADEME et évoluera en intégrant directement une base de données administrée par l’ADEME dans le cadre de son projet SofIA. Les documents actuellement disponibles couvrent des typologies variées, notamment :
+
+     - Guides à destination du grand public ;
+
+     - Rapports d’expériences sur les nouvelles technologies ;
+
+     - Études et recherches sur les impacts locaux, ainsi que des documents institutionnels (analyses commandées par la France et rapports d’activité) ;
+
+     - Plans de transition sectoriels pour les industries les plus émettrices (verre, papier, ciment, acier, aluminium, chimie, sucre)."
+   # Example >>> new_agent: "*Description of the database*"
+
+ en_description:
+   ## Please provide the English description of the same databases
+   science_agent: "This tool relies on reports from internationally recognized scientific organizations, including the **IPCC (Intergovernmental Panel on Climate Change)**, **IPBES (Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services)**, and **UNESCO**. We have selected these sources for their rigorous, consensus-based scientific approaches, ensuring the highest reliability of the answers provided."
+   law_agent: "This tool integrates information from 21 French legal codes amended by the **2021 'Climate and Resilience Law'**, a major legislative effort to address climate change in France. It is important to note that this tool provides a limited view of French legal frameworks, as it does not include regional regulations, case law, or European Union laws. Its primary objective is to contextualize user questions with relevant legal articles rather than offering definitive legal advice."
+   pub_institutions_agent: "This tool queries publicly available documents from selected French public institutions. While the coverage is not exhaustive, it provides valuable insights into these organizations’ policies and actions. If you believe additional sources should be included, please feel free to send your suggestions to: **[email protected]**."
+   ademe_agent:
+     "This tool is dedicated to data from ADEME (French Agency for Ecological Transition), a key institution supporting France’s environmental and energy transition policies. It will soon evolve to incorporate a database directly managed by ADEME as part of their SofIA project. Currently, the documents available cover a broad range of topics, including:
+
+     - **Public guides**: Educational materials aimed at the general public.
+
+     - **Technological pilot projects**: Reports on experimental applications of emerging technologies.
+
+     - **Research and studies**: Analyses of local environmental impacts and institutional reports commissioned by the French government.
+
+     - **Sector-specific transition plans**: Detailed strategies for decarbonizing high-emission industries such as glass, paper, cement, steel, aluminum, chemicals, and sugar production."
+   # Example >>> new_agent: "*New database's English description*"
+
+ source_mapping:
+   ## Please provide the name you would like to give each database inside the Spinoza tool
+   science_agent: "Science"
+   law_agent: "Loi"
+   pub_institutions_agent: "Organismes publics"
+   ademe_agent: "ADEME"
+   # Example >>> new_agent: "Name of the database"
+
+ en_names:
+   ## Please provide the English equivalents of the database names
+   science_agent: "Science"
+   law_agent: "Law"
+   pub_institutions_agent: "Public Institutions"
+   ademe_agent: "ADEME"
+   # Example >>> new_agent: "Database's name"
+
+ ## Please specify the Groq model that the agents should use to answer questions
+ groq_model_name: "llama-3.3-70b-versatile" # llama-3.1-8b-instant / llama-3.3-70b-versatile / llama-3.1-70b-versatile / llama-3.2-3b-preview
+
+ query_preprompt: "query: "
+ passage_preprompt: "passage: "
+ embedding_model: "intfloat/multilingual-e5-base"
+ num_document_retrieved: 5
+ min_similarity: 0.05
+
+ ## Chat API
+ user_token: "user"
+ assistant_token: "assistant"
+ system_token: "system"
+ stop_token: "" ## unused in chat mode
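
Every agent key must appear consistently across `databases_pickle_files`, `prompts_path_list`, `tabs`, `en_description`, `source_mapping`, and `en_names`; a missing entry surfaces as a `KeyError` at startup. A minimal consistency-check sketch (not part of the repository), assuming the config path used above:

```python
# Minimal sketch: verify each agent key is present in every config section.
import yaml

with open("./spinoza_project/config.yaml") as f:
    config = yaml.full_load(f)

agents = set(config["databases_pickle_files"])
for section in ("prompts_path_list", "tabs", "en_description", "source_mapping", "en_names"):
    missing = agents - set(config[section])
    if missing:
        raise ValueError(f"agents missing from '{section}': {missing}")
print("config OK for agents:", sorted(agents))
```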
spinoza_project/database_building/Data.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76da75de415d4452746e75e3748617f6be00d3175ff078324a6749c0bcd51f9c
+ size 571471
spinoza_project/database_building/doc_parsing_and_vectorization.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
spinoza_project/hackathon/Data.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3450f9f6135001d44470f81de741d331f29aa0509695b257637b8c82c5df40a3
+ size 570556
spinoza_project/prompt_ADEME.yaml ADDED
@@ -0,0 +1,79 @@
+ role_instruction:
+   prompt:
+     [
+       "You are Spinoza Fact Checker, an AI Assistant developed by Ekimetrics.",
+       "Your role is to answer questions factually based on the documents provided to you, which may contain opinions, recommendations, or analyses.",
+       "You act as a sustainability journalist, providing structured, factual, and concise responses while citing your sources and mentioning linked articles.",
+       "If a question is not related to climate, do not answer it and state that the question falls outside your expertise."
+     ]
+   type: "system"
+
+ source_prompt:
+   prompt:
+     [
+       "Below are several documents formatted as: Doc X \n textual content.",
+       "<documents>",
+       "{sources}",
+       "</documents>",
+       "",
+       "Treat the textual content as providing relevant opinions, recommendations, or analyses.",
+       "For each fact or analysis used in your response, reference the source clearly (e.g., [Doc 2]: some analysis from Doc 2).",
+       "Incorporate all the relevant content from the documents to provide a well-rounded response.",
+       "Disregard any information that is irrelevant to the question at hand.",
+       "If you do not have relevant documents or they lack context, state that you don't have enough context to answer.",
+       "If the question is not related to climate, explain that it falls outside your scope of expertise."
+     ]
+   type: "instruction"
+
+ question_answering_prompt:
+   prompt:
+     [
+       "Répondez à la question suivante : {question}.",
+       "Si votre réponse est basée sur un article spécifique, formulez-la de la manière suivante : 'Selon l'article [nom de l'article], [réponse]'.",
+       "Si la réponse s'appuie sur plusieurs articles, utilisez un point par article.",
+       "Citez les passages pertinents des sources lorsque cela est nécessaire.",
+       "Si la question n'est pas liée à des questions environnementales, dites que vous ne pouvez pas y répondre en raison du manque de pertinence des sources fournies.",
+       "Si la question n'est pas liée à l'environnement, dites explicitement que la question ne relève pas de votre domaine d'expertise.",
+       "Répondez impérativement en {language}."
+     ]
+   type: "prompt"
+
+
+ reformulation_prompt:
+   prompt:
+     [
+       "Reformulez le message de l'utilisateur en une question autonome et concise.",
+       "Reformulez strictement en {language}.",
+       "La question reformulée doit être claire et suffisamment précise pour interroger des textes publics provenant d'analyses.",
+       "Si pertinent, utilisez le résumé de la conversation pour ajouter du contexte.",
+       "Si la question est trop vague, reformulez-la telle qu'elle est sans faire d'hypothèses supplémentaires.",
+       "Si la question n'est pas liée au climat ou à la réglementation environnementale, indiquez qu'elle est hors de votre domaine d'expertise.",
+       "",
+       "Exemples:",
+       "---",
+       "user:",
+       "Quels sont les avis sur la taxe carbone?",
+       "",
+       "assistant:",
+       "Quels avis et recommandations sont formulés au sujet de l'application de la taxe carbone en France?",
+       "---",
+       "user:",
+       "Quelles recommandations pour l'indice de réparabilité?",
+       "",
+       "assistant:",
+       "Quelles recommandations les publications fournissent-elles au sujet de l'indice de réparabilité des produits?",
+       "---",
+       "user:",
+       "Quels enjeux autour de l'eau?",
+       "",
+       "assistant:",
+       "Quelles analyses ou avis sont formulés au sujet de la gestion de l'eau dans les publications disponibles?",
+       "---",
+       "user:",
+       "{question}",
+       "---",
+       "Si la question n'est pas liée au climat ou à la réglementation environnementale:",
+       "assistant:",
+       "La question posée ne relève pas d'enjeux environnementaux, je ne peux donc pas y répondre."
+     ]
+   type: "prompt"
spinoza_project/prompt_Loi.yaml ADDED
@@ -0,0 +1,85 @@
+ role_instruction:
+   prompt:
+     [
+       "You are Spinoza law analyst, an AI Assistant developed by Ekimetrics.",
+       "Your primary role is to provide factually accurate responses about climate change, based solely on the documents provided.",
+       "You act as a legal expert, delivering structured, factual, concise, and sourced responses.",
+       "Always quote your sources by mentioning document titles and linked articles when applicable.",
+       "Your only focus is climate change."
+     ]
+   type: "system"
+
+ source_prompt:
+   prompt:
+     [
+       "Below are several documents formatted as: Doc X \n textual content.",
+       "<documents>",
+       "{sources}",
+       "</documents>",
+       "",
+       "Treat the content of the provided documents as authoritative.",
+       "For each fact presented in your answer, reference the source explicitly (e.g., [Doc 2]: some fact from Doc 2).",
+       "Incorporate all relevant information from the documents to form a complete response.",
+       "Disregard any irrelevant facts or information that do not directly pertain to the question.",
+       "If no relevant documents are provided or if they lack sufficient context, try to present what could be relevant from a climate perspective.",
+       "If you do so, disclose at the beginning of your answer that you are opening onto a related subject, and remind the user that you rely only on climate-related data, so your answer may be incomplete.",
+     ]
+   type: "instruction"
+
+ question_answering_prompt:
+   prompt:
+     [
+       "Respond to the following question: {question}.",
+       "If your response is based on a specific article, phrase it as follows: 'Selon l'article [name of the article], [answer].'",
+       "When the answer references multiple articles, structure the response with bullet points, each citing the corresponding article.",
+       "Quote text from the sources when relevant.",
+       "If the question is unrelated to legal matters, explicitly state that you cannot provide an answer based on the given sources.",
+       "Answer imperatively in {language}.",
+     ]
+   type: "prompt"
+
+ reformulation_prompt:
+   prompt:
+     [
+       "Reformulez le message de l'utilisateur en une question autonome et concise en {language}.",
+       "La question reformulée doit être claire et suffisamment précise pour interroger des textes juridiques concernant la réglementation environnementale.",
+       "Si pertinent, utilisez le résumé de la conversation pour ajouter du contexte.",
+       "Si la question est trop éloignée du changement climatique, reformulez-la de manière à ce qu'elle concerne également le changement climatique.",
+       "Si la question est trop vague, reformulez-la telle qu'elle est sans faire d'hypothèses supplémentaires.",
+       "",
+       "Exemples:",
+       "---",
+       "user:",
+       "Applique-t-on une taxe carbone?",
+       "",
+       "assistant:",
+       "La taxe carbone est-elle appliquée en France?",
+       "---",
+       "user:",
+       "Quelle obligation produit l'indice de réparabilité des produits?",
+       "",
+       "assistant:",
+       "Quelles sont les exigences légales concernant l'indice de réparabilité des produits?",
+       "---",
+       "user:",
+       "Quelles obligations de faire un bilan carbone?",
+       "",
+       "assistant:",
+       "Quand doit-on réaliser un bilan des émissions de gaz à effet de serre?",
+       "---",
+       "user:",
+       "Quels enjeux autour de l'eau?",
+       "",
+       "assistant:",
+       "Quels articles réglementent la consommation d'eau et que stipulent-ils?",
+       "---",
+       "user:",
+       "Peut on se baigner dans la gironde ?",
+       "",
+       "assistant:",
+       "Quelles sont les réglementations concernant la baignade en Gironde?",
+       "---",
+       "user:",
+       "{question}",
+     ]
+   type: "prompt"
spinoza_project/prompt_Organismes_publics.yaml ADDED
@@ -0,0 +1,78 @@
+ role_instruction:
+   prompt:
+     [
+       "You are Spinoza Fact Checker, an AI Assistant developed by Ekimetrics.",
+       "Your role is to answer questions factually based on the documents provided to you, which may contain opinions, recommendations, or analyses.",
+       "You act as a journalist, providing structured, factual, and concise responses while citing your sources and mentioning linked articles.",
+       "If a question is not related to climate, do not answer it and state that the question falls outside your expertise."
+     ]
+   type: "system"
+
+ source_prompt:
+   prompt:
+     [
+       "Below are several documents formatted as: Doc X \n textual content.",
+       "<documents>",
+       "{sources}",
+       "</documents>",
+       "",
+       "Treat the textual content as providing relevant opinions, recommendations, or analyses.",
+       "For each fact or analysis used in your response, reference the source clearly (e.g., [Doc 2]: some analysis from Doc 2).",
+       "Incorporate all the relevant content from the documents to provide a well-rounded response.",
+       "Disregard any information that is irrelevant to the question at hand.",
+       "If you do not have relevant documents or they lack context, state that you don't have enough context to answer.",
+       "If the question is not related to climate, explain that it falls outside your scope of expertise."
+     ]
+   type: "instruction"
+
+ question_answering_prompt:
+   prompt:
+     [
+       "Répondez à la question suivante : {question}.",
+       "Si votre réponse est basée sur un article spécifique, formulez-la de la manière suivante : 'Selon l'article [nom de l'article], [réponse]'.",
+       "Si la réponse s'appuie sur plusieurs articles, utilisez un point par article.",
+       "Citez les passages pertinents des sources lorsque cela est nécessaire.",
+       "Si la question n'est pas liée à des questions environnementales, dites que vous ne pouvez pas y répondre en raison du manque de pertinence des sources fournies.",
+       "Si la question n'est pas liée au climat ou à la réglementation environnementale, dites explicitement que la question ne relève pas de votre domaine d'expertise.",
+       "Répondez impérativement en {language}."
+     ]
+   type: "prompt"
+
+
+ reformulation_prompt:
+   prompt:
+     [
+       "Reformulez le message de l'utilisateur en une question autonome et concise en {language}.",
+       "La question reformulée doit être claire et suffisamment précise pour interroger des textes publics provenant d'analyses.",
+       "Si pertinent, utilisez le résumé de la conversation pour ajouter du contexte.",
+       "Si la question est trop vague, reformulez-la telle qu'elle est sans faire d'hypothèses supplémentaires.",
+       "Si la question n'est pas liée au climat ou à la réglementation environnementale, indiquez qu'elle est hors de votre domaine d'expertise.",
+       "",
+       "Exemples:",
+       "---",
+       "user:",
+       "Quels sont les avis sur la taxe carbone?",
+       "",
+       "assistant:",
+       "Quels avis et recommandations sont formulés au sujet de l'application de la taxe carbone en France?",
+       "---",
+       "user:",
+       "Quelles recommandations pour l'indice de réparabilité?",
+       "",
+       "assistant:",
+       "Quelles recommandations les publications fournissent-elles au sujet de l'indice de réparabilité des produits?",
+       "---",
+       "user:",
+       "Quels enjeux autour de l'eau?",
+       "",
+       "assistant:",
+       "Quelles analyses ou avis sont formulés au sujet de la gestion de l'eau dans les publications disponibles?",
+       "---",
+       "user:",
+       "{question}",
+       "---",
+       "Si la question n'est pas liée au climat ou à la réglementation environnementale:",
+       "assistant:",
+       "La question posée ne relève pas d'enjeux environnementaux, je ne peux donc pas y répondre."
+     ]
+   type: "prompt"
spinoza_project/prompt_Science.yaml ADDED
@@ -0,0 +1,71 @@
+ role_instruction:
+   prompt:
+     [
+       "You are Spinoza Fact Checker, an AI Assistant developed by Ekimetrics.",
+       "Your role is to answer questions factually based on the sources provided to you.",
+       "You act as a scientific expert, providing structured, factual, and concise responses while citing your sources.",
+       "If a question is not related to science, state that it is outside your expertise and do not provide an answer."
+     ]
+   type: "system"
+
+ source_prompt:
+   prompt:
+     [
+       "Here are some documents formatted as: Doc X \n textual content.",
+       "<documents>",
+       "{sources}",
+       "</documents>",
+       "",
+       "Treat the textual content as authoritative.",
+       "Reference the source of each fact before stating it (e.g., [Doc 2]: some fact from Doc 2).",
+       "Incorporate all the relevant information from the documents to form your response.",
+       "Ignore irrelevant information.",
+       "If you have no documents or if they are irrelevant, state that you don't have enough context to answer.",
+       "If the question is not related to science, state that it is outside your expertise."
+     ]
+   type: "instruction"
+
+ question_answering_prompt:
+   prompt:
+     [
+       "Answer the following question: {question}.",
+       "Use bullet points to organize your answer.",
+       "Translate your answer into {language} and return only that translated answer.",
+       "Avoid returning any text that is not in {language}.",
+     ]
+   type: "prompt"
+
+ reformulation_prompt:
+   prompt: [
+     "Reformulate the following user message to be a short standalone question.",
+     "The question should be asked in {language}.",
+     "The question is related to science.",
+     "Use the conversation summary to add context if relevant.",
+     "If the question is too vague, just state it as it is.",
+     "This reformulated question will be used to retrieve scientific documents or data.",
+     "",
+     "Examples:",
+     "---",
+     "user:",
+     "La technologie nous sauvera-t-elle?",
+     "",
+     "assistant:",
+     "Can technology help humanity mitigate the effects of climate change?",
+     "---",
+     "user:",
+     "Quelles sont nos réserves en combustibles fossiles?",
+     "",
+     "assistant:",
+     "What are the current reserves of fossil fuels and how long will they last?",
+     "---",
+     "user:",
+     "Quels sont les principaux facteurs du changement climatique?",
+     "",
+     "assistant:",
+     "What are the main causes of climate change in the last century?",
+     "---",
+     "user:",
+     "{question}",
+     "",
+   ]
+   type: "prompt"
spinoza_project/prompt_Spinoza.yaml ADDED
@@ -0,0 +1,22 @@
+ prompt:
+   [
+     "You are a factual journalist specialized in summarizing expert answers from technical, legal, and media sources.",
+     "Your role is to provide a concise and fact-based synthesis based on the following question:",
+     "{question}",
+     "",
+     "And the expert answers provided:",
+     "{answers}",
+     "",
+     "- Always use the [Doc i] format to reference sources, where 'i' is the document number. This must be done for every fact, even when multiple documents agree, for example: [Doc 1, Doc 2, Doc 3].",
+     "- Do not mention sources that provide no relevant information. Focus exclusively on sources that contain useful information.",
+     "- Do not mention that a source lacks information or context. Only present a source when it provides relevant information.",
+     "- If different sources provide contrasting information, perspectives or interpretations, highlight these differences factually and objectively.",
+     "- When using legal answers, track and reference specific articles of the law.",
+     "- Structure the synthesis with clear sections, using markdown with **bold** headlines and good spacing for readability.",
+     "- Start by presenting any **contradictions** in the sources (if any). If no contradictions exist, just pass to the next part.",
+     "- Then provide a **general summary** of the common points across the sources.",
+     "- End by detailing any **interesting elements** that may be useful for journalists writing an article. Include direct quotes when necessary.",
+     "- Always suggest possible follow-up questions or angles that journalists could explore based on the synthesis.",
+     "- Do not rely on any form of memory across interactions; treat each question independently. If follow-up questions are suggested, incorporate all necessary context into them.",
+     "- Answer imperatively in {language}.",
+   ]
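
The `{question}`, `{answers}`, and `{language}` slots above are filled at synthesis time. A minimal sketch of that substitution (the sample values are invented):

```python
# Sketch: fill the synthesis template's slots with invented sample values.
import yaml

with open("./spinoza_project/prompt_Spinoza.yaml") as f:
    lines = yaml.full_load(f)["prompt"]

synthesis_prompt = "\n".join(lines).format(
    question="Quelles sont les causes du recul du trait de côte ?",
    answers="[Doc 1] L'érosion côtière s'accélère... [Doc 6] Selon l'article L. 321-13...",
    language="French",
)
print(synthesis_prompt.splitlines()[0])
```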
spinoza_project/prompt_Template.yaml ADDED
@@ -0,0 +1,115 @@
+
+ role_instruction:
+   # Specify the agent's domain of specialization by replacing **agent_expertise_domain** with the actual domain of expertise
+   # Do not hesitate to add new instructions if you want the agent to behave in a specific way
+   prompt:
+     [
+       "You are Spinoza Fact Checker, an AI Assistant developed by Ekimetrics.",
+       "As an expert in **agent_expertise_domain**, you analyze questions and relevant passages from our specialized database.",
+       "Your role is to answer questions factually based on the sources provided to you.",
+       "You act as a **agent_expertise_domain** expert, providing structured, factual, and concise responses while citing your sources.",
+       "If a question is not related to your domain of specialization, state that it is outside your expertise and do not provide an answer."
+     ]
+   type: "system"
+
+ source_prompt:
+   prompt:
+     [
+       "Below are several documents formatted as: Doc X \n textual content.",
+       "<documents>",
+       "{sources}",
+       "</documents>",
+       "",
+       "Treat the textual content as providing relevant opinions, recommendations, or analyses.",
+       "For each fact or analysis used in your response, reference the source clearly (e.g., [Doc 2]: some analysis from Doc 2).",
+       "Incorporate all the relevant content from the documents to provide a well-rounded response.",
+       "Disregard any information that is irrelevant to the question at hand.",
+       "If you do not have relevant documents or they lack context, state that you don't have enough context to answer.",
+       "If a question is not related to your domain of specialization, explain that it falls outside your scope of expertise."
+     ]
+   type: "instruction"
+
+ question_answering_prompt:
+   prompt:
+     [
+       "Answer the following question: {question}",
+       "Answer strictly in {language}",
+       "while respecting the following guidelines:",
+       "- If the passages have useful facts or numbers, use them in your answer.",
+       "- Do not use the sentence 'Doc i says ...' to say where information came from.",
+       "- If the documents lack the information needed to answer the question, explain what in the extracts could be interesting nevertheless.",
+       "- Always suggest as a conclusion other prompts close to the original one that could lead the journalist to discover new data and information. For example, rephrase the original question, make it more precise, or change the topic of the question while remaining in the same theme. Use bullet points",
+       "- Do not just summarize each passage one by one. Group your summaries to highlight the key parts in the explanation.",
+       "- If it makes sense, use bullet points and lists to make your answers easier to understand.",
+       "- You do not need to use every passage. Only use the ones that help answer the question.",
+       "- If a specific location is mentioned in the question, make it the core of your answer and follow the //specific guidelines//",
+       "",
+       "//specific guidelines//",
+       "if [the question is open and broad] then [:",
+       "- If the documents do not have the information needed to answer the question, say that you don't have enough information to answer this question directly - it must be at the beginning of the text.",
+       "- If the documents lack the information needed to answer the question, explain what in the extracts could be interesting nevertheless.",
+       "- Start every paragraph with a question, and answer the question using different key elements taken from the sources.",
+       "- If the passages have useful facts or numbers, use them in your answer.",
+       "- When you use information from a passage, mention where it came from by using [Doc i] at the end of the sentence. i stands for the number of the document.",
+       "- Do not use the sentence 'Doc i says ...' to say where information came from.",
+       "- If the same thing is said in more than one document, you can mention all of them like this: [Doc i, Doc j, Doc k]",
+       "- Do not just summarize each passage one by one. Group your summaries to highlight the key parts in the explanation.",
+       "- If it makes sense, use bullet points and lists to make your answers easier to understand.",
+       "- You do not need to use every passage. Only use the ones that help answer the question.",
+       "- If the documents do not have the information needed to answer the question, just say you do not have enough information.",
+       "- Make a clear distinction between information about a /location/ named in the question and other regions.",
+       "  - First you must display information about the precise /location/,",
+       "  - then clearly state that you have information about /other places/,",
+       "  - then, display information about /other places/.",
+       "- Always suggest as a conclusion other prompts close to the original one that could lead the journalist to discover new data and information. For example, rephrase the original question, make it more precise, or change the topic of the question while remaining in the same theme. Use bullet points]",
+       "",
+       "if [the question is factual and precise] then [",
+       "- If the documents do not have the information needed to answer the question, say that you don't have enough information to answer this question directly - it must be at the beginning of the text.",
+       "- If the documents lack the information needed to answer the question, explain what in the extracts could be interesting nevertheless.",
+       "- Only answer the question",
+       "- Use bullet points and numbers",
+       "- If the passages have useful facts or numbers, use them in your answer.",
+       "- When you use information from a passage, mention where it came from by using [Doc i] at the end of the sentence. i stands for the number of the document.",
+       "- Do not use the sentence 'Doc i says ...' to say where information came from.",
+       "- If the same thing is said in more than one document, you can mention all of them like this: [Doc i, Doc j, Doc k]",
+       "- Do not just summarize each passage one by one. Group your summaries to highlight the key parts in the explanation.",
+       "- If it makes sense, use bullet points and lists to make your answers easier to understand.",
+       "- You do not need to use every passage. Only use the ones that help answer the question.",
+       "- If the documents do not have the information needed to answer the question, just say you do not have enough information.",
+       "- Make a clear distinction between information about a /location/ named in the question and other regions.",
+       "  - First you must display information about the precise /location/,",
+       "  - then clearly state that you have information about /other places/,",
+       "  - then, display information about /other places/.",
+       "- Always suggest as a conclusion other prompts close to the original one that could lead the journalist to discover new data and information. For example, rephrase the original question, make it more precise, or change the topic of the question while remaining in the same theme. Use bullet points]",
+       "- Answer in French"
+     ]
+   type: "prompt"
+
+ reformulation_prompt:
+   prompt: [
+     "Reformulez le message de l'utilisateur en une question autonome et concise en français, en tenant compte du contexte fourni par la question initiale.",
+     "Reformulez la question initiale strictement en {language}.",
+     "Cette question servira à rechercher des documents pertinents dans une liste d'articles de presse.",
+     "Si la question est trop vague ou ambiguë, reformulez-la pour la rendre plus précise et ainsi augmenter les chances de trouver des documents pertinents dans ce corpus.",
+     "Ajoutez des éléments contextuels si nécessaire, tout en conservant la pertinence du sujet principal.",
+     "Si la question est déjà claire, reformulez-la simplement en gardant son essence.",
+     "",
+     "Exemples:",
+     "---",
+     "user:",
+     "Quels enjeux autour de l'eau?",
+     "",
+     "assistant:",
+     "Quels articles abordent les enjeux liés à l'eau et sous quels aspects sont-ils traités?",
+     "---",
+     "user:",
+     "Quelles obligations de faire un bilan carbone?",
+     "",
+     "assistant:",
+     "Quelles sont les obligations légales liées au bilan carbone et comment ces obligations sont-elles traitées dans les articles?",
+     "---",
+     "user:",
+     "{question}",
+     "",
+   ]
+   type: "prompt"
spinoza_project/source/backend/document_store.py ADDED
@@ -0,0 +1,45 @@
+ from qdrant_client.http import models
+ import pickle
+ import torch
+ import io
+
+ device_str = "cuda:0" if torch.cuda.is_available() else "cpu"
+ device = torch.device(device_str)
+
+
+ class Device_Unpickler(pickle.Unpickler):
+
+     def find_class(self, module, name):
+         if module == "torch.storage" and name == "_load_from_bytes":
+             return lambda b: torch.load(io.BytesIO(b), map_location=device_str)
+         else:
+             return super().find_class(module, name)
+
+
+ def pickle_to_document_store(path):
+     with open(path, "rb") as f:
+         document_store = Device_Unpickler(f).load()
+     document_store.embeddings.encode_kwargs["device"] = device_str
+     return document_store
+
+
+ def get_qdrant_filters(filter_dict: dict):
+     """Build a Qdrant filter based on a filter dict.
+
+     Filter dict must use metadata fields and be formatted like:
+
+     filter_dict = {'file_name': ['file1', 'file2'], 'sub_type': ['text']}
+     """
+     return models.Filter(
+         must=[
+             models.FieldCondition(
+                 key=f"metadata.{field}",
+                 match=models.MatchAny(any=filter_dict[field]),
+             )
+             for field in filter_dict
+         ]
+     )
+
+
+ def escape_value(value):
+     return value.replace("'", "''")
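
A usage sketch for `get_qdrant_filters`, reusing the hypothetical field values from its docstring:

```python
from spinoza_project.source.backend.document_store import get_qdrant_filters

# Hypothetical metadata values, matching the docstring's format.
filter_query = get_qdrant_filters({"file_name": ["file1", "file2"], "sub_type": ["text"]})

# filter_query is a qdrant_client models.Filter; it can be passed as the
# `filter` argument of a similarity search, as perform_search() does in
# gradio_utils.py.
```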
spinoza_project/source/backend/get_prompts.py ADDED
@@ -0,0 +1,33 @@
+ from spinoza_project.source.backend.prompt_utils import SpecialTokens, make_chat_prompt
+ from langchain.prompts.chat import ChatPromptTemplate
+
+
+ def get_qa_prompts(config, prompts):
+     special_tokens = SpecialTokens(config)
+     role_instruction = make_chat_prompt(prompts["role_instruction"], special_tokens)
+     source_prompt = make_chat_prompt(prompts["source_prompt"], special_tokens)
+     # memory_prompt = make_chat_prompt(prompts["memory_prompt"], special_tokens)
+     question_answering_prompt = make_chat_prompt(
+         prompts["question_answering_prompt"], special_tokens
+     )
+     reformulation_prompt = make_chat_prompt(
+         prompts["reformulation_prompt"], special_tokens
+     )
+     # summarize_memory_prompt = make_chat_prompt(
+     #     prompts["summarize_memory_prompt"], special_tokens
+     # )
+
+     chat_qa_prompt = ChatPromptTemplate.from_messages(
+         [
+             role_instruction,
+             source_prompt,
+             # memory_prompt,
+             question_answering_prompt,
+         ]
+     )
+     chat_reformulation_prompt = ChatPromptTemplate.from_messages([reformulation_prompt])
+     # chat_summarize_memory_prompt = ChatPromptTemplate.from_messages([summarize_memory_prompt])
+     return (
+         chat_qa_prompt,
+         chat_reformulation_prompt,
+     )  # , chat_summarize_memory_prompt
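
A usage sketch wiring one agent's prompt file into chat templates, assuming the paths declared in `config.yaml`:

```python
# Sketch: build the QA and reformulation templates for the Science agent.
import yaml
from spinoza_project.source.backend.get_prompts import get_qa_prompts

with open("./spinoza_project/config.yaml") as f:
    config = yaml.full_load(f)
with open("./spinoza_project/prompt_Science.yaml") as f:
    prompts = yaml.full_load(f)

chat_qa_prompt, chat_reformulation_prompt = get_qa_prompts(config, prompts)
print(chat_qa_prompt.input_variables)  # expected to include 'language', 'question', 'sources'
```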
spinoza_project/source/backend/llm_utils.py ADDED
@@ -0,0 +1,56 @@
+ from langchain_groq import ChatGroq
+
+
+ class LLM:
+     def __init__(self, llm):
+         self.llm = llm
+         self.callbacks = []
+
+     def stream(self, prompt, prompt_arguments):
+         self.llm.streaming = True
+         streamed_content = self.llm.stream(prompt.format_messages(**prompt_arguments))
+         output = ""
+         for op in streamed_content:
+             output += op.content
+             yield output
+
+     def get_prediction(self, prompt, prompt_arguments):
+         self.llm.callbacks = self.callbacks
+         return self.llm.predict_messages(
+             prompt.format_messages(**prompt_arguments)
+         ).content
+
+     async def get_aprediction(self, prompt, prompt_arguments):
+         self.llm.callbacks = self.callbacks
+         prediction = await self.llm.apredict_messages(
+             prompt.format_messages(**prompt_arguments)
+         )
+         return prediction
+
+     async def get_apredictions(self, prompts, prompts_arguments):
+         self.llm.callbacks = self.callbacks
+         predictions = []
+         for prompt_, prompt_args_ in zip(prompts.keys(), prompts_arguments):
+             prediction = await self.llm.apredict_messages(
+                 prompts[prompt_].format_messages(**prompt_args_)
+             )
+             predictions.append(prediction.content)
+         return predictions
+
+
+ def get_llm_api(groq_model_name):
+     print("Using GROQ API")
+     return LLM(
+         ChatGroq(
+             model=groq_model_name,
+             temperature=0,
+             max_tokens=2048,
+         )
+     )
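
A hedged usage sketch, assuming `GROQ_API_KEY` is set in the environment and reusing `chat_reformulation_prompt` from the `get_qa_prompts` sketch above:

```python
from spinoza_project.source.backend.llm_utils import get_llm_api

llm = get_llm_api("llama-3.3-70b-versatile")
# llm.stream yields the accumulated answer so far, which suits a Gradio chatbot update loop.
for partial_answer in llm.stream(
    chat_reformulation_prompt,
    {"question": "Quels enjeux autour de l'eau?", "language": "French"},
):
    print(partial_answer)
```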
spinoza_project/source/backend/prompt_utils.py ADDED
@@ -0,0 +1,68 @@
+ from langchain.prompts.chat import ChatMessagePromptTemplate
+
+
+ class SpecialTokens:
+     def __init__(self, config):
+         self.user_token = config["user_token"]
+         self.assistant_token = config["assistant_token"]
+         self.system_token = config["system_token"]
+         self.stop_token = config["stop_token"]
+
+
+ def to_instruction(query, special_tokens):
+     return special_tokens.user_token + query + special_tokens.stop_token
+
+
+ def to_prompt(query, special_tokens):
+     return (
+         special_tokens.user_token
+         + query
+         + special_tokens.stop_token
+         + special_tokens.assistant_token
+     )
+
+
+ def to_system(query, special_tokens):
+     return special_tokens.system_token + query + special_tokens.stop_token
+
+
+ def make_prompt(prompt, special_tokens):
+     prompt_type = prompt["type"]
+     if prompt_type == "system":
+         return to_system("\n".join(prompt["prompt"]), special_tokens)
+     elif prompt_type == "instruction":
+         return to_instruction("\n".join(prompt["prompt"]), special_tokens)
+     elif prompt_type == "prompt":
+         return to_prompt("\n".join(prompt["prompt"]), special_tokens)
+     else:
+         return "Invalid prompt type, please check your config"
+
+
+ def to_chat_instruction(query, special_tokens):
+     return ChatMessagePromptTemplate.from_template(
+         query, role=special_tokens.user_token
+     )
+
+
+ def to_chat_system(query, special_tokens):
+     return ChatMessagePromptTemplate.from_template(
+         query, role=special_tokens.system_token
+     )
+
+
+ def to_chat_prompt(query, special_tokens):
+     return ChatMessagePromptTemplate.from_template(
+         query, role=special_tokens.user_token
+     )
+
+
+ def make_chat_prompt(prompt, special_tokens):
+     prompt_type = prompt["type"]
+     if prompt_type == "system":
+         return to_chat_system("\n".join(prompt["prompt"]), special_tokens)
+     elif prompt_type == "instruction":
+         return to_chat_instruction("\n".join(prompt["prompt"]), special_tokens)
+     elif prompt_type == "prompt":
+         return to_chat_prompt("\n".join(prompt["prompt"]), special_tokens)
+     else:
+         return "Invalid prompt type, please check your config"
spinoza_project/source/frontend/gradio_utils.py ADDED
@@ -0,0 +1,499 @@
+ import gradio as gr
+ import os
+ import yaml
+ import pandas as pd
+ from tqdm import tqdm
+ from collections import defaultdict
+ from langchain.prompts.chat import ChatPromptTemplate
+ from huggingface_hub import hf_hub_download
+ from spinoza_project.source.frontend.utils import make_html_source
+ from spinoza_project.source.backend.prompt_utils import (
+     to_chat_instruction,
+     SpecialTokens,
+ )
+ from spinoza_project.source.backend.get_prompts import get_qa_prompts
+ from spinoza_project.source.backend.document_store import pickle_to_document_store
+ from spinoza_project.source.backend.document_store import get_qdrant_filters
+
+
+ def get_config():
+     if os.getenv("EKI_OPENAI_LLM_DEPLOYMENT_NAME"):
+         with open("./spinoza_project/config.yaml") as f:
+             return yaml.full_load(f)
+     else:
+         with open("./spinoza_project/config_public.yaml") as f:
+             return yaml.full_load(f)
+
+
+ def get_prompts(config):
+     prompts = {}
+     for key, prompt_path in config["prompts_path_list"].items():
+         with open(prompt_path) as f:
+             source = config["source_mapping"][key]
+             prompts[source] = yaml.full_load(f)
+     return prompts
+
+
+ def set_prompts(prompts, config):
+     chat_qa_prompts, chat_reformulation_prompts = ({}, {})
+     for source, prompt in prompts.items():
+         chat_qa_prompt, chat_reformulation_prompt = get_qa_prompts(config, prompt)
+         chat_qa_prompts[source] = chat_qa_prompt
+         chat_reformulation_prompts[source] = chat_reformulation_prompt
+
+     return chat_qa_prompts, chat_reformulation_prompts
+
+
+ def get_assets():
+     with open("./assets/style.css", "r") as f:
+         css = f.read()
+     with open("./assets/source_information_fr.md", "r") as f:
+         source_information_fr = f.read()
+     with open("./assets/source_information_en.md", "r") as f:
+         source_information_en = f.read()
+     with open("./assets/about_contact_fr.md", "r") as f:
+         about_contact_fr = f.read()
+     with open("./assets/about_contact_en.md", "r") as f:
+         about_contact_en = f.read()
+     return (
+         css,
+         source_information_fr,
+         source_information_en,
+         about_contact_fr,
+         about_contact_en,
+     )
+
+
+ def get_qdrants(config):
+     qdrants = {}
+     tabs_to_load = list(config["source_mapping"].keys())
+
+     print("------ Default Qdrants")
+     for tab in tqdm(tabs_to_load, desc="Downloading qdrants from HF..."):
+         tab_name = config["source_mapping"][tab]
+         try:
+             if tab_name in ["Science", "Loi", "Organismes publics", "ADEME"]:
+                 file_path = hf_hub_download(
+                     repo_id=config["default_databases"],
+                     filename=config["databases_pickle_files"][tab],
+                     repo_type="dataset",
+                 )
+             else:
+                 print(f"------ Custom Qdrants: {tab_name}")
+                 file_path = hf_hub_download(
+                     repo_id=config["database_hf"],
+                     filename=config["databases_pickle_files"][tab],
+                     repo_type="dataset",
+                 )
+
+             qdrants[tab_name] = pickle_to_document_store(file_path)
+
+         except Exception as e:
+             print(f"Error occurred while loading {tab_name}: {str(e)}")
+             raise
+
+     documents = defaultdict(
+         lambda: {
+             "database": "",
+             "file_source_sub_type": "",
+             "file_author": "",
+             "file_title": "",
+             "file_date_publishing": "",
+             "chunk_count": 0,
+             "page_numbers": set(),
+             "file_filtering_modality": "",
+         }
+     )
+
+     for db_name, qdrant in tqdm(qdrants.items(), desc="Processing databases"):
+         print(f"Database: {db_name}")
+         try:
+             offset = None
+             total_processed = 0
+             batch_size = 10000
+
+             while True:
+                 scroll_result = qdrant.client.scroll(
+                     collection_name=qdrant.collection_name,
+                     limit=batch_size,
+                     offset=offset,
+                     with_payload=True,
+                     with_vectors=False,
+                 )
+
+                 if not scroll_result[0]:
+                     break
+
+                 for point in scroll_result[0]:
+                     metadata = point.payload.get("metadata", {})
+
+                     unique_fields = [
+                         metadata.get("file_title", ""),
+                         str(metadata.get("file_date_publishing", "")),
+                         metadata.get("file_name", ""),
+                         metadata.get("file_url", ""),
+                     ]
+
+                     doc_key = "_".join(field for field in unique_fields if field)
+                     if not doc_key:
+                         doc_key = f"unknown_doc_{total_processed}"
+
+                     doc_info = documents[doc_key]
+                     doc_info["database"] = db_name
+                     doc_info["file_title"] = metadata.get("file_title", "Unknown")
+                     date_str = str(metadata.get("file_date_publishing", "Unknown"))
+                     doc_info["file_date_publishing"] = (
+                         date_str[:4]
+                         if date_str[:4].isdigit() and len(date_str) >= 4
+                         else "Unknown"
+                     )
+                     doc_info["file_filtering_modality"] = metadata.get(
+                         "file_filtering_modality", "Unknown"
+                     )
+
+                     if "content_page_number" in metadata:
+                         doc_info["page_numbers"].add(metadata["content_page_number"])
+
+                     total_processed += 1
+
+                 offset = scroll_result[1]
+
+                 print(
+                     f"\rProcessing {db_name}: {total_processed} chunks processed",
+                     end="",
+                 )
+
+                 if offset is None:
+                     break
+
+         except Exception as e:
+             print(f"\nError while processing {db_name}: {str(e)}")
+             continue
+
+     df_documents = pd.DataFrame(
+         [
+             {
+                 "Source": info["database"],
+                 "Title": info["file_title"],
+                 "Page range": len(info["page_numbers"]),
+                 "Filter": info["file_filtering_modality"],
+                 "Publishing date": info["file_date_publishing"],
+             }
+             for info in documents.values()
+         ]
+     )
+
+     return qdrants, df_documents
+
+
+ def get_theme():
+     return gr.themes.Base(
+         primary_hue="blue",
+         secondary_hue="red",
+         font=[
+             gr.themes.GoogleFont("Poppins"),
+             "ui-sans-serif",
+             "system-ui",
+             "sans-serif",
+         ],
+     )
+
+
+ def get_init_prompt(lang="en"):
+     prompt = """
+ Hello, I am Spinoza, a conversational assistant specialized in climate-related topics, designed to support you in your journalistic endeavors. I will answer your climate-related questions **based on the provided sources**.
+
+ ⚠️ Limitations
+ *Please note that this questioning system is in its early stages, and it may occasionally provide irrelevant answers. If you are not satisfied with the response, please ask a more specific question or share your feedback to help us improve the system.*
+
+ What would you like to learn?
+ """
+     if lang == "fr":
+         prompt = """
+ Bonjour, je suis Spinoza, un assistant conversationnel expert sur le climat conçu pour vous aider dans votre parcours journalistique. Je répondrai à vos questions en lien avec le climat en me basant **sur les sources fournies**.
+
+ ⚠️ Limitations
+ *Veuillez noter que ce système de questionnement est à un stade précoce, il n'est pas parfait et peut parfois donner des réponses non pertinentes. Si vous n'êtes pas satisfait de la réponse, veuillez poser une question plus spécifique ou signaler vos commentaires pour nous aider à améliorer le système.*
+
+ Que voulez-vous apprendre ?
+ """
+
+     return prompt.strip()
+
+
+ def get_synthesis_prompt(config):
+     special_tokens = SpecialTokens(config)
+     with open("./spinoza_project/prompt_Spinoza.yaml", "r") as f:
+         synthesis_template = f.read()
+
+     synthesis_prompt = to_chat_instruction(synthesis_template, special_tokens)
+     synthesis_prompt_template = ChatPromptTemplate.from_messages([synthesis_prompt])
+
+     return synthesis_prompt_template
+
+
+ def zip_longest_fill(*args, fillvalue=None):
+     # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
+     iterators = [iter(it) for it in args]
+     num_active = len(iterators)
+     if not num_active:
+         return
+
+     cond = True
+     fillvalues = [fillvalue] * len(iterators)
+     while cond:
+         values = []
+         for i, it in enumerate(iterators):
+             try:
+                 value = next(it)
+                 if not value:
+                     value = next(it)
+             except StopIteration:
+                 value = fillvalues[i]
+             values.append(value)
+
+         new_cond = False
+         for i, elt in enumerate(values):
+             if elt != fillvalues[i]:
+                 new_cond = True
+         cond = new_cond
+
+         fillvalues = values.copy()
+         yield tuple(values)
+
+
+ def start_agents(lang_component):
+     lang = lang_component.value if hasattr(lang_component, "value") else lang_component
+     if lang == "fr":
+         gr.Info(message="Les agents et Spinoza démarrent leurs analyses...", duration=3)
+         return [
+             (
+                 None,
+                 "J'attends que tous les agents aient terminé pour générer une réponse...",
+             )
+         ], gr.update(
+             label=get_text("source_filter_label", lang),
+             elem_id="filter-component",
+             interactive=False,
+         )
+
+     elif lang == "en":
+         gr.Info(
+             message="The agents and Spinoza are starting their analyses...", duration=3
+         )
+         return [
+             (
+                 None,
+                 "I am waiting for all the agents to finish before generating a response...",
+             )
+         ], gr.update(
+             label=get_text("source_filter_label", lang),
+             elem_id="filter-component",
+             interactive=False,
+         )
+
+
+ def end_agents(lang_component):
+     lang = lang_component.value if hasattr(lang_component, "value") else lang_component
+     if lang == "fr":
+         gr.Info(
+             message="Les agents et Spinoza ont fini de répondre à votre question",
+             duration=3,
+         )
+         return gr.update(
+             label=get_text("source_filter_label", lang),
+             elem_id="filter-component",
+             interactive=True,
+         )
+
+     elif lang == "en":
+         gr.Info(
+             message="The agents and Spinoza have finished answering your question.",
+             duration=3,
+         )
+         return gr.update(
+             label=get_text("source_filter_label", lang),
+             elem_id="filter-component",
+             interactive=True,
+         )
+
+
+ def next_call():
+     return
+
+
+ def format_question(question):
+     return f"{question}"
+
+
+ def parse_question(question):
+     x = question.replace("<p>", "").replace("</p>\n", "")
+     if "### " in x:
+         return x.split("### ")[1]
+     return x
+
+
+ def reformulate(language, llm, chat_reformulation_prompts, question, tab, config):
+     if tab in list(config["tabs"].keys()):
+         return llm.stream(
+             chat_reformulation_prompts[config["source_mapping"][tab]],
+             {
+                 "question": parse_question(question),
+                 "language": language,
+             },
+         )
+     else:
+         return iter([None] * 5)
+
+
+ def add_question(question):
+     return question
+
+
+ def answer(language, llm, chat_qa_prompts, question, source, tab, config):
+     if tab in list(config["tabs"].keys()):
+         if len(source) < 10:
+             return iter(["Aucune source trouvée, veuillez reformuler votre question"])
+         else:
+             return llm.stream(
+                 chat_qa_prompts[config["source_mapping"][tab]],
+                 {
+                     "question": parse_question(question),
+                     "sources": source.replace("<p>", "").replace("</p>\n", ""),
+                     "language": language,
+                 },
+             )
+     else:
+         return iter([None] * 5)
+
+
+ def perform_search(search_object, question, k, filter_query, config):
+     if search_object is None:
+         return []
+
+     cleaned_question = question.replace("<p>", "").replace("</p>\n", "")
+     if config["query_preprompt"]:
+         search_query = f"{config['query_preprompt']}{cleaned_question}"
+     else:
+         search_query = cleaned_question
+
+     # For qdrant objects
+     try:
+         results = search_object.similarity_search_with_relevance_scores(
+             search_query, k=k, filter=filter_query if filter_query else {}
+         )
+         return results
+
+     except Exception as e:
+         print(f"Error during search for {search_object}: {str(e)}")
+         return []
+
+
+ def get_sources(questions, filters, qdrants, config):
+     k = config["num_document_retrieved"]
+     min_similarity = config["min_similarity"]
+     formatted = []
+     text = [""] * len(config["tabs"])
+
+     for i, (question, tab) in enumerate(zip(questions[0], list(config["tabs"].keys()))):
+         try:
+             search_object = qdrants[config["source_mapping"][tab]]
+
+             if tab in filters:
+                 # if qdrant object
+                 filter_query = get_qdrant_filters(filters[tab])
+             else:
+                 filter_query = None
+
+             sources = perform_search(search_object, question, k, filter_query, config)
+
+             sources = [(doc, score) for doc, score in sources if score >= min_similarity]
+
+             formatted.extend(
+                 [
+                     make_html_source(source[0], j, source[1], config)
+                     for j, source in zip(range(k * i + 1, k * (i + 1) + 1), sources)
+                 ]
+             )
+             text[i] = "\n\n".join(
+                 [
+                     f"Doc {str(j)} with source type {source[0].metadata.get('file_source_type')}:\n"
+                     + source[0].page_content
+                     for j, source in zip(range(k * i + 1, k * (i + 1) + 1), sources)
+                 ]
+             )
+
+         except KeyError:
+             continue
+
+     formatted = "".join(formatted)
+     return formatted, text
+
+
+ TRANSLATIONS = {
+     "fr": {
+         "ask_placeholder": "Posez votre question ici !",
+         "init_prompt": get_init_prompt("fr"),
+         "source_filter_label": "Paramètres des Agents",
+         "source_filter_title": "Filtrage des sources par agent.",
+         "source_filter_subtitle": "Configurez les filtres à appliquer sur les documents afin de raffiner les bases de données utilisées indépendamment par les différents agents.",
+         "question_filter": "Veuillez choisir les éléments sur lesquels appliquer le filtrage.",
+         "source_informatation_label": "Détails des Données",
+         "acc_info_label": "Informations détaillées",
+         "acc_info_desc": "Vue d'ensemble sur les sources utilisées par la solution.",
+         "display_info_desc": "Vue d'ensemble sur toutes les bases de données utilisées par la solution (à titre informatif).",
+         "contact_label": "À propos & Contact",
+     },
+     "en": {
+         "ask_placeholder": "Ask me anything here!",
+         "init_prompt": get_init_prompt("en"),
+         "source_filter_label": "Agent Data Settings",
+         "source_filter_title": "Sources filtering by agent.",
+         "source_filter_subtitle": "Configure document filters for each agent to refine their search scope.",
+         "question_filter": "Choose the items you would like to filter on...",
+         "source_informatation_label": "Data Details Overview",
+         "acc_info_label": "Detailed information",
+         "acc_info_desc": "Brief overview of sources used by the solution.",
+         "display_info_desc": "Overview of all databases used by the solution for informational purposes.",
+         "contact_label": "About & Contact",
+     },
+ }
+
+
+ def update_translation(list_tabs, config):
+     for elt in list_tabs:
+         qa_key = f"agent_{config['source_mapping'][elt]}_qa"
+         flt_key = f"agent_{config['source_mapping'][elt]}_flt"
+         db_key = f"agent_{config['source_mapping'][elt]}_tab"
+         desc_key = f"{config['source_mapping'][elt]}_desc"
+
+         if qa_key not in TRANSLATIONS["fr"]:
+             qa_translations = {
+                 "fr": f"Agent {config['source_mapping'][elt]}",
+                 "en": f"{config['en_names'][elt]} Agent",
+             }
+             db_translations = {
+                 "fr": f"Rubrique {config['source_mapping'][elt]}",
+                 "en": f"{config['en_names'][elt]} Database",
+             }
+             desc_translations = {
+                 "fr": config["tabs"][elt],
+                 "en": config["en_description"][elt],
+             }
+             for lang in TRANSLATIONS:
+                 TRANSLATIONS[lang][qa_key] = qa_translations.get(lang, qa_translations["fr"])
+                 TRANSLATIONS[lang][flt_key] = qa_translations.get(lang, qa_translations["fr"])
+                 TRANSLATIONS[lang][db_key] = db_translations.get(lang, db_translations["fr"])
+                 TRANSLATIONS[lang][desc_key] = desc_translations.get(lang, desc_translations["fr"])
+
+
+ def get_text(key, language):
+     return TRANSLATIONS[language][key]
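
`zip_longest_fill` is what keeps several agent streams advancing together: once a stream is exhausted, it re-yields that stream's last value instead of a constant fill, so slower agents stay visible in the UI. A small trace with invented values:

```python
from spinoza_project.source.frontend.gradio_utils import zip_longest_fill

slow = iter(["s1", "s2", "s3"])
fast = iter(["f1"])
for step in zip_longest_fill(slow, fast, fillvalue=None):
    print(step)
# ('s1', 'f1'), ('s2', 'f1'), ('s3', 'f1'), then ('s3', 'f1') once more,
# since the generator only stops after a round in which no stream advanced.
```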
spinoza_project/source/frontend/utils.py ADDED
@@ -0,0 +1,172 @@
+ from queue import SimpleQueue, Empty
+ from threading import Thread
+ from dotenv import load_dotenv
+ import re
+ import time
+ from langchain.callbacks.base import BaseCallbackHandler
+
+ job_done = object()  # signals the processing is done
+
+
+ class StreamingGradioCallbackHandler(BaseCallbackHandler):
+     """Callback handler for streaming. Only works with LLMs that support streaming."""
+
+     def __init__(self, q):
+         self.q = q
+
+     def on_llm_start(self, serialized, prompts, **kwargs) -> None:
+         """Run when LLM starts running. Drains any leftover tokens from the queue."""
+         while not self.q.empty():
+             try:
+                 self.q.get(block=False)
+             except Empty:
+                 continue
+
+     def on_llm_new_token(self, token, **kwargs) -> None:
+         """Run on new LLM token. Only available when streaming is enabled."""
+         self.q.put(token)
+
+     def on_llm_end(self, response, **kwargs) -> None:
+         """Run when LLM ends running."""
+         self.q.put(job_done)
+
+     def on_llm_error(self, error, **kwargs) -> None:
+         """Run when LLM errors."""
+         self.q.put(job_done)
+
+
+ def add_gradio_streaming(llm):
+     q = SimpleQueue()
+     llm.callbacks = [StreamingGradioCallbackHandler(q)]
+     return llm, q
+
+
+ def gradio_stream(llm, prompt, q):
+     thread = Thread(target=llm.predict, kwargs={"text": prompt})
+     thread.start()
+     text = ""
+     while True:
+         next_token = q.get(block=True)  # Blocks until an input is available
+         if next_token is job_done:
+             break
+         text += next_token
+         time.sleep(0.03)
+         yield text
+     thread.join()
+
+
+ def get_source_link(metadata):
+     return metadata["file_url"] + f"#page={metadata['content_page_number'] + 1}"
+
+
+ def make_html_source(source, i, score, config):
+     meta = source.metadata
+     if meta["file_source_type"] == "AFP":
+         return f"""
+     <div class="card" id="doc{i}">
+         <div class="card-content">
+             <h2>Doc {i} - {meta['file_title']} - {meta['file_type']} AFP</h2>
+             <p>{source.page_content}</p>
+         </div>
+         <div class="card-footer">
+             <span>{meta['file_source_type']}</span>
+             <span>Relevance Score : {round(100*score,1)}%</span>
+         </div>
+     </div>
+     """
+
+     if meta["file_source_type"] == "Presse":
+         if meta["file_url"] != "none":
+             return f"""
+     <div class="card" id="doc{i}">
+         <div class="card-content">
+             <h2>Doc {i} - {meta['file_title']} - {meta['file_publisher']}</h2>
+             <p>{source.page_content}</p>
+         </div>
+         <div class="card-footer">
+             <span>{meta['file_source_type']}</span>
+             <span>Relevance Score : {round(100*score,1)}%</span>
+             <a href={meta['file_url']} target="_blank">
+                 <span role="img" aria-label="Open PDF">🔗</span>
+             </a>
+         </div>
+     </div>
+     """
+         else:
+             return f"""
+     <div class="card" id="doc{i}">
+         <div class="card-content">
+             <h2>Doc {i} - {meta['file_title']} - {meta['file_publisher']}</h2>
+             <p>{source.page_content}</p>
+         </div>
+         <div class="card-footer">
+             <span>{meta['file_source_type']}</span>
+             <span>Relevance Score : {round(100*score,1)}%</span>
+         </div>
+     </div>
+     """
+
+     if meta["file_url"]:
+         return f"""
+     <div class="card" id="doc{i}">
+         <div class="card-content">
+             <h2>Doc {i} - {meta['file_title']} - Page {meta['content_page_number'] + 1}</h2>
+             <p>{source.page_content.replace(config["passage_preprompt"], "")}</p>
+         </div>
+         <div class="card-footer">
+             <span>{meta['file_source_type']}</span>
+             <span>Relevance Score : {round(100*score,1)}%</span>
+             <a href="{get_source_link(meta)}" target="_blank">
+                 <span role="img" aria-label="Open PDF">🔗</span>
+             </a>
+         </div>
+     </div>
+     """
+     else:
+         return f"""
+     <div class="card" id="doc{i}">
+         <div class="card-content">
+             <h2>Doc {i} - {meta['file_title']} - Page {meta['content_page_number'] + 1}</h2>
+             <p>{source.page_content.replace(config["passage_preprompt"], "")}</p>
+         </div>
+         <div class="card-footer">
+             <span>{meta['file_source_type']}</span>
+             <span>Relevance Score : {round(100*score,1)}%</span>
+         </div>
+     </div>
+     """
+
+
+ def parse_output_llm_with_sources(output):
+     content_parts = re.split(
+         r"([\[(]?(?:(?:Doc|doc|Document|document)\s*\d+(?:\s*,\s*(?:Doc|doc|Document|document)\s*\d+)*)[\])]?)",
+         output,
+     )
+     parts = []
+     for part in content_parts:
+         if part.lower().strip("[]()").startswith("doc"):
+             subpart = part.strip("[]()")
+             subpart = subpart.lower().replace("document", "").replace("doc", "").strip()
+             subpart = f"""<a href="#doc{subpart}" class="a-doc-ref" target="_self"><span class='doc-ref'><sup>{subpart}</sup></span></a>"""
+             parts.append(subpart)
+         else:
+             parts.append(part)
+     content_parts = "".join(parts)
+
+     return content_parts
+
+
+ def clear_text_box(textbox):
+     return ""
+
+
+ def add_text(chatbot, text):
+     chatbot = chatbot + [(text, None)]
+     return chatbot, text
+
+
+ def init_env():
+     try:
+         load_dotenv()
+     except Exception:
+         pass
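
Finally, a sketch of the citation rewriting performed by `parse_output_llm_with_sources`, applied to an invented answer:

```python
from spinoza_project.source.frontend.utils import parse_output_llm_with_sources

html = parse_output_llm_with_sources("Sea levels are rising faster [Doc 2].")
print(html)
# Sea levels are rising faster <a href="#doc2" class="a-doc-ref"
# target="_self"><span class='doc-ref'><sup>2</sup></span></a>.
# The "#doc2" anchor matches the id="doc2" card emitted by make_html_source.
```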