Here's a brief introduction to the data sources accessible by the different agents.

  1. Science: This tool draws on IPCC and IPBES reports.

  2. Law: This tool is based on French law and includes 21 "codes" that were modified by the 2021 "Climate Law".

  3. Public Organizations: This tool queries the French national low-carbon strategy (SNBC).

  4. ADEME: This tool is dedicated to ADEME data. We selected the following categories of reports:

    • Guides made available to the general public
    • Experience reports on new technologies
    • Studies and research on local impacts, institutional documents (analyses commissioned by France & activity reports)
    • Sectoral transition plans for the highest-emitting industrial sectors (glass, paper, cement, steel, aluminum, chemistry, sugar)
  5. Press: In 2023, hundreds of thousands of articles from 212 press titles were analyzed to identify those dedicated to the ecological transition. A documentary query of more than 300 keywords selected articles that mention these terms in the title, header, or subheadings, or several times in the text, so that the retained articles focus on the ecological transition rather than merely mentioning it. After deduplication and proportional distribution among media groups, articles were selected at random, without regard to size, format, or content, for a total of 28,450 articles.

  6. AFP: More than 700 AFP documents were also collected:

    • References and boxes: These educational formats average 400 to 600 words. They are structured in 3 to 5 subsections and aim to explain a current event clearly and concisely.
    • Dispatches: These articles are written by AFP and cover real-time news, following an inverted pyramid approach (essential information first). Their length varies from a few words ("alert") to about 600 to 700 words for more detailed articles ("general paper").
    • Fact-checking: Verification of facts related to current events.
    • General papers

About the relevance score: it is a metric that evaluates how relevant each retrieved document is to a given query within a vector store. Because documents are stored as vectors, the score indicates how closely a document aligns with the query in terms of vector similarity.

Here's how it generally works:

  • Vector Representation: Documents and the query are converted into vectors in a vector space.
  • Similarity Calculation: A similarity measure (such as dot product or cosine similarity) is used to compare each document vector with the query vector.
  • Relevance Score: The result of this comparison is the relevance score, which indicates how relevant each document is to the query.

A higher score means the document is more relevant to the query. This allows ranking retrieved documents based on their relevance.
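
The scoring and ranking steps described above can be illustrated with a minimal sketch. It assumes cosine similarity as the measure, which is only one of the options mentioned. The three-dimensional vectors, the document names, and the helper functions `cosine_similarity` and `rank_by_relevance` are hypothetical and do not reflect the actual embeddings or vector store used by the agents.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_a * norm_b)


def rank_by_relevance(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
query = [1.0, 0.0, 1.0]
docs = {
    "ipcc_report": [0.9, 0.1, 0.8],
    "snbc_strategy": [0.2, 0.9, 0.1],
    "afp_dispatch": [0.5, 0.5, 0.5],
}

ranking = rank_by_relevance(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy example, the document whose vector points in nearly the same direction as the query vector receives the highest score and is listed first.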