# Sorbobot: Expert Finder Chatbot Documentation

## Overview

Sorbobot is a chatbot designed for Sorbonne Université to help its administration locate academic experts within the university. This document outlines the structure, functionality, and implementation details of Sorbobot.

### Context

Sorbobot centers on identifying experts precisely, avoiding confusion between individuals who share similar names. It uses HAL unique identifiers to distinguish between experts.

## System Architecture

Sorbobot is built on a Retrieval Augmented Generation (RAG) system, composed of two primary steps:

1. **Retrieval**: Identifies the publications most similar to the user's query.
2. **Generation**: Produces a response based on the context extracted from the relevant publications.

## Implementation Details

### Programming Language and Libraries

- **Language**: Python
- **Frontend**: Streamlit
- **Database**: PostgreSQL with pgvector for similarity search
- **NLP Processing**: langchain and GPT4all libraries

### Database

- **Postgres with pgvector**: Used for storing data and performing similarity searches based on cosine similarity.

### Natural Language Processing

- **Abstracts as Data Source**: The chatbot uses publication abstracts to identify experts.
- **GPT4all for Word Embedding**: Converts text from author publications into embeddings, which are then compared to the query to identify experts.

### Retrieval Process

1. **Query Processing**: User queries are processed to extract key terms.
2. **Similarity Search**: The system searches the database using pgvector to find publications with a low cosine distance to the query.
3. **Expert Identification**: The system identifies the authors of these publications, using their unique identifiers so that each expert is identified unambiguously.

### Generation Process

1. **Context Extraction**: Relevant information is extracted from the identified publications.
2. **Response Generation**: An LLM generates an informative response based on the extracted context.

## User Interaction Flow

1. **Query Submission**: Users submit queries related to their expert search.
2. **Chatbot Processing**: Sorbobot processes the query, retrieves relevant publications, and identifies experts.
3. **Response Presentation**: The system presents a list of experts, including unique identifiers and relevant publication abstracts.

## Conclusion

Sorbobot streamlines the process of finding academic experts at Sorbonne Université. Its combination of embedding-based NLP, a PostgreSQL database with pgvector, and a retrieval-generation framework supports accurate and efficient expert identification.
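
To make the steps listed under Retrieval Process more concrete, here is a minimal sketch of the similarity search and expert identification, written in plain Python so that it runs without a database. It is not Sorbobot's actual code: the `publications` table, its column names, the record layout (one row per publication and author), the toy three-dimensional embeddings, and the functions `cosine_distance` and `find_experts` are all assumptions made for illustration. The only detail taken from pgvector itself is that `<=>` is its cosine-distance operator.

```python
import math

# Hypothetical SQL for the same ranking inside PostgreSQL. `<=>` is
# pgvector's cosine-distance operator; the table and column names are
# assumptions, not Sorbobot's real schema.
PGVECTOR_QUERY = """
SELECT hal_id, author, title
FROM publications
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(k)s
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def find_experts(query_embedding, publications, k=3):
    """Keep the k closest publications, then group them by HAL identifier."""
    ranked = sorted(
        publications,
        key=lambda pub: cosine_distance(query_embedding, pub["embedding"]),
    )[:k]
    experts = {}
    for pub in ranked:
        # Group by identifier, not by name, so homonyms stay separate.
        entry = experts.setdefault(
            pub["hal_id"], {"name": pub["author"], "titles": []}
        )
        entry["titles"].append(pub["title"])
    return experts


# Toy data: two different people share a name but not a HAL identifier.
PUBLICATIONS = [
    {"hal_id": "hal-id-1", "author": "A. Martin",
     "title": "Graph neural networks", "embedding": [0.9, 0.1, 0.0]},
    {"hal_id": "hal-id-2", "author": "A. Martin",
     "title": "Medieval manuscripts", "embedding": [0.0, 0.1, 0.9]},
    {"hal_id": "hal-id-3", "author": "B. Leroy",
     "title": "Deep learning for graphs", "embedding": [0.8, 0.2, 0.1]},
]

if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]  # stand-in for an embedded user query
    for hal_id, info in find_experts(query, PUBLICATIONS, k=2).items():
        print(hal_id, info["name"], info["titles"])
```

Grouping on the identifier is the design point: the two authors named "A. Martin" in the toy data remain distinct experts, and only the one whose publication is close to the query is returned. In a real deployment the ranking would more likely be delegated to PostgreSQL, along the lines of the hypothetical `PGVECTOR_QUERY` string.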