
DL4DS Tutor 🏃

Check out the configuration reference at Hugging Face Spaces Config Reference.

An implementation of the Tutor, DL4DS Tutor, is hosted on Hugging Face.

Running Locally

  1. Clone the Repository

    git clone https://github.com/DL4DS/dl4ds_tutor
    
  2. Put your data under the storage/data directory

    • Add URLs to the storage/data/urls.txt file.
    • Add PDF files to the storage/data directory.
  3. Test Data Loading (Optional)

    cd code
    python -m modules.dataloader.data_loader
    
  4. Create the Vector Database

    cd code
    python -m modules.vectorstore.store_manager
    
    • Note: Re-run the above command whenever you add new data to the storage/data directory or update the storage/data/urls.txt file.
    • Alternatively, set ["vectorstore"]["embedd_files"] to True in the code/modules/config/config.yaml file; the files in the storage directory are then embedded every time you run the chainlit command below.
  5. Run the Chainlit App (from the code directory)

    chainlit run main.py
    

See the docs for more information.
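
Before building the vector database in step 4, it can help to confirm what step 2 actually put in place. The following is a minimal sketch, not part of the repository: `collect_sources` and `needs_rebuild` are hypothetical helper names, and the sketch assumes that urls.txt holds one URL per line and that comparing file modification times is a good enough signal for "the data changed since the last build". The project's own loader may treat these files differently.

```python
from pathlib import Path


def collect_sources(data_dir):
    """Return (urls, pdfs) found under data_dir, mirroring step 2."""
    data_dir = Path(data_dir)
    urls_file = data_dir / "urls.txt"
    urls = []
    if urls_file.is_file():
        for line in urls_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Assumption: one URL per line; blank lines and '#' comments are skipped.
            if line and not line.startswith("#"):
                urls.append(line)
    # Assumption: PDFs may sit in subdirectories, so search recursively.
    pdfs = sorted(data_dir.rglob("*.pdf"))
    return urls, pdfs


def needs_rebuild(data_dir, vectorstore_dir):
    """Heuristic: True if any data file is newer than every vector store file."""
    data_files = [p for p in Path(data_dir).rglob("*") if p.is_file()]
    store_files = [p for p in Path(vectorstore_dir).rglob("*") if p.is_file()]
    if not data_files:
        return False  # nothing to embed
    if not store_files:
        return True  # data exists but no vector database has been built yet
    newest_data = max(p.stat().st_mtime for p in data_files)
    newest_store = max(p.stat().st_mtime for p in store_files)
    return newest_data > newest_store
```

Run from the repository root, `collect_sources("storage/data")` lists what would be picked up, and `needs_rebuild("storage/data", "vectorstores")` gives a rough hint on whether step 4 should be repeated.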

File Structure

code/
 ├── modules
 │   ├── chat                # Contains the chatbot implementation
 │   ├── chat_processor      # Contains the implementation to process and log the conversations
 │   ├── config              # Contains the configuration files
 │   ├── dataloader          # Contains the implementation to load the data from the storage directory
 │   ├── retriever           # Contains the implementation to create the retriever
 │   └── vectorstore         # Contains the implementation to create the vector database
 ├── public
 │   ├── logo_dark.png       # Dark theme logo
 │   ├── logo_light.png      # Light theme logo
 │   └── test.css            # Custom CSS file
 └── main.py

 
docs/                        # Contains the documentation for the codebase and the methods used

storage/
 ├── data                    # Store files and URLs here
 ├── logs                    # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
 └── models                  # Local LLMs are loaded from here

vectorstores/                # Stores the created vector databases

.env                         # Create this file and store the API keys here
  • code/modules/vectorstore/vectorstore.py: Instantiates the VectorStore class to create the vector database.
  • code/modules/vectorstore/store_manager.py: Instantiates the VectorStoreManager class to manage the vector database, and all associated methods.
  • code/modules/retriever/retriever.py: Instantiates the Retriever class to create the retriever.
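
As a quick sanity check of the layout above, a small script can report which expected paths are absent from a fresh clone. This is only a sketch: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the list is an assumption drawn from the tree above (for example, vectorstores/ is left out on the assumption that it appears only after the vector database is created). Adjust it to your setup.

```python
from pathlib import Path

# Assumption: paths taken from the file structure listed above.
EXPECTED_PATHS = [
    "code/main.py",
    "code/modules/config",
    "storage/data",
    "storage/logs",
    "storage/models",
    ".env",  # must be created by hand; holds the API keys
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected paths (relative to repo_root) that do not exist."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Running the script from the repository root prints one line per missing path; no output means every listed path exists.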