Thomas (Tom) Gardos

DL4DS Tutor

Check out the configuration reference at Hugging Face Spaces Config Reference.

You can find a "production" deployment of the Tutor running live at DL4DS Tutor on Hugging Face Spaces. It is deployed automatically from the main branch of this repo by this GitHub Actions workflow on every push to main.

A "development" version of the Tutor is running live at DL4DS Tutor -- Dev, also on Hugging Face Spaces. It is deployed automatically from the dev_branch branch of this repo by this GitHub Actions workflow on every push to dev_branch.

Running Locally

  1. Clone the Repository

    git clone https://github.com/DL4DS/dl4ds_tutor
    cd dl4ds_tutor
    
  2. Put your data under the storage/data directory

    • Add URLs to the storage/data/urls.txt file.
    • Add PDF files directly to the storage/data directory.
  3. Test Data Loading (Optional)

    cd code
    python -m modules.dataloader.data_loader
    
  4. Create the Vector Database

    cd code
    python -m modules.vectorstore.store_manager
    
    • Note: Re-run the above command whenever you add new data to the storage/data directory or update the storage/data/urls.txt file.
    • Alternatively, you can set ["vectorstore"]["embedd_files"] to True in the code/modules/config/config.yaml file, which embeds the files in the storage directory every time you run the chainlit command below.
  5. Run the Chainlit App (from the code directory)

    chainlit run main.py
    
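Step 2 can also be scripted. The sketch below is not part of the repository: `add_urls` and `list_pdfs` are hypothetical helpers, and it assumes that storage/data/urls.txt holds one URL per line (the README does not state the format) and that PDFs are placed directly inside storage/data.

```python
from pathlib import Path


def add_urls(data_dir, urls):
    """Append URLs to <data_dir>/urls.txt, skipping any that are already listed.

    Assumption: urls.txt holds one URL per line. Returns the URLs actually added.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    urls_file = data_dir / "urls.txt"
    text = urls_file.read_text() if urls_file.exists() else ""
    # URLs already present, ignoring blank lines and surrounding whitespace.
    seen = {line.strip() for line in text.splitlines() if line.strip()}
    added = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            added.append(url)
    if added:
        with urls_file.open("a") as fh:
            if text and not text.endswith("\n"):
                fh.write("\n")  # keep the previous last URL on its own line
            fh.write("\n".join(added) + "\n")
    return added


def list_pdfs(data_dir):
    """Return the sorted names of PDF files directly inside data_dir (suffix match ignores case)."""
    return sorted(
        p.name for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Run from the repository root, `add_urls("storage/data", ["https://example.com/notes"])` appends that placeholder URL, and `list_pdfs("storage/data")` shows which PDFs are in place before you build the vector database.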

See the docs for more information.
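The note under step 4 (rebuild after adding data or editing urls.txt) can be approximated with file modification times. This is a minimal sketch, not the repository's logic: `needs_rebuild` is a hypothetical helper, and it assumes the vector databases are written under vectorstores/, as the File Structure section lists.

```python
from pathlib import Path


def _latest_mtime(root):
    """Newest modification time of any file under root, or None if it has no files."""
    root = Path(root)
    if not root.is_dir():
        return None
    times = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(times) if times else None


def needs_rebuild(data_dir="storage/data", vectorstore_dir="vectorstores"):
    """Heuristic: True when the data directory changed after the vector store was written."""
    data_time = _latest_mtime(data_dir)
    store_time = _latest_mtime(vectorstore_dir)
    if data_time is None:
        return False  # nothing to embed yet
    if store_time is None:
        return True  # data exists but no vector store has been built
    return data_time > store_time
```

Because it only compares timestamps, this check will not notice a file that was deleted from storage/data; re-running `python -m modules.vectorstore.store_manager` is the safe choice in that case.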

File Structure

code/
 ├── modules
 │   ├── chat                # Contains the chatbot implementation
 │   ├── chat_processor      # Contains the implementation to process and log the conversations
 │   ├── config              # Contains the configuration files
 │   ├── dataloader          # Contains the implementation to load the data from the storage directory
 │   ├── retriever           # Contains the implementation to create the retriever
 │   └── vectorstore         # Contains the implementation to create the vector database
 ├── public
 │   ├── logo_dark.png       # Dark theme logo
 │   ├── logo_light.png      # Light theme logo
 │   └── test.css            # Custom CSS file
 └── main.py

 
docs/                        # Contains the documentation to the codebase and methods used

storage/
 ├── data                    # Store files and URLs here
 ├── logs                    # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
 └── models                  # Local LLMs are loaded from here

vectorstores/                # Stores the created vector databases

.env                         # Create this file yourself and store the API keys in it

  • code/modules/vectorstore/vectorstore.py: Instantiates the VectorStore class to create the vector database.
  • code/modules/vectorstore/store_manager.py: Instantiates the VectorStoreManager class, which manages the vector database and provides all associated methods.
  • code/modules/retriever/retriever.py: Instantiates the Retriever class to create the retriever.
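Before launching the app, it can help to confirm that the paths this README mentions are in place, in particular the .env file you have to create. This is a minimal sketch with a hypothetical `missing_paths` helper; which paths count as required is an assumption based on the README, not something the code enforces.

```python
from pathlib import Path

# Paths mentioned in this README that should exist before launching the app;
# treating exactly these as required is an assumption.
REQUIRED_PATHS = (
    "code/main.py",
    "code/modules/config/config.yaml",
    "storage/data",
    ".env",
)


def missing_paths(repo_root=".", required=REQUIRED_PATHS):
    """Return the required paths (relative to repo_root) that do not exist."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]
```

Run from the repository root, `missing_paths()` returns an empty list when everything is in place; if .env has not been created yet, it appears in the returned list.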

Docker

The Hugging Face Space is built from the Dockerfile in the repository. To run the app locally, use Dockerfile.dev instead:

docker build --tag dev -f Dockerfile.dev .
docker run -it --rm -p 8000:8000 dev

Contributing

If you have suggestions or improvements, please open an issue. To work on one, create a branch and open a pull request against the main branch.