DL4DS Tutor
Check out the configuration reference at Hugging Face Spaces Config Reference.
You can find an implementation of the Tutor, DL4DS Tutor, hosted on Hugging Face.
Running Locally
Clone the Repository
git clone https://github.com/DL4DS/dl4ds_tutor
Put your data under the storage/data directory:
- Add URLs in the urls.txt file.
- Add other PDF files in the storage/data directory.
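Before building anything, it can help to confirm what the storage/data directory actually contains. The following is a minimal sketch using only the standard library; `summarize_data_dir` is a hypothetical helper, not part of the repository, and it assumes urls.txt holds one URL per line.

```python
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return the URLs listed in urls.txt and the PDF file names found in data_dir."""
    data_dir = Path(data_dir)
    urls_file = data_dir / "urls.txt"
    urls = []
    if urls_file.exists():
        # Assumption: one URL per line; blank lines are skipped.
        text = urls_file.read_text(encoding="utf-8")
        urls = [line.strip() for line in text.splitlines() if line.strip()]
    # Only top-level files ending in .pdf are listed (no recursion).
    pdfs = sorted(path.name for path in data_dir.glob("*.pdf"))
    return {"urls": urls, "pdfs": pdfs}
```

Calling `summarize_data_dir("storage/data")` from the repository root returns a dictionary with the two lists, which makes it easy to spot an empty urls.txt or a missing PDF.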
To test Data Loading (Optional)
cd code
python -m modules.dataloader.data_loader
Create the Vector Database
cd code
python -m modules.vectorstore.store_manager
- Note: You need to run the above command when you add new data to the storage/data directory, or if the storage/data/urls.txt file is updated.
- Alternatively, you can set ["vectorstore"]["embedd_files"] to True in the code/modules/config/config.yaml file, which will embed files from the storage directory every time you run the below chainlit command.
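The nested key in the note above can be illustrated with a small sketch. It works on a plain dictionary standing in for the parsed config.yaml; `with_embedding_enabled` and `example_config` are hypothetical names, and the real file contains other keys that are not shown here.

```python
import copy


def with_embedding_enabled(config, enabled=True):
    """Return a copy of the parsed config with ["vectorstore"]["embedd_files"] set."""
    updated = copy.deepcopy(config)
    # Create the "vectorstore" section if it is missing, then set the flag.
    updated.setdefault("vectorstore", {})["embedd_files"] = enabled
    return updated


# Assumption: a stand-in for the parsed config.yaml; the real file has more keys.
example_config = {"vectorstore": {"embedd_files": False}}
print(with_embedding_enabled(example_config)["vectorstore"]["embedd_files"])  # True
```

Returning a copy keeps the original dictionary unchanged, so the same loaded config can be reused for runs that should not re-embed the files.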
Run the Chainlit App
chainlit run main.py
See the docs for more information.
File Structure
code/
├── modules
│   ├── chat # Contains the chatbot implementation
│   ├── chat_processor # Contains the implementation to process and log the conversations
│   ├── config # Contains the configuration files
│   ├── dataloader # Contains the implementation to load the data from the storage directory
│   ├── retriever # Contains the implementation to create the retriever
│   └── vectorstore # Contains the implementation to create the vector database
├── public
│   ├── logo_dark.png # Dark theme logo
│   ├── logo_light.png # Light theme logo
│   └── test.css # Custom CSS file
└── main.py
docs/ # Contains the documentation to the codebase and methods used
storage/
├── data # Store files and URLs here
├── logs # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
└── models # Local LLMs are loaded from here
vectorstores/ # Stores the created vector databases
.env # This needs to be created, store the API keys here
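As an illustration of how a .env file of API keys can be read, here is a minimal standard-library sketch. It is not the project's own loading code (which may rely on a dedicated library); `load_env_file` is a hypothetical helper, and it assumes simple KEY=VALUE lines.

```python
import os


def load_env_file(path, environ=None):
    """Load KEY=VALUE pairs from a .env-style file into environ (default: os.environ)."""
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an "=" separator.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding quotes; existing variables are not overwritten.
            environ.setdefault(key.strip(), value.strip().strip("\"'"))
    return environ
```

Passing a plain dictionary as `environ` makes it possible to inspect the parsed values without touching the process environment. Any key name placed in the file, for example `EXAMPLE_API_KEY`, is a placeholder here; the keys the Tutor actually expects are not listed in this section.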