DL4DS Tutor
Check out the configuration reference at Hugging Face Spaces Config Reference.
You can find a "production" implementation of the Tutor running live at DL4DS Tutor from the Hugging Face Space. It is pushed automatically from the main branch of this repo by this Actions Workflow upon a push to main.
A "development" version of the Tutor is running live at DL4DS Tutor -- Dev from this Hugging Face Space. It is pushed automatically from the dev_branch branch of this repo by this Actions Workflow upon a push to dev_branch.
Running Locally
Clone the Repository
git clone https://github.com/DL4DS/dl4ds_tutor
Put your data under the storage/data directory
- Add URLs in the urls.txt file.
- Add other PDF files in the storage/data directory.
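The data step above can also be scripted. The following is a minimal sketch, not part of the repository: `add_urls` is a hypothetical helper, and it assumes `urls.txt` holds one URL per line, which the README does not state explicitly.

```python
from pathlib import Path


def add_urls(data_dir: Path, urls: list[str]) -> Path:
    """Append URLs to data_dir/urls.txt, skipping ones already listed.

    Assumption: urls.txt is a plain text file with one URL per line.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    urls_file = data_dir / "urls.txt"
    lines = urls_file.read_text().splitlines() if urls_file.exists() else []
    for url in urls:
        if url not in lines:
            lines.append(url)
    # Write every entry back, each terminated by a newline.
    urls_file.write_text("".join(f"{line}\n" for line in lines))
    return urls_file
```

For example, `add_urls(Path("storage/data"), ["https://example.com/lecture1"])` run from the repository root would create `storage/data/urls.txt` if it is missing and add the (placeholder) URL once.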
To test Data Loading (Optional)
cd code
python -m modules.dataloader.data_loader
Create the Vector Database
cd code
python -m modules.vectorstore.store_manager
- Note: You need to run the above command when you add new data to the storage/data directory, or if the storage/data/urls.txt file is updated.
- Alternatively, you can set ["vectorstore"]["embedd_files"] to True in the code/modules/config/config.yaml file, which will embed files from the storage directory every time you run the below chainlit command.
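One way to decide whether the vector database is stale is to compare file modification times. The sketch below is only an illustration under assumptions: `latest_mtime` and `needs_rebuild` are hypothetical helpers that do not exist in the repository, and the check assumes the store lives under the top-level vectorstores/ directory shown in the file structure.

```python
from pathlib import Path


def latest_mtime(root: Path) -> float:
    """Newest modification time of any file under root (0.0 if none)."""
    if not root.exists():
        return 0.0
    times = [p.stat().st_mtime for p in root.rglob("*") if p.is_file()]
    return max(times, default=0.0)


def needs_rebuild(data_dir: Path, vectorstore_dir: Path) -> bool:
    """True when data_dir holds a file newer than everything in vectorstore_dir.

    Heuristic only: it cannot detect deleted files or changes to the
    content behind a URL listed in urls.txt.
    """
    return latest_mtime(data_dir) > latest_mtime(vectorstore_dir)
```

If `needs_rebuild(Path("storage/data"), Path("vectorstores"))` returns True, rerunning the store manager command is the safe choice.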
Run the Chainlit App
chainlit run main.py
See the docs for more information.
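The local steps above can be chained in a small driver script. This is a sketch, not something shipped with the repository: `local_run_steps` and `run_all` are hypothetical names, and it assumes all three commands (including `chainlit run main.py`) are run from inside the code/ directory.

```python
import subprocess
import sys
from pathlib import Path


def local_run_steps(python: str = sys.executable) -> list[list[str]]:
    """Commands from the README steps, in order."""
    return [
        [python, "-m", "modules.dataloader.data_loader"],     # optional data-loading test
        [python, "-m", "modules.vectorstore.store_manager"],  # create the vector database
        ["chainlit", "run", "main.py"],                        # start the Chainlit app
    ]


def run_all(repo_root: Path, dry_run: bool = True) -> None:
    """Print each command; execute it inside repo_root/code unless dry_run."""
    code_dir = repo_root / "code"  # assumption: every step runs from code/
    for cmd in local_run_steps():
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=code_dir, check=True)
```

Calling `run_all(Path("."))` only prints the commands; pass `dry_run=False` to execute them, in which case the last step blocks while the app is serving.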
File Structure
code/
├── modules
│   ├── chat # Contains the chatbot implementation
│   ├── chat_processor # Contains the implementation to process and log the conversations
│   ├── config # Contains the configuration files
│   ├── dataloader # Contains the implementation to load the data from the storage directory
│   ├── retriever # Contains the implementation to create the retriever
│   └── vectorstore # Contains the implementation to create the vector database
├── public
│   ├── logo_dark.png # Dark theme logo
│   ├── logo_light.png # Light theme logo
│   └── test.css # Custom CSS file
└── main.py
docs/ # Contains the documentation to the codebase and methods used
storage/
├── data # Store files and URLs here
├── logs # Logs directory, includes logs on vector DB creation, tutor logs, and chunks logged in JSON files
└── models # Local LLMs are loaded from here
vectorstores/ # Stores the created vector databases
.env # This needs to be created, store the API keys here
- code/modules/vectorstore/vectorstore.py: Instantiates the VectorStore class to create the vector database.
- code/modules/vectorstore/store_manager.py: Instantiates the VectorStoreManager class to manage the vector database, and all associated methods.
- code/modules/retriever/retriever.py: Instantiates the Retriever class to create the retriever.
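Since the .env file listed in the structure above has to be created by hand, a quick sanity check before starting the app can help. The sketch below is illustrative only: `load_env_file` and `missing_keys` are hypothetical helpers, the project itself may load .env through a different mechanism, and the key name used in the usage note is a placeholder because the README does not list the required keys.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Names from required that are absent or empty in env."""
    return [key for key in required if not env.get(key)]
```

For instance, `missing_keys(load_env_file(Path(".env")), ["EXAMPLE_API_KEY"])` returns an empty list when the placeholder key is set to a non-empty value.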
Docker
The HuggingFace Space is built using the Dockerfile in the repository. To run it locally, use the Dockerfile.dev file.
docker build --tag dev -f Dockerfile.dev .
docker run -it --rm -p 8000:8000 dev
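After the container starts, it can be useful to confirm that something is listening on the published port. The following is a minimal sketch assuming the app inside the container serves on port 8000, as the `-p 8000:8000` mapping above suggests; `is_port_open` is a hypothetical helper, not part of the repository.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the container running, `is_port_open("localhost", 8000)` should return True. This only checks that the port accepts connections, not that the Tutor itself is healthy.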
Contributing
Please create an issue if you have any suggestions or improvements. To work on it, create a branch and open a pull request against the main branch.