selimc/turkish-colpali

@@ -23,7 +23,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [vidore/colpali-v1.3-hf](https://huggingface.co/vidore/colpali-v1.3-hf) on these datasets:
 - [selimc/tr-textbook-ColPali](https://huggingface.co/datasets/selimc/tr-textbook-ColPali)
-- [muhammetfatihaktug/bilim_teknik_mini_base_colpali](https://huggingface.co/datasets/muhammetfatihaktug/bilim_teknik_mini_base_colpali)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65281302cad797fc4abeffd7/bs8zGLYCYPrjCs8JdsmjA.png)
@@ -40,7 +40,7 @@ This model is primarily designed for efficient indexing and retrieval of Turkish
 The training data was created via the following steps:
 - Downloading PDF files of Turkish textbooks and science magazines that are publicly available on the internet.
 - Using the [pdf-to-page-images-dataset](https://huggingface.co/spaces/Dataset-Creation-Tools/pdf-to-page-images-dataset) Space to convert the PDF documents into a single page image dataset
-- Use `gemini-2.0-flash-exp` to generate synthetic queries for these documents using the approach outlined [here](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) with additional modifications. This results in [selimc/tr-textbook-ColPali](https://huggingface.co/datasets/selimc/tr-textbook-ColPali) and [muhammetfatihaktug/bilim_teknik_mini_base_colpali](https://huggingface.co/datasets/muhammetfatihaktug/bilim_teknik_mini_base_colpali).
 - Train the model using the fine tuning [notebook](https://github.com/merveenoyan/smol-vision/blob/main/Finetune_ColPali.ipynb?s=35) from [Merve Noyan](https://huggingface.co/merve). Data processing step was modified to include all 3 types of queries. This approach not only adds variety to the training data but also effectively triples the dataset size, helping the model learn to handle diverse query types.
 ## Usage

 This model is a fine-tuned version of [vidore/colpali-v1.3-hf](https://huggingface.co/vidore/colpali-v1.3-hf) on these datasets:
 - [selimc/tr-textbook-ColPali](https://huggingface.co/datasets/selimc/tr-textbook-ColPali)
+- [muhammetfatihaktug/bilim_teknik_mini_base_colpali](https://huggingface.co/datasets/muhammetfatihaktug/bilim_teknik_mini_colpali)
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65281302cad797fc4abeffd7/bs8zGLYCYPrjCs8JdsmjA.png)
 The training data was created via the following steps:
 - Downloading PDF files of Turkish textbooks and science magazines that are publicly available on the internet.
 - Using the [pdf-to-page-images-dataset](https://huggingface.co/spaces/Dataset-Creation-Tools/pdf-to-page-images-dataset) Space to convert the PDF documents into a single page image dataset
+- Use `gemini-2.0-flash-exp` to generate synthetic queries for these documents using the approach outlined [here](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) with additional modifications. This results in [selimc/tr-textbook-ColPali](https://huggingface.co/datasets/selimc/tr-textbook-ColPali) and [muhammetfatihaktug/bilim_teknik_mini_base_colpali](https://huggingface.co/datasets/muhammetfatihaktug/bilim_teknik_mini_colpali).
 - Train the model using the fine tuning [notebook](https://github.com/merveenoyan/smol-vision/blob/main/Finetune_ColPali.ipynb?s=35) from [Merve Noyan](https://huggingface.co/merve). Data processing step was modified to include all 3 types of queries. This approach not only adds variety to the training data but also effectively triples the dataset size, helping the model learn to handle diverse query types.
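
As a rough illustration of the last step, the sketch below shows one way a data processing step could emit a separate training pair for each query type, which is why the dataset size roughly triples. It is not the code from the notebook: the `expand_queries` helper, the plain-dict rows, and the three column names are assumptions made for this example, and the actual dataset schema may differ.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Assumed column names for the three synthetic query types.
# These are illustrative; check the dataset schema before relying on them.
QUERY_COLUMNS: Sequence[str] = (
    "broad_topical_query",
    "specific_detail_query",
    "visual_element_query",
)


def expand_queries(
    rows: Iterable[Dict[str, Any]],
    query_columns: Sequence[str] = QUERY_COLUMNS,
) -> List[Dict[str, Any]]:
    """Return one (image, query) pair per non-empty query column in each row."""
    pairs: List[Dict[str, Any]] = []
    for row in rows:
        for column in query_columns:
            query = row.get(column)
            if query:  # skip missing or empty queries
                pairs.append(
                    {"image": row["image"], "query": query, "query_type": column}
                )
    return pairs


# Two page rows, each carrying all three query types (placeholder values).
rows = [
    {
        "image": "page_001.png",
        "broad_topical_query": "photosynthesis in plants",
        "specific_detail_query": "role of chlorophyll in light absorption",
        "visual_element_query": "diagram of a leaf cross-section",
    },
    {
        "image": "page_002.png",
        "broad_topical_query": "structure of the solar system",
        "specific_detail_query": "orbital period of Mars",
        "visual_element_query": "table comparing planet sizes",
    },
]

pairs = expand_queries(rows)
```

With every query present, each page contributes three pairs, so two rows become six training examples. Rows with a missing or empty query simply contribute fewer pairs.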
 ## Usage