tcy6 committed on
Commit
a932f2e
1 Parent(s): 2069e2c

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -41,7 +41,8 @@ In the paper, We use MiniCPM-V 2.0, MiniCPM-V 2.6 and GPT-4o as the generators.
 ## Training
 
 ### VisRAG-Ret
-Our training dataset of 362,110 Query-Document (Q-D) Pairs for **VisRAG-Ret** is comprised of train sets of openly available academic datasets (34%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (GPT-4o) pseudo-queries (66%).
+Our training dataset of 362,110 Query-Document (Q-D) Pairs for **VisRAG-Ret** is comprised of train sets of openly available academic datasets (34%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (GPT-4o) pseudo-queries (66%). It can be found in the `VisRAG` Collection on Hugging Face, which is referenced at the beginning of this page.
+
 
 ### VisRAG-Gen
 The generation part does not use any fine-tuning; we directly use off-the-shelf LLMs/VLMs for generation.