openbmb/VisRAG-Ret

Feature Extraction

tcy6 committed on Oct 14

Commit 890393c • 1 Parent(s): 228010f

Update README.md

Files changed (1)

README.md +2 -6

@@ -15,7 +15,7 @@ pipeline_tag: feature-extraction
 **VisRAG** is a novel vision-language model (VLM)-based RAG pipeline. In this pipeline, instead of first parsing the document to obtain text, the document is directly embedded using a VLM as an image and then retrieved to enhance the generation of a VLM.Compared to traditional text-based RAG, **VisRAG** maximizes the retention and utilization of the data information in the original documents, eliminating the information loss introduced during the parsing process.
 <p align="center"><img width=800 src="https://github.com/openbmb/VisRAG/blob/master/assets/main_figure.png?raw=true"/></p>
-## VisRAG Description
+## VisRAG Pipeline
 ### VisRAG-Ret
 **VisRAG-Ret** is a document embedding model built on [MiniCPM-V 2.0](https://huggingface.co/openbmb/MiniCPM-V-2), a vision-language model that integrates [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384) as the vision encoder and [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) as the language model.
@@ -118,8 +118,4 @@ print(scores.tolist())
 ## Contact
 - Shi Yu: [email protected]
-- Chaoyue Tang: [email protected]
-## Citation
-If you use any datasets or models from this organization in your research, please cite the original dataset as follows:
+- Chaoyue Tang: [email protected]