Spaces: tcy6/VisRAG_Pipeline

tcy6 committed on Nov 4, 2024

Commit: bace9e3

1 Parent(s): 558ab5b

Update app.py

Files changed (1)

app.py +11 -11

@@ -247,28 +247,28 @@ with gr.Blocks() as app:
     gr.Markdown("# VisRAG Pipeline: Vision-based Retrieval-augmented Generation on Multi-modality Documents")
     gr.Markdown("""
-- A Vision Language Model Dense Retriever ([VisRAG-Ret](https://huggingface.co/openbmb/VisRAG-Ret)) **directly reads** your PDFs **without need of OCR**, produce **multimodal dense representations** and build your personal library.
-- **Ask a question**, it retrieve most relavant pages, then [MiniCPM-V-2.6](https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6) will answer your question based on pages recalled, with strong multi-image understanding capability.
-    - It helps you read a long **visually-intensive** or **text-oriented** PDF document and find the pages that answer your question.
-    - It helps you build a personal library and retireve book pages from a large collection of books.
-    - It works like a human: read, store, retrieve, and answer with full vision.
 """)
-    gr.Markdown("- Currently online demo support PDF document with less than 50 pages due to GPU time limit. Deploy on your own machine for longer PDFs and books.")
     with gr.Row():
         file_input = gr.File(file_types=["pdf"], label="Step 1: Upload PDF")
         file_result = gr.Text(label="Knowledge Base ID (remember it, it is re-usable!)")
-        process_button = gr.Button("Process PDF (Don't click until PDF upload success)")
     process_button.click(add_pdf_gradio, inputs=[file_input], outputs=file_result)
     with gr.Row():
-        kb_id_input = gr.Text(label="Your Knowledge Base ID (paste your Knowledge Base ID here, it is re-usable:)")
         query_input = gr.Text(label="Your Queston")
         topk_input = inputs=gr.Number(value=5, minimum=1, maximum=10, step=1, label="Number of pages to retrieve")
         retrieve_button = gr.Button("Step2: Retrieve Pages")
@@ -276,10 +276,10 @@ with gr.Blocks() as app:
     with gr.Row():
         gr.Examples(
             examples=[
-                ["main_figure.pdf", "What is RAG-V?"],
                 ["main_figure.pdf", "How does RAG-V perform?"]
             ],
-            inputs=[file_input, query_input],
         )
     with gr.Row():
@@ -301,7 +301,7 @@ with gr.Blocks() as app:
     upvote_button.click(upvote, inputs=[kb_id_input, query_input], outputs=None)
     downvote_button.click(downvote, inputs=[kb_id_input, query_input], outputs=None)
-    gr.Markdown("By using this demo, you agree to share your use data with us for research purpose, to help improve user experience.")
 app.launch()

@@ -247,28 +247,28 @@ with gr.Blocks() as app:
     gr.Markdown("# VisRAG Pipeline: Vision-based Retrieval-augmented Generation on Multi-modality Documents")
     gr.Markdown("""
+- A Vision Language Model Dense Retriever ([VisRAG-Ret](https://huggingface.co/openbmb/VisRAG-Ret)) **directly reads** your PDFs **without need for OCR**, generates **multimodal dense representations** and assists in building your personal library.
+- **Ask a question**, and it will retrieve the most relevant pages. Then, [MiniCPM-V-2.6](https://huggingface.co/spaces/openbmb/MiniCPM-V-2_6) will answer your question based on the recalled pages, utilizing its strong multi-image understanding capabilities.
+    - It assists you in reading **lengthy**, **visually-intensive** or **text-oriented** PDF documents, helping you locate pages that answer your questions.
+    - It enables you to build a personal library and retrieve book pages from a large collection of books.
+    - It works like a human: reading, storing, retrieving, and answering with full visual comprehension.
 """)
+    gr.Markdown("- The current online demo supports PDF documents with fewer than 50 pages due to GPU time limitations. For longer PDFs and books, consider deploying it on your own machine.")
     with gr.Row():
         file_input = gr.File(file_types=["pdf"], label="Step 1: Upload PDF")
         file_result = gr.Text(label="Knowledge Base ID (remember it, it is re-usable!)")
+        process_button = gr.Button("Process PDF (Don't click until PDF uploaded successfully)")
     process_button.click(add_pdf_gradio, inputs=[file_input], outputs=file_result)
     with gr.Row():
+        kb_id_input = gr.Text(label="Your Knowledge Base ID (paste your Knowledge Base ID here, it is re-usable):")
         query_input = gr.Text(label="Your Queston")
         topk_input = inputs=gr.Number(value=5, minimum=1, maximum=10, step=1, label="Number of pages to retrieve")
         retrieve_button = gr.Button("Step2: Retrieve Pages")
@@ -276,10 +276,10 @@ with gr.Blocks() as app:
     with gr.Row():
         gr.Examples(
             examples=[
+                ["main_figure.pdf", "What is RAG-V?"],
                 ["main_figure.pdf", "How does RAG-V perform?"]
             ],
+            inputs=[file_input, query_input],
         )
     with gr.Row():
@@ -301,7 +301,7 @@ with gr.Blocks() as app:
     upvote_button.click(upvote, inputs=[kb_id_input, query_input], outputs=None)
     downvote_button.click(downvote, inputs=[kb_id_input, query_input], outputs=None)
+    gr.Markdown("By using this demo, you agree to share your usage data with us for research purposes, helping us improve the user experience.")
 app.launch()
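
Note for readers of this diff: the retrieval step described in the Markdown bullets (embed the pages, then return the most relevant ones for a question) can be sketched roughly as below. This is an illustration under assumptions, not the code in app.py. The diff does not show how VisRAG-Ret scores pages, so cosine similarity over toy vectors stands in for it, and `cosine` and `retrieve_top_k` are hypothetical names. The clamp to 1..10 mirrors the bounds set on the `topk_input` field.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, page_vecs, k=5):
    """Return indices of the k pages most similar to the query, best first."""
    # Assumption: same limits as the demo's "Number of pages to retrieve" input.
    k = max(1, min(int(k), 10))
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(page_vecs)]
    # Sort by descending score; ties fall back to page order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:k]]


# Toy "page embeddings"; real ones would come from the retriever model.
pages = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
top = retrieve_top_k([1.0, 0.1], pages, k=2)
```

In the demo, the pages selected this way are what gets passed to the answering model together with the question.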