Luci
Akirami
AI & ML interests
None yet
Recent Activity
reacted to singhsidhukuldeep's post with 🤗 22 days ago
Exciting breakthrough in Document AI! Researchers from UNC Chapel Hill and Bloomberg have developed M3DocRAG, a revolutionary framework for multi-modal document understanding.
The innovation lies in its ability to handle complex document scenarios that traditional systems struggle with:
- Process 40,000+ pages across 3,000+ documents
- Answer questions requiring information from multiple pages
- Understand visual elements like charts, tables, and figures
- Support both closed-domain (single document) and open-domain (multiple documents) queries
Under the hood, M3DocRAG operates through three sophisticated stages:
>> Document Embedding:
- Converts PDF pages to RGB images
- Uses ColPali to project both text queries and page images into a shared embedding space
- Creates dense visual embeddings for each page, preserving visual information that text extraction would lose
>> Page Retrieval:
- Employs MaxSim scoring to compute relevance between queries and pages
- Implements inverted file indexing (IVFFlat) for efficient search
- Reduces retrieval latency from 20s to under 2s when searching 40K+ pages
- Supports approximate nearest neighbor search via Faiss
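To make the MaxSim step concrete, here is a minimal pure-Python sketch of late-interaction scoring: for each query token embedding, take its best dot-product match among a page's embeddings, then sum those maxima. The function names (`maxsim_score`, `retrieve_top_k`) and the tiny 2-dimensional vectors are illustrative assumptions, not taken from the M3DocRAG code, and the search here is exhaustive rather than the approximate IVFFlat search described above.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_emb: List[Vector], page_emb: List[Vector]) -> float:
    """MaxSim: sum over query tokens of the max similarity to any page vector."""
    return sum(max(dot(q, p) for p in page_emb) for q in query_emb)


def retrieve_top_k(
    query_emb: List[Vector], pages: List[List[Vector]], k: int = 2
) -> List[Tuple[int, float]]:
    """Score every page exhaustively and return the k best (index, score) pairs."""
    scores = [maxsim_score(query_emb, page) for page in pages]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Toy data (hypothetical): two query token vectors, three pages.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # matches both query tokens
    [[1.0, 0.0]],                          # matches only the first token
    [[0.0, -1.0]],                         # matches neither token
]

top_pages = retrieve_top_k(query, pages, k=2)
print(top_pages)
```

In a real system the per-page vectors would come from a model such as ColPali, and an index like Faiss IVFFlat would narrow the candidate set before scoring rather than scanning all pages.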
>> Question Answering:
- Leverages Qwen2-VL 7B as the multi-modal language model
- Processes retrieved pages through a visual encoder
- Generates answers considering both textual and visual context
The results are impressive:
- State-of-the-art performance on MP-DocVQA benchmark
- Superior handling of non-text evidence compared to text-only systems
- Significantly better performance on multi-hop reasoning tasks
This is promising for industries dealing with large document volumes: finance, healthcare, and legal teams could process documents more efficiently while preserving crucial visual context.
updated a collection about 1 month ago
JailBreak
Organizations
spaces 1
models 13
Akirami/twitter-roberta-sentiment-analysiss-onnx-quantized • Updated • 2
Akirami/Akirami • Updated
Akirami/twitter-roberta-sentiment-analysiss-lr-1e-5 • Text Classification • Updated • 14
Akirami/distillbert-uncased-ag-news • Text Classification • Updated • 283
Akirami/telugu_llama2_tokenizer • Updated
Akirami/phi-3-medium_text2cypher_recommendations • Updated
Akirami/llama3-news-classification • Updated
Akirami/vanilla-llama-3-8b-bnb-4bit • Text Generation • Updated • 21
Akirami/truthy-llama3-8b • Text Generation • Updated • 21 • 1
Akirami/llama3-8b-orpo-truthy • Updated