OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published 15 days ago • 20
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 83
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 180
Article Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing By Pclanglais • Jul 19 • 18
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks Paper • 2311.06242 • Published Nov 10, 2023 • 86
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 72
📀 Dataset comparison models Collection 1.8B models trained on 350BT to compare different pretraining datasets • 8 items • Updated Jun 12 • 34
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting Paper • 2309.04269 • Published Sep 8, 2023 • 32
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection Paper • 2308.04556 • Published Aug 8, 2023 • 8
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models Paper • 2308.04729 • Published Aug 9, 2023 • 31
Shepherd: A Critic for Language Model Generation Paper • 2308.04592 • Published Aug 8, 2023 • 31
PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers Paper • 2308.05732 • Published Aug 10, 2023 • 8
Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI Paper • 2308.05221 • Published Aug 9, 2023 • 9
Flexible Isosurface Extraction for Gradient-Based Mesh Optimization Paper • 2308.05371 • Published Aug 10, 2023 • 10
Follow Anything: Open-set detection, tracking, and following in real-time Paper • 2308.05737 • Published Aug 10, 2023 • 11
OpenProteinSet: Training data for structural biology at scale Paper • 2308.05326 • Published Aug 10, 2023 • 10
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment Paper • 2308.05374 • Published Aug 10, 2023 • 27
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 37
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 32