Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 9 days ago • 216
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 17 days ago • 223
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 17 days ago • 69
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 78
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11 • 70
Qwen2-VL Collection Vision-language model series based on Qwen2 • 15 items • Updated 17 days ago • 129
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5 • 18
Papers I want to read Collection Papers in my to-read list • 239 items • Updated about 22 hours ago • 21
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes • Aug 17, 2022 • 57
Article Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth By mlabonne • Jul 29 • 208
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30 • 47
MGM Collection Official model collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 13 items • Updated May 3 • 46
Article ColPali: Efficient Document Retrieval with Vision Language Models By manu • Jul 5 • 109
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated 17 days ago • 340
Article Breaking resolution curse of vision-language models By visheratin • Feb 24 • 10
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models • Jun 24 • 169
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19 • 58
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 178