Jakaline (Jakaline)

🚀🚀 Exciting times for the document AI community!

We're thrilled to announce the release of some of the largest OCR datasets available to the public.
🔥 With over 26 million pages, 18 billion text tokens, and 6 TB of data, these resources are a significant leap forward for document AI research.

Here's how to access these datasets quickly:

from datasets import load_dataset

# streaming=True reads samples lazily instead of downloading the full dataset first
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
idl_dataset = load_dataset('pixparse/idl-wds', streaming=True)

Setting `streaming=True` lets you iterate over the data directly without downloading it all first, so the datasets plug straight into projects built on the Hugging Face datasets library. On the Hub, you can find them here:

pixparse/pdfa-eng-wds
pixparse/idl-wds
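
In streaming mode, `load_dataset` returns splits that are read lazily, so you can peek at a few samples without pulling terabytes. A minimal sketch of that pattern using only `itertools.islice`; the helper name `preview` is hypothetical, and the demo uses an offline stand-in generator (with made-up keys) rather than the real datasets:

```python
from itertools import count, islice
from typing import Iterable, TypeVar

T = TypeVar("T")


def preview(stream: Iterable[T], n: int = 3) -> list[T]:
    """Return the first n items of a lazily evaluated stream.

    Only n items are ever pulled from the iterator, so this is safe to call
    on a streamed split such as pdfa_dataset['train'].
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(islice(stream, n))


# Offline stand-in for a streamed split: an endless generator of fake samples.
# Hypothetical keys; real samples from pixparse/pdfa-eng-wds may differ.
def fake_stream():
    for i in count():
        yield {"__key__": f"doc-{i:06d}"}


first_two = preview(fake_stream(), 2)
print([s["__key__"] for s in first_two])  # ['doc-000000', 'doc-000001']
```

Even though `fake_stream` never ends, `preview` returns after two items. Against the real data, `preview(pdfa_dataset['train'], 2)` would fetch just two samples; the datasets library's own `IterableDataset.take(n)` does the same job.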

For lean data loading, the new [chug](https://github.com/huggingface/chug) library offers a solution that includes PDF decoding:

import chug

task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',
)
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,
    format='hfids',
    num_workers=0,
)
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))

We owe a huge thank you to Peter Wyatt, Kate Tasker, Rachel Taketa, Ali Furkan Biten, Ruben Tito, and their colleagues for their contributions. Their work putting these datasets together has been invaluable. 🤗

Looking Ahead:

We're on a mission to enhance document AI capabilities, and these datasets are just the beginning. With your engagement and innovation, we're confident in the community's ability to develop robust OCR solutions. We encourage you to explore these datasets, experiment with the code, and contribute to the collective progress in document AI.

For detailed information on usage and licensing, please refer to the dataset cards on the Hugging Face Hub.

reacted to akhaliq's post with ❤️ 9 months ago

LLM Agent Operating System

LLM Agent Operating System (2403.16971)

The integration and deployment of large language model (LLM)-based intelligent agents have been fraught with challenges that compromise their efficiency and efficacy. Among these issues are sub-optimal scheduling and resource allocation of agent requests over the LLM, the difficulties in maintaining context during interactions between agent and LLM, and the complexities inherent in integrating heterogeneous agents with different capabilities and specializations. The rapid increase of agent quantity and complexity further exacerbates these issues, often leading to bottlenecks and sub-optimal utilization of resources. Inspired by these challenges, this paper presents AIOS, an LLM agent operating system, which embeds large language models into operating systems (OSs). Specifically, AIOS is designed to optimize resource allocation, facilitate context switching across agents, enable concurrent execution of agents, provide tool services for agents, and maintain access control for agents. We present the architecture of such an operating system, outline the core challenges it aims to resolve, and provide the basic design and implementation of AIOS. Our experiments on concurrent execution of multiple agents demonstrate the reliability and efficiency of our AIOS modules. Through this, we aim not only to improve the performance and efficiency of LLM agents but also to pioneer better development and deployment of the AIOS ecosystem in the future.
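
Two of the problems named above are scheduling many agents' requests over one shared LLM and switching context between them. As a toy illustration only (this is not AIOS's scheduler; the `AgentRequest` class, counting work in "steps", and the choice of round-robin are all assumptions), here is a time-sliced round-robin scheduler that keeps a preempted agent's partial output and resumes it on its next turn:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgentRequest:
    """A hypothetical agent task that needs `steps` LLM generation steps."""
    agent: str
    steps: int
    context: list[str] = field(default_factory=list)  # saved partial output


def round_robin(requests: list[AgentRequest], time_slice: int) -> list[str]:
    """Share one LLM among agents, preempting each after `time_slice` steps.

    When an agent is preempted, its partial output stays in `context`
    (a stand-in for a context snapshot) and is resumed on its next turn.
    Returns the order in which generation steps were executed.
    """
    if time_slice < 1:
        raise ValueError("time_slice must be >= 1")
    queue = deque(requests)
    trace: list[str] = []
    while queue:
        req = queue.popleft()
        for _ in range(min(time_slice, req.steps - len(req.context))):
            # One simulated LLM step; a real system would decode tokens here.
            step = f"{req.agent}:{len(req.context)}"
            req.context.append(step)
            trace.append(step)
        if len(req.context) < req.steps:
            queue.append(req)  # context switch: re-queue with saved state
    return trace


trace = round_robin([AgentRequest("A", 3), AgentRequest("B", 1)], time_slice=2)
print(trace)  # ['A:0', 'A:1', 'B:0', 'A:2']
```

Round-robin keeps a long-running agent from starving a short one (B finishes before A does) at the cost of extra context switches; a plain FIFO queue would run A to completion first.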

reacted to Flowerfan's post with 👍 10 months ago

Multi-Instance Generation Controller: Enjoy complete control over position generation, attribute determination, and count!

code link: https://github.com/limuloo/MIGC
project page: https://migcproject.github.io/

MIGC decouples multi-instance generation into individual single-instance generation subtasks within the cross-attention layer of Stable Diffusion.
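
To make "decoupling" concrete, here is a heavily simplified NumPy sketch under our own assumptions, not MIGC's actual method: each instance gets its own single-instance cross-attention pass against its own text tokens, and the results are merged with per-instance region masks (for example, rasterized bounding boxes). The function names, the averaging of overlapping regions, and the background fallback are all hypothetical choices:

```python
import numpy as np


def cross_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: q is (N, d), k and v are (M, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decoupled_multi_instance(q, instance_tokens, masks, background_tokens):
    """Hypothetical sketch: one single-instance attention pass per instance.

    q: (H*W, d) image queries; instance_tokens: list of (M_i, d) text
    embeddings, one per instance; masks: list of (H*W,) 0/1 region masks.
    Each instance's output is kept only inside its own region; pixels
    covered by no instance attend to background_tokens instead.
    """
    out = np.zeros_like(q)
    covered = np.zeros(q.shape[0])
    for tokens, mask in zip(instance_tokens, masks):
        inst = cross_attention(q, tokens, tokens)  # single-instance subtask
        out += mask[:, None] * inst
        covered += mask
    # Average where instance regions overlap.
    out /= np.maximum(covered, 1)[:, None]
    bg = cross_attention(q, background_tokens, background_tokens)
    return np.where(covered[:, None] > 0, out, bg)


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 2))  # 4 "pixels" (a 2x2 latent), feature dim 2
result = decoupled_multi_instance(
    q,
    instance_tokens=[np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])],
    masks=[np.array([1, 1, 0, 0]), np.array([0, 1, 1, 0])],
    background_tokens=np.array([[5.0, 5.0]]),
)
print(result)  # rows: [1, 0], [0.5, 0.5], [0, 1], [5, 5]
```

With one token per instance, each attention pass simply copies that token's value, which makes the merge easy to read off: pixel 1 lies in both regions and gets the average, and pixel 3 lies in neither and falls back to the background.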

Feel free to follow our project and use the code to create anything you can imagine!

Please let us know if you have any suggestions!

reacted to DmitryRyumin's post with 👍 10 months ago

🚀🖼️🌟 New Research Alert - CVPR 2024! 🌟🖼️🚀
📄 Title: CAMixerSR: Only Details Need More "Attention" 🌟🚀

📝 Description: CAMixerSR is a new approach that integrates a content-aware acceleration framework with a token-mixer design to make super-resolution (SR) inference more efficient: it assigns convolution to simple regions and window attention to complex textures. It generalizes well and achieves results competitive with state-of-the-art models, with better complexity-performance trade-offs on large-image SR, lightweight SR, and omnidirectional-image SR.
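
The routing idea (cheap convolution for simple regions, window attention for complex textures) can be illustrated with a small NumPy sketch. This is not CAMixerSR's implementation: the paper's routing is content-aware and learned, whereas here pixel variance is a hand-picked proxy for complexity, and `route_windows` and `attn_ratio` are hypothetical names:

```python
import numpy as np


def route_windows(img: np.ndarray, window: int, attn_ratio: float) -> np.ndarray:
    """Decide, per non-overlapping window, which path processes it.

    Returns a boolean (H // window, W // window) grid: True means the window
    is "complex" and goes to window attention, False means it is "simple"
    and goes to the cheaper convolution path. Complexity is scored by pixel
    variance, a stand-in for a learned content-aware predictor.
    """
    h, w = img.shape
    if h % window or w % window:
        raise ValueError("image size must be divisible by window")
    # (nH, window, nW, window) -> per-window variance of shape (nH, nW)
    blocks = img.reshape(h // window, window, w // window, window)
    scores = blocks.var(axis=(1, 3))
    k = int(round(attn_ratio * scores.size))  # windows sent to attention
    routed = np.zeros(scores.size, dtype=bool)
    if k > 0:
        # Mark the k highest-variance windows (indices into the flattened grid).
        routed[np.argsort(scores, axis=None)[-k:]] = True
    return routed.reshape(scores.shape)


img = np.zeros((8, 8))
img[0:4, 4:8] = np.random.default_rng(0).normal(size=(4, 4))  # one textured window
mask = route_windows(img, window=4, attn_ratio=0.25)
print(mask)  # only window (0, 1) is routed to attention
```

In this sketch, raising `attn_ratio` sends more windows down the expensive attention path, which is one simple way to expose a complexity-versus-quality knob.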

👥 Authors: Yan Wang, Shijie Zhao, Yi Liu, Junlin Li, and Li Zhang

📅 Conference: CVPR, Jun 17-21, 2024 | Seattle, WA, USA 🇺🇸

🔗 Paper: CAMixerSR: Only Details Need More "Attention" (2402.19289)

🔗 Repository: https://github.com/icandle/CAMixerSR

📚 More Papers: more cutting-edge research presented at other conferences can be found in DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin

🚀 Added to the Image Enhancement Collection: DmitryRyumin/image-enhancement-65ee1cd2fe1c0c877ae55d28

🔍 Keywords: #CAMixerSR #SuperResolution #WindowAttention #ImageEnhancement #CVPR2024 #DeepLearning #Innovation