Perceiver IO: A General Architecture for Structured Inputs & Outputs
Abstract
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain and task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization, and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
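The encode-process-decode pattern the abstract describes can be sketched in a few lines. The sketch below is illustrative only, not the paper's implementation: it uses single-head attention with no learned projections, and all array sizes and variable names are made up. The key idea it shows is that inputs and outputs only ever interact with a small latent array, so compute is linear in input size M and output size O rather than quadratic.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    # Minimal single-head attention with no learned projections,
    # for illustration only: queries attend over keys_values.
    scores = queries @ keys_values.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ keys_values

# Hypothetical sizes: M input elements, N latents (N << M), O output queries.
M, N, O, D = 10_000, 256, 128, 64
rng = np.random.default_rng(0)
inputs  = rng.normal(size=(M, D))  # flat input array (any modality)
latents = rng.normal(size=(N, D))  # latent array (learned in the real model)
out_q   = rng.normal(size=(O, D))  # output queries: their number and content
                                   # set the output's size and semantics

z = cross_attend(latents, inputs)  # encode: cost O(M*N), linear in input size
for _ in range(4):                 # process: latent self-attention, O(N^2),
    z = cross_attend(z, z)         # independent of input/output size
y = cross_attend(out_q, z)         # decode: cost O(O*N), linear in output size
print(y.shape)  # (128, 64)
```

Because the decoder is driven entirely by the output query array, the same trunk can produce outputs of different shapes for different tasks simply by swapping the queries.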
Community
The Future of AI: Exploring Perceiver IO's General Architecture
Links:
- Subscribe: https://www.youtube.com/@Arxflix
- Twitter: https://x.com/arxflix
- LMNT (Partner): https://lmnt.com/
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities (2024)
- 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities (2024)
- Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild (2024)
- Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations (2024)
- Vision-LSTM: xLSTM as Generic Vision Backbone (2024)
If you want recommendations for any paper on Hugging Face, check out this Space. You can also ask Librarian Bot for paper recommendations directly by tagging it in a comment: @librarian-bot recommend
Models citing this paper: 12
Datasets citing this paper: 0