LG · CL · CV · SD · AS · Jul 30, 2021

Perceiver IO: A General Architecture for Structured Inputs & Outputs

arXiv:2107.14795v3 · 839 citations
Originality: Highly original
AI Analysis

This work addresses the need for scalable, flexible AI systems across multiple domains and represents a significant advance rather than an incremental improvement.

The authors set out to build a general-purpose machine learning architecture that handles diverse data domains and tasks without task-specific engineering. Reported results include outperforming a Transformer-based BERT baseline on GLUE and state-of-the-art performance on Sintel optical flow estimation.

A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
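The encode, process, decode pattern described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes single-head attention with no learned projections, no MLP blocks and no layer normalization, and the names `attend` and `perceiver_io_sketch` are hypothetical. It only shows why cost grows linearly with the number of inputs and outputs: inputs and output queries each interact with a small, fixed-size latent array, never with each other.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention.

    Assumption: keys and values are the same array and there are no
    learned projections, which a real model would have.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_queries, n_kv)
    return softmax(scores) @ keys_values           # (n_queries, d)


def perceiver_io_sketch(inputs, latents, output_queries, num_latent_layers=2):
    """Hypothetical simplification of the encode/process/decode pattern."""
    # Encode: N latents attend to M inputs, cost O(M * N).
    z = latents + attend(latents, inputs)
    # Process: self-attention among latents, cost O(N^2) per layer,
    # independent of the input and output sizes.
    for _ in range(num_latent_layers):
        z = z + attend(z, z)
    # Decode: O output queries attend to N latents, cost O(O * N).
    # The number of queries alone determines the output size.
    return attend(output_queries, z)


rng = np.random.default_rng(0)
d = 16
inputs = rng.normal(size=(1000, d))   # M = 1000 input elements
latents = rng.normal(size=(32, d))    # N = 32 latents (learned in a real model)
queries = rng.normal(size=(250, d))   # O = 250 output queries
out = perceiver_io_sketch(inputs, latents, queries)
print(out.shape)  # (250, 16)
```

Changing the number of rows in `queries` changes the output size without touching the rest of the computation, which is the sense in which the querying mechanism supports outputs of various sizes.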

Code Implementations: 9 repos

Data from Papers with Code (CC-BY-SA-4.0)
