PatchGame: Learning to Signal Mid-level Patches in Referential Games
This work studies whether two agents can learn to communicate about images without supervision, with potential benefits for efficiency and for downstream computer vision tasks such as classification.
The authors show that two agents can develop, without supervision, a communication protocol for signaling important image patches, and they apply it to speeding up Vision Transformers and to pre-training for downstream recognition tasks.
We study a referential game (a type of signaling game) in which two agents communicate through a discrete bottleneck to achieve a common goal. In our referential game, the speaker's goal is to compose a message (a symbolic representation of "important" image patches), while the listener's task is to match the speaker's message to a different view of the same image. We show that the two agents can indeed develop a communication protocol without explicit or implicit supervision. We further analyze the learned protocol and show its applications in speeding up recent Vision Transformers by using only important patches, and as pre-training for downstream recognition tasks (e.g., classification). Code is available at https://github.com/kampta/PatchGame.
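To make the structure of one game round concrete, here is a minimal, non-learned sketch in plain Python. It is not the PatchGame training procedure. The importance scores, the codebook, the nearest-codebook quantizer, and the max-inner-product matching rule are all hypothetical, hand-set stand-ins for the learned speaker and listener networks. The speaker keeps the k highest-scoring patches of its view and turns each into a discrete symbol; the listener picks, among candidate views, the one whose patches best match those symbols.

```python
from typing import List, Sequence

# Hypothetical types for illustration: a patch is a feature vector,
# a view is a list of patch vectors, a message is a list of symbol ids.
Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_important_patches(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring patches, returned in spatial order.

    The importance scores are an assumed input; in a trained system they
    would come from a learned network rather than being hand-set.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def quantize(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Discrete bottleneck (assumed): id of the nearest codebook vector by squared L2."""
    return min(
        range(len(codebook)),
        key=lambda s: sum((v - c) ** 2 for v, c in zip(vec, codebook[s])),
    )


def speaker(view: Sequence[Vector], scores: Sequence[float],
            codebook: Sequence[Vector], k: int) -> List[int]:
    """Compose a message: one symbol per selected important patch."""
    return [quantize(view[i], codebook) for i in select_important_patches(scores, k)]


def listener(message: Sequence[int], candidates: Sequence[Sequence[Vector]],
             codebook: Sequence[Vector]) -> int:
    """Return the index of the candidate view that best matches the message.

    Each symbol is matched to its most similar patch in a candidate view
    (max inner product); the candidate with the highest total wins.
    """
    def match(view: Sequence[Vector]) -> float:
        return sum(max(dot(codebook[s], p) for p in view) for s in message)

    totals = [match(view) for view in candidates]
    return max(range(len(candidates)), key=lambda i: totals[i])


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
    # First view of the target image, seen by the speaker, with hand-set scores.
    view1 = [[0.9, 0.1], [0.1, 0.95], [0.0, 0.0], [0.05, -0.02]]
    scores = [0.8, 0.7, 0.1, 0.05]
    message = speaker(view1, scores, codebook, k=2)
    # The listener sees a distractor image and a second view of the target.
    distractor = [[-0.9, 0.0], [0.0, -0.8]]
    view2 = [[0.8, 0.2], [0.0, 0.9]]
    choice = listener(message, [distractor, view2], codebook)
    print(message, choice)  # prints "[0, 1] 1"
```

The hard nearest-codebook choice here is not differentiable; a learned speaker would typically need a relaxation such as Gumbel-softmax or a straight-through estimator to train through the discrete bottleneck. The same top-k selection also hints at the speedup application: passing only k of N patch tokens to a Vision Transformer shortens the token sequence, and self-attention cost grows quadratically with sequence length.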