Token Communications: A Large Model-Driven Framework for Cross-modal Context-aware Semantic Communications
This work addresses bandwidth efficiency challenges in future wireless networks by integrating generative foundation models and multimodal large language models into semantic communication systems, representing a new paradigm rather than an incremental improvement.
The paper tackles the problem of improving bandwidth efficiency in generative semantic communications. It introduces token communications (TokCom), a large model-driven framework that leverages cross-modal context information, and demonstrates a significant improvement in bandwidth efficiency in a typical image semantic communication setup.
In this paper, we introduce token communications (TokCom), a large model-driven framework that leverages cross-modal context information in generative semantic communications (GenSC). TokCom is a new paradigm, motivated by the recent success of generative foundation models and multimodal large language models (GFM/MLLMs), in which the communication units are tokens, enabling efficient transformer-based token processing at the transmitter and receiver. We discuss the potential opportunities and challenges of leveraging context in GenSC, explore how to integrate GFM/MLLM-based token processing into semantic communication systems to leverage cross-modal context effectively at affordable complexity, and present the key principles for efficient TokCom at various layers in future wireless networks. In a typical image semantic communication setup, we demonstrate that TokCom achieves a significant improvement in bandwidth efficiency by leveraging the context information among tokens. Finally, we identify potential research directions to facilitate the adoption of TokCom in future wireless networks.
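The intuition behind saving bandwidth through context among tokens can be illustrated with a toy calculation. The sketch below is not the TokCom method: it assumes a simple previous-token (bigram) model in place of a GFM/MLLM, and the function names and the token sequence are hypothetical. It compares the average number of bits per token needed when tokens are coded independently with the number needed when each token is coded given its predecessor.

```python
import math
from collections import Counter, defaultdict


def marginal_entropy(tokens):
    """Average bits per token when each token is coded independently."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(tokens):
    """Average bits per token when each token is coded given the previous one.

    Assumption: a bigram model stands in for the large-model context
    predictor; this is an illustrative simplification only.
    """
    next_counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        next_counts[prev][cur] += 1
    total_pairs = len(tokens) - 1
    bits = 0.0
    for followers in next_counts.values():
        n_prev = sum(followers.values())
        for c in followers.values():
            # Joint probability of the pair times the conditional code length.
            bits -= (c / total_pairs) * math.log2(c / n_prev)
    return bits


# Hypothetical token stream with strong sequential structure.
tokens = [0, 1, 2, 3] * 8
h_marginal = marginal_entropy(tokens)
h_context = conditional_entropy(tokens)
print(f"without context: {h_marginal:.3f} bits/token")
print(f"with context:    {h_context:.3f} bits/token")
```

In this artificial stream every token is fully determined by its predecessor, so the context-aware cost drops to zero. Real token sequences are far less predictable, and the gain depends on how well the context model captures their dependencies.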