LGDec 17, 2025
Multi-Modal Semantic CommunicationMatin Mortaheb, Erciyes Karakaya, Sennur Ulukus
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image patches at adaptive resolutions using independently trained encoder-decoder pairs, with total bitrate matching the channel capacity. At the receiver, the patches are reconstructed and combined to preserve task-critical information. This flexible and goal-driven design enables efficient semantic communication in complex and bandwidth-constrained environments.
AIApr 17
The Query Channel: Information-Theoretic Limits of Masking-Based ExplanationsErciyes Karakaya, Ozgur Ercetin
Masking-based post-hoc explanation methods, such as KernelSHAP and LIME, estimate local feature importance by querying a black-box model under randomized perturbations. This paper formulates this procedure as communication over a query channel, where the latent explanation acts as a message and each masked evaluation is a channel use. Within this framework, the complexity of the explanation is captured by the entropy of the hypothesis class, while the query interface supplies information at a rate determined by an identification capacity per query. We derive a strong converse showing that, if the explanation rate exceeds this capacity, the probability of exact recovery necessarily converges to one in error for any sequence of explainers and decoders. We also prove an achievability result establishing that a sparse maximum-likelihood decoder attains reliable recovery when the rate lies below capacity. A Monte Carlo estimator of mutual information yields a non-asymptotic query benchmark that we use to compare optimal decoding with Lasso- and OLS-based procedures that mirror LIME and KernelSHAP. Experiments reveal a range of query budgets where information theory permits reliable explanations but standard convex surrogates still fail. Finally, we interpret super-pixel resolution and tokenization for neural language models as a source-coding choice that sets the entropy of the explanation and show how Gaussian noise and nonlinear curvature degrade the query channel, induce waterfall and error-floor behavior, and render high-resolution explanations unattainable.
NIJul 25, 2024
Online Learning for Autonomous Management of Intent-based 6G NetworksErciyes Karakaya, Ozgur Ercetin, Huseyin Ozkan et al.
The growing complexity of networks and the variety of future scenarios with diverse and often stringent performance requirements call for a higher level of automation. Intent-based management emerges as a solution to attain high level of automation, enabling human operators to solely communicate with the network through high-level intents. The intents consist of the targets in the form of expectations (i.e., latency expectation) from a service and based on the expectations the required network configurations should be done accordingly. It is almost inevitable that when a network action is taken to fulfill one intent, it can cause negative impacts on the performance of another intent, which results in a conflict. In this paper, we aim to address the conflict issue and autonomous management of intent-based networking, and propose an online learning method based on the hierarchical multi-armed bandits approach for an effective management. Thanks to this hierarchical structure, it performs an efficient exploration and exploitation of network configurations with respect to the dynamic network conditions. We show that our algorithm is an effective approach regarding resource allocation and satisfaction of intent expectations.
CVMay 2, 2024
Transformer-Aided Semantic CommunicationsMatin Mortaheb, Erciyes Karakaya, Mohammad A. Amir Khojastepour et al.
The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for their ability to identify and highlight the most relevant aspects of input data. Such a capability is particularly beneficial in addressing a variety of communication challenges, notably in the realm of semantic communication where proper encoding of the relevant data is critical especially in systems with limited bandwidth. In this work, we employ vision transformers specifically for the purpose of compression and compact representation of the input image, with the goal of preserving semantic information throughout the transmission process. Through the use of the attention mechanism inherent in transformers, we create an attention mask. This mask effectively prioritizes critical segments of images for transmission, ensuring that the reconstruction phase focuses on key objects highlighted by the mask. Our methodology significantly improves the quality of semantic communication and optimizes bandwidth usage by encoding different parts of the data in accordance with their semantic information content, thus enhancing overall efficiency. We evaluate the effectiveness of our proposed framework using the TinyImageNet dataset, focusing on both reconstruction quality and accuracy. Our evaluation results demonstrate that our framework successfully preserves semantic information, even when only a fraction of the encoded data is transmitted, according to the intended compression rates.