CLJun 17, 2023
Large Generative AI Models for Telecom: The Next Big Thing?Lina Bariah, Qiyang Zhao, Hang Zou et al.
The evolution of generative artificial intelligence (GenAI) constitutes a turning point in reshaping the future of technology in different aspects. Wireless networks in particular, with the blooming of self-evolving networks, represent a rich field for exploiting GenAI and reaping several benefits that can fundamentally change the way how wireless networks are designed and operated nowadays. To be specific, large GenAI models are envisioned to open up a new era of autonomous wireless networks, in which multi-modal GenAI models trained over various Telecom data, can be fine-tuned to perform several downstream tasks, eliminating the need for building and training dedicated AI models for each specific task and paving the way for the realization of artificial general intelligence (AGI)-empowered wireless networks. In this article, we aim to unfold the opportunities that can be reaped from integrating large GenAI models into the Telecom domain. In particular, we first highlight the applications of large GenAI models in future wireless networks, defining potential use-cases and revealing insights on the associated theoretical and practical challenges. Furthermore, we unveil how 6G can open up new opportunities through connecting multiple on-device large GenAI models, and hence, paves the way to the collective intelligence paradigm. Finally, we put a forward-looking vision on how large GenAI models will be the key to realize self-evolving networks.
CLJun 9, 2023
Understanding Telecom Language Through Large Language ModelsLina Bariah, Hang Zou, Qiyang Zhao et al.
The recent progress of artificial intelligence (AI) opens up new frontiers in the possibility of automating many tasks involved in Telecom networks design, implementation, and deployment. This has been further pushed forward with the evolution of generative artificial intelligence (AI), including the emergence of large language models (LLMs), which is believed to be the cornerstone toward realizing self-governed, interactive AI agents. Motivated by this, in this paper, we aim to adapt the paradigm of LLMs to the Telecom domain. In particular, we fine-tune several LLMs including BERT, distilled BERT, RoBERTa and GPT-2, to the Telecom domain languages, and demonstrate a use case for identifying the 3rd Generation Partnership Project (3GPP) standard working groups. We consider training the selected models on 3GPP technical documents (Tdoc) pertinent to years 2009-2019 and predict the Tdoc categories in years 2020-2023. The results demonstrate that fine-tuning BERT and RoBERTa model achieves 84.6% accuracy, while GPT-2 model achieves 83% in identifying 3GPP working groups. The distilled BERT model with around 50% less parameters achieves similar performance as others. This corroborates that fine-tuning pretrained LLM can effectively identify the categories of Telecom language. The developed framework shows a stepping stone towards realizing intent-driven and self-evolving wireless networks from Telecom languages, and paves the way for the implementation of generative AI in the Telecom domain.
SPJul 10, 2024
Generative AI for RF Sensing in IoT systemsLi Wang, Chao Zhang, Qiyang Zhao et al.
The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significant challenges, including noise, interference, incomplete data, and high deployment costs, which limit their effectiveness and scalability. This paper investigates the potential of Generative AI (GenAI) to overcome these limitations within the IoT ecosystem. We provide a comprehensive review of state-of-the-art GenAI techniques, focusing on their application to RF sensing problems. By generating high-quality synthetic data, enhancing signal quality, and integrating multi-modal data, GenAI offers robust solutions for RF environment reconstruction, localization, and imaging. Additionally, GenAI's ability to generalize enables IoT devices to adapt to new environments and unseen tasks, improving their efficiency and performance. The main contributions of this article include a detailed analysis of the challenges in RF sensing, the presentation of innovative GenAI-based solutions, and the proposal of a unified framework for diverse RF sensing tasks. Through case studies, we demonstrate the effectiveness of integrating GenAI models, leading to advanced, scalable, and intelligent IoT systems.
SPJul 12, 2024
TelecomGPT: A Framework to Build Telecom-Specfic Large Language ModelsHang Zou, Qiyang Zhao, Yu Tian et al.
Large Language Models (LLMs) have the potential to revolutionize the Sixth Generation (6G) communication networks. However, current mainstream LLMs generally lack the specialized knowledge in telecom domain. In this paper, for the first time, we propose a pipeline to adapt any general purpose LLMs to a telecom-specific LLMs. We collect and build telecom-specific pre-train dataset, instruction dataset, preference dataset to perform continual pre-training, instruct tuning and alignment tuning respectively. Besides, due to the lack of widely accepted evaluation benchmarks in telecom domain, we extend existing evaluation benchmarks and proposed three new benchmarks, namely, Telecom Math Modeling, Telecom Open QnA and Telecom Code Tasks. These new benchmarks provide a holistic evaluation of the capabilities of LLMs including math modeling, Open-Ended question answering, code generation, infilling, summarization and analysis in telecom domain. Our fine-tuned LLM TelecomGPT outperforms state of the art (SOTA) LLMs including GPT-4, Llama-3 and Mistral in Telecom Math Modeling benchmark significantly and achieve comparable performance in various evaluation benchmarks such as TeleQnA, 3GPP technical documents classification, telecom code summary and generation and infilling.
SPAug 31, 2023
Joint Semantic-Native Communication and Inference via Minimal Simplicial StructuresQiyang Zhao, Hang Zou, Mehdi Bennis et al.
In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud sever) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields a 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.
LGJan 29Code
LLM4Fluid: Large Language Models as Generalizable Neural Solvers for Fluid DynamicsQisong Xiao, Xinhai Chen, Qinglin Wang et al.
Deep learning has emerged as a promising paradigm for spatio-temporal modeling of fluid dynamics. However, existing approaches often suffer from limited generalization to unseen flow conditions and typically require retraining when applied to new scenarios. In this paper, we present LLM4Fluid, a spatio-temporal prediction framework that leverages Large Language Models (LLMs) as generalizable neural solvers for fluid dynamics. The framework first compresses high-dimensional flow fields into a compact latent space via reduced-order modeling enhanced with a physics-informed disentanglement mechanism, effectively mitigating spatial feature entanglement while preserving essential flow structures. A pretrained LLM then serves as a temporal processor, autoregressively predicting the dynamics of physical sequences with time series prompts. To bridge the modality gap between prompts and physical sequences, which can otherwise degrade prediction accuracy, we propose a dedicated modality alignment strategy that resolves representational mismatch and stabilizes long-term prediction. Extensive experiments across diverse flow scenarios demonstrate that LLM4Fluid functions as a robust and generalizable neural solver without retraining, achieving state-of-the-art accuracy while exhibiting powerful zero-shot and in-context learning capabilities. Code and datasets are publicly available at https://github.com/qisongxiao/LLM4Fluid.
CVAug 23, 2024
La-SoftMoE CLIP for Unified Physical-Digital Face Attack DetectionHang Zou, Chenxi Du, Hui Zhang et al.
Facial recognition systems are susceptible to both physical and digital attacks, posing significant security risks. Traditional approaches often treat these two attack types separately due to their distinct characteristics. Thus, when being combined attacked, almost all methods could not deal. Some studies attempt to combine the sparse data from both types of attacks into a single dataset and try to find a common feature space, which is often impractical due to the space is difficult to be found or even non-existent. To overcome these challenges, we propose a novel approach that uses the sparse model to handle sparse data, utilizing different parameter groups to process distinct regions of the sparse feature space. Specifically, we employ the Mixture of Experts (MoE) framework in our model, expert parameters are matched to tokens with varying weights during training and adaptively activated during testing. However, the traditional MoE struggles with the complex and irregular classification boundaries of this problem. Thus, we introduce a flexible self-adapting weighting mechanism, enabling the model to better fit and adapt. In this paper, we proposed La-SoftMoE CLIP, which allows for more flexible adaptation to the Unified Attack Detection (UAD) task, significantly enhancing the model's capability to handle diversity attacks. Experiment results demonstrate that our proposed method has SOTA performance.
ROApr 8
Telecom World Models: Unifying Digital Twins, Foundation Models, and Predictive Planning for 6GHang Zou, Yuzhi Yang, Lina Bariah et al.
The integration of machine learning tools into telecom networks, has led to two prevailing paradigms, namely, language-based systems, such as Large Language Models (LLMs), and physics-based systems, such as Digital Twins (DTs). While LLM-based approaches enable flexible interaction and automation, they lack explicit representations of network dynamics. DTs, in contrast, offer a high-fidelity network simulation, but remain scenario-specific and are not designed for learning or decision-making under uncertainty. This gap becomes critical for 6G systems, where decisions must take into account the evolving network states, uncertainty, and the cascading effects of control actions across multiple layers. In this article, we introduce the {Telecom World Model}~(TWM) concept, an architecture for learned, action-conditioned, uncertainty-aware modeling of telecom system dynamics. We decompose the problem into two interacting worlds, a controllable system world consisting of operator-configurable settings and an external world that captures propagation, mobility, traffic, and failures. We propose a three-layer architecture, comprising a field world model for spatial environment prediction, a control/dynamics world model for action-conditioned Key Performance Indicator (KPI) trajectory prediction, and a telecom foundation model layer for intent translation and orchestration. We showcase a comparative analysis between existing paradigms, which demonstrates that TWM jointly provides telecom state grounding, fast action-conditioned roll-outs, calibrated uncertainty, multi-timescale dynamics, model-based planning, and LLM-integrated guardrails. Furthermore, we present a proof-of-concept on network slicing to validate the proposed architecture, showing that the full three-layer pipeline outperforms single-world baselines and accurately predicts KPI trajectories.
SPFeb 16
RF-GPT: Teaching AI to See the Wireless WorldHang Zou, Yu Tian, Bohao Wang et al.
Large language models (LLMs) and multimodal models have become powerful general-purpose reasoning systems. However, radio-frequency (RF) signals, which underpin wireless systems, are still not natively supported by these models. Existing LLM-based approaches for telecom focus mainly on text and structured data, while conventional RF deep-learning models are built separately for specific signal-processing tasks, highlighting a clear gap between RF perception and high-level reasoning. To bridge this gap, we introduce RF-GPT, a radio-frequency language model (RFLM) that utilizes the visual encoders of multimodal LLMs to process and understand RF spectrograms. In this framework, complex in-phase/quadrature (IQ) waveforms are mapped to time-frequency spectrograms and then passed to pretrained visual encoders. The resulting representations are injected as RF tokens into a decoder-only LLM, which generates RF-grounded answers, explanations, and structured outputs. To train RF-GPT, we perform supervised instruction fine-tuning of a pretrained multimodal LLM using a fully synthetic RF corpus. Standards-compliant waveform generators produce wideband scenes for six wireless technologies, from which we derive time-frequency spectrograms, exact configuration metadata, and dense captions. A text-only LLM then converts these captions into RF-grounded instruction-answer pairs, yielding roughly 12,000 RF scenes and 0.625 million instruction examples without any manual labeling. Across benchmarks for wideband modulation classification, overlap analysis, wireless-technology recognition, WLAN user counting, and 5G NR information extraction, RF-GPT achieves strong multi-task performance, whereas general-purpose VLMs with no RF grounding largely fail.
CVAug 19, 2024
A Unified Framework for Iris Anti-Spoofing: Introducing Iris Anti-Spoofing Cross-Domain-Testing Protocol and Masked-MoE MethodHang Zou, Chenxi Du, Ajian Liu et al.
Iris recognition is widely used in high-security scenarios due to its stability and distinctiveness. However, iris images captured by different devices exhibit certain and device-related consistent differences, which has a greater impact on the classification algorithm for anti-spoofing. The iris of various races would also affect the classification, causing the risk of identity theft. So it is necessary to improve the cross-domain capabilities of the iris anti-spoofing (IAS) methods to enable it more robust in facing different races and devices. However, there is no existing protocol that is comprehensively available. To address this gap, we propose an Iris Anti-Spoofing Cross-Domain-Testing (IAS-CDT) Protocol, which involves 10 datasets, belonging to 7 databases, published by 4 institutions, and collected with 6 different devices. It contains three sub-protocols hierarchically, aimed at evaluating average performance, cross-racial generalization, and cross-device generalization of IAS models. Moreover, to address the cross-device generalization challenge brought by the IAS-CDT Protocol, we employ multiple model parameter sets to learn from the multiple sub-datasets. Specifically, we utilize the Mixture of Experts (MoE) to fit complex data distributions using multiple sub-neural networks. To further enhance the generalization capabilities, we propose a novel method Masked-MoE (MMoE), which randomly masks a portion of tokens for some experts and requires their outputs to be similar to the unmasked experts, which can effectively mitigate the overfitting issue of MoE. For the evaluation, we selected ResNet50, VIT-B/16, CLIP, and FLIP as representative models and benchmarked them under the proposed IAS-CDT Protocol.
CLJun 29, 2024Code
Large Language Models for Power Scheduling: A User-Centric ApproachThomas Mongaillard, Samson Lasaulce, Othman Hicheur et al.
While traditional optimization and scheduling schemes are designed to meet fixed, predefined system requirements, future systems are moving toward user-driven approaches and personalized services, aiming to achieve high quality-of-experience (QoE) and flexibility. This challenge is particularly pronounced in wireless and digitalized energy networks, where users' requirements have largely not been taken into consideration due to the lack of a common language between users and machines. The emergence of powerful large language models (LLMs) marks a radical departure from traditional system-centric methods into more advanced user-centric approaches by providing a natural communication interface between users and devices. In this paper, for the first time, we introduce a novel architecture for resource scheduling problems by constructing three LLM agents to convert an arbitrary user's voice request (VRQ) into a resource allocation vector. Specifically, we design an LLM intent recognition agent to translate the request into an optimization problem (OP), an LLM OP parameter identification agent, and an LLM OP solving agent. To evaluate system performance, we construct a database of typical VRQs in the context of electric vehicle (EV) charging. As a proof of concept, we primarily use Llama 3 8B. Through testing with different prompt engineering scenarios, the obtained results demonstrate the efficiency of the proposed architecture. The conducted performance analysis allows key insights to be extracted. For instance, having a larger set of candidate OPs to model the real-world problem might degrade the final performance because of a higher recognition/OP classification noise level. All results and codes are open source.
AIFeb 26, 2024
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and ReasoningHang Zou, Qiyang Zhao, Samson Lasaulce et al.
Generative Artificial Intelligence (GenAI) and communication networks are expected to have groundbreaking synergies for 6G. Connecting GenAI agents via a wireless network can potentially unleash the power of Collective Intelligence (CI) and pave the way for Artificial General Intelligence (AGI). However, current wireless networks are designed as a "data pipe" and are not suited to accommodate and leverage the power of GenAI. In this paper, we propose the GenAINet framework in which distributed GenAI agents communicate knowledge (facts, experiences, and methods) to accomplish arbitrary tasks. We first propose an architecture for a single GenAI agent and then provide a network architecture integrating GenAI capabilities to manage both network protocols and applications. Building on this, we investigate effective communication and reasoning problems by proposing a semantic-native GenAINet. Specifically, GenAI agents extract semantics from heterogeneous raw data, build and maintain a knowledge model representing the semantic relationships among pieces of knowledge, which is retrieved by GenAI models for planning and reasoning. Under this paradigm, different levels of collaboration can be achieved flexibly depending on the complexity of targeted tasks. Furthermore, we conduct two case studies in which, through wireless device queries, we demonstrate that extracting, compressing and transferring common knowledge can improve query accuracy while reducing communication costs; and in the wireless power control problem, we show that distributed agents can complete general tasks independently through collaborative reasoning without predefined communication protocols. Finally, we discuss challenges and future research directions in applying Large Language Models (LLMs) in 6G networks.
NIMar 6, 2025
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital ExperiencesAdnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi et al.
This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.
ITSep 16, 2019
Decision Set Optimization and Energy-Efficient MIMO CommunicationsHang Zou, Chao Zhang, Samson Lasaulce et al.
Assuming that the number of possible decisions for a transmitter (e.g., the number of possible beamforming vectors) has to be finite and is given, this paper investigates for the first time the problem of determining the best decision set when energy-efficiency maximization is pursued. We propose a framework to find a good (finite) decision set which induces a minimal performance loss w.r.t. to the continuous case. We exploit this framework for a scenario of energy-efficient MIMO communications in which transmit power and beamforming vectors have to be adapted jointly to the channel given under finite-rate feedback. To determine a good decision set we propose an algorithm which combines the approach of Invasive Weed Optimization (IWO) and an Evolutionary Algorithm (EA). We provide a numerical analysis which illustrates the benefits of our point of view. In particular, given a performance loss level, the feedback rate can by reduced by 2 when the transmit decision set has been designed properly by using our algorithm. The impact on energy-efficiency is also seen to be significant.
LGMay 17, 2019
Decision-Oriented Communications: Application to Energy-Efficient Resource AllocationHang Zou, Chao Zhang, Samson Lasaulce et al.
In this paper, we introduce the problem of decision-oriented communications, that is, the goal of the source is to send the right amount of information in order for the intended destination to execute a task. More specifically, we restrict our attention to how the source should quantize information so that the destination can maximize a utility function which represents the task to be executed only knowing the quantized information. For example, for utility functions under the form $u\left(\boldsymbol{x};\ \boldsymbol{g}\right)$, $\boldsymbol{x}$ might represent a decision in terms of using some radio resources and $\boldsymbol{g}$ the system state which is only observed through its quantized version $Q(\boldsymbol{g})$. Both in the case where the utility function is known and the case where it is only observed through its realizations, we provide solutions to determine such a quantizer. We show how this approach applies to energy-efficient power allocation. In particular, it is seen that quantizing the state very roughly is perfectly suited to sum-rate-type function maximization, whereas energy-efficiency metrics are more sensitive to imperfections.
ITApr 2, 2019
Task Oriented Channel State Information QuantizationHang Zou, Chao Zhang, Samson Lasaulce
In this paper, we propose a new perspective for quantizing a signal and more specifically the channel state information (CSI). The proposed point of view is fully relevant for a receiver which has to send a quantized version of the channel state to the transmitter. Roughly, the key idea is that the receiver sends the right amount of information to the transmitter so that the latter be able to take its (resource allocation) decision. More formally, the decision task of the transmitter is to maximize an utility function u(x;g) with respect to x (e.g., a power allocation vector) given the knowledge of a quantized version of the function parameters g. We exhibit a special case of an energy-efficient power control (PC) problem for which the optimal task oriented CSI quantizer (TOCQ) can be found analytically. For more general utility functions, we propose to use neural networks (NN) based learning. Simulations show that the compression rate obtained by adapting the feedback information rate to the function to be optimized may be significantly increased.