ITAug 11, 2023
Large Language Models for Telecom: Forthcoming Impact on the IndustryAli Maatouk, Nicola Piovesan, Fadhel Ayed et al.
Large Language Models (LLMs), AI-driven models that can achieve general-purpose language understanding and generation, have emerged as a transformative force, revolutionizing fields well beyond Natural Language Processing (NLP) and garnering unprecedented attention. As LLM technology continues to progress, the telecom industry is facing the prospect of its impact on its landscape. To elucidate these implications, we delve into the inner workings of LLMs, providing insights into their current capabilities and limitations. We also examine the use cases that can be readily implemented in the telecom industry, streamlining tasks, such as anomalies resolutions and technical specifications comprehension, which currently hinder operational efficiency and demand significant manpower and expertise. Furthermore, we uncover essential research directions that deal with the distinctive challenges of utilizing the LLMs within the telecom domain. Addressing them represents a significant stride towards fully harnessing the potential of LLMs and unlocking their capabilities to the fullest extent within the telecom domain.
ITOct 23, 2023
TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications KnowledgeAli Maatouk, Fadhel Ayed, Nicola Piovesan et al.
We introduce TeleQnA, the first benchmark dataset designed to evaluate the knowledge of Large Language Models (LLMs) in telecommunications. Comprising 10,000 questions and answers, this dataset draws from diverse sources, including standards and research articles. This paper outlines the automated question generation framework responsible for creating this dataset, along with how human input was integrated at various stages to ensure the quality of the questions. Afterwards, using the provided dataset, an evaluation is conducted to assess the capabilities of LLMs, including GPT-3.5 and GPT-4. The results highlight that these models struggle with complex standards related questions but exhibit proficiency in addressing general telecom-related inquiries. Additionally, our results showcase how incorporating telecom knowledge context significantly enhances their performance, thus shedding light on the need for a specialized telecom foundation model. Finally, the dataset is shared with active telecom professionals, whose performance is subsequently benchmarked against that of the LLMs. The findings illustrate that LLMs can rival the performance of active professionals in telecom knowledge, thanks to their capacity to process vast amounts of information, underscoring the potential of LLMs within this domain. The dataset has been made publicly accessible on GitHub.
62.4LGMay 28
HPO: Hysteretic Policy Optimization for Stable and Efficient Training under Sparse-Reward RegimeMohamed Sana, Nicola Piovesan, Antonio De Domenico et al.
We investigate a narrow but common failure mode of GRPO-style reinforcement learning in the context of sparse verifiable rewards: early updates contain more responses with negative advantages than those with positive advantages, while response-level length normalization ties the magnitude of the update to the length of the output. We propose Hysteretic Policy Optimization (HPO), a minimal modification of GRPO that reduces the weight of negative-advantage updates and replaces per-response length normalization with mean-length normalization. We further introduce Adaptive HPO (A-HPO), which sets the hysteretic weight based on batch-level advantage-sign statistics, thereby removing the need for tuning a fixed hysteretic weight. In our TeleLogs and Countdown experiments, A-HPO improves the reward per update compared to GRPO, with the largest gains in early sparse reward regimes. On TeleLogs, A-HPO achieves a final reward of 0.84, outperforming SAPO by 5%, GSPO by 11%, and GRPO by 15%, while maintaining a comparable response-length. On Countdown, A-HPO achieves the largest gains in initial and most difficult configurations across 1.5B-7B models. Ablation studies on the hysteretic weight show that the gains of A-HPO come from better balancing the contributions of positive and negative advantages compared to positive-only or fully symmetric updates.
NIJan 12, 2023
Accordion: A Communication-Aware Machine Learning Framework for Next Generation NetworksFadhel Ayed, Antonio De Domenico, Adrian Garcia-Rodriguez et al.
In this article, we advocate for the design of ad hoc artificial intelligence (AI)/machine learning (ML) models to facilitate their usage in future smart infrastructures based on communication networks. To motivate this, we first review key operations identified by the 3GPP for transferring AI/ML models through 5G networks and the main existing techniques to reduce their communication overheads. We also present a novel communication-aware ML framework, which we refer to as Accordion, that enables an efficient AI/ML model transfer thanks to an overhauled model training and communication protocol. We demonstrate the communication-related benefits of Accordion, analyse key performance trade-offs, and discuss potential research directions within this realm.
MLFeb 14, 2023
Data pruning and neural scaling laws: fundamental limitations of score-based algorithmsFadhel Ayed, Soufiane Hayou
Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process. Recent empirical results reveal that random data pruning remains a strong baseline and outperforms most existing data pruning methods in the high compression regime, i.e., where a fraction of $30\%$ or less of the data is kept. This regime has recently attracted a lot of interest as a result of the role of data pruning in improving the so-called neural scaling laws; in [Sorscher et al.], the authors showed the need for high-quality data pruning algorithms in order to beat the sample power law. In this work, we focus on score-based data pruning algorithms and show theoretically and empirically why such algorithms fail in the high compression regime. We demonstrate ``No Free Lunch" theorems for data pruning and present calibration protocols that enhance the performance of existing pruning algorithms in this high compression regime using randomization.
MLMay 17, 2022
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibilityHoil Lee, Fadhel Ayed, Paul Jung et al.
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and some nodes have asymptotically non-negligible contributions, therefore representing important hidden features. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.
MLFeb 2, 2023
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature LearningFrancois Caron, Fadhel Ayed, Paul Jung et al.
We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
LGOct 31, 2023
FlexTrain: A Dynamic Training Framework for Heterogeneous Devices EnvironmentsMert Unsal, Ali Maatouk, Antonio De Domenico et al.
As deep learning models become increasingly large, they pose significant challenges in heterogeneous devices environments. The size of deep learning models makes it difficult to deploy them on low-power or resource-constrained devices, leading to long inference times and high energy consumption. To address these challenges, we propose FlexTrain, a framework that accommodates the diverse storage and computational resources available on different devices during the training phase. FlexTrain enables efficient deployment of deep learning models, while respecting device constraints, minimizing communication costs, and ensuring seamless integration with diverse devices. We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices, saving training time and energy consumption. We also extend FlexTrain to the federated learning setting, showing that our approach outperforms standard federated learning benchmarks on both CIFAR-10 and CIFAR-100 datasets.
CLSep 19, 2024
Pay Attention to What MattersPedro Luiz Silva, Antonio de Domenico, Ali Maatouk et al.
Despite the remarkable success of Large Language Models (LLMs), they still exhibit a limited capability to align their outputs to the user instructions. In this work, we introduce a simple and effective method, which we name GUIDE, that mechanistically increases attention scores in instruction tokens. To support this operation, we present Influence, a novel metric that highlights how the user's instructions propagate through the transformer layers and impact the LLM output. Our results show that GUIDE improves the accuracy of following instructions 29.4 % to 60.4%, outperforming natural prompting alternatives and Supervised Fine-Tuning up to 1M tokens.
AIJun 12, 2025Code
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem SolvingVincenzo Colle, Mohamed Sana, Nicola Piovesan et al.
The increasing adoption of artificial intelligence in telecommunications has raised interest in the capability of Large Language Models (LLMs) to address domain-specific, mathematically intensive tasks. Although recent advancements have improved the performance of LLMs in general mathematical reasoning, their effectiveness within specialized domains, such as signal processing, network optimization, and performance analysis, remains largely unexplored. To address this gap, we introduce TeleMath, the first benchmark dataset specifically designed to evaluate LLM performance in solving mathematical problems with numerical solutions in the telecommunications domain. Comprising 500 question-answer (QnA) pairs, TeleMath covers a wide spectrum of topics in the telecommunications field. This paper outlines the proposed QnAs generation pipeline, starting from a selected seed of problems crafted by Subject Matter Experts. The evaluation of a wide range of open-source LLMs reveals that best performance on TeleMath is achieved by recent models explicitly designed for mathematical or logical reasoning. In contrast, general-purpose models, even those with a large number of parameters, often struggle with these challenges. We have released the dataset and the evaluation code to ease result reproducibility and support future research.
AIJul 29, 2025Code
Reasoning Language Models for Root Cause Analysis in 5G Wireless NetworksMohamed Sana, Nicola Piovesan, Antonio De Domenico et al.
Root Cause Analysis (RCA) in mobile networks remains a challenging task due to the need for interpretability, domain expertise, and causal reasoning. In this work, we propose a lightweight framework that leverages Large Language Models (LLMs) for RCA. To do so, we introduce TeleLogs, a curated dataset of annotated troubleshooting problems designed to benchmark RCA capabilities. Our evaluation reveals that existing open-source reasoning LLMs struggle with these problems, underscoring the need for domain-specific adaptation. To address this issue, we propose a two-stage training methodology that combines supervised fine-tuning with reinforcement learning to improve the accuracy and reasoning quality of LLMs. The proposed approach fine-tunes a series of RCA models to integrate domain knowledge and generate structured, multi-step diagnostic explanations, improving both interpretability and effectiveness. Extensive experiments across multiple LLM sizes show significant performance gains over state-of-the-art reasoning and non-reasoning models, including strong generalization to randomized test variants. These results demonstrate the promise of domain-adapted, reasoning-enhanced LLMs for practical and explainable RCA in network operation and management.
LGJul 30, 2020Code
Anomaly Detection at Scale: The Case for Deep Distributional Time Series ModelsFadhel Ayed, Lorenzo Stella, Tim Januschowski et al.
This paper introduces a new methodology for detecting anomalies in time series data, with a primary application to monitoring the health of (micro-) services and cloud resources. The main novelty in our approach is that instead of modeling time series consisting of real values or vectors of real values, we model time series of probability distributions over real values (or vectors). This extension to time series of probability distributions allows the technique to be applied to the common scenario where the data is generated by requests coming in to a service, which is then aggregated at a fixed temporal frequency. Our method is amenable to streaming anomaly detection and scales to monitoring for anomalies on millions of time series. We show the superior accuracy of our method on synthetic and public real-world data. On the Yahoo Webscope data set, we outperform the state of the art in 3 out of 4 data sets and we show that we outperform popular open-source anomaly detection tools by up to 17% average improvement for a real-world data set.
CLMar 7, 2024
Telecom Language Models: Must They Be Large?Nicola Piovesan, Antonio De Domenico, Fadhel Ayed
The increasing interest in Large Language Models (LLMs) within the telecommunications sector underscores their potential to revolutionize operational efficiency. However, the deployment of these sophisticated models is often hampered by their substantial size and computational demands, raising concerns about their viability in resource-constrained environments. Addressing this challenge, recent advancements have seen the emergence of small language models that surprisingly exhibit performance comparable to their larger counterparts in many tasks, such as coding and common-sense reasoning. Phi-2, a compact yet powerful model, exemplifies this new wave of efficient small language models. This paper conducts a comprehensive evaluation of Phi-2's intrinsic understanding of the telecommunications domain. Recognizing the scale-related limitations, we enhance Phi-2's capabilities through a Retrieval-Augmented Generation approach, meticulously integrating an extensive knowledge base specifically curated with telecom standard specifications. The enhanced Phi-2 model demonstrates a profound improvement in accuracy, answering questions about telecom standards with a precision that closely rivals the more resource-intensive GPT-3.5. The paper further explores the refined capabilities of Phi-2 in addressing problem-solving scenarios within the telecom sector, highlighting its potential and limitations.
AINov 10, 2024
Hermes: A Large Language Model Framework on the Journey to Autonomous NetworksFadhel Ayed, Ali Maatouk, Nicola Piovesan et al.
The drive toward automating cellular network operations has grown with the increasing complexity of these systems. Despite advancements, full autonomy currently remains out of reach due to reliance on human intervention for modeling network behaviors and defining policies to meet target requirements. Network Digital Twins (NDTs) have shown promise in enhancing network intelligence, but the successful implementation of this technology is constrained by use case-specific architectures, limiting its role in advancing network autonomy. A more capable network intelligence, or "telecommunications brain", is needed to enable seamless, autonomous management of cellular network. Large Language Models (LLMs) have emerged as potential enablers for this vision but face challenges in network modeling, especially in reasoning and handling diverse data types. To address these gaps, we introduce Hermes, a chain of LLM agents that uses "blueprints" for constructing NDT instances through structured and explainable logical steps. Hermes allows automatic, reliable, and accurate network modeling of diverse use cases and configurations, thus marking progress toward fully autonomous network operations.
CLDec 5, 2025
TeleTables: A Benchmark for Large Language Models in Telecom Table InterpretationAnas Ezzakri, Nicola Piovesan, Mohamed Sana et al.
Language Models (LLMs) are increasingly explored in the telecom industry to support engineering tasks, accelerate troubleshooting, and assist in interpreting complex technical documents. However, recent studies show that LLMs perform poorly on telecom standards, particularly 3GPP specifications. We argue that a key reason is that these standards densely include tables to present essential information, yet the LLM knowledge and interpretation ability of such tables remains largely unexamined. To address this gap, we introduce TeleTables, a benchmark designed to evaluate both the implicit knowledge LLMs have about tables in technical specifications and their explicit ability to interpret them. TeleTables is built through a novel multi-stage data generation pipeline that extracts tables from 3GPP standards and uses multimodal and reasoning-oriented LLMs to generate and validate questions. The resulting dataset, which is publicly available, comprises 500 human-verified question-answer pairs, each associated with the corresponding table in multiple formats. Our evaluation shows that, smaller models (under 10B parameters) struggle both to recall 3GPP knowledge and to interpret tables, indicating the limited exposure to telecom standards in their pretraining and the insufficient inductive biases for navigating complex technical material. Larger models, on the other hand, show stronger reasoning on table interpretation. Overall, TeleTables highlights the need for domain-specialized fine-tuning to reliably interpret and reason over telecom standards.
LGSep 5, 2025
KVCompose: Efficient Structured KV Cache Compression with Composite TokensDmitry Akulov, Mohamed Sana, Antonio De Domenico et al.
Large language models (LLMs) rely on key-value (KV) caches for efficient autoregressive decoding; however, cache size grows linearly with context length and model depth, becoming a major bottleneck in long-context inference. Prior KV cache compression methods either enforce rigid heuristics, disrupt tensor layouts with per-attention-head variability, or require specialized compute kernels. We propose a simple, yet effective, KV cache compression framework based on attention-guided, layer-adaptive composite tokens. Our method aggregates attention scores to estimate token importance, selects head-specific tokens independently, and aligns them into composite tokens that respect the uniform cache structure required by existing inference engines. A global allocation mechanism further adapts retention budgets across layers, assigning more capacity to layers with informative tokens. This approach achieves significant memory reduction while preserving accuracy, consistently outperforming prior structured and semi-structured methods. Crucially, our approach remains fully compatible with standard inference pipelines, offering a practical and scalable solution for efficient long-context LLM deployment.
LGApr 24, 2025
Goal-Oriented Time-Series Forecasting: Foundation Framework DesignLuca-Andrei Fechete, Mohamed Sana, Fadhel Ayed et al.
Conventional time-series forecasting methods typically aim to minimize overall prediction error, without accounting for the varying importance of different forecast ranges in downstream applications. We propose a training methodology that enables forecasting models to adapt their focus to application-specific regions of interest at inference time, without retraining. The approach partitions the prediction space into fine-grained segments during training, which are dynamically reweighted and aggregated to emphasize the target range specified by the application. Unlike prior methods that predefine these ranges, our framework supports flexible, on-demand adjustments. Experiments on standard benchmarks and a newly collected wireless communication dataset demonstrate that our method not only improves forecast accuracy within regions of interest but also yields measurable gains in downstream task performance. These results highlight the potential for closer integration between predictive modeling and decision-making in real-world systems.
MLJun 6, 2021
Regularization in ResNet with Stochastic DepthSoufiane Hayou, Fadhel Ayed
Regularization plays a major role in modern deep learning. From classic techniques such as L1,L2 penalties to other noise-based methods such as Dropout, regularization often yields better generalization properties by avoiding overfitting. Recently, Stochastic Depth (SD) has emerged as an alternative regularization technique for residual neural networks (ResNets) and has proven to boost the performance of ResNet on many tasks [Huang et al., 2016]. Despite the recent success of SD, little is known about this technique from a theoretical perspective. This paper provides a hybrid analysis combining perturbation analysis and signal propagation to shed light on different regularization effects of SD. Our analysis allows us to derive principled guidelines for choosing the survival rates used for training with SD.
MLFeb 13, 2019
Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with Double Power-law BehaviorFadhel Ayed, Juho Lee, François Caron
Bayesian nonparametric approaches, in particular the Pitman-Yor process and the associated two-parameter Chinese Restaurant process, have been successfully used in applications where the data exhibit a power-law behavior. Examples include natural language processing, natural images or networks. There is also growing empirical evidence that some datasets exhibit a two-regime power-law behavior: one regime for small frequencies, and a second regime, with a different exponent, for high frequencies. In this paper, we introduce a class of completely random measures which are doubly regularly-varying. Contrary to the Pitman-Yor process, we show that when completely random measures in this class are normalized to obtain random probability measures and associated random partitions, such partitions exhibit a double power-law behavior. We discuss in particular three models within this class: the beta prime process (Broderick et al. (2015, 2018), a novel process called generalized BFRY process, and a mixture construction. We derive efficient Markov chain Monte Carlo algorithms to estimate the parameters of these models. Finally, we show that the proposed models provide a better fit than the Pitman-Yor process on various datasets.
STJun 25, 2018
On consistent estimation of the missing massFadhel Ayed, Marco Battiston, Federico Camerlenghi et al.
Given $n$ samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the $(n+1)$-th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results by Ohannessian and Dahleh \citet{Oha12} and Mossel and Ohannessian \citet{Mos15} showed: i) the impossibility of estimating (learning) the missing mass without imposing further structural assumptions on the type proportions; ii) the consistency of the Good-Turing estimator for the missing mass under the assumption that the tail of the type proportions decays to zero as a regularly varying function with parameter $α\in(0,1)$. In this paper we rely on tools from Bayesian nonparametrics to provide an alternative, and simpler, proof of the impossibility of a distribution-free estimation of the missing mass. Up to our knowledge, the use of Bayesian ideas to study large sample asymptotics for the missing mass is new, and it could be of independent interest. Still relying on Bayesian nonparametric tools, we then show that under regularly varying type proportions the convergence rate of the Good-Turing estimator is the best rate that any estimator can achieve, up to a slowly varying function, and that minimax rate must be at least $n^{-α/2}$. We conclude with a discussion of our results, and by conjecturing that the Good-Turing estimator is an rate optimal minimax estimator under regularly varying type proportions.