George Iosifidis

NI
h-index47
22papers
125citations
Novelty56%
AI Score50

22 Papers

NIApr 20, 2022
Online Caching with no Regret: Optimistic Learning via Recommendations

Naram Mhaisen, George Iosifidis, Douglas Leith

The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of \emph{optimistic} online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework, which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with pre-reserved or dynamic storage subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users viewing activity and hence can naturally reduce the caching network's uncertainty about future requests. We also extend the framework to learn and utilize the best request predictor in cases where many are available. We prove that the proposed {optimistic} learning caching policies can achieve \emph{sub-zero} performance loss (regret) for perfect predictions, and maintain the sub-linear regret bound $O(\sqrt T)$, which is the best achievable bound for policies that do not use predictions, even for arbitrary-bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.

NISep 11, 2012
Competition and Regulation in Wireless Services Markets

Omer Korcak, George Iosifidis, Tansu Alpcan et al.

We consider a wireless services market where a set of operators compete for a large common pool of users. The latter have a reservation utility of U0 units or, equivalently, an alternative option to satisfy their communication needs. The operators must satisfy these minimum requirements in order to attract the users. We model the users decisions and interaction as an evolutionary game and the competition among the operators as a non cooperative price game which is proved to be a potential game. For each set of prices selected by the operators, the evolutionary game attains a different stationary point. We show that the outcome of both games depend on the reservation utility of the users and the amount of spectrum W the operators have at their disposal. We express the market welfare and the revenue of the operators as functions of these two parameters. Accordingly, we consider the scenario where a regulating agency is able to intervene and change the outcome of the market by tuning W and/or U0. Different regulators may have different objectives and criteria according to which they intervene. We analyze the various possible regulation methods and discuss their requirements, implications and impact on the market.

NISep 4, 2023
Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning

Michail Kalntis, George Iosifidis, Fernando A. Kuipers

Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability. Optimizing the allocation of resources in a vBS is challenging since it requires knowledge of the environment, (i.e., "external'' information), such as traffic demands and channel quality, which is difficult to acquire precisely over short intervals of a few seconds. To tackle this problem, we propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging'' environments; for instance, non-stationary or adversarial traffic demands. We also develop a meta-learning scheme, which leverages the power of other algorithmic approaches, tailored for more "easy'' environments, and dynamically chooses the best performing one, thus enhancing the overall system's versatility and effectiveness. We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments. The performance of the algorithms is evaluated with real-world data and various trace-driven evaluations, indicating savings of up to 64.5% in the power consumption of a vBS compared with state-of-the-art benchmarks.

NIDec 13, 2011
Incentive Mechanisms for Hierarchical Spectrum Markets

George Iosifidis, Anil Kumar Chorppath, Tansu Alpcan et al.

In this paper, we study spectrum allocation mechanisms in hierarchical multi-layer markets which are expected to proliferate in the near future based on the current spectrum policy reform proposals. We consider a setting where a state agency sells spectrum channels to Primary Operators (POs) who subsequently resell them to Secondary Operators (SOs) through auctions. We show that these hierarchical markets do not result in a socially efficient spectrum allocation which is aimed by the agency, due to lack of coordination among the entities in different layers and the inherently selfish revenue-maximizing strategy of POs. In order to reconcile these opposing objectives, we propose an incentive mechanism which aligns the strategy and the actions of the POs with the objective of the agency, and thus leads to system performance improvement in terms of social welfare. This pricing-based scheme constitutes a method for hierarchical market regulation. A basic component of the proposed incentive mechanism is a novel auction scheme which enables POs to allocate their spectrum by balancing their derived revenue and the welfare of the SOs.

NIAug 21, 2022
Energy-aware Scheduling of Virtualized Base Stations in O-RAN with Online Learning

Michail Kalntis, George Iosifidis

The design of Open Radio Access Network (O-RAN) compliant systems for configuring the virtualized Base Stations (vBSs) is of paramount importance for network operators. This task is challenging since optimizing the vBS scheduling procedure requires knowledge of parameters, which are erratic and demanding to obtain in advance. In this paper, we propose an online learning algorithm for balancing the performance and energy consumption of a vBS. This algorithm provides performance guarantees under unforeseeable conditions, such as non-stationary traffic and network state, and is oblivious to the vBS operation profile. We study the problem in its most general form and we prove that the proposed technique achieves sub-linear regret (i.e., zero average optimality gap) even in a fast-changing environment. By using real-world data and various trace-driven evaluations, our findings indicate savings of up to 74.3% in the power consumption of a vBS in comparison with state-of-the-art benchmarks.

LGApr 5, 2022
Penalised FTRL With Time-Varying Constraints

Douglas J. Leith, George Iosifidis

In this paper we extend the classical Follow-The-Regularized-Leader (FTRL) algorithm to encompass time-varying constraints, through adaptive penalization. We establish sufficient conditions for the proposed Penalized FTRL algorithm to achieve $O(\sqrt{t})$ regret and violation with respect to strong benchmark $\hat{X}^{max}_t$. Lacking prior knowledge of the constraints, this is probably the largest benchmark set that we can reasonably hope for. Our sufficient conditions are necessary in the sense that when they are violated there exist examples where $O(\sqrt{t})$ regret and violation is not achieved. Compared to the best existing primal-dual algorithms, Penalized FTRL substantially extends the class of problems for which $O(\sqrt{t})$ regret and violation performance is achievable.

OCOct 2, 2023
Adaptive Online Non-stochastic Control

Naram Mhaisen, George Iosifidis

We tackle the problem of Non-stochastic Control (NSC) with the aim of obtaining algorithms whose policy regret is proportional to the difficulty of the controlled environment. Namely, we tailor the Follow The Regularized Leader (FTRL) framework to dynamical systems by using regularizers that are proportional to the actual witnessed costs. The main challenge arises from using the proposed adaptive regularizers in the presence of a state, or equivalently, a memory, which couples the effect of the online decisions and requires new tools for bounding the regret. Via new analysis techniques for NSC and FTRL integration, we obtain novel disturbance action controllers (DAC) with sub-linear data adaptive policy regret bounds that shrink when the trajectory of costs has small gradients, while staying sub-linear even in the worst case.

OCFeb 20, 2023
Solving Recurrent MIPs with Semi-supervised Graph Neural Networks

Konstantinos Benidis, Ugo Rosolia, Syama Rangapuram et al.

We propose an ML-based model that automates and expedites the solution of MIPs by predicting the values of variables. Our approach is motivated by the observation that many problem instances share salient features and solution structures since they differ only in few (time-varying) parameters. Examples include transportation and routing problems where decisions need to be re-optimized whenever commodity volumes or link costs change. Our method is the first to exploit the sequential nature of the instances being solved periodically, and can be trained with ``unlabeled'' instances, when exact solutions are unavailable, in a semi-supervised setting. Also, we provide a principled way of transforming the probabilistic predictions into integral solutions. Using a battery of experiments with representative binary MIPs, we show the gains of our model over other ML-based optimization approaches.

LGFeb 1, 2023
Bandit Convex Optimisation Revisited: FTRL Achieves $\tilde{O}(t^{1/2})$ Regret

David Young, Douglas Leith, George Iosifidis

We show that a kernel estimator using multiple function evaluations can be easily converted into a sampling-based bandit estimator with expectation equal to the original kernel estimate. Plugging such a bandit estimator into the standard FTRL algorithm yields a bandit convex optimisation algorithm that achieves $\tilde{O}(t^{1/2})$ regret against adversarial time-varying convex loss functions.

42.7LGMar 22
Constrained Online Convex Optimization with Memory and Predictions

Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi et al.

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.

LGJan 22
Partially Lazy Gradient Descent for Smoothed Online Learning

Naram Mhaisen, George Iosifidis

We introduce $k$-lazyGD, an online learning algorithm that bridges the gap between greedy Online Gradient Descent (OGD, for $k=1$) and lazy GD/dual-averaging (for $k=T$), creating a spectrum between reactive and stable updates. We analyze this spectrum in Smoothed Online Convex Optimization (SOCO), where the learner incurs both hitting and movement costs. Our main contribution is establishing that laziness is possible without sacrificing hitting performance: we prove that $k$-lazyGD achieves the optimal dynamic regret $\mathcal{O}(\sqrt{(P_T+1)T})$ for any laziness slack $k$ up to $Θ(\sqrt{T/P_T})$, where $P_T$ is the comparator path length. This result formally connects the allowable laziness to the comparator's shifts, showing that $k$-lazyGD can retain the inherently small movements of lazy methods without compromising tracking ability. We base our analysis on the Follow the Regularized Leader (FTRL) framework, and derive a matching lower bound. Since the slack depends on $P_T$, an ensemble of learners with various slacks is used, yielding a method that is provably stable when it can be, and agile when it must be.

NIDec 26, 2025
Meta-Learning-Based Handover Management in NextG O-RAN

Michail Kalntis, George Iosifidis, José Suárez-Varela et al.

While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency bands. To address these limitations, 3GPP introduced Conditional Handovers (CHOs) that enable proactive cell reservations and user-driven execution. However, both handover (HO) types present intricate trade-offs in signaling, resource usage, and reliability. This paper presents unique, countrywide mobility management datasets from a top-tier mobile network operator (MNO) that offer fresh insights into these issues and call for adaptive and robust HO control in next-generation networks. Motivated by these findings, we propose CONTRA, a framework that, for the first time, jointly optimizes THOs and CHOs within the O-RAN architecture. We study two variants of CONTRA: one where users are a priori assigned to one of the HO types, reflecting distinct service or user-specific requirements, as well as a more dynamic formulation where the controller decides on-the-fly the HO type, based on system conditions and needs. To this end, it relies on a practical meta-learning algorithm that adapts to runtime observations and guarantees performance comparable to an oracle with perfect future information (universal no-regret). CONTRA is specifically designed for near-real-time deployment as an O-RAN xApp and aligns with the 6G goals of flexible and intelligent control. Extensive evaluations leveraging crowdsourced datasets show that CONTRA improves user throughput and reduces both THO and CHO switching costs, outperforming 3GPP-compliant and Reinforcement Learning (RL) baselines in dynamic and real-world scenarios.

NIFeb 17, 2024
Fair Resource Allocation in Virtualized O-RAN Platforms

Fatih Aslan, George Iosifidis, Jose A. Ayala-Romero et al.

O-RAN systems and their deployment in virtualized general-purpose computing platforms (O-Cloud) constitute a paradigm shift expected to bring unprecedented performance gains. However, these architectures raise new implementation challenges and threaten to worsen the already-high energy consumption of mobile networks. This paper presents first a series of experiments which assess the O-Cloud's energy costs and their dependency on the servers' hardware, capacity and data traffic properties which, typically, change over time. Next, it proposes a compute policy for assigning the base station data loads to O-Cloud servers in an energy-efficient fashion; and a radio policy that determines at near-real-time the minimum transmission block size for each user so as to avoid unnecessary energy costs. The policies balance energy savings with performance, and ensure that both of them are dispersed fairly across the servers and users, respectively. To cater for the unknown and time-varying parameters affecting the policies, we develop a novel online learning framework with fairness guarantees that apply to the entire operation horizon of the system (long-term fairness). The policies are evaluated using trace-driven simulations and are fully implemented in an O-RAN compatible system where we measure the energy costs and throughput in realistic scenarios.

DCFeb 1, 2024
Workflow Optimization for Parallel Split Learning

Joana Tirana, Dimitra Tsigkari, George Iosifidis et al.

Split learning (SL) has been recently proposed as a way to enable resource-constrained devices to train multi-parameter neural networks (NNs) and participate in federated learning (FL). In a nutshell, SL splits the NN model into parts, and allows clients (devices) to offload the largest part as a processing task to a computationally powerful helper. In parallel SL, multiple helpers can process model parts of one or more clients, thus, considerably reducing the maximum training time over all clients (makespan). In this paper, we focus on orchestrating the workflow of this operation, which is critical in highly heterogeneous systems, as our experiments show. In particular, we formulate the joint problem of client-helper assignments and scheduling decisions with the goal of minimizing the training makespan, and we prove that it is NP-hard. We propose a solution method based on the decomposition of the problem by leveraging its inherent symmetry, and a second one that is fully scalable. A wealth of numerical evaluations using our testbed's measurements allow us to build a solution strategy comprising these methods. Moreover, we show that this strategy finds a near-optimal solution, and achieves a shorter makespan than the baseline scheme by up to 52.3%.

NIJan 14, 2025
Smooth Handovers via Smoothed Online Learning

Michail Kalntis, Andra Lutu, Jesús Omaña Iglesias et al.

With users demanding seamless connectivity, handovers (HOs) have become a fundamental element of cellular networks. However, optimizing HOs is a challenging problem, further exacerbated by the growing complexity of mobile networks. This paper presents the first countrywide study of HO optimization, through the prism of Smoothed Online Learning (SOL). We first analyze an extensive dataset from a commercial mobile network operator (MNO) in Europe with more than 40M users, to understand and reveal important features and performance impacts on HOs. Our findings highlight a correlation between HO failures/delays, and the characteristics of radio cells and end-user devices, showcasing the impact of heterogeneity in mobile networks nowadays. We subsequently model UE-cell associations as dynamic decisions and propose a realistic system model for smooth and accurate HOs that extends existing approaches by (i) incorporating device and cell features on HO optimization, and (ii) eliminating (prior) strong assumptions about requiring future signal measurements and knowledge of end-user mobility. Our algorithm, aligned with the O-RAN paradigm, provides robust dynamic regret guarantees, even in challenging environments, and shows superior performance in multiple scenarios with real-world and synthetic data.

LGApr 4, 2024
Optimistic Online Non-stochastic Control via FTRL

Naram Mhaisen, George Iosifidis

This paper brings the concept of ``optimism" to the new and promising framework of online Non-stochastic Control (NSC). Namely, we study how NSC can benefit from a prediction oracle of unknown quality responsible for forecasting future costs. The posed problem is first reduced to an optimistic learning with delayed feedback problem, which is handled through the Optimistic Follow the Regularized Leader (OFTRL) algorithmic family. This reduction enables the design of \texttt{OptFTRL-C}, the first Disturbance Action Controller (DAC) with optimistic policy regret bounds. These new bounds are commensurate with the oracle's accuracy, ranging from $\mathcal{O}(1)$ for perfect predictions to the order-optimal $\mathcal{O}(\sqrt{T})$ even when all predictions fail. By addressing the challenge of incorporating untrusted predictions into online control, this work contributes to the advancement of the NSC framework and paves the way toward effective and robust learning-based controllers.

LGJul 10, 2025
CHOMET: Conditional Handovers via Meta-Learning

Michail Kalntis, Fernando A. Kuipers, George Iosifidis

Handovers (HOs) are the cornerstone of modern cellular networks for enabling seamless connectivity to a vast and diverse number of mobile users. However, as mobile networks become more complex with more diverse users and smaller cells, traditional HOs face significant challenges, such as prolonged delays and increased failures. To mitigate these issues, 3GPP introduced conditional handovers (CHOs), a new type of HO that enables the preparation (i.e., resource allocation) of multiple cells for a single user to increase the chance of HO success and decrease the delays in the procedure. Despite its advantages, CHO introduces new challenges that must be addressed, including efficient resource allocation and managing signaling/communication overhead from frequent cell preparations and releases. This paper presents a novel framework aligned with the O-RAN paradigm that leverages meta-learning for CHO optimization, providing robust dynamic regret guarantees and demonstrating at least 180% superior performance than other 3GPP benchmarks in volatile signal conditions.

LGMay 28, 2025
On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning

Naram Mhaisen, George Iosifidis

We revisit the Follow the Regularized Leader (FTRL) framework for Online Convex Optimization (OCO) over compact sets, focusing on achieving dynamic regret guarantees. Prior work has highlighted the framework's limitations in dynamic environments due to its tendency to produce "lazy" iterates. However, building on insights showing FTRL's ability to produce "agile" iterates, we show that it can indeed recover known dynamic regret bounds through optimistic composition of future costs and careful linearization of past costs, which can lead to pruning some of them. This new analysis of FTRL against dynamic comparators yields a principled way to interpolate between greedy and agile updates and offers several benefits, including refined control over regret terms, optimism without cyclic dependence, and the application of minimal recursive regularization akin to AdaFTRL. More broadly, we show that it is not the "lazy" projection style of FTRL that hinders (optimistic) dynamic regret, but the decoupling of the algorithm's state (linearized history) from its iterates, allowing the state to grow arbitrarily. Instead, pruning synchronizes these two when necessary.

NIApr 4, 2025
Optimistic Learning for Communication Networks

George Iosifidis, Naram Mhaisen, Douglas J. Leith

AI/ML-based tools are at the forefront of resource management solutions for communication networks. Deep learning, in particular, is highly effective in facilitating fast and high-performing decision-making whenever representative training data is available to build offline accurate models. Conversely, online learning solutions do not require training and enable adaptive decisions based on runtime observations, alas are often overly conservative. This extensive tutorial proposes the use of optimistic learning (OpL) as a decision engine for resource management frameworks in modern communication systems. When properly designed, such solutions can achieve fast and high-performing decisions -- comparable to offline-trained models -- while preserving the robustness and performance guarantees of the respective online learning approaches. We introduce the fundamental concepts, algorithms and results of OpL, discuss the roots of this theory and present different approaches to defining and achieving optimism. We proceed to showcase how OpL can enhance resource management in communication networks for several key problems such as caching, edge computing, network slicing, and workload assignment in decentralized O-RAN platforms. Finally, we discuss the open challenges that must be addressed to unlock the full potential of this new resource management approach.

NIFeb 22, 2022
Online Caching with Optimistic Learning

Naram Mhaisen, George Iosifidis, Douglas Leith

The design of effective online caching policies is an increasingly important problem for content distribution networks, online social networks and edge computing services, among other areas. This paper proposes a new algorithmic toolbox for tackling this problem through the lens of optimistic online learning. We build upon the Follow-the-Regularized-Leader (FTRL) framework which is developed further here to include predictions for the file requests, and we design online caching algorithms for bipartite networks with fixed-size caches or elastic leased caches subject to time-average budget constraints. The predictions are provided by a content recommendation system that influences the users viewing activity, and hence can naturally reduce the caching network's uncertainty about future requests. We prove that the proposed optimistic learning caching policies can achieve sub-zero performance loss (regret) for perfect predictions, and maintain the best achievable regret bound $O(\sqrt T)$ even for arbitrary-bad predictions. The performance of the proposed algorithms is evaluated with detailed trace-driven numerical tests.

LGJan 8, 2022
Lazy Lagrangians with Predictions for Online Learning

Daron Anderson, George Iosifidis, Douglas J. Leith

We consider the general problem of online convex optimization with time-varying additive constraints in the presence of predictions for the next cost and constraint functions. A novel primal-dual algorithm is designed by combining a Follow-The-Regularized-Leader iteration with prediction-adaptive dynamic steps. The algorithm achieves $\mathcal O(T^{\frac{3-β}{4}})$ regret and $\mathcal O(T^{\frac{1+β}{2}})$ constraint violation bounds that are tunable via parameter $β\!\in\![1/2,1)$ and have constant factors that shrink with the predictions quality, achieving eventually $\mathcal O(1)$ regret for perfect predictions. Our work extends the FTRL framework for this constrained OCO setting and outperforms the respective state-of-the-art greedy-based solutions, without imposing conditions on the quality of predictions, the cost functions or the geometry of constraints, beyond convexity.

AIOct 10, 2020
Reinforcement Learning on Computational Resource Allocation of Cloud-based Wireless Networks

Beiran Chen, Yi Zhang, George Iosifidis et al.

Wireless networks used for Internet of Things (IoT) are expected to largely involve cloud-based computing and processing. Softwarised and centralised signal processing and network switching in the cloud enables flexible network control and management. In a cloud environment, dynamic computational resource allocation is essential to save energy while maintaining the performance of the processes. The stochastic features of the Central Processing Unit (CPU) load variation as well as the possible complex parallelisation situations of the cloud processes makes the dynamic resource allocation an interesting research challenge. This paper models this dynamic computational resource allocation problem into a Markov Decision Process (MDP) and designs a model-based reinforcement-learning agent to optimise the dynamic resource allocation of the CPU usage. Value iteration method is used for the reinforcement-learning agent to pick up the optimal policy during the MDP. To evaluate our performance we analyse two types of processes that can be used in the cloud-based IoT networks with different levels of parallelisation capabilities, i.e., Software-Defined Radio (SDR) and Software-Defined Networking (SDN). The results show that our agent rapidly converges to the optimal policy, stably performs in different parameter settings, outperforms or at least equally performs compared to a baseline algorithm in energy savings for different scenarios.