Nir Shlezinger

SP
h-index 46
53 papers
2,959 citations
Novelty 46%
AI Score 56

53 Papers

SP · May 5, 2022
Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization

Nir Shlezinger, Yonina C. Eldar, Stephen P. Boyd · stanford

Decision making algorithms are used in a multitude of different applications. Conventional approaches for designing decision algorithms employ principled and simplified modelling, based on which one can determine decisions via tractable optimization. More recently, deep learning approaches that use highly parametric architectures tuned from data, without relying on mathematical models, are becoming increasingly popular. Model-based optimization and data-centric deep learning are often considered to be distinct disciplines. Here, we characterize them as edges of a continuous spectrum varying in specificity and parameterization, and provide a tutorial-style presentation of the methodologies lying in the middle ground of this spectrum, referred to as model-based deep learning. We accompany our presentation with running examples in super-resolution and stochastic control, and show how they are expressed using the provided characterization and specialized in each of the detailed methodologies. The gains of combining model-based optimization and deep learning are demonstrated using experimental results in various applications, ranging from biomedical imaging to digital communications.

79.2 · SP · May 24
Online Learning of Modular Bayesian Deep Receivers: Single-Step Adaptation with Streaming Data

Yakov Gusakov, Osvaldo Simeone, Tirza Routtenberg et al.

Deep neural network (DNN)-based receivers offer a powerful alternative to classical model-based designs for wireless communication, especially in complex and nonlinear propagation environments. However, their adoption is challenged by the rapid variability of wireless channels, which makes pre-trained static DNN-based receivers ineffective, and by the latency and computational burden of online stochastic gradient descent (SGD)-based learning. In this work, we propose an online learning framework that enables rapid low-complexity adaptation of DNN-based receivers. Our approach is based on two main tenets. First, we cast online learning as Bayesian tracking in parameter space, enabling a single-step adaptation, which deviates from multi-epoch SGD. Second, we focus on modular DNN architectures that enable parallel, online, and localized variational Bayesian updates. Simulations with practical communication channels demonstrate that our proposed online learning framework can maintain a low error rate with markedly reduced update latency and increased robustness to channel dynamics as compared to traditional gradient descent-based methods.

SP · Jun 4, 2023
SubspaceNet: Deep Learning-Aided Subspace Methods for DoA Estimation

Dor H. Shmuel, Julian P. Merkofer, Guy Revach et al. · eth-zurich

Direction of arrival (DoA) estimation is a fundamental task in array processing. A popular family of DoA estimation algorithms is subspace methods, which operate by dividing the measurements into distinct signal and noise subspaces. Subspace methods, such as Multiple Signal Classification (MUSIC) and Root-MUSIC, rely on several restrictive assumptions, including narrowband non-coherent sources and fully calibrated arrays, and their performance is considerably degraded when these do not hold. In this work we propose SubspaceNet, a data-driven DoA estimator which learns how to divide the observations into distinguishable subspaces. This is achieved by utilizing a dedicated deep neural network to learn the empirical autocorrelation of the input, by training it as part of the Root-MUSIC method, leveraging the inherent differentiability of this specific DoA estimator, while removing the need to provide a ground-truth decomposable autocorrelation matrix. Once trained, the resulting SubspaceNet serves as a universal surrogate covariance estimator that can be applied in combination with any subspace-based DoA estimation method, allowing its successful application in challenging setups. SubspaceNet is shown to enable various DoA estimation algorithms to cope with coherent sources, wideband signals, low SNR, array mismatches, and limited snapshots, while preserving the interpretability and the suitability of classic subspace methods.

TR · Oct 19, 2022
Neural Augmented Kalman Filtering with Bollinger Bands for Pairs Trading

Amit Milstein, Haoran Deng, Guy Revach et al. · eth-zurich

Pairs trading is a family of trading techniques that determine their policies based on monitoring the relationships between pairs of assets. A common pairs trading approach relies on describing the pair-wise relationship as a linear state space (SS) model with Gaussian noise. This representation facilitates extracting financial indicators with low complexity and latency using a Kalman filter (KF), which are then processed using classic policies such as Bollinger Bands (BB). However, such SS models are inherently approximated and mismatched, often degrading the revenue. In this work, we propose KalmanNet-aided Bollinger bands Pairs Trading (KBPT), a deep learning aided policy that augments the operation of KF-aided BB trading. KBPT is designed by formulating an extended SS model for pairs trading that approximates their relationship as holding partial co-integration. This SS model is utilized by a trading policy that augments KF-BB trading with a dedicated neural network based on the KalmanNet architecture. The resulting KBPT is trained in a two-stage manner which first tunes the tracking algorithm in an unsupervised manner independently of the trading task, followed by its adaptation to track the financial indicators to maximize revenue while approximating BB with a differentiable mapping. KBPT thus leverages data to overcome the approximated nature of the SS model, converting the KF-BB policy into a trainable model. We empirically demonstrate that our proposed KBPT systematically yields improved revenue compared with model-based and data-driven benchmarks over various assets.

SY · Oct 23, 2022
LQGNet: Hybrid Model-Based and Data-Driven Linear Quadratic Stochastic Control

Solomon Goldgraber Casspi, Oliver Husser, Guy Revach et al. · eth-zurich

Stochastic control deals with finding an optimal control signal for a dynamical system in a setting with uncertainty, playing a key role in numerous applications. The linear quadratic Gaussian (LQG) is a widely-used setting, where the system dynamics is represented as a linear Gaussian state space (SS) model, and the objective function is quadratic. For this setting, the optimal controller is obtained in closed form by the separation principle. However, in practice, the underlying system dynamics often cannot be faithfully captured by a fully known linear Gaussian SS model, limiting its performance. Here, we present LQGNet, a stochastic controller that leverages data to operate under partially known dynamics. LQGNet augments the state tracking module of separation-based control with a dedicated trainable algorithm. The resulting system preserves the operation of classic LQG control while learning to cope with partially known SS models without having to fully identify the dynamics. We empirically show that LQGNet outperforms classic stochastic control by overcoming mismatched SS models.

SP · Oct 12, 2022
Outlier-Insensitive Kalman Filtering Using NUV Priors

Shunit Truzman, Guy Revach, Nir Shlezinger et al. · eth-zurich

The Kalman filter (KF) is a widely-used algorithm for tracking the latent state of a dynamical system from noisy observations. For systems that are well-described by linear Gaussian state space models, the KF minimizes the mean-squared error (MSE). However, in practice, observations are corrupted by outliers, severely impairing the KF's performance. In this work, an outlier-insensitive KF is proposed, where robustness is achieved by modeling each potential outlier as a normally distributed random variable with unknown variance (NUV). The NUV variances are estimated online, using both expectation-maximization (EM) and alternating maximization (AM). The former was previously proposed for the task of smoothing with outliers and is adapted here to filtering. While both EM and AM obtained the same performance and outperformed the other algorithms, the AM approach is less complex and thus requires 40 percent less run-time. Our empirical study demonstrates that the MSE of our proposed outlier-insensitive KF outperforms previously proposed algorithms, and that for data clean of outliers, it reverts to the classic KF, i.e., MSE optimality is preserved.

LG · Aug 23, 2022
Joint Privacy Enhancement and Quantization in Federated Learning

Natalie Lang, Elad Sofer, Tomer Shaked et al.

Federated learning (FL) is an emerging paradigm for training machine learning models using possibly private data available at edge devices. The distributed operation of FL gives rise to challenges that are not encountered in centralized machine learning, including the need to preserve the privacy of the local datasets, and the communication load due to the repeated exchange of updated models. These challenges are often tackled individually via techniques that induce some distortion on the updated models, e.g., local differential privacy (LDP) mechanisms and lossy compression. In this work we propose a method coined joint privacy enhancement and quantization (JoPEQ), which jointly implements lossy compression and privacy enhancement in FL settings. In particular, JoPEQ utilizes vector quantization based on random lattice, a universal compression technique whose byproduct distortion is statistically equivalent to additive noise. This distortion is leveraged to enhance privacy by augmenting the model updates with dedicated multivariate privacy preserving noise. We show that JoPEQ simultaneously quantizes data according to a required bit-rate while holding a desired privacy level, without notably affecting the utility of the learned model. This is shown via analytical LDP guarantees, distortion and convergence bounds derivation, and numerical studies. Finally, we empirically assert that JoPEQ demolishes common attacks known to exploit privacy leakage.

SP · Jun 9, 2022
Discriminative and Generative Learning for Linear Estimation of Random Signals [Lecture Notes]

Nir Shlezinger, Tirza Routtenberg

Inference tasks in signal processing are often characterized by the availability of reliable statistical modeling with some missing instance-specific parameters. One conventional approach uses data to estimate these missing parameters and then infers based on the estimated model. Alternatively, data can also be leveraged to directly learn the inference mapping end-to-end. These approaches for combining partially-known statistical models and data in inference are related to the notions of generative and discriminative models used in the machine learning literature, typically considered in the context of classifiers. The goal of this lecture note is to introduce the concepts of generative and discriminative learning for inference with a partially-known statistical model. While machine learning systems often lack the interpretability of traditional signal processing methods, we focus on a simple setting where one can interpret and compare the approaches in a tractable manner that is accessible and relevant to signal processing readers. In particular, we exemplify the approaches for the task of Bayesian signal estimation in a jointly Gaussian setting with the mean-squared error (MSE) objective, i.e., a linear estimation setting.
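The two approaches described in this lecture note can be contrasted with a minimal NumPy sketch. It assumes a zero-mean linear observation model x = Hs + w with unit-variance signal entries and known white-noise variance, where only H is missing; these modelling choices and the function names `fit_generative` and `fit_discriminative` are illustrative assumptions, not taken from the lecture note.

```python
import numpy as np

def fit_generative(S, X, noise_var):
    """Generative route: estimate the missing model parameter H from data,
    then plug it into the closed-form linear MMSE estimator.
    Assumes s ~ N(0, I) and w ~ N(0, noise_var * I) (illustrative assumptions)."""
    # Rows of S (n x k) and X (n x m) are paired samples, so X ~= S @ H.T.
    H_T, *_ = np.linalg.lstsq(S, X, rcond=None)
    H = H_T.T                                   # shape (m, k)
    m = H.shape[0]
    # LMMSE matrix for this model: W = H^T (H H^T + noise_var I)^{-1}
    return H.T @ np.linalg.inv(H @ H.T + noise_var * np.eye(m))

def fit_discriminative(S, X):
    """Discriminative route: learn the linear map x -> s directly by
    least squares, without estimating H or using the noise variance."""
    W_T, *_ = np.linalg.lstsq(X, S, rcond=None)  # S ~= X @ W.T
    return W_T.T                                 # shape (k, m)

# Synthetic demonstration with a hypothetical 4x2 mixing matrix.
rng = np.random.default_rng(0)
n, k, m, noise_var = 20000, 2, 4, 0.5
H_true = rng.standard_normal((m, k))
S = rng.standard_normal((n, k))
X = S @ H_true.T + np.sqrt(noise_var) * rng.standard_normal((n, m))

W_opt = H_true.T @ np.linalg.inv(H_true @ H_true.T + noise_var * np.eye(m))
W_gen = fit_generative(S, X, noise_var)
W_disc = fit_discriminative(S, X)
print("generative error:    ", np.abs(W_gen - W_opt).max())
print("discriminative error:", np.abs(W_disc - W_opt).max())
```

With plenty of data and a correct model, both estimators should land close to the true LMMSE matrix. The generative route relies on the assumed noise variance, while the discriminative route does not use it.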

LG · Jun 7, 2022
Decentralized Low-Latency Collaborative Inference via Ensembles on the Edge

May Malka, Erez Farhan, Hai Morgenstern et al.

The success of deep neural networks (DNNs) is heavily dependent on computational resources. While DNNs are often employed on cloud servers, there is a growing need to operate DNNs on edge devices. Edge devices are typically limited in their computational resources, yet often multiple edge devices are deployed in the same environment and can reliably communicate with each other. In this work we propose to facilitate the application of DNNs on the edge by allowing multiple users to collaborate during inference to improve their accuracy. Our mechanism, coined edge ensembles, is based on having diverse predictors at each device, which form an ensemble of models during inference. To mitigate the communication overhead, the users share quantized features, and we propose a method for aggregating multiple decisions into a single inference rule. We analyze the latency induced by edge ensembles, showing that its performance improvement comes at the cost of a minor additional delay under common assumptions on the communication network. Our experiments demonstrate that collaborative inference via edge ensembles equipped with compact DNNs substantially improves the accuracy over having each user infer locally, and can outperform using a single centralized DNN larger than all the networks in the ensemble together.

SP · Nov 28, 2023
GSP-KalmanNet: Tracking Graph Signals via Neural-Aided Kalman Filtering

Itay Buchnik, Guy Sagi, Nimrod Leinwand et al.

Dynamic systems of graph signals are encountered in various applications, including social networks, power grids, and transportation. While such systems can often be described as state space (SS) models, tracking graph signals via conventional tools based on the Kalman filter (KF) and its variants is typically challenging. This is due to the nonlinearity, high dimensionality, irregularity of the domain, and complex modeling associated with real-world dynamic systems of graph signals. In this work, we study the tracking of graph signals using a hybrid model-based/data-driven approach. We develop the GSP-KalmanNet, which tracks the hidden graphical states from the graphical measurements by jointly leveraging graph signal processing (GSP) tools and deep learning (DL) techniques. The derivations of the GSP-KalmanNet are based on extending the KF to exploit the inherent graph structure via graph frequency domain filtering, which considerably simplifies the computational complexity entailed in processing high-dimensional signals and increases the robustness to small topology changes. Then, we use data to learn the Kalman gain following the recently proposed KalmanNet framework, which copes with partial and approximated modeling, without forcing a specific model over the noise statistics. Our empirical results demonstrate that the proposed GSP-KalmanNet achieves enhanced accuracy and run time performance as well as improved robustness to model misspecifications compared with both model-based and data-driven benchmarks.

IT · Mar 3, 2023
AI-Empowered Hybrid MIMO Beamforming

Nir Shlezinger, Mengyuan Ma, Ortal Lavi et al.

Hybrid multiple-input multiple-output (MIMO) is an attractive technology for realizing extreme massive MIMO systems envisioned for future wireless communications in a scalable and power-efficient manner. However, the fact that hybrid MIMO systems implement part of their beamforming in analog and part in digital makes the optimization of their beampattern notably more challenging compared with conventional fully digital MIMO. Consequently, recent years have witnessed a growing interest in using data-aided artificial intelligence (AI) tools for hybrid beamforming design. This article reviews candidate strategies to leverage data to improve real-time hybrid beamforming design. We discuss the architectural constraints and characterize the core challenges associated with hybrid beamforming optimization. We then present how these challenges are treated via conventional optimization, and identify different AI-aided design approaches. These can be roughly divided into purely data-driven deep learning models and different forms of deep unfolding techniques for combining AI with classical optimization. We provide a systematic comparative study between existing approaches including both numerical evaluations and qualitative measures. We conclude by presenting future research opportunities associated with the incorporation of AI in hybrid MIMO systems.

OC · Sep 21, 2023
Limited Communications Distributed Optimization via Deep Unfolded Distributed ADMM

Yoav Noah, Nir Shlezinger

Distributed optimization is a fundamental framework for collaborative inference and decision making in decentralized multi-agent systems. The operation is modeled as the joint minimization of a shared objective which typically depends on observations gathered locally by each agent. Distributed optimization algorithms, such as the common D-ADMM, tackle this task by iteratively combining local computations and message exchanges. One of the main challenges associated with distributed optimization, and particularly with D-ADMM, is that it requires a large number of communications, i.e., messages exchanged between the agents, to reach consensus. This can make D-ADMM costly in power, latency, and channel resources. In this work we propose unfolded D-ADMM, which follows the emerging deep unfolding methodology to enable D-ADMM to operate reliably with a predefined and small number of messages exchanged by each agent. Unfolded D-ADMM fully preserves the operation of D-ADMM, while leveraging data to tune the hyperparameters of each iteration of the algorithm. These hyperparameters can either be agent-specific, aiming at achieving the best performance within a fixed number of iterations over a given network, or shared among the agents, allowing to learn to distributedly optimize over different networks. For both settings, our unfolded D-ADMM operates with limited communications, while preserving the interpretability and flexibility of the original D-ADMM algorithm. We specialize unfolded D-ADMM for two representative settings: a distributed estimation task, considering a sparse recovery setup, and a distributed learning scenario, where multiple agents collaborate in learning a machine learning model. Our numerical results demonstrate that the proposed approach dramatically reduces the number of communications utilized by D-ADMM, without compromising on its performance.

SP · Aug 21, 2024
Learning Flock: Enhancing Sets of Particles for Multi Sub-State Particle Filtering with Neural Augmentation

Itai Nuri, Nir Shlezinger

A leading family of algorithms for state estimation in dynamic systems with multiple sub-states is based on particle filters (PFs). PFs often struggle when operating under complex or approximated modelling (necessitating many particles) with low latency requirements (limiting the number of particles), as is typically the case in multi-target tracking (MTT). In this work, we introduce a deep neural network (DNN) augmentation for PFs termed learning flock (LF). LF learns to correct a particles-weights set, which we coin flock, based on the relationships between all sub-particles in the set itself, while disregarding the set acquisition procedure. Our proposed LF, which can be readily incorporated into different PF flows, is designed to facilitate rapid operation by maintaining accuracy with a reduced number of particles. We introduce a dedicated training algorithm, allowing both supervised and unsupervised training, and yielding a module that supports a varying number of sub-states and particles without necessitating re-training. We experimentally show the improvements in performance, robustness, and latency of LF augmentation for radar multi-target tracking, as well as its ability to mitigate the effect of mismatched observation modelling. We also compare and illustrate the advantages of LF over a state-of-the-art DNN-aided PF, and demonstrate that LF enhances both classic PFs as well as DNN-based filters.

SP · Aug 1, 2024
Rapid and Power-Aware Learned Optimization for Modular Receive Beamforming

Ohad Levy, Nir Shlezinger

Multiple-input multiple-output (MIMO) systems play a key role in wireless communication technologies. A widely considered approach to realize scalable MIMO systems involves architectures comprised of multiple separate modules, each with its own beamforming capability. Such models accommodate cell-free massive MIMO and partially connected hybrid MIMO architectures. A core issue with the implementation of modular MIMO arises from the need to rapidly set the beampatterns of the modules, while maintaining their power efficiency. This leads to challenging constrained optimization that should be repeatedly solved on each coherence duration. In this work, we propose a power-oriented optimization algorithm for beamforming in uplink modular hybrid MIMO systems, which learns from data to operate rapidly. We derive our learned optimizer by tackling the rate maximization objective using projected gradient ascent steps with momentum. We then leverage data to tune the hyperparameters of the optimizer, allowing it to operate reliably in a fixed and small number of iterations while completely preserving its interpretable operation. We show how power efficient beamforming can be encouraged by the learned optimizer, via boosting architectures with low-resolution phase shifts and with deactivated analog components. Numerical results show that our learn-to-optimize method notably reduces the number of iterations and computation latency required to reliably tune modular MIMO receivers, and that it allows obtaining desirable balances between power efficient designs and throughput.

SP · Sep 18, 2023
Outlier-Insensitive Kalman Filtering: Theory and Applications

Shunit Truzman, Guy Revach, Nir Shlezinger et al.

State estimation of dynamical systems from noisy observations is a fundamental task in many applications. It is commonly addressed using the linear Kalman filter (KF), whose performance can significantly degrade in the presence of outliers in the observations, due to the sensitivity of its convex quadratic objective function. To mitigate such behavior, outlier detection algorithms can be applied. In this work, we propose a parameter-free algorithm which mitigates the harmful effect of outliers while requiring only a short iterative process of the standard update step of the KF. To that end, we model each potential outlier as a normal process with unknown variance and apply online estimation through either expectation maximization or alternating maximization algorithms. Simulations and field experiment evaluations demonstrate competitive performance of our method, showcasing its robustness to outliers in filtering scenarios compared to alternative algorithms.

IT · Aug 5, 2024
Optimization of Iterative Blind Detection based on Expectation Maximization and Belief Propagation

Luca Schmid, Tomer Raviv, Nir Shlezinger et al.

We study iterative blind symbol detection for block-fading linear inter-symbol interference channels. Based on the factor graph framework, we design a joint channel estimation and detection scheme that combines the expectation maximization (EM) algorithm and the ubiquitous belief propagation (BP) algorithm. Interweaving the iterations of both schemes significantly reduces the EM algorithm's computational burden while retaining its excellent performance. To this end, we apply simple yet effective model-based learning methods to find a suitable parameter update schedule by introducing momentum in both the EM parameter updates as well as in the BP message passing. Numerical simulations verify that the proposed method can learn efficient schedules that generalize well and even outperform coherent BP detection in high signal-to-noise scenarios.

33.0 · IT · Mar 23
DeepNP: Deep Learning-Based Noise Prediction for Ultra-Reliable Low-Latency Communications

Adina Waxman, Nir Shlezinger, Alejandro Cohen

Adaptive network coding schemes provide a promising approach to bridging the gap between high data rates and low delay in real-time streaming applications. However, their effectiveness often relies on accurate channel prediction, which is typically based on delayed feedback and is especially challenging when the underlying channel model is unknown. To address this, we introduce a novel integration of network coding with a channel-agnostic, Deep learning-based Noise Prediction algorithm (DeepNP). Unlike traditional estimators, DeepNP predicts statistical noise rates rather than instantaneous noise realizations, significantly simplifying the prediction task while enhancing coding performance. DeepNP is designed to operate with both binary (e.g., acknowledgments) and continuous-valued (e.g., Signal-to-Noise Ratio, SNR) feedback. We incorporate DeepNP into the Adaptive and Causal Random Linear Network Coding (AC-RLNC) framework to jointly optimize throughput and in-order delivery delay. Two variants are proposed: (i) Erasure-Rate DeepNP (ER-DeepNP), which serves as a transport-layer noise predictor and achieves in a numerical study up to a 2x reduction in mean and maximum delay with less than 0.1 loss in throughput compared to statistic-based estimators, under Round-Trip Time (RTT) up to 40 time slots and erasure rates up to 60%; and (ii) Cross-Layer DeepNP (CL-DeepNP), which dynamically adjusts the SNR threshold to maintain high physical layer code rates while achieving low transport-layer erasure rates. This yields, in the presented numerical study, a 25% throughput gain over fixed-threshold approaches. Our results demonstrate that DeepNP enables robust, model-free noise prediction, making adaptive network coding more viable in practical, feedback-limited communication scenarios.

SP · Sep 10, 2023
Deep Learning-Aided Subspace-Based DOA Recovery for Sparse Arrays

Yoav Amiel, Dor H. Shmuel, Nir Shlezinger et al.

Sparse arrays enable resolving more directions of arrival (DoAs) than antenna elements using non-uniform arrays. This is typically achieved by reconstructing the covariance of a virtual large uniform linear array (ULA), which is then processed by subspace DoA estimators. However, these methods assume that the signals are non-coherent and the array is calibrated; the latter is often challenging to achieve in sparse arrays, where one cannot access the virtual array elements. In this work, we propose Sparse-SubspaceNet, which leverages deep learning to enable subspace-based DoA recovery from sparse miscalibrated arrays with coherent sources. Sparse-SubspaceNet utilizes a dedicated deep network to learn from data how to compute a surrogate virtual array covariance that is divisible into distinguishable subspaces. By doing so, we learn to cope with coherent sources and miscalibrated sparse arrays, while preserving the interpretability and the suitability of model-based subspace DoA estimators.

70.3 · AS · Mar 27
DiffAU: Diffusion-Based Ambisonics Upscaling

Amit Milstein, Nir Shlezinger, Boaz Rafaely

Spatial audio enhances immersion by reproducing 3D sound fields, with Ambisonics offering a scalable format for this purpose. While first-order Ambisonics (FOA) notably facilitates hardware-efficient acquisition and storage of sound fields as compared to high-order Ambisonics (HOA), its low spatial resolution limits realism, highlighting the need for Ambisonics upscaling (AU) as an approach for increasing the order of Ambisonics signals. In this work we propose DiffAU, a cascaded AU method that leverages recent developments in diffusion models combined with novel adaptation to spatial audio to generate 3rd order Ambisonics from FOA. By learning data distributions, DiffAU provides a principled approach that rapidly and reliably reproduces HOA in various settings. Experiments in anechoic conditions with multiple speakers show strong objective and perceptual performance.

LG · Dec 3, 2025
Deep Unfolding: Recent Developments, Theory, and Design Guidelines

Nir Shlezinger, Santiago Segarra, Yi Zhang et al.

Optimization methods play a central role in signal processing, serving as the mathematical foundation for inference, estimation, and control. While classical iterative optimization algorithms provide interpretability and theoretical guarantees, they often rely on surrogate objectives, require careful hyperparameter tuning, and exhibit substantial computational latency. Conversely, machine learning (ML) offers powerful data-driven modeling capabilities but lacks the structure, transparency, and efficiency needed for optimization-driven inference. Deep unfolding has recently emerged as a compelling framework that bridges these two paradigms by systematically transforming iterative optimization algorithms into structured, trainable ML architectures. This article provides a tutorial-style overview of deep unfolding, presenting a unified perspective of methodologies for converting optimization solvers into ML models and highlighting their conceptual, theoretical, and practical implications. We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature. Furthermore, we survey recent theoretical advances that establish convergence and generalization guarantees for unfolded optimizers, and provide comparative qualitative and empirical studies illustrating their relative trade-offs in complexity, interpretability, and robustness.

SY · Nov 27, 2018
Performance Analysis of LMS Filters with non-Gaussian Cyclostationary Signals

Nir Shlezinger, Koby Todros

The least mean-square (LMS) filter is one of the most common adaptive linear estimation algorithms. In many practical scenarios, and particularly in digital communications systems, the signal of interest (SOI) and the input signal are jointly wide-sense cyclostationary. Previous works analyzing the performance of LMS filters for this important case assume specific probability distributions of the considered signals or specific models that relate the input signal and the SOI. In this work, we provide a general transient and steady-state performance analysis that is free of specific distributional or model assumptions. We obtain conditions for convergence and derive analytical expressions for the non-asymptotic and steady-state mean-squared error. The accuracy of our analysis is demonstrated in simulation studies that correspond to practical communications scenarios.

SP · Oct 18, 2022
Split-KalmanNet: A Robust Model-Based Deep Learning Approach for SLAM

Geon Choi, Jeonghun Park, Nir Shlezinger et al.

Simultaneous localization and mapping (SLAM) is a method that constructs a map of an unknown environment and localizes the position of a moving agent on the map simultaneously. The extended Kalman filter (EKF) has been widely adopted as a low complexity solution for online SLAM, which relies on a motion and measurement model of the moving agent. In practice, however, acquiring precise information about these models is very challenging, and the model mismatch effect causes severe performance loss in SLAM. In this paper, inspired by the recently proposed KalmanNet, we present a robust EKF algorithm using the power of deep learning for online SLAM, referred to as Split-KalmanNet. The key idea of Split-KalmanNet is to compute the Kalman gain using the Jacobian matrix of a measurement function and two recurrent neural networks (RNNs). The two RNNs independently learn the covariance matrices for a prior state estimate and the innovation from data. The proposed split structure in the computation of the Kalman gain makes it possible to compensate for state and measurement model mismatch effects independently. Numerical simulation results verify that Split-KalmanNet outperforms the traditional EKF and the state-of-the-art KalmanNet algorithm in various model mismatch scenarios.

75.5 · NI · May 12
Decentralized Multi-Channel MANET Power Optimization Using Graph Neural Networks

Tomer Alter, Nir Shlezinger, Michael Segal

The increasing demand for mobile ad hoc networks (MANETs) calls for decentralized mechanisms that can allocate transmit power across nodes and channels under stringent resource constraints. Existing optimization-based approaches, however, do not account for expected settings where each link includes multiple channels (e.g., multi-band signaling). Motivated by recent advances in machine learning for distributed optimization, we propose MANET-GNN, a graph neural network (GNN)-based algorithm for decentralized power allocation in multi-channel MANETs. MANET-GNN explicitly exploits the network topology, scales efficiently with the number of nodes and frequency bands, generalizes across topologies and channel conditions, and enables near-instantaneous inference suitable for real-time deployment. Our design builds on a constrained optimization formulation and employs a dedicated GNN architecture inspired by message passing, trained via an unsupervised procedure that is robust to noisy channel state information. Numerical evaluations demonstrate that MANET-GNN achieves high-throughput multi-channel communication across diverse MANET scenarios.

LGMar 27, 2024
Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates

Natalie Lang, Alejandro Cohen, Nir Shlezinger

Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms mitigating the device heterogeneity gap in FL.
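
The layer-wise update described in this abstract can be illustrated with a minimal sketch. It assumes plain averaging over whichever users delivered a gradient for a given layer, followed by a gradient step; the function name `layerwise_aggregate`, the list-of-lists model layout, and the averaging rule are all hypothetical choices made for illustration, and the paper's actual aggregation rule may differ.

```python
def layerwise_aggregate(global_model, user_grads, lr):
    """Update each layer using only the users that delivered its gradient.

    global_model: list of layers, each a list of floats.
    user_grads:   one entry per user; entry[l] is the gradient for layer l,
                  or None if the user ran out of time before reaching it.
    lr:           step size (an assumed hyperparameter).
    """
    new_model = []
    for layer_idx, weights in enumerate(global_model):
        # Collect the gradients of the users that reached this layer.
        contributions = [g[layer_idx] for g in user_grads
                         if g[layer_idx] is not None]
        if not contributions:
            # Nobody delivered this layer: keep it unchanged this round.
            new_model.append(list(weights))
            continue
        # Average coordinate-wise over the contributing users only.
        avg = [sum(vals) / len(contributions) for vals in zip(*contributions)]
        new_model.append([w - lr * a for w, a in zip(weights, avg)])
    return new_model


# Backpropagation computes gradients from the last layer to the first,
# so a straggler typically delivers only the deepest layers.
global_model = [[1.0, 1.0], [2.0]]
user_grads = [
    [[0.5, 0.5], [1.0]],  # user 0 finished both layers
    [None, [3.0]],        # user 1 (straggler) reached only the last layer
]
updated = layerwise_aggregate(global_model, user_grads, lr=0.1)
```

In this toy round the first layer is updated from one user and the last layer from two, which mirrors the idea that each layer has its own contributing set.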

LGOct 16, 2024
AI-Aided Kalman Filters

Nir Shlezinger, Guy Revach, Anubhab Ghosh et al.

The Kalman filter (KF) and its variants are among the most celebrated algorithms in signal processing. These methods are used for state estimation of dynamic systems by relying on mathematical representations in the form of simple state-space (SS) models, which may be crude and inaccurate descriptions of the underlying dynamics. Emerging data-centric artificial intelligence (AI) techniques tackle these tasks using deep neural networks (DNNs), which are model-agnostic. Recent developments illustrate the possibility of fusing DNNs with classic Kalman-type filtering, obtaining systems that learn to track in partially known dynamics. This article provides a tutorial-style overview of design approaches for incorporating AI in aiding KF-type algorithms. We review both generic and dedicated DNN architectures suitable for state estimation, and provide a systematic presentation of techniques for fusing AI tools with KFs and for leveraging partial SS modeling and data, categorizing design approaches into task-oriented and SS model-oriented. The usefulness of each approach in preserving the individual strengths of model-based KFs and data-driven DNNs is investigated in a qualitative and quantitative study, whose code is publicly available, illustrating the gains of hybrid model-based/data-driven designs. We also discuss existing challenges and future research directions that arise from fusing AI and Kalman-type algorithms.

ITJan 23, 2024
Blind Channel Estimation and Joint Symbol Detection with Data-Driven Factor Graphs

Luca Schmid, Tomer Raviv, Nir Shlezinger et al.

We investigate the application of the factor graph framework for blind joint channel estimation and symbol detection on time-variant linear inter-symbol interference channels. In particular, we consider the expectation maximization (EM) algorithm for maximum likelihood estimation, which typically suffers from high complexity as it requires the computation of the symbol-wise posterior distributions in every iteration. We address this issue by efficiently approximating the posteriors using the belief propagation (BP) algorithm on a suitable factor graph. By interweaving the iterations of BP and EM, the detection complexity can be further reduced to a single BP iteration per EM step. In addition, we propose a data-driven version of our algorithm that introduces momentum in the BP updates and learns a suitable EM parameter update schedule, thereby significantly improving the performance-complexity tradeoff with a few offline training samples. Our numerical experiments demonstrate the excellent performance of the proposed blind detector and show that it even outperforms coherent BP detection in high signal-to-noise scenarios.

SPJan 5, 2025
Remote Inference over Dynamic Links via Adaptive Rate Deep Task-Oriented Vector Quantization

Eyal Fishel, May Malka, Shai Ginzach et al.

A broad range of technologies rely on remote inference, wherein data acquired is conveyed over a communication channel for inference in a remote server. Communication between the participating entities is often carried out over rate-limited channels, necessitating data compression for reducing latency. While deep learning facilitates joint design of the compression mapping along with encoding and inference rules, existing learned compression mechanisms are static, and struggle in adapting their resolution to changes in channel conditions and to dynamic links. To address this, we propose Adaptive Rate Task-Oriented Vector Quantization (ARTOVeQ), a learned compression mechanism that is tailored for remote inference over dynamic links. ARTOVeQ is based on designing nested codebooks along with a learning algorithm employing progressive learning. We show that ARTOVeQ extends to support low-latency inference that is gradually refined via successive refinement principles, and that it enables the simultaneous usage of multiple resolutions when conveying high-dimensional data. Numerical results demonstrate that the proposed scheme yields remote deep inference that operates with multiple rates, supports a broad range of bit budgets, and facilitates rapid inference that gradually improves with more bits exchanged, while approaching the performance of single-rate deep quantization methods.

SPJun 25, 2025
OLALa: Online Learned Adaptive Lattice Codes for Heterogeneous Federated Learning

Natalie Lang, Maya Simhi, Nir Shlezinger

Federated learning (FL) enables collaborative training across distributed clients without sharing raw data, often at the cost of substantial communication overhead induced by transmitting high-dimensional model updates. This overhead can be alleviated by having the clients quantize their model updates, with dithered lattice quantizers identified as an attractive scheme due to their structural simplicity and convergence-preserving properties. However, existing lattice-based FL schemes typically rely on a fixed quantization rule, which is suboptimal in heterogeneous and dynamic environments where the distribution of the model updates varies across users and training rounds. In this work, we propose Online Learned Adaptive Lattices (OLALa), a heterogeneous FL framework where each client can adjust its quantizer online using lightweight local computations. We first derive convergence guarantees for FL with non-fixed lattice quantizers and show that proper lattice adaptation can tighten the convergence bound. Then, we design an online learning algorithm that enables clients to tune their quantizers throughout the FL process while exchanging only a compact set of quantization parameters. Numerical experiments demonstrate that OLALa consistently improves learning performance under various quantization rates, outperforming conventional fixed-codebook and non-adaptive schemes.
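
For readers unfamiliar with dithered lattice quantization, the following is a minimal sketch of the simplest case: subtractive dithering on a one-dimensional (scalar) lattice with a fixed step. It does not implement OLALa's online adaptation; the function name `dithered_quantize`, the step size, and the seeded generator standing in for randomness shared between client and server are assumptions made for illustration.

```python
import math
import random


def dithered_quantize(x, step, rng):
    """Subtractive dithered quantization on the scalar lattice step * Z."""
    recovered = []
    for v in x:
        # Dither drawn uniformly over one lattice cell; encoder and decoder
        # are assumed to share it (e.g. via a common random seed).
        d = rng.uniform(-step / 2, step / 2)
        # Encoder: index of the lattice point nearest to the dithered value.
        idx = math.floor((v + d) / step + 0.5)
        # Decoder: map the index back and subtract the same dither.
        recovered.append(idx * step - d)
    return recovered


rng = random.Random(0)
update = [0.31, -1.27, 0.08, 2.5]   # toy model update
step = 0.5                          # fixed lattice step (not adapted here)
recovered = dithered_quantize(update, step, rng)
max_err = max(abs(a - b) for a, b in zip(update, recovered))
```

Each coordinate is recovered to within half a lattice step. An adaptive scheme in the spirit of the abstract would tune the lattice per client and per round rather than fixing `step`.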

LGApr 26, 2025
Unveiling and Mitigating Adversarial Vulnerabilities in Iterative Optimizers

Elad Sofer, Tomer Shaked, Caroline Chaux et al.

Machine learning (ML) models are often sensitive to carefully crafted yet seemingly unnoticeable perturbations. Such adversarial examples are considered to be a property of ML models, often associated with their black-box operation and sensitivity to features learned from data. This work examines the adversarial sensitivity of non-learned decision rules, and particularly of iterative optimizers. Our analysis is inspired by the recent developments in deep unfolding, which cast such optimizers as ML models. We show that non-learned iterative optimizers share the sensitivity to adversarial examples of ML models, and that attacking iterative optimizers effectively alters the optimization objective surface in a manner that modifies the minima sought. We then leverage the ability to cast iteration-limited optimizers as ML models to enhance robustness via adversarial training. For a class of proximal gradient optimizers, we rigorously prove how their learning affects adversarial sensitivity. We numerically back our findings, showing the vulnerability of various optimizers, as well as the robustness induced by unfolding and adversarial training.

LGJan 10, 2025
Deep Variational Sequential Monte Carlo for High-Dimensional Observations

Wessel L. van Nierop, Nir Shlezinger, Ruud J. G. van Sloun

Sequential Monte Carlo (SMC), or particle filtering, is widely used in nonlinear state-space systems, but its performance often suffers from poorly approximated proposal and state-transition distributions. This work introduces a differentiable particle filter that leverages the unsupervised variational SMC objective to parameterize the proposal and transition distributions with a neural network, designed to learn from high-dimensional observations. Experimental results demonstrate that our approach outperforms established baselines in tracking the challenging Lorenz attractor from high-dimensional and partial observations. Furthermore, an evidence lower bound based evaluation indicates that our method offers a more accurate representation of the posterior distribution.

LGJan 4
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines

Itai Morad, Nir Shlezinger, Yonina C. Eldar

Knowledge Distillation (KD) is a central paradigm for transferring knowledge from a large teacher network to a typically smaller student model, often by leveraging soft probabilistic outputs. While KD has shown strong empirical success in numerous applications, its theoretical underpinnings remain only partially understood. In this work, we adopt a Bayesian perspective on KD to rigorously analyze the convergence behavior of students trained with Stochastic Gradient Descent (SGD). We study two regimes: $(i)$ when the teacher provides the exact Bayes Class Probabilities (BCPs); and $(ii)$ supervision with noisy approximations of the BCPs. Our analysis shows that learning from BCPs yields variance reduction and removes neighborhood terms in the convergence bounds compared to one-hot supervision. We further characterize how the level of noise affects generalization and accuracy. Motivated by these insights, we advocate the use of Bayesian deep learning models, which typically provide improved estimates of the BCPs, as teachers in KD. Consistent with our analysis, we experimentally demonstrate that students distilled from Bayesian teachers not only achieve higher accuracies (up to +4.27%), but also exhibit more stable convergence (up to 30% less noise), compared to students distilled from deterministic teachers.

ITJun 18, 2025
In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory

Matteo Zecchin, Tomer Raviv, Dileep Kalathil et al.

In recent years, deep learning has facilitated the creation of wireless receivers capable of functioning effectively in conditions that challenge traditional model-based designs. Leveraging programmable hardware architectures, deep learning-based receivers offer the potential to dynamically adapt to varying channel environments. However, current adaptation strategies, including joint training, hypernetwork-based methods, and meta-learning, either demonstrate limited flexibility or necessitate explicit optimization through gradient descent. This paper presents gradient-free adaptation techniques rooted in the emerging paradigm of in-context learning (ICL). We review architectural frameworks for ICL based on Transformer models and structured state-space models (SSMs), alongside theoretical insights into how sequence models effectively learn adaptation from contextual information. Further, we explore the application of ICL to cell-free massive MIMO networks, providing both theoretical analyses and empirical evidence. Our findings indicate that ICL represents a principled and efficient approach to real-time receiver adaptation using pilot signals and auxiliary contextual information, without requiring online retraining.

LGMay 29, 2025
Adaptive Deadline and Batch Layered Synchronized Federated Learning

Asaf Goren, Natalie Lang, Nir Shlezinger et al.

Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. However, synchronous FL suffers from latency bottlenecks due to device heterogeneity, where slower clients (stragglers) delay or degrade global updates. Prior solutions, such as fixed deadlines, client selection, and layer-wise partial aggregation, alleviate the effect of stragglers, but treat round timing and local workload as static parameters, limiting their effectiveness under strict time constraints. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation. Our approach formulates a constrained optimization problem minimizing the expected L2 distance to the global optimum under total training time and global rounds. We provide a convergence analysis under exponential compute models and prove that ADEL-FL yields unbiased updates with bounded variance. Extensive experiments demonstrate that ADEL-FL outperforms alternative methods in both convergence rate and final accuracy under heterogeneous conditions.

SPApr 1, 2025
Near Field Localization via AI-Aided Subspace Methods

Arad Gast, Luc Le Magoarou, Nir Shlezinger

The increasing demands for high-throughput and energy-efficient wireless communications are driving the adoption of extremely large antennas operating at high-frequency bands. In these regimes, multiple users will reside in the radiative near-field, and accurate localization becomes essential. Unlike conventional far-field systems that rely solely on direction-of-arrival (DOA) estimation, near-field localization exploits spherical wavefront propagation to recover both DOA and range information. While subspace-based methods, such as MUSIC and its extensions, offer high resolution and interpretability for near-field localization, their performance is significantly impacted by model assumptions, including non-coherent sources, well-calibrated arrays, and a sufficient number of snapshots. To address these limitations, this work proposes AI-aided subspace methods for near-field localization that enhance robustness to real-world challenges. Specifically, we introduce NF-SubspaceNet, a deep learning-augmented 2D MUSIC algorithm that learns a surrogate covariance matrix to improve localization under challenging conditions, and DCD-MUSIC, a cascaded AI-aided approach that decouples angle and range estimation to reduce computational complexity. We further develop a novel model-order-aware training method to accurately estimate the number of sources, which is combined with casting near-field subspace methods as AI models for learning. Extensive simulations demonstrate that the proposed methods outperform classical and existing deep-learning-based localization techniques, providing robust near-field localization even under coherent sources, miscalibrations, and few snapshots.

LGMar 17, 2025
PAUSE: Low-Latency and Privacy-Aware Active User Selection for Federated Learning

Ori Peleg, Natalie Lang, Dan Ben Ami et al.

Federated learning (FL) enables multiple edge devices to collaboratively train a machine learning model without the need to share potentially private data. Federated learning proceeds through iterative exchanges of model updates, which pose two key challenges: first, the accumulation of privacy leakage over time, and second, communication latency. These two limitations are typically addressed separately: the former via perturbed updates to enhance privacy and the latter using user selection to mitigate latency, both at the expense of accuracy. In this work, we propose a method that jointly addresses the accumulation of privacy leakage and communication latency via active user selection, aiming to improve the trade-off among privacy, latency, and model performance. To achieve this, we construct a reward function that accounts for these three objectives. Building on this reward, we propose a multi-armed bandit (MAB)-based algorithm, termed Privacy-aware Active User SElection (PAUSE), which dynamically selects a subset of users each round while ensuring bounded overall privacy leakage. We establish a theoretical analysis, systematically showing that the reward growth rate of PAUSE follows that of the best-known rate in MAB literature. To address the complexity overhead of active user selection, we propose a simulated annealing-based relaxation of PAUSE and analyze its ability to approximate the reward-maximizing policy under reduced complexity. We numerically validate the privacy leakage, associated improved latency, and accuracy gains of our methods for the federated training in various scenarios.

SPOct 18, 2021
Unsupervised Learned Kalman Filtering

Guy Revach, Nir Shlezinger, Timur Locher et al.

In this paper we adapt KalmanNet, which is a recently proposed deep neural network (DNN)-aided system whose architecture follows the operation of the model-based Kalman filter (KF), to learn its mapping in an unsupervised manner, i.e., without requiring ground-truth states. The unsupervised adaptation is achieved by exploiting the hybrid model-based/data-driven architecture of KalmanNet, which internally predicts the next observation as the KF does. These internal features are then used to compute the loss rather than the state estimate at the output of the system. With the capability of unsupervised learning, one can use KalmanNet not only to track the hidden state, but also to adapt to variations in the state space (SS) model. We numerically demonstrate that when the noise statistics are unknown, unsupervised KalmanNet achieves a similar performance to KalmanNet with supervised learning. We also show that we can adapt a pre-trained KalmanNet to changing SS models without providing additional data thanks to the unsupervised capabilities.
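
One way to read the unsupervised loss described in this abstract is sketched below. The notation is an assumption and does not appear in the abstract: $\mathbf{y}_t$ is the observation at time $t$, $\hat{\mathbf{y}}_{t|t-1}$ is the observation predicted internally from past data, $T$ is the sequence length, and the squared-error form is chosen here for illustration.

$$
\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T}\left\lVert \mathbf{y}_t - \hat{\mathbf{y}}_{t|t-1} \right\rVert^{2}
$$

No ground-truth state appears in this expression, which is what allows training without labeled states.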

SPOct 10, 2021
Uncertainty in Data-Driven Kalman Filtering for Partially Known State-Space Models

Itzik Klein, Guy Revach, Nir Shlezinger et al.

Providing a metric of uncertainty alongside a state estimate is often crucial when tracking a dynamical system. Classic state estimators, such as the Kalman filter (KF), provide a time-dependent uncertainty measure from knowledge of the underlying statistics; however, deep learning based tracking systems struggle to reliably characterize uncertainty. In this paper, we investigate the ability of KalmanNet, a recently proposed hybrid model-based deep state tracking algorithm, to estimate an uncertainty measure. By exploiting the interpretable nature of KalmanNet, we show that the error covariance matrix can be computed based on its internal features, as an uncertainty measure. We demonstrate that when the system dynamics are known, KalmanNet, which learns its mapping from data without access to the statistics, provides uncertainty similar to that provided by the KF, while in the presence of evolution model mismatch, KalmanNet provides a more accurate error estimation.

SPSep 22, 2021
DA-MUSIC: Data-Driven DoA Estimation via Deep Augmented MUSIC Algorithm

Julian P. Merkofer, Guy Revach, Nir Shlezinger et al.

Direction of arrival (DoA) estimation of multiple signals is pivotal in sensor array signal processing. A popular multi-signal DoA estimation method is the multiple signal classification (MUSIC) algorithm, which enables high-performance super-resolution DoA recovery while being highly applicable in practice. MUSIC is a model-based algorithm, relying on an accurate mathematical description of the relationship between the signals and the measurements and assumptions on the signals themselves (non-coherent, narrowband sources). As such, it is sensitive to model imperfections. In this work we propose to overcome these limitations of MUSIC by augmenting the algorithm with specifically designed neural architectures. Our proposed deep augmented MUSIC (DA-MUSIC) algorithm is thus a hybrid model-based/data-driven DoA estimator, which leverages data to improve performance and robustness while preserving the interpretable flow of the classic method. DA-MUSIC is shown to learn to overcome limitations of the purely model-based method, such as its inability to successfully localize coherent sources as well as estimate the number of coherent signal sources present. We further demonstrate the superior resolution of the DA-MUSIC algorithm in synthetic narrowband and broadband scenarios as well as with real-world data of DoA estimation from seismic signals.

SPJul 21, 2021
KalmanNet: Neural Network Aided Kalman Filtering for Partially Known Dynamics

Guy Revach, Nir Shlezinger, Xiaoyong Ni et al.

State estimation of dynamical systems in real-time is a fundamental task in signal processing. For systems that are well-represented by a fully known linear Gaussian state space (SS) model, the celebrated Kalman filter (KF) is a low complexity optimal solution. However, both linearity of the underlying SS model and accurate knowledge of it are often not encountered in practice. Here, we present KalmanNet, a real-time state estimator that learns from data to carry out Kalman filtering under non-linear dynamics with partial information. By incorporating the structural SS model with a dedicated recurrent neural network module in the flow of the KF, we retain data efficiency and interpretability of the classic algorithm while implicitly learning complex dynamics from data. We demonstrate numerically that KalmanNet overcomes non-linearities and model mismatch, outperforming classic filtering methods operating with both mismatched and accurate domain knowledge.

SPMar 31, 2021
Federated Learning: A Signal Processing Perspective

Tomer Gafni, Nir Shlezinger, Kobi Cohen et al.

The dramatic success of deep learning is largely due to the availability of data. Data samples are often acquired on edge devices, such as smart phones, vehicles and sensors, and in some cases cannot be shared due to privacy considerations. Federated learning is an emerging machine learning paradigm for training models across multiple edge devices holding local datasets, without explicitly exchanging the data. Learning in a federated manner differs from conventional centralized machine learning, and poses several core unique challenges and requirements, which are closely related to classical problems studied in the areas of signal processing and communications. Consequently, dedicated schemes derived from these areas are expected to play an important role in the success of federated learning and the transition of deep learning from the domain of centralized servers to mobile edge devices. In this article, we provide a unified systematic framework for federated learning in a manner that encapsulates and highlights the main challenges that are natural to treat using signal processing tools. We present a formulation for the federated learning paradigm from a signal processing perspective, and survey a set of candidate approaches for tackling its unique challenges. We further provide guidelines for the design and adaptation of signal processing and communication methods to facilitate federated learning at large scale.

SPFeb 5, 2021
LoRD-Net: Unfolded Deep Detection Network with Low-Resolution Receivers

Shahin Khobahi, Nir Shlezinger, Mojtaba Soltanalian et al.

The need to recover high-dimensional signals from their noisy low-resolution quantized measurements is widely encountered in communications and sensing. In this paper, we focus on the extreme case of one-bit quantizers, and propose a deep detector entitled LoRD-Net for recovering information symbols from one-bit measurements. Our method is a model-aware data-driven architecture based on deep unfolding of first-order optimization iterations. LoRD-Net has a task-based architecture dedicated to recovering the underlying signal of interest from the one-bit noisy measurements without requiring prior knowledge of the channel matrix through which the one-bit measurements are obtained. The proposed deep detector has much fewer parameters compared to black-box deep networks due to the incorporation of domain-knowledge in the design of its architecture, allowing it to operate in a data-driven fashion while benefiting from the flexibility, versatility, and reliability of model-based optimization methods. LoRD-Net operates in a blind fashion, which requires addressing both the non-linear nature of the data-acquisition system as well as identifying a proper optimization objective for signal recovery. Accordingly, we propose a two-stage training method for LoRD-Net, in which the first stage is dedicated to identifying the proper form of the optimization process to unfold, while the latter trains the resulting model in an end-to-end manner. We numerically evaluate the proposed receiver architecture for one-bit signal recovery in wireless communications and demonstrate that the proposed hybrid methodology outperforms both data-driven and model-based state-of-the-art methods, while utilizing small datasets, on the order of merely $\sim 500$ samples, for training.

SPJan 12, 2021
Model-Based Machine Learning for Communications

Nir Shlezinger, Nariman Farsad, Yonina C. Eldar et al.

We present an introduction to model-based machine learning for communication systems. We begin by reviewing existing strategies for combining model-based algorithms and machine learning from a high level perspective, and compare them to the conventional deep learning approach which utilizes established deep neural network (DNN) architectures trained in an end-to-end manner. Then, we focus on symbol detection, which is one of the fundamental tasks of communication receivers. We show how the different strategies of conventional deep architectures, deep unfolding, and DNN-aided hybrid algorithms, can be applied to this problem. The last two approaches constitute a middle ground between purely model-based and solely DNN-based receivers. By focusing on this specific task, we highlight the advantages and drawbacks of each strategy, and present guidelines to facilitate the design of future model-based deep learning systems for communications.

SPDec 15, 2020
Model-Based Deep Learning

Nir Shlezinger, Jay Whang, Yonina C. Eldar et al.

Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability for some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, as well as learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism. We provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal processing oriented examples from recent literature. Our aim is to facilitate the design and study of future systems on the intersection of signal processing and machine learning that incorporate the advantages of both domains.

ITNov 14, 2020
FedRec: Federated Learning of Universal Receivers over Fading Channels

Mahdi Boloursaz Mashhadi, Nir Shlezinger, Yonina C. Eldar et al.

Wireless communication is often subject to channel fading. Various statistical models have been proposed to capture the inherent randomness in fading, and conventional model-based receiver designs rely on accurate knowledge of this underlying distribution, which, in practice, may be complex and intractable. In this work, we propose a neural network-based symbol detection technique for downlink fading channels, which is based on the maximum a-posteriori probability (MAP) detector. To enable training on a diverse ensemble of fading realizations, we propose a federated training scheme, in which multiple users collaborate to jointly learn a universal data-driven detector, hence the name FedRec. The performance of the resulting receiver is shown to approach the MAP performance in diverse channel conditions without requiring knowledge of the fading statistics, while inducing a substantially reduced communication overhead in its training procedure compared to centralized training.

LGSep 27, 2020
Over-the-Air Federated Learning from Heterogeneous Data

Tomer Sery, Nir Shlezinger, Kobi Cohen et al.

Federated learning (FL) is a framework for distributed learning of centralized models. In FL, a set of edge devices train a model using their local data, while repeatedly exchanging their trained updates with a central server. This procedure allows tuning a centralized model in a distributed fashion without having the users share their possibly private data. In this paper, we focus on over-the-air (OTA) FL, which has been suggested recently to reduce the communication overhead of FL due to the repeated transmissions of the model updates by a large number of users over the wireless channel. In OTA FL, all users simultaneously transmit their updates as analog signals over a multiple access channel, and the server receives a superposition of the analog transmitted signals. However, this approach results in the channel noise directly affecting the optimization procedure, which may degrade the accuracy of the trained model. We develop a Convergent OTA FL (COTAF) algorithm which enhances the common local stochastic gradient descent (SGD) FL algorithm, introducing precoding at the users and scaling at the server, which gradually mitigates the effect of the noise. We analyze the convergence of COTAF to the loss minimizing model and quantify the effect of a statistically heterogeneous setup, i.e. when the training data of each user obeys a different distribution. Our analysis reveals the ability of COTAF to achieve a convergence rate similar to that achievable over error-free channels. Our simulations demonstrate the improved convergence of COTAF over vanilla OTA local SGD for training using non-synthetic datasets. Furthermore, we numerically show that the precoding induced by COTAF notably improves the convergence rate and the accuracy of models trained via OTA FL.

LGJun 5, 2020
UVeQFed: Universal Vector Quantization for Federated Learning

Nir Shlezinger, Mingzhe Chen, Yonina C. Eldar et al.

Traditional deep learning models are trained at a centralized server using labeled data samples collected from end devices or users. Such data samples often include private information, which the users may not be willing to share. Federated learning (FL) is an emerging approach to train such learning models without requiring the users to share their possibly private labeled data. In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model. A major challenge that arises in this method is the need for each user to efficiently transmit its learned model over the throughput-limited uplink channel. In this work, we tackle this challenge using tools from quantization theory. In particular, we identify the unique characteristics associated with conveying trained models over rate-constrained channels, and propose a suitable quantization scheme for such settings, referred to as universal vector quantization for FL (UVeQFed). We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only minimal distortion. We then theoretically analyze the distortion, showing that it vanishes as the number of users grows. We also characterize the convergence of models trained with the traditional federated averaging method combined with UVeQFed to the model which minimizes the loss function. Our numerical results demonstrate the gains of UVeQFed over previously proposed methods in terms of both distortion induced in quantization and accuracy of the resulting aggregated model.
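
As a rough illustration of the idea, the sketch below assumes that universal quantization is realized with subtractive dithering and shows only the scalar (one-dimensional) case rather than a vector quantizer. The shared seed as the source of common randomness, the function names, and the numbers are all hypothetical choices for this example.

```python
import math
import random


def dithered_quantize(update, step, seed):
    """User side: add shared-seed dither, then round to a uniform grid."""
    rng = random.Random(seed)
    indices = []
    for x in update:
        dither = rng.uniform(-step / 2, step / 2)
        indices.append(math.floor((x + dither) / step + 0.5))
    return indices  # integers, which an entropy coder could compress further


def dithered_reconstruct(indices, step, seed):
    """Server side: regenerate the same dither from the seed and subtract it."""
    rng = random.Random(seed)
    return [k * step - rng.uniform(-step / 2, step / 2) for k in indices]


step = 0.5
updates = {1: [0.31, -1.20, 0.05], 2: [0.28, -1.10, 0.11]}  # user id -> update
decoded = {
    uid: dithered_reconstruct(dithered_quantize(u, step, seed=uid), step, seed=uid)
    for uid, u in updates.items()
}
# Federated averaging of the decoded updates.
average = [sum(d[i] for d in decoded.values()) / len(decoded) for i in range(3)]
```

Each decoded entry differs from the original by at most half a step. Because the dither is drawn independently per user in this sketch, the per-user errors tend to average out at the server, which matches the intuition behind a distortion that decreases with the number of users.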

LGJun 5, 2020
Learned Factor Graphs for Inference from Stationary Time Sequences

Nir Shlezinger, Nariman Farsad, Yonina C. Eldar et al.

The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting stationary properties of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Learned factor graphs can be realized using compact neural networks that are trainable using small training sets, or alternatively, can be used to improve upon existing deep inference systems. We present an inference algorithm based on learned stationary factor graphs, which learns to implement the sum-product scheme from labeled data, and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed learned factor graphs to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.

SPFeb 14, 2020
Data-Driven Symbol Detection via Model-Based Machine Learning

Nariman Farsad, Nir Shlezinger, Andrea J. Goldsmith et al.

The design of symbol detectors in digital communication systems has traditionally relied on statistical channel models that describe the relation between the transmitted symbols and the observed signal at the receiver. Here we review a data-driven framework for symbol detection design which combines machine learning (ML) and model-based algorithms. In this hybrid approach, well-known channel-model-based algorithms such as the Viterbi method, BCJR detection, and multiple-input multiple-output (MIMO) soft interference cancellation (SIC) are augmented with ML-based algorithms to remove their channel-model-dependence, allowing the receiver to learn to implement these algorithms solely from data. The resulting data-driven receivers are most suitable for systems where the underlying channel models are poorly understood, highly complex, or do not capture the underlying physics well. Our approach is unique in that it only replaces the channel-model-based computations with dedicated neural networks that can be trained from a small amount of data, while keeping the general algorithm intact. Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship and in the presence of channel state information uncertainty.

SPFeb 8, 2020
DeepSIC: Deep Soft Interference Cancellation for Multiuser MIMO Detection

Nir Shlezinger, Rong Fu, Yonina C. Eldar

Digital receivers are required to recover the transmitted symbols from their observed channel output. In multiuser multiple-input multiple-output (MIMO) setups, where multiple symbols are simultaneously transmitted, accurate symbol detection is challenging. A family of algorithms capable of reliably recovering multiple symbols is based on interference cancellation. However, these methods assume that the channel is linear, a model which does not reflect many relevant channels, and also require accurate channel state information (CSI), which may not be available. In this work we propose a multiuser MIMO receiver which learns to jointly detect in a data-driven fashion, without assuming a specific channel model or requiring CSI. In particular, we propose a data-driven implementation of the iterative soft interference cancellation (SIC) algorithm which we refer to as DeepSIC. The resulting symbol detector is based on integrating dedicated machine-learning (ML) methods into the iterative SIC algorithm. DeepSIC learns to carry out joint detection from a limited set of training samples without requiring the channel to be linear or its parameters to be known. Our numerical evaluations demonstrate that for linear channels with full CSI, DeepSIC approaches the performance of iterative SIC, which is comparable to the optimal performance, and outperforms previously proposed ML-based MIMO receivers. Furthermore, in the presence of CSI uncertainty, DeepSIC significantly outperforms model-based approaches. Finally, we show that DeepSIC accurately detects symbols in non-linear channels, where conventional iterative SIC fails even when accurate CSI is available.

MLJan 31, 2020
Data-Driven Factor Graphs for Deep Symbol Detection

Nir Shlezinger, Nariman Farsad, Yonina C. Eldar et al.

Many important schemes in signal processing and communications, ranging from the BCJR algorithm to the Kalman filter, are instances of factor graph methods. This family of algorithms is based on recursive message passing-based computations carried out over graphical models, representing a factorization of the underlying statistics. Consequently, in order to implement these algorithms, one must have accurate knowledge of the statistical model of the considered signals. In this work we propose to implement factor graph methods in a data-driven manner. In particular, we propose to use machine learning (ML) tools to learn the factor graph, instead of the overall system task, which in turn is used for inference by message passing over the learned graph. We apply the proposed approach to learn the factor graph representing a finite-memory channel, demonstrating the resulting ability to implement BCJR detection in a data-driven fashion. We demonstrate that the proposed system, referred to as BCJRNet, learns to implement the BCJR algorithm from a small training set, and that the resulting receiver exhibits improved robustness to inaccurate training compared to the conventional channel-model-based receiver operating under the same level of uncertainty. Our results indicate that by utilizing ML tools to learn factor graphs from labeled data, one can implement a broad range of model-based algorithms, which traditionally require full knowledge of the underlying statistics, in a data-driven fashion.
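
To make the message passing step concrete, here is a minimal sketch of the forward-backward (sum-product) recursion on a chain-structured factor graph. The table `factors[t][prev][cur]` stands in for the value of the function node at time t; in a data-driven receiver these values would be produced by a trained model from the channel output, whereas here they are made-up numbers. The function name and the uniform prior on the initial state are assumptions of this example.

```python
def forward_backward(factors, num_states):
    """Sum-product on a chain; factors[t][prev][cur] is the function node at time t."""
    num_steps = len(factors)
    states = range(num_states)
    # Forward messages, starting from a uniform prior on the initial state.
    fwd = [[1.0 / num_states] * num_states]
    for t in range(num_steps):
        fwd.append(
            [sum(fwd[t][p] * factors[t][p][c] for p in states) for c in states]
        )
    # Backward messages, starting from all-ones at the end of the block.
    bwd = [[1.0] * num_states for _ in range(num_steps + 1)]
    for t in range(num_steps - 1, -1, -1):
        bwd[t] = [
            sum(factors[t][p][c] * bwd[t + 1][c] for c in states) for p in states
        ]
    # Posterior marginal of the state reached after each observation.
    marginals = []
    for t in range(1, num_steps + 1):
        joint = [fwd[t][s] * bwd[t][s] for s in states]
        total = sum(joint)
        marginals.append([j / total for j in joint])
    return marginals


# Hypothetical factor values for three time steps and two states.
factors = [
    [[0.30, 0.05], [0.10, 0.20]],
    [[0.02, 0.40], [0.25, 0.10]],
    [[0.15, 0.15], [0.05, 0.30]],
]
posteriors = forward_backward(factors, num_states=2)
```

The recursion itself never uses the channel model directly; it only reads the factor values. This is why replacing the computation of those values with a learned component leaves the rest of the algorithm unchanged.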