Ali Khalesi

Semantic Scholar Profile

h-index5

8papers

24citations

Novelty48%

AI Score51

Ranked #39,528 of 201,326 authors (top 20%)#106 in IT (top 12%)

8 Papers

ITApr 30

Multi-User Non-Linearly Separable Distributed Computing

Ali Khalesi, Ahmad Tanha, Derya Malak et al.

This paper considers an $N$-server distributed computing setting with $K$ users requesting functions that are arbitrary multivariable polynomial evaluations of $L$ real (potentially non-linear) basis subfunctions, where each function output is raised to a bounded power. Our aim is to seek efficient task allocation and data communication techniques that reduce computation and communication costs. To this end, we take a tensor-theoretic approach, in which we represent the requested non-linearly decomposable functions using a properly designed tensor $\bar{\mathcal{F}}$, whose sparse decomposition into a tensor $\bar{\mathcal{E}}$ and a matrix $\mathbf{D}$ directly defines the task assignment, connectivity, and communication patterns. We design a lossless achievable scheme that integrates fixed-support SVD-based tensor factorization with multi-dimensional tiling of $\bar{\mathcal{E}}$ and $\mathbf{D}$, followed by a bipartite graph matching-based recursive assignment of tiles. This step transforms an overlapping decomposition into a disjoint one and reduces the resulting sum rank of the tiles, thereby decreasing the number of required servers. Under mild dimensionality conditions, we derive an explicit zero-error characterization of the achievable system rate $K/N$. Numerical simulations demonstrate the computational and communication savings over existing state-of-the-art matrix factorization approaches across a wide range of system parameters.

LGMay 6

Expert Routing for Communication-Efficient MoE via Finite Expert Banks

Mohammad Reza Deylam Salehi, Ali Khalesi

Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we develop a finite-bank MNIST construction using pretrained CNN experts and a discrete, data-dependent selection rule. Since the selected model belongs to a finite candidate set, the algorithmic mutual information $I(S;W)$ admits a closed-form discrete-entropy estimator from the empirical posterior $q(W|S)$. Sweeping a data-dependence parameter $α$, we observe that $\widehat I(S;W)$ monotonically tracks the generalization gap, while the Xu-Raginsky bound exhibits the expected looseness. We also compare with a uniform union-bound baseline and introduce an empirical estimator of $I(X;T)$ together with a Blahut-Arimoto procedure for tracing an accuracy-rate curve over the expert bank. The proposed framework provides a practical tool for analyzing resource-aware MoE inference systems and for interpreting $I(X;T)$ and $D(R_g)$ as design proxies for efficient expert routing.

ITApr 21

Secure Multi-User Linearly-Separable Distributed Computing

Amir Masoud Jafarpisheh, Ali Khalesi, Petros Elia

The introduction of the new multi-user linearly-separable distributed computing framework, has recently revealed how a parallel treatment of users can yield large parallelization gains with relatively low computation and communication costs. These gains stem from a new approach that converts the computing problem into a sparse matrix factorization problem; a matrix $\mathbf{F}$ that describes the users' requests, is decomposed as $\mathbf{F} = \mathbf{DE}$, where a $γ$-sparse $\mathbf{E}$ defines the task allocation across $N$ servers, and a $δ$-sparse $\mathbf{D}$ defines the connectivity between $N$ servers and $K$ users as well as the decoding process. While this approach provides near-optimal performance, its linear nature has raised data secrecy concerns. We adopt an information-theoretic secrecy framework requiring that each user learns nothing more than its own requested function. Our main results provide (i) a necessary condition stating that for each user $k$ observing $α_k$ server responses, the common randomness visible to that user must span a subspace of dimension greater than $α_k-1$, and (ii) a necessary and sufficient condition requiring that removing from $\mathbf{D}$ the columns corresponding to the servers observed by a user leaves a matrix of rank at least $K-1$. Based on these conditions, we design a general, cost-preserving secrecy-enforcing transformation valid over both finite and real fields, obtained by appending to $\mathbf{E}$ a basis of $\mathrm{Null}(\mathbf{D})$ and carefully injecting shared randomness. This scheme preserves communication and computation costs, guarantees perfect information-theoretic secrecy over finite fields, and in the real case yields an explicit mutual-information bound that can be made arbitrarily small by increasing the variance of Gaussian common randomness.

ITApr 3

General Multi-User Distributed Computing: A Learning-Theoretic RKHS Framework for Generic Nonlinear Target Functions with Topology-Aware Risk Analysis

Ali Khalesi

This paper studies multi-user distributed computation over shared real-valued subfunctions under computation and communication constraints. We consider a \emph{General Multi-User Distributed Computing (GMUDC)} model in which different users request heterogeneous target functions represented in the reproducing-kernel Hilbert space of a shift-invariant kernel, thereby covering generic nonlinear target mappings beyond linearly separable tasks. Unlike tessellated distributed computing frameworks that rely on disjoint-support topologies in their native setting, the GMUDC model allows arbitrary task-assignment and connectivity topologies subject to per-server computation and communication budgets~$Î$ and~$Î$. We analyze two complementary regimes. In the \emph{quenched} regime, the assignment and communication topology are fixed, and we derive upper and lower bounds on the resulting reconstruction risk that separate a spectral approximation term from a topology-dependent coverage term. In the \emph{annealed} regime, the assignment and links are drawn uniformly at random from a prescribed ensemble, and we characterize the corresponding average-risk scaling together with a topology-dependent coverage threshold. These results provide a topology-aware characterization of the computation--communication--accuracy trade-off for approximate multi-user distributed computation. They identify fundamental limits, up to constants and logarithmic factors, under the model assumptions adopted in the paper. In the shared linear/isotropic comparison regime, the framework also recovers the relevant tessellated distributed computing benchmark, while the broader GMUDC formulation applies to generic nonlinear target functions in kernel RKHSs and accommodates a wider range of task-assignment and communication topologies than disjoint-support constructions alone.

ITMar 19

Wireless Broadcast Gossip for Decentralized Drone Swarms: Success Probability, Contraction, and Optimal Aloha

Ali Khalesi

We study broadcast gossip for decentralized drone swarms over an interference-limited wireless medium. Modeling drone locations as a planar Poisson point process and medium access via slotted Aloha, we derive (i) a closed-form SIR success probability under Rayleigh fading, (ii) a mean-square contraction bound in which the consensus rate factorizes into an ideal mixing term and an explicit wireless thinning term, and (iii) a closed-form access probability that optimizes a sharp availability--reliability proxy. Simulations corroborate the predicted operating point by matching the fastest convergence region.

MLFeb 16

Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs

Ali Khalesi, Mohammad Reza Deylam Salehi

Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize a mutual-information generalization bound and develop a rate-distortion characterization $D(R_g)$ of finite-rate gating, where $R_g:=I(X; T)$, yielding (under a standard empirical rate-distortion optimality condition) $\mathbb{E}[R(W)] \le D(R_g)+δ_m+\sqrt{(2/m)\, I(S; W)}$. The analysis yields capacity-aware limits for communication-constrained MoE systems, and numerical simulations on synthetic multi-expert models empirically confirm the predicted trade-offs between gating rate, expressivity, and generalization.

ITApr 27

Densification Converses for Walker Constellations With Explicit Constants and Reuse Scaling Laws

Ali Khalesi, François Baccelli

We establish densification converses for Walker LEO constellations under nearest-visible association in the full-frequency-reuse setting. Performance is evaluated under the invariant (stationary) measure induced by the constellation/Earth dynamics on the user--constellation ``phase state.'' A key Walker-specific feature, absent from unbounded planar models, is that association is restricted to a bounded visible cap determined by Earth geometry. Under power-law path-loss, a two-level antenna-gain model, i.i.d.\ nonnegative fading with unit mean and finite second moment, and nonzero noise, we prove that increasing the total satellite count $N=N_oN_s$ forces the aggregate interference to grow at least linearly in $N$, while the useful signal remains uniformly bounded above. Consequently, the downlink SINR coverage probability at any fixed threshold and the ergodic spectral efficiency both vanish as $N\to\infty$. The key technical ingredient is a deterministic visibility-annulus block lemma, uniform over all sufficiently large constellations and all "phase states", showing that a fixed fraction of visible satellites lies in a distance annulus strictly inside the horizon; this yields explicit finite-$N$ collapse bounds. In particular, we derive nonasymptotic $O(1/N)$ upper bounds on both coverage and ergodic spectral efficiency. Finally, in the case of frequency reuse through independent thinning, with activity probability $q$, we show that avoiding densification collapse necessarily requires $qN=O(1)$, equivalently a reuse factor $Ω(N)$, and we obtain a corresponding explicit $O(1/(qN))$ upper bound.

ITMar 2, 2021

The Capacity Region of Distributed Multi-User Secret Sharing

Ali Khalesi, Mahtab Mirmohseni, Mohammad Ali Maddah-Ali

In this paper, we study the problem of distributed multi-user secret sharing, including a trusted master node, $N\in \mathbb{N}$ storage nodes, and $K$ users, where each user has access to the contents of a subset of storage nodes. Each user has an independent secret message with certain rate, defined as the size of the message normalized by the size of a storage node. Having access to the secret messages, the trusted master node places encoded shares in the storage nodes, such that (i) each user can recover its own message from the content of the storage nodes that it has access to, (ii) each user cannot gain any information about the message of any other user. We characterize the capacity region of the distributed multi-user secret sharing, defined as the set of all achievable rate tuples, subject to the correctness and privacy constraints. In the achievable scheme, for each user, the master node forms a polynomial with the degree equal to the number of its accessible storage nodes minus one, where the value of this polynomial at certain points are stored as the encoded shares. The message of that user is embedded in some of the coefficients of the polynomial. The remaining coefficients are determined such that the content of each storage node serves as the encoded shares for all users that have access to that storage node.