João Xavier

LG
h-index2
8papers
36citations
Novelty54%
AI Score39

8 Papers

SYJul 31, 2018
FADE: Fast and Asymptotically efficient Distributed Estimator for dynamic networks

António Simões, João Xavier

Consider a set of agents that wish to estimate a vector of parameters of their mutual interest. For this estimation goal, agents can sense and communicate. When sensing, an agent measures (in additive gaussian noise) linear combinations of the unknown vector of parameters. When communicating, an agent can broadcast information to a few other agents, by using the channels that happen to be randomly at its disposal at the time. To coordinate the agents towards their estimation goal, we propose a novel algorithm called FADE (Fast and Asymptotically efficient Distributed Estimator), in which agents collaborate at discrete time-steps; at each time-step, agents sense and communicate just once, while also updating their own estimate of the unknown vector of parameters. FADE enjoys five attractive features: first, it is an intuitive estimator, simple to derive; second, it withstands dynamic networks, that is, networks whose communication channels change randomly over time; third, it is strongly consistent in that, as time-steps play out, each agent's local estimate converges (almost surely) to the true vector of parameters; fourth, it is both asymptotically unbiased and efficient, which means that, across time, each agent's estimate becomes unbiased and the mean-square error (MSE) of each agent's estimate vanishes to zero at the same rate of the MSE of the optimal estimator at an almighty central node; fifth, and most importantly, when compared with a state-of-art consensus+innovation (CI) algorithm, it yields estimates with outstandingly lower mean-square errors, for the same number of communications -- for example, in a sparsely connected network model with 50 agents, we find through numerical simulations that the reduction can be dramatic, reaching several orders of magnitude.

SYJul 4, 2018
Distributed Estimation Via a Roaming Token

Lucas Balthazar, João Xavier, Bruno Sinopoli

We present an algorithm for the problem of linear distributed estimation of a parameter in a network where a set of agents are successively taking measurements. The approach considers a roaming token in a network that carries the estimate, and jumps from one agent to another in its vicinity according to the probabilities of a Markov chain. When the token is at an agent it records the agent's local information. We analyze the proposed algorithm and show that it is consistent and asymptotically optimal, in the sense that its mean-square-error (MSE) rate of decay approaches the centralized one as the number of iterations increases. We show these results for a scenario where the network changes over time, and we consider two different set of assumptions on the network instantiations: they are i.i.d. and connected on the average, or they are deterministic and strongly connected for every finite time window of a fixed size. Simulations show our algorithm is competitive with consensus+innovations type algorithms, achieving a smaller MSE at each iteration in all considered scenarios.

LGSep 18, 2023
A Multi-Token Coordinate Descent Method for Semi-Decentralized Vertical Federated Learning

Pedro Valdeira, Yuejie Chi, Cláudia Soares et al.

Most federated learning (FL) methods use a client-server scheme, where clients communicate only with a central server. However, this scheme is prone to bandwidth bottlenecks at the server and has a single point of failure. In contrast, in a (fully) decentralized approach, clients communicate directly with each other, dispensing with the server and mitigating these issues. Yet, as the client network grows larger and sparser, the convergence of decentralized methods slows down, even failing to converge if the network is disconnected. This work addresses this gap between client-server and decentralized schemes, focusing on the vertical FL setup, where clients hold different features of the same samples. We propose multi-token coordinate descent (MTCD), a flexible semi-decentralized method for vertical FL that can exploit both client-server and client-client links. By selecting appropriate hyperparameters, MTCD recovers the client-sever and decentralized schemes as special cases. In fact, its decentralized instance is itself a novel method of independent interest. Yet, by controlling the degree of dependency on client-server links, MTCD can also explore a spectrum of schemes ranging from client-server to decentralized. We prove that, for sufficiently large batch sizes, MTCD converges at an $\mathcal{O}(1/T)$ rate for nonconvex objectives when the tokens roam across disjoint subsets of clients. To capture the aforementioned drawbacks of the client-server scheme succinctly, we model the relative impact of using client-server versus client-client links as the ratio of their "costs", which depends on the application. This allows us to demonstrate, both analytically and empirically, that by tuning the degree of dependency on the server, the semi-decentralized instances of MTCD can outperform both client-server and decentralized approaches across a range of applications.

LGJun 20, 2024Code
Communication-efficient Vertical Federated Learning via Compressed Error Feedback

Pedro Valdeira, João Xavier, Cláudia Soares et al.

Communication overhead is a known bottleneck in federated learning (FL). To address this, lossy compression is commonly used on the information communicated between the server and clients during training. In horizontal FL, where each client holds a subset of the samples, such communication-compressed training methods have recently seen significant progress. However, in their vertical FL counterparts, where each client holds a subset of the features, our understanding remains limited. To address this, we propose an error feedback compressed vertical federated learning (EF-VFL) method to train split neural networks. In contrast to previous communication-compressed methods for vertical FL, EF-VFL does not require a vanishing compression error for the gradient norm to converge to zero for smooth nonconvex problems. By leveraging error feedback, our method can achieve a $\mathcal{O}(1/T)$ convergence rate for a sufficiently large batch size, improving over the state-of-the-art $\mathcal{O}(1/\sqrt{T})$ rate under $\mathcal{O}(1/\sqrt{T})$ compression error, and matching the rate of uncompressed methods. Further, when the objective function satisfies the Polyak-Łojasiewicz inequality, our method converges linearly. In addition to improving convergence, our method also supports the use of private labels. Numerical experiments show that EF-VFL significantly improves over the prior art, confirming our theoretical results. The code for this work can be found at https://github.com/Valdeira/EF-VFL.

LGJan 22
CLASP: An online learning algorithm for Convex Losses And Squared Penalties

Ricardo N. Ferreira, João Xavier, Cláudia Soares

We study Constrained Online Convex Optimization (COCO), where a learner chooses actions iteratively, observes both unanticipated convex loss and convex constraint, and accumulates loss while incurring penalties for constraint violations. We introduce CLASP (Convex Losses And Squared Penalties), an algorithm that minimizes cumulative loss together with squared constraint violations. Our analysis departs from prior work by fully leveraging the firm non-expansiveness of convex projectors, a proof strategy not previously applied in this setting. For convex losses, CLASP achieves regret $O\left(T^{\max\{β,1-β\}}\right)$ and cumulative squared penalty $O\left(T^{1-β}\right)$ for any $β\in (0,1)$. Most importantly, for strongly convex problems, CLASP provides the first logarithmic guarantees on both regret and cumulative squared penalty. In the strongly convex case, the regret is upper bounded by $O( \log T )$ and the cumulative squared penalty is also upper bounded by $O( \log T )$.

LGJan 24, 2022
Decentralized EM to Learn Gaussian Mixtures from Datasets Distributed by Features

Pedro Valdeira, Cláudia Soares, João Xavier

Expectation Maximization (EM) is the standard method to learn Gaussian mixtures. Yet its classic, centralized form is often infeasible, due to privacy concerns and computational and communication bottlenecks. Prior work dealt with data distributed by examples, horizontal partitioning, but we lack a counterpart for data scattered by features, an increasingly common scheme (e.g. user profiling with data from multiple entities). To fill this gap, we provide an EM-based algorithm to fit Gaussian mixtures to Vertically Partitioned data (VP-EM). In federated learning setups, our algorithm matches the centralized EM fitting of Gaussian mixtures constrained to a subspace. In arbitrary communication graphs, consensus averaging allows VP-EM to run on large peer-to-peer networks as an EM approximation. This mismatch comes from consensus error only, which vanishes exponentially fast with the number of consensus rounds. We demonstrate VP-EM on various topologies for both synthetic and real data, evaluating its approximation of centralized EM and seeing that it outperforms the available benchmark.

LGSep 7, 2021
COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization

Manuel Madeira, Renato Negrinho, João Xavier et al.

First-order methods for stochastic optimization have undeniable relevance, in part due to their pivotal role in machine learning. Variance reduction for these algorithms has become an important research topic. In contrast to common approaches, which rarely leverage global models of the objective function, we exploit convexity and L-smoothness to improve the noisy estimates outputted by the stochastic gradient oracle. Our method, named COCO denoiser, is the joint maximum likelihood estimator of multiple function gradients from their noisy observations, subject to co-coercivity constraints between them. The resulting estimate is the solution of a convex Quadratically Constrained Quadratic Problem. Although this problem is expensive to solve by interior point methods, we exploit its structure to apply an accelerated first-order algorithm, the Fast Dual Proximal Gradient method. Besides analytically characterizing the proposed estimator, we show empirically that increasing the number and proximity of the queried points leads to better gradient estimates. We also apply COCO in stochastic settings by plugging it in existing algorithms, such as SGD, Adam or STRSAGA, outperforming their vanilla versions, even in scenarios where our modelling assumptions are mismatched.

LGMar 26, 2018
DJAM: distributed Jacobi asynchronous method for learning personal models

Inês Almeida, João Xavier

Processing data collected by a network of agents often boils down to solving an optimization problem. The distributed nature of these problems calls for methods that are, themselves, distributed. While most collaborative learning problems require agents to reach a common (or consensus) model, there are situations in which the consensus solution may not be optimal. For instance, agents may want to reach a compromise between agreeing with their neighbors and minimizing a personal loss function. We present DJAM, a Jacobi-like distributed algorithm for learning personalized models. This method is implementation-friendly: it has no hyperparameters that need tuning, it is asynchronous, and its updates only require single-neighbor interactions. We prove that DJAM converges with probability one to the solution, provided that the personal loss functions are strongly convex and have Lipschitz gradient. We then give evidence that DJAM is on par with state-of-the-art methods: our method reaches a solution with error similar to the error of a carefully tuned ADMM in about the same number of single-neighbor interactions.