LG · Jun 6, 2023

Masked Autoencoders are Efficient Continual Federated Learners

Subarnaduti Paul, Lars-Joel Frey, Roshni Kamath, Kristian Kersting, Martin Mundt

arXiv:2306.03542v23.82 citations · h-index: 23 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses data distribution drift and task changes on individual clients in federated learning. The contribution is assessed as incremental because it builds on prior work that combines distributed and continual learning.

The paper tackles federated continual learning, in which clients collaboratively learn from data distributions that change over time without forgetting previous tasks. It proposes masked autoencoders for distribution estimation whose masking strategy is integrated with task attention mechanisms to enable selective knowledge transfer between clients. The approach is validated empirically in several continual federated scenarios on image and binary datasets.

Machine learning is typically framed from a perspective of i.i.d., and more importantly, isolated data. In parts, federated learning lifts this assumption, as it sets out to solve the real-world challenge of collaboratively learning a shared model from data distributed across clients. However, motivated primarily by privacy and computational constraints, the fact that data may change, distributions drift, or even tasks advance individually on clients, is seldom taken into account. The field of continual learning addresses this separate challenge and first steps have recently been taken to leverage synergies in distributed supervised settings, in which several clients learn to solve changing classification tasks over time without forgetting previously seen ones. Motivated by these prior works, we posit that such federated continual learning should be grounded in unsupervised learning of representations that are shared across clients; in the loose spirit of how humans can indirectly leverage others' experience without exposure to a specific task. For this purpose, we demonstrate that masked autoencoders for distribution estimation are particularly amenable to this setup. Specifically, their masking strategy can be seamlessly integrated with task attention mechanisms to enable selective knowledge transfer between clients. We empirically corroborate the latter statement through several continual federated scenarios on both image and binary datasets.
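To make the "masking strategy" mentioned in the abstract more concrete, the following is a minimal sketch of how binary connectivity masks are commonly built for a masked autoencoder for distribution estimation (MADE-style). It is not the authors' implementation: the function names `made_masks` and `connectivity`, the layer sizes, and the random degree assignment are illustrative assumptions, and the task attention and federated aggregation parts of the paper are not covered here.

```python
import numpy as np

def made_masks(n_inputs, hidden_sizes, seed=0):
    """Build MADE-style binary masks (hypothetical helper, not from the paper).

    Each unit gets an integer "degree"; a connection is kept only if it
    cannot leak information from input d to output d or to earlier outputs.
    """
    rng = np.random.default_rng(seed)
    # Inputs are numbered 1..D in their natural order (an assumption).
    degrees = [np.arange(1, n_inputs + 1)]
    for h in hidden_sizes:
        # Hidden degrees are sampled between the smallest degree of the
        # previous layer and D - 1 (the upper bound of integers is exclusive).
        low = min(int(degrees[-1].min()), n_inputs - 1)
        degrees.append(rng.integers(low, n_inputs, size=h))

    masks = []
    # Input-to-hidden and hidden-to-hidden masks: keep if deg_out >= deg_in.
    for d_in, d_out in zip(degrees[:-1], degrees[1:]):
        masks.append((d_out[:, None] >= d_in[None, :]).astype(np.float32))
    # Hidden-to-output mask: strict inequality, so output d sees inputs < d only.
    masks.append((degrees[0][:, None] > degrees[-1][None, :]).astype(np.float32))
    return masks

def connectivity(masks):
    """Multiply the masks to count paths from each input to each output."""
    conn = masks[0]
    for m in masks[1:]:
        conn = m @ conn
    return conn  # shape (D, D); conn[i, j] > 0 iff output i can depend on input j

if __name__ == "__main__":
    masks = made_masks(n_inputs=5, hidden_sizes=[16, 16])
    conn = connectivity(masks)
    # The autoregressive property holds when no output depends on its own
    # input or on any later input, i.e. the upper triangle is all zeros.
    print("autoregressive:", bool(np.all(np.triu(conn) == 0)))
```

In a sketch like this, each weight matrix of the autoencoder would be multiplied elementwise by its mask during the forward pass. One plausible reading of the abstract is that a per-task attention mask could be applied in the same elementwise way, but how the paper actually combines the two is not specified on this page.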
