Event-Driven Online Vertical Federated Learning
This work addresses the problem of efficient and stable online learning in VFL for real-world applications where data arrives asynchronously, representing an incremental improvement over prior methods.
The paper tackled the challenge of integrating online learning into Vertical Federated Learning (VFL) by addressing asynchronous data streaming from clients with non-intersecting feature sets, proposing an event-driven framework that activates only a subset of clients per event. The result showed improved stability under non-stationary data conditions and significant reductions in communication and computation costs compared to existing online VFL frameworks.
In Vertical Federated Learning (VFL), online learning adapts to real-world scenarios better than offline learning. However, integrating online learning into VFL is challenging because of the unique nature of VFL, where clients hold non-intersecting feature sets for the same sample. In real-world scenarios, clients may not receive the streamed data for their disjoint features of the same entity synchronously. Instead, the data are typically generated by an \emph{event} that is relevant to only a subset of clients. We are the first to identify these challenges in online VFL, which previous research has overlooked. To address them, we proposed an event-driven online VFL framework. In this framework, only a subset of clients was activated during each event, while the remaining clients collaborated passively in the learning process. Furthermore, we incorporated \emph{dynamic local regret (DLR)} into VFL to handle online learning with non-convex models in a non-stationary environment. We conducted a comprehensive regret analysis of the proposed framework, specifically examining the DLR under non-convex conditions with event-driven online VFL. Extensive experiments demonstrated that the proposed framework was more stable than the existing online VFL framework under non-stationary data conditions, while also significantly reducing communication and computation costs.
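The event-driven activation described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the linear local models, the summing server, the squared loss, the rule that passive clients only supply embeddings without updating, and all names (`Client`, `run_event`, `simulate`) are assumptions made for illustration. The DLR component is not modeled here.

```python
import random


class Client:
    """Hypothetical client holding one disjoint feature slice and a linear model."""

    def __init__(self, n_features, lr=0.3):
        self.w = [0.0] * n_features
        self.lr = lr
        self.updates = 0  # counts how often this client was activated

    def embed(self, x):
        # Local embedding: a scalar linear score over this client's features.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, grad_out):
        # One gradient step, given dLoss/dEmbedding returned by the server.
        self.w = [wi - self.lr * grad_out * xi for wi, xi in zip(self.w, x)]
        self.updates += 1


def server_step(embeddings, y):
    """Assumed server model: sum the embeddings, use squared loss."""
    err = sum(embeddings) - y
    return 0.5 * err * err, err  # (loss, dLoss/dEmbedding)


def run_event(clients, features, y, active_ids):
    """Handle one event: every client contributes an embedding,
    but only the activated clients update their local models."""
    embeddings = [c.embed(x) for c, x in zip(clients, features)]
    loss, grad = server_step(embeddings, y)
    for i in active_ids:
        clients[i].update(features[i], grad)
    return loss


def simulate(n_events=200, seed=0):
    """Toy stream: each event activates exactly one randomly chosen client."""
    rng = random.Random(seed)
    true_w = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]  # assumed ground truth
    clients = [Client(len(w)) for w in true_w]
    losses = []
    for _ in range(n_events):
        features = [[rng.uniform(-1, 1) for _ in w] for w in true_w]
        y = sum(wi * xi for w, x in zip(true_w, features) for wi, xi in zip(w, x))
        active_ids = [rng.randrange(len(clients))]
        losses.append(run_event(clients, features, y, active_ids))
    return losses, clients


if __name__ == "__main__":
    losses, clients = simulate()
    print("mean loss, first 20 events:", sum(losses[:20]) / 20)
    print("mean loss, last 20 events:", sum(losses[-20:]) / 20)
    print("updates per client:", [c.updates for c in clients])
```

In this sketch each event triggers one local update rather than one per client, which is the source of the computation saving that the abstract refers to. How passive clients obtain their features and whether they exchange messages per event are design choices the abstract does not specify.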