Decentralized Topic Modelling with Latent Dirichlet Allocation
This work addresses privacy concerns for users of decentralized networks (e.g., sensors or smartphones) who need to analyze text data without exposing sensitive information. The contribution is incremental, since it adapts existing methods.
The paper tackles the problem of learning a global topic model from text data spread across a decentralized network without sharing sensitive local information. It demonstrates on synthetic data that the method recovers parameters and performance similar to those of centralized approaches.
Privacy-preserving networks can be modelled as decentralized networks (e.g., sensors, connected objects, smartphones), where communication between nodes is not controlled by an all-knowing central node. For this type of network, the main issue is to gather or learn global information on the network (e.g., by optimizing a global cost function) while keeping the sensitive information at each node. In this work, we focus on text information that agents do not want to share (e.g., text messages, emails, confidential reports). We use recent advances in decentralized optimization and topic models to infer topics from a graph with limited communication. We propose a method to adapt the latent Dirichlet allocation (LDA) model to decentralized optimization and show on synthetic data that we still recover, at each node, parameters and performance similar to those obtained with stochastic methods that access all the information in the graph.
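To make the idea of learning global information without a central node concrete, here is a minimal sketch of randomized pairwise gossip averaging, a standard building block in decentralized optimization. It is an illustration only: the abstract does not specify the communication protocol, so the function `gossip_average`, the ring graph, and the per-node statistics `local_stats` are all hypothetical. Each node holds a small vector standing in for aggregated topic statistics, and only these aggregates, never raw text, are exchanged with a neighbour.

```python
import random


def gossip_average(values, edges, num_rounds=2000, seed=0):
    """Randomized pairwise gossip (illustrative, not the paper's algorithm).

    At each round one edge (i, j) is drawn uniformly at random and both
    endpoints replace their vectors with the average of the two.
    """
    rng = random.Random(seed)
    state = [list(v) for v in values]  # copy so the inputs are left untouched
    for _ in range(num_rounds):
        i, j = rng.choice(edges)
        avg = [(a + b) / 2.0 for a, b in zip(state[i], state[j])]
        state[i] = list(avg)
        state[j] = list(avg)
    return state


# Hypothetical aggregated statistics held by 4 nodes (assumed values).
local_stats = [
    [4.0, 0.0, 1.0],
    [0.0, 2.0, 3.0],
    [1.0, 1.0, 0.0],
    [3.0, 5.0, 0.0],
]
# Ring graph: each node only talks to its two neighbours.
ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

estimates = gossip_average(local_stats, ring_edges)
# Network-wide average that no single node could compute on its own.
global_mean = [sum(col) / len(local_stats) for col in zip(*local_stats)]
```

After enough rounds, every entry of `estimates` is close to `global_mean`, so each node ends up with the network-wide average even though it only ever communicated with its direct neighbours. Pairwise averaging preserves the sum over nodes, which is why the common limit is the true mean on a connected graph.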