LG Feb 24
Sparse Bayesian Deep Functional Learning with Structured Region Selection
Xiaoxian Zhu, Yingmeng Li, Shuangge Ma et al.
In modern applications such as ECG monitoring, neuroimaging, wearable sensing, and industrial equipment diagnostics, complex and continuously structured data are ubiquitous, presenting both challenges and opportunities for functional data analysis. However, existing methods face a critical trade-off: conventional functional models are limited by linearity, whereas deep learning approaches lack interpretable region selection for sparse effects. To bridge these gaps, we propose a sparse Bayesian functional deep neural network (sBayFDNN). It learns adaptive functional embeddings through a deep Bayesian architecture to capture complex nonlinear relationships, while a structured prior enables interpretable, region-wise selection of influential domains with quantified uncertainty. Theoretically, we establish rigorous approximation error bounds, posterior consistency, and region selection consistency. These results provide the first theoretical guarantees for a Bayesian deep functional model, ensuring its reliability and statistical rigor. Empirically, comprehensive simulations and real-world studies confirm the effectiveness and superiority of sBayFDNN. Crucially, sBayFDNN excels in recognizing intricate dependencies for accurate predictions and more precisely identifies functionally meaningful regions, capabilities fundamentally beyond existing approaches.
ML Aug 8, 2025
Federated Online Learning for Heterogeneous Multisource Streaming Data
Jingmao Li, Yuanxing Chen, Shuangge Ma et al.
Federated learning has emerged as an essential paradigm for distributed multi-source data analysis under privacy concerns. Most existing federated learning methods focus on "static" datasets. However, in many real-world applications, data arrive continuously over time, forming streaming datasets. This introduces additional challenges for data storage and algorithm design, particularly in high-dimensional settings. In this paper, we propose a federated online learning (FOL) method for distributed multi-source streaming data analysis. To account for heterogeneity, a personalized model is constructed for each data source, and a novel "subgroup" assumption is employed to capture potential similarities, thereby enhancing model performance. We adopt the penalized renewable estimation method and efficient proximal gradient descent for model training. The proposed method aligns with both federated and online learning frameworks: raw data are not exchanged among sources, ensuring data privacy, and only summary statistics of previous data batches are required for model updates, significantly reducing storage demands. Theoretically, we establish consistency properties for model estimation, variable selection, and subgroup structure recovery, demonstrating optimal statistical efficiency. Simulations illustrate the effectiveness of the proposed method. Furthermore, when applied to financial lending data and web log data, the proposed method also exhibits advantageous prediction performance. The analysis results also provide some practical insights.
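To make the "only summary statistics of previous batches" idea concrete, here is a minimal single-source sketch in Python. It assumes a linear model with a plain lasso penalty, which is a simplification chosen for illustration and not the FOL method itself (the subgroup structure and the federated exchange between sources are omitted). The class name `RenewableLasso` and its parameters are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


class RenewableLasso:
    """Illustrative lasso for streaming batches (hypothetical helper).

    Only the running sums X^T X and X^T y are stored, so each raw batch
    can be discarded after it has been processed.
    """

    def __init__(self, p, lam):
        self.gram = np.zeros((p, p))  # running sum of X^T X
        self.xty = np.zeros(p)        # running sum of X^T y
        self.n = 0                    # number of observations seen so far
        self.lam = lam                # lasso penalty level (assumed fixed)
        self.beta = np.zeros(p)       # current estimate, reused as warm start

    def update(self, X, y, n_iter=500):
        # Fold the new batch into the summary statistics.
        self.gram += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]
        # Step size 1/L, where L is the Lipschitz constant of the gradient
        # of the smooth part (1 / 2n) * ||y - X beta||^2.
        lipschitz = np.linalg.eigvalsh(self.gram)[-1] / self.n
        step = 1.0 / lipschitz
        # Proximal gradient descent on the cumulative penalized objective.
        for _ in range(n_iter):
            grad = (self.gram @ self.beta - self.xty) / self.n
            self.beta = soft_threshold(self.beta - step * grad, step * self.lam)
        return self.beta
```

For squared-error loss the gradient depends on the data only through the two running sums, so in this simplified setting the streaming estimate targets the same objective as a fit on all batches pooled together. For other losses, renewable estimation generally relies on approximations built from stored summaries.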
LG Jun 23, 2025
ContinualFlow: Learning and Unlearning with Neural Flow Matching
Lorenzo Simone, Davide Bacciu, Shuangge Ma
We introduce ContinualFlow, a principled framework for targeted unlearning in generative models via Flow Matching. Our method leverages an energy-based reweighting loss to softly subtract undesired regions of the data distribution without retraining from scratch or requiring direct access to the samples to be unlearned. Instead, it relies on energy-based proxies to guide the unlearning process. We prove that this induces gradients equivalent to Flow Matching toward a soft mass-subtracted target, and validate the framework through experiments on 2D and image domains, supported by interpretable visualizations and quantitative evaluations.
ML Feb 26, 2024
Penalized Generative Variable Selection
Tong Wang, Jian Huang, Shuangge Ma
Deep networks are increasingly applied to a wide variety of data, including data with high-dimensional predictors. In such analyses, variable selection may be needed along with estimation and model building. Many of the existing deep network studies that incorporate variable selection have been limited to methodological and numerical developments. In this study, we consider modeling and estimation using conditional Wasserstein generative adversarial networks. Group Lasso penalization is applied for variable selection, which may improve model estimation and prediction, interpretability, and stability. Significantly advancing beyond the existing literature, the analysis of censored survival data is also considered. We establish the convergence rate for variable selection while accounting for the approximation error, and obtain a more efficient distribution estimation. Simulations and the analysis of real experimental data demonstrate satisfactory practical utility of the proposed analysis.
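As a rough illustration of how a Group Lasso penalty can drive variable selection in a network, the sketch below applies group soft-thresholding to a first-layer weight matrix. Treating all first-layer weights attached to one input variable as a group is a common choice assumed here, not a detail taken from the abstract, and the function names are hypothetical.

```python
import numpy as np


def group_soft_threshold(W, lam):
    """Proximal operator of lam * sum_j ||W[j, :]||_2.

    W has shape (n_inputs, n_hidden); row j holds every first-layer
    weight attached to input variable j (assumed grouping).
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Shrink each row toward zero; rows with norm <= lam become exactly zero.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale


def selected_variables(W):
    """Indices of input variables whose weight group is not entirely zero."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > 0.0)
```

In a proximal gradient scheme, this operator would be applied to the first-layer weights after each gradient step. An input whose whole row is zeroed no longer affects the network output, which is what makes the selection interpretable.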
ME Apr 26, 2019
Structural modeling using overlapped group penalties for discovering predictive biomarkers for subgroup analysis
Chong Ma, Wenxuan Deng, Shuangge Ma et al.
The identification of predictive biomarkers from a large number of covariates for subgroup analysis has attracted considerable attention in medical research. In this article, we propose a generalized penalized regression method with a novel penalty function that enforces the hierarchy structure between the prognostic and predictive effects, such that a nonzero predictive effect forces its ancestor prognostic effects to be nonzero in the model. Our method selects useful predictive biomarkers by yielding a sparse, interpretable, and predictive model for subgroup analysis, and can handle different types of response variables, such as continuous, categorical, and time-to-event data. We show that our method is asymptotically consistent under some regularity conditions. To fit the generalized penalized regression model, we propose a novel optimization algorithm, named smog, that integrates majorization-minimization with the alternating direction method of multipliers. Extensive simulation studies and a real case study demonstrate that our method is very powerful for discovering the true predictive biomarkers and identifying subgroups of patients.