1.3 ML May 25
StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery
Yuan-Hao Wei
This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the latent source matrix together with an observation-space mixer and source-wise structural Transformer branches. The mixer enforces reconstruction consistency, while each Transformer branch imposes a differentiable structural constraint on one latent source trajectory. Specifically, each source is converted into multi-scale patch tokens, randomly masked, processed by a locality-biased Transformer, and evaluated through a masked patch reconstruction energy. This energy acts as an implicit source-wise structural prior. To encourage different latent branches to specialize into different temporal regimes, StrTransformer further introduces an ordered multi-scale controller that learns branch-specific patch-scale weights, ordered scale centers, and locality attention slopes. The resulting objective combines observation reconstruction, source-wise structural regularization, and modular auxiliary penalties for separation and scale specialization. We analyze the decoupling and coupling structure of the objective, the regularized exact-reconstruction fiber, and the reduction of permutation symmetry induced by ordered branch descriptors. A controlled case study shows that the learned branches converge to distinct temporal-scale structures and recover source-aligned latent trajectories under post-hoc evaluation.
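The patching, random masking, and masked-patch reconstruction energy described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the locality-biased Transformer is replaced by a trivial stand-in predictor (`visible_mean_predictor`), and all function names, the non-overlapping patch layout, and the fixed scale weights are assumptions made here for illustration.

```python
import numpy as np

def patchify(signal, patch_len):
    """Split a 1-D trajectory into non-overlapping patches (the tail is dropped)."""
    n_patches = len(signal) // patch_len
    return signal[: n_patches * patch_len].reshape(n_patches, patch_len)

def visible_mean_predictor(patches, mask):
    """Hypothetical stand-in for the Transformer branch:
    fills every masked patch with the mean of the visible patches."""
    out = patches.copy()
    out[mask] = patches[~mask].mean(axis=0)
    return out

def masked_patch_energy(signal, patch_len, mask_ratio, predictor, rng):
    """Mean squared reconstruction error on randomly masked patches."""
    patches = patchify(signal, patch_len)
    n_patches = patches.shape[0]
    # Mask at least one patch and keep at least one patch visible.
    n_masked = min(n_patches - 1, max(1, int(round(mask_ratio * n_patches))))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    predicted = predictor(patches, mask)
    return float(np.mean((predicted[mask] - patches[mask]) ** 2))

def multiscale_energy(signal, patch_lens, scale_weights, mask_ratio, predictor, rng):
    """Weighted sum of masked-patch energies over several patch scales."""
    energies = [
        masked_patch_energy(signal, p, mask_ratio, predictor, rng) for p in patch_lens
    ]
    return float(np.dot(scale_weights, energies))
```

In the paper the scale weights are learned per branch by the ordered multi-scale controller; here they are simply passed in as fixed numbers.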
16.8 ML Apr 19
StrEBM: A Structured Latent Energy-Based Model for Blind Source Separation
Yuan-Hao Wei
This paper proposes StrEBM, a structured latent energy-based model for source-wise structured representation learning. The framework is motivated by a broader goal of promoting identifiable and decoupled latent organization by assigning different latent dimensions their own learnable structural biases, rather than constraining the entire latent representation with a single shared energy. In this sense, blind source separation is adopted here as a concrete and verifiable testbed, through which the evolution of latent dimensions toward distinct underlying components can be directly examined. In the proposed framework, latent trajectories are optimized directly together with an observation-generation map and source-wise structural parameters. Each latent dimension is associated with its own energy-based formulation, allowing different latent components to gradually evolve toward distinct source-like roles during training. In the present study, this source-wise energy design is instantiated using Gaussian-process-inspired energies with learnable length-scales, but the framework itself is not restricted to Gaussian processes and is intended as a more general structured latent EBM formulation. Experiments on synthetic multichannel signals under linear and nonlinear mixing settings show that the proposed model can recover source components effectively, providing an initial empirical validation of the framework. At the same time, the study reveals important optimization characteristics, including slow late-stage convergence and reduced stability under nonlinear observation mappings. These findings not only clarify the practical behavior of the current GP-based instantiation, but also establish a basis for future investigation of richer source-wise energy families and more robust nonlinear optimization strategies.
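To make the idea of a source-wise, Gaussian-process-inspired energy with its own length-scale more concrete, here is a small NumPy sketch. It assumes an RBF kernel with a small diagonal noise term and uses the standard GP negative log marginal likelihood (up to a constant) as the energy; the paper's exact energy may differ, and the names `rbf_kernel` and `gp_energy` are illustrative only.

```python
import numpy as np

def rbf_kernel(t, length_scale, noise=1e-3):
    """RBF kernel matrix on time points t, with diagonal noise for numerical stability."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2) + noise * np.eye(len(t))

def gp_energy(s, t, length_scale):
    """GP-style energy of one latent trajectory s: 0.5 * s^T K^{-1} s + 0.5 * log det K."""
    K = rbf_kernel(t, length_scale)
    _, logdet = np.linalg.slogdet(K)
    return float(0.5 * s @ np.linalg.solve(K, s) + 0.5 * logdet)

t = np.linspace(0.0, 1.0, 50)
smooth = np.sin(2 * np.pi * t)                        # slowly varying source
rough = np.random.default_rng(0).standard_normal(50)  # rapidly varying source

# Each trajectory is scored under a long and a short length-scale.
for name, s in [("smooth", smooth), ("rough", rough)]:
    print(name, gp_energy(s, t, 0.2), gp_energy(s, t, 0.01))
```

The smooth trajectory receives lower energy under the long length-scale and the rough one under the short length-scale, which is the kind of preference that lets different latent dimensions drift toward different source-like roles when each has its own learnable length-scale.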
ML Sep 6, 2024
Half-VAE: An Encoder-Free VAE to Bypass Explicit Inverse Mapping
Yuan-Hao Wei, Yan-Jie Sun, Chen Zhang
Inference and inverse problems are closely related concepts, both fundamentally involving the deduction of unknown causes or parameters from observed data. Bayesian inference, a powerful class of methods, is often employed to solve a variety of problems, including those related to causal inference. Variational inference, a subset of Bayesian inference, is primarily used to efficiently approximate complex posterior distributions. Variational Autoencoders (VAEs), which combine variational inference with deep learning, have become widely applied across various domains. This study explores the potential of VAEs for solving inverse problems, such as Independent Component Analysis (ICA), without relying on an explicit inverse mapping process. Unlike other VAE-based ICA methods, this approach discards the encoder in the VAE architecture, directly setting the latent variables as trainable parameters. In other words, the latent variables are no longer outputs of the encoder but are instead optimized directly through the objective function to converge to appropriate values. We find that, with a suitable prior setup, the latent variables, represented by trainable parameters, can exhibit mutually independent properties as the parameters converge, all without the need for an encoding process. This approach, referred to as the Half-VAE, bypasses the inverse mapping process by eliminating the encoder. This study demonstrates the feasibility of using the Half-VAE to solve ICA without the need for an explicit inverse mapping process.
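The encoder-free idea, where latent variables are trainable parameters optimized directly through the objective, can be illustrated with a deliberately simplified sketch. The assumptions here are substantial: a linear decoder, a MAP-style objective instead of a full variational bound, and a log-cosh (super-Gaussian) penalty standing in for the "suitable prior setup". None of this is taken from the paper, and `fit_encoder_free` is a hypothetical name.

```python
import numpy as np

def fit_encoder_free(X, n_latent, steps=1000, lr=0.05, prior_weight=0.01, seed=0):
    """Jointly optimize latents Z and a linear decoder W so that W @ Z reconstructs X.

    There is no encoder: Z itself is a free parameter updated by gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_obs, T = X.shape
    Z = 0.1 * rng.standard_normal((n_latent, T))      # latent variables as parameters
    W = 0.1 * rng.standard_normal((n_obs, n_latent))  # linear decoder (assumption)
    losses = []
    for _ in range(steps):
        R = W @ Z - X                                  # reconstruction residual
        loss = (0.5 * np.sum(R ** 2)
                + prior_weight * np.sum(np.log(np.cosh(Z)))) / T
        losses.append(float(loss))
        grad_W = R @ Z.T / T                           # gradient of the loss w.r.t. W
        grad_Z = W.T @ R + prior_weight * np.tanh(Z)   # loss gradient w.r.t. Z, scaled by T
        W -= lr * grad_W
        Z -= lr * grad_Z
    return Z, W, losses

# Toy linear mixture of two Laplace-distributed sources.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 500))
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S
Z, W, losses = fit_encoder_free(X, n_latent=2)
```

The sketch only shows that reconstruction can be driven down without any inverse mapping; whether the recovered `Z` aligns with the true sources depends on the prior, which is the question the paper studies.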
28.1 ML Mar 20
PDGMM-VAE: A Variational Autoencoder with Adaptive Per-Dimension Gaussian Mixture Model Priors for Nonlinear ICA
Yuan-Hao Wei, Yan-Jie Sun
Independent component analysis is a core framework within blind source separation for recovering latent source signals from observed mixtures under statistical independence assumptions. In this work, we propose PDGMM-VAE, a source-oriented variational autoencoder in which each latent dimension, interpreted explicitly as an individual source signal, is assigned its own Gaussian mixture model prior. Unlike conventional VAE formulations with a shared simple prior, the proposed framework imposes per-dimension heterogeneous prior constraints, enabling the model to capture diverse non-Gaussian source statistics and thereby promote source separation under a probabilistic encoder-decoder architecture. Importantly, the parameters of these per-dimension GMM priors are not fixed in advance, but are learned adaptively together with the encoder and decoder parameters under the overall training objective. Within this formulation, the encoder serves as a demixing mapping from observations to latent sources, while the decoder reconstructs the observed mixtures from the inferred components. The proposed model provides a systematic study of an idea that we had previously noted only in preliminary form, namely equipping different latent sources with different GMM priors for ICA, and formulates it as a full VAE framework with end-to-end training and per-dimension prior learning. Experimental results on both linear and nonlinear mixing problems demonstrate that PDGMM-VAE can recover latent source signals and achieve satisfactory separation performance.
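As a rough illustration of a per-dimension Gaussian mixture prior, the sketch below evaluates the log-density of each latent dimension under its own mixture. The array layout (`D` latent dimensions, `K` components, `T` time samples) and the name `gmm_log_prior` are assumptions for illustration. In the paper the mixture parameters are learned jointly with the encoder and decoder, which this snippet does not attempt.

```python
import numpy as np

def gmm_log_prior(z, weights, means, stds):
    """Log-density of each latent dimension under its own 1-D Gaussian mixture.

    z:       (D, T) latent trajectories, one row per source
    weights: (D, K) mixture weights, each row sums to 1
    means:   (D, K) component means
    stds:    (D, K) component standard deviations
    Returns a length-D vector of log-densities summed over time.
    """
    z = z[:, :, None]                                    # (D, T, 1)
    log_comp = (np.log(weights)[:, None, :]
                - 0.5 * np.log(2 * np.pi)
                - np.log(stds)[:, None, :]
                - 0.5 * ((z - means[:, None, :]) / stds[:, None, :]) ** 2)  # (D, T, K)
    # Numerically stable log-sum-exp over the K components.
    m = log_comp.max(axis=2, keepdims=True)
    log_pz = m[..., 0] + np.log(np.exp(log_comp - m).sum(axis=2))           # (D, T)
    return log_pz.sum(axis=1)
```

Because every row of `weights`, `means`, and `stds` is separate, each latent dimension can be pulled toward a different non-Gaussian shape, which is the heterogeneity the abstract relies on for separation.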
31.6 ML Mar 15
AR-Flow VAE: A Structured Autoregressive Flow Prior Variational Autoencoder for Unsupervised Blind Source Separation
Yuan-Hao Wei, Fu-Hao Deng, Lin-Yong Cui et al.
Blind source separation (BSS) seeks to recover latent source signals from observed mixtures. Variational autoencoders (VAEs) offer a natural perspective for this problem: the latent variables can be interpreted as source components, the encoder can be viewed as a demixing mapping from observations to sources, and the decoder can be regarded as a remixing process from inferred sources back to observations. In this work, we propose AR-Flow VAE, a novel VAE-based framework for BSS in which each latent source is endowed with a parameter-adaptive autoregressive flow prior. This prior significantly enhances the flexibility of latent source modeling, enabling the framework to capture complex non-Gaussian behaviors and structured dependencies, such as temporal correlations, that are difficult to represent with conventional priors. In addition, the structured prior design assigns distinct priors to different latent dimensions, thereby encouraging the latent components to separate into different source signals under heterogeneous prior constraints. Experimental results validate the effectiveness of the proposed architecture for blind source separation. More importantly, this work provides a foundation for future investigations into the identifiability and interpretability of AR-Flow VAE.
77.9 ML Apr 4
StrADiff: A Structured Source-Wise Adaptive Diffusion Framework for Linear and Nonlinear Blind Source Separation
Yuan-Hao Wei
This paper presents a Structured Source-Wise Adaptive Diffusion Framework for linear and nonlinear blind source separation. The framework interprets each latent dimension as a source component and assigns to it an individual adaptive diffusion mechanism, thereby establishing source-wise latent modeling rather than relying on a single shared latent prior. The resulting formulation learns source recovery and the mixing/reconstruction process jointly within a unified end-to-end objective, allowing model parameters and latent sources to adapt simultaneously during training. This yields a common framework for both linear and nonlinear blind source separation. In the present instantiation, each source is further equipped with its own adaptive Gaussian process (GP) prior to impose source-wise temporal structure on the latent trajectories, while the overall framework is not restricted to Gaussian process priors and can in principle accommodate other structured source priors. The proposed model thus provides a general structured diffusion-based route to unsupervised source recovery, with potential relevance beyond blind source separation to interpretable latent modeling, source-wise disentanglement, and potentially identifiable nonlinear latent-variable learning under appropriate structural conditions.
11.5 ML Mar 26
SAHMM-VAE: A Source-Wise Adaptive Hidden Markov Prior Variational Autoencoder for Unsupervised Blind Source Separation
Yuan-Hao Wei
We propose SAHMM-VAE, a source-wise adaptive Hidden Markov prior variational autoencoder for unsupervised blind source separation. Instead of treating the latent prior as a single generic regularizer, the proposed framework assigns each latent dimension its own adaptive regime-switching prior, so that different latent dimensions are pulled toward different source-specific temporal organizations during training. Under this formulation, source separation is not implemented as an external post-processing step; it is embedded directly into variational learning itself. The encoder, decoder, posterior parameters, and source-wise prior parameters are optimized jointly, where the encoder progressively learns an inference map that behaves like an approximate inverse of the mixing transformation, while the decoder plays the role of the generative mixing model. Through this coupled optimization, the gradual alignment between posterior source trajectories and heterogeneous HMM priors becomes the mechanism through which different latent dimensions separate into different source components. To instantiate this idea, we develop three branches within one common framework: a Gaussian-emission HMM prior, a Markov-switching autoregressive HMM prior, and an HMM state-flow prior with state-wise autoregressive flow transformations. Experiments show that the proposed framework achieves unsupervised source recovery while also learning meaningful source-wise switching structures. More broadly, the method extends our structured-prior VAE line from smooth, mixture-based, and flow-based latent priors to adaptive switching priors, and provides a useful basis for future work on interpretable and potentially identifiable latent source modeling.
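For the Gaussian-emission HMM branch, the prior log-likelihood of a single latent trajectory can be computed with the standard forward algorithm in log space. The sketch below is generic textbook code, not the paper's implementation; the function names and the parameterization (initial probabilities, row-stochastic transition matrix, per-state mean and standard deviation) are assumptions.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def hmm_log_likelihood(z, init_probs, trans, means, stds):
    """Log-likelihood of a 1-D trajectory under a Gaussian-emission HMM.

    z:          (T,) latent trajectory of one source
    init_probs: (K,) initial state probabilities
    trans:      (K, K) transition matrix, trans[i, j] = P(next = j | current = i)
    means/stds: (K,) Gaussian emission parameters per state
    """
    log_emit = (-0.5 * np.log(2 * np.pi) - np.log(stds)
                - 0.5 * ((z[:, None] - means) / stds) ** 2)   # (T, K)
    log_trans = np.log(trans)
    log_alpha = np.log(init_probs) + log_emit[0]
    for step in range(1, len(z)):
        # Marginalize over the previous state, then add the current emission term.
        log_alpha = logsumexp(log_alpha[:, None] + log_trans, axis=0) + log_emit[step]
    return float(logsumexp(log_alpha, axis=0))
```

Giving each latent dimension its own `trans`, `means`, and `stds` is one way to realize a source-wise switching prior. In a trainable model the same recursion would be written in an autodiff framework so that these parameters receive gradients.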
ML Jun 8, 2025
Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis
Yuan-Hao Wei, Yan-Jie Sun
This study advances the Variational Autoencoder (VAE) framework by addressing challenges in Independent Component Analysis (ICA) under both determined and underdetermined conditions, focusing on enhancing the independence and interpretability of latent variables. Traditional VAEs map observed data to latent variables and back via an encoder-decoder architecture, but struggle with underdetermined ICA, where the number of latent variables exceeds the number of observed signals. The proposed Half Adversarial VAE (Half-AVAE) builds on the encoder-free Half-VAE framework, eliminating explicit inverse mapping to tackle underdetermined scenarios. By integrating adversarial networks and External Enhancement (EE) terms, Half-AVAE promotes mutual independence among latent dimensions, achieving factorized and interpretable representations. Experiments with synthetic signals demonstrate that Half-AVAE outperforms baseline models, including GP-AVAE and Half-VAE, in recovering independent components under underdetermined conditions, as evidenced by lower root mean square errors. The study highlights the flexibility of VAEs in variational inference, showing that encoder omission, combined with adversarial training and structured priors, enables effective solutions for complex ICA tasks, advancing applications in disentanglement, causal inference, and generative modeling.
LG Feb 28, 2025
VAEs and GANs: Implicitly Approximating Complex Distributions with Simple Base Distributions and Deep Neural Networks -- Principles, Necessity, and Limitations
Yuan-Hao Wei
This tutorial focuses on the fundamental architectures of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), setting aside their numerous variants, to highlight their core principles. Both VAEs and GANs start from simple base distributions, such as Gaussians, and leverage the nonlinear transformation capabilities of neural networks to approximate arbitrarily complex distributions. The theoretical basis is that a weighted combination of sufficiently many Gaussians can approximate almost any probability distribution, while neural networks enable further refinement through nonlinear transformations. Both methods approximate complex data distributions implicitly. This implicit approximation is crucial because explicitly modeling high-dimensional distributions is often intractable. However, the choice of a simple latent prior, while computationally convenient, introduces limitations. In VAEs, the fixed Gaussian prior forces the posterior distribution to align with it, potentially leading to loss of information and reduced expressiveness. This restriction affects both the interpretability of the model and the quality of generated samples.
ML Aug 13, 2025
Structured Kernel Regression VAE: A Computationally Efficient Surrogate for GP-VAEs in ICA
Yuan-Hao Wei, Fu-Hao Deng, Lin-Yong Cui et al.
The interpretability of generative models is considered a key factor in demonstrating their effectiveness and controllability. The generated data are believed to be determined by latent variables that are not directly observable. Therefore, disentangling, decoupling, decomposing, causal inference, or performing Independent Component Analysis (ICA) in the latent variable space helps uncover the independent factors that influence the attributes or features of the generated outputs, thereby enhancing the interpretability of generative models. Variational Autoencoders (VAEs) are generative models that incorporate variational Bayesian inference. Using VAEs, the inverse process of ICA can be equivalently framed as a variational inference process. In some studies, Gaussian processes (GPs) have been introduced as priors for each dimension of latent variables in VAEs, structuring and separating each dimension from temporal or spatial perspectives, and encouraging different dimensions to control various attributes of the generated data. However, GPs impose a significant computational burden, resulting in substantial resource consumption when handling large datasets. Essentially, GPs model different temporal or spatial structures through various kernel functions. Structuring the priors of latent variables via kernel functions, so that different kernel functions model the correlations among sequence points within different latent dimensions, is at the core of achieving disentanglement in VAEs. The proposed Structured Kernel Regression VAE (SKR-VAE) leverages this core idea in a more efficient way, avoiding the costly kernel matrix inversion required in GPs. This research demonstrates that, while maintaining ICA performance, SKR-VAE significantly reduces the computational burden compared to GP-VAE.
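One way to impose kernel-based temporal structure without inverting a kernel matrix is a Nadaraya-Watson kernel regression smoother, sketched below. This is an assumed stand-in chosen for illustration: the abstract does not spell out SKR-VAE's exact estimator or loss, and the names `kernel_smooth` and `kernel_regression_penalty` are hypothetical.

```python
import numpy as np

def kernel_smooth(z, t, bandwidth):
    """Nadaraya-Watson estimate of z at each time point, using a Gaussian kernel.

    Needs only a matrix-vector product and a row normalization, no matrix inverse.
    """
    diff = t[:, None] - t[None, :]
    K = np.exp(-0.5 * (diff / bandwidth) ** 2)
    return (K @ z) / K.sum(axis=1)

def kernel_regression_penalty(z, t, bandwidth):
    """Mean squared deviation of a latent trajectory from its kernel-smoothed version."""
    return float(np.mean((z - kernel_smooth(z, t, bandwidth)) ** 2))

t = np.linspace(0.0, 1.0, 100)
smooth = np.sin(2 * np.pi * t)
noisy = np.random.default_rng(0).standard_normal(100)

# A trajectory matching the kernel's time scale is penalized far less than white noise.
print(kernel_regression_penalty(smooth, t, 0.05))
print(kernel_regression_penalty(noisy, t, 0.05))
```

Assigning a different `bandwidth` to each latent dimension plays a role loosely analogous to giving each dimension its own GP kernel. The smoother itself costs O(T^2) per trajectory, compared with the O(T^3) of a dense kernel matrix inversion.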