Kejun Huang

h-index: 22

18 papers

2,421 citations

Novelty: 51%

AI Score: 29

Ranked #146,122 of 194,257 authors (top 75%); #2,356 in ML (top 70%)

18 Papers

3.3 | LG | Jul 19, 2022

Adaptive Learning for the Resource-Constrained Classification Problem

Danit Shifman Abukasis, Izack Cohen, Xiaochen Xian et al.

Resource-constrained classification tasks are common in real-world applications such as allocating tests for disease diagnosis, hiring decisions when filling a limited number of positions, and defect detection in manufacturing settings under a limited inspection budget. Typical classification algorithms treat the learning process and the resource constraints as two separate and sequential tasks. Here we design an adaptive learning approach that considers resource constraints and learning jointly by iteratively fine-tuning misclassification costs. Via a structured experimental study using a publicly available data set, we evaluate a decision tree classifier that utilizes the proposed approach. The adaptive learning approach performs significantly better than alternative approaches, especially for difficult classification problems in which the performance of common approaches may be unsatisfactory. We envision the adaptive learning approach as an important addition to the repertoire of techniques for handling resource-constrained classification problems.

3.3 | LG | Jan 31, 2022

JULIA: Joint Multi-linear and Nonlinear Identification for Tensor Completion

Cheng Qian, Kejun Huang, Lucas Glass et al.

Tensor completion aims at imputing missing entries from a partially observed tensor. Existing tensor completion methods often assume either multi-linear or nonlinear relationships between latent components. However, real-world tensors have much more complex patterns where both multi-linear and nonlinear relationships may coexist. In such cases, the existing methods are insufficient to describe the data structure. This paper proposes a Joint mUlti-linear and nonLinear IdentificAtion (JULIA) framework for large-scale tensor completion. JULIA unifies the multi-linear and nonlinear tensor completion models with several advantages over the existing methods: 1) Flexible model selection, i.e., it fits a tensor by assigning its values as a combination of multi-linear and nonlinear components; 2) Compatible with existing nonlinear tensor completion methods; 3) Efficient training based on a well-designed alternating optimization approach. Experiments on six real large-scale tensors demonstrate that JULIA outperforms many existing tensor completion algorithms. Furthermore, JULIA can improve the performance of a class of nonlinear tensor completion methods. The results show that in some large-scale tensor completion scenarios, baseline methods with JULIA are able to obtain up to 55% lower root mean-squared-error and save 67% computational complexity.

11.3 | SP | Jun 15, 2020

Computing Large-Scale Matrix and Tensor Decomposition with Structured Factors: A Unified Nonconvex Optimization Perspective

Xiao Fu, Nico Vervliet, Lieven De Lathauwer et al.

The proposed article aims at offering a comprehensive tutorial for the computational aspects of structured matrix and tensor factorization. Unlike existing tutorials that mainly focus on algorithmic procedures for a small set of problems, e.g., nonnegativity or sparsity-constrained factorization, we take a top-down approach: we start with general optimization theory (e.g., inexact and accelerated block coordinate descent, stochastic optimization, and Gauss-Newton methods) that covers a wide range of factorization problems with diverse constraints and regularization terms of engineering interest. Then, we go "under the hood" to showcase specific algorithm design under these introduced principles. We pay particular attention to recent algorithmic developments in structured tensor and matrix factorization (e.g., random sketching and adaptive step size based stochastic optimization and structure-exploiting second-order algorithms), which are the state of the art, yet much less touched upon in the literature compared to block coordinate descent (BCD)-based methods. We expect the article to have educational value in the field of structured factorization and hope to stimulate more research in this important and exciting direction.

12.8 | LG | Sep 26, 2019

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Shahana Ibrahim, Xiao Fu, Nikos Kargas et al.

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but the sample complexity is a hurdle for applying such approaches, since the tensor methods hinge on the availability of third-order statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-the-art algorithms under a variety of scenarios.

16.3 | OC | Jul 9, 2019

SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

Songtao Lu, Meisam Razaviyayn, Bo Yang et al.

This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objective and linear constraints. While finding (approximate) SOSPs is computationally intractable, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, a certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence relationship between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP), which successively performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$ require $\mathcal{O}(1/ε^{2.5})$ iterations to compute an $(ε, \sqrt{ε})$-SOSP, and their per-iteration computational complexities are polynomial in the number of constraints and problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate have been designed to find SOSPs of the important class of non-convex problems with linear constraints.

12.2 | SP | Jan 16, 2019

Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization

Xiao Fu, Shahana Ibrahim, Hoi-To Wai et al.

This work considers the problem of computing the canonical polyadic decomposition (CPD) of large tensors. Prior works mostly leverage data sparsity to handle this problem, which is not suitable for handling dense tensors that often arise in applications such as medical imaging, computer vision, and remote sensing. Stochastic optimization is known for its low memory cost and per-iteration complexity when handling dense data. However, existing stochastic CPD algorithms are not flexible enough to incorporate a variety of constraints/regularizations that are of interest in signal and data analytics. Convergence properties of many such algorithms are also unclear. In this work, we propose a stochastic optimization framework for large-scale CPD with constraints/regularizations. The framework works in a doubly randomized fashion, and can be regarded as a judicious combination of randomized block coordinate descent (BCD) and stochastic proximal gradient (SPG). The algorithm enjoys lightweight updates and small memory footprint. In addition, this framework entails considerable flexibility: many frequently used regularizers and constraints can be readily handled under the proposed scheme. The approach is also supported by convergence analysis. Numerical results on large-scale dense tensors are employed to showcase the effectiveness of the proposed approach.

5.4 | LG | Jan 6, 2019

Learning Nonlinear Mixtures: Identifiability and Algorithm

Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos et al.

Linear mixture models have proven very useful in a plethora of applications, e.g., topic modeling, clustering, and source separation. As a critical aspect of the linear mixture models, identifiability of the model parameters is well-studied, under frameworks such as independent component analysis and constrained matrix factorization. Nevertheless, when the linear mixtures are distorted by unknown nonlinear functions (which is well-motivated and more realistic in many cases), the identifiability issues are much less studied. This work proposes an identification criterion for a nonlinear mixture model that is well grounded in many real-world applications, and offers identifiability guarantees. A practical implementation based on a judiciously designed neural network is proposed to realize the criterion, and an effective learning algorithm is proposed. Numerical results on synthetic and real data corroborate the effectiveness of the proposed method.

1.7 | CL | Feb 19, 2018

Learning Hidden Markov Models from Pairwise Co-occurrences with Application to Topic Modeling

Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

We present a new algorithm for identifying the transition and emission probabilities of a hidden Markov model (HMM) from the emitted data. Expectation-maximization becomes computationally prohibitive for long observation records, which are often required for identification. The new algorithm is particularly suitable for cases where the available sample size is large enough to accurately estimate second-order output probabilities, but not higher-order ones. We show that if one is only able to obtain a reliable estimate of the pairwise co-occurrence probabilities of the emissions, it is still possible to uniquely identify the HMM if the emission probability is sufficiently scattered. We apply our method to hidden topic Markov modeling, and demonstrate that we can learn topics with higher quality if documents are modeled as observations of HMMs sharing the same emission (topic) probability, compared to the simple but widely used bag-of-words model.

6.6 | SP | Nov 21, 2017

Kullback-Leibler Principal Component for Tensors is not NP-hard

Kejun Huang, Nicholas D. Sidiropoulos

We study the problem of nonnegative rank-one approximation of a nonnegative tensor, and show that the globally optimal solution that minimizes the generalized Kullback-Leibler divergence can be efficiently obtained, i.e., it is not NP-hard. This result works for arbitrary nonnegative tensors with an arbitrary number of modes (including two, i.e., matrices). We derive a closed-form expression for the KL principal component, which is easy to compute and has an intuitive probabilistic interpretation. For generalized KL approximation with higher ranks, the problem is for the first time shown to be equivalent to multinomial latent variable modeling, and an iterative algorithm is derived that resembles the expectation-maximization algorithm. On the Iris dataset, we showcase how the derived results help us learn the model in an unsupervised manner, and obtain strikingly close performance to that from supervised methods.
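
The abstract does not spell out the closed-form expression, so the following is only a minimal sketch under an assumption: that the KL principal component is the outer product of the mode-wise marginal sums, divided by the total sum raised to the power (number of modes minus one). This form follows from setting the gradient of the generalized KL divergence to zero and matches the "product of marginals" reading of the probabilistic interpretation. The function name `kl_rank_one` and the dictionary-based tensor layout are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import prod


def kl_rank_one(tensor):
    """Rank-one approximation from mode-wise marginals (assumed closed form).

    tensor: dict mapping index tuples, e.g. (i, j, k), to nonnegative values.
    Returns (marginals, approx), where approx(index) evaluates the
    rank-one approximation at a given index tuple.
    """
    order = len(next(iter(tensor)))      # number of modes
    total = sum(tensor.values())         # sum of all entries
    marginals = [defaultdict(float) for _ in range(order)]
    for index, value in tensor.items():
        for mode, i in enumerate(index):
            marginals[mode][i] += value  # sum over all other modes

    def approx(index):
        # Product of marginals, rescaled so the approximation keeps the total mass.
        numerator = prod(marginals[m][i] for m, i in enumerate(index))
        return numerator / total ** (order - 1)

    return marginals, approx


# Example: an exactly rank-one 2 x 2 x 2 tensor is recovered exactly.
a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]
x = {(i, j, k): a[i] * b[j] * c[k]
     for i in range(2) for j in range(2) for k in range(2)}
marginals, approx = kl_rank_one(x)
```

For a matrix (two modes), the same code reduces to row sums times column sums divided by the grand total.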

6.6 | ML | Nov 20, 2017

On Convergence of Epanechnikov Mean Shift

Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

Epanechnikov Mean Shift is a simple yet empirically very effective algorithm for clustering. It localizes the centroids of data clusters via estimating modes of the probability distribution that generates the data points, using the "optimal" Epanechnikov kernel density estimator. However, since the procedure involves non-smooth kernel density functions, the convergence behavior of Epanechnikov Mean Shift lacks theoretical support as of this writing; most of the existing analyses are based on smooth functions and thus cannot be applied to Epanechnikov Mean Shift. In this work, we first show that the original Epanechnikov Mean Shift may indeed terminate at a non-critical point, due to its non-smooth nature. Based on our analysis, we propose a simple remedy to fix it. The modified Epanechnikov Mean Shift is guaranteed to terminate at a local maximum of the estimated density, which corresponds to a cluster centroid, within a finite number of iterations. We also propose a way to avoid running the Mean Shift iterates from every data point, while maintaining good clustering accuracies under non-overlapping spherical Gaussian mixture models. This further pushes Epanechnikov Mean Shift to handle very large and high-dimensional data sets. Experiments show surprisingly good performance compared to Lloyd's K-means algorithm and the EM algorithm.
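
As a rough illustration of the baseline procedure discussed above, here is a minimal sketch of the plain (unmodified) Epanechnikov mean shift iteration. With the Epanechnikov kernel, each update moves the current estimate to the average of the data points within one bandwidth of it. The function name, the stopping rule, and the toy data are assumptions for illustration; the remedy proposed in the paper is not reproduced here.

```python
import math


def epanechnikov_mean_shift(points, start, bandwidth, max_iter=100):
    """Plain Epanechnikov mean shift from one starting point.

    points: list of equal-length coordinate tuples.
    Each step replaces the estimate by the mean of all points whose
    Euclidean distance to it is at most `bandwidth`.
    """
    x = list(start)
    for _ in range(max_iter):
        neighbors = [p for p in points if math.dist(p, x) <= bandwidth]
        if not neighbors:
            break  # no data in the window; nothing to average
        new_x = [sum(coords) / len(neighbors) for coords in zip(*neighbors)]
        if new_x == x:
            break  # fixed point reached: the window mean no longer moves
        x = new_x
    return x


# Two well-separated toy clusters; each start converges to its own cluster mean.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
mode_a = epanechnikov_mean_shift(data, data[1], bandwidth=1.0)
mode_b = epanechnikov_mean_shift(data, data[3], bandwidth=1.0)
```

Because the set of points inside the window can only change finitely many times, the iteration stops after finitely many steps, though, as the abstract notes, the stopping point of this plain version need not be a critical point of the estimated density.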

15.8 | LG | Sep 2, 2017

On Identifiability of Nonnegative Matrix Factorization

Xiao Fu, Kejun Huang, Nicholas D. Sidiropoulos

In this letter, we propose a new identification criterion that guarantees the recovery of the low-rank latent factors in the nonnegative matrix factorization (NMF) model, under mild conditions. Specifically, using the proposed criterion, it suffices to identify the latent factors if the rows of one factor are sufficiently scattered over the nonnegative orthant, while no structural assumption is imposed on the other factor except being full-rank. This is by far the mildest condition under which the latent factors are provably identifiable from the NMF model.

12.8 | ML | Nov 15, 2016

Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words, i.e., words that only appear (with positive probability) in one topic. Follow-up work has resorted to third- or higher-order statistics of the data corpus to relax the anchor word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments, and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition compared to the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm only involves one eigen-decomposition and a few small linear programs. This makes it easy to implement and scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free approach exhibits very favorable performance (measured using coherence, similarity count, and clustering accuracy metrics) compared to the prior art.

17.2 | ML | Aug 15, 2016

Robust Volume Minimization-Based Matrix Factorization for Remote Sensing and Document Clustering

Xiao Fu, Kejun Huang, Bo Yang et al.

This paper considers volume minimization (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper focuses on both theoretical and practical aspects of VolMin. On the theory side, exact equivalence of two independently developed sufficient conditions for VolMin identifiability is proven here, thereby providing a more comprehensive understanding of this aspect of VolMin. On the algorithm side, computational complexity and sensitivity to outliers are two key challenges associated with real-world applications of VolMin. These are addressed here via a new VolMin algorithm that handles volume regularization in a computationally simple way, and automatically detects and iteratively downweights outliers, simultaneously. Simulations and real-data experiments using a remotely sensed hyperspectral image and the Reuters document corpus are employed to showcase the effectiveness of the proposed algorithm.

40.5 | ML | Jul 6, 2016

Tensor Decomposition for Signal Processing and Machine Learning

Nicholas D. Sidiropoulos, Lieven De Lathauwer, Xiao Fu et al.

Tensors or multi-way arrays are functions of three or more indices $(i,j,k,\cdots)$, similar to matrices (two-way arrays), which are functions of two indices $(r,c)$ for (row,column). Tensors have a rich history, stretching over almost a century, and touching upon numerous disciplines; but they have only recently become ubiquitous in signal and data analytics at the confluence of signal processing, statistics, data mining and machine learning. This overview article aims to provide a good starting point for researchers and practitioners interested in learning about and working with tensors. As such, it focuses on fundamentals and motivation (using various application examples), aiming to strike an appropriate balance of breadth and depth that will enable someone having taken first graduate courses in matrix algebra and probability to get started doing research and/or developing tensor algorithms and software. Some background in applied optimization is useful but not strictly required. The material covered includes tensor rank and rank decomposition; basic tensor factorization models and their relationships and properties (including fairly good coverage of identifiability); broad coverage of algorithms ranging from alternating optimization to stochastic gradient; statistical performance analysis; and applications ranging from source separation to collaborative filtering, mixture and topic modeling, classification, and multilinear subspace learning.
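
To make the notion of a rank decomposition concrete, the sketch below evaluates one entry of a tensor from its factor matrices: under the standard canonical polyadic model, entry $(i,j,k)$ is the sum over components $r$ of the products $a_{ir} b_{jr} c_{kr}$. The function name `cpd_entry` and the list-of-rows storage are illustrative assumptions and are not taken from the article.

```python
from math import prod


def cpd_entry(factors, index):
    """Entry of a tensor given by a rank decomposition.

    factors: one matrix per mode, stored as a list of rows; every row
             has R entries, one per rank-one component.
    index:   tuple with one index per mode, e.g. (i, j, k).
    """
    rank = len(factors[0][0])
    return sum(
        prod(matrix[i][r] for matrix, i in zip(factors, index))
        for r in range(rank)
    )


# A 2 x 2 x 2 tensor built from R = 2 rank-one components.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4]]
C = [[1, 1], [2, 0]]
value = cpd_entry([A, B, C], (0, 1, 1))  # 1*3*2 + 0*4*0
```

The same function handles any number of modes, since it simply multiplies one row entry per factor matrix for each component.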

9.7 | ML | May 31, 2016

Scalable and Flexible Multiview MAX-VAR Canonical Correlation Analysis

Xiao Fu, Kejun Huang, Mingyi Hong et al.

Generalized canonical correlation analysis (GCCA) aims at finding latent low-dimensional common structure from multiple views (feature vectors in different domains) of the same entities. Unlike principal component analysis (PCA) that handles a single view, (G)CCA is able to integrate information from different feature spaces. Here we focus on MAX-VAR GCCA, a popular formulation which has recently gained renewed interest in multilingual processing and speech modeling. The classic MAX-VAR GCCA problem can be solved optimally via eigen-decomposition of a matrix that compounds the (whitened) correlation matrices of the views; but this solution has serious scalability issues, and is not directly amenable to incorporating pertinent structural constraints such as non-negativity and sparsity on the canonical components. We posit regularized MAX-VAR GCCA as a non-convex optimization problem and propose an alternating optimization (AO)-based algorithm to handle it. Our algorithm alternates between inexact solutions of a regularized least squares subproblem and a manifold-constrained non-convex subproblem, thereby achieving substantial memory and computational savings. An important benefit of our design is that it can easily handle structure-promoting regularization. We show that the algorithm globally converges to a critical point at a sublinear rate, and approaches a globally optimal solution at a linear rate when no regularization is considered. Judiciously designed simulations and large-scale word embedding tasks are employed to showcase the effectiveness of the proposed algorithm.

11.3 | OC | Mar 16, 2016

Phase Retrieval from 1D Fourier Measurements: Convexity, Uniqueness, and Algorithms

Kejun Huang, Yonina C. Eldar, Nicholas D. Sidiropoulos

This paper considers phase retrieval from the magnitude of 1D over-sampled Fourier measurements, a classical problem that has challenged researchers in various fields of science and engineering. We show that an optimal vector in a least-squares sense can be found by solving a convex problem, thus establishing a hidden convexity in Fourier phase retrieval. We also show that the standard semidefinite relaxation approach yields the optimal cost function value (albeit not necessarily an optimal solution) in this case. A method is then derived to retrieve an optimal minimum phase solution in polynomial time. Using these results, a new measuring technique is proposed which guarantees uniqueness of the solution, along with an efficient algorithm that can solve large-scale Fourier phase retrieval problems with uniqueness and optimality guarantees.

11.8 | ML | Jul 16, 2015

Joint Tensor Factorization and Outlying Slab Suppression with Applications

Xiao Fu, Kejun Huang, Wing-Kin Ma et al.

We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding $\ell_p$ ($0<p\leq 1$) minimization-based low-rank tensor factorization problem. The proposed algorithm features a per-iteration complexity similar to that of the plain trilinear alternating least squares (TALS) algorithm. Convergence of the proposed algorithm is also easy to analyze under the framework of alternating optimization and its variants. In addition, regularization and constraints can be easily incorporated to make use of a priori information on the latent loading factors. Simulations and real data experiments on blind speech separation, fluorescence data analysis, and social network mining are used to showcase the effectiveness of the proposed algorithm.

24.3 | ML | Jun 13, 2015

A Flexible and Efficient Algorithmic Framework for Constrained Matrix and Tensor Factorization

Kejun Huang, Nicholas D. Sidiropoulos, Athanasios P. Liavas

We propose a general algorithmic framework for constrained matrix and tensor factorization, which is widely used in signal processing and machine learning. The new framework is a hybrid between alternating optimization (AO) and the alternating direction method of multipliers (ADMM): each matrix factor is updated in turn, using ADMM, hence the name AO-ADMM. This combination can naturally accommodate a great variety of constraints on the factor matrices, and almost all possible loss measures for the fitting. Computation caching and warm start strategies are used to ensure that each update is evaluated efficiently, while the outer AO framework exploits recent developments in block coordinate descent (BCD)-type methods which help ensure that every limit point is a stationary point, as well as faster and more robust convergence in practice. Three special cases are studied in detail: non-negative matrix/tensor factorization, constrained matrix/tensor completion, and dictionary learning. Extensive simulations and experiments with real data are used to showcase the effectiveness and broad applicability of the proposed framework.