Gene Cheung

CV
h-index31
54papers
1,118citations
Novelty50%
AI Score56

54 Papers

SPApr 1, 2023
Volumetric Attribute Compression for 3D Point Clouds using Feedforward Network with Geometric Attention

Tam Thuc Do, Philip A. Chou, Gene Cheung

We study 3D point cloud attribute compression using a volumetric approach: given a target volumetric attribute function $f : \mathbb{R}^3 \rightarrow \mathbb{R}$, we quantize and encode parameter vector $θ$ that characterizes $f$ at the encoder, for reconstruction $f_{\hatθ}(\mathbf{x})$ at known 3D points $\mathbf{x}$'s at the decoder. Extending a previous work Region Adaptive Hierarchical Transform (RAHT) that employs piecewise constant functions to span a nested sequence of function spaces, we propose a feedforward linear network that implements higher-order B-spline bases spanning function spaces without eigen-decomposition. Feedforward network architecture means that the system is amenable to end-to-end neural learning. The key to our network is space-varying convolution, similar to a graph operator, whose weights are computed from the known 3D geometry for normalization. We show that the number of layers in the normalization at the encoder is equivalent to the number of terms in a matrix inverse Taylor series. Experimental results on real-world 3D point clouds show up to 2-3 dB gain over RAHT in energy compaction and 20-30% bitrate reduction.

SPAug 18, 2022
Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment

Chinthaka Dinesh, Gene Cheung, Saghar Bagheri et al.

A basic premise in graph signal processing (GSP) is that a graph encoding pairwise (anti-)correlations of the targeted signal as edge weights is exploited for graph filtering. However, existing fast graph sampling schemes are designed and tested only for positive graphs describing positive correlations. In this paper, we show that for datasets with strong inherent anti-correlations, a suitable graph contains both positive and negative edge weights. In response, we propose a linear-time signed graph sampling method centered on the concept of balanced signed graphs. Specifically, given an empirical covariance data matrix $\bar{\bf{C}}$, we first learn a sparse inverse matrix (graph Laplacian) $\mathcal{L}$ corresponding to a signed graph $\mathcal{G}$. We define the eigenvectors of Laplacian $\mathcal{L}_B$ for a balanced signed graph $\mathcal{G}_B$ -- approximating $\mathcal{G}$ via edge weight augmentation -- as graph frequency components. Next, we choose samples to minimize the low-pass filter reconstruction error in two steps. We first align all Gershgorin disc left-ends of Laplacian $\mathcal{L}_B$ at smallest eigenvalue $λ_{\min}(\mathcal{L}_B)$ via similarity transform $\mathcal{L}_p = §\mathcal{L}_B §^{-1}$, leveraging a recent linear algebra theorem called Gershgorin disc perfect alignment (GDPA). We then perform sampling on $\mathcal{L}_p$ using a previous fast Gershgorin disc alignment sampling (GDAS) scheme. Experimental results show that our signed graph sampling method outperformed existing fast sampling schemes noticeably on various datasets.

IVJul 5, 2023
Retinex-based Image Denoising / Contrast Enhancement using Gradient Graph Laplacian Regularizer

Yeganeh Gharedaghi, Gene Cheung, Xianming Liu

Images captured in poorly lit conditions are often corrupted by acquisition noise. Leveraging recent advances in graph-based regularization, we propose a fast Retinex-based restoration scheme that denoises and contrast-enhances an image. Specifically, by Retinex theory we first assume that each image pixel is a multiplication of its reflectance and illumination components. We next assume that the reflectance and illumination components are piecewise constant (PWC) and continuous piecewise planar (PWP) signals, which can be recovered via graph Laplacian regularizer (GLR) and gradient graph Laplacian regularizer (GGLR) respectively. We formulate quadratic objectives regularized by GLR and GGLR, which are minimized alternately until convergence by solving linear systems -- with improved condition numbers via proposed preconditioners -- via conjugate gradient (CG) efficiently. Experimental results show that our algorithm achieves competitive visual image quality while reducing computation complexity noticeably.

IVMar 2, 2022
Hybrid Model-based / Data-driven Graph Transform for Image Coding

Saghar Bagheri, Tam Thuc Do, Gene Cheung et al.

Transform coding to sparsify signal representations remains crucial in an image compression pipeline. While the Karhunen-Loève transform (KLT) computed from an empirical covariance matrix $\bar{C}$ is theoretically optimal for a stationary process, in practice, collecting sufficient statistics from a non-stationary image to reliably estimate $\bar{C}$ can be difficult. In this paper, to encode an intra-prediction residual block, we pursue a hybrid model-based / data-driven approach: the first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST), for stability, while the remaining $N-K$ are computed from $\bar{C}$ for performance. The transform computation is posed as a graph learning problem, where we seek a graph Laplacian matrix minimizing a graphical lasso objective inside a convex cone sharing the first $K$ eigenvectors in a Hilbert space of real symmetric matrices. We efficiently solve the problem via augmented Lagrangian relaxation and proximal gradient (PG). Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieved better energy compaction than default discrete cosine transform (DCT) and better stability than KLT.

LGAug 4, 2022
Unsupervised Graph Spectral Feature Denoising for Crop Yield Prediction

Saghar Bagheri, Chinthaka Dinesh, Gene Cheung et al.

Prediction of annual crop yields at a county granularity is important for national food production and price stability. In this paper, towards the goal of better crop yield prediction, leveraging recent graph signal processing (GSP) tools to exploit spatial correlation among neighboring counties, we denoise relevant features via graph spectral filtering that are inputs to a deep learning prediction model. Specifically, we first construct a combinatorial graph with edge weights that encode county-to-county similarities in soil and location features via metric learning. We then denoise features via a maximum a posteriori (MAP) formulation with a graph Laplacian regularizer (GLR). We focus on the challenge to estimate the crucial weight parameter $μ$, trading off the fidelity term and GLR, that is a function of noise variance in an unsupervised manner. We first estimate noise variance directly from noise-corrupted graph signals using a graph clique detection (GCD) procedure that discovers locally constant regions. We then compute an optimal $μ$ minimizing an approximate mean square error function via bias-variance analysis. Experimental results from collected USDA data show that using denoised features as input, performance of a crop yield prediction model can be improved noticeably.

LGJun 2, 2023
Graph Sparsification for GCN Towards Optimal Crop Yield Predictions

Saghar Bagheri, Gene Cheung, Tim Eadie

In agronomics, predicting crop yield at a per field/county granularity is important for farmers to minimize uncertainty and plan seeding for the next crop cycle. While state-of-the-art prediction techniques employ graph convolutional nets (GCN) to predict future crop yields given relevant features and crop yields of previous years, a dense underlying graph kernel requires long training and execution time. In this paper, we propose a graph sparsification method based on the Fiedler number to remove edges from a complete graph kernel, in order to lower the complexity of GCN training/execution. Specifically, we first show that greedily removing an edge at a time that induces the minimal change in the second eigenvalue leads to a sparse graph with good GCN performance. We then propose a fast method to choose an edge for removal per iteration based on an eigenvalue perturbation theorem. Experiments show that our Fiedler-based method produces a sparse graph with good GCN performance compared to other graph sparsification schemes in crop yield prediction.

LGSep 12, 2024
Efficient Learning of Balanced Signed Graphs via Iterative Linear Programming

Haruki Yokota, Hiroshi Higashi, Yuichi Tanaka et al.

Signed graphs are equipped with both positive and negative edge weights, encoding pairwise correlations as well as anti-correlations in data. A balanced signed graph has no cycles of odd number of negative edges. Laplacian of a balanced signed graph has eigenvectors that map simply to ones in a similarity-transformed positive graph Laplacian, thus enabling reuse of well-studied spectral filters designed for positive graphs. We propose a fast method to learn a balanced signed graph Laplacian directly from data. Specifically, for each node $i$, to determine its polarity $β_i \in \{-1,1\}$ and edge weights $\{w_{i,j}\}_{j=1}^N$, we extend a sparse inverse covariance formulation based on linear programming (LP) called CLIME, by adding linear constraints to enforce ``consistent" signs of edge weights $\{w_{i,j}\}_{j=1}^N$ with the polarities of connected nodes -- i.e., positive/negative edges connect nodes of same/opposing polarities. For each LP, we adapt projections on convex set (POCS) to determine a suitable CLIME parameter $ρ> 0$ that guarantees LP feasibility. We solve the resulting LP via an off-the-shelf LP solver in $\mathcal{O}(N^{2.055})$. Experiments on synthetic and real-world datasets show that our balanced graph learning method outperforms competing methods and enables the use of spectral filters and graph convolutional networks (GCNs) designed for positive graphs on signed graphs.

LGMay 19, 2025Code
Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Ji Qi, Tam Thuc Do, Mingxiao Liu et al.

Unlike conventional "black-box" transformers with classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$ and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve competitive traffic forecast performance as state-of-the-art prediction schemes, while reducing parameter counts drastically. Our code is available in https://github.com/SingularityUndefined/Unrolling-GSP-STForecast .

IVNov 22, 2023
Learned Nonlinear Predictor for Critically Sampled 3D Point Cloud Attribute Compression

Tam Thuc Do, Philip A. Chou, Gene Cheung

We study 3D point cloud attribute compression via a volumetric approach: assuming point cloud geometry is known at both encoder and decoder, parameters $θ$ of a continuous attribute function $f: \mathbb{R}^3 \mapsto \mathbb{R}$ are quantized to $\hatθ$ and encoded, so that discrete samples $f_{\hatθ}(\mathbf{x}_i)$ can be recovered at known 3D points $\mathbf{x}_i \in \mathbb{R}^3$ at the decoder. Specifically, we consider a nested sequences of function subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_L$, where $\mathcal{F}_l^{(p)}$ is a family of functions spanned by B-spline basis functions of order $p$, $f_l^*$ is the projection of $f$ on $\mathcal{F}_l^{(p)}$ represented as low-pass coefficients $F_l^*$, and $g_l^*$ is the residual function in an orthogonal subspace $\mathcal{G}_l^{(p)}$ (where $\mathcal{G}_l^{(p)} \oplus \mathcal{F}_l^{(p)} = \mathcal{F}_{l+1}^{(p)}$) represented as high-pass coefficients $G_l^*$. In this paper, to improve coding performance over \cite{do2023volumetric}, we study predicting $f_{l+1}^*$ at level $l+1$ given $f_l^*$ at level $l$ and encoding of $G_l^*$ for the $p=1$ case (RAHT($1$)). For the prediction, we formalize RAHT(1) linear prediction in MPEG-PCC in a theoretical framework, and propose a new nonlinear predictor using a polynomial of bilateral filter. We derive equations to efficiently compute the critically sampled high-pass coefficients $G_l^*$ amenable to encoding. We optimize parameters in our resulting feed-forward network on a large training set of point clouds by minimizing a rate-distortion Lagrangian. Experimental results show that our improved framework outperforms the MPEG G-PCC predictor by $11\%$--$12\%$ in bit rate.

CVAug 3, 2024
Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment

Sadid Sahami, Gene Cheung, Chia-Wen Lin

User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\mathcal{G}^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" $\mathcal{G}^o$ to a $1$-hop path graph $\mathcal{G}$, specified by a generalized graph Laplacian matrix $\mathcal{L}$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $λ_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \textit{diag}\left(\mathbf{h}\right) + μ\mathcal{L}$, where $\mathbf{h}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $λ^-_{\min}(\mathbf{B})$ by choosing $\mathbf{h}$ via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity.

IVSep 10, 2024
Constructing an Interpretable Deep Denoiser by Unrolling Graph Laplacian Regularizer

Seyed Alireza Hosseini, Tam Thuc Do, Gene Cheung et al.

An image denoiser can be used for a wide range of restoration problems via the Plug-and-Play (PnP) architecture. In this paper, we propose a general framework to build an interpretable graph-based deep denoiser (GDD) by unrolling a solution to a maximum a posteriori (MAP) problem equipped with a graph Laplacian regularizer (GLR) as signal prior. Leveraging a recent theorem showing that any (pseudo-)linear denoiser $\boldsymbol Ψ$, under mild conditions, can be mapped to a solution of a MAP denoising problem regularized using GLR, we first initialize a graph Laplacian matrix $\mathbf L$ via truncated Taylor Series Expansion (TSE) of $\boldsymbol Ψ^{-1}$. Then, we compute the MAP linear system solution by unrolling iterations of the conjugate gradient (CG) algorithm into a sequence of neural layers as a feed-forward network -- one that is amenable to parameter tuning. The resulting GDD network is "graph-interpretable", low in parameter count, and easy to initialize thanks to $\mathbf L$ derived from a known well-performing denoiser $\boldsymbol Ψ$. Experimental results show that GDD achieves competitive image denoising performance compared to competitors, but employing far fewer parameters, and is more robust to covariate shift.

SPApr 28
Sparse Graph Learning from Sparse Data via Fiedler Number Maximization

Bahar Oveisgharan, Gene Cheung, Andrew Eckford

We aim to learn a sparse and connected graph from sparse data, where the number of observations K can be substantially smaller than the signal dimension N for signals x in R^N, and the underlying distribution is unknown. In this severely ill-posed setting, we incorporate Fiedler number (the second eigenvalue of the graph Laplacian matrix that quantifies connectedness) as a robust regularization term in the sparse graph learning objective. We first develop a greedy algorithm that iteratively selects one edge globally for weakening/removal to reduce the objective, leveraging eigenvalue perturbation theorems that bound the adverse effect of an edge change to the Fiedler number. Next, we design a parallel variant, based on the Cheeger's inequality, that recursively partitions an input graph into two sub-graphs using an approximate Cheeger cut to distributedly find an optimal edge. Simulation experiments show that Fiedler number maximization robustifies sparse graph estimates, outperforming previous sparse graph learning algorithms.

LGOct 3, 2025
Lightweight Transformer for EEG Classification via Balanced Signed Graph Algorithm Unrolling

Junyi Yao, Parham Eftekhar, Gene Cheung et al.

Samples of brain signals collected by EEG sensors have inherent anti-correlations that are well modeled by negative edges in a finite graph. To differentiate epilepsy patients from healthy subjects using collected EEG signals, we build lightweight and interpretable transformer-like neural nets by unrolling a spectral denoising algorithm for signals on a balanced signed graph -- graph with no cycles of odd number of negative edges. A balanced signed graph has well-defined frequencies that map to a corresponding positive graph via similarity transform of the graph Laplacian matrices. We implement an ideal low-pass filter efficiently on the mapped positive graph via Lanczos approximation, where the optimal cutoff frequency is learned from data. Given that two balanced signed graph denoisers learn posterior probabilities of two different signal classes during training, we evaluate their reconstruction errors for binary classification of EEG signals. Experiments show that our method achieves classification performance comparable to representative deep learning schemes, while employing dramatically fewer parameters.

CVSep 15, 2025
Graph Algorithm Unrolling with Douglas-Rachford Iterations for Image Interpolation with Guaranteed Initialization

Xue Zhang, Bingshuo Hu, Gene Cheung

Conventional deep neural nets (DNNs) initialize network parameters at random and then optimize each one via stochastic gradient descent (SGD), resulting in substantial risk of poor-performing local minima.Focusing on the image interpolation problem and leveraging a recent theorem that maps a (pseudo-)linear interpolator Θ to a directed graph filter that is a solution to a MAP problem regularized with a graph shift variation (GSV) prior, we first initialize a directed graph adjacency matrix A based on a known interpolator Θ, establishing a baseline performance.Then, towards further gain, we learn perturbation matrices P and P(2) from data to augment A, whose restoration effects are implemented via Douglas-Rachford (DR) iterations, which we unroll into a lightweight interpretable neural net.Experimental results demonstrate state-of-the-art image interpolation results, while drastically reducing network parameters.

IVSep 10, 2025
Deep Unrolling of Sparsity-Induced RDO for 3D Point Cloud Attribute Coding

Tam Thuc Do, Philip A. Chou, Gene Cheung

Given encoded 3D point cloud geometry available at the decoder, we study the problem of lossy attribute compression in a multi-resolution B-spline projection framework. A target continuous 3D attribute function is first projected onto a sequence of nested subspaces $\mathcal{F}^{(p)}_{l_0} \subseteq \cdots \subseteq \mathcal{F}^{(p)}_{L}$, where $\mathcal{F}^{(p)}_{l}$ is a family of functions spanned by a B-spline basis function of order $p$ at a chosen scale and its integer shifts. The projected low-pass coefficients $F_l^*$ are computed by variable-complexity unrolling of a rate-distortion (RD) optimization algorithm into a feed-forward network, where the rate term is the sparsity-promoting $\ell_1$-norm. Thus, the projection operation is end-to-end differentiable. For a chosen coarse-to-fine predictor, the coefficients are then adjusted to account for the prediction from a lower-resolution to a higher-resolution, which is also optimized in a data-driven manner.

IVJun 3, 2025
Unrolling Nonconvex Graph Total Variation for Image Denoising

Songlin Wei, Gene Cheung, Fei Chen et al.

Conventional model-based image denoising optimizations employ convex regularization terms, such as total variation (TV) that convexifies the $\ell_0$-norm to promote sparse signal representation. Instead, we propose a new non-convex total variation term in a graph setting (NC-GTV), such that when combined with an $\ell_2$-norm fidelity term for denoising, leads to a convex objective with no extraneous local minima. We define NC-GTV using a new graph variant of the Huber function, interpretable as a Moreau envelope. The crux is the selection of a parameter $a$ characterizing the graph Huber function that ensures overall objective convexity; we efficiently compute $a$ via an adaptation of Gershgorin Circle Theorem (GCT). To minimize the convex objective, we design a linear-time algorithm based on Alternating Direction Method of Multipliers (ADMM) and unroll it into a lightweight feed-forward network for data-driven parameter learning. Experiments show that our method outperforms unrolled GTV and other representative image denoising schemes, while employing far fewer network parameters.

LGJun 2, 2025
Efficient Learning of Balanced Signed Graphs via Sparse Linear Programming

Haruki Yokota, Hiroshi Higashi, Yuichi Tanaka et al.

Signed graphs are equipped with both positive and negative edge weights, encoding pairwise correlations as well as anti-correlations in data. A balanced signed graph is a signed graph with no cycles containing an odd number of negative edges. Laplacian of a balanced signed graph has eigenvectors that map via a simple linear transform to ones in a corresponding positive graph Laplacian, thus enabling reuse of spectral filtering tools designed for positive graphs. We propose an efficient method to learn a balanced signed graph Laplacian directly from data. Specifically, extending a previous linear programming (LP) based sparse inverse covariance estimation method called CLIME, we formulate a new LP problem for each Laplacian column $i$, where the linear constraints restrict weight signs of edges stemming from node $i$, so that nodes of same / different polarities are connected by positive / negative edges. Towards optimal model selection, we derive a suitable CLIME parameter $ρ$ based on a combination of the Hannan-Quinn information criterion and a minimum feasibility criterion. We solve the LP problem efficiently by tailoring a sparse LP method based on ADMM. We theoretically prove local solution convergence of our proposed iterative algorithm. Extensive experimental results on synthetic and real-world datasets show that our balanced graph learning method outperforms competing methods and enables reuse of spectral filters, wavelets, and graph convolutional nets (GCN) constructed for positive graphs.

LGJun 6, 2024
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors

Tam Thuc Do, Parham Eftekhar, Seyed Alireza Hosseini et al.

We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms that minimize graph smoothness priors -- the quadratic graph Laplacian regularizer (GLR) and the $\ell_1$-norm graph total variation (GTV) -- subject to an interpolation constraint. The crucial insight is that a normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers. Unlike "black-box" transformers that require learning of large key, query and value matrices to compute scaled dot products as affinities and subsequent output embeddings, resulting in huge parameter sets, our unrolled networks employ shallow CNNs to learn low-dimensional features per node to establish pairwise Mahalanobis distances and construct sparse similarity graphs. At each layer, given a learned graph, the target interpolated signal is simply a low-pass filtered output derived from the minimization of an assumed graph smoothness prior, leading to a dramatic reduction in parameter count. Experiments for two image interpolation applications verify the restoration performance, parameter efficiency and robustness to covariate shift of our graph-based unrolled networks compared to conventional transformers.

LGJan 3, 2024
Signal Processing in the Retina: Interpretable Graph Classifier to Predict Ganglion Cell Responses

Yasaman Parhizkar, Gene Cheung, Andrew W. Eckford

It is a popular hypothesis in neuroscience that ganglion cells in the retina are activated by selectively detecting visual features in an observed scene. While ganglion cell firings can be predicted via data-trained deep neural nets, the networks remain indecipherable, thus providing little understanding of the cells' underlying operations. To extract knowledge from the cell firings, in this paper we learn an interpretable graph-based classifier from data to predict the firings of ganglion cells in response to visual stimuli. Specifically, we learn a positive semi-definite (PSD) metric matrix $\mathbf{M} \succeq 0$ that defines Mahalanobis distances between graph nodes (visual events) endowed with pre-computed feature vectors; the computed inter-node distances lead to edge weights and a combinatorial graph that is amenable to binary classification. Mathematically, we define the objective of metric matrix $\mathbf{M}$ optimization using a graph adaptation of large margin nearest neighbor (LMNN), which is rewritten as a semi-definite programming (SDP) problem. We solve it efficiently via a fast approximation called Gershgorin disc perfect alignment (GDPA) linearization. The learned metric matrix $\mathbf{M}$ provides interpretability: important features are identified along $\mathbf{M}$'s diagonal, and their mutual relationships are inferred from off-diagonal terms. Our fast metric learning framework can be applied to other biological systems with pre-chosen features that require interpretation.

LGFeb 28, 2022
Sparse Graph Learning with Spectrum Prior for Deep Graph Convolutional Networks

Jin Zeng, Yang Liu, Gene Cheung et al.

A graph convolutional network (GCN) employs a graph filtering kernel tailored for data with irregular structures. However, simply stacking more GCN layers does not improve performance; instead, the output converges to an uninformative low-dimensional subspace, where the convergence rate is characterized by the graph spectrum -- this is the known over-smoothing problem in GCN. In this paper, we propose a sparse graph learning algorithm incorporating a new spectrum prior to compute a graph topology that circumvents over-smoothing while preserving pairwise correlations inherent in data. Specifically, based on a spectral analysis of multilayer GCN output, we derive a spectrum prior for the graph Laplacian matrix $\mathbf{L}$ to robustify the model expressiveness against over-smoothing. Then, we formulate a sparse graph learning problem with the spectrum prior, solved efficiently via block coordinate descent (BCD). Moreover, we optimize the weight parameter trading off the fidelity term with the spectrum prior, based on data smoothness on the original graph learned without spectrum manipulation. The output $\mathbf{L}$ is then normalized for supervised GCN training. Experiments show that our proposal produced deeper GCNs and higher prediction accuracy for regression and classification tasks compared to competing schemes.

SPDec 15, 2021
Fast Computation of Generalized Eigenvectors for Manifold Graph Embedding

Fei Chen, Gene Cheung, Xue Zhang

Our goal is to efficiently compute low-dimensional latent coordinates for nodes in an input graph -- known as graph embedding -- for subsequent data processing such as clustering. Focusing on finite graphs that are interpreted as uniform samples on continuous manifolds (called manifold graphs), we leverage existing fast extreme eigenvector computation algorithms for speedy execution. We first pose a generalized eigenvalue problem for sparse matrix pair $(\A,\B)$, where $\A = Ł- μ\Q + ε\I$ is a sum of graph Laplacian $Ł$ and disconnected two-hop difference matrix $\Q$. Eigenvector $\v$ minimizing Rayleigh quotient $\frac{\v^{\top} \A \v}{\v^{\top} \v}$ thus minimizes $1$-hop neighbor distances while maximizing distances between disconnected $2$-hop neighbors, preserving graph structure. Matrix $\B = \text{diag}(\{\b_i\})$ that defines eigenvector orthogonality is then chosen so that boundary / interior nodes in the sampling domain have the same generalized degrees. $K$-dimensional latent vectors for the $N$ graph nodes are the first $K$ generalized eigenvectors for $(\A,\B)$, computed in $\cO(N)$ using LOBPCG, where $K \ll N$. Experiments show that our embedding is among the fastest in the literature, while producing the best clustering performance for manifold graphs.

CVNov 9, 2021
Graph-Based Depth Denoising & Dequantization for Point Cloud Enhancement

Xue Zhang, Gene Cheung, Jiahao Pang et al.

A 3D point cloud is typically constructed from depth measurements acquired by sensors at one or more viewpoints. The measurements suffer from both quantization and noise corruption. To improve quality, previous works denoise a point cloud \textit{a posteriori} after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements directly on the sensed images \textit{a priori}, before synthesizing a 3D point cloud. By enhancing near the physical sensing process, we tailor our optimization to our depth formation model before subsequent processing steps that obscure measurement errors. Specifically, we model depth formation as a combined process of signal-dependent noise addition and non-uniform log-based quantization. The designed model is validated (with parameters fitted) using collected empirical data from a representative depth sensor. To enhance each pixel row in a depth image, we first encode intra-view similarities between available row pixels as edge weights via feature graph learning. We next establish inter-view similarities with another rectified depth image via viewpoint mapping and sparse linear interpolation. This leads to a maximum a posteriori (MAP) graph filtering objective that is convex and differentiable. We minimize the objective efficiently using accelerated gradient descent (AGD), where the optimal step size is approximated via Gershgorin circle theorem (GCT). Experiments show that our method significantly outperformed recent point cloud denoising schemes and state-of-the-art image denoising schemes in two established point cloud quality metrics.

CVOct 21, 2021
Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment

Sadid Sahami, Gene Cheung, Chia-Wen Lin

We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG) $\mathcal{G}$, represented by graph Laplacian matrix $\mathbf{L}$, where the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue $λ_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \text{diag}(\mathbf{a}) + μ\mathbf{L}$, where $\mathbf{a}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We prove that, after partitioning $\mathcal{G}$ into $Q$ sub-graphs $\{\mathcal{G}^q\}^Q_{q=1}$, the smallest Gershgorin circle theorem (GCT) lower bound of $Q$ corresponding coefficient matrices -- $\min_q λ^-_{\min}(\mathbf{B}^q)$ -- is a lower bound for $λ_{\min}(\mathbf{B})$. This inspires a fast graph sampling algorithm to iteratively partition $\mathcal{G}$ into $Q$ sub-graphs using $Q$ samples (keyframes), while maximizing $λ^-_{\min}(\mathbf{B}^q)$ for each sub-graph $\mathcal{G}^q$. Experimental results show that our algorithm achieves comparable video summarization performance as state-of-the-art methods, at a substantially reduced complexity.

LGSep 10, 2021
Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization

Cheng Yang, Gene Cheung, Wai-tian Tan et al.

Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer. However, unfolding a proximal splitting algorithm with a positive semi-definite (PSD) cone projection operator per iteration is expensive, due to the required full matrix eigen-decomposition. In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier, where the PSD cone constraint is replaced by a set of "tightest possible" linear constraints per iteration. As a result, each iteration only requires computing a linear program (LP) and one extreme eigenvector. Inside the unrolled network, we optimize parameters via stochastic gradient descent (SGD) that determine graph edge weights in two ways: i) a metric matrix that computes feature distances, and ii) a sparse weight matrix computed via local linear embedding (LLE). Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.

LGJun 3, 2021
Projection-free Graph-based Classifier Learning using Gershgorin Disc Perfect Alignment

Cheng Yang, Gene Cheung, Guangtao Zhai

In semi-supervised graph-based binary classifier learning, a subset of known labels $\hat{x}_i$ are used to infer unknown labels, assuming that the label signal $\mathbf{x}$ is smooth with respect to a similarity graph specified by a Laplacian matrix. When restricting labels $x_i$ to binary values, the problem is NP-hard. While a conventional semi-definite programming relaxation (SDR) can be solved in polynomial time using, for example, the alternating direction method of multipliers (ADMM), the complexity of projecting a candidate matrix $\mathbf{M}$ onto the positive semi-definite (PSD) cone ($\mathbf{M} \succeq 0$) per iteration remains high. In this paper, leveraging a recent linear algebraic theory called Gershgorin disc perfect alignment (GDPA), we propose a fast projection-free method by solving a sequence of linear programs (LP) instead. Specifically, we first recast the SDR to its dual, where a feasible solution $\mathbf{H} \succeq 0$ is interpreted as a Laplacian matrix corresponding to a balanced signed graph minus the last node. To achieve graph balance, we split the last node into two, each retains the original positive / negative edges, resulting in a new Laplacian $\bar{\mathbf{H}}$. We repose the SDR dual for solution $\bar{\mathbf{H}}$, then replace the PSD cone constraint $\bar{\mathbf{H}} \succeq 0$ with linear constraints derived from GDPA -- sufficient conditions to ensure $\bar{\mathbf{H}}$ is PSD -- so that the optimization becomes an LP per iteration. Finally, we extract predicted labels from converged solution $\bar{\mathbf{H}}$. Experiments show that our algorithm enjoyed a $28\times$ speedup over the next fastest scheme while achieving comparable label prediction performance.

MMApr 14, 2021
Landmarking for Navigational Streaming of Stored High-Dimensional Media

Yuan Yuan, Gene Cheung, Pascal Frossard et al.

Modern media data such as 360 videos and light field (LF) images are typically captured in much higher dimensions than the observers' visual displays. To efficiently browse high-dimensional media over bandwidth-constrained networks, a navigational streaming model is considered: a client navigates the large media space by dictating a navigation path to a server, who in response transmits the corresponding pre-encoded media data units (MDU) to the client one-by-one in sequence. Intra-coding an MDU (I-MDU) would result in a large bitrate but I-MDU can be randomly accessed, while inter-coding an MDU (P-MDU) using another MDU as a predictor incurs a small coding cost but imposes an order where the predictor must be first transmitted and decoded. From a compression perspective, the technical challenge is: how to achieve coding gain via inter-coding of MDUs, while enabling adequate random access for satisfactory user navigation. To address this problem, we propose landmarks, a selection of key MDUs from the high-dimensional media. Using landmarks as predictors, nearby MDUs in local neighborhoods are intercoded, resulting in a predictive MDU structure with controlled coding cost. It means that any requested MDU can be decoded by at most transmitting a landmark and an inter-coded MDU, enabling navigational random access. To build a landmarked MDU structure, we employ tree-structured vector quantizer (TSVQ) to first optimize landmark locations, then iteratively add/remove inter-coded MDUs as refinements using a fast branch-and-bound technique. Taking interactive LF images and viewport adaptive 360 images as illustrative applications, and I-, P- and previously proposed merge frames to intra- and inter-code MDUs, we show experimentally that landmarked MDU structures can noticeably reduce the expected transmission cost compared with MDU structures without landmarks.

SPOct 25, 2020
Learning Sparse Graph Laplacian with K Eigenvector Prior via Iterative GLASSO and Projection

Saghar Bagheri, Gene Cheung, Antonio Ortega et al.

Learning a suitable graph is an important precursor to many graph signal processing (GSP) pipelines, such as graph spectral signal compression and denoising. Previous graph learning algorithms either i) make some assumptions on connectivity (e.g., graph sparsity), or ii) make simple graph edge assumptions such as positive edges only. In this paper, given an empirical covariance matrix $\bar{C}$ computed from data as input, we consider a structural assumption on the graph Laplacian matrix $L$: the first $K$ eigenvectors of $L$ are pre-selected, e.g., based on domain-specific criteria, such as computation requirement, and the remaining eigenvectors are then learned from data. One example use case is image coding, where the first eigenvector is pre-chosen to be constant, regardless of available observed data. We first prove that the subspace of symmetric positive semi-definite (PSD) matrices $H_{u}^+$ with the first $K$ eigenvectors being $\{u_k\}$ in a defined Hilbert space is a convex cone. We then construct an operator to project a given positive definite (PD) matrix $L$ to $H_{u}^+$, inspired by the Gram-Schmidt procedure. Finally, we design an efficient hybrid graphical lasso/projection algorithm to compute the most suitable graph Laplacian matrix $L^* \in H_{u}^+$ given $\bar{C}$. Experimental results show that given the first $K$ eigenvectors as a prior, our algorithm outperforms competing graph learning schemes using a variety of graph comparison metrics.

CVOct 21, 2020
Unrolling of Deep Graph Total Variation for Image Denoising

Huy Vu, Gene Cheung, Yonina C. Eldar

While deep learning (DL) architectures like convolutional neural networks (CNNs) have enabled effective solutions in image denoising, in general their implementations overly rely on training data, lack interpretability, and require tuning of a large parameter set. In this paper, we combine classical graph signal filtering with deep feature learning into a competitive hybrid design -- one that utilizes interpretable analytical low-pass graph filters and employs 80% fewer network parameters than state-of-the-art DL denoising scheme DnCNN. Specifically, to construct a suitable similarity graph for graph spectral filtering, we first adopt a CNN to learn feature representations per pixel, and then compute feature distances to establish edge weights. Given a constructed graph, we next formulate a convex optimization problem for denoising using a graph total variation (GTV) prior. Via a $l_1$ graph Laplacian reformulation, we interpret its solution in an iterative procedure as a graph low-pass filter and derive its frequency response. For fast filter implementation, we realize this response using a Lanczos approximation. Experimental results show that in the case of statistical mistmatch, our algorithm outperformed DnCNN by up to 3dB in PSNR.

LGJun 15, 2020
Signed Graph Metric Learning via Gershgorin Disc Perfect Alignment

Cheng Yang, Gene Cheung, Wei Hu

Given a convex and differentiable objective $Q(\M)$ for a real symmetric matrix $\M$ in the positive definite (PD) cone -- used to compute Mahalanobis distances -- we propose a fast general metric learning framework that is entirely projection-free. We first assume that $\M$ resides in a space $\cS$ of generalized graph Laplacian matrices corresponding to balanced signed graphs. $\M \in \cS$ that is also PD is called a graph metric matrix. Unlike low-rank metric matrices common in the literature, $\cS$ includes the important diagonal-only matrices as a special case. The key theorem to circumvent full eigen-decomposition and enable fast metric matrix optimization is Gershgorin disc perfect alignment (GDPA): given $\M \in \cS$ and diagonal matrix $§$, where $S_{ii} = 1/v_i$ and $\v$ is $\M$'s first eigenvector, we prove that Gershgorin disc left-ends of similarity transform $\B = §\M §^{-1}$ are perfectly aligned at the smallest eigenvalue $λ_{\min}$. Using this theorem, we replace the PD cone constraint in the metric learning problem with tightest possible linear constraints per iteration, so that the alternating optimization of the diagonal / off-diagonal terms in $\M$ can be solved efficiently as linear programs via the Frank-Wolfe method. We update $\v$ using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as entries in $\M$ are optimized successively. Experiments show that our graph metric optimization is significantly faster than cone-projection schemes, and produces competitive binary classification performance.

SPMar 9, 2020
Sampling Signals on Graphs: From Theory to Applications

Yuichi Tanaka, Yonina C. Eldar, Antonio Ortega et al.

The study of sampling signals on graphs, with the goal of building an analog of sampling for standard signals in the time and spatial domains, has attracted considerable attention recently. Beyond adding to the growing theory on graph signal processing (GSP), sampling on graphs has various promising applications. In this article, we review current progress on sampling over graphs focusing on theory and potential applications. Although most methodologies used in graph signal sampling are designed to parallel those used in sampling for standard signals, sampling theory for graph signals significantly differs from the theory of Shannon--Nyquist and shift-invariant sampling. This is due in part to the fact that the definitions of several important properties, such as shift invariance and bandlimitedness, are different in GSP systems. Throughout this review, we discuss similarities and differences between standard and graph signal sampling and highlight open problems and challenges.

IVFeb 11, 2020
3D Point Cloud Enhancement using Graph-Modelled Multiview Depth Measurements

Xue Zhang, Gene Cheung, Jiahao Pang et al.

A 3D point cloud is often synthesized from depth measurements collected by sensors at different viewpoints. The acquired measurements are typically both coarse in precision and corrupted by noise. To improve quality, previous works denoise a synthesized 3D point cloud a posteriori after projecting the imperfect depth data onto 3D space. Instead, we enhance depth measurements on the sensed images a priori, exploiting inherent 3D geometric correlation across views, before synthesizing a 3D point cloud from the improved measurements. By enhancing closer to the actual sensing process, we benefit from optimization targeting specifically the depth image formation model, before subsequent processing steps that can further obscure measurement errors. Mathematically, for each pixel row in a pair of rectified viewpoint depth images, we first construct a graph reflecting inter-pixel similarities via metric learning using data in previous enhanced rows. To optimize left and right viewpoint images simultaneously, we write a non-linear mapping function from left pixel row to the right based on 3D geometry relations. We formulate a MAP optimization problem, which, after suitable linear approximations, results in an unconstrained convex and differentiable objective, solvable using fast gradient method (FGM). Experimental results show that our method noticeably outperforms recent denoising algorithms that enhance after 3D point clouds are synthesized.

LGJan 28, 2020
Graph Metric Learning via Gershgorin Disc Alignment

Cheng Yang, Gene Cheung, Wei Hu

We propose a fast general projection-free metric learning framework, where the minimization objective $\min_{\textbf{M} \in \mathcal{S}} Q(\textbf{M})$ is a convex differentiable function of the metric matrix $\textbf{M}$, and $\textbf{M}$ resides in the set $\mathcal{S}$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important positive-diagonal-only matrices as a special case in the limit. The key idea for fast optimization is to rewrite the positive definite cone constraint in $\mathcal{S}$ as signal-adaptive linear constraints via Gershgorin disc alignment, so that the alternating optimization of the diagonal and off-diagonal terms in $\textbf{M}$ can be solved efficiently as linear programs via Frank-Wolfe iterations. We prove that the Gershgorin discs can be aligned perfectly using the first eigenvector $\textbf{v}$ of $\textbf{M}$, which we update iteratively using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as diagonal / off-diagonal terms are optimized. Experiments show that our efficiently computed graph metric matrices outperform metrics learned using competing methods in terms of classification tasks.

LGDec 6, 2019
Robust Deep Graph Based Learning for Binary Classification

Minxiang Ye, Vladimir Stankovic, Lina Stankovic et al.

Convolutional neural network (CNN)-based feature learning has become state of the art, since given sufficient training data, CNN can significantly outperform traditional methods for various classification tasks. However, feature learning becomes more difficult if some training labels are noisy. With traditional regularization techniques, CNN often overfits to the noisy training labels, resulting in sub-par classification performance. In this paper, we propose a robust binary classifier, based on CNNs, to learn deep metric functions, which are then used to construct an optimal underlying graph structure used to clean noisy labels via graph Laplacian regularization (GLR). GLR is posed as a convex maximum a posteriori (MAP) problem solved via convex quadratic programming (QP). To penalize samples around the decision boundary, we propose two regularized loss functions for semi-supervised learning. The binary classification experiments on three datasets, varying in number and type of features, demonstrate that given a noisy training dataset, our proposed networks outperform several state-of-the-art classifiers, including label-noise robust support vector machine, CNNs with three different robust loss functions, model-based GLR, and dynamic graph CNN classifiers.

CVJul 22, 2019
Feature Graph Learning for 3D Point Cloud Denoising

Wei Hu, Xiang Gao, Gene Cheung et al.

Identifying an appropriate underlying graph kernel that reflects pairwise similarities is critical in many recent graph spectral signal restoration schemes, including image denoising, dequantization, and contrast enhancement. Existing graph learning algorithms compute the most likely entries of a properly defined graph Laplacian matrix $\mathbf{L}$, but require a large number of signal observations $\mathbf{z}$'s for a stable estimate. In this work, we assume instead the availability of a relevant feature vector $\mathbf{f}_i$ per node $i$, from which we compute an optimal feature graph via optimization of a feature metric. Specifically, we alternately optimize the diagonal and off-diagonal entries of a Mahalanobis distance matrix $\mathbf{M}$ by minimizing the graph Laplacian regularizer (GLR) $\mathbf{z}^{\top} \mathbf{L} \mathbf{z}$, where edge weight is $w_{i,j} = \exp\{-(\mathbf{f}_i - \mathbf{f}_j)^{\top} \mathbf{M} (\mathbf{f}_i - \mathbf{f}_j) \}$, given a single observation $\mathbf{z}$. We optimize diagonal entries via proximal gradient (PG), where we constrain $\mathbf{M}$ to be positive definite (PD) via linear inequalities derived from the Gershgorin circle theorem. To optimize off-diagonal entries, we design a block descent algorithm that iteratively optimizes one row and column of $\mathbf{M}$. To keep $\mathbf{M}$ PD, we constrain the Schur complement of sub-matrix $\mathbf{M}_{2,2}$ of $\mathbf{M}$ to be PD when optimizing via PG. Our algorithm mitigates full eigen-decomposition of $\mathbf{M}$, thus ensuring fast computation speed even when feature vector $\mathbf{f}_i$ has high dimension. To validate its usefulness, we apply our feature graph learning algorithm to the problem of 3D point cloud denoising, resulting in state-of-the-art performance compared to competing schemes in extensive experiments.

CVJul 31, 2018
Deep Graph Laplacian Regularization for Robust Denoising of Real Images

Jin Zeng, Jiahao Pang, Wenxiu Sun et al.

Recent developments in deep learning have revolutionized the paradigm of image restoration. However, its applications on real image denoising are still limited, due to its sensitivity to training data and the complex nature of real image noise. In this work, we combine the robustness merit of model-based approaches and the learning power of data-driven approaches for real image denoising. Specifically, by integrating graph Laplacian regularization as a trainable module into a deep learning framework, we are less susceptible to overfitting than pure CNN-based approaches, achieving higher robustness to small datasets and cross-domain denoising. First, a sparse neighborhood graph is built from the output of a convolutional neural network (CNN). Then the image is restored by solving an unconstrained quadratic programming problem, using a corresponding graph Laplacian regularizer as a prior term. The proposed restoration pipeline is fully differentiable and hence can be end-to-end trained. Experimental results demonstrate that our work is less prone to overfitting given small training data. It is also endowed with strong cross-domain generalization power, outperforming the state-of-the-art approaches by a remarkable margin.

CVJul 22, 2018
SiGAN: Siamese Generative Adversarial Network for Identity-Preserving Face Hallucination

Chih-Chung Hsu, Chia-Wen Lin, Weng-Tai Su et al.

Despite generative adversarial networks (GANs) can hallucinate photo-realistic high-resolution (HR) faces from low-resolution (LR) faces, they cannot guarantee preserving the identities of hallucinated HR faces, making the HR faces poorly recognizable. To address this problem, we propose a Siamese GAN (SiGAN) to reconstruct HR faces that visually resemble their corresponding identities. On top of a Siamese network, the proposed SiGAN consists of a pair of two identical generators and one discriminator. We incorporate reconstruction error and identity label information in the loss function of SiGAN in a pairwise manner. By iteratively optimizing the loss functions of the generator pair and discriminator of SiGAN, we cannot only achieve photo-realistic face reconstruction, but also ensures the reconstructed information is useful for identity recognition. Experimental results demonstrate that SiGAN significantly outperforms existing face hallucination GANs in objective face verification performance, while achieving photo-realistic reconstruction. Moreover, for input LR faces from unknown identities who are not included in training, SiGAN can still do a good job.

CVMar 20, 2018
3D Point Cloud Denoising using Graph Laplacian Regularization of a Low Dimensional Manifold Model

Jin Zeng, Gene Cheung, Michael Ng et al.

3D point cloud - a new signal representation of volumetric objects - is a discrete collection of triples marking exterior object surface locations in 3D space. Conventional imperfect acquisition processes of 3D point cloud - e.g., stereo-matching from multiple viewpoint images or depth data acquired directly from active light sensors - imply non-negligible noise in the data. In this paper, we adopt a previously proposed low-dimensional manifold model for the surface patches in the point cloud and seek self-similar patches to denoise them simultaneously using the patch manifold prior. Due to discrete observations of the patches on the manifold, we approximate the manifold dimension computation defined in the continuous domain with a patch-based graph Laplacian regularizer and propose a new discrete patch distance measure to quantify the similarity between two same-sized surface patches for graph construction that is robust to noise. We show that our graph Laplacian regularizer has a natural graph spectral interpretation, and has desirable numerical stability properties via eigenanalysis. Extensive simulation results show that our proposed denoising scheme can outperform state-of-the-art methods in objective metrics and can better preserve visually salient structural features like edges.

CVFeb 22, 2018
Graph-Based Blind Image Deblurring From a Single Photograph

Yuanchao Bai, Gene Cheung, Xianming Liu et al.

Blind image deblurring, i.e., deblurring without knowledge of the blur kernel, is a highly ill-posed problem. The problem can be solved in two parts: i) estimate a blur kernel from the blurry image, and ii) given estimated blur kernel, de-convolve blurry input to restore the target image. In this paper, we propose a graph-based blind image deblurring algorithm by interpreting an image patch as a signal on a weighted graph. Specifically, we first argue that a skeleton image---a proxy that retains the strong gradients of the target but smooths out the details---can be used to accurately estimate the blur kernel and has a unique bi-modal edge weight distribution. Then, we design a reweighted graph total variation (RGTV) prior that can efficiently promote a bi-modal edge weight distribution given a blurry patch. Further, to analyze RGTV in the graph frequency domain, we introduce a new weight function to represent RGTV as a graph $l_1$-Laplacian regularizer. This leads to a graph spectral filtering interpretation of the prior with desirable properties, including robustness to noise and blur, strong piecewise smooth (PWS) filtering and sharpness promotion. Minimizing a blind image deblurring objective with RGTV results in a non-convex non-differentiable optimization problem. We leverage the new graph spectral interpretation for RGTV to design an efficient algorithm that solves for the skeleton image and the blur kernel alternately. Specifically for Gaussian blur, we propose a further speedup strategy for blind Gaussian deblurring using accelerated graph spectral filtering. Finally, with the computed blur kernel, recent non-blind image deblurring algorithms can be applied to restore the target image. Experimental results demonstrate that our algorithm successfully restores latent sharp images and outperforms state-of-the-art methods quantitatively and qualitatively.

IVFeb 20, 2018
Non-Local Graph-Based Prediction For Reversible Data Hiding In Images

Qi Chang, Gene Cheung, Yao Zhao et al.

Reversible data hiding (RDH) is desirable in applications where both the hidden message and the cover medium need to be recovered without loss. Among many RDH approaches is prediction-error expansion (PEE), containing two steps: i) prediction of a target pixel value, and ii) embedding according to the value of prediction-error. In general, higher prediction performance leads to larger embedding capacity and/or lower signal distortion. Leveraging on recent advances in graph signal processing (GSP), we pose pixel prediction as a graph-signal restoration problem, where the appropriate edge weights of the underlying graph are computed using a similar patch searched in a semi-local neighborhood. Specifically, for each candidate patch, we first examine eigenvalues of its structure tensor to estimate its local smoothness. If sufficiently smooth, we pose a maximum a posteriori (MAP) problem using either a quadratic Laplacian regularizer or a graph total variation (GTV) term as signal prior. While the MAP problem using the first prior has a closed-form solution, we design an efficient algorithm for the second prior using alternating direction method of multipliers (ADMM) with nested proximal gradient descent. Experimental results show that with better quality GSP-based prediction, at low capacity the visual quality of the embedded image exceeds state-of-the-art methods noticeably.

CVDec 24, 2017
Blind Image Deblurring via Reweighted Graph Total Variation

Yuanchao Bai, Gene Cheung, Xianming Liu et al.

Blind image deblurring, i.e., deblurring without knowledge of the blur kernel, is a highly ill-posed problem. The problem can be solved in two parts: i) estimate a blur kernel from the blurry image, and ii) given estimated blur kernel, de-convolve blurry input to restore the target image. In this paper, by interpreting an image patch as a signal on a weighted graph, we first argue that a skeleton image---a proxy that retains the strong gradients of the target but smooths out the details---can be used to accurately estimate the blur kernel and has a unique bi-modal edge weight distribution. We then design a reweighted graph total variation (RGTV) prior that can efficiently promote bi-modal edge weight distribution given a blurry patch. However, minimizing a blind image deblurring objective with RGTV results in a non-convex non-differentiable optimization problem. We propose a fast algorithm that solves for the skeleton image and the blur kernel alternately. Finally with the computed blur kernel, recent non-blind image deblurring algorithms can be applied to restore the target image. Experimental results show that our algorithm can robustly estimate the blur kernel with large kernel size, and the reconstructed sharp image is competitive against the state-of-the-art methods.

CVApr 30, 2017
Joint Denoising / Compression of Image Contours via Shape Prior and Context Tree

Amin Zheng, Gene Cheung, Dinei Florencio

With the advent of depth sensing technologies, the extraction of object contours in images---a common and important pre-processing step for later higher-level computer vision tasks like object detection and human action recognition---has become easier. However, acquisition noise in captured depth images means that detected contours suffer from unavoidable errors. In this paper, we propose to jointly denoise and compress detected contours in an image for bandwidth-constrained transmission to a client, who can then carry out aforementioned application-specific tasks using the decoded contours as input. We first prove theoretically that in general a joint denoising / compression approach can outperform a separate two-stage approach that first denoises then encodes contours lossily. Adopting a joint approach, we first propose a burst error model that models typical errors encountered in an observed string y of directional edges. We then formulate a rate-constrained maximum a posteriori (MAP) problem that trades off the posterior probability p(x'|y) of an estimated string x' given y with its code rate R(x'). We design a dynamic programming (DP) algorithm that solves the posed problem optimally, and propose a compact context representation called total suffix tree (TST) that can reduce complexity of the algorithm dramatically. Experimental results show that our joint denoising / compression scheme outperformed a competing separate scheme in rate-distortion performance noticeably.

MMMar 27, 2017
Multi-Stream Switching for Interactive Virtual Reality Video Streaming

Gene Cheung, Zhi Liu, Zhiyou Ma et al.

Virtual reality (VR) video provides an immersive 360 viewing experience to a user wearing a head-mounted display: as the user rotates his head, correspondingly different fields-of-view (FoV) of the 360 video are rendered for observation. Transmitting the entire 360 video in high quality over bandwidth-constrained networks from server to client for real-time playback is challenging. In this paper we propose a multi-stream switching framework for VR video streaming: the server pre-encodes a set of VR video streams covering different view ranges that account for server-client round trip time (RTT) delay, and during streaming the server transmits and switches streams according to a user's detected head rotation angle. For a given RTT, we formulate an optimization to seek multiple VR streams of different view ranges and the head-angle-to-stream mapping function simultaneously, in order to minimize the expected distortion subject to bandwidth and storage constraints. We propose an alternating algorithm that, at each iteration, computes the optimal streams while keeping the mapping function fixed and vice versa. Experiments show that for the same bandwidth, our multi-stream switching scheme outperforms a non-switching single-stream approach by up to 2.9dB in PSNR.

CVFeb 10, 2017
Graph Fourier Transform with Negative Edges for Depth Image Coding

Weng-Tai Su, Gene Cheung, Chia-Wen Lin

Recent advent in graph signal processing (GSP) has led to the development of new graph-based transforms and wavelets for image / video coding, where the underlying graph describes inter-pixel correlations. In this paper, we develop a new transform called signed graph Fourier transform (SGFT), where the underlying graph G contains negative edges that describe anti-correlations between pixel pairs. Specifically, we first construct a one-state Markov process that models both inter-pixel correlations and anti-correlations. We then derive the corresponding precision matrix, and show that the loopy graph Laplacian matrix Q of a graph G with a negative edge and two self-loops at its end nodes is approximately equivalent. This proves that the eigenvectors of Q - called SGFT - approximates the optimal Karhunen-Lo`eve Transform (KLT). We show the importance of the self-loops in G to ensure Q is positive semi-definite. We prove that the first eigenvector of Q is piecewise constant (PWC), and thus can well approximate a piecewise smooth (PWS) signal like a depth image. Experimental results show that a block-based coding scheme based on SGFT outperforms a previous scheme using graph transforms with only positive edges for several depth images.

MMDec 23, 2016
Object Shape Approximation & Contour Adaptive Depth Image Coding for Virtual View Synthesis

Yuan Yuan, Gene Cheung, Patrick Le Callet et al.

A depth image provides partial geometric information of a 3D scene, namely the shapes of physical objects as observed from a particular viewpoint. This information is important when synthesizing images of different virtual camera viewpoints via depth-image-based rendering (DIBR). It has been shown that depth images can be efficiently coded using contour-adaptive codecs that preserve edge sharpness, resulting in visually pleasing DIBR-synthesized images. However, contours are typically losslessly coded as side information (SI), which is expensive if the object shapes are complex. In this paper, we pursue a new paradigm in depth image coding for color-plus-depth representation of a 3D scene: we pro-actively simplify object shapes in a depth and color image pair to reduce depth coding cost, at a penalty of a slight increase in synthesized view distortion. Specifically, we first mathematically derive a distortion upper-bound proxy for 3DSwIM---a quality metric tailored for DIBR-synthesized images. This proxy reduces interdependency among pixel rows in a block to ease optimization. We then approximate object contours via a dynamic programming (DP) algorithm to optimally trade off coding cost of contours using arithmetic edge coding (AEC) with our proposed view synthesis distortion proxy. We modify the depth and color images according to the approximated object contours in an inter-view consistent manner. These are then coded respectively using a contour-adaptive image codec based on graph Fourier transform (GFT) for edge preservation and HEVC intra. Experimental results show that by maintaining sharp but simplified object contours during contour-adaptive coding, for the same visual quality of DIBR-synthesized virtual views, our proposal can reduce depth image coding rate by up to 22% compared to alternative coding strategies such as HEVC intra.

LGNov 15, 2016
Robust Semi-Supervised Graph Classifier Learning with Negative Edge Weights

Gene Cheung, Weng-Tai Su, Yu Mao et al.

In a semi-supervised learning scenario, (possibly noisy) partially observed labels are used as input to train a classifier, in order to assign labels to unclassified samples. In this paper, we study this classifier learning problem from a graph signal processing (GSP) perspective. Specifically, by viewing a binary classifier as a piecewise constant graph-signal in a high-dimensional feature space, we cast classifier learning as a signal restoration problem via a classical maximum a posteriori (MAP) formulation. Unlike previous graph-signal restoration works, we consider in addition edges with negative weights that signify anti-correlation between samples. One unfortunate consequence is that the graph Laplacian matrix $\mathbf{L}$ can be indefinite, and previously proposed graph-signal smoothness prior $\mathbf{x}^T \mathbf{L} \mathbf{x}$ for candidate signal $\mathbf{x}$ can lead to pathological solutions. In response, we derive an optimal perturbation matrix $\boldsymbolΔ$ - based on a fast lower-bound computation of the minimum eigenvalue of $\mathbf{L}$ via a novel application of the Haynsworth inertia additivity formula---so that $\mathbf{L} + \boldsymbolΔ$ is positive semi-definite, resulting in a stable signal prior. Further, instead of forcing a hard binary decision for each sample, we define the notion of generalized smoothness on graph that promotes ambiguity in the classifier signal. Finally, we propose an algorithm based on iterative reweighted least squares (IRLS) that solves the posed MAP problem efficiently. Extensive simulation results show that our proposed algorithm outperforms both SVM variants and graph-based classifiers using positive-edge graphs noticeably.

CVJul 7, 2016
Random Walk Graph Laplacian based Smoothness Prior for Soft Decoding of JPEG Images

Xianming Liu, Gene Cheung, Xiaolin Wu et al.

Given the prevalence of JPEG compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed DCT coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors---Laplacian prior for DCT coefficients, sparsity prior and graph-signal smoothness prior for image patches---to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error (MMSE) initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the over-complete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared to previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth (PWS) signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations noticeably.

MMApr 27, 2016
Context Tree based Image Contour Coding using A Geometric Prior

Amin Zheng, Gene Cheung, Dinei Florencio

If object contours in images are coded efficiently as side information, then they can facilitate advanced image / video coding techniques, such as graph Fourier transform coding or motion prediction of arbitrarily shaped pixel blocks. In this paper, we study the problem of lossless and lossy compression of detected contours in images. Specifically, we first convert a detected object contour composed of contiguous between-pixel edges to a sequence of directional symbols drawn from a small alphabet. To encode the symbol sequence using arithmetic coding, we compute an optimal variable-length context tree (VCT) $\mathcal{T}$ via a maximum a posterior (MAP) formulation to estimate symbols' conditional probabilities. MAP prevents us from overfitting given a small training set $\mathcal{X}$ of past symbol sequences by identifying a VCT $\mathcal{T}$ that achieves a high likelihood $P(\mathcal{X}|\mathcal{T})$ of observing $\mathcal{X}$ given $\mathcal{T}$, and a large geometric prior $P(\mathcal{T})$ stating that image contours are more often straight than curvy. For the lossy case, we design efficient dynamic programming (DP) algorithms that optimally trade off coding rate of an approximate contour $\hat{\mathbf{x}}$ given a VCT $\mathcal{T}$ with two notions of distortion of $\hat{\mathbf{x}}$ with respect to the original contour $\mathbf{x}$. To reduce the size of the DP tables, a total suffix tree is derived from a given VCT $\mathcal{T}$ for compact table entry indexing, reducing complexity. Experimental results show that for lossless contour coding, our proposed algorithm outperforms state-of-the-art context-based schemes consistently for both small and large training datasets. For lossy contour coding, our algorithms outperform comparable schemes in the literature in rate-distortion performance.

CVApr 27, 2016
Graph Laplacian Regularization for Image Denoising: Analysis in the Continuous Domain

Jiahao Pang, Gene Cheung

Inverse imaging problems are inherently under-determined, and hence it is important to employ appropriate image priors for regularization. One recent popular prior---the graph Laplacian regularizer---assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the graph Laplacian regularizer on the original inverse problem are not well understood. To address this problem, in this paper we interpret neighborhood graphs of pixel patches as discrete counterparts of Riemannian manifolds and perform analysis in the continuous domain, providing insights into several fundamental aspects of graph Laplacian regularization for image denoising. Specifically, we first show the convergence of the graph Laplacian regularizer to a continuous-domain functional, integrating a norm measured in a locally adaptive metric space. Focusing on image denoising, we derive an optimal metric space assuming non-local self-similarity of pixel patches, leading to an optimal graph Laplacian regularizer for denoising in the discrete domain. We then interpret graph Laplacian regularization as an anisotropic diffusion scheme to explain its behavior during iterations, e.g., its tendency to promote piecewise smooth signals under certain settings. To verify our analysis, an iterative image denoising algorithm is developed. Experimental results show that our algorithm performs competitively with state-of-the-art denoising methods such as BM3D for natural images, and outperforms them significantly for piecewise smooth images.

MMMar 19, 2016
Optimal Lagrange Multipliers for Dependent Rate Allocation in Video Coding

Ana De Abreu, Gene Cheung, Pascal Frossard et al.

In a typical video rate allocation problem, the objective is to optimally distribute a source rate budget among a set of (in)dependently coded data units to minimize the total distortion of all units. Conventional Lagrangian approaches convert the lone rate constraint to a linear rate penalty scaled by a multiplier in the objective, resulting in a simpler unconstrained formulation. However, the search for the "optimal" multiplier, one that results in a distortion-minimizing solution among all Lagrangian solutions that satisfy the original rate constraint, remains an elusive open problem in the general setting. To address this problem, we propose a computation-efficient search strategy to identify this optimal multiplier numerically. Specifically, we first formulate a general rate allocation problem where each data unit can be dependently coded at different quantization parameters (QP) using a previous unit as predictor, or left uncoded at the encoder and subsequently interpolated at the decoder using neighboring coded units. After converting the original rate constrained problem to the unconstrained Lagrangian counterpart, we design an efficient dynamic programming (DP) algorithm that finds the optimal Lagrangian solution for a fixed multiplier. Finally, within the DP framework, we iteratively compute neighboring singular multiplier values, each resulting in multiple simultaneously optimal Lagrangian solutions, to drive the rates of the computed Lagrangian solutions towards the bit budget. We terminate when a singular multiplier value results in two Lagrangian solutions with rates below and above the bit budget. In extensive monoview and multiview video coding experiments, we show that our DP algorithm and selection of optimal multipliers on average outperform comparable rate control solutions used in video compression standards such as HEVC that do not skip frames in Y-PSNR.

MMSep 10, 2015
Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

Wei Dai, Gene Cheung, Ngai-Man Cheung et al.

The ability to efficiently switch from one pre-encoded video stream to another (e.g., for bitrate adaptation or view switching) is important for many interactive streaming applications. Recently, stream-switching mechanisms based on distributed source coding (DSC) have been proposed. In order to reduce the overall transmission rate, these approaches provide a "merge" mechanism, where information is sent to the decoder such that the exact same frame can be reconstructed given that any one of a known set of side information (SI) frames is available at the decoder (e.g., each SI frame may correspond to a different stream from which we are switching). However, the use of bit-plane coding and channel coding in many DSC approaches leads to complex coding and decoding. In this paper, we propose an alternative approach for merging multiple SI frames, using a piecewise constant (PWC) function as the merge operator. In our approach, for each block to be reconstructed, a series of parameters of these PWC merge functions are transmitted in order to guarantee identical reconstruction given the known side information blocks. We consider two different scenarios. In the first case, a target frame is first given, and then merge parameters are chosen so that this frame can be reconstructed exactly at the decoder. In contrast, in the second scenario, the reconstructed frame and merge parameters are jointly optimized to meet a rate-distortion criteria. Experiments show that for both scenarios, our proposed merge techniques can outperform both a recent approach based on DSC and the SP-frame approach in H.264, in terms of compression efficiency and decoder complexity.