CVOct 17, 2023Code
EvalCrafter: Benchmarking and Evaluating Large Video Generation ModelsYaofang Liu, Xiaodong Cun, Xuebo Liu et al.
The vision and language generative models have been overgrown in recent years. For video generation, various open-sourced models and public-available services have been developed to generate high-quality videos. However, these methods often use a few metrics, e.g., FVD or IS, to evaluate the performance. We argue that it is hard to judge the large conditional generative models from the simple metrics since these models are often trained on very large datasets with multi-aspect abilities. Thus, we propose a novel framework and pipeline for exhaustively evaluating the performance of the generated videos. Our approach involves generating a diverse and comprehensive list of 700 prompts for text-to-video generation, which is based on an analysis of real-world user data and generated with the assistance of a large language model. Then, we evaluate the state-of-the-art video generative models on our carefully designed benchmark, in terms of visual qualities, content qualities, motion qualities, and text-video alignment with 17 well-selected objective metrics. To obtain the final leaderboard of the models, we further fit a series of coefficients to align the objective metrics to the users' opinions. Based on the proposed human alignment method, our final score shows a higher correlation than simply averaging the metrics, showing the effectiveness of the proposed evaluation method.
CVJul 18, 2023
PottsMGNet: A Mathematical Explanation of Encoder-Decoder Based Neural NetworksXue-Cheng Tai, Hao Liu, Raymond Chan · gatech
For problems in image processing and many other fields, a large class of effective neural networks has encoder-decoder-based architectures. Although these networks have made impressive performances, mathematical explanations of their architectures are still underdeveloped. In this paper, we study the encoder-decoder-based network architecture from the algorithmic perspective and provide a mathematical explanation. We use the two-phase Potts model for image segmentation as an example for our explanations. We associate the segmentation problem with a control problem in the continuous setting. Then, multigrid method and operator splitting scheme, the PottsMGNet, are used to discretize the continuous control model. We show that the resulting discrete PottsMGNet is equivalent to an encoder-decoder-based network. With minor modifications, it is shown that a number of the popular encoder-decoder-based neural networks are just instances of the proposed PottsMGNet. By incorporating the Soft-Threshold-Dynamics into the PottsMGNet as a regularizer, the PottsMGNet has shown to be robust with the network parameters such as network width and depth and achieved remarkable performance on datasets with very large noise. In nearly all our experiments, the new network always performs better or as good on accuracy and dice score than existing networks for image segmentation.
CVJul 18, 2023
Connections between Operator-splitting Methods and Deep Neural Networks with Applications in Image SegmentationHao Liu, Xue-Cheng Tai, Raymond Chan · gatech
Deep neural network is a powerful tool for many tasks. Understanding why it is so successful and providing a mathematical explanation is an important problem and has been one popular research direction in past years. In the literature of mathematical analysis of deep neural networks, a lot of works is dedicated to establishing representation theories. How to make connections between deep neural networks and mathematical algorithms is still under development. In this paper, we give an algorithmic explanation for deep neural networks, especially in their connections with operator splitting. We show that with certain splitting strategies, operator-splitting methods have the same structure as networks. Utilizing this connection and the Potts model for image segmentation, two networks inspired by operator-splitting methods are proposed. The two networks are essentially two operator-splitting algorithms solving the Potts model. Numerical experiments are presented to demonstrate the effectiveness of the proposed networks.
NAAug 13, 2011
Vessel Segmentation in Medical Imaging Using a Tight-Frame Based AlgorithmXiaohao Cai, Raymond Chan, Serena Morigi et al.
Tight-frame, a generalization of orthogonal wavelets, has been used successfully in various problems in image processing, including inpainting, impulse noise removal, super-resolution image restoration, etc. Segmentation is the process of identifying object outlines within images. There are quite a few efficient algorithms for segmentation that depend on the variational approach and the partial differential equation (PDE) modeling. In this paper, we propose to apply the tight-frame approach to automatically identify tube-like structures such as blood vessels in Magnetic Resonance Angiography (MRA) images. Our method iteratively refines a region that encloses the possible boundary or surface of the vessels. In each iteration, we apply the tight-frame algorithm to denoise and smooth the possible boundary and sharpen the region. We prove the convergence of our algorithm. Numerical experiments on real 2D/3D MRA images demonstrate that our method is very efficient with convergence usually within a few iterations, and it outperforms existing PDE and variational methods as it can extract more tubular objects and fine details in the images.
IVSep 29, 2022
Spherical Image Inpainting with Frame Transformation and Data-driven Prior Deep NetworksJianfei Li, Chaoyan Huang, Raymond Chan et al.
Spherical image processing has been widely applied in many important fields, such as omnidirectional vision for autonomous cars, global climate modelling, and medical imaging. It is non-trivial to extend an algorithm developed for flat images to the spherical ones. In this work, we focus on the challenging task of spherical image inpainting with deep learning-based regularizer. Instead of a naive application of existing models for planar images, we employ a fast directional spherical Haar framelet transform and develop a novel optimization framework based on a sparsity assumption of the framelet transform. Furthermore, by employing progressive encoder-decoder architecture, a new and better-performed deep CNN denoiser is carefully designed and works as an implicit regularizer. Finally, we use a plug-and-play method to handle the proposed optimization model, which can be implemented efficiently by training the CNN denoiser prior. Numerical experiments are conducted and show that the proposed algorithms can greatly recover damaged spherical images and achieve the best performance over purely using deep learning denoiser and plug-and-play model.
CVJan 27, 2023
Dual-View Selective Instance Segmentation Network for Unstained Live Adherent Cells in Differential Interference Contrast ImagesFei Pan, Yutong Wu, Kangning Cui et al.
Despite recent advances in data-independent and deep-learning algorithms, unstained live adherent cell instance segmentation remains a long-standing challenge in cell image processing. Adherent cells' inherent visual characteristics, such as low contrast structures, fading edges, and irregular morphology, have made it difficult to distinguish from one another, even by human experts, let alone computational methods. In this study, we developed a novel deep-learning algorithm called dual-view selective instance segmentation network (DVSISN) for segmenting unstained adherent cells in differential interference contrast (DIC) images. First, we used a dual-view segmentation (DVS) method with pairs of original and rotated images to predict the bounding box and its corresponding mask for each cell instance. Second, we used a mask selection (MS) method to filter the cell instances predicted by the DVS to keep masks closest to the ground truth only. The developed algorithm was trained and validated on our dataset containing 520 images and 12198 cells. Experimental results demonstrate that our algorithm achieves an AP_segm of 0.555, which remarkably overtakes a benchmark by a margin of 23.6%. This study's success opens up a new possibility of using rotated images as input for better prediction in cell images.
NAMay 8
Solving Convolution-type Integral Equations using Preconditioned Neural OperatorsRaymond Chan, Lingfeng Li
Convolution-type integral equations arise from various fields, \textit{e.g.}, finite impulse response filters in signal processing and deblurring problems in image processing. When solving these equations, conventional numerical methods, like the multigrid method, can only efficiently solve the low-frequency components in the error, but not the high-frequency components. In this paper, we apply neural operators to address this issue. By adopting a preconditioning approach, we propose a novel training strategy that trains neural operators to solve the high-frequency components efficiently. Then, we combine the neural operators with some classical iterative solvers, like the weighted Jacobi method, to obtain an efficient hybrid iterative algorithm for the integral equations. We analyze the generalization error of our training strategy and the convergence of the hybrid iterative algorithm. We test our algorithms on large-scale and ill-conditioned linear systems discretized from one- and two-dimensional convolution-type integral equations. Our proposed algorithm significantly outperforms the multigrid method and the preconditioned conjugate gradient method in both iteration numbers and computational time.
LGMar 1
A level-wise training scheme for learning neural multigrid smoothers with application to integral equationsLingfeng Li, Yin King Chu, Raymond Chan et al.
Convolution-type integral equations commonly occur in signal processing and image processing. Discretizing these equations yields large and ill-conditioned linear systems. While the classic multigrid method is effective for solving linear systems derived from partial differential equations (PDE) problems, it fails to solve integral equations because its smoothers, which are implemented as conventional relaxation methods, are ineffective in reducing high-frequency components in the errors. We propose a novel neural multigrid scheme where learned neural operators replace classical smoothers. Unlike classical smoothers, these operators are trained offline. Once trained, the neural smoothers generalize to new right-hand-side vectors without retraining, making it an efficient solver. We design level-wise loss functions incorporating spectral filtering to emulate the multigrid frequency decomposition principle, ensuring each operator focuses on solving distinct high-frequency spectral bands. Although we focus on integral equations, the framework is generalizable to all kinds of problems, including PDE problems. Our experiments demonstrate superior efficiency over classical solvers and robust convergence across varying problem sizes and regularization weights.
IMSep 27, 2018
Novel Sparse Recovery Algorithms for 3D Debris Localization using Rotating Point Spread Function ImageryChao Wang, Robert Plemmons, Sudhakar Prasad et al. · mila
An optical imager that exploits off-center image rotation to encode both the lateral and depth coordinates of point sources in a single snapshot can perform 3D localization and tracking of space debris. When actively illuminated, unresolved space debris, which can be regarded as a swarm of point sources, can scatter a fraction of laser irradiance back into the imaging sensor. Determining the source locations and fluxes is a large-scale sparse 3D inverse problem, for which we have developed efficient and effective algorithms based on sparse recovery using non-convex optimization. Numerical simulations illustrate the efficiency and stability of the algorithms.
CVMay 30, 2015
A Three-stage Approach for Segmenting Degraded Color Images: Smoothing, Lifting and Thresholding (SLaT)Xiaohao Cai, Raymond Chan, Mila Nikolova et al. · mila
In this paper, we propose a SLaT (Smoothing, Lifting and Thresholding) method with three stages for multiphase segmentation of color images corrupted by different degradations: noise, information loss, and blur. At the first stage, a convex variant of the Mumford-Shah model is applied to each channel to obtain a smooth image. We show that the model has unique solution under the different degradations. In order to properly handle the color information, the second stage is dimension lifting where we consider a new vector-valued image composed of the restored image and its transform in the secondary color space with additional information. This ensures that even if the first color space has highly correlated channels, we can still have enough information to give good segmentation results. In the last stage, we apply multichannel thresholding to the combined vector-valued image to find the segmentation. The number of phases is only required in the last stage, so users can choose or change it all without the need of solving the previous stages again. Experiments demonstrate that our SLaT method gives excellent results in terms of segmentation quality and CPU time in comparison with other state-of-the-art segmentation methods.
LGFeb 29, 2024
BP-DeepONet: A new method for cuffless blood pressure estimation using the physcis-informed DeepONetLingfeng Li, Xue-Cheng Tai, Raymond Chan
Cardiovascular diseases (CVDs) are the leading cause of death worldwide, with blood pressure serving as a crucial indicator. Arterial blood pressure (ABP) waveforms provide continuous pressure measurements throughout the cardiac cycle and offer valuable diagnostic insights. Consequently, there is a significant demand for non-invasive and cuff-less methods to measure ABP waveforms continuously. Accurate prediction of ABP waveforms can also improve the estimation of mean blood pressure, an essential cardiovascular health characteristic. This study proposes a novel framework based on the physics-informed DeepONet approach to predict ABP waveforms. Unlike previous methods, our approach requires the predicted ABP waveforms to satisfy the Navier-Stokes equation with a time-periodic condition and a Windkessel boundary condition. Notably, our framework is the first to predict ABP waveforms continuously, both with location and time, within the part of the artery that is being simulated. Furthermore, our method only requires ground truth data at the outlet boundary and can handle periodic conditions with varying periods. Incorporating the Windkessel boundary condition in our solution allows for generating natural physical reflection waves, which closely resemble measurements observed in real-world cases. Moreover, accurately estimating the hyper-parameters in the Navier-Stokes equation for our simulations poses a significant challenge. To overcome this obstacle, we introduce the concept of meta-learning, enabling the neural networks to learn these parameters during the training process.
NADec 25, 2024
Variational Bayesian Inference for Tensor Robust Principal Component AnalysisChao Wang, Huiwen Zheng, Raymond Chan et al.
Tensor Robust Principal Component Analysis (TRPCA) holds a crucial position in machine learning and computer vision. It aims to recover underlying low-rank structures and characterizing the sparse structures of noise. Current approaches often encounter difficulties in accurately capturing the low-rank properties of tensors and balancing the trade-off between low-rank and sparse components, especially in a mixed-noise scenario. To address these challenges, we introduce a Bayesian framework for TRPCA, which integrates a low-rank tensor nuclear norm prior and a generalized sparsity-inducing prior. By embedding the proposed priors within the Bayesian framework, our method can automatically determine the optimal tensor nuclear norm and achieve a balance between the nuclear norm and sparse components. Furthermore, our method can be efficiently extended to the weighted tensor nuclear norm model. Experiments conducted on synthetic and real-world datasets demonstrate the effectiveness and superiority of our method compared to state-of-the-art approaches.
CVJan 26, 2024
VJT: A Video Transformer on Joint Tasks of Deblurring, Low-light Enhancement and DenoisingYuxiang Hui, Yang Liu, Yaofang Liu et al.
Video restoration task aims to recover high-quality videos from low-quality observations. This contains various important sub-tasks, such as video denoising, deblurring and low-light enhancement, since video often faces different types of degradation, such as blur, low light, and noise. Even worse, these kinds of degradation could happen simultaneously when taking videos in extreme environments. This poses significant challenges if one wants to remove these artifacts at the same time. In this paper, to the best of our knowledge, we are the first to propose an efficient end-to-end video transformer approach for the joint task of video deblurring, low-light enhancement, and denoising. This work builds a novel multi-tier transformer where each tier uses a different level of degraded video as a target to learn the features of video effectively. Moreover, we carefully design a new tier-to-tier feature fusion scheme to learn video features incrementally and accelerate the training process with a suitable adaptive weighting scheme. We also provide a new Multiscene-Lowlight-Blur-Noise (MLBN) dataset, which is generated according to the characteristics of the joint task based on the RealBlur dataset and YouTube videos to simulate realistic scenes as far as possible. We have conducted extensive experiments, compared with many previous state-of-the-art methods, to show the effectiveness of our approach clearly.
LGMay 25, 2020
Deep Tensor CCA for Multi-view LearningHok Shing Wong, Li Wang, Raymond Chan et al.
We present Deep Tensor Canonical Correlation Analysis (DTCCA), a method to learn complex nonlinear transformations of multiple views (more than two) of data such that the resulting representations are linearly correlated in high order. The high-order correlation of given multiple views is modeled by covariance tensor, which is different from most CCA formulations relying solely on the pairwise correlations. Parameters of transformations of each view are jointly learned by maximizing the high-order canonical correlation. To solve the resulting problem, we reformulate it as the best sum of rank-1 approximation, which can be efficiently solved by existing tensor decomposition method. DTCCA is a nonlinear extension of tensor CCA (TCCA) via deep networks. The transformations of DTCCA are parametric functions, which are very different from implicit mapping in the form of kernel function. Comparing with kernel TCCA, DTCCA not only can deal with arbitrary dimensions of the input data, but also does not need to maintain the training data for computing representations of any given data point. Hence, DTCCA as a unified model can efficiently overcome the scalable issue of TCCA for either high-dimensional multi-view data or a large amount of views, and it also naturally extends TCCA for learning nonlinear representation. Extensive experiments on three multi-view data sets demonstrate the effectiveness of the proposed method.
LGDec 4, 2019
Large-Scale Semi-Supervised Learning via Graph Structure Learning over High-Dense PointsZitong Wang, Li Wang, Raymond Chan et al.
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for a small number of labeled data and a large amount of unlabeled data. Due to the lack of labeled data and the availability of large-scale unlabeled data, existing SSL methods usually encounter either suboptimal performance because of an improper graph or the high computational complexity of the large-scale optimization problem. In this paper, we propose to address both challenging problems by constructing a proper graph for graph-based SSL methods. Different from existing approaches, we simultaneously learn a small set of vertexes to characterize the high-dense regions of the input data and a graph to depict the relationships among these vertexes. A novel approach is then proposed to construct the graph of the input data from the learned graph of a small number of vertexes with some preferred properties. Without explicitly calculating the constructed graph of inputs, two transductive graph-based SSL approaches are presented with the computational complexity in linear with the number of input data. Extensive experiments on synthetic data and real datasets of varied sizes demonstrate that the proposed method is not only scalable for large-scale data, but also achieve good classification performance, especially for extremely small number of labels.
CVOct 10, 2019
Dynamic Spectral Residual SuperpixelsJianchao Zhang, Angelica I. Aviles-Rivero, Daniel Heydecker et al.
We consider the problem of segmenting an image into superpixels in the context of $k$-means clustering, in which we wish to decompose an image into local, homogeneous regions corresponding to the underlying objects. Our novel approach builds upon the widely used Simple Linear Iterative Clustering (SLIC), and incorporate a measure of objects' structure based on the spectral residual of an image. Based on this combination, we propose a modified initialisation scheme and search metric, which helps keeps fine-details. This combination leads to better adherence to object boundaries, while preventing unnecessary segmentation of large, uniform areas, while remaining computationally tractable in comparison to other methods. We demonstrate through numerical and visual experiments that our approach outperforms the state-of-the-art techniques.
NAMay 21, 2019
A Two-stage Classification Method for High-dimensional Data and Point CloudsXiaohao Cai, Raymond Chan, Xiaoyu Xie et al.
High-dimensional data classification is a fundamental task in machine learning and imaging science. In this paper, we propose a two-stage multiphase semi-supervised classification method for classifying high-dimensional data and unstructured point clouds. To begin with, a fuzzy classification method such as the standard support vector machine is used to generate a warm initialization. We then apply a two-stage approach named SaT (smoothing and thresholding) to improve the classification. In the first stage, an unconstraint convex variational model is implemented to purify and smooth the initialization, followed by the second stage which is to project the smoothed partition obtained at stage one to a binary partition. These two stages can be repeated, with the latest result as a new initialization, to keep improving the classification quality. We show that the convex model of the smoothing stage has a unique solution and can be solved by a specifically designed primal-dual algorithm whose convergence is guaranteed. We test our method and compare it with the state-of-the-art methods on several benchmark data sets. The experimental results demonstrate clearly that our method is superior in both the classification accuracy and computation speed for high-dimensional data and point clouds.
NAJul 26, 2018
Linkage between piecewise constant Mumford-Shah model and ROF model and its virtue in image segmentationXiaohao Cai, Raymond Chan, Carola-Bibiane Schonlieb et al.
The piecewise constant Mumford-Shah (PCMS) model and the Rudin-Osher-Fatemi (ROF) model are two important variational models in image segmentation and image restoration, respectively. In this paper, we explore a linkage between these models. We prove that for the two-phase segmentation problem a partial minimizer of the PCMS model can be obtained by thresholding the minimizer of the ROF model. A similar linkage is still valid for multiphase segmentation under specific assumptions. Thus it opens a new segmentation paradigm: image segmentation can be done via image restoration plus thresholding. This new paradigm, which circumvents the innate non-convex property of the PCMS model, therefore improves the segmentation performance in both efficiency (much faster than state-of-the-art methods based on PCMS model, particularly when the phase number is high) and effectiveness (producing segmentation results with better quality) due to the flexibility of the ROF model in tackling degraded images, such as noisy images, blurry images or images with information loss. As a by-product of the new paradigm, we derive a novel segmentation method, called thresholded-ROF (T-ROF) method, to illustrate the virtue of managing image segmentation through image restoration techniques. The convergence of the T-ROF method is proved, and elaborate experimental results and comparisons are presented.