Shiying Li

CV
h-index15
26papers
320citations
Novelty49%
AI Score47

26 Papers

IVSep 18, 2023Code
An Accurate and Efficient Neural Network for OCTA Vessel Segmentation and a New Dataset

Haojian Ning, Chengliang Wang, Xinrun Chen et al.

Optical coherence tomography angiography (OCTA) is a noninvasive imaging technique that can reveal high-resolution retinal vessels. In this work, we propose an accurate and efficient neural network for retinal vessel segmentation in OCTA images. The proposed network achieves accuracy comparable to other SOTA methods, while having fewer parameters and faster inference speed (e.g. 110x lighter and 1.3x faster than U-Net), which is very friendly for industrial applications. This is achieved by applying the modified Recurrent ConvNeXt Block to a full resolution convolutional network. In addition, we create a new dataset containing 918 OCTA images and their corresponding vessel annotations. The data set is semi-automatically annotated with the help of Segment Anything Model (SAM), which greatly improves the annotation speed. For the benefit of the community, our code and dataset can be obtained from https://github.com/nhjydywd/OCTA-FRNet.

CVSep 21, 2023Code
SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks

Chengliang Wang, Xinrun Chen, Haojian Ning et al.

In the analysis of optical coherence tomography angiography (OCTA) images, the operation of segmenting specific targets is necessary. Existing methods typically train on supervised datasets with limited samples (approximately a few hundred), which can lead to overfitting. To address this, the low-rank adaptation technique is adopted for foundation model fine-tuning and proposed corresponding prompt point generation strategies to process various segmentation tasks on OCTA datasets. This method is named SAM-OCTA and has been experimented on the publicly available OCTA-500 dataset. While achieving state-of-the-art performance metrics, this method accomplishes local vessel segmentation as well as effective artery-vein segmentation, which was not well-solved in previous works. The code is available at: https://github.com/ShellRedia/SAM-OCTA.

SPApr 30, 2022Code
End-to-End Signal Classification in Signed Cumulative Distribution Transform Space

Abu Hasnat Mohammad Rubaiyat, Shiying Li, Xuwang Yin et al.

This paper presents a new end-to-end signal classification method using the signed cumulative distribution transform (SCDT). We adopt a transport-based generative model to define the classification problem. We then make use of mathematical properties of the SCDT to render the problem easier in transform domain, and solve for the class of an unknown sample using a nearest local subspace (NLS) search algorithm in SCDT domain. Experiments show that the proposed method provides high accuracy classification results while being data efficient, robust to out-of-distribution samples, and competitive in terms of computational complexity with respect to the deep learning end-to-end classification methods. The implementation of the proposed method in Python language is integrated as a part of the software package PyTransKit (https://github.com/rohdelab/PyTransKit).

LGOct 11, 2023Code
SAM-OCTA: Prompting Segment-Anything for OCTA Image Segmentation

Xinrun Chen, Chengliang Wang, Haojian Ning et al.

Segmenting specific targets or biomarkers is necessary to analyze optical coherence tomography angiography (OCTA) images. Previous methods typically segment all the targets in an OCTA sample, such as retinal vessels (RVs). Although these methods perform well in accuracy and precision, OCTA analyses often focusing local information within the images which has not been fulfilled. In this paper, we propose a method called SAM-OCTA for local segmentation in OCTA images. The method fine-tunes a pre-trained segment anything model (SAM) using low-rank adaptation (LoRA) and utilizes prompt points for local RVs, arteries, and veins segmentation in OCTA. To explore the effect and mechanism of prompt points, we set up global and local segmentation modes with two prompt point generation strategies, namely random selection and special annotation. Considering practical usage, we conducted extended experiments with different model scales and analyzed the model performance before and after fine-tuning besides the general segmentation task. From comprehensive experimental results with the OCTA-500 dataset, our SAM-OCTA method has achieved state-of-the-art performance in common OCTA segmentation tasks related to RV and FAZ, and it also performs accurate segmentation of artery-vein and local vessels. The code is available at https://github.com/ShellRedia/SAM-OCTA-extend.

CVSep 14, 2024Code
SAM-OCTA2: Layer Sequence OCTA Segmentation with Fine-tuned Segment Anything Model 2

Xinrun Chen, Chengliang Wang, Haojian Ning et al.

Segmentation of indicated targets aids in the precise analysis of optical coherence tomography angiography (OCTA) samples. Existing segmentation methods typically perform on 2D projection targets, making it challenging to capture the variance of segmented objects through the 3D volume. To address this limitation, the low-rank adaptation technique is adopted to fine-tune the Segment Anything Model (SAM) version 2, enabling the tracking and segmentation of specified objects across the OCTA scanning layer sequence. To further this work, a prompt point generation strategy in frame sequence and a sparse annotation method to acquire retinal vessel (RV) layer masks are proposed. This method is named SAM-OCTA2 and has been experimented on the OCTA-500 dataset. It achieves state-of-the-art performance in segmenting the foveal avascular zone (FAZ) on regular 2D en-face and effectively tracks local vessels across scanning layer sequences. The code is available at: https://github.com/ShellRedia/SAM-OCTA2.

CVApr 21, 2023
Omni-Line-of-Sight Imaging for Holistic Shape Reconstruction

Binbin Huang, Xingyue Peng, Siyuan Shen et al.

We introduce Omni-LOS, a neural computational imaging method for conducting holistic shape reconstruction (HSR) of complex objects utilizing a Single-Photon Avalanche Diode (SPAD)-based time-of-flight sensor. As illustrated in Fig. 1, our method enables new capabilities to reconstruct near-$360^\circ$ surrounding geometry of an object from a single scan spot. In such a scenario, traditional line-of-sight (LOS) imaging methods only see the front part of the object and typically fail to recover the occluded back regions. Inspired by recent advances of non-line-of-sight (NLOS) imaging techniques which have demonstrated great power to reconstruct occluded objects, Omni-LOS marries LOS and NLOS together, leveraging their complementary advantages to jointly recover the holistic shape of the object from a single scan position. The core of our method is to put the object nearby diffuse walls and augment the LOS scan in the front view with the NLOS scans from the surrounding walls, which serve as virtual ``mirrors'' to trap lights toward the object. Instead of separately recovering the LOS and NLOS signals, we adopt an implicit neural network to represent the object, analogous to NeRF and NeTF. While transients are measured along straight rays in LOS but over the spherical wavefronts in NLOS, we derive differentiable ray propagation models to simultaneously model both types of transient measurements so that the NLOS reconstruction also takes into account the direct LOS measurements and vice versa. We further develop a proof-of-concept Omni-LOS hardware prototype for real-world validation. Comprehensive experiments on various wall settings demonstrate that Omni-LOS successfully resolves shape ambiguities caused by occlusions, achieves high-fidelity 3D scan quality, and manages to recover objects of various scales and complexity.

ITJul 28, 2023
The Radon Signed Cumulative Distribution Transform and its applications in classification of Signed Images

Le Gong, Shiying Li, Naqib Sad Pathan et al.

Here we describe a new image representation technique based on the mathematics of transport and optimal transport. The method relies on the combination of the well-known Radon transform for images and a recent signal representation method called the Signed Cumulative Distribution Transform. The newly proposed method generalizes previous transport-related image representation methods to arbitrary functions (images), and thus can be used in more applications. We describe the new transform, and some of its mathematical properties and demonstrate its ability to partition image classes with real and simulated data. In comparison to existing transport transform methods, as well as deep learning-based classification methods, the new transform more accurately represents the information content of signed images, and thus can be used to obtain higher classification accuracies. The implementation of the proposed method in Python language is integrated as a part of the software package PyTransKit, available on Github.

OPTICSApr 4, 2023
High-resolution tomographic reconstruction of optical absorbance through scattering media using neural fields

Wuwei Ren, Siyuan Shen, Linlin Li et al.

Light scattering imposes a major obstacle for imaging objects seated deeply in turbid media, such as biological tissues and foggy air. Diffuse optical tomography (DOT) tackles scattering by volumetrically recovering the optical absorbance and has shown significance in medical imaging, remote sensing and autonomous driving. A conventional DOT reconstruction paradigm necessitates discretizing the object volume into voxels at a pre-determined resolution for modelling diffuse light propagation and the resulting spatial resolution of the reconstruction is generally limited. We propose NeuDOT, a novel DOT scheme based on neural fields (NF) to continuously encode the optical absorbance within the volume and subsequently bridge the gap between model accuracy and high resolution. Comprehensive experiments demonstrate that NeuDOT achieves submillimetre lateral resolution and resolves complex 3D objects at 14 mm-depth, outperforming the state-of-the-art methods. NeuDOT is a non-invasive, high-resolution and computationally efficient tomographic method, and unlocks further applications of NF involving light scattering.

CVAug 27, 2023
Nonrigid Object Contact Estimation With Regional Unwrapping Transformer

Wei Xie, Zimeng Zhao, Shiying Li et al.

Acquiring contact patterns between hands and nonrigid objects is a common concern in the vision and robotics community. However, existing learning-based methods focus more on contact with rigid ones from monocular images. When adopting them for nonrigid contact, a major problem is that the existing contact representation is restricted by the geometry of the object. Consequently, contact neighborhoods are stored in an unordered manner and contact features are difficult to align with image cues. At the core of our approach lies a novel hand-object contact representation called RUPs (Region Unwrapping Profiles), which unwrap the roughly estimated hand-object surfaces as multiple high-resolution 2D regional profiles. The region grouping strategy is consistent with the hand kinematic bone division because they are the primitive initiators for a composite contact pattern. Based on this representation, our Regional Unwrapping Transformer (RUFormer) learns the correlation priors across regions from monocular inputs and predicts corresponding contact and deformed transformations. Our experiments demonstrate that the proposed framework can robustly estimate the deformed degrees and deformed transformations, which makes it suitable for both nonrigid and rigid contact.

LGJun 4, 2022
Geodesic Properties of a Generalized Wasserstein Embedding for Time Series Analysis

Shiying Li, Abu Hasnat Mohammad Rubaiyat, Gustavo K. Rohde

Transport-based metrics and related embeddings (transforms) have recently been used to model signal classes where nonlinear structures or variations are present. In this paper, we study the geodesic properties of time series data with a generalized Wasserstein metric and the geometry related to their signed cumulative distribution transforms in the embedding space. Moreover, we show how understanding such geometric characteristics can provide added interpretability to certain time series classifiers, and be an inspiration for more robust classifiers.

59.4NAMar 13
Measure transfer via stochastic slicing and matching

Shiying Li, Caroline Moosmueller, Yongzhe Wang

This paper studies iterative schemes for measure transfer and approximation problems, which are defined through a slicing-and-matching procedure. Similar to the sliced Wasserstein distance, these schemes benefit from the availability of closed-form solutions for the one-dimensional optimal transport problem and the associated computational advantages. While such schemes have already been successfully utilized in data science applications, not too many results on their convergence are available. The main contribution of this paper is an almost sure convergence proof for stochastic slicing-and-matching schemes. The proof builds on an interpretation as a stochastic gradient descent scheme on the Wasserstein space. Numerical examples on step-wise image morphing are demonstrated as well.

IVApr 28, 2024Code
Snake with Shifted Window: Learning to Adapt Vessel Pattern for OCTA Segmentation

Xinrun Chen, Mei Shen, Haojian Ning et al.

Segmenting specific targets or structures in optical coherence tomography angiography (OCTA) images is fundamental for conducting further pathological studies. The retinal vascular layers are rich and intricate, and such vascular with complex shapes can be captured by the widely-studied OCTA images. In this paper, we thus study how to use OCTA images with projection vascular layers to segment retinal structures. To this end, we propose the SSW-OCTA model, which integrates the advantages of deformable convolutions suited for tubular structures and the swin-transformer for global feature extraction, adapting to the characteristics of OCTA modality images. Our model underwent testing and comparison on the OCTA-500 dataset, achieving state-of-the-art performance. The code is available at: https://github.com/ShellRedia/Snake-SWin-OCTA.

CVMar 14, 2025Code
TransiT: Transient Transformer for Non-line-of-sight Videography

Ruiqian Li, Siyuan Shen, Suan Xia et al.

High quality and high speed videography using Non-Line-of-Sight (NLOS) imaging benefit autonomous navigation, collision prevention, and post-disaster search and rescue tasks. Current solutions have to balance between the frame rate and image quality. High frame rates, for example, can be achieved by reducing either per-point scanning time or scanning density, but at the cost of lowering the information density at individual frames. Fast scanning process further reduces the signal-to-noise ratio and different scanning systems exhibit different distortion characteristics. In this work, we design and employ a new Transient Transformer architecture called TransiT to achieve real-time NLOS recovery under fast scans. TransiT directly compresses the temporal dimension of input transients to extract features, reducing computation costs and meeting high frame rate requirements. It further adopts a feature fusion mechanism as well as employs a spatial-temporal Transformer to help capture features of NLOS transient videos. Moreover, TransiT applies transfer learning to bridge the gap between synthetic and real-measured data. In real experiments, TransiT manages to reconstruct from sparse transients of $16 \times 16$ measured at an exposure time of 0.4 ms per point to NLOS videos at a $64 \times 64$ resolution at 10 frames per second. We will make our code and dataset available to the community.

CVJan 9, 2022Code
Invariance encoding in sliced-Wasserstein space for image classification with limited training data

Mohammad Shifat E Rabbi, Yan Zhuang, Shiying Li et al.

Deep convolutional neural networks (CNNs) are broadly considered to be state-of-the-art generic end-to-end image classification systems. However, they are known to underperform when training data are limited and thus require data augmentation strategies that render the method computationally expensive and not always effective. Rather than using a data augmentation strategy to encode invariances as typically done in machine learning, here we propose to mathematically augment a nearest subspace classification model in sliced-Wasserstein space by exploiting certain mathematical properties of the Radon Cumulative Distribution Transform (R-CDT), a recently introduced image transform. We demonstrate that for a particular type of learning problem, our mathematical solution has advantages over data augmentation with deep CNNs in terms of classification accuracy and computational complexity, and is particularly effective under a limited training data setting. The method is simple, effective, computationally efficient, non-iterative, and requires no parameters to be tuned. Python code implementing our method is available at https://github.com/rohdelab/mathematical_augmentation. Our method is integrated as a part of the software package PyTransKit, which is available at https://github.com/rohdelab/PyTransKit.

CVApr 7, 2020Code
Radon cumulative distribution transform subspace modeling for image classification

Mohammad Shifat-E-Rabbi, Xuwang Yin, Abu Hasnat Mohammad Rubaiyat et al.

We present a new supervised image classification method applicable to a broad class of image deformation models. The method makes use of the previously described Radon Cumulative Distribution Transform (R-CDT) for image data, whose mathematical properties are exploited to express the image data in a form that is more suitable for machine learning. While certain operations such as translation, scaling, and higher-order transformations are challenging to model in native image space, we show the R-CDT can capture some of these variations and thus render the associated image classification problems easier to solve. The method -- utilizing a nearest-subspace algorithm in R-CDT space -- is simple to implement, non-iterative, has no hyper-parameters to tune, is computationally efficient, label efficient, and provides competitive accuracies to state-of-the-art neural networks for many types of classification problems. In addition to the test accuracy performances, we show improvements (with respect to neural network-based methods) in terms of computational efficiency (it can be implemented without the use of GPUs), number of training samples needed for training, as well as out-of-distribution generalization. The Python code for reproducing our results is available at https://github.com/rohdelab/rcdt_ns_classifier.

NAOct 16, 2023
Approximation properties of slice-matching operators

Shiying Li, Caroline Moosmueller

Iterative slice-matching procedures are efficient schemes for transferring a source measure to a target measure, especially in high dimensions. These schemes have been successfully used in applications such as color transfer and shape retrieval, and are guaranteed to converge under regularity assumptions. In this paper, we explore approximation properties related to a single step of such iterative schemes by examining an associated slice-matching operator, depending on a source measure, a target measure, and slicing directions. In particular, we demonstrate an invariance property with respect to the source measure, an equivariance property with respect to the target measure, and Lipschitz continuity concerning the slicing directions. We furthermore establish error bounds corresponding to approximating the target measure by one step of the slice-matching scheme and characterize situations in which the slice-matching operator recovers the optimal transport map between two measures. We also investigate connections to affine registration problems with respect to (sliced) Wasserstein distances. These connections can be also be viewed as extensions to the invariance and equivariance properties of the slice-matching operator and illustrate the extent to which slice-matching schemes incorporate affine effects.

MGNov 4, 2024
Metric properties of partial and robust Gromov-Wasserstein distances

Jannatul Chhoa, Michael Ivanitskiy, Fushuai Jiang et al.

The Gromov-Wasserstein (GW) distances define a family of metrics, based on ideas from optimal transport, which enable comparisons between probability measures defined on distinct metric spaces. They are particularly useful in areas such as network analysis and geometry processing, as computation of a GW distance involves solving for registration between the objects which minimizes geometric distortion. Although GW distances have proven useful for various applications in the recent machine learning literature, it has been observed that they are inherently sensitive to outlier noise and cannot accommodate partial matching. This has been addressed by various constructions building on the GW framework; in this article, we focus specifically on a natural relaxation of the GW optimization problem, introduced by Chapel et al., which is aimed at addressing exactly these shortcomings. Our goal is to understand the theoretical properties of this relaxed optimization problem, from the viewpoint of metric geometry. While the relaxed problem fails to induce a metric, we derive precise characterizations of how it fails the axioms of non-degeneracy and triangle inequality. These observations lead us to define a novel family of distances, whose construction is inspired by the Prokhorov and Ky Fan distances, as well as by the recent work of Raghvendra et al.\ on robust versions of classical Wasserstein distance. We show that our new distances define true metrics, that they induce the same topology as the GW distances, and that they enjoy additional robustness to perturbations. These results provide a mathematically rigorous basis for using our robust partial GW distances in applications where outliers and partial matching are concerns.

CVMar 15, 2024
Linear optimal transport subspaces for point set classification

Mohammad Shifat E Rabbi, Naqib Sad Pathan, Shiying Li et al.

Learning from point sets is an essential component in many computer vision and machine learning applications. Native, unordered, and permutation invariant set structure space is challenging to model, particularly for point set classification under spatial deformations. Here we propose a framework for classifying point sets experiencing certain types of spatial deformations, with a particular emphasis on datasets featuring affine deformations. Our approach employs the Linear Optimal Transport (LOT) transform to obtain a linear embedding of set-structured data. Utilizing the mathematical properties of the LOT transform, we demonstrate its capacity to accommodate variations in point sets by constructing a convex data space, effectively simplifying point set classification problems. Our method, which employs a nearest-subspace algorithm in the LOT space, demonstrates label efficiency, non-iterative behavior, and requires no hyper-parameter tuning. It achieves competitive accuracies compared to state-of-the-art methods across various point set classification tasks. Furthermore, our approach exhibits robustness in out-of-distribution scenarios where training and test distributions vary in terms of deformation magnitudes.

CVJun 10, 2025
MARMOT: Masked Autoencoder for Modeling Transient Imaging

Siyuan Shen, Ziheng Wang, Xingyue Peng et al.

Pretrained models have demonstrated impressive success in many modalities such as language and vision. Recent works facilitate the pretraining paradigm in imaging research. Transients are a novel modality, which are captured for an object as photon counts versus arrival times using a precisely time-resolved sensor. In particular for non-line-of-sight (NLOS) scenarios, transients of hidden objects are measured beyond the sensor's direct line of sight. Using NLOS transients, the majority of previous works optimize volume density or surfaces to reconstruct the hidden objects and do not transfer priors learned from datasets. In this work, we present a masked autoencoder for modeling transient imaging, or MARMOT, to facilitate NLOS applications. Our MARMOT is a self-supervised model pretrianed on massive and diverse NLOS transient datasets. Using a Transformer-based encoder-decoder, MARMOT learns features from partially masked transients via a scanning pattern mask (SPM), where the unmasked subset is functionally equivalent to arbitrary sampling, and predicts full measurements. Pretrained on TransVerse-a synthesized transient dataset of 500K 3D models-MARMOT adapts to downstream imaging tasks using direct feature transfer or decoder finetuning. Comprehensive experiments are carried out in comparisons with state-of-the-art methods. Quantitative and qualitative results demonstrate the efficiency of our MARMOT.

CVApr 30, 2025
VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction

Shiying Li, Xingqun Qi, Bingkun Yang et al.

Generating responsive listener head dynamics with nuanced emotions and expressive reactions is crucial for practical dialogue modeling in various virtual avatar animations. Previous studies mainly focus on the direct short-term production of listener behavior. They overlook the fine-grained control over motion variations and emotional intensity, especially in long-sequence modeling. Moreover, the lack of long-term and large-scale paired speaker-listener corpora including head dynamics and fine-grained multi-modality annotations (e.g., text-based expression descriptions, emotional intensity) also limits the application of dialogue modeling.Therefore, we first newly collect a large-scale multi-turn dataset of 3D dyadic conversation containing more than 1.4M valid frames for multi-modal responsive interaction, dubbed ListenerX. Additionally, we propose VividListener, a novel framework enabling fine-grained, expressive and controllable listener dynamics modeling. This framework leverages multi-modal conditions as guiding principles for fostering coherent interactions between speakers and listeners.Specifically, we design the Responsive Interaction Module (RIM) to adaptively represent the multi-modal interactive embeddings. RIM ensures the listener dynamics achieve fine-grained semantic coordination with textual descriptions and adjustments, while preserving expressive reaction with speaker behavior. Meanwhile, we design the Emotional Intensity Tags (EIT) for emotion intensity editing with multi-modal information integration, applying to both text descriptions and listener motion amplitude.Extensive experiments conducted on our newly collected ListenerX dataset demonstrate that VividListener achieves state-of-the-art performance, realizing expressive and controllable listener dynamics.

CVFeb 22, 2022
Local Sliced-Wasserstein Feature Sets for Illumination-invariant Face Recognition

Yan Zhuang, Shiying Li, Mohammad Shifat-E-Rabbi et al.

We present a new method for face recognition from digital images acquired under varying illumination conditions. The method is based on mathematical modeling of local gradient distributions using the Radon Cumulative Distribution Transform (R-CDT). We demonstrate that lighting variations cause certain types of deformations of local image gradient distributions which, when expressed in R-CDT domain, can be modeled as a subspace. Face recognition is then performed using a nearest subspace in R-CDT domain of local gradient distributions. Experiment results demonstrate the proposed method outperforms other alternatives in several face recognition tasks with challenging illumination conditions. Python code implementing the proposed method is available, which is integrated as a part of the software package PyTransKit.

IVDec 29, 2021
Onsite Non-Line-of-Sight Imaging via Online Calibrations

Zhengqing Pan, Ruiqian Li, Tian Gao et al.

There has been an increasing interest in deploying non-line-of-sight (NLOS) imaging systems for recovering objects behind an obstacle. Existing solutions generally pre-calibrate the system before scanning the hidden objects. Onsite adjustments of the occluder, object and scanning pattern require re-calibration. We present an online calibration technique that directly decouples the acquired transients at onsite scanning into the LOS and hidden components. We use the former to directly (re)calibrate the system upon changes of scene/obstacle configurations, scanning regions, and scanning patterns whereas the latter for hidden object recovery via spatial, frequency or learning based techniques. Our technique avoids using auxiliary calibration apparatus such as mirrors or checkerboards and supports both laboratory validations and real-world deployments.

IVJan 2, 2021
Non-line-of-Sight Imaging via Neural Transient Fields

Siyuan Shen, Zi Wang, Ping Liu et al.

We present a neural modeling framework for Non-Line-of-Sight (NLOS) imaging. Previous solutions have sought to explicitly recover the 3D geometry (e.g., as point clouds) or voxel density (e.g., within a pre-defined volume) of the hidden scene. In contrast, inspired by the recent Neural Radiance Field (NeRF) approach, we use a multi-layer perceptron (MLP) to represent the neural transient field or NeTF. However, NeTF measures the transient over spherical wavefronts rather than the radiance along lines. We therefore formulate a spherical volume NeTF reconstruction pipeline, applicable to both confocal and non-confocal setups. Compared with NeRF, NeTF samples a much sparser set of viewpoints (scanning spots) and the sampling is highly uneven. We thus introduce a Monte Carlo technique to improve the robustness in the reconstruction. Comprehensive experiments on synthetic and real datasets demonstrate NeTF provides higher quality reconstruction and preserves fine details largely missing in the state-of-the-art.

LGDec 11, 2020
Learning Energy-Based Models With Adversarial Training

Xuwang Yin, Shiying Li, Gustavo K. Rohde

We study a new approach to learning energy-based models (EBMs) based on adversarial training (AT). We show that (binary) AT learns a special kind of energy function that models the support of the data distribution, and the learning process is closely related to MCMC-based maximum likelihood learning of EBMs. We further propose improved techniques for generative modeling with AT, and demonstrate that this new approach is capable of generating diverse and realistic images. Aside from having competitive image generation performance to explicit EBMs, the studied approach is stable to train, is well-suited for image translation tasks, and exhibits strong out-of-distribution adversarial robustness. Our results demonstrate the viability of the AT approach to generative modeling, suggesting that AT is a competitive alternative approach to learning EBMs.

LGAug 8, 2020
Partitioning signal classes using transport transforms for data analysis and machine learning

Akram Aldroubi, Shiying Li, Gustavo K. Rohde

A relatively new set of transport-based transforms (CDT, R-CDT, LOT) have shown their strength and great potential in various image and data processing tasks such as parametric signal estimation, classification, cancer detection among many others. It is hence worthwhile to elucidate some of the mathematical properties that explain the successes of these transforms when they are used as tools in data analysis, signal processing or data classification. In particular, we give conditions under which classes of signals that are created by algebraic generative models are transformed into convex sets by the transport transforms. Such convexification of the classes simplify the classification and other data analysis and processing problems when viewed in the transform domain. More specifically, we study the extent and limitation of the convexification ability of these transforms under an algebraic generative modeling framework. We hope that this paper will serve as an introduction to these transforms and will encourage mathematicians and other researchers to further explore the theoretical underpinnings and algorithmic tools that will help understand the successes of these transforms and lay the groundwork for further successful applications.

CVMar 7, 2019
Hair Segmentation on Time-of-Flight RGBD Images

Yuanxi Ma, Cen Wang, Shiying Li et al.

Robust segmentation of hair from portrait images remains challenging: hair does not conform to a uniform shape, style or even color; dark hair in particular lacks features. We present a novel computational imaging solution that tackles the problem from both input and processing fronts. We explore using Time-of-Flight (ToF) RGBD sensors on recent mobile devices. We first conduct a comprehensive analysis to show that scattering and inter-reflection cause different noise patterns on hair vs. non-hair regions on ToF images, by changing the light path and/or combining multiple paths. We then develop a deep network based approach that employs both ToF depth map and the RGB gradient maps to produce an initial hair segmentation with labeled hair components. We then refine the result by imposing ToF noise prior under the conditional random field. We collect the first ToF RGBD hair dataset with 20k+ head images captured on 30 human subjects with a variety of hairstyles at different view angles. Comprehensive experiments show that our approach outperforms the RGB based techniques in accuracy and robustness and can handle traditionally challenging cases such as dark hair, similar hair/background, similar hair/foreground, etc.