DATA-ANMar 20, 2017
Entropy-based Time-Varying Window Width Selection for Nonlinear type Time-Frequency AnalysisYae-lin Sheu, Liang-Yan Hsu, Pi-Tai Chou et al.
We propose a time-varying optimal window width (TVOWW) selection scheme to optimize the performance of several nonlinear-type time-frequency analyses, including the reassignment method, and the synchrosqueezing transform (SST) and its variations. A window rendering the most concentrated distribution in the time-frequency representation (TFR) is regarded as the optimal window. The TVOWW selection scheme is particularly useful for signals that comprise fast-varying instantaneous frequencies and small spectral gaps. To demonstrate the efficacy of the method, in addition to analyzing a synthetic signal, we study an atomic time-varying dipole moment driven by two-color mid-infrared laser fields in attosecond physics.
CADec 5, 2012
The Synchrosqueezing algorithm for time-varying spectral analysis: robustness properties and new paleoclimate applicationsGaurav Thakur, Eugene Brevdo, Neven S. Fučkar et al.
We analyze the stability properties of the Synchrosqueezing transform, a time-frequency signal analysis method that can identify and extract oscillatory components with time-varying frequency and amplitude. We show that Synchrosqueezing is robust to bounded perturbations of the signal and to Gaussian white noise. These results justify its applicability to noisy or nonuniformly sampled data that is ubiquitous in engineering and the natural sciences. We also describe a practical implementation of Synchrosqueezing and provide guidance on tuning its main parameters. As a case study in the geosciences, we examine characteristics of a key paleoclimate change in the last 2.5 million years, where Synchrosqueezing provides significantly improved insights.
NAJan 15, 2012
Synchrosqueezing-based Recovery of Instantaneous Frequency from Nonuniform SamplesGaurav Thakur, Hau-Tieng Wu
We propose a new approach for studying the notion of the instantaneous frequency of a signal. We build on ideas from the Synchrosqueezing theory of Daubechies, Lu and Wu and consider a variant of Synchrosqueezing, based on the short-time Fourier transform, to precisely define the instantaneous frequencies of a multi-component AM-FM signal. We describe an algorithm to recover these instantaneous frequencies from the uniform or nonuniform samples of the signal and show that our method is robust to noise. We also consider an alternative approach based on the conventional, Hilbert transform-based notion of instantaneous frequency to compare to our new method. We use these methods on several test cases and apply our results to a signal analysis problem in electrocardiography.
NAJun 21, 2016
Carrier frequencies, holomorphy and unwindingRonald R. Coifman, Stefan Steinerberger, Hau-tieng Wu
We prove that functions of intrinsic-mode type (a classical models for signals) behave essentially like holomorphic functions: adding a pure carrier frequency $e^{int}$ ensures that the anti-holomorphic part is much smaller than the holomorphic part $ \| P_{-}(f)\|_{L^2} \ll \|P_{+}(f)\|_{L^2}.$ This enables us to use techniques from complex analysis, in particular the \textit{unwinding series}. We study its stability and convergence properties and show that the unwinding series can stabilize and show that the unwinding series can provide a high resolution time-frequency representation, which is robust to noise.
NAJul 23, 2012
Instantaneous frequency and wave shape functions (I)Hau-tieng Wu
Although one can formulate an intuitive notion of instantaneous frequency, generalizing "frequency" as we understand it in e.g. the Fourier transform, a rigorous mathematical definition is lacking. In this paper, we consider a class of functions composed of waveforms that repeat nearly periodically, and for which the instantaneous frequency can be given a rigorous meaning. We show that Synchrosqueezing can be used to determine the instantaneous frequency of functions in this class, even if the waveform is not harmonic, thus generalizing earlier results for cosine wave functions. We also provide real-life examples and discuss the advantages, for these examples, of considering such non-harmonic waveforms.
STJul 30, 2012
Local Linear Regression on Manifolds and its Geometric InterpretationMing-Yen Cheng, Hau-tieng Wu
High-dimensional data analysis has been an active area, and the main focuses have been variable selection and dimension reduction. In practice, it occurs often that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this paper is regression and gradient estimation on the manifold, and another is developing a new tool for manifold learning. To the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension $d$ of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computation time when the ambient space dimension $p\gg d$. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and non-uniform sampling effects. A bandwidth selector that can handle heteroscedastic errors is proposed. To the second aim, we analyze carefully the behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace-Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. Simulation studies and the Isomap face data example are used to illustrate the computational speed and estimation accuracy of our methods.
NAMar 20, 2016
Convex Optimization approach to signals with fast varying instantaneous frequencyMatthieu Kowalski, Adrien Meynard, Hau-tieng Wu
Motivated by the limitation of analyzing oscillatory signals composed of multiple components with fast-varying instantaneous frequency, we approach the time-frequency analysis problem by optimization. Based on the proposed adaptive harmonic model, the time-frequency representation of a signal is obtained by directly minimizing a functional, which involves few properties an "ideal time-frequency representation" should satisfy, for example, the signal reconstruction and concentrative time frequency representation. FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) is applied to achieve an efficient numerical approximation of the functional. We coin the algorithm as {\it Time-frequency bY COnvex OptimizatioN} (Tycoon). The numerical results confirm the potential of the Tycoon algorithm.
STMar 19, 2013
Nonparametric and adaptive modeling of dynamic seasonality and trend with heteroscedastic and dependent errorsYu-Chun Chen, Ming-Yen Cheng, Hau-tieng Wu
Seasonality (or periodicity) and trend are features describing an observed sequence, and extracting these features is an important issue in many scientific fields. However, it is not an easy task for existing methods to analyze simultaneously the trend and {\it dynamics} of the seasonality such as time-varying frequency and amplitude, and the {\it adaptivity} of the analysis to such dynamics and robustness to heteroscedastic, dependent errors is not guaranteed. These tasks become even more challenging when there exist multiple seasonal components. We propose a nonparametric model to describe the dynamics of multi-component seasonality, and investigate the recently developed Synchrosqueezing transform (SST) in extracting these features in the presence of a trend and heteroscedastic, dependent errors. The identifiability problem of the nonparametric seasonality model is studied, and the adaptivity and robustness properties of the SST are theoretically justified in both discrete- and continuous-time settings. Consequently we have a new technique for de-coupling the trend, seasonality and heteroscedastic, dependent error process in a general nonparametric setup. Results of a series of simulations are provided, and the incidence time series of varicella and herpes zoster in Taiwan and respiratory signals observed from a sleep study are analyzed.
MLAug 17, 2022
Geometric Scattering on Measure SpacesJoyce Chew, Matthew Hirn, Smita Krishnaswamy et al.
The scattering transform is a multilayered, wavelet-based transform initially introduced as a model of convolutional neural networks (CNNs) that has played a foundational role in our understanding of these networks' stability and invariance properties. Subsequently, there has been widespread interest in extending the success of CNNs to data sets with non-Euclidean structure, such as graphs and manifolds, leading to the emerging field of geometric deep learning. In order to improve our understanding of the architectures used in this new field, several papers have proposed generalizations of the scattering transform for non-Euclidean data structures such as undirected graphs and compact Riemannian manifolds without boundary. In this paper, we introduce a general, unified model for geometric scattering on measure spaces. Our proposed framework includes previous work on geometric scattering as special cases but also applies to more general settings such as directed graphs, signed graphs, and manifolds with boundary. We propose a new criterion that identifies to which groups a useful representation should be invariant and show that this criterion is sufficient to guarantee that the scattering transform has desirable stability and invariance properties. Additionally, we consider finite measure spaces that are obtained from randomly sampling an unknown manifold. We propose two methods for constructing a data-driven graph on which the associated graph scattering transform approximates the scattering transform on the underlying manifold. Moreover, we use a diffusion-maps based approach to prove quantitative estimates on the rate of convergence of one of these approximations as the number of sample points tends to infinity. Lastly, we showcase the utility of our method on spherical images, directed graphs, and on high-dimensional single-cell data.
LGJun 21, 2022
The Manifold Scattering Transform for High-Dimensional Point Cloud DataJoyce Chew, Holly R. Steach, Siddharth Viswanath et al.
The manifold scattering transform is a deep feature extractor for data defined on a Riemannian manifold. It is one of the first examples of extending convolutional neural network-like operators to general manifolds. The initial work on this model focused primarily on its theoretical stability and invariance properties but did not provide methods for its numerical implementation except in the case of two-dimensional surfaces with predefined meshes. In this work, we present practical schemes, based on the theory of diffusion maps, for implementing the manifold scattering transform to datasets arising in naturalistic systems, such as single cell genetics, where the data is a high-dimensional point cloud modeled as lying on a low-dimensional manifold. We show that our methods are effective for signal classification and manifold classification tasks.
APDec 15, 2015
When interpolation-induced reflection artifact meets time-frequency analysisYu-Ting Lin, Patrick Flandrin, Hau-tieng Wu
While extracting the temporal dynamical features based on the time-frequency analyses, like the reassignment and synchrosqueezing transform, attracts more and more interest in bio-medical data analysis, we should be careful about artifacts generated by interpolation schemes, in particular when the sampling rate is not significantly higher than the frequency of the oscillatory component we are interested in. In this study, we formulate the problem called the reflection effect and provide a theoretical justification of the statement. We also show examples in the anesthetic depth analysis with clear but undesirable artifacts. The results show that the artifact associated with the reflection effect exists not only theoretically but practically. Its influence is pronounced when we apply the time-frequency analyses to extract the time-varying dynamics hidden inside the signal. In conclusion, we have to carefully deal with the artifact associated with the reflection effect by choosing a proper interpolation scheme.
MLFeb 11
Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with NoiseXiucai Ding, Chao Shen, Hau-Tieng Wu
Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion geometry framework for integrating multiple noisy data sources. The key innovation of GRAB-MDM is a {view}-dependent bandwidth selection strategy that adapts to the geometry and noise level of each view, enabling a stable and principled construction of multiview diffusion operators. Under a common-manifold model, we establish asymptotic convergence results and show that the adaptive bandwidths lead to provably robust recovery of the shared intrinsic structure, even when noise levels and sensor dimensions differ across views. Numerical experiments demonstrate that GRAB-MDM significantly improves robustness and embedding quality compared with fixed-bandwidth and equal-bandwidth baselines, and usually outperform existing algorithms. The proposed framework offers a practical and theoretically grounded solution for multiview sensor fusion in high-dimensional noisy environments.
CVFeb 14, 2019Code
Non-contact photoplethysmogram and instantaneous heart rate estimation from infrared face videoNatalia Martinez, Martin Bertran, Guillermo Sapiro et al.
Extracting the instantaneous heart rate (iHR) from face videos has been well studied in recent years. It is well known that changes in skin color due to blood flow can be captured using conventional cameras. One of the main limitations of methods that rely on this principle is the need of an illumination source. Moreover, they have to be able to operate under different light conditions. One way to avoid these constraints is using infrared cameras, allowing the monitoring of iHR under low light conditions. In this work, we present a simple, principled signal extraction method that recovers the iHR from infrared face videos. We tested the procedure on 7 participants, for whom we recorded an electrocardiogram simultaneously with their infrared face video. We checked that the recovered signal matched the ground truth iHR, showing that infrared is a promising alternative to conventional video imaging for heart rate monitoring, especially in low light conditions. Code is available at https://github.com/natalialmg/IR_iHR
NCOct 27, 2023
Unveil Sleep Spindles with Concentration of Frequency and TimeRiki Shimizu, Hau-Tieng Wu
Objective: Sleep spindles contain crucial brain dynamics information. We introduce the novel non-linear time-frequency analysis tool 'Concentration of Frequency and Time' (ConceFT) to create an interpretable automated algorithm for sleep spindle annotation in EEG data and to measure spindle instantaneous frequencies (IFs). Methods: ConceFT effectively reduces stochastic EEG influence, enhancing spindle visibility in the time-frequency representation. Our automated spindle detection algorithm, ConceFT-Spindle (ConceFT-S), is compared to A7 (non-deep learning) and SUMO (deep learning) using Dream and MASS benchmark databases. We also quantify spindle IF dynamics. Results: ConceFT-S achieves F1 scores of 0.749 in Dream and 0.786 in MASS, which is equivalent to or surpass A7 and SUMO with statistical significance. We reveal that spindle IF is generally nonlinear. Conclusion: ConceFT offers an accurate, interpretable EEG-based sleep spindle detection algorithm and enables spindle IF quantification.
MLMar 22
Accelerate Vector Diffusion Maps by LandmarksSing-Yuan Yeh, Yi-An Wu, Hau-Tieng Wu et al.
We propose a landmark-constrained algorithm, LA-VDM (Landmark Accelerated Vector Diffusion Maps), to accelerate the Vector Diffusion Maps (VDM) framework built upon the Graph Connection Laplacian (GCL), which captures pairwise connection relationships within complex datasets. LA-VDM introduces a novel two-stage normalization that effectively address nonuniform sampling densities in both the data and the landmark sets. Under a manifold model with the frame bundle structure, we show that we can accurately recover the parallel transport with landmark-constrained diffusion from a point cloud, and hence asymptotically LA-VDM converges to the connection Laplacian. The performance and accuracy of LA-VDM are demonstrated through experiments on simulated datasets and an application to nonlocal image denoising.
MLJan 31, 2024
Convergence analysis of t-SNE as a gradient flow for point cloud on a manifoldSeonghyeon Jeong, Hau-Tieng Wu
We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iteration with Kullback-Leibler (KL) divergence as the objective function, aiming to identify a set of points that closely resemble the original data points in a high-dimensional space, minimizing KL divergence. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. Demonstrating that points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer for KL divergence.
MLJan 8, 2024
Design a Metric Robust to Complicated High Dimensional Noise for Efficient Manifold DenoisingHau-Tieng Wu
In this manuscript, we propose an efficient manifold denoiser based on landmark diffusion and optimal shrinkage under the complicated high dimensional noise and compact manifold setup. It is flexible to handle several setups, including the high ambient space dimension with a manifold embedding that occupies a subspace of high or low dimensions, and the noise could be colored and dependent. A systematic comparison with other existing algorithms on both simulated and real datasets is provided. This manuscript is mainly algorithmic and we report several existing tools and numerical results. Theoretical guarantees and more comparisons will be reported in the official paper of this manuscript.
LGApr 29, 2024
Landmark Alternating DiffusionSing-Yuan Yeh, Hau-Tieng Wu, Ronen Talmon et al.
Alternating Diffusion (AD) is a commonly applied diffusion-based sensor fusion algorithm. While it has been successfully applied to various problems, its computational burden remains a limitation. Inspired by the landmark diffusion idea considered in the Robust and Scalable Embedding via Landmark Diffusion (ROSELAND), we propose a variation of AD, called Landmark AD (LAD), which captures the essence of AD while offering superior computational efficiency. We provide a series of theoretical analyses of LAD under the manifold setup and apply it to the automatic sleep stage annotation problem with two electroencephalogram channels to demonstrate its application.
SPOct 8, 2025
Time-Frequency Filtering Meets Graph ClusteringMarcelo A. Colominas, Stefan Steinerberger, Hau-Tieng Wu
We show that the problem of identifying different signal components from a time-frequency representation can be equivalently phrased as a graph clustering problem: given a graph $G=(V,E)$ one aims to identify `clusters', subgraphs that are strongly connected and have relatively few connections between them. The graph clustering problem is well studied, we show how these ideas can suggest (many) new ways to identify signal components. Numerical experiments illustrate the ideas.
LGNov 12, 2024
Sleep Staging from Airflow Signals Using Fourier Approximations of Persistence CurvesShashank Manjunath, Hau-Tieng Wu, Aarti Sathyanarayana
Sleep staging is a challenging task, typically manually performed by sleep technologists based on electroencephalogram and other biosignals of patients taken during overnight sleep studies. Recent work aims to leverage automated algorithms to perform sleep staging not based on electroencephalogram signals, but rather based on the airflow signals of subjects. Prior work uses ideas from topological data analysis (TDA), specifically Hermite function expansions of persistence curves (HEPC) to featurize airflow signals. However, finite order HEPC captures only partial information. In this work, we propose Fourier approximations of persistence curves (FAPC), and use this technique to perform sleep staging based on airflow signals. We analyze performance using an XGBoost model on 1155 pediatric sleep studies taken from the Nationwide Children's Hospital Sleep DataBank (NCHSDB), and find that FAPC methods provide complimentary information to HEPC methods alone, leading to a 4.9% increase in performance over baseline methods.
MLJan 21, 2022
Spatiotemporal Analysis Using Riemannian Composition of Diffusion OperatorsTal Shnitzer, Hau-Tieng Wu, Ronen Talmon
Multivariate time-series have become abundant in recent years, as many data-acquisition systems record information through multiple sensors simultaneously. In this paper, we assume the variables pertain to some geometry and present an operator-based approach for spatiotemporal analysis. Our approach combines three components that are often considered separately: (i) manifold learning for building operators representing the geometry of the variables, (ii) Riemannian geometry of symmetric positive-definite matrices for multiscale composition of operators corresponding to different time samples, and (iii) spectral analysis of the composite operators for extracting different dynamic modes. We propose a method that is analogous to the classical wavelet analysis, which we term Riemannian multi-resolution analysis (RMRA). We provide some theoretical results on the spectral analysis of the composite operators, and we demonstrate the proposed method on simulations and on real data.
HCJan 6, 2022
Predicting Trust Using Automated Assessment of Multivariate Interactional SynchronyAdrien Meynard, Gayan Seneviratna, Elliot Doyle et al.
Diverse disciplines are interested in how the coordination of interacting agents' movements, emotions, and physiology over time impacts social behavior. Here, we describe a new multivariate procedure for automating the investigation of this kind of behaviorally-relevant "interactional synchrony", and introduce a novel interactional synchrony measure based on features of dynamic time warping (DTW) paths. We demonstrate that our DTW path-based measure of interactional synchrony between facial action units of two people interacting freely in a natural social interaction can be used to predict how much trust they will display in a subsequent Trust Game. We also show that our approach outperforms univariate head movement models, models that consider participants' facial action units independently, and models that use previously proposed synchrony or similarity measures. The insights of this work can be applied to any research question that aims to quantify the temporal coordination of multiple signals over time, but has immediate applications in psychology, medicine, and robotics.
MLNov 22, 2021
How do kernel-based sensor fusion algorithms behave under high dimensional noise?Xiucai Ding, Hau-Tieng Wu
We study the behavior of two kernel based sensor fusion algorithms, nonparametric canonical correlation analysis (NCCA) and alternating diffusion (AD), under the nonnull setting that the clean datasets collected from two sensors are modeled by a common low dimensional manifold embedded in a high dimensional Euclidean space and the datasets are corrupted by high dimensional noise. We establish the asymptotic limits and convergence rates for the eigenvalues of the associated kernel matrices assuming that the sample dimension and sample size are comparably large, where NCCA and AD are conducted using the Gaussian kernel. It turns out that both the asymptotic limits and convergence rates depend on the signal-to-noise ratio (SNR) of each sensor and selected bandwidths. On one hand, we show that if NCCA and AD are directly applied to the noisy point clouds without any sanity check, it may generate artificial information that misleads scientists' interpretation. On the other hand, we prove that if the bandwidths are selected adequately, both NCCA and AD can be made robust to high dimensional noise when the SNRs are relatively large.
QMSep 21, 2021
Arterial blood pressure waveform in liver transplant surgery possesses variability of morphology reflecting recipients' acuity and predicting short term outcomesShen-Chih Wang, Chien-Kun Ting, Cheng-Yen Chen et al.
Background: We investigated clinical information underneath the beat-to-beat fluctuation of the arterial blood pressure (ABP) waveform morphology. We proposed the Dynamical Diffusion Map algorithm (DDMap) to quantify the variability of morphology. The underlying physiology could be the compensatory mechanisms involving complex interactions between various physiological mechanisms to regulate the cardiovascular system. As a liver transplant surgery contains distinct periods, we investigated its clinical behavior in different surgical steps. Methods: Our study used DDmap algorithm, based on unsupervised manifold learning, to obtain a quantitative index for the beat-to-beat variability of morphology. We examined the correlation between the variability of ABP morphology and disease acuity as indicated by Model for End-Stage Liver Disease (MELD) scores, the postoperative laboratory data, and 4 early allograft failure (EAF) scores. Results: Among the 85 enrolled patients, the variability of morphology obtained during the presurgical phase was best correlated with MELD-Na scores. The neohepatic phase variability of morphology was associated with EAF scores as well as postoperative bilirubin levels, international normalized ratio, aspartate aminotransferase levels, and platelet count. Furthermore, variability of morphology presents more associations with the above clinical conditions than the common BP measures and their BP variability indices. Conclusions: The variability of morphology obtained during the presurgical phase is indicative of patient acuity, whereas those during the neohepatic phase are indicative of short-term surgical outcomes.
MLNov 21, 2020
Central and Non-central Limit Theorems arising from the Scattering Transform and its Neural Activation GeneralizationGi-Ren Liu, Yuan-Chung Sheu, Hau-Tieng Wu
Motivated by analyzing complicated and non-stationary time series, we study a generalization of the scattering transform (ST) that includes broad neural activation functions, which is called neural activation ST (NAST). On the whole, NAST is a transform that comprises a sequence of ``neural processing units'', each of which applies a high pass filter to the input from the previous layer followed by a composition with a nonlinear function as the output to the next neuron. Here, the nonlinear function models how a neuron gets excited by the input signal. In addition to showing properties like non-expansion, horizontal translational invariability and insensitivity to local deformation, the statistical properties of the second order NAST of a Gaussian process with various dependence and (non-)stationarity structure and its interaction with the chosen high pass filters and activation functions are explored and central limit theorem (CLT) and non-CLT results are provided. Numerical simulations are also provided. The results explain how NAST processes complicated and non-stationary time series, and pave a way towards statistical inference based on NAST under the non-null case.
STNov 21, 2020
Impact of signal-to-noise ratio and bandwidth on graph Laplacian spectrum from high-dimensional noisy point cloudXiucai Ding, Hau-Tieng Wu
We systematically study the spectrum of kernel-based graph Laplacian (GL) constructed from high-dimensional and noisy random point cloud in the nonnull setup. The problem is motived by studying the model when the clean signal is sampled from a manifold that is embedded in a low-dimensional Euclidean subspace, and corrupted by high-dimensional noise. We quantify how the signal and noise interact over different regions of signal-to-noise ratio (SNR), and report the resulting peculiar spectral behavior of GL. In addition, we explore the impact of chosen kernel bandwidth on the spectrum of GL over different regions of SNR, which lead to an adaptive choice of kernel bandwidth that coincides with the common practice in real data. This result paves the way to a theoretical understanding of how practitioners apply GL when the dataset is noisy.
STNov 3, 2020
Convergence of Graph Laplacian with kNN Self-tuned KernelsXiuyuan Cheng, Hau-Tieng Wu
Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {σ^2} )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $σ$, and a common practice called self-tuned kernel adaptively sets a $σ_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results of graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of graph Laplacian operator $L_N$ to manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(α)}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ ε\hatρ(x_i) \hatρ(x_j)})/\hatρ(x_i)^α\hatρ(x_j)^α$, where $\hatρ$ is the estimated bandwidth function {by kNN}, and the limiting operator is also parametrized by $α$. When $α= 1$, the limiting operator is the weighted manifold Laplacian $Δ_p$. Specifically, we prove the point-wise convergence of $L_N f $ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hatρ$ which bounds the relative estimation error $|\hatρ - \barρ|/\barρ$ uniformly with high probability, where $\barρ = p^{-1/d}$, and $p$ is the data density function. Our theoretical results reveal the advantage of self-tuned kernel over fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
MEOct 14, 2020
Graph Based Gaussian Processes on Restricted DomainsDavid B Dunson, Hau-Tieng Wu, Nan Wu
In nonparametric regression, it is common for the inputs to fall in a restricted subset of Euclidean space. Typical kernel-based methods that do not take into account the intrinsic geometry of the domain across which observations are collected may produce sub-optimal results. In this article, we focus on solving this problem in the context of Gaussian process (GP) models, proposing a new class of Graph Laplacian based GPs (GL-GPs), which learn a covariance that respects the geometry of the input domain. As the heat kernel is intractable computationally, we approximate the covariance using finitely-many eigenpairs of the Graph Laplacian (GL). The GL is constructed from a kernel which depends only on the Euclidean coordinates of the inputs. Hence, we can benefit from the full knowledge about the kernel to extend the covariance structure to newly arriving samples by a Nyström type extension. We provide substantial theoretical support for the GL-GP methodology, and illustrate performance gains in various applications.
MLAug 11, 2020
Airflow recovery from thoracic and abdominal movements using Synchrosqueezing Transform and Locally Stationary Gaussian Process RegressionWhitney K. Huang, Yu-Min Chung, Yu-Bo Wang et al.
Airflow signal encodes rich information about respiratory system. While the gold standard for measuring airflow is to use a spirometer with an occlusive seal, this is not practical for ambulatory monitoring of patients. Advances in sensor technology have made measurement of motion of the thorax and abdomen feasible with small inexpensive devices, but estimation of airflow from these time series is challenging. We propose to use the nonlinear-type time-frequency analysis tool, synchrosqueezing transform, to properly represent the thoracic and abdominal movement signals as the features, which are used to recover the airflow by the locally stationary Gaussian process. We show that, using a dataset that contains respiratory signals under normal sleep conditions, an accurate prediction can be achieved by fitting the proposed model in the feature space both in the intra- and inter-subject setups. We also apply our method to a more challenging case, where subjects under general anesthesia underwent transitions from pressure support to unassisted ventilation to further demonstrate the utility of the proposed method.
STJul 13, 2020
Strong Uniform Consistency with Rates for Kernel Density Estimators with General Kernels on ManifoldsHau-Tieng Wu, Nan Wu
When analyzing modern machine learning algorithms, we may need to handle kernel density estimation (KDE) with intricate kernels that are not designed by the user and might even be irregular and asymmetric. To handle this emerging challenge, we provide a strong uniform consistency result with the $L^\infty$ convergence rate for KDE on Riemannian manifolds with Riemann integrable kernels (in the ambient Euclidean space). We also provide an $L^1$ consistency result for kernel density estimation on Riemannian manifolds with Lebesgue integrable kernels. The isotropic kernels considered in this paper are different from the kernels in the Vapnik-Chervonenkis class that are frequently considered in statistics society. We illustrate the difference when we apply them to estimate the probability density function. Moreover, we elaborate the delicate difference when the kernel is designed on the intrinsic manifold and on the ambient Euclidian space, both might be encountered in practice. At last, we prove the necessary and sufficient condition for an isotropic kernel to be Riemann integrable on a submanifold in the Euclidean space.
SPAug 9, 2019
A persistent homology approach to heart rate variability analysis with an application to sleep-wake classificationYu-Min Chung, Chuan-Shen Hu, Yu-Lun Lo et al.
Persistent homology (PH) is a recently developed theory in the field of algebraic topology to study shapes of datasets. It is an effective data analysis tool that is robust to noise and has been widely applied. We demonstrate a general pipeline to apply PH to study time series; particularly the instantaneous heart rate time series for the heart rate variability (HRV) analysis. The first step is capturing the shapes of time series from two different aspects -- {the PH's and hence persistence diagrams of its} sub-level set and Taken's lag map. Second, we propose a systematic {and computationally efficient} approach to summarize persistence diagrams, which we coined {\em persistence statistics}. To demonstrate our proposed method, we apply these tools to the HRV analysis and the sleep-wake, REM-NREM (rapid eyeball movement and non rapid eyeball movement) and sleep-REM-NREM classification problems. The proposed algorithm is evaluated on three different datasets via the cross-database validation scheme. The performance of our approach is better than the state-of-the-art algorithms, and the result is consistent throughout different datasets.
SPMay 30, 2019
Transient-evoked otoacoustic emission signals predicting outcomes of acute sensorineural hearing loss in patients with Meniere's DiseaseYi-Wen Liu, Sheng-Lun Kao, Hau-Tieng Wu et al.
Background: Fluctuating hearing loss is characteristic of Meniere's Disease (MD) during acute episodes. However, no reliable audiometric hallmarks are available for counselling the hearing recovery possibility. Aims/Objectives: To find parameters for predicting MD hearing outcomes. Material and Methods: We applied machine learning techniques to analyse transient-evoked otoacoustic emission (TEOAE) signals recorded from patients with MD. Thirty unilateral MD patients were recruited prospectively after onset of acute cochleo-vestibular symptoms. Serial TEOAE and pure-tone audiogram (PTA) data were recorded longitudinally. Denoised TEOAE signals were projected onto the three most prominent principal directions through a linear transformation. Binary classification was performed using a support vector machine (SVM). TEOAE signal parameters, including signal energy and group delay, were compared between improved and nonimproved groups using Welchs t-test. Results: Signal energy did not differ (p = 0.64) but a significant difference in 1-kHz (p = 0.045) group delay was recorded between improved and nonimproved groups. The SVM achieved a cross-validated accuracy of >80% in predicting hearing outcomes. Conclusions and Significance: This study revealed that baseline TEOAE parameters obtained during acute MD episodes, when processed through machine learning technology, may provide information on outer hair cell function to predict hearing recovery.
CVNov 7, 2018
Solving Jigsaw Puzzles By the Graph Connection LaplacianVahan Huroyan, Gilad Lerman, Hau-Tieng Wu
We propose a novel mathematical framework to address the problem of automatically solving large jigsaw puzzles. This problem assumes a large image, which is cut into equal square pieces that are arbitrarily rotated and shuffled, and asks to recover the original image given the transformed pieces. The main contribution of this work is a method for recovering the rotations of the pieces when both shuffles and rotations are unknown. A major challenge of this procedure is estimating the graph connection Laplacian without the knowledge of shuffles. A careful combination of our proposed method for estimating rotations with any existing method for estimating shuffles results in a practical solution for the jigsaw puzzle problem. Our theory guarantees, in a clean setting, that our basic idea of recovering rotations is robust to some corruption of the connection graph. Numerical experiments demonstrate the competitive accuracy of this solution, its robustness to corruption and, its computational advantage for large puzzles.
QMAug 27, 2018
Unexpected sawtooth artifact in beat-to-beat pulse transit time measured from patient monitor dataYu-Ting Lin, Yu-Lun Lo, Chen-Yun Lin et al.
Object: It is increasingly popular to collect as much data as possible in the hospital setting from clinical monitors for research purposes. However, in this setup the data calibration issue is often not discussed and, rather, implicitly assumed, while the clinical monitors might not be designed for the data analysis purpose. We hypothesize that this calibration issue for a secondary analysis may become an important source of artifacts in patient monitor data. We test an off-the-shelf integrated photoplethysmography (PPG) and electrocardiogram (ECG) monitoring device for its ability to yield a reliable pulse transit time (PTT) signal. Approach: This is a retrospective clinical study using two databases: one containing 35 subjects who underwent laparoscopic cholecystectomy, another containing 22 subjects who underwent spontaneous breathing test in the intensive care unit. All data sets include recordings of PPG and ECG using a commonly deployed patient monitor. We calculated the PTT signal offline. Main Results: We report a novel constant oscillatory pattern in the PTT signal and identify this pattern as a sawtooth artifact. We apply an approach based on the de-shape method to visualize, quantify and validate this sawtooth artifact. Significance: The PPG and ECG signals not designed for the PTT evaluation may contain unwanted artifacts. The PTT signal should be calibrated before analysis to avoid erroneous interpretation of its physiological meaning.
APAug 1, 2018
Sleep-wake classification via quantifying heart rate variability by convolutional neural networkJohn Malik, Yu-Lun Lo, Hau-tieng Wu
Fluctuations in heart rate are intimately tied to changes in the physiological state of the organism. We examine and exploit this relationship by classifying a human subject's wake/sleep status using his instantaneous heart rate (IHR) series. We use a convolutional neural network (CNN) to build features from the IHR series extracted from a whole-night electrocardiogram (ECG) and predict every 30 seconds whether the subject is awake or asleep. Our training database consists of 56 normal subjects, and we consider three different databases for validation; one is private, and two are public with different races and apnea severities. On our private database of 27 subjects, our accuracy, sensitivity, specificity, and AUC values for predicting the wake stage are 83.1%, 52.4%, 89.4%, and 0.83, respectively. Validation performance is similar on our two public databases. When we use the photoplethysmography instead of the ECG to obtain the IHR series, the performance is also comparable. A robustness check is carried out to confirm the obtained performance statistics. This result advocates for an effective and scalable method for recognizing changes in physiological state using non-invasive heart rate monitoring. The CNN model adaptively quantifies IHR fluctuation as well as its location in time and is suitable for differentiating between the wake and sleep stages.
SPMar 17, 2018
A Novel Blaschke Unwinding Adaptive Fourier Decomposition based Signal Compression Algorithm with Application on ECG SignalsChunyu Tan, Liming Zhang, Hau-tieng Wu
This paper presents a novel signal compression algorithm based on the Blaschke unwinding adaptive Fourier decomposition (AFD). The Blaschke unwinding AFD is a newly developed signal decomposition theory. It utilizes the Nevanlinna factorization and the maximal selection principle in each decomposition step, and achieves a faster convergence rate with higher fidelity. The proposed compression algorithm is applied to the electrocardiogram signal. To assess the performance of the proposed compression algorithm, in addition to the generic assessment criteria, we consider the less discussed criteria related to the clinical needs -- for the heart rate variability analysis purpose, how accurate the R peak information is preserved is evaluated. The experiments are conducted on the MIT-BIH arrhythmia benchmark database. The results show that the proposed algorithm performs better than other state-of-the-art approaches. Meanwhile, it also well preserves the R peak information.
MED-PHFeb 7, 2017
Efficient fetal-maternal ECG signal separation from two channel maternal abdominal ECG via diffusion-based channel selectionRuilin Li, Martin G. Frasch, Hau-tieng Wu
There is a need for affordable, widely deployable maternal-fetal ECG monitors to improve maternal and fetal health during pregnancy and delivery. Based on the diffusion-based channel selection, here we present the mathematical formalism and clinical validation of an algorithm capable of accurate separation of maternal and fetal ECG from a two channel signal acquired over maternal abdomen.
MLJan 13, 2017
Diffusion-based nonlinear filtering for multimodal data fusion with application to sleep stage assessmentOri Katz, Ronen Talmon, Yu-Lun Lo et al.
The problem of information fusion from multiple data-sets acquired by multimodal sensors has drawn significant research attention over the years. In this paper, we focus on a particular problem setting consisting of a physical phenomenon or a system of interest observed by multiple sensors. We assume that all sensors measure some aspects of the system of interest with additional sensor-specific and irrelevant components. Our goal is to recover the variables relevant to the observed system and to filter out the nuisance effects of the sensor-specific variables. We propose an approach based on manifold learning, which is particularly suitable for problems with multiple modalities, since it aims to capture the intrinsic structure of the data and relies on minimal prior model knowledge. Specifically, we propose a nonlinear filtering scheme, which extracts the hidden sources of variability captured by two or more sensors, that are independent of the sensor-specific components. In addition to presenting a theoretical analysis, we demonstrate our technique on real measured data for the purpose of sleep stage assessment based on multiple, multimodal sensor measurements. We show that without prior knowledge on the different modalities and on the measured system, our method gives rise to a data-driven representation that is well correlated with the underlying sleep process and is robust to noise and sensor-specific effects.
QMSep 9, 2016
Extract fetal ECG from single-lead abdominal ECG by de-shape short time Fourier transform and nonlocal medianSu Li, Hau-tieng Wu
The multiple fundamental frequency detection problem and the source separation problem from a single-channel signal containing multiple oscillatory components and a nonstationary noise are both challenging tasks. To extract the fetal electrocardiogram (ECG) from a single-lead maternal abdominal ECG, we face both challenges. In this paper, we propose a novel method to extract the fetal ECG signal from the single channel maternal abdominal ECG signal, without any additional measurement. The algorithm is composed of three main ingredients. First, the maternal and fetal heart rates are estimated by the de-shape short time Fourier transform, which is a recently proposed nonlinear time-frequency analysis technique; second, the beat tracking technique is applied to accurately obtain the maternal and fetal R peaks; third, the maternal and fetal ECG waveforms are established by the nonlocal median. The algorithm is evaluated on a simulated fetal ECG signal database ({\em fecgsyn} database), and tested on two real databases with the annotation provided by experts ({\em adfecgdb} database and {\em CinC2013} database). In general, the algorithm could be applied to solve other detection and source separation problems, and reconstruct the time-varying wave-shape function of each oscillatory component.
MMJun 29, 2016
Minimum-latency Time-frequency Analysis Using Asymmetric Window FunctionsLi Su, Hau-tieng Wu
We study the real-time dynamics retrieval from a time series via the time-frequency (TF) analysis with the minimal latency guarantee. While different from the well-known intrinsic latency definition in the filter design, a rigorous definition of intrinsic latency for different time-frequency representations (TFR) is provided, including the short time Fourier transform (STFT), synchrosqeezing transform (SST) and reassignment method (RM). To achieve the minimal latency, a systematic method is proposed to construct an asymmetric window from a well-designed symmetric one based on the concept of minimum-phase, if the window satisfies some weak conditions. We theoretically show that the TFR determined by SST with the constructed asymmetric window does have a smaller intrinsic latency. Finally, the music onset detection problem is studied to show the strength of the proposed algorithm.
DATA-ANMay 6, 2016
Wave-shape function analysis -- when cepstrum meets time-frequency analysisChen-Yun Lin, Li Su, Hau-tieng Wu
We propose to combine cepstrum and nonlinear time-frequency (TF) analysis to study mutiple component oscillatory signals with time-varying frequency and amplitude and with time-varying non-sinusoidal oscillatory pattern. The concept of cepstrum is applied to eliminate the wave-shape function influence on the TF analysis, and we propose a new algorithm, named de-shape synchrosqueezing transform (de-shape SST). The mathematical model, adaptive non-harmonic model, is introduced and the de-shape SST algorithm is theoretically analyzed. In addition to simulated signals, several different physiological, musical and biological signals are analyzed to illustrate the proposed algorithm.
DATA-ANJan 30, 2016
Latent common manifold learning with alternating diffusion: analysis and applicationsRonen Talmon, Hau-tieng Wu
The analysis of data sets arising from multiple sensors has drawn significant research attention over the years. Traditional methods, including kernel-based methods, are typically incapable of capturing nonlinear geometric structures. We introduce a latent common manifold model underlying multiple sensor observations for the purpose of multimodal data fusion. A method based on alternating diffusion is presented and analyzed; we provide theoretical analysis of the method under the latent common manifold model. To exemplify the power of the proposed framework, experimental results in several applications are reported.
STMay 23, 2014
Connection graph Laplacian methods can be made robust to noiseNoureddine El Karoui, Hau-tieng Wu
Recently, several data analytic techniques based on connection graph laplacian (CGL) ideas have appeared in the literature. At this point, the properties of these methods are starting to be understood in the setting where the data is observed without noise. We study the impact of additive noise on these methods, and show that they are remarkably robust. As a by-product of our analysis, we propose modifications of the standard algorithms that increase their robustness to noise. We illustrate our results in numerical simulations.
PROct 1, 2013
Graph connection Laplacian and random matrices with random blocksNoureddine El Karoui, Hau-tieng Wu
Graph connection Laplacian (GCL) is a modern data analysis technique that is starting to be applied for the analysis of high dimensional and massive datasets. Motivated by this technique, we study matrices that are akin to the ones appearing in the null case of GCL, i.e the case where there is no structure in the dataset under investigation. Developing this understanding is important in making sense of the output of the algorithms based on GCL. We hence develop a theory explaining the behavior of the spectral distribution of a large class of random matrices, in particular random matrices with random block entries of fixed size. Part of the theory covers the case where there is significant dependence between the blocks. Numerical work shows that the agreement between our theoretical predictions and numerical simulations is generally very good.
NAJun 7, 2013
Spectral Convergence of the connection Laplacian from random samplesAmit Singer, Hau-tieng Wu
Spectral methods that are based on eigenvectors and eigenvalues of discrete graph Laplacians, such as Diffusion Maps and Laplacian Eigenmaps are often used for manifold learning and non-linear dimensionality reduction. It was previously shown by Belkin and Niyogi \cite{belkin_niyogi:2007} that the eigenvectors and eigenvalues of the graph Laplacian converge to the eigenfunctions and eigenvalues of the Laplace-Beltrami operator of the manifold in the limit of infinitely many data points sampled independently from the uniform distribution over the manifold. Recently, we introduced Vector Diffusion Maps and showed that the connection Laplacian of the tangent bundle of the manifold can be approximated from random samples. In this paper, we present a unified framework for approximating other connection Laplacians over the manifold by considering its principle bundle structure. We prove that the eigenvectors and eigenvalues of these Laplacians converge in the limit of infinitely many independent random samples. We generalize the spectral convergence results to the case where the data points are sampled from a non-uniform distribution, and for manifolds with and without boundary.
DGMay 18, 2013
Embedding Riemannian Manifolds by the Heat Kernel of the Connection LaplacianHau-tieng Wu
Given a class of closed Riemannian manifolds with prescribed geometric conditions, we introduce an embedding of the manifolds into $\ell^2$ based on the heat kernel of the Connection Laplacian associated with the Levi-Civita connection on the tangent bundle. As a result, we can construct a distance in this class which leads to a pre-compactness theorem on the class under consideration.
NADec 12, 2009
Synchrosqueezed Wavelet Transforms: a Tool for Empirical Mode DecompositionIngrid Daubechies, Jianfeng Lu, Hau-Tieng Wu
The EMD algorithm, first proposed in [11], made more robust as well as more versatile in [12], is a technique that aims to decompose into their building blocks functions that are the superposition of a (reasonably) small number of components, well separated in the time-frequency plane, each of which can be viewed as approximately harmonic locally, with slowly varying amplitudes and frequencies. The EMD has already shown its usefulness in a wide range of applications including meteorology, structural stability analysis, medical studies -- see, e.g. [13]. On the other hand, the EMD algorithm contains heuristic and ad-hoc elements that make it hard to analyze mathematically. In this paper we describe a method that captures the flavor and philosophy of the EMD approach, albeit using a different approach in constructing the components. We introduce a precise mathematical definition for a class of functions that can be viewed as a superposition of a reasonably small number of approximately harmonic components, and we prove that our method does indeed succeed in decomposing arbitrary functions in this class. We provide several examples, for simulated as well as real data.