Namjoon Suh

h-index9

11papers

84citations

Novelty40%

AI Score37

Ranked #115,893 of 201,326 authors (top 58%)#1,554 in ML (top 44%)

11 Papers

MLOct 24, 2023Code

AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing

Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh et al.

Diffusion model has become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language model, or speech synthesis. In this paper, we leverage the power of diffusion model for generating synthetic tabular data. The heterogeneous features in tabular data have been main obstacles in tabular data synthesis, and we tackle this problem by employing the auto-encoder architecture. When compared with the state-of-the-art tabular synthesizers, the resulting synthetic tables from our model show nice statistical fidelities to the real data, and perform well in downstream tasks for machine learning utilities. We conducted the experiments over $15$ publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available at https://github.com/UCLA-Trustworthy-AI-Lab/AutoDiffusion.

MLSep 26, 2023

On Excess Risk Convergence Rates of Neural Network Classifiers

Hyunouk Ko, Namjoon Suh, Xiaoming Huo

The recent success of neural networks in pattern recognition and classification problems suggests that neural networks possess qualities distinct from other more classical classifiers such as SVMs or boosting classifiers. This paper studies the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks. Compared to the typical settings imposed in the literature, we consider a more general scenario that resembles actual practice in two respects: first, the function class to be approximated includes the Barron functions as a proper subset, and second, the neural network classifier constructed is the minimizer of a surrogate loss instead of the $0$-$1$ loss so that gradient descent-based numerical optimizations can be easily applied. While the class of functions we consider is quite large that optimal rates cannot be faster than $n^{-\frac{1}{3}}$, it is a regime in which dimension-free rates are possible and approximation power of neural networks can be taken advantage of. In particular, we analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence for the excess risk. Finally, we show that the rate obtained is in fact minimax optimal up to a logarithmic factor, and the minimax lower bound shows the effect of the margin assumption in this regime.

CLJan 8

Users Mispredict Their Own Preferences for AI Writing Assistance

Vivian Lai, Zana Buçinca, Nil-Jana Akpinar et al.

Proactive AI writing assistants need to predict when users want drafting help, yet we lack empirical understanding of what drives preferences. Through a factorial vignette study with 50 participants making 750 pairwise comparisons, we find compositional effort dominates decisions ($ρ= 0.597$) while urgency shows no predictive power ($ρ\approx 0$). More critically, users exhibit a striking perception-behavior gap: they rank urgency first in self-reports despite it being the weakest behavioral driver, representing a complete preference inversion. This misalignment has measurable consequences. Systems designed from users' stated preferences achieve only 57.7\% accuracy, underperforming even naive baselines, while systems using behavioral patterns reach significantly higher 61.3\% ($p < 0.05$). These findings demonstrate that relying on user introspection for system design actively misleads optimization, with direct implications for proactive natural language generation (NLG) systems.

LGJun 23, 2024Code

TimeAutoDiff: A Unified Framework for Generation, Imputation, Forecasting, and Time-Varying Metadata Conditioning of Heterogeneous Time Series Tabular Data

Namjoon Suh, Yuning Yang, Din-Yin Hsieh et al.

We present TimeAutoDiff, a unified latent-diffusion framework for four fundamental time-series tasks: unconditional generation, missing-data imputation, forecasting, and time-varying-metadata conditional generation. The model natively supports heterogeneous features including continuous, binary, and categorical variables. We unify all tasks using a masked-modeling strategy in which a binary mask specifies which time-series cells are observed and which must be generated. TimeAutoDiff combines a lightweight variational autoencoder, which maps mixed-type features into a continuous latent sequence, with a diffusion model that learns temporal dynamics in this latent space. Two architectural choices provide strong speed and scalability benefits. The diffusion model samples an entire latent trajectory at once rather than denoising one timestep at a time, greatly reducing reverse-diffusion calls. In addition, the VAE compresses along the feature axis, enabling efficient modeling of wide tables in a low-dimensional latent space. Empirical evaluation shows that TimeAutoDiff matches or surpasses strong baselines in synthetic sequence fidelity and consistently improves imputation and forecasting performance. Metadata conditioning enables realistic scenario exploration, allowing users to edit metadata sequences and produce coherent counterfactual trajectories that preserve cross-feature dependencies. Ablation studies highlight the importance of the VAE's feature encoding and key components of the denoiser. A distance-to-closest-record audit further indicates that the model generalizes without excessive memorization. Code is available at https://github.com/namjoonsuh/TimeAutoDiff

MLJan 14, 2024

A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models

Namjoon Suh, Guang Cheng

In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression (and classification in Appendix~{\color{blue}B}). These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly non-convex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review papers that attempt to answer ``how the neural network trained via gradient-based methods finds the solution that can generalize well on unseen data.'' In particular, two well-known paradigms are reviewed: the Neural Tangent Kernel (NTK) paradigm, and Mean-Field (MF) paradigm. Last but not least, we review the most recent theoretical advancements in generative models including Generative Adversarial Networks (GANs), diffusion models, and in-context learning (ICL) in the Large Language Models (LLMs) from two perpsectives reviewed previously, i.e., approximation and training dynamics.

MLMay 24, 2024

Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data

Lan Tao, Shirong Xu, Chi-Hua Wang et al.

With the proliferation of generative AI and the increasing volume of generative data (also called as synthetic data), assessing the fidelity of generative data has become a critical concern. In this paper, we propose a discriminative approach to estimate the total variation (TV) distance between two distributions as an effective measure of generative data fidelity. Our method quantitatively characterizes the relation between the Bayes risk in classifying two distributions and their TV distance. Therefore, the estimation of total variation distance reduces to that of the Bayes risk. In particular, this paper establishes theoretical results regarding the convergence rate of the estimation error of TV distance between two Gaussian distributions. We demonstrate that, with a specific choice of hypothesis class in classification, a fast convergence rate in estimating the TV distance can be achieved. Specifically, the estimation accuracy of the TV distance is proven to inherently depend on the separation of two Gaussian distributions: smaller estimation errors are achieved when the two Gaussian distributions are farther apart. This phenomenon is also validated empirically through extensive simulations. In the end, we apply this discriminative estimation method to rank fidelity of synthetic image data using the MNIST dataset.

MLMar 18, 2024

Approximation of RKHS Functionals by Neural Networks

Tian-Yi Zhou, Namjoon Suh, Guang Cheng et al.

Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation of functionals on the RKHS's. Specifically, we derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels. Moreover, we apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps in generalized functional linear models. Existing works on functional learning require integration-type basis function expansions with a set of pre-specified basis functions. By leveraging the interpolating orthogonal projections in RKHS's, our proposed network is much simpler in that we use point evaluations to replace basis function expansions.

LGFeb 1, 2024

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

Yue Xing, Xiaofeng Lin, Chenheng Xu et al.

Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usually text input, and all words, regardless of their logic relationship, are stored in different tokens (i.e., unstructured data \cite{wibisono2023role}). To understand how LLMs learn from the unstructured data in ICL, this paper studies the role of each component in the transformer architecture and provides a theoretical understanding to explain the success of the architecture. In particular, we consider a simple transformer with one/two attention layers and linear regression tasks for the ICL prediction. We observe that (1) a transformer with two layers of (self-)attentions with a look-ahead attention mask can learn from the prompt in the unstructured data, and (2) positional encoding can match the $x_i$ and $y_i$ tokens to achieve a better ICL performance.

NAMar 12, 2021

Asymptotic Theory of $\ell_1$-Regularized PDE Identification from a Single Noisy Trajectory

Yuchen He, Namjoon Suh, Xiaoming Huo et al.

We prove the support recovery for a general class of linear and nonlinear evolutionary partial differential equation (PDE) identification from a single noisy trajectory using $\ell_1$ regularized Pseudo-Least Squares model~($\ell_1$-PsLS). In any associative $\mathbb{R}$-algebra generated by finitely many differentiation operators that contain the unknown PDE operator, applying $\ell_1$-PsLS to a given data set yields a family of candidate models with coefficients $\mathbf{c}(λ)$ parameterized by the regularization weight $λ\geq 0$. The trace of $\{\mathbf{c}(λ)\}_{λ\geq 0}$ suffers from high variance due to data noises and finite difference approximation errors. We provide a set of sufficient conditions which guarantee that, from a single trajectory data denoised by a Local-Polynomial filter, the support of $\mathbf{c}(λ)$ asymptotically converges to the true signed-support associated with the underlying PDE for sufficiently many data and a certain range of $λ$. We also show various numerical experiments to validate our theory.

MLDec 2, 2019

Factor Analysis on Citation, Using a Combined Latent and Logistic Regression Model

Namjoon Suh, Xiaoming Huo, Eric Heim et al.

We propose a combined model, which integrates the latent factor model and the logistic regression model, for the citation network. It is noticed that neither a latent factor model nor a logistic regression model alone is sufficient to capture the structure of the data. The proposed model has a latent (i.e., factor analysis) model to represents the main technological trends (a.k.a., factors), and adds a sparse component that captures the remaining ad-hoc dependence. Parameter estimation is carried out through the construction of a joint-likelihood function of edges and properly chosen penalty terms. The convexity of the objective function allows us to develop an efficient algorithm, while the penalty terms push towards a low-dimensional latent component and a sparse graphical structure. Simulation results show that the proposed method works well in practical situations. The proposed method has been applied to a real application, which contains a citation network of statisticians (Ji and Jin, 2016). Some interesting findings are reported.

MLNov 20, 2017

Review on Parameter Estimation in HMRF

Namjoon Suh

This is a technical report which explores the estimation methodologies on hyper-parameters in Markov Random Field and Gaussian Hidden Markov Random Field. In first section, we briefly investigate a theoretical framework on Metropolis-Hastings algorithm. Next, by using MH algorithm, we simulate the data from Ising model, and study on how hyper-parameter estimation in Ising model is enabled through MCMC algorithm using pseudo-likelihood approximation. Following section deals with an issue on parameters estimation process of Gaussian Hidden Markov Random Field using MAP estimation and EM algorithm, and also discusses problems, found through several experiments. In following section, we expand this idea on estimating parameters in Gaussian Hidden Markov Spatial-Temporal Random Field, and display results on two performed experiments.