Weonyoung Joo

LG
h-index9
10papers
310citations
Novelty53%
AI Score31

10 Papers

LGMar 8, 2023
Loss-Curvature Matching for Dataset Selection and Condensation

Seungjae Shin, Heesun Bae, Donghyeok Shin et al.

Training neural networks on a large dataset requires substantial computational costs. Dataset reduction selects or synthesizes data instances based on the large dataset, while minimizing the degradation in generalization performance from the full dataset. Existing methods utilize the neural network during the dataset reduction procedure, so the model parameter becomes important factor in preserving the performance after reduction. By depending upon the importance of parameters, this paper introduces a new reduction objective, coined LCMat, which Matches the Loss Curvatures of the original dataset and reduced dataset over the model parameter space, more than the parameter point. This new objective induces a better adaptation of the reduced dataset on the perturbed parameter region than the exact point matching. Particularly, we identify the worst case of the loss curvature gap from the local parameter region, and we derive the implementable upper bound of such worst-case with theoretical analyses. Our experiments on both coreset selection and condensation benchmarks illustrate that LCMat shows better generalization performances than existing baselines.

CLJan 9, 2024Code
Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior

Youngjae Cho, HeeSun Bae, Seungjae Shin et al.

Recent Vision-Language Pretrained (VLP) models have become the backbone for many downstream tasks, but they are utilized as frozen model without learning. Prompt learning is a method to improve the pre-trained VLP model by adding a learnable context vector to the inputs of the text encoder. In a few-shot learning scenario of the downstream task, MLE training can lead the context vector to over-fit dominant image features in the training data. This overfitting can potentially harm the generalization ability, especially in the presence of a distribution shift between the training and test dataset. This paper presents a Bayesian-based framework of prompt learning, which could alleviate the overfitting issues on few-shot learning application and increase the adaptability of prompts on unseen instances. Specifically, modeling data-dependent prior enhances the adaptability of text features for both seen and unseen image features without the trade-off of performance between them. Based on the Bayesian framework, we utilize the Wasserstein Gradient Flow in the estimation of our target posterior distribution, which enables our prompt to be flexible in capturing the complex modes of image features. We demonstrate the effectiveness of our method on benchmark datasets for several experiments by showing statistically significant improvements on performance compared to existing methods. The code is available at https://github.com/youngjae-cho/APP.

LGFeb 15, 2021
Neural Posterior Regularization for Likelihood-Free Inference

Dongjun Kim, Kyungwoo Song, Seungjae Shin et al.

A simulation is useful when the phenomenon of interest is either expensive to regenerate or irreproducible with the same context. Recently, Bayesian inference on the distribution of the simulation input parameter has been implemented sequentially to minimize the required simulation budget for the task of simulation validation to the real-world. However, the Bayesian inference is still challenging when the ground-truth posterior is multi-modal with a high-dimensional simulation output. This paper introduces a regularization technique, namely Neural Posterior Regularization (NPR), which enforces the model to explore the input parameter space effectively. Afterward, we provide the closed-form solution of the regularized optimization that enables analyzing the effect of the regularization. We empirically validate that NPR attains the statistically significant gain on benchmark performances for diverse simulation tasks.

LGNov 24, 2020
Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder

Hyemi Kim, Seungjae Shin, JoonHo Jang et al.

The problem of fair classification can be mollified if we develop a method to remove the embedded sensitive information from the classification features. This line of separating the sensitive information is developed through the causal inference, and the causal inference enables the counterfactual generations to contrast the what-if case of the opposite sensitive attribute. Along with this separation with the causality, a frequent assumption in the deep latent causal model defines a single latent variable to absorb the entire exogenous uncertainty of the causal graph. However, we claim that such structure cannot distinguish the 1) information caused by the intervention (i.e., sensitive variable) and 2) information correlated with the intervention from the data. Therefore, this paper proposes Disentangled Causal Effect Variational Autoencoder (DCEVAE) to resolve this limitation by disentangling the exogenous uncertainty into two latent variables: either 1) independent to interventions or 2) correlated to interventions without causality. Particularly, our disentangling approach preserves the latent variable correlated to interventions in generating counterfactual examples. We show that our method estimates the total effect and the counterfactual effect without a complete causal graph. By adding a fairness regularization, DCEVAE generates a counterfactual fair dataset while losing less original information. Also, DCEVAE generates natural counterfactual images by only flipping sensitive information. Additionally, we theoretically show the differences in the covariance structures of DCEVAE and prior works from the perspective of the latent disentanglement.

MEOct 15, 2020
Sequential Likelihood-Free Inference with Neural Proposal

Dongjun Kim, Kyungwoo Song, YoonYeong Kim et al.

Bayesian inference without the likelihood evaluation, or likelihood-free inference, has been a key research topic in simulation studies for gaining quantitatively validated simulation models on real-world datasets. As the likelihood evaluation is inaccessible, previous papers train the amortized neural network to estimate the ground-truth posterior for the simulation of interest. Training the network and accumulating the dataset alternatively in a sequential manner could save the total simulation budget by orders of magnitude. In the data accumulation phase, the new simulation inputs are chosen within a portion of the total simulation budget to accumulate upon the collected dataset. This newly accumulated data degenerates because the set of simulation inputs is hardly mixed, and this degenerated data collection process ruins the posterior inference. This paper introduces a new sampling approach, called Neural Proposal (NP), of the simulation input that resolves the biased data collection as it guarantees the i.i.d. sampling. The experiments show the improved performance of our sampler, especially for the simulations with multi-modal posteriors.

LGApr 13, 2020
Adversarial Likelihood-Free Inference on Black-Box Generator

Dongjun Kim, Weonyoung Joo, Seungjae Shin et al.

Generative Adversarial Network (GAN) can be viewed as an implicit estimator of a data distribution, and this perspective motivates using the adversarial concept in the true input parameter estimation of black-box generators. While previous works on likelihood-free inference introduces an implicit proposal distribution on the generator input, this paper analyzes theoretic limitations of the proposal distribution approach. On top of that, we introduce a new algorithm, Adversarial Likelihood-Free Inference (ALFI), to mitigate the analyzed limitations, so ALFI is able to find the posterior distribution on the input parameter for black-box generative models. We experimented ALFI with diverse simulation models as well as pre-trained statistical models, and we identified that ALFI achieves the best parameter estimation accuracy with a limited simulation budget.

CLApr 7, 2020
Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation

Seungjae Shin, Kyungwoo Song, JoonHo Jang et al.

Recent research demonstrates that word embeddings, trained on the human-generated corpus, have strong gender biases in embedding spaces, and these biases can result in the discriminative results from the various downstream tasks. Whereas the previous methods project word embeddings into a linear subspace for debiasing, we introduce a \textit{Latent Disentanglement} method with a siamese auto-encoder structure with an adapted gradient reversal layer. Our structure enables the separation of the semantic latent information and gender latent information of given word into the disjoint latent dimensions. Afterwards, we introduce a \textit{Counterfactual Generation} to convert the gender information of words, so the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. From the various quantitative and qualitative debiasing experiments, our method shows to be better than existing debiasing methods in debiasing word embeddings. In addition, Our method shows the ability to preserve semantic information during debiasing by minimizing the semantic information losses for extrinsic NLP downstream tasks.

LGMar 4, 2020
Generalized Gumbel-Softmax Gradient Estimator for Generic Discrete Random Variables

Weonyoung Joo, Dongjun Kim, Seungjae Shin et al.

Estimating the gradients of stochastic nodes in stochastic computational graphs is one of the crucial research questions in the deep generative modeling community, which enables the gradient descent optimization on neural network parameters. Stochastic gradient estimators of discrete random variables are widely explored, for example, Gumbel-Softmax reparameterization trick for Bernoulli and categorical distributions. Meanwhile, other discrete distribution cases such as the Poisson, geometric, binomial, multinomial, negative binomial, etc. have not been explored. This paper proposes a generalized version of the Gumbel-Softmax estimator, which is able to reparameterize generic discrete distributions, not restricted to the Bernoulli and the categorical. The proposed estimator utilizes the truncation of discrete random variables, the Gumbel-Softmax trick, and a special form of linear transformation. Our experiments consist of (1) synthetic examples and applications on VAE, which show the efficacy of our methods; and (2) topic models, which demonstrate the value of the proposed estimation in practice.

LGNov 15, 2019
Sequential Recommendation with Relation-Aware Kernelized Self-Attention

Mingi Ji, Weonyoung Joo, Kyungwoo Song et al.

Recent studies identified that sequential Recommendation is improved by the attention mechanism. By following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA) adopting a self-attention mechanism of the Transformer with augmentation of a probabilistic model. The original self-attention of Transformer is a deterministic measure without relation-awareness. Therefore, we introduce a latent space to the self-attention, and the latent space models the recommendation context from relation as a multivariate skew-normal distribution with a kernelized covariance matrix from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer and the sequential recommendation by adding a probabilistic model of the recommendation task specifics. We experimented RKSA over the benchmark datasets, and RKSA shows significant improvements compared to the recent baseline models. Also, RKSA were able to produce a latent space model that answers the reasons for recommendation.

LGJan 9, 2019
Dirichlet Variational Autoencoder

Weonyoung Joo, Wonsung Lee, Sungrae Park et al.

This paper proposes Dirichlet Variational Autoencoder (DirVAE) using a Dirichlet prior for a continuous latent variable that exhibits the characteristic of the categorical probabilities. To infer the parameters of DirVAE, we utilize the stochastic gradient method by approximating the Gamma distribution, which is a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we reshape the component collapsing issue by investigating two problem sources, which are decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing; while Gaussian VAE exhibits the decoder weight collapsing and Stick-Breaking VAE shows the latent value collapsing. The experimental results show that 1) DirVAE models the latent representation result with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values with no collapsing issues which the baseline models suffer from. Also, we show that the learned latent representation from the DirVAE achieves the best classification accuracy in the semi-supervised and the supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrated that the DirVAE augmented topic models show better performances in most cases.