Sae-Young Chung

10 papers

314 citations

Novelty 54%

AI Score 27

Ranked #158,892 of 201,326 authors (top 79%); #35,039 in LG (top 82%)

10 Papers

CV · Jul 8, 2022

Test-Time Adaptation via Self-Training with Nearest Neighbor Information

Minguk Jang, Sae-Young Chung, Hye Won Chung

Test-time adaptation (TTA) aims to adapt a trained classifier using only online unlabeled test data, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier using the classifier's predictions on the test data as pseudo-labels. However, under test-time domain shift, the accuracy of the pseudo-labels cannot be guaranteed, and thus these TTA methods often suffer performance degradation in the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which consists of the following steps: (1) add trainable adaptation modules on top of the trained feature extractor; (2) define a new pseudo-label distribution for the test data using nearest neighbor information; (3) train these modules only a few times at test time to match the nearest neighbor-based pseudo-label distribution and a prototype-based class distribution for the test data; and (4) predict the labels of the test data using the average predicted class distribution from these modules. The pseudo-label generation is based on the intuition that a test sample and its nearest neighbor in the embedding space are likely to share the same label under domain shift. By utilizing multiple randomly initialized adaptation modules, TAST extracts useful information for classifying the test data under domain shift, using the nearest neighbor information. TAST outperformed state-of-the-art TTA methods on two standard benchmark tasks: domain generalization (VLCS, PACS, OfficeHome, and TerraIncognita) and image corruption (CIFAR-10/100C).
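The nearest-neighbor intuition in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the cosine similarity measure, the choice of averaging one-hot neighbor labels, and the names `cosine`, `nn_pseudo_label`, `support_embs`, and `support_labels` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nn_pseudo_label(query, support_embs, support_labels, num_classes, k=3):
    """Build a pseudo-label distribution for `query` from its k nearest neighbors.

    Assumption: the distribution is the average of the neighbors' one-hot labels.
    The abstract does not specify the exact construction used by TAST.
    """
    # Rank support points from most to least similar to the query embedding.
    ranked = sorted(
        range(len(support_embs)),
        key=lambda i: cosine(query, support_embs[i]),
        reverse=True,
    )
    neighbors = ranked[:k]
    dist = [0.0] * num_classes
    for i in neighbors:
        dist[support_labels[i]] += 1.0 / len(neighbors)
    return dist


# Hypothetical 2-D embeddings: two points per class.
support_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_labels = [0, 0, 1, 1]
pseudo = nn_pseudo_label([1.0, 0.05], support_embs, support_labels, num_classes=2, k=2)
```

With `k=2`, both neighbors of the query belong to class 0, so the resulting distribution puts all its mass on that class; a larger `k` yields a softer distribution.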

CV · Nov 4, 2022

Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment

Dong Hoon Lee, Sungik Choi, Hyunwoo Kim et al.

This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem to find pseudo-labels that maximize the mutual information between the label and data while being close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables simple yet effective clustering-based representation learning without incorporating extra training techniques or artificial constraints such as sampling strategies or equipartition constraints. With relatively few training epochs, the representation learned by MIRA achieves state-of-the-art performance on various downstream tasks, including linear/k-NN evaluation and transfer learning. In particular, with only 400 epochs, our method applied to the ImageNet dataset with the ResNet-50 architecture achieves 75.6% linear evaluation accuracy.

LG · Jul 8, 2022

Few-Example Clustering via Contrastive Learning

Minguk Jang, Sae-Young Chung

We propose Few-Example Clustering (FEC), a novel algorithm that performs contrastive learning to cluster a few examples. Our method is composed of the following three steps: (1) generation of candidate cluster assignments, (2) contrastive learning for each cluster assignment, and (3) selection of the best candidate. Based on the hypothesis that the contrastive learner with the ground-truth cluster assignment trains faster than the others, we choose the candidate with the smallest training loss in the early stage of learning in step (3). Extensive experiments on the mini-ImageNet and CUB-200-2011 datasets show that FEC outperforms other baselines by about 3.2% on average under various scenarios. FEC also exhibits an interesting learning curve where clustering performance gradually increases and then sharply drops.
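Step (3) of the abstract above, selecting the candidate with the smallest early-stage training loss, can be sketched as follows. This is only an illustration: the function name `select_candidate`, the `early_steps` window, and the use of a mean over that window are assumptions, and the loss curves are made-up numbers rather than results from the paper.

```python
def select_candidate(loss_curves, early_steps=3):
    """Return the index of the candidate whose early training loss is smallest.

    `loss_curves[i]` is the per-step training loss recorded while running
    contrastive learning with candidate cluster assignment i.
    Assumption: "early stage" is summarized as the mean of the first
    `early_steps` recorded losses.
    """
    def early_loss(curve):
        window = curve[:early_steps]
        return sum(window) / len(window)

    return min(range(len(loss_curves)), key=lambda i: early_loss(loss_curves[i]))


# Hypothetical loss curves for three candidate assignments.
curves = [
    [2.0, 1.8, 1.7, 0.10],  # slow early, low late
    [2.0, 1.2, 0.8, 0.70],  # fastest early decrease
    [2.1, 1.9, 1.8, 0.05],  # slow early, lowest late
]
best = select_candidate(curves, early_steps=3)
```

Note that the selection deliberately ignores late-stage losses: under the stated hypothesis, only the speed of early learning is informative about which assignment is correct.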

CV · Jun 22, 2021

Unsupervised Embedding Adaptation via Early-Stage Feature Reconstruction for Few-Shot Classification

Dong Hoon Lee, Sae-Young Chung

We propose unsupervised embedding adaptation for the downstream few-shot classification task. Based on the finding that deep neural networks learn to generalize before memorizing, we develop Early-Stage Feature Reconstruction (ESFR), a novel adaptation scheme with feature reconstruction and dimensionality-driven early stopping that finds generalizable features. Incorporating ESFR consistently improves the performance of baseline methods on all standard settings, including the recently proposed transductive method. ESFR used in conjunction with the transductive method further achieves state-of-the-art performance on mini-ImageNet, tiered-ImageNet, and CUB, with accuracy improvements of 1.2% to 2.0% over the previous best performing method in the 1-shot setting.

LG · May 28, 2021

Improving Generalization in Meta-RL with Imaginary Tasks from Latent Dynamics Mixture

Suyoung Lee, Sae-Young Chung

The generalization ability of most meta-reinforcement learning (meta-RL) methods is largely limited to test tasks that are sampled from the same distribution used to sample training tasks. To overcome this limitation, we propose Latent Dynamics Mixture (LDM), which trains a reinforcement learning agent with imaginary tasks generated from mixtures of learned latent dynamics. By training a policy on mixture tasks along with the original training tasks, LDM allows the agent to prepare for unseen test tasks during training and prevents the agent from overfitting to the training tasks. LDM significantly outperforms standard meta-RL methods in test returns on gridworld navigation and MuJoCo tasks where we strictly separate the training task distribution and the test task distribution.

LG · Nov 27, 2019

Novelty Detection Via Blurring

Sungik Choi, Sae-Young Chung

Conventional out-of-distribution (OOD) detection schemes based on a variational autoencoder or Random Network Distillation (RND) have been observed to assign lower uncertainty to OOD data than to the target distribution. In this work, we discover that such conventional novelty detection schemes are also vulnerable to blurred images. Based on this observation, we construct a novel RND-based OOD detector, SVD-RND, that utilizes blurred images during training. Our detector is simple, efficient at test time, and outperforms baseline OOD detectors in various domains. Further results show that SVD-RND learns a better target distribution representation than the baseline RND algorithm. Finally, SVD-RND combined with a geometric transform achieves near-perfect detection accuracy on the CelebA dataset.

LG · Oct 22, 2019

Robust Training with Ensemble Consensus

Jisoo Lee, Sae-Young Chung

Since deep neural networks are over-parameterized, they can memorize noisy examples. We address this memorization issue in the presence of label noise. Based on the fact that deep neural networks cannot generalize to neighborhoods of memorized features, we hypothesize that noisy examples do not consistently incur small losses on the network under a certain perturbation. Based on this, we propose a novel training method called Learning with Ensemble Consensus (LEC) that prevents overfitting to noisy examples by removing them based on the consensus of an ensemble of perturbed networks. One of the proposed LECs, LTEC, outperforms the current state-of-the-art methods on noisy MNIST, CIFAR-10, and CIFAR-100 in an efficient manner.
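The consensus idea in the abstract above can be sketched with plain lists. This is a simplified illustration under stated assumptions: the per-network losses are taken as given (how the networks are perturbed is not described in the abstract), and the names `small_loss_set`, `ensemble_consensus`, and `keep_ratio` are hypothetical.

```python
def small_loss_set(losses, keep_ratio):
    """Indices of the `keep_ratio` fraction of examples with the smallest loss."""
    n_keep = max(1, int(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return set(order[:n_keep])


def ensemble_consensus(loss_matrix, keep_ratio=0.5):
    """Keep only examples that are small-loss under every perturbed network.

    `loss_matrix[m][i]` is the loss of example i under perturbed network m.
    Assumption: consensus is the intersection of per-network small-loss sets.
    """
    sets = [small_loss_set(row, keep_ratio) for row in loss_matrix]
    return sorted(set.intersection(*sets))


# Hypothetical losses: 3 perturbed networks, 4 training examples.
loss_matrix = [
    [0.10, 0.20, 2.00, 0.30],
    [0.20, 0.10, 0.15, 3.00],
    [0.10, 0.30, 2.50, 0.20],
]
kept = ensemble_consensus(loss_matrix, keep_ratio=0.75)
```

In this toy data, examples 2 and 3 each incur a small loss under some networks but not all of them, so they are treated as likely noisy and dropped, leaving examples 0 and 1.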

ML · Apr 3, 2019

Fourier Phase Retrieval with Extended Support Estimation via Deep Neural Network

Kyung-Su Kim, Sae-Young Chung

We consider the problem of sparse phase retrieval from Fourier transform magnitudes to recover the $k$-sparse signal vector and its support $\mathcal{T}$. We exploit an extended support estimate $\mathcal{E}$ with size larger than $k$ that satisfies $\mathcal{E} \supseteq \mathcal{T}$ and is obtained by a trained deep neural network (DNN). To make the DNN learnable, it provides $\mathcal{E}$ as the union of equivalent solutions of $\mathcal{T}$ by utilizing modulo Fourier invariances. The set $\mathcal{E}$ can be estimated with short running time via the DNN, and the support $\mathcal{T}$ can be determined from the DNN output rather than from the full index set by applying hard thresholding to $\mathcal{E}$. Thus, the DNN-based extended support estimation improves the reconstruction performance of the signal with a low complexity burden dependent on $k$. Numerical results verify that the proposed scheme achieves superior performance with lower complexity compared to local search-based greedy sparse phase retrieval and a state-of-the-art variant of the Fienup method.

LG · Apr 1, 2019

Tree Search Network for Sparse Regression

Kyung-Su Kim, Sae-Young Chung

We consider the classical sparse regression problem of recovering a sparse signal $x_0$ given a measurement vector $y = Φx_0+w$. We propose a tree search algorithm driven by a deep neural network for sparse regression (TSN). TSN improves the signal reconstruction performance of the deep neural network designed for sparse regression by performing a tree search with pruning. In both noiseless and noisy cases, TSN recovers synthetic and real signals with lower complexity than a conventional tree search and outperforms existing algorithms by a large margin for various types of the sensing matrix $Φ$ widely used in sparse regression.

LG · May 31, 2018

Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

Su Young Lee, Sungik Choi, Sae-Young Chung

We propose Episodic Backward Update (EBU), a novel deep reinforcement learning algorithm with direct value propagation. In contrast to the conventional use of experience replay with uniform random sampling, our agent samples a whole episode and successively propagates the value of a state to its previous states. Our computationally efficient recursive algorithm allows sparse and delayed rewards to propagate directly through all transitions of the sampled episode. We theoretically prove the convergence of the EBU method and experimentally demonstrate its performance in both deterministic and stochastic environments. In particular, on 49 games of the Atari 2600 domain, EBU achieves the same mean and median human-normalized performance as DQN using only 5% and 10% of the samples, respectively.
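The backward propagation described in the abstract above can be illustrated in a tabular setting. This sketch is not the deep RL algorithm from the paper: it assumes a Q-table instead of a network, a learning rate of 1, and a hypothetical three-step episode; the name `backward_update` is likewise illustrative.

```python
from collections import defaultdict


def backward_update(q, episode, gamma=0.9):
    """Update a tabular Q-function by sweeping one episode from last step to first.

    `episode` is a list of (state, action, reward, next_state, done) tuples.
    Processing transitions in reverse lets a reward at the end of the episode
    reach every earlier state in a single sweep.
    Assumption: learning rate 1, i.e. each entry is overwritten by its target.
    """
    for state, action, reward, next_state, done in reversed(episode):
        # Bootstrap from the best known action value of the next state.
        bootstrap = 0.0 if done else max(q[next_state].values(), default=0.0)
        q[state][action] = reward + gamma * bootstrap
    return q


# Hypothetical chain environment with a single sparse reward at the end.
q = defaultdict(dict)
episode = [
    (0, "right", 0.0, 1, False),
    (1, "right", 0.0, 2, False),
    (2, "right", 1.0, 3, True),
]
backward_update(q, episode, gamma=0.9)
```

After one sweep, the terminal reward has been discounted back to the first state. Updating the same three transitions in forward order would need three passes to achieve the same result, which is the sample-efficiency intuition behind sampling whole episodes.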