Cheng-Yao Hong

LGOct 13, 2021

Chun-Hsiao Yeh, Cheng-Yao Hong, Yen-Chi Hsu et al.

Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented "views" of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, including large sample batches, extensive training epochs, etc. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline of contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, leading to unsuitable learning efficiency concerning the batch size. By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the denominator and significantly improves the learning efficiency. DCL achieves competitive performance with less sensitivity to sub-optimal hyperparameters, requiring neither large batches in SimCLR, momentum encoding in MoCo, or large epochs. We demonstrate with various benchmarks while manifesting robustness as much less sensitive to suboptimal hyperparameters. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its SimCLR baseline by 6.4%. Further, DCL can be combined with the SOTA contrastive learning method, NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with 512 batch size in 400 epochs, which represents a new SOTA in contrastive learning. We believe DCL provides a valuable baseline for future contrastive SSL studies.

CVOct 28, 2019

ACE: Adaptive Confusion Energy for Natural World Data Distribution

Yen-Chi Hsu, Cheng-Yao Hong, Wan-Cyuan Fan et al.

With the development of deep learning, standard classification problems have achieved good results. However, conventional classification problems are often too idealistic. Most data in the natural world usually have imbalanced distribution and fine-grained characteristics. Recently, many state-of-the-art approaches tend to focus on one or another separately, but rarely on both. In this paper, we introduce a novel and adaptive batch-wise regularization based on the proposed Adaptive Confusion Energy (ACE) to flexibly address the nature world distribution, which usually involves fine-grained and long-tailed properties at the same time. ACE increases the difficulty of the training process and further alleviates the overfitting problem. Through the datasets with the technical issue in fine-grained (CUB, CAR, AIR) and long-tailed (ImageNet-LT), or comprehensive issues (CUB-LT, iNaturalist), the result shows that the ACE is not only competitive to some state-of-the-art on performance but also demonstrates the effectiveness of training.

Cheng-Yao Hong

2 Papers