LGNov 18, 2025
LAUD: Integrating Large Language Models with Active Learning for Unlabeled DataTzu-Hsuan Chou, Chun-Nan Chou
Large language models (LLMs) have shown a remarkable ability to generalize beyond their pre-training data, and fine-tuning LLMs can elevate performance to human-level and beyond. However, in real-world scenarios, lacking labeled data often prevents practitioners from obtaining well-performing models, thereby forcing practitioners to highly rely on prompt-based approaches that are often tedious, inefficient, and driven by trial and error. To alleviate this issue of lacking labeled data, we present a learning framework integrating LLMs with active learning for unlabeled dataset (LAUD). LAUD mitigates the cold-start problem by constructing an initial label set with zero-shot learning. Experimental results show that LLMs derived from LAUD outperform LLMs with zero-shot or few-shot learning on commodity name classification tasks, demonstrating the effectiveness of LAUD.
LGMay 29, 2019
G2R Bound: A Generalization Bound for Supervised Learning from GAN-Synthetic DataFu-Chieh Chang, Hao-Jen Wang, Chun-Nan Chou et al.
Performing supervised learning from the data synthesized by using Generative Adversarial Networks (GANs), dubbed GAN-synthetic data, has two important applications. First, GANs may generate more labeled training data, which may help improve classification accuracy. Second, in scenarios where real data cannot be released outside certain premises for privacy and/or security reasons, using GAN- synthetic data to conduct training is a plausible alternative. This paper proposes a generalization bound to guarantee the generalization capability of a classifier learning from GAN-synthetic data. This generalization bound helps developers gauge the generalization gap between learning from synthetic data and testing on real data, and can therefore provide the clues to improve the generalization capability.
LGApr 12, 2019
Distributed Layer-Partitioned Training for Privacy-Preserved Deep LearningChun-Hsien Yu, Chun-Nan Chou, Emily Chang
Deep Learning techniques have achieved remarkable results in many domains. Often, training deep learning models requires large datasets, which may require sensitive information to be uploaded to the cloud to accelerate training. To adequately protect sensitive information, we propose distributed layer-partitioned training with step-wise activation functions for privacy-preserving deep learning. Experimental results attest our method to be simple and effective.
LGSep 18, 2018
MBS: Macroblock Scaling for CNN Model ReductionYu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang
In this paper we propose the macroblock scaling (MBS) algorithm, which can be applied to various CNN architectures to reduce their model size. MBS adaptively reduces each CNN macroblock depending on its information redundancy measured by our proposed effective flops. Empirical studies conducted with ImageNet and CIFAR-10 attest that MBS can reduce the model size of some already compact CNN models, e.g., MobileNetV2 (25.03% further reduction) and ShuffleNet (20.74%), and even ultra-deep ones such as ResNet-101 (51.67%) and ResNet-1202 (72.71%) with negligible accuracy degradation. MBS also performs better reduction at a much lower cost than the state-of-the-art optimization-based methods do. MBS's simplicity and efficiency, its flexibility to work with any CNN model, and its scalability to work with models of any depth make it an attractive choice for CNN model size reduction.
LGJul 16, 2018
BRIEF: Backward Reduction of CNNs with Information Flow AnalysisYu-Hsun Lin, Chun-Nan Chou, Edward Y. Chang
This paper proposes BRIEF, a backward reduction algorithm that explores compact CNN-model designs from the information flow perspective. This algorithm can remove substantial non-zero weighting parameters (redundant neural channels) of a network by considering its dynamic behavior, which traditional model-compaction techniques cannot achieve. With the aid of our proposed algorithm, we achieve significant model reduction on ResNet-34 in the ImageNet scale (32.3% reduction), which is 3X better than the previous result (10.8%). Even for highly optimized models such as SqueezeNet and MobileNet, we can achieve additional 10.81% and 37.56% reduction, respectively, with negligible performance degradation.
LGFeb 19, 2018
EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural NetworksSheng-Wei Chen, Chun-Nan Chou, Edward Y. Chang
For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method including: 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNNs where the activation and criterion functions are twice differentiable. We devise a CG-based method incorporating one-rank approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear equation where the coefficient matrix is Kronecker-factored, symmetric and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
DCAug 10, 2017
Distributed Training Large-Scale Deep ArchitecturesShang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu et al.
Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinter data parallelism. We then devise guidelines that help practitioners to configure an effective system and fine-tune parameters to achieve desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.
CVJul 25, 2017
Representation Learning on Large and Small DataChun-Nan Chou, Chuen-Kai Shie, Fu-Chieh Chang et al.
Deep learning owes its success to three key factors: scale of data, enhanced models to learn representations from data, and scale of computation. This book chapter presented the importance of the data-driven approach to learn good representations from both big data and small data. In terms of big data, it has been widely accepted in the research community that the more data the better for both representation and classification improvement. The question is then how to learn representations from big data, and how to perform representation learning when data is scarce. We addressed the first question by presenting CNN model enhancements in the aspects of representation, optimization, and generalization. To address the small data challenge, we showed transfer representation learning to be effective. Transfer representation learning transfers the learned representation from a source domain where abundant training data is available to a target domain where training data is scarce. Transfer representation learning gave the OM and melanoma diagnosis modules of our XPRIZE Tricorder device (which finished $2^{nd}$ out of $310$ competing teams) a significant boost in diagnosis accuracy.