LGApr 8Code
Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the EdgeWonseon Lim, Jaesung Lee, Dae-Won Kim
Continual learning (CL) on edge devices requires not only high accuracy but also training-time efficiency to support on-device adaptation under strict memory and computational constraints. While prompt-based continual learning (PCL) is parameter-efficient and achieves competitive accuracy, prior work has focused mainly on accuracy or inference-time performance, often overlooking the memory and computational costs of on-device training. In this paper, we propose CPS-Prompt, a critical patch-aware sparse prompting framework that explicitly targets training-time memory usage and computational cost by integrating critical patch sampling (CPS) for task-aware token reduction and decoupled prompt and classifier training (DPCT) to reduce backpropagation overhead. Experiments on three public benchmarks and real edge hardware show that CPS-Prompt improves peak memory, training time, and energy efficiency by about 1.6x over the balanced CODA-Prompt baseline, while maintaining accuracy within 2% of the state-of-the-art C-Prompt on average and remaining competitive with CODA-Prompt in accuracy. The code is available at https://github.com/laymond1/cps-prompt.
LGJan 13
On Evaluation of Unsupervised Feature Selection for Pattern ClassificationGyu-Il Kim, Dae-Won Kim, Jaesung Lee
Unsupervised feature selection aims to identify a compact subset of features that captures the intrinsic structure of data without supervised label. Most existing studies evaluate the performance of methods using the single-label dataset that can be instantiated by selecting a label from multi-label data while maintaining the original features. Because the chosen label can vary arbitrarily depending on the experimental setting, the superiority among compared methods can be changed with regard to which label happens to be selected. Thus, evaluating unsupervised feature selection methods based solely on single-label accuracy is unreasonable for assessing their true discriminative ability. This study revisits this evaluation paradigm by adopting a multi-label classification framework. Experiments on 21 multi-label datasets using several representative methods demonstrate that performance rankings differ markedly from those reported under single-label settings, suggesting the possibility of multi-label evaluation settings for fair and reliable comparison of unsupervised feature selection methods.
MLAug 3, 2024
Cost-constrained multi-label group feature selection using shadow featuresTomasz Klonecki, Paweł Teisseyre, Jaesung Lee
We consider the problem of feature selection in multi-label classification, considering the costs assigned to groups of features. In this task, the goal is to select a subset of features that will be useful for predicting the label vector, but at the same time, the cost associated with the selected features will not exceed the assumed budget. Solving the problem is of great importance in medicine, where we may be interested in predicting various diseases based on groups of features. The groups may be associated with parameters obtained from a certain diagnostic test, such as a blood test. Because diagnostic test costs can be very high, considering cost information when selecting relevant features becomes crucial to reducing the cost of making predictions. We focus on the feature selection method based on information theory. The proposed method consists of two steps. First, we select features sequentially while maximizing conditional mutual information until the budget is exhausted. In the second step, we select additional cost-free features, i.e., those coming from groups that have already been used in previous steps. Limiting the number of added features is possible using the stop rule based on the concept of so-called shadow features, which are randomized counterparts of the original ones. In contrast to existing approaches based on penalized criteria, in our method, we avoid the need for computationally demanding optimization of the penalty parameter. Experiments conducted on the MIMIC medical database show the effectiveness of the method, especially when the assumed budget is limited.
CLJan 26
Suppressing Final Layer Hidden State Jumps in Transformer PretrainingKeigo Shibata, Kazuki Yano, Ryosuke Takahashi et al.
This paper discusses the internal behavior of Transformer language models. Many recent pre-trained models have been reported to exhibit only slight changes in the angular distance between the input and output hidden state vectors in the middle Transformer layers, despite a disproportionately large ``jump'' in the angular distance occurring in or around the final Transformer layer. To characterize this, we first introduce a quantitative metric for the jump strength around the final layer, and then demonstrate its prevalence across many open-weight models, as well as its amplification throughout pre-training. Assuming such jumps indicate an undesirable property, we propose the jump-suppressing regularizer (JREG) which penalizes this jump during pre-training, thereby encouraging more balanced capability usage across the middle layers. Empirical evaluations of three model sizes of Llama-based models, trained with the proposed JREG method, reveal improved task performance compared to the baseline without altering the model architecture.
CRFeb 6, 2025
BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing AttacksHanyong Lee, Chaelyn Lee, Yongjae Lee et al.
Phishing often targets victims through visually perturbed texts to bypass security systems. The noise contained in these texts functions as an adversarial attack, designed to deceive language models and hinder their ability to accurately interpret the content. However, since it is difficult to obtain sufficient phishing cases, previous studies have used synthetic datasets that do not contain real-world cases. In this study, we propose the BitAbuse dataset, which includes real-world phishing cases, to address the limitations of previous research. Our dataset comprises a total of 325,580 visually perturbed texts. The dataset inputs are drawn from the raw corpus, consisting of visually perturbed sentences and sentences generated through an artificial perturbation process. Each input sentence is labeled with its corresponding ground truth, representing the restored, non-perturbed version. Language models trained on our proposed dataset demonstrated significantly better performance compared to previous methods, achieving an accuracy of approximately 96%. Our analysis revealed a significant gap between real-world and synthetic examples, underscoring the value of our dataset for building reliable pre-trained models for restoration tasks. We release the BitAbuse dataset, which includes real-world phishing cases annotated with visual perturbations, to support future research in adversarial attack defense.
IVMar 4
Polyp Segmentation Using Wavelet-Based Cross-Band Integration for Enhanced Boundary RepresentationHaesung Oh, Jaesung Lee
Accurate polyp segmentation is essential for early colorectal cancer detection, yet achieving reliable boundary localization remains challenging due to low mucosal contrast, uneven illumination, and color similarity between polyps and surrounding tissue. Conventional methods relying solely on RGB information often struggle to delineate precise boundaries due to weak contrast and ambiguous structures between polyps and surrounding mucosa. To establish a quantitative foundation for this limitation, we analyzed polyp-background contrast in the wavelet domain, revealing that grayscale representations consistently preserve higher boundary contrast than RGB images across all frequency bands. This finding suggests that boundary cues are more distinctly represented in the grayscale domain than in the color domain. Motivated by this finding, we propose a segmentation model that integrates grayscale and RGB representations through complementary frequency-consistent interaction, enhancing boundary precision while preserving structural coherence. Extensive experiments on four benchmark datasets demonstrate that the proposed approach achieves superior boundary precision and robustness compared to conventional models.
LGSep 1, 2025
Advanced Torrential Loss Function for Precipitation ForecastingJaeho Choi, Hyeri Kim, Kwang-Ho Kim et al.
Accurate precipitation forecasting is becoming increasingly important in the context of climate change. In response, machine learning-based approaches have recently gained attention as an emerging alternative to traditional methods such as numerical weather prediction and climate models. Nonetheless, many recent approaches still rely on off-the-shelf loss functions, and even the more advanced ones merely involve optimization processes based on the critical success index (CSI). The problem, however, is that CSI may become ineffective during extended dry periods when precipitation remains below the threshold, rendering it less than ideal as a criterion for optimization. To address this limitation, we introduce a simple penalty expression and reinterpret it as a quadratic unconstrained binary optimization (QUBO) formulation. Ultimately, the resulting QUBO formulation is relaxed into a differentiable advanced torrential (AT) loss function through an approximation process. The proposed AT loss demonstrates its superiority through the Lipschitz constant, forecast performance evaluations, consistency experiments, and ablation studies with the operational model.
CLAug 25, 2025
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language ModelsWataru Ikeda, Kazuki Yano, Ryosuke Takahashi et al.
This study investigates the layerwise importance of feed-forward networks (FFNs) in Transformer-based language models during pretraining. We introduce an experimental approach that, while maintaining the total parameter count, increases the FFN dimensions in some layers and completely removes the FFNs from other layers. Furthermore, since our focus is on the importance of FFNs during pretraining, we train models from scratch to examine whether the importance of FFNs varies depending on their layer positions, rather than using publicly available pretrained models as is frequently done. Through comprehensive evaluations of models with varying sizes (285M, 570M, and 1.2B parameters) and layer counts (12, 24, and 40 layers), we demonstrate that concentrating FFNs in 70% of the consecutive middle layers consistently outperforms standard configurations for multiple downstream tasks.
MEJun 20, 2025
Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure PredictionSina Aghaee Dabaghan Fard, Minhee Kim, Akash Deep et al.
Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system's remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure modes and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach to jointly model the multi-sensor time-series data and failure time concerning multiple failure modes. This proposed model integrate a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model is validated through extensive numerical and case studies with jet-engine dataset.
LGJun 16, 2025
Robust Physics-Informed Neural Network Approach for Estimating Heterogeneous Elastic Properties from Noisy Displacement DataTatthapong Srikitrungruang, Matthew Lemon, Sina Aghaee Dabaghan Fard et al.
Accurately estimating spatially heterogeneous elasticity parameters, particularly Young's modulus and Poisson's ratio, from noisy displacement measurements remains significantly challenging in inverse elasticity problems. Existing inverse estimation techniques are often limited by instability, pronounced sensitivity to measurement noise, and difficulty in recovering absolute-scale Young's modulus. This work presents a novel Inverse Elasticity Physics-Informed Neural Network (IE-PINN) specifically designed to robustly reconstruct heterogeneous distributions of elasticity parameters from noisy displacement data based on linear elasticity physics. IE-PINN integrates three distinct neural network architectures dedicated to separately modeling displacement fields, strain fields, and elasticity distributions, thereby significantly enhancing stability and accuracy against measurement noise. Additionally, a two-phase estimation strategy is introduced: the first phase recovers relative spatial distributions of Young's modulus and Poisson's ratio, and the second phase calibrates the absolute scale of Young's modulus using imposed loading boundary conditions. Additional methodological innovations, including positional encoding, sine activation functions, and a sequential pretraining protocol, further enhance the model's performance and robustness. Extensive numerical experiments demonstrate that IE-PINN effectively overcomes critical limitations encountered by existing methods, delivering accurate absolute-scale elasticity estimations even under severe noise conditions. This advancement holds substantial potential for clinical imaging diagnostics and mechanical characterization, where measurements typically encounter substantial noise.
CVFeb 11, 2021
K-Hairstyle: A Large-scale Korean Hairstyle Dataset for Virtual Hair Editing and Hairstyle ClassificationTaewoo Kim, Chaeyeon Chung, Sunghyun Park et al.
The hair and beauty industry is a fast-growing industry. This led to the development of various applications, such as virtual hair dyeing or hairstyle transfer, to satisfy the customer's needs. Although several hairstyle datasets are available for these applications, they often consist of a relatively small number of images with low resolution, thus limiting their performance on high-quality hair editing. In response, we introduce a novel large-scale Korean hairstyle dataset, K-hairstyle, containing 500,000 high-resolution images. In addition, K-hairstyle includes various hair attributes annotated by Korean expert hairstylists as well as hair segmentation masks. We validate the effectiveness of our dataset via several applications, such as hair dyeing, hairstyle transfer, and hairstyle classification. K-hairstyle is publicly available at https://psh01087.github.io/K-Hairstyle/.