Hyunju Kim

LG
h-index10
6papers
14citations
Novelty49%
AI Score44

6 Papers

CVDec 23, 2025
DETACH : Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning

Junho Yoon, Jaemo Jung, Hyunju Kim et al.

Aligning egocentric video with wearable sensors have shown promise for human action recognition, but face practical limitations in user discomfort, privacy concerns, and scalability. We explore exocentric video with ambient sensors as a non-intrusive, scalable alternative. While prior egocentric-wearable works predominantly adopt Global Alignment by encoding entire sequences into unified representations, this approach fails in exocentric-ambient settings due to two problems: (P1) inability to capture local details such as subtle motions, and (P2) over-reliance on modality-invariant temporal patterns, causing misalignment between actions sharing similar temporal patterns with different spatio-semantic contexts. To resolve these problems, we propose DETACH, a decomposed spatio-temporal framework. This explicit decomposition preserves local details, while our novel sensor-spatial features discovered via online clustering provide semantic grounding for context-aware alignment. To align the decomposed features, our two-stage approach establishes spatial correspondence through mutual supervision, then performs temporal alignment via a spatial-temporal weighted contrastive loss that adaptively handles easy negatives, hard negatives, and false negatives. Comprehensive experiments with downstream tasks on Opportunity++ and HWU-USP datasets demonstrate substantial improvements over adapted egocentric-wearable baselines.

LGMar 19, 2024Code
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer

Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.

The success of a specific neural network architecture is closely tied to the dataset and task it tackles; there is no one-size-fits-all solution. Thus, considerable efforts have been made to quickly and accurately estimate the performances of neural architectures, without full training or evaluation, for given tasks and datasets. Neural architecture encoding has played a crucial role in the estimation, and graphbased methods, which treat an architecture as a graph, have shown prominent performance. For enhanced representation learning of neural architectures, we introduce FlowerFormer, a powerful graph transformer that incorporates the information flows within a neural architecture. FlowerFormer consists of two key components: (a) bidirectional asynchronous message passing, inspired by the flows; (b) global attention built on flow-based masking. Our extensive experiments demonstrate the superiority of FlowerFormer over existing neural encoding methods, and its effectiveness extends beyond computer vision models to include graph neural networks and auto speech recognition models. Our code is available at http://github.com/y0ngjaenius/CVPR2024_FLOWERFormer.

HCJan 17, 2024
DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

Hyunju Kim, Geon Kim, Taehoon Lee et al.

With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrained private spaces and those for public spaces have yet to be explored despite growing interest in research on them. To meet this need, we build a dataset collected from a meeting room equipped with ambient sensors. The dataset, DOO-RE, includes data streams from various ambient sensor types such as Sound and Projector. Each sensor data stream is segmented into activity units and multiple annotators provide activity labels through a cross-validation annotation process to improve annotation quality. We finally obtain 9 types of activities. To our best knowledge, DOO-RE is the first dataset to support the recognition of both single and group activities in a real meeting room with reliable annotations.

LGSep 27, 2025
Signal Preserving Weight Initialization for Odd-Sigmoid Activations

Hyunwoo Lee, Hayoung Choi, Hyunju Kim

Activation functions critically influence trainability and expressivity, and recent work has therefore explored a broad range of nonlinearities. However, activations and weight initialization are interdependent: without an appropriate initialization method, nonlinearities can cause saturation, variance collapse, and increased learning rate sensitivity. We address this by defining an odd sigmoid function class and, given any activation f in this class, proposing an initialization method tailored to f. The method selects a noise scale in closed form so that forward activations remain well dispersed up to a target layer, thereby avoiding collapse to zero or saturation. Empirically, the approach trains reliably without normalization layers, exhibits strong data efficiency, and enables learning for activations under which standard initialization methods (Xavier, He, Orthogonal) often do not converge reliably.

LGFeb 3, 2025
DiffIM: Differentiable Influence Minimization with Surrogate Modeling and Continuous Relaxation

Junghun Lee, Hyunju Kim, Fanchen Bu et al.

In social networks, people influence each other through social links, which can be represented as propagation among nodes in graphs. Influence minimization (IMIN) is the problem of manipulating the structures of an input graph (e.g., removing edges) to reduce the propagation among nodes. IMIN can represent time-critical real-world applications, such as rumor blocking, but IMIN is theoretically difficult and computationally expensive. Moreover, the discrete nature of IMIN hinders the usage of powerful machine learning techniques, which requires differentiable computation. In this work, we propose DiffIM, a novel method for IMIN with two differentiable schemes for acceleration: (1) surrogate modeling for efficient influence estimation, which avoids time-consuming simulations (e.g., Monte Carlo), and (2) the continuous relaxation of decisions, which avoids the evaluation of individual discrete decisions (e.g., removing an edge). We further propose a third accelerating scheme, gradient-driven selection, that chooses edges instantly based on gradients without optimization (spec., gradient descent iterations) on each test instance. Through extensive experiments on real-world graphs, we show that each proposed scheme significantly improves speed with little (or even no) IMIN performance degradation. Our method is Pareto-optimal (i.e., no baseline is faster and more effective than it) and typically several orders of magnitude (spec., up to 15,160X) faster than the most effective baseline while being more effective.

LGJan 17, 2024
Self-supervised New Activity Detection in Sensor-based Smart Environments

Hyunju Kim, Dongman Lee

With the rapid advancement of ubiquitous computing technology, human activity analysis based on time series data from a diverse range of sensors enables the delivery of more intelligent services. Despite the importance of exploring new activities in real-world scenarios, existing human activity recognition studies generally rely on predefined known activities and often overlook detecting new patterns (novelties) that have not been previously observed during training. Novelty detection in human activities becomes even more challenging due to (1) diversity of patterns within the same known activity, (2) shared patterns between known and new activities, and (3) differences in sensor properties of each activity dataset. We introduce CLAN, a two-tower model that leverages Contrastive Learning with diverse data Augmentation for New activity detection in sensor-based environments. CLAN simultaneously and explicitly utilizes multiple types of strongly shifted data as negative samples in contrastive learning, effectively learning invariant representations that adapt to various pattern variations within the same activity. To enhance the ability to distinguish between known and new activities that share common features, CLAN incorporates both time and frequency domains, enabling the learning of multi-faceted discriminative representations. Additionally, we design an automatic selection mechanism of data augmentation methods tailored to each dataset's properties, generating appropriate positive and negative pairs for contrastive learning. Comprehensive experiments on real-world datasets show that CLAN achieves a 9.24% improvement in AUROC compared to the best-performing baseline model.