LG | Aug 15, 2023 | Code
NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients
Honggu Kang, Seohyeon Cha, Jinwoo Shin et al.
Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly prolong total training time and degrade performance. Mitigating the impact of stragglers requires addressing system heterogeneity, including heterogeneous computing capabilities and network bandwidth. While previous studies have addressed system heterogeneity by splitting models into submodels, they offer limited flexibility in model architecture design and do not consider potential inconsistencies arising from training multiple submodel architectures. We propose nested federated learning (NeFL), a generalized framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling. To address the inconsistency arising from training multiple submodel architectures, NeFL decouples a subset of parameters from those being trained for each submodel. An averaging method is proposed to handle these decoupled parameters during aggregation. NeFL enables resource-constrained devices to effectively participate in the FL pipeline, facilitating larger datasets for model training. Experiments demonstrate that NeFL achieves performance gains, especially for the worst-case submodel, compared to baseline approaches (7.63% improvement on CIFAR-100). Furthermore, NeFL aligns with recent advances in FL, such as leveraging pre-trained models and accounting for statistical heterogeneity. Our code is available online.
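To make the idea of aggregating nested submodels concrete, here is a minimal sketch of coordinate-wise averaging over nested (prefix) submodels. It is a generic illustration, not NeFL's actual aggregation rule: the flat parameter list, the assumption that each client trains a prefix of the global parameters, and the function name `aggregate_nested` are all hypothetical simplifications, and the decoupled parameters mentioned in the abstract are not modeled.

```python
def aggregate_nested(global_params, client_updates):
    """Average each coordinate over the clients whose submodel contains it.

    Assumption (not from the paper): client i trained the first
    len(client_updates[i]) coordinates of the global parameter list.
    Coordinates that no client trained keep their previous global value.
    """
    new_params = list(global_params)
    for j in range(len(global_params)):
        # Collect the j-th coordinate from every client large enough to hold it.
        values = [update[j] for update in client_updates if len(update) > j]
        if values:
            new_params[j] = sum(values) / len(values)
    return new_params


# A small client (2 coordinates) and a larger client (3 coordinates).
aggregated = aggregate_nested([0.0, 0.0, 0.0, 0.0], [[1.0, 1.0], [3.0, 3.0, 3.0]])
```

In this toy setting the first two coordinates are averaged across both clients, the third comes from the larger client alone, and the fourth is left unchanged because no client trained it.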
LG | Mar 4
Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback
Haoran Zhang, Seohyeon Cha, Hasan Burhan Beytur et al.
Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the hierarchy. To address this, we develop a variance-reduced EXP4-based algorithm integrated with Lyapunov optimization, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. We provide regret guarantees relative to the best fixed routing policy in hindsight and establish near-optimality under stochastic arrivals and resource constraints. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.
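The variance amplification described above can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that feedback is observed only when a task is offloaded at every layer, so the observation probability is the product of hypothetical per-layer offload probabilities. It computes the exact mean and variance of the standard importance-weighted estimate (loss divided by observation probability when observed, zero otherwise). It does not implement the paper's variance-reduced EXP4 algorithm.

```python
def iw_estimator_moments(loss, offload_probs):
    """Exact mean and variance of an importance-weighted loss estimate.

    Assumption (illustrative): feedback is revealed with probability equal to
    the product of the per-layer offload probabilities in `offload_probs`.
    The estimate is loss / p_observe when observed and 0 otherwise.
    """
    p_observe = 1.0
    for q in offload_probs:
        p_observe *= q
    estimate_if_observed = loss / p_observe
    mean = p_observe * estimate_if_observed            # equals the true loss
    second_moment = p_observe * estimate_if_observed ** 2
    variance = second_moment - mean ** 2               # loss^2 * (1/p_observe - 1)
    return p_observe, mean, variance


# Same loss, increasing depth: the mean is unchanged while the variance grows.
depth_1 = iw_estimator_moments(1.0, [0.5])
depth_2 = iw_estimator_moments(1.0, [0.5, 0.5])
depth_3 = iw_estimator_moments(1.0, [0.5, 0.5, 0.5])
```

With a loss of 1 and an offload probability of 0.5 per layer, the estimator stays unbiased at every depth, but its variance is 1, 3, and 7 for depths 1, 2, and 3.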
LG | Feb 5
Regularized Calibration with Successive Rounding for Post-Training Quantization
Seohyeon Cha, Huancheng Chen, Dongjun Kim et al.
Large language models (LLMs) deliver robust performance across diverse applications, yet their deployment often faces challenges due to the memory and latency costs of storing and accessing billions of parameters. Post-training quantization (PTQ) enables efficient inference by mapping pretrained weights to low-bit formats without retraining, but its effectiveness depends critically on both the quantization objective and the rounding procedure used to obtain low-bit weight representations. In this work, we show that interpolating between symmetric and asymmetric calibration acts as a form of regularization that preserves the standard quadratic structure used in PTQ while providing robustness to activation mismatch. Building on this perspective, we derive a simple successive rounding procedure that naturally incorporates asymmetric calibration, as well as a bounded-search extension that allows for an explicit trade-off between quantization quality and the compute cost. Experiments across multiple LLM families, quantization bit-widths, and benchmarks demonstrate that the proposed bounded search based on a regularized asymmetric calibration objective consistently improves perplexity and accuracy over PTQ baselines, while incurring only modest and controllable additional computational cost.
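For readers unfamiliar with what mapping weights to a low-bit format involves, the following is a minimal sketch of plain round-to-nearest quantization on a signed integer grid with the zero-point fixed at zero. This is a common baseline, not the regularized calibration or successive rounding procedure proposed in the paper. The function name `quantize_rtn` and the per-tensor scale are assumptions made for illustration.

```python
import math


def quantize_rtn(weights, bits=4):
    """Round-to-nearest quantization with a single per-tensor scale.

    Baseline sketch only: each weight is mapped independently to the nearest
    integer code in [-2^(bits-1), 2^(bits-1) - 1], ignoring activations.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # floor(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
    codes = [min(qmax, max(qmin, math.floor(w / scale + 0.5))) for w in weights]
    dequantized = [c * scale for c in codes]
    return codes, scale, dequantized


codes, scale, dequantized = quantize_rtn([7.0, -3.0, 0.0, 1.2], bits=4)
```

Because each weight is rounded independently, this baseline ignores how rounding errors interact through the layer's activations, which is the gap that calibration objectives and more careful rounding procedures aim to close.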
LG | Oct 17, 2023
On the Temperature of Bayesian Graph Neural Networks for Conformal Prediction
Seohyeon Cha, Honggu Kang, Joonhyuk Kang
Accurate uncertainty quantification in graph neural networks (GNNs) is essential, especially in high-stakes domains where GNNs are frequently employed. Conformal prediction (CP) offers a promising framework for quantifying uncertainty by providing $\textit{valid}$ prediction sets for any black-box model. CP ensures formal probabilistic guarantees that a prediction set contains a true label with a desired probability. However, the size of prediction sets, known as $\textit{inefficiency}$, is influenced by the underlying model and data generating process. On the other hand, Bayesian learning also provides a credible region based on the estimated posterior distribution, but this region is $\textit{well-calibrated}$ only when the model is correctly specified. Building on recent work that introduced a scaling parameter for constructing valid credible regions from posterior estimates, our study explores the advantages of incorporating a temperature parameter into Bayesian GNNs within the CP framework. We empirically demonstrate the existence of temperatures that result in more efficient prediction sets. Furthermore, we conduct an analysis to identify the factors contributing to inefficiency and offer valuable insights into the relationship between CP performance and model calibration.
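As a rough illustration of how a temperature can enter conformal prediction, the sketch below implements standard split conformal prediction on temperature-scaled softmax outputs. It is a simplified stand-in, not the Bayesian GNN setup studied in the paper: the nonconformity score (one minus the probability of the true class), the function names, and the toy logits are all assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; larger temperatures flatten the output."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_threshold(cal_logits, cal_labels, alpha, temperature=1.0):
    """Split-conformal quantile of the scores 1 - p(true label)."""
    scores = sorted(
        1.0 - softmax(logits, temperature)[label]
        for logits, label in zip(cal_logits, cal_labels)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the set must contain every label.
    return math.inf if k > n else scores[k - 1]


def prediction_set(logits, qhat, temperature=1.0):
    """All labels whose score 1 - p(label) does not exceed the threshold."""
    probs = softmax(logits, temperature)
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration data (hypothetical logits and labels).
cal_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 0.0]]
cal_labels = [0, 1, 0, 0]
qhat = conformal_threshold(cal_logits, cal_labels, alpha=0.5)
confident_set = prediction_set([3.0, 0.0], qhat)
```

The coverage guarantee of split conformal prediction holds for any fixed temperature, provided the same temperature is used for calibration and prediction. What changes with temperature is the size of the resulting sets, which is the inefficiency the abstract refers to.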
LG | Dec 24, 2024 | Code
GeFL: Model-Agnostic Federated Learning with Generative Models
Honggu Kang, Seohyeon Cha, Joonhyuk Kang
Federated learning (FL) is a distributed training paradigm that enables collaborative learning across clients without sharing local data, thereby preserving privacy. However, the increasing scale and complexity of modern deep models often exceed the computational or memory capabilities of edge devices. Furthermore, clients may be constrained to use heterogeneous model architectures due to hardware variability (e.g., ASICs, FPGAs) or proprietary requirements that prevent the disclosure or modification of local model structures. These practical considerations motivate the need for model-heterogeneous FL, where clients participate using distinct model architectures. In this work, we propose Generative Model-Aided Federated Learning (GeFL), a framework that enables cross-client knowledge sharing via a generative model trained in a federated manner. This generative model captures global data semantics and facilitates local training without requiring model homogeneity across clients. While GeFL achieves strong performance, empirical analysis reveals limitations in scalability and potential privacy leakage due to generative sample memorization. To address these concerns, we propose GeFL-F, which utilizes feature-level generative modeling. This approach enhances scalability to large client populations and mitigates privacy risks. Extensive experiments across image classification tasks demonstrate that both GeFL and GeFL-F offer competitive performance in heterogeneous settings. Code is available at [1].
LG | Sep 25, 2025
Task-Agnostic Federated Continual Learning via Replay-Free Gradient Projection
Seohyeon Cha, Huancheng Chen, Haris Vikalo
Federated continual learning (FCL) enables distributed client devices to learn from streaming data across diverse and evolving tasks. A major challenge to continual learning, catastrophic forgetting, is exacerbated in decentralized settings by data heterogeneity, constrained communication, and privacy concerns. We propose Federated gradient Projection-based Continual Learning with Task Identity Prediction (FedProTIP), a novel FCL framework that mitigates forgetting by projecting client updates onto the orthogonal complement of the subspace spanned by previously learned representations of the global model. This projection reduces interference with earlier tasks and preserves performance across the task sequence. To further address the challenge of task-agnostic inference, we incorporate a lightweight mechanism that leverages core bases from prior tasks to predict task identity and dynamically adjust the global model's outputs. Extensive experiments across standard FCL benchmarks demonstrate that FedProTIP significantly outperforms state-of-the-art methods in average accuracy, particularly in settings where task identities are a priori unknown.
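The projection step can be sketched with basic linear algebra. The example below removes from an update its components along a previously learned subspace, leaving only the part in the orthogonal complement. It works on flat vectors with a Gram-Schmidt basis, which is an assumption made for brevity. How FedProTIP constructs its bases and applies the projection per layer is not specified here, and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(update, basis):
    """Project `update` onto the orthogonal complement of span(basis).

    `basis` must be orthonormal, e.g. the output of `orthonormalize`.
    """
    g = list(update)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Directions assumed important for earlier tasks (toy values).
old_basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
projected = project_out([3.0, 4.0, 5.0], old_basis)
```

After projection the update has no component along the stored directions, so, to first order, it does not change the model's response along the subspace associated with earlier tasks.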
LG | Aug 18, 2025
Batching-Aware Joint Model Onloading and Offloading for Hierarchical Multi-Task Inference
Seohyeon Cha, Kevin Chan, Gustavo de Veciana et al.
The growing demand for intelligent services on resource-constrained edge devices has spurred the development of collaborative inference systems that distribute workloads across end devices, edge servers, and the cloud. While most existing frameworks focus on single-task, single-model scenarios, many real-world applications (e.g., autonomous driving and augmented reality) require concurrent execution of diverse tasks including detection, segmentation, and depth estimation. In this work, we propose a unified framework to jointly decide which multi-task models to deploy (onload) at clients and edge servers, and how to route queries across the hierarchy (offload) to maximize overall inference accuracy under memory, compute, and communication constraints. We formulate this as a mixed-integer program and introduce J3O (Joint Optimization of Onloading and Offloading), an alternating algorithm that (i) greedily selects models to onload via Lagrangian-relaxed submodular optimization and (ii) determines optimal offloading via constrained linear programming. We further extend J3O to account for batching at the edge, maintaining scalability under heterogeneous task loads. Experiments show J3O consistently achieves over $97\%$ of the optimal accuracy while incurring less than $15\%$ of the runtime required by the optimal solver across multi-task benchmarks.