Haotian Lu

AR
h-index12
10papers
16citations
Novelty59%
AI Score56

10 Papers

NCJan 17, 2024Code
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation

Nianzu Yang, Kaipeng Zeng, Haotian Lu et al.

Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimicks the neuron natural growth mechanism for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.

ETMar 5, 2024Code
DOCTOR: Dynamic On-Chip Temporal Variation Remediation Toward Self-Corrected Photonic Tensor Accelerators

Haotian Lu, Sanmitra Banerjee, Jiaqi Gu

Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads, offering unparalleled speed and energy efficiency, especially in resource-limited, latency-sensitive edge computing environments. However, the deployment of analog photonic tensor accelerators encounters reliability challenges due to hardware noise and environmental variations. While off-chip noise-aware training and on-chip training have been proposed to enhance the variation tolerance of optical neural accelerators with moderate, static noise, we observe a notable performance degradation over time due to temporally drifting variations, which requires a real-time, in-situ calibration mechanism. To tackle this challenging reliability issues, for the first time, we propose a lightweight dynamic on-chip remediation framework, dubbed DOCTOR, providing adaptive, in-situ accuracy recovery against temporally drifting noise. The DOCTOR framework intelligently monitors the chip status using adaptive probing and performs fast in-situ training-free calibration to restore accuracy when necessary. Recognizing nonuniform spatial variation distributions across devices and tensor cores, we also propose a variation-aware architectural remapping strategy to avoid executing critical tasks on noisy devices. Extensive experiments show that our proposed framework can guarantee sustained performance under drifting variations with 34% higher accuracy and 2-3 orders-of-magnitude lower overhead compared to state-of-the-art on-chip training methods. Our code is open-sourced at https://github.com/ScopeX-ASU/DOCTOR.

55.7AIApr 12
CHAIRO: Contextual Hierarchical Analogical Induction and Reasoning Optimization for LLMs

Haotian Lu, Yuchen Mou, Bingzhe Wu

Content moderation in online platforms faces persistent challenges due to the evolving complexity of user-generated content and the limitations of traditional rule-based and machine learning approaches. While recent advances in large language models (LLMs) have enabled more sophisticated moderation via direct prompting or fine-tuning, these approaches often exhibit limited generalization, interpretability, and adaptability to unseen or ambiguous cases. In this work, we propose a novel moderation framework that leverages analogical examples to enhance rule induction and decision reliability. Our approach integrates end-to-end optimization of analogical retrieval, rule generation, and moderation classification, enabling the dynamic adaptation of moderation rules to diverse content scenarios. Through comprehensive experiments, we demonstrate that our method significantly outperforms both rule-injected fine-tuning baselines and multi-stage static RAG pipelines in terms of moderation accuracy and rule quality. Further evaluations, including human assessments and external model generalization tests, confirm that our framework produces rules with better clarity, interpretability, and applicability. These findings show that analogical example-driven methods can advance robust, explainable, and generalizable content moderation in real-world applications.

58.1AIApr 12
CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Bingzhe Wu, Haotian Lu, Yuchen Mou

Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce \caro (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, \caro bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, \caro dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that \caro substantially outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, achieving an average F1 score improvement of 24.9\% on challenging ambiguous moderation benchmarks.

12.9ARMar 19
WarPGNN: A Parametric Thermal Warpage Analysis Framework with Physics-aware Graph Neural Network

Haotian Lu, Jincong Lu, Sachin Sachdeva et al.

With the advent of system-in-package (SiP) chiplet-based design and heterogeneous 2.5D/3D integration, thermal-induced warpage has become a critical reliability concern. While conventional numerical approaches can deliver highly accurate results, they often incur prohib- itively high computational costs, limiting their scalability for complex chiplet-package systems. In this paper, we present WarPGNN, an ef- ficient and accurate parametric thermal warpage analysis framework powered by Graph Neural Networks (GNNs). By operating directly on graphs constructed from the floorplans, WarPGNN enables fast warpage-aware floorplan exploration and exhibits strong transfer- ability across diverse package configurations. Our method first en- codes multi-die floorplans into reduced Transitive Closure Graphs (rTCGs), then a Graph Convolution Network (GCN)-based encoder extracts hierarchical structural features, followed by a U-Net inspired decoder that reconstructs warpage maps from graph feature embed- dings. Furthermore, to address the long-tailed pattern of warpage data distribution, we developed a physics-informed loss and revised a message-passing encoder based on Graph Isomorphic Network (GIN) that further enhance learning performance for extreme cases and expressiveness of graph embeddings. Numerical results show that WarPGNN achieves more than 205.91x speedup compared with the 2-D efficient FEM-based method and over 119766.64x acceleration with 3-D FEM method COMSOL, respectively, while maintaining comparable accuracy at only 1.26% full-scale normalized RMSE and 2.21% warpage value error. Compared with recent DeepONet-based model, our method achieved comparable prediction accuracy and in- ference speedup with 3.4x lower training time. In addition, WarPGNN demonstrates remarkable transferability on unseen datasets with up to 3.69% normalized RMSE and similar runtime.

7.6ARMar 26
Accelerating Physics-Based Electromigration Analysis via Rational Krylov Subspaces

Sheldon X. -D. Tan, Haotian Lu

Electromigration (EM) induced stress evolution is a major reliability challenge in nanometer-scale VLSI interconnects. Accurate EM analysis requires solving stress-governing partial differential equations over large interconnect trees, which is computationally expensive using conventional finite-difference methods. This work proposes two fast EM stress analysis techniques based on rational Krylov subspace reduction. Unlike traditional Krylov methods that expand around zero frequency, rational Krylov methods enable expansion at selected time constants, aligning directly with metrics such as nucleation and steady-state times and producing compact reduced models with minimal accuracy loss. Two complementary frameworks are developed: a frequency-domain extended rational Krylov method, ExtRaKrylovEM, and a time-domain rational Krylov exponential integration method, EiRaKrylovEM. We show that the accuracy of both methods depends strongly on the choice of expansion point, or shift time, and demonstrate that effective shift times are typically close to times of interest such as nucleation or post-void steady state. Based on this observation, a coordinate descent optimization strategy is introduced to automatically determine optimal reduction orders and shift times for both nucleation and post-void phases. Experimental results on synthesized structures and industry-scale power grids show that the proposed methods achieve orders-of-magnitude improvements in efficiency and accuracy over finite-difference solutions. Using only 4 to 6 Krylov orders, the methods achieve sub-0.1 percent error in nucleation time and resistance change predictions while delivering 20 to 500 times speedup. In contrast, standard extended Krylov methods require more than 50 orders and still incur 10 to 20 percent nucleation time error, limiting their practicality for EM-aware optimization and stochastic EM analysis.

IVMay 15, 2024
Large coordinate kernel attention network for lightweight image super-resolution

Fangwei Hao, Jiesheng Wu, Haotian Lu et al.

The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building block with multi-scale receptive field for local modeling, and their LKA modules face a quadratic increase in computational and memory footprints as the convolutional kernel size increases. To address the first issue, we propose the multi-scale blueprint separable convolutions (MBSConv) as highly efficient building block with multi-scale receptive field, it can focus on the learning for the multi-scale information which is a vital component of discriminative representation. As for the second issue, we revisit the key properties of LKA in which we find that the adjacent direct interaction of local information and long-distance dependencies is crucial to provide remarkable performance. Thus, taking this into account and in order to mitigate the complexity of LKA, we propose a large coordinate kernel attention (LCKA) module which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1-D kernels. LCKA enables the adjacent direct interaction of local information and long-distance dependencies not only in the horizontal direction but also in the vertical. Besides, LCKA allows for the direct use of extremely large kernels in the depth-wise convolutional layers to capture more contextual information, which helps to significantly improve the reconstruction performance, and it incurs lower computational complexity and memory footprints. Integrating MBSConv and LCKA, we propose a large coordinate kernel attention network (LCAN).

29.5ARApr 14
EMSpice 3: Full-chip Temperature-Aware Multiphysics Electromigration and IR-Drop Analysis

Haotian Lu, Sheldon X. -D. Tan

In this work, we present EMSpice 3, a full-chip temperature-aware multiphysics framework for coupled electromigration (EM), thermomigration (TM), and IR-drop analysis of power-grid networks. It unifies extracted netlists, configurable parameters, and optional chip-level thermal maps into a single flow supporting temperature-aware immortality screening, transient EM/TM stress simulation with iterative resistance updates, and optional Monte Carlo lifetime analysis. To accelerate large-tree simulations, EMSpice 3 integrates an extended rational Krylov reduction method into the transient solver without loss of accuracy. It also interfaces with Synopsys ICC and Fusion Compiler for practical deployment. By incorporating realistic spatial thermal maps into the reliability loop, the framework enables map-aware EM sign-off beyond average-temperature assumptions. Experiments on six designs show that spatial thermal variation significantly impacts EM reliability even with identical average temperature. For a RISC-V core, equal-average thermal profiles yield over 70% spread in time to failure (TTF), while an ARM Cortex-A core shows nearly 50%. The Krylov-accelerated solver achieves 1.18x - 1.50x runtime reduction. Monte Carlo analysis reveals strong design dependence: under 20% variation in diffusivity and critical stress, TTF variation is about 25\% for RISC-V but only 0.03% for ARM. These results demonstrate that EMSpice 3 enables practical, map-aware, and workload-aware full-chip EM reliability assessment.

LGJun 11, 2025
EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization

Dingyi Rong, Haotian Lu, Wenzhuo Zheng et al.

Designing protein sequences with optimal energetic stability is a key challenge in protein inverse folding, as current deep learning methods are primarily trained by maximizing sequence recovery rates, often neglecting the energy of the generated sequences. This work aims to overcome this limitation by developing a model that directly generates low-energy, stable protein sequences. We propose EnerBridge-DPO, a novel inverse folding framework focused on generating low-energy, high-stability protein sequences. Our core innovation lies in: First, integrating Markov Bridges with Direct Preference Optimization (DPO), where energy-based preferences are used to fine-tune the Markov Bridge model. The Markov Bridge initiates optimization from an information-rich prior sequence, providing DPO with a pool of structurally plausible sequence candidates. Second, an explicit energy constraint loss is introduced, which enhances the energy-driven nature of DPO based on prior sequences, enabling the model to effectively learn energy representations from a wealth of prior knowledge and directly predict sequence energy values, thereby capturing quantitative features of the energy landscape. Our evaluations demonstrate that EnerBridge-DPO can design protein complex sequences with lower energy while maintaining sequence recovery rates comparable to state-of-the-art models, and accurately predicts $ΔΔG$ values between various sequences.

LGMar 18, 2025
BPINN-EM-Post: Bayesian Physics-Informed Neural Network based Stochastic Electromigration Damage Analysis in the Post-void Phase

Subed Lamichhane, Haotian Lu, Sheldon X. -D. Tan

In contrast to the assumptions of most existing Electromigration (EM) analysis tools, the evolution of EM-induced stress is inherently non-deterministic, influenced by factors such as input current fluctuations and manufacturing non-idealities. Traditional approaches for estimating stress variations typically involve computationally expensive and inefficient Monte Carlo simulations with industrial solvers, which quantify variations using mean and variance metrics. In this work, we introduce a novel machine learning-based framework, termed BPINN-EM- Post, for efficient stochastic analysis of EM-induced post-voiding aging processes. For the first time, our new approach integrates closed-form analytical solutions with a Bayesian Physics- Informed Neural Network (BPINN) framework to accelerate the analysis. The closed-form solutions enforce physical laws at the individual wire segment level, while the BPINN ensures that physics constraints at inter-segment junctions are satisfied and stochastic behaviors are accurately modeled. By reducing the number of variables in the loss functions through utilizing analytical solutions, our method significantly improves training efficiency without accuracy loss and naturally incorporates variational effects. Additionally, the analytical solutions effectively address the challenge of incorporating initial stress distributions in interconnect structures during post-void stress calculations. Numerical results demonstrate that BPINN-EM-Post achieves over 240x and more than 67x speedup compared to Monte Carlo simulations using the FEM-based COMSOL solver and FDM-based EMSpice, respectively, with marginal accuracy loss.