Yue Hao

CV
h-index: 8
9 papers
30 citations
Novelty: 53%
AI Score: 50

9 Papers

ACC-PH | Nov 11, 2022
Prior-mean-assisted Bayesian optimization application on FRIB Front-End tunning

Kilean Hwang, Tomofumi Maruta, Alexander Plastun et al.

Bayesian optimization (BO) is often used for accelerator tuning due to its high sample efficiency. However, the computational scalability of training over large datasets can be problematic, and adopting historical data in a computationally efficient way is not trivial. Here, we exploit a neural network model trained on historical data as the prior mean of BO for FRIB Front-End tuning.
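To illustrate the prior-mean idea, here is a minimal sketch of a Gaussian-process posterior mean with a non-zero prior mean. It is not the authors' implementation: the RBF kernel, the noise level, and the `prior_mean` function (a hypothetical stand-in for a neural network trained on historical data) are all assumptions made for illustration.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_test, prior_mean, noise=1e-6):
    """Posterior mean of a GP whose prior mean is an arbitrary callable.

    The GP only models the residual y - prior_mean(x), so far from the
    data the prediction falls back to prior_mean instead of zero.
    """
    n = len(x_train)
    k_tt = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    residual = [y - prior_mean(x) for x, y in zip(x_train, y_train)]
    alpha = solve(k_tt, residual)
    return [prior_mean(xs) + sum(rbf(xs, xt) * a for xt, a in zip(x_train, alpha))
            for xs in x_test]

def prior_mean(x):
    """Hypothetical stand-in for a network trained on historical data."""
    return math.sin(x)

x_train = [0.0, 1.0, 2.0]
y_train = [math.sin(x) + 0.1 for x in x_train]  # measurements offset from the prior
mean = gp_posterior_mean(x_train, y_train, [1.0, 50.0], prior_mean)
```

At the training point `x = 1.0` the posterior mean follows the measurement, while at `x = 50.0`, far from any data, it reverts to the prior. A useful prior mean therefore lets BO start from an informed guess without refitting on the full historical dataset.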

CV | Sep 24, 2022
Spiking SiamFC++: Deep Spiking Neural Network for Object Tracking

Shuiying Xiang, Tao Zhang, Shuqing Jiang et al.

Spiking neural networks (SNNs) are biologically plausible models that offer high computational capability and low power consumption. However, training deep SNNs is still an open problem, which limits their real-world applications. Here we propose a deep SNN architecture named Spiking SiamFC++ for object tracking with end-to-end direct training. Specifically, the AlexNet network is extended in the time domain to extract features, and a surrogate gradient function is adopted to realize direct supervised training of the deep SNN. To examine the performance of Spiking SiamFC++, several tracking benchmarks are considered, including OTB2013, OTB2015, VOT2015, VOT2016, and UAV123. The precision loss is found to be small compared with the original SiamFC++. Compared with the existing SNN-based target tracker SiamSNN, the precision (success) of the proposed Spiking SiamFC++ reaches 85.24% (64.37%), which is much higher than the 52.78% (44.32%) achieved by SiamSNN. To the best of our knowledge, Spiking SiamFC++ outperforms the existing state-of-the-art approaches in SNN-based object tracking, providing a novel path for SNN application in the field of target tracking. This work may further promote the development of SNN algorithms and neuromorphic chips.
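The surrogate-gradient idea can be sketched in a few lines. The abstract does not say which neuron model or surrogate function the paper uses, so the leaky integrate-and-fire update, the hard reset, and the sigmoid-derivative surrogate below are assumptions chosen for illustration only.

```python
import math

def spike(v, threshold=1.0):
    """Forward pass: non-differentiable Heaviside step on the membrane potential."""
    return 1.0 if v >= threshold else 0.0

def surrogate_grad(v, threshold=1.0, slope=4.0):
    """Backward pass: derivative of sigmoid(slope * (v - threshold)).

    This smooth function replaces the step's derivative, which is zero
    almost everywhere and would otherwise block gradient-based training.
    """
    s = 1.0 / (1.0 + math.exp(-slope * (v - threshold)))
    return slope * s * (1.0 - s)

def lif_run(inputs, decay=0.5, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a sequence of time steps."""
    v, spikes = 0.0, []
    for x in inputs:
        v = decay * v + x        # leak, then integrate the input
        s = spike(v, threshold)
        spikes.append(s)
        v = v * (1.0 - s)        # hard reset to zero after a spike
    return spikes

spikes = lif_run([0.6, 0.6, 0.6, 0.6])
```

With these settings the neuron accumulates input for two steps, fires on the third, and resets. The surrogate gradient peaks at the threshold and decays away from it, so weights feeding neurons near threshold receive the largest updates.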

69.1 | AI | Mar 12
Multi-Agent Collaboration for Automated Design Exploration on High Performance Computing Systems

Harshitha Menon, Charles F. Jekel, Kevin Korner et al.

Today's scientific challenges, from climate modeling to Inertial Confinement Fusion design to novel material design, require exploring huge design spaces. To enable high-impact scientific discovery, we need to scale up our ability to test hypotheses, generate results, and learn from them rapidly. We present MADA (Multi-Agent Design Assistant), a Large Language Model (LLM) powered multi-agent framework that coordinates specialized agents for complex design workflows. A Job Management Agent (JMA) launches and manages ensemble simulations on HPC systems, a Geometry Agent (GA) generates meshes, and an Inverse Design Agent (IDA) proposes new designs informed by simulation outcomes. While the framework is general purpose, we focus development and validation on Richtmyer-Meshkov Instability (RMI) suppression, a critical challenge in Inertial Confinement Fusion. We evaluate in two complementary settings: running hydrodynamics simulations on HPC systems, and using a pre-trained machine learning surrogate for rapid design exploration. Our results demonstrate that the MADA system successfully executes iterative design refinement, automatically improving designs toward optimal RMI suppression with minimal manual intervention. Our framework reduces cumbersome manual workflow setup and enables automated design exploration at scale. More broadly, it demonstrates a reusable pattern for coupling reasoning, simulation, specialized tools, and coordinated workflows to accelerate scientific discovery.

96.6 | NI | Apr 13
Programmable Packet Scheduling with Dynamic Reordering at Line Rate

Zekun Wang, Binghao Yue, Yichen Deng et al.

High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of scheduling algorithms; however, their semantics are restricted to packet-level ordering and cannot dynamically reorder buffered packets, which limits the support for dynamic-ordering algorithms such as pFabric. To overcome this limitation, we propose UIFO (Update-In-First-Out), a new programmable scheduling model that introduces a two-level abstraction over classes and packets. UIFO enables dynamic updates to the scheduling order at the class level while preserving in-order packet scheduling within each class, thereby supporting dynamic reordering of already-buffered packets. Furthermore, UIFO remains fully compatible with and generalizes existing PIFO and PIEO models. We implement a hardware prototype of UIFO based on priority-queue designs and evaluate it on an FPGA platform and in a 28 nm ASIC process. Overall, UIFO significantly enhances scheduling expressiveness and maintains favorable scalability while sustaining 100 Gbps line-rate throughput.
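The two-level abstraction can be illustrated with a toy software model. This is a sketch of the semantics as described in the abstract, not the paper's hardware design: the class name `TwoLevelScheduler`, its methods, and the dictionary-based storage are hypothetical.

```python
from collections import deque

class TwoLevelScheduler:
    """Toy model: ranks live on classes, packets stay FIFO within each class."""

    def __init__(self):
        self.rank = {}     # class id -> current rank (lower is served first)
        self.queues = {}   # class id -> FIFO of buffered packets

    def enqueue(self, class_id, packet, rank):
        """Buffer a packet; rank is used only the first time a class is seen."""
        self.rank.setdefault(class_id, rank)
        self.queues.setdefault(class_id, deque()).append(packet)

    def update(self, class_id, new_rank):
        """Re-rank a class, reordering all of its buffered packets at once."""
        self.rank[class_id] = new_rank

    def dequeue(self):
        """Pop the head packet of the non-empty class with the lowest rank."""
        ready = [c for c, q in self.queues.items() if q]
        if not ready:
            return None
        best = min(ready, key=lambda c: self.rank[c])
        return self.queues[best].popleft()

s = TwoLevelScheduler()
s.enqueue("A", "a1", rank=10)
s.enqueue("A", "a2", rank=10)
s.enqueue("B", "b1", rank=5)
s.update("A", 1)   # e.g. flow A now has little data left, so it moves ahead of B
order = [s.dequeue() for _ in range(3)]
```

Without the `update` call, `b1` would leave first. With it, both already-buffered packets of class A overtake class B while keeping their relative order, which is the behavior a packet-level rank fixed at enqueue time cannot express.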

CV | Mar 4, 2025
DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting

Haoyuan Li, Ziqin Ye, Yue Hao et al.

Accurate object perception is essential for robotic applications such as object navigation. In this paper, we propose DQO-MAP, a novel object-SLAM system that seamlessly integrates object pose estimation and reconstruction. We employ 3D Gaussian Splatting for high-fidelity object reconstruction and leverage quadrics for precise object pose estimation. Management of both representations is handled on the CPU, while optimization is performed on the GPU, significantly improving system efficiency. By associating objects with unique IDs, our system enables rapid object extraction from the scene. Extensive experimental results on object reconstruction and pose estimation demonstrate that DQO-MAP achieves outstanding performance in terms of precision, reconstruction quality, and computational efficiency. The code and dataset are available at: https://github.com/LiHaoy-ux/DQO-MAP.

DS | Jan 14
A Grouped Sorting Queue Supporting Dynamic Updates for Timer Management in High-Speed Network Interface Cards

Zekun Wang, Binghao Yue, Weitao Pan et al.

With the hardware offloading of network functions, network interface cards (NICs) undertake massive stateful, high-precision, and high-throughput tasks, where timers serve as a critical enabling component. However, existing timer management schemes suffer from heavy software load, low precision, lack of hardware update support, and overflow. This paper proposes two novel operations for priority queues, update and group sorting, to enable hardware timer management. To the best of our knowledge, this work presents the first hardware priority queue to support an update operation, which modifies the priorities of elements within the queue through the composition and propagation of basic operations. The group sorting mechanism ensures correct timing behavior after overflow by establishing a group boundary priority that alters the sorting process and element insertion positions. Implemented with a hybrid architecture of a one-dimensional (1D) systolic array and shift registers, our design is validated through packet-level simulations for flow table timeout management. Results demonstrate that a 4K-depth, 16-bit timer queue achieves over 500 MHz (175 Mpps, 12 ns precision) in a 28 nm process and over 300 MHz (116 Mpps) on an FPGA. Critically, it reduces LUT and FF usage by 31% and 25%, respectively, compared to existing designs.
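A rough software analogy may help in reading the two operations. The sketch below is an interpretation, not the paper's systolic-array design: it assumes that "update" can be modeled as remove-then-insert, and that the group boundary can be modeled as an offset that makes wrapped 16-bit deadlines sort after unwrapped ones. `TimerQueue` and its methods are hypothetical names.

```python
import bisect

WIDTH = 16
MOD = 1 << WIDTH   # a 16-bit timer wraps around at 65536

class TimerQueue:
    """Toy sorted timer queue with wrap-aware ordering and an update operation."""

    def __init__(self, boundary=0):
        self.boundary = boundary   # deadlines are ordered relative to this value
        self.items = []            # sorted list of (key, timer_id, deadline)

    def _key(self, deadline):
        # Deadlines that wrapped past 2**WIDTH still sort after earlier ones.
        return (deadline - self.boundary) % MOD

    def insert(self, timer_id, deadline):
        bisect.insort(self.items, (self._key(deadline), timer_id, deadline))

    def update(self, timer_id, new_deadline):
        """Composed from basic operations: remove the old entry, insert the new one."""
        self.items = [it for it in self.items if it[1] != timer_id]
        self.insert(timer_id, new_deadline)

    def pop_earliest(self):
        """Return the id of the timer that expires first, or None if empty."""
        return self.items.pop(0)[1] if self.items else None

q = TimerQueue(boundary=65000)
q.insert("a", 65500)
q.insert("b", 100)      # wrapped deadline: expires after "a" despite the smaller value
q.insert("c", 65100)
q.update("c", 200)      # push "c" back past the wrap point
expiry_order = [q.pop_earliest() for _ in range(3)]
```

Sorting on the raw 16-bit values would expire `b` and the updated `c` before `a`. Ordering relative to the boundary keeps the expiry sequence `a`, `b`, `c` correct across the overflow.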

APP-PH | Oct 2, 2025
Multi-Agent Design Assistant for the Simulation of Inertial Fusion Energy

Meir H. Shachar, Dane M. Sterbentz, Harshitha Menon et al.

Inertial fusion energy promises nearly unlimited, clean power if it can be achieved. However, the design and engineering of fusion systems requires controlling and manipulating matter at extreme energies and timescales. The shock physics and radiation transport governing the physical behavior under these conditions are complex, requiring the development, calibration, and use of predictive multiphysics codes to navigate the highly nonlinear and multi-faceted design landscape. We hypothesize that artificial intelligence reasoning models can be combined with physics codes and emulators to autonomously design fusion fuel capsules. In this article, we construct a multi-agent system in which natural language is used to explore the complex physics regimes around fusion energy. The agentic system is capable of executing a high-order multiphysics inertial fusion computational code. We demonstrate the capacity of the multi-agent design assistant to both collaboratively and autonomously manipulate, navigate, and optimize capsule geometry while accounting for high-fidelity physics, ultimately achieving simulated ignition via inverse design.

RO | Nov 30, 2021
Metal Blossom: Laser Forming Complex and Freeform Metal Structures Imitating Flower Blooming

Yue Hao, Peiwen J. Ma, Huaishu Peng et al.

For centuries, human civilizations have devised metal forming techniques to make tools and items; yet customized metal forming remains costly and intricate. Laser-forming origami (lasergami) is a metal forming process in which a laser beam cuts and folds a planar metal sheet to form a three-dimensional (3D) shape. Designing foldable structures formable by lasers, however, has long been a trial-and-error practice that requires significant mental effort and hinders the creation of practical structures. This work demonstrates for the first time that lasergami can form a freeform set of metallic structures previously believed to be impossible to laser-form. This technological breakthrough is enabled by new computational origami methods that imitate flower blooming and optimize laser folding instructions. Combined with new ideas that address laser line of sight and minimize fabrication energy, we report a low-cost manufacturing framework that can be readily adopted by hobbyists and professionals alike.

RO | Nov 20, 2020
Planning Folding Motion with Simulation in the Loop Using Laser Forming Origami and Thermal Behaviors as an Example

Yue Hao, Weilin Guan, Edwin A Peraza Hernandez et al.

Designing a robot or structure that can fold itself into a target shape involves challenges originating from multiple sources. For example, the designer of rigid self-folding robots must consider foldability from geometric and kinematic aspects to avoid self-intersection and undesired deformations. Recent works have shown success in estimating the foldability of a design using robot motion planners. However, many foldable structures are actuated using physically coupled reactions (i.e., folding driven by thermal, chemical, or electromagnetic loads). Therefore, a reliable foldability analysis must consider the additional constraints that result from these critical phenomena. This work investigates the idea of efficiently incorporating computationally expensive physics simulation within the folding motion planner to provide a better estimate of foldability. In this paper, we use laser forming origami as an example to demonstrate the benefits of considering properties beyond geometry. We show that the design produced by the proposed method can be folded more efficiently.