CE · May 21
Therm-FM: Foundation Model is ALL YOU NEED for 3D-ICs Thermal Simulation
Zhen Huang, Haiyang Xin, Wenkai Yang et al.
Data-driven thermal predictors for 3D-ICs are often trained from scratch for each chip design using many high-fidelity finite-element simulations, which makes data generation expensive and reuse across designs costly. We propose Therm-FM, a neural operator framework that adapts a pretrained partial differential equation (PDE) foundation model to steady-state and transient 3D-IC thermal simulation. The motivation is that steady-state and transient chip-level heat conduction share elliptic and parabolic operator structures, respectively, with diffusion-type PDEs, so pretrained diffusion priors can provide an effective initialization for thermal-field prediction under heterogeneous materials, dense TSV/microbump interconnects, and package-level boundary conditions. To further reduce data-generation cost, Therm-FM incorporates a thermal-equivalent multi-fidelity training strategy that uses low-cost approximate simulations for thermal-domain adaptation and a limited number of high-fidelity samples for calibration. Experiments on public HotSpot benchmarks and industrial 3D-IC package benchmarks show that Therm-FM achieves up to a 10.6x reduction in mean error and surpasses the prior best accuracy with less than 20% of the training data. In cross-chip adaptation, it matches or surpasses full-data baselines on several metrics using only 10 to 30 target samples. We release datasets, source code, and pretrained models at https://github.com/haiyangxin/Therm-FM.
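For reference, the elliptic and parabolic structures mentioned above are those of the standard heat-conduction equation with spatially varying material properties. The statement below is the textbook form, shown only as an illustration; the symbols ($k$ conductivity, $\rho$ density, $c_p$ specific heat, $q$ power density, $h$ heat-transfer coefficient, $T_\infty$ ambient temperature) are introduced here and are not necessarily the exact formulation or boundary conditions used in Therm-FM.

```latex
\begin{aligned}
&\text{Transient (parabolic):} &&\rho(\mathbf{x})\,c_p(\mathbf{x})\,\frac{\partial T}{\partial t}
  = \nabla\cdot\big(k(\mathbf{x})\,\nabla T\big) + q(\mathbf{x},t) \\
&\text{Steady state (elliptic, } \partial T/\partial t = 0\text{):} &&-\nabla\cdot\big(k(\mathbf{x})\,\nabla T\big) = q(\mathbf{x}) \\
&\text{Convective boundary (outward normal } \mathbf{n}\text{):} &&-k\,\frac{\partial T}{\partial n} = h\,(T - T_\infty)
\end{aligned}
```

With constant $k$ and $\rho c_p$, the transient form reduces to the plain diffusion equation $\partial_t T = \alpha \nabla^2 T + q/(\rho c_p)$ with $\alpha = k/(\rho c_p)$, which is the kind of diffusion-type PDE a pretrained model may already have seen.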
CY · May 14
Computational Thinking Development in AI Agent Creation: A Mixed-Methods Study
Yimeng Sun, Haiyang Xin, Qiannan Niu et al.
This mixed-methods study examined computational thinking (CT) development among 93 pre-high-school students in a five-day AI agent creation workshop using CocoFlow, a no-code platform. Integrating pre-post assessments, behavioral logs, and interviews, we investigated CT development and how initial CT levels shape learning trajectories. Results revealed significant improvements in abstract thinking (effect size d = 0.71) and algorithmic thinking (effect size d = 0.70). Hierarchical regression identified iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Notably, students with moderate initial CT levels demonstrated substantially greater gains than both high-CT and low-CT peers, revealing an Optimal Development Zone effect (eta squared = 0.55). Qualitative analysis showed that moderate-CT students exhibited adaptive expertise, whereas high-CT students risked over-engineering and low-CT students struggled with task decomposition. These findings challenge linear learning assumptions and provide evidence for differentiated scaffolding in CT education.
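The effect sizes reported above can be computed along these lines. This is a minimal sketch assuming the pre-post d uses the average of the two standard deviations (one of several common variants; the abstract does not say which) and a one-way eta squared; the function names and sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_prepost(pre, post):
    # Pre-post Cohen's d with the average-variance denominator (d_av).
    # Assumption: the study may instead use d_z (mean difference / SD of differences).
    sd_avg = sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    return (mean(post) - mean(pre)) / sd_avg

def eta_squared(groups):
    # One-way eta squared: share of total variance explained by group membership.
    values = [v for g in groups for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((v - grand) ** 2 for v in values)
    return ss_between / ss_total

# Hypothetical scores, not data from the study.
print(cohens_d_prepost([2, 3, 4], [3, 4, 5]))         # 1.0
print(round(eta_squared([[1, 2, 3], [4, 5, 6]]), 3))  # 0.771
```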
CY · May 13
Modeling AI-TPACK in Practice: Insights from Teachers' Multi-Agent Workflow Design
Yimeng Sun, Haiyang Xin, Shuang Li et al.
This study investigates teachers' design behaviors and cognitive underpinnings when designing multi-agent instructional workflows. Cluster and Markov analyses of behavioral logs (N=61) identified three archetypes: Systematic Optimizers, who iteratively refine complex architectures; Prolific Creators, who rapidly prototype pragmatic tools via scaffolding; and Passive Observers, who exhibit polarized expert-novice profiles. Subsequent artifact (n=15) and interview (n=12) analyses reveal that AI-TPACK integration emerges from a dynamic interplay of systems thinking, pedagogical beliefs, and self-efficacy, not merely from the possession of discrete knowledge. These findings call for differentiated scaffolding responsive to teachers' cognitive-behavioral diversity.
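A Markov analysis of behavioral logs typically starts from a transition matrix estimated from each participant's action sequence; per-teacher matrices (or features derived from them) can then feed a clustering step. A minimal sketch, assuming first-order transitions and hypothetical action labels; the study's actual coding scheme and model are not specified in the abstract.

```python
from collections import defaultdict

def transition_matrix(sequences):
    # First-order Markov estimate: count consecutive action pairs, then normalize each row.
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }

# Hypothetical action logs from two teachers (labels are illustrative).
logs = [
    ["design", "test", "revise", "test"],
    ["design", "test", "test"],
]
P = transition_matrix(logs)
print(P["test"])  # {'revise': 0.5, 'test': 0.5}
```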
CY · May 13
An Activity-Theoretical Approach to Teacher Professional Development in Pedagogical AI Agent Design
Haiyang Xin, Qiannan Niu, Shuang Li et al.
This two-cycle formative intervention study examined why teachers disengage from AI agent creation after professional development (a low-engagement paradox) and tested whether systemic redesign could address it. Cycle 1 (N=218) revealed that despite completing comprehensive teacher professional development (TPD), 87 percent of teachers stopped creating agents within three weeks; behavioral tracking and interview analysis identified systemic contradictions, rather than capacity deficits, as the source of psychological need frustration. Cycle 2 (N=26) implemented a redesign driven by Cultural-Historical Activity Theory (CHAT) and Self-Determination Theory (SDT) that directly targeted the diagnosed contradictions, achieving synchronized enhancement of both capacity and willingness. The findings reframe implementation failure as a rational response to need-thwarting systems and offer a replicable CHAT-SDT diagnostic framework for transformative professional development.
CY · May 13
MIRACLE: Multi-Agent Intelligent Regulation to Advance Collaborative Learning Environment
Shuang Li, Haiyang Xin, Yimeng Sun et al.
Effective collaboration requires socially shared regulation of learning (SSRL), but students often lack these skills. This study introduces MIRACLE (Multi-Agent Intelligent Regulation to Advance Collaborative Learning Environment), a system that supports SSRL by orchestrating metacognitive regulation and proactively providing emotional and motivational support. We conducted a quasi-experimental study with 90 fifth-grade students. The experimental group (n=42) used the collaborative platform CocoNote equipped with MIRACLE, while the control group (n=48) used the same platform with a general-purpose GPT assistant. Quantitative results show that the MIRACLE group achieved significant gains across SSRL phases (Planning, Monitoring, Reflection) and produced higher-quality collaborative artifacts than the control group. Qualitative findings indicate that students perceived MIRACLE as an effective facilitator of cognitive, regulatory, and emotional support. This study demonstrates that specialized, orchestrated AI systems are more effective than generic AI in enhancing SSRL.
LG · Apr 4
Simple yet Effective: Low-Rank Spatial Attention for Neural Operators
Zherui Yang, Haiyang Xin, Tao Du et al.
Neural operators have emerged as data-driven surrogates for solving partial differential equations (PDEs), and their success hinges on efficiently modeling the long-range, global coupling among spatial points induced by the underlying physics. In many PDE regimes, the induced global interaction kernels are empirically compressible, exhibiting rapid spectral decay that admits low-rank approximations. We leverage this observation to unify representative global mixing modules in neural operators under a shared low-rank template: compressing high-dimensional pointwise features into a compact latent space, processing global interactions within it, and reconstructing the global context back to spatial points. Guided by this view, we introduce Low-Rank Spatial Attention (LRSA) as a clean and direct instantiation of this template. Crucially, unlike prior approaches that often rely on non-standard aggregation or normalization modules, LRSA is built purely from standard Transformer primitives, i.e., attention, normalization, and feed-forward networks, yielding a concise block that is straightforward to implement and directly compatible with hardware-optimized kernels. In our experiments, such a simple construction is sufficient to achieve high accuracy, yielding an average error reduction of over 17% relative to the second-best methods, while remaining stable and efficient in mixed-precision training.
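The compress, process, reconstruct template described above can be sketched with plain scaled dot-product attention. This is a minimal illustration, not the authors' LRSA block: it assumes a single head, omits the learned query/key/value projections, normalization, and feed-forward layers, and the function names are hypothetical. With N points, M latent tokens, and feature width d, the cost scales as O(NMd + M²d) instead of the O(N²d) of full self-attention.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Single-head scaled dot-product attention on lists of vectors (no learned projections).
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return out

def low_rank_global_mixing(points, latent_queries):
    # 1) Compress: M latent queries attend over N points  -> M x d
    latents = attention(latent_queries, points, points)
    # 2) Process: global interactions among the M latents only -> M x d
    latents = attention(latents, latents, latents)
    # 3) Reconstruct: each point reads from the latents      -> N x d
    context = attention(points, latents, latents)
    # Residual connection back onto the pointwise features.
    return [[p + c for p, c in zip(pt, ctx)] for pt, ctx in zip(points, context)]

random.seed(0)
N, M, D = 64, 4, 8  # M << N is the low-rank assumption
points = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
latent_queries = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(M)]  # learned in practice
out = low_rank_global_mixing(points, latent_queries)
print(len(out), len(out[0]))  # 64 8
```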
LG · Oct 29, 2025
Mixture-of-Experts Operator Transformer for Large-Scale PDE Pre-Training
Hong Wang, Haiyang Xin, Jie Wang et al.
Pre-training has proven effective in addressing data scarcity and performance limitations when solving PDE problems with neural operators. However, challenges remain: PDE datasets are heterogeneous in equation type, which leads to high errors in mixed training, and dense pre-trained models that scale parameters by increasing network width or depth incur significant inference costs. To tackle these challenges, we propose a novel Mixture-of-Experts Pre-training Operator Transformer (MoE-POT), a sparsely activated architecture that scales parameters efficiently while controlling inference costs. Specifically, our model adopts a layer-wise router-gating network that dynamically selects 4 routed experts from 16 expert networks during inference, enabling the model to focus on equation-specific features. We also integrate 2 shared experts that aim to capture common properties of PDEs and reduce redundancy among the routed experts. The final output is computed as the weighted average of the outputs of all activated experts. We pre-train models with 30M to 0.5B parameters on 6 public PDE datasets. Our model with 90M activated parameters achieves up to a 40% reduction in zero-shot error compared with existing models with 120M activated parameters. Additionally, we conduct an interpretability analysis showing that dataset types can be inferred from router-gating network decisions, which supports the rationale for and effectiveness of the MoE architecture.
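The routing step described above can be sketched as a sparse top-k mixture-of-experts layer. This is a minimal illustration under stated assumptions, not MoE-POT's implementation: the router is assumed to be a single linear map, the shared experts are assumed to receive a fixed logit so that one softmax over all six activated experts yields weights summing to 1 (matching the "weighted average" description), and the toy experts and function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(x, gate_weights, routed_experts, shared_experts, top_k=4, shared_logit=0.0):
    # Router-gating network (assumed linear): one logit per routed expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the top-k routed experts; the others are never evaluated.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    active = [routed_experts[i] for i in top] + list(shared_experts)
    # Assumption: shared experts get a fixed logit, and one softmax over all
    # activated experts yields weights that sum to 1 (a weighted average).
    weights = softmax([logits[i] for i in top] + [shared_logit] * len(shared_experts))
    outputs = [expert(x) for expert in active]
    y = [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(len(x))]
    return y, top

# Hypothetical toy setup: 16 routed experts that scale the input, 2 identity shared experts.
routed = [lambda v, s=i + 1: [s * vi for vi in v] for i in range(16)]
shared = [lambda v: list(v), lambda v: list(v)]
gate_weights = [[float(i), 0.0] for i in range(16)]  # logit_i = i * x[0]
y, top = moe_layer([1.0, 2.0], gate_weights, routed, shared)
print(top)  # [15, 14, 13, 12]
```

Only the 4 selected routed experts and the 2 shared experts are evaluated, so per-input compute tracks the activated parameter count rather than the total parameter count.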
LG · Oct 27, 2025
From Uniform to Adaptive: General Skip-Block Mechanisms for Efficient PDE Neural Operators
Lei Liu, Zhongyi Yu, Hong Wang et al.
In recent years, neural operators (NOs) have emerged as a popular approach for solving partial differential equations (PDEs). However, their application to large-scale engineering tasks suffers from significant computational overhead. The root of this inefficiency is a fundamental mismatch: current models impose a uniform computational cost everywhere, whereas physical fields exhibit vastly different complexities. For instance, in turbulent flows, intricate vortex regions require deeper network processing than regions of stable flow. To address this, we introduce Skip-Block Routing (SBR), a general framework for Transformer-based neural operators that can be integrated into their multi-layer architectures. First, SBR uses a routing mechanism to learn the complexity and ranking of tokens, which is then applied during inference. Then, in later layers, it decides how many tokens are passed forward based on this ranking. In this way, the model devotes more processing capacity to the more complex tokens. Experiments demonstrate that SBR integrates seamlessly into various neural operators. Our method reduces computational cost by approximately 50% in terms of floating-point operations (FLOPs) and delivers up to 2x faster inference without sacrificing accuracy.
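A minimal sketch of the skip-block idea: tokens are ranked once by a router, early layers process every token, and later layers apply the block only to the highest-ranked fraction while the remaining tokens pass through unchanged. The linear router, the keep-ratio schedule, the toy block, and all function names are hypothetical, not SBR's actual design; in a real Transformer the block would attend only among the kept tokens.

```python
import math

def route_scores(tokens, router_w):
    # Assumed linear router: one complexity score per token.
    return [sum(w * t for w, t in zip(router_w, tok)) for tok in tokens]

def skip_block_layer(tokens, scores, block, keep_ratio):
    # Apply `block` only to the highest-scoring tokens; the rest skip this layer.
    n_keep = max(1, math.ceil(keep_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    processed = block([tokens[i] for i in keep])  # block sees only the kept tokens
    out = list(tokens)                            # skipped tokens pass through unchanged
    for i, new_tok in zip(keep, processed):
        out[i] = new_tok
    return out, keep

def add_one_block(toks):
    # Hypothetical stand-in for a Transformer block.
    return [[v + 1.0 for v in t] for t in toks]

tokens = [[float(i), 0.0] for i in range(8)]
scores = route_scores(tokens, router_w=[1.0, 0.0])  # ranking computed once, reused by later layers
processed = 0
for keep_ratio in [1.0, 1.0, 0.5, 0.5]:  # early layers see all tokens, later layers half
    tokens, kept = skip_block_layer(tokens, scores, add_one_block, keep_ratio)
    processed += len(kept)
print(processed)  # 24 token-block evaluations instead of 32
```

The savings depend entirely on how many tokens each later layer keeps; this toy schedule removes a quarter of the block evaluations.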
LG · Oct 24, 2025
Accelerating Data Generation for Nonlinear temporal PDEs via homologous perturbation in solution space
Lei Liu, Zhenxin Huang, Hong Wang et al.
Data-driven deep learning methods such as neural operators have advanced the solution of nonlinear temporal partial differential equations (PDEs). However, these methods require large quantities of solution pairs, i.e., the solution functions and the right-hand sides (RHS) of the equations. These pairs are typically generated with traditional numerical methods, which need thousands of time-step iterations, far more than the dozens of time steps required for training, creating heavy computational and time overheads. To address these challenges, we propose a novel data generation algorithm, HOmologous Perturbation in Solution Space (HOPSS), which directly generates training datasets with fewer time steps rather than following the traditional approach of generating datasets with many time steps. This algorithm accelerates dataset generation while preserving the approximate precision required for model training. Specifically, we first obtain a set of base solution functions from a reliable solver, usually with thousands of time steps, and then align them with the time steps of the training datasets by downsampling. We then propose a "homologous perturbation" approach: by combining two solution functions (one as the primary function, the other as a homologous perturbation term scaled by a small scalar) with random noise, we efficiently generate PDE data points of comparable precision. Finally, using these data points, we compute the variation in the original equation's RHS to form new solution pairs. Theoretical and experimental results show that HOPSS lowers time complexity. For example, on the Navier-Stokes equation, it generates 10,000 samples in approximately 10% of the time required by traditional methods, with comparable model training performance.
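The perturb-then-recompute-RHS procedure above can be sketched on a 1D periodic viscous Burgers equation, u_t + u u_x - nu u_xx = f, used here as a simpler nonlinear stand-in for Navier-Stokes. Assumptions: the base trajectories are analytic placeholders rather than real solver outputs, the RHS is recomputed as the discrete residual (forward Euler in time, central differences in space), and all function names and parameter values are hypothetical. Because f is computed from the perturbed field with the same discrete operator, each new pair satisfies the discretized equation by construction.

```python
import math
import random

def downsample(traj, stride):
    # Keep every `stride`-th time step so base solutions match the training time grid.
    return traj[::stride]

def homologous_perturbation(u_a, u_b, eps, noise_std, rng):
    # u_new = primary + eps * homologous term + small random noise (pointwise).
    return [[a + eps * b + rng.gauss(0.0, noise_std) for a, b in zip(ra, rb)]
            for ra, rb in zip(u_a, u_b)]

def burgers_rhs(u, dt, dx, nu):
    # Discrete RHS f = u_t + u*u_x - nu*u_xx on a periodic grid; one fewer step than u.
    f = []
    for n in range(len(u) - 1):
        row, nxt, m = u[n], u[n + 1], len(u[n])
        f.append([
            (nxt[j] - row[j]) / dt
            + row[j] * (row[(j + 1) % m] - row[j - 1]) / (2 * dx)
            - nu * (row[(j + 1) % m] - 2 * row[j] + row[j - 1]) / dx ** 2
            for j in range(m)
        ])
    return f

nx, nt_fine, stride, dt_fine, nu = 32, 1000, 50, 1e-3, 0.01
dx = 2 * math.pi / nx
xs = [j * dx for j in range(nx)]
# Hypothetical stand-ins for solver outputs on a fine time grid.
base_a = [[math.exp(-nu * k * dt_fine) * math.sin(x) for x in xs] for k in range(nt_fine)]
base_b = [[math.exp(-4 * nu * k * dt_fine) * math.sin(2 * x) for x in xs] for k in range(nt_fine)]
u_a, u_b = downsample(base_a, stride), downsample(base_b, stride)
u_new = homologous_perturbation(u_a, u_b, eps=0.05, noise_std=1e-3, rng=random.Random(0))
f_new = burgers_rhs(u_new, dt=stride * dt_fine, dx=dx, nu=nu)
print(len(u_new), len(f_new), len(f_new[0]))  # 20 19 32
```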