CVApr 15
Seedance 2.0: Advancing Video Generation for World ComplexityTeam Seedance, De Chen, Liyang Chen et al. · gatech
Seedance 2.0 is a new native multi-modal audio-video generation model, officially released in China in early February 2026. Compared with its predecessors, Seedance 1.0 and 1.5 Pro, Seedance 2.0 adopts a unified, highly efficient, and large-scale architecture for multi-modal audio-video joint generation. This allows it to support four input modalities: text, image, audio, and video, by integrating one of the most comprehensive suites of multi-modal content reference and editing capabilities available in the industry to date. It delivers substantial, well-rounded improvements across all key sub-dimensions of video and audio generation. In both expert evaluations and public user tests, the model has demonstrated performance on par with the leading levels in the field. Seedance 2.0 supports direct generation of audio-video content with durations ranging from 4 to 15 seconds, with native output resolutions of 480p and 720p. For multi-modal inputs as reference, its current open platform supports up to 3 video clips, 9 images, and 3 audio clips. In addition, we provide Seedance 2.0 Fast version, an accelerated variant of Seedance 2.0 designed to boost generation speed for low-latency scenarios. Seedance 2.0 has delivered significant improvements to its foundational generation capabilities and multi-modal generation performance, bringing an enhanced creative experience for end users.
NAMay 2, 2012
A time-splitting spectral scheme for the Maxwell-Dirac systemZhongyi Huang, Shi Jin, Peter Markowich et al.
We present a time-splitting spectral scheme for the Maxwell-Dirac system and similar time-splitting methods for the corresponding asymptotic problems in the semi-classical and the non-relativistic regimes. The scheme for the Maxwell-Dirac system conserves the Lorentz gauge condition, is unconditionally stable and highly efficient as our numerical examples show. In particular we focus in our examples on the creation of positronic modes in the semi-classical regime and on the electron-positron interaction in the non-relativistic regime. Furthermore, in the non-relativistic regime, our numerical method exhibits uniform convergence in the small parameter $\dt$, which is the ratio of the characteristic speed and the speed of light.
NAMay 2, 2012
A Bloch decomposition based split-step pseudo spectral method for quantum dynamics with periodic potentialsZhongyi Huang, Shi Jin, Peter Markowich et al.
We present a new numerical method for accurate computations of solutions to (linear) one dimensional Schrödinger equations with periodic potentials. This is a prominent model in solid state physics where we also allow for perturbations by non-periodic potentials describing external electric fields. Our approach is based on the classical Bloch decomposition method which allows to diagonalize the periodic part of the Hamiltonian operator. Hence, the dominant effects from dispersion and periodic lattice potential are computed together, while the non-periodic potential acts only as a perturbation. Because the split-step communicator error between the periodic and non-periodic parts is relatively small, the step size can be chosen substantially larger than for the traditional splitting of the dispersion and potential operators. Indeed it is shown by the given examples, that our method is unconditionally stable and more efficient than the traditional split-step pseudo spectral schemes. To this end a particular focus is on the semiclassical regime, where the new algorithm naturally incorporates the adiabatic splitting of slow and fast degrees of freedom.
NAMay 2, 2012
Gaussian Beam Methods for the Dirac Equation in the Semi-classical RegimeHao Wu, Zhongyi Huang, Shi Jin et al.
The Dirac equation is an important model in relativistic quantum mechanics. In the semi-classical regime $ε\ll1$, even a spatially spectrally accurate time splitting method \cite{HuJi:05} requires the mesh size to be $O(ε)$, which makes the direct simulation extremely expensive. In this paper, we present the Gaussian beam method for the Dirac equation. With the help of an eigenvalue decomposition, the Gaussian beams can be independently evolved along each eigenspace and summed to construct an approximate solution of the Dirac equation. Moreover, the proposed Eulerian Gaussian beam keeps the advantages of constructing the Hessian matrices by simply using level set functions' derivatives. Finally, several numerical examples show the efficiency and accuracy of the method.
CVNov 2, 2022
tSF: Transformer-based Semantic Filter for Few-Shot LearningJinxiang Lai, Siqian Yang, Wenlong Liu et al.
Few-Shot Learning (FSL) alleviates the data shortage challenge via embedding discriminative target-aware features among plenty seen (base) and few unseen (novel) labeled samples. Most feature embedding modules in recent FSL methods are specially designed for corresponding learning tasks (e.g., classification, segmentation, and object detection), which limits the utility of embedding features. To this end, we propose a light and universal module named transformer-based Semantic Filter (tSF), which can be applied for different FSL tasks. The proposed tSF redesigns the inputs of a transformer-based structure by a semantic filter, which not only embeds the knowledge from whole base set to novel set but also filters semantic features for target category. Furthermore, the parameters of tSF is equal to half of a standard transformer block (less than 1M). In the experiments, our tSF is able to boost the performances in different classic few-shot learning tasks (about 2% improvement), especially outperforms the state-of-the-arts on multiple benchmark datasets in few-shot classification task.
NAApr 16, 2016
A Bloch decomposition-based stochastic Galerkin method for quantum dynamics with a random external potentialZhizhang Wu, Zhongyi Huang
In this paper, we consider the numerical solution of the one-dimensional Schrödinger equation with a periodic lattice potential and a random external potential. This is an important model in solid state physics where the randomness is involved to describe some complicated phenomena that are not exactly known. Here we generalize the Bloch decomposition-based time-splitting pseudospectral method to the stochastic setting using the generalize polynomial chaos with a Galerkin procedure so that the main effects of dispersion and periodic potential are still computed together. We prove that our method is unconditionally stable and numerical examples show that it has other nice properties and is more efficient than the traditional method. Finally, we give some numerical evidence for the well-known phenomenon of Anderson localization.
CVMay 20, 2025Code
Decoupling Classifier for Boosting Few-shot Object Detection and Instance SegmentationBin-Bin Gao, Xiaochen Chen, Zhongyi Huang et al.
This paper focus on few-shot object detection~(FSOD) and instance segmentation~(FSIS), which requires a model to quickly adapt to novel classes with a few labeled instances. The existing methods severely suffer from bias classification because of the missing label issue which naturally exists in an instance-level few-shot scenario and is first formally proposed by us. Our analysis suggests that the standard classification head of most FSOD or FSIS models needs to be decoupled to mitigate the bias classification. Therefore, we propose an embarrassingly simple but effective method that decouples the standard classifier into two heads. Then, these two individual heads are capable of independently addressing clear positive samples and noisy negative samples which are caused by the missing label. In this way, the model can effectively learn novel classes while mitigating the effects of noisy negative samples. Without bells and whistles, our model without any additional computation cost and parameters consistently outperforms its baseline and state-of-the-art by a large margin on PASCAL VOC and MS-COCO benchmarks for FSOD and FSIS tasks. The Code is available at https://csgaobb.github.io/Projects/DCFS.
DCMay 4Code
SPECTRE: Hybrid Ordinary-Parallel Speculative Serving for Resource-Efficient LLM InferenceJincheng Xie, Yawen Ling, Qi Xiao et al.
LLM serving platforms are increasingly deployed as multi-model cloud systems, where user demand is often long-tailed: a few popular large models receive most requests, while many smaller tail models remain underutilized. We propose \textbf{SPECTRE} (Parallel \textbf{SPEC}ulative Decoding with a Multi-\textbf{T}enant \textbf{RE}mote Drafter), a serving framework that reuses underutilized tail-model services as remote drafters for heavily loaded large-model services through speculative decoding. SPECTRE enables draft generation and target-side verification to run in parallel, and makes such parallelism effective through three techniques: a hybrid ordinary-parallel speculative decoding strategy guided by a threshold derived from throughput analysis, speculative priority scheduling to preserve draft--target overlap under multi-tenant traffic, and draft-side prompt compression to reduce draft latency. We implement SPECTRE in \texttt{SGLang} and evaluate it across multiple draft--target model pairs, reasoning benchmarks, real-world long-context workloads, and a wide range of batch sizes. Results show that SPECTRE consistently improves large-model serving throughput while causing only minor interference to the native workloads of tail-model services. In large-model deployments, including Qwen3-235B-A22B with TP=8, SPECTRE achieves up to \textbf{2.28$\times$ speedup} over autoregressive decoding and up to an additional \textbf{66\% relative improvement} over the strongest speculative decoding baselines. Talk is cheap, we show you the code: https://github.com/sgl-project/sglang/pull/22272.
AIApr 18
EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding ModelsJincheng Xie, Xingchen Xiao, Heyan Huang et al.
Unified multimodal embedding spaces underpin practical applications such as cross-modal retrieval and zero-shot recognition. In many real deployments, however, supervision is available only for a small subset of modality pairs (e.g., image--text), leaving \emph{unpaired} modality pairs (e.g., audio$\leftrightarrow$depth, infrared$\leftrightarrow$audio) weakly connected and thus performing poorly on zero-shot transfer. Addressing this sparse-pairing regime is therefore essential for scaling unified embedding systems to new tasks without curating exhaustive pairwise data. We propose \textbf{EmergentBridge}, an embedding-level bridging framework that improves performance on these unpaired pairs \emph{without requiring exhaustive pairwise supervision}. Our key observation is that naively aligning a new modality to a synthesized proxy embedding can introduce \emph{gradient interference}, degrading the anchor-alignment structure that existing retrieval/classification relies on. EmergentBridge addresses this by (i) learning a mapping that produces a \emph{noisy bridge anchor} (a proxy embedding of an already-aligned modality) from an anchor embedding, and (ii) enforcing proxy alignment only in the subspace orthogonal to the anchor-alignment direction, preserving anchor alignment while strengthening non-anchor connectivity. Across nine datasets spanning multiple modalities, EmergentBridge consistently outperforms prior binding baselines on zero-shot classification and retrieval, demonstrating strong emergent alignment.
LGAug 1, 2024
Convergence Analysis of Natural Gradient Descent for Over-parameterized Physics-Informed Neural NetworksXianliang Xu, Ting Du, Wang Kong et al.
In the context of over-parameterization, there is a line of work demonstrating that randomly initialized (stochastic) gradient descent (GD) converges to a globally optimal solution at a linear convergence rate for the quadratic loss function. However, the learning rate of GD for training two-layer neural networks exhibits poor dependence on the sample size and the Gram matrix, leading to a slow training process. In this paper, we show that for training two-layer $\text{ReLU}^3$ Physics-Informed Neural Networks (PINNs), the learning rate can be improved from $\mathcal{O}(λ_0)$ to $\mathcal{O}(1/\|\bm{H}^{\infty}\|_2)$, implying that GD actually enjoys a faster convergence rate. Despite such improvements, the convergence rate is still tied to the least eigenvalue of the Gram matrix, leading to slow convergence. We then develop the positive definiteness of Gram matrices with general smooth activation functions and provide the convergence analysis of natural gradient descent (NGD) in training two-layer PINNs, demonstrating that the learning rate can be $\mathcal{O}(1)$ and at this rate, the convergence rate is independent of the Gram matrix. In particular, for smooth activation functions, the convergence rate of NGD is quadratic. Numerical experiments are conducted to verify our theoretical results.
ITMay 9
Tight Lower Bounds on The Single-Error Detection Threshold for Analog Error-Correcting CodesZhengyi Jiang, Wenhao Liu, Zhongyi Huang et al.
Analog error-correcting codes (Analog ECCs) for approximate vector-matrix multiplication have been extensively studied as means to achieve fault-tolerant in-memory computation. The theoretical foundations for such coding schemes, particularly the characterization of their correction capabilities via the height profile, have been well established in recent literature. In this paper, we focus on the case of single-error detection Analog ECCs. Among several open problems related to this case proposed by Ron M. Roth in [1], Problem 1 asks: "Identify the values of $k$ and $n$ for which every linear $[n, k]$ code $\mathcal{C}$ over $\mathbb{R}$ satisfies: $$\mathsf{h}_1(\mathcal{C}):=\max_{\boldsymbol{c}\in \mathcal{C}\setminus{\{\boldsymbol{0}\}}}\mathsf{h}_1(\boldsymbol{c})\geq \Big\lceil \frac{k}{n-k} \Big\rceil.\text{"}$$ Here, for any $\boldsymbol{x}\in\mathbb{R}^n$, $\mathsf{h}_1(\boldsymbol{x})$ represents the ratio between the largest and second largest absolute values of $\boldsymbol{x}$'s entries. As the simplest special case of Problem 1 (with $n-k=2$), the following problem was posed as Problem 2 in [1]: "Must every $(n-2)$-dimensional subspace of $\mathbb{R}^n$, $n$ even, contain a nonzero vector in which the ratio between the largest and second largest absolute values of its entries is at least $(n/2)-1$?" These problems directly pertain to the lower bounds on the single-error detection threshold for Analog ECCs: Problem 1 corresponds to arbitrary $n-k$ and Problem 2 corresponds to $n-k=2$. In this paper, we provide an affirmative answer to Problem 2 and a rigorous proof using theories related to convex optimization. Furthermore, we extend our analytical method to show that the lower bound in Problem 1 is tight for the case where $n-k$ divides $k$. Our results fill the gap in the lower bound theory of thresholds for single-error detection in Analog ECCs.
LGSep 7, 2024
Component Fourier Neural Operator for Singularly Perturbed Differential EquationsYe Li, Ting Du, Yiwen Pang et al.
Solving Singularly Perturbed Differential Equations (SPDEs) poses computational challenges arising from the rapid transitions in their solutions within thin regions. The effectiveness of deep learning in addressing differential equations motivates us to employ these methods for solving SPDEs. In this manuscript, we introduce Component Fourier Neural Operator (ComFNO), an innovative operator learning method that builds upon Fourier Neural Operator (FNO), while simultaneously incorporating valuable prior knowledge obtained from asymptotic analysis. Our approach is not limited to FNO and can be applied to other neural network frameworks, such as Deep Operator Network (DeepONet), leading to potential similar SPDEs solvers. Experimental results across diverse classes of SPDEs demonstrate that ComFNO significantly improves accuracy compared to vanilla FNO. Furthermore, ComFNO exhibits natural adaptability to diverse data distributions and performs well in few-shot scenarios, showcasing its excellent generalization ability in practical situations.
LGJul 3, 2024
Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural NetworksXianliang Xu, Ting Du, Wang Kong et al.
The optimization algorithms are crucial in training physics-informed neural networks (PINNs), as unsuitable methods may lead to poor solutions. Compared to the common gradient descent (GD) algorithm, implicit gradient descent (IGD) outperforms it in handling certain multi-scale problems. In this paper, we provide convergence analysis for the IGD in training over-parameterized two-layer PINNs. We first derive the training dynamics of IGD in training two-layer PINNs. Then, over-parameterization allows us to prove that the randomly initialized IGD converges to a globally optimal solution at a linear convergence rate. Moreover, due to the distinct training dynamics of IGD compared to GD, the learning rate can be selected independently of the sample size and the least eigenvalue of the Gram matrix. Additionally, the novel approach used in our convergence analysis imposes a milder requirement on the network width. Finally, empirical results validate our theoretical findings.
ITMay 5
Joint Design of Piggyback and Conjugate Transformation Functions for Repair Bandwidth Reduction in Piggybacking CodesHao Shi, Zhengyi Jiang, Gefeng Deng et al.
Efficient node repair is a central requirement in distributed storage systems, particularly in high-rate erasure-coded deployments where repair traffic directly affects network overhead and recovery cost. Piggybacking codes reduce the repair bandwidth of MDS array codes while keeping the sub-packetization level small. However, existing piggybacking constructions often rely on restrictive piggyback-function designs to preserve the MDS property over small fields, which limits their repair-bandwidth reduction. We propose {\em conjugate-piggybacking} codes, a new class of MDS array codes that jointly design piggyback functions and conjugate transformations under small sub-packetization. The proposed construction improves repair efficiency while preserving the MDS property over moderate field sizes. In particular, it enables some parity nodes to achieve optimal repair bandwidth and reduces the overall repair bandwidth compared with existing piggybacking-based designs. We analyze the MDS property and repair bandwidth of the proposed codes and evaluate them against existing piggybacking codes under high-code-rate settings over $\mathbb{F}_{2^8}$. We further conduct a repair-traffic simulation under uniform single-node failures to quantify the expected traffic reduction in storage-oriented settings. The results show that our construction consistently achieves lower repair bandwidth than related piggybacking codes and reduces expected repair traffic compared with conventional RS repair. These gains are obtained at the cost of a slightly larger field size, revealing a practical trade-off between repair efficiency and field-size overhead for high-rate distributed storage.
CVDec 8, 2025
Robust Variational Model Based Tailored UNet: Leveraging Edge Detector and Mean Curvature for Improved Image SegmentationKaili Qi, Zhongyi Huang, Wenli Yang
To address the challenge of segmenting noisy images with blurred or fragmented boundaries, this paper presents a robust version of Variational Model Based Tailored UNet (VM_TUNet), a hybrid framework that integrates variational methods with deep learning. The proposed approach incorporates physical priors, an edge detector and a mean curvature term, into a modified Cahn-Hilliard equation, aiming to combine the interpretability and boundary-smoothing advantages of variational partial differential equations (PDEs) with the strong representational ability of deep neural networks. The architecture consists of two collaborative modules: an F module, which conducts efficient frequency domain preprocessing to alleviate poor local minima, and a T module, which ensures accurate and stable local computations, backed by a stability estimate. Extensive experiments on three benchmark datasets indicate that the proposed method achieves a balanced trade-off between performance and computational efficiency, which yields competitive quantitative results and improved visual quality compared to pure convolutional neural network (CNN) based models, while achieving performance close to that of transformer-based method with reasonable computational expense.
LGDec 25, 2024
Computing Approximate Graph Edit Distance via Optimal TransportQihao Cheng, Da Yan, Tianhao Wu et al.
Given a graph pair $(G^1, G^2)$, graph edit distance (GED) is defined as the minimum number of edit operations converting $G^1$ to $G^2$. GED is a fundamental operation widely used in many applications, but its exact computation is NP-hard, so the approximation of GED has gained a lot of attention. Data-driven learning-based methods have been found to provide superior results compared to classical approximate algorithms, but they directly fit the coupling relationship between a pair of vertices from their vertex features. We argue that while pairwise vertex features can capture the coupling cost (discrepancy) of a pair of vertices, the vertex coupling matrix should be derived from the vertex-pair cost matrix through a more well-established method that is aware of the global context of the graph pair, such as optimal transport. In this paper, we propose an ensemble approach that integrates a supervised learning-based method and an unsupervised method, both based on optimal transport. Our learning method, GEDIOT, is based on inverse optimal transport that leverages a learnable Sinkhorn algorithm to generate the coupling matrix. Our unsupervised method, GEDGW, models GED computation as a linear combination of optimal transport and its variant, Gromov-Wasserstein discrepancy, for node and edge operations, respectively, which can be solved efficiently without needing the ground truth. Our ensemble method, GEDHOT, combines GEDIOT and GEDGW to further boost the performance. Extensive experiments demonstrate that our methods significantly outperform the existing methods in terms of the performance of GED computation, edit path generation, and model generalizability.
NAApr 5
A Geometry-Aware Operator Learning Framework for Interface Problems on Varying DomainsShanshan Xiao, Ye Li, Zhongyi Huang et al.
Solving Partial Differential Equation (PDE) interface problems on varying domains is a critical task in design and optimization, yet it remains computationally prohibitive for traditional solvers. Although operator learning has shown promise on fixed geometries, its potential for geometry-dependent interface problems has been largely unexplored. To bridge this gap, we propose an extension-based neural operator framework applicable to general linear interface problems. A key innovation of our method is the integration of the Tailored Finite Point Method (TFPM) with our base network, which reduces memory consumption and effectively alleviates the curse of dimensionality. On the theoretical front, we establish the continuity of the Helmholtz operator with respect to domain perturbations and provide rigorous error estimates for the proposed encodings. Comprehensive numerical experiments demonstrate that our framework achieves state-of-the-art accuracy and robustness. Consequently, this work provides a powerful, data-efficient tool for varying-domain simulations, offering new possibilities for real-time shape optimization.
CVDec 15, 2025
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation ModelTeam Seedance, Heyi Chen, Siyan Chen et al.
Recent strides in video generation have paved the way for unified audio-visual generation. In this work, we present Seedance 1.5 pro, a foundational model engineered specifically for native, joint audio-video generation. Leveraging a dual-branch Diffusion Transformer architecture, the model integrates a cross-modal joint module with a specialized multi-stage data pipeline, achieving exceptional audio-visual synchronization and superior generation quality. To ensure practical utility, we implement meticulous post-training optimizations, including Supervised Fine-Tuning (SFT) on high-quality datasets and Reinforcement Learning from Human Feedback (RLHF) with multi-dimensional reward models. Furthermore, we introduce an acceleration framework that boosts inference speed by over 10X. Seedance 1.5 pro distinguishes itself through precise multilingual and dialect lip-syncing, dynamic cinematic camera control, and enhanced narrative coherence, positioning it as a robust engine for professional-grade content creation. Seedance 1.5 pro is now accessible on Volcano Engine at https://console.volcengine.com/ark/region:ark+cn-beijing/experience/vision?type=GenVideo.
SDDec 15, 2025
Let the Model Learn to Feel: Mode-Guided Tonality Injection for Symbolic Music Emotion RecognitionHaiying Xia, Zhongyi Huang, Yumei Tan et al.
Music emotion recognition is a key task in symbolic music understanding (SMER). Recent approaches have shown promising results by fine-tuning large-scale pre-trained models (e.g., MIDIBERT, a benchmark in symbolic music understanding) to map musical semantics to emotional labels. While these models effectively capture distributional musical semantics, they often overlook tonal structures, particularly musical modes, which play a critical role in emotional perception according to music psychology. In this paper, we investigate the representational capacity of MIDIBERT and identify its limitations in capturing mode-emotion associations. To address this issue, we propose a Mode-Guided Enhancement (MoGE) strategy that incorporates psychological insights on mode into the model. Specifically, we first conduct a mode augmentation analysis, which reveals that MIDIBERT fails to effectively encode emotion-mode correlations. We then identify the least emotion-relevant layer within MIDIBERT and introduce a Mode-guided Feature-wise linear modulation injection (MoFi) framework to inject explicit mode features, thereby enhancing the model's capability in emotional representation and inference. Extensive experiments on the EMOPIA and VGMIDI datasets demonstrate that our mode injection strategy significantly improves SMER performance, achieving accuracies of 75.2% and 59.1%, respectively. These results validate the effectiveness of mode-guided modeling in symbolic music emotion recognition.
CVMay 9, 2025
Image Segmentation via Variational Model Based Tailored UNet: A Deep Variational FrameworkKaili Qi, Wenli Yang, Ye Li et al.
Traditional image segmentation methods, such as variational models based on partial differential equations (PDEs), offer strong mathematical interpretability and precise boundary modeling, but often suffer from sensitivity to parameter settings and high computational costs. In contrast, deep learning models such as UNet, which are relatively lightweight in parameters, excel in automatic feature extraction but lack theoretical interpretability and require extensive labeled data. To harness the complementary strengths of both paradigms, we propose Variational Model Based Tailored UNet (VM_TUNet), a novel hybrid framework that integrates the fourth-order modified Cahn-Hilliard equation with the deep learning backbone of UNet, which combines the interpretability and edge-preserving properties of variational methods with the adaptive feature learning of neural networks. Specifically, a data-driven operator is introduced to replace manual parameter tuning, and we incorporate the tailored finite point method (TFPM) to enforce high-precision boundary preservation. Experimental results on benchmark datasets demonstrate that VM_TUNet achieves superior segmentation performance compared to existing approaches, especially for fine boundary delineation.
LGJan 14, 2025
Distributed Nonparametric Estimation: from Sparse to Dense Samples per TerminalDeheng Yuan, Tao Guo, Zhongyi Huang
Consider the communication-constrained problem of nonparametric function estimation, in which each distributed terminal holds multiple i.i.d. samples. Under certain regularity assumptions, we characterize the minimax optimal rates for all regimes, and identify phase transitions of the optimal rates as the samples per terminal vary from sparse to dense. This fully solves the problem left open by previous works, whose scopes are limited to regimes with either dense samples or a single sample per terminal. To achieve the optimal rates, we design a layered estimation protocol by exploiting protocols for the parametric density estimation problem. We show the optimality of the protocol using information-theoretic methods and strong data processing inequalities, and incorporating the classic balls and bins model. The optimal rates are immediate for various special cases such as density estimation, Gaussian, binary, Poisson and heteroskedastic regression models.
LGDec 7, 2024
Convergence analysis of wide shallow neural operators within the framework of Neural Tangent KernelXianliang Xu, Ye Li, Zhongyi Huang
Neural operators are aiming at approximating operators mapping between Banach spaces of functions, achieving much success in the field of scientific computing. Compared to certain deep learning-based solvers, such as Physics-Informed Neural Networks (PINNs), Deep Ritz Method (DRM), neural operators can solve a class of Partial Differential Equations (PDEs). Although much work has been done to analyze the approximation and generalization error of neural operators, there is still a lack of analysis on their training error. In this work, we conduct the convergence analysis of gradient descent for the wide shallow neural operators and physics-informed shallow neural operators within the framework of Neural Tangent Kernel (NTK). The core idea lies on the fact that over-parameterization and random initialization together ensure that each weight vector remains near its initialization throughout all iterations, yielding the linear convergence of gradient descent. In this work, we demonstrate that under the setting of over-parametrization, gradient descent can find the global minimum regardless of whether it is in continuous time or discrete time.